Grep's word boundaries include spaces?



I am trying to use grep to find lines containing the word "bead", using "\b" for the word boundaries, but it does not find lines in which "bead" is followed by a space. I tried this command:

cat in.txt | grep -i "\bbead\b" > out.txt

I get results like

  • BEAD-air.JPG
  • Bead, 3 sided MET DP110317.jpg
  • Bead. -2819 (FindID 10143).jpg
  • Bead(Gem), Artefacts of Phu Hoa site(Dong Nai province).jpg
  • Romano-British pendant amulet (bead) (FindID 241983).jpg

But I don't get results like

  • Bead fun.jpg

Instead of the roughly 2,000 lines I expect, I'm only getting 92.

My OS is 64-bit Windows 10, and I'm using grep 2.5.4 from the GnuWin32 package.

I've also tried MSYS2, which includes grep 3.0, but it does the same thing.

So how can I search for words that are separated by spaces?

LATER EDIT: It looks like grep has problems with big files. My input file is 2.4 GB in size. With smaller files, it works - I reported the bug here:

2 Answers: 

Try this:

cat in.txt | grep -wi "bead" 

The -w option restricts matches to whole words.
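To see what -w does on a small scale, here is a rough sketch. It assumes a POSIX-style shell with grep on the PATH; the temporary file and the "Beads on a string.jpg" line are made up for illustration, and the other lines are taken from the question.

```shell
# Build a small sample file (hypothetical; stands in for in.txt).
sample=$(mktemp)
printf '%s\n' \
  'BEAD-air.JPG' \
  'Bead, 3 sided MET DP110317.jpg' \
  'Bead fun.jpg' \
  'Beads on a string.jpg' > "$sample"

# -w matches "bead" only as a whole word; -i ignores case.
matches=$(grep -wi 'bead' "$sample")
printf '%s\n' "$matches"

# Clean up the temporary file.
rm -f "$sample"
```

On a small file like this, "Bead fun.jpg" should be among the matches while the "Beads" line is skipped, which suggests the space itself is not the problem.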

What you are doing should normally work, but there are ways of setting what is and is not considered a word boundary. Rather than worry about that, please try this instead:

cat in.txt | grep -iP "\bbead(\b|\s)" > out.txt

The -P option enables Perl-compatible regular expressions, and \s matches any whitespace character. The bar | separates the alternatives inside the parentheses ( ).
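As a quick sanity check, the two styles of pattern can be compared on a few lines. This is only a sketch: it assumes GNU grep (for \b), it uses -E with [[:space:]] in place of -P with \s so that it does not depend on PCRE support, and the "Beaded necklace.jpg" line is invented for the test.

```shell
# Sample lines: three from the question plus one made-up non-match.
sample=$(mktemp)
printf '%s\n' \
  'Bead fun.jpg' \
  'Romano-British pendant amulet (bead) (FindID 241983).jpg' \
  'BEAD-air.JPG' \
  'Beaded necklace.jpg' > "$sample"

# Plain \b on both sides (a GNU grep extension).
plain=$(grep -i '\bbead\b' "$sample")

# The same pattern with an explicit whitespace alternative, in ERE syntax.
with_space=$(grep -iE '\bbead(\b|[[:space:]])' "$sample")

# Report whether the two patterns picked the same lines.
if [ "$plain" = "$with_space" ]; then
  echo "both patterns select the same lines"
fi

# Clean up the temporary file.
rm -f "$sample"
```

A \b boundary sits between a word character and a non-word character, so a space after "bead" already satisfies it. If both patterns agree on a small file but not on the real one, the pattern is probably not what is going wrong.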

While you are waiting for grep to be fixed, you could use another tool if one is available to you, e.g.:

perl -lane 'print if (m/\bbead\b/i);' in.txt > out.txt