Regex with exception of particular words


Keywords: regex


Question: 

I have a problem with a regex. I need a regex that excludes a set of specified words, for example: apple, orange, juice. Given these words, it should match everything except the words themselves.

apple (should not match)
applejuice (match)
yummyjuice (match)
yummy-apple-juice (match)
orangeapplejuice (match)
orange (should not match)
juice (should not match)
orange-apple-juice (match)
apple-orange-aple (match)
juice-juice-juice (match)
orange-juice (match)

6 Answers: 

If you really want to do this with a single regular expression, you might find lookaround helpful (especially negative lookahead in this example). The regex below is written for Ruby (some implementations have different syntax for lookarounds). It matches at the start of any string that is not exactly one of the forbidden words:

rx = /^(?!apple$|orange$|juice$)/
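To check the lookahead against a few of the examples from the question, here is a minimal sketch in Python (used only for illustration; the regex above is Ruby, and Python's `re` happens to accept the same `(?!...)` syntax). The helper name `is_allowed` is hypothetical.

```python
import re

# Same idea as the Ruby pattern: the negative lookahead fails only when
# the whole string is exactly one of the forbidden words.
PATTERN = re.compile(r"^(?!apple$|orange$|juice$)")

def is_allowed(text: str) -> bool:
    """Return True unless text is exactly apple, orange, or juice."""
    return PATTERN.search(text) is not None

# A few inputs taken from the question.
for word in ["apple", "applejuice", "orange-apple-juice", "juice"]:
    print(word, "->", "match" if is_allowed(word) else "should not match")
```

Because the pattern is zero-width, it is only useful as a yes/no test; append `.*` if you also need the matched text.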


I noticed that apple-juice should match according to your parameters, but what about apple juice? I'm assuming that if you are validating apple juice you still want it to fail.

So, let's build a set of characters that count as a "boundary":

/[^-a-z0-9A-Z_]/        // Will match any character that is <NOT> - _ or 
                        // between a-z 0-9 A-Z 

/(?:^|[^-a-z0-9A-Z_])/  // Matches the beginning of the string, or one of those 
                        // non-word characters.

/(?:[^-a-z0-9A-Z_]|$)/  // Matches a non-word or the end of string

/(?:^|[^-a-z0-9A-Z_])(apple|orange|juice)(?:[^-a-z0-9A-Z_]|$)/ 
   // This should >match< apple/orange/juice ONLY when it is not directly preceded or
   // followed by a 'word' character (letter, digit, - or _). Negate the result of the
   // test to obtain your desired result.

In most regexp flavors \b is a "word boundary", but the standard set of "word characters" doesn't include -, so you need to build a custom boundary as above. If you didn't need to treat - as part of a word, /\b(apple|orange|juice)\b/ would be enough.
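The difference between the custom boundary and \b shows up on hyphenated input. The following is a rough sketch in Python, not part of the original answer; the names `CUSTOM`, `STANDARD`, and `contains_forbidden` are made up for the example.

```python
import re

WORDS = r"(?:apple|orange|juice)"

# Custom boundary from above: hyphen and underscore count as word characters.
CUSTOM = re.compile(rf"(?:^|[^-a-z0-9A-Z_]){WORDS}(?:[^-a-z0-9A-Z_]|$)")

# Standard \b: the hyphen is NOT a word character here.
STANDARD = re.compile(rf"\b{WORDS}\b")

def contains_forbidden(text: str, pattern: re.Pattern) -> bool:
    """True when the pattern finds a standalone forbidden word in text."""
    return pattern.search(text) is not None

for text in ["apple-juice", "apple juice", "applejuice"]:
    print(text, contains_forbidden(text, CUSTOM), contains_forbidden(text, STANDARD))
```

With the custom boundary, "apple-juice" is treated as one word and passes, while "apple juice" is still flagged; with \b both are flagged.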

If you are only testing single-word inputs, you can go with a much simpler:

/^(apple|orange|juice)$/ // and take the negation of this...
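As a sketch of the "match, then negate" idea, assuming Python and a hypothetical helper `is_valid`: the alternation is built from a word list, and `re.fullmatch` plays the role of the ^ and $ anchors (it is slightly stricter than $, which in Python also tolerates a trailing newline).

```python
import re

FORBIDDEN = ["apple", "orange", "juice"]  # word list from the question

# Roughly /^(apple|orange|juice)$/, built from the list; re.escape guards
# against words that contain regex metacharacters.
EXACT = re.compile("|".join(re.escape(word) for word in FORBIDDEN))

def is_valid(text: str) -> bool:
    """Negate the exact-word test: valid means 'not exactly a forbidden word'."""
    return EXACT.fullmatch(text) is None

print(is_valid("orange"))        # exactly a forbidden word
print(is_valid("orange-juice"))  # contains forbidden words, but is not one
```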


Sounds like you want to treat the hyphen as a word character.



This gets some of the way there:

((?:apple|orange|juice)\S)|(\S(?:apple|orange|juice))|(\S(?:apple|orange|juice)\S)


\A(?!apple\Z|juice\Z|orange\Z).*\Z

will match an entire string unless it only consists of one of the forbidden words.

Alternatively, if you're not using Ruby, or you're sure that your strings contain no line breaks, or you have set the option that stops ^ and $ from matching at the beginnings/ends of lines,

^(?!apple$|juice$|orange$).*$

will also work.
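Why the anchors matter can be seen on a string with a line break. This is a minimal sketch in Python, where `re.MULTILINE` is used to imitate Ruby's line-based ^ and $; note as an aside that Python's \Z matches only at the very end of the string, whereas Ruby's \Z also matches before a final newline. The variable names are illustrative.

```python
import re

# String anchors: the whole input is tested.
string_based = re.compile(r"\A(?!apple\Z|juice\Z|orange\Z).*\Z")

# Line anchors (Ruby's default behaviour, emulated with re.MULTILINE):
# each line is tested separately.
line_based = re.compile(r"^(?!apple$|juice$|orange$).*$", re.MULTILINE)

text = "apple\npie"

# The line-based pattern skips the forbidden first line and matches the second.
print(line_based.search(text))

# The string-based pattern cannot match, because . does not cross the newline.
print(string_based.search(text))
```

For single-line input both patterns give the same answer, which is why the simpler form is fine when line breaks cannot occur.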



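Something like this (PHP):

$input = "The orange apple gave juice";
// Note: the second pattern is unanchored, so it rejects any input that contains
// one of the words; use "/^(apple|orange|juice)$/" to reject only exact matches.
if (preg_match("your regex for validating", $input) && !preg_match("/apple|orange|juice/", $input))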
{
  // it's ok;
}
else
{
  // throw validation error
}