C# regex: extract a string enclosed in single quotes


Keywords: c#


Question: 

I have the following string that I need to parse using a regex.

abc = 'def' and size = '1 x(3\" x 5\")' and (name='Sam O\'neal')

This is an SQL filter, which I'd like to split into tokens using the following separators:

(, ), >, <, =, whitespace, <=, >=, !=

After the string is parsed, I'd like the output to be:

abc,
=,
def,
and,
size,
=,
'1 x(3\" x 5\")',
and,
(,
name,
=,
Sam O\'neal,
),

I've tried the following code:

string pattern = @"(<=|>=|!=|=|>|<|\)|\(|\s+)";
var tokens = new List<string>(Regex.Split(filter, pattern));
tokens.RemoveAll(x => String.IsNullOrWhiteSpace(x));

I'm not sure how to keep a string in single quotes as one token. I'm new to regex and would appreciate any help.


1 Answer: 

Add one more alternative branch to your pattern, one that matches a whole single-quoted literal including escaped characters: '[^'\\]*(?:\\.[^'\\]*)*'.

It will match:

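  • ' - the opening single quote
  • [^'\\]* - zero or more characters other than ' and \
  • (?: - a non-capturing group matching sequences of:
    • \\. - a backslash followed by any character except a newline (an escape sequence such as \')
    • [^'\\]* - zero or more characters other than ' and \
  • )* - the group repeated zero or more times
  • ' - the closing single quote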
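
To see how the pieces above fit together, here is a minimal sketch that runs the quoted-literal branch on its own. The sample input and the variable names are made up for illustration and are not from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The quoted-literal branch by itself, without the operator alternatives.
var quoted = new Regex(@"'[^'\\]*(?:\\.[^'\\]*)*'");

// Illustrative input: in a verbatim string, \' is a backslash followed by a quote
// and \\ is two backslashes.
var sample = @"name='Sam O\'neal' and note='a\\' or x='y'";

// Cast<Match>() keeps this working on older frameworks where
// MatchCollection is not IEnumerable<Match>.
var literals = quoted.Matches(sample).Cast<Match>().Select(m => m.Value).ToList();

foreach (var lit in literals)
    Console.WriteLine(lit);
// prints:
// 'Sam O\'neal'
// 'a\\'
// 'y'
```

The escaped quote in the first literal does not end the match, because the backslash and the character after it are always consumed together by \\. before the closing quote is looked for.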

In C#:

string pattern = @"('[^'\\]*(?:\\.[^'\\]*)*'|<=|>=|!=|=|>|<|\)|\(|\s+)";


C# demo:

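using System;
using System.Linq;
using System.Text.RegularExpressions;
// In a verbatim string, "" is one double quote and \' stays a backslash plus a quote.
var filter = @"abc = 'def' and size = '1 x(3"" x 5"")' and (name='Sam O\'neal')";
var pattern = @"('[^'\\]*(?:\\.[^'\\]*)*'|<=|>=|!=|=|>|<|\)|\(|\s+)";
// The capturing group makes Regex.Split return the delimiters too; blank pieces are dropped.
var tokens = Regex.Split(filter, pattern).Where(x => !string.IsNullOrWhiteSpace(x));
foreach (var tok in tokens)
    Console.WriteLine(tok);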

Output:

abc
=
'def'
and
size
=
'1 x(3" x 5")'
and
(
name
=
'Sam O\'neal'
)
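
The expected output in the question lists some values without their surrounding quotes. If that is what you want, one possible follow-up step is to trim the quotes after splitting. This is only a sketch: the unquote helper is a hypothetical name, the filter is shortened for illustration, and escape sequences such as \' are left untouched.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var filter = @"abc = 'def' and (name='Sam O\'neal')";
var pattern = @"('[^'\\]*(?:\\.[^'\\]*)*'|<=|>=|!=|=|>|<|\)|\(|\s+)";

// Hypothetical helper: drop the enclosing quotes of a quoted token,
// return every other token unchanged.
Func<string, string> unquote = tok =>
    tok.Length >= 2 && tok[0] == '\'' && tok[tok.Length - 1] == '\''
        ? tok.Substring(1, tok.Length - 2)
        : tok;

var tokens = Regex.Split(filter, pattern)
    .Where(x => !string.IsNullOrWhiteSpace(x))
    .Select(tok => unquote(tok))
    .ToList();

Console.WriteLine(string.Join(", ", tokens));
// prints: abc, =, def, and, (, name, =, Sam O\'neal, )
```

Note that once the quotes are gone, a literal such as 'and' can no longer be told apart from the keyword and, so it may be safer to keep the quotes until the tokens have been classified.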