c# - How can I split a string depending on its content?


Keywords:c# 


Question: 

I am trying to parse a string and split it on a set of delimiters, while keeping the delimiters in the result.

For example, from the string if(a>b) write(a); I want to get if,(,a,>,b,),write,(,a,),;

Here is what I've tried:

string pattern = "(" + String.Join("|", delimiters.Select(d =>Regex.Escape(d)).ToList()) + ")";
List<string> result = Regex.Split(line, pattern).ToList();

It works in simple cases, but it fails when the input contains a string literal. For the string if(a>0) write("it is positive"); I do not want to get it,is,positive (split because space is a delimiter), but "it is positive" as a single token. How can I do this?


2 Answers: 

C-style string literals (double-quoted, with backslash escapes) can be matched with a well-known regex:

"[^"\\]*(?:\\.[^"\\]*)*"

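As a quick check of what this pattern accepts, here is a minimal sketch in C#. The class and method names (StringLiteralDemo, FirstLiteral) are hypothetical and only serve the illustration; the pattern itself is the one above, written as a verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringLiteralDemo
{
    // The pattern from the answer as a C# verbatim string ("" stands for one quote).
    // "            opening quote
    // [^"\\]*      any run of characters that are neither a quote nor a backslash
    // (?:\\.[^"\\]*)*   zero or more times: an escaped character, then another plain run
    // "            closing quote
    public const string LiteralPattern = @"""[^""\\]*(?:\\.[^""\\]*)*""";

    // Hypothetical helper: returns the first string literal in the input,
    // or an empty string when there is none.
    public static string FirstLiteral(string input)
    {
        return Regex.Match(input, LiteralPattern).Value;
    }

    public static void Main()
    {
        // Spaces inside the quotes stay part of one match.
        Console.WriteLine(FirstLiteral("write(\"it is positive\");"));
        // prints: "it is positive"

        // Escaped quotes do not end the literal.
        Console.WriteLine(FirstLiteral("write(\"say \\\"hi\\\" now\");"));
        // prints: "say \"hi\" now"
    }
}
```

Because the whole literal is consumed in one match, the space delimiter never gets a chance to split it.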

To incorporate it into your code, you just need to add the regex to the list of delimiters, but you need to place it as the first alternative in the capturing group.

var delimiters = new List<string> { " ", "(", ")", ">", "<", ",", ";" };
var line = "if(a>b) write(\"My new result\")";
// The string-literal pattern is added first so that it is the first alternative.
var escaped_delimiters = new List<string>();
escaped_delimiters.Add(@"""[^""\\]*(?:\\.[^""\\]*)*""");
escaped_delimiters.AddRange(delimiters.Select(d => Regex.Escape(d)));
// The capturing group makes Regex.Split return the delimiters as well.
var pattern = "(" + String.Join("|", escaped_delimiters) + ")";
var result = Regex.Split(line, pattern).Where(x => !String.IsNullOrWhiteSpace(x)).ToList();


The Where(...) call removes the empty strings that Regex.Split produces between adjacent delimiters, as well as the whitespace-only delimiter matches.

The result will be

if,(,a,>,b,),write,(,"My new result",)

 

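I suggest matching the tokens instead of splitting, using the regex below. Each match is a whole double-quoted string, a run of word characters, or a single character that is neither a word character nor whitespace.

@"""[^""]*""|\w+|[^\w\s]"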
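The matching approach can be sketched as follows. This is a minimal sketch, not a definitive implementation: the class and method names (MatchTokenizer, Tokenize) are hypothetical, and the pattern used here takes one token per match (\w+ for a run of word characters, a single character for punctuation), which is an assumption about the intended tokens based on the output asked for in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchTokenizer
{
    // Alternatives are tried left to right at each position:
    //   "[^"]*"    a double-quoted string (no handling of escaped quotes)
    //   \w+        a run of word characters (identifier or number)
    //   [^\w\s]    one character that is neither a word character nor whitespace
    // Whitespace matches none of them, so it is skipped.
    private static readonly Regex TokenRegex = new Regex(@"""[^""]*""|\w+|[^\w\s]");

    // Hypothetical helper: returns every token in the line, in order.
    public static List<string> Tokenize(string line)
    {
        return TokenRegex.Matches(line)
                         .Cast<Match>()
                         .Select(m => m.Value)
                         .ToList();
    }

    public static void Main()
    {
        var tokens = Tokenize("if(a>0) write(\"it is positive\");");
        Console.WriteLine(string.Join(",", tokens));
        // prints: if,(,a,>,0,),write,(,"it is positive",),;
    }
}
```

Compared with splitting, no delimiter list and no filtering of empty entries is needed, but multi-character operators such as >= come out as two tokens unless they are added as an earlier alternative.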