c# - How to split a string on pipe delimiters that are not inside double quotes


Keywords:c# 


Question: 

I have a pipe-separated string like the one below. Some of the values are wrapped in double quotes (e.g. "ANI").

How do I split it on the pipe delimiters that are not inside double quotes?

511186|"ANI"|"ABCD-102091474|E|EFG"||"2013-07-20 13:47:19.556"

The split values should look like this:

511186
"ANI"
"ABCD-102091474|E|EFG"

"2013-07-20 13:47:19.556"

Any help would be appreciated!

EDIT

The answer I accepted does not work for strings that contain double quotes inside a quoted value. Any idea what the issue might be?

using System;
using System.Linq;
using System.Text.RegularExpressions;

string regexFormat = string.Format(@"(?:^|\{0})(""[^""]*""|[^\{0}]*)", '|');
string[] result = Regex.Matches("111001103|\"E\"|\"BBB\"|\"XXX\"|||10000009|153086649|\"BCTV\"|\"REV\"|||1.00000000|||||\"ABC-BT AD\"|\"\"\"ABC - BT\"\" AD\"|||\"N\"||\"N\"|||\"N\"||\"N", regexFormat)
    .Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
foreach (var i in result)
    Console.WriteLine(i);

4 Answers: 

You can use a regular expression to match the items in the string:

string[] result = Regex.Matches(s, @"(?:^|\|)(""[^""]*""|[^|]*)")
  .Cast<Match>()
  .Select(m => m.Groups[1].Value)
  .ToArray();

Explanation:

(?:       A non-capturing group
^|\|      Matches start of string or a pipe character
)         End of group
(         Capturing group
"[^"]*"   Zero or more non-quotes surrounded by quotes
|         Or
[^|]*     Zero or more non-pipes
)         End of group
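
The pattern above treats the first quote it meets after an opening quote as the closing one, so a value such as """ABC - BT"" AD" from the question's edit is cut short. A minimal sketch of a variant follows, assuming that a quote inside a quoted value is always written as two quotes (""). The class name PipeRegexSplitter is made up for illustration, and the delimiter is checked with a lookbehind rather than consumed, so that empty values at the start or end of the string are still reported.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PipeRegexSplitter
{
    // (?<=^|\|)          value starts at the beginning or right after a pipe
    // "(?:[^"]|"")*"     quoted value; "" inside it is an escaped quote (assumption)
    // [^|]*              otherwise: zero or more non-pipes
    private static readonly Regex FieldPattern =
        new Regex(@"(?<=^|\|)(""(?:[^""]|"""")*""|[^|]*)");

    public static string[] Split(string line)
    {
        return FieldPattern.Matches(line)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();
    }
}
```

Values come back exactly as written, including the surrounding quotes and any doubled quotes, which matches the output format asked for in the question.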
 

Here is one way to do it:

public List<string> Parse(string str)
{
    var parts = str.Split(new[] {"|"}, StringSplitOptions.None);

    List<string> result = new List<string>();

    for (int i = 0; i < parts.Length; i++)
    {
        string part = parts[i];

        if (IsPartStart(part))
        {
            List<string> sub_parts = new List<string>();

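            // Collect parts until the one that closes the quote.
            // The bounds check avoids an IndexOutOfRangeException
            // when the closing quote is missing.
            while (!IsPartEnd(part) && i + 1 < parts.Length)
            {
                sub_parts.Add(part);
                part = parts[++i];
            }

            sub_parts.Add(part);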

            part = string.Join("|", sub_parts);
        }

        result.Add(part);
    }

    return result;

}

private bool IsPartStart(string part)
{
    return (part.StartsWith("\"") && !part.EndsWith("\"")) ;
}

private bool IsPartEnd(string part)
{
    return (!part.StartsWith("\"") && part.EndsWith("\""));
}

This works by splitting on every pipe and then rejoining the parts that belong together: it looks for a part that starts with " and joins everything up to the corresponding part that ends with ".

 
inputString.Split('|');

...will give you the individual parts, but will fail if any of the parts have a pipe separator in them.
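
To see the failure concretely, here is a small sketch that applies a plain split to the sample string from the question. The class name NaiveSplitDemo is made up for illustration.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Splits on every pipe, including those inside quoted values.
    public static string[] Split(string line)
    {
        return line.Split('|');
    }

    public static void Main()
    {
        string sample = "511186|\"ANI\"|\"ABCD-102091474|E|EFG\"||\"2013-07-20 13:47:19.556\"";
        foreach (string part in Split(sample))
            Console.WriteLine(part);
    }
}
```

The sample holds five values but comes back as seven parts, because the two pipes inside "ABCD-102091474|E|EFG" are treated as delimiters too.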

If it's a CSV file, following all the usual CSV rules about character-escaping, etc. (but using a pipe symbol instead of comma), then you should look at using CsvHelper, a NuGet package designed for reading and writing CSV files. It does all the hard work, and deals with all the corner cases that you'd otherwise have to do yourself.

 

Here's how I'd do it. It's fairly simple and I think you'll find it's very fast as well. I haven't run any tests, but I'm pretty confident that it's faster than regular expressions.

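IEnumerable<string> Parse(string s)
{
    int pos = 0;

    while (pos < s.Length)
    {
        char endChar = '|';
        bool quoted = s[pos] == '"';

        // Test for quoted value
        if (quoted)
        {
            pos++;
            endChar = '"';
        }

        // Extract this value (quoted values are returned without their quotes)
        int newPos = s.IndexOf(endChar, pos);
        if (newPos < 0)
            newPos = s.Length;
        yield return s.Substring(pos, newPos - pos);

        // Move to start of next value
        pos = newPos + 1;

        // After a closing quote, also skip the delimiter that follows it.
        // An unquoted value has already consumed its delimiter, so skipping
        // again there would drop an empty value (e.g. the middle of "a||b").
        if (quoted && pos < s.Length && s[pos] == '|')
            pos++;
    }
    // Note: a trailing empty value (input ending in '|') is not returned,
    // and doubled quotes ("") inside a quoted value are not handled.
}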
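
The same scanning idea can be extended to the strings from the question's edit, where a quoted value contains quotes of its own. A minimal sketch follows, assuming an embedded quote is always written as two quotes (""). The class name PipeFieldScanner is made up for illustration. Like the method above, it returns values without their surrounding quotes, and it turns each "" back into a single quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PipeFieldScanner
{
    public static List<string> Parse(string s)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];

            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);      // pipes are plain text inside quotes
                }
                else if (i + 1 < s.Length && s[i + 1] == '"')
                {
                    current.Append('"');    // "" inside quotes is one literal quote
                    i++;
                }
                else
                {
                    inQuotes = false;       // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;            // opening quote
            }
            else if (c == '|')
            {
                fields.Add(current.ToString());   // delimiter ends the value
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());     // last value, possibly empty
        return fields;
    }
}
```

Because the final value is always added, an input that ends in a pipe yields a trailing empty value, and an empty input yields a single empty value.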