c# - Split String Into Components


Keywords: c#


Question: 

I am trying to split the following string into 3 parts:

Esmael20170101one => Esmael 20170101 one

What are the options?


3 Answers: 

I suggest matching instead of splitting:

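  using System.Text.RegularExpressions;

  string source = "Esmael20170101one";

  // letters, then digits, then exactly three letters
  var match = Regex.Match(source, 
    @"^(?<name>[A-Za-z]+)(?<code>[0-9]+)(?<suffix>[A-Za-z]{3})$");

  // Groups are empty when the input does not fit the pattern, so check first
  if (!match.Success)
    throw new FormatException("Unexpected input: " + source);

  string name = match.Groups["name"].Value;
  string code = match.Groups["code"].Value;
  string suffix = match.Groups["suffix"].Value;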

If you insist on Regex.Split:

  string[] items = Regex.Split(source, "([0-9]+)");

  string name = items[0];
  string code = items[1];
  string suffix = items[2]; 
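
The Split variant works only because the pattern is wrapped in a capturing group: Regex.Split includes captured text in the returned array, while a pattern without a group treats the digits as a plain separator and drops them. A minimal sketch of the difference follows (the class and method names are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // With a capturing group, the matched digits come back as an array element.
    public static string[] SplitKeepingDigits(string source)
    {
        return Regex.Split(source, "([0-9]+)");
    }

    // Without the group, the digits act as a pure separator and are dropped.
    public static string[] SplitDroppingDigits(string source)
    {
        return Regex.Split(source, "[0-9]+");
    }

    public static void Main()
    {
        // prints Esmael|20170101|one
        Console.WriteLine(string.Join("|", SplitKeepingDigits("Esmael20170101one")));
        // prints Esmael|one
        Console.WriteLine(string.Join("|", SplitDroppingDigits("Esmael20170101one")));
    }
}
```

Note that an input with more than one run of digits produces more than three items, so check the array length before indexing into it.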
 

The regular expression to use is ([a-zA-Z]*)(\d+)([a-zA-Z]*)

string input = "Esmael20170101one";
var match = new Regex("([a-zA-Z]*)(\\d+)([a-zA-Z]*)").Match(input);
if (match.Success) {
    Console.WriteLine(match.Groups[1].ToString());
    Console.WriteLine(match.Groups[2].ToString());
    Console.WriteLine(match.Groups[3].ToString());
}
Console.Read();
 

If you use regex, you can define which parts to capture. For example, the middle component appears to be a date, so why not spell out the date pattern, such as:

^                # Beginning of String
(?<Name>[^\d]+)  # Capture to `Name`
(?<Date>\d{8})   # Capture to `Date`
(?<Note>.+)      # Capture to `Note`
$                # End of string

Because I have commented the pattern, you will need to pass the RegexOptions.IgnorePatternWhitespace option, which tells the parser to ignore the unescaped whitespace and strip out the comments (#).
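
As a rough sketch of how the commented pattern could be wired up in C# (the class name and the SplitParts helper are hypothetical, and returning null on a failed match is just one possible choice):

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedPatternDemo
{
    // The commented pattern, kept in its multi-line form inside a verbatim string.
    private const string Pattern = @"
        ^                # Beginning of string
        (?<Name>[^\d]+)  # Capture to `Name`
        (?<Date>\d{8})   # Capture to `Date`
        (?<Note>.+)      # Capture to `Note`
        $                # End of string";

    // Hypothetical helper: returns { Name, Date, Note }, or null when there is no match.
    public static string[] SplitParts(string input)
    {
        Match match = Regex.Match(input, Pattern, RegexOptions.IgnorePatternWhitespace);
        if (!match.Success)
            return null;

        return new[]
        {
            match.Groups["Name"].Value,
            match.Groups["Date"].Value,
            match.Groups["Note"].Value
        };
    }

    public static void Main()
    {
        string[] parts = SplitParts("Esmael20170101one");
        // prints Esmael 20170101 one
        Console.WriteLine(parts == null ? "no match" : string.Join(" ", parts));
    }
}
```

Without the IgnorePatternWhitespace option the spaces, line breaks and # text would be treated as literal characters, and the pattern would not match the sample input.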

The result is a single match with the following groups:

  • Groups[0] holds the entire match.
  • Groups["Name"] or Groups[1] is the captured name.
  • Groups["Date"] or Groups[2] is the captured date.
  • Groups["Note"] or Groups[3] is the captured note.

As Dmitry pointed out, we need more information. All of these patterns can fail if digits turn up in either of the other groups, depending on where they occur. If you know that all dates fall within the 21st century, adjust my pattern to (?<Date>20\d{6}) to make it more likely that a true date is captured in that field, though it is still not foolproof.
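
One way to tighten this further is to parse the captured digits as a date after matching. The sketch below assumes the eight digits are in yyyyMMdd order, which the question does not confirm; the class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DateCheckDemo
{
    // Hypothetical helper: splits the input and rejects digit runs that are not real dates.
    // Assumption: the date is written as yyyyMMdd.
    public static bool TrySplit(string input, out string name, out DateTime date, out string note)
    {
        name = null;
        note = null;
        date = default(DateTime);

        Match match = Regex.Match(input, @"^(?<Name>[^\d]+)(?<Date>20\d{6})(?<Note>.+)$");
        if (!match.Success)
            return false;

        // The regex only guarantees eight digits starting with 20; this checks month and day.
        if (!DateTime.TryParseExact(match.Groups["Date"].Value, "yyyyMMdd",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out date))
            return false;

        name = match.Groups["Name"].Value;
        note = match.Groups["Note"].Value;
        return true;
    }

    public static void Main()
    {
        string name, note;
        DateTime date;
        // prints True
        Console.WriteLine(TrySplit("Esmael20170101one", out name, out date, out note));
        // prints False, because month 13 is not a valid date
        Console.WriteLine(TrySplit("Esmael20171301one", out name, out date, out note));
    }
}
```

This still cannot tell a date from any other eight digits that happen to form a valid one, so it narrows the problem rather than solving it.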