regex - Split string based on count and words using C#


Keywords:c# 


Question: 

I need to split a string on word boundaries so that each line has at most 25 characters. For example:

string ORIGINAL_TEXT = "Please write a program that breaks this text into small chucks. Each chunk should have a maximum length of 25 ";

The output should be:

"Please write a program",

"that breaks this text",

"into small chucks. Each",

"chunk should have a",

"maximum length of 25"

I tried using Substring, but it breaks words in the middle, for example:

"Please write a program th" - wrong

"Please write a program" - correct

"Please write a program" is only 22 characters, so there is room for 3 more, but using them would split the word "that".

This is what I have so far:

string[] splitSampArr = splitSamp.Split(',', '.', ';');
string[] myText = new string[splitSampArr.Length + 1];

int i = 0;
foreach (string splitSampArrVal in splitSampArr)
{
    if (splitSampArrVal.Length > 25)
    {
        myText[i] = splitSampArrVal.Substring(0, 25);
        i++;
    }
    myText[i] = splitSampArrVal;

    i++;
}

2 Answers: 

You can achieve that with:

@"(\b.{1,25})(?:\s+|$)"


This regex captures into Group 1 a run of 1 to 25 characters that starts at a word boundary (\b), so a chunk never starts in the middle of a word. The dot matches any character except a newline, and the limiting quantifier {1,25} caps the length. The run must be followed by either one or more whitespace characters (\s+) or the end of the string ($). If the 25th character falls inside a word, the engine backtracks until the chunk ends at the end of a word.

See a code demo:

using System;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        var str = "Please write a program that breaks this text into small chucks. Each chunk should have a maximum length of 25 ";
        var chunks = Regex.Matches(str, @"(\b.{1,25})(?:\s+|$)")
                 .Cast<Match>().Select(p => p.Groups[1].Value)
                 .ToList();
        Console.WriteLine(string.Join("\n", chunks));
    }
}
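One detail of the pattern above: the dot also matches spaces, so when the input ends with whitespace (as the sample string does), the last captured chunk keeps that trailing space. Below is a minimal sketch of a reusable helper that builds the same pattern for any limit and trims each capture. The names RegexChunker, Split, and maxLength are hypothetical and not part of the answer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexChunker
{
    // Hypothetical helper (not from the answer): same pattern as above,
    // with the length limit passed in as a parameter.
    public static List<string> Split(string text, int maxLength)
    {
        string pattern = @"(\b.{1," + maxLength + @"})(?:\s+|$)";
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    // TrimEnd removes whitespace captured at the end of a chunk.
                    .Select(m => m.Groups[1].Value.TrimEnd())
                    .ToList();
    }
}
```

Calling RegexChunker.Split(str, 25) on the sample string should give the five chunks listed in the question. Like the original pattern, this sketch assumes that no single word is longer than the limit.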
 
using System;
using System.Text;

namespace ConsoleApplication3
{
    class Program
    {
        static void Main(string[] args)
        {
            var sentence = "Please write a program that breaks this text into small chucks. Each chunk should have a maximum length of 25 ";
            const int maxLength = 25;
            var sb = new StringBuilder();
            int count = 0; // number of characters on the current line
            // RemoveEmptyEntries drops the empty token produced by the trailing space.
            var words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (var word in words)
            {
                if (count > 0)
                {
                    if (count + 1 + word.Length > maxLength)
                    {
                        // The word does not fit on the current line: start a new one.
                        sb.Append(Environment.NewLine);
                        count = 0;
                    }
                    else
                    {
                        // The word fits: separate it from the previous word.
                        sb.Append(' ');
                        count++;
                    }
                }
                sb.Append(word);
                count += word.Length;
            }
            Console.WriteLine(sb.ToString());
            Console.ReadKey();
        }
    }
}
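The loop above writes a single string with line breaks, while the question asks for separate chunks. The same greedy idea can be sketched as a method that returns a list of lines. The names WordWrapper, Wrap, and maxLength are hypothetical, and the handling of overlong words is an assumption, since the question does not cover that case.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WordWrapper
{
    // Hypothetical helper: greedy word wrap that returns one string per line.
    public static List<string> Wrap(string text, int maxLength)
    {
        var lines = new List<string>();
        var current = new StringBuilder();
        // A null separator array makes Split break on any whitespace.
        var words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            // Close the current line if adding " " + word would exceed the limit.
            if (current.Length > 0 && current.Length + 1 + word.Length > maxLength)
            {
                lines.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0)
            {
                current.Append(' ');
            }
            current.Append(word);
        }
        // Flush the last, possibly partial, line.
        if (current.Length > 0)
        {
            lines.Add(current.ToString());
        }
        return lines;
    }
}
```

In this sketch, a word longer than maxLength is placed on a line of its own and therefore exceeds the limit; splitting such words is deliberately left out.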