regex - Capture in split


Keywords:regex 


Question: 



3 Answers: 

If you put the delimiters in the regexp in a capture group, then split will include the delimiters in its result -- it will alternate between words and delimiters. You can then push the unprocessed delimiters and the processed words onto the result array.

my @words = split(/([~,;#&=\.\s\|\(\)\+\-\?\:]+)/, $string);
my @processed_words = ();
foreach (@words) {
    if (/[~,;#&=\.\s\|\(\)\+\-\?\:]/) {    # delimiter, just copy it
        push(@processed_words, $_);
    } else {                                # process the word
        push(@processed_words, process_word($_));
    }
}
 

You need a capture group around your delimiter to keep the delimiters in the result array. Then loop over the array and check whether each index is odd or even. For example, this splits on non-word characters and uppercases the word characters:

echo 'a"b@c%d.e^f$g' | perl -ne '@a=split(/(\W+)/);for($i=0;$i<@a;++$i){ print $i%2 ? $a[$i] : uc $a[$i];}'

(Here $i%2 is true for odd indexes, which hold the delimiters, and false for even indexes, which hold the words.)
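
The same odd/even idea can be written as a small standalone script that puts the string back together afterwards. This is only a sketch: process_word and rebuild are hypothetical names, and uppercasing stands in for whatever processing you actually need. Because the captured delimiters are already in the list, the pieces are joined with an empty string, not a space.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real per-word processing (assumption).
sub process_word {
    my ($word) = @_;
    return uc $word;
}

# Split on runs of non-word characters, process the words, keep the
# delimiters untouched, and rebuild the string.
sub rebuild {
    my ($string) = @_;

    # The capture group makes split return the delimiters as well, so
    # words land on even indexes and delimiters on odd indexes.
    my @parts = split /(\W+)/, $string;

    my @out = map { $_ % 2 ? $parts[$_] : process_word($parts[$_]) } 0 .. $#parts;

    # Empty separator: the original delimiters are already in @out.
    return join '', @out;
}

print rebuild('a"b@c%d.e^f$g'), "\n";    # prints A"B@C%D.E^F$G
```

If the string starts with a delimiter, split returns an empty first element, so the even/odd pattern still holds.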

 

Not quite sure what you mean by "compose $string back after splitting", but maybe something like:

my $composed = join(" ", map { process_word($_) } @words);

...would do the trick?