ruby - How do I preserve my first character after doing a regular expression split?


Keywords:ruby 


Question: 

I'm using Rails 5. I'm trying to split on a regular expression, but the split cuts off the first character of the item that follows the match. I have:

2.4.0 :044 >   tokens = ["12.BILL R. PRESTON"]
 => ["12.BILL R. PRESTON"] 
2.4.0 :045 > tokens = tokens.flat_map { |token| token =~ /\d\.[a-z]/i ? token.split(/\d\.[a-z]/i) : token } 
 => ["1", "ILL R. PRESTON"]

I would expect the outcome to be

["1", "BILL R. PRESTON"]

but the "B" is getting removed. How can I adjust my split expression?


2 Answers: 

Use a lookahead (?=[a-z]) so that the B is not consumed in the split:

tokens.flat_map { |token| token =~ /\d\.[a-z]/i ? token.split(/\d\.(?=[a-z])/i) : token }
=> ["1", "BILL R. PRESTON"]

And if you want to keep both the 2 and the B, you can add a lookbehind (?<=\d):

tokens.flat_map { |token| token =~ /\d\.[a-z]/i ? token.split(/(?<=\d)\.(?=[a-z])/i) : token }
=> ["12", "BILL R. PRESTON"]
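
To see why this works, it may help to compare the three patterns side by side. The sketch below reuses the sample string from the question; the variable names are only illustrative. Whatever the pattern matches is removed by split, while lookaheads and lookbehinds only assert what surrounds the match and remove nothing.

```ruby
# Sample input taken from the question.
token = "12.BILL R. PRESTON"

# The original pattern matches "2.B", so all three characters are removed.
consumed = token.split(/\d\.[a-z]/i)

# The lookahead only checks that a letter follows; the match is "2.".
lookahead_only = token.split(/\d\.(?=[a-z])/i)

# With a lookbehind as well, the match is just ".", so the digit survives too.
both = token.split(/(?<=\d)\.(?=[a-z])/i)

p consumed        # ["1", "ILL R. PRESTON"]
p lookahead_only  # ["1", "BILL R. PRESTON"]
p both            # ["12", "BILL R. PRESTON"]
```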
 
tokens = ["12.BILL R. PRESTON", "238.BETTY Z. BOOP"]

tokens.map { |s| s.split(/(?<=\d)\d+\./) }
  #=> [["1", "BILL R. PRESTON"], ["2", "BETTY Z. BOOP"]] 

If you want to keep all the digits in the first element, remove `\d+` so that only the period is consumed:

tokens.map { |s| s.split(/(?<=\d)\./) }
  #=> [["12", "BILL R. PRESTON"], ["238", "BETTY Z. BOOP"]]
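
One thing to watch for: `/(?<=\d)\./` splits at every period that follows a digit, not just the first. If the rest of the string might contain something like "3.5", passing a limit of 2 to `split` is one possible way to cut only once. This is a minimal sketch, and `split_number_prefix` is a hypothetical helper name, not something from the answers above.

```ruby
# Hypothetical helper: separate a leading number from the rest of the string,
# cutting only at the first period that directly follows a digit.
def split_number_prefix(token)
  # The second argument limits the result to at most two pieces.
  token.split(/(?<=\d)\./, 2)
end

p split_number_prefix("238.BETTY Z. BOOP")    # ["238", "BETTY Z. BOOP"]
p split_number_prefix("12.BILL 3.5 PRESTON")  # ["12", "BILL 3.5 PRESTON"]
p split_number_prefix("NO NUMBER HERE")       # ["NO NUMBER HERE"]
```

Because `split` returns a one-element array when nothing matches, this can be used directly inside `flat_map` without the `=~` check for non-empty strings.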