regex - How to extract portion of a line in ruby?


Keywords:ruby 


Question: 

I have a line say

line = "start running at Sat April 1 07:30:37 2017"

and I want to extract

"Sat April 1 07:30:37 2017"

I tried this...

line = "start running at Sat April 1 07:30:37 2017"
if (line =~ /start running at/)
   line.split("start running at ").last
end

... but is there any other way of doing this?


6 Answers: 

This is a way to extract, from an arbitrary string, a substring that represents a time in the given format. I've assumed there is at most one such substring in the string.

require 'time'

R = /
    (?:#{Date::ABBR_DAYNAMES.join('|')})\s
              # match day name abbreviation in non-capture group, space
    (?:#{Date::MONTHNAMES[1,12].join('|')})\s
              # match month name in non-capture group, space
    \d{1,2}\s # match one or two digits, space
    \d{2}:    # match two digits, colon
    \d{2}:    # match two digits, colon
    \d{2}\s   # match two digits, space
    \d{4}     # match 4 digits
    (?!\d)    # do not match digit (negative lookahead)
    /x        # free-spacing regex def mode
  # /
  #  (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)\s
  #   (?:January|February|March|...|November|December)\s
  # \d{1,2}\s
  # \d{2}:
  # \d{2}:
  # \d{2}\s
  # \d{4}
  # (?!\d)
  # /x 

def extract_time(str)
  s = str[R]
  return nil if s.nil?
  (DateTime.strptime(s, "%a %B %e %H:%M:%S %Y") rescue nil) ? s : nil
end

str = "start eating breakfast at Sat April 1 07:30:37 2017"
extract_time(str)
  #=> "Sat April 1 07:30:37 2017" 

str = "go back to sleep at Cat April 1 07:30:37 2017"
extract_time(str)
  #=> nil

Alternatively, if there is a match against R but DateTime.strptime raises an exception (meaning s is not a valid time for the given format), one could raise an exception to inform the user.
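
A minimal sketch of that alternative, assuming you want nil when nothing resembles a time and an ArgumentError when the text matches the pattern but is not a real time. The names TIME_RE, TIME_FORMAT and extract_time! are made up for this illustration, and the pattern is rebuilt here only so the snippet runs on its own:

```ruby
require 'date'

# Same pattern as R above, written compactly (hypothetical constant name).
TIME_RE = /
  (?:#{Date::ABBR_DAYNAMES.join('|')})\s
  (?:#{Date::MONTHNAMES[1, 12].join('|')})\s
  \d{1,2}\s\d{2}:\d{2}:\d{2}\s\d{4}(?!\d)
/x

TIME_FORMAT = "%a %B %e %H:%M:%S %Y"

# Hypothetical variant of extract_time: returns nil when no substring looks
# like a time, raises when a substring matches but cannot be parsed.
def extract_time!(str)
  s = str[TIME_RE]
  return nil if s.nil?
  begin
    DateTime.strptime(s, TIME_FORMAT)
  rescue ArgumentError # invalid dates raise ArgumentError or a subclass of it
    raise ArgumentError, "#{s.inspect} matches the pattern but is not a valid time"
  end
  s
end

puts extract_time!("start running at Sat April 1 07:30:37 2017")
```

A string such as "Sat April 31 07:30:37 2017" matches the pattern but names a day that does not exist, so it is the kind of input that would reach the raise.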

 

Try:

line.sub(/start running at (.*)/, '\1')
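
One thing to keep in mind with sub, sketched below: when the pattern does not match, sub returns the string unchanged rather than nil. The second sample line is invented for the illustration.

```ruby
pattern = /start running at (.*)/

matching = "start running at Sat April 1 07:30:37 2017"
other    = "stop running at Sat April 1 08:00:00 2017" # made-up non-matching line

puts matching.sub(pattern, '\1') # only the timestamp
puts other.sub(pattern, '\1')    # no match, so the whole line comes back
```

If you need to tell the two cases apart, check for a match first or use one of the approaches that return nil.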
 

The standard way to do this with regular expressions would be:

if md = line.match(/start running at (.*)/)
  md[1]
end

But you don't need regular expressions, you can do regular string operations:

prefix = 'start running at '
if line.start_with?(prefix)
  line[prefix.size..-1]
end
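
If you are on Ruby 2.5 or later (an assumption about your setup), String#delete_prefix expresses the same idea a little more directly. A small sketch; the variable name timestamp is only for illustration:

```ruby
prefix = 'start running at '
line = "start running at Sat April 1 07:30:37 2017"

# delete_prefix returns a copy without the prefix, or an unchanged copy when
# the prefix is absent, so the start_with? guard is kept to get nil instead.
timestamp = line.delete_prefix(prefix) if line.start_with?(prefix)
puts timestamp
```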
 

Here's another (as it turns out, slightly faster) option using #partition:

# returns an empty string if there is no match, whereas split(...).last returns the whole line
line.partition('start running at ').last

I was interested how this performs against regexp match, so here's a quick benchmark with 1 million executions each:

line.sub(/start running at (.*)/, '\1')
# => @real=1.7465

line.partition('start running at ').last
# => @real=0.712406
# => this is faster, but you'd need to be calling this quite a bit for it to make a significant difference
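
For anyone who wants to repeat the comparison, here is one way it could be set up with the standard benchmark library. This is a sketch, not the exact script behind the numbers above; the iteration count is reduced so it finishes quickly, and timings will differ by machine and Ruby version.

```ruby
require 'benchmark'

line = "start running at Sat April 1 07:30:37 2017"
n = 100_000 # the text above used 1 million; raise this to compare like for like

sub_time = Benchmark.realtime do
  n.times { line.sub(/start running at (.*)/, '\1') }
end

partition_time = Benchmark.realtime do
  n.times { line.partition('start running at ').last }
end

puts format("sub:       %.4fs", sub_time)
puts format("partition: %.4fs", partition_time)
```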

Bonus: it also makes it easy to cater for a more general case, e.g. if some lines start with "start running at" and others with "stop running at". Then something like line.partition(' at ').last will cater for both (and actually run slightly faster).
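
A quick sketch of that general case, with a made-up "stop" line added for illustration. Note that partition splits at the first occurrence of the separator, so this assumes ' at ' does not appear earlier in the line.

```ruby
lines = [
  "start running at Sat April 1 07:30:37 2017",
  "stop running at Sat April 1 08:15:02 2017", # invented sample line
]

# partition returns [before, separator, after]; last is the text after ' at '.
timestamps = lines.map { |l| l.partition(' at ').last }
puts timestamps
```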

 

And yet another alternative:

puts $1 if line =~ /start running at (.*)/
 

The shortest would be line["Sat April 1 07:30:37 2017"], which returns the "Sat April 1 07:30:37 2017" string if it is present and nil if not. The [] notation on a String is shorthand for extracting a substring and accepts either another string or a regular expression.

If the date string is not known in advance, you can use the same shorthand with a regular expression, as Cary suggested:

line[/start running at (.*)/, 1]
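
To make the different forms of [] concrete, here is a small sketch of what each one returns for the sample line (the non-matching arguments are invented for the illustration):

```ruby
line = "start running at Sat April 1 07:30:37 2017"

p line["Sat April 1 07:30:37 2017"]  # the substring itself, since it is present
p line["Sun April 2"]                # nil, since it is absent
p line[/start running at (.*)/]      # the whole match, prefix included
p line[/start running at (.*)/, 1]   # only the first capture group
p line[/stop running at (.*)/, 1]    # nil, since the pattern does not match
```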

If you want to be sure the extracted date is valid, you would need the regular expression from his answer, but you could still use this method with it.