regex - Why am I getting Match Error?


Keywords: regex


Question: 

I have a text file of SQL insert statements like:

insert into songlist (id, artist, title, numone) values (6606, 'TIMI YURO', 'HURT', 0);
insert into songlist (id, artist, title, numone) values (6607, 'TIMI YURO', 'WHAT*S A MATTER BABY', 0);
insert into songlist (id, artist, title, numone) values (6608, 'TIMI YURO', 'MAKE THE WORLD GO AWAY', 0);
insert into songlist (id, artist, title, numone) values (6609, 'HELMUT ZACHARIAS', 'WHEN THE WHITE LILACS BLOOM AGAIN', 0);
insert into songlist (id, artist, title, numone) values (6610, 'JOHN *THE COOL GHOUL* ZACHERLE', 'DINNER WITH DRAC', 0);
insert into songlist (id, artist, title, numone) values (6611, 'MICHAEL ZAGER BAND', 'LET*S ALL CHANT', 0);
insert into songlist (id, artist, title, numone) values (6612, 'ZAGER AND EVANS', 'IN THE YEAR 2525 (EXORDIUM AND TERMINUS)', 1);
insert into songlist (id, artist, title, numone) values (6613, 'RICKY ZAHND / BLUEJEANERS', 'NUTTIN* FOR CHRISTMAS', 0);
insert into songlist (id, artist, title, numone) values (6614, 'WARREN ZEVON', 'WEREWOLVES OF LONDON', 0);
insert into songlist (id, artist, title, numone) values (6615, 'ZOMBIES', 'SHE*S NOT THERE', 0);

I'm reading them in the following manner:

val dt_split = bufferedsr.getLines.mkString.split(Pattern.quote("insert into songlist (id, artist, title, numone)"))    


val dt_pt = raw"values \((\d+), '(.*)', '(.*)', (\d+)\);".r

val tmp =  dt_split.map( elem => elem.mkString match {
    case dt_pt (id,artist,title,numone) => (id.toInt, artist, title, numone.toInt) 
  } )

Error: scala.MatchError: (of class java.lang.String)

Note that val dt_split = bufferedsr.getLines.mkString.split(Pattern.quote("insert into songlist (id, artist, title, numone)")).toList gives

 values (6606, 'TIMI YURO', 'HURT', 0);
 values (6607, 'TIMI YURO', 'WHAT*S A MATTER BABY', 0);
 values (6608, 'TIMI YURO', 'MAKE THE WORLD GO AWAY', 0);
 values (6609, 'HELMUT ZACHARIAS', 'WHEN THE WHITE LILACS BLOOM AGAIN', 0);
 values (6610, 'JOHN *THE COOL GHOUL* ZACHERLE', 'DINNER WITH DRAC', 0);
 values (6611, 'MICHAEL ZAGER BAND', 'LET*S ALL CHANT', 0);
 values (6612, 'ZAGER AND EVANS', 'IN THE YEAR 2525 (EXORDIUM AND TERMINUS)', 1);
 values (6613, 'RICKY ZAHND / BLUEJEANERS', 'NUTTIN* FOR CHRISTMAS', 0);
 values (6614, 'WARREN ZEVON', 'WEREWOLVES OF LONDON', 0);
 values (6615, 'ZOMBIES', 'SHE*S NOT THERE', 0); 

What am I missing?


2 Answers: 

Not sure what you are trying to do, but you can extract the matches you need from your file with the following code:

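import scala.io.Source

val source = Source.fromFile("your_file_path")
val linesIterator = source.getLines

// The leading .* absorbs the "insert into songlist (...)" part, so each whole line can match
val regexPattern = raw".* values \((\d+), '(.*)', '(.*)', (\d+)\);".r

// Lines that don't match (e.g. blank lines) become None and are dropped instead of throwing MatchError
val tupleIterator = linesIterator.flatMap(line => line match {
  case regexPattern(id, artist, title, numone) => Some((id, artist, title, numone))
  case _ => None
})

// Read everything into a List before closing the file, since the iterator is lazy
val tupleList = tupleIterator.toList
source.close()

tupleList.foreach(println)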
// (6606,TIMI YURO,HURT,0)
// (6607,TIMI YURO,WHAT*S A MATTER BABY,0)
// (6608,TIMI YURO,MAKE THE WORLD GO AWAY,0)
// (6609,HELMUT ZACHARIAS,WHEN THE WHITE LILACS BLOOM AGAIN,0)
// (6610,JOHN *THE COOL GHOUL* ZACHERLE,DINNER WITH DRAC,0)
// (6611,MICHAEL ZAGER BAND,LET*S ALL CHANT,0)
// (6612,ZAGER AND EVANS,IN THE YEAR 2525 (EXORDIUM AND TERMINUS),1)
// (6613,RICKY ZAHND / BLUEJEANERS,NUTTIN* FOR CHRISTMAS,0)
// (6614,WARREN ZEVON,WEREWOLVES OF LONDON,0)
// (6615,ZOMBIES,SHE*S NOT THERE,0)
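If you want the same typed tuples the question builds (Int for id and numone) rather than strings, one option is collect, which applies the pattern only to lines it matches and skips the rest. This is a minimal sketch: the helper name parseSongLines and the sample lines are illustrative assumptions, and with a real file you would pass it Source.fromFile("your_file_path").getLines() instead.

```scala
import scala.util.matching.Regex

// Same pattern as above: the leading .* absorbs the "insert into songlist (...)" part
val linePattern: Regex = raw".* values \((\d+), '(.*)', '(.*)', (\d+)\);".r

// Hypothetical helper: keeps only matching lines and converts the numeric fields to Int.
// collect applies the partial function only where it is defined, so other lines are skipped.
def parseSongLines(lines: Iterator[String]): List[(Int, String, String, Int)] =
  lines.collect {
    case linePattern(id, artist, title, numone) => (id.toInt, artist, title, numone.toInt)
  }.toList

// Made-up input: two statements from the question plus a blank line
val sample = Iterator(
  "insert into songlist (id, artist, title, numone) values (6606, 'TIMI YURO', 'HURT', 0);",
  "",
  "insert into songlist (id, artist, title, numone) values (6612, 'ZAGER AND EVANS', 'IN THE YEAR 2525 (EXORDIUM AND TERMINUS)', 1);"
)

val songs = parseSongLines(sample)
```

collect does the same job as the flatMap/Option pair above; which one to use is mostly a matter of taste.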
 

The main reason for this error is that the text you are splitting starts with the pattern, so the first result will be an empty string:

scala> "abcd values".split(Pattern.quote("abcd"))
res1: Array[String] = Array("", " values")
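To see this on the question's own data, here is a minimal sketch that rebuilds two statements the way getLines.mkString would (no separator between lines), splits them as the question does, and checks each piece against the original pattern without throwing. The two-statement string is a stand-in for the real file, and fullyMatches is a hypothetical helper.

```scala
import java.util.regex.Pattern

// Two statements glued together, as getLines.mkString produces them
val text =
  "insert into songlist (id, artist, title, numone) values (6606, 'TIMI YURO', 'HURT', 0);" +
  "insert into songlist (id, artist, title, numone) values (6607, 'TIMI YURO', 'WHAT*S A MATTER BABY', 0);"

val pieces = text.split(Pattern.quote("insert into songlist (id, artist, title, numone)"))

val dt_pt = raw"values \((\d+), '(.*)', '(.*)', (\d+)\);".r

// A regex used as an extractor in a case must match the whole string;
// the default case lets us test each piece without a MatchError
def fullyMatches(s: String): Boolean = s match {
  case dt_pt(_, _, _, _) => true
  case _                 => false
}

val matchFlags = pieces.map(fullyMatches).toList
```

Every piece fails: the first is the empty string, and the others still start with a space before values.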

A better approach is to process the file line by line and use stripPrefix on each line instead:

bufferedsr.getLines.map(_.stripPrefix("insert into songlist (id, artist, title, numone)"))

This results in an Iterator, but you can convert it to a Seq if you want.
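Working per line also avoids a side effect of getLines.mkString: it joins the lines with no separator, so the statements run into each other. The sketch below contrasts the two on a hypothetical two-line input read with Source.fromString (standing in for the real file), and it also relies on stripPrefix leaving a line unchanged when the prefix is absent.

```scala
import scala.io.Source

val prefix = "insert into songlist (id, artist, title, numone)"

// Hypothetical two-line input standing in for the real file
val fileText =
  "insert into songlist (id, artist, title, numone) values (6606, 'TIMI YURO', 'HURT', 0);\n" +
  "insert into songlist (id, artist, title, numone) values (6614, 'WARREN ZEVON', 'WEREWOLVES OF LONDON', 0);"

// mkString with no separator glues the lines together
val joined = Source.fromString(fileText).getLines().mkString

// Line by line, each statement stays a separate element
val perLine = Source.fromString(fileText).getLines().map(_.stripPrefix(prefix)).toList

// stripPrefix returns the string unchanged if it does not start with the prefix
val untouched = "values (1, 'A', 'B', 0);".stripPrefix(prefix)
```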


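Another problem is the space between your split pattern and the regex: every piece still starts with " values", while your regex starts at values, and a regex used as an extractor in a case has to match the entire string. You can include this space in the prefix you strip: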

bufferedsr.getLines.map(_.stripPrefix("insert into songlist (id, artist, title, numone) "))
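As a sketch of why that single space matters: a plain Regex used in a case is anchored, so a leading space is enough to make it fail. Stripping the space, as above, is one fix; another, shown here only as an alternative and not what this answer uses, is Regex.unanchored, which only needs to find the pattern somewhere in the string. The sample string comes from the question's data; the helper names are hypothetical.

```scala
val dt_pt = raw"values \((\d+), '(.*)', '(.*)', (\d+)\);".r

val withSpace    = " values (6614, 'WARREN ZEVON', 'WEREWOLVES OF LONDON', 0);"
val withoutSpace = withSpace.stripPrefix(" ")

// Anchored: the whole string must match the pattern
def anchoredId(s: String): Option[Int] = s match {
  case dt_pt(id, _, _, _) => Some(id.toInt)
  case _                  => None
}

// Unanchored: the extractor only needs to find the pattern inside the string
val dt_pt_loose = dt_pt.unanchored

def unanchoredId(s: String): Option[Int] = s match {
  case dt_pt_loose(id, _, _, _) => Some(id.toInt)
  case _                        => None
}
```

The trade-off is that the unanchored version also accepts lines with unexpected text around the statement, so stripping the prefix keeps the check stricter.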

Also, the source file may contain empty lines (the last line in particular), so you may also need to filter them out of dt_split.
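One caveat, assuming the file might also contain lines made only of spaces (the question does not show any): filter(_.nonEmpty) keeps those, and they would still hit the MatchError. Trimming before the check drops them too, as this small sketch with made-up lines shows.

```scala
// Made-up lines: a real statement, an empty line and a whitespace-only line
val lines = List(
  "values (6615, 'ZOMBIES', 'SHE*S NOT THERE', 0);",
  "",
  "   "
)

// nonEmpty drops "" but keeps "   ", which would still fail the regex
val keptByNonEmpty = lines.filter(_.nonEmpty)

// Trimming first drops whitespace-only lines as well
val keptByTrim = lines.filter(_.trim.nonEmpty)
```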


The full implementation can look like this:

val dt_split = bufferedsr
  .getLines
  .map(_.stripPrefix("insert into songlist (id, artist, title, numone) "))
  .filter(_.nonEmpty)
  .toSeq

val dt_pt = raw"values \((\d+), '(.*)', '(.*)', (\d+)\);".r

// elem is already a String, so calling mkString on it is unnecessary
val tmp = dt_split.map(elem => elem match {
    case dt_pt(id, artist, title, numone) => (id.toInt, artist, title, numone.toInt)
  })
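For a self-contained version that runs as-is, here is a sketch that substitutes Source.fromString for the question's bufferedsr (the sample input, including the deliberately bad last line, is made up) and adds a default case so an unexpected line is reported as a Left instead of ending the run with a scala.MatchError. Returning Either is a design choice of this sketch, not part of the answer above.

```scala
import scala.io.Source

// Hypothetical stand-in for the question's bufferedsr; in practice this would come from Source.fromFile
val bufferedsr = Source.fromString(
  "insert into songlist (id, artist, title, numone) values (6606, 'TIMI YURO', 'HURT', 0);\n" +
  "\n" +
  "insert into songlist (id, artist, title, numone) values (6611, 'MICHAEL ZAGER BAND', 'LET*S ALL CHANT', 0);\n" +
  "garbage line"
)

// toList reads all lines eagerly, so the source can be closed right after
val dt_split = bufferedsr
  .getLines()
  .map(_.stripPrefix("insert into songlist (id, artist, title, numone) "))
  .filter(_.nonEmpty)
  .toList
bufferedsr.close()

val dt_pt = raw"values \((\d+), '(.*)', '(.*)', (\d+)\);".r

// The default case turns an unexpected line into a Left carrying that line
val parsed: List[Either[String, (Int, String, String, Int)]] = dt_split.map {
  case dt_pt(id, artist, title, numone) => Right((id.toInt, artist, title, numone.toInt))
  case other                            => Left(other)
}

val songs    = parsed.collect { case Right(song) => song }
val badLines = parsed.collect { case Left(line) => line }
```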