python - Split line on comma but not comma within quotes?


Keywords:python 


Question: 

I have an input file whose head looks like this:

AdditionalCookout.create!([
  {day_id: 275, cookout_id: 71, description: "Sample text, that, is ,driving , me, crazy"},
  {day_id: 275, cookout_id: 87, description: nil},
  {day_id: 276, cookout_id: 71, description: nil},
  {day_id: 276, cookout_id: 87, description: nil},
  {day_id: 277, cookout_id: 92, description: nil},
  {day_id: 277, cookout_id: 71, description: nil},

I am trying to parse each line into its own object. However, I can't split on commas because some of the descriptions have commas within them.

I tried these two regex lines from the Stack Overflow posts I could find:

re.split(r', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))', content[x])

And:

[y.strip() for y in content[x].split(''',(?=(?:[^'"]|'[^']*'|"[^"]*")*$)''')]

However, they both output:

['{day_id: 275', 'cookout_id: 71, description: "Feeling ambitious? If you really want to exhaust yourself today, consider adding some additional stationary cardio."},']

Turns into:
day_id: 275
cookout_id: 71, description: "Feeling ambitious? If you really want to exhaust yourself today, consider adding some additional stationary cardio.",

Any ideas how I can fix this so it correctly splits each line into three separate sections instead of just two? Thanks


1 Answer: 

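Try using PyYAML to parse it. It worked for me on your example, and then you can avoid the regex headache. Strip the trailing comma from each line first, and note that PyYAML reads a bare nil as the string 'nil', not None.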

import yaml
yaml.safe_load('{day_id: 275, cookout_id: 71, description: "Sample text, that, is,driving , me, crazy"}')
{'cookout_id': 71,
 'day_id': 275,
 'description': 'Sample text, that, is,driving , me, crazy'}
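
If you would rather stay with the standard library, the regex route can work too. One thing to watch for: str.split treats its argument as a literal string, so a pattern has to go through re.split. Below is a minimal sketch, assuming the double quotes on each line are balanced and the descriptions contain no escaped quotes. The names COMMA_OUTSIDE_QUOTES, parse_value and split_fields are made up for illustration, not part of any library.

```python
import re

# Match a comma only when an even number of double quotes follows it,
# i.e. when the comma sits outside any quoted string.
# Assumption: quotes are balanced and never escaped inside a description.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def parse_value(raw):
    """Convert a raw field value to None, str or int (hypothetical helper)."""
    raw = raw.strip()
    if raw == 'nil':
        return None
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]
    if raw.isdigit():
        return int(raw)
    return raw


def split_fields(line):
    """Parse one '{key: value, ...},' line into a dict (hypothetical helper)."""
    # Drop surrounding whitespace, the trailing comma, then the braces.
    body = line.strip().rstrip(',').strip()
    if body.startswith('{') and body.endswith('}'):
        body = body[1:-1]
    record = {}
    for field in COMMA_OUTSIDE_QUOTES.split(body):
        # Split on the first colon only, so colons inside a description survive.
        key, _, value = field.partition(':')
        record[key.strip()] = parse_value(value)
    return record


line = '  {day_id: 275, cookout_id: 71, description: "Sample text, that, is ,driving , me, crazy"},'
print(split_fields(line))
```

The lookahead is what keeps the commas inside the description from being treated as separators, so each data line yields three fields. The first line of the file (AdditionalCookout.create!([) is not a record and would need to be skipped before calling split_fields.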