python - Splitting a string with multiple delimiters and conditions


Keywords:python 


Question: 

I have a string from a log file that uses multiple delimiters to separate its fields.

Full string: field1.field2.field3/field4/field5|field6|field7//|field8..

Delimited by ".": field1.field2.field3

Delimited by "/": /field4/field5

Delimited by "|": |field6|field7//|field8.. (in this portion of the string, "/" and "." are not delimiters)

Currently, I parse it like this:

x
Out[64]: 'field1.field2.field3/field4/field5|field6|field7//|field8..'

y= x.split("|")
y
Out[66]: ['field1.field2.field3/field4/field5', 'field6', 'field7//', 'field8..']

z = y[0].split("/")
z
Out[68]: ['field1.field2.field3', 'field4', 'field5']

i = z[0].split(".")
i
Out[70]: ['field1', 'field2', 'field3']


result = i+z[1:]+y[1:]
result
Out[79]: 
['field1',
 'field2',
 'field3',
 'field4',
 'field5',
 'field6',
 'field7//',
 'field8..']

I think this is a very inefficient way of parsing. I would appreciate suggestions to make it better.

I cannot simply split on all three delimiters (".", "/" and "|") unconditionally, because "/" and "." are not delimiters in the "|"-delimited portion of the string.


1 Answer: 

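Use re.split (after import re). A plain character class splits on every ".", "/" and "|", so it also breaks up field7// and field8.. and leaves empty strings in the result:

re.split(r'[./|]', x)

To keep those fields intact, use

re.split(r'\b[./]\b|\|', x)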
  • \b[./]\b matches every dot or forward slash that is both preceded and followed by a word character.

  • | is alternation (OR).

  • \| matches every pipe character.

  • re.split splits the string at each match.
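To make the difference between the two patterns concrete, here is a small comparison on the string from the question. The variable names naive and bounded are only for illustration.

```python
import re

x = "field1.field2.field3/field4/field5|field6|field7//|field8.."

# Plain character class: every '.', '/' and '|' is a split point,
# so the trailing '//' and '..' are consumed and leave empty strings.
naive = re.split(r"[./|]", x)
print(naive)

# Word-boundary version: '.' and '/' split only when they sit
# between two word characters; '|' always splits.
bounded = re.split(r"\b[./]\b|\|", x)
print(bounded)
```

Note that the word-boundary pattern would still split on a "." or "/" that sits between word characters after a pipe (for example "c.d" in "a|c.d"), so it fits this sample but does not by itself enforce the "only before the first pipe" rule.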

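Or use lookarounds, so that a delimiter matches only when it is not adjacent to another copy of itself:

>>> import re
>>> s = "field1.field2.field3/field4/field5|field6|field7//|field8.."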
>>> re.split(r'(?<!\.)\.(?!\.)|(?<!\/)\/(?!\/)|(?<!\|)\|(?!\|)', s)
['field1', 'field2', 'field3', 'field4', 'field5', 'field6', 'field7//', 'field8..']
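If the rule is really positional (that is, "." and "/" are delimiters only before the first "|"), it may be simpler to encode that directly instead of relying on what the characters around each delimiter look like. The sketch below is one possible way to do it; the function name split_log_fields is hypothetical, and it assumes that everything after the first "|" is split on "|" alone.

```python
import re


def split_log_fields(line):
    """Split on '.' and '/' before the first '|', and on '|' only after it."""
    # partition returns (head, separator, tail); separator is '' if no '|' exists.
    head, sep, tail = line.partition("|")
    fields = re.split(r"[./]", head)
    if sep:
        # In the tail, '.' and '/' are ordinary characters.
        fields.extend(tail.split("|"))
    return fields


s = "field1.field2.field3/field4/field5|field6|field7//|field8.."
print(split_log_fields(s))
```

Unlike the regex-only versions, this keeps a value such as "c.d/e" intact when it appears after the first pipe.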