python - pandas splitting letter and number mix column based on pattern


Keywords:python 


Question: 

I have a sample database with one column:

import pandas as pd
d = {

 'question#': ['a1.2','a10','a10.1','b11.1a','k20.3d','b20c']
}
df = pd.DataFrame(d)

it looks like this:

Out[8]: 
question#
0       a1.2
1       a10
2       a10.1
3       b11.1a
4       k20.3d
5       b20c

there isn't any way to sort number and letter mixed column correctly, so i thought the only way to do that is to first split the column into 3 columns:

first column: a letter: (a-z), the string always starts with one letter

second column: two possible outcomes:

  1. single digit or multiple digits: (1-9)+

    OR

  2. digits + '.' + digits: (1-9)+(/.)(1-9)+

third column: a letter or nothing: (a-z)?

so for the sample database i want it to split into the following columns, DESIRED OUTPUT:

Out[8]: 
question#  firstcol   secondcol    thirdcol
0             a         1.2
1             a         10
2             a         10.1
3             b         11.1           a
4             k         20.3           d
5             b         20             c

is the syntax something like this page? i am not sure how exactly to write the regex syntax:

  df['firstcol'] = df['question#'].str.extract(not sure the syntax, expand=True)
  df['secondcol'] = df['question#'].str.extract(not sure the syntax, expand=True)
  df['thirdcol'] = df['question#'].str.extract(not sure the syntax, expand=True)

1 Answer: 

Try

df[['firstcol', 'secondcol', 'thirdcol']] = df['question#'].str.extract('([A-Za-z]+)(\d+\.?\d*)([A-Za-z]*)', expand = True)


    question#   firstcol    secondcol   thirdcol
0   a1.2        a           1.2 
1   a10         a           10  
2   a10.1       a           10.1    
3   b11.1a      b           11.1        a
4   k20.3d      k           20.3        d
5   b20c        b           20          c