
Extracting Multiple Parameters from a String using Regex or Pandas

I’m working with the following DataFrame column:

0                                                      NaN
1        {u'bphigh': u'120', u'bplow': u'70', u'weight'...
2                                                      NaN
3        {u'bphigh': 120, u'bplow': 60, u'weight': u'10...
4                                                      NaN
                               ...                        
13149                                                  NaN
13150    {u'bphigh': u'110', u'bplow': u'60', u'weight'...
13151    {u'bphigh': u'149', u'bplow': u'90', u'weight'...
13152    {u'bphigh': u'113', u'bplow': u'69', u'weight'...
13153    {u'bphigh': u'115', u'bplow': u'76', u'weight'...

Each non-null entry is a string (type str) holding the parameters bphigh, bplow and weight, as follows:

{u'bphigh': u'120', u'bplow': u'70', u'weight': u'84.8'}

I’d like to extract these parameters and their corresponding values into columns, as shown below:

   bphigh  bplow  weight
0      11     22      31
1      42     52      61
2      72     82      91

I tried the following pandas method, which hasn’t been consistent in extracting the parameters:

vitals['vital'].str.extract(r"{u'bphigh':\s*(\w+)")

Is there a more efficient way to do this in pandas or with regex?

Please advise.


Answer

import pandas as pd
from ast import literal_eval

Try:

df['vital'] = df['vital'].astype(str).map(lambda x: literal_eval(x) if x != 'nan' else float('NaN'))

# The line above converts the string values into actual dictionaries.
# map() iterates over the values of the 'vital' column, and the lambda
# parses each string with literal_eval(). The if/else inside the lambda
# leaves missing values (which astype(str) renders as 'nan') as NaN.

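Finally:

out = pd.DataFrame(df['vital'].dropna().tolist())[['bphigh', 'bplow', 'weight']]

# Here we build a DataFrame from the dictionaries in the 'vital' column:
# dropna() discards the missing rows, tolist() turns the remaining values
# into a list of dictionaries, and the final indexing keeps only the
# three columns ['bphigh', 'bplow', 'weight'].
# Note that out gets a fresh 0..n-1 index, since tolist() drops the original one.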
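
Putting the two steps together, here is a minimal self-contained sketch. The sample rows are made up to mirror the question's data (the weight in the last row is a guess, since the question's output is truncated). Passing index= and applying pd.to_numeric are optional additions, not part of the two lines above: they keep the original row labels and turn the mixed str/int values into numbers.

```python
from ast import literal_eval

import pandas as pd

# Made-up sample mirroring the question: NaN rows mixed with dict-like strings.
df = pd.DataFrame({
    "vital": [
        float("nan"),
        "{u'bphigh': u'120', u'bplow': u'70', u'weight': u'84.8'}",
        float("nan"),
        "{u'bphigh': 120, u'bplow': 60, u'weight': u'100'}",
    ]
})

# Step 1: parse every non-missing string into a real dict.
parsed = df["vital"].dropna().map(literal_eval)

# Step 2: one column per key, keeping the original row labels (1 and 3 here).
out = pd.DataFrame(parsed.tolist(), index=parsed.index)[["bphigh", "bplow", "weight"]]

# Optional: the values arrive as a mix of str and int, so coerce them to numbers.
out = out.apply(pd.to_numeric, errors="coerce")

print(out)
```

Keeping the original index makes it easy to attach the result back to the source frame with df.join(out) if needed.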

Now if you print out, you will get your desired output.
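
If you would rather stay with str.extract, as in the question, a regex can work too, provided it tolerates the optional u' prefix in front of each value. The sketch below is an alternative, not the method used above. The sample rows are made up, and the pattern assumes every value is a plain non-negative number.

```python
import pandas as pd

# Made-up sample mirroring the question's column.
vitals = pd.DataFrame({
    "vital": [
        float("nan"),
        "{u'bphigh': u'120', u'bplow': u'70', u'weight': u'84.8'}",
        "{u'bphigh': 120, u'bplow': 60, u'weight': u'100'}",
    ]
})

keys = ["bphigh", "bplow", "weight"]

# For each key: match 'key':, optional whitespace, an optional u' prefix,
# then capture the number into a group named after the key.
parts = [
    vitals["vital"].str.extract(rf"'{key}':\s*(?:u')?(?P<{key}>\d+(?:\.\d+)?)")
    for key in keys
]

# Join the single-column results side by side and convert the text to numbers.
out = pd.concat(parts, axis=1).apply(pd.to_numeric, errors="coerce")

print(out)
```

Unlike the literal_eval approach, this keeps the NaN rows in place, so out lines up row for row with the original column. It is more fragile, though: any value the pattern does not anticipate silently becomes NaN.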

User contributions licensed under: CC BY-SA