How do I separate a measurement value and its unit into their respective columns when they appear together in a DataFrame?

I have a DataFrame that contains measurements such as weight and height. Sometimes the value column contains the unit together with the value, and I would like to separate the two when that happens. For example, in the DataFrame df below, the height value and unit of the first two entries are already in their respective columns, but the value column of the third entry contains both the value and the unit. In that case, I would like to move the unit ("cm") from the value column to the unit column.

measurement name  value  unit
height            160.0  cm
height            1.5    m
height            155cm

The output DataFrame should look like this:

measurement name  value  unit
height            160.0  cm
height            1.5    m
height            155.0  cm

How can I efficiently separate values and units into their respective columns of a DataFrame in Python?

Answer

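Use Series.str.extract with a regex that captures the numeric value (digits, optionally containing a .) from the start of the string (^), skips an optional whitespace separator (\s*), and captures the non-numeric characters at the end of the string (\D+ followed by $). Then pass the result to DataFrame.update, which replaces only the rows where values were extracted: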

df.update(df['value'].str.extract(r'^(?P<value>\d+\.*\d*)\s*(?P<unit>\D+)$'))
print(df)
  measurement name  value unit
0           height  160.0   cm
1           height    1.5    m
2           height    155   cm
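
Note that the extracted value is still the string '155', not the 155.0 asked for in the question. A minimal end-to-end sketch follows. It assumes the sample frame is built as shown here (reconstructed from the question, with a missing unit in the third row), and it adds a pd.to_numeric step that is not part of the answer above:

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about how df was built).
df = pd.DataFrame({
    "measurement name": ["height", "height", "height"],
    "value": [160.0, 1.5, "155cm"],
    "unit": ["cm", "m", None],
})

# Named groups become the columns 'value' and 'unit'.
# Non-string entries (160.0, 1.5) yield NaN, so update() leaves those rows alone.
parts = df["value"].str.extract(r"^(?P<value>\d+\.*\d*)\s*(?P<unit>\D+)$")
df.update(parts)

# Extra step, not in the answer: convert the mixed column to floats.
df["value"] = pd.to_numeric(df["value"])

print(df)
```

After the conversion the value column has dtype float64, so the third row holds 155.0 rather than the string '155'.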
User contributions licensed under: CC BY-SA