I have a DataFrame with multiple columns:
A    B    C
a1   b1   r: 200
          l: 350
          t:600
a2   b2
a3   b3   r: 400
          t: 500
Now I want to split column C (whose entries within each cell are separated by \n) into multiple columns, like this:
A    B    r     l     t
a1   b1   200   350   600
a2   b2
a3   b3   400         500
I tried a few techniques but couldn't make any of them work. I tried df.apply, but I couldn't handle the missing (NA) values in the new columns. Is there a clean way to do this?
Thanks.
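For reference, the apply route mentioned in the question can work if each cell is parsed into a dict rather than into columns directly. The sketch below assumes the sample data mirrors the table above, and `parse_cell` is a hypothetical helper name. It applies the parser to column C with `Series.apply` and lets the `DataFrame` constructor align the keys, so keys missing from a row become NaN without any manual NA handling.

```python
import pandas as pd

# Sample data shaped like the question's table (values assumed).
df = pd.DataFrame({
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3"],
    "C": ["r: 200\nl: 350\nt:600", "", "r: 400\nt: 500"],
})


def parse_cell(cell):
    """Hypothetical helper: 'r: 200\nl: 350' -> {'r': 200.0, 'l': 350.0}; empty cells -> {}."""
    pairs = {}
    for line in str(cell).splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            pairs[key.strip()] = float(value)  # float() ignores surrounding spaces
    return pairs


# One dict per row; the constructor turns keys into columns and fills
# keys that a row lacks with NaN.
parsed = pd.DataFrame(df["C"].apply(parse_cell).tolist(), index=df.index)
result = df.drop(columns="C").join(parsed)
print(result)
```

Row a2 comes out as NaN in r, l and t, and row a3 has NaN only in l.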
Answer
A solution using a regex with str.split:
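import pandas as pd
df = pd.DataFrame(columns=['A', 'B', 'C'], data=[['a1', 'b1', 'r: 200\n l: 350\n t:600'], ['a2', 'b2', ''], ['a3', 'b3', 'r:400\n t:500']])
# The capturing groups make split keep each key (r/l/t) and its number in the result
splitted = df.C.str.split(r'([rlt]):\s?(\d+)\n?\s?')
# Drop the empty strings that re.split leaves around each match
filtered = splitted.apply(lambda lst: list(filter(None, lst)))
# Keys sit at even positions, values at odd ones; a row with no pairs becomes an all-NaN row
numerical_values = filtered.apply(lambda lst: pd.Series(index=lst[0::2], data=lst[1::2], dtype=float))
print(df.join(numerical_values))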
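An alternative that is not part of the answer above is `str.extractall` with named groups, which avoids filtering the split output. The sketch assumes the same sample data as the question. `extractall` returns one row per match, indexed by (original row, match number); unstacking the keys gives the wide layout, and joining on the original index brings back rows with no matches as NaN.

```python
import pandas as pd

# Sample data shaped like the question's table (values assumed).
df = pd.DataFrame({
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3"],
    "C": ["r: 200\nl: 350\nt:600", "", "r: 400\nt: 500"],
})

# One row per "key: number" match, with columns 'key' and 'value'.
pairs = df["C"].str.extractall(r"(?P<key>[rlt]):\s*(?P<value>\d+)")

# Move 'key' into the index next to the original row label, then pivot it into columns.
wide = pairs.droplevel("match").set_index("key", append=True)["value"].unstack("key")

# unstack sorts the keys; restore first-appearance order (r, l, t).
wide = wide[pairs["key"].unique()].astype(float)

# A left join on the original index keeps A, B and rows with no matches (as NaN).
result = df.drop(columns="C").join(wide)
print(result)
```

The trade-off is that the regex must describe the whole "key: value" pair, whereas the split-based answer also has to discard the separators between matches.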