I have a DataFrame with multiple columns:
A   B   C
a1  b1  r: 200\nl: 350\nt:600
a2  b2
a3  b3  r: 400\nt: 500
Now I want to break column C (whose entries are separated by \n) into multiple columns, like this:
A   B   r    l    t
a1  b1  200  350  600
a2  b2
a3  b3  400       500
I tried a few techniques but couldn't get it to work. I tried df.apply, but I wasn't able to handle the rows where some of the columns are missing (NA). Is there a clean way to achieve this?
Thanks.
Answer
A solution using a regex with str.split:
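import pandas as pd

df = pd.DataFrame(
    columns=['A', 'B', 'C'],
    data=[['a1', 'b1', 'r: 200\n l: 350\n t:600'],
          ['a2', 'b2', ''],
          ['a3', 'b3', 'r:400\n t:500']])

# Splitting on a pattern with capture groups keeps the captured key and value
splitted = df.C.str.split(r'([rlt]):\s?(\d+)\n?\s?')
# Drop the empty strings left between matches
filtered = splitted.apply(lambda lst: list(filter(None, lst)))
# Even positions are keys, odd positions are values
numerical_values = filtered.apply(
    lambda lst: pd.Series(index=lst[0::2], data=lst[1::2], dtype=float))
df.join(numerical_values)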
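If you would rather not filter the split lists by hand, here is an alternative sketch (not part of the answer above) that pulls out every key/value pair with str.extractall and pivots the pairs into columns. It assumes the keys are only ever r, l and t, that each key appears at most once per row, and that the values are non-negative integers. The names pairs, wide and result, and the group names key and value, are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    columns=["A", "B", "C"],
    data=[
        ["a1", "b1", "r: 200\n l: 350\n t:600"],
        ["a2", "b2", ""],
        ["a3", "b3", "r:400\n t:500"],
    ],
)

# One output row per "key: value" match; the index is (original row, match number).
# Rows of df with no match (the empty string here) simply do not appear.
pairs = df["C"].str.extractall(r"(?P<key>[rlt])\s*:\s*(?P<value>\d+)")

wide = (
    pairs.rename_axis(index=["row", "match"])  # name the index levels explicitly
    .reset_index()
    .pivot(index="row", columns="key", values="value")  # assumes no repeated key per row
    .astype(float)
    # Restore rows that had no matches and fix the column order
    .reindex(index=df.index, columns=["r", "l", "t"])
)

result = df.drop(columns="C").join(wide)
print(result)
```

Missing keys come out as NaN, so the row for a2 should be empty across r, l and t. If a key can repeat within one cell, pivot will raise an error and you would need to decide how to aggregate the duplicates first.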