
String column to multiple columns in DataFrame

I have a DataFrame with multiple columns:

A     B     C
a1    b1    r: 200
            l: 350
            t: 600
a2    b2
a3    b3    r: 400
            t: 500

Now I want to split column C (where the entries in each cell are separated by \n) into multiple columns, like this:

A   B   r    l    t
a1  b1  200  350  600
a2  b2
a3  b3  400       500

I tried a few techniques but couldn't make it work. I tried df.apply but wasn't able to handle the NA columns. Is there a clean way to achieve this?

Thanks.


Answer

A solution using a regex with str.split:

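import pandas as pd
df = pd.DataFrame(columns=['A', 'B', 'C'], data=[['a1', 'b1', 'r: 200\n l: 350\n t:600'], ['a2', 'b2', ''], ['a3', 'b3', 'r:400\n t:500']])
# The capture groups make split() keep each key and its number in the result
splitted = df.C.str.split(r'([rlt]):\s?(\d+)\n?\s?')
# Drop the empty strings left between matches
filtered = splitted.apply(lambda lst: list(filter(None, lst)))
# Keys become the index and numbers the values, so each row expands into columns
numerical_values = filtered.apply(lambda lst: pd.Series(index=lst[0::2], data=lst[1::2], dtype=float))
df.join(numerical_values)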
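
As an alternative, here is a minimal sketch of the same idea using str.extractall followed by a pivot, which avoids building one Series per row. It is not the approach above, just another way to get there. It assumes the keys are only r, l and t and that the values are non-negative integers; the names pairs, wide and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['A', 'B', 'C'],
    data=[
        ['a1', 'b1', 'r: 200\n l: 350\n t:600'],
        ['a2', 'b2', ''],
        ['a3', 'b3', 'r:400\n t:500'],
    ],
)

# One row per "key: number" match, indexed by (original row, match number).
pairs = df['C'].str.extractall(r'(?P<key>[rlt]):\s*(?P<value>\d+)')
pairs['value'] = pairs['value'].astype(float)

# Drop the match counter, then pivot the keys into columns.
wide = (
    pairs.reset_index(level='match', drop=True)
         .set_index('key', append=True)['value']
         .unstack('key')
         .reindex(columns=['r', 'l', 't'])  # fix the column order
)

# Rows with no matches (here a2/b2) end up as NaN after the join.
result = df.drop(columns='C').join(wide)
print(result)
```

Dropping C before the join matches the layout asked for in the question; leave out the drop if the original strings should be kept.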
User contributions licensed under: CC BY-SA