I have a DataFrame with multiple columns:
A   B   C
a1  b1  r: 200
        l: 350
        t: 600
a2  b2
a3  b3  r: 400
        t: 500
Now I want to split column C (whose entries are separated by \n) into multiple columns, like this:
A   B   r    l    t
a1  b1  200  350  600
a2  b2
a3  b3  400       500
I tried a few techniques but couldn't make it work. I tried to use df.apply but was not able to handle the NA columns. Is there a clean way to achieve this?
Thanks.
Answer
A solution using a regex with str.split:
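import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'], data=[['a1', 'b1', 'r: 200\n l: 350\n t:600'], ['a2', 'b2', ''], ['a3', 'b3', 'r:400\n t:500']])
# Split on each "key: value" pair; the capture groups keep the key and the number in the result
splitted = df.C.str.split(r'([r,l,t]{1}?):\s?(\d+)\n?\s?')
# Drop the empty strings left over between matches
filtered = splitted.apply(lambda lst: list(filter(None, lst)))
# Keys (even positions) become the index, numbers (odd positions) the values
numerical_values = filtered.apply(lambda lst: pd.Series(index=lst[0::2], data=lst[1::2], dtype=float))
df.join(numerical_values)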
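As a possible alternative to str.split, the same key/value pairs can be pulled out with str.extractall and then reshaped. This is only a sketch under some assumptions: the keys are limited to r, l and t, the values are non-negative integers, and the group names key and value are just labels chosen for this example. It also drops column C from the result, which the question's desired output suggests.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['A', 'B', 'C'],
    data=[['a1', 'b1', 'r: 200\n l: 350\n t:600'],
          ['a2', 'b2', ''],
          ['a3', 'b3', 'r:400\n t:500']],
)

# One row per "key: value" match, indexed by (original row, match number).
# The group names "key" and "value" are illustrative, not required by pandas.
pairs = df['C'].str.extractall(r'(?P<key>[rlt]):\s*(?P<value>\d+)')
pairs['value'] = pairs['value'].astype(float)

# Move the keys into columns; rows of df without any match (a2) are absent here.
wide = (pairs.set_index('key', append=True)['value']
             .droplevel('match')
             .unstack('key'))

# Joining on the original index brings a2 back with NaN in every new column.
result = df.drop(columns='C').join(wide[['r', 'l', 't']])
print(result)
```

Because the join is on the original index, rows with an empty C need no special handling and simply end up as NaN.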