I have a dataframe in the following structure:
JavaScript
x
4
1
cNames | cValues | number
2
[a,b,c] | [1,2,3] | 10
3
[a,b,d] | [55,66,77]| 20
4
I would like to transpose – create columns from the names in cNames.
But I can’t manage to achieve this with transpose because I want a column for each value in the list.
The needed output:
JavaScript
1
4
1
a | b | c | d | number
2
1 | 2 | 3 | NaN | 10
3
55 | 66 | NaN | 77 | 20
4
How can I achieve this result?
Thanks!
The code to create the DF:
JavaScript
1
4
1
d = {'cNames': [['a','b','c'], ['a','b','d']], 'cValues': [[1,2,3],
2
[55,66,77]], 'number': [10,20]}
3
df = pd.DataFrame(data=d)
4
Advertisement
Answer
One option is concat
:
JavaScript
1
5
1
pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx)
2
for idx, x in df.iterrows()],
3
axis=1
4
).T.join(df.iloc[:,2:])
5
Or a DataFrame construction:
JavaScript
1
4
1
pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']) )
2
for idx, x in df.iterrows()
3
}).T.join(df.iloc[:,2:])
4
Output:
JavaScript
1
4
1
a b c d number
2
0 1.0 2.0 3.0 NaN 10
3
1 55.0 66.0 NaN 77.0 20
4
Update Performances sort by run time on sample data
DataFrame
JavaScript
1
6
1
%%timeit
2
pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']) )
3
for idx, x in df.iterrows()
4
}).T.join(df.iloc[:,2:])
5
1.29 ms ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
6
concat:
JavaScript
1
7
1
%%timeit
2
pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx)
3
for idx, x in df.iterrows()],
4
axis=1
5
).T.join(df.iloc[:,2:])
6
2.03 ms ± 86.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7
KJDII’s new series
JavaScript
1
6
1
%%timeit
2
df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
3
pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)
4
5
2.09 ms ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6
Scott’s apply(pd.Series.explode)
JavaScript
1
9
1
%%timeit
2
df.apply(pd.Series.explode)
3
.set_index(['number', 'cNames'], append=True)['cValues']
4
.unstack()
5
.reset_index()
6
.drop('level_0', axis=1)
7
8
4.9 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9
wwnde’s set_index.apply(explode)
JavaScript
1
7
1
%%timeit
2
g=df.set_index('number').apply(lambda x: x.explode()).reset_index()
3
g['cValues']=g['cValues'].astype(int)
4
pd.pivot_table(g, index=["number"],values=["cValues"],columns=["cNames"]).droplevel(0, axis=1).reset_index()
5
6
7.27 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7
Celius’ double explode
JavaScript
1
7
1
%%timeit
2
df1 = df.explode('cNames').explode('cValues')
3
df1['cValues'] = pd.to_numeric(df1['cValues'])
4
df1.pivot_table(columns='cNames',index='number',values='cValues')
5
6
9.42 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7