Here is a simple DataFrame:
JavaScript
x
11
11
1
> df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
2
'b': ['optional1', None, 'optional3'],
3
'c': ['c1', 'c2', 'c3'],
4
'd': [1, 2, 3]})
5
> df
6
7
a b c d
8
0 a1 optional1 c1 1
9
1 a2 None c2 2
10
2 a3 optional3 c3 3
11
Pivot method 1
The data can be pivoted to this:
JavaScript
1
7
1
> df.pivot_table(index=['a','b'], columns='c')
2
d
3
c c1 c3
4
a b
5
a1 optional1 1.0 NaN
6
a3 optional3 NaN 3.0
7
Downside: data in the 2nd row is lost because df['b'][1] == None
.
Pivot method 2
JavaScript
1
8
1
> df.pivot_table(index=['a'], columns='c')
2
d
3
c c1 c2 c3
4
a
5
a1 1.0 NaN NaN
6
a2 NaN 2.0 NaN
7
a3 NaN NaN 3.0
8
Downside: column b
is lost.
How can the two methods be combined so that columns b
and the 2nd row are kept like so:
JavaScript
1
7
1
d
2
c c1 c2 c3
3
a b
4
a1 optional1 1.0 NaN NaN
5
a2 None NaN 2.0 NaN
6
a3 optional3 NaN NaN 3.0
7
More generally: How can information from a row be retained during pivoting if a key has NaN
value?
Advertisement
Answer
Use set_index
and unstack
to perform the pivot:
JavaScript
1
2
1
df = df.set_index(['a', 'b', 'c']).unstack('c')
2
This is essentially what pandas does under the hood for pivot
. The stack
and unstack
methods are closely related to pivot
, and can generally be used to perform pivot-like operations that don’t quite conform with the built-in pivot functions.
The resulting output:
JavaScript
1
7
1
d
2
c c1 c2 c3
3
a b
4
a1 optional1 1.0 NaN NaN
5
a2 NaN NaN 2.0 NaN
6
a3 optional3 NaN NaN 3.0
7