Here is a simple DataFrame:

> df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
'b': ['optional1', None, 'optional3'],
'c': ['c1', 'c2', 'c3'],
'd': [1, 2, 3]})
> df
a b c d
0 a1 optional1 c1 1
1 a2 None c2 2
2 a3 optional3 c3 3

### Pivot method 1

The data can be pivoted to this:

> df.pivot_table(index=['a','b'], columns='c')
d
c c1 c3
a b
a1 optional1 1.0 NaN
a3 optional3 NaN 3.0

Downside: data in the 2nd row is lost because `df['b'][1] == None`

.

### Pivot method 2

> df.pivot_table(index=['a'], columns='c')
d
c c1 c2 c3
a
a1 1.0 NaN NaN
a2 NaN 2.0 NaN
a3 NaN NaN 3.0

Downside: column `b`

is lost.

How can the two methods be combined so that columns `b`

and the 2nd row are kept like so:

d
c c1 c2 c3
a b
a1 optional1 1.0 NaN NaN
a2 None NaN 2.0 NaN
a3 optional3 NaN NaN 3.0

**More generally**: How can information from a row be retained during pivoting if a key has `NaN`

value?

## Answer

Use `set_index`

and `unstack`

to perform the pivot:

df = df.set_index(['a', 'b', 'c']).unstack('c')

This is essentially what pandas does under the hood for `pivot`

. The `stack`

and `unstack`

methods are closely related to `pivot`

, and can generally be used to perform pivot-like operations that don’t quite conform with the built-in pivot functions.

The resulting output:

d
c c1 c2 c3
a b
a1 optional1 1.0 NaN NaN
a2 NaN NaN 2.0 NaN
a3 optional3 NaN NaN 3.0

## Recent Comments