I have a pandas dataframe with a categorical series that has missing categories.
In the example shown below, group has the categories "a", "b", and "c", but there are no cases of "c" in the dataframe.
    import pandas as pd
    dfr = pd.DataFrame({
        "id": ["111", "222", "111", "333"],
        "group": ["a", "a", "b", "b"],
        "value": [1, 4, 9, 16]})
    dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
    dfr.pivot(index="id", columns="group")
The resulting pivoted dataframe has columns a and b. I expected a c column containing only missing values as well.
          value
    group     a     b
    id
    111     1.0   9.0
    222     4.0   NaN
    333     NaN  16.0
How can I pivot a dataframe on a categorical series to include columns with all categories, regardless of whether they were present in the original dataframe?
Answer
pd.pivot_table has a dropna argument that controls whether columns whose entries are all NaN are dropped. Try setting it to False:
    import pandas as pd
    dfr = pd.DataFrame({
        "id": ["111", "222", "111", "333"],
        "group": ["a", "a", "b", "b"],
        "value": [1, 4, 9, 16]})
    dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
    pd.pivot_table(dfr, index="id", columns="group", dropna=False)
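Note that pd.pivot_table aggregates duplicate id/group pairs (with the mean by default), whereas DataFrame.pivot raises an error on duplicates. If you would rather keep using pivot, one possible alternative is to reindex the pivoted columns against the declared categories. This is only a sketch and not part of the original answer: the name wide is made up for illustration, and passing values="value" is an assumption made here so that the result has a flat column index rather than a (value, group) MultiIndex.

```python
import pandas as pd

dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"],
    "group": ["a", "a", "b", "b"],
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])

# Pivot as in the question; values="value" (an assumption for this sketch)
# keeps the columns flat: one column per observed group.
wide = dfr.pivot(index="id", columns="group", values="value")

# Reindex the columns against every declared category. Categories that
# never occur in the data ("c" here) become columns filled with NaN.
wide = wide.reindex(columns=list(dfr["group"].cat.categories))

print(wide)
```

Because the missing columns are added explicitly, this approach does not rely on how the pivot step treats unobserved categories.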