I have a pandas dataframe with a categorical series that has missing categories. In the example shown below, group has the categories "a", "b", and "c", but there are no cases of "c" in the dataframe.
    import pandas as pd

    dfr = pd.DataFrame({
        "id": ["111", "222", "111", "333"],
        "group": ["a", "a", "b", "b"],
        "value": [1, 4, 9, 16]})
    dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
    dfr.pivot(index="id", columns="group")
The resulting pivoted dataframe has columns a and b. I expected a c column containing all missing values as well.
          value
    group     a     b
    id
    111     1.0   9.0
    222     4.0   NaN
    333     NaN  16.0
How can I pivot a dataframe on a categorical series to include columns with all categories, regardless of whether they were present in the original dataframe?
Answer
pd.pivot_table has a dropna argument that controls whether columns whose entries are all NaN are dropped. Try setting it to False:
    import pandas as pd

    dfr = pd.DataFrame({
        "id": ["111", "222", "111", "333"],
        "group": ["a", "a", "b", "b"],
        "value": [1, 4, 9, 16]})
    dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
    pd.pivot_table(dfr, index="id", columns="group", dropna=False)
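If you would rather keep using pivot, another option is to reindex the pivoted columns against the full list of categories. This is only a minimal sketch and not the approach described above: it assumes each id/group pair occurs at most once (which pivot requires anyway), it passes values="value" so the result has a single column level, and the name wide is purely illustrative.

```python
import pandas as pd

dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"],
    "group": ["a", "a", "b", "b"],
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])

# Pivot as in the question; only the categories that actually occur
# ("a" and "b") show up as columns at this point.
wide = dfr.pivot(index="id", columns="group", values="value")

# Reindex the columns against every declared category, so the unused
# category "c" is added as a column filled with NaN.
# ("wide" is a hypothetical name used only for this illustration.)
wide = wide.reindex(columns=list(dfr["group"].cat.categories))

print(wide)
```

Because reindex only adds or reorders labels, the existing values are left untouched and no aggregation takes place, unlike pivot_table, which aggregates the values in each cell.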