I have a pandas dataframe with a categorical series that has missing categories.
In the example shown below, group has the categories "a", "b", and "c", but there are no cases of "c" in the dataframe.
import pandas as pd
dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"],
    "group": ["a", "a", "b", "b"],
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
dfr.pivot(index="id", columns="group")
The resulting pivoted dataframe has columns a and b. I expected a c column as well, containing only missing values.
      value
group     a     b
id
111     1.0   9.0
222     4.0   NaN
333     NaN  16.0
How can I pivot a dataframe on a categorical series to include columns with all categories, regardless of whether they were present in the original dataframe?
Answer
pd.pivot_table has a dropna argument that controls whether columns whose entries are all NaN are dropped. It defaults to True, so set it to False:
import pandas as pd
dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"],
    "group": ["a", "a", "b", "b"],
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
pd.pivot_table(dfr, index="id", columns="group", dropna=False)
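A slightly more explicit variant of the same idea is sketched below. Two things here are my own additions rather than part of the answer above: passing values="value" so the result has a single level of column labels, and passing observed=False, on the assumption that your pandas version may not default to including unobserved categories. Note also that pivot_table aggregates duplicate (id, group) pairs (by mean unless aggfunc says otherwise); in this data each pair occurs once, so the values are unchanged.

```python
import pandas as pd

dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"],
    "group": ["a", "a", "b", "b"],
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])

# observed=False asks the underlying groupby to include categories that
# never occur in the data; dropna=False keeps the resulting all-NaN column.
# Both are passed explicitly here as an assumption about version defaults.
wide = pd.pivot_table(
    dfr,
    index="id",
    columns="group",
    values="value",
    dropna=False,
    observed=False,
)

# The unused category "c" should now appear as a column of missing values.
print(wide)
```

If you need the result of dfr.pivot specifically (for example, because you want an error on duplicate entries rather than aggregation), this variant is not a drop-in replacement.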