In pandas, how to pivot a dataframe on a categorical series with missing categories?

I have a pandas dataframe with a categorical series that has missing categories.

In the example shown below, group has the categories "a", "b", and "c", but there are no cases of "c" in the dataframe.

import pandas as pd
dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"], 
    "group": ["a", "a", "b", "b"], 
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
dfr.pivot(index="id", columns="group")

The resulting pivoted dataframe has columns a and b. I expected a c column as well, containing only missing values.

      value      
group     a     b
id               
111     1.0   9.0
222     4.0   NaN
333     NaN  16.0

How can I pivot a dataframe on a categorical series to include columns with all categories, regardless of whether they were present in the original dataframe?

Answer

pd.pivot_table has a dropna argument that controls whether columns whose entries are all NaN are dropped (the default is True).

Try setting it to False:

import pandas as pd
dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"], 
    "group": ["a", "a", "b", "b"], 
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
pd.pivot_table(dfr, index="id", columns="group", dropna=False)
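
If you would rather keep using pivot, one possible alternative is to reindex the pivoted columns against the full category list. This is only a sketch, not part of the original answer: the names wide and all_groups are made up for illustration, and values="value" is passed so that the columns are a flat index rather than the two-level one shown in the question.

```python
import pandas as pd

dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"],
    "group": ["a", "a", "b", "b"],
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])

# Pivot as in the question, selecting "value" so the columns are a flat index.
wide = dfr.pivot(index="id", columns="group", values="value")

# Reindex the columns against every declared category; categories that never
# occur in the data ("c" here) are added as columns filled with NaN.
all_groups = list(dfr["group"].cat.categories)  # illustrative name
wide = wide.reindex(columns=all_groups)

print(wide)
```

Because the category list is read from the series itself, the same two lines should work unchanged if more categories are declared later.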