Pandas pivot_table aggfunc ignores categories if more than one line of data is being aggregated

I am trying to aggregate a dataframe using pandas.pivot_table and find that it behaves differently when multiple rows are aggregated on a categorical series.

Code from this issue helps explain (though the issue is different from mine).

Setting up a dataframe with a categorical column:

JavaScript

If I pivot the dataframe with

JavaScript

I get a dataframe with every category as a column; the stations that had no data in the dataframe appear as columns filled with 0s, which is what I want:

JavaScript
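For readers following along, here is a minimal sketch of this first step. The column names `Day`, `Station` and `Count`, the station labels and the values are all assumptions for illustration, not the original data. `observed=False` is passed explicitly because the default for that parameter differs between pandas versions.

```python
import pandas as pd

# Hypothetical data: station "C" is a declared category but has no rows.
df1 = pd.DataFrame({
    "Day": ["Mon", "Tue"],
    "Station": pd.Categorical(["A", "B"], categories=["A", "B", "C"]),
    "Count": [1, 2],
})

# observed=False asks pandas to keep categories that have no rows;
# fill_value=0 fills the cells that have no data.
pivot1 = df1.pivot_table(index="Day", columns="Station", values="Count",
                         aggfunc="sum", fill_value=0, observed=False)
print(pivot1)
```

In this sketch station `C` shows up as a column of zeros even though no row mentions it.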

However, if I add some rows with repeated values:

JavaScript

… and pivot

JavaScript

Now the stations not represented in df3 do not appear in the pivot:

JavaScript
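A self-contained sketch that reproduces this behaviour, again with made-up columns (`Day`, `Station`, `Count`) and values. It assumes the extra rows arrive in a second frame, `df2`, whose `Station` column holds plain strings, and that `df3` is the concatenation of the two.

```python
import pandas as pd

# Hypothetical first frame: Station is categorical, "C" has no rows.
df1 = pd.DataFrame({
    "Day": ["Mon", "Tue"],
    "Station": pd.Categorical(["A", "B"], categories=["A", "B", "C"]),
    "Count": [1, 2],
})

# Hypothetical extra rows with repeated values; Station is plain strings here.
df2 = pd.DataFrame({"Day": ["Mon", "Mon"], "Station": ["A", "A"], "Count": [3, 4]})

df3 = pd.concat([df1, df2], ignore_index=True)

# Mixing a categorical column with a non-categorical one drops the categorical dtype.
print(df3["Station"].dtype)

pivot3 = df3.pivot_table(index="Day", columns="Station", values="Count",
                         aggfunc="sum", fill_value=0, observed=False)
print(pivot3)
```

Under these assumptions the `C` column is gone because pandas no longer knows about the unused category.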

I can add the missing categories by iterating over the categories and adding a column of 0s for each one not in the pivot table, but surely this should be possible with pandas itself?

I hope that is clear; this is my first question. Thank you!

Answer

JavaScript

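This is because df2.Station is not a categorical yet, so after the concat the combined Station column is no longer categorical and the unused categories are lost. You must apply the same transformation to df2 as you did to df1 for the pivot to work.

Adding this line before your concat should resolve the problem: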

JavaScript
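One possible way to write that conversion is sketched below. It is not necessarily the exact line from the original answer: it reuses the dtype of `df1.Station` via `astype`, and the data and column names (`Day`, `Station`, `Count`) are hypothetical.

```python
import pandas as pd

# Hypothetical frames: df1.Station is categorical, df2.Station is plain strings.
df1 = pd.DataFrame({
    "Day": ["Mon", "Tue"],
    "Station": pd.Categorical(["A", "B"], categories=["A", "B", "C"]),
    "Count": [1, 2],
})
df2 = pd.DataFrame({"Day": ["Mon", "Mon"], "Station": ["A", "A"], "Count": [3, 4]})

# Give df2.Station the same categorical dtype as df1.Station before concatenating.
df2["Station"] = df2["Station"].astype(df1["Station"].dtype)

# With identical categorical dtypes on both sides, concat keeps the categories.
df3 = pd.concat([df1, df2], ignore_index=True)

pivot3 = df3.pivot_table(index="Day", columns="Station", values="Count",
                         aggfunc="sum", fill_value=0, observed=False)
print(pivot3)
```

With both columns sharing one categorical dtype, the station with no data (`C` in this sketch) is kept as a zero-filled column.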
User contributions licensed under: CC BY-SA