Aggregating Pandas DF – Losing Data

I’m trying to aggregate a pandas DataFrame the way an Excel pivot table would. I have one quantitative variable called “Count”. I would like rows with identical qualitative variables to be combined and their “Count” values summed.

However, when I try to do this with the code below, I see that I am somehow losing data. Any idea why this might be happening and how I can fix it?

I expect the number of rows to decrease, but the total sum of the “Count” column shouldn’t change.

Answer

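You have NaNs in the columns you are grouping by. By default, groupby drops every row whose grouping key contains NaN, so the “Count” values on those rows never reach the sum. To keep them, either fill the NaNs in the grouping columns before grouping or pass dropna=False to groupby (available in pandas 1.1 and later).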
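Since the original code was only posted as an image, here is a minimal sketch of the effect using made-up data. The column names "Region" and "Product" and all values are assumptions for illustration only; "Count" is the only name taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two qualitative columns (assumed names) plus "Count".
df = pd.DataFrame({
    "Region": ["East", "East", "West", np.nan],
    "Product": ["A", "A", np.nan, "B"],
    "Count": [1, 2, 3, 4],
})

keys = ["Region", "Product"]

# Default behaviour: rows with NaN in any grouping key are silently dropped.
default_sum = df.groupby(keys)["Count"].sum()

# Option 1 (pandas >= 1.1): keep NaN keys as their own groups.
kept_sum = df.groupby(keys, dropna=False)["Count"].sum()

# Option 2: replace NaN keys with a placeholder label before grouping.
filled = df.copy()
filled[keys] = filled[keys].fillna("Unknown")
filled_sum = filled.groupby(keys)["Count"].sum()

# Compare the grand total with the total after each aggregation.
print(df["Count"].sum(), default_sum.sum(), kept_sum.sum(), filled_sum.sum())
```

With this data the default groupby total is 3 instead of 10, because the two rows with a NaN key are left out; both alternatives preserve the full total of 10. Filling with a placeholder such as "Unknown" also works on pandas versions older than 1.1.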

User contributions licensed under: CC BY-SA