Pandas sum of count per percentile of rows

Here is a link to a working example on Google Colaboratory.

I have a dataset that represents the reviews (between 0.0 and 10.0) that users have left on various books. It looks like this:

JavaScript

The first rows have 1 review while the last ones have thousands. I want to see the distribution of the reviews across the user population. I researched percentiles and binning in Pandas and found pd.qcut and pd.cut, but with those I was unable to get the output in the format I want.

This is what I’m looking to get.

JavaScript

I could not figure out a “Pandas” way to do it so I wrote a loop to generate the data in that format myself and graph it.

JavaScript

How can I achieve the same outcome more directly with the library?

Answer

I think you can use qcut to create the slices and pass them to a groupby followed by sum. Here it is with your sample data, slightly modified to avoid duplicated bin edges on such a small sample (I replaced all the ones in count with 1, 2, 3, 4, 5):

JavaScript

This gives the same result as your method.
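As a rough sketch of the qcut plus groupby idea, here is a self-contained version. The frame below is made up for illustration (the `user_id` column and all the values are assumptions, not your real data); only the `count` column name comes from the question.

```python
import pandas as pd

# Hypothetical sample: one row per user, sorted by number of reviews.
# All counts are distinct so that qcut does not hit duplicated bin edges.
df = pd.DataFrame({
    "user_id": range(1, 11),
    "count": [1, 2, 3, 4, 5, 8, 13, 40, 250, 1200],
})

# Split the counts into 5 quantile bins (20% of the rows each) and
# label every bin with the upper percentile it covers.
bins = pd.qcut(df["count"], q=5, labels=[20, 40, 60, 80, 100])

# Sum the review counts that fall inside each bin.
per_bin = df.groupby(bins, observed=False)["count"].sum()
print(per_bin)
```

With this sample each bin holds two users, so `per_bin` is the total number of reviews written by each 20% slice of the user population. Changing `q` and `labels` together gives finer slices.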

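In case your real data has too many rows with a count of 1, so that qcut ends up with duplicated bin edges, you could also use np.array_split to cut the sorted rows into equal-sized chunks: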

JavaScript
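For reference, a minimal sketch of the np.array_split variant, again on made-up data (the values and the percentile labels are assumptions). It also shows the duplicated-edges error that qcut raises when many rows share the same count.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with many users who left exactly 1 review.
df = pd.DataFrame({"count": [1, 1, 1, 1, 1, 8, 13, 40, 250, 1200]})

# qcut computes its edges from the values, so repeated values can
# produce identical edges, which it rejects by default.
try:
    pd.qcut(df["count"], q=5)
    qcut_failed = False
except ValueError:
    qcut_failed = True

# array_split works on row positions instead: sort the counts and cut
# them into 5 chunks of (almost) equal length, whatever the values are.
counts = df["count"].sort_values().to_numpy()
chunks = np.array_split(counts, 5)

# Sum each chunk and label it with the upper percentile it covers.
per_chunk = pd.Series(
    [chunk.sum() for chunk in chunks],
    index=[20, 40, 60, 80, 100],
    name="count",
)
print(per_chunk)
```

Because the split is positional, users with the same count can land in different chunks. That is fine here, since the goal is the sum per percentile of rows.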
User contributions licensed under: CC BY-SA