Here is a link to a working example on Google Colaboratory.
I have a dataset that represents the reviews (between 0.0 to 10.0) that users have left on various books. It looks like this:
          user      sum  count      mean
    0        2      0.0      1  0.000000
60223   159665      8.0      1  8.000000
60222   159662      8.0      1  8.000000
60221   159655      8.0      1  8.000000
60220   159651      5.0      1  5.000000
13576    35859   6294.0   5850  1.075897
37356    98391  51418.0   5891  8.728230
58113   153662  17025.0   6109  2.786872
74815   198711    123.0   7550  0.016291
 4213    11676  62092.0  13602  4.564917
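A per-user table like this can be built with a groupby aggregation; roughly like the sketch below, where the raw ratings frame and its column names ('user', 'rating') are made up for illustration.
import pandas as pd

# Hypothetical raw data: one row per (user, rating) pair; the column
# names are illustrative, not from the actual dataset
ratings = pd.DataFrame({
    'user':   [2, 159665, 35859, 35859, 35859],
    'rating': [0.0, 8.0, 1.0, 0.5, 1.7],
})

# Per-user summary with the same columns as the table above
summary = ratings.groupby('user')['rating'].agg(['sum', 'count', 'mean'])
print(summary)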
The first rows of the table above have 1 review each, while the last ones have thousands. I want to see the distribution of the reviews across the user population. I looked into percentiles and binning data with Pandas and found pd.qcut and pd.cut, but with those I was unable to get the output in the format I want.
This is what I’m looking to get.
# users: reviews
# top 10%: 65K rev
# 10%-20%: 23K rev
# etc...
I could not figure out a “Pandas” way to do it, so I wrote a loop to generate the data in that format myself and graph it.
SLICE_NUMBERS = 5
# most_active_list is the per-user DataFrame shown above (sorted by 'count');
# user_count is its number of rows
step_size = int(user_count / SLICE_NUMBERS)
labels = ['100-80', '80-60', '60-40', '40-20', '0-20']
count_per_percentile = []
for chunk_i in range(SLICE_NUMBERS):
    start_index = step_size * chunk_i
    end_index = start_index + step_size
    # sum of review counts for this 20% slice of users
    slice_sum = most_active_list.iloc[start_index:end_index]['count'].sum()
    count_per_percentile.append(slice_sum)
print(labels)
print(count_per_percentile)  # [21056, 21056, 25058, 62447, 992902]
How can I achieve the same outcome more directly with the library?
Answer
I think you can use qcut to create the slices, inside a groupby.sum(). With the sample data slightly modified to avoid duplicated bin edges on such a small sample (I replaced the ones in count with 1, 2, 3, 4, 5):
count_per_percentile = (
    df['count']
    .groupby(pd.qcut(df['count'], q=[0, 0.2, 0.4, 0.6, 0.8, 1]))
    .sum()
    .tolist()
)
print(count_per_percentile)
# [3, 7, 5855, 12000, 21152]
which is the same result your loop method gives on this sample.
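If you also want the labelled output from the question (e.g. "top 20%: N rev"), one possible sketch is to pass labels= to qcut; the label strings below are just illustrative:
labels = ['0-20%', '20-40%', '40-60%', '60-80%', 'top 20%']
per_slice = (
    df['count']
    .groupby(pd.qcut(df['count'], q=[0, 0.2, 0.4, 0.6, 0.8, 1], labels=labels))
    .sum()
)
for label, total in per_slice.items():
    print(f'# {label}: {total} rev')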
In case your real data has too many 1s (which would again give duplicated quantile edges), you could also use np.array_split:
import numpy as np

count_per_percentile = [s.sum() for s in np.array_split(df['count'].sort_values(), 5)]
print(count_per_percentile)
# [3, 7, 5855, 12000, 21152]  # same result
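The difference between the two: qcut bins by value quantiles, so many identical counts produce duplicate bin edges and an error, while np.array_split simply cuts the sorted Series into five equal-sized chunks by position. If you would rather stay with qcut on such data, it also accepts duplicates='drop', at the cost of possibly merging bins; a small sketch of that variant:
# qcut with an integer q and duplicates='drop' merges colliding edges,
# so very skewed data may end up with fewer than 5 bins
bins = pd.qcut(df['count'], q=5, duplicates='drop')
print(df['count'].groupby(bins).sum().tolist())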