Skip to content
Advertisement

How to set groups by the percentiles of whole sample?

I am new to pandas, and I want to figure out how to group values based on sample quantiles.

For example, I have a dataframe with a column a.

df = pd.DataFrame(np.random.randint(0,100,size=(100, 1)), columns=list('a'))

Then what I want to do is to divide the values in a into 10 different group by their deciles, and named the label of their group in a column b.

Which means in the new column:

a b
60 2 group 1
30 3 group 1
94 3 group 1
92 3 group 1
63 3 group 1
47 92 group 10
58 98 group 10
66 99 group 10
73 99 group 10
24 100 group 10

Is that possible?

Now what I do is get the deciles of a first as df2, then merge the df2 to the existing df. A row is like

a decile 1 decile 2 decile 10
0 1 5.5 18 100

And get their group by compare them with deciles by a lot of if else condition. It is ok for like quartiles, but almost impossible for like percentiles.

I hope I explain my questions clearly enough. If there is any misleading expression, please let me know.

Thanks for any help in advance!

Advertisement

Answer

First part answer is subtract 1 with integer division by 10 and add 1 for start groups from 1:

df = pd.DataFrame({'a':range(1,101)})

df['b'] = 'group ' + (df.a.sub(1) // 10 + 1).astype(str)
print(df)
      a         b
0     1   group 1
1     2   group 1
2     3   group 1
3     4   group 1
4     5   group 1
..  ...       ...
95   96  group 10
96   97  group 10
97   98  group 10
98   99  group 10
99  100  group 10

EDIT: For deciles use qcut:

df['b'] = pd.qcut(df.a, 10, labels=False)
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement