I am new to pandas, and I want to figure out how to group values based on sample quantiles.
For example, I have a dataframe with a column a
.
df = pd.DataFrame(np.random.randint(0,100,size=(100, 1)), columns=list('a'))
Then what I want to do is to divide the values in a
into 10 different group by their deciles, and named the label of their group in a column b
.
Which means in the new column:
a | b | |
---|---|---|
60 | 2 | group 1 |
30 | 3 | group 1 |
94 | 3 | group 1 |
92 | 3 | group 1 |
63 | 3 | group 1 |
… | … | … |
47 | 92 | group 10 |
58 | 98 | group 10 |
66 | 99 | group 10 |
73 | 99 | group 10 |
24 | 100 | group 10 |
Is that possible?
Now what I do is get the deciles of a
first as df2
, then merge the df2
to the existing df
. A row is like
a | decile 1 | decile 2 | … | decile 10 | |
---|---|---|---|---|---|
0 | 1 | 5.5 | 18 | … | 100 |
And get their group by compare them with deciles by a lot of if else condition. It is ok for like quartiles, but almost impossible for like percentiles.
I hope I explain my questions clearly enough. If there is any misleading expression, please let me know.
Thanks for any help in advance!
Advertisement
Answer
First part answer is subtract 1 with integer division by 10
and add 1
for start groups from 1
:
df = pd.DataFrame({'a':range(1,101)}) df['b'] = 'group ' + (df.a.sub(1) // 10 + 1).astype(str) print(df) a b 0 1 group 1 1 2 group 1 2 3 group 1 3 4 group 1 4 5 group 1 .. ... ... 95 96 group 10 96 97 group 10 97 98 group 10 98 99 group 10 99 100 group 10
EDIT: For deciles use qcut
:
df['b'] = pd.qcut(df.a, 10, labels=False)