I have this code:
```python
import pandas as pd

y = pd.DataFrame({'num': [10, 12, 13, 11, 14]})

out = (y.join(y['num'].quantile([0.25, 0.5, 0.75, 1])
              .set_axis([f'{i}Q' for i in range(1, 5)], axis=0)
              .to_frame().T
              .pipe(lambda x: x.loc[x.index.repeat(len(y))])
              .reset_index(drop=True))
       .assign(Rank=y['num'].rank(method='first'))
       )
```
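For reference, the quantile values that the code above repeats on every row can be printed on their own. These are the 1Q to 4Q cut points used in the reasoning below (pandas' default linear interpolation):

```python
import pandas as pd

y = pd.DataFrame({'num': [10, 12, 13, 11, 14]})

# The four cut points that end up in columns 1Q..4Q
q = y['num'].quantile([0.25, 0.5, 0.75, 1])
print(q.tolist())  # [11.0, 12.0, 13.0, 14.0]
```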
The code works as it is, but it doesn't return what I want. I was trying to rank `num` against the quantile values in its own row, so:
- 10 is rank 1 because 10 <= 1Q value
- 12 is rank 2 **(not 3)** because 2Q <= 12 < 3Q value
- 13 is rank 3 **(not 4)** because 3Q <= 13 < 4Q value
- 11 is rank 1 **(not 2)** because 1Q <= 11 < 2Q value
- 14 is rank 4 **(not 5)** because 14 >= 4Q value
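The "(not …)" numbers above are exactly what the original `.rank(method='first')` call produces: `Series.rank` ranks each value against every other value in the column, not against the quantile columns of its own row. A minimal check on the same data:

```python
import pandas as pd

y = pd.DataFrame({'num': [10, 12, 13, 11, 14]})

# Each value is ranked against the whole column: 10 < 11 < 12 < 13 < 14
ranks = y['num'].rank(method='first')
print(ranks.tolist())  # [1.0, 3.0, 4.0, 2.0, 5.0]
```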
I tried to change this line:
```python
.assign(Rank=y['num'].rank(method='first'))
```
to:
```python
.assign(Rank=y['num'].rank(axis=1, method='first'))
```
But it didn’t work.
What am I missing here?
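On the `axis=1` attempt: `y['num']` is a Series, which has only one axis, so pandas rejects `axis=1` with a `ValueError`. Even on a DataFrame, `rank(axis=1)` would only rank the values within a row against each other; it would not place `num` between fixed quantile values. A minimal check:

```python
import pandas as pd

y = pd.DataFrame({'num': [10, 12, 13, 11, 14]})

raised = False
try:
    # A Series only has axis 0, so axis=1 is not a valid axis here
    y['num'].rank(axis=1, method='first')
except ValueError as err:
    raised = True
    print('ValueError:', err)
```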
Answer
Building on what you already have here:
```python
y = y.join(y['num'].quantile([0.25, 0.5, 0.75, 1])
           .set_axis([f'{i}Q' for i in range(1, 5)], axis=0)
           .to_frame().T
           .pipe(lambda x: x.loc[x.index.repeat(len(y))])
           .reset_index(drop=True))
```
we could add the `Rank` column as follows. The idea is to compare the `num` column with the quantile columns and, for each row, take the name of the first column whose quantile value is greater than or equal to `num`. Conveniently, each quantile column name already starts with its rank number, so we read the rank off that name:
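```python
y['Rank'] = (y.drop(columns='num').ge(y['num'], axis=0)  # True where the quantile >= num
             .pipe(lambda x: x*x.columns)                # True -> column name ('2Q'), False -> ''
             .replace('', pd.NA)                         # turn the empty strings into missing values
             .bfill(axis=1)['1Q']                        # back-fill along the row: 1Q now holds the first matching name
             .str[0].astype(int))                        # '2Q' -> '2' -> 2
```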
Output:
```
   num    1Q    2Q    3Q    4Q  Rank
0   10  11.0  12.0  13.0  14.0     1
1   12  11.0  12.0  13.0  14.0     2
2   13  11.0  12.0  13.0  14.0     3
3   11  11.0  12.0  13.0  14.0     1
4   14  11.0  12.0  13.0  14.0     4
```
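A possible alternative, sketched here as a swapped-in technique rather than the answer's method: because the quantile values are already in ascending order, `numpy.searchsorted` can find each row's position among them directly, which avoids the string manipulation (`.str[0]` reads only the first character of the column name, so it would not scale to ten or more bins). With `side='left'` it reproduces the answer's "first quantile >= num" rule; with `side='right'` plus clipping it follows the question's literal "kQ <= num < (k+1)Q" wording. The two agree on this sample but not for a value strictly between two quantiles; 11.5 below is a hypothetical input, not part of the question's data. The helper names `rank_first_ge` and `rank_interval` are illustrative only.

```python
import numpy as np
import pandas as pd

y = pd.DataFrame({'num': [10, 12, 13, 11, 14]})
# Ascending cut points for 1Q..4Q: [11.0, 12.0, 13.0, 14.0]
q = y['num'].quantile([0.25, 0.5, 0.75, 1]).to_numpy()


def rank_first_ge(v):
    # Answer's rule: 1-based index of the first quantile that is >= v
    # (never exceeds 4 here, because 4Q is the column maximum)
    return np.searchsorted(q, v, side='left') + 1


def rank_interval(v):
    # Question's wording: kQ <= v < (k+1)Q; values below 1Q are clipped to rank 1
    return np.clip(np.searchsorted(q, v, side='right'), 1, len(q))


nums = y['num'].to_numpy()
y['Rank'] = rank_first_ge(nums)
print(y['Rank'].tolist())   # [1, 2, 3, 1, 4]
print(rank_interval(nums))  # [1 2 3 1 4]

# Hypothetical in-between value: the two readings disagree
print(rank_first_ge(11.5), rank_interval(11.5))  # 2 1
```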