Skip to content
Advertisement

Why the rank function is not working when I set axis=1?

I have this code:

y=pd.DataFrame({'num':[10,12,13,11,14]})

out = (y.join(y['num'].quantile([0.25,0.5,0.75,1])
         .set_axis([f'{i}Q' for i in range(1,5)], axis=0)
         .to_frame().T       
         .pipe(lambda x: x.loc[x.index.repeat(len(y))])
       .reset_index(drop=True))
       .assign(Rank=y['num'].rank(method='first'))
)
   

The code is working as it is but is not returning What I want. I was trying to rank num considering only it’s row so

10 is rank 1 because 10 <= 1Q value
12 is rank 2 **(not 3)** because 2Q <= 12 < 3Q value
13 is rank 3 **(not 4)** because 3Q <= 13 < 4Q value
11 is rank 1 **(not 2)** because 1Q <= 11 < 2Q value
14 is rank 4 **(not 5)** because 14>= Q4

I tried to change this line:

.assign(Rank=y['num'].rank(method='first'))

to:

 .assign(Rank=y['num'].rank(axis=1,method='first'))

But it didn’t work.

What am i missing here?

Advertisement

Answer

Building on what you already have here:

y = y.join(y['num'].quantile([0.25,0.5,0.75,1])
         .set_axis([f'{i}Q' for i in range(1,5)], axis=0)
         .to_frame().T       
         .pipe(lambda x: x.loc[x.index.repeat(len(y))])
       .reset_index(drop=True))

we could add the Rank column as follows. The idea is to compare the num column with the quantile columns and get the first column name where the quantile value is greater than a num value. As it happens each quantile column already has rank numbers on it, so we use those to assign values:

y['Rank'] = (y.drop(columns='num').ge(y['num'], axis=0)
             .pipe(lambda x: x*x.columns).replace('', pd.NA)
             .bfill(axis=1)['1Q'].str[0].astype(int))

Output:

   num    1Q    2Q    3Q    4Q  Rank
0   10  11.0  12.0  13.0  14.0     1
1   12  11.0  12.0  13.0  14.0     2
2   13  11.0  12.0  13.0  14.0     3
3   11  11.0  12.0  13.0  14.0     1
4   14  11.0  12.0  13.0  14.0     4
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement