I have this code:
y=pd.DataFrame({'num':[10,12,13,11,14]}) out = (y.join(y['num'].quantile([0.25,0.5,0.75,1]) .set_axis([f'{i}Q' for i in range(1,5)], axis=0) .to_frame().T .pipe(lambda x: x.loc[x.index.repeat(len(y))]) .reset_index(drop=True)) .assign(Rank=y['num'].rank(method='first')) )
The code is working as it is but is not returning What I want. I was trying to rank num
considering only it’s row so
10 is rank 1 because 10 <= 1Q value 12 is rank 2 **(not 3)** because 2Q <= 12 < 3Q value 13 is rank 3 **(not 4)** because 3Q <= 13 < 4Q value 11 is rank 1 **(not 2)** because 1Q <= 11 < 2Q value 14 is rank 4 **(not 5)** because 14>= Q4
I tried to change this line:
.assign(Rank=y['num'].rank(method='first'))
to:
.assign(Rank=y['num'].rank(axis=1,method='first'))
But it didn’t work.
What am i missing here?
Advertisement
Answer
Building on what you already have here:
y = y.join(y['num'].quantile([0.25,0.5,0.75,1]) .set_axis([f'{i}Q' for i in range(1,5)], axis=0) .to_frame().T .pipe(lambda x: x.loc[x.index.repeat(len(y))]) .reset_index(drop=True))
we could add the Rank
column as follows. The idea is to compare the num
column with the quantile columns and get the first column name where the quantile value is greater than a num
value. As it happens each quantile column already has rank numbers on it, so we use those to assign values:
y['Rank'] = (y.drop(columns='num').ge(y['num'], axis=0) .pipe(lambda x: x*x.columns).replace('', pd.NA) .bfill(axis=1)['1Q'].str[0].astype(int))
Output:
num 1Q 2Q 3Q 4Q Rank 0 10 11.0 12.0 13.0 14.0 1 1 12 11.0 12.0 13.0 14.0 2 2 13 11.0 12.0 13.0 14.0 3 3 11 11.0 12.0 13.0 14.0 1 4 14 11.0 12.0 13.0 14.0 4