I’m trying to find a way to do nested ranking (row number) in python that is equivalent to the following in TSQL: I have a table thank looks like this:
JavaScript
x
16
16
1
data = {
2
'col1':[11,11,11,22,22,33,33],
3
'col2':['a','b','c','a','b','a','b']
4
}
5
df = pd.DataFrame(data)
6
7
8
# col1 col2
9
# 11 a
10
# 11 b
11
# 11 c
12
# 22 a
13
# 22 b
14
# 33 a
15
# 33 b
16
Looking for Python equivalent to:
JavaScript
1
7
1
SELECT
2
col1
3
,col2
4
,row_number() over(Partition by col1 order by col2) as rnk
5
FROM df
6
group by col1, col2
7
The output to be:
JavaScript
1
9
1
# col1 col2 rnk
2
# 11 a 1
3
# 11 b 2
4
# 11 c 3
5
# 22 a 1
6
# 22 b 2
7
# 33 a 1
8
# 33 b 2
9
I’ve tried to use rank()
and groupby()
but I keep running into a problem of No numeric types to aggregate
. Is there a way to rank
non numeric columns and give them row numbers
Advertisement
Answer
Use cumcount()
JavaScript
1
13
13
1
df['rnk']=df.groupby('col1')['col2'].cumcount()+1
2
3
4
5
col1 col2 rnk
6
0 11 a 1
7
1 11 b 2
8
2 11 c 3
9
3 22 a 1
10
4 22 b 2
11
5 33 a 1
12
6 33 b 2
13