I’m trying to find a way to do nested ranking (row number) in python that is equivalent to the following in TSQL: I have a table thank looks like this:
data = {
'col1':[11,11,11,22,22,33,33],
'col2':['a','b','c','a','b','a','b']
}
df = pd.DataFrame(data)
# col1 col2
# 11 a
# 11 b
# 11 c
# 22 a
# 22 b
# 33 a
# 33 b
Looking for Python equivalent to:
SELECT col1 ,col2 ,row_number() over(Partition by col1 order by col2) as rnk FROM df group by col1, col2
The output to be:
# col1 col2 rnk # 11 a 1 # 11 b 2 # 11 c 3 # 22 a 1 # 22 b 2 # 33 a 1 # 33 b 2
I’ve tried to use rank() and groupby() but I keep running into a problem of No numeric types to aggregate. Is there a way to rank non numeric columns and give them row numbers
Advertisement
Answer
Use cumcount()
df['rnk']=df.groupby('col1')['col2'].cumcount()+1
col1 col2 rnk
0 11 a 1
1 11 b 2
2 11 c 3
3 22 a 1
4 22 b 2
5 33 a 1
6 33 b 2