I’m trying to find a way to do nested ranking (row number) in python that is equivalent to the following in TSQL: I have a table thank looks like this:
data = {
    'col1':[11,11,11,22,22,33,33],
    'col2':['a','b','c','a','b','a','b']
}
df = pd.DataFrame(data)
# col1  col2 
# 11     a      
# 11     b  
# 11     c
# 22     a      
# 22     b      
# 33     a      
# 33     b      
Looking for Python equivalent to:
SELECT col1 ,col2 ,row_number() over(Partition by col1 order by col2) as rnk FROM df group by col1, col2
The output to be:
# col1 col2 rnk # 11 a 1 # 11 b 2 # 11 c 3 # 22 a 1 # 22 b 2 # 33 a 1 # 33 b 2
I’ve tried to use rank() and groupby() but I keep running into a problem of No numeric types to aggregate. Is there a way to rank non numeric columns and give them row numbers
Advertisement
Answer
Use cumcount()
 df['rnk']=df.groupby('col1')['col2'].cumcount()+1
 col1 col2  rnk
0    11    a    1
1    11    b    2
2    11    c    3
3    22    a    1
4    22    b    2
5    33    a    1
6    33    b    2