Skip to content
Advertisement

Python Rank with non numeric columns

I’m trying to find a way to do nested ranking (row number) in python that is equivalent to the following in TSQL: I have a table thank looks like this:

data = {
    'col1':[11,11,11,22,22,33,33],
    'col2':['a','b','c','a','b','a','b']
}
df = pd.DataFrame(data)


# col1  col2 
# 11     a      
# 11     b  
# 11     c
# 22     a      
# 22     b      
# 33     a      
# 33     b      

Looking for Python equivalent to:

SELECT
col1
,col2
,row_number() over(Partition by col1 order by col2) as rnk
FROM df
group by col1, col2

The output to be:

# col1  col2  rnk
# 11     a      1
# 11     b      2
# 11     c      3
# 22     a      1
# 22     b      2
# 33     a      1
# 33     b      2

I’ve tried to use rank() and groupby() but I keep running into a problem of No numeric types to aggregate. Is there a way to rank non numeric columns and give them row numbers

Advertisement

Answer

Use cumcount()

 df['rnk']=df.groupby('col1')['col2'].cumcount()+1



 col1 col2  rnk
0    11    a    1
1    11    b    2
2    11    c    3
3    22    a    1
4    22    b    2
5    33    a    1
6    33    b    2
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement