I’m trying to find a way to do nested ranking (row number) in python that is equivalent to the following in TSQL: I have a table thank looks like this:
data = { 'col1':[11,11,11,22,22,33,33], 'col2':['a','b','c','a','b','a','b'] } df = pd.DataFrame(data) # col1 col2 # 11 a # 11 b # 11 c # 22 a # 22 b # 33 a # 33 b
Looking for Python equivalent to:
SELECT col1 ,col2 ,row_number() over(Partition by col1 order by col2) as rnk FROM df group by col1, col2
The output to be:
# col1 col2 rnk # 11 a 1 # 11 b 2 # 11 c 3 # 22 a 1 # 22 b 2 # 33 a 1 # 33 b 2
I’ve tried to use rank()
and groupby()
but I keep running into a problem of No numeric types to aggregate
. Is there a way to rank
non numeric columns and give them row numbers
Advertisement
Answer
Use cumcount()
df['rnk']=df.groupby('col1')['col2'].cumcount()+1 col1 col2 rnk 0 11 a 1 1 11 b 2 2 11 c 3 3 22 a 1 4 22 b 2 5 33 a 1 6 33 b 2