I have a large dataframe (>16M rows) which has a column named ‘user’. Every user has more than one occurrence. I want to add a new column ‘counter’ that increases by one every time a given user has a new record.
The dataframe looks like this:
    user  revenue
    AAA   100000
    BBB   150000
    CCC   10000
    AAA   200000
    BBB   100000
I want it to look like this, with the new counter column:
    user  revenue  counter
    AAA   100000   1
    BBB   150000   1
    CCC   10000    1
    AAA   200000   2
    BBB   100000   2
I tried the following code, but it’s taking ages:
    for i in range(500000):
        user = df_user.iloc[i, 0]
        a = 1
        for j in range(2000000):
            if df.iloc[j, 0] == user:
                df.iloc[j, 2] = a
                a = a + 1
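The nested loop rescans the whole frame once per user and writes cell by cell with `iloc`, so the work grows with (number of users) × (number of rows). A minimal plain-Python sketch of a single pass, assuming the data fits in memory: keep a running count per user in a dict and assign the column once at the end. The sample frame below is just the example data from the question.

```python
from collections import defaultdict

import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "user": ["AAA", "BBB", "CCC", "AAA", "BBB"],
    "revenue": [100000, 150000, 10000, 200000, 100000],
})

# Running number of records seen so far for each user.
seen = defaultdict(int)
counters = []
for user in df["user"]:
    seen[user] += 1
    counters.append(seen[user])

# Assign the whole column at once instead of writing cell by cell with iloc.
df["counter"] = counters
print(df["counter"].tolist())  # [1, 1, 1, 2, 2]
```

This is one pass over the rows instead of one pass per user, but it is still a Python-level loop; the vectorized groupby approach in the answer avoids that as well.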
Answer
Please check out pandas’ GroupBy.cumcount:
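    df['counter'] = df.groupby('user').cumcount() + 1

should do the trick. cumcount numbers the rows within each group starting from 0, so the + 1 makes the counter start at 1, as in the desired output.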
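A small self-contained check on the example data from the question (the frame construction is only for illustration). Rows are numbered within each user in their original order, so the counter follows the order in which each user’s records appear.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "user": ["AAA", "BBB", "CCC", "AAA", "BBB"],
    "revenue": [100000, 150000, 10000, 200000, 100000],
})

# cumcount() starts at 0 within each group; add 1 so the first record is 1.
df["counter"] = df.groupby("user").cumcount() + 1
print(df["counter"].tolist())  # [1, 1, 1, 2, 2]
```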