I have a large dataframe (>16M rows) with a column named ‘user’. Each user appears more than once. I want to add a new column ‘counter’ that increases by one every time a given user has a new record.
The dataframe looks like this:
user  revenue
AAA   100000
BBB   150000
CCC   10000
AAA   200000
BBB   100000
I want it to look like this, with the new counter column:
user  revenue  counter
AAA   100000   1
BBB   150000   1
CCC   10000    1
AAA   200000   2
BBB   100000   2
I tried the following code, but it is taking ages:
    for i in range(500000):
        user = df_user.iloc[i, 0]
        a = 1
        for j in range(2000000):
            if df.iloc[j, 0] == user:
                df.iloc[j, 2] = a
                a = a + 1
Answer
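Please check out pandas cumcount. It numbers the rows within each group starting at 0, so add 1 to get a counter that starts at 1:
    df['counter'] = df.groupby('user').cumcount() + 1
That should do the trick.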
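To check this against the sample from the question, here is a minimal sketch. The small frame is rebuilt by hand from the rows shown above, so the exact column layout and dtypes of the real 16M-row dataframe are an assumption. Note that cumcount numbers rows from 0 within each group, which is why the sketch adds 1 to match the desired output.

```python
import pandas as pd

# Sample data copied from the question (hand-built; the real frame is assumed
# to have the same 'user' and 'revenue' columns).
df = pd.DataFrame({
    "user": ["AAA", "BBB", "CCC", "AAA", "BBB"],
    "revenue": [100000, 150000, 10000, 200000, 100000],
})

# cumcount() numbers the rows of each 'user' group as 0, 1, 2, ... in the
# original row order; adding 1 makes the counter start at 1.
df["counter"] = df.groupby("user").cumcount() + 1

print(df)
```

For this sample the counter column comes out as 1, 1, 1, 2, 2. Because the grouping is done by pandas internally rather than with a Python loop over every user and every row, it should scale far better than the nested loops in the question.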