I have a large dataframe (>16M rows) with a column named ‘user’. Every user occurs more than once. I want to add a new column ‘counter’ that increases by one each time a given user has a new record.
The dataframe looks like this:
user revenue
AAA 100000
BBB 150000
CCC 10000
AAA 200000
BBB 100000
I want it to look like this, with the new counter column:
user revenue counter
AAA 100000 1
BBB 150000 1
CCC 10000 1
AAA 200000 2
BBB 100000 2
I tried the following code, but it’s taking ages:
for i in range(500000):
    user = df_user.iloc[i, 0]
    a = 1

    for j in range(2000000):
        if df.iloc[j, 0] == user:
            df.iloc[j, 2] = a
            a = a + 1
Answer
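Have a look at pandas' GroupBy.cumcount, which numbers the rows within each group in order of appearance:
df['counter'] = df.groupby('user').cumcount() + 1
cumcount() starts counting at 0, so add 1 to get a counter that starts at 1, as in your desired output. That should do the trick, and because it is vectorized it avoids the nested loops entirely.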
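To check the approach on the sample data from the question, here is a minimal sketch. The dataframe is rebuilt by hand from the rows shown above (an assumption about the real column dtypes), and 1 is added so the counter starts at 1 like the desired output.

```python
import pandas as pd

# Sample data copied from the question; the real dataframe has >16M rows.
df = pd.DataFrame({
    "user": ["AAA", "BBB", "CCC", "AAA", "BBB"],
    "revenue": [100000, 150000, 10000, 200000, 100000],
})

# cumcount() numbers the rows of each user starting at 0, in order of
# appearance; adding 1 makes the first record of every user count as 1.
df["counter"] = df.groupby("user").cumcount() + 1

print(df["counter"].tolist())  # [1, 1, 1, 2, 2]
```

The original row order is preserved, so no sorting or merging back is needed afterwards.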