How can I add a counter column that counts specific values in another column in a pandas dataframe?

I have a large dataframe (>16M rows) with a column named ‘user’. Every user occurs more than once. I want to add a new column ‘counter’ that increases by one each time that user has a new record.

The dataframe looks like this:

user     revenue
AAA       100000
BBB       150000
CCC       10000
AAA       200000
BBB       100000

I want it to look like this, with the new counter column:

user      revenue  counter
AAA       100000   1
BBB       150000   1
CCC       10000    1
AAA       200000   2
BBB       100000   2

I tried the following code, but it’s taking ages:

for i in range(500000):
   user=df_user.iloc[i,0]
   a=1

   for j in range(2000000):
      if df.iloc[j,0] == user:
         df.iloc[j,2] = a
         a = a+1

Answer

Check out pandas cumcount. It numbers the rows within each group starting from 0, so add 1 to get a counter that starts at 1:

df['counter'] = df.groupby('user').cumcount() + 1

That should do the trick.
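
As a rough sketch, here is cumcount applied to the sample data from the question. The column names and values are copied from the question's first table; the `+ 1` is there only because cumcount numbers rows from 0 while the desired counter starts at 1.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user": ["AAA", "BBB", "CCC", "AAA", "BBB"],
    "revenue": [100000, 150000, 10000, 200000, 100000],
})

# cumcount() numbers the rows of each 'user' group in their original order,
# starting at 0; adding 1 makes the first record of each user count as 1.
df["counter"] = df.groupby("user").cumcount() + 1

print(df)
```

Because the grouping is done in a single vectorized pass, this avoids the nested Python loops from the question, which compare every user against every row.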

Answer