Sequentially counting repeated entries

dataframe pandas python

Andrew Davidson

asked 19 Dec, 2017

I am currently working on a project where I have to measure someone's activity on a site over time, based on whether they edit the site. I have a data frame that looks similar to this:

import pandas as pd

df = pd.DataFrame({"x": ["a", "b", "c", "b", "b"],
                   "y": ["red", "blue", "green", "yellow", "red"],
                   "z": [1, 2, 3, 4, 5]})

I want to add a column to the dataframe that counts, for each row, how many times that row's "x" value has appeared so far (the number of edits), using the "z" column as the measure of when the events happened.

E.g. to have an additional column of:

df["activity"] = pd.Series([1,1,1,2,3])

How would I best go about this in Python? Not sure what my best approach here is.

Answer

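Use `groupby` and `cumcount`. `cumcount` numbers the rows within each group starting from 0, so add 1 to get a count that starts at 1: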

df['activity'] = df.groupby('x').cumcount() + 1
df

   x       y  z  activity
0  a     red  1         1
1  b    blue  2         1
2  c   green  3         1
3  b  yellow  4         2
4  b     red  5         3
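
One caveat: `cumcount` numbers rows in the order they appear in the frame, which matches the "z" order in the sample data. If the rows might not already be sorted by "z", a minimal sketch (assuming "z" is the time column, as the question describes, and using a deliberately shuffled version of the sample data) is to sort first and let pandas align the result back by index:

```python
import pandas as pd

# Same columns as the question, but with rows deliberately out of time order.
df = pd.DataFrame({"x": ["b", "a", "b", "c", "b"],
                   "y": ["red", "red", "yellow", "green", "blue"],
                   "z": [5, 1, 4, 3, 2]})

# Sort by the time column so cumcount numbers each group's rows chronologically.
# The resulting Series keeps the original index, so assignment aligns it
# back to the unsorted frame.
df["activity"] = df.sort_values("z").groupby("x").cumcount() + 1
```

Here the row with `z == 5` gets `activity == 3` even though it comes first in the frame, because it is the third "b" edit in time order.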
