
Sequentially counting repeated entries

I am currently working on a project where I have to measure someone's activity on a site over time, based on whether they edit the site. I have a data frame that looks similar to this:

import pandas as pd

df = pd.DataFrame({"x": ["a", "b", "c", "b", "b"],
                   "y": ["red", "blue", "green", "yellow", "red"],
                   "z": [1, 2, 3, 4, 5]})

I want to add a column to the dataframe that counts how many times each value in column x has appeared so far (the running number of edits), using the "z" column as the measure of when the events happened.

E.g. to have an additional column of:

df["activity"] = pd.Series([1,1,1,2,3])

How would I best go about this in Python? Not sure what my best approach here is.


Answer

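Use groupby and cumcount. Grouping by x and calling cumcount numbers the rows within each group starting from 0, in the order they appear in the frame, so add 1 to start the count at 1. This relies on the rows already being sorted by z, as they are in your example.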

df['activity'] = df.groupby('x').cumcount() + 1
df

   x       y  z  activity
0  a     red  1         1
1  b    blue  2         1
2  c   green  3         1
3  b  yellow  4         2
4  b     red  5         3
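
If the rows are not already ordered by z, the count above would follow row order rather than time. A minimal sketch of one way to handle that, assuming z can be sorted directly; the reversed z values here are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical variant of the question's frame: z is deliberately out of order.
df = pd.DataFrame({"x": ["a", "b", "c", "b", "b"],
                   "y": ["red", "blue", "green", "yellow", "red"],
                   "z": [5, 4, 3, 2, 1]})

# Sort by z first so cumcount numbers each user's edits chronologically.
# The result keeps the original index labels, so assigning it back
# aligns on the index and leaves the row order of df unchanged.
df["activity"] = df.sort_values("z").groupby("x").cumcount() + 1

print(df)
```

Sorting only inside the expression, rather than sorting df itself, is a design choice that keeps the frame in its original order; sort df in place instead if you want the rows in chronological order as well.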
User contributions licensed under: CC BY-SA