I am currently working on a project where I have to measure someone's activity on a site over time, based on their edits to the site. I have a data frame that looks similar to this:
df = pd.DataFrame({"x":["a", "b", "c", "b","b"], "y":["red", "blue", "green", "yellow","red"], "z":[1,2,3,4,5]})
I want to add a column to the dataframe that counts how many times each value in column "x" has occurred so far (the number of edits), using the "z" column as the measure of when each event happened.
E.g. to have an additional column of:
df["activity"] = pd.Series([1,1,1,2,3])
How would I best go about this in Python? Not sure what my best approach here is.
Answer
Use groupby and cumcount:
df['activity'] = df.groupby('x').cumcount() + 1
df

   x       y  z  activity
0  a     red  1         1
1  b    blue  2         1
2  c   green  3         1
3  b  yellow  4         2
4  b     red  5         3
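Note that cumcount numbers rows in the order they appear within each group, so the one-liner above relies on the frame already being sorted by "z". If that is not guaranteed, a minimal sketch (assuming "z" is a sortable timestamp-like column; the shuffled sample data below is made up for illustration) is to sort first and let pandas align the result back by index:

```python
import pandas as pd

# Hypothetical sample: same kind of data as in the question,
# but with the rows deliberately out of chronological order.
df = pd.DataFrame({
    "x": ["b", "a", "b", "c", "b"],
    "y": ["red", "red", "yellow", "green", "blue"],
    "z": [5, 1, 4, 3, 2],
})

# Sort by the time column so cumcount sees each group's events
# in chronological order. The resulting Series keeps the original
# index labels, so the assignment aligns it back to df's row order.
df["activity"] = df.sort_values("z").groupby("x").cumcount() + 1

print(df)
```

Here the row with z=5 gets activity 3 even though it comes first in the frame, because it is the third "b" event in time.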