I have a DF that contains the ids of several creators of certain projects and the outcomes of their projects over time. Each project can either be a success (outcome = 1) or a failure (outcome=0). The DF looks like this:
creator_id outcome 1 1 1 1 2 1 2 0 2 1
I’m looking for a way to create two new columns: previous projects and previous successes.
The first should be the cumulative number of previous projects for any creator. Given that I’ve already ordered the DF by creator and by date of creation, I know I can do this by running the following code:
df['prev_project']=df.groupby(['creator_id']).cumcount()
My issue is that I can’t figure out how to create the column previous success. I tried to run the following code:
df['prev_success']=df.groupby(['creator_id'])['outcome'].cumsum()
The issue with this code is that it starts counting immediately (the success of row x is counted as previous success for the project in row x). Moreover, I have issues with projects whose outcome path is not homogeneous, i.e. one creator’s sequence is: success, failure, success.
Ideally my DF should look like:
creator_id outcome prev_projects prev_success 1 1 0 0 1 1 1 1 2 1 0 0 2 0 1 1 2 1 2 1
Please note the peculiar sequence required as result for creator_id number 2. Hope all is clear
Advertisement
Answer
You can use shift
with fillna
to start your cumsum
from 0:
df['outcome_shifted']=df.groupby(['creator_id'])['outcome'].shift().fillna(0) df['prev_success']=df.groupby(['creator_id'])['outcome_shifted'].cumsum() df
prints
creator_id outcome prev_success outcome_shifted 0 1 1 0.0 0.0 1 1 1 1.0 1.0 2 2 1 0.0 0.0 3 2 0 1.0 1.0 4 2 1 1.0 0.0