Suppose I have a dataframe:
Case_no | Event_ID | Date | Type | Done_by |
---|---|---|---|---|
1 | 1 | 2023-01-01 | Email Sent | Customer |
1 | 2 | 2023-01-02 | Response | Agent |
1 | 3 | 2023-01-03 | Email Sent | Customer |
1 | 4 | 2023-01-10 | Response | Agent |
2 | 5 | 2023-01-02 | Email Sent | Customer |
2 | 6 | 2023-01-04 | Email Sent | Customer |
And I wanted to add two boolean columns that check for each case if an email has been sent in, has there been a response within 5 days on that case. There is no way to link emails and responses but just having it flag up that a response has been sent and whether or not it was in 5 days after is good enough for what I am doing as such:
Case_no | Event_ID | Date | Type | Done_by | Response? | Response_Within_Five_Days |
---|---|---|---|---|---|---|
1 | 1 | 2023-01-01 | Email Sent | Customer | TRUE | TRUE |
1 | 2 | 2023-01-02 | Response | Agent | nan | NAN |
1 | 3 | 2023-01-03 | Email Sent | Customer | TRUE | FALSE |
1 | 4 | 2023-01-10 | Response | Agent | NAN | NAN |
2 | 5 | 2023-01-02 | Email Sent | Customer | FALSE | FALSE |
2 | 6 | 2023-01-04 | Email Sent | Customer | FALSE | FALSE |
How would I go about doing that?
Advertisement
Answer
Solution for compare per groups if next value after Email Sent
is Response
by DataFrameGroupBy.shift
:
df['Date'] = pd.to_datetime(df['Date']) shift = df.groupby('Case_no')[['Type','Date']].shift(-1) m1 = df['Type'].eq('Email Sent') m2 = shift['Type'].eq('Response') m3 = shift['Date'].sub(df['Date']).dt.days.lt(5) df['Response?'] = (m1 & m2).astype('boolean') df['Response_Within_Five_Days'] = df['Response?'] & m3 df.loc[df['Type'].eq('Response'), ['Response?','Response_Within_Five_Days']] = np.nan print (df) Case_no Event_ID Date Type Done_by Response? 0 1 1 2023-01-01 Email Sent Customer True 1 1 2 2023-01-02 Response Agent <NA> 2 1 3 2023-01-03 Email Sent Customer True 3 1 4 2023-01-10 Response Agent <NA> 4 2 5 2023-01-02 Email Sent Customer False 5 2 6 2023-01-04 Email Sent Customer False Response_Within_Five_Days 0 True 1 <NA> 2 False 3 <NA> 4 False 5 False
EDIT: For check next rows after Email Sent
use custom function:
def f(x): m = x['Type'].eq('Email Sent') for k, v in x.loc[m, 'Date'].items(): s = x[~m].loc[k:, 'Date'].sub(v).dt.days x.loc[k, 'Response?'] = not s.empty x.loc[k, 'Response_Within_Five_Days'] = s.lt(5).any() return x df = df.reset_index(drop=True) df = df.groupby('Case_no').apply(f) print (df) Case_no Event_ID Date Type Done_by Response? 0 1 1 2023-01-01 Email Sent Customer True 1 1 2 2023-01-02 Response Agent NaN 2 1 3 2023-01-03 Email Sent Customer True 3 1 4 2023-01-10 Response Agent NaN 4 2 5 2023-01-02 Email Sent Customer False 5 2 6 2023-01-04 Email Sent Customer False Response_Within_Five_Days 0 True 1 NaN 2 False 3 NaN 4 False 5 False