Suppose I have a dataframe:
| Case_no | Event_ID | Date | Type | Done_by | 
|---|---|---|---|---|
| 1 | 1 | 2023-01-01 | Email Sent | Customer | 
| 1 | 2 | 2023-01-02 | Response | Agent | 
| 1 | 3 | 2023-01-03 | Email Sent | Customer | 
| 1 | 4 | 2023-01-10 | Response | Agent | 
| 2 | 5 | 2023-01-02 | Email Sent | Customer | 
| 2 | 6 | 2023-01-04 | Email Sent | Customer | 
And I wanted to add two boolean columns that check for each case if an email has been sent in, has there been a response within 5 days on that case. There is no way to link emails and responses but just having it flag up that a response has been sent and whether or not it was in 5 days after is good enough for what I am doing as such:
| Case_no | Event_ID | Date | Type | Done_by | Response? | Response_Within_Five_Days | 
|---|---|---|---|---|---|---|
| 1 | 1 | 2023-01-01 | Email Sent | Customer | TRUE | TRUE | 
| 1 | 2 | 2023-01-02 | Response | Agent | nan | NAN | 
| 1 | 3 | 2023-01-03 | Email Sent | Customer | TRUE | FALSE | 
| 1 | 4 | 2023-01-10 | Response | Agent | NAN | NAN | 
| 2 | 5 | 2023-01-02 | Email Sent | Customer | FALSE | FALSE | 
| 2 | 6 | 2023-01-04 | Email Sent | Customer | FALSE | FALSE | 
How would I go about doing that?
Advertisement
Answer
Solution for compare per groups if next value after Email Sent is Response by DataFrameGroupBy.shift:
df['Date'] = pd.to_datetime(df['Date'])
shift = df.groupby('Case_no')[['Type','Date']].shift(-1)
m1 = df['Type'].eq('Email Sent')
m2 = shift['Type'].eq('Response')
m3 = shift['Date'].sub(df['Date']).dt.days.lt(5)
df['Response?'] = (m1 & m2).astype('boolean')
df['Response_Within_Five_Days'] = df['Response?'] & m3
df.loc[df['Type'].eq('Response'), ['Response?','Response_Within_Five_Days']] = np.nan
print (df)
   Case_no  Event_ID       Date        Type   Done_by  Response?  
0        1         1 2023-01-01  Email Sent  Customer       True   
1        1         2 2023-01-02    Response     Agent       <NA>   
2        1         3 2023-01-03  Email Sent  Customer       True   
3        1         4 2023-01-10    Response     Agent       <NA>   
4        2         5 2023-01-02  Email Sent  Customer      False   
5        2         6 2023-01-04  Email Sent  Customer      False   
   Response_Within_Five_Days  
0                       True  
1                       <NA>  
2                      False  
3                       <NA>  
4                      False  
5                      False  
EDIT: For check next rows after Email Sent use custom function:
def f(x):
    
    m = x['Type'].eq('Email Sent')
    for k, v in x.loc[m, 'Date'].items():
        s = x[~m].loc[k:, 'Date'].sub(v).dt.days
        x.loc[k, 'Response?'] = not s.empty
        x.loc[k, 'Response_Within_Five_Days'] = s.lt(5).any()
    return x
df = df.reset_index(drop=True)
df =  df.groupby('Case_no').apply(f)
print (df)
   Case_no  Event_ID       Date        Type   Done_by Response?  
0        1         1 2023-01-01  Email Sent  Customer      True   
1        1         2 2023-01-02    Response     Agent       NaN   
2        1         3 2023-01-03  Email Sent  Customer      True   
3        1         4 2023-01-10    Response     Agent       NaN   
4        2         5 2023-01-02  Email Sent  Customer     False   
5        2         6 2023-01-04  Email Sent  Customer     False   
  Response_Within_Five_Days  
0                      True  
1                       NaN  
2                     False  
3                       NaN  
4                     False  
5                     False