I have a dataframe like as shown below
df = pd.DataFrame({'person_id': [101,101,101,101,202,202,202], 'person_type':['A','A','B','C','D','B','A'], 'login_date':['5/7/2013 09:27:00 AM','09/08/2013 11:21:00 AM','06/06/2014 08:00:00 AM','06/06/2014 05:00:00 AM','12/11/2011 10:00:00 AM','13/10/2012 12:00:00 AM','13/12/2012 11:45:00 AM'], 'logout_date':[np.nan,'11/08/2013 11:21:00 AM',np.nan,'06/06/2014 05:00:00 AM',np.nan,'13/10/2012 12:00:00 AM',np.nan]}) df.login_date = pd.to_datetime(df.login_date) df.logout_date = pd.to_datetime(df.logout_date)
I would like to apply 2 rules to the logout_date
column
Rule 1 – If person type is B
, C
,D
,E
AND logout_date is NaN, then copy the login date value
Rule 2 – If person type is A
AND logout_date is NaN, then add 2 days to the login date
I tried the below
df['logout_date'] = np.where(((df['person_type'].isin(['B','C','D'])) & (df['logout_date'].isna())),df['login_date'].dt.date,df['logout_date'].dt.date) df['logout_date'] = np.where(((df['person_type'].isin(['A'])) & (df['logout_date'].isna())),df['login_date'] + pd.DateOffset(days=2).dt.date,df['logout_date'].dt.date)
You can see how lengthy it is. Is there any other better way to write this?
I expect my output to be like as shown below
person_id person_type login_date logout_date 101 A 2013-05-07 09:27:00 2013-05-09 09:27:00 101 A 2013-09-08 11:21:00 2013-11-08 11:21:00 101 B 2014-06-06 08:00:00 2014-06-06 08:00:00 101 C 2014-06-06 05:00:00 2014-06-06 05:00:00 202 D 2011-12-11 10:00:00 2011-12-11 10:00:00 202 B 2012-10-13 00:00:00 2012-10-13 12:00:00 202 A 2012-12-13 11:45:00 2012-12-15 11:45:00
Advertisement
Answer
Use numpy.select
with intermediate variables like mentioned in comments:
s = df['person_type'].fillna('missing value') m1 = s.isin(['B','C','D', 'missing value']) m2 = s.isin(['A','missing value']) df['logout_date'] = np.select([m1, m2], [df['login_date'], df['login_date'] + pd.DateOffset(days=2)], default=df['logout_date'])
Or rewritten your solution:
m1 = df['person_type'].isin(['B','C','D']) m2 = df['person_type'].isin(['A']) m3 = df['logout_date'].isna() df['logout_date'] = np.select([m1 & m3, m2 & m3], [df['login_date'], df['login_date'] + pd.DateOffset(days=2)], default=df['logout_date'])
df['logout_date'] = np.select([m1 & m3, m2 & m3], [df['login_date'].dt.date, (df['login_date'] + pd.DateOffset(days=2)).dt.date], default=df['logout_date'].dt.date)