I have the following dataset of students taking 2 different exams:
df = pd.DataFrame({'student': 'A B C D E'.split(), 'sat_date': [datetime.datetime(2013,4,1),datetime.datetime(2013,5,1), datetime.datetime(2013,5,2),datetime.datetime(2013,7,15), datetime.datetime(2013,8,1)], 'act_date': [datetime.datetime(2013,4,12),datetime.datetime(2013,5,2), datetime.datetime(2013,4,12), datetime.datetime(2013,7,1), datetime.datetime(2013,8,2)]}) print(df) student sat_date act_date 0 A 2013-04-01 2013-04-12 1 B 2013-05-01 2013-05-02 2 C 2013-05-02 2013-04-12 3 D 2013-07-15 2013-07-01 4 E 2013-08-01 2013-08-02
I want to select those students whose two exams are 10 days apart from each other in either direction.
I am trying Timedelta
, but I’m not sure if it’s optimal.
df[(df['sat_date'] >= df['act_date'] + pd.Timedelta(days=10)) | (df['sat_date'] <= df['act_date'] - pd.Timedelta(days=10))]
Desired Output:
student sat_date act_date 0 A 2013-04-01 2013-04-12 2 C 2013-05-02 2013-04-12 3 D 2013-07-15 2013-07-01
Is there any better way of getting the desired output? Any suggestions would be appreciated. Thanks!
Advertisement
Answer
Try as follows:
result = df.loc[abs(df.sat_date - df.act_date).dt.days>=10] print(result) student sat_date act_date 0 A 2013-04-01 2013-04-12 2 C 2013-05-02 2013-04-12 3 D 2013-07-15 2013-07-01
Or maybe nicer:
df.loc[abs(df.sat_date - df.act_date).ge(pd.Timedelta(days=10))]