Skip to content
Advertisement

Select Rows Based on Time Difference [Before or After] In Columns

I have the following dataset of students taking 2 different exams:

df = pd.DataFrame({'student': 'A B C D E'.split(),
                  'sat_date': [datetime.datetime(2013,4,1),datetime.datetime(2013,5,1),
                               datetime.datetime(2013,5,2),datetime.datetime(2013,7,15),
                               datetime.datetime(2013,8,1)],
                   'act_date': [datetime.datetime(2013,4,12),datetime.datetime(2013,5,2),
                               datetime.datetime(2013,4,12), datetime.datetime(2013,7,1),
                               datetime.datetime(2013,8,2)]})

print(df)

student   sat_date   act_date
0       A 2013-04-01 2013-04-12
1       B 2013-05-01 2013-05-02
2       C 2013-05-02 2013-04-12
3       D 2013-07-15 2013-07-01
4       E 2013-08-01 2013-08-02


I want to select those students whose two exams are 10 days apart from each other in either direction.

I am trying Timedelta, but I’m not sure if it’s optimal.

df[(df['sat_date'] >= df['act_date'] + pd.Timedelta(days=10)) | (df['sat_date'] <= df['act_date'] - pd.Timedelta(days=10))]

Desired Output:

student sat_date    act_date
0   A   2013-04-01  2013-04-12
2   C   2013-05-02  2013-04-12
3   D   2013-07-15  2013-07-01

Is there any better way of getting the desired output? Any suggestions would be appreciated. Thanks!

Advertisement

Answer

Try as follows:

result = df.loc[abs(df.sat_date - df.act_date).dt.days>=10]
print(result)

  student   sat_date   act_date
0       A 2013-04-01 2013-04-12
2       C 2013-05-02 2013-04-12
3       D 2013-07-15 2013-07-01

Or maybe nicer:

df.loc[abs(df.sat_date - df.act_date).ge(pd.Timedelta(days=10))]
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement