I have the following dataset of students taking 2 different exams:
import datetime
import pandas as pd

df = pd.DataFrame({'student': 'A B C D E'.split(),
                   'sat_date': [datetime.datetime(2013,4,1),datetime.datetime(2013,5,1),
                                datetime.datetime(2013,5,2),datetime.datetime(2013,7,15),
                                datetime.datetime(2013,8,1)],
                   'act_date': [datetime.datetime(2013,4,12),datetime.datetime(2013,5,2),
                                datetime.datetime(2013,4,12), datetime.datetime(2013,7,1),
                                datetime.datetime(2013,8,2)]})

print(df)

  student   sat_date   act_date
0       A 2013-04-01 2013-04-12
1       B 2013-05-01 2013-05-02
2       C 2013-05-02 2013-04-12
3       D 2013-07-15 2013-07-01
4       E 2013-08-01 2013-08-02

I want to select those students whose two exams are at least 10 days apart from each other, in either direction.
I am trying Timedelta, but I’m not sure if it’s optimal.
df[(df['sat_date'] >= df['act_date'] + pd.Timedelta(days=10)) | (df['sat_date'] <= df['act_date'] - pd.Timedelta(days=10))]

Desired Output:
  student   sat_date   act_date
0       A 2013-04-01 2013-04-12
2       C 2013-05-02 2013-04-12
3       D 2013-07-15 2013-07-01

Is there any better way of getting the desired output? Any suggestions would be appreciated. Thanks!
Answer
Try as follows:
result = df.loc[abs(df.sat_date - df.act_date).dt.days>=10]
print(result)

  student   sat_date   act_date
0       A 2013-04-01 2013-04-12
2       C 2013-05-02 2013-04-12
3       D 2013-07-15 2013-07-01

Or maybe nicer:
df.loc[abs(df.sat_date - df.act_date).ge(pd.Timedelta(days=10))]
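
For anyone who wants to run the comparison end to end, here is a minimal self-contained sketch. It rebuilds the question's data with pd.to_datetime (a shortcut, not what the question used) and wraps the absolute-difference filter in a helper. The name select_far_apart and its min_days parameter are made up for illustration. It then checks the helper's result against the two-sided comparison from the question.

```python
import pandas as pd

# Same data as in the question, built with pd.to_datetime for brevity.
df = pd.DataFrame({
    'student': list('ABCDE'),
    'sat_date': pd.to_datetime(['2013-04-01', '2013-05-01', '2013-05-02',
                                '2013-07-15', '2013-08-01']),
    'act_date': pd.to_datetime(['2013-04-12', '2013-05-02', '2013-04-12',
                                '2013-07-01', '2013-08-02']),
})


def select_far_apart(frame, min_days=10):
    """Hypothetical helper: keep rows whose two dates differ by at least min_days."""
    # Take the absolute value first so the direction of the gap does not matter.
    gap = (frame['sat_date'] - frame['act_date']).abs()
    return frame.loc[gap >= pd.Timedelta(days=min_days)]


# The question's two-sided comparison, kept for reference.
two_sided = df[(df['sat_date'] >= df['act_date'] + pd.Timedelta(days=10))
               | (df['sat_date'] <= df['act_date'] - pd.Timedelta(days=10))]

result = select_far_apart(df)
print(result['student'].tolist())
print(result.equals(two_sided))
```

On this data both forms select the same rows, so the choice is mostly about readability: the absolute-difference version states the "either direction" condition once instead of twice.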