I have the following pd.DataFrame
df = pd.DataFrame({'admission_timestamp': ['2021-01-17 17:45:00', '2020-03-31 23:32:00', '2020-03-27 18:20:00', '2020-04-17 18:12:00', '2020-03-19 19:12:00'], 'end_period': ['2021-01-18 17:45:00', '2020-04-01 23:32:00', '2020-03-28 18:20:00', '2020-04-18 18:12:00', '2020-03-20 19:12:00'], 'start_med': ['NaT', '2020-04-01 00:00:00', '2020-03-27 19:00:00', '2020-04-17 18:39:24', 'NaT'], 'end_med': ['NaT', '2020-04-14 21:00:00', '2020-04-05 00:00:00', '2020-05-06 22:07:29', 'NaT']})
that looks like:
admission_timestamp end_period start_med end_med 1 2021-01-17 17:45:00 2021-01-18 17:45:00 NaT NaT 2 2020-03-31 23:32:00 2020-04-01 23:32:00 2020-04-01 00:00:00 2020-04-14 21:00:00 3 2020-03-27 18:20:00 2020-03-28 18:20:00 2020-03-27 19:00:00 2020-04-05 00:00:00 4 2020-04-17 18:12:00 2020-04-18 18:12:00 2020-04-17 18:39:24 2020-05-06 22:07:29 5 2020-03-19 19:12:00 2020-03-20 19:12:00 NaT NaT
I want to create a new column received_medidation
that states whether or not (boolean) the patient received medication between admission_timestamp
and end_period
(even if it was for only one second). So, the boolean should state if there is any time between admission_timestamp
and end_period
that overlaps with the time between start_med
and end_med
. The dtypes are all datetime64[ns].
I know that we can create boolean masks such as
condition = (df['date'] > start_date) & (df['date'] <= end_date)
… however I fail to understand how this could possibily solve the task above. Any help is appreciated.
Advertisement
Answer
If start_med
is guaranteed to be later than admission_timestamp
, then it suffices that start_med
date is between admission_timestamp
and end_period
for col in df.columns: df[col] = pd.to_datetime(df[col]) df['received_medidation'] = (df['admission_timestamp'] < df['start_med']) & (df['start_med'] < df['end_period'])
If however, start_med
can be before admission_timestamp
, then that means 'start_med' < 'admission_timestamp' < 'end_med'
also creates an intersection of dates. Then we include this case with the previous case using OR operator:
df['received_medidation'] = (df['start_med'].between(df['admission_timestamp'], df['end_period']) | df['admission_timestamp'].between(df['start_med'], df['end_med']))
Note: The overall assumption here is that it’s always true that admission_timestamp < end_period
and start_med < end_med
, in which case the above logical expression catches all intersecting dates.
Output:
admission_timestamp end_period start_med 0 2021-01-17 17:45:00 2021-01-18 17:45:00 NaT 1 2020-03-31 23:32:00 2020-04-01 23:32:00 2020-04-01 00:00:00 2 2020-03-27 18:20:00 2020-03-28 18:20:00 2020-03-27 19:00:00 3 2020-04-17 18:12:00 2020-04-18 18:12:00 2020-04-17 18:39:24 4 2020-03-19 19:12:00 2020-03-20 19:12:00 NaT end_med received_medidation 0 NaT False 1 2020-04-14 21:00:00 True 2 2020-04-05 00:00:00 True 3 2020-05-06 22:07:29 True 4 NaT False