I’m trying to figure out how to check if a date in one column is between the dates from two separate columns. I checked Check if a date column is in a range of dates – pandas but it wasn’t quite the same problem.
I use the unique identifier of each row to know if that row’s date needs to be checked.
If the date is between the two dates, i’d like the unique identifier of that row to be appeneded to a list.
I’m using ‘index’ in my example as the unique identifier.
import pandas as pd date_dict = {'check_date': ['10/31/2019 10:00 PM', '11/10/2012 06:02 PM', '08/06/2008 02:02 PM', '05/13/2009 12:19 PM', '04/19/2008 07:38 PM', '10/08/2012 01:12 PM', '11/29/2012 09:41 AM', '08/03/2016 02:05 AM', '05/15/2015 12:31 AM', '04/05/2016 10:21 AM', '09/26/2018 02:02 PM', '11/13/2014 02:09 AM', '02/28/2014 09:58 AM', '10/02/2015 08:25 PM', '08/21/2008 06:31 AM', '05/31/2017 03:48 AM', '12/16/2010 10:39 PM', '12/05/2008 08:57 AM', '08/18/2010 10:35 PM', '07/06/2010 12:25 AM', '06/14/2013 07:27 AM', '09/27/2015 11:06 PM', '07/03/2014 01:02 AM', '09/18/2009 04:26 PM', '01/21/2016 10:56 PM'], 'start_date': ['02/24/2012 12:57 PM', '09/25/2017 11:35 PM', '07/05/2015 10:58 PM', '04/26/2017 04:26 AM', '09/03/2010 10:50 AM', '07/08/2017 10:17 AM', '06/14/2011 02:19 AM', '03/21/2009 10:11 AM', '10/22/2012 12:39 AM', '11/09/2008 05:20 PM', '12/31/2012 08:51 PM', '08/26/2013 01:03 PM', '05/21/2014 01:48 AM', '11/11/2009 11:55 PM', '04/23/2012 10:14 AM', '11/23/2009 09:26 AM', '08/20/2010 02:13 PM', '08/09/2019 01:00 AM', '01/06/2010 03:06 PM', '02/23/2016 08:23 PM', '10/30/2019 03:20 AM', '06/12/2013 06:25 PM', '02/03/2019 05:46 PM', '08/07/2011 02:50 PM', '06/18/2013 03:59 AM'], 'end_date': ['09/06/2014 03:03 AM', '08/24/2012 12:30 PM', '05/29/2008 05:48 AM', '12/31/2014 01:00 AM', '12/06/2011 05:47 PM', '04/28/2013 07:01 PM', '09/17/2017 02:21 AM', '06/23/2008 03:45 PM', '01/24/2011 03:04 PM', '08/05/2015 02:10 AM', '12/12/2018 11:50 AM', '08/23/2016 06:31 AM', '11/21/2018 08:49 AM', '12/05/2009 03:31 PM', '04/16/2010 09:24 PM', '09/08/2012 12:29 PM', '11/09/2009 08:08 AM', '11/13/2016 04:21 AM', '07/17/2018 12:05 PM', '05/03/2012 06:27 AM', '09/04/2012 09:11 PM', '06/26/2014 06:55 AM', '09/19/2016 08:48 PM', '05/02/2018 09:03 AM', '03/22/2015 04:20 AM']} df = pd.DataFrame(date_dict) df.reset_index(inplace = True) df['flag'] = np.where(df['index'] % 2 == 0, 1, 0) df_list = list(df[df['flag'] == 1]['index']) analyst_list = [] for flag in df_list: min_date = df[df['index'] == flag]['check_date'] for index, row in df.iterrows(): start = row['start_date'] end = row['end_date'] if min_date > start and min_date <= end : analyst_list.append(row['index']) else: pass
When I run the above code, I get the following error that I haven’t been able to get past.
Traceback (most recent call last): File "<ipython-input-112-fecfeaa05d6d>", line 8, in <module> if min_date > start and min_date <= end : File "C:UsersJORDAN.HOWELL.GITDIRAppDataLocalContinuumanaconda3envsstan_envlibsite-packagespandascoregeneric.py", line 1330, in __nonzero__ f"The truth value of a {type(self).__name__} is ambiguous. "
I’m not sure what about the dates are ambiguous. I tried adding .values
to the row['start_date']
and row['end_date']
and ‘row[‘check_date’]` and that did not help.
Does anyone know how to do this or what my issue is?
Advertisement
Answer
use between
after you convert your columns to datetime
df = df.apply(pd.to_datetime) df[df['check_date'].between(df['start_date'], df['end_date'])].index # -> Int64Index([6, 10, 11, 18], dtype='int64') check_date start_date end_date 6 2012-11-29 09:41:00 2011-06-14 02:19:00 2017-09-17 02:21:00 10 2018-09-26 14:02:00 2012-12-31 20:51:00 2018-12-12 11:50:00 11 2014-11-13 02:09:00 2013-08-26 13:03:00 2016-08-23 06:31:00 18 2010-08-18 22:35:00 2010-01-06 15:06:00 2018-07-17 12:05:00
update
# convert to datetime df.loc[:, ['check_date', 'start_date', 'end_date']] = df[['check_date', 'start_date', 'end_date']].apply(pd.to_datetime) # filter for flag flag = df[df['flag'] == 1].copy() # list comprehension to check if each date is between each rows star and end [flag[(date >= flag['start_date']) & (date <= flag['end_date'])].index.tolist() for date in flag['check_date']] [[], [], [], [0, 6, 18], [6, 10, 12, 18], [10, 12], [0, 6, 10, 18, 24], [], [4, 18], [18], [0, 6, 10, 18], [0, 6, 10, 12, 18, 24], [6, 10, 12, 18]]
Or assign the list back to the frame
flag['Check'] = [flag[(date >= flag['start_date']) & (date <= flag['end_date'])].index.tolist() for date in flag['check_date']]
or dict comprehension instead of list
{date: flag[(date >= flag['start_date']) & (date <= flag['end_date'])].index.tolist() for date in flag['check_date']} {Timestamp('2019-10-31 22:00:00'): [], Timestamp('2008-08-06 14:02:00'): [], Timestamp('2008-04-19 19:38:00'): [], Timestamp('2012-11-29 09:41:00'): [0, 6, 18], Timestamp('2015-05-15 00:31:00'): [6, 10, 12, 18], Timestamp('2018-09-26 14:02:00'): [10, 12], Timestamp('2014-02-28 09:58:00'): [0, 6, 10, 18, 24], Timestamp('2008-08-21 06:31:00'): [], Timestamp('2010-12-16 22:39:00'): [4, 18], Timestamp('2010-08-18 22:35:00'): [18], Timestamp('2013-06-14 07:27:00'): [0, 6, 10, 18], Timestamp('2014-07-03 01:02:00'): [0, 6, 10, 12, 18, 24], Timestamp('2016-01-21 22:56:00'): [6, 10, 12, 18]}