Current df:
Date                 Power
2011-04-18 17:00:00  243.56
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:03  245.72
2011-04-18 17:00:04  244.71
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:07  244.72
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02
...                  ...
I have a df with a Date index (unique) and a float Power column. I would like to build a new df based on the date ranges found in the following df:
   date start           date end
0  2011-04-18 17:00:01  2011-04-18 17:00:02
1  2011-04-18 17:00:05  2011-04-18 17:00:06
2  2011-04-18 17:00:08  2011-04-18 17:00:10
...                     ...
I expect to get:
Date                 Power
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02
...                  ...
In other words, I want to filter the initial df and keep all rows that fall between any of the start/end pairs found in the second df.
I thought of using pandas.DataFrame.between_time, but that only handles a single start and end (and it selects by time of day, not by full timestamps). How can I do this with many different date periods?
Answer
Use np.logical_or.reduce with a list comprehension, one boolean mask per date pair:
L = [df1['Date'].between(s, e) for s, e in df2[['date start','date end']].to_numpy()]
df = df1[np.logical_or.reduce(L)]
print (df)

                  Date   Power
1  2011-04-18 17:00:01  245.83
2  2011-04-18 17:00:02  246.02
5  2011-04-18 17:00:05  245.93
6  2011-04-18 17:00:06  243.12
8  2011-04-18 17:00:08  242.44
9  2011-04-18 17:00:09  246.42
10 2011-04-18 17:00:10  245.02
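For reference, here is the same approach run end-to-end on the sample data from the question (values copied from the posted frames):

```python
import numpy as np
import pandas as pd

# Rebuild the sample frames from the question
idx = pd.date_range('2011-04-18 17:00:00', periods=11, freq='s')
df1 = pd.DataFrame({'Date': idx,
                    'Power': [243.56, 245.83, 246.02, 245.72, 244.71,
                              245.93, 243.12, 244.72, 242.44, 246.42, 245.02]})
df2 = pd.DataFrame({'date start': pd.to_datetime(['2011-04-18 17:00:01',
                                                  '2011-04-18 17:00:05',
                                                  '2011-04-18 17:00:08']),
                    'date end':   pd.to_datetime(['2011-04-18 17:00:02',
                                                  '2011-04-18 17:00:06',
                                                  '2011-04-18 17:00:10'])})

# One inclusive boolean mask per (start, end) pair, OR-ed together
L = [df1['Date'].between(s, e)
     for s, e in df2[['date start', 'date end']].to_numpy()]
out = df1[np.logical_or.reduce(L)]
print(out)
```

Series.between is inclusive on both ends by default, which matches the expected output above.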
If Date is a DatetimeIndex, label-based slicing is possible:
L = [df1[s:e] for s, e in df2[['date start','date end']].to_numpy()]
df = pd.concat(L)
print (df)

                      Power
Date
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02

Or build the boolean masks from the index instead:

L = [(df1.index >= s) & (df1.index <= e) for s, e in df2[['date start','date end']].to_numpy()]
df = df1[np.logical_or.reduce(L)]
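The slicing variant is also runnable end-to-end on the question's sample data (same values, this time with Date as the index):

```python
import pandas as pd

# Rebuild the sample frames, with Date as a DatetimeIndex
idx = pd.date_range('2011-04-18 17:00:00', periods=11, freq='s')
df1 = pd.DataFrame({'Power': [243.56, 245.83, 246.02, 245.72, 244.71,
                              245.93, 243.12, 244.72, 242.44, 246.42, 245.02]},
                   index=idx)
df1.index.name = 'Date'
df2 = pd.DataFrame({'date start': pd.to_datetime(['2011-04-18 17:00:01',
                                                  '2011-04-18 17:00:05',
                                                  '2011-04-18 17:00:08']),
                    'date end':   pd.to_datetime(['2011-04-18 17:00:02',
                                                  '2011-04-18 17:00:06',
                                                  '2011-04-18 17:00:10'])})

# Label slicing on a DatetimeIndex includes both endpoints
L = [df1[s:e] for s, e in df2[['date start', 'date end']].to_numpy()]
out = pd.concat(L)
print(out)
```

Note that if the ranges in df2 ever overlap, pd.concat would duplicate rows; the boolean-mask version avoids that.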