Current df:
Date Power 2011-04-18 17:00:00 243.56 2011-04-18 17:00:01 245.83 2011-04-18 17:00:02 246.02 2011-04-18 17:00:03 245.72 2011-04-18 17:00:04 244.71 2011-04-18 17:00:05 245.93 2011-04-18 17:00:06 243.12 2011-04-18 17:00:07 244.72 2011-04-18 17:00:08 242.44 2011-04-18 17:00:09 246.42 2011-04-18 17:00:10 245.02 ... ...
I have the df with Date and a float number. Date is the index and is unique. I would like to create a new df based on the dates found in the next df.
date start date end 0 2011-04-18 17:00:01 2011-04-18 17:00:02 1 2011-04-18 17:00:05 2011-04-18 17:00:06 2 2011-04-18 17:00:08 2011-04-18 17:00:10 ... ... ...
I expect to get:
Date Power 2011-04-18 17:00:01 245.83 2011-04-18 17:00:02 246.02 2011-04-18 17:00:05 245.93 2011-04-18 17:00:06 243.12 2011-04-18 17:00:08 242.44 2011-04-18 17:00:09 246.42 2011-04-18 17:00:10 245.02 ... ...
In other word I want to filter the initial df and find all rows between all the dates found in the second df.
I thought of using pandas.DataFrame.between_time. But the issue is this works only for 1 given date start and date end. How can I do this with many different date periods?
Advertisement
Answer
Use np.logical_or.reduce with list comprehension:
L = [df1['Date'].between(s, e) for s, e in df2[['date start','date end']].to_numpy()]
df = df1[np.logical_or.reduce(L)]
print (df)
Date Power
1 2011-04-18 17:00:01 245.83
2 2011-04-18 17:00:02 246.02
5 2011-04-18 17:00:05 245.93
6 2011-04-18 17:00:06 243.12
8 2011-04-18 17:00:08 242.44
9 2011-04-18 17:00:09 246.42
10 2011-04-18 17:00:10 245.02
If DatetimeIndex is possible use:
L = [df1[s:e] for s, e in df2[['date start','date end']].to_numpy()]
df = pd.concat(L)
print (df)
Power
Date
2011-04-18 17:00:01 245.83
2011-04-18 17:00:02 246.02
2011-04-18 17:00:05 245.93
2011-04-18 17:00:06 243.12
2011-04-18 17:00:08 242.44
2011-04-18 17:00:09 246.42
2011-04-18 17:00:10 245.02
L = [(df1.index >= s) & (df1.index <= e)
for s, e in df2[['date start','date end']].to_numpy()]
df = df1[np.logical_or.reduce(L)]