Current df: I have the df with Date and a float number. Date is the index and is unique. I would like to create a new df based on the dates found in the next df. I expect to get: In other word I want to filter the initial df and find all rows between all the dates found in

Pandas select dataframe rows between multiple date times

Current df:

Date                 Power
2011-04-18 17:00:00  243.56
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:03  245.72
2011-04-18 17:00:04  244.71
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:07  244.72
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02
...                     ...

JavaScript
​x
 
Date                 Power
2011-04-18 17:00:00  243.56
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:03  245.72
2011-04-18 17:00:04  244.71
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:07  244.72
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02
...                     ...
​

I have the df with Date and a float number. Date is the index and is unique. I would like to create a new df based on the dates found in the next df.

     date start          date end
0    2011-04-18 17:00:01 2011-04-18 17:00:02
1    2011-04-18 17:00:05 2011-04-18 17:00:06
2    2011-04-18 17:00:08 2011-04-18 17:00:10
...                  ...                 ...

JavaScript
 
     date start          date end
0    2011-04-18 17:00:01 2011-04-18 17:00:02
1    2011-04-18 17:00:05 2011-04-18 17:00:06
2    2011-04-18 17:00:08 2011-04-18 17:00:10
...                  ...                 ...
​

I expect to get:

Date                 Power
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02
...                     ...

JavaScript
 
Date                 Power
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02
...                     ...
​

In other word I want to filter the initial df and find all rows between all the dates found in the second df.

I thought of using pandas.DataFrame.between_time. But the issue is this works only for 1 given date start and date end. How can I do this with many different date periods?

Answer

Use np.logical_or.reduce with list comprehension:

L = [df1['Date'].between(s, e) for s, e in df2[['date start','date end']].to_numpy()]

df = df1[np.logical_or.reduce(L)]
print (df)
                  Date   Power
1  2011-04-18 17:00:01  245.83
2  2011-04-18 17:00:02  246.02
5  2011-04-18 17:00:05  245.93
6  2011-04-18 17:00:06  243.12
8  2011-04-18 17:00:08  242.44
9  2011-04-18 17:00:09  246.42
10 2011-04-18 17:00:10  245.02

JavaScript
 
L = [df1['Date'].between(s, e) for s, e in df2[['date start','date end']].to_numpy()]
​
df = df1[np.logical_or.reduce(L)]
print (df)
                  Date   Power
1  2011-04-18 17:00:01  245.83
2  2011-04-18 17:00:02  246.02
5  2011-04-18 17:00:05  245.93
6  2011-04-18 17:00:06  243.12
8  2011-04-18 17:00:08  242.44
9  2011-04-18 17:00:09  246.42
10 2011-04-18 17:00:10  245.02
​

If DatetimeIndex is possible use:

L = [df1[s:e] for s, e in df2[['date start','date end']].to_numpy()]

df = pd.concat(L)
print (df)
                      Power
Date                       
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02



L = [(df1.index >= s) & (df1.index <= e) 
     for s, e in df2[['date start','date end']].to_numpy()]

df = df1[np.logical_or.reduce(L)]

JavaScript
 
L = [df1[s:e] for s, e in df2[['date start','date end']].to_numpy()]
​
df = pd.concat(L)
print (df)
                      Power
Date                       
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02
​
​
​
L = [(df1.index >= s) & (df1.index <= e) 
     for s, e in df2[['date start','date end']].to_numpy()]
​
df = df1[np.logical_or.reduce(L)]
​

Advertisement

Answer