I have a dataframe that looks similar to this :
JavaScript
x
6
1
Price From To
2
300€ 2020-01-01 2020-01-07
3
250€ 2020-01-04 2020-01-08
4
150€ 2020-02-01 2020-02-04
5
350€ 2020-02-04 2020-02-08
6
And then I have a list of dates. For example: list = [2020-01-03, 2020-02-04]
I would like to keep only the rows of the dataframe where the dates are in between the From
column and the To
column.
So, after transformation I would have the following dataframe.
JavaScript
1
5
1
Price From To
2
300€ 2020-01-01 2020-01-07
3
150€ 2020-02-01 2020-02-04
4
350€ 2020-02-04 2020-02-08
5
First I thought of using a lambda with an apply but I thought it was not very efficient because my dataset is very large. Is there a simpler way to do this with pandas ?
The result would be contained in one single dataframe
Advertisement
Answer
Let’s try with numpy broadcasting
:
JavaScript
1
4
1
x, y = df[['From', 'To']].values.T
2
a = np.array(['2020-01-03', '2020-02-04'], dtype=np.datetime64)
3
mask = ((x[:, None] <= a) & (y[:, None] >= a)).any(1)
4
JavaScript
1
7
1
df[mask]
2
3
Price From To
4
0 300€ 2020-01-01 2020-01-07
5
2 150€ 2020-02-01 2020-02-04
6
3 350€ 2020-02-04 2020-02-08
7