I have a dataframe that looks similar to this :
Price From To 300€ 2020-01-01 2020-01-07 250€ 2020-01-04 2020-01-08 150€ 2020-02-01 2020-02-04 350€ 2020-02-04 2020-02-08
And then I have a list of dates. For example: list = [2020-01-03, 2020-02-04]
I would like to keep only the rows of the dataframe where the dates are in between the From
column and the To
column.
So, after transformation I would have the following dataframe.
Price From To 300€ 2020-01-01 2020-01-07 150€ 2020-02-01 2020-02-04 350€ 2020-02-04 2020-02-08
First I thought of using a lambda with an apply but I thought it was not very efficient because my dataset is very large. Is there a simpler way to do this with pandas ?
The result would be contained in one single dataframe
Advertisement
Answer
Let’s try with numpy broadcasting
:
x, y = df[['From', 'To']].values.T a = np.array(['2020-01-03', '2020-02-04'], dtype=np.datetime64) mask = ((x[:, None] <= a) & (y[:, None] >= a)).any(1)
df[mask] Price From To 0 300€ 2020-01-01 2020-01-07 2 150€ 2020-02-01 2020-02-04 3 350€ 2020-02-04 2020-02-08