How to remove datetime values in a row that are within a certain time relative to another row?

If I have a DataFrame as below:

  Letter                Time
0      x 2021-01-01 14:00:00
1      y 2021-01-01 18:00:00
2      y 2021-01-03 14:00:00

How would I delete a row if its value in the Time column (datetime) is within, say, 14 hours of the time in the row above?

I’ve tried using:

[code block missing from source]

but I get KeyError: 1 on the line

if df.at[i, 'Time'] - df.at[i-1, 'Time'] < timedelta(hours=14):


Answer

If a timestamp is within 14 hours of an earlier timestamp, does its removal depend on whether that earlier timestamp was itself removed? This answer assumes the answer is “yes”: a timestamp is compared only against timestamps that are kept. (If the answer is “no”, the result for the test data below would be the first timestamp only.)
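To make the difference between the two readings concrete, here is a minimal sketch in plain Python (no pandas or piso). The function names and the three timestamps are made up for illustration and are not the test data used below. Both functions assume the timestamps are already sorted, and both treat "within 14 hours" as a gap strictly less than 14 hours, as in the question's comparison.

```python
from datetime import datetime, timedelta


def drop_chained(times, window):
    """'Yes' reading: compare each timestamp with the last one that was KEPT."""
    kept = []
    for t in times:
        if not kept or t - kept[-1] >= window:
            kept.append(t)
    return kept


def drop_unchained(times, window):
    """'No' reading: compare each timestamp with the previous one, kept or not."""
    kept = []
    for i, t in enumerate(times):
        if i == 0 or t - times[i - 1] >= window:
            kept.append(t)
    return kept


# Hypothetical data: three timestamps, 10 hours apart.
times = [
    datetime(2021, 1, 1, 0),
    datetime(2021, 1, 1, 10),
    datetime(2021, 1, 1, 20),
]
window = timedelta(hours=14)

chained = drop_chained(times, window)      # keeps 00:00 and 20:00
unchained = drop_unchained(times, window)  # keeps 00:00 only
```

Under the "yes" reading the 20:00 timestamp survives because the 10:00 one, its only close neighbour, has already been dropped.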

setup

test data:

[code block missing from source]

timestamps looks like this:

[code block missing from source]

The solution we are aiming for consists of the 1st, 4th, 6th and 8th timestamps.

solution

This solution uses the piso (pandas interval set operations) package. The idea is to create a 14-hour window, i.e. an interval, for each of your timestamps and iteratively remove timestamps that fall inside an interval starting earlier.
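As a rough illustration of the windowing idea, the sketch below builds one 14-hour interval per timestamp and a pairwise overlap matrix using plain pandas and numpy rather than piso. It uses the three timestamps from the question. The variable names and the choice of left-closed intervals are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# The three timestamps from the question.
timestamps = pd.Series(pd.to_datetime([
    "2021-01-01 14:00:00",
    "2021-01-01 18:00:00",
    "2021-01-03 14:00:00",
]))
window = pd.Timedelta(hours=14)

# One window per timestamp: [t, t + 14h). Left-closed is an assumption here.
intervals = pd.IntervalIndex.from_arrays(
    timestamps, timestamps + window, closed="left"
)

# overlaps[i, j] is True when window i and window j intersect.
# Every interval overlaps itself, so the diagonal is True.
overlaps = np.array([intervals.overlaps(iv) for iv in intervals])
```

Here only the first two windows intersect, which is why the 18:00 row is the one to drop.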

[code block missing from source]

mat will be a dataframe whose index and columns are the 14-hour intervals, one per timestamp. mat.values looks like this:

[code block missing from source]

Set the diagonal of this matrix to True:

[code block missing from source]

We will start with the first interval. From the first row of mat you can deduce that the second and third intervals need to be dropped. So we filter out the rows and columns corresponding to these intervals, then move to the next remaining interval (row), and so on until we reach the last row. Note that we do not need to check any intersections for the last row.
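The row-by-row filtering described above can be sketched without piso as follows. This is not the answer's original code: the matrix is built with numpy broadcasting, prune is a hypothetical helper name, and the eight timestamps are invented so that the kept positions match the pattern described earlier (1st, 4th, 6th and 8th).

```python
import numpy as np
import pandas as pd


def prune(timestamps, window):
    """Drop every timestamp that is less than `window` after the last kept one."""
    ts = pd.Series(timestamps).sort_values().reset_index(drop=True)
    values = ts.to_numpy()

    # far[i, j] is True when timestamps i and j are at least `window` apart.
    far = np.abs(values[:, None] - values[None, :]) >= window.to_timedelta64()
    np.fill_diagonal(far, True)  # a timestamp never rules itself out
    mat = pd.DataFrame(far, index=ts, columns=ts)

    i = 0
    while i < len(mat) - 1:  # the last row needs no check
        keep = mat.iloc[i].to_numpy()  # what survives the i-th kept timestamp
        mat = mat.loc[keep, keep]      # filter rows and columns together
        i += 1
    return pd.Series(mat.index)


# Hypothetical test data: hour offsets from 2021-01-01 00:00.
offsets = [0, 5, 10, 15, 25, 30, 40, 45]
timestamps = pd.Series(pd.Timestamp("2021-01-01") + pd.to_timedelta(offsets, unit="h"))

result = prune(timestamps, pd.Timedelta(hours=14))
# Kept offsets: 0, 15, 30 and 45 hours (1st, 4th, 6th and 8th timestamps).
```

Because rows already visited are always retained by later filters, position i keeps pointing at the same timestamp after each filtering step, so incrementing i moves to the next surviving row.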

[code block missing from source]

The result will be a dataframe whose values are all True. More importantly, its index (and columns) will be the intervals whose left endpoints are the timestamps that remain after removing those within 14 hours of an earlier kept timestamp.

i.e. pd.Series(mat.index.left) gives:

[code block missing from source]

You can use this to filter your original dataframe with pandas.Series.isin.
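For example, applied to the dataframe from the question, the filtering step might look like the sketch below. The remaining series is written out by hand here and stands in for pd.Series(mat.index.left). Its contents are an assumption about what the pruning step returns for this data (the 18:00 row is 4 hours after the 14:00 row, so it is dropped).

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "Letter": ["x", "y", "y"],
    "Time": pd.to_datetime([
        "2021-01-01 14:00:00",
        "2021-01-01 18:00:00",
        "2021-01-03 14:00:00",
    ]),
})

# Stand-in for pd.Series(mat.index.left): the timestamps that survive pruning.
remaining = pd.Series(pd.to_datetime([
    "2021-01-01 14:00:00",
    "2021-01-03 14:00:00",
]))

# Keep only the rows whose Time is among the surviving timestamps.
filtered = df[df["Time"].isin(remaining)]
```

Note that isin keeps every row that shares a surviving timestamp, so duplicate times in the original dataframe would all be retained.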

note: I am the creator of piso. Please feel free to reach out with feedback or questions if you have any.

User contributions licensed under: CC BY-SA