How to remove datetime values in a row that are within a certain time relative to another row?

Question

If I have a DataFrame as below:

	Letter	Time
0	x	2021-01-01 14:00:00
1	y	2021-01-01 18:00:00
2	y	2021-01-03 14:00:00

How would I delete a row if the value in its Time column (datetime) is within, say, 14 hours of the time in the row above? I've tried using: but I get KeyError 1 in relation to the line if

Accepted Answer

If a timestamp is within 14 hours of an earlier timestamp, does its removal depend on whether the earlier timestamp is itself removed? This answer covers the case where the answer to that question is "yes". (If the answer is "no", the result for the test data below would be the first timestamp only.)

Setup

Test data:

    import pandas as pd
    timestamps = pd.Series([0, 6, 10, 14, 16, 29, 33, 45, 46]).apply(pd.Timedelta, unit="hours") + pd.Timestamp("2022")

timestamps looks like this:

    0   2022-01-01 00:00:00
    1   2022-01-01 06:00:00
    2   2022-01-01 10:00:00
    3   2022-01-01 14:00:00
    4   2022-01-01 16:00:00
    5   2022-01-02 05:00:00
    6   2022-01-02 09:00:00
    7   2022-01-02 21:00:00
    8   2022-01-02 22:00:00
    dtype: datetime64[ns]

The solution we are aiming for consists of the 1st, 4th, 6th and 8th timestamps.

Solution

This solution uses the piso (pandas interval set operations) package. The idea is to create a 14-hour window, i.e. an interval, for each of your timestamps and iteratively remove timestamps that belong to intervals starting earlier.

    import piso

    # sort timestamps if not already sorted
    timestamps = timestamps.sort_values()

    # create a 14-hour window for each timestamp; can be left-closed or right-closed, but not both
    intervals = pd.IntervalIndex.from_arrays(timestamps, timestamps + pd.Timedelta("14h"))

    # create the "disjoint adjacency matrix", which indicates pairwise whether intervals are disjoint
    mat = piso.adjacency_matrix(intervals, edges="disjoint")

mat will be a dataframe whose index and columns are the intervals, one per timestamp.
mat.values looks like this:

    array([[False, False, False,  True,  True,  True,  True,  True,  True],
           [False, False, False, False, False,  True,  True,  True,  True],
           [False, False, False, False, False,  True,  True,  True,  True],
           [ True, False, False, False, False,  True,  True,  True,  True],
           [ True, False, False, False, False, False,  True,  True,  True],
           [ True,  True,  True,  True, False, False, False,  True,  True],
           [ True,  True,  True,  True,  True, False, False, False, False],
           [ True,  True,  True,  True,  True,  True, False, False, False],
           [ True,  True,  True,  True,  True,  True, False, False, False]])

Set the diagonal of this matrix to True:

    mat.iloc[range(len(mat)), range(len(mat))] = True

We will start with the first interval. From the first row of mat you can deduce that the second and third intervals need to be dropped. So we filter out the rows and columns corresponding to these intervals, then move to the next interval (row), and so on until we reach the last row. Note that we do not need to check any intersections for the last row.

    i = 0
    while i < len(mat) - 1:
        mat = mat.loc[mat.iloc[i], mat.iloc[i]]
        i += 1

The result will be a dataframe whose values are all True. More importantly, the index (and columns) will be intervals whose left endpoints are the timestamps remaining after removing those within 14 hours, i.e. pd.Series(mat.index.left) gives

    0   2022-01-01 00:00:00
    1   2022-01-01 14:00:00
    2   2022-01-02 05:00:00
    3   2022-01-02 21:00:00
    dtype: datetime64[ns]

You can use this to filter your original dataframe using pandas.Series.isin.

Note: I am the creator of piso. Please feel free to reach out with feedback or questions if you have any.
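The answer stops short of showing the final filtering step, so here is a minimal sketch of it. To keep the example runnable without piso, the surviving timestamps are computed with a plain loop rather than the adjacency-matrix method above. The helper names keep_spaced and keep_far_from_previous are hypothetical and not part of pandas or piso. The first helper covers the "yes" case (a timestamp is only compared with the last one that was kept), the second covers the "no" case (a timestamp is compared with the row directly above it, kept or not).

```python
import pandas as pd

WINDOW = pd.Timedelta("14h")


def keep_spaced(times: pd.Series, window: pd.Timedelta = WINDOW) -> pd.Series:
    """'Yes' case: keep a timestamp only if it is at least `window`
    after the most recently kept timestamp (hypothetical helper)."""
    kept = []
    for ts in times.sort_values():
        if not kept or ts - kept[-1] >= window:
            kept.append(ts)
    return pd.Series(kept, dtype=times.dtype)


def keep_far_from_previous(times: pd.Series, window: pd.Timedelta = WINDOW) -> pd.Series:
    """'No' case: compare each timestamp with the one directly before it,
    whether or not that one is dropped (hypothetical helper)."""
    times = times.sort_values()
    gaps = times.diff()  # NaT for the first row, which is always kept
    return times[gaps.isna() | (gaps >= window)].reset_index(drop=True)


# Test data from the answer: offsets in hours from 2022-01-01 00:00
hours = [0, 6, 10, 14, 16, 29, 33, 45, 46]
timestamps = pd.Series(pd.to_timedelta(hours, unit="h") + pd.Timestamp("2022"))

spaced = keep_spaced(timestamps)               # 1st, 4th, 6th and 8th timestamps
first_only = keep_far_from_previous(timestamps)  # first timestamp only

# DataFrame from the question
df = pd.DataFrame(
    {
        "Letter": ["x", "y", "y"],
        "Time": pd.to_datetime(
            ["2021-01-01 14:00:00", "2021-01-01 18:00:00", "2021-01-03 14:00:00"]
        ),
    }
)

# Filter the original rows with Series.isin; rows 0 and 2 remain
filtered = df[df["Time"].isin(keep_spaced(df["Time"]))]
print(filtered)
```

If you use the piso approach instead, pd.Series(mat.index.left) can take the place of keep_spaced(df["Time"]) in the last step. Note that in both sketches a gap of exactly 14 hours counts as outside the window, which matches the half-open intervals used above.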

