How do I drop duplicate rows based on a time delta while keeping the latest occurrence of each record?

I have a table in the form:

ID DATE_ENCOUNTER LOAD
151336 2017-08-22 40
151336 2017-08-23 40
151336 2017-08-24 40
151336 2017-08-25 40
151336 2017-09-05 50
151336 2017-09-06 50
151336 2017-10-16 51
151336 2017-10-17 51
151336 2017-10-18 51
151336 2017-10-30 50
151336 2017-10-31 50
151336 2017-11-01 50
151336 2017-12-13 62
151336 2018-01-03 65
151336 2018-02-09 60

Although the dates are not identical, some records are effectively duplicates because they fall within a 4-day delta of one another. How do I drop these duplicates (the earlier records) from a dataframe when the timestamps/dates are close (within a 4-day delta) but not identical? The result should look like the table below:

ID DATE_ENCOUNTER LOAD
151336 2017-08-25 40
151336 2017-09-06 50
151336 2017-10-18 51
151336 2017-11-01 50
151336 2017-12-13 62
151336 2018-01-03 65
151336 2018-02-09 60

I have tried:

m = df.groupby('ID').DATE_ENCOUNTER.apply(lambda x: x.diff().dt.days < 4)
m2 = df.ID.duplicated(keep=False) & (m | m.shift(-1))
df_dedup2 = df[~m2]

Here is some code to generate the dataframe:

import pandas as pd
details = {
    'ID':[151336,151336,151336,151336,151336,151336,151336,151336,151336,151336,151336,151336,151336,151336,151336],
    'DATE_ENCOUNTER':['2017-08-22','2017-08-23','2017-08-24','2017-08-25','2017-09-05','2017-09-06','2017-10-16','2017-10-17','2017-10-18','2017-10-30','2017-10-31','2017-11-01','2017-12-13','2018-01-03','2018-02-09'],
    'LOAD':[40,40,40,40,50,50,51,51,51,50,50,50,62,65,60]
}
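df = pd.DataFrame(details)
# Parse the date strings so that .diff() and the .dt accessor work on this column
df['DATE_ENCOUNTER'] = pd.to_datetime(df['DATE_ENCOUNTER'])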

Note that the real data has more fields and more IDs.

Answer

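Keep a row only when the next record for the same ID is at least 4 days later, or when it is the last record for that ID. This assumes DATE_ENCOUNTER has a datetime dtype and that the rows are sorted by date within each ID. You can use: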

df[(df.groupby('ID')
      ['DATE_ENCOUNTER']
      .diff(-1).dt.days.mul(-1) # calculate the difference
      .fillna(float('inf'))     # make sure last row is kept
      .ge(4)                    # select diff >= 4
   )]

output:

        ID DATE_ENCOUNTER  LOAD
3   151336     2017-08-25    40
5   151336     2017-09-06    50
8   151336     2017-10-18    51
11  151336     2017-11-01    50
12  151336     2017-12-13    62
13  151336     2018-01-03    65
14  151336     2018-02-09    60
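
For reuse across more IDs and fields, the same idea can be wrapped in a small function. This is only a sketch: the function name drop_close_duplicates and its parameters id_col, date_col and min_gap_days are made up for illustration, and it assumes the date column can be parsed by pd.to_datetime. It converts and sorts the data first, then applies the mask from the answer.

```python
import pandas as pd

def drop_close_duplicates(df, id_col="ID", date_col="DATE_ENCOUNTER", min_gap_days=4):
    """Keep a row only if the next row for the same ID is at least
    min_gap_days later, or if it is the last row for that ID.
    (Hypothetical helper, not part of pandas.)"""
    out = df.copy()
    # Assumption: the date column may arrive as strings, so parse it first
    out[date_col] = pd.to_datetime(out[date_col])
    # The mask below relies on chronological order within each ID
    out = out.sort_values([id_col, date_col])
    # diff(-1) is "current minus next" (negative), so flip the sign
    gap_to_next = out.groupby(id_col)[date_col].diff(-1).dt.days.mul(-1)
    # The last row of each ID has no next row (NaN): treat the gap as infinite
    keep = gap_to_next.fillna(float("inf")).ge(min_gap_days)
    return out[keep]

# Sample data from the question
details = {
    "ID": [151336] * 15,
    "DATE_ENCOUNTER": [
        "2017-08-22", "2017-08-23", "2017-08-24", "2017-08-25",
        "2017-09-05", "2017-09-06", "2017-10-16", "2017-10-17",
        "2017-10-18", "2017-10-30", "2017-10-31", "2017-11-01",
        "2017-12-13", "2018-01-03", "2018-02-09",
    ],
    "LOAD": [40, 40, 40, 40, 50, 50, 51, 51, 51, 50, 50, 50, 62, 65, 60],
}
df = pd.DataFrame(details)

result = drop_close_duplicates(df)
print(result)
```

Note that the comparison is between consecutive rows only, so a long run of records each less than 4 days apart collapses to its final record, even if the first and last records of the run are more than 4 days apart.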
User contributions licensed under: CC BY-SA