
How to drop duplicate rows based on a time delta whilst keeping the latest occurrence of that record?

I have a table in the form:

ID DATE_ENCOUNTER LOAD
151336 2017-08-22 40
151336 2017-08-23 40
151336 2017-08-24 40
151336 2017-08-25 40
151336 2017-09-05 50
151336 2017-09-06 50
151336 2017-10-16 51
151336 2017-10-17 51
151336 2017-10-18 51
151336 2017-10-30 50
151336 2017-10-31 50
151336 2017-11-01 50
151336 2017-12-13 62
151336 2018-01-03 65
151336 2018-02-09 60

Although the dates are not the same, some records are duplicates (they fall within a 4-day delta of each other). How do I drop the duplicates (the earliest records) in a dataframe when the timestamps/dates are close (within a 4-day delta) but not identical? The result should look like the table below:

ID DATE_ENCOUNTER LOAD
151336 2017-08-25 40
151336 2017-09-06 50
151336 2017-10-18 51
151336 2017-11-01 50
151336 2017-12-13 62
151336 2018-01-03 65
151336 2018-02-09 60

I have tried:

JavaScript

Here is some code to generate the dataframe:

JavaScript
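A minimal sketch that rebuilds the sample table above with pandas. The column names and values come from the table; the variable name `df` and the conversion of the dates with `pd.to_datetime` are assumptions.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [151336] * 15,
    "DATE_ENCOUNTER": pd.to_datetime([
        "2017-08-22", "2017-08-23", "2017-08-24", "2017-08-25",
        "2017-09-05", "2017-09-06",
        "2017-10-16", "2017-10-17", "2017-10-18",
        "2017-10-30", "2017-10-31", "2017-11-01",
        "2017-12-13", "2018-01-03", "2018-02-09",
    ]),
    "LOAD": [40, 40, 40, 40, 50, 50, 51, 51, 51, 50, 50, 50, 62, 65, 60],
})

print(df)
```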

Note that the real data has more fields and more IDs.


Answer

You can use:

JavaScript
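One possible approach, given as a sketch and not necessarily the code originally posted here: sort by `ID` and `DATE_ENCOUNTER`, measure the gap from each record to the next record of the same `ID`, and keep a row only when that gap is more than 4 days or when it is the last record for its `ID`. The function name `drop_close_duplicates` and the parameter `max_gap` are hypothetical names chosen for illustration.

```python
import pandas as pd


def drop_close_duplicates(df, max_gap="4D"):
    """Keep the latest row of each run of dates at most `max_gap` apart, per ID.

    `drop_close_duplicates` and `max_gap` are illustrative names, not pandas API.
    """
    df = df.sort_values(["ID", "DATE_ENCOUNTER"])
    # Gap between each record and the NEXT record of the same ID.
    # diff(-1) gives (current - next), so take the absolute value.
    # The last record of each ID has no successor and gets NaT.
    gap_to_next = df.groupby("ID")["DATE_ENCOUNTER"].diff(-1).abs()
    # A row survives if nothing follows it, or the next record is far enough away.
    keep = gap_to_next.isna() | (gap_to_next > pd.Timedelta(max_gap))
    return df[keep].reset_index(drop=True)


# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [151336] * 15,
    "DATE_ENCOUNTER": pd.to_datetime([
        "2017-08-22", "2017-08-23", "2017-08-24", "2017-08-25",
        "2017-09-05", "2017-09-06",
        "2017-10-16", "2017-10-17", "2017-10-18",
        "2017-10-30", "2017-10-31", "2017-11-01",
        "2017-12-13", "2018-01-03", "2018-02-09",
    ]),
    "LOAD": [40, 40, 40, 40, 50, 50, 51, 51, 51, 50, 50, 50, 62, 65, 60],
})

result = drop_close_duplicates(df)
print(result)
```

On the sample data this leaves the seven rows shown in the expected table in the question. Because each record is compared only with the next one, a long run of records that are each within 4 days of their neighbour collapses to a single row even if the whole run spans more than 4 days; if that is not wanted, the grouping rule would need to change. Other columns (such as `LOAD`) are not taken into account when deciding what counts as a duplicate.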

output:

JavaScript
User contributions licensed under: CC BY-SA