
Tag: duplicates

How to drop duplicate rows based on a time delta while keeping the latest occurrence of that record?

I have a table in the form:

    ID      DATE_ENCOUNTER  LOAD
    151336  2017-08-22      40
    151336  2017-08-23      40
    151336  2017-08-24      40
    151336  2017-08-25      40
    151336  2017-09-05      50
    151336  2017-09-06      50
    151336  2017-10-16      51
    151336  2017-10-17      51
    151336  2017-10-18      51
    151336  2017-10-30      50
    151336  2017-10-31      50
    151336  2017-11-01      50
    151336  2017-12-13      62
    151336  2018-01-03      65
    151336  2018-02-09      60

Although the dates are not
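A minimal sketch of one way to approach this, assuming a pandas DataFrame with the columns above and a hypothetical 7-day delta: sort by date, start a new run whenever the LOAD changes or the gap between consecutive encounters exceeds the delta, and keep the last row of each run.

    import pandas as pd

    # Hypothetical reconstruction of the table from the question.
    df = pd.DataFrame({
        'ID': [151336] * 15,
        'DATE_ENCOUNTER': pd.to_datetime([
            '2017-08-22', '2017-08-23', '2017-08-24', '2017-08-25',
            '2017-09-05', '2017-09-06', '2017-10-16', '2017-10-17',
            '2017-10-18', '2017-10-30', '2017-10-31', '2017-11-01',
            '2017-12-13', '2018-01-03', '2018-02-09']),
        'LOAD': [40, 40, 40, 40, 50, 50, 51, 51, 51, 50, 50, 50, 62, 65, 60],
    })

    df = df.sort_values(['ID', 'DATE_ENCOUNTER'])
    # A new run starts when the LOAD changes or when the gap between
    # consecutive encounters exceeds the chosen delta (7 days here).
    gap = df.groupby('ID')['DATE_ENCOUNTER'].diff() > pd.Timedelta(days=7)
    changed = df['LOAD'].ne(df.groupby('ID')['LOAD'].shift())
    run_id = (gap | changed).cumsum()
    # Keep only the latest occurrence within each run.
    latest = df.groupby(run_id).tail(1)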

Summing duplicate rows

I have a database with more than 300 duplicates that look like this: I want, for each duplicate shipment_id, only original_cost to be added together while the rates remain as they are. For these duplicates: it should look something like this: Is there any way to do this? Answer Group by the duplicate values (['shipment_id', 'rate']) and use transform on
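Following the answer's hint, a minimal sketch with hypothetical data: transform('sum') broadcasts each group's total back to every member row, so original_cost is summed per ('shipment_id', 'rate') group while rate is left untouched.

    import pandas as pd

    # Hypothetical duplicates sharing shipment_id and rate.
    df = pd.DataFrame({
        'shipment_id':   [101, 101, 102],
        'rate':          [5.0, 5.0, 6.0],
        'original_cost': [20.0, 30.0, 15.0],
    })

    # transform('sum') keeps the frame's shape, writing each group's
    # total original_cost onto every row of that group.
    df['original_cost'] = (df.groupby(['shipment_id', 'rate'])['original_cost']
                             .transform('sum'))
    # Optionally collapse the duplicates afterwards.
    df = df.drop_duplicates(['shipment_id', 'rate'])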

Replace a series of repeated occurrences in a list?

I would like to replace the consecutive occurrences with the first appearance. For example, if I currently have a list, the desired output will be: I know that I can definitely do this by using a for loop to iterate through the list, but is there a more Pythonic way of doing it? Answer Without importing anything:
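Assuming the goal is to collapse each run of consecutive repeats down to its first element, a comprehension does it without importing anything (the example list is hypothetical):

    lst = [1, 1, 2, 2, 2, 3, 1, 1]
    # Keep an element only when it differs from its predecessor.
    result = [x for i, x in enumerate(lst) if i == 0 or x != lst[i - 1]]
    # result == [1, 2, 3, 1]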

How to drop rows in one DataFrame based on one similar column in another Dataframe that has a different number of rows

I have two DataFrames that are completely dissimilar except for certain values in one particular column: How would I go about finding the matching values in the Email column of df and the Contact column of df2, and then dropping the whole row in df based on that match? Output I’m looking for (index numbering doesn’t matter): I’ve been able
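One common way to express this match-and-drop, sketched with hypothetical frame contents: isin() builds a boolean mask of Email values that appear in df2['Contact'], and negating it keeps only the non-matching rows.

    import pandas as pd

    # Hypothetical frames shaped like the question describes.
    df = pd.DataFrame({'Name':  ['Ann', 'Bob', 'Cy'],
                       'Email': ['ann@x.com', 'bob@x.com', 'cy@x.com']})
    df2 = pd.DataFrame({'Contact': ['bob@x.com']})

    # Drop every df row whose Email also appears in df2's Contact column.
    result = df[~df['Email'].isin(df2['Contact'])].reset_index(drop=True)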

pandas, merge duplicates if row contains wildcard text

I have a dataset of duplicates (ID). The dataset contains both information and emails. I’m trying to concatenate the emails (if the row contains the character @) and then remove the duplicates. My original dataset: What I wish to accomplish: My current code is a modification of Eric Ed Lohmar’s code and gives the following output. My issue is that I’m not able
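A minimal sketch of the concatenate-then-dedupe idea, with hypothetical column names (ID, info): filter the rows whose value contains '@', join those values per ID, then keep one row per ID.

    import pandas as pd

    # Hypothetical data: duplicate IDs mixing information and emails.
    df = pd.DataFrame({
        'ID':   [1, 1, 2, 2],
        'info': ['a@x.com', 'phone', 'b@y.com', 'c@y.com'],
    })

    # Join the values containing '@' per ID, then drop the duplicates.
    emails = (df[df['info'].str.contains('@', na=False)]
                .groupby('ID')['info']
                .agg(';'.join))
    out = df.drop_duplicates('ID').copy()
    out['emails'] = out['ID'].map(emails)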

Filter non-duplicated records in Python-pandas, based on group-by column and row-level comparison

This is a complicated issue and I am not able to figure it out, so I would really appreciate your help. The dataframe below is generated by the pandas function DataFrame.duplicated(): based on ‘Loc’ (groupby) and ‘Category’, repeated records are marked as True/False accordingly. My expectation is to create another column based on ‘Loc’ (groupby), ‘Category’ and ‘IsDuplicate’ to represent only
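A minimal sketch of the duplicated() step described above, with hypothetical data: keep=False marks every member of a repeated (‘Loc’, ‘Category’) pair rather than only the later occurrences, which makes it straightforward to derive a follow-up column from the flag.

    import pandas as pd

    # Hypothetical frame using the question's column names.
    df = pd.DataFrame({
        'Loc':      ['A', 'A', 'A', 'B'],
        'Category': ['x', 'x', 'y', 'x'],
    })

    # keep=False flags all members of a repeated (Loc, Category) pair,
    # not just the second and later occurrences.
    df['IsDuplicate'] = df.duplicated(['Loc', 'Category'], keep=False)
    # Derive a readable label from the flag.
    df['Flag'] = df['IsDuplicate'].map({True: 'Duplicate', False: 'Unique'})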
