I need to check whether one column of a DataFrame contains duplicate values using pandas and, if it does, delete the entire row for each duplicate. Only the first column needs to be checked.
Example:
object  type
apple   fruit
ball    toy
banana  fruit
xbox    videogame
banana  fruit
apple   fruit
What I need is:
object  type
apple   fruit
ball    toy
banana  fruit
xbox    videogame
I can remove the duplicates from the ‘object’ column with the following code, but I can’t remove the entire row that contains the duplicate, because the second column is left untouched.
df = pd.read_csv(directory, header=None)
objects = df[0]
for object in df[0]:
Answer
Build a boolean mask with duplicated() and select its negation, which keeps only the first occurrence of each value:
df = df[~df["object"].duplicated()]
Which gives:
   object       type
0   apple      fruit
1    ball        toy
2  banana      fruit
3    xbox  videogame
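For anyone who wants to try this without the CSV file, here is a minimal, self-contained sketch. It assumes the data from the question is built in memory with the column names "object" and "type" (the names mask, by_mask and by_drop are only for illustration). It also shows drop_duplicates with subset=, which should produce the same rows as the negated mask.

```python
import pandas as pd

# Sample data from the question, built in memory instead of read from a CSV.
df = pd.DataFrame({
    "object": ["apple", "ball", "banana", "xbox", "banana", "apple"],
    "type": ["fruit", "toy", "fruit", "videogame", "fruit", "fruit"],
})

# True for every value that has already appeared earlier in the column.
mask = df["object"].duplicated()

# Approach from the answer: keep the rows where the mask is False.
by_mask = df[~mask]

# Alternative: let pandas compare only the "object" column and drop whole rows.
by_drop = df.drop_duplicates(subset="object", keep="first")

print(by_mask)
```

If the file really is read with header=None as in the question, the columns are labelled with integers rather than names, so the same idea would be written as df[~df[0].duplicated()].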