
How to delete duplicate rows in pandas

I need to check whether there are any duplicate values in one column of a DataFrame using pandas and, if there are, delete the entire row. Only the first column needs to be checked.

Example:

object    type

apple     fruit
ball      toy
banana    fruit
xbox      videogame
banana    fruit
apple     fruit

What I need is:

object    type

apple     fruit
ball      toy
banana    fruit
xbox      videogame

I can delete the ‘object’ duplicates with the following code, but I can’t delete the entire row that contains the duplicate: the value in the second column is left behind.

import pandas as pd

df = pd.read_csv(directory, header=None)

objects = df[0]

for object in df[0]:
   


Answer

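Build a boolean mask with duplicated() on the column you want to check, then negate it so that only the first occurrence of each value is kept: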

df = df[~df["object"].duplicated()]

Which gives:

   object       type
0   apple      fruit
1    ball        toy
2  banana      fruit
3    xbox  videogame
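
For reference, here is a minimal, self-contained sketch of the same idea. It rebuilds the sample data from the question in memory rather than reading a CSV, and the variable names (mask, deduped_mask, deduped_drop) are only illustrative. It also shows drop_duplicates with subset=, which should give the same rows as the negated mask here.

```python
import pandas as pd

# Sample data copied from the question (built in memory instead of read from a file)
df = pd.DataFrame(
    {
        "object": ["apple", "ball", "banana", "xbox", "banana", "apple"],
        "type": ["fruit", "toy", "fruit", "videogame", "fruit", "fruit"],
    }
)

# duplicated() flags every repeat of a value after its first occurrence
mask = df["object"].duplicated()

# Indexing the DataFrame with the negated mask drops whole rows,
# so the "type" column is filtered together with "object"
deduped_mask = df[~mask]

# Alternative: drop_duplicates restricted to one column via subset=
deduped_drop = df.drop_duplicates(subset="object", keep="first")

print(deduped_drop)
```

If the file is read with header=None as in the question, the columns are labelled 0 and 1 rather than "object" and "type", so the column would be selected as df[0] (or subset=[0]) instead.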
User contributions licensed under: CC BY-SA