I need to check whether one column of a pandas DataFrame contains duplicate values and, if it does, delete the entire row for each duplicate. Only the first column should be checked.
Example:
```
object type
apple fruit
ball toy
banana fruit
xbox videogame
banana fruit
apple fruit
```
What I need is:
```
object type
apple fruit
ball toy
banana fruit
xbox videogame
```
With the following code I can delete the duplicate values in the ‘object’ column, but not the entire row that contains each duplicate, because the second column is left untouched.
```python
import pandas as pd

df = pd.read_csv(directory, header=None)

objects = df[0]

for object in df[0]:
    ...  # the loop body was cut off in the original post
```
Answer
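Select rows using the negated `duplicated()` mask. `duplicated()` returns `True` for every repeat of a value after its first occurrence, so `~` keeps only the first occurrences, and because whole rows are selected, the second column goes with them: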
```python
df = df[~df["object"].duplicated()]
```
This gives:
```
   object       type
0   apple      fruit
1    ball        toy
2  banana      fruit
3    xbox  videogame
```
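The question reads the file with `header=None`, so there is no "object" column: pandas labels the columns 0 and 1, and the mask has to be built on `df[0]` instead. Below is a minimal sketch under that assumption. The CSV contents are hypothetical stand-in data mirroring the example (comma-separated, no header row), inlined through `io.StringIO` instead of reading `directory`. It also shows `DataFrame.drop_duplicates`, which performs the same row selection in one call, and its `keep` parameter.

```python
import io

import pandas as pd

# Hypothetical stand-in for the question's CSV file: comma-separated,
# no header row, so header=None makes pandas label the columns 0 and 1.
csv_text = """apple,fruit
ball,toy
banana,fruit
xbox,videogame
banana,fruit
apple,fruit
"""
df = pd.read_csv(io.StringIO(csv_text), header=None)

# duplicated() is True for every repeat of a value after its first occurrence;
# ~ negates it, so boolean indexing keeps whole rows (both columns) whose
# first-column value has not been seen before.
deduped = df[~df[0].duplicated()]

# drop_duplicates with subset= checks only column 0 and does the same in one call.
same = df.drop_duplicates(subset=[0])

# keep="last" keeps the last occurrence of each value instead of the first;
# keep=False drops every row whose value occurs more than once.
last = df.drop_duplicates(subset=[0], keep="last")
unique_only = df.drop_duplicates(subset=[0], keep=False)

print(deduped)
```

In every variant the surviving rows keep their original index labels (for example, `last` keeps rows 1, 3, 4 and 5); chain `.reset_index(drop=True)` if consecutive labels are needed.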