
Handling duplicate values in pandas

I have a DataFrame that looks like this:

       site  Active
0     deals  Active
1     deals  InActive
2     deals  Active
3  discount  InActive
4  discount  Active

I don't want to drop the duplicate rows. Instead, I want to set the Active column based on the site column: within each group of rows that share the same site value, only the last row should be Active and every earlier row should be InActive. The rule applies regardless of a row's current value, so a row that is already InActive must also be updated if it turns out to be the last duplicate.

Expected output:

       site    Active
0     deals  InActive
1     deals  InActive
2     deals    Active
3  discount  InActive
4  discount    Active


Answer

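You can use the duplicated() method with keep='last'. It returns True for every row whose site value appears again later and False for the last occurrence of each value (a value that appears only once also gets False). Then replace True with "InActive" and False with "Active".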

df1["Active"] = df1["site"].duplicated(keep='last').replace(True, "InActive").replace(False, "Active")
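
For reference, here is a minimal self-contained sketch of the same idea run against the sample data from the question. The name df1 follows the answer; the intermediate name is_earlier_duplicate is made up for illustration. It uses Series.map with a dictionary in place of the two chained replace() calls, which is a stylistic substitution rather than part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "site": ["deals", "deals", "deals", "discount", "discount"],
    "Active": ["Active", "InActive", "Active", "InActive", "Active"],
})

# True for rows whose site value occurs again further down,
# False for the last occurrence of each site value
is_earlier_duplicate = df1["site"].duplicated(keep="last")

# Translate the boolean mask into the two labels in one step
df1["Active"] = is_earlier_duplicate.map({True: "InActive", False: "Active"})

print(df1)
```

Because the mask is computed from the site column alone, the previous contents of Active are ignored, which matches the requirement that rows already marked InActive are re-evaluated too.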