I have the following dataset:
year  ID  Source  Category  Value
2010  1   A       P         10
2010  1   B       P         15
2010  1   A       q         20
2011  2   A       P         12
2011  2   B       q         15
I want to reorganize the dataset as follows:
year  ID  Source  Category  Value
2010  1   A       P         10
2010  1   A       q         20
2011  2   A       P         12
2011  2   B       q         15
In words: within each year and category, if there are values from multiple sources (A and B), drop the row from source B. If only source B has a value, keep it.
I tried df.groupby().count().replace('count'), but it does not work. Any suggestions on how to fix this?
Answer
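Try dropping duplicates on year and Category, keeping the first row of each group. This works for the data shown because the Source A rows come before the Source B rows:
df.drop_duplicates(subset=['year', 'Category'], keep="first")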
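Since drop_duplicates with keep="first" depends on row order, it may help to sort by Source first when the order is not guaranteed. Below is a minimal sketch, assuming the column names from the question and that "A" sorts before "B"; the variable name result and the final sort_index/reset_index steps are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2011, 2011],
    "ID": [1, 1, 1, 2, 2],
    "Source": ["A", "B", "A", "A", "B"],
    "Category": ["P", "P", "q", "P", "q"],
    "Value": [10, 15, 20, 12, 15],
})

result = (
    # Put Source A rows ahead of Source B rows (mergesort is a stable sort).
    df.sort_values("Source", kind="mergesort")
      # Keep one row per (year, Category); A wins whenever both exist.
      .drop_duplicates(subset=["year", "Category"], keep="first")
      # Restore the original row order, then renumber the index.
      .sort_index()
      .reset_index(drop=True)
)
print(result)
```

With the sample data, the B row for 2010/P is dropped and the B row for 2011/q is kept, because no A row exists for that year and category.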