Removing one source value when there are multiple …

I have the following dataset:

year    ID    Source Category Value
2010     1     A      P         10
2010     1     B      P         15
2010     1     A      q         20
2011     2     A      P         12
2011     2     B      q         15

I want to reorganize the dataset as follows:

year    ID    Source Category Value
2010     1     A      P         10
2010     1     A      q         20
2011     2     A      P         12
2011     2     B      q         15

In words: within each year, for each category, if there are values from multiple sources (A and B), the value from source B should be dropped. But if only source B has a value, it should be kept.

I tried df.groupby().count().replace('count'), but it does not work. Any suggestions on how to fix this?

Answer

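Try drop_duplicates, keeping the first row for each year and Category. Note that this relies on the source A row appearing before the source B row within each group, as it does in your sample data:

df = df.drop_duplicates(subset=['year', 'Category'], keep='first')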
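
If the rows are not guaranteed to list source A before source B, sorting first should make the result independent of row order. Here is a minimal sketch, assuming the sample data from the question and assuming that A is preferred simply because it sorts before B alphabetically (the name `result` is only for illustration):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2011, 2011],
    "ID": [1, 1, 1, 2, 2],
    "Source": ["A", "B", "A", "A", "B"],
    "Category": ["P", "P", "q", "P", "q"],
    "Value": [10, 15, 20, 12, 15],
})

# Sort so that source A comes before source B within each (year, Category)
# group, then keep only the first row of each group. A group that has only
# a source B row keeps that row.
result = (
    df.sort_values(["year", "Category", "Source"])
      .drop_duplicates(subset=["year", "Category"], keep="first")
      .reset_index(drop=True)
)
print(result)
```

If the preferred source does not happen to sort first alphabetically, the sort key would need to encode the preference explicitly, for example through an ordered categorical column.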
