Removing one source value when there are multiple sources

I have the following dataset:

year    ID    Source Category Value
2010     1     A      P         10
2010     1     B      P         15
2010     1     A      q         20
2011     2     A      P         12
2011     2     B      q         15

I want to reorganize the dataset in the following way:

year    ID    Source Category Value
2010     1     A      P         10
2010     1     A      q         20
2011     2     A      P         12
2011     2     B      q         15

In words: within each year and category, if there are values from multiple sources (A and B), drop the row from source B. If B is the only source, keep it.

I tried df.groupby().count().replace('count'), but it does not work. Any suggestions on how to fix this?

Answer

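Try dropping duplicates on year and Category, keeping the first row of each pair. This works when the A row comes before the B row within each pair, as it does in your sample data. Note that drop_duplicates returns a new DataFrame, so assign the result:

df = df.drop_duplicates(subset=['year', 'Category'], keep='first')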
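
If the rows are not guaranteed to have A before B, one option is to sort by Source before dropping duplicates. The following is a minimal sketch on the sample data from the question; the variable name result and the final sort_index() call (to restore the original row order) are my own additions, and it assumes "A" should always win because it sorts before "B" alphabetically.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2011, 2011],
    "ID": [1, 1, 1, 2, 2],
    "Source": ["A", "B", "A", "A", "B"],
    "Category": ["P", "P", "q", "P", "q"],
    "Value": [10, 15, 20, 12, 15],
})

result = (
    # Put all A rows ahead of B rows; mergesort is stable, so ties keep their order.
    df.sort_values("Source", kind="mergesort")
      # Keep one row per (year, Category); after the sort, that is A when A exists.
      .drop_duplicates(subset=["year", "Category"], keep="first")
      # Restore the original row order.
      .sort_index()
)

print(result)
```

With more than two sources, or a priority that is not alphabetical, you would need to sort on an explicit ranking instead of on Source itself.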
User contributions licensed under: CC BY-SA