I have the following dataset:
year  ID  Source  Category  Value
2010  1   A       P         10
2010  1   B       P         15
2010  1   A       q         20
2011  2   A       P         12
2011  2   B       q         15
I want to reorganize the dataset as follows:
year  ID  Source  Category  Value
2010  1   A       P         10
2010  1   A       q         20
2011  2   A       P         12
2011  2   B       q         15
In words: within each year, for each category, if there are values from multiple sources (A and B), the row from source B should be dropped. If there is only a value from source B, that row should be kept.
I tried df.groupby().count().replace('count'), but it does not work. Any suggestions for fixing this?
Answer
Try:
# Keeps the first row per (year, Category); this relies on source A rows appearing before source B rows.
df.drop_duplicates(subset=['year', 'Category'], keep="first")
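If the rows are not guaranteed to list source A before source B, the drop_duplicates call above could keep the wrong row. A minimal sketch of one way to guard against that, assuming the column names from the question and that the preferred source ("A") sorts before "B" alphabetically: sort by Source first, drop the duplicates, then restore the original row order. The variable name `result` is only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2011, 2011],
    "ID": [1, 1, 1, 2, 2],
    "Source": ["A", "B", "A", "A", "B"],
    "Category": ["P", "P", "q", "P", "q"],
    "Value": [10, 15, 20, 12, 15],
})

result = (
    # Assumption: "A" sorts before "B", so A rows come first in each group.
    df.sort_values("Source", kind="stable")
      # Keep one row per (year, Category): the A row if present, else the B row.
      .drop_duplicates(subset=["year", "Category"], keep="first")
      # Restore the original row order, then renumber the index.
      .sort_index()
      .reset_index(drop=True)
)

print(result)
```

On the sample data this should leave four rows, with the 2010/P row from source B removed and the 2011/q row from source B retained because no source A row exists for it.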