
Consolidating categories in columns

I have a df with a Race column that has 4 categories. However, I would like to have only three categories by combining the last two. This is what my current df looks like:

Year |State| Race |Sex | population
-----|-----|------|----|----------
2006 | CA  | 1    |1   | 504893
2006 | CA  | 1    |2   | 783760
2006 | CA  | 2    |1   | 800062
2006 | CA  | 2    |2   | 768300
2006 | CA  | 3    |1   | 347827
2006 | CA  | 3    |2   | 344672
2006 | CA  | 4    |1   | 565344
2006 | CA  | 4    |2   | 356779

I want to consolidate Race == 3 and Race == 4 into one value (Race == 3), so my new df would look something like this:

Year |State| Race |Sex | population
-----|-----|------|----|----------
2006 | CA  | 1    |1   | 504893
2006 | CA  | 1    |2   | 783760
2006 | CA  | 2    |1   | 800062
2006 | CA  | 2    |2   | 768300
2006 | CA  | 3    |1   | 913171
2006 | CA  | 3    |2   | 701451

Notice that the last two rows of this df are the sums of the last four rows of the original df, by sex. How can I accomplish this?
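To reproduce the example, the input table can be rebuilt as a DataFrame. This is a sketch that assumes the frame is named `df` and all codes are stored as plain integers, as in the table above. The expected values are simply the Race 3 and Race 4 populations added together for each sex:

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "Year": [2006] * 8,
    "State": ["CA"] * 8,
    "Race": [1, 1, 2, 2, 3, 3, 4, 4],
    "Sex": [1, 2, 1, 2, 1, 2, 1, 2],
    "population": [504893, 783760, 800062, 768300,
                   347827, 344672, 565344, 356779],
})

# Expected consolidated totals: Race 3 + Race 4, summed per sex
races_3_and_4 = df[df["Race"].isin([3, 4])]
expected = races_3_and_4.groupby("Sex")["population"].sum()
# expected totals: 913171 for Sex 1 (347827 + 565344), 701451 for Sex 2 (344672 + 356779)
print(expected)
```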


Answer

Replace Race 4 with 3, then group the rows by Race and Sex and sum `population`:

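```python
# Recode Race 4 as Race 3 (modifies df in place)
df.loc[df['Race'] == 4, 'Race'] = 3
# Keep Year and State as grouping keys so they survive the aggregation
df = df.groupby(['Year', 'State', 'Race', 'Sex'], as_index=False)['population'].sum()
```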

You get

Year |State| Race |Sex | population
-----|-----|------|----|----------
2006 | CA  | 1    |1   | 504893
2006 | CA  | 1    |2   | 783760
2006 | CA  | 2    |1   | 800062
2006 | CA  | 2    |2   | 768300
2006 | CA  | 3    |1   | 913171
2006 | CA  | 3    |2   | 701451
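If you would rather leave the original frame untouched, a self-contained sketch is below. It swaps the `.loc` assignment for `Series.replace` with a dict (an alternative technique, not the answer's exact code); the dict name `race_map` is hypothetical and makes it easy to fold in more codes later. Grouping on `Year` and `State` as well keeps those columns in the result.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2006] * 8,
    "State": ["CA"] * 8,
    "Race": [1, 1, 2, 2, 3, 3, 4, 4],
    "Sex": [1, 2, 1, 2, 1, 2, 1, 2],
    "population": [504893, 783760, 800062, 768300,
                   347827, 344672, 565344, 356779],
})

# Hypothetical mapping of old Race codes to new ones; unlisted codes are unchanged
race_map = {4: 3}

consolidated = (
    df.assign(Race=df["Race"].replace(race_map))  # new frame; df itself is not modified
      .groupby(["Year", "State", "Race", "Sex"], as_index=False)["population"]
      .sum()
)
print(consolidated)
```

Because `assign` returns a copy, `df` still holds the original four races afterwards, which is handy if you need both versions.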
User contributions licensed under: CC BY-SA