I have a dataframe of houses, one row per house, that looks like this:
data = [ ['Oxford', 2016, True], ['Oxford', 2016, True], ['Oxford', 2018, False], ['Cambridge', 2016, False], ['Cambridge', 2016, True] ] df = pd.DataFrame(data, columns=['town', 'year', 'is_detached'])
town year is_detached 0 Oxford 2016 True 1 Oxford 2016 True 2 Oxford 2018 False 3 Cambridge 2016 False 4 Cambridge 2016 True
And I want to end up with a table that looks like this:
town total_houses_2016 total_houses_2018 is_detached_2016 is_detached_2018 0 Oxford 2 1 2 0 1 Cambridge 2 0 1 0
Currently I’m doing two separate groupby calls, and then joining them together:
by_town_totals = df.groupby([df.town, df.year])
    .size()
    .reset_index()
    .pivot(index=["town"], columns="year", values=0).fillna(0)
    .add_prefix('total_houses_')
by_town_detached = df.groupby([df.town, df.year])
    .is_detached.sum().reset_index()
    .pivot(index=["town"], columns="year", values="is_detached").fillna(0)
    .add_prefix('is_detached_')
by_town = pd.concat([by_town_totals, by_town_detached], axis=1).reset_index()
Is there a way I could do this with a single groupby?
Advertisement
Answer
df.year = df.year.astype(str)
df = df.pivot_table(index='town', 
                    columns='year', 
                    values='is_detached', 
                    aggfunc=['size', 'sum'], 
                    fill_value=0)
df.columns = (df.columns.to_flat_index()
                .str.join('_')
                .str.replace('size','total_houses')
                .str.replace('sum', 'is_detached'))
print(df.reset_index())
Output:
town total_houses_2016 total_houses_2018 is_detached_2016 is_detached_2018 0 Cambridge 2 0 1 0 1 Oxford 2 1 2 0