I have a dataframe of houses, one row per house, that looks like this: And I want to end up with a table that looks like this: Currently I'm doing two separate groupby calls, and then joining them together: Is there a way I could do this with a single groupby? Answer Output:

pandas: combine size and sum in a single groupby?

I have a dataframe of houses, one row per house, that looks like this:

data = [
  ['Oxford', 2016, True],
  ['Oxford', 2016, True],
  ['Oxford', 2018, False],
  ['Cambridge', 2016, False],
  ['Cambridge', 2016, True]
]
df = pd.DataFrame(data, columns=['town', 'year', 'is_detached'])

JavaScript
​x
 
data = [
  ['Oxford', 2016, True],
  ['Oxford', 2016, True],
  ['Oxford', 2018, False],
  ['Cambridge', 2016, False],
  ['Cambridge', 2016, True]
]
df = pd.DataFrame(data, columns=['town', 'year', 'is_detached'])
​

        town  year  is_detached
0     Oxford  2016         True
1     Oxford  2016         True
2     Oxford  2018        False
3  Cambridge  2016        False
4  Cambridge  2016         True

JavaScript
 
        town  year  is_detached
0     Oxford  2016         True
1     Oxford  2016         True
2     Oxford  2018        False
3  Cambridge  2016        False
4  Cambridge  2016         True
​

And I want to end up with a table that looks like this:

        town  total_houses_2016  total_houses_2018  is_detached_2016  is_detached_2018
0     Oxford                  2                  1                 2                 0
1  Cambridge                  2                  0                 1                 0

JavaScript
 
        town  total_houses_2016  total_houses_2018  is_detached_2016  is_detached_2018
0     Oxford                  2                  1                 2                 0
1  Cambridge                  2                  0                 1                 0
​

Currently I’m doing two separate groupby calls, and then joining them together:

by_town_totals = df.groupby([df.town, df.year])
    .size()
    .reset_index()
    .pivot(index=["town"], columns="year", values=0).fillna(0)
    .add_prefix('total_houses_')
by_town_detached = df.groupby([df.town, df.year])
    .is_detached.sum().reset_index()
    .pivot(index=["town"], columns="year", values="is_detached").fillna(0)
    .add_prefix('is_detached_')
by_town = pd.concat([by_town_totals, by_town_detached], axis=1).reset_index()

JavaScript
 
by_town_totals = df.groupby([df.town, df.year])
    .size()
    .reset_index()
    .pivot(index=["town"], columns="year", values=0).fillna(0)
    .add_prefix('total_houses_')
by_town_detached = df.groupby([df.town, df.year])
    .is_detached.sum().reset_index()
    .pivot(index=["town"], columns="year", values="is_detached").fillna(0)
    .add_prefix('is_detached_')
by_town = pd.concat([by_town_totals, by_town_detached], axis=1).reset_index()
​

Is there a way I could do this with a single groupby?

Answer

df.year = df.year.astype(str)
df = df.pivot_table(index='town', 
                    columns='year', 
                    values='is_detached', 
                    aggfunc=['size', 'sum'], 
                    fill_value=0)
df.columns = (df.columns.to_flat_index()
                .str.join('_')
                .str.replace('size','total_houses')
                .str.replace('sum', 'is_detached'))
print(df.reset_index())

JavaScript
 
df.year = df.year.astype(str)
df = df.pivot_table(index='town', 
                    columns='year', 
                    values='is_detached', 
                    aggfunc=['size', 'sum'], 
                    fill_value=0)
df.columns = (df.columns.to_flat_index()
                .str.join('_')
                .str.replace('size','total_houses')
                .str.replace('sum', 'is_detached'))
print(df.reset_index())
​

Output:

        town  total_houses_2016  total_houses_2018  is_detached_2016  is_detached_2018
0  Cambridge                  2                  0                 1                 0
1     Oxford                  2                  1                 2                 0

JavaScript
 
        town  total_houses_2016  total_houses_2018  is_detached_2016  is_detached_2018
0  Cambridge                  2                  0                 1                 0
1     Oxford                  2                  1                 2                 0
​

Advertisement

Answer