I have a dataframe of houses, one row per house, that looks like this:
    import pandas as pd

    data = [
        ['Oxford', 2016, True],
        ['Oxford', 2016, True],
        ['Oxford', 2018, False],
        ['Cambridge', 2016, False],
        ['Cambridge', 2016, True]
    ]
    df = pd.DataFrame(data, columns=['town', 'year', 'is_detached'])
            town  year  is_detached
    0     Oxford  2016         True
    1     Oxford  2016         True
    2     Oxford  2018        False
    3  Cambridge  2016        False
    4  Cambridge  2016         True
And I want to end up with a table that looks like this:
            town  total_houses_2016  total_houses_2018  is_detached_2016  is_detached_2018
    0     Oxford                  2                  1                 2                 0
    1  Cambridge                  2                  0                 1                 0
Currently I’m doing two separate groupby calls, and then joining them together:
    # Houses per town and year, pivoted so each year becomes a column
    by_town_totals = (
        df.groupby([df.town, df.year])
        .size()
        .reset_index()
        .pivot(index=["town"], columns="year", values=0).fillna(0)
        .add_prefix('total_houses_')
    )
    # Detached houses per town and year, pivoted the same way
    by_town_detached = (
        df.groupby([df.town, df.year])
        .is_detached.sum().reset_index()
        .pivot(index=["town"], columns="year", values="is_detached").fillna(0)
        .add_prefix('is_detached_')
    )
    # Join the two pivoted tables side by side on the shared town index
    by_town = pd.concat([by_town_totals, by_town_detached], axis=1).reset_index()
Is there a way I could do this with a single groupby?
Answer
    # Years must be strings so the column tuples can be joined with '_' below
    df.year = df.year.astype(str)
    # One pivot_table call computes both aggregates per town and year
    df = df.pivot_table(index='town',
                        columns='year',
                        values='is_detached',
                        aggfunc=['size', 'sum'],
                        fill_value=0)
    # Flatten the (aggfunc, year) column MultiIndex and rename the prefixes
    df.columns = (df.columns.to_flat_index()
                            .str.join('_')
                            .str.replace('size', 'total_houses')
                            .str.replace('sum', 'is_detached'))
    print(df.reset_index())
Output:
            town  total_houses_2016  total_houses_2018  is_detached_2016  is_detached_2018
    0  Cambridge                  2                  0                 1                 0
    1     Oxford                  2                  1                 2                 0
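Since the question asks specifically about a single groupby, here is a minimal sketch of that variant. It is not part of the answer above. It assumes a pandas version with named aggregation (0.25 or later) and rebuilds the sample frame from the question so that it runs on its own. The names `by_town`, `total_houses` and `is_detached` are taken from the question. The rows come out sorted by town, as in the pivot_table output.

```python
import pandas as pd

# Sample data from the question, rebuilt so the sketch is self-contained
data = [
    ['Oxford', 2016, True],
    ['Oxford', 2016, True],
    ['Oxford', 2018, False],
    ['Cambridge', 2016, False],
    ['Cambridge', 2016, True],
]
df = pd.DataFrame(data, columns=['town', 'year', 'is_detached'])

by_town = (
    df.groupby(['town', 'year'])['is_detached']
      # Named aggregation: row count and number of detached houses per group
      .agg(total_houses='size', is_detached='sum')
      # Move the year level into the columns; missing town/year pairs become 0
      .unstack('year', fill_value=0)
)

# Flatten the (aggregate, year) column MultiIndex into names like 'total_houses_2016'
by_town.columns = [f'{name}_{year}' for name, year in by_town.columns]
by_town = by_town.reset_index()
print(by_town)
```

Because the year is formatted inside an f-string, this version does not need the `astype(str)` conversion on the `year` column.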