What exactly is the function of as_index in groupby in Pandas?
Answer
print() is your friend when you don't understand something; it often clears up doubts.
Take a look:
import pandas as pd

df = pd.DataFrame(data={'books':['bk1','bk1','bk1','bk2','bk2','bk3'], 'price': [12,12,12,15,15,17]})

# The original, ungrouped data
print(df)

# Group keys become the index of the result
print(df.groupby('books', as_index=True).sum())

# Group keys stay as a regular column
print(df.groupby('books', as_index=False).sum())
Output:
  books  price
0   bk1     12
1   bk1     12
2   bk1     12
3   bk2     15
4   bk2     15
5   bk3     17

       price
books
bk1       36
bk2       30
bk3       17

  books  price
0   bk1     36
1   bk2     30
2   bk3     17
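When as_index=True (the default), the key(s) you use in groupby() become the index of the new dataframe. When as_index=False, they stay as ordinary columns and the result gets a default integer index, as in the last output above.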
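To make the difference concrete, here is a minimal sketch that inspects the index and columns of both results. It reuses the df from the example above; the names indexed and flat are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
    'price': [12, 12, 12, 15, 15, 17],
})

indexed = df.groupby('books', as_index=True).sum()
flat = df.groupby('books', as_index=False).sum()

# With as_index=True the group key is the index, not a column
print(indexed.index.name)      # books
print(list(indexed.columns))   # ['price']

# With as_index=False the group key is an ordinary column
print(list(flat.columns))      # ['books', 'price']

# reset_index() moves the key from the index back into a column
restored = indexed.reset_index()
print(list(restored.columns))  # ['books', 'price']
```

In other words, calling reset_index() on the as_index=True result gives you the same layout as as_index=False.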
The benefits you get when you set the column as the index are:
Speed. When you filter values based on the index column, e.g. df.loc['bk1'] (where df is the grouped result), it is faster because the index is hashed. It doesn't have to traverse the entire books column to find 'bk1'. It just calculates the hash value of 'bk1' and finds it in one go.
Ease. When as_index=True you can use the syntax df.loc['bk1'], which is shorter and faster, as opposed to df.loc[df.books=='bk1'], which is longer and slower.
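The two lookup styles can be sketched side by side as follows. The variable names (indexed, flat, by_label, by_mask) are made up for illustration, and the sketch only shows the syntax and the results, not a timing comparison.

```python
import pandas as pd

df = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
    'price': [12, 12, 12, 15, 15, 17],
})

indexed = df.groupby('books', as_index=True).sum()
flat = df.groupby('books', as_index=False).sum()

# Label lookup on the index: returns the row for 'bk1' as a Series
by_label = indexed.loc['bk1']
print(by_label['price'])         # 36

# Boolean mask on a column: returns a one-row DataFrame
by_mask = flat.loc[flat.books == 'bk1']
print(by_mask['price'].iloc[0])  # 36
```

Note that the label lookup returns a Series while the mask returns a DataFrame, so the two are not drop-in replacements for each other.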