What exactly is the function of as_index in groupby in Pandas?
Answer
print() is your friend when you don't understand something. It often clears up doubts.
Take a look:
    import pandas as pd

    df = pd.DataFrame(data={'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                            'price': [12, 12, 12, 15, 15, 17]})

    # The original frame
    print(df)
    # Group keys become the index
    print(df.groupby('books', as_index=True).sum())
    # Group keys stay as a regular column
    print(df.groupby('books', as_index=False).sum())
Output:
      books  price
    0   bk1     12
    1   bk1     12
    2   bk1     12
    3   bk2     15
    4   bk2     15
    5   bk3     17

           price
    books
    bk1       36
    bk2       30
    bk3       17

      books  price
    0   bk1     36
    1   bk2     30
    2   bk3     17
When as_index=True, the key(s) you use in groupby() become the index of the new dataframe. When as_index=False, the keys stay as ordinary columns and the result gets a default integer index.
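To make the difference concrete, here is a minimal sketch that inspects the two results directly. It reuses the same sample data; the variable names indexed and flat are just illustrative. As far as I know, calling reset_index() on the as_index=True result gives the same frame as as_index=False for a simple aggregation like this one.

```python
import pandas as pd

df = pd.DataFrame({'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                   'price': [12, 12, 12, 15, 15, 17]})

indexed = df.groupby('books', as_index=True).sum()   # 'books' becomes the index
flat = df.groupby('books', as_index=False).sum()     # 'books' stays a column

print(indexed.index.name)     # name of the index: books
print(list(indexed.columns))  # only the aggregated column: ['price']
print(list(flat.columns))     # key column plus aggregate: ['books', 'price']

# Moving the index back into a column should reproduce the as_index=False result
print(indexed.reset_index().equals(flat))
```

So if you already have the indexed form and want the flat one (or the other way round), reset_index() and set_index('books') let you switch without regrouping.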
The benefits you get when you set the column as index are:
Speed. When you filter on the index of the grouped dataframe, e.g. df.loc['bk1'], the lookup is faster because the index is hashed. Pandas doesn't have to traverse the entire books column to find 'bk1'. It just computes the hash of 'bk1' and finds the row in one step.

Ease. When as_index=True you can write df.loc['bk1'], which is shorter and faster than df.loc[df.books=='bk1'], which is longer and slower.
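Here is a small sketch of the two lookup styles side by side, using the same sample data. The names indexed, flat, price_by_label and price_by_mask are only for illustration. On a three-row frame you won't notice any speed difference; it should only start to matter on larger data, and timeit is the way to check on your own frame.

```python
import pandas as pd

df = pd.DataFrame({'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                   'price': [12, 12, 12, 15, 15, 17]})

indexed = df.groupby('books', as_index=True).sum()
flat = df.groupby('books', as_index=False).sum()

# as_index=True: look the row up by its index label
price_by_label = indexed.loc['bk1', 'price']

# as_index=False: build a boolean mask over the 'books' column, then select
price_by_mask = flat.loc[flat['books'] == 'bk1', 'price'].iloc[0]

print(price_by_label, price_by_mask)
```

Both lookups return the same total for 'bk1'; the label-based one is just less to type and avoids building the mask.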