What exactly is the function of as_index in groupby in Pandas?
Answer
When you are not sure what a parameter does, print() is your friend: printing the intermediate results usually clears up the doubt.
Take a look:
import pandas as pd
df = pd.DataFrame(data={'books':['bk1','bk1','bk1','bk2','bk2','bk3'], 'price': [12,12,12,15,15,17]})
print(df)
print(df.groupby('books', as_index=True).sum())
print(df.groupby('books', as_index=False).sum())
Output:
  books  price
0   bk1     12
1   bk1     12
2   bk1     12
3   bk2     15
4   bk2     15
5   bk3     17

       price
books
bk1       36
bk2       30
bk3       17

  books  price
0   bk1     36
1   bk2     30
2   bk3     17
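With as_index=True (the default), the key(s) you pass to groupby() become the index of the resulting DataFrame. With as_index=False, the keys stay as ordinary columns and the result gets a default integer index (0, 1, 2, ...), as the third output above shows.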
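To see what happens with more than one key, here is a small sketch. The `store` column is a hypothetical addition that is not part of the original example. It shows that with `as_index=True` both keys end up in a MultiIndex, while `as_index=False` keeps them as columns, and that for an aggregation like `sum()` the two results differ only by a `reset_index()`.

```python
import pandas as pd

# Same data as above plus a hypothetical 'store' column, to group by two keys.
df = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
    'store': ['s1', 's1', 's2', 's1', 's2', 's2'],
    'price': [12, 12, 12, 15, 15, 17],
})

# as_index=True (the default): both keys move into a MultiIndex.
indexed = df.groupby(['books', 'store'], as_index=True).sum()

# as_index=False: keys stay as ordinary columns, index is 0..n-1.
flat = df.groupby(['books', 'store'], as_index=False).sum()

print(list(indexed.index.names))  # ['books', 'store']
print(list(flat.columns))         # ['books', 'store', 'price']

# For sum(), as_index=False yields the same table as resetting the index
# of the as_index=True result.
print(flat.equals(indexed.reset_index()))  # True

# A row of the MultiIndex result is addressed with a tuple of key values.
print(indexed.loc[('bk1', 's1'), 'price'])  # 24
```

In other words, `as_index` only decides where the group keys live in the output; the aggregated values are the same either way.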
Keeping the key as the index of the grouped result has two benefits (below, df stands for the grouped DataFrame, not the original one):
Speed. Label lookups such as df.loc['bk1'] go through the index, which pandas looks up by hashing. It does not have to scan the entire books column to find 'bk1'; it computes the hash of 'bk1' and finds the row directly. This is typically faster, especially for repeated lookups on large frames.
Ease. With as_index=True you can write df.loc['bk1'], which is shorter than df.loc[df.books == 'bk1'] on the as_index=False result. The boolean-mask version is also slower, because it compares every value in the books column on each lookup.
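The two lookup styles can be compared with a minimal sketch like the one below. On the original tiny frame they return the same price; the timing part uses a hypothetical synthetic frame with 100,000 unique titles, since on six rows any difference is lost in overhead. No timing numbers are claimed here, as they depend on the machine and pandas version.

```python
import timeit
import pandas as pd

df = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
    'price': [12, 12, 12, 15, 15, 17],
})

indexed = df.groupby('books', as_index=True).sum()  # 'books' is the index
flat = df.groupby('books', as_index=False).sum()    # 'books' is a column

# Label lookup through the index.
by_label = indexed.loc['bk1', 'price']

# Boolean mask over the 'books' column (compares every value).
by_mask = flat.loc[flat['books'] == 'bk1', 'price'].iloc[0]

print(by_label, by_mask)  # 36 36

# Hypothetical larger data set with one row per unique title.
n = 100_000
big = pd.DataFrame({'books': [f'bk{i}' for i in range(n)], 'price': range(n)})
big_indexed = big.groupby('books', as_index=True).sum()
big_flat = big.groupby('books', as_index=False).sum()

# Warm-up: the first label lookup builds the index's hash table.
big_indexed.loc['bk99999', 'price']

t_label = timeit.timeit(lambda: big_indexed.loc['bk99999', 'price'], number=100)
t_mask = timeit.timeit(
    lambda: big_flat.loc[big_flat['books'] == 'bk99999', 'price'], number=100
)
print(f"label lookup: {t_label:.4f}s, boolean mask: {t_mask:.4f}s")
```

The warm-up call matters for a fair comparison: the hash table is built once on first use, after which label lookups avoid rescanning the column, whereas the mask is rebuilt from scratch every time.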