(I’m fairly new to Python and completely new to Pandas.)
I have software usage data in a tab-separated txt file like this:
IP_Addr         Date        Col2          Version  Col4        Col5  Lang  Country
160.86.229.29   2021-11-01  00:00:14.919  9.6      337722669   3     ja    JPN
154.28.188.105  2021-11-01  00:00:19.774  9.7      480113424   3     de    DEU
154.6.16.129    2021-11-01  00:00:52.460  9.0      3278201755  2     en    USA
218.45.244.124  2021-11-01  00:01:33.853  9.7      1961440872  2     ja    JPN
178.248.141.33  2021-11-01  00:01:51.114  9.5      2795265301  2     en    EST
The DataFrame is imported correctly, and groupby methods like this work all right:
df.IP_Addr.groupby(df.Country).nunique()
However, when I’m trying to create a pivot table with this line:
country_and_lang = df.pivot_table(index=df.Country, columns=df.Lang, values=df.IP_Addr, aggfunc=df.IP_Addr.count)
I get
KeyError: '160.86.229.29'
where the “key” is the first IP value – which should not be used as a key at all.
What am I doing wrong?
Answer
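Use column names instead of values. pivot_table expects the labels of the columns (as strings), not the Series themselves. When a Series is passed as values, pandas treats each of its elements as a column label to look up, which is why the first IP address shows up in the KeyError: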
country_and_lang = df.pivot_table(index='Country', columns='Lang',
                                  values='IP_Addr', aggfunc='count')
print(country_and_lang)

# Output
Lang      de   en   ja
Country
DEU      1.0  NaN  NaN
EST      NaN  1.0  NaN
JPN      NaN  NaN  2.0
USA      NaN  1.0  NaN
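If the NaN entries are unwanted, one option is to pass fill_value=0 so that empty country/language combinations show up as zero. The following is a minimal, self-contained sketch: it rebuilds only the three relevant columns of the sample rows from the question with io.StringIO (an assumption made here so the snippet runs without the original file), and the variable name raw is purely illustrative.

```python
import io

import pandas as pd

# Sample rows from the question, reduced to the three columns used here.
# (Assumption: the real file is read the same way, with sep="\t".)
raw = (
    "IP_Addr\tLang\tCountry\n"
    "160.86.229.29\tja\tJPN\n"
    "154.28.188.105\tde\tDEU\n"
    "154.6.16.129\ten\tUSA\n"
    "218.45.244.124\tja\tJPN\n"
    "178.248.141.33\ten\tEST\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Column labels are passed as strings; fill_value=0 replaces the NaN cells
# for country/language pairs that never occur in the data.
country_and_lang = df.pivot_table(
    index="Country",
    columns="Lang",
    values="IP_Addr",
    aggfunc="count",
    fill_value=0,
)
print(country_and_lang)
```

Since the groupby in the question uses nunique, aggfunc="nunique" may be closer to what is wanted if the same IP address can appear in several rows; with the five sample rows both give the same counts.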
Or use pd.crosstab:
country_and_lang = pd.crosstab(df['Country'], df['Lang'],
                               df['IP_Addr'], aggfunc='count')
print(country_and_lang)

# Output
Lang      de   en   ja
Country
DEU      1.0  NaN  NaN
EST      NaN  1.0  NaN
JPN      NaN  NaN  2.0
USA      NaN  1.0  NaN
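As a side note, when only the number of rows per country/language pair is needed, pd.crosstab can be called without values and aggfunc; it then computes a plain frequency table with zeros instead of NaN. This is a sketch under the assumption that every row has an IP address (it counts rows, not non-null IP_Addr entries), and the small DataFrame below is rebuilt by hand from the sample rows in the question.

```python
import pandas as pd

# Hand-built copy of the relevant columns from the question's sample rows.
df = pd.DataFrame(
    {
        "IP_Addr": [
            "160.86.229.29",
            "154.28.188.105",
            "154.6.16.129",
            "218.45.244.124",
            "178.248.141.33",
        ],
        "Lang": ["ja", "de", "en", "ja", "en"],
        "Country": ["JPN", "DEU", "USA", "JPN", "EST"],
    }
)

# Without values/aggfunc, crosstab counts how often each
# (Country, Lang) combination occurs; absent pairs are 0.
row_counts = pd.crosstab(df["Country"], df["Lang"])
print(row_counts)
```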