(I’m fairly new to Python and completely new to Pandas.)
I have software usage data in a tab-separated txt file like this:
JavaScript
x
7
1
IP_Addr Date Col2 Version Col4 Col5 Lang Country
2
160.86.229.29 2021-11-01 00:00:14.919 9.6 337722669 3 ja JPN
3
154.28.188.105 2021-11-01 00:00:19.774 9.7 480113424 3 de DEU
4
154.6.16.129 2021-11-01 00:00:52.460 9.0 3278201755 2 en USA
5
218.45.244.124 2021-11-01 00:01:33.853 9.7 1961440872 2 ja JPN
6
178.248.141.33 2021-11-01 00:01:51.114 9.5 2795265301 2 en EST
7
The DataFrame is imported correctly, and groupby methods like this work all right:
JavaScript
1
2
1
df.IP_Addr.groupby(df.Country).nunique()
2
However, when I’m trying to create a pivot table with this line:
JavaScript
1
2
1
country_and_lang = df.pivot_table(index=df.Country, columns=df.Lang, values=df.IP_Addr, aggfunc=df.IP_Addr.count)
2
I get
JavaScript
1
2
1
KeyError: '160.86.229.29'
2
where the “key” is the first IP value – which should not be used as a key at all.
What am I doing wrong?
Advertisement
Answer
Use column names instead values:
JavaScript
1
12
12
1
country_and_lang = df.pivot_table(index='Country', columns='Lang',
2
values='IP_Addr', aggfunc='count')
3
print(country_and_lang)
4
5
# Output
6
Lang de en ja
7
Country
8
DEU 1.0 NaN NaN
9
EST NaN 1.0 NaN
10
JPN NaN NaN 2.0
11
USA NaN 1.0 NaN
12
Or use pd.crosstab
:
JavaScript
1
12
12
1
country_and_lang = pd.crosstab(df['Country'], df['Lang'],
2
df['IP_Addr'], aggfunc='count')
3
print(country_and_lang)
4
5
# Output
6
Lang de en ja
7
Country
8
DEU 1.0 NaN NaN
9
EST NaN 1.0 NaN
10
JPN NaN NaN 2.0
11
USA NaN 1.0 NaN
12