I would like to ask a question concerning pivot tables in Pandas.
I have been trying to make a pivot table for this kind of table:
| sector | score |
|---|---|
| US | null |
| US | null |
| US | 1 |
| EU | null |
| EU | 2 |
| EU | 2 |
| EU | 4 |
| UK | null |
| UK | null |
| UK | null |
| UK | 4 |
| UK | 4 |
Eventually, I would like this table to be a pivot table that would look like this:
| null | 1 | 2 | 4 | |
|---|---|---|---|---|
| US | 2 | 1 | 0 | 0 |
| EU | 1 | 0 | 2 | 1 |
| UK | 3 | 0 | 0 | 2 |
In order to do so, I have been trying to do as follows:
import pandas as pd
import numpy as np
df = sql("SELECT sector, score FROM database")
df_piv = pd.pivot_table(
df,
index = 'sector',
columns = 'score',
values = 'score',
aggfunc = 'count'
)
However, by doing so, I keep getting the following error:
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
Could you please help me?
Thanks :)
Advertisement
Answer
I prefer using groupby, you can use
>>> df.groupby('sector')['score'].value_counts(dropna=False).unstack(fill_value=0)
score NaN 1.0 2.0 4.0
sector
EU 1 0 2 1
UK 3 0 0 2
US 2 1 0 0
Sidenote:
If you did not want to count the missing values,
pd.pivot_table(df, index='sector', columns='score', aggfunc=len, fill_value=0)
would do the trick. Using len as an actual function because for some reason pivot_table‘s aggfunc parameter does not seem to accept string shortcuts like groupby.agg/transform.