I would like to ask a question concerning pivot tables in Pandas. I have been trying to make a pivot table for this kind of table: sector score US null US null US 1 EU null EU 2 EU 2 EU 4 UK null UK null UK null UK 4 UK 4 Eventually, I would like this table to be

Pandas pivot table count

I would like to ask a question concerning pivot tables in Pandas.
I have been trying to make a pivot table for this kind of table:

sector	score
US	null
US	null
US	1
EU	null
EU	2
EU	2
EU	4
UK	null
UK	null
UK	null
UK	4
UK	4

Eventually, I would like this table to be a pivot table that would look like this:

	null	1	2	4
US	2	1	0	0
EU	1	0	2	1
UK	3	0	0	2

In order to do so, I have been trying to do as follows:

import pandas as pd
import numpy as np

df = sql("SELECT sector, score FROM database")

df_piv = pd.pivot_table(
    df,
    index = 'sector',
    columns = 'score',
    values = 'score',
    aggfunc = 'count'
)

JavaScript
​x
 
import pandas as pd
import numpy as np
​
df = sql("SELECT sector, score FROM database")
​
df_piv = pd.pivot_table(
    df,
    index = 'sector',
    columns = 'score',
    values = 'score',
    aggfunc = 'count'
)
​

However, by doing so, I keep getting the following error:

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

JavaScript
 
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
​

Could you please help me?
Thanks :)

Answer

I prefer using groupby, you can use

>>> df.groupby('sector')['score'].value_counts(dropna=False).unstack(fill_value=0)
score   NaN  1.0  2.0  4.0
sector                    
EU        1    0    2    1
UK        3    0    0    2
US        2    1    0    0

JavaScript
 
>>> df.groupby('sector')['score'].value_counts(dropna=False).unstack(fill_value=0)
score   NaN  1.0  2.0  4.0
sector                    
EU        1    0    2    1
UK        3    0    0    2
US        2    1    0    0
​

Sidenote:

If you did not want to count the missing values,

pd.pivot_table(df, index='sector', columns='score', aggfunc=len, fill_value=0)

JavaScript
 
pd.pivot_table(df, index='sector', columns='score', aggfunc=len, fill_value=0)
​

would do the trick. Using len as an actual function because for some reason pivot_table‘s aggfunc parameter does not seem to accept string shortcuts like groupby.agg/transform.

Advertisement

Answer