Given a Pandas DataFrame that has multiple columns with categorical values (0 or 1), is it possible to conveniently get the value_counts for every column at the same time?
For example, suppose I generate a DataFrame as follows:
import numpy as np import pandas as pd np.random.seed(0) df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
I can get a DataFrame like this:
a b c d 0 0 1 1 0 1 1 1 1 1 2 1 1 1 0 3 0 1 0 0 4 0 0 0 1 5 0 1 1 0 6 0 1 1 1 7 1 0 1 0 8 1 0 1 1 9 0 1 1 0
How do I conveniently get the value counts for every column and obtain the following conveniently?
a b c d 0 6 3 2 6 1 4 7 8 4
My current solution is:
pieces = [] for col in df.columns: tmp_series = df[col].value_counts() tmp_series.name = col pieces.append(tmp_series) df_value_counts = pd.concat(pieces, axis=1)
But there must be a simpler way, like stacking, pivoting, or groupby?
Advertisement
Answer
Just call apply
and pass pd.Series.value_counts
:
In [212]: df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd')) df.apply(pd.Series.value_counts) Out[212]: a b c d 0 4 6 4 3 1 6 4 6 7