Count the total number of occurrences of several distinct values in a data frame

Suppose we have the data frame df:

    c1  c2  c3  c4  c5  c6
0   'A' 'B' NaN NaN NaN NaN
1   'C' 'D' NaN NaN NaN NaN
2   'A' 'A' 'B' NaN NaN NaN
3   'A' 'B' 'C' NaN NaN NaN
4   NaN NaN NaN NaN NaN NaN

I know that to count the occurrences of 'B' I can use (df == 'B').sum().sum(). Now suppose I want to count how many elements of the data frame are contained in the list v = ['B', 'C']. How can I do this?

Obviously (df == 'B').sum().sum() + (df == 'C').sum().sum() works, but I need something more general.

(df.isin(v)).sum().sum() works fine.

Answer

Stack the dataframe, which turns it into a single Series, then use isin on that Series and call sum() once to count the matches.

>>> df.stack().isin(['B', 'C']).sum()
5

Alternatively, call isin directly on the dataframe and then call sum() twice: the first sum() gives the count per column, and the second adds those counts together.

>>> df.isin(['B', 'C']).sum().sum()
5
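
For anyone who wants to check this end to end, here is a minimal sketch that rebuilds the example data frame from the question and runs both approaches, plus the manual per-value sum from the question for comparison. The variable names count_stacked, count_direct and count_manual are only illustrative, and the frame is assumed to hold plain strings with NaN for the missing cells.

```python
import numpy as np
import pandas as pd

# Rebuild the data frame from the question (missing cells are NaN).
df = pd.DataFrame(
    [
        ["A", "B", np.nan, np.nan, np.nan, np.nan],
        ["C", "D", np.nan, np.nan, np.nan, np.nan],
        ["A", "A", "B", np.nan, np.nan, np.nan],
        ["A", "B", "C", np.nan, np.nan, np.nan],
        [np.nan] * 6,
    ],
    columns=["c1", "c2", "c3", "c4", "c5", "c6"],
)
v = ["B", "C"]

# Approach 1: flatten to a Series, test membership, sum once.
count_stacked = int(df.stack().isin(v).sum())

# Approach 2: test membership on the frame, sum per column, then sum the column totals.
count_direct = int(df.isin(v).sum().sum())

# The manual version from the question, generalised with a loop over v.
count_manual = int(sum((df == value).sum().sum() for value in v))

print(count_stacked, count_direct, count_manual)
```

All three counts should agree, since NaN cells never match any entry of v.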
User contributions licensed under: CC BY-SA