
What’s a pythonic way (native function in pandas) to count occurrences of a certain value within cases (SPSS COUNT equivalent)?

I need to count occurrences of a certain value (let’s assume it’s 3) across a range of columns for each case (row). To do so, I wrote the script below:

import pandas as pd
import numpy as np

objsourcedf = pd.DataFrame({"a": [1, 2, 2], "b": [3, 1, 1], 
                            "c": [3, 2, 1], "d": [4, 3, 8]})
print(objsourcedf)

objauxdf = objsourcedf.transpose()
objauxdf.loc["counts"] = np.sum(objauxdf == 3)  

objsourcedf = objsourcedf.assign(counts=list(objauxdf.loc["counts"]))
print(objsourcedf)

First print is:

   a  b  c  d
0  1  3  3  4
1  2  1  2  3
2  2  1  1  8

Second:

   a  b  c  d  counts
0  1  3  3  4       2
1  2  1  2  3       1
2  2  1  1  8       0

Even though it works fine, I am pretty sure there is a more pythonic way to do this. By ‘pythonic’ I mean using a native, concise pandas feature with no looping through columns or rows. For example, SPSS has a simple count command, so for this objsourcedf the line would be:

count counts = a b c d (3).
execute.

Sadly, as a beginner in Python and pandas, I couldn’t find anything, so I’m asking whether there’s a simpler way to count the occurrences.

Answer

I hope this qualifies as “Pythonic”:

objsourcedf['count'] = objsourcedf.eq(3).sum(axis=1)
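
For reference, here is a minimal self-contained sketch of the same idea applied to the question's data. eq(3) builds a boolean frame and sum(axis=1) adds up the True values per row. The column subset and the isin variation are assumptions added for illustration (closer to how the SPSS count command lists its variables); they are not part of the original answer, and the column names counts_abc and counts_1_or_3 are hypothetical.

```python
import pandas as pd

objsourcedf = pd.DataFrame({"a": [1, 2, 2], "b": [3, 1, 1],
                            "c": [3, 2, 1], "d": [4, 3, 8]})

# Columns to inspect, listed explicitly so that result columns added
# below are never counted themselves.
all_cols = ["a", "b", "c", "d"]

# Compare each cell with 3, then sum the True values across each row.
objsourcedf["counts"] = objsourcedf[all_cols].eq(3).sum(axis=1)

# Illustrative variation: count only within a subset of columns.
objsourcedf["counts_abc"] = objsourcedf[["a", "b", "c"]].eq(3).sum(axis=1)

# Illustrative variation: count several values at once with isin.
objsourcedf["counts_1_or_3"] = objsourcedf[all_cols].isin([1, 3]).sum(axis=1)

print(objsourcedf)
```

Selecting the columns explicitly matters once the frame already holds a result column: calling eq(3) on the whole frame a second time would also compare the earlier counts.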
User contributions licensed under: CC BY-SA