
How can you get rolling value count (frequency) with Pandas? (computationally efficient, no loops)

I have a list of values and I want to get their rolling frequency, so something like this:

df = pd.DataFrame({
    'val': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
})

result = df.val.rolling(3).freq()

result == pd.Series([1, 2, 3, 3, 3, 1, 2, 3, 3, 3, 1, 2, 3, 3, 3])

Of course I can do this with a loop, but with a lot of data that gets computationally expensive, so I'd much rather use a built-in or a vectorized approach. Unfortunately, from my searching, there doesn't seem to be one.

Thanks in advance!


Answer

With the default settings, the first n-1 elements of the result of a rolling function with window size n are NaN, because min_periods defaults to the window size and the first windows are incomplete.

import numpy as np
result = df.val.rolling(3).apply(lambda x: np.count_nonzero(x==x.iloc[-1])).astype('Int64')

Result:

0     <NA>
1     <NA>
2        3
3        3
4        3
5        1
6        2
7        3
8        3
9        3
10       1
11       2
12       3
13       3
14       3
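
If you want the exact series from the question (1, 2, 3, ... with no missing values at the start), one option is to pass min_periods=1 so that partial windows are evaluated too. Note also that rolling(...).apply with a lambda still calls Python once per window, so it is not truly loop-free. The sketch below shows both the min_periods variant and a NumPy-only alternative built on sliding_window_view (available in NumPy 1.20+). The function names rolling_freq_apply and rolling_freq_vectorized are made up for illustration, and the vectorized version assumes numeric data without NaN:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_freq_apply(series, window):
    """Count how often the newest value occurs in each (possibly partial) window."""
    return (
        series.rolling(window, min_periods=1)
        # raw=True hands each window to the lambda as a NumPy array
        .apply(lambda x: np.count_nonzero(x == x[-1]), raw=True)
        .astype('int64')
    )


def rolling_freq_vectorized(series, window):
    """Same result without a per-window Python call (assumes numeric, NaN-free data)."""
    arr = np.asarray(series, dtype=float)
    # Pad the front with NaN so every position has a full-length window;
    # NaN never compares equal, so the padding is never counted.
    padded = np.concatenate([np.full(window - 1, np.nan), arr])
    windows = sliding_window_view(padded, window)  # shape: (len(arr), window)
    # Compare every element of each window with that window's last element.
    counts = (windows == windows[:, -1:]).sum(axis=1)
    return pd.Series(counts, index=series.index)


df = pd.DataFrame({
    'val': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
})

result_apply = rolling_freq_apply(df.val, 3)
result_vectorized = rolling_freq_vectorized(df.val, 3)
print(result_vectorized.tolist())
```

Both functions return 1, 2, 3, 3, 3, 1, 2, 3, 3, 3, 1, 2, 3, 3, 3 for this data. The vectorized version compares window * len(series) elements at once, so it trades some memory for speed on large inputs.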

User contributions licensed under: CC BY-SA