
Why does pandas rolling apply throw ValueError when used on axis=1?

Overview

I am getting a ValueError when trying to apply a simple function over a rolling window on a dataframe with axis=1 (details below). It looks like pandas is trying to unpack the output into the columns of the dataframe instead of the rows. The problem seems to be specific to apply(), and only occurs when axis=1 is used. Why is this error occurring?

Example

Here is a simple example to reproduce the error (in my actual use case, the function I want to apply does not exist as a pandas built-in):

import pandas as pd
import numpy as np

# data and dummy function
df = pd.DataFrame(2 * np.arange(10).reshape(2, 5) - 1, columns=list('abcde'))

def my_min(s):
    """
    expects a series as input, outputs the min value
    """
    return s.min()

# applying it over a rolling window with axis=1 raises the ValueError
df.rolling(window=3, min_periods=3, axis=1).apply(my_min)

The relevant part of the traceback is:

[image: screenshot of the ValueError traceback]

Expected output

It works when using the built-in min, which is why I suspect the problem is related to apply itself:

df.rolling(window=3, min_periods=2, axis=1).min()

Gives the expected output:

[image: screenshot of the resulting dataframe]

What I have tried

  1. Checking the docs here: there don’t seem to be any useful hints, just that the applied function should expect a Series (when raw=False, which is the default) and return a scalar.
  2. I also note that when I first transpose the dataframe and run on axis=0, it works fine. So an easy workaround is df.T.rolling(window=3, min_periods=2, axis=0).apply(my_min).T. But it does not answer my question as to why the behaviour is different when rolling across axis=1.
  3. I have noted a related question here, but as far as I can tell it does not answer mine.
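
For reference, the transpose workaround from item 2 can be sketched as below. It assumes the same df and my_min as in the example above, and it leaves out the axis argument (rows are the default), so it does not depend on how a given pandas version handles axis=1. The names rolled and builtin are only for illustration.

```python
import numpy as np
import pandas as pd

# Same data and dummy function as in the question
df = pd.DataFrame(2 * np.arange(10).reshape(2, 5) - 1, columns=list('abcde'))

def my_min(s):
    """Expects a Series and returns its minimum value."""
    return s.min()

# Transpose so the original columns become rows, roll down the rows
# (the default axis), then transpose back to the original layout.
rolled = df.T.rolling(window=3, min_periods=2).apply(my_min).T

# The built-in rolling min computed the same way, for comparison.
builtin = df.T.rolling(window=3, min_periods=2).min().T

print(rolled)
print(builtin)
```

Both results should agree, since my_min only wraps the same reduction that the built-in performs.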

Thanks!

Answer

This doesn’t produce an error with pandas 1.4.4, the latest release at the time of writing:

pd.__version__
1.4.4

df.rolling(window=3, min_periods=3, axis=1).apply(my_min)
    a   b    c     d     e
0 NaN NaN -1.0   1.0   3.0
1 NaN NaN  9.0  11.0  13.0

Versions older than 1.4.1 are impacted (issue #45912). A workaround is to use raw=True:

df.rolling(window=3, min_periods=3, axis=1).apply(my_min, raw=True)
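
One thing to keep in mind with this workaround: with raw=True each window reaches the function as a NumPy array rather than a Series, so it only suits functions that work on both (my_min does, since both types have .min()). The sketch below shows the difference on a plain rolling window over the first row of the example data, without the axis argument; seen_types is a hypothetical helper list used only to record what the function receives.

```python
import numpy as np
import pandas as pd

# First row of the example dataframe, as a Series
s = pd.Series([-1, 1, 3, 5, 7], index=list('abcde'), dtype=float)

seen_types = []  # hypothetical helper: records the type of each window

def my_min(window):
    seen_types.append(type(window).__name__)
    return window.min()

# raw=False (the default): each window is passed as a Series
as_series = s.rolling(window=3, min_periods=3).apply(my_min, raw=False)
types_with_series = set(seen_types)

seen_types.clear()

# raw=True: each window is passed as a NumPy ndarray
as_array = s.rolling(window=3, min_periods=3).apply(my_min, raw=True)
types_with_array = set(seen_types)

print(types_with_series, types_with_array)
print(as_array)
```

A function that relies on Series-only features (the index, .iloc, label-based access) would need adjusting before switching to raw=True.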
User contributions licensed under: CC BY-SA