
GaussianProcessRegressor ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size

I am running the following code:

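The code block itself was not preserved in this copy of the question; below is a sketch of the kind of call that triggers this error, assuming the DotProduct kernel the answer identifies. The variable names, kernel choice, and random data are illustrative stand-ins, not the original code:

```python
import numpy as np
import pandas as pd
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel

# Assumed stand-ins for the question's data: X is (19142, 21) float64,
# y the matching target. Both are pandas objects, per the edit below.
X = pd.DataFrame(np.random.rand(19142, 21))
y = pd.DataFrame(np.random.rand(19142, 1))

gpr = GaussianProcessRegressor(kernel=DotProduct() + WhiteKernel())
# On 32-bit Python this is where the ValueError below is raised; on a
# 64-bit build the kernel matrix alone needs roughly 2.9 GB.
gpr.fit(X.values, y.values)
```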

The shape of my input is (19142, 21); the dtypes are all float64.

Added in edit: X and y are pandas DataFrames; after .values they are each numpy arrays.

And I get the error:

ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.

I can't imagine a dataset of 20000 × 20 actually being too big for Gaussian processes. Am I wrong?

The entire error message is a long traceback, not reproduced here; as the answer below notes, its final frames are in numpy's inner-product code.


Answer

I believe this happens because of the dot-product kernel: line 2112 of the traceback leads into numpy's inner product, so the memory error you get actually arises in numpy, not in scikit-learn. See also this SO question and this answer, which suggest the error is raised while numpy calculates the expected size of the inner-product result, a computation that can overflow an integer on 32-bit Python. That is consistent with your numbers: the inner product of a (19142, 21) array with itself is a (19142, 19142) float64 array, and 19142² × 8 ≈ 2.93 × 10⁹ bytes, which is more than a 32-bit process can address. My Python setup is 64-bit, so I can't do a consistent test, but the following snippet runs without error:

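The snippet itself was not preserved in this copy of the answer; a minimal sketch of an equivalent check, assuming it exercised numpy's inner product at the question's data size:

```python
import numpy as np

X = np.random.rand(19142, 21)

# The DotProduct kernel reduces to this inner product. The result is a
# (19142, 19142) float64 array: 19142**2 * 8 bytes, roughly 2.9 GB.
# That exceeds what a 32-bit process can address, but it allocates
# without error on a 64-bit build with enough RAM.
K = np.inner(X, X)
print(K.shape, K.nbytes)
```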

I would suggest running your model on smaller subsets of the data to see at which array shape the memory error is raised; note that the inner-product result is n_samples × n_samples, so dropping feature columns will not shrink it, while dropping rows will. Alternatively, you may try different kernels that don't require the inner product of X. A sketch of both ideas follows.
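For illustration only (this example is not from the original answer), here is one way to combine the two suggestions, with RBF as an assumed replacement kernel. Any exact GP still builds a dense n_samples × n_samples kernel matrix, which is why the rows are subsampled:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.random((19142, 21))  # stand-in for the question's data
y = rng.random(19142)

# Subsample rows: the kernel matrix is (n_samples, n_samples), so this,
# not dropping feature columns, is what reduces its size.
idx = rng.choice(len(X), size=1000, replace=False)

# RBF is an assumed alternative kernel; it avoids np.inner but still
# computes a dense 1000 x 1000 Gram matrix here.
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), random_state=0)
gpr.fit(X[idx], y[idx])
print(gpr.score(X[idx], y[idx]))
```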
