
How to get standard deviation across multiple 2d arrays by cell?

I have 16 2D arrays, each with a shape of [16000, 16000], which means each array has 256,000,000 cells. I want a std_array that holds the standard deviation of each cell across the 16 arrays. I tried something but failed, and my questions are in bold.

Here’s my attempt. For example (simplified 3*3 arrays):


However, the np.std function only returns 3 values, but I want 9. What should I do?


In addition, when I apply std on the stacked arrays, I get this error. Does it simply mean that my arrays are too large to operate on?


Answer

In your example, np.vstack((a,b,c)) just stacks the rows of each array on top of one another, resulting in a single 2D array of shape (9, 3).

Computing the standard deviation of that array along axis 0 or axis 1 does not meet your requirements, because neither axis groups the values of the same cell across the three arrays.
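To make the shape problem concrete, here is a minimal sketch. The arrays a, b and c are made-up 3×3 examples (the values from the question are not shown here), so treat the numbers as placeholders.

```python
import numpy as np

# Hypothetical 3x3 inputs standing in for the arrays from the question.
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
b = a * 2
c = a * 3

# vstack concatenates along the existing first axis: three (3, 3) -> one (9, 3).
stacked_rows = np.vstack((a, b, c))

# axis=0 mixes all 9 rows per column -> 3 values.
std_axis0 = np.std(stacked_rows, axis=0)

# axis=1 gives 9 values, but each one is the spread within a single row
# of a single array, not the spread of one cell across a, b and c.
std_axis1 = np.std(stacked_rows, axis=1)

print(stacked_rows.shape, std_axis0.shape, std_axis1.shape)
```

Neither result has the (3, 3) shape of the inputs, which is why a new axis is needed.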

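Instead, you can add a new leading dimension to each array and stack the arrays along it. In this case, stack is a 3D array of shape (3, 3, 3) that holds one 3×3 layer per input array, and the standard deviation is computed along axis 0.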

The result is a 2D array of shape (3, 3) in which each cell holds the standard deviation of the 3 values at that position, one from each of the 3 arrays.
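A minimal sketch of this approach follows, again with made-up 3×3 inputs. It uses np.stack, which creates the new axis directly. The equivalent spelling with an explicit new dimension and np.vstack is included for comparison. Note that np.std computes the population standard deviation by default (ddof=0); pass ddof=1 if you want the sample version.

```python
import numpy as np

# Hypothetical 3x3 inputs; replace with your own arrays.
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
b = a * 2
c = a * 3

# Stack along a NEW leading axis: three (3, 3) arrays -> one (3, 3, 3) array.
stack = np.stack((a, b, c), axis=0)

# Same result, spelled with an explicit new dimension on each array.
stack_alt = np.vstack((a[None, ...], b[None, ...], c[None, ...]))

# Reduce over the new axis: one standard deviation per cell.
std_array = np.std(stack, axis=0)

print(stack.shape, std_array.shape)
```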

However, building a huge array only to reduce it afterwards is not memory efficient. You can instead iterate over the rows and stack only one row from each array at a time, which keeps the temporary arrays small.
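To put numbers on it: assuming float64 data, the fully stacked array would need 16 × 256,000,000 × 8 bytes, which is about 32.8 GB, on top of the inputs themselves. A row-by-row version only ever stacks one row per input at a time. The sketch below is one possible way to write it; the function name std_by_rows is hypothetical, and the inputs are assumed to be 2D arrays of identical shape.

```python
import numpy as np

def std_by_rows(arrays):
    """Per-cell standard deviation across a sequence of same-shaped 2D arrays."""
    first = arrays[0]
    # Only the output is full-size; it is allocated once.
    out = np.empty(first.shape, dtype=np.float64)
    for i in range(first.shape[0]):
        # Temporary of shape (n_arrays, n_cols): tiny compared to the full stack.
        rows = np.stack([arr[i] for arr in arrays], axis=0)
        out[i] = rows.std(axis=0)
    return out

# Small usage example with made-up data.
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
b = a * 2
c = a * 3
std_array = std_by_rows([a, b, c])
print(std_array.shape)
```

For 16 arrays with 16,000 columns, each temporary is about 2 MB in float64. Processing a block of several rows per iteration instead of a single row would cut the Python loop overhead while still bounding memory.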

For higher performance, you can use Numba to avoid creating the many big temporary arrays that NumPy requires, which are expensive to build and fill.
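As a rough sketch of what a Numba version might look like (not a tuned implementation): the function below loops over cells explicitly and allocates only the output. The name cellwise_std is hypothetical. It assumes the inputs are passed as a tuple of 2D arrays that all share the same dtype and shape, and it computes the population standard deviation, matching the default of np.std. The try/except only exists so the sketch still runs, slowly, when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (in plain Python) without Numba.
    def njit(func):
        return func

@njit
def cellwise_std(arrays):
    # arrays: tuple of 2D arrays with identical dtype and shape (assumption).
    n = len(arrays)
    rows, cols = arrays[0].shape
    out = np.empty((rows, cols), dtype=np.float64)
    for i in range(rows):
        for j in range(cols):
            # First pass: mean of this cell across all arrays.
            mean = 0.0
            for k in range(n):
                mean += arrays[k][i, j]
            mean /= n
            # Second pass: mean squared deviation from that mean.
            var = 0.0
            for k in range(n):
                d = arrays[k][i, j] - mean
                var += d * d
            out[i, j] = np.sqrt(var / n)
    return out

# Small usage example with made-up data.
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
b = a * 2
c = a * 3
std_array = cellwise_std((a, b, c))
print(std_array.shape)
```

The two-pass mean/variance loop is chosen here for numerical stability over the one-pass sum-of-squares formula. Whether it actually outperforms the row-by-row NumPy version on your machine is something to measure rather than assume.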
