How to get standard deviation across multiple 2d arrays by cell?

Question

I have 16 2d-arrays, each in a shape of [16000, 16000], which means one array has 256000000 cells. I want to have a std_array that is the standard deviation of each cell in the 16 arrays. I tried something but failed, and my questions are in bold. Here's my attempt. For example (simplified 3*3 arrays): However, the np.std function only

Accepted Answer

In your example, np.vstack((a,b,c)) just stack all lines of each array resulting in this one:array([[1, 2, 3],       [1, 2, 3],       [1, 2, 3],       [2, 3, 4],       [2, 3, 4],       [2, 3, 4],       [3, 4, 5],       [3, 4, 5],       [3, 4, 5]])Computing the standard deviation along the axis 0 or 1 does not meet your requirements.Instead, you can add a new dimension to each array so to stack them in a new dimension:stack = np.vstack([a[None], b[None], c[None]])stack.std(axis=2)In this case stack is:array([[[1, 2, 3],   <-- array `a`        [1, 2, 3],        [1, 2, 3]],       [[2, 3, 4],   <-- array `b`        [2, 3, 4],        [2, 3, 4]],       [[3, 4, 5],   <-- array `c`        [3, 4, 5],        [3, 4, 5]]])The result is a 2D array of shape (3,3) where the standard deviation is computed based on the 3 values coming from respectively each of the 3 arrays.The thing is building a huge array so to reduce it later is not memory efficient. You can instead iterate over the lines so to build smaller arrays:result = np.empty(a.shape, dtype=np.float64)for i in range(a.shape[0]):    stacked_line = np.vstack([a[i, None], b[i, None], c[i, None]])    result[i,:] = stacked_line.std(axis=0)For higher performance, you can use Numba so to avoid the creation of many big arrays (mandatory with Numpy) that are expensive to build and fill.

Advertisement

Answer