I have an array with shape (128,116,116,1), where the 1st dimension is the number of subjects, with the 2nd and 3rd being the data.
I am trying to calculate the variance (squared deviation from the mean) at each position (i.e. at (0,0), (0,1), (1,0), etc., up to (115,115)) across all 128 subjects, resulting in an array with shape (116,116).
Can anyone tell me how to accomplish this?
Thank you!
Answer
Let’s say we have a multidimensional list a of shape (3,2,2):
import numpy as np

# 3 "subjects", each with a 2x2 block of data
a = [
    [[1, 1], [1, 1]],
    [[2, 2], [2, 2]],
    [[3, 3], [3, 3]],
]
np.var(a, axis=0)  # variance across axis 0 (the subjects), results in:
> array([[0.66666667, 0.66666667],
>        [0.66666667, 0.66666667]])
If you want to efficiently compute the variance across all 128 subjects (which would be axis 0), I don’t see a way to do it using the statistics module, since its functions don’t take nested (multidimensional) lists as input. You would have to write your own logic and loop over the positions, collecting the values of all subjects at each one.
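For illustration, the manual route could look roughly like the sketch below. It reuses the small (3,2,2) example list rather than the real data, and it assumes the population variance (statistics.pvariance) is what is wanted; the names rows, cols and result are just placeholders.

```python
import statistics

# Small example of shape (3, 2, 2): 3 subjects, each with a 2x2 block of data.
a = [
    [[1, 1], [1, 1]],
    [[2, 2], [2, 2]],
    [[3, 3], [3, 3]],
]

rows, cols = len(a[0]), len(a[0][0])

# For each position (i, j), gather the value from every subject
# and compute the population variance of that list.
result = [
    [statistics.pvariance([subject[i][j] for subject in a]) for j in range(cols)]
    for i in range(rows)
]
print(result)
```

This works, but it runs a Python-level loop over every position, which gets slow for a 116 x 116 grid with 128 subjects.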
But, using the numpy.var function, we can easily calculate the variance of each ‘datapoint’ (tuple of indices) across all 128 subjects.
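Applied to an array with the shape from the question, this might look like the following sketch. The random data is only a stand-in for the real array, and dropping the trailing axis of length 1 with squeeze is my assumption about how to get from (116,116,1) to the (116,116) result that was asked for.

```python
import numpy as np

# Placeholder data with the shape from the question: (subjects, 116, 116, 1).
rng = np.random.default_rng(0)
data = rng.normal(size=(128, 116, 116, 1))

# Variance across the subjects (axis 0); the result has shape (116, 116, 1).
variance = np.var(data, axis=0)

# Drop the trailing axis of length 1 to get shape (116, 116).
variance = variance.squeeze(axis=-1)
print(variance.shape)
```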
Side note: You mentioned statistics.variance. However, that is only to be used when you are taking a sample from a population, as is mentioned in the documentation you linked. If you were to go the manual route, you would use statistics.pvariance instead, since we are calculating it on the whole dataset.
The difference can be seen here:
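import statistics
import numpy as np

statistics.pvariance([1,2,3])
> 0.6666666666666666 # (population variance, divides by n: what we want here)
statistics.variance([1,2,3])
> 1 # (sample variance, divides by n - 1: not what we want here)
np.var([1,2,3])
> 0.6666666666666666 # (np.var computes the population variance by default)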
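If it turns out that the 128 subjects should be treated as a sample after all, np.var can produce the sample variance too, through its ddof ("delta degrees of freedom") parameter. A small sketch on the same toy values, where the variable names are only illustrative:

```python
import statistics
import numpy as np

values = [1, 2, 3]

# Default ddof=0 divides by n (population variance).
population = np.var(values)

# ddof=1 divides by n - 1 (sample variance).
sample = np.var(values, ddof=1)

print(population, statistics.pvariance(values))
print(sample, statistics.variance(values))
```

The same ddof argument can be combined with axis=0 when working on the full array.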