I have an array of lists of numbers, e.g.:
[0] (0.01, 0.01, 0.02, 0.04, 0.03) [1] (0.00, 0.02, 0.02, 0.03, 0.02) [2] (0.01, 0.02, 0.02, 0.03, 0.02) ... [n] (0.01, 0.00, 0.01, 0.05, 0.03)
I would like to efficiently calculate the mean and standard deviation at each index of a list, across all array elements.
To do the mean, I have been looping through the array and summing the value at a given index of a list. At the end, I divide each value in my “averages list” by n
(I am working with a population, not a sample from the population).
To do the standard deviation, I loop through again, now that I have the mean calculated.
I would like to avoid going through the array twice, once for the mean and then once for the standard deviation (after I have a mean).
Is there an efficient method for calculating both values, only going through the array once? Any code in an interpreted language (e.g., Perl or Python) or pseudocode is fine.
Advertisement
Answer
The answer is to use Welford’s algorithm, which is very clearly defined after the “naive methods” in:
- Wikipedia: Algorithms for calculating variance
It’s more numerically stable than either the two-pass or online simple sum of squares collectors suggested in other responses. The stability only really matters when you have lots of values that are close to each other as they lead to what is known as “catastrophic cancellation” in the floating point literature.
You might also want to brush up on the difference between dividing by the number of samples (N) and N-1 in the variance calculation (squared deviation). Dividing by N-1 leads to an unbiased estimate of variance from the sample, whereas dividing by N on average underestimates variance (because it doesn’t take into account the variance between the sample mean and the true mean).
I wrote two blog entries on the topic which go into more details, including how to delete previous values online:
- Computing Sample Mean and Variance Online in One Pass
- Deleting Values in Welford’s Algorithm for Online Mean and Variance
You can also take a look at my Java implement; the javadoc, source, and unit tests are all online: