of the Month
I recently received an email from Michael in Switzerland asking me if I could help explain why he was getting differences when comparing the variance results between SAS and Excel using the same set of numbers.
Traditionally the variance is calculated as
Alternately the equation can be written as
The first equation (a) requires two passes of the data - the first pass to calculate the mean and the second to calculate the variance.
This second equation (b) is more commonly known as the Desktop Calculator Formula as it is possible to calculate the variance with one pass of the data. There is one issue with this formula as if the numbers are big and the differences between the numbers are small an incorrect result will prevail. This is because computers and calculators store results as real numbers and some precision may be lost when storing the sum of the square values across a long list of numbers. As an example calculate the variance of 10,000,001, 10,000,003 and 10,000,005 - the answer is 4 but some calculators may produce an answer of 0.
Now what about SAS and Excel, what do they do. SAS uses the Traditional Formula. Microsoft used the Desktop Calculator Formula until Excel 2003 (for Windows) and Excel (for Mac) when it changed to use the Traditional Formula1. The Desktop Calculator Formula appears to have been used by Microsoft as it was faster than doing the Traditional Formula, the latter requiring two passes of the data.
Are there any calculations that will produce a result with one pass of data? The answer is yes. One formula is known as the Method of Provisional Means where
There are other methods that calculate the variance in one pass of the data but these are for later months.
I do hope this is useful. The moral of the story is to always check with your software documentation to see how statistics are being calculated.
Updated May 9, 2005