Variance, Standard Deviation, and Standard Error

In relation to an entire population, the variance and the standard deviation are parameters that describe the dispersion of the values for a characteristic of that population. For example, if the characteristic of interest is the height of the adult individuals in the population, then the variance and standard deviation represent how spread out those heights are. If the population comprises people who are all between 61 and 68 inches tall, the variance and the standard deviation will be smaller than those for a population composed of people whose heights are evenly spread between 48 and 74 inches.
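As a rough illustration of this comparison, the sketch below computes the population variance and standard deviation for two tiny made-up populations. The height values are hypothetical and chosen only to mirror the two ranges mentioned above; Python's standard `statistics` module is assumed.

```python
import statistics

# Hypothetical populations (made-up heights in inches, for illustration only).
narrow_population = [61, 62, 63, 64, 65, 66, 67, 68]
wide_population = [48, 52, 56, 60, 62, 66, 70, 74]

# pvariance and pstdev treat the data as the entire population,
# so they divide by the population size N.
narrow_var = statistics.pvariance(narrow_population)
wide_var = statistics.pvariance(wide_population)
narrow_sd = statistics.pstdev(narrow_population)
wide_sd = statistics.pstdev(wide_population)

print("narrow population:", narrow_var, narrow_sd)
print("wide population:  ", wide_var, wide_sd)
```

The population confined to the narrower range produces the smaller variance and standard deviation.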

In relation to data, the variance and the standard deviation are statistics that measure how widely spread the values in a dataset are. These are highly useful statistics in that they can be used to calculate a standard error, which can then be used to create confidence intervals or margins of error for sample means and other statistics.

To distinguish the standard deviation and variance that are statistics from the parameters that they estimate, we will call the standard deviation statistic the sample standard deviation, or $s$, and the variance statistic the sample variance, or $s^2$.

Both the sample variance and the sample standard deviation are always non-negative. If the data points are all close to the mean, then the sample variance and the sample standard deviation are close to zero. If many data points are far from the mean, then the sample variance and the sample standard deviation are far from zero. If all the data values are equal, then the sample variance and the sample standard deviation are both zero.
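These properties can be checked on small made-up datasets. The sketch below is only an illustration: the three lists are hypothetical, and it relies on `statistics.variance` and `statistics.stdev` from Python's standard library, which compute the sample versions of these statistics.

```python
import statistics

# Hypothetical datasets, chosen only to illustrate the properties.
datasets = {
    "clustered": [64, 65, 65, 66],   # values close to the mean
    "scattered": [48, 60, 70, 82],   # values far from the mean
    "identical": [65, 65, 65, 65],   # all values equal
}

results = {}
for name, data in datasets.items():
    s2 = statistics.variance(data)   # sample variance
    s = statistics.stdev(data)       # sample standard deviation
    results[name] = (s2, s)
    print(name, s2, s)
```

The identical dataset yields zero for both statistics, and the clustered dataset yields smaller values than the scattered one.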

The formula for the sample variance, or the variance of a simple random sample of data, is as follows:


$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Source: Wikipedia

In this formula, $s^2$ is the symbol that represents the sample variance, $n$ is the size of the simple random sample, $\bar{y}$ is the mean of the sample, and $y_i$ is an observation, or data point, for individual $i$. The sample standard deviation, $s$, is then the square root of the sample variance.
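The definition can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual n - 1 divisor for the sample variance; the function names `sample_variance` and `sample_std_dev` and the `heights` values are hypothetical and are not taken from the source.

```python
import math
import statistics


def sample_variance(values):
    """Sample variance of a simple random sample, using the n - 1 divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    y_bar = sum(values) / n                      # sample mean
    squared_deviations = ((y - y_bar) ** 2 for y in values)
    return sum(squared_deviations) / (n - 1)


def sample_std_dev(values):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical sample of heights in inches.
heights = [61.0, 63.5, 64.0, 66.5, 68.0]
s2 = sample_variance(heights)
s = sample_std_dev(heights)

print("sample variance:", s2)
print("sample standard deviation:", s)
# Cross-check against the standard library implementation.
print("matches statistics.variance:", math.isclose(s2, statistics.variance(heights)))
```

In practice the standard library functions `statistics.variance` and `statistics.stdev` can be used directly; the hand-written version is shown only to make each term of the formula visible.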

The standard error (se) reflects both the dispersion of the underlying population and the size of the sample drawn from it. For the sample mean, it is estimated by dividing the sample standard deviation, $s$, by the square root of the sample size, $\sqrt{n}$; that is, $se = s / \sqrt{n}$. We can think of the standard error as measuring how precisely we have estimated the population mean, or another parameter, via the sample mean, or another statistic. As the sample size grows, the standard error shrinks, reflecting the fact that our estimate of the mean, or of another parameter, becomes more precise.
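The shrinking of the standard error with sample size can be seen in a small simulation. The sketch below is an illustration under stated assumptions: the population is synthetic (normally distributed heights with a mean of 65 and a standard deviation of 3 inches, both made up), the function name `standard_error` is hypothetical, and samples are drawn as simple random samples without replacement.

```python
import math
import random
import statistics


def standard_error(values):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    n = len(values)
    return statistics.stdev(values) / math.sqrt(n)


# Synthetic population, for illustration only (assumed mean 65, sd 3).
rng = random.Random(0)
population = [rng.gauss(65, 3) for _ in range(100_000)]

# Draw simple random samples of increasing size and compare standard errors.
se_by_size = {}
for n in (25, 100, 400, 1600):
    sample = rng.sample(population, n)
    se_by_size[n] = standard_error(sample)
    print(n, round(se_by_size[n], 3))
```

Because the sample standard deviation stays near the population value while the divisor grows, quadrupling the sample size roughly halves the standard error.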

Calculation of the sample variance, sample standard deviation, and the standard error in the case of a complex sample design is described at http://epubs.surrey.ac.uk.