Chapter 11 Language of Descriptive Statistics

Section 11.3 Statistical Measures

11.3.3 Measures of Dispersion


Means and quantiles are measures of position, i.e. they give information on the absolute position of the quantitative values $x_j$. If we add a constant $c$ to every value $x_j$, then the position measures also increase by $c$. In contrast, measures of dispersion give information on the spread or relative distribution of the data values, independent of their absolute position. Consider a sample of size $n \geq 2$ of a quantitative property $X$. Let the original list be given by $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$.
Info 11.3.15
 
The sample variance of the original list is defined as

\[ s_x^2 \;=\; \frac{1}{n-1} \cdot \sum_{k=1}^{n} (x_k - \bar{x})^2 \;=\; \frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1} . \]

The sample standard deviation is defined by $s_x = +\sqrt{s_x^2}$.
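
The definition can be transcribed almost literally into Python. The following is a minimal sketch and not part of the original material: the function names `sample_variance` and `sample_std` and the data list are chosen here purely for illustration.

```python
import math

def sample_variance(xs):
    """Sample variance with denominator n - 1 (needs n >= 2 values)."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(xs) / n  # the mean has to be computed first
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Sample standard deviation: non-negative square root of the variance."""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 6]  # hypothetical sample, used only for illustration
print(sample_variance(data))
print(sample_std(data))
```

The check `n < 2` mirrors the requirement $n \geq 2$: for a single value the denominator $n-1$ would be zero.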

The sample variance is a measure of dispersion that describes the variability of the sample. The smaller the variance, the "closer" the data values lie to each other. A variance $s_x^2 = 0$ is only possible if all data values are equal. Since the deviations enter as squares, the variance is measured in the squared unit of the data and grows strongly when individual values lie far from the mean. The standard deviation, which has the same unit as the data, is therefore a more appropriate measure for the "broadness" of the distribution of data values. The two formulas given above have a few pitfalls:
  • Before the variance can be calculated, the mean $\bar{x}$ must already be known.
  • The division by $n-1$ rather than by $n$ in the definition of $s_x^2$ has deeper mathematical reasons that can only be discussed in a statistics lecture.
  • The notation $s_x = +\sqrt{s_x^2}$ is a little misleading. You must not simply cancel the square against the square root: $s_x^2$ is a sum that has to be calculated first (it is not defined as the square of a single value), and only then is the root taken to determine $s_x$.
  • Be careful when using a scientific calculator with statistical functions: the sample variance is available via the $s^2$ key. The $\sigma^2$ key, however, provides the sum with denominator $n$ instead of $n-1$. This is not the sample variance.
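
The same distinction between the two denominators appears in software. As a small sketch, Python's standard `statistics` module offers `variance` and `stdev` (denominator $n-1$) next to `pvariance` and `pstdev` (denominator $n$). The data list below is a hypothetical sample chosen only for illustration.

```python
import statistics

data = [2.0, 4.0, 4.0, 6.0]  # hypothetical sample, for illustration only
n = len(data)

s2 = statistics.variance(data)       # denominator n - 1: the sample variance
sigma2 = statistics.pvariance(data)  # denominator n: NOT the sample variance

print("s^2     =", s2)
print("sigma^2 =", sigma2)

# Both values are built from the same sum of squared deviations,
# so they differ exactly by the factor (n - 1) / n.
print(abs(sigma2 - s2 * (n - 1) / n) < 1e-12)
```

For large $n$ the two values are close, but for small samples the difference is clearly visible.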

Example 11.3.16
The data sequence $x = (-1, 0, 1)$ has the mean $\bar{x} = 0$ and the sample variance

\[ s_x^2 \;=\; \frac{1}{n-1} \cdot \sum_{k=1}^{n} (x_k - \bar{x})^2 \;=\; \frac{1}{3-1} \cdot \left( (-1-0)^2 + (0-0)^2 + (1-0)^2 \right) \;=\; 1 . \]

Adding further zeros to the data sequence does not change the position measure $\bar{x}$, but the measure of dispersion $s_x^2$ does change, since the data values are then more strongly concentrated at the mean. In contrast, shifting all data values by a constant does not change the variance. For example, the data sequence $(-5, -4, -3)$ also has variance $1$.
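
Both observations can be checked numerically. The sketch below uses Python's standard `statistics` module. The variable names and the choice of appending exactly two zeros are assumptions made here for illustration.

```python
import statistics

base = [-1, 0, 1]                # data sequence from the example
with_zeros = [-1, 0, 0, 0, 1]    # two further zeros added (illustrative choice)
shifted = [x - 4 for x in base]  # every value shifted by -4: (-5, -4, -3)

for name, data in [("base", base), ("with_zeros", with_zeros), ("shifted", shifted)]:
    # statistics.variance uses the denominator n - 1
    print(name, statistics.mean(data), statistics.variance(data))
```

With two extra zeros the sum of squared deviations is still $2$, but the denominator is now $5-1=4$, so the variance drops from $1$ to $0.5$ while the mean stays at $0$. The shifted sequence has mean $-4$ and the unchanged variance $1$.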

Exercise 11.3.17
A data sequence (with an unknown number $n$ of values) has the measures $\bar{x} = 4$, $s_x^2 = 10$, and the median $\tilde{x} = 3$. Suppose the values of a second data sequence satisfy the equation $y_k = (-2) \cdot x_k$ for every $k$. What are its measures?
Answer: the measures are $\bar{y} =$ ____, $s_y^2 =$ ____, and $\tilde{y} =$ ____.
Hint: recall the definitions of the mean, the sample variance, and the median, and consider how multiplying all $x$-values by a factor of $(-2)$ influences the entire expression.