#### Chapter 11 Language of Descriptive Statistics

Section 11.3 Statistical Measures

# 11.3.3 Measures of Dispersion

Means and quantiles are measures of position, i.e. they give information on the absolute position of the quantitative values ${x}_{j}$. If we add a constant $c$ to every value ${x}_{j}$, then the position measures also increase by $c$. In contrast, measures of dispersion give information on the spread or relative distribution of the data values, independent of their absolute position. Consider a sample of size $n\ge 2$ of a quantitative property $X$. Let the original list be given by $x=\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)\in {ℝ}^{n}$.
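
The shift property of the position measures can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the sample `x` and the constant `c` are made-up values chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical sample and shift constant (not taken from the text).
x = [2.0, 3.0, 7.0, 8.0]
c = 10.0

# Add the constant c to every data value.
shifted = [value + c for value in x]

# Both position measures move by exactly c.
print("mean:  ", mean(x), "->", mean(shifted))
print("median:", median(x), "->", median(shifted))
```

Both printed pairs differ by exactly `c`, as stated above for position measures.
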
##### Info 11.3.15

The sample variance of the original list is defined as

${s}_{x}^{2}\mathrm{ }=\mathrm{ }\frac{1}{n-1}·\sum _{k=1}^{n}\left({x}_{k}-\stackrel{‾}{x}{\right)}^{2}\mathrm{ }=\mathrm{ }\frac{\left({x}_{1}-\stackrel{‾}{x}{\right)}^{2}+\dots +\left({x}_{n}-\stackrel{‾}{x}{\right)}^{2}}{n-1} .$

The sample standard deviation is defined by ${s}_{x}=+\sqrt{{s}_{x}^{2}}$.

The sample variance is a measure of dispersion that describes the variability of the observation sample. The smaller the variance, the "closer" the data values lie to each other. A variance ${s}_{x}^{2}=0$ is only possible if all data values are equal. Because the deviations are squared, the variance grows strongly as the data values spread out, and it is expressed in the squared unit of the data. The standard deviation, which has the same unit as the data, is therefore a more appropriate measure for the "broadness" of the distribution of data values. The two formulas given above have a few pitfalls:
• Before the variance can be calculated the mean $\stackrel{‾}{x}$ must already be known.
• In the definition of ${s}_{x}^{2}$, the sum is divided by $n-1$ and not by $n$. This has deeper mathematical reasons that can only be discussed in a statistics lecture.
• The notation ${s}_{x}=+\sqrt{{s}_{x}^{2}}$ is a little misleading. The square root does not simply cancel the square: ${s}_{x}^{2}$ is defined as a sum and not as the square of a single known value, so the sum must be calculated first and the root taken afterwards to determine ${s}_{x}$.
• Be careful when using a scientific calculator with statistical functions: the sample variance is available via the ${s}^{2}$ key. The ${\sigma }^{2}$ key, however, provides the sum with denominator $n$ instead of $n-1$. This is not the sample variance.
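
The definition and the pitfalls listed above can be sketched in a few lines of Python. The helper names `sample_variance` and `sample_std` and the data list are assumptions made for this illustration. The functions `statistics.variance` (denominator $n-1$) and `statistics.pvariance` (denominator $n$) from the standard library play the roles of the ${s}^{2}$ and ${\sigma }^{2}$ keys of a calculator.

```python
import math
import statistics


def sample_variance(values):
    """Sample variance with denominator n - 1 (hypothetical helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    # The mean must be known before the squared deviations can be summed.
    x_bar = sum(values) / n
    return sum((v - x_bar) ** 2 for v in values) / (n - 1)


def sample_std(values):
    """Sample standard deviation: non-negative root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Made-up data for illustration only.
data = [2.0, 4.0, 4.0, 6.0]

print("s^2 by hand:      ", sample_variance(data))
print("s^2 (library):    ", statistics.variance(data))   # denominator n - 1
print("sigma^2 (library):", statistics.pvariance(data))  # denominator n
print("s by hand:        ", sample_std(data))
```

For these values the sum of squared deviations is $8$, so the denominator $n-1=3$ and the denominator $n=4$ give visibly different results.
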

##### Example 11.3.16
The data sequence $x=\left(-1,0,1\right)$ has the mean $\stackrel{‾}{x}=0$ and the sample variance

${s}_{x}^{2}\mathrm{ }=\mathrm{ }\frac{1}{n-1}·\sum _{k=1}^{n}\left({x}_{k}-\stackrel{‾}{x}{\right)}^{2}\mathrm{ }=\mathrm{ }\frac{1}{3-1}·\left(\left(-1-0{\right)}^{2}+\left(0-0{\right)}^{2}+\left(1-0{\right)}^{2}\right)\mathrm{ }=\mathrm{ }1 .$

Adding further zeros to the data sequence does not change the position measure $\stackrel{‾}{x}$, but the measure of dispersion ${s}_{x}^{2}$ does change, since the data values are then more strongly concentrated at the mean. In contrast, shifting all data values by a constant does not change the variance. For example, the data sequence $\left(-5,-4,-3\right)$ also has variance $1$.
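
Both observations can be checked with a short Python sketch based on the standard `statistics` module. The choice of appending exactly two zeros is an assumption made for illustration; the shifted sequence is the one from the example.

```python
from statistics import mean, variance

x = [-1, 0, 1]

# Append two further zeros (the number of zeros is an arbitrary choice).
with_zeros = x + [0, 0]

# Shift every value by the constant -4, which gives (-5, -4, -3).
shifted = [value - 4 for value in x]

print("original:  ", mean(x), variance(x))
print("with zeros:", mean(with_zeros), variance(with_zeros))
print("shifted:   ", mean(shifted), variance(shifted))
```

With the two extra zeros the mean stays at $0$, while the sum of squared deviations is still $2$ and is now divided by $5-1=4$, so the variance drops to $0.5$. The shifted sequence has mean $-4$ and variance $1$.
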

##### Exercise 11.3.17
A data sequence (with an unknown number $n$ of values) has the measures $\stackrel{‾}{x}=4$, ${s}_{x}^{2}=10$, and the median $\stackrel{~}{x}=3$. Suppose the values of a second data sequence satisfy the equation ${y}_{k}=\left(-2\right)·{x}_{k}$ for every $k$. What are its measures?
Answer: the measures are $\stackrel{‾}{y}=$ ____, ${s}_{y}^{2}=$ ____, and $\stackrel{~}{y}=$ ____.
Hint: recall the definitions of the mean, the sample variance, and the median, and consider how multiplying all $x$-values by the factor $\left(-2\right)$ influences each expression.