#### Chapter 11 Language of Descriptive Statistics

Section 11.3 Statistical Measures

# 11.3.2 Robust Measures

The measures presented in this section are robust with respect to outliers: large deviations of single data values do not affect these measures (or affect them only slightly).
Consider an original list

$x\mathrm{ }=\mathrm{ }\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)$

for a sample of size $n$. Let the data ${x}_{i}$ be the property values of a quantitative property $X$.
##### Info 11.3.7

The list ${x}_{\left(\mathrm{ }\right)}=\left({x}_{\left(1\right)},{x}_{\left(2\right)},\dots ,{x}_{\left(n\right)}\right)$ obtained by sorting the original list in ascending order,

${x}_{\left(1\right)}\mathrm{ }\le \mathrm{ }{x}_{\left(2\right)}\mathrm{ }\le \mathrm{ }\dots \mathrm{ }\le \mathrm{ }{x}_{\left(n\right)}$

is called an ordered list or ordered sample (of the original list $x$). The $i$th entry ${x}_{\left(i\right)}$ in the ordered list is the $i$th smallest value in the original list.

##### Example 11.3.8
Let us again consider the original list $x=\left({x}_{1},{x}_{2},\dots ,{x}_{20}\right)$ for the sample of size $n=20$ from the examples above. Sorting it in ascending order yields the following ordered sample ${x}_{\left(\mathrm{ }\right)}=\left({x}_{\left(1\right)},{x}_{\left(2\right)},\dots ,{x}_{\left(20\right)}\right)$:

$\begin{array}{cccccccccccccccccccc}\hfill 7\hfill & \hfill 9\hfill & \hfill 9\hfill & \hfill 9\hfill & \hfill 9\hfill & \hfill 10\hfill & \hfill 10\hfill & \hfill 10\hfill & \hfill 10\hfill & \hfill 11\hfill & \hfill 11\hfill & \hfill 11\hfill & \hfill 11\hfill & \hfill 12\hfill & \hfill 12\hfill & \hfill 12\hfill & \hfill 12\hfill & \hfill 13\hfill & \hfill 13\hfill & \hfill 22\hfill \end{array}$

##### Info 11.3.9

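The (empirical) median $\stackrel{~}{x}$ of ${x}_{1},{x}_{2},\dots ,{x}_{n}$ is defined as

$\stackrel{~}{x}\mathrm{ }=\mathrm{ }\left\{\begin{array}{ccc}{x}_{\left(\frac{n+1}{2}\right)}\hfill & \text{if}\hfill & n\text{ is odd}\hfill \\ \frac{1}{2}·\left({x}_{\left(\frac{n}{2}\right)}+{x}_{\left(\frac{n}{2}+1\right)}\right)\hfill & \text{if}\hfill & n\text{ is even .}\hfill \end{array}\right.$

In contrast to the arithmetic mean, the (empirical) median is not sensitive to outliers. For example, the largest value in the ordered list can be arbitrarily enlarged without changing the median.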
##### Example 11.3.10
In the example above, the sample size $n=20$ is even. Thus, we have for the median

$\stackrel{~}{x}\mathrm{ }=\mathrm{ }\frac{1}{2}·\left({x}_{\left(10\right)}+{x}_{\left(11\right)}\right)\mathrm{ }=\mathrm{ }\frac{1}{2}·\left(11+11\right)\mathrm{ }=\mathrm{ }11 .$
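The calculation above can be reproduced with a short Python sketch. The function name `median` and the variable names are chosen here for illustration only; the data are the ordered sample from Example 11.3.8. The second call replaces the largest value by a much larger one to illustrate that the median does not react to this outlier.

```python
def median(values):
    """Empirical median: the middle entry of the ordered sample if n is odd,
    the mean of the two middle entries if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: x_((n+1)/2), which is index mid in 0-based counting
        return ordered[mid]
    # n even: mean of x_(n/2) and x_(n/2+1)
    return (ordered[mid - 1] + ordered[mid]) / 2


sample = [7, 9, 9, 9, 9, 10, 10, 10, 10, 11,
          11, 11, 11, 12, 12, 12, 12, 13, 13, 22]

print(median(sample))    # 11.0

# Enlarge the largest value: the median stays the same.
inflated = sample[:-1] + [1000]
print(median(inflated))  # 11.0
```

The arithmetic mean of `inflated`, by contrast, is pulled far away from the bulk of the data.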

Approximately half of the values in the original list are less than or equal to the median, and half of the values are greater than or equal to the median $\stackrel{~}{x}$. This principle can be generalised to define quantiles. For this purpose, take an original list $x=\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)$ for a sample of size $n$ of a quantitative property $X$.
##### Info 11.3.11

Let

${x}_{\left(\mathrm{ }\right)}\mathrm{ }=\mathrm{ }\left({x}_{\left(1\right)},{x}_{\left(2\right)},\dots ,{x}_{\left(n\right)}\right)$

be the corresponding ordered sample and

$\alpha \in \left(0,1\right)\mathrm{ }\mathrm{ }\text{and}\mathrm{ }\mathrm{ }k=\text{floor}\left(n·\alpha \right)\mathrm{ }=\mathrm{ }⌊n·\alpha ⌋ .$

Then

${\stackrel{~}{x}}_{\alpha }\mathrm{ }=\mathrm{ }\left\{\begin{array}{ccc}{x}_{\left(k+1\right)}\hfill & \text{if}\hfill & n·\alpha \notin ℕ\hfill \\ \frac{1}{2}·\left({x}_{\left(k\right)}+{x}_{\left(k+1\right)}\right)\hfill & \text{if}\hfill & n·\alpha \in ℕ\hfill \end{array}\right.$

is called a sample $\alpha$-quantile or simply $\alpha$-quantile of ${x}_{1},{x}_{2},\dots ,{x}_{n}$.

The $0.25$-quantile is also called the lower quartile. It separates approximately the lowest 25 % of the data values from the highest 75 %. Accordingly, the $0.75$-quantile is called the upper quartile. For $\alpha =0.5$ we obtain the median, i.e. $\stackrel{~}{x}={\stackrel{~}{x}}_{0.5}$. In general, for $\alpha \in \left(0,1\right)$, the $\alpha$-quantile splits the ordered list ${x}_{\left(1\right)},{x}_{\left(2\right)},\dots ,{x}_{\left(n\right)}$ so that approximately $\alpha ·100\%$ of the data values are less than or equal to ${\stackrel{~}{x}}_{\alpha }$ and approximately $\left(1-\alpha \right)·100\%$ of the data values are greater than or equal to ${\stackrel{~}{x}}_{\alpha }$.
##### Example 11.3.12
Consider again the original list $x=\left({x}_{1},{x}_{2},\dots ,{x}_{20}\right)$ for the sample of size $n=20$ from the examples above together with the ordered sample ${x}_{\left(\mathrm{ }\right)}=\left({x}_{\left(1\right)},{x}_{\left(2\right)},\dots ,{x}_{\left(20\right)}\right)$

$\begin{array}{cccccccccccccccccccc}\hfill 7\hfill & \hfill 9\hfill & \hfill 9\hfill & \hfill 9\hfill & \hfill 9\hfill & \hfill 10\hfill & \hfill 10\hfill & \hfill 10\hfill & \hfill 10\hfill & \hfill 11\hfill & \hfill 11\hfill & \hfill 11\hfill & \hfill 11\hfill & \hfill 12\hfill & \hfill 12\hfill & \hfill 12\hfill & \hfill 12\hfill & \hfill 13\hfill & \hfill 13\hfill & \hfill 22\hfill \end{array}$

For $\alpha =0.25$ we have $n·\alpha =\frac{20}{4}=5\in ℕ$, so the second case of the definition applies with $k=5$, i.e. for the lower quartile (the $25\%$-quantile) we have

${\stackrel{~}{x}}_{0.25}\mathrm{ }=\mathrm{ }\frac{1}{2}·\left({x}_{\left(5\right)}+{x}_{\left(6\right)}\right)\mathrm{ }=\mathrm{ }\frac{1}{2}·\left(9+10\right)\mathrm{ }=\mathrm{ }\frac{19}{2}\mathrm{ }=\mathrm{ }9.5 .$

For the upper quartile, we set $\alpha =0.75$ and obtain $n·\alpha =\frac{20·3}{4}=15\in ℕ$, hence

${\stackrel{~}{x}}_{0.75}\mathrm{ }=\mathrm{ }\frac{1}{2}·\left({x}_{\left(15\right)}+{x}_{\left(16\right)}\right)\mathrm{ }=\mathrm{ }\frac{1}{2}·\left(12+12\right)\mathrm{ }=\mathrm{ }12 .$
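The quantile definition from Info 11.3.11 can be sketched in Python as follows. The function name `quantile` is chosen here for illustration. As an implementation assumption, $\alpha$ is converted to an exact fraction so that the test whether $n·\alpha$ is a natural number is not disturbed by floating-point rounding. Library routines often use other quantile conventions (for example linear interpolation), so their results may differ from this definition.

```python
from fractions import Fraction
from math import floor


def quantile(values, alpha):
    """Sample alpha-quantile following the case distinction of Info 11.3.11."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the quantile of an empty sample is undefined")
    # Exact decimal representation of alpha (assumption of this sketch),
    # so that n * alpha is computed without round-off.
    a = Fraction(str(alpha))
    if not 0 < a < 1:
        raise ValueError("alpha must lie in the open interval (0, 1)")
    n_alpha = n * a
    k = floor(n_alpha)
    if n_alpha.denominator == 1:
        # n * alpha is a natural number: mean of x_(k) and x_(k+1)
        return (ordered[k - 1] + ordered[k]) / 2
    # otherwise: x_(k+1), which is index k in 0-based counting
    return ordered[k]


sample = [7, 9, 9, 9, 9, 10, 10, 10, 10, 11,
          11, 11, 11, 12, 12, 12, 12, 13, 13, 22]

print(quantile(sample, 0.25))  # 9.5   (lower quartile)
print(quantile(sample, 0.5))   # 11.0  (median)
print(quantile(sample, 0.75))  # 12.0  (upper quartile)
```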

Again, let a sample of size $n$ of a quantitative property $X$ be given, with the corresponding ordered sample

${x}_{\left(\mathrm{ }\right)}\mathrm{ }=\mathrm{ }\left({x}_{\left(1\right)},{x}_{\left(2\right)},\dots ,{x}_{\left(n\right)}\right)$

and

$\alpha \in \left[0,\mathrm{ }0.5\right)\mathrm{ }\mathrm{ }\text{and}\mathrm{ }\mathrm{ }k\mathrm{ }=\mathrm{ }\text{floor}\left(n·\alpha \right)\mathrm{ }=\mathrm{ }⌊n·\alpha ⌋ .$

##### Info 11.3.13

The $\alpha$-trimmed (or $\alpha$-truncated) sample mean is defined as

${\stackrel{‾}{x}}_{\alpha }\mathrm{ }=\mathrm{ }\frac{1}{n-2·k}·\sum _{j=k+1}^{n-k}{x}_{\left(j\right)}\mathrm{ }=\mathrm{ }\frac{1}{n-2·k}·\left({x}_{\left(k+1\right)}+\dots +{x}_{\left(n-k\right)}\right) .$

The $\alpha$-trimmed mean is an arithmetic mean that discards the $k=⌊n·\alpha ⌋$ largest and the $k$ smallest data points from the calculation, i.e. roughly $\alpha ·100\%$ of the data at each end. Thus, it is a flexible tool for protecting against outliers at the boundaries of the data range. However, we must not forget that we no longer take all data into account when we use this tool.
##### Example 11.3.14
For the data set considered in the previous examples, the ordered sample ${x}_{\left(\right)}=\left({x}_{\left(1\right)},{x}_{\left(2\right)},\dots ,{x}_{\left(20\right)}\right)$ is given by

$\begin{array}{cccccccccccccccccccc}\hfill 7\hfill & \hfill 9\hfill & \hfill 9\hfill & \hfill 9\hfill & \hfill 9\hfill & \hfill 10\hfill & \hfill 10\hfill & \hfill 10\hfill & \hfill 10\hfill & \hfill 11\hfill & \hfill 11\hfill & \hfill 11\hfill & \hfill 11\hfill & \hfill 12\hfill & \hfill 12\hfill & \hfill 12\hfill & \hfill 12\hfill & \hfill 13\hfill & \hfill 13\hfill & \hfill 22 ,\hfill \end{array}$

and for $\alpha =0.12$ with $k=⌊20·0.12⌋=⌊2.4⌋=2$ we obtain the $12\%$-trimmed mean of the sample

${\stackrel{‾}{x}}_{0.12}\mathrm{ }=\mathrm{ }\frac{1}{16}·\sum _{j=3}^{18}{x}_{\left(j\right)}\mathrm{ }=\mathrm{ }\frac{1}{16}·172\mathrm{ }=\mathrm{ }10.75 .$

It is less than the arithmetic mean $\stackrel{‾}{x}=11.15$ since outliers, such as ${x}_{\left(20\right)}=22$, were ignored.
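The trimmed mean from Info 11.3.13 can be checked with a minimal Python sketch. The function name `trimmed_mean` is chosen here for illustration, and the exact-fraction conversion of $\alpha$ is an implementation assumption to keep $⌊n·\alpha ⌋$ free of floating-point round-off.

```python
from fractions import Fraction
from math import floor


def trimmed_mean(values, alpha):
    """alpha-trimmed mean: drop the k = floor(n * alpha) smallest and the
    k largest values, then average the remaining n - 2k values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the trimmed mean of an empty sample is undefined")
    a = Fraction(str(alpha))  # exact decimal value of alpha (assumption of this sketch)
    if not 0 <= a < Fraction(1, 2):
        raise ValueError("alpha must lie in the interval [0, 0.5)")
    k = floor(n * a)
    kept = ordered[k:n - k]   # x_(k+1), ..., x_(n-k)
    return sum(kept) / len(kept)


sample = [7, 9, 9, 9, 9, 10, 10, 10, 10, 11,
          11, 11, 11, 12, 12, 12, 12, 13, 13, 22]

print(trimmed_mean(sample, 0.12))  # 10.75
print(trimmed_mean(sample, 0))     # untrimmed case: the ordinary arithmetic mean
```

For $\alpha =0$ nothing is discarded, so the function returns the ordinary arithmetic mean of the sample.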