Chapter 11 Language of Descriptive Statistics

Section 11.3 Statistical Measures

11.3.2 Robust Measures

The measures presented in this section are robust with respect to outliers: large deviations of single data values do not affect these measures (or affect them only slightly).
Consider an original list

x = (x_{1}, x_{2}, \dots, x_{n})

for a sample of size

n

. Let the data

x_{i}

be the property values of a quantitative property

X

.

Info 11.3.7

The list

x_{()} = (x_{(1)}, x_{(2)}, \dots, x_{(n)})

obtained by sorting in ascending order

x_{(1)} \leq x_{(2)} \leq \dots \leq x_{(n)}

of the original list is called an ordered list or ordered sample (of the original list

x

). The

i

th entry

x_{(i)}

in the ordered list is the

i

th smallest value in the original list.

Example 11.3.8

Let us again consider the original list

x = (x_{1}, x_{2}, \dots, x_{20})

for the sample of size

n = 20

from the examples above. Sorting in ascending order

x_{()} = (x_{(1)}, x_{(2)}, \dots, x_{(20)})

results in the following ordered sample:

\begin{matrix} 7 & 9 & 9 & 9 & 9 & 10 & 10 & 10 & 10 & 11 & 11 & 11 & 11 & 12 & 12 & 12 & 12 & 13 & 13 & 22 \end{matrix}
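The sorting step can be sketched in a few lines of Python. The unsorted arrangement of the 20 values below is an assumption made purely for illustration, since this section only reproduces the ordered sample; the helper name `order_statistic` is likewise hypothetical.

```python
# Hypothetical arrangement of the 20 sample values (assumption: the original
# order is not reproduced in this section, only the ordered sample is).
x = [10, 9, 12, 22, 11, 9, 13, 10, 7, 12,
     11, 9, 10, 12, 11, 13, 9, 10, 11, 12]

# Ordered sample x_(1) <= x_(2) <= ... <= x_(n)
x_sorted = sorted(x)


def order_statistic(sample, i):
    """Return x_(i), the i-th smallest value (i is 1-based, as in the text)."""
    if not 1 <= i <= len(sample):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(sample)[i - 1]


print(x_sorted)
print(order_statistic(x, 1), order_statistic(x, 20))  # smallest and largest value
```

Note that Python lists are indexed from 0, so the entry written as x_{(i)} in the text corresponds to index i - 1 in the sorted list.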

Info 11.3.9

The (empirical) median

\tilde{x}

of

x_{1}, x_{2}, \dots, x_{n}

is defined as

\tilde{x} = \begin{cases} x_{(\frac{n + 1}{2})} & \text{for } n \text{ odd} \\ \frac{1}{2} \cdot (x_{(\frac{n}{2})} + x_{(\frac{n}{2} + 1)}) & \text{for } n \text{ even.} \end{cases}

In contrast to the arithmetic mean, the (empirical) median is not sensitive to outliers. For example, the largest value in the ordered list can be made arbitrarily large without changing the median.

Example 11.3.10

In the example above, the sample size

n = 20

is even. Thus, we have for the median

\tilde{x} = \frac{1}{2} \cdot (x_{(10)} + x_{(11)}) = \frac{1}{2} \cdot (11 + 11) = 11 .
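As a minimal sketch, the case distinction in the definition of the median can be written out directly in Python. The function name `median` and the replacement value 2200 are illustrative choices, not part of the course material. The second half of the sketch checks the robustness claim by enlarging the largest value.

```python
def median(sample):
    """Empirical median following the odd/even case distinction in the text."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    if n % 2 == 1:
        # n odd: x_((n+1)/2), shifted by one for 0-based indexing
        return float(xs[(n + 1) // 2 - 1])
    # n even: mean of x_(n/2) and x_(n/2 + 1)
    return 0.5 * (xs[n // 2 - 1] + xs[n // 2])


ordered = [7, 9, 9, 9, 9, 10, 10, 10, 10, 11,
           11, 11, 11, 12, 12, 12, 12, 13, 13, 22]

print(median(ordered))  # 11.0

# Replace the largest value 22 by a much larger one (arbitrary choice).
inflated = ordered[:-1] + [2200]
print(median(inflated))  # 11.0, unchanged

# The arithmetic mean, by contrast, reacts strongly to the same change.
print(sum(ordered) / len(ordered), sum(inflated) / len(inflated))
```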

Approximately half of the values in the original list are less than or equal to the median, and half of the values are greater than or equal to the median

\tilde{x}

. This principle can be generalised to define quantiles. For this purpose, take an original list

x = (x_{1}, x_{2}, \dots, x_{n})

for a sample of size

n

of a quantitative property

X

.

Info 11.3.11

Let

x_{()} = (x_{(1)}, x_{(2)}, \dots, x_{(n)})

be the corresponding ordered sample and

α \in (0,1) \quad \text{and} \quad k = \operatorname{floor}(n \cdot α) = ⌊ n \cdot α ⌋ .

Then

{\tilde{x}}_{α} = \begin{cases} x_{(k + 1)} & \text{if } n \cdot α \notin ℕ \\ \frac{1}{2} \cdot (x_{(k)} + x_{(k + 1)}) & \text{if } n \cdot α \in ℕ \end{cases}

is called a sample

α

-quantile or simply

α

-quantile of

x_{1}, x_{2}, \dots, x_{n}

.

The

0.25

-quantile is also called the lower quartile. It splits off approximately the lowest 25 % of data values from the highest 75 %. Accordingly, the

0.75

-quantile is called the upper quartile. For

α = 0.5

we have the median, i.e.

\tilde{x} = {\tilde{x}}_{0.5}

. If

α \in (0,1)

, the ordered list

x_{(1)}, x_{(2)}, \dots, x_{(n)}

is split so that approximately

α \cdot 100 %

of the data values are less than or equal to

{\tilde{x}}_{α}

and approximately

(1 - α) \cdot 100 %

of the data values are greater than or equal to

{\tilde{x}}_{α}

.

Example 11.3.12

Consider again the original list

x = (x_{1}, x_{2}, \dots, x_{20})

for the sample of size

n = 20

from the examples above together with the ordered sample

x_{()} = (x_{(1)}, x_{(2)}, \dots, x_{(20)})

\begin{matrix} 7 & 9 & 9 & 9 & 9 & 10 & 10 & 10 & 10 & 11 & 11 & 11 & 11 & 12 & 12 & 12 & 12 & 13 & 13 & 22 \end{matrix}

For

α = 0.25

, the

25 %

-quantile falls under the case

n \cdot α = \frac{20}{4} = 5 \in ℕ

, i.e. for the lower quartile we have

{\tilde{x}}_{0.25} = \frac{1}{2} \cdot (x_{(5)} + x_{(6)}) = \frac{1}{2} \cdot (9 + 10) = \frac{19}{2} = 9.5 .

For the upper quartile, we set

α = 0.75

and obtain

n \cdot α = \frac{20 \cdot 3}{4} = 15 \in ℕ

, hence

{\tilde{x}}_{0.75} = \frac{1}{2} \cdot (x_{(15)} + x_{(16)}) = \frac{1}{2} \cdot (12 + 12) = 12 .
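The quantile definition with its two cases can be sketched as follows. The function name `quantile` is hypothetical. Converting α to an exact fraction is an implementation choice made here so that the test "n · α is a natural number" is not disturbed by floating-point rounding.

```python
import math
from fractions import Fraction


def quantile(sample, alpha):
    """Sample alpha-quantile using k = floor(n * alpha) and the two cases in the text."""
    a = Fraction(str(alpha))  # exact representation, e.g. 0.25 -> 1/4
    if not 0 < a < 1:
        raise ValueError("alpha must lie in the open interval (0, 1)")
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    product = n * a
    k = math.floor(product)
    if product.denominator == 1:
        # n * alpha is a natural number: average x_(k) and x_(k+1)
        return 0.5 * (xs[k - 1] + xs[k])
    # otherwise take x_(k+1), which is index k in 0-based indexing
    return float(xs[k])


ordered = [7, 9, 9, 9, 9, 10, 10, 10, 10, 11,
           11, 11, 11, 12, 12, 12, 12, 13, 13, 22]

print(quantile(ordered, 0.25))  # 9.5  (lower quartile)
print(quantile(ordered, 0.75))  # 12.0 (upper quartile)
print(quantile(ordered, 0.5))   # 11.0 (median)
```

Library routines for quantiles often use other interpolation conventions, so their results may differ slightly from the values obtained with this definition.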

Again, let a sample of size

n

be given for a quantitative property

X

with the corresponding ordered sample

x_{()} = (x_{(1)}, x_{(2)}, \dots, x_{(n)})

and

α \in [0, 0.5) \quad \text{and} \quad k = \operatorname{floor}(n \cdot α) = ⌊ n \cdot α ⌋ .

Info 11.3.13

The

α

-trimmed (or

α

-truncated) sample mean is defined as

{\overline{x}}_{α} = \frac{1}{n - 2 \cdot k} \cdot \sum_{j = k + 1}^{n - k} x_{(j)} = \frac{1}{n - 2 \cdot k} \cdot (x_{(k + 1)} + \dots + x_{(n - k)}) .

The

α

-trimmed mean is an arithmetic mean that discards the

α \cdot 100 %

largest and

α \cdot 100 %

smallest data points from the calculation. Thus, it is a flexible protection tool against outliers at the boundaries of the data range. However, we mustn't forget that we no longer take all data into account when we use this tool.

Example 11.3.14

For the data set considered throughout this section, the ordered sample

x_{()} = (x_{(1)}, x_{(2)}, \dots, x_{(20)})

is given by

\begin{matrix} 7 & 9 & 9 & 9 & 9 & 10 & 10 & 10 & 10 & 11 & 11 & 11 & 11 & 12 & 12 & 12 & 12 & 13 & 13 & 22 \end{matrix} ,

and for

α = 0.12

and

k = ⌊ 20 \cdot 0.12 ⌋ = ⌊ 2.4 ⌋ = 2

we obtain for the

12 %

-trimmed mean of the sample

{\overline{x}}_{0.12} = \frac{1}{16} \cdot \sum_{j = 3}^{18} x_{(j)} = \frac{1}{16} \cdot 172 = 10.75 .

It is less than the arithmetic mean

\overline{x} = 11.15

since outliers, such as

x_{(20)} = 22

, were ignored.
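The trimmed mean can be sketched in Python along the same lines: sort, drop the k smallest and k largest values, and average the rest. The function name `trimmed_mean` is hypothetical, and the exact-fraction conversion of α is again an implementation choice to keep the floor computation free of rounding effects.

```python
import math
from fractions import Fraction


def trimmed_mean(sample, alpha):
    """alpha-trimmed mean: drop the k smallest and k largest values, k = floor(n * alpha)."""
    a = Fraction(str(alpha))  # exact representation, e.g. 0.12 -> 3/25
    if not 0 <= a < Fraction(1, 2):
        raise ValueError("alpha must lie in [0, 0.5)")
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    k = math.floor(n * a)
    kept = xs[k:n - k]  # x_(k+1), ..., x_(n-k)
    return sum(kept) / len(kept)


ordered = [7, 9, 9, 9, 9, 10, 10, 10, 10, 11,
           11, 11, 11, 12, 12, 12, 12, 13, 13, 22]

print(trimmed_mean(ordered, 0.12))  # 10.75, with k = 2 and 16 values kept
print(trimmed_mean(ordered, 0))     # ordinary arithmetic mean, no trimming
```

For α = 0 nothing is discarded, so the function reduces to the ordinary arithmetic mean of the sample.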
