#### Chapter 11 Language of Descriptive Statistics

**Section 11.1 Terminology and Language**

# 11.1.1 Introduction

In a statistical observation (survey), the values or attributes of one or more properties are determined for appropriately chosen units of observation (also called units of investigation or experimental units). Here, a property is a characteristic of the observation unit that is to be investigated. The terminology of descriptive statistics is as follows:

- The **unit of investigation** (also: **unit of observation**) is the smallest unit on which the observations are made.

- The **characteristic** or **property** is the statistical variable of the unit to be investigated. Characteristics are often denoted by upper-case Latin letters ($X, Y, Z, \dots$).

- **Characteristic attributes** or **property values** are values that properties can take. They are often denoted by lower-case Latin letters ($a, b, \dots, x, y, z, a_1, a_2, \dots$).

- The set of units of observation that is investigated with respect to a property of interest is called the **universe** or the **population**. It is the set of all possible observation units.

- A **sample** is a "random finite subset" of a certain population of interest. If this set consists of $n$ elements, then it is called a "sample of size $n$".

- **Data** are the observed values (attributes) of one or more characteristics or properties of a sample unit of observation of a certain population.

- The **original list** is the protocol that lists the sampled data in chronological order. Thus, the original list is an $n$-tuple (or vector, written here mostly in coordinate form):

$x = (x_1, \dots, x_n).$

This $n$-tuple is often called a "sample of size $n$".

##### **Example 11.1.1**

From a daily production of components in a factory, $n=20$ samples of $15$ parts each are taken and the number of defective parts in each sample is determined. Here, ${x}_{i}$ is the number of defective parts in the $i$th sample, $i=1,\dots ,20$. The original list (sample of size $n=20$) contains the following data:

$x = (0, 4, 2, 1, 1, 0, 0, 2, 3, 1, 0, 5, 3, 1, 1, 2, 0, 0, 1, 0).$

In the second sample, ${x}_{2}=4$ defective parts were found. The population in this example is the set of all $15$-element subsets of the daily production. The property of interest in this case is

$X = \text{Number of defective workpieces in a sample of 15 elements}.$
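The terminology of Example 11.1.1 can be mirrored in a few lines of Python. This is only an illustrative sketch: the variable names `x`, `n`, `x_2` and `observed_values` are chosen here and are not part of the original text. The original list is stored as a tuple, its length is the sample size, and the distinct entries are the property values that actually occurred.

```python
# Original list from Example 11.1.1: number of defective parts
# found in each of the 20 samples, in the order they were taken.
x = (0, 4, 2, 1, 1, 0, 0, 2, 3, 1, 0, 5, 3, 1, 1, 2, 0, 0, 1, 0)

# Sample size n is the length of the n-tuple.
n = len(x)

# x_2 in the text is the second entry; Python indices start at 0.
x_2 = x[1]

# Distinct property values that were actually observed, in ascending order.
observed_values = sorted(set(x))

print(n)                # 20
print(x_2)              # 4
print(observed_values)  # [0, 1, 2, 3, 4, 5]
```

Note that the observed values are only a subset of the values that the property $X$ can take in principle (any integer from $0$ to $15$).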

Properties are roughly classified into qualitative properties (that can be ascertained in a descriptive way) and quantitative properties (that can naturally be ascertained numerically):

- **Qualitative properties**:

  - Nominal properties: attributes classified according to purely qualitative aspects. Examples: skin colour, nationality, blood type.

  - Ordinal properties: attributes with a natural hierarchy, i.e. they can be ordered or sorted. Examples: grades, ranks, surnames.

- **Quantitative properties**:

  - Discrete properties: property values are isolated values (e.g. integers). Examples: numbers, years, age in years.

  - Continuous properties: property values can (at least in principle) take any value. Examples: body size, weight, length.

The boundary between continuous and discrete properties is partly fluid once rounding is taken into account: a continuous quantity that is recorded only to a fixed precision takes isolated values in practice.
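The effect of rounding can be illustrated with a short Python sketch. The measurements below are hypothetical values invented for illustration; they do not come from the text. Body heights measured in metres are a continuous property, but once they are recorded to the nearest centimetre, only integer values remain.

```python
# Hypothetical body heights in metres (a continuous property).
heights_m = (1.7234, 1.8049, 1.6581, 1.7251)

# Recording each height to the nearest centimetre yields integers,
# so the recorded data behave like a discrete property.
heights_cm = tuple(round(h * 100) for h in heights_m)

print(heights_cm)  # (172, 180, 166, 173)
```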