1 Variables and relations
The text is partitioned into chapters and sections as indicated in the table of contents. A further subdivision into paragraphs (denoted by §) is provided at the beginning of each chapter.
1.1 Variables and distributions
- Statistical variables
- A fictitious illustration
- Multidimensional variables
- Statistical distributions
- Descriptive statistical statements
- Conditional distributions
- Regression functions
- Descriptive regression models
- Statistical and substantial conditions
1.2 Relations
- Relational variables
- Construction of networks
- Formal descriptions of networks
- Different kinds of relations
- Factual and modal views of relations
The first section introduces elementary statistical concepts for descriptive purposes, in particular, statistical variables and distributions, conditional distributions, and regression functions. The second section extends the statistical framework to allow for an explicit formal representation of relations and then discusses different notions of relations.
1.1 Variables and distributions
1. Statistical variables
Most statistical concepts derive from the notion of a statistical variable. Unfortunately, the word ‘variable’ is easily misleading because it suggests something that “varies” or is a “variable quantity.” To understand the notion properly, it is first of all necessary to distinguish statistical from logical variables. Consider the expression ‘x ≤ 5’. In this expression, x is a logical variable that can be replaced by a name. Obviously, without substituting a specific name, the expression ‘x ≤ 5’ has no definite meaning and, in particular, is neither a true nor a false statement. The expression is actually no statement at all but a sentential function. Only when a name is substituted for x does a statement result that is true, false, or meaningless. For example, if the symbol 1 is substituted for x, the result is a true statement (1 ≤ 5); if the symbol 9 is substituted for x, the result is a false statement (9 ≤ 5); and if some name not referring to a number is substituted for x, the result is meaningless. Such logical variables are used, for example, in mathematics to formulate general statements. Statistical variables serve a quite different purpose. They are used to represent the data for statistical calculations that refer to properties of objects. The basic idea is that one can characterize objects by properties. Since this is essentially an assignment of properties to objects, statistical variables are defined as functions:
$$X : \Omega \longrightarrow \tilde{\mathcal{X}}$$
X is the name of the function, Ω is its domain, and $\tilde{\mathcal{X}}$ is the codomain (a set of possible values), also called the range of X. To each element ω ∊ Ω, the statistical variable X assigns exactly one element of $\tilde{\mathcal{X}}$, denoted by X(ω). In this sense, a statistical variable is simply a function. What distinguishes statistical variables from other functions is a specific purpose: statistical variables serve to characterize objects. Therefore, in order to call X a statistical variable (and not just a function), its domain, Ω, should be a set of objects and its codomain, $\tilde{\mathcal{X}}$, should be a set of properties that can be meaningfully used to characterize the elements of Ω. As a reminder of this purpose, the set of possible values of a statistical variable will be called its characteristic or property space, and its elements will be called property values. In the statistical literature, domains of statistical variables are often called populations. This is unfortunate because a statistical variable can refer to any kind of object. We therefore often prefer to speak of the domain or, equivalently, the reference set of a statistical variable.
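The contrast between a logical and a statistical variable can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: the names `at_most_five` and `statistical_variable`, as well as the objects and colours used as data, are hypothetical and do not come from the text. The first function treats ‘x ≤ 5’ as a sentential function whose result is true, false, or meaningless (represented here by `None`); the second builds a statistical variable as an explicit assignment of property values to objects.

```python
from numbers import Real


def at_most_five(x):
    """Sentential function 'x <= 5' with a name substituted for x.

    Returns True or False for numbers, and None when the result is
    meaningless because the name does not refer to a number.
    """
    if isinstance(x, Real) and not isinstance(x, bool):
        return x <= 5
    return None


def statistical_variable(table, property_space):
    """Build X from an explicit assignment of property values to objects.

    `table` maps each object of the reference set to exactly one value;
    every value must belong to `property_space`.
    """
    for obj, value in table.items():
        if value not in property_space:
            raise ValueError(f"{value!r} assigned to {obj!r} is not a property value")

    def X(omega):
        return table[omega]  # raises KeyError outside the reference set

    return X


# Hypothetical objects and properties, chosen only for illustration.
colour = statistical_variable(
    {"car_a": "red", "car_b": "blue", "car_c": "red"},
    property_space={"red", "blue", "green"},
)
```

Calling `colour("car_b")` looks up the property value assigned to that object. The reference set here consists of cars rather than people, in line with the remark that a statistical variable can refer to any kind of object.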
2. A fictitious illustration
A simple example can serve to illustrate the notion of a statistical variable. In this example, the reference set is a set of 10 people, symbolically Ω := {ω₁, ..., ω₁₀}. The variable, denoted X, is intended to represent, for each member of Ω, the sex. This can be done with a property space $\tilde{\mathcal{X}}$ := {0, 1}, with elements 0 (meaning ‘male’) and 1 (meaning ‘female’). Then, for each member ω ∊ Ω, X(ω) is a value in $\tilde{\mathcal{X}}$ and shows ω’s sex. Of course, in order to make use of a statistical variable one needs data. In contrast to most functions that are used in mathematics, statistical variables cannot be defined by referring to some kind of rule. There is no rule that allows one to infer the sex, or any other property, of an individual from that individual’s name. In order to make a statistical variable explicitly known, one almost always needs a tabulation of its values. The left-hand side of Table 1.1 provides fictitious data as an illustration for the current example.
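A tabulated variable of this kind might be sketched in Python as follows. The ten values below are invented for this sketch and are not the entries of Table 1.1; the identifiers `OMEGA`, `PROPERTY_SPACE`, `X_TABLE`, and `describe` are likewise assumptions made for illustration. The point is that X is given by a lookup table rather than by a rule, and that the numeric codes only stand for the properties.

```python
# Reference set: ten people, named omega_1, ..., omega_10.
OMEGA = [f"omega_{i}" for i in range(1, 11)]

# Property space {0, 1} with the meaning of each numeric code.
PROPERTY_SPACE = {0: "male", 1: "female"}

# Hypothetical values, NOT the data of Table 1.1.
VALUES = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1]

# The tabulation: exactly one property value per member of OMEGA.
X_TABLE = dict(zip(OMEGA, VALUES))


def X(omega):
    """Return the property value that the table assigns to omega."""
    return X_TABLE[omega]


def describe(omega):
    """Translate the numeric code X(omega) back into the property it represents."""
    return PROPERTY_SPACE[X(omega)]
```

With these invented values, `X("omega_2")` evaluates to 1 and `describe("omega_2")` to `"female"`. Nothing in the name `"omega_2"` determines that value; it is known only from the table.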
Table 1.1 Fictitious data for a statistical variable X (left-hand side) and a two-dimensional statistical variable (X, Y) (right-hand side)
Note that it is general practice in statistics to represent properties (the elements of a property space) by numbers. One reason for doing so is the resulting simplification in the tabulation of statistical data. The main reason is, however, another one: numerical representations allow the pe...