Chapter 1
Introduction
This book is intended to provide a rigorous treatment of probability theory at the graduate level. The reader is assumed to have a working knowledge of probability and statistics at the undergraduate level. Certain things were over-simplified in more elementary courses because you were likely not ready for probability in its full generality. But now you are like a boxer who has built up enough experience and confidence to face the next higher level of competition. Do not be discouraged if it seems difficult at first. It will become easier as you learn certain techniques that will be used repeatedly. We will highlight the most important of these techniques by writing three stars (***) next to them and including them in summaries of key results found at the end of each chapter.
You will learn different methods of proof that will be useful for establishing classic probability results, as well as more generally in your graduate career and beyond. Early chapters build a probability foundation, after which we intersperse examples aimed at making seemingly esoteric mathematical constructs more intuitive. Necessary elements in definitions and conditions in theorems will become clear through these examples. Counterexamples will be used to further clarify nuances in meaning and expose common fallacies in logic.
At this point you may be asking yourself two questions: (1) Why is what I have learned so far not considered rigorous? (2) Why is more rigor needed? The answers will become clearer over time, but we hope this chapter gives you some partial answers. Because this chapter presents an introductory survey of problems that will be dealt with in depth in later material, it is somewhat less formal than subsequent chapters.
1.1 Why More Rigor is Needed
You have undoubtedly been given the following simplified presentation. There are two kinds of random variables—discrete and continuous. Discrete variables have a probability mass function and continuous variables have a probability density function. In actuality, there are random variables that are not discrete, and yet do not have densities. Their distribution functions are said to be singular. One interesting example is the following.
Example 1.1. No univariate density
Flip a biased coin with probability p of heads infinitely many times. Let X1, X2, X3, ... be the outcomes, with Xi = 0 denoting tails and Xi = 1 denoting heads on the ith flip. Now form the random number
$$Y = 0.X_1 X_2 X_3 \ldots \tag{1.1}$$
written in base 2. That is, $Y = X_1 \cdot (1/2) + X_2 \cdot (1/2)^2 + X_3 \cdot (1/2)^3 + \cdots$. The first digit X1 determines whether Y is in the first half [0, 1/2) (corresponding to X1 = 0) or second half [1/2, 1] (corresponding to X1 = 1). Whichever half Y is in, X2 determines whether Y is in the first or second half of that half, etc. (see Figure 1.1).
Figure 1.1
Base 2 representation of a number Y ∈ [0, 1]. X1 determines which half, [0, 1/2) or [1/2, 1], Y is in; X2 determines which half of that half Y is in, etc.
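The halving procedure in Figure 1.1 can be sketched in a few lines of Python. This is only an illustration: the function name dyadic_interval is hypothetical, and exact fractions are used so that no rounding enters. Given the first n digits, it returns the endpoints of the interval of width (1/2)^n that must contain Y.

```python
from fractions import Fraction

def dyadic_interval(digits):
    """Return (lo, hi) such that every Y whose base 2 expansion starts
    with `digits` satisfies lo <= Y <= hi."""
    lo = Fraction(0)
    width = Fraction(1)
    for d in digits:
        width /= 2          # each digit halves the current interval
        if d == 1:
            lo += width     # digit 1 selects the upper half
    return lo, lo + width

# X1 = 1 puts Y in [1/2, 1]; X2 = 0 then selects the lower half of that half.
print(dyadic_interval([1, 0]))
```

With digits 1, 0 the result is the interval from 1/2 to 3/4, matching the description in the caption.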
What is the probability mass function or density of the random quantity Y? If 0.x1x2 ... is the base 2 representation of y, then P(Y = y) = P(X1 = x1)P(X2 = x2)... = 0 if p ∈ (0,1): each of the infinitely many factors in the product is either p or (1 − p), so the product of the first n factors is at most max(p, 1 − p)^n, which tends to 0 as n → ∞. Because the probability of Y exactly equaling any given number y is 0, Y is not a discrete random variable. In the special case that p = 1/2, Y is uniformly distributed because Y is equally likely to be in the first or second half of [0, 1], then equally likely to be in either half of that half, etc. But what distribution does Y have if p ∈ (0,1) and p ≠ 1/2? It is by no means obvious, but we will show in Example 7.3 of Chapter 7 that for p ≠ 1/2, the distribution of Y has no density!
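To make the contrast between p = 1/2 and p ≠ 1/2 concrete, here is a minimal Python sketch that computes exact probabilities for Y at points with a finite base 2 expansion. The function names prefix_probability and cdf_at_dyadic are hypothetical, and p is assumed to be passed as a Fraction strictly between 0 and 1. The calculation uses only the independence of the flips: Y < y exactly when, at the first digit where the two expansions differ, Y has a 0 and y has a 1 (ties have probability 0).

```python
from fractions import Fraction

def prefix_probability(digits, p):
    """P(X1 = d1, ..., Xn = dn) for independent flips with P(Xi = 1) = p."""
    prob = Fraction(1)
    for d in digits:
        prob *= p if d == 1 else 1 - p
    return prob

def cdf_at_dyadic(digits, p):
    """P(Y < y) for y = 0.d1 d2 ... dn in base 2 (assumes 0 < p < 1)."""
    total = Fraction(0)
    match = Fraction(1)     # probability that Y agrees with y so far
    for d in digits:
        if d == 1:
            # Y agrees on earlier digits, then has a 0 where y has a 1
            total += match * (1 - p)
            match *= p
        else:
            match *= 1 - p
    return total

half, two_thirds = Fraction(1, 2), Fraction(2, 3)
print(cdf_at_dyadic([1, 0, 1], half))        # y = 5/8
print(cdf_at_dyadic([1], two_thirds))        # y = 1/2
print(prefix_probability([1] * 20, two_thirds))
```

For p = 1/2 the first call returns 5/8, the value of y itself, as expected for a uniform distribution. For p = 2/3 the second call returns 1/3 rather than 1/2, so Y is not uniform. The third call shows how quickly the probability of a fixed string of digits shrinks as the string grows. None of this shows that a density fails to exist; that argument is deferred to Example 7.3.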
Another way to think of the Xi in this example is that they represent treatment assignment (Xi = 0 means placebo, Xi = 1 means treatment) for individuals in a randomized clinical trial. Suppose that in a trial of size n, there is a planned imbalance in that roughly twice as many patients are assigned to treatment as to placebo. If we imagine an infinitely large clinical trial, the imbalance is so great that Y fails to have a density because of the preponderance of ones in its base 2 representation. We can also generate a random variable with no density by creating too much balance. Clinical trials often randomize using permuted blocks, whereby the number of patients assigned to treatment and placebo is forced to be balanced after every 2 patients, for example. Denote the assignments by X1, X2, X3, ..., again with Xi = 0 and Xi = 1 denoting placebo or treatment, respectively, for patient i. With permuted blocks of size 2, exactly one of X1, X2 is 1, exactly one of X3, X4 is 1, etc. In this case there is so much balance in an infinitely large clinical trial that again the random number defined by Equation (1.1) has no density (Example 5.33 of Section 5.6).
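One way to build intuition for the permuted block case is to count how little of [0, 1] the number Y can reach. The sketch below is an illustration and may differ from the argument used in Example 5.33; the function names block_prefixes and covered_length are hypothetical. It enumerates the digit strings allowed by k permuted blocks of size 2 and adds up the lengths of the corresponding intervals of width (1/2)^(2k).

```python
from fractions import Fraction
from itertools import product

def block_prefixes(num_blocks):
    """All digit strings (X1, ..., X_{2k}) allowed by permuted blocks of size 2."""
    blocks = [(0, 1), (1, 0)]   # exactly one treatment assignment per block
    return [tuple(d for block in choice for d in block)
            for choice in product(blocks, repeat=num_blocks)]

def covered_length(num_blocks):
    """Total length of the intervals of width 2^(-2k) that Y can fall in."""
    width = Fraction(1, 2 ** (2 * num_blocks))
    return len(block_prefixes(num_blocks)) * width

for k in (1, 2, 5):
    print(k, covered_length(k))
```

After k blocks only 2^k of the 4^k possible digit strings can occur, so the intervals that can contain Y have total length (1/2)^k, which shrinks to 0 as k grows. A distribution concentrated on a set of total length 0 cannot have a density, even though each individual value of Y still has probability 0.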
Example 1.2. No bivariate density
Here we present an example of a singular bivariate distribution de...