1.1 Introduction
Missing data, or incomplete data, is frequently encountered in many disciplines. Statistical analysis with missing data has been an area of considerable interest in the statistical community. Many tools, generic or tailor-made, have already been developed to handle missing data problems, and many more will be forthcoming. The missing data framework is particularly useful because many statistical problems can be treated as special cases of the missing data problem. For example, data with measurement error can be viewed as a special case of missing data in which an imperfect measurement is available instead of the true measurement. Two-phase sampling can also be viewed as a planned missing data problem in which, by design, the key items are observed only in the second-phase sample. Many statistical problems that employ a latent variable can likewise be viewed as missing data problems. Furthermore, advances in statistical computing have made missing data analysis techniques computationally more feasible. This book aims to cover the most up-to-date statistical theories and computational methods for missing data analysis.
Generally speaking, let z be the study variable with density function f(z; θ). We are interested in estimating the parameter θ. If z were observed for every unit in the sample, θ could be estimated by the maximum likelihood method. Instead of observing z, however, we only observe y and δ, where y is an incomplete version of z satisfying y = z when δ = 1, and δ is an indicator function that takes the value one or zero depending on the response status. Estimating θ from the observation of (y, δ) is the core problem in missing data analysis.
To handle this problem, the marginal density function of (y, δ) needs to be expressed as a function of the original distribution f(z; θ). Maximum likelihood estimation can then be carried out under some identifying assumptions, and statistical theory can be developed for the maximum likelihood estimator obtained from the observed sample. Computational tools for producing the maximum likelihood estimator also need to be introduced. How to assess the uncertainty of the resulting maximum likelihood estimator is another important topic.
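For reference, here is one standard way to write the likelihood of the observed data. The notation is ours and is not necessarily the book's: each z_i is split into an observed part z_{i,obs} and a missing part z_{i,mis}, and P(δ_i | z_i; φ) is a response model with its own parameter φ. The missing part is integrated out of the joint density of (z_i, δ_i):

```latex
L_{\mathrm{obs}}(\theta,\phi)
  = \prod_{i=1}^{n} \int f(z_i;\theta)\, P(\delta_i \mid z_i;\phi)\, d\mu(z_{i,\mathrm{mis}})

% Under missing at random (MAR), P(\delta_i \mid z_i;\phi) = P(\delta_i \mid z_{i,\mathrm{obs}};\phi), so
L_{\mathrm{obs}}(\theta,\phi)
  = \prod_{i=1}^{n} P(\delta_i \mid z_{i,\mathrm{obs}};\phi)
    \int f(z_i;\theta)\, d\mu(z_{i,\mathrm{mis}})
```

MAR is one example of an identifying assumption. When θ and φ are also distinct, the response factor does not involve θ. The maximum likelihood estimator of θ then maximizes the product of the integrals alone, that is, the marginal density of the observed part of the data.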
When z is a vector, further complications arise. Because several random variables are subject to missingness, the missing data pattern can be exploited to simplify modeling and estimation. The monotone missing pattern refers to the situation where the set of respondents for one variable is always a subset of the set of respondents for another variable, and this nesting can continue through further variables. See Table 1.1 for an illustration of the monotone missing pattern.
TABLE 1.1 Monotone Missing Pattern (variables Y1, Y2, Y3)
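The definition of a monotone pattern can be checked mechanically. The sketch below is our own illustration and is not from the book. The hypothetical function `monotone_order` takes a units-by-variables matrix of observed/missing flags. It returns a variable ordering under which the respondent sets are nested, or `None` if no such ordering exists. It relies on the fact that nested sets form a chain under inclusion, so sorting the variables by their number of respondents must yield successive subsets.

```python
from typing import List, Optional, Sequence


def monotone_order(observed: Sequence[Sequence[bool]]) -> Optional[List[int]]:
    """Return a column order under which the missing pattern is monotone, or None.

    observed[i][j] is True when variable j is observed for unit i.
    """
    if not observed:
        return []
    n_vars = len(observed[0])
    # Set of responding units for each variable.
    respondents = [
        frozenset(i for i, row in enumerate(observed) if row[j]) for j in range(n_vars)
    ]
    # In a monotone pattern the respondent sets form a chain under inclusion,
    # so sorting by size (largest first) must give successive subsets.
    order = sorted(range(n_vars), key=lambda j: len(respondents[j]), reverse=True)
    for prev, nxt in zip(order, order[1:]):
        if not respondents[nxt] <= respondents[prev]:
            return None
    return order


# Usage: Y1 always observed, Y2 observed for a subset, Y3 for a further subset.
pattern = [
    [True, True, True],
    [True, True, False],
    [True, False, False],
]
print(monotone_order(pattern))  # [0, 1, 2]
```

Returning the ordering, rather than just a yes/no flag, is a deliberate choice. A monotone pattern is usually exploited by factoring the joint model in that variable order.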
1.2 Outline
Maximum likelihood estimation with missing data serves as the starting point of this book. Chapter 2 is about defining the observed likelihood function from the marginal density of the observed part of the data, finding the maximum of the observed likelihood by solving the mean score equation, and obtaining the observed information matrix from the observed likelihood. Chapter 3 deals with computational tools to arrive at the maximum likelihood estimator, especially the EM algorithm.
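To make the EM algorithm concrete, here is a minimal sketch for a setting we chose purely for illustration; it is not taken from Chapter 3. The pair (x, y) is bivariate normal, x is always observed, and y is missing at random given x. The E-step replaces the missing sufficient statistics y and y² by their conditional expectations given x. The M-step recomputes the complete-data maximum likelihood estimates from those expected statistics. The function names `em_bivariate_normal` and `closed_form_mu_y` are hypothetical. The second function computes the known closed-form MLE of the mean of y for this monotone pattern, which the EM iterates should reproduce.

```python
import random
from typing import List, Optional, Tuple


def em_bivariate_normal(
    x: List[float], y: List[Optional[float]], n_iter: int = 1000, tol: float = 1e-10
) -> Tuple[float, float, float, float, float]:
    """EM for bivariate normal (x, y); x complete, y may be None (MAR given x).

    Returns (mu_x, mu_y, s_xx, s_xy, s_yy) as divisor-n (maximum likelihood) moments.
    """
    n = len(x)
    ys = [b for b in y if b is not None]
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x) / n  # x is complete, so this is already its MLE
    my = sum(ys) / len(ys)  # start from respondent moments
    syy = sum((b - my) ** 2 for b in ys) / len(ys)
    sxy = 0.0
    for _ in range(n_iter):
        # E-step: conditional mean and second moment of each missing y given x.
        beta = sxy / sxx
        resid_var = syy - beta * sxy  # Var(y | x) under the current parameters
        ey: List[float] = []
        ey2: List[float] = []
        for a, b in zip(x, y):
            if b is None:
                m = my + beta * (a - mx)
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(b)
                ey2.append(b * b)
        # M-step: complete-data MLEs with sufficient statistics replaced by expectations.
        new_my = sum(ey) / n
        new_sxy = sum(a * e for a, e in zip(x, ey)) / n - mx * new_my
        new_syy = sum(ey2) / n - new_my ** 2
        change = abs(new_my - my) + abs(new_sxy - sxy) + abs(new_syy - syy)
        my, sxy, syy = new_my, new_sxy, new_syy
        if change < tol:
            break
    return mx, my, sxx, sxy, syy


def closed_form_mu_y(x: List[float], y: List[Optional[float]]) -> float:
    """Closed-form MLE of mu_y here: respondent regression evaluated at the full mean of x."""
    r = [(a, b) for a, b in zip(x, y) if b is not None]
    xr = sum(a for a, _ in r) / len(r)
    yr = sum(b for _, b in r) / len(r)
    b1 = sum((a - xr) * (b - yr) for a, b in r) / sum((a - xr) ** 2 for a, _ in r)
    return yr + b1 * (sum(x) / len(x) - xr)


# Usage with simulated data in which y tends to be missing when x is large (MAR).
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(300)]
y_full = [1 + 2 * a + rng.gauss(0, 1) for a in x]
y = [b if (a < 0.3 or rng.random() < 0.3) else None for a, b in zip(x, y_full)]
mx, my, sxx, sxy, syy = em_bivariate_normal(x, y)
print(my, closed_form_mu_y(x, y))
```

Because the EM fixed point is a stationary point of the observed likelihood, the two printed values should agree to within the convergence tolerance. Iteration is unnecessary in this particular monotone case. The value of EM is that the same E-step/M-step structure carries over to patterns with no closed form.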
Imputation, covered in Chapter 4, is another popular tool for handling missing data. Imputation can be viewed as a computational technique for the Monte Carlo approximation of the conditional expectation of the original complete-sample estimator given the observed data. Variance estimation for the imputation estimator is an important subject in missing data analysis, and it can be carried out by Taylor linearization or by replication methods. Multiple imputation, introduced in Chapter 5, has been proposed as a general tool for imputation and simplified variance estimation, but it requires some special conditions, called congeniality and self-efficiency. Fractional imputation, covered in Chapter 6, is an alternative general-purpose imputation tool.
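The Monte Carlo view of imputation can be illustrated with a minimal sketch under assumptions of our own. The complete-sample estimator is the sample mean of y, and a covariate x is fully observed. Each imputed value is the respondent-fitted regression prediction plus a residual drawn at random from the respondents' residuals. This stochastic regression imputation is chosen only for illustration and is not claimed to be the book's method. If the fitted model is held fixed, the conditional expectation of the complete-sample mean given the observed data is the deterministic regression-imputation mean. Averaging the means of m imputed data sets approximates it. All function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def fit_respondents(
    x: List[float], y: List[Optional[float]]
) -> Tuple[float, float, List[float]]:
    """OLS of y on x using respondents only; returns intercept, slope, residuals."""
    r = [(a, b) for a, b in zip(x, y) if b is not None]
    xr = sum(a for a, _ in r) / len(r)
    yr = sum(b for _, b in r) / len(r)
    b1 = sum((a - xr) * (b - yr) for a, b in r) / sum((a - xr) ** 2 for a, _ in r)
    b0 = yr - b1 * xr
    return b0, b1, [b - b0 - b1 * a for a, b in r]


def conditional_expectation_mean(x: List[float], y: List[Optional[float]]) -> float:
    """E[complete-sample mean | observed data], holding the fitted model fixed.

    OLS residuals average to zero, so each missing y contributes its fitted value.
    """
    b0, b1, _ = fit_respondents(x, y)
    return sum(b if b is not None else b0 + b1 * a for a, b in zip(x, y)) / len(x)


def mc_imputation_mean(
    x: List[float], y: List[Optional[float]], m: int, seed: int = 0
) -> float:
    """Average of m imputed-data means; each imputation adds a random respondent residual."""
    rng = random.Random(seed)
    b0, b1, resid = fit_respondents(x, y)
    total = 0.0
    for _ in range(m):
        imputed = [
            b if b is not None else b0 + b1 * a + rng.choice(resid)
            for a, b in zip(x, y)
        ]
        total += sum(imputed) / len(imputed)
    return total / m


# Usage: as m grows, the Monte Carlo average approaches the conditional expectation.
rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(200)]
y_full = [1 + 2 * a + rng.gauss(0, 1) for a in x]
y = [b if rng.random() < 0.7 else None for b in y_full]  # roughly 30% missing
for m in (1, 10, 2000):
    print(m, mc_imputation_mean(x, y, m), conditional_expectation_mean(x, y))
```

This sketch keeps the regression parameters fixed across imputations. For that reason it demonstrates only the Monte Carlo approximation and is not a proper multiple imputation. Multiple imputation would also redraw the parameters for each imputed data set before a variance formula of the kind discussed in Chapter 5 could be applied.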
Propensity score weighting, covered in Chapter 7, is another tool for handling missing data. The responding units are assigned propensity score weights so that the weighted analysis can lead to valid inference. The propensity score weighting method is often based on an assumption about the response mechanism, and the resulting estimator can be made more efficient by properly taking into account the auxiliary information available ...