David J. Bartholomew
1.1 The Old Approach
To find the precursor of contemporary latent variable modeling one must go back to the beginning of the 20th century and Charles Spearman's invention of factor analysis. This was followed, half a century later, by latent class and latent trait analysis and, from the 1960's onwards, by covariance structure analysis. The most recent additions to the family have been in the area of latent time series analysis. This chapter briefly reviews each of these fields in turn as a foundation for the evaluations and comparisons which are made later.
1.1.1 Factor analysis
Spearman's (1904) original paper on factor analysis is remarkable, not so much for what it achieved, which was primitive by today's standards, but for the path-breaking character of its central idea. He was writing when statistical theory was in its infancy. Apart from regression analysis, all of today's multivariate methods lay far in the future. Therefore Spearman had not only to formulate the central idea, but to devise the algebraic and computational methods for delivering the results. At the heart of the analysis was the discovery that one could infer the presence of a latent dimension of variation from the pattern of the pairwise correlation coefficients. However, Spearman was somewhat blinkered in his view by his belief in a single underlying latent variable corresponding to general ability, or intelligence. The data did not support this hypothesis and it was left to others, notably Thurstone in the 1930's, to extend the theory to what became commonly known as multiple factor analysis.
Factor analysis was created by, and almost entirely developed by, psychologists. Hotelling's introduction of principal components analysis in 1933 approached essentially the same problem from a different perspective, but his work seems to have made little impact on practitioners at the time.
It was not until the 1960's and the publication of Lawley and Maxwell's (1971) book Factor Analysis as a Statistical Method that any sustained attempt was made to treat the subject statistically. Even then there was little effect on statisticians who, typically, continued to regard factor analysis as an alien and highly subjective activity which could not compete with principal components analysis. Gradually the range of applications widened but without going far beyond the framework provided by the founders.
1.1.2 Latent class analysis
Latent class analysis, along with latent trait analysis (discussed later), has its roots in the work of the sociologist Paul Lazarsfeld in the 1960's. Under the general umbrella of latent structure analysis these techniques were intended as tools of sociological analysis. Although Lazarsfeld recognized certain affinities with factor analysis, he emphasized the differences. Thus in the old approach these families of methods were regarded as quite distinct.
Although statistical theory had made great strides since Spearman's day there was little input from statisticians until Leo Goodman began to develop efficient methods for handling the latent class model around 1970.
1.1.3 Latent trait analysis
Although a latent trait model differs from a latent class model only in the fact that the latent dimension is viewed as continuous rather than categorical, it is considered separately because it owes its development to one particular application. Educational testing is based on the idea that human abilities vary and that individuals can be located on a scale of the ability under investigation by the answers they give to a set of test items. The latent trait model provides the link between the responses and the underlying trait. A seminal contribution to the theory was provided by Birnbaum (1968) but today there is an enormous literature, both applied and theoretical, including books, journals such as Applied Psychological Measurement and a multitude of articles.
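The link between responses and trait can be illustrated with a minimal sketch. It assumes a two-parameter logistic response function, a form commonly associated with Birnbaum's work, and assumes that items are answered independently given the trait. The function names, the parameter names (discrimination, difficulty) and all numerical values are hypothetical and chosen only for illustration.

```python
import math

def item_response_probability(ability, discrimination, difficulty):
    """Probability of a correct answer under a two-parameter logistic model
    (an assumed form; other response functions are possible)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# Hypothetical test of three items, each given as (discrimination, difficulty).
items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]

def response_pattern_probability(ability, responses, items):
    """Probability of a 0/1 response pattern for one individual, assuming the
    items are independent once the latent trait is fixed."""
    prob = 1.0
    for answer, (a, b) in zip(responses, items):
        p = item_response_probability(ability, a, b)
        prob *= p if answer == 1 else 1.0 - p
    return prob

# An individual of higher ability is more likely to answer every item correctly.
for ability in (-1.0, 0.0, 1.0):
    print(ability, response_pattern_probability(ability, [1, 1, 1], items))
```

Locating an individual on the scale then amounts to asking which value of the trait makes the observed response pattern most plausible.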
1.1.4 Covariance structure analysis
This term covers developments stemming from the work of Jöreskog in the 1960's. It is a generalization of factor analysis in which one explores causal relationships (usually linear) among the latent variables. The significance of the word covariance is that these models are fitted, as in factor analysis, by comparing the observed covariance matrix of the data with that predicted by the model. Since much of empirical social science is concerned with trying to establish causal relationships between unobservable variables, this form of analysis has found many applications. This work has been greatly facilitated by the availability of good software packages whose sophistication has kept pace with the speed and capacity of desk-top (or lap-top) computers. In some quarters, empirical social research has become almost synonymous with LISREL analysis. The acronym LISREL has virtually become a generic title for linear structural relations modeling.
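The idea of comparing an observed covariance matrix with the one predicted by a model can be sketched for the simplest case. The example assumes a one-factor model with unit factor variance, so that the predicted covariance matrix is the outer product of the loadings plus a diagonal matrix of unique variances. The unweighted sum of squared differences is used as the measure of discrepancy purely for simplicity; it is not claimed to be the criterion used by any particular package. All names and numbers are hypothetical.

```python
def implied_covariance(loadings, unique_variances):
    """Covariance matrix implied by a one-factor model with unit factor
    variance: loadings[i] * loadings[j], plus unique variance on the diagonal."""
    p = len(loadings)
    return [[loadings[i] * loadings[j] + (unique_variances[i] if i == j else 0.0)
             for j in range(p)]
            for i in range(p)]

def least_squares_discrepancy(observed, implied):
    """Unweighted sum of squared differences between two square matrices."""
    p = len(observed)
    return sum((observed[i][j] - implied[i][j]) ** 2
               for i in range(p) for j in range(p))

# Hypothetical observed covariance (here correlation) matrix for three variables.
observed = [[1.00, 0.55, 0.50],
            [0.55, 1.00, 0.40],
            [0.50, 0.40, 1.00]]

# Hypothetical trial values of the parameters.
loadings = [0.8, 0.7, 0.6]
unique_variances = [0.36, 0.51, 0.64]

implied = implied_covariance(loadings, unique_variances)
print(least_squares_discrepancy(observed, implied))
```

Fitting the model means searching for the parameter values that make such a discrepancy as small as possible.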
1.1.5 Latent time series
The earliest use of latent variable ideas in time series appears to have been due to Wiggins (1973) but, as so often happens, it was not followed up. Much later there was rapid growth in work on latent (or "hidden" as they are often called) Markov chains. If individuals move between a set of categories over time it may be that their movement can be modeled by a Markov chain. Sometimes their category cannot be observed directly and the state of the individual must be inferred, indirectly, from other variables related to that state. The true Markov chain is thus latent, or hidden. An introduction to such processes is given in MacDonald and Zucchini (1997). Closely related work has been going on, independently, in the modeling of neural networks. Harvey and Chung (2000) proposed a latent structural time series model to model the local linear trend in unemployment. In this context two observed series are regarded as being imperfect indicators of "true" unemployment.
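A small sketch may help to show how a hidden Markov chain connects the unobserved states to what is actually recorded. It assumes two hidden categories and a single binary indicator, and it computes the probability of an observed sequence by the standard forward recursion, which sums over all possible hidden paths. The function name and every probability below are hypothetical.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Probability of an observed sequence under a hidden Markov chain.

    initial[s]       : probability of starting in hidden state s
    transition[r][s] : probability of moving from state r to state s
    emission[s][o]   : probability of observing o when in state s
    """
    n_states = len(initial)
    # alpha[s] = probability of the observations so far, ending in state s.
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [sum(alpha[r] * transition[r][s] for r in range(n_states))
                 * emission[s][obs]
                 for s in range(n_states)]
    return sum(alpha)

# Hypothetical two-state chain with a binary indicator (0 or 1).
initial = [0.6, 0.4]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
emission = [[0.8, 0.2],
            [0.3, 0.7]]

print(forward_likelihood(initial, transition, emission, [0, 0, 1, 1]))
```

The hidden states never appear in the result; only their probabilistic consequences for the observed series do.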
1.2 The New Approach
The new, or statistical, approach derives from the observation that all of the models behind the foregoing examples are, from a statistical point of view, mixtures. The basis for this remark can be explained by reference to a simple example which, at first sight, appears to have little to do with latent variables. If all members of a population have a very small and equal chance of having an accident on any day, then the number of accidents per month, say, will have a Poisson distribution. In practice the observed distribution often has greater dispersion than predicted by the Poisson hypothesis. This can be explained by supposing that the daily risk of accident varies from one individual to another. In other words, there appears to be an unobservable source of variation which may be called "accident proneness". The latter is a latent variable. The actual distribution of number of accidents is thus a (continuous) mixture of Poisson distributions.
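The extra dispersion produced by mixing can be seen in a small simulation. The sketch assumes that each individual's monthly accident rate is drawn from a gamma distribution, which is only one convenient choice of mixing distribution, and compares the result with a population in which everyone shares the same rate. The rates, the sample size and the function names are hypothetical.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw one Poisson count by multiplying uniforms (Knuth's method)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_counts(n, rng, heterogeneous):
    """Monthly accident counts for n individuals.

    If heterogeneous, each individual has a personal rate drawn from a gamma
    distribution with shape 2 and scale 1.5 (mean 3); otherwise all share rate 3.
    """
    counts = []
    for _ in range(n):
        rate = rng.gammavariate(2.0, 1.5) if heterogeneous else 3.0
        counts.append(sample_poisson(rate, rng))
    return counts

def mean_and_variance(values):
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v

rng = random.Random(1)
print("equal risk:  ", mean_and_variance(simulate_counts(20000, rng, False)))
print("varying risk:", mean_and_variance(simulate_counts(20000, rng, True)))
```

With equal risk the variance stays close to the mean, as the Poisson hypothesis requires; with varying risk the variance is noticeably larger, although the latent rate itself is never observed.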
The position is essentially the same with the latent variable models previously discussed. The latent variable is a source of unobservable variation in some quantity which characterizes members of the population. For the latent class model this latent variable is categorical; for the latent trait and factor analysis models it is continuous. The actual distribution of the manifest variables is then a mixture of the simple distributions they would have had in the absence of that heterogeneity. That simpler distribution is deducible from the assumed behaviour of individuals with the same ability, or whatever it is that distinguishes them. This will be made more precise below.
1.2.1 Origins of the new approach
The first attempt to express all latent variable models within a common mathematical framework appears to have been that of Anderson (1959). The title of the paper suggests that it is concerned only with the latent class model and this may have caused his seminal contribution to be overlooked. Fielding (1977) used Anderson's treatment in his exposition of latent structure models but this did not appear to have been taken up until the present author used it as a basis for handling the factor analysis of categorical data (Bartholomew 1980). This work was developed in Bartholomew (1984) by the introduction of exponential family models and the key concept of sufficiency. This approach, set out in Bartholomew and Knott (1999), lies behind the treatment of the present chapter. One of the most general treatments, which embraces a very wide family of models, is also contained in Arminger and Küsters (1988).
1.2.2 Where is the new approach located on the map of statistics?
Statistical inference starts with data and seeks to generalize from them. It does this by setting up a probability model which defines the process by which the data are supposed to have been generated. We have observations on a, possibly multivariate, random variable x and wish to make inferences about the process which is determined by a set of parameters Ο. The link between the two is expressed by the distribution of x given Ο. Frequentist inference treats Ο as fixed; Bayesian inference treats Ο as a random variable.
In latent variables analysis one may think of x as partitioned into two parts x and y, where x is observed and y, the latent variable, is not observed. Formally then, this is a standard inference problem in which some of the variables are missing. The model will have to begin with the distribution of x given Ο and y. A purely...