1.1. Aims of the Book
Factor analysis is a generic term for a somewhat vaguely delimited set of techniques for data processing, mainly applicable to the social and biological sciences. These techniques have been developed for the analysis of mutual relationships among a number of measurements made on a number of measurable entities. It may be convenient to be rather more concrete than this and to imagine a typical application in which the measures are scores on tests and the measurable entities are human subjects to whom the tests have been given. When, to fix ideas, we speak of subjects, and occasionally of tests, the reader should bear in mind that any other measurable quantities and anything whatsoever on which the measurements are made could be substituted, if we have reason to do so, for test scores and for subjects.
Factor analysis in the broad sense comprises two things. The first is a number of statistical models, which yield testable hypotheses (i.e., hypotheses that we may confirm or disconfirm in terms of the usual statistical procedures for making tests of significance). The second is a number of simplifying procedures for the approximate description of data. These procedures do not constitute disconfirmable hypotheses, except in the loose sense that they supply approximations to the data and we can sometimes say that the approximations are very bad. In the literature, the two types of analysis have often been confused.
The following account of factor analysis is intended to be as elementary as possible. It is hoped that the reader will be able to understand applications of factor analysis in the literature and, up to a point, be able to evaluate them critically. It is hoped that the reader will also be able to undertake a study applying factor analysis to the area of empirical research in which he or she engages, given the availability of a computer center that has up-to-date programs for the purpose and someone to advise on how to prepare and submit the data to the available programs.
Because of the aspects of the subject that this book is not in the slightest degree attempting to cover, a number of general warnings must be issued. The mathematical theory of factor analysis is relatively complex. To come to grips with the deeper literature on the subject, the reader must have reasonable grounding in both mathematical statistics (the mathematical theory of statistical inference as opposed to elementary accounts of statistical techniques for the user) and linear algebra (the algebra of vectors and matrices). It is an area of inquiry in which, to quote Pope's couplet fully:
A little learning is a dangerous thing,
Drink deep, or taste not the Pierian spring.
It is also an area beset by confusion and disagreement. The full extent of the confusion and disagreement will not be reflected in this book. The attempt will be made to present a view of the subject that is internally consistent and tenable as far as it goes. Others may judge the success of that attempt. Bibliographical notes are appended to the chapters, some of which amount to suggestions for further reading, but the temptation, in the course of writing, to add remarks pointing outside the area actually covered has been suppressed to avoid distracting complications.
We turn next to the delicate question of what the reader is assumed to know. The simplest statement is that the reader is assumed to know basic univariate and bivariate statistics: sample versus population; the computation and use of sample means, variances, and correlation coefficients; the testing of simple statistical hypotheses about population means, variances, and correlation coefficients, using the normal curve, chi-square, and F tables; and typical applications of these to research problems in social science, especially psychological or educational research. Implicit in this assumption are other assumptions, of course, viz., that the reader is familiar with the devices from elementary algebra (subscripting, summation notation, elementary algebraic manipulations) that accompany the teaching of statistical methods to the user. He or she should also have acquired some working intuitions, not too far from those of the mathematician, about probability and statistical distributions. An appendix summarizes some results in matrix algebra, and these are used in mathematical notes at the ends of the chapters. The reader can choose to omit all this material but is encouraged to work with it to see if it is helpful.
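As a quick check on the assumed background, the sample mean, variance, and correlation coefficient can each be computed directly from their definitions. The following is a minimal sketch in Python and is not part of the original text; the function names and the two short lists of test scores are hypothetical, chosen only for illustration.

```python
import math


def sample_mean(xs):
    """Arithmetic mean of a list of scores."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Unbiased sample variance (divisor n - 1)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_correlation(xs, ys):
    """Pearson product-moment correlation between two sets of scores."""
    mx, my = sample_mean(xs), sample_mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of four subjects on two tests (illustration only).
test_a = [2.0, 4.0, 6.0, 8.0]
test_b = [1.0, 3.0, 2.0, 5.0]

print(sample_mean(test_a))
print(sample_variance(test_a))
print(sample_correlation(test_a, test_b))
```

For these made-up scores the mean of the first test is 5.0, its variance is 20/3, and the correlation between the two tests is about 0.83. A reader who can follow each line of this sketch has the computational background assumed here.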
From one point of view, the task here is to come to understand the numerical input and the numerical output supplied to and obtained from factor-analytic computer programs. It is particularly true of factor analysis that contemplating or working with the computational formulas will not help us with this task at all. Surely this is fortunate. On the other hand, it seems desirable that understanding of factor-analytic input/output be firmly based on the statistical concepts (multiple regression and partial and multiple correlation) of which factor analysis is only a slight extension. The following section is therefore offered as a review of basic concepts, and the next section thereafter is devoted to multiple regression theory.
1.2. Review of Basic Concepts
(a) Modes of Inquiry
Students of the social and biological sciences gradually acquire, through the whole course of their training, a general sense of the nature of scientific inquiry as well as notions as to how to inquire into specific problems in specific fields. The reader will know how to supply suitable qualifications and corrections to the broad and perhaps wild generalizations that will now be offered as a way of getting started. Let us agree quickly, virtually without thinking, that the objectives of science are the explanation, prediction, and control of events in nature and that the tools of science are empirical observation and abstract thought.
In mathematical statistics, the word experiment is used very broadly to mean any systematic procedure for making observations (for which statistical analysis can, in principle, be performed). But, generally, we want to distinguish between two modes of empirical observation that may be called the experimental mode and the survey mode.
In the prototypical experiment, we take a given number of subjects (experimental units, objects of some kind) and assign them to two or more distinct and contrastable experimental conditions (treatments). We choose some property or behavior of the subjects, commonly a measurable quantity, that we are going to regard as the response to the treatment. We follow an elaborate tradition of experimental design, yielding an analysis of variance, in which the statistical question is whether the variation of the mean responses from one experimental condition to another is greater than we should expect from the particular assignment of the subjects to the treatments. The logical justification of our statistical inference rests in part on making the assignment of our subjects to the different treatments by a random process. The process of randomly assigning our subjects to our treatments is supposed to have randomized and spread out into random, estimable variability all the effects that might otherwise be confounded with (confused with) the treatment effects that we hope to find. These other effects are of a number of kinds, including variations in the properties of the subjects (individual differences between the subjects in the experiment), errors of measurement of the response, and perhaps (though random assignment does not necessarily cover this) failure to keep each treatment condition as uniform as we would wish. Typically, we do not care how much of the random, within-treatment variability is due to treatment variation, individual differences, or errors of measurement, just so long as we have successfu...