The Theory of Meta-Analysis: Sampling Error and the Law of Small Numbers
Although not widely known, the meta-analytic technique is actually founded on a sound conceptual basis. That conceptual basis is sampling error theory. In psychometric terms, sampling error is the difference between the characteristics of a sample and those of the population from which that sample was drawn. Sampling error occurs because a sample typically represents only a small fraction of the original population. For example, if a study pertains to general human nature such as sex differences for a given personality characteristic, then a typical sample of 50 represents a meager 0.0000008 percent of the general human population of 6 billion. In such a case, any permutation of results and outcomes can and does occur, such as the effect in the sample being slightly stronger, considerably stronger, slightly weaker, or considerably weaker than the true effect in the population.
In a very real sense, sampling error is a parallel application of classical test theory. Classical test theory maintains that a person's actual score on a test is a combination of his or her true score plus an unknown (i.e., random) error component. By extension, sampling error maintains that the relationship across all of the participants in a given study, whether represented by an effect size statistic (d) or by a correlation coefficient (r), is a combination of the true size of the relationship in a population plus an unknown (i.e., random) error component.
Therein lies the logical and conceptual basis for meta-analysis. Each primary study in a meta-analysis represents one sample taken from a given population. As such, each one of those samples is likely to differ from the population by some unknown amount of sampling error. If we knew nothing at all about these sampling errors, then meta-analysis would be impossible. However, we do know one key fact about sampling error that makes meta-analysis not only possible but feasible and powerful. Namely, we know that sampling errors tend to form a normal distribution with a mean of zero. Because that distribution has a mean of zero, it logically follows that the sampling errors in one direction (e.g., all studies in which the sample effect is stronger than the true population effect) will be balanced by the sampling errors in the other direction (i.e., all studies in which the sample effect is weaker than the true population effect). In short, when one computes the mean sample-weighted effect across studies in a meta-analysis, whether it be the mean effect size or the mean correlation, the resulting value is largely free of sampling error.
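This balancing of sampling errors can be illustrated with a small simulation. The sketch below (in Python, using NumPy) is purely illustrative: the population correlation of .30, the number of studies, and the per-study sample size are all assumed values, not figures from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
rho = 0.30          # assumed true population correlation
n_studies = 500     # hypothetical number of primary studies
n_per_study = 50    # hypothetical sample size per study

sample_rs = []
for _ in range(n_studies):
    # Draw one "study": a bivariate normal sample whose population correlation is rho
    x, y = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], n_per_study).T
    sample_rs.append(np.corrcoef(x, y)[0, 1])
sample_rs = np.array(sample_rs)

# Each study's sampling error is its deviation from the population value
sampling_errors = sample_rs - rho

print(f"mean sampling error: {sampling_errors.mean():+.4f}")  # near zero
print(f"mean sample r:       {sample_rs.mean():.4f}")         # near rho
```

Individual studies scatter noticeably around .30, but averaged across many studies the errors largely cancel, which is exactly the property meta-analysis exploits.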
Consequently, the mean d value in an effect size meta-analysis is a direct estimate of what one would have obtained if it were possible to test the entire population. The true magnitude of the effect size in a population is represented by the Greek symbol δ (lowercase delta; see Hedges & Olkin, 1985; Hunter & Schmidt, 1990a). Accordingly, the mean effect size across the studies in an effect size meta-analysis becomes a direct estimate of δ. Similarly, the true magnitude of the correlation in a population is represented by the Greek symbol ρ (rho), so the mean correlation across the studies in a correlational meta-analysis becomes a direct estimate of ρ. After estimating δ or ρ, one next assesses the variability in the aggregated ds or rs. Zero or low levels of variability suggest that the ds or rs represent a single population, whereas high levels of variability suggest different subpopulations (i.e., there may be moderator variables operating).
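As a concrete sketch, the "bare-bones" correlational computations described by Hunter and Schmidt (1990a) can be written as follows. The five correlations and sample sizes are hypothetical; the formulas are the standard sample-size-weighted mean, the weighted observed variance, and the variance expected from sampling error alone.

```python
import numpy as np

# Hypothetical study results: correlations and their sample sizes
rs = np.array([0.25, 0.31, 0.18, 0.40, 0.27])
ns = np.array([  60,  120,   45,  200,   80])

# Sample-size-weighted mean correlation (the estimate of rho)
r_bar = np.sum(ns * rs) / np.sum(ns)

# Sample-size-weighted observed variance of the correlations
var_obs = np.sum(ns * (rs - r_bar) ** 2) / np.sum(ns)

# Variance expected from sampling error alone
var_err = (1 - r_bar ** 2) ** 2 / (np.mean(ns) - 1)

# Residual variance after removing sampling error (floored at zero);
# zero residual variance suggests a single homogeneous population
var_res = max(var_obs - var_err, 0.0)

print(f"r_bar = {r_bar:.3f}")
print(f"var_obs = {var_obs:.5f}, var_err = {var_err:.5f}, var_res = {var_res:.5f}")
```

With these made-up values the observed variance is smaller than the variance sampling error alone would produce, so the residual variance is zero, consistent with a single population.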
It is important to note that because meta-analytic results theoretically represent population parameters (δ and ρ), it is conceptually illogical to apply significance tests to meta-analytic results, for example, to test whether two ρs or δs are statistically different. First, significance tests are tests of inferences from a sample to the population; meta-analytic results, however, represent population effects. Second, it has been argued that the use of significance tests in primary research under conditions of low power (almost universal in research literatures) is the very reason why research literatures appear contradictory and confusing in the first place (see Cortina & Dunlap, 1997; Hagen, 1997; Schmidt, 1992, 1996; Schmidt & Hunter, 1978; Thompson, 1996; Wilkinson et al., 1999, for a discussion of this issue), and consequently, the reason why meta-analysis is necessary to make sense of these literatures. It is, therefore, illogical to introduce this same problem into meta-analysis itself. Despite this, it is not uncommon to find meta-analytic results being subjected to inferential tests of statistical significance (see, e.g., Aguinis & Pierce, 1998; Alliger, 1995; Hedges & Olkin, 1985); in fact, in our experience it is also not an uncommon request from journal reviewers and editors.
There are, of course, a number of issues that affect the meta-analysis process and need to be recognized. One of the most prominent is the influence of statistical artifacts such as range restriction and measurement error. These artifacts reduce the size of the effect in the primary studies, thereby making the mean effect across the studies an underestimate of the true strength of the effect in the population. Hunter and Schmidt (1990a) provide a variety of correction formulas to account for the influence of artifacts. The correction can be done individually for each study before aggregation, although that is rare because, by necessity, each study must report the appropriate artifact data. The more common approach is to correct the mean effect after aggregation using the average of whatever artifact data are available across the studies.
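For example, the classic correction for attenuation due to measurement error divides the observed correlation by the square root of the product of the two measures' reliabilities. A minimal sketch, with hypothetical observed correlation and reliability values:

```python
import math

def correct_for_attenuation(r_obs: float, rxx: float, ryy: float) -> float:
    """Disattenuate an observed correlation for measurement error in both
    variables; rxx and ryy are the reliabilities of the two measures."""
    return r_obs / math.sqrt(rxx * ryy)

# Hypothetical study: observed r = .30, reliabilities of .80 and .70
r_corrected = correct_for_attenuation(0.30, 0.80, 0.70)
print(f"corrected r = {r_corrected:.2f}")  # ~ .40, larger than the observed .30
```

Because unreliability attenuates observed effects, the corrected value is always at least as large as the observed one, which is why uncorrected meta-analytic means understate the population effect.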
Another prominent issue is whether the population is homogeneous, that is, whether the strength of the effect is consistent across all situations. If an extraneous variable changes the strength of the main effect, that variable is referred to as a moderator variable and it causes the theory of sampling error (the fundamental basis of meta-analysis) to break down. Namely, it induces a nonrandom effect into some of the studies, thereby making the mean of the deviations from the underlying population value something other than zero. In this case the mean effect, if computed, becomes an estimate of the average strength of the effect in the population rather than an estimate of the unitary value of the effect. As noted previously, the degree of variability in the aggregated effects is one indicator of the presence or operation of moderator variables.
The standard solution when dealing with a heterogeneous population is to separate the studies by the various levels of a known moderator variable and conduct a separate meta-analysis for each of the levels. If the test for homogeneity (see Hunter & Schmidt, 1990a, and subsequent sections of this book) then shows each level to be homogeneous, each mean effect once again becomes an estimate of the unitary strength of the effect in the population (for that level of the moderator variable). If the test for homogeneity still suggests a heterogeneous situation, a further breakdown by another moderator variable may be warranted.
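This subgrouping strategy can be sketched as follows. The studies and the moderator (study setting) are hypothetical; the percentage-of-variance figure follows the Hunter and Schmidt approach, where values at or above roughly 75% (including values over 100%, meaning observed variability is fully attributable to sampling error) suggest homogeneity within a level.

```python
import numpy as np

# Hypothetical studies: (correlation, sample size, moderator level)
studies = [
    (0.15,  80, "lab"),   (0.12, 150, "lab"),   (0.18,  60, "lab"),
    (0.42, 100, "field"), (0.38, 200, "field"), (0.45,  70, "field"),
]

def bare_bones(rs, ns):
    """Weighted mean r and % of observed variance due to sampling error."""
    rs, ns = np.asarray(rs), np.asarray(ns)
    r_bar = np.sum(ns * rs) / np.sum(ns)
    var_obs = np.sum(ns * (rs - r_bar) ** 2) / np.sum(ns)
    var_err = (1 - r_bar ** 2) ** 2 / (np.mean(ns) - 1)
    pct_err = 100 * var_err / var_obs if var_obs > 0 else 100.0
    return r_bar, pct_err

# Separate meta-analysis at each level of the moderator
for level in ("lab", "field"):
    rs = [r for r, n, m in studies if m == level]
    ns = [n for r, n, m in studies if m == level]
    r_bar, pct = bare_bones(rs, ns)
    print(f"{level:5s}: mean r = {r_bar:.3f}, "
          f"{pct:.0f}% of variance from sampling error")
```

Pooled together, these six correlations would look heterogeneous; split by level, each subgroup yields its own homogeneous mean, which is the moderator effect the subgrouping is designed to reveal.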
GENERAL OVERVIEW OF META-ANALYSIS
As a general descriptive statement, meta-analysis refers to a set of statistical procedures that are used to quantitatively aggregate the results of multiple primary studies to arrive at an overall conclusion or summary across these studies. Ideally, a meta-analysis calls for several hundred data points (e.g., Arthur, Bennett, Edens, & Bell, 2001; Hunter & Hunter, 1984; Judge, Thoresen, Bono, & Patton, 2000; Schmitt, Gooding, Noe, & Kirsch, 1984). This goal is, of course, difficult to meet, although it is not uncommon for meta-analyses to consist of well over one hundred data points (e.g., Alliger, Tannenbaum, Bennett, Traver, & Shetland, 1997; Arthur, Bennett, Stanush, & McNelly, 1998; Gaugler, Rosenthal, Thornton, & Bentson, 1987; Huffcutt & Arthur, 1994; Stajkovic & Luthans, 1998). The primary reason why a less-than-optimal number of data points is typically analyzed is deficiencies in the quality of microlevel reporting (Orwin & Cordray, 1985); that is, primary studies often fail to meet the criteria for inclusion because they do not report the information necessary to permit their inclusion in the meta-analysis. The effects and implications of a meta-analysis based on a small number of data points or studies are discussed in the sections on fully hierarchical moderator analysis in chapters 2 and 3.
Meta-analysis can be described as a secondary research method or design that can be put to several uses. For instance, it can be used as a quantitative literature review procedure or as a research design to test hypotheses pertaining to the relationships between specified variables. Along these lines, there are several advantages associated with meta-analysis. Meta-analyses are able to summarize large volumes of literature. They can also be used to resolve conflicts between two or more bodies of literature by comparing effect sizes across them. Conversely, they have also been known to generate their own controversies or conflicts when independent meta-analyses of ostensibly the same topic or literature have resulted in somewhat divergent conclusions. Examples include (a) the job performance-satisfaction relationship (Iaffaldano & Muchinsky, 1985; Judge et al., 2000; Petty, McGee, & Cavender, 1984); (b) the validity of student evaluations of instructors' teaching effectiveness (Abrami, 1984; Cohen, 1981, 1982, 1983, 1986; d'Apollonia & Abrami, 1996; d'Apollonia, Abrami, & Rosenfield, 1993;...