Multilevel Analysis

Techniques and Applications, Third Edition

Joop Hox, Mirjam Moerbeek, Rens van de Schoot

348 pages, English
About This Book

Applauded for its clarity, this accessible introduction helps readers apply multilevel techniques to their research. The book also includes advanced extensions, making it useful as both an introduction for students and as a reference for researchers. Basic models and examples are discussed in nontechnical terms with an emphasis on understanding the methodological and statistical issues involved in using these models. The estimation and interpretation of multilevel models is demonstrated using realistic examples from various disciplines including psychology, education, public health, and sociology. Readers are introduced to a general framework on multilevel modeling which covers both observed and latent variables in the same model, while most other books focus on observed variables. In addition, Bayesian estimation is introduced and applied using accessible software.

Information

Publisher: Routledge
Year: 2017
ISBN: 9781317308676

1 Introduction to Multilevel Analysis

Summary

Social research regularly involves problems concerning the relationship between individuals and the social contexts in which they live, work, or learn. The general concept is that individuals interact with the social contexts to which they belong: individual persons are influenced by the contexts or groups to which they belong, and those groups are in turn influenced by the individuals who make up the group. The individuals and the social groups are conceptualized as a hierarchical system of individuals nested within groups, with individuals and groups defined at separate levels of this hierarchical system. Naturally, such systems can be observed at different hierarchical levels, and variables may be defined at each level. This leads to research into the relationships between variables characterizing individuals and variables characterizing groups, a kind of research that is generally referred to as ‘multilevel research’.
In multilevel research, the data structure in the population is hierarchical, and the sample data are drawn from this hierarchical population. For example, in educational research, the population typically consists of classes and pupils within these classes, with classes organized within schools. The sampling procedure often proceeds in successive stages: first, we take a sample of schools, next we take a sample of classes within each sampled school, and finally we take a sample of pupils within each sampled class. Of course, in real research one may have a convenience sample of schools, or one may decide not to sample pupils but to study all available pupils in each class. Nevertheless, one should keep firmly in mind that the central statistical model in multilevel analysis is one of successive sampling from each level of a hierarchical population.
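The successive-sampling idea can be sketched in a few lines of Python. This is a minimal illustration under assumed sizes (a hypothetical population of 50 schools, each with 4 classes of 25 pupils; the function name `multistage_sample` is ours): each stage samples only from within the units selected at the previous stage.

```python
import random

# Hypothetical population: 50 schools, each with 4 classes of 25 pupils.
# Each pupil is identified by a (school, class, pupil) tuple; all sizes are assumptions.
population = {
    school: {cls: [(school, cls, p) for p in range(25)] for cls in range(4)}
    for school in range(50)
}


def multistage_sample(pop, n_schools, n_classes, n_pupils, seed=1):
    """Three-stage sample: schools, then classes within schools, then pupils within classes."""
    rng = random.Random(seed)
    sample = []
    for school in rng.sample(sorted(pop), n_schools):              # stage 1: schools
        for cls in rng.sample(sorted(pop[school]), n_classes):     # stage 2: classes
            sample.extend(rng.sample(pop[school][cls], n_pupils))  # stage 3: pupils
    return sample


pupils = multistage_sample(population, n_schools=10, n_classes=2, n_pupils=10)
print(len(pupils))  # 200 (10 schools x 2 classes x 10 pupils)
```

Because every pupil in the result shares a school and a class with other sampled pupils, the sample inherits the nesting of the population.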
In this example, pupils are nested within classes. Other examples are cross-national studies where the individuals are nested within their national units, organizational research with individuals nested within departments within organizations, family research with family members nested within families, and methodological research into interviewer effects with respondents nested within interviewers. Less obvious applications of multilevel models are longitudinal research and growth curve research, where a series of distinct observations is viewed as nested within individuals, and meta-analysis, where the subjects are nested within different studies.

1.1 Aggregation and Disaggregation

In multilevel research, variables can be defined at any level of the hierarchy. Some of these variables may be measured directly at their ‘own’ natural level; for example, at the school level we may measure school size and denomination, at the class level we measure class size, and at the pupil level, intelligence and school success. In addition, we may move variables from one level to another by aggregation or disaggregation. Aggregation means that the variables at a lower level are moved to a higher level, for instance, by assigning to the classes the class mean of the pupils’ intelligence scores. Disaggregation means moving variables to a lower level, for instance by assigning to all pupils in the schools a variable that indicates the denomination of the school they belong to.
The lowest level (level 1) is usually defined by the individuals. However, this is not always the case. For instance, in longitudinal designs, repeated measures within individuals are the lowest level. In such designs, the individuals are at level 2, and groups are at level 3. Most software allows for at least three levels, and some software has no formal limit to the number of levels. However, models with many levels can be difficult to estimate, and even if estimation is successful, they are unquestionably more difficult to interpret.
At each level in the hierarchy, we may have several types of variables. The distinctions made in the following are based on the typology offered by Lazarsfeld and Menzel (1961), with some simplifications. In our typology, we distinguish between global, structural and contextual variables.
Global variables are variables that refer only to the level at which they are defined, without reference to other units or levels. A pupil’s intelligence or gender would be a global variable at the pupil level. School denomination and class size would be global variables at the school and class levels, respectively. Simply put: a global variable is measured at the level at which that variable actually exists.
Structural variables are operationalized by referring to the sub-units at a lower level. They are constructed from variables at a lower level, for example, in defining the class variable ‘mean intelligence’ as the mean of the intelligence scores of the pupils in that class. Using the mean of a lower-level variable as an explanatory variable at a higher level is called aggregation, and it is a common procedure in multilevel analysis. Other functions of the lower-level variables are less common, but may also be valuable. For instance, using the standard deviation of a lower-level variable as an explanatory variable at a higher level could be used to test hypotheses about the effect of group heterogeneity on the outcome variable (cf. Klein and Kozlowski, 2000).
Contextual variables result from disaggregation; all units at the lower level receive the value of a global variable for the context to which they belong at the higher level. For instance, we can assign to all pupils in a school the school size, or the mean intelligence, as a pupil-level variable. Disaggregation is not needed in a proper multilevel analysis. For convenience, multilevel data are often stored in a single data file, in which the group-level variables are repeated for each individual within a group, but the statistical model and the software will correctly recognize these as a single value at a higher level. The term contextual variable, however, is still used to denote a variable that models how the context influences an individual.
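A small sketch of both operations, using made-up pupil records (the class labels and scores are hypothetical): the class mean and standard deviation are structural variables built by aggregation, and copying them back onto each pupil row mimics the single long-format file described above. Using the population standard deviation (`pstdev`) is an arbitrary choice here; a sample standard deviation would serve equally well as a heterogeneity measure.

```python
from collections import defaultdict
from statistics import fmean, pstdev

# Hypothetical long-format pupil records: (class_id, intelligence score).
pupils = [("A", 98), ("A", 104), ("A", 110), ("B", 90), ("B", 95), ("B", 100)]

# Aggregation: build structural (class-level) variables from the pupil scores.
scores_by_class = defaultdict(list)
for class_id, iq in pupils:
    scores_by_class[class_id].append(iq)
class_mean = {c: fmean(s) for c, s in scores_by_class.items()}
class_sd = {c: pstdev(s) for c, s in scores_by_class.items()}  # group heterogeneity

# Disaggregation: every pupil row receives the values of its class,
# so the class-level values are repeated within each class.
rows = [(c, iq, class_mean[c], class_sd[c]) for c, iq in pupils]
for row in rows:
    print(row)
```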
In order to analyze multilevel models, it is not important to assign each variable to its proper place in the typology. The benefit of the scheme is conceptual; it makes clear to which level a measurement properly belongs. Historically, multilevel problems have led to analysis approaches that moved all variables by aggregation or disaggregation to one single level of interest followed by an ordinary multiple regression, analysis of variance, or some other ‘standard’ analysis method. However, analyzing variables from different levels at one single common level is inadequate, and leads to two distinct types of problems.
The first problem is statistical. If data are aggregated, the result is that different data values from many sub-units are combined into fewer values for fewer higher-level units. As a result, much information is lost, and the statistical analysis loses power. On the other hand, if data are disaggregated, the result is that a few data values from a small number of super-units are ‘blown up’ into many more values for a much larger number of sub-units. Ordinary statistical tests treat all these disaggregated data values as independent information from the much larger sample of sub-units. The proper sample size for these variables is of course the number of higher-level units. Using the larger number of disaggregated cases for the sample size leads to significance tests that reject the null-hypothesis far more often than the nominal alpha level suggests. In other words, investigators come up with many ‘significant’ results that are totally spurious.
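The inflated rejection rate can be checked with a small Monte Carlo sketch. The simulation below is our own illustration under stated assumptions (20 groups of 30, an outcome intraclass correlation of 0.2, normally distributed effects, and a group-level predictor with no true effect): it compares an ordinary regression on the disaggregated data with a regression on the 20 group means. The critical values (1.96, and about 2.101 for 18 degrees of freedom) are hard-coded to keep the example free of third-party dependencies.

```python
import math
import random


def ols_t(x, y):
    """t-statistic for the OLS slope of y on x, computed as if all errors were independent."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def simulate(n_groups=20, group_size=30, icc=0.2, reps=400, seed=42):
    """Share of replications in which each analysis rejects a true null at about alpha = 0.05."""
    rng = random.Random(seed)
    reject_disagg = reject_agg = 0
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n_groups)]               # group predictor, true effect 0
        u = [rng.gauss(0, math.sqrt(icc)) for _ in range(n_groups)]  # shared group effect on y
        x_ind, y_ind, y_means = [], [], []
        for j in range(n_groups):
            ys = [u[j] + rng.gauss(0, math.sqrt(1 - icc)) for _ in range(group_size)]
            x_ind.extend([z[j]] * group_size)  # disaggregation: copy z to every member
            y_ind.extend(ys)
            y_means.append(sum(ys) / group_size)
        # Individual-level analysis of the disaggregated data (N = 600), critical value 1.96.
        if abs(ols_t(x_ind, y_ind)) > 1.96:
            reject_disagg += 1
        # Group-level analysis (N = 20), critical t for 18 df is about 2.101.
        if abs(ols_t(z, y_means)) > 2.101:
            reject_agg += 1
    return reject_disagg / reps, reject_agg / reps


disagg_rate, group_rate = simulate()
print(f"disaggregated: {disagg_rate:.2f}, group level: {group_rate:.2f}")
```

With these settings the disaggregated analysis rejects the true null hypothesis far more often than the nominal 5%, while the group-level analysis stays close to it; the exact rates depend on the seed.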
The second problem is conceptual. If the analyst is not very careful in the interpretation of the results, s/he may commit the fallacy of the wrong level, which consists of analyzing the data at one level, and formulating conclusions at another level. Probably the best-known fallacy is the ecological fallacy, which is interpreting aggregated data at the individual level. It is also known as the ‘Robinson effect’ after Robinson (1950). Robinson presents aggregated data describing the relationship between the percentage of blacks and the illiteracy level in nine geographic regions in 1930. The ecological correlation, that is, the correlation between the aggregated variables at the region level is 0.95. In contrast, the individual-level correlation between these global variables is 0.20. Robinson concludes that in practice an ecological correlation is almost certainly not equal to its corresponding individual-level correlation. For a statistical explanation, see Robinson (1950) or Kreft and de Leeuw (1987). Formulating inferences at a higher level based on analyses performed at a lower level is just as misleading. This fallacy is known as the atomistic fallacy.
A better way to look at multilevel data is to realize that there is not one ‘proper’ level at which the data should be analyzed. Rather, all levels present in the data are important in their own way. This becomes clear when we investigate cross-level hypotheses, or multilevel problems. A multilevel problem is a problem that concerns the relationships between variables that are measured at a number of different hierarchical levels. For example, a common question is how a number of individual and group variables influence one single individual outcome variable. Typically, some of the higher-level explanatory variables may be structural variables, for example the aggregated group means of lower-level global (individual) variables. The goal of the analysis is to determine the direct effects of individual- and group-level explanatory variables, and to determine whether the explanatory variables at the group level serve as moderators of individual-level relationships. If group-level variables moderate lower-level relationships, this shows up as a statistical interaction between explanatory variables from different levels. In the past, such data were analyzed using conventional multiple regression analysis with one dependent variable at the lowest (individual) level and a collection of disaggregated explanatory variables from all available levels (cf. Boyd & Iversen, 1979). This approach is completely outdated: because it analyzes all available data at one single level, it suffers from all of the conceptual and statistical problems mentioned above.

1.2 Why Do We Need Special Multilevel Analysis Techniques?

Multilevel research concerns a population with a hierarchical structure. A sample from such a population can be described as a multistage sample: first, we take a sample of units from the higher level (e.g., schools), and next we sample the sub-units from the available units (e.g., we sample pupils from the schools). In such samples, the individual observations are in general not independent. For instance, pupils in the same school tend to be similar to each other, because of selection processes (for instance, some schools may attract pupils from higher socioeconomic status (SES) levels, while others attract lower-SES pupils) and because of the common history the pupils share by going to the same school. As a result, the average correlation (expressed as the so-called intraclass correlation) between variables measured on pupils from the same school will be higher than the average correlation between variables measured on pupils from different schools. Standard statistical tests lean heavily on the assumption of independence of the observations. If this assumption is violated (and with nested data this is almost always the case), the estimates of the standard errors of conventional statistical tests are much too small, and this results in many spuriously ‘significant’ results. The effect is generally not negligible: small dependencies in combination with medium to large group sizes still result in large biases in the standard errors. The strong biases that can result from violating the assumption of independent observations made in standard statistical tests have been known for a long time (Walsh, 1947), and independence remains a very important assumption to check in statistical analyses (Stevens, 2009).
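The intraclass correlation is treated in Chapter 2; as a preview, the sketch below uses the familiar one-way ANOVA estimator $(MS_B - MS_W)/(MS_B + (n-1)MS_W)$ for equal-sized groups, which may differ from the estimator the book develops later. The simulated schools (200 schools of 20 pupils, school-level variance 0.2 and pupil-level variance 0.8, so a population intraclass correlation of 0.2) are assumptions for illustration.

```python
import random
from statistics import fmean


def icc_anova(groups):
    """One-way ANOVA estimate of the intraclass correlation (balanced groups assumed)."""
    k = len(groups)       # number of groups
    n = len(groups[0])    # common group size
    means = [fmean(g) for g in groups]
    grand = fmean(means)  # equals the overall mean when groups are equal-sized
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)


rng = random.Random(11)
schools = []
for _ in range(200):                          # 200 hypothetical schools
    school_effect = rng.gauss(0, 0.2 ** 0.5)  # between-school variance 0.2
    # Pupil scores: shared school effect plus pupil residual with variance 0.8.
    schools.append([school_effect + rng.gauss(0, 0.8 ** 0.5) for _ in range(20)])

print(f"estimated ICC: {icc_anova(schools):.3f} (population value 0.2)")
```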
The problem of dependencies between individual observations also occurs in survey research, if the sample is not taken at random but cluster sampling from geographical areas is used instead. For similar reasons as in the school example given above, respondents from the same geographical area will be more similar to each other than respondents from different geographical areas are. This again leads to estimates for standard errors that are too small and produce spurious ‘significant’ results. In survey research, this effect of cluster sampling is well known (cf. Kish, 1965, 1987). It is called a ‘design effect’, and various methods are used to deal with it. A convenient correction procedure is to compute the standard errors by ordinary analysis methods, estimate the intraclass correlation between respondents within clusters, and finally apply a correction formula to the standard errors. For instance, Kish (1965, p. 259) corrects the sampling variance using
$$v_{\mathrm{eff}} = v\,\bigl(1 + (n_{\mathrm{clus}} - 1)\,\rho\bigr)$$
where $v_{\mathrm{eff}}$ is the effective sampling variance, $v$ is the sampling variance calculated by standard methods assuming simple random sampling, $n_{\mathrm{clus}}$ is the cluster size, and $\rho$ is the intraclass correlation. The intraclass correlation is described in Chapter 2, together with its estimation. The following example makes clear how important the assumption of independence is. Suppose that we take a sample of 10 classes, each with 20 pupils. This comes to a total sample size of 200. We are interested in a variable with an intraclass correlation of 0.10, which is a rather low intraclass correlation. However, the effective sample size in this situation is 200 / [1 + (20 - 1) × 0.1] = 200 / 2.9 ≈ 69.0, which is far less than the apparent total sample size of 200. Clearly, using a sample size of 200 will lead to standard errors that are much too small.
Since the design effect depends on both the intraclass correlation and the cluster size, large intraclass correlations are partly compensated by small group sizes. Conversely, small intraclass correlations at the higher levels are offset by the usually large cluster sizes at these levels.
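Kish's correction and the trade-off between intraclass correlation and cluster size can be written out directly. The function names are ours; the first call reproduces the worked example of 10 classes of 20 pupils with $\rho = 0.10$, and the last two compare a large intraclass correlation in small clusters with a small one in large clusters, using illustrative values.

```python
import math


def design_effect(cluster_size: int, icc: float) -> float:
    """Kish's factor 1 + (n_clus - 1) * rho by which clustering inflates the sampling variance."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(total_n: int, cluster_size: int, icc: float) -> float:
    return total_n / design_effect(cluster_size, icc)


def corrected_se(naive_se: float, cluster_size: int, icc: float) -> float:
    """Standard errors scale with the square root of the variance correction."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


# Worked example: 10 classes of 20 pupils, rho = 0.10.
print(round(effective_sample_size(200, 20, 0.10), 1))  # 69.0

# Large rho in small clusters vs. small rho in large clusters (illustrative values).
print(round(design_effect(3, 0.30), 2))   # 1.6
print(round(design_effect(50, 0.05), 2))  # 3.45
```

Here the small intraclass correlation of 0.05 produces the larger design effect because its clusters are much larger.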
Some of the correction procedures developed for cluster and other complex samples are quite powerful (cf. Skinner et al., 1989). In principle such correction procedures could also be applied in analyzing multilevel data, by adjusting the standard errors of the statistical tests. However, multilevel models are multivariate models, and in general the intraclass correlation and hence the effective N is different for different variables. In addition, in most multilevel problems we have not only clustering of individuals within groups, but we also have variables measured at all available levels, and we are interested in the relationships between all of these variables. Combining variables from different levels in one statistical model is a different and more complicated problem than estimating and correcting for design effects. Multilevel models are designed to analyze variables from different levels simultaneously, using a statistical model that properly includes the dependencies.
To provide an example of a clearly multilevel problem, consider the ‘frog pond’ theory that has been utilized in educational and organizational research. The ‘frog pond’ theory refers to the notion that a specific individual frog may be a medium-sized frog in a pond otherwise filled with large frogs, or a medium-sized frog in a pond otherwise filled with small frogs. Applied to education, this metaphor points out that the effect of an explanatory variable such as ‘intelligence’ on school career may depend on the average intelligence of the other pupils in the school. A moderately intelligent pupil in a highly intelligent context may become demotivated and thus become an underachiever, while the same pupil in a considerably less intelligent context may gain confidence and become an overachiever. Thus, the effect of an individual pupil’s intelligence depends on the average intelligence of the other pupils in the class. A popular approach in educational research to investigate ‘frog pond’ effects has been to aggregate variables like the pupils’ IQ into group means, and then to disaggregate these group means again to the individual level. As a result, the data file contains both individual-level (global) variables and higher-level (contextual) variables in the form of disaggregated group means. As early as 1976, the educational researcher Cronbach suggested expressing the individual scores as deviations from their respective group means (Cronbach, 1976), a procedure that has become known as centering on the group mean, or group mean centering. Centering on the group means makes very explicit that the individual scores should be interpreted relative to their group’s mean. The example of the ‘frog pond’ theory and the corresponding practice of centering the predictor variables make clear that combining and analyzing information from different levels within one statistical model is central to multilevel modeling.
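Group mean centering can be sketched as follows, with hypothetical school labels and IQ scores. The centered score is kept alongside the school mean, so a model could still estimate a contextual ('frog pond') effect of the school average. Note that the same IQ of 105 is exactly average in one school and above average in the other.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical pupils: (school_id, IQ). Values are illustrative only.
pupils = [("s1", 95), ("s1", 105), ("s1", 115), ("s2", 85), ("s2", 95), ("s2", 105)]

by_school = defaultdict(list)
for school, iq in pupils:
    by_school[school].append(iq)
school_mean = {s: fmean(v) for s, v in by_school.items()}

# Each row: school, group-mean-centered IQ (position relative to the school),
# and the school mean itself as a separate contextual predictor.
centered = [(s, iq - school_mean[s], school_mean[s]) for s, iq in pupils]
for row in centered:
    print(row)
```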

1.3 Multilevel Theories

Multilevel data must be described by multilevel theories, an area that seems underdeveloped compared to the advances made in the modeling and computing machinery. Multilevel models in general require that the grouping criterion is clear, and that variables can be assigned unequivocally to their appropriate level. In reality, group boundaries are sometimes fuzzy and somewhat arbitrary, and the assignment of variables is not always obvious and simple. In multilevel research, decisions about group membership and operationalizations involve a range of theoretical assumptions (Klein & Kozlowski, 2000). If there are effects of the social context on individuals, these e...
