PART I
Multilevel Model Specification and Inference
1 The Multilevel Model Framework
Jeff Gill
Washington University, USA
Andrew J. Womack
University of Florida, USA
1.1 OVERVIEW
Multilevel models account for different levels of aggregation that may be present in data. Researchers are sometimes confronted with data collected at different levels, so that attributes of individual cases are provided along with attributes of the groupings of these cases. These groupings can, in turn, belong to higher-level groupings with their own data characteristics. This hierarchical structure is common in data across the sciences, from the social, behavioral, health, and economic sciences to the biological, engineering, and physical sciences, yet it is commonly ignored by researchers performing statistical analyses. Unfortunately, neglecting hierarchies in data can have damaging consequences for subsequent statistical inferences: for example, treating observations within the same group as independent typically understates standard errors and overstates the precision of estimated effects.
Nested data structures are strikingly common in the data-analytic sciences. In the United States and elsewhere, individual voters are nested in precincts, which are in turn nested in districts, which are nested in states, which are nested in the nation. In healthcare, patients are nested in wards, which are nested in clinics or hospitals, which are nested in healthcare management systems, which are nested in states, and so on. In the classic example, students are nested in classrooms, which are nested in schools, which are nested in districts, which are nested in states, which in turn are nested in the nation. In another familiar context, survey respondents are often nested in areas such as rural versus urban, these areas are nested within nations, and the nations are nested within regions. Famous studies such as the American National Election Studies, Latinobarometer, Eurobarometer, and Afrobarometer are obvious cases. In population biology, a hierarchy is often built using ancestral information, and phenotypic variation is used to estimate the heritability of certain traits, in what is commonly referred to as the “animal model.” In image processing, spatial relationships emerge in the intensity and hue of neighboring pixels. Many hierarchies emerge in language processing, such as topic of discussion, document type, region of origin, or intended audience.

In longitudinal studies, more complex hierarchies emerge: units or groups of units are observed repeatedly over time, so in addition to any group hierarchy, observations are also grouped by the unit being measured. Such models are used extensively in the medical and health sciences to model the effect of a stimulus or treatment regime conditional on measures of interest, such as socioeconomic status, disease prevalence in the environment, drug use, or other demographic information.
Furthermore, data at different levels of aggregation are becoming more common as more data are generated from geocoding, biometric monitoring, Internet traffic, social networks, expanded government and corporate reporting, and high-resolution imaging.
Multilevel models are a powerful and flexible extension of conventional regression frameworks. They extend the linear model and the generalized linear model by incorporating levels directly into the model statement, thus accounting for aggregation present in the data. As a result, all of the familiar model forms for linear, dichotomous, count, restricted-range, ordered categorical, and unordered categorical outcomes are supplemented by a structural component. This structure classifies cases into known groups, which may have their own set of explanatory variables at the group level. A hierarchy is thus established in which some explanatory variables explain differences at the individual level and others explain differences at the group level. This is powerful because it distinguishes correlation among subjects within the same group from correlation between groups. Thus, with nested data structures, the multilevel approach immediately provides critical advantages over conventional, flat modeling, in which these structures appear only as unaccounted-for heterogeneity and correlation.
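To make the within-group correlation concrete, the following minimal Python sketch simulates a simple grouped structure, y_ij = mu + u_j + e_ij, where every member of group j shares the deviation u_j, and estimates the intraclass correlation tau^2 / (tau^2 + sigma^2) with the classical one-way ANOVA estimator for balanced groups. The function names, group counts, and variance values are illustrative assumptions, not taken from this chapter.

```python
import random
import statistics


def simulate_groups(num_groups, group_size, tau, sigma, mu=10.0, seed=0):
    """Simulate y_ij = mu + u_j + e_ij, with u_j ~ N(0, tau^2) and e_ij ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    groups = []
    for _ in range(num_groups):
        u_j = rng.gauss(0.0, tau)  # deviation shared by every member of group j
        groups.append([mu + u_j + rng.gauss(0.0, sigma) for _ in range(group_size)])
    return groups


def anova_icc(groups):
    """One-way ANOVA estimate of the intraclass correlation (balanced groups only)."""
    num_groups = len(groups)
    n = len(groups[0])
    group_means = [statistics.fmean(g) for g in groups]
    grand_mean = statistics.fmean(group_means)
    # Between-group mean square: expected value sigma^2 + n * tau^2
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (num_groups - 1)
    # Within-group mean square: expected value sigma^2
    msw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g) / (
        num_groups * (n - 1)
    )
    tau2_hat = max((msb - msw) / n, 0.0)  # truncate negative variance estimates at zero
    return tau2_hat / (tau2_hat + msw)


clustered = simulate_groups(num_groups=300, group_size=20, tau=1.0, sigma=1.0, seed=42)
unclustered = simulate_groups(num_groups=300, group_size=20, tau=0.0, sigma=1.0, seed=7)
icc_clustered = anova_icc(clustered)      # true ICC = 1 / (1 + 1) = 0.5
icc_unclustered = anova_icc(unclustered)  # true ICC = 0
print(f"clustered: {icc_clustered:.3f}, unclustered: {icc_unclustered:.3f}")
```

An ICC well above zero indicates that a flat model treating every observation as independent would ignore a substantial share of the variation, which is exactly the unaccounted-for correlation described above.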
What does a multilevel model look like? At its core is a regression equation that relates an outcome variable on the left-hand side to a set of explanatory variables on the right-hand side. This is the basic individual-level specification, and it looks just like a linear model or generalized linear model. The departure comes in the treatment of some of the coefficients assigned to the explanatory variables. What can be done to modify a model when a single point estimate is inadequate to describe how the effect of a measured variable varies across groups? An obvious modification is to treat this coefficient as having a distribution rather than being a fixed point. A regression equation can then be introduced to model the coefficient itself, using information at the group level to describe the heterogeneity in the coefficient. This is the heart of the multilevel model. Any right-hand-side effect can get its own regression expression with its own assumptions about functional form, linearity, independence, variance, distribution of errors, and so on. Such models are often referred to as “mixed,” meaning that some of the coefficients are modeled while others are unmodeled.
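As an illustration of the structure just described, a two-level model with one individual-level predictor x_ij and one group-level predictor z_j can be written as follows, with i indexing individuals and j indexing groups. The symbols are a common textbook convention adopted here for illustration and are not necessarily the notation used elsewhere in this volume.

```latex
\begin{aligned}
\text{Level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij},
  && \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} z_j + u_{0j}, \\
 & \beta_{1j} = \gamma_{10} + \gamma_{11} z_j + u_{1j},
  && (u_{0j}, u_{1j})^\top \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Omega}) \\
\text{Combined:}\quad & y_{ij} = \gamma_{00} + \gamma_{01} z_j + \gamma_{10} x_{ij}
  + \gamma_{11} z_j x_{ij} + u_{0j} + u_{1j} x_{ij} + \varepsilon_{ij}
\end{aligned}
```

Here both coefficients are modeled, and substituting the level-2 equations into level 1 shows \gamma_{11} acting as a cross-level interaction. Setting \beta_{1j} = \gamma_{10} for every group instead would leave the slope unmodeled while the intercept remains modeled, which is the kind of mixed specification described above.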
This strategy produces a method of accounting for structured data by using regression equations at different hierarchical levels in the data. The key linkage is that each higher-level model describes the distribution of a coefficient at the level just beneath it, treating that coefficient as if it were itself an outcome variable. This means that multilevel models are highly symbiotic with Bayesian specifications, because the focus in both cases is on making supportable distributional assumptions.
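One way to see a higher-level model acting as a distribution for the coefficient beneath it is the normal-normal case. Suppose y_ij ~ N(theta_j, sigma^2) at level 1 and theta_j ~ N(mu, tau^2) at level 2. If mu, tau^2, and sigma^2 are treated as known (a simplifying assumption; in practice they are estimated), the posterior mean of theta_j is a precision-weighted average of the group's sample mean and mu. The minimal Python sketch below computes this partial-pooling estimate; the function name and all numbers are hypothetical.

```python
def posterior_group_mean(ybar_j, n_j, mu, tau2, sigma2):
    """Posterior mean of theta_j in the normal-normal model with known mu, tau2, sigma2.

    Level 1: y_ij ~ N(theta_j, sigma2); level 2: theta_j ~ N(mu, tau2).
    """
    data_precision = n_j / sigma2   # precision of the group's sample mean ybar_j
    prior_precision = 1.0 / tau2    # precision implied by the group-level distribution
    weight = data_precision / (data_precision + prior_precision)
    return weight * ybar_j + (1.0 - weight) * mu


mu, tau2, sigma2 = 50.0, 25.0, 100.0  # hypothetical higher-level mean and variances
small = posterior_group_mean(ybar_j=70.0, n_j=2, mu=mu, tau2=tau2, sigma2=sigma2)
large = posterior_group_mean(ybar_j=70.0, n_j=200, mu=mu, tau2=tau2, sigma2=sigma2)
print(f"{small:.2f}")  # 56.67: two observations, pulled strongly toward mu
print(f"{large:.2f}")  # 69.61: 200 observations, stays near the group's own mean
```

Both groups have the same sample mean, but the group with only two observations is pulled strongly toward the higher-level mean while the large group barely moves. The group-level model plays the role of a prior for each group's coefficient, which is one concrete sense in which multilevel and Bayesian specifications fit together.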
Allowing multiple levels in the same model provides an immense amount of flexibility. First, the researcher is not restricted to a particular number of levels. The coefficients at the second grouping level can also be assigned a regression equation, thus adding another level to the hierarchy, although it has been shown that there are diminishing returns as the number of levels goes up, and it is rarely efficient to go beyond three levels from the individual level (Goel and DeGroot 1981, Goel 1983). This is because the effects of the parameterizations at these very high levels get washed out as they filter down the hierarchy. Second, as stated, any coefficient at these levels can be chosen to be modeled or unmodeled, and the mixture of these decisions across levels gives a combinatorially large set of choices. Third, the form of the link function can differ at any level of the model. In this way the researcher may mix linear, logit/probit, count, constrained, and other forms throughout the total specification.
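To give a sense of how quickly these choices multiply, the following Python sketch counts modeled/unmodeled patterns under a deliberately simplified assumption: there are p level-1 coefficients, and each is either unmodeled or modeled by a level-2 equation with q coefficients, each of which is in turn modeled (by a level-3 equation) or unmodeled. Link-function choices are ignored. The function names are hypothetical, and the sketch checks a brute-force enumeration against a closed form.

```python
from itertools import product


def count_by_enumeration(p, q):
    """Brute-force count of modeled/unmodeled patterns across two nested levels.

    Each of the p level-1 coefficients is either unmodeled, or modeled by a
    level-2 equation whose q coefficients are each modeled or unmodeled.
    """
    total = 0
    for level1 in product((False, True), repeat=p):  # True = coefficient is modeled
        n_modeled = sum(level1)
        # Each level-2 equation that exists adds q further binary choices
        total += sum(1 for _ in product((False, True), repeat=q * n_modeled))
    return total


def count_closed_form(p, q):
    """Each level-1 coefficient is unmodeled (1 way) or modeled (2**q ways)."""
    return (1 + 2 ** q) ** p


print(count_by_enumeration(3, 2), count_closed_form(3, 2))  # 125 125
print(count_closed_form(5, 3))  # 59049
```

Each level-1 coefficient contributes a factor of 1 + 2^q, so the count is (1 + 2^q)^p. Even with five level-1 coefficients and three coefficients per level-2 equation, this reaches 59,049 specifications before any choice of link function is considered.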
1.2 BACKGROUND
It is often the case that fundamental ideas in statistics hide for a while in some applied area before scholars realize that these are generalizable and broadly applicable principles. For instance, the well-known EM algorithm of Dempster, Laird, and Rubin (1977) was pre-dated in less fully articulated forms by Newcomb (1886), McKendrick (1926), Heal...