Introduction
The modifiable areal unit problem (MAUP) refers to the arbitrary nature of areal units used in many spatial analyses, as well as the dependency of resulting statistical properties upon the spatial configuration of these areal units (Wong 2009a, Wong 2009b). A configuration of areal units employed in a study is modifiable, or more accurately substitutable, because many alternative surface partitionings exist that are actually available and/or theoretically viable. In some situations, a specific areal unit configuration is unavoidable because data are available only for that configuration; in other situations, one configuration can be preferred to others. In addition, researchers may constitute a new configuration by aggregating pre-existing lower-level areal units (i.e. smaller polygons). In any of the preceding cases, the arbitrary nature of areal units is unavoidable: no indisputable justification is possible that one spatial configuration is optimal for revealing the underlying spatial process of a phenomenon under investigation. The second aspect of the MAUP, the dependency or sensitivity of analytical results to a spatial configuration (Fotheringham and Wong 1991, 1025), is more fundamental: a different spatial configuration often yields markedly different statistical results. This uncertainty or instability of analytical results (Fotheringham and Wong 1991, Manley 2014) implies that no conclusive statistical statement is possible in the field of spatial analysis, especially when areal data are used.
The vast majority of MAUP studies have been dedicated to exploring how significant the effects of the MAUP are, and in which ways they affect statistical results, not only for such basic descriptive statistics as means, variances, and correlation coefficients but also for more sophisticated statistical techniques, such as multiple regression and other types of spatial data analyses (e.g. Fotheringham and Wong 1991, Amrhein 1995, Amrhein and Reynolds 1996, Wong et al. 1999, Flowerdew et al. 2001, Dark and Bram 2007, Arbia and Petrarca 2011). Even though a considerable amount of literature has accumulated, especially since the mid-1990s, our knowledge regarding both diagnosis and prognosis of the MAUP is still limited. Indeed, an observation made about 35 years ago by one of the early pioneers of MAUP research is still valid (Openshaw 1984, 6): ‘the MAUP is today one of the most important unresolved problems left in spatial analysis.’ This sentiment is echoed by a recent review of the MAUP (Manley 2014, 1158): ‘we have neither a full and detailed understanding of the problem nor the underlying causes.’ Hence, more effort is necessary to develop a research framework that yields more comprehensive, and possibly more generalizable, results about how the MAUP effects behave.
Spatial autocorrelation (SA) is known to be a primary source of the MAUP (Openshaw and Taylor 1979, Arbia 1989, Fotheringham and Wong 1991, Wong 1996), and efforts have been made to discover a relationship between the level of aggregation (AG) and the level of SA (Cliff and Ord 1981, Chou 1991, 1995, Qi and Wu 1996, Griffith et al. 2003). The impact of spatial aggregation on SA also has been well investigated in geostatistics (e.g. Journel and Huijbregts 1978). In particular, the effect of regularization on a variogram (that is, how the overall structure of SA changes with spatial aggregation) is well explored in the context of change of support. Recent studies, including Kyriakidis (2004), Kyriakidis and Yoo (2005), and Yoo et al. (2010), explore impacts of spatial aggregation in area-to-point spatial interpolation, focusing more on scale effects. However, much of the interplay between these two concepts, once referred to as ‘two very stubborn but pervasive problems in statistical analysis of spatial data’ (Wong 2009a, 120), still remains unknown. That is, SA is a source of uncertainty in the MAUP effects that makes it difficult to derive a generalizable behavior for the MAUP. In addition, despite a consensus that a well-designed simulation is essential to a solid research framework for evaluating the effects of the MAUP in a statistical analysis (Green and Flowerdew 1996, 43), methodological advances have been meager. A better simulation framework may require a well-founded random aggregation procedure (e.g. Flowerdew et al. 2001) equipped with a reliable and efficient algorithm for aggregating areal units at different levels of AG. It also should have a conceptually sound evaluation scheme that furnishes a simultaneous assessment of both the scale and zoning effects on statistical properties.
The objective of this paper is to investigate uncertainty surrounding relationships between SA and the MAUP with an extensive simulation experiment. Although the literature shows that they have an impact on each other, it is still uncertain how they affect each other. For instance, Fotheringham and Wong (1991) show how the MAUP can behave differently with four census variables that have various levels of SA, but their analysis is limited to those empirical variables and is insufficient to explore a wide spectrum of uncertainty. Hence, this paper aims to explore how differently the MAUP behaves across levels of SA. Specifically, the investigation focuses on whether the initial level of SA at the finest spatial scale makes a substantial difference to the MAUP effects, namely the scale effect arising from the level of aggregation and/or the zoning effect arising from the variety of zonations at the same AG level. That is, the level of SA at the finest resolution is considered as a factor that increases uncertainty of the MAUP effects. The initial level of SA as a potential factor in the MAUP is visualized and examined with a regression analysis using the outcome of the simulation experiments, an assessment not appearing in the literature. The analysis focuses on the impacts on three univariate summary statistics: the mean, the variance, and the Moran coefficient (MC). In the simulation experiment, a random spatial aggregation (RSA) procedure was devised and utilized to generate random zonations by aggregating smaller areal units.
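As a point of reference for the third statistic, the following is a minimal sketch of how an MC could be computed. It assumes a regular square lattice and a binary rook-adjacency weights matrix, neither of which is specified in the text above; the function names `rook_weights` and `moran_coefficient` are hypothetical and are not taken from the paper's simulation code.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-adjacency matrix for a regular nrows x ncols lattice
    (an assumed weighting scheme, chosen only for illustration)."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:          # link to the east neighbour
                w[i, i + 1] = w[i + 1, i] = 1.0
            if r + 1 < nrows:          # link to the south neighbour
                w[i, i + ncols] = w[i + ncols, i] = 1.0
    return w

def moran_coefficient(x, w):
    """MC = (n / sum of weights) * (z' W z) / (z' z), z = mean-centred x."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

w = rook_weights(4, 4)

# Checkerboard pattern: every neighbour pair has opposite values.
checkerboard = (np.indices((4, 4)).sum(axis=0) % 2).ravel()
# Two homogeneous halves: rows 0-1 hold 0, rows 2-3 hold 1.
halves = np.repeat([0, 1], 8)

mc_checker = moran_coefficient(checkerboard, w)   # strongly negative SA
mc_halves = moran_coefficient(halves, w)          # positive SA
print(mc_checker, mc_halves)
```

Under these assumptions the checkerboard map yields a strongly negative MC and the two-halves map a positive one, which is the range of initial SA levels that the experiment is concerned with.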
Spatial autocorrelation and the MAUP
Two major MAUP effects exist: scale and zoning (also referred to as zonation or aggregation). Assuming that the overall MAUP effects occur in a spatial aggregation process (the same as a spatial partitioning process in a theoretical sense) whereby ‘a larger number of smaller areal units are grouped into a smaller number of larger areal units’ (Amrhein 1995, 105), the two sub-effects are jointly responsible for the complete process. The scale effect occurs because of differences in the number of areal units into which a study region has been partitioned. In contrast, the zoning effect occurs exclusively because of differences in how lower-level areal units are grouped into a particular number of higher-level areal units. The importance of SA in MAUP studies, or the interplay of these two concepts, is twofold. First, SA is a primary source of the MAUP. Second, SA itself is subject to the MAUP effects.
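The distinction between the two effects can be illustrated with a toy example. The sketch below assumes a one-dimensional transect of eight units with made-up values, and a hypothetical helper `zone_means` that takes unweighted zone averages; it is not the aggregation procedure used in this paper.

```python
import numpy as np

def zone_means(values, labels):
    """Mean of the lower-level units falling in each zone (labels 0..k-1)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    return np.bincount(labels, weights=values) / np.bincount(labels)

# Eight lower-level units along a transect (illustrative values only).
values = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Scale effect: the number of zones changes (8 -> 4 -> 2).
four_zones = zone_means(values, [0, 0, 1, 1, 2, 2, 3, 3])
two_zones = zone_means(values, [0, 0, 0, 0, 1, 1, 1, 1])

# Zoning effect: still four contiguous zones, but grouped differently.
four_zones_alt = zone_means(values, [0, 1, 1, 1, 2, 2, 3, 3])

for name, z in [("original", values), ("4 zones", four_zones),
                ("2 zones", two_zones), ("4 zones (alt)", four_zones_alt)]:
    print(f"{name:14s} mean={np.mean(z):.4f} variance={np.var(z):.4f}")
```

In this toy case, moving from eight to four to two zones changes the variance of the zone values (scale effect), whereas the two four-zone groupings differ in both mean and variance even though the number of zones is identical (zoning effect).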
Regarding the first aspect, Fotheringham and Wong (1991) and Wong (1996) explicitly point out a direct link between the two, which was suggested earlier by Openshaw and Taylor (1979). A smoothing process occurs when spatial aggregation proceeds, and is responsible for a tendency of reduced variance and correlation. This explanation seems to apply at least to the scale effect (Green and Flowerdew 1996, Wong 1996). As adjacent areal units are aggregated to constitute a larger areal unit, their peculiarities or heterogeneity are expected to be reduced, thus resulting in a reduction in variance and correlation coefficients, assuming a relatively stable covariance (Fotheringham and Wong 1991). Furthermore, Wong (1996) argues that the degree of susceptibility to the MAUP effects could vary from one variable to another because they contain different levels of SA, which may explain why succinct results from MAUP studies dealing with statistical situations involving multiple variables are more difficult to obtain.
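The dependence of this smoothing on SA can be seen in a small deterministic sketch. It assumes the same eight illustrative values arranged in two ways along a transect, one smooth (positive SA) and one alternating (negative SA), and aggregates adjacent pairs with a hypothetical helper `pairwise_aggregate`; it is an illustration of the argument, not a reproduction of any cited analysis.

```python
import numpy as np

def pairwise_aggregate(values):
    """Average each pair of adjacent units (units 0-1, 2-3, ...)."""
    values = np.asarray(values, dtype=float)
    return values.reshape(-1, 2).mean(axis=1)

# Identical values, hence identical variance before aggregation.
gradient = np.array([1, 2, 3, 4, 5, 6, 7, 8])      # similar neighbours
alternating = np.array([1, 8, 2, 7, 3, 6, 4, 5])   # dissimilar neighbours

# Share of the original variance that survives aggregation.
retained_gradient = pairwise_aggregate(gradient).var() / gradient.var()
retained_alternating = pairwise_aggregate(alternating).var() / alternating.var()

print(f"gradient:    {retained_gradient:.3f}")
print(f"alternating: {retained_alternating:.3f}")
```

Because neighbouring values in the smooth arrangement are already alike, averaging them removes little variance, while the alternating arrangement loses all of it. The amount of variance reduction therefore depends on how the values are arranged in space, not only on the values themselves.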
The zoning effect, even at a given spatial scale, also can lead to uncertainty or instability in a spatial data analysis. As Openshaw (1984) points out, the zoning effect may be greater than the scale effect. Lee (2001) proposes a spatial smoothing scale, subsequently named S, as an alternative univariate SA measure (Lee 2004, 2009, 2017). This particular measure is based on the concept that the SA level of a geographic variable is directly associated with the amount of variance reduction attributable to transforming ...