1.1 Introducing space in agricultural economics
There is a great need for reliable data in agriculture. However, this information should be analysed through appropriate statistical methods to obtain evidence that can assist farmers, for example, to optimize farm returns, decrease unnecessary applications of fertilizers and pesticides, and preserve natural resources.
Standard statistical techniques often perform poorly when applied to agricultural data because of its spatial nature. In fact, one of the common assumptions in traditional statistics is that the observations are independent and homogeneous, a hypothesis that agricultural data patently violates. An agricultural variable often displays similar values in adjacent areas, leading to spatial clusters. In many cases, nearby fields have similar soil type, climate, and precipitation, or an area cultivated with wheat is close to other wheat-cultivated zones. Ignoring this dependence when analysing agricultural data may produce biased or inefficient estimates. For these reasons, the statistical analysis of spatial data deserves specific treatment.
For a long time, the analysis of geographically distributed phenomena in agricultural economics was carried out without treating space as crucial information. The present book tries to fill this gap by highlighting potential applications of spatial analysis to agricultural phenomena. Indeed, space is very important in agricultural economics studies. Land is a crucial resource in agriculture, and most of the data collected is spatially distributed. Besides, all agricultural activities are spatially located. However, spatial models have become important in applied agricultural economics only during the last few decades (Anselin 2002; Anselin and Bera 1998; Goodchild et al. 2000).
Generally speaking, the term spatial means that each unit has a geographical reference, i.e., we know where each case occurs on a map. If the locations of these sites are observed and attached to the observations as labels, the resulting data is called spatial data. In spatial data analysis, the set of spatial locations is considered essential information in the study. Our main idea is that location matters in agriculture and that the occurrences essentially follow the First Law of Geography, which according to Tobler (1970) states that: “everything is related to everything else, but near things are more related than distant things”.
A proper definition of new spatial data science methods is required in order to analyse agricultural data and to uncover interesting, useful, and non-trivial patterns. The first economist who explicitly claimed the importance of space in agricultural economics was von Thünen (1783–1850). Von Thünen developed the Isolated State model (von Thünen 1826), whose framework is considered to be the first serious application of spatial economics and economic geography in agriculture. His agricultural location theory conjectured that the optimal organisation of agricultural activities is based on location factors. Hence, these activities are arranged in concentric rings around a central town where the consumers are located.
Recently, in many countries, National Statistical Offices have begun to geo-reference the sampling frames of physical or administrative bodies used in agricultural surveys, not only by recording the codes of a geographical taxonomy but also by adding data on the spatial position of each record.
Modern tools, such as GIS and remote sensing, are increasingly used in the monitoring of agricultural resources. For example, the developments in GIS technology offer growing opportunities to agricultural economics analysts dealing with large and detailed spatial databases, allowing them to combine spatial information from different sources and to produce different models, as well as tabular and graphic outputs.
The availability of these valuable sources of information makes the advanced models suggested in the spatial statistics and econometrics literature applicable to agricultural economics.
More formally, spatial statistics is a field of spatial data analysis in which the observations are modelled using random variables. Ripley (1981) defines spatial statistics as “the reduction of spatial patterns to a few clear and useful summaries”, comparing such statistics “with what might be expected from theories of how the pattern might have originated and developed”.
Spatial econometrics, in turn, is a branch of econometrics that deals with the modelling of spatial interaction and spatial heterogeneity in data analysis. The birth of this discipline can be traced back to the works of Paelinck and Klaassen (1979) and Anselin (1988). Following Anselin (1988), spatial econometrics can be defined as: “the collection of techniques that deal with the peculiarities caused by space in the statistical analysis of regional science methods”. In this book, we use a broad definition, taking the term regional to cover spatial units defined as areal regions, locations (i.e., points), and continuous units. Essentially, spatial econometrics represents a toolkit that allows for the rigorous treatment of data that is geographically distributed.
Interestingly, the analysis of agricultural yield data was also the motivation for the seminal paper by Whittle (1954), in which he analysed such data through two-dimensional stochastic models.
This book contains several contributions focused on spatial data and its use in monitoring agricultural resources, farm management, and regional markets. The theory of spatial methods is complemented by real and/or simulated examples implemented through the open-source software R.
The layout of this introductory chapter is as follows. Section 1.2 describes the main typologies of spatial data. Section 1.3 contains some brief considerations about the R software. Section 1.4 outlines the contributions of this book, highlighting the main findings.
1.2 Spatial concepts: the essentials
1.2.1 Spatial effects
Spatial econometrics aims to address in a formal way two effects that are typical of geo-referenced data: spatial dependence and spatial heterogeneity.
In recent years, a very extensive literature has stressed the role of spatial effects in many fields of statistical and econometric analysis (Anselin 1988; LeSage and Pace 2009; Benedetti et al. 2015; Kelejian and Piras 2017).
Spatial dependence may be defined as “the propensity for nearby locations to influence each other and to possess similar attributes” (Goodchild 1992, p.33). Empirical models that do not take spatial dependence and structural heterogeneities into account may show serious misspecification problems (see Chapters 9, 12, and 13 in this book).
Spatial dependence may also be referred to as the relationship among outcomes of a variable that is a result of the geographical position of their locations. It measures the similarity of variables within an area and the level of interdependence between the variables (Cliff and Ord 1981; Cressie 1993; Haining 2003). The procedures used to analyse patterns of spatial dependence vary according to the type of data.
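As an illustration of how such similarity between nearby outcomes can be quantified for areal data, the following minimal sketch computes Moran's I, a widely used index of spatial autocorrelation that the text above does not name explicitly. It is written in plain Python for compactness rather than in R. The function names (`morans_i`, `chain_neighbours`), the six plots arranged along a line, and the yield-like values are all hypothetical and chosen only to make the arithmetic easy to follow; the binary neighbour matrix is an assumption of the kind discussed later in this section.

```python
def chain_neighbours(n):
    """Binary neighbour matrix for n plots on a line (hypothetical layout):
    plot i is a neighbour of plots i-1 and i+1 only."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        W[i][i + 1] = 1.0
        W[i + 1][i] = 1.0
    return W


def morans_i(values, W):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z_i are deviations from the mean and S0 is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)
    cross = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    denom = sum(zi * zi for zi in z)
    return (n / s0) * (cross / denom)


W = chain_neighbours(6)

# Illustrative values only: similar values side by side versus alternating values.
clustered = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
alternating = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0]

i_clustered = morans_i(clustered, W)
i_alternating = morans_i(alternating, W)

print("Moran's I, clustered pattern:  ", round(i_clustered, 3))
print("Moran's I, alternating pattern:", round(i_alternating, 3))
```

In this toy setting the clustered pattern yields a positive index and the alternating pattern a negative one, which is the sense in which the index summarises the similarity of neighbouring values. Inference on the index (for example, a permutation test) is deliberately left out of the sketch.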
For regional scientists, the economic counterpart of spatial dependence is the analysis of spillovers. Measuring the degree of spatial spillovers (LeSage and Pace 2009) and evaluating the extent of contagion (Debarsy et al. 2017) might help policy makers gain a more accurate understanding of agricultural phenomena.
Spatial analysis is often based on the definition of contiguity links that enable practitioners to encode geographical structures. These are defined in terms of a proximity matrix (the so-called W). Typically, the weights in the W matrix are non-stochastic and exogenous. Two areas are often defined as neighbours if they share a common border. In some cases, there is a need to capture and to model other forms of spatial proximity, such as hierarchical dependence and patterns of spatial competition (Haining 1990). Another possible approach is to define the elements of the weight matrix in terms of the similarity of one or more covariates (Conley and Topa 2002).
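The common-border definition of W can be sketched as follows for a regular grid of fields. This is a minimal illustration in plain Python rather than R, under two assumptions that are not taken from the text: the study area is a hypothetical 3 x 3 grid in which cells sharing an edge count as neighbours (often called rook contiguity), and the weights are then row-standardised so that each row sums to one, a common convention in applied work. The function names are hypothetical.

```python
def rook_contiguity(n_rows, n_cols):
    """Binary proximity matrix for a regular grid: w_ij = 1 if cells i and j
    share an edge (a common border), and 0 otherwise."""
    n = n_rows * n_cols
    W = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c  # cells are numbered row by row
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[i][rr * n_cols + cc] = 1.0
    return W


def row_standardise(W):
    """Divide each row by its sum so that the weights of every unit sum to one.
    Units without neighbours keep a row of zeros."""
    out = []
    for row in W:
        total = sum(row)
        out.append([w / total if total > 0 else 0.0 for w in row])
    return out


W = rook_contiguity(3, 3)      # hypothetical 3 x 3 grid of fields
W_std = row_standardise(W)

n_neighbours = [int(sum(row)) for row in W]
print("Neighbours per cell:", n_neighbours)
print("Weights of the centre cell:", W_std[4])
```

With row-standardised weights, the product of W and a variable gives, for each unit, the average of that variable over its neighbours, which is why this form is convenient in spatial models. Weights based on distance or on covariate similarity would replace only the rule that fills the matrix.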
Surprisingly, spatial heterogeneity, which is another relevant characteristic exhibited by spatial data, has been less investigated in the spatial econometrics and statistics literature.
Spatial heterogeneity is connected to the absence of stability over space and implies that parameters vary across locations (Anselin 1988). The presence of a non-constant relationship between a response variable and the covariates across spatial units has led to the introduction of spatially varying coefficients (Wheeler and Calder 2007), geographically weighted regression (Fotheringham et al. 2002), Bayesian regression models with spatially varying coefficients (Gelfand et al. 2003), and local linear regression models (Loader 1999). In linear estimation, spatial heterogeneity can lead to serious misspecification of the model (Postiglione et al. 2013).
Spatial heterogeneity can be classified into discrete heterogeneity and continuous heterogeneity. Continuous heterogeneity specifies how the regression coefficients change over space, as estimated, for example, through a local estimation process, as in the geographically weighted regression (GWR, Fotheringham et al. 2002). Discrete heterogeneity consists of a pre-specified set of spatial regimes or a predetermined group of spatial units (Anselin 1990; Postiglione et al. 2013), between which model coefficients are permitted to vary. For further details on applications of spatial heterogeneity to agricultural data, see Chapters 8 and 9 in this book.
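The two forms of heterogeneity can be contrasted in a stylised linear model. The notation below is an assumption introduced only for illustration and is not taken from the cited works: (u_i, v_i) denotes the coordinates of unit i, K_i is a diagonal matrix of kernel weights that decrease with distance from unit i (not to be confused with the proximity matrix W), and r(i) is the regime to which unit i belongs.

```latex
% Continuous heterogeneity (GWR-type): coefficients are functions of location
y_i = \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i ,
\qquad
\hat{\boldsymbol{\beta}}(u_i, v_i) = \left( X^{\top} K_i X \right)^{-1} X^{\top} K_i\, y

% Discrete heterogeneity (spatial regimes): coefficients are constant within each regime
y_i = \sum_{k=1}^{p} \beta_{k,\, r(i)}\, x_{ik} + \varepsilon_i ,
\qquad
r(i) \in \{1, \dots, R\}
```

In the first case a separate weighted least squares fit is obtained at each location, so the coefficients vary from unit to unit. In the second case only R coefficient vectors are estimated, one for each group of spatial units.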
As highlighted by Postiglione et al. (2017), a new direction in the field of spatial analysis will be represented by the joint treatment of the two spatial effects: spatial dependence and spatial heterogeneity.
1.2.2 Types of spatial data
Spatial data consists of observations for which both the value of the variable and its location are known.
The variables, for example, may be univariate or multivariate, categorical or continuous. They may be based on an observational study, a well-designed experiment, or a sample survey.
The spatial domain, defined as the set of geographical coordinates, offers a potentially huge source of information for analysing the underlying process. There are many different types of spatial data and, as a consequence, different forms of spatial statistics are required.
According to the classification suggested by Cressie (1993), three types of spatial data can be identified: