What is R?
R is a free and open source computer program for processing data (www.r-project.org). It runs on all major operating systems and relies primarily on the command line for data input. This means that instead of interacting with the program by clicking on different parts of the screen via a graphical user interface (GUI), users type commands for the operations they wish to complete. For new users this might seem a little daunting at first, but the approach has a number of benefits, as highlighted by Gary Sherman (2008: 283), developer of the popular geographical information system (GIS) QGIS:
With the advent of 'modern' GIS software, most people want to point and click their way through life. That's good, but there is a tremendous amount of flexibility and power waiting for you with the command line. Many times you can do something on the command line in a fraction of the time you can do it with a GUI.
A key benefit is that commands sent to R can be stored and repeated from scripts. This facilitates transparent and reproducible research by removing the need for software licences and encouraging documentation of code. Furthermore, access to R's source code and the provision of a framework for extensions have enabled many programmers to improve on the basic, or 'base', R functionality. As a result, there are now more than 5000 official add-on packages, allowing R to tackle almost any numerical problem. If there is a useful function that R cannot currently perform, it is likely that someone is working on a solution. One area where extension of R's basic capabilities has been particularly successful in recent years is the addition of a wide variety of spatial analysis and visualisation tools (Bivand et al., 2013). Visualisation will be the focus of this chapter.
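To illustrate how commands can be stored in a script and repeated, the following is a minimal sketch rather than code from this chapter. The data values, the variable names (`distances`, `mean_distance`) and the use of a temporary file are all assumptions made for illustration; in practice the script would simply be a text file saved with a `.R` extension.

```r
# Write two commands to a script file. A temporary file is used here so
# the example is self-contained; normally you would save a .R file yourself.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "distances <- c(10, 20, 30)  # hypothetical data",
  "mean_distance <- mean(distances)"
), script_path)

# Re-run every command stored in the script with a single call
source(script_path, local = TRUE)

# The objects created by the script are now available
print(mean_distance)
```

Because the script is plain text, it can be shared, version-controlled and re-run by anyone with R installed, which is what makes the workflow reproducible.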
Why R for Spatial Data Visualisation?
R was conceived, and is still primarily known, for its capabilities as a 'statistical programming language' (Bivand and Gebhardt, 2000). Statistical analysis functions remain core to the package, but there is broadening functionality to reflect a growing user base across disciplines. It has become 'an integrated suite of software facilities for data manipulation, calculation and graphical display' (Venables et al., 2013). Spatial data analysis and visualisation is an important growth area within this increased functionality. The map of Facebook friendships produced by Paul Butler, for example, is iconic in this regard and has reached a global audience (Butler, 2010). It shows linkages between friends as lines passing across the curved surface of the Earth (using the geosphere package). The secret to the success of this map was the time taken to select the appropriate colour palette, line widths and transparency for the plot. As we discuss later in this chapter, the importance of such details cannot be overstated. They can be the difference between a stunning graphic and an impenetrable chart.
Arguably Butler's map helped inspire the R community to produce more ambitious graphics, a process fuelled by an increased demand for data visualisation and the development of packages that augment R's preinstalled 'base graphics'. Thus R has become a key tool for analysis and visualisation used by the likes of Twitter, the New York Times and Google. Thousands of consultants, design houses and journalists also rely on R: it is not the preserve of academic research, and many graduate jobs now list R as a desirable skill.
It is worth noting that there are a few key differences between R and traditional desktop GIS software. While dedicated GIS programs handle spatial data by default and display the results in a single way, there are various options in R that must be decided by the user.
One example of this is the choice between R's base graphics and a dedicated graphics package such as ggplot2. The former option requires no additional packages and can provide very quick feedback about the nature of the dataset in question with the generic plot() function. The ggplot2 option, by contrast, requires a new package to be loaded but opens up a very wide range of functions for visualising data, beyond the base graphics. ggplot2 also has sensible defaults for grid axes, legends and other features, allowing the user to create complex and beautiful graphics with minimal effort. We encourage users to try both but, following the focus on visualisation, have used ggplot2 for all but the first two plots presented in this chapter.
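The contrast between the two options can be sketched as follows. This is an illustrative example under stated assumptions, not one of the chapter's plots: the data frame `df` and its columns `x` and `y` are hypothetical, and the ggplot2 package is assumed to be installed already (for instance via `install.packages("ggplot2")`).

```r
# Hypothetical example data (not from the chapter)
df <- data.frame(x = 1:10, y = (1:10)^2)

# Option 1: base graphics, no additional packages required
plot(df$x, df$y)

# Option 2: ggplot2, which must be loaded before use
library(ggplot2)
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point()
print(p)  # draw the plot
```

The base version gives immediate feedback with a single function call. The ggplot2 version stores the plot as an object (`p`), to which further layers can be added with `+` before it is drawn.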
An innovative feature of this chapter is that all of the graphics presented in it are reproducible (see the next section for how). We encourage users not only to reproduce the graphics presented here but also to play around with the code, taking advantage of the wide range of visual analysis options opened up by R. Indeed, it is this flexibility, illustrated by the custom map of shipping routes presented later in this chapter, that makes R an attractive visualisation solution.
All of the results presented in this chapter can be reproduced (and modified) by typing the short code snippets that are presented into R. Elsewhere in this book, these principles are extended in the context of reproducible geographic information science.