Big Data for Regional Science
eBook - ePub

  1. 350 pages
  2. English

About this book

Recent technological advancements and related trends are contributing to the production of an astoundingly large and rapidly growing collection of data, or 'Big Data'. This data now allows us to examine urban and regional phenomena in ways that were previously not possible. Despite the tremendous potential of big data for regional science, its use and application in this context are fraught with issues and challenges. This book brings together leading contributors to present an interdisciplinary, agenda-setting and action-oriented platform for research and practice in the urban and regional community.

This book provides a comprehensive, multidisciplinary and cutting-edge perspective on big data for regional science. Chapters contain a collection of research notes contributed by experts from all over the world with a wide array of disciplinary backgrounds. The content is organized along four themes: sources of big data; integration, processing and management of big data; analytics for big data; and higher-level policy and programmatic considerations. As well as concisely and comprehensively synthesising work done to date, the book also considers future challenges and prospects for the use of big data in regional science.

Big Data for Regional Science provides a seminal contribution to the field of regional science and will appeal to a broad audience, including those at all levels of academia, industry, and government.

By Laurie A. Schintler and Zhenhua Chen

Information

Publisher
Routledge
Year
2017
Print ISBN
9781138282186
eBook ISBN
9781351983259

1 Introduction

Laurie A. Schintler and Zhenhua Chen
Big data is gaining popularity in regional science (Rae and Singleton, 2015). Already in the field we are seeing a variety of applications in the areas of urban planning, human and economic geography, and transportation, among others (see e.g., Batty, 2017; Thakuriah, Tilahun and Zellner, 2017; Tranos and Nijkamp, 2015; Arribas-Bel, 2014). But this area of research is just beginning to unfold and take shape, and there are many opportunities yet to be exploited and new ones that are certain to arise as technology advances and other developments create new sources of big data and augment our ability to process, analyze and manage big data. At the same time, big data brings with it a long list of caveats, concerns and complicating factors, and to properly realize the benefits of big data for regional science, these issues will need to be attended to as well. Thus, moving forward, there are several questions in front of us. What are the prospects of big data for regional science, both in the near term and on the more distant horizon? How can big data be used to develop, enhance or re-think theories and models of spatial systems, and how might these efforts enhance our knowledge of various spatial phenomena, and ultimately improve the livability, sustainability and vitality of cities and regions? What are the technical, institutional and analytical challenges associated with the use of big data for regional science and how do we address them? How reliable is big data, and how does the quality and coverage of the data vary by source and application? And given that location is a feature in the phenomena we study, which in fact differentiates us from other areas of inquiry in the social sciences, how does space in big data present itself as a unique obstacle and opportunity in the use and application of the data? This book is intended to explore these issues and to provide a synthesis of the frontiers of big data for regional science. It is also intended as an agenda-setting platform, to help steer the future course of research in this area.

Background

Originally coined in the 1990s to describe data too big to be processed in standard software, the concept of big data has since expanded in scope and complexity. Big data is now generally defined in terms of four core dimensions: volume, velocity, variety and veracity – colloquially referred to as the ‘4 Vs’. That is, while big data does indeed tend to be large, which in today’s terms equates to datasets measured in units of information that go well beyond megabytes and gigabytes to terabytes and even bigger, it is much more than that. Big data is also characterized as data that is fast-moving, often being produced and disseminated in real and near-real time; heterogeneous in format, comprising not only structured forms of data but also unstructured elements such as text, video and images; and messy, with varying levels of fidelity and reliability. The data is also defined in terms of the contextual terrain in which it is positioned, produced and utilized. In big data for regional science, location is a critical and complicating contextual factor, delineating the unique cultural, economic, social, political and environmental fabric of each region, and this complexity creates additional challenges and opportunities.
Some 15 years ago, long before the term big data entered popular vernacular, Arthur Getis wrote about the prospects and challenges of using large datasets for regional science (Getis, 1999). As he notes, datasets in the field at that time were beginning to get larger and larger, and to increase in spatial and temporal dimensionality and resolution. New sources of data were also beginning to be exploited, moving beyond traditional, official government records – e.g., administrative censuses – to commercial and other proprietary databases, such as Dun and Bradstreet. Interestingly, many of the issues he raised at the time are still very much relevant in today’s big data landscape. While enabling us to gain a richer and more refined understanding of urban and regional phenomena, larger datasets tend to be more convoluted, computationally intensive and difficult to manage, analyze and document than smaller datasets, which contributes to an array of issues related to data curation, data sharing, data quality assurance, data processing and manipulation and data analysis. However, in the current data-driven era, while many of these issues do indeed remain, they have become ever more complicated, and as the technological landscape evolves, new problems, solutions and prospects are surfacing.
Technology is a critical driver behind the production of big data and all its derivatives, enabling us to use and exploit the data and contributing to a steady stream of rich and varied information. Recent technological developments and advancements, including the birth and expansion of the Internet and World Wide Web (WWW), as well as the increasing penetration of mobile and location-sensing devices in society – e.g., smartphones – are giving rise to a large and rapidly amassing collection of spatial big data. This data is rich with information about the spatial movements, activity patterns, preferences and sentiment of individuals and organizations, and the urban and regional systems in which they reside, work and interact. For example, social media sites, crowdsourcing and citizen sensing platforms and related Web 2.0 ‘apps’, which allow users to share information and data on-the-fly, are capturing and recording the digital footprints of billions of people daily. Online job aggregators, e-commerce exchanges and similar kinds of platforms are acting as near-real-time barometers of economic activity within and across nations, regions and municipalities. Additionally, the Internet of Things (IoT), which comprises a large and growing amalgamation of devices and sensors connected to the Internet, is actively monitoring many aspects of our lives and environment. ‘Smart Cities’, which use integrated systems of communications and surveillance technologies for managing assets, supporting operational decisions, developing plans and engaging citizens, are producing reams of data on the dynamics of urban systems. All this data can only be expected to proliferate and expand in spatial, temporal and topical coverage as the developing world goes increasingly mobile and digital, the IoT approaches full maturity and communities become smarter and more connected.
These new and expanding sources of data offer tremendous opportunities for regional science, allowing us to examine urban and regional phenomena in ways that have not been (and are not) possible using smaller, more traditional structured data. Web 2.0 ‘apps’ are providing unprecedented insight on individuals who desire to share their consumption patterns and preferences with others, including the status-directed Veblen consumer, and as many ‘apps’ are highly specialized and tailored to specific interests and needs of their users, we can now more readily and robustly study niche economies – e.g., craft beer markets (McLaughlin, Reid and Moore, 2016). Big data also enables us to sense the location preferences and choices of individuals, firms and organizations. It also permits us to re-define regions themselves, to re-configure boundaries in ways that may be different from those constructed for administrative purposes (Nelson and Rae, 2016) and to focus on individuals as units of observation, as opposed to administrative units. This gives us a chance to explore space-time activity patterns and movements in new and exciting ways, and in much greater detail and on a grander scale than before. On the downside, some have raised concerns about the data deluge signaling an end to theory (Anderson, 2008). On the other hand, big data may in fact allow us to re-test old theories, and to re-consider and recast them in a renewed light – that is, in a context that is markedly different from the one in which they were originally conceived. Moreover, it may finally provide the critical connection between idiographic (description-seeking) and nomothetic (law-seeking) knowledge in the spatial sciences (Miller and Goodchild, 2015), and it can lead to pathbreaking theoretical perspectives on the dynamics of cities and regions, such as the new theory of urban and network flows proposed by Michael Batty (Batty, 2013).
From a more pragmatic perspective, big data and related technologies can help in designing and facilitating livable, healthy and sustainable communities and moreover in creating spaces that are personalized to the needs and preferences of those who use and reside in them. In fact, this is the concept behind smart and connected communities, and an impetus behind the burgeoning field of urban informatics (Batty, 2017a). But big data is not a remedy for everything and traditional data will continue to serve a purpose – e.g., for understanding aspects of urban and regional systems where civic aggregations have some meaning and useful application and for supporting conventional planning activities such as long-term regional forecasting. Further, it can (and does) lead to socially undesirable outcomes and processes, such as social sorting and discrimination, disadvantaging certain people, locations and segments of society, and moreover, there are deep disparities in who has the resources, clout, skills and technologies necessary to produce and/or consume big data (boyd and Crawford, 2010). While, indeed, there has long been a digital divide, the gap appears to be widening and deepening in the data-driven age (Bilbao-Osorio et al., 2014).
There are also a myriad of statistical and analytical issues and challenges related to the use of big data for regional science. For example, much of the data at our current disposal is endowed with a high degree of spatial (and temporal) resolution and broad coverage. Therefore, it can be parsed and aggregated in countless ways, unlike administrative records, which have a more finite set of configurations and boundaries, making problems like the Modifiable Areal Unit Problem and the Modifiable Temporal Unit Problem ever more complex. Sample bias is another concern. While information collected by government agencies and other official organizations through censuses and other top-down sourcing mechanisms is generally representative of the population being surveyed in a region, bottom-up collected data – whether actively or passively produced – generally captures narrowly defined slivers of the population. For example, data produced by ‘apps’ tends to be skewed towards the demographics, preferences, motivations, interests and other contextual circumstances of those who use them. At the same time, there is some evidence that such biases wash out in the aggregate (Rae and Nelson, 2018), and this is something which certainly merits further examination. The digital divide, which is becoming increasingly complex in today’s technological landscape, introduces additional biases and inconsistencies in the quality and coverage of data (Schintler, 2018). Lastly, big data should not be analyzed through a blind lens. That is, failure to consider the spatial, temporal and broader societal context in which the data has been generated can contribute to spurious correlations and faulty conclusions and inferences, as happened recently with Google Flu Trends (Butler, 2013).
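The sensitivity of results to the choice of areal unit can be illustrated with a small Python sketch. It aggregates the same synthetic point records into square cells of three different sizes and recomputes the correlation between two attributes at each scale. The data, the function names (`make_points`, `aggregate`, `pearson`) and the cell sizes are illustrative assumptions only, and are not drawn from the book.

```python
import random
from collections import defaultdict
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def aggregate(points, cell_size):
    """Group (x, y, a, b) records into square cells; return per-cell means of a and b."""
    cells = defaultdict(list)
    for x, y, a, b in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((a, b))
    mean_a = [mean(a for a, _ in vals) for vals in cells.values()]
    mean_b = [mean(b for _, b in vals) for vals in cells.values()]
    return mean_a, mean_b


def make_points(n=2000, seed=0):
    """Synthetic records (an assumption): a weak west-east trend shared by a and b, plus noise."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x, y = rng.uniform(0, 100), rng.uniform(0, 100)
        trend = x / 100
        points.append((x, y, trend + rng.gauss(0, 1), trend + rng.gauss(0, 1)))
    return points


points = make_points()
results = {}
for cell_size in (5, 25, 50):  # hypothetical zoning schemes
    a, b = aggregate(points, cell_size)
    results[cell_size] = (len(a), pearson(a, b))
    print(f"cell size {cell_size}: {len(a)} cells, r = {results[cell_size][1]:.3f}")
```

In a setup like this one, averaging within larger cells suppresses individual-level noise, so the measured association between the two attributes tends to change with the zoning scheme even though the underlying records are identical.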
Additional challenges relate to the management and institutional aspects of big data. Big data is constantly ‘on the go’, moving through an information supply chain where at each stop it tends to get reprocessed, repackaged and repurposed – unlike traditional data which is more static in nature – making data curation and documentation formidable undertakings. Moreover, geographic boundaries and other features of the data often change as the data moves from one entity to another, complicating matters further. Given the sheer size and velocity of today’s data, it often contains information beyond what we may need. It may be much larger in spatial and/or temporal coverage than desired, or arrive in a format that is not immediately amenable to analysis – for example, not yet geocoded for use in a GIS. There are also issues tied to privacy and security, both of which have a spatial dimension. For example, location surveillance technologies, such as video cameras in rooms of buildings and personal health monitoring devices, are collecting sensitive information on the movements and behavior of individuals in space. Furthermore, given our increasing reliance on the Cloud and connectedness to the Internet, cyber-disruptions are an ever-growing threat, with some locations, organizations and people more vulnerable and at risk than others.
To address the challenges at hand, and ultimately to exploit the full potential of big data for regional science, it will be important to develop appropriate methods and tools. Indeed, we already have several powerful analytics at our disposal, such as k-means clustering, Principal Component Analysis, spatial kriging and interpolation, which can help in these regards. However, it will also be imperative for us to draw upon and advance techniques outside of our usual toolbox to accommodate the nuances of spatial big data. In fact, in general, there have been increasing calls for the development and use of new tools and processes for analyzing and modeling big data, apart from and in combination with those traditionally used for investigating and extracting meaning from data (Varian, 2014; Doornik and Hendry, 2015). Visualization as a tool should also not be overlooked. Not only can it help in teasing out patterns of association in large, complex spatial datasets, it can also be an effective and efficient mechanism for conveying technical information to lay audiences. Dashboards, which combine visualization and advanced data analytics, can empower a city with intelligence, providing citizens, planners and authorities with the engines to support better decision making, discovery and exploration (Barufi and Kourtit, 2015). Additionally, gaming, virtual reality and other cutting-edge interactive visual analytics and technologies not only provide new and interesting sources of data, but can also help in making sense of and modeling data. Dimensionality reduction techniques and processes will be essential for handling data too large or multifaceted to be processed in standard software and analytical platforms – i.e., to address the ‘data curse’ problem – and spatial statistical artefacts like autocorrelation could be exploited for these purposes.
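As a rough illustration of dimensionality reduction, the following Python sketch extracts the first principal component of a small synthetic table of regional indicators using power iteration on the covariance matrix. It is a minimal sketch under stated assumptions: the three indicators, their loadings, and the helper names (`center`, `covariance`, `first_component`) are hypothetical, and a production analysis would normally rely on an established numerical library rather than this hand-rolled routine.

```python
import random


def center(rows):
    """Subtract each column's mean so that every column has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of cov via power iteration.

    Assumes the all-equal starting vector is not orthogonal to the leading
    eigenvector, which holds for the positively correlated data used below.
    """
    d = len(cov)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


# Synthetic data (an assumption): three indicators driven by one latent factor.
rng = random.Random(1)
rows = []
for _ in range(500):
    z = rng.gauss(0, 1)
    rows.append([0.9 * z + rng.gauss(0, 0.3),
                 0.8 * z + rng.gauss(0, 0.3),
                 0.7 * z + rng.gauss(0, 0.3)])

centered = center(rows)
cov = covariance(centered)
component, eigenvalue = first_component(cov)
share = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# One score per record replaces the three original indicators.
scores = [sum(r[j] * component[j] for j in range(len(component))) for r in centered]
print(f"share of variance captured by the first component: {share:.2f}")
```

Replacing three correlated columns with a single score is the simplest case; the same idea extends to retaining a handful of components from a much wider table.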

Overview of the book

Given the growing interest in the use of big data for regional science, and taking account of the complex and diverse issues and opportunities surrounding the use of such data in the field, there is a need for a book that covers this subject from a comprehensive and multidisciplinary perspective. Moreover, given the applied nature of regional science and its relevance and ties to urban and regional planning, there is a need for a book that speaks to translational aspects of big data for regional science. This book addresses these needs, and it is one of the first such books to do so. Drawing upon a collection of research notes contributed by experts from all over the world, from the areas of economics, geography, political science, sociology, urban and regional planning, computer science and so on, this book robustly surveys the frontiers of research on big data for regional science and the array of complexities that come along with its use. It is intended as an agenda-setting and action-oriented platform for research and practice in the urban and regional community. And it is designed to appeal to a broad audience, including those from all tiers of academia, industry and government. The book is segmented into four sections, which correspond to major themes related to the use of big data for regional science.
Part I of the book focuses on new big data sources for regional science. In Chapter 2, Guy Lansley and Paul Longley discuss the experience of the UK Consumer Data Research Centre in adapting a new big data source derived from retail transactions to applications in health, energy efficiency and activity pattern analysis. Josep Maria Salanova and his colleagues in Chapter 3 provide a qualitative and quantitative description of floating taxi data through the presentation and comparison of datasets covering four cities: Barcelona, Berlin, Poznan and Thessaloniki. Chapters 4 and 5 demonstrate the potential of using web-crawled data for regional science research. In Chapter 4, Jean-Claude Thill and colleagues provide a demonstration of a regional study on the structure of China’s service-oriented economy using data from a web-based information service. Their work illustrates the potential of web-based big social science data for regional and urban science research. Zhenhua Chen in Chapter 5 provides a different example, using Internet-based housing data from web crawling for hedonic price analysis. Chapter 6, contributed by Shipeng Sun, introduces an application of large-scale parcel data to study intraurban migration patterns in the Twin Cities Metropolitan Area of Minnesota. Chapters 7 and 8 explore applications of online big data for urban and regional science research. Specifically, Robert Goodspeed and Xiang Yan in Chapter 7 introduce a new approach to visual preference surveys based on crowdsourced street voting data. In Chapter 8, Xinyue Ye and colleagues examine the temporal and spatial patterns of public response to campus shootings using Twitter data.
Part II of the book includes contributions to integration, processing and management of big data. Specifically, Yair Grinberger and Daniel Felsenstein i...

Table of contents

  1. Cover
  2. Title
  3. Copyright
  4. Contents
  5. List of figures
  6. List of tables
  7. List of contributors
  8. Foreword
  9. 1 Introduction
  10. Part I New big data sources in regional science
  11. Part II Big data integration and management
  12. Part III Big data analytics in regional science
  13. Part IV New frontiers of big data in regional science
  14. Index