The creation, accumulation, and use of copious amounts of data are driving rapid change across a wide variety of industries and academic disciplines. This 'Big Data' phenomenon is the result of recent developments in computational technology and improved data gathering techniques that have led to substantial innovation in the collection, storage, management, and analysis of data.

Real Estate Analysis in the Information Age: Techniques for Big Data and Statistical Modeling focuses on the real estate discipline, guiding researchers and practitioners alike on the use of data-centric methods and analysis from applied and theoretical perspectives. In it, the authors detail the integration of Big Data into conventional real estate research and analysis. The book is process-oriented, not only describing Big Data and associated methods, but also showing the reader how to use these methods through case studies supported by supplemental online material. The running theme is the construction of efficient, transparent, and reproducible research through the systematic organization and application of data, both traditional and 'big'. The final chapters investigate legal issues, particularly related to those data that are publicly available, and conclude by speculating on the future of Big Data in real estate.

Real Estate Analysis in the Information Age, by Kimberly Winson-Geideman, Andy Krause, Clifford A. Lipscomb, and Nick Evangelopoulos, is available in PDF and/or ePUB format and is catalogued under Business & Industrial Management.

Part I: Data

1. Traditional real estate data

Overview of real estate data

Real estate data are multi-dimensional and take many forms, including single-family residential (SFR) property characteristics, house price indices (HPIs), different spatial aggregations of office leasing transactional data, real estate investment trust (REIT) returns, and mortgage rate trends. With recent improvements in computing power, larger, multi-faceted sets of property and other data can now be stored, managed, and analyzed. These large, complex datasets are often referred to by the generic term “big data”, and although the phrase is used frequently, what actually constitutes “big data” is something of an enigma. A recent book by Foster, Ghani, Jarmin, Kreuter, and Lane (2017) suggests “there are almost as many definitions of big data as there are new types of data” (p. 3). The authors choose to narrow in on the definition provided by Japec et al. (2015): “an imprecise description of a rich and complicated set of characteristics, practices, techniques, ethical issues, and outcomes all associated with data.” Others take a more specific view of big data: the White House (2014) defines it in terms of the volume of data collected and processed, the variety of data that is or can be digitized, and the velocity of data that can be obtained in real time or nearly real time.

To provide some background to this book, we initially cover what most researchers refer to as “traditional” real estate data. These data are most often related to sales transactions (or volume) and can include micro-property data (property listings, property characteristics [e.g., tax assessed value (TAV), structural characteristics, geographic identifiers like latitude and longitude], and building permits) as well as macro-housing data (e.g., HPIs, other longitudinal data, measures of supply and demand for leased space, and property data at a more macro geographic scale). Generally, we discuss property data in two broad categories: commercial real estate and residential real estate. This chapter focuses on the different government and industry data sources within these categories that are most commonly used in market analyses. To conclude the chapter, we discuss some issues related to data licensing, the means by which firms commonly obtain the rights to big data sources for their own internal research purposes or to create derivative products.

The four S’s of property data

There is no official or agreed-upon typology or system for classifying the ever-increasing stock of property data that is available. In this book we propose that property data can be organized by determining how they fall across four dimensions (Scope, Substance, Source, and Sphere), the 4 S’s of property data classification. Briefly, the 4 S’s encompass:

Scope: The relationship between the intended use of the data and the property industry
Substance: The phenomena represented by the observations in the data
Source: The origin of or the authority in charge of data collection and dissemination
Sphere: The public or private-ness of the data content and data ownership

Scope

The scope of property data refers to the nexus between the original purpose for which the data were collected and their use in the property industry. To be more specific, there are three varieties of scope: Specific, Intentional, and Collateral. “Specific” data are data primarily collected for (and usually by) the property industry itself. “Intentional” data are data collected intentionally for the purpose of third-party analysis, but not specifically for the property industry; they therefore have a much wider range of uses. Finally, “Collateral” data are data that result as a by-product of some other action, such as a web search, social media post, or other action not directly intended as a data collection exercise.

These three ‘scopes’ of property data map well onto the Winson-Geideman and Krause (2016) typology of Core, Static Spatial, and Peripheral real estate data. Data with a “specific” scope relate to the “core” data category. “Core” real estate data include financial data, physical data, and transactional data. First, according to Winson-Geideman and Krause (2016), financial data include REITs (e.g., data center REITs, commercial property REITs) and real estate-related stocks. Second, physical data include information about the hard real estate asset, whether that is an unimproved land parcel or an improved property with several structures on it. Third, transactional data include real estate purchases, mortgage origination data, lease data, expense data (including real estate taxes), as well as general economic returns for a single real estate development/investment or a portfolio of investments. A common identifier, such as a property address or a parcel identification number, is often used to link the physical and transactional data types together. Examples of combining these two data types include data captured and maintained by property tax assessors and multiple listing services (MLSs).
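The linkage of physical and transactional data through a common identifier can be sketched as a simple keyed join. This is a minimal illustration only: the `parcel_id` field, the record layouts, and all values are hypothetical, and real assessor or MLS files use their own identifiers and schemas.

```python
# Hypothetical physical (assessor-style) records keyed by an assumed parcel_id.
physical = {
    "P-001": {"lot_sqft": 5000, "bedrooms": 3},
    "P-002": {"lot_sqft": 7200, "bedrooms": 4},
}

# Hypothetical transactional records; a parcel can sell more than once.
transactions = [
    {"parcel_id": "P-001", "sale_price": 350000, "sale_year": 2014},
    {"parcel_id": "P-001", "sale_price": 410000, "sale_year": 2017},
    {"parcel_id": "P-003", "sale_price": 275000, "sale_year": 2016},
]


def link_records(physical, transactions):
    """Attach physical characteristics to each sale that shares a parcel_id."""
    linked, unmatched = [], []
    for sale in transactions:
        traits = physical.get(sale["parcel_id"])
        if traits is None:
            unmatched.append(sale)  # no physical record for this identifier
        else:
            linked.append({**sale, **traits})
    return linked, unmatched


linked, unmatched = link_records(physical, transactions)
```

Keeping the unmatched sales in a separate list, rather than silently dropping them, makes it easier to see how well the identifier actually links the two sources.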

“Intentional” data relate to Winson-Geideman and Krause’s “Static Spatial” category. Within this typology, we consider the most appropriate way to incorporate various types of geographic information systems (GIS) and spatial data. Prior to GIS, locational externalities were part of the tacit knowledge of real estate professionals and were analyzed qualitatively, whereas property characteristics, financial returns, and other traditional numeric data were analyzed quantitatively. GIS and spatial data are not financial, nor are they physical or transactional per se. Instead, they are “extra-locational.” With the innovations made in GIS technologies in the 1990s, real estate professionals and researchers could efficiently use new kinds of data. GIS innovations allowed for extra-locational spatial data to be collected, quantified, and analyzed.

We use the term “extra-locational” to indicate data on spatial phenomena outside the physical boundaries of a property. Examples of extra-locational data include neighborhood information from the Census Bureau, traffic patterns, analyses of proximity to amenities and disamenities (to measure externalities), viewsheds, and accessibility metrics. In other words, this data type “quantifies how the property itself relates to existing external physical realities” (Winson-Geideman & Krause, 2016). Extra-locational data, which have been used for nearly two decades, often remain somewhat limited in terms of temporal resolution (i.e., they are not often available in real time) and usually possess a well-defined spatial extent (e.g., Census tract, Census block group, school district).
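As one illustration of an extra-locational measure, proximity to an amenity can be approximated with the haversine great-circle distance. The sketch below is not the authors' method; the coordinates and amenity names are hypothetical, and a practical analysis might use network distance or a projected coordinate system instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_amenity(prop, amenities):
    """Return (name, distance_km) for the amenity closest to a property."""
    distances = (
        (name, haversine_km(prop[0], prop[1], lat, lon))
        for name, (lat, lon) in amenities.items()
    )
    return min(distances, key=lambda pair: pair[1])


# Hypothetical coordinates, for illustration only.
property_location = (47.60, -122.33)
amenities = {"park": (47.61, -122.33), "transit_stop": (47.60, -122.30)}

name, dist_km = nearest_amenity(property_location, amenities)
```

The resulting distance could then be attached to a property record as one more explanatory variable alongside its physical and transactional attributes.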

Table 1.1 Data scope typology examples

Property-Based	Extra-Locational	Extra-Locational
Specific/Core	Intentional/Static Spatial	Collateral/Peripheral
Sales Transactions	Census Bureau Data	Internet Searches
Lease Transactions	Road Network Data	Transit Ridership Data
Mortgage Data	Geographic Data	Live Traffic Data
Tax Assessment Values	Aggregated Spatial (Core) Data	Point of Sale (POS) Data
Property Level Data (PLD)	Urban Planning Forecasts	Geo-Located Tweets
REIT/Real Estate Stock Data	Spatial Economic Indicators	Pedestrian Traffic Counts

Finally, data that are “Collateral” in scope map to Winson-Geideman and Krause’s “Peripheral” data. Collateral data are data whose collection is incidental to, or a byproduct of, some other process. They are varied, disparate, and often available in real time. It may be useful to think of Collateral data as being primarily human-focused, whereas Specific and Intentional data are typically non-human in nature. Collateral data are often remotely sensed (gathered mechanically) instead of collected directly by real estate professionals or government workers. Finally, Collateral data are increasingly being used in real estate predictions and forecasting.

To summarize our conception of real estate data in the context of big data, Table 1.1 (based on Winson-Geideman and Krause (2016), 1) provides a non-exhaustive list of the three types of real estate data used or practically available to use in real estate research and analysis.

Substance

The substance of a particular set of data refers to the real-world phenomenon that the data represent. There is no bounding set of what the substance can be, especially in data with a collateral scope, but the most common substances are economic, physical, location, and human. Economic data are those data that represent information on economic or financial transactions. In short, they cover any situation in which money is exchanged or values are represented in relation to real estate. Physical data directly describe the hard, real asset that underlies real estate. This can be the structure or the land. Location data provide an orientation to the actual position of the asset (specific scope) or of an externality or other party (intentional or collateral scope) on the earth or in relation to some fixed or known point. Location data also include information regarding the boundaries of various administrative, natural, or market areas, or even the rights-of-way or routes of thoroughfares, transit lines, or waterways. Finally, human data digitally represent the activities of one or more people. This can be the act of walking, driving, spending, resting, tweeting, searching, or calling: anything people do that leaves a digital footprint.

Some data may have more than one substance. A set of data giving the geo-located points from which people made mobile device searches of real estate listings has both human and location substance. If the same dataset included, for example, the actual booking of an Airbnb lodging, then it could also be considered economic. While most, if not all, combinations of scope and substance are possible, some are certainly more likely than others. Physical and economic data are often specific in scope, while location data are primarily specific or intentional in scope. Human data are most often collateral in scope.
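Because a dataset can carry more than one substance, one way to record this is to store substance as a set rather than a single label. The sketch below is illustrative only: the class and field names are hypothetical, and treating the mobile search example as collateral in scope is our reading of the text rather than a stated classification.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    SPECIFIC = "specific"
    INTENTIONAL = "intentional"
    COLLATERAL = "collateral"


class Substance(Enum):
    ECONOMIC = "economic"
    PHYSICAL = "physical"
    LOCATION = "location"
    HUMAN = "human"


@dataclass(frozen=True)
class DatasetTag:
    """Hypothetical label for a dataset: one scope, one or more substances."""
    name: str
    scope: Scope
    substances: frozenset


def add_substance(tag, substance):
    """Return a new tag with an extra substance; the original is unchanged."""
    return DatasetTag(tag.name, tag.scope, tag.substances | {substance})


# Geo-located mobile searches of listings: human and location substance.
searches = DatasetTag(
    "mobile listing searches",
    Scope.COLLATERAL,  # assumption: a by-product of web searching
    frozenset({Substance.HUMAN, Substance.LOCATION}),
)

# If the same data also recorded actual bookings, it gains economic substance.
with_bookings = add_substance(searches, Substance.ECONOMIC)
```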

Source

There are three primary sources of property data: private industry, government, and academia/non-profit organizations. We consider government to be separate from academic and non-profit sources because, in most cases, government data are collected and distributed by mandate, whereas data from academia and non-profits are usually, though not always, collected and distributed on an ad hoc or occasional basis.

The source of the data is important in understanding the frequency with which the data are updated, the cost of accessing them, and the likely standardization (or lack thereof) of the data product. Government data and industry data are more likely to be produced on a regular schedule and distributed in a fashion that is standardized, at least internally. Industry data, however, most often come with a cost, whereas much, but not all, government data is free or open for use. Data sourced from non-profits or academic sources are often distinct to a particular project or cause and may not be updated regularly or standardized to integrate well with other datasets. Additionally, academic/non-profit data are usually some combination of government and industry data. As a result, we focus on government and industry data below, reserving a small discussion of academic and non-profit data resources for near the end of the chapter.

Sphere

The sphere of data refers to whether the content is public or private. For example, data about the street network have a public sphere in terms of content. Conversely, data about the structural characteristics of a single-family home have a private sphere in terms of content because a home is private property. Both public and private content can come from a government (public) or an industry (private) source. The street network data may be owned (and distributed) by a government (public) or by Google (industry). Likewise, the private content of a home’s structural characteristics can be owned and distributed by an industry entity, such as the Multiple Listing Service, or by a public entity, like a local government assessor’s or valuer’s office.

Across the four dimensions of the data, any combination of S’s is theoretically possible. There can be specific (scope) physical (substance) industry (source) private data, or collateral human government public data, and all manner of permutations in-between. Some combinations are certainly more likely than others, but we do not present an entire list of those possibilities and their propensity¹ here.
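To make the size of this combination space concrete, the categories named in the text can be enumerated directly. The sketch below is an illustration under simplifying assumptions: it uses short string labels of our own choosing and assigns exactly one substance per dataset, even though a dataset may in practice have several.

```python
from itertools import product

# Category labels taken from the chapter; the exact strings are our shorthand.
SCOPES = ("specific", "intentional", "collateral")
SUBSTANCES = ("economic", "physical", "location", "human")
SOURCES = ("industry", "government", "academia/non-profit")
SPHERES = ("public", "private")

# Every (scope, substance, source, sphere) combination with a single substance.
combinations = list(product(SCOPES, SUBSTANCES, SOURCES, SPHERES))

# The two examples mentioned in the text are both members of this space.
example_a = ("specific", "physical", "industry", "private")
example_b = ("collateral", "human", "government", "public")

print(len(combinations))          # 3 * 4 * 3 * 2 = 72
print(example_a in combinations)  # True
print(example_b in combinations)  # True
```

Allowing multiple substances per dataset would enlarge the space further, which is one reason a full listing of combinations is impractical.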

Note that we have not explicitly addressed the issue of data access or cost in our four dimensions of data. We have omitted this aspect of property data for two reason...
