Real Estate Analysis in the Information Age

Techniques for Big Data and Statistical Modeling

Kimberly Winson-Geideman, Andy Krause, Clifford A. Lipscomb, Nick Evangelopoulos

  1. 164 pages
  2. English

About This Book

The creation, accumulation, and use of copious amounts of data are driving rapid change across a wide variety of industries and academic disciplines. This 'Big Data' phenomenon is the result of recent developments in computational technology and improved data gathering techniques that have led to substantial innovation in the collection, storage, management, and analysis of data.

Real Estate Analysis in the Information Age: Techniques for Big Data and Statistical Modeling focuses on the real estate discipline, guiding researchers and practitioners alike on the use of data-centric methods and analysis from applied and theoretical perspectives. In it, the authors detail the integration of Big Data into conventional real estate research and analysis. The book is process-oriented, not only describing Big Data and associated methods, but also showing the reader how to use these methods through case studies supported by supplemental online material. The running theme is the construction of efficient, transparent, and reproducible research through the systematic organization and application of data, both traditional and 'big'. The final chapters investigate legal issues, particularly related to those data that are publicly available, and conclude by speculating on the future of Big Data in real estate.

Information

Publisher: Routledge
Year: 2017
ISBN: 9781315311111
Edition: 1
Subtopic: Real Estate

Part I
Data

1
Traditional real estate data

Overview of real estate data

Real estate data are multi-dimensional and take many forms, including single-family residential (SFR) property characteristics, house price indices (HPIs), different spatial aggregations of office leasing transactional data, real estate investment trust (REIT) returns, and mortgage rate trends. With recent improvements in computing power, larger, multi-faceted sets of property and other data can now be stored, managed, and analyzed. These large, complex datasets are often referred to by the generic term “big data”, and although the phrase is used quite frequently, what actually constitutes “big data” is something of an enigma. A recent book by Foster, Ghani, Jarmin, Kreuter, and Lane (2017) suggests “there are almost as many definitions of big data as there are new types of data” (p. 3). The authors choose to narrow in on the definition provided by Japec et al. (2015) – “an imprecise description of a rich and complicated set of characteristics, practices, techniques, ethical issues, and outcomes all associated with data.” Others have a more specific view of big data – the White House (2014) defines it in terms of the volume of data collected and processed, the variety of data that is or can be digitized, and the velocity of data that can be obtained in real or nearly real time.
To provide some background to this book, we initially cover what most researchers refer to as “traditional” real estate data. These data are most often related to sales transactions (or volume) and can include micro-property data (property listings, property characteristics [e.g., tax assessed value (TAV), structural characteristics, geographic identifiers like latitude and longitude], and building permits) as well as macro-housing data (e.g., HPIs, other longitudinal data, measures of supply and demand for leased space, and property data at a more macro geographic scale). Generally, we discuss property data in two broad categories – commercial real estate and residential real estate. This chapter focuses on the different government and industry data sources within these categories that are most commonly used in market analyses. To conclude the chapter, we discuss some issues related to data licensing, which is how firms commonly obtain the rights to use big data sources for their own internal research purposes or to create derivative products.

The four S’s of property data

There is no official or agreed-upon typology or system for classifying the ever-increasing stock of property data that is available. In this book we propose that property data can be organized by determining how it falls across four different dimensions – Scope, Substance, Source, and Sphere – the 4 S’s of property data classification. Briefly, the 4 S’s encompass:
  1. Scope: The relationship between the intended use of the data and the property industry
  2. Substance: The phenomena represented by the observations in the data
  3. Source: The origin of or the authority in charge of data collection and dissemination
  4. Sphere: The public or private nature of the data content and data ownership

Scope

The scope of property data refers to the nexus between the original purpose for which the data were collected and their use in the property industry. To be more specific, there are three varieties of scope: Specific, Intentional, and Collateral. “Specific” data are data primarily collected for (and usually by) the property industry itself. “Intentional” data are data collected intentionally for the purpose of third-party analysis, but not collected specifically for the property industry. “Intentional” data have a much wider usage. Finally, “Collateral” data are data that result as a by-product of some other action, such as a web search, social media post, or action not directly intended as a data collection exercise.
These three ‘scopes’ of property data map well onto the Winson-Geideman and Krause (2016) typology of Core, Static Spatial, and Peripheral real estate data. Data with a “specific” scope relate to the “core” data category. “Core” real estate data include financial data, physical data, and transactional data. First, according to Winson-Geideman and Krause (2016), financial data include REITs (e.g., data center REITs, commercial property REITs) and real estate-related stocks. Second, physical data include information about the hard real estate asset, whether that is an unimproved land parcel or an improved property with several structures on it. Third, transactional data include real estate purchases, mortgage origination data, lease data, expense data (including real estate taxes), as well as general economic returns for a single real estate development/investment or a portfolio of investments. With the physical and transactional data types, a common identifier such as a property address or a parcel identification number is often used to link these data types together. Examples of combining these two data types include data captured and maintained by property tax assessors and multiple listing services (MLSs).
“Intentional” data relate to Winson-Geideman and Krause’s “Static Spatial” category. Within this typology, we consider the most appropriate way to incorporate various types of geographic information systems (GIS) and spatial data. Prior to GIS, locational externalities were part of the tacit knowledge of real estate professionals and were analyzed qualitatively, whereas property characteristics, financial returns, and other traditional numeric data were analyzed quantitatively. GIS and spatial data are not financial, nor are they physical or transactional per se. Instead, they are “extra-locational.” With the innovations made in GIS technologies in the 1990s, real estate professionals and researchers could efficiently use new kinds of data. GIS innovations allowed for extra-locational spatial data to be collected, quantified, and analyzed.
We use the term “extra-locational” to indicate data on spatial phenomena outside the physical boundaries of a property. Examples of extra-locational data include neighborhood information from the Census Bureau, traffic patterns, analyses of proximity to amenities and disamenities (to measure externalities), viewsheds, and accessibility metrics. In other words, this data type “quantifies how the property itself relates to existing external physical realities” (Winson-Geideman & Krause, 2016). Extra-locational data, which have been used for nearly two decades, often remain somewhat limited in terms of temporal resolution (e.g., they are not often available in real time) and usually possess a well-defined spatial extent (e.g., Census tract, Census block group, school district).
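The linking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`parcel_id`, `lot_sqft`, `sale_price`, and so on), and the helper `link_by_parcel` are hypothetical and do not come from any assessor or MLS system.

```python
# Hypothetical records; all field names and values are illustrative only.
physical = [
    {"parcel_id": "P-001", "lot_sqft": 6000, "bedrooms": 3},
    {"parcel_id": "P-002", "lot_sqft": 4500, "bedrooms": 2},
    {"parcel_id": "P-003", "lot_sqft": 8000, "bedrooms": 4},
]
transactions = [
    {"parcel_id": "P-001", "sale_price": 350000, "sale_year": 2015},
    {"parcel_id": "P-003", "sale_price": 510000, "sale_year": 2016},
    {"parcel_id": "P-001", "sale_price": 390000, "sale_year": 2017},
]


def link_by_parcel(physical_rows, transaction_rows, key="parcel_id"):
    """Attach physical attributes to each transaction that shares the key."""
    # Index the physical records once so each lookup is a dictionary access.
    physical_by_key = {row[key]: row for row in physical_rows}
    linked = []
    for txn in transaction_rows:
        attrs = physical_by_key.get(txn[key])
        if attrs is None:
            continue  # no physical record for this parcel, so skip it
        linked.append({**attrs, **txn})
    return linked


linked = link_by_parcel(physical, transactions)
for row in linked:
    print(row["parcel_id"], row["bedrooms"], row["sale_price"])
```

Parcels that never sold (here `P-002`) simply do not appear in the linked output, while a parcel that sold twice appears once per sale. In practice, address-based keys usually need cleaning before a match like this is reliable.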
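As one illustration of how an extra-locational variable such as proximity to an amenity might be quantified, the sketch below computes great-circle (haversine) distances from a property to a set of amenity points. The coordinates and the function names `haversine_km` and `nearest_amenity_km` are assumptions made for this example; they are not taken from the book or from any GIS package.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_amenity_km(property_coord, amenity_coords):
    """Distance from one property to the closest amenity in the list."""
    lat, lon = property_coord
    return min(haversine_km(lat, lon, a_lat, a_lon)
               for a_lat, a_lon in amenity_coords)


# Hypothetical coordinates (latitude, longitude) for illustration only.
home = (47.6097, -122.3331)
parks = [(47.6205, -122.3493), (47.6290, -122.3420), (47.5952, -122.3316)]
print(round(nearest_amenity_km(home, parks), 2))
```

Straight-line distance is the simplest proxy for proximity. Network travel time or viewshed analysis would require road or terrain data and a GIS toolkit.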
Table 1.1 Data scope typology examples
Property-Based                Extra-Locational
Specific/Core                 Intentional/Static Spatial       Collateral/Peripheral
Sales Transactions            Census Bureau Data               Internet Searches
Lease Transactions            Road Network Data                Transit Ridership Data
Mortgage Data                 Geographic Data                  Live Traffic Data
Tax Assessment Values         Aggregated Spatial (Core) Data   Point of Sale (POS) Data
Property Level Data (PLD)     Urban Planning Forecasts         Geo-Located Tweets
REIT/Real Estate Stock Data   Spatial Economic Indicators      Pedestrian Traffic Counts
Finally, data that are “Collateral” in scope map to Winson-Geideman and Krause’s “Peripheral” data. Collateral data refer to data whose collection is incidental to or a byproduct of some other process. Collateral data are varied, disparate, and often available in real time. It may be useful to think of Collateral data as being primarily human-focused, whereas Specific and Intentional data are typically non-human in nature. Collateral data are often remotely sensed (gathered mechanically) instead of collected directly by real estate professionals or government workers. Finally, Collateral data are increasingly being used in real estate predictions and forecasting.
To summarize our conception of real estate data in the context of big data, Table 1.1 (based on Winson-Geideman and Krause (2016), 1) provides a non-exhaustive list of the three types of real estate data used or practically available to use in real estate research and analysis.

Substance

The substance of a particular set of data refers to the real-world phenomenon that the data represent. There is no bounding set of what the substance can be, especially in data with a collateral scope, but the most common substances are economic, physical, location, and human. Economic data refers to those data that represent information on economic or financial transactions. In short, it includes any situation in which money is exchanged or values are represented in relation to real estate. Physical data directly describes the hard, real asset that underlies real estate. This can be the structure or the land. Location data provides an orientation to the actual position of the asset (specific scope) or an externality or other party (intentional or collateral scope) on the earth or in relation to some fixed or known point. Location data also includes information regarding the boundaries of various administrative, natural, or market areas, or even the rights-of-way or routes of thoroughfares, transit lines, or waterways. Finally, human data digitally represents the activities of one or more people. This can be the act of walking, driving, spending, resting, tweeting, searching, calling – anything people do that leaves a digital footprint.
Some data may have more than one substance. A set of data giving the geo-located points from which people made mobile device searches of real estate listings has both human and location substance. If the same dataset included, for example, the actual booking of an Airbnb lodging, then it could also be considered economic. While most, if not all, combinations of scope and substance are possible, some are certainly more likely than others. Often physical and economic substance data is of a specific scope, while location data is primarily specific or intentional in scope. Human data is most often collateral in scope.
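The idea that a single dataset can carry more than one substance can be modelled with a flag-style enumeration. The sketch below is one possible encoding, not the authors' own; the class name `Substance` and the variable names are hypothetical.

```python
from enum import Flag, auto


class Substance(Flag):
    """The four most common substances; members can be combined with |."""
    ECONOMIC = auto()
    PHYSICAL = auto()
    LOCATION = auto()
    HUMAN = auto()


# Geo-located mobile searches of listings: human and location substance.
mobile_searches = Substance.HUMAN | Substance.LOCATION

# The same dataset extended with actual bookings also becomes economic.
with_bookings = mobile_searches | Substance.ECONOMIC

print(Substance.LOCATION in mobile_searches)
print(Substance.ECONOMIC in mobile_searches)
print(Substance.ECONOMIC in with_bookings)
```

A flag enumeration fits here because membership tests stay simple however many substances a dataset combines. Because the text notes that the set of substances is not bounded, a real schema would likely need room for additional members.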

Source

There are three primary sources of property data: private industry, government, and academia/non-profit organizations. We consider government to be separate from academia and non-profit sources because in most cases, government data is collected and distributed by mandate, whereas data from academia and non-profits are usually, though not always, collected and distributed on an ad hoc or occasional basis.
The source of the data is important in understanding the frequency with which it is updated, the cost of accessing the data, and the likely standardization (or lack thereof) of the data product. Government data and industry data are more likely to be produced on a regular schedule and distributed in a fashion that is standardized, at least internally. Industry data, however, most often comes with a cost, whereas much, but not all, government data is free or open for use. Data sourced from non-profits or academic sources is often distinct to a particular project or cause and may not be updated regularly or standardized to integrate well with other datasets. Additionally, academic/non-profit data is usually some combination of government and industry data. As a result, we focus on government and industry data below, reserving a small discussion on academic and non-profit data resources near the end of the chapter.

Sphere

The sphere of data refers to whether the content is public or private. For example, data about the street network has a public sphere in terms of content. Conversely, data about the structural characteristics of a single-family home has a private sphere in terms of content because a home is private property. Both public and private content can come from a government (public) or an industry (private) source. The street network data may be owned (and distributed) by a government (public) or by Google (industry). Likewise, the private content of a home’s structural characteristics can be owned and distributed by an industry entity, such as the Multiple Listing Service, or by a public entity, like a local government assessor or valuer’s office.
Across the four dimensions of the data, any combination of S’s is theoretically possible. There can be specific (scope) physical (substance) industry (source) private (sphere) data, or collateral human government public data, and all manner of permutations in between. Some combinations are certainly more likely than others, but we do not present an entire list of those possibilities and their propensity here.
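To make the combinatorial claim concrete, the sketch below encodes the four dimensions as enumerations and lists every possible combination. It is an illustrative encoding only: the class and member names are hypothetical, each dataset is assumed to have a single primary substance for simplicity, and substance is limited to the four common categories named in the text even though the authors note the set is not bounded.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Scope(Enum):
    SPECIFIC = "specific"
    INTENTIONAL = "intentional"
    COLLATERAL = "collateral"


class Substance(Enum):
    ECONOMIC = "economic"
    PHYSICAL = "physical"
    LOCATION = "location"
    HUMAN = "human"


class Source(Enum):
    INDUSTRY = "industry"
    GOVERNMENT = "government"
    ACADEMIC_NONPROFIT = "academic/non-profit"


class Sphere(Enum):
    PUBLIC = "public"
    PRIVATE = "private"


@dataclass(frozen=True)
class DataClassification:
    """One position in the 4 S's classification."""
    scope: Scope
    substance: Substance
    source: Source
    sphere: Sphere


# Every theoretically possible combination of the four dimensions.
all_combinations = [
    DataClassification(*combo)
    for combo in product(Scope, Substance, Source, Sphere)
]

# The first example from the text: specific physical industry private data.
example = DataClassification(
    Scope.SPECIFIC, Substance.PHYSICAL, Source.INDUSTRY, Sphere.PRIVATE
)
print(len(all_combinations), example in all_combinations)
```

With these simplifying assumptions, the grid has 3 × 4 × 3 × 2 = 72 cells. The sketch says nothing about how likely each cell is, which the authors also leave open.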
Note that we have not explicitly addressed the issue of data access or cost in our four dimensions of data. We have omitted this aspect of property data for two reason...
