1 Introducing Data
Contents
- 1.1 Chapter Overview
- 1.2 Data Surrounds Us
- 1.3 The Power of Data: Fog, Pollution, and Catastrophe in London
- 1.4 The Lingering Influence of Data: The Work of Alexis de Tocqueville
- 1.5 The Questions that Drive Data Analysis: The Work of Adolphe Quetelet
- 1.6 Defining "Data"
- 1.7 From "Data" to "Big Data"
- 1.8 Concluding Thoughts
- 1.9 Summary
- 1.10 Further Reading
- 1.11 Discussion Questions
1.1 Chapter Overview
By the end of this chapter, you will be able to:
- Describe the impact and lingering influence of data through historical examples
- Gain a first-hand perspective on what data analysis involves by making your own predictions and comparing these predictions with real-life data
- Define the term "data"
- Describe the "big data" phenomenon and its potential advantages and risks
1.2 Data Surrounds Us
Data surrounds us. From stock prices on the morning news to records of calories burned on fitness machines, we all encounter colossal amounts of data (quantitative or qualitative information about ourselves, society, or the universe) every day. Analysing all of this data would be impossible; analysing some of it, however, can tell us a great deal about ourselves, our society, and the world we live in.
Data allows us to make discoveries that intuition or common sense cannot uncover. Data drives policy changes by governments. Data shapes the behaviour of companies and organizations. Data even changes the way people think. Given the power of data, the ability to analyse data is a special skill that is increasingly valued in society. However, because this skill is so potent, it can, like any type of power, be used for good or evil. While many data analysts strive to do good by accurately and clearly presenting their findings, others deliberately misrepresent data for selfish purposes. The goal of this book is to develop your data analysis skills to help you do good things in the world, and to recognize when other data analysts are deliberately misrepresenting their findings.
1.3 The Power of Data: Fog, Pollution, and Catastrophe in London
Not yet convinced of the power of data? Let's look at one example of how data, painstakingly collected and clearly presented to the public, saved countless lives and permanently altered life in one of the world's great cities.
Imagine you are living in London in December 1952. Pollution emitted by the city's smokestacks and coal-burning fireplaces mixes with the winter fog, resulting in a thick, choking, and nearly immobilizing smog (MacNee & Donaldson, 2008, p. 121). The smog lasts for five days. Visibility becomes so poor that buses stop running and, even indoors, many theatrical events are cancelled because audiences cannot see the stage (BBC News, 2008; Wallace & Hobbs, 2006, p. 179). Even scarier is the fog's impact on human health. Between 5 December and 9 December 1952, overall hospital admissions increase by half, with many of these admissions due to respiratory conditions (MacNee & Donaldson, 2008, p. 121).
Although the December 1952 fog was particularly severe, similar air pollution events were alarmingly common in nineteenth- and early twentieth-century London. Several past smog incidents had killed hundreds of people in the late 1800s (Brimblecombe, 1987, p. 124). Yet these past tragedies did not prompt policy changes. Some observers did express concern about potential health consequences: in the late nineteenth century, for example, one meteorologist argued that elevated levels of bronchitis in London were partly due to smoky fogs (MacNee & Donaldson, 2008, p. 121; Russell, 1889). However, data was not yet generally available to analyse this issue. By 1952, this had changed. The ability to accurately collect, analyse, and communicate findings from data about the fog's consequences changed everything.
For the 1952 fog, the first set of influential data was released by the health minister several weeks after the tragedy, revealing that 2800 extra deaths occurred in London during the fog as compared with the same week in 1951 (Thorsheim, 2006, p. 165).1 This dramatic statistic received significant media attention, both in Britain and around the world ("Week of London fog…", 1952). It encouraged people to view the 1952 fog as a preventable tragedy that could be stopped through policy action on pollution (Thorsheim, 2004, p. 166). Public concern about the 1952 fog paved the way for adoption of the Clean Air Act of 1956, which regulated industrial emissions and mandated that most homes and businesses stop using coal fires, a massive societal change that did not occur in response to previous fogs (Thorsheim, 2006, pp. 173–174).
This example illustrates the power of data to sharply alter people's perceptions, and even to change policy. Such policy change only occurred after the public was given clear proof of the number of people who had been killed in the fog. In other words, accurately collected and clearly presented data was necessary to prompt this transformation.
1.4 The Lingering Influence of Data: The Work of Alexis de Tocqueville
The data that helped prompt action on air pollution was quantitative data, meaning that it consisted of numbers (in this case, a single alarming statistic about the number of people who had died). When we hear the word "data", we often think about statistics like these. But data does not have to consist of numbers; it can, instead, consist of words, actions, behaviours, images, objects, and numerous other features of individual or social life. Data that is not numeric is known as qualitative data and, as with quantitative data, the analysis of qualitative data has similarly shaped policy and society in innumerable ways. Such influence can in fact resonate centuries into the future.
Let's look at one such example of the lingering influence of data: the landmark qualitative research of Alexis de Tocqueville. In 1831, de Tocqueville and his colleague Gustave Beaumont undertook a thorough nine-month journey throughout the Eastern, Midwestern, and Southern regions of the United States. Although tasked by the French government with studying the US prison system, de Tocqueville expanded the scope of his analysis to include all elements of social and political life in the young country. The lengthy, groundbreaking work that de Tocqueville produced, Democracy in America (1835), offers a multilayered analysis of democracy (and the appalling contradiction of slavery within a democracy), class, gender, cultural attitudes, and political values (Kurweil, 1999).
Although de Tocqueville's work is often considered a work of political philosophy, it's actually an "example of productive qualitative inquiry", since, to produce the volume, de Tocqueville meticulously collected and analysed qualitative data about the attitudes of American citizens and the workings of their democratic government (Lingenfelter, 2016, Chapter 5). Specifically, de Tocqueville's work is an early example of "participant observation" (Whitley, 2008, p. 98; see also Handler, 2005, p. 22; Kurweil, 1999, p. 153), a qualitative research method in which a researcher conducts extensive research in the field, and participates in activities or daily life routines within a particular area, organization, society, or group of interest. Participant observation allows a researcher to obtain first-hand insight into what it feels like to be a member of a specific group and engage in particular activities. De Tocqueville achieved such insight by travelling throughout the country's regions, conversing with numerous Americans, and participating broadly in community life. As an "outsider", de Tocqueville was able to bring a uniquely observant perspective to the society he studied, and he is now recognized by some contemporary scholars as "the first modern social scientist" (Kurweil, 1999, p. 153).
Perhaps the clearest illustration of the value of de Tocqueville's work is its lingering influence today. Democracy in America is still regularly referenced by cultural commentators and is widely assigned to social science students in US universities. A 2015 article in the Washington Post, for example, described de Tocqueville's work as "[t]he book every new American citizen – and every old one, too – should read" (Lozada, 2015); and a 2017 BBC News article entitled "Can democracy survive Facebook?" referenced de Tocqueville's work (Rajan, 2017). The fact that de Tocqueville's findings are still considered relevant to a twenty-first-century discussion about social media and democracy illustrates that well-executed qualitative research, like quantitative research, can exert a long-lasting impact. If you're interested in learning more about participant observation and other forms of qualitative research, you can look forward to Chapter 6, where we'll discuss qualitative data analysis in detail.
1.5 The Questions that Drive Data Analysis: The Work of Adolphe Quetelet
Data analysis is driven by questions. Learning to ask interesting and thoughtful questions is the first step in conducting interesting and thoughtful social research. One of this book's goals is to inspire you to ask such questions about the world, and in each chapter we'll explore real-life examples of the kinds of fascinating questions data analysts have asked. In fact, let's start with one example now: the work of Lambert Adolphe Jacques (or simply Adolphe) Quetelet (1796–1874). Quetelet's pioneering quantitative research illustrates how well-thought-out questions can prompt innovative data analysis, and lead to surprising and fundamental insights about the social world. Quetelet, a brilliant early data analyst from Belgium, produced groundbreaking findings in a wide range of disciplines, including medicine and criminology. We'll focus on Quetelet's work in two controversial areas: human height and weight, and weather and crime.
1.5.1 Height and weight
How do height and weight change over the course of individuals' lives? Do gender, class, and geographic region affect such changes? These questions, which reverberate in today's discussions about rising obesity levels, also fascinated Quetelet in the 1840s. Yet in contrast to more speculative commentators, Quetelet analysed quantitative data to answer these questions rigorously. Quetelet's (1842) data was derived from a broad range of sources, including government registers in Belgium, measurements taken from infants at the Foundling Hospital in Brussels, measurements taken from children working in factories in Manchester and Stockport (in England), and measurements taken from undergraduates at the University of Cambridge.
Quetelet (1842) noted that wealthier individuals tended to be taller than average, and that the growth of poorer individuals was often stunted by poverty and deprivation.2 He also observed that, for the average person, body weight measured in kilograms tended to be proportional to the square of their height measured in metres, an observation that remains deeply influential (Eknoyan, 2008, p. 48) as it helped form the basis of the Body Mass Index (BMI; Keys, Fidanza, Karvonen, Kimura, & Taylor, 1972). Today, the BMI is one of the most widely used measures for assessing whether an individual is underweight, at a healthy weight, overweight, or obese, and since obesity is now a significant global health concern, the BMI currently receives substantial attention from health professionals and the popular media. For example, the US Centers for Disease Control and Prevention (CDC, 2012) note that: "For adults, overweight and obesity ranges are determined by using weight and height to calculate a number called the 'body mass index'". The BMI is also used to trace global trends in obesity, revealing that, from 1980 to 2013, the percentage of men around the world who were overweight or obese rose from 29% to 37%, while the percentage of women rose from 30% to 38% (Ng et al., 2014).
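To make the relationship concrete, here is a minimal Python sketch of the BMI calculation: weight in kilograms divided by the square of height in metres. The function names are hypothetical, chosen only for illustration, and the adult cut-off values (18.5, 25, and 30) are the commonly cited thresholds rather than figures taken from this chapter, so treat them as an assumption.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return weight in kilograms divided by the square of height in metres."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(bmi: float) -> str:
    """Classify an adult BMI value.

    Assumption: uses the commonly cited adult cut-offs of 18.5, 25, and 30.
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "healthy weight"
    if bmi < 30:
        return "overweight"
    return "obese"


# Illustrative (made-up) individual: 70 kg and 1.75 m tall.
example_bmi = body_mass_index(70, 1.75)
print(round(example_bmi, 1), bmi_category(example_bmi))
```

Because the calculation needs only two measurements and one division, it is easy to apply to every record in a large dataset, which is part of why the measure works well at the population level.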
The BMI has many limitations. Since it does not distinguish between fat and muscle, it may not be helpful for all individuals (such as athletes with substantial muscle mass), but it can be useful at the population level, as the example of global obesity trends illustrates (see the discussion in Stephenson, 2013). Although more sophisticated methods for measuring obesity now exist, the simplicity and ease with which the BMI can be calculated likely reinforce its appeal. Additionally, as Wells (2014) has described, it is not clear whether some alternative measurements, such as waist size, are actually better indicators of the risk of developing chronic diseases than the BMI.
The BMI's persistence and continuing value illustrate the potential impact of thoughtful and considered data analysis. Quetelet's observation about human height and weight that inspired the BMI was not based on speculation or intuition; as we have seen, Quetelet's (1842) observation was based on his considered analysis of data on the heights and weights of many different individuals. A more speculative observation would likely not have been as accurate, or achieved the precision necessary to survive into the twenty-first century, as the BMI has done. The BMI's persistence also highlights the importance of data presentation. Quetelet's (1842) original observation is just a simple formula, accessible to anyone, and this simplicity has likely helped the BMI to persist despite the availability of more sophisticated techniques for measuring obesity today.
1.5.2 Crime and weather
In addition to height and weight, Quetelet explored the contentious question of whether the weather can affect crime rates. This question has continued to interest scholars, and Quetelet...