Chapter 1
Life in a Data-Laden Age
Finding and Managing Datasets
This chapter covers ...
- . . . what data look like in their raw form within a dataset
- . . . how to work with data to get them ready to analyze
- . . . the wide variety of datasets that are readily available for analysis
- . . . how to build an additive index and check if it is acceptable
- . . . the newer forms that data take, from Internet databases to media analyses
- . . . types of variables used in statistical analysis
- . . . a classification of statistical procedures we'll cover in this book
- . . . an example of how researchers used an Internet dating site to study multiracial people
- . . . an example of how researchers used Google and Twitter to study media effects.
Introduction
Though considered a rude question in other parts of the world, a very typical first question at American parties when people are meeting other people for the first time is "What do you do for a living?" I occasionally attend parties, and so I have had the following conversation many times:
OG (other guest): What do you do for a living?
TL (that's me): I'm a professor.
OG: Really? What do you teach?
TL: Introductory Sociology.
OG: Nice!
TL: Social Change.
OG: Interesting!
TL: And Statistics.
OG: Yikes! I'm sorry. That must be horrible for you.
TL: No, I love it.
OG: Really? (at this point, OG usually tilts her head to one side and squints a little). Well, I hated my statistics course when I was in school . . .
Let's get one thing out in the open right away: statistics has a bad reputation. Though I'm not a fan of speaking in odds (more on that much later in the book), I would bet that odds are good you are not thrilled to be sitting in front of a book on statistics. Other emotions likely are in play: boredom, trepidation, fear. Maybe not all of these, but if you're like many students taking a course in statistics, the probability is high that some of these emotions are involved. Any effort I make here to dispel such emotions likely will elicit another set of reactions: skepticism, disbelief, anger. I realize it might take me a while to win you over. But I will do my best. I'd even say that the odds are high that at some point, perhaps not right away, but somewhere down the road, you will, perhaps secretly, start to like statistics.
OK, you may not get to that point. But I do hope to convince you that understanding statistics is completely possible if you have the right combination of guides (your instructor and me). It is not only possible to understand statistics; it is also absolutely essential to being an informed and effective student, citizen, activist, or employee. We live in an age in which information is overwhelmingly everywhere, and a lot of this information is statistical. Legislators measure the success of social policies based on statistics. A philanthropist considering a large donation to a nonprofit organization may ask for evidence of the organization's prior success, and this evidence is often statistical in nature. Start-up companies have made fortunes by developing better statistical models to help people mine the data created daily by people's Internet searches and by consumer behavior (a journalist even went so far as to call these people "the Numerati" [Baker 2008]). Therefore, if you can't speak statistics, or read them, you could very well be left out of all of these loops.
Did I just say, "speak statistics"? Yes, I did. In many ways, for many people, learning statistics is very similar to learning a foreign language. If I started speaking, say, Farsi or Swahili right now, I'd probably lose your interest rather quickly (unless, of course, you're a speaker of these languages, in which case you'd probably perk right up). But do I lose you any less quickly when I say, "Adding the squared age term raises the explained variation by 0.04 (with an F-test significant at p < .01) and causes the slope for the interaction effect to lose its statistical significance"? I'd bet not. Right now, to figure out what this sentence means, you'd need to take it to someone who speaks statistics, and you'd be relying on that person's translation. By the end of this book, you'll be able to figure out on your own what such sentences mean, so among your friends, family, and coworkers, you will likely become the statistical translator. And those statistical tables you see in academic journals or policy briefings? You know, those tables that you just skip over because you have no idea what they're saying? I'll give you the necessary skills to be able to navigate such tables with ease.
This book differs substantially from other introductory statistics books. I think that's a good thing, but, granted, I'm biased. In addition to using a writing style I hope will not bore or confuse you, I get us through the basic statistics relatively quickly. I do this in order to spend much more time than most books do on the statistical techniques that are used most in the real world. In my opinion, many books spend far too many chapters going over statistical techniques that students likely will never see in practice. Then, before they get to the really good stuff, the book ends. This is akin to a movie that has lots of character and plot development, and then, right at the climax, when the school bus filled with orphans is hanging off the cliff, the screen fades to black and the credits roll. This book, in contrast, not only saves those orphans; it finds them all families and buys each child a puppy. In this book, I cover the basics and then get to the good stuff. Although I've done my best to write as clearly as possible, there inevitably will be points where, the first time you read through them, something just doesn't make sense. During such moments, don't give up right away. Sometimes this material takes a few readings before you really understand it. But, if you are persistent, you will get there.
What Data Look Like
Yes, look. The word data is the plural form of the singular word datum. It may sound weird now, but get used to it, because it's grammatically correct. Stratum, medium, datum; strata, media, data. The data are correct. The data are available on the Internet. The data do not lie. Actually, sometimes they do lie, but more on that later in the book. In our trip together, we'll be calculating and interpreting statistics using lots and lots of data, so the first things I want to go over with you are the basic forms that data take, the major sources of data today, and some useful ways to work with data to answer the questions you want to answer.
Most, though not all, quantitative social research data start with surveys. A survey interviewer collects data from a survey respondent. Next, that respondent's answers are translated into numerical codes that the researchers then input into a dataset. The researchers then use the dataset and a statistical program to calculate their statistics. Reducing people's complex behaviors and attitudes to numbers is not a perfect process. Interesting details sometimes get lost in translation. I'll be the first to defend those who use more qualitative techniques to study the social world. However, because this is a book on statistics, we'll be working with the more quan...