Big data raise major research possibilities for political communication scholars who are interested in how citizens, elites, and journalists interact. With the availability of social media data, academics can observe, on a large scale, how people talk about politics. The opportunity to study political discussions is also available to media organizations and political elites—examining how they make use of big data represents another fruitful scholarly trajectory. The scholars involved in Digital Discussions represent forward thinkers who aim to inform the study of political communication by analyzing the behavior of and messages left by citizens, elites, and journalists in digital spaces. By using a variety of methodological approaches and bringing together diverse theoretical perspectives, this group sheds light on how big data can inform political communication research. It is critical reading for those studying and working in communication studies with a focus on big data.

Digital Discussions, by Natalie Jomini Stroud and Shannon McGregor.

Information

Topic: Politics & International Relations

Subtopic: Digital Media

Big Data in Political Communication

Natalie Jomini Stroud and Shannon C. McGregor

“Big data” has entered the academic lexicon as a new buzzword. Although there are no clear guidelines for what dataset size qualifies as “big,” there is widespread recognition that the availability of massive digital datasets provides a novel opportunity for scholars. By using traces of data left behind by people as they navigate their digital environments—the sites they peruse, the social media posts they make, the way they interact with sites—scholars can analyze people’s expressed attitudes and behaviors. In this volume, we focus on what political communication scholars can learn by studying digital trace data—the transmission of information and opinions in public, digital spaces. The messages left in comment sections, posted on social media sites, and tweeted by bloggers provide the raw data for new understandings of how citizens, elites, and journalists make sense of the political world. This book aims to examine the theoretical and methodological implications of big data, and to provide new empirical research that makes use of big data.

There are intriguing possibilities from working with these data. Unlike traditional survey and experimental datasets, big data (at least as conceptualized here) are not created under contrived circumstances. And, unlike in-depth interviews or ethnographies, big data are available on a much larger scale. Of course, the datasets have limitations. Big data come from self-selected participants—only those who have a Twitter account and want to tweet about politics, for instance, will be included in a political Twitter dataset. This is only a substantial weakness if one is looking to make inferences about the broader population. Further, the data are constrained by technology. Algorithmic changes, for instance, can affect the data, as can the availability of digital archives.

Nonetheless, big data present major research possibilities for political communication scholars who are interested in how citizens, elites, and journalists interact. Political discussions, for instance, have long been of interest to communication scholars (e.g. Katz & Lazarsfeld, 1955; Mutz, 2006; Price & Cappella, 2002). With the availability of social media data, academics can observe, on a large scale, how people talk about and interact with politics. The opportunity to study political discussions is also available to media organizations and political elites: examining how they make use of big data represents another fruitful scholarly trajectory. The scholars involved in this book represent forward thinkers who aim to inform the study of political communication by analyzing the behavior of and messages left by citizens, elites, and journalists in digital spaces. Using a variety of methodological approaches and bringing diverse theoretical perspectives, this group is poised to shed light on how big data can inform political communication scholarship.

Big Data and Related Terms

Electing to use the term “big data” to describe this book was not an easy choice. It is fraught with complication because there is no definition of what makes data “big.” The best definitions offered by contributors to this volume sidestep this issue. Bode, for instance, defines big data as “information that is (1) created digitally and (2) collected in large numbers to facilitate analysis.” Guo identifies big data as “any large-scaled numerical, textual, visual, or geographic data, which can be analyzed to reveal patterns and trends of human behavior.” She goes further, saying that the size and complexity of big data are beyond traditional tools for gathering and analyzing data.

We tend to agree that there is no bright line distinguishing big data from medium, or small, data. Nonetheless, the term is useful because it conveys the advanced tools required for gathering and analyzing this form of data. Some traditional statistical programs are unable to accommodate datasets of this size. Further, these datasets tax traditional computers’ storage and processing capacities. As technology improves, however, this definition of big data seems less relevant (boyd & Crawford, 2012).

We considered other terms that also seem to capture the phenomenon of interest. Most of the authors in this book are interested in a particular type of big data—digital trace data. In this volume, Jungherr, drawing on work from Howison, Wiggins, and Crowston (2011), defines digital trace data as “data documenting the interactions of users with digital devices or services.” These data are, quite literally, the traces that people leave behind when they have engaged in digital spaces. This could be browser history, comments, or social media posts, and the list could continue indefinitely. Of course, there are other types of big data beyond digital trace data—you could think about big datasets with relevance to medicine or engineering. For communication scholars, however, digital trace datasets are often of primary interest.

We also considered using the term “computational social science,” which captures a method frequently employed by those using big data. As Shah, Cappella, and Neuman (2015) explain, computational social science involves:

(1) the use of large, complex datasets, often—though not always—measured in terabytes or petabytes; (2) the frequent involvement of ‘naturally occurring’ social and digital media sources and other electronic databases; (3) the use of computational or algorithmic solutions to generate patterns and inferences from these data; and (4) the applicability to social theory in a variety of domains from the study of mass opinion to public health, from examinations of political events to social movements (p. 7).

This form of analysis is at the intersection of computer and social science, and can require collaborations with computer scientists, as Guo notes in her chapter.

Acknowledging that the work here involves both digital trace data and computational social science, we nonetheless opted for the term “big data.” We did so for several reasons. First, “big data” has gained traction in academic communities, and is now widely discussed in popular and scholarly contexts. Second, we wanted to focus on the data in this volume, rather than the method. The term data, we felt, lent itself to more diverse analyses, such as Baldwin-Philippi’s qualitative work on how campaigns are using “big data.” So, with an acknowledgment of the complexities of the term, we adopted it as a defining feature of the chapters that follow.

Big Data and Political Communication

Political communication scholars aim to look at how elites, the media, and the public interact around political topics. Big data allow many opportunities to do precisely this work, as all three entities leave volumes of trace data. Research to date has used big data approaches to examine how political elites communicate (McGregor, Lawrence, & Cardona, 2017), how agenda setting occurs across traditional and social media (Neuman, Guggenheim, Jang, & Bae, 2014), and how norms regarding incivility and partisanship are rewarded and punished in news comment sections (Muddiman & Stroud, 2017). Studies like these demonstrate the utility of this approach for answering questions of theoretical interest to political communication researchers.

Methodologically, political communication scholars should be especially well poised to make contributions to the study of big data. Political content is widely distributed on such platforms as Twitter, and political news garners extensive comments on news sites (Coe, Kenski, & Rains, 2014). Communication scholars have been pioneers in the analysis of texts and the methods of content analysis (e.g. Krippendorff, 2012), and political communication scholars, in particular, have been developing computerized content-analysis programs that can be used to analyze large corpora of text (e.g. Hart, 1985; Young & Soroka, 2012). The availability of content and methods relevant to political communication makes this volume particularly apropos.
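
Dictionary-based scoring is one common form of computerized content analysis. The sketch below is a minimal illustration under stated assumptions: the tiny word lists, the `tone_score` function, and the sample comments are hypothetical, and none of them is taken from a published dictionary or from the programs cited above.

```python
import re

# Hypothetical toy word lists for illustration only; real dictionaries are
# far larger and are validated against human coding.
POSITIVE = {"good", "great", "fair", "honest"}
NEGATIVE = {"bad", "corrupt", "unfair", "dishonest"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tone_score(text):
    """Return (positive matches - negative matches) / total tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)


# Made-up example comments, not drawn from any dataset.
comments = [
    "The debate was fair and the moderator did a great job.",
    "Another corrupt and dishonest speech.",
]
for comment in comments:
    print(f"{tone_score(comment):+.2f}  {comment}")
```

A score above zero means positive words outnumber negative ones in a text. Simple word matching of this kind misses negation and sarcasm, which is one reason such tools are usually checked against human coders before use.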

With that said, the explosion of research related to big data means that this volume will not be comprehensive. Several aspects of big data are not covered in these chapters, but can be found in other places, such as the analysis of networks and the use of algorithms and recommender systems (e.g. Beam, 2014; Colleoni, Rozza, & Arvidsson, 2014; Flaxman, Goel, & Rao, 2016). We also focus on U.S.-based big data analyses, although the methodological issues raised and the theoretical lessons drawn from the chapters will have relevance to political communication scholars regardless of their country of residence. Finally, there has been an overarching use of big data to analyze textual content, and more development is needed to bring this approach to images and video. This gap in our technical abilities is apparent in this volume as well.

Organization of the Book

The book is organized into three sections: the first examines the benefits and drawbacks of political communication researchers’ use of big data; the second evaluates the reliability and validity of our uses of these datasets; and the third demonstrates the ways in which we can gain new insights by using big data.

The first section of this book offers competing takes on the benefits and drawbacks of the use of big data within the social sciences. While Bode is optimistic, Jungherr is less so. Putting them into these camps is, of course, an oversimplification of their positions, but their chapters do have decidedly different tones which serve to provide an overview of the complexities of using big data. Bode offers a hopeful take on the effects of big data on academic scholarship. She sees big data as being able to answer new communication questions, to push us to consider our methodological choices more deeply, and to offer stronger justifications for our work. She also believes that big data findings are more easily understandable, which represents an opportunity to better engage students and the public.

Jungherr, taking a different tack, is critical of contemporary scholarship that uses digital trace data. He identifies two fallacies that frequently crop up. First, people treat the data as though they have every possible data point (the n=all fallacy). But, often, scholars do not have complete data. Platforms may not store all the data, or may have service agreements that prevent scholars from accessing all available data. Second, people see online data as acting as a mirror of some social phenomenon, but it may not be. Twitter data may simply be statements that people were willing to make on Twitter and nothing more; perhaps they do not capture underlying social maladies. Jungherr urges researchers to think much more carefully about what the data actually capture and to subject digital trace data to rigorous validity testing. When this line of research was in its infancy, studies using digital trace data were accepted merely on the grounds that they were methodologically innovative. As we enter an era of more normalized use of these datasets, Jungherr makes a compelling case for better conceptualization.

The second section of the book expands upon questions about what big data can tell political communication researchers. The three chapters push researchers to think carefully about the validity of the inferences they can draw from big data. Freelon discusses the technical and social aspects of social media platforms and how they can constrain our ability to draw valid inferences. Guo looks at analytic strategies for dealing with big data and how they can be more or less valid. Pasek and Dailey test how well Twitter sentiment can predict candidate preferences. Each of the three chapters models the care researchers should take in thinking through the validity of any inferences drawn from big data.

Freelon takes a close look at the construct validity of social media trace data. He recommends that researchers consider four factors: the technical design and affordances of a social media platform; the terms of service that govern how people act on a social media platform; the context of how people use social media; and the potential for misrepresentation. The chapter, then, takes a critical look at how people can disclose their gender, race/ethnicity, or location based on these four factors. Facebook, for instance, requires users to indicate their gender when they sign up for an account. Other methods of determining gender, such as inferring it from someone’s name, have questionable validity on some platforms and among some sub-populations. As Freelon aptly notes, digital trace data were not created for the purpose of research and, because of this, researchers must carefully consider the limitations of any inferences made.

Guo offers several cautionary tales about how we analyze big data. She points out that researchers must make numerous choices when deciding how to analyze the data, and each choice can affect the conclusions reached. By sharing the results of several reliability and validity tests, Guo illustrates the extent to which human decision-making can change the results of big data analyses. Although she shows that changes do occur, the examples she shares do not seem to result in dramatic overturns of the reached conclusions. Productively, Guo offers recommendations to researchers, urging them to work with computer scientists and to ensure that they test the results of any “out-of-the-box” big data analytics packages.

Pasek and Dailey undertake precisely the sort of analysis that Jungherr recommends, seeking to analyze the correspondence between Twitter data and survey data regarding electoral preferences and candidate favorability. They find little evidence that sentiment expressed toward the candidates on Twitter corresponds with survey measures. This is true regardless of whether they look at (a) candidate favorability or electoral preferences; (b) changes in sentiment or absolute levels of sentiment; and (c) survey data corresponding to demographic attributes of Twitter users. There is some suggestion that the Twitter data more closely conformed to survey data about candidate preferences later in the 2008 presidential campaign, but the authors are rightly cautious about how far they would push this conclusion. Pasek and Dailey’s chapter suggests that Twitter data are what they are (public expressions among a distinct group) as opposed to a proxy for something else.
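
The levels-versus-changes comparison in (b) can be made concrete with a minimal sketch. It assumes two aligned daily series, one of average Twitter sentiment toward a candidate and one of survey favorability. The numbers, the variable names, and the choice of a Pearson correlation are illustrative assumptions, not the authors' data or procedure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def first_differences(series):
    """Day-over-day changes: series[t] - series[t - 1]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical, made-up values for illustration only (not real data).
twitter_sentiment = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09]
survey_favorability = [48.0, 47.5, 48.2, 47.9, 48.4, 48.1]

# Absolute levels: do days with warmer tweets also show higher favorability?
levels_r = pearson(twitter_sentiment, survey_favorability)

# Changes: do day-over-day shifts in the two series move together?
changes_r = pearson(
    first_differences(twitter_sentiment),
    first_differences(survey_favorability),
)
print(f"levels r = {levels_r:.2f}, changes r = {changes_r:.2f}")
```

The two correlations answer different questions, so a series can track survey levels without tracking survey changes, or the reverse. That is why the chapter's check of both matters.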

The third, and final, section of this book provides examples of big data analysis with relevance to political communication scholars. These demonstrations illustrate how big data can be used to answer important questions for political communication scholars and offer both methodological and theoretical insights. The four chapters in this section each examine different sources of big data, whether Yik Yak, comments from The New York Times, campaign uses of big data, or tweets. Each demonstrates the new ways in which scholars must justify the methods that they use to analyze datasets of this size; using the same techniques that communication scholars typically employ when analyzing survey or experimental data is not always possible. These chapters are also particularly important because they analyze the intersections among media, elites, and the public in their communication practices regarding politics.

In their chapter, Vargo and Hopp analyze the use of Yik Yak among college students. Politics, they find, comes up infrequently. Yet, major political events, such as the State of the Union address, yield an uptick in political posts on the platform. Interestingly, political comments on Yik Yak are particularly unlikely at large universities, universities with a higher percentage of large classes, and universities with more fraternities and sororities—perhaps the heterogeneity of these contexts depresses political talk, but more research is needed.

Muddiman looks at comments left on The New York Times website to understand how other people and ...

Table of contents

Cover
Half Title
Series Information
Title Page
Copyright Page
Contents
Notes on Contributors
1 Big Data in Political Communication
2 Normalizing Digital Trace Data
3 Everything Old Is New Again: Big Data and Methodological Transparency
4 Ignorance or Uncertainty: How the “Black Box” Dilemma in Big Data Research May “Misinform” Political Communication
5 Why Don’t Tweets Consistently Track Elections?: Lessons from Linking Twitter and Survey Data Streams
6 Inferring Individual-Level Characteristics from Digital Trace Data: Issues and Recommendations
7 The Technical, the Personal, and the Political: Understanding Journalists and News Users’ Engagement in The New York Times Comments Section
8 Is Yik Yak a Platform for Political Communication?: Exploring College Students’ Communication on an Emergent Social Media Platform
9 Data-Driven Campaigning
10 “Little Marco,” “Lyin’ Ted,” “Crooked Hillary,” and the “Biased” Media: How Trump Used Twitter to Attack and Organize
Index