Journalism in an Era of Big Data

Cases, concepts, and critiques

  1. 150 pages
  2. English

About this book

Big data is marked by staggering growth in the collection and analysis of digital trace information regarding human and natural activity, bound up in and enabled by the rise of persistent connectivity, networked communication, smart machines, and the internet of things. In addition to their impact on technology and society, these developments have particular significance for the media industry and for journalism as a practice and a profession. These data-centric phenomena are, by some accounts, poised to greatly influence, if not transform, some of the most fundamental aspects of news and its production and distribution by humans and machines.

What such changes actually mean for news, democracy, and public life, however, is far from certain. As such, there is a need for scholarly scrutiny and critique of this trend, and this volume thus explores a range of phenomena—from the use of algorithms in the newsroom, to the emergence of automated news stories—at the intersection between journalism and the social, computer, and information sciences. What are the implications of such developments for journalism's professional norms, routines, and ethics? For its organizations, institutions, and economics? For its authority and expertise? And for the epistemology that underwrites journalism's role as knowledge-producer and sense-maker in society? Altogether, this book offers a first step in understanding what big data means for journalism. This book was originally published as a special issue of Digital Journalism.

Journalism in an Era of Big Data is by Seth Lewis and is listed under Social Sciences & Media Studies.

CLARIFYING JOURNALISM’S QUANTITATIVE TURN

A typology for evaluating data journalism, computational journalism, and computer-assisted reporting

Mark Coddington
As quantitative forms have become more prevalent in professional journalism, it has become increasingly important to distinguish between them and examine their roles in contemporary journalistic practice. This study defines and compares three quantitative forms of journalism—computer-assisted reporting, data journalism, and computational journalism—examining the points of overlap and divergence among their journalistic values and practices. After setting the three forms against the cultural backdrop of the convergence between the open-source movement and professional journalistic norms, the study introduces a four-part typology to evaluate their epistemological and professional dimensions. In it, the three forms are classified according to their orientation toward professional expertise or networked participation, transparency or opacity, big data or targeted sampling, and a vision of an active or passive public. These three quantitative journalistic forms are ultimately characterized as related but distinct approaches to integrating the values of open-source culture and social science with those of professional journalism, each with its own flaws but also its own distinct contribution to democratically robust journalistic practice.
Introduction
Professional journalism has historically been built around two elements—textual and visual. Numbers have long had a role in journalism as well, but American journalists have consistently downplayed their importance in making up their professional skillset, leading to a notorious difficulty in presenting numerical data accurately and responsibly (Maier 2002). A notable exception has been the professional subfield of computer-assisted reporting (CAR), which has focused on journalistically analyzing quantitative data for at least 40 years. Over the past several years, this data-driven strain of journalism has become more prominent within the profession as it has converged with the increasingly ubiquitous digitization of information both personal and public. As more information has become ones and zeroes at its most elemental level, more journalism has involved gathering, analyzing, and computing that information as quantitative data as well. Journalism appears to be taking, as Petre (2013) puts it, “a quantitative turn.”
This wave of quantitatively oriented journalism has deep democratic roots; various forms of it are tied to open government advocacy (Parasie and Dagiral 2013) and the public-service tradition of investigative journalism (Cox 2000). It has great potential to broaden journalism’s ability to make democratic institutions more responsive and legible to the public, but even within this sub-area of journalism, views of the public and the journalistic process are broadly disparate. Where the CAR of the 1990s was generally a single, unified concept for both professionals and scholars, the area has splintered into a set of ambiguously related practices variously termed by researchers computational journalism (Flew et al. 2012; Karlsen and Stavelin 2014), programmer-journalism (Parasie and Dagiral 2013), open-source journalism (Lewis and Usher 2013), or data journalism (Appelgren and Nygren 2014; Fink and Schudson 2014; Gynnild 2014), among others.
The journalists engaged in these practices seem particularly unconcerned with classifying their work vis-à-vis professional journalism, a sentiment most famously expressed in a short blog post by developer Adrian Holovaty (2009) that answered the question “Is data journalism?” with “Who cares?” This has resulted in several of the aforementioned terms being thrown together within professional discourse as synonyms. For researchers, however, these definitional questions are fundamental to analyzing these practices as sites of professional and cultural meaning, without which it is difficult for a coherent body of scholarship to be built. Indeed, the nascent scholarship in the area is often characterized by initial attempts to define these forms of journalism, each of which has largely been well-conceived and conceptually useful. But taken collectively, they have produced a cacophony of overlapping and indistinct definitions that forms a shaky foundation for deeper research into these practices. As these data-driven forms of journalism move closer to the center of professional journalistic practice, it is imperative that scholars do not treat them as simple synonyms but think carefully about the significant differences between the forms they take and their implications for changing journalistic practice as a whole.
Building on the work of Parasie and Dagiral (2013), Gynnild (2014), and Stavelin (2014) to delineate differences between these practices, this study is an attempt to develop a typology for analyzing forms within this quantitative area of journalism. It examines three professional practices—CAR, data journalism, and computational journalism—along four professional and epistemological dimensions. The analysis will begin with a brief discussion of the cultural background against which these practices are operating, then proceed with an introduction to the three practices, and finally an evaluation of each practice against each of the four dimensions.
Open-source Culture
These new forms of journalistic practice are emerging within an increasing interaction between programmers and journalists, as more programmers have begun to move into professional newsrooms and professional journalists have become increasingly drawn to programming’s technical capabilities and cultural norms, which have been heavily influenced by the open-source movement.
The term “open source” as a technological principle was born in the late 1990s as a more palatable and widely accessible offshoot of the free software movement. Both movements focused on the ability to freely access, modify, and redistribute software as a manifestation of the universal right to access to information and knowledge (Coleman 2013; Kelty 2008). While open-source is intrinsically oriented not toward journalism but toward software, Lewis and Usher (2013) explained its application to journalism through four principles: transparency, iteration, tinkering, and participation. Each of those principles arises from the process of collaboratively building and sharing software, the practice at the core of the open-source software movement. And as Lewis and Usher explained, each is gradually becoming more prevalent within professional journalistic culture as a small subset of more computing-oriented journalists are drawn to the open-source ideals of creativity, experimentation, and liberation of information. In this way, the principles of open source have been an important common ground for bringing together “hacks” (journalists) and “hackers” (technologists).
Data-driven Journalism Practices
The three journalistic practices examined here are not mutually exclusive. Since they have very similar professional and epistemological roots, they will inevitably overlap, in some cases significantly. Actual cases of these practices will often display characteristics of more than one of these categories, as well as the marks of open-source principles. Key institutions have been involved in the perpetuation of more than one of these practices; for example, the National Institute for Computer-Assisted Reporting (NICAR) was the central organization in computer-assisted reporting during the 1990s and is now a central organization in connecting and training those who practice data journalism (Fink and Anderson 2014). In addition, many of the journalists who engage in these practices themselves tend to emphasize their continuity; data journalists generally characterize themselves as following in the same tradition as CAR. But there are significant differences between these forms of practice, and the following is an attempt to pull them apart and clarify them conceptually. This paper relies heavily on research into these practices within the United States and Scandinavia, since those have been the most thoroughly studied geographical settings for this work. It thus broadly describes the forms as they are generally practiced in those environments, though national and local variations certainly exist, both within these areas and outside them.
Computer-assisted Reporting
Though the use of computers in journalism dates back to the 1950s (Cox 2000), the de facto godfather of CAR is Philip Meyer, who outlined a new form called precision journalism in a book of the same name (Meyer 1973). Precision journalism was modeled after social science, using empirical methods (particularly surveys and content analysis) and statistical analysis to achieve more definitive answers to journalistic questions. It was not until the late 1980s and early 1990s that precision journalism, since recast as CAR, began to make significant inroads into newsrooms, led by several high-profile, Pulitzer Prize-winning stories that became an important vehicle for professional validation (Houston 1996).
CAR became closely tied to investigative reporting, often being seen as an auxiliary tool to aid in long-term, public-affairs journalism projects (Cox 2000; Gynnild 2014; Parasie and Dagiral 2013). Though CAR journalists often fought against the perception that their practices were only for time-consuming investigative story packages, an association that may ultimately have limited CAR’s adoption within professional journalism (Gynnild 2014), they also encouraged it at times, characterizing it as, in the words of one CAR pioneer, “the new investigative journalism” (Jaspin 1993). The term CAR has fallen out of favor since the early 2000s as its technology has broadly diffused throughout newsrooms; Meyer himself called in 1999 for the moniker to be retired, describing it as an “embarrassing reminder that we are entering the 21st century as the only profession in which computer users feel the need to call attention to ourselves” (Meyer 1999, 4). Meyer’s call ultimately went unheeded, as CAR continues to be practiced in journalism, though it appears to be invoked more often as a historical mode of quantitative journalism than as a contemporary practice. A comparison between CAR and data journalism or computational journalism, as this paper undertakes, is thus a characterization more of change in practice over time than a comparison of contemporaneous practices.
While CAR had its roots in social science-based statistical methods, it came to embody two sets of practices: the data gathering and statistical analysis descended from Meyer’s precision journalism, and more general computer-based information-gathering skills such as online and archival research and even email interviews (Miller 1998; Yarnall et al. 2008). The more general information-gathering skills have become so elemental a part of journalistic work that they can no longer be considered, in Powers’ (2011) terms, “technologically specific work,” though the statistical- and data-oriented forms of CAR remain such because of their relative lack of diffusion. This is the form of CAR that this paper refers to with the term, and the one that serves as the foundation for the modern approaches of data journalism and computational journalism (Gynnild 2014).
Data Journalism
Sometimes referred to as data-driven journalism, data journalism seems to have taken up the mantle of CAR in contemporary professional journalism. Though it is less preferred by scholars, data journalism appears to be the term of choice in the news industry for journalism based on data analysis and the presentation of such analysis (though note the ambivalence toward the term found by Appelgren and Nygren 2014). Professional definitions have tended to be broad, characterizing data journalism as essentially any activity that deals with data in conjunction with journalistic reporting and editing or toward journalistic ends, as in Stray’s (2011) definition of data journalism as “obtaining, reporting on, curating and publishing data in the public interest.” Several others have defined data journalism in terms of its convergence between several disparate fields and practices, characterizing it as a hybrid form that encompasses statistical analysis, computer science, visualization and web design, and reporting (Bell 2012; Bradshaw 2010; Thibodeaux 2011). Data journalism has also been closely associated with the use and proliferation of open data and open-source tools to analyze and display that data (Gynnild 2014), though open data is not necessarily or exclusively a part of its domain of practice (Parasie and Dagiral 2013).
Data journalism has been ascendant since the late 2000s, before which time most data analysis within newsrooms had either been in the form of CAR or in news organizations that dealt largely in specialist financial information (Bell 2012). Though it is not a central element of professional journalistic work, it has made significant inroads into the news industry, with heavy demand throughout the profession despite a relatively small number of dedicated data journalists and relative rarity outside of the most resource-rich news organizations (Fink and Anderson 2014; Howard 2014). Young and Hermida (2014) argue that a new professional class of data journalists is beginning to form, though they have often appropriated computational methods to fit dominant professional practices. One particularly celebrated example of data journalism was The Guardian’s 2009–10 project reporting on the expense claims of Members of the United Kingdom’s Parliament, in which the newspaper published 460,000 pages of expense reports online and asked its readers to sort through them and flag questionable claims. The project resulted in investigative reports and data visualizations that led many Members of Parliament to re-examine and repay some of their claims. This project exemplifies the data journalism model in its focus on opening data to the public and its use of public input to drive data analysis, visualization, and reporting (Gray, Bounegru, and Chambers 2012).
While data journalism is often used within the context...

Table of contents

  1. Cover
  2. Half Title
  3. Series
  4. Title Page
  5. Copyright
  6. Contents
  7. Citation information
  8. Notes on Contributors
  9. Introduction – Journalism in an Era of Big Data: Cases, concepts, and critiques
  10. 1. Clarifying Journalism’s Quantitative Turn: A typology for evaluating data journalism, computational journalism, and computer-assisted reporting
  11. 2. Between the Unique and the Pattern: Historical tensions in our understanding of quantitative journalism
  12. 3. Data-driven Revelation? Epistemological tensions in investigative journalism in the age of “big data”
  13. 4. From Mr. and Mrs. Outlier to Central Tendencies: Computational journalism and crime reporting at the Los Angeles Times
  14. 5. Algorithmic Accountability: Journalistic investigation of computational power structures
  15. 6. The Robotic Reporter: Automated journalism and the redefinition of labor, compositional forms, and journalistic authority
  16. 7. Waiting for Data Journalism: A qualitative assessment of the anecdotal take-up of data journalism in French-speaking Belgium
  17. 8. Big Data and Journalism: Epistemology, expertise, economics, and ethics
  18. Index