CLARIFYING JOURNALISM’S QUANTITATIVE TURN
A typology for evaluating data journalism, computational journalism, and computer-assisted reporting
Mark Coddington
As quantitative forms have become more prevalent in professional journalism, it has become increasingly important to distinguish between them and examine their roles in contemporary journalistic practice. This study defines and compares three quantitative forms of journalism (computer-assisted reporting, data journalism, and computational journalism), examining the points of overlap and divergence among their journalistic values and practices. After setting the three forms against the cultural backdrop of the convergence between the open-source movement and professional journalistic norms, the study introduces a four-part typology to evaluate their epistemological and professional dimensions. In it, the three forms are classified according to their orientation toward professional expertise or networked participation, transparency or opacity, big data or targeted sampling, and a vision of an active or passive public. These three quantitative journalistic forms are ultimately characterized as related but distinct approaches to integrating the values of open-source culture and social science with those of professional journalism, each with its own flaws but also its own distinct contribution to democratically robust journalistic practice.
Introduction
Professional journalism has historically been built around two elements: textual and visual. Numbers have long had a role in journalism as well, but American journalists have consistently downplayed their importance in making up their professional skillset, leading to a notorious difficulty in presenting numerical data accurately and responsibly (Maier 2002). A notable exception has been the professional subfield of computer-assisted reporting (CAR), which has focused on journalistically analyzing quantitative data for at least 40 years. Over the past several years, this data-driven strain of journalism has become more prominent within the profession as it has converged with the increasingly ubiquitous digitization of information both personal and public. As more information has become ones and zeroes at its most elemental level, more journalism has involved gathering, analyzing, and computing that information as quantitative data as well. Journalism appears to be taking, as Petre (2013) puts it, “a quantitative turn.”
This wave of quantitatively oriented journalism has deep democratic roots; various forms of it are tied to open government advocacy (Parasie and Dagiral 2013) and the public-service tradition of investigative journalism (Cox 2000). It has great potential to broaden journalism’s ability to make democratic institutions more responsive and legible to the public, but even within this sub-area of journalism, views of the public and the journalistic process are broadly disparate. Where the CAR of the 1990s was generally a single, unified concept for both professionals and scholars, the area has splintered into a set of ambiguously related practices variously termed by researchers computational journalism (Flew et al. 2012; Karlsen and Stavelin 2014), programmer-journalism (Parasie and Dagiral 2013), open-source journalism (Lewis and Usher 2013), or data journalism (Appelgren and Nygren 2014; Fink and Schudson 2014; Gynnild 2014), among others.
The journalists engaged in these practices seem particularly unconcerned with classifying their work vis-à-vis professional journalism, a sentiment most famously expressed in a short blog post by developer Adrian Holovaty (2009) that answered the question “Is data journalism?” with “Who cares?” This has resulted in several of the aforementioned terms being thrown together within professional discourse as synonyms. For researchers, however, these definitional questions are fundamental to analyzing these practices as sites of professional and cultural meaning, without which it is difficult for a coherent body of scholarship to be built. Indeed, the nascent scholarship in the area is often characterized by initial attempts to define these forms of journalism, each of which has largely been well-conceived and conceptually useful. But taken collectively, they have produced a cacophony of overlapping and indistinct definitions that forms a shaky foundation for deeper research into these practices. As these data-driven forms of journalism move closer to the center of professional journalistic practice, it is imperative that scholars do not treat them as simple synonyms but think carefully about the significant differences between the forms they take and their implications for changing journalistic practice as a whole.
Building on the work of Parasie and Dagiral (2013), Gynnild (2014), and Stavelin (2014) to delineate differences between these practices, this study is an attempt to develop a typology for analyzing forms within this quantitative area of journalism. It examines three professional practices (CAR, data journalism, and computational journalism) along four professional and epistemological dimensions. The analysis will begin with a brief discussion of the cultural background against which these practices are operating, then proceed with an introduction to the three practices, and finally an evaluation of each practice against each of the four dimensions.
Open-source Culture
These new forms of journalistic practice are emerging within an increasing interaction between programmers and journalists, as more programmers have begun to move into professional newsrooms and professional journalists have become increasingly drawn to programming’s technical capabilities and cultural norms, which have been heavily influenced by the open-source movement.
The term “open source” as a technological principle was born in the late 1990s as a more palatable and widely accessible offshoot of the free software movement. Both movements focused on the ability to freely access, modify, and redistribute software as a manifestation of the universal right to access to information and knowledge (Coleman 2013; Kelty 2008). While open-source is intrinsically oriented not toward journalism but toward software, Lewis and Usher (2013) explained its application to journalism through four principles: transparency, iteration, tinkering, and participation. Each of those principles arises from the process of collaboratively building and sharing software, the practice at the core of the open-source software movement. And as Lewis and Usher explained, each is gradually becoming more prevalent within professional journalistic culture as a small subset of more computing-oriented journalists are drawn to the open-source ideals of creativity, experimentation, and liberation of information. In this way, the principles of open source have been an important common ground for bringing together “hacks” (journalists) and “hackers” (technologists).
Data-driven Journalism Practices
The three journalistic practices examined here are not mutually exclusive. Since they have very similar professional and epistemological roots, they will inevitably overlap, in some cases significantly. Actual cases of these practices will often display characteristics of more than one of these categories, as well as the marks of open-source principles. Key institutions have been involved in the perpetuation of more than one of these practices; for example, the National Institute for Computer-Assisted Reporting (NICAR) was the central organization in computer-assisted reporting during the 1990s and is now a central organization in connecting and training those who practice data journalism (Fink and Anderson 2014). In addition, many of the journalists who engage in these practices themselves tend to emphasize their continuity; data journalists generally characterize themselves as following in the same tradition as CAR. But there are significant differences between these forms of practice, and the following is an attempt to pull them apart and clarify them conceptually. This paper relies heavily on research into these practices within the United States and Scandinavia, since those have been the most thoroughly studied geographical settings for this work. It thus broadly describes the forms as they are generally practiced in those environments, though national and local variations certainly exist, both within these areas and outside them.
Computer-assisted Reporting
Though the use of computers in journalism dates back to the 1950s (Cox 2000), the de facto godfather of CAR is Philip Meyer, who outlined a new form called precision journalism in a book of the same name (Meyer 1973). Precision journalism was modeled after social science, using empirical methods (particularly surveys and content analysis) and statistical analysis to achieve more definitive answers to journalistic questions. It was not until the late 1980s and early 1990s that precision journalism, since recast as CAR, began to make significant inroads into newsrooms, led by several high-profile, Pulitzer Prize-winning stories that became an important vehicle for professional validation (Houston 1996).
CAR became closely tied to investigative reporting, often being seen as an auxiliary tool to aid in long-term, public-affairs journalism projects (Cox 2000; Gynnild 2014; Parasie and Dagiral 2013). Though CAR journalists often fought against the perception that their practices were only for time-consuming investigative story packages, an association that may ultimately have limited CAR’s adoption within professional journalism (Gynnild 2014), they also encouraged it at times, characterizing it as, in the words of one CAR pioneer, “the new investigative journalism” (Jaspin 1993). The term CAR has fallen out of favor since the early 2000s as its technology has broadly diffused throughout newsrooms; Meyer himself called in 1999 for the moniker to be retired, describing it as an “embarrassing reminder that we are entering the 21st century as the only profession in which computer users feel the need to call attention to ourselves” (Meyer 1999, 4). Meyer’s call ultimately went unheeded, as CAR continues to be practiced in journalism, though it appears to be invoked more often as a historical mode of quantitative journalism than a contemporary practice. A comparison between CAR and data journalism or computational journalism, as this paper undertakes, is thus a characterization more of change in practice over time than a comparison of contemporaneous practices.
While CAR had its roots in social science-based statistical methods, it came to embody two sets of practices: the data gathering and statistical analysis descended from Meyer’s precision journalism, and more general computer-based information-gathering skills such as online and archival research and even email interviews (Miller 1998; Yarnall et al. 2008). The more general information-gathering skills have become so elemental a part of journalistic work that they can no longer be considered, in Powers’ (2011) terms, “technologically specific work,” though the statistical- and data-oriented forms of CAR remain such because of their relative lack of diffusion. This is the form of CAR that this paper refers to with the term, and the one that serves as the foundation for the modern approaches of data journalism and computational journalism (Gynnild 2014).
Data Journalism
Sometimes referred to as data-driven journalism, data journalism seems to have taken up the mantle of CAR in contemporary professional journalism. Though it is less preferred by scholars, data journalism appears to be the term of choice in the news industry for journalism based on data analysis and the presentation of such analysis (though note the ambivalence toward the term found by Appelgren and Nygren 2014). Professional definitions have tended to be broad, characterizing data journalism as essentially any activity that deals with data in conjunction with journalistic reporting and editing or toward journalistic ends, as in Stray’s (2011) definition of data journalism as “obtaining, reporting on, curating and publishing data in the public interest.” Several others have defined data journalism in terms of its convergence between several disparate fields and practices, characterizing it as a hybrid form that encompasses statistical analysis, computer science, visualization and web design, and reporting (Bell 2012; Bradshaw 2010; Thibodeaux 2011). Data journalism has also been closely associated with the use and proliferation of open data and open-source tools to analyze and display that data (Gynnild 2014), though open data is not necessarily or exclusively a part of its domain of practice (Parasie and Dagiral 2013).
Data journalism has been ascendant since the late 2000s, before which time most data analysis within newsrooms had either been in the form of CAR or in news organizations that dealt largely in specialist financial information (Bell 2012). Though it is not a central element of professional journalistic work, it has made significant inroads into the news industry, with heavy demand throughout the profession despite a relatively small number of dedicated data journalists and relative rarity outside of the most resource-rich news organizations (Fink and Anderson 2014; Howard 2014). Young and Hermida (2014) argue that a new professional class of data journalists is beginning to form, though they have often appropriated computational methods to fit dominant professional practices. One particularly celebrated example of data journalism was The Guardian’s 2009–10 project reporting on the expense claims of Members of the United Kingdom’s Parliament, in which the newspaper published 460,000 pages of expense reports online and asked their readers to sort through them and flag questionable claims. The project resulted in investigative reports and data visualizations and led many Members of Parliament to re-examine and re-pay some of their claims. This project exemplifies the data journalism model in its focus on opening data to the public and its use of public input to drive data analysis, visualization, and reporting (Gray, Bounegru, and Chambers 2012).
While data journalism is often used within the context...