Introduction
Crowdsourcing is the process of leveraging public participation in, or contributions to, projects and activities. It has become a familiar term, and a concept that has gained increasing attention in many spheres over the last decade. Government, industry, and commercial enterprises are developing crowdsourcing practices as a means to engage their audiences and readerships, to improve and enrich their own data assets and services, and to address supposed disconnects between the public and professional sectors (Boudreau & Lakhani, 2013). At a time when the Web is simultaneously transforming the way in which people collaborate and communicate, and merging the spaces that the academic and nonacademic communities inhabit, it has never been more important to consider the role that public communities, connected or otherwise, have come to play in academic humanities research. Public involvement in the humanities can take many forms: transcribing handwritten text into digital form; tagging photographs to facilitate discovery and preservation; entering structured or semi-structured data; commenting on content or participating in discussions; or recording one’s own experiences and memories in the form of oral history. Yet the relationship between the public and the humanities remains convoluted and poorly understood.
This book explores this diverse field, and focuses on crowdsourcing as a research method. We consider where, in purely semantic terms, the boundaries of what is considered to be academic crowdsourcing should lie. Since humanities crowdsourcing is at an emergent stage as a research method, there is a correspondingly emergent field of academic literature dealing with its application and outcomes, which allows some assessments to be made about its potential to produce academically credible knowledge. The problematization of method, academic credibility, value and knowledge outputs is familiar from the history of the Digital Humanities. In 2002, Short and McCarty proposed a “methodological commons”, common ways of doing things that linked subject areas, digital research methods and domains: “computational techniques shared among the disciplines of the humanities and closely related social sciences, e.g., database design, text analysis, numerical analysis, imaging, music information retrieval, communications” (McCarty, 2003). In McCarty’s terms, this commons formed a combination of “collegial service” and “research enterprise” which both made provision for existing research activities, and expanded them. The principal purpose of this book is to develop a similar “methodological commons” for academic crowdsourcing. We contend that just as (say) the application of text processing technologies in history enhances the study of history, provokes new questions about the past, and can inform the development of processing technologies for (again, say) music, so can methods of leveraging public participation in museums inform and relate to participation elsewhere in the humanities. What is needed is a demarcation of the kinds of material involved (the “assets”), the types of task available to the public, the processes that undertaking those tasks involves, and the outputs. In other words, we seek to apply to crowdsourcing in academia the kind of formal structure of value and review that crowdsourcing has implicitly acquired in many other domains.
Academia, however, has always been something of a special case. It is worth spending some time reflecting on why this is so. Long before crowdsourcing was ever known by that name, researchers, especially in the natural sciences, were engaging in “citizen science”, a set of practices in which unpaid volunteers provided input to professionally coordinated research projects. This has been going on in domains such as field ecology, conservation and habitat studies since at least the 17th century, when in any case the role of the professional scientist did not exist, at least in its 21st-century form (Miller-Rushing, Primack, & Bonney, 2012). Networks, collaborations and codependencies developed within and across professional boundaries, leading to the production of original knowledge that passed all the thresholds of academic peer review and credibility.
The most significant changes to these networks and collaborations can be traced to the mid- and late 2000s. The Galaxy Zoo project, for example, one of the largest and most successful citizen science projects, and one to which we return later, was launched on July 11, 2007, with the Zooniverse suite of collaborations following two years later. Shortly before this, in 2006, Jeffrey Howe had coined the term “crowdsourcing” in a now-famous article in Wired, in which he stated:
“All these companies grew up in the Internet age and were designed to take advantage of the networked world. It doesn’t matter where the laborers are – they might be down the block, they might be in Indonesia – as long as they are connected to the network … technological advances in everything from product design software to digital video cameras are breaking down the cost barriers that once separated amateurs from professionals. … The labor isn’t always free, but it costs a lot less than paying traditional employees. It’s not outsourcing; it’s crowdsourcing.”
Howe (2006)
This definition and its timing are critical to the thesis of this book. The year 2006 was a period when the World Wide Web was becoming ubiquitous, with hypertext established as its main medium, and it was the time when social media and the interactive Web started to emerge: Twitter was launched in 2006, the same year in which Facebook was opened to the general public. The emergence of increasingly fluid digital networks of communication spawned crowdsourcing as both a term and a concept, and brought a range of challenges and opportunities to an academic environment already familiar with the traditions of citizen science. The Galaxy Zoo project was an early adopter, using the affordances of the Internet to engage the public in the task of classifying images of galaxies from the Sloan Digital Sky Survey – a job that is straightforward for the human eye, but impossible for even the most sophisticated automated image processing – with now-legendary success (Bamford et al., 2008). Early crowdsourcing projects in the humanities (such as Transcribe Bentham; see Chapter 3) engaged with the concept of crowdsourcing to operationalize, in a similar manner, tasks of a larger size and scale than had previously been possible using unpaid labour, such as mass transcription. Between the mid-2000s and the present day, this paradigm of academic crowdsourcing has undergone a shift in perception. It is now acknowledged that crowdsourcing is not a “cheap” alternative to paid-for labour, as suggested by the use of unpaid volunteers and by Howe’s contextualization alongside outsourcing. Rather, it is a set of processes, as we argue in Chapter 3. There is merit in describing some of these processes as “methodologies” – methods, extrapolated and grounded – which allow academic teams and institutions, and bodies such as libraries, museums and archives, to function in different ways in terms of their relationships with the public. This draws explicitly on the themes of “citizen science”, which has long been acknowledged as a distinct set of traditions within the epistemologies of science itself (see above).
Crowdsourcing, citizen science and engagement
In citizen science, a useful distinction can be drawn between tasks which are “delegative”, i.e., where data is processed, digitized or otherwise enhanced, and those which are “democratizing”, i.e., in which participants outside the core research team are involved in setting the research agenda and asking the research questions (Savio, Prainsack, & Buyx, 2016). These are different kinds of engagement.
An immediate assumption underlying the word “engagement” itself is an ontological separation of one entity into two or more further entities: for there to be engagement, one entity must engage with another. While the 20th and 21st centuries have viewed post-Victorian academia in North America and Europe as an “ivory tower”, or as Matthew Arnold’s “Dreaming Spires”, detached from the humdrum concerns of day-to-day life, the truth is that this disconnect has always been rather more complex. Drawing on the traditions of citizen science and public engagement discussed above, different fields of academic research have engaged with crowdsourcing in different ways. As with politics, industry and commerce, there are parts of the academic sphere which have different histories of, and motivations for, engaging with the public. Any history of mass contribution to academic research and the building of the most august scholarly resources must include the Oxford Dictionary of National Biography, the British Museum’s Bronze Age Index (Bonacchi et al., 2014), as well as Wikipedia and the Zooniverse suite of citizen science projects (Simpson, Page, & De Roure, 2014); these represent different traditions, which are nonetheless rooted in the institutional, economic, social and political distinction between “professional research” and “non-professional research”. A primary contention of this book, as suggested above, is that the Internet and World Wide Web (WWW) have altered all aspects of this relationship, and that a “methodological commons” which articulates what crowdsourcing can bring to academia must take account of this.
However, the types (and value) of knowledge produced by academic crowdsourcing activities – by the exposure of professional academia to the Internet age – have not been widely addressed in the literature on the subject. In many cases this is because “academic knowledge”, in the pure sense of new understandings that can withstand professional peer scrutiny and critical analysis, has not been sought or expected by those academics consciously engaging with crowdsourcing, except inasmuch as they write papers during and after a project, as they would for any other research. Rather, the focus of crowdsourcing, certainly in the humanities, has been on the improvement and transformation of content from one type to another, the description of objects and the synthesis of information from different sources (Ridge, 2014, p. 23). This may be seen largely as a refinement of the part of Howe’s definition concerned with value: “breaking down the cost barriers that once separated amateurs from professionals”. At the same time, analysts such as Daren Brabham situated it as a productive enterprise, a means of “doing profitable business” (Brabham, 2008, p. 82). In many early instances of academic crowdsourcing in the early 2010s, it was seen as a method of approaching the digitization of (very) hard-to-digitize humanities research assets (Dunn & Hedges, 2013).
In the same period, however, in one of the key sectors of the humanities where public engagement is mission-critical – the cultural memory sector (museums, archives, libraries, etc.) – much, but not all, of the focus has been on the production of unstructured knowledge content, such as blogs, social media and user-generated content, and on the seeking of feedback through social media platforms such as Twitter. This has itself been hailed as a democratizing paradigm in these sectors (Russo, Watkins, Kelly, & Chan, 2008). However, as with crowdsourcing elsewhere in the humanities, it is important to draw a distinction between democratization and the freeing of content that would otherwise be stored with limited accessibility within an institutional framework; and between the production of new (or better) systems of cataloguing and/or documentation, and the production of new knowledge based on that content. For this reason, it is also true that the purpose of academic crowdsourcing, at least in its earlier guises, has not been the critical production of knowledge, as it would be understood in any of the pedagogical literature in the humanities and cultural heritage.
The idea that crowdsourcing is primarily concerned with production was implicit in the typology of Dunn and Hedges (2012), which argued that academic crowdsourcing could be conceived as a set of workflow tasks, assets, processes and outputs. The “Asset” category of the typology concerned the different kinds of content with which humanists work, such as text, image, video and audio. The “Output” category was described as “the thing an activity produces as the result of the application of a process, using tasks of a particular task type, to an asset” (2012, p. 37). In many cases, this involved dependencies between task types: for example, an output of the type “structured data” would likely be the result of a project employing the task types of “collaborative tagging” or “linking”. This is perhaps inevitable, as the very act of structuring implies an act...