1
The myth of the new
Theories of technological discourses
In the Discworld, where the librarian of the most prestigious library in the world is an orangutan, L-Space represents the magical manifestation of what happens when large quantities of books are put in close proximity. L-Space is accessed via portals that follow the layout of the floor and ceiling of the library used to enter it, and it links every library throughout space, time and the multiverse. A well-prepared librarian, armed with enough bananas, can find any book ever written and return it to their users. This magical manifestation of the universal library has but three simple rules:
1.) Silence; 2.) Books must be returned by the last date stamped; 3.) Do not interfere with the nature of causality.
(Pratchett, 1989)
In Pratchett's vividly imagined world, strange things happen when large quantities of books are grouped together. Books bend space and time, warping the world around them as a result of the magical power they exert. When reading the discourse around large-scale digitisation, one could be forgiven for assuming that a similar bending of the rules of space-time was occurring in our world. The effects of digital media on our social, cultural and intellectual practices are profound, certainly, but this impact must be considered in relation to a wider discourse which exaggerates the impact of digital technologies. Evgeny Morozov has provocatively argued that this discourse has co-opted the term "Internet" to create an all-encompassing technology which defies rational debate: "instead of debating the merits of individual technologies and crafting appropriate policies and regulations, we have all but surrendered to catchall terms like 'the Internet', which try to bypass any serious and empirical debate altogether" (2013). But the way in which internet technologies influence our social and intellectual structures is not an inevitable result of an overarching technology: indeed, I will argue in this chapter that the tendency to overstate the impact of large-scale digitisation is part of a wider trend towards building a mythology around new technology, and that the reality of technological adoption and impact is far more complex.
There is, in the discourse which surrounds large-scale digitisation, a consensus that digitising huge swathes of public domain historical materials will drive significant changes in access, research methods and institutional practices. It is certainly true that there has never been more historical material available for those who possess an internet connection and the necessary individual or institutional subscriptions. The unprecedented growth in digital availability of historical materials that have remained inaccessible for hundreds of years is a cause for excitement. Yet the enthusiasm for the possibilities offered by large-scale digitisation is in fact part of a larger cultural interrogation of the impact of digital technologies on our social practices and the intellectual paradigms which underpin research, reading and engagement with historical artefacts. As such, it is difficult to focus on the cultural heritage sector without considering the wider significance of these technologies. While it is certainly true that many critics have been realistic in their assessment of the transformative potential of digital technologies, their claims contrast with grandiose statements of epochal change. The growing scale of digital collections has caused some critics to overstate their benefits and exaggerate any negative impacts. This chapter explores the ways in which theoretical discourse has been subject to a process of exaggeration and how this constrains our efforts to consider the impact of large-scale digitisation in a more nuanced way. It begins with a brief consideration of the most influential project in this debate, the Google Books project. Without understanding how Google's enormous digitisation project was received, it is difficult to assess how public and critical opinion has been shaped in the last decade.
On the one hand, some have claimed digitisation as a democratising force which will ensure the survival of our cultural heritage and allow us all to access the world's knowledge without leaving our homes; others are already developing research projects that rely upon the existence of the massive literary datasets that are a product of Google's digitisation efforts. Michel and Shen (2010), for instance, coined the term "Culturomics" to describe their quantitative literary analysis of linguistic and cultural phenomena in the English language from 1800–2000, claiming that their work allows them to observe quantitative trends in a corpus of over five million books taken from Google's digitised vaults. The inspiration for the term comes from a self-declared identification with computational approaches to scientific problems:
Various fields with the suffix "-omics" (genomics, proteomics, transcriptomics, and a host of others) have emerged in recent years … These fields have created data resources and computational infrastructures that have energized biology. The effort to digitize and analyse the world's books has proceeded along these lines.
(Culturomics, 2010)
These computational methods have led Wired contributor Chris Anderson to declare that the big data era renders theoretical models obsolete. He comments that "correlation supersedes causation, and science can advance without coherent models, unified theories, or really any mechanistic explanation at all" (2008). While these arguments introduce a spurious claim to objectivity, based on the idea that media widely understood to be subjective somehow defy interpretation when considered at scale, the academic community has tempered its language. Moretti observes that:
Quantitative data can tell us when Britain produced one new novel per month, or week, or day, or hour for that matter, but where the significant turning point lies along the continuum – and why – is something that must be decided on a different basis.
(2007)
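The corpus-scale trend counting that underpins Culturomics can be illustrated with a minimal sketch. The corpus, tokenisation and term below are invented for illustration and are far cruder than Google's n-gram pipeline; the principle, counting a word's share of all tokens per year, is the same.

```python
from collections import defaultdict

def relative_frequency_by_year(corpus, term):
    """For each year, the frequency of `term` as a share of all tokens.

    `corpus` is an iterable of (year, text) pairs; tokenisation is a
    naive lowercase whitespace split.
    """
    totals = defaultdict(int)   # all tokens seen per year
    hits = defaultdict(int)     # occurrences of `term` per year
    term = term.lower()
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += sum(1 for t in tokens if t == term)
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# A toy, invented corpus: two "books" per year.
corpus = [
    (1900, "the steam engine and the telegraph"),
    (1900, "novels of the age of steam"),
    (2000, "the internet and the web"),
    (2000, "search engines index the web"),
]
print(relative_frequency_by_year(corpus, "steam"))
```

Plotting such per-year ratios for a word is exactly the kind of quantitative trend that, as Moretti notes, still leaves the interpretive work – locating and explaining the turning point – to the reader.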
Despite many such cautionary notes, media coverage has tended towards excitable overstatement of the significance of digital media. Castells sums this tendency up accurately, noting that:
The media, keen to inform an anxious public, but lacking the autonomous capacity to assess social trends with rigor, oscillate between reporting the amazing future on offer and following the basic principle of journalism: the only news is bad news.
(2002)
The chapter is divided into a number of sections which are intended to take the reader through the theoretical basis for this work. First, I explore the role of the technological sublime in framing cultural discourses around digitisation. Next, I explore how this mythology of the digital interacts with the diffusion of innovations, which explains how technology is communicated through cultural channels. The role of the medium in shaping meaning is explored further, providing a critique of the extent to which older theories of media specificity remain relevant in the digital age. I also consider how city life and online information behaviour relate to emerging methods such as corpus analysis. The chapter finishes by considering the remediation of print materials online, concluding that our skeuomorphic digital surrogates belie the idea that digitisation represents a significant divergence from what has come before. This being established, there is a need to more carefully consider the subtler impacts of large-scale digitisation. Through a critical awareness of the role of technological discourse in framing and constraining our understanding of new technologies, we can develop methodological approaches to reframe this discourse and begin to explore the pressing question of how users interact with digitised materials in reality.
Google Books: the universal library reimagined
Whatever is fitted in any sort to excite the ideas of pain, and danger, that is to say, whatever is in any sort terrible, or is conversant about terrible objects, or operates in a manner analogous to terror, is a source of the sublime; that is, it is productive of the strongest emotions which the mind is capable of feeling.
(Burke, 1998)
Commercial companies have, more than the heritage sector, been responsible for dramatically accelerating the rate of digitisation at a global level. Public institutions face decreasing levels of funding, to the extent that Google Book Search (GBS) would have struggled to gain such momentum within the constraints of the contemporary heritage sector. Google has shaped demand for digitised content and user expectations for how it will be presented online. The influence of GBS is such that it is almost impossible to address technological discourses around digitisation without considering the project and its reception. Initially branded Google Print when it was announced in 2004, GBS was inspired by a desire to digitise the world's knowledge in full. At the Frankfurt Book Fair in October 2004, Google announced the start of Google Print, followed in December 2004 by the announcement of the Google Print Library Project. In partnership with the New York Public Library and the universities of Stanford, Oxford, Michigan and Harvard, Google announced its intention to digitise an estimated 15 million texts from the internationally significant collections of these libraries. The aims of the project were simultaneously extremely simple and hugely ambitious: to make the full text of all the world's books searchable online by anybody with an internet connection. As the project developed, other institutions signed up as library partners, with the first non-English-speaking partners joining in 2006.
The New York Times (Heyman, 2015) estimates that around 25 million books have been digitised by Google to date, the majority of which are now available through GBS for searching and, in the case of public domain materials, reading. Due to copyright restrictions, GBS is based largely around search and discovery, severely limiting access to copyrighted materials in what has been accurately described as the equivalent of a giant card catalogue. Despite these limitations in access, Google's evangelical approach has inspired a techno-utopian interpretation of its impact. For instance, in an interview with Ken Auletta, then Google CEO Eric Schmidt described the moment that he was introduced to the book scanner that would be the catalyst for the project:
It had been inspired by the Great Library of Alexandria, erected around 300 B.C. to house all the world's scrolls. Page had used the equivalent of his own 20 percent time to construct a machine that cut off the bindings of books and digitized the pages. "What are you going to do with that, Larry?" Schmidt asked. "We're going to scan all the books in the world", Page said. For search to be truly comprehensive, he explained, it must include every book ever published. He wanted Google to "understand everything in the world and give it back to you." Sort of a "super librarian," he said.
(2009)
This anecdote bears all the hallmarks of the self-mythologising tendency of the tech sector: the association with the Library of Alexandria, which itself has become the semi-mythological originator of contemporary librarianship; the dismissal of librarians and the assumption that digital materials at scale will replace existing information infrastructures; and a grand belief that technology will inevitably change the world. In reality, this utopianism has been challenged by consistent criticism of Google's work from rights holders and critics alike. This critical split will prove illuminating in understanding the theoretical framework within which large-scale digitisation exists. While there is a clear distinction between GBS and the more selective approach of large-scale digitisation practised by libraries and archives, the corporation's work has profoundly informed the development of interfaces for digitised collections and the debate around their impact.
Criticism of the project has concentrated primarily on Google's digitisation of copyrighted texts; unlike many others, Google has sought to digitise texts regardless of their copyright status. For its part, Google has consistently argued that its efforts fall under the traditional definition of "fair use",1 a point that has been hotly contested. Lawsuits brought separately by The Authors Guild and the Association of American Publishers in 2005 alleged that Google was infringing copyright and, furthermore, was failing to properly compensate rights holders for reusing their work. This led directly in 2009 to the Google Books Settlement, a wide-ranging document in which Google agreed to compensate publishers and authors in exchange for permission to make copyrighted works searchable in the GBS database. The question of fair use has since been revisited in court, with the latest ruling, in the US Appeals Court in October 2015, declaring that Google did not violate copyright law. There is no need to go into the legal intricacies of the case here, and indeed others have already done so in great depth (Band, 2008; Grimmelmann, 2009). Google (2015) provides its own legal guide, which focuses on the company's interpretation of the "fair use" exception in copyright law.
In reality, a number of controversial points remained even after the settlement was reached: the lack of progress in deciding the legality of mass book scanning; concerns that this poorly legislated domain would be dominated by "private law" (Hetcher, 2006) that favours the companies involved; allegations that the settlement created a de facto monopoly in the area of orphan works, thus creating an almost insurmountable barrier to entry for Google's rivals (Gibson, 2008); continued disagreement over the validity of the fair use argument, and whether digital indexing should in fact be considered transformative; and the global implications of a class action suit that nominally took place within the exclusive purview of US copyright law (Guo et al., 2010).
These global concerns go beyond the legal ramifications of US jurisdiction being used to decide the validity of a resource which provides international material to a global audience; there has also been sustained concern that GBS will further undermine the cultural output of nations that fall outside dominant Anglo-American circles. One of the most outspoken critics to date has been Jean-Noël Jeanneney, former president of the National Library of France. He suggests that the US is exerting a form of cultural dominance over the rest of the world and that Google's actions are exacerbating the problem. At the heart of his argument are two main anxieties. The first is the effect that American market forces exert on the project. He cites the negative impact of advertising and the bias towards the English language in Google's products. The foreign language instructions for GBS, for instance, had been machine-translated: "they were filled with gobbledygook, some of it hilarious" (Jeanneney, 2007). Anthony Grafton portrays GBS as an undemocratic exercise, with the envisioned universal library taking shape as "a patchwork of interfaces and databases, some open to anyone with a computer and Wi-Fi, others closed to those without access or money" (2007), while Hetcher points out that universality cannot be achieved because libraries have already preselected materials for their collections. With each institution focusing its collection development strategy on its own user community, certain types of text will be historically under-represented, regardless of Google's success: "if none of these libraries carries pulp fiction, for instance, then these texts will not be available for searching" (2006).
Beyond the systemic problems associated with large Western corporations taking control of the public domain, there has also been overwhelming criticism of the lack of quality control exerted during digitisation. Many users were underwhelmed by their early experiences of Google Books; scan quality was ridiculed, and a variety of negative factors led Townsend to suggest that "the project is falling far short of its central premise of exposing the literature of the world" (2007). ...