1 The politics of Big Data
Principles, policies, practices
Ann Rudinow Sætnan, Ingrid Schneider, and Nicola Green
Our focus and goal
Kranzberg’s ‘First Law of Technology’ teaches us that ‘[t]echnology is neither good nor bad; nor is it neutral’ (Kranzberg 1986: 547). By this, he means that technologies engage with complex social ecologies to produce social results, but that these results are not entirely controlled by the technologies’ human interactors.1 Having learned also from Barad (2007), we could state that technologies intra-act with human societies – both shaping and taking shape in each moment of entanglement. These mutual shaping moments occur whether we reflect on them or not; we can enter into them more or less watchfully, less or more carelessly. Furthermore, the mutual shaping happens continuously, throughout a technology’s social history. As long as we engage with a technology, we continue to shape it and be shaped by it. By engaging reflexively in that reshaping process, we can also reshape the process itself.
Big Data is a case in point. Big Data is constituted by a nexus of technologies – data-processing hardware and software and a myriad of digitised apparatuses and processes networked through a constant data flow, each element embedded into the varied ecologies of current society.2 This means we are constantly shaping and being shaped by Big Data. While this basic claim can be made of any technology, from the stone wheel to the atom bomb, it seems today that Big Data is taking form as a particularly powerful reshaping force. In this reshaping process, benefits and harms are unevenly distributed across social groups and across different aspects of social life. Forms and degrees of influence over the shaping processes also vary and are unevenly distributed. The more we learn about those distributions – both of benefits and harms being produced through the implementation of Big Data, and in what ways and how effectively individuals, groups and governments have so far been engaged in shaping Big Data – the more effectively we can engage in the shaping process and its outcomes in future.
This book focuses on the shaping of and by Big Data, approaching that theme from many angles. Here in Chapter 1 we will first describe what the contributors share in their conceptualisation of that theme. We will then describe a key element that our various angles of approach have in common, as well as how they differ in that regard. Finally, we will give readers a brief presentation of the structure of, and contributions to, the volume.
Our contributions to this book first came into conversation with one another when we all responded to a call for papers to form a conference track under the open and questioning title ‘Big Data, Big Brother?’ As those conversations proceeded, we saw that however varied our contributions were in some regards, they shared a sub-theme: they all addressed issues regarding the politics of Big Data. In other words, the very variety of our approaches served to tighten our shared focus on the political aspects of Big Data. At the same time, that variety of approaches demonstrated the breadth of the political focus, namely the breadth of what can be labelled as ‘politics’ and what as ‘policies’. Accordingly, in this introductory chapter to the volume, we will discuss two concepts that define its scope:
1. What we mean by ‘Big Data’, and
2. What we mean by ‘politics and policies’.
Big Data – does it exist, and if so, what is it?
In the discussion after the final session of our conference track, one member of the audience challenged us on the very concept of Big Data: ‘Isn’t Big Data all just hype?’ he asked. ‘Does it even actually exist?’3 Our answer is that it both is and isn’t ‘all just hype’; Big Data does and doesn’t really exist. Or rather, it certainly exists, but one might well ask in what sense and to what extent it exists.
At the very least, Big Data exists as a sociotechnical imaginary that serves as a meta-narrative to capture the present and future of digitisation, datafication, and globalised networks. For instance, boyd and Crawford (2012: 663) define Big Data as ‘a cultural, technological, and scholarly phenomenon’ that rests on the interplay of technology, analysis, and mythology, referring to the latter as the ‘widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of [emphases added] truth, objectivity, and accuracy’. When that imaginary is expressed, for instance in the form of a definition, Big Data also takes on a semantic existence. From there, either as an imaginary or as a semantic reality, Big Data exists in a phenomenological sense, that is, as subjective perceptions that form an interpretive framework and a basis for action. And finally, Big Data exists because (as many of the empirical chapters will show) we can see those imaginaries and frameworks being acted upon. When Big Data imaginaries, semantics and interpretive frameworks are enacted and the actants (both human and non-human) of those frameworks engaged, Big Data takes on material form(s). Yet these forms do not always fully realise the shapes and traits imagined for them.
So what are these imaginaries and frameworks? A clear definition would make the phenomenon semantically more real while also bounding it against conceptual drift, but definitions and undefined usages vary widely. It is perhaps unfortunate that there is no apparent consensus on a definition; however, many definitions do seem to refer to some form of observable materiality.
Some definitions, as we shall see below, focus on size. Although hyped as revolutionary in that regard (see, for example, Mayer-Schönberger and Cukier’s (2013) Big Data: A Revolution That Will Transform How We Live, Work, and Think), in a sense there is nothing new about Big Data. There is a long history of dealing with amounts of information that at some point seemed too vast to handle (see, for instance, Hacking 1982). Cynically speaking, the only thing ‘new’ about Big Data is the catchy name. Less cynically, what’s new is its scale. Stipulating some minimum size that would deserve the name ‘Big’ might at first glance seem to set a precise boundary vis-à-vis older data practices. For instance, Intel defines Big Data organisations as those ‘generating a median of 300 terabytes (TB) of data weekly’ (Intel 2012). And certainly there are such organisations.
However, size is a moving target. As data capacities rise and costs fall, 300 TB/week may soon seem not so big after all. Another dimension is relative size. Rather than citing a fixed number, the current Wikipedia (2017) article defines Big Data as ‘data sets so large or complex that traditional data processing application software is inadequate to deal with them’.4 Again, yes, certainly there are organisations that struggle to make sense of their data due to the sheer size of the databases they deal with.
For a company such as Intel, which sells data capacity, size – be it a fixed figure or a relative target – may be a useful definition criterion. For us as philosophers/historians/social scientists of technology, size, to be relevant, should preferably (at least also) relate to meanings, practices, and the social and ethical consequences thereof. Laney (2001) points to three purported traits of Big Data that may relate to meanings, practices, and consequences. While not presented as a definition of Big Data, the three traits – volume, velocity, and variety, also known as ‘the three Vs’ – have been much used as one (e.g. by UK POST 2014 and by Mayer-Schönberger and Cukier 2013).
For instance, the three Vs are used – and expanded upon – in one of the few book-length publications providing a critical analysis of Big Data. Based on literature reviews, Rob Kitchin (2014: 1–2) defines Big Data as:
• huge in volume, consisting of terabytes or petabytes of data;
• high in velocity, being created in or near real time;
• diverse in variety, being structured and unstructured in nature;
• exhaustive in scope, striving to capture entire populations or systems (n = all);
• fine-grained in resolution and uniquely indexical in identification;
• relational in nature, containing common fields that enable the conjoining of different data sets;
• flexible, holding the traits of extensionality (can add new fields easily) and scalability (can expand in size rapidly).
Discussing these traits one by one, Kitchin points out how each is not necessarily some sort of data-based ‘superpower’, but can also be problematic. Each strength can also be a weakness. For instance, volume challenges data storage and analysis capacity, although less so as capacities grow. Velocity challenges analysts’ ability to define and capture relevant moments in ever-changing data flows and trends. Variety challenges the ability to curate – categorise, standardise, stabilise – data for collation, comparison, and analysis. The existence of whole conferences on the curation challenges of Big Data databases (e.g. IDCC 2017) points to the existence of such issues.
Some, for example IBM (n.d.), add a fourth V for Veracity, but this is a misnomer. There is no evidence that massed data from multiple sources are ‘truer’ than data collected in traditional scientific endeavours. In fact, in their presentation of the fourth V, IBM subtitles ‘veracity’ as ‘uncertainty of data’. Thus, veracity has to be questioned, as data may be inaccurate or unreliable.
Other analysts have added Variability as a fifth V, where data qualities and results may change over time. Newly collected data may be added to existing datasets, for example. Furthermore, novel ways of using existing data may be developed by addressing previously unasked questions, by applying new methods of analysis, by correlating data formerly unrelated, or by creating alternative linkages to give a different picture (UK POST 2014).
Google has defined Big Data self-referentially by word-clouding search terms that co-occur with it (Ward and Barker 2013). That they managed to perform such an analysis is doubly self-referential evidence that the phenomenon exists, as a search term and as an approach to repurposing and analysing masses of stored data, in this case data from Google searches. We might even say the results were triply self-referential, as they pointed to other phenomena enmeshed with the concept and practices of Big Data: According to Ward and Barker (2013), Google users who seek information about Big Data often also seek information about Big Data analytics, in general or with specific reference to database technologies such as the data storage and retrieval architecture style known as NoSQL or data storage and analysis software programs such as Hadoop.
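To make the kind of co-occurrence analysis Ward and Barker describe slightly more concrete, the following is a minimal, purely illustrative sketch in Python. The handful of queries, and the whole setup, are invented for illustration; Ward and Barker worked with Google’s own search data, to which we have no access, and their actual method is not reproduced here.

from collections import Counter

# Purely illustrative: a few invented search queries standing in for a real
# query log (Ward and Barker worked with Google's own data, which we do not have).
queries = [
    "big data analytics tools",
    "big data hadoop tutorial",
    "nosql big data storage",
    "big data analytics hadoop",
    "big data privacy law",
]

# Count the terms that co-occur with the phrase "big data" in each query.
co_occurring = Counter()
for query in queries:
    terms = query.lower().split()
    if "big" in terms and "data" in terms:
        co_occurring.update(t for t in terms if t not in ("big", "data"))

# In a word cloud, the most frequent co-occurring terms would be drawn largest;
# here we simply list them with their counts.
for term, count in co_occurring.most_common():
    print(term, count)

On invented data like this, terms such as ‘analytics’ and ‘hadoop’ would come out on top, which is, in miniature, the self-referential pattern the paragraph above describes.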
Big Data analytics, rather than database size alone, is also where the Big Data hype now focuses. This shift in hype focus may be similar to the focus shift from genomics to proteomics (see, for example, Fujimura 2005; Webster 2005). One way of interpreting such shifts follows Latour’s (1987: 114–121) description of the rhetorical mobilisation and hiding of barriers (moving attention from one problem to the next, and thus away from unachieved ‘final’ goals) as a strategy to maintain member loyalty in sociotechnical networks. In the case of Big Data, we could paraphrase that strategy as follows: ‘You now have access to vast amounts of data and are only more confused? Don’t abandon the bandwagon! We only need to develop a little more self-learning software; then we will reach your goals.’ However, as we shall see, the problems may have stayed on the bandwagon too, especially if they were not merely about size, but also about analysis, ethics, legality, actionability, and so on.
Moving on from how big, via how difficult and how to do it, many influential definitions (often implicit rather than explicit and, ironically, often anecdotal or casuistic rather than statistics-based) focus on what Big Data analytics have achieved. For instance, Mayer-Schönberger and Cukier (2013) define Big Data as ‘The ability of society to harness information in novel ways to produce useful insights or goods and services of significant value’ (ibid.: 2) and ‘… things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value’ (ibid.: 6). Here we are getting to definitions we might challenge as pointing to a mythical object rather than a material one, to ‘air ware’ and hollow promises (or threats), or to boyd and Crawford’s (2012: 663) aforementioned definition of Big Data as a belief system.
Although Mayer-Schönberger and Cukier (2013) temper their Big Data success tale with the caution that privacy may suffer, their emphasis is nevertheless on Big Data’s purported advantages, summarising them as the very definition of Big Data. Others, for example Anderson (2008), mention no disadvantages whatsoever. While some (e.g. boyd and Crawford 2012; Kitchin 2014; Sætnan, Chapter 2 in this volume) go further in their critique of Big Data, examining even its success claims, most authors are less critical. And, judging by numbers of citations (according to both Google Scholar and ISI Web of Science), Big Data enthusiast texts clearly have greater traction than critiques. Through that traction, Big Data – however mythical its results may be – exists as a political, societal and commercial force, a force resulting in attempts to perform Big Data by (re-)collecting and (re-)using vast amounts of data.
We can readily find such arguments pitched towards businesses. Just do an Internet search for ‘Big Data solutions’ and there they are. One company, called Big Data Solutions, advertises that,
We support our customers in capitalizing the data and combine it with other information in order to produce improved business insight. […] Whether you are currently struggling with dirty, unorganized or unknown data assets looking to uncover opportunities and leveraging big data or embarking on your first master data management initiative, we can help you unlock the true value of your enterprise data.
(Big Data Solutions n.d.)
Another company, SAP, advises potential client firms to, ‘Discover the Big Data solutions that give you 360-degree insight into your business and help you find hidden growth opportunities within the digital noise’ (SAP n.d.). And so the ads read, from company after company offering Big Data har...