The Big Data Agenda : Data Ethics and Critical Data Studies
eBook - ePub

About this book

Current big data practices are largely guided by deliberations concerning their efficiency and optimisation. Yet there is another perspective. This book highlights that the capacity for gathering, analysing, and utilising vast amounts of digital (user) data raises significant ethical issues. Annika Richterich provides a systematic contemporary overview of the field of critical data studies that reflects on corporate, institutional, and governmental practices of digital data collection and analysis. It assesses in detail one big data research area: biomedical studies, focused on epidemiological surveillance. Specific case studies explore how big data have been used in academic work. The Big Data Agenda concludes by asking if data ownership can be reclaimed by citizens from being simply an assertion of a conception of rights to (user) data that is defined by technological domination. Richterich argues that data literacy and discourse ethics may contain solutions as well as a critique.


CHAPTER 1

Introduction

In times of big data and datafication, we should refrain from using the term ‘sharing’ too lightly. While users want, or need, to communicate online with their family, friends or colleagues, they may not intend their data to be collected, documented, processed and interpreted, let alone traded. Nevertheless, retrieving and interrelating a wide range of digital data points, from, for instance, social networking sites, has become a common strategy for making assumptions about users’ behaviour and interests. Multinational technology and internet corporations are at the forefront of these datafication processes. They control, to a large extent, what data are collected about users who embed various digital, commercial platforms into their daily lives.
Tech and internet corporations determine who receives access to the vast digital data sets generated on their platforms, commonly called ‘big data’. They define how these data are fed back into algorithms crucial to the content that users subsequently get to see online. Such content ranges from advertising to information posted by peers. This corporate control over data has given rise to considerable business euphoria. At the same time, the power exercised with data has increasingly been the subject of bewilderment, controversies, concern and activism during recent years. It has been questioned at whose cost the Silicon Valley mantra ‘Data is the new oil’1 is being put into practice. It is questioned whether this view on data is indeed such an alluring prospect for societies relying increasingly on digital technology, and for individuals exposed to datafication.
Datafication refers to the quantification of social interactions and their transformation into digital data. It has advanced to an ideologically infused ‘[...] leading principle, not just amongst technoadepts, but also amongst scholars who see datafication as a revolutionary research opportunity to investigate human conduct’ (Van Dijck 2014, 198). Datafication points to the widespread ideology of big data’s desirability and unquestioned superiority, a tendency termed ‘dataism’ by Van Dijck (2014). This book starts from the observation that datafication has left its mark not only on corporate practices, but also on approaches to scientific research. I argue that, as commercial data collection and research become increasingly entangled, interdependencies are emerging which have a bearing on the norms and values relevant to scientific knowledge production.
Big data have not only triggered the emergence of new research approaches and practices, but have also nudged normative changes and sparked controversies regarding how research is ethically justified and conceptualised. Big data and datafication ‘drive’ research ethics in multiple ways. Those who deem the use of big data morally reasonable have normatively framed and justified their approaches. Those who perceive the use of big data in research as irreconcilable with ethical principles have disputed emerging approaches on normative grounds. What we are currently witnessing is a coexistence of research involving big data and contested data ethics relevant to this field. I explore to what extent these positions unfold in dialogue with (or in isolation from) each other and relevant stakeholders.
This book interrogates entanglements between corporate big data practices, research approaches and ethics: a domain which is symptomatic of broader challenges related to data, power and (in-)justice. These challenges, and the urgent need to reflect on, rethink and recapture the power related to vast and continually growing ‘big data’ sets have been forcefully stressed in the field of critical data studies (Iliadis and Russo 2016; Dalton, Taylor and Thatcher 2016; Lupton 2015; Kitchin and Lauriault 2014; Dalton and Thatcher 2014). Approaches in this interdisciplinary research field examine practices of digital data collection, utilisation, and meaning-making in corporate, governmental, institutional, academic, and civic contexts.
Research in critical data studies (CDS) deals with the societal embeddedness and constructedness of data. It examines significant economic, political, ethical, and legal issues, as well as matters of social justice concerning data (Taylor 2017; Dencik, Hintz and Cable 2016). While most companies have come to see, use and promote data as a major economic asset, allegedly comparable to oil, CDS emphasises that data are not a mere commodity (see also Thorp 2012). Instead, many types of digital data are matters of civic rights, personal autonomy and dignity. These data may emerge, for example, from individuals’ use of social networking sites, their search engine queries or interaction with computational devices. CDS researchers analyse and examine the implications, biases, risks and inequalities, as well as the counter-potential, of such (big) data. In this context, the need for qualitative, empirical approaches to data subjects’ daily lives and data practices (Lupton 2016; Metcalf and Crawford 2016) has been increasingly stressed. Such critical work is evolving in parallel with the spreading ideology of datafication’s unquestioned superiority: a tendency which is also noticeable in scientific research.
Many scientists have been intrigued by the methodological opportunities opened up by big data (Paul and Dredze 2017; Young, Yu and Wang 2017; Paul et al. 2016; Ireland et al. 2015; Kramer, Guillory and Hancock 2014; Chunara et al. 2013; see also Chapter 5). They have articulated high hopes about the contributions big data could make to scientific endeavours and policy making (Kettl 2017; Salganik 2017; Mayer-Schönberger and Cukier 2013). As I show in this book, data produced and stored in corporate contexts increasingly play a part in scientific research, conducted also by scholars employed at or affiliated with universities. Such data were originally collected and enabled by internet and tech companies owning social networking sites, microblogging services and search engines.
I focus on developments in public health research and surveillance, with specific regard to the ethics of using big data in these fields. This domain has been chosen because data used in this context are highly sensitive. They allow, for example, for insights into individuals’ state of health, as well as health-relevant (risk) behaviour. In big data-driven research, the data often stem from commercial platforms, raising ethical questions concerning users’ awareness, informed consent, privacy and autonomy (see also Parry and Greenhough 2018, 107–154). At the same time, research in this field has mobilised the argument that big data will make an important contribution to the common good by ultimately improving public health. This is a particularly relevant research field from a CDS perspective, as it is an arena of promises, contradictions and contestation. It facilitates insights into how technological and methodological developments are deeply embedded in and shaped by normative moral discourses.
This study follows up earlier critical work which emphasises that academic research and corporate data sources, as well as tools, are increasingly intertwined (see e.g. Sharon 2016; Harris, Kelly and Wyatt 2016; Van Dijck 2014). As Van Dijck observes, the commercial utilisation of big data has been accompanied by a ‘[...] gradual normalization of datafication as a new paradigm in science and society’ (2014, 198). The author argues that, since researchers have a significant impact on the establishment of social trust (206), academic utilisations of big data also give credibility to their collection in commercial contexts and to the societal acceptance of big data practices more generally.
This book specifically sheds light on how big data-driven public health research has been communicated, justified and institutionally embedded. I examine interdependencies between such research and the data, infrastructures and analytics shaped by multinational internet/tech corporations. The following questions, whose theoretical foundation is detailed in Chapter 2, are crucial for this endeavour: What are the broader discursive conditions for big data-driven health research: Who is affected and involved, and how are certain views fostered or discouraged? Which ethical arguments have been discussed: How is big data research ethically presented, for example as a relevant, morally right, and societally valuable way to gain scientific insights into public health? What normativities are at play in presenting and (potentially) debating big data-driven research on public health surveillance?
I thus emphasise two analytical angles: first, the discursive conditions and power relations influencing and emerging in interaction with big data research; second, the values and moral arguments which have been raised (e.g. in papers, project descriptions and debates) as well as implicitly articulated in research practices. I highlight that big data research is inherently a ground of normative framing and debate, although this is rarely foregrounded in big data-driven health studies. To investigate the abovementioned issues, I draw on a pragmatist approach to ethics (Keulartz et al. 2004). Special emphasis is placed on Jürgen Habermas’ notion of ‘discourse ethics’ (2001 [1993], 1990). This theory was in turn inspired by Karl-Otto Apel (1984) and American pragmatism. It will be introduced in more detail in Chapter 2.
Already at this point it is important to stress that the term ‘ethical’ in this context serves as a qualifier for the kind of debate at hand – and not as a normative assessment of content. Within a pragmatist framework, something is ethical because values and morals are being negotiated. This means that ‘unethical’ is not used to disqualify an argument normatively. Instead, it would merely indicate a certain quality of the debate, i.e. that it is not dedicated to norms, values, or moral matters. A moral or immoral decision would be in either case an ethical issue, and ‘[w]e perform ethics when we put up moral routines for discussion’ (Swierstra and Rip 2007, 6).
To further elaborate the perspective taken in this book, the following sections expand on key terms relevant to my analysis: big data and critical data studies. Subsequently, I sketch main objectives of this book and provide an overview of its six chapters.

Big Data: Notorious but Thriving

In 2018, the benefits and pitfalls of digital data analytics were still largely attributed to a concept which had already become somewhat notorious by then: big data. This vague umbrella term refers to the vast amounts of digital data which are being produced in technologically and algorithmically mediated practices. Such data can be retrieved from various digital-material social activities, ranging from social media use to participation in genomics projects.2
Data and their analysis have of course long been a core concern for quantitative social sciences, the natural sciences, and computer science, to name just a few examples. Traditionally though, data have been scarce and their compilation was subject to controlled collection and deliberate analytical processes (Kitchin 2014a; boyd 2010). In contrast, the ‘[...] challenge of analysing big data is coping with abundance, exhaustivity and variety, timeliness and dynamism, messiness and uncertainty, high relationality, and the fact that much of what is generated has no specific question in mind or is a by-product of another activity.’ (Kitchin 2014a, 2)
Already in 2015, The Gartner Group ceased issuing a big data hype cycle and dropped ‘big data’ from the Emerging technologies hype cycle. A Gartner analyst justified this decision, not on the grounds of the term’s irrelevance, but because of big data’s ubiquitous pervasion of diverse domains: it ‘[...] has become prevalent in our lives across many hype cycles.’ (Burton 2015) One might say that the ‘[b]ig data hype [emphasis added] is officially dead’, but only because ‘[...] big data is now the new normal’ (Douglas 2016). While one may argue that the concept has lost its ‘news value’ and some of its traction (e.g. for attracting funding and attention more generally), it is still widely used, not least in the field relevant to this book. For these reasons, I likewise still use the term ‘big data’ when examining developments and cases in public health surveillance. Despite the fact that the hype around big data seems to have passed its peak, much confusion remains about what this term actually means.
In the wake of the big data hype, the interdisciplinary field of data science (Mattmann 2013; Cleveland 2001) received particular attention. Already in the 1960s, Peter Naur – himself a computer scientist – suggested the terms ‘data science’ and ‘datalogy’ as preferable alternatives to ‘computer science’ (Naur 1966; see also Sveinsdottir and Frøkjær 1988). While the term ‘datalogy’ has not been taken up in international (research) contexts, ‘data science’ has shown that it has more appeal: As early as 2012, Davenport and Patil even went as far as to call data scientist ‘the Sexiest Job of the 21st Century’. Their proposition is indicative of a wider scholarly and societal fascination with new forms of data, ways of retrieval and analytics, thanks to ubiquitous digital technology.
More recently, data science has often been defined in close relation to corporate uses of (big) data. Authors such as Provost and Fawcett state, for instance, that defining ‘[...] the boundaries of data science precisely is not of the utmost importance’ (2013, 51). According to the authors, while this may be of interest in an academic setting, it is more relevant to identify common principles ‘[...] in order for data science to serve business effectively’ (51). In such contexts, big data are indeed predominantly seen as valuable commercial resources, and data science as key to their effective utilisation. The possibilities, hopes, and bold promises put forward for big data have also fostered the interest of political actors, encouraging policymakers such as Neelie Kroes, European Commissioner for the Digital Agenda from 2010 until 2014, to reiterate in one of her speeches on open data: ‘That’s why I say that data is the new oil for the digital age.’ (Kroes 2012)
There are various ways and various reasons to collect big data in corporate contexts: social networking sites such as Facebook document users’ digital interactions (Gerlitz and Helmond 2013). Many instant messaging applications and email providers scan users’ messages for advertising purposes or security-related keywords (Gibbs 2014; Wilhelm 2014; Godin 2013). Every query entered into the search engine Google is documented (Ippolita 2013; Richterich 2014a). And not only users’ digital interactions and communication, but also their physical movements and features are turned into digital data. Wearable technology tracks, archives and analyses its owners’ steps and heart rate (Lupton 2014a). Enabled by delayed legal interference, companies such as 23andMe sold personal genomic kits which customers returned with saliva samples, i.e. personal, genetic data. By triggering users’ interest in health information based on genetic analyses, between 2007 and 2013, the company built a corporately owned genotype database of more than 1,000,000 individuals (see Drabiak 2016; Harris, Kelly, and Wyatt 2013a; 2013b; Annas and Sherman 2014).3
One feature common to all of these examples is the emergence of large-scale, continuously expanding databases. Such databases allow for insights into, for example, users’ (present or future) physical condition; the frequency and (linguistic) qualities of their social contacts; their search preferences and patterns; and their geographic mobility. Broadly speaking, corporate big data practices are aimed at selling or employing these data in order to provide customised user experiences, and above all to generate profit.4
Big data differ from traditional large-scale datasets with regard to their volume, velocity, and variety (Kitchin 2014a, 2014b; boyd and Crawford 2012; Marz and Warren 2012; Zikopoulos et al. 2012). These ‘three Vs’ are a commonly quoted reference point for big data. Such datasets are comparatively flexible, easily scalable, and have a strong indexical quality, i.e. are used for drawing conclusions about users’ (inter-)actions. While volume, velocity, and variety are often used to define big data, critical data scholars such as Deborah Lupton have highlighted that ‘[t]hese characterisations principally come from the worlds of data science and data analytics. From the perspective of critical data researchers, there are different ways in which...

Table of contents

  1. Cover
  2. Title Page
  3. Copyright
  4. Acknowledgments
  5. Contents
  6. Chapter 1: Introduction
  7. Chapter 2: Examining (Big) Data Practices and Ethics
  8. Chapter 3: Big Data: Ethical Debates
  9. Chapter 4: Big Data in Biomedical Research
  10. Chapter 5: Big Data-Driven Health Surveillance
  11. Chapter 6: Emerging (Inter-)Dependencies and their Implications
  12. Notes
  13. References
  14. Index