1. Introduction
The COVID-19 pandemic has not only created a global health emergency unprecedented in modern times; it has also brought forward a variety of data privacy issues. Imposed lockdowns, quarantines and ‘self-isolation’ measures are examples of what Anita Allen has termed ‘unpopular privacy’.1 ‘Unpopular privacy’ refers to coercive mandates that ‘impose unpopular privacies on intended targets and beneficiaries’ like the COVID-19-related social distancing rules.2 Schools and workplaces are closed; public events are cancelled; the use of public transport is limited;3 people are even forbidden from engaging in normal everyday activities,4 such as sunbathing.5 At the same time and in order to combat this pandemic, whole populations are required to endure increased surveillance of their location, their movements and their contacts6 via the invasive monitoring of mobile phone data.7
Widespread health data surveillance is not a new phenomenon. Health data and the capture of their enormous potential through big data analytics were at the forefront of debate before the emergence of a global health pandemic. Data privacy regulatory responses to health data surveillance vary around the world, but the EU’s General Data Protection Regulation (GDPR),8 with its strengthened data privacy rules and principles, remains a point of reference. This chapter critically examines the GDPR’s provisions relating to health by focusing on two main issues: i) the definitional uncertainties surrounding health data and ii) the legislative choices regarding the balance between the competing interests in data privacy on the one hand – seen mainly within the context of the enhanced protection that personal health data enjoy – and the interests of ‘public health’ on the other hand.
The analysis proceeds as follows. Section 2 assesses the definitional uncertainties that big health data raise. It takes a closer look at big data analytics and the sources of big health data and examines definitional questions within the GDPR’s context. Section 3 discusses the GDPR’s legislative choices regarding health data by focusing on their enhanced protection as ‘special categories of data’ and the exemptions and restrictions imposed on these for public health purposes. Section 4 offers brief conclusions.
2. On definitional issues: what are big health data?
2.1 Big data analytics
We are living in a big data world. Every minute, 510,000 comments are posted on Facebook, 293,000 statuses are updated, and 136,000 photos are uploaded. Every day, 3.5 billion Google searches are made; 6,000 tweets are sent per second; and more than 95 million photos and videos are uploaded on Instagram per day. There are 3.3 billion smartphone users worldwide, and the average smartphone user has between 60 and 90 apps on their device9 collecting some kind of personal data (i.e., name, email address, location).10 Outside the online world, the Internet of Things (IoT) ‘merges physical and virtual worlds’11 through a range of interconnected devices12 that communicate data, such as smart thermostats, meters, doorbells, smoke alarms, cameras, digital assistants, TVs and refrigerators.13 According to the European Commission, the value of European citizens’ personal data has the potential to grow to nearly €1 trillion annually.
There is no commonly agreed-upon definition of ‘big data’.14 In broad terms, big data refers to the aggregation of huge volumes of diversely sourced information and their analysis, using sophisticated algorithms to inform decisions.15 Big data is made possible due to the increasing capabilities of technology to support the collection and storage of large amounts of data, as well as ‘its ability to analyse, understand and take advantage of the full value of data (in particular using analytics applications)’.16 Big data is often described using the five Vs: Volume, Variety, Velocity, Veracity and Value.17 Volume refers to the expanding amounts of data generated and the large-scale datasets; Variety relates to the different types of data and data sources; Velocity describes both the increasing speed at which data is produced and the increasing demand to analyse the data in near real time to get insights; Veracity18 refers to the correctness and accuracy of the data; and Value denotes the opportunities of big data to lead to measurable improvements of our lives.19
Perhaps the most important characteristic of big data concerns the way it is analysed. The full potential of big data can be realised using artificial intelligence (AI).20 AI is needed to ‘mine, parse, sort and configure the data into useful packages’,21 build models and draw inferences that are then used ‘to predict and anticipate possible future events’.22 This is often done through machine learning, namely ‘algorithms that change in response to their own output, or “computer programs that automatically improve with experience”’.23 Machine learning means that the system is able to train itself to learn continuously and modify its behaviour during operation, thus acquiring a level of autonomy.24 Big data, AI and machine learning are closely related concepts and are sometimes referred to interchangeably. However, there are differences between them. As the UK Government Office for Science astutely puts it: ‘If data is the fuel, artificial intelligence is the engine of the digital revolution’.25 It might therefore be more accurate in terms of terminology to use the umbrella concept ‘big data analytics’ to describe all three of them.26 That being said, this chapter and this book understand ‘big data’ as ‘big data analytics’ and the two terms are used interchangeably.
2.2 Big health data
Health data are at the centre of the big data revolution. Over 250,000 health and fitness apps are currently available on the market. The sale of wearables, such as smart watches, fitness trackers, eye gears, smart clothing, smart jewellery and implantables is on the rise, with more than 170 million wearables being purchased in 2018.27 There are ‘vagina fitbits’,28 smart vibrators, smart diapers,29 and smart baby socks that measure babies’ ‘temperature, heart rate, oxygen saturation and movement’30 available on the market. Our bodies emit streams of data: everything from physical activity, calorie intake, sleep and posture to sexual intercourse, menstrual cycles, fertility and breathing patterns can be (self)-tracked, measured, logged and (self)-analysed in order to achieve ‘self-knowledge through numbers’.31 The observation of our bodies through technologies is ingrained in our everyday lives, and global trends such as the Quantified-Self are constantly growing.32 Platforms like PatientsLikeMe enable the exchange of information about illnesses, creating ‘a community of people who are helping each other live their best every day’.33 According to PatientsLikeMe, over ‘650,000 people living with 2,900 conditions have generated more than 43 million data points, creating an unprecedented source of real-world evidence and opportunities for continuous learning.’34
Big health data analytics promise a number of benefits. Indeed, the convergence between technology and healthcare is expected to i) increase quality of life and contribute to disease prevention,35 and therefore reduce healthcare expenditure;36 ii) allow ‘better healthcare at a lower cost’; iii) foster ‘patient empowerment (i.e. improved control over own healthcare)’; iv) enable ‘easier and more immediate access to medical care and information online’;37 and v) develop ‘more efficient and sustainable healthcare’.38 Algorithmic analysis of huge datasets will develop ‘personalised medicine’ based on more accurate diagnostic predictions and treatment suggestions.39 Such improvements are not an issue of the future; they are happening right now. Deep learning AI is already ‘on a par with human experts’40 when it comes to making medical diagnoses of diseases from cancers to eye conditions41 based on images, and it might soon outperform humans. Big data analysis allows the discovery of previously unknown trends, correlations and patterns and, therefore, offers new valuable insights for medical research.42
2.3 On definitional uncertainties: what are ‘big health data’?
Big health data are generated en masse and promise significant improvements to our well-being and healthcare. If, therefore, we are to study carefully the challenges that the immense datafication of our bodies poses and the ways the law can approach these challenges, we first need to define what ‘health data’ and ‘big health data’ mean.
Unlike its predecessor (the Data Protection Directive43), the GDPR contains a definition of ‘data concerning health’. This can serve as a starting point for the present analysis. According to the GDPR, ‘data concerning health’ refers to ‘personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status’.44 Recital 35 further explains that
personal data concerning health should include all data pertaining to the health status of a data subject which reveal information relating to the past, current or future physical or mental health status of the data subject. This includes information about the natural person collected in the course of the registration for, or the provision of, health care services as referred to in Directive 2011/24/EU.45… to that natural person; a number, symbol or particular assigned to a natural person to uniquely identify the natural person for health purposes; info...