Legal Compliance and Quality Management
Paolo Balboni and Theodora Dragan
CONTENTS
Abstract
1.1 Introduction
1.1.1 Topic, Approach, and Methodology
1.1.2 Structure and Arguments
1.2 Business of Big Data
1.2.1 Connection between Big Data and Personal Data
1.2.1.1 Any Information
1.2.1.2 Relating to
1.2.1.3 Identified or Identifiable
1.2.1.4 Natural Person
1.2.2 Competition Aspects
1.3 Reconciling Traditional and Modern Data Protection Principles
1.3.1 Traditional Data Protection Principles
1.3.1.1 Transparency
1.3.1.2 Proportionality and Purpose Limitation
1.3.2 Modern Data Protection Principles
1.3.2.1 Accountability
1.3.2.2 Privacy by Design and by Default
1.3.2.3 Users' Control of Their Own Data
1.4 Conclusions and Recommendations
ABSTRACT
The overlap between big data and personal data is becoming increasingly relevant in today's society, in light of technological developments and, in particular, of the increased use of personal data as currency for purchasing "free" services. The global nature of big data, coupled with recently developed data analytics and the interest of companies in predicting trends and consumer preferences, makes it necessary to analyze how personal data and big data are connected. With a focus on the quality of data as a fundamental prerequisite for ensuring that outcomes are accurate and relevant, the authors explore the ways in which traditional and modern personal data protection principles apply to the big data context.
It is not about the quantity of the data, but about the quality of it!
1.1 Introduction
It is 2016 and big data is everywhere: in the newspapers, on TV, in research papers, and on the lips of every IT specialist. This is not only due to its catchy name, but also due to the sheer quantity of data available: according to IBM, we create 2.5 quintillion (2.5 × 10^18) bytes of data every day.* But what is the big deal with big data and, in particular, to what extent does it affect, or overlap with, personal data?
1.1.1 Topic, Approach, and Methodology
By way of introduction, the first step is to provide a definition of the concept that runs through this chapter. Various attempts at defining big data have been made in recent years, but no universal definition has yet been agreed upon. This is likely due to the constant evolution of the concept, which makes it difficult to describe without the definition either being too generic or becoming inadequate within a short period of time.
One attempt at a universal definition was made by Gartner, a leading information technology research and advisory company, which defines big data as "high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation."† In this case, data are regarded as assets, which attaches an intrinsic value to them. On the other hand, the Article 29 Data Protection Working Party defines big data as "the exponential growth both in the availability and in the automated use of information: it refers to gigantic digital datasets held by corporations, governments and other large organisations, which are then extensively analysed using computer algorithms."‡ This definition regards big data as a phenomenon composed of both the process of collecting information and the subsequent step of analyzing it. The common elements of the different definitions are therefore the size of the database and the analytical aspect, which together are expected to lead to better, more focused services and products, as well as more efficient business operations and more targeted approaches.
Big data can be (and has been) used in an incredibly diverse range of situations. It was employed to help athletes of Great Britain's rowing team achieve superior performance levels at the 2016 Olympic Games in Rio de Janeiro, by analyzing relevant information about their predecessors' performance.§ Predictive analytics were used in order to deal with traffic in highly congested cities, paving the way for the creation of the smart cities of the future.¶ Further, big data can have a great impact on medical sciences, and has already helped boost obesity research results by enabling researchers to identify links between obesity and depression that were previously unknown.**
Although big data does not always consist of personal data and could, for example, relate to technical information or to information about objects or natural phenomena, the European Data Protection Supervisor (EDPS) pointed out in its Opinion 7/2015 that "one of the greatest values of big data for businesses and governments is derived from the monitoring of human behaviour, collectively and individually."* Analyzing and predicting human behavior enables decision makers in many areas to make decisions that are more accurate, consistent, and economical, thereby enhancing the efficiency of society as a whole. A few fields of application that immediately come to mind when thinking of big data analytics based on personal data are university admissions, job recruitment, customer profiling, targeted marketing, or health services. Analyzing the information about millions of previous applicants, candidates, customers, or patients makes it easy to establish common threads and to predict all sorts of things, such as whether a specific person is fit for the job or is likely to develop a certain disease in the future.
An interesting study was recently conducted by the University of Cambridge Psychometrics Centre: by analyzing the social networking "likes" of 58,000 users, researchers found that they were able to predict ethnic origin with an accuracy of 95% and religious or political orientation with an accuracy of over 80%.† Perhaps even more dramatically, they were able to predict psychological traits such as intelligence or emotional stability. The research was conducted using openly available data provided by the study subjects themselves (Facebook likes). Its results can be fine-tuned even further by cross-referencing them with data about the same subjects drawn from other sources, such as other social networking profiles or Internet usage habits. This is the point where big data starts overlapping with personal data, the two being separated only by a blurry border: "liking" a specific rock band does not constitute personal data as such, but the ability to link this information directly to an individual or to other information makes it possible to identify what the person actually likes; furthermore, it makes it possible to draw inferences about their personality, possibly revealing even sensitive political or religious preferences (as was the case in the Cambridge study). "Companies may consider most of their data to be non personal data sets, but in reality it is now rare for data generated by user activity to be completely and irreversibly anonymised," stated the EDPS in a recent Opinion.‡ The availability of massive amounts of data from different sources, combined with the desire to learn more about people's habits, therefore poses a serious challenge regarding the right to privacy of the individual and requires that the data protection principles are carefully taken into consideration.
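The linking step described above can be illustrated with a minimal sketch. It assumes two entirely hypothetical datasets (the field names, pseudonyms, and records are invented for illustration and are not taken from the Cambridge study or from any EDPS Opinion): a pseudonymous list of "likes" and a separate list of public profiles. When the two share quasi-identifiers such as a postal code and a birth year, a simple exact match may be enough to attach a name to a supposedly non-personal record.

```python
# Hypothetical, made-up records for illustration only.
likes = [
    {"pseudonym": "u1", "zip": "10115", "birth_year": 1984, "liked": "rock band A"},
    {"pseudonym": "u2", "zip": "20095", "birth_year": 1990, "liked": "political party B"},
]
public_profiles = [
    {"name": "Alice Example", "zip": "10115", "birth_year": 1984},
    {"name": "Bob Example", "zip": "20095", "birth_year": 1990},
    {"name": "Carol Example", "zip": "20095", "birth_year": 1975},
]


def link_records(pseudonymous, profiles, keys=("zip", "birth_year")):
    """Attach a name to every pseudonymous record whose quasi-identifiers
    match exactly one profile (an assumed, deliberately simple rule)."""
    linked = []
    for record in pseudonymous:
        matches = [p for p in profiles if all(p[k] == record[k] for k in keys)]
        if len(matches) == 1:  # a unique match singles out one individual
            linked.append({"name": matches[0]["name"], "liked": record["liked"]})
    return linked


reidentified = link_records(likes, public_profiles)
for row in reidentified:
    print(row["name"], "likes", row["liked"])
```

In this toy case both pseudonymous records are matched to a single named profile, which is the sense in which a "like" that is not personal data in isolation can become personal data once it is combined with other information.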
A fundamental part of big data analytics, however, is that the raw data must be accurate in order to lead to accurate results; massive quantities of inaccurate data can lead to skewed results and poor decision making. Bruce Schneier, an internationally renowned security technologist, refers to this as the "pollution problem of the information age."§ There is a risk that analytical applications find patterns in cases where the individual facts are not directly correlated, which may lead to unfair conclusions and may adversely affect the persons involved. Another risk is that of being trapped in an "information bubble," with people only being shown certain information that has been predicted to be of ...