Frontiers in Data Science
  • 383 pages
  • English
  • ePUB (mobile friendly)
  • Available on iOS & Android

About this book

Frontiers in Data Science deals with philosophical and practical results in Data Science. A broad definition of Data Science describes it as the process of analyzing data to transform them into insights, which also involves asking philosophical, legal, and social questions in the context of data generation and analysis. Big Data belongs to this universe as well, since it comprises data gathering, data fusion, and analysis when it comes to managing big data sets. A major goal of this book is to understand data science as a new scientific discipline, rather than to focus on the practical aspects of data analysis alone.


Information

Chapter 1
Legal aspects of information science, data science, and Big Data
Alessandro Mantelero
Giuseppe Vaciago
Introduction: The legal challenges of the use of data
Data collection and data processing: The fundamentals of data protection regulations
The European Union model: From the Data Protection Directive to the General Data Protection Regulation
Use of data and risk-analysis
Use of data for decision-making purposes: From individual to collective dimension of data processing
Data-centered approach and socio-ethical impacts
Multiple-risk assessment and collective interests
The guidelines adopted by the Council of Europe on the protection of individuals with regard to the processing of personal data in a world of Big Data
Data prediction: Social control and social surveillance
Use of data during the investigation: Reasonable doubt versus reasonable suspicion
Big Data and social surveillance: Public and private interplay in social control
The EU reform on data protection
References
Introduction: The legal challenges of the use of data
There are many definitions of Big Data, which differ depending on the specific discipline. Most of the definitions focus on the growing technological ability to collect, process, and extract new and predictive knowledge from large bodies of data characterized by great volume, velocity, and variety.
However, in terms of protection of individual rights, the main issues do not only concern the volume, velocity, and variety of processed data, but also the analysis of data, using software to extract new and predictive knowledge for decision-making purposes. Therefore, in this contribution, the definition of Big Data encompasses both Big Data and Big Data analytics.
The advent of Big Data has suggested a new paradigm in empirical social studies, in which the traditional approach adopted in statistical studies is complemented or replaced by Big Data analysis. This new paradigm is characterized by the relevant role played by data visualization, which makes it possible to analyze real-time data streams, follow their trajectory, and predict future trends [3]. Moreover, large amounts of data make it possible to use unsupervised machine-learning algorithms to discover hidden correlations between the variables that characterize large datasets.
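To make this correlation-driven approach concrete, the following minimal sketch (in Python, with pandas and scikit-learn) surfaces candidate associations and clusters from a dataset without any preexisting hypothesis; the file name, the 0.7 threshold, and the number of clusters are purely illustrative assumptions, not taken from the chapter.

```python
# Minimal sketch (hypothetical data source and thresholds): candidate
# relations emerge from the data themselves, with no prior research
# hypothesis to verify.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("observations.csv")                  # placeholder dataset
numeric = df.select_dtypes(include="number").dropna()

# Pairwise correlations: every strong association becomes a possible
# hypothesis to be examined later with traditional statistical methods.
corr = numeric.corr().stack()
strong = corr[(corr.abs() > 0.7) &
              (corr.index.get_level_values(0) != corr.index.get_level_values(1))]
print(strong.sort_values(ascending=False))

# Unsupervised clustering: groups of similar records, again without any
# target variable or purpose defined in advance.
X = StandardScaler().fit_transform(numeric)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print(pd.Series(labels).value_counts())
```

Each association found in this way is only a candidate relation; as discussed below, it says nothing about causation.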
This kind of approach, which is based on the emerging correlations among data, leads social investigation to adopt a new strategy, in which there are no preexisting research hypotheses to be verified through empirical statistical studies. Big Data analytics suggest possible correlations, which constitute per se the research hypothesis: data show the potential relations between facts or behavior. Nevertheless, these relations are not grounded on causation and, for this reason, should be further investigated using the traditional statistical method.
Assuming that data trends suggest correlations and consequent research hypotheses, at the moment of data collection only very general research hypotheses are possible, as the potential data patterns are still unknown. Therefore, the specific purpose of data processing can be identified only at a later time, when correlations reveal the usefulness of some information to detect specific aspects. Only at that time, the given purpose of the use of information becomes evident, also with regard to further analyses conducted with traditional statistical methods [4].
On the other hand, there are algorithms, such as supervised machine-learning algorithms, that need a preliminary training phase. In this stage, a supervisor uses data training sets to correct the errors of the machine, orienting the algorithm toward correct associations. In this sense, supervised machine-learning algorithms require a prior definition of the purpose of the use of data, identifying the goal that the machine should reach through autonomous processing of all available data.
In this case, although the purpose of data use is defined in the training phase, the manner in which data are processed and the final outcome of data mining remain largely unknown. In fact, these algorithms are black boxes and their internal dynamics are partially unpredictable.
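By contrast, a minimal sketch of the supervised setting just described might look as follows (Python with scikit-learn); the file, feature, and label names are hypothetical. The target labels fix the purpose of the processing before training, while the fitted model itself remains largely opaque.

```python
# Minimal sketch (hypothetical file and column names) of supervised learning:
# the labels of the training set define the goal before any data are processed.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = pd.read_csv("training_set.csv")     # labelled training data
X = data.drop(columns=["label"])           # observed attributes
y = data["label"]                          # predefined target, i.e., the purpose

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# The "supervisor" corrects the model by exposing it to known outcomes.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# The goal was set in advance, yet how the fitted ensemble combines the
# features is not easily interpretable (the "black box" problem noted above).
print("held-out accuracy:", model.score(X_test, y_test))
```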
Both data visualization and machine-learning applications pose relevant questions in terms of Big Data processing, which will be addressed in the following sections. How is it possible to define the specific purpose of data processing at the moment of data collection, when the correlations suggested by analytics are still unknown at that time? If different sources of data are used in training and running machine-learning algorithms, how can data subjects know the specific purpose for which their information is used in a given machine-learning application?
These questions clearly show the tension that characterizes the application of the traditional data protection principles in the Big Data context. But this is not the only crucial aspect: the very notion of personal data is becoming increasingly blurred. Running Big Data analytics over large datasets can make it difficult to distinguish between personal data and anonymous data, as well as between sensitive data and nonsensitive data.
Various studies have demonstrated how information stored in anonymized datasets can be partially reidentified, in some cases without expensive technical solutions [5,6,7,8,9,10,11,12]. This suggests going beyond the traditional dichotomy between personal and anonymous data and representing this distinction as a scale that moves from personal identified information to aggregated data. Between these extremes, the level of anonymization is proportional to the effort, in terms of time, resources and costs, which is required to reidentify information.
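The sketch below makes this scale of identifiability more tangible by illustrating a simple linkage attack of the kind described in the reidentification literature; both input files and the quasi-identifier columns are hypothetical examples, not data discussed in the chapter.

```python
# Illustrative linkage attack (hypothetical files and columns): records with
# no direct identifiers are matched to a public auxiliary dataset on
# quasi-identifiers, moving them back along the scale toward identified data.
import pandas as pd

deidentified = pd.read_csv("records_no_names.csv")   # "anonymized" dataset
auxiliary = pd.read_csv("public_register.csv")        # public auxiliary data

# Quasi-identifiers: attributes that are not names but whose combination can
# be unique enough to single out an individual.
quasi_ids = ["zip_code", "birth_date", "sex"]

linked = deidentified.merge(auxiliary, on=quasi_ids, how="inner")

# A combination that maps to exactly one person in the auxiliary data
# effectively reidentifies the matching de-identified record; the effort this
# step requires is what locates a dataset on the identifiability scale.
match_counts = auxiliary.groupby(quasi_ids).size().reset_index(name="matches")
linked = linked.merge(match_counts, on=quasi_ids)
reidentified = linked[linked["matches"] == 1]
print(f"{len(reidentified)} of {len(deidentified)} records reidentified")
```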
Finally, with regard to sensitive data, Big Data analytics make it possible to use nonsensitive data to infer sensitive information, such as information concerning religious practices extracted from location data and mobility patterns [13].
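A toy example of such an inference, assuming two hypothetical tables of location pings and points of interest, is sketched below; the column names and the ten-week threshold are illustrative assumptions only.

```python
# Toy sketch (hypothetical tables and threshold): nonsensitive location data
# can reveal a sensitive attribute, such as regular attendance at a place of
# worship, turning ordinary mobility patterns into de facto sensitive data.
import pandas as pd

pings = pd.read_csv("location_pings.csv", parse_dates=["timestamp"])
places = pd.read_csv("points_of_interest.csv")   # includes a "category" column

visits = pings.merge(places, on="place_id")
worship = visits[visits["category"] == "place_of_worship"]

# Users observed at such locations in many distinct weeks: an inference drawn
# entirely from data that, taken in isolation, are not sensitive at all.
worship = worship.assign(week=worship["timestamp"].dt.isocalendar().week)
regular_visits = worship.groupby("user_id")["week"].nunique()
print(regular_visits[regular_visits >= 10])
```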
Against this background, the existing data protection regulations and the ongoing proposals [14,15] remain largely focused on the traditional main pillars of the so-called fourth generation of data protection laws [16]: the notice and consent model (i.e., an informed, freely given, and specific consent) [17,18,19,20,21], the purpose limitation principle [24,25], and the minimization principle.
For this reason, the following sections investigate the limits and criticisms of the existing legal framework and the possible options to provide adequate answers to the new challenges of Big Data processing. In this light, this chapter is divided into three main sections.
The first section focuses on the traditional paradigm of data protection and on the provisions, primarily in the new EU General Data Protection Regulation (Regulation (EU) 2016/679, hereafter GDPR), that can be used to safeguard individual rights in Big Data processing.
The second section goes beyond the existing legal framework and, in the light of the path opened by the guidelines on Big Data adopted by the Council of Europe, suggests a broader approach that encompasses the collective dimension of data protection. This dimension often characterizes Big Data applications and leads to an assessment of the ethical and social impacts of data uses, which assume an important role in many Big Data contexts.
The last section deals with the use of Big Data for predictive fraud detection and crime prevention. In this light, the new Directive (EU) 2016/680 is briefly analyzed.
Data collection and data processing: The fundamentals of data protection regulations
Before considering the different reasons that induce the law to protect personal information, it should be noted that European legal systems do not recognize the same broad notion of the right to privacy that exists in U.S. jurisprudence. At the same time, in European countries, data protection laws do not draw their origins from the European idea of privacy and its related case law.
European data protection regulations, since their origins in the second half of the last century, have focused on information regarding individuals, without distinguishing between public and private information [32]. Compared with the right to privacy, the issues regarding the protection of personal data were recognized by law more recently, both in the United States and in Europe [33]. This recognition dates from the 1960s, whereas the earliest era of the right to privacy was at the end of the nineteenth century, when the penny press came to play a significant role in limiting the privacy of people belonging to the upper classes [34].
In the light of the above, the analysis of the fundamentals of data processing should start from the effects of the computer revolution of the late 1950s. The advent of computers and their social impact led to the first data protection regulations and laid the first pillars of the architecture of the present legal framework.
The first generations of data protection regulations were characterized by a national approach. They were adopted at different times by national legislators and differed in the extent of the safeguards provided and the remedies offered.
The notion of data protection was originally based on the idea of control over information, as confirmed by the literature of that period [35,36,37]. The migration from dusty paper archives to computer memories was a Copernican revolution which, for the first time in history, permitted the aggregation of information about every citizen that was previously spread over different archives [38].
The first data protection regulations were the answer to the rising concern of citizens about social control, as the new big mainframe computers gave governments [16,38,39,40,41] and large corporations the opportunity to collect and manage large amounts of personal information [16,42]. In this sense, the legal systems gave individuals the opportunity to exercise a sort of countercontrol over the collected data [16,38,43].
The purpose of these regulations was not to spread and democratize power over information, but to increase the level of transparency about data processing and safeguard the right of access to information. Citizens felt they were being monitored, and the law gave them the opportunity to know who controlled their data, which kind of information was collected, and for which purposes.
The mandatory notification of new databases, registration and licensing procedures, and independent authorities [16,44] were the fundamental elements of these new regulations. They were necessary to know who had control over information and to monitor data processing. Another key component was the right of access, which allowed citizens to ask data owners how their information was used and, consequently, how their power over information was exercised. Finally, the picture was completed by the creation of ad hoc public authorities to safeguard and enforce citizens' rights, exercise control over data owners, and react against abuses.
In this model, there was no space for individual consent, due to the economic context of that period. The collect...

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Table of Contents
  6. About the Editors
  7. Contributors
  8. 1 Legal aspects of information science, data science, and Big Data
  9. 2 Legal and policy aspects of information science in emerging automated environments
  10. 3 Privacy as secondary rule, or the intrinsic limits of legal orders in the age of Big Data
  11. 4 Data ownership: Taking stock and mapping the issues
  12. 5 Philosophical and methodological foundations of text data analytics
  13. 6 Mobile commerce and the consumer information paradox: A review of practice, theory, and a research agenda
  14. 7 The impact of Big Data on making evidence-based decisions
  15. 8 Automated business analytics for artificial intelligence in Big Data@X 4.0 era
  16. 9 The evolution of recommender systems: From the beginning to the Big Data era
  17. 10 Preprocessing in Big Data: New challenges for discretization and feature selection
  18. 11 Causation, probability, and all that: Data science as a novel inductive paradigm
  19. 12 Big Data in healthcare in China: Applications, obstacles, and suggestions
  20. Index