"Big Data Management and Processing is [a] state-of-the-art book that deals with a wide range of topical themes in the field of Big Data. The book, which probes many issues related to this exciting and rapidly growing field, covers processing, management, analytics, and applications... [It] is a very valuable addition to the literature. It will serve as a source of up-to-date research in this continuously developing area. The book also provides an opportunity for researchers to explore the use of advanced computing technologies and their impact on enhancing our capabilities to conduct more sophisticated studies."

--Sartaj Sahni, University of Florida, USA

"Big Data Management and Processing covers the latest Big Data research results in processing, analytics, management and applications. Both fundamental insights and representative applications are provided. This book is a timely and valuable resource for students, researchers and seasoned practitioners in Big Data fields."

--Hai Jin, Huazhong University of Science and Technology, China

Big Data Management and Processing explores a range of big data-related issues and their impact on the design of new computing systems. The twenty-one chapters were carefully selected and feature contributions from several outstanding researchers. The book endeavors to strike a balance between theoretical and practical coverage of innovative problem-solving techniques for a range of platforms. It serves as a repository of paradigms, technologies, and applications that target different facets of big data computing systems.

The first part of the book explores energy and resource management issues, as well as legal compliance and quality management for Big Data. It covers in-memory computing and in-memory data grids, as well as co-scheduling for high-performance computing applications. The second part of the book includes comprehensive coverage of Hadoop and Spark, along with security, privacy, and trust challenges and solutions.

The latter part of the book covers mining and clustering in Big Data, and includes applications in genomics, hospital big data processing, and vehicular cloud computing. The book also analyzes funding for Big Data projects.


Publisher: Chapman and Hall/CRC
Year: 2017
Print ISBN: 9780367573614
eBook ISBN: 9781351650045
Topic: Computer Science
Subtopic: Statistics for Business & Economics
Index: Computer Science

Big Data: Legal Compliance and Quality Management∗

Paolo Balboni and Theodora Dragan

CONTENTS

Abstract

1.1 Introduction
1.1.1 Topic, Approach, and Methodology
1.1.2 Structure and Arguments
1.2 Business of Big Data
1.2.1 Connection between Big Data and Personal Data
1.2.1.1 Any Information
1.2.1.2 Relating to
1.2.1.3 Identified or Identifiable
1.2.1.4 Natural Person
1.2.2 Competition Aspects
1.3 Reconciling Traditional and Modern Data Protection Principles
1.3.1 Traditional Data Protection Principles
1.3.1.1 Transparency
1.3.1.2 Proportionality and Purpose Limitation
1.3.2 Modern Data Protection Principles
1.3.2.1 Accountability
1.3.2.2 Privacy by Design and by Default
1.3.2.3 Users’ Control of Their Own Data
1.4 Conclusions and Recommendations

ABSTRACT

The overlap between big data and personal data is becoming increasingly relevant in today’s society, in light of technological developments and, in particular, of the increased use of personal data as currency for purchasing “free” services. The global nature of big data, coupled with recently developed data analytics and the interest of companies in predicting trends and consumer preferences, makes it necessary to analyze how personal data and big data are connected. With a focus on the quality of data as a fundamental prerequisite for ensuring that outcomes are accurate and relevant, the authors explore the ways in which traditional and modern personal data protection principles apply to the big data context.

It is not about the quantity of the data, but about the quality of it!

1.1 Introduction

It is 2016 and big data is everywhere: in the newspapers, on TV, in research papers, and on the lips of every IT specialist. This is not only due to its catchy name, but also due to the sheer quantity of data available: according to IBM, we create 2.5 quintillion (2.5 × 10^18) bytes of data every day.∗ But what is the big deal with big data and, in particular, to what extent does it affect, or overlap with, personal data?

1.1.1 Topic, Approach, and Methodology

By way of introduction, the first step is to provide a definition of the concept that runs through this chapter. Various attempts at defining big data have been made in recent years, but no universal definition has been agreed upon yet. This is likely due to the constant evolution of this concept, which makes it difficult to describe without risking that the definition is either too generic or that it becomes inadequate within a short period of time.

One attempt at a universal definition was made by Gartner, a leading information technology research and advisory company, which defines big data as “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”† In this case, data are regarded as assets, which attaches an intrinsic value to them. On the other hand, the Article 29 Data Protection Working Party defines big data as “the exponential growth both in the availability and in the automated use of information: it refers to gigantic digital datasets held by corporations, governments and other large organisations, which are then extensively analysed using computer algorithms.”‡ This definition regards big data as a phenomenon composed of both the process of collecting information and the subsequent step of analyzing it. The common elements of the different definitions are therefore the size of the database and the analytical aspect, which together are expected to lead to better, more focused services and products, as well as more efficient business operations and more targeted approaches.

Big data can be (and has been) used in an incredibly diverse range of situations. It was employed to help athletes of Great Britain’s rowing team achieve superior performance levels at the 2016 Olympic Games in Rio de Janeiro, by analyzing relevant information about their predecessors’ performance.§ Predictive analytics were used in order to deal with traffic in highly congested cities, paving the way for the creation of the smart cities of the future.¶ Further, big data can have a great impact on medical sciences, and has already helped boost obesity research results by enabling researchers to identify links between obesity and depression that were previously unknown.∗∗

Although big data does not always consist of personal data and could, for example, relate to technical information or to information about objects or natural phenomena, the European Data Protection Supervisor (EDPS) pointed out in its Opinion 7/2015 that “one of the greatest values of big data for businesses and governments is derived from the monitoring of human behaviour, collectively and individually.”∗ Analyzing and predicting human behavior enables decision makers in many areas to make decisions that are more accurate, consistent, and economical, thereby enhancing the efficiency of society as a whole. A few fields of application that immediately come to mind when thinking of big data analytics based on personal data are university admissions, job recruitment, customer profiling, targeted marketing, or health services. Analyzing the information about millions of previous applicants, candidates, customers, or patients makes it easy to establish common threads and to predict all sorts of things, such as whether a specific person is fit for the job or is likely to develop a certain disease in the future.

An interesting study was recently conducted by the University of Cambridge Psychometrics Centre: by analyzing the social networking “likes” of 58,000 users, researchers found that they were able to predict ethnic origin with an accuracy of 95% and religious or political orientation with an accuracy of over 80%.† Even more dramatically perhaps, they were able to predict psychological traits such as intelligence or emotional stability. The research was conducted using openly available data provided by the study subjects themselves (Facebook likes). Its results can be fine-tuned even further by cross-referencing them with data about the same subjects drawn from other sources, such as other social networking profiles or Internet usage habits. This is the point where big data starts overlapping with personal data, being separated only by a blurry border: “liking” a specific rock band does not constitute personal data as such, but the ability to link this information directly to an individual or to other information makes it possible to identify what the person actually likes; furthermore, it enables inferences to be drawn about their personality, possibly revealing even sensitive political or religious preferences (as was the case in the Cambridge study). “Companies may consider most of their data to be non personal data sets, but in reality it is now rare for data generated by user activity to be completely and irreversibly anonymised,” stated the EDPS in a recent Opinion.‡ The availability of massive amounts of data from different sources combined with the desire to learn more about people’s habits therefore poses a serious challenge regarding the right to privacy of the individual and requires that the data protection principles are carefully taken into consideration.

A fundamental part of big data analytics, however, is that the raw data must be accurate in order to lead to accurate results; massive quantities of inaccurate data can lead to skewed results and poor decision making. Bruce Schneier, an internationally renowned security technologist, refers to this as the “pollution problem of the information age.”§ There is a risk that analytical applications find patterns in cases where the individual facts are not directly correlated, which may lead to unfair conclusions and may adversely affect the persons involved. Another risk is that of being trapped in an “information bubble,” with people only being shown certain information that has been predicted to be of ...

Cover
Half Title
Series Page
Title Page
Copyright Page
Contents
Foreword
Preface
Acknowledgments
Editors
Contributors
Chapter 1: Big Data: Legal Compliance and Quality Management
Chapter 2: Energy Management for Green Big Data Centers
Chapter 3: The Art of In-Memory Computing for Big Data Processing
Chapter 4: Scheduling Nested Transactions on In-Memory Data Grids
Chapter 5: Co-Scheduling High-Performance Computing Applications
Chapter 6: Resource Management for MapReduce Jobs Performing Big Data Analytics
Chapter 7: Tyche: An Efficient Ethernet-Based Protocol for Converged Networked Storage
Chapter 8: Parallel Backpropagation Neural Network for Big Data Processing on Many-Core Platform
Chapter 9: SQL-on-Hadoop Systems: State-of-the-Art Exploration, Models, Performances, Issues, and Recommendations
Chapter 10: One Platform Rules All: From Hadoop 1.0 to Hadoop 2.0 and Spark
Chapter 11: Security, Privacy, and Trust for User-Generated Content: The Challenges and Solutions
Chapter 12: Role of Real-Time Big Data Processing in the Internet of Things
Chapter 13: End-to-End Security Framework for Big Sensing Data Stream
Chapter 14: Considerations on the Use of Custom Accelerators for Big Data Analytics
Chapter 15: Complex Mining from Uncertain Big Data in Distributed Environments: Problems, Definitions, and Two Effective and Efficient Algorithms
Chapter 16: Clustering in Big Data
Chapter 17: Large Graph Computing Systems
Chapter 18: Big Data in Genomics
Chapter 19: Maximizing the Return on Investment in Big Data Projects: An Approach Based upon the Incremental Funding of Project Development
Chapter 20: Parallel Data Mining and Applications in Hospital Big Data Processing
Chapter 21: Big Data in the Parking Lot
Index

Frequently asked questions

Is Big Data Management and Processing an online PDF/ePUB?

Yes, you can access Big Data Management and Processing by Kuan-Ching Li, Hai Jiang, and Albert Y. Zomaya in PDF and/or ePUB format, as well as other popular books in Computer Science & Statistics for Business & Economics.
