Big Data
eBook - ePub

Big Data

Principles and Paradigms

Rajkumar Buyya, Rodrigo N. Calheiros, Amir Vahid Dastjerdi

  1. 494 pages
  2. English
  3. ePUB (mobile friendly)

About This Book

Big Data: Principles and Paradigms captures the state-of-the-art research on the architectural aspects, technologies, and applications of Big Data. The book identifies potential future directions and technologies that facilitate insight into numerous scientific, business, and consumer applications.

To help realize Big Data's full potential, the book addresses numerous challenges, offering the conceptual and technological solutions for tackling them. These challenges include life-cycle data management, large-scale storage, flexible processing infrastructure, data modeling, scalable machine learning, data analysis algorithms, sampling techniques, and privacy and ethical issues.

  • Covers computational platforms supporting Big Data applications
  • Addresses key principles underlying Big Data computing
  • Examines key developments supporting next generation Big Data platforms
  • Explores the challenges in Big Data computing and ways to overcome them
  • Contains expert contributors from both academia and industry


Information

Year
2016
ISBN
9780128093467
Part I
Big Data Science
Chapter 1

Big Data Analytics = Machine Learning + Cloud Computing

C. Wu; R. Buyya; K. Ramamohanarao

Abstract

“Big Data” can mean different things to different people. The scale and challenges of Big Data are often described using three attributes, namely volume, velocity, and variety (3Vs), which reflect only some aspects of data. In this chapter, we review the history of the term “big data” and the associated analytics. We augment the 3Vs with additional attributes of big data to make the description more comprehensive and relevant. We show that Big Data is not just the 3Vs, but actually 3²Vs, that is, 9Vs. These cover the fundamental motivation behind Big Data, which is to incorporate business intelligence based on different hypotheses or statistical models so that Big Data analytics (BDA) can enable decision makers to make useful predictions for crucial decisions or research results. The history of Big Data has demonstrated that the most cost-effective way of performing BDA is to employ machine learning (ML) on cloud computing (CC)-based infrastructure, or simply, ML + CC → BDA. This chapter is devoted to helping decision makers by defining BDA as a solution and an opportunity to address their business needs.

Keywords

Big Data analytics (BDA); Business intelligence (BI); Machine learning (ML); Cloud computing (CC); Extraction, transformation, and load (ETL); Statistics; Hadoop; Spark; Flink; MapReduce

1.1 Introduction

Although the term “Big Data” has become popular, there is no general consensus about what it really means. Many professional data analysts take Big Data to mean the process of extraction, transformation, and load (ETL) applied to large datasets. A popular description of Big Data is based on three main attributes of data: volume, velocity, and variety (or 3Vs). Nevertheless, this description does not capture all aspects of Big Data accurately. To provide a comprehensive meaning of Big Data, we investigate the term from a historical perspective and see how it has evolved from yesterday’s meaning to today’s connotation.
Historically, the term Big Data is quite vague and ill defined. It is not a precise term and does not carry a particular meaning other than the notion of its size. The word “big” is too generic; the question of how “big” is big and how “small” is small [1] is relative to time, space, and circumstance. From an evolutionary perspective, the size of “Big Data” is always evolving. If we use the current global Internet traffic capacity [2] as a measuring stick, the meaning of Big Data volume would lie between the terabyte (TB, 10¹² or roughly 2⁴⁰ bytes) and zettabyte (ZB, 10²¹ or roughly 2⁷⁰ bytes) range. Based on the historical data traffic growth rate, Cisco claimed that humans entered the ZB era in 2015 [2]. To understand the significance of the data volume’s impact, let us glance at the average size of different data files shown in Table 1.
Table 1
Typical Size of Different Data Files
Media    | Average Size of Data File | Notes (2014)
Web page | 1.6–2 MB                  | Average 100 objects
eBook    | 1–5 MB                    | 200–350 pages
Song     | 3.5–5.8 MB                | Average 1.9 MB/min (MP3), 256 Kbps rate (3 mins)
Movie    | 100–120 GB                | 60 frames per second (MPEG-4 format, Full High Definition, 2 hours)
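The figures above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not part of the original chapter: it assumes a constant-bitrate MP3 stream, decimal megabytes (1 MB = 10⁶ bytes), and no metadata or container overhead, and the function name mp3_size_mb is hypothetical. It also compares the decimal and binary readings of TB and ZB used in the text.

```python
# Sanity check for the "Song" row of Table 1 and the TB/ZB magnitudes.
# Assumptions (not from the source): constant bitrate, decimal megabytes,
# no metadata or container overhead.

TB_DECIMAL = 10**12   # terabyte as a power of ten
TB_BINARY = 2**40     # nearest power of two
ZB_DECIMAL = 10**21   # zettabyte as a power of ten
ZB_BINARY = 2**70     # nearest power of two


def mp3_size_mb(bitrate_kbps: float, minutes: float) -> float:
    """Return the size in decimal MB of a constant-bitrate audio stream."""
    total_bits = bitrate_kbps * 1000 * minutes * 60  # kbit/s -> bits
    return total_bits / 8 / 1_000_000                # bits -> bytes -> MB


if __name__ == "__main__":
    print(f"256 kbps for 1 minute:  {mp3_size_mb(256, 1):.2f} MB")
    print(f"256 kbps for 3 minutes: {mp3_size_mb(256, 3):.2f} MB")
    print(f"2^40 / 10^12 = {TB_BINARY / TB_DECIMAL:.3f}")
    print(f"2^70 / 10^21 = {ZB_BINARY / ZB_DECIMAL:.3f}")
```

Under these assumptions, a 256 Kbps stream works out to about 1.9 MB per minute and just under 5.8 MB for three minutes, which agrees with the upper end of the range in the table. The power-of-two values are about 10% (TB) and 18% (ZB) larger than their power-of-ten counterparts, so the two notations are approximations of each other rather than exact equivalents.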
The main aim of this chapter is to provide a historical view of Big Data and to argue that it is not just 3Vs, but rather 3²Vs, or 9Vs. These additional Big Data attributes reflect the real motivation behind Big Data analytics (BDA). We believe that these expanded features clarify some basic questions about the essence of BDA: what problems Big Data can address, and what problems should not be confused with BDA. These issues are covered in the chapter through an analysis of historical developments, along with the associated technologies that support Big Data processing. The rest of the chapter is organized into eight sections as follows:
1) A historical review for Big Data
2) Interpretation of Big Data 3Vs, 4Vs, and 6Vs
3) Defining Big Data from 3Vs to 32Vs
4) Big Data and Machine Learning (ML)
5) Big Data and cloud computing
6) Hadoop, Hadoop distributed file system (HDFS), MapReduce, Spark, and Flink
7) ML + CC (Cloud Computing) → BDA and guidelines
8) Conclusion

1.2 A Historical Review of Big Data

In order to capture the essence of Big Data, we provide the origin and history of BDA and then propose a precise definition of BDA.

1.2.1 The Origin of Big Data

Several studies have been conducted on the historical views and developments in the BDA area. Gil Press [3] provided a short history of Big Data starting from 1944, which was based on Rider’s work [4]. He covered 68 years of the evolution of Big Data between 1944 and 2012 and illustrated 32 Big Data-related events in recent data science history. As Press indicated in his article, the fine line between the growth of data and Big Data has become blurred. Very often, the growth rate of data has been referred to as an “information explosion”; although “data” and “information” are often used interchangeably, the two terms have different connotations. Press’ study is quite comprehensive and covers BDA events up to December 2013. Since then, there have been many relevant Big Data events. Nevertheless, Press’ review did cover both Big Data and data science events. To this extent, the term “data science” could be considered complementary in meaning to BDA.
In comparison with Press’ re...
