Practical Big Data Analytics

Nataraj Dasgupta, Giancarlo Zaccone, Patrick Hannah

412 pages, English

About this book

Get command of your organizational Big Data using the power of data science and analytics

Key Features

  • A perfect companion to boost your Big Data storage, processing, and analysis skills and help you make informed business decisions
  • Work with the best tools such as Apache Hadoop, R, Python, and Spark for NoSQL platforms to perform massive online analyses
  • Get expert tips on statistical inference, machine learning, mathematical modeling, and data visualization for Big Data

Book Description

Big Data analytics relates to the strategies used by organizations to collect, organize, and analyze large amounts of data to uncover valuable business insights that could not otherwise be obtained through traditional systems. Crafting an enterprise-scale, cost-efficient Big Data and machine learning solution to uncover insights and value from your organization's data is a challenge. Today, with hundreds of new Big Data systems, machine learning packages, and BI tools, selecting the right combination of technologies is an even greater challenge. This book will help you do that.

With the help of this guide, you will be able to bridge the gap between the theoretical world of technology and the practical ground reality of building corporate Big Data and data science platforms. You will get hands-on exposure to Hadoop and Spark, build machine learning dashboards using R and R Shiny, create web-based apps using NoSQL databases such as MongoDB, and even learn how to write R code for neural networks.

By the end of the book, you will have a very clear and concrete understanding of what Big Data analytics means, how it drives revenues for organizations, and how you can develop your own Big Data analytics solution using different tools and methods articulated in this book.

What you will learn

  • Get a 360-degree view into the world of Big Data, data science, and machine learning
  • Explore a broad range of technical and business Big Data analytics topics that cater to the interests of technical experts as well as corporate IT executives
  • Get hands-on experience with industry-standard Big Data and machine learning tools such as Hadoop, Spark, MongoDB, KDB+, and R
  • Create production-grade machine learning BI dashboards using R and R Shiny with step-by-step instructions
  • Learn how to combine open-source Big Data, machine learning, and BI tools to create low-cost business analytics applications
  • Understand corporate strategies for successful Big Data and data science projects
  • Go beyond general-purpose analytics to develop cutting-edge Big Data applications using emerging technologies

Who this book is for

The book is intended for existing and aspiring Big Data professionals who wish to become the go-to person in their organization when it comes to Big Data architecture, analytics, and governance. While no prior knowledge of Big Data or related technologies is assumed, it will be helpful to have some programming experience.


Information

Year
2018
ISBN
9781783554409

Big Data Mining with NoSQL

The term NoSQL was first used by Carlo Strozzi, who, in 1998, released the Strozzi NoSQL open-source relational database. In the late 2000s, new paradigms in database architecture emerged, many of which did not adhere to the strict constraints required of relational database systems. Because they did not conform to standard database conventions such as ACID compliance, these databases were soon grouped under a broad category known as NoSQL.
Each NoSQL database claims to be optimal for certain use cases. Although few of them meet the requirements of a general-purpose database management system, they share a few common themes across the spectrum of NoSQL systems.
In this chapter, we will visit some of the broad categories of NoSQL database management systems. We will discuss the primary drivers behind the migration to NoSQL database systems and how such databases solved the specific business needs that led to their widespread adoption, and we will conclude with a few hands-on NoSQL exercises.
The topics covered in this chapter include:
  • Why NoSQL?
  • NoSQL databases
  • In-memory databases
  • Columnar databases
  • Document-oriented databases
  • Key-value databases
  • Graph databases
  • Other NoSQL types and summary
  • Hands-on exercise on NoSQL systems

Why NoSQL?

The term NoSQL is generally taken to mean Not Only SQL: that is, the underlying database has properties that differ from those of common, traditional database systems. There is no clear criterion that qualifies a database as NoSQL, other than the fact that such databases typically do not provide full ACID compliance. It is therefore helpful to understand the ACID properties that have been the mainstay of database systems for many decades, and to discuss briefly the significance of BASE and CAP, two other terms central to databases today.

The ACID, BASE, and CAP properties

Let's first proceed with ACID and SQL.

ACID and SQL

ACID stands for atomicity, consistency, isolation, and durability:
  • Atomicity: This indicates that a database transaction either executes in full or does not execute at all. In other words, either all of the operations in a transaction are committed, that is, persisted in their entirety, or none of them are. There is no scope for partial execution of a transaction.
  • Consistency: Every transaction takes the database from one valid state to another, so the constraints on the data, that is, the rules that govern data management within the database, hold consistently throughout the database. Different instances will not abide by rules that differ from those of other instances of the database.
  • Isolation: This property defines the rules for how concurrent operations (transactions) read and write data. For example, if a record is being updated while another process reads the same record, the isolation level of the database system determines which version of the data is returned to the reader.
  • Durability: The durability of a database system generally indicates that committed transactions will remain persistent even in the event of a system failure. This is generally managed by the use of transaction logs that databases can refer to during recovery.
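To make atomicity concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a transactional database. The accounts table, the account names, and the transfer helper are hypothetical illustrations, not taken from the text. The credit is applied first and the debit second; when the debit violates the CHECK constraint, the whole transaction is rolled back, so the credit that already ran is undone as well.

```python
import sqlite3


def make_db():
    """Create a hypothetical in-memory accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction.

    Returns True if the transaction committed, False if it was rolled back."""
    try:
        # The connection's context manager commits on success and
        # rolls back on an exception, so both updates stand or fall together.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK (balance >= 0) constraint failed; the earlier credit was undone.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = make_db()
transfer(conn, "alice", "bob", 30)   # commits: alice 70, bob 80
transfer(conn, "bob", "alice", 500)  # rolls back: balances stay alice 70, bob 80
```

Durability is not shown here because the database lives only in memory; a file-backed connection would be needed for committed data to survive a restart.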
The reader may observe that all the properties defined here relate primarily to database transactions. A transaction is a unit of operation that abides by the aforementioned rules and makes a change to the database. For example, a typical cash withdrawal from an ATM may have the following logical pathway:
  1. User requests a cash withdrawal from an ATM
  2. The bank checks the current balance of the user
  3. The database system deducts the corresponding amount from the user's account
  4. The database system updates the amount in the user's account to reflect the change
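The withdrawal pathway above can be sketched as a single transaction, again assuming Python's sqlite3 module as the database; the table, the owner column, and the withdraw function are hypothetical. Reading the balance and writing the new one inside the same transaction is what keeps two simultaneous withdrawals from both seeing the old balance.

```python
import sqlite3


def open_bank():
    """Create a hypothetical in-memory bank with one account."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # transaction boundaries below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (owner TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
    return conn


def withdraw(conn, owner, amount):
    """Run steps 2-4 of the ATM pathway as a single transaction.

    Returns the new balance, or None if the withdrawal is refused."""
    # Step 1 (the user's request) is this function call itself.
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before reading
    try:
        # Step 2: check the current balance.
        row = conn.execute(
            "SELECT balance FROM accounts WHERE owner = ?", (owner,)
        ).fetchone()
        if row is None or row[0] < amount:
            conn.execute("ROLLBACK")  # nothing changed; end the transaction
            return None
        # Step 3: deduct the amount.
        new_balance = row[0] - amount
        # Step 4: persist the new balance, then commit.
        conn.execute(
            "UPDATE accounts SET balance = ? WHERE owner = ?", (new_balance, owner)
        )
        conn.execute("COMMIT")
        return new_balance
    except Exception:
        # On any error, undo whatever part of the transaction ran.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


conn = open_bank()
print(withdraw(conn, "alice", 30))   # 70
print(withdraw(conn, "alice", 500))  # None: insufficient funds, nothing changed
```

BEGIN IMMEDIATE is used instead of a plain BEGIN so that the write lock is taken before the balance is read; with a deferred transaction, two connections could both read the same balance before either one writes.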
Most databases in popular use prior to the mid-1990s, such as Oracle, Sybase, and DB2, were optimized for recording and managing transactional data, which was their primary responsibility. The rapid growth of the internet in the mid-1990s led to new types of data that did not necessarily require strict ACID compliance. Videos on YouTube, music on Pandora, and corporate email records are all examples of use cases where a transactional database does not add value beyond simply functioning as a technology layer for storing data.

The BASE property of NoSQL

By the late 2000s, data volumes had surged, and it was apparent that a new model was required to manage the data. This new model, called BASE, became a foundational concept that, for many newer systems, replaced ACID as the preferred model of database management.
BASE stands for Basically Available, Soft state, Eventual consistency. Basically available means that the database is available for use most of the time; that is, there can be periods during which the services are unavailable (and hence additional redundancy measures should be implemented). Soft state means that the state of the system cannot be guaranteed at any given moment: different instances of the same data might have different content, as some of them may not yet have captured recent updates made in another part of the cluster. Finally, eventual consistency implies that although the instances of the database might not be in the same state at all times, they will eventually reach the same state; that is, become consistent.
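Soft state and eventual consistency can be illustrated with a toy model of replicas that accept writes independently and later reconcile. This is a minimal sketch, not how any particular NoSQL product works: the Replica class and anti_entropy_round function are hypothetical, reconciliation uses a simple last-write-wins rule, and timestamps are assumed to be unique (ties are not broken).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; maps key -> (timestamp, value)."""
    name: str
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Keep the write only if it is newer than what this replica holds
        # (last write wins; timestamps are assumed unique).
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Anti-entropy: pull every entry from another replica.
        for key, (ts, value) in other.store.items():
            self.write(key, value, ts)


def anti_entropy_round(replicas):
    # Every replica pulls from every other one; afterwards each holds
    # the newest version of every key, so all replicas agree.
    for a in replicas:
        for b in replicas:
            if a is not b:
                a.merge_from(b)


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
r1.write("cart:42", ["book"], ts=1)
r2.write("cart:42", ["book", "pen"], ts=2)
print(r3.read("cart:42"))  # None: soft state, r3 has seen neither write
anti_entropy_round([r1, r2, r3])
print([r.read("cart:42") for r in (r1, r2, r3)])  # all ['book', 'pen']
```

Before the round, the three replicas return three different answers for the same key; after it, they converge on the most recent write, which is the behavior the term eventually consistent describes.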

The CAP theorem

First introduced in the late 199...
