Practical Big Data Analytics
eBook - ePub

Nataraj Dasgupta, Giancarlo Zaccone, Patrick Hannah

  1. 412 pages
  2. English

About the book

Get command of your organizational Big Data using the power of data science and analytics

Key Features

  • A perfect companion to boost your Big Data storage, processing, and analysis skills and help you make informed business decisions
  • Work with the best tools such as Apache Hadoop, R, Python, and Spark for NoSQL platforms to perform massive online analyses
  • Get expert tips on statistical inference, machine learning, mathematical modeling, and data visualization for Big Data

Book Description

Big Data analytics relates to the strategies used by organizations to collect, organize and analyze large amounts of data to uncover valuable business insights that otherwise cannot be analyzed through traditional systems. Crafting an enterprise-scale cost-efficient Big Data and machine learning solution to uncover insights and value from your organization's data is a challenge. Today, with hundreds of new Big Data systems, machine learning packages and BI Tools, selecting the right combination of technologies is an even greater challenge. This book will help you do that.

With the help of this guide, you will be able to bridge the gap between the theoretical world of technology and the practical ground reality of building corporate Big Data and data science platforms. You will get hands-on exposure to Hadoop and Spark, build machine learning dashboards using R and R Shiny, create web-based apps using NoSQL databases such as MongoDB, and even learn how to write R code for neural networks.

By the end of the book, you will have a very clear and concrete understanding of what Big Data analytics means, how it drives revenues for organizations, and how you can develop your own Big Data analytics solution using different tools and methods articulated in this book.

What you will learn

  • Get a 360-degree view into the world of Big Data, data science and machine learning
  • Explore a broad range of technical and business Big Data analytics topics that cater to the interests of technical experts as well as corporate IT executives
  • Get hands-on experience with industry-standard Big Data and machine learning tools such as Hadoop, Spark, MongoDB, KDB+ and R
  • Create production-grade machine learning BI Dashboards using R and R Shiny with step-by-step instructions
  • Learn how to combine open-source Big Data, machine learning and BI Tools to create low-cost business analytics applications
  • Understand corporate strategies for successful Big Data and data science projects
  • Go beyond general-purpose analytics to develop cutting-edge Big Data applications using emerging technologies

Who this book is for

The book is intended for existing and aspiring Big Data professionals who wish to become the go-to person in their organization when it comes to Big Data architecture, analytics, and governance. While no prior knowledge of Big Data or related technologies is assumed, it will be helpful to have some programming experience.

Information

Year
2018
ISBN
9781783554409
Edition
1

Big Data Mining with NoSQL

The term NoSQL was first used by Carlo Strozzi, who, in 1998, released the Strozzi NoSQL open-source relational database. In the late 2000s, new paradigms in database architecture emerged, many of which did not adhere to the strict constraints required of relational database systems. These databases, due to their non-conformity with standard database conventions such as ACID compliance, were soon grouped under a broad category known as NoSQL.
Each NoSQL database claims to be optimal for certain use cases. Although few of them meet the requirements of a general-purpose database management system, they share a few common themes across the spectrum of NoSQL systems.
In this chapter, we will visit some of the broad categories of NoSQL database management systems. We will discuss the primary drivers that initiated the migration to NoSQL database systems and how such databases solved specific business needs that led to their widespread adoption, and conclude with a few hands-on NoSQL exercises.
The topics covered in this chapter include:
  • Why NoSQL?
  • NoSQL databases
  • In-memory databases
  • Columnar databases
  • Document-oriented databases
  • Key-value databases
  • Graph databases
  • Other NoSQL types and summary
  • Hands-on exercise on NoSQL systems

Why NoSQL?

The term NoSQL generally means Not Only SQL: that is, the underlying database has properties that differ from those of common and traditional database systems. There is no clear criterion that qualifies a database as NoSQL, other than that such databases generally do not provide full ACID compliance. It is therefore helpful to understand the ACID properties that have been the mainstay of database systems for many decades, and to discuss, in brief, the significance of BASE and CAP, two other terms central to databases today.

The ACID, BASE, and CAP properties

Let's first proceed with ACID and SQL.

ACID and SQL

ACID stands for atomicity, consistency, isolation, and durability:
  • Atomicity: This indicates that database transactions either execute in full or do not execute at all. In other words, either all the operations in a transaction are committed, that is, persisted in their entirety, or none of them are. There is no scope for a partial execution of a transaction.
  • Consistency: The constraints on the data, that is, the rules that determine data management within a database, are enforced uniformly throughout the database. A transaction can only take the database from one valid state to another, and no instance of the database abides by rules that differ from those of other instances.
  • Isolation: This property defines the rules of how concurrent operations (transactions) will read and write data. For example, if a certain record is being updated while another process reads the same record, the isolation level of the database system will determine which version of the data is returned to the user.
  • Durability: The durability of a database system generally indicates that committed transactions will remain persistent even in the event of a system failure. This is generally managed by the use of transaction logs that databases can refer to during recovery.
The reader may observe that all the properties defined here relate primarily to database transactions. A transaction is a unit of operation that abides by the aforementioned rules and makes a change to the database. For example, a typical cash withdrawal from an ATM may have the following logical pathway:
  1. The user requests a cash withdrawal at an ATM
  2. The bank checks the current balance of the user
  3. The database system deducts the corresponding amount from the user's account
  4. The database system updates the amount in the user's account to reflect the change
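The withdrawal pathway above can be sketched as a single atomic transaction. This is a minimal illustration using Python's built-in sqlite3 module, not the system a bank would actually run. The table names (accounts, withdrawals), the helper functions, and the use of a CHECK constraint to stand in for the balance check are all assumptions made for this example.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create a hypothetical in-memory bank with one account holding 100."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint is the consistency rule: a balance may never go negative.
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute(
        "CREATE TABLE withdrawals ("
        "account_id INTEGER NOT NULL, amount INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    conn.commit()
    return conn


def withdraw(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    """Return True if the withdrawal was committed, False if it was rolled back."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            # Record the withdrawal first, then deduct it from the account.
            conn.execute(
                "INSERT INTO withdrawals (account_id, amount) VALUES (?, ?)",
                (account_id, amount),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the update, so the INSERT is undone too.
        return False


def balance(conn: sqlite3.Connection, account_id: int) -> int:
    """Read the current balance of one account."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]


def withdrawal_count(conn: sqlite3.Connection) -> int:
    """Count the withdrawals that were actually persisted."""
    return conn.execute("SELECT COUNT(*) FROM withdrawals").fetchone()[0]
```

If an account holding 100 is asked for 500, the update violates the constraint and the whole transaction is rolled back, so no withdrawal record is left behind without the matching deduction. That all-or-nothing behavior is what atomicity refers to.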
Most databases in popular use prior to the mid-1990s, such as Oracle, Sybase, DB2, and others, were therefore optimized for recording and managing transactional data. The rapid growth of the internet from the mid-90s onward led to new types of data that did not necessarily require strict ACID compliance. Videos on YouTube, music on Pandora, and corporate email records are all examples of use cases where a transactional database does not add value beyond simply functioning as a technology layer for storing data.

The BASE property of NoSQL

By the late 2000s, data volume had surged and it was apparent that an alternative model was required in order to manage the data. This model, called BASE, became a foundational concept and, for many large-scale systems, displaced ACID as the preferred model of database management.
BASE stands for Basically Available, Soft state, Eventually consistent. Basically available implies that the database is available for use most of the time; that is, there can be periods during which the services are unavailable (and hence additional redundancy measures should be implemented). Soft state means that the state of the system cannot be guaranteed: different instances of the same data might have different content, as one instance may not yet have captured recent updates made in another part of the cluster. Finally, eventually consistent implies that although the instances might not be in the same state at all times, they will eventually get to the same state; that is, become consistent.
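The difference between soft state and eventual consistency can be illustrated with a small simulation. This is a toy sketch, not how any particular NoSQL product works. The Replica class, the sync function, and the rule that the highest version number wins are assumptions chosen for this example; real systems use a variety of replication and conflict-resolution mechanisms.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node's copy of the data, stored as key -> (version, value)."""
    store: dict = field(default_factory=dict)

    def write(self, key, value, version):
        # Accept a write only if it is newer than what this replica holds.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        # Return the locally known value, which may be stale.
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(replicas):
    """Propagate the newest known version of every key to all replicas."""
    newest = {}
    for replica in replicas:
        for key, (version, value) in replica.store.items():
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, value)
    for replica in replicas:
        for key, (version, value) in newest.items():
            replica.write(key, value, version)
```

Suppose two replicas both hold version 1 of a key, and version 2 is written to only one of them. Until sync runs, a read from the other replica still returns the old value (soft state). After sync, both replicas return the same value (eventual consistency).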

The CAP theorem

First introduced in the late 199...
