Practical Big Data Analytics
eBook - ePub

Practical Big Data Analytics

Nataraj Dasgupta, Giancarlo Zaccone, Patrick Hannah

  1. 412 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

Get command of your organizational Big Data using the power of data science and analytics

Key Features

  • A perfect companion to boost your Big Data storage, processing, and analysis skills and help you make informed business decisions
  • Work with the best tools such as Apache Hadoop, R, Python, and Spark for NoSQL platforms to perform massive online analyses
  • Get expert tips on statistical inference, machine learning, mathematical modeling, and data visualization for Big Data

Book Description

Big Data analytics relates to the strategies used by organizations to collect, organize and analyze large amounts of data to uncover valuable business insights that otherwise cannot be analyzed through traditional systems. Crafting an enterprise-scale cost-efficient Big Data and machine learning solution to uncover insights and value from your organization's data is a challenge. Today, with hundreds of new Big Data systems, machine learning packages and BI Tools, selecting the right combination of technologies is an even greater challenge. This book will help you do that.

With the help of this guide, you will be able to bridge the gap between the theoretical world of technology and the practical ground reality of building corporate Big Data and data science platforms. You will get hands-on exposure to Hadoop and Spark, build machine learning dashboards using R and R Shiny, create web-based apps using NoSQL databases such as MongoDB, and even learn how to write R code for neural networks.

By the end of the book, you will have a very clear and concrete understanding of what Big Data analytics means, how it drives revenues for organizations, and how you can develop your own Big Data analytics solution using different tools and methods articulated in this book.

What you will learn

  • Get a 360-degree view into the world of Big Data, data science and machine learning
  • Explore a broad range of technical and business Big Data analytics topics that cater to the interests of technical experts as well as corporate IT executives
  • Get hands-on experience with industry-standard Big Data and machine learning tools such as Hadoop, Spark, MongoDB, KDB+ and R
  • Create production-grade machine learning BI Dashboards using R and R Shiny with step-by-step instructions
  • Learn how to combine open-source Big Data, machine learning and BI Tools to create low-cost business analytics applications
  • Understand corporate strategies for successful Big Data and data science projects
  • Go beyond general-purpose analytics to develop cutting-edge Big Data applications using emerging technologies

Who this book is for

The book is intended for existing and aspiring Big Data professionals who wish to become the go-to person in their organization when it comes to Big Data architecture, analytics, and governance. While no prior knowledge of Big Data or related technologies is assumed, it will be helpful to have some programming experience.

Information

Year: 2018
ISBN: 9781783554409
Edition: 1
Subtopic: Data Processing

Big Data Mining with NoSQL

The term NoSQL was first used by Carlo Strozzi, who, in 1998, released the Strozzi NoSQL open-source relational database. In the late 2000s, new paradigms in database architecture emerged, many of which did not adhere to the strict constraints required of relational database systems. These databases, due to their non-conformity with standard database conventions such as ACID compliance, were soon grouped under a broad category known as NoSQL.
Each NoSQL database claims to be optimal for certain use cases. Although few of them would fit the requirements to be a general-purpose database management system, they all share a few common themes that run across the spectrum of NoSQL systems.
In this chapter, we will visit some of the broad categories of NoSQL database management systems. We will discuss the primary drivers that initiated the migration to NoSQL database systems and how such databases solved specific business needs that led to their widespread adoption, and conclude with a few hands-on NoSQL exercises.
The topics covered in this chapter include:
  • Why NoSQL?
  • NoSQL databases
  • In-memory databases
  • Columnar databases
  • Document-oriented databases
  • Key-value databases
  • Graph databases
  • Other NoSQL types and summary
  • Hands-on exercise on NoSQL systems

Why NoSQL?

The term NoSQL generally means Not Only SQL: that is, the underlying database has properties that differ from those of common, traditional database systems. There is no clear distinction that qualifies a database as NoSQL, other than the fact that such databases generally do not provide the full characteristics of ACID compliance. It is therefore helpful to understand the nature of the ACID properties that have been the mainstay of database systems for many decades, as well as to discuss, in brief, the significance of BASE and CAP, two other terms central to databases today.

The ACID, BASE, and CAP properties

Let's first proceed with ACID and SQL.

ACID and SQL

ACID stands for atomicity, consistency, isolation, and durability:
  • Atomicity: This indicates that database transactions either execute in full or do not execute at all. In other words, either all the operations within a transaction are committed, that is, persisted in their entirety, or none of them are. There is no scope for a partial execution of a transaction.
  • Consistency: The constraints on the data, that is, the rules that determine what counts as valid data within a database, are enforced uniformly throughout the database. Every committed transaction leaves the database in a state that satisfies these rules, and no instance of the database abides by rules that differ from those in other instances.
  • Isolation: This property defines the rules of how concurrent operations (transactions) will read and write data. For example, if a certain record is being updated while another process reads the same record, the isolation level of the database system will determine which version of the data is returned to the user.
  • Durability: The durability of a database system generally indicates that committed transactions will remain persistent even in the event of a system failure. This is generally managed by the use of transaction logs that databases can refer to during recovery.
The reader may observe that all the properties defined here relate primarily to database transactions. A transaction is a unit of operation that abides by the aforementioned rules and makes a change to the database. For example, a typical cash withdrawal from an ATM may have the following logical pathway:
  1. User withdraws cash from an ATM
  2. The bank checks the current balance of the user
  3. The database system deducts the corresponding amount from the user's account
  4. The database system updates the amount in the user's account to reflect the change
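The withdrawal pathway above can be sketched as a single atomic transaction. This is a minimal illustration using Python's built-in sqlite3 module, not the way any particular bank implements it; the table names (accounts, withdrawals), the withdraw and balance helpers, and the simulate_crash flag are all hypothetical and exist only to show that a failure partway through leaves no partial change behind.

```python
import sqlite3


def withdraw(conn, account_id, amount, simulate_crash=False):
    """Check the balance, deduct the amount, and log the withdrawal atomically."""
    # `with conn` commits if the block finishes and rolls back if it raises.
    with conn:
        row = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if row is None or row[0] < amount:
            raise ValueError("unknown account or insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        if simulate_crash:
            # Hypothetical failure between the deduction and the log entry.
            raise RuntimeError("simulated failure after deduction")
        conn.execute(
            "INSERT INTO withdrawals (account_id, amount) VALUES (?, ?)",
            (account_id, amount),
        )


def balance(conn, account_id):
    """Return the current balance of the given account."""
    return conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()[0]


# Hypothetical in-memory schema with one account holding 100 units.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("CREATE TABLE withdrawals (account_id INTEGER, amount INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

withdraw(conn, 1, 30)  # completes in full: balance becomes 70
try:
    withdraw(conn, 1, 30, simulate_crash=True)  # fails after the deduction
except RuntimeError:
    pass

print(balance(conn, 1))  # 70: the second deduction was rolled back
```

The second call deducts the amount and then fails before the withdrawal is logged. Because both statements belong to one transaction, the deduction is undone as well, which is the all-or-nothing behavior that atomicity describes.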
Most databases in popular use prior to the mid-1990s, such as Oracle, Sybase, DB2, and others, were optimized for recording and managing transactional data of this kind. The rapid growth of the internet from the mid-1990s onward led to new types of data that did not necessarily require strict ACID compliance. Videos on YouTube, music on Pandora, and corporate email records are all examples of use cases where a transactional database does not add value beyond simply functioning as a technology layer for storing data.

The BASE property of NoSQL

By the late 2000s, data volume had surged and it was apparent that an alternative model was required in order to manage the data. This new model, called BASE, became a foundational topic and, for many NoSQL systems, replaced ACID as the preferred model of database management.
BASE stands for Basically Available, Soft state, Eventual consistency. Basically available implies that the database is available for use most of the time; that is, there can be periods during which the services are unavailable (and hence additional redundancy measures should be implemented). Soft state means that the state of the system cannot be guaranteed: different instances of the same data might have different content, as they may not yet have captured recent updates made in another part of the cluster. Finally, eventual consistency implies that although the database might not be in the same state at all times, it will eventually get to the same state; that is, become consistent.
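Soft state and eventual consistency can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration rather than the mechanism of any real NoSQL product: the Replica class, the integer version numbers, and the anti_entropy function are hypothetical names, and the last-write-wins merge stands in for the more elaborate reconciliation schemes that production systems use.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node in a toy cluster; stores key -> (version, value)."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, version):
        # Accept the write only if it is newer than what this replica holds.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Synchronize all replicas, keeping the highest version of each key."""
    merged = {}
    for replica in replicas:
        for key, (version, value) in replica.data.items():
            if key not in merged or version > merged[key][0]:
                merged[key] = (version, value)
    for replica in replicas:
        replica.data = dict(merged)


a, b = Replica("a"), Replica("b")
a.write("balance", 100, version=1)
b.write("balance", 100, version=1)

# A newer update reaches replica a only (soft state: the replicas now differ).
a.write("balance", 70, version=2)
stale = b.read("balance")   # b still returns the old value

# After synchronization the replicas converge (eventual consistency).
anti_entropy([a, b])
fresh = b.read("balance")

print(stale, fresh)  # 100 70
```

Between the update and the synchronization step, a reader that contacts replica b sees the older value. A BASE system accepts that window of inconsistency in exchange for keeping every replica available for reads and writes.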

The CAP theorem

First introduced in the late 199...
