eBook - ePub

Hadoop Essentials

Name: Hadoop Essentials
Author: Shiva Achari

194 pages
English
ePUB (available in the app)
Available on iOS and Android

Information

Publisher

Packt Publishing

Year

2015

ISBN

9781784396688

Subject

Computer Science

Category

Databases

Hadoop Essentials

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Introduction to Big Data and Hadoop

V's of big data

Volume

Velocity

Variety

Understanding big data

NoSQL

Types of NoSQL databases

Analytical database

Who is creating big data?

Big data use cases

Big data use case patterns

Big data as a storage pattern

Big data as a data transformation pattern

Big data for a data analysis pattern

Big data for data in a real-time pattern

Big data for a low latency caching pattern

Hadoop

Hadoop history

Description

Advantages of Hadoop

Uses of Hadoop

Hadoop ecosystem

Apache Hadoop

Hadoop distributions

Pillars of Hadoop

Data access components

Data storage component

Data ingestion in Hadoop

Streaming and real-time analysis

Summary

2. Hadoop Ecosystem

Traditional systems

Database trend

The Hadoop use cases

Hadoop's basic data flow

Hadoop integration

The Hadoop ecosystem

Distributed filesystem

HDFS

Distributed programming

NoSQL databases

Apache HBase

Data ingestion

Service programming

Apache YARN

Apache Zookeeper

Scheduling

Data analytics and machine learning

System management

Apache Ambari

Summary

3. Pillars of Hadoop – HDFS, MapReduce, and YARN

HDFS

Features of HDFS

HDFS architecture

NameNode

DataNode

Checkpoint NameNode or Secondary NameNode

BackupNode

Data storage in HDFS

Read pipeline

Write pipeline

Rack awareness

Advantages of rack awareness in HDFS

HDFS federation

Limitations of HDFS 1.0

The benefit of HDFS federation

HDFS ports

HDFS commands

MapReduce

The MapReduce architecture

JobTracker

TaskTracker

Serialization data types

The Writable interface

WritableComparable interface

The MapReduce example

The MapReduce process

Mapper

Shuffle and sorting

Reducer

Speculative execution

FileFormats

InputFormats

RecordReader

OutputFormats

RecordWriter

Writing a MapReduce program

Mapper code

Reducer code

Driver code

Auxiliary steps

Combiner

Partitioner

Custom partitioner

YARN

YARN architecture

ResourceManager

NodeManager

ApplicationMaster

Applications powered by YARN

Summary

4. Data Access Components – Hive and Pig

Need of a data processing tool on Hadoop

Pig

Pig data types

The Pig architecture

The logical plan

The physical plan

The MapReduce plan

Pig modes

Grunt shell

Input data

Loading data

Dump

Store

FOREACH generate

Filter

Group By

Limit

Aggregation

Cogroup

DESCRIBE

EXPLAIN

ILLUSTRATE

Hive

The Hive architecture

Metastore

The Query compiler

The Execution engine

Data types and schemas

Installing Hive

Starting Hive shell

HiveQL

DDL (Data Definition Language) operations

DML (Data Manipulation Language) operations

The SQL operation

Joins

Aggregations

Built-in functions

Custom UDF (User Defined Functions)

Managing tables – external versus managed

SerDe

Partitioning

Bucketing

Summary

5. Storage Component – HBase

An Overview of HBase

Advantages of HBase

The Architecture of HBase

MasterServer

RegionServer

WAL

BlockCache

LRUBlockCache

SlabCache

BucketCache

Regions

MemStore

Zookeeper

The HBase data model

Logical components of a data model

ACID properties

The CAP theorem

The Schema design

The Write pipeline

The Read pipeline

Compaction

The Compaction policy

Minor compaction

Major compaction

Splitting

Pre-Splitting

Auto Splitting

Forced Splitting

Commands

help

Create

List

Put

Scan

Get

Disable

Drop

HBase Hive integration

Performance tuning

Compression

Filters

Counters

HBase coprocessors

Summary

6. Data Ingestion in Hadoop – Sqoop and Flume

Data sources

Challenges in data ingestion

Sqoop

Connectors and drivers

Sqoop 1 architecture

Limitation of Sqoop 1

Sqoop 2 architecture

Imports

Exports

Apache Flume

Reliability

Flume architecture

Multitier topology

Flume master

Flume nodes

Components in Agent

Source

Sink

Channels

Memory channel

File Channel

JDBC Channel

Examples of configuring Flume

The Single agent example

Multiple flows in an agent

Configuring a multiagent setup

Summary

7. Streaming and Real-time Analysis – Storm and Spark

An introduction to Storm

Features of Storm

Physical architecture of Storm

Data architecture of Storm

Storm topology

Storm on YARN

Topology configuration example

Spouts

Bolts

Topology

An introduction to Spark

Features of Spark

Spark framework

Spark SQL

GraphX

MLlib

Spark streaming

Spark architecture

Directed Acyclic Graph engine

Resilient Distributed Dataset

Physical architecture

Operat...
