eBook - ePub

Hadoop Essentials

Name: Hadoop Essentials
Author: Shiva Achari

194 pages
English
ePUB (mobile-friendly)
Available on iOS and Android

Frequently asked questions

Is Hadoop Essentials available as an online PDF/ePub?

Yes, you can access Hadoop Essentials by Shiva Achari in PDF and/or ePub format, as well as other popular books in Computer Science & Databases. More than 1 million books are available in our catalog.

Information

Publisher

Packt Publishing

Year

2015

ISBN

9781784396688

Topic

Computer Science

Topic

Databases

Hadoop Essentials

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Introduction to Big Data and Hadoop

V's of big data

Volume

Velocity

Variety

Understanding big data

NoSQL

Types of NoSQL databases

Analytical database

Who is creating big data?

Big data use cases

Big data use case patterns

Big data as a storage pattern

Big data as a data transformation pattern

Big data for a data analysis pattern

Big data for data in a real-time pattern

Big data for a low latency caching pattern

Hadoop

Hadoop history

Description

Advantages of Hadoop

Uses of Hadoop

Hadoop ecosystem

Apache Hadoop

Hadoop distributions

Pillars of Hadoop

Data access components

Data storage component

Data ingestion in Hadoop

Streaming and real-time analysis

Summary

2. Hadoop Ecosystem

Traditional systems

Database trend

The Hadoop use cases

Hadoop's basic data flow

Hadoop integration

The Hadoop ecosystem

Distributed filesystem

HDFS

Distributed programming

NoSQL databases

Apache HBase

Data ingestion

Service programming

Apache YARN

Apache Zookeeper

Scheduling

Data analytics and machine learning

System management

Apache Ambari

Summary

3. Pillars of Hadoop – HDFS, MapReduce, and YARN

HDFS

Features of HDFS

HDFS architecture

NameNode

DataNode

Checkpoint NameNode or Secondary NameNode

BackupNode

Data storage in HDFS

Read pipeline

Write pipeline

Rack awareness

Advantages of rack awareness in HDFS

HDFS federation

Limitations of HDFS 1.0

The benefit of HDFS federation

HDFS ports

HDFS commands

MapReduce

The MapReduce architecture

JobTracker

TaskTracker

Serialization data types

The Writable interface

WritableComparable interface

The MapReduce example

The MapReduce process

Mapper

Shuffle and sorting

Reducer

Speculative execution

FileFormats

InputFormats

RecordReader

OutputFormats

RecordWriter

Writing a MapReduce program

Mapper code

Reducer code

Driver code

Auxiliary steps

Combiner

Partitioner

Custom partitioner

YARN

YARN architecture

ResourceManager

NodeManager

ApplicationMaster

Applications powered by YARN

Summary

4. Data Access Components – Hive and Pig

Need of a data processing tool on Hadoop

Pig

Pig data types

The Pig architecture

The logical plan

The physical plan

The MapReduce plan

Pig modes

Grunt shell

Input data

Loading data

Dump

Store

FOREACH generate

Filter

Group By

Limit

Aggregation

Cogroup

DESCRIBE

EXPLAIN

ILLUSTRATE

Hive

The Hive architecture

Metastore

The Query compiler

The Execution engine

Data types and schemas

Installing Hive

Starting Hive shell

HiveQL

DDL (Data Definition Language) operations

DML (Data Manipulation Language) operations

The SQL operation

Joins

Aggregations

Built-in functions

Custom UDF (User Defined Functions)

Managing tables – external versus managed

SerDe

Partitioning

Bucketing

Summary

5. Storage Component – HBase

An Overview of HBase

Advantages of HBase

The Architecture of HBase

MasterServer

RegionServer

WAL

BlockCache

LRUBlockCache

SlabCache

BucketCache

Regions

MemStore

Zookeeper

The HBase data model

Logical components of a data model

ACID properties

The CAP theorem

The Schema design

The Write pipeline

The Read pipeline

Compaction

The Compaction policy

Minor compaction

Major compaction

Splitting

Pre-Splitting

Auto Splitting

Forced Splitting

Commands

help

Create

List

Put

Scan

Get

Disable

Drop

HBase Hive integration

Performance tuning

Compression

Filters

Counters

HBase coprocessors

Summary

6. Data Ingestion in Hadoop – Sqoop and Flume

Data sources

Challenges in data ingestion

Sqoop

Connectors and drivers

Sqoop 1 architecture

Limitation of Sqoop 1

Sqoop 2 architecture

Imports

Exports

Apache Flume

Reliability

Flume architecture

Multitier topology

Flume master

Flume nodes

Components in Agent

Source

Sink

Channels

Memory channel

File Channel

JDBC Channel

Examples of configuring Flume

The Single agent example

Multiple flows in an agent

Configuring a multiagent setup

Summary

7. Streaming and Real-time Analysis – Storm and Spark

An introduction to Storm

Features of Storm

Physical architecture of Storm

Data architecture of Storm

Storm topology

Storm on YARN

Topology configuration example

Spouts

Bolts

Topology

An introduction to Spark

Features of Spark

Spark framework

Spark SQL

GraphX

MLlib

Spark streaming

Spark architecture

Directed Acyclic Graph engine

Resilient Distributed Dataset

Physical architecture

Operat...
