eBook - ePub

Hadoop Essentials

Name: Hadoop Essentials
ISBN: 9781784396688

Shiva Achari

194 pages
English
ePUB (mobile friendly)
Publisher

Packt Publishing

Year

2015

Topic

Computer Science

eBook ISBN

9781784396688

Subtopic

Business Intelligence

Index

Hadoop Essentials

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Introduction to Big Data and Hadoop

V's of big data

Volume

Velocity

Variety

Understanding big data

NoSQL

Types of NoSQL databases

Analytical database

Who is creating big data?

Big data use cases

Big data use case patterns

Big data as a storage pattern

Big data as a data transformation pattern

Big data for a data analysis pattern

Big data for data in a real-time pattern

Big data for a low latency caching pattern

Hadoop

Hadoop history

Description

Advantages of Hadoop

Uses of Hadoop

Hadoop ecosystem

Apache Hadoop

Hadoop distributions

Pillars of Hadoop

Data access components

Data storage component

Data ingestion in Hadoop

Streaming and real-time analysis

Summary

2. Hadoop Ecosystem

Traditional systems

Database trend

The Hadoop use cases

Hadoop's basic data flow

Hadoop integration

The Hadoop ecosystem

Distributed filesystem

HDFS

Distributed programming

NoSQL databases

Apache HBase

Data ingestion

Service programming

Apache YARN

Apache Zookeeper

Scheduling

Data analytics and machine learning

System management

Apache Ambari

Summary

3. Pillars of Hadoop – HDFS, MapReduce, and YARN

HDFS

Features of HDFS

HDFS architecture

NameNode

DataNode

Checkpoint NameNode or Secondary NameNode

BackupNode

Data storage in HDFS

Read pipeline

Write pipeline

Rack awareness

Advantages of rack awareness in HDFS

HDFS federation

Limitations of HDFS 1.0

The benefit of HDFS federation

HDFS ports

HDFS commands

MapReduce

The MapReduce architecture

JobTracker

TaskTracker

Serialization data types

The Writable interface

WritableComparable interface

The MapReduce example

The MapReduce process

Mapper

Shuffle and sorting

Reducer

Speculative execution

FileFormats

InputFormats

RecordReader

OutputFormats

RecordWriter

Writing a MapReduce program

Mapper code

Reducer code

Driver code

Auxiliary steps

Combiner

Partitioner

Custom partitioner

YARN

YARN architecture

ResourceManager

NodeManager

ApplicationMaster

Applications powered by YARN

Summary

4. Data Access Components – Hive and Pig

Need of a data processing tool on Hadoop

Pig

Pig data types

The Pig architecture

The logical plan

The physical plan

The MapReduce plan

Pig modes

Grunt shell

Input data

Loading data

Dump

Store

FOREACH generate

Filter

Group By

Limit

Aggregation

Cogroup

DESCRIBE

EXPLAIN

ILLUSTRATE

Hive

The Hive architecture

Metastore

The Query compiler

The Execution engine

Data types and schemas

Installing Hive

Starting Hive shell

HiveQL

DDL (Data Definition Language) operations

DML (Data Manipulation Language) operations

The SQL operation

Joins

Aggregations

Built-in functions

Custom UDF (User Defined Functions)

Managing tables – external versus managed

SerDe

Partitioning

Bucketing

Summary

5. Storage Component – HBase

An Overview of HBase

Advantages of HBase

The Architecture of HBase

MasterServer

RegionServer

WAL

BlockCache

LRUBlockCache

SlabCache

BucketCache

Regions

MemStore

Zookeeper

The HBase data model

Logical components of a data model

ACID properties

The CAP theorem

The Schema design

The Write pipeline

The Read pipeline

Compaction

The Compaction policy

Minor compaction

Major compaction

Splitting

Pre-Splitting

Auto Splitting

Forced Splitting

Commands

help

Create

List

Put

Scan

Get

Disable

Drop

HBase Hive integration

Performance tuning

Compression

Filters

Counters

HBase coprocessors

Summary

6. Data Ingestion in Hadoop – Sqoop and Flume

Data sources

Challenges in data ingestion

Sqoop

Connectors and drivers

Sqoop 1 architecture

Limitation of Sqoop 1

Sqoop 2 architecture

Imports

Exports

Apache Flume

Reliability

Flume architecture

Multitier topology

Flume master

Flume nodes

Components in Agent

Source

Sink

Channels

Memory channel

File Channel

JDBC Channel

Examples of configuring Flume

The Single agent example

Multiple flows in an agent

Configuring a multiagent setup

Summary

7. Streaming and Real-time Analysis – Storm and Spark

An introduction to Storm

Features of Storm

Physical architecture of Storm

Data architecture of Storm

Storm topology

Storm on YARN

Topology configuration example

Spouts

Bolts

Topology

An introduction to Spark

Features of Spark

Spark framework

Spark SQL

GraphX

MLlib

Spark streaming

Spark architecture

Directed Acyclic Graph engine

Resilient Distributed Dataset

Physical architecture

Operat...
