
- 194 pages
- English
- ePUB (mobile friendly)
Hadoop Essentials
Table of Contents
Hadoop Essentials
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction to Big Data and Hadoop
V's of big data
Volume
Velocity
Variety
Understanding big data
NoSQL
Types of NoSQL databases
Analytical database
Who is creating big data?
Big data use cases
Big data use case patterns
Big data as a storage pattern
Big data as a data transformation pattern
Big data for a data analysis pattern
Big data for data in a real-time pattern
Big data for a low latency caching pattern
Hadoop
Hadoop history
Description
Advantages of Hadoop
Uses of Hadoop
Hadoop ecosystem
Apache Hadoop
Hadoop distributions
Pillars of Hadoop
Data access components
Data storage component
Data ingestion in Hadoop
Streaming and real-time analysis
Summary
2. Hadoop Ecosystem
Traditional systems
Database trend
The Hadoop use cases
Hadoop's basic data flow
Hadoop integration
The Hadoop ecosystem
Distributed filesystem
HDFS
Distributed programming
NoSQL databases
Apache HBase
Data ingestion
Service programming
Apache YARN
Apache Zookeeper
Scheduling
Data analytics and machine learning
System management
Apache Ambari
Summary
3. Pillars of Hadoop – HDFS, MapReduce, and YARN
HDFS
Features of HDFS
HDFS architecture
NameNode
DataNode
Checkpoint NameNode or Secondary NameNode
BackupNode
Data storage in HDFS
Read pipeline
Write pipeline
Rack awareness
Advantages of rack awareness in HDFS
HDFS federation
Limitations of HDFS 1.0
The benefit of HDFS federation
HDFS ports
HDFS commands
MapReduce
The MapReduce architecture
JobTracker
TaskTracker
Serialization data types
The Writable interface
WritableComparable interface
The MapReduce example
The MapReduce process
Mapper
Shuffle and sorting
Reducer
Speculative execution
FileFormats
InputFormats
RecordReader
OutputFormats
RecordWriter
Writing a MapReduce program
Mapper code
Reducer code
Driver code
Auxiliary steps
Combiner
Partitioner
Custom partitioner
YARN
YARN architecture
ResourceManager
NodeManager
ApplicationMaster
Applications powered by YARN
Summary
4. Data Access Components – Hive and Pig
Need of a data processing tool on Hadoop
Pig
Pig data types
The Pig architecture
The logical plan
The physical plan
The MapReduce plan
Pig modes
Grunt shell
Input data
Loading data
Dump
Store
FOREACH generate
Filter
Group By
Limit
Aggregation
Cogroup
DESCRIBE
EXPLAIN
ILLUSTRATE
Hive
The Hive architecture
Metastore
The Query compiler
The Execution engine
Data types and schemas
Installing Hive
Starting Hive shell
HiveQL
DDL (Data Definition Language) operations
DML (Data Manipulation Language) operations
The SQL operation
Joins
Aggregations
Built-in functions
Custom UDF (User Defined Functions)
Managing tables – external versus managed
SerDe
Partitioning
Bucketing
Summary
5. Storage Component – HBase
An Overview of HBase
Advantages of HBase
The Architecture of HBase
MasterServer
RegionServer
WAL
BlockCache
LRUBlockCache
SlabCache
BucketCache
Regions
MemStore
Zookeeper
The HBase data model
Logical components of a data model
ACID properties
The CAP theorem
The Schema design
The Write pipeline
The Read pipeline
Compaction
The Compaction policy
Minor compaction
Major compaction
Splitting
Pre-Splitting
Auto Splitting
Forced Splitting
Commands
help
Create
List
Put
Scan
Get
Disable
Drop
HBase Hive integration
Performance tuning
Compression
Filters
Counters
HBase coprocessors
Summary
6. Data Ingestion in Hadoop – Sqoop and Flume
Data sources
Challenges in data ingestion
Sqoop
Connectors and drivers
Sqoop 1 architecture
Limitation of Sqoop 1
Sqoop 2 architecture
Imports
Exports
Apache Flume
Reliability
Flume architecture
Multitier topology
Flume master
Flume nodes
Components in Agent
Source
Sink
Channels
Memory channel
File Channel
JDBC Channel
Examples of configuring Flume
The Single agent example
Multiple flows in an agent
Configuring a multiagent setup
Summary
7. Streaming and Real-time Analysis – Storm and Spark
An introduction to Storm
Features of Storm
Physical architecture of Storm
Data architecture of Storm
Storm topology
Storm on YARN
Topology configuration example
Spouts
Bolts
Topology
An introduction to Spark
Features of Spark
Spark framework
Spark SQL
GraphX
MLlib
Spark streaming
Spark architecture
Directed Acyclic Graph engine
Resilient Distributed Dataset
Physical architecture
Operat...
Hadoop Essentials by Shiva Achari is available in PDF and/or ePUB format and is catalogued under Computer Science & Business Intelligence.