YARN Essentials

Amol Fasale, Nirmal Kumar

Table of Contents

YARN Essentials
About the Authors
About the Reviewers
1. Need for YARN
The redesign idea
Limitations of the classical MapReduce or Hadoop 1.x
YARN as the modern operating system of Hadoop
What are the design goals for YARN
2. YARN Architecture
Core components of YARN architecture
ApplicationMaster (AM)
NodeManager (NM)
YARN scheduler policies
The FIFO (First In First Out) scheduler
The fair scheduler
The capacity scheduler
Recent developments in YARN architecture
3. YARN Installation
Single-node installation
Starting with the installation
The standalone mode (local mode)
The pseudo-distributed mode
The fully-distributed mode
Slave files
Operating Hadoop and YARN clusters
Starting Hadoop and YARN clusters
Stopping Hadoop and YARN clusters
Web interfaces of the Ecosystem
4. YARN and Hadoop Ecosystems
The Hadoop 2 release
A short introduction to Hadoop 1.x and MRv1
MRv1 versus MRv2
Understanding where YARN fits into Hadoop
Old and new MapReduce APIs
Backward compatibility of MRv2 APIs
Binary compatibility of org.apache.hadoop.mapred APIs
Source compatibility of org.apache.hadoop.mapred APIs
Practical examples of MRv1 and MRv2
Preparing the input file(s)
Running the job
5. YARN Administration
Container allocation
Container allocation to the application
Container configurations
YARN scheduling policies
The FIFO (First In First Out) scheduler
The capacity scheduler
Capacity scheduler configurations
The fair scheduler
Fair scheduler configurations
YARN multitenancy application support
Administration of YARN
Administrative tools
Adding and removing nodes from a YARN cluster
Administrating YARN jobs
MapReduce job configurations
YARN log management
YARN web user interface
6. Developing and Running a Simple YARN Application
Running sample examples on YARN
Running a sample Pi example
Monitoring YARN applications with web GUI
YARN's MapReduce support
The MapReduce ApplicationMaster
Example YARN MapReduce settings
YARN's compatibility with MapReduce applications
Developing YARN applications
The YARN application workflow
Writing the YARN client
Writing the YARN ApplicationMaster
Responsibilities of the ApplicationMaster
7. YARN Frameworks
Apache Samza
Writing a Kafka producer
Writing the hello-samza project
Starting a grid
Hadoop YARN should be installed
Apache ZooKeeper should be installed
Setting up Storm-YARN
Getting the storm.yaml configuration of the launched Storm cluster
Building and running Storm-Starter examples
Apache Spark
Why run on YARN?
Apache Tez
Apache Giraph
HOYA (HBase on YARN)
KOYA (Kafka on YARN)
8. Failures in YARN
ResourceManager failures
ApplicationMaster failures
NodeManager failures
Container failures
Hardware Failures
9. YARN – Alternative Solutions
10. YARN – Future and Support
What YARN means to the big data industry
Journey – present and future
Present on-going features
Future features
YARN-supported frameworks

YARN Essentials

About the Authors

Amol Fasale has more than 4 years of industry experience in big data and distributed computing, and he is an active blogger and contributor to the open source community. Amol works as a senior data system engineer at MakeMyTrip.com, a well-known travel and hospitality portal in India. There he is responsible for real-time personalization of the online user experience using Apache Kafka, Apache Storm, Apache Hadoop, and other technologies. Amol also has hands-on experience with Java/J2EE, Spring Frameworks, Python, machine learning, Hadoop framework components, SQL, NoSQL, and graph databases.
You can follow Amol on Twitter at @amolfasale or on LinkedIn. He is very active on social media, and you can reach him online for technical assistance; he would be happy to help.
