Over 100 practical recipes to help you become an expert Hadoop administrator

About This Book
- Become an expert Hadoop administrator and perform tasks to optimize your Hadoop cluster
- Import and export data into Hive and use Oozie to manage workflows
- Practical recipes will help you plan and secure your Hadoop cluster, and make it highly available

Who This Book Is For
If you are a system administrator with a basic understanding of Hadoop and you want to get into Hadoop administration, this book is for you. It's also ideal if you are a Hadoop administrator who wants a quick reference guide to all the Hadoop administration-related tasks and solutions to commonly occurring problems.

What You Will Learn
- Set up the Hadoop architecture to run a Hadoop cluster smoothly
- Maintain a Hadoop cluster on HDFS, YARN, and MapReduce
- Understand high availability with ZooKeeper and Journal Node
- Configure Flume for data ingestion and Oozie to run various workflows
- Tune the Hadoop cluster for optimal performance
- Schedule jobs on a Hadoop cluster using the Fair and Capacity schedulers
- Secure your cluster and troubleshoot it for various common pain points

In Detail
Hadoop enables the distributed storage and processing of large datasets across clusters of computers. Learning how to administer Hadoop is crucial to exploiting its unique features. With this book, you will be able to overcome common problems encountered in Hadoop administration.

The book begins by laying the foundation, showing you the steps needed to set up a Hadoop cluster and its various nodes. You will get a better understanding of how to maintain a Hadoop cluster, especially on the HDFS layer and using YARN and MapReduce. Further on, you will explore durability and high availability of a Hadoop cluster.

You'll get a better understanding of the schedulers in Hadoop and how to configure and use them for your tasks. You will also get hands-on experience with the backup and recovery options and the performance tuning aspects of Hadoop. Finally, you will get a better understanding of troubleshooting, diagnostics, and best practices in Hadoop administration.

By the end of this book, you will have a proper understanding of working with Hadoop clusters and will also be able to secure and encrypt your Hadoop clusters and configure auditing for them.

Style and approach
This book contains short recipes that will help you run a Hadoop cluster efficiently. The recipes are solutions to real-life problems that administrators encounter while working with a Hadoop cluster.

- 348 pages
- English
- ePUB (mobile friendly)
- Available on iOS & Android
Hadoop 2.x Administration Cookbook
Table of Contents
Hadoop 2.x Administration Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Hadoop Architecture and Deployment
Introduction
Overview of Hadoop Architecture
Building and compiling Hadoop
Getting ready
How to do it...
How it works...
Installation methods
Getting ready
How to do it...
How it works...
Setting up host resolution
Getting ready
How to do it...
How it works...
Installing a single-node cluster - HDFS components
Getting ready
How to do it...
How it works...
There's more...
Setting up ResourceManager and NodeManager
Installing a single-node cluster - YARN components
Getting ready
How to do it...
How it works...
There's more...
See also
Installing a multi-node cluster
Getting ready
How to do it...
How it works...
Configuring the Hadoop Gateway node
Getting ready
How to do it...
How it works...
See also
Decommissioning nodes
Getting ready
How to do it...
How it works...
See also
Adding nodes to the cluster
Getting ready
How to do it...
How it works...
There's more...
2. Maintaining Hadoop Cluster HDFS
Introduction
Overview of HDFS
Configuring HDFS block size
Getting ready
How to do it...
How it works...
Setting up Namenode metadata location
Getting ready
How to do it...
How it works...
Loading data in HDFS
Getting ready
How to do it...
How it works...
Configuring HDFS replication
Getting ready
How to do it...
How it works...
See also
HDFS balancer
Getting ready
How to do it...
How it works...
Quota configuration
Getting ready
How to do it...
How it works...
HDFS health and FSCK
Getting ready
How to do it...
How it works...
See also
Configuring rack awareness
Getting ready
How to do it...
How it works...
See also
Recycle or trash bin configuration
Getting ready
How to do it...
How it works...
There's more...
Distcp usage
Getting ready
How to do it...
How it works...
Control block report storm
Getting ready
How to do it...
How it works...
Configuring Datanode heartbeat
Getting ready
How to do it...
How it works...
3. Maintaining Hadoop Cluster – YARN and MapReduce
Introduction
Running a simple MapReduce program
Getting ready
How to do it...
Hadoop streaming
Getting ready
How to do it...
How it works...
Configuring YARN history server
Getting ready
How to do it...
How it works...
There's more...
Job history web interface and metrics
Getting ready
How to do it...
How it works...
Configuring ResourceManager components
Getting ready
How to do it...
How it works...
There's more...
See also
YARN containers and resource allocations
Getting ready
How to do it...
How it works...
There's more...
See also
ResourceManager Web UI and JMX metrics
Getting ready
How to do it...
How it works...
Preserving ResourceManager states
Getting ready
How to do it...
How it works...
There's more...
4. High Availability
Introduction
Namenode HA using shared storage
Getting ready
How to do it...
How it works...
See also
ZooKeeper configuration
Getting ready
How to do it...
How it works...
Namenode HA using Journal node
Getting ready
How to do it...
How it works...
Resourcemanager HA using ZooKeeper
Getting ready
How to do it...
How it works...
Rolling upgrade with HA
Getting ready
How to do it...
How it works...
Configure shared cache manager
Getting ready
How to do it...
There's more...
See also
Configure HDFS cache
Getting ready
How to do it...
How it works...
See also
HDFS snapshots
Getting ready
How to do it...
How it works...
Configuring storage based policies
Getting ready
How to do it...
How it works...
Configuring HA for Edge nodes
Getting ready
How to do it...
How it works...
5. Schedulers
Introduction
Configuring users and groups
Getting ready
How to do it...
How it works...
See also
Fair Scheduler configuration
Getting ready
How to do it...
How it works...
Fair Scheduler pools ...
Hadoop 2.x Administration Cookbook by Gurmukh Singh is available in PDF and/or ePUB format and is listed under Computer Science & Data Mining.