Learning Hadoop 2

Garry Turkington, Gabriele Modena


Information

Year: 2015
ISBN: 9781783285518
Pages: 382
Language: English
Format: ePUB

Table of Contents

Learning Hadoop 2
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction
A note on versioning
The background of Hadoop
Components of Hadoop
Common building blocks
Storage
Computation
Better together
Hadoop 2 – what's the big deal?
Storage in Hadoop 2
Computation in Hadoop 2
Distributions of Apache Hadoop
A dual approach
AWS – infrastructure on demand from Amazon
Simple Storage Service (S3)
Elastic MapReduce (EMR)
Getting started
Cloudera QuickStart VM
Amazon EMR
Creating an AWS account
Signing up for the necessary services
Using Elastic MapReduce
Getting Hadoop up and running
How to use EMR
AWS credentials
The AWS command-line interface
Running the examples
Data processing with Hadoop
Why Twitter?
Building our first dataset
One service, multiple APIs
Anatomy of a Tweet
Twitter credentials
Programmatic access with Python
Summary
2. Storage
The inner workings of HDFS
Cluster startup
NameNode startup
DataNode startup
Block replication
Command-line access to the HDFS filesystem
Exploring the HDFS filesystem
Protecting the filesystem metadata
Secondary NameNode not to the rescue
Hadoop 2 NameNode HA
Keeping the HA NameNodes in sync
Client configuration
How a failover works
Apache ZooKeeper – a different type of filesystem
Implementing a distributed lock with sequential ZNodes
Implementing group membership and leader election using ephemeral ZNodes
Java API
Building blocks
Further reading
Automatic NameNode failover
HDFS snapshots
Hadoop filesystems
Hadoop interfaces
Java FileSystem API
Libhdfs
Thrift
Managing and serializing data
The Writable interface
Introducing the wrapper classes
Array wrapper classes
The Comparable and WritableComparable interfaces
Storing data
Serialization and containers
Compression
General-purpose file formats
Column-oriented data formats
RCFile
ORC
Parquet
Avro
Using the Java API
Summary
3. Processing – MapReduce and Beyond
MapReduce
Java API to MapReduce
The Mapper class
The Reducer class
The Driver class
Combiner
Partitioning
The optional partition function
Hadoop-provided mapper and reducer implementations
Sharing reference data
Writing MapReduce programs
Getting started
Running the examples
Local cluster
Elastic MapReduce
WordCount, the Hello World of MapReduce
Word co-occurrences
Trending topics
The Top N pattern
Sentiment of hashtags
Text cleanup using chain mapper
Walking through a run of a MapReduce job
Startup
Splitting the input
Task assignment
Task startup
Ongoing JobTracker monitoring
Mapper input
Mapper execution
Mapper output and reducer input
Reducer input
Reducer execution
Reducer output
Shutdown
Input/Output
InputFormat and RecordReader
Hadoop-provided InputFormat
Hadoop-provided RecordReader
OutputFormat and RecordWriter
Hadoop-provided OutputFormat
Sequence files
YARN
YARN architecture
The components of YARN
Anatomy of a YARN application
Life cycle of a YARN application
Fault tolerance and monitoring
Thinking in layers
Execution models
YARN in the real world – Computation beyond MapReduce
The problem with MapReduce
Tez
Hive-on-Tez
Apache Spark
Apache Samza
YARN-independent frameworks
YARN today and beyond
Summary
4. Real-time Computation with Samza
Stream processing with Samza
How Samza works
Samza high-level architecture
Samza's best friend – Apache Kafka
YARN integration
An independent model
Hello Samza!
Building a tweet parsing job
The configuration file
Getting Twitter data into Kafka
Running a Samza job
Samza and HDFS
Windowing functions
Multijob workflows
Tweet sentiment analysis
Bootstrap streams
Stateful tasks
Summary
5. Iterative Computation with Spark
Apache Spark
Cluster computing with working sets
Resilient Distributed Datasets (RDDs)
Actions
Deployment
Spark on YARN
Spark on EC2
Getting started with Spark
Writing and running standalone applications
Scala API
Java API
WordCount in Java
Python API
The Spark ecosystem
Spark Streaming
GraphX
MLlib
Spark SQL
Processing data with Apache Spark
Building and running the examples
Running the examples on YARN
Finding popular topics
Assigning a sentiment to topics
Data processing on streams
State management
Data analysis with Spark SQL
SQL on data streams
Comparing Samza and Spark Streaming
Summary
6. Data Analysis with Apache Pig
An overview of Pig
Getting started
Running Pig
Grunt – the Pig interactive shell
Elastic MapReduce
Fundamentals of Apache Pig
Programming Pig
Pig data types
Pig functions
Load/store
Eval
The tuple, bag, and map functions
The math, string, and datetime functions
Dynamic invokers
Macros
Working with data
Filtering
Aggregation
Foreach
Join
Extending Pig (UDFs)
Contributed UDFs
Piggybank
Elephant Bird
Apache DataFu
Analyzing the Twitter stream
Prerequisites
Dataset exploration
Tweet metadata
Data preparation
Top N statistics
Datetime manipulation
Sessions
Capturing user interactions
Link analysis
Influential users
Summary
7. Hadoop and SQL
Why SQL on Hadoop
Other SQL-on-Hadoop solutions
Prerequisites
Overview of Hive
The nature of Hive tables
Hive architecture
Data types
DDL statements
File formats and storage
JSON
Avro
Columnar stores
Queries
Structuring Hive tables for given workloads
Partitioning a table
Overwriting and updating data
Bucketing and sorting
Sampling data
Writing scripts
Hive and Amazon Web Services
Hive and S3
Hive on Elastic MapReduce
Extending HiveQL
Programmatic interfaces
JDBC
Thrift
Stinger initiative
Impala
The architecture of Impala
Co-existing with Hive
A different philosophy
Drill, Tajo, and beyond
Summary
8. Data Lifecycle Management
What data lifecycle management is
Importance of data lifecycle management
Tools to help
Building a tweet analysis capability
Getting the tweet data
Introducing Oozie
A note on HDFS file permissions
Making development a little easier
Extracting data and ingesting into Hive
A note on workflow directory structure
Introducing HCatalog
Using HCatalog
The Oozie sharelib
HCatalog and partitioned tables
Producing derived data
Performing multiple actions in parallel
Calling a subworkflow
Adding global settings
Challenges of external data
Data validation
Validation actions
Handling format changes
Handling schema evolution with Avro
Final thoughts on using Avro schema evolution
Only make additive changes
Manage schema versions explicitly
Think about schema distribution
Collecting additional data
Scheduling work...
