HDInsight Essentials - Second Edition
eBook - ePub

Rajesh Nadipalli

  1. 178 pages
  2. English
  3. ePUB (mobile friendly)

Table of Contents

HDInsight Essentials Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Instant updates on new Packt books
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Hadoop and HDInsight in a Heartbeat
Data is everywhere
Business value of big data
Hadoop concepts
Brief history of Hadoop
Core components
Hadoop cluster layout
HDFS overview
Writing a file to HDFS
Reading a file from HDFS
HDFS basic commands
YARN overview
YARN application life cycle
YARN workloads
Hadoop distributions
HDInsight overview
HDInsight and Hadoop relationship
Hadoop on Windows deployment options
Microsoft Azure HDInsight Service
HDInsight Emulator
Hortonworks Data Platform (HDP) for Windows
Summary
2. Enterprise Data Lake using HDInsight
Enterprise Data Warehouse architecture
Source systems
Data warehouse
Storage
Processing
User access
Provisioning and monitoring
Data governance and security
Pain points of EDW
The next generation Hadoop-based Enterprise data architecture
Source systems
Data Lake
Storage
Processing
User access
Provisioning and monitoring
Data governance, security, and metadata
Journey to your Data Lake dream
Ingestion and organization
Transformation (rules driven)
Access, analyze, and report
Tools and technology for Hadoop ecosystem
Use case powered by Microsoft HDInsight
Problem statement
Solution
Source systems
Storage
Processing
User access
Benefits
Summary
3. HDInsight Service on Azure
Registering for an Azure account
Azure storage
Provisioning an HDInsight cluster
Cluster topology
Provisioning using Azure PowerShell
Creating a storage container
Provisioning a new HDInsight cluster
HDInsight management dashboard
Dashboard
Monitor
Configuration
Exploring clusters using the remote desktop
Running a sample MapReduce
Deleting the cluster
HDInsight Emulator for the development
Installing HDInsight Emulator
Installation verification
Using HDInsight Emulator
Summary
4. Administering Your HDInsight Cluster
Monitoring cluster health
Name Node status
The Name Node Overview page
Datanode Status
Utilities and logs
Hadoop Service Availability
YARN Application Status
Azure storage management
Configuring your storage account
Monitoring your storage account
Managing access keys
Deleting your storage account
Azure PowerShell
Access Azure Blob storage using Azure PowerShell
Summary
5. Ingest and Organize Data Lake
End-to-end Data Lake solution
Ingesting to Data Lake using HDFS command
Connecting to a Hadoop client
Getting your files on the local storage
Transferring to HDFS
Loading data to Azure Blob storage using Azure PowerShell
Loading files to Data Lake using GUI tools
Storage access keys
Storage tools
CloudXplorer
Key benefits
Registering your storage account
Uploading files to your Blob storage
Using Sqoop to move data from RDBMS to Data Lake
Key benefits
Two modes of using Sqoop
Using Sqoop to import data (SQL to Hadoop)
Organizing your Data Lake in HDFS
Managing file metadata using HCatalog
Key benefits
Using HCatalog Command Line to create tables
Summary
6. Transform Data in the Data Lake
Transformation overview
Tools for transforming data in Data Lake
HCatalog
Persisting HCatalog metastore in a SQL database
Apache Hive
Hive architecture
Starting Hive in HDInsight
Basic Hive commands
Apache Pig
Pig architecture
Starting Pig in HDInsight node
Basic Pig commands
Pig or Hive
MapReduce
The mapper code
The reducer code
The driver code
Executing MapReduce on HDInsight
Azure PowerShell for execution of Hadoop jobs
Transformation for the OTP project
Cleaning data using Pig
Executing Pig script
Registering a refined and aggregate table using Hive
Executing Hive script
Reviewing results
Other tools used for transformation
Oozie
Spark
Summary
7. Analyze and Report from Data Lake
Data access overview
Analysis using Excel and Microsoft Hive ODBC driver
Prerequisites
Step 1 – installing the Microsoft Hive ODBC driver
Step 2 – creating Hive ODBC Data Source
Step 3 – importing data to Excel
Analysis using Excel Power Query
Prerequisites
Step 1 – installing the Microsoft Power Query for Excel
Step 2 – i...
