Big Data Analytics with Hadoop and Spark
eBook - ePub

A hands-on guide to big data engineering and scalable analytics (English Edition)

  1. 384 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Description
Technologies such as Hadoop and Spark, available through the Cloudera platform, have become essential for storing, processing, and analyzing big data across industries including finance, healthcare, e-commerce, and research.

This book works through the entire ecosystem systematically. It starts with big data fundamentals, security, and HDFS architecture, then teaches MapReduce through weather and stock data case studies. Readers gain hands-on experience with the Cloudera framework, learning high-level scripting with Pig Latin and structured data warehousing with Hive, including HiveQL, the Metastore, and partitions. The book then explores NoSQL with HBase and MongoDB, including the CAP theorem, followed by Scala programming and Spark's in-memory engine. You will learn how the Catalyst optimizer optimizes queries and how to process Parquet and JSON files using Spark SQL DataFrames. The book also covers machine learning pipelines with spark.ml for classification and clustering applications.

By the end of this book, readers will have developed conceptual clarity and practical expertise in big data analytics. They will be able to design, implement, and manage scalable data processing solutions, solve real-world data challenges, and take on professional roles in big data engineering and analytics.

What you will learn
• Understand big data concepts, architecture, ethics, and applications.
• Build scalable storage using HDFS and MapReduce.
• Perform data analysis using Pig and Hive.
• Develop NoSQL solutions using HBase and MongoDB.
• Process large datasets using Apache Spark.
• Analyze data using Spark SQL and DataFrames.
• Implement machine learning using PySpark.

Who this book is for
This book is ideal for students, researchers, and academicians, as well as aspiring big data engineers, data scientists, and software engineers. Readers should have basic programming knowledge and an understanding of database fundamentals. With that background, they can use the book to master Hadoop and Spark for professional data science work or for faculty-level instruction.

Table of Contents
1. Exploring Big Data
2. Introduction to Hadoop
3. Hadoop Distributed File System and MapReduce
4. Big Data Analysis with Cloudera
5. Stock Data Analysis with Cloudera
6. Understanding Pig for Big Data Processing
7. Operators in Pig Latin
8. Functions in Apache Pig
9. Hive-data Warehousing and SQL-like Queries
10. Data Analysis Using Hive
11. Data Storage and Processing Using HBase
12. MongoDB
13. Introduction to Spark for Big Data Processing
14. Getting Started with Scala Programming
15. Data Analysis with Spark SQL
16. Machine Learning Application Using PySpark

Information

Year
2026
eBook ISBN
9789365894745

Table of contents

  1. Cover
  2. Title Page
  3. Copyright Page
  4. Dedication Page
  5. About the Author
  6. About the Reviewers
  7. Acknowledgement
  8. Preface
  9. Table of Contents
  10. 1. Exploring Big Data
  11. 2. Introduction to Hadoop
  12. 3. Hadoop Distributed File System and MapReduce
  13. 4. Big Data Analysis with Cloudera
  14. 5. Stock Data Analysis with Cloudera
  15. 6. Understanding Pig for Big Data Processing
  16. 7. Operators in Pig Latin
  17. 8. Functions in Apache Pig
  18. 9. Hive-data Warehousing and SQL-like Queries
  19. 10. Data Analysis Using Hive
  20. 11. Data Storage and Processing Using HBase
  21. 12. MongoDB
  22. 13. Introduction to Spark for Big Data Processing
  23. 14. Getting Started with Scala Programming
  24. 15. Data Analysis with Spark SQL
  25. 16. Machine Learning Application Using PySpark
  26. Index

Frequently asked questions

Yes, you can access Big Data Analytics with Hadoop and Spark by Shikha Mehta in PDF and/or ePUB format, as well as other popular books in Computer Science & Data Mining. We have over 1.5 million books available in our catalogue for you to explore.