A Practical Guide for Building an Enterprise Data Lake
eBook - ePub

Delivering better, faster, and actionable insights to your data consumers (English Edition)

  1. English
  2. ePUB (mobile friendly)
  3. Available on iOS & Android

About this book

Description
Data lakes are the essential technology for tackling the explosive growth of big data volume, velocity, and variety, moving beyond traditional data warehousing to unlock advanced analytics and machine learning.

This comprehensive book begins by defining the differences between the data lake, lake house, and data mesh architectures, then addresses critical governance pitfalls and required upskilling before diving into technical implementation. You will learn the discovery process used to define data zones, and master ingestion using bulk methods and streaming via Apache Kafka to build Lambda architectures. The book then covers ad-hoc data discovery and cataloguing with tools like AWS Glue Data Catalog, followed by practical data transformation using PySpark ETL and orchestration tools that enforce data quality rules. It concludes by showing how to enable consumption layers for OLAP engines and machine learning, and how to secure the entire platform with strong security, networking, and budget governance.

Upon completing this practical book, you will possess the competency to not only architect and build a scalable data lake but also to strategically expand its value by treating data as a product, making you a highly effective and confident enterprise data lake professional ready for real-world application.

What you will learn
- Differentiate Data Lake, Lake House, Data Mesh, and Data Fabric semantics.
- Design data zones and cost allocation during the discovery process.
- Implement streaming ingestion using Apache Kafka for Lambda architecture.
- Build PySpark ETL/SQL ELT pipelines with orchestration tools for quality.
- Implement security, networking, and monitoring requirements for governance.

Who this book is for
This practical book is ideal for business/product leaders, architects, and solution engineers. Readers should have foundational knowledge of open-source technologies and major cloud environments like AWS, GCP, or Azure.

Table of Contents
1. Evolution Towards Modern Data Lakes
2. Understanding Common Pitfalls Making Data Lakes Unsuccessful
3. Performing a Discovery to Build Your Data Lake
4. Bringing Data into Your Data Lake
5. Understanding and Cataloguing Your Data
6. Transforming Data and Making it Consumption Ready
7. Building the Consumption Layer for Data Lake
8. Expanding Your Data Lake by Turning Your Data into a Product
9. Building Your Security and Governance Layer

Information

Year: 2025
eBook ISBN: 9789365891430

A Practical Guide for Building an Enterprise Data Lake by Sai Srinivas Sriparasa is available in PDF and/or ePUB format and is listed under Business & Business Intelligence.