Data Pipelines with Apache Airflow, Second Edition
eBook - ePub

Orchestration for data and AI

  1. English
  2. ePUB (mobile friendly)
  3. Available on iOS & Android
About this book

Data Pipelines with Apache Airflow has empowered thousands of data engineers to build more successful data platforms. This second edition has been fully revised for Airflow 3 and covers the latest features of Apache Airflow, including the TaskFlow API, deferrable operators, and large language model (LLM) integration. Step by step, it carefully guides you from Airflow novice to expert.

Using real-world scenarios and examples, this book teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack. Part reference and part tutorial, the book illustrates each technique with engaging hands-on examples, from training machine learning models for generative AI to optimizing delivery routes.

In Data Pipelines with Apache Airflow, Second Edition you'll learn how to:

• Master the core concepts of Airflow architecture and workflow design
• Schedule data pipelines using the Asset API (called the Dataset API before Airflow 3) and timetables, including complex irregular schedules
• Develop custom Airflow components for your specific needs
• Implement comprehensive testing strategies for your pipelines
• Apply industry best practices for building and maintaining Airflow workflows
• Deploy and operate Airflow in production environments
• Orchestrate workflows in container-native environments
• Build and deploy machine learning and generative AI models using Airflow

About the Technology

Apache Airflow provides a unified platform for orchestrating the collection, consolidation, cleaning, and analysis of data. With its easy-to-use UI, powerful scheduling and monitoring features, plug-and-play options, and flexible Python scripting, Airflow lets you implement secure, consistent pipelines for any data or AI task.

About the book

Data Pipelines with Apache Airflow, Second Edition teaches you how to build, monitor, and maintain effective data workflows. This new edition adds comprehensive coverage of Airflow 3 features, such as event-driven scheduling, dynamic task mapping, DAG versioning, and Airflow’s entirely new UI. The numerous examples address common use cases such as data ingestion, data transformation, and connecting to multiple data sources, along with AI techniques such as building retrieval-augmented generation (RAG) systems.

What's inside

• Deploying data pipelines as Airflow DAGs
• Time- and event-based scheduling strategies
• Integrating with databases, LLMs, and AI models
• Deploying Airflow using Kubernetes

About the reader

For data engineers, machine learning engineers, DevOps, and sysadmins with intermediate Python skills.

About the authors

Julian de Ruiter, Ismael Cabral, Kris Geusebroek, Daniel van der Ende, and Bas Harenslak are seasoned data engineers and Airflow experts.

Table of Contents

Part 1
1 Meet Apache Airflow
2 Anatomy of an Airflow DAG
3 Time-based scheduling
4 Asset-aware scheduling
5 Templating tasks using the Airflow context
6 Defining dependencies between tasks
Part 2
7 Triggering workflows with external input
8 Communicating with external systems
9 Extending Airflow with custom operators and sensors
10 Testing
11 Running tasks in containers
Part 3
12 Best practices
13 Project: Finding the fastest way to get around NYC
14 Project: Keeping family traditions alive with Airflow and generative AI
Part 4
15 Operating Airflow in production
16 Securing Airflow
17 Airflow deployment options
A Running code samples
B Prometheus metric mapping

Information

Publisher: Manning
Year: 2026
eBook ISBN: 9781638357698

Table of contents

Praise for the First Edition
Data Pipelines with Apache Airflow
copyright
contents
preface
acknowledgments
about this book
about the authors
about the cover illustration
Part 1 Getting started
1 Meet Apache Airflow
2 Anatomy of an Airflow DAG
3 Time-based scheduling
4 Asset-aware scheduling
5 Templating tasks using the Airflow context
6 Defining dependencies between tasks
Part 2 Beyond the basics
7 Triggering workflows with external input
8 Communicating with external systems
9 Extending Airflow with custom operators and sensors
10 Testing
11 Running tasks in containers
Part 3 Airflow in practice
12 Best practices
13 Project: Finding the fastest way to get around NYC
14 Project: Keeping family traditions alive with Airflow and generative AI
Part 4 Airflow in production
15 Operating Airflow in production
16 Securing Airflow
17 Airflow deployment options
appendix A Running code samples
appendix B Prometheus metric mapping

Data Pipelines with Apache Airflow, Second Edition by Julian de Ruiter, Ismael Cabral, Kris Geusebroek, Daniel van der Ende, and Bas Harenslak is available in PDF and/or ePUB format, in the Computer Science & Cloud Computing category.