Automated Machine Learning on AWS

Name: Automated Machine Learning on AWS
ISBN: 9781801814522

Authors: Trenton Potgieter, Jonathan Dahlberg

420 pages
English

About this book

Automate the process of building, training, and deploying machine learning applications to production with AWS solutions such as SageMaker Autopilot, AutoGluon, Step Functions, Amazon Managed Workflows for Apache Airflow, and more.

Key Features

• Explore the various AWS services that make automated machine learning easier
• Recognize the role of DevOps and MLOps methodologies in pipeline automation
• Get acquainted with additional AWS services, such as Step Functions and MWAA, to overcome automation challenges

Book Description

AWS provides a wide range of solutions to help automate a machine learning workflow with just a few lines of code. With this practical book, you'll learn how to automate a machine learning pipeline using the various AWS services. Automated Machine Learning on AWS begins with a quick overview of what the machine learning pipeline/process looks like and highlights the typical challenges that you may face when building a pipeline. Throughout the book, you'll become well versed in various AWS solutions, such as Amazon SageMaker Autopilot, AutoGluon, and AWS Step Functions, and use them to automate an end-to-end ML process with the help of hands-on examples. The book will show you how to build, monitor, and execute a CI/CD pipeline for the ML process and how the various CI/CD services within AWS can be applied to a use case with the AWS Cloud Development Kit (CDK). You'll understand what a data-centric ML process is by working with Amazon Managed Workflows for Apache Airflow (MWAA) and then build a managed Airflow environment. You'll also cover the key success criteria for an MLSDLC implementation and the process of creating a self-mutating CI/CD pipeline using the AWS CDK from the perspective of the platform engineering team.

By the end of this AWS book, you'll be able to effectively automate a complete machine learning pipeline and deploy it to production.

What you will learn

• Employ SageMaker Autopilot and the Amazon SageMaker SDK to automate the machine learning process
• Understand how to use AutoGluon to automate complicated model building tasks
• Use the AWS CDK to codify the machine learning process
• Create, deploy, and rebuild a CI/CD pipeline on AWS
• Build an ML workflow using AWS Step Functions and the Data Science SDK
• Leverage the Amazon SageMaker Feature Store to automate the machine learning software development life cycle (MLSDLC)
• Discover how to use Amazon MWAA for a data-centric ML process

Who this book is for

This book is for novice as well as experienced machine learning practitioners looking to automate the process of building, training, and deploying machine learning-based solutions into production, using both purpose-built and other AWS services. A basic understanding of the end-to-end machine learning process and concepts, Python programming, and AWS is necessary to make the most of this book.

Information

Publisher: Packt Publishing
Year: 2022
eBook ISBN: 9781801814522
Topic: Computer Science
Subtopic: Artificial Intelligence (AI) & Semantics

Section 1: Fundamentals of the Automated Machine Learning Process and AutoML on AWS

This section will educate you on the complexities of the machine learning process, what AutoML is, and how it can be used to streamline the process.

This section comprises the following chapters:

Chapter 1, Getting Started with Automated Machine Learning on AWS
Chapter 2, Automating Machine Learning Model Development Using SageMaker Autopilot
Chapter 3, Automating Complicated Model Development with AutoGluon

Chapter 1: Getting Started with Automated Machine Learning on AWS

If you have ever had the pleasure of successfully driving a production-ready Machine Learning (ML) application to completion, or if you are currently developing your first ML project, I am sure you will agree with me when I say, "This is not an easy task!"

Why do I say that? Well, even if we ignore the intricacies involved in gathering the right training data, analyzing and understanding that data, and then building and training the best possible model, I am sure you will agree that the ML process itself is complicated, time-consuming, and entirely manual. It is these factors, plus many more, that make ML tasks so difficult to automate.

The primary goal of this chapter is to emphasize these challenges by reviewing a practical example that sets the stage for why automating the ML process is difficult. This chapter will highlight what governing factors should be considered when performing this automation and how leveraging various Amazon Web Services (AWS) capabilities can make the task of driving ML projects into production less daunting and fully automated. By the end of this chapter, we will have established a common foundation for overcoming these challenges through automation.

Therefore, in this chapter, we will cover the following topics:

Overview of the ML process
Complexities in the ML process
An example of the end-to-end ML process
How AWS can make automating ML development and the deployment process easier

Technical requirements

You will need access to a Jupyter Notebook environment to follow along with the example in this chapter. Although sample code is provided in this chapter for the various steps of the ML process, the complete example is also available as a Jupyter notebook in this book's GitHub repository (https://github.com/PacktPublishing/Automated-Machine-Learning-on-AWS/blob/main/Chapter01/ML%20Process%20Example.ipynb), so you can work through it at your own pace.

For further instructions on how to set up a Jupyter Notebook environment, you can refer to the installation guide (https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html) to set up either JupyterLab or the classic Jupyter Notebook. Alternatively, for local notebook development using an IDE such as Visual Studio Code, you can refer to the VS Code documentation (https://code.visualstudio.com/docs/datascience/jupyter-notebooks).

Overview of the ML process

Unfortunately, there is no established how-to guide for performing ML. This is because every ML use case is unique and specific to the application that leverages the resulting ML model. Instead, there is a general process pattern that most data scientists, ML engineers, and ML practitioners follow. This process model is called the Cross-Industry Standard Process for Data Mining (CRISP-DM). While not everyone follows the specific steps of the process verbatim, most production ML models have probably, in some shape or form, been built using the guardrails that the CRISP-DM methodology provides.

So, when we refer to the ML process, we are invariably referring to the overall methodology of building production-ready ML models using the guardrails from CRISP-DM.
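
CRISP-DM is usually drawn as six phases (business understanding, data understanding, data preparation, modeling, evaluation, and deployment), with a few feedback arrows between them. As a minimal sketch, those phases and arrows can be written down as a small transition table. The `Phase` enum, the `TRANSITIONS` table, and the `is_valid_path()` helper are illustrative names only. They are not part of any library or of the book's code, and the table is a simplification of the standard diagram.

```python
from enum import Enum


class Phase(Enum):
    """The six phases of the CRISP-DM reference model, in their nominal order."""

    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Allowed moves between phases: the forward steps plus the feedback arrows of the
# standard CRISP-DM diagram. Deployment is treated as terminal in this sketch,
# ignoring the outer circle that marks the whole cycle as repeatable.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.DATA_PREPARATION, Phase.BUSINESS_UNDERSTANDING},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.EVALUATION, Phase.DATA_PREPARATION},
    Phase.EVALUATION: {Phase.DEPLOYMENT, Phase.BUSINESS_UNDERSTANDING},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases in `path` is an allowed move."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


# A hypothetical project that fails evaluation once and loops back to the start.
project = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.EVALUATION,
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.EVALUATION,
    Phase.DEPLOYMENT,
]
print(is_valid_path(project))  # True
print(is_valid_path([Phase.BUSINESS_UNDERSTANDING, Phase.MODELING]))  # False
```

Encoding the process as data makes the point of the following figures concrete. The "happy path" is only one of many valid routes through the table.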

The following diagram shows an overview of the CRISP-DM guidelines for creating a typical process that an ML practitioner might follow:

Figure 1.1 – Overview of a typical ML process

In a nutshell, the process starts with the ML practitioner being tasked with providing an ML model that addresses a specific business use case. The ML practitioner then finds, ingests, and analyzes an appropriate dataset that can be effectively leveraged to accomplish the goals of the ML project.

Once the data has been analyzed, the ML practitioner determines the most applicable modeling techniques that extract the most relevant information from the data to address the use case. These techniques include the following:

Determining the most applicable ML algorithm
Creating new aspects (engineering new features) of the data that can further improve the chosen model's overall effectiveness
Separating the data into training and testing sets for model training and evaluation
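
Two of the techniques listed above, engineering a new feature and splitting the data into training and testing sets, can be sketched with the Python standard library alone. This is a minimal illustration, not the book's notebook code. The record fields (`length`, `weight`), the derived `weight_per_length` feature, and both helper functions are hypothetical. `train_test_split()` here only mirrors the idea behind similarly named helpers in ML libraries.

```python
import random


def add_ratio_feature(rows, numerator, denominator, name):
    """Return copies of `rows` with a new feature `name` = numerator / denominator."""
    engineered = []
    for row in rows:
        new_row = dict(row)  # copy, so the raw data is left untouched
        # Fall back to 0.0 instead of raising on a zero denominator.
        new_row[name] = row[numerator] / row[denominator] if row[denominator] else 0.0
        engineered.append(new_row)
    return engineered


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` reproducibly and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy records with two raw measurements.
rows = [{"length": float(i), "weight": 2.0 * i} for i in range(1, 11)]
rows = add_ratio_feature(rows, "weight", "length", "weight_per_length")
train, test = train_test_split(rows, test_fraction=0.2)
print(len(train), len(test))  # 8 2
```

Fixing the shuffle seed keeps the split reproducible between runs. This matters once these steps are automated, because two runs of the same pipeline should evaluate on the same held-out rows.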

The ML practitioner then codifies the algorithm's architecture and training/testing/evaluation routines. These routines are then executed to determine the best possible model parameters – ones that optimize the model to fit both the data and the business use case.
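
To make the "training/testing/evaluation routines" concrete, the sketch below separates a training routine, an evaluation routine, and a loop that keeps the best candidate. The text does not name an algorithm, so one-dimensional ridge regression (fitting y ≈ w·x with an L2 penalty `alpha`) is swapped in purely for illustration. All function names and data values are hypothetical. Strictly speaking, the split used to choose `alpha` is a validation set; a separate test set should still be kept for a final, unbiased estimate.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Training routine: closed-form fit of y ~ w * x with an L2 penalty alpha on w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


def mean_squared_error(w, xs, ys):
    """Evaluation routine: average squared error of y = w * x on (xs, ys)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_best(train, holdout, alphas):
    """Fit one model per candidate alpha; keep the one with the lowest holdout error."""
    results = []
    for alpha in alphas:
        w = fit_ridge_1d(train[0], train[1], alpha)
        results.append((mean_squared_error(w, holdout[0], holdout[1]), alpha, w))
    # Tuples compare element by element: lowest error first, then smallest alpha.
    return min(results)


# Hypothetical noisy samples of roughly y = 2x.
train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
holdout = ([5.0, 6.0], [10.1, 11.9])
best_mse, best_alpha, best_w = select_best(train, holdout, alphas=[0.0, 1.0, 10.0])
print(best_alpha, round(best_w, 3))  # 0.0 1.99
```

The training routine learns the model's weight, while the surrounding loop chooses the hyperparameter. Real projects repeat this over many more candidates and algorithms, which is exactly the work that AutoML tooling aims to take over.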

Finally, the best model is deployed into production to serve predictions that match the initial objective of the business use case.
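
In production, serving is usually handled by a hosted inference service. At its core, though, deployment means persisting the trained model as an artifact and putting it behind a function that answers prediction requests. The following stdlib-only sketch illustrates that idea. The JSON artifact layout, the `{"instances": [...]}` request shape, and the `save_model()`/`make_predictor()` helpers are assumptions made for this example, not the format of any particular AWS service.

```python
import json
import os
import tempfile


def save_model(path, weight):
    """Persist the trained model (here just its single weight) as a JSON artifact."""
    with open(path, "w") as f:
        json.dump({"weight": weight}, f)


def make_predictor(path):
    """Load the artifact once and return a request handler that serves predictions."""
    with open(path) as f:
        weight = json.load(f)["weight"]

    def handle(request_body):
        # Expects a JSON body such as {"instances": [1.0, 2.5]}.
        instances = json.loads(request_body)["instances"]
        return json.dumps({"predictions": [weight * x for x in instances]})

    return handle


artifact = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(artifact, 2.0)
predict = make_predictor(artifact)
print(predict('{"instances": [1.0, 2.5]}'))  # {"predictions": [2.0, 5.0]}
```

The artifact is loaded once, when the handler is created, rather than on every request. Keeping "train and save" separate from "load and serve" is what later allows each side to be automated independently.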

As you can see, the overall process seems relatively straightforward and easy to follow, so you may be wondering what all the fuss is about. For example, you may be asking yourself, "Where is the complexity in this process?" or "Why do you say that this is so hard to automate?"

While the process may look simple, the reality of executing it is vastly different. The following diagram provides a more realistic representation of what an ML practitioner may observe when developing an ML use case:

Figure 1.2 – Overview of a realistic ML process

As you can see, the overall process is far more convoluted than the typical representation shown in Figure 1.1. There are potentially multiple paths through the process, and each course of action is based on the results captured in the previous step. Additionally, a particular course of action may not always yield the desired results, forcing the ML practitioner to reset, or go back and choose a different set of criteria that will hopefully produce a better result.
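
The branching and backtracking described above can be sketched as a loop that keeps trying alternative configurations until one meets a target error. This is a minimal illustration, not the book's method. `run_experiments()`, the configuration labels, and the error values are all hypothetical. In a real project, each "configuration" would be a full pass through data preparation, modeling, and evaluation.

```python
def run_experiments(candidates, evaluate, target):
    """Try candidate configurations in order until one reaches the target error.

    Each failed attempt is recorded, mirroring the "go back and choose a
    different set of criteria" loop. Returns the accepted configuration
    (or None if none qualified) together with the full history of attempts.
    """
    history = []
    for config in candidates:
        error = evaluate(config)
        history.append((config, error))
        if error <= target:
            return config, history
    # Nothing met the target: the practitioner must revisit the data or the goal.
    return None, history


# Hypothetical validation errors for three successive attempts.
errors = {
    "baseline features + linear model": 0.40,
    "engineered features + linear model": 0.22,
    "engineered features + tree ensemble": 0.09,
}
chosen, history = run_experiments(errors, errors.get, target=0.10)
print(chosen)        # engineered features + tree ensemble
print(len(history))  # 3
```

Even this toy loop hides the hard part: deciding *which* alternative to try next depends on what the previous attempt revealed. That judgment is why the realistic process resists simple automation.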

So, now that we have provided a high-level overview of what the typical ML process should entail,...

Automated Machine Learning on AWS
Foreword
Preface
Section 1: Fundamentals of the Automated Machine Learning Process and AutoML on AWS
Chapter 1: Getting Started with Automated Machine Learning on AWS
Chapter 2: Automating Machine Learning Model Development Using SageMaker Autopilot
Chapter 3: Automating Complicated Model Development with AutoGluon
Section 2: Automating the Machine Learning Process with Continuous Integration and Continuous Delivery (CI/CD)
Chapter 4: Continuous Integration and Continuous Delivery (CI/CD) for Machine Learning
Chapter 5: Continuous Deployment of a Production ML Model
Section 3: Optimizing a Source Code-Centric Approach to Automated Machine Learning
Chapter 6: Automating the Machine Learning Process Using AWS Step Functions
Chapter 7: Building the ML Workflow Using AWS Step Functions
Section 4: Optimizing a Data-Centric Approach to Automated Machine Learning
Chapter 8: Automating the Machine Learning Process Using Apache Airflow
Chapter 9: Building the ML Workflow Using Amazon Managed Workflows for Apache Airflow
Section 5: Automating the End-to-End Production Application on AWS
Chapter 10: An Introduction to the Machine Learning Software Development Life Cycle (MLSDLC)
Chapter 11: Continuous Integration, Deployment, and Training for the MLSDLC
Other Books You May Enjoy
