eBook - ePub

Automated Machine Learning on AWS

Name: Automated Machine Learning on AWS
Author: Trenton Potgieter, Jonathan Dahlberg

Trenton Potgieter, Jonathan Dahlberg

Share book

420 pages
English
ePUB (mobile friendly)
Available on iOS & Android

eBook - ePub

Automated Machine Learning on AWS

Trenton Potgieter, Jonathan Dahlberg

Book details

Book preview

Table of contents

Citations

About This Book

Automate the process of building, training, and deploying machine learning applications to production with AWS solutions such as SageMaker Autopilot, AutoGluon, Step Functions, Amazon Managed Workflows for Apache Airflow, and moreKey Features• Explore the various AWS services that make automated machine learning easier• Recognize the role of DevOps and MLOps methodologies in pipeline automation• Get acquainted with additional AWS services such as Step Functions, MWAA, and more to overcome automation challengesBook DescriptionAWS provides a wide range of solutions to help automate a machine learning workflow with just a few lines of code. With this practical book, you'll learn how to automate a machine learning pipeline using the various AWS services. Automated Machine Learning on AWS begins with a quick overview of what the machine learning pipeline/process looks like and highlights the typical challenges that you may face when building a pipeline. Throughout the book, you'll become well versed with various AWS solutions such as Amazon SageMaker Autopilot, AutoGluon, and AWS Step Functions to automate an end-to-end ML process with the help of hands-on examples. The book will show you how to build, monitor, and execute a CI/CD pipeline for the ML process and how the various CI/CD services within AWS can be applied to a use case with the Cloud Development Kit (CDK). You'll understand what a data-centric ML process is by working with the Amazon Managed Services for Apache Airflow and then build a managed Airflow environment. You'll also cover the key success criteria for an MLSDLC implementation and the process of creating a self-mutating CI/CD pipeline using AWS CDK from the perspective of the platform engineering team. By the end of this AWS book, you'll be able to effectively automate a complete machine learning pipeline and deploy it to production.What you will learn• Employ SageMaker Autopilot and Amazon SageMaker SDK to automate the machine learning process• Understand how to use AutoGluon to automate complicated model building tasks• Use the AWS CDK to codify the machine learning process• Create, deploy, and rebuild a CI/CD pipeline on AWS• Build an ML workflow using AWS Step Functions and the Data Science SDK• Leverage the Amazon SageMaker Feature Store to automate the machine learning software development life cycle (MLSDLC)• Discover how to use Amazon MWAA for a data-centric ML processWho this book is forThis book is for the novice as well as experienced machine learning practitioners looking to automate the process of building, training, and deploying machine learning-based solutions into production, using both purpose-built and other AWS services. A basic understanding of the end-to-end machine learning process and concepts, Python programming, and AWS is necessary to make the most out of this book.

Frequently asked questions

How do I cancel my subscription?

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.

Can/how do I download books?

At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.

What is the difference between the pricing plans?

Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.

What is Perlego?

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.

Do you support text-to-speech?

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.

Is Automated Machine Learning on AWS an online PDF/ePUB?

Yes, you can access Automated Machine Learning on AWS by Trenton Potgieter, Jonathan Dahlberg in PDF and/or ePUB format, as well as other popular books in Computer Science & Data Processing. We have over one million books available in our catalogue for you to explore.

Information

Publisher

Packt Publishing

Year

2022

ISBN

9781801814522

Edition

Topic

Computer Science

Subtopic

Data Processing

Index

Computer Science

Section 1: Fundamentals of the Automated Machine Learning Process and AutoML on AWS

This section will educate you on the complexities of the machine learning process, what AutoML is, and how it can be used to streamline the process.

This section comprises the following chapters:

Chapter 1, Getting Started with Automated Machine Learning on AWS
Chapter 2, Automating Machine Learning Model Development Using SageMaker Autopilot
Chapter 3, Automating Complicated Model Development with AutoGluon

Chapter 1: Getting Started with Automated Machine Learning on AWS

If you have ever had the pleasure of successfully driving a production-ready Machine Learning (ML) application to completion or you are currently in the process of developing your first ML project, I am sure that you will agree with me when I say, "This is not an easy task!"

Why do I say that? Well, if we ignore the intricacies involved in gathering the right training data, analyzing and understanding that data, and then building and training the best possible model, I am sure you will agree that the ML process in itself is a complicated task process, time-consuming, and entirely manual, making it extremely difficult to automate. And it is these factors, plus many more, that contribute to ML tasks being difficult to automate.

The primary goal of this chapter is to emphasize these challenges by reviewing a practical example that sets the stage for why automating the ML process is difficult. This chapter will highlight what governing factors should be considered when performing this automation and how leveraging various Amazon Web Services (AWS) capabilities can make the task of driving ML projects into production less daunting and fully automated. By the end of this chapter, we will have established a common foundation for overcoming these challenges through automation.

Therefore, in this chapter, we will cover the following topics:

Overview of the ML process
Complexities in the ML process
An example of the end-to-end ML process
How AWS can make automating ML development and the deployment process easier

Technical requirements

You will need access to the Jupyter Notebook environment to follow along with the example in this chapter. Although sample code has been provided for the various steps of the ML process, a Jupyter Notebook example has been provided in this book's GitHub repository (https://github.com/PacktPublishing/Automated-Machine-Learning-on-AWS/blob/main/Chapter01/ML%20Process%20Example.ipynb) for you to work through the entire example at your own pace.

For further instructions on how to set up a Jupyter Notebook environment, you can refer to the installation guide (https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html) to either set up JupyterLab or classic Jupyter Notebook. Alternatively, for local notebook development using a development IDE, such as Visual Studio Code, you can refer to the VS Code documentation (https://code.visualstudio.com/docs/datascience/jupyter-notebooks).

Overview of the ML process

Unfortunately, there is no established how-to guide when performing ML. This is because every ML use case is unique and specific to the application that leverages the resultant ML model. Instead, there is a general process pattern that most data scientists, ML engineers, and ML practitioners follow. This process model is called the Cross-Industry Standard Process for Data Mining (CRISP-DM) and while not everyone follows the specific steps of the process verbatim, most production ML models have probably, in some shape or form, been built by using the guardrails that the CRISP-DM methodology provides.

So, when we refer to the ML process, we are invariably referring to the overall methodology of building production-ready ML models using the guardrails from CRSIP-DM.

The following diagram shows an overview of the CRISP-DM guidelines for creating a typical process that an ML practitioner might follow:

Figure 1.1 – Overview of a typical ML process

In a nutshell, the process starts with the ML practitioner being tasked with providing an ML model that addresses a specific business use case. The ML practitioner then finds, ingests, and analyzes an appropriate dataset that can be effectively leveraged to accomplish the goals of the ML project.

Once the data has been analyzed, the ML practitioner determines the most applicable modeling techniques that extract the most relevant information from the data to address the use case. These techniques include the following:

Determining the most applicable ML algorithm
Creating new aspects (engineering new features) of the data that can further improve the chosen model's overall effectiveness
Separating the data into training and testing sets for model training and evaluation

The ML practitioner then codifies the algorithm's architecture and training/testing/evaluation routines. These routines are then executed to determine the best possible model parameters – ones that optimize the model to fit both the data and the business use case.

Finally, the best model is deployed into production to serve predictions that match the initial objective of the business use case.

As you can see, the overall process seems relatively straightforward and easy to follow. So, you may be wondering what all the fuss is about. For example, you may be asking yourself, Where is the complexity in this process? or Why do you say that this is so hard to automate?

While the process may look simplistic, the reality when executing it is vastly different. The following diagram provides a more realistic representation of what an ML practitioner may observe when developing an ML use case:

Figure 1.2 – Overview of a realistic ML process

As you can see, the overall process is far more convoluted than the typical representation shown in Figure 1.1. There are potentially multiple different paths that can be taken through the process. Each course of action is based on the results captured from the previous step in the process. Additionally, taking a particular course of action may not always yield the desired results, thus forcing the ML practitioner to have to reset or go back and choose a different set of criteria that will hopefully produce a better result.

So, now that we have provided a high-level overview of what the typical ML process should entail,...