eBook - ePub

Automated Machine Learning on AWS

Name: Automated Machine Learning on AWS
Author: Trenton Potgieter, Jonathan Dahlberg


420 pages
English

Book information

Automate the process of building, training, and deploying machine learning applications to production with AWS solutions such as SageMaker Autopilot, AutoGluon, Step Functions, Amazon Managed Workflows for Apache Airflow, and more

Key Features

• Explore the various AWS services that make automated machine learning easier
• Recognize the role of DevOps and MLOps methodologies in pipeline automation
• Get acquainted with additional AWS services such as Step Functions, MWAA, and more to overcome automation challenges

Book Description

AWS provides a wide range of solutions to help automate a machine learning workflow with just a few lines of code. With this practical book, you'll learn how to automate a machine learning pipeline using the various AWS services. Automated Machine Learning on AWS begins with a quick overview of what the machine learning pipeline/process looks like and highlights the typical challenges that you may face when building a pipeline. Throughout the book, you'll become well versed with various AWS solutions such as Amazon SageMaker Autopilot, AutoGluon, and AWS Step Functions to automate an end-to-end ML process with the help of hands-on examples. The book will show you how to build, monitor, and execute a CI/CD pipeline for the ML process and how the various CI/CD services within AWS can be applied to a use case with the Cloud Development Kit (CDK). You'll understand what a data-centric ML process is by working with Amazon Managed Workflows for Apache Airflow (MWAA) and then build a managed Airflow environment. You'll also cover the key success criteria for an MLSDLC implementation and the process of creating a self-mutating CI/CD pipeline using AWS CDK from the perspective of the platform engineering team.

By the end of this AWS book, you'll be able to effectively automate a complete machine learning pipeline and deploy it to production.

What you will learn

• Employ SageMaker Autopilot and Amazon SageMaker SDK to automate the machine learning process
• Understand how to use AutoGluon to automate complicated model building tasks
• Use the AWS CDK to codify the machine learning process
• Create, deploy, and rebuild a CI/CD pipeline on AWS
• Build an ML workflow using AWS Step Functions and the Data Science SDK
• Leverage the Amazon SageMaker Feature Store to automate the machine learning software development life cycle (MLSDLC)
• Discover how to use Amazon MWAA for a data-centric ML process

Who this book is for

This book is for novice as well as experienced machine learning practitioners looking to automate the process of building, training, and deploying machine learning-based solutions into production, using both purpose-built and other AWS services. A basic understanding of the end-to-end machine learning process and concepts, Python programming, and AWS is necessary to make the most out of this book.


Information

Publisher: Packt Publishing
Year: 2022
ISBN: 9781801814522
Categories: Computer Science, Data Processing

Section 1: Fundamentals of the Automated Machine Learning Process and AutoML on AWS

This section will educate you on the complexities of the machine learning process, what AutoML is, and how it can be used to streamline the process.

This section comprises the following chapters:

Chapter 1, Getting Started with Automated Machine Learning on AWS
Chapter 2, Automating Machine Learning Model Development Using SageMaker Autopilot
Chapter 3, Automating Complicated Model Development with AutoGluon

Chapter 1: Getting Started with Automated Machine Learning on AWS

If you have ever had the pleasure of successfully driving a production-ready Machine Learning (ML) application to completion or you are currently in the process of developing your first ML project, I am sure that you will agree with me when I say, "This is not an easy task!"

Why do I say that? Well, even if we set aside the intricacies involved in gathering the right training data, analyzing and understanding that data, and then building and training the best possible model, I am sure you will agree that the ML process itself is complicated, time-consuming, and entirely manual. It is these factors, plus many more, that make ML tasks so difficult to automate.

The primary goal of this chapter is to emphasize these challenges by reviewing a practical example that sets the stage for why automating the ML process is difficult. This chapter will highlight what governing factors should be considered when performing this automation and how leveraging various Amazon Web Services (AWS) capabilities can make the task of driving ML projects into production less daunting and fully automated. By the end of this chapter, we will have established a common foundation for overcoming these challenges through automation.

Therefore, in this chapter, we will cover the following topics:

Overview of the ML process
Complexities in the ML process
An example of the end-to-end ML process
How AWS can make automating ML development and the deployment process easier

Technical requirements

You will need access to a Jupyter Notebook environment to follow along with the example in this chapter. Sample code is provided for the various steps of the ML process, and a complete Jupyter Notebook example is also available in this book's GitHub repository (https://github.com/PacktPublishing/Automated-Machine-Learning-on-AWS/blob/main/Chapter01/ML%20Process%20Example.ipynb) so that you can work through the entire example at your own pace.

For further instructions on how to set up a Jupyter Notebook environment, you can refer to the installation guide (https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html) to set up either JupyterLab or the classic Jupyter Notebook. Alternatively, for local notebook development using an IDE such as Visual Studio Code, you can refer to the VS Code documentation (https://code.visualstudio.com/docs/datascience/jupyter-notebooks).

Overview of the ML process

Unfortunately, there is no established how-to guide for performing ML. This is because every ML use case is unique and specific to the application that leverages the resultant ML model. Instead, there is a general process pattern that most data scientists, ML engineers, and ML practitioners follow. This process model is called the Cross-Industry Standard Process for Data Mining (CRISP-DM), and while not everyone follows the specific steps of the process verbatim, most production ML models have probably, in some shape or form, been built using the guardrails that the CRISP-DM methodology provides.

So, when we refer to the ML process, we are invariably referring to the overall methodology of building production-ready ML models using the guardrails from CRISP-DM.

The following diagram shows an overview of the CRISP-DM guidelines for creating a typical process that an ML practitioner might follow:

Figure 1.1 – Overview of a typical ML process

In a nutshell, the process starts with the ML practitioner being tasked with providing an ML model that addresses a specific business use case. The ML practitioner then finds, ingests, and analyzes an appropriate dataset that can be effectively leveraged to accomplish the goals of the ML project.

Once the data has been analyzed, the ML practitioner determines the most applicable modeling techniques that extract the most relevant information from the data to address the use case. These techniques include the following:

Determining the most applicable ML algorithm
Creating new aspects (engineering new features) of the data that can further improve the chosen model's overall effectiveness
Separating the data into training and testing sets for model training and evaluation
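
The last two techniques in this list can be sketched in a few lines of Python. The following is a minimal illustration that uses only the standard library. The record fields (`length`, `width`), the derived `area` feature, and the helper names are hypothetical and are not taken from the book's notebook.

```python
import random


def add_area_feature(record):
    """Return a copy of the record with a derived (engineered) feature."""
    enriched = dict(record)
    enriched["area"] = record["length"] * record["width"]
    return enriched


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle the records and split them into training and testing sets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy dataset: ten records with two raw measurements each.
raw = [{"length": float(i), "width": float(i + 1)} for i in range(1, 11)]
features = [add_area_feature(r) for r in raw]
train_set, test_set = train_test_split(features, test_fraction=0.2)
print(len(train_set), len(test_set))  # prints: 8 2
```

In practice, a library routine would likely replace the hand-written split, but the idea is the same: the model never sees the testing records during training.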

The ML practitioner then codifies the algorithm's architecture and training/testing/evaluation routines. These routines are then executed to determine the best possible model parameters – ones that optimize the model to fit both the data and the business use case.
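
To make the idea of training and evaluation routines concrete, here is a minimal sketch, assuming a one-feature linear model fitted by ordinary least squares. The model choice, the synthetic data, and the function names are illustrative assumptions rather than the book's own example.

```python
import math


def fit_line(xs, ys):
    """Training routine: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("xs must contain at least two distinct values")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(params, xs, ys):
    """Evaluation routine: root mean squared error on a dataset."""
    slope, intercept = params
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic, noise-free data that follows y = 3x + 1 (hypothetical).
train_x, test_x = [0.0, 1.0, 2.0, 3.0], [4.0, 5.0]
train_y = [3 * x + 1 for x in train_x]
test_y = [3 * x + 1 for x in test_x]

params = fit_line(train_x, train_y)      # fit on the training set only
test_error = rmse(params, test_x, test_y)  # score on held-out data
print(params, test_error)
```

Because the toy data is noise-free, the fitted parameters recover the generating line and the held-out error is zero; real data would leave a residual error to compare across candidate models.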

Finally, the best model is deployed into production to serve predictions that match the initial objective of the business use case.

As you can see, the overall process seems relatively straightforward and easy to follow. So, you may be wondering what all the fuss is about. For example, you may be asking yourself, "Where is the complexity in this process?" or "Why do you say that this is so hard to automate?"

While the process may look simplistic, the reality when executing it is vastly different. The following diagram provides a more realistic representation of what an ML practitioner may observe when developing an ML use case:

Figure 1.2 – Overview of a realistic ML process

As you can see, the overall process is far more convoluted than the typical representation shown in Figure 1.1. There are potentially multiple different paths that can be taken through the process. Each course of action is based on the results captured from the previous step in the process. Additionally, taking a particular course of action may not always yield the desired results, forcing the ML practitioner to reset or go back and choose a different set of criteria that will hopefully produce a better result.
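
This back-and-forth can be pictured as a loop that tries one course of action, inspects the result, and falls back to a different option when the result is not good enough. The sketch below is purely illustrative: the candidate names, their scores, and the `target` threshold are made up for the example.

```python
def iterate_until_acceptable(candidates, evaluate, target):
    """Try candidates in order; stop at the first whose score meets the target.

    Returns (chosen, history), where chosen is None if nothing met the target
    and history records every (candidate, score) pair that was attempted.
    """
    history = []
    for candidate in candidates:
        score = evaluate(candidate)
        history.append((candidate, score))
        if score >= target:
            return candidate, history
        # Result not good enough: go back and try a different set of criteria.
    return None, history


# Hypothetical validation scores for three modeling approaches.
scores = {"baseline": 0.61, "engineered-features": 0.74, "tuned-model": 0.88}
chosen, history = iterate_until_acceptable(list(scores), scores.get, target=0.85)
print(chosen, len(history))  # prints: tuned-model 3
```

The `history` list matters as much as the final choice: every failed attempt is a manual decision point, which is part of what makes the realistic process hard to automate.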

So, now that we have provided a high-level overview of what the typical ML process should entail,...