eBook - ePub

Amazon SageMaker Best Practices

Name: Amazon SageMaker Best Practices
Author: Sireesha Muppala, Randy DeFauw, Shelbee Eigenbrode

348 pages
English
ePUB (mobile-friendly)

About this book

Overcome advanced challenges in building end-to-end ML solutions by leveraging the capabilities of Amazon SageMaker for developing and integrating ML models into production

Key Features

• Learn best practices for all phases of building machine learning solutions, from data preparation to monitoring models in production
• Automate end-to-end machine learning workflows with Amazon SageMaker and related AWS services
• Design, architect, and operate machine learning workloads in the AWS Cloud

Book Description

Amazon SageMaker is a fully managed AWS service that provides the ability to build, train, deploy, and monitor machine learning models. The book begins with a high-level overview of Amazon SageMaker capabilities that map to the various phases of the machine learning process to help set the right foundation. You'll learn efficient tactics to address data science challenges such as processing data at scale, data preparation, connecting to big data pipelines, identifying data bias, running A/B tests, and model explainability using Amazon SageMaker. As you advance, you'll understand how you can tackle the challenge of training at scale, including how to use large datasets while saving costs, monitoring training resources to identify bottlenecks, speeding up long training jobs, and tracking multiple models trained for a common goal. Moving ahead, you'll find out how you can integrate Amazon SageMaker with other AWS services to build reliable, cost-optimized, and automated machine learning applications. In addition, you'll build ML pipelines integrated with MLOps principles and apply best practices to build secure and performant solutions.

By the end of the book, you'll be able to confidently apply Amazon SageMaker's wide range of capabilities to the full spectrum of machine learning workflows.

What you will learn

• Perform data bias detection with SageMaker Data Wrangler and SageMaker Clarify
• Speed up data processing with SageMaker Feature Store
• Overcome labeling bias with SageMaker Ground Truth
• Improve training time with the monitoring and profiling capabilities of SageMaker Debugger
• Address the challenge of model deployment automation with CI/CD using the SageMaker model registry
• Explore SageMaker Neo for model optimization
• Implement data and model quality monitoring with SageMaker Model Monitor
• Improve training time and reduce costs with SageMaker data and model parallelism

Who this book is for

This book is for expert data scientists responsible for building machine learning applications using Amazon SageMaker. Working knowledge of Amazon SageMaker, machine learning, and deep learning, and experience using Jupyter Notebooks and Python are expected. Basic knowledge of AWS services related to data, security, and monitoring will help you make the most of the book.

Information

Publisher: Packt Publishing
Year: 2021
ISBN: 9781801077767
Subject: Computer Science
Sub-subject: Data Modelling & Design

Section 1: Processing Data at Scale

This section sets the foundation for the rest of the book with an overview of Amazon SageMaker capabilities, a review of technical requirements, and insights on setting up the data science environment on AWS. It then addresses the challenges involved in labeling and preparing large volumes of data. You will learn how to apply appropriate Amazon SageMaker capabilities and related services to derive features from raw data, and how to persist those features in a centralized repository so that they can be reused and shared across multiple ML projects.

This section comprises the following chapters:

Chapter 1, Amazon SageMaker Overview
Chapter 2, Data Science Environments
Chapter 3, Data Labeling with Amazon SageMaker Ground Truth
Chapter 4, Data Preparation at Scale Using Amazon SageMaker Data Wrangler and Processing
Chapter 5, Centralized Feature Repository with Amazon SageMaker Feature Store

Chapter 1: Amazon SageMaker Overview

This chapter provides a high-level overview of the Amazon SageMaker capabilities that map to the various phases of the machine learning (ML) process. This overview sets the foundation for the later discussion of best practices for using SageMaker capabilities to handle various data science challenges.

In this chapter, we're going to cover the following main topics:

Preparing, building, training and tuning, deploying, and managing ML models
Discussion of data preparation capabilities
Feature tour of model-building capabilities
Feature tour of training and tuning capabilities
Feature tour of model management and deployment capabilities

Technical requirements

All notebooks with coding exercises will be available at the following GitHub link:

https://github.com/PacktPublishing/Amazon-SageMaker-Best-Practices

Preparing, building, training and tuning, deploying, and managing ML models

First, let's review the ML life cycle. By the end of this section, you should understand how SageMaker's capabilities map to the key phases of the ML life cycle. The following diagram shows you what the ML life cycle looks like:

Figure 1.1 – Machine learning life cycle

As you can see, there are three phases of the ML life cycle at a high level:

In the Data Preparation phase, you collect and explore data, label a ground truth dataset, and prepare your features. Feature engineering, in turn, has several steps, including data normalization, encoding, and calculating embeddings, depending on the ML algorithm you choose.
In the Model Training phase, you build your model and tune it until you achieve a reasonable validation score that aligns with your business objective.
In the Operations phase, you test how well your model performs against real-world data, deploy it, and monitor how well it performs. We will cover model monitoring in more detail in Chapter 11, Monitoring Production Models with Amazon SageMaker Model Monitor and Clarify.
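
To make the feature engineering steps named in the Data Preparation phase more concrete, here is a minimal sketch of two of them, normalization and encoding, written in plain Python. The function names and the toy data are hypothetical and invented for illustration; they are not taken from the book's notebooks, and in practice you would more likely use a library or a SageMaker capability for these steps.

```python
# Minimal sketch of two feature engineering steps: min-max normalization
# and one-hot encoding. Standard library only; names and data are hypothetical.


def min_max_normalize(values):
    """Scale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(categories):
    """Return (vocabulary, rows) with one 0/1 indicator column per category."""
    vocabulary = sorted(set(categories))
    index = {cat: i for i, cat in enumerate(vocabulary)}
    rows = []
    for cat in categories:
        row = [0] * len(vocabulary)
        row[index[cat]] = 1
        rows.append(row)
    return vocabulary, rows


# Hypothetical toy data, invented for this example.
ages = [20, 30, 40]
plans = ["basic", "premium", "basic"]

scaled_ages = min_max_normalize(ages)
vocabulary, encoded_plans = one_hot_encode(plans)

print(scaled_ages)    # [0.0, 0.5, 1.0]
print(vocabulary)     # ['basic', 'premium']
print(encoded_plans)  # [[1, 0], [0, 1], [1, 0]]
```

Which of these steps you need depends on the algorithm: many tree-based models are insensitive to feature scaling, whereas distance-based and gradient-based models usually benefit from it.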

This diagram is purposely simplified; in reality, each phase may have multiple smaller steps, and the whole life cycle is iterative. You're never really done with ML; as you gather data on how your model performs in production, you'll likely try to improve it by collecting more data, changing your features, or tuning the model.

So how do SageMaker capabilities map to the ML life cycle? Before we answer that question, let's take a look at the SageMaker console (Figure 1.2):

Figure 1.2 – Navigation pane in the SageMaker console

The appearance of the console changes frequently; the preceding screenshot shows how it looked at the time of writing.

The capability groups in the navigation pane align with the ML life cycle as follows:

Figure 1.3 – Mapping of SageMaker capabilities to the ML life cycle

SageMaker Studio is not shown here, as it is an integrated workbench that provides a user interface for many SageMaker capabilities. The marketplace provides both data and algorithms that can be used across the life cycle.

Now that we have had a look at the console, let's dive deeper into the individual capabilities of SageMaker in each life cycle phase.

Discussion of data preparation capabilities

In this section, we'll dive into SageMaker's data preparation and feature engineering capabilities. By the end of this section, you should understand when to use SageMaker Ground Truth, Data Wrangler, Processing, Feature Store, and Clarify.

SageMaker Ground Truth

Obtaining labeled data for classification, regression, and other tasks is often the biggest barrier to ML projects, as many companies have a lot of data but have not explicitly labeled it according to business properties such as "anomalous" or "high lifetime value". SageMaker Ground Truth helps you systematically label data by defining a labeling workflow and assigning labeling tasks to a human workforce.
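
As a rough illustration of what "assigning labeling tasks to a human workforce" involves, the following sketch distributes items among workers in round-robin order. This is not how Ground Truth is implemented and does not use its API; the function `assign_labeling_tasks`, the worker names, and the item IDs are all hypothetical.

```python
# Conceptual sketch only: round-robin assignment of labeling tasks to workers.
# This does not call SageMaker Ground Truth; all names are hypothetical.
from itertools import cycle


def assign_labeling_tasks(item_ids, workers, labels_per_item=1):
    """Assign each item to `labels_per_item` distinct workers, round-robin."""
    if labels_per_item > len(workers):
        raise ValueError("labels_per_item cannot exceed the number of workers")
    assignments = {worker: [] for worker in workers}
    worker_cycle = cycle(workers)
    for item_id in item_ids:
        # Consecutive draws from the cycle are distinct workers as long as
        # labels_per_item <= len(workers), which is checked above.
        for _ in range(labels_per_item):
            assignments[next(worker_cycle)].append(item_id)
    return assignments


# Hypothetical items and workforce.
items = ["img-1", "img-2", "img-3"]
workforce = ["alice", "bob"]

single = assign_labeling_tasks(items, workforce)
double = assign_labeling_tasks(items, workforce, labels_per_item=2)

print(single)  # {'alice': ['img-1', 'img-3'], 'bob': ['img-2']}
print(double)  # every item goes to both workers
```

Sending each item to more than one worker, as in the second call, is a common way to cross-check labels, at the cost of proportionally more labeling effort.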

Over time, Ground Truth can learn how to label data automatica...