eBook - ePub

Mastering Machine Learning on AWS

Name: Mastering Machine Learning on AWS
Author: Dr. Saket S.R. Mengle, Maximo Gurmendez

Advanced machine learning in Python using SageMaker, Apache Spark, and TensorFlow

Dr. Saket S.R. Mengle, Maximo Gurmendez

306 pages
English
ePUB (mobile-friendly)
Available on iOS and Android

About this book

Gain expertise in ML techniques with AWS to create interactive apps using SageMaker, Apache Spark, and TensorFlow.

Key Features

Build machine learning apps on Amazon Web Services (AWS) using SageMaker, Apache Spark and TensorFlow
Learn model optimization, and understand how to scale your models using simple and secure APIs
Develop, train, tune and deploy neural network models to accelerate model performance in the cloud

Book Description

AWS is constantly driving new innovations that empower data scientists to explore a variety of machine learning (ML) cloud services. This book is your comprehensive reference for learning and implementing advanced ML algorithms in the AWS cloud.

As you go through the chapters, you'll gain insights into how these algorithms can be trained, tuned, and deployed in AWS using Apache Spark on Elastic MapReduce (EMR), SageMaker, and TensorFlow. While you focus on algorithms such as XGBoost, linear models, factorization machines, and deep nets, the book will also provide you with an overview of AWS as well as detailed practical applications that will help you solve real-world problems. Every practical application includes a series of companion notebooks with all the necessary code to run on AWS. In the next few chapters, you will learn to use SageMaker and EMR Notebooks to perform a range of tasks, from smart analytics and predictive modeling through to sentiment analysis.

By the end of this book, you will be equipped with the skills you need to effectively handle machine learning projects and implement and evaluate algorithms on AWS.

What you will learn

Manage AI workflows by using AWS cloud to deploy services that feed smart data products
Use SageMaker services to create recommendation models
Scale model training and deployment using Apache Spark on EMR
Understand how to cluster big data through EMR and seamlessly integrate it with SageMaker
Build deep learning models on AWS using TensorFlow and deploy them as services
Enhance your apps by combining Apache Spark and Amazon SageMaker

Who this book is for

This book is for data scientists, machine learning developers, deep learning enthusiasts and AWS users who want to build advanced models and smart applications on the cloud using AWS and its integration services. Some understanding of machine learning concepts, Python programming and AWS will be beneficial.

Information

Publisher

Packt Publishing

Year

2019

ISBN

9781789347500

Edition

Subject

Computer Science

Subject

Artificial Intelligence (AI) & Semantics

Section 1: Machine Learning on AWS

The objective of this section is to introduce readers to machine learning in the context of AWS cloud computing and services. We expect our audience to have some basic knowledge of machine learning. However, we'll describe the nature of a typically successful machine learning project, and the challenges often faced. We will provide an overview of the different AWS services, along with examples of typical machine learning pipelines and the key aspects to consider in order to create smart AI-powered products.

This section contains the following chapter:

Chapter 1, Getting Started with Machine Learning for AWS

Getting Started with Machine Learning for AWS

In this book, we focus on all three aspects of data science by explaining machine learning (ML) algorithms in business applications, demonstrating how they can be implemented in a scalable environment, and examining how to evaluate models and present evaluation metrics as business key performance indicators (KPIs). This book shows how Amazon Web Services (AWS) ML tools can be effectively used on large datasets. We present various scenarios where mastering ML algorithms in AWS helps data scientists to perform their jobs more effectively.

Let's take a look at the topics we will cover in this chapter:

How AWS empowers data scientists
Identifying candidate problems that can be solved using ML
The ML project life cycle
Deploying models

How AWS empowers data scientists

The number of digital data records stored on the internet has grown enormously in the last decade. Due to falling storage costs and new sources of digital data, it is predicted that the amount of digital data stored in 2025 will reach 163 zettabytes (163,000,000,000 terabytes). Moreover, the amount of data generated every day is increasing at a remarkable pace, with almost 90% of current data having been generated in the last two years alone. With more than 3.5 billion people having access to the internet, this data is generated not only by professionals and large companies, but also by each of those 3.5 billion internet users.

Moreover, since companies understand the importance of data, they store all of their transactional data in the hope of analyzing it and uncovering interesting trends that could help their business make important decisions. Financial investors are also eager to store and understand every bit of information they can get about companies, and they train their quantitative analysts (quants) to use it when making investment decisions.

It is up to the data scientists of the world to analyze this data and find the gems of information embedded in it. In the last decade, the data science team has become one of the most important teams in every organization. When data science teams were first created, most of the data would fit in Microsoft Excel sheets, and the task was to find statistical trends in the data and provide actionable insights to business teams. However, as the amount of data has increased and ML algorithms have become more sophisticated and potent, the scope of data science teams has expanded.

In the following diagram, we can see the three basic skills that a data scientist needs:

The job description for data scientists varies from company to company. However, in general, a data scientist needs the following three crucial skills:

ML: ML algorithms provide tools to analyze and learn from a large amount of data, and to generate predictions or recommendations from that data. ML is an important tool for analyzing structured data (such as databases) and unstructured data (such as text documents), and for inferring actionable insights from them. Because data scientists have access to a large library of algorithms that can solve a given problem, they should be familiar with a wide range of ML algorithms and understand which one to apply in a given situation.
Computer programming: A data scientist should be an adept programmer, able to write code to access various ML and statistical libraries. There are a lot of programming languages, such as Scala, Python, and R, that provide a number of libraries that let us apply ML algorithms on a dataset. Hence, knowledge of such tools helps a data scientist to perform complex tasks within a feasible time frame. This is crucial in a business environment.
Communication: Along with discovering trends in the data and building complex ML models, a data scientist is also tasked with explaining these findings to business teams. Hence, a data scientist must not only possess good communication skills, but also good analytical and visualization skills. This will help them present complex data models in a way that is easily understood by people not familiar with ML. This also helps data scientists to convey their findings to business teams and provide them with guidance on expected outcomes.

Using AWS tools for ML

ML research spans decades and has deep roots in mathematics and statistics. ML algorithms can be used to solve problems in many business applications. In application areas such as advertising, predictive algorithms are used to identify likely new customers based on trends among previous purchasers. Regression algorithms are used to predict stock prices based on prior trends. Services such as Netflix use recommendation algorithms to study the history of a user and enhance the discoverability of new shows that they may be interested in. Artificial Intelligence (AI) applications such as self-driving cars rely heavily on image recognition algorithms that utilize deep learning to effectively discover and label objects on the road. It is important for a data scientist to understand the nuances of different ML algorithms and where they should be applied. Using pre-existing libraries helps a data scientist to explore various algorithms for a given application area and evaluate them. AWS offers a large number of libraries that can be used to perform ML tasks, as explained in the ML algorithms and deep learning algorithms parts of this book.

Identifying candidate problems that can be solved using ML

It is also important for data scientists to understand the scale of the data that they are working with. Some tasks, such as medical research spanning thousands of patients with hundreds of features, can be processed on a single machine. However, tasks such as advertising, where companies collect several petabytes of data on customers based on every online advertisement that is served to the user, may require several thousand machines to compute and train ML algorithms. Deep learning algorithms are GPU-intensive and require a different type of machine than other ML algorithms. In this book, for each algorithm, we describe how it can be implemented using Python libraries, and then how it can be scaled on large AWS clusters using technologies such as Spark and Amazon SageMaker. We also discuss how TensorFlow is used for deep learning applications.
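To make the difference in scale concrete, a rough back-of-the-envelope estimate can help. The sketch below assumes a dense table of 64-bit floating-point values and a hypothetical 16 GiB memory budget. The row and feature counts are illustrative only and are not taken from a real dataset.

```python
# Rough memory estimate for a dense numeric dataset.
# Assumptions (hypothetical): every value is a 64-bit float, and the
# machine has 16 GiB of RAM available for the data.

BYTES_PER_FLOAT64 = 8
GIB = 1024 ** 3


def dense_size_bytes(n_rows: int, n_features: int) -> int:
    """Return the size in bytes of an n_rows x n_features float64 table."""
    return n_rows * n_features * BYTES_PER_FLOAT64


def fits_on_single_node(n_rows: int, n_features: int, ram_bytes: int = 16 * GIB) -> bool:
    """Return True if the dense table fits within the given memory budget."""
    return dense_size_bytes(n_rows, n_features) <= ram_bytes


# Illustrative medical study: 5,000 patients with 300 features each.
medical = dense_size_bytes(5_000, 300)
print(f"medical study: {medical / 1024 ** 2:.1f} MiB")

# Illustrative ad log: 10 billion impressions with 100 features each.
ads = dense_size_bytes(10_000_000_000, 100)
print(f"ad impressions: {ads / 1024 ** 4:.1f} TiB")

print(fits_on_single_node(5_000, 300))
print(fits_on_single_node(10_000_000_000, 100))
```

An estimate like this only bounds the raw data size. Training usually needs additional working memory, so a dataset close to the budget may still call for a cluster.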

It is crucial for data scientists to understand who the customers of their ML-related tasks are. Although it is challenging to find which algorithm works for a specific application area, it is just as important to gather evidence on how that algorithm enhances the application area and to present this evidence to the product owners. Hence, we also discuss how to evaluate each algorithm and visualize the results where necessary. AWS offers a large array of tools for evaluating ML algorithms and presenting the results.

Finally, a data scientist also needs to be able to make decisions on what types of machines best fit their needs on AWS. Once the algorithm is implemented, there are important considerations regarding how it can be deployed on large clusters in the most economical way. AWS offers more than 25 hardware alternatives, called instance types, which can be selected. We will discuss case studies on how an application is deployed on production clusters, and the various issues that a data scientist can face during this process.
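One way to reason about the most economical choice is to compare the estimated cost of a job across candidate configurations. The following is a minimal sketch. The option names, hourly prices, cluster sizes, and durations are all hypothetical placeholders, not real AWS instance types or prices.

```python
# Compare the estimated cost of one training job across instance options.
# All names, prices, and durations below are hypothetical placeholders,
# not real AWS instance types or prices.

from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str             # hypothetical label for an instance choice
    hourly_price: float   # USD per instance-hour (made up)
    instances: int        # number of machines in the cluster
    hours: float          # estimated wall-clock training time

    def cost(self) -> float:
        """Total cost: price per instance-hour x instances x hours."""
        return self.hourly_price * self.instances * self.hours


def cheapest(options, max_hours):
    """Return the lowest-cost option that finishes within max_hours, or None."""
    feasible = [o for o in options if o.hours <= max_hours]
    return min(feasible, key=lambda o: o.cost(), default=None)


options = [
    Option("small-cpu", hourly_price=0.10, instances=1, hours=40.0),
    Option("large-cpu", hourly_price=0.80, instances=4, hours=3.0),
    Option("gpu", hourly_price=3.00, instances=1, hours=2.0),
]

best = cheapest(options, max_hours=8.0)
print(best.name, round(best.cost(), 2))
```

The deadline parameter matters: with no time limit the slowest, cheapest machine wins, while a tight deadline can make a more expensive instance the economical choice.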

The ML project life cycle

A typical ML project life cycle starts by understanding the problem at hand. Typically, someone in the organization (possibly a data scientist or business stakeholder) feels that some part of their business can be improved by the use of ML. For example, a music streaming company could conjecture that providing recommendations of songs similar to those played by a user would improve user engagement with the platform. Once we understand the business context and possible business actions to take, the data science team will need to consider several aspects during the project life cycle.
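As an illustration of the music streaming example, the sketch below scores song similarity by how many listeners two songs share, using Jaccard similarity. This is one simple technique chosen for illustration, not necessarily the approach taken later in the book, and the users and songs are made-up toy data.

```python
# Item-to-item similarity from listening histories (toy, hypothetical data).
# Two songs count as "similar" when many of the same users played both;
# Jaccard similarity is one simple way to score that overlap.

from collections import defaultdict

plays = {
    "ana": {"song_a", "song_b", "song_c"},
    "ben": {"song_a", "song_b"},
    "cat": {"song_b", "song_c", "song_d"},
    "dan": {"song_d"},
}


def listeners_by_song(plays):
    """Invert user -> songs into song -> set of users who played it."""
    index = defaultdict(set)
    for user, songs in plays.items():
        for song in songs:
            index[song].add(user)
    return index


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_songs(song, plays, top_n=2):
    """Rank the other songs by listener overlap with the given song."""
    index = listeners_by_song(plays)
    target = index.get(song, set())
    scores = [
        (other, jaccard(target, users))
        for other, users in index.items()
        if other != song
    ]
    # Sort by score descending, then by name so ties are deterministic.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


print(similar_songs("song_a", plays))
```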

The following diagram describes various steps in the ML project life cycle:

Data gathering

We need to obtain data and organize it appropriately for the current problem (in our example, this could mean building a dataset linking users to songs they've listened to in the past). Depending on the size of the data, we might pick different technologies for storing the data. For example, it might be fine to train on a local machine using scikit-learn if we're working through a few million records. However, if the data doesn't fit on a single computer, then we must consider AWS solutions such as S3 for storage, and Apache Spark or SageMaker's built-in algorithms for model building.
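For a small dataset, the step of linking users to the songs they have listened to can be sketched with the Python standard library alone. The event format (user_id, song_id, seconds_played), the inline sample data, and the 30-second threshold below are hypothetical choices made for this sketch.

```python
# Build a user -> songs dataset from raw play events (toy data).
# The event format (user_id, song_id, seconds_played) and the 30-second
# threshold are hypothetical choices for this sketch.

import csv
import io
from collections import Counter, defaultdict

RAW_EVENTS = """user_id,song_id,seconds_played
u1,s1,210
u1,s2,12
u1,s1,180
u2,s2,200
u2,s3,95
"""

MIN_SECONDS = 30  # ignore plays that were skipped almost immediately


def build_play_counts(csv_text, min_seconds=MIN_SECONDS):
    """Return {user_id: Counter({song_id: number_of_counted_plays})}."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["seconds_played"]) >= min_seconds:
            counts[row["user_id"]][row["song_id"]] += 1
    return counts


dataset = build_play_counts(RAW_EVENTS)
for user, songs in sorted(dataset.items()):
    print(user, dict(songs))
```

When the event log no longer fits on one machine, the same group-and-count logic would typically be expressed in a distributed engine such as Apache Spark, with the raw events stored in S3.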

Evaluation metrics

Before applying an ML algorithm, we need to consider how to assess the effectiveness of our strategy. In some cases, we can use part of our data to simulate the performance of the algorithm. However, on other occasions, the only viable way to ev...