Mastering Machine Learning on AWS

Advanced machine learning in Python using SageMaker, Apache Spark, and TensorFlow

Dr. Saket S.R. Mengle, Maximo Gurmendez


About This Book

Gain expertise in ML techniques with AWS to create interactive apps using SageMaker, Apache Spark, and TensorFlow.

Key Features

  • Build machine learning apps on Amazon Web Services (AWS) using SageMaker, Apache Spark, and TensorFlow
  • Learn model optimization and understand how to scale your models using simple and secure APIs
  • Develop, train, tune, and deploy neural network models to accelerate model performance in the cloud

Book Description

AWS is constantly driving new innovations that empower data scientists to explore a variety of machine learning (ML) cloud services. This book is your comprehensive reference for learning and implementing advanced ML algorithms in AWS cloud.

As you go through the chapters, you'll gain insights into how these algorithms can be trained, tuned, and deployed in AWS using Apache Spark on Elastic MapReduce (EMR), SageMaker, and TensorFlow. While you focus on algorithms such as XGBoost, linear models, factorization machines, and deep nets, the book will also provide you with an overview of AWS, as well as detailed practical applications that will help you solve real-world problems. Every practical application includes a series of companion notebooks with all the necessary code to run on AWS. In the next few chapters, you will learn to use SageMaker and EMR Notebooks to perform a range of tasks, from smart analytics and predictive modeling through to sentiment analysis.

By the end of this book, you will be equipped with the skills you need to effectively handle machine learning projects and implement and evaluate algorithms on AWS.

What you will learn

  • Manage AI workflows by using AWS cloud to deploy services that feed smart data products
  • Use SageMaker services to create recommendation models
  • Scale model training and deployment using Apache Spark on EMR
  • Understand how to cluster big data through EMR and seamlessly integrate it with SageMaker
  • Build deep learning models on AWS using TensorFlow and deploy them as services
  • Enhance your apps by combining Apache Spark and Amazon SageMaker

Who this book is for

This book is for data scientists, machine learning developers, deep learning enthusiasts, and AWS users who want to build advanced models and smart applications on the cloud using AWS and its integration services. Some understanding of machine learning concepts, Python programming, and AWS will be beneficial.


Section 1: Machine Learning on AWS

The objective of this section is to introduce readers to machine learning in the context of AWS cloud computing and services. We expect our audience to have some basic knowledge of machine learning; nevertheless, we'll describe what a typical, successful machine learning project looks like and the challenges that are often faced along the way. We will provide an overview of the different AWS services, along with examples of typical machine learning pipelines and the key aspects to consider in order to create smart, AI-powered products.
This section contains the following chapter:
  • Chapter 1, Getting Started with Machine Learning for AWS

Getting Started with Machine Learning for AWS

In this book, we focus on all three aspects of data science by explaining machine learning (ML) algorithms in business applications, demonstrating how they can be implemented in a scalable environment, and examining how to evaluate models and present evaluation metrics as business key performance indicators (KPIs). This book shows how Amazon Web Services (AWS) ML tools can be effectively used on large datasets. We present various scenarios where mastering ML algorithms in AWS helps data scientists to perform their jobs more effectively.
Let's take a look at the topics we will cover in this chapter:
  • How AWS empowers data scientists
  • Identifying candidate problems that can be solved using ML
  • The ML project life cycle
  • Deploying models

How AWS empowers data scientists

The number of digital data records stored on the internet has grown enormously over the last decade. Thanks to falling storage costs and new sources of digital data, it is predicted that the amount of digital data stored in 2025 will reach 163 zettabytes (163,000,000,000 terabytes). Moreover, the rate at which data is generated keeps accelerating, with almost 90% of today's data having been generated during the last two years alone. With more than 3.5 billion people having access to the internet, this data is generated not only by professionals and large companies, but also by each of those 3.5 billion internet users.
Moreover, since companies understand the importance of data, they store all of their transactional data in the hope of analyzing it and uncovering interesting trends that could help them make important business decisions. Financial investors likewise want to store and understand every bit of information they can get about companies, and they rely on their quantitative analysts, or quants, to turn that information into investment decisions.
It is up to the data scientists of the world to analyze this data and find the gems of information embedded in it. In the last decade, the data science team has become one of the most important teams in many organizations. When data science teams were first created, most of the data would fit in Microsoft Excel sheets, and the task was to find statistical trends in the data and provide actionable insights to business teams. However, as the amount of data has increased and ML algorithms have become more sophisticated and powerful, the scope of data science teams has expanded.
In the following diagram, we can see the three basic skills that a data scientist needs:
The job description for data scientists varies from company to company. However, in general, a data scientist needs the following three crucial skills:
  • ML: ML algorithms provide tools for analyzing and learning from large amounts of data, and for generating predictions or recommendations from that data. They are important for analyzing structured data (such as databases) and unstructured data (such as text documents), and for inferring actionable insights from both. Since a data scientist has access to a large library of algorithms that could solve a given problem, they should be familiar with a wide range of ML algorithms and understand which one to apply in each situation.
  • Computer programming: A data scientist should be an adept programmer, able to write code that uses various ML and statistical libraries. Languages such as Scala, Python, and R provide many libraries that let us apply ML algorithms to a dataset, and knowledge of such tools helps a data scientist perform complex tasks within a feasible time frame. This is crucial in a business environment. A short sketch of what this looks like in practice follows this list.
  • Communication: Along with discovering trends in the data and building complex ML models, a data scientist is also tasked with explaining these findings to business teams. Hence, a data scientist must possess not only good communication skills, but also good analytical and visualization skills. These help them present complex data models in a way that is easily understood by people who are not familiar with ML, convey their findings to business teams, and provide guidance on expected outcomes.
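
To make the computer programming point above concrete, the following is a minimal sketch of how a few lines of Python can train and apply an ML algorithm using scikit-learn. The file name, column names, and choice of algorithm are illustrative assumptions, not something prescribed at this point in the book.

```python
# A minimal sketch: train and evaluate a classifier with scikit-learn.
# The CSV file, column names, and algorithm choice are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a (hypothetical) dataset with feature columns and a binary 'label' column.
data = pd.read_csv('customers.csv')
X = data.drop(columns=['label'])
y = data['label']

# Hold out part of the data to check how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print('Test accuracy:', accuracy_score(y_test, model.predict(X_test)))
```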

Using AWS tools for ML

ML research spans decades and has deep roots in mathematics and statistics. ML algorithms can be used to solve problems in many business applications. In advertising, predictive algorithms are used to identify likely new customers based on trends among previous purchasers. Regression algorithms are used to predict stock prices based on prior trends. Services such as Netflix use recommendation algorithms to study a user's history and improve the discoverability of new shows that they may be interested in. Artificial Intelligence (AI) applications such as self-driving cars rely heavily on image-recognition algorithms that use deep learning to discover and label objects on the road. It is important for a data scientist to understand the nuances of different ML algorithms and where they should be applied. Using pre-existing libraries helps a data scientist explore various algorithms for a given application area and evaluate them. AWS offers a large number of libraries that can be used to perform ML tasks, as explained in the ML algorithms and deep learning algorithms parts of this book.

Identifying candidate problems that can be solved using ML

It is also important for data scientists to understand the scale of the data that they are working with. A medical research task might span thousands of patients with hundreds of features, and can be processed on a single machine. However, tasks such as advertising, where companies collect several petabytes of data on customers from every online advertisement served to users, may require several thousand machines to compute and train ML algorithms. Deep learning algorithms are GPU-intensive and require a different type of machine than other ML algorithms. In this book, for each algorithm, we first describe how it can be implemented simply using Python libraries, and then how it can be scaled to large AWS clusters using technologies such as Spark and AWS SageMaker. We also discuss how TensorFlow is used for deep learning applications.
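
To give a flavor of the cluster-scale side of this contrast, here is a minimal PySpark sketch of the kind of code the Spark-on-EMR chapters build toward. The S3 path and column names are placeholder assumptions; the actual chapters work through the details.

```python
# A sketch of distributed training with Spark MLlib, as one might run on an EMR cluster.
# The S3 path and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName('scaling-example').getOrCreate()

# Read a large dataset directly from S3; Spark partitions it across the cluster.
df = spark.read.csv('s3://my-bucket/training-data/', header=True, inferSchema=True)

# Assemble the raw feature columns into the single vector column MLlib expects.
assembler = VectorAssembler(inputCols=['feature1', 'feature2'], outputCol='features')
train_df = assembler.transform(df)

# Fit a logistic regression model; the work is parallelized across executors.
lr = LogisticRegression(featuresCol='features', labelCol='label')
model = lr.fit(train_df)
```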
It is also crucial for data scientists to understand who the customer of their ML work is. Although it is challenging to find which algorithm works best for a specific application area, it is just as important to gather evidence on how that algorithm improves the application and to present that evidence to the product owners. Hence, we also discuss how to evaluate each algorithm and visualize the results where necessary. AWS offers a large array of tools for evaluating ML algorithms and presenting the results.
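
As a small example of evaluating and presenting results, the sketch below (continuing the scikit-learn sketch from earlier in this chapter) computes an AUC score and plots an ROC curve with matplotlib. The metric and plot are illustrative choices, not the book's prescribed evaluation workflow.

```python
# A sketch of evaluating a binary classifier and visualizing the result.
# Assumes model, X_test, and y_test come from the earlier scikit-learn sketch.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

scores = model.predict_proba(X_test)[:, 1]   # probability of the positive class
auc = roc_auc_score(y_test, scores)

fpr, tpr, _ = roc_curve(y_test, scores)
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {auc:.3f})')
plt.plot([0, 1], [0, 1], linestyle='--', label='random baseline')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()
```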
Finally, a data scientist also needs to be able to decide which types of machines best fit their needs on AWS. Once an algorithm is implemented, there are important considerations regarding how it can be deployed on large clusters in the most economical way. AWS offers more than 25 hardware alternatives, called instance types, to choose from. We will discuss case studies of how applications are deployed on production clusters, and the various issues that a data scientist can face during this process.
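
Instance types are something you specify explicitly when launching training or hosting jobs. As a hedged sketch, this is roughly how an instance type is chosen with the SageMaker Python SDK (v2-style argument names); the container image, IAM role, S3 paths, and instance types below are placeholder assumptions, and argument names differ slightly between SDK versions.

```python
# A sketch of selecting instance types for training and hosting with the SageMaker Python SDK.
# The image URI, IAM role, S3 paths, and instance types are placeholders.
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

estimator = Estimator(
    image_uri='<training-image-uri>',           # a built-in or custom algorithm container
    role='<execution-role-arn>',
    instance_count=2,                           # number of training machines
    instance_type='ml.m5.xlarge',               # hardware choice for training
    output_path='s3://my-bucket/model-output/',
    sagemaker_session=session,
)
estimator.fit({'train': 's3://my-bucket/training-data/'})

# Hosting can use a different (often cheaper) instance type than training.
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.t2.medium')
```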

The ML project life cycle

A typical ML project life cycle starts by understanding the problem at hand. Typically, someone in the organization (possibly a data scientist or business stakeholder) feels that some part of their business can be improved by the use of ML. For example, a music streaming company could conjecture that providing recommendations of songs similar to those played by a user would improve user engagement with the platform. Once we understand the business context and possible business actions to take, the data science team will need to consider several aspects during the project life cycle.
The following diagram describes various steps in the ML project life cycle:

Data gathering

We need to obtain data and organize it appropriately for the problem at hand (in our example, this could mean building a dataset linking users to the songs they've listened to in the past). Depending on the size of the data, we might pick different technologies for storing it. For example, it might be fine to train on a local machine using scikit-learn if we're working with a few million records. However, if the data doesn't fit on a single computer, then we must consider AWS solutions such as S3 for storage, and Apache Spark or SageMaker's built-in algorithms for model building.
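
As a small illustration of this storage decision, the sketch below contrasts reading a modest dataset from a local file with pulling objects from S3 once the data has outgrown one machine. The bucket name, keys, and file names are placeholder assumptions.

```python
# A sketch of the two data-gathering scenarios described above.
# Bucket names, keys, and file names are placeholders.
import boto3
import pandas as pd

# Small dataset: load it straight into memory on a single machine.
local_df = pd.read_csv('user_song_plays.csv')

# Larger dataset: keep it in S3 and download objects as needed
# (or point Spark or SageMaker's built-in algorithms at the S3 prefix).
s3 = boto3.client('s3')
s3.download_file('my-bucket', 'plays/2019/01/part-0000.csv', '/tmp/part-0000.csv')
chunk_df = pd.read_csv('/tmp/part-0000.csv')
```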

Evaluation metrics

Before applying an ML algorithm, we need to consider how to assess the effectiveness of our strategy. In some cases, we can use part of our data to simulate the performance of the algorithm. However, on other occasions, the only viable way to ev...
