eBook - ePub

Machine Learning Bookcamp

Name: Machine Learning Bookcamp
Author: Alexey Grigorev

Build a portfolio of real-life projects

Alexey Grigorev

472 pages
English
ePUB (mobile friendly)

About This Book

Time to flex your machine learning muscles! Take on the carefully designed challenges of the Machine Learning Bookcamp and master essential ML techniques through practical application.

Summary

In Machine Learning Bookcamp you will:

Collect and clean data for training models
Use popular Python tools, including NumPy, Scikit-Learn, and TensorFlow
Apply ML to complex datasets with images
Deploy ML models to a production-ready environment

The only way to learn is to practice! In Machine Learning Bookcamp, you'll create and deploy Python-based machine learning models for a variety of increasingly challenging projects. Taking you from the basics of machine learning to complex applications such as image analysis, each new project builds on what you've learned in previous chapters. You'll build a portfolio of business-relevant machine learning projects that hiring managers will be excited to see. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology

Master key machine learning concepts as you build actual projects! Machine learning is what you need for analyzing customer behavior, predicting price trends, evaluating risk, and much more. To master ML, you need great examples, clear explanations, and lots of practice. This book delivers all three!

About the book

Machine Learning Bookcamp presents realistic, practical machine learning scenarios, along with crystal-clear coverage of key concepts. In it, you'll complete engaging projects, such as creating a car price predictor using linear regression and deploying a churn prediction service. You'll go beyond the algorithms and explore important techniques like deploying ML applications on serverless systems and serving models with Kubernetes and Kubeflow. Dig in, get your hands dirty, and have fun building your ML skills!

What's inside

Collect and clean data for training models
Use popular Python tools, including NumPy, Scikit-Learn, and TensorFlow
Deploy ML models to a production-ready environment

About the reader

Python programming skills assumed. No previous machine learning knowledge is required.

About the author

Alexey Grigorev is a principal data scientist at OLX Group. He runs DataTalks.Club, a community of people who love data.

Table of Contents

1 Introduction to machine learning
2 Machine learning for regression
3 Machine learning for classification
4 Evaluation metrics for classification
5 Deploying machine learning models
6 Decision trees and ensemble learning
7 Neural networks and deep learning
8 Serverless deep learning
9 Serving models with Kubernetes and Kubeflow

Information

Publisher: Manning
Year: 2021
ISBN: 9781638351054
Topic: Computer Science
Subtopic: Data Processing

1 Introduction to machine learning

This chapter covers

Understanding machine learning and the problems it can solve
Organizing a successful machine learning project
Training and selecting machine learning models
Performing model validation

In this chapter, we introduce machine learning and describe the cases in which it’s most helpful. We show how machine learning projects are different from traditional software engineering (rule-based solutions) and illustrate the differences by using a spam-detection system as an example.

To use machine learning to solve real-life problems, we need a way to organize machine learning projects. In this chapter, we talk about CRISP-DM: a step-by-step methodology for implementing successful machine learning projects.

Finally, we take a closer look at one of the steps of CRISP-DM—the modeling step. In this step, we train different models and select the one that solves our problem best.

1.1 Machine learning

Machine learning is part of applied mathematics and computer science. It uses tools from mathematical disciplines such as probability, statistics, and optimization theory to extract patterns from data.

The main idea behind machine learning is learning from examples: we prepare a dataset with examples, and a machine learning system “learns” from this dataset. In other words, we give the system the input and the desired output, and the system tries to figure out how to do the conversion automatically, without asking a human.

We can collect a dataset with descriptions of cars and their prices, for example. Then we provide a machine learning model with this dataset and “teach” it by showing it cars and their prices. This process is called training or sometimes fitting (figure 1.1).

Figure 1.1 A machine learning algorithm takes in input data (descriptions of cars) and desired output (the cars’ prices). Based on that data, it produces a model.

When training is done, we can use the model by asking it to predict car prices that we don’t know yet (figure 1.2).

Figure 1.2 When training is done, we have a model that can be applied to new input data (cars without prices) to produce the output (predictions of prices).

All we need for machine learning is a dataset in which for each input item (a car) we have the desired output (the price).
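
The train-then-predict workflow described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the book's own code: the dataset is made up, each car is described by a single hypothetical feature (its age in years), and the "model" is a straight line fitted with ordinary least squares.

```python
# Toy dataset (made-up numbers, for illustration only).
# Input: car age in years. Desired output: price in dollars.
ages = [1, 2, 3, 4, 5]
prices = [28000, 26000, 24000, 22000, 20000]


def train(xs, ys):
    """Fit price = intercept + slope * age using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope  # the "model" is just these two numbers


def predict(model, x):
    """Apply the trained model to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Training: the algorithm sees inputs together with the desired outputs.
model = train(ages, prices)

# Prediction: apply the model to a car whose price we don't know yet.
predicted_price = predict(model, 6)
print(predicted_price)
```

Nobody wrote down the relationship between age and price here; the `train` function derived it from the examples.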

This process is quite different from traditional software engineering. Without machine learning, analysts and developers look at the data they have and try to find patterns manually. After that, they come up with some logic: a set of rules for converting the input data to the desired output. Then they explicitly encode these rules using a programming language such as Java or Python, and the result is called software. So, in contrast with machine learning, a human does all the difficult work (figure 1.3).

Figure 1.3 In traditional software, patterns are discovered manually and then encoded with a programming language. A human does all the work.

In summary, the difference between a traditional software system and a system based on machine learning is shown in figure 1.4. In machine learning, we give the system the input and output data, and the result is a model (code) that can transform the input into the output. The difficult work is done by the machine; we need only supervise the training process to make sure that the model is good (figure 1.4B). In contrast, in traditional systems, we first find the patterns in the data ourselves and then write code that converts the data to the desired outcome, using the manually discovered patterns (figure 1.4A).

	(A) In traditional software we discover patterns manually and encode them using a programming language.
	(B) A machine learning system discovers patterns automatically by learning from examples. After training, it produces a model that “knows” these patterns, but we still need to supervise it to make sure the model is correct.

Figure 1.4 The difference between a traditional software system and a machine learning system. In traditional software engineering, we do all the work, whereas in machine learning, we delegate pattern discovery to a machine.

1.1.1 Machine learning vs. rule-based systems

To illustrate the difference between these two approaches and to show why machine learning is helpful, let’s consider a concrete case: a spam-detection system.

Suppose we are running an email service, and the users start complaining about unsolicited emails with advertisements. To solve this problem, we want to create a system that marks the unwanted messages as spam and forwards them to the spam folder.

The obvious way to solve the problem is to look at these emails ourselves to see whether they have any pattern. For example, we can check the sender and the content.

If we find that the spam messages do share patterns, we write them down and come up with the following two simple rules to catch these messages:

If sender = [email protected], then “spam”
If title contains “buy now 50% off” and sender domain is “online.com,” then “spam”
Otherwise, “good email”

We write these rules in Python and create a spam-detection service, which we successfully deploy. At the beginning, the system works well and catches all the spam, but after a while, new spam messages start to slip through. The rules we have are no longer successful at marking these messages as spam.
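
A rule-based service of this kind might look roughly like the sketch below. It is an assumption-laden illustration rather than the book's implementation: the function name `classify` is invented, and `BLOCKED_SENDER` is a hypothetical placeholder because the actual sender address is obscured in the text.

```python
# Hypothetical placeholder: the real sender address is obscured in the text.
BLOCKED_SENDER = "promo@example.com"


def classify(sender: str, title: str) -> str:
    """Apply the two hand-written rules; everything else is a good email."""
    sender = sender.lower()
    domain = sender.rsplit("@", 1)[-1]

    # Rule 1: the message comes from a known spam sender.
    if sender == BLOCKED_SENDER:
        return "spam"

    # Rule 2: a known spam phrase in the title, sent from online.com.
    if "buy now 50% off" in title.lower() and domain == "online.com":
        return "spam"

    # Otherwise: no rule matched.
    return "good email"


labels = [
    classify("promo@example.com", "Hello"),
    classify("deals@online.com", "Buy now 50% off everything"),
    classify("friend@mail.com", "Lunch tomorrow?"),
]
print(labels)
```

Every pattern the filter knows about is spelled out by hand, so it can only catch spam that looks like the messages we have already inspected.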

To solve the problem, we analyze the content of the new messages and find that most of them contain the word “deposit.” So we add a new rule:

If sender = [email protected], then “spam”
If title contains “buy now 50% off” and sender domain is “online.com,” then “spam”
If body contains the word “deposit,” then “spam”
Otherwise, “good email”

After discovering this rule, we deploy the fix to our Python service and start catching more spam, making the users of our mail system happy.
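
One possible way to organize the updated service is to keep the rules in a list, so that a fix like this one is a single added entry. The sketch below is an assumption, not the book's code: the names `RULES`, `classify`, and `sender_domain` are invented, emails are modeled as plain dictionaries, and `BLOCKED_SENDER` is again a hypothetical placeholder for the obscured address.

```python
import re

# Hypothetical placeholder: the real sender address is obscured in the text.
BLOCKED_SENDER = "promo@example.com"


def sender_domain(email):
    """Return the part of the sender address after the last '@'."""
    return email["sender"].lower().rsplit("@", 1)[-1]


# Each rule takes an email (a dict) and returns True if it looks like spam.
RULES = [
    # Rule 1: known spam sender.
    lambda e: e["sender"].lower() == BLOCKED_SENDER,
    # Rule 2: known spam phrase in the title, sent from online.com.
    lambda e: "buy now 50% off" in e["title"].lower()
    and sender_domain(e) == "online.com",
    # Rule 3 (new): the body contains the word "deposit".
    lambda e: re.search(r"\bdeposit\b", e["body"], re.IGNORECASE) is not None,
]


def classify(email):
    """Mark the email as spam if any rule matches."""
    return "spam" if any(rule(email) for rule in RULES) else "good email"


new_spam = {
    "sender": "unknown@mail.com",
    "title": "Urgent",
    "body": "Please make a deposit today.",
}
ordinary = {
    "sender": "friend@mail.com",
    "title": "Lunch",
    "body": "See you at noon.",
}
results = [classify(new_spam), classify(ordinary)]
print(results)
```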

Some time later, however, users start complaining again: some people use the word deposit with good intentions, but our system fails to recognize that fact and marks the messages as...