eBook - ePub

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

Name: Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Author: Tarek Amr

A practical guide to implementing supervised and unsupervised machine learning algorithms in Python

384 pages
English
ePUB (mobile-friendly)

About this book

Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems

Key Features

Delve into machine learning with this comprehensive guide to scikit-learn and scientific Python
Master the art of data-driven problem-solving with hands-on examples
Foster your theoretical and practical knowledge of supervised and unsupervised machine learning algorithms

Book Description

Machine learning is applied everywhere, from business to research and academia, while scikit-learn is a versatile library that is popular among machine learning practitioners. This book serves as a practical guide for anyone looking to provide hands-on machine learning solutions with scikit-learn and Python toolkits.

The book begins with an explanation of machine learning concepts and fundamentals, and strikes a balance between theoretical concepts and their applications. Each chapter covers a different set of algorithms, and shows you how to use them to solve real-life problems. You'll also learn about various key supervised and unsupervised machine learning algorithms using practical examples. Whether it is an instance-based learning algorithm, Bayesian estimation, a deep neural network, a tree-based ensemble, or a recommendation system, you'll gain a thorough understanding of its theory and learn when to apply it. As you advance, you'll learn how to deal with unlabeled data and when to use different clustering and anomaly detection algorithms.

By the end of this machine learning book, you'll have learned how to take a data-driven approach to provide end-to-end machine learning solutions. You'll also have discovered how to formulate the problem at hand, prepare required data, and evaluate and deploy models in production.

What you will learn

Understand when to use supervised, unsupervised, or reinforcement learning algorithms
Find out how to collect and prepare your data for machine learning tasks
Tackle imbalanced data and optimize your algorithm for a bias or variance tradeoff
Apply supervised and unsupervised algorithms to overcome various machine learning challenges
Employ best practices for tuning your algorithm's hyperparameters
Discover how to use neural networks for classification and regression
Build, evaluate, and deploy your machine learning solutions to production

Who this book is for

This book is for data scientists, machine learning practitioners, and anyone who wants to learn how machine learning algorithms work and to build different machine learning models using the Python ecosystem. The book will help you take your knowledge of machine learning to the next level by grasping its ins and outs and tailoring it to your needs. Working knowledge of Python and a basic understanding of the underlying mathematical and statistical concepts are required.

Information

Publisher: Packt Publishing
Year: 2020
ISBN: 9781838823580
Subject: Computer science; general computational sciences

Section 1: Supervised Learning

Supervised learning is by far the most used machine learning paradigm in business. It’s the key to automating manual tasks. This section comprises the different algorithms available for supervised learning, and you will learn when to use each of them. We will also try to showcase different types of data, from tabular data to textual data and images.

This section comprises the following chapters:

Chapter 1, Introduction to Machine Learning
Chapter 2, Making Decisions with Trees
Chapter 3, Making Decisions with Linear Equations
Chapter 4, Preparing Your Data
Chapter 5, Image Processing with Nearest Neighbors
Chapter 6, Classifying Text Using Naive Bayes

Introduction to Machine Learning

Machine learning is everywhere. When you book a flight ticket, an algorithm decides the price you are going to pay for it. When you apply for a loan, machine learning may decide whether you are going to get it or not. When you scroll through your Facebook timeline, it picks which advertisements to show to you. Machine learning also plays a big role in your Google search results. It organizes your email inbox and filters out spam, it screens your resumé before recruiters see it when you apply for a job, and, more recently, it has also started to play the role of your personal assistant in the form of Siri and other virtual assistants.

In this book, we will learn about the theory and practice of machine learning, and when and how to apply it. To get started, we will look at a high-level introduction to how machine learning works. You will then be able to differentiate between the different machine learning paradigms and know when to use each of them. Then, you'll be taken through the model development life cycle and the different steps practitioners take to solve problems. Finally, we will introduce you to scikit-learn and explain why it is the de facto tool for many practitioners.

Here is a list of the topics that will be covered in this first chapter:

Understanding machine learning
The model development life cycle
Introduction to scikit-learn
Installing the packages you need

Understanding machine learning

You may be wondering how machines actually learn. To answer this question, let's consider the example of a fictional company. Space Shuttle Corporation has a few space vehicles to rent. They get applications every day from clients who want to travel to Mars. They are not sure whether those clients will ever return the vehicles; maybe they'll decide to continue living on Mars and never come back. Even worse, some of the clients may be lousy pilots and crash their vehicles on the way. So, the company decides to hire shuttle rent-approval officers whose job is to go through the applications and decide who is worthy of a shuttle ride. Their business, however, grows so big that they need to formalize the shuttle-approval process.

A traditional shuttle company would start by having business rules and hiring junior employees to execute those rules. For example, if you are an alien, then sorry, you cannot rent a shuttle from us. If you are a human and you have kids who are in school on Earth, then you are more than welcome to rent one of our shuttles. As you can see, those rules are too broad. What about aliens who love living on Earth and just want to go to Mars for a quick holiday? To come up with a better business policy, the company starts hiring analysts. Their job is to go through historical data and try to come up with detailed rules or business logic. These analysts can come up with very detailed rules: if you are an alien, one of your parents is from Neptune, your age is between 0.1 and 0.2 Neptunian years, and you have 3 to 4 kids and one of them is 80% or more human, then you are allowed to rent a shuttle. To be able to come up with suitable rules, the analysts also need a way to measure how good this business logic is. For example, what percentage of the shuttles return if certain rules are applied? They use historic data to evaluate these measures, and only then can we say that these rules are actually learned from data.
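To make the analysts' measure concrete, here is a minimal Python sketch. The historic records, their field names, and the rule itself are hypothetical, made up purely for illustration; what matters is that a rule is judged by applying it to historic data and computing the share of approved shuttles that came back.

```python
# Hypothetical historic applications: each dict is one past traveler.
# Field names and values are illustrative assumptions, not real data.
applications = [
    {"species": "human", "kids_on_earth": 2, "returned": True},
    {"species": "human", "kids_on_earth": 0, "returned": False},
    {"species": "alien", "kids_on_earth": 1, "returned": True},
    {"species": "human", "kids_on_earth": 1, "returned": True},
    {"species": "alien", "kids_on_earth": 0, "returned": False},
]


def approve(applicant):
    """A hand-written business rule: only humans with kids on Earth."""
    return applicant["species"] == "human" and applicant["kids_on_earth"] > 0


def return_rate(rule, history):
    """Share of applicants approved by `rule` who actually returned."""
    approved = [a for a in history if rule(a)]
    if not approved:
        return 0.0
    return sum(a["returned"] for a in approved) / len(approved)


print(return_rate(approve, applications))
```

With these made-up records, both approved travelers returned, so the measure is 1.0. Yet the rule also turned away an alien traveler who did return, which is why the analysts keep refining their rules.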

Machine learning works in almost the same way. You want to use historic data to come up with some business logic (an algorithm) in order to optimize some measure of how good the logic is (an objective or loss function). Throughout this book, we will learn about numerous machine learning algorithms; they differ from each other in how they represent business logic, what objective functions they use, and what optimization techniques they utilize to reach a model that maximizes (or sometimes minimizes) the objective function. Like the analysts in the previous example, you should pick an objective function that is as close as possible to your business objective. Any time you hear people saying data scientists should have a good understanding of their business, a significant part of that is their choice of a good objective function and ways to evaluate the models they build. In my example, I quickly picked the percentage of shuttles returned as my objective.

But if you think about it, is this really an accurate one-to-one mapping of the shuttle company's revenue? Is the revenue made by allowing a trip equal to the cost of losing a shuttle? Furthermore, rejecting a trip may also cost your company angry calls to the customer care center and negative word-of-mouth advertising. You have to understand all of this well enough before picking your objective function.
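The gap between the return rate and the company's actual earnings can be shown with a small sketch. The revenue per trip and the cost of a lost shuttle below are assumed figures, and the cost of rejected customers mentioned above is left out for simplicity.

```python
# Assumed economics, for illustration only.
REVENUE_PER_TRIP = 10_000       # earned for every approved trip
COST_OF_LOST_SHUTTLE = 25_000   # lost when an approved shuttle never returns


def return_rate(outcomes):
    """Share of approved trips that came back (True = returned)."""
    return sum(outcomes) / len(outcomes)


def profit(outcomes):
    """Revenue from all approved trips minus the cost of lost shuttles."""
    lost = outcomes.count(False)
    return len(outcomes) * REVENUE_PER_TRIP - lost * COST_OF_LOST_SHUTTLE


# Hypothetical outcomes of the trips approved under two policies.
strict_policy = [True] * 10                 # 10 trips, all returned
lenient_policy = [True] * 98 + [False] * 2  # 100 trips, 98 returned

for name, outcomes in [("strict", strict_policy), ("lenient", lenient_policy)]:
    print(name, return_rate(outcomes), profit(outcomes))
```

The strict policy wins on return rate (1.0 versus 0.98), but under these assumed figures the lenient one earns far more (950,000 versus 100,000), so the two objectives rank the policies in opposite order.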

Finally, a key benefit of using machine learning is that it can iterate over a vast number of candidate business rules until it finds the ones that optimize the objective function, unlike the analysts in our space shuttle company, who can only go so far with their rules. The machine learning approach is also automated in the sense that it keeps updating the business logic whenever new data arrives. These two aspects make it scalable, more accurate, and adaptable to change.
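The following sketch shows the search idea at its simplest: it tries every candidate age threshold in some hypothetical historic data and keeps the one with the highest profit. The data, the age feature, and the costs are assumptions for illustration; real learning algorithms search far richer families of rules with smarter optimization techniques, but the principle is the same.

```python
# Hypothetical history: (age in Neptunian years, returned?) for each past trip.
history = [
    (0.05, False), (0.08, False), (0.12, True), (0.15, True),
    (0.18, True), (0.22, False), (0.25, True), (0.30, True),
]

# Assumed economics, for illustration only.
REVENUE_PER_TRIP = 10_000
COST_OF_LOST_SHUTTLE = 25_000


def profit_for_threshold(threshold, data):
    """Profit if we approve every applicant whose age is >= threshold."""
    total = 0
    for age, returned in data:
        if age >= threshold:
            total += REVENUE_PER_TRIP
            if not returned:
                total -= COST_OF_LOST_SHUTTLE
    return total


def best_threshold(data):
    """Try each observed age as a candidate rule and keep the most profitable."""
    candidates = sorted({age for age, _ in data})
    return max(candidates, key=lambda t: profit_for_threshold(t, data))


best = best_threshold(history)
print(best, profit_for_threshold(best, history))
```

With these made-up numbers, the search settles on approving applicants aged 0.12 Neptunian years or more, for a profit of 35,000. When new records arrive, appending them to `history` and calling `best_threshold()` again updates the rule automatically.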

Types of machine learning algorithms

"Society is changing, one learning algorithm at a time."

– Pedro Domingos

In this book, we are going to cover the two main paradigms of machine learning: supervised learning and unsupervised learning. Each of these two paradigms has its own sub-branches, which will be discussed in the next section. Although it is not covered in this book, reinforcement learning will also be introduced in the next section.

Let's use our fictional Space Shuttle Corporation once more to explain the differences between the machine learning paradigms.

Supervised learning

Remember the good old days at school when you were given examples to practice on, along with the correct answers at the end to check whether you were doing a good job? Then, at exam time, you were left on your own. That's basically what supervised learning is. Say our fictional space vehicle company wants to predict whether travelers will return their space vehicles. Luckily, the company has worked with many travelers in the past, and they already know which of them returned their vehicles and who did not. Think of this data as a spreadsheet, where each column has some information about the travelers: their financial statements, the number of kids they have, whether they are humans or aliens, and maybe their age (in Neptunian years, of course). Machine learners call these columns features. There is one extra column for previous travelers that states whether they returned or not; we call this column the label or target column. In the learning phase, we build a model using the features and targets. The aim of the algorithm while learning is to minimize the differences between its predictions and the actual targets. The difference is what we call the error. Once a model is constructed so that its error is minimal, we then use it to make predictions for newer data points. For new travelers, we only know their features, but we use the model we've just built to predict their corresponding targets. In a nutshell, the presence of the target in our historic data is what makes this process supervised.
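Here is a minimal sketch of that workflow in scikit-learn, using a decision tree (the subject of Chapter 2) as the model. The three feature columns and all of the values are made up for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical historic travelers. Assumed feature columns:
# [number of kids, is human (1) or alien (0), age in Neptunian years]
X_train = [
    [2, 1, 0.20],
    [0, 1, 0.10],
    [1, 0, 0.15],
    [3, 1, 0.25],
    [0, 0, 0.05],
    [1, 1, 0.30],
]
# Target column: 1 if the traveler returned the vehicle, 0 otherwise.
y_train = [1, 0, 1, 1, 0, 1]

# Learning phase: the model sees both features and targets.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# New travelers: only their features are known, so the model predicts targets.
X_new = [
    [2, 0, 0.18],
    [0, 1, 0.07],
]
print(model.predict(X_new))
```

`fit()` is the learning phase and `predict()` is applied to travelers whose target is unknown. With these toy records, it prints `[1 0]`: the first new traveler is expected to return and the second is not.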

Classification versus regression

Supervised learning is further subdivided into classification and regression. For cases where we only have a few predefined labels to predict, we use a classifier, for example, to predict return versus no return, or human versus Martian versus Venusian. If what we want to predict is a continuous number, say, how many years a traveler will take to come back, then it is a regression problem, since the value can be anything from 1 or 2 years to 3 years, 5 months, and 7 days.
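In scikit-learn, the distinction shows up in which estimator you pick. The sketch below asks the same question twice with hypothetical data: once as a label (did the traveler come back within 2 years?) with a decision tree classifier, and once as a number of years with a decision tree regressor.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical features: [number of kids, age in Neptunian years]
X = [[0, 0.05], [1, 0.15], [2, 0.20], [3, 0.30]]

# Classification: the target is one of a few predefined labels.
y_label = ["within 2 years", "within 2 years", "later", "later"]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_label)

# Regression: the target is a number, the years taken to come back.
y_years = [0.5, 1.5, 2.5, 3.25]
reg = DecisionTreeRegressor(random_state=0).fit(X, y_years)

new_traveler = [[2, 0.18]]
print(clf.predict(new_traveler))  # one of the predefined labels
print(reg.predict(new_traveler))  # a number of years
```

Both estimators share the same `fit()` and `predict()` interface; only the kind of target, and therefore the kind of prediction, differs.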

Supervised learning evaluation

Because the two tasks differ, the metrics we use to evaluate classifiers are usually different from the ones we use for regression:

Classifier evaluation metrics: Suppose we are using a classifier to determine whether a traveler is going to return. Then, of those travelers that the classifier predicted to return, we want to measure what percentage of them actually did return. We call this measure precision. Also, of all travelers who did return, we want to measure what percentage of them the classifier correctly predicted to return. We call this recall. Precision and recall can be calculated for each class—that is, we can also calculate precision and recall for the travelers who did not return.
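Here is how these definitions map onto scikit-learn's `precision_score()` and `recall_score()`. The ten actual outcomes and predictions are invented for illustration; `pos_label` selects the class whose precision and recall we want.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical actual outcomes and classifier predictions for ten travelers.
actual = ["return", "return", "return", "return", "return",
          "return", "no return", "no return", "no return", "no return"]
predicted = ["return", "return", "return", "return", "no return",
             "no return", "return", "no return", "no return", "no return"]

for label in ["return", "no return"]:
    # Precision: of those predicted as `label`, the share that really were.
    p = precision_score(actual, predicted, pos_label=label)
    # Recall: of those that really were `label`, the share predicted as such.
    r = recall_score(actual, predicted, pos_label=label)
    print(f"{label}: precision={p:.2f}, recall={r:.2f}")
```

For the return class, 4 of the 5 predicted returns were correct (precision 0.80) and 4 of the 6 actual returns were caught (recall 0.67); for the no-return class, the figures are 0.60 and 0.75.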

Accuracy is another commonly used, and sometimes abused, measure. For each case in our historic data, we know whether a traveler actually returned (actuals), and we can also generate predictions of whether they will return. Accuracy calculates the percentage of cases in which the predictions and actuals match. As you can see, it is label agnostic, so it can sometimes be misleading when the classes are highly imbalanced. In our example business, say 99% of our travelers actually return. We can build a dummy classifier that predicts that every single traveler returns; it will be accurate 99% of the time. This 99% accuracy value doesn't tell us much, especially if you know that in these cases, the recall value for non-ret...