eBook - ePub

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

Name: Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Author: Tarek Amr

A practical guide to implementing supervised and unsupervised machine learning algorithms in Python

Tarek Amr

Condividi libro

384 pagine
English
ePUB (disponibile sull'app)
Disponibile su iOS e Android

eBook - ePub

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

A practical guide to implementing supervised and unsupervised machine learning algorithms in Python

Tarek Amr

Dettagli del libro

Anteprima del libro

Indice dei contenuti

Citazioni

Informazioni sul libro

Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems

Key Features

Delve into machine learning with this comprehensive guide to scikit-learn and scientific Python
Master the art of data-driven problem-solving with hands-on examples
Foster your theoretical and practical knowledge of supervised and unsupervised machine learning algorithms

Book Description

Machine learning is applied everywhere, from business to research and academia, while scikit-learn is a versatile library that is popular among machine learning practitioners. This book serves as a practical guide for anyone looking to provide hands-on machine learning solutions with scikit-learn and Python toolkits.

The book begins with an explanation of machine learning concepts and fundamentals, and strikes a balance between theoretical concepts and their applications. Each chapter covers a different set of algorithms, and shows you how to use them to solve real-life problems. You'll also learn about various key supervised and unsupervised machine learning algorithms using practical examples. Whether it is an instance-based learning algorithm, Bayesian estimation, a deep neural network, a tree-based ensemble, or a recommendation system, you'll gain a thorough understanding of its theory and learn when to apply it. As you advance, you'll learn how to deal with unlabeled data and when to use different clustering and anomaly detection algorithms.

By the end of this machine learning book, you'll have learned how to take a data-driven approach to provide end-to-end machine learning solutions. You'll also have discovered how to formulate the problem at hand, prepare required data, and evaluate and deploy models in production.

What you will learn

Understand when to use supervised, unsupervised, or reinforcement learning algorithms
Find out how to collect and prepare your data for machine learning tasks
Tackle imbalanced data and optimize your algorithm for a bias or variance tradeoff
Apply supervised and unsupervised algorithms to overcome various machine learning challenges
Employ best practices for tuning your algorithm's hyper parameters
Discover how to use neural networks for classification and regression
Build, evaluate, and deploy your machine learning solutions to production

Who this book is for

This book is for data scientists, machine learning practitioners, and anyone who wants to learn how machine learning algorithms work and to build different machine learning models using the Python ecosystem. The book will help you take your knowledge of machine learning to the next level by grasping its ins and outs and tailoring it to your needs. Working knowledge of Python and a basic understanding of underlying mathematical and statistical concepts is required.

Domande frequenti

Come faccio ad annullare l'abbonamento?

È semplicissimo: basta accedere alla sezione Account nelle Impostazioni e cliccare su "Annulla abbonamento". Dopo la cancellazione, l'abbonamento rimarrà attivo per il periodo rimanente già pagato. Per maggiori informazioni, clicca qui

È possibile scaricare libri? Se sì, come?

Al momento è possibile scaricare tramite l'app tutti i nostri libri ePub mobile-friendly. Anche la maggior parte dei nostri PDF è scaricabile e stiamo lavorando per rendere disponibile quanto prima il download di tutti gli altri file. Per maggiori informazioni, clicca qui

Che differenza c'è tra i piani?

Entrambi i piani ti danno accesso illimitato alla libreria e a tutte le funzionalità di Perlego. Le uniche differenze sono il prezzo e il periodo di abbonamento: con il piano annuale risparmierai circa il 30% rispetto a 12 rate con quello mensile.

Cos'è Perlego?

Perlego è un servizio di abbonamento a testi accademici, che ti permette di accedere a un'intera libreria online a un prezzo inferiore rispetto a quello che pagheresti per acquistare un singolo libro al mese. Con oltre 1 milione di testi suddivisi in più di 1.000 categorie, troverai sicuramente ciò che fa per te! Per maggiori informazioni, clicca qui.

Perlego supporta la sintesi vocale?

Cerca l'icona Sintesi vocale nel prossimo libro che leggerai per verificare se è possibile riprodurre l'audio. Questo strumento permette di leggere il testo a voce alta, evidenziandolo man mano che la lettura procede. Puoi aumentare o diminuire la velocità della sintesi vocale, oppure sospendere la riproduzione. Per maggiori informazioni, clicca qui.

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits è disponibile online in formato PDF/ePub?

Sì, puoi accedere a Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits di Tarek Amr in formato PDF e/o ePub, così come ad altri libri molto apprezzati nelle sezioni relative a Computer Science e Computer Science General. Scopri oltre 1 milione di libri disponibili nel nostro catalogo.

Informazioni

Editore

Packt Publishing

Anno

2020

ISBN

9781838823580

Edizione

Argomento

Computer Science

Categoria

Computer Science General

Section 1: Supervised Learning

Supervised learning is by far the most used machine learning paradigm in business. It’s the key to automating manual tasks. This section comprises the different algorithms available for supervised learning, and you will learn when to use each of them. We will also try to showcase different types of data, from tabular data to textual data and images.

This section comprises the following chapters:

Chapter 1, Introduction to Machine Learning
Chapter 2, Making Decisions with Trees
Chapter 3, Making Decisions with Linear Equations
Chapter 4, Preparing Your Data
Chapter 5, Image Processing with Nearest Neighbors
Chapter 6, Classifying Text Using Naive Bayes

Introduction to Machine Learning

Machine learning is everywhere. When you book a flight ticket, an algorithm decides the price you are going to pay for it. When you apply for a loan, machine learning may decide whether you are going to get it or not. When you scroll through your Facebook timeline, it picks which advertisements to show to you. Machine learning also plays a big role in your Google search results. It organizes your email's inbox and filters out spam, it goes through your resumé before recruiters when you apply for a job, and, more recently, it has also started to play the role of your personal assistant in the form of Siri and other virtual assistants.

In this book, we will learn about the theory and practice of machine learning. We will understand when and how to apply it. To get started, we will look at a high-level introduction to how machine learning works. You will then be able to differentiate between the different machine learning paradigms and know when to use each of them. Then, you'll be taken through the model development life cycle and the different steps practitioners take to solve problems. Finally, we will introduce you to scikit-learn, and learn why it is the de facto tool for many practitioners.

Here is a list of the topics that will be covered in this first chapter:

Understanding machine learning
The model development life cycle
Introduction to scikit-learn
Installing the packages you need

Understanding machine learning

You may be wondering how machines actually learn. To get the answer to this query, let's take the following example of a fictional company. Space Shuttle Corporation has a few space vehicles to rent. They get applications every day from clients who want to travel to Mars. They are not sure whether those clients will ever return the vehicles—maybe they'll decide to continue living on Mars and never come back again. Even worse, some of the clients may be lousy pilots and crash their vehicles on the way. So, the company decides to hire shuttle rent-approval officers whose job is to go through the applications and decide who is worthy of a shuttle ride. Their business, however, grows so big that they need to formulate the shuttle-approval process.

A traditional shuttle company would start by having business rules and hiring junior employees to execute those rules. For example, if you are an alien, then sorry, you cannot rent a shuttle from us. If you are a human and you have kids that are in school on Earth, then you are more than welcome to rent one of our shuttles. As you can see, those rules are too broad. What about aliens who love living on Earth and just want to go to Mars for a quick holiday? To come up with a better business policy, the company starts hiring analysts. Their job is to go through historical data and try to come up with detailed rules or business logic. These analysts can come up with very detailed rules. If you are an alien, one of your parents is from Neptune, your age is between 0.1 and 0.2 Neptunian years, and you have 3 to 4 kids and one of them is 80% or more human, then you are allowed to rent a shuttle. To be able to come up with suitable rules, the analysts also need a way to measure how good this business logic is. For example, what percentage of the shuttles return if certain rules are applied? They use historic data to evaluate these measures, and only then can wesay that these rules are actually learned from data.

Machine learning works in almost the same way. You want to use historic data to come up with some business logic (an algorithm) in order to optimize some measure of how good the logic is (an objective or loss function). Throughout this book, we will learn about numerous machine learning algorithms; they differ from each other in how they represent business logic, what objective functions they use, and what optimization techniques they utilize to reach a model that maximizes (or sometimes minimizes) the objective function. Like the analysts in the previous example, you should pick an objective function that is as close as possible to your business objective. Any time you hear people saying data scientists should have a good understanding of their business, a significant part of that is their choice of a good objective function and ways to evaluate the models they build. In my example, I quickly picked the percentage of shuttles returned as my objective.

But if you think about it, is this really an accurate one-to-one mapping of the shuttle company's revenue? Is the revenue made by allowing a trip equal to the cost of losing a shuttle? Furthermore, rejecting a trip may also cost your company angry calls to the customer care center and negative word-of-mouth advertising. You have to understand all of this well enough before picking your objective function.

Finally, a key benefit to using machine learning is that it can iterate over a vast amount of business logic cases until it reaches the optimum objective function, unlike the case of the analysts in our space shuttle company who can only go so far with their rules. The machine learning approach is also automated in the sense that it keeps updating the business logic whenever new data arrives. These two aspects make it scalable, more accurate, and adaptable to change.

Types of machine learning algorithms

"Society is changing, one learning algorithm at a time."

– Pedro Domingos

In this book, we are going to cover the two main paradigms of machine learning—supervised learning and unsupervised learning. Each of these two paradigms has its own sub-branches that will be discussed in the next section. Although it is not covered in this book, reinforcement learning will also be introduced in the next section:

Let's use our fictional Space Shuttle Corporation company once more to explain the differences between the different machine learning paradigms.

Supervised learning

Remember those old good days at school when you were given examples to practice on, along with the correct answers to them at the end to validate whether you are doing a good job? Then, at exam time, you were left on your own. That's basically what supervised learning is. Say our fictional space vehicle company wants to predict whether travelers will return their space vehicles. Luckily, the company has worked with many travelers in the past, and they already know which of them returned their vehicles and who did not. Think of this data as a spreadsheet, where each column has some information about the travelers—their financial statements, the number of kids they have, whether they are humans or aliens, and maybe their age (in Neptunian years, of course). Machine learners call these columns features. There is one extra column for previous travelers that states whether they returned or not; we call this column the label or target column. In the learning phase, we build a model using the features and targets. The aim of the algorithm while learning is to minimize the differences between its predictions and the actual targets. The difference is what we call the error. Once a model is constructed so that its error is minimal, we then use it to make predictions for newer data points. For new travelers, we only know their features, but we use the model we've just built to predict their corresponding targets. In a nutshell, the presence of the target in our historic data is what makes this process supervised.

Classification versus regression

Supervised learning is furthersubdivided into classification and regression. For cases where we only have a few predefined labels to predict, we use a classifier—for example, returnversusno return orhuman versusMartian versusVenusian. If what we want to predict is a wide-range number—say, how many years a traveler will take to come back—then it is a regression problem since these values can be anything from 1 or 2 years to 3 years, 5 months, and 7 days.

Supervised learning evaluation

Due to their differences, the metrics we use to evaluate these classifiers are usually different from ones we use with regression:

Classifier evaluation metrics: Suppose we are using a classifier to determine whether a traveler is going to return. Then, of those travelers that the classifier predicted to return, we want to measure what percentage of them actually did return. We call this measure precision. Also, of all travelers who did return, we want to measure what percentage of them the classifier correctly predicted to return. We call this recall. Precision and recall can be calculated for each class—that is, we can also calculate precision and recall for the travelers who did not return.

Accuracy is another commonly used, and sometimes abused, measure. For each case in our historic data, we know whether a traveler actually returned (actuals) and we can also generate predictions of whether they will return. The accuracy calculates what percentage of cases of the predictions and actuals match. As you can see, it is labeled agnostic, so it can sometimesbe misleading when the classes are highly imbalanced. In our example business, say 99% of our travelers actually return. We can build a dummy classifier that predicts whether every single traveler returns; it will be accurate 99% of the time. This 99% accuracy value doesn't tell us much, especially if you know that in these cases, the recall value for non-ret...