eBook - ePub

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

Name: Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
ISBN: 9781838823580

A practical guide to implementing supervised and unsupervised machine learning algorithms in Python

Tarek Amr,

384 pages
English
ePUB (mobile friendly)
Available on iOS & Android

eBook - ePub

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

A practical guide to implementing supervised and unsupervised machine learning algorithms in Python

Tarek Amr,

About this book

Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems

Key Features

Delve into machine learning with this comprehensive guide to scikit-learn and scientific Python
Master the art of data-driven problem-solving with hands-on examples
Foster your theoretical and practical knowledge of supervised and unsupervised machine learning algorithms

Book Description

Machine learning is applied everywhere, from business to research and academia, while scikit-learn is a versatile library that is popular among machine learning practitioners. This book serves as a practical guide for anyone looking to provide hands-on machine learning solutions with scikit-learn and Python toolkits.

The book begins with an explanation of machine learning concepts and fundamentals, and strikes a balance between theoretical concepts and their applications. Each chapter covers a different set of algorithms, and shows you how to use them to solve real-life problems. You'll also learn about various key supervised and unsupervised machine learning algorithms using practical examples. Whether it is an instance-based learning algorithm, Bayesian estimation, a deep neural network, a tree-based ensemble, or a recommendation system, you'll gain a thorough understanding of its theory and learn when to apply it. As you advance, you'll learn how to deal with unlabeled data and when to use different clustering and anomaly detection algorithms.

By the end of this machine learning book, you'll have learned how to take a data-driven approach to provide end-to-end machine learning solutions. You'll also have discovered how to formulate the problem at hand, prepare required data, and evaluate and deploy models in production.

What you will learn

Understand when to use supervised, unsupervised, or reinforcement learning algorithms
Find out how to collect and prepare your data for machine learning tasks
Tackle imbalanced data and optimize your algorithm for a bias or variance tradeoff
Apply supervised and unsupervised algorithms to overcome various machine learning challenges
Employ best practices for tuning your algorithm's hyper parameters
Discover how to use neural networks for classification and regression
Build, evaluate, and deploy your machine learning solutions to production

Who this book is for

This book is for data scientists, machine learning practitioners, and anyone who wants to learn how machine learning algorithms work and to build different machine learning models using the Python ecosystem. The book will help you take your knowledge of machine learning to the next level by grasping its ins and outs and tailoring it to your needs. Working knowledge of Python and a basic understanding of underlying mathematical and statistical concepts is required.

Tools to learn more effectively

Saving Books

Keyword Search

Annotating Text

Listen to it instead

Information

Publisher

Year

Print ISBN

eBook ISBN

Edition

Topic

Computer Science

Subtopic

Computer Science General

Index

Computer Science

Section 1: Supervised Learning

Supervised learning is by far the most used machine learning paradigm in business. It’s the key to automating manual tasks. This section comprises the different algorithms available for supervised learning, and you will learn when to use each of them. We will also try to showcase different types of data, from tabular data to textual data and images.

This section comprises the following chapters:

Chapter 1, Introduction to Machine Learning
Chapter 2, Making Decisions with Trees
Chapter 3, Making Decisions with Linear Equations
Chapter 4, Preparing Your Data
Chapter 5, Image Processing with Nearest Neighbors
Chapter 6, Classifying Text Using Naive Bayes

Introduction to Machine Learning

Machine learning is everywhere. When you book a flight ticket, an algorithm decides the price you are going to pay for it. When you apply for a loan, machine learning may decide whether you are going to get it or not. When you scroll through your Facebook timeline, it picks which advertisements to show to you. Machine learning also plays a big role in your Google search results. It organizes your email's inbox and filters out spam, it goes through your resumé before recruiters when you apply for a job, and, more recently, it has also started to play the role of your personal assistant in the form of Siri and other virtual assistants.

In this book, we will learn about the theory and practice of machine learning. We will understand when and how to apply it. To get started, we will look at a high-level introduction to how machine learning works. You will then be able to differentiate between the different machine learning paradigms and know when to use each of them. Then, you'll be taken through the model development life cycle and the different steps practitioners take to solve problems. Finally, we will introduce you to scikit-learn, and learn why it is the de facto tool for many practitioners.

Here is a list of the topics that will be covered in this first chapter:

Understanding machine learning
The model development life cycle
Introduction to scikit-learn
Installing the packages you need

Understanding machine learning

You may be wondering how machines actually learn. To get the answer to this query, let's take the following example of a fictional company. Space Shuttle Corporation has a few space vehicles to rent. They get applications every day from clients who want to travel to Mars. They are not sure whether those clients will ever return the vehicles—maybe they'll decide to continue living on Mars and never come back again. Even worse, some of the clients may be lousy pilots and crash their vehicles on the way. So, the company decides to hire shuttle rent-approval officers whose job is to go through the applications and decide who is worthy of a shuttle ride. Their business, however, grows so big that they need to formulate the shuttle-approval process.

A traditional shuttle company would start by having business rules and hiring junior employees to execute those rules. For example, if you are an alien, then sorry, you cannot rent a shuttle from us. If you are a human and you have kids that are in school on Earth, then you are more than welcome to rent one of our shuttles. As you can see, those rules are too broad. What about aliens who love living on Earth and just want to go to Mars for a quick holiday? To come up with a better business policy, the company starts hiring analysts. Their job is to go through historical data and try to come up with detailed rules or business logic. These analysts can come up with very detailed rules. If you are an alien, one of your parents is from Neptune, your age is between 0.1 and 0.2 Neptunian years, and you have 3 to 4 kids and one of them is 80% or more human, then you are allowed to rent a shuttle. To be able to come up with suitable rules, the analysts also need a way to measure how good this business logic is. For example, what percentage of the shuttles return if certain rules are applied? They use historic data to evaluate these measures, and only then can wesay that these rules are actually learned from data.

Machine learning works in almost the same way. You want to use historic data to come up with some business logic (an algorithm) in order to optimize some measure of how good the logic is (an objective or loss function). Throughout this book, we will learn about numerous machine learning algorithms; they differ from each other in how they represent business logic, what objective functions they use, and what optimization techniques they utilize to reach a model that maximizes (or sometimes minimizes) the objective function. Like the analysts in the previous example, you should pick an objective function that is as close as possible to your business objective. Any time you hear people saying data scientists should have a good understanding of their business, a significant part of that is their choice of a good objective function and ways to evaluate the models they build. In my example, I quickly picked the percentage of shuttles returned as my objective.

But if you think about it, is this really an accurate one-to-one mapping of the shuttle company's revenue? Is the revenue made by allowing a trip equal to the cost of losing a shuttle? Furthermore, rejecting a trip may also cost your company angry calls to the customer care center and negative word-of-mouth advertising. You have to understand all of this well enough before picking your objective function.

Finally, a key benefit to using machine learning is that it can iterate over a vast amount of business logic cases until it reaches the optimum objective function, unlike the case of the analysts in our space shuttle company who can only go so far with their rules. The machine learning approach is also automated in the sense that it keeps updating the business logic whenever new data arrives. These two aspects make it scalable, more accurate, and adaptable to change.

Types of machine learning algorithms

"Society is changing, one learning algorithm at a time."

– Pedro Domingos

In this book, we are going to cover the two main paradigms of machine learning—supervised learning and unsupervised learning. Each of these two paradigms has its own sub-branches that will be discussed in the next section. Although it is not covered in this book, reinforcement learning will also be introduced in the next section:

Let's use our fictional Space Shuttle Corporation company once more to explain the differences between the different machine learning paradigms.

Supervised learning

Remember those old good days at school when you were given examples to practice on, along with the correct answers to them at the end to validate whether you are doing a good job? Then, at exam time, you were left on your own. That's basically what supervised learning is. Say our fictional space vehicle company wants to predict whether travelers will return their space vehicles. Luckily, the company has worked with many travelers in the past, and they already know which of them returned their vehicles and who did not. Think of this data as a spreadsheet, where each column has some information about the travelers—their financial statements, the number of kids they have, whether they are humans or aliens, and maybe their age (in Neptunian years, of course). Machine learners call these columns features. There is one extra column for previous travelers that states whether they returned or not; we call this column the label or target column. In the learning phase, we build a model using the features and targets. The aim of the algorithm while learning is to minimize the differences between its predictions and the actual targets. The difference is what we call the error. Once a model is constructed so that its error is minimal, we then use it to make predictions for newer data points. For new travelers, we only know their features, but we use the model we've just built to predict their corresponding targets. In a nutshell, the presence of the target in our historic data is what makes this process supervised.

Classification versus regression

Supervised learning is furthersubdivided into classification and regression. For cases where we only have a few predefined labels to predict, we use a classifier—for example, returnversusno return orhuman versusMartian versusVenusian. If what we want to predict is a wide-range number—say, how many years a traveler will take to come back—then it is a regression problem since these values can be anything from 1 or 2 years to 3 years, 5 months, and 7 days.

Supervised learning evaluation

Due to their differences, the metrics we use to evaluate these classifiers are usually different from ones we use with regression:

Classifier evaluation metrics: Suppose we are using a classifier to determine whether a traveler is going to return. Then, of those travelers that the classifier predicted to return, we want to measure what percentage of them actually did return. We call this measure precision. Also, of all travelers who did return, we want to measure what percentage of them the classifier correctly predicted to return. We call this recall. Precision and recall can be calculated for each class—that is, we can also calculate precision and recall for the travelers who did not return.

Accuracy is another commonly used, and sometimes abused, measure. For each case in our historic data, we know whether a traveler actually returned (actuals) and we can also generate predictions of whether they will return. The accuracy calculates what percentage of cases of the predictions and actuals match. As you can see, it is labeled agnostic, so it can sometimesbe misleading when the classes are highly imbalanced. In our example business, say 99% of our travelers actually return. We can build a dummy classifier that predicts whether every single traveler returns; it will be accurate 99% of the time. This 99% accuracy value doesn't tell us much, especially if you know that in these cases, the recall value for non-ret...

Title Page
Copyright and Credits
About Packt
Contributors
Preface
Section 1: Supervised Learning
Introduction to Machine Learning
Making Decisions with Trees
Making Decisions with Linear Equations
Preparing Your Data
Image Processing with Nearest Neighbors
Classifying Text Using Naive Bayes
Section 2: Advanced Supervised Learning
Neural Networks – Here Comes Deep Learning
Ensembles – When One Model Is Not Enough
The Y is as Important as the X
Imbalanced Learning – Not Even 1% Win the Lottery
Section 3: Unsupervised Learning and More
Clustering – Making Sense of Unlabeled Data
Anomaly Detection – Finding Outliers in Data
Recommender System – Getting to Know Their Taste
Other Books You May Enjoy

Frequently asked questions

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription

No, books cannot be downloaded as external files, such as PDFs, for use outside of Perlego. However, you can download books within the Perlego app for offline reading on mobile or tablet. Learn how to download books offline

Perlego offers two plans: Essential and Complete

Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.

Both plans are available with monthly, semester, or annual billing cycles.

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 990+ topics, we’ve got you covered! Learn about our mission

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more about Read Aloud

Yes! You can use the Perlego app on both iOS and Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app

Yes, you can access Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits by Tarek Amr in PDF and/or ePUB format, as well as other popular books in Computer Science & Computer Science General. We have over one million books available in our catalogue for you to explore.