eBook - ePub

Applied Supervised Learning with Python

Name: Applied Supervised Learning with Python
Author: Benjamin Johnston, Ishita Mathur

Use scikit-learn to build predictive models from real-world datasets and prepare yourself for the future of machine learning

Benjamin Johnston, Ishita Mathur

Share book

404 pages
English
ePUB (mobile friendly)
Available on iOS & Android

eBook - ePub

Applied Supervised Learning with Python

Use scikit-learn to build predictive models from real-world datasets and prepare yourself for the future of machine learning

Benjamin Johnston, Ishita Mathur

Book details

Book preview

Table of contents

Citations

About This Book

Explore the exciting world of machine learning with the fastest growing technology in the world

Key Features

Understand various machine learning concepts with real-world examples
Implement a supervised machine learning pipeline from data ingestion to validation
Gain insights into how you can use machine learning in everyday life

Book Description

Machine learning—the ability of a machine to give right answers based on input data—has revolutionized the way we do business. Applied Supervised Learning with Python provides a rich understanding of how you can apply machine learning techniques in your data science projects using Python. You'll explore Jupyter Notebooks, the technology used commonly in academic and commercial circles with in-line code running support.

With the help of fun examples, you'll gain experience working on the Python machine learning toolkit—from performing basic data cleaning and processing to working with a range of regression and classification algorithms. Once you've grasped the basics, you'll learn how to build and train your own models using advanced techniques such as decision trees, ensemble modeling, validation, and error metrics. You'll also learn data visualization techniques using powerful Python libraries such as Matplotlib and Seaborn.

This book also covers ensemble modeling and random forest classifiers along with other methods for combining results from multiple models, and concludes by delving into cross-validation to test your algorithm and check how well the model works on unseen data.

By the end of this book, you'll be equipped to not only work with machine learning algorithms, but also be able to create some of your own!

What you will learn

Understand the concept of supervised learning and its applications
Implement common supervised learning algorithms using machine learning Python libraries
Validate models using the k-fold technique
Build your models with decision trees to get results effortlessly
Use ensemble modeling techniques to improve the performance of your model
Apply a variety of metrics to compare machine learning models

Who this book is for

Applied Supervised Learning with Python is for you if you want to gain a solid understanding of machine learning using Python. It'll help if you to have some experience in any functional or object-oriented language and a basic understanding of Python libraries and expressions, such as arrays and dictionaries.

Frequently asked questions

How do I cancel my subscription?

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.

Can/how do I download books?

At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.

What is the difference between the pricing plans?

Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.

What is Perlego?

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.

Do you support text-to-speech?

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.

Is Applied Supervised Learning with Python an online PDF/ePUB?

Yes, you can access Applied Supervised Learning with Python by Benjamin Johnston, Ishita Mathur in PDF and/or ePUB format, as well as other popular books in Informatique & Bases de données. We have over one million books available in our catalogue for you to explore.

Information

Publisher

Packt Publishing

Year

2019

ISBN

9781789955835

Edition

Topic

Informatique

Subtopic

Bases de données

Chapter 1 Python Machine Learning Toolkit

Learning Objectives

By the end of this chapter, you will be able to:

Explain supervised machine learning and describe common examples of machine learning problems
Install and load Python libraries into your development environment for use in analysis and machine learning problems
Access and interpret the documentation of a subset of Python libraries, including the powerful pandas library
Create an IPython Jupyter notebook and use executable code cells and markdown cells to create a dynamic report
Load an external data source using pandas and use a variety of methods to search, filter, and compute descriptive statistics of the data
Clean a data source of mediocre quality and gauge the potential impact of various issues within the data source

This chapter introduces supervised learning, Jupyter notebooks, and some of the most common pandas data methods.

Introduction

The study and application of machine learning and artificial intelligence has recently been the source of much interest and research in the technology and business communities. Advanced data analytics and machine learning techniques have shown great promise in advancing many sectors, such as personalized healthcare and self-driving cars, as well as in solving some of the world's greatest challenges, such as combating climate change. This book has been designed to assist you in taking advantage of the unique confluence of events in the field of data science and machine learning today. Across the globe, private enterprises and governments are realizing the value and efficiency of data-driven products and services. At the same time, reduced hardware costs and open source software solutions are significantly reducing the barriers to entry of learning and applying machine learning techniques.

Throughout this book, you will develop the skills required to identify, prepare, and build predictive models using supervised machine learning techniques in the Python programming language. The six chapters each cover one aspect of supervised learning. This chapter introduces a subset of the Python machine learning toolkit, as well as some of the things that need to be considered when loading and using data sources. This data exploration process is further explored in Chapter 2, Exploratory Data Analysis and Visualization, as we introduce exploratory data analysis and visualization. Chapter 3, Regression Analysis, and Chapter 4, Classification, look at two subsets of machine learning problems – regression and classification analysis – and demonstrate these techniques through examples. Finally, Chapter 5, Ensemble Modeling, covers ensemble networks, which use multiple predictions from different models to boost overall performance, while Chapter 6, Model Evaluation, covers the extremely important concepts of validation and evaluation metrics. These metrics provide a means of estimating the true performance of a model.

Supervised Machine Learning

A machine learning algorithm is commonly thought of as simply the mathematical process (or algorithm) itself, such as a neural network, deep neural network, or random forest algorithm. However, this is only a component of the overall system; firstly, we must define the problem that can be adequately solved using such techniques. Then, we must specify and procure a clean dataset that is composed of information that can be mapped from the first number space to a secondary one. Once the dataset has been designed and procured, the machine learning model can be specified and designed; for example, a single-layer neural network with 100 hidden nodes that uses a tanh activation function.

With the dataset and model well defined, the means of determining the exact values for the model can be specified. This is a repetitive optimization process that evaluates the output of the model against some existing data and is commonly referred to as training. Once training has been completed and you have your defined model, then it is good practice to evaluate it against some reference data to provide a benchmark of overall performance.

Considering this general description of a complete machine learning algorithm, the problem definition and data collection stages are often the most critical. What is the problem you are trying to solve? What outcome would you like to achieve? How are you going to achieve it? How you answer these questions will drive and define many of the subsequent decisions or model design choices. It is also in answering these questions that we will select which category of machine learning algorithms we will choose: supervised or unsupervised methods.

So, what exactly are supervised and unsupervised machine learning problems or methods? Supervised learning techniques center on mapping some set of information to another by providing the training process with the input information and the desired outputs, then checking its ability to provide the correct result. As an example, let's say you are the publisher of a magazine that reviews and ranks hairstyles from various time periods. Your readers frequently send you far more images of their favorite hairstyles for review than you can manually process. To save some time, you would like to automate the sorting of the hairstyles images you receive based on time periods, starting with hairstyles from the 1960s and 1980s:

Figure 1.1: Hairstyles images from different time periods

To create your hairstyles-sorting algorithm, you start by collecting a large sample of hairstyles images and manually labeling each one with its corresponding time period. Such a dataset (known as a labeled dataset) is the input data (hairstyles images) and the desired output information (time period) is known and recorded. This type of problem is a classic supervised learning problem; we are trying to develop an algorithm that takes a set of inputs and learns to return the answers that we have told it are correct.

When to Use Supervised Learning

Generally, if you are trying to automate or replicate an existing process, the problem is a supervised learning problem. Supervised learning techniques are both very useful and powerful, and you may have come across them or even helped create labeled datasets for them without realizing. As an example, a few years ago, Facebook introduced the ability to tag your friends in any image uploaded to the platform. To tag a friend, you would draw a square over your friend's face and then add the name of your friend to notify them of the image. Fast-forward to today and Facebook will automatically identify your friends in the image and tag them for you. This is yet another example of supervised learning. If you ever used the early tagging system and manually identified your friends in an image, you were in fact helping to create Facebook's labeled dataset. A user who uploaded an image of a person's face (the input d...