Ensemble Machine Learning Cookbook
eBook - ePub

Ensemble Machine Learning Cookbook

Over 35 practical recipes to explore ensemble machine learning techniques using Python

  1. 336 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Implement machine learning algorithms to build ensemble models using Keras, H2O, Scikit-Learn, Pandas and more

Key Features

  • Apply popular machine learning algorithms using a recipe-based approach
  • Implement boosting, bagging, and stacking ensemble methods to improve machine learning models
  • Discover real-world ensemble applications and encounter complex challenges in Kaggle competitions

Book Description

Ensemble modeling is an approach used to improve the performance of machine learning models. It combines two or more machine learning algorithms, similar or dissimilar, to deliver superior predictive power. This book will help you to implement popular machine learning algorithms to cover different paradigms of ensemble machine learning, such as boosting, bagging, and stacking.

The Ensemble Machine Learning Cookbook will start by getting you acquainted with the basics of ensemble techniques and exploratory data analysis. You'll then learn to implement tasks related to statistical and machine learning algorithms to understand the ensemble of multiple heterogeneous algorithms. It will also ensure that you don't miss out on key topics, such as resampling methods. As you progress, you'll get a better understanding of bagging, boosting, stacking, and working with the Random Forest algorithm using real-world examples. The book will highlight how these ensemble methods use multiple models to improve machine learning results, as compared to a single model. In the concluding chapters, you'll delve into advanced ensemble models using neural networks, natural language processing, and more. You'll also be able to implement models for tasks such as fraud detection, text categorization, and sentiment analysis.

By the end of this book, you'll be able to harness ensemble techniques and the working mechanisms of machine learning algorithms to build intelligent models using individual recipes.

What you will learn

  • Understand how to use machine learning algorithms for regression and classification problems
  • Implement ensemble techniques such as averaging, weighted averaging, and max-voting
  • Get to grips with advanced ensemble methods, such as bootstrapping, bagging, and stacking
  • Use Random Forest for tasks such as classification and regression
  • Implement an ensemble of homogeneous and heterogeneous machine learning algorithms
  • Learn and implement various boosting techniques, such as AdaBoost, Gradient Boosting Machine, and XGBoost

Who this book is for

This book is designed for data scientists, machine learning developers, and deep learning enthusiasts who want to delve into machine learning algorithms to build powerful ensemble models. Working knowledge of Python programming and basic statistics is a must to help you grasp the concepts in the book.


Statistical and Machine Learning Algorithms

In this chapter, we will cover the following recipes:
  • Multiple linear regression
  • Logistic regression
  • Naive Bayes
  • Decision trees
  • Support vector machines

Technical requirements

The technical requirements for this chapter remain the same as those we detailed in Chapter 1, Get Closer to Your Data.
Visit the GitHub repository to get the dataset and the code. These are arranged by chapter and by the name of the topic. For the linear regression dataset and code, for example, visit .../Chapter 3/Linear regression.

Multiple linear regression

Multiple linear regression is a technique used to train a linear model that assumes linear relationships between multiple predictor variables (X1, X2, ..., Xm) and a continuous target variable (Y). The general equation for a multiple linear regression with m predictor variables is as follows:

Y = β0 + β1X1 + β2X2 + ... + βmXm + ε

Training a linear regression model involves estimating the values of the coefficients for each of the predictor variables, denoted by β. In the preceding equation, ε denotes an error term, which is normally distributed with zero mean and constant variance. This is represented as follows:

ε ~ N(0, σ²)
Various techniques can be used to build a linear regression model. The most frequently used is the ordinary least squares (OLS) estimate. The OLS method produces a linear regression line that seeks to minimize the sum of the squared error. The error is the distance from an actual data point to the regression line. The sum of the squared error measures the aggregate of the squared differences between the training instances, which are each of our data points, and the values predicted by the regression line. This can be represented as follows:

SSE = Σ (yi − ŷi)², summed over all n training instances

In the preceding equation, yi is the actual training instance and ŷi is the value predicted by the regression line.
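As a quick illustration of the sum of squared error described above, the following sketch computes SSE for a handful of hypothetical actual values and regression-line predictions (the numbers are made up for demonstration):

```python
import numpy as np

# Hypothetical actual target values and regression-line predictions
y_actual = np.array([3.0, 5.0, 7.5, 9.0])
y_predicted = np.array([2.8, 5.2, 7.0, 9.5])

# Sum of squared error: the aggregate of the squared differences
# between each training instance and the value predicted by the line
sse = np.sum((y_actual - y_predicted) ** 2)
print(sse)
```

OLS chooses the line whose coefficients make this quantity as small as possible over the whole training set.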
In the context of machine learning, gradient descent is a common technique that can be used to optimize the coefficients of predictor variables by minimizing the training error of the model through multiple iterations. Gradient descent starts by initializing the coefficients to zero. Then, the coefficients are updated with the intention of minimizing the error. Updating the coefficients is an iterative process and is performed until a minimum squared error is achieved.
In the gradient descent technique, a hyperparameter called the learning rate, denoted by α, is provided to the algorithm. This parameter determines how fast the algorithm moves toward the optimal values of the coefficients. If α is very large, the algorithm might skip the optimal solution. If it is too small, however, the algorithm might need too many iterations to converge to the optimum coefficient values. For this reason, it is important to use the right value for α.
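The iterative procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's implementation: the function name, the toy data, and the chosen α and iteration count are all assumptions for demonstration purposes.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, n_iter=5000):
    """Fit linear regression coefficients by gradient descent.
    X: (n_samples, m) predictor matrix; y: (n_samples,) target;
    alpha is the learning rate discussed above."""
    n, m = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])  # add an intercept column
    beta = np.zeros(m + 1)                # coefficients initialized to zero
    for _ in range(n_iter):
        error = Xb @ beta - y             # current prediction error
        grad = (2 / n) * Xb.T @ error     # gradient of the mean squared error
        beta -= alpha * grad              # step against the gradient
    return beta

# Toy data generated from y = 1 + 2x (hypothetical)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
beta = gradient_descent(X, y)
print(beta)  # approaches [1.0, 2.0]
```

With a well-chosen α the coefficients converge close to the true intercept and slope; a much larger α would make the updates overshoot and diverge, which is exactly the trade-off described above.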
In this recipe, we will use the gradient descent method to train our linear regression model.

Getting ready

In Chapter 1, Get Closer To Your Data, we took the HousePrices.csv file and looked at how to manipulate and prepare our data. We also analyzed and treated the missing values in the dataset. We will now use this final dataset for our model-building exercise, using linear regression:
In the following code block, we will start by importing the required libraries:
# import os for operating system dependent functionalities
import os

# import other required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
We set our working directory with the os.chdir() command:
# Set your working directory according to your requirement
os.chdir(".../Chapter 4/Linear Regression")
os.getcwd()
Let's read our data. We prefix the DataFrame name with df_ so that we can identify it easily:
df_housingdata = pd.read_csv("Final_HousePrices.csv")

How to do it...

Let's move on to building our model. We will start by identifying our numerical and categorical variables. We study the correlations using the correlation matrix and the correlation plots.
  1. First, we'll take a look at the variables and the variable types:
# See the variables and their data types
df_housingdata.dtypes
  2. We'll then look at the correlation matrix. The corr() method computes the pairwise correlation of columns:
# We pass 'pearson' as the method for calculating our correlation
df_housingdata.corr(method='pearson')
  3. Besides this, we'd also like to study the correlation between the predictor variables and the response variable:
  4. ...
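The kind of predictor-versus-response check mentioned in the steps above can be sketched as follows. The miniature DataFrame here is a hypothetical stand-in for df_housingdata (the column names and values are invented for illustration); the corr() call itself is the same pandas method used above:

```python
import pandas as pd

# Hypothetical miniature stand-in for df_housingdata
df = pd.DataFrame({
    "GrLivArea": [1710, 1262, 1786, 1717],
    "LotArea": [8450, 9600, 11250, 9550],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Correlation of every numeric column with the response variable,
# sorted so the strongest relationships appear first
corr_with_target = df.corr(method="pearson")["SalePrice"].sort_values(ascending=False)
print(corr_with_target)
```

Selecting the response column from the full correlation matrix gives a quick ranking of which predictors are most linearly related to the target.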

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Foreword
  5. Contributors
  6. Preface
  7. Get Closer to Your Data
  8. Getting Started with Ensemble Machine Learning
  9. Resampling Methods
  10. Statistical and Machine Learning Algorithms
  11. Bag the Models with Bagging
  12. When in Doubt, Use Random Forests
  13. Boosting Model Performance with Boosting
  14. Blend It with Stacking
  15. Homogeneous Ensembles Using Keras
  16. Heterogeneous Ensemble Classifiers Using H2O
  17. Heterogeneous Ensemble for Text Classification Using NLP
  18. Homogenous Ensemble for Multiclass Classification Using Keras
  19. Other Books You May Enjoy

Yes, you can access Ensemble Machine Learning Cookbook by Dipayan Sarkar, Vijayalakshmi Natarajan in PDF and/or ePUB format, as well as other popular books in Computer Science & Artificial Intelligence (AI) & Semantics. We have over one million books available in our catalogue for you to explore.