eBook - ePub

Hands-On Automated Machine Learning

Name: Hands-On Automated Machine Learning
ISBN: 9781788622288

Sibanjan Das,

Umit Mert Cakmak,

282 pages
English

About this book

Automate data and model pipelines for faster machine learning applications

About This Book

• Build automated modules for different machine learning components
• Understand each component of a machine learning pipeline in depth
• Learn to use different open source AutoML and feature engineering platforms

Who This Book Is For

If you're a budding data scientist, data analyst, or Machine Learning enthusiast and are new to the concept of automated machine learning, this book is ideal for you. You'll also find this book useful if you're an ML engineer or data professional interested in developing quick machine learning pipelines for your projects. Prior exposure to Python programming will help you get the best out of this book.

What You Will Learn

• Understand the fundamentals of Automated Machine Learning systems
• Explore auto-sklearn and MLBox for AutoML tasks
• Automate your preprocessing methods along with feature transformation
• Enhance feature selection and generation using the Python stack
• Assemble individual components of ML into a complete AutoML framework
• Demystify hyperparameter tuning to optimize your ML models
• Dive into Machine Learning concepts such as neural networks and autoencoders
• Understand the information costs and trade-offs associated with AutoML

In Detail

AutoML is designed to automate parts of Machine Learning. Readily available AutoML tools are making data science practitioners' work easy and are received well in the advanced analytics community. Automated Machine Learning covers the necessary foundation needed to create automated machine learning modules and helps you get up to speed with them in the most practical way possible. In this book, you'll learn how to automate different tasks in the machine learning pipeline such as data preprocessing, feature selection, model training, model optimization, and much more.

In addition to this, it demonstrates how you can use the available automation libraries, such as auto-sklearn and MLBox, and create and extend your own custom AutoML components for Machine Learning. By the end of this book, you will have a clearer understanding of the different aspects of automated Machine Learning, and you'll be able to incorporate automation tasks using practical datasets. You can leverage your learning from this book to implement Machine Learning in your projects and get a step closer to winning various machine learning competitions.

Style and approach

Step by step approach to understand how to automate your machine learning tasks

Information

Publisher

Packt Publishing

Year

2018

eBook ISBN

9781788622288

Topic

Computer Science

Subtopic

Artificial Intelligence (AI) & Semantics

Automated Algorithm Selection

This chapter offers a glimpse into the vast landscape of machine learning (ML) algorithms. A bird's-eye view will show you the kinds of learning problems that you can tackle with ML and that you have already learned about. Let's briefly review them.

If examples/observations in your dataset have associated labels, then these labels can provide guidance to algorithms during model training. Having this guidance or supervision, you will use supervised or semi-supervised learning algorithms. If you don't have labels, you will use unsupervised learning algorithms.

There are other cases that require different approaches, such as reinforcement learning, but, in this chapter, the main focus will be on supervised and unsupervised algorithms.

The next frontier in ML pipelines is automation. When you first think about automating ML pipelines, the core elements are feature transformation, model selection, and hyperparameter optimization. However, your specific problem raises some other points to consider, and you will examine the following throughout this chapter:

Computational complexity
Differences in training and scoring time
Linearity versus non-linearity
Algorithm-specific feature transformations

Understanding these points will help you to judge which algorithms may suit your needs for a given problem. By the end of this chapter:

You will have learned the basics of automated supervised learning and unsupervised learning
You will have learned the main aspects to consider when working with ML pipelines
You will have practiced your skills on various use cases and built supervised and unsupervised ML pipelines

Technical requirements

Check the requirements.txt file for libraries to be installed to run code examples in GitHub for this chapter.

All the code examples can be found in the Chapter 04 folder in GitHub.

Computational complexity

Computational efficiency and complexity are important aspects of choosing ML algorithms, since they will dictate the resources needed for model training and scoring in terms of time and memory requirements.

For example, a compute-intensive algorithm will require a longer time to train and optimize its hyperparameters. You will usually distribute the workload among available CPUs or GPUs to reduce the amount of time spent to acceptable levels.

In this section, some algorithms will be examined in terms of these constraints but, before getting into deeper details of ML algorithms, you need to know the basics of the complexity of an algorithm.

The complexity of an algorithm will be based on its input size. For ML algorithms, this could be the number of elements and features. You will usually count the number of operations needed to complete the task in the worst-case scenario and that will be your algorithm's complexity.
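To make the idea of counting operations concrete, here is a minimal sketch that counts comparisons in a linear search. The function name `linear_search` and the sample data are illustrative assumptions, not code from the book's repository.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 when the target is absent."""
    comparisons = 0
    for index, item in enumerate(items):
        comparisons += 1  # one comparison per element inspected
        if item == target:
            return index, comparisons
    return -1, comparisons


data = list(range(1, 1001))  # input size n = 1000

print(linear_search(data, 1))     # best case: (0, 1)
print(linear_search(data, 1000))  # target is the last element: (999, 1000)
print(linear_search(data, -5))    # target is absent: (-1, 1000)
```

The best case needs a single comparison, but the worst case needs n of them, so the worst-case complexity of this search is described by n.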

Big O notation

You have probably heard of big O notation. It has different classes for indicating complexity, such as linear, O(n); logarithmic, O(log n); quadratic, O(n^2); and cubic, O(n^3). You use big O because the runtime of an algorithm depends heavily on the hardware, so you need a systematic, hardware-independent way of measuring an algorithm's performance based on the size of its input. Big O looks at the steps of an algorithm and, as mentioned, describes the worst-case scenario.
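As a rough illustration of how these classes differ, the sketch below counts the steps taken by three toy routines as n doubles. The helper names (`halving_steps`, `linear_steps`, `pair_steps`) are hypothetical and chosen only for this example.

```python
def halving_steps(n):
    """Count how often n can be halved before reaching 1: logarithmic growth."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


def linear_steps(n):
    """Visit each of n elements once: linear growth."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def pair_steps(n):
    """Visit every unordered pair of n elements: quadratic growth."""
    steps = 0
    for i in range(n):
        for _ in range(i + 1, n):
            steps += 1
    return steps


for n in (8, 16, 32, 64):
    print(n, halving_steps(n), linear_steps(n), pair_steps(n))
```

Each time n doubles, the logarithmic routine gains a single step, the linear routine doubles its steps, and the pairwise routine roughly quadruples them.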

For example, if n is the number of elements that you would like to append to a list, the complexity is O(n), because the number of append operations grows in proportion to n. The following code block will help you to plot how different complexities grow as a function of their input size:

# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Setting the style of the plot. seaborn's own API is used here because
# newer Matplotlib releases no longer accept the 'seaborn-whitegrid' style name
sns.set_style('whitegrid')

# Creating an array of input sizes
n = 10
x = np.arange(1, n)

# Creating a pandas data frame for popular complexity classes.
# The column order matches the order of the labels below.
df = pd.DataFrame({'x': x,
                   'O(1)': 1,                    # Constant
                   'O(log_n)': np.log(x),        # Logarithmic
                   'O(n)': x,                    # Linear
                   'O(n_log_n)': x * np.log(x),  # N log n
                   'O(n2)': np.power(x, 2),      # Quadratic
                   'O(n3)': np.power(x, 3)})     # Cubic

# Creating labels (raw strings keep the LaTeX backslashes intact)
labels = [r'$O(1) - Constant$',
          r'$O(\log n) - Logarithmic$',
          r'$O(n) - Linear$',
          r'$O(n\log n) - N log n$',
          r'$O(n^2) - Quadratic$',
          r'$O(n^3) - Cubic$']

# Plotting every column in the data frame except 'x' against the input size
for label, col in zip(labels, df.columns.drop('x')):
    plt.plot(df['x'], df[col], label=label)

# Adding a legend
plt.legend()

# Limiting the y-axis
plt.ylim(0, 50)

plt.show()

We get the following plot as the output of the preceding code:

Figure: Different complexities grow as a function of their input size

One thing to note here is that there are some crossover points between the different complexity classes. This shows the role of data size: for small inputs, an algorithm in a worse complexity class can still take fewer steps than one in a better class. It's easy to understand the complexity of simple examples, but what about the complexity of ML algorithms? If the introduction so far has already piqued your interest, continue reading the next section.
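A crossover point can also be located numerically. The sketch below assumes two hypothetical cost models, one linear with a large constant factor (100 operations per element) and one quadratic with no constant factor. Both constants are made up for illustration and do not describe any particular ML algorithm.

```python
def linear_cost(n, per_element=100):
    """Hypothetical linear algorithm with a large constant factor."""
    return per_element * n


def quadratic_cost(n):
    """Hypothetical quadratic algorithm with no constant factor."""
    return n * n


def find_crossover(max_n=10_000):
    """Return the smallest n at which the linear algorithm becomes cheaper."""
    for n in range(1, max_n + 1):
        if linear_cost(n) < quadratic_cost(n):
            return n
    return None  # no crossover within the searched range


print(find_crossover())  # 101
```

Below 101 elements the quadratic model is the cheaper of the two, which is why the size of your dataset matters when you compare algorithms by their complexity class alone.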

Differences in training and scoring time

Time spent on training and scoring can make or break an ML project. If an algorithm takes too long to train on currently available hardware, updating the model with new data and optimizing its hyperparameters will be painful, which may force you to cross that algorithm off your candidate list. If an algorithm takes too long to score, then this is probably a problem in the production environment, since your application may require fast inference times, such as milliseconds or microseconds, to get predictions. That's why it's important to learn the inner workings of ML algorithms, at least the common ones at first, to ...
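The gap between training and scoring time can be sketched with two toy models written in plain Python. Both classes below are illustrative assumptions rather than code from the book: a nearest-neighbour model that only stores the data during training and scans all of it when scoring, and a least-squares line that scans the data during training and scores with a single formula.

```python
import time


class NearestNeighbourRegressor:
    """Lazy learner: training stores the data, scoring scans all of it."""

    def fit(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)
        return self

    def predict(self, x):
        nearest = min(range(len(self.xs)), key=lambda i: abs(self.xs[i] - x))
        return self.ys[nearest]


class LeastSquaresLine:
    """Eager learner: training scans the data, scoring is one formula."""

    def fit(self, xs, ys):
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, x):
        return self.slope * x + self.intercept


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Synthetic data on the line y = 2x + 1 (made up for this sketch)
xs = [float(i) for i in range(50_000)]
ys = [2.0 * x + 1.0 for x in xs]

for model in (NearestNeighbourRegressor(), LeastSquaresLine()):
    _, fit_seconds = timed(model.fit, xs, ys)
    prediction, score_seconds = timed(model.predict, 1234.0)
    print(f"{type(model).__name__}: fit {fit_seconds:.4f}s, "
          f"score {score_seconds:.6f}s, prediction {prediction:.1f}")
```

The exact timings depend on your hardware, so none are quoted here. What matters is where each model does its work: the cost of scoring with the nearest-neighbour model grows with the size of the training set, while the cost of scoring with the fitted line does not.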

Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
Introduction to AutoML
Introduction to Machine Learning Using Python
Data Preprocessing
Automated Algorithm Selection
Hyperparameter Optimization
Creating AutoML Pipelines
Dive into Deep Learning
Critical Aspects of ML and Data Science Projects
Other Books You May Enjoy
