eBook - ePub

Training Systems Using Python Statistical Modeling

Name: Training Systems Using Python Statistical Modeling
ISBN: 9781838820640

Explore popular techniques for modeling your data in Python

Curtis Miller

290 pages
English
ePUB (mobile friendly)
Available on iOS & Android

About this book

Leverage the power of Python and statistical modeling techniques for building accurate predictive models

Key Features

Get introduced to Python's rich suite of libraries for statistical modeling
Implement regression and clustering, and train neural networks from scratch
Work through real-world examples of training end-to-end machine learning systems in Python

Book Description

Python's ease of use and multi-purpose nature have made it the tool of choice for many data scientists and machine learning developers today. Its rich libraries are widely used for data analysis and, more importantly, for building state-of-the-art predictive models. This book takes you through an exciting journey of using these libraries to implement effective statistical models for predictive analytics.

You'll start by diving into classical statistical analysis, where you will learn to compute descriptive statistics using pandas. You will look at supervised learning, where you will explore the principles of machine learning and train different machine learning models from scratch. You will also work with binary prediction models, such as data classification using k-nearest neighbors, decision trees, and random forests. This book also covers algorithms for regression analysis, such as ridge and lasso regression, and their implementation in Python. You will also learn how neural networks can be trained and deployed for more accurate predictions, and which Python libraries can be used to implement them.

By the end of this book, you will have all the knowledge you need to design, build, and deploy enterprise-grade statistical models for machine learning using Python and its rich ecosystem of libraries for predictive analytics.

What you will learn

Understand the importance of statistical modeling
Learn about the various Python packages for statistical analysis
Implement algorithms such as Naive Bayes, random forests, and more
Build predictive models from scratch using Python's scikit-learn library
Implement regression analysis and clustering
Learn how to train a neural network in Python

Who this book is for

If you are a data scientist, a statistician or a machine learning developer looking to train and deploy effective machine learning models using popular statistical techniques, then this book is for you. Knowledge of Python programming is required to get the most out of this book.

Information

Topic: Computer Science
Subtopic: Data Modelling & Design

Binary Prediction Models

In this chapter, we will look at various methods for classifying data, and focus on binary data. We will start with a simple algorithm—the k-nearest neighbors algorithm. Next, we will move on to decision trees. We will then look at an ensemble method and combine multiple decision trees into a random forest classifier. After that, we will move on to linear classifiers, the first being the Naive Bayes algorithm. Then, we will see how to train support vector machines. Following this, we will look at another well-known and extensively used classifier—logistic regression. Finally, we will see how we can extend algorithms for binary classification to algorithms that are capable of multiclass classification.

The following topics will be covered in this chapter:

K-nearest neighbors classifier
Decision trees
Random forests
Naive Bayes
Support vector machines
Logistic regression
Extending beyond binary classifiers

K-nearest neighbors classifier

Our first learning system will be the k-nearest neighbors (kNN) classifier. I will describe how the classifier makes predictions, the important hyperparameters it uses, and the problems that are faced by the classifier. Throughout, I will be using the classifier to predict species of iris flowers. So, let's go ahead and start a Jupyter Notebook for this classifier:

The first thing we're going to do is load in the dataset and other required functions, as follows:

The iris dataset is provided with sklearn. It is one of the library's well-known example datasets.

Then, we will load in an object that contains the iris data and save that into Python objects:
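
The original listing for these steps is not reproduced in this excerpt. A minimal sketch of loading the data, assuming scikit-learn's `load_iris` function, might look like the following. The variable names `iris_obj`, `data`, and `target` are illustrative, and the other functions the chapter imports are not shown in the excerpt, so they are left out here.

```python
from sklearn.datasets import load_iris

# Load the bundled iris dataset; the returned object holds the
# feature matrix, the labels, and descriptive metadata.
iris_obj = load_iris()

# Hypothetical variable names, chosen only for this sketch.
data = iris_obj.data      # one row per flower, four measurements each
target = iris_obj.target  # integer species label for each flower

print(data.shape, target.shape)
print(iris_obj.feature_names)
print(iris_obj.target_names)
```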

Then, we will divide the dataset into training and test data by using the following lines of code:
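
The lines of code referred to here did not survive in this excerpt. One plausible way to perform the split, assuming scikit-learn's `train_test_split`, is sketched below. The test-set size, the random seed, and the use of stratification are assumptions made for illustration, not values taken from the book.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_obj = load_iris()
data, target = iris_obj.data, iris_obj.target

# test_size=30 holds out 30 of the 150 flowers; random_state fixes the
# shuffle; stratify keeps the species balanced in both parts.
# All three settings are illustrative assumptions.
data_train, data_test, target_train, target_test = train_test_split(
    data, target, test_size=30, random_state=0, stratify=target
)

print(data_train.shape, data_test.shape)
```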

Here are the first five rows of the training data:

Here are the first five labels of the training data:
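
The printed rows and labels are missing from this excerpt. A sketch of how they could be displayed follows, assuming a split made with scikit-learn's `train_test_split`. The split settings are illustrative, so the values printed will depend on them and need not match the book's output.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_obj = load_iris()

# Illustrative split settings (assumptions, not the book's values).
data_train, data_test, target_train, target_test = train_test_split(
    iris_obj.data, iris_obj.target, test_size=30, random_state=0
)

# First five rows of the training features: each row is one flower.
first_rows = data_train[:5]
print(first_rows)

# First five training labels: integer codes for the species.
first_labels = target_train[:5]
print(first_labels)
```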

Throughout this chapter, we will be following a practice that is normally inappropriate. You aren't supposed to look at the test set repeatedly; however, we are going to do so throughout this entire section so that you can get an idea of how well these algorithms do on test sets. We are only doing this to demonstrate these algorithms and to highlight some things you might think about when training them, so that we have a way to evaluate an algorithm other than its performance on the training set alone.

In a real project, looking at the test set would be the very last thing that you would do, at the point when you were presenting the results of your work to your boss. You would have already picked a classifier, and you can only choose one. Getting statistics from the test set at that point is only for academic purposes: it gives you some idea of how your classifier is going to behave on data it has not seen. You might keep a separate held-out set, which is not the true test set, that you are free to look at as often as you like. But remember that every time you look at a test set, it is no longer acting as data that the algorithm has not seen before; it is now acting like seen data that affects the algorithm you're training.
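
The discipline described above can be sketched as follows. This is not the book's code: it assumes a three-way split into training, validation, and test sets with scikit-learn, and the set sizes, random seeds, and candidate values of k are all illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Set the true test set aside once (sizes and seeds are illustrative).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=30, random_state=0, stratify=y
)
# Carve a validation set out of the remainder; this is the set that
# may be inspected as often as needed while comparing models.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=30, random_state=0, stratify=y_rest
)

# Compare candidate hyperparameters on the validation set only.
val_scores = {}
for k in (1, 3, 5, 7):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_scores[k] = clf.score(X_val, y_val)
best_k = max(val_scores, key=val_scores.get)

# Evaluate on the test set exactly once, with the chosen model,
# refitted on all of the non-test data.
final_clf = KNeighborsClassifier(n_neighbors=best_k).fit(X_rest, y_rest)
test_accuracy = final_clf.score(X_test, y_test)
print(best_k, test_accuracy)
```

Because the test set is used only for the final score, that score remains an estimate of performance on unseen data, whereas the validation scores become optimistic the more they are consulted.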

All in all, we're going to be doing some sloppy things in this chapter, but that's just because I'm trying to teach you these methods, give you some sense of what's going on, and give you something to think about when you're trying to decide what algorithm you should be using and what you're looking for.

Training a kNN classifier

Training a kNN classifier is easy. First, you need to load in the training data, which will be used f...
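
The excerpt breaks off here. As a rough illustration of what training a kNN classifier involves, and not as a reconstruction of the book's listing, the sketch below assumes scikit-learn's `KNeighborsClassifier`. The number of neighbors and the split settings are arbitrary choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=30, random_state=0, stratify=y
)

# n_neighbors is the main hyperparameter: how many nearby training
# points vote on each prediction. The value 3 is an arbitrary choice.
knn = KNeighborsClassifier(n_neighbors=3)

# "Training" a kNN classifier amounts to storing the training data.
knn.fit(X_train, y_train)

# Predict by majority vote among the nearest stored points.
pred_train = knn.predict(X_train)
pred_test = knn.predict(X_test)

train_accuracy = accuracy_score(y_train, pred_train)
test_accuracy = accuracy_score(y_test, pred_test)
print(train_accuracy, test_accuracy)
```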

Table of contents

Title Page
Copyright and Credits
About Packt
Contributors
Preface
Classical Statistical Analysis
Introduction to Supervised Learning
Binary Prediction Models
Regression Analysis and How to Use It
Neural Networks
Clustering Techniques
Dimensionality Reduction
Other Books You May Enjoy
