Mastering Predictive Analytics with scikit-learn and TensorFlow
eBook - ePub

Mastering Predictive Analytics with scikit-learn and TensorFlow

Implement machine learning techniques to build advanced predictive models using Python

Alvaro Fuentes

  1. 154 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Learn advanced techniques to improve the performance and quality of your predictive models

Key Features

  • Use ensemble methods to improve the performance of predictive analytics models
  • Implement feature selection, dimensionality reduction, and cross-validation techniques
  • Develop neural network models and master the basics of deep learning

Book Description

Python is a programming language that provides a wide range of features that can be used in the field of data science. Mastering Predictive Analytics with scikit-learn and TensorFlow covers various implementations of ensemble methods, how they are used with real-world datasets, and how they improve prediction accuracy in classification and regression problems.

This book starts with ensemble methods and their features. You will see that scikit-learn provides tools for choosing hyperparameters for models. As you make your way through the book, you will cover the nitty-gritty of predictive analytics and explore its features and characteristics. You will also be introduced to artificial neural networks and TensorFlow, and how it is used to create neural networks. In the final chapter, you will explore factors such as computational power, along with improvement methods and software enhancements for efficient predictive analytics.

By the end of this book, you will be well-versed in using deep neural networks to solve common problems in big data analysis.

What you will learn

  • Use ensemble algorithms to obtain accurate predictions
  • Apply dimensionality reduction techniques to combine features and build better models
  • Choose the optimal hyperparameters using cross-validation
  • Implement different techniques to solve current challenges in the predictive analytics domain
  • Understand various elements of deep neural network (DNN) models
  • Implement neural networks to solve both classification and regression problems

Who this book is for

Mastering Predictive Analytics with scikit-learn and TensorFlow is for data analysts, software engineers, and machine learning developers who are interested in implementing advanced predictive analytics using Python. Business intelligence experts will also find this book indispensable as it will teach them how to progress from basic predictive models to building advanced models and producing more accurate predictions. Prior knowledge of Python and familiarity with predictive analytics concepts are assumed.

Information

Year
2018
ISBN
9781789612240

Working with Features

In this chapter, we take a close look at features and the role they play in feature engineering. We'll learn some techniques that can improve our predictive analytics models in two ways: by improving the performance metrics of our models and by helping us understand the relationship between the features and the target variable that we are trying to predict.
In this chapter, we are going to cover the following topics:
  • Feature selection methods
  • Dimensionality reduction and PCA
  • Creating new features
  • Improving models with feature engineering

Feature selection methods

Feature selection methods are used for selecting features that are likely to help with predictions. The following are the three methods for feature selection:
  • Removing dummy features with low variance
  • Identifying important features statistically
  • Recursive feature elimination
When building predictive analytics models, some features won't be related to the target and so will be of little help in prediction. The problem is that including irrelevant features can introduce noise and make the model more likely to overfit. Feature selection techniques are therefore used to select the most relevant and useful features, which helps either with prediction or with understanding our model.

Removing dummy features with low variance

The first feature selection technique that we will learn about is removing dummy features with low variance. The only transformation that we have applied to our features so far is encoding the categorical features. When we encode a categorical feature, we get a set of dummy features, and we should examine whether they have enough variability. Features with very low variance are likely to have little impact on prediction. Why is that? Imagine a dataset with a gender feature in which 98% of the observations are female. This feature will have almost no impact on prediction because nearly all of the cases fall into a single category, so there is not enough variability. Such features become candidates for elimination and should be examined more carefully. Now, take a look at the following formula:
Var[X] = p(1 - p)
You can remove all dummy features that are either 0 or 1 in more than x% of the samples, or you can establish a minimum threshold for the variance of such features. The variance of a dummy feature can be obtained with the preceding formula, where p is the proportion of 1s in the dummy feature. We will see how this works in a Jupyter Notebook.
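As a minimal sketch of this idea, the following example uses scikit-learn's VarianceThreshold on a small made-up dataset. The column names and the 95% cutoff are assumptions chosen for illustration and do not come from the book's notebook. A 0/1 feature whose proportion of 1s is p has variance p(1 - p), so a 95% cutoff corresponds to a variance threshold of 0.95 * 0.05.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical dummy (0/1) features, built deterministically for illustration.
n = 1000
X = pd.DataFrame({
    "gender_female": np.r_[np.ones(980), np.zeros(20)].astype(int),  # 98% ones
    "married": np.tile([0, 1], n // 2),                              # 50% ones
    "owns_home": np.r_[np.ones(300), np.zeros(700)].astype(int),     # 30% ones
})

# Drop dummies that take the same value in more than 95% of the samples.
# For a 0/1 feature, the variance is p * (1 - p).
p = 0.95
selector = VarianceThreshold(threshold=p * (1 - p))
selector.fit(X)

# Names of the features that survive, and the variance measured for each one.
kept = X.columns[selector.get_support()].tolist()
variances = dict(zip(X.columns, selector.variances_))
print(kept)
print(variances)
```

Here the heavily unbalanced gender_female column falls below the threshold and is dropped, while the other two columns are kept. In practice the cutoff is a judgment call, and a dropped feature is still worth a second look before it is discarded.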

Identifying important features statistically

This method uses statistical tests to identify and select relevant features. For example, in classification tasks we can use the ANOVA F-statistic to evaluate the relationship between numerical features and the target, which is categorical because this is a classification task. To evaluate the statistical relationship between a categorical feature and the target, we can use the chi-squared test. In scikit-learn, we can use the SelectKBest object with either test, and we will see how to use it in a Jupyter Notebook.
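The following sketch shows how SelectKBest could be combined with the two tests mentioned here, using f_classif for numerical features and chi2 for dummy features. The data is synthetic and the values of k are arbitrary assumptions; the book's notebook works with a different dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2, f_classif

# Hypothetical numerical data: 10 features, 3 of which carry signal.
X_num, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0,
    random_state=0,
)

# ANOVA F-statistic: numerical features against a categorical target.
f_selector = SelectKBest(score_func=f_classif, k=3).fit(X_num, y)
num_selected = np.flatnonzero(f_selector.get_support())

# Hypothetical dummy features: column 0 agrees with the target about 90%
# of the time, while columns 1 and 2 are pure noise.
rng = np.random.default_rng(0)
related = np.where(rng.random(500) < 0.9, y, 1 - y)
noise = rng.integers(0, 2, size=(500, 2))
X_cat = np.column_stack([related, noise])

# Chi-squared test: non-negative (here 0/1) features against the target.
chi_selector = SelectKBest(score_func=chi2, k=1).fit(X_cat, y)
cat_selected = np.flatnonzero(chi_selector.get_support())

print("F-scores:", f_selector.scores_.round(2))
print("Selected numerical columns:", num_selected)
print("Selected dummy columns:", cat_selected)
```

Note that chi2 expects non-negative feature values, which is why it is applied to the dummy features and not to the raw numerical ones.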

Recursive feature elimination

Recursive feature elimination (RFE) is the process of repeatedly identifying the least important features and removing them from our model. RFE is available in scikit-learn, and we can use it with models that calculate coefficients, such as linear or logistic regression, or with models that calculate something called feature importance, such as random forests. For models that calculate neither coefficients nor feature importance, this method cannot be used; for example, you cannot apply RFE to KNN models. The method begins by predefining the number of features to use in your model. It then fits the model using all features and, based on the coefficients or the feature importance, eliminates the least important features. This procedure is repeated recursively on the remaining features until the desired number of features is reached.
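The procedure can be sketched with scikit-learn's RFE class wrapped around a logistic regression. The synthetic data, the choice of estimator, and the target of four features are assumptions made for this illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 10 features, 4 of which carry signal.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, n_redundant=0,
    random_state=0,
)
# Scale the features so that coefficient magnitudes are comparable.
X = StandardScaler().fit_transform(X)

# Logistic regression exposes coef_, which RFE uses to rank the features.
estimator = LogisticRegression(max_iter=1000)

# Remove one feature per iteration until four remain.
rfe = RFE(estimator=estimator, n_features_to_select=4, step=1)
rfe.fit(X, y)

selected_mask = rfe.support_  # True for the features that were kept
ranking = rfe.ranking_        # 1 = kept; larger values were eliminated earlier
print(selected_mask)
print(ranking)
```

With step=1, each pass refits the model and drops the single weakest feature, so the ranking records the order of elimination. Passing an estimator that exposes neither coefficients nor feature importances, such as a KNN classifier, makes the fit fail, which matches the restriction described above.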
There are also a few other methods for selecting important features in your models:
  • L1-based feature selection
  • Selection threshold methods
  • Tree-based methods
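The text does not say which scikit-learn classes implement these methods, so the following is only one possible sketch. It assumes SelectFromModel as the common wrapper, an L1-penalized LinearSVC for the L1 case, and a random forest with an explicit importance threshold for the threshold and tree-based cases. The data and all parameter values are made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical data: 10 features, 3 of which carry signal.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0,
    random_state=0,
)
X = StandardScaler().fit_transform(X)

# L1-based selection: the L1 penalty pushes some coefficients to zero,
# and SelectFromModel keeps the features whose coefficients remain non-zero.
l1_model = LinearSVC(penalty="l1", dual=False, C=0.05)
l1_selector = SelectFromModel(l1_model).fit(X, y)
l1_mask = l1_selector.get_support()

# Tree-based selection with a threshold: keep the features whose
# importance is at least the mean importance across all features.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
tree_selector = SelectFromModel(forest, threshold="mean").fit(X, y)
tree_mask = tree_selector.get_support()

print("Kept by L1 model:", l1_mask)
print("Kept by random forest:", tree_mask)
```

A smaller C strengthens the L1 penalty and tends to remove more features, while the threshold argument controls how strict the importance cutoff is.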
Let's go to our Jupyter Notebook to see how we actually apply these methods in scikit-learn. The following screenshot depicts the necessary libraries and modules to import:
In the following screenshot, we have first used the credit card default dataset and we are applying the traditional transformations that we do to the raw data:
The following screenshot shows the dummy features that we have in our dataset and the numerical features, depending on the type of feature:
Here, we are applying the scaling operation for feature modeling:
The first method that we talked about in the presentation was removing dummy features with low variance to get ...

Citation styles for Mastering Predictive Analytics with scikit-learn and TensorFlow

APA 6 Citation

Fuentes, A. (2018). Mastering Predictive Analytics with scikit-learn and TensorFlow (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/825782/mastering-predictive-analytics-with-scikitlearn-and-tensorflow-implement-machine-learning-techniques-to-build-advanced-predictive-models-using-python-pdf (Original work published 2018)

Chicago Citation

Fuentes, Alvaro. (2018) 2018. Mastering Predictive Analytics with Scikit-Learn and TensorFlow. 1st ed. Packt Publishing. https://www.perlego.com/book/825782/mastering-predictive-analytics-with-scikitlearn-and-tensorflow-implement-machine-learning-techniques-to-build-advanced-predictive-models-using-python-pdf.

Harvard Citation

Fuentes, A. (2018) Mastering Predictive Analytics with scikit-learn and TensorFlow. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/825782/mastering-predictive-analytics-with-scikitlearn-and-tensorflow-implement-machine-learning-techniques-to-build-advanced-predictive-models-using-python-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Fuentes, Alvaro. Mastering Predictive Analytics with Scikit-Learn and TensorFlow. 1st ed. Packt Publishing, 2018. Web. 14 Oct. 2022.