Hands-On Predictive Analytics with Python

Master the complete predictive analytics process, from problem definition to model deployment

Alvaro Fuentes

  1. 330 pages
  2. English

About this book

A step-by-step guide to building high-performing predictive applications

Key Features

  • Use the Python data analytics ecosystem to implement end-to-end predictive analytics projects
  • Explore advanced predictive modeling algorithms with an emphasis on theory with intuitive explanations
  • Learn to deploy a predictive model's results as an interactive application

Book Description

Predictive analytics is an applied field that employs a variety of quantitative methods using data to make predictions. It involves much more than just throwing data onto a computer to build a model. This book provides practical coverage to help you understand the most important concepts of predictive analytics. Using practical, step-by-step examples, we build predictive analytics solutions while using cutting-edge Python tools and packages.

The book's step-by-step approach starts by defining the problem and moves on to identifying relevant data. We then prepare the data, explore and visualize relationships, and build, tune, evaluate, and deploy models.

Each stage has relevant practical examples and efficient Python code. You will work with models such as KNN, Random Forests, and neural networks using the most important libraries in Python's data science stack: NumPy, Pandas, Matplotlib, Seaborn, Keras, Dash, and so on. In addition to hands-on code examples, you will find intuitive explanations of the inner workings of the main techniques and algorithms used in predictive analytics.

By the end of this book, you will be all set to build high-performance predictive analytics solutions using Python programming.

What you will learn

  • Get to grips with the main concepts and principles of predictive analytics
  • Learn about the stages involved in producing complete predictive analytics solutions
  • Understand how to define a problem, propose a solution, and prepare a dataset
  • Use visualizations to explore relationships and gain insights into the dataset
  • Learn to build regression and classification models using scikit-learn
  • Use Keras to build powerful neural network models that produce accurate predictions
  • Learn to serve a model's predictions as a web application

Who this book is for

This book is for data analysts, data scientists, data engineers, and Python developers who want to learn about predictive modeling and would like to implement predictive analytics solutions using Python's data stack. People from other backgrounds who would like to enter this exciting field will greatly benefit from reading this book. All you need is to be proficient in Python programming and have a basic understanding of statistics and college-level algebra.

Predicting Categories with Machine Learning

In the previous chapter, we learned the basics of machine learning. In this chapter, we will build models that predict categories. This class of machine learning problems is known as classification tasks. Classification models are among the most useful in practice, and in this chapter we will talk about some of the most popular and foundational ones.
We begin the chapter by providing an overview of classification tasks and some of their applications. Then we bring back our credit card default dataset and start preparing it for modeling. After that, we introduce one of the most popular models for classification, logistic regression, which is similar in spirit to the multiple regression models we discussed in the previous chapter. The next model we present is classification trees. We present this model because it is very popular and easy to understand, and because it is the basis for one of the most popular and powerful models used in predictive analytics: random forests.
As we did in the previous chapter, we explain at a high level how these models work and we use scikit-learn to train models on our credit card default dataset. After training the models, we compare their performance on the testing set. Finally, because the credit card default problem is a binary classification problem, we finish the chapter with a brief section that contains an example of a multiclass classification problem.
These are the learning outcomes for this chapter:
  • Learn about classification tasks and why classification models are so important
  • Review the credit card default dataset
  • Learn about the logistic regression model
  • Understand the classification trees model
  • Learn the random forest model
  • See a simple example of multiclass classification
  • Learn the basics of Naive Bayes classifiers

Technical requirements

The technical requirements for this chapter are as follows:
  • Python 3.6 or higher
  • Jupyter Notebook
  • Recent versions of the following Python libraries: NumPy, pandas, matplotlib, Seaborn, and scikit-learn

Classification tasks

Classification tasks belong to the supervised learning branch of ML. These kinds of tasks are the most widely used in applications in industry and academia. Here are just a few examples of classification tasks in some domains of application:
  • Direct marketing: Predict whether a customer will give a positive or a negative response to a campaign
  • Medicine: Predict whether a patient is healthy or sick; or, for example, which kind of cancer the patient has
  • Insurance: Classify clients by risk level; for instance, low, average, or high risk
  • Telecommunication and other industries: Churn models are classification models that predict which customers will switch to another provider
  • Education: Predict which students will drop out from a program
  • Email services: Classify emails into folders such as inbox, spam, social, and promotions
Of course, our credit card default problem is a classification task because we are trying to predict whether a customer will default on or pay their credit card bill next month.
To review what we mentioned in the previous chapter, there are three main types of classification problems:
  • Binary classification: The target has only two categories, which is the case for our credit card default problem.
  • Multiclass classification: When the target has more than two classes.
  • Multilabel classification: The problem of assigning more than one category or label to an observation. A popular example is predicting the subjects of a news article based on its contents. Many news articles do not fall into just one category; one article could be simultaneously about the broad topics of World News, Politics, and Finance.
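As a rough illustration of the three problem types listed above, the sketch below builds one small target array for each and asks scikit-learn how it would interpret it. The arrays are made-up toy values, not data from the book; `type_of_target` is a scikit-learn utility used here only to make the distinction visible.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Binary: two categories (1 = default, 0 = pay), one label per observation
y_binary = np.array([0, 1, 1, 0])

# Multiclass: more than two categories, still one label per observation
y_multiclass = np.array(["low", "average", "high", "low"])

# Multilabel: one row per observation, one 0/1 column per possible label
# (columns in this toy example: World News, Politics, Finance)
y_multilabel = np.array([
    [1, 1, 1],  # an article about all three topics
    [0, 1, 0],  # an article about Politics only
    [1, 0, 1],  # an article about World News and Finance
])

targets = {
    "binary": y_binary,
    "multiclass": y_multiclass,
    "multilabel": y_multilabel,
}
for name, y in targets.items():
    print(name, "->", type_of_target(y))
```

Note that the multilabel target is a two-dimensional indicator matrix, while the binary and multiclass targets are one-dimensional.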

Predicting categories and probabilities

ML classification models can output two types of predictions:
  • Predicted classes: For every observation, the model will directly give the prediction of the class.
  • Probabilities for each class: For every observation and every class, the model will output probabilities of that observation belonging to that class. Say, for example, we have three classes—A, B, and C—then the output of the model would be a triple of numbers such as [0.2, 0.7, 0.1], meaning the probabilities of the observation belonging to A, B, and C respectively. Note that, since we are dealing with probabilities, the values should add up to 1.
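The two kinds of output described above map onto two methods that scikit-learn classifiers commonly expose: `predict` for classes and `predict_proba` for per-class probabilities. The following is a minimal sketch on synthetic three-class data; the dataset and model settings are assumptions chosen only for illustration, not the chapter's actual example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with three classes (illustrative only)
X, y = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    random_state=0,
)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Predicted classes: one label per observation
pred_classes = model.predict(X[:5])

# Probabilities: one row per observation, one column per class
pred_probs = model.predict_proba(X[:5])

print(pred_classes)
print(pred_probs.round(3))

# Each row of probabilities adds up to 1
print(pred_probs.sum(axis=1))
```

The column order of `pred_probs` follows `model.classes_`, which is worth checking before interpreting the numbers.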
In the case of models that output the probability for every class, the classification is done by predicting the category with the highest probability. This is like the default rule; however, we can (and sometimes should) change this method of using the probabilities for predicting classes based on the goals we set for our predictive analytics project.
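The default rule and one possible alternative can be sketched with plain NumPy. The probability values and the 0.3 threshold below are made-up numbers for illustration; the appropriate threshold depends on the goals of the project.

```python
import numpy as np

# Default rule: pick the class with the highest probability
classes = np.array(["A", "B", "C"])
probs = np.array([
    [0.2, 0.7, 0.1],
    [0.5, 0.3, 0.2],
])
pred = classes[probs.argmax(axis=1)]
print(pred)  # ['B' 'A']

# Binary case: probability of the positive class for four observations
p_positive = np.array([0.10, 0.35, 0.62, 0.48])

# With two classes, the highest-probability rule is a 0.5 threshold
pred_default_rule = (p_positive >= 0.5).astype(int)
print(pred_default_rule)  # [0 0 1 0]

# A lower threshold flags more observations as positive
custom_threshold = 0.3  # hypothetical value chosen for illustration
pred_custom = (p_positive >= custom_threshold).astype(int)
print(pred_custom)  # [0 1 1 1]
```

Lowering the threshold trades more false alarms for fewer missed positives, which is why the choice should follow from the project's goals rather than from the default.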
For binary classification models, we often name one of the classes "the positive class" and label it with a 1; the other class becomes "the negative class", often labeled with a 0 (many people like using a -1 as well, but I don't like it). The positive class is the class around which the analysis is made. Keep in mind that in this context the term "positive" has nothing to do with the regular use of the word, which indicates that something is "good". For instance, in the credit card default problem, our positive class will be "default", which of course from the point of view of the financial institution is not "positive" at all.
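A minimal sketch of this labeling convention, assuming the raw outcomes are stored as the strings "default" and "pay" (hypothetical values; the actual dataset may already be encoded numerically):

```python
import numpy as np

# Hypothetical raw outcomes for five customers
status = np.array(["pay", "default", "pay", "pay", "default"])

# "default" is the positive class, so it gets the label 1
positive_class = "default"
y = (status == positive_class).astype(int)
print(y)  # [0 1 0 0 1]

# The alternative convention mentioned above uses -1 for the negative class
y_plus_minus = 2 * y - 1
print(y_plus_minus)  # [-1  1 -1 -1  1]
```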

Credit card default dataset

OK, time to get our hands dirty with the credit card default data. We saw the descriptions of the features back in Chapter 2, Problem Understanding and Data Preparation:
  • SEX: Gender (1 = male; 2 = female).
  • EDUCATION: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
  • MARRIAGE: Marital status (1 = married; 2 = single; 3 = others).
  • AGE: Age (year).
  • LIMIT_BAL: Amount of the given credit (New Taiwan dollar)—it includes both the individual consumer credit and his/her family (supplementary) credit.
  • PAY_1 - PAY_6: History of past payment. We tracked the past monthly payment records (from April, 2005, to September, 2005) as follows: 0 = the repayment status in September, 2005; 1 = the repayment status in August, 2005; . . .; 6 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and ...
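To make the coded categorical features easier to read during exploration, the integer codes listed above can be mapped back to their descriptions. The sketch below uses a tiny hand-made DataFrame, not the real dataset, and assumes pandas is available; the column names and code meanings are the ones given in the feature descriptions.

```python
import pandas as pd

# Code-to-label mappings taken from the feature descriptions
sex_labels = {1: "male", 2: "female"}
education_labels = {
    1: "graduate school",
    2: "university",
    3: "high school",
    4: "others",
}
marriage_labels = {1: "married", 2: "single", 3: "others"}

# Toy rows for illustration only (not real observations)
sample = pd.DataFrame({
    "SEX": [1, 2, 2],
    "EDUCATION": [2, 1, 3],
    "MARRIAGE": [1, 2, 3],
})

# Replace each integer code with its description
decoded = sample.assign(
    SEX=sample["SEX"].map(sex_labels),
    EDUCATION=sample["EDUCATION"].map(education_labels),
    MARRIAGE=sample["MARRIAGE"].map(marriage_labels),
)
print(decoded)
```

Any code that is missing from a mapping becomes `NaN` after `map`, which is a convenient way to spot undocumented values.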
