Regression Analysis with Python
eBook - ePub

  1. 312 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Learn the art of regression analysis with Python

About This Book

  • Become competent at implementing regression analysis in Python
  • Solve some of the complex data science problems related to predicting outcomes
  • Get to grips with various types of regression for effective data analysis

Who This Book Is For

The book targets Python developers who have a basic understanding of data science, statistics, and math and who want to learn how to perform regression analysis on a dataset. Some prior knowledge of statistics and data science is beneficial.

What You Will Learn

  • Format a dataset for regression and evaluate the performance of the resulting model
  • Apply multiple linear regression to real-world problems
  • Learn to classify training points
  • Create an observation matrix, using different techniques of data analysis and cleaning
  • Apply several techniques to decrease (and eventually fix) any overfitting problem
  • Learn to scale linear models to a big dataset and deal with incremental data
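
As a rough illustration of the last point, the sketch below updates a linear model one example at a time with stochastic gradient descent, so the full dataset never has to sit in memory. It is not taken from the book: the function names (`sgd_update`, `example_stream`), the learning rate, and the synthetic data stream are all assumptions made for this example.

```python
# Hypothetical sketch of incremental (online) learning for a linear model.
# Names, learning rate, and data are illustrative assumptions, not the book's code.

def sgd_update(weights, bias, x, y, learning_rate=0.05):
    """Apply one stochastic gradient descent step on the squared error of one example."""
    prediction = bias + sum(w * xi for w, xi in zip(weights, x))
    error = prediction - y
    # Gradient of 0.5 * error**2 with respect to each weight is error * xi.
    new_weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias

def example_stream(n_examples=5000):
    """Yield made-up (features, target) pairs that follow y = 2x + 1 exactly."""
    for i in range(n_examples):
        x = (i % 11) / 10.0  # cycles through 0.0, 0.1, ..., 1.0
        yield [x], 2.0 * x + 1.0

weights, bias = [0.0], 0.0
for features, target in example_stream():
    # Each example is seen once and then discarded.
    weights, bias = sgd_update(weights, bias, features, target)

print(weights, bias)  # should end up close to [2.0] and 1.0
```

Because each update touches only a single example, the same loop could in principle consume data arriving from a file or a network stream in chunks.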

In Detail

Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. There are many kinds of regression algorithms, and this book aims to explain which one is right for each kind of problem and how to prepare real-world data for it. With this book you will learn to define a simple regression problem and evaluate a model's performance on it. The book will help you understand how to properly parse a dataset, clean it, and create an output matrix optimally built for regression. You will begin with a simple regression algorithm to solve some data science problems and then progress to more complex algorithms. The book will enable you to use regression models to predict outcomes and make critical business decisions. Throughout the book, you will learn to use Python to build faster and better linear models and to apply the results in Python or in any other programming language you prefer.
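
To make the definition above concrete, here is a minimal sketch of a simple (one-input) least-squares regression written with only the Python standard library. It is an illustration under stated assumptions rather than code from the book: the helper names `fit_simple_regression` and `predict` and the four data points are invented for this example.

```python
# Minimal sketch: learn a linear relationship from example data, then predict
# the output for an input that was not in the examples.
# Helper names and data values are illustrative assumptions.

def fit_simple_regression(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Apply the fitted line to a new input."""
    return intercept + slope * x

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # made-up data lying exactly on y = 1 + 2x

intercept, slope = fit_simple_regression(xs, ys)
prediction = predict(intercept, slope, 5.0)  # novel input, not in xs
print(intercept, slope, prediction)
```

Libraries listed in the table of contents, such as Statsmodels and Scikit-learn, provide the same kind of fit along with diagnostics; the hand-written version is only meant to show the idea.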

Style and approach

This is a practical, tutorial-based book. You will be given an example problem, followed by the relevant code and a walkthrough of it. The details are provided in a step-by-step manner, followed by a thorough explanation of the math underlying the solution. This approach will help you apply the same techniques to your own data.

Regression Analysis with Python by Luca Massaron and Alberto Boschetti is available in PDF and/or ePUB format, alongside other popular books in Computer Science & Data Processing.

Table of Contents

Regression Analysis with Python
Credits
About the Authors
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Regression – The Workhorse of Data Science
Regression analysis and data science
Exploring the promise of data science
The challenge
The linear models
What you are going to find in the book
Python for data science
Installing Python
Choosing between Python 2 and Python 3
Step-by-step installation
Installing packages
Package upgrades
Scientific distributions
Introducing Jupyter or IPython
Python packages and functions for linear models
NumPy
SciPy
Statsmodels
Scikit-learn
Summary
2. Approaching Simple Linear Regression
Defining a regression problem
Linear models and supervised learning
Reflecting on predictive variables
Reflecting on response variables
The family of linear models
Preparing to discover simple linear regression
Starting from the basics
A measure of linear relationship
Extending to linear regression
Regressing with Statsmodels
The coefficient of determination
Meaning and significance of coefficients
Evaluating the fitted values
Correlation is not causation
Predicting with a regression model
Regressing with Scikit-learn
Minimizing the cost function
Explaining the reason for using squared errors
Pseudoinverse and other optimization methods
Gradient descent at work
Summary
3. Multiple Regression in Action
Using multiple features
Model building with Statsmodels
Using formulas as an alternative
The correlation matrix
Revisiting gradient descent
Feature scaling
Unstandardizing coefficients
Estimating feature importance
Inspecting standardized coefficients
Comparing models by R-squared
Interaction models
Discovering interactions
Polynomial regression
Testing linear versus cubic transformation
Going for higher-degree solutions
Introducing underfitting and overfitting
Summary
4. Logistic Regression
Defining a classification problem
Formalization of the problem: binary classification
Assessing the classifier's performance
Defining a probability-based approach
More on the logistic and logit functions
Let's see some code
Pros and cons of logistic regression
Revisiting gradient descent
Multiclass Logistic Regression
An example
Summary
5. Data Preparation
Numeric feature scaling
Mean centering
Standardization
Normalization
The logistic regression case
Qualitative feature encoding
Dummy coding with Pandas
DictVectorizer and one-hot encoding
Feature hasher
Numeric feature transformation
Observing residuals
Summarizations by binning
Missing data
Missing data imputation
Keeping track of missing values
Outliers
Outliers on the response
Outliers among the predictors
Removing or replacing outliers
Summary
6. Achieving Generalization
Checking on out-of-sample data
Testing by sample split
Cross-validation
Bootstrapping
Greedy selection of features
The Madelon dataset
Univariate selection of features
Recursive feature selection
Regularization optimized by grid-search
Ridge (L2 regularization)
Grid search for optimal parameters
Random grid search
Lasso (L1 regularization)
Elastic net
Stability selection
Experimenting with the Madelon
Summary
7. Online and Batch Learning
Batch learning
Online mini-batch learning
A real example
Streaming scenario without a test set
Summary
8. Advanced Regression Methods
Least Angle Regression
Visual showcase of LARS
A code example
LARS wrap up
Bayesian regression
Bayesian regression wrap up
SGD classification with hinge loss
Comparison with logistic regression
SVR
SVM wrap up
Regression trees (CART)
Regression tree wrap up
Bagging and boosting
Bagging
Boosting
Ensemble wrap up
Gradient Boosting Regressor with LAD
GBM with LAD wrap up
Summary
9. Real-world Applications for Regression Models
Downloading the datasets
Time series problem dataset
Regression problem dataset
Multiclass classification problem dataset
Ranking problem dataset
A regression problem
Testing a classifier instead of a regressor
An imbalanced and multiclass classification problem
A ranking problem
A time series problem
Open questions
Summary
Index

Regression Analysis with Python

Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: February 2016
Production reference: 1250216
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78528-631-5
www.packtpub.com

Credits

Authors
Luca Massaron
Alberto Boschetti
Reviewers
Giuliano Janson
Zacharias Voulgaris
Commissioning Editor
Kunal Parikh
Acquisition Editor
Sonali Vernekar
Content Development Editor
Siddhesh Salvi
Technical Editor
Shivani Kiran Mistry
Copy Editor
Stephen Copestake
Project Coordinator
Nidhi Joshi
Proofreader
Safis Editing
Indexer
Mariammal Chettiyar
Graphics
Disha Haria
Production Coordinator
Nilesh Mohite
Cover Work
Nilesh Mohite

About the Authors

Luca Massaron is a data scientist and marketing research director who specializes in multivariate statistical analysis, machine learning, and customer insight. He has over a decade of experience in solving real-world problems and generating value for stakeholders by applying reasoning, statistics, data mining, and algorithms. From being a pioneer of Web audience analysis in Italy to achieving the rank of a top ten Kaggler, he has always been passionate about data and its analysis and about demonstrating the potential of data-driven knowledge discovery to both experts and non-experts. Favoring simplicity over unnecessary sophistication, he believes that a lot can be achieved in data science just by doing the essentials.
Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a Ph.D. in telecommunication engineering and currently lives and works in London. In his work projects, he faces daily challenges that range from natural language processing (NLP) and machine learning to distributed processing. He is very passionate about his job and always tries to stay up to date with the latest developments in data science technologies, attending m...
