Learning Predictive Analytics with Python
eBook - ePub

Ashish Kumar

Pages: 354
Language: English

About This Book

Gain practical insights into predictive modelling by implementing Predictive Analytics algorithms on public datasets with Python

  • A step-by-step guide to predictive modeling including lots of tips, tricks, and best practices
  • Get to grips with the basics of Predictive Analytics with Python
  • Learn how to use the popular predictive modeling algorithms such as Linear Regression, Decision Trees, Logistic Regression, and Clustering

Who This Book Is For

If you wish to learn how to implement Predictive Analytics algorithms using Python libraries, then this is the book for you. If you are familiar with coding in Python (or some other programming/statistical/scripting language) but have never used or read about Predictive Analytics algorithms, this book will also help you. It will benefit any Data Science enthusiast. Some familiarity with Python will help you get the most out of this book, but it is not a prerequisite.

What You Will Learn

  • Understand the statistical and mathematical concepts behind Predictive Analytics algorithms and implement Predictive Analytics algorithms using Python libraries
  • Analyze the result parameters arising from the implementation of Predictive Analytics algorithms
  • Write Python modules/functions from scratch to execute segments or the whole of these algorithms
  • Recognize and mitigate various contingencies and issues related to the implementation of Predictive Analytics algorithms
  • Get to know various methods of importing, cleaning, sub-setting, merging, joining, concatenating, exploring, grouping, and plotting data with pandas and numpy
  • Create dummy datasets and simple mathematical simulations using the Python numpy and pandas libraries
  • Understand the best practices while handling datasets in Python and creating predictive models out of them
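
As a rough illustration of the kind of task the list above mentions (creating a dummy dataset, grouping it, splitting it, and running a simple simulation), here is a minimal sketch using numpy and pandas. It is not taken from the book: the column names (`group`, `height`, `weight`), the seed, the sample sizes, and the 80/20 split ratio are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Seed the generator so the dummy data is reproducible.
rng = np.random.default_rng(42)

n = 1000
# Hypothetical columns, invented for illustration only.
df = pd.DataFrame({
    "group": rng.choice(["A", "B", "C"], size=n),
    "height": rng.normal(loc=170.0, scale=10.0, size=n),  # normal distribution
    "weight": rng.uniform(low=50.0, high=90.0, size=n),   # uniform distribution
})

# Explore: dimensions, then the mean of each numeric column per group.
print(df.shape)
group_means = df.groupby("group")[["height", "weight"]].mean()
print(group_means)

# Split into training and testing sets with a random boolean mask
# (roughly 80% training, 20% testing).
mask = rng.random(n) < 0.8
train, test = df[mask], df[~mask]

# Monte Carlo estimate of pi: draw random points in the unit square and
# take 4 times the fraction that falls inside the quarter circle.
points = rng.random((100_000, 2))
inside = (points ** 2).sum(axis=1) <= 1.0
pi_estimate = 4.0 * inside.mean()
print(pi_estimate)
```

The boolean-mask split is only one of several ways to separate training and testing data; it yields approximately, not exactly, the requested proportions.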

In Detail

Social Media and the Internet of Things have resulted in an avalanche of data. Data is powerful, but not in its raw form: it needs to be processed and modeled, and Python is one of the most robust tools for doing so. It has an array of packages for predictive modeling and a suite of IDEs to choose from. Learning to predict who would win, lose, buy, lie, or die with Python is an indispensable skill set to have in this data age.

This book is your guide to getting started with Predictive Analytics using Python. You will see how to process data and make predictive models from it. We balance both statistical and mathematical concepts, and implement them in Python using libraries such as pandas, scikit-learn, and numpy.

You'll start by getting an understanding of the basics of predictive modeling, then you will see how to cleanse your data of impurities and get it ready for predictive modeling. You will also learn more about the best predictive modeling algorithms such as Linear Regression, Decision Trees, and Logistic Regression. Finally, you will see the best practices in predictive modeling, as well as the different applications of predictive modeling in the modern world.

Style and approach

All the concepts in this book have been explained and illustrated using a dataset, in a step-by-step manner. The Python code snippet to implement a method or concept is followed by the output, such as charts, dataset heads, pictures, and so on. The statistical concepts are explained in detail wherever required.

Information

Year: 2016
ISBN: 9781783983261
Edition: 1

Table of Contents

Learning Predictive Analytics with Python
Credits
Foreword
About the Author
Acknowledgments
About the Reviewer
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Getting Started with Predictive Modelling
Introducing predictive modelling
Scope of predictive modelling
Ensemble of statistical algorithms
Statistical tools
Historical data
Mathematical function
Business context
Knowledge matrix for predictive modelling
Task matrix for predictive modelling
Applications and examples of predictive modelling
LinkedIn's "People also viewed" feature
What it does?
How is it done?
Correct targeting of online ads
How is it done?
Santa Cruz predictive policing
How is it done?
Determining the activity of a smartphone user using accelerometer data
How is it done?
Sport and fantasy leagues
How was it done?
Python and its packages – download and installation
Anaconda
Standalone Python
Installing a Python package
Installing pip
Installing Python packages with pip
Python and its packages for predictive modelling
IDEs for Python
Summary
2. Data Cleaning
Reading the data – variations and examples
Data frames
Delimiters
Various methods of importing data in Python
Case 1 – reading a dataset using the read_csv method
The read_csv method
Use cases of the read_csv method
Passing the directory address and filename as variables
Reading a .txt dataset with a comma delimiter
Specifying the column names of a dataset from a list
Case 2 – reading a dataset using the open method of Python
Reading a dataset line by line
Changing the delimiter of a dataset
Case 3 – reading data from a URL
Case 4 – miscellaneous cases
Reading from an .xls or .xlsx file
Writing to a CSV or Excel file
Basics – summary, dimensions, and structure
Handling missing values
Checking for missing values
What constitutes missing data?
How missing values are generated and propagated
Treating missing values
Deletion
Imputation
Creating dummy variables
Visualizing a dataset by basic plotting
Scatter plots
Histograms
Boxplots
Summary
3. Data Wrangling
Subsetting a dataset
Selecting columns
Selecting rows
Selecting a combination of rows and columns
Creating new columns
Generating random numbers and their usage
Various methods for generating random numbers
Seeding a random number
Generating random numbers following probability distributions
Probability density function
Cumulative density function
Uniform distribution
Normal distribution
Using the Monte-Carlo simulation to find the value of pi
Geometry and mathematics behind the calculation of pi
Generating a dummy data frame
Grouping the data – aggregation, filtering, and transformation
Aggregation
Filtering
Transformation
Miscellaneous operations
Random sampling – splitting a dataset in training and testing datasets
Method 1 – using the Customer Churn Model
Method 2 – using sklearn
Method 3 – using the shuffle function
Concatenating and appending data
Merging/joining datasets
Inner Join
Left Join
Right Join
An example of the Inner Join
An example of the Left Join
An example of the Right Join
Summary of Joins in terms of their length
Summary
4. Statistical Concepts for Predictive Modelling
Random sampling and the central limit theorem
Hypothesis testing
Null versus alternate hypothesis
Z-statistic and t-statistic
Confidence intervals, significance levels, and p-values
Different kinds of hypothesis test
A step-by-step guide to do a hypothesis test
An example of a hypothesis test
Chi-square tests
Correlation
Summary
5. Linear Regression with Python
Understanding the maths behind linear regression
Linear regression using simulated data
Fitting a linear regression model and checking its efficacy
Finding the optimum value of variable coefficients
Making sense of result parameters
p-values
F-statistics
Residual Standard Error
Implementing linear regression with Python
Linear regression using the statsmodel library
Multiple linear regression
Multi-collinearity
Variance Inflation Factor
Model validation
Training and testing data split
Summary of models
Linear regression with scikit-learn
Feature selection with scikit-learn
Handling other issues in linear regression
Handling categorical variables
Transforming a variable to fit non-linear relations
Handling outliers
Other considerations and assumptions for linear regression
Summary
6. Logistic Regression with Python
Linear regression versus logistic regression
Understanding the math behind logistic regression
Contingency tables
Conditional probability
Odds ratio
Moving on to logistic regression from linear regression
Estimation using the Maximum...
