Mastering Predictive Analytics with Python
eBook - ePub

Joseph Babcock

  1. 334 pages
  2. English
  3. ePUB (mobile friendly)

About This Book

Exploit the power of data in your business by building advanced predictive modeling applications with Python

  • Master open source Python tools to build sophisticated predictive models
  • Learn to identify the right machine learning algorithm for your problem with this forward-thinking guide
  • Grasp the major methods of predictive modeling and move beyond the basics to a deeper level of understanding

Who This Book Is For

This book is designed for business analysts, BI analysts, data scientists, or junior-level data analysts who are ready to move from a conceptual understanding of advanced analytics to expertise in designing and building advanced analytics solutions using Python. You're expected to have basic development experience with Python.

What You Will Learn

  • Gain an insight into components and design decisions for an analytical application
  • Master the use of Python notebooks for exploratory data analysis and rapid prototyping
  • Get to grips with applying regression, classification, clustering, and deep learning algorithms
  • Discover the advanced methods to analyze structured and unstructured data
  • Find out how to deploy a machine learning model in a production environment
  • Visualize the performance of models and the insights they produce
  • Scale your solutions as your data grows using Python
  • Ensure the robustness of your analytic applications by mastering the best practices of predictive analysis

In Detail

The volume, diversity, and speed of available data have never been greater. Powerful machine learning methods can unlock the value in this information by finding complex relationships and unanticipated trends. Using the Python programming language, analysts can use these sophisticated methods to build scalable analytic applications that deliver insights of tremendous value to their organizations.

In Mastering Predictive Analytics with Python, you will learn the process of turning raw data into powerful insights. Through case studies and code examples using popular open-source Python libraries, this book illustrates the complete development process for analytic applications and how to quickly apply these methods to your own data to create robust and scalable prediction services.

Covering a wide range of algorithms for classification, regression, and clustering, as well as cutting-edge techniques such as deep learning, this book illustrates not only how these methods work, but also how to implement them in practice. You will learn to choose the right approach for your problem and how to develop engaging visualizations to bring the insights of predictive modeling to life.

Style and approach

This book emphasizes explaining methods through example data and code, showing you templates that you can quickly adapt to your own use cases. It focuses on both the practical application of sophisticated algorithms and the intuitive understanding necessary to apply the correct method to the problem at hand. Through visual examples, it also demonstrates how to convey insights through clear charts and reporting.

Information

Year: 2016
ISBN: 9781785882715
Edition: 1

Table of Contents

Mastering Predictive Analytics with Python
Credits
About the Author
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. From Data to Decisions – Getting Started with Analytic Applications
Designing an advanced analytic solution
Data layer: warehouses, lakes, and streams
Modeling layer
Deployment layer
Reporting layer
Case study: sentiment analysis of social media feeds
Data input and transformation
Sanity checking
Model development
Scoring
Visualization and reporting
Case study: targeted e-mail campaigns
Data input and transformation
Sanity checking
Model development
Scoring
Visualization and reporting
Summary
2. Exploratory Data Analysis and Visualization in Python
Exploring categorical and numerical data in IPython
Installing IPython notebook
The notebook interface
Loading and inspecting data
Basic manipulations – grouping, filtering, mapping, and pivoting
Charting with Matplotlib
Time series analysis
Cleaning and converting
Time series diagnostics
Joining signals and correlation
Working with geospatial data
Loading geospatial data
Working in the cloud
Introduction to PySpark
Creating the SparkContext
Creating an RDD
Creating a Spark DataFrame
Summary
3. Finding Patterns in the Noise – Clustering and Unsupervised Learning
Similarity and distance metrics
Numerical distance metrics
Correlation similarity metrics and time series
Similarity metrics for categorical data
K-means clustering
Affinity propagation – automatically choosing cluster numbers
k-medoids
Agglomerative clustering
Where agglomerative clustering fails
Streaming clustering in Spark
Summary
4. Connecting the Dots with Models – Regression Methods
Linear regression
Data preparation
Model fitting and evaluation
Statistical significance of regression outputs
Generalized estimating equations
Mixed effects models
Time series data
Generalized linear models
Applying regularization to linear models
Tree methods
Decision trees
Random forest
Scaling out with PySpark – predicting year of song release
Summary
5. Putting Data in its Place – Classification Methods and Analysis
Logistic regression
Multiclass logistic classifiers: multinomial regression
Formatting a dataset for classification problems
Learning pointwise updates with stochastic gradient descent
Jointly optimizing all parameters with second-order methods
Fitting the model
Evaluating classification models
Strategies for improving classification models
Separating nonlinear boundaries with support vector machines
Fitting an SVM to the census data
Boosting – combining small models to improve accuracy
Gradient boosted decision trees
Comparing classification methods
Case study: fitting classifier models in PySpark
Summary
6. Words and Pixels – Working with Unstructured Data
Working with textual data
Cleaning textual data
Extracting features from textual data
Using dimensionality reduction to simplify datasets
Principal component analysis
Latent Dirichlet Allocation
Using dimensionality reduction in predictive modeling
Images
Cleaning image data
Thresholding images to highlight objects
Dimensionality reduction for image analysis
Case Study: Training a Recommender System in PySpark
Summary
7. Learning from the Bottom Up – Deep Networks and Unsupervised Features
Learning patterns with neural networks
A network of one – the perceptron
Combining perceptrons – a single-layer neural network
Parameter fitting with back-propagation
Discriminative versus generative models
Vanishing gradients and explaining away
Pretraining belief networks
Using dropout to regularize networks
Convolutional networks and rectified units
Compressing Data with autoencoder networks
Optimizing the learning rate
The TensorFlow library and digit recognition
The MNIST data
Constructing the network
Summary
8. Sharing Models with Prediction Services
The architecture of a prediction service
Clients and making requests
The GET requests
The POST request
The HEAD request
The PUT request
The DELETE request
Server – the web traffic controller
Application – the engine of the predictive services
Persisting information with database systems
Case study – logistic regression service
Setting up the database
The web server
The web application
The flow of a prediction service – training a model
On-demand and bulk prediction
Summary
9. Reporting and Testing – Iterating on Analytic Systems
Checking the health of models with diagnostics
Evaluating changes in model performance
Changes in feature importance
Changes in unsupervised model performance
Iterating on models through A/B testing
Experimental allocation – assigning customers to experiments
Deciding a sample size
Multiple hypothesis testing
Guidelines for communication
Translate terms to business values
Visualizing results
Case Study: building a reporting service
The report server
The report application
The visualization layer
Summary
Index

Mastering Predictive Analytics with Python

Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2016
Production reference: 1290816
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78588-271-5
www.packtpub.com

Credits

Author
Joseph Babcock
Reviewer
Dipanjan Deb
Commissioning Editor
Kartikey Pandey
Acquisition Editor
Aaron Lazar
Content Development Editor
Sumeet Sawant
Technical Editor
Utkarsha S. Kadam
Copy Editor
Vikrant Phadke
Project Coordinator
Shweta H Birwatkar
Proofreader
Safis Editing
Indexer
Monica Ajmera Mehta
Graphics
Kirk D'Pinha
Production Coordinator
Nilesh Mohite
Cover Work
Nilesh Mohite

About the Author

Joseph Babcock has spent almost a decade exploring complex datasets and combining predictive modeling with visualization to understand correlations and forecast anticipated outcomes. He received a PhD from the Solomon H. Snyder Department of Neuroscience at The Johns Hopkins University School of Medicine, where he used machine learning to predict adverse cardiac side effects of drugs. Outside the academy, he has tackled big data challenges in...
