Mastering Predictive Analytics with R - Second Edition
eBook - ePub

James D. Miller, Rui Miguel Forte

448 pages, English

About This Book

Master the craft of predictive modeling in R by developing strategy, intuition, and a solid foundation in essential concepts.

• Grasp the major methods of predictive modeling and move beyond black-box thinking to a deeper level of understanding
• Leverage the flexibility and modularity of R to experiment with a range of different techniques and data types
• Packed with practical advice and tips explaining important concepts and best practices to help you understand quickly and easily

Who This Book Is For

Budding data scientists, predictive modelers, and quantitative analysts with only basic exposure to R and statistics will find this book useful, and experienced data scientists wishing to attain master-level status will also find it extremely valuable. This book assumes familiarity with the fundamentals of R, such as the main data types, simple functions, and how to move data around. No prior experience with machine learning or predictive modeling is required, although some advanced topics require more than novice exposure.

What You Will Learn

• Master the steps involved in the predictive modeling process
• Grow your expertise in using R and its diverse range of packages
• Learn how to classify predictive models and distinguish which models are suitable for a particular problem
• Understand the steps for tidying data and improving performance metrics
• Recognize the assumptions, strengths, and weaknesses of a predictive model
• Understand how and why each predictive model works in R
• Select appropriate metrics to assess the performance of different types of predictive models
• Explore word embedding and recurrent neural networks in R
• Train models in R that can work on very large datasets

In Detail

R offers a free and open source environment that is perfect for both learning and deploying predictive modeling solutions. With its constantly growing community and plethora of packages, R offers the functionality to deal with a truly vast array of problems.

The book begins with a dedicated chapter on the language of models and the predictive modeling process. You will then learn about learning curves and the process of tidying data. Each subsequent chapter tackles a particular type of model, such as neural networks, and focuses on three important questions: how the model works, how to use R to train it, and how to measure and assess its performance using real-world datasets. The book also shows you how to train models that can handle very large datasets. Finally, you will tackle the important topic of deep learning by implementing applications on word embedding and recurrent neural networks.

By the end of this book, you will have explored and tested the most popular modeling techniques in use on real-world datasets and mastered a diverse range of techniques in predictive analytics using R.

Style and Approach

This book takes a step-by-step approach to explaining the intermediate to advanced concepts in predictive analytics. Every concept is explained in depth and supplemented with practical examples applicable in a real-world setting.

Information

Year: 2017
ISBN: 9781787124356
Edition: 2

Table of Contents

Mastering Predictive Analytics with R Second Edition
Credits
About the Authors
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Gearing Up for Predictive Modeling
Models
Learning from data
The core components of a model
Our first model – k-nearest neighbors
Types of model
Supervised, unsupervised, semi-supervised, and reinforcement learning models
Parametric and nonparametric models
Regression and classification models
Real-time and batch machine learning models
The process of predictive modeling
Defining the model's objective
Collecting the data
Picking a model
Pre-processing the data
Exploratory data analysis
Feature transformations
Encoding categorical features
Missing data
Outliers
Removing problematic features
Feature engineering and dimensionality reduction
Training and assessing the model
Repeating with different models and final model selection
Deploying the model
Summary
2. Tidying Data and Measuring Performance
Getting started
Tidying data
Categorizing data quality
The first step
The next step
The final step
Performance metrics
Assessing regression models
Assessing classification models
Assessing binary classification models
Cross-validation
Learning curves
Plot and ping
Summary
3. Linear Regression
Introduction to linear regression
Assumptions of linear regression
Simple linear regression
Estimating the regression coefficients
Multiple linear regression
Predicting CPU performance
Predicting the price of used cars
Assessing linear regression models
Residual analysis
Significance tests for linear regression
Performance metrics for linear regression
Comparing different regression models
Test set performance
Problems with linear regression
Multicollinearity
Outliers
Feature selection
Regularization
Ridge regression
Least absolute shrinkage and selection operator (lasso)
Implementing regularization in R
Polynomial regression
Summary
4. Generalized Linear Models
Classifying with linear regression
Introduction to logistic regression
Generalized linear models
Interpreting coefficients in logistic regression
Assumptions of logistic regression
Maximum likelihood estimation
Predicting heart disease
Assessing logistic regression models
Model deviance
Test set performance
Regularization with the lasso
Classification metrics
Extensions of the binary logistic classifier
Multinomial logistic regression
Predicting glass type
Ordinal logistic regression
Predicting wine quality
Poisson regression
Negative Binomial regression
Summary
5. Neural Networks
The biological neuron
The artificial neuron
Stochastic gradient descent
Gradient descent and local minima
The perceptron algorithm
Linear separation
The logistic neuron
Multilayer perceptron networks
Training multilayer perceptron networks
The back propagation algorithm
Predicting the energy efficiency of buildings
Evaluating multilayer perceptrons for regression
Predicting glass type revisited
Predicting handwritten digits
Receiver operating characteristic curves
Radial basis function networks
Summary
6. Support Vector Machines
Maximal margin classification
Support vector classification
Inner products
Kernels and support vector machines
Predicting chemical biodegradation
Predicting credit scores
Multiclass classification with support vector machines
Summary
7. Tree-Based Methods
The intuition for tree models
Algorithms for training decision trees
Classification and regression trees
CART regression trees
Tree pruning
Missing data
Regression model trees
CART classification trees
C5.0
Predicting class membership on synthetic 2D data
Predicting the authenticity of banknotes
Predicting complex skill learning
Tuning model parameters in CART trees
Variable importance in tree models
Regression model trees in action
Improvements to the M5 model
Summary
8. Dimensionality Reduction
Defining DR
Correlated data analyses
Scatterplots
Causation
The degree of correlation
Reporting on correlation
Principal component analysis
Using R to understand PCA
Independent component analysis
Defining independence
ICA pre-processing
Factor analysis
Explore and confirm
Using R for factor analysis
The output
NNMF
Summary
9. Ensemble Methods
Bagging
Margins and out-of-bag observations
Predicting complex skill learning with bagging
Predicting heart disease with bagging
Limitations of bagging
Boosting
AdaBoost
AdaBoost for binary classification
Predicting atmospheric gamma ray radiation
Predicting complex skill learning with boosting
Limitations of boosting
Random forests
The importance of variables in random forests
XGBoost
Summary
10. Probabilistic Graphical Models
A little graph theory
Bayes' theorem
Conditional independence
Bayesian networks
The Naïve Bayes classifier
Predicting the sentiment of movie reviews
Predicting promoter gene sequences
Predicting letter patterns in English words
Summary
11. Topic Modeling
An overview of topic modeling
Latent Dirichlet Allocation
The Dirichlet distribution
The generative process
Fitting an LDA model
Modeling the topics of online news stories
Model stability
Finding the number of topics
Topic distributions
Word distributions
LDA extensions
Modeling tweet topics
Word clouding
Summary
12. Recommendation Systems
Rating matrix
Measuring user similarity
Collaborative filtering
User-based collaborative filtering
Item-based collaborative filtering
Singular value decomposition
Predicting recommendations for movies and jokes
Loading and pre-processing the data
Exploring the data
Evaluating binary top-N recommendations
Evaluating non-binary top-N recommendations
Evaluating individual predictions
Other approaches to recommendation systems
Summary
13. Scaling Up
Starting the project
Data definition
Experience
Data of scale – big data
Using Excel to gauge your data
Characteristics of big data
Volume
Varieties
Sources and spans
Structure
Statistical noise
Training models at scale
Pain by phase
Specific challenges
Heterogeneity
Scale
Location
Timeliness
Privacy
Collaborations
Reproducibility
A path forward
Opportunities
Bigger data, bigger hardware
Breaking up
Sampling
Aggregation
Dimensional reduction
Alternatives
Chunking
Alternative language integrations
Summary
14. Deep Learning
Machine learning or deep learning
What is deep learning?
An alternative to manual instruction
Growing importance
Deeper data?
Deep learning for IoT
Use cases
Word embedding
Word prediction
Word vectors
Numerical representations of contextual similarities
Netflix learns
Implementations
Deep learning architectures
Artificial neural networks
Recurrent neural networks
Summary
Index

Mastering Predictive Analytics with R Second Edition

Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this...