Mastering Predictive Analytics with R
eBook - ePub

Rui Miguel Forte

  1. 414 pages
  2. English

Information

Year
2015
ISBN
9781783982806
Edition
1

Table of Contents

Mastering Predictive Analytics with R
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Gearing Up for Predictive Modeling
Models
Learning from data
The core components of a model
Our first model: k-nearest neighbors
Types of models
Supervised, unsupervised, semi-supervised, and reinforcement learning models
Parametric and nonparametric models
Regression and classification models
Real-time and batch machine learning models
The process of predictive modeling
Defining the model's objective
Collecting the data
Picking a model
Preprocessing the data
Exploratory data analysis
Feature transformations
Encoding categorical features
Missing data
Outliers
Removing problematic features
Feature engineering and dimensionality reduction
Training and assessing the model
Repeating with different models and final model selection
Deploying the model
Performance metrics
Assessing regression models
Assessing classification models
Assessing binary classification models
Summary
2. Linear Regression
Introduction to linear regression
Assumptions of linear regression
Simple linear regression
Estimating the regression coefficients
Multiple linear regression
Predicting CPU performance
Predicting the price of used cars
Assessing linear regression models
Residual analysis
Significance tests for linear regression
Performance metrics for linear regression
Comparing different regression models
Test set performance
Problems with linear regression
Multicollinearity
Outliers
Feature selection
Regularization
Ridge regression
Least absolute shrinkage and selection operator (lasso)
Implementing regularization in R
Summary
3. Logistic Regression
Classifying with linear regression
Introduction to logistic regression
Generalized linear models
Interpreting coefficients in logistic regression
Assumptions of logistic regression
Maximum likelihood estimation
Predicting heart disease
Assessing logistic regression models
Model deviance
Test set performance
Regularization with the lasso
Classification metrics
Extensions of the binary logistic classifier
Multinomial logistic regression
Predicting glass type
Ordinal logistic regression
Predicting wine quality
Summary
4. Neural Networks
The biological neuron
The artificial neuron
Stochastic gradient descent
Gradient descent and local minima
The perceptron algorithm
Linear separation
The logistic neuron
Multilayer perceptron networks
Training multilayer perceptron networks
Predicting the energy efficiency of buildings
Evaluating multilayer perceptrons for regression
Predicting glass type revisited
Predicting handwritten digits
Receiver operating characteristic curves
Summary
5. Support Vector Machines
Maximal margin classification
Support vector classification
Inner products
Kernels and support vector machines
Predicting chemical biodegradation
Cross-validation
Predicting credit scores
Multiclass classification with support vector machines
Summary
6. Tree-based Methods
The intuition for tree models
Algorithms for training decision trees
Classification and regression trees
CART regression trees
Tree pruning
Missing data
Regression model trees
CART classification trees
C5.0
Predicting class membership on synthetic 2D data
Predicting the authenticity of banknotes
Predicting complex skill learning
Tuning model parameters in CART trees
Variable importance in tree models
Regression model trees in action
Summary
7. Ensemble Methods
Bagging
Margins and out-of-bag observations
Predicting complex skill learning with bagging
Predicting heart disease with bagging
Limitations of bagging
Boosting
AdaBoost
Predicting atmospheric gamma ray radiation
Predicting complex skill learning with boosting
Limitations of boosting
Random forests
The importance of variables in random forests
Summary
8. Probabilistic Graphical Models
A little graph theory
Bayes' Theorem
Conditional independence
Bayesian networks
The Naïve Bayes classifier
Predicting the sentiment of movie reviews
Hidden Markov models
Predicting promoter gene sequences
Predicting letter patterns in English words
Summary
9. Time Series Analysis
Fundamental concepts of time series
Time series summary functions
Some fundamental time series
White noise
Fitting a white noise time series
Random walk
Fitting a random walk
Stationarity
Stationary time series models
Moving average models
Autoregressive models
Autoregressive moving average models
Non-stationary time series models
Autoregressive integrated moving average models
Autoregressive conditional heteroscedasticity models
Generalized autoregressive heteroscedasticity models
Predicting intense earthquakes
Predicting lynx trappings
Predicting foreign exchange rates
Other time series models
Summary
10. Topic Modeling
An overview of topic modeling
Latent Dirichlet Allocation
The Dirichlet distribution
The generative process
Fitting an LDA model
Modeling the topics of online news stories
Model stability
Finding the number of topics
Topic distributions
Word distributions
LDA extensions
Summary
11. Recommendation Systems
Rating matrix
Measuring user similarity
Collaborative filtering
User-based collaborative filtering
Item-based collaborative filtering
Singular value decomposition
R and Big Data
Predicting recommendations for movies and jokes
Loading and preprocessing the data
Exploring the data
Evaluating binary top-N recommendations
Evaluating non-binary top-N recommendations
Evaluating individual predictions
Other approaches to recommendation systems
Summary
Index

Mastering Predictive Analytics with R

Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: June 2015
Production reference: 1100615
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-280-6
www.packtpub.com

Credits

Author
Rui Miguel Forte
Reviewers
Ajay Dhamija
Prasad Kothari
Dawit Gezahegn Tadesse
Commissioning Editor
Kartikey Pandey
Acquisition Editor
Subho Gupta
Content Development Editor
Govindan Kurumangattu
Technical Editor
Edwin Moses
Copy Editors
Stuti Srivastava
Aditya Nair
Vedangi Narvekar
Project Coordinator
Shipra Chawhan
Proofreaders
Stephen Copestake
Safis Editing
Indexer
Priya Sane
Graphics
Sheetal Aute
Disha Haria
Jason Monteiro
Abhinash Sahu
Production Coordinator
Shantanu Zagade
Cover Work
Shantanu Zagade

About the Author

Rui Miguel Forte is currently the chief data scientist at Workable. He was born and raised in Greece and studied in the UK. He is an experienced data scientist who has over 10 years of work experience in a diverse array of industries spanning mobile marketing, health informatics, education technology, and human resources technology. His projects include t...
