Hands-On Machine Learning with R

Brad Boehmke, Brandon M. Greenwell


About this book

Hands-on Machine Learning with R provides a practical, applied approach to learning, and developing intuition for, today's most popular machine learning methods. This book serves as a practitioner's guide to the machine learning process and is meant to help the reader learn to apply the machine learning stack within R, using packages such as glmnet, h2o, ranger, xgboost, keras, and others to effectively model and gain insight from data. The book favors a hands-on approach, providing an intuitive understanding of machine learning concepts through concrete examples and just a little bit of theory.

Throughout this book, the reader will work through the entire machine learning process, including feature engineering, resampling, hyperparameter tuning, model evaluation, and interpretation. The reader will encounter powerful algorithms such as regularized regression, random forests, gradient boosting machines, deep learning, generalized low rank models, and more. By favoring a hands-on approach and using real-world data, the reader will gain an intuitive understanding of the architectures and engines that drive these algorithms and packages, understand when and how to tune the various hyperparameters, and be able to interpret model results. By the end of this book, the reader should have a firm grasp of R's machine learning stack and be able to implement a systematic approach for producing high-quality modeling results.

Features:

· Offers a practical and applied introduction to the most popular machine learning methods.

· Topics covered include feature engineering, resampling, deep learning and more.

· Uses a hands-on approach and real-world data.


Information

Year: 2019
ISBN: 9781000730432

Part II

Supervised Learning

4

Linear Regression

Linear regression, a staple of classical statistical modeling, is one of the simplest algorithms for doing supervised learning. Though it may seem somewhat dull compared to some of the more modern statistical learning approaches described in later chapters, linear regression is still a useful and widely applied statistical learning method. Moreover, it serves as a good starting point for more advanced approaches; as we will see in later chapters, many of the more sophisticated statistical learning approaches can be seen as generalizations to or extensions of ordinary linear regression. Consequently, it is important to have a good understanding of linear regression before studying more complex learning methods. This chapter introduces linear regression with an emphasis on prediction, rather than inference. An excellent and comprehensive overview of linear regression is provided in Kutner et al. (2005). See Faraway (2016b) for a discussion of linear regression in R (the book’s website also provides Python scripts).

4.1 Prerequisites

This chapter leverages the following packages:
# Helper packages
library(dplyr) # for data manipulation
library(ggplot2) # for awesome graphics
# Modeling packages
library(caret) # for cross-validation, etc.
# Model interpretability packages
library(vip) # variable importance
We’ll also continue working with the ames_train data set created in Section 2.7.
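If you're starting in this chapter, ames_train is constructed back in Section 2.7. The following is a minimal sketch for recreating it, assuming the AmesHousing and rsample packages and the 70/30 stratified split used there; the exact seed and proportions should be taken from Section 2.7 itself.
# Recreate the training data (sketch; see Section 2.7 for the authoritative version)
library(rsample) # for data splitting
ames <- AmesHousing::make_ames() # full Ames housing data
set.seed(123) # for reproducibility (assumed seed)
split <- initial_split(ames, prop = 0.7, strata = "Sale_Price")
ames_train <- training(split)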

4.2 Simple linear regression

Pearson’s correlation coefficient is often used to quantify the strength of the linear association between two continuous variables. In this section, we seek to fully characterize that linear relationship. Simple linear regression (SLR) assumes that the statistical relationship between two continuous variables (say X and Y) is (at least approximately) linear:
Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \quad \text{for } i = 1, 2, \dots, n,
(4.1)
where Y_i represents the i-th response value, X_i represents the i-th feature value, \beta_0 and \beta_1 are fixed, but unknown, constants (commonly referred to as coefficients or parameters) that represent the intercept and slope of the regression line, respectively, and \epsilon_i represents noise or random error. In this chapter, we'll assume that the errors are normally distributed with mean zero and constant variance \sigma^2, denoted \epsilon_i \overset{iid}{\sim} N(0, \sigma^2). Since the random errors are centered around zero (i.e., E(\epsilon_i) = 0), linear regression is really a problem of estimating a conditional mean:
E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i.
(4.2)
For brevity, we often drop the conditional piece and write E(Y \mid X) = E(Y). Consequently, the interpretation of the coefficients is in terms of the average, or mean, response. For example, the intercept \beta_0 represents the average response value when X = 0 (it is often not meaningful or of interest and is sometimes referred to as a bias term). The slope \beta_1 represents the increase in the average response per one-unit increase in X (i.e., it is a rate of change).
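To make Equations (4.1) and (4.2) concrete, here is a small simulation of our own (not from the book): we generate data from a known SLR model with hypothetical parameter values and overlay the true conditional mean line.
# Simulate data from Y = 2 + 3X + noise (hypothetical parameter values)
set.seed(42)
n <- 100
x <- runif(n, 0, 10) # feature values X_i
b0 <- 2; b1 <- 3; sigma <- 2 # true (in practice unknown) parameters
y <- b0 + b1 * x + rnorm(n, mean = 0, sd = sigma) # Y_i = b0 + b1*X_i + e_i
ggplot(data.frame(x, y), aes(x, y)) +
  geom_point() +
  geom_abline(intercept = b0, slope = b1) # the true mean line E(Y|X)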

4.2.1 Estimation

Ideally, we want estimates of ÎČ0 and ÎČ1 that give us the “best fitting” line. But what is meant by “best fitting”? The most common approach is to use the method of least squares (LS) estimation; this form of linear regression is often referred to as ordinary least squares (OLS) regression. There are multiple ways to measure “best fitting”, but the LS criterion finds the “best fitting” line by minimizing the residual sum of squares (RSS):
\text{RSS}(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ Y_i - (\beta_0 + \beta_1 X_i) \right]^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2.
(4.3)
The LS estimates of \beta_0 and \beta_1 are denoted \hat{\beta}_0 and \hat{\beta}_1, respectively. Once obtained, we can generate predicted values, say at X = X_{new}, using the estimated regression equation:
\hat{Y}_{new} = \hat{\beta}_0 + \hat{\beta}_1 X_{new},
(4.4)
where \hat{Y}_{new} = \hat{E}(Y_{new} \mid X = X_{new}) is the estimated mean response at X = X_{new}.
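As a quick check of the LS machinery (our illustration, not the book's), Equation (4.3) can be minimized numerically and the result compared against the closed-form solution that lm() computes; we use the built-in mtcars data so the sketch is self-contained.
# Numerically minimize the RSS in Equation (4.3) and compare to lm()
rss <- function(beta, x, y) sum((y - beta[1] - beta[2] * x)^2)
x <- mtcars$wt # predictor: car weight
y <- mtcars$mpg # response: fuel economy
fit_optim <- optim(c(0, 0), rss, x = x, y = y) # general-purpose optimizer
fit_optim$par # approximately equal to...
coef(lm(y ~ x)) # ...the OLS estimates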
With the Ames housing data, suppose we wanted to model a linear relationship between the total above-ground living space of a home (Gr_Liv_Area) and sale price (Sale_Price). To fit an OLS regression model in R, we can use the lm() function:
model1 <- lm(Sale_Price ~ Gr_Liv_Area, data = ames_train)
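Though not shown in this excerpt, the estimated coefficients and predictions are available through standard R generics:
coef(model1) # estimated intercept and slope
sigma(model1) # estimated residual standard deviation
# estimated mean sale price for a home with 2,000 sq ft of living space (hypothetical query)
predict(model1, newdata = data.frame(Gr_Liv_Area = 2000))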
The fitted model (model1) is displayed in the left plot in Figure 4.1 where the points represent the values of Sale_Price in the training data. In the right plot of Figure 4.1, the vertical lines represent the individual errors, called residuals, associated with each observation. The OLS criterion in Equation (4.3) identifies the “best fitting” line that minimizes the sum of squares of these residuals.
FIGURE 4.1: The least squares fit from regressing sale price on living space for the Ames housing data. Left: Fitted regressi...
