Handbook of Regression Analysis

ISBN: 9781118532836

Samprit Chatterjee, Jeffrey S. Simonoff


About this book

A Comprehensive Account for Data Analysts of the Methods and Applications of Regression Analysis.

Written by two established experts in the field, the Handbook of Regression Analysis provides a practical, one-stop reference on regression analysis. The focus is on the tools that both practitioners and researchers use in real life. It is intended to be a comprehensive collection of the theory, methods, and applications of regression methods, but it has been deliberately written at an accessible level.

The handbook provides a quick and convenient reference or "refresher" on ideas and methods that are useful for the effective analysis of data and its resulting interpretations. Students can use the book as an introduction to and/or summary of key concepts in regression and related course work (including linear, binary logistic, multinomial logistic, count, and nonlinear regression models). Theory underlying the methodology is presented when it advances conceptual understanding and is always supplemented by hands-on examples.

References are supplied for readers wanting more detailed material on the topics discussed in the book. R code and data for all of the analyses described in the book are available via an author-maintained website.

"I enjoyed the presentation of the Handbook, and I would be happy to recommend this nice handy book as a reference to my students. The clarity of the writing and proper choices of examples allows the presentations of many statistical methods shine. The quality of the examples at the end of each chapter is a strength. They entail explanations of the resulting R outputs and successfully guide readers to interpret them." American Statistician



Topic: Mathematics
Subtopic: Applied Mathematics


PART ONE

The Multiple Linear Regression Model

CHAPTER ONE

Multiple Linear Regression

1.1 Introduction

1.2 Concepts and Background Material

1.2.1 The Linear Regression Model

1.2.2 Estimation Using Least Squares

1.2.3 Assumptions

1.3 Methodology

1.3.1 Interpreting Regression Coefficients

1.3.2 Measuring the Strength of the Regression Relationship

1.3.3 Hypothesis Tests and Confidence Intervals for β

1.3.4 Fitted Values and Predictions

1.3.5 Checking Assumptions Using Residual Plots

1.4 Example – Estimating Home Prices

1.5 Summary

1.1 Introduction

This is a book about regression modeling, but when we refer to regression models, what do we mean? The regression framework can be characterized in the following way:

1. We have one particular variable that we are interested in understanding or modeling, such as sales of a particular product, sale price of a home, or voting preference of a particular voter. This variable is called the target, response, or dependent variable, and is usually represented by y.

2. We have a set of p other variables that we think might be useful in predicting or modeling the target variable (the price of the product, the competitor’s price, and so on; or the lot size, number of bedrooms, number of bathrooms of the home, and so on; or the gender, age, income, party membership of the voter, and so on). These are called the predicting, or independent, variables, and are usually represented by x₁, x₂, etc.

Typically, a regression analysis is used for one (or more) of three purposes:

1. modeling the relationship between x and y;

2. prediction of the target variable (forecasting); and

3. testing of hypotheses.

In this chapter we introduce the basic multiple linear regression model, and discuss how this model can be used for these three purposes. Specifically, we discuss the interpretations of the estimates of different regression parameters, the assumptions underlying the model, measures of the strength of the relationship between the target and predictor variables, the construction of tests of hypotheses and intervals related to regression parameters, and the checking of assumptions using diagnostic plots.

1.2 Concepts and Background Material

1.2.1 THE LINEAR REGRESSION MODEL

The data consist of n sets of observations {x_1i, x_2i, …, x_pi, y_i}, which represent a random sample from a larger population. It is assumed that these observations satisfy a linear relationship,

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \varepsilon_i, \tag{1.1}$$

where the β coefficients are unknown parameters, and the ε_i are random error terms. By a linear model, it is meant that the model is linear in the parameters; a quadratic model,

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i,$$

paradoxically enough, is a linear model, since x and x² are just versions of x₁ and x₂.
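To see why a quadratic model still counts as linear, the following minimal sketch fits one by ordinary linear least squares, simply treating x and x² as two predictor columns. The data, the coefficient values in `beta_true`, and the noise level are hypothetical choices made for illustration; they are not taken from the book.

```python
import numpy as np

# Hypothetical illustration data: a noisy quadratic relationship.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
beta_true = np.array([1.0, -2.0, 0.5])  # assumed beta_0, beta_1, beta_2
y = (beta_true[0] + beta_true[1] * x + beta_true[2] * x**2
     + rng.normal(0.0, 0.3, x.size))

# Treat x and x^2 as the predictors x_1 and x_2. The model is nonlinear
# in x but linear in the beta coefficients, so linear least squares applies.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares on the design matrix.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficients:", beta_hat)
```

The only change relative to a straight-line fit is the extra column x² in the design matrix; the estimation step itself is unchanged.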

It is important to recognize that this, or any statistical model, is not viewed as a true representation of reality; rather, the goal is that the model be a useful representation of reality. A model can be used to explore the relationships between variables and make accurate forecasts based on those relationships even if it is not the “truth.” Further, any statistical model is only temporary, representing a provisional version of views about the random process being studied. Models can, and should, change, based on analysis using the current model, selection among several candidate models, the acquisition of new data, and so on. Further, it is often the case that there are several different models that are reasonable representations of reality. Having said this, we will sometimes refer to the “true” model, but this should be understood as referring to the underlying form of the currently hypothesized representation of the regression relationship.

The special case of (1.1) with p = 1 corresponds to the simple regression model, and is consistent with the representation in Figure 1.1. The solid line is the true regression line, the expected value of y given the value of x. The dotted lines are the random errors ε_i that account for the lack of a perfect association between the predictor and the target variables.

FIGURE 1.1 The simple linear regression model. The solid line corresponds to the true regression line, and the dotted lines correspond to the random errors ε_i.
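The setup in Figure 1.1 can be mimicked with a small simulation. This is only a sketch: the values of `beta0`, `beta1`, and `sigma`, the range of x, and the use of normally distributed errors are all assumptions made here for illustration, not values from the book.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed (hypothetical) population values.
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 30

x = rng.uniform(0.0, 10.0, size=n)

# True regression line: the expected value of y given x.
true_line = beta0 + beta1 * x

# Random errors: the vertical gaps between each point and the true line.
# Normal errors are an assumption of this sketch.
eps = rng.normal(0.0, sigma, size=n)

# Observed responses under the simple regression model (p = 1).
y = true_line + eps

for xi, yi, ei in list(zip(x, y, eps))[:3]:
    print(f"x = {xi:5.2f}   y = {yi:5.2f}   error = {ei:+.2f}")
```

In a real analysis only x and y are observed; the true line and the errors are unknown, which is what motivates estimation.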

1.2.2 ESTIMATION USING LEAST SQUARES

The true regression function represents the expected relationship between the target and the predictor variables, which is unknown. A primary goal of a regression analysis is to estimate this relationship, or equivalently, to estimate the unknown parameters β. This requires a data-based rule, or criterion, that will give a reasonable estimate. The standard approach is least squares regression, where the estimates are chosen to minimize

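$$\sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}) \right]^2. \tag{1.2}$$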
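The criterion in (1.2) can be evaluated for any candidate set of coefficients. The sketch below defines a hypothetical helper, `sum_of_squares`, and checks numerically that the least squares estimates give a criterion value no larger than that of nearby candidates. The simulated data and coefficient values are assumptions for illustration only.

```python
import numpy as np

def sum_of_squares(beta, X, y):
    """Least squares criterion for candidate coefficients `beta`.

    X is assumed to carry a leading column of ones, so beta[0] is the intercept.
    """
    deviations = y - X @ beta
    return float(deviations @ deviations)

# Hypothetical data with p = 2 predictors.
rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Least squares estimates and the criterion value they attain.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_ols = sum_of_squares(beta_ols, X, y)

# Criterion values at a few randomly perturbed candidates.
candidate_ss = [
    sum_of_squares(beta_ols + rng.normal(scale=0.1, size=3), X, y)
    for _ in range(5)
]

print("criterion at least squares estimates:", ss_ols)
print("smallest criterion among perturbed candidates:", min(candidate_ss))
```

Perturbing the estimates in any direction should not reduce the criterion, which is what it means for them to be the minimizer.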

Figure 1.2 gives a graphical representation of least squares that is based on Figure 1.1. Now the true regression line is represented by the gray line, and the solid black line is the estimated regression line, designed to estimate the (unknown) gray line as closely as possible. For any choice of estimated parameters β̂, the estimated expected response value given the observed predictor values equals

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_p x_{pi},$$

and is called the fitted value. The difference between the observed value y_i and the fitted value ŷ_i is called the residual, the set of which are represented by the lengths of the dotted lines in Figure 1.2. The least squares regression line minimizes the sum of squares of the lengths of the dotted lines; that is, the ordinary least squares (OLS) estimates minimize the sum of squares of the residuals.

FIGURE 1.2 Least squares estimation for the simple linear regression model, using the same data as in Figure 1.1. The gray line corresponds to the true regression line, the solid black line corresponds to the fitted least squares line (designed to estimate the gray line), and the lengths of the dotted lines correspond to the residuals. The sum of squared values of the lengths of the dotted lines is minimized by the solid black line.
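Fitted values and residuals are straightforward to compute once the estimates are in hand. The following is a minimal sketch for the simple regression case; the simulated data and the "true" values 3.0 and 1.5 are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical data for a simple regression (p = 1).
rng = np.random.default_rng(3)
n = 25
x = rng.uniform(0.0, 10.0, size=n)
y = 3.0 + 1.5 * x + rng.normal(0.0, 2.0, size=n)

# Design matrix with an intercept column, then the OLS estimates.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted value: estimated expected response at each observed x.
fitted = X @ beta_hat

# Residual: observed value minus fitted value.
residuals = y - fitted

# The quantity that the OLS estimates minimize.
rss = float(np.sum(residuals**2))

print("estimates:", beta_hat)
print("residual sum of squares:", rss)
print("sum of residuals:", residuals.sum())
```

When the model includes an intercept, the least squares residuals sum to zero and are uncorrelated with the predictor, up to floating-point rounding; the last printed value is a quick check of the first property.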

In higher dimensions (p > 1) the true and estimated regression relationships correspond to planes (p = 2) or hyperplanes (p ≥ 3), but otherwise the principles are the same. Figure 1.3 illustrates the case with two predictors. The length of each vertical line corresponds to a residual (solid lines refer to positive residuals while dashed lines refer to negative residuals), and the (least squares) plane that goes through the observations is chosen to minimize the...
