An Introduction to Bootstrap Methods with Applications to R
eBook - ePub

About this book

A comprehensive introduction to bootstrap methods in the R programming environment

Bootstrap methods provide a powerful approach to statistical data analysis, as they have more general applications than standard parametric methods. An Introduction to Bootstrap Methods with Applications to R explores the practicality of this approach and successfully utilizes R to illustrate applications for the bootstrap and other resampling methods. This book provides a modern introduction to bootstrap methods for readers who do not have an extensive background in advanced mathematics. Emphasis throughout is on the use of bootstrap methods as an exploratory tool, including their value in variable selection and other modeling environments.

The authors begin with a description of bootstrap methods and their relationship to other resampling methods, along with an overview of the wide variety of applications of the approach. Subsequent chapters offer coverage of improved confidence set estimation, estimation of error rates in discriminant analysis, and applications to a wide variety of hypothesis testing and estimation problems in fields including pharmaceuticals, genomics, and economics. To inform readers of the limitations of the method, the book also exhibits counterexamples to the consistency of bootstrap methods.

An introduction to R programming provides the needed preparation to work with the numerous exercises and applications presented throughout the book. A related website houses the book's R subroutines, and an extensive listing of references provides resources for further study.

Discussing the topic at a remarkably practical and accessible level, An Introduction to Bootstrap Methods with Applications to R is an excellent book for introductory courses on bootstrap and resampling methods at the upper-undergraduate and graduate levels. It also serves as an insightful reference for practitioners working with data in engineering, medicine, and the social sciences who would like to acquire a basic understanding of bootstrap methods.

Information

Publisher: Wiley
Year: 2014
Print ISBN: 9780470467046
Edition: 1
eBook ISBN: 9781118625415

1 INTRODUCTION

1.1 HISTORICAL BACKGROUND

The “bootstrap” is one of a number of techniques, now part of the broad umbrella of nonparametric statistics, that are commonly called resampling methods. Some of the techniques are far older than the bootstrap. Permutation methods go back to Fisher (1935) and Pitman (1937, 1938), and the jackknife started with Quenouille (1949). Bootstrapping was made practical through the use of the Monte Carlo approximation, which itself goes back to the beginning of computers in the early 1940s.
However, 1979 was a critical year for the bootstrap because that is when Brad Efron’s paper in the Annals of Statistics was published (Efron, 1979). Efron defined a resampling procedure that he named the bootstrap. He constructed it as a simple approximation to the jackknife (an earlier resampling method that was developed by John Tukey), and his original motivation was to derive properties of the bootstrap to better understand the jackknife. However, in many situations, the bootstrap is as good as or better than the jackknife as a resampling procedure. The jackknife is primarily useful for small samples; it becomes computationally inefficient for larger samples, although it has become more feasible as computer speed has increased. A clear description of the jackknife and its connection to the bootstrap can be found in the SIAM monograph Efron (1982). A description of the jackknife is also given in Section 1.2.1.
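To make the jackknife concrete, here is a minimal sketch of the standard jackknife standard-error formula, $\widehat{SE}_{jack} = \sqrt{\tfrac{n-1}{n}\sum_i (\hat\theta_{(i)} - \bar\theta_{(\cdot)})^2}$, where $\hat\theta_{(i)}$ is the estimate with observation $i$ left out. The function name `jackknife_se` is hypothetical and the notation is not necessarily the book's; Section 1.2.1 gives the authors' own description.

```python
import math
import statistics


def jackknife_se(data, estimator):
    """Jackknife standard error of `estimator` applied to the list `data`.

    Each leave-one-out replicate drops a single observation; the SE is
    sqrt((n - 1) / n * sum((theta_i - theta_bar) ** 2)).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Leave-one-out replicates theta_(i), i = 1..n
    replicates = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(replicates) / n
    return math.sqrt((n - 1) / n * sum((t - theta_bar) ** 2 for t in replicates))


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]
    # For the sample mean, the jackknife SE equals s / sqrt(n) exactly
    print(jackknife_se(sample, statistics.mean))
    print(statistics.stdev(sample) / math.sqrt(len(sample)))
    # For a less smooth statistic such as the median, no closed form is needed
    print(jackknife_se(sample, statistics.median))
```

The check against $s/\sqrt{n}$ works because, for the mean, each leave-one-out deviation is $(\bar x - x_i)/(n-1)$, which makes the formula collapse to $s^2/n$.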
Although permutation tests were known in the 1930s, an impediment to their use was the large number (i.e., $n!$) of distinct permutations available for samples of size $n$. Since ordinary bootstrapping involves sampling with replacement $n$ times for a sample of size $n$, there are $n^n$ possible distinct ordered bootstrap samples (though some are equivalent under the exchangeability assumption because they are permutations of each other). So, complete enumeration of all the bootstrap samples becomes infeasible except for very small sample sizes. Random sampling from the set of possible bootstrap samples becomes a viable way to approximate the distribution of bootstrap samples. The same problem exists for permutations, and the same remedy is possible. The only difference is that $n!$ does not grow as fast as $n^n$, so complete enumeration of permutations is possible for larger $n$ than for the bootstrap.
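The growth rates in the paragraph above can be checked directly. This small sketch (the helper name `count_resamples` is hypothetical) compares $n!$, $n^n$, and the number of distinct unordered bootstrap samples, $\binom{2n-1}{n}$; that last count is the standard number of multisets of size $n$ drawn from $n$ items and is added here for comparison, not taken from the text.

```python
import math


def count_resamples(n):
    """Return (permutations, ordered bootstrap samples, distinct unordered bootstrap samples)."""
    permutations = math.factorial(n)     # n! orderings of the sample
    ordered = n ** n                     # n draws with replacement, order kept
    unordered = math.comb(2 * n - 1, n)  # multisets of size n from n items
    return permutations, ordered, unordered


if __name__ == "__main__":
    for n in (5, 10, 15, 20):
        p, o, u = count_resamples(n)
        print(f"n={n:2d}  n!={p:.3e}  n^n={o:.3e}  distinct={u:.3e}")
```

For $n = 5$ this gives 120 permutations, 3125 ordered bootstrap samples, and 126 distinct unordered ones; by $n = 20$ the ordered count is $10^{26}$, which is why Monte Carlo sampling replaces enumeration.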
The idea of taking several Monte Carlo samples of size n with replacement from the original observations was certainly an important idea expressed by Efron but was clearly known and practiced prior to Efron (1979). Although it may not be the first time it was used, Julian Simon laid claim to priority for the bootstrap based on his use of the Monte Carlo approximation in Simon (1969). But Simon was only recommending the Monte Carlo approach as a way to teach probability and statistics in a more intuitive way that does not require the abstraction of a parametric probability model for the generation of the original sample. After Efron made the bootstrap popular, Simon and Bruce joined the campaign (see Simon and Bruce, 1991, 1995).
Efron, however, starting with Efron (1979), first connected bootstrapping to the jackknife, delta method, cross-validation, and permutation tests. He was the first to show it to be a real competitor to the jackknife and delta method for estimating the standard error of an estimator. Also, quite early on, Efron recognized the broad applicability of bootstrapping for confidence intervals, hypothesis testing, and more complex problems. These ideas were emphasized in Efron and Gong (1983), Diaconis and Efron (1983), Efron and Tibshirani (1986), and the SIAM monograph (Efron, 1982). These influential articles, along with the SIAM monograph, led to a great deal of research during the 1980s and 1990s, and the number of bootstrap papers grew at an exponential rate. Key probabilistic results appeared in Singh (1981), Bickel and Freedman (1981, 1984), Beran (1982), Martin (1990), Hall (1986, 1988), Hall and Martin (1988), and Navidi (1989).
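As a sketch of the Monte Carlo bootstrap standard error discussed above, the function below draws `B` resamples of size $n$ with replacement (that is, from the empirical distribution), evaluates the estimator on each, and reports the standard deviation of the replicates. The name `bootstrap_se` and the default `B=2000` are illustrative assumptions, not the book's R code.

```python
import random
import statistics


def bootstrap_se(data, estimator, B=2000, seed=None):
    """Monte Carlo bootstrap estimate of the standard error of `estimator`.

    Draws B resamples of size n with replacement from `data`, evaluates the
    estimator on each, and returns the standard deviation of the B replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = [estimator(rng.choices(data, k=n)) for _ in range(B)]
    return statistics.stdev(replicates)


if __name__ == "__main__":
    sample = [float(x) for x in range(1, 31)]
    # Standard error of the median, which has no simple closed form
    print(bootstrap_se(sample, statistics.median, B=2000, seed=1))
    # For the mean, the ideal (B -> infinity) value is pstdev(sample) / sqrt(n)
    print(bootstrap_se(sample, statistics.mean, B=2000, seed=1))
    print(statistics.pstdev(sample) / len(sample) ** 0.5)
```

The comparison for the mean uses `pstdev` (divisor $n$) because the bootstrap resamples from $F_n$, whose variance is the plug-in variance.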
In a remarkable paper, Efron (1983) used simulation comparisons to show that bootstrap bias correction could provide better estimates of classification error rate than the very popular cross-validation approach (often called leave-one-out and originally proposed by Lachenbruch and Mickey, 1968). These results applied when the sample size was small, classification was restricted to only two or three classes, and the predicting features had multivariate Gaussian distributions. Efron compared several variants of the bootstrap with cross-validation and the resubstitution method. This led to several follow-up articles that widened the applicability, and supported the superiority, of a version of the bootstrap called the .632 estimator. See Chatterjee and Chatterjee (1983), Chernick et al. (1985, 1986, 1988a, b), Jain et al. (1987), and Efron and Tibshirani (1997).
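The .632 estimator combines the optimistic resubstitution error with the pessimistic error on observations left out of each bootstrap sample: $\widehat{err}_{.632} = 0.368\,\widehat{err}_{resub} + 0.632\,\hat\varepsilon_0$. The sketch below is only an illustration under loud assumptions. The classifier is a hypothetical one-dimensional nearest-class-mean rule (not the Gaussian discriminant setting of Efron's study), and $\hat\varepsilon_0$ is computed in a pooled form, one of several variants in the literature. Leave-one-out cross-validation is included for comparison.

```python
import random


def fit_nearest_mean(xs, ys):
    """Hypothetical toy classifier: assign x to the class whose mean is nearest."""
    means = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(members) / len(members)
    return lambda x: min(means, key=lambda c: abs(x - means[c]))


def error_rate(rule, xs, ys):
    """Fraction of points misclassified by `rule`."""
    return sum(rule(x) != y for x, y in zip(xs, ys)) / len(xs)


def err_loo(xs, ys):
    """Leave-one-out cross-validation estimate of the error rate."""
    n = len(xs)
    wrong = 0
    for j in range(n):
        rule = fit_nearest_mean(xs[:j] + xs[j + 1:], ys[:j] + ys[j + 1:])
        wrong += rule(xs[j]) != ys[j]
    return wrong / n


def err_632(xs, ys, B=200, seed=None):
    """.632 estimate: 0.368 * resubstitution error + 0.632 * pooled out-of-bag error."""
    rng = random.Random(seed)
    n = len(xs)
    resub = error_rate(fit_nearest_mean(xs, ys), xs, ys)
    wrong = total = 0
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        rule = fit_nearest_mean([xs[i] for i in idx], [ys[i] for i in idx])
        # Score the rule only on observations left out of this resample
        for j in range(n):
            if j not in in_bag:
                total += 1
                wrong += rule(xs[j]) != ys[j]
    if total == 0:
        raise ValueError("no out-of-bag observations; increase B")
    eps0 = wrong / total
    return 0.368 * resub + 0.632 * eps0


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.gauss(0, 1) for _ in range(15)] + [rng.gauss(1.5, 1) for _ in range(15)]
    ys = [0] * 15 + [1] * 15
    print("resubstitution:", error_rate(fit_nearest_mean(xs, ys), xs, ys))
    print("leave-one-out: ", err_loo(xs, ys))
    print(".632 bootstrap:", err_632(xs, ys, B=200, seed=1))
```

The weight 0.632 is approximately $1 - e^{-1}$, the limiting probability that a given observation appears in a bootstrap sample.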
Chernick was a graduate student at Stanford in the late 1970s when the bootstrap activity began on the Stanford and Berkeley campuses. Oddly, however, the bootstrap did not catch on with many graduate students. Even Brad Efron’s graduate students chose other topics for their dissertations. Gail Gong was the first student of Efron to write a dissertation on the bootstrap. She did very useful applied work on using the bootstrap in model building (particularly for logistic regression subset selection); see Gong (1986). After Gail Gong, a number of graduate students wrote dissertations on the bootstrap under Efron, including Terry Therneau, Rob Tibshirani, and Tim Hesterberg. Michael Martin visited Stanford while working on his dissertation on bootstrap confidence intervals under Peter Hall. At Berkeley, William Navidi did his thesis on bootstrapping in regression and econometric models under David Freedman.
While exciting theoretical results for the bootstrap developed in the 1980s and 1990s, there were also negative results showing that the bootstrap estimate is not “consistent” in the probabilistic sense (i.e., it does not approach the true parameter value as the sample size becomes infinite). Examples include the sample mean when the population distribution does not have a finite variance, and the sample maximum or minimum. This is illustrated in Athreya (1987a, b), Knight (1989), Angus (1993), and Hall et al. (1993). The first published example of an inconsistent bootstrap estimate appeared in Bickel and Freedman (1981). Shao et al. (2000) showed that a particular approach to bootstrap estimation of individual bioequivalence is also inconsistent; they also provided a modification that is consistent. Generally, the bootstrap is consistent when the central limit theorem applies (a sufficient condition is Lyapunov’s condition, which requires the existence of the $2 + \delta$ moment of the population distribution for some $\delta > 0$). Consistency results in the literature are based on the existence of Edgeworth expansions, so additional smoothness conditions for the expansion to exist have also been assumed (but it is not known whether or not they are necessary).
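A standard way to see why the bootstrap fails for the sample maximum: with distinct observations, a bootstrap sample contains the sample maximum with probability $1 - (1 - 1/n)^n$, which tends to $1 - e^{-1} \approx 0.632$. So the bootstrap distribution of $n(X_{(n)} - X^*_{(n)})$ keeps a large atom at zero, whereas for, say, Uniform$(0, \theta)$ data the true distribution of $n(\theta - X_{(n)})$ is continuous in the limit. The simulation below is a minimal sketch of that atom; the function name is hypothetical.

```python
import math
import random


def prob_resample_max_equals_max(data, B=10000, seed=None):
    """Monte Carlo estimate of P*(max of an n-out-of-n resample == sample max)."""
    rng = random.Random(seed)
    n = len(data)
    top = max(data)
    hits = sum(max(rng.choices(data, k=n)) == top for _ in range(B))
    return hits / B


if __name__ == "__main__":
    n = 100
    rng = random.Random(0)
    sample = [rng.uniform(0, 1) for _ in range(n)]
    print("Monte Carlo:", prob_resample_max_equals_max(sample, seed=1))
    print("exact:      ", 1 - (1 - 1 / n) ** n)  # valid for distinct observations
    print("limit:      ", 1 - math.exp(-1))      # value approached as n grows
```

Because this probability does not shrink as $n$ grows, the bootstrap distribution cannot converge to the continuous limit, which is the inconsistency described above.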
One extension of the bootstrap, called the m-out-of-n bootstrap, was suggested by Bickel and Ren (1996) in light of previous research on it, and it has been shown to overcome the inconsistency of the bootstrap in several instances. In the m-out-of-n bootstrap, sampling is with replacement from the original sample, but each resample has size $m$, smaller than $n$. See Bickel et al. (1997), Gine and Zinn (1989), Arcones and Gine (1989), Fukuchi (1994), and Politis et al. (1999).
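A minimal sketch of m-out-of-n resampling, continuing the sample-maximum example: with resample size $m$, the chance that a resample contains the sample maximum is $1 - (1 - 1/n)^m$, which goes to zero when $m/n \to 0$, so the spurious atom disappears. The choice $m = \lfloor\sqrt{n}\rfloor$ below is purely illustrative (the usual requirement is $m \to \infty$ with $m/n \to 0$), and the function names are hypothetical.

```python
import random


def m_out_of_n_resamples(data, m, B, seed=None):
    """Draw B resamples of size m (< n), with replacement, from `data`."""
    if not 0 < m < len(data):
        raise ValueError("m must satisfy 0 < m < n")
    rng = random.Random(seed)
    return [rng.choices(data, k=m) for _ in range(B)]


def atom_at_max(data, m, B=5000, seed=None):
    """Monte Carlo estimate of P*(max of an m-out-of-n resample == sample max)."""
    top = max(data)
    resamples = m_out_of_n_resamples(data, m, B, seed)
    return sum(max(r) == top for r in resamples) / B


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    sample = [rng.uniform(0, 1) for _ in range(n)]
    m = int(n ** 0.5)  # illustrative choice only, not a rule from the text
    print("n-out-of-n atom (exact):      ", 1 - (1 - 1 / n) ** n)
    print("m-out-of-n atom (Monte Carlo):", atom_at_max(sample, m, seed=1))
    print("m-out-of-n atom (exact):      ", 1 - (1 - 1 / n) ** m)
```

Note that the m-out-of-n bootstrap approximates the sampling distribution at sample size $m$, so any normalizing rate must be applied with $m$ in place of $n$.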
Some bootstrap approaches for time series have been shown to be inconsistent. Lahiri (2003) covered the use of the bootstrap in time series and other dependent cases. He showed that there are consistent remedies for the m-dependent and moving block bootstrap cases (see Section 5.5 for some coverage of the moving block bootstrap).
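For readers unfamiliar with the moving block bootstrap, here is a minimal sketch of generating one resample: overlapping blocks of a fixed length are drawn with replacement and concatenated, which preserves short-range dependence within each block. The function name and the fixed `block_len` are assumptions for illustration; choosing the block length well is a separate problem not addressed here.

```python
import random


def moving_block_bootstrap(series, block_len, seed=None):
    """One moving block bootstrap resample of `series`.

    Overlapping blocks series[i:i + block_len], i = 0..n - block_len, are drawn
    with replacement and concatenated; the result is truncated to length n.
    """
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    rng = random.Random(seed)
    starts = range(n - block_len + 1)
    out = []
    while len(out) < n:
        i = rng.choice(starts)
        out.extend(series[i:i + block_len])
    return out[:n]


if __name__ == "__main__":
    series = list(range(20))
    print(moving_block_bootstrap(series, block_len=4, seed=3))
```

With `block_len=1` this reduces to the ordinary bootstrap, which ignores serial dependence entirely.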

1.2 DEFINITION AND RELATIONSHIP TO THE DELTA METHOD AND OTHER RESAMPLING METHODS

We will first provide an informal definition of the bootstrap to build intuition and understanding before giving a more formal mathematical definition. The objective of bootstrapping is to estimate a parameter based on the data, such as a mean, median, or standard deviation. We are also interested in the properties of the distribution of the parameter’s estimate and may want to construct confidence intervals. But we do not want to make overly restrictive assumptions about the form of the distribution that the observed data came from.
For the simple case of independent observations coming from the same population distribution, the basic element for bootstrapping is the empirical distribution. The empirical distribution, denoted $F_n$, is just the discrete distribution that gives equal weight to each data point; that is, it assigns probability $1/n$ to each of the original $n$ observations.
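As a small illustration of $F_n$, the sketch below builds the cumulative distribution function of the empirical distribution, $F_n(x) = \#\{i : x_i \le x\}/n$. The name `empirical_cdf` is hypothetical; tied observations simply accumulate mass $1/n$ each.

```python
import bisect


def empirical_cdf(data):
    """Return F_n, the CDF of the empirical distribution of `data`.

    F_n puts mass 1/n on each observation, so F_n(x) = #{i : x_i <= x} / n.
    """
    xs = sorted(data)
    n = len(xs)
    # bisect_right counts how many sorted observations are <= x
    return lambda x: bisect.bisect_right(xs, x) / n


if __name__ == "__main__":
    F = empirical_cdf([3, 1, 2, 2])
    print([F(x) for x in (0, 1, 2, 2.5, 3)])
```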
Most of the common parameters that we consider are functionals of the unknown population distribution. A functional is simply a mapping that takes a function $F$ into a real number. In our case, we are only interested in functionals of cumulative probability distribution functions. So, for example, the mean and variance of a distribution can be represented as functionals in the following way. Let $\mu$ be the mean for a distribution function $F$; then $\mu = \int x \, dF(x)$. Let $\sigma^2$ be the variance; then $\sigma^2 = \int (x - \mu)^2 \, dF(x)$. These integrals over the entire possible set of $x$ values in the domain of $F$ are particular examples of functionals. It is interesting that the sample estimates most commonly used for these parameters are the same functionals applied to $F_n$ (for the variance, this plug-in estimate uses the divisor $n$ rather than the $n - 1$ of the usual unbiased sample variance).
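Because $F_n$ puts mass $1/n$ on each observation, any integral against it is just an average: $\int g(x)\,dF_n(x) = \frac{1}{n}\sum_{i=1}^{n} g(x_i)$. The sketch below (function names hypothetical) evaluates the mean and variance functionals at $F_n$ and confirms that the plug-in variance matches the divisor-$n$ population formula rather than the $n - 1$ sample variance.

```python
import statistics


def plug_in(g, data):
    """Evaluate the functional integral of g(x) dF_n(x) = (1/n) * sum(g(x_i))."""
    return sum(g(x) for x in data) / len(data)


def plug_in_mean(data):
    """mu(F_n) = integral of x dF_n(x), the sample mean."""
    return plug_in(lambda x: x, data)


def plug_in_variance(data):
    """sigma^2(F_n) = integral of (x - mu)^2 dF_n(x), divisor n."""
    mu = plug_in_mean(data)
    return plug_in(lambda x: (x - mu) ** 2, data)


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(plug_in_mean(data), plug_in_variance(data))
    print(statistics.pvariance(data), statistics.variance(data))  # divisor n vs n - 1
```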
Now the idea of the bootstrap is to use only what you know from the data and not introduce extraneous assumptions about the population distribution. The “bootstrap principle” says that when $F$ is the population distribution and $T(F)$ is the functional that defines the parameter we wish to estimate based on a sample of size $n$, we let $F_n$ play the role of $F$ and let the bootstrap distribution (soon to be defined) play the role of $F_n$ in the resampling process. Note that the original sample is a sample of $n$ independent identically distributed observations from the distribution $F$, and the sample estimate of the parameter is $T(F_n)$. So, in bootstrapping, we let $F_n$ play the role of $F$ and take $n$ independent and identically distributed observations from $F_n$. Since $F_n$ is the empirical distribution, this is just sampling randomly with replacement f...

Table of contents

  1. COVER
  2. TABLE OF CONTENTS
  3. TITLE PAGE
  4. COPYRIGHT
  5. PREFACE
  6. ACKNOWLEDGMENTS
  7. LIST OF TABLES
  8. 1 INTRODUCTION
  9. 2 ESTIMATION
  10. 3 CONFIDENCE INTERVALS
  11. 4 HYPOTHESIS TESTING
  12. 5 TIME SERIES
  13. 6 BOOTSTRAP VARIANTS
  14. 7 CHAPTER SPECIAL TOPICS
  15. 8 WHEN THE BOOTSTRAP IS INCONSISTENT AND HOW TO REMEDY IT
  16. AUTHOR INDEX
  17. SUBJECT INDEX

Authors: Michael R. Chernick, Robert A. LaBudde