Goodness-of-Fit-Techniques
eBook - ePub

Ralph B. D'Agostino

  1. 576 pages
  2. English

About This Book

Conveniently grouping methods by techniques, such as chi-squared and empirical distribution function, and also collecting methods of testing for specific famous distributions, this useful reference is the first comprehensive review of the extensive literature on the subject. It surveys the leading methods of testing fit... provides tables to make the tests available... assesses the comparative merits of different test procedures... and supplies numerical examples to aid in understanding these techniques.
Goodness-of-Fit Techniques shows how to apply the techniques... emphasizes testing for the three major distributions, normal, exponential, and uniform... discusses the handling of censored data... and contains over 650 bibliographic citations that cover the field.
Illustrated with tables and drawings, this volume is an ideal reference for mathematical and applied statisticians, and biostatisticians; professionals in applied science fields, including psychologists, biometricians, physicians, and quality control and reliability engineers; advanced undergraduate- and graduate-level courses on goodness-of-fit techniques; and professional seminars and symposia on applied statistics, quality control, and reliability.

Information

Publisher: CRC Press
Year: 2017
ISBN: 9781351444552
Edition: 1

1
Overview

Ralph B. D'Agostino Boston University, Boston, Massachusetts
Michael A. Stephens Simon Fraser University, Burnaby, B.C., Canada

1.1 Goodness-of-Fit Techniques

This book is devoted to the presentation and discussion of goodness-of-fit techniques. By these we mean methods of examining how well a sample of data agrees with a given distribution as its population. The techniques discussed are almost entirely for univariate data, for which there is a vast literature; methods for multivariate data are much less well developed.
In the formal framework of hypothesis testing the null hypothesis H0 is that a given random variable x follows a stated probability law F(x) (for example, the normal distribution or the Weibull distribution); the random variable may come from a process which is under investigation. The goodness-of-fit techniques applied to test H0 are based on measuring in some way the conformity of the sample data (a set of x-values) to the hypothesized distribution, or, equivalently, its discrepancy from it. The techniques usually give formal statistical tests and the measures of consistency or of discrepancy are test statistics.
The null hypothesis H0 can be a simple hypothesis, when F(x) is specified completely, for example, normal with mean μ = 100 and standard deviation σ = 10; or H0 can give an incomplete specification and will then be a composite hypothesis, for example, when it states only that F(x) is normal with unspecified μ and σ.
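The distinction can be made concrete with a small sketch. It measures the discrepancy between a sample and the hypothesized F(x) by the largest gap between the empirical distribution function and F(x), first with the parameters fixed in advance (simple H0) and then with the parameters estimated from the data (composite H0). The sample values and the function name `ks_statistic` are invented for illustration; this is one possible measure under those assumptions, not a recommended procedure.

```python
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Largest vertical distance between the empirical distribution
    function of `sample` and a continuous hypothesized cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The EDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical sample, invented for illustration only.
sample = [88.2, 95.1, 97.4, 101.3, 103.8, 106.0, 110.5, 118.9]

# Simple H0: normal with mean 100 and standard deviation 10.
d_simple = ks_statistic(sample, NormalDist(mu=100, sigma=10).cdf)

# Composite H0: normal, with mean and standard deviation estimated
# from the same sample.
fitted = NormalDist.from_samples(sample)
d_composite = ks_statistic(sample, fitted.cdf)

print(f"simple H0:    D = {d_simple:.4f}")
print(f"composite H0: D = {d_composite:.4f}")
```

The two values of D should not be referred to the same tables: once parameters are estimated from the sample, the null distribution of the statistic changes.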
In most applications of goodness-of-fit techniques, the alternative hypothesis H1 is composite: it gives little or no information on the distribution of the data, and simply states that H0 is false. The major focus is on the measure of agreement of the data with the null hypothesis; in fact, it is usually hoped to accept that H0 is true.
There are several reasons for this. First, the distribution of sample data may throw light on the process that generated the data; if a suggested model for the process is correct, the sample data follow a specific distribution, which can be tested. Also, parameters of the distribution may be connected with important parameters in describing the basic model. Secondly, knowledge of the distribution of data allows for application of standard statistical testing and estimation procedures. For example, if the data follow a normal distribution, inferences concerning the means and variances can be made using t tests, analyses of variances, and F tests; similarly, if the residuals after fitting a regression model are normal, tests may be made on the model parameters. Estimation procedures such as the calculation of confidence intervals, tolerance intervals, and prediction intervals, often depend strongly on the underlying distribution. Finally, when a distribution can be assumed, extreme tail percentiles, which are needed, for example, in environmental work, can be computed.
The fact that it is usually hoped to accept the null hypothesis and proceed with other analyses as if it were true sets goodness-of-fit testing apart from most statistical testing procedures. In many testing situations it is rejection of the null hypothesis which appears to prove a point. This might be so, for example, in a test for no treatment effects in a factorial analysis, where rejection of H0 indicates one or more treatments to be better than others. Even when one would like to accept a null hypothesis (for example, in a test for no interaction in the above factorial analysis), the statistical test is usually clear and the only problem is with the level of significance. In a test of fit, where the alternative is very vague, the appropriate statistical test will often be by no means clear and no general theory of Neyman-Pearson type appears applicable in these situations. Thus many different, sometimes elaborate, procedures have been generated to test the same null hypothesis, and the ideas and motivations behind these are diverse. Even when concepts such as statistical power of the procedures are considered, it rarely happens that one testing procedure emerges as superior.
It may happen that the alternative hypothesis has some specification, although it could be incomplete; for example, an alternative to the null hypothesis of normality may be that the random variable has positive skewness. When the alternative distribution contains some such specification, tests of fit should be designed to be sensitive to it. Even in these situations uniquely best tests are rarities.
In addition to formal hypothesis testing procedures, goodness-of-fit techniques also include less formal methods, in particular, graphical techniques. These have a long history in statistical analysis. Graphs are drawn so that adherence to or deviation from the hypothesized distribution results in certain features of the graph. For example, in the probability plot the ordered observations are plotted against functions of the ranks. In such plots a straight line indicates that the hypothesized distribution is a reasonable model for the data and deviations from the straight line indicate inappropriateness of the model. The type of departure from the straight line may indicate the nature of the true distribution. Historically the straight line has been judged by eye, and it is only recently that more formal techniques have been given.
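As a rough sketch of the probability plot just described, the code below pairs each ordered observation with a standard normal quantile computed from its rank, and fits a straight line through the points by least squares. The plotting position (i - 0.5)/n is one common choice and is an assumption here, as are the function names and the sample values.

```python
from statistics import NormalDist, mean


def normal_plot_points(sample):
    """Return (normal quantile, ordered observation) pairs for a
    normal probability plot."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()
    # Assumed plotting position: (i - 0.5)/n for the i-th ordered value.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]


def least_squares_line(points):
    """Intercept and slope of the least-squares line through the points."""
    qs = [q for q, _ in points]
    xs = [x for _, x in points]
    q_bar, x_bar = mean(qs), mean(xs)
    sqq = sum((q - q_bar) ** 2 for q in qs)
    sqx = sum((q - q_bar) * (x - x_bar) for q, x in zip(qs, xs))
    slope = sqx / sqq
    return x_bar - slope * q_bar, slope


# Hypothetical sample, invented for illustration only.
sample = [88.2, 95.1, 97.4, 101.3, 103.8, 106.0, 110.5, 118.9]
points = normal_plot_points(sample)
intercept, slope = least_squares_line(points)

for q, x in points:
    print(f"{q:8.3f}  {x:8.1f}")
print(f"fitted line: x = {intercept:.2f} + {slope:.2f} * quantile")
```

If the points fall close to the line, the intercept and slope can be read as informal estimates of the location and scale of the hypothesized normal distribution.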

1.2 Objectives of the Book

There are five major objectives of this book. They are:
  1. To identify the major theories behind goodness-of-fit techniques;
  2. To present an up-to-date picture of the status of these techniques;
  3. To give references to the relevant literature;
  4. To illustrate with numerical examples; and
  5. To make some recommendations on the use of different techniques.
There are several features that bear mention. First, a substantial number of numerical examples are included. These are for the most part easy to find. In many chapters subsections containing numerical examples are identified by the letter E before the section number. For example, in Chapter 9, Section E 9.3.4.1.1 contains a numerical example of the Shapiro-Wilk test for normality.
Second, a set of data sets is used throughout the book. These allow for comparisons of some of the techniques on the same data sets. Some of these data sets are real data and others are simulated. The data sets are given in full in the appendix.
Third, the chapters contain specific recommendations for use of the test methods. Nevertheless, we have avoided the attempt to present final definitive recommendations. The authors for the chapters of this book each have significant expertise, but there is not always complete agreement among them on what is best. As we stated previously, theory does not exist which can identify the uniquely best procedure for most goodness-of-fit situations, and personal opinion and judgment will often enter any consideration. Each author has made recommendations based on his or her understanding and view of the problem.
Fourth, many references are given. There is an enormous literature and we have made no attempt to survey all of it. We have especially avoided heavy mathematical treatment and the details of theorems. A substantial list of references is given with each chapter; they include references to earlier source material and to the theoretical background of the test procedures, and it is hoped they will aid the development of further research.
Finally, we recognize that it is impossible to include all goodness-of-fit topics in this survey; our emphasis is largely on the practical aspects of testing. Some techniques are still underdeveloped, and, for example, suggested tests may lack tables for practical application, or enough comparisons have not been made to assess their merits; for these and similar reasons, some subjects have been lightly treated, if at all.
In goodness-of-fit there are many areas with unsolved problems, or unanswered questions. Some of the subjects on which there will surely be much work in the future include tests for censored data, especially for randomly censored data, tests based on the empirical characteristic function, tests based on spacings, and tests for multivariate distributions, especially for multivariate normality. Many comparisons between techniques are still needed, and also the exploration of wider questions such as the relationship of formal goodness-of-fit testing (as, indeed, in other forms of testing) to modern, more informal, approaches to statistical analysis where distributional models are not so rigidly specified. We hope this book sets forth the major topics of its subject, and will act as a base from which these and many other questions can be explored.

1.3 The Topics of the Book

In addition to this chapter the book consists of eleven other chapters. These are divided into three groups. The first consists of Chapters 2, 3, 4, 5, 6 and 7, containing general concepts applicable to testing for a variety of distributions. Chapter 2 describes graphical procedures for evaluating goodness-of-fit. These are informal procedures based mainly on the probability plot, useful for exploring data and for supplementing the formal testing procedures of the other chapters.
Chapter 3 reviews chi-square-type tests. The classical chi-square goodness-of-fit tests are reviewed first, followed by recent developments involving general quadratic forms and nonstandard chi-square statistics.
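For orientation, a minimal sketch of the classical Pearson statistic for a simple null hypothesis follows. It groups the observations into k cells that have equal probability under the hypothesized distribution and compares observed with expected counts. The equal-probability cells, the function name, and the sample are assumptions made for illustration, not the chapter's prescription.

```python
from statistics import NormalDist


def pearson_chi_square(sample, cdf, k):
    """Pearson X^2 using k cells of equal probability under a fully
    specified continuous cdf. Returns (statistic, observed counts)."""
    n = len(sample)
    counts = [0] * k
    for x in sample:
        # cdf(x) lies in [0, 1]; cell j covers [j/k, (j + 1)/k).
        j = min(int(k * cdf(x)), k - 1)
        counts[j] += 1
    expected = n / k
    statistic = sum((o - expected) ** 2 / expected for o in counts)
    return statistic, counts


# Hypothetical sample, invented for illustration only.
sample = [88.2, 95.1, 97.4, 101.3, 103.8, 106.0, 110.5, 118.9,
          92.7, 99.0, 104.4, 111.6]

x2, observed = pearson_chi_square(sample, NormalDist(100, 10).cdf, k=4)
print("observed counts:", observed)
print(f"X^2 = {x2:.3f}")
```

With a completely specified distribution and a large sample, the statistic is usually referred to a chi-square distribution with k - 1 degrees of freedom; the sample here is far too small for that approximation and serves only to show the arithmetic.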
Chapter 4 presents tests based on the empirical distribution function (edf). These tests include the classical Kolmogorov-Smirnov test and other tests such as the Cramér-von Mises and Anderson-Darling tests. Consideration is given to simple and composite null hypotheses. The normal, exponential, extreme-value, Weibull, and gamma distributions, among others, are discussed individually.
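The sketch below shows, under stated assumptions, how the Cramér-von Mises statistic W² and the Anderson-Darling statistic A² can be computed from the values z = F(x) for a fully specified F(x). The computing formulas are the standard ones for a simple hypothesis; the function name and the sample are invented for illustration.

```python
from math import log
from statistics import NormalDist


def edf_statistics(z):
    """Cramer-von Mises W^2 and Anderson-Darling A^2 computed from
    z_i = F(x_i), where every z_i must lie strictly between 0 and 1."""
    z = sorted(z)
    n = len(z)
    w2 = 1 / (12 * n) + sum(
        (zi - (2 * i - 1) / (2 * n)) ** 2 for i, zi in enumerate(z, start=1)
    )
    a2 = -n - sum(
        (2 * i - 1) * (log(z[i - 1]) + log(1 - z[n - i]))
        for i in range(1, n + 1)
    ) / n
    return w2, a2


# Hypothetical sample, invented for illustration only.
sample = [88.2, 95.1, 97.4, 101.3, 103.8, 106.0, 110.5, 118.9]

# Simple H0: normal with mean 100 and standard deviation 10.
h0 = NormalDist(mu=100, sigma=10)
w2, a2 = edf_statistics([h0.cdf(x) for x in sample])
print(f"W^2 = {w2:.4f}, A^2 = {a2:.4f}")
```

Large values of either statistic indicate poor agreement with the hypothesized distribution. Because of the logarithms of z and 1 - z, A² gives more weight to the tails than W² does.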
Chapter 5 deals with tests based on regression and correlation. Some of these procedures can be viewed as arising from computing a correlation coefficient from a probability plot and testing if it differs significantly from unity. Also involved are tests based on comparisons of linear regression estimates of the scale parameter of the hypothesized distribution to the estimate coming from the sample standard deviation. The Shapiro-Wilk test for normality is one such test.
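A minimal sketch of the first idea: compute the correlation coefficient between the ordered observations and normal quantiles of their ranks, and see how far it falls below unity. The plotting position (i - 0.5)/n and the function name are assumptions, and this is not the Shapiro-Wilk statistic itself, which uses different coefficients.

```python
from math import sqrt
from statistics import NormalDist, mean


def probability_plot_correlation(sample):
    """Correlation between ordered observations and standard normal
    quantiles at the assumed plotting positions (i - 0.5)/n."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()
    qs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    q_bar, x_bar = mean(qs), mean(xs)
    sqx = sum((q - q_bar) * (x - x_bar) for q, x in zip(qs, xs))
    sqq = sum((q - q_bar) ** 2 for q in qs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sqx / sqrt(sqq * sxx)


# Hypothetical samples, invented for illustration only.
roughly_symmetric = [88.2, 95.1, 97.4, 101.3, 103.8, 106.0, 110.5, 118.9]
one_wild_value = [1.0, 2.0, 3.0, 4.0, 100.0]

print(f"r = {probability_plot_correlation(roughly_symmetric):.4f}")
print(f"r = {probability_plot_correlation(one_wild_value):.4f}")
```

Values of r close to 1 correspond to a nearly straight probability plot. How far below 1 the coefficient must fall before normality is rejected depends on the sample size and has to be taken from tables.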
In Chapter 6 transformation techniques are reviewed. Here the data are first transformed to uniformity and goodness-of-fit tests for uniformity are applied to these transformed data. These techniques can deal with simple and composite hypotheses.
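The simplest such transformation, for a simple hypothesis, replaces each observation x by F(x); if H0 is true the transformed values behave like a uniform sample on (0, 1). The sketch below illustrates this for a hypothesized exponential distribution with a known scale. The scale value, the sample, and the function names are assumptions made for illustration.

```python
from math import exp


def exponential_cdf(scale):
    """Return the cdf of an exponential distribution with the given scale."""
    def cdf(x):
        return 1.0 - exp(-x / scale) if x > 0 else 0.0
    return cdf


def to_uniform(sample, cdf):
    """Probability integral transformation: replace each x by F(x)."""
    return [cdf(x) for x in sample]


def ks_uniform(z):
    """Largest distance between the EDF of z and the uniform(0, 1) cdf."""
    z = sorted(z)
    n = len(z)
    return max(max(i / n - zi, zi - (i - 1) / n)
               for i, zi in enumerate(z, start=1))


# Hypothetical lifetimes, invented for illustration only.
lifetimes = [0.3, 0.9, 1.4, 2.2, 2.9, 4.1, 5.6, 8.0]

# Simple H0: exponential with scale 3 (an assumed value).
z = to_uniform(lifetimes, exponential_cdf(3.0))
print([round(v, 3) for v in z])
print(f"distance from uniformity: D = {ks_uniform(z):.4f}")
```

Any test for uniformity can then be applied to the transformed values; the distance used above is only one choice.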
Tests based on the third and fourth sample moments are presented in Chapter 7. These techniques were first developed to test for normality. In Chapter 7 they are extended to nonnormal distributions.
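For reference, the two moment statistics in question can be sketched as follows: the sample skewness is the third central moment divided by the 3/2 power of the second, and the sample kurtosis is the fourth central moment divided by the square of the second. The function name is hypothetical, and the divisor n (rather than n - 1) is the usual convention for these statistics.

```python
def moment_statistics(sample):
    """Return (skewness, kurtosis) based on central moments with divisor n.
    Assumes the sample is not constant, so the second moment is positive."""
    n = len(sample)
    x_bar = sum(sample) / n
    m2 = sum((x - x_bar) ** 2 for x in sample) / n
    m3 = sum((x - x_bar) ** 3 for x in sample) / n
    m4 = sum((x - x_bar) ** 4 for x in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical sample, invented for illustration only.
sample = [88.2, 95.1, 97.4, 101.3, 103.8, 106.0, 110.5, 118.9]
skewness, kurtosis = moment_statistics(sample)
print(f"skewness = {skewness:.4f}, kurtosis = {kurtosis:.4f}")
```

For a normal population the corresponding population values are 0 and 3, so sample values far from these point towards non-normality.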
The second group of chapters consists of Chapters 8, 9, and 10. These deal with tests for three distributions (the uniform, the normal, and the exponential) which have played prominent roles in statistical methodology. Many tests for these distributions have been devised, often based on the methods of previous chapters, and they are brought together, for each distribution, in these three chapters.
Chapters 11 and 12 form the last group; they cover extra materials. The problem of analyzing censored data is of great importance and Chapter 11 is devoted to this. Many of the previous chapters have sections on censored data. Chapter 11 collects these together, fills in some omissions, and gives examples; there is also a discussion on probability plotting of censored data.
The final chapter, Chapter 12, is on the analysis and detection of outliers. This material might be considered outside the direct scope of goodness-of-fit techniques; however, it is closely related to them since they are often applied with this problem in mind, so we felt it would be useful to close the book with a chapter on outliers.

2
Graphical Analysis

Ralph B. D'Agostino Boston University, Boston, Massachusetts

2.1 Introduction

The purpose of this chapter is to illustrate the use of graph...
