Common Errors in Statistics (and How to Avoid Them)
eBook - ePub

About this book

Praise for the Second Edition

"All statistics students and teachers will find in this book a friendly and intelligent guide to . . . applied statistics in practice."
Journal of Applied Statistics

". . . a very engaging and valuable book for all who use statistics in any setting."
CHOICE

". . . a concise guide to the basics of statistics, replete with examples . . . a valuable reference for more advanced statisticians as well."
MAA Reviews

Now in its Third Edition, the highly readable Common Errors in Statistics (and How to Avoid Them) continues to serve as a thorough and straightforward discussion of basic statistical methods, presentations, approaches, and modeling techniques. Further enriched with new examples and counterexamples from the latest research as well as added coverage of relevant topics, this new edition of the benchmark book addresses popular mistakes often made in data collection and provides an indispensable guide to accurate statistical analysis and reporting. The authors' emphasis on careful practice, combined with a focus on the development of solutions, reveals the true value of statistics when applied correctly in any area of research.

The Third Edition has been considerably expanded and revised to include:

  • A new chapter on data quality assessment
  • A new chapter on correlated data
  • An expanded chapter on data analysis covering categorical and ordinal data, continuous measurements, and time-to-event data, including sections on factorial and crossover designs
  • Revamped exercises with a stronger emphasis on solutions
  • An extended chapter on report preparation
  • New sections on factor analysis as well as Poisson and negative binomial regression

Providing valuable, up-to-date information in the same user-friendly format as its predecessor, Common Errors in Statistics (and How to Avoid Them), Third Edition is an excellent book for students and professionals in industry, government, medicine, and the social sciences.

Common Errors in Statistics (and How to Avoid Them) is by Phillip I. Good and James W. Hardin.

Information

Publisher
Wiley
Year
2011
Print ISBN
9780470457986
eBook ISBN
9781118211274
PART I: FOUNDATIONS
1. SOURCES OF ERROR
Don't think - use the computer.
- Dyke (tongue in cheek) [1997].
Statistical procedures for hypothesis testing, estimation, and model building are only a part of the decision-making process. They should never be used as the sole basis for making a decision (yes, even those procedures that are based on a solid deductive mathematical foundation). As philosophers have known for centuries, extrapolation from a sample or samples to a larger incompletely examined population must entail a leap of faith.
The sources of error in applying statistical procedures are legion and include all of the following:
  • Using the same set of data to formulate hypotheses and then to test those hypotheses.
  • Taking samples from the wrong population or failing to specify in advance the population(s) about which inferences are to be made.
  • Failing to draw samples that are random and representative.
  • Measuring the wrong variables or failing to measure what you intended to measure.
  • Failing to understand that p-values are statistics, that is, functions of the observations, and will vary in magnitude from sample to sample.
  • Using inappropriate or inefficient statistical methods.
  • Using statistical software without verifying that its current defaults are appropriate for your application.
  • Failing to validate models.
But perhaps the most serious source of error is letting statistical procedures make decisions for you.
In this chapter, as throughout this book, we first offer a preventive prescription, followed by a list of common errors. If these prescriptions are followed carefully, you will be guided to the correct and effective use of statistics and avoid the pitfalls.
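One item in the list above, that p-values are statistics and vary from sample to sample, can be illustrated with a small simulation. The sketch below is not from the book: it assumes a normal population with mean 0.3 and known standard deviation 1, samples of size 20, and a two-sided z-test of the hypothesis that the mean is 0. The function name `z_test_p_value` and all of these settings are hypothetical choices made only for illustration.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, assuming sigma is known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(1)  # fixed seed so the run is repeatable

# Draw ten independent samples from the SAME population and test each one.
p_values = []
for _ in range(10):
    sample = [rng.gauss(0.3, 1.0) for _ in range(20)]
    p_values.append(z_test_p_value(sample))

for p in p_values:
    print(f"{p:.3f}")
print("smallest:", round(min(p_values), 3), "largest:", round(max(p_values), 3))
```

Every sample comes from the same population, yet the printed p-values differ from one sample to the next, which is why a single p-value should be read as one observation of a random quantity.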
PRESCRIPTION
Statistical methods used for experimental design and analysis should be viewed in their rightful role as merely a part, albeit an essential part, of the decision-making procedure.
Here is a partial prescription for the error-free application of statistics:
1. Set forth your objectives and your research intentions before you conduct a laboratory experiment, a clinical trial, or a survey or analyze an existing set of data.
2. Define the population about which you will make inferences from the data you gather.
3. List all possible sources of variation. Control them or measure them to avoid confounding them with relationships among those items that are of primary interest.
4. Formulate your hypotheses and all of the associated alternatives. (See Chapter 2.) List possible experimental findings along with the conclusions you would draw and the actions you would take if this or another result proves to be the case. Do all of these things before you complete a single data collection form and before you turn on your computer.
5. Describe in detail how you intend to draw a representative sample from the population. (See Chapter 3.)
6. Use estimators that are impartial, consistent, efficient, robust, and minimum loss. (See Chapter 5.) To improve the results, focus on sufficient statistics, pivotal statistics, and admissible statistics and use interval estimates. (See Chapters 5 and 6.)
7. Know the assumptions that underlie the tests you use. Use those tests that require the minimum number of assumptions and are most powerful against the alternatives of interest. (See Chapters 5, 6, and 7.)
8. Incorporate in your reports the complete details of how the sample was drawn and describe the population from which it was drawn. If data are missing or the sampling plan was not followed, explain why and list all differences between the data that were present in the sample and the data that were missing or excluded. (See Chapter 8.)
FUNDAMENTAL CONCEPTS
Three concepts are fundamental to the design of experiments and surveys: variation, population, and sample.
A thorough understanding of these concepts will forestall many errors in the collection and interpretation of data.
If there were no variation - if every observation were predictable, a mere repetition of what had gone before - there would be no need for statistics.
Variation
Variation is inherent in virtually all of our observations. We would not expect the outcomes of two consecutive spins of a roulette wheel to be identical. One result might be red, the other black. The outcome varies from spin to spin.
There are gamblers who watch and record the spins of a single roulette wheel hour after hour, hoping to discern a pattern. A roulette wheel is, after all, a mechanical device, and perhaps a pattern will emerge. But even those observers do not anticipate finding a pattern that is 100% predetermined. The outcomes are just too variable.
Anyone who spends time in a schoolroom, as a parent or as a child, can see the vast differences among individuals. This one is tall, that one is short, though all are the same age. Half an aspirin and Dr. Good's headache is gone, but his wife requires four times that dosage.
There is variability even among observations on deterministic formula-satisfying phenomena such as the position of a planet in space or the volume of gas at a given temperature and pressure. Position and volume satisfy Kepler's laws and Boyle's law, respectively, but the observations we collect will depend upon the measuring instrument (which may be affected by the surrounding environment) and the observer. Cut a length of string and measure it three times. Do you record the same length each time?
In designing an experiment or a survey, we must always consider the possibility of errors arising from the measuring instrument and from the observer. It is one of the wonders of science that Kepler was able to formulate his laws at all given the relatively crude instruments at his disposal.
Population
The population(s) of interest must be clearly defined before we begin to gather data.
From time to time, someone will ask us how to generate confidence intervals (see Chapter 8) for the statistics arising from a total census of a population. Our answer is that we cannot help. Population statistics (mean, median, 30th percentile) are not estimates. They are fixed values and will be known with 100% accuracy if two criteria are fulfilled:
1. Every member of the population is observed.
2. All the observations are recorded correctly.
Confidence intervals would be appropriate if the first criterion is violated, for then we are looking at a sample, not a population. And if the second criterion is violated, then we might want to talk about the confidence we have in our measurements.
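The distinction between a census value and a sample estimate can be sketched in a few lines. The population below (1,000 invented ages) and the sample size of 50 are assumptions made only for illustration, and the sketch takes for granted that every value is recorded correctly.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 1,000 residents (invented data).
population = [rng.randint(18, 90) for _ in range(1000)]

# Census: every member is observed, so the mean is a fixed value,
# not an estimate. Recomputing it always gives the same number.
census_mean = mean(population)

# Samples: each random sample of 50 members yields its own estimate.
sample_means = [mean(rng.sample(population, 50)) for _ in range(5)]

print("census mean:", census_mean)
print("sample means:", sample_means)
```

The census mean has no sampling variability to put an interval around, whereas the sample means scatter around it and an interval estimate is meaningful for each of them.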
Debates about the accuracy of the 2000 United States Census arose from doubts about the fulfillment of these criteria. "You didn't count the homeless" was one challenge. "You didn't verify the answers" was another. Whether we collect data for a sample or an entire population, both of these challenges or their equivalents can and should be made.
Kepler's "laws" of planetary movement are not testable by statistical means when applied to the original planets (Jupiter, Mars, Mercury, and Venus) for which they were formulated. But when we make statements such as "Planets that revolve around Alpha Centauri will also follow Kepler's laws," we begin to view our original population, the planets of our sun, as a sample of all possible planets in all possible solar systems.
A major problem with many studies is that the population of interest is not adequately defined before the sample is drawn. Don't make this mistake. A second major problem is that the sample proves to have been drawn from a different population than was originally envisioned. We consider these issues in the next section and again in Chapters 2, 6, and 7.
Sample
A sample is any (proper) subset of a population.
Small samples may give a distorted view of the population. For example, if a minority group comprises 10% or less of a population, a jury of 12 persons selected at random from that population fails to contain any members of that minority at least 28% of the time.
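The 28% figure can be checked directly. The sketch below assumes that jurors are drawn independently from a population large enough that each draw has the same chance of selecting a minority member; the function name `prob_no_minority` is a hypothetical helper, not something from the book.

```python
def prob_no_minority(minority_share, jury_size=12):
    """Chance that a randomly drawn jury contains no minority member.

    Assumes independent draws from a large population, so each juror
    is a non-minority member with probability 1 - minority_share.
    """
    return (1 - minority_share) ** jury_size


print(round(prob_no_minority(0.10), 3))  # 0.282
```

With a 10% minority the probability is 0.9 raised to the 12th power, about 0.28. A smaller minority share makes the probability larger, which is why the text says "at least 28%" for groups of 10% or less.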
As a sample grows larger, or as we combine more clusters within a single sample, the sample will resemble more closely the population from which it is drawn.
How large a sample must be drawn to obtain a sufficient degree of closeness will depend upon the manner in which the sample is chosen from the population.
Are the elements of the sample drawn at random, so that each unit in the population has an equal probability of being selected? Are the elements of the sample drawn independently of one another? If either of these criteria is not satisfied, then even a very large sample may bear little or no relation to the population from which it was drawn.
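A rough simulation shows why sample size cannot compensate for non-random selection. Everything here is an assumption made for illustration: the population of 100,000 invented "activity scores", the selection rule that only units scoring above 50 can be chosen, and the two sample sizes.

```python
import random
from statistics import mean

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 activity scores (invented data).
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = mean(population)

# Small simple random sample: every unit is equally likely to be chosen.
random_sample = rng.sample(population, 100)

# Large non-random sample: only the more active units can be selected.
biased_pool = [x for x in population if x > 50]
biased_sample = rng.sample(biased_pool, 20_000)

random_error = abs(mean(random_sample) - true_mean)
biased_error = abs(mean(biased_sample) - true_mean)

print("population mean:", round(true_mean, 2))
print("error, random sample of 100:", round(random_error, 2))
print("error, biased sample of 20,000:", round(biased_error, 2))
```

The random sample of 100 lands near the population mean, while the biased sample of 20,000 misses it by a wide margin, because enlarging a sample reduces random error but does nothing about selection bias.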
An obvious example is the use of recruits from a Marine boot camp as representatives of the population as a whole or even as representatives of all Marines. In fact, any group or cluster of individuals who live, work, study, or pray together may fail to be representative for any or all of the following reasons [Cummings and Koepsell, 2002]:
1. Shared exposure to the same physical or social environment.
2. Self-selection in belonging to the group.
3. Sharing of behaviors, ideas, or diseases among members of the group.
A sample consisting of the first few animals to be removed from a cage will not satisfy these criteria either, because, depending on how we grab, we are more likely to select more active or more passive animals. Activity tends to be associated with higher levels of corticosteroids, and corticosteroids are associated with virtually every body function.
Sample bias is ...

Table of contents

  1. Cover
  2. Title
  3. Copyright
  4. PREFACE
  5. PART I: FOUNDATIONS
  6. PART II: STATISTICAL ANALYSIS
  7. PART III: REPORTS
  8. PART IV: BUILDING A MODEL
  9. GLOSSARY, GROUPED BY RELATED BUT DISTINCT TERMS
  10. BIBLIOGRAPHY
  11. AUTHOR INDEX
  12. SUBJECT INDEX