Medical Uses of Statistics

About this book

A new edition of the classic guide to the use of statistics in medicine, featuring examples from articles in the New England Journal of Medicine

Medical Uses of Statistics has served as one of the most influential works on the subject for physicians, physicians-in-training, and a myriad of healthcare experts who need a clear idea of the proper application of statistical techniques in clinical studies as well as the implications of their interpretation for clinical practice. This Third Edition maintains the focus on the critical ideas, rather than the mechanics, to give practitioners and students the resources they need to understand the statistical methods they encounter in modern medical literature.

Bringing together contributions from more than two dozen distinguished statisticians and medical doctors, this volume stresses the underlying concepts in areas such as randomized trials, survival analysis, genetics, linear regression, meta-analysis, and risk analysis. The Third Edition includes:

  • Numerous examples based on studies taken directly from the pages of the New England Journal of Medicine
  • Two added chapters on statistics in genetics
  • Two new chapters on the application of statistical methods to studies in epidemiology
  • New chapters on analyses of randomized trials, linear regression, categorical data analysis, meta-analysis, subgroup analyses, and risk analysis
  • Updated chapters on statistical thinking, crossover designs, p-values, survival analysis, and reporting research results
  • A focus on helping readers to critically interpret published results of clinical research

Medical Uses of Statistics, Third Edition is a valuable resource for researchers and physicians working in any health-related field. It is also an excellent supplemental book for courses on medicine, biostatistics, and clinical research at the upper-undergraduate and graduate levels.


Information

Edition: 3
By John C. Bailar and David C. Hoaglin
SECTION III
Analysis
CHAPTER 8
p-Values
JAMES H. WARE, PH.D., FREDERICK MOSTELLER, PH.D., FERNANDO DELGADO, M.S., CHRISTL DONNELLY, D.SC., AND JOSEPH A. INGELFINGER, M.D.
ABSTRACT Many scientific studies use p-values to measure the strength of statistical evidence. They indicate the probability that a result at least as extreme as that observed would occur by chance. Although p-values are a way of reporting the results of statistical tests, they do not define the practical importance of the results. They depend upon a test statistic, a null hypothesis, and an alternative hypothesis. Multiple tests and selection of subgroups, outcomes, or variables for analysis can yield misleading p-values. Full reporting and statistical adjustment can help avoid these misleading values. Negative studies with low statistical power can lead to unjustified conclusions about the lack of effectiveness of medical interventions.
We discuss the role and use of p-values in scientific reporting and review the use of p-values in a sample of 25 articles from Volume 316 of the New England Journal of Medicine. We recommend that investigators report (1) summary statistics for the data, (2) the actual p-value rather than a range, (3) whether a test is one-sided or two-sided, (4) confidence intervals, (5) the effects of selection or multiplicity, and (6) the power of tests that describe nonsignificant comparisons.
Readers of the medical literature encounter p-values and associated tests of significance more often than any other statistical technique. In their review of Volumes 350 through 352 of the New England Journal of Medicine, Horton and Switzer1 found that 80 (26%) of the 311 articles classified as Original Articles used the t-test, and 166 (53%) used methods for the analysis of contingency tables. Other articles used p-values in association with nonparametric tests, in life-table analyses, and in the study of regression and correlation coefficients. Only 39 articles (13%) employed no statistical analyses. Of 91 Original Articles published in Volume 350 (excluding Brief Reports), 82 (90%) reported one or more p-values.
Because p-values play a central role in medical reporting, medical investigators and clinicians need to understand their origins, the pitfalls they present, and controversies about their use. Calculating p-values requires making assumptions about the data, and analyses involving the calculation of many p-values can be misleading. Although p-values can be useful as an aid to reporting, they are most informative when they are reported with descriptive information about study results.
Investigators compute p-values from the measurements provided by the sample of participants included in a scientific study and use these values to draw conclusions about the population from which the observations were drawn. For example, in a study of patients undergoing coronary bypass surgery, Mangano et al.2 reported that mortality was 1.3% (40 of 2999) among patients who received aspirin within 48 hours after revascularization and 4.0% (81 of 2023) among those who did not receive aspirin during this period (p < 0.001). The authors concluded from the small p-value that patients who received aspirin during this period were at reduced risk, as compared with patients who did not. This example illustrates a major use of p-values: to determine whether an observed effect can be explained by chance, i.e., random variation in patient outcomes. The specific meaning of the statement “p < 0.001” is that the observed difference or a more extreme difference in mortality rates would occur with probability less than 0.001 if the true mortality rates were identical for patients who did and who did not receive aspirin within 48 hours after revascularization. We return to these concepts below.
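To make the arithmetic behind such a comparison concrete, the following minimal sketch (ours, not the authors' original analysis) applies a chi-squared test to the 2×2 table of deaths and survivors reported above, using Python and scipy.

```python
# Minimal sketch: chi-squared test for the aspirin example above
# (40 deaths among 2999 aspirin recipients vs. 81 among 2023 non-recipients).
# Illustrative recomputation only; not the authors' original analysis code.
from scipy.stats import chi2_contingency

table = [
    [40, 2999 - 40],  # aspirin group: deaths, survivors
    [81, 2023 - 81],  # no-aspirin group: deaths, survivors
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.1f}, df = {dof}, p = {p:.1e}")  # p is far below 0.001
```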
The frequent reporting of p-values attests to a wide belief in their usefulness in communicating scientific results. Moreover, the p-value associated with the primary results of a scientific study can be a factor in an editorial decision about publishing those results.3 Thus, clear and consistent practices in the calculation and citation of p-values are important elements of good scientific reporting. The interpretation of a p-value can depend substantially on the design of the study, the method of collecting data, and the analytic practices used. Scientific reports frequently underemphasize or omit information that readers need to assess an author’s conclusions. As a result, readers sometimes misunderstand the importance of either a highly significant or a nonsignificant p-value.
This chapter describes the basic ideas in a p-value calculation. The actual calculation of p-values is discussed in many textbooks.4–7 We also discuss the value of confidence intervals as a complement to p-values, review controversies regarding the use of p-values in medical reporting, and recommend six specific practices for the reporting of p-values.
To sharpen our understanding of the current role of p-values in medical reporting, we reviewed the use of p-values and related statistical information in 25 Original Articles selected over 20 years ago from the New England Journal of Medicine (Volume 316, January–June 1987). In discussions of controversies and recommendations regarding current use of p-values, we discuss how each issue was managed in these 25 articles and in articles chosen from recent literature.
WHAT ARE p-VALUES?
p-values are used to assess the degree of dissimilarity between two or more sets of measurements or between one set of measurements and a standard. A p-value is a probability, usually the probability of obtaining a result as extreme as, or more extreme than, the one observed if the dissimilarity is entirely due to random variation in measurements or in subject response—that is, if it is the result of chance alone. For example, Yanovski et al.,8 in reporting a study of holiday weight gain, state that “the perceived weight gain (1.57 ± 1.47 kg) was significantly greater than the measured weight gain by an average of 1.12 ± 1.79 kg (p < 0.001 by paired t-test).” Here, the p-value indicates that an average difference between the measured and perceived weight gains of study participants at least as great as that observed would occur with probability less than 0.001, or in less than 1 in 1000 trials, if there were no true difference between the average measured and perceived weight gain in the population represented by the study participants and if the assumed probability model were correct. The p-value depends implicitly on three elements: the test statistic, the null hypothesis, and the alternative hypothesis.
The Test Statistic
To summarize the dissimilarity between two sets of data, we choose a statistic that reflects the differences likely to be caused by the treatment or condition under study, such as the difference in means, which may use a t-statistic, or a difference in death rates, which may use a chi-squared statistic. Yanovski et al.8 determined the perceived and measured weight gain for each study participant and calculated the paired t-statistic from these changes to test for a difference between perceived and actual weight gain. In the study of cardiac surgery cited earlier, Mangano and colleagues used a different statistic, the chi-squared test for association between death and receipt of aspirin, which they computed from the numbers of subjects and deaths in each of the two groups.
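The paired t-statistic itself is easy to compute from summary statistics. The sketch below uses the mean difference (1.12 kg) and standard deviation (1.79 kg) quoted above; the sample size n is a placeholder we introduce for illustration, since the excerpt does not report it.

```python
# Sketch: a paired t-statistic computed from summary statistics.
# mean_diff and sd_diff come from the Yanovski et al. example quoted above;
# n is a hypothetical sample size (assumption), since it is not given here.
from math import sqrt
from scipy.stats import t

mean_diff = 1.12   # mean of (perceived - measured) weight gain, kg
sd_diff = 1.79     # standard deviation of those paired differences, kg
n = 200            # hypothetical number of participants

t_stat = mean_diff / (sd_diff / sqrt(n))        # paired t-statistic
p_two_sided = 2 * t.sf(abs(t_stat), df=n - 1)   # two-sided p-value
print(f"t = {t_stat:.2f}, two-sided p = {p_two_sided:.1e}")
```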
The p-value properly reflects the relative frequency of getting values of the statistic as extreme as the one observed when the pattern of results is due entirely to random variation. When systematic effects are present, such as treatment increasing length of life or increasing probability of survival in a surgical operation, the statistic should tend to reflect this by being more likely to produce extreme, usually small, p-values. When small p-values occur, they signal either that a rare random event has occurred or that a systematic effect is present. In a well-designed study, the investigator usually finds the systematic effect more plausible.
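The phrase "relative frequency" can be made literal by simulation. In the sketch below (our construction, not part of the original chapter), we generate many replicates of the aspirin example under the null hypothesis of a common death rate and count how often the simulated difference in mortality rates is at least as extreme as the observed one.

```python
# Sketch: the p-value as a relative frequency under the null hypothesis,
# illustrated by simulating the aspirin example with a pooled death rate.
# The simulation design (pooled rate, 100,000 replicates) is ours.
import numpy as np

rng = np.random.default_rng(0)
n1, d1 = 2999, 40                     # aspirin group: size, deaths
n2, d2 = 2023, 81                     # no-aspirin group: size, deaths
observed_diff = d2 / n2 - d1 / n1     # observed difference in death rates
pooled_rate = (d1 + d2) / (n1 + n2)   # common rate if the null were true

reps = 100_000
sim_d1 = rng.binomial(n1, pooled_rate, reps)
sim_d2 = rng.binomial(n2, pooled_rate, reps)
sim_diff = np.abs(sim_d2 / n2 - sim_d1 / n1)

# Proportion of null-hypothesis replicates at least as extreme as the data;
# for a difference this large it is essentially zero, consistent with p < 0.001.
p_sim = np.mean(sim_diff >= observed_diff)
print(f"simulated two-sided p-value ≈ {p_sim:.5f}")
```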
The Null Hypothesis
The information needed for the calculation of the p-value comes from expressing a scientific hypothesis in probabilistic terms. In the analysis mentioned above from the study by Yanovski et al., the scientific hypothesis to be tested statistically (though not necessarily what the authors expected or hoped to find) was that the true average perceived holiday weight gain was equal to the true average measured weight gain in the study population. Such a hypothesis is called a null hypothesis, the “null” implying that no effect beyond random variation is present. If this null hypothesis were true, the mean difference between the perceived and measured weight gain would vary around zero in repeated sampling.
Although the null hypothesis is central to the calculation of a p-value, only 4 of the 25 articles we reviewed from Volume 316 of the New England Journal of Medicine stated the null hypothesis explicitly. In most articles, the null hypothesis was defined implicitly by statements such as “calorie-adjusted potassium intake [was] significantly lower in the men and women who subsequently had a stroke-associated death, as compared with all other subjects (p = 0.06).”9 Although such implicit definitions could, in principle, be ambiguous, we found no instances of such ambiguities in the 25 articles we reviewed.
One might ask why small or large p-values give us any feeling about the truth of the null hypothesis. After all, if chance alone were at work, a small p-value would merely tell us that a rare event has occurred, just as a large p-value goes with a likely event. This argument calls attention to another concept needed to understand p-values: the alternative hypothesis.
The Alternative Hypothesis
When investigators report p-values, they have an underlying, but usually unstated, notion that the null hypothesis may be false and that some other situation may be the true one. For example, Yanovski et al. had an alternative hypothesis in mind—that study participants would not accurately estimate their weight gain. (Because this alternative hypothesis includes all of the amounts by which study participants might underestimate or overestimate their weight gain, statisticians sometimes speak of alternative hypotheses.) Statistical tests are designed so that small p-values are more likely to occur when the null hypothesis is false.
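In symbols, with μd denoting the true mean difference between perceived and measured weight gain (notation we introduce here for illustration), the two hypotheses in the Yanovski example can be written as:

```latex
H_0 : \mu_d = 0 \qquad \text{versus} \qquad H_A : \mu_d \neq 0
```

The two-sided form of the alternative corresponds to the parenthetical remark above: participants might either underestimate or overestimate their weight gain.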
Thus, we reject the null hypothesis not only because the probability of such an event is low when there is no effect, but also because the probability is greater when there is an effect. These two ideas together encourage us to believe that an alternative hypothesis is true when the test of the null hypothesis yields a small p-value for the observed effect.
Notice the similarity between this way of thinking and proof by contradiction. Logicians state as a premise whatever they intend to disprove. If a valid argument from that point leads to a contradiction, the premise in question must be false. In statistics, we follow this approach, but instead of reaching an absolute contradiction, we may observe an improbable outcome.10 We must then conclude either that the null hypothesis is correct and the improbable has happened, or that the null hypothesis is false. Deciding between these two possibilities can be difficult. One question that arises is how small the probability should be before we can conclude that the null hypothesis is mistaken.
Authors rarely state the alternative hypothesis explicitly. In our review of 25 articles from Volume 316 of the New England Journal of Medicine, we found no instances in which authors did so. There is greater potential for ambiguity with implicit specification of alternative hypotheses than with null hypotheses. Nevertheless, we could infer the alternative hypothesis in each of the 25 articles we reviewed.
THE 0.05 AND 0.01 SIGNIFICANCE LEVELS
The p-value measures surprise. The smaller the p-value, the more surprising the result if the null hypothesis is true. Sometimes a very rough idea of the degree of surprise suffices, and various simplifying conventions have come into use. One popular approach is to indicate only that the p-value is smaller than 0.05 (p < 0.05) or smaller than 0.01 (p < 0.01). When the p-value is between 0.05 and 0.01, the result is usually called “statistically significant”; when it is less than 0.01, the result is often called “highly statistically significant.” This standardization of wording has both advantages and disadvantages. Its main advantage is that it gives investigators a specific, objectively chosen level to keep in mind. In the past it was sometimes easier to determine whether a p-value was smaller than or larger than 0.05 than it was to compute the exact probability, but powerful desktop statistical software has made this circumstance rare. The main disadvantage of this wording is that it suggests a rather mindless cut-off point, which has nothing to do with the importance of the decision to be made or with the costs and losses associated with the outcomes.
The 0.05 level was popularized in part through its use in quality-control work, where the emphasis is on the performance of a decision rule in repeated testing. This viewpoint carries over reasonably well to the relatively small and frequently repeated studies of process and mechanism that represent the building blocks of scientific understanding. In the large, expensive clinical trials and descriptive studies that are increasingly common in modern science, however, the protection provided by repetition is rarely available.
Confidence Intervals
Some methodologists argue that medical reports rely too heavily on p-values, especially when the p-value is the only statistical information reported.11–13 Because no single study determines scientific opinion on a subject, it is incumbent upon the investigator t...

Table of contents

  1. Cover Page
  2. Title Page
  3. Copyright
  4. Dedication
  5. Contributors
  6. Preface
  7. Preface To The Second Edition
  8. Preface To The First Edition
  9. Acknowledgments
  10. Origins Of Chapters
  11. Introduction
  12. SECTION I: Broad Concepts And Analytic Techniques
  13. SECTION II: Design
  14. SECTION III: Analysis
  15. SECTION IV: Communicating Results
  16. SECTION V: Specialized Methods
  17. INDEX