Statistical Power Analysis for the Behavioral Sciences
eBook - ePub

  1. 567 pages
  2. English

About this book

Statistical Power Analysis is a nontechnical guide to power analysis in research planning that provides users of applied statistics with the tools they need for more effective analysis. The Second Edition includes:
* a chapter covering power analysis in set correlation and multivariate methods;
* a chapter considering effect size, psychometric reliability, and the efficacy of "qualifying" dependent variables; and
* expanded power and sample size tables for multiple regression/correlation.

Chapter 1
The Concepts of Power Analysis
The power of a statistical test is the probability that it will yield statistically significant results. Since statistical significance is so earnestly sought and devoutly wished for by behavioral scientists, one would think that the a priori probability of its accomplishment would be routinely determined and well understood. Quite surprisingly, this is not the case. Instead, if we take the research literature as evidence, we find that statistical power is frequently not understood and, in reports of research where it is clearly relevant, the issue is not addressed.
The purpose of this book is to provide a self-contained comprehensive treatment of statistical power analysis from an “applied” viewpoint. The purpose of this chapter is to present the basic conceptual framework of statistical hypothesis testing, giving emphasis to power, followed by the framework within which this book is organized.
1.1 GENERAL INTRODUCTION
When the behavioral scientist has occasion to don the mantle of the applied statistician, the probability is high that it will be for the purpose of testing one or more null hypotheses, i.e., “the hypothesis that the phenomenon to be demonstrated is in fact absent [Fisher, 1949, p. 13].” Not that he hopes to “prove” this hypothesis. On the contrary, he typically hopes to “reject” this hypothesis and thus “prove” that the phenomenon in question is in fact present.
Let us acknowledge at the outset the necessarily probabilistic character of statistical inference, and dispense with the mocking quotation marks about words like reject and prove. This may be done by requiring that an investigator set certain appropriate probability standards for research results which provide a basis for rejection of the null hypothesis and hence for the proof of the existence of the phenomenon under test. Results from a random sample drawn from a population will only approximate the characteristics of the population. Therefore, even if the null hypothesis is, in fact, true, a given sample result is not expected to mirror this fact exactly. Before sample data are gathered, therefore, the investigator selects some prudently small value α (say .01 or .05), so that he may eventually be able to say about his sample data, “If the null hypothesis is true, the probability of the obtained sample result is no more than α,” i.e., a statistically significant result. If he can make this statement, since α is small, he is said to have rejected the null hypothesis “with an α significance criterion” or “at the α significance level.” If, on the other hand, he finds the probability to be greater than α, he cannot make the above statement and he has failed to reject the null hypothesis, or, equivalently, finds it “tenable,” or “accepts” it, all at the α significance level. Note that α is set in advance.
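The point that a true null hypothesis will still be rejected at a rate set by the significance level can be checked by simulation. The following is a minimal sketch, not a procedure from the book: it assumes a z test on the mean of a normal population with known standard deviation, counts departures in either direction as evidence, and uses hypothetical function names and default settings chosen only for illustration.

```python
import random
from statistics import NormalDist, mean


def two_sided_p_value(sample, null_mean=0.0, sigma=1.0):
    """p-value of a z test for the mean (sigma assumed known),
    counting departures from the null in either direction."""
    z = (mean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_rejection_rate(alpha=0.05, n=25, trials=10000, seed=0):
    """Draw many samples from a population in which the null hypothesis
    is TRUE and return the fraction declared statistically significant."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p_value(sample) <= alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    # alpha is fixed before any data are generated, as the text requires.
    print(f"observed false rejection rate: {false_rejection_rate():.3f}")
```

Under these assumptions the observed rate should land close to the chosen significance level, which is what it means for that level to be the risk of falsely rejecting the null hypothesis.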
We have thus isolated one element of this form of statistical inference, the standard of proof that the phenomenon exists, or, equivalently, the standard of disproof of the null hypothesis that states that the phenomenon does not exist.
Another component of the significance criterion concerns the exact definition of the nature of the phenomenon’s existence. This depends on the details of how the phenomenon is manifested and statistically tested, e.g., the directionality/nondirectionality (“one tailed”/“two tailed”) of the statement of the alternative to the null hypothesis. When, for example, the investigator is working in a context of comparing some parameter (e.g., mean, proportion, correlation coefficient) for two populations A and B, he can define the existence of the phenomenon in two different ways:
1. The phenomenon is taken to exist if the parameters of A and B differ. No direction of the difference, such as A larger than B, is specified, so that departures in either direction from the null hypothesis constitute evidence against it. Because either tail of the sampling distribution of differences may contribute to α, this is usually called a two-tailed or two-sided test.
2. The phenomenon is taken to exist only if the parameters of A and B differ in a direction specified in advance, e.g., A larger than B. In this circumstance, departures from the null hypothesis only in the direction specified constitute evidence against it. Because only one tail of the sampling distribution of differences may contribute to α, this is usually called a one-tailed or one-sided test.
It is convenient to conceive of the significance criterion as embodying both the probability of falsely rejecting the null hypothesis, α, and the “sidedness” of the definition of the existence of the phenomenon (when relevant). Thus, the significance criterion on a two-tailed test of the null hypothesis at the .05 significance level, which will be symbolized as α₂ = .05, says two things: (a) that the phenomenon whose existence is at issue is understood to be manifested by any difference between the two populations’ parameter values, and (b) that the standard of proof is a sample result that would occur less than 5% of the time if the null hypothesis is true. Similarly, a prior specification defining the phenomenon under study as that for which the parameter value for A is larger than that of B (i.e., one-tailed) and the probability of falsely rejecting the null is set at .10 would be symbolized as a significance criterion of α₁ = .10. The combination of the probability and the sidedness of the test into a single entity, the significance criterion, is convenient because this combination defines in advance the “critical region,” i.e., the range of values of the outcome which leads to rejection of the null hypothesis and, perforce, the range of values which leads to its nonrejection. Thus, when an investigator plans a statistical test at some given significance criterion, say α₁ = .10, he has effected a specific division of all the possible results of his study into those which will lead him to conclude that the phenomenon exists (with risk α no greater than .10 and a one-sided definition of the phenomenon) and those which will not make possible that conclusion.
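To make the idea of a critical region concrete, the sketch below computes the region for the two criteria used as examples in the text, a two-tailed .05 test and a one-tailed .10 test. It assumes the test statistic is a standard normal z and that the one-tailed direction is "A larger than B" (the upper tail); the function names are hypothetical and introduced only for illustration.

```python
from statistics import NormalDist


def critical_region(alpha, tails):
    """Return (lower_cut, upper_cut) for a standard normal test statistic.

    A cut of None means that tail is not part of the critical region.
    For tails=1 the specified direction is assumed to be the upper tail.
    """
    std_normal = NormalDist()
    if tails == 2:
        # alpha is split evenly between the two tails.
        cut = std_normal.inv_cdf(1 - alpha / 2)
        return (-cut, cut)
    if tails == 1:
        # All of alpha sits in the single specified tail.
        return (None, std_normal.inv_cdf(1 - alpha))
    raise ValueError("tails must be 1 or 2")


def rejects_null(z, region):
    """True if the observed z falls inside the critical region."""
    lower, upper = region
    return (lower is not None and z <= lower) or z >= upper


if __name__ == "__main__":
    two_tailed = critical_region(0.05, tails=2)
    one_tailed = critical_region(0.10, tails=1)
    for z in (-2.5, 1.5, 2.5):
        print(z, rejects_null(z, two_tailed), rejects_null(z, one_tailed))
```

The same observed statistic can fall inside one region and outside the other, which is why the probability and the sidedness have to be fixed together before the data are examined.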
The above review of the logic of classical statistical inference reduces to a null hypothesis and a significance criterion which defines the circumstances which will lead to its rejection or nonrejection. Observe that the significance criterion embodies the risk of mistakenly rejecting a null hypothesis. The entire discussion above is conditional on the truth of the null hypothesis.
But what if, indeed, the phenomenon does exist and the null hypothesis is false? This is the usual expectation of the investigator, who has stated the null hypothesis for tactical purposes so that he may reject it and conclude that the phenomenon exists. But, of course, the fact that the phenomenon exists in the population far from guarantees a statistically significant result, i.e., one which warrants the conclusion that it exists, for this conclusion depends upon meeting the agreed-upon standard of proof (i.e., significance criterion). It is at this point that the concept of statistical power must be considered.
The power of a statistical test of a null hypothesis is the probability that it will lead to the rejection of the null hypothesis, i.e., the probability that it will result in the conclusion that the phenomenon exists. Given the characteristics of a specific statistical test of the null hypothesis and the state of affairs in the population, the power of the test can be determined. It clearly represents a vital piece of information about a statistical test applied to research data (cf. Cohen, 1962). For example, the discovery, during the planning phase of an investigation, that the power of the eventual statistical test is low should lead to a revision in the plans. As another example, consider a completed experiment which led to nonrejection of the null hypothesis. An analysis which finds that the power was low should lead one to regard the negative results as ambiguous, since failure to reject the null hypothesis cannot have much substantive meaning when, even though the phenomenon exists (to some given degree), the a priori probability of rejecting the null hypothesis was low. A detailed consideration of the use of power analysis in planning investigations and assessing completed investigations is reserved for later sections.
The power of a statistical test depends upon three parameters: the significance criterion, the reliability of the sample results, and the “effect size,” that is, the degree to which the phenomenon exists.
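As a rough illustration of how these three parameters jointly determine power, the sketch below uses a normal approximation for a one-sample z test with known standard deviation. This is an assumption made for simplicity, not the procedure or the tables of the book: sample size stands in for the reliability of the sample results, the effect size is expressed as a mean shift in standard deviation units, and the function name and arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(effect_size, n, alpha, tails=2):
    """Approximate power of a one-sample z test (sigma assumed known).

    effect_size: true mean shift in standard deviation units
                 (for tails=1, assumed to lie in the tested direction).
    n:           sample size, standing in for reliability of the results.
    alpha:       probability of falsely rejecting a true null hypothesis.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Expected value of the z statistic when the effect really exists.
    shift = effect_size * sqrt(n)
    if tails == 1:
        cut = STD_NORMAL.inv_cdf(1 - alpha)
        return 1 - STD_NORMAL.cdf(cut - shift)
    cut = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability of landing in either tail of the critical region.
    return (1 - STD_NORMAL.cdf(cut - shift)) + STD_NORMAL.cdf(-cut - shift)


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, round(z_test_power(effect_size=0.5, n=n, alpha=0.05), 3))
```

With the effect size set to zero the function returns the significance level itself, and raising either the sample size or the effect size raises the returned power, matching the dependence described above.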
1.2 SIGNIFICANCE CRITERION
The role of this parameter in testing null hypotheses has already been given some consideration. As noted above, the significance criterion represents the standard of proof that the phenomenon exists, or the risk of mistakenly rejecting the null hypothesis. As used here, it directly implies the “critical region of rejection” of the null hypothesis, since it embodies both the probability of a class of results given that the null hypothesis is true (α), as well as the definition of the phenomenon’s existence with regard to directionality. For power to be defined, its value must be set in advance.
The significance level, α, has been variously called the error of the first kind, the Type I error, and the alpha error. Since it is the rate of rejecting a true null hypothesis, it is taken as a relatively small value. It follows then that the smaller the value, the more rigorous the standard of null hypothesis rejection or, equivalently, of proof of the phenomenon’s existence. Assume that a phenomenon exists in the population to some given degree. Other things equal, the more stringent the standard for proof, i.e., the lower the value of α, the poorer the chances are that the sample will provide results which meet this standard, i.e., the lower the power. Concretely, if an investigator is prepared to run only a 1% risk of false rejection of the null hypothesis, the probability of his data meeting this standard is lower than would be the case were he prepared to use the less stringent standard of a 10% risk of false rejection.
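The trade-off just described can be seen numerically by holding everything else fixed and varying only the significance level. The sketch below assumes a one-tailed z test with known standard deviation; the effect size of 0.3 standard deviations and the sample size of 50 are arbitrary illustrative values, not figures from the text, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_power(alpha, effect_size=0.3, n=50):
    """Approximate power of a one-tailed z test (sigma assumed known)
    when the effect exists in the tested direction."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)  # stricter alpha -> larger cut
    return 1 - std_normal.cdf(critical_z - effect_size * sqrt(n))


if __name__ == "__main__":
    # Same population effect and same sample size; only alpha changes.
    for alpha in (0.10, 0.05, 0.01, 0.001):
        print(f"alpha = {alpha:<6} power = {one_tailed_power(alpha):.2f}")
```

Under these assumptions the power falls steadily as the significance level is made more stringent, so the 1% standard yields a noticeably lower chance of detecting the same effect than the 10% standard.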
The practice of taking α very small (“the smaller the better”) then results in power values being relatively small. However, the complement of the power (1 – power), here symbolized as β, is also error, called Type II or beta error, since it represents the “error” rate of failing to reject a false null hypothesis. Thus it is seen that statistical inference can be viewed as weighing, in a manner relevant to the substantive issues of an investigation, these two kinds of errors. An investigator can set the risk of false null hypothesis rejection at a vanishingly small level, say α = .001, but in so doing, he may reduce the power of his test to .10 (hence beta error probability, β, is 1 – .10 = .90). Two comments may be made here:
1. The general neglect of issues of statistical power in behavioral science may well result, in such instances, in the investigator’s failing to realize that the α = .001 val...

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright
  5. Dedication
  6. Contents
  7. Preface to the Second Edition
  8. Preface to the Revised Edition
  9. Preface to the Original Edition
  10. Chapter 1. The Concepts of Power Analysis
  11. Chapter 2. The t Test for Means
  12. Chapter 3. The Significance of a Product Moment rs
  13. Chapter 4. Differences between Correlation Coefficients
  14. Chapter 5. The Test that a Proportion is .50 and the Sign Test
  15. Chapter 6. Differences between Proportions
  16. Chapter 7. Chi-Square Tests for Goodness of Fit and Contingency Tables
  17. Chapter 8. The Analysis of Variance and Covariance
  18. Chapter 9. Multiple Regression and Correlation Analysis
  19. Chapter 10. Set Correlation and Multivariate Methods
  20. Chapter 11. Some Issues in Power Analysis
  21. Chapter 12. Computational Procedures
  22. References
  23. Index

Yes, you can access Statistical Power Analysis for the Behavioral Sciences by Jacob Cohen in PDF and/or ePUB format, as well as other popular books in Psychology & History & Theory in Psychology. We have over one million books available in our catalogue for you to explore.