Statistical Power Analysis for the Behavioral Sciences

Jacob Cohen
About This Book

Statistical Power Analysis is a nontechnical guide to power analysis in research planning that provides users of applied statistics with the tools they need for more effective analysis. The Second Edition includes:
* a chapter covering power analysis in set correlation and multivariate methods;
* a chapter considering effect size, psychometric reliability, and the efficacy of "qualifying" dependent variables; and
* expanded power and sample size tables for multiple regression/correlation.


Information

Publisher: Routledge
Year: 2013
ISBN: 9781134742776
Chapter 1
The Concepts of Power Analysis
The power of a statistical test is the probability that it will yield statistically significant results. Since statistical significance is so earnestly sought and devoutly wished for by behavioral scientists, one would think that the a priori probability of its accomplishment would be routinely determined and well understood. Quite surprisingly, this is not the case. Instead, if we take the research literature as evidence, we find that statistical power is frequently not understood and, in reports of research where it is clearly relevant, the issue is not addressed.
The purpose of this book is to provide a self-contained comprehensive treatment of statistical power analysis from an “applied” viewpoint. The purpose of this chapter is to present the basic conceptual framework of statistical hypothesis testing, giving emphasis to power, followed by the framework within which this book is organized.
1.1 GENERAL INTRODUCTION
When the behavioral scientist has occasion to don the mantle of the applied statistician, the probability is high that it will be for the purpose of testing one or more null hypotheses, i.e., “the hypothesis that the phenomenon to be demonstrated is in fact absent [Fisher, 1949, p. 13].” Not that he hopes to “prove” this hypothesis. On the contrary, he typically hopes to “reject” this hypothesis and thus “prove” that the phenomenon in question is in fact present.
Let us acknowledge at the outset the necessarily probabilistic character of statistical inference, and dispense with the mocking quotation marks about words like reject and prove. This may be done by requiring that an investigator set certain appropriate probability standards for research results which provide a basis for rejection of the null hypothesis and hence for the proof of the existence of the phenomenon under test. Results from a random sample drawn from a population will only approximate the characteristics of the population. Therefore, even if the null hypothesis is, in fact, true, a given sample result is not expected to mirror this fact exactly. Before sample data are gathered, therefore, the investigator selects some prudently small value a (say .01 or .05), so that he may eventually be able to say about his sample data, "If the null hypothesis is true, the probability of the obtained sample result is no more than a," i.e., a statistically significant result. If he can make this statement, since a is small, he is said to have rejected the null hypothesis "with an a significance criterion" or "at the a significance level." If, on the other hand, he finds the probability to be greater than a, he cannot make the above statement and he has failed to reject the null hypothesis, or, equivalently, finds it "tenable," or "accepts" it, all at the a significance level. Note that a is set in advance.
We have thus isolated one element of this form of statistical inference, the standard of proof that the phenomenon exists, or, equivalently, the standard of disproof of the null hypothesis that states that the phenomenon does not exist.
Another component of the significance criterion concerns the exact definition of the nature of the phenomenon’s existence. This depends on the details of how the phenomenon is manifested and statistically tested, e.g., the directionality/nondirectionality (“one tailed”/”two tailed”) of the statement of the alternative to the null hypothesis.1 When, for example, the investigator is working in a context of comparing some parameter (e.g., mean, proportion, correlation coefficient) for two populations A and B, he can define the existence of the phenomenon in two different ways:
1. The phenomenon is taken to exist if the parameters of A and B differ. No direction of the difference, such as A larger than B, is specified, so that departures in either direction from the null hypothesis constitute evidence against it. Because either tail of the sampling distribution of differences may contribute to a, this is usually called a two-tailed or two-sided test.
2. The phenomenon is taken to exist only if the parameters of A and B differ in a direction specified in advance, e.g., A larger than B. In this circumstance, departures from the null hypothesis only in the direction specified constitute evidence against it. Because only one tail of the sampling distribution of differences may contribute to a, this is usually called a one-tailed or one-sided test.
It is convenient to conceive of the significance criterion as embodying both the probability of falsely rejecting the null hypothesis, a, and the "sidedness" of the definition of the existence of the phenomenon (when relevant). Thus, the significance criterion on a two-tailed test of the null hypothesis at the .05 significance level, which will be symbolized as a2 = .05, says two things: (a) that the phenomenon whose existence is at issue is understood to be manifested by any difference between the two populations' parameter values, and (b) that the standard of proof is a sample result that would occur less than 5% of the time if the null hypothesis is true. Similarly, a prior specification defining the phenomenon under study as that for which the parameter value for A is larger than that of B (i.e., one-tailed) and the probability of falsely rejecting the null set at .10 would be symbolized as a significance criterion of a1 = .10. The combination of the probability and the sidedness of the test into a single entity, the significance criterion, is convenient because this combination defines in advance the "critical region," i.e., the range of values of the outcome which leads to rejection of the null hypothesis and, perforce, the range of values which leads to its nonrejection. Thus, when an investigator plans a statistical test at some given significance criterion, say a1 = .10, he has effected a specific division of all the possible results of his study into those which will lead him to conclude that the phenomenon exists (with risk a no greater than .10 and a one-sided definition of the phenomenon) and those which will not make possible that conclusion.2
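The critical regions implied by these two criteria can be sketched in a few lines of Python. This is an illustration not found in the text; it assumes a test statistic that is approximately standard normal under the null hypothesis.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Two-tailed criterion a2 = .05: either tail contributes to a,
# so a/2 is placed in each tail; reject when |z_obs| exceeds the cutoff.
a2 = 0.05
two_tail_cut = z.inv_cdf(1 - a2 / 2)   # about 1.96

# One-tailed criterion a1 = .10: all of a sits in the specified tail;
# reject only when z_obs exceeds the cutoff in that direction.
a1 = 0.10
one_tail_cut = z.inv_cdf(1 - a1)       # about 1.28

print(f"a2 = .05 rejects when |z| > {two_tail_cut:.2f}")
print(f"a1 = .10 rejects when  z > {one_tail_cut:.2f}")
```

Note that the one-tailed cutoff at a1 = .10 is smaller than the two-tailed cutoff at a2 = .05, which is why the choice of sidedness is part of the criterion itself.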
The above review of the logic of classical statistical inference reduces to a null hypothesis and a significance criterion which defines the circumstances which will lead to its rejection or nonrejection. Observe that the significance criterion embodies the risk of mistakenly rejecting a null hypothesis. The entire discussion above is conditional on the truth of the null hypothesis.
But what if, indeed, the phenomenon does exist and the null hypothesis is false? This is the usual expectation of the investigator, who has stated the null hypothesis for tactical purposes so that he may reject it and conclude that the phenomenon exists. But, of course, the fact that the phenomenon exists in the population far from guarantees a statistically significant result, i.e., one which warrants the conclusion that it exists, for this conclusion depends upon meeting the agreed-upon standard of proof (i.e., significance criterion). It is at this point that the concept of statistical power must be considered.
The power of a statistical test of a null hypothesis is the probability that it will lead to the rejection of the null hypothesis, i.e., the probability that it will result in the conclusion that the phenomenon exists. Given the characteristics of a specific statistical test of the null hypothesis and the state of affairs in the population, the power of the test can be determined. It clearly represents a vital piece of information about a statistical test applied to research data (cf. Cohen, 1962). For example, the discovery, during the planning phase of an investigation, that the power of the eventual statistical test is low should lead to a revision in the plans. As another example, consider a completed experiment which led to nonrejection of the null hypothesis. An analysis which finds that the power was low should lead one to regard the negative results as ambiguous, since failure to reject the null hypothesis cannot have much substantive meaning when, even though the phenomenon exists (to some given degree), the a priori probability of rejecting the null hypothesis was low. A detailed consideration of the use of power analysis in planning investigations and assessing completed investigations is reserved for later sections.
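The statement that power "can be determined" given the test and the state of the population can be made concrete with a minimal sketch, assuming the simplest textbook case: a one-tailed, one-sample z-test with known standard deviation. The effect size d and sample size n below are illustrative choices, not figures taken from the text or from Cohen's tables.

```python
from statistics import NormalDist

def power_one_tailed_z(d: float, n: int, a: float) -> float:
    """Probability of rejecting H0 when the true standardized mean
    difference is d, with n observations and criterion a1 = a."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - a)          # rejection cutoff under H0
    delta = d * n ** 0.5               # noncentrality of the test statistic
    return 1 - z.cdf(z_crit - delta)   # P(z_obs > z_crit | effect d)

# e.g., a true standardized difference of .5 with n = 25 and a1 = .05:
p = power_one_tailed_z(0.5, 25, 0.05)
print(f"power = {p:.2f}")
```

With these illustrative inputs the power is about .80, i.e., the investigator would have roughly a four-in-five chance of obtaining a significant result if the phenomenon exists to that degree.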
The power of a statistical test depends upon three parameters: the significance criterion, the reliability of the sample results, and the “effect size,” that is, the degree to which the phenomenon exists.
1.2 SIGNIFICANCE CRITERION
The role of this parameter in testing null hypotheses has already been given some consideration. As noted above, the significance criterion represents the standard of proof that the phenomenon exists, or the risk of mistakenly rejecting the null hypothesis. As used here, it directly implies the “critical region of rejection” of the null hypothesis, since it embodies both the probability of a class of results given that the null hypothesis is true (a), as well as the definition of the phenomenon’s existence with regard to directionality. For power to be defined, its value must be set in advance.
The significance level, a, has been variously called the error of the first kind, the Type I error, and the alpha error. Since it is the rate of rejecting a true null hypothesis, it is taken as a relatively small value. It follows then that the smaller the value, the more rigorous the standard of null hypothesis rejection or, equivalently, of proof of the phenomenon’s existence. Assume that a phenomenon exists in the population to some given degree. Other things equal, the more stringent the standard for proof, i.e., the lower the value of a, the poorer the chances are that the sample will provide results which meet this standard, i.e., the lower the power. Concretely, if an investigator is prepared to run only a 1 % risk of false rejection of the null hypothesis, the probability of his data meeting this standard is lower than would be the case were he prepared to use the less stringent standard of a 10% risk of false rejection.
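The trade-off described above can be illustrated numerically, again assuming a one-tailed z-test; the noncentrality value (the standardized effect times the square root of n) is an assumed figure chosen for illustration, not one from the text.

```python
from statistics import NormalDist

z = NormalDist()
delta = 1.81  # assumed: true standardized effect times sqrt(n)

# For a fixed true effect, a more stringent criterion (smaller a)
# yields lower power.
powers = {}
for a in (0.10, 0.05, 0.01, 0.001):
    powers[a] = 1 - z.cdf(z.inv_cdf(1 - a) - delta)
    print(f"a1 = {a:<5}  power = {powers[a]:.2f}")
```

Under this assumed effect, tightening the criterion from a = .10 to a = .001 drops the power from roughly .70 to roughly .10, the kind of situation discussed in the next paragraph.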
The practice of taking a very small ("the smaller the better") then results in power values being relatively small. However, the complement of the power (1 – power), here symbolized as b, is also an error rate, called the Type II or beta error, since it represents the rate of failing to reject a false null hypothesis. Thus it is seen that statistical inference can be viewed as weighing, in a manner relevant to the substantive issues of an investigation, these two kinds of errors. An investigator can set the risk of false null hypothesis rejection at a vanishingly small level, say a = .001, but in so doing, he may reduce the power of his test to .10 (hence the beta error probability, b, is 1 – .10 = .90). Two comments may be made here:
1. The general neglect of issues of statistical power in behavioral science may well result, in such instances, in the investigator’s failing to realize that the a =.001 val...