Mathematics

Hypothesis Test for Correlation

A hypothesis test for correlation is a statistical method used to determine if there is a significant linear relationship between two variables. It involves testing the null hypothesis that there is no correlation against the alternative hypothesis that there is a correlation. The test produces a p-value, which indicates the strength of evidence against the null hypothesis.

Written by Perlego with AI-assistance

12 Key excerpts on "Hypothesis Test for Correlation"

  • Statistical Concepts for the Behavioral Sciences
    To find if two variables covary, we measure a sample of people and obtain two scores from each person, such as a liking-a-professor score and an in-class attention score. Then a correlation coefficient is calculated. A correlation coefficient is a statistic that provides a numerical description of the extent of the relatedness of two sets of scores and the direction of the relationship. Values of this coefficient may range from −1.00 to +1.00. Statistical hypothesis testing also enters into use with the correlation coefficient. There will always be some chance relationship between scores on two different variables. Thus, the question arises of whether an observed relation, given by the numerical value of the correlation coefficient, is greater than would be expected from chance alone. A statistical test on the correlation coefficient provides an answer for this question. If the two sets of scores are related beyond chance occurrence, then we may be interested in attempting to predict one score from the other. If you knew a subject’s liking of a professor score, could you predict his or her in-class attention score? And, if you could predict the in-class attention score, how accurate would your prediction be? Predicting a score on one variable from a score on a second variable involves using regression analysis. Correlation and regression analysis techniques are widely used in many areas of behavioral science to find relationships between variables and to find if one variable predicts another. This chapter introduces correlational studies and the statistics associated with them. Using correlated scores to predict one score from another is introduced in Chapter 14. Let us continue with our example of a possible relationship between liking a professor and in-class attention to introduce concepts of correlation and the correlation coefficient.
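    To make the procedure concrete, here is a minimal R sketch (not taken from the book) that invents a liking-a-professor score and an in-class attention score for 30 hypothetical students and then asks whether their correlation exceeds what chance alone would produce; all variable names and numbers are illustrative assumptions.

        # Illustrative only: simulate two scores per person, then test the correlation.
        set.seed(42)
        liking    <- rnorm(30, mean = 50, sd = 10)               # liking-a-professor scores
        attention <- 0.6 * liking + rnorm(30, mean = 0, sd = 8)  # attention, partly driven by liking

        r    <- cor(liking, attention)      # the correlation coefficient
        test <- cor.test(liking, attention) # t-test of H0: rho = 0

        r             # numerical description of relatedness and direction
        test$p.value  # small p-value: the relation is unlikely to be chance alone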
  • Correlation and Regression
    eBook - ePub

    Correlation and Regression

    Applications for Industrial Organizational Psychology and Management

    Chapter I). In fact, testing Pearson product-moment correlations for significance is relatively straightforward, although the procedures depend upon the type of hypothesis being tested.
    TESTING A SINGLE CORRELATION AGAINST ZERO
    The hypothesis H0: ρ = 0, where ρ is the underlying population correlation, is by far the most frequently tested correlational hypothesis. In fact, many researchers run around asking each other, “Was your r significant?” This is researcher shorthand for, “Was your sample value of r significantly different from the hypothesized value of ρ = 0?” or “Assuming the population correlation was 0, was your sample value of r so far away from 0 that it was not likely to have occurred by chance?”
    Notationally, let ρ be the value of the population correlation, let r be our old friend the sample correlation, and let n be the sample size. Then, it can be shown that the value $t = r\sqrt{n-2}/\sqrt{1-r^2}$ has a t-distribution with (n − 2) degrees of freedom. Formally, we have
    $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad df = n - 2.$  (III.A)
    Note that the value of r is tested by computing a fairly straightforward (but nonlinear) function of r and comparing the result to well-known critical values of Student’s t-distribution. For example, suppose the predictive validity of the Scholastic Aptitude Test (SAT) is being questioned. Assume that a researcher uses a simple random sample of n = 122 and obtains a measure of success in college (Y) as well as previous SAT scores (X) for each individual. Suppose the resulting correlation is r = .27. Then, Equation III.A gives $t = \dfrac{0.27\sqrt{122-2}}{\sqrt{1-0.27^2}} \approx 3.07$.
    The obtained sample value of t = 3.07 is larger than the two-sided, p = .05 critical t -value of 1.980 (see Appendix, Table A.1 ). Therefore, this correlation is “significant.” More correctly stated, we have rejected the null hypothesis (that the underlying value of ρ is 0) in favor of the alternative hypothesis that ρ is nonzero (but you’ll usually just hear the cry, “My r is significant!”). [By the way, the same value of 3.07 is statistically significant at the p
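    As a quick check of the SAT example above, the same numbers can be plugged into the t formula in R; this is a sketch of the arithmetic only, using the r = .27 and n = 122 given in the excerpt.

        # t = r * sqrt(n - 2) / sqrt(1 - r^2), compared to the two-sided .05 critical value
        r <- 0.27
        n <- 122
        t_obs  <- r * sqrt(n - 2) / sqrt(1 - r^2)
        t_crit <- qt(0.975, df = n - 2)

        t_obs   # about 3.07, matching the text
        t_crit  # about 1.98; t_obs > t_crit, so H0: rho = 0 is rejected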
  • Statistics for Anthropology
    I am sure you have heard the saying “Correlation does not mean causation.” This is so true and so frequently forgotten! The natural and social world is full of spurious correlations, correlations which arise only because of chance and which have no meaning or importance in the natural and social world. 9.1 The Pearson product-moment correlation The Pearson correlation is a commonly applied parametric test which quantifies the relation between two numeric variables, and tests the null hypothesis that no such relation exists. The correlation between the variables is quantified with a coefficient whose statistical symbol is r, and whose parametric symbol is ρ (“rho”). The coefficient ranges in value from −1 to +1. If r is negative, then as Y1 increases, Y2 decreases (Figure 9.1). If r is positive then as Y1 increases, Y2 increases as well (Figure 9.2). If r is not statistically significantly different from 0, then there is no significant relation between Y1 and Y2 (Figure 9.3). Thus, in correlation analysis the null hypothesis is that the parametric correlation between the two variables is 0; the usual two-tailed test null hypothesis is H0: ρ = 0. [Figure 9.1: a scatter plot of two variables which have a significantly negative correlation. Figure 9.2: a scatter plot of two variables which have a significantly positive correlation.] A one-tailed test is possible as well, although it should be used only when there are compelling reasons for it. The reader is by now familiar with the fact that many statistical techniques assume a sample data set to be normally distributed. Indeed, for analysis of variance, it was stressed that every sample be tested for normality of distribution.
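    A short R sketch of the two directions described above, using invented data (the Y1/Y2 values below are simulated stand-ins, not the book's): a negative r means one variable falls as the other rises, a positive r means both rise together, and cor.test() supplies the test of H0: ρ = 0.

        # Simulated examples of negative and positive correlation, each tested against rho = 0.
        set.seed(1)
        y1     <- rnorm(40)
        y2_neg <- -0.7 * y1 + rnorm(40, sd = 0.5)   # decreases as y1 increases
        y2_pos <-  0.7 * y1 + rnorm(40, sd = 0.5)   # increases as y1 increases

        cor.test(y1, y2_neg)$estimate   # negative r
        cor.test(y1, y2_pos)$estimate   # positive r

        # One-tailed version (use only with a compelling a-priori reason, as the text warns):
        cor.test(y1, y2_pos, alternative = "greater")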
  • Statistics for the Social Sciences
    eBook - PDF

    Statistics for the Social Sciences

    A General Linear Model Approach

    An experiment requires the experimenter to randomly assign sample members to two groups. Usually a group called the control group receives no treatment, while the other group, called the experimental group, receives the treatment. Summary: Pearson’s correlation is a statistic that quantifies the relationship between two interval- or ratio-level variables. It is calculated with Formula 12.1: $r = \dfrac{\sum (x_i - \bar{X})(y_i - \bar{Y})}{(n-1)\,s_x s_y}$. There are two components of the Pearson’s r value to interpret. The first is the sign of the correlation coefficient. Positive values indicate that individuals who have high X scores tend to have high Y scores (and that individuals with low X scores tend to have low Y scores). A negative correlation indicates that individuals with high X scores tend to have low Y scores (and that individuals with low X scores tend to have high Y scores). The second component of a correlation coefficient that is interpreted is the number, which indicates the strength of the relationship and the consistency with which the relationship is observed. The closer an r value is to +1 or –1, the stronger the relationship between the variables. The closer the number is to zero, the weaker the relationship. For many purposes, calculating the r value is enough to answer the research questions in a study. But it is also possible to test a correlation coefficient for statistical significance, where the null hypothesis is r = 0. Such a null hypothesis statistical significance test (NHST) follows the same steps of all NHSTs. The effect size for Pearson’s r is calculated by squaring the r value (r²). The data used to calculate Pearson’s r can be visualized with a scatterplot, which was introduced in Chapter 3. In a scatterplot each sample member is represented as a dot plotted in a position corresponding to the individual’s X and Y scores.
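    The summary's Formula 12.1 can be checked directly in R; the sketch below uses made-up X and Y scores (an assumption for illustration, not the book's data) and shows that the hand formula reproduces R's built-in cor(), with r² as the effect size.

        # Formula 12.1: r = sum((x - xbar)(y - ybar)) / ((n - 1) * s_x * s_y)
        set.seed(7)
        x <- rnorm(25, mean = 100, sd = 15)
        y <- 0.5 * x + rnorm(25, mean = 0, sd = 10)
        n <- length(x)

        r_formula <- sum((x - mean(x)) * (y - mean(y))) / ((n - 1) * sd(x) * sd(y))
        r_builtin <- cor(x, y)

        all.equal(r_formula, r_builtin)  # TRUE: the hand formula matches cor()
        r_formula^2                      # effect size r^2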
  • RESEARCH METHODS FOR BEHAVIORAL SCIENCE
    The null hypothesis is that the variables are independent (r = 0) and the research hypothesis is that the variables are not independent (either r > 0 or r < 0). In some cases, the correlation between the variables can be reported in the text of the research report, for instance, “As predicted by the research hypothesis, the variables of optimism and reported health behavior were significantly positively correlated in the sample, r(20) = .52, p < .01.” In this case, the correlation coefficient is .52, 20 refers to the sample size (N), and .01 is the p-value of the observed correlation. When there are many correlations to be reported at the same time, they can be presented in a correlation matrix, which is a table showing the correlations of many variables with each other. An example of a correlation matrix printed out by the statistical software program IBM SPSS® is presented in Table 9.3. The variables that have been correlated are SAT, social support, study hours, and college GPA, although these names have been abbreviated by IBM SPSS into shorter labels. The printout contains 16 cells, each indicating the correlation between two of these variables. Within each box are the appropriate correlations (r) on the first line, the p-value on the second line, and the sample size (N) on the third line. Note that IBM SPSS indicates the (two-tailed) p-values as “sig.” Because any variable correlates at r = 1.00 with itself, the correlations on the diagonal of a correlation matrix are all 1.00.
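    For readers working in R rather than SPSS, a correlation matrix like Table 9.3 can be approximated as sketched below; the four variables are simulated stand-ins with made-up names (sat, support, hours, gpa), so the numbers will not match the book's printout.

        # Hypothetical analogue of an SPSS correlation matrix, in base R.
        set.seed(3)
        sat     <- rnorm(20, mean = 1100, sd = 150)
        support <- rnorm(20, mean = 5,    sd = 1)
        hours   <- rnorm(20, mean = 15,   sd = 4)
        gpa     <- 0.002 * sat + 0.05 * hours + rnorm(20, mean = 0, sd = 0.3)

        grades <- data.frame(sat, support, hours, gpa)

        round(cor(grades), 2)                     # r values; the diagonal is all 1.00
        cor.test(grades$sat, grades$gpa)$p.value  # p-value for one pair (cor() itself gives no p-values)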
  • Reasoning with Data
    eBook - PDF

    Reasoning with Data

    An Introduction to Traditional and Bayesian Statistics Using R

    The alternative hypothesis is simply the logical opposite, and incorporates the possibility of a nonzero correlation that is either negative or positive. In fact, we get a hint right at the beginning of the output that this correlation is negative from the minus sign on the t-test. The observed value of t on 148 degrees of freedom is –4.79. Because the corresponding p-value, 4.073e-06, is decidedly less than the conventional alpha threshold of p < .05, we reject the null hypothesis. Remember that the scientific notation e-06 means that we should move the decimal point six spaces to the left to get the corresponding decimal number (0.000004073). To become better informed about the uncertainty around the point estimate of our correlation, we can also look at the width of the confidence interval, which ranges from –0.497 up to –0.219. Although that is a fairly wide range, the confidence interval does not straddle 0, so we have a sense of certainty that the correlation is negative. In fact, the point estimate for the correlation reported by R is –0.36. If you check carefully you will find that r = –0.36 does not quite fall symmetrically between –0.497 and –0.219. This result is to be expected if you hearken back to our earlier discussion of the ceiling effect that is imposed on correlations because they can’t go any higher than 1.0 or any lower than –1.0. BAYESIAN TESTS ON THE CORRELATION COEFFICIENT Experts have been working on versions of Bayesian tests that can directly examine the value of a Pearson’s r correlation, but for now those procedures are not available in an R package. With that said, I used a little statistical trickery so that we can take advantage of the capabilities that do exist in the BayesFactor package to make our own Bayesian test of the correlation coefficient. The following code creates a custom function to do the job:
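    The custom BayesFactor function the author refers to is not included in this excerpt. Separately, and only as a rough sketch of the frequentist output being interpreted above, the following R lines run cor.test() on simulated stand-in variables (the book's data set is not shown here, so the exact numbers will differ) and pull out the pieces the passage discusses.

        # Simulated stand-in for the kind of cor.test() output described in the passage.
        set.seed(11)
        x <- rnorm(150)
        y <- -0.36 * x + rnorm(150, sd = 0.95)

        ct <- cor.test(x, y)
        ct$statistic   # t value (negative for a negative correlation)
        ct$parameter   # degrees of freedom, n - 2 = 148
        ct$p.value     # compared against the alpha threshold of .05
        ct$conf.int    # 95% confidence interval; the sign is trustworthy if it does not straddle 0
        ct$estimate    # point estimate of r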
  • Statistical Methods for Climate Scientists
    2 Hypothesis Tests. “The problem is to determine the rule which, for each set of values of the observations, specifies what decision should be taken.” [1] (Lehmann and Romano, 2005) The previous chapter considered the following problem: given a distribution, deduce the characteristics of samples drawn from that distribution. This chapter goes in the opposite direction: given a random sample, infer the distribution from which the sample was drawn. It is impossible to infer the distribution exactly from a finite sample. Our strategy is more limited: we propose a hypothesis about the distribution, then decide whether or not to accept the hypothesis based on the sample. Such procedures are called hypothesis tests. In each test, a decision rule for deciding whether to accept or reject the hypothesis is formulated. The probability that the rule gives the wrong decision when the hypothesis is true leads to the concept of a significance level. In climate studies, perhaps the most common questions addressed by hypothesis tests are whether two random variables have the same mean, have the same variance, or are independent. This chapter discusses the corresponding tests for normal distributions, which are known as the t-test (or the difference-in-means test), the F-test (or the difference-in-variance test), and the correlation test. [1: Reprinted by permission from Springer Nature: Springer, Testing Statistical Hypotheses by E. L. Lehmann and J. P. Romano, 2005, page 3.] 2.1 The Problem. Most people learn in elementary school that the scientific method involves formulating a hypothesis about nature, developing consequences of that hypothesis, and then comparing those consequences to experiment. The initial stages of developing a scientific theory were beautifully described by Richard Feynman, Nobel prize winner in physics: “The principle of science, the definition, almost, is the following: The test of all knowledge is experiment.”
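    The three tests named above have standard one-line counterparts in R; the sketch below applies them to two simulated series (the names series_a and series_b are illustrative assumptions, not from the book).

        # Same mean? Same variance? Independent?  (t-test, F-test, correlation test)
        set.seed(5)
        series_a <- rnorm(50, mean = 10, sd = 2)
        series_b <- rnorm(50, mean = 11, sd = 2)

        t.test(series_a, series_b)     # difference-in-means test
        var.test(series_a, series_b)   # F, difference-in-variance test
        cor.test(series_a, series_b)   # correlation test of independence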
  • Statistics for the Social Sciences
    eBook - PDF

    Statistics for the Social Sciences

    A General Linear Model Approach

    Check Yourself! • Why does a correlation between two variables not imply that the independent variable causes the dependent variable? • Explain the third variable problem. Summary: Pearson’s correlation is a statistic that quantifies the relationship between two interval- or ratio-level variables. It is calculated with Formula 12.1: $r = \dfrac{\sum (x_i - \bar{X})(y_i - \bar{Y})}{(n-1)\,\hat{\sigma}_x \hat{\sigma}_y}$. There are two components of the Pearson’s r value to interpret. The first is the sign of the correlation coefficient. Positive values indicate that individuals who have high X scores tend to have high Y scores (and that individuals with low X scores tend to have low Y scores). A negative correlation indicates that individuals with high X scores tend to have low Y scores (and that individuals with low X scores tend to have high Y scores). The second component of a correlation coefficient that is interpreted is the correlation’s number, which indicates the strength of the relationship and the consistency with which the relationship is observed. The closer an r value is to +1 or –1, the stronger the relationship between the variables. The closer the number is to zero, the weaker the relationship. For many purposes, calculating the r value is enough to answer the research questions in a study. But it is also possible to test a correlation coefficient for statistical significance, where the null hypothesis is r = 0. Such a null hypothesis statistical significance test (NHST) follows the same steps as all NHSTs. The effect size for Pearson’s r is calculated by squaring the r value (i.e., obtaining r²). The data used to calculate Pearson’s r can be visualized with a scatterplot, which was introduced in Chapter 4. In a scatterplot each sample member is represented as a dot plotted in a position corresponding to the individual’s X and Y scores. Scatterplots for strong correlations tend to have a group of dots that are closely grouped together.
  • Essential Statistics
    The three cases are shown in Fig. 14.2. [Figure 14.2: Scatter Diagrams for (a) r = +1; (b) r = −1; (c) r = 0 (approx.)] Within the range of possible values for r from −1 to +1, we may describe a value of +0.874 (obtained above) as ‘high positive correlation’. But, a word of warning! Do not judge the association between two variables simply from the value of the correlation coefficient. We must also take into account the value of n, the number of ‘individuals’ contributing to the sample data. Intuitively, r = 0.874, based on a sample of 6 individuals, is not as impressive as r = 0.874 based on a sample of 60 individuals. Had we obtained the latter we would have much more evidence of the degree of association in the population. This intuitive argument is formalised in a hypothesis test for ρ, the population value of Pearson’s correlation coefficient, in the next section. 14.3 Hypothesis Test for Pearson’s Population Correlation Coefficient, ρ. Example: We will use the data and calculations of the previous section, and set out the seven-step method:
    1. H0: ρ = 0. This implies that there is no correlation between the variables in the population.
    2. H1: ρ > 0. This implies that there is a positive correlation in the population, i.e., increasing height is associated with increasing weight.
    3. 5% significance level.
    4. The calculated test statistic is Calc t = $\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ (14.2). Notice that this formula contains n, the number of ‘individuals’, as well as r. For our data, Calc t = $\dfrac{0.874\sqrt{6-2}}{\sqrt{1-0.874^2}} \approx 3.60$.
    5. Tab t = 2.132 from Table C.5, for α = 0.05, one-sided H1, and ν = (n − 2) = 6 − 2 = 4. (It may help you to remember that the number of degrees of freedom, namely (n − 2), occurs in the formula for Calc t.)
    6. Since Calc t > Tab t, reject H0.
    7. There is significant positive correlation between height and weight.
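    The seven-step calculation above is easy to reproduce in R; this sketch only re-does the arithmetic for the r = 0.874, n = 6 example quoted in the excerpt (the height/weight data themselves are not shown here).

        # Calc t = r * sqrt(n - 2) / sqrt(1 - r^2), against the one-sided 5% critical value
        r <- 0.874
        n <- 6
        calc_t <- r * sqrt(n - 2) / sqrt(1 - r^2)   # formula (14.2)
        tab_t  <- qt(0.95, df = n - 2)              # one-sided, df = 4

        calc_t  # about 3.6
        tab_t   # 2.132, matching Table C.5; calc_t > tab_t, so H0 is rejected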
  • Essential Statistics for Applied Linguistics
    • Hanneke Loerts, Wander Lowie, Bregtje Seton (Authors)
    • 2020 (Publication Date)
    • A computer has calculated for these data that r_xy = .996. What would be your first impression? • The critical values (also see Section 4.3.3) for this to be significant at α = .05 and df = 6 are −.707 and .707. Can we conclude that reading skill and listening comprehension are significantly related? So far, we have only discussed correlations for interval data that show a linear, homoscedastic relationship between two normally distributed variables. In case one or more of these assumptions are violated, correlations can still be calculated by using a type of correlation that is based on mean rank orders. The most commonly reported statistic in this case is the Spearman’s Rho (ρ). The interpretation of Spearman’s Rho is largely identical to that of Pearson r. When you have small sample sizes and many identical scores, it may be better to report Kendall’s Tau (τ). Correlations can also come in handy when you want to assess the reliability of an experiment or exam, that is, whether the experiment or exam you created is in fact a good one and does what it claims to do. An exam is reliable when students would perform the same if they had to take the exam again (test-retest reliability), but it is also reliable when students who are equal receive the same score. These two ideas of reliability are difficult to test, because you cannot really test students again and again on the same test and you can also never be sure how equal students are. There is another type of reliability, however, that we can test. When we create a test or an exam we expect the better participants to outperform the less good ones on every question. If the less strong participants do better on one of the questions than the better participants, what does that tell us about that question? Clearly, it indicates that this might not be a reliable question for the test.
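    A brief R sketch of the options mentioned above, on invented reading and listening scores (the eight values below are assumptions for illustration, not the book's data): Pearson r for interval data, and the rank-based Spearman and Kendall alternatives when the usual assumptions are violated.

        # Invented scores for eight learners.
        reading   <- c(12, 15, 18, 22, 25, 28, 31, 34)
        listening <- c(10, 14, 21, 19, 26, 27, 35, 33)

        cor(reading, listening)                            # Pearson r; compare with the +/- .707 critical value at df = 6
        cor.test(reading, listening, method = "spearman")  # Spearman's Rho, based on rank orders
        cor.test(reading, listening, method = "kendall")   # Kendall's Tau, useful for small samples with ties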
  • Beginning R
    eBook - PDF

    Beginning R

    The Statistical Programming Language

    • Mark Gardener (Author)
    • 2012 (Publication Date)
    • Wrox (Publisher)
    Simple Hypothesis Testing WHAT YOU WILL LEARN IN THIS CHAPTER: ➤ How to carry out some basic hypothesis tests ➤ How to carry out the Student’s t-test ➤ How to conduct the U-test for non-parametric data ➤ How to carry out paired tests for parametric and non-parametric data ➤ How to produce correlation and covariance matrices ➤ How to carry out a range of correlation tests ➤ How to test for association using chi-squared ➤ How to carry out goodness of fit tests. Many statistical analyses are concerned with testing hypotheses. In this chapter you look at methods of testing some simple hypotheses using standard and classic tests. You start by comparing differences between two samples. Then you look at the correlation between two samples, and finally look at tests for association and goodness of fit. Other tests are available in R, but the ones illustrated here will form a good foundation and give you an idea of how R works. Should you require a different test, you will be able to work out how to carry it out for yourself. USING THE STUDENT’S T-TEST The Student’s t-test is a method for comparing two samples; looking at the means to determine if the samples are different. This is a parametric test and the data should be normally distributed. You looked at the distribution of data previously in Chapter 5. Several versions of the t-test exist, and R can handle these using the t.test() command, which has a variety of options (see Table 6-1), and the test can be pressed into service to deal with two- and one-sample tests as well as paired tests. The latter option is discussed in the later section “Paired T- and U-Tests”; in this section you look at some more basic options. TABLE 6-1: The t.test() Command and Some of the Options Available. Command: t.test(data.1, data.2). Explanation: The basic method of applying a t-test is to compare two vectors of numeric data.
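    As a minimal sketch of the Table 6-1 entry above (the two numeric vectors are made up for illustration), the basic t.test() call and two of its variants look like this:

        # Basic two-sample form, plus one-sample and paired variants.
        data.1 <- c(5.2, 6.1, 5.8, 6.4, 5.9, 6.0)
        data.2 <- c(6.8, 7.1, 6.5, 7.0, 6.9, 7.3)

        t.test(data.1, data.2)                 # compare two vectors of numeric data
        t.test(data.1, mu = 6)                 # one-sample test against a hypothesised mean
        t.test(data.1, data.2, paired = TRUE)  # paired test (see "Paired T- and U-Tests")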
  • Choosing and Using Statistics
    eBook - ePub
    2 value, the similarity to the Pearson product-moment correlation is very great. The P-value given in a standard linear regression is the probability that the best-fit slope of the relationship between two variables is actually zero. In a comparison with the Pearson statistic, this translates to the probability that there is no relationship (i.e. r = 0). Regression analysis usually considers a second null hypothesis: ‘the value of y is zero when x is zero’. This translates to a test of whether the best-fit line through the data set passes through the origin. It is often labelled as a test of the intercept.
    The advantage of using regression rather than Pearson’s correlation is that the assumption that both variables are distributed normally is lifted. The assumptions are different, although slightly less restrictive. For example, regression assumes that the x (‘cause’) values should be measured without error, that the variation in the y (‘effect’) is the same for any value of x, that the y values should be normally distributed at any value of x and, for linear regression, that the relationship between two variables can be described by a straight line. Of these the assumption that variance in y is the same for all values of x is probably the least likely to be true. It is usual for variance in y to increase as the x (‘cause’) variable increases.
    If you decide to use regression to determine the association between two variables please use great caution because the implication is that one of the variables in some way depends on the other. Also, one of the underlying assumptions of regression is that the values of the ‘cause’ variable are in some way set, or chosen, by the investigator; clearly this is not the case if the observations are taken at random.
    An example
    Again we consider the penguin pairs used as the example throughout this section. The researchers were testing the null hypothesis that male and female birds were forming pairs independent of their size. The alternative hypothesis was that there was an association (either positive or negative) of male and female sizes in pairs. Framed in this way the hypothesis is not suitable for regression. But, if the penguins form pairs by a choice of one sex for another then it might become more like a regression problem. If females actively chose males then the null hypothesis can be framed in regression terms as ‘male size does not depend on female size’. However, even if male size can be said to depend
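    To illustrate the regression-versus-correlation point made in this excerpt, the sketch below uses invented male and female size measurements (assumed data, not the penguin study's): in simple linear regression the p-value for the slope is the same as the p-value from the Pearson correlation test.

        # Invented sizes for 30 hypothetical pairs.
        set.seed(9)
        female_size <- rnorm(30, mean = 50, sd = 5)
        male_size   <- 0.4 * female_size + rnorm(30, mean = 30, sd = 3)

        fit <- lm(male_size ~ female_size)                     # "male size depends on female size"
        summary(fit)$coefficients["female_size", "Pr(>|t|)"]   # p-value for H0: slope = 0
        cor.test(male_size, female_size)$p.value               # identical p-value for H0: r = 0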
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.