Mathematics

Spearman's Rank Correlation Coefficient

Spearman's rank correlation coefficient is a statistical measure that assesses the strength and direction of association between two ranked variables. It is based on the ranks of the data rather than the actual values, making it robust to outliers and non-normal distributions. The coefficient ranges from -1 to 1, where 1 indicates a perfect positive monotonic relationship and -1 indicates a perfect negative monotonic relationship.

Written by Perlego with AI-assistance

12 Key excerpts on "Spearman's Rank Correlation Coefficient"

  • Book cover image for: Econometrics
    eBook - ePub
    • K. Nirmal Ravi Kumar(Author)
    • 2020(Publication Date)
    • CRC Press
      (Publisher)
    The calculation of Pearson’s correlation coefficient holds good under the following data assumptions:
    •  Interval or ratio level.
    •  Linearly related.
    •  Bivariate normally distributed.
    •  Monotonically related.
    If the available data do not meet the above assumptions, then we use Spearman’s rank correlation test. That is, we employ r_s if the available data have the following characteristics:
    •  Interval, ratio, or ordinal level;
    •  Monotonically related;
    •  Unlike Pearson’s correlation coefficient, there is no requirement that the data be normally distributed (i.e., normality); hence, it is a non-parametric statistic.
    If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other. It is possible to avoid making assumptions about the population parameters by giving ranks to observations. Some variables, like education, management, sex of the individuals, etc., cannot be given quantitative values, but they can be measured in terms of ranks. Spearman’s rank correlation coefficient studies the degree of association between the variables, but in terms of ranks instead of actual numerical values. The formula to work out Spearman’s rank correlation coefficient is:
    r_s = 1 − 6ΣD²/(n(n² − 1)) = 1 − 6ΣD²/(n³ − n)    (Equation 2.19)
    where n = number of observations and D = the difference between ranks, i.e., r₁ − r₂.
    Properties of r_s include:
    1.  The value of r_s ranges from −1 to +1. If r_s is +1, it indicates there is complete agreement in the order of ranks and the ranks are in the same direction. Proof:
    r₁   r₂   D = (r₁ − r₂)   D²
    1    1    0               0
    2    2    0               0
    3    3    0               0
                       ΣD² = 0
    r_s = 1 − 6ΣD²/(n(n² − 1)) = 1 − (6 × 0)/(3(3² − 1)) = 1
    Similarly, when r_s is −1, there is complete disagreement in the order of ranks and they are in opposite directions. Proof
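The rank-difference formula in Equation 2.19 can be sketched directly. A minimal Python version, illustrative rather than from the book, assuming tie-free ranks:

```python
# Sketch of r_s = 1 - 6*sum(D^2) / (n*(n^2 - 1)) for two lists of tie-free ranks.

def spearman_rs(ranks_x, ranks_y):
    """Spearman's rank correlation from two equal-length lists of ranks (no ties)."""
    n = len(ranks_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Complete agreement in rank order gives +1; complete reversal gives -1,
# matching the properties listed in the excerpt.
print(spearman_rs([1, 2, 3], [1, 2, 3]))  # -> 1.0
print(spearman_rs([1, 2, 3], [3, 2, 1]))  # -> -1.0
```

Swapping any two adjacent ranks pulls the coefficient below 1, in proportion to the squared rank differences.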
  • Book cover image for: Statistical Analysis for Education and Psychology Researchers
    eBook - ePub

    Statistical Analysis for Education and Psychology Researchers

    Tools for researchers in education and psychology

    • Ian Peers(Author)
    • 2006(Publication Date)
    • Routledge
      (Publisher)
    In this chapter we will consider Spearman’s rank order correlation, which is appropriate when variables are measured at an ordinal level, or when data are transformed to an ordinal scale (this would include percentages). It is one of a number of alternative distribution-free correlation-type statistics. Other nonparametric coefficients include: the point biserial correlation (when both variables are discrete true dichotomies); biserial correlation (when variables have been dichotomized from an underlying continuous distribution); and Kendall’s Tau coefficient (an alternative to Spearman’s rank correlation which is actually a measure of concordance—similarity of two rank orders rather than a correlation). For discussion and illustrated examples of these alternative correlation statistics see Siegel and Castellan (1988), Hays (1981), and Guilford and Fruchter (1973). We are concerned in this and in the subsequent chapter with the inferential use of correlations and, consequently, we should bear in mind how sample data were generated, especially possible bias and range restrictions which can attenuate correlations (reduce sample correlations). 7.2 Spearman’s rho (rank order correlation coefficient) When to Use Spearman’s rank order correlation should be used when: the relationship between two variables is not linear (this can be checked by plotting the two variables); when measurement and distributional assumptions are not met (the variables are not interval or ratio measures and observations do not come from a bivariate normal distribution); when sample sizes are too small to establish an underlying distribution; or when the data naturally occur in the form of ranks. Spearman’s rank order correlation is equivalent to the Pearson Product Moment correlation (a parametric correlation procedure) performed on the ranks of the scores rather than on the raw scores themselves. The rank order correlation procedure is probably used less often than it should be
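The equivalence noted above — Spearman's rho equals a Pearson correlation computed on ranks — can be checked numerically. A small sketch with made-up, tie-free values (the data and helper names are hypothetical):

```python
# Compare the rank-difference formula with Pearson's r applied to the ranks.
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def to_ranks(values):
    """Rank from 1 (lowest) to n; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

x = [10, 25, 17, 40, 33]
y = [3.1, 6.0, 9.9, 4.8, 7.2]
rx, ry = to_ranks(x), to_ranks(y)

n = len(x)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho_formula = 1 - 6 * d2 / (n ** 3 - n)   # rank-difference formula
rho_pearson = pearson_r(rx, ry)           # Pearson on the ranks
assert abs(rho_formula - rho_pearson) < 1e-12   # identical (here both 0.1)
```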
  • Book cover image for: Statistics Using R
    eBook - PDF

    Statistics Using R

    An Integrative Approach

    Consequently, it is important to check the reliability of any measuring instruments used and to employ those instruments with a high reliability. A more complete discussion of reliability is beyond the scope of this book. The interested reader should consult a text on tests and measurement. One such example is Cohen and Swerdlik (2005).
    WHEN AT LEAST ONE VARIABLE IS ORDINAL AND THE OTHER IS AT LEAST ORDINAL: THE SPEARMAN RANK CORRELATION COEFFICIENT
    The Spearman rank correlation coefficient measures the strength of the linear relationship between two variables when the values of each variable are rank-ordered from 1 to N, where N is the number of pairs of values. The formula for the Spearman correlation coefficient, given as Equation 5.2 and denoted by r_s, may be obtained as a special case of the Pearson correlation coefficient when the N cases of each variable are assigned the integer values from 1 to N, inclusive, and no two cases share the same value:
    r_s = 1 − 6Σd_i²/(N³ − N)    (5.2)
    where d_i represents the difference between ranks for each case. As a special case of the Pearson correlation coefficient, r_s is interpreted in the same way as the Pearson. For example, notice that when the ranks for the two variables being correlated are identical, the differences between them will be zero, and the Spearman correlation coefficient will be 1.00, indicating a perfect positive correlation between the variables.
    EXAMPLE 5.3 Consider the McDonald’s hamburger data. Suppose we are not convinced that our measure of fat is interval-leveled. That is, while we believe that the Quarter Pounder™, with 21 grams of fat, has more fat than the Cheeseburger, with 14 grams of fat, and less fat than the Big Mac™, with 28 grams of fat, we are not convinced that, in terms of fat, the Quarter Pounder™ is midway between the other two types of hamburger.
  • Book cover image for: Statistical Reasoning in the Behavioral Sciences
    • Bruce M. King, Patrick J. Rosopa, Edward W. Minium(Authors)
    • 2018(Publication Date)
    • Wiley
      (Publisher)
    Spearman’s rank-order correlation coefficient, r_S, is closely related to the Pearson correlation coefficient. In fact, if the paired scores are both in the form of ranks (and there are no ties in rank), calculation of r_S and Pearson’s r result in identical outcomes. When would one want to use the rank-order coefficient? Suppose 10 aspects of a job are identified, such as hours of work, working conditions, quality of supervision, wages, and so forth. Suppose we ask workers to place these job aspects in rank order according to their importance, and suppose we also ask their supervisors to rank the same aspects according to the importance that they believe workers would assign. We may now ask about the extent to which one worker agrees with another and the extent to which supervisors understand workers’ feelings. We can study the degree of agreement between any two persons by calculating the coefficient of correlation between the ranks they assign to the 10 job aspects. Because the data are in the form of ranks, r and r_S will yield the same coefficient. We prefer r_S in this case because it is simpler to calculate. We can also use r_S when measures are in score form. In this case, we translate each set of measures into rank form, assigning 1 to the lowest score, 2 to the next lowest, and so on. When would we do this? Sometimes, the scale properties of the measures appear doubtful (see Section 1.6). If what matters is that one score is higher than another and how much higher is not really important, translating scores to ranks will be suitable. In any event, we typically use r_S only in circumstances in which n is rather small. When n is large, the proportion of tied ranks is likely to increase, and the work of translating scores to ranks becomes progressively more burdensome and error-prone.
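The excerpt describes translating scores to ranks, assigning 1 to the lowest score. When scores tie, the usual convention (not spelled out in the excerpt) is to give each tied score the average of the rank positions it spans. A hedged sketch of that convention:

```python
# Assign ranks 1..n from lowest score upward, averaging ranks across ties.

def average_ranks(scores):
    """Rank scores from 1 (lowest) upward, giving tied scores their average rank."""
    indexed = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(indexed):
        j = i
        # Extend j to cover the whole run of tied scores starting at position i.
        while j + 1 < len(indexed) and scores[indexed[j + 1]] == scores[indexed[i]]:
            j += 1
        avg = (i + j + 2) / 2  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[indexed[k]] = avg
        i = j + 1
    return ranks

print(average_ranks([12, 15, 15, 20]))  # -> [1.0, 2.5, 2.5, 4.0]
```

The two scores of 15 occupy rank positions 2 and 3, so each receives 2.5.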
  • Book cover image for: Statistics Using IBM SPSS
    eBook - PDF

    Statistics Using IBM SPSS

    An Integrative Approach

    The formula for the Spearman Correlation Coefficient, given as Equation (5.2), and denoted by r_s, ρ, or rho, may be obtained as a special case of the Pearson Correlation Coefficient when the N cases of each variable are assigned the integer values from 1 to N inclusive and no two cases share the same value.
    r_s = 1 − 6Σd_i²/(N³ − N)    (5.2)
    where d_i represents the difference between ranks for each case. As a special case of the Pearson Correlation Coefficient, r_s is interpreted in the same way as the Pearson. For example, notice that when the ranks for the two variables being correlated are identical, the differences between them will be zero and the Spearman Correlation Coefficient will be 1.00, indicating a perfect positive correlation between the variables.
    EXAMPLE 5.3. Consider the McDonald’s hamburger data. Suppose we are not convinced that our measure of fat is interval-leveled. That is, while we believe that the Quarter Pounder™, with 21 grams of fat, has more fat than the Cheeseburger, with 14 grams of fat, and less fat than the Big Mac™, with 28 grams of fat, we are not convinced that, in terms of fat, the Quarter Pounder™ is midway between the other two types of hamburger. Accordingly, we decide to consider the fat scale as ordinal, and transform the original data to ranked data. To find the relationship between fat and calories, we compute the Spearman Correlation Coefficient on the ranked data instead of the Pearson Correlation Coefficient on the original data.
    ☞ Remark. The formula for the Spearman is derived as a special case of the Pearson formula by taking advantage of the fact that the data are rankings from 1 to N. We may note that the Spearman Correlation Coefficient on the ranked data will give the identical result as the Pearson Correlation Coefficient on the ranked data.
  • Book cover image for: Introduction to Human Factors and Ergonomics for Engineers
    • Mark R. Lehto, Steven J. Landry(Authors)
    • 2012(Publication Date)
    • CRC Press
      (Publisher)
    The statistical significance of a sample correlation can be determined for a particular sample size, N, and level of significance α. For larger sample sizes, smaller sample correlation coefficients become statistically significant. Table 14.6 shows the minimum statistically significant sample rank correlation coefficient as a function of the sample size when α = 0.05. (Sidney Siegel describes the Spearman correlation coefficient and provides numerical examples on pages 202–213 of his book; see Siegel (1956) for an extended version of this table. Other related measures include Kendall’s rank correlation, a partial rank coefficient, and a coefficient of concordance.)
    BOX 14.1 EXAMPLE OF CALCULATION OF THE SPEARMAN RANK CORRELATION COEFFICIENT
    In the numerical example in Table 14.5, the rank differences between X and Y are given in the last column. The sum of those squared differences is
    Σd_i² = 0.5² + 1² + 1² + 1.5² + 1² = 0.25 + 1 + 1 + 2.25 + 1 = 5.50
    and N = 5, so the correlation is
    r_s = 1 − 6(5.50)/(5³ − 5) = 1 − 33/120 = 0.725
    BOX 14.2 EXAMPLE FROM BOX 14.1 WITH TIE CORRECTIONS
    The sums of squares for X and Y, from the Box 14.1 example, corrected for ties, are
    Σx_i² = (5³ − 5)/12 − (3³ − 3)/12 − (2³ − 2)/12 = 10 − 2 − 0.5 = 7.5
    Σy_i² = (5³ − 5)/12 − (3³ − 3)/12 = 10 − 2 = 8.0
    The corrected Spearman correlation coefficient is
    r_s = (Σx² + Σy² − Σd²) / (2√(Σx² · Σy²)) = (7.5 + 8.0 − 5.5) / (2√(7.5 × 8.0)) = 10.0/15.49 = 0.645
    The ties in ranking reduce the sums of squares, and that in turn lowers the correlation coefficient from 0.725 without considering ties to 0.645 after ties are considered.
    Wilcoxon sign test
    The Wilcoxon sign test is a nonparametric test that can be used to measure the statistical significance of differences in paired ordinal ratings of items A and B.
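The arithmetic in Boxes 14.1 and 14.2 can be replayed directly. The rank differences and tie-group sizes below are read off the boxes; the underlying X, Y data from Table 14.5 are not reproduced in the excerpt, and only the magnitudes of the differences are recoverable from the printed squares:

```python
# Replaying the Box 14.1 / 14.2 computations (inputs taken from the boxes as printed).
from math import sqrt

# Box 14.1: squared rank differences and the uncorrected r_s.
d = [0.5, 1, 1, 1.5, 1]                    # |rank differences| from Table 14.5
N = 5
sum_d2 = sum(di ** 2 for di in d)          # 0.25 + 1 + 1 + 2.25 + 1 = 5.50
r_s = 1 - 6 * sum_d2 / (N ** 3 - N)        # 1 - 33/120 = 0.725

# Box 14.2: tie-corrected sums of squares, subtracting (t^3 - t)/12 per tie group.
sum_x2 = (N**3 - N) / 12 - (3**3 - 3) / 12 - (2**3 - 2) / 12   # 10 - 2 - 0.5 = 7.5
sum_y2 = (N**3 - N) / 12 - (3**3 - 3) / 12                     # 10 - 2 = 8.0
r_s_tied = (sum_x2 + sum_y2 - sum_d2) / (2 * sqrt(sum_x2 * sum_y2))
# (7.5 + 8.0 - 5.5) / (2 * sqrt(60)) = 10 / 15.49 ≈ 0.645
```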
  • Book cover image for: Statistics Using Stata
    eBook - PDF

    Statistics Using Stata

    An Integrative Approach

    The formula for the Spearman Correlation Coefficient, given as Equation (5.4), and denoted by r_s, ρ, or rho, may be obtained as a special case of the Pearson Correlation Coefficient when the N cases of each variable are assigned the integer values from 1 to N inclusive and no two cases share the same value.
    r_s = 1 − 6Σd_i²/(N³ − N)    (5.4)
    where d_i represents the difference between ranks for each case. As a special case of the Pearson Correlation Coefficient, r_s is interpreted in the same way as the Pearson. For example, notice that when the ranks for the two variables being correlated are identical, the differences between them will be zero and the Spearman Correlation Coefficient will be 1.00, indicating a perfect positive correlation between the variables.
    EXAMPLE 5.3. Consider the McDonald’s hamburger data. Suppose we are not convinced that our measure of fat is interval-leveled. That is, while we believe that the Quarter Pounder™, with 21 grams of fat, has more fat than the Cheeseburger, with 14 grams of fat, and less fat than the Big Mac™, with 28 grams of fat, we are not convinced that, in terms of fat, the Quarter Pounder™ is midway between the other two types of hamburger. Accordingly, we decide to consider the fat scale as ordinal, and transform the original data to ranked data. To find the relationship between fat and calories, we compute the Spearman Correlation Coefficient on the ranked data instead of the Pearson Correlation Coefficient on the original data.
    ☞ Remark. The formula for the Spearman is derived as a special case of the Pearson formula by taking advantage of the fact that the data are rankings from 1 to N. We may note that the Spearman Correlation Coefficient on the ranked data will give the identical result as the Pearson Correlation Coefficient on the ranked data.
  • Book cover image for: Essential Statistics
    14.6 Hypothesis Test for Spearman's Rank Correlation Coefficient
    Example
    Using the same data from the example in the previous section:
    1. H₀: The ranks of height and weight are uncorrelated.
    2. H₁: High ranks of height correspond to high ranks of weight (one-sided alternative).
    3. 5% significance level.
    4. Calc r_s = 0.943, from previous section.
    5. Tab r_s = 0.829, from Table C.12 of Appendix C, for n = 6, one-sided alternative hypothesis and 5% level of significance.
    6. Since Calc r_s > Tab r_s, reject H₀.
    7. There is a significant positive correlation between the ranks of height and weight (5% level).
    Assumption: We must be able to rank each variable.
    The extensive notes in Section 14.4 on the interpretation of correlation coefficients apply equally to both the Pearson and the Spearman coefficients.
    14.7 Spearman's Coefficient in the Case of Ties
    In Section 14.4 it was stated that Formula (14.3) did not apply in the case of tied ranks. In this situation, we can either use a more complicated formula for r_s, or we can use the following ingenious method. In the case of ties, calculate Pearson’s r using the ranks rather than the original observed values of the two variables. It can be shown that the resulting value is the correct value of Spearman’s r_s, and this can then be tested for significance as in Section 14.6.
    Example
    A random sample of ten students were asked to rate, on a 10-point scale, two courses they had all taken. A rating of 1 means ‘absolutely dreadful’, while a rating of 10 means ‘absolutely wonderful’. The data are given in the first two columns of Table 14.3. Here we are not interested in whether one course has a higher mean rating than the other (but, if we were, then a Wilcoxon signed rank test would be appropriate), but we are interested
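The test above compares a calculated r_s with a tabulated critical value. As an illustrative alternative (not the book's method), a permutation test estimates the one-sided p-value by shuffling one set of ranks. The n = 6 ranks below are hypothetical, chosen so that r_s ≈ 0.943 matches the example's calculated value:

```python
# Permutation-based one-sided test for a positive Spearman rank correlation.
import random

def spearman_rs(rx, ry):
    """Rank-difference formula for tie-free ranks."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n ** 3 - n)

def one_sided_p(rx, ry, trials=20000, seed=0):
    """Estimated P(r_s >= observed) under the null of no association."""
    rng = random.Random(seed)
    observed = spearman_rs(rx, ry)
    shuffled = list(ry)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if spearman_rs(rx, shuffled) >= observed:
            hits += 1
    return hits / trials

rank_height = [1, 2, 3, 4, 5, 6]   # hypothetical ranks, not the book's Table data
rank_weight = [1, 2, 4, 3, 5, 6]
p = one_sided_p(rank_height, rank_weight)
# p comes out small (the exact value is 6/720 ≈ 0.008), agreeing with the
# table-based decision to reject H0 at the 5% level.
```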
  • Book cover image for: Simple Statistical Tests for Geography
    If there is a clear causal relationship you can square the r-value to give the proportion of variance explained. For example, if the effect size r (point-biserial correlation) is 0.71 then you could conclude that gender explains about 50% (0.71 squared = 0.50) of the variance in examination performance.
    [FIGURE 10.13: Level of agreement with two questions recorded as percentages (Question 1 vs. Question 2, both as % agreement); R² = 0.77.]
    10.5 Spearman’s Rank Correlation or Spearman’s Rho
    10.5.1 When It Is Useful
    There are many circumstances when it is not really appropriate to use the Pearson’s correlation coefficient because the assumptions are not met. Spearman’s, in contrast, is a non-parametric method, so there are fewer assumptions and it can often be used when the data are not suitable for the parametric method. As with other non-parametric methods it copes well with small samples and there is no assumption of normality, so the data can be skewed and outliers are not such a problem. An important advantage of Spearman’s approach is that it does not matter if the relationship between the two parameters is linear or curved. The relationship still has to be monotonic, however, which means that it cannot go up and then back down, or vice versa.
    10.5.2 What It Is Based On
    The difference between Pearson’s r-value and the equivalent Spearman’s rho (a Greek letter that looks so confusingly like the letter p in most fonts (ρ) that I will just spell it) is that Spearman’s approach uses the ranks of the two data sets rather than the original values (Spearman 1904). In every other respect the two methods are identical, which allows us to use a sneaky shortcut when calculating Spearman’s rho in a spreadsheet.
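The monotonic-but-curved point can be seen on made-up data: y = x³ preserves the order of x exactly, so Spearman's rho (Pearson's r computed on the ranks) is 1, while Pearson's r on the raw values falls short of 1:

```python
# Spearman vs. Pearson on a monotone but nonlinear relation (illustrative data).
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]        # monotone increasing, strongly curved

ranks = [1, 2, 3, 4, 5, 6]     # both variables are already in increasing order,
rho = pearson_r(ranks, ranks)  # so their ranks are identical: rho = 1.0
r = pearson_r(x, y)            # Pearson on the raw values: about 0.94
```

Because the curve never turns back on itself, ranking wipes out the curvature entirely; a U-shaped relation, by contrast, would hurt both coefficients.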
  • Book cover image for: Compassionate Statistics
    eBook - ePub

    Compassionate Statistics

    Applied Quantitative Analysis for Social Services (With exercises and instructions in SPSS)

    11

    Correlation

    Spearman’s rho and Pearson’s r

    While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty…. Individuals vary, but percentages remain constant. So says the statistician .
    —Arthur Conan Doyle, cited on www.quotegarden.com/statistics

    Introductory Case Illustration

    Because of the prevalence of dating violence in current society, research is being conducted on this problem, especially on the severity of this problem’s impact on adolescent victims. Using a combination of focus group discussions and survey questionnaires, Prospero (2006) explored the dating perceptions and behavioral expectations of 89 middle school students in the southwestern part of the United States and performed a correlation analysis on some of the results. In this study, the male student was identified as the protagonist.
    The researcher notes, “A Spearman rank correlation was conducted to investigate the relationship between the perceptions and the behavioral expectations of the protagonist. All perceptions questions were aggregated to form a 5-point scale (0 to 4), where 0 = not aggressive responses to all four questions and 4 = aggressive responses to all four questions. The result of the correlation analysis was statistically significant with r = .388, p < .000” (Prospero, 2006, p. 476).
    Discussion. A Spearman rho test for a correlation between two variables (in this example, the variables male perceptions and male behavioral expectations) captures whether there exists a relationship between variables measured at the ordinal level. A correlation at either the ordinal or scale level describes whether a quantitative association exists between variables, how strong that association is, and, finally, in what direction it flows. In other words, and using this case illustration, if respondents tended to answer low or high on a question involving their own perception, would they also tend to answer in a similar way on another question related to behavioral
  • Book cover image for: Statistics for the Social Sciences
    eBook - PDF

    Statistics for the Social Sciences

    A General Linear Model Approach

    Check Yourself!
    •  Why does a correlation between two variables not imply that the independent variable causes the dependent variable?
    •  Explain the third variable problem.
    Summary
    Pearson’s correlation is a statistic that quantifies the relationship between two interval- or ratio-level variables. It is calculated with Formula 12.1:
    r = [Σ(x_i − X̄)(y_i − Ȳ)/(n − 1)] / (σ̂_x σ̂_y)
    There are two components of the Pearson’s r value to interpret. The first is the sign of the correlation coefficient. Positive values indicate that individuals who have high X scores tend to have high Y scores (and that individuals with low X scores tend to have low Y scores). A negative correlation indicates that individuals with high X scores tend to have low Y scores (and that individuals with low X scores tend to have high Y scores). The second component of a correlation coefficient that is interpreted is the correlation’s number, which indicates the strength of the relationship and the consistency with which the relationship is observed. The closer an r value is to +1 or –1, the stronger the relationship between the variables. The closer the number is to zero, the weaker the relationship. For many purposes, calculating the r value is enough to answer the research questions in a study. But it is also possible to test a correlation coefficient for statistical significance, where the null hypothesis is r = 0. Such a null hypothesis statistical significance test (NHST) follows the same steps as all NHSTs. The effect size for Pearson’s r is calculated by squaring the r value (i.e., obtaining r²). The data used to calculate Pearson’s r can be visualized with a scatterplot, which was introduced in Chapter 4. In a scatterplot each sample member is represented as a dot plotted in a position corresponding to the individual’s X and Y scores. Scatterplots for strong correlations tend to have a group of dots that are closely grouped together.
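Formula 12.1 can be computed directly; a short sketch on made-up data, using sample (n − 1) standard deviations as the formula specifies:

```python
# Pearson's r via Formula 12.1: covariance over the product of sample SDs.
from math import sqrt

def pearson_r(x, y):
    """Pearson's r with n-1 denominators, matching Formula 12.1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sd_x = sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sd_y = sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)

x = [2, 4, 5, 7, 9]            # hypothetical interval-level data
y = [10, 14, 13, 20, 22]
r = pearson_r(x, y)            # positive sign: high x tends to go with high y
effect_size = r ** 2           # squared r, the effect size described above
```

The n − 1 factors cancel between numerator and denominator, so dividing by n throughout would give the same r; the sample-SD form is kept here to mirror the formula as printed.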
  • Book cover image for: Statistics for the Social Sciences
    eBook - PDF

    Statistics for the Social Sciences

    A General Linear Model Approach

    An experiment requires the experimenter to randomly assign sample members to two groups. Usually a group called the control group receives no treatment, while the other group, called the experimental group, receives the treatment.
    Summary
    Pearson’s correlation is a statistic that quantifies the relationship between two interval- or ratio-level variables. It is calculated with Formula 12.1:
    r = Σ(x_i − X̄)(y_i − Ȳ) / ((n − 1) s_x s_y)
    There are two components of the Pearson’s r value to interpret. The first is the sign of the correlation coefficient. Positive values indicate that individuals who have high X scores tend to have high Y scores (and that individuals with low X scores tend to have low Y scores). A negative correlation indicates that individuals with high X scores tend to have low Y scores (and that individuals with low X scores tend to have high Y scores). The second component of a correlation coefficient that is interpreted is the number, which indicates the strength of the relationship and the consistency with which the relationship is observed. The closer an r value is to +1 or –1, the stronger the relationship between the variables. The closer the number is to zero, the weaker the relationship. For many purposes, calculating the r value is enough to answer the research questions in a study. But it is also possible to test a correlation coefficient for statistical significance, where the null hypothesis is r = 0. Such a null hypothesis statistical significance test (NHST) follows the same steps of all NHSTs. The effect size for Pearson’s r is calculated by squaring the r value (r²). The data used to calculate Pearson’s r can be visualized with a scatterplot, which was introduced in Chapter 3. In a scatterplot each sample member is represented as a dot plotted in a position corresponding to the individual’s X and Y scores.
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.