by Svetlana Cheusheva, updated on
The tutorial explains the basics of Spearman correlation in simple language and shows how to calculate the Spearman rank correlation coefficient in Excel.
When doing correlation analysis in Excel, in most cases you will deal with the Pearson correlation. But because the Pearson correlation coefficient measures only a linear relationship between two variables, it does not work for all data types - your variables may be strongly associated in a non-linear way and still have the coefficient close to zero. In such circumstances, you can do the Spearman rank correlation instead of Pearson's.
The Spearman correlation is the nonparametric version of the Pearson correlation coefficient that measures the degree of association between two variables based on their ranks.
The Pearson Product Moment Correlation tests the linear relationship between two continuous variables. Linear means a relationship in which the two variables change together at a constant rate.
Spearman Rank Correlation evaluates the monotonic relationship between the ranked values. In a monotonic relationship, the variables also tend to change together, but not necessarily at a constant rate.
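To make the difference between linear and monotonic more concrete, here is a minimal Python sketch. It is only an illustration: the data (an exponential curve) and the helper names `pearson` and `ranks` are our own assumptions and are not part of the tutorial's workbook.

```python
import math

def pearson(xs, ys):
    """Pearson's r computed from its textbook definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Rank values 1..n in ascending order (this simple helper assumes no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical data: y always grows with x, but not at a constant rate.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(v) for v in x]

r_pearson = pearson(x, y)                  # noticeably below 1
r_spearman = pearson(ranks(x), ranks(y))   # 1, up to floating-point rounding
print(r_pearson, r_spearman)
```

Because the ranks of x and y line up perfectly, the rank-based coefficient reaches 1, while Pearson's r on the raw values stays lower since the points do not lie on a straight line.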
The Spearman correlation analysis is to be used in any of the following circumstances when the underlying assumptions of the Pearson correlation are not met:
For example, you can use the Spearman correlation to find the answers to the following questions:
In statistics, the Spearman correlation coefficient is represented by either rₛ or the Greek letter ρ ("rho"), which is why it is often called Spearman's rho.
The Spearman rank correlation coefficient measures both the strength and direction of the relationship between the ranks of data. It can be any value from -1 to 1, and the closer the absolute value of the coefficient is to 1, the stronger the relationship: 1 is a perfect positive correlation, -1 is a perfect negative correlation, and 0 means no correlation.
Depending on whether there are ties in the ranking (the same rank assigned to two or more observations), the Spearman correlation coefficient can be calculated with one of the following formulas.
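If there are no tied ranks, a simpler formula will do:
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$
Where:
$d_i$ is the difference between the two ranks of the i-th observation
$n$ is the number of observations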
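To deal with tied ranks, the full version of the Spearman correlation formula has to be used, which is Pearson's r applied to the ranks instead of the raw values:
$$\rho = \frac{\sum_i \left(R(x_i)-\overline{R(x)}\right)\left(R(y_i)-\overline{R(y)}\right)}{\sqrt{\sum_i \left(R(x_i)-\overline{R(x)}\right)^2 \, \sum_i \left(R(y_i)-\overline{R(y)}\right)^2}}$$
Where:
$R(x_i)$ and $R(y_i)$ are the ranks of the i-th values of the x and y variables
$\overline{R(x)}$ and $\overline{R(y)}$ are the mean ranks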
Regrettably, Excel does not have an inbuilt function for calculating the Spearman rank correlation coefficient. However, that does not mean you will have to rack your brain with the above formulas. By manipulating Excel a bit, we can come up with a much simpler way to do Spearman correlation.
As an example, let's try to find out if our physical activity has any relation to our blood pressure. In column B, we have the number of minutes that 10 men of the same age spend daily in a gym, and in column C, we have their systolic blood pressure.
To find the Spearman correlation coefficient in Excel, perform these steps:
Because the Spearman correlation evaluates the associations between two variables based on their ranks, you need to rank your source data. This can be quickly done by using the Excel RANK.AVG function.
To rank the first variable (physical activity), enter the below formula in D2 and then drag it down to D11:
=RANK.AVG(B2,$B$2:$B$11,0)
To rank the second variable (blood pressure), put the following formula in cell E2 and copy it down the column:
=RANK.AVG(C2,$C$2:$C$11,0)
For the formulas to work correctly, please be sure to lock the ranges with absolute cell references.
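For readers who want to see what this ranking step does outside of Excel, below is a rough Python sketch of the same idea. The function name `rank_avg` and the sample values are hypothetical and are not the tutorial's data. It assumes, as in the formulas above, descending order (the 0 in the third argument), with tied values sharing the average of the positions they occupy.

```python
def rank_avg(values, descending=True):
    """Rank each value; tied values share the average of their positions.

    descending=True mirrors the 0 passed as the third argument of RANK.AVG
    in the formulas above, so the largest value gets rank 1.
    """
    ordered = sorted(values, reverse=descending)
    result = []
    for v in values:
        first = ordered.index(v) + 1      # first position occupied by v
        count = ordered.count(v)          # how many times v occurs
        result.append(first + (count - 1) / 2)  # average of the tied positions
    return result

minutes = [30, 45, 45, 10, 60]   # hypothetical values, not the tutorial's data
print(rank_avg(minutes))          # [4.0, 2.5, 2.5, 5.0, 1.0]
```

The two 45s occupy positions 2 and 3, so each of them gets the average rank of 2.5.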
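At this point, columns D and E hold the ranks of the two variables. With the ranks established, use the Excel CORREL function on the two rank columns to get Spearman's rho:
=CORREL(D2:D11, E2:E11)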
The formula returns a coefficient of -0.7576 (rounded to 4 decimal places), which shows a fairly strong negative correlation and suggests that the more a person exercises, the lower their blood pressure tends to be.
The Pearson correlation coefficient for the same sample (-0.7445) indicates a slightly weaker, but still statistically significant, correlation.
The beauty of this method is that it is quick, easy, and works regardless of whether there are ties in the ranking or not.
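The same "rank first, then correlate" idea can be sketched in Python. This is only an illustration under our own assumptions: the names `rank_avg`, `pearson` and `spearman` are made up for this sketch, and the small data set (which deliberately contains ties) is hypothetical rather than taken from the sample workbook.

```python
import math

def rank_avg(values):
    """Descending ranks; tied values share the average of their positions."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def pearson(xs, ys):
    """Pearson's r, which is what CORREL computes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman(xs, ys):
    """Spearman's rho as Pearson's r of the average ranks."""
    return pearson(rank_avg(xs), rank_avg(ys))

# Hypothetical data with ties: pressure never rises as activity rises.
activity = [10, 20, 20, 30, 40]
pressure = [150, 140, 140, 130, 120]

rho = spearman(activity, pressure)
print(rho)   # -1, up to floating-point rounding
```

Since the ties fall on the same observations in both variables, the ranks are perfectly opposed and the coefficient comes out at -1.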
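If you are not quite sure that the CORREL function has computed Spearman's rho correctly, you can verify the result with the traditional formula used in statistics. Here's how:
Find the difference between each pair of ranks (d) by subtracting one rank from the other:
=D2-E2
This formula goes in F2 and is then copied down the column.
Raise each rank difference to the power of two (d²):
=F2^2
This formula goes in column G.
Add up the squared differences:
=SUM(G2:G11)
This formula can go in any blank cell, G12 in our case.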
In this arrangement, columns D and E hold the ranks, column F holds the rank differences (d), column G holds the squared differences (d²), and G12 holds their sum.
In our example, there are no ties, so we can go with the simpler formula:
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$
With $\sum d_i^2$ equal to 290 and $n$ (the number of observations) equal to 10, the formula undergoes the following transformations:
$$\rho = 1 - \frac{6 \times 290}{10(10^2-1)} = 1 - \frac{1740}{990} = 1 - 1.757575758 = -0.757575758$$
As a result, you get -0.757575758, which agrees with the Spearman correlation coefficient calculated in the previous example.
In Microsoft Excel, the above calculations can be performed with the following equation:
=1-(6*G12/(10*(10^2-1)))
Where G12 is the sum of the squared rank differences (Σd²).
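If you would like to double-check this arithmetic outside of Excel, the calculation can be sketched in Python as follows. The function name `spearman_no_ties` is our own, and the five-item rank lists are hypothetical. Only the totals 290 and 10 come from the example above.

```python
def spearman_no_ties(rank_x, rank_y):
    """Shortcut formula; valid only when neither ranking contains ties."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Plugging in the totals from the example: sum of d² = 290, n = 10.
rho = 1 - 6 * 290 / (10 * (10 ** 2 - 1))
print(round(rho, 9))   # -0.757575758

# Hypothetical ranks in exactly opposite order give a perfect negative result.
print(spearman_no_ties([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))   # -1.0
```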
The correlation coefficients in Excel only measure linear (Pearson) or monotonic (Spearman) relationships. However, other associations are possible. So, no matter which correlation you do, it is always a good idea to represent the relationship between the variables in a graph.
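Here is a small Python sketch of such a case, using made-up data and helper names of our own (`rank_avg`, `pearson`). In it, y is completely determined by x (y = x²), yet the relationship is neither linear nor monotonic, so both coefficients come out at zero.

```python
import math

def rank_avg(values):
    """Descending ranks; tied values share the average of their positions."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def pearson(xs, ys):
    """Pearson's r from its textbook definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical U-shaped data: y falls and then rises as x grows.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_pearson = pearson(x, y)
r_spearman = pearson(rank_avg(x), rank_avg(y))
print(r_pearson, r_spearman)   # both are zero
```

A scatter chart of these points would show the U shape at once, which is why plotting the data is worth the extra minute.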
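To draw a correlation graph for the ranked data, plot the two rank columns (D2:E11) as a scatter chart, add a linear trendline to it, and choose to display the R-squared value on the chart.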
As a result, you will get a visual representation of the relationship between the ranks. Additionally, you will get the Coefficient of Determination (R²), the square root of which is the absolute value of the Pearson correlation coefficient (r). But because you have plotted the ranked data, this Pearson's r is nothing other than Spearman's rho.
Note. R-squared is never negative, hence the Spearman rank correlation coefficient deduced from it will never be negative either. To add an appropriate sign, just look at the line in your correlation graph - an upward slope indicates a positive correlation (plus sign) and a downward slope indicates a negative correlation (minus sign).
In our case, R² equals 0.5739210285. Use the SQRT function to find the square root:
=SQRT(0.5739210285)
…and you will get the already familiar coefficient of 0.757575758.
The downward slope in the graph exhibits a negative correlation, so we add the minus sign and get the correct Spearman correlation coefficient of -0.757575758.
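The two steps above (take the square root, then restore the sign from the slope) can be sketched in Python like this. The variable names are our own, and the flag for the slope direction is simply typed in by hand, just as you would read it off the chart.

```python
import math

r_squared = 0.5739210285     # R² value reported for the example above
slope_is_downward = True     # read from the chart: the trendline goes down

rho = math.sqrt(r_squared)   # magnitude only; a square root carries no sign
if slope_is_downward:
    rho = -rho               # a downward trendline means a negative correlation

print(round(rho, 9))         # -0.757575758
```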
That's how you can calculate the Spearman rank correlation coefficient in Excel. To have a closer look at the examples discussed in this tutorial, you are welcome to download our sample workbook below. I thank you for reading and hope to see you on our blog next week!
Spearman Rank Correlation in Excel (.xlsx file)