by Svetlana Cheusheva, updated on

*The tutorial explains the basics of Spearman correlation in a simple language and shows how to calculate the Spearman rank correlation coefficient in Excel.*

When doing correlation analysis in Excel, you will most often deal with the Pearson correlation. But because the Pearson correlation coefficient measures only the linear relationship between two variables, it does not suit every data set: your variables may be strongly associated in a non-linear way and still have a coefficient close to zero. In such circumstances, you can use the Spearman rank correlation instead.

The **Spearman correlation** is the nonparametric version of the Pearson correlation coefficient that measures the degree of association between two variables based on their ranks.

The *Pearson Product Moment Correlation* tests the **linear** relationship between two continuous variables. A relationship is linear when the two variables change together at a constant rate.

*Spearman Rank Correlation* evaluates the **monotonic** relationship between the ranked values. In a monotonic relationship, the variables also tend to change together, but not necessarily at a constant rate.
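To make the distinction concrete, here is a minimal Python sketch (standard library only, made-up data) in which *y* = *x*³: the relationship is perfectly monotonic but not linear. The helper names `pearson`, `average_ranks`, and `spearman` are our own illustrations, not a library API. `average_ranks` assumes the same convention as Excel's RANK.AVG with order 0 (largest value gets rank 1, ties share their average rank).

```python
import math


def pearson(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank from largest (1) to smallest; tied values share their average rank."""
    return [
        sum(1 for w in values if w > v) + (sum(1 for w in values if w == v) + 1) / 2
        for v in values
    ]


def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = list(range(1, 11))   # hypothetical data: 1, 2, ..., 10
y = [v ** 3 for v in x]  # monotonic, but far from a straight line

print(f"Pearson:  {pearson(x, y):.4f}")   # below 1 because the curve bends
print(f"Spearman: {spearman(x, y):.4f}")  # 1 because the order of x and y agrees
```

With these numbers, Pearson's r comes out at about 0.93, while Spearman's rho is exactly 1: the ranks of x and y agree perfectly even though the points do not lie on a straight line.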

Use the Spearman correlation in any of the following circumstances, when the underlying assumptions of the Pearson correlation are not met:

- If your data exhibit a **non-linear** (but monotonic) relationship or are not normally distributed.
- If at least one variable is **ordinal**. If your values can be placed in "first, second, third…" order, you are dealing with ordinal data.
- If there are significant **outliers**. Unlike the Pearson correlation, the Spearman correlation is much less sensitive to outliers because it performs calculations on the ranks, so the magnitude of the differences between actual values does not matter.
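The outlier point can be illustrated the same way. The sketch below (hypothetical data and helper names, standard library only) replaces one value with an extreme one and compares the two coefficients before and after.

```python
import math


def pearson(x, y):
    """Pearson's r computed on the raw values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Descending ranks (largest = 1) with ties averaged."""
    return [
        sum(1 for w in values if w > v) + (sum(1 for w in values if w == v) + 1) / 2
        for v in values
    ]


def spearman(x, y):
    """Pearson's r on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = list(range(1, 11))             # hypothetical data
y_clean = list(range(1, 11))       # y rises steadily with x
y_outlier = y_clean[:-1] + [1000]  # same data, last value replaced by an extreme outlier

for label, y in (("clean", y_clean), ("outlier", y_outlier)):
    print(f"{label:8} Pearson={pearson(x, y):.3f}  Spearman={spearman(x, y):.3f}")
```

Pearson's r falls from 1 to roughly 0.53, whereas Spearman's rho stays at 1: the outlier 1000 still takes the same rank that 10 did, so none of the ranks change.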

For example, you can use the Spearman correlation to find the answers to the following questions:

- Are people with a higher level of education more concerned about the environment?
- Is the number of symptoms a patient has related to their willingness to take medication?

In statistics, the **Spearman correlation coefficient** is represented by either $r_s$ or the Greek letter $\rho$ ("rho"), which is why it is often called *Spearman's rho*.

The Spearman rank correlation coefficient measures both the strength and the direction of the relationship between the ranks of data. It can be any value from -1 to 1, and the closer the absolute value of the coefficient is to 1, the stronger the relationship:

- 1 is a perfect positive correlation
- -1 is a perfect negative correlation
- 0 is no correlation

Depending on whether or not there are ties in the ranking (the same rank assigned to two or more observations), the Spearman correlation coefficient can be calculated with one of the following formulas.

If there are **no tied ranks**, a simpler formula will do:

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

Where:

- $d_i$ is the difference between a pair of ranks
- $n$ is the number of observations
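As a quick illustration, the no-ties formula translates almost word for word into Python. This is a minimal sketch: the function name `spearman_no_ties` is hypothetical, and the inputs are assumed to be ranks 1 to n with no duplicates.

```python
def spearman_no_ties(rank_x, rank_y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    if len(rank_x) != len(rank_y):
        raise ValueError("both variables need the same number of observations")
    n = len(rank_x)
    # d_i is the difference between the i-th pair of ranks
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks for five observations
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]
print(spearman_no_ties(rank_x, rank_y))
```

Here $\sum d_i^2 = 4$ and $n = 5$, so $\rho = 1 - 24/120 = 0.8$.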

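To deal with **tied ranks**, the full version of the Spearman correlation formula has to be used, which is simply Pearson's r applied to the ranks:

$$\rho = \frac{\sum_i \left(R(x_i) - \overline{R(x)}\right)\left(R(y_i) - \overline{R(y)}\right)}{\sqrt{\sum_i \left(R(x_i) - \overline{R(x)}\right)^2 \sum_i \left(R(y_i) - \overline{R(y)}\right)^2}}$$

Where:

- $R(x)$ and $R(y)$ are the ranks of the $x$ and $y$ variables
- $\overline{R(x)}$ and $\overline{R(y)}$ are the mean ranks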
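For tied data, the full formula amounts to ranking both variables, giving tied values their average rank, and then computing Pearson's r on those ranks. Here is a minimal sketch, assuming the descending, tie-averaging convention of Excel's RANK.AVG. The ranking direction does not affect the coefficient as long as both variables are ranked the same way. The function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank from largest (1) to smallest; tied values share the mean of their positions."""
    ranks = []
    for v in values:
        greater = sum(1 for w in values if w > v)
        tied = sum(1 for w in values if w == v)
        ranks.append(greater + (tied + 1) / 2)
    return ranks


def spearman_full(x, y):
    """Pearson's r applied to the ranks R(x) and R(y); handles tied values."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    num = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mean_rx) ** 2 for a in rx) * sum((b - mean_ry) ** 2 for b in ry))
    return num / den


# Hypothetical data with ties: two 3s in x and two 7s in y
x = [1, 3, 3, 5, 8]
y = [2, 7, 7, 6, 9]
print(spearman_full(x, y))
```

On tie-free data, this function returns the same value as the simpler $1 - 6\sum d_i^2 / (n(n^2 - 1))$ formula, which makes a handy sanity check.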

Regrettably, Excel does not have an inbuilt function for calculating the Spearman rank correlation coefficient. However, that does not mean you will have to rack your brain with the above formulas. By manipulating Excel a bit, we can come up with a much simpler way to do Spearman correlation.

As an example, let's try to find out if our physical activity has any relation to our blood pressure. In column B, we have the number of minutes that 10 men of the same age spend daily in a gym, and in column C, we have their systolic blood pressure.

To find the Spearman correlation coefficient in Excel, perform these steps:

**Rank your data**

Because the Spearman correlation evaluates the association between two variables based on their ranks, you need to rank your source data. You can do this quickly with the Excel RANK.AVG function.

To rank the first variable (physical activity), enter the below formula in D2 and then drag it down to D11:

`=RANK.AVG(B2,$B$2:$B$11,0)`

To rank the second variable (blood pressure), put the following formula in cell E2 and copy it down the column:

`=RANK.AVG(C2,$C$2:$C$11,0)`

For the formulas to work correctly, please be sure to lock the ranges with absolute cell references.
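If it helps to see what RANK.AVG does behind the scenes, the sketch below reproduces its behavior in Python for a list of numbers. This is our own approximation, not Microsoft's implementation. The function name `rank_avg` and the sample values are hypothetical.

```python
def rank_avg(number, ref, order=0):
    """Rough Python counterpart of Excel's RANK.AVG(number, ref, [order]).

    order=0 ranks in descending order (largest value gets rank 1); any other
    value ranks in ascending order. Tied values receive their average rank.
    """
    if number not in ref:
        # Excel returns #N/A when the number is not found in ref
        raise ValueError(f"{number} is not in the reference list")
    if order == 0:
        ahead = sum(1 for v in ref if v > number)
    else:
        ahead = sum(1 for v in ref if v < number)
    tied = sum(1 for v in ref if v == number)
    return ahead + (tied + 1) / 2


# Hypothetical minutes of exercise (not the article's data)
minutes = [30, 45, 45, 60, 15]

# Like entering =RANK.AVG(B2,$B$2:$B$11,0) in D2 and dragging it down:
# every call compares one value against the same fixed list
ranks = [rank_avg(m, minutes, 0) for m in minutes]
print(ranks)  # [4.0, 2.5, 2.5, 1.0, 5.0]
```

Passing the same `minutes` list to every call is the Python equivalent of locking `$B$2:$B$11`: each value is ranked against the whole sample rather than against a shifting range.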

At this point, your source data should look similar to this:

**Find Spearman correlation coefficient**

With the ranks established, we can now use the Excel CORREL function to get Spearman's rho:

`=CORREL(D2:D11, E2:E11)`

The formula returns a coefficient of -0.7576 (rounded to 4 digits). This indicates a fairly strong negative correlation and lets us conclude that the more a person exercises, the lower their blood pressure tends to be.
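The same two steps (rank, then correlate) can be sketched in Python to confirm that applying CORREL to ranks yields Spearman's rho. The helper names and the six data points below are made up for illustration; they are not the data behind the -0.7576 result.

```python
import math


def rank_avg_column(values):
    """Descending ranks with ties averaged, like filling =RANK.AVG(B2,$B$2:$B$11,0) down."""
    return [
        sum(1 for w in values if w > v) + (sum(1 for w in values if w == v) + 1) / 2
        for v in values
    ]


def correl(x, y):
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up sample: daily minutes of exercise and systolic blood pressure
minutes = [10, 25, 40, 55, 70, 85]
pressure = [150, 145, 138, 140, 128, 120]

rho = correl(rank_avg_column(minutes), rank_avg_column(pressure))
print(f"{rho:.4f}")  # -0.9429
```

Because these made-up values have no ties, the result also equals $1 - 6\sum d_i^2/(n(n^2 - 1))$ with $\sum d_i^2 = 68$ and $n = 6$, that is, $-33/35$.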

The Pearson correlation coefficient for the same sample (-0.7445) indicates a slightly weaker correlation that is still statistically significant:

The beauty of this method is that it is quick and easy, and it works whether or not there are ties in the ranking.

If you are not quite sure that the CORREL function has computed Spearman's rho correctly, you can verify the result with the traditional formula used in statistics. Here's how:

- Find the difference between each pair of ranks ($d$) by subtracting one rank from the other:

`=D2-E2`

This formula goes to F2 and is then copied down the column.

- Raise each rank difference to the power of two ($d^2$):

`=F2^2`

This formula goes to column G.

- Add up the squared differences:
`=SUM(G2:G11)`

This formula can go to any blank cell, G12 in our case.

The following screenshot should give you a better idea of how the data is arranged:

- Depending on whether your data set has any tied ranks or not, use one of these formulas to calculate the Spearman correlation coefficient.

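In our example, there are no ties, so we can go with the simpler formula:

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

With $\sum d_i^2$ equal to 290 and $n$ equal to 10, this gives:

$$\rho = 1 - \frac{6 \times 290}{10\,(10^2 - 1)} = 1 - \frac{1740}{990} \approx -0.757575758$$

The result, -0.757575758, matches the Spearman correlation coefficient calculated in the previous example.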

In Microsoft Excel, the above calculations can be performed with the following equation:

`=1-(6*G12/(10*(10^2-1)))`

Where G12 is the sum of the squared rank differences ($\sum d_i^2$) and 10 is the number of observations ($n$).
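If you prefer to check the arithmetic outside Excel, the sketch below mirrors column F, column G, and cell G12, then evaluates the no-ties formula with exact fractions. The function names and the four-observation ranks are hypothetical; only $\sum d_i^2 = 290$ and $n = 10$ come from the example above.

```python
from fractions import Fraction


def rank_differences(rank_x, rank_y):
    """Column F: each d is one rank minus the other, like =D2-E2 filled down."""
    return [a - b for a, b in zip(rank_x, rank_y)]


def spearman_from_sum_d2(sum_d2, n):
    """Exact 1 - 6*sum(d^2) / (n*(n^2 - 1)), the same math as =1-(6*G12/(10*(10^2-1)))."""
    return 1 - Fraction(6 * sum_d2, n * (n ** 2 - 1))


# Hypothetical ranks for four observations, to show the column-by-column workflow
d = rank_differences([1, 2, 3, 4], [2, 1, 4, 3])  # column F
d_squared = [v ** 2 for v in d]                    # column G: =F2^2
sum_d2 = sum(d_squared)                            # cell G12: =SUM(G2:G11)
print(spearman_from_sum_d2(sum_d2, 4))             # 3/5

# The article's figures: sum of squared rank differences = 290, n = 10
rho = spearman_from_sum_d2(290, 10)
print(rho)                                         # -25/33
```

The exact result, $-25/33$, rounds to the -0.757575758 obtained above.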

The correlation coefficients in Excel only measure linear (Pearson) or monotonic (Spearman) relationships. However, other associations are possible. So, no matter which correlation you do, it is always a good idea to represent the relationship between the variables in a graph.
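Here is a quick illustration of why the graph matters. In this hypothetical Python sketch, y is completely determined by x (a U-shaped curve), yet both coefficients come out as zero because the relationship is neither linear nor monotonic. The helper names are our own.

```python
import math


def pearson(x, y):
    """Pearson's r computed on the given values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Descending ranks (largest = 1) with ties averaged."""
    return [
        sum(1 for w in values if w > v) + (sum(1 for w in values if w == v) + 1) / 2
        for v in values
    ]


x = [-3, -2, -1, 0, 1, 2, 3]  # hypothetical data
y = [v ** 2 for v in x]       # y is fully determined by x, but falls and then rises

print(pearson(x, y))                                # 0.0
print(pearson(average_ranks(x), average_ranks(y)))  # 0.0 (Spearman's rho)
```

A scatter chart of these points would show the U shape at a glance, which neither coefficient can reveal.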

To draw a correlation graph for the ranked data, here's what you need to do:

- Calculate the ranks by using the RANK.AVG function as explained in this example.
- Select the two columns with the ranks.
- Insert an XY scatter chart. For this, click the *Scatter* chart icon on the *Insert* tab, in the *Charts* group.
- Add a trendline to your chart. The fastest way is to click the *Chart Elements* button > *Add Trendline…*.
- Display the R-squared value on the chart. Double-click the trendline to open its pane, switch to the *Trendline Options* tab, and select the *Display R-squared value on chart* box.
- Show more digits in the $R^2$ value for better accuracy.

As a result, you will get a visual representation of the relationship between the ranks. Additionally, you will get the *Coefficient of Determination* ($R^2$), the square root of which is the absolute value of the Pearson correlation coefficient ($r$). And because you have plotted the ranked data, this Pearson's r is none other than Spearman's rho.

Note. R-squared is never negative, so the Spearman rank correlation coefficient deduced from it will always come out positive. To add the appropriate sign, just look at the line in your correlation graph: an upward slope indicates a positive correlation (plus sign), and a downward slope indicates a negative correlation (minus sign).

In our case, $R^2$ equals 0.5739210285. Use the SQRT function to find the square root:

`=SQRT(0.5739210285)`

…and you will get the already familiar coefficient of 0.757575758.

The downward slope in the graph exhibits a negative correlation, so we add the minus sign and get the correct Spearman correlation coefficient of -0.757575758.
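You can check the trendline route in a few lines of Python. This sketch assumes a linear trendline fitted by least squares with an intercept (for which $R^2$ equals the square of Pearson's r) and takes the sign from the slope. The helper names and the five-point ranks are hypothetical; only the $R^2$ of 0.5739210285 and the downward slope come from the example above.

```python
import math


def trendline_r_squared_and_slope(x, y):
    """Least-squares line y = a + b*x, the kind of fit a linear trendline uses.

    Returns (R^2, slope). For a linear fit with an intercept, R^2 equals r^2.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return r_squared, slope


def rho_from_r_squared(r_squared, slope):
    """sqrt(R^2) gives |rho|; the sign is copied from the slope of the trendline."""
    return math.copysign(math.sqrt(r_squared), slope)


# The article's R^2 with a downward-sloping trendline (only the slope's sign matters)
print(rho_from_r_squared(0.5739210285, slope=-1.0))

# Hypothetical ranks to show the full round trip
rank_x = [1, 2, 3, 4, 5]
rank_y = [5, 3, 4, 1, 2]
r2, b = trendline_r_squared_and_slope(rank_x, rank_y)
print(r2, rho_from_r_squared(r2, b))
```

For the hypothetical ranks, $R^2 = 0.64$ and the slope is negative, so the recovered coefficient is -0.8. This matches the no-ties formula with $\sum d_i^2 = 36$ and $n = 5$.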

That's how you can calculate the Spearman rank correlation coefficient in Excel. To have a closer look at the examples discussed in this tutorial, you are welcome to download our sample workbook below. I thank you for reading and hope to see you on our blog next week!

Spearman Rank Correlation in Excel (.xlsx file)
