
# Calculating correlation coefficient r

AP.STATS: DAT‑1 (EU), DAT‑1.B (LO), DAT‑1.B.1 (EK), DAT‑1.B.2 (EK), DAT‑1.C (LO), DAT‑1.C.1 (EK)

The most common way to calculate the correlation coefficient (r) is by using technology, but using the formula can help us understand how r measures the direction and strength of the linear association between two quantitative variables.
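As a rough illustration of the formula-based approach, here is a minimal Python sketch. It assumes the formula in question is the z-score form that the discussion below refers to (the sum of the products of the z-scores, divided by n-1) and uses the four data points quoted in that discussion. The function names `sample_std` and `correlation_r` are made up for this example.

```python
import math

def sample_std(values):
    """Sample standard deviation: squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def correlation_r(xs, ys):
    """r as the sum of the products of z-scores, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_std(xs), sample_std(ys)
    # One z-score product per data point (x_i, y_i).
    z_products = [
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)

# Data points quoted in the discussion: (1,1), (2,2), (2,3), (3,6)
xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]
r = correlation_r(xs, ys)
print(round(r, 3))  # 0.945
```

A value close to 1 indicates a strong positive linear association, which matches how these four points rise together.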

## Want to join the conversation?

• Why would you not divide by 4 when getting the SD for x? I don't understand how we got three.
• For calculating SD for a sample (not a population), you divide by N-1 instead of N.
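To make the "divide by 3" step concrete, here is a small Python sketch. It assumes the x-values are 1, 2, 2, 3, taken from the four data points listed later in this discussion, and compares the two conventions side by side.

```python
import math
import statistics

xs = [1, 2, 2, 3]            # assumed x-values: four data points, so n = 4
mean_x = sum(xs) / len(xs)   # 2.0
squared_devs = [(x - mean_x) ** 2 for x in xs]  # [1.0, 0.0, 0.0, 1.0]

# Sample SD: divide by n - 1 = 3.
sample_sd = math.sqrt(sum(squared_devs) / (len(xs) - 1))
# Population SD: divide by n = 4.
population_sd = math.sqrt(sum(squared_devs) / len(xs))

# Python's standard library offers both conventions.
assert math.isclose(sample_sd, statistics.stdev(xs))
assert math.isclose(population_sd, statistics.pstdev(xs))

print(round(sample_sd, 3), round(population_sd, 3))  # 0.816 0.707
```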
• Why is r always between -1 and 1?

I know that this question has been asked before, but the answers are either too technical or too naive. Could someone please provide an answer that is mathematical in nature but can be understood by someone who has an OK, but not strong, mathematical foundation?

• How was the formula for correlation derived?
• Is the correlation coefficient also called the Pearson correlation coefficient?
• Yes. The Pearson correlation coefficient (also known as the Pearson product-moment correlation coefficient) is the same quantity as the sample correlation coefficient r. In this video, Sal showed the calculation for the sample correlation coefficient.
• Why is the denominator n-1 instead of n?
Thanks.
• When the instructor calculated the standard deviation (SD), he used the sample formula, which has n-1 in the denominator. If you have the whole population (or almost all of it), there is another way to calculate the correlation: use the population SD, which has n in the denominator, and divide by n instead of n-1 in the overall formula as well. It does not matter which way you choose, as long as you use the same divisor throughout. The result will be the same.
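The claim that both conventions give the same r can be checked numerically. The sketch below is only an illustration: the helper name `correlation` and its `ddof` parameter are assumptions for this example (ddof=1 means divide by n-1, ddof=0 means divide by n), and the data are the four points quoted elsewhere in this discussion.

```python
import math

def correlation(xs, ys, ddof):
    """Correlation using the divisor n - ddof consistently throughout."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - ddof))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - ddof))
    z_product_sum = sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    )
    return z_product_sum / (n - ddof)

xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]

r_sample = correlation(xs, ys, ddof=1)      # n - 1 everywhere
r_population = correlation(xs, ys, ddof=0)  # n everywhere

# The divisor cancels between the SDs and the overall average.
print(math.isclose(r_sample, r_population))  # True
```

The results agree because the divisor appears once in the overall formula and, under square roots, twice in the denominator, so it cancels. Mixing the two conventions (for example, population SDs with an n-1 divisor) would not give the same value.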
• What does the little i stand for? Like in xi or yi in the equation. Also, the sideways M means sum, right?
• This is a bit of math lingo related to the sum function, "Σ". The "i" tells us which x or y value we want. Imagine we're going through the data points in order: (1,1) then (2,2) then (2,3) then (3,6). Remembering that these stand for (x,y), if we went through all the "x"s, we would get "1" then "2" then "2" again then "3". The "i" indicates which index of that list we're on. So if "i" is 1, then "xi" is "1"; if "i" is 2, then "xi" is "2"; if "i" is 3, then "xi" is "2" again; and when "i" is 4, then "xi" is "3".
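For readers who think in code, the index "i" behaves much like a loop counter. Here is a small Python sketch using the same four data points; note that it counts from 1 to match the math notation, whereas Python lists are normally indexed from 0.

```python
points = [(1, 1), (2, 2), (2, 3), (3, 6)]
xs = [x for x, _ in points]  # the x-values in order: [1, 2, 2, 3]

# Walk through the list, with i playing the role of the subscript.
for i, x_i in enumerate(xs, start=1):
    print(f"i = {i}: x_i = {x_i}")

# The sigma symbol just means "add up x_i for every i".
total = sum(xs)
print(total)  # 8
```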
• Would the correlation coefficient be undefined if one of the z-scores in the calculation has 0 in the denominator? I thought it was possible for the standard deviation to equal 0 when all of the data points are equal to the mean.
• Yes. Assume that the following data points describe two variables (1,4); (1,7); (1,9); and (1,10). The mean for the x-values is 1, and the standard deviation is 0 (since they are all the same value). Given this scenario, the correlation coefficient would be undefined.

Another question to ask is whether it would ever make sense to calculate a correlation coefficient when one has the exact same data for one of the variables. Recall that the correlation coefficient is supposed to describe how well the two variables can be described by a linear relationship. Given that one variable (x in this case) is constant, I don't see how a line would ever describe the relationship between the two variables.
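The undefined case can be seen in a short Python sketch. The function name `correlation_or_none` is hypothetical, and it uses an algebraically equivalent form of r (the sum of the products of deviations divided by the square root of the product of the sums of squared deviations) so that the zero check is explicit rather than a division-by-zero error.

```python
import math

def correlation_or_none(xs, ys):
    """Return r, or None when either variable has zero spread."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)  # sum of squared x deviations
    ss_y = sum((y - mean_y) ** 2 for y in ys)  # sum of squared y deviations
    if ss_x == 0 or ss_y == 0:
        # A standard deviation of 0 would put 0 in the z-score denominator.
        return None
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(ss_x * ss_y)

# The data from the answer above: every x-value is 1.
constant_x = correlation_or_none([1, 1, 1, 1], [4, 7, 9, 10])
print(constant_x)  # None

# For comparison, data where both variables vary.
varying = correlation_or_none([1, 2, 2, 3], [1, 2, 3, 6])
print(round(varying, 3))  # 0.945
```

Returning `None` is just one design choice for this sketch; raising an error would be an equally reasonable way to signal that r is undefined.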