### Course: AP®︎/College Statistics>Unit 5

Lesson 2: Correlation

# Calculating correlation coefficient r

The most common way to calculate the correlation coefficient (r) is by using technology, but using the formula can help us understand how r measures the direction and strength of the linear association between two quantitative variables.
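To make the formula concrete, here is a minimal Python sketch that computes r as the sum of the products of z-scores divided by n - 1, using sample standard deviations (which also divide by n - 1). The function names and the four data points are hypothetical, chosen only for illustration; in practice you would normally let a calculator or statistics package do this.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation: divide the sum of squared deviations by n - 1
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum over all points of z_x * z_y."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    # z-score: how many SDs a value lies above (+) or below (-) its mean
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data, for illustration only
xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]
print(round(correlation(xs, ys), 3))  # prints 0.945
```

Dividing by n instead of n - 1 in both the standard deviations and the final average gives exactly the same r, because the two factors cancel.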

## Want to join the conversation?

• Why would you not divide by 4 when getting the SD for x? I don't understand how we got three.
• When calculating the SD of a sample (not a population), you divide the sum of squared deviations by n - 1 instead of n. With four data points, that means dividing by 4 - 1 = 3.
• Why is r always between -1 and 1?

I know that this question has been asked before, but the answers are either too technical or too naive. Could someone please provide an answer that is mathematical in nature but can be understood by someone who has an OK, but not strong, mathematical foundation?

• Hey there,

r is very different from slope. Slope can have any positive or negative value, because it answers the question: how much does y change when x changes by one unit? r does not tell us how big that change is. Instead, r tells us the direction of the linear relationship between x and y (positive, negative, or neither) and how strong it is, meaning how tightly the points cluster around a straight line. If every point lies exactly on an upward-sloping line, r = 1. If the relationship is positive but the points scatter around the line, r falls somewhere between 0 and 1 (for example 0.85). If there is no linear relationship, r = 0. If every point lies exactly on a downward-sloping line, r = -1.
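The difference between r and slope can be checked numerically. This hedged Python sketch (hypothetical helper names and data) uses the standard relationship b = r * s_y / s_x for the slope of the least squares line, then rescales y by a factor of 100. The slope changes, but r does not, because z-scores have no units.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (len(xs) - 1)

def least_squares_slope(xs, ys):
    # Slope of the least squares line: b = r * (s_y / s_x)
    return correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)

xs = [1, 2, 2, 3]                  # hypothetical data
ys = [1, 2, 3, 6]
ys_scaled = [100 * y for y in ys]  # same data, y converted to units 100 times smaller

r1, b1 = correlation(xs, ys), least_squares_slope(xs, ys)
r2, b2 = correlation(xs, ys_scaled), least_squares_slope(xs, ys_scaled)
print(r1, b1)
print(r2, b2)
```

For this data the slope goes from 2.5 to 250 while r stays at about 0.945.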

I think this is easiest to understand if you visualize what a change in r means in terms of the angle of the least squares line (which Sal draws in the video). The least squares line always goes through the point (mean of X, mean of Y). So imagine the minute hand on a clock, which can rotate 360 degrees but is pinned to the centre of the clock.

When it comes to telling the time, we refer to the angle of the minute hand by splitting the clock into 60. Here, a positive r means that, on average, an increase in X goes with an increase in Y, so the line slopes upward. Now let's rotate our line clockwise until it is perfectly horizontal. A horizontal least squares line goes with r = 0 (which, by the way, means that a change in X is not associated with any linear change in Y).

Let's keep rotating our imaginary line clockwise. Now we are moving 'beneath' the horizontal line, so we are moving into negative territory: the line slopes downward, and r is negative. Keep in mind that the angle only tells you the sign of r. How close r gets to -1 depends on how tightly the points hug the line, not on how steep the line is; r = -1 means every point lies exactly on a downward-sloping line.

But this cannot go on forever. As we keep rotating clockwise, the line eventually swings back around to slope upward again, so we are back in positive territory, where a tight cluster of points around the line gives an r close to 1.

Hope that helps :)
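For the original question of why r always falls between -1 and 1, here is one standard argument (not the one from the video) that only needs the fact that a sum of squares can never be negative. Write z_{x_i} and z_{y_i} for the sample z-scores of the i-th point, so that r = (1/(n-1)) Σ z_{x_i} z_{y_i}. Because each z-score divides a deviation by the sample SD, Σ z_{x_i}^2 = n - 1, and likewise for y:

```latex
\begin{aligned}
0 \le \sum_{i=1}^{n} \left(z_{x_i} \pm z_{y_i}\right)^2
  &= \sum_{i=1}^{n} z_{x_i}^2 \pm 2\sum_{i=1}^{n} z_{x_i} z_{y_i} + \sum_{i=1}^{n} z_{y_i}^2 \\
  &= (n-1) \pm 2(n-1)\,r + (n-1) \\
  &= 2(n-1)(1 \pm r)
\end{aligned}
```

Since n - 1 > 0, this forces 1 + r ≥ 0 and 1 - r ≥ 0, that is, -1 ≤ r ≤ 1. The bound r = 1 is reached only when every z_{x_i} equals z_{y_i} (all points on one upward-sloping line), and r = -1 only when every z_{x_i} equals -z_{y_i}.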
• Is the correlation coefficient also called the Pearson correlation coefficient?
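• Yes. The r in this video is the Pearson (product-moment) correlation coefficient computed from sample data; Sal showed the calculation for a sample. The population version, usually written ρ, uses the same formula applied to every member of the population. Using n instead of n - 1 does not change the value, as long as the same divisor is used for the standard deviations and for the average of the z-score products, because the divisors cancel.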
• How was the formula for correlation derived?
• bro like I'm in 6th grade and I can't understand a single word of this
• I understand that the equation for r is the average of the x z-scores multiplied by their corresponding y z-scores (dividing by n - 1 rather than n). But what's the reasoning behind multiplying the z-scores?
• Multiplying the z-scores of X and Y in the correlation coefficient formula captures how each point's deviations from the two means line up.
When both z-scores have the same sign (both positive or both negative), their product is positive, so that point pushes r toward a positive correlation. When the signs differ (one positive, one negative), their product is negative, so that point pushes r toward a negative correlation. Points far from both means contribute large products, while points near either mean contribute little.
Essentially, multiplying the z-scores accounts for the direction of each point's deviation from the mean in both variables at once, and averaging the products summarizes the relationship across all the points.
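A short Python sketch makes the sign argument visible. With the hypothetical points below, two points have z-scores with matching signs and two have opposite signs; the printed products show which points pull r up and which pull it down.

```python
from math import sqrt

xs = [1, 2, 3, 4]   # hypothetical data
ys = [2, 4, 1, 5]

def z_scores(values):
    # Sample z-scores: (value - mean) / sample SD (divides by n - 1)
    m = sum(values) / len(values)
    sd = sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]

zx, zy = z_scores(xs), z_scores(ys)
products = [a * b for a, b in zip(zx, zy)]
for x, y, a, b, p in zip(xs, ys, zx, zy, products):
    # Same-sign z-scores give a positive product; opposite signs give a negative one
    print(f"({x}, {y}): z_x={a:+.2f}  z_y={b:+.2f}  product={p:+.2f}")

r = sum(products) / (len(xs) - 1)
print(f"r = {r:.3f}")
```

The positive products outweigh the negative ones, so r comes out positive but modest (about 0.42).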
• What does the little i stand for? Like in x_i or y_i in the equation. Also, the sideways m means sum, right?
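The i is an index that counts through the data points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), and Σ (the Greek capital letter sigma, not a sideways m) means "add up the expression for every value of i". In code, a sum over i is just a loop. A small Python sketch with hypothetical values, computing the Σ (x_i - x̄)(y_i - ȳ) part that appears in the correlation formula:

```python
xs = [1, 2, 2, 3]   # x_1, x_2, x_3, x_4 (hypothetical data)
ys = [1, 2, 3, 6]   # y_1, y_2, y_3, y_4

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sigma notation: add up (x_i - x_bar) * (y_i - y_bar) for i = 1 .. n
total = 0.0
for i in range(n):  # Python counts from 0, so xs[i] holds x_(i+1)
    total += (xs[i] - x_bar) * (ys[i] - y_bar)

print(total)  # prints 5.0
```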