## AP®︎/College Statistics

Lesson 2: Correlation

# Example: Correlation coefficient intuition

Sal explains the intuition behind correlation coefficients and does a problem where he matches correlation coefficients to scatter plots.

## Want to join the conversation?

• What is "r", in the correlation coefficient r= 0.65?
• "r" is the correlation coefficient. It is always between -1 and 1, with -1 meaning the points are on a perfect straight line with negative slope, and r = 1 meaning the points are on a perfect straight line with positive slope.
If you want to calculate it from data, this is the procedure:
1) Find the mean (average) of all the x-values. Call this xbar.
2) Find the mean (average) of all the y-values. Call this ybar.
3) For every x-value, subtract xbar. Call these Δxi (i is an index. i = 1, 2, 3, ...)
4) For every y-value, subtract ybar. Call these Δyi (i is an index. i = 1, 2, 3, ...)

These Δxi's and Δyi's are called the "deviations". They will be approximately half positive and half negative, since (usually) about half the values are above the mean and half are below. To calculate r,
r = Σ(Δxi * Δyi) / [ sqrt( Σ(Δxi²) ) * sqrt( Σ(Δyi²) ) ]
So you can see that the bottom is the square root of the sum of the squared deviations for x, times the same for y. Because the deviations are squared, every term is positive, except for a few that may be zero when Δxi = 0 or Δyi = 0 (i.e. for any values exactly equal to the mean).

The key is the top, where nothing is squared. The top is the sum of Δxi *Δyi, so it will be positive when Δx and Δy are BOTH positive or BOTH negative. This pushes r towards being positive (positive correlation). But when Δx and Δy have opposite signs, then Δxi *Δyi will be negative, and that pushes r towards being negative (negative correlation).

Make up a simple example and try it, with, say, four points. Here are four points to try it with that make the calculation not too bad:
(1, 1), (2, 3), (6, 5), (7, 11)
You should find xbar = 4 and ybar = 5
Thus, Δxi's are -3, -2, 2, 3, and Δyi's are -4, -2, 0, 6
Put these in the formula and you should get r = 0.891, a quite high correlation.
Conversely, pick any four points that make a horizontal rectangle, for example (2, 2), (8, 2), (2, 6), (8, 6). If you calculate r for these points, it will be 0.
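If you would rather check the hand calculation with a script, here is a minimal sketch in plain Python that follows the same steps (means, deviations, then the formula). The function name `pearson_r` is just a label chosen for this illustration, not something from the video.

```python
import math

def pearson_r(points):
    """Correlation coefficient r for a list of (x, y) pairs, via deviations from the means."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    dx = [x - xbar for x, _ in points]  # the Δxi's
    dy = [y - ybar for _, y in points]  # the Δyi's
    top = sum(a * b for a, b in zip(dx, dy))
    bottom = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return top / bottom

example = [(1, 1), (2, 3), (6, 5), (7, 11)]
rectangle = [(2, 2), (8, 2), (2, 6), (8, 6)]

print(round(pearson_r(example), 3))    # 0.891
print(round(pearson_r(rectangle), 3))  # 0.0
```

For the rectangle, the four products Δxi * Δyi are 6, -6, -6 and 6, so the top of the fraction cancels to exactly 0.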
• Explain In fortnite terms pls
• So if you were playing fortnite and you saw a graph of your kd ratio and it goes into the negatives that means that the number would be negative and if you were going into the positives that means that the number would be positive.
• I think the answer is no, but does the slope of the line matter in regards to the r-value?

Will it always be -1 even if the line is just slightly tilted "downwards"?
• Yes and no. There are two particular situations where the slope (or lack thereof) does matter:

1. When there is no variation in the x-variable (ie: all of the points are on a vertical line).
2. When there is no variation in the y-variable (all the points are on a horizontal line).

In both of these cases, the correlation is undefined, because the formula ends up dividing by zero (for a vertical line the slope is undefined as well, while for a horizontal line the slope is 0). But outside of these special cases, the answer is no, the magnitude of the slope doesn't matter, only the sign. If all the points lie on a straight line, then the slope could be -1 or -1000, and the correlation coefficient would still be -1.
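Here is a small sketch of these cases in plain Python. The helper `pearson_r_or_none` is a made-up name for this illustration; returning `None` when the denominator is zero is my own convention for "undefined", not a standard one.

```python
import math

def pearson_r_or_none(xs, ys):
    """Return r, or None when x or y has no variation (the formula would divide by zero)."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    dx = [x - xbar for x in xs]
    dy = [y - ybar for y in ys]
    bottom = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if bottom == 0:
        return None  # treated as "undefined" in this sketch
    return sum(a * b for a, b in zip(dx, dy)) / bottom

xs = [1, 2, 3, 4, 5]
gentle = pearson_r_or_none(xs, [-1 * x for x in xs])     # points on a line with slope -1
steep = pearson_r_or_none(xs, [-1000 * x for x in xs])   # points on a line with slope -1000
flat = pearson_r_or_none(xs, [7, 7, 7, 7, 7])            # horizontal line: no variation in y
vertical = pearson_r_or_none([3, 3, 3, 3, 3], xs)        # vertical line: no variation in x

print(round(gentle, 6), round(steep, 6))  # -1.0 -1.0
print(flat, vertical)                     # None None
```

The rounding is only there to hide tiny floating-point error. The two tilted lines give the same r even though their slopes differ by a factor of 1000.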
• Can a line be greater than 1 or less than -1?
• Not in this context, no. 1 means a perfect positive correlation here while -1 means a perfect negative correlation. Any deviation from this perfect correlation would reduce the magnitude of the correlation coefficient. It is important to note that the correlation coefficient is NOT the incline / slope of the line that depicts the given data, but rather the degree to which the data can be described by that line, or how far the data deviates from it. Hence the term linear correlation. If the data results in a perfect line, it is an r = 1 (the more, the more) or an r = -1 (the more, the less).
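To see how moving points off the line pulls r away from 1, here is a rough sketch in plain Python. The data sets are made up for this illustration, and `r_value` is just a name chosen here.

```python
import math

def r_value(xs, ys):
    """Correlation coefficient r computed from deviations from the means."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    dx = [x - xbar for x in xs]
    dy = [y - ybar for y in ys]
    top = sum(a * b for a, b in zip(dx, dy))
    bottom = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return top / bottom

xs = [1, 2, 3, 4]
perfect = r_value(xs, [2, 4, 6, 8])    # every point exactly on y = 2x
nudged = r_value(xs, [2, 5, 5, 8])     # two middle points moved off the line
scattered = r_value(xs, [4, 2, 8, 6])  # still trending upward, but much messier

print(round(perfect, 3), round(nudged, 3), round(scattered, 3))  # 1.0 0.949 0.6
```

The further the points stray from a single straight line, the closer r gets to 0, and it never goes beyond 1 (apart from tiny floating-point error in the computed value).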
• Hello, can anyone see this message? My name is ..... and I love math, yay
• Why do the points in Scatterplot C not have a relationship?
• There is no relationship because y doesn't change in any consistent way as x changes. As x increases, y doesn't clearly increase or decrease. The points seem to be randomly spread throughout the graph.
• I don't get how you got r = 1.
• In the video, Sal explains that if you can draw a line that goes through all the points in the data set, your r-value will be 1 or -1. In simple terms, all of the points must be collinear. The sign of your r-value comes from the direction in which your data set generally trends. In Sal's example, the data set moved in the positive direction, which justifies the positive sign. Apart from the sign, r is a different idea from slope: r measures how closely the points line up along the best-fit line, while the slope gives you the change in y per unit change in x.
• Sal says that a correlation coefficient of 0 means that a line would not fit well at all. Do we define lines as y=mx+b (algebra) or as a set of points that extends infinitely in both/opposite directions (geometry)? Because x=0 geometrically is a line, but algebraically is not. So if the line of best fit is x=0, then what would the correlation coefficient be? (Sorry if this is a dumb question.)
• For the last specific case you mentioned (x = 0), the formula for r gives 0/0, so strictly speaking r is undefined rather than 0.

Visually, the line sits exactly on the y-axis. This means x never changes, and even when you "choose" 0 as x, it can't give you a definite answer, since y could take any value. So there's no trend between the x and y variables here at all.

I think your question isn't dumb, rather thought-provoking.

Keep going.

P.S. If you meant y = 0x + b by saying x = 0, the same logic applies more clearly. y = b is a line with slope 0, so whatever you choose as x has no impact on y, since y is always b. Again there is no trend, and again r is undefined because y has no variation.