
Video transcript

I took some screen captures from the Khan Academy exercise on correlation coefficient intuition. They've given us some correlation coefficients, and we need to match them to the various scatter plots. In that exercise there's a little interface where we can drag these around in a table to match them to the different scatter plots. The point isn't to figure out how exactly to calculate these; we'll do that in the future. The point is really to get an intuition for what we are trying to measure. The main idea is that correlation coefficients are trying to measure how well a linear model can describe the relationship between two variables.

So for example, let me draw some coordinate axes here. Let's say that's one variable, my y variable, and let's say that is my x variable. Let's say when x is low, y is low; when x is a little higher, y is a little higher; when x is a little bit higher still, y is higher; and when x is really high, y is even higher. A linear model would describe this one very, very well. It's quite easy to draw a line that essentially goes through those points. So something like this would have an r of 1: r is equal to 1. A linear model perfectly describes it, and it's a positive correlation: when one variable gets larger, the other variable is larger, and when one variable is smaller, the other variable is smaller.

Now, what would an r of negative 1 look like? Well, that would once again be a situation where a linear model works really well, but when one variable moves up, the other one moves down, and vice versa. So let me draw my coordinate axes again. I'm going to try to draw a data set where r would be negative 1. Maybe when y is high, x is very low; when y becomes lower, x becomes higher; when y becomes a good bit lower, x becomes a good bit higher. So once again, as y decreases, x increases, or as x increases, y decreases, so
they're moving in opposite directions, but you can fit a line very easily to this. The line would look something like this. So this would have an r of negative 1.

And an r of 0, r is equal to 0, would be a data set where a line doesn't really fit very well at all. I'll draw that one really small, since I don't have much space here. An r of 0 might look something like this: maybe I have a data point here, maybe I have a data point here, and here, and here, and maybe I have one there, there, there, there, and there. It wouldn't necessarily be this well organized, but this gives you a sense of things. How would you actually try to fit a line here? You could equally justify a line that looks like that, or a line that looks like that, or a line that looks like that. So a linear model really does not describe the relationship between the two variables that well right over here.

So with that as a primer, let's see if we can tackle these scatter plots. The way I'm going to do it is I'm just going to try to eyeball what a linear model might look like. There are different methods of trying to fit a linear model to a data set, including an imperfect data set. I drew very perfect ones, at least for r equals negative 1 and r equals 1, but these are what the real world actually looks like: very few times will things perfectly sit on a line.

So for scatter plot A, if I were to try to fit a line, trying to minimize distances from these points to the line, it would look something like that. I do see a general trend: if I look at these data points over here, when y is high, x is low, and when x is larger, y is smaller. So it looks like r is going to be less than zero, and a reasonable bit less than zero. It's going to approach this case here, r equals negative 1. And if we look at our choices, it wouldn't be r equals
0.65 or the other positive one; these are positive, so I wouldn't use that one or that one. And this one is almost no correlation: r equals negative 0.02 is pretty close to zero. So I feel good with r is equal to negative 0.72.

Now I want to be clear: if I didn't have these choices here, I wouldn't be able to say, just looking at these data points without doing a calculation, that r is equal to negative 0.72. I'm just basing it on the intuition that it is a negative correlation and it seems pretty strong. The pattern kind of jumps out at you that when y is large, x is small, and when x is large, y is small. So I like something that's approaching r equals negative 1. So I've used this one up already.

Now scatter plot B. If I were to just try to eyeball it, and once again this is going to be imperfect, a line fit to it looks something like that. So it looks like a line fits it reasonably well. There are some points that would still be hard to fit; they're still pretty far from the line. And it looks like it's a positive correlation: when y is small, x is relatively small, and vice versa, and as x grows, y grows, and when y grows, x grows. So this one's going to be positive, and it looks like it would be reasonably positive. I have two choices here, so I don't know yet which of these it's going to be. It's either going to be r is equal to 0.65 or r is equal to 0.84.

We've also got scatter plot C. Now this one's all over the place. It kind of looks like what we did over here. What does a line look like? You could almost imagine anything. Does it look like that? Does it look like that? Does a line look like that? There's not a direction where you could say, well, as x increases, maybe y increases or decreases. There's no rhyme or reason here. So this looks very uncorrelated, and so this one is pretty close to zero. So I feel pretty good that this is the r
is equal to negative 0.02. In fact, if we tried, probably the best line that could be fit would be one with a slight negative slope, so it might look something like this. And notice, even when we try to fit a line, there are all sorts of points that are way off the line, so the linear model did not fit it that well. So r is equal to negative 0.02. We've used that one.

And so now we have scatter plot D. That's going to use one of the other positive correlations. And it does look like there is a positive correlation: when y is low, x is low, and when x is high, y is high, and vice versa. So we could try to fit something that looks something like that. But it's still not as good as that one. You can see, among the points that we're trying to fit, there are several that are still pretty far away from our model, so the model is not fitting it that well. So I would say scatter plot B is a better fit: a linear model works better for scatter plot B than it works for scatter plot D. So I would give the higher r, r equals 0.84, to scatter plot B, and the lower r, r equals 0.65, to scatter plot D. And once again, that's because with the linear model it looks like there's a trend, but there are more data points that are way off the line in scatter plot D than in scatter plot B. There are a few in B that are still way off the line, but these are even more off of the line in D.
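The three hand-drawn cases from the start of the video (r equals 1, r equals negative 1, and r equals 0) can be checked numerically. The video leaves the calculation itself for later, so what follows is only a minimal sketch: it assumes the standard Pearson definition of r, and the function name `pearson_r` and all of the data points are made up for illustration. They are not the exercise's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when x and y move together.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: points sitting exactly on a rising line.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
# Hypothetical data: points sitting exactly on a falling line.
y_down = [10, 8, 6, 4, 2]
# Hypothetical data: a tidy 3-by-3 grid, like the "well organized" r = 0 sketch.
grid_x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
grid_y = [1, 2, 3, 1, 2, 3, 1, 2, 3]

r_up = pearson_r(x, y_up)               # perfect positive correlation
r_down = pearson_r(x, y_down)           # perfect negative correlation
r_grid = pearson_r(grid_x, grid_y)      # no linear relationship

print(r_up, r_down, r_grid)
```

The grid case shows why no single line is favored there: for every x value the y values are spread the same way, so the deviations cancel and r comes out as zero.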
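The comparison between scatter plots B and D (same upward trend, but more points far from the line in D) can be imitated with made-up numbers. This is a rough sketch under stated assumptions: `pearson_r` is the standard Pearson formula, `wiggly_line` is a hypothetical helper that pushes points alternately above and below the line y = x by a fixed amount, and none of the values come from the actual exercise.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def wiggly_line(spread):
    """Hypothetical data: points on y = x, nudged up or down by `spread`."""
    xs = list(range(10))
    # Even x values are pushed above the line, odd x values below it.
    ys = [x + spread * (1 if x % 2 == 0 else -1) for x in xs]
    return xs, ys

tight_x, tight_y = wiggly_line(1)   # points stay close to the line
loose_x, loose_y = wiggly_line(3)   # points sit farther from the line

r_tight = pearson_r(tight_x, tight_y)
r_loose = pearson_r(loose_x, loose_y)

# Flipping the sign of y turns the upward trend into a downward one,
# which flips the sign of r without changing its strength.
r_flipped = pearson_r(tight_x, [-y for y in tight_y])

print(r_tight, r_loose, r_flipped)
```

Both data sets trend upward, so both values of r are positive, but the one whose points sit farther from the line gets the smaller r. That is the same reasoning used to give the higher coefficient to scatter plot B and the lower one to scatter plot D.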