AP.STATS: DAT‑1 (EU), DAT‑1.G (LO), DAT‑1.G.1 (EK), DAT‑1.G.2 (EK)

Video transcript

In previous videos, we took this bivariate data and calculated the correlation coefficient. Just as a bit of a review, we have the formula here, and it looks a bit intimidating, but in that video we saw that all it is is an average of the product of the z-scores for each of those pairs. As we said, if r is equal to 1, you have a perfect positive correlation. If r is equal to negative 1, you have a perfect negative correlation. And if r is equal to 0, you don't have a correlation. For this particular bivariate data set, we got an r of 0.946, which means we have a fairly strong positive correlation. What we're going to do in this video is build on this notion and actually come up with the equation for the least squares line that tries to fit these points.

Before I do that, let's just visualize some of the statistics that we have here for these data points. We clearly have the four data points plotted, but let's plot the statistics for x. The sample mean and the sample standard deviation for x are here in red, and actually let me box these off in red so that you know that's what is going on here. The sample mean for x is easy to calculate: 1 plus 2 plus 2 plus 3, divided by 4, is 8 divided by 4, which is 2. So we have x equals 2 right over here. Then this is one sample standard deviation above the mean, and this is one sample standard deviation below the mean. We could do the same thing for the y variables. The mean is 3, and this is one sample standard deviation for y above the mean, and this is one sample standard deviation for y below the mean. Visualizing these means, especially their intersection, and also their standard deviations will help us build an intuition for the equation of the least squares line.

Generally speaking, the equation for any line is going to be y is equal to mx plus b, where m is the slope and b is the y-intercept. For the regression line, we'll put a little hat over the y, so this you would literally
say "y hat." This tells you that this is a regression line that we're trying to fit to these points.

First, what is going to be the slope? Well, the slope is going to be r times the ratio of the sample standard deviation in the y direction over the sample standard deviation in the x direction. This might not seem intuitive at first, but we'll talk about it in a few seconds, and hopefully it'll make a lot more sense. The next thing we need to know is: all right, if we can calculate our slope, how do we calculate our y-intercept? Well, like you first learned in Algebra 1, you can calculate the y-intercept if you already know the slope by asking what point is definitely going to be on the line. For a least squares regression line, you're definitely going to have the point (sample mean of x, sample mean of y). So you're definitely going to go through that point.

Before I even calculate for this particular example, where in previous videos we calculated r to be 0.946, or roughly equal to that, let's just think about what's going on. Our least squares line is definitely going to go through that point. Now, if r were 1, if we had a perfect positive correlation, then our slope would be the standard deviation of y over the standard deviation of x. So if you were to start at this point, and if you were to run your standard deviation of x and rise your standard deviation of y, well, with a perfect positive correlation your line would look like this. That makes a lot of sense, because you're looking at your spread of y over your spread of x. If r were equal to 1, this would be your slope: standard deviation of y over standard deviation of x. That has parallels to when you first learn about slope, change in y over change in x. Here you're seeing, you could say, the average spread in y over the average spread in x. And this would be the case when r is 1, so let me write that down: this would be the case if r is equal to 1. What if r were equal to negative 1? It
would look like this. That would be our line if we had a perfect negative correlation. Now, what if r were 0? Then your slope would be 0, and your line would just be this line, y is equal to the mean of y. So you would just go through that right over there.

But now let's think about this scenario. In this scenario, our r is 0.946, so we have a fairly strong correlation. This is pretty close to 1. So if you were to take 0.946 and multiply it by this ratio, if you were to move forward in x by the standard deviation in x, for this case, how much would you move up in y? Well, you would move up r times the standard deviation of y. As we said, if r were 1, you would get all the way up to this perfect correlation line, but here it's 0.946, so you would get up about 95% of the way to that. So our line, without even looking at the equation, is going to look something like this, which we can see is a pretty good fit for those points. I'm not proving it here in this video, but now that we have an intuition for these things, hopefully you appreciate that this isn't just coming out of nowhere as some strange formula. It actually makes intuitive sense.

Let's calculate it for this particular set of data. m is going to be equal to r, 0.946, times the sample standard deviation of y, 2.160, over the sample standard deviation of x, 0.816. We can get our calculator out to calculate that. So we have 0.946 times 2.160 divided by 0.816, and it gets us to 2.50. Let's just round to the nearest hundredth for simplicity here, so this is approximately equal to 2.50.

So how do we figure out the y-intercept? Well, remember, we go through this point. So we're going to have 2.50 times our x mean, and our x mean is 2, so times 2 (remember, this right over here is our x mean), plus b, is going to be equal to our y mean. Our y mean, we
see right over here, is 3. So what do we get? We get 3 is equal to 5 plus b. And so what is b? Well, if you subtract 5 from both sides, you get b is equal to negative 2.

And so there you have it, the equation for our regression line. We deserve a little bit of a drumroll here. We would say y hat, where the hat tells us that this is an equation for a regression line, is equal to 2.50 times x minus 2. And we are done.
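
For reference, the relationships described in words in the transcript can be written compactly. The symbols \bar{x}, \bar{y}, s_x, and s_y are notation assumed here for the sample means and sample standard deviations; the transcript only names them in words. The second line plugs in the rounded values quoted in the video.

```latex
\begin{align*}
\hat{y} &= mx + b, & m &= r\,\frac{s_y}{s_x}, & b &= \bar{y} - m\,\bar{x} \\
m &= 0.946 \cdot \frac{2.160}{0.816} \approx 2.50, & b &= 3 - 2.50 \cdot 2 = -2, & \hat{y} &= 2.50\,x - 2
\end{align*}
```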
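
The arithmetic in the transcript can also be checked with a few lines of Python. This is only a sketch under stated assumptions: it starts from the rounded summary statistics quoted in the video rather than from the raw data points, and the function name `least_squares_line` and its `digits` parameter are hypothetical, introduced here for illustration.

```python
# Summary statistics as quoted in the transcript (already rounded there).
r = 0.946      # correlation coefficient
s_x = 0.816    # sample standard deviation of x
s_y = 2.160    # sample standard deviation of y
x_mean = 2.0   # sample mean of x
y_mean = 3.0   # sample mean of y


def least_squares_line(r, s_x, s_y, x_mean, y_mean, digits=2):
    """Return (slope, intercept) of the least squares regression line.

    Hypothetical helper: the slope is r * s_y / s_x, rounded to `digits`
    decimal places as the transcript does, and the intercept is chosen so
    the line passes through the point (x_mean, y_mean).
    """
    slope = round(r * s_y / s_x, digits)
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = least_squares_line(r, s_x, s_y, x_mean, y_mean)
print(f"y_hat = {slope:.2f} * x + ({intercept:.2f})")
# prints: y_hat = 2.50 * x + (-2.00)
```

Rounding the slope before computing the intercept mirrors the order of steps in the video. Without that rounding, the slope comes out near 2.504 and the intercept near negative 2.008, because the inputs themselves are rounded.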