In the last several videos we did some fairly hairy mathematics, and you might have even skipped them, but we got to a pretty neat result. We got to a formula for the slope and y-intercept of the best-fitting regression line when you measure the error by the squared distance to that line. I'll just rewrite the formula here so we have something neat to look at. The slope of that line is going to be the mean of the x's times the mean of the y's, minus the mean of the xy's, all divided by the mean of the x's squared minus the mean of the x squareds:

m = (mean(x) * mean(y) - mean(xy)) / ((mean(x))^2 - mean(x^2))

Don't worry if this seems really confusing; we're going to do an example of it in a few seconds. If this looks a little different from what you see in your statistics class or your textbook, you might see it swapped around. If you multiply both the numerator and the denominator by negative one, you could see this written as the mean of the xy's minus the mean of the x's times the mean of the y's, all of that over the mean of the x squareds minus the mean of the x's squared:

m = (mean(xy) - mean(x) * mean(y)) / (mean(x^2) - (mean(x))^2)

These are obviously the same thing. You've just multiplied the numerator and the denominator by negative one, which is the same thing as multiplying the whole thing by one. And of course, whatever you get for m, you can then substitute back in over here to get your b. Your b is going to be equal to the mean of the y's minus m times the mean of the x's, where m is whatever value you solved for here:

b = mean(y) - m * mean(x)

And this is all you need. So let's actually put that into practice.

Let's say I have three points, and I'm going to make sure that these points aren't collinear, because otherwise it wouldn't be interesting. Let me draw three points over here. Let's say that one point is the point (1, 2), so we have the point (1, 2) right over here. Then let's say we also have the point (2, 1). And then let's do something a little bit crazy; let's do (3, 4). Well, let's do (4, 3), just so we can actually fit it on the page. So (4, 3) is going to be right over here. Those are our three points: (1, 2), (2, 1), and (4, 3).

What we want to do is find the best-fitting regression line, which I suspect is going to look something like that. We'll see what it actually looks like using our formulas, which we have proven. A good place to start is just to calculate these things ahead of time and then substitute them back into the equation.

So what's the mean of our x's? The mean of our x's is going to be 1 plus 2 plus 4, divided by 3. What's this going to be? 1 plus 2 is 3, plus 4 is 7, divided by 3. It is equal to 7/3.

Now what is the mean of our y's? The mean of our y's is equal to 2 plus 1 plus 3, all of that over 3. So 2 plus 1 is 3, plus 3 is 6, and 6 divided by 3 is equal to 2.

Now what is the mean of our xy's? Our first xy over here is 1 times 2, so it's going to be 1 times 2, plus 2 times 1, plus 4 times 3, and we have three of these xy's, so divided by 3. What's this going to be equal to? We have 2 plus 2, which is 4, and 4 plus 12, which is 16. So it's going to be 16/3. Did I get that right? 4 plus 12, yep, 16/3.

And then the last one we have to calculate is the mean of the x squareds. The first x squared is just going to be 1 squared, plus this 2 squared, plus this 4 squared, and we have three data points again. So this is 1 plus 4, which is 5, plus 16, which is equal to 21, over 3, which is equal to 7. So that worked out to a pretty neat number.

So let's actually find our m and our b. Our optimal slope for our regression line: the mean of the x's is 7/3, times the mean of the y's, which is 2, minus the mean of the xy's, which is 16/3. And then all of that over the mean of the x's squared, so 7/3 squared, minus the mean of the x squareds, so minus this 7 right over here.

Now we just have to do a little bit of mathematics. I'm tempted to get out my calculator, but I'll resist the temptation; it's nice to keep things as fractions. So let's see if I can calculate this. The numerator is 14/3 minus 16/3. The denominator is 49/9, since 7/3 squared is 49/9, minus 7, and if I wanted to express that as something over 9, that's the same thing as 63/9. So in our numerator we get negative 2/3, and in our denominator, 49 minus 63 is negative 14, so that's negative 14/9. This is the same thing as negative 2/3 times negative 9/14. The negatives are going to cancel out, first of all. Divide by 3: the 3 becomes a 1 and the 9 becomes a 3. Divide by 2: the 2 becomes a 1 and the 14 becomes a 7. So our slope is 3/7. Not too bad.

Now we can go back and figure out our y-intercept using this formula right over here. Our y-intercept b is going to be equal to the mean of the y's, which is 2, minus our slope, which we just figured out to be 3/7, times the mean of the x's, which is 7/3. These are just the reciprocals of each other, so their product is 1. So our y-intercept is literally just 2 minus 1, which equals 1.

So we have the equation for our regression line. We figured out that m is 3/7 and our y-intercept is 1, so y is equal to 3/7 x plus 1, and we are done.

So let's actually try to graph this. Our y-intercept is going to be 1, so it's going to be right over there. And the slope of our line is 3/7, so for every 7 we run, we rise 3. Or another way to think of it: for every three and a half we run, we rise one and a half. So we're going to go up one and a half right over here. So this line, if you were to graph it, and obviously I'm hand-drawing it so it's not going to be that exact, is going to look like that right over there. It actually won't go directly through that point, so I don't want to give you that impression. It might look something like this.

And for this line, we have shown that this formula minimizes the squared distances from each of these points to that line. Anyway, that was, at least in my mind, pretty neat.
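
The arithmetic in this example can be checked with a short script. What follows is only a sketch of the slope and intercept formulas stated at the start of the video; the function names `mean` and `fit_line` are made up for illustration and do not come from the video. It uses Python's standard `fractions` module so that the results stay as exact fractions, the same way the example keeps them by hand.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean of a sequence of Fractions, kept exact."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def fit_line(points):
    """Return (m, b) for the least-squares line y = m*x + b.

    Hypothetical helper: applies the formulas from the transcript,
        m = (mean(x)*mean(y) - mean(xy)) / (mean(x)^2 - mean(x^2))
        b = mean(y) - m*mean(x)
    It assumes the x values are not all equal (otherwise the
    denominator is zero and the slope is undefined).
    """
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]

    mean_x = mean(xs)                                # 7/3 for the example
    mean_y = mean(ys)                                # 2
    mean_xy = mean(x * y for x, y in zip(xs, ys))    # 16/3
    mean_x2 = mean(x * x for x in xs)                # 7

    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x2)
    b = mean_y - m * mean_x
    return m, b


# The three points used in the video.
m, b = fit_line([(1, 2), (2, 1), (4, 3)])
print(m, b)  # 3/7 1
```

Exact fractions are used here instead of floating-point numbers only so that the output matches the hand calculation digit for digit; with floats the slope would print as roughly 0.428571.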