In the next few videos I'm going to embark on something that will result in a formula that's pretty straightforward to apply. In most statistics classes you'll just see that end product, but I actually want to show how to get there. I just want to warn you right now: it's going to be a lot of hairy math, most of it hairy algebra, and then we're going to have to do a little bit of calculus near the end. We're going to have to do a few partial derivatives. So if any of that sounds daunting, or sounds like something that will discourage you in some way, you don't have to watch it. You could skip to the end and just get the formula that we're going to derive. But I, at least, find it pretty satisfying to actually derive it.

So what we're going to think about here is this: let's say we have n points on a coordinate plane. They don't all have to be in the first quadrant, but for simplicity of visualization I'll draw them all in the first quadrant. Let's say I have this point right over here (let me do them in different colors), and its coordinates are (x_1, y_1). Then let's say I have another point over here, and the coordinates there are (x_2, y_2). I could keep adding points and keep drawing them; we just have a ton of points, there and there and there, and we go all the way to the nth point. Maybe it's over here, and we're going to call it (x_n, y_n). So we have n points here. I haven't drawn all of the actual points, but what I want to do is find a line that minimizes the squared distances to these different points.

So let's visualize that line for a second. There's going to be some line, and I'm going to try to draw a line that kind of approximates what these points are doing. Maybe the line looks something like this. I'm going to try my best to approximate it. Actually, let me draw it a little bit differently; maybe it looks something like that. I don't even know what it looks like right now. What we want to do is minimize the squared error from each of these points to the line.

So let's think about what that means. The equation of this line is y = mx + b, and this comes straight out of Algebra 1: m is the slope of the line and b is the y-intercept. This is actually the point (0, b) right here. What I want to do, and this is what the topic of the next few videos is going to be, is find an m and a b, the two things that define this line, so that it minimizes the squared error.

So let me define what the error even is. For each of these points, the error between it and the line is the vertical distance. This right here we can call error 1, and this right here would be error 2: the vertical distance between that point and the line, or you could think of it as the difference between the y value of this point and the y value of the line. And you just keep going all the way to the nth point, with the difference between the y value of that point and the y value of the line.

So this error right here, error 1, if you think about it, is this y value, which is y_1, minus this y value on the line. Well, what's that y value going to be? Over here x is equal to x_1, so this point on the line is at m x_1 + b. You put x_1 into the equation of the line and you get this point right over here. So the first error is y_1 - (m x_1 + b).

We can keep doing this with all of the points. This error right over here is going to be y_2 - (m x_2 + b). This is y_2, and this point right here is m x_2 + b, the value you get when you put x_2 into the line. And then we keep going all the way to our nth point. This error right here is going to be y_n - (m x_n + b).

Now, if we wanted to just take the straight-up sum of the errors, we could just sum these things up. But what we want to do is minimize the square of the error between each of these n points and the line. So let me define (I'll do this in a new color) the squared error against this line as being equal to the sum of these squared errors. Error 1 is y_1 - (m x_1 + b), and we're going to square it, so this is error 1 squared. Then we go to error 2 squared, which is y_2 - (m x_2 + b), and we square that error. And then we keep going for all n points, all the way to the nth error. The nth error is y_n - (m x_n + b), and then we square it:

squared error of the line = (y_1 - (m x_1 + b))^2 + (y_2 - (m x_2 + b))^2 + ... + (y_n - (m x_n + b))^2

So this is the squared error of the line, and what we're going to do over the next few videos is find the m and b that minimize this value, that minimize the squared error of this line right here. If you view this as the best metric for how good a fit a line is, we're going to try to find the best-fitting line for these points. I'll continue in the next video, because I find that with these very hairy math problems it's good to deliver one concept at a time, and it also minimizes my probability of making a mistake.
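To make the definition concrete, here is a minimal Python sketch of the squared error of a line as described above: for each point, take the vertical distance y_i - (m x_i + b), square it, and add them all up. The function name `squared_error` and the three sample points are assumptions chosen for illustration; they do not come from the video.

```python
def squared_error(points, m, b):
    """Sum of squared vertical distances from each point to the line y = m*x + b."""
    total = 0.0
    for x, y in points:
        error = y - (m * x + b)  # vertical distance between the point and the line
        total += error ** 2      # square it and add it to the running sum
    return total


# Hypothetical sample points, made up for this example.
points = [(1, 2), (2, 3), (3, 5)]

print(squared_error(points, 1.0, 1.0))  # line y = x + 1    -> 1.0
print(squared_error(points, 1.5, 0.0))  # line y = 1.5x     -> 0.5
```

For these made-up points the second line has the smaller squared error, so by this metric it is the better fit of the two. Finding the m and b that make this number as small as possible is the goal of the derivation that follows.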