Proof (part 2) minimizing squared error to regression line

Proof Part 2 Minimizing Squared Error to Line. Created by Sal Khan.

Video transcript

Our goal is to simplify this expression for the squared error between those n points and the line. Just to remind ourselves what we're doing, we have these n points. And we're taking the sum of the squared error between each of those n points and our actual line, y equals mx plus b. And we get this expression over here, which we've been simplifying over the last couple of videos. We're going to try to simplify this expression as much as possible. And then we're going to try to minimize this expression, that is, find the m and b values that minimize it. Those give what you could call the best fitting line.

Now, so far it looks like we were just making the algebra hairier and hairier. But this next step is going to simplify things a good bit. So just to show you that, suppose I want to take the mean of all of the squared values of the y's. That would be y1 squared plus y2 squared plus all the way to yn squared. So I've summed n values, each of them squared. And then I want to divide it by n, since there are n values here. And this is the mean of the y's squared. That's how we can denote it, just like that. Or, if you multiply both sides of this equation by n, you get y1 squared plus y2 squared plus all the way to yn squared is equal to n times the mean of the squared values of y. And notice, this is exactly what we have over here. That is n times the mean of the squared values of y. Or the mean of the y squareds.

And we can do that with each of these terms. What is x1y1 plus x2y2 plus all the way to xnyn? Well, if we take this whole sum and we divide it by n, this is going to be the mean value for x times y. For each of those points, you multiply x times y. And you find the mean of all of those products. That's exactly what this is. Well, once again, you multiply both sides of this equation by n, and you get x1y1 plus x2y2 plus all the way to xnyn is equal to n times the mean of the xy's. I think you see where this is going.
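Written out in symbols, the two rewrites just described look like this. The overline notation for a mean is an assumption about how the on-screen expressions were written; the transcript only says "the mean of".

```latex
\overline{y^2} = \frac{y_1^2 + y_2^2 + \cdots + y_n^2}{n}
\quad\Longrightarrow\quad
y_1^2 + y_2^2 + \cdots + y_n^2 = n\,\overline{y^2}

\overline{xy} = \frac{x_1 y_1 + x_2 y_2 + \cdots + x_n y_n}{n}
\quad\Longrightarrow\quad
x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = n\,\overline{xy}
```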
This term right here is going to be equal to n times the mean of the products of xy. This term right here is n times the mean of the y values. And then, this term right here is n times the mean of the x squared values. This term right here is the mean of the x's times n. If you divided this by n, you'd get the mean. Since we're not dividing it by n, this is the mean times n. And this last term, obviously, we don't need to simplify at all.

So let's rewrite everything using our new notation, knowing that these are the means of y squared, of xy, and all that. So the sum of the squared error to the line from the n points is going to be equal to the following. This term right here is n times the mean of the y squared values. This term right here is equal to negative 2m, that's just that right there, times n times the mean of the xy values, the arithmetic mean. And then we have this term over here. I think you can appreciate this is simplifying the algebraic expression a good bit. This term right over here is going to be minus 2bn times the mean of the y values. And then we have plus m squared times n times the mean of the x squared values. And then we have, almost there, home stretch, this over here, which is plus 2mb times n times the mean of the x values. And then, finally, we have plus nb squared.

So really, in the last two to three videos, all we've done is simplify the expression for the sum of the squared differences from those n points to this line, y equals mx plus b. So we're finished with the hard core algebra stage. In the next stage, we actually want to optimize this. Maybe a better way to say it is that we want to minimize this expression right over here. We want to find the m and the b values that minimize it. And to help visualize it, we're going to start breaking into a little bit of three-dimensional calculus here. But hopefully it won't be too daunting. If you've done any partial derivatives, it won't be difficult.
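As a quick sanity check on the algebra, here is a minimal Python sketch that compares the direct sum of squared errors with the six-term expression built from the means. The three data points, the trial values of m and b, and all function names are made up for illustration; they are not from the video.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def squared_error_direct(xs, ys, m, b):
    """Sum of (y_i - (m*x_i + b))^2 over all points."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def squared_error_from_means(xs, ys, m, b):
    """The simplified expression, term by term as read out in the transcript."""
    n = len(xs)
    mean_y2 = mean(y * y for y in ys)               # mean of the y squareds
    mean_xy = mean(x * y for x, y in zip(xs, ys))   # mean of the xy products
    mean_y = mean(ys)                               # mean of the y's
    mean_x2 = mean(x * x for x in xs)               # mean of the x squareds
    mean_x = mean(xs)                               # mean of the x's
    return (n * mean_y2
            - 2 * m * n * mean_xy
            - 2 * b * n * mean_y
            + m ** 2 * n * mean_x2
            + 2 * m * b * n * mean_x
            + n * b ** 2)


# Hypothetical sample data and an arbitrary trial line.
xs = [1.0, 2.0, 4.0]
ys = [2.0, 1.0, 3.0]
m, b = 0.5, 1.0

direct = squared_error_direct(xs, ys, m, b)
via_means = squared_error_from_means(xs, ys, m, b)
print(direct, via_means)  # the two values should agree up to rounding
```

The two functions should agree for any m and b, since the second is only a rearrangement of the first.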
This is a surface. If you take the x and y data points as given, everything here is a constant except for the m's and the b's. We're assuming that we have the x's and y's. So we can figure out the mean of the squared values of y, the mean of the xy product, the mean of the y's, the mean of the x squareds. We assume that those are all actual numbers. So this expression right here is actually going to be a surface in three dimensions.

So you can imagine, this right here, that is the m-axis. This right here is the b-axis. And then, you could imagine the vertical axis to be the squared error. This is the squared error of the line axis. So for any combination of m and b, if you're in the mb plane, you pick some combination of m and b. You put it into this expression for the squared error of the line. It'll give you a point. If you do that for all of the combinations of m's and b's, you're going to get a surface.

And the surface is going to look something like this. I'm going to try my best to draw it. You could almost imagine it as a kind of bowl. Or you could even think of it as a three-dimensional parabola, if you want to think of it that way. Instead of a parabola that just goes like this, if you were to rotate it around and distort it a little bit, you would get this thing that looks kind of like a cup, or a thimble, or whatever. Notice, this is a three-dimensional surface. I don't know if I'm doing justice to it. So you can imagine a three-dimensional surface that looks something like this. This is the back part that you're not seeing. So that's the inside of our three-dimensional surface.

We want to find the m and b values that minimize the value on the surface. So there's some m and b value right over here that minimizes it. And I'll actually do the calculation in the next video.
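To make the "pick an m and b, get a height" idea concrete, here is a rough Python sketch that evaluates the squared error on a coarse grid of (m, b) pairs and picks the lowest grid point. The data, the grid range, and the step size are arbitrary assumptions for illustration, and the lowest grid point only approximates the true bottom of the bowl.

```python
def squared_error(xs, ys, m, b):
    """Height of the surface above the point (m, b) in the mb plane."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data, held fixed so only m and b vary.
xs = [1.0, 2.0, 4.0]
ys = [2.0, 1.0, 3.0]

# Coarse grid: m and b each run from -3.0 to 3.0 in steps of 0.1.
steps = [i / 10 for i in range(-30, 31)]

# One surface height per (m, b) combination on the grid.
surface = {(m, b): squared_error(xs, ys, m, b) for m in steps for b in steps}

# The grid point with the smallest squared error sits near the bottom of the bowl.
best_m, best_b = min(surface, key=surface.get)
print("lowest grid point:", best_m, best_b, surface[(best_m, best_b)])
```

A finer grid gets closer to the minimum, but the calculus approach described in the transcript finds it exactly without any searching.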
But to do that, we're going to find the partial derivative of this with respect to m. And we're going to find the partial derivative of this with respect to b, and set both of them equal to 0. Because this minimum point on the surface, in three dimensions, is going to occur where the slope with respect to m and the slope with respect to b are both 0. So at that point, the partial derivative of our squared error with respect to m is going to be equal to 0. And the partial derivative of our squared error with respect to b is going to be equal to 0. So all we're going to do, in the next video, is take the partial derivative of this expression with respect to m and set that equal to 0, and take the partial derivative of this with respect to b and set that equal to 0. And then we're ready to solve for the m and the b, the particular m and b that minimize the squared error.
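In symbols, the setup described above can be sketched as follows. The name SE for the squared error and the overline notation for means are assumptions, and the transcript defers the actual calculation to the next video, so the differentiated forms here are only a preview obtained by differentiating the six-term expression term by term.

```latex
SE(m, b) = n\,\overline{y^2} - 2mn\,\overline{xy} - 2bn\,\overline{y}
         + m^2 n\,\overline{x^2} + 2mbn\,\overline{x} + n b^2

\frac{\partial SE}{\partial m}
  = -2n\,\overline{xy} + 2mn\,\overline{x^2} + 2bn\,\overline{x} = 0

\frac{\partial SE}{\partial b}
  = -2n\,\overline{y} + 2mn\,\overline{x} + 2nb = 0
```

These are two linear equations in the two unknowns m and b.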