# Proof (part 3) minimizing squared error to regression line

Video transcript
All right, so where we left off, we had simplified our algebraic expression for the squared error to the line from the n data points. We kind of visualized it. This expression right here would be a surface, I guess you could view it as a surface in three dimensions, where any m and b gives a point on that surface that represents the squared error for that line. Our goal is to find the m and the b, which would define an actual line, that minimize the squared error. The way that we do that is we find a point where the partial derivative of the squared error with respect to m is 0, and the partial derivative with respect to b is also equal to 0. So it's flat with respect to m. That means that the slope in this direction is going to be flat. Let me do it in the same color. So the slope in this direction, that's the partial derivative with respect to m, is going to be flat. It's not going to change in that direction. The partial derivative with respect to b is going to be flat too. So it will be a flat point right over there. The slope at that point in that direction will also be 0, and that is our minimum point. So let's figure out the m and b that give us this. Let's take the partial derivative of this expression with respect to m. Well, this first term has no m terms in it. So it's a constant from the point of view of m. Just as a reminder, taking a partial derivative is just like taking a regular derivative. You're just assuming that everything but the variable that you're taking the partial derivative with respect to is a constant. So in this expression, all the x's, the y's, the b's, the n's, those are all constant. The only variable that matters, when we take the partial derivative with respect to m, is the m. So this is a constant. There's no m here. This term right over here, we're taking with respect to m. So the derivative of this with respect to m is just the coefficient on the m.
So negative 2 times n times the mean of the xy's, that's the partial of this with respect to m. Then this term right here has no m's in it. So it's constant with respect to m. So its partial derivative with respect to m is 0. Then this term here, you have n times the mean of the x squareds times m squared. So this is going to be (remember, we're taking a partial derivative with respect to m) 2 times n times the mean of the x squareds times m. The derivative of m squared is 2m, and then you just have this coefficient there as well. Now this term, you also have an m over there. So let's see, everything else is just kind of a coefficient on this m. So the derivative with respect to m is 2bn times the mean of the x's. If I took the derivative of 3m, the derivative is just 3. It's just the coefficient on it. Then finally, this is a constant with respect to m. So we don't see it. So this is the partial derivative with respect to m. That's that right over there. We want to set this equal to 0. Now let's do the same thing with respect to b. This term, once again, is a constant from the perspective of b. There's no b here. There's no b over here. So the partial derivative of either of these with respect to b is 0. Then over here you have a negative 2n times the mean of the y's as a coefficient on a b. So the partial derivative with respect to b is going to be minus 2n, or negative 2n, times the mean of the y's. Then there's no b over here. Then we do have a b over here. So it's plus 2mn times the mean of the x's. This is essentially the coefficient on the b over here. It was written in a mixed order, but all of these are constants from the point of view of b. They are the coefficient in front of the b. The partial derivative of that with respect to b is just going to be the coefficient. Then finally, the partial derivative of this with respect to b is going to be 2nb, or 2nb to the first, you could even say. We want to set this equal to 0.
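Written out, the term-by-term differentiation described here looks roughly like the following. The squared-error expression on the first line is reconstructed from the six terms as the transcript walks through them, and the overline notation for "the mean of" is an assumption, since the on-screen work isn't part of the transcript.

```latex
\begin{align*}
SE(m, b) &= n\,\overline{y^2} - 2mn\,\overline{xy} - 2bn\,\overline{y}
            + m^2 n\,\overline{x^2} + 2mbn\,\overline{x} + n b^2 \\[4pt]
\frac{\partial SE}{\partial m} &= -2n\,\overline{xy} + 2n\,\overline{x^2}\,m + 2bn\,\overline{x} = 0 \\[4pt]
\frac{\partial SE}{\partial b} &= -2n\,\overline{y} + 2mn\,\overline{x} + 2nb = 0
\end{align*}
```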
So it looks very complicated. But remember, we're just trying to solve for the m and the b. We have two equations with two unknowns here. We have the m's and then we have the b's. To simplify this, both of these equations, the top one and the bottom one, have both sides divisible by 2n. I mean, 0 is divisible by anything. It'll just be 0. So let's divide the top equation by 2n and see what we get. If we divide the top equation by 2n, this'll become just 1. That goes away, and then those go away. You would just be left with the negative mean of the xy's plus m times the mean of the x squareds, plus b times the mean of the x's is equal to 0. That's this first expression when you divide both sides by 2n. The second expression will be, this will go away. This is when you divide it by 2n. I don't want to say negative 2n. When you divide this by 2n, that'll go away, that will go away, and then those will go away. You're just left with the negative mean of the y's plus m times the mean of the x's plus b is equal to 0. So if we find the m and the b values that satisfy this system of equations, we have minimized the squared error. We could just solve it in a traditional way. But I want to rewrite this, because I think it's kind of interesting to see what these really represent. So let's add the mean of the xy's to both sides of this top equation. What do we get? We get m times the mean of the x squareds plus b times the mean of the x's is equal to, these are going to cancel out, the mean of the xy's. That's that top equation. This bottom equation, right here, let's add the mean of y to both sides of this equation. I do that so that that cancels out. And then we're left with (I'll do that in the blue color to show you it's the same equation) m times the mean of the x's plus b is equal to the mean of the y's. Now, I actually want to get both of these into mx plus b form.
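As a sketch of the two steps just described, dividing each equation by 2n and then moving the lone mean to the right-hand side, using overlines for "the mean of" (a notation assumed here, not taken from the video):

```latex
\begin{align*}
-\overline{xy} + m\,\overline{x^2} + b\,\overline{x} = 0
  \quad &\Longrightarrow \quad
  m\,\overline{x^2} + b\,\overline{x} = \overline{xy} \\[4pt]
-\overline{y} + m\,\overline{x} + b = 0
  \quad &\Longrightarrow \quad
  m\,\overline{x} + b = \overline{y}
\end{align*}
```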
This is actually already there. Actually, you can see something here. Our best-fitting line is going to be y is equal to mx plus b (we still have to find the m and the b), and the m and the b that satisfy both of these equations are going to be the m and the b of that best-fitting line. So that best-fitting line actually contains a point, and we get this from the second equation right here. I should write it this way: the coordinate (mean of x, mean of y) lies on the line. And you could see it right over here. If you put the mean of x in this for the optimal m and b, you are going to get the mean of the y. So that's interesting. This optimal line. Let's never forget what we're even trying to do. This optimal line is going to contain some point on it, let me do that in a new color, right here that is the mean of all of the x values and the mean of all the y values. That's just interesting. It kind of makes intuitive sense. Now this other equation, just to get it into the same point of view. Then it will actually become kind of an easier way to solve the system. You could solve this a million different ways. But just to give us an intuition of what is even going on here, what's another point that's on the line? Because if you have two points on the line, you know what the equation of the line is going to be. Well, for the other point, we want this to be in mx plus b form. So let's divide both sides of this equation by this term right here, by the mean of the x's (assuming it isn't 0). If we do that, we get m times the mean of the x squareds divided by the mean of the x's plus b is equal to the mean of the xy's divided by the mean of the x's.
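In symbols, with overlines standing for "the mean of" (an assumed notation), both equations can be read as statements of the form y = mx + b. The second one relies on the mean of the x's being nonzero:

```latex
\begin{align*}
\overline{y} &= m\,\overline{x} + b \\[4pt]
\frac{\overline{xy}}{\overline{x}} &= m\,\frac{\overline{x^2}}{\overline{x}} + b
  \qquad (\overline{x} \neq 0)
\end{align*}
```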
So when you write it in this form, and this is the exact same equation as that, I just divided both sides by the mean of the x's, you get another interesting point that will lie on this optimal fitting line, at least from the point of view of the squared distances. So for another point that will lie on this optimal line, the x value is going to be this, the mean of the x squareds divided by the mean of the x's. Then the y value is going to be the mean of the xy's divided by the mean of the x's. I'll let you think about that a little bit more. But already, these are actually two points that lie on the line, so both of these are on the best-fitting line based on how we're measuring a good fit, which is the squared distance. These are on the line that minimizes that squared distance. What I'm going to do in the next video, and this is turning into like a six or seven video saga on trying to prove the best-fitting line, or finding the formula for the best-fitting line. But it's interesting. There are all sorts of neat little mathematical things to ponder over here. But in the next video, we can actually use this information. We could have just solved the system straight up. But we can actually use this information right here to solve for our m and b. Maybe we'll do it both ways depending on my mood.
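To see these claims on actual numbers, here is a minimal Python sketch. It solves the two equations by plain substitution (just one of the "million different ways", and not necessarily the route the next video takes), then checks that both points land on the resulting line. The function names and the sample data are hypothetical, chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def summary_means(points):
    """Return mean(x), mean(y), mean(xy), mean(x^2) for a list of (x, y) pairs."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    xy_bar = mean(x * y for x, y in points)
    x2_bar = mean(x * x for x, _ in points)
    return x_bar, y_bar, xy_bar, x2_bar


def fit_line(points):
    """Solve m*x2_bar + b*x_bar = xy_bar and m*x_bar + b = y_bar by substitution."""
    x_bar, y_bar, xy_bar, x2_bar = summary_means(points)
    # From the second equation, b = y_bar - m*x_bar; substitute into the first.
    # Assumes the x values are not all identical, so the denominator is nonzero.
    m = (xy_bar - x_bar * y_bar) / (x2_bar - x_bar ** 2)
    b = y_bar - m * x_bar
    return m, b


def two_points_on_line(points):
    """The two points from the transcript; the second assumes mean(x) != 0."""
    x_bar, y_bar, xy_bar, x2_bar = summary_means(points)
    return (x_bar, y_bar), (x2_bar / x_bar, xy_bar / x_bar)


def squared_error(m, b, points):
    """Sum of squared vertical distances from the points to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


# Hypothetical data, chosen only for illustration.
points = [(1, 2), (2, 1), (4, 3)]
m, b = fit_line(points)

for px, py in two_points_on_line(points):
    # Each point should satisfy y = m*x + b up to floating-point rounding.
    print(abs(m * px + b - py) < 1e-9)

# Nudging m or b away from the solution should not lower the squared error.
best = squared_error(m, b, points)
print(all(squared_error(m + dm, b + db, points) >= best
          for dm, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]))
```

The substitution step breaks down if every x value is the same, and the second point is undefined when the mean of the x's is 0, so the sketch sidesteps both cases through its choice of data.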