Main content

## More on regression

Current time:0:00Total duration:10:54

# Proof (part 3) minimizing squared error to regression line

## Video transcript

All right, so where we left
off, we had simplified our algebraic expression for the
squared error to the line from the n data points. We kind of visualized it. This expression right here would
be a surface, I guess you could view it as a surface
in three dimensions, where for any m and b is going to be a
point on that surface that represents the squared
error for that line. Our goal is to find the m and
the b, which would define an actual line, that minimize
the squared error. The way that we do that, is we
find a point where the partial derivative of the squared error
with respect to m is 0, and the partial derivative
with respect to b is also equal to 0. So it's flat with
respect to m. So that means that the slope
in this direction is going to be flat. Let me do it in the
same color. So the slope in this direction,
that's the partial derivative with respect to
m, is going to be flat. It's not going to change
in that direction. The partial derivative
with respect to b is going to be flat. So it will be a flat point
right over there. The slope at that point in that
direction will also be 0, and that is our minimum point. So let's figure out the m and
b's that give us this. So if I were to take the partial
derivative of this expression with respect to m. Well this first term has
no m terms in it. So it's a constant from the
point of view of m. Just as a reminder, partial
derivatives, it's just like taking a regular derivative. You're just assuming that
everything but the variable that you're doing the partial
derivative with respect to, you're assuming everything
else is a constant. So in this expression, all the
x's, the y's, the b's, the n's, those are all constant. The only variable, when we take
the partial derivative with respect to m, that
matters is the m. So this is a constant. There's no m here. This term right over
here, we're taking with respect to m. So the derivative of this with
respect to m, it's kind of the coefficients on the m. So negative 2 times n times the
mean of the xy's, that's the partial of this
with respect to m. Then this term or right
here has no m's in it. So it's constant with
respect to m. So its partial derivative
with respect to m is 0. Then this term here, you have
n times the mean of the x squared times m squared. So this is going to be-- we're
talking about a partial derivative with respect to m--
so it's going to be 2 times n times the mean of the
x [? squareds ?] times m. The derivative of m squared is
2m, and then you just have this coefficient
there as well. Now this term, you also
have an m over there. So let's see, everything
else is just kind of a coefficient on this m. So the derivative with respect
to m is 2bn times the mean of the x's. If I took the derivative of 3m,
the derivative is just 3. It's just the coefficient
on it. Then finally, this is a constant
with respect to m. So we don't see it. So this is the partial
derivative with respect to m. That's that right over there. We want to set this
equal to 0. Now let's do the same thing
with respect to b. This term, once again, is
a constant from the perspective of b. There's no b here. There's no b over here. So the partial derivatives
of either of these with respect to b is 0. Then over here you have a
negative 2n times the mean of y's as a coefficient on a b. So the partial derivative with
respect to b is going to be minus 2n, or negative 2n, times
the mean of the y's. Then there's no b over here. Then we do have a b over here. So it's plus 2mn times
the mean of the x's. This is essentially
the coefficient on the b over here. It was written in a mixed order,
but all of these are constants from the point
of view of b. They are the coefficient
in front of the b. The partial derivative of that
with respect to b is just going to be the coefficient. Then finally, the partial
derivative of this with respect to b is going to be 2nb,
Or 2nb to the first you could even say. We want to set this
equal to 0. So it looks very complicated. But remember, we're just trying
to solve for the m's and the b 's. We have two equations with
two unknowns here. We have the m's and then
we have the b's. To simplify this, both of these
equations, actually the top one and the bottom
one, both sides are divisible by 2n. I mean 0 is divisible
by anything. It'll be just 0. So let's divide the top equation
and by 2n and see what we get. If we divide the top equation by
2n, this'll become just 1. That goes away, and then
those go away. You would just be left with
negative times the mean, the negative mean of the xy's plus
m times the mean of the x squareds, plus b times the mean
of the x's is equal to 0. That's this first expression
when you divide both sides by negative 2n. The second expression will
be, this will go away. This is when you divide
it by 2n. I don't want to say
negative 2n. When you divide this by 2n,
that'll go away, that will go away, and then those
will go away. You're just left with the
negative mean of the y's plus m times the mean of the x's
plus b is equal to 0. So if we find the m and the b
values that satisfy the system of equations, we have minimized
the squared error. We could just solve it
in a traditional way. But I want to rewrite this,
because I think it's kind of interesting to see what these
really represents. So let's add this mean
of the xy's to both sides of this top equation. What do we get? We get m times the mean of
the x [? squareds ?] plus b times the mean of the
x's is equal to, these are going to cancel out, is equal
to the mean of the xy's. That's that top equation. This bottom equation, right
here, let's add the mean of y to both sides of
this equation. I do that so that that
cancels out. And then we're left with m--
I'll do that in the blue color to show you the same equation--
we have m times the mean of the x's plus b is equal
to the mean of the y's. Now, I actually want to
get both of these into mx plus b form. This is actually
already there. Actually you can see, that if
our best-fitting line is going to be y is equal to mx plus b--
we still have to find the m and the b-- but we see on
that best-fitting line, because the m and the b that
satisfy both of these equations are going to be
the m and the b on that best-fitting line. So that best-fitting line
actually contains the point, and we get this from the second
equation right here. It contains the point. I should write it this way. The coordinate mean of x mean
of y lies on the line. And you could see it
right over here. If you put the mean of x in this
for the optimal m and b, you are going to get
the mean of the y. So that's interesting. This optimal line. Let's never forget what we're
even trying to do. This optimal line is going to
contain some point on it-- let me do that in a new color--
it's going to contain some point on it right here that is
the mean of all of the x values and the mean of
all the y values. That's just interesting. It kind of makes sense. It kind of makes intuitive
sense. Now this other thing, just to
kind of get it in the same point of view. Then it will actually become a
kind of an easier way to solve the system. You could solve this a million
different ways. But just to give us an intuition
of what even is going on here, what's another
point that's on the line? Because if you have two points
on the line, you know what the equation of the line
is going to be. Well the other point, we want
this to be in mx plus b form. So let's divide both sides of
this equation by this term right here, by the
mean of the x 's. If we do that, we get m times
the mean of the x [? squareds ?] divided by the mean of the x's
plus b is equal to the mean of the xy's divided by the
mean of the x's. So when you write it in this
form, this is the exact same equation as that, I just divided
both sides by the mean of the x's, you get another
interesting point that will lie on this optimal fitting
line, at least from the point of view of the squared
distances. So another point that will lie
on it, on this optimal line, the x value is going to be
this, the mean of the x [? squareds ?] divided by the mean
of the x's. Then the y value is going to
be the mean of the xy's divided by the mean
of the x's. I'll let you think about
that a little bit more. But already, this is actually
the two points that lie on the line, so both of these on the
best-fitting line based on how we're measuring a good fit,
which is the squared distance. These are on the line
that minimize that squared distance. What I'm going to do in next
video, and this is turning into like a six or seven video
saga on trying to prove the best-fitting line or finding
the formula for the best-fitting line. But it's interesting. There's all sorts of kind of
neat little mathematical things to ponder over here. But in the next video, we can
actually use this information. We could have just solved
the system straight up. But we can actually use this
information right here to solve for our m and b's. Maybe we'll do it both ways
depending on my mood.