# Proof (part 2) minimizing squared error to regression line

Video transcript

Our goal is to simplify this
expression for the squared error between those n points. Just to remind ourselves
what we're doing, we have these n points. And we're taking the sum of the
squared error between each of those n points and
our actual line, y equals mx plus b. And we get this expression over
here, which we've been simplifying over the
last couple videos. We're going to try
to simplify this expression as much as possible. And then, we're going to
try to minimize this expression. Or find the m and b values
that minimize it. That gives what I guess you could call
the best fitting line. Now to do that, it looks like
we were just making the algebra even hairier
and hairier. But this next step is going to
simplify things a good bit. So just to show you that, if I
want to take the mean of all of the squared values of the
y's-- So that would be this. That would be y1 squared plus y2
squared plus all the way to yn squared. So I've summed n values,
the n squared y values. And then I want to divide
it by n, since there are n values here. And this is the mean
of the y's squared. That's how we can denote
it, just like that. Or, if you multiply both sides
of this equation by n, you get y1 squared plus y2 squared plus
all the way to yn squared is equal to n times the mean
of the squared values of y. And notice, this is exactly
what we have over here. That is n times the mean of
the squared values of y. Or the mean of the y squareds. And we can do that with each of
these terms. What is x1y1 plus x2y2 plus all the way
to xnyn? Well, if we take this whole
sum and we divide it by n, this is going to be the
mean value for x times y. For each of those points,
you multiply x times y. And you find the mean of
all of those products. That's exactly what this is. Well, once again, you multiply
both sides of this equation by n, and you get x1y1 plus x2y2
plus all the way to xnyn is equal to n times the
mean of the xy's. I think you see where
this is going. This term right here is going
to be equal to n times the mean of the products of xy. This term right here
is n times the mean of the y values. And then, this term right here
is n times the mean of the x squared values. This term right here is the
mean of the x's times n. If you divided this by n,
you'd get the mean. Since we're not dividing it by
n, this is the mean times n. And then this one, obviously,
we don't need to simplify at all. So let's rewrite everything
using our new notation, knowing that these are the
means of y squared, of xy, and all that. So the sum of the
squared errors to the line from
the n points is going to be equal to-- this term right here
is n times the mean of the y squared values. This term right here is
equal to negative 2m. That's just that right there. Times n times the mean
of the xy values, the arithmetic mean. And then we have this
term over here. I think you can appreciate
this is simplifying the algebraic expression
a good bit. This term right over here is
going to be minus 2bn times the mean of the y values. And then we have plus m squared
times n times the mean of the x squared values. And then we have-- almost there,
home stretch-- we have this over here which is plus 2mb
times n times the mean of the x values. And then, finally, we have
plus nb squared. So really, in the last two to
three videos, all we've done is we simplified the expression
for the sum of the squared differences from
those n points to this line, y equals mx plus b. So we're finished with the
hard core algebra stage. The next stage, we actually
want to optimize this. Maybe a better way to talk
about it, we want to minimize this expression right
over here. We want to find the m and the
b values that minimize it. And to help visualize it, we're
going to start breaking into a little bit of three-dimensional calculus here. But hopefully it won't
be too daunting. If you've done any partial derivatives, it won't be difficult. This is a surface. If you treat
the x and y data points as given, everything here is a
constant except for the m's and the b's. We're assuming that we
have the x's and y's. So we can figure out the mean
of the squared values of y, the mean of the xy product, the
mean of the y's, the mean of the x squareds, and the mean of the x's. We assume that those are
all actual numbers. So this expression right here,
it's actually going to be a surface in three dimensions. So you can imagine, this right
here, that is the m-axis. This right here is the b-axis. And then, you could imagine the
vertical axis to be the squared error. This is the squared error
of the line axis. So for any combination of m
and b, if you're in the mb plane, you pick some combination
of m and b. You put it into this expression
for the squared error of the line. It'll give you a point. If you do that for all of the
combinations of m's and b's, you're going to get a surface. And the surface is going to
look something like this. I'm going to try my
best to draw it. It's going to look like this. You could almost imagine
it as a kind of a bowl. Or you could even
think of it as a three-dimensional parabola, a paraboloid, if you want to think
of it that way. Instead of a parabola that
just goes like this. If you were to kind of rotate
it around and distort it a little bit, you would get this
thing that looks kind of like a cup, or a thimble,
or whatever. And so what we want to do is
to find the m and b values that minimize it. Notice, this is a
three-dimensional surface. I don't know if I'm doing
justice to it. So you can imagine a
three-dimensional surface that looks something like this. This is the back part that
you're not seeing. So that's the inside of our
three-dimensional surface. We want to find the m and b
values that minimize the value on the surface. So there's some m and b
value right over here that minimizes it. And I'll actually do the
calculation in the next video. But to do that, we're going to
find the partial derivative of this with respect to m. And we're going to find the
partial derivative of this with respect to b and set
both of them equal to 0. Because at this minimum point, I
guess you could say in three dimensions, this minimum point
on the surface is going to occur when the slope with
respect to m and the slope with respect to b are both 0. So at that point, the partial
derivative of our squared error with respect to m is
going to be equal to 0. And the partial derivative of
our squared error with respect to b is going to
be equal to 0. So all we're going to do, in
the next video, is take the partial derivative of this
expression with respect to m, set that equal to 0. And the partial derivative of
this with respect to b, set that equal to 0. And then we're ready to solve
for the m and the b. Or the particular m and b.
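
Since the transcript keeps pointing at "this term right here" on screen, it may help to see the simplified expression written out. The notation below is an assumption, not taken from the video: SE_line stands for the sum of the squared errors to the line, and an overbar stands for the arithmetic mean over the n points.

```latex
\begin{align*}
\overline{y^2} &= \frac{y_1^2 + y_2^2 + \cdots + y_n^2}{n}, &
\overline{xy} &= \frac{x_1 y_1 + x_2 y_2 + \cdots + x_n y_n}{n}, \\
\overline{x^2} &= \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n}, &
\overline{x} &= \frac{x_1 + \cdots + x_n}{n}, \qquad
\overline{y} = \frac{y_1 + \cdots + y_n}{n}.
\end{align*}

\[
SE_{\text{line}}
  = n\,\overline{y^2}
  - 2mn\,\overline{xy}
  - 2bn\,\overline{y}
  + m^2 n\,\overline{x^2}
  + 2mbn\,\overline{x}
  + n b^2
\]

\[
\frac{\partial\, SE_{\text{line}}}{\partial m} = 0,
\qquad
\frac{\partial\, SE_{\text{line}}}{\partial b} = 0
\]
```

The last line is just the pair of conditions described at the end of the transcript. Actually taking those partial derivatives is left for the next video.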
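
As a sanity check on the algebra, here is a minimal Python sketch that compares the direct sum of squared errors with the simplified expression built from the five means. The function names and the three sample points are made up for illustration and are not from the video. The coarse grid search at the end is only a stand-in for the calculus step, to show that the surface has a lowest point somewhere in the mb plane.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def squared_error_direct(xs, ys, m, b):
    """Sum of (y_i - (m*x_i + b))^2 over all points, straight from the definition."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def squared_error_from_means(xs, ys, m, b):
    """The same quantity, written with the means as in the simplified expression."""
    n = len(xs)
    mean_y2 = mean([y * y for y in ys])              # mean of the y squareds
    mean_xy = mean([x * y for x, y in zip(xs, ys)])  # mean of the xy products
    mean_y = mean(ys)                                # mean of the y's
    mean_x2 = mean([x * x for x in xs])              # mean of the x squareds
    mean_x = mean(xs)                                # mean of the x's
    return (n * mean_y2
            - 2 * m * n * mean_xy
            - 2 * b * n * mean_y
            + m * m * n * mean_x2
            + 2 * m * b * n * mean_x
            + n * b * b)


# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 4.0]
ys = [2.0, 1.0, 3.0]

# The two forms should agree (up to floating-point rounding) for any m and b.
for m, b in [(0.0, 0.0), (1.0, -1.0), (0.5, 0.5), (-2.0, 3.0)]:
    direct = squared_error_direct(xs, ys, m, b)
    simplified = squared_error_from_means(xs, ys, m, b)
    print(f"m={m}, b={b}: direct={direct:.6f}, simplified={simplified:.6f}")

# Coarse look at the surface: evaluate it on a grid of (m, b) pairs
# from -3.0 to 3.0 in steps of 0.1 and keep the pair with the smallest error.
grid = [i / 10 for i in range(-30, 31)]
best_m, best_b = min(
    ((m, b) for m in grid for b in grid),
    key=lambda mb: squared_error_from_means(xs, ys, mb[0], mb[1]),
)
print("lowest grid point:", best_m, best_b)
```

The grid search only finds the lowest point among the sampled pairs, so it approximates the bottom of the bowl rather than solving for it. Setting the two partial derivatives to 0, as described above, is what pins down the exact m and b.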