Proof Part 2 Minimizing Squared Error to Line
- Our goal is to simplify this expression for the squared
- error between those n points.
- Just to remind ourselves what we're doing, we
- have these n points.
- And we're taking the sum of the squared error between each
- of those n points and our actual line, y
- equals mx plus b.
- And we get this expression over here, which we've been
- simplifying over the last couple videos.
- We're going to try to simplify this
- expression as much as possible.
- And then, we're going to try to minimize this
- expression.
- Or find the m and b values that minimize it.
- Or I guess you could call it the best fitting line.
- Now to do that, it looks like we were just making the
- algebra even hairier and hairier.
- But this next step is going to simplify things a good bit.
- So just to show you that, if I want to take the mean of all
- of the squared values of the y's-- So that would be this.
- That would be y1 squared plus y2 squared plus all the way to
- yn squared.
- So I've summed n values, n squared values.
- And then I want to divide it by n, since there
- are n values here.
- And this is the mean of the y's squared.
- That's how we can denote it, just like that.
- Or, if you multiply both sides of this equation by n, you get
- y1 squared plus y2 squared plus all the way to yn squared
- is equal to n times the mean of the squared values of y.
- And notice, this is exactly what we have over here.
- That is n times the mean of the squared values of y.
- Or the mean of the y squareds.
- And we can do that with each of these terms. What is x1y1
- plus x2y2 plus all the way to xnyn?
- Well, if we take this whole sum and we divide it by n
- terms, this is going to be the mean value for x times y.
- For each of those points, you multiply x times y.
- And you find the mean of all of those products.
- That's exactly what this is.
- Well, once again, you multiply both sides of this equation by
- n, and you get x1y1 plus x2y2 plus all the way to xnyn is
- equal to n times the mean of xy's.
- I think you see where this is going.
- This term right here is going to be equal to n times the
- mean of the products of xy.
- This term right here is n times the
- mean of the y values.
- And then, this term right here is n times the mean of the x
- squared values.
- This term right here is the mean of the x's times n.
- If you divided this by n, you'd get the mean.
- Since we're not dividing it by n, this is the mean times n.
- And then this is, obviously, something we don't
- need to simplify.
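Written out in symbols, the substitutions just described look like this. The overline notation for a mean is an assumption made here for compactness; the transcript only describes the means in words.

```latex
\begin{aligned}
y_1^2 + y_2^2 + \cdots + y_n^2 &= n\,\overline{y^2} \\
x_1 y_1 + x_2 y_2 + \cdots + x_n y_n &= n\,\overline{xy} \\
y_1 + y_2 + \cdots + y_n &= n\,\overline{y} \\
x_1^2 + x_2^2 + \cdots + x_n^2 &= n\,\overline{x^2} \\
x_1 + x_2 + \cdots + x_n &= n\,\overline{x}
\end{aligned}
```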
- So let's rewrite everything using our new notation,
- knowing that these are the means of y squared,
- of xy, and all that.
- So our squared error to the line, the sum of the
- squared error to the line from the n points, is going to be
- equal to-- this term right here is n times the mean of
- the y squared values.
- This term right here is equal to negative 2m.
- That's just that right there.
- Times n times the mean of the xy values,
- the arithmetic mean.
- And then we have this term over here.
- I think you can appreciate this is simplifying the
- algebraic expression a good bit.
- This term right over here is going to be minus 2bn times
- the mean of the y values.
- And then we have plus m squared times n times the mean
- of the x squared values.
- And then we have-- almost there, home stretch-- we have
- this over here which is plus 2mb times n times the mean of
- the x values.
- And then, finally, we have plus nb squared.
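Collecting the six terms as they were just read out gives the following expression. The symbol SE_line and the overline notation for means are labels assumed here; the terms themselves are the ones stated in the transcript.

```latex
SE_{\text{line}} = n\,\overline{y^2} - 2mn\,\overline{xy} - 2bn\,\overline{y}
                 + m^2 n\,\overline{x^2} + 2mbn\,\overline{x} + n b^2
```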
- So really, in the last two to three videos, all we've done
- is we simplified the expression for the sum of the
- squared differences from those n points to this line, y
- equals mx plus b.
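One way to convince yourself that the simplification is right is to compute the squared error both ways and compare. The following is only a minimal sketch: the four data points, the values of m and b, and all function names are made up for illustration and do not come from the video.

```python
# Compare the direct sum of squared errors with the simplified form
# written in terms of means. Data and (m, b) are hypothetical.

def mean(values):
    return sum(values) / len(values)

def squared_error_direct(xs, ys, m, b):
    # Sum of (y_i - (m*x_i + b))^2 over all n points.
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

def squared_error_from_means(xs, ys, m, b):
    # The six-term expression built from n and the five means.
    n = len(xs)
    mean_y2 = mean([y * y for y in ys])
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    mean_y = mean(ys)
    mean_x2 = mean([x * x for x in xs])
    mean_x = mean(xs)
    return (n * mean_y2
            - 2 * m * n * mean_xy
            - 2 * b * n * mean_y
            + m ** 2 * n * mean_x2
            + 2 * m * b * n * mean_x
            + n * b ** 2)

xs = [1.0, 2.0, 4.0, 5.0]
ys = [2.0, 3.0, 3.5, 6.0]
m, b = 0.8, 1.0

direct = squared_error_direct(xs, ys, m, b)
simplified = squared_error_from_means(xs, ys, m, b)
print(direct, simplified)  # the two values should agree up to rounding
```

Any other choice of points, m, and b should give the same agreement, since the two functions are algebraically the same expression.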
- So we're finished with the hard core algebra stage.
- The next stage, we actually want to optimize this.
- Maybe a better way to talk about it, we want to minimize
- this expression right over here.
- We want to find the m and the b values that minimize it.
- And to help visualize it, we're going to start breaking
- into a little bit of
- three-dimensional calculus here.
- But hopefully it won't be too daunting.
- If you've done any partial
- derivatives, it won't be difficult.
- This is a surface.
- If you view that you have the x and y data points,
- everything here is a constant except for
- the m's and the b's.
- We're assuming that we have the x's and y's.
- So we can figure out the mean of the squared values of y,
- the mean of the xy product, the mean of the y's, the mean
- of the x squareds.
- We assume that those are all actual numbers.
- So this expression right here, it's actually going to be a
- surface in three dimensions.
- So you can imagine, this right here, that is the m-axis.
- This right here is the b-axis.
- And then, you could imagine the vertical axis to be the
- squared error.
- This is the squared error of the line axis.
- So for any combination of m and b, if you're in the mb
- plane, you pick some combination of m and b.
- You put it into this expression for the squared
- error of the line.
- It'll give you a point.
- If you do that for all of the combinations of m's and b's,
- you're going to get a surface.
- And the surface is going to look something like this.
- I'm going to try my best to draw it.
- It's going to look like this.
- You could almost imagine it as a kind of a bowl.
- Or you could even think of it as a
- three-dimensional parabola.
- If you want to think of it that way.
- Instead of a parabola that just goes like this.
- If you were to kind of rotate it around and distort it a
- little bit, you would get this thing that looks kind of like
- a cup, or a thimble, or whatever.
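To get a feel for the bowl shape without drawing it, you can evaluate the squared error on a small grid of m and b values. This is a rough sketch under stated assumptions: the data points are made up and chosen to lie exactly on y = 2x + 1, so the bottom of the bowl is known in advance to be at m = 2, b = 1, and the grid values are arbitrary.

```python
# Evaluate the squared error on a small grid of (m, b) values.
# Hypothetical data lying exactly on y = 2x + 1.

def squared_error(xs, ys, m, b):
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

m_values = [1.0, 1.5, 2.0, 2.5, 3.0]
b_values = [0.0, 0.5, 1.0, 1.5, 2.0]

# One height of the surface for every (m, b) combination on the grid.
surface = {(m, b): squared_error(xs, ys, m, b)
           for m in m_values for b in b_values}

# Each printed row is one m value; each column is one b value.
for m in m_values:
    print("  ".join(f"{surface[(m, b)]:7.2f}" for b in b_values))

lowest = min(surface, key=surface.get)
print("lowest grid point (m, b):", lowest)
```

The printed heights rise in every direction away from the lowest grid point, which is the cup-like behavior described above.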
- And so what we want to do is to find the m and b values
- that minimize.
- Notice, this is a three-dimensional surface.
- I don't know if I'm doing justice to it.
- So you can imagine a three-dimensional surface that
- looks something like this.
- This is the back part that you're not seeing.
- So that's the inside of our three-dimensional surface.
- We want to find the m and b values that minimize the value
- on the surface.
- So there's some m and b value right over here
- that minimizes it.
- And I'll actually do the calculation in the next video.
- But to do that, we're going to find the partial derivative of
- this with respect to m.
- And we're going to find the partial derivative of this
- with respect to b and set both of them equal to 0.
- Because at this minimum point, I guess you could say in three
- dimensions, this minimum point on the surface is going to
- occur when the slope with respect to m and the slope
- with respect to b is 0.
- So at that point, the partial derivative of our squared
- error with respect to m is going to be equal to 0.
- And the partial derivative of our squared error with respect
- to b is going to be equal to 0.
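In symbols, the two conditions at the minimum are as follows, with SE_line used here as an assumed label for the squared error to the line:

```latex
\frac{\partial SE_{\text{line}}}{\partial m} = 0,
\qquad
\frac{\partial SE_{\text{line}}}{\partial b} = 0
```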
- So all we're going to do, in the next video, is take the
- partial derivative of this expression with respect to m,
- set that equal to 0.
- And the partial derivative of this with respect to b, set
- that equal to 0.
- And then we're ready to solve for the m and the b.
- Or the particular m and b.
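The idea that both slopes are 0 at the bottom of the surface can be checked numerically before doing any calculus. The sketch below approximates the two partial derivatives with central differences, which is a swapped-in numerical technique and not the symbolic derivation the next video carries out. The data are made up and lie exactly on y = 2x + 1, so the minimum is known to be at m = 2, b = 1; the function names and the step size h are also assumptions.

```python
# Approximate the partial derivatives of the squared error with
# central differences. Hypothetical data lying exactly on y = 2x + 1.

def squared_error(xs, ys, m, b):
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

def partial_m(xs, ys, m, b, h=1e-4):
    # Slope in the m direction, holding b fixed.
    return (squared_error(xs, ys, m + h, b)
            - squared_error(xs, ys, m - h, b)) / (2 * h)

def partial_b(xs, ys, m, b, h=1e-4):
    # Slope in the b direction, holding m fixed.
    return (squared_error(xs, ys, m, b + h)
            - squared_error(xs, ys, m, b - h)) / (2 * h)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# At the known minimum both slopes should be essentially zero.
at_min = (partial_m(xs, ys, 2.0, 1.0), partial_b(xs, ys, 2.0, 1.0))

# Away from the minimum at least one slope is clearly nonzero.
away = (partial_m(xs, ys, 3.0, 1.0), partial_b(xs, ys, 3.0, 1.0))

print("slopes at (m=2, b=1):", at_min)
print("slopes at (m=3, b=1):", away)
```

Both slopes vanish only at the bottom of the bowl, which is why setting the two partial derivatives equal to 0 pins down the particular m and b.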