Proof (part 2) minimizing squared error to regression line

Proof Part 2 Minimizing Squared Error to Line. Created by Sal Khan.

Want to join the conversation?

  • Peter Qi
    Why are the axes m and b instead of x and y? Aren't m and b constants and parts of the equation for the surface, with x, y, and SE being the coordinates?
    (7 votes)
    • Cassie
      In line fitting, we are trying to find the equation of the line: the slope (m) and the y-intercept (b) of the best-fit line y=mx+b, based on a known set of (x,y) coordinates. So the mean of y, for example, is a constant, since it is the arithmetic mean of all the y's in our data set. We don't know what m or b are, and we're trying out different ones, so they are variables. "If I try this slope and this intercept, how big is my SE?" That is what this graph would be showing. If the question was "which of these data points are closest to the ideal, given by this line" we would use x and y as the variables.

      We're in the situation of "I know the answer, now what was the question?" We know y given our x values, now we need to find the line that would get us as close as possible to those y values if all we had was the x values.
      (14 votes)
  • Andy Little
    Why would you divide y1^2 + y2^2 + ... + yn^2 by n to simplify the equation? Just a little unclear on why you divide by n.
    (5 votes)
  • Vishnu Gopalakrishnan
    How exactly did Sal figure out that the surface was 3D parabolic?
    (5 votes)
    • DC83
      I'm assuming this is because you are dealing with the slope (m), y intercept (b), and your SE line (yellow line) and you are estimating the partial derivative of the squared error. Anything that involves minute changes in the measuring of something takes it away from algebra (which deals in straight lines and x and y coordinates only) into calculus and derivatives and 3 variable i.e. 3 dimensional graphs.
      (3 votes)
  • Ayush Garg
    How can you say that it will be a parabola?
    (3 votes)
    • DC83
      Whenever you deal with the square of an independent variable (x value or the values on the x-axis) it will be a parabola. What you could do yourself is plot x and y values, making the y values the square of the x values. So x = 2 then y = 4, x = 3 then y = 9 and so on. You will see it is a parabola.
      (3 votes)
  • Yuya Fujikawa
    At , If the partial derivative of SE with respect to m or b is 0, then it could be minimum but could also be maximum, since the derivative is 0, how can we say surely, that what we get is minimized m and b and not maximized m and b? Thank you.
    (2 votes)
    • tyersome
      Hi Yuya!

      The m and b containing terms can be looked at as equations for parabolas. Since the "squared" term is positive in both cases these parabolas open upwards. Consequently these parabolas can only have minima. You can confirm this using either the first or second derivative tests.
      (3 votes)
  • Tombentom
    What is a partial derivative, by the way? I learned it in the past but forgot xD. Can anyone briefly explain what it is used for?
    Could you tell me where to find Sal's videos about it too?
    (2 votes)
  • Vasishta Polisetty
    In the 3D graph that Sal draws, can the SE values be negative? Aren't they positive for all values for m and b?
    (2 votes)
  • ForgottenUser
    I don't understand why we are trying to use the partial derivatives to find the solution to this 3D parabola. Wouldn't it make more sense to find the point at which SE is minimized?
    (2 votes)
  • Abhishek Inamdar
    Why is the derivative of SE w.r.t. m and b zero?
    (2 votes)
  • Atharva
    But if we are finding the minimum values of m and b separately, won't the coordinate actually lie off the equation?
    (2 votes)

Video transcript

Our goal is to simplify this expression for the squared error between those n points. Just to remind ourselves what we're doing, we have these n points. And we're taking the sum of the squared error between each of those n points and our actual line, y equals mx plus b. And we get this expression over here, which we've been simplifying over the last couple videos. We're going to try to simplify this expression as much as possible. And then, we're going to try to minimize this expression. Or find the m and b values that minimize it. Or I guess you could call it the best fitting line. Now to do that, it looks like we were just making the algebra hairier and hairier. But this next step is going to simplify things a good bit. So just to show you that, if I want to take the mean of all of the squared values of the y's-- So that would be this. That would be y1 squared plus y2 squared plus all the way to yn squared. So I've summed n values, n squared values. And then I want to divide it by n, since there are n values here. And this is the mean of the y's squared. That's how we can denote it, just like that. Or, if you multiply both sides of this equation by n, you get y1 squared plus y2 squared plus all the way to yn squared is equal to n times the mean of the squared values of y. And notice, this is exactly what we have over here. That is n times the mean of the squared values of y. Or the mean of the y squareds. And we can do that with each of these terms. What is x1y1 plus x2y2 plus all the way to xnyn? Well, if we take this whole sum and we divide it by n, this is going to be the mean value for x times y. For each of those points, you multiply x times y. And you find the mean of all of those products. That's exactly what this is. Well, once again, you multiply both sides of this equation by n, and you get x1y1 plus x2y2 plus all the way to xnyn is equal to n times the mean of the xy's. I think you see where this is going.
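The substitution described above can be written out compactly. This is a sketch of the spoken step only; the overline notation for a mean is an assumption here, since the transcript never spells out the symbols used on screen.

```latex
% Mean of the squared y values, and the same equation multiplied through by n
\overline{y^2} = \frac{y_1^2 + y_2^2 + \cdots + y_n^2}{n}
\quad\Longrightarrow\quad
y_1^2 + y_2^2 + \cdots + y_n^2 = n\,\overline{y^2}

% Mean of the products x_i y_i, handled the same way
\overline{xy} = \frac{x_1 y_1 + x_2 y_2 + \cdots + x_n y_n}{n}
\quad\Longrightarrow\quad
x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = n\,\overline{xy}
```

Dividing by n is only a way to name each sum: every sum in the expression is n times the corresponding mean.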
This term right here is going to be equal to n times the mean of the products of xy. This term right here is n times the mean of the y values. And then, this term right here is n times the mean of the x squared values. This term right here is the mean of the x's times n. If you divided this by n, you'd get the mean. Since we're not dividing it by n, this is the mean times n. And then this one, obviously, we don't need to simplify at all. So let's rewrite everything using our new notation, knowing that these are the means of y squared, of xy, and all that. So our squared error to the line, the sum of the squared errors to the line from the n points, is going to be equal to-- this term right here is n times the mean of the y squared values. This term right here is equal to negative 2m. That's just that right there. Times n times the mean of the xy values, the arithmetic mean. And then we have this term over here. I think you can appreciate this is simplifying the algebraic expression a good bit. This term right over here is going to be minus 2bn times the mean of the y values. And then we have plus m squared times n times the mean of the x squared values. And then we have-- almost there, home stretch-- we have this over here which is plus 2mb times n times the mean of the x values. And then, finally, we have plus nb squared. So really, in the last two to three videos, all we've done is simplify the expression for the sum of the squared differences from those n points to this line, y equals mx plus b. So we're finished with the hardcore algebra stage. In the next stage, we actually want to optimize this. Maybe a better way to talk about it: we want to minimize this expression right over here. We want to find the m and the b values that minimize it. And to help visualize it, we're going to start breaking into a little bit of three-dimensional calculus here. But hopefully it won't be too daunting. If you've done any partial derivatives, it won't be difficult.
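One way to convince yourself that the six-term expression really is the same squared error is to compute it both ways. The following is a minimal sketch, not something from the video: the function names, the sample data, and the trial values of m and b are all made up for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


def squared_error_direct(xs, ys, m, b):
    """Sum of squared vertical distances from each point to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def squared_error_from_means(xs, ys, m, b):
    """The same quantity, written with the six mean-based terms."""
    n = len(xs)
    mean_y2 = mean(y * y for y in ys)
    mean_xy = mean(x * y for x, y in zip(xs, ys))
    mean_y = mean(ys)
    mean_x2 = mean(x * x for x in xs)
    mean_x = mean(xs)
    return (
        n * mean_y2
        - 2 * m * n * mean_xy
        - 2 * b * n * mean_y
        + m ** 2 * n * mean_x2
        + 2 * m * b * n * mean_x
        + n * b ** 2
    )


# Made-up sample data and an arbitrary trial line, for illustration only.
xs = [1.0, 2.0, 4.0, 5.0]
ys = [2.0, 3.0, 3.5, 6.0]
m, b = 0.7, 1.2

direct = squared_error_direct(xs, ys, m, b)
from_means = squared_error_from_means(xs, ys, m, b)
print(direct, from_means)  # the two values should agree up to rounding
```

The data enter the second function only through five means and n, which is why everything except m and b can be treated as a constant in the next step.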
This is a surface. If you take the x and y data points as given, everything here is a constant except for the m's and the b's. We're assuming that we have the x's and y's. So we can figure out the mean of the squared values of y, the mean of the xy product, the mean of the y's, the mean of the x squareds. We assume that those are all actual numbers. So this expression right here, it's actually going to be a surface in three dimensions. So you can imagine, this right here, that is the m-axis. This right here is the b-axis. And then, you could imagine the vertical axis to be the squared error. This is the squared error of the line axis. So for any combination of m and b, if you're in the mb plane, you pick some combination of m and b. You put it into this expression for the squared error of the line. It'll give you a point. If you do that for all of the combinations of m's and b's, you're going to get a surface. And the surface is going to look something like this. I'm going to try my best to draw it. It's going to look like this. You could almost imagine it as a kind of a bowl. Or you could even think of it as a three-dimensional parabola. If you want to think of it that way. Instead of a parabola that just goes like this. If you were to kind of rotate it around and distort it a little bit, you would get this thing that looks kind of like a cup, or a thimble, or whatever. And so what we want to do is to find the m and b values that minimize it. Notice, this is a three-dimensional surface. I don't know if I'm doing justice to it. So you can imagine a three-dimensional surface that looks something like this. This is the back part that you're not seeing. So that's the inside of our three-dimensional surface. We want to find the m and b values that minimize the value on the surface. So there's some m and b value right over here that minimizes it. And I'll actually do the calculation in the next video.
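The picture of a surface over the mb plane can be sampled numerically. The sketch below is an illustration under stated assumptions, not the method used in the video: the data are made up to lie exactly on y = 2x + 1, and the grid ranges and step size are arbitrary choices. It evaluates the squared error at each (m, b) grid point and keeps the lowest one.

```python
def squared_error(xs, ys, m, b):
    """Height of the surface above the point (m, b) in the mb plane."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def grid_minimum(xs, ys, m_values, b_values):
    """Return (SE, m, b) for the grid point with the smallest squared error."""
    best = None
    for m in m_values:
        for b in b_values:
            se = squared_error(xs, ys, m, b)
            if best is None or se < best[0]:
                best = (se, m, b)
    return best


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# Arbitrary grid over the mb plane, in steps of 0.1.
m_values = [i / 10 for i in range(0, 41)]    # 0.0 to 4.0
b_values = [i / 10 for i in range(-20, 41)]  # -2.0 to 4.0

se_min, m_best, b_best = grid_minimum(xs, ys, m_values, b_values)
print(se_min, m_best, b_best)
```

Because each height is a sum of squares, no point on the surface can drop below zero, and moving away from the lowest grid point in either the m or the b direction only raises the error. A grid search like this is a crude stand-in for the calculus that follows, which locates the bottom of the bowl exactly.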
But to do that, we're going to find the partial derivative of this with respect to m. And we're going to find the partial derivative of this with respect to b and set both of them equal to 0. Because at this minimum point, I guess you could say in three dimensions, this minimum point on the surface is going to occur when the slope with respect to m and the slope with respect to b are both 0. So at that point, the partial derivative of our squared error with respect to m is going to be equal to 0. And the partial derivative of our squared error with respect to b is going to be equal to 0. So all we're going to do, in the next video, is take the partial derivative of this expression with respect to m, set that equal to 0. And the partial derivative of this with respect to b, set that equal to 0. And then we're ready to solve for the m and the b. Or the particular m and b.
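For reference, here is a sketch of what those two conditions look like when the six-term expression is differentiated term by term. The video defers this calculation to the next part, so treat the following as a preview under the assumption that overlines denote the means described in the transcript.

```latex
% The squared error as a function of m and b (everything else is a constant)
SE(m, b) = n\,\overline{y^2} - 2mn\,\overline{xy} - 2bn\,\overline{y}
         + m^2 n\,\overline{x^2} + 2mbn\,\overline{x} + n b^2

% Slope in the m direction, holding b fixed
\frac{\partial SE}{\partial m}
  = -2n\,\overline{xy} + 2mn\,\overline{x^2} + 2bn\,\overline{x} = 0

% Slope in the b direction, holding m fixed
\frac{\partial SE}{\partial b}
  = -2n\,\overline{y} + 2mn\,\overline{x} + 2nb = 0
```

Both equations are linear in m and b, so together they form a system of two equations in two unknowns. The coefficients on m squared and b squared in SE are positive whenever n is positive and the x values are not all zero, which is consistent with the upward-opening bowl described above.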