
# Proof (part 2) minimizing squared error to regression line

Proof Part 2 Minimizing Squared Error to Line. Created by Sal Khan.

## Want to join the conversation?

• why are the axes m and b instead of x and y? aren't m and b constants and parts of the equation for the surface, with x, y, and SE being the coordinates?
• In line fitting, we are trying to find the equation of the line - find the slope (m) and the y-intercept (b) of the best-fit line y = mx + b, based on a known set of (x, y) coordinates. So the mean of y, for example, is a constant, since it is the arithmetic mean of all the y's in our data set. We don't know what m or b are, and we're trying out different ones, so they are variables. "If I try this slope and this intercept, how big is my SE?" That is what this graph shows. If the question were "which of these data points are closest to the ideal given by this line," we would use x and y as the variables.

We're in the situation of "I know the answer, now what was the question?" We know y given our x values; now we need to find the line that would get us as close as possible to those y values if all we had was the x values.
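The answer above can be sketched in code. This is a minimal illustration with made-up data points: the (x, y) pairs are fixed constants, while m and b are the variables we try out.

```python
# Hypothetical data set: the (x, y) points are fixed constants.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def squared_error(m, b):
    """SE for the trial line y = m*x + b over the fixed data."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# m and b are what we vary; the data never changes.
print(squared_error(2.0, 0.0))  # one trial slope/intercept
print(squared_error(1.0, 1.0))  # another trial, with a larger SE
```

Each call answers "if I try this slope and this intercept, how big is my SE?" — which is exactly why the graph's axes are m and b.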
• Why would you divide the y1^2 + y2^2 + ... + yn^2 by n to simplify the equation? Just a little unclear on why you divide by n.
• If you add up all n y^2 terms (each one is in general different), then divide by n, you get a mean value for y^2. So you no longer need all the different values, because you have one that represents them all.
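A quick numeric sketch of that answer, using made-up y values: dividing the sum of squares by n gives one mean value that stands in for all of them, and multiplying back by n recovers the original sum, so nothing is lost.

```python
ys = [1.0, 2.0, 3.0]  # hypothetical y values

# Mean of the squares: (1 + 4 + 9) / 3
mean_y_sq = sum(y ** 2 for y in ys) / len(ys)
print(mean_y_sq)  # one value representing all the y^2 terms

# n * mean recovers the original sum (up to float rounding):
print(len(ys) * mean_y_sq)
```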
• How exactly did Sal figure out that the surface was 3D parabolic?
• I'm assuming this is because you are dealing with the slope (m), the y-intercept (b), and your SE (the yellow line), and you are estimating the partial derivative of the squared error. Anything that involves measuring minute changes in something takes it out of algebra (which deals in straight lines and x and y coordinates only) into calculus and derivatives, and here into three-variable, i.e. three-dimensional, graphs.
• If the partial derivative of SE with respect to m or b is 0, then it could be a minimum but could also be a maximum. Since the derivative is 0, how can we say for sure that the m and b we get minimize SE rather than maximize it? Thank you.
• Hi Yuya!

The terms containing `m` and `b` can be looked at as equations for parabolas. Since the squared term has a positive coefficient in both cases, these parabolas open upwards. Consequently, these parabolas can only have minima. You can confirm this using either the first or second derivative test.
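The second-derivative test mentioned above can be checked numerically. In this sketch (made-up data), we hold b fixed and look at SE as a function of m alone: it is a parabola whose leading coefficient is sum(x_i^2) > 0, so its second derivative is the positive constant 2 * sum(x_i^2) everywhere, confirming an upward-opening parabola with a single minimum.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]

def se(m, b):
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Finite-difference second derivative of SE with respect to m,
# with b held fixed. SE is exactly quadratic in m, so this should
# come out the same positive constant at every m.
h = 1e-3
b0 = 0.5
seconds = []
for m0 in (-2.0, 0.0, 3.0):
    second = (se(m0 + h, b0) - 2 * se(m0, b0) + se(m0 - h, b0)) / h ** 2
    seconds.append(second)
print(seconds)  # each value ≈ 2 * sum(x_i^2) = 28, always positive
```

A positive second derivative at every point is exactly the second-derivative-test evidence that the critical point is a minimum, not a maximum.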
• how can you say that it will be a parabola?
• Whenever you deal with the square of an independent variable (x value or the values on the x-axis) it will be a parabola. What you could do yourself is plot x and y values, making the y values the square of the x values. So x = 2 then y = 4, x = 3 then y = 9 and so on. You will see it is a parabola.
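The plotting exercise suggested above can be done in a couple of lines: square each x value and look at the resulting pairs.

```python
# Square each x value; the (x, y) pairs trace out a parabola.
points = [(x, x ** 2) for x in range(-3, 4)]
print(points)  # [(-3, 9), (-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4), (3, 9)]
```

The symmetric rise on both sides of x = 0 is the parabola's signature shape.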
• What is a partial derivative, by the way? I learned it in the past but forgot xD. Can anyone briefly explain what it's used for?
Could you tell me where to find Sal's videos about it too?
• In the 3D graph that Sal draws, can the SE values be negative? Aren't they positive for all values for m and b?
• You're correct that the squared error (SE) values should be non-negative, as they represent the sum of squared distances which cannot be negative. In the 3D graph, the surface representing the SE should indeed be non-negative for all valid combinations of m and b. Any negative values in the SE would indicate a computational error or a conceptual mistake.
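That answer is easy to spot-check numerically. In this sketch (made-up data), we sample many random (m, b) pairs and confirm SE never goes negative, since it is a sum of squares.

```python
import random

xs = [1.0, 2.0, 3.0]
ys = [1.5, 4.1, 5.9]

def se(m, b):
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# SE is a sum of squared residuals, so every sample must be >= 0.
random.seed(0)
samples = [se(random.uniform(-10, 10), random.uniform(-10, 10))
           for _ in range(1000)]
print(min(samples))  # the smallest sampled SE is still non-negative
```

So the paraboloid surface Sal draws sits entirely on or above the SE = 0 plane.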
• Proof Part 2 Minimizing Squared Error To Line: How is it that you are taking the mean of the parenthetical values in the video? It appears that this simplification is being performed on only one side of the equation. How is it algebraically correct not to perform it on both sides?