# Second regression example

## Video transcript

Let's find the equation for
the regression line that best fits this. Where the fit minimizes the
squared distance to each of the points. And then let's actually
calculate how good of a fit it is using an r squared. And we might have to do that
in the next video, depending on time. So just as a reminder, the
line is going to have the equation y is equal to mx plus b. And we've shown ourselves that
the slope of this line-- the one that best minimizes the
squared distance to each of those points-- is going to be
the mean of the xy's minus the mean of x times the mean of y. All of that over the mean of the
x squareds minus the
mean of the x's, squared. So one way to memorize it, I
guess, is the first terms have the mean of the combined
things. You're multiplying x times
itself first, then taking the mean. You're multiplying x
times y first, then taking the mean. And then the second terms,
you're finding the means of the individual components
and then multiplying. Mean of x, times mean of y,
mean of x times mean of x. So hopefully maybe that helps. Maybe it doesn't. But we can calculate
the slope. And then the y intercept, b, is
just going to be equal to the mean of y minus whatever we
calculate here for m, times the mean of x. And we can do that because we
know that the point mean of x comma mean of y is going to be
on this regression line. So let's calculate them. And you'll see, in the last
example we did three points. We only have four points here. But the computations get
more and more intense. You can imagine what would
happen if you had 10 or 20 or 100 points. You pretty much have to use a
calculator at that point. Or a computer, even better. Or a spreadsheet. So let's calculate m. And to do that, let's calculate
the components. So the mean of x-- the mean of
the x's-- is going to be equal to, this x is negative 2, plus
negative 1, plus 1, plus 4. All of that over, we have
four x data points. These two guys cancel out. Negative 2 plus 4 is 2. 2 over 4 is equal to 1/2. Now let's do the mean
of the y's. We have negative 3, we
have a negative 1. And then we have a 2, and
then we have a 3. And once again, we have
four data points. That guy and that
guy cancel out. Negative 1 plus 2 is 1. So this is equal to 1/4. Now let's figure out the
mean of the xy's. So x times y, the
mean of that. So over here we have negative
2 times negative 3. Negative 2 times negative
3 is positive 6. Plus negative 1 times negative
1 is positive 1. Plus 1 times 2 is 2. Plus 4 times 3 is 12. And we have four of
these points. And what is this? This is 6 plus 1 is 7. 7 plus 2 is 9. 9 plus 12 is 21 over 4. This is equal to 21/4. And then finally, we want-- I'll
do this in a new color-- the mean of the x's squared. And so that is going to be equal
to-- negative 2 squared is positive 4. Plus negative 1 squared
is positive 1. Plus 1 squared is 1. Plus 4 squared is 16. All of that over 4. 4 plus 2 is 6, plus
16 is 22, over 4. So 22/4 is the same
thing as 11/2. So now we're ready to
calculate the actual slope. Let me do it over here. Well actually, let me
do it over here. I want to be able to look at
everything we've done. So this is going to be equal to,
in this case, it's going to be the mean of the
xy's, which is 21/4. Minus the product of the mean
of x, which is 1/2. Times the mean of the
y's, which is 1/4. And then all of that over
the mean of the x squareds, which is 11/2. So we did that. Minus the mean of
the x's, squared. The mean of the x's,
once again, is 1/2. And so what is this equal to? I'm just going to go straight
to the calculator. I could deal with the fractions,
but this isn't a review of adding
and subtracting and multiplying fractions. Let's just go straight
to the calculator. Actually, let me simplify
it before. It's just too tempting
to simplify. Let me copy and paste it. Let's go down here
to calculate it. And so this is going to be--
maybe I should have used the calculator, but it's
too tempting. So what's this on top? On top, we have 21/4 minus 1/2
times 1/4, so that's minus 1/8. All of that over 11/2 minus
1/2 squared, which is 1/4. Now, one way to simplify this
right from the get go is multiply the numerator and
the denominator by 8. And that's just to get rid
of all these fractions. So 21/4 times 8 is going to be
the same thing as 21 times 2, which is equal to 42. Minus 1/8 times 8. We have to, of course,
distribute the eights. So it's going to be minus 1. All of that over, 8 times 11/2
is going to be 11 times 4, which is 44. And then 8 times 1/4 is
2, so it's minus 2. So 42 minus 1 is 41. And then 44 minus 2 is 42. So the slope is 41/42. So a little bit less than
a slope of one. 42/42 would be exactly 1. So our regression slope is
a little bit less than 1. And then our regression
y-intercept, b, is going to be equal to the mean of the y's. So 1/4, minus our slope,
41/42, times the mean of the x's, so times 1/2. And so this is going to be
equal to 1/4 minus 41/84, which is equal to-- let
me just find a common denominator. So let's go over 84. So what's 1/4 of 84? 1/4 of 80 is 20. So this is 21. 21 times 4 is 84. This is 1/4 of 84. Yep, that's right. So it's going to be 21 minus 41
over 84, which is equal to negative 20 over 84. And
they're both divisible by 4, so the numerator
divided by 4 is negative 5, over 21. So our regression line is going
to be y is equal to 41/42 x minus 5/21. And 5/21 is a little
bit less than 1/4. 5/20 would be 1/4. We made the denominator a little
bit bigger, so negative 5/21 is going to be a little bit
closer to 0 than negative 1/4. So our y-intercept is going to
be just above negative 1/4. And then we're going to
have a slope a little bit less than 1. So our line is going to look
something like this. If I were able to actually draw
a straight line, it would look something like
that over there. So I'm going to leave you
there in this video. In the next video, we're
actually going to calculate the r squared for this line. How good of a fit is it? How much of the total variation
in the y values can be explained by the variation
in the x values, or by the line itself?
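
If you'd rather let a computer do the arithmetic from this video, here is a minimal Python sketch of the same calculation. It uses the four points from the video and exact fractions so the answer can be compared directly with 41/42 and negative 5/21. The helper names `mean` and `fit_line` are made up for this illustration; they don't come from the video or from any library.

```python
from fractions import Fraction

# The four data points used in the video.
xs = [-2, -1, 1, 4]
ys = [-3, -1, 2, 3]


def mean(values):
    """Exact arithmetic mean, kept as a Fraction (hypothetical helper)."""
    return Fraction(sum(values), len(values))


def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b (hypothetical helper)."""
    mean_x = mean(xs)                                 # 1/2 in the video
    mean_y = mean(ys)                                 # 1/4
    mean_xy = mean([x * y for x, y in zip(xs, ys)])   # 21/4
    mean_x_sq = mean([x * x for x in xs])             # 11/2

    # Slope: (mean of xy - mean of x * mean of y) / (mean of x^2 - (mean of x)^2)
    m = (mean_xy - mean_x * mean_y) / (mean_x_sq - mean_x ** 2)
    # Intercept: the line passes through (mean of x, mean of y).
    b = mean_y - m * mean_x
    return m, b


m, b = fit_line(xs, ys)
print(m, b)  # 41/42 -5/21
```

Using `Fraction` instead of floating-point numbers is just a choice made here so the result matches the hand calculation exactly; with 10 or 100 points the same function works unchanged.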