Linear regression and correlation
Second Regression Example
- Let's find the equation for the regression line
- that best fits this.
- Where the fit minimizes the squared distance to each of
- the points.
- And then let's actually calculate how good of a fit it
- is using an r squared.
- And we might have to do that in the next video,
- depending on time.
- So just as a reminder, the line is going to have the
- equation y is equal to mx plus b.
- And we've shown ourselves that the slope of this line-- the
- one that best minimizes the squared distance to each of
- those points-- is going to be the mean of the xy's minus the
- mean of x times the mean of y.
- All of that over the mean of the x squareds, minus the mean
- of the x's, squared.
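Written as a formula, the slope described in words above looks like the following. The bar notation for a mean is an assumption made here for readability; the video only states the formula verbally.

```latex
m = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^{2}} - (\bar{x})^{2}}
```

Here $\overline{xy}$ and $\overline{x^{2}}$ mean "multiply first, then take the mean", while $\bar{x}\,\bar{y}$ and $(\bar{x})^{2}$ mean "take the means first, then multiply".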
- So one way to memorize it, I guess, is the first terms have
- the mean of the combined things.
- You're multiplying x times itself first, then meaning.
- You're multiplying x times y, times each
- other first, then meaning.
- And then the second terms, you're finding the means of
- the individual components and then multiplying.
- Mean of x, times mean of y, mean of x times mean of x.
- So hopefully maybe that helps.
- Maybe it doesn't.
- But we can calculate the slope.
- And then the y intercept, b, is just going to be equal to
- the mean of y minus whatever we calculate here for m, times
- the mean of x.
- And we can do that because we know that the point mean of x
- comma mean of y is going to be on this regression line.
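The slope and intercept formulas can be sketched in a few lines of Python. This is a minimal illustration, not something shown in the video: the helper names `mean` and `fit_line` are made up here, and `fractions.Fraction` is used only so the arithmetic stays exact.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean as an exact fraction (hypothetical helper)."""
    return sum(values, Fraction(0)) / len(values)


def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b.

    Uses the mean-based formulas from the transcript:
      m = (mean(xy) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2)
      b = mean(y) - m*mean(x)
    """
    mean_x = mean(xs)
    mean_y = mean(ys)
    mean_xy = mean([x * y for x, y in zip(xs, ys)])  # multiply first, then mean
    mean_x2 = mean([x * x for x in xs])              # square first, then mean
    m = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
    b = mean_y - m * mean_x
    return m, b


# Sanity check on points that already lie exactly on y = 2x + 1.
m, b = fit_line([0, 1, 2], [1, 3, 5])
print(m, b)  # 2 1
```

Points that sit exactly on a line should give that line back, which makes a quick check that the formulas were typed in correctly.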
- So let's calculate them.
- And you'll see, in the last example we did three points.
- We only have four points here.
- But the computations get more and more intense.
- You can imagine what would happen if you had 10 or 20 or
- 100 points.
- You pretty much have to use a calculator at that point.
- Or computer, even better.
- Or a spreadsheet.
- So let's calculate m.
- And to do that, let's calculate the components.
- So the mean of x-- the mean of the x's-- is going to be equal
- to, this x is negative 2, plus negative 1, plus 1, plus 4.
- All of that over, we have four x data points.
- These two guys cancel out.
- Negative 2 plus 4 is 2.
- 2 over 4 is equal to 1/2.
- Now let's do the mean of the y's.
- We have negative 3, we have a negative 1.
- And then we have a 2, and then we have a 3.
- And once again, we have four data points.
- That guy and that guy cancel out.
- Negative 1 plus 2 is 1.
- So this is equal to 1/4.
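The two means can be double-checked with exact fractions. A small sketch, assuming the four data points read off in the transcript, (-2, -3), (-1, -1), (1, 2) and (4, 3); the variable names are chosen here for illustration.

```python
from fractions import Fraction

xs = [-2, -1, 1, 4]
ys = [-3, -1, 2, 3]

mean_x = Fraction(sum(xs), len(xs))  # (-2 - 1 + 1 + 4) / 4
mean_y = Fraction(sum(ys), len(ys))  # (-3 - 1 + 2 + 3) / 4

print(mean_x, mean_y)  # 1/2 1/4
```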
- Now let's figure out the mean of the xy's.
- So x times y, the mean of that.
- So over here we have negative 2 times negative 3.
- Negative 2 times negative 3 is positive 6.
- Plus negative 1 times negative 1 is positive 1.
- Plus 1 times 2 is 2.
- Plus 4 times 3 is 12.
- And we have four of these points.
- And what is this?
- This is 6 plus 1 is 7.
- 7 plus 2 is 9.
- 9 plus 12 is 21 over 4.
- This is equal to 21/4.
- And then finally, we want-- I'll do this in a new color--
- the mean of the x squareds.
- And so that is going to be equal to-- negative 2 squared
- is positive 4.
- Plus negative 1 squared is positive 1.
- Plus 1 squared is 1.
- Plus 4 squared is 16.
- All of that over 4.
- 4 plus 2 is 6, plus 16 is 22, over 4.
- So 22/4 is the same thing as 11/2.
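The other two ingredients, the mean of the products and the mean of the squares, can be checked the same way. Again a sketch on the transcript's four points, with illustrative variable names.

```python
from fractions import Fraction

xs = [-2, -1, 1, 4]
ys = [-3, -1, 2, 3]
n = len(xs)

# Multiply each x by its y first, then take the mean: (6 + 1 + 2 + 12) / 4
mean_xy = Fraction(sum(x * y for x, y in zip(xs, ys)), n)

# Square each x first, then take the mean: (4 + 1 + 1 + 16) / 4
mean_x2 = Fraction(sum(x * x for x in xs), n)

print(mean_xy, mean_x2)  # 21/4 11/2
```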
- So we're now ready to calculate the actual slope.
- Let me do it over here.
- Well actually, let me do it over here.
- I want to be able to look at everything we've done.
- So this is going to be equal to, in this case, it's going
- to be the mean of the xy's, which is 21/4.
- Minus the product of the mean of x, which is 1/2.
- Times the mean of the y's, which is 1/4.
- And then all of that over the mean of the x
- squareds, which is 11/2.
- So we did that.
- Minus the mean of the x's, squared.
- The mean of the x's, once again, is 1/2.
- And so what is this equal to?
- I'm just going to go straight to the calculator.
- I could deal with the fractions, but this isn't a
- review of adding and subtracting
- and multiplying fractions.
- Let's just go straight to the calculator.
- Actually, let me simplify it before.
- It's just too tempting to simplify.
- Let me copy and paste it.
- Let's go down here to calculate it.
- And so this is going to be-- maybe I should have used the
- calculator, but it's too tempting.
- So what's this on top?
- On top, we have 21/4 minus 1/2 times 1/4 is minus 1/8.
- All of that over 11/2 minus 1/2 squared, which is 1/4.
- Now, one way to simplify this right from the get go is
- multiply the numerator and the denominator by 8.
- And that's just to get rid of all these fractions.
- So 21/4 times 8 is going to be the same thing as 21 times 2,
- which is equal to 42.
- Minus 1/8 times 8.
- We have to, of course, distribute the eights.
- So it's going to be minus 1.
- All of that over, 8 times 11/2 is going to be 11 times 4,
- which is 44.
- And then 8 times 1/4 is 2, so it's minus 2.
- So 42 minus 1 is 41.
- And then 44 minus 2 is 42.
- So the slope is 41/42.
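The fraction arithmetic for the slope can be replayed exactly, including the multiply-by-8 step. A sketch that starts from the four values computed earlier in the transcript; the names `numerator` and `denominator` are just labels used here.

```python
from fractions import Fraction

mean_xy = Fraction(21, 4)
mean_x = Fraction(1, 2)
mean_y = Fraction(1, 4)
mean_x2 = Fraction(11, 2)

numerator = mean_xy - mean_x * mean_y   # 21/4 - 1/8
denominator = mean_x2 - mean_x ** 2     # 11/2 - 1/4

# Multiplying top and bottom by 8 clears the fractions, as in the video.
print(numerator * 8, denominator * 8)   # 41 42

m = numerator / denominator
print(m)  # 41/42
```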
- So a little bit less than a slope of one.
- 42/42 would be exactly 1.
- So our regression slope is a little bit less than 1.
- And then our regression y-intercept, b, is going to be
- equal to the mean of the y.
- So 1/4, minus our slope, 41/42, times the mean of the
- x's, so times 1/2.
- And so this is going to be equal to 1/4 minus 41/84,
- which is equal to-- let me just find a common
- denominator.
- So let's go over 84.
- So what's 1/4 of 84?
- 1/4 of 80 is 20.
- So this is 21.
- 21 times 4 is 84.
- This is 1/4 of 84.
- Yep, that's right.
- So it's going to be 21 minus 41 over 84, which is equal to
- negative 20.
- Negative 20 over 84, which is the same thing, they're both
- divisible by 4, the numerator divided by 4 is
- negative 5, over 21.
- So our regression line is going to be y is equal to
- 41/42 x minus 5/21.
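The intercept can be verified the same way, along with the fact used earlier that the line passes through the point (mean of x, mean of y). A minimal sketch with the values from the transcript.

```python
from fractions import Fraction

m = Fraction(41, 42)
mean_x = Fraction(1, 2)
mean_y = Fraction(1, 4)

# b = mean(y) - m * mean(x) = 1/4 - 41/84
b = mean_y - m * mean_x
print(b)  # -5/21

# The fitted line should pass through (mean of x, mean of y).
print(m * mean_x + b == mean_y)  # True
```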
- And 5/21 is a little bit less than 1/4.
- 5/20 would be 1/4.
- We made the denominator a little bit bigger, so negative 5/21
- is going to be a little bit closer to zero than negative 1/4.
- So our y-intercept is going to sit just above
- negative 1/4.
- And then we're going to have a slope a little
- bit less than 1.
- So our line is going to look something like this.
- If I were able to actually draw a straight line, it would
- look something like that over there.
- So I'm going to leave you there in this video.
- In the next video, we're actually going to calculate
- the r squared for this line.
- How good of a fit is it?
- How much of the total variation in the y values can
- be explained by the variation in the x values, or by the
- line itself?