Covariance and the Regression Line
Covariance, Variance and the Slope of the Regression Line
- What I want to do in this video is introduce you
- to the idea of the covariance between 2 random variables.
- It's defined as
- the expected value of the product of the distances of each random variable from its mean,
- or its expected value. So let me just write that down.
- So I'll have X first. I'll do it in another color.
- It's the expected value of the random variable X minus the expected value of X.
- You could view this as the population mean of X.
- Times-- and then this is random variable Y-- so times the distance from Y
- to its expected value or the population mean of Y.
- If it doesn't make a lot of sense yet,
- you can always get a feel for what it's doing by playing around with some numbers here.
- But the idea is that it measures how much the two variables vary together.
- For each data point, you always take an X and a Y together. Across the
- whole population, every X is paired with a Y, like the two parts of a coordinate,
- and you put each pair into this expression. Now
- let's say that X is above its mean when Y is below its mean.
- So let's say that in the population you had this point:
- one instantiation of the random variables, where
- you sample once from the universe and get X=1 and Y=3.
- And you knew ahead of time that E[X] is 0.
- And let's say E[Y]=4.
- So in this situation, what just happened?
- Now we don't know the entire covariance. We only have 1 sample here of these random variables.
- But what just happened here?
- We have 1 minus-- well, we're not going to calculate the entire expected value.
- I just want to calculate what's inside the expected value for this one sample.
- We'll have 1-0, which is 1, times 3-4, which is -1.
- So 1 times -1 is -1.
- What is that telling us? Well, it's telling us at least for this sample,
- this one time that we sampled our random variables X and Y,
- X was above its expected value,
- when Y was below its expected value.
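For this single sample, the term inside the expectation can be computed directly. A minimal Python sketch, using the example's assumed values E[X] = 0 and E[Y] = 4 (this is only one term, not the covariance itself):

```python
# Means assumed known ahead of time, as in the example
mean_x = 0
mean_y = 4

# One sample drawn from the joint distribution
x, y = 1, 3

# The term inside the expectation: (X - E[X]) * (Y - E[Y])
term = (x - mean_x) * (y - mean_y)
print(term)  # -1
```
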
- Let's say for the entire population this happened.
- Then it would make sense that they have negative covariance.
- When one goes up, the other one goes down. When one goes down, the other one goes up.
- If they both go up together, they would have a positive covariance.
- If they both go down together, they would also have a positive covariance.
- And the degree to which they move together will tell you the magnitude of the covariance.
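A small sketch can compute the covariance of a finite population straight from the definition and show the sign behaving as described. It assumes every (x, y) pair in the population is equally likely; the function name `population_covariance` and the data are illustrative only:

```python
def population_covariance(xs, ys):
    """Covariance of a finite population, assuming each (x, y) pair is equally likely."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Average of the products of each variable's distance from its mean
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

xs = [1, 2, 3, 4]
print(population_covariance(xs, [2, 4, 6, 8]))  # move up together: 2.5
print(population_covariance(xs, [8, 6, 4, 2]))  # move in opposite directions: -2.5
```
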
- Hopefully, that gives you a little bit of intuition about what the covariance is trying to tell us.
- But the more important thing I want to do in this video is to connect this formula,
- this definition of covariance, to everything we've been doing with least squares regression.
- Really, it's a fun math thing to do, to show you all these connections,
- and that's where the definition of covariance really becomes useful.
- I really do think it's motivated to a large degree by where it shows up in regressions.
- This is all stuff that we've kind of seen before. You'll just see it in a different way.
- First, I'm going to rewrite this definition of covariance over here.
- So this is going to be the same thing as the expected value of--
- I'm just gonna multiply these 2 binomials here--
- so the expected value of our random variable X times our random variable Y,
- minus-- I'll just do the X's first-- so plus X times -E[Y],
- which I'll just write as -X times E[Y].
- The negative sign comes from this negative sign right over here.
- And then we have -E[X] times Y.
- This is doing the distributive property twice.
- And then finally, you have -E[X] times -E[Y].
- And the negatives cancel out.
- And you're just going to have plus E[X] times E[Y].
- And of course it's the expected value of this entire thing.
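The multiplication of the two binomials described above, written out as a math block:

```latex
\begin{aligned}
\operatorname{Cov}(X, Y) &= E\big[(X - E[X])(Y - E[Y])\big] \\
&= E\big[XY - X\,E[Y] - E[X]\,Y + E[X]\,E[Y]\big]
\end{aligned}
```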
- Now let's see if we can rewrite this.
- The expected value of the sum and difference of a bunch of random variables
- is just the sum or difference of their expected values. So this is going to be the same thing--
- Remember, expected value, in a lot of contexts, you can view just as an arithmetic mean.
- Or in a continuous distribution, you can view it as a probability weighted sum or integral.
- Either way. We've seen it before, I think.
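The linearity step (the expected value of a sum or difference is the sum or difference of the expected values) can be checked numerically. This sketch uses a small hypothetical joint distribution chosen only for illustration, with `Fraction` so the arithmetic is exact:

```python
from fractions import Fraction

# Hypothetical joint distribution: {(x, y): probability}
joint = {
    (0, 1): Fraction(1, 4),
    (1, 3): Fraction(1, 4),
    (2, 2): Fraction(1, 2),
}

def expect(f):
    """Probability-weighted sum of f(x, y) over the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

# E[XY - X + Y] versus E[XY] - E[X] + E[Y]
lhs = expect(lambda x, y: x * y - x + y)
rhs = expect(lambda x, y: x * y) - expect(lambda x, y: x) + expect(lambda x, y: y)
print(lhs, lhs == rhs)  # 7/2 True
```
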
- So let's rewrite this.
- This is equal to the expected value of the product of the random variables X and Y.
- I'm trying to keep it color coded for you.
- And then we have minus X times E[Y].
- So then we're going to have -E[X times E[Y]].
- Stay with the right colors.
- Then you're going to have -E[E[X] times Y].
- This might look really confusing with all these embedded expected values.
- But one of the ways to think about it is, the
- things that already have the expected values can just be viewed as numbers, as knowns.
- Usually, we'll take them out of the expected value,
- because the expected value of the expected value is the same thing as the expected value.
- Actually, let me write this over here, just to remind ourselves.
- The expected value of the expected value of X is just going to be the expected value of X.
- Think of it this way, you can view this as the population mean of the random variable.
- So that's going to be a known. It's out there in the universe.
- So the expected value of that is going to be itself.
- If the population mean or the expected value of X is 5,
- this is like saying the expected value of 5.
- The expected value of 5 is going to be 5.
- Hopefully that will make sense. We'll use that in a second.
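Both properties used in this derivation, that the expected value of a constant such as E[X] is just that constant, and that a constant factor can be pulled out (E[3X] = 3E[X]), can be checked on a small discrete distribution. The distribution below is hypothetical, chosen only for illustration:

```python
from fractions import Fraction

# Hypothetical distribution of X: {value: probability}
dist_x = {1: Fraction(1, 2), 4: Fraction(1, 4), 7: Fraction(1, 4)}

def expect(f):
    """Probability-weighted sum of f(x) over the distribution of X."""
    return sum(p * f(x) for x, p in dist_x.items())

mean_x = expect(lambda x: x)                  # E[X] is just a number
print(mean_x)                                 # 13/4
print(expect(lambda x: mean_x) == mean_x)     # E[E[X]] == E[X]: True
print(expect(lambda x: 3 * x) == 3 * mean_x)  # E[3X] == 3 E[X]: True
```
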
- So we're almost done. We did the expected value of this, and we have one term left.
- The final term is the expected value of this guy.
- And here we can actually use that property-- I'll just write it down.
- So plus the expected value of E[X] times E[Y].
- Let's see if we can simplify here.
- So this is just going to be the expected value of the product of these random variables.
- I'll just leave that the way it is.
- So the expected value of XY.
- Now what do we have over here?
- We have the expected value of X times--
- once again, you can go back to what we've just said. This is just gonna be a number, E[Y].
- So we can just bring this out.
- If this was the expected value of 3X,
- it would be the same thing as 3 times E[X].
- So we could rewrite this as E[Y] times E[X].
- So you can view it as factoring it out of the expected value.
- So just like that. And then you have minus--
- Same thing over here. You can factor out this E[X].
- Minus E[X] times E[Y].
- This is getting confusing with all the E's around.
- And then finally, you have the expected value of this thing, of two expected values.
- That's just going to be the product of those two expected values.
- So that's just going to be plus E[X] times E[Y].
- Now what do we have here? We have E[Y] times E[X].
- And then we're subtracting E[X]E[Y].
- These two things are exactly the same thing.
- So this is going to be-- we're actually subtracting it twice, and then adding it once.
- These are all the same thing.
- This is E[Y]E[X].
- This is E[Y]E[X], just in a different order.
- And this is E[Y]E[X].
- We're subtracting it twice, then we're adding it.
- Or the other way to think about it, this guy and that guy will cancel out.
- But what do we have left? We have the covariance of these 2 random variables X and Y,
- equal to the expected value of--
- I'll switch back to my colors, just because it's the final result.
- E[XY] - E[Y]E[X].
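As a sanity check, this Python sketch computes the covariance both from the definition and from the shortcut E[XY] - E[Y]E[X] on a small hypothetical joint distribution (the probabilities are made up for illustration):

```python
from fractions import Fraction

# Hypothetical joint distribution of (X, Y): {(x, y): probability}
joint = {
    (1, 3): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
    (0, 5): Fraction(1, 2),
}

def expect(f):
    """Probability-weighted sum of f(x, y) over the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)

# Definition: E[(X - E[X])(Y - E[Y])]
cov_definition = expect(lambda x, y: (x - e_x) * (y - e_y))
# Shortcut derived above: E[XY] - E[Y]E[X]
cov_shortcut = expect(lambda x, y: x * y) - e_y * e_x

print(cov_definition, cov_shortcut)  # -11/8 -11/8
```
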
- Now you can calculate these expected values if you know everything about the
- probability distribution or density functions for each of these random variables,
- or if you have the entire population that you're sampling from
- whenever you take an instantiation of these random variables.
- But let's say you just had a sample of these random variables, how could you estimate it?
- Well, if you're estimating it, let's say you just have a bunch of data points,
- a bunch of coordinates-- I think we'll start to see how this relates to what we do with regression.
- The expected value of XY can be approximated by the sample mean of the product of X and Y.
- You take each of your X Y associations, take the product,
- and then take the mean of all of them. That's the sample mean of the product of X and Y.
- Then this thing right over here, E[Y], can be approximated by the sample mean of Y.
- And E[X] can be approximated by the sample mean of X.
- So what can the covariance of 2 random variables be approximated by?
- Well, this right here is the mean of their product from your sample,
- minus the mean of your sample Y's times the mean of your sample X's.
- And this should start looking a little bit familiar. Because what is this?
- This was the numerator when we were trying to figure out the slope of the regression line.
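The sample estimate described above, the mean of the products minus the product of the means, can be sketched in Python. The helper names and the data are illustrative assumptions:

```python
def mean(values):
    return sum(values) / len(values)

def sample_covariance_estimate(xs, ys):
    """Estimate Cov(X, Y) as mean(xy) - mean(y) * mean(x)."""
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    return mean_xy - mean(ys) * mean(xs)

# Hypothetical sample of (x, y) pairs
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
print(sample_covariance_estimate(xs, ys))  # approximately 1.8
```

This plug-in estimate effectively divides by n. Many textbooks and libraries instead report the sample covariance with an n - 1 denominator, which is larger by a factor of n/(n - 1).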
- Let me just rewrite the formula here just to remind you.
- It was literally the mean of the products of each of our data points, or the x, y's,
- minus the mean of y's, times the mean of the x's,
- all of that over the mean of the x squareds-- or you can even view it as the mean of x times x--
- which I can just write as x squared here, minus the square of the mean of x.
- This is how we figured out the slope of our regression line.
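The slope formula above translates directly into code. A minimal sketch, with the hypothetical helper `regression_slope`, checked on points that lie exactly on y = 2x + 1:

```python
def mean(values):
    return sum(values) / len(values)

def regression_slope(xs, ys):
    """Least-squares slope: (mean(xy) - mean(x)mean(y)) / (mean(x^2) - mean(x)^2)."""
    mean_x, mean_y = mean(xs), mean(ys)
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    mean_x2 = mean([x * x for x in xs])
    return (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)

# Points exactly on y = 2x + 1, so the slope should come out as 2
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
print(regression_slope(xs, ys))  # 2.0
```
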
- Or maybe a better way to think about it, if we assume in our regression line that
- the points we have were sampled from an entire universe of possible points,
- then you could say we're approximating the slope of our regression line.
- You might see this little hat notation in a lot of books.
- I don't want you to be confused by it.
- You're approximating the population's regression line from a sample of it.
- Now this right here is an estimate of the covariance of X and Y.
- Now what is this over here?
- Well, I just said you could write this bottom part very easily as
- the mean of x times x-- that's the same thing as the mean of x squared--
- minus the mean of x times the mean of x. Right? That's what the square of the mean of x is.
- What's this? Well, you can view this as the covariance of X with X.
- We've actually already seen this. I showed you what it is many, many videos ago,
- when we first learned about it.
- The covariance of a random variable with itself
- is really just the variance of that random variable.
- You could verify it for yourself.
- If you change this Y to an X,
- the expression inside the expected value becomes (X-E[X])(X-E[X]).
- So that's the expected value of (X-E[X]) squared, which is the definition of variance.
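A quick numerical check that the covariance of a variable with itself equals its variance, using the same 1/n (population) normalization for both. The data set and helper names are arbitrary choices for illustration:

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Plug-in covariance: mean of the products of deviations from the means."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def variance(xs):
    """Plug-in variance: mean of the squared deviations from the mean."""
    mx = mean(xs)
    return mean([(x - mx) ** 2 for x in xs])

xs = [2, 4, 4, 4, 5, 5, 7, 9]
print(covariance(xs, xs), variance(xs))  # 4.0 4.0
```
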
- So another way of thinking about the slope of our regression line,
- it can be literally viewed as the covariance of our 2 random variables
- over the variance of X, where you can view X as the independent random variable.
- That right there is the slope of our regression line.
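Putting the pieces together, here is a sketch that computes the slope as Cov(X, Y)/Var(X) and compares it with the mean-based formula. Both helpers are illustrative; because Var(X) is computed as Cov(X, X) with the same normalization, the 1/n factor cancels in the ratio:

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Plug-in covariance: mean of the products of deviations from the means."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def slope_from_cov_var(xs, ys):
    # Var(X) is Cov(X, X), so both use the same 1/n normalization and it cancels
    return covariance(xs, ys) / covariance(xs, xs)

def slope_from_means(xs, ys):
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    mean_x2 = mean([x * x for x in xs])
    return (mean_xy - mean(xs) * mean(ys)) / (mean_x2 - mean(xs) ** 2)

# Hypothetical sample of (x, y) pairs
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
print(slope_from_cov_var(xs, ys), slope_from_means(xs, ys))  # both approximately 0.9
```

Because the normalization cancels, using n - 1 in both the covariance and the variance would give the same slope.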
- Anyway, I thought that was interesting. And I want to make connections between
- things you see in different parts of statistics and show you that they really are connected.