© 2023 Khan Academy

# Covariance and the regression line

Covariance, Variance and the Slope of the Regression Line. Created by Sal Khan.

## Want to join the conversation?

- At 2:41: Wouldn't this just be 0? Isn't the mean of the product of two variables the same thing as the product of the means of those two variables? Or am I wrong about this? (13 votes)
- The mean of the product is not the same as the product of the means. For example, if x = [1, 2, 3] and y = [4, 5, 6], then the mean of the products of x and y would be (1 * 4 + 2 * 5 + 3 * 6)/3 = (4 + 10 + 18)/3 = 32/3 = 10.666... The product of the means, on the other hand, would be ((1 + 2 + 3)/3) * ((4 + 5 + 6)/3) = 2 * 5 = 10. So they are not equal. Hope this helps! (2 votes)
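
The arithmetic in this reply can be checked with a few lines of Python. This is only a sketch using the reply's own numbers; the variable names are made up for illustration. The gap between the two quantities is exactly the covariance expression the video derives.

```python
# Numbers taken from the reply above.
x = [1, 2, 3]
y = [4, 5, 6]

def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)

# Mean of the products: (1*4 + 2*5 + 3*6) / 3
mean_of_products = mean([a * b for a, b in zip(x, y)])

# Product of the means: mean(x) * mean(y)
product_of_means = mean(x) * mean(y)

# The difference is mean(xy) - mean(x) * mean(y), the covariance estimate.
difference = mean_of_products - product_of_means

print(mean_of_products, product_of_means, difference)
```

The difference is nonzero here because x and y rise together; it would be zero only if the two lists had no linear relationship.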

- Why did we assume that the expected value of XY can be approximated by the sample mean of the products of X and Y? Is this a standard rule? (This was discussed around the 11th minute of the video.) (11 votes)
- Since X and Y are both random variables, the product of X and Y can be viewed as another random variable. With a "large enough" sample size, the Law of Large Numbers lets us approximate the expected value of XY with the sample mean of the paired XY products. (2 votes)
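
A small simulation can make this approximation concrete. The sketch below assumes a made-up model (X uniform on [0, 1], Y equal to X plus independent standard normal noise), chosen only because its true E[XY] is easy to work out by hand. None of it comes from the video. It relies on the law of large numbers: the sample mean of the products settles near E[XY] as the sample grows.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean_of_products(n):
    """Draw n (X, Y) pairs from the assumed model and return the mean of X * Y."""
    total = 0.0
    for _ in range(n):
        x = rng.random()              # X ~ Uniform(0, 1)
        y = x + rng.gauss(0.0, 1.0)   # Y = X + independent standard normal noise
        total += x * y
    return total / n

# For this model, E[XY] = E[X^2] + E[X] * E[noise] = 1/3 + 0 = 1/3.
true_value = 1 / 3

estimates = {n: sample_mean_of_products(n) for n in (100, 10_000, 200_000)}
for n, estimate in estimates.items():
    print(n, round(estimate, 4), "abs error:", round(abs(estimate - true_value), 4))
```

Any single small sample can land far from 1/3; it is the large-sample estimate that is reliably close.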

- Can you make the connection between Pearson's correlation coefficient (R) and the coefficient of determination (R²)? I'd like to know the difference. Thanks. (4 votes)
- The coefficient of determination is the Pearson correlation coefficient (PCC) squared. (8 votes)

- This equation only works for the covariance of a population not a sample. How would you modify this equation to work for a sample?(3 votes)
- Anytime the notation "X-bar" is used (X with a line above it), this means we are dealing with a sample. So X-bar is a sample statistic that approximates the population parameter, i.e. E[X].(7 votes)

- Somebody please tell me, what is the practical use of covariance? Thanks! An illustration would be very much appreciated! (6 votes)
- Sal Khan, you made me fall in love with Statistics (the logically and step by step explained concepts, accompanying examples etc). God bless you much. How can I donate to this cause please? (5 votes)
- At 6:25 to about 6:50, Sal says that E[E[X]] is just E[X]. I just can't intuitively see why this is true. I don't understand. (3 votes)
- So in other words, it's like we expect the expected value of X to never change with a particular X. Say the expected value of X is 'N' when X is 3. N will always be N as long as X = 3 . So we know that the expected value of X will always equal N if X = 3. EACH X HAS ONLY ONE EXPECTED VALUE... NOW I SEE, X can have more than one value, but each X can only have 1 expected value. That's why the expected value is a constant. Unlike X that might be 3 or 5, N will never change for a particular X value, so we can expect it to be N. The expected value of the expected value of X is a constant that changes as X changes. Thanks a lot guys!(2 votes)

- What is meant by expected value? Is it different from the normal value of X and Y? (2 votes)
- The expected value is a weighted average of outcomes using probability. Take the sum of the probability of each outcome multiplied by that outcome. If you took the expected value when you are gambling it would tell you how much money you'd "expect to have in the end" and if it was positive it would be a good bet, but if it was negative it would mean you were losing your money.

Try this video to learn more:

https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/expected-value--e-x (3 votes)
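
The "weighted average of outcomes" description in this reply can be sketched in Python. The bet below (win 10 with probability 0.4, lose 5 with probability 0.6) is a hypothetical example invented for illustration, not one from the video.

```python
# Hypothetical bet: each entry is (payoff, probability of that payoff).
outcomes = [
    (10.0, 0.4),   # win 10 with probability 0.4
    (-5.0, 0.6),   # lose 5 with probability 0.6
]

# The probabilities of all outcomes must add up to 1.
total_probability = sum(p for _, p in outcomes)

# Expected value: sum over outcomes of (probability * payoff).
expected_value = sum(payoff * p for payoff, p in outcomes)

print("expected value per bet:", expected_value)
print("favorable bet" if expected_value > 0 else "unfavorable bet")
```

A positive result means the bet pays off on average over many repetitions, which is the sense of "expect to have in the end" used in the reply.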

- This video does not really explain what covariance is, and I could not find other videos about covariance on this site. Are there more introductory videos on covariance that I just did not find? Thanks. (2 votes)
- Many people have an intuitive feel for the term "correlation". Two terms are correlated when they "co-vary" or go up or down together. Covariance is just like correlation, but has not been normalized to fit between -1 and +1.(3 votes)

- What is the connection between correlation and covariance?(3 votes)
- To get the correlation, divide the covariance by the product of sd(X) and sd(Y). (1 vote)
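
The relationship in this reply can be sketched as follows. The data points are made up for illustration, and the code uses population-style formulas (dividing by n) to match the mean-based expressions in the video. Its last line also shows the "PCC squared" relationship mentioned in an earlier reply.

```python
import math

# Made-up paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Mean of the products minus the product of the means."""
    return mean([p * q for p, q in zip(a, b)]) - mean(a) * mean(b)

# The standard deviation is the square root of the variance, and Var(X) = Cov(X, X).
sd_x = math.sqrt(covariance(xs, xs))
sd_y = math.sqrt(covariance(ys, ys))

# Correlation: covariance normalized by the product of the standard deviations.
correlation = covariance(xs, ys) / (sd_x * sd_y)

# For a simple linear regression of y on x, R-squared is the correlation squared.
r_squared = correlation ** 2

print(round(correlation, 4), round(r_squared, 4))
```

Dividing by the standard deviations removes the units, which is why the correlation always lands between -1 and +1 while the covariance does not.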

## Video transcript

What I want to do in this
video is introduce you to the idea of the covariance
between two random variables. And it's defined as the
expected value of the distance-- or I guess the product
of the distances of each random variable
from their mean, or from their expected value. So let me just write that down. So I'll have X first, I'll
do this in another color. So it's the expected
value of random variable X minus the expected value
of X. You could view this as the population
mean of X-- times, and then this is a
random variable Y, so times the distance from
Y to its expected value, or the population mean of Y. And if it doesn't make
a lot of intuitive sense yet-- well, one, you
could just always kind of think about what
it's doing and play around with some numbers here. But the reality is it's saying
how much they vary together. So you always take an X and a
Y for each of the data points. Let's say you had
the whole population. So every X and Y that
go together with each other, that form a
coordinate, you put into this. And what happens is-- let's
say that X is above its mean when Y is below its mean. So let's say that in the
population you had this point: for one instantiation
of the random variables, you sample once
from the universe and you get X is equal to 1 and
Y is equal to 3. And let's say that you
knew ahead of time, that the expected
value of X is 0. And let's say that the expected
value of Y is equal to 4. So in this situation,
what just happened? Now we don't know the
entire covariance, we only have one sample here
of these random variables. But what just happened here? We're
not going to calculate
the entire expected value, I just want to
calculate what happens when we do what's inside
the expected value. We'll have 1 minus 0, so you'll
have a 1, times 3 minus 4, which is negative 1. So you're going to have
1 times negative 1, which is negative 1. And what is that telling us? Well, it's telling us that at least
for this sample, this one time that we sampled the
random variables X and Y, X was above its
expected value when Y was below its expected value. And if we kept doing this, let's
say for the entire population this happened, then
it would make sense that they have a
negative covariance. When one goes up, the
other one goes down. When one goes down,
the other one goes up. If they both go
up together, or they both go down together, they would have
a positive covariance. And the degree to which
they do it together will tell you the magnitude
of the covariance. Hopefully that gives you
a little bit of intuition about what the covariance
is trying to tell us. But the more important thing
that I want to do in this video is to connect this formula,
this definition of covariance, to everything we've been doing
with least squares regression. And really it's just
kind of a fun math thing to do to show you all
of these connections, and where, really, the
definition of covariance really becomes useful. And I really do think it's
motivated to a large degree by where it shows
up in regressions. And this is all stuff
that we've kind of seen before, you're just going
to see it in a different way. So this whole video,
I'm just going to rewrite this definition of
covariance right over here. So this is going to be the
same thing as the expected value of-- and I'm
just going to multiply these two binomials in here. So the expected value
of our random variable X times our random variable
Y minus-- well, I'll just do the X first. So plus X times the negative
expected value of Y. So I'll just say minus X
times the expected value of Y. And that negative sign comes
from this negative sign right over here. And then we have minus
expected value of X times Y, just doing the distributive
property twice, and then finally you have the
negative expected value of X times a negative
expected value of Y. And the negatives cancel out. And so you're just going to have
plus the expected value of X times the expected value
of Y. And of course, it's the expected value
of this entire thing. Now let's see if we
can rewrite this. Well the expected
value of the sum of a bunch of random variables,
or the sum and difference of a bunch of random variables,
is just the sum or difference of their expected values. So this is going to
be the same thing. And remember, expected
value, in a lot of contexts, you could view it as
just the arithmetic mean. Or, in a continuous
distribution, you could view it as a
probability weighted sum or probability weighted
integral, either way. We've seen it before, I think. So let's rewrite this. So this is equal to the expected
value of the product of the random variables, X times Y. Trying to
keep them color-coded for you. And then the next term is minus X
times the expected value of Y. So we're going to have
minus the expected value of that whole thing, X times the expected value of
Y. Stay with the right colors. Then you're going to have
minus the expected value of this thing-- I'll close the
parentheses-- of this thing right over here: the expected value of
X, times Y. I know this might look really
confusing with all the embedded expected values. But one way to think
about it is that the things that already have
an expected value around them, you can view as numbers. You can treat
them as knowns. We're actually going to take
them out of the expected value, because the expected
value of an expected value is the same thing as
the expected value. Actually let me write this over
here, just to remind ourselves. The expected value
of the expected value of X is just going to be the expected value
of X. Think of it this way. You could view this
as the population mean for the random variable. So that's just going to be
a known, it's out there, it's in the universe. So the expected value of that
is just going to be itself. If the population mean, or
the expected value of X is 5-- this is like saying the
expected value of 5. Well the expected
value of 5 is going to be 5, which is the same thing
as the expected value of X. Hopefully that
makes sense, we're going to use that in a second. So we're almost done. We did the expected value of
this and we have one term left. And then the final term, the
expected value of this guy. And here, we can actually use a
property right from the get go. I'll write it down. So the expected value of--
get some big brackets up-- of this thing right over here. Expected value of X times the
expected value of Y. And let's see if we can simplify
it right here. So this is just going
to be the expected value of the product of these
two random variables. I'll just leave
that the way it is. So let me just--
the stuff that I'm going to leave the way it is
I'm just going to freeze them. So the expected value of XY. Now what do we have over here? We have the expected
value of X times-- once again, if
you go back to
what we just said-- this is just going to be a
number, the expected value of Y, so we can just bring this out. If this was the
expected value of 3X, it would be the same thing as 3
times the expected value of X. So we can rewrite this as
negative expected value of Y times the expected value of
X. You can kind of view this as we took it out of
the expected value, we factored it out. So just like that. And then you have minus. Same thing over here. You can factor out
this expected value of X. Minus the expected
value of X times the expected value of Y. This is getting
confusing with all the E's laying around. And then finally,
the expected value of this thing, of
two expected values, well that's just going to
be the product of those two expected values. So that's just going
to be plus-- I'll freeze this-- expected
value of X times the expected value of Y. Now what do we have here? We have expected value of Y
times the expected value of X. And then we are subtracting
the expected value of X times the
expected value of Y. These two things are
the exact same thing. Right? So this is going to be--
and actually look at this. We're subtracting it twice
and then we have one more. These are all the same thing. This is the expected value of Y
times the expected value of X. This is the expected value of Y
times the expected value of X, just written in a
different order. And this is the expected value
of Y times the expected value of X. We're subtracting it
twice and then we're adding it. Or, one way to think about it
is that this guy and that guy will cancel out. You could have also picked
that guy and that guy. But what do we have left? We have that the covariance of
these two random variables, X and Y, is equal to the
expected value of-- I'll switch back to my
colors just because this is the final result-- the
expected value of
the product XY, minus-- what is this? The expected value of Y times
the expected value of X. Now you can calculate
these expected values if you know everything about
the probability distribution or density functions for each
of these random variables. Or if you had the
entire population that you're sampling
from, whenever you take an instantiation
of these random variables. But let's say you
just had a sample of these random variables. How could you estimate them? Well, if you were estimating
it, the expected value, and let's say you just have
a bunch of data points, a bunch of coordinates. And I think you'll start
to see how this relates to what we do with regression. The expected value
of X times Y, it can be approximated
by the sample mean of the products of
X and Y. This is going to be the sample
mean of X times Y. You take each of
your XY pairs, take their product, and then
take the mean of all of them. So that's going to be
the mean of the products of X and Y. And then this thing
right over here, the expected value of Y that can
be approximated by the sample mean of Y, and the
expected value of X can be approximated by
the sample mean of X. So what can the covariance
of two random variables be approximated by? What can it be approximated by? Well this right here is the
mean of their product from your sample minus the mean of
your sample Y's times the mean of your sample X's. And this should start
looking familiar. This should look a little bit
familiar, because what is this? This was the numerator. This right here is
the numerator when we were trying to figure out the
slope of the regression line. So when we tried to figure out
the slope of the regression line, we had the-- let me just
rewrite the formula here just to remind you-- it was literally
the mean of the products of each of our data points,
or the XY's, minus the mean of Y's times the
mean of the X's. All of that over the
mean of the X squareds-- and you could even
view it as the mean of
the X's times the X's, but I could just write
the X squareds over here-- minus the mean of X, squared. This is how we figured out the
slope of our regression line. Or maybe a better way
to think about it, if we assume in
our regression line that the points
that we have were a sample from an entire
universe of possible points, then you could say that
we are approximating the slope of our
regression line. And you might see this little
hat notation in a lot of books. I don't want you to be confused. They are saying that you're
approximating the population's regression line
from a sample of it. Now, this right
here-- so everything we've learned right now-- this
right here is the covariance, or this is an estimate of
the covariance of X and Y. Now what is this over here? Well, I just said,
you could rewrite this very easily
as-- this bottom part right here-- you could write as
the mean of X times X-- that's the same thing as the mean of X
squared-- minus the mean of X times the mean of X, right? That's what the mean
of X, squared, is. Well, what's this? Well, you could view this as
the covariance of X with X. But we've actually
already seen this. And I've actually
shown you many, many videos ago when we first
learned about it what this is. The covariance of a random
variable with itself is really just the variance
of that random variable. And you could verify
it for yourself. If you change this
Y to an X, this becomes the expected value of X minus
the expected value of X, times X minus
the expected value of X. Or that's the
expected value of the quantity X minus the expected
value of X, squared. That's your definition
of variance. So another way of thinking about
the slope of our regression line: it can be literally viewed
as the covariance of our two random variables over
the variance of X, which you can kind of view
as the independent random variable. That right there is the
slope of our regression line. Anyway, I thought
that was interesting. And I wanted to make
connections between things you see in different
parts of statistics, and show you that they
really are connected.
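
The result of the video, slope = Cov(X, Y) / Var(X), can be checked numerically. The sketch below uses made-up data points and hypothetical helper names, not anything from the video. It estimates the covariance and variance with the mean-based formulas from the transcript, then compares the resulting slope with the usual sum-of-deviations least squares formula.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Estimate Cov(X, Y) as mean(xy) - mean(x) * mean(y)."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def least_squares_slope(xs, ys):
    """Slope from the sum-of-deviations form of the least squares formula."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator

# Made-up sample points, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Cov(X, X) is the variance of X: mean(x^2) - mean(x)^2.
variance_x = covariance(xs, xs)

# Slope as covariance over variance, then the intercept from the means.
slope = covariance(xs, ys) / variance_x
intercept = mean(ys) - slope * mean(xs)

print("slope from cov/var:      ", round(slope, 6))
print("slope from least squares:", round(least_squares_slope(xs, ys), 6))
print("intercept:               ", round(intercept, 6))
```

Both routes give the same slope because dividing the numerator and the denominator of the least squares formula by n turns them into the covariance and variance estimates.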