# Covariance and the regression line

Covariance, Variance and the Slope of the Regression Line. Created by Sal Khan.

## Want to join the conversation?

• This video does not explain what covariance is, and I could not find other videos on this site that cover it. Are there introductory videos on covariance that I missed? Thanks.
• (Posting so people with the same question can see.) It was explained at the beginning. In a nutshell, the covariance of two random variables measures how the two variables change in relation to each other over the data set. To explain further:
A NEGATIVE covariance means that Y tends to decrease as X increases, and vice versa, while a POSITIVE covariance means that X and Y tend to increase or decrease together. If you picture the data centered on the point of means, so that point plays the role of (0,0), NEGATIVE covariance puts most of the points in quadrants 2 and 4 of the graph, and POSITIVE covariance puts most of them in quadrants 1 and 3.
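
To make the sign rule concrete, here is a minimal sketch that computes the population covariance as the mean of the products of deviations from the means. The data sets `y_up` and `y_down` and the helper names are made up for illustration; they are not from the video.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance: mean of (x - mean_x) * (y - mean_y)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Made-up illustrative data (an assumption, not taken from the video).
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]    # tends to rise as x rises
y_down = [6, 4, 5, 4, 2]  # tends to fall as x rises

cov_up = covariance(x, y_up)      # positive
cov_down = covariance(x, y_down)  # negative

print(cov_up, cov_down)
```

Each product `(x - mx) * (y - my)` is positive for a point in quadrant 1 or 3 relative to the point of means and negative for a point in quadrant 2 or 4, which is where the quadrant picture comes from.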

• Wouldn't this just be 0? Isn't the mean of the product of two variables the same thing as the product of the means of those two variables? Or am I wrong about this?
• The mean of the product is not the same as the product of the means. For example, if x = [1,2,3] and y = [4,5,6], then the mean of the products xy would be (1 * 4 + 2 * 5 + 3 * 6)/3 = (4 + 10 + 18)/3 = 32/3 = 10.666... The product of the means, on the other hand, would be ((1+2+3)/3) * ((4+5+6)/3) = 2 * 5 = 10. So they are not equal. Hope this helps!
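
The arithmetic in this answer can be checked with a short sketch. It uses exact fractions so the difference is not blurred by rounding. The last two lines go one step beyond the answer and assume the relationship this lesson is about, slope = Cov(X, Y) / Var(X); the variable names are illustrative.

```python
from fractions import Fraction

x = [1, 2, 3]
y = [4, 5, 6]
n = len(x)

# Mean of the products: (1*4 + 2*5 + 3*6) / 3 = 32/3
mean_of_product = Fraction(sum(a * b for a, b in zip(x, y)), n)

# Product of the means: 2 * 5 = 10
mean_x = Fraction(sum(x), n)
mean_y = Fraction(sum(y), n)
product_of_means = mean_x * mean_y

# The gap between the two is the (population) covariance of x and y.
covariance = mean_of_product - product_of_means

# Same construction for the variance of x: mean of x^2 minus (mean of x)^2.
variance_x = Fraction(sum(a * a for a in x), n) - mean_x ** 2
slope = covariance / variance_x

print(mean_of_product, product_of_means, covariance, slope)
```

The difference is 2/3 rather than 0, and dividing it by the variance of x gives a slope of 1, which matches the fact that these particular points lie on y = x + 3.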
• Why can the expected value of XY be approximated by the sample mean of the products XY? Is this a standard rule? (This was discussed around the 11-minute mark of the video.)
• Since X and Y are both random variables, the product of X and Y can be viewed as another random variable. With a "large enough" sample size, the law of large numbers says that the sample mean of the paired XY products converges to the expected value of XY, so we can use that sample mean as an estimate of E[XY].
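
A small simulation can illustrate this convergence. The model below is a made-up assumption chosen only because its true E[XY] is easy to work out: X is uniform on (0, 1) and Y is X plus independent zero-mean noise, so E[XY] = E[X^2] = 1/3.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def sample_mean_of_products(n):
    """Average of x*y over n simulated (x, y) pairs."""
    total = 0.0
    for _ in range(n):
        x = random.random()             # X ~ Uniform(0, 1)
        y = x + random.gauss(0.0, 0.1)  # Y = X + independent zero-mean noise
        total += x * y
    return total / n


true_value = 1 / 3  # E[XY] = E[X^2] + E[X] * E[noise] = 1/3 + 0

estimates = {n: sample_mean_of_products(n) for n in (100, 10_000, 200_000)}
for n, estimate in estimates.items():
    print(n, estimate, abs(estimate - true_value))
```

The estimates for larger n should tend to sit closer to 1/3, though any single run can wobble, especially at small n.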
• At one point in the video, Sal says that E[E[X]] is just E[X]. I just can't intuitively see why this is true. I don't understand.
• For any constant c, E[c] = c. Then E[X] = µ, where µ is just a constant, so E[µ] = µ.

Hence, E[ E[X] ] = E[ µ ] = µ = E[X].
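
This is the step that lets the covariance be rewritten in terms of plain expected values. As a sketch, assuming the usual definition Cov(X, Y) = E[(X - E[X])(Y - E[Y])] and treating E[X] and E[Y] as constants that can be pulled out of an expectation:

```latex
\begin{aligned}
\operatorname{Cov}(X, Y)
  &= E\big[(X - E[X])(Y - E[Y])\big] \\
  &= E\big[XY - X\,E[Y] - E[X]\,Y + E[X]\,E[Y]\big] \\
  &= E[XY] - E[Y]\,E[X] - E[X]\,E[Y] + E[X]\,E[Y] \\
  &= E[XY] - E[X]\,E[Y]
\end{aligned}
```

The third line uses linearity of expectation together with E[c] = c for the constant c = E[X] E[Y].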
• Can you make the connection between Pearson's correlation coefficient (r) and the coefficient of determination (R²)? I'd like to know the difference. Thanks.
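• For a simple linear regression (one predictor), the coefficient of determination R² is the Pearson correlation coefficient r squared. The difference is in what they tell you: r carries the direction and strength of the linear relationship, while R² is the fraction of the variance in Y that the fitted line explains.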
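
The relationship can be checked numerically. This is a minimal sketch on made-up data: it computes r from its definition, fits the least-squares line, computes R² as 1 minus the ratio of residual to total sum of squares, and compares the two. All names and data are illustrative assumptions.

```python
import math

# Made-up illustrative data (not from the video).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of squares and cross products of deviations from the means.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

# Pearson correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares line: slope = Cov(x, y) / Var(x); the line passes through the means.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
r_squared = 1 - ss_res / s_yy

print(r, r ** 2, r_squared)
```

The equality holds for a least-squares line with a single predictor and an intercept; with several predictors, R² is no longer the square of any one pairwise correlation.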
• Sal Khan, you made me fall in love with Statistics (the logically and step by step explained concepts, accompanying examples etc). God bless you much. How can I donate to this cause please?
• This equation only works for the covariance of a population, not a sample. How would you modify this equation to work for a sample?
• Anytime the notation "X-bar" is used (X with a line above it), this means we are dealing with a sample. So X-bar is a sample statistic that approximates the population parameter, i.e. E[X].
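
On the question of what changes for a sample: the usual convention (stated here as an assumption, since the thread does not spell it out) is to divide the sum of products of deviations by n - 1 instead of n. A minimal sketch with a hypothetical `covariance` helper:

```python
def covariance(xs, ys, sample=False):
    """Covariance of paired data.

    sample=False divides by n (population formula);
    sample=True divides by n - 1 (the usual sample estimate).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


x = [1, 2, 3]
y = [4, 5, 6]

population_cov = covariance(x, y)            # 2 / 3
sample_cov = covariance(x, y, sample=True)   # 2 / 2

# The regression slope is unaffected, as long as the variance
# uses the same divisor as the covariance.
slope_population = covariance(x, y) / covariance(x, x)
slope_sample = covariance(x, y, sample=True) / covariance(x, x, sample=True)

print(population_cov, sample_cov, slope_population, slope_sample)
```

Because the divisor cancels in Cov(X, Y) / Var(X), the slope of the regression line comes out the same either way.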