# Difference of sample means distribution

## Video transcript

I want to build on what we did in the last video a little bit. Let's say we have two random variables. I have random variable X, and let me draw its probability distribution. It doesn't have to be normal, but I'll just draw it as a normal distribution. So this is the distribution of random variable X. This is the mean, the population mean of random variable X, and it has some standard deviation, or actually, let me just focus on the variance. So it has some variance right here for random variable X.

Now let's say we have another random variable, random variable Y. Let's do the same thing for it: let's draw its distribution and the parameters for that distribution. It has some true mean, some population mean for the random variable Y, and it has some variance right over here. I've drawn it roughly normal, but once again, we don't have to assume that it's normal, because when we go to the next level we're going to assume that we're taking large enough samples that the central limit theorem will actually apply.

With that said, let's think about the sampling distributions of each of these random variables. Let's think about the sampling distribution of the sample mean of X when the sample size is equal to n. What is that going to look like? Well, it's going to be some distribution, and we're assuming that n is a fairly large number, so this is going to be a normal distribution, or can be approximated with a normal distribution. Notice I'm drawing it a little bit narrower than the population distribution. Let me draw the mean. The mean of the sampling distribution, which we denote with this X bar to tell us it's the distribution of the sample means when the sample size is n, is going to be the same thing as the population mean of that random variable. And we know from the central limit theorem that the variance of the sampling distribution is going to be equal to the population variance divided by this n right over here. If you wanted the standard deviation of this, which is often called the standard error of the mean, you just take the square root of both sides.

Now let's do the same thing for random variable Y. Let's take the sampling distribution of the sample mean, but here we're talking about random variable Y. Let's say it has a different sample size. It doesn't have to be a different one, but this shows you that it doesn't have to be the same. So let's say it has a sample size of m. Let me draw its distribution right over here. Once again, it'll be a narrower distribution than the population distribution, and it will be approximately normal, assuming that we have a large enough sample size. Its mean, the mean of the sampling distribution of the sample mean, is going to be the same thing as the population mean. We've seen that multiple times. Now for its variance, a quick note on terminology: this is not the standard error of the mean. The standard error of the mean is the square root of this, the standard deviation. This is the variance of the mean; I don't want to confuse you. So the variance of the mean here is going to be the exact same thing: the variance of the population divided by our sample size, m.

Everything we've done so far is complete review. It's a little different because I'm doing it with two different random variables, and I'm doing that for a reason: now I'm going to define a new random variable. We'll just call it Z. Z is equal to the difference of our sample means: the X sample mean minus the Y sample mean.

So what does that really mean? To get a sample mean, at least for this distribution, you're taking n samples from this population over here. Maybe n is 10: you're taking 10 samples and finding their mean. That sample mean is a random variable. Let's say you take 10 samples from here and you get 9.2 when you find their mean. That 9.2 can be viewed as one sample from this distribution right over here, the sampling distribution. Same thing here: if m is 12, you're taking 12 samples and taking their mean, and that sample mean, maybe it's 15.2, could be viewed as a sample from this sampling distribution.

So Z is a random variable where you're taking n samples from this population distribution up here and taking their mean, then taking m samples from this other population distribution up here and taking their mean, and then finding the difference between that mean and that mean. So it's another random variable. But what is the distribution of Z going to be? Let's draw it.

Well, there are a couple of things we immediately know about Z, and we came up with them in the last video. Take the mean of Z. Instead of writing Z, I'm just going to write the mean of X bar minus Y bar, the sample mean of X minus the sample mean of Y. We saw this in the last video, and in fact I think I still have the work up here: the mean of the difference is the same thing as the difference of the means. So the mean of this new distribution is going to be the mean of our sample mean of X minus the mean of our sample mean of Y.

This might seem a little abstract in this video. In the next video we're actually going to do this with concrete numbers, and hopefully it'll make a little bit more sense. And just so you know where we're going with this, the whole point is so that we can eventually do some inferential statistics about differences of means: how likely is it that a difference between the means of two samples is due to random chance, or what is a confidence interval for a difference of means? That's what this is all building up to.

So anyway, we know the mean of this distribution. What's the variance of this distribution? We came up with that result in the last video: if we're taking the difference of two independent random variables, the variance is going to be the sum of the variances of those two random variables. The whole point of that video was to show you that it's not the difference of the variances, it's the sum of the variances. So the variance of this new distribution, and I haven't drawn the distribution yet, which I'll just write as the variance of X bar minus Y bar, is going to be equal to the sum of the variances of each of these distributions: the variance of X bar plus the variance of Y bar.

Let me draw this here just so we can visualize it. I'm going to draw another normal distribution. So this is its mean: the mean of X bar minus Y bar is going to be equal to the difference of these means over here, so I don't have to rewrite it. Then let me draw the curve, and notice I'm drawing a fatter curve than either one. Why am I doing that? Because the variance here is the sum of the variances here, so it's going to have a bigger variance, or a bigger standard deviation, than either of these. So we have some variance here, the variance of X bar minus Y bar.

Now what are these in terms of the original population distributions? Well, we came up with those results right over here. We know that this thing, the variance of X bar, is the same thing as the variance of the population distribution divided by n. We've done this multiple times. The X just means this is for random variable X, but there's no bar on top: this is the actual population distribution, not the sampling distribution of the sample mean. So that's the variance of X divided by n. And if we want the variance of the sampling distribution for Y, let me do that in a different color. I'll use blue because that's what we were using for the Y random variable. By the same exact logic, that's going to be equal to the variance of the population distribution for Y divided by m.

So once again, this is the variance of the difference of the sample means. And now if you wanted the standard deviation of the difference of the sample means, you just take the square root of both sides of this. The standard deviation of the difference of the sample means is equal to the square root of the whole sum: the variance of the population distribution of X divided by n, plus the variance of the population distribution of Y divided by m.

This is neat because it kind of looks a little bit like a distance formula. I'll just throw that out there for when we get more sophisticated with our statistics and try to visualize what all of this means in more advanced topics. But the whole point of this is that now we can make inferences about a difference of means. If we have two samples, we take the means of both of those samples, and we find some difference, we can draw some conclusions about how likely that difference was just by chance. And we're going to do that in the next video.
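
The results described in the transcript can be written compactly as follows. The symbols are a notational assumption, since the transcript is spoken rather than written: $\mu_X, \sigma_X^2$ and $\mu_Y, \sigma_Y^2$ stand for the population means and variances, $n$ and $m$ for the two sample sizes, and the two samples are assumed to be drawn independently of each other.

```latex
\begin{align*}
\mu_{\bar{X} - \bar{Y}} &= \mu_{\bar{X}} - \mu_{\bar{Y}} = \mu_X - \mu_Y \\
\sigma^2_{\bar{X} - \bar{Y}} &= \sigma^2_{\bar{X}} + \sigma^2_{\bar{Y}}
  = \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m} \\
\sigma_{\bar{X} - \bar{Y}} &= \sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}
\end{align*}
```

The last line is the square root of a sum of two squared terms, $(\sigma_X/\sqrt{n})^2 + (\sigma_Y/\sqrt{m})^2$, which is where the resemblance to a distance formula comes from.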
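
As a rough check on the formulas for the mean and variance of the difference of sample means, here is a minimal simulation sketch. Everything specific in it is an assumption chosen for illustration and does not come from the video: the two populations (a uniform distribution on [0, 10] for X and an exponential distribution with rate 0.5 for Y), the sample sizes n = 40 and m = 50, the number of trials, and the function name `simulate_difference_of_means`. Neither population is normal, which mirrors the point that the populations themselves don't have to be.

```python
import random
import statistics


def simulate_difference_of_means(n, m, trials, seed=0):
    """Return `trials` simulated values of (sample mean of X) - (sample mean of Y)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        # Hypothetical X population: uniform on [0, 10], mean 5, variance 100/12.
        xs = [rng.uniform(0, 10) for _ in range(n)]
        # Hypothetical Y population: exponential with rate 0.5, mean 2, variance 4.
        ys = [rng.expovariate(0.5) for _ in range(m)]
        diffs.append(statistics.fmean(xs) - statistics.fmean(ys))
    return diffs


n, m = 40, 50
diffs = simulate_difference_of_means(n, m, trials=10_000)

# Population parameters of the two assumed distributions.
mean_x, var_x = 5.0, 10**2 / 12
mean_y, var_y = 2.0, 1 / 0.5**2

# Predictions: difference of the means, and sum of the two sampling variances.
predicted_mean = mean_x - mean_y
predicted_var = var_x / n + var_y / m
predicted_sd = predicted_var ** 0.5

observed_mean = statistics.fmean(diffs)
observed_var = statistics.pvariance(diffs)

print(f"mean of differences: predicted {predicted_mean:.3f}, observed {observed_mean:.3f}")
print(f"variance of differences: predicted {predicted_var:.4f}, observed {observed_var:.4f}")
print(f"standard deviation of differences: predicted {predicted_sd:.4f}")
```

With these settings the observed mean and variance should land close to the predicted values, though the exact figures depend on the seed and the number of trials.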