# Confidence interval of difference of means

## Video transcript

We're trying to test whether a new low-fat diet actually helps obese people lose weight. 100 randomly assigned people are assigned to group 1 and put on the low-fat diet. Another 100 randomly assigned people are assigned to group 2 and put on a diet of approximately the same amount of food, but not as low in fat. So group 2 is the control, the group that is not on the low-fat diet, and group 1 is the low-fat group, to see if it actually works.

After 4 months, the mean weight loss was 9.31 pounds for group 1. Let me write this down and make it very clear: for the low-fat group, the mean weight loss was 9.31 pounds. So our sample mean for group 1 is 9.31 pounds, with a sample standard deviation of 4.67. Both of these are very easy to calculate from the actual data. And then for our control, the sample mean was 7.40 pounds for group 2, with a sample standard deviation of 4.04 pounds.

Now, if we just look at it superficially, it looks like the low-fat group lost more weight than the control group, just based on our samples. If we take the difference between the low-fat group and the control group, we get 9.31 minus 7.40. Let's get the calculator out: 9.31 minus 7.4 is 1.91. So the difference of our sample means is 1.91. Just based on what we see, maybe you lose an incremental 1.91 pounds every four months if you are on this diet. What we want to do in this video is to get a 95% confidence interval around this
number, to see whether, everywhere in that 95% confidence interval, we always lose weight, or whether there is a chance that we can actually go the other way with the low-fat diet. So this video is really just about the 95% confidence interval. In the next video we'll actually do a hypothesis test using this same data.

To do a 95% confidence interval, let's think about the distribution that we're dealing with. We want to think about the distribution of the difference of the sample means. It's going to have some true mean, which is the mean of the difference of the sample means. Actually, let me write that: it's not a y, it's an x1 and an x2, so it's the sample mean of x1 minus the sample mean of x2. And this distribution is going to have some standard deviation: the standard deviation of the distribution of the sample mean of x1 minus the sample mean of x2.

We want to make an inference about this. I guess the best way to think about it is that we want to get a 95% confidence interval based on our sample. We want to create an interval around this where we are confident that there's a 95% chance that the true mean of the differences lies within that interval. To do that, let's think of it the other way. How can we construct an interval around the true mean where we're 95% sure that any sample from this distribution, and this is one of those samples, falls inside it? That is, there's a 95% chance that we will select from this region right over here. So we care about a 95% region right over here. So how many
standard deviations do we have to go in each direction? To do that, we just have to look at a Z table. Just remember, if we have 95% in the middle right over here, we're going to have 2.5% over here and 2.5% over here. We have to have 5% split between these two symmetric tails.

So when we look at a Z table, we want the critical Z value that they give right over here. And we have to be careful here: we're not going to look up 95%, because the Z table gives us a cumulative probability up to that critical Z value. So the Z table is going to be interpreted like this: there's going to be some Z value right over here where the probability of getting a more extreme result, a Z score above that, is 2.5%, and the probability of getting one below that is 97.5%. If we can find whatever Z value this is, it's going to be the same Z value as that one. Instead of thinking about it in terms of a one-tailed scenario, we're going to think of it in a two-tailed scenario.

So let's look it up for 97.5% on our Z table. Let's see, we have 0.975 right here, or 97.5%, and this gives us a Z value of 1.96. So Z is equal to 1.96, or only 2.5% of the samples from this population are going to be more than 1.96 standard deviations above the mean.

So this critical Z value right here is 1.96 standard deviations; this is 1.96 times the standard deviation of the sample mean of x1 minus the sample mean of x2. And then this right here is going to be negative 1.96 times the same thing. Let me write that. It's symmetric: this distance is going to be the same
as that distance. So this is negative 1.96 times the standard deviation of this distribution.

So let's put it this way: there's a 95% chance that our sample from this distribution, and the sample here is the difference of these two sample means, 1.91, is within 1.96 times the standard deviation of that distribution of the true mean of the distribution. You could view that standard deviation as the standard error of this statistic, the sample mean of x1 minus the sample mean of x2. Or we could say it the other way around: there is a 95% chance that the true mean of the distribution is within 1.96 times the standard deviation of the distribution of 1.91. These are equivalent statements. If I say I'm within three feet of you, that's equivalent to saying you're within three feet of me. That's all that's saying.

But when we construct it this way, it becomes pretty clear how we actually construct the confidence interval. We just have to figure out what this distance right over here is. And to figure out what that distance is, we're going to have to figure out what the standard deviation of this distribution is. Well, the standard deviation of the differences of the sample means is going to be equal to, and we saw this in the last video, in fact I think I have it right at the bottom here, the square root of the sum of the variances of each of those
distributions. The variance of our distribution, let me write it right over here, and I'll kind of re-prove it, is going to be equal to the sum of the variances of each of these sampling distributions. And we know that the variance of each sampling distribution is equal to the variance of the population distribution divided by our sample size, and our sample size in this case is 100. And the variance of the sampling distribution for our control, I'll do this in a new color, is going to be equal to the variance of the population distribution for the control divided by its sample size.

Since we don't know what these population variances are, we can approximate them with our sample variances for each of these distributions, especially because our n is greater than 30 in both cases. So this is going to be our sample standard deviation 1 squared, which is the sample variance for that distribution, over 100, plus our sample standard deviation for the control squared, which is the sample variance, over 100. This will give us the variance for this distribution. And if we want the standard deviation, we just take the square root of both sides. So if we want the standard deviation of this distribution right here, this is the variance
right now, so we just need to take the square root. Let's calculate this. We actually know these values: s1, our sample standard deviation for group 1, is 4.67, and s2 is 4.04; we wrote them right here as well. We're going to square each of them. So we're going to take the square root of 4.67 squared divided by 100 plus 4.04 squared divided by 100, close the parentheses, and we get 0.617. So if we go back up over here, we calculated the standard deviation of this distribution to be 0.617.

So now we can actually calculate our interval. We want 1.96 times that, so we get 1.96 times 0.617, I'll just use the answer we just got, and we get 1.21. So this number right here is 1.21. The 95% confidence interval is going to be the difference of our means, 1.91, plus or minus this number, 1.21.

So what's our confidence interval? For the low end, and I'm running out of space, 1.91 minus 1.21 is just 0.7. So the low end is 0.7. And then the high end, 1.91 plus 1.21, what is that? That's 2.12. Let me just make sure; my brain sometimes doesn't work properly when I'm making these videos. It's 3.12. Good thing I checked
it. 3.12, of course. So the high end is 3.12.

And just to be clear, there's not a pure 95% chance that the true difference of the true means lies in this interval. We are just confident that there's a 95% chance. We always have to put that little "confidence" there because, remember, we didn't actually know the population standard deviations or the population variances. We estimated them with our samples, and because of that we don't know that it's an exact probability. We just have to say we're confident that it's a 95% probability, and that's why we really just say it's a confidence interval. It's not a pure probability, but it's a pretty neat result.

So now we have this 95% confidence interval. We're confident that there's a 95% chance that the true difference lies in it. And remember, let me make it very clear: the expected value of the sample means is actually the same thing as the expected value of the populations. So what this is giving us is actually a confidence interval for the true difference between the populations. If you were to give every possible person diet 1 and every possible person diet 2, this is giving us a confidence interval for the difference between the true population means.

When you look at this, it looks like diet 1 actually does do something, because even at the low end of the confidence interval you still have a greater weight loss than with diet 2. Hopefully that doesn't confuse you too much. In the next video we're actually going to do a hypothesis test with the same data.
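
The calculation walked through in this transcript can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the video: the function name `diff_of_means_ci` and its parameter names are hypothetical, and it assumes, as the video does, that both samples are large enough (n greater than 30) to use a Z value with the sample standard deviations standing in for the population ones.

```python
from math import sqrt
from statistics import NormalDist


def diff_of_means_ci(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Z-based confidence interval for the difference of two means.

    Hypothetical helper for illustration; assumes large, independent samples.
    """
    diff = mean1 - mean2
    # Variance of the difference = sum of the two sampling-distribution
    # variances, each estimated as (sample standard deviation)^2 / n.
    std_err = sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Two-tailed critical value: for 95% we look up the cumulative
    # probability 0.975, not 0.95.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * std_err
    return diff - margin, diff + margin, std_err, z_crit


# Numbers from the transcript: low-fat group (group 1) vs. control (group 2).
low, high, std_err, z_crit = diff_of_means_ci(9.31, 4.67, 100, 7.40, 4.04, 100)
print(f"standard error: {std_err:.3f}")
print(f"critical z:     {z_crit:.2f}")
print(f"95% CI:         ({low:.2f}, {high:.2f})")
```

The sketch uses the exact critical value from the standard normal distribution rather than the table value of 1.96, so intermediate digits can differ slightly from the hand calculation, but the interval agrees with the 0.7 to 3.12 found in the video when rounded to two decimal places.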