# Small sample size confidence intervals

## Video transcript

Seven patients' blood pressures have been measured after having been given a new drug for three months. They had blood pressure increases of, and they give us seven data points right here, in some blood pressure units. Construct a 95% confidence interval for the true expected blood pressure increase for all patients in a population.

So there's some population distribution here. It's a reasonable assumption to think that it is normal, because it's a biological process: it's going to be the sum of many thousands and millions of random events, and things that are sums of many thousands and millions of random events tend to be normally distributed. If you gave this drug to every person who has ever lived, that would result in some mean increase in blood pressure, or, who knows, maybe it's actually a decrease. And there's also going to be some standard deviation. So this is the population distribution, and we don't really know anything about it outside of the sample that we have here.

Now, what tends to be a good thing to do when you have a sample is to figure out everything you can about that sample from the get-go. We have our seven data points, so you can add them up and divide by 7 to get your sample mean. Our sample mean here is 2.34. Then you can also calculate your sample standard deviation: find the squared distance from each of these points to your sample mean, add them up, divide by n minus 1 because it's a sample, then take the square root. I did this ahead of time just to save time: the sample standard deviation is 1.04.

When you don't know anything about the population distribution, the thing we've been doing from the get-go is estimating the true standard deviation of the population with our sample standard deviation. Now, in this exact problem, we're going to run into a problem: we're estimating our standard deviation with an n of only 7. So this is probably going to be a not-so-good estimate, because n is small. In general, this is considered a bad estimate if n is less than 30. Above 30, you're dealing in the realm of pretty good estimates.

So the whole focus of this video is the sampling distribution, which is what we're going to use to generate our interval. Instead of assuming that the sampling distribution is normal, like we did in many other videos using the central limit theorem and all of that, we're going to tweak the sampling distribution. We're not going to assume it's a normal distribution, because this is a bad estimate. We're going to assume that it is something called a t-distribution. The best way to think about the t-distribution is that it's almost engineered so that it gives a better estimate of your confidence intervals when you do have a small sample size. It looks very similar to a normal distribution. It has some mean, so this is still the mean of your sampling distribution, but it also has fatter tails.

To see why it has fatter tails, let me take one more step. Normally, what we do is find the estimate of the true standard deviation, and then we say that the standard deviation of the sampling distribution is equal to the true standard deviation of our population divided by the square root of n. In this case, n is equal to 7. We seldom know the true standard deviation (sometimes you do), so if we don't know it, the best thing we can put in there is our sample standard deviation. This right here is the whole reason why we don't say that this is just a 95% probability interval. We call it a confidence interval because we're making some assumptions here. This thing is going to change from sample to sample, and in particular, it's going to be a particularly bad estimate when we have a small sample size, a size less than 30.

So when you don't know the standard deviation, you're estimating it with your sample standard deviation, your sample size is small, and you're going to use this to estimate the standard deviation of your sampling distribution, you don't assume your sampling distribution is a normal distribution. You assume it has fatter tails, and it has fatter tails because you're essentially underestimating the standard deviation over here.

Anyway, with all of that said, let's actually go through this problem. We need to think about a 95% confidence interval around this mean right over here. If this was a normal distribution, you would just look it up in a z-table. But it's not; this is a t-distribution. We're looking for a 95% confidence interval, so some interval around the mean that encapsulates 95% of the area. For a t-distribution, you use a t-table, and I have a t-table ready ahead of time right over here.

What you want to do is use the two-sided row for what we're doing here. The best way to think about it is that we're symmetric around the mean, and that's why they call it two-sided. It would be one-sided if it was a cumulative percentage up to some critical threshold. But in this case it's two-sided: we're symmetric. Another way to think about it is that we're excluding the two sides, so we want the 95% in the middle.

This is the sampling distribution of the sample mean for n equal to 7. I won't go into the details here, but when n is equal to 7, you have 6 degrees of freedom, or n minus 1. The way that t-tables are set up, you go and find the degrees of freedom. So you don't go to the n, you go to the n minus 1, which is the 6 right here. If you want to encapsulate 95% of this right over here and you have 6 degrees of freedom, you have to go 2.447 standard deviations in each direction. This t-table assumes that you are approximating that standard deviation using your sample standard deviation. So another way to think of it is that you have to go 2.447 of these approximated standard deviations.

So this distance right here is 2.447 times this approximated standard deviation. Sometimes, in some statistics books, you'll see this exact number shown with a little hat on top of the standard deviation, to show that it has been approximated using the sample standard deviation. So we'll put a little hat over here, because frankly this is the only thing that we can calculate. This is how far you have to go in each direction.

Now, we know what this value is, because we know what the sample standard deviation is. So let's get our calculator out. Our sample standard deviation is 1.04, and we want to divide that by the square root of 7. We get 0.39. So this right here is 0.39. If we want to find the distance around this population mean that encapsulates 95% of the sampling distribution, we have to multiply 0.39 times 2.447. Let's do that: times 2.447 is equal to 0.96. So this distance right here is 0.96, and this distance right here is also 0.96.

When we took these seven samples and took their mean, that mean can be viewed as a random sample from the sampling distribution. So we could say that there's a 95% chance, and we actually caveat everything with "confident," because we're doing all of these estimations here, so it's not a true, precise 95% chance. We're just confident that there's a 95% chance that our sample mean right here, that 2.34, which we can view as having been picked from this distribution, is within 0.96 of the true sampling distribution mean, which we know is the same thing as the population mean. Or we can rearrange the sentence and say that there is a 95% chance that the true mean, which is the same thing as the sampling distribution mean, is within 0.96 of our sample mean of 2.34.

So at the low end, 2.34 minus 0.96 is 1.38. That's the low end of our confidence interval. And the high end of our confidence interval, 2.34 plus 0.96, is equal to 3.3. So our 95% confidence interval is from 1.38 to 3.3.
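
The arithmetic in the transcript can be checked with a short Python sketch. It uses only the summary numbers quoted in the video (sample mean 2.34, sample standard deviation 1.04, n = 7, and the two-sided 95% t-table value 2.447 for 6 degrees of freedom). The seven raw data points are not in the transcript, so they are not reproduced here. The function name `t_confidence_interval` and its parameter names are illustrative choices, not something from the video.

```python
import math


def t_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Return (low, high, margin) for a t-based confidence interval.

    t_critical is the two-sided critical value read from a t-table
    for n - 1 degrees of freedom.
    """
    # Estimated standard deviation of the sampling distribution
    # (sample standard deviation divided by the square root of n).
    standard_error = sample_sd / math.sqrt(n)
    # Distance to go in each direction from the sample mean.
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin, margin


# Summary values quoted in the transcript; 2.447 is the t-table entry
# for a two-sided 95% interval with 6 degrees of freedom.
low, high, margin = t_confidence_interval(2.34, 1.04, 7, 2.447)
print(f"margin = {margin:.2f}")             # margin = 0.96
print(f"95% CI = ({low:.2f}, {high:.2f})")  # 95% CI = (1.38, 3.30)
```

The critical value is hard-coded so the sketch needs nothing beyond the standard library. If SciPy happens to be available, `scipy.stats.t.ppf(0.975, df=6)` should give approximately the same 2.447 instead of a table lookup.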