Small sample size confidence intervals

Constructing small sample size confidence intervals using t-distributions. Created by Sal Khan.

Want to join the conversation?

  • kraever
    What have I missed? You have a 95% chance of being between 1.4 and 3.3, but two of the values used to calculate that are outside that interval (0.9 and 3.9). About 29% of the data set is outside the 95% confidence interval.
    (7 votes)
  • Rebecca Masterson
    Isn't it incorrect to say "There is a 95% CHANCE that the true value of mu is within..."? It's not a 95% chance: mu is either in the range we calculate, or it's not. Wouldn't it be more accurate to say "We can say with 95% confidence that..."?
    (7 votes)
    • dfbarbour
      From the Wikipedia article on 'confidence interval':
      "A 95% confidence interval does not mean that for a given realised interval calculated from sample data there is a 95% probability the population parameter lies within the interval, nor that there is a 95% probability that the interval covers the population parameter. Once an experiment is done and an interval calculated, this interval either covers the parameter value or it does not, it is no longer a matter of probability. The 95% probability relates to the reliability of the estimation procedure, not to a specific calculated interval."
      In the video, Sal says "There is a 95% chance that our random sampling mean is within 0.96 of the population mean." What he should say is that if this procedure were repeated many times, about 95% of the intervals it produces would contain the population mean.
      (10 votes)
  • dave
    I cannot grasp how there is a '95% chance mu is within +/- 0.96 of 2.34'. For example, if mu is 1.05, then 2.34 is not within 0.96 of mu. I can understand that there is a '95% chance 2.34 is within 0.96 of mu', but the logic behind reversing this statement is not clear to me.
    (4 votes)
    • InnocentRealist
      If there were a 95% probability that a given interval around mu (we don't know mu, but it has some particular value) contains our sample mean (which we know), then wouldn't there also be the same probability that the same interval around our sample mean contains mu?
      (Does mu change because we change our statement? They still have the same relationship to each other and our probability is still the same.)
      (1 vote)
  • Mohamed Zidan
    Why is the population mean equal to the sample mean?
    (3 votes)
  • jonathan
    So I have a problem I can't quite figure out. The problem is as follows:
    You randomly choose 16 unfurnished one-bedroom apartments from a large number of advertisements in your local newspaper. You calculate that their mean monthly rent is $613 and their standard deviation is $96. What is the standard error of the mean? What are the degrees of freedom for a one-sample t statistic?
    The standard error is the standard deviation divided by the square root of n, or in this case 16, right? That comes out to 24 (96/sqrt 16).
    My book says that to get the one-sample t statistic, you take x-bar minus mu divided by the standard error, but I don't know mu. How do I get mu from just one sample mean?
    (2 votes)
  • mymath.tutor.james
    Where does the t-table come from? I understand the z-table comes from the definite integral of the normal distribution function, but how is the t-distribution defined, and why is it that small sample sizes tend to follow a t-distribution model rather than some other model?
    (2 votes)
  • futjim
    I have a question: why did Sal multiply 0.39, the standard deviation of the sampling distribution, by 2.447 to get the distance from mu to the critical value?
    (2 votes)
    • splttingatms
      He multiplied because 2.447 is the t-critical value that corresponds to a 95% two-sided confidence interval using a t-distribution. The 2.447 is a standardized value that explains what t-values will contain 95% of the t-distribution. The t-critical value must then be converted back to units of the original question. Multiplying by the standard deviation of the sampling distribution will then result in the distance from the sampling mean (mu).

      When finding one-sample t confidence intervals, the general equation x_bar +/- (t critical value)*s/sqrt(n) is used. The multiplication is the (t critical value)*s/sqrt(n).
      (1 vote)
  • Mez Cooper
    Why do you know the population mean is the sampling distribution mean? What's the difference between that mean and the mean of 2.34 in the video?

    Sample size of 30 needed to be normal? What if you're at 20 - 29?
    (2 votes)
    • Saivishnu Tulugu
      According to the Central Limit Theorem, the sampling distribution of the sample mean approaches a normal distribution as the sample size grows. A sample size of 30 is a common rule of thumb for when that normal approximation is considered adequate (unless the data already appear normal by another method, such as a normal probability plot). The sample mean, or x-bar, is the mean of the sample that the researcher went out and collected. The actual mean is the true population mean. For example, suppose I wanted to find the true mean height of people. The actual mean would be the mean over the whole population, for instance from records provided by the American Medical Association. The x-bar, or sample mean, would come from the sample I collected data from (e.g., 42 of my neighbors).

      Hope this helps!
      (1 vote)
  • jst2702
    I've had trouble distinguishing when to use a t-table vs a z-table.

    Do we use Z-tables for proportions or sample means when we assume a normal distribution? n>=30 or approximately normally symmetric

    Do we use T-tables for any other instance? Because at first, I thought Z-tables were for proportions and T-tables were for means.
    (1 vote)
  • Lee
    Why is the mean 2.34? I get 2.417: the numbers sum to 14.5 and that is divided by 6.
    This then results in sample variance and SD of 1.258 and 1.121 respectively.
    (2 votes)

Video transcript

Seven patients' blood pressures have been measured after having been given a new drug for 3 months. They had blood pressure increases of, and they give us seven data points right here, in some blood pressure units. Construct a 95% confidence interval for the true expected blood pressure increase for all patients in a population.

So there's some population distribution here. It's a reasonable assumption to think that it is normal. So if you gave this drug to every person who has ever lived, that will result in some mean increase in blood pressure, or who knows, maybe it actually will decrease. And there's also going to be some standard deviation here. And the reason why it's reasonable to assume that it's a normal distribution is because it's a biological process. It's going to be the sum of many thousands and millions of random events. And things that are sums of thousands and millions of random events tend to be normally distributed. So this is a population distribution. And we don't know anything really about it outside of the sample that we have here.

Now, what we can do, and this tends to be a good thing to do when you do have a sample, is just figure out everything that you can figure out about that sample from the get-go. So we have our seven data points. And you could add them up and divide by 7 and get your sample mean. So our sample mean here is 2.34. And then you can also calculate your sample standard deviation. Find the squared distance from each of these points to your sample mean, add them up, divide by n minus 1, because it's a sample, then take the square root, and you get your sample standard deviation. I did this ahead of time just to save time. Sample standard deviation is 1.04. And when you don't know anything about the population distribution, the thing that we've been doing from the get-go is estimating its standard deviation with our sample standard deviation.
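The recipe for the sample mean and sample standard deviation described above can be sketched in a few lines of Python. The seven data points are not listed in the transcript, so the values below are hypothetical and chosen only to make the arithmetic easy to follow. They will not reproduce 2.34 or 1.04.

```python
import math
import statistics

# Hypothetical values for illustration only; NOT the data from the video.
increases = [1.0, 2.0, 3.0, 4.0, 5.0]

n = len(increases)
sample_mean = sum(increases) / n

# Sum of squared distances from the sample mean, divided by n - 1
# because this is a sample rather than the whole population.
squared_distances = sum((x - sample_mean) ** 2 for x in increases)
sample_variance = squared_distances / (n - 1)
sample_sd = math.sqrt(sample_variance)

# The standard library's statistics.stdev also divides by n - 1.
assert math.isclose(sample_sd, statistics.stdev(increases))

print(f"sample mean = {sample_mean:.2f}")
print(f"sample standard deviation = {sample_sd:.2f}")
```

Dividing by n instead of n - 1 (what `statistics.pstdev` does) would give a smaller number, which is appropriate only when the data are the entire population.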
So we've been estimating the true standard deviation of the population with our sample standard deviation. Now in this exact problem, we're going to run into a problem. We're estimating our standard deviation with an n of only 7. So this is probably going to be a not so good estimate because n is small. In general, this is considered a bad estimate if n is less than 30. Above 30 you're dealing in the realm of pretty good estimates.

So the whole focus of this video is when we think about the sampling distribution, which is what we're going to use to generate our interval. Instead of assuming that the sampling distribution is normal, like we did in many other videos using the central limit theorem and all of that, we're going to tweak the sampling distribution. We're not going to assume it's a normal distribution, because this is a bad estimate. We're going to assume that it's something called a t-distribution. And the best way to think about a t-distribution is that it's almost engineered so it gives a better estimate of your confidence intervals and all of that when you do have a small sample size. It looks very similar to a normal distribution. It has some mean, so this is still the mean of your sampling distribution. But it also has fatter tails.

And the way I think about why it has fatter tails is this. Normally what we do is we find the estimate of the true standard deviation, and then we say that the standard deviation of the sampling distribution is equal to the true standard deviation of our population divided by the square root of n. In this case, n is equal to 7. And then we say OK, we seldom know the true standard deviation, although sometimes you do. So if we don't know it, the best thing we can put in there is our sample standard deviation.
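To make "fatter tails" concrete, here is a minimal sketch that compares the density of the standard normal distribution with the density of a t-distribution. It assumes 6 degrees of freedom (n - 1 for the n = 7 sample; degrees of freedom come up later in the transcript). The function names are illustrative, and the formulas are the standard textbook densities rather than anything shown in the video.

```python
import math


def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom at x."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


df = 6  # assumed: n - 1 for a sample of n = 7
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"x = {x:.0f}: normal = {normal_pdf(x):.4f}, t(df={df}) = {t_pdf(x, df):.4f}")
```

Near the center the t density is slightly lower than the normal density, and far from the center it is higher, which is what the transcript means by fatter tails.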
And this right here, this is the whole reason why we don't say that this is just a 95% probability interval. This is the whole reason why we call it a confidence interval, because we're making some assumptions. This thing is going to change from sample to sample. And in particular, this is going to be a particularly bad estimate when we have a small sample size, a size less than 30. So when you don't know the standard deviation, you're estimating it with your sample standard deviation, your sample size is small, and you're going to use this to estimate the standard deviation of your sampling distribution, you don't assume your sampling distribution is a normal distribution. You assume it has fatter tails. And it has fatter tails because you're essentially underestimating the standard deviation over here.

Anyway, with all of that said, let's just actually go through this problem. So we need to think about a 95% confidence interval around this mean right over here. If this was a normal distribution, you would just look it up in a Z-table. But it's not, this is a t-distribution. We're looking for a 95% confidence interval, so some interval around the mean that encapsulates 95% of the area. For a t-distribution you use a t-table, and I have a t-table ahead of time right over here. And what you want to do is use the two-sided row for what we're doing right over here. And the best way to think about it is that we're symmetric around the mean. And that's why they call it two-sided. It would be one-sided if it was kind of a cumulative percentage up to some critical threshold. But in this case, it's two-sided, we're symmetric. Or another way to think about it is we're excluding the two sides. So we want the 95% in the middle. And this is a sampling distribution of the sample mean for n is equal to 7.
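The two-sided lookup described above amounts to finding the value c for which the middle 95% of the t-distribution's area lies between -c and c. As a rough sketch of where such a table entry could come from, the code below integrates the t density numerically (Simpson's rule) and searches for c by bisection. It assumes 6 degrees of freedom (n - 1 for n = 7). The function names, step count, and search bounds are illustrative choices, not how any particular printed table was produced.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom at x."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def central_area(c: float, df: int, steps: int = 2000) -> float:
    """Area under the t density between -c and c (composite Simpson's rule)."""
    h = c / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    half_area = total * h / 3  # area between 0 and c
    return 2 * half_area  # the density is symmetric around 0


def two_sided_critical_value(confidence: float, df: int) -> float:
    """Bisection search for c with central_area(c, df) equal to confidence."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if central_area(mid, df) < confidence:
            low = mid
        else:
            high = mid
    return (low + high) / 2


print(round(two_sided_critical_value(0.95, 6), 3))  # 2.447
```

In practice a statistics library or a printed t-table would be used instead. The point here is only that the table entry is the solution of an area equation.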
And I won't go into the details here, but when n is equal to 7 you have 6 degrees of freedom, or n minus 1. And the way that t-tables are set up, you go and find the degrees of freedom. So you don't go to the n, you go to the n minus 1. So you go to the 6 right here. So if you want to encapsulate 95% of this right over here, and you have 6 degrees of freedom, you have to go 2.447 standard deviations in each direction. And this t-table assumes that you are approximating that standard deviation using your sample standard deviation. So another way to think of it is that you have to go 2.447 of these approximated standard deviations. Let me write it right here. So this distance right here is 2.447 times this approximated standard deviation.

And sometimes you'll see this in some statistics books. This thing right here, this exact number, is shown like this. They put a little hat on top of the standard deviation to show that it has been approximated using the sample standard deviation. So we'll put a little hat over here, because frankly, this is the only thing that we can calculate. So this is how far you have to go in each direction. And we know what this value is. We know what the sample standard deviation is. So let's get our calculator out. So we know our sample standard deviation is 1.04. And we want to divide that by the square root of 7. So we get 0.39. So this right here is 0.39.

And so if we want to find the distance around this population mean that encapsulates 95% of the sampling distribution, we have to multiply 0.39 times 2.447, so let's do that. So times 2.447 is equal to 0.96. So this distance right here is 0.96, and then this distance right here is 0.96. So if you take a random sample, and that's exactly what we did when we found these 7 data points. When we took these 7 data points and took their mean, that mean can be viewed as a random sample from the sampling distribution.
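The calculator steps above can be written out as a short sketch. It uses the numbers quoted in the transcript (sample standard deviation 1.04, n = 7, table value 2.447). The function names are illustrative.

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Estimated standard deviation of the sampling distribution: s / sqrt(n)."""
    return sample_sd / math.sqrt(n)


def margin_of_error(sample_sd: float, n: int, t_critical: float) -> float:
    """Distance to go in each direction: t critical value times s / sqrt(n)."""
    return t_critical * standard_error(sample_sd, n)


# Values quoted in the transcript.
n = 7
sample_sd = 1.04
t_critical = 2.447  # two-sided 95% entry for n - 1 = 6 degrees of freedom

print(f"standard error  = {standard_error(sample_sd, n):.2f}")   # 0.39
print(f"margin of error = {margin_of_error(sample_sd, n, t_critical):.2f}")  # 0.96
```

Note that the margin is computed from the unrounded standard error. Multiplying the rounded 0.39 by 2.447 by hand gives a slightly smaller figure.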
And so we could say that there's a 95% chance, and we have to actually caveat everything with the word "confident", because we're doing all of these estimations here. So it's not a true precise 95% chance. We're just confident that there's a 95% chance that our random sample mean right here, that 2.34, which we just picked from this distribution right here, is within 0.96 of the true sampling distribution mean, which we know is also the same thing as the population mean. Or we can just rearrange the sentence and say that there is a 95% chance that the true mean, which is the same thing as the sampling distribution mean, is within 0.96 of our sample mean of 2.34.

So if you go 2.34 minus 0.96, that's the low end of our confidence interval, 1.38. And the high end of our confidence interval, 2.34 plus 0.96, is equal to 3.3. So our 95% confidence interval is from 1.38 to 3.3.
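One way to read the 95% is as a property of the procedure rather than of any single interval: if many samples of size 7 were drawn and an interval built from each, roughly 95% of those intervals should contain the true mean. The simulation below is a minimal sketch of that idea. The population mean and standard deviation are hypothetical (nothing in the video tells us the true values), the population is assumed to be normal, and the function names are illustrative. It also runs the same procedure with the normal-table value 1.96 for comparison.

```python
import math
import random


def t_interval(sample, t_critical):
    """Return (low, high) for the interval mean +/- t_critical * s / sqrt(n)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    margin = t_critical * sd / math.sqrt(n)
    return mean - margin, mean + margin


def coverage(true_mean, true_sd, n, t_critical, trials, rng):
    """Fraction of simulated samples whose interval contains true_mean."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = t_interval(sample, t_critical)
        if low <= true_mean <= high:
            hits += 1
    return hits / trials


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population parameters, chosen only for the simulation.
with_t = coverage(2.0, 1.0, n=7, t_critical=2.447, trials=20000, rng=rng)
with_z = coverage(2.0, 1.0, n=7, t_critical=1.96, trials=20000, rng=rng)

print(f"coverage using 2.447 (t, 6 degrees of freedom): {with_t:.3f}")
print(f"coverage using 1.96 (normal table):             {with_z:.3f}")
```

Under these assumptions the first figure should come out close to 0.95 and the second noticeably lower, around 0.90, which illustrates why the wider t-based interval is used when the standard deviation is estimated from a small sample.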