T-statistic confidence interval

Sal computes a confidence interval for the emission from an engine with a new design. Created by Sal Khan.

Want to join the conversation?

  • Travis John
    Hello Sal. I looked up a t-distribution table and found that the degrees of freedom went up to 120. Why would we need that many when we only use the t-distribution when n < 30?
    (21 votes)
  • Uroosa Rubab
    When do we use sigma, and when do we use s?
    (3 votes)
    • Matthew Daly
      sigma is the standard deviation of a population, and s is the standard deviation of a sample. My tip for remembering it is that the population is unknown and mysterious but the sample is very clear data, so that's why we use mysterious Greek letters like mu and sigma to describe population statistics but familiar Latin letters like x-bar and s to describe sample statistics.
      (37 votes)
  • Hallowdean
    I may have missed this somewhere and a site search didn't seem to find it: where might the t statistic videos be? Thanks.
    (5 votes)
    • Bastian Widanski
      I couldn't find any video specifically describing this way of doing a t statistic either, but I guess he means the videos about the t statistic in general, like "Introduction to t statistics".

      The formula is basically the same, just written another way. The formula we were given in the videos is:
      x_bar +- t* s/root(n) to get your confidence interval
      Using this, you can conclude that:
      x_bar - t* s/root(n) < mu < x_bar + t* s/root(n)

      Subtract x_bar from all three parts:
      => -t* s/root(n) < mu - x_bar < t* s/root(n)

      Divide all three parts by s/root(n):
      => -t* < (mu - x_bar)/(s/root(n)) < t*

      Multiply all three parts by -1, which flips the inequality signs:
      => t* > (x_bar - mu)/(s/root(n)) > -t*
      <=> -t* < (x_bar - mu)/(s/root(n)) < t*

      And here you have the formula he used in this video.
      (11 votes)
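The equivalence above can also be checked numerically. This is a small sketch that plugs in the summary numbers from the video (x_bar = 17.17, s = 2.98, n = 10, t* = 2.262); the function names and the grid of candidate means are illustrative choices, not anything from the video. It confirms that "mu lies in the confidence interval" and "the t-statistic lies between -t* and t*" give the same verdict for every candidate mean tried.

```python
import math

# Summary numbers quoted in the video: sample mean, sample SD, sample size,
# and the two-sided 95% critical value for 9 degrees of freedom.
X_BAR, S, N, T_STAR = 17.17, 2.98, 10, 2.262

def t_statistic(mu: float) -> float:
    """t = (x_bar - mu) / (s / sqrt(n)) for a candidate population mean mu."""
    return (X_BAR - mu) / (S / math.sqrt(N))

def in_interval(mu: float) -> bool:
    """Interval form: x_bar - t* s/sqrt(n) < mu < x_bar + t* s/sqrt(n)."""
    margin = T_STAR * S / math.sqrt(N)
    return X_BAR - margin < mu < X_BAR + margin

def t_in_range(mu: float) -> bool:
    """t-statistic form: -t* < (x_bar - mu)/(s/sqrt(n)) < t*."""
    return -T_STAR < t_statistic(mu) < T_STAR

# The two forms agree for candidate means from 10.00 to 25.00 in steps of 0.01.
candidates = [10 + 0.01 * k for k in range(1501)]
assert all(in_interval(mu) == t_in_range(mu) for mu in candidates)

print(in_interval(20), round(t_statistic(20), 2))  # False -3.0
```

With these numbers, a hypothesized mean of 20 falls outside the interval, and its t-statistic of about -3.0 is beyond -2.262, so both views reject it.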
  • Euler
    Sort of like Katoriak's question: why do we use the degrees of freedom at all? I'm not making an intuitive connection.

    Mattson's answer makes sense... but why do we replace 'n' with the degrees of freedom?
    (2 votes)
    • F B
      You use (n-1) degrees of freedom because all the values leading up to the last one can be anything, but the last one must fit in just right so that everything adds up to the value on the other side of the equals sign. Say I have one hundred toys and 10 buckets. I can freely choose how many toys go into only 9 of them (10 buckets - 1 bucket = 9 buckets). Whether those buckets have equal amounts of toys or not, the last bucket must bring the total to 100. So I can put ten toys in each bucket (10 toys times 10 buckets equals 100), or 99 toys in the first bucket and zero toys in the middle buckets, but then the last bucket must have 1, because 99 toys + 1 toy = 100 toys.
      (11 votes)
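The bucket analogy maps onto the sample variance like this: once the sample mean is fixed, the deviations from it must sum to zero, so only n - 1 of them are free. A minimal sketch with a hypothetical five-value sample (not the video's data):

```python
# Hypothetical sample of five values, chosen only for illustration.
data = [4, 8, 6, 5, 7]
n = len(data)
mean = sum(data) / n                     # 6.0
deviations = [x - mean for x in data]
print(deviations)                        # [-2.0, 2.0, 0.0, -1.0, 1.0]
print(sum(deviations))                   # 0.0

# Given the mean, only n - 1 deviations are free; the last one is forced.
forced_last = -sum(deviations[:-1])
print(forced_last == deviations[-1])     # True
```

This is the same constraint as the last bucket: the first four deviations can be anything, but the fifth has to make the total come out to zero.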
  • Stephen Marc
    Around the end of the video, Sal talks about how there's a 95% chance that our real population mean is between 15.04 and 19.3. I don't want to confuse anyone, but what I learnt in class is that a 95% confidence interval instead means that when sampling from the population, 95% of the time we're going to get a mean between those two values.

    It relates more to sampling a certain number of individuals from a population multiple times and getting different sample means, which could all be right.

    It's hard to explain, and it's a small distinction, but it could be important when writing a report.

    Or am I mistaken?
    (7 votes)
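A quick simulation is one way to see what the 95% refers to in the repeated-sampling reading: draw many samples, build an interval from each one, and count how often the interval contains the true mean. The population (normal, mean 17, SD 3) and the number of trials are assumptions made up for this sketch; only n = 10 and t* = 2.262 come from the video.

```python
import math
import random
import statistics

# Assumed (hypothetical) population; in real problems these are unknown.
TRUE_MU, TRUE_SIGMA = 17.0, 3.0
N = 10            # sample size, as in the video
T_STAR = 2.262    # two-sided 95% critical value for N - 1 = 9 degrees of freedom
TRIALS = 10_000

def covers(rng: random.Random) -> bool:
    """Draw one sample, build x_bar +- t* s/sqrt(n), and report whether it contains TRUE_MU."""
    sample = [rng.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(N)]
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample SD, divides by n - 1
    margin = T_STAR * s / math.sqrt(N)
    return x_bar - margin < TRUE_MU < x_bar + margin

rng = random.Random(0)
coverage = sum(covers(rng) for _ in range(TRIALS)) / TRIALS
print(coverage)   # close to 0.95; the exact value depends on the seed
```

Each run produces a different interval; it is the procedure, applied over and over, that captures the true mean about 95% of the time.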
  • megamanxpert
    So my teacher always told us we want to "reject the null hypothesis," and if we can't, we have to state that we "could not reject the null hypothesis." Why is that?
    (3 votes)
  • Anastasia Stepanova
    In the beginning of the video Sal refers to another video with the same problem. Where can I find that video? Thanks.
    (4 votes)
  • lieunguyen57
    Wait, how did you get s to be 2.98?
    (5 votes)
    • Sid
      He says it's from the previous video. In the previous video, he says he calculated it to be 2.98.

      If you do the math yourself, you'll get the same value. Just make sure you divide the sum of squared deviations by 9 (that is, n - 1), not 10, when you compute the variance, and then take the square root.
      (1 vote)
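Here is the divide-by-(n - 1) step on a small hypothetical sample (the video's 10 data points are not reproduced on this page, so these numbers are made up). Python's standard `statistics` module offers both conventions: `variance`/`stdev` divide by n - 1, while `pvariance`/`pstdev` divide by n.

```python
import statistics

# Hypothetical sample, for illustration only.
data = [4, 8, 6, 5, 7]
n = len(data)
mean = sum(data) / n
squared_dev = sum((x - mean) ** 2 for x in data)   # 4 + 4 + 0 + 1 + 1 = 10

sample_var = squared_dev / (n - 1)       # divide by n - 1: this is s^2
population_style_var = squared_dev / n   # divide by n

print(sample_var, population_style_var)  # 2.5 2.0

# Library equivalents: variance uses n - 1, pvariance uses n.
print(statistics.variance(data), statistics.pvariance(data))
```

The sample standard deviation s is then the square root of the n - 1 version.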
  • Andrea Menozzi
    In the video he explains what t* is equal to. He says we've seen this multiple times, but I don't remember this being explained before.
    Would it be this?
    t* = (x_bar - mu)/(s/sqrt(n))
    It seems very similar to the z-score, but instead of dividing by the standard deviation of the sampling distribution, sigma/sqrt(n), it uses s/sqrt(n).
    Is this explained somewhere else?
    (3 votes)
  • Ric Hard
    Why can't (or didn't) we assume that the mean of the sampling distribution of the sample means is 17.17? Sal assumed it would be 20. Can't we assume that it is 17.17 and build our confidence interval around that? Could we also do the small-sample hypothesis test that way?

    I solved it using 17.17 as the mean and got the same answer; I just want to understand Sal's logic. Also, in previous lessons, like the 7 patients example and the apple weights example, we assumed the sample's mean is mu_x. So I thought mu_x should be 17.17 here.
    (3 votes)
    • Matthew Daly
      I'm not sure where you think Sal assumed that the mean of the sampling distribution was 20 in this video. Partway through, Sal states that the mean of the sample is 17.17, and a little later you can see how he incorporates 17.17 into the formula, not 20. So your intuition was right about this problem, and that's why you got the same answer as Sal.

      If you're talking about the previous video, the reason Sal worked from 20 is that the purpose there was to use the data to reject the null hypothesis that the sample was pulled from a population whose mean is 20. I suppose he could just as easily have done this work in the previous video and rejected the null hypothesis because 20 is not in the 95% confidence interval; the two ways of thinking about it are equally valid in the end.
      (0 votes)

Video transcript

This is the same problem that we had in the last video. But instead of trying to figure out whether the data supplies sufficient evidence to conclude that the engines meet the actual emissions requirement, and all of the hypothesis testing, I thought I would use the same data from the last video to actually come up with a 95% confidence interval. So you can ignore the question right here. You can ignore all of this. I'm just using that same data to come up with a 95% confidence interval for the actual mean emission for this new engine design.

So we want to find a 95% confidence interval. And as you could imagine, because we only have 10 samples right here, we're going to want to use a T-distribution. And right down here I have a T-table. We want a 95% confidence interval, so we want to think about the range that 95% of T-values will fall in. So let's think about it this way. Let me draw a T-distribution right over here. A T-distribution looks very similar to a normal distribution, but it has fatter tails: this end and this end will be fatter than in a normal distribution. If this is a normalized T-distribution, the mean is going to be 0. And we want to find an interval of T-values, between some negative value here and some positive value here, that contains 95% of the probability. So this right here has to be 95%.

To figure out what these critical T-values are at this end and this end, we can just use a T-table. And we're going to use the two-sided version of the table because we're symmetric around the center. So you look at the two-sided heading: we want a 95% confidence interval, so we're going to look right over here, 95% confidence interval. We have 10 data points, which means we have 9 degrees of freedom. So 9 degrees of freedom for our 10 data points; we just took 10 minus 1.
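The table lookup can be reproduced numerically. This is a minimal sketch, not how published T-tables are produced: it integrates the t density with Simpson's rule and bisects for the two-sided critical value. The function names (`t_pdf`, `central_prob`, `t_critical`) are hypothetical; in practice you would call a library routine such as SciPy's `scipy.stats.t.ppf(0.975, 9)`.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def central_prob(t: float, df: int, steps: int = 2000) -> float:
    """P(-t < T < t): Simpson's rule on [0, t], doubled because the density is symmetric."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_critical(confidence: float, df: int) -> float:
    """Find t* with P(-t* < T < t*) = confidence by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_prob(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.95, 9), 3))    # 2.262, matching the table
print(round(t_critical(0.95, 120), 3))  # about 1.98, near the normal value of about 1.96
```

Comparing 9 and 120 degrees of freedom also shows that t* keeps changing well past 30 degrees of freedom, only gradually approaching the normal value.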
So if we look over here, for a T-distribution with 9 degrees of freedom, 95% of the probability is going to be contained within a T-value of plus or minus 2.262: this value right here is 2.262, and this value right here is negative 2.262. That's what this right here tells us. If you contain all the values that are less than 2.262 away from the center of your T-distribution, you will contain 95% of the probability. So that is our T-distribution right there. Let me make it very clear. This is our T-distribution. So if you randomly pick a T-value from this T-distribution, it has a 95% chance of being within this far from the mean. Or maybe we should write it this way: if I take a random T-statistic, there's a 95% chance that it is going to be less than 2.262 and greater than negative 2.262. A 95% chance.

Now when we took this sample, we could also derive a random T-statistic from it. We have our sample mean and our sample standard deviation. Our sample mean here is 17.17 (we figured that out in the last video: just add these up and divide by 10), and our sample standard deviation here is 2.98. So let me write the T-statistic that we can derive from this information right over here. You can view this T-statistic as a random sample from a T-distribution with 9 degrees of freedom. The T-statistic is going to be our sample mean, 17.17, minus the true mean of our population. Or actually you would say the true mean of our sampling distribution, which is also going to be the same as the true mean of our population. Then we divide by s, which is 2.98, over the square root of our number of samples. We've seen this multiple times. This right here is the T-statistic.
So by taking this sample, you can say that we've randomly sampled a T-statistic from this 9-degree-of-freedom T-distribution. So there's a 95% chance that this thing right over here is going to be less than 2.262 and greater than negative 2.262. The 95% probability still applies to this right here. Now we just have to do some math and calculate these things. So let me get my calculator out.

Let me just calculate this denominator right over here. We have 2.98 divided by the square root of 10, which is 0.9423. What I'm going to do is multiply all sides of this inequality by this expression. This is really two inequalities: that this quantity is greater than this quantity, and that this quantity is greater than that quantity. But we can operate on all of them at the same time, on this entire inequality. So we want to multiply this entire inequality by this value right over here. We just calculated that value: 2.98 over the square root of 10 is equal to 0.942.

If I multiply this entire inequality by 0.942, then on the left-hand side I have negative 2.262 times 0.942 (and since it's a positive number that we're multiplying the whole inequality by, the inequality signs stay in the same direction). In the middle, we're multiplying by the same expression that's in the denominator, so it'll cancel out, and we're left with 17.17 minus our population mean. That is less than 2.262 times, once again, 0.942. Let me scroll over to the right a little bit. Just to be clear, I'm just multiplying all three sides of this inequality by this number right over here. In the middle this cancels out.
So I'll just write it over here: 0.942, 0.942, 0.942. This and this are the same number, so that's why those cancel out. Now let's get the calculator to figure out what these numbers are. 0.942 times 2.262 is 2.13. So this number right over here on the right-hand side is 2.13. The number on the left is just the negative of that, so it's negative 2.13. And we still have our inequalities: negative 2.13 is less than 17.17 minus the mean, which is less than 2.13.

Now what I want to do is solve for this mean. And I don't like that negative sign in front of the mean. I'd rather have this swapped around; I'd rather have the mean minus 17.17. So what I'm going to do is multiply this entire inequality by negative 1. If you multiply the entire thing by negative 1, this negative 2.13 will become a positive 2.13. But since we are multiplying an inequality by a negative number, you have to swap the inequality sign, so this less than will become a greater than. This negative mu will become a positive mu. This positive 17.17 will become a negative 17.17. We have to swap this inequality sign as well, and this positive 2.13 will become a negative 2.13.

And we're almost there. We just want to solve for mu, to have this inequality expressed in terms of mu. So now we can add 17.17 to all three sides of this inequality, and we are left with 2.13 plus 17.17 is greater than mu (mu minus 17.17 plus 17.17 is just mu), which is greater than negative 2.13 plus 17.17. Since we have a bunch of greater-than signs, the number on the left is actually the largest and the number on the right is actually the smallest, so a more natural way to write it is to flip the whole inequality around.
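For reference, the chain of inequalities manipulated above can be written compactly as follows, using the rounded value 0.942 for 2.98/√10 as in the video:

```latex
\begin{aligned}
-2.262 &< \frac{17.17-\mu}{2.98/\sqrt{10}} < 2.262 \\
-2.262(0.942) &< 17.17-\mu < 2.262(0.942) \\
-2.13 &< 17.17-\mu < 2.13 \\
2.13 &> \mu-17.17 > -2.13 \qquad \text{(multiply by } -1 \text{, flip the signs)} \\
17.17+2.13 &> \mu > 17.17-2.13 \qquad \text{(add } 17.17 \text{)} \\
17.17-2.13 &< \mu < 17.17+2.13
\end{aligned}
```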
So now let's figure out what these values are. We have 2.13 plus 17.17, which is the high end of our range: 19.3. So this value right here, 19.3, is going to be greater than mu, which is going to be greater than negative 2.13 plus 17.17. Or we could write 17.17 minus 2.13, which gives us 15.04.

And remember, we started with the fact that there was a 95% chance that a random T-statistic will fall in this interval. We had a random T-statistic, and all we did was a bunch of math. So there's a 95% chance that each of these steps is true, including this last one. There's a 95% chance that the true population mean, which is the same thing as the mean of the sampling distribution of the sample mean, falls in this interval; or, we are confident that there's a 95% chance that it will fall in this interval. And we're done.
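The whole calculation collapses to x_bar ± t* · s/√n. A short sketch using the summary numbers from the video (the function name `t_interval` is hypothetical); it skips the intermediate rounding, which does not change the endpoints at two decimal places:

```python
import math

def t_interval(x_bar: float, s: float, n: int, t_star: float) -> tuple[float, float]:
    """Return (low, high) for x_bar +- t_star * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)   # 2.98 / sqrt(10) is about 0.942
    margin = t_star * standard_error    # about 2.13
    return x_bar - margin, x_bar + margin

low, high = t_interval(17.17, 2.98, 10, 2.262)
print(round(low, 2), round(high, 2))   # 15.04 19.3
```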