Conditions for valid t intervals

Examples showing how to determine if the conditions have been met for making a t interval to estimate a mean.

Want to join the conversation?

  • JorgeMercedes
    Why can't we use the "number of successes and number of failures both >= 10" test to check for normality?
    (12 votes)
    • Daniel Ho
      I think it is because in this case we are not going to classify people's ages as successes or failures.
      If we wanted to know the number of faculty members who are older than 30, then we could classify each one as a success (older than 30) or a failure (not older than 30).
      (8 votes)
  • Ahmed Nasret
    Previously, in the central limit theorem videos, it was said that as the sample size approaches infinity, the sampling distribution of the sample mean approaches normal, and that you don't actually need to go as far as infinity: 10 or 20 samples are enough for the distribution of the sample mean to be approximately normal.
    Now where did this 30-sample rule come from?
    (6 votes)
    • ltt1456
      If you want the distribution of sample means to look approximately normal with 10 or 20 samples, then the size of each sample must be >= 30 (30 is the size of one sample, not the number of samples).
      If your sample size is < 30 and the population distribution is not normal (in this video the population distribution is right-skewed), then the distribution of sample means won't approach normal even as the number of samples you take approaches infinity.
      (2 votes)
  • CzechDanny
    Regarding the second condition (normality), isn't it the law of large numbers rather than the central limit theorem that would ensure that? After all, we only have one random sample.
    (4 votes)
  • Ahmed Nasret
    The video said that we use t* when we don't have access to the standard deviation of the sampling distribution, and that we use the sample standard deviation instead.
    Didn't he mean to say "when we don't have access to the true population standard deviation"? The sampling distribution is part of building our estimate, and we use known parameters if there are any (like the population S.D.), but often we don't know such a parameter, so we use the sample S.D.
    Is that right? We use the sample S.D. in place of the missing parameter, not in place of the missing S.D. of the sampling distribution.
    (3 votes)
    • Tams Fletcher
      I am having some trouble understanding your question exactly. You seem to be asking why we don't know the population parameters (mean and standard deviation). I will try to explain what I think.
      In general we are using a sample to estimate the parameters of a population because it is impractical to know something about every item in a true (often large) population. For example, it is too expensive for a child seat company to call all parents about their opinions on a new carseat design. So we take a sample, a subset of the total population.
      The standard deviation of the sampling distribution is the standard error, which is approximated by the standard deviation of our sample (it cannot be by the standard deviation of the population because we do not know that parameter) divided by the square root of our sample size. We don't know the 'true' standard deviation of a sampling distribution.
      (1 vote)
  • Dominic B
    I read in other sources that we use t statistics when n <= 30. Why is Sal saying the opposite here?
    (3 votes)
    • Chuck B
      I did not interpret it as saying the opposite. He's saying that the sampling distribution of the statistic in question should be approximately normal. And, there are three "rules" we can use to consider the distribution as approximately normal:

      1. The sample size n >= 30
      2. The original distribution is approximately normal
      3. The original distribution is symmetric about the mean

      Therefore, even when n <= 30, it may be appropriate to use t statistics when "rules" 2 or 3 apply.

      You did not cite the other sources so I couldn't chime in about that.
      (1 vote)
  • Taiwoo Kim
    I asked the same question on a later video about "z-statistics vs. t-statistics".

    In this video, one of the conditions for a t interval is met when the sample size, n, is greater than or equal to 30.

    However, the later video (which I mentioned in the first line) says that we should use t-statistics when our sample size is less than 30.

    Could you please clarify this for me?

    Thanks!
    (2 votes)
  • weirderquark
    Couldn't you just say that the sampling distribution is approximately normal if:

    (1) sample size >= 30
    (2) the population distribution is roughly symmetric

    The population distribution being normal itself would satisfy the second condition, so it seems like it doesn't actually add anything.
    (3 votes)
  • dfbarbour
    It is confusing to me that the word "sample" is used for both a single data point within a trial, and for the complete trial taken as a whole. For example, if we took a bunch of 10 ml samples of pond water, each of these "samples" would have a specific measurement, say, the number of microorganisms per ml. We might also speak of a "sample" of 50 voters, for example, taken from a population of, let's say, 1000. This ambiguity of language leads to unnecessary confusion IMHO.
    (3 votes)
  • weikongwei
    What is the difference between the sample standard deviation and the standard deviation of the sampling distribution?
    (1 vote)
    • Tams Fletcher
      The sample standard deviation is the standard deviation from one sample (e.g. I sampled 100 voters in my home state of Queensland, and asked if they support Australia becoming a republic). This is used to approximate the true population (all Queenslanders eligible to vote) standard deviation (a measure of spread). The standard deviation of the sampling distribution does not approximate the population standard deviation. It is a measure of the spread of a separate thing called the sampling distribution (e.g. I sampled 100 as above and reported the proportion as a data point, then I took a fresh sample of 100 and reported that, then another, and another, ad infinitum, until I had a distribution of the values from lots of samples).
      (2 votes)
  • Fardeen Ashraf
    I thought the 10% rule only applied when sampling from binomial distributions, which the original distribution clearly is not.
    (1 vote)
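Several questions in the thread above ask where the n >= 30 guideline comes from. The simulation below is a rough sketch, not part of the video: it assumes a hypothetical right-skewed population (an exponential distribution, chosen only for convenience) and measures how skewed the distribution of sample means still is for a few sample sizes. The function names and the choice of 5000 repetitions are illustrative assumptions.

```python
import random
import statistics


def skewness(values):
    """Mean cubed z-score: 0 for a symmetric distribution, positive for right skew."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def simulate_sample_means(sample_size, num_samples, rng):
    """Draw num_samples samples of the given size from an exponential
    (right-skewed) population and return the mean of each sample."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Skewness of the simulated sampling distribution for several sample sizes.
skew_by_size = {
    n: skewness(simulate_sample_means(n, 5000, rng)) for n in (5, 20, 30, 100)
}

for n, skew in skew_by_size.items():
    print(f"sample size {n:>3}: skewness of sample means = {skew:.2f}")
```

For this particular population the skewness of the sample mean is 2 divided by the square root of n in theory, so it shrinks gradually as the sample size grows; there is nothing special about 30 itself, which is better read as a rule of thumb than as a sharp cutoff. Taking more samples of the same size only draws the same shape more precisely; it does not make that shape more normal.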

Video transcript

- [Instructor] Flavia wanted to estimate the mean age of the faculty members at her large university. She took an SRS, or simple random sample, of 20 of the approximately 700 faculty members, and each faculty member in the sample provided Flavia with their age. The data were skewed to the right with a sample mean of 38.75. She's considering using her data to make a confidence interval to estimate the mean age of faculty members at her university. Which conditions for constructing a t interval have been met? So pause this video and see if you can answer this on your own. Okay, now let's try to answer this together. So there are 700 faculty members over here. She's trying to estimate the population mean, the mean age. She can't talk to all 700, so she takes a sample, a simple random sample of 20, so n is equal to 20 here. From these 20, she calculates a sample mean of 38.75. Now ideally, she wants to construct a t interval, a confidence interval using the t statistic, and that interval would look something like this: the sample mean plus or minus the critical value times the sample standard deviation divided by the square root of n. We use a t statistic like this, with a t table and a t distribution, when we are trying to create confidence intervals for means where we don't have access to the standard deviation of the sampling distribution, but we can compute the sample standard deviation. Now in order for this to hold true, there are three conditions, just like what we saw when we thought about z intervals. The first is that our sample is random. Well, they tell us here that she took a simple random sample of 20, so we know that we are meeting that constraint, and that's actually choice A, the data is a random sample from the population of interest, so we can circle that in. So the next condition is the normal condition.
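As a minimal sketch of the interval described above (sample mean plus or minus the critical value times the sample standard deviation over the square root of n), the following computes a 95% t interval. The video gives only n = 20 and the sample mean of 38.75, so the individual ages here are invented for illustration and merely chosen to average 38.75 with a right skew. The critical value 2.093 is the standard t-table entry for 95% confidence with 19 degrees of freedom, and `t_interval` is a hypothetical helper name.

```python
import math
import statistics

# Hypothetical ages, invented for illustration: the source reports only
# n = 20 and a sample mean of 38.75, not the individual observations.
ages = [27, 28, 29, 30, 30, 31, 32, 32, 33, 34,
        35, 36, 37, 38, 40, 43, 47, 52, 62, 79]

# Two-sided 95% critical value t* for n - 1 = 19 degrees of freedom (t table).
T_STAR_95_DF19 = 2.093


def t_interval(sample, t_star):
    """Return (lower, upper) = x_bar -/+ t* * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample SD, n - 1 in the denominator
    margin = t_star * s / math.sqrt(n)    # t* times the estimated standard error
    return x_bar - margin, x_bar + margin


lower, upper = t_interval(ages, T_STAR_95_DF19)
print(f"sample mean = {statistics.mean(ages)}")
print(f"95% t interval: ({lower:.2f}, {upper:.2f})")
```

This only shows the arithmetic of the formula. Whether the resulting interval can be trusted depends on the three conditions the rest of the transcript works through.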
Now the normal condition when we're doing a t interval is a little bit more involved, because we do need to assume that the sampling distribution of the sample means is roughly normal. Now there are a couple of ways that we can get there. One is that our sample size is greater than or equal to 30. The central limit theorem tells us that then our sampling distribution, regardless of what the distribution is in the population, would be approximately normal. She didn't meet that constraint right over here. Here, her sample size is only 20, so so far this isn't looking good. Now that's not the only way to meet the normal condition. Another way to meet the normal condition, if we have a sample size smaller than 30, is if the original distribution of ages is normal, or even if it's roughly symmetric around the mean. But if we look at this, they tell us that it has right skew. They say the data were skewed to the right with a sample mean of 38.75. So that tells us that the data set that we're getting in our sample is not symmetric, and the original distribution is unlikely to be normal. Think about it. You could have faculty members who are 30 years older than this, 68 and three quarters, but you're very unlikely to have faculty members who are 30 years younger than this, and that's actually what's causing that skew to the right. So this one does not meet the normal condition. We can't feel good that our sampling distribution of the sample means is going to be normal, so I'm not gonna fill that one in. Choice C: Individual observations can be considered independent. So there are two ways to meet this constraint. One is if we sample with replacement.
Every faculty member we look at, after asking them their age, we say, "Hey, go back into the pool," and we might pick 'em again until we get our sample of 20. It does not look like she did that. It doesn't look like she sampled with replacement. But even if you're sampling without replacement, the 10% rule says, "Look, as long as this is less than or equal to 10% of the population, then we're good." 10% of this population is 70; 70 is 10% of 700, and so this sample of 20 is definitely less than or equal to 10%, so the observations can be considered independent, and we can actually meet that constraint as well. So the main issue where our t interval might not be so good is that we can't feel so confident that our sampling distribution is going to be normal.
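The three checks walked through above can be written down as a small function. This is a sketch under the rules of thumb stated in the video (n >= 30, or a normal or roughly symmetric population, for the normal condition; sampling with replacement, or n at most 10% of the population, for independence). The function and parameter names are hypothetical, chosen only for this illustration.

```python
def t_interval_conditions(n, population_size, random_sample,
                          with_replacement=False,
                          population_normal_or_symmetric=False):
    """Return which of the three t-interval conditions are met."""
    # Normal condition: a large sample, or a normal / roughly symmetric population.
    normal = n >= 30 or population_normal_or_symmetric
    # Independence: sampling with replacement, or the 10% rule when sampling without it.
    independent = with_replacement or n <= 0.10 * population_size
    return {"random": random_sample, "normal": normal, "independent": independent}


# Flavia's study: an SRS of 20 from about 700 faculty members, sampled without
# replacement, with right-skewed data (so no evidence of a symmetric population).
flavia = t_interval_conditions(n=20, population_size=700, random_sample=True)

for condition, met in flavia.items():
    print(f"{condition}: {'met' if met else 'not met'}")
```

For Flavia's numbers this reports the random and independence conditions as met and the normal condition as not met, matching the conclusion in the transcript.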