© 2024 Khan Academy

# Conditions for valid t intervals

Examples showing how to determine if the conditions have been met for making a t interval to estimate a mean.

## Want to join the conversation?

- Why can't we use the "number of successes and number of failures both >= 10" test to check for normality? (12 votes)
- I think it is because in this case we are not going to classify people's ages as successes or failures.

If we wanted to know the number of faculty members who are older than 30, then we could classify success (older than 30) and failure (not older than 30). (8 votes)

- @2:20 Previously, in the videos on the central limit theorem, it was said that as the sample size approaches infinity, the sampling distribution of the sample mean approaches normal, and that it does not really have to "approach infinity": 10 or 20 samples are enough for the distribution of the sample mean to approach normal.

Now where did this 30-sample rule come from? (6 votes)
- If you want the distribution of the sample means to be approximately normal with 10 or 20 samples, then the size of each sample must be >= 30 (30 is the size of one sample, not the number of samples).

If your sample size is < 30 and the population distribution is not normal (in this video the population distribution is right-skewed), then the distribution of the sample means won't approach normal even as the number of samples you take approaches infinity. (2 votes)
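
A quick simulation can illustrate the difference between the size of each sample and the number of samples taken. This is only a minimal sketch: the exponential population is an assumed stand-in for a right-skewed distribution (the video does not specify one), and the function names are made up for illustration. It also treats 30 as the rule of thumb it is, not a sharp cutoff.

```python
import random
import statistics


def skewness(values):
    """Skewness as the average cubed z-score (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def skew_of_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from a right-skewed (exponential)
    population and return the skewness of the resulting sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return skewness(means)


# Same sample size, more samples: the skew does not go away.
print(skew_of_sample_means(n=5, num_samples=2_000))
print(skew_of_sample_means(n=5, num_samples=20_000))
# Larger sample size: the sample means are closer to symmetric.
print(skew_of_sample_means(n=30, num_samples=20_000))
```

For this assumed population, the theoretical skewness of the sample mean is 2/sqrt(n), so roughly 0.89 for n = 5 and 0.37 for n = 30. Taking more samples only estimates that value more precisely; only a larger n shrinks it.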

- Regarding the second assumption (normality), isn't it the law of large numbers, rather than the central limit theorem, that would ensure that? After all, we only have one random sample. (4 votes)
- The law of large numbers tells us only that the sample mean will converge to the population mean. It does not tell us how the sample mean is distributed, which is what we need when we are talking about confidence intervals. (2 votes)

- @1:32 He said that we use t* when we don't have access to the standard deviation of the sampling distribution, and that we use the sample standard deviation instead.

Didn't he mean to say "when we don't have access to the true population standard deviation"? The sampling distribution is part of building our estimate, and we use known parameters (if any, like the population S.D.), but often we don't know such a parameter, so we use the sample S.D.

Is that true? We use the sample S.D. in place of the missing population parameter, not in place of the missing S.D. of the sampling distribution. (3 votes)
- I am having some trouble understanding your question exactly. You seem to be asking why we don't know the population parameters (mean and standard deviation). I will try to explain what I think.

In general we are using a sample to estimate the parameters of a population because it is impractical to know something about every item in a true (often large) population. For example, it is too expensive for a child seat company to call all parents about their opinions on a new carseat design. So we take a sample, a subset of the total population.

The standard deviation of the sampling distribution is the standard error. We approximate it by the standard deviation of our sample divided by the square root of our sample size (we cannot use the standard deviation of the population because we do not know that parameter). We don't know the "true" standard deviation of a sampling distribution. (1 vote)
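
In symbols, the relationship described in this answer can be sketched as follows. The notation is the conventional one and is assumed here rather than taken from the video: σ is the unknown population standard deviation, s is the sample standard deviation, n is the sample size, x̄ is the sample mean, and t* is the critical value from a t distribution with n − 1 degrees of freedom.

$$
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}},
\qquad
\text{t interval: } \bar{x} \pm t^{*}\,\frac{s}{\sqrt{n}}
$$

Both readings in the question point at the same gap: because σ is unknown, σ/√n is unknown too, and s stands in for σ.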

- I read in other sources that we use t statistics when n <= 30. Why is Sal saying the opposite here? (3 votes)
- I did not interpret it as saying the opposite. He's saying that the sampling distribution of the statistic in question should be approximately normal. And, there are three "rules" we can use to consider the distribution as approximately normal:

1. The sample size n >= 30

2. The original distribution is approximately normal

3. The original distribution is symmetric about the mean

Therefore, even when n < 30, it may be appropriate to use t statistics when "rules" 2 or 3 apply.

You did not cite the other sources, so I couldn't chime in about that. (1 vote)

- I asked the same question on a later video about "z statistics vs. t statistics".

At 2:36, one of the conditions for using t is met when the sample size, n, is greater than or equal to 30.

However, the later video (which I mentioned in my first sentence) says that we should use t statistics when our sample size is less than 30.

Could you please clarify this for me?

Thanks! (2 votes)
- OK, basically the conditions for Z and T include

1. SRS

2. Normality (n>30)

3. Independence

ALL conditions must be met to use Z. For T, though it is preferred to meet all the conditions, it is possible to use the test with one or more of the conditions violated. (3 votes)

- Couldn't you just say that the sampling distribution is approximately normal if:

(1) sample size >= 30

(2) the population distribution is roughly symmetric

The population distribution being normal itself would satisfy the second condition, so it seems like it doesn't actually add anything. (3 votes)

- It is confusing to me that the word "sample" is used for both a single data point within a trial, and for the complete trial taken as a whole. For example, if we took a bunch of 10 ml samples of pond water, each of these "samples" would have a specific measurement, say, the number of microorganisms per ml. We might also speak of a "sample" of 50 voters, for example, taken from a population of, let's say, 1000. This ambiguity of language leads to unnecessary confusion IMHO. (3 votes)

- At 1:33: What is the difference between the sample standard deviation and the standard deviation of the sampling distribution? (1 vote)
- The sample standard deviation is the standard deviation from one sample (e.g. I sampled 100 voters in my home state of Queensland, and asked if they support Australia becoming a republic). This is used to approximate the true population (all Queenslanders eligible to vote) standard deviation (measure of spread). The standard deviation of the sampling distribution does not approximate the population standard deviation. It is a measure of the spread of a separate thing called the sampling distribution (e.g. I sampled 100 as above and reported the proportion as a data point, then I did a fresh 100 sample and reported that, then another, and another, ad infinitum, until I have a distribution of the values from lots of samples). (2 votes)
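
The two quantities in this answer can be seen side by side in a small simulation. This is a hedged sketch, not part of the original discussion: it uses a numeric variable instead of the voter proportion in the example, and the population (normal with mean 40 and standard deviation 10), the sample size of 100, and all names are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)
POP_MEAN, POP_SD, N = 40.0, 10.0, 100  # hypothetical population and sample size


def draw_sample():
    """One random sample of N values from the hypothetical population."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]


# Sample standard deviation: spread of the values inside ONE sample.
one_sample = draw_sample()
sample_sd = statistics.stdev(one_sample)

# Standard deviation of the sampling distribution: spread of the sample
# MEANS across many repeated samples.
sample_means = [statistics.fmean(draw_sample()) for _ in range(2_000)]
sd_of_sampling_dist = statistics.stdev(sample_means)

# Standard error: the estimate of that spread available from one sample.
standard_error = sample_sd / N ** 0.5

print(f"sample SD:                      {sample_sd:.2f}")
print(f"SD of sampling distribution:    {sd_of_sampling_dist:.2f}")
print(f"standard error from one sample: {standard_error:.2f}")
```

Under these assumptions the sample SD should land near the population value of 10, while the SD of the sampling distribution should land near 10/sqrt(100) = 1. The standard error is the only one of the three that can be computed from a single sample.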

- 4:15: I thought the 10% rule only applied when sampling from binomial distributions, which the original distribution clearly is not. (1 vote)

## Video transcript

- [Instructor] Flavia wanted to estimate the mean age of the faculty members at her large university. She took an SRS, or simple random sample, of 20 of the approximately 700 faculty members, and each faculty member in the sample provided Flavia with their age. The data were skewed to the right with a sample mean of 38.75. She's considering using her data to make a confidence interval to estimate the mean age of faculty members at her university. Which conditions for constructing a t interval have been met?

So pause this video and see if you can answer this on your own.

Okay, now let's try to
answer this together. So there are 700 faculty members over here. She's trying to estimate the population mean, the mean age. She can't talk to all 700, so she takes a sample, a simple random sample of 20, so n is equal to 20 here. From this 20, she calculates a sample mean of 38.75.

Now ideally, she wants to construct a t interval, a confidence interval using the t statistic, and so that interval would look something like this. It would be the sample mean plus or minus the critical value times the sample standard deviation divided by the square root of n. And we use a t statistic like this and a t table and a t distribution when we are trying to create confidence intervals for means where we don't have access to the standard deviation of the sampling distribution, but we can compute the sample standard deviation.

Now in order for this to hold
true, there are three conditions, just like what we saw when we thought about z intervals. The first is that our sample is random. Well, they tell us here that she took a simple random sample of 20, and so we know that we are meeting that constraint, and that's actually choice A: the data is a random sample from the population of interest, so we can circle that in.

So the next condition
is the normal condition. Now the normal condition when we're doing a t interval is a little bit more involved, because we do need to assume that the sampling distribution of the sample means is roughly normal. Now there's a couple of ways that we can get there. Either our sample size is greater than or equal to 30: the central limit theorem tells us that then our sampling distribution, regardless of what the distribution is in the population, would be approximately normal. She didn't meet that constraint right over here. Here, her sample size is only 20, so so far this isn't looking good.

Now that's not the only way to meet the normal condition. Another way to meet the normal condition, if we have a sample size smaller than 30, is if the original distribution of ages is normal, or even if it's roughly symmetric around the mean. But if we look at this, they tell us that it has right skew. They say the data were skewed to the right with a sample mean of 38.75. So that tells us that the data set that we're getting in our sample is not symmetric, and the original distribution is unlikely to be normal. Think about it. You could have faculty members who are 30 years older than this, 68 and three quarters, but you're very unlikely to have faculty members who are 30 years younger than this, and that's actually what's causing that skew to the right. So this one does not meet the normal condition. We can't feel good that our sampling distribution of the sample means is going to be normal, so I'm not gonna fill that one in.

Choice C: Individual
observations can be considered independent. So there are two ways to meet this constraint. One is if we sample with replacement: every faculty member we look at, after asking them their age, we say, "Hey, go back into the pool, and we might pick you again," until we get our sample of 20. It does not look like she did that. It doesn't look like she sampled with replacement. But even if you're sampling without replacement, the 10% rule says that as long as the sample is less than or equal to 10% of the population, then we're good. And 10% of this population is 70; 70 is 10% of 700, and so 20 is definitely less than or equal to 10%, and so the observations can be considered independent, and so we can actually meet that constraint as well.

So the main issue where our t interval might not be so good is that we can't feel so confident that our sampling distribution is going to be normal.
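
The three checks walked through in this transcript can be sketched as a small function. This is an illustrative sketch only: the function name, its parameters, and the shape labels ("normal", "symmetric", "skewed") are hypothetical, and n >= 30 is used as the rule of thumb from the video, not a strict threshold.

```python
def check_t_interval_conditions(n, population_size, random_sample,
                                population_shape, with_replacement=False):
    """Report which of the three t-interval conditions appear to be met.

    population_shape is a hypothetical label such as "normal",
    "symmetric", or "skewed".
    """
    # Random: the data come from a random sample of the population.
    random_ok = bool(random_sample)

    # Normal: a large sample (rule of thumb n >= 30), or a population that
    # is normal or roughly symmetric.
    normal_ok = n >= 30 or population_shape in ("normal", "symmetric")

    # Independent: sampling with replacement, or the 10% rule
    # (sample size at most 10% of the population).
    independent_ok = with_replacement or 10 * n <= population_size

    return {
        "random": random_ok,
        "normal": normal_ok,
        "independent": independent_ok,
    }


# Flavia's situation: SRS of 20 from about 700, right-skewed data.
flavia = check_t_interval_conditions(
    n=20, population_size=700, random_sample=True, population_shape="skewed"
)
print(flavia)
```

For Flavia's numbers this reports the random and independence conditions as met and the normal condition as not met, which matches the conclusion in the video.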