Estimating a population mean
Conditions for valid t intervals
- [Instructor] Flavia wanted to estimate the mean age of the faculty members at her large university. She took an SRS, or simple random sample, of 20 of the approximately 700 faculty members, and each faculty member in the sample provided Flavia with their age. The data were skewed to the right with a sample mean of 38.75. She's considering using her data to make a confidence interval to estimate the mean age of faculty members at her university. Which conditions for constructing a t interval have been met? So pause this video and see if you can answer this on your own.
Okay, now let's try to answer this together. So there are 700 faculty members over here. She's trying to estimate the population mean, the mean age. She can't talk to all 700, so she takes a sample, a simple random sample of 20, so n is equal to 20 here. From these 20, she calculates a sample mean of 38.75. Now ideally, she wants to construct a t interval, a confidence interval using the t statistic, and that interval would look something like this: the sample mean plus or minus the critical value times the sample standard deviation divided by the square root of n. We use a t statistic like this, along with a t table and a t distribution, when we are trying to create confidence intervals for means where we don't have access to the standard deviation of the sampling distribution, but we can compute the sample standard deviation.
Now in order for this to hold true, there are three conditions, just like what we saw when we thought about z intervals. The first is that our sample is random. Well, they tell us here that she took a simple random sample of 20, so we know that we are meeting that constraint, and that's actually choice A, the data is a random sample from the population of interest, so we can circle that in. So the next condition is the normal condition.
Now the normal condition when we're doing a t interval is a little bit more involved because we do need to assume that the sampling distribution of the sample means is roughly normal. Now there are a couple of ways that we can get there. One is that our sample size is greater than or equal to 30. The central limit theorem tells us that then our sampling distribution, regardless of what the distribution is in the population, would be approximately normal. She didn't meet that constraint right over here. Her sample size is only 20, so so far this isn't looking good.
Now that's not the only way to meet the normal condition. Another way to meet the normal condition, if we have a sample size smaller than 30, is if the original distribution of ages is normal, or even if it's roughly symmetric around the mean. But if we look at it, they tell us that it has right skew. They say the data were skewed to the right with a sample mean of 38.75. So that tells us that the data set that we're getting in our sample is not symmetric, and the original distribution is unlikely to be normal. Think about it. You could have faculty members who are 30 years older than this, 68 and three quarters, but you're very unlikely to have faculty members who are 30 years younger than this, and that's actually what's causing that skew to the right. So this one does not meet the normal condition. We can't feel good that our sampling distribution of the sample means is going to be normal, so I'm not gonna fill that one in.
Choice C: Individual observations can be considered independent. So there are two ways to meet this constraint. One is if we sample with replacement.
Every faculty member we look at, after asking them their age, we say, "Hey, go back into the pool," and we might pick 'em again until we get our sample of 20. It does not look like she did that. It doesn't look like she sampled with replacement. But even if you're sampling without replacement, the 10% rule says that as long as the sample is less than or equal to 10% of the population, then we're good. 10% of this population is 70, since 70 is 10% of 700, and 20 is definitely less than or equal to 70, so the observations can be considered independent, and we can actually meet that constraint as well.
So the main issue where our t interval might not be so good is the normal condition: we can't feel so confident that our sampling distribution is going to be normal.
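The arithmetic behind these checks can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original problem: the function names are made up for this example, and the sample standard deviation (10.0) is a hypothetical value, since the problem never gives one. The critical value 2.093 assumes a 95% confidence level with 19 degrees of freedom, which is also an assumption, because the problem does not state a confidence level.

```python
import math


def ten_percent_condition(n, population_size):
    """Return True when a sample drawn without replacement is at most 10% of the population."""
    # Integer comparison avoids floating-point rounding: n <= 0.10 * N  <=>  10 * n <= N
    return 10 * n <= population_size


def large_sample_condition(n, threshold=30):
    """Return True when the sample is large enough to lean on the central limit theorem."""
    return n >= threshold


def t_interval(sample_mean, sample_sd, n, t_star):
    """Return (lower, upper) for: sample mean +/- t* times s / sqrt(n)."""
    margin = t_star * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Values taken from the problem
n = 20
population_size = 700
sample_mean = 38.75

print(ten_percent_condition(n, population_size))  # True, since 20 <= 70
print(large_sample_condition(n))                  # False, since 20 < 30

# Hypothetical inputs: the problem gives no sample standard deviation or confidence level
hypothetical_sd = 10.0
t_star_95_df19 = 2.093
print(t_interval(sample_mean, hypothetical_sd, n, t_star_95_df19))
```

Note that the code can compute an interval even though the normal condition fails. With a sample of only 20 and right-skewed data, that interval should not be trusted, which is the whole point of checking the conditions first.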