© 2023 Khan Academy

# Conditions for confidence interval for a proportion worked examples

AP.STATS: UNC‑4 (EU), UNC‑4.B (LO), UNC‑4.B.1 (EK), UNC‑4.B.2 (EK)

Worked examples of the conditions for confidence intervals for a population proportion: the random condition, the independence condition, and the normal condition.

## Want to join the conversation?

- So in this example, what would a good analysis look like? We can't increase the sample size to meet the 2nd requirement, because that would violate the 3rd condition, and we can't decrease the sample size to meet the 3rd condition, because that would violate the 2nd condition? (5 votes)
- I sometimes see the rule np ≥ 10, and sometimes it says n should be sufficiently large. Are these conditions for different things? How is the central limit theorem related to confidence levels? (2 votes)
- The rule np ≥ 10 is used for binomial distributions, that is, for proportions. "n should be sufficiently large" refers to the Central Limit Theorem for sample means; a common rule of thumb is that if n > 30, we can treat the sampling distribution of the mean as approximately normal. If n < 30, we cannot assert that without knowing more about the population. (Note that if n is close to 30, say 25 or more, we can sometimes still assume a normal distribution.)

I am not sure whether you mean confidence level or confidence interval in your second question since confidence level is usually something you choose according to the situation. I will assume you meant confidence intervals in the following explanation:

To relate the Central Limit Theorem to confidence intervals, we need to look at the formula for a confidence interval. For a normal distribution with population mean μ, population standard deviation σ, and sample mean x̄, the confidence interval would be x̄ ± z*(σ/√n). So if n is small, i.e. less than 30, the confidence interval would be wider (a less precise estimate). If n is large, the confidence interval would be narrower (a more precise estimate). This makes sense since the more data we have, the more representative the sample is of the population. (5 votes)
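
As a rough illustration of how the half-width z*(σ/√n) in the formula above changes with n, here is a minimal Python sketch. The value σ = 10, the 95% confidence level, and the function name `margin_of_error` are assumptions chosen only for illustration; none of them come from the lesson.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Half-width z* * sigma / sqrt(n) of a z interval for a mean."""
    # z* leaves (1 - confidence) / 2 of the probability in each tail
    # of the standard normal distribution.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / sqrt(n)


# sigma = 10 is a made-up population standard deviation (assumption).
for n in (10, 30, 100, 1000):
    print(n, round(margin_of_error(10.0, n), 2))
```

The printed half-widths shrink as n grows, which is the sense in which a larger sample gives a narrower interval at the same confidence level.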

- Why is Condition 3 not met? Can we interpret this as him randomly selecting 30 people first and then going to them one by one while sampling? This is a bit ambiguous in the context. (1 vote)
- Since we're sampling without replacement, the first person has a 1/150 ≈ 0.67% chance of being picked, while the 30th person has a 1/(150 − 29) ≈ 0.83% chance.

This is too much of a difference for the observations to be treated as independent. The sample proportion is still an unbiased estimate of the population proportion, but the usual standard error formula, which assumes independent draws, no longer describes its spread well.

To keep that formula approximately valid, the sample size shouldn't exceed 10% of the population. (6 votes)
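
The arithmetic in this reply can be checked with a short Python sketch. The population and sample sizes are the ones from the example; the finite population correction at the end is not mentioned in the thread and is included here only as an assumption-labeled aside on how much the dependence shrinks the spread of the sample proportion.

```python
from math import sqrt

N, n = 150, 30  # population size and sample size from the example

# Chance that one particular remaining senior is chosen on the
# first draw and on the 30th draw when sampling without replacement.
first_draw = 1 / N
last_draw = 1 / (N - (n - 1))
print(f"first draw: {first_draw:.2%}, 30th draw: {last_draw:.2%}")

# 10% rule of thumb: the sample should be at most 10% of the population.
meets_ten_percent_rule = 10 * n <= N
print("meets 10% rule:", meets_ten_percent_rule)

# Aside (not from the thread): the finite population correction factor
# by which the standard deviation of the sample proportion shrinks
# when sampling without replacement.
fpc = sqrt((N - n) / (N - 1))
print("finite population correction:", round(fpc, 3))
```

When the sample is a small fraction of the population, the correction factor is close to 1, which is why the 10% rule lets the independence-based formula stand in as an approximation.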

- But wouldn't taking a larger sample give better estimates? (2 votes)
- What is required to calculate a confidence interval? (1 vote)
- On the matter of independence, what if he samples all 30 individuals at once? He selects 30 people and then asks, instead of drawing a person and then asking.

Does that change anything? (1 vote)
- Does it apply to np̂ as well? Originally it is np (population proportion). (1 vote)
- I understand that the sample size relative to the population does not meet the description of independence, but two aspects of this exclusion seem problematic in this case. 1) If you were to replace, it seems that you would be more likely to cause deviation between the sample and pop. mean, because the replaced item could be counted twice and over-represent that characteristic. 2) If n=p the mean is the same, and if n approximates p the sample mean is likely to be close to or equal to the pop. mean. Are these concerns unreasonable? (1 vote)

## Video transcript

- [Instructor] Ali is in charge of the dinner menu for his senior prom, and he wants to use a one-sample Z interval to estimate what proportion of seniors would order a vegetarian option. He randomly selects 30 of the 150 total seniors and finds that seven of those sampled would order the vegetarian option. Which conditions for constructing this confidence interval did Ali's sample meet? So, pause this video, and you can select more than one of these.

Alright now, let's work
through this together. So one thing that you might be wondering is, well, what is a one-sample Z interval? Well, you could really interpret that as he's gonna take one sample and then construct a confidence interval based on that. The reason why it might be called a Z interval is the whole idea behind a confidence interval is you're going to pick a number of standard deviations above and below the true parameter that you are actually trying to estimate, and then use that to make your inferences. And one way of thinking about the number of standard deviations, people will often call that a Z score, or Z is often used as a variable for the number of standard deviations above or below something. So really, he's just trying to construct a confidence interval, but remember, in order to construct a confidence interval, we have to make some assumptions.

There's 150 students right over here. He's finding it impractical to survey all 150 to figure out the true population proportion. So instead, he samples 30 of the seniors. So, N is equal to 30. And from that, he calculates a sample proportion. Looks like seven out of the 30 want the vegetarian option. And he's going to determine some confidence level and then construct a confidence interval. But remember the conditions that we've talked about in the previous videos.

The first thing is, we
have to be confident that this is a random sample. So that would be the random condition, and that's what choice A is telling us. The data is a random sample from the population of interest. Do we know that? Well, it tells us in the passage here, he randomly selects 30 of the total seniors. So I guess we'll take their word for it. We don't know his methodology of what he considers random, but we'll take their word for it, that yes, this has been met. The data is a random sample. If it said he sampled the football team, well, that would not have been a random sample.

The next condition here
looks all mathematical, but this is really the normal condition. And the idea behind the normal condition is that, in order to construct these confidence intervals, we're assuming that the sampling distribution of the sample proportions is roughly normal, and it is not skewed to the right or skewed to the left like this. And so, right here it says, look, the sample size times our sample proportion has to be greater than or equal to 10. And our sample size times one minus our sample proportion has to be greater than or equal to 10. Well, another way to think about this is, our successes in our sample need to be greater than or equal to 10, and our failures need to be greater than or equal to 10. Well, how many successes were there? There were seven. And you could even say, look, our N, 30, times our sample proportion, seven over 30, is going to be seven. So our number of successes is less than 10, so actually, we violate the normal condition. Once again, this is a rule of thumb, but this is telling us that our actual sampling distribution might be skewed. Remember, this is just based on one sample, what we're able to figure out. This is a one-sample Z interval. We might be wrong, but we wouldn't feel good that we're meeting the normal condition here, so I would rule this one out.

Individual observations can
be considered independent. Well, if he randomly selected people with replacement, then they could be independent. Or, if his sample size is less than 10% of the total population, then it could be considered independent, even though it wouldn't be perfectly independent. But we see here that he sampled 30 people out of 150. So his sample size was 30 out of 150, which is the same thing as one fifth of the population, which is the same thing as 20%. And since this is greater than 10%, we are violating the independence condition. We could have met the independence condition if he was sampling with replacement, which it doesn't seem like he is, or if this thing right over here was less than 10%. But we're not meeting that, so we cannot feel good about that constraint.

And so, since we're not meeting
two of the three constraints for, I would say, valid confidence intervals, or confidence intervals we would feel confident in, this is not so good of an analysis on Ali's part.
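
The three checks walked through in the transcript can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `check_conditions` and its parameters are hypothetical, the random condition is taken on the problem's word rather than verified, and a sample of exactly 10% of the population is treated as passing the independence check.

```python
def check_conditions(population_size, sample_size, successes,
                     is_random_sample, with_replacement=False):
    """Rule-of-thumb checks for a one-sample z interval for a proportion."""
    failures = sample_size - successes
    return {
        # Random condition: cannot be computed, only asserted by the caller.
        "random": is_random_sample,
        # Normal condition: at least 10 successes and at least 10 failures.
        "normal": successes >= 10 and failures >= 10,
        # Independence condition: sampling with replacement, or a sample
        # of at most 10% of the population (boundary is an assumption).
        "independent": with_replacement or 10 * sample_size <= population_size,
    }


# Ali's sample: 30 of 150 seniors, 7 of whom would order vegetarian.
ali = check_conditions(150, 30, 7, is_random_sample=True)
print(ali)
```

For Ali's numbers the sketch reports the random condition as met and the normal and independence conditions as not met, matching the conclusion in the video.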