# Conditions for valid confidence intervals for a proportion

AP.STATS: UNC‑4 (EU), UNC‑4.B (LO), UNC‑4.B.1 (EK), UNC‑4.B.2 (EK)

## Video transcript

What we're going to do in this video is dig a little bit deeper into confidence intervals. In other videos we compute them, we even interpret them, but here we're going to make sure that we are making the right assumptions, so that we can have confidence in our confidence intervals, or that we are even calculating them in the right way or in the right context.

So just as a bit of review, a lot of what we do with confidence intervals is try to estimate some population parameter. Let's say it's a proportion; maybe it's the proportion that will vote for a candidate. We can't survey everyone, so we take a sample, and from that sample we calculate a sample proportion. Then, using this sample proportion, we calculate a confidence interval on either side of that sample proportion. And what we know is that if we do this many, many, many times, every time we do it we are very likely to get a different sample proportion. So that'd be sample proportion one, sample proportion two. And every time we do it, not only will we get a different, I guess you could say, center of our interval, but the margin of error might change, because we are using the sample proportion to calculate it.

But the first assumption that has to be true to even make any claims about this confidence interval with confidence is that your sample is random. If you're trying to estimate the proportion of people who are going to vote for a certain candidate, but you are only surveying people at a senior community, well, that would not be a truly random sample, nor would it be if you only surveyed people on a college campus. So, like with all things in statistics, you really want to make sure that you're dealing with a random sample, and take great care to do that.

The second thing that we have to assume is sometimes known as the normal condition. Remember, the whole basis behind confidence intervals is that we assume that the sampling distribution of the sample proportions has roughly a normal shape. But in order to make that assumption that it's roughly normal, we have this normal condition, and the rule of thumb here is that you would expect, per sample, at least 10 successes and at least 10 failures. So for example, if your sample size was only 10, and let's say the true proportion was 50%, or 0.5, then you wouldn't meet that normal condition, because you would expect 5 successes and 5 failures for each sample. Now, because usually when we're doing confidence intervals we don't even know the true population parameter, what we would actually do is look at our sample and just count how many successes and how many failures we have. If we have fewer than 10 of either one, then we are going to have a problem. So you want to have greater than or equal to 10 successes and greater than or equal to 10 failures. And you actually don't even have to say "expect," because you're going to get a sample and you can just count how many successes and failures you have. If you don't see that, then the normal condition is not met, and the statements you make about your confidence interval aren't necessarily going to be as valid.

The last thing we want to really make sure of is known as the independence condition, and this is the 10% rule. It applies if we are sampling without replacement, and sometimes it's hard to sample with replacement. If you're surveying people who are exiting a store, for example, you can't ask them to go back into the store, or it might be very awkward to ask them to go back into the store. And so the independence condition is that your sample size, n, is less than 10% of the population size. So let's say your population were 100,000 people, and you surveyed 1,000 people. Well, that was 1% of the population, so you'd feel pretty good that the independence condition is met. And once again, this is valuable when you are sampling without replacement.

Now, to appreciate how our confidence intervals don't do what we think they're going to do when any of these things are broken, I'll focus on these latter two. The random sample condition is super important, frankly, for all of statistics.

So let's first look at a situation where our independence condition breaks down. Right over here you can see that we are using our little gumball simulation, and in that gumball simulation we have a true population proportion, but someone doing these samples might not know that. We're trying to construct a confidence interval with a 95% confidence level, and what we've set up here is that we aren't replacing: for every member of our sample, we're not looking at it and then putting it back in. We're just going to take a sample of 200, and I've set up the population so that the sample is far larger than 10% of the population. Then I drew a bunch of samples; this is a situation where I did almost 1,500 samples of size 200. What you can see here are the situations where our true population parameter was contained in the confidence interval that we calculated for that sample, and then you see in red the ones where it was not. And as you can see, we are only having a hit, so to speak, an overlap between the confidence interval that we're calculating and the true population parameter, about 93% of the time. And this is a pretty large number of samples. If this were truly a 95% confidence level, this should be happening 95% of the time.

Similarly, we can look at a situation where our normal condition breaks down. We can see here that our sample size is 15, and actually, if I scroll down a little bit, you can see that the simulation even warns me that there are fewer than 10 expected successes. Once again, I did a bunch of samples here, over 2,000 samples, and even though I'm trying to set up these confidence intervals so that over time there's a 95% hit rate, so to speak, here there's only a 94% hit rate. And I've done a lot of samples here.

So the big takeaway: not being random will really skew things. But if you don't feel good about how normal the actual sampling distribution of the sample proportions is, or if your sample size is a fairly large chunk of your population and you're not replacing, so you're violating the independence condition, then the confidence level that you think you're computing for when you make your confidence intervals might not be valid.
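
The three conditions from the video can be sketched in a few lines of Python. This is only an illustration, not the gumball simulation shown on screen: the function names (`check_conditions`, `proportion_ci`, `hit_rate`) are made up for this example, the interval is assumed to be the usual one-sample z interval (sample proportion plus or minus z* times the standard error), and the 560 successes and the true proportion of 0.1 are hypothetical inputs. The sample sizes of 10, 15, and 1,000 and the population of 100,000 come from the examples in the video.

```python
import math
import random
from statistics import NormalDist


def check_conditions(successes, n, population_size, random_sample=True):
    """Report whether each of the three conditions holds for one sample."""
    failures = n - successes
    return {
        # Randomness cannot be computed; it has to be judged from how the data were collected.
        "random": random_sample,
        # Normal condition: at least 10 successes and at least 10 failures in the sample.
        "normal": successes >= 10 and failures >= 10,
        # Independence (10% rule): n is less than 10% of the population.
        "independent": n < 0.10 * population_size,
    }


def proportion_ci(successes, n, confidence=0.95):
    """One-sample z interval (assumed form): p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def hit_rate(population_size, true_p, n, trials=1000, confidence=0.95, seed=0):
    """Fraction of intervals, from samples drawn without replacement, that contain the true proportion."""
    rng = random.Random(seed)
    n_success = round(population_size * true_p)
    population = [1] * n_success + [0] * (population_size - n_success)
    p = n_success / population_size
    hits = 0
    for _ in range(trials):
        successes = sum(rng.sample(population, n))  # sampling without replacement
        low, high = proportion_ci(successes, n, confidence)
        hits += low <= p <= high
    return hits / trials


# Sample of 1,000 from a population of 100,000 (1% of the population); 560 successes is hypothetical.
print(check_conditions(successes=560, n=1000, population_size=100_000))
print(proportion_ci(560, 1000))

# Sample of 10 with 5 successes and 5 failures: the normal condition fails.
print(check_conditions(successes=5, n=10, population_size=100_000))

# Hit rates when the conditions are met, and when the normal condition is broken (n = 15).
print(hit_rate(population_size=100_000, true_p=0.56, n=1000))
print(hit_rate(population_size=100_000, true_p=0.1, n=15))
```

The hit rates depend on the random seed and on the hypothetical proportions chosen, so they will not match the 93% and 94% figures from the simulation in the video; the point is only that the observed hit rate can drift away from the nominal 95% when a condition is not met.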