10% Rule of assuming "independence" between trials

Video transcript

As we go further in our statistical careers, it's going to be valuable to assume that certain distributions are normal distributions, or sometimes to assume that they are binomial distributions, because if we can do that, we can make all sorts of interesting inferences about them. But one of the key things about normal distributions or binomial distributions is that we assume they can be viewed as a sum of a bunch of independent trials. So we have to assume that the trials are independent.

Now, that is reasonable in a lot of situations, but not always. Let's say you're conducting a survey of people exiting the mall, and you're asking whether they have done their taxes already. If they're exiting the mall, it's hard to do these samples with replacement. They're leaving the mall. You can't say, "Hey, wait, I just asked you a question, now you've answered it, now go back into the mall, because I want each trial to be truly independent." But it feels intuitive that if there are 10,000 people in the mall and I'm going to sample ten of them, does it really matter that the trials are truly independent? Isn't it enough that we're just kind of close to being independent?

Because of that idea, and because we do want to make inferences based on things being close to a binomial distribution or a normal distribution, we have something called the ten percent rule. The ten percent rule says that if our sample is less than or equal to ten percent of the population, then it is okay to assume approximate independence. There are some fairly sophisticated ways of coming up with this ten percent threshold. People could have picked nine percent, or they could have picked 10.1%, but ten percent is a nice round number, and if we look at some tangible examples, it seems to do a pretty good job.

So, for example, right over here, let's let X be the number of boys from three trials of selecting from a classroom of n students, where 50% of the class are boys and 50% are girls. What we have over here is a bunch of different n's. What if we have 20 students in the class? What if we have 30? What if we have 100? What if we have 10,000? We could find the probability that we select three boys with replacement in each of these scenarios, and we could also find the probability that we select three boys without replacement. Then we could think about what proportion our sample size is of the entire population, and ask, does the 10% rule actually make sense?

In this first column, we are picking three boys with replacement. Because we are replacing, each of these trials is truly independent, and if our trials are independent, then X is truly a binomial variable. In the second column, the trials aren't independent, because we are not replacing. So officially, in this column, when we're not replacing, X would not be considered a binomial random variable. But let's see if there's a threshold where our sample size is a small enough percentage of our entire population that we would feel not so bad about assuming X is close to being binomial.

In all of the cases where you have independent trials, and 50% of the population is boys and 50% is girls, you're going to multiply 1/2 times 1/2 times 1/2. So in all of those situations you have a 12.5% chance that X is going to be equal to 3, and in this case X would be a binomial variable. But look over here, when 3 is a fairly large percentage of our population. In this case it is 15%. The chance of getting 3 boys without replacement is 10.5%, which is reasonably different from 12.5%. It is two percentage points different, but two percentage points relative to 12.5 percent, so that's someplace between a 10 and 20 percent difference in terms of the probability. So this is a reasonably big difference.

But as we increase the population size without increasing the sample size, we see that these numbers get closer and closer to each other, all the way to the point where, if you have 10,000 people in your population and you're only doing three trials, the numbers get very, very close. This is actually 12.49-something percent, but if you round to the nearest tenth of a percent, you see that they are close. So I think most people would say, all right, if your sample is three ten-thousandths of the population, you feel pretty good treating this without-replacement column as being pretty close to a binomial variable. And most people would say that in this first scenario, where your sample size is fifteen percent of your population, you wouldn't feel so good treating this without-replacement column as a binomial random variable.

But where do you draw the line? As we alluded to earlier in the video, the line is typically drawn at ten percent. If your sample size is less than or equal to ten percent of your population, it's not unreasonable to treat your random variable, even though it's not officially binomial, as functionally binomial, and then from there you can make all of the powerful inferences that we tend to do in statistics.

With that said, the lower the percentage the sample is of the population, the better. Now, to be clear, that's not saying that small sample sizes are better than large sample sizes. In statistics, large sample sizes tend to be a lot better than small sample sizes. But if you want to make this independence assumption, so to speak, even when it's not exactly true, you want your sample to be a small percentage of the population. So let's say you're doing a survey at the mall. You might want to survey a hundred people, but you would hope that there are at least a thousand people in the mall in order for you to feel like your trials are reasonably independent. If there are 10,000 people in the mall, or somehow 50,000 people in the mall, which would be a very large mall, that's even better.
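
The comparison described in the transcript can be reproduced with a short Python sketch. It is not part of the video. The function names, the use of exact fractions, and the assumption that every class size is even (so exactly half the students are boys) are choices made here for illustration. It computes the chance of picking three boys with and without replacement for the class sizes mentioned (20, 30, 100, and 10,000) and reports whether the sample is at most ten percent of the population.

```python
from fractions import Fraction


def p_all_boys_with_replacement(draws, share_boys=Fraction(1, 2)):
    """Chance that every draw is a boy when each pick is put back (independent trials)."""
    return share_boys ** draws


def p_all_boys_without_replacement(class_size, draws, share_boys=Fraction(1, 2)):
    """Chance that every draw is a boy when picks are not put back.

    Assumes class_size * share_boys is a whole number of boys.
    """
    boys = int(class_size * share_boys)
    p = Fraction(1)
    for i in range(draws):
        # After i boys have been removed, boys - i remain out of class_size - i students.
        p *= Fraction(boys - i, class_size - i)
    return p


def compare(class_sizes=(20, 30, 100, 10_000), draws=3):
    """Return (class size, sample share, P with replacement, P without replacement) rows."""
    rows = []
    for n in class_sizes:
        rows.append((
            n,
            Fraction(draws, n),
            p_all_boys_with_replacement(draws),
            p_all_boys_without_replacement(n, draws),
        ))
    return rows


if __name__ == "__main__":
    for n, share, with_repl, without_repl in compare():
        meets_rule = share <= Fraction(1, 10)  # the ten percent rule
        print(
            f"n={n:>6}  sample/population={float(share):.4%}  "
            f"with={float(with_repl):.3%}  without={float(without_repl):.3%}  "
            f"sample <= 10%: {meets_rule}"
        )
```

Running it shows the pattern from the video: the with-replacement probability stays at 12.5% for every class size, while the without-replacement probability starts near 10.5% for 20 students and moves toward 12.5% as the class grows.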