10% Rule of assuming "independence" between trials

  • Darren Huang
    Doesn't this mean drawing cards from a deck of cards without replacing can be binomial if my number of trials is less than 5?
    (14 votes)
  • Rishabh Chopra
    How does the 10% rule make sense? The 10% rule says that if my sample size is less than 10% of the population, then I can assume independence. Isn’t this counterintuitive? Why does taking a smaller sample size result in a more accurate probability?
    (4 votes)
    • Dayvyd
      Hi Rishabh,
      A smaller sample size does not result in a more accurate probability, but rather results in the ability to assume independence, which then allows us to make some useful inferences about the results. Sal touches on this during the last minute.
      Hope this helped. You can learn anything!
      (11 votes)
  • ju lee
    Why is 10% chosen? What advantage does it have over other percentages?
    (3 votes)
    • Kevin Winata
      The main idea here is that as the ratio of the sample size to the population size approaches 0, the sampling behaves more and more like a binomial distribution. So people want a rule of thumb for when the assumption of independence is acceptable. There's no particular reason to choose exactly 10% rather than 11% or 9%; it depends on how much accuracy the statistician wants. One possible reason to favor 10% is that it's easier to compute 10% of a number than, let's say, 8%. Hope that helps! Correct me if I'm wrong.
      (8 votes)
  • Sanjana Khedekar
    What are the properties of the normal distribution? There is no video on it so far, to the best of my knowledge. Please provide a link if there is one.
    (1 vote)
    • Ian Pulizzotto
      Here are some properties of the normal distribution.

      1. The normal distribution is symmetric about its only peak. The peak is located at the mean, median, and mode, which are all equal.

      2. The probability is approximately 68% that the score is within 1 standard deviation from the mean (in either direction), approximately 95% that the score is within 2 standard deviations from the mean, and approximately 99.73% that the score is within 3 standard deviations of the mean.

      3. Any linear combination of any number of independent normally distributed random variables is also normally distributed.

      4. For a sufficiently large number of independent random variables with a common distribution (not necessarily normal) with finite mean and finite nonzero variance, the sample mean is approximately normally distributed.
      (8 votes)
  • Jiska Mulderij
    Does the 10% rule apply to a randomized experiment?
    (2 votes)
  • Brynn Wallace
    So does the 10% condition take into account future population? I'm analyzing data from all 29 kids in my senior class and we're a small school, so this is more than 10% of seniors who have gone to my school, but less than 10% of all seniors who will ever go to my school. Does that qualify or not?
    (2 votes)
  • Amandeep Singh
    In this example what was the sample size?
    (2 votes)
  • Victor Gutierrez
    In the example that Sal explains at the beginning about a mall, if there are people entering the mall while we sample the people who leave it, we shouldn't need to worry about the 10% rule, right? Because this way it is as if there were replacement.
    (1 vote)
  • var
    The video refers to the rule of independence, but doesn't that rule just mean that the outcome of one trial doesn't affect the outcome of another?
    I feel like the right rule this relates to is the one that states that the probability of success must be constant (which is usually what replacement affects).
    Maybe that was just an oversight, but that was a bit confusing.
    (1 vote)
  • omprakash.nekkanti
    Why is it important for our trials to be independent in a binomial variable?
    (1 vote)

Video transcript

- [Instructor] As we go further in our statistical careers, it's going to be valuable to assume that certain distributions are normal distributions, or sometimes to assume that they are binomial distributions, because if we can do that, we can make all sorts of interesting inferences about them. But one of the key things about normal distributions or binomial distributions is that they can be viewed as the sum of a bunch of independent trials, so we have to assume that the trials are independent. Now that is reasonable in a lot of situations, but not in all of them. Let's say you're conducting a survey of people exiting a mall, and you're asking whether they have done their taxes already. If they're exiting the mall, it's hard to do these samples with replacement. They're leaving the mall. You can't say, "Hey, hey, wait. I just asked you a question. Now you've answered it. Now go back into the mall because I want each trial to be truly independent." But it feels intuitive that, hey, if there are 10,000 people in the mall and I'm going to sample 10 of them, does it really matter that it's truly independent? Isn't it enough that we're just close to being independent? And because of that idea, and because we do wanna make inferences based on things being close to a binomial distribution or a normal distribution, we have something called the 10% rule. The 10% rule says that if our sample is less than or equal to 10% of the population, then it is okay to assume approximate independence. There are some fairly sophisticated ways of coming up with this 10% threshold. People could have picked 9%. They could have picked 10.1%, but 10% is a nice round number. And if we look at some tangible examples, it seems to do a pretty good job.
So for example right over here, let's let x be the number of boys from three trials selecting from a classroom of n students where 50% of the class is boys and 50% of the class is girls. What we have over here is a bunch of different n's. What if we have 20 students in the class? What if we have 30? What if we have 100? What if we have 10,000? And so we could find the probability that we select three boys with replacement in each of these scenarios, and we could also find the probability that we select three boys without replacement, and then we could think about what proportion our sample size is of the entire population, and then we could say, "Hey, does the 10% rule actually make sense?" So in this first column, where we are picking three boys with replacement, because we are replacing, each of these trials is truly independent. And if our trials are independent, then x would truly be a binomial variable. Here, the trials aren't independent because we are not replacing, and so officially in this column right over here, when we're not replacing, x would not be considered a binomial random variable. Let's see if there's a threshold where our sample size is a small enough percentage of our entire population that we would feel not so bad about assuming x is close to being binomial. So in all of the cases where you have independent trials and 50% of the population is boys, 50% is girls, it's going to amount to 1/2 times 1/2 times 1/2, so in all of those situations we have a 12.5% chance that x is going to be equal to three, and in this case x would be a binomial variable. But look over here. When three is a fairly large percentage of our population, in this case it is 15% of the 20 students, the chance of getting three boys without replacement is 10.5%, which is reasonably different from 12.5%. It is 2 percentage points different, and 2 relative to 12.5 is someplace in between a 10% and 20% difference in terms of the probability.
So this is a reasonably big difference. But as we increase the population size without increasing the sample size, we see that these numbers get closer and closer to each other, all the way so that if you have 10,000 people in your population and you're only doing three trials, the numbers get very, very close. This is actually 12.49 something percent, but if you round to the nearest tenth of a percent, you see that they are close. So I think most people would say, "All right, if your sample is three ten-thousandths of the population, you'd feel pretty good treating this column without replacement as being pretty close to being a binomial variable." And most people would say, "All right, in this first scenario, where your sample size is 15% of your population, you wouldn't feel so good treating this without replacement column as a binomial random variable." But where do you draw the line? And as we alluded to earlier in the video, the line is typically drawn at 10%. If your sample size is less than or equal to 10% of your population, it's not unreasonable to take your random variable, even though it's not officially binomial, and say, "Okay, maybe it is. Maybe I can functionally treat it as binomial, and then from there I can make all of the powerful inferences that we tend to do in statistics." With that said, the lower the percentage the sample is of the population, the better. Now to be clear, that's not saying that small sample sizes are better than large sample sizes. In statistics, large sample sizes tend to be a lot better than small sample sizes. But if you wanna make this independence assumption, so to speak, even when it's not exactly true, you want your sample to be a small percentage of the population. So ideally, let's say you're doing a survey at the mall, you might wanna survey 100 people, but you would hope that there's at least 1,000 people in the mall in order for you to feel like your trials are reasonably independent. 
If there's 10,000 people in the mall or somehow 50,000 people in the mall, which would be a very large mall, well that's even better.
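
The comparison described in the transcript can be reproduced with a short calculation. The sketch below is not from the video; it is a minimal Python illustration that assumes exactly half of each class is boys and three students are drawn, and the function names are made up for this example. It uses exact fractions so that the with-replacement and without-replacement probabilities can be compared without rounding error.

```python
from fractions import Fraction


def p_all_boys_with_replacement(boys: int, class_size: int, draws: int) -> Fraction:
    """Probability that every draw is a boy when each student is put back."""
    return Fraction(boys, class_size) ** draws


def p_all_boys_without_replacement(boys: int, class_size: int, draws: int) -> Fraction:
    """Probability that every draw is a boy when nobody is put back."""
    p = Fraction(1)
    for k in range(draws):
        # After k boys have been removed, boys - k remain among class_size - k students.
        p *= Fraction(boys - k, class_size - k)
    return p


DRAWS = 3  # three trials, as in the transcript
for n in (20, 30, 100, 10_000):
    boys = n // 2  # assumption: exactly half the class is boys
    with_repl = p_all_boys_with_replacement(boys, n, DRAWS)
    without_repl = p_all_boys_without_replacement(boys, n, DRAWS)
    print(
        f"n={n:>6}  sample/population={DRAWS / n:8.2%}  "
        f"with={float(with_repl):.1%}  without={float(without_repl):.1%}"
    )
```

With replacement the probability is 1/8 (12.5%) for every class size. Without replacement it is 2/19 (about 10.5%) for a class of 20, where the sample is 15% of the population, and about 12.496% for a class of 10,000, which agrees with the figures quoted in the transcript. For a class of 30, where the sample is exactly 10% of the population, it comes to about 11.2%.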