What is the intuition for the rule np >= 10, n(1-p) >= 10?

Think about the edge case of `p = 0.1`: then `n * 0.1 >= 10`, i.e. `n >= 100`. So if your probability parameter is around `10%`, you need a sample size of at least 100 before the SD is tight enough that the left side of the distribution is mostly captured in the `[0, 0.1]` interval.
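
One way to make this edge case concrete, as a sketch that assumes the usual standard deviation formula for a sample proportion (the symbol \sigma_{\hat p} is notation added here, not something from the post): measure how many standard deviations the mean p sits from the boundaries 0 and 1.

```latex
\sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}},
\qquad
\frac{p - 0}{\sigma_{\hat p}} = \sqrt{\frac{np}{1-p}} \;\ge\; \sqrt{np},
\qquad
\frac{1 - p}{\sigma_{\hat p}} = \sqrt{\frac{n(1-p)}{p}} \;\ge\; \sqrt{n(1-p)}
```

So when np >= 10 and n(1-p) >= 10, both 0 and 1 are at least sqrt(10) ≈ 3.16 standard deviations away from the mean. With p = 0.1 and n = 100, the SD is sqrt(0.1 * 0.9 / 100) = 0.03, which puts 0 about 3.3 SDs below the mean.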

In the first example, how could we tell which way it was going to be skewed?

Someone correct me if I'm wrong. The mean of the sampling distribution is p (the population proportion). If the sampling distribution is indeed skewed, then when p is closer to 0 than to 1, the "hump" of the distribution sits closer to 0 and the long tail stretches toward 1, so it is skewed to the right, and vice versa.

So as n increases, even very low or very high values of p start to produce normal sampling distributions? Are normal distributions impossible if n < 10?

As for your first question, you might be interested in this video by _jbstatistics_ on YouTube: https://www.youtube.com/watch?v=fuGwbG9_W1c. The part from 6:45 to 7:57 graphs the sampling distribution of the sample proportion with p=0.04 and increasing n, and the part from 8:15 to 8:52 does the same thing with p=0.96. That might answer your question.

As for the second question: First, I think the 10 is more of an arbitrary value than an actual rule. That's because it's quite subjective whether a graph with lower np and/or n(1-p) is normal or not (e.g., the video I linked for your first question gives 15 instead of the 10 used in this video). Second, as you can see in the YouTube video, if np or n(1-p) takes on small values, part of your otherwise quite normal histogram gets cut off at zero (or at one, when n(1-p) is the small value). Higher np and n(1-p) values make the distribution skinnier and therefore prevent the cutoff.

I know your question was posted six months earlier, but hopefully this helps if you are still confused about it.
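
A rough way to see the "cut off" effect numerically, offered as a sketch rather than anything taken from the video: pretend the sampling distribution were exactly normal with mean p and SD sqrt(p(1-p)/n), and ask how much of that curve would fall below 0, which is impossible for a proportion. The function name `normal_mass_below_zero` and the particular values of n are my own choices for illustration.

```python
import math
from statistics import NormalDist


def normal_mass_below_zero(n: int, p: float) -> float:
    """Share of a Normal(p, sqrt(p(1-p)/n)) curve that lies below 0."""
    sd = math.sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=sd).cdf(0.0)


p = 0.04  # same p as in the linked video
for n in (25, 100, 250, 375):  # np = 1, 4, 10, 15
    share = normal_mass_below_zero(n, p)
    print(f"n={n:4d}  np={n * p:5.1f}  mass below 0 = {share:.5f}")
```

The impossible share shrinks quickly as np grows, which fits the idea that a larger np keeps the curve away from the boundary. Whether 10 or 15 counts as "enough" is still a judgment call.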

What does np represent?

That's a good question! np is the mean (the expected number of successes) of a binomial distribution, where n is the sample size and p is the probability of success. Since it's a binomial variable, the probability of success is the same on every trial.

Why do the conditions need to be n*p > 10 and n*q > 10? I thought that n > 30 is what is needed for the sampling distribution of any sample statistic to be normally distributed, according to the central limit theorem. So, the sampling distribution of the sample means will be normally distributed if n > 30. The sampling distribution of the sample proportions will be normally distributed if n > 30? Am I not correct about this? What am I missing?

The conditions n*p > 10 and n*q > 10 ensure that p is not too close to 0 or 1. For any given value of n, if p is too close to 0 or 1, then the distribution of the number of successes in a binomial distribution with n trials and success probability p would be significantly asymmetric about its mean (and so significantly non-normal).
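
To put a number on "significantly asymmetric", here is a minimal sketch using the standard skewness formula for a binomial count, (1 - 2p) / sqrt(np(1-p)). The sample proportion is just the count divided by n, so it has the same skewness. The function name `binomial_skewness` is hypothetical, and the (n, p) pairs are borrowed from the video's examples plus one larger n for comparison.

```python
import math


def binomial_skewness(n: int, p: float) -> float:
    """Skewness of a Binomial(n, p) count: (1 - 2p) / sqrt(n * p * (1 - p))."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))


for n, p in [(50, 0.12), (125, 0.88), (1000, 0.12)]:
    print(
        f"n={n:5d} p={p:.2f}  np={n * p:6.1f}  n(1-p)={n * (1 - p):6.1f}  "
        f"skewness={binomial_skewness(n, p):+.3f}"
    )
```

A positive value means a longer right tail (p near 0) and a negative value a longer left tail (p near 1). For a fixed p the skewness shrinks toward 0 as n grows, which is why the rule involves n and p together rather than n alone.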

If we already know the true population proportion, why are we interested in calculating a sample proportion? We are using the "true" population proportion to validate the normal distribution of the sample, but then why not just work with the (known to be accurate) population data? Why bother with sampling at all in this case?

In short, if the sampling distribution is approximately normal, then we can calculate how likely it is for a sample proportion to deviate from the population proportion by a certain number of standard deviations. In later lessons we will use this to figure out how likely it is that the population proportion is what it is said to be.

For these examples, we know that the sample size is 50 and 125 respectively, but what about the number of samples (each of which consists of 50 or 125 in this case) taken? Obviously, the more the number of samples, the smoother the curve, so it's implicitly assumed that they take many samples (of 50 or 125)?

It doesn't really matter how many samples we take: the proportion of each sample still deviates from the population proportion with a probability that resembles that of a normal distribution. It does, however, take quite a few samples before we can actually see this in a graph.
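
A small simulation can illustrate this point. This is only a sketch: the helper `simulate_sample_proportions`, the fixed seed, and the choices of 20, 200, and 2000 samples are assumptions made for the example, while n = 125 and p = 0.88 come from the video.

```python
import math
import random
import statistics


def simulate_sample_proportions(n, p, num_samples, seed=0):
    """Draw num_samples samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


n, p = 125, 0.88  # the radio example from the video
theoretical_sd = math.sqrt(p * (1 - p) / n)
for num_samples in (20, 200, 2000):
    props = simulate_sample_proportions(n, p, num_samples)
    print(
        f"{num_samples:5d} samples: mean={statistics.mean(props):.4f} "
        f"sd={statistics.stdev(props):.4f} (theory sd={theoretical_sd:.4f})"
    )
```

The underlying distribution is the same in every row. Taking more samples only makes the observed mean and SD (and a histogram, if you draw one) settle closer to it.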

Course: AP®︎/College Statistics > Unit 9

Lesson 4: Sampling distributions for sample proportions

Normal conditions for sampling distributions of sample proportions

Name: Normal conditions for sampling distributions of sample proportions
Uploaded: 2017-12-20T21:00:59Z
Description: Conditions for roughly normal sampling distribution of sample proportions


Prashant Kumar
Posted 6 years ago.
What is the intuition for the rule np >= 10, n(1-p) >= 10?
(17 votes)
- Par M
  Posted 4 years ago.
  Think about the edge case of p = 0.1: then n * 0.1 >= 10, i.e. n >= 100. So if your probability parameter is around 10%, you need a sample size of at least 100 before the SD is tight enough that the left side of the distribution is mostly captured in the [0, 0.1] interval.
  (3 votes)
eawagena
Posted 6 years ago.
So as n increases, even very low or very high values of p start to produce normal sampling distributions?
Are normal distributions impossible if n < 10?
(3 votes)
- matthewbocay2
  Posted 6 years ago.
  As for your first question, you might be interested in this video by jbstatistics on YouTube: https://www.youtube.com/watch?v=fuGwbG9_W1c.
  The part from 6:45 to 7:57 graphs the sampling distribution of the sample proportion with p=0.04 and increasing n, and the part from 8:15 to 8:52 does the same thing with p=0.96. That might answer your question.

  As for the second question:
  First, I think the 10 is more of an arbitrary value than an actual rule. That's because it's quite subjective whether a graph with lower np and/or n(1-p) is normal or not (e.g., the video I linked for your first question gives 15 instead of the 10 used in this video).
  Second, as you can see in the YouTube video, if np or n(1-p) takes on small values, part of your otherwise quite normal histogram gets cut off at zero (or at one, when n(1-p) is the small value). Higher np and n(1-p) values make the distribution skinnier and therefore prevent the cutoff.

  I know your question was posted six months earlier, but hopefully this helps if you are still confused about it.
  (7 votes)
philiphong15
Posted 4 years ago.
In the first example, how could we tell which way it was going to be skewed?
(5 votes)
- Bryan
  Posted 4 years ago.
  Someone correct me if I'm wrong.

  The mean of the sampling distribution is p (the population proportion). If the sampling distribution is indeed skewed, then when p is closer to 0 than to 1, the "hump" of the distribution sits closer to 0 and the long tail stretches toward 1, so it is skewed to the right, and vice versa.
  (2 votes)
Andrea Menozzi
Posted 3 years ago.
What does np represent?
(3 votes)
- George D.
  Posted 3 years ago.
  That's a good question! np is the mean (the expected number of successes) of a binomial distribution, where n is the sample size and p is the probability of success. Since it's a binomial variable, the probability of success is the same on every trial.
  (4 votes)
hellowKhanLearning
Posted 4 years ago.
Why do the conditions need to be n*p > 10 and n*q > 10?

I thought that n > 30 is what is needed for the sampling distribution of any sample statistic to be normally distributed, according to the central limit theorem.

So:

The sampling distribution of the sample means will be normally distributed if n > 30.

The sampling distribution of the sample proportions will be normally distributed if n > 30?

Am I not correct about this?
What am I missing?
(1 vote)
- Ian Pulizzotto
  Posted 4 years ago.
  The conditions n*p > 10 and n*q > 10 ensure that p is not too close to 0 or 1.

  For any given value of n, if p is too close to 0 or 1, then the distribution of the number of successes in a binomial distribution with n trials and success probability p would be significantly asymmetric about its mean (and so significantly non-normal).
  (6 votes)
mohiuddin shojib
Posted 3 years ago.
For the first problem, does 'a shipment of 50 tangerines every day' mean the 'population'? If yes, then how can she sample 50 tangerines out of a population of 50?
(2 votes)
- daniella
  Posted a month ago.
  In the example, the shipment of 50 tangerines every day represents the sample size (n), not the population. The population would be the larger pool from which these tangerines are drawn, potentially encompassing all tangerines supplied by the distributor. Emiliana's daily sample of 50 tangerines is considered a random sample from this larger population. The confusion might arise from the wording, but in the context of statistical sampling, the population refers to the total set of observations that could be made, not just the number in a specific shipment.
  (1 vote)
dfbarbour
Posted 4 years ago.
If we already know the true population proportion, why are we interested in calculating a sample proportion? We are using the "true" population proportion to validate the normal distribution of the sample, but then why not just work with the (known to be accurate) population data? Why bother with sampling at all in this case?
(1 vote)
- Jerry Nilsson
  Posted 4 years ago.
  In short, if the sampling distribution is approximately normal, then we can calculate how likely it is for a sample proportion to deviate from the population proportion by a certain number of standard deviations.

  In later lessons we will use this to figure out how likely it is that the population proportion is what it is said to be.
  (3 votes)
Yuya Fujikawa
Posted 2 years ago.
For these examples, we know that the sample size is 50 and 125 respectively, but what about the number of samples (each of which consists of 50 or 125 in this case) taken? Obviously, the more the number of samples, the smoother the curve, so it's implicitly assumed that they take many samples (of 50 or 125)?
(1 vote)
- Jerry Nilsson
  Posted 2 years ago.
  It doesn't really matter how many samples we take: the proportion of each sample still deviates from the population proportion with a probability that resembles that of a normal distribution.

  It does, however, take quite a few samples before we can actually see this in a graph.
  (2 votes)
Yash Singh
Posted 5 years ago.
How do you see if the sampling distribution will turn out to be uniform?
(1 vote)
paperangel220
Posted 2 years ago.
Can the mean and standard deviation for the sampling distribution of sample proportions still be determined even if the sampling distribution is not a normal distribution (e.g., skewed left or skewed right)?
(1 vote)
- daniella
  Posted a month ago.
  Yes, the mean (μ_p̂) and standard deviation (σ_p̂) of the sampling distribution of sample proportions can still be determined and are meaningful even if the distribution is not normal (e.g., skewed left or right). The mean of the sampling distribution is always equal to the population proportion (p), and the standard deviation is calculated as sqrt(p(1 − p) / n), where n is the sample size. These measures are useful for understanding the distribution's center and spread, respectively, regardless of its shape. However, the normal approximation (and thus the use of z-scores for probability calculations) is more accurate when the distribution is approximately normal, fulfilling the rule-of-thumb criteria.
  (1 vote)
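
As a quick check of those two formulas on a skewed case, here is a minimal sketch that computes the mean and standard deviation directly from the exact binomial probabilities and compares them with p and sqrt(p(1 − p) / n). The function name `sampling_dist_moments` is hypothetical, and n = 50, p = 0.12 is the tangerine example from the video, which fails the rule of thumb.

```python
import math


def sampling_dist_moments(n: int, p: float) -> tuple[float, float]:
    """Exact mean and SD of the sample proportion X/n, where X ~ Binomial(n, p)."""
    # Probability of each possible count of successes k = 0..n
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    variance = sum(((k / n) - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, math.sqrt(variance)


n, p = 50, 0.12  # np = 6, so the distribution is skewed
mean, sd = sampling_dist_moments(n, p)
print(f"exact mean = {mean:.6f}   p = {p}")
print(f"exact sd   = {sd:.6f}   sqrt(p(1-p)/n) = {math.sqrt(p * (1 - p) / n):.6f}")
```

The two pairs of numbers agree even though the shape is skewed, which is the point of the answer above: the formulas describe center and spread, not shape.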

Video transcript

- [Instructor] What we're going to do in this video is think about the conditions under which the sampling distribution of the sample proportions looks roughly normal, the situations in which it looks skewed right, something like this, and the situations in which it looks skewed left, maybe something like that. The conditions that we're going to talk about are a rough rule of thumb: if we take our sample size and multiply it by the population proportion that we care about, and that is greater than or equal to 10, and if we take the sample size and multiply it by one minus the population proportion, and that is also greater than or equal to 10, then, if both of these are true, the rule of thumb tells us that the sampling distribution of the sample proportions is going to be approximately normal in shape.

So with that in our minds, let's do some examples here. This first example says: Emiliana runs a restaurant that receives a shipment of 50 tangerines every day. According to the supplier, approximately 12% of the population of these tangerines is overripe. Suppose that Emiliana calculates the daily proportion of overripe tangerines in her sample of 50. We can assume the supplier's claim is true and that the tangerines each day represent a random sample. What will be the shape of the sampling distribution of the daily proportions of overripe tangerines? Pause this video, think about what we just talked about, and see if you can answer this.

All right, so right over here, we're getting daily samples of 50 tangerines. So for this particular example, our n is equal to 50, and our population proportion, the proportion that is overripe, is 12%, so p is 0.12. So if we take n times p, what do we get? np is equal to 50 times 0.12. Well, 100 times this would be 12, so 50 times this is going to be equal to six, and this is less than 10. So this immediately violates the first condition, and so we know that we're not going to be dealing with a normal distribution.

And so the question is, how is it going to be skewed? The key realization is, remember, the mean of the sampling distribution of the daily proportions is going to be the same thing as our population proportion, so the mean is going to be 12%. So if I were to draw it right over here, where this is 50% and this is 100%, our mean is gonna be right over here at 12%, and so you're gonna have it really high over there, and then it's gonna be skewed to the right. You're gonna have a big long tail. So this is going to be skewed to the right.

Let's do another example. So here we're told: according to a Nielsen survey, radio reaches 88% of children each week. Suppose we took weekly random samples of n equals 125 children from this population and computed the proportion of children in each sample whom radio reaches. What will be the shape of the sampling distribution of the proportions of children the radio reaches? Once again, pause this video and see if you can figure it out.

All right, well, let's just figure out what n and p are. Our sample size here, n, is equal to 125, and our population proportion, the proportion of children that are reached each week by radio, is 88%, so p is 0.88. So now let's calculate np: n is 125, times p, which is 0.88. Is this going to be greater than or equal to 10? Well, we don't even have to calculate this exactly. This is almost 90% of 125. This is actually going to be over 100, so it for sure is going to be greater than 10, so we meet this first condition.

But what about the second condition? We could take n, 125, times one minus p, so this is times 0.12, so this is 12% of 125. Well, even 10% of 125 would be 12.5, so 12% is for sure going to be greater than that, so this too is going to be greater than 10. I didn't even have to calculate it. I could just estimate it, and so we meet that second condition.

So even though our population proportion is quite high, it's quite close to one here, because our sample size is so large, it still will be roughly normal. One way to get the intuition for that: say this is a proportion of zero, this is 50%, and this is 100%, so our mean right over here is gonna be 0.88 for our sampling distribution of the sample proportions. If we had a low sample size, then our standard deviation would be quite large, and so then you would end up with a left-skewed distribution. But we saw before that the higher your sample size, the smaller your standard deviation for the sampling distribution, and so what that does is tighten up the distribution around the mean, and so it's going to look more normal. It's gonna look closer to being normal.

So we'll say approximately normal, because it met our conditions for this rule of thumb. Is it gonna be perfectly normal? No. In fact, if we didn't have this rule of thumb to draw the line, some might even argue that we still have a longer tail to the left than we do to the right, maybe it's skewed to the left. But using this threshold, using this rule of thumb, which is the standard in statistics, this would be viewed as approximately normal.
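
The two worked examples can be checked with a few lines of code. This is a minimal sketch of the rule of thumb as described in the video; the function name `sampling_shape`, the `threshold` parameter, and the wording of the returned labels are assumptions made for illustration.

```python
def sampling_shape(n: int, p: float, threshold: float = 10) -> str:
    """Apply the np >= 10 and n(1-p) >= 10 rule of thumb to a sample proportion."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    if expected_successes >= threshold and expected_failures >= threshold:
        return "approximately normal"
    if p == 0.5:
        # Perfectly symmetric, but too few trials for the rule of thumb
        return "symmetric (rule of thumb not met)"
    # The mean p sits near 0 when p < 0.5, so the long tail points right
    return "skewed right" if p < 0.5 else "skewed left"


print(sampling_shape(50, 0.12))   # tangerines: np = 6
print(sampling_shape(125, 0.88))  # radio: np = 110, n(1-p) = 15
print(sampling_shape(50, 0.88))   # hypothetical: same p as radio, smaller n
```

The third call is not from the video. It only shows that the radio proportion with a sample of 50 would fail the second condition and come out skewed left, matching the instructor's remark about low sample sizes.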