# Sampling distribution of sample proportion part 2

AP.STATS: UNC‑3 (EU), UNC‑3.K (LO), UNC‑3.K.1 (EK), UNC‑3.K.2 (EK)

Building intuition for the sampling distribution of sample proportions using a simulation.

## Want to join the conversation?

• When Sal says that “we saw the relation between the sampling distribution of the sample proportion and a binomial random variable,” is he talking about the ideas in this video? https://www.khanacademy.org/math/statistics-probability/random-variables-stats-library/binomial-random-variables/v/visualizing-a-binomial-distribution
• Hey, thanks for this super video. I am referring to our 10% rule. Based on this, can we have a rule of thumb that a reasonable sample needs to have a size of at least 10% of the population to be studied? Regards
• For a proportion, the normal approximation is generally good if np and n(1-p) are each at least 10. We also want the sample size to be 10% or less of the population size, so that the effects of selection without replacement (instead of with replacement) are small, meaning that the independence assumption gives a good approximation.
• "So, we're gonna do 50 samples of ten at a time." vs. "And so, here, we can quickly get to a fairly large number of samples. So here, we're over a thousand samples." How do these two sentences from the transcript relate to the previous (part 1) video? What is the value of n here: 50 or 1050?
• n is 10 here; we're just taking 50 samples of 10 at once instead of clicking the button 50 times.
• Why isn't the program working?
• Is there a video of this for standard deviation?
• Python code for this, anyone?
• Sal mentions standard deviation in the video. I am confused about why it's standard deviation and not standard error, since we are dealing with a sampling distribution here.
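
The conditions mentioned in the replies above (np and n(1-p) each at least 10, and a sample no larger than 10% of the population) can be checked mechanically. Here is a minimal sketch in Python; the function name `check_conditions` and the machine size of 10,000 gumballs are assumptions made for illustration and do not come from the video.

```python
def check_conditions(n, p, population_size):
    """Return the three rule-of-thumb checks as a dict of booleans."""
    return {
        "np >= 10": n * p >= 10,
        "n(1-p) >= 10": n * (1 - p) >= 10,
        "n <= 10% of population": n <= 0.10 * population_size,
    }

# Hypothetical machine of 10,000 gumballs, 60% of them green.
small = check_conditions(n=10, p=0.6, population_size=10_000)
large = check_conditions(n=50, p=0.6, population_size=10_000)

print("n = 10:", small)
print("n = 50:", large)
```

Under these assumed numbers, samples of ten satisfy only the 10% condition, while samples of 50 satisfy all three.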

## Video transcript

- [Instructor] This right over here is a scratch pad on Khan Academy, created by Khan Academy user Charlotte Auen. What you see here is a simulation that allows us to keep sampling from our gumball machine and start approximating the sampling distribution of the sample proportion. Her simulation focuses on green gumballs, but we talked about yellow before, and we said 60% of the gumballs were yellow, so let's make 60% green here. Then let's take samples of ten, just like we did before, and let's start with just one sample. So we're gonna draw one sample, and what we wanna show is the percentages, which are the proportions of each sample that are green. If we draw that first sample, notice that out of the ten, five ended up being green, and it plotted that right over here, under 50%. We have one situation where 50% were green. Now let's do another sample; in this sample, 60% are green. Let's keep going and draw another sample. In that one, 50% are green, and so notice that now we see on this distribution that two of them had 50% green.

Now, we could keep drawing samples, so let's really increase the pace: we're gonna do 50 samples of ten at a time. Here we can quickly get to a fairly large number of samples; we're over a thousand samples. What's interesting is that we're seeing experimentally that the mean of our sample proportion is 0.62. What we calculated a few minutes ago was that it should be 0.6. We also see that the standard deviation of our sample proportion is 0.16, and what we calculated was approximately 0.15. As we draw more and more samples, we should get even closer to those values. And we see that, for the most part, we are getting closer and closer; in fact, now that it's rounded, we are at exactly those values that we had calculated before.
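
The experiment described above can be approximated in a few lines of Python. This is a rough sketch and not the code behind the scratch pad: it assumes each gumball is drawn independently with probability 0.6 of being green (as if the machine were very large), and the helper name `sample_proportion`, the seed, and the number of samples are illustrative choices.

```python
import math
import random
import statistics

def sample_proportion(n, p, rng):
    """Draw n gumballs independently; return the fraction that are green."""
    greens = sum(rng.random() < p for _ in range(n))
    return greens / n

rng = random.Random(0)  # fixed seed (an assumption) so runs are repeatable
p, n, num_samples = 0.6, 10, 10_000

# One sample proportion per simulated sample of n gumballs.
proportions = [sample_proportion(n, p, rng) for _ in range(num_samples)]

sim_mean = statistics.fmean(proportions)
sim_sd = statistics.pstdev(proportions)
theory_sd = math.sqrt(p * (1 - p) / n)  # standard deviation of a sample proportion

print(f"simulated mean {sim_mean:.3f}, theoretical mean {p}")
print(f"simulated sd   {sim_sd:.3f}, theoretical sd   {theory_sd:.3f}")
```

With many samples, the simulated mean and standard deviation should land close to 0.6 and roughly 0.15, the values quoted in the video.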
Now, one interesting thing to observe is that when your population proportion is not too close to zero and not too close to one, this looks pretty close to a normal distribution. That makes sense, because we saw the relation between the sampling distribution of the sample proportion and a binomial random variable.

But what if our population proportion is closer to zero? Let's say our population proportion is ten percent, 0.1. What do you think the distribution is going to look like then? Well, we know that the mean of our sampling distribution is going to be ten percent, so you could imagine that the distribution is going to be right skewed. But let's actually see that. Here we see that our distribution is indeed right skewed. That makes sense, because you can only get values from zero to one, and if your mean is closer to zero, then you're gonna see the meat of your distribution here, and a long tail to the right, which creates that right skew. And if your population proportion were close to one, you can imagine the opposite is going to happen: you're going to end up with a left skew. And we indeed see, right over here, a left skew.

Now, the other interesting thing to appreciate is that the larger your samples, the smaller the standard deviation. So let's do a population proportion that is right in between. Here, this is similar to what we saw before; this is looking roughly normal. That's when we had a sample size of ten, but what if we have a sample size of 50 every time? Well, notice that now it looks like a much tighter distribution. This isn't even going all the way to one yet, but it is a much tighter distribution. The reason that makes sense is that the standard deviation of your sample proportion is inversely proportional to the square root of n.
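
Both observations, the skew near zero or one and the tighter spread for larger samples, can be checked numerically. The sketch below assumes independent draws; the helper names `simulate` and `skewness`, the seed, the 5,000 samples per setting, and the choice of 0.5 as the "in between" proportion are assumptions for illustration. Positive skewness indicates a right skew and negative skewness a left skew.

```python
import math
import random
import statistics

def simulate(n, p, num_samples, rng):
    """Return num_samples sample proportions, each from n independent draws."""
    return [sum(rng.random() < p for _ in range(n)) / n
            for _ in range(num_samples)]

def skewness(values):
    """Population skewness: mean cubed deviation divided by sd cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3

rng = random.Random(1)  # fixed seed (an assumption) for repeatability

# Skew for proportions near 0, in the middle, and near 1 (sample size 10).
skews = {p: skewness(simulate(10, p, 5_000, rng)) for p in (0.1, 0.5, 0.9)}
for p, s in skews.items():
    print(f"p = {p}: skewness {s:+.2f}")

# Spread for sample sizes 10 and 50 at p = 0.5.
sds = {n: statistics.pstdev(simulate(n, 0.5, 5_000, rng)) for n in (10, 50)}
for n, sd in sds.items():
    theory = math.sqrt(0.5 * 0.5 / n)
    print(f"n = {n}: simulated sd {sd:.3f}, theoretical sd {theory:.3f}")
```

Since the standard deviation scales with one over the square root of n, going from samples of 10 to samples of 50 should shrink it by a factor of about the square root of 5, a bit more than 2.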
So, hopefully you have a good intuition now for the sample proportion and its distribution, the sampling distribution of the sample proportion, and you know that you can calculate its mean and its standard deviation. And you feel good about it, because we saw it in a simulation.