Course: AP®︎/College Statistics > Unit 9
Lesson 5: Sampling distributions for differences in sample proportions
© 2023 Khan Academy
Sampling distribution of the difference in sample proportions
We can calculate the mean and standard deviation for the sampling distribution of the difference in sample proportions. Also, we can tell if the shape of that sampling distribution is approximately normal. Created by Sal Khan.
Want to join the conversation?
- Great video. For people who are confused by the formula at around 4 minutes, I googled it :) and here is the answer:
"The variance of X/n is equal to the variance of X divided by n², or (np(1-p))/n² = (p(1-p))/n. This formula indicates that as the size of the sample increases, the variance decreases." (2 votes)
- When we are trying to find the standard deviation of the difference of sample proportions, why don't we just calculate the difference between the two samples' standard deviations? (1 vote)
Video transcript
- [Instructor] We're told, suppose that 8% of all cars produced at plant
A have a certain defect, and 6% of all cars produced
at plant B have this defect. Each month, a quality control manager takes separate random samples of 200 of the over 3000 cars
produced from each plant. The manager looks at the difference between the proportions
of cars with the defect in each sample. So they're looking at the difference of sample proportions every month. Describe the distribution of the difference of sample proportions in terms of its mean,
standard deviation, and shape. So let's take these step by step. So first, let's think about
the mean of the difference of our sample proportions. Pause this video and try to figure out
what that's going to be. Well, we have seen this
in previous videos, that the mean of the difference of two random variables is the same as the
difference of the means. Or another way to think about it is, if we want to figure out the mean of this, so sample proportion from plant A minus sample proportion from plant B, this is just going to be equal to the mean of the sample proportion from plant A, minus the mean of the sample
proportion from plant B. Now, what are these going to be equal to? Well, the mean of the
sample proportion from plant A is just going to be the
true population proportion for plant A. And they tell us that. They tell us that 8% of all
cars produced at plant A have a certain defect. So this could be 8% or we
could write it as 0.08. And then from that, we are going to subtract the
mean of the sample proportion from plant B. And we know what that mean's going to be. The mean of a sample proportion is going to be the population proportion. The parameter of the population, which we know for plant B is 6%, 0.06, and then that gets us a
mean of the difference of 0.02, so a difference of 2 percentage points in defect
rate would be the mean. Now let's think about
the standard deviation. So instead of thinking in
terms of standard deviation, let's think about the square
of the standard deviation, which is variance. And from there, we can go
back to standard deviation by taking a square root. So let me write it this way: we're looking at the variance of the difference of
the sample proportions, so the sample proportion from plant A minus the sample proportion from plant B. Just as a review, if you assume that we're
sampling independently from each of the plants, so what we're sampling from plant A does not affect what we're
sampling from plant B or vice versa, then we
can add the variances. So this is going to be
equal to the variance of the sample proportion from plant A plus the variance of the
sample proportion from plant B. Some of you might be saying, "Wait, aren't we taking the difference of sample proportions here? Why are we adding?" And the reminder is, remember, variance is
a measure of spread. And whether you're
taking the difference of independent random variables or you're
taking the sum of them, each variable contributes its own variability, so you're going to have more spread. So regardless of whether
this sign is a negative or a positive over here, this sign
is going to be a positive. So what is this going to be equal to? Let's take each of these terms. What's going to be the variance of the sample proportion from plant A? Well, if every time we
looked at one of the cars, we looked at it and then we
put it back into the mix. So if we were sampling with replacement, which means that each of our observations is independent of the other
ones, we have a formula. We know that this variance would be the population
proportion of plant A times one minus the population
proportion of plant A divided by the number that
we sampled from plant A. Now, in the scenario that
we are talking about, we didn't sample with replacement, we just took 200 at a
time and looked at them. We didn't take one at a time and replace it and do that 200 times. But we also know that this is
a pretty good approximation, even when you are not
sampling with replacement, if your sample is less
than 10% of the population, and 200 is less than 10% of 3000, which is 300. So this is a pretty good approximation, and it's what you would use in a
first-year statistics class. And of course, we can use the same logic for plant B. Its variance is going to be equal to the population proportion in plant B times one minus the population
proportion in plant B, all of that over your
sample size from plant B. And we know all of these things. We know that your population
proportion in plant A is 8% or 0.08. One minus that is 0.92. We're taking samples of
200 at a time from plant A. And then in plant B, we know
the population proportion, they told us, is 6% or 0.06. One minus that is 0.94. And then the sample size from plant B is also going to be 200. We get 0.08 times 0.92
divided by 200 and then plus, let's open parentheses here, 0.06 times 0.94 divided by 200, and then let me
close the parentheses, and that equals 0.00065. And then from this, we can figure out what the
standard deviation is going to be. The standard deviation of the difference between our sample proportions is going to be just the
square root of this. It's going to be the
square root of 0.00065. And that is approximately equal to, let's just take the square
root, 0.025. And there you have it, we have thought about
the standard deviation. And then last but not least,
let's think about the shape. So just as a review, we just
have to remind ourselves that the sampling distribution of
each sample proportion is going to be approximately normal as long as we expect at least
10 successes and 10 failures. Well, let's look at each of these. How many successes do you expect, where a success would
actually be a defect? For plant A, 8% of a sample of 200 is 16. So you would expect 16 defects, and then you would expect 200 minus 16, or 184, cars with no defects, which is a lot larger
than 10. So both of those are
greater than or equal to 10. And then if you did the
same thing for plant B, you get the same idea. 6% of 200 is 12. And then if you say the
ones that have no defects, that's 200 minus 12, or 188,
which is way more than 10. So in every situation, we expect to have at least
10 successes and 10 failures. And so we can assume that the
distributions of each of these are going to be approximately normal. And we also know that the difference of two independent, normally distributed
variables is also normal, so since both samples pass that
large count condition that we just talked about, the difference is approximately normal too. And so let's draw what this
distribution might look like. It might look something like this. It's going to be an approximately normal distribution where you have a mean right over here. I'll do that in that same color. A mean of 0.02. The difference can definitely take on negative values because there are some situations in which your sample
proportion from plant B actually could be larger,
just by random chance, than it is from plant A. So the difference can definitely
take on negative values. And if I wanted to show where zero is, maybe zero is right over here, so we could draw an axis right over here. And then we know what the
standard deviation is. It's 0.025 or it's approximately that. So if we were to go one
standard deviation down, we would go right about there, and if we were to go one
standard deviation up, we would go right about there. And obviously, we could go more
than one standard deviation above or below that mean.
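
The three steps worked through above (mean, standard deviation, shape) can be sketched in a few lines of Python. This is only an illustrative sketch and is not part of the video: the function names `diff_of_proportions_summary` and `simulate_differences` are hypothetical, and the simulation assumes every car is an independent draw (as if sampling with replacement), which the 10% condition suggests is a reasonable approximation here. The simulated values will vary with the seed and the number of trials, but they should land close to the formula values.

```python
import math
import random


def diff_of_proportions_summary(p_a, n_a, p_b, n_b):
    """Mean, standard deviation, and large-counts check for p_hat_A - p_hat_B."""
    # The mean of the difference is the difference of the means.
    mean = p_a - p_b
    # Variances add for independent samples, even though the proportions are subtracted.
    variance = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
    sd = math.sqrt(variance)
    # Expected successes and failures in each sample (large counts condition).
    expected_counts = [n_a * p_a, n_a * (1 - p_a), n_b * p_b, n_b * (1 - p_b)]
    approx_normal = all(count >= 10 for count in expected_counts)
    return mean, sd, approx_normal


def simulate_differences(p_a, n_a, p_b, n_b, trials, seed=0):
    """Simulate p_hat_A - p_hat_B, assuming each car is an independent draw."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        defects_a = sum(rng.random() < p_a for _ in range(n_a))
        defects_b = sum(rng.random() < p_b for _ in range(n_b))
        diffs.append(defects_a / n_a - defects_b / n_b)
    return diffs


# Values from the problem: 8% and 6% defect rates, samples of 200 from each plant.
mean, sd, approx_normal = diff_of_proportions_summary(0.08, 200, 0.06, 200)
print(f"mean = {mean:.3f}, sd = {sd:.4f}, approximately normal: {approx_normal}")
# prints: mean = 0.020, sd = 0.0255, approximately normal: True

# Rough check by simulation (the trial count is an arbitrary choice).
diffs = simulate_differences(0.08, 200, 0.06, 200, trials=5000)
sim_mean = sum(diffs) / len(diffs)
sim_sd = math.sqrt(sum((d - sim_mean) ** 2 for d in diffs) / (len(diffs) - 1))
print(f"simulated mean = {sim_mean:.3f}, simulated sd = {sim_sd:.4f}")
```

The simulation also shows why negative differences are possible: in some simulated months, plant B's sample has a higher defect proportion than plant A's just by chance.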