Estimating a population proportion
Margin of error 2
Where we left off in the last video, I gave you a question: find an interval so that we're reasonably confident-- we'll talk a little bit more about why I have to give this kind of vague wording right here-- that there's a 95% chance that the true population mean, which is p, is in that interval. And p is the same thing as the mean of the sampling distribution of the sample mean. To do that, let me just throw out a few ideas. What is the probability that if I take a sample and take the mean of that sample, that random sample mean is within two standard deviations of the mean of the sampling distribution? So what is this probability right over here? Let's just look at our actual distribution. So this is our distribution, and this right here is its mean. Maybe I should do it in blue because that's the color up here. This is the mean of the sampling distribution. And so what is the probability that a random sample mean is going to be within two standard deviations of it? Well, a random sample mean is a sample from this distribution. It is a sample from the sampling distribution of the sample mean. So it's literally, what is the probability of finding a sample within two standard deviations of the mean? That's one standard deviation, that's another standard deviation right over there. In general, if you haven't committed this to memory already, it's not a bad thing to commit to memory: if you have a normal distribution, the probability of taking a sample within two standard deviations of the mean is 95-- and if you want to get a little bit more accurate it's 95.4%. But you could say it's roughly-- or maybe I could write it like this-- it's roughly 95%.
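The 95.4% figure can be checked numerically. This is a minimal sketch, assuming an exactly normal distribution and using only Python's standard library; the function name `prob_within` is an illustrative choice, not something from the video.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal draw lands within k standard deviations of its mean."""
    # For a normal distribution, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

two_sd = prob_within(2)
print(f"{two_sd:.4f}")  # 0.9545
```

Rounding 0.9545 gives the 95.4% quoted here, which people usually shorten to "about 95%".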
And really that's all that matters because we have this little funny language here called reasonably confident, and we have to estimate the standard deviation anyway. In fact, if we want, we could say that it's exactly equal to 95.4%. But in general, two standard deviations, 95%, that's what people equate with each other. Now this statement is the exact same thing as saying that the probability that the mean of the sampling distribution is within two standard deviations of the sampling distribution of a sample mean x is also equal to 95.4%. These are the exact same statements. If x is within two standard deviations of the mean, then the mean is within two standard deviations of x. These are just two ways of phrasing the same thing. Now we know that the mean of the sampling distribution is the same thing as the mean of the population distribution, which is the same thing as the parameter p-- the proportion of the population that is a 1. So this right here is the same thing as the population mean, and in this statement we can switch it with p. So the probability that p is within two standard deviations of the sampling distribution of x is 95.4%. Now we don't know what this standard deviation right here is. But we have estimated it. Remember, it is the true standard deviation of the population divided by 10, the square root of our sample size of 100. We can estimate the true standard deviation of the population with our sample standard deviation, which was 0.5, so we get 0.5 divided by 10. Our best estimate of the standard deviation of the sampling distribution of the sample mean is 0.05.
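The arithmetic behind that 0.05 can be written out as a short sketch. The variable names are illustrative assumptions; the numbers (a sample of 100 and a sample standard deviation of about 0.5) are the ones used in the video.

```python
import math

n = 100          # sample size from the example
sample_sd = 0.5  # sample standard deviation quoted in the video (rounded)

# Estimated standard deviation of the sampling distribution of the sample mean:
# the sample standard deviation divided by the square root of the sample size.
standard_error = sample_sd / math.sqrt(n)
print(standard_error)  # 0.05
```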
So now we can say-- and I'll switch colors-- the probability that the parameter p, the proportion of the population saying 1, is within two times 0.05 (remember, 0.05 is our best estimate of this standard deviation right here) of a sample mean that we take is equal to 95.4%. And 2 times 0.05 is 0.10, so we could say the probability that p is within 0.10 of our mean is equal to 95-- and actually let me be a little careful here. I can't say equal now, because over here, if we knew this parameter of the sampling distribution of the sample mean, we could say that it is 95.4%. We don't know it. We are just trying to find our best estimator for it. So what I'm going to do here is just say it is roughly-- and just to show that we don't even have that level of accuracy, I'm going to say roughly 95%. We're reasonably confident that it's about 95% because we're using this estimator that came out of our sample, and if the sample is really skewed this is going to be a really weird number. So this is why we have to be a little bit more careful about what we're claiming. But this is the tool for at least saying how good our result is. So this is going to be about 95%. Or we could say the same about the probability that p is within 0.10 of the sample mean that we actually got. So what was the sample mean that we actually got? It was 0.43. So the probability that p is within 0.43 plus or minus 0.1 is also, roughly, we're reasonably confident, about 95%. And I want to be very clear. Everything that I started all the way from up here in brown to yellow and all this magenta, I'm just restating the same thing. It became a little bit more loosey-goosey once I went from the exact standard deviation of the sampling distribution to an estimator for it.
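Going from the sample mean and the estimated standard deviation to the interval is a two-line computation. The sketch below assumes the values from the video (0.43 and 0.05); the helper name `two_sd_interval` is hypothetical.

```python
def two_sd_interval(sample_mean: float, standard_error: float) -> tuple[float, float]:
    """Interval of plus or minus two estimated standard deviations around the sample mean."""
    margin = 2 * standard_error
    return sample_mean - margin, sample_mean + margin

low, high = two_sd_interval(0.43, 0.05)
print(f"{low:.2f} to {high:.2f}")  # 0.33 to 0.53
```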
And that's why this is just becoming-- I kind of put the squiggly equal signs there to say we're reasonably confident-- and I even got rid of some of the precision. But we just found our interval. An interval that we can be reasonably confident that there's a 95% probability that p is within is going to be 0.43 plus or minus 0.1. Or, put another way, we have a confidence interval. We have a 95% confidence interval, and 0.43 minus 0.1 is 0.33. If we write that as a percent we could say 33% to-- and if we add the 0.1, 0.43 plus 0.1, we get 53%-- to 53%. So we are 95% confident. So we're not saying precisely that the probability of the actual proportion being in there is 95%, but we're 95% confident that the true proportion is between 33% and 53%. That p is in this range over here. Or another way, and you'll see this in a lot of surveys that have been done, people will say we did a survey and we got that 43% will vote for number one, and number one in this case is candidate B. And then the other side, since everyone else voted for candidate A, 57% will vote for A. And then they're going to put on a margin of error. And you'll see this in any survey that you see on TV. They'll put a margin of error. And the margin of error is just another way of describing this confidence interval. And they'll say that the margin of error in this case is 10%, which means that you get a 95% confidence interval if you go plus or minus 10% from that value right over there. And I really want to emphasize, you can't say with certainty that there is a 95% chance that the true result will be within 10% of this, because we had to estimate the standard deviation of the sampling distribution of the sample mean. But this is the best measure we can get with the information we have. If you're going to do a survey of 100 people, this is the best kind of confidence that we can get. And this number is actually fairly big.
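One way to get a feel for "95% confident" is to simulate many surveys and count how often the plus-or-minus-two-standard-deviations interval captures the true proportion. This is only a sketch under stated assumptions: in a simulation we have to pick a true p (here 0.43, purely for illustration, since the real p is unknown), and the function name, trial count, and seed are arbitrary choices.

```python
import random
import statistics

def coverage(p: float = 0.43, n: int = 100, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated surveys whose interval contains the assumed true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each respondent answers 1 with probability p, otherwise 0.
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        mean = statistics.fmean(sample)
        # Estimate the standard deviation of the sampling distribution from the sample itself.
        se = statistics.stdev(sample) / n ** 0.5
        if abs(mean - p) <= 2 * se:
            hits += 1
    return hits / trials

rate = coverage()
print(rate)  # expected to land somewhere near 0.95
```

The exact fraction depends on the seed and the number of trials, which fits the "roughly 95%" wording: the interval is built from an estimated standard deviation, so the coverage is approximate.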
So if you were to look at this you would say, roughly there's a 95% chance that the true value of this number is between 33% and 53%. So there's actually still a chance that candidate B can win, even though only 43% of your 100 are going to vote for him. If you wanted to make it a little bit more precise you would want to take a bigger sample. You can imagine. Instead of sampling 100 people, instead of n being 100, if you made n equal 1,000, then you would take this number over here, the sample standard deviation, and divide by the square root of 1,000 instead of the square root of 100. So you'd be dividing by about 31.6 instead of 10. And so then the size of the standard deviation of your sampling distribution will go down. And so the distance of two standard deviations will be a smaller number, and so then you will have a smaller margin of error. And maybe you want to get the margin of error small enough so that you can figure out decisively who's going to win the election.
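The effect of the sample size on the margin of error can be sketched directly. This assumes, for illustration only, that the sample standard deviation stays at about 0.5 in the larger survey; the function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(sample_sd: float, n: int) -> float:
    """Two estimated standard deviations of the sampling distribution of the sample mean."""
    return 2 * sample_sd / math.sqrt(n)

for n in (100, 1000):
    # Assumes the sample standard deviation is still about 0.5 for the bigger sample.
    print(n, round(margin_of_error(0.5, n), 4))
# 100 0.1
# 1000 0.0316
```

Under that assumption, going from 100 to 1,000 respondents shrinks the margin of error from about 10% to about 3.2%.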