AP.STATS: UNC‑4 (EU), UNC‑4.A (LO), UNC‑4.A.1 (EK), VAR‑1 (EU), VAR‑1.H (LO), VAR‑1.H.1 (EK)

It's election season, and there's a runoff between candidate A and candidate B. We are pollsters, and we're interested in figuring out the likelihood that candidate A wins this election. Ideally, we would go to the entire population of likely voters. Let's say there are 100,000 likely voters, and we would ask every one of them, "Who do you support?" From that, we would be able to get the population proportion p, the proportion that supports candidate A. But it might not be realistic. In fact, it definitely will not be realistic to ask all 100,000 people. So instead we do the thing that we tend to do in statistics: we sample this population, and we calculate a statistic from that sample in order to estimate this parameter.

So let's say we take a sample of size n = 100, and we calculate the sample proportion that supports candidate A. Out of the hundred, let's say that 54 say they're going to support candidate A, so the sample proportion, p-hat, is 0.54. Just to appreciate that we're not always going to get 0.54: we could have sampled a different hundred people and gotten a different sample proportion, maybe 0.58.

We already have the tools in statistics to think about the distribution of the possible sample proportions we could get; we talked about it when we covered sampling distributions. So we have the sampling distribution of the sample proportions, and this distribution is specific to our sample size, n = 100. It describes the possible sample proportions we could get and their likelihoods. It will look something like this. Because our sample size is so much smaller than
the population (way less than 10% of it), we can assume that the responses of the people we ask are approximately independent. Also, if we assume that the true proportion isn't too close to zero or too close to one (more precisely, that n times p and n times (1 - p) are both at least 10), then we can say the sampling distribution is roughly normal, with this bell-curve shape.

We already know a lot about the sampling distribution of the sample proportions (if this is foreign to you, I encourage you to watch the videos on it on Khan Academy). We know that the mean of this sampling distribution is the actual population proportion, p, and we also know its standard deviation. Let me mark that off: this is one standard deviation, two standard deviations, three standard deviations above the mean, and one, two, three standard deviations below the mean. This distance, in a different color, is the standard deviation of the sample proportions for this sampling distribution, and we've already seen the formula: it's the square root of p times (1 - p), divided by our sample size n, where p is once again our population proportion. That's why the distribution is specific to n = 100 here.

Now let's focus on this first scenario, where we took a sample of size n = 100 and got a sample proportion of 0.54. We could have gotten all sorts of outcomes: maybe 0.54 is over here, or maybe 0.54 is over here. The reason for this uncertainty is that we don't actually know the real population parameter, the real population proportion. But let me ask you a slightly easier question: what is the probability that the sample proportion we get from a sample of 100 is within two standard deviations of p? Pause the video and think about that. Well, that's just saying: look,
if I'm going to take a sample and calculate the sample proportion, what's the probability that I'm within two standard deviations of the mean? That's essentially this area right over here, and we know from studying normal curves that approximately 95% of the area is within two standard deviations. So approximately 95% of the time that I take a sample of size 100 and calculate the sample proportion, I'm going to be within two standard deviations of p.

But you can take this statement and construct another one that feels a little bit more, I guess we could say, inferential: there is a 95% probability that the population proportion p is within two standard deviations of p-hat, the sample proportion we calculate (in our case, p-hat came out to 0.54). Pause the video and appreciate that these two statements are equivalent. If there's a 95% chance that our sample proportion is within two standard deviations of the true proportion, that's equivalent to saying there's a 95% chance that the true proportion is within two standard deviations of our sample proportion. This is really interesting, because if we were able to figure out what this standard deviation is, then we could create what you could call a confidence interval.

Now, you might immediately see a problem here: in order to calculate the standard deviation of this distribution, we have to know the population parameter. So pause this video and think about what we would do instead if we don't know p, our population proportion. Do we have something we could use as an estimate for it? Well, yes: we already calculated p-hat, our sample proportion. So a new statistic we could define is the standard error of the sample proportion, and since we don't know the population proportion, we're going to
use the sample proportion: p-hat times (1 - p-hat), all of that over n, and then the square root of the whole thing. In this case, of course, n is 100, which we do know. It turns out, and I'm not going to prove it in this video, that this is a good estimate of the standard deviation of the sampling distribution. Here it's the square root of 0.54 times (1 - 0.54), which is 0.46, all of that divided by 100. Typing that into a calculator and rounding to the nearest thousandth gives approximately 0.050, so the standard error is approximately 0.05.

So we don't know the standard deviation exactly, but we now have an estimate for it, and we can say with 95% confidence (that 95% is often called our confidence level) that a proportion between 0.44 and 0.64 of voters supports candidate A. To get those endpoints, we go two standard errors below the sample proportion we happened to calculate: 0.54 minus 2 times 0.05, which is 0.54 minus 0.10, or 0.44. And we go two standard errors above the sample proportion: 0.54 plus 0.10, or 0.64. This interval, from 0.44 to 0.64, is known as our confidence interval. It depends on the sample proportion we happen to get from our sample of 100: not just its starting point and end point change, but its actual length changes too, because the standard error is calculated from p-hat.

A related idea is the margin of error. For this particular sample, because we care about 95% confidence, the margin of error is two standard errors: two times our standard error, which is 0.1, or 0.10. We go one margin of error above our sample proportion and one margin of error below it to define our confidence interval. As I mentioned, this margin of error is not fixed: each time we take a sample, our sample proportion affects the margin of error, because the margin of error is calculated from the standard error.

Another interpretation is about the method we used to get this confidence interval. When we use it over and over, it will produce intervals that won't always be the same, since they depend on the sample proportion, but those intervals will include the true proportion (which we often don't know) about 95% of the time. I'll cover that intuition more in future videos, where we'll see how the interval and the margin of error change, but when you do this calculation over and over again, about 95% of the time the true proportion will be contained in whatever interval you happen to calculate.

Now, another interesting question: what if you wanted to tighten up the intervals on average? How would you do that? If you want to lower your margin of error, the best way is to increase the denominator inside the square root, and increasing that denominator means increasing the sample size. Because n sits under a square root, quadrupling the sample size cuts the margin of error roughly in half. So one thing you'll often see in election coverage is people saying we need to sample more people in order to get a lower margin of error. But I'll leave you there, and I'll see you in future videos.
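The standard error, confidence interval, and margin of error calculations above can be sketched in a few lines of Python. This is a minimal illustration, not something from the video itself: the function names are hypothetical, the multiplier of 2 standard errors follows the video's rough 95% rule (1.96 is the more precise value), and the true proportion of 0.52 in the coverage simulation is a made-up value, since in a real poll p is unknown. The simulation models each respondent as an independent draw, which the 10% condition says is a reasonable approximation.

```python
import math
import random


def standard_error(p_hat: float, n: int) -> float:
    """Estimated standard deviation of the sample proportion: sqrt(p_hat(1 - p_hat)/n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval(p_hat: float, n: int, z: float = 2.0) -> tuple[float, float]:
    """Go z standard errors below and above p_hat (z = 2 matches the video's rough 95%)."""
    margin = z * standard_error(p_hat, n)
    return p_hat - margin, p_hat + margin


def coverage_rate(p_true: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated intervals that contain p_true.

    p_true is a hypothetical population proportion; in a real poll it is unknown.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each respondent independently supports A with probability p_true
        # (an approximation that is reasonable when n is under 10% of the population).
        supporters = sum(rng.random() < p_true for _ in range(n))
        low, high = confidence_interval(supporters / n, n)
        if low <= p_true <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    se = standard_error(0.54, 100)
    low, high = confidence_interval(0.54, 100)
    print(f"standard error: {se:.3f}")                  # 0.050
    print(f"95% interval: ({low:.2f}, {high:.2f})")     # (0.44, 0.64)
    # Quadrupling n halves the margin of error, because n sits under a square root.
    print(f"margin, n=100: {2 * standard_error(0.54, 100):.3f}")  # 0.100
    print(f"margin, n=400: {2 * standard_error(0.54, 400):.3f}")  # 0.050
    print(f"coverage, hypothetical p=0.52: {coverage_rate(0.52, 100, 10_000):.3f}")
```

The first two lines of output reproduce the numbers worked out by hand in the video. The simulated coverage depends on the seed and number of trials; for this simple interval it should land in the neighborhood of 95%, often a little below.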