Statistics and probability
Finding the 95% confidence interval for the proportion of a population voting for a candidate. Created by Sal Khan.
- I don't understand why the confidence interval doesn't take into account the size of the total population. It is interesting to me that the margin of error would have been 10% even if the population were 105 people, in which case a sample of 100 is much more powerful and precise than the same sample out of a population of 100 million. Can anyone please help clarify this concept?(37 votes)
- You are right that population size matters somewhat, although in many real-world examples the sample size is a tiny fraction of the population. As sample size gets to be 10% or more of the population, a "correction factor" can be used to scale the confidence interval to account for extra precision. More details here: <http://www.childrensmercy.org/stats/size/population.asp>. Also, in your example with a big relative sample size, margin of error depends on the sampling method (with or without replacement, i.e., whether you allow picking the same sample/unit/person multiple times): http://en.wikipedia.org/wiki/Simple_random_sample(22 votes)
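As a rough sketch of the "correction factor" mentioned above, the code below applies the standard finite population correction, sqrt((N - n) / (N - 1)), to the margin of error. The function name `margin_of_error` and its parameters are made up for illustration, and the numbers (a sample proportion of 0.43, n = 100, a multiplier of 2) are borrowed from the video. The correction assumes sampling without replacement.

```python
import math

def margin_of_error(p_hat, n, z=2.0, population_size=None):
    """Margin of error for a sample proportion.

    If population_size is given, apply the finite population
    correction (assumes sampling without replacement).
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return z * se

# Same sample of 100, two very different population sizes.
moe_large = margin_of_error(0.43, 100, population_size=100_000_000)
moe_small = margin_of_error(0.43, 100, population_size=105)

print(f"population of 100 million: +/- {moe_large:.3f}")
print(f"population of 105:         +/- {moe_small:.3f}")
```

With a population of 100 million the correction is negligible, while with a population of 105 the margin shrinks to roughly a fifth of its uncorrected size.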
- At 01:02: Why is there 2 sigma of the sampling mean?(11 votes)
- Good question. Sal did something different here than in previous videos. Previously, when finding the 95% confidence interval, he looked up the Z-score on a Z-table. Since Z-tables are organized by percentile (the entire area to the left of the confidence limit), he first had to say, "A 95% interval corresponds to the 95 / 2 + 50 = 97.5th percentile."
Then he looked up that percentile, 0.9750, on the Z-table and got a Z-score of 1.96. Finally, he multiplied the Z-score by the standard deviation of the sampling distribution, sigma(x-bar). If you do that here, you get (1.96)(0.05) = 0.098. That is the more precise margin of error for a 95% confidence interval.
But in this video, Sal used a rule of thumb that says 95% confidence is approximately equal to 2 standard deviations around the mean. So he used an approximate Z-score of 2 instead of the more precise 1.96. Doing this, he got a margin of error of 0.1 rather than 0.098.
It's a good rule of thumb, but to be strictly accurate, you should just remember that 1.96 is ALWAYS the Z-score for a 95% confidence interval (unless you have a small sample size and are using a t-table).(28 votes)
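The table lookup described above can be reproduced with Python's standard library. This is only a sketch: the variable names are illustrative, and the standard error of 0.05 is the estimate used in the video.

```python
from statistics import NormalDist

confidence = 0.95
# Z-tables give the area to the left, so a central 95% region
# ends at the 97.5th percentile.
percentile = confidence / 2 + 0.5
z_star = NormalDist().inv_cdf(percentile)

se = 0.05  # estimated standard deviation of the sampling distribution
exact_margin = z_star * se
rule_of_thumb_margin = 2 * se

print(f"z* = {z_star:.4f}")
print(f"margin with z*: {exact_margin:.4f}")
print(f"margin with 2:  {rule_of_thumb_margin:.4f}")
```

The two margins differ by about 0.002, which is why the rule of thumb is usually good enough for a quick estimate.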
- Can I say that if I have a good number of samples, 95% of the means of those samples will fall within the range of the confidence interval? My teacher emphasized that we couldn't say the population mean has a 95% chance of being in that interval, because the population mean is a constant. It is either in that range or not. My interpretation is that we are confident that 95% of the time our sample means will fall within the range we construct around the true population mean in the sampling distribution of the population. Is that correct?(6 votes)
- If we had the population standard deviation (sigma), which I don't think we ever do, then it seems to me that everything you say is the correct way to look at it. But since we have only our sample standard deviation (s), doesn't the 95% have a little bit of uncertainty? I think the SEM uses s/sqrt(n) while the central limit theorem uses sigma/sqrt(n).(0 votes)
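One way to explore the repeated-sampling interpretation discussed in this thread is a small simulation. The sketch below is not from the video. It assumes a true proportion of 0.43 (in practice unknown), draws many samples of 100, builds a z interval from each sample using that sample's own estimated standard error, and counts how often the interval contains the true proportion. The helper `interval_covers` is a made-up name.

```python
import math
import random

def interval_covers(p_true, n, z, rng):
    """Draw one sample of n yes/no answers and report whether the
    z interval built from that sample contains the true proportion."""
    successes = sum(rng.random() < p_true for _ in range(n))
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated from the sample
    return p_hat - z * se <= p_true <= p_hat + z * se

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 5000
hits = sum(interval_covers(0.43, 100, 1.96, rng) for _ in range(trials))
coverage = hits / trials

print(f"{coverage:.3f} of the intervals contained the true proportion")
```

The fraction should land close to 0.95, though usually not exactly on it, partly because each interval relies on an estimated standard error. The 95% describes how often the procedure captures the fixed parameter, not the behavior of any single interval.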
- What's the difference between a frequentist confidence interval and a Bayesian credible interval?(10 votes)
- How come there's a greater probability for candidate A to win even though more people are voting for candidate B? I mean, I get the calculation and everything but how is this possible?(2 votes)
- There is not a greater chance that candidate A will win. Candidate B is most likely to win but Sal is only trying to make the point that candidate A still could win although it is unlikely.(5 votes)
- Am I correct that the margin of error is INcorrect at the end of this video? 0.43 +/- 0.1 equates to a margin of error of 0.1/0.43 ~ 23%. This will give the proper range, 0.33 to 0.53.(2 votes)
- The range mentioned in the video itself is 33% and 53% which is the same as 0.33 to 0.53 (just in percentage instead). The margin of error is 10% which is +/- 0.1 (again just in percentage).
Your margin of error comes from the 'estimate' standard deviation, and nothing else. As such, I am not really sure as to why you are dividing 0.1 by 0.43 to get 23%.(2 votes)
- What does "sampling distribution of the sample means" say that "distribution of the sample means" doesn't? And, does "sampling distribution" denote anything in particular (that is: Is the term self explanatory, without rote memorization?) (Does it mean "distribution of statistics from different samples around their corresponding population statistics"?) .(1 vote)
- > "What does "sampling distribution of the sample means" say that "distribution of the sample means" doesn't?"
Nothing, they are equivalent. The second is just slightly less of a mouthful to say/write.
> "And, does "sampling distribution" denote anything in particular ... Does it mean "distribution of statistics from different samples around their corresponding population statistics"?"
Your question is answered by your "guess" of the answer. A sampling distribution is the distribution of a statistic over many repeated samples. Hopefully, the corresponding population parameter will be in the middle of that sampling distribution.
Also: Note that a statistic corresponds to the sample, a parameter corresponds to the population. So, e.g., s² is a statistic, σ² is a parameter.(5 votes)
- I don't understand why P(x bar is within two standard deviations of u) is equal to P(u is within two standard deviations of x bar). u is an unknown constant that won't change, but as Sal said, x bar is just one of the sample means, and we can have thousands of these sample means located within two standard deviations. In other words, x bar is a changing variable, but u is a constant. I can easily understand P(x bar is within two standard deviations of u), but I don't think that P(u is within two standard deviations of x bar) is equal to the previous one. Only if both x bar and u are constant can we say they are equal; otherwise, they could never be equal. Moreover, I don't know how to calculate P(u is within two standard deviations of x bar). What did I miss? Thanks for the answer.(3 votes)
- This is what we know with certainty, from the central limit theorem - that sample means have a normal distribution around mu.
This means that there is a 68% probability that mu and our sample mean are within sigma/sqrt(n) of each other, right? (If our sample mean is x from mu, then how far is mu from our sample mean? If there is a 68% probability that our sample mean is within x of mu, what's the probability that mu is within x of our sample mean?).
We use (sample standard deviation)/sqrt(n) (called SEM) as an approximation to sigma/sqrt(n) for the standard deviation, since we don't have sigma.(1 vote)
- How come (x bar within 2 sigma x bar of mu x bar) is the same as
(mu x bar within 2 sigma x bar of x bar)?
How are they interchangeable? Can anyone please clarify this concept mathematically or graphically, even though it looks OK intuitively?(2 votes)
- The way Sal explained it in an earlier video is that, for a range, the order of the start and end points doesn't matter. That is, the distance from A to B is the same as the distance from B to A. For confidence interval problems, we're given that distance (in this case it's 95%, or roughly 2 SDs) and asked to estimate a range for the true mean by using our sample mean and estimate of standard deviation.(2 votes)
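For readers who want the swap written out, here is a short sketch, using $\sigma_{\bar{x}}$ for the standard deviation of the sampling distribution. Both phrasings describe the same event, because the distance between two numbers does not depend on which one is treated as the center.

```latex
\begin{aligned}
|\bar{x} - \mu| \le 2\sigma_{\bar{x}}
  &\iff \mu - 2\sigma_{\bar{x}} \le \bar{x} \le \mu + 2\sigma_{\bar{x}} \\
  &\iff \bar{x} - 2\sigma_{\bar{x}} \le \mu \le \bar{x} + 2\sigma_{\bar{x}}
\end{aligned}
```

Since the events are identical, their probabilities are identical. In both lines the randomness lives in $\bar{x}$, while $\mu$ stays fixed.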
- So just confirming; Is margin of error roughly the same thing as a confidence interval except that the confidence interval is described in terms of the standard deviation and the margin of error is a plan error?(1 vote)
- Margins of error are generally also given in terms of standard deviations. Sometimes I've seen the margin of error given as 1 SD (basically, reporting the mean and SD or the mean and SE). A confidence interval is formed by very specific multiples of the SD that give us the probability bounds. And in fact, it's really just made from calculating a specific margin of error.
For the normal distribution, if we multiply the SE by 1.96, we have the margin of error for a 95% confidence interval. If we take the sample mean and subtract that margin of error, and add that margin of error, we get (xbar-ME, xbar+ME), which forms the 95% confidence interval.(4 votes)
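The recipe in the reply above (multiply the SE by 1.96, then subtract and add) can be sketched in a few lines. The function name `z_interval` is invented for illustration, and the inputs 0.43 and 0.05 are the sample mean and estimated standard error from the video.

```python
def z_interval(center, standard_error, z_star=1.96):
    """Return (low, high, margin) for a z interval around center."""
    margin = z_star * standard_error
    return center - margin, center + margin, margin

low, high, margin = z_interval(0.43, 0.05)

print(f"margin of error: {margin:.3f}")
print(f"95% interval: ({low:.3f}, {high:.3f})")
```

Using 1.96 instead of the video's rule-of-thumb 2 gives a slightly narrower interval, about 0.332 to 0.528.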
Where we left off in the last video I kind of gave you a question. Find an interval so that we're reasonably confident-- we'll talk a little bit more about why I have to give this kind of vague wording right here-- reasonably confident that there's a 95% chance that the true population mean, which is p, which is the same thing as the mean of the sampling distribution of the sampling mean. So there's a 95% chance that the true mean-- and let me put this here. This is also the same thing as the mean of the sampling distribution of the sampling mean is in that interval. And to do that let me just throw out a few ideas. What is the probability that if I take a sample and I were to take a mean of that sample, so the probability that a random sample mean is within two standard deviations of the sampling mean, of our sample mean? So what is this probability right over here? Let's just look at our actual distribution. So this is our distribution, this right here is our sampling mean. Maybe I should do it in blue because that's the color up here. This is our sampling mean. And so what is the probability that a random sampling mean is going to be two standard deviations? Well a random sampling is a sample from this distribution. It is a sample from the sampling distribution of the sample mean. So it's literally what is the probability of finding a sample within two standard deviations of the mean? That's one standard deviation, that's another standard deviation right over there. In general, if you haven't committed this to memory already, it's not a bad thing to commit to memory, is that if you have a normal distribution the probability of taking a sample within two standard deviations is 95-- and if you want to get a little bit more accurate it's 95.4%. But you could say it's roughly-- or maybe I could write it like this-- it's roughly 95%. 
And really that's all that matters because we have this little funny language here called reasonably confident, and we have to estimate the standard deviation anyway. In fact, we could say if we want, I could say that it's going to be exactly equal to 95.4%. But in general, two standard deviations, 95%, that's what people equate with each other. Now this statement is the exact same thing as the probability that the sample mean, that the sampling mean-- not the sample mean, the probability of the mean of the sampling distribution is within two standard deviations of the sampling distribution of x is also going to be the same number, is also going to be equal to 95.4%. These are the exact same statements. If x is within two standard deviations of this, then this, then the mean, is within two standard deviations of x. These are just two ways of phrasing the same thing. Now we know that the mean of the sampling distribution, the same thing as a mean of the population distribution, which is the same thing as the parameter p-- the proportion of people or the proportion of the population that is a 1. So this right here is the same thing as the population mean. So this statement right here we can switch this with p. So the probability that p is within two standard deviations of the sampling distribution of x is 95.4%. Now we don't know what this number right here is. But we have estimated it. Remember, our best estimate of this is the true standard, or it is the true standard deviation of the population divided by 10. We can estimate the true standard deviation of the population with our sampling standard deviation, which was 0.5, 0.5 divided by 10. Our best estimate of the standard deviation of the sampling distribution of the sample mean is 0.05. 
So now we can say-- and I'll switch colors-- the probability that the parameter p, the proportion of the population saying 1, is within two times-- remember, our best estimate of this right here is 0.05-- of a sample mean that we take is equal to 95.4%. And so we could say the probability that p is within 2 times 0.05-- which is going to be 0.10-- of our mean is equal to 95-- and actually let me be a little careful here. I can't say equal now, because over here if we knew this, if we knew this parameter of the sampling distribution of the sample mean, we could say that it is 95.4%. We don't know it. We are just trying to find our best estimator for it. So actually what I'm going to do here is actually just say is roughly-- and just to show that we don't even have that level of accuracy, I'm going to say roughly 95%. We're reasonably confident that it's about 95% because we're using this estimator that came out of our sample, and if the sample is really skewed this is going to be a really weird number. So this is why we just have to be a little bit more exact about what we're doing. But this is the tool for at least saying how good is our result. So this is going to be about 95%. Or we could say that the probability that p is within 0.10 of our sample mean that we actually got. So what was the sample mean that we actually got? It was 0.43. So if we're within 0.1 of 0.43, that means we are within 0.43 plus or minus 0.1 is also, roughly, we're reasonably confident it's about 95%. And I want to be very clear. Everything that I started all the way from up here in brown to yellow and all this magenta, I'm just restating the same thing inside of this. It became a little bit more loosey-goosey once I went from the exact standard deviation of the sampling distribution to an estimator for it.
And that's why this is just becoming-- I kind of put the squiggly equal signs there to say we're reasonably confident-- and I even got rid of some of the precision. But we just found our interval. An interval that we can be reasonably confident that there's a 95% probability that p is within that, is going to be 0.43 plus or minus 0.1. Or an interval of-- we have a confidence interval. We have a 95% confidence interval of, and we could say, 0.43 minus 0.1 is 0.33. If we write that as a percent we could say 33% to-- and if we add the 0.1, 0.43 plus 0.1 we get 53%-- to 53%. So we are 95% confident. So we're not saying kind of precisely that the probability of the actual proportion is 95%, but we're 95% confident that the true proportion is between 33% and 53%. That p is in this range over here. Or another way, and you'll see this in a lot of surveys that have been done, people will say we did a survey and we got 43% will vote for number one, and number one in this case is candidate B. And then the other side, since everyone else voted for candidate A, 57% will vote for A. And then they're going to put on a margin of error. And you'll see this in any survey that you see on TV. They'll put a margin of error. And the margin of error is just another way of describing this confidence interval. And they'll say that the margin of error in this case is 10%, which means that there's a 95% confidence interval, if you go plus or minus 10% from that value right over there. And I really want to emphasize, you can't say with certainty that there is a 95% chance that the true result will be within 10% of this, because we had to estimate the standard deviation of the sampling mean. But this is the best measure we can get with the information we have. If you're going to do a survey of 100 people, this is the best kind of confidence that we can get. And this number is actually fairly big.
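The arithmetic of the interval can be retraced from raw survey data. The sketch below assumes the 100 responses are coded as 43 ones (candidate B) and 57 zeros (candidate A), which matches the 43% sample mean in the video. The variable names are illustrative.

```python
import math
import statistics

# Hypothetical coding of the survey: 1 = votes for B, 0 = votes for A.
responses = [1] * 43 + [0] * 57

mean = statistics.mean(responses)   # sample proportion
sd = statistics.stdev(responses)    # sample standard deviation, about 0.5
se = sd / math.sqrt(len(responses)) # estimated SD of the sampling distribution
margin = 2 * se                     # rule of thumb: 2 SDs for about 95%

low, high = mean - margin, mean + margin
print(f"sample mean {mean:.2f}, sample SD {sd:.3f}, SE {se:.4f}")
print(f"interval: {low:.2f} to {high:.2f}")
```

The sample standard deviation comes out just under 0.5, so the margin is just under 0.10, and the interval rounds to 0.33 to 0.53 as in the video.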
So if you were to look at this you would say, roughly there's a 95% chance that the true value of this number is between 33% and 53%. So there's actually still a chance that candidate B can win, even though only 43% of your 100 are going to vote for him. If you wanted to make it a little bit more precise you would want to take a bigger sample. You can imagine. Instead of sampling 100 people, instead of n being 100, if you made n equal 1,000, then you would take this number over here, you would take this number here and divide by the square root of 1,000 instead of the square root of 100. So you'd be dividing by about 31.6. And so then the size of the standard deviation of your sampling distribution will go down. And so the distance of two standard deviations will be a smaller number, and so then you will have a smaller margin of error. And maybe you want to get the margin of error small enough so that you can figure out decisively who's going to win the election.
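To see how the margin of error shrinks with sample size, here is a small sketch. It assumes the sample standard deviation stays near the video's 0.5 as n grows, which is an assumption for illustration and not a result from the video.

```python
import math

sample_sd = 0.5  # assumed to stay roughly the same for larger samples

# Margin of error = 2 * sample_sd / sqrt(n), for a few sample sizes.
margins = {n: 2 * sample_sd / math.sqrt(n) for n in (100, 1000, 10000)}

for n, margin in margins.items():
    print(f"n = {n:>5}: sqrt(n) = {math.sqrt(n):6.1f}, margin = {margin:.3f}")
```

Because the margin scales with 1 / sqrt(n), a sample ten times larger cuts the margin by a factor of about 3.16, not 10.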