
# Confidence interval example

Sal calculates a 99% confidence interval for the proportion of teachers who felt computers are an essential tool. Created by Sal Khan.

## Want to join the conversation?

• This video got me confused. In the introductory video on confidence intervals:

Sal solves a very similar problem. In both problems we're trying to estimate the standard deviation of the sampling distribution of the sample mean. And in the introductory video, Sal defines standard error of p-hat as:
`SE_p-hat = √(p-hat·(1 - p-hat)/n)`
and says that it is an unbiased estimator for standard deviation of sampling distribution.

In this video, he calculates:
`σ_p-hat = σ/√n`
`σ = √(p-hat·(1 - p-hat)·n/(n - 1))`
`σ_p-hat = √(p-hat·(1 - p-hat)/(n - 1))`
Clearly, we're getting a different estimate than what we would've got by calculating standard error. So, is standard error not, in fact, an unbiased estimator? Or is there some mistake in this video?
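The two estimators can be compared numerically. A quick sketch in Python, using the video's p-hat = 0.568 and n = 250 (which corresponds to 142 "yes" answers out of 250, assuming those counts), shows they barely differ at this sample size:

```python
import math

# Values from the video: p_hat = 0.568, n = 250
p_hat = 142 / 250   # 0.568
n = 250

# Standard error as defined in the introductory video: sqrt(p_hat(1-p_hat)/n)
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Estimate used in this video, via the unbiased sample variance (n - 1 divisor)
se_unbiased = math.sqrt(p_hat * (1 - p_hat) / (n - 1))

print(round(se, 4))           # 0.0313
print(round(se_unbiased, 4))  # 0.0314
```

So the discrepancy is in the fourth decimal place here; the n vs. n - 1 choice only matters for small samples.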
• I just cut to the chase from the question and did the square root of (0.568*0.432) / 250 and got the same SE answer as him (0.031). I am not sure why he had to treat it as a Bernoulli at first and add in extra steps.
• Why can't we use p(1-p) for the sample variance? When I do the calculations it works out the same if rounded. Then the formula for variance of sample distribution of the sample mean would be p(1-p)/n which is much easier to remember.
• I think he was just using the sample means of the Bernoulli trials, which made sense to him, and then he followed that approach through. I agree with you that when dealing with proportions, p(1-p)/n is the way to go.
• Why did we not straight off consider the distribution of the sample proportion as binomial distribution and proceed to find the standard error using, sq rt[ (sample proportion * (1 - sample proportion))/n ]?
• So I am reviewing stats for grad school and my school provides a brief review. On the section on confidence intervals it says this:

You can calculate a confidence interval with any level of confidence although the most common are 95% (z*=1.96), 90% (z*=1.65) and 99% (z*=2.58).

This confused me a bit. Maybe I am doing something wrong but these numbers don't seem to match up with a z-score chart. Can anyone shed some light on what might be happening here?
• For confidence intervals based on the normal distribution, the critical value is chosen such that P( -z <= Z <= z ) = 0.95. That is, we want an interval that is symmetric about the mean. The middle part, inside of the critical values, must be the confidence level. The two tails must combine to be α, so each tail is α/2.

Hence, for a 95% confidence interval, instead of looking up 0.05 or 0.95, we want to look up 0.025 or 0.975 in the Z-table, and get the Z critical values from those. Doing so, we would obtain the values your review noted.
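This lookup can be reproduced with Python's standard library (an illustrative sketch, not from the video): the inverse CDF of the standard normal at 1 - α/2 gives the critical value for each confidence level.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

for conf in (0.90, 0.95, 0.99):
    # Two-tailed interval: look up 1 - alpha/2 in the inverse CDF
    z_star = std_normal.inv_cdf(1 - (1 - conf) / 2)
    print(f"{conf:.0%}: z* = {z_star:.3f}")
```

This prints z* ≈ 1.645, 1.960, and 2.576, matching the review's 1.65 (rounded up), 1.96, and 2.58.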
• I do not understand why there is an n - 1 in the denominator when calculating the variance.
• But he did not use (n - 1) in any of the sampling-distribution videos (earlier sections).
• So for the sampling distribution of the sample mean here, we seem to be assuming a normal distribution as usual, meaning it extends forever in both directions. Doesn't this cause problems if our p is very close to 0 or 1? For example, if 99% of the teachers in our sample had been in favour of the computers, we would end up concluding the population proportion is just as likely to be above 1 as below 0.98, which is clearly impossible. How do you correct that?
• When dealing with proportions, there's a general rule that we need to check.
``n*p > 5``
``n*(1-p) > 5``

Though note that sometimes the 5 is replaced with 10. When both of these conditions are satisfied, it's generally reasonable to assume that the sampling distribution of the sample proportion (the sample mean of data that takes values 0 or 1) is approximately normal. So say p was 99%; then we'd have:
``n*p = 250*0.99 = 247.5``
``n*(1-p) = 250*0.01 = 2.5``

The second one is not larger than 5, so in such a case it would not be reasonable to assume a normal distribution; we'd need the sample size to be much larger. This is related to the Central Limit Theorem: the sample size must be large enough for the normal approximation to be reasonable.

Though, there's always a possibility of still having extremely rare events (like some rare disease, where 1 in 10000 people have it) and so the raw proportion isn't a very useful measure. Sometimes instead of the proportion, people will think about the "odds," defined as p / (1-p), and the natural log of this quantity is generally assumed to be normally distributed.
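The rule of thumb above is easy to turn into a small check (an illustrative sketch; the function name is mine, not from the video):

```python
def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb: both n*p and n*(1-p) must exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

# The video's sample: n*p = 142, n*(1-p) = 108, both comfortably > 5
print(normal_approx_ok(250, 0.568))  # True

# The hypothetical p = 99% case: n*(1-p) = 2.5 fails the check
print(normal_approx_ok(250, 0.99))   # False
```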
• Where did the .495 come from?
• We want to be 99% confident, i.e., with probability 0.99 the sample mean lies in the confidence interval. Since the confidence interval is symmetric about the mean of the sampling distribution of sample means, we want 0.99/2 = 0.495 probability on each side of the mean. That's where the 0.495 comes from.

Adding to what happy 2332 said: if you look at the first confidence interval video, Sal explains why you divide by 2. The z-table gives you everything to the left of a z-score, but you only want the area between the mean and your z-score. Only then can you use that z-score to build the interval.
• What is the difference between Standard Error and Standard Deviation? Why doesn't he use the formula with the square root of p times 1-p over n? What is the difference between that formula and the standard deviation divided by the square root of the sample size? I am so confused.
The difference between standard error and standard deviation lies in their application and interpretation. Standard deviation (SD) measures the dispersion or spread of a set of values in a population or sample: it tells you how much individual values typically differ from the mean. Standard error (SE) measures the precision of a sample statistic (such as the sample mean or sample proportion) as an estimate of a population parameter: it tells you how much the sample statistic is expected to vary from the true population parameter across different samples.

Regarding the formula difference: the formula you mentioned, √[p·(1-p)/n], calculates the standard error of the sample proportion (p-hat), where p is the proportion and n the sample size. In the video, Sal instead divides the sample standard deviation (about 0.50) by √n to estimate the standard error of the sample mean. Since the sample proportion is just the sample mean of 0/1 data, the two routes give essentially the same number.
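To make the SD-vs-SE distinction concrete, here is a sketch on a hypothetical 0/1 sample matching the video's numbers (142 "yes" out of 250; the data list is mine, for illustration):

```python
import math

# Hypothetical sample of 0/1 responses (1 = "essential"), p_hat = 0.568
data = [1] * 142 + [0] * 108
n = len(data)
mean = sum(data) / n

# Sample standard deviation: spread of the individual responses (n - 1 divisor)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Standard error: expected spread of the sample mean across repeated samples
se = sd / math.sqrt(n)

print(round(sd, 3))  # 0.496 -- individual answers vary a lot
print(round(se, 3))  # 0.031 -- the sample mean itself varies little
```

The SD stays near 0.5 no matter how many teachers you survey; the SE shrinks as n grows.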
• This can be more easily done and understood as a CI of proportions, which resolves quickly to [0.487, 0.649].
phat = 0.568, z* = 2.58, and n = 250
`CI = phat ± z*·√(phat·(1-phat)/n)`
I suggest this video be redone to eliminate a lot of confusion. Why go through binomials and means when the proportion approach is so direct?
This is the first time on KA I found a disagreeable lesson.
The benefit was I reviewed using a lot of sources and know more now than I would have otherwise! So, thank you!!
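The commenter's direct proportion approach can be checked in a few lines of Python (using the video's values):

```python
import math

p_hat, z_star, n = 0.568, 2.58, 250

# Margin of error: z* times the standard error of the proportion
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"[{lower:.3f}, {upper:.3f}]")  # [0.487, 0.649]
```

This reproduces the [0.487, 0.649] interval quoted above, confirming the shortcut agrees with the longer Bernoulli-trial derivation in the video.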