# Example constructing and interpreting a confidence interval for p

Check conditions, calculate, and interpret a confidence interval to estimate a population proportion.

## Want to join the conversation?

• When calculating the standard error, is it better to calculate the unbiased standard deviation of the sample and divide that by the positive root of the sample size,
here sqrt( (30*(0.4)^2 + 20*(0.6)^2) / 49 / 50) = 0.0699854...,
or use the formula for the standard error of the sampling distribution, using p hat as an estimate of p?
here sqrt(0.4 * 0.6 / 50) = 0.0692820...

The two don't give the same result. Sal uses both methods in different videos without saying why, so some extra explanation would be helpful!
• Actually, your results agree when rounded to two decimal places: both are 0.07.

What the two formulas have in common is that both estimate the standard deviation of a sampling distribution (sigma_x_bar), which is used directly to calculate the CI (confidence interval).

The difference is that the first one is for dealing with a continuous random variable (like weight or height) and the second one is for dealing with a binary random variable (a Bernoulli random variable: success/failure).

Here the two formulas give roughly the same result, but I'd advise using the first formula in problems that deal with a continuous random variable, and the second formula in problems that deal with a binary random variable.
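To see how close the two calculations are, here is a minimal Python sketch. It assumes the sample from the question is coded as 0/1 values (20 successes and 30 failures out of 50); the variable names are illustrative only.

```python
import math
import statistics

# The sample coded as 0/1: 20 successes and 30 failures (n = 50).
sample = [1] * 20 + [0] * 30
n = len(sample)
p_hat = sum(sample) / n

# Approach 1: sample standard deviation (n - 1 in the denominator) over sqrt(n).
se_from_s = statistics.stdev(sample) / math.sqrt(n)

# Approach 2: standard error of a sample proportion, with p-hat standing in for p.
se_from_p_hat = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"s / sqrt(n)                = {se_from_s:.7f}")
print(f"sqrt(p_hat(1 - p_hat) / n) = {se_from_p_hat:.7f}")
print(f"ratio                      = {se_from_s / se_from_p_hat:.7f}")
```

The ratio of the two depends only on n (it is sqrt(n / (n - 1))), so the gap shrinks as the sample grows.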
• How do we decide which method to use to estimate the standard error?

Method 1) Perform correction. So standard error = sqrt(p hat * (1- p hat) / (n-1) )

Method 2) Do not perform correction. Standard error = sqrt(p hat * (1- p hat) / n )
• It's worth noting where the 'correction' in Method 1 comes from. First find the sample standard deviation:

S = SQRT(n * p-hat * (1- p-hat)/(n-1))

then the estimated standard deviation of the sampling distribution of p-hat is:

Std deviation of p-hat = S/SQRT(n),

which simplifies to SQRT(p-hat * (1- p-hat)/(n-1)), the same expression as Method 1.

As others below point out, whenever population parameters are unknown, it's probably best to use the correction method to avoid bias. I'll happily defer to any experts who can give a more valid explanation.
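The two steps above can be combined algebraically. As a sketch, writing \hat{p} for p-hat and S for the sample standard deviation defined above:

```latex
\frac{S}{\sqrt{n}}
  = \frac{1}{\sqrt{n}} \sqrt{\frac{n\,\hat{p}(1-\hat{p})}{n-1}}
  = \sqrt{\frac{\hat{p}(1-\hat{p})}{n-1}}
```

With p-hat = 0.4 and n = 50 this gives sqrt(0.24/49) ≈ 0.06999, compared with sqrt(0.24/50) ≈ 0.06928 when dividing by n instead of n-1.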
• what about using ((std. dev. of sample parameter)/sqrt(n)) instead of standard error?
• That would be for problems that deal with a continuous random variable (salary, weight, height, ...), although in this problem the two give roughly the same numbers.
• What is the difference between s/sqrt(n) and sqrt(p*(1-p)/n)? I believe they are the same formula, but it's not clear to me why that is the case or when to use each.
• The formulas s/sqrt(n) and sqrt(p*(1-p)/n) are used in different contexts:

s/sqrt(n) is used to estimate the standard error of the sample mean when the population standard deviation is unknown and the sample standard deviation (s) stands in for it. This formula is typically used in situations where you have quantitative data and you're estimating the population mean.

sqrt(p*(1-p)/n) is used to calculate the standard error of the sample proportion when dealing with categorical data (e.g., proportion of success or failure). This formula is used when estimating the population proportion from a sample.
• Why does it matter if the trials are independent or not?
• If our sampled trials are not independent, each successive trial will not necessarily have the same probability of success. Because of this, our inferences could be skewed in either direction: our "supposed" probability may overestimate or underestimate the real value.

Hope this helps,
- Convenient Colleague
• What I meant was: why did Sal make it 99.5%, with the 0.5% above and not below the middle 99%?
• Many z-tables show the area under the curve from -inf up to a point, so if we want 99% confidence, we need to leave 0.5% of the area in each tail, on the left and on the right. The z-score to look up is therefore the one with 99.5% of the area below it. See also
• When we find z*, why can't we just find that by multiplying the standard deviation by 3? (99 percent is 3 standard deviations in a normal distribution.) In this case, couldn't we just multiply the standard error by 3? I did this, but I didn't get the right answer.
• 3 standard deviations on either side of the mean actually cover about 99.7% of the whole distribution (according to the empirical rule), not 99%.

Although 0.7% doesn't seem like much, it matters in the z-table: the z-score with 99.5% of the area below it (leaving 0.5% in each tail, for a middle 99%) is about 2.576, while a z-score of 3 already has 99.87% of the area below it. The difference in z-score here is around 0.42, which is a lot.

My bottom line: when the confidence level isn't one of the 3 thresholds in the empirical rule, look up the z-score in a z-table. You can look it up in a z-table anyway for better accuracy.
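If no z-table is at hand, the same lookup can be sketched in Python with the standard library's `statistics.NormalDist`. The helper names `central_area` and `critical_value` are made up for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def central_area(z):
    """Area under the standard normal curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


def critical_value(confidence):
    """z* such that the middle `confidence` share of the curve lies within ±z*."""
    tail = (1 - confidence) / 2
    return std_normal.inv_cdf(1 - tail)


# Empirical-rule thresholds: areas within 1, 2 and 3 standard deviations.
for z in (1, 2, 3):
    print(f"area within {z} SD: {central_area(z):.4f}")

# Critical value for a 99% confidence level.
print(f"z* for 99% confidence: {critical_value(0.99):.3f}")
```

Multiplying the standard error by 3 instead of this z* would give a wider interval than a 99% confidence level calls for.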
• Since we have to use an estimate of the population standard deviation, rather than the actual population standard deviation, shouldn't we be using the t-statistic rather than the z-statistic?
• When you are estimating a population mean and the population standard deviation is unknown and has to be estimated from the sample, it's more appropriate to use the t-distribution rather than the z-distribution. The t-distribution takes into account the additional uncertainty introduced by estimating the population standard deviation from the sample. For a sample proportion, though, the usual approach is the z-interval used here, as long as the sample is large enough (at least 10 successes and 10 failures). In any case, for large sample sizes (typically n > 30) the t-distribution is close to the standard normal distribution, so using the z-statistic is often acceptable.
• Can someone demonstrate a problem for me?
• Problem: Della wants to estimate the proportion of songs on her mobile phone that are by a female artist. She takes a simple random sample of 50 songs and finds that 20 of them are by a female artist. Based on this sample, what is a 99% confidence interval for the proportion of songs by a female artist on her phone?

Solution:
Step 1: Check conditions

Random: Della took a simple random sample, so this condition is met.
Normal: We need at least 10 successes and 10 failures. In this case, Della has 20 successes and 30 failures, so this condition is met.
Independence: Della's sample size (50) is no more than 10% of her total songs (500), so we can treat the observations as approximately independent.
Step 2: Calculate the confidence interval

Sample proportion (p-hat) = 20/50 = 0.4
Standard error of the sample proportion = sqrt((p-hat * (1 - p-hat)) / n) = sqrt((0.4 * 0.6) / 50) ≈ 0.06928
Critical value (z-star) for a 99% confidence level corresponds to leaving 0.5% in each tail of the standard normal distribution, which is approximately 2.576.
Now, we can construct the confidence interval:
Lower bound = p-hat - (z-star * standard error) = 0.4 - (2.576 * 0.06928) ≈ 0.2215
Upper bound = p-hat + (z-star * standard error) = 0.4 + (2.576 * 0.06928) ≈ 0.5785

Therefore, the 99% confidence interval for the proportion of songs by a female artist on Della's phone is approximately [0.2215, 0.5785].
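The calculation steps can also be sketched in Python as a check on the arithmetic. This is a minimal illustration, not a full procedure: the function name `proportion_ci` is made up here, and only the 10-successes/10-failures condition is checked in code, since the random and independence conditions depend on how the sample was collected.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.99):
    """One-sample z interval for a proportion: p_hat ± z* · SE."""
    failures = n - successes
    # Normal condition from Step 1: at least 10 successes and 10 failures.
    if successes < 10 or failures < 10:
        raise ValueError("need at least 10 successes and 10 failures")

    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


# Della's sample: 20 songs by a female artist out of 50.
lower, upper = proportion_ci(successes=20, n=50, confidence=0.99)
print(f"99% CI: [{lower:.4f}, {upper:.4f}]")
```

Changing the `confidence` argument (for example to 0.95) shows how the interval narrows as the confidence level drops.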
• Why didn't he use the value of .9901 from the table if he wants 99% confidence? Why did he choose what he did from the table?
• A z-table gives the area below a z-score. For 99% confidence we want the middle 99% of the curve, and because the normal distribution is symmetric, the remaining 1% is split evenly between the two tails: 0.5% on each side. The z-score for an area of .9901 would leave about 1% in the upper tail alone, so the middle region between that z-score and its negative would only be about 98%. Instead, we look up the z-score with an area of about 0.995 below it, which leaves 0.5% above it and, by symmetry, 0.5% below its negative.
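A short Python sketch of that comparison, using the standard library's `statistics.NormalDist` in place of a printed table (the helper name `middle_area` is illustrative only):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def middle_area(z):
    """Area between -z and +z, given that the table area below z is cdf(z)."""
    return 2 * std_normal.cdf(z) - 1


# z-scores whose table entries (area below) are 0.99 and 0.995.
z_for_099 = std_normal.inv_cdf(0.99)
z_for_0995 = std_normal.inv_cdf(0.995)

print(f"table area 0.990 -> z = {z_for_099:.3f}, middle area = {middle_area(z_for_099):.3f}")
print(f"table area 0.995 -> z = {z_for_0995:.3f}, middle area = {middle_area(z_for_0995):.3f}")
```

Only the second lookup leaves a middle area of 99%, which is why the table entry near 0.995 is the one to use.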