# Comparing population proportions 2

Sal continues the election example for population proportions. Created by Sal Khan.

## Want to join the conversation?

• Why doesn't Sal use the "corrected" standard deviation? I expected him to multiply the variance by (1000/999).
• I have the same question as someone above.
• Can someone explain to me why he changed from 95% to 97.5% to find z?
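
One way to see why the lookup uses 97.5% rather than 95%: a two-sided 95% interval leaves 2.5% in each tail, and a cumulative z-table reports the area below a z-score, so the cutoff needed is the z-score with 97.5% of the area below it. A minimal sketch using Python's standard-library `statistics.NormalDist` (the variable names are illustrative, not from the video):

```python
from statistics import NormalDist

confidence = 0.95
tail = (1 - confidence) / 2          # 2.5% left over in each tail
cumulative = 1 - tail                # area below the upper cutoff

std_normal = NormalDist()            # mean 0, standard deviation 1
z_star = std_normal.inv_cdf(cumulative)

print(round(cumulative, 3))          # 0.975
print(round(z_star, 2))              # 1.96

# The area between -z_star and +z_star recovers the 95% we wanted.
middle = std_normal.cdf(z_star) - std_normal.cdf(-z_star)
print(round(middle, 2))              # 0.95
```
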
• is almost wrong. It is not that there is a 95% chance that the true population mean difference lies within the calculated interval. Rather, if we took many more such samples and computed a CI each time, 95% of those CIs would contain the true population mean difference.
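
The "repeat the sampling many times" reading of a confidence interval can be checked with a small simulation. This is only a sketch: the "true" proportions, the sample size, and the helper name `covers` are all hypothetical choices for illustration, not values from the video.

```python
import random
from math import sqrt
from statistics import NormalDist

def covers(p1, p2, n, z, rng):
    """Draw one sample per group; report whether the CI contains p1 - p2."""
    x1 = sum(rng.random() < p1 for _ in range(n)) / n
    x2 = sum(rng.random() < p2 for _ in range(n)) / n
    margin = z * sqrt(x1 * (1 - x1) / n + x2 * (1 - x2) / n)
    diff = x1 - x2
    return diff - margin <= p1 - p2 <= diff + margin

rng = random.Random(0)               # fixed seed so the run is repeatable
z = NormalDist().inv_cdf(0.975)      # critical value for a 95% interval
p1, p2, n = 0.64, 0.59, 500          # hypothetical true proportions and sample size
trials = 2000

coverage = sum(covers(p1, p2, n, z, rng) for _ in range(trials)) / trials
print(coverage)                      # fraction of intervals that contain p1 - p2
```

The printed fraction should land near 0.95. Each individual interval either contains the true difference or it doesn't; the 95% describes how often the procedure succeeds.
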
• I'm confused with that too. Hope someone can help explain that.
• The variance presented in the video for the Bernoulli distribution is the population variance. However, what we have is only a sample, so shouldn't it be, for the men for example, S^2_1 = (642(1-0.642)^2 + (1000-642)(0-0.642)^2)/999?
• How is it evident that there's a 95% chance that men are more likely to vote for the candidate than women?
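
To get a feel for how much the n - 1 correction matters here, the two variances can be computed side by side. A rough sketch using the men's numbers quoted in this thread (642 out of 1000); the variable names are just for illustration:

```python
n = 1000          # men sampled
yes = 642         # men who said they would vote for the candidate
p_hat = yes / n   # sample proportion, 0.642

# Bernoulli variance formula p(1 - p), evaluated at the sample proportion
var_population = p_hat * (1 - p_hat)

# Sample variance with the n - 1 divisor, written out as in the question
var_sample = (yes * (1 - p_hat) ** 2 + (n - yes) * (0 - p_hat) ** 2) / (n - 1)

print(round(var_population, 6))               # 0.229836
print(round(var_sample, 6))                   # 0.230066
print(round(var_sample / var_population, 6))  # 1.001001, i.e. 1000/999
```

The two differ by a factor of 1000/999, about 0.1%, which is too small to change the interval at the precision used in the video.
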
• Thanks- this was confusing because of the way he mouses over the left number .008 stating "Men will be more likely to vote for candidate a" then mouses over to the right number .094 "than women". You start thinking the left is men and the right is women, but that wouldn't make sense.
• If we are asked to draw a 95% confidence interval, why can we not just use the empirical rule to know that the mean must be within 2 standard deviations? Why use the z table at all?
• The Empirical Rule is an approximation. It's certainly useful, but if we're going to the trouble of making a confidence interval, we may as well be precise.

Additionally, the Empirical Rule corresponds to the Z distribution. Using this for the confidence interval means that you assume you know the population standard deviation. More often, we cannot assume this, and we need to use the t-distribution, for which there is no Empirical Rule.
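
To make the "approximation" point concrete, here is a small sketch comparing the Empirical Rule's 2 standard deviations with the exact normal values, using Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()     # standard normal: mean 0, standard deviation 1

# Exact area within 2 standard deviations of the mean
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)

# Exact multiplier that captures the middle 95%
z_95 = std_normal.inv_cdf(0.975)

print(round(within_2sd, 4))   # 0.9545
print(round(z_95, 2))         # 1.96
```

So 2 standard deviations actually cover about 95.45%, while exactly 95% needs about 1.96 standard deviations.
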
• Has Sal posted any videos on exactly how to read and use a z or t-table? If not, that would be very helpful.
• For a z-table, you look at how many standard deviations your value is from the mean (its z-score), which should have at least hundredths, look for the row that has the same ones and tenths as your z-score (if your score is 1.56, look for the row starting 1.5) then look for the column with the same hundredths as your score (if your score is 1.56, look for the column with 0.06 at the top). The value in the box in that column and row is the probability of a random score falling in the area below your score. You can also read it the other way, as Sal does in this video, by looking for a percentage and finding the z-score from there.

I'm not completely sure how to read a t-table, but I think you first look at the top two rows to decide whether you need the one-tailed or the two-tailed probability (one-tailed puts the whole percentage in a single tail, two-tailed splits it between both tails; the t-distribution itself is always symmetrical, resembling a normal distribution but, as Sal calls it, with 'fatter tails'). From there, you find the column with the percentage you want and follow it down to the row with the appropriate degrees of freedom (listed in the far left column).
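
The z-table lookup described above (row 1.5, column 0.06) can be double-checked in code. A minimal sketch with Python's standard-library `statistics.NormalDist`, which plays the role of the table in both directions:

```python
from statistics import NormalDist

std_normal = NormalDist()

# Forward lookup: area below z = 1.56 (row 1.5, column 0.06 in the table)
area_below = std_normal.cdf(1.56)
print(round(area_below, 4))       # 0.9406

# Reverse lookup: start from the area and recover the z-score
z_score = std_normal.inv_cdf(0.9406)
print(round(z_score, 2))          # 1.56
```
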
• How can we conclude that Men are more likely to vote for the candidate than Women when our confidence interval for difference of means is as low as 0.008? (0.8%) Isn't that statistically insignificant?
• I think you are confusing a confidence interval with the p-value obtained from a hypothesis test. We did not perform a hypothesis test here. Hypothesis testing is done in the next video: https://www.khanacademy.org/math/statistics-probability/significance-tests-confidence-intervals-two-samples/comparing-two-proportions/v/hypothesis-test-comparing-population-proportions

Here, we calculate CI for p1-p2, where p1 corresponds to p_men, and p2 corresponds to p_women.

Now
- if p1>p2, i.e. men are more likely to vote for the candidate than women, then p1-p2 will be +ve
- if p1<p2, i.e. men are less likely to vote for the candidate than women, then p1-p2 will be -ve
- if p1=p2, i.e. men are equally likely to vote for the candidate as women, then p1-p2 will be 0.

Given that the CI here has a lower bound of 0.008 and an upper bound of 0.094, the entire range of values is +ve, i.e. p1-p2 is +ve, i.e. p1>p2. So, at the 95% confidence level, we conclude that men are more likely to vote for this candidate than women.
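
The interval itself can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the men's figures (642 of 1000) are quoted in this thread, but the women's proportion and sample size below are assumed values, chosen because they are consistent with the 0.008 to 0.094 interval.

```python
from math import sqrt
from statistics import NormalDist

n_men, p_men = 1000, 0.642       # 642 of 1000 men, as quoted in the thread
n_women, p_women = 1000, 0.591   # ASSUMPTION: not stated in the thread

diff = p_men - p_women
# Standard error of the difference between two independent sample proportions
se = sqrt(p_men * (1 - p_men) / n_men + p_women * (1 - p_women) / n_women)
z_star = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval

lower = diff - z_star * se
upper = diff + z_star * se
print(round(lower, 3), round(upper, 3))   # 0.008 0.094
print(lower > 0)                          # True: the whole interval is positive
```
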
• When the sampled data from two populations has a normal distribution but we don't know the standard deviation of either population, we use the sample standard deviation instead, and we then have to use the Student's t-distribution for our calculations. In this example, however, we don't know the standard deviation of either population, yet when we estimate it using the pooled sample standard deviation, we can use the normal distribution (z-score) for our calculations. Why can we use the z-score in this case when our test statistic uses an estimate of the SD?