© 2024 Khan Academy

# Confidence interval for hypothesis test for difference in proportions

## Want to join the conversation?

- Shouldn't the null have been rejected? Because the difference is significant as long as p is less than or equal to alpha. (3 votes)
- The p-value isn't calculated or shown here. What the video is stating is that, if the null hypothesis is true (p in-person = p online, which means a true difference of 0), then 95% of the confidence intervals built this way will overlap 0. Since the confidence interval (-0.04, 0.14) does include zero, the video concludes that the p-value is greater than or equal to alpha, which means we fail to reject the null hypothesis. (6 votes)

- How does one construct a confidence interval without having a standard deviation for the sampling distribution? (There's no mention of sample sizes, so I don't know how to calculate the sigma.) Maybe that information was omitted because the point of the lecture was interpreting the CI rather than how to calculate it?(1 vote)
- Correct on both counts: 1) the standard deviation of the sampling distribution of the difference in sample proportions is needed to compute the confidence interval (we presume said standard deviation was figured "behind the scenes" and not shown to us); and 2) that information was extraneous to the purpose of the video.

But: Fun fact! There's enough information provided for us to work out what the standard deviation used was. Recall that `C.I. = (p̂_1 - p̂_2) ± z* ⋅ σ_(p̂_1 - p̂_2)`. By examining the C.I. provided, we can see that it is `0.05 ± 0.09`. Thus, `p̂_1 - p̂_2 = 0.05` and `z* ⋅ σ_(p̂_1 - p̂_2) = 0.09`. For a 95% confidence interval, `z* = 1.96`. So, `1.96 ⋅ σ_(p̂_1 - p̂_2) = 0.09` and thus `σ_(p̂_1 - p̂_2) = 0.09 ÷ 1.96 ≈ 0.0459`.

Now, we also have all the information needed to compute the associated P-value. Left as an exercise for the reader. :)(2 votes)
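One possible way to work through that exercise is sketched below in Python. It backs the standard deviation out of the interval (-0.04, 0.14) as described above, then computes a two-sided P-value. This is only a sketch under one loud assumption: the test reuses the same standard deviation as the interval. A pooled standard error, which a two-proportion z test would normally use, cannot be recovered from the interval alone. The helper `normal_cdf` and all variable names are illustrative, not from the video.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Interval reported in the video: (-0.04, 0.14) at 95% confidence.
lower, upper = -0.04, 0.14
z_star = 1.96

point_estimate = (lower + upper) / 2   # p̂_1 - p̂_2, the center of the interval
margin = (upper - lower) / 2           # z* ⋅ σ, the half-width of the interval
sigma = margin / z_star                # standard deviation implied by the interval

# ASSUMPTION: the test statistic uses the same sigma as the interval.
z_stat = point_estimate / sigma
p_value = 2 * (1 - normal_cdf(abs(z_stat)))

print(f"sigma ≈ {sigma:.4f}, z ≈ {z_stat:.3f}, two-sided P-value ≈ {p_value:.3f}")
```

Under that assumption the P-value comes out well above 0.05, which agrees with the video's conclusion of failing to reject the null hypothesis.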

- With the specific numbers given in the video, the 95% confidence interval includes zero by a wide enough margin that there is no way to also reject the null hypothesis that the difference equals zero at the 5% level of significance. But this is not true in general for every confidence interval that includes zero. For example, leaving the sample proportions the same, if the in-person sample included 580 students and the online sample included 493 students, then the 95% CI for the difference in proportions is (-0.00021, 0.10021), but the p-value for the hypothesis test that the difference is equal to zero is 0.049886.

The supposed link between the 95% CI and the 5% alpha hypothesis test in this video isn't necessarily true, because in each case we make different assumptions about the variance of the distribution of the difference in sample proportions. When we test to see whether the difference is zero, we begin by **assuming** that the difference is zero and see how likely it is to get a result at least this extreme. So the two populations are assumed to share a single true proportion, in which case our best guess for the variance is the estimator that uses the combined (pooled) estimated proportion of our two samples. On the other hand, when we're trying to pin down the range of differences that are consistent with this result at the 95% level, there are many potential differences for which the variance of each sample proportion is different from the other's, and so our best estimator for the variance of the difference uses, for each sample, the variance calculated from just that sample's proportion.
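The numbers quoted in this comment can be checked with a short Python sketch. It assumes, as the comment describes, an unpooled standard error for the confidence interval and a pooled standard error for the test. The function names are illustrative, and 1.96 is used as the 95% critical value.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def unpooled_ci(p1, n1, p2, n2, z_star=1.96):
    """CI for p1 - p2 using each sample's own proportion for its variance."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


def pooled_p_value(p1, n1, p2, n2):
    """Two-sided P-value for H0: p1 = p2 using the pooled proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - normal_cdf(abs(z)))


# Hypothetical sample sizes from the comment: 580 in-person, 493 online.
ci = unpooled_ci(0.80, 580, 0.75, 493)
p_value = pooled_p_value(0.80, 580, 0.75, 493)

print(f"95% CI: ({ci[0]:.5f}, {ci[1]:.5f})")
print(f"pooled-test P-value: {p_value:.6f}")
```

With these inputs the interval just barely contains zero while the P-value lands just below 0.05, so the mismatch comes entirely from the two different standard errors.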

The problem can be illustrated as two bell curves centred at different points on the x-axis, one centred at 0 and the other centred at 0.05. The presumption of this video is that if x=0 lies within the middle 95% of the curve centred at 0.05, then 0.05 must lie within the middle 95% of the curve centred at 0, and so we must not be able to reject the hypothesis of difference=0 given the sample difference of 0.05. This isn't necessarily true, however, because the two distributions have different variances, and so one is more stretched out than the other. If you set the numbers just right, it is possible to have a 95% CI that slightly extends past 0, and yet still have a p-value < 0.05 for the hypothesis test. (1 vote)

- If I'm understanding correctly, we're discussing the relationship between a `P-value < α` test and a confidence interval including zero.

Note that we're discussing a two-sided hypothesis test with `H0: p_1 = p_2`.

Let `f(x)` be the `z` value for which the area under a standard normal curve from `-f(x)` to `f(x)` is `x`. So, for example, `f(0.95) = 1.96`, since an area of 0.95 centered under the standard normal curve would yield a `z` value of 1.96.

`P-value < α` tells us to **reject** `H0`. To **fail to reject** `H0` we have `P-value ≥ α`. This test is equivalent to `-z_x ≤ ẑ ≤ z_x`, where `ẑ` is the `z` value corresponding to the P-value (it is defined in the videos as `(p̂_1 - p̂_2) ÷ σ`) and `z_x` is the `z` value corresponding to `α`. When we reject `H0` due to `P-value < α`, `α` refers to the area under the standard normal curve in the tails below `-z_x` and above `z_x`. For failing to reject `H0`, the rest of the area under the curve applies: the area under the curve from `-z_x` to `z_x`; and this area is necessarily equal to `1 - α`. Thus we could say that `z_x = f(1 - α)`.

Now, using the confidence interval, we can say it includes zero when `(p̂_1 - p̂_2) - z* ⋅ σ ≤ 0 ≤ (p̂_1 - p̂_2) + z* ⋅ σ`. This follows from the definition of the confidence interval as `(p̂_1 - p̂_2) ± z* ⋅ σ`.

Divide by `σ` (where `σ > 0`): `(p̂_1 - p̂_2) ÷ σ - z* ≤ 0 ≤ (p̂_1 - p̂_2) ÷ σ + z*`

Subtract `(p̂_1 - p̂_2) ÷ σ`: `-z* ≤ -(p̂_1 - p̂_2) ÷ σ ≤ z*`

Multiply by -1: `z* ≥ (p̂_1 - p̂_2) ÷ σ ≥ -z*`

Rearrange: `-z* ≤ (p̂_1 - p̂_2) ÷ σ ≤ z*`

Now, note that `(p̂_1 - p̂_2) ÷ σ` is precisely `ẑ` from above. Thus we have `-z* ≤ ẑ ≤ z*`.

Therefore, we've shown the two tests are equivalent provided that `z_x = z*`.

Now, what is `z*` for a confidence interval? `z* = f(CI)`, where `CI` is the confidence level.

Thus the two tests are equivalent when `f(1 - α) = f(CI)`. And so the P-value and confidence interval tests are equivalent when `CI = 1 - α`.

This is how the video equates a 5% `α` to a 95% confidence interval.

You don't show how you arrived at your numbers, but I suspect the reason you're observing apparent discrepancies is that you're using different values of `σ` for the two tests. (1 vote)
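The equivalence argued in this reply can be spot-checked numerically. The Python sketch below assumes the same `σ` is used for both the interval and the test, which is exactly the condition the derivation relies on. The function `f` mirrors the reply's definition; the other names and the value `sigma = 0.0459` are illustrative choices, not taken from the video.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def f(area: float) -> float:
    """z value such that the area from -z to z under the standard normal is `area`."""
    return STD_NORMAL.inv_cdf((1 + area) / 2)


def ci_includes_zero(diff: float, sigma: float, confidence: float) -> bool:
    """True when diff ± z* ⋅ σ contains zero."""
    z_star = f(confidence)
    return diff - z_star * sigma <= 0 <= diff + z_star * sigma


def fails_to_reject(diff: float, sigma: float, alpha: float) -> bool:
    """True when the two-sided P-value is at least alpha."""
    z_hat = diff / sigma
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z_hat)))
    return p_value >= alpha


alpha = 0.05
sigma = 0.0459  # ASSUMPTION: one shared sigma for both the interval and the test
diffs = [i / 1000 for i in range(-200, 201)]  # sample differences from -0.2 to 0.2

agree = all(
    ci_includes_zero(d, sigma, 1 - alpha) == fails_to_reject(d, sigma, alpha)
    for d in diffs
)
print("two decision rules agree on every difference:", agree)
```

If the interval and the test were given different values of `σ`, the two functions could disagree for differences near the boundary.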

## Video transcript

- [Narrator] A university offers a certain course that students can take in-person or in an online setting. Teachers of the course were curious if there was a difference in the passing rate between the two settings. Data from a recent semester showed that 80% of students passed the in-person setting and 75% of students passed the online setting. They were willing to treat these as representative samples of all students who may take each setting of the course.

The teachers used those results to make a 95% confidence interval to estimate the difference between the proportion of students who pass in each setting of the course. So this is a 95% confidence interval for the difference between the proportion who passed the in-person course and the online course. The resulting interval was approximately, it went from negative 0.04 to 0.14. Just to make sure we understand what this is saying, this is saying 95% of the time that you go through this, because we're talking about a 95% confidence interval, 95% of the time you take these samples and then you construct a confidence interval for the difference in proportions, that it will actually contain the true difference in proportions.

They want to use this interval to test their null hypothesis that the true proportions are the same versus their alternative hypothesis that the true proportions are different. Assume that all conditions for inference have been met. Based on the interval, what do we know about the corresponding P-value and conclusion to their test? So pause this video and try to figure it out on your own.

All right, so what's interesting here is we're going to use a confidence interval to think about a hypothesis test. Remember, in a hypothesis test we assume that our null hypothesis is true. We'll assume this. There's another way we could write it. We could write it like this, that the difference between the in-person and the online true proportions is equal to zero. These are equivalent statements. In a hypothesis test, we will assume that this is true.

Then in a traditional hypothesis test, we set some significance level. So let's say we set that significance level at 5%, and that is a very typical significance level. And if the probability of getting the results that we do get for the difference in the sample proportions is less than 5%, we say, hey, that's pretty unlikely, we're gonna reject the null hypothesis, which will suggest the alternative.

But here we have something interesting. We have a confidence interval. It turns out that if the sum of your confidence level and your significance level is equal to 100% and you're doing a two-sided hypothesis test, so you're thinking about, well, our alternative hypothesis isn't just that the in-person is greater than the online, or that it's less than the online, it's that they are different. We have a two-sided hypothesis test. In these situations you can actually make some inferences about your P-value from your confidence interval.

Think about it this way: we are assuming our null hypothesis is true when we do this hypothesis test. And so when we construct a 95% confidence interval, we would expect that 95% of confidence intervals would overlap with zero. Where did I get zero from? Remember, this is a confidence interval for the difference in proportions. Our null hypothesis is that the true difference in proportions is zero. So 95% of the time that we do this, if we assume that the null hypothesis is true, we will overlap with zero. Or another way you can think about it: 5% of confidence intervals would not overlap with zero.

So if you are in a situation where you go through this process, you try to construct a 95% confidence interval and you don't overlap with your assumed difference of the true proportions from your null hypothesis, well, in this situation your P-value is going to be less than your 5% significance level. In this situation you would reject your null hypothesis. In the first situation, where the interval does overlap with zero, your P-value is going to be greater than or equal to your alpha level and you would fail to reject.

So what's the situation here? Well, our interval actually does include the assumed difference in true proportions from the null hypothesis. So that means, assuming the null hypothesis, we are in this first scenario. This is one of the 95% of confidence intervals where we actually did overlap with the true parameter that we are trying to estimate. In that situation our P-value is going to be greater than or equal to our alpha, which in this case is 5%. So we fail to reject the null hypothesis. So there isn't evidence to suggest that there is a true difference in passing rates between the in-person and the online settings of the course.
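The narrator's claim that about 95% of such intervals overlap zero when the null hypothesis is true can be illustrated with a small simulation. This Python sketch is not from the video. The common pass rate of 0.78, the sample size of 200 per setting, the number of repetitions, and the seed are all hypothetical choices, since the video gives no sample sizes.

```python
import math
import random


def contains_zero(lower: float, upper: float) -> bool:
    """Decision rule from the video: an interval containing 0 means fail to reject H0."""
    return lower <= 0 <= upper


def simulate_coverage(p_true=0.78, n=200, reps=2000, z_star=1.96, seed=0):
    """Fraction of simulated 95% intervals that contain 0 when H0 is true.

    ASSUMPTION: both settings share the pass rate p_true and have n students each.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(reps):
        # Draw two independent samples from the same true proportion.
        p1 = sum(rng.random() < p_true for _ in range(n)) / n
        p2 = sum(rng.random() < p_true for _ in range(n)) / n
        se = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
        diff = p1 - p2
        if contains_zero(diff - z_star * se, diff + z_star * se):
            covered += 1
    return covered / reps


coverage = simulate_coverage()
print(f"share of intervals containing 0 under H0: {coverage:.3f}")

# The interval from the video contains 0, so the rule says fail to reject.
print("video interval contains 0:", contains_zero(-0.04, 0.14))
```

The simulated share should land close to 0.95, though the exact value depends on the seed and on the sample sizes assumed here.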