# Confidence interval for hypothesis test for difference in proportions

## Want to join the conversation?

• Shouldn't the null have been rejected? Because the difference is significant as long as p is less than or equal to alpha.
• The p-value isn't calculated or shown here. What the video is stating is that we are 95% confident the true difference in proportions lies in the interval (-0.04, 0.14). Since that interval includes zero, a true difference of 0 (P in-person = P online) is plausible. That corresponds to a p-value greater than alpha, which means we fail to reject the null hypothesis.
• How does one construct a confidence interval without having a standard deviation for the sampling distribution? (There's no mention of sample sizes, so I don't know how to calculate the sigma.) Maybe that information was omitted because the point of the lecture was interpreting the CI rather than how to calculate it?
• Correct on both counts: 1) the standard deviation of the sampling distribution of the sample differences is needed to compute the confidence interval (we presume said standard deviation was figured "behind the scenes" and not shown to us); and 2) that information was extraneous to the purpose of the video.

But: Fun fact! There's enough information provided for us to work out what the standard deviation used was. Recall that `C.I. = (p̂_1 - p̂_2) ± z* ⋅ σ_(p̂_1 - p̂_2)`. By examining the C.I. provided, we can see that it is `0.05 ± 0.09`. Thus, `p̂_1 - p̂_2 = 0.05` and `z* ⋅ σ_(p̂_1 - p̂_2) = 0.09`. For a 95% confidence interval, `z* = 1.96`. So, `1.96 ⋅ σ_(p̂_1 - p̂_2) = 0.09` and thus `σ_(p̂_1 - p̂_2) = 0.09 ÷ 1.96 ≈ 0.0459`.

Now, we also have all the information needed to compute the associated P-value. Left as an exercise for the reader. :)
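
For anyone who wants to check the arithmetic, here is a rough Python sketch of the steps above. It backs the standard deviation out of the interval (-0.04, 0.14) and then, as an assumption on my part, reuses that same standard deviation to get an approximate two-sided P-value. A textbook test would pool the two sample proportions, which we aren't given, so the exact P-value could differ a little.

```python
from statistics import NormalDist

# Confidence interval reported in the video.
lower, upper = -0.04, 0.14

center = (lower + upper) / 2   # sample difference, p̂_1 - p̂_2
margin = (upper - lower) / 2   # z* times the standard deviation
z_star = 1.96                  # critical value for a 95% interval

sigma = margin / z_star        # standard deviation backed out of the interval

# Assumption: reuse the interval's sigma for the test statistic.
z_hat = center / sigma
p_value = 2 * (1 - NormalDist().cdf(abs(z_hat)))

print(f"sigma   ≈ {sigma:.4f}")
print(f"z       ≈ {z_hat:.3f}")
print(f"P-value ≈ {p_value:.3f}")
```

Under that assumption the P-value comes out at roughly 0.28, well above 0.05, which agrees with the video's conclusion of failing to reject.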
• With the specific numbers given in this case, a 95% confidence interval that includes zero does go along with failing to reject the null hypothesis that the difference equals zero at the 5% level of significance. That is not true in general, though, for every confidence interval including zero that could have been given for this problem. For example, leaving the sample proportions the same, if the in-person sample included 580 students and the online sample included 493 students, then the 95% CI for the difference in proportions is (-0.00021, 0.10021), but the p-value for the hypothesis test that the difference is equal to zero is 0.049886.

The supposed link between the 95% CI and the 5% alpha hypothesis test in this video isn't necessarily true, because the two procedures make different assumptions about the variance of the difference in sample proportions. When we test whether the difference is zero, we begin by assuming that it is zero and ask how likely it is to get a result at least this extreme. Under that assumption both samples share the same true proportion, so our best estimate of the variance uses the combined (pooled) proportion of the two samples. On the other hand, when we build the confidence interval we're trying to pin down the range of plausible differences, and for most of those the two proportions are not equal. So our best estimator for the variance of the difference uses each sample's own proportion to estimate that sample's variance.

The problem can be illustrated as two bell curves centred at different points on the x-axis, one centred at 0 and the other centred at 0.05. The presumption of this video is that if x=0 lies within the middle 95% of the curve centred at 0.05, then 0.05 must lie within the middle 95% of the curve centred at 0, and so we must not be able to reject the hypothesis of mean difference=0 given the sample difference of 0.05. This isn't necessarily true, however, because the two distributions have different variances, and so one is more stretched out than the other. If you set the numbers just right, it is possible to have a 95% CI that slightly extends past 0, and yet still have a p-value < 0.05 for the hypothesis test.
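
Here is a short Python sketch of the pooled versus unpooled calculation described above. The sample sizes (580 and 493) are the ones from this comment. The sample proportions of 0.80 and 0.75 are an assumption: they aren't stated anywhere in this thread, but they give a difference of 0.05 and appear to reproduce the figures quoted above.

```python
from math import sqrt
from statistics import NormalDist

# Sample sizes from the comment; proportions are ASSUMED (not given in the thread).
p1, n1 = 0.80, 580   # in-person
p2, n2 = 0.75, 493   # online
diff = p1 - p2

std_normal = NormalDist()
z_star = std_normal.inv_cdf(0.975)   # about 1.96 for a 95% interval

# Confidence interval: each sample's variance uses its own proportion (unpooled).
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - z_star * se_unpooled, diff + z_star * se_unpooled)

# Hypothesis test: assume p1 = p2, so use the combined (pooled) proportion.
p_pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z_hat = diff / se_pooled
p_value = 2 * (1 - std_normal.cdf(abs(z_hat)))

print(f"95% CI  = ({ci[0]:.5f}, {ci[1]:.5f})")
print(f"P-value = {p_value:.6f}")
```

With these inputs the interval just barely includes zero while the P-value lands just under 0.05, because the pooled standard error is slightly smaller than the unpooled one.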
• If I'm understanding correctly, we're discussing the relationship between a `P-value < α` test and a confidence interval including zero.

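Note that we're discussing a two-sided test with null hypothesis `H0: p_1 = p_2`, which is a statement about the population proportions rather than the sample proportions `p̂_1` and `p̂_2`.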

Let `f(x)` be the `z` value for which the area under a standard normal curve from `-f(x)` to `f(x)` is `x`. So, for example, `f(0.95) = 1.96`, since an area of 0.95 centered under the standard normal curve would yield a `z` value of 1.96.
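
In case it helps, `f` can be sketched in Python with the standard library's `NormalDist`. The name `f` just mirrors the definition above. A central area of `x` leaves `(1 - x) ÷ 2` in each tail, so the upper cutoff sits at cumulative probability `(1 + x) ÷ 2`.

```python
from statistics import NormalDist

def f(x: float) -> float:
    """Return z such that the area under the standard normal curve
    between -z and z equals x (for 0 < x < 1)."""
    return NormalDist().inv_cdf((1 + x) / 2)

print(round(f(0.95), 2))
```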

`P-value < α` tells us to reject `H0`. To fail to reject `H0` we have `P-value ≥ α`. This test is equivalent to `-z_x ≤ ẑ ≤ z_x` where `ẑ` is the `z` value corresponding to the P-value -- it is defined in the videos as `(p̂_1 - p̂_2) ÷ σ` -- and `z_x` is the `z` value corresponding to `α`. When we reject `H0` due to `P-value < α`, `α` refers to the area under the standard normal curve tails below `-z_x` and above `z_x`. For failing to reject `H0`, the rest of the area under the curve applies: the area under the curve from `-z_x` to `z_x`; and this area is necessarily equal to `1 - α`. Thus we could say that `z_x = f(1 - α)`.

Now, using the confidence interval, we can say it includes zero when

`(p̂_1 - p̂_2) - z* ⋅ σ ≤ 0 ≤ (p̂_1 - p̂_2) + z* ⋅ σ`

This follows from the definition of confidence interval as `(p̂_1 - p̂_2) ± z* ⋅ σ`.

Divide by `σ` (where `σ > 0`):

`(p̂_1 - p̂_2) ÷ σ - z* ≤ 0 ≤ (p̂_1 - p̂_2) ÷ σ + z*`

Subtract `(p̂_1 - p̂_2) ÷ σ`:

`-z* ≤ -(p̂_1 - p̂_2) ÷ σ ≤ z*`

Multiply by -1:

`z* ≥ (p̂_1 - p̂_2) ÷ σ ≥ -z*`

Rearrange:

`-z* ≤ (p̂_1 - p̂_2) ÷ σ ≤ z*`

Now, note that `(p̂_1 - p̂_2) ÷ σ` is precisely `ẑ` from above. Thus we have

`-z* ≤ ẑ ≤ z*`

Therefore, we've shown the two tests are equivalent provided that `z_x = z*`.

Now, what is `z*` for a confidence interval? `z* = f(CI)`, where `CI` here means the confidence level (0.95 for a 95% interval).

Thus the two tests are equivalent when `f(1 - α) = f(CI)`. And so the P-value and confidence interval tests are equivalent when `CI = 1 - α`.

This is how the video equates a 5% `α` to a 95% confidence interval.
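
As a sanity check on the derivation, here is a small Python sketch that runs both decisions with the same `σ` and compares them. The value `σ = 0.0459` and the list of sample differences are illustrative assumptions, and the function names are made up for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def fail_to_reject(diff: float, sigma: float, alpha: float) -> bool:
    """Two-sided P-value test: True when P-value >= alpha."""
    z_hat = diff / sigma
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z_hat)))
    return p_value >= alpha

def ci_includes_zero(diff: float, sigma: float, confidence: float) -> bool:
    """True when diff ± z* ⋅ sigma contains zero."""
    z_star = STD_NORMAL.inv_cdf((1 + confidence) / 2)
    return diff - z_star * sigma <= 0 <= diff + z_star * sigma

sigma = 0.0459   # assumed: same sigma used for both the test and the interval
alpha = 0.05
diffs = [-0.15, -0.10, -0.05, 0.0, 0.05, 0.08, 0.10, 0.15]

# Both decisions should match for every difference when CI = 1 - alpha.
agree = all(
    fail_to_reject(d, sigma, alpha) == ci_includes_zero(d, sigma, 1 - alpha)
    for d in diffs
)
print(agree)
```

The two checks only line up because the same `σ` is passed to both functions. If the test and the interval each use a different `σ`, the decisions can disagree for differences near the cutoff.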

You don't show how you arrived at your numbers, but I suspect the reason you're observing apparent discrepancies is because you're using different values of `σ` for the two tests.