Course: AP®/College Statistics > Unit 10 > Lesson 9: Testing for the difference of two population proportions
© 2024 Khan Academy
Hypothesis test for difference in proportions example
- Why do we use the combined proportion to estimate the standard deviation of $\hat{p}_{2015} - \hat{p}_{2000}$? Why don't we calculate it with each sample's own proportion?
- How would we find the mean of this?
- Since we're subtracting the two sample proportions, the mean of the sampling distribution of $\hat{p}_{2015} - \hat{p}_{2000}$ is the first population proportion minus the second, $p_{2015} - p_{2000}$. Under the null hypothesis that difference is 0. Sal's estimate of the difference from the samples is 0.38 - 0.33 = 0.05, at 6:46. In this video, Sal is checking whether there is convincing evidence that the difference in population proportions is greater than 0.
- On an exam, would it be safe to write that I'm assuming we're sampling less than 10% of the population to meet the independence condition, like what Sal does around 2:45?
P.S. Happy New Year
- Wouldn't the null be "less than or equal to"? Rejecting a null of "equal to" doesn't suggest an increase (the alternative), only a change.
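- In this course the null hypothesis is always written with an equals sign, since it is, as its name implies, null, meaning that nothing is happening, or as Sal often says, the "no news here" hypothesis. The direction comes from the alternative hypothesis: because $H_a$ here is $p_{2015} > p_{2000}$, the P value only counts differences in the positive direction at least as large as the one observed, so rejecting the null would suggest an increase, not just a change.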
- In the computation of σ, Sal observes that because the premise of the hypothesis test is that the null hypothesis is true, we assume that $p_{2015} = p_{2000}$ and thus use the combined $\hat{p}_c$ as the basis for a "best estimate" of σ. This I understand and agree with.
However, it's not clear to me why the numerator for computing z is $\hat{p}_{2015} - \hat{p}_{2000}$ rather than $\hat{p}_{2015} - \hat{p}_c$. In other videos, Sal describes the numerator as $\hat{p} - p_0$, where $p_0$ is the presumed proportion of the population. In this case, wouldn't $\hat{p}_c$ be a better estimate of the population proportion than $\hat{p}_{2000}$?
- What happens if the normal condition is not met because one or two of the counts fall below 10?
- If the normal condition is not met, then technically you can't continue with this z test and must stop.
- If we are rejecting the null hypothesis, shouldn't we accept the alternative hypothesis, which is that myopia is becoming more common over time?
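The first question above asks why the standard deviation is estimated with the combined proportion instead of each sample's own proportion. A minimal Python sketch, assuming only the counts given in the video (the variable names are illustrative), compares the two estimates. Under the null hypothesis both samples come from populations with one shared proportion, so pooling all 1,000 people gives a single estimate of it.

```python
import math

# Illustrative variable names; the counts are the ones given in the video.
cases_2015, n_2015 = 228, 600
cases_2000, n_2000 = 132, 400

p_2015 = cases_2015 / n_2015  # 0.38
p_2000 = cases_2000 / n_2000  # 0.33

# Combined (pooled) proportion: the estimate of the shared proportion
# if the null hypothesis p_2015 = p_2000 is true.
p_c = (cases_2015 + cases_2000) / (n_2015 + n_2000)  # 0.36

# Pooled standard error, as used in the hypothesis test.
se_pooled = math.sqrt(p_c * (1 - p_c) / n_2015 + p_c * (1 - p_c) / n_2000)

# Unpooled standard error, built from each sample's own proportion
# (the form usually used for a confidence interval for p_2015 - p_2000).
se_unpooled = math.sqrt(
    p_2015 * (1 - p_2015) / n_2015 + p_2000 * (1 - p_2000) / n_2000
)

print(f"pooled SE:   {se_pooled:.4f}")
print(f"unpooled SE: {se_unpooled:.4f}")
```

For these data the two estimates are close (roughly 0.031 each), so z barely changes. The pooled version is the one consistent with assuming the null hypothesis while computing the P value.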
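The second question asks for the mean of the sampling distribution of the difference. A quick simulation can make the answer concrete. It is a sketch under a stated assumption: both populations are given the same proportion, 0.36 (the combined estimate), because that is what the null hypothesis says. The helper name `sample_proportion`, the seed, and the number of repetitions are arbitrary, illustrative choices.

```python
import random
import statistics

# Assumption for this simulation: under the null hypothesis both populations
# share one proportion, set here to the combined estimate 0.36.
P_COMMON = 0.36
N_2015, N_2000 = 600, 400
REPS = 2000  # number of simulated pairs of studies (arbitrary)


def sample_proportion(rng: random.Random, p: float, n: int) -> float:
    """Proportion of successes in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n)) / n


rng = random.Random(2024)  # fixed, arbitrary seed for reproducibility
diffs = [
    sample_proportion(rng, P_COMMON, N_2015) - sample_proportion(rng, P_COMMON, N_2000)
    for _ in range(REPS)
]

mean_diff = statistics.fmean(diffs)
sd_diff = statistics.stdev(diffs)
print(f"mean of simulated differences: {mean_diff:.4f}")
print(f"SD of simulated differences:   {sd_diff:.4f}")
```

The simulated differences should center near 0, the mean under the null hypothesis, with a standard deviation near the pooled estimate of about 0.031. The observed difference of 0.05 is judged against this distribution; it is not the distribution's mean.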
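The question about writing the null as "less than or equal to" turns on which tail the P value uses. The sketch below computes the one-sided P value that matches $H_a: p_{2015} > p_{2000}$ and, for comparison only, the two-sided P value that a "not equal" alternative would use. `upper_tail` is an illustrative helper built on the standard library's `math.erfc`, standing in for the Z table.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = 1.61  # test statistic from the video, rounded as in the transcript

p_one_sided = upper_tail(z)           # alternative: p_2015 > p_2000
p_two_sided = 2 * upper_tail(abs(z))  # alternative: p_2015 != p_2000 (comparison only)

print(f"one-sided P value: {p_one_sided:.4f}")
print(f"two-sided P value: {p_two_sided:.4f}")
```

With z = 1.61 these come out to about 0.054 and 0.107. The one-sided test counts only the upper tail, so a small P value there points specifically to an increase.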
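The question above about the numerator can be checked against the general shape of a z statistic: the statistic minus the parameter's value under the null hypothesis, divided by the standard error. Written side by side as a restatement of the formulas already used (not a new method), with $n_{2015} = 600$ and $n_{2000} = 400$:

```latex
\begin{aligned}
\text{one proportion:}\quad
z &= \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} \\[6pt]
\text{difference of proportions:}\quad
z &= \frac{(\hat{p}_{2015} - \hat{p}_{2000}) - 0}
         {\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_{2015}} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_{2000}}}}
\end{aligned}
```

In the second line the parameter being tested is the difference $p_{2015} - p_{2000}$, whose value under the null hypothesis is 0, so that 0 plays the role of $p_0$. The combined proportion $\hat{p}_c$ appears only inside the standard error, where it stands in for the unknown shared proportion.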
Video transcript
- [Instructor] We are told that researchers suspect that myopia, or nearsightedness, is becoming more common over time. A study from the year 2000 showed 132 cases of myopia in 400 randomly selected people. A separate study from 2015 showed 228 cases in 600 randomly selected people. So what we're going to do in this video is a hypothesis test to see if we have evidence to support the researchers' suspicion that myopia is becoming more common over time. If at any point you are inspired, I encourage you to pause the video and try to work through things on your own, but here I go. I'm going to do it with you. So let's just start
off by setting our null and alternative hypotheses. So remember, our null hypothesis is the "no news here" hypothesis: contrary to the researchers' suspicion, myopia is not becoming more common. The way we're measuring "more common over time" is to look at the proportion of folks who have myopia in 2015 and compare that to the proportion in 2000. So our null hypothesis is that there's no difference: the true proportion of folks who have myopia in 2015 is equal to the true proportion of folks who have myopia in 2000. And then our alternative hypothesis, remember, is what they suspect, that it's becoming more common over time. That would be a situation where our true proportion in 2015 is greater than the true proportion in 2000. In this scenario, myopia would be becoming more common over time because 2015 comes after 2000. So before we even go about testing our null hypothesis, seeing if we can reject it or not, which would suggest our alternative, you have to look at your
conditions for inference. And we've done this many times before. You have your random condition, and it looks like we meet that because both samples are made up of randomly selected people: 400 in 2000 and 600 in 2015. So that looks good. Then you have your normal condition. To meet the normal condition, the number of successes and the number of failures in each of the samples have to be at least 10. And we see that that is the case. In 2000 we have 132 successes, so to speak (not that it's a success for someone to have myopia, but that's how this is being set up), and 400 minus 132, or 268, failures. Both of those numbers are well above 10. The same goes for the sample from 2015, with 228 successes and 372 failures, so we're meeting the normal condition for both samples. And then the last condition that we always talk about is the independence condition. There are two ways to get there: either you are sampling with replacement, or you feel good that your sample size is no more than 10 percent of the population. For even the larger sample of 600, that means we need at least 6,000 people in the population, and I think it is safe to say that there are. So I think it's reasonable to say that we're meeting the independence condition, even though the problem doesn't make it explicit. But it's good to always think about this. Now the next thing you wanna
do in a hypothesis test is set your significance level, your alpha. I'll set my significance level to 0.05. So we're going to assume the null hypothesis and ask: what is the probability of getting a difference between 2015 and 2000 that is at least as large as the one we got? If that probability is less than our significance level, then we reject our null hypothesis, and that would suggest the alternative. If that probability is greater than our significance level, then we fail to reject the null hypothesis, and we don't have evidence for the researchers' suspicion. So let's move ahead with that. What we wanna do is
come up with a Z value, or a Z score. Our Z is going to be equal to our sample proportion in 2015 minus our sample proportion in 2000, all of that over the standard deviation of the sampling distribution of the difference between the sample proportions in 2015 and 2000. Now, I will say this is approximately equal to, because we can calculate the numerator exactly, but the denominator we are going to estimate. The numerator is going to be, let's see (I'll use some different colors), in 2015 we have 228 cases out of 600, so it's 228 over 600. And then in 2000, we have 132 cases out of 400, so minus 132 over 400. And all of that is over a square root. In the denominator here, under the radical sign, we
use the combined proportion. We could write that as p-hat sub c. The reason we use the combined proportion, and we talked about this in previous videos, is that when we do a hypothesis test, we assume that our null hypothesis is true. And if our null hypothesis is true, there's no difference between the true proportions in 2015 and 2000. So to get the best estimate of that shared proportion, we should just combine our samples. Our combined sample size would be 600 plus 400, and the number of cases of myopia would be 228 plus 132, which gets us to 360 over 1,000, or 0.36. We use that inside the expression when we're estimating the standard deviation of this sampling distribution. So under the radical this is going to be 0.36 times one minus 0.36, which is 0.64, over the sample size in 2015, which is 600, plus 0.36 times 0.64 over the sample size in 2000, which is 400. And let's see, before I
even get my calculator out, I think I can simplify this a little bit. For 228 over 600, 228 divided by 6 is 38, so this would be 0.38. Let's see, 132 divided by 4 would be 33, so this would be 0.33. And so our entire numerator is going to be 0.05. Now I can put this into my calculator: 0.05 divided by the square root of 0.36 times 0.64 divided by 600, plus 0.36 times 0.64 divided by 400, is approximately 1.61. So Z is approximately 1.61. One way to think about it is that the difference that we got
between our sample proportions, between 2015 and 2000, of 0.05 is 1.61 standard deviations above the mean of our sampling distribution, which is 0 if we assume that the null hypothesis is true. From this, we can calculate our P value. Remember, our P value is the probability that our Z score is at least that big, that is, greater than or equal to 1.61. One way to think about it: instead of the sampling distribution itself, I can really just look at the standard normal distribution now, since we standardized to get a Z, so we're looking at 1.61 standard deviations above the mean, Z equals 1.61. We're thinking about the area to the right of that point. That would be our P value. To help us with that, we can get out a Z table. This Z table gives us the cumulative area up to some Z score, so we just have to take one minus whatever it gives us. If we go to 1.61, we get 0.9463, so the P value is one minus 0.9463, which is 0.0537. And notice, this P value
is ever so slightly higher than our significance level. But this is why we wanna set our significance level ahead of time. We don't wanna get tempted to say, "Oh, I'm so close, let me just raise my significance level a little bit so that I can reject my null hypothesis and have something to tell my friends about." No, that would not be good science. That would not be good statistics. We have to be disciplined. So here, because our P value is greater than our significance level, even though it's by a very small amount, we fail to reject our null hypothesis. In terms of the context of the question, we can say that there is not enough evidence to suggest that myopia is becoming more common over time. And we're done.
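The three conditions checked at the start of the transcript can be written out as a small Python sketch. The function name `check_conditions` and its dictionary keys are hypothetical, chosen for illustration. Random selection is taken from the problem statement rather than checked, and the 10% rule is reported as a minimum population size because the video only argues that the population is large enough.

```python
def check_conditions(cases: int, n: int) -> dict:
    """Hypothetical helper (illustrative name, not from the video).

    Checks the normal (large counts) condition for one sample and reports the
    smallest population that satisfies the 10% rule. Random sampling is taken
    from the problem statement, not checked here.
    """
    successes = cases
    failures = n - cases
    return {
        "successes": successes,
        "failures": failures,
        "normal_ok": successes >= 10 and failures >= 10,
        # Sampling without replacement: n must be at most 10% of the
        # population, i.e. the population needs at least 10 * n people.
        "min_population": 10 * n,
    }


samples = {"2000": (132, 400), "2015": (228, 600)}  # (cases of myopia, sample size)
results = {year: check_conditions(cases, n) for year, (cases, n) in samples.items()}

for year, r in results.items():
    print(year, r)
```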
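The arithmetic spoken in the transcript, collected into one worked calculation ($\hat{p}_c$ is the combined proportion, and 0.9463 is the Z-table value read in the video):

```latex
\begin{aligned}
\hat{p}_{2015} &= \frac{228}{600} = 0.38, \qquad
\hat{p}_{2000} = \frac{132}{400} = 0.33, \qquad
\hat{p}_c = \frac{228 + 132}{600 + 400} = \frac{360}{1000} = 0.36 \\[6pt]
z &\approx \frac{0.38 - 0.33}{\sqrt{\dfrac{0.36 \cdot 0.64}{600} + \dfrac{0.36 \cdot 0.64}{400}}}
   = \frac{0.05}{\sqrt{0.000384 + 0.000576}}
   = \frac{0.05}{\sqrt{0.00096}} \approx 1.61 \\[6pt]
\text{P value} &= P(Z \ge 1.61) = 1 - 0.9463 = 0.0537 > \alpha = 0.05
\end{aligned}
```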
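The whole test can be reproduced in a short Python sketch that uses only the standard library. `two_proportion_z_test` is a hypothetical helper name; it implements the one-sided, pooled two-sample z test described in the video, and it computes the normal upper-tail area with `math.erfc` instead of a Z table.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Hypothetical helper: one-sided two-sample z test of H0: p1 = p2 vs Ha: p1 > p2.

    Returns (z, p_value). The standard error uses the pooled proportion,
    because the test is carried out assuming H0 is true.
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail area of the standard normal: P(Z >= z) = erfc(z / sqrt(2)) / 2.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


ALPHA = 0.05  # significance level chosen before looking at the data
z, p_value = two_proportion_z_test(228, 600, 132, 400)  # 2015 vs 2000
print(f"z = {z:.2f}, P value = {p_value:.4f}")
print("reject H0" if p_value < ALPHA else "fail to reject H0")
```

Because this sketch keeps z unrounded (about 1.614), its P value comes out near 0.053 rather than the table-based 0.0537 obtained from z = 1.61. Both exceed 0.05, so the conclusion is the same: fail to reject the null hypothesis.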