## AP®︎/College Statistics

### Unit 10: Lesson 6

Concluding a test for a population proportion

# Significance test for a proportion free response example

AP.STATS: DAT‑3 (EU), DAT‑3.A (LO), DAT‑3.A.1 (EK), DAT‑3.A.2 (EK), DAT‑3.B (LO), DAT‑3.B.2 (EK), DAT‑3.B.8 (EK), VAR‑6 (EU), VAR‑6.D (LO), VAR‑6.D.1 (EK), VAR‑6.D.2 (EK), VAR‑6.D.3 (EK), VAR‑6.D.4 (EK), VAR‑6.D.5 (EK), VAR‑6.E (LO), VAR‑6.E.1 (EK), VAR‑6.F (LO), VAR‑6.F.1 (EK), VAR‑6.G (LO), VAR‑6.G.1 (EK), VAR‑6.G.2 (EK), VAR‑6.G.3 (EK), VAR‑6.G.4 (EK)
Carrying out every step of a significance test on a proportion.

## Want to join the conversation?

- Why do we assume the significance level is 0.05? (at 1:43)
- That's just a typical significance level. We could have chosen 0.01 or 0.1 as well, which are also typical significance levels, but it's worth noting that with any of these, we still get the same result (failing to reject the null hypothesis) in the original problem.

- Hi, why is the standard deviation that you plug into the calculator 1 instead of 0.0496?
- This is to ensure that the upper bound is equal to the z-score. The normalcdf function does not take in z-scores; it finds the area between a concrete lower and upper bound (rather than a relative z-score) given a mean and standard deviation. So, in order to make the bounds equal to the z-scores, the standard deviation is set to 1.

- When do you put the upper/lower bounds as infinity and negative infinity as opposed to 1 and 0? If it's a proportion, wouldn't the distribution only go from 0 to 1? Or are you assuming it has infinitely long tails to make it a true normal distribution?
- How much time are you given for this problem? Is this one of the longer free response questions, or one of the shorter ones?
- If you divide up your time, you should have about 15 minutes for each question.

- Could we have used a binomial distribution instead of a normal distribution? If so, would the probability of getting less than or equal to 11 be binomCdf(n=65, p=.2, lower bound 0, upper bound 11), which gives .33, much different from Sal's .266?
- I still don't understand it. Why do we have to do more than you do in the video, when you are good at math? I also need a tutor.
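
Several of the calculator questions above can be checked numerically. What follows is a minimal sketch using only the Python standard library, not the calculator workflow from the video; the variable names are illustrative assumptions. It compares the normalcdf-style area on the z scale (mean 0, standard deviation 1) with the same area on the raw proportion scale (mean 0.2, standard deviation about 0.0496), then with a lower bound of 0 instead of negative infinity, and finally with the exact binomial probability of 11 or fewer vouchers.

```python
from math import comb, sqrt
from statistics import NormalDist

# Values from the problem: 11 vouchers in 65 boxes, claimed proportion 0.2.
n, x, p0 = 65, 11, 0.2
p_hat = x / n
sd = sqrt(p0 * (1 - p0) / n)  # about 0.0496

# z scale: mean 0, standard deviation 1, upper bound is the z statistic.
z = (p_hat - p0) / sd
p_z_scale = NormalDist(0, 1).cdf(z)

# Raw proportion scale: mean p0, standard deviation sd, upper bound is p_hat.
raw = NormalDist(p0, sd)
p_raw_scale = raw.cdf(p_hat)

# Same raw-scale area, but with a lower bound of 0 instead of negative infinity.
p_from_zero = raw.cdf(p_hat) - raw.cdf(0)

# Exact binomial probability of 11 or fewer vouchers in 65 boxes under H0.
p_binomial = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x + 1))

print(f"z = {z:.3f}")
print(f"normal area, z scale:      {p_z_scale:.4f}")
print(f"normal area, raw scale:    {p_raw_scale:.4f}")
print(f"normal area, from 0:       {p_from_zero:.4f}")
print(f"exact binomial P(X <= 11): {p_binomial:.4f}")
```

The two normal areas agree, which is why either set of calculator inputs works as long as the bounds match the scale. Starting at 0 barely changes the area because 0 sits about four standard deviations below 0.2. The exact binomial probability comes out larger than the normal approximation, close to the .33 mentioned above, and both are far above 0.05. Because this sketch does not round intermediate values, its z is about -0.62 rather than the video's -0.625.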

## Video transcript

- [Instructor] We're told that some boxes of a certain brand of breakfast
cereal include a voucher for a free video rental inside the box. The company that makes the cereal claims that a voucher can be
found in 20% of boxes, however, based on their experiences eating the cereal at home, a group of students
believes that the proportion of boxes with vouchers is less than 20%. This group of students
purchased 65 boxes of the cereal to investigate the company's claim. The students found a total of 11 vouchers for free video rentals in the 65 boxes. Suppose it is reasonable to assume that the 65 boxes purchased by the students are a random sample of all boxes of this cereal. Based on this sample, is there support for the students' belief
that the proportion of boxes with vouchers is less than 20%? Provide statistical evidence
to support your answer. And so, like always, pause this video and see if you can answer it by yourself, and this actually is a question
from an AP Statistics exam. All right, now let's work
through this together and I'm going to try to model some of what you might wanna do
if you were actually trying to answer this on an exam. So, the first thing
you might wanna say is, well, what's our null and
our alternative hypothesis? Well, our null hypothesis would be, well, the reality is what
the breakfast brand claims, that 20% of the boxes contain a voucher, so that would be our null hypothesis, and our alternative hypothesis
would be what we suspect, that the true proportion of boxes that contain a voucher is
actually less than 20%. Now, if you're going to
do a significance test, it's good practice to set
up your significance level that you're going to
eventually compare your p-value to ahead of time. And so, let's assume a significance level, so let me write this, significance level alpha, let's just go with 0.05, and then we'll wanna
think about the sample. And we're going to figure out, if we assume that the
null hypothesis is true, what's the probability that we get a sample proportion at least as low as the one we did? And if that is below
this significance level, then we would reject the null hypothesis. And so, what we know about the sample, we know that we took 65 boxes of cereal, n is equal to 65, they tell
us that right over there, and from that, we can calculate what the sample proportion is. It's going to be 11 out of 65 and we can get our calculator out, calculators are allowed
on this part of the exam. And so, what is 11 divided by 65? It gives us, and I'll just round to the nearest thousandth, 0.169. I'll say approximately
'cause I rounded it there. Now, the next thing we wanna do before we make an
inference is to make sure we're meeting the
conditions for inference, so I'll write this down over here. Conditions for inference, and this is to feel good that we have properly
sampled the population and that our sampling distribution is going to be roughly normal. So, the first one is the random condition, that this is truly a random sample, and here they tell us it is reasonable to assume
that the 65 boxes purchased by the students are a random sample, so that checks that off. So, I will just point that
to that right over there, so that checks that off. The next one is the normal condition, that the shape is roughly normal and it isn't skewed dramatically
one way or the other and in order to meet that condition, the sample size times the
true assumed proportion, and we're going to assume that
the null hypothesis is true, and so, we could say that, and we could even say that
this is the proportion assumed in the null hypothesis. That's what that notation would imply, and if you're doing
this on the actual test, you should explain your use
of notation a little bit more than I might do for the sake of time. But this needs to be
greater than or equal to 10 and n times one minus the
assumed proportion needs to be greater than or equal to 10. Well, let's see, n is 65, so 65 times the assumed proportion, which is 0.2, is going to be equal to 13. 13 is indeed greater than or equal to 10, so that checks off. And then we would take n, 65, times one minus the
assumed proportion, so 0.8, and that is going to
be equal to, let's see, that would just be 65 minus 13, which is going to be equal to 52, and that indeed is also
greater than or equal to 10. So, we met that condition right over there and then the last one is the independence condition. We aren't sampling these
boxes with replacement, so we need to feel good
that they are less than 10% of the population of boxes, and they don't tell us that explicitly, but it would be good practice to say we're going to assume more than, let's see, 10 times that, 650 boxes in the population, which would imply that n is less than 10%, or less than or equal
to 10%, of the population, which would allow us to check off the independence condition. And so, given that we've met
our conditions for inference, now let's think about the
sampling distribution of the sample proportions, because that's what we're going to use to calculate a p-value. So, we know a few things about
the sampling distribution of the sample proportions. We know that the mean of
the sampling distribution of the sample proportions is just going to be the assumed true proportion, so that's the proportion
from the null hypothesis, and we know the standard deviation of the sampling distribution of the sample proportions. This is going to be equal to, and we've seen this in
multiple videos already, the square root of this whole quantity: the assumed
proportion times one minus the assumed proportion
from our null hypothesis, divided by n, which in this case is going to be equal to the square root of 0.2 times 0.8 over 65. Once again, let's get our calculator out. So, we're gonna have the square root of 0.2 times 0.8 divided by 65 and then close my parentheses. I get approximately 0.0496. Now, the next step is to
figure out the p-value which we can then compare
to our significance level to decide whether or not to
reject the null hypothesis. And in order to calculate the p-value, let's figure out our z statistic, which is, how many
standard deviations above or below the mean of the
sampling distribution is the sample statistic that we happen to get for this sample of 65? And we have seen this in previous videos. This would be equal to
our sample proportion minus the assumed proportion for the population in the null hypothesis, so the difference between those, and then divided by the standard deviation of the sampling distribution
of the sample proportions. This would tell us how many
standard deviations are we above or below the mean of the
sampling distribution. So, in this particular situation, this is going to be 0.169 minus 0.2, all of that over this
value right over here, which is approximately 0.0496. I can get the calculator out again. And so, we have 0.169 minus 0.2, so that's how far
our sample proportion is below the mean of the
sampling distribution, which is the assumed proportion
from the null hypothesis, assumed population proportion, and then we divide that. We're gonna divide that
by the standard deviation of the sampling distribution of the sample proportions, so divide that by 0.0496 and we get a z-value of approximately, 'cause remember, this is using a bunch of approximations right over here, about negative 0.625. So, z is approximately negative 0.625 and so now we can think
about the actual p-value. Our p-value is equal to the probability of getting a sample proportion that is at least as low
as the one that we got, so a sample proportion that is less than or equal to the one that we got, 0.169, assuming the null hypothesis is true, so we could say assuming
the null hypothesis is true, which is equal to the probability of getting a z statistic that is less than or equal to this value right
over here, negative 0.625. And now we can use our calculator to actually calculate this. So, what we can do is, we can go to second distribution. We wanna do normalcdf, so go to normalcdf, and then our lower bound
is actually going to be, we could say, negative infinity. Our upper bound is going to be negative 0.625. This is, you could say, a standard normal distribution here, with a mean of zero and a standard deviation of one, so we'll just leave those as they are 'cause we're just thinking about the z statistic right over here, click enter, and then click enter. And then we get, let's say, 0.266. So, this is approximately 0.266, and so let's just make
sure what we just did. If this right over here is the
assumed sampling distribution of the sample proportions where we are assuming that
our null hypothesis is true, so the mean of our sampling
distribution is going to be our assumed proportion, what we're saying is, look,
we got a result over here. This is where our p-hat
happened to be right over here. What's the probability of getting a result that far below the assumed
proportion or further? So, this is what we calculated just now. Now when you look at this, this
is almost a 27% probability. Now we're gonna compare our p-value to our significance level, and we see that our p-value
is clearly greater than our significance level: 0.266 is clearly greater than 0.05. What we were saying is, if there was less than a 5% chance of getting a sample
proportion at least as low as the one that we got, then we would reject the null hypothesis, which would suggest the alternative, but here the probability of getting a sample
proportion that low, if we assume that the
null hypothesis is true, is almost 27%, and so that's well above
our significance level. So, because of this, we fail to reject our null hypothesis, and from that, we can
say there's not enough evidence to suggest our alternative hypothesis. And if you have time you might wanna say, there's not enough evidence to suggest that less than 20% of the boxes have the free video rental voucher that they talked about in the
original problem description.
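
The steps worked through above (check the conditions, compute the z statistic, find the p-value, compare it with alpha) can be sketched in Python. This is a minimal illustration using only the standard library, not the calculator procedure from the video. The function name, the result class, and the `population_at_least` parameter are hypothetical names introduced here, and the figure of 650 boxes is the same assumption made in the transcript, not a value given in the problem. The random-sample condition cannot be checked in code; it is taken from the problem statement.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class ProportionTestResult:
    conditions_met: bool
    z: float
    p_value: float
    reject_null: bool


def lower_tail_proportion_test(x, n, p0, alpha=0.05, population_at_least=None):
    """One-sided z test of H0: p = p0 against Ha: p < p0 (hypothetical helper)."""
    # Normal condition: expected successes and failures under H0 are both >= 10.
    normal_ok = n * p0 >= 10 and n * (1 - p0) >= 10
    # Independence (10% condition): the sample is at most 10% of the population.
    # The population size is an assumption supplied by the caller.
    independent_ok = population_at_least is not None and 10 * n <= population_at_least

    p_hat = x / n                      # sample proportion
    sd = sqrt(p0 * (1 - p0) / n)       # SD of the sampling distribution under H0
    z = (p_hat - p0) / sd              # standard deviations below the assumed mean
    p_value = NormalDist().cdf(z)      # area to the left of z, standard normal
    return ProportionTestResult(normal_ok and independent_ok, z, p_value, p_value < alpha)


# 11 vouchers in 65 boxes, claimed proportion 0.2, assuming at least 650 boxes exist.
result = lower_tail_proportion_test(x=11, n=65, p0=0.2, population_at_least=650)
print(result)
```

With these inputs the conditions hold and the p-value is well above 0.05, so the sketch reaches the same conclusion as the transcript: fail to reject the null hypothesis. Since no intermediate values are rounded here, z and the p-value differ slightly in the third decimal place from the 0.625 and 0.266 obtained on the calculator.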