Carrying out every step of a significance test on a proportion.
- Why do we assume the significance level is 0.05? (timestamp 1:43)(8 votes)
- That's just a typical significance level. We could have chosen 0.01 or 0.1 as well, which are also typical significance levels, but it's worth noting that with any of these, we still get the same result (failing to reject the null hypothesis) in the original problem.(4 votes)
- Hi, why is the standard deviation that you plug into the calculator 1, instead of 0.0496?(6 votes)
- This is because the upper bound being entered is a z-score. The normalcdf function finds the area between a lower and an upper bound under a normal distribution with the mean and standard deviation you give it. A z-score is already measured in standard deviations from the mean, so the matching distribution is the standard normal, with mean 0 and standard deviation 1. Entering the bounds on the original scale instead (upper bound 0.169, mean 0.2, standard deviation 0.0496) would give the same area.(2 votes)
- When do you put the upper/lower bounds as infinity and negative infinity as opposed to 1 and 0? If it's a proportion, wouldn't the distribution only go from 0 to 1? Or are you assuming it has infinitely long tails to make it a true normal distribution?(3 votes)
- How much time are you given for this problem? Is this one of the longer free response, or the shorter ones?(1 vote)
- Could we have used a binomial distribution instead of a normal distribution? If so, would the probability of getting 11 or fewer be binomCdf(n=65, p=.2, lower bound 0, upper bound 11), which gives .33, much different from Sal's .266?(1 vote)
- I still don't understand it and why do we have to do more than you on the video and you are good at math. I also need a tutor.(1 vote)
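Two of the questions above (why the standard deviation entered is 1, and whether a binomial calculation gives a different answer) can be checked numerically. The following is a minimal sketch, not part of the video: it assumes Python 3.8+ and uses `statistics.NormalDist` as a stand-in for the calculator's normalcdf. All variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, x, p0 = 65, 11, 0.2              # boxes sampled, vouchers found, claimed proportion
p_hat = x / n                       # sample proportion (unrounded)
sd = sqrt(p0 * (1 - p0) / n)        # std. dev. of the sample proportion under H0

# (1) The same left-tail area computed two ways:
#     standardize first and use mean 0, sd 1 ...
z = (p_hat - p0) / sd
area_standard = NormalDist(mu=0, sigma=1).cdf(z)
#     ... or stay on the original scale and use mean p0, sd 0.0496...
area_original = NormalDist(mu=p0, sigma=sd).cdf(p_hat)

# (2) Exact binomial probability of 11 or fewer vouchers in 65 boxes.
binom_tail = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x + 1))

print(f"standard-normal area: {area_standard:.4f}")
print(f"original-scale area:  {area_original:.4f}")
print(f"exact binomial tail:  {binom_tail:.4f}")
```

The two normal areas agree, which is why either set of calculator inputs works. The binomial tail is larger because the video's answer relies on the normal approximation (with rounded inputs and no continuity correction), not because of an arithmetic slip. Both values are far above 0.05, so the conclusion is the same.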
- [Instructor] We're told that some boxes of a certain brand of breakfast cereal include a voucher for a free video rental inside the box. The company that makes the cereal claims that a voucher can be found in 20% of boxes. However, based on their experiences eating the cereal at home, a group of students believes that the proportion of boxes with vouchers is less than 20%. This group of students purchased 65 boxes of the cereal to investigate the company's claim. The students found a total of 11 vouchers for free video rentals in the 65 boxes. Suppose it is reasonable to assume that the 65 boxes purchased by the students are a random sample of all boxes of this cereal. Based on this sample, is there support for the students' belief that the proportion of boxes with vouchers is less than 20%? Provide statistical evidence to support your answer. Like always, pause this video and see if you can answer it by yourself. This actually is a question from an AP Statistics exam. Alright, now let's work through this together, and I'm going to try to model some of what you might wanna do if you were actually trying to answer this on an exam. So, the first thing you might wanna say is, well, what are our null and alternative hypotheses? Our null hypothesis would be that the reality is what the cereal company claims, that 20% of the boxes contain a voucher. Our alternative hypothesis would be what we suspect, that the true proportion of boxes that contain a voucher is actually less than 20%. Now, if you're going to do a significance test, it's good practice to set the significance level that you're eventually going to compare your p-value to ahead of time. So let's assume a significance level, alpha, and let's just go with 0.05. And then we'll wanna think about the sample.
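Written symbolically, the setup so far looks like the following. The symbols are a notational assumption, since the transcript does not show what was written on screen: p stands for the true proportion of boxes that contain a voucher.

```latex
H_0 : p = 0.20 \qquad H_a : p < 0.20 \qquad \alpha = 0.05
```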
We're going to figure out, if we assume that the null hypothesis is true, what's the probability that we get a sample proportion at least as low as the one we got? And if that is below this significance level, then we would reject the null hypothesis. So, what do we know about the sample? We know that we took 65 boxes of cereal, so n is equal to 65, they tell us that right over there, and from that, we can calculate the sample proportion. It's going to be 11 out of 65, and we can get our calculator out; calculators are allowed on this part of the exam. So, what is 11 divided by 65? Rounding to the nearest thousandth, it gives us 0.169. I'll say approximately, 'cause I rounded it there. Now, the next thing we wanna do before we make an inference is to make sure we're meeting the conditions for inference, so I'll write this down over here: conditions for inference. This is to feel good that we have properly sampled the population and that our sampling distribution is going to be roughly normal. The first one is the random condition, that this is truly a random sample, and here they tell us it is reasonable to assume that the 65 boxes purchased by the students are a random sample, so that checks that off. The next one is the normal condition, that the shape of the sampling distribution is roughly normal and isn't skewed dramatically one way or the other. In order to meet that condition, we look at the sample size times the assumed proportion, and since we're going to assume that the null hypothesis is true, this is the proportion assumed in the null hypothesis. That's what that notation would imply, and if you're doing this on the actual test, you should explain your use of notation a little bit more than I might do for the sake of time.
But this product, n times the assumed proportion, needs to be greater than or equal to 10, and n times one minus the assumed proportion also needs to be greater than or equal to 10. Well, let's see, n is 65 and the assumed proportion is 0.2, so 65 times 0.2 is equal to 13. 13 is indeed greater than or equal to 10, so that checks off. And then we take n, 65, times one minus the assumed proportion, so 0.8, and that is just 65 minus 13, which is equal to 52, and that indeed is also greater than or equal to 10. So, we met that condition right over there. And then the last one is independence. We aren't sampling these boxes with replacement, so we need to feel good that they are less than 10% of the population of boxes. They don't tell us that explicitly, but it would be good practice to say that we're going to assume more than 10 times that, so more than 650 boxes, in the population, which would imply that n is less than 10% of the population, and that would allow us to check off the independence condition. So, given that we've met our conditions for inference, now let's think about the sampling distribution of the sample proportions, because that's what we're going to use to calculate a p-value. We know a few things about this sampling distribution. We know that its mean is just going to be the assumed true proportion, so that's the proportion from the null hypothesis. And we know that the standard deviation of the sampling distribution of the sample proportions, and we've seen this in multiple videos already, is the square root of the assumed proportion times one minus the assumed proportion from our null hypothesis, divided by n, which in this case is going to be the square root of 0.2 times 0.8, all of that over 65.
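The condition checks and the standard deviation just described can be sketched in a few lines. This is an illustration using the numbers from the problem, not something shown in the video, and the variable names are made up for the example.

```python
from math import sqrt

n = 65      # boxes sampled
p0 = 0.2    # proportion assumed under the null hypothesis

# Normal condition: expected counts of successes and failures under H0
# should both be at least 10.
expected_successes = n * p0
expected_failures = n * (1 - p0)
normal_ok = expected_successes >= 10 and expected_failures >= 10

# Independence (10% rule): sampling without replacement is fine if the
# population holds at least 10 times as many boxes as the sample.
min_population = 10 * n

# Standard deviation of the sampling distribution of the sample proportion,
# computed with the null-hypothesis proportion.
sd_p_hat = sqrt(p0 * (1 - p0) / n)

print("expected successes:", round(expected_successes, 2))
print("expected failures: ", round(expected_failures, 2))
print("normal condition met:", normal_ok)
print("population should be at least:", min_population)
print("sd of p-hat:", round(sd_p_hat, 4))
```

Note that the standard deviation uses the assumed proportion 0.2 rather than the sample proportion 0.169, because the whole calculation is carried out under the assumption that the null hypothesis is true.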
Once again, let's get our calculator out. So, we're gonna have the square root of 0.2 times 0.8 divided by 65, and then close my parentheses, and I get approximately 0.0496. Now, the next step is to figure out the p-value, which we can then compare to our significance level to decide whether or not to reject the null hypothesis. In order to calculate the p-value, let's figure out our z statistic, which tells us how many standard deviations above or below the mean of the sampling distribution our sample statistic is, the one that we happened to get for this sample of 65. We have seen this in previous videos. This would be equal to our sample proportion minus the assumed proportion for the population in the null hypothesis, so the difference between those, divided by the standard deviation of the sampling distribution of the sample proportions. So, in this particular situation, this is going to be 0.169 minus 0.2, all of that over this value right over here, which is approximately 0.0496. I can get the calculator out again. We have 0.169 minus 0.2, so that's how far our sample proportion is below the mean of the sampling distribution, which is the assumed population proportion from the null hypothesis, and then we divide that by the standard deviation of the sampling distribution of the sample proportions, so by 0.0496. We get a z-value of approximately, 'cause remember, this is using a bunch of approximations right over here, negative 0.625. So, z is approximately negative 0.625, and now we can think about the actual p-value. Our p-value is equal to the probability of getting a sample proportion that is at least as low as the one that we got, so a sample proportion that is less than or equal to the one that we got, 0.169.
That's assuming the null hypothesis is true, and it is equal to the probability of getting a z statistic that is less than or equal to this value right over here, negative 0.625. And now we can use our calculator to actually calculate this. So, what we can do is go to second, distribution, and we wanna do normalcdf. Our lower bound is going to be, we could say, negative infinity. Our upper bound is going to be negative 0.625. This is, you could say, a standardized normal distribution here, with mean zero and standard deviation one, so we'll just go with those values, 'cause we're just thinking about the z statistic right over here, and then click enter, and then click enter again. And then we get approximately 0.266. So let's just make sure we understand what we just did. If this right over here is the sampling distribution of the sample proportions, where we are assuming that our null hypothesis is true, so the mean of our sampling distribution is going to be our assumed proportion, what we're saying is, look, we got a result over here. This is where our p-hat happened to be. What's the probability of getting a result that far below the assumed proportion or further? That is what we calculated just now. When you look at this, it's almost a 27% probability. When we compare our p-value to our significance level, we see that our p-value is clearly greater: 0.266 is clearly greater than our significance level of 0.05.
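The z statistic and p-value can be reproduced off the calculator as well. The sketch below assumes Python 3.8+ and uses `statistics.NormalDist` from the standard library in place of normalcdf. It starts from the same rounded inputs used in the video, so the variable names and rounding choices are illustrative.

```python
from statistics import NormalDist

p_hat = 0.169       # sample proportion, rounded as in the video
p0 = 0.2            # proportion assumed under the null hypothesis
sd_p_hat = 0.0496   # rounded standard deviation of the sampling distribution

# How many standard deviations the sample proportion sits from the assumed mean.
z = (p_hat - p0) / sd_p_hat

# Left-tail area under the standard normal curve (mean 0, sd 1),
# the counterpart of normalcdf(-infinity, z) on the calculator.
p_value = NormalDist().cdf(z)

print("z =", round(z, 3))
print("p-value =", round(p_value, 3))
```

Only the left tail is used because the alternative hypothesis is one-sided (less than 20%). Starting from unrounded inputs shifts z and the p-value slightly, but not enough to affect the comparison with 0.05.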
What we were saying is, if there was less than a 5% chance of getting a sample proportion at least as low as the one that we got, then we would reject the null hypothesis, which would suggest the alternative. But here the probability of getting a sample proportion that low, if we assume that the null hypothesis is true, is almost 27%, and that's well above our significance level. So, because of this, we fail to reject our null hypothesis, and from that, we can say there is not enough evidence to suggest our alternative hypothesis. And if you have time you might wanna say, there's not enough evidence to suggest that less than 20% of the boxes have the free video rental voucher that they talked about in the original problem description.
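The final comparison is simple enough to write as a rule. The helper below is hypothetical (the name `decide` is made up for this sketch) and only illustrates the decision step. It also checks the other commonly used significance levels mentioned in the comments.

```python
def decide(p_value: float, alpha: float) -> str:
    """Return the test decision for a given p-value and significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

p_value = 0.266  # approximate p-value from the video

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

With a p-value near 0.27, the decision is the same at all three levels: the sample does not give convincing evidence that fewer than 20% of boxes contain a voucher.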