Video transcript

We want to test the hypothesis that more than 30% of US households have internet access, with a significance level of 5%. We collect a sample of 150 households and find that 57 have access.

To do our hypothesis test, let's establish our null hypothesis and our alternative hypothesis. Our null hypothesis is that the hypothesis is not correct: the proportion of US households that have internet access is less than or equal to 30%. Our alternative hypothesis is what our hypothesis actually is, that the proportion is greater than 30%. We see it over here: we want to test the hypothesis that more than 30% of US households have internet access. That's this right here. This is what we're testing; we're testing the alternative hypothesis.

The way we're going to do it is we're going to assume a proportion for the population based on the null hypothesis. Given that assumption, what is the probability that 57 out of 150 of our samples actually have internet access? If that probability is less than 5%, if it's less than our significance level, then we're going to reject the null hypothesis in favor of the alternative one.

So let's think about this a little bit. We're going to start off assuming the null hypothesis is true, and in that assumption we're going to have to pick a population proportion, or a population mean; we know that for Bernoulli distributions these are the same thing. What I'm going to do is pick a proportion so high that it maximizes the probability of getting this over here. We actually don't even know what that number is yet, so, to think about this a little more intelligently, let's find out what our sample proportion even is. We had 57 people out of 150 having internet access, or 57 households out of 150.
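Written out in symbols, the setup so far could be summarized like this. The notation (p for the population proportion, \hat{p} for the sample proportion, n for the sample size, \alpha for the significance level) is assumed here for illustration and isn't used in the video itself.

```latex
\begin{align*}
H_0 &: p \le 0.30 \\
H_a &: p > 0.30 \\
\alpha &= 0.05 \\
n &= 150 \\
\hat{p} &= \frac{57}{150}
\end{align*}
```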
So our sample proportion is 0.38. Let me write that over here: our sample proportion is equal to 0.38.

When we assume our null hypothesis to be true, we're going to assume a population proportion that maximizes the probability that we get this over here. The highest population proportion that's within our null hypothesis, the one that will maximize the probability of getting this, is actually right at 30%. So we're going to assume our null hypothesis is true and say that our population proportion is 0.3, or 30%. I want you to understand that 29% would have been in our null hypothesis, and 28% would have been in our null hypothesis, but for 29% or 28% the probability of getting this would have been even lower, so it wouldn't have been as strong of a test. If we take the maximum proportion that still satisfies our null hypothesis, we're maximizing the probability that we get this. So if that number is still low, if it's still less than 5%, we can feel pretty good about the alternative hypothesis.

Just to refresh ourselves, we're going to assume a population proportion of 0.3. Sometimes it's helpful to draw these things, so I will draw the distribution. This is what the population distribution looks like based on this assumption right over here: 30% have internet access, and I'll signify that with a 1, and the rest, 70%, do not have internet access. This is just a Bernoulli distribution. We know that the mean over here is going to be the same thing as the proportion that has internet access, so the mean over here is going to be
0.3, the same thing as 30%. This is the population mean. Maybe I should write it this way: the population mean, assuming our null hypothesis, is 0.3. And then the population standard deviation (let me write this over here in yellow), assuming our null hypothesis, and we've seen this when we first learned about Bernoulli distributions, is going to be the square root of the proportion of the population that has internet access, 0.3, times the proportion of the population that does not have internet access, 0.7. So this is the square root of 0.21, and we can deal with this later using our calculator.

Now, with that out of the way, we want to figure out the probability of getting a sample proportion of 0.38. So let's look at the distribution of sample proportions. You could literally look at every combination of getting 150 households from this population, and, as we've also seen before, you would actually get a binomial distribution, where you'd get a bunch of bars like that. But if your n is suitably large, and this is kind of the test for it, if n times p (and in this case we're saying p is 30%) is greater than 5 and n times 1 minus p is greater than 5, you can assume that the distribution of the sample proportion is going to be approximately normal. So if you looked at all of the different ways you could sample 150 households from this population, you'd get all of these bars, but since our n is pretty big (it's 150, and 150 times 0.3 is obviously greater than 5, and 150 times 0.7 is also greater than 5), you
can approximate that with a normal distribution. So let me do that: this is a normal distribution right over there.

Now, the mean of the distribution of sample proportions, which we're assuming is a normal distribution (and remember, we're working under the context that the null hypothesis is true), is going to be this mean right here. The mean of our sample proportions is going to be the same thing as our population mean, so this is going to be 0.3, the same value as that. And the standard deviation, and this comes straight from the central limit theorem, the standard deviation of our sample proportions is going to be our population standard deviation, the standard deviation we're assuming with our null hypothesis, divided by the square root of the number of samples we have, and in this case we have 150 samples. We can calculate this: the value on top, we just figured out, is the square root of 0.21, so this is the square root of 0.21 over the square root of 150.

I can get the calculator out to calculate this. I'll just do it the way I wrote it: the square root of 0.21, and whatever that answer is, I'm going to divide it by the square root of 150. So it's 0.037. So we've figured out that the standard deviation of the distribution of our sample proportions is going to be (let me write this down; I'll scroll over to the right a little bit) 0.037. I think I'm falling off the screen a little bit, so I'll just say 0.037.

Now, to figure out the probability of having a sample proportion of 0.38, we just have to figure out how many
standard deviations that is away from our mean, or essentially calculate a z-statistic for our sample, because the z-statistic, or z-score, is really just how many standard deviations you are away from the mean, and then figure out whether the probability of getting that z-statistic is more or less than 5%.

So let's figure out how many standard deviations we are away from the mean. Just to remind ourselves, the sample proportion we got can be viewed as just a sample from this distribution of all of the possible sample proportions. So how many standard deviations away from the mean is this? If we take our sample proportion, subtract from it the mean of the distribution of sample proportions, and divide by the standard deviation of the distribution of sample proportions, we get 0.38 minus 0.3, all of that over this value, which we just figured out was 0.037.

So what does that give us? The numerator over here is just 0.08, and the denominator is 0.037. Let's figure this out: 0.08 divided by this last number right here, which is the 0.037 (the calculator's previous answer), and we get 2.1... I'll just round it to 2.14 standard deviations. So this right here is equal to 2.14 standard deviations. Or we could say that our z-statistic (we could call this our z-score or z-statistic), the number of standard deviations we are away from our mean, is 2.14, and, to be exact, we're 2.14 standard deviations above the mean. We're going to care about a one-tailed distribution.

Now, is the probability of getting this more or less than 5%? If it's less than 5%, we're going to reject the null hypothesis in favor of our alternative. So how do we think about that? Well, let's think about just a normalized normal
distribution, or maybe you could call it the z distribution if you want. If you look at a completely normalized normal distribution, its mean is at zero, and essentially each of these values is a z-score, because a value of 1 literally means you're one standard deviation away from this mean over here. So we need to find a critical z value right over here (we could even say the critical z-score) so that the probability of getting a z value higher than that is 5%, so that this whole area right here is 5%. That's because that's what our significance level is: anything that has a lower than 5% chance of occurring will be validation for us to reject our null hypothesis. Or, another way of thinking about it, if that area is 5%, this whole area right over here is 95%. And once again, this is a one-tailed test, because we only care about values greater than this; z values greater than that will make us reject the null hypothesis.

To figure out this critical z value, we can literally just go to a z table, and we say: OK, for what z value is the probability of getting a z value less than that 95%? That's exactly the number that this table gives us, the cumulative probability of getting a value less than that. So if we just scan this, we're looking for 95%. We have 0.9495 and we have 0.9505, so I'll go with this one just to make sure we're a little bit closer. The z value here is 1.6 and the next digit is 5, so 1.65. So this critical z value is equal to 1.65. So the probability of getting a z value less than 1.65 in a completely normalized normal distribution, or, in any normal distribution, the probability of being less than 1.65
standard deviations above the mean, is going to be 95%. So that's our critical z value.

Now, the z-statistic for our actual sample, the actual z value we got, is 2.14. It's sitting all the way out here someplace, so the probability of getting that was definitely less than 5%. Actually, we could even ask what the probability is of getting that or a more extreme result; if you figured out this area, and you could actually figure it out by looking at a z table, you could figure out the p-value of this result. But anyway, the whole exercise here is just to figure out whether we're going to reject the null hypothesis with a significance level of 5%. We can: this is a more extreme result than our critical z value, so we can reject the null hypothesis in favor of our alternative.
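The whole test above can be sketched in a few lines of Python. This is a minimal sketch, not something from the video: the function name `one_tailed_proportion_z_test` and its parameters are made up for illustration, and it uses `statistics.NormalDist` from the standard library (Python 3.8 or later) in place of a z table.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_proportion_z_test(successes, n, p0, alpha=0.05):
    """Upper-tailed z-test for a proportion (hypothetical helper).

    Tests H0: p <= p0 against Ha: p > p0 using the normal approximation.
    """
    # Rule of thumb from the transcript: n*p and n*(1-p) should exceed 5.
    if n * p0 <= 5 or n * (1 - p0) <= 5:
        raise ValueError("sample too small for the normal approximation")

    p_hat = successes / n                  # sample proportion
    sd_population = sqrt(p0 * (1 - p0))    # Bernoulli standard deviation under H0
    sd_sampling = sd_population / sqrt(n)  # std. dev. of the sample proportion
    z = (p_hat - p0) / sd_sampling         # z-statistic

    standard_normal = NormalDist()         # mean 0, standard deviation 1
    z_critical = standard_normal.inv_cdf(1 - alpha)
    p_value = 1 - standard_normal.cdf(z)   # area to the right of z

    return {
        "p_hat": p_hat,
        "sd_sampling": sd_sampling,
        "z": z,
        "z_critical": z_critical,
        "p_value": p_value,
        "reject_null": z > z_critical,
    }


# The numbers from this example: 57 of 150 households, H0 boundary at 30%.
result = one_tailed_proportion_z_test(successes=57, n=150, p0=0.30)
print(result)
```

With the numbers from this example, the z-statistic comes out at about 2.14 and the exact critical value is about 1.645 (the table lookup rounds it to 1.65), so the sketch rejects the null hypothesis, matching the conclusion reached by hand. The p-value it reports, the area to the right of the z-statistic, is roughly 0.016.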