# Small sample hypothesis test

## Video transcript

The mean emission of all engines of a new design needs to be below 20 parts per million if the design is to meet new emission requirements. 10 engines are manufactured for testing purposes, and the emission level of each is determined. The emission data is, and they give us 10 data points for the 10 test engines, and I went ahead and calculated the mean of these data points. The sample mean is 17.17. And the standard deviation of these 10 data points right here is 2.98, the sample standard deviation. Does the data supply sufficient evidence to conclude that this type of engine meets the new standard? Assume we are willing to risk a type-1 error with a probability of 0.01. And we'll touch on this in a second. Before we do that, let's just define what our null hypothesis and our alternative hypothesis are going to be. Our null hypothesis can be that we don't meet the standards. That we just barely don't meet the standards. That the mean of our new engines is exactly 20 parts per million. You essentially want the lowest possible value of the mean where we still don't meet the standard. And then our alternative hypothesis says no, we do meet the standard. That the true mean for our new engines is below 20 parts per million. And to see if the data that we have is sufficient, what we're going to do is assume that the null hypothesis is true. And if, given that it's true, the probability of getting a sample mean of 17.17 or lower is less than 1%, then we will reject the null hypothesis. So we are going to reject our null hypothesis if the probability of getting a sample mean of 17.17 or lower, given the null hypothesis is true, is less than 1%. And notice, if we do it this way there will be no more than a 1% chance that we are making a type-1 error. A type-1 error is that we're rejecting the null hypothesis even though it's true.
Here there's only a 1% chance, or less than a 1% chance, that we will reject it if it is true. Now the next thing we have to think about is what type of distribution we should use. And I guess the first thing that rings in my brain is we only have 10 data points here. We have a small sample size right over here. So we're going to be dealing with a T-distribution and a T-statistic. So with that said, let's think of it this way. We can come up with a T-statistic that is based on these statistics right over here. So the T-statistic is going to be 17.17, our sample mean, minus the assumed population mean, 20 parts per million, over our sample standard deviation, 2.98, divided by the square root of our sample size. We have 10 data points, so it's divided by the square root of 10. This is really the definition of the T-statistic. And hopefully we see now that this really comes from a Z-score, and the T-distribution is kind of an engineered version of the normal distribution using T-statistics. So this value right here, let me get the calculator out just to get a value in place there. So this is going to be, open parentheses, 17.17 minus 20, close parentheses, divided by, open parentheses, 2.98 divided by the square root of 10, close parentheses. It is almost exactly negative 3. Our T-statistic is almost exactly negative 3, negative 3.00. And because T-statistics have a T-distribution, what we need to figure out is whether the probability of getting this T-statistic, a value of T equal to this or less than this, is less than 1%. So the way we can think about it is we have a T-distribution. And let's say we have a normalized T-distribution. The distribution of all the T-statistics would be a normalized T-distribution. This is the mean of the T-distribution. There's going to be some threshold T-value right here. So this is our threshold T-value.
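Written out, the T-statistic calculation described in the passage above is the following. The symbols $\bar{x}$, $\mu_0$, $s$, and $n$ are notation introduced here (not used in the video) for the sample mean, the mean assumed under the null hypothesis, the sample standard deviation, and the sample size:

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{17.17 - 20}{2.98 / \sqrt{10}} \approx \frac{-2.83}{0.942} \approx -3.00
$$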
My writing isn't that easy to view. This is some threshold T-value right over here. And we want a threshold T-value such that the probability of getting a T-value less than that is 1%. So that entire area in yellow is 1%. And we need to figure out a threshold T-value there. And this is for a T-distribution that has n equal to 10, so 10 minus 1 equals 9 degrees of freedom. So what is that threshold value over there? And notice that this is a one-sided test. The area we care about is 1%, and then all of this stuff over here is going to be 99%. And the way most T-tables are set up, they don't give you a negative T-value that is oriented like this, they'll just give you a positive T-value that's oriented the other way. So the way T-tables are set up, and I have one that we're going to use in a second right over here, is you have your distribution like this, and they will just give a positive T-value over here, some threshold value, where the probability of getting a T-value above that is going to be 1%, and the probability of getting a T-value below that is going to be 99%. And we know T-distributions are symmetric around their mean, so whatever value this is, if this number is 2, then this value's just going to be negative 2. So we just have to keep that in mind. But the T-tables actually help us figure out this value. So let's figure out a T-value where the probability of getting a T-value below that is 99%. And once again, this is going to be a one-sided situation. So let's look at that over here. So one-sided, and this is just straight from Wikipedia, we want the cumulative distribution below that T-value to be 99%. We have it right over here, 99%. We have 9 degrees of freedom. We have 10 data points, 10 minus 1 is 9. 9 degrees of freedom.
So our threshold T-value here is 2.821. In the case that we care about, we just flip this over, since it's completely symmetric, so it's negative 2.821. So what this tells us is the probability of getting a T-value less than negative 2.821 is going to be 1%. Now we got a value that's a good bit less than that. We got a T-value of negative 3. We got a T-value right here, our T-statistic of negative 3 right over here. So that definitely goes into our, I guess you could call it our area of rejection. This is even less probable than the 1%. We could even figure out the area over here: the probability of getting a T-statistic less than negative 3 is even less than 1%, since it's a subset of this yellow area right over here. So because the probability of getting the T-statistic that we actually got is less than 1%, we can safely reject the null hypothesis and feel pretty good about our alternative hypothesis right over here, that we do meet the emission standards. And we know that we have a lower than 1% chance of actually making a type-1 error in this circumstance.
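The test walked through in the transcript can be checked numerically. The sketch below is only an illustration: the video reads the threshold from a T-table, whereas this code also estimates the left-tail probability by integrating the T-distribution's density with Simpson's rule. The function names `t_pdf` and `t_cdf` and all variable names are hypothetical, chosen for this example, and only Python's standard library is assumed.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson's rule on [0, |t|] and symmetry about 0.

    steps must be even for Simpson's rule.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

# Values taken from the transcript.
sample_mean = 17.17   # mean of the 10 emission readings
null_mean = 20.0      # mean assumed under the null hypothesis
sample_sd = 2.98      # sample standard deviation
n = 10                # sample size
alpha = 0.01          # accepted probability of a type-1 error
critical_t = -2.821   # T-table threshold for 9 degrees of freedom, flipped to the left tail

df = n - 1
t_stat = (sample_mean - null_mean) / (sample_sd / math.sqrt(n))
p_value = t_cdf(t_stat, df)        # left-tail probability, since the test is one-sided
reject_null = t_stat < critical_t  # same decision rule as in the transcript

print(f"t-statistic: {t_stat:.3f}")
print(f"left-tail probability: {p_value:.4f}")
print(f"reject null hypothesis at alpha={alpha}: {reject_null}")
```

Comparing `t_stat` with `critical_t` and comparing `p_value` with `alpha` are two views of the same decision: the T-statistic lands to the left of the threshold exactly when its left-tail probability is below 1%.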