# Small sample hypothesis test

## Video transcript

The mean emission of all engines
of a new design needs to be below 20 parts per million
if the design is to meet new emission
requirements. 10 engines are manufactured for
testing purposes, and the emission level of each
is determined. They give us the emission data,
10 data points for the 10 test engines, and I went
ahead and calculated the mean of these data points. The sample mean is 17.17. And the standard deviation of
these 10 data points right here is 2.98, the sample
standard deviation. Does the data supply sufficient
evidence to conclude that this type of
engine meets the new standard? Assume we are willing to risk
a type-1 error with a probability of 0.01. And we'll touch on
this in a second. Before we do that, let's just
define what our null hypothesis and our alternative
hypothesis are going to be. Our null hypothesis
can be that we don't meet the standards. That we just barely don't
meet the standards. That the mean of our new engines
is exactly 20 parts per million. And you essentially want the
best possible value where we still don't meet, or the lowest
possible value, where we still don't meet
the standard. And then our alternative
hypothesis says no, we do meet the standard. That the true mean for our
new engines is below 20 parts per million. And to see if the data that we
have is sufficient, what we're going to do is assume
that the null hypothesis is true. And if, given that it is true,
the probability of getting a sample mean this low
is less than 1%, then we will reject the null
hypothesis. So we are going to reject our
null hypothesis if the probability of getting a sample
mean of 17.17 or lower, given that the null hypothesis is true,
is less than 1%. And notice, if we do it this way
there will be less than a 1% chance that we are making
a type-1 error. A type-1 error is that
we're rejecting it even though it's true. Here there's only a 1% chance,
or less than a 1% chance that we will reject it
if it is true. Now the next thing we have to
think about is what type of distribution we should
think about. And I guess the first thing that
rings in my brain is we only have 10 samples here. We only have 10 samples. We have a small sample
size right over here. So we're going to be dealing
with a T-distribution and T-statistic. So with that said, so let's
think of it this way. We can come up with a
T-statistic that is based on these statistics right
over here. So the T-statistic is going to
be 17.17, our sample mean, minus the assumed population
mean, 20 parts per million, over our sample standard
deviation, 2.98, divided by the square
root of our sample size. We have 10 samples, so
it's divided by the square root of 10. This is really the definition
of the T-statistic. And hopefully we see now that
this really comes from a Z-score, and the T-distribution
is kind of an engineered version of the normal
distribution for T-statistics. So this value right here-- let
me get the calculator out just to get a value in place there. So this is going to be, open parentheses, 17.17
minus 20, close parentheses, divided by, open parentheses, 2.98 divided
by the square root
of 10, and then close parentheses. It is almost exactly
negative 3. Our T-statistic is
almost exactly negative 3, negative 3.00. And because T-statistics follow
a T-distribution, what we
need to figure out is whether the probability of getting this
T-statistic, a value of T equal to this or less than this,
is less than 1%. So the way we can think
about it is we have a T-distribution. And let's say we have a
normalized T-distribution. The distribution of all the
T-statistics would be a normalized T-distribution. This is the mean of the
T-distribution. There's going to be some
threshold T-value right here. So this is our threshold
T-value. My writing isn't that
easy to view. This is some threshold T-value
right over here. And we want a threshold T-value
such that any T-value less than that, or the
probability of getting a T-value less than that is 1%. So that entire area
in yellow is 1%. And we need to figure out a
threshold T-value there. And this is for a T-distribution
that has n equal to 10 or 10 minus 1 equals
9 degrees of freedom. So what is that threshold
value over there? And notice that this is a
one-sided test. We care that this area is 1%, and
then all of this stuff over here is going to be 99%. And just the way most T-tables
are set up, they don't set up a negative T-value that is
oriented like this, they'll just give you a positive
T-value that's oriented the other way. So the way T-tables-- and I have
one that we're going to use in a second right over
here-- the way T-tables are set up is you have your
distribution like this, and they will just give a positive
T-value over here, some threshold value. Where the probability of getting
a T-value above that is going to be 1%, and the
probability of getting a T-value below that is
going to be 99%. And you can see that-- well,
we know T-distributions are symmetric around their mean, so
whatever value this is, if this number is 2 then this
value's just going to be negative 2. So we just have to keep
that in mind. But the T-tables actually help
us figure out this value. So let's figure out a T-value
where the probability of getting a T-value below
that is 99%. And once again, this is going
to be a one-sided situation. So let's look at
that over here. So one-sided-- this is just
straight from Wikipedia-- one-sided, we want the
cumulative distribution below that T-value to be 99%. We have it right
over here, 99%. We have 9 degrees of freedom. We have 10 data points,
10 minus 1 is 9. 9 degrees of freedom. So our threshold T-value here
is 2.821, and our threshold T-value in the case that we care
about, if we just flip this over, since it's completely symmetric,
is negative 2.821. So what this tells us is the
probability of getting a T-value less than negative
2.821 is going to be 1%. Now we got a value that's
a good bit less than that. We got a T-value of negative 3. We got a T-value right here, our
T-statistic of negative 3, right over here. So that definitely goes into
our-- I guess you could call it our area of rejection. This is even less probable
than the 1%. We could even figure out
the area over here: the probability of getting a
T-statistic less than negative 3 is even smaller, because it's a
subset of this yellow area right over here. So because the probability of
getting the T-statistic that we actually got, or one even lower, is less than 1%,
we can safely reject the null hypothesis and feel
pretty good about our alternative hypothesis right over
here, that we do meet the emission standards. And we know that we have a
lower than 1% chance of actually making a type-1 error
in this circumstance.
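
The steps in this video can be retraced in a few lines of Python. This is only a minimal sketch, not something shown in the video: the helper names `t_pdf` and `t_cdf` are hypothetical, the T-distribution's cumulative probability is approximated with Simpson's rule so that nothing beyond the standard library is needed, and the threshold of 2.821 is copied from the T-table rather than computed.

```python
import math

def t_pdf(x, df):
    """Density of Student's T-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=20000):
    """Approximate P(T <= x) using Simpson's rule on [0, |x|] plus symmetry.

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

# Numbers taken from the problem statement in the video.
sample_mean = 17.17   # mean of the 10 test engines
null_mean = 20.0      # mean assumed under the null hypothesis
sample_sd = 2.98      # sample standard deviation
n = 10                # number of engines tested
alpha = 0.01          # accepted probability of a type-1 error

df = n - 1
t_stat = (sample_mean - null_mean) / (sample_sd / math.sqrt(n))

# Threshold read from the T-table (one-sided, 99%, 9 degrees of freedom),
# flipped to the left tail because the alternative is "mean below 20".
critical_t = -2.821

# Left-tail probability of a T-statistic this low or lower under the null.
p_value = t_cdf(t_stat, df)

reject_null = t_stat < critical_t

print(f"T-statistic: {t_stat:.2f}")
print(f"left-tail probability: {p_value:.4f}")
print(f"reject null hypothesis at alpha={alpha}: {reject_null}")
```

Comparing the T-statistic with the threshold and comparing the left-tail probability with 0.01 are two views of the same decision, so both should agree that the null hypothesis is rejected here.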