## Statistics and probability

Lesson 5: More significance testing videos

# Small sample hypothesis test

Sal walks through an example of a hypothesis test where he determines if there is sufficient evidence to conclude that a new type of engine meets emission requirements. Created by Sal Khan.

## Want to join the conversation?

- I've always been confused about what a degree of freedom is. My textbook is very unclear, and Wikipedia isn't much help either. Wikipedia describes degrees of freedom as "the number of values in the final calculation of a statistic that are free to vary", which is very vague. Is anyone really clear on what this is? I've seen it used all the time in hypothesis tests, but it's always baffled me.(35 votes)
- Correct me if I'm wrong, but the way I see it can be illustrated by the following example. Let's say you have the letters A, B, C and D and you have four boxes under which those letters are hidden, called 1, 2, 3 and 4. The letters are randomly hidden under the boxes so you have to guess them. You open box 1 and see the letter C, so that one's out. Box 2 reveals A, so that one's out as well, and the letters B and D are left. However, if you open the third box and see its letter, you automatically know what's below box number 4 as well. So if box 3 reveals the letter D, you automatically know that B is below box 4. Hence, you have one degree of freedom less, since the last letter is known once the previous three boxes are lifted and there's no need to lift up the last box.(55 votes)

- Correction @2:12; it should use "<=" instead of "=", that is: P( xbar <= 17.17 | H0 ) < 0.01. p-values are the probability of the statistic coming out at least as extreme as what was observed. This makes sense since we are working with a one-sided test that rejects only if the mean is low. (For the two-sided test, you double the probability to represent both tails.)(18 votes)
- I agree that the probability phrasing in the video is incorrect. It should be <=. Since this is a continuous distribution, the probability of getting any single value is actually zero. So, P(xbar = 17.17 | H0 is true) = 0. We are truly looking for the probability of getting a value of xbar at least as extreme as the observed value of 17.17. Later in the video, Sal shifts gears to examining for a value that is more extreme (than the t-statistic), but that "more extreme than" bit should have been present from the beginning of the analysis.(11 votes)
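
The tail probability discussed in this thread can be approximated directly. The following is a minimal sketch, not something shown in the video: it assumes the video's summary numbers (sample mean 17.17, sample standard deviation 2.98, n = 10, hypothesized mean 20) and integrates the Student's t density numerically with Simpson's rule. The function names `t_pdf` and `lower_tail_prob` are made up for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def lower_tail_prob(t_obs, df, steps=20000):
    """Approximate P(T <= t_obs) for t_obs <= 0 (steps must be even).

    By symmetry P(T <= 0) = 0.5, so the lower tail is 0.5 minus the
    area between t_obs and 0, computed here with Simpson's rule.
    """
    h = (0.0 - t_obs) / steps
    total = t_pdf(t_obs, df) + t_pdf(0.0, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(t_obs + i * h, df)
    return 0.5 - total * h / 3

# Summary numbers quoted in the video
t_obs = (17.17 - 20) / (2.98 / math.sqrt(10))
p_value = lower_tail_prob(t_obs, df=9)

print(f"t = {t_obs:.3f}, P(T <= t | H0) = {p_value:.4f}")
print("below the 1% threshold:", p_value < 0.01)
```

Under these assumptions the tail probability comes out at roughly 0.007, which is below the 1% threshold and so agrees with the decision reached in the video.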

- Why do we actually use s / sqrt(N) and not s / sqrt(N-1) ? I thought that we used the latter if the sample size is small, or am I wrong? When do you use the one or the other?(4 votes)
- Dividing by n-1 is used when we calculate the *standard deviation*, s. Once we've done that, we've already adjusted for the bias. The calculation of s / sqrt(n) is calculating the *standard error of the sample mean* (well, an estimate of it). This calculation uses just sqrt(n) in the denominator.(11 votes)
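
The distinction in this reply can be seen in a short sketch. The four readings below are hypothetical and chosen only to make the arithmetic easy; they are not the engine data from the video. The last lines reuse the video's summary numbers (s = 2.98, n = 10).

```python
import math
import statistics

# Hypothetical readings, for illustration only (not the video's data)
sample = [2.0, 4.0, 4.0, 6.0]
n = len(sample)

s = statistics.stdev(sample)          # sample standard deviation: divides by n - 1
sigma_n = statistics.pstdev(sample)   # population formula: divides by n, shown for contrast

# The n - 1 correction already lives inside s, so the standard error
# of the sample mean uses plain sqrt(n) in the denominator.
standard_error = s / math.sqrt(n)

print(f"s = {s:.4f}, divide-by-n version = {sigma_n:.4f}, SE = {standard_error:.4f}")

# Same step with the video's summary numbers
video_standard_error = 2.98 / math.sqrt(10)
print(f"video SE = {video_standard_error:.4f}")
```

Here `statistics.stdev` and `statistics.pstdev` are the Python standard library's n - 1 and n versions respectively, which makes it easy to see that the correction is applied once, when s is computed.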

- I have a basic question on the null hypothesis (H0). Why wasn't the null hypothesis stated as x<20? Is it because the question mentioned Type 1 error or is there some other reasoning for assessing the problems in general?(4 votes)
- Hypothesis tests are designed to look for evidence in favor of the alternative hypothesis, so we put what we want to show into H1, and use the opposite of it as the null, H0.

And yes, this is related to Type I error, which is incorrectly deciding that H1 is true when H0 actually holds. So in this case, if H0 were true, there would be only a 1% chance that we reject it anyway (that is, wrongly conclude the new engine design meets the emission requirements).

Does that answer your question?(7 votes)

- Very basic question, but it's been a long time since I've done any statistics. May I please ask how you found the standard deviation? I left my calculator at school, so I can't try, but is it ((15.6 - 17.17)^2) * 1/10 + ... + ((13.9 - 17.17)^2) * 1/10?(3 votes)

- When referencing the t-table, why did Sal decide to use the one-tailed test rather than the two-tailed test?(2 votes)

- Why is the null hypothesis u=20 ppm and not u is greater than or equal to 20 ppm?(3 votes)
- The null hypothesis is a value believed to be true. (=)

The alternative hypothesis is the same value as the null hypothesis, but it involves a comparative. (<, >, etc.)(1 vote)

- After watching the previous videos I still do not understand the intuition behind the conditions for H0...

Why do we think that having a low probability (<1%) for 17.17 ppm in the problem leads us to rejecting the H0? If 17.17 has high probability that means that we more than meet the required <20 ppm standards. What am I missing here?(2 votes)
- H0 is that we don't meet the standards. By rejecting H0, we are saying we are confident that we do meet the standards.

In general, rejecting H0 means that we got a statistically significant result. It can seem counter-intuitive. I think about it like wanting a negative result on many medical tests - because negative means I don't have whatever disease or condition they were testing for.(3 votes)

- Why not just take the absolute value of t? In the end, it's the magnitude of t that matters and may be less confusing for one to simply deal with positive values, especially from the t-table.(1 vote)
- That is generally how it's used. Though you have to be careful sometimes. If you're performing, say, an upper-tail test, and the t-stat is negative, then taking the absolute value, and comparing to the positive critical values could lead to the wrong decision.(4 votes)
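
The caution in this reply can be illustrated with the video's lower-tail test, which is the mirror image of the upper-tail case described above. This is a rough sketch: the two helper functions are hypothetical names, and the positive t-statistic in the second case is an invented scenario (a sample mean above 20), not a result from the video.

```python
def reject_lower_tail(t_stat, critical_value):
    """Lower-tail test: reject H0 only when t_stat falls below -critical_value."""
    return t_stat < -critical_value

def reject_using_abs(t_stat, critical_value):
    """Shortcut that ignores the sign; safe only when the sign matches the tail."""
    return abs(t_stat) > critical_value

critical = 2.821  # one-sided 1% value for 9 degrees of freedom, as read from the t-table in the video

# Case 1: the video's t-statistic of about -3. Both approaches agree.
print(reject_lower_tail(-3.0, critical), reject_using_abs(-3.0, critical))

# Case 2 (invented): t = +3, i.e. a sample mean well ABOVE 20.
# The correct lower-tail rule does not reject, but the shortcut does.
print(reject_lower_tail(3.0, critical), reject_using_abs(3.0, critical))
```

The absolute-value shortcut gives the right answer only when the observed t-statistic lies on the same side as the rejection region.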

- where does the standard deviation come from?(2 votes)

## Video transcript

The mean emission of all engines
of a new design needs to be below 20 parts per million
if the design is to meet new emission
requirements. 10 engines are manufactured for
testing purposes, and the emission level of each
is determined. The emission data is, and they
give us 10 data points for the 10 test engines, and I went
ahead and calculated the mean of these data points. The sample mean is 17.17. And the standard deviation of
these 10 data points right here is 2.98, the sample
standard deviation. Does the data supply sufficient
evidence to conclude that this type of
engine meets the new standard? Assume we are willing to risk
a type-1 error with a probability of 0.01. And we'll touch on
this in a second. Before we do that, let's just
define what our null hypothesis and our alternative
hypothesis are going to be. Our null hypothesis
can be that we don't meet the standards. That we just barely don't
meet the standards. That the mean of our new engines
is exactly 20 parts per million. And you essentially want the
best possible value where we still don't meet, or the lowest
possible value, where we still don't meet
the standard. And then our alternative
hypothesis says no, we do meet the standard. That the true mean for our
new engines is below 20 parts per million. And to see if the data that we
have is sufficient, what we're going to do is assume,
we're going to assume that this is true. And given that this is true, if
we assume this is true, and the probability of this
occurring, and the probability of getting a sample mean of that
is less than 1%, then we will reject the null
hypothesis. So we are going to reject our
null hypothesis if the probability of getting a sample
mean of 17.17 given the null hypothesis is true,
is less than 1%. And notice, if we do it this way
there will be less than a 1% chance that we are making
a type-1 error. A type-1 error is that
we're rejecting it even though it's true. Here there's only a 1% chance,
or less than a 1% chance that we will reject it
if it is true. Now the next thing we have to
think about is what type of distribution we should
think about. And I guess the first thing that
rings in my brain is we only have 10 samples here. We only have 10 samples. We have a small sample
size right over here. So we're going to be dealing
with a T-distribution and T-statistic. So with that said, so let's
think of it this way. We can come up with a
T-statistic that is based on these statistics right
over here. So the T-statistic is going to
be 17.17, our sample mean, minus the assumed population
mean-- minus 20 parts per million over our sample standard
deviation, 2.98-- this is really the definition
of the T-statistic. And hopefully we see now that
this really comes from a Z-score and the T-distribution
is kind of an engineered version of the normal
distribution using T-statistics. 2.98 divided by the square
root of our sample size. We have 10 samples, so
it's divided by the square root of 10. So this value right here-- let
me get the calculator out just to get a value in place there. So this is going to be 17.17
minus 20, close parentheses, divided by 2.98 divided
by the square root-- that's not what I wanted. Let me delete that. Let me go back. Divided by the square root
of 10, and then close parentheses. It is almost exactly
negative 3. Our T-statistic is
almost exactly negative 3, negative 3.00. And what we need to figure out,
because T-statistics have a T-distribution, so what we
need to figure out is the probability of getting this
T-statistic or a value of T equal to this or less than this,
is that less than 1%? So the way we can think
about it is we have a T-distribution. And let's say we have a
normalized T-distribution. The distribution of all the
T-statistics would be a normalized T-distribution. This is the mean of the
T-distribution. There's going to be some
threshold T-value right here. So this is our threshold
T-value. My writing isn't that
easy to view. This is some threshold T-value
right over here. And we want a threshold T-value
such that any T-value less than that, or the
probability of getting a T-value less than that is 1%. So that entire area
in yellow is 1%. And we need to figure out a
threshold T-value there. And this is for a T-distribution
that has n equal to 10 or 10 minus 1 equals
9 degrees of freedom. So what is that threshold
value over there? And notice that this is a
one-sided test. We care that this area is 1% and
then all of this stuff over here is going to be 99%. And just the way most T-tables
are set up, they don't set up a negative T-value that is
oriented like this, they'll just give you a positive
T-value that's oriented the other way. So the way T-tables-- and I have
one that we're going to use in a second right over
here-- the way T-tables are set up is you have your
distribution like this, and they will just give a positive
T-value over here, some threshold value. Where the probability of getting
a T-value above that is going to be 1%, and the
probability of getting a T-value below that is
going to be 99%. And you can see that-- well,
we know T-distributions are symmetric around their mean, so
whatever value this is, if this number is 2 then this
value's just going to be negative 2. So we just have to keep
that in mind. But the T-tables actually help
us figure out this value. So let's figure out a T-value
where the probability of getting a T-value below
that is 99%. And once again, this is going
to be a one-sided situation. So let's look at
that over here. So one-sided-- this is just
straight from Wikipedia-- one-sided, we want the
cumulative distribution below that T-value to be 99%. We have it right
over here, 99%. We have 9 degrees of freedom. We have 10 data points,
10 minus 1 is 9. 9 degrees of freedom. So our threshold T-value here
is 2.821, so our threshold T-value in the case that we care
about, if we just flip this over because it's completely symmetric,
is negative 2.821. So what this tells us is the
probability of getting a T-value less than the negative
2.821 is going to be 1%. Now we got a value that's
a good bit less than that. We got a T-value of negative 3. We got a T-value right here, our
T-statistic of negative 3 right over here. So that definitely goes into
our-- I guess you could call it our area of rejection. This is even less probable
than the 1%. We could even figure it out that
the area over here, the probability of getting a
T-statistic less than negative 3 is even less than, it's a
subset of this yellow area right over here. So because the probability of
getting the T-statistic that we actually got is less than 1%,
we can safely reject the null hypothesis and feel
pretty good about our alternate hypothesis right over
here, that we do meet the emission standards. And we know that we have a
lower than 1% chance of actually making a type-1 error
in this circumstance.
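
The steps walked through in the transcript can be sketched in a few lines of Python. This is a minimal sketch rather than part of the video: the variable names are illustrative, and the critical value 2.821 is copied from the t-table reading in the transcript instead of being computed.

```python
import math

# Summary statistics and settings quoted in the transcript
sample_mean = 17.17   # ppm
sample_sd = 2.98      # sample standard deviation
n = 10                # number of test engines
mu_0 = 20.0           # mean under the null hypothesis, ppm
t_critical = 2.821    # one-sided 99% table value for 9 degrees of freedom

# t-statistic: (sample mean - hypothesized mean) / (s / sqrt(n))
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error
df = n - 1

# Lower-tail test: reject H0 if the t-statistic is below the negative threshold
reject_h0 = t_stat < -t_critical

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
# prints: t = -3.00, df = 9, reject H0: True
```

Because the t-statistic of about -3.00 lies below -2.821, it falls in the 1% rejection region, matching the conclusion in the transcript.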