# Hypothesis test for difference of means

Hypothesis Test for Difference of Means. Created by Sal Khan.

Video transcript

In the last video, we came up
with a 95% confidence interval for the mean weight loss between
the low-fat group and the control group. In this video, I actually want
to do a hypothesis test, really to test if this data
makes us believe that the low-fat diet actually does
anything at all. And to do that let's set up
our null and alternative hypotheses. So our null hypothesis
should be that this low-fat diet does nothing. And if the low-fat diet does
nothing, that means that the population mean on our low-fat
diet minus the population mean on our control should
be equal to zero. And this is a completely
equivalent statement to saying that the mean of the sampling
distribution of our low-fat diet minus the mean of the
sampling distribution of our control should be
equal to zero. And that's because we've seen
this multiple times. The mean of your sampling
distribution is going to be the same thing as your
population mean. So this is the same
thing as that. That is the same
thing as that. Or, another way of saying it is,
if we think about the mean of the distribution of the
difference of the sample means, and we focused on this
in the last video, that that should be equal to zero. Because this thing right over
here is the same thing as that right over there. So that is our null
hypothesis. And our alternative hypothesis, I'll write over here. It's just that it actually
does do something. And let's say that it actually
has an improvement. So that would mean that we
have more weight loss. So if we have the mean of Group
One, the population mean of Group One minus the
population mean of Group Two should be greater than zero. So this is going to be a
one-tailed test. Or another way we can view it,
is that the mean of the distribution of the difference of the sample means,
x1 minus x2, is going to be greater than zero. These are equivalent
statements. Because we know that this is the
same thing as this, which is the same thing as this,
which is what I wrote right over here. Now, to do any type of
hypothesis test, we have to decide on a level
of significance. What we're going to do is, we're
going to assume that our null hypothesis is correct. And then with that assumption
that the null hypothesis is correct, we're going to see
what is the probability of getting this sample data
right over here. And if that probability is below
some threshold, we will reject the null hypothesis in
favor of the alternative hypothesis. Now, that probability threshold,
and we've seen this before, is called the
significance level, sometimes called alpha. And here, we're going to decide
for a significance level of 95%. Or another way to think about
it, assuming that the null hypothesis is correct, we want
there to be no more than a 5% chance of getting this
result here. Or no more than a 5% chance of
incorrectly rejecting the null hypothesis when it
is actually true. Or that would be a
type one error. So if there's less than a 5%
probability of this happening, we're going to reject
the null hypothesis. Less than a 5% probability given
the null hypothesis is true, then we're going to reject
the null hypothesis in favor of the alternative. So let's think about this. So we have the null
hypothesis. Let me draw a distribution
over here. The null hypothesis says that
the mean of the differences of the sampling distributions
should be equal to zero. Now, in that situation, what
is going to be our critical region here? Well, we need a result, so
we're going to need some critical value here. Because this isn't a
normalized normal distribution. But there's some critical
value here. The hardest thing in statistics
is getting the wording right. There's some critical value here
that the probability of getting a sample from this
distribution above that value is only 5%. So we just need to figure out
what this critical value is. And if our value is larger than
that critical value, then we can reject the
null hypothesis. Because that means the
probability of getting this is less than 5%. We could reject the null
hypothesis and go with the alternative hypothesis. Remember, once again, we can
use Z-scores, and we can assume this is a normal
distribution because our sample size is large for either
of those samples. We have a sample size of 100. And to figure that out, the
first step, if we just look at a normalized normal distribution
like this, what is your critical Z value, where getting a result
above that Z value, only has a 5% chance. So this is actually
cumulative. So this whole area right
over here is going to be 95% chance. We can just look
at the Z table. We're looking for 95%. We're looking at the
one tailed case. So let's look for 95%. This is the closest thing. We want to err on the side of
being a little bit maybe to the right of this. So let's say 95.05
is pretty good. So that's 1.65. So this critical Z value
is equal to 1.65. Or another way to view it is,
this distance right here is going to be 1.65 standard
deviations. I know my writing
is really small. I'm just saying the standard
deviation of that distribution. So what is the standard
deviation of that distribution? We actually calculated it in
the last video, and I'll recalculate it here. The standard deviation of our
distribution of the difference of the sample means is going to
be equal to the square root of the variance of our
first population. Now, the variance of our first
population, we don't know it. But we could estimate it with
our sample standard deviation. If you take your sample standard
deviation, 4.67 and you square it, you get
your sample variance. And so this is the variance. This is our best estimate
of the variance of the population. And we want to divide that
by the sample size. And then plus our best estimate
of the variance of the population of group two,
which is 4.04 squared. The sample standard deviation
of group two squared. That gives us variance
divided by 100. I did this before in the last video. Maybe
it's still sitting on my calculator. Yes, it's still sitting
on the calculator. It's this quantity
right up here. 4.67 squared divided
by 100 plus 4.04 squared divided by 100. So it's 0.617. So this right here is
going to be 0.617. So this distance right
here, is going to be 1.65 times 0.617. So let's figure out
what that is. So let's take 0.617
times 1.65. So it's 1.02. This distance right
here is 1.02. So what this tells us is, if
we assume that the diet actually does nothing, there's
only a 5% chance of the difference between the means of
these two samples being more than 1.02. There's only a 5%
chance of that. Well, the difference of the sample means that we
actually got is 1.91. So that's sitting out
here someplace. So it definitely falls in
this critical region. The probability of getting this,
assuming that the null hypothesis is correct,
is less than 5%. So it's a smaller probability than
our significance level. Actually, let me
be very clear. The significance level,
this alpha right here, needs to be 5%, not the 95% that I think I might have
said earlier. I wrote down the
wrong number there. I subtracted it from
one by accident. Probably in my head. But anyway, the significance
level is 5%. The probability given that the
null hypothesis is true, the probability of getting the
result that we got, the probability of getting that
difference, is less than our significance level. It is less than 5%. So based on the rules that we
set out for ourselves of having a significance level of
5%, we will reject the null hypothesis in favor of the
alternative that the diet actually does make you
lose more weight.
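
The arithmetic in this test can be checked with a short Python sketch. It uses only the numbers quoted in the transcript (sample standard deviations 4.67 and 4.04, sample sizes of 100, and an observed difference of 1.91). The variable names are illustrative, and the z statistic and p-value at the end are an extra step that the video does not compute. The code also takes the critical z value from the standard library rather than a Z table, so it gets about 1.645 instead of the rounded 1.65.

```python
from math import sqrt
from statistics import NormalDist

# Values quoted in the transcript (variable names are illustrative).
s1, s2 = 4.67, 4.04     # sample standard deviations: low-fat group, control group
n1, n2 = 100, 100       # sample sizes
observed_diff = 1.91    # observed difference of the sample means
alpha = 0.05            # significance level

# Standard deviation of the distribution of the difference of sample means,
# estimating each population variance with the sample variance.
std_err = sqrt(s1**2 / n1 + s2**2 / n2)

# One-tailed critical z value: 95% of the area lies to its left.
z_crit = NormalDist().inv_cdf(1 - alpha)

# Critical value on the original scale (the "distance" in the video).
critical_diff = z_crit * std_err

# Reject the null hypothesis if the observed difference is past the critical value.
reject_null = observed_diff > critical_diff

# Extra step not in the video: z statistic and one-tailed p-value.
z_obs = observed_diff / std_err
p_value = 1 - NormalDist().cdf(z_obs)

print(f"standard error  = {std_err:.3f}")
print(f"critical z      = {z_crit:.3f}")
print(f"critical diff   = {critical_diff:.2f}")
print(f"reject null     = {reject_null}")
print(f"one-tailed p    = {p_value:.4f}")
```

The standard error and the critical difference round to the 0.617 and 1.02 used above, and the observed difference of 1.91 lies well past the critical value, which matches the decision to reject the null hypothesis.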