
## Testing for the difference of two population means

# Two-sample t test for difference of means

AP.STATS: DAT‑3.G (LO), DAT‑3.G.1 (EK), DAT‑3.H (LO), DAT‑3.H.1 (EK), DAT‑3.H.2 (EK), VAR‑7 (EU), VAR‑7.F (LO), VAR‑7.F.1 (EK), VAR‑7.G (LO), VAR‑7.G.1 (EK), VAR‑7.I (LO), VAR‑7.I.1 (EK)

## Video transcript

- [Instructor] "Kaito grows tomatoes in two separate fields. When the tomatoes are ready to be picked, he is curious as to whether the sizes of his tomato plants differ between the two fields. He takes a random sample of plants from each field and measures the heights of the plants. Here is a summary of the results:"

  So what I want you to do is pause this video and conduct a two-sample t test here. Let's assume that all of the conditions for inference are met: the random condition, the normal condition, and the independence condition. And let's assume that we are working with a significance level of 0.05. So pause the video and conduct the two-sample t test to see whether there's evidence that the sizes of tomato plants differ between the fields.

  Alright, now let's work through this together. As always, let's first construct our null hypothesis. That's going to be the situation where there is no difference between the mean sizes, so the mean size in field A is equal to the mean size in field B. Now what about our alternative hypothesis? Well, he wants to see whether the sizes of his tomato plants differ between the two fields. He's not saying whether A is bigger than B or whether B is bigger than A, so his alternative hypothesis is that the mean of A is not equal to the mean of B, that they differ.

  To do this two-sample t test, we assume that our null hypothesis is true, and remember that we're also assuming all of our conditions for inference are met. Then we want to calculate a t statistic based on the sample data that we have. Our t statistic is going to be equal to the difference between the sample means, all of that over our estimate of the standard deviation of the sampling distribution of the difference of the sample means. That estimate is the square root of the sample standard deviation from sample A squared, over the sample size from A, plus the sample standard deviation from sample B squared, over the sample size from B.

  Let's see, we have all the numbers here to calculate it. The numerator is going to be equal to 1.3 minus 1.6. In the denominator, we take the square root of the following: the sample standard deviation from field A is 0.5, and if you square that, you get 0.25, which we divide by the sample size from field A, 22; then we add 0.3 squared, which is 0.09, divided by the sample size from field B, 24. So the numerator is just -0.3, divided by the square root of 0.25/22 + 0.09/24, and that gets us approximately -2.44.

  Now think about a t distribution, and we'll use our calculator to figure out the probability. This is a t distribution right over here, and this would be its mean, which is 0 when we assume the null hypothesis. We got a t statistic of -2.44, so we're right over here. So we want to ask: what is the probability, under this t distribution, of getting something at least this extreme? That would be this area in the lower tail. A result 2.44 above the mean would be just as extreme, so it would also be this area in the upper tail. So I'm going to use my calculator to figure out the lower-tail probability, and then I'm just going to multiply that by two to account for the upper tail as well.

  So the probability of getting a t value whose absolute value is greater than or equal to 2.44 is going to be approximately equal to the following. I'm going to press 2nd, then DISTR, and choose the cumulative distribution function for the t distribution. Since I want this lower-tail probability that I'm going to multiply by two, the lower bound is a very large negative number, which you can view as functionally negative infinity, and the upper bound is -2.44. Now, what are our degrees of freedom? If we take the conservative approach, it's the smaller of the two sample sizes minus one. The smaller sample size is 22, and 22 minus one is 21, so I put 21 in there. Now I can paste, and I get that number right over there. I multiply that by two, because this only gives me the probability of getting something lower than -2.44, but I also want the probability of getting something 2.44 or more above the mean of our t distribution. Times two, that is approximately 0.024.

  That 0.024 is our P-value, and what I want to do now is compare it to my significance level. You can see very clearly that our P-value, about 0.024, is less than our significance level of 0.05. Because of that, we say: assuming the null hypothesis is true, we got a result whose probability is below our threshold, so we reject our null hypothesis. This suggests the alternative hypothesis, that there is indeed a difference between the mean sizes of the tomato plants in the two fields.
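The calculation narrated in the transcript can be written out as a single worked equation. The notation here is introduced only for compactness: x̄ is a sample mean, s a sample standard deviation, n a sample size, and the subscripts A and B name the fields. The numbers are the ones read out in the video, assuming (as the narration implies) that field A has mean 1.3, standard deviation 0.5, and 22 plants, while field B has mean 1.6, standard deviation 0.3, and 24 plants.

```latex
H_0:\ \mu_A = \mu_B \qquad H_a:\ \mu_A \neq \mu_B

t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
  = \frac{1.3 - 1.6}{\sqrt{\dfrac{0.5^2}{22} + \dfrac{0.3^2}{24}}}
  = \frac{-0.3}{\sqrt{0.25/22 + 0.09/24}}
  \approx -2.44

\mathrm{df} = \min(n_A, n_B) - 1 = 22 - 1 = 21

P\text{-value} = P\left(|T_{21}| \ge 2.44\right) = 2\,P\left(T_{21} \le -2.44\right) \approx 0.024 < \alpha = 0.05
```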
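As a cross-check on the calculator steps, here is a minimal Python sketch that recomputes the same test using only the standard library. The function names are hypothetical, chosen for illustration, and the summary numbers are assumed to be the ones read out in the video. Because a statistics package may not be available, the two-sided P-value is computed from the classical closed-form series for Student's t distribution with integer degrees of freedom (see Abramowitz and Stegun, section 26.7) rather than with the calculator's cumulative distribution function. Like the video, it uses the conservative degrees of freedom.

```python
import math


def two_sample_t_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Unpooled two-sample t statistic:
    (xbar_A - xbar_B) / sqrt(s_A^2 / n_A + s_B^2 / n_B)."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / standard_error


def conservative_df(n_a, n_b):
    """Conservative degrees of freedom: smaller sample size minus one."""
    return min(n_a, n_b) - 1


def t_two_sided_p(t_stat, df):
    """Two-sided P-value P(|T| >= |t_stat|) for Student's t with integer df.

    Evaluates the closed-form series for A = P(|T| <= t), with
    theta = atan(|t| / sqrt(df)); valid only for positive integer df.
    """
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    theta = math.atan(abs(t_stat) / math.sqrt(df))
    sin_t, cos2 = math.sin(theta), math.cos(theta) ** 2
    series = 0.0
    if df % 2 == 1:
        # Odd df: A = (2/pi) * (theta + sin * (cos + (2/3)cos^3 + ...)),
        # with terms up to cos^(df-2); for df = 1 the sum is empty.
        term = math.cos(theta)
        for k in range(1, (df - 1) // 2 + 1):
            series += term
            term *= cos2 * (2 * k) / (2 * k + 1)
        a = (2.0 / math.pi) * (theta + sin_t * series)
    else:
        # Even df: A = sin * (1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...),
        # with terms up to cos^(df-2).
        term = 1.0
        for k in range(1, df // 2 + 1):
            series += term
            term *= cos2 * (2 * k - 1) / (2 * k)
        a = sin_t * series
    return 1.0 - a


# Summary statistics as read out in the video (field A, then field B).
t_stat = two_sample_t_statistic(1.3, 0.5, 22, 1.6, 0.3, 24)
df = conservative_df(22, 24)
p_value = t_two_sided_p(t_stat, df)
alpha = 0.05

print(f"t = {t_stat:.2f}, df = {df}, P-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Running it should report a t statistic of about -2.44 with 21 degrees of freedom and a P-value of roughly 0.024, below the 0.05 significance level, so the decision matches the one reached in the video. The series only handles whole-number degrees of freedom, which is enough for the conservative choice min(n_A, n_B) - 1 used here.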

AP® is a registered trademark of the College Board, which has not reviewed this resource.