Two-sample t test for difference of means

Given data from two samples, we can do a significance test to compare the sample means with a test statistic and p-value, and determine if there is enough evidence to suggest a difference between the two population means.

Video transcript

- [Instructor] "Kaito grows tomatoes in two separate fields. When the tomatoes are ready to be picked, he is curious as to whether the sizes of his tomato plants differ between the two fields. He takes a random sample of plants from each field and measures the heights of the plants. Here is a summary of the results:"

So what I want you to do is pause this video and conduct a two-sample t test here. Let's assume that all of the conditions for inference are met: the random condition, the normal condition, and the independence condition. And let's assume that we are working with a significance level of 0.05. So pause the video and conduct the two-sample t test here, to see whether there's evidence that the sizes of tomato plants differ between the fields.

Alright, now let's work through this together. Like always, let's first construct our null hypothesis. That's going to be the situation where there is no difference between the mean sizes, so the mean size in field A is equal to the mean size in field B. Now what about our alternative hypothesis? Well, he wants to see whether the sizes of his tomato plants differ between the two fields. He's not saying whether A is bigger than B or whether B is bigger than A, so his alternative hypothesis is built around his suspicion that the mean of A is not equal to the mean of B, that they differ.

To do this two-sample t test, we assume the null hypothesis, and remember we're assuming that all of our conditions for inference are met. Then we wanna calculate a t statistic based on the sample data that we have. Our t statistic is going to be equal to the difference between the sample means, all of that over our estimate of the standard deviation of the sampling distribution of the difference of the sample means.
That estimate is the square root of the sample standard deviation from sample A squared, over the sample size from A, plus the sample standard deviation from sample B squared, over the sample size from B. And let's see, we have all the numbers here to calculate it. The t statistic is going to be equal to 1.3 minus 1.6, all of that over the square root of the following: the sample standard deviation from field A is 0.5, and if you square that, you're gonna get 0.25, and that's going to be over the sample size from field A, over 22, plus 0.3 squared, which is 0.09, over the sample size from field B, over 24. So the numerator is just gonna be -0.3, divided by the square root of 0.25 divided by 22 plus 0.09 divided by 24, and that gets us approximately -2.44.

So if you think about a t distribution, and we'll use our calculator to figure out this probability, this is a t distribution right over here, and this would be the assumed mean of our t distribution. We got a t statistic of -2.44, so we're right over here. And so we wanna ask: what is the probability, from this t distribution, of getting something at least this extreme? It would be this area, and it would also be this area, if we got 2.44 above the mean. So what I could do is use my calculator to figure out this probability right over here, and then multiply that by two to get this one as well. So the probability of getting a t value whose absolute value is greater than or equal to 2.44 is going to be approximately equal to... I'm going to go to second, distribution, go to the cumulative distribution function for our t distribution, and click that.
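The arithmetic for the test statistic can be double-checked with a short sketch. The summary values below are the ones read aloud in the transcript (the units are not stated there), and the function name `two_sample_t` is a hypothetical helper introduced only for this illustration. It uses the unpooled standard error described by the instructor.

```python
import math

# Summary statistics as read aloud in the transcript (units not stated).
mean_a, sd_a, n_a = 1.3, 0.5, 22   # field A
mean_b, sd_b, n_b = 1.6, 0.3, 24   # field B


def two_sample_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Unpooled two-sample t statistic for a difference of means.

    Hypothetical helper for illustration; not part of the original lesson.
    """
    # Estimated standard deviation of the difference of the sample means.
    standard_error = math.sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)
    return (mean_1 - mean_2) / standard_error


t_stat = two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b)
print(round(t_stat, 2))  # -2.44
```

The sign is negative only because field A is subtracted from field B's larger mean; for a two-sided test, what matters is the absolute value, 2.44.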
And since I wanna think about this tail probability here, which I'm just gonna multiply by two, the lower bound is a very, very, very negative number, and you could view that as functionally negative infinity. The upper bound is -2.44. And now what's our degrees of freedom? Well, if we take the conservative approach, it'll be the smaller of the two sample sizes minus one. The smaller of the two sample sizes is 22, and 22 minus one is 21, so put 21 in there. And now I can paste, and I get that number right over there. I multiply that by two, because this just gives me the probability of getting something lower than that, but I also wanna think about the probability of getting something 2.44 or more above the mean of our t distribution. So times two is going to be equal to approximately 0.024.

What I wanna do then is compare this to my significance level. This right over here is our p-value, and our p-value in this situation is clearly less than our significance level. Because of that, we say: hey, assuming the null hypothesis is true, we got a result whose probability is pretty low, below our threshold, so we are going to reject our null hypothesis. This suggests the alternative hypothesis, that there is indeed a difference between the sizes of the tomato plants in the two fields.
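For readers without a graphing calculator, the p-value step can be sketched in plain Python. This is a minimal illustration, not the calculator's method: it integrates the t density numerically with Simpson's rule, and the helper names `t_pdf` and `t_upper_tail` are assumptions made for this example. It uses the rounded statistic of -2.44 and the conservative 21 degrees of freedom from the transcript.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=2000):
    """P(T >= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        # Simpson weights: 4 at odd interior points, 2 at even ones.
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the probability lies above 0; subtract the area between 0 and x.
    return 0.5 - total * h / 3


t_stat = -2.44                 # rounded test statistic from the transcript
df = min(22, 24) - 1           # conservative choice: smaller sample size minus one
alpha = 0.05                   # significance level from the problem

# Two-sided test: double one tail, since the t distribution is symmetric.
p_value = 2 * t_upper_tail(abs(t_stat), df)
reject_null = p_value < alpha

print(round(p_value, 3), reject_null)
```

The printed p-value should land close to the 0.024 reported in the transcript, which is below 0.05, so the sketch reaches the same decision to reject the null hypothesis.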