
Constructing t interval for difference of means


Video transcript

- [Instructor] Let's say that we have two populations. So that's the first population, and this is the second population right over here. And we are going to think about the means of these populations. So let's say this first population is a population of golden retrievers and this second population is a population of chihuahuas. And the mean that we're going to think about is maybe the mean weight. So mu one would be the true mean weight of the population of golden retrievers, and mu two would be the true mean weight of the population of chihuahuas. Now what we want to think about is, what is the difference between these two population means, between these two population parameters? Well, if we don't know this, all we can do is try to estimate it, and maybe construct some type of confidence interval, and that's what we're going to talk about in this video. So how do we go about doing it? Well, we've seen this, or similar things, before. What you would do is take a sample from both populations. So from population one here, I would take a sample of size n sub one, and from that I can calculate a sample mean, x bar sub one. So this is a statistic that is trying to estimate mu one. And I can also calculate a sample standard deviation, s sub one. And I can do the same thing in the population of chihuahuas, if that's what our population two is all about. So I could take a sample, and this sample size, which I'll call n sub two, does not have to be the same as n sub one. It could be, but it doesn't have to be. And from that, I can calculate a sample mean, x bar sub two, and a sample standard deviation, s sub two. So now, assuming that our conditions for inference are met, the random condition, the normal condition, and the independence condition, which we talk about in other videos for means, let's think about how we can construct a confidence interval.
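To make the setup concrete, here is a minimal sketch of computing the sample statistics the instructor names. The weights are made-up numbers for illustration, not data from the video; only the Python standard library's `statistics` module is used:

```python
import statistics

# Hypothetical sample weights in pounds (made-up data for illustration)
golden_sample = [65.2, 70.1, 68.4, 72.0, 66.8]   # sample from population 1
chihuahua_sample = [4.1, 5.2, 4.8, 5.5]          # sample from population 2

# Sample sizes n sub one and n sub two (they do not have to match)
n1, n2 = len(golden_sample), len(chihuahua_sample)

# Sample means (x bar sub one, x bar sub two) estimate mu one and mu two
x_bar_1 = statistics.mean(golden_sample)
x_bar_2 = statistics.mean(chihuahua_sample)

# Sample standard deviations (s sub one, s sub two); statistics.stdev
# divides by n - 1, which is the sample standard deviation used here
s1 = statistics.stdev(golden_sample)
s2 = statistics.stdev(chihuahua_sample)
```

Note that `statistics.stdev` is the sample (n minus one) standard deviation, which is the one these formulas expect; `statistics.pstdev` would be the population version.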
And so you might say, alright, a first attempt would be the difference of my sample means, x bar sub one minus x bar sub two, plus or minus some z value times the standard deviation of the sampling distribution of the difference of the sample means, x bar sub one minus x bar sub two. And you might say, well, where do I get my z from? Well, our confidence level would determine that. If our confidence level is 95%, that would determine our z. Now this would not be incorrect, but we face a problem, because we are going to need to estimate what the standard deviation of the sampling distribution of the difference between our sample means actually is. To make that clear, let me write it this way. The variance of the sampling distribution of the difference of our sample means is equal to the variance of the sampling distribution of sample mean one, plus the variance of the sampling distribution of sample mean two. Now if we knew the true underlying standard deviations of these two populations, then we could actually compute these. In that case, this right over here would be equal to the variance of population one divided by its sample size n one, plus the variance of population two divided by its sample size n two. But we don't know these variances, and so we try to estimate them. We estimate them with our sample standard deviations. So this is going to be approximately equal to our first sample standard deviation squared over n one, plus our second sample standard deviation squared over n two. And so an estimate of the standard deviation of the sampling distribution of the difference between our sample means is going to be the square root of this. It's going to be approximately equal to the square root of s one squared over n one plus s two squared over n two.
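The variance-sum step above translates directly into code. Here is a small sketch (the function name and the example inputs are my own, not from the video) of the estimated standard deviation of the sampling distribution of the difference:

```python
import math

def estimated_se_of_difference(s1, n1, s2, n2):
    """Estimate the standard deviation (standard error) of the sampling
    distribution of x_bar_1 - x_bar_2: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Example with easy numbers: s1 = 2, n1 = 4 contributes 4/4 = 1, and
# s2 = 3, n2 = 9 contributes 9/9 = 1, so the estimate is sqrt(2)
se = estimated_se_of_difference(2, 4, 3, 9)
```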
But the problem is, once we use this estimate rather than the true standard deviation, a critical z value isn't going to be as good as a critical t value. So instead, you would say my confidence interval is going to be x bar sub one minus x bar sub two, plus or minus a critical t value instead of a z value, because that works better when you are estimating the standard deviation of the sampling distribution of the difference between the sample means. And so you have t star times our estimate of this, which is the square root of s sub one squared over n one plus s sub two squared over n two. And then you might say, well, what determines our t star? Well, once again, you would look it up on a table using your confidence level. And you might be saying, wait, hold on, when I look up a t value, I don't just care about a confidence level, I also care about degrees of freedom. What is going to be the degrees of freedom in this situation? Well, there's a simple answer and a complicated answer. Once we think about the difference of means, there are fairly sophisticated formulas that computers can use to get a more precise degrees of freedom. But what you will typically see in a statistics class is a conservative approach: you take the lower of n one and n two, and you subtract one from that. So the degrees of freedom here is going to be the lower of n one minus one and n two minus one. In future videos, we will work through examples that do this.
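Putting the pieces together, here is a hedged sketch of the whole interval. The helper and its example numbers are my own; the t star is passed in as if looked up from a t table, since the conservative degrees of freedom, the lower of n one minus one and n two minus one, tells you which row of the table to use:

```python
import math

def two_sample_t_interval(x_bar_1, s1, n1, x_bar_2, s2, n2, t_star):
    """Confidence interval for mu_1 - mu_2:
    (x_bar_1 - x_bar_2) +/- t* * sqrt(s1^2/n1 + s2^2/n2).
    t_star should come from a t table at the conservative
    degrees of freedom min(n1, n2) - 1."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x_bar_1 - x_bar_2
    margin = t_star * se
    return diff - margin, diff + margin

# Made-up example: n1 = 10, n2 = 6, so the conservative degrees of
# freedom is min(10, 6) - 1 = 5; a t table gives t* of about 2.571
# for 95% confidence at 5 degrees of freedom
lo, hi = two_sample_t_interval(30.0, 4.0, 10, 5.0, 3.0, 6, t_star=2.571)
```

Statistical software instead computes the more precise (Welch-Satterthwaite) degrees of freedom the transcript alludes to, which is usually larger than this conservative value and so gives a slightly narrower interval.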