Course: AP®︎/College Statistics > Unit 11
Lesson 4: Confidence intervals for the difference of two means
© 2024 Khan Academy
Constructing t interval for difference of means
Want to join the conversation?
- Why don't we estimate the population standard deviation with the formula from this video? https://www.khanacademy.org/math/ap-statistics/summarizing-quantitative-data-ap/measuring-spread-quantitative/v/sample-standard-deviation-and-bias (3 votes)
- At 5:12, Khan said we should use a t-value instead of a z-value for better estimates. But from some of the previous videos, I understood that we use a t-value only if the sample size is smaller than 30 (in each group). As long as we have more than 30 samples, can we use a z-score to calculate the difference between means? (2 votes)
- Hi Alexia,
It can be confusing at times because teachers sometimes leave things out when they first teach a topic, but there are two times when you should use a critical t value instead of a critical z value:
1. When the sample size (n) is less than 30.
2. When we are estimating the standard deviation using the standard error (S/SE).
NOTE: I believe the 2nd condition only applies when you're comparing distributions (like in the video!).
Hope this helps,
- Convenient Colleague (3 votes)
- Why wasn't the same logic applied to confidence intervals for the difference between two proportions? There, a z value was used even though estimators were used for the SD. See https://www.khanacademy.org/math/ap-statistics/two-sample-inference/two-sample-z-interval-proportions/v/confidence-intervals-for-the-difference-between-two-proportions (2 votes)
- 3:56 through 4:21
Why are you estimating the standard deviation of the sampling distribution of sample means with the sample standard deviation? Shouldn't you still divide by the square root of the sample size?
Sigma sub x bar is approximated by s over radical n:
σ_x̄ ≈ s/sqrt(n) (1 vote)
- Why doesn't my dad love me? (1 vote)
- Why is the variance of the difference between the sample means equal to the sum of the variances of the sample means? (1 vote)
- What is the difference between means and proportions? I am baffled. (1 vote)
Video transcript
- [Instructor] Let's say
that we have two populations. So that's the first population, and this is the second
population right over here. And we are going to think about the means of these populations. So let's say this first population is a population of golden retrievers and this second population is
a population of chihuahuas. And the mean that we're
going to think about is maybe the mean weight. So mu one would be the true mean weight of the population of golden retrievers. And mu two would be the true mean weight of the population of chihuahuas. Now what we want to think about is what is the difference between
these two population means, between these two population parameters. Well, if we don't know this, all we can do is try to estimate it and maybe construct some type of confidence interval. And that's what we're going
to talk about in this video. So how do we go about doing it? Well, we've seen this, or
similar things, before. What you would do is
you would take a sample from both populations. So from population one
here, I would take a sample of size n sub one. And from that, I can
calculate a sample mean. So this is a statistic that
is trying to estimate that. And I can also calculate a
sample standard deviation. And I can do the same thing in
the population of chihuahuas, if that's what our
population two is all about. So I could take a sample, and actually this sample size
does not have to be the same as n one, so I'll call it n sub two. It could be, but doesn't have to be. And from that, I can
calculate a sample mean, x bar sub two, and a
sample standard deviation. So now, assuming that our
conditions for inference are met, and we've talked about those before. We have the random condition,
we have the normal condition, we have the independence condition. Assuming those conditions are met, and we talk about those
in other videos for means, let's think about how we can construct a confidence interval. And so you might say,
alright, well that would be the difference of my sample means, x bar sub one minus x bar sub two, plus or minus some z value times the standard deviation
of the sampling distribution of the difference of the sample means, x bar sub one minus x bar sub two. And you might say, well,
where do I get my z from? Well, our confidence level
would determine that. If our confidence level is 95%,
that would determine our z. Now this would not be incorrect,
but we face a problem, because we are going to need to estimate what the standard deviation
of the sampling distribution of the difference between
our sample means actually is. To make that clear, let
me write it this way. So the variance of the
sampling distribution of the difference of our sample means is going to be equal to the variance of the sampling distribution
of sample mean one, plus the variance of the
sampling distribution of sample mean two. Now if we knew the true
underlying standard deviations of population one and population two, then we could actually compute these. In that case, the variance of the difference would be equal to the
variance of population one divided
by its sample size, n one, plus the variance of
population two divided by its sample size, n two. But we don't know these variances, and so we try to estimate them. So we estimate them with our
sample standard deviations. So we say this is going to
be approximately equal to our first sample standard
deviation squared over n one plus our second
sample standard deviation squared over n two. And so we can say that an
estimate of the standard deviation of the sampling distribution
of the difference between our sample means is going to be equal to
the square root of this. It's going to be approximately equal to the square root of s
one squared over n one plus s two squared over n two. But the problem is that
once we use this estimate, a critical z value isn't going to be as good
as a critical t value. So instead, you would say
my confidence interval is going to be x bar sub
one minus x bar sub two plus or minus a critical t value, instead of a z value. Because that works better
when you are estimating the standard deviation of
the sampling distribution of the difference
between the sample means. And so you have t star
times our estimate of this which is going to be
equal to the square root of s sub one squared over n one plus s sub two squared over n two. And then you might say, well,
what determines our t star? Well once again, you would
look it up on a table using your confidence level. And you might be saying, wait, hold on, when I look up a t
value, I don't just care about a confidence
level, I also care about degrees of freedom. What is going to be the degrees of freedom in this situation? Well, there's a simple answer
and a complicated answer. Once we think about the
difference of means, there are fairly sophisticated formulas that computers can use to get a more precise number of degrees of freedom. But what you will typically
see in a statistics class is a conservative view
of degrees of freedom, where you take the lower
of n one and n two, and you subtract one from that. So the degrees of freedom
here is going to be the lower of n one minus one and n two minus one. Equivalently, you take the lower of n one and n two and subtract one from that. In future videos, we will
work through examples that do this.
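
As a rough sketch of the interval described in the transcript, the Python below computes the estimated standard deviation of the difference of sample means, the conservative degrees of freedom, and the resulting t interval. The function names and the sample numbers (weights for 12 golden retrievers and 10 chihuahuas) are hypothetical and chosen only for illustration. The critical value t* is passed in by hand, as if looked up in a t table; 2.262 is the usual table value for 95% confidence with 9 degrees of freedom.

```python
import math


def standard_error(s1, n1, s2, n2):
    """Estimate the standard deviation of the sampling distribution
    of (x-bar 1 minus x-bar 2): sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def conservative_df(n1, n2):
    """Conservative degrees of freedom: the lower of n1 and n2, minus one."""
    return min(n1, n2) - 1


def two_sample_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Return (low, high) for (xbar1 - xbar2) plus or minus t* times the standard error."""
    diff = xbar1 - xbar2
    margin = t_star * standard_error(s1, n1, s2, n2)
    return diff - margin, diff + margin


# Hypothetical sample summaries (weights in pounds), not taken from the video.
xbar1, s1, n1 = 65.0, 8.0, 12   # golden retrievers
xbar2, s2, n2 = 5.0, 1.5, 10    # chihuahuas

df = conservative_df(n1, n2)    # min(12, 10) - 1 = 9
t_star = 2.262                  # table value: 95% confidence, df = 9

se = standard_error(s1, n1, s2, n2)
low, high = two_sample_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_star)
print(f"df = {df}, SE = {se:.3f}, 95% interval = ({low:.2f}, {high:.2f})")
```

Passing t* as an argument mirrors the table lookup described in the transcript and keeps the sketch free of third-party libraries. Software would typically compute the critical value directly and use a more precise degrees-of-freedom formula than the conservative one shown here.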