© 2023 Khan Academy

# Two-sample t test for difference of means

AP.STATS: DAT‑3.G (LO), DAT‑3.G.1 (EK), DAT‑3.H (LO), DAT‑3.H.1 (EK), DAT‑3.H.2 (EK), VAR‑7 (EU), VAR‑7.F (LO), VAR‑7.F.1 (EK), VAR‑7.G (LO), VAR‑7.G.1 (EK), VAR‑7.I (LO), VAR‑7.I.1 (EK)

Given data from two samples, we can do a significance test to compare the sample means with a test statistic and p-value, and determine whether there is enough evidence to suggest a difference between the two population means.
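
As a sketch of the test statistic that the video builds up verbally, the formula can be written out as below. The symbols (sample means, sample standard deviations, and sample sizes with subscripts A and B) are notation chosen here for illustration; the numbers are the sample summaries read out in the transcript for the tomato example.

```latex
\[
t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
  = \frac{1.3 - 1.6}{\sqrt{\dfrac{0.5^2}{22} + \dfrac{0.3^2}{24}}}
  \approx -2.44
\]
```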

## Want to join the conversation?

- Won't we use the pooled estimate of the common standard deviation,

`sp = sqrt(((n1-1)s1^2 + (n2-1)s2^2)/(n1+n2-2))`

and then use this sp in the test statistic formula?

Please reply. (6 votes)
- A pooled standard deviation is used when we don't know the population variances but assume they are EQUAL. In the video, the population variances are assumed to be unknown and UNEQUAL. I hope this helps. (2 votes)
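
To make the pooled versus unpooled distinction concrete, here is a minimal Python sketch that applies both standard errors to the sample summaries from the video. The variable names are made up for illustration, and the pooled version is shown only for comparison; the video itself uses the unpooled form.

```python
import math

# Sample summaries quoted in the video transcript (field A, field B).
mean_a, sd_a, n_a = 1.3, 0.5, 22
mean_b, sd_b, n_b = 1.6, 0.3, 24

# Pooled standard deviation: a weighted average of the two sample variances.
sp = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))

# Pooled version: one common sp in the standard error, df = n_a + n_b - 2.
se_pooled = sp * math.sqrt(1 / n_a + 1 / n_b)
t_pooled = (mean_a - mean_b) / se_pooled
df_pooled = n_a + n_b - 2

# Unpooled version used in the video: each sample keeps its own variance.
se_unpooled = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
t_unpooled = (mean_a - mean_b) / se_unpooled

print(f"sp = {sp:.3f}")
print(f"pooled:   t = {t_pooled:.2f}, df = {df_pooled}")
print(f"unpooled: t = {t_unpooled:.2f}")
```

The two statistics differ here because the sample standard deviations (0.5 and 0.3) are not close, which is the situation where pooling is harder to justify.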

- Isn't P(T is greater than or equal to 2.44) = 0.024 wrong? It should just be 0.012, right? The sentence Sal described in the P-value only contains one side. If he wanted it to contain both, it would have to be P(T is greater than or equal to 2.44) + P(T is less than or equal to -2.44), or am I wrong? (2 votes)
- The way Sal wrote it was a little misleading. He wrote `P(|T| ≥ 2.44)`; notice the | | bars. These bars mean absolute value. Since |T| is never negative, the statement is true if T is greater than or equal to 2.44 or less than or equal to -2.44.

So `P(|T| ≥ 2.44) = P(T ≥ 2.44) + P(T ≤ -2.44) ≈ 0.024` and `P(T ≥ 2.44) ≈ 0.012`.

Hope this helps! (: (5 votes)

- At 4:09, Sal took the probability of the absolute value of t being greater than 2.44. Why did he do that, and in which cases would you take less than? (3 votes)
- To answer your second question, in addition to what Muhammed El-Yamani said, you would take less than when you need a one-tailed probability, i.e. when your alternative hypothesis states not `μ1 ≠ μ2` but `μ2 - μ1 > 0`. (2 votes)

- Why don't we use the standard deviation of combined samples as an estimate of the standard deviation (as we are assuming null hypothesis as true for calculating p-value - as mentioned in hypothesis testing for difference in proportions)?(3 votes)
- I don't have a calculator that can calculate the p-value, what can I do instead?(2 votes)
- There are two ways: one, which is the more practical option, would be to use a z-table, which gives you the area under a normal distribution for a given z statistic. (http://users.stat.ufl.edu/~athienit/Tables/Ztable.pdf).

Another way, which is more complicated, would be to use the formula for the standard normal curve (see http://mathworld.wolfram.com/NormalDistribution.html). Since a p-value is essentially area under this curve, you would have to take the integral of the standard normal curve with the appropriate bounds.(2 votes)
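
As a rough illustration of the integration idea in the answer above, the following Python sketch finds the two-tailed area under the standard normal curve beyond 2.44 with Simpson's rule and checks it against the closed form from `math.erfc`. The function names and the step count are assumptions made for this example. Note that the normal curve only approximates a t statistic: the video works with a t distribution with 21 degrees of freedom, which has heavier tails, so its p-value (about 0.024) is larger than the normal-curve area computed here. A t-table with the right degrees of freedom is the closer match to the video's method.

```python
import math

def normal_pdf(z):
    """Density of the standard normal curve at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def normal_upper_tail(z, steps=1000):
    """P(Z >= z) for z >= 0: 0.5 minus the Simpson's-rule area on [0, z]."""
    h = z / steps  # steps must be even for Simpson's rule
    total = normal_pdf(0.0) + normal_pdf(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * normal_pdf(i * h)
    return 0.5 - total * h / 3

z = 2.44
p_integral = 2 * normal_upper_tail(z)   # both tails, by symmetry
p_erfc = math.erfc(z / math.sqrt(2))    # closed form for the same two-tailed area

print(f"two-tailed normal area (integration): {p_integral:.4f}")
print(f"two-tailed normal area (erfc):        {p_erfc:.4f}")
```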

- When should we assume equal standard deviations in a test (due to us assuming the null hypothesis)?

I saw it done in a Hypothesis Test but now I'm slightly confused. :\

Any help?(2 votes)

- Here you find the p-value of field A on B, but if you find the p-value of field B on A would it be different and why?(1 vote)
- If you switched A and B in the subtraction, you would just get a negative result (similar to how 5 - 3 = 2, but 3 - 5 = -2). Then when you used a t-table or the tcdf() function, you would just have to find the area of the high end of the distribution instead of the area of the low end (or vice versa). You should end up with the same result though.

Hope this helped! (:(3 votes)

- When would we divide by n-1?(1 vote)
- PROBLEM: The purpose of this experiment is to determine if attending the review session for the distance education course, Statistics For The Behavioral Sciences: Psyc 2317, will affect scores.(1 vote)
- p-value is said to be the smallest significance to reject H0, yet it is also said to be largest significance level to still reject H0. What is the difference in these circumstances?(1 vote)

## Video transcript

- [Instructor] "Kaito grows tomatoes in two separate fields. When the tomatoes are ready to be picked, he is curious as to whether the sizes of his tomato plants differ between the two fields. He takes a random sample of plants from each field and measures the heights of the plants. Here is a summary of the results:" So what I want you to do, is pause this video, and conduct a two sample T test here. And let's assume that all of the conditions for inference are met, the random condition, the normal condition, and the independent condition. And let's assume that we are working with a significance level of 0.05. So pause the video, and conduct the two sample T test here, to see whether there's evidence that the sizes of tomato plants differ between the fields.

Alright, now let's work through this together. So like always, let's first construct our null hypothesis. And that's going to be the situation where there is no difference between the mean sizes, so that would be that the mean size in field A is equal to the mean size in field B. Now what about our alternative hypothesis? Well, he wants to see whether the sizes of his tomato plants differ between the two fields. He's not saying whether A is bigger than B, or whether B is bigger than A, and so his alternative hypothesis would be around his suspicion, that the mean of A is not equal to the mean of B, that they differ.

And to do this two sample T test now, we assume the null hypothesis. We assume our null hypothesis, and remember we're assuming that all of our conditions for inference are met. And then we wanna calculate a T statistic based on this sample data that we have. And our T statistic is going to be equal to the differences between the sample means, all of that over our estimate of the standard deviation of the sampling distribution of the difference of the sample means. This will be the sample standard deviation from sample A squared, over the sample size from A, plus the sample standard deviation from the B sample squared, over the sample size from B.

And let's see, we have all the numbers here to calculate it. This numerator is going to be equal to 1.3 minus 1.6, 1.3 minus 1.6, all of that over the square root of, let's see, the standard deviation, the sample standard deviation from the sample from field A is 0.5. If you square that, you're gonna get 0.25, and then that's going to be over the sample size from field A, over 22, plus 0.3 squared, so that is, 0.3 squared is 0.09, all of that over the sample size from field B, all of that over 24. The numerator is just gonna be -.3, divided by the square root of .25 divided by 22, plus .09 divided by 24, and that gets us -2.44. Approximately -2.44.

And so if you think about a T distribution, and we'll use our calculator to figure out this probability, so this is a T distribution right over here, this would be the assumed mean of our T distribution. And so we got a result that is, we got a T statistic of -2.44, so we're right over here, so this is -2.44. And so we wanna say what is the probability from this T distribution of getting something at least this extreme? So it would be this area, and it would also be this area, if we got 2.44 above the mean, it would also be this area. And so what I could do is, I'm gonna use my calculator to figure out this probability right over here, and then I'm just gonna multiply that by two, to get this one as well.

So the probability of getting a T value, I guess I could say where its absolute value is greater than or equal to 2.44, is going to be approximately equal to, I'm going to go to second, distribution, I'm going to go to the cumulative distribution function for our T distribution, click that. And since I wanna think about this tail probability here that I'm just gonna multiply by two, the lower bound is a very very very negative number, and you could view that as functionally negative infinity. The upper bound is -2.44. -2.44. And now what's our degrees of freedom? Well if we take the conservative approach, it'll be the smaller of the two samples minus one. Well the smaller of the two samples is 22, and so 22 minus one is 21. So put 21 in there. Two... 21.

And now I can paste, and I get that number right over there, and if I multiply that by two, 'cause this just gives me the probability of getting something lower than that, but I also wanna think about the probability of getting something 2.44 or more above the mean of our T distribution. So times two, is going to be equal to approximately 0.024. So approximately 0.024.

And what I wanna do then is compare this to my significance level. And you can see very clearly, this right over here, this is equal to our P value. Our P value in this situation, our P value in this situation is clearly less than our significance level. And because of that, we said hey, assuming the null hypothesis is true, we got something that's a pretty low probability below our threshold, so we are going to reject our null hypothesis, which tells us that there is, so this suggests, this suggests the alternative hypothesis, that there is indeed a difference between the sizes of the tomato plants in the two fields.
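
The steps in the transcript can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the calculator procedure from the video: the function names are hypothetical, and the calculator's t cumulative distribution function is replaced by a Simpson's-rule integration of the t density. The degrees of freedom follow the conservative rule from the video (smaller sample size minus one).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T >= |t|): 0.5 minus the Simpson's-rule area on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Unpooled two-sample t statistic, conservative df, two-sided p-value."""
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    t = (mean_a - mean_b) / se
    df = min(n_a, n_b) - 1           # conservative degrees of freedom
    p = 2 * t_upper_tail(t, df)      # both tails, by symmetry
    return t, df, p

# Sample summaries read out in the transcript: field A, then field B.
t, df, p = two_sample_t(1.3, 0.5, 22, 1.6, 0.3, 24)
alpha = 0.05
reject_null = p < alpha

print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
print("reject H0" if reject_null else "fail to reject H0")
```

Because the p-value doubles the area in one tail, swapping the order of the fields flips the sign of t but leaves the p-value and the decision unchanged.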