
Hypothesis test for difference of means

Hypothesis Test for Difference of Means. Created by Sal Khan.

Want to join the conversation?

  • Justin Huang
    Even with an n>30, I still don't agree with using the sample's standard deviation as a valid approximation of the population's standard deviation. Say you test your sample the way Sal does it, and realize that the probability of you getting that sample was 1%. Normally, you would reject the null hypothesis. But say the null hypothesis was indeed correct. This means you just happened to choose a lot of samples from the far left or far right of the population mean. Since your sample is representative of such a small extreme section, the standard deviation of your sample would have been a lot smaller than the true standard deviation of the total population. Therefore, since Sal used the sample standard deviation as his population standard deviation, he would have underestimated the population standard deviation, and consequently overestimated the z score of that sample. The degree at which he overestimated? I'm not sure, I think I'd have to solve some recursive function that is really hard to think about right now. Anyways, I'm sure there's something I'm missing, because I have full faith in Sal Khan. Please let me know if I have overlooked something.
    (7 votes)
    • Dr C
      > "The degree at which he overestimated? I'm not sure, I think I'd have to solve some recursive function that is really hard to think about right now."

      I'm not sure about that. There might be some way to formally express the degree of error, but I like jumping to simulations. The Type I error -- rejecting Ho when Ho is actually true -- is slightly inflated. By the time we get to n>30 it's not really by much, but it's a bit above what it should be. This means that using the Z-test when we should use the T-test will conclude a significant result a bit too often. (There is a rough simulation sketch of this right after this thread.)
      (5 votes)
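A rough Monte Carlo sketch of what Dr C describes (not from the video; Python, with an illustrative sample size, seed, and trial count): when the null hypothesis is true and we use the z cutoff with the sample standard deviation, we reject slightly more often than the nominal 5%, while the t cutoff stays close to 5%.

```python
# Rough simulation: Type I error when using a z cutoff vs. a t cutoff
# with the sample standard deviation plugged in (H0 is true throughout).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, trials = 30, 0.05, 50_000
z_crit = stats.norm.ppf(1 - alpha)         # one-tailed z cutoff, ~1.645
t_crit = stats.t.ppf(1 - alpha, df=n - 1)  # one-tailed t cutoff, ~1.699

reject_z = reject_t = 0
for _ in range(trials):
    x = rng.normal(loc=0.0, scale=1.0, size=n)     # H0 really holds: mean is 0
    stat = x.mean() / (x.std(ddof=1) / np.sqrt(n))
    reject_z += stat > z_crit
    reject_t += stat > t_crit

print("Type I error with z cutoff:", reject_z / trials)  # a bit above 0.05
print("Type I error with t cutoff:", reject_t / trials)  # close to 0.05
```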
  • Alex
    If we are assuming that the null hypothesis is true (there is no difference between the two diets), why are we using each of the sample standard deviations separately, as if they are separate populations? If the null hypothesis is true, both samples are taken from the same population. Shouldn't we then take an average of the two standard deviations, or recalculate the standard deviation of the whole n=200 sample?
    (8 votes)
  • Adolf Gore
    His critical value Z score is 1.65 because of the 95%. I'm a little confused, because my chart says 95% should be 1.96. Am I using a different chart for something unrelated to z scores?
    (4 votes)
    • Dr C
      One sided vs two sided is important. If we're testing for a "difference" then we need to split the Type I error probability (alpha) into the two tails, so 0.05 translates to a z value of 1.96. If we're only looking for an increase or decrease, then we put all of the alpha probability into one tail, which leads to a z value of 1.65. (There's a quick numeric check of both values just after this thread.)
      (9 votes)
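For reference, both cutoffs Dr C mentions can be read off with scipy (a minimal check, not part of the original thread):

```python
# One-tailed vs. two-tailed critical z values at alpha = 0.05.
from scipy import stats

alpha = 0.05
one_tailed = stats.norm.ppf(1 - alpha)      # all of alpha in one tail
two_tailed = stats.norm.ppf(1 - alpha / 2)  # alpha split across both tails

print(round(one_tailed, 3))  # 1.645 (the ~1.65 used in the video)
print(round(two_tailed, 3))  # 1.960 (the 1.96 on a two-tailed chart)
```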
  • milan.griffes
    At , I don't understand why the mean of the sampling dist. is the same as the mean of the population dist.

    Couldn't your sample mean be quite different? I.e., couldn't you draw a sample that had a higher mean than the overall population?

    Or is he saying that the mean of the distribution of all the samples you draw is going to be the same as the mean of the overall population?
    (4 votes)
    • robshowsides
      Great question. And yes, your last sentence is the correct explanation! This is why inferential stats can be confusing, and why every single word is critical when we talk about means and samples and distributions. As you say, the mean of ONE SAMPLE could be (in fact, almost always IS) different from the population mean. But the mean value of the distribution of all the sample means (phew!) will be the same as the population mean. Of course, you DON'T HAVE "all the sample means", you only have ONE of them (usually)! So the "distribution of all sample means" is usually something we just imagine as a theoretical abstraction. But it's critical to understand what that distribution represents (we have to imagine doing the same experiment many, many times) in order for hypothesis testing to make sense. (There's a small simulation of this idea just after this thread.)
      (8 votes)
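A small simulation of the idea in the reply above (illustrative only; the population parameters and sample size are made up): any single sample mean usually misses the population mean, but the average of many sample means lands essentially on top of it.

```python
# One sample mean vs. the mean of many sample means.
import numpy as np

rng = np.random.default_rng(1)
population = rng.normal(loc=50, scale=10, size=1_000_000)

one_sample_mean = rng.choice(population, size=100).mean()
many_sample_means = [rng.choice(population, size=100).mean() for _ in range(10_000)]

print(population.mean())           # population mean, ~50
print(one_sample_mean)             # a single sample mean, typically a bit off
print(np.mean(many_sample_means))  # mean of the sample means, ~population mean
```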
  • Rob Brockway
    I'm baffled as to why we keep using our original sample standard deviations as estimates for the population SDs (c. ) once we're assuming the null hypothesis. If (and I might be barking up the wrong tree here) the hypothesis is that there's no meaningful difference whatsoever in weight loss effect between the two diets, why should their SDs remain distinct when imagined across the whole population? If the two groups' data are basically identical when viewed globally, shouldn't their SDs be identical too?
    (5 votes)
    • nitin.siwach.iitkgp
      Because it leads to the same answer. If you sample twice from the same population, the best estimate of the common variance is the pooled variance, ((n1 - 1)*var(x1) + (n2 - 1)*var(x2)) / (n1 + n2 - 2). Now compute the variance of the difference of the two sample means using that pooled estimate: with n1 = n2, it works out to exactly the same standard error Sal calculates from the two separate sample variances. (A numeric check follows this thread.)
      (2 votes)
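A numeric check of that claim, using the numbers from the video (s1 = 4.67, s2 = 4.04, n1 = n2 = 100); this is just a sketch, not something Sal computes on screen:

```python
# With equal group sizes, the pooled-variance standard error of (x̄1 - x̄2)
# matches the "separate variances" standard error used in the video.
import math

s1, s2 = 4.67, 4.04
n1 = n2 = 100

separate = math.sqrt(s1**2 / n1 + s2**2 / n2)

pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

print(round(separate, 3))  # ~0.617
print(round(pooled, 3))    # ~0.617, identical when n1 = n2
```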
  • John-Ting Li
    At 6:43 Sal used the same "standard deviation of the difference of the sample means" as in the video before. But in the last video in this chapter (named "Hypothesis test comparing population proportions"), he calculated a new standard deviation for the null hypothesis. Why didn't he do it here, when in both cases the null hypothesis was u1 - u2 = 0?
    (4 votes)
    • Arpit Khullar
      I think you have a valid observation and unfortunately I don't have an answer for it. I scrolled through all the answers in this thread and it looks like no one has responded to your query so far. Please let me know if you have found (or happen to find in the future) the reason why a new SD was not calculated after assuming the null hypothesis is true.
      (3 votes)
  • prednus86
    Hello, is it OK if I use a general t-test to solve the same problem?
    (4 votes)
  • rob8876
    Why do we state the null hypothesis as mu_1 - mu_2 = 0 and not simply mu_1 = mu_2?
    (2 votes)
    • Dr C
      Some people do state the null as Ho: µ1 = µ2. It's a matter of preference. I prefer expressing it as Ho: µ1 - µ2 = 0 because it makes the hypothesized value more evident. In this case, zero. But we can test for other differences too.
      (6 votes)
  • Snedden Gonsalves
    Why use a one-tailed test, when in the previous example a two-tailed graph was used?
    (3 votes)
    • cstrosetta
      It is interesting in the context of looking at diet. A researcher could have found that the low-fat diet actually leads to weight gain, which would be an interesting find in itself. But since we could already see that there was some weight loss in the sample, it makes sense to just ask, with a one-tailed test, what the probability of that much weight loss happening by chance would be. I'm not so sure what we would conclude if the result had actually favoured the control group, though.
      (1 vote)
  • ilmonen.mika
    I'm finding it hard to understand how the sum of the two population variances (with the same means, hypothetically, under Ho) is derived. Why is it a sum and not something else?
    (3 votes)
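The last question above has no reply in the thread, so here is a small, illustrative simulation (the standard deviations echo the video's samples): for independent samples, the variance of the difference of the sample means is the sum of the two individual variances, which is why the two estimated variances are added under the square root.

```python
# Var(x̄1 - x̄2) ≈ Var(x̄1) + Var(x̄2) for independent samples.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 20_000

means1 = rng.normal(loc=5, scale=4.67, size=(reps, n)).mean(axis=1)
means2 = rng.normal(loc=5, scale=4.04, size=(reps, n)).mean(axis=1)

print(np.var(means1 - means2))          # simulated variance of the difference
print(np.var(means1) + np.var(means2))  # sum of the two variances, ~the same
print(4.67**2 / n + 4.04**2 / n)        # theoretical value, ~0.381
```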

Video transcript

In the last video, we came up with a 95% confidence interval for the mean weight loss between the low-fat group and the control group. In this video, I actually want to do a hypothesis test, really to test if this data makes us believe that the low-fat diet actually does anything at all. And to do that let's set up our null and alternative hypotheses. So our null hypothesis should be that this low-fat diet does nothing. And if the low-fat diet does nothing, that means that the population mean on our low-fat diet minus the population mean on our control should be equal to zero. And this is a completely equivalent statement to saying that the mean of the sampling distribution of our low-fat diet minus the mean of the sampling distribution of our control should be equal to zero. And that's because we've seen this multiple times. The mean of your sampling distribution is going to be the same thing as your population mean. So this is the same thing as that. That is the same thing as that. Or, another way of saying it is, if we think about the mean of the distribution of the difference of the sample means, and we focused on this in the last video, that that should be equal to zero. Because this thing right over here is the same thing as that right over there. So that is our null hypothesis. And our alternative hypothesis, I'll write over here. It's just that it actually does do something. And let's say that it actually has an improvement. So that would mean that we have more weight loss. So if we have the mean of Group One, the population mean of Group One minus the population mean of Group Two should be greater than zero. So this is going to be a one-tailed test. Or another way we can view it, is that the mean of the difference of the distributions, x1 minus x2, is going to be greater than zero. These are equivalent statements. Because we know that this is the same thing as this, which is the same thing as this, which is what I wrote right over here. Now, to do any type of hypothesis test, we have to decide on a level of significance. What we're going to do is, we're going to assume that our null hypothesis is correct. And then with that assumption that the null hypothesis is correct, we're going to see what is the probability of getting this sample data right over here. And if that probability is below some threshold, we will reject the null hypothesis in favor of the alternative hypothesis. Now, that probability threshold, and we've seen this before, is called the significance level, sometimes called alpha. And here, we're going to decide on a significance level of 95%. Or another way to think about it, assuming that the null hypothesis is correct, we want there to be no more than a 5% chance of getting this result here. Or no more than a 5% chance of incorrectly rejecting the null hypothesis when it is actually true. That would be a Type I error. So if there's less than a 5% probability of this happening, we're going to reject the null hypothesis. Less than a 5% probability given the null hypothesis is true, then we're going to reject the null hypothesis in favor of the alternative. So let's think about this. So we have the null hypothesis. Let me draw a distribution over here. The null hypothesis says that the mean of the differences of the sampling distributions should be equal to zero. Now, in that situation, what is going to be our critical region here? Well, we need a result, so we're going to need some critical value here.
Because this isn't a normalized normal distribution. But there's some critical value here. The hardest thing in statistics is getting the wording right. There's some critical value here such that the probability of getting a sample from this distribution above that value is only 5%. So we just need to figure out what this critical value is. And if our value is larger than that critical value, then we can reject the null hypothesis. Because that means the probability of getting this is less than 5%. We could reject the null hypothesis and go with the alternative hypothesis. Remember, once again, we can use Z-scores, and we can assume this is a normal distribution because our sample size is large for either of those samples. We have a sample size of 100. And to figure that out, the first step, if we just look at a normalized normal distribution like this, what is your critical Z value? Getting a result above that Z value only has a 5% chance. So this is actually cumulative. So this whole area right over here is going to be a 95% chance. We can just look at the Z table. We're looking for 95%. We're looking at the one-tailed case. So let's look for 95%. This is the closest thing. We want to err on the side of being a little bit maybe to the right of this. So let's say 95.05 is pretty good. So that's 1.65. So this critical Z value is equal to 1.65. Or another way to view it is, this distance right here is going to be 1.65 standard deviations. I know my writing is really small. I'm just saying the standard deviation of that distribution. So what is the standard deviation of that distribution? We actually calculated it in the last video, and I'll recalculate it here. The standard deviation of our distribution of the difference of the sample means is going to be equal to the square root of the variance of our first population. Now, the variance of our first population, we don't know it. But we could estimate it with our sample standard deviation. If you take your sample standard deviation, 4.67, and you square it, you get your sample variance. And so this is the variance. This is our best estimate of the variance of the population. And we want to divide that by the sample size. And then plus our best estimate of the variance of the population of group two, which is 4.04 squared, the sample standard deviation of group two squared. That gives us variance, divided by 100. I did this before, in the last video. Maybe it's still sitting on my calculator. Yes, it's still sitting on the calculator. It's this quantity right up here: 4.67 squared divided by 100, plus 4.04 squared divided by 100. So it's 0.617. So this right here is going to be 0.617. So this distance right here is going to be 1.65 times 0.617. So let's figure out what that is. So let's take 0.617 times 1.65. So it's 1.02. This distance right here is 1.02. So what this tells us is, if we assume that the diet actually does nothing, there's only a 5% chance that the difference between the means of these two samples is more than 1.02. There's only a 5% chance of that. Well, the mean difference that we actually got is 1.91. So that's sitting out here someplace. So it definitely falls in this critical region. The probability of getting this, assuming that the null hypothesis is correct, is less than 5%. So it's a smaller probability than our significance level. Actually, let me be very clear. The significance level, this alpha right here, needs to be 5%. Not the 95%. I think I might have said 95% here, but I wrote down the wrong number there.
I subtracted it from one by accident. Probably in my head. But anyway, the significance level is 5%. The probability given that the null hypothesis is true, the probability of getting the result that we got, the probability of getting that difference, is less than our significance level. It is less than 5%. So based on the rules that we set out for ourselves of having a significance level of 5%, we will reject the null hypothesis in favor of the alternative that the diet actually does make you lose more weight.
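The whole calculation in the transcript can be reproduced in a few lines. This is an illustrative sketch (scipy is assumed to be available), using the video's numbers: an observed difference in sample means of 1.91, sample standard deviations of 4.67 and 4.04, and 100 people in each group.

```python
# Worked sketch of the one-tailed two-sample z-test from the video.
import math
from scipy import stats

diff = 1.91          # observed difference in sample means (low fat - control)
s1, s2 = 4.67, 4.04  # sample standard deviations
n1 = n2 = 100        # group sizes

se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # ~0.617
z_crit = stats.norm.ppf(0.95)            # one-tailed 5% cutoff, ~1.645
cutoff = z_crit * se                     # ~1.02, critical distance above zero

z = diff / se                            # ~3.09 standard errors above zero
p_value = 1 - stats.norm.cdf(z)          # ~0.001, one-tailed

print(round(se, 3), round(cutoff, 2), round(z, 2), round(p_value, 4))
if diff > cutoff:                        # equivalently, p_value < 0.05
    print("Reject H0: the data favor the low-fat diet.")
```

The observed difference of 1.91 sits about 3.1 standard errors above zero, well past the 1.02 cutoff, which is why the null hypothesis is rejected at the 5% level.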