Pearson's Chi Square Test (Goodness of Fit)
- I'm thinking about buying a restaurant.
- So I go and ask the current owner, what is the
- distribution of the number of customers you get each day?
- He said, oh I've already figured that out. And he gives me this distribution over here.
- It essentially says, 10% of his customers come on Monday;
- 10% on Tuesday; 15% on Wednesday, so forth and so on.
- They're closed on Sundays.
- So this is 100% of their customers for a week. If you add it all up you will get 100%.
- I obviously am a little suspicious. So I decide to see
- how well this distribution that he is describing actually fits observed data.
- So I actually observe the number of customers when they come in during the week,
- and this is what I get for observed data.
- So to figure out whether I want to accept or reject his hypothesis right here,
- I'm gonna do a little bit of a hypothesis test.
- So I'll make the null hypothesis that the owner's distribution is correct.
- And then the alternative hypothesis is going to be, that it's not correct.
- That it's not a correct distribution, that I should not feel reasonably okay relying on this.
- I should reject the owner's distribution.
- I want to do this with a significance level of 5%.
- Or another way of thinking about it, I'm going to calculate a statistic based on the data right here.
- It's going to be a chi-square statistic.
- Or the statistic I'm going to calculate has approximately a chi-square distribution.
- And given it has a chi-square distribution with a certain number of degrees of freedom--
- I'm gonna calculate that--
- what I want to see is whether the probability of getting a result like this or more extreme is less than 5%.
- If the probability of getting a result like this or something less likely than this
- is less than 5%, then I'm going to reject the null hypothesis,
- which is essentially rejecting the owner's distribution.
- If I don't get that, if I say, hey, the probability of getting a chi-square statistic this extreme or more extreme is more than 5%,
- then I'm not gonna reject it. I have no reason to really assume he's lying. Let's do that.
- So to calculate the chi-square statistic,
- here we're assuming the owner's distribution is correct.
- So assuming the owner's distribution was correct
- what would have been the expected observed?
- So we have the expected percentage here, but what would have been the expected observed?
- Let me write it here, expected.
- So we would have expected 10% of the total customers of that week to come on Monday;
- 10% of the total customers of that week to come on Tuesday;
- 15% to come on Wednesday... To figure
- out what that actual number is, we need to figure out the total number of customers.
- So let's add these numbers right here.
- We have-- calculator out--
- so we have 30+14+34+45+57+20.
- So there's a total of 200 customers who came into the restaurant that week.
- So let me write this down.
- So this is equal to-- so I'll write the total over here. Total.
- Ignore this right here. I had 200 customers coming for the week.
- What is the expected number on Monday?
- Well, on Monday, we would've expected 10% of the 200 to come in.
- 20 customers, 10% times 200.
- On Tuesday, another 10%, so we would've expected 20 customers.
- Wednesday 15% of 200, that's 30 customers.
- On Thursday, 20% of 200 customers, so that would've been 40 customers.
- Then on Friday, 30%, that would've been 60 customers.
- And then on Saturday, 15% of 200, it would've been 30 customers.
- So if this distribution is correct, this is the actual number I would have expected.
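The expected counts above can be reproduced with a few lines of Python. This is only a sketch: the variable names (`observed`, `claimed_pct`, `expected`) are made up for illustration, and the percentages are stored as whole numbers so the arithmetic stays exact.

```python
# Observed customer counts for the week (Monday through Saturday), from the video.
observed = {"Mon": 30, "Tue": 14, "Wed": 34, "Thu": 45, "Fri": 57, "Sat": 20}

# The owner's claimed distribution, as whole-number percentages.
claimed_pct = {"Mon": 10, "Tue": 10, "Wed": 15, "Thu": 20, "Fri": 30, "Sat": 15}

total = sum(observed.values())  # total customers observed that week

# Expected count for each day = claimed percentage of the observed total.
expected = {day: total * pct / 100 for day, pct in claimed_pct.items()}

for day in observed:
    print(f"{day}: observed {observed[day]:3d}, expected {expected[day]:5.1f}")
```

Note that the expected counts are scaled to the observed total of 200, so both columns add up to the same number.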
- Now to calculate our chi-square statistic,
- let me just show it to you. And instead of chi, I'm writing a capital X².
- Sometimes someone will write the actual Greek letter chi here.
- But I'll write the X² here to show-- let me write it this way,
- this is our chi-square statistic.
- But I'm going to write it with an X instead of a chi, because this is going to be
- approximately a chi-square distribution.
- I can't assume that it's exactly one. So we're dealing with an approximation right here.
- But it's fairly straight-forward to calculate.
- We take for each of the days, we take the difference between the observed and the expected.
- So it's going to be 30-20, squared.
- I'll do the first one color coded.
- Divided by the expected.
- So we're essentially taking the square of
- almost kind of the error between what we observed and expected.
- Or the difference between what we observed and expected.
- We're kind of normalizing it by the expected right over here.
- We want to take the sum of all of these. I'll do all of those in yellow.
- So + (14-20)²/20 + (34-30)²/30 + (45-40)²/40 + (57-60)²/60 + (20-30)²/30.
- I just took the observed minus expected squared over the expected and took the sum of it.
- And this is what gives us chi-square statistic.
- Now let's just calculate what this number is going to be.
- So this is going to be equal to what?
- 30-20 is 10 squared which is 100 divided by 20, which is 5.
- I might not be able to do all of them in my head like this.
- Plus-- actually, let me write it this way, so you see what I'm doing.
- This is going to be 100/20,
- + 14-20 is -6 squared is positive 36. So plus 36/20.
- + 34-30 is 4, squared is 16, so +16/30.
- + 45-40 is 5, squared is 25, so +25/40.
- + the difference here is 3, squared is 9, so it's 9/60.
- + we have a difference of 10, squared is 100, over 30, +100/30.
- And this is equal to-- I'll get the calculator out for this. This is equal to--
- we have 100/20+36/20+16/30+25/40+9/60+100/30.
- It gives us 11.44.
- Let me write that down. This right here is going to be 11.44.
- This is my chi-square statistic, or you can call it X². Sometimes
- you'll have it written as chi-squared, but this is approximately--
- this statistic is going to have approximately a chi-square distribution.
- Anyway with that said, let's figure out, if we assume that it has a roughly chi-square distribution,
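The sum worked out above, (observed - expected)² / expected added up over the six days, can be checked with a short Python sketch. The function name `chi_square_statistic` is a made-up helper for illustration, not something from the video.

```python
# Observed and expected counts for Monday through Saturday, from the video.
observed = [30, 14, 34, 45, 57, 20]
expected = [20, 20, 30, 40, 60, 30]


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi_sq = chi_square_statistic(observed, expected)
print(round(chi_sq, 2))  # prints 11.44
```

Each term divides by the expected count, so a miss of 10 customers counts for more on a day where 20 were expected than on a day where 30 were expected.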
- what is the probability of getting a result at least this extreme?
- Or another way to say it,
- is this a more extreme result than the critical chi-square value
- that there's a 5% chance of getting a result that extreme?
- So let's do it that way, let's figure out the critical chi-square value,
- and if this is more extreme than that, then we will reject our null hypothesis.
- So let's figure out our critical chi-square value.
- So we have an alpha, 5 percent.
- Actually, another thing to figure out is the degrees of freedom.
- The degrees of freedom here, we're taking 1, 2, 3, 4, 5, 6 sums.
- So you might be tempted to say that the degrees of freedom are six.
- But one thing to realize is that if you had all of this information over here,
- you could actually figure out this last piece of information.
- So we actually have 5 degrees of freedom.
- When you have n data points like this, and you're measuring the observed vs expected,
- your degrees of freedom are going to be n-1,
- because you can figure out that nth data point,
- just based on everything else that you have, all of the other information.
- So our degrees of freedom here are going to be 5, n-1.
- Our significance level is 5% and our degrees of freedom is also going to be 5.
- So let's look at our chi-square distribution.
- We have a degree of freedom of five; we have a significance level of 5%.
- And so the critical chi-square value's 11.07. Let's go to this chart.
- We have a chi-square distribution with a degree of freedom of 5.
- So that's this distribution over here in magenta.
- And we care about a critical value of 11.07.
- So this is right here. We can't even see it on this.
- If I were to keep drawing this magenta thing, all the way over here,
- you'd have 8,
- over here, you'd have 10, over here, you'd have 12.
- 11.07 may be someplace right over there.
- So what it's saying is that the probability of getting a result at least as extreme as 11.07 is 5%.
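The 11.07 read from the table can be checked without a table. The sketch below is one possible way to do it, using only Python's standard library: it relies on the known closed form for the upper-tail probability of a chi-square distribution with 5 degrees of freedom, and finds the critical value by bisection. The function names `chi2_sf_df5` and `critical_value_df5` are made up for illustration. In practice a statistics library such as SciPy would normally be used for this instead.

```python
import math


def chi2_sf_df5(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 5 degrees of freedom."""
    if x <= 0:
        return 1.0
    root = math.sqrt(x)
    # Closed form that holds for 5 degrees of freedom only.
    return (math.erfc(root / math.sqrt(2))
            + math.sqrt(2 / math.pi) * math.exp(-x / 2) * (root + root ** 3 / 3))


def critical_value_df5(alpha, lo=0.0, hi=100.0):
    """Find x with P(X >= x) = alpha by bisection; the tail probability decreases as x grows."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_df5(mid) > alpha:
            lo = mid  # tail is still too large, so move right
        else:
            hi = mid  # tail is at or below alpha, so move left
    return (lo + hi) / 2


critical = critical_value_df5(0.05)
print(round(critical, 2))  # prints 11.07
```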
- Our results, so our critical chi-square value is equal to-- we just saw-- 11.07.
- Let me look at the chart again. 11.07.
- The result we got for our statistic is even less likely than that.
- The probability is less than our significance level.
- So then we are going to reject.
- So the probability of getting-- 11.44 is more extreme than our critical chi-square level.
- So it is very unlikely that this distribution is true.
- So we will reject what he's telling us; we'll reject this distribution.
- It's not a good fit based on the significance level.
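The same decision can be reached the other way mentioned earlier, by comparing the probability of a result at least this extreme (the p-value) directly with the 5% significance level. This is a sketch under the same assumptions as the test itself: 5 degrees of freedom and an approximately chi-square statistic. The helper `chi2_sf_df5` is a made-up name and its closed form applies to 5 degrees of freedom only.

```python
import math


def chi2_sf_df5(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 5 degrees of freedom."""
    if x <= 0:
        return 1.0
    root = math.sqrt(x)
    return (math.erfc(root / math.sqrt(2))
            + math.sqrt(2 / math.pi) * math.exp(-x / 2) * (root + root ** 3 / 3))


chi_sq = 11.44  # the statistic computed in the video
alpha = 0.05    # the significance level chosen in the video

p_value = chi2_sf_df5(chi_sq)
reject_null = p_value < alpha  # same decision rule as chi_sq > critical value

print(f"p-value = {p_value:.4f}, reject null: {reject_null}")
```

The p-value lands just under 5%, which matches 11.44 sitting only slightly beyond the critical value of 11.07. The rejection is a close call at this significance level.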