© 2023 Khan Academy
Contingency table chi-square test
Sal uses the contingency table chi-square test to see if a couple of different herbs prevent people from getting sick. Created by Sal Khan.
Want to join the conversation?
- Why do we use the data for both 'sick' and 'not sick' in computing our chi squared statistic? It seems like it will make our result seem more deviant than it really is since in each group the number of 'not sick' people is directly related to the number of 'sick' people.(31 votes)
- Yes, this seems to conflict with the Chi Square Distro Intro where he says the terms of the Chi Square Distro should be independent.(8 votes)
- Isn't there the potential for one herb to be really effective and the other to be ineffective? I feel like those scores could cancel each other out, leading to a Type II error (failing to reject the null when it is false). I don't understand why, if you're interested in testing several conditions, it makes sense to mix all the data up.(7 votes)
- You wouldn't conduct a Chi-square test to answer the question, "are the herbs better than the placebo?" The correct null hypothesis in this example (that can be assessed by a chi-square test) is: Ho: There is no relationship between taking pills and getting sick.(15 votes)
- At 05:24, he says that 21% did not get sick and then writes 21% in the row labeled "sick". Did he make a mistake, or am I missing something here?(11 votes)
- Why use the Chi square statistics to address this problem instead of the Bernoulli one and continue inferring the data as before? Basically, how does one decide which approach to apply?(8 votes)
- Well, a Bernoulli hypothesis test with two samples would work... if we had two samples :) But in this case we have 3 samples (herb 1, herb 2, and placebo). You can't compare 3 things with a single difference to see if they are the same. If you say x1-x2 = 0 it means that x1 & x2 are the same. But if you say x1-x2-x3 = 0 you can't really say anything about them; they could be any numbers that satisfy that equation. So the best way to do it is to use a contingency table with a chi-square test.(6 votes)
- Throughout this hypothesis test the actual values were used, resulting in a chi-square value of 2.53, but I decided to try the whole calculation from scratch using the percentages of each subgroup instead. The result was a chi-square value of 2.08. So this makes me wonder: is it possible to manipulate the parameters of the study (i.e. obtain a larger sample of the population) to where it would result in a chi-square value greater than our critical chi-square value?(3 votes)
- Actually, chi-square test statistics are extremely sensitive to the sample size, and it is not because larger samples are inherently better. For the same cell proportions, the chi-square statistic gets larger with a larger sample size and smaller with a smaller sample size (Daniel's math is correct). Thus, you can have a strong statistical association but fail to find it significant with a chi-square test if you have a small sample size (and the reverse). The chi-square test has many limitations; still, it is one of the most useful tests in social statistics. The key is to learn to use it appropriately and to learn to interpret your findings in light of the limitations of a chi-square. For more information, see your friendly stats textbook. I suggest (because I use this book in my own stats classes and have it handy) The Essentials of Social Statistics for a Diverse Society, page 210 for a discussion of this specific issue (sample size and chi-square test statistics).(9 votes)
- Shouldn't we be performing a two-tailed test here? The null hypothesis says that the effect of the herbs is nothing and the alternate hypothesis says that the effect is not nothing.(4 votes)
- By its nature, this chi-square test is one-tailed. This is because you square the observed frequency minus the expected frequency, so you will never get a negative number, so you can never have a negative chi-squared as the result. Consequently, the test only needs to have one tail.(2 votes)
- Why would you include the number of people who got sick and also took an herb in the expected percentage of who would get sick with no interference? I would think the whole point of having a control group would be to get the actual percentage of people who would get sick with no interference and then test the observed for the two herb groups. Based on this test you could see if there was a difference between observed for the herb and expected for no interference. Then you could answer if the herb made a difference or not. What Sal seems to have done is say that if we are assuming that there is no difference we can just include the sick people from the herb categories in the percentage that get sick with no interference.(4 votes)
- Great point. Please refer to the top questions and answers above (especially those by @Anna Mueller).(1 vote)
- But aren't we overcounting here, counting one and the same error twice (because second row = 100% - first row)?(3 votes)
- Hi Varvara. If you are talking about the expected values, they are not percentages so they do not have to add up to 100; only to their line or column total...(2 votes)
- In this video you said that Ho: Herbs are useless. I would have said that Ho: Herbs had effect. How would you know which one to use for Ho in this one? I thought that Ho was something you're trying to prove. Aren't we trying to prove that the herbs have effect?(1 vote)
- Ho is always along the lines of "no difference, no effect, everything stays the same".(3 votes)
- If Anna Mueller is correct and this is not the proper statistical method for assessing this question, what is the correct method?(1 vote)
- There are many ways you could correctly answer the question of herbs doing something. One simple way would be to run two separate chi-squares, one that tests herb1 versus placebo and then one that tests herb2 versus placebo. The test you would use in part depends on exactly what question you want answered.(4 votes)
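The sample-size question raised in the thread above can be checked numerically. The following is a minimal Python sketch, not part of the original discussion: it takes the observed counts from the video (20, 30, 30 sick; 100, 110, 90 not sick) and compares them with a hypothetical study that has exactly the same proportions but twice as many people in every cell. The doubled table and the helper name `chi_square_statistic` are assumptions made purely for illustration.

```python
import math


def chi_square_statistic(rows):
    """Pearson chi-square statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed_count, col_total in zip(row, col_totals):
            # Expected count if row and column membership are unrelated.
            expected = row_total * col_total / grand_total
            stat += (observed_count - expected) ** 2 / expected
    return stat


# Counts from the video: columns are herb 1, herb 2, placebo.
observed = [[20, 30, 30],     # sick
            [100, 110, 90]]   # not sick

# HYPOTHETICAL table: same proportions, every cell doubled (not real data).
doubled = [[2 * count for count in row] for row in observed]

# Critical value at alpha = 0.10. This closed form is valid only for
# 2 degrees of freedom, where the upper-tail probability is exp(-x / 2).
critical = -2 * math.log(0.10)

stat_base = chi_square_statistic(observed)
stat_doubled = chi_square_statistic(doubled)

print(f"original counts: {stat_base:.3f}")
print(f"doubled counts:  {stat_doubled:.3f}")
print(f"critical value:  {critical:.3f}")
```

With identical proportions, doubling every count doubles the statistic, which in this made-up case moves it from below the 10% critical value to above it.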
Video transcript
Let's say there are
a couple of herbs that people believe
help prevent the flu. So, to test this, what we
do is we wait for flu season and we randomly assign people
to three different groups. And over the course
of flu season, we have them either in
one group taking herb one, in the second group taking herb
two, and in the third group they take a placebo. And if you don't know what
a placebo is it's something that, to the patient or to
the person participating, it feels like they're taking
something that you've told them might help them,
but it does nothing. It could be just a sugar pill,
just so it feels like medicine. The reason why you would even
go through the effort of giving them something is because
oftentimes there's something called a placebo
effect, where people get better just because they're being
told that they're being given something that will
make them better. So this could, right here,
just be a sugar pill, and a very small amount
of sugar so it really can't affect the actual
likelihood of getting the flu. So over here we have a
table, and this is actually called a contingency table. And it has on it in each group
the number that got sick, the number that didn't get sick. And so we also can from this
calculate the total number. So in group one, we had
a total of 120 people. In group two, we had a total
of 30 plus 110 is 140 people. And in the placebo group, the
group that just got the sugar pill, we had a
total of 120 people. And then we could also
tabulate the number of people, the total number
of people, that got sick. So that's 20 plus 30
is 50 plus 30 is 80. This is the total
column right over here. And then the total people
that didn't get sick over here is 100 plus 110 is
210 plus 90 is 300, and then the total
people here are 380. Both this column and this
row should add up to 380. So with that out of
the way, let's think about how we can use this
information in the contingency table and our knowledge of
the chi-square distribution to come up with some conclusion. So let's just make
a null hypothesis. Our null hypothesis is
that the herbs do nothing. Let's just assume-- let me
get some space here-- so let's assume the null hypothesis
that the herbs do nothing. And then we have our
alternative hypothesis, or alternate hypothesis,
that the herbs do something. Notice I don't even care
whether they actually improve. I'm just saying
they do something. They might even increase your
likelihood of getting the flu. We're not testing whether
they're actually good. We're just saying, are
they different than just doing nothing. So like we do with all
of our hypothesis tests, let's just assume the null. We're going to assume the null
and, given that assumption, figure out if the
likelihood of getting data like this or more
extreme is really low. And if it is really
low, then we will reject the null hypothesis. And in this test, like
every hypothesis test, we need a significance level. And let's say our significance
level we care about for whatever reason
is 10% or 0.10. That's the significance
level that we care about. Now to do this, we
have to calculate a chi-square statistic for
this contingency table. And to do that, we do
it very similar to what we did with the
restaurant situation. We figure out, assuming
the null hypothesis, the expected
results you would've gotten in each of these cells. You could view each of
these entries as a cell. You know that's
what we do with it. You call each of those
entries in Excel also a cell, each of the
entries in a table. What we do is we figure out
what the expected value would have been if you do assume
the null hypothesis. Then we find the
squared distance from that expected value, and
we, I guess you could call it, normalize it by
the expected value. Take the sum of all
of those differences, and if those squared
differences are really big, the probability of getting
it would be really small, and maybe we'll reject
the null hypothesis. So let's just figure out how
we can get the expected number. So we're assuming
the herbs do nothing. So if the herbs do
nothing, then we can just figure out that
this whole population just had nothing happen to them. These herbs were useless. And so we can use this
population sample-- or I shouldn't call
it the population-- we should use this
sample right here to figure out the
expected number of people who would get sick or not sick. And so over here, we have 80
out of 380 got sick. And I want to be careful, I
just said the word population, but we haven't sampled
the whole universe of all people taking this herb. This is a sample. So I don't want to confuse you. I was using population in more
of the conversational sense than the statistical sense. But anyway, of our sample--
and we're using all of the data because we're assuming
there's no difference. We might as well just use
the total data to figure out the expected frequency
of getting sick and not getting sick. So 80 divided by 380
got sick. And that's 21%. 21% got sick. So let me write that over here. So 21, and that's
21% of the total, and then this would be 79% if
we just take 100% minus 21%. We could divide 300 by 380,
and we should get 79% as well. So you would expect--
one would expect-- that 21% of your total,
based on the total sample right over here, that our
best guess is that 21% should be getting sick and 79%
should not be getting sick. So let's look at it for
each of these groups. If we assume that 21%
of these 120 people should have gotten
sick, what would have been the expected
value right over here? So let's just multiply
21% times 120. So let's just multiply
that times 120. That gets us to 25 point--
I'll just round it-- 25.3 people should
have gotten sick. So the expected-- so let
me write it over here, I'll do expected in yellow-- so
the expected right over here. If you assume that 21% of each
group should have gotten sick is that you would have expected
25.3 people to get sick in group one, in herb one group. And then the remainder
will not get sick. So let's just subtract
or I could actually multiply 79% times 120, either
one of those would be good. But let me just take 120 minus
25.3, and then I get 94.7. So you would have expected
94.7 to not get sick. So this is expected again. 94.7 to not get sick. And now let's do that
for each of these groups. So once again, group
two, you would've expected 21% to get sick. 21% of the total people in
that group, so that's 140, so that's 29.4. And then the remainder--
let's see, 140 minus 29.4-- should not have gotten sick. So that gets us this right here. We have 29.4 should have gotten
sick if the herbs did nothing. And then, over
here, we would have 110.6 should not
have gotten sick. And these are pretty close. So, just looking
at the numbers, it looks like this herb doesn't
do too much relative to the total, all of
the groups combined. And then in the placebo
group, let's see what happens. Let's see what happens. We expect 21% to get sick,
21% of our group of 120. So it's 25.2. So this right over here. And actually, this should be 25
point-- since we're rounding, actually, these will be
the same number over here-- so I said 21%, but it's 21 point
something something something. The group sizes
are the same, so we should expect the same
proportion to get sick. So I'll say 25.3 just
to make it consistent. The reason why I
got 25.2 just now is because I lost some of the
trailing decimals over here. But since I had
them over here, I'm going to use them
over here as well. And then over here
in this group, you would expect
94.7 to not get sick. So if you just actually
relied on this data, it looks like herb two is
actually, to some degree, even worse than the-- oh. No, no, I take that back. It's not worse because you would
have expected a small number, and a lot of people
got sick here. So this is the placebo-- Well anyway, we don't want
to make judgments just staring at the numbers. Let's figure out our
chi-square statistic. And to do that, let's
get our statistic, our chi-square statistic. I'll write it like
this, maybe, for fun. Or maybe I'll write it as a
big X because really, this random variable's
distribution, is approximately a
chi-square distribution. So I'll write it like that. And, well, we'll talk about the
degrees of freedom in a second. Actually, let me write
it with the curly X, just so you see that
some people write it with the chi instead of the X. So our chi-square
statistic over here. We're literally just going
to find the squared distance between the observed
and expected. And then divide it
by the expected. So it's going to be 20
minus 25.3 squared over 25.3 plus 30 minus 29.4
squared over 29.4-- I'm going to run out of space--
plus 30 minus 25.3 squared over 25.3. And then I'm going to have
to do these over here, so let me just continue it. You could ignore
this H1 over here. So plus 100 minus
94.7 squared over 94.7 plus-- I think you
see where this is going-- 110 minus 110.6
squared over 110.6. And then, finally,
plus 90 minus 94.7-- let me scroll to the
right a little bit-- squared, all of that over 94.7. So let me just get
the calculator out to calculate this. Take a little bit of time. So we have-- I have to
type on the calculator for these parentheses-- so
we have 20 minus 25.3 squared divided by 25.3 plus, open
parentheses, 30 minus 29.4 squared divided by 29.4 plus,
open parentheses, 30 minus 25.3 squared divided
by 25.3-- halfway there-- plus 100,
open parentheses, this is the tedious part,
100 minus 94.7 squared divided by 94.7 plus 110
minus-- I'll let you type it out, we can do a lot
of these in our head, but let me just do it--
110 minus 110.6 squared divided by 110.6-- and
then last one, homestretch, assuming we haven't
made any mistakes-- we have 90 minus 94.7
squared divided by 94.7. And let's see what we get. We get 2.528, so let's
just say it's 2.53. So our chi-square
statistic-- always have trouble saying that-- our
chi-square statistic, assuming the null hypothesis is
correct, is equal to 2.53. Now, the next
thing we have to do is figure out the
degrees of freedom that we had in calculating
the chi-square statistic. And I'll give you
the rule of thumb, and I'll give you a little
bit of a sense of why this is the rule of thumb for
a contingency table like this. And in the future, we'll
talk a little bit more deeply about degrees of freedom. So the rule of thumb
for a contingency table is you have the number of
rows, so you have rows, and then you have your
number of columns. So here we have two rows,
and we have three columns. You don't count the totals. So you have three
columns over here. And the degrees of freedom,
and this is the rule of thumb, the degrees of freedom
for your contingency table is going to be the number of
rows minus 1 times the number of columns minus 1. In our situation, we have
2 rows and 3 columns. So it's going to be 2
minus 1 times 3 minus 1. So it's going to be 2
minus 1 times 3 minus 1, which is just 1
times 2, which is 2. We have 2 degrees of freedom. Now, the reason that
that should make a little bit of
intuitive sense, we'll talk about this in more
depth in the future, is that if you assume
that you know the totals. So let's just assume
that you know the totals. So if you know all of this
information over here, if you know the total
information-- or actually, if you knew the parameters
of the population as well-- but if you know the
total information, and if you know
this information, or if you know r minus 1 of
the information in the rows, the last one can
be figured out just by subtracting from the total. So for example, in this
situation, if you know this, you can easily figure out this. This is not new information,
it's just the total minus 20. Same thing, if you know
this one right over here, this one over here is
not new information. And similarly, if you know
these two, this guy over here isn't new information. You could always
just calculate him based on the total
and everything else. So that's the sense of
why our degrees of freedom are the columns minus 1
times the rows minus 1. But anyway, so our
chi-square statistic has 2 degrees of freedom. So what we have to do is
remember our alpha value-- let me get it up here,
we had it right over here-- our significance
level that we care about, our alpha value is 10%. Let me rewrite it over here. So our alpha is 10%. So what we're going
to do is figure out what is our critical
chi-square statistic that gives us an alpha of 10%. If our statistic is more
extreme than that critical value, then the probability
of getting a result at least this extreme, assuming the null, is less than
10%, and we'll reject the null hypothesis. If it's not more
extreme, then we won't reject the
null hypothesis. So what we need to
do is to figure out with the chi-square distribution
and 2 degrees of freedom, what is our critical
chi-square statistic. So let's just go back. So we have 2 degrees of freedom. We care about a
significance level of 10%. So our critical
chi-square value is 4.60. So another way to
visualize this. If we look at the chi-square
distribution with 2 degrees of freedom, that's this
blue one over here, at a value of-- I'm trying
to pick a nice blue to use-- at a critical value of 4.60. So 4.60-- this is 5-- so 4.60
will be right around here. At a critical value of
4.60, so this is 4.60, the probability of getting
something at least that extreme, so that extreme
or more extreme, is 10%. This is what we care about. Now, if the chi-square
statistic that we calculated falls into this
rejection region, then we're going to reject
the null hypothesis. But our chi-square
statistic is only 2.53. It is only 2.53. So it's sitting someplace right
over here is actually ours. So it's actually not
that crazy to get it if you assume the
null hypothesis. So based on our data
that we have right now, we cannot reject
the null hypothesis. So we don't know for a fact
that the herbs do nothing, but we can't say that they
do something based on this. So we're not going to reject it. We won't say 100%
that it's true, but we can't say that
we're rejecting it. So at least from
this point of view, it doesn't look like
the herbs did anything that would make us
believe that they're any different than each other. And one of the groups
is obviously a placebo. So any different than a
placebo or each other.
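The whole calculation in the transcript can be reproduced in a few lines. The sketch below is an illustration in Python, not something shown in the video, and the function and variable names are made up for this example. It keeps the expected counts unrounded, so the statistic comes out slightly different from the hand calculation (about 2.526 rather than 2.528). It also relies on a closed form that holds only for 2 degrees of freedom, where the upper-tail probability is exp(-x/2), so no statistics library is needed.

```python
import math

# Observed counts read out in the transcript.
# Columns: herb 1, herb 2, placebo.
observed = {
    "sick": [20, 30, 30],
    "not sick": [100, 110, 90],
}


def expected_counts(table):
    """Expected counts under the null: row total * column total / grand total."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    stat = 0.0
    for obs_row, exp_row in zip(table.values(), expected):
        for obs, exp in zip(obs_row, exp_row):
            stat += (obs - exp) ** 2 / exp
    return stat


n_rows = len(observed)
n_cols = len(observed["sick"])
df = (n_rows - 1) * (n_cols - 1)   # (2 - 1) * (3 - 1) = 2

alpha = 0.10
stat = chi_square_statistic(observed)

# ASSUMPTION: these two formulas are valid only when df == 2.
critical = -2 * math.log(alpha)    # value with upper-tail area alpha
p_value = math.exp(-stat / 2)      # upper-tail area beyond the statistic

reject_null = stat > critical

print(f"expected sick counts: {[round(e, 1) for e in expected_counts(observed)[0]]}")
print(f"chi-square = {stat:.3f}, df = {df}")
print(f"critical value = {critical:.3f}, p-value = {p_value:.3f}")
print("reject null" if reject_null else "fail to reject null")
```

The statistic falls well short of the critical value of roughly 4.605, and the corresponding tail probability is about 0.28, so the sketch reaches the same conclusion as the transcript: the null hypothesis is not rejected at the 10% level.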