Chi-square probability distribution
Contingency Table Chi-Square Test
- Let's say there are a couple of herbs that people believe help prevent the flu.
- So to test this, what we do is wait for flu season and
- randomly assign people to three different groups.
- And over the course of flu season,
- we have one group taking herb 1 and the second group taking herb 2.
- And the third group takes a placebo.
- And if you don't know what a placebo is, it's something that, to the person
- participating, feels like it's doing something that you told them might help them,
- but it does nothing. It could be just a sugar pill, so it feels like medicine.
- The reason you go through the effort of giving them something is because often
- times there is something called the placebo effect, where people
- get better just because they are told that they are being given something that
- will make them better.
- So this right here could just be a sugar pill.
- And a very small amount of sugar, so it really can't affect their
- actual likelihood of getting the flu.
- Over here we have a table, and this is actually called a contingency
- table, and it has on
- it, for each group, the number that got sick and the number
- that did not get sick. And from this we can also calculate the totals.
- So in group 1 we had a total of 120 people, and in group 2 we had
- a total of 30 plus 110, which is 140 people, and
- in the placebo group, that is the group that just got the sugar pill,
- we had a total of 120 people. And we can also tabulate the total number of people that got sick,
- so that's 20 plus 30, that is 50, plus 30 is 80.
- So that is the total column right over here. And the total people that didn't get sick
- over here is 100 plus 110 is 210, plus 90 is 300.
- And the total people here are 380. Both this column and this row should
- add up to 380. So with this out of the way, let's think about how we
- can use this information in the contingency table and our
- knowledge of the chi-square distribution
- to come up with some conclusion.
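The totals read out above can be checked with a short sketch. The dictionary layout and the names `observed`, `group_totals`, and `outcome_totals` are illustrative assumptions, not something shown in the video; the counts are the ones given in the narration.

```python
# Observed counts as read from the narration (layout and names are assumptions).
observed = {
    "herb 1": {"sick": 20, "not sick": 100},
    "herb 2": {"sick": 30, "not sick": 110},
    "placebo": {"sick": 30, "not sick": 90},
}

# Total number of people in each group.
group_totals = {group: sum(counts.values()) for group, counts in observed.items()}

# Total sick and total not sick across all three groups.
outcome_totals = {
    outcome: sum(counts[outcome] for counts in observed.values())
    for outcome in ("sick", "not sick")
}

# The group totals and the outcome totals must add up to the same grand total.
grand_total = sum(group_totals.values())
assert grand_total == sum(outcome_totals.values())

print(group_totals)
print(outcome_totals)
print(grand_total)
```

Checking that both sets of totals give the same grand total is the same sanity check the narration does by hand.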
- So let's come up with a null hypothesis. Our null hypothesis
- is that the herbs do nothing.
- And we have our alternative
- hypothesis, that the herbs do something.
- Notice I don't even care whether they actually improve things.
- I'm just saying they do something. They may even increase your likelihood
- of getting the flu. We are not testing whether they are actually good;
- we are just asking whether they are different from doing nothing.
- So like we did with all of our hypothesis tests, let's just assume
- the null. We are going to assume the null and, given that assumption,
- figure out if the likelihood of getting data like this
- or more extreme is really low.
- And if it is really low, we reject the null hypothesis.
- And in this test, like in every hypothesis test, we need a significance level, and the significance level
- we care about, for whatever reason, is 10%, or 0.10.
- Now to do this we need to calculate a chi-square statistic for this contingency table. And to do this,
- we do something very similar to what we did with the restaurant situation. We figure out, assuming
- the null hypothesis, the expected result in each of these cells. I think of each of these entries as a cell;
- that's what we call each of the entries in Excel,
- each of the entries in a table.
- We figure out what each of the values would have been if you assume the null hypothesis, we find the
- squared distance from that expected value, and we normalize it by the expected value. We take the sum of
- all those normalized squared differences, and if that sum is really big, the probability of getting it would
- be really small, and maybe we can reject the null hypothesis. So let's just figure out how you can get the
- expected numbers.
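Written as a formula, the statistic just described would look roughly like the following. The symbols are assumptions introduced here for illustration: O_i is the observed count in cell i, E_i is the count expected under the null hypothesis, and the sum runs over all six cells of the table (totals excluded).

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```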
- So we are assuming that the herbs do nothing. If the herbs do nothing, we can just say that
- this whole population had nothing happen to it, that the herbs were useless, and so we can use this population
- sample... I shouldn't call this a population sample. We can use this sample right here to figure out the
- expected number of people who would get sick or not get sick. Over here, we have 80 out of 380 who did
- get sick. I should be careful here. I just said population, but we haven't sampled the whole universe of
- people taking this herb, so this is a sample. I don't want to confuse you. I was using population in more
- of a conversational sense rather than a statistical sense. Anyway, we are using
- all of our sample, all of the data, because we are assuming that there is no difference, so we might as well use all of the
- data to find the frequency of getting sick and not getting sick. So 80 divided by 380 got sick,
- so that's 21 percent. 21 percent got sick. That's 21 percent of the total, and then
- this will be 79 percent if we just subtract. If we divide 300 by 380, we should get 79 percent
- as well. So based on the total sample right over here, one would expect that
- 21 percent should be getting sick and 79 percent should not be getting sick. So let's look at this for
- each of the groups. If we assume that 21 percent of these 120 people should have gotten sick, what would
- have been the expected value right over here? Let's just multiply 21 percent times 120.
- That gets us to 25.3 people. I'll just round it. I'll write
- the expected values in yellow. So the expected value right over here is 25.3. If you assume that 21
- percent of each group would have gotten sick, you would have expected 25.3 people to get sick in group 1,
- the herb 1 group, and the remainder not to get sick. So let's just subtract. I could multiply 79 percent
- times 120; either one would work. But let me just take 120 - 25.3, and I get 94.7. So you would
- have expected 94.7 to not get sick. So this is expected again: 94.7 to not get sick.
- And I also do that for each of the other groups. So once again, in group 2, you would have expected 21 percent
- to get sick. 21 percent of the total people in that group, that's 140, is 29.4, and the remainder, that
- is 140 - 29.4, should not have gotten sick. So that gets us this right here. We have 29.4 who should have
- gotten sick if the herbs did nothing, and over here we have 110.6 who should not have gotten sick. And this
- is pretty close to what we observed, so just by looking at the result, it looks like herb 2 doesn't do much relative
- to all of the groups combined. And in the placebo group, let's see what happens. We expect 21 percent
- of our group of 120 to get sick, so that's 25.2. Actually, this should be the same number as
- over here. I said 21 percent, but it is actually 21 point something. The group sizes are the same, so we
- should expect the same number to get sick. I'll say 25.3, just to make it consistent. The reason why
- I got 25.2 is because I lost some of the trailing decimals over here. But since I had them over there,
- I'm going to use them over here as well. And in this group, you would expect 94.7 to not get sick.
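The expected counts can be reproduced with a minimal sketch like the one below. It assumes the pooled sick rate is kept at full precision (80/380) rather than rounded to 21 percent, and the names `sick_rate` and `expected` are illustrative, not from the video.

```python
# Observed counts from the narration (data layout is an assumption).
observed = {
    "herb 1": {"sick": 20, "not sick": 100},
    "herb 2": {"sick": 30, "not sick": 110},
    "placebo": {"sick": 30, "not sick": 90},
}

# Under the null hypothesis all groups share one sick rate,
# so pool every group to estimate it.
total_sick = sum(counts["sick"] for counts in observed.values())
grand_total = sum(sum(counts.values()) for counts in observed.values())
sick_rate = total_sick / grand_total  # 80 / 380, roughly 21 percent

# Expected count in each cell: the group size times the pooled rate,
# with the remainder of the group expected not to get sick.
expected = {}
for group, counts in observed.items():
    group_size = sum(counts.values())
    expected_sick = sick_rate * group_size
    expected[group] = {
        "sick": expected_sick,
        "not sick": group_size - expected_sick,
    }

for group, cells in expected.items():
    print(group, round(cells["sick"], 1), round(cells["not sick"], 1))
```

With the unrounded rate, group 2 comes out at about 29.5 and 110.5 rather than the 29.4 and 110.6 used in the narration, which multiplied by a rounded 21 percent. The difference is too small to change the conclusion.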
- Let's figure out our chi-square statistic.
- I'll write it like this here for fun. Or maybe I'll write it as a big X, because
- the distribution of this statistic is approximately a chi-square distribution. So I'll write it like that, and
- we'll talk about the degrees of freedom in a second. Actually, let me write it as a curly X,
- because you'll see that some people write it with the chi instead of the X. So for our chi-square statistic over here,
- we are literally going to find the squared distance between the observed and the expected, divided by
- the expected, for each cell. That will be (20 - 25.3) squared over 25.3, plus (30 - 29.4) squared over 29.4, plus (30 - 25.3) squared
- over 25.3, and then I have to do these over here, so let me just continue it. You can ignore this
- H1 over here. Plus (100 - 94.7) squared over 94.7, plus (110 - 110.6) squared over 110.6, and finally (90 - 94.7) squared over
- 94.7. So we have (20 - 25.3)^2/25.3 + (30 - 29.4)^2/29.4 + (30 - 25.3)^2/25.3 + (100 - 94.7)^2/94.7
- + (110 - 110.6)^2/110.6 + (90 - 94.7)^2/94.7,
- and we get 2.53. So our chi-square statistic, assuming the null hypothesis is correct, is equal to 2.53.
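The arithmetic above can be checked with a small sketch. The helper name `chi_square_statistic` is hypothetical, and the expected values are the rounded ones used in the narration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Cells in the order used in the narration: the three "sick" cells,
# then the three "not sick" cells.
observed_cells = [20, 30, 30, 100, 110, 90]
expected_cells = [25.3, 29.4, 25.3, 94.7, 110.6, 94.7]  # rounded, as in the video

statistic = chi_square_statistic(observed_cells, expected_cells)
print(round(statistic, 2))
```

Almost all of the total comes from the herb 1 and placebo cells; the two herb 2 cells contribute very little because their observed and expected counts nearly match.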
- The next thing we need to do is figure out the degrees of freedom we had while doing this. And I'll give you
- the rule of thumb. You take the number of rows and the number
- of columns. Here you have 2 rows and 3 columns; you don't count the totals. The degrees of freedom for
- your contingency table is (the number of rows - 1) times (the number of columns - 1).
- In our situation we have 2 rows and 3 columns, so that will be (2 - 1) times (3 - 1).
- So that is going to be 2. We have 2 degrees of freedom. The reason why that should make some intuitive
- sense is that if you assume that you know the totals, if you know all of this information over here
- (actually, if you knew the parameters of the population as well), but
- if you knew the totals, and you knew r - 1 of the row entries
- in a column, the last one can be figured out by subtracting from the total. For example, in this situation,
- if you knew this, you could easily figure out this. This is not new information. This is just the total
- minus 20. Same thing: if you knew this one over here, this one is not new information. Similarly, if you knew
- these two, this one here is not new information. You can calculate it based on the total and everything
- else. So that's the sense as to why the degrees of freedom are (the columns - 1) times (the rows - 1).
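The rule of thumb and the intuition behind it can be sketched as follows. The function name `degrees_of_freedom` is hypothetical; the second half shows that, with the totals fixed, choosing two cells pins down the other four.

```python
def degrees_of_freedom(n_rows, n_cols):
    """(rows - 1) * (columns - 1), not counting the totals row or column."""
    return (n_rows - 1) * (n_cols - 1)


df = degrees_of_freedom(2, 3)

# Totals taken from the table in the narration.
group_totals = [120, 140, 120]  # herb 1, herb 2, placebo
total_sick = 80

# Only two cells are free to vary once the totals are known.
sick_herb1, sick_herb2 = 20, 30

# Every other cell follows by subtraction from a total.
sick_placebo = total_sick - sick_herb1 - sick_herb2
sick = [sick_herb1, sick_herb2, sick_placebo]
not_sick = [total - s for total, s in zip(group_totals, sick)]

print(df, sick, not_sick)
```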
- So anyway, our chi-square statistic has 2 degrees of freedom. And remember, our alpha value is 10%.
- So we are going to figure out the critical chi-square value that gives us an alpha of 10
- %, and if our statistic is more extreme than that, if the probability of getting a value this extreme is even less than 10%,
- we can reject the null hypothesis. And if it is not more extreme, we will not reject the null
- hypothesis. So what we need to do is figure out, for a chi-square distribution with 2 degrees
- of freedom, what our critical chi-square value is. So let's just go back. We have 2 degrees of freedom
- here, and we care about a significance level of 10%, so our critical chi-square value is 4.60.
- Another way to visualize this is if you look at a chi-square distribution with 2 degrees of freedom,
- that is the blue one over here, at a value of 4.60, the probability of getting something at least that
- extreme is 10%. This is what we care about. If the chi-square statistic that we calculated falls into
- this rejection region, then we reject our null hypothesis. But our chi-square statistic is only 2.53, so
- it is sitting someplace right over here. So it's actually not that crazy to get it if you assume the null hypothesis. So based
- on the data we have right now, we can't reject the null hypothesis. We don't know for a fact that the herbs
- do nothing, but we can't say that they do something. So we are not going to reject it. From this point
- of view, it doesn't seem like the groups are different from each other, and one of the groups is obviously a placebo.
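The table lookup and the final decision can be sketched without a table. This is not how the video does it: the sketch relies on the fact that a chi-square distribution with exactly 2 degrees of freedom has upper-tail probability exp(-x/2), so it only applies to this 2-degree-of-freedom case. The function names are hypothetical.

```python
import math


def chi2_upper_tail_2df(x):
    """P(X >= x) for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2)


def chi2_critical_value_2df(alpha):
    """Value whose upper-tail probability is alpha (2 degrees of freedom only)."""
    return -2 * math.log(alpha)


alpha = 0.10      # significance level from the narration
statistic = 2.53  # chi-square statistic computed in the narration

critical_value = chi2_critical_value_2df(alpha)
p_value = chi2_upper_tail_2df(statistic)
reject_null = statistic > critical_value

print(round(critical_value, 3), round(p_value, 3), reject_null)
```

The closed form gives a critical value of about 4.605, which matches the 4.60 read from the table, and a p-value of roughly 0.28, well above 10%. For other degrees of freedom a table or a statistics library would be needed instead.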