Video transcript

Let's say there are a couple of herbs that people believe help prevent the flu. To test this, we wait for flu season and randomly assign people to three different groups. Over the course of flu season, the first group takes herb 1, the second group takes herb 2, and the third group takes a placebo. If you don't know, a placebo is something that, to the person participating, feels like they're taking something you've told them might help, but it does nothing. It could be just a sugar pill, so that it feels like medicine. The reason you even go through the effort of giving them something is the placebo effect: people often get better just because they're told they're being given something that will make them better. So this right here could just be a sugar pill, with a very small amount of sugar, so it really can't affect their actual likelihood of getting the flu.

Over here we have a table, and this is called a contingency table. For each group, it has the number that got sick and the number that didn't get sick, and from this we can also calculate the totals. In group 1 we had a total of 120 people. In group 2 we had a total of 30 plus 110, which is 140 people. And in the placebo group, the group that just got the sugar pill, we had a total of 120 people. We can also tabulate the total number of people that got sick: 20 plus 30 is 50, plus 30 is 80. That's the total column right over here. The total number of people that didn't get sick is 100 plus 110, which is 210, plus 90, which is 300. And the total number of people is 380. Both this column and this row should add up to 380.

With that out of the way, let's think about how we can use the information in the contingency table, and our knowledge of the chi-squared distribution, to come to some conclusion. Let's make a null hypothesis. Our null hypothesis is that the herbs do nothing. Then we have our alternative hypothesis, which is that the herbs do something. Notice I don't even care whether they actually improve things. I'm just saying they do something. They might even increase your likelihood of getting the flu. We're not testing whether they're actually good. We're just asking whether they're different from doing nothing.

Like we do with all of our hypothesis tests, we're going to assume the null hypothesis and, given that assumption, figure out whether the likelihood of getting data like this, or more extreme, is really low. If it is really low, then we will reject the null hypothesis. And in this test, like every hypothesis test, we need a significance level. Let's say the significance level we care about, for whatever reason, is 10%, or 0.10.

Now, to do this we have to calculate a chi-square test statistic for this contingency table, and we do it very similarly to what we did with the restaurant situation. Assuming the null hypothesis, we figure out the expected result you would have gotten in each of these cells. You can view each of these entries as a cell; that's what the entries in an Excel spreadsheet are called too. For each cell, we figure out what the expected value would have been if you assume the null hypothesis. Then we find the squared distance from that expected value and normalize it by the expected value, and we take the sum over all the cells. If those squared differences are really big, the probability of getting them would be really small, and maybe we'll reject the null hypothesis.

So let's figure out how we can get the expected numbers. We're assuming the herbs do nothing. If the herbs do nothing, then nothing special happened to any of these people; the herbs were useless. So we can use this whole sample to figure out the expected number of people who would get sick or not get sick. I said the word "population" a moment ago, and I want to be careful: we haven't sampled the whole universe of people taking these herbs. This is a sample. I was using "population" in the conversational sense rather than the statistical sense. Anyway, we're using all of the data in our sample because we're assuming there's no difference between the groups, so we might as well use the totals to figure out the expected frequency of getting sick and not getting sick.

Over here, 80 out of 380 got sick. 80 divided by 380 is 21%, so 21% of the total got sick. Then this would be 79%, if we just take 1 minus 21%. We could also divide 300 by 380 and we should get 79% as well. So based on the total sample, our best guess is that 21% of each group should be getting sick and 79% should not be getting sick.

Let's look at it for each of these groups. If we assume that 21% of these 120 people should have gotten sick, what would the expected value have been? Let's multiply 21% times 120. That gets us to 25.3, after rounding. I'll write the expected values in yellow. So if you assume that 21% of each group should have gotten sick, you would have expected 25.3 people to get sick in group 1, the herb 1 group. The remainder will not get sick. I could multiply 79% times 120, but let me just take 120 minus 25.3, and I get 94.7. So you would have expected 94.7 to not get sick.

Let's do that for each of these groups. In group 2, once again, you would have expected 21% of the total people in that group to get sick. The total is 140, so that's 29.4. The remainder, 140 minus 29.4, should not have gotten sick. So 29.4 should have gotten sick if the herbs did nothing, and 110.6 should not have gotten sick. These are pretty close to the observed numbers, so just looking at the numbers, it looks like this herb doesn't do too much relative to all of the groups combined.

Then let's see what happens in the placebo group. We expect 21% of our group of 120 to get sick, so it's 25.2. Actually, this should be 25.3. I said 21%, but it's really 21 point something, and since the group sizes are the same, we should expect the same number to get sick as in group 1. The reason I got 25.2 just now is that I lost some of the trailing decimals, but since I had them over here, I'm going to use them over here as well. So I'll say 25.3 to keep it consistent. And then in this group you would expect 94.7 to not get sick.

If you just relied on this data, it looks like herb 2 is, to some degree, even worse than the... oh no, I take that back. It's not worse. Here you would have expected a smaller number and more people got sick, but this is the placebo. Anyway, we don't want to make judgments just by staring at the numbers. Let's figure out our chi-square test statistic.

I could write the statistic as a big X, because this random variable's distribution is approximately a chi-squared distribution, and we'll talk about the degrees of freedom in a second. Actually, let me write it with the curly χ, since some people write it with the Greek letter chi instead of the X. For our chi-square statistic, we're literally just going to find the squared distance between the observed and the expected and then divide it by the expected, for each cell:

χ² = (20 − 25.3)²/25.3 + (30 − 29.4)²/29.4 + (30 − 25.3)²/25.3 + (100 − 94.7)²/94.7 + (110 − 110.6)²/110.6 + (90 − 94.7)²/94.7

(I'm running out of space, so I have to continue it over here. You can ignore this H1.)

Let me get the calculator out to calculate this. It takes a little bit of time, since we have to type all of these parentheses. We have 20 minus 25.3, squared, divided by 25.3. Plus 30 minus 29.4, squared, divided by 29.4. Plus 30 minus 25.3, squared, divided by 25.3. We're halfway there. Plus 100 minus 94.7, squared, divided by 94.7. This is the tedious part. Plus 110 minus 110.6, squared, divided by 110.6. We could do a lot of these in our heads, but let me just type it out. And then the last one, the home stretch, assuming we haven't made any mistakes: 90 minus 94.7, squared, divided by 94.7. Let's see what we get. We get 2.528, so let's just say it's 2.53. So our chi-square statistic, assuming the null hypothesis is correct, is equal to 2.53.

The next thing we have to do is figure out the degrees of freedom that we had in calculating this chi-square statistic. I'll give you the rule of thumb for a contingency table like this, and a little bit of a sense of why it is the rule of thumb. In the future we'll talk a little more deeply about degrees of freedom. You take the number of rows and the number of columns. Here we have two rows and three columns; you don't count the totals. The rule of thumb is that the degrees of freedom for your contingency table is the number of rows minus 1, times the number of columns minus 1. In our situation that's (2 − 1) times (3 − 1), which is 1 times 2, which is 2. We have two degrees of freedom.

Here is why that should make a little bit of intuitive sense; we'll talk about it in more depth in the future. Assume that you know the totals. If you know the totals, and you know r − 1 of the entries in the rows, the last one can be figured out just by subtracting from the total. For example, in this situation, if you know this entry, you can easily figure out this one. It is not new information; it's just the total minus 20. Same thing here: if you know this one, this other one is not new information. And similarly, if you know these two, this one over here isn't new information. You can always calculate it from the total and everything else. So that's the sense of why our degrees of freedom are the columns minus 1 times the rows minus 1.

Anyway, our chi-square statistic has two degrees of freedom. Now remember our alpha value, the significance level that we care about: our alpha is 10%. What we're going to do is figure out the critical chi-square value that gives us an alpha of 10%. If our statistic is more extreme than that critical value, the probability of getting a result at least that extreme is less than 10%, and we'll reject the null hypothesis. If it's not more extreme, then we won't reject the null hypothesis.

So we need to figure out, for a chi-squared distribution with two degrees of freedom, what our critical chi-square value is. Let's go to the table. We have two degrees of freedom and we care about a significance level of 10%, so our critical chi-square value is 4.60.

Another way to visualize this is to look at the chi-square distribution with two degrees of freedom. That's this blue curve over here. This is 5, so the critical value of 4.60 will be right around here. The probability of getting something that extreme or more extreme is 10%. This region is what we care about. If the chi-square statistic that we calculated falls into this rejection region, then we're going to reject the null hypothesis.

But our chi-square statistic is only 2.53, so it's sitting someplace right over here. It's not that crazy to get a value like that if you assume the null hypothesis. So based on the data that we have right now, we cannot reject the null hypothesis. We don't know for a fact that the herbs do nothing, but we can't say that they do something based on this. We won't say that the null hypothesis is 100% true, but we can't reject it. So at least from this point of view, it doesn't look like the herbs did anything that would make us believe that they're any different from the placebo or from each other.
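
The calculation walked through in the video can be sketched in a few lines of Python. This is only an illustrative sketch, not something from the video: the variable names are made up here, and the expected counts are computed without rounding, so the statistic comes out at roughly 2.53 rather than exactly matching the hand calculation with 25.3, 29.4, and so on. It also assumes a shortcut that holds only when there are exactly two degrees of freedom, where the chi-squared tail probability is exp(-x/2); for other tables you would look the critical value up in a table or a statistics library instead.

```python
import math

# Observed counts from the contingency table described in the transcript.
# Rows: got sick, did not get sick. Columns: herb 1, herb 2, placebo.
observed = [
    [20, 30, 30],
    [100, 110, 90],
]

row_totals = [sum(row) for row in observed]        # sick total, not-sick total
col_totals = [sum(col) for col in zip(*observed)]  # size of each group
grand_total = sum(row_totals)

# Expected count in each cell if the herbs do nothing:
# (overall share of that row) * (size of that group).
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Rule of thumb from the video: (rows - 1) * (columns - 1), totals not counted.
df = (len(observed) - 1) * (len(observed[0]) - 1)

alpha = 0.10

# ASSUMPTION: these closed forms are valid only for df == 2,
# where P(chi-square > x) = exp(-x / 2).
assert df == 2, "closed-form critical value below only holds for df == 2"
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi_square / 2)

reject_null = chi_square > critical_value

print(f"chi-square statistic: {chi_square:.2f}")
print(f"degrees of freedom:   {df}")
print(f"critical value:       {critical_value:.2f}")
print(f"p-value:              {p_value:.2f}")
print(f"reject null:          {reject_null}")
```

Run as written, the statistic lands well below the critical value of about 4.6, so the sketch reaches the same conclusion as the video: the null hypothesis is not rejected at the 10% level.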