# Chi-square goodness-of-fit example

AP.STATS: DAT‑3 (EU), DAT‑3.J (LO), DAT‑3.J.1 (EK), DAT‑3.J.2 (EK), VAR‑1 (EU), VAR‑1.J (LO), VAR‑1.J.1 (EK), VAR‑8 (EU), VAR‑8.B (LO), VAR‑8.B.1 (EK), VAR‑8.C (LO), VAR‑8.C.1 (EK), VAR‑8.D (LO), VAR‑8.D.1 (EK), VAR‑8.E (LO), VAR‑8.E.1 (EK), VAR‑8.F (LO), VAR‑8.F.1 (EK), VAR‑8.G (LO), VAR‑8.G.1 (EK)

## Video transcript

In the game rock-paper-scissors, Kenny expects to win, tie, and lose with equal frequency. Kenny plays rock-paper-scissors often, but he suspected his own games were not following that pattern, so he took a random sample of 24 games and recorded their outcomes. Here are his results: out of the 24 games, he won 4, lost 13, and tied 7 times. He wants to use these results to carry out a chi-squared goodness-of-fit test to determine if the distribution of his outcomes disagrees with an even distribution. What are the values of the test statistic, the chi-squared test statistic, and the p-value for Kenny's test? So pause this video and see if you can figure that out.

Okay, so he's essentially just doing a hypothesis test using the chi-square test statistic, because it's a hypothesis that's thinking about multiple categories. So what would his null hypothesis be? Well, his null hypothesis would be that all of the outcomes have equal probability. And then his alternative hypothesis would be that his outcomes do not have equal probability. Remember, we assume that the null hypothesis is true, and if, under that assumption, the probability of getting a result at least this extreme is low enough, then we would reject our null hypothesis. Another way to think about it is that if our p-value is below a threshold, we would reject our null hypothesis. And so what he did is he took a sample of 24 games, so n is equal to 24, and then this was the data that he got.

Now, before we even calculate our chi-square statistic and figure out the probability of getting a chi-square statistic that large or greater, let's make sure we meet the conditions for inference for a chi-squared goodness-of-fit test. You've seen some of them, but some of them are a little bit different.

One is the random condition, and that would be that this is truly a random sample of games. It tells us right here that he took a random sample of 24 games, so we meet that condition.

The second condition, when we're talking about chi-squared hypothesis testing, is the large counts condition, and this is an important one to appreciate. This is that the expected number in each category of outcomes is at least 5. Now you might say, hey, wait, Kenny only got 4 wins out of his sample of 24. But that does not violate the large counts condition. Remember, what is the expected number of wins, losses, and ties? Well, if you are assuming the null hypothesis, where the outcomes have equal probability, the expected proportions are 1/3, 1/3, and 1/3, and 1/3 of 24 is 8, so the expected counts are 8, 8, and 8. That's what Kenny would expect, and because all of these are at least 5, we meet the large counts condition.

And then the last condition is the independence condition. If we aren't sampling with replacement, then we just have to feel good that our sample size is no more than 10% of the population. He can definitely play more than 240 games in his life, so we would assume that we meet that condition as well.

And so, with that out of the way, we can actually calculate our chi-square statistic and try to make some inference based on it. Our chi-squared statistic is going to be equal to the following: for each category, it's the difference between what he got in that sample and the expected count, squared, divided by the expected count. So the first category is wins, so that's going to be 4 minus 8, squared, over an expected number of wins of 8. Plus losses, so that's 13 minus 8 (13 is how many he lost, minus 8 expected), squared, over the number expected. Plus he got 7 ties and would have expected 8, so 7 minus 8, squared, all of that over 8.

And so, let's see, what is this? 4 minus 8 is negative 4; you square that, you get 16. 13 minus 8 is 5; if you square that, you get 25. 7 minus 8 is negative 1; square that, you get 1. And 16 divided by 8 is going to be 2. 25 divided by 8 is going to be, let's see, that's 3 and 1/8, so that's 3.125. And then 1/8 is 0.125. You add these together, so it's going to be 2 plus 3.125, which is 5.125, plus another 0.125, so that's going to be 5.25. So our chi-squared statistic is 5.25.

And now to figure out our p-value. Our p-value is going to be equal to the probability of getting a chi-square statistic greater than or equal to 5.25, and you could use a chi-squared table for that. We always have to think about our degrees of freedom. We have one, two, three categories, so our degrees of freedom is going to be one less than that, or 3 minus 1, which is 2. And that makes sense because, for a certain number of games, if you know the number of wins and you know the number of losses, you can figure out the number of ties. Or, if you know any two of these categories, you can always figure out the third. So that's why you have two degrees of freedom.

And so let's get out our chi-squared table. We have two degrees of freedom, so we are in this row. And where is 5.25? 5.25 is right over there, and so our probability is going to be between 0.10 and 0.05. So our p-value is going to be greater than 0.05 and less than 0.10. And so, for example, if ahead of time (and he should have done this ahead of time) he set a significance level of 5%, then since our p-value here is greater than 5%, which we just saw, he would fail to reject the null hypothesis in this situation.

But they're not asking us that here. All they're asking us is what our chi-squared value is and what range our p-value is in. Well, let's see, 5.25 is in both of these choices, and we saw we got a p-value between 5% and 10%, so it is choice A right over there.
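
The arithmetic in the transcript can be checked with a short script. What follows is a minimal sketch rather than part of the original video: the variable names are made up for illustration, and the p-value line relies on the fact that a chi-square distribution with exactly 2 degrees of freedom has the upper-tail probability exp(-x/2), so it would not carry over to other degrees of freedom.

```python
import math

# Observed outcomes from Kenny's sample: 4 wins, 13 losses, 7 ties.
observed = {"win": 4, "loss": 13, "tie": 7}

n = sum(observed.values())        # 24 games in the sample
expected = n / len(observed)      # 8.0 per category under the null hypothesis

# Large counts condition: each expected count must be at least 5.
large_counts_ok = expected >= 5

# Chi-square statistic: sum over categories of (observed - expected)^2 / expected.
chi_sq = sum((count - expected) ** 2 / expected for count in observed.values())

# Degrees of freedom: number of categories minus one.
df = len(observed) - 1

# Assumption: this closed form is valid only because df == 2 here.
p_value = math.exp(-chi_sq / 2)

print(chi_sq, df, round(p_value, 4))  # prints: 5.25 2 0.0724
```

The computed p-value of about 0.072 falls between 0.05 and 0.10, which agrees with the range read from the chi-squared table. For other numbers of categories, a library routine such as `scipy.stats.chisquare` could be used in place of the closed-form line.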
AP® is a registered trademark of the College Board, which has not reviewed this resource.