# Chi-square distribution introduction

Chi-Square Distribution Introduction. Created by Sal Khan.

## Want to join the conversation?

• I always love Sal's videos and they help me so much, but this one confused me even further than I already was. I know it says Chi-Square introduction, but could we get an intro for the intro? Or am I the only one lost with this?!
• You may find it helpful to go back and watch the introductory videos for the normal distribution. If they are too advanced it may be because you don't have a good grasp on probability notation, in which case you need to go back further and watch the introductions to probability including expected values and variance.
• I thought degrees of freedom are n-1.
• The degrees of freedom vary depending on the constraints on your data.

In the "Pearson's chi square test" video, the degree of freedom is indeed n-1. This is because the total number of customers at the restaurant is fixed for the observed and the expected values. So if we know n-1 of the data points (in this case customers on a given day of the week) then we can figure out the last data point because we know the fixed total.

In "Contingency table chi-square test", the degrees of freedom are completely different because our data is measured along two axes (type of drug and cured/sick). For a contingency table, the degrees of freedom are (r - 1) * (c - 1), where r is the number of rows and c is the number of columns.

In this video, a chi-square random variable is constructed out of n independent normal random variables. There are no constraints placed on these variables, so knowing n-1 doesn't mean you know the last one. So in this case, the degree of freedom is just n.
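The three cases above can be written out as a small sketch. The function names and the example sizes (6 categories, a 3 x 2 table, 3 normal variables) are hypothetical and chosen only for illustration; they are not taken from the videos.

```python
def goodness_of_fit_df(num_categories):
    """Fixed total across categories: the last count is determined, so n - 1."""
    return num_categories - 1


def contingency_table_df(rows, cols):
    """Row and column totals are fixed: (r - 1) * (c - 1)."""
    return (rows - 1) * (cols - 1)


def sum_of_squared_normals_df(n):
    """n independent standard normals with no constraint: n."""
    return n


# Hypothetical sizes, for illustration only.
print(goodness_of_fit_df(6))         # 5
print(contingency_table_df(3, 2))    # 2
print(sum_of_squared_normals_df(3))  # 3
```

In each case the count is the number of values left free once the constraints are accounted for.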
• Is the chi-square test still valid if we are not sure whether the "random variable" follows a Gaussian distribution?
• I'm pretty sure it has to be a normal distribution. You would have to run a separate test (probability plot, etc.) to determine normality before running a chi-square test.
• What is the y axis on the Chi-Square graph?
• The Chi-square graph in the video plots the value of the probability density function (y-axis) against the value of the chi-squared variable (x-axis) for different degrees of freedom. It is important to remember that the y-axis of a probability 'density' function graph does not give the probability of each individual value. Rather, the area under the curve over a range of values gives the probability of landing in that range.

Hope this helps.
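To make the density-versus-probability point concrete, here is a minimal sketch that evaluates the standard chi-square density formula and approximates an area under it with a midpoint sum. The function names `chi2_pdf` and `chi2_prob` are made up for this example, and the step count is an arbitrary choice.

```python
import math


def chi2_pdf(x, k):
    """Density of a chi-square variable with k degrees of freedom at x."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


def chi2_prob(a, b, k, steps=20_000):
    """Approximate P(a <= X <= b) as the area under the density (midpoint rule)."""
    width = (b - a) / steps
    return sum(chi2_pdf(a + (i + 0.5) * width, k) for i in range(steps)) * width


# A density value can be larger than 1, so it cannot be a probability.
density_near_zero = chi2_pdf(0.01, 1)

# The area over a range is a probability. For k = 2 the exact value of
# P(0 <= X <= 2) is 1 - exp(-1).
area = chi2_prob(0.0, 2.0, 2)
exact = 1 - math.exp(-1)
```

The height of the curve near zero for one degree of freedom is well above 1, while the area over any range always stays between 0 and 1.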
• My intuition for understanding the chi-square distribution is that while the sampling distribution of the sample means can be described with a normal distribution, the sampling distribution of sample variances can be described with a chi-square distribution (provided the population is normally distributed). Then I can say that the ratio of two chi-squares can be described with the F-distribution, the ratio of a normal and a chi-square is "t" distributed, etc. I realize these statements may be overgeneralized, but how far off the mark am I in framing my understanding this way?
• Not very far off the mark at all. It depends on how precise you want to get, but if you're shooting for a general idea, you're right on the bulls-eye.

To get more technical:
- An F distribution is the ratio of two Chi-square variables, each of which is divided by its respective degrees of freedom. So (C1/c1) / (C2/c2), where the capital letters are the random variables (RVs) and the lowercase letters are the degrees of freedom.
- A t-distribution is the ratio of a Standard Normal to the square root of a Chi-square divided by its degrees of freedom, e.g., Z / sqrt( C/c ), where Z is a standard normal RV, C is the Chi-square RV, and c is the degrees of freedom.
- The sample variance isn't directly Chi-square distributed, but a scaled version of it is: (n-1)*s^2 / sigma^2, which has n-1 degrees of freedom when the population is normal. This quantity has the real variance inside of it, but that's not really a problem, because one of its main uses is to calculate a t test statistic, where the sigma term cancels out.
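The last point can be checked with a rough simulation. This is only a sketch: the population mean, standard deviation, sample size, seed, and trial count below are all hypothetical choices. It relies on the fact that a chi-square variable with c degrees of freedom has mean c, so the average of (n-1)*s^2 / sigma^2 over many samples should come out close to n-1.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def scaled_sample_variance(n, mu, sigma):
    """Draw n normal observations and return (n - 1) * s^2 / sigma^2."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    mean = sum(sample) / n
    s_squared = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (n - 1) * s_squared / sigma ** 2


# Hypothetical population parameters and sample size.
n, mu, sigma = 5, 10.0, 2.0
draws = [scaled_sample_variance(n, mu, sigma) for _ in range(20_000)]
average = sum(draws) / len(draws)

print(average)  # should be close to n - 1 = 4
```

Changing `mu` or `sigma` should leave the average near n-1, since the scaling by sigma^2 removes the dependence on the population's spread.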
• Which videos explain what degrees of freedom is ?
• Degrees of freedom indicate how many values are free to vary. Formally: the number of observations minus the number of parameters (restrictions).
This is likely to not get you very far, and there are no videos on it (on this site). I recommend searching the internet for more info.
• Is it possible for this topic to be explained in... a simpler way?
• How does one come up with the "degrees of freedom" in order to confirm or deny the observed vs. expected samples? Are the "degrees of freedom" charts something that can be expected with a typical problem, and how does one determine the degrees of freedom in the lab or field, in a real-world setting where they may be unknown?
• Degrees of freedom are the number of values that are "free" to vary, given the parameter you are trying to estimate. Using the sample variance to estimate the population variance is a typical example used to illustrate the concept (and possibly the most appropriate, given that you seem to be studying the chi-square distribution). Because all the residuals (the distance of each data point from the sample mean) in a sample must add up to 0, you could figure out what the last data point must be if you knew all the other ones. In that sense, that last data point isn't "free" to vary because it must be THE value that makes the residuals add to zero. This is where the "n - 1" degrees of freedom come from for the sample variance and its corresponding chi-square distribution.

Finding the degrees of freedom is simply a matter of understanding the math and the constraints underlying the parameters you are estimating. To my understanding, they aren't ever "unknown" in the field and are at most a simple calculation away. Some statistics have more than one degrees-of-freedom value (an example is the F-stat, which is a fraction whose numerator and denominator have separate degrees of freedom).
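The residual constraint described above can be seen in a few lines. This is a minimal sketch with a made-up four-point sample; `last_value` is a hypothetical helper name.

```python
def last_value(known_values, sample_mean):
    """Recover the one missing data point from the others and the sample mean."""
    n = len(known_values) + 1
    return n * sample_mean - sum(known_values)


data = [4.0, 7.0, 1.0, 8.0]  # hypothetical sample
mean = sum(data) / len(data)

# Residuals around the sample mean always add up to zero.
residuals = [x - mean for x in data]

# Knowing the mean and the first n - 1 points pins down the last one.
recovered = last_value(data[:-1], mean)

print(sum(residuals))  # 0.0
print(recovered)       # 8.0
```

Only n - 1 of the residuals can be chosen freely; the last is forced, which matches the n - 1 degrees of freedom for the sample variance.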
• Sal says that "we are essentially sampling from this standard normal distribution and then squaring whatever number you got". This is where I get confused. If we are sampling from this normal distribution, does that sample have a sample size of n associated with it, and what is the relationship between that sample size n and the chi-square degrees of freedom referred to in the remainder of the video?
• If X is Normally distributed with mean=0 and sd=1, then X^2 is Chi-square distributed with df=1. Furthermore, if several Chi-square variables are independent, then their sum is also Chi-square distributed, and we just need to add up their degrees of freedom.

So if X1, X2, and X3 are all Normally distributed with mean 0 and sd 1, and all three are independent, then X1^2 + X2^2 + X3^2 is Chi-square distributed with df=3.
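A quick simulation can make this tangible. The sketch below assumes an arbitrary seed and trial count, and it uses the standard facts that a chi-square variable with df=k has mean k and variance 2k, so the simulated sums for k=3 should average about 3 with variance about 6.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable


def sum_of_squared_normals(k):
    """Draw k independent standard normals and return the sum of their squares."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))


k = 3
samples = [sum_of_squared_normals(k) for _ in range(20_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((q - sample_mean) ** 2 for q in samples) / (len(samples) - 1)

print(sample_mean)  # should be close to k = 3
print(sample_var)   # should be close to 2 * k = 6
```

Here each call draws three numbers (one each for X1, X2, X3), so the "n" that sets the degrees of freedom is the number of squared normals being added, not the number of times the experiment is repeated.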