# ANOVA 3: Hypothesis test with F-statistic

Analysis of Variance 3 - Hypothesis Test with F-Statistic. Created by Sal Khan.

## Want to join the conversation?

- I have a question: is SSB similar to the Explained Sum of Squares, as well as SSW similar to the Residual Sum of Squares I see in Econometric textbooks?(7 votes)
- Yes. The "two" concepts of ANOVA and linear regression are really one and the same, but sometimes there are different terms used in each (presumably for reasons of interpretation). It's basically "modeled" versus "leftover" variation.(7 votes)

- Hey, can someone explain to me why t^2 = F for an ANOVA analysis?

I can use it to solve problems, but I don't know why both density curves give the same p-value.

How can I transform the t^2 distribution to the corresponding F distribution?

Why does it work? F uses 2 different degrees of freedom (numerator and denominator), while t only uses 1.(5 votes)
- I found this video on YouTube (http://www.youtube.com/watch?v=Rr3VaQXORo8) that shows a mathematical derivation of why the square of a t variable with p degrees of freedom follows an F distribution with (1, p) degrees of freedom. Hope it helps.(2 votes)

- The F(1,df) statistic is said to be equivalent to (t(df))^2. Can you explain or derive this?(3 votes)
- Please refer to this video on youtube (http://www.youtube.com/watch?v=Rr3VaQXORo8) which shows a mathematical derivation of the relationship(3 votes)

- At 4:23 Sal roughly defines the F-stat as the ratio between the SSB and the SSW. Can someone please explain why the SSW is the denominator and not the other way around?(1 vote)
- If I don't have an F-statistic table but have a chi-squared table, and want to compute the critical value through the definition F = (Chi-sq(p)/p)/(Chi-sq(k)/k), where p and k are the degrees of freedom: I have tried to do so for F(2,19) at the 5% significance level and did not arrive at the true value of 3.52. Here are my calculations: (5.991/2)/(30.14/19) = 1.888

Please help me find the mistake and explain how to calculate F statistics with a chi-squared distribution table.

Thank you.(1 vote)
- I think the mistake is with your premise: an F-distributed random variable arises from the ratio of two chi-squared random variables, each divided by its degrees of freedom, but it does not follow that the critical values of this random variable's distribution can be computed as the ratio of the critical values of the chi-squared distributions.(4 votes)
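
The point in this answer can be checked numerically. The sketch below is an illustration only, not part of the original reply: it draws many pairs of independent chi-squared variables with 2 and 19 degrees of freedom, forms the ratio after dividing each by its degrees of freedom, and reads off the empirical 95th percentile. The function name, sample size, and seed are arbitrary choices, and 30.14 is simply the table entry quoted in the question.

```python
import random


def simulated_f_critical(df_num, df_den, alpha, n_draws=100_000, seed=0):
    """Estimate the upper-alpha critical value of F(df_num, df_den) by simulation."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_draws):
        # A chi-squared variable with k degrees of freedom is Gamma(shape=k/2, scale=2).
        chi_num = rng.gammavariate(df_num / 2, 2)
        chi_den = rng.gammavariate(df_den / 2, 2)
        ratios.append((chi_num / df_num) / (chi_den / df_den))
    ratios.sort()
    # Value that roughly a fraction (1 - alpha) of the simulated ratios fall below.
    return ratios[int((1 - alpha) * n_draws)]


estimate = simulated_f_critical(2, 19, alpha=0.05)
ratio_of_table_values = (5.991 / 2) / (30.14 / 19)

print(f"simulated critical value: {estimate:.2f}")
print(f"ratio of chi-squared critical values: {ratio_of_table_values:.3f}")
```

The simulated percentile should land near the tabulated 3.52, while the ratio of the two chi-squared critical values stays at about 1.888. That is the distinction the answer draws: the quantile of a ratio is not the ratio of the quantiles.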

- How do you get a critical value for an F-test when the actual degrees of freedom aren't present in the F-table? For example, if the dfs are 28 and 32.(2 votes)
- If you really need an accurate value, you can interpolate. So if your chart only shows 25, 30, 35, ... for both df(n) and df(d), then you can first interpolate, say, between the crit values at 25 and 30 to get two values for "row" 28 in the 30 column and "row" 28 in the 35 column. Then you can interpolate between those two to get a value for 28 and 32. To interpolate just means you assume the values change linearly between two (nearby!) entries in the table. For example, let's say the (25,30) entry is 1.66 and the (30,30) entry is 1.61. Then you can see that it changes by -.05 as the first df goes from 25 to 30, and therefore the entry at (28,30) will be very close to 1.66 + (-.05)*(28-25)/(30-25) = 1.66 + (-.05)(3/5) = 1.63. The concept is pretty simple: if it changed by an amount (-.05) when the df increased by 5, then it should change by (3/5) of (-.05) if the df increases by 3.(2 votes)
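
The interpolation step described above can be sketched in a few lines. This is only an illustration: the helper name `interpolate` is made up here, and the two table entries are the example values from the answer, not values checked against a real F-table.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linearly interpolate the value at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Example entries from the answer: (25, 30) -> 1.66 and (30, 30) -> 1.61.
crit_28_30 = interpolate(28, 25, 1.66, 30, 1.61)
print(round(crit_28_30, 2))  # 1.63
```

To reach (28, 32), the same call would be repeated for the 35 column, and then used once more between the two results, moving from 30 to 32 along the second df.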

- How come the p-value is determined only by the right tail? Why don't we consider both right and left tails?(1 vote)
- When we're doing ANOVA, the null hypothesis is "no differences among the group means." If the null hypothesis is correct, then the F statistic will be small (if the group means are all identical, it will be 0). When the group means start to differ, the F statistic gets larger. Hence, only large values make us think the null hypothesis is wrong, and thus we only look at the right tail.(4 votes)

- How come you didn't use alpha/2 if the null and alternative hypotheses were equals (as opposed to < or >)? Do these problems not have two sided tests?(1 vote)
- ANOVA is inherently a 2-sided test.

Say you have two groups, A and B, and you want to run a 2-sample t-test on them, with the alternative hypothesis being `Ha: µ.a ≠ µ.b`. You will get some test statistic, call it t, and some p-value, call it p1. If you then run an ANOVA on these two groups, you will get a test statistic, f, and a p-value, p2.

If you look, then f = t² and p2 = p1. That is: the p-values are the exact same, and the test statistic for ANOVA is simply the square of the test statistic for the t-test. They are the same test, exactly.

Note: This requires using the *pooled* 2-sample t-test. The unpooled t-test will not be exactly equivalent to ANOVA.(3 votes)
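
The f = t² relationship can be checked with a small sketch. The scores below are made up purely for illustration, and the function names are not from any particular library. Only the test statistics are compared here; the p-values are not computed.

```python
from statistics import mean


def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance estimate."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    std_err = (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean(a) - mean(b)) / std_err


def anova_f(groups):
    """One-way ANOVA F statistic: (SSB / df_between) / (SSW / df_within)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ssb / df_between) / (ssw / df_within)


# Made-up scores for two groups, purely for illustration.
group_a = [3, 2, 1, 4]
group_b = [5, 3, 4, 6, 7]

t = pooled_t(group_a, group_b)
f = anova_f([group_a, group_b])
print(t ** 2, f)  # the two values agree up to floating-point rounding
```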

- In what cases do we use ANOVA, Chi Square, t-test, etc.?(2 votes)
- Thank you so much for covering this! I just wanted to ask if you actually had practise questions for these, as I love the format of questioning that you guys do with the other math concepts.

Many thanks!(2 votes)

## Video transcript

In the last couple of videos we first figured out the TOTAL variation in these 9 data points right here, and we got 30. That's our Total Sum of Squares. Then we asked ourselves, how much of that variation is due to variation WITHIN each of these groups, versus variation BETWEEN the groups themselves? So, for the variation within the groups we have our Sum of Squares within, and there we got 6. And then the balance of this 30, the balance of this variation, came from variation between the groups. We calculated it, and we got 24.

What I want to do in this video is actually use this type of information, essentially these statistics we've calculated, to do some inferential statistics, to come to some type of conclusion, or maybe not to come to some type of conclusion. What I want to do is to put some context around these groups. We've been dealing with them abstractly right now, but you can imagine these are the results of some type of experiment. Let's say that I gave 3 different types of pills or 3 different types of food to people taking a test, and these are the scores on the test. So this is food 1, food 2, and then this over here is food 3. And I want to figure out if the type of food people take going into the test really affects their scores.

If you look at these means, it looks like they performed better in group 3 than in group 2 or 1. But is that difference purely random chance? Or can I be pretty confident that it's due to actual differences in the population means, of all of the people who would ever take food 3 vs food 2 vs food 1? So, my question here is, are the true population means the same? This is a sample mean based on 3 samples. But if I knew the true population means... So my question is: Is the mean of the population of people taking food 1 equal to the mean of food 2? Obviously I'll never be able to give that food to every human being that could ever live and then make them all take an exam.
But there is some true mean there, it's just not really measurable. So my question is: is "this" equal to "this" equal to mean 3, the true population mean of group 3? Are these equal? Because if they're not equal, that means that the type of food given does have some type of impact on how people perform on a test.

So let's do a little hypothesis test here. Let's say that my null hypothesis is that the means are the same: food doesn't make a difference. And my alternate hypothesis is that it does. The way of thinking about this quantitatively is that if it doesn't make a difference, the true population means of the groups will be the same. The true population mean of the group that took food 1 will be the same as the group that took food 2, which will be the same as the group that took food 3. If our alternate hypothesis is correct, then these means will not all be the same.

How can we test this hypothesis? We're going to assume the null hypothesis, which is what we always do when we are hypothesis testing. And then essentially figure out, what are the chances of getting a certain statistic this extreme? And I haven't even defined what that statistic is. So we're going to assume our null hypothesis, and then we're going to come up with a statistic called the F statistic. Our F statistic has an F distribution, and we won't go real deep into the details of the F distribution. But you can already start to think of it as the ratio of two Chi-squared distributions that may or may not have different degrees of freedom.
Our F statistic is going to be the ratio of our Sum of Squares between the samples divided by our degrees of freedom between (this is sometimes called the mean squares between, MSB), to the Sum of Squares within, so that's what I had done up here, the SSW in blue, divided by the degrees of freedom of the SS within, and that was m(n-1).

Now let's just think about what this is doing right here. If this number, the numerator, is much larger than the denominator, then that tells us that the variation in this data is due mostly to the differences between the actual means, and it's due less to the variation within the groups. That's if this numerator is much bigger than this denominator over here. So that should make us believe that there is a difference in the true population means. So if this number is really big, it should tell us that there is a lower probability that our null hypothesis is correct. If this number is really small and our denominator is larger, that means that our variation within each sample makes up more of the total variation than our variation between the samples. So that would make us believe that "hey, you know, any difference we see between the means is probably just random." And that would make it a little harder to reject the null.

So let's actually calculate it. In this case, our SS between, which we calculated over here, was 24, and we had 2 degrees of freedom. And our SS within was 6, and we had how many degrees of freedom? Also 6. 6 degrees of freedom. So this is going to be 24/2, which is 12, divided by 6/6, which is 1. Our F statistic that we've calculated is going to be 12. F stands for Fisher, who is the biologist and statistician who came up with this. So our F statistic is going to be 12. We're going to see that this is a pretty high number.
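
The calculation above can be reproduced with a short sketch. One loud assumption: this transcript does not list the 9 data points, so the scores below are illustrative values chosen only because they give the same totals quoted here (SST = 30, SSW = 6, SSB = 24). The variable names are likewise just for illustration.

```python
from statistics import mean

# Illustrative scores only (an assumption): chosen to reproduce
# SST = 30, SSW = 6, and SSB = 24 as quoted in the transcript.
groups = [
    [3, 2, 1],  # food 1
    [5, 3, 4],  # food 2
    [5, 6, 7],  # food 3
]

m = len(groups)     # number of groups
n = len(groups[0])  # observations per group
all_scores = [x for g in groups for x in g]
grand_mean = mean(all_scores)

# Variation between the group means and the grand mean.
ssb = sum(n * (mean(g) - grand_mean) ** 2 for g in groups)
# Variation of each score around its own group mean.
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = m - 1       # 2
df_within = m * (n - 1)  # 6

f_stat = (ssb / df_between) / (ssw / df_within)
print(ssb, ssw, f_stat)
```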
Now, one thing I forgot to mention: with any hypothesis test, we're going to need some type of significance level. So let's say the significance level that we care about, for our hypothesis test, is 10%, or 0.10. That means that if, assuming the null hypothesis, there is less than a 10% chance of getting the result we got, of getting this F statistic, then we will reject the null hypothesis. So what we want to do is figure out a critical F statistic value, such that the probability of getting a value that extreme or greater is 10%. If this is bigger than our critical F statistic value, then we're going to reject the null hypothesis. If it's less, we can't reject the null.

So I'm not going to go into a lot of the guts of the F statistic, but we can already appreciate that each of these sums of squares has a Chi-squared distribution. "This" has a Chi-squared distribution, and "this" has a different Chi-squared distribution. This is a Chi-squared distribution with 2 degrees of freedom, and this is a Chi-squared distribution with (and we haven't normalized it and all of that, but roughly) 6 degrees of freedom. So the F distribution is actually the ratio of two Chi-squared distributions. And I got this, this is a screenshot from a professor's course at UCLA, I hope they don't mind, I needed to find an F table for us to look into. But this is what an F distribution looks like. And obviously it's going to look different depending on the df of the numerator and the denominator. There are two df to think about, the numerator degrees of freedom and the denominator degrees of freedom.

With that said, let's calculate the critical F statistic for alpha equal to 0.10, where our numerator df is 2 and our denominator df is 6. You're actually going to see different F tables for each different alpha. So this table that I got, this whole table is for an alpha of 10% or 0.10, and our numerator df was 2 and our denominator was 6.
So our critical F value is 3.46. This value right over here is 3.46. The value that we got based on our data is much larger than this, WAY above it. It's going to have a very, very small p-value. The probability of getting something this extreme, just by chance, assuming the null hypothesis, is very low. It's way bigger than our critical F statistic with a 10% significance level. So because of that we can reject the null hypothesis. Which leads us to believe, "you know what, there probably IS a difference in the population means." Which tells us there probably is a difference in performance on an exam if you give them the different foods.
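
The table lookup and the "very small p-value" can be sanity-checked without a table in this particular case. When the numerator has exactly 2 degrees of freedom, the right-tail probability of the F distribution has a simple closed form, P(F > x) = (1 + 2x/d)^(-d/2), where d is the denominator df. The sketch below relies on that special case only. The function names are made up for illustration, and for other numerator df a statistics library (for example SciPy's `scipy.stats.f`) would normally be used instead.

```python
def f_right_tail_df1_2(x, df_den):
    """P(F > x) for an F distribution with 2 numerator degrees of freedom.

    This closed form holds only when the numerator df is exactly 2.
    """
    return (1 + 2 * x / df_den) ** (-df_den / 2)


def f_critical_df1_2(alpha, df_den):
    """Value c with P(F > c) = alpha, again only for 2 numerator degrees of freedom."""
    return (df_den / 2) * (alpha ** (-2 / df_den) - 1)


critical = f_critical_df1_2(alpha=0.10, df_den=6)
p_value = f_right_tail_df1_2(12, df_den=6)

print(f"critical F: {critical:.2f}")  # 3.46
print(f"p-value:    {p_value:.3f}")   # 0.008
```

With an F statistic of 12 against a critical value of about 3.46, and a p-value of about 0.008 against a significance level of 0.10, both comparisons point to rejecting the null hypothesis, as in the video.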