# ANOVA 1: Calculating SST (total sum of squares)

Analysis of Variance 1 - Calculating SST (Total Sum of Squares). Created by Sal Khan.

## Want to join the conversation?

• Sal says that the grand mean is equal to the "mean of the means" of the three groups. Is this always true, or does it depend on the groups having the same number of elements? For example, I can imagine three groups: A={1,2,3}, B={4,3,2}, C={8,9,10,11,12,13,14}. In this case the grand mean is 7.08 but the mean of means is (2+3+11)/3=5.3.

• For the grand mean to be the "mean of means," the groups need to have the same number of observations (i.e., be "balanced").

ANOVA will still work, but depending on how Sal expressed the formulas, the precise formulas he showed may not apply (I forget what formulas he used, and I don't feel like rewatching to find out).
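
To make the balanced-versus-unbalanced point concrete, here is a minimal sketch that reuses the three groups from the question. The variable names are illustrative only. It also shows that weighting each group mean by its group size recovers the grand mean.

```python
from statistics import mean

# Groups copied from the question above (unbalanced: sizes 3, 3 and 7).
groups = {
    "A": [1, 2, 3],
    "B": [4, 3, 2],
    "C": [8, 9, 10, 11, 12, 13, 14],
}

# Grand mean: pool every observation, then average.
all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)

# Unweighted mean of the three group means.
group_means = {name: mean(g) for name, g in groups.items()}
mean_of_means = mean(list(group_means.values()))

# Weighting each group mean by its group size gives the grand mean back.
n_total = len(all_values)
weighted_mean = sum(len(g) * group_means[name] for name, g in groups.items()) / n_total

print(round(grand_mean, 2))     # 7.08
print(round(mean_of_means, 2))  # 5.33
print(round(weighted_mean, 2))  # 7.08
```

When every group has the same size, the weights are all equal and the two averages coincide.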
• Hi Sal, I was going through your tutorials ANOVA 1, 2 and 3, and I was not able to find out whether ANOVA can be run as a one-tailed or a two-tailed test...

• What is homogeneity of variance?

• I thought that degrees of freedom should be the total number of participants minus the total number of groups. But here it is N-1. Would you please make the concept of degrees of freedom clear for me?
Thanks

• If the mean of means is needed to calculate the 9th value, doesn't that count as being not determined by just the 8 values, and thus there are 9 independent values in total, and thus 9 degrees of freedom?

• For the total sum of squares, the degrees of freedom are the number of values that you have minus 1, in other words, N-1. (N minus the number of groups is the degrees of freedom for the within-group sum of squares, SSW, which is a different quantity.) Plus, if you watch the previous video, Sal explains how we take the rows x the columns and that gives you N. So in this example, rows (3) x columns (3) = 9, so 9 is your N. Now take 9-1=8. For this sample set you have 8 degrees of freedom. I hope this helps.
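
As a rough illustration of where N-1 comes from, the sketch below computes SST for a 3 x 3 data set. The numbers are assumed for illustration; they are not quoted anywhere in this thread. Once the grand mean is fixed, knowing any 8 of the 9 values pins down the last one, which is why only 8 of them are free.

```python
# Hypothetical data: 3 groups with 3 observations each (values chosen for illustration).
groups = [
    [3, 2, 1],
    [5, 3, 4],
    [5, 6, 7],
]

values = [x for g in groups for x in g]
n_total = len(values)               # N = 3 * 3 = 9
grand_mean = sum(values) / n_total  # 36 / 9 = 4.0

# SST: squared distance of every observation from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in values)

# Degrees of freedom for SST: N - 1.
df_total = n_total - 1

# With the grand mean known, the 9th value follows from the other 8.
last_value = grand_mean * n_total - sum(values[:-1])

print(sst, df_total, sst / df_total)  # 30.0 8 3.75
print(last_value)                     # 7.0
```

Dividing SST by its degrees of freedom gives the ordinary sample variance of all 9 values.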
• I noticed in the other video that SSW and SSB were used. Can anyone tell me which one of them is SSR and which is SSE? Thank you

• Good question. You have to be VERY CAREFUL with these, because depending on the source, you could get confused, especially between Regression and ANOVA.
So, in ANOVA, there are THREE DIFFERENT TRADITIONS:
1) SSW (Within) + SSB (Between) = SST (Total!!)
This is what Sal uses. But if you search the web or textbooks, you ALSO FIND:
2) SSE (Error) + SST (Treatment!!) = SS(Total) THIS IS THE WORST.
3) SSE (Error) + SSM (Model) = SST (Total)
Wait, WHAT?! There are two different SSTs? I know, it's horrible. Anyway, that's the way it is. If people use SST to mean "treatment", then they have to write SS(Total) for the total sum of squares, or they might even write TSS for "Total Sum of Squares".
"Error" means the same as "Within groups". This is the variation which is NOT explained by the fact that we can put the data into different groups.
"Treatment" or "Model" (or sometimes "Factor") means the same as "Between groups". This is the variation that IS explained by the fact that there are different groups of data (often because they come from patients who get different treatments).
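Now, in Regression, we have (in the convention used here; some textbooks swap the two labels, using SSR for "Regression" and SSE for "Error"):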
SSR (Residuals) + SSE (Explained) = SST (Total)
SSR is the sum of (y_i - yhat_i)^2, so it is the variation of the data away from the regression line. It is similar to SSW: it is the residual variation of the y-values that is not explained by the changing x-values.
SSE is the sum of (yhat_i - ybar)^2, so it is the variation of the regression line itself away from the overall mean of the y-values. Thus it tells us how much of the variation in the data is explained by the changing x-values.
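
Both decompositions described in this answer can be checked numerically. The sketch below is only an illustration: the data sets and variable names are assumed, and the regression part assumes an ordinary least-squares line with an intercept, which is the case where residual + explained = total holds.

```python
import math

# --- ANOVA: SSW (within) + SSB (between) = SST (total) ---
# Hypothetical data: three groups of three observations each.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]

values = [x for g in groups for x in g]
grand_mean = sum(values) / len(values)
group_means = [sum(g) / len(g) for g in groups]

# Within ("Error"): each observation against its own group mean.
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
# Between ("Treatment"/"Model"): each group mean against the grand mean,
# counted once per observation in that group.
ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# Total: each observation against the grand mean.
sst = sum((x - grand_mean) ** 2 for x in values)

print(ssw, ssb, sst)
assert math.isclose(ssw + ssb, sst)

# --- Regression: residual + explained = total (labels as in the post above) ---
# Hypothetical (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Ordinary least-squares line: y_hat = intercept + slope * x.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hats = [intercept + slope * x for x in xs]

ss_residual = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # data vs. line
ss_explained = sum((yh - y_bar) ** 2 for yh in y_hats)         # line vs. mean of y
ss_total = sum((y - y_bar) ** 2 for y in ys)                   # data vs. mean of y

print(ss_residual, ss_explained, ss_total)
assert math.isclose(ss_residual + ss_explained, ss_total)
```

In both halves the "unexplained" piece (within / residual) and the "explained" piece (between / explained) add up to the total, whatever letters a given source uses for them.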