# ANOVA 1: Calculating SST (total sum of squares)

Analysis of Variance 1 - Calculating SST (Total Sum of Squares). Created by Sal Khan.

## Want to join the conversation?

- Sal says that the grand mean is equal to the "mean of the means" of the three groups. Is this always true, or does it depend on the groups having the same number of elements? For example, I can imagine three groups: A={1,2,3}, B={4,3,2}, C={8,9,10,11,12,13,14}. In this case the grand mean is 7.08 but the mean of means is (2+3+11)/3=5.3.(7 votes)
- For the grand mean to be the "mean of means," the groups need to have the same number of observations (i.e., be "balanced").

ANOVA will still work, but depending on how Sal expressed the formulas, the precise formulas he showed may not apply (I forget what formulas he used, and I don't feel like rewatching to find out).(11 votes)
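The gap between the two quantities can be checked numerically. The following is a minimal Python sketch using the three unbalanced groups from the question; the variable and helper names are illustrative only. It also shows that weighting each group mean by its group size recovers the grand mean.

```python
def avg(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Unbalanced groups taken from the question above
groups = {
    "A": [1, 2, 3],
    "B": [4, 3, 2],
    "C": [8, 9, 10, 11, 12, 13, 14],
}

# Grand mean: pool every observation, then average
all_values = [x for g in groups.values() for x in g]
grand_mean = avg(all_values)

# Unweighted mean of the group means: (2 + 3 + 11) / 3
mean_of_means = avg([avg(g) for g in groups.values()])

# Weighting each group mean by its group size gives the grand mean back
weighted_mean_of_means = (
    sum(len(g) * avg(g) for g in groups.values()) / len(all_values)
)

print(round(grand_mean, 2))              # 7.08
print(round(mean_of_means, 2))           # 5.33
print(round(weighted_mean_of_means, 2))  # 7.08
```

When every group has the same size, the weights are equal and the unweighted mean of means coincides with the grand mean, which is the situation in the video.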

- Hi Sal, I was going through your tutorials ANOVA 1, 2 and 3, and I was not able to find whether ANOVA is a one-tailed or a two-tailed test.

Your help is highly appreciated.(2 votes)
- This may be a bit late to help you, but ANOVA is always one-tailed: the F statistic is compared against the upper tail of the F distribution.(9 votes)

- What is homogeneity of variance?(4 votes)
- Homogeneity = "same". Variance is a measure of the spread of data. So homogeneity of variance refers to two or more groups of data that have approximately the same spread. I don't know if that answers your specific question (and I know this was posted quite some time ago), but if you have further questions I'd be happy to try to help.(5 votes)

- Is there a specific video where I can learn more about degrees of freedom?(3 votes)
- What exactly is your concern with degrees of freedom? Maybe the community can help. If you are just looking for more general theory on it, check out this site; it's short and sweet, with a cool explanation tucked into the end of it.

www.statsdirect.com/help/basics/degrees_of_freedom.htm(4 votes)

- I thought that degrees of freedom should be the total number of participants minus the total number of groups. But here it is N-1. Would you please make the concept of degrees of freedom clear for me?

Thanks(2 votes)
- There are several degrees of freedom in an ANOVA setting.

The *total* degrees of freedom is N - 1.

The degrees of freedom for the model is M - 1, where M is the number of groups.

The degrees of freedom for the residual / error is N - M.(6 votes)
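As a quick check of how these three quantities fit together, here is a small Python sketch. The function name `anova_degrees_of_freedom` is hypothetical and the group sizes are those of the video's data set (three groups of three); it is an illustration, not a formula Sal states in this video.

```python
def anova_degrees_of_freedom(group_sizes):
    """Return total, model, and error degrees of freedom for a one-way layout."""
    n_total = sum(group_sizes)   # N: total number of observations
    m = len(group_sizes)         # M: number of groups
    return {
        "total": n_total - 1,    # N - 1
        "model": m - 1,          # M - 1
        "error": n_total - m,    # N - M
    }

df = anova_degrees_of_freedom([3, 3, 3])
print(df)  # {'total': 8, 'model': 2, 'error': 6}

# The model and error degrees of freedom add up to the total
assert df["model"] + df["error"] == df["total"]
```

The "total participants minus number of groups" figure from the question is the error (within-group) degrees of freedom, while N - 1 belongs to the total sum of squares.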

- If the mean of mean is needed to calculate the 9th value, doesn't that count as being not determined by just the 8 values, and thus there are 9 independent values in total, and thus 9 degrees of freedom?(2 votes)
- For the total sum of squares, the degrees of freedom is the number of values you have minus 1, in other words, N - 1. The mean is computed from those same nine values, so it is not an extra independent value; once it is fixed, only N - 1 of the values are free to vary. As Sal explains, we take the rows times the columns to get N. So in this example, rows (3) x columns (3) = 9, and 9 is your N. Now take 9 - 1 = 8. For this data set you have 8 degrees of freedom. I hope this helps.(3 votes)
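One way to see why only eight values are free is to hide any one value and rebuild it from the grand mean and the other eight. The Python sketch below does this for the nine data points from the video; the variable names are illustrative.

```python
# The nine data points from the video (three groups of three, pooled)
data = [3, 2, 1, 5, 3, 4, 5, 6, 7]
n = len(data)
grand_mean = sum(data) / n  # 36 / 9 = 4.0

# Hide each value in turn and recover it from the mean and the other eight
for hidden_index, hidden_value in enumerate(data):
    others = data[:hidden_index] + data[hidden_index + 1:]
    recovered = grand_mean * n - sum(others)
    assert recovered == hidden_value

# Example: recover the last value (7) from the first eight
recovered_last = grand_mean * n - sum(data[:-1])
print(recovered_last)  # 7.0
```

Because the ninth value is fully determined by the mean and the other eight, it contributes no new information, which is the sense in which there are 9 - 1 = 8 degrees of freedom.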

- At around 6:26, I don't understand why he's referring to the degrees of freedom as m*n. I'm only familiar with df as "the number of values in the final calculation of a statistic that are free to vary" (Wikipedia), and I know that it is used in virtually every hypothesis test.(2 votes)
- He is using m*n as the number of values horizontally times the number vertically to get the total number of data points in the set. He's actually equating df to (m*n)-1. If you don't understand why df=(m*n)-1 take a look at the question above this one.(2 votes)

- I noticed in the other video that SSW and SSB were used. Can anyone tell me which one of them is SSR and which is SSE? Thank you(1 vote)
- Good question. You have to be VERY CAREFUL with these, because depending on the source, you could get confused, especially between Regression and ANOVA.

So, in ANOVA, there are THREE DIFFERENT TRADITIONS:

1) SSW (Within) + SSB (Between) = SST (Total!!)

This is what Sal uses. But if you search the web or textbooks, you ALSO FIND:

2) SSE (Error) + SST (Treatment!!) = SS(Total) THIS IS THE WORST.

3) SSE (Error) + SSM (Model) = SST (Total)

Wait, WHAT?! There are two different SSTs? I know, it's horrible. Anyway, that's the way it is. If people use SST to mean "treatment", then they have to write SS(Total) for the total sum of squares, or they might even write TSS for "Total Sum of Squares".

"Error" means the same as "Within groups". This is the variation which is NOT explained by the fact that we can put the data into different groups.

"Treatment" or "Model" (or sometimes "Factor") means the same as "Between groups". This is the variation that IS explained by the fact that there are different groups of data (often because they come from patients who get different treatments).

Now, in **Regression**, we have:

SSR (Residuals) + SSE (Explained) = SST (Total)

SSR is the sum of (y_i - yhat_i)^2, so it is the **variation of the data away from the regression line**. So it is similar to SSW: it is the residual variation of y-values not explained by the changing x-value.

SSE is the sum of (yhat_i - ybar)^2, so it is the **variation of the regression line itself away from the overall mean of the y-values**. Thus it tells us how much of the variation in the data is explained by the changing x-values.(3 votes)
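To make the first naming tradition concrete, here is a minimal Python sketch that computes SSW, SSB, and SST for the data set used in the video and checks that SSW + SSB = SST. The helper `avg` and the variable names are illustrative, and the SSW and SSB formulas are the standard one-way ANOVA definitions rather than anything quoted from this page.

```python
def avg(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# The three groups from the video
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]
all_values = [x for g in groups for x in g]
grand_mean = avg(all_values)  # 4.0

# SST: squared distance of every point from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_values)

# SSW ("within" / "error"): squared distance of each point from its own group mean
ssw = sum((x - avg(g)) ** 2 for g in groups for x in g)

# SSB ("between" / "treatment" / "model"): squared distance of each group mean
# from the grand mean, counted once per observation in that group
ssb = sum(len(g) * (avg(g) - grand_mean) ** 2 for g in groups)

print(sst, ssw, ssb)  # 30.0 6.0 24.0
assert ssw + ssb == sst
```

Whatever labels a source uses, the "error" term plays the role of `ssw` here and the "treatment" or "model" term plays the role of `ssb`.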

- Does anybody know how to calculate the dof for the denominator and the numerator? I have a sample of n=4 and only that one group. The numerator df was 3 but I can't figure out the denominator df.(1 vote)
- If you don't have multiple groups, then ANOVA probably isn't the test you want to be using.(3 votes)

- Is there a video about OLS regression?(2 votes)

## Video transcript

In this video and the next few videos, we're just really going to be doing a bunch of calculations about this data set right over here. And hopefully, just going through those calculations will give you an intuitive sense of what the analysis of variance is all about.

Now, the first thing I want to do in this video is calculate the total sum of squares. So I'll call that SST. SS-- sum of squares total. And you could view it as really the numerator when you calculate variance. So you're just going to take the distance between each of these data points and the mean of all of these data points, square them, and just take that sum. We're not going to divide by the degree of freedom, which you would normally do if you were calculating sample variance.

Now, what is this going to be? Well, the first thing we need to do, we have to figure out the mean of all of this stuff over here. And I'm actually going to call that the grand mean. And I'm going to show you in a second that it's the same thing as the mean of the means of each of these data sets.

So let's calculate the grand mean. So it's going to be 3 plus 2 plus 1 plus 5 plus 3 plus 4 plus 5 plus 6 plus 7. And then we have nine data points here so we'll divide by 9. And what is this going to be equal to? 3 plus 2 plus 1 is 6. 6 plus-- let me just add. So these are 6. 5 plus 3 plus 4 is 12. And then 5 plus 6 plus 7 is 18. And then 6 plus 12 is 18 plus another 18 is 36, divided by 9 is equal to 4.

And let me show you that that's the exact same thing as the mean of the means. So the mean of this group 1 over here-- let me do it in that same green-- the mean of group 1 over here is 3 plus 2 plus 1. That's that 6 right over here, divided by 3 data points so that will be equal to 2. The mean of group 2, the sum here is 12. We saw that right over here. 5 plus 3 plus 4 is 12, divided by 3 is 4 because we have three data points. And then the mean of group 3, 5 plus 6 plus 7 is 18 divided by 3 is 6. So if you were to take the mean of the means, which is another way of viewing this grand mean, you have 2 plus 4 plus 6, which is 12, divided by 3 means here. And once again, you would get 4. So you could view this as the mean of all of the data in all of the groups or the mean of the means of each of these groups.

But either way, now that we've calculated it, we can actually figure out the total sum of squares. So let's do that. So it's going to be equal to 3 minus 4-- the 4 is this 4 right over here-- squared plus 2 minus 4 squared plus 1 minus 4 squared. Now, I'll do these guys over here in purple. Plus 5 minus 4 squared plus 3 minus 4 squared plus 4 minus 4 squared. Let me scroll over a little bit. Now, we only have three left, plus 5 minus 4 squared plus 6 minus 4 squared plus 7 minus 4 squared.

And what does this give us? So up here, this is going to be equal to 3 minus 4. Difference is 1. You square it. It's actually negative 1, but you square it, you get 1, plus you get negative 2 squared is 4, plus negative 3 squared. Negative 3 squared is 9. And then we have here in the magenta 5 minus 4 is 1 squared is still 1. 3 minus 4 squared is 1. You square it again, you still get 1. And then 4 minus 4 is just 0. So we could-- well, I'll just write the 0 there just to show you that we actually calculated that. And then we have these last three data points. 5 minus 4 squared. That's 1. 6 minus 4 squared. That is 4, right? That's 2 squared. And then plus 7 minus 4 is 3 squared is 9.

So what's this going to be equal to? So I have 1 plus 4 plus 9 right over here. That's 5 plus 9. This right over here is 14, right? 5 plus-- yup, 14. And then we also have another 14 right over here because we have a 1 plus 4 plus 9. So that right over there is also 14. And then we have 2 over here. So it's going to be 28-- 14 times 2, 14 plus 14 is 28-- plus 2 is 30. Is equal to 30.

So our total sum of squares-- and actually, if we wanted the variance here, we would divide this by the degrees of freedom. And we've learned multiple times the degrees of freedom here so let's say that we have-- so we know that we have m groups over here. So let me just write it as m and I'm not going to prove things rigorously here, but I want to show you where some of these strange formulas that show up in statistics books actually come from without proving it rigorously. More to give you the intuition.

So we have m groups here. And each group here has n members. So how many total members do we have here? Well, we had m times n or 9, right? 3 times 3 total members. So our degrees of freedom-- and remember, you have however many data points you had minus 1 degrees of freedom because if you know the mean of means, if you assume you knew that, then only 9 minus 1, only eight of these are going to give you new information because if you know that, you could calculate the last one. Or it really doesn't have to be the last one. If you have the other eight, you could calculate this one. If you have eight of them, you could always calculate the ninth one using the mean of means. So one way to think about it is that there's only eight independent measurements here.

Or if we want to talk generally, there are m times n-- so that tells us the total number of samples-- minus 1 degrees of freedom. And if we were actually calculating the variance here, we would just divide 30 by m times n minus 1 or this is another way of saying eight degrees of freedom for this exact example. We would take 30 divided by 8 and we would actually have the variance for this entire group, for the group of nine when you combine them.

I'll leave you here in this video. In the next video, we're going to try to figure out how much of this total variance, how much of this total squared sum, total variation comes from the variation within each of these groups versus the variation between the groups. And I think you get a sense of where this whole analysis of variance is coming from. It's the sense that, look, there's a variance of this entire sample of nine, but some of that variance-- if these groups are different in some way-- might come from the variation from being in different groups versus the variation from being within a group. And we're going to calculate those two things and we're going to see that they're going to add up to the total squared sum variation.
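The arithmetic in the transcript can be reproduced in a few lines. This is a minimal Python sketch, not part of the original video; the variable names are illustrative, and the cross-check uses `statistics.variance` from the Python standard library, which computes the sample variance with an N - 1 denominator.

```python
import math
from statistics import variance

# The data set from the video: m = 3 groups with n = 3 members each
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]
all_values = [x for g in groups for x in g]

# Grand mean: 36 / 9 = 4
grand_mean = sum(all_values) / len(all_values)

# With equal group sizes, the mean of the group means is the same number
group_means = [sum(g) / len(g) for g in groups]   # [2.0, 4.0, 6.0]
mean_of_means = sum(group_means) / len(group_means)

# SST: sum of squared distances from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_values)

# Degrees of freedom for the total: m * n - 1 = 8
m, n = len(groups), len(groups[0])
df_total = m * n - 1

# Dividing SST by its degrees of freedom gives the sample variance of all nine points
sample_variance = sst / df_total

print(grand_mean, mean_of_means)  # 4.0 4.0
print(sst, df_total)              # 30.0 8
print(sample_variance)            # 3.75

# Cross-check against the standard library's sample variance
assert math.isclose(sample_variance, variance(all_values))
```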