# ANOVA 1: Calculating SST (total sum of squares)

Analysis of Variance 1 - Calculating SST (Total Sum of Squares). Created by Sal Khan.

Video transcript

In this video and in the next few videos we'll actually be doing a bunch of calculations about this data set right over here. And hopefully just going through those calculations will give you an intuitive sense of what the analysis of variance is all about. Now the first thing I wanna do in this video is calculate the total sum of squares. So I'll call that 'SST', sum of squares total. And you could view it as really the numerator when you calculate the variance. So we're just gonna take the distance between each of these data points and the mean of all of these data points, square them, and just take that sum. We won't divide by the degrees of freedom, which you normally do if you're calculating sample variance. Now what is this going to be? Well, the first thing we've got to do is figure out the mean of all of this stuff over here. And I'm actually gonna call that the grand mean. And let me show you in a second that it's the same thing as the mean of the means of each of these data sets. So let's calculate the grand mean. So that's gonna be 3 plus 2 plus 1, plus 5 plus 3 plus 4, plus 5 plus 6 plus 7. And then we have nine data points here, so we're gonna divide by nine. And this is gonna be equal to: 3 plus 2 plus 1 is 6. 5 plus 3 plus 4, that's 12. And then 5 plus 6 plus 7 is 18. And then 6 plus 12 is 18, plus another 18 is 36, divided by nine is equal to 4. Let me show you that that's the exact same thing as the mean of the means. So the mean of group 1 over here, that's in green, is 3 plus 2 plus 1, that's 6 right over here, divided by 3 data points, so that would be equal to 2. The mean of group 2, the sum here is 12, we saw that right over here: 5 plus 3 plus 4 is 12, divided by 3 is 4, 'cause we have three data points.
And then the mean of group 3: 5 plus 6 plus 7 is 18, divided by 3 is 6. So if you're gonna take the mean of the means, which is another way of viewing this grand mean, you have 2 plus 4 plus 6, which is 12, divided by 3 means here, and once again you'd get 4. So you can view this as the mean of all of the data in all of the groups, or the mean of the means of each of these groups. But either way, now that we've calculated it, we can actually figure out the total sum of squares. So let's do that. So it's going to be equal to: 3 minus 4, the 4 is this 4 right over here, squared, plus 2 minus 4 squared, plus 1 minus 4 squared. Now I'll do these guys over here in purple: plus 5 minus 4 squared, plus 3 minus 4 squared, plus 4 minus 4 squared. I'll just scroll over here a little bit. Now we only have three left: plus 5 minus 4 squared, plus 6 minus 4 squared, plus 7 minus 4 squared. Now what does this give us? So up here this first one is gonna be equal to: 3 minus 4, the difference is 1, it's actually a negative 1, you square it, you get 1. Plus negative 2 squared is 4, plus negative 3 squared, and negative 3 squared is 9. And then we have here in the magenta: 5 minus 4 is 1, squared is still 1. 3 minus 4 is negative 1, you square it and you still get 1. And 4 minus 4 is just a 0. So let me just write a 0 here just to show you that we actually calculated that. And then we have these last 3 data points. 5 minus 4 squared, that's 1. 6 minus 4 squared, that is 4, it's 2 squared. And then plus 7 minus 4 is 3, squared is 9. So what's this going to be equal to? So I have 1 plus 4 plus 9 right over here, that's 5 plus 9. This right over here is 14. And we also have another 14 right over here, 'cause we have a 1 plus 4 plus 9, so that right over there is also 14. And then we have 2 over here, from 1 plus 1 plus 0. So it's gonna be 14 plus 14 is 28, plus 2 is 30. So the total sum of squares is equal to 30.
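The arithmetic above can be checked with a short Python sketch. The variable names (`groups`, `grand_mean`, `sst`, and so on) are illustrative and not from the video; the three groups are the data read out in the transcript. Note that the mean of the means matches the grand mean here because every group has the same number of data points.

```python
# The three groups of data read out in the video.
groups = [
    [3, 2, 1],  # group 1
    [5, 3, 4],  # group 2
    [5, 6, 7],  # group 3
]

# Grand mean: the mean of all nine data points.
all_points = [x for group in groups for x in group]
grand_mean = sum(all_points) / len(all_points)

# Mean of each group, then the mean of those means.
group_means = [sum(group) / len(group) for group in groups]
mean_of_means = sum(group_means) / len(group_means)

# Total sum of squares: squared distance of every point from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_points)

print(grand_mean)     # 4.0
print(group_means)    # [2.0, 4.0, 6.0]
print(mean_of_means)  # 4.0
print(sst)            # 30.0
```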
So that's our total sum of squares. And actually, if we wanted the variance here, we would divide this by the degrees of freedom. And we've seen degrees of freedom multiple times. So let's say that we have m groups over here, so let me just write this m. And I'm not gonna prove things rigorously here, but I wanna show you where some of these strange formulas that show up in statistics actually come from, more to give you the intuition. So we have m groups here and each group has n members. So how many total members do we have here? Well, we have m times n, or 9, right? 3 times 3 total members. So the degrees of freedom, remember, is however many data points you have, minus 1. Because if you knew the mean of means, if you assume you knew that, then only 9 minus 1, only 8 of these were going to give you new information, because if you have the other 8 you can calculate the last one, and it really wouldn't have to be the last one. If you have 8 of them you could always calculate the 9th one using the mean of means. So one way to think about it is that there are only 8 independent measurements here. Or if you want to talk in general, there are m times n, so that is the total number of samples, minus 1 degrees of freedom. And if you're actually calculating the variance here, we would just divide 30 by m times n minus 1, that is, by mn − 1. Or this is another way of saying 8 degrees of freedom for this exact example. You take 30 divided by 8 and you actually have the variance for this entire group, the group of 9 [...]. I'll leave you here in this video.
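As a rough check of the degrees-of-freedom step, here is a small Python sketch. The names `m`, `n`, and `data` mirror the transcript; everything else is illustrative. It divides the total sum of squares by mn − 1 and compares the result with the sample variance from Python's standard `statistics` module, which uses the same n − 1 denominator.

```python
import math
import statistics

# All nine data points from the three groups in the video.
data = [3, 2, 1, 5, 3, 4, 5, 6, 7]

m = 3  # number of groups
n = 3  # members per group

# Total sum of squares around the grand mean.
grand_mean = sum(data) / len(data)
sst = sum((x - grand_mean) ** 2 for x in data)

# Degrees of freedom: total number of data points minus 1.
degrees_of_freedom = m * n - 1

# Sample variance of the whole data set: SST divided by degrees of freedom.
variance = sst / degrees_of_freedom

print(degrees_of_freedom)  # 8
print(variance)            # 3.75

# Cross-check against the standard library's sample variance.
assert math.isclose(variance, statistics.variance(data))
```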
In the next video we're gonna try to figure out how much of this total variance, how much of this total squared sum, total variation, comes from the variation within each of these groups versus the variation between the groups. And I think you'll get a sense of where this whole analysis of variance is coming from. Look, there is the variance of this entire sample of nine, but some of that variance, if these groups are different in some way, might come from the variation from being in different groups versus the variation from being within a group. We're gonna calculate those two things and we're going to see that they're going to add up to the total sum of squares.
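As a preview of that claim, here is a sketch in Python under the usual definitions, which are assumptions here since this video leaves them to the next ones: the within-group sum of squares adds up squared distances from each point to its own group's mean, and the between-group sum of squares takes each group mean's squared distance from the grand mean, counted once per data point in that group. The names `ssw` and `ssb` are illustrative.

```python
# The three groups of data from the video.
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]

all_points = [x for group in groups for x in group]
grand_mean = sum(all_points) / len(all_points)

# Total sum of squares, as calculated in this video.
sst = sum((x - grand_mean) ** 2 for x in all_points)

ssw = 0.0  # variation within the groups (assumed definition)
ssb = 0.0  # variation between the groups (assumed definition)
for group in groups:
    group_mean = sum(group) / len(group)
    # Squared distance of each point from its own group's mean.
    ssw += sum((x - group_mean) ** 2 for x in group)
    # Squared distance of the group mean from the grand mean,
    # counted once for every data point in the group.
    ssb += len(group) * (group_mean - grand_mean) ** 2

print(ssw, ssb, sst)  # 6.0 24.0 30.0
```

For this data set the two pieces add up to the total: 6 plus 24 is 30.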