Sample variance

We delve into measuring variability in quantitative data, focusing on calculating sample variance and population variance. The importance of dividing by the sample size minus one (n-1) to get a better estimate of the population variance is highlighted. The distinction between sample mean and population mean is also clarified. Created by Sal Khan.

Want to join the conversation?

  • piceratops ultimate style avatar for user manvithn
    What if n=1? Then wouldn't the sample variance be infinity?
    (40 votes)
  • leaf green style avatar for user JB Segal
    Where did the "-x{bar}" come from? I've totally missed that.
    (22 votes)
  • blobby green style avatar for user Veronica Vaes
    It really bothers me that these terms are introduced here without a definition. Where would I even go to get some context? It seems like Variance doesn't actually get defined until the next course, which is absurd.
    (35 votes)
    • blobby green style avatar for user daniella
      I understand your frustration. It's important for educational materials to provide clear definitions and context for new terms. If you're looking for more context on variance, you might find it helpful to refer to textbooks on statistics or online resources that explain statistical concepts in more detail. Additionally, seeking out supplementary materials or lectures on the topic could provide you with the background information you're seeking.
      (2 votes)
  • aqualine ultimate style avatar for user Caleb Man
    (29 votes)
  • female robot grace style avatar for user **RJ**
    How do we know when to divide by n and when to divide by n-1? Or is it better to always divide by n-1?
    (7 votes)
    • old spice man green style avatar for user jmascaro
      Hi RJ,
      We divide by n when our data covers the entire population. For example, if there are 7 tigers and we know all 7 of their ages, then we would divide by n. We divide by n-1 when our data is only a sample from a larger population. For example, we know the ages of 5 hippos but there are 42 of them. In this case, divide by n-1 because, with only a sample, dividing by n would tend to underestimate the variance of the whole population.
      Hope that helps.
      (27 votes)
  • leaf green style avatar for user chris.stronen
    When finding the variance, why not take the absolute value of each variable's distance from the mean? Why square it? Would the above procedure just give you standard deviation?
    (10 votes)
    • leaf green style avatar for user SteveSargentJr
      First, we could take their absolute values but that would give us a totally different statistic, called the Mean Absolute Deviation (or MAD for short). There are various reasons why the standard deviation is preferred over the MAD (but that gets pretty technical). The point is that you can take the absolute value but it will, in general, give you a totally different number not equal to the standard deviation.

      Lastly, when we square the distance from the mean, we also are squaring the units associated with them. So, if you are gathering data on children's heights and you want to calculate the variance, the result will be, for instance, 16 inches squared. Then, we take the square root of the variance (because it makes more sense to talk about height in terms of "inches" rather than "inches squared"), giving us a standard deviation of 4 inches. Does this make sense?
      (20 votes)
  • purple pi purple style avatar for user Kara
    How do we know when it's ok to use a calculator when we're doing the math exercises?

    I want to be able to do these problems as well as if I was in a 'real' classroom, so I don't want to cheat and use one when I shouldn't, but I don't know where we're meant to use one and where we should be doing the math completely on our own. With some of these topics, doing it without a calculator takes quite a while and I've wondered if it would be ok, and now Sal is using one in this video. Would we be using one with this math topic in a classroom setting?
    (10 votes)
    • leaf blue style avatar for user Matthew Daly
      It's really up to you. You could even go through the exercise three different times, once doing the figures by hand (or at least until you got incredibly bored!), once with a statistical calculator, and maybe even once with a spreadsheet or a statistical software package. Think about drills like this not as an obligation to a teacher but as an opportunity to develop critical skills to the degree that you would like.
      (19 votes)
  • duskpin ultimate style avatar for user victoriamathew12345
    Why do we square the differences? And also, would dividing by n-2 also work?
    (18 votes)
  • blobby green style avatar for user Stacy Ashauer Averett
    Is there a resource for understanding what a sample variance is? He just starts talking about it like I am already familiar with the concept and definition, but this is the first time it's been mentioned as I've been working through this class.
    (16 votes)
  • male robot hal style avatar for user RandomDad
    It seems to me, after watching all the videos previous to this one, that statistics is not based on any reasonable methodology. Is that the truth?
    (5 votes)
    • leaf blue style avatar for user Dr C
      No, it's not the truth. Statistics can sometimes be challenging to understand when first starting, but it is based on very reasonable methodology.

      Part of the issue in learning intro stats is that some of the oddities require more advanced math to understand. So it's to some degree a chicken-and-the-egg type problem. Both the theory and the methods, without the context of the other, can seem arbitrary at times.
      (21 votes)

Video transcript

Let's say that you're curious about people's TV watching habits. And in particular, how much TV do people in the country watch? So what you are concerned with, if we imagine the entire country-- and we've already talked about this-- especially if we're talking about a country like the United States, but pretty much any country, is a very large population. In the United States, we're talking about on the order of 300 million people. So ideally, if you could somehow magically do it, you would survey or somehow observe all 300 million people and take the mean of how many hours of TV they watch on a given day. And then that will give you the parameter, the population mean.

But we've already talked about how, in a case like this, that's very impractical. Even if you tried to do it, by the time you did it, your data might be stale because some people might have passed away, other people might have been born. Who knows what might have happened. And so this is a truth that is out there. There is a theoretical population mean for the hours of TV watched per day by Americans. There is a truth here at any given point in time. It's just pretty much impossible to come up with the exact answer, to come up with this exact truth.

But you don't give up. You say, well, maybe I don't have to survey all 300 million or observe all 300 million. Instead, I'm just going to observe a sample, right over here. And let's say, to make the computation simple, you do a sample of six. And we'll talk about later why six might not be as large of a sample as you would like. But you survey how much TV these folks watch. And you find one person who watched 1 and 1/2 hours. Another person watched 2 and 1/2 hours. Another person watched 4 hours. And then you get one person who watched 2 hours. And you get two people who watched 1 hour each. So given this data from your sample, what do you get as your sample mean?
Well, the sample mean, which we would denote by lowercase x with a bar over it, is just the sum of all of these divided by the number of data points we have. So let's see, we have 1.5 plus 2.5 plus 4 plus 2 plus 1 plus 1. And all of that divided by 6, which gives-- let's see, the numerator, 1.5 plus 2.5 is 4, plus 4 is 8, plus 2 is 10, plus 2 more is 12. So it's going to be 12 over 6, which is equal to 2 hours of television. So at least for your sample, you say, my sample mean is 2 hours of television. It's an estimate. It's a statistic that is trying to estimate this parameter, this thing that's very hard to know. But it's our best shot. Maybe we get a better answer if we get more data points. But this is what we have so far.

Now the next question you ask yourself is, well, I don't want to just estimate my population mean. I also want to estimate another parameter. I also am interested in estimating my population variance. So once again, since we can't survey everyone in the population, this is pretty much impossible to know. But we're going to attempt to estimate this parameter. We attempted to estimate the mean. Now we will also attempt to estimate this parameter, this variance parameter.

So how would you do it? Well, reasonable logic would say, well, maybe we'll do the same thing with a sample as we would have done with the population. When you're doing the population variance, you would take each data point in the population, find the distance between that and the population mean, take the square of that difference, and then add up all the squares of those differences, and then divide by the number of data points you have. So let's try that over here. So let's try to find-- take each of these data points, and find the difference-- let me do that in a different color-- each of these data points, and find the difference between that data point and our sample mean-- not the population mean, we don't know what the population mean is-- the sample mean.
So that's that first data point, 1.5 minus 2 squared, plus another data point-- so it's 4 minus 2 squared-- plus 1 minus 2 squared. And this is what you would have done if you were taking a population variance. If this was your entire population, this is how you would find a population variance here. You find the squared distances from each of those data points to the mean and then divide by the number of data points. So let's just think about this a little bit. 1 minus 2 squared. Then you have 2.5 minus 2-- 2 being the sample mean-- squared. Let me see, this green color. Plus 2 minus 2 squared. Plus 1 minus 2 squared. And then maybe you would divide by the number of data points that you have. So in this case, we're dividing by 6.

And what would we get in this circumstance? Well, if we just do the computation, 1.5 minus 2 is negative 0.5. We square that. This becomes a positive 0.25. 4 minus 2 squared is going to be 2 squared, which is 4. 1 minus 2 squared-- well, that's negative 1 squared, which is just 1. 2.5 minus 2 is 0.5, and squared that is 0.25. 2 minus 2 squared-- well, that's just 0. And then 1 minus 2 squared is 1, it's negative 1 squared. So we just get 1. And if we add all of this up-- let me add the whole numbers first. 4 plus 1 is 5, plus 1 is 6, and then we have two 0.25s, which is another 0.5. So this is going to be equal to 6.5-- let me write this in a neutral color. So this is going to be 6.5 over this 6 right over here. Well, there's a couple of ways we could write this, but I'll just get the calculator out and we can just calculate it. So 6.5 divided by 6 gets us-- if we round, it's approximately 1.08. So this calculation is approximately 1.08.

Now what we have to think about is whether this is the best calculation, whether this is the best estimate for the population variance, given the data that we have. You can always argue that we could have more data.
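The arithmetic so far can be checked with a few lines of Python. This is only a sketch of the calculation described in the transcript; the variable names (`hours`, `sample_mean`, `variance_div_n`) are made up for illustration and do not come from the video.

```python
# Hours of TV watched by the six people in the sample (from the transcript).
hours = [1.5, 2.5, 4, 2, 1, 1]

n = len(hours)                 # 6 data points
sample_mean = sum(hours) / n   # 12 / 6

# Squared distance of each data point from the sample mean.
squared_deviations = [(x - sample_mean) ** 2 for x in hours]
sum_of_squares = sum(squared_deviations)

# Dividing by n, the same way we would for a whole population.
variance_div_n = sum_of_squares / n

print(sample_mean)               # 2.0
print(sum_of_squares)            # 6.5
print(round(variance_div_n, 2))  # 1.08
```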
But given the data we have, is this the best calculation that we can make to estimate the population variance? And I'll have you think about that for a second. Well, it turns out that this is close, this is close to the best calculation, the best estimate that we can make, given the data we have. And sometimes this will be called the sample variance. But it's a particular type of sample variance where we just divide by the number of data points we have. And so people will write just an n over here. So this is one way to define a sample variance in an attempt to estimate our population variance.

But it turns out-- and in the next video I'll give you an intuitive explanation of why it turns out this way. And then I would also like to write a computer simulation that, at least experimentally, makes you feel a little bit better. But it turns out, you're going to get a better estimate-- and it's a little bit weird and voodooish at first when you first think about it-- you're going to get a better estimate for your population variance if you don't divide by 6, if you don't divide by the number of data points you have, but you divide by one less than the number of data points you have.

So how would we do that? And we can denote that as the sample variance. So when most people talk about the sample variance, they're talking about the sample variance where you do this calculation, but instead of dividing by 6 you divide by 5. So they would say you divide by n minus 1. So what would we get in those circumstances? Well, the top part is going to be the exact same thing. We're going to get 6.5. But then our denominator, our n is 6. We have 6 data points. But we're going to divide by 1 less than 6. We're going to divide by 5. And 6.5 divided by 5 is equal to 1.3. So when we calculate our sample variance with this technique, which is the more mainstream technique, we get 1.3. And it seems like voodoo.
Why are we dividing by n minus 1, whereas for a population variance we divide by n? But remember, we're trying to estimate the population variance. And it turns out that this is a better estimate. Because the calculation where we divide by n tends to underestimate what the population variance is, dividing by n minus 1 is a better estimate. We don't know for sure what it is. These both could be way off. It could be just by chance what we happen to sample. But over many samples-- and there's many ways to think about it-- this is going to be a better calculation. It's going to give you a better estimate.

And so how would we write this down? How would we write this down with mathematical notation? Well, remember, we're taking the sum. And we're taking each of the data points. So we'll start with the first data point all the way to the nth data point. This lowercase n says that, hey, we're looking at the sample. If I have an uppercase N, that usually denotes that we're trying to sum up everything in the population. Here we're looking at a sample of size lowercase n. And we're taking each data point, so each x sub i, and from it we're subtracting the sample mean. And then we're squaring it. We're taking the sum of the squared distances. And then we're dividing, not by the number of data points we have, but by 1 less than the number of data points we have. So this calculation, where we just summed up all of this and then we divided by 5, not by 6, this is the standard definition of sample variance.

So I'll leave you there. In the next video, I will attempt to give you an intuition of why we're dividing by n minus 1 instead of dividing by n.
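The two versions of the calculation, and the claim that dividing by n tends to underestimate over many samples, can be tried out in Python. This is a rough sketch, not the simulation from the next video. The helper `variance_with_divisor`, the made-up population of uniformly random "hours", and the number of trials are all assumptions chosen for illustration.

```python
import random
import statistics

# The six data points from the transcript.
hours = [1.5, 2.5, 4, 2, 1, 1]

def variance_with_divisor(data, offset):
    """Sum of squared distances from the mean, divided by len(data) - offset."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / (len(data) - offset)

divide_by_n = variance_with_divisor(hours, 0)          # 6.5 / 6
divide_by_n_minus_1 = variance_with_divisor(hours, 1)  # 6.5 / 5
print(round(divide_by_n, 2), round(divide_by_n_minus_1, 2))  # 1.08 1.3

# Hypothetical population (an assumption, not real TV data): 10,000 values
# drawn uniformly between 0 and 8 hours.
random.seed(0)
population = [random.uniform(0, 8) for _ in range(10_000)]
true_variance = statistics.pvariance(population)  # divides by N

# Draw many samples of size 6 and average each kind of estimate.
trials = 20_000
total_n = 0.0
total_n_minus_1 = 0.0
for _ in range(trials):
    sample = random.choices(population, k=6)  # sampling with replacement
    total_n += variance_with_divisor(sample, 0)
    total_n_minus_1 += variance_with_divisor(sample, 1)

avg_n = total_n / trials
avg_n_minus_1 = total_n_minus_1 / trials

print("population variance:      ", true_variance)
print("average when dividing by n:  ", avg_n)
print("average when dividing by n-1:", avg_n_minus_1)
```

Python's standard library has both calculations built in: `statistics.pvariance` divides by n and `statistics.variance` divides by n minus 1, so `statistics.variance(hours)` should agree with the 1.3 above.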