Learn how to calculate standard deviation, how it relates to variance and mean, and the difference between population and sample standard deviation. Created by Sal Khan.
Video transcript
Let's review a little bit of everything we've learned so far, and hopefully make it all fit together a little better. Then we'll do a bunch of calculations with real numbers, and I think that'll really drive the point home. First of all, let me make some columns. We'll call the first one the concept, and then we'll note whether we're dealing with a population or a sample.

The first statistical concept we came up with was the mean. We learned that it's one way to measure the average, or central tendency, of a data set; the other ways were the median and the mode. But the mean tends to show up a lot more, especially when we start talking about variance and, as we'll do in this video, the standard deviation.

The mean of a population, we learned, is written with the Greek letter mu, and it's equal to the sum of each of the data points in the population. That subscript is an i; let me make sure it looks like an i. So you sum up each of those data points x sub i, starting with the first one and going to the Nth one, where we assume there are N data points in the population. Then you divide by the total number of data points, N. This is just the average you were used to taking before you learned any of the statistics stuff: add up all the data points and divide by how many there are.

For a sample, it's the same calculation; we just use slightly different notation. The mean of a sample, and I'll do it in a different color, is written as x with a line on top, called x bar. It's equal to the sum of all the data points in the sample, each x sub i. But we're assuming the sample is smaller than the population. So you still start with the first one, but you go to lowercase n, where we assume that lowercase n is less than capital N.
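Both means described above come down to the same arithmetic: add up the data points and divide by how many there are. Here is a minimal Python sketch; the function name `mean` and the two data lists are hypothetical, chosen only for illustration.

```python
def mean(data):
    """Arithmetic mean: add up all the data points, divide by how many there are.

    The same arithmetic gives the population mean (mu, dividing by N)
    and the sample mean (x bar, dividing by n); only the data differs.
    """
    if not data:
        raise ValueError("mean requires at least one data point")
    return sum(data) / len(data)


population = [1, 2, 4, 6, 9, 14]  # hypothetical population, N = 6
sample = [2, 4, 9]                # hypothetical sample drawn from it, n = 3

print(mean(population))  # 6.0
print(mean(sample))      # 5.0
```

Note that the sample mean (5.0) is only an estimate of the population mean (6.0); with a different sample it would come out differently.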
If lowercase n were equal to capital N, we'd just be taking the mean of the entire population. Either way, you then divide by the number of data points you added up, which is n for a sample. So this gives us the central tendency, or at least one measure of it. But what if we want to know how good an indicator it is for the population or for the sample? Or, on average, how far are the data points from this mean? That's where we came up with the concept of variance. I'll arbitrarily switch colors again.

For a population, the notation for variance is sigma squared. To compute it, you take each data point, find the difference between it and the population mean mu that we calculated up there, and square it, so you get the squared difference. Then you essentially take the average of all of those squared distances: you sum from i equals 1 to N and divide by N. That's the population variance.

The variance of a sample is a little more interesting, and we talked a bit about it in the last video. When you take the variance of a sample, what you really want is to estimate the variance of the population. To get an unbiased estimate, you do something very similar to the population formula, but you end up dividing by n minus 1. So let me write that down. The variance of a sample, also called the sample variance or the unbiased sample variance (that's why we divide by n minus 1), is denoted by s squared. You take the difference between each data point in the sample and the sample mean. We're assuming we don't know the population mean. If we did know it, we wouldn't need the unbiased adjustment in the denominator that we're about to make.
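The population variance formula, sigma squared, can be sketched in a few lines of Python. This is an illustrative sketch: the function name `population_variance` and the data values are hypothetical.

```python
def population_variance(data):
    """sigma squared: the average squared distance from the population mean mu."""
    n_total = len(data)                   # capital N
    mu = sum(data) / n_total              # population mean
    squared_diffs = [(x - mu) ** 2 for x in data]
    return sum(squared_diffs) / n_total   # divide by N, not N - 1


# Hypothetical population: mu = 5, squared differences sum to 32, N = 8.
print(population_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```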
But when you have a sample, the only way to figure out the population mean is to estimate it with the sample mean, so we assume we only have x bar. You square each of those differences and sum them up from i equals 1 to n, because you have n data points. And if you want an unbiased estimator, you divide by n minus 1. We talked a little before about why you want this to be n minus 1 instead of n, and in a couple of videos I'll show it to you. First I'll demonstrate it experimentally using Excel, which won't be a proof but will give you some intuition, and then later on I'll prove it more formally. But you don't have to worry about that right now.

The next thing we'll learn is something you've probably heard a lot about. Teachers sometimes talk about the standard deviation of a test, and it's probably one of the most used terms in statistics. Unfortunately, I think a lot of people use it without fully appreciating everything it involves; hopefully we'll come to appreciate all of that soon. Once you know variance, the standard deviation is actually quite straightforward: it's the square root of the variance. The standard deviation of a population is written as sigma, and it's equal to the square root of the variance, which is why the variance is written as sigma squared. So sigma is the square root of all of that: the square root of the sum from i equals 1 to N of x sub i minus mu, squared, all over N. I'll probably run out of space, so I won't rewrite the whole thing neatly. Then, if you want the standard deviation of a sample, it actually gets a little bit interesting.
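Here is a minimal sketch of the two formulas just described: the sample variance, which divides by n minus 1, and the population standard deviation, which is the square root of the population variance. The function names and the data are hypothetical, for illustration only.

```python
import math


def sample_variance(sample):
    """s squared: squared distances from the sample mean x bar, divided by n - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    x_bar = sum(sample) / n
    return sum((x - x_bar) ** 2 for x in sample) / (n - 1)


def population_std(data):
    """sigma: the square root of the population variance (which divides by N)."""
    mu = sum(data) / len(data)
    return math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(sample_variance(values))     # 32 / 7, about 4.571
print(population_std(values))      # 2.0
```

Treating the same eight numbers as a sample gives a larger variance (32 / 7) than treating them as a population (32 / 8 = 4), because of the smaller denominator.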
The standard deviation of a sample, s, is equal to the square root of the sample variance. It actually turns out that s is not an unbiased estimator of sigma. I don't want to get too technical right now, but s squared is an unbiased estimator of sigma squared: its expected value is sigma squared. Taking the square root breaks that, so the expected value of s is not quite sigma. I'll go into more depth on expected values in the future, so you don't have to worry about it for now.

So why even talk about the standard deviation? For one thing, the units work out better. Let's say all of our data points were measured in meters, because we were taking a bunch of length measurements. Then the units of the variance would be meters squared: each difference, x sub i minus the mean, is meters minus meters, which is meters, and squaring it gives meters squared. It's kind of strange to say the average dispersion from the center is so many square meters. When you take the square root, you get something that's back in meters, so you can say the standard deviation is so many meters. We'll also learn later that if you can model your data as a bell curve, that is, if you assume your data has a bell-shaped distribution, then the standard deviation tells you some interesting things about the probability of finding a data point within one or two standard deviations of the mean. But I don't want to get too technical right now.

Let's just calculate a bunch. Say I have the numbers 1, 2, 3, 8, and 7, and let's say this is a population. What would its mean be? 1 plus 2 is 3, 3 plus 3 is 6, 6 plus 8 is 14, and 14 plus 7 is 21. So for the mean of this population, you sum up all the data points to get 21, and you divide by the total number of data points: one, two, three, four, five.
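The running sum above (3, 6, 14, 21) and the division by 5 can be reproduced with a short sketch. It uses `itertools.accumulate` from Python's standard library to show each partial total; the variable names are illustrative.

```python
from itertools import accumulate

data = [1, 2, 3, 8, 7]  # the five numbers, treated as a population

running_totals = list(accumulate(data))  # 1, 1+2, 1+2+3, ...
total = running_totals[-1]               # sum of all data points
mu = total / len(data)                   # divide by N = 5

print(running_totals)  # [1, 3, 6, 14, 21]
print(mu)              # 4.2
```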
21 divided by 5, which is equal to 4.20. Fair enough. Now we want to figure out the variance, and we're assuming this is the entire population. So the variance of this population is going to be the sum of the squared differences of each of these numbers from 4.20, divided by the number of data points. I'm going to have to get my calculator out. It's going to be 1 minus 4.20 squared, plus 2 minus 4.20 squared, plus 3 minus 4.20 squared, plus 8 minus 4.20 squared, plus 7 minus 4.20 squared. And all of that, and I know it looks a little funny, gets divided by the number of data points we have, which is 5.

So let me take the calculator out. Actually, I think the graphing calculator will be better, because I can see everything I'm writing. OK, let me clear this. I want 1 minus 4.20 squared, plus 2 minus 4.20 squared, plus 3 minus 4.20 squared, plus 8 minus 4.20 squared, and one more, plus 7 minus 4.20 squared. I'm just taking the sum of the squared distances from the mean. That sum, the numerator, is 38.80, so the variance is going to be 38.80 divided by 5.

Just so you can relate this to the formula, each of these terms is x sub i minus the mean, squared. So the numerator is the sum from i equals 1 to N of x sub i minus mu, squared, and that came out to 38.80. I just took each data point minus the mean, squared it, and added them all up. Then I divide by N, and N here is 5. And 38.80 divided by 5 is 7.76. So, scrolling down a little bit, the variance is equal to 7.76.
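The same calculator steps can be written out term by term. This sketch prints each squared difference from the mean, then the numerator and the population variance; values are formatted to two decimals to match the calculation above.

```python
data = [1, 2, 3, 8, 7]       # treated as the entire population
mu = sum(data) / len(data)   # 21 / 5 = 4.2

# Each term of the numerator: (x_i - mu) squared.
squared_diffs = [(x - mu) ** 2 for x in data]
for x, sq in zip(data, squared_diffs):
    print(f"({x} - {mu:.2f})^2 = {sq:.2f}")

numerator = sum(squared_diffs)     # sum of squared distances from the mean
variance = numerator / len(data)   # divide by N = 5
print(f"sum = {numerator:.2f}, variance = {variance:.2f}")
# last line printed: sum = 38.80, variance = 7.76
```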
Now, if the numbers 1, 2, 3, 8, and 7 weren't the population but a sample from a larger population, then instead of dividing by 5, we would divide by 4: 38.80 divided by n minus 1, which is 4. That would give us a sample variance of 9.70. But don't worry about that right now; the only change is the denominator, n minus 1 instead of n.

Once you have the variance, it's very easy to figure out the standard deviation: you just take the square root. The square root of 7.76 is about 2.79, so that's the standard deviation. This gives us some measure of how far the numbers typically are from the mean, which was 4.20, and it's in the units of the original measurements.

I'm almost out of time, but let's figure out one more thing. We said that if those numbers were a sample and not the population, the sample variance would be 9.70. Then the sample standard deviation is just the square root of that: the square root of 9.70, which is about 3.11. Hopefully that makes things a little more concrete. We've been dealing with sigma notation and variables so far, and when you actually do it with numbers, you see it's hopefully not that difficult. See you in the next video.
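To double-check the numbers in this example, and to illustrate the earlier claim that s squared is unbiased for sigma squared while s is not unbiased for sigma, here is a sketch using Python's standard `statistics` module: `variance` and `stdev` divide by n - 1, while `pvariance` and `pstdev` divide by N. The bias check assumes samples of size 2 are drawn with replacement (independent draws), so all 25 ordered pairs are equally likely; that sampling scheme is an assumption chosen for illustration, not something from the video.

```python
import statistics
from itertools import product

data = [1, 2, 3, 8, 7]

# Sample versions (divide by n - 1 = 4), as if the five numbers were a sample.
s_squared = statistics.variance(data)  # 38.80 / 4
s = statistics.stdev(data)             # square root of 9.70
# Population versions (divide by N = 5).
sigma_squared = statistics.pvariance(data)  # 38.80 / 5
sigma = statistics.pstdev(data)             # square root of 7.76

print(f"sample variance {s_squared:.2f}, sample std dev {s:.2f}")
print(f"population variance {sigma_squared:.2f}, population std dev {sigma:.2f}")
# sample variance 9.70, sample std dev 3.11
# population variance 7.76, population std dev 2.79

# Assumption: every ordered sample of size 2 drawn WITH replacement
# from the population, all 5 * 5 = 25 equally likely.
samples = list(product(data, repeat=2))
mean_s_squared = sum(statistics.variance(smp) for smp in samples) / len(samples)
mean_s = sum(statistics.stdev(smp) for smp in samples) / len(samples)

# Averaging s squared over all samples recovers sigma squared exactly,
# but averaging s falls short of sigma.
print(f"average s^2 = {mean_s_squared:.2f} vs sigma^2 = {sigma_squared:.2f}")
print(f"average s = {mean_s:.2f} vs sigma = {sigma:.2f}")
# average s^2 = 7.76 vs sigma^2 = 7.76
# average s = 2.15 vs sigma = 2.79
```

Samples of size 2 make the gap between the average of s and sigma unusually large; with bigger samples the bias shrinks, but it does not vanish.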