Population standard deviation
The population standard deviation is a measure of how much variation there is among individual data points in a population. It's a way of quantifying how spread out the data is from its mean. A small standard deviation means that the data points are generally close to the mean, while a large standard deviation means that the data is more dispersed. Created by Sal Khan.
Let's say that you're curious about the dimensions of the cars that happen to sit in a parking lot, and so you measure their lengths. To keep the computation simple, let's say that there are five cars in the parking lot, so the entire population that we care about has size 5. You go and measure their lengths: one car is 4 meters long, another car is 4.2 meters long, another car is 5 meters long, the fourth car is 4.3 meters long, and then, let's say, the fifth car is 5.5 meters long.

So let's come up with some parameters for this population. The first one that you might want to figure out is a measure of central tendency, and probably the most popular one is the arithmetic mean. So let's calculate that first. We're doing it for the population, so we're going to use mu. So what is the arithmetic mean here? Well, we just have to add all of these data points up and divide by 5. I'll get the calculator out just so it's a little bit quicker. This is going to be 4 plus 4.2 plus 5 plus 4.3 plus 5.5. And then, I'm going to take that sum and divide by 5. And I get an arithmetic mean for my population of 4.6. And if we want to put some units there, it's 4.6 meters.

Now, that's the measure of central tendency. We also might be curious about how dispersed the data is, especially from that central tendency. So what would we use? Well, we already have a tool at our disposal: the population variance. The population variance is one of many ways of measuring dispersion. It has some very neat properties the way we've defined it, as the mean of the squared distances from the mean, and it tends to be a useful way of doing it. So let's actually calculate the population variance for this population right over here. Well, all we need to do is find the distance from each of these points to our mean right over here. And then, square them.
And then, take the mean of those squared distances. So let's do that. It's going to be (4 minus 4.6) squared plus (4.2 minus 4.6) squared plus (5 minus 4.6) squared plus (4.3 minus 4.6) squared. And then, finally, since I'm running out of space, plus (5.5 minus 4.6) squared. And then, we're going to divide all of that by 5 to get our population variance.

And so what's that going to give us? Let's get our calculator out. 4 minus 4.6 is negative 0.6, and negative 0.6 squared is going to be the exact same thing as 0.6 squared. So let me write that as 0.6 squared. Then 4.2 minus 4.6 is negative 0.4. But when we square it, the negative's going to disappear, so I'll just write plus 0.4 squared. And then, we have 5 minus 4.6. That's 0.4, so plus 0.4 squared. 4.3 minus 4.6 is negative 0.3. The negative goes away when you square it, so it's going to be plus 0.3 squared. And then, finally, 5.5 minus 4.6 is going to be 0.9, so plus 0.9 squared. Then, we divide by the number of data points we have. And we get 0.316.

Now, let me ask you what is a mildly interesting question, since we happen to care about units in this video: what would be the units for this population variance? Well, up here, this is 4 meters minus 4.6 meters, 4.2 meters minus 4.6 meters, and so on. These are all measurements in meters. When you subtract them, you'll get meters. But then when you square them, you get meters squared plus meters squared plus meters squared plus meters squared plus meters squared. And then, you're just dividing that by a unitless count of the number of data points you have. So the units here are going to be square meters. And so you might say, hey, that's kind of a weird unit if we're trying to visualize or think about how dispersed we are from the mean.
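The arithmetic so far can be reproduced with a short Python sketch. This is only an illustration of the steps described in the transcript, not something shown in the video; the variable names (`lengths`, `mu`, `squared_distances`, `variance`) are made up for this example.

```python
# Car lengths in meters, taken from the parking-lot example.
lengths = [4.0, 4.2, 5.0, 4.3, 5.5]

n = len(lengths)        # population size, 5
mu = sum(lengths) / n   # population mean, in meters

# Squared distance of each data point from the mean (units: square meters).
squared_distances = [(x - mu) ** 2 for x in lengths]

# Population variance: the mean of the squared distances.
# Note that we divide by n, the full population size.
variance = sum(squared_distances) / n

print(f"mean = {mu:.1f} m")                          # mean = 4.6 m
print(f"population variance = {variance:.3f} m^2")   # population variance = 0.316 m^2
```

The values are formatted to a fixed number of decimals because floating-point arithmetic may leave tiny rounding errors in the last digits.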
When I visualize it, I visualize dispersion, or how varied they are, in terms of meters, not meters squared. So what could we do? A big hint comes out of just the notation for variance: it's this sigma symbol, squared. So why don't we just take the square root of our variance? Which we will denote with just a sigma. It makes a lot of sense. And in this case, what's it going to be? It's going to be the square root of 0.316. And then, what are the units going to be? It's going to be just meters. So let me take the square root of 0.316. And I get, rounding to the nearest thousandth, 0.562. So this is approximately 0.562 meters.

So you might be saying, Sal, what do we call this thing that we just did, the square root of the variance? And here we're dealing with the population. We haven't thought about sampling yet. The square root of the population variance, what do we call this thing right over here? It's a very familiar term. Oftentimes, when you take an exam, this is calculated for the scores on the exam. Let me do this in a new color, since I'm using that yellow a little bit too much. This is the population standard deviation.

It is a measure of how much the data is varying from the mean. In general, the larger this value, the more varied the data is from the population mean. The smaller it is, the less varied. And the way we've defined variance is somewhat arbitrary. We could have taken things to the fourth power. We could have done other things. We could have not taken them to a power but taken the absolute value here. The reason why we do it this way is that it has neat statistical properties as we try to build on it. But that's the population standard deviation, which gives us nice units: meters. In the next video, we'll think about the sample standard deviation.
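As a rough check on the numbers above, here is a minimal Python sketch that uses the standard library's `statistics.pvariance` and `statistics.pstdev`. It also computes the mean absolute deviation, purely as an illustration of the "absolute value" alternative mentioned above; that comparison and the variable names are assumptions added for this example, not part of the video.

```python
import math
import statistics

# Car lengths in meters, taken from the parking-lot example.
lengths = [4.0, 4.2, 5.0, 4.3, 5.5]

# Population variance (square meters) and its square root (meters).
variance = statistics.pvariance(lengths)
sigma = math.sqrt(variance)

# statistics.pstdev computes the population standard deviation directly.
sigma_direct = statistics.pstdev(lengths)

# Illustrative alternative: average the absolute distances instead of
# the squared ones. This is a different measure of spread, in meters.
mu = statistics.mean(lengths)
mean_abs_dev = sum(abs(x - mu) for x in lengths) / len(lengths)

print(f"population standard deviation = {sigma:.3f} m")   # 0.562 m
print(f"pstdev agrees: {math.isclose(sigma, sigma_direct)}")
print(f"mean absolute deviation = {mean_abs_dev:.2f} m")  # 0.52 m
```

Both measures come out in meters, which is the point made above about units; they differ in value because squaring weights the larger distances more heavily.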