# Population standard deviation

Created by Sal Khan.

Video transcript

Let's say that you're
curious about studying the dimensions of
the cars that happen to sit in the parking lot. And so you measure
their lengths. Let's just make the
computation simple. Let's say that there are
five cars in the parking lot. The entire size of the
population that we care about is 5. And you go and measure
their lengths-- one car is 4 meters long,
another car is 4.2 meters long, another car is 5 meters
long, the fourth car is 4.3 meters long,
and then, let's say the fifth car is
5.5 meters long. So let's come up with some
parameters for this population. So the first one that you
might want to figure out is a measure of
central tendency. And probably the most popular
one is the arithmetic mean. So let's calculate that first. So we're going to do
that for the population. So we're going to use mu. So what is the
arithmetic mean here? Well, we just have to add
all of these data points up and divide by 5. And I'll just get the
calculator out just so it's a little bit quicker. This is going to be 4 plus
4.2 plus 5 plus 4.3 plus 5.5. And then, I'm going to take
that sum and then divide by 5. And I get an arithmetic mean
for my population of 4.6. So that's fine. And if we want to put some
units there, it's 4.6 meters. Now, that's the central tendency
or measure of central tendency. We also might be curious about
how dispersed is the data, especially from that
central tendency. So what would we use? Well, we already have a
tool at our disposal-- the population variance. And the population
variance is one of many ways of
measuring dispersion. It has some very neat
properties the way we've defined it as the mean
of the squared distances from the mean. It tends to be a
useful way of doing it. Let's actually calculate
the population variance for this population
right over here. Well, all we need to
do is find the distance from each of these points
to our mean right over here. And then, square them. And then, take the mean of
those squared distances. So let's do that. So it's going to be
4 minus 4.6 squared plus 4.2 minus 4.6 squared
plus 5 minus 4.6 squared plus 4.3 minus 4.6 squared. And then, finally--
I'm running out of space-- plus 5.5
minus 4.6 squared. And then, we're going to
divide all of that by 5 to get our population variance. And so what's that
going to give us? Let's get our calculator out. 4 minus 4.6 squared. That's negative 0.6 squared. Negative 0.6 squared
is going to be the exact same thing
as 0.6 squared. So let me write
that as 0.6 squared plus 4.2 minus 4.6
is negative 0.4. But when we square it, the
negative's going to disappear. So it's going to be plus 0.4. I'll just write 0.4 squared. And then, we have 5 minus 4.6. That's 0.4 so plus 0.4 squared. 4.3 minus 4.6. It's negative 0.3. The negative goes away
when you square it. It's going to be
plus 0.3 squared. And then, finally, 5.5 minus
4.6 is going to be 0.9. So plus 0.9 squared. Then, we will divide by the
number of data points we have. And we get 0.316. Or if we want to write it,
this is going to be 0.316. Now, let me ask you what is a
mildly interesting question-- what would be the units for
this population variance? Since we happen to care
about units in this video. Well, up here, this is 4
meters minus 4.6 meters. 4.2 meters minus 4.6 meters. So these are all meters. These are measurements
in meters. We saw it up here. So these are all
measurements in meters. When you subtract them,
you'll get meters. But then when you
square them, you get meters squared plus meters
squared plus meters squared plus meters squared
plus meters squared. And then, you're just dividing
that by a unitless count of the number of
data points you have. So the units here are
going to be square meters. And so you might say, hey. That's kind of a
weird unit if we're trying to visualize
or think about how dispersed we are from the mean. When I visualize it,
I visualize dispersion or how varied they are in terms
of meters, not meters squared. So what could we do? And a big hint-- this
comes out of just even the notation for variance. And it's this sigma
symbol squared. So why don't we just take the
square root of our variance? Which we will denote
with just a sigma. It makes a lot of sense. And in this case,
what's it going to be? It's going to be the
square root of 0.316. And then, what are
the units going to be? It's going to be just meters. And we end up with-- so let me
take the square root of 0.316. And I get 0.56-- I'll
just round to the nearest thousandth-- 0.562. So this is approximately
0.562 meters. So you might be
saying, Sal, what do we call this thing
that we just did? The square root of the variance. And here we're dealing
with the population. We haven't thought
about sampling yet. The square root of the
population variance, what do we call this
thing right over here? And this is a very
familiar term. Oftentimes, when
you take an exam, this is calculated for
the scores on the exam. This is our population-- let
me do this in a new color. I'm using that yellow
a little bit too much. This is the population
standard deviation. It is a measure of how much the
data is varying from the mean. In general, the
larger this value, that means that the data is
more varied from the population mean. The smaller, it's less varied. And these are all somewhat
arbitrary choices in how we've defined variance. We could have taken things
to the fourth power. We could have done other things. We could have not
taken them to a power but taken the
absolute value here. The reason why we
do it this way is it has neat statistical properties
as we try to build on it. But that's the population
standard deviation, which gives us nice
units-- meters. In the next video, we'll think
about the sample standard deviation.
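
To check the arithmetic from this video, here is a minimal Python sketch of the same three steps: the population mean, the population variance (the mean of the squared distances from the mean), and the population standard deviation (the square root of the variance). The function and variable names are just illustrative choices, not anything from the video. The cross-check at the end uses `pvariance` and `pstdev` from Python's standard `statistics` module.

```python
import math
import statistics

# Car lengths in meters from the parking-lot example.
# These five cars are the entire population, so N = 5.
lengths = [4.0, 4.2, 5.0, 4.3, 5.5]


def population_mean(data):
    """Arithmetic mean: add the data points and divide by how many there are."""
    return sum(data) / len(data)


def population_variance(data):
    """Mean of the squared distances from the population mean (divide by N)."""
    mu = population_mean(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def population_std_dev(data):
    """Square root of the population variance, in the same units as the data."""
    return math.sqrt(population_variance(data))


mu = population_mean(lengths)
variance = population_variance(lengths)
sigma = population_std_dev(lengths)

print(f"mean     = {mu:.1f} m")          # 4.6 m
print(f"variance = {variance:.3f} m^2")  # 0.316 m^2
print(f"std dev  = {sigma:.3f} m")       # 0.562 m

# Cross-check against the standard library's population versions.
assert math.isclose(variance, statistics.pvariance(lengths))
assert math.isclose(sigma, statistics.pstdev(lengths))
```

Note that the variance comes out in square meters and the standard deviation in meters, which matches the units argument above. Dividing by `len(data)` is what makes these the population versions.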