# Standard deviation

Learn how to calculate standard deviation, how it relates to variance and mean, and the difference between population and sample standard deviation. Created by Sal Khan.

Video transcript

Let's review a little bit of
everything we learned so far. And hopefully make
everything fit together a little bit better. And then we'll do a bunch of
calculations with real numbers. And I think it'll really
hit the point home. So first of all, let
me make some columns. So if we're dealing
with-- let's see. We could call it the concept. And then we'll call
it the-- whether we're dealing with a
population or a sample. So the first statistical
concept we came up with was the notion of the mean
or the central tendency. And we learned that
that was one way to measure the average or
central tendency of a data set. The other ways were the
median and the mode. But the mean tends to show up
a lot more, especially when we start talking about
variances, and we'll do in this video, the
standard deviation. But the mean of a
population, we learned-- we use the Greek letter mu--
is equal to the sum of each of the data points
in the population. That's an i. Let me make
sure it looks like an i. So you're going to sum up
each of those data points. You're going to start
with the first one. And you're going to
go to the Nth one. We're assuming that there are N
data points in the population. And then you divide by the
total number that you have. And this is like the
average that you were used to taking before you
learned any of the statistics stuff. You add up all the data points. And you divide by
how many there are. The sample is the same thing. We just use a slightly
different terminology. The mean of a sample-- and I'll
do it in a different color. Just write it as x
with a line on top. And that's equal to
the sum of all the data points in the sample. So each of the x
sub i's in the sample. But we're assuming the
sample is something less than a population. So you start with
the first one still. And then you go to
the lowercase n, where we assume that lowercase
n is less than the big N. If this was the same thing,
then we're just actually taking the average, or we're taking the
mean, of the entire population. And then you divide by
the number of data points you added. You get to n. Then we said, OK, this gives
us the central tendency. It's one measure of
the central tendency. But what if we wanted to
know how good of an indicator this is for the population
or for the sample. Or on average, how far are the
data points from this mean? And that's where we came up
with the concept of variance. And I'll arbitrarily
switch colors again. And in a population,
the variable, or the notation for variance,
is sigma squared. This means variance. And that is equal to-- you
take each of the data points. You find the difference
between that and the mean that you calculate up there. You square it. So you get the
squared difference. And then you essentially take
the average of all of these. You take the average of all
of these squared distances. So that's if you take the sum
from i is equal to one to N. And you divide it by
N. That's the variance. And then the
variance of a sample, and this was a little
bit more interesting. And we talked a little bit
about it in the last video. You actually
want to estimate the
variance of the population when you're taking the
variance of a sample. And in order to provide
an unbiased estimate, you do something
very similar to here. But you end up
dividing by n minus 1. So let me write that down. So the variance of a
sample,
or sample variance, or unbiased sample
variance-- and that's why we're going to
divide by n minus 1. That's denoted by s squared. What you do is you take
the difference between each of the data points in the
sample and the sample mean. We assume that we don't
know the population mean. Maybe we did. If we knew the population
mean, we actually wouldn't have to do the
unbiased thing that we're going to do here
in the denominator. But when you have a sample,
the only way to kind of figure out the population
mean is to estimate it with the sample mean. So we assume that we only
have the sample mean. And you're going
to square those. And then you're
going to sum them up. Sum them up from i is equal
to one to i is equal to n. Because you have n data points. And if you want an
unbiased estimator, you divide by n minus 1. And we talked a little
bit before about why you want this to be n
minus 1 instead of n. And actually, in a
couple of videos, I'll actually show this to you. First I'll demonstrate it, maybe
experimentally, using Excel, which
wouldn't be a proof. But it'll just give you a
little bit of intuition. And then I'll actually prove
it a little bit more formally later on. But you don't have to
worry about it right now. Now, the next thing
we'll learn is something that you've probably
heard a lot of. Sometimes,
in class, teachers talk about the standard
deviation of a test. It's actually, probably,
one of the most used terms in statistics. I think a lot of
people, unfortunately, use it
without fully appreciating everything that it involves. But
we'll, hopefully, appreciate all
that it involves soon. So, the standard deviation: once you know variance, it's
actually quite straightforward. It's the square root
of the variance. So the standard
deviation of a population is written as sigma which
is equal to the square root of the variance. And now, I think,
you understand why variance is written
as sigma squared. And that is equal to just
the square root of all that. It's equal to the square
root-- I'll probably run out of space--
of all of that. So I won't write the
limits at the top and the bottom of the sum. That makes it messy. The sum of xi minus mu, squared. Everything over N. And then, if you wanted the
standard deviation of a sample, it actually gets a
little bit interesting. The standard
deviation of a sample, s, is equal to
the square root of the variance of a sample. It actually turns
out that s is not an unbiased estimator for sigma. And I don't want to get too
technical about it right now. But the sample variance, s squared, is actually a
very good estimate of the population variance, sigma squared. The expected value of
s squared is going to be sigma squared. And I'll go into more depth on
expected values in the future. But it turns out that
the expected value of s is not quite the same as sigma. But you don't have to
worry about it for now. So why even talk about
the standard deviation? Well, one, the units
work out a little better. If, let's say, all
of our data points were measured in meters, right? If we were taking a bunch
of measurements of length, then the units of the variance
would be meters squared, right? Because we're taking
meters minus meters. This would be in meters. And then you're squaring it. You're getting meters squared. And that's kind of a
strange concept if you say, the average dispersion from the
center is in meters squared. So, first, when you
take the square root of it, you get something
that's, again, in meters. So you're kind of
saying, oh, well, the standard deviation
is x or y meters. And then we'll
learn a little bit later that if you can actually model
your data as a bell curve, or if you assume that your data
has the distribution of a bell curve, then this tells you
some interesting things about the
probability of finding a data point within one or two standard
deviations of the mean. But, anyway, I don't want to
get too technical right now. Let's just calculate a bunch. Let's say
I had the numbers 1, 2, 3, 8, and 7. And let's say that
this is a population. So what would its mean be? So I have 1 plus 2 plus 3. So it's 3 plus 3 is 6. 6 plus 8 is 14. 14 plus 7 is 21. So the mean of this population,
you sum up all the data points. You get 21 divided by the
total number of data points. One, two, three, four, five. 21 divided by 5 which
is equal to what? 4.20. Fair enough. Now, we want to figure
out the variance. And we're assuming that this
is the entire population. So the variance of this
population is going to be equal to the sum of the squared
differences of each of these numbers from 4.20. I'm going to have to
get my calculator out. So it's going to be
1 minus 4.20 squared, plus 2 minus 4.20 squared,
plus 3 minus 4.20 squared, plus 8 minus 4.20 squared,
plus 7 minus 4.20 squared. And it's going to
be all of that-- and I know it looks
a little bit funny-- divided by the number of data
points we have, divided by 5. So let me take the
calculator out. All right. Here we go. Actually, maybe I should have
used the graphing calculator that I have. Let me see if I
can get this thing, if I could get this-- Oh. There you go. Yeah, I think the graphing
one will be better. Because I can see
everything that I'm writing. OK, so let me clear this. So I want to take 1 minus
4.20 squared, plus-- let me write it down--
plus 2 minus 4.20 squared, plus 3 minus 4.20 squared, plus
8 minus 4.20 squared, right? I'm just taking the sum
of the squared distances from the mean. One more, plus 7
minus 4.20 squared. So that's the sum. The sum is 38.80,
so the numerator. So this is going to be
equal to 38.80 divided by 5. So this is the sum of the
squared distances, right? Each of these, just
so you can relate to the formula, each of these
is xi minus the mean, squared. And so if we take the sum
of all of them, right? This numerator is
the sum of each of the xi minus the mean,
squared, from i equals 1 to N. And that ended up being 38.80. And I just calculated
it like that. I just took each of the data
points minus the mean squared, added them all up. And I got 38.80. And I want to divide
it by N which is 5. So this N up here is
actually, also, 5. Right? And so 38.80 divided
by 5 is 7.76. So let me scroll
down a little bit. The variance is equal to 7.76. Now, if this was a sample
from a larger population, if the 1, 2, 3, 8, and 7
weren't the whole population,
then instead of dividing by 5,
we would have divided by 4. And we would have gotten
the variance as 38.80 divided by n minus
1, which is 38.80 divided by 4. So then we would have gotten
a sample variance of 9.70. That's what you get if you divide
by n minus 1 instead of n. But, anyway, don't worry
about that right now. That's just a change
of denominator. But once you have the variance,
it's very easy to figure out the standard deviation. You just take the
square root of it. The square root of 7.76 is about 2.786. Let's say 2.79 is the
standard deviation. So this gives us some
measure of, on average, how far the numbers are away
from the mean which was 4.20. And it gives it in
kind of the units of the original measurement. Anyway, I'm all out of time. I'll see you in the next video. Well, actually,
let's figure out one more thing. We said if this was a sample,
if those numbers were a sample and not the population, then
we figured out that the sample variance was 9.70. And so then the sample
standard deviation is just going to be the
square root of that. The square root of 9.70,
which would be about 3.11. Anyway, hopefully that makes
it a little bit more concrete. We've been dealing with
these sigma notation variables and all that so far. So when you actually
do it with numbers you see it's, hopefully,
not that difficult. Anyway, see you
in the next video.
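
The calculator work in the transcript can be reproduced with a short Python sketch. This is not part of the video. The helper names (`mean`, `sum_sq_dev`, `population_variance`, `sample_variance`) are illustrative assumptions, and the standard library's `statistics` module is used only as a cross-check. It treats 1, 2, 3, 8, 7 first as a whole population (divide by N) and then as a sample (divide by n minus 1).

```python
import math
import statistics

# The five numbers from the worked example in the video.
data = [1, 2, 3, 8, 7]


def mean(xs):
    """Arithmetic mean: add up the data points, divide by how many there are."""
    return sum(xs) / len(xs)


def sum_sq_dev(xs):
    """Sum of the squared differences between each data point and the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def population_variance(xs):
    """Sigma squared: the squared differences divided by N."""
    return sum_sq_dev(xs) / len(xs)


def sample_variance(xs):
    """s squared: the squared differences divided by n - 1."""
    return sum_sq_dev(xs) / (len(xs) - 1)


mu = mean(data)
pop_var = population_variance(data)
samp_var = sample_variance(data)
pop_sd = math.sqrt(pop_var)    # population standard deviation (sigma)
samp_sd = math.sqrt(samp_var)  # sample standard deviation (s)

print(f"mean                = {mu:.2f}")                # 4.20
print(f"sum of squared diff = {sum_sq_dev(data):.2f}")  # 38.80
print(f"population variance = {pop_var:.2f}")           # 7.76
print(f"sample variance     = {samp_var:.2f}")          # 9.70
print(f"population std dev  = {pop_sd:.2f}")            # 2.79
print(f"sample std dev      = {samp_sd:.2f}")           # 3.11

# Cross-check the hand-written helpers against the standard library.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))
```

The only difference between the two variances is the denominator, which is why both share the same `sum_sq_dev` helper.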