Let's say I'm trying to judge
how many years of experience we have at Khan Academy. Or, on average, how many
years of experience we have. And the
type of average we'll focus on is
the arithmetic mean. So I go and I survey
the folks there. And let's say this was when
Khan Academy was a smaller organization, when
there were only five people in the organization. And I'm surveying
the entire population of Khan Academy, because that's
what I care about: years of experience at our
organization. We're now 36 people, and I don't want to date this video
too much, but this was when we had five people. So let's say I go, and I say, OK, there's one
person straight out of college, they have one year
of experience, or recently out of
college, somebody with three years of
experience, someone with five years of
experience, someone with seven years of experience,
and someone very experienced, or reasonably experienced,
with 14 years of experience. So based on these data points,
and this is our population for years of experience, since I'm assuming that we
only have five people in the organization
at this point, what would be the
population mean for the years of experience? What is the mean years of
experience for my population? Well, we can just
calculate that. Our mean experience,
and I'm going to denote it with
mu, because we're talking about the
population now. This is a parameter
for the population. It's going to be equal to the
sum, from our first data point, data point one, all the way
to, in this case, data point five, since we have
five data points. So we're going to take
each of them: the first data point, the second data
point, the third data point, all the way to the fifth, and I'm going to divide it all
by the number of data points I have. So this is going to be
equal to x sub 1, plus x sub 2, plus x sub 3, plus
x sub 4, plus x sub 5, all of that over 5. And as we said, this is a
very fancy way of saying, I'm going to sum up
all of these things and then divide by the
number of things we have. So let's do that. Get the calculator out. So I'm going to add them
all up, 1 plus 3 plus 5-- I really don't need a calculator
for this-- plus 7 plus 14. So that's five data points. And I'm going to divide by 5. And I get 6. So the population
mean, for years of experience at my
organization, is 6. 6 years of experience. Well, that's, I
guess, interesting. But now I want to
ask another question. I want to get some
measure of how much spread there is around that mean. Or how much do the data
points vary around that mean. And obviously, I can give
someone all the data points. But instead, I actually
want to come up with a parameter that
somehow represents how much all of these things,
on average, are varying from this number right here. And I will call
that thing the variance. And this is a
population variance that I'm talking about; just
to be clear, it's a parameter. The population
variance I'm going to denote with the Greek letter
sigma, lowercase sigma-- this is capital sigma--
lowercase sigma squared. And I'm going to
say, well, I'm going to take the distance from each
of these points to the mean. And just so I get a positive
value, I'm going to square it. And then, I'm going to add up those squared distances and divide
by the number of data points that I have. So essentially,
I'm going to find the average squared distance. Now that might sound
very complicated, but let's actually work it out. So I'll take my first
data point and I will subtract our mean from it. So this is going to give
me a negative number. But if I square it, it's
going to be positive. So it's, essentially,
going to be the squared distance
between 1 and my mean. And then, to that,
I'm going to add the squared distance
between 3 and my mean. And to that, I'm going to add
the squared distance between 5 and my mean. And since I'm
squaring, it doesn't matter if I do 5
minus 6, or 6 minus 5. When I square it, I'm going
to get a positive result regardless. And then, to that
I'm going to add the squared distance
between 7 and my mean. So 7 minus 6 squared. In each of these, the 6
is my population mean that I'm finding
the difference from. And then, finally, the squared
difference between 14 and my mean. And then, I'm going
to find, essentially, the mean of these
squared distances. So I have five squared
distances right over here. So let me divide by 5. So what will I get when
I make this calculation, right over here? Well, let's figure this out. This is going to be equal
to 1 minus 6 is negative 5, negative 5 squared is 25. 3 minus 6 is negative 3, now
if I square that, I get 9. 5 minus 6 is negative 1, if I
square it, I get positive 1. 7 minus 6 is 1, if I square
it, I get positive 1. And 14 minus 6 is 8, if
I square it, I get 64. And then, I'm going to
divide all of that by 5. And I don't need to
use a calculator, but I tend to make a
lot of careless mistakes when I do things
while making a video. So I get 25 plus 9 plus 1
plus 1 plus 64, which is 100, divided by 5. So I get 20. So the average squared distance,
or the mean squared distance, from our population
mean is equal to 20. You may say, wait, these
things aren't 20 away. Remember, it's the
squared distance away from my population mean. So I squared each
of these things. I like that, because
it makes each one positive. And we'll see later that it has
other nice properties. Now the last thing
is, how can we represent this mathematically? We already saw that we know how
to represent a population mean, and a sample mean,
mathematically like this, and hopefully, we don't find
it that daunting anymore. But how would we do
the exact same thing? How would we denote what
we did, right over here? Well, let's just
think it through. We're just saying that for
the population variance, we're taking a sum
over each item. We'll
start with the first item, and we're going to go to the
n-th item in our population. We're talking about
a population here. And we're not going
to just take the item;
that would just be the item. We're going to take the item, and from that, we're going to
subtract the population mean.
And then we're going to square it. So the way I've
written it right now, this would just
be the numerator. I've just taken the sum
of each of these things, the sum of the squared differences
between each data point and the population
mean. If I really want to get
the variance the way I figured it out right
over here, I have to divide the whole thing by the
number of data points we have. So this might seem
very daunting, and very intimidating. But all it says is, first, figure out
your population mean. And then, from each data
point, in your population, subtract out that
population mean, square it, take the sum of all
of those things, and then just divide by the
number of data points you have. And you will get your
population variance.
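
If you want to check these numbers on a computer, here is a minimal sketch in Python that follows the same steps: figure out the population mean first, then take the mean of the squared distances from it. The function names population_mean and population_variance and the list name years are just names chosen for this illustration; they are not from the video. The last line cross-checks against pvariance from Python's standard statistics module.

```python
import statistics


def population_mean(values):
    """Arithmetic mean of a whole population: sum of the values over their count."""
    return sum(values) / len(values)


def population_variance(values):
    """Mean squared distance of each value from the population mean."""
    mu = population_mean(values)
    # Squaring makes every distance non-negative, whichever side of the mean it is on.
    squared_distances = [(x - mu) ** 2 for x in values]
    # Divide by the number of data points, because this is the entire population.
    return sum(squared_distances) / len(values)


# Years of experience for the five-person population in the example.
years = [1, 3, 5, 7, 14]

mu = population_mean(years)
sigma_squared = population_variance(years)

print(mu)             # 6.0
print(sigma_squared)  # 20.0

# Cross-check against the standard library's population variance.
assert statistics.pvariance(years) == sigma_squared
```

Note that this divides by the number of data points because we surveyed the entire population, which is the case worked through here.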