© 2023 Khan Academy

# Review and intuition why we divide by n-1 for the unbiased sample variance

Reviewing the population mean, sample mean, population variance, sample variance and building an intuition for why we divide by n-1 for the unbiased sample variance. Created by Sal Khan.
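For reference, the five quantities reviewed in the video can be written out as below. This is a sketch based on the verbal descriptions in the transcript: capital $N$ is the population size and lowercase $n$ is the sample size. The subscript $n$ for the biased version follows the transcript, while the subscript $n-1$ for the unbiased version is a labeling assumption made here for symmetry.

$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

$$
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad
s_n^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
s_{n-1}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
$$

The two sample variances share the same numerator, so they differ only by the factor $n/(n-1)$.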

## Want to join the conversation?

- n-1 is a better estimator in small samples, but does it compensate in larger samples?(93 votes)
- The larger the sample is, the closer its variance will match the variance of the population, so the less compensation is needed. The n-1 deals with this perfectly since the difference between n and n-1 becomes negligible as n becomes large.(280 votes)
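To see how quickly the difference between dividing by n and dividing by n-1 fades, here is a minimal Python sketch. The function name `correction_factor` is made up for this illustration; it returns n / (n - 1), the factor by which the n-1 version exceeds the n version for the same sample.

```python
def correction_factor(n: int) -> float:
    """Ratio of the (n - 1)-divisor variance to the n-divisor variance: n / (n - 1)."""
    if n < 2:
        raise ValueError("need at least two data points")
    return n / (n - 1)


# The factor is large for tiny samples and approaches 1 as n grows.
for n in (2, 5, 10, 100, 1000):
    print(f"n = {n:>4}: unbiased / biased = {correction_factor(n):.4f}")
```

With two data points the n-1 version is twice the n version; with a thousand data points the two differ by about a tenth of a percent.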

- We could as well have randomly chosen a particular sample whose mean sits ABOVE µ, which would make our sample mean bigger than the actual population mean.

In order to make it smaller, wouldn't a correction of "n+1" in the denominator be just as likely? Why do we assume that we're UNDERestimating?(130 votes)
- I think it has more to do with the spread of the sample versus the spread of the population than with whether the sample mean is larger or smaller than the population mean. To illustrate this, at the end of the video Sal circles a sample of three points all bunched up at the left end of the line. He then says that the sample mean would sit within this sample, and therefore the variance of the sample would be smaller than that of the entire population. However, this is not because the sample mean is smaller than the population mean. It's because the spread of the sample is really small, so the distance between any of the points and the sample mean is really small, and so the sample variance is small. When you compare this sample variance to the population variance, you find that the sample variance is much smaller: in the sample the data points are really close to each other, while the population is made up of far-ranging data from the far left to the far right, so their distances to the mean are inevitably larger. The spread of most samples is likely to be smaller than the spread of the complete population, and therefore the variance will be smaller as well. If you think about it this way, you find that even if the sample mean is larger than the population mean, the sample variance will still tend to be an understatement, because the sample data will not be as spread out as the population data. Hope it made sense.(177 votes)

- (9:21) So why not take n-2 (or even something smaller) to make the variance even bigger...?(56 votes)
- I'll have a go at explaining the intuition behind the "1" in "n-1":

Think of the whole equation as the average amount of variation. If this is truly what the equation is measuring, then it should be (total amount of variation)/(number of things that can vary), since an average, i.e. a mean, is always Total/(Number of things).

Look at the numerator and the denominator in the sample variance equation. Is the following true?

-The numerator is a measure of the total amount of variation

-The denominator is the amount of things that are able to vary.

Yes. Why!? I mean, surely there are n things that can vary about xbar, i.e. the sample mean. Well, actually, no there aren't. There are n things that can vary about the population mean, but only n-1 that can vary about the sample mean. Here's an example of why this is so:

Say you have 3 data points.

- You calculate the sample mean and it comes out to be 2.

- The first data point could be anything, let's say it is 1.

- The second data point could be anything, let's say it is 3.

- What can the third data point be? It absolutely MUST be 2. It is not free to vary - the sum of the three scores must be 6 or else the sample mean is not 2.

Knowing n-1 scores and the sample mean uniquely determines the last score so it is NOT free to vary. This is why we only have "n-1" things that can vary. So the average variation is (total variation)/(n-1).

Total variation is just the sum of each point's variation from the mean. The measure of variation we are using is the square of the distance. Why do we use the square of the distance? Well, that is a topic for another day.(51 votes)
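The three-point example above can be checked with a short Python sketch. The helper name `last_score` is hypothetical; it just rearranges the definition of the sample mean to recover the one score that is not free to vary.

```python
def last_score(known_scores, sample_mean):
    """Return the one remaining score forced by the other scores and the sample mean."""
    n = len(known_scores) + 1  # total number of data points, including the unknown one
    return n * sample_mean - sum(known_scores)


# Two scores (1 and 3) were chosen freely; the sample mean is 2.
third = last_score([1, 3], 2)
print(third)

# Deviations from the sample mean always sum to zero, which is the
# constraint that removes one degree of freedom.
scores = [1, 3, third]
deviations = [x - 2 for x in scores]
print(sum(deviations))
```

Once the sample mean is fixed, any n-1 of the scores determine the last one, which is the sense in which only n-1 things can vary.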

- It seems the confusion comes not from the math but from the language we're using to describe it. When we say "unbiased", what we actually mean is less biased, correct?(20 votes)
- No, it is unbiased - it has no bias in either direction, too large or too small and is not biased by sample size. Note that just because it is unbiased doesn't mean it necessarily gives you the right value for the population variance and it says nothing about the bias of the person doing the statistics.(38 votes)

- What is the difference between a parameter and a statistic?(20 votes)
- From what I could tell, it seems to be your level of confidence. If your data covers the whole population, you can be almost absolutely sure those numbers reflect the reality of that population, so you give that value a name that denotes that confidence: a parameter. But if you are sampling a population and inferring about the whole population from that, your level of confidence must be lower, and so you reflect that level of confidence in your language: a statistic. I think then you can start to ask questions like, "Well, how confident are we in this statistic?" That is my take on it. There may be a deeper definition, but I think that is the core.(8 votes)

- Does this also have a connection with the degrees of freedom?(22 votes)
- Yes. The reason n-1 is used is because that is the number of degrees of freedom in the sample. The sum of each value in a sample minus the mean must equal 0, so if you know what all the values except one are, you can calculate the value of the final one.(24 votes)

- Why can't we just use n-2 or n-3 etc. if we want to get a bigger answer?(10 votes)
- Because the point of using n-1 isn't to get a bigger value. Even when we use n, we will sometimes be overestimating the population variance ( σ² ) , and even when using n-1, we will sometimes be underestimating σ².

The purpose of using n-1 is so that our estimate is "unbiased" *in the long run*. What this means is that if we take a second sample, we'll get a different value of s². If we take a third sample, we'll get a third value of s², and so on. We use n-1 so that the average of all these values of s² is equal to σ².

If we only used n, then in this long run view, we'd be generally (but not always) underestimating σ². If we used n-2 or n-3, etc, then we'd be generally overestimating σ². Using n-1 is the sweet spot (we know this from the theory as well as from simulations).(18 votes)
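The "long run" view can be made exact for a tiny example. The sketch below is an illustration under stated assumptions, not anything shown in the video: the five population values are made up, the sample size is 4, and samples are drawn with replacement so that every ordered sample is equally likely. It averages the estimate over every possible sample for the divisors n, n-1 and n-2.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 4, 7, 11]  # hypothetical values, chosen only for illustration
n = 4                          # sample size (assumption)


def mean(values):
    return Fraction(sum(values), len(values))


mu = mean(population)
sigma2 = mean([(x - mu) ** 2 for x in population])  # population variance (divide by N)


def average_estimate(divisor):
    """Average of sum((x - xbar)^2) / divisor over every equally likely sample."""
    total = Fraction(0)
    count = 0
    for sample in product(population, repeat=n):  # all ordered samples, with replacement
        xbar = mean(sample)
        total += sum((x - xbar) ** 2 for x in sample) / divisor
        count += 1
    return total / count


for divisor in (n, n - 1, n - 2):
    ratio = average_estimate(divisor) / sigma2
    print(f"divide by {divisor}: long-run average = {ratio} x sigma^2")
```

Exact fractions are used so the comparison does not depend on rounding. Under these assumptions, dividing by n averages out below σ², dividing by n-2 averages out above it, and dividing by n-1 lands on σ² exactly.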

- What if we overestimate the mean? Isn't that as likely as underestimating?(7 votes)
- It's not about over- or underestimating the mean; it's about how the distribution of the sample relates to the distribution of the population.(7 votes)

- A little confusion... the video assumes that my samples would usually be close to each other and hence the variance would be less than my population variance. However, if you assume an unbiased sample, wouldn't it also assume there is a 50% chance that my sample variance exceeds my population variance as well? Also, why (n-1)? Why not (n-2) and so on?(8 votes)
- You couldn't possibly have more than the variance between the true population mean and the two most extreme individuals at either end of the scale in any sample, which at greatest possible variance would be a sample size of two, with those two samples being those extreme individuals (say the Bond villain Jaws and Danny DeVito).

As to n-2 (I have read somewhere about n-1.5 being used), we need to think about how different sizes of n affect the resulting value. With large n values, the sample sizes that give greater confidence, the impact of the -1 is almost nothing. With tiny sample sizes, which we know almost instinctively do not accurately represent the population, that -1 really widens the margins of the sample standard deviation.(5 votes)

- I can't understand how subtracting 1 from n unbiases the mean. What if your samples were all higher on the number line than the population mean?(5 votes)
- Please watch the video here:

https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/simulation-showing-bias-in-sample-variance

Key point to remember: This is statistics, which deals with a lot of averages. So yes, you might get really unlucky and get all your samples above the mean. But, assuming you repeat this experiment many, many times, you'll see that subtracting 1 gives you the right population variance on average.(6 votes)

## Video transcript

What I want to do in
this video is review much of what we've already talked
about and then hopefully build some of the intuition on
why we divide by n minus 1 if we want to have an unbiased
estimate of the population variance when we're calculating
the sample variance. So let's think
about a population. So let's say this is the
population right over here. And it is of size
capital N. And we also have a sample of
that population, so a sample of that population. And in its size, we have
lowercase n data points. So let's think about all of
the parameters and statistics that we know about so far. So the first is the idea
of the mean, of the mean. So if we're trying to calculate
the mean for the population, is that going to be a
parameter or a statistic? Well, when we're trying to
calculate it on the population, we are calculating a parameter. We are calculating a parameter. So let me write this down. So this is going to be--
so for the population we are calculating a parameter. It is a parameter. And when we calculate, when we
attempt to calculate something for a sample we would call
that a statistic-- statistic. So how do we think about
the mean for a population? Well, first of all, we denote
it with the Greek letter mu. And we essentially take every
data point in our population. So we take the sum
of every data point. So we start at the
first data point and we go all the way to
the capital Nth data point. So every data point we add up. So this is the i-th
data point, so x sub 1 plus x sub 2 all the
way to x sub capital N. And then we divide by the total
number of data points we have. Well, how do we calculate
the sample mean? Well, the sample mean--
we do a very similar thing with the sample. And we denote it with
an x with a bar over it. And that's going to be taking
every data point in the sample, so going up to a lower
case n, adding them up --so these are the sum of all
the data points in our sample-- and then dividing by
the number of data points that we actually had. Now, the other thing
that we're trying to calculate for the population,
which was a parameter, and then we'll also try to
calculate it for the sample and estimate it
for the population, was the variance, which was
a measure of how dispersed the data points are, or how much
they vary from the mean. So let's write variance
right over here. And how do we denote
and calculate variance for a population? Well, for a population, we'd
say that the variance --we use the Greek letter sigma
squared-- is equal to-- and you can view it as the
mean of the squared distances from the population mean. But what we do is we
take, for each data point, so i equal 1 all
the way to capital N, we take that data point, subtract
from it the population mean. So if you want to
calculate this, you'd want to figure this out. Well, that's one way to do it. We'll see there's
other ways to do it, where you can calculate
them at the same time. But the easiest or
the most intuitive is to calculate this first,
then for each of the data points take the data point,
subtract the mean
from it, square it, and then divide by the total
number of data points you have. Now, we get to the interesting
part-- sample variance. There are several ways-- when
people talk about sample
variance, there are several
tools in their toolkits, or there are several
ways to calculate it. One way is the biased
sample variance, the biased estimator
of the population variance. And that's
usually denoted by s with a subscript n. And how do we calculate the biased
estimator? Well, we would calculate it very
similarly to how we calculated the variance right over here. But we would do it for
our sample, not our population. So for every data point in our
sample --so we have n of them-- we take that data point. And from it, we subtract
our sample mean. We subtract our sample
mean, square it, and then divide by the number
of data points that we have. But we already talked
about it in the last video. How would we find-- what is
our best unbiased estimate of the population variance? This is usually what
we're trying to get at. We're trying to find an unbiased
estimate of the population variance. Well, in the last video,
we talked about that. If we want to have
an unbiased estimate --and here, in
this video, I want to give you a sense
of the intuition why-- we would take the sum. So we're going to go through
every data point in our sample. We're going to take
that data point, subtract from it the
sample mean, square that. But instead of dividing by
n, we divide by n minus 1. We're dividing by
a smaller number. We're dividing by
a smaller number. And when you divide
by a smaller number, you're going to
get a larger value. So this is going to be larger. This is going to be smaller. And this one, we refer
to as the unbiased estimate. And this one, we refer
to as the biased estimate. If people just
write this, they're talking about the
sample variance. It's a good idea to
clarify which one they're talking about. But if you had to guess
and people give you no further information,
they're probably talking about the unbiased
estimate of the variance. So you'd probably
divide by n minus 1. But let's think about why
this estimate would be biased and why we might want to have
an estimate that is larger. And then maybe in the future,
we could have a computer program or something that really
makes us feel better, that dividing by
n minus 1 gives us a better estimate of the
true population variance. So let's imagine all the
data in a population. And I'm just going to plot
them on a number line. So this is my number line. This is my number line. And let me plot all the data
points in my population. So this is some data. This is some data. Here's some data. And here is some data here. And I can just do as
many points as I want. So these are just points
on the number line. Now, let's say I take
a sample of this. So this is my entire population. So let's see how many. I have 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14. So in this case, what
would be my big N? My big N would be 14. Big N would be 14. Now, let's say I take a sample,
a lowercase n of-- let's say my sample size is 3. I could take-- well, before
I even think about that, let's think about roughly where
the mean of this population would sit. So the way I drew
it --and I'm not going to calculate
exactly-- it looks like the mean might sit some
place roughly right over here. So the mean, the
true population mean, the parameter's going
to sit right over here. Now, let's think about what
happens when we sample. And I'm going to do just a
very small sample size just to give us the
intuition, but this is true of any sample size. So let's say we have
sample size of 3. So there is some
possibility, when we take our sample size of 3,
that we happen to sample it in a way that our sample mean is
pretty close to our population mean. So for example, if we sampled
that point, that point, and that point, I could imagine
our sample mean might actually sit pretty
close, pretty close to our population mean. But there's a
distinct possibility, there's a distinct
possibility, that maybe when I take a sample, I
sample that and that. And the key idea here is
when you take a sample, your sample mean is always
going to sit within your sample. And so there is a possibility
that when you take your sample, the true population
mean could even be
outside of the sample. And so in this
situation-- and this is just to give
you an intuition. So here, your
sample mean is going to be sitting
someplace in there. And so if you were to just
calculate the distance from each of these points to the
sample mean --so this distance, that distance--
and you square it, and you were to divide by
the number of data points you have, this is
going to be a much lower estimate than the true
variance, measured from the actual population mean,
where these things are much, much, much further. Now, you're not always going to
have the true population mean outside of your sample. But it's possible that you do. So in general, when you
just take your points, find the squared distance
to your sample mean, which is always going to sit
inside of your data even though the true population
mean could be outside of it, or it could be at
one end of your data, however you might
want to think about it, you are likely to
be underestimating, you're likely to
be underestimating the true population variance. So this right over here is an
underestimate-- underestimate. And it does turn out that
if you just-- instead of dividing by n, you
divide by n minus 1, you'll get a slightly
larger sample variance. And this is an
unbiased estimate. In the next video --and I might
not get to it immediately-- I would like to generate some
type of a computer program that is more convincing that
this is a better estimate, this is a better estimate
of the population variance than this is.
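The key idea from the number-line picture, that squared distances measured from the sample mean never exceed squared distances measured from the true population mean, can be checked with a small Python sketch. Everything specific here is an assumption for illustration: the 14 population values are invented (the video only draws unlabelled points), and the helper name `squared_distances` is made up. The sketch looks at every possible sample of size 3.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 14 points on a number line.
population = [1, 2, 3, 5, 8, 9, 12, 14, 15, 17, 20, 22, 25, 27]
N = len(population)
mu = Fraction(sum(population), N)                          # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / N        # population variance


def squared_distances(sample, center):
    """Sum of squared distances from each sample point to `center`."""
    return sum((x - center) ** 2 for x in sample)


n = 3
samples = list(combinations(population, n))  # every distinct sample of size 3

never_larger = True   # stays True if distances to xbar never exceed distances to mu
below_sigma2 = 0      # samples whose divide-by-n variance falls below sigma^2
for sample in samples:
    xbar = Fraction(sum(sample), n)
    about_xbar = squared_distances(sample, xbar)
    about_mu = squared_distances(sample, mu)
    if about_xbar > about_mu:
        never_larger = False
    if about_xbar / n < sigma2:
        below_sigma2 += 1

print(never_larger)
print(f"{below_sigma2} of {len(samples)} samples have a divide-by-n variance below sigma^2")

# A tightly bunched sample at the left end, like the one circled in the video.
clustered = (1, 2, 3)
print(squared_distances(clustered, Fraction(2)), squared_distances(clustered, mu))
```

The sample mean is the point that minimizes the sum of squared distances to the sample, so measuring from it can only make that sum smaller than, or equal to, the sum measured from the population mean. That is the source of the underestimate the video describes.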