

# Review and intuition why we divide by n-1 for the unbiased sample variance

AP.STATS: UNC‑1 (EU), UNC‑1.J (LO), UNC‑1.J.3 (EK), UNC‑3 (EU), UNC‑3.I (LO), UNC‑3.I.1 (EK)
Reviewing the population mean, sample mean, population variance, and sample variance, and building an intuition for why we divide by n-1 for the unbiased sample variance. Created by Sal Khan.

## Want to join the conversation?

• n-1 is a better estimator in small samples, but does it compensate in larger samples? •   The larger the sample is, the closer its variance will match the variance of the population, so less compensation is needed. Dividing by n-1 handles this naturally, since the difference between n and n-1 becomes negligible as n becomes large.
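To make that concrete: the n-divided estimator has expected value σ²·(n-1)/n, so the correction factor it needs shrinks toward 1 as the sample grows. A quick sketch (the sample sizes are arbitrary choices of mine):

```python
# The biased estimator (dividing by n) has expected value sigma^2 * (n - 1) / n,
# so the correction it needs shrinks toward nothing as the sample grows.
for n in [2, 5, 10, 100, 1000]:
    factor = (n - 1) / n
    print(f"n = {n:4d}: E[n-divided variance] / sigma^2 = {factor:.4f}")
```

At n = 2 the n-divided estimator captures only half of σ² on average; by n = 1000 the gap is a tenth of a percent.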
• We could as well have randomly chosen a particular sample whose mean sits ABOVE µ, which would make our sample mean bigger than the actual population mean.

In order to make it smaller, wouldn't a correction of "n+1" in the denominator be just as likely? Why do we assume that we're UNDERestimating? •   I think it has more to do with the spread of the sample versus the spread of the population than with whether the sample mean is larger or smaller than the population mean. To illustrate this, at the end of the video Sal circles a sample of three points all bunched up on the left end of the line. He then says that the sample mean would sit within this sample, so the variance of the sample would be smaller than that of the entire population. However, this is not because the sample mean is smaller than the population mean. It's because the sample points are tightly clustered, so the distance between any point and the sample mean is small, and the sample variance is therefore small.

When you compare this sample variance to the population variance, you find it is much smaller, because the sample's data points are close to each other, while the population's data ranges from the far left to the far right, so the distances to the mean are inevitably larger. The spread of most samples is likely to be smaller than the spread of the complete population, and therefore the variance will be smaller as well. Seen this way, even if the sample mean is larger than the population mean, the sample variance will still tend to be an underestimate, because the sample data will not be as spread out as the population data. Hope that made sense.
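A simulation supports this point (a sketch using only the standard library; the population size, sample size, and trial count are arbitrary choices of mine): even among samples whose mean lands *above* the population mean, the n-divided variance still underestimates on average.

```python
import random

random.seed(0)
population = [random.gauss(0, 1) for _ in range(100_000)]
pop_mean = sum(population) / len(population)
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

n = 5
# Split the trials by whether the sample mean overshoots or undershoots mu.
over, under = [], []
for _ in range(20_000):
    sample = random.sample(population, n)
    m = sum(sample) / n
    v = sum((x - m) ** 2 for x in sample) / n  # divide by n, no correction
    (over if m > pop_mean else under).append(v)

print(f"population variance:                 {pop_var:.3f}")
print(f"avg n-divided var (mean overshoots): {sum(over) / len(over):.3f}")
print(f"avg n-divided var (mean undershoots): {sum(under) / len(under):.3f}")
```

Both averages come out below the population variance: the underestimation is driven by measuring deviations from the sample's own mean, not by which side of µ that mean falls on.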
• So why not take n-2 (or even something smaller) to make the variance even bigger...? •   I'll have a go at explaining the intuition behind the "1" in "n-1":

Think of the whole equation as the average amount of variation. If this is truly what the equation is measuring, then it should be (total amount of variation)/(number of things that can vary), since the average, i.e. the mean, is always Total/(Number of things).

Look at the numerator and the denominator in the sample variance equation. Is the following true?

- The numerator is a measure of the total amount of variation.
- The denominator is the number of things that are able to vary.

Yes. Why?! Surely there are n things that can vary about x-bar, i.e. the sample mean? Well, actually, no there aren't. There are n things that can vary about the population mean, but only n-1 that can vary about the sample mean. Here's an example of why this is so:

Say you have 3 data points.
- You calculate the sample mean and it comes out to be 2.
- The first data point could be anything, let's say it is 1.
- The second data point could be anything, let's say it is 3.
- What can the third data point be? It absolutely MUST be 2. It is not free to vary - the sum of the three scores must be 6 or else the sample mean is not 2.

Knowing n-1 scores and the sample mean uniquely determines the last score so it is NOT free to vary. This is why we only have "n-1" things that can vary. So the average variation is (total variation)/(n-1).

Total variation is just the sum of each point's variation from the mean. The measure of variation we are using is the square of the distance. Why do we use the square of the distance? Well, that is a topic for another day.
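The three-point example above can be written out directly (the values 1, 3, and a mean of 2 come from the example; the variable names are mine):

```python
# If the sample mean of 3 points is 2, their sum must be 6, so knowing any
# two of the points pins down the third: it is not free to vary.
sample_mean, n = 2, 3
known = [1, 3]                        # two freely chosen points
last = sample_mean * n - sum(known)   # the forced value of the remaining point
print(last)  # 2

# Equivalently: deviations from the sample mean always sum to zero.
deviations = [x - sample_mean for x in known + [last]]
print(sum(deviations))  # 0
```

The second print shows the constraint in its usual degrees-of-freedom form: once the sample mean is fixed, the deviations must cancel, so only n-1 of them are free.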
• It seems the confusion comes not from the math but from the language we're using to describe it. When we say "unbiased," what we actually mean is less biased, correct? •  No, it is unbiased - it has no bias in either direction, too large or too small, and is not biased by sample size. Note that just because it is unbiased doesn't mean it necessarily gives you the right value for the population variance, and it says nothing about the bias of the person doing the statistics.
• What is the difference between a parameter and a statistic? • From what I can tell, it comes down to your level of confidence. If your data covers the whole population, you can be almost absolutely sure those numbers reflect the reality of that population, so you give that value a name that denotes that confidence: a parameter. But if you are sampling a population and inferring about the whole population from that, your level of confidence must be less, and so you reflect that in your language: a statistic. I think then you can start to ask questions like, "Well, how confident are we in this statistic?" That is my take on it. There may be a deeper definition, but I think that is the core.
• Does this also have a connection with degrees of freedom? • Yes. The reason n-1 is used is because that is the number of degrees of freedom in the sample. The sum of each value in a sample minus the mean must equal 0, so if you know what all the values except one are, you can calculate the value of the final one.
• Why can't we just use n-2 or n-3 etc. if we want to get a bigger answer? • Because the point of using n-1 isn't to get a bigger value. Even when we use n, we will sometimes overestimate the population variance (σ²), and even when using n-1, we will sometimes underestimate σ².

The purpose of using n-1 is so that our estimate is "unbiased" in the long run. What this means is that if we take a second sample, we'll get a different value of s². If we take a third sample, we'll get a third value of s², and so on. We use n-1 so that the average of all these values of s² is equal to σ².

If we only used n, then in this long-run view, we'd generally (but not always) underestimate σ². If we used n-2 or n-3, etc., then we'd generally overestimate σ². Using n-1 is the sweet spot (we know this from the theory as well as from simulations).
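That long-run claim is easy to check by simulation. A sketch using only the standard library (the population, sample size, and trial count are arbitrary choices of mine): average the sum of squared deviations divided by n, n-1, and n-2 over many samples, and only the n-1 version lands near σ².

```python
import random

random.seed(1)
population = [random.gauss(10, 2) for _ in range(50_000)]
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

n, trials = 4, 50_000
sums = {n: 0.0, n - 1: 0.0, n - 2: 0.0}  # running totals per denominator
for _ in range(trials):
    sample = random.sample(population, n)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)  # total squared deviation
    for d in sums:
        sums[d] += ss / d

print(f"population variance sigma^2: {sigma2:.3f}")
for d in sorted(sums, reverse=True):
    print(f"average of (sum of squares)/{d}: {sums[d] / trials:.3f}")
```

Dividing by n comes out low, dividing by n-2 comes out high, and dividing by n-1 lands close to σ², matching the "sweet spot" description above.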
• What if we overestimate the mean? Isn't that as likely as underestimating it? • A little confusion... the video assumes that my samples would usually be close to each other, and hence the variance would be less than my population variance. However, if you assume an unbiased sample, wouldn't there also be a 50% chance that my sample variance exceeds my population variance? Also, why (n-1)? Why not (n-2) and so on?