## AP®︎/College Statistics

### Course: AP®︎/College Statistics > Unit 3

Lesson 5: More on standard deviation (optional)

# Simulation providing evidence that (n-1) gives us unbiased estimate

Simulation by KA user tetef showing that dividing by (n-1) gives us an unbiased estimate of population variance. Simulation at: http://www.khanacademy.org/cs/will-it-converge-towards-1/1167579097. Created by Sal Khan.

## Want to join the conversation?

• Just curious: Was it by simulations like this that statisticians originally figured out the n-1 thing? Or is that conclusion actually really obvious if you just understand the "pure math" underlying it?
• No, they did it analytically. They probably came up with some intuition of the need to adjust the variance, but intuition cannot tell you why you have to divide exactly by n-1.

There is also a geometric reason for dividing by n-1: it is the number of degrees of freedom. You can see this for the sample variance by considering the number of independent data points. To compute the sample variance, you first compute the sample mean. Given that sample mean, if someone gives you all the data points except one, you can figure out the last data point yourself. So you don't actually have n independent data points with which to compute the sample variance, but only n-1.
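That "the last point is determined" observation can be checked directly; here is a minimal Python sketch with made-up numbers:

```python
# Made-up sample for illustration: given the sample mean and all but
# one data point, the remaining point is fully determined, so only
# n - 1 values are free to vary.
sample = [4.0, 7.0, 9.0, 12.0]       # hypothetical sample, n = 4
mean = sum(sample) / len(sample)     # sample mean = 8.0

known = sample[:-1]                  # pretend the last point is hidden
recovered = mean * len(sample) - sum(known)
print(recovered)                     # prints 12.0, the hidden point
```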
• I'm sorry, but what does biased and unbiased mean?
• A biased estimate is one that consistently underestimates or overestimates.

For example, sample estimates using (n) tend to consistently underestimate the population variance. So we say it has a BIAS for underestimation.

Sample estimates using (n-1) however do not tend to underestimate or overestimate, so we consider it UNBIASED.

Note that unbiased is not the same thing as accurate. Suppose I use another method that sometimes way underestimates, but at other times way overestimates. This method is not very accurate, but it is also unbiased -- the mean of its errors would be close to zero since the overestimates would "cancel out" the underestimates.
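A quick Monte Carlo sketch of this (the population, sample size, and trial count here are all made up) shows the n-divided estimate sitting consistently below the true variance while the (n-1)-divided one does not:

```python
import random

random.seed(1)

# Made-up population with a known variance sigma^2
population = [random.gauss(0, 10) for _ in range(100_000)]
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

n, trials = 5, 20_000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    s = random.sample(population, n)
    m = sum(s) / n
    ss = sum((x - m) ** 2 for x in s)
    biased_sum += ss / n           # divide by n
    unbiased_sum += ss / (n - 1)   # divide by n - 1

# The n-divided average lands well below sigma2 (around 4/5 of it);
# the (n-1)-divided average lands close to sigma2.
print(sigma2, biased_sum / trials, unbiased_sum / trials)
```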
• When do you use (n-1) when calculating variance? When is it just n? Thank you, I would really appreciate a clear answer...
• n-1 when you choose a sample from the population,
n when you've counted the entire population.
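Python's standard-library `statistics` module draws exactly this distinction (the data set here is made up):

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data, mean = 5,
                                  # sum of squared deviations = 32

# Treat data as the entire population: divide by n (32 / 8 = 4)
print(pvariance(data))

# Treat data as a sample from a population: divide by n - 1 (32 / 7)
print(variance(data))
```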
• These explanations are based on empirical evidence. Is there a theoretical explanation for dividing by n-1?
• For me this Wikipedia article is more detailed and more understandable, though more or less the same as the "Sample variance" page. It is still a bit different, and might be worth checking for those who need more info on the "n-1" stuff after the Sample variance article: https://en.wikipedia.org/wiki/Bias_of_an_estimator
• I understand that n-1 provides a more accurate estimation. However, if we know our population N value, couldn't we just subtract the n/N ratio from n instead? For example, if N=20 and n=10, we would know the ratio is 0.5. Therefore, we could find an even better estimate from n-0.5.
• The number that we subtract has nothing to do with the size of the population. It's not just that it makes the estimate "more accurate," it's that it makes it what Statisticians call "unbiased."

Think back to the sampling distribution of the sample mean: suppose we repeated an experiment over and over again and recorded the sample mean from each of the repeated experiments. The mean of the sampling distribution of the sample mean -- what Sal sometimes refers to as the "mean of means" -- happens to be equal to the mean of the original distribution. Because of this, we say that the sample mean is "unbiased" - it doesn't systematically overestimate or underestimate the population mean.

This is not the case with the variance. If we calculate the variance over and over again, using n in the denominator, the "mean of variances" (a strange concept, but it's the proper one to think about) will not be equal to σ^2, it will be σ^2 * (n-1)/n. By dividing by n-1 instead of n, we fix this problem. Using n, the sample variance is biased, because it tends to underestimate the population variance. Using n-1, the sample variance is unbiased.

So in this sense, it's not possible to get a better estimate for the variance. Subtracting 1, and specifically 1, is the best we can do. Changing what we divide by can only make it worse. Now, there are other criteria we might look at which may make a different estimate of the sample variance seem "better," but if we're just talking about the denominator we're using, n-1 can't be beat.
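The σ^2 * (n-1)/n factor mentioned above is easy to verify numerically; here is a Monte Carlo sketch with an assumed normal population (σ = 5, so σ^2 = 25) and an assumed sample size n = 4:

```python
import random

random.seed(7)
sigma2, n, trials = 25.0, 4, 50_000

total = 0.0
for _ in range(trials):
    s = [random.gauss(0, 5) for _ in range(n)]   # sample of size n
    m = sum(s) / n                               # sample mean
    total += sum((x - m) ** 2 for x in s) / n    # divide by n, not n - 1

# "Mean of variances" with the n denominator: it comes out close to
# sigma^2 * (n - 1) / n = 25 * 3/4 = 18.75, not to 25.
mean_of_variances = total / trials
print(mean_of_variances, sigma2 * (n - 1) / n)
```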
• Hi all,

I have also heard people saying that we divide by the degrees of freedom, which, as I understand, would be the number of values I need to fix to get the information on all values. In this case, this would mean that, if I am provided with the sample mean, I only have n-1 degrees of freedom, as I can calculate the last value in my sample from the information I got.
Question 1: Did I understand this correctly so far?
Question 2: Where is the logical link between 'I can estimate the last value based on the information I am given' and 'I better divide by n-1 to estimate the variance'?
Question 3: The same idea would be true for the population variance. Here, too, I can calculate the missing value given n-1 values and the mean. So why, under the aspect of degrees of freedom, would I still divide by n here?
Question 4: How is the concept of degrees of freedom related to the explanation for using n-1 provided in the video?

Thank you very much for your help!
• Isn't the relative size of the sample compared to the population relevant when calculating the sample variance? I mean, if we calculate the variance of 99 elements out of a population of 100 elements, won't the variance of this sample be more accurately described by n, and not (n-1)? Is there a threshold for a sample to be described by (n-1)?
• That’s an excellent question, and I’m not sure about the answer.

But if our sample size is only one or two less than our population size, we might as well look at every element in the population instead. Sampling is used when it is not practical to take information from the whole population, so there is usually a good portion of the population left over. So, this situation isn’t practical, but it is interesting to think about theoretically.
• Why don't we divide our sample mean by n-1? Is it not a biased estimator?
• Different sample means will oscillate around the population mean, and can be both higher and lower, but different sample variances will tend to be lower than the population variance.
• When your sample size approaches the pop. size, at what point would it be best to stop using (n-1) and use (n)
• You should use n-1 unless your sample is the entire population N. However, for large n it does not matter much whether you use n instead of the preferred n-1, since the ratio (n-1)/n is then close to 1.
• If my sample size is greater than, let's say, half of the population size, i.e. n > N/2, should I use the biased sample variance to get a better estimate? More generally, if I am aware of the value of N, should I use this information to decide which formula to use? And at which value of n/N should I consider using the biased sample variance?
• You should use n-1 unless your sample is the entire population N. However, for large n it does not matter much whether you use n instead of the preferred n-1, since the ratio (n-1)/n is then close to 1.

## Video transcript

Here's a simulation created by Khan Academy user TETF. I can assume that's pronounced tet f. And what it allows us to do is give us an intuition as to why we divide by n minus 1 when we calculate our sample variance and why that gives us an unbiased estimate of population variance. So the way this starts off, and I encourage you to go try this out yourself, is that you can construct a distribution. It says build a population by clicking in the blue area. So here, we are actually creating a population. So every time I click, it increases the population size. And I'm just randomly doing this, and I encourage you to go onto this scratch pad-- it's on the Khan Academy Computer Science-- and try to do it yourself. So here I could stop at some point. So I've constructed a population. I can throw out some random points up here. So this is our population, and as you saw while I was doing that, it was calculating parameters for the population. It was calculating the population mean at 204.09 and also the population standard deviation, which is derived from the population variance. This is the square root of the population variance, and it's at 63.8. It was also plotting the population variance down here. You see it's 63.8, which is the standard deviation, and it's a little harder to see, but it says it's squared. These are these numbers squared. So essentially, 63.8 squared is the population variance. So that's interesting by itself, but it really doesn't tell us a lot so far about why we divide by n minus 1. And this is the interesting part. We can now start to take samples, and we can decide what sample size we want to do. I'll start with really small samples, so the smallest possible sample that makes any sense. So I'm going to start with a really small sample. And what they're going to do-- what the simulation is going to do-- is every time I take a sample, it's going to calculate the variance.
So the numerator is going to be the sum of each of my data points in my sample minus my sample mean, and I'm going to square it. And then it's going to divide it by n plus a, and it's going to vary a. It's going to divide it by anywhere between n plus negative 3, so n minus 3, all the way to n plus 3. And we're going to do it many, many, many times. We're going to essentially take the mean of those variances for any a and figure out which gives us the best estimate. So if I just generate one sample right over there, when we see kind of this curve, when we have high values of a, we are underestimating. When we have lower values of a, we are overestimating the population variance, but that was just for one sample, not really that meaningful. It's one sample of size two. Let's generate a bunch of samples and then average them over many of them. And you see when you look at many, many, many examples, something interesting is happening. When you look at the mean of those samples, when you average together those curves from all of those samples, you see that our best estimate is when a is pretty close to negative 1, is when this is n plus negative 1 or n minus 1. Anything less than negative 1-- if we did n minus 1.05 or n minus 1.5-- we start overestimating the variance. Anything greater than negative 1, so if we have n plus 0, if we divide by n or if we have n plus 0.05 or whatever it might be, we start underestimating the population variance. And you can do this for samples of different sizes. Let me try a sample size of 6. And here you go once again, as I press-- I'm just keeping Generate Sample pressed down-- as we generate more and more and more samples-- and for all the a's we essentially take the average across those samples for the variance depending on how we calculate it-- you'll see that once again, our best estimate is pretty darn close to negative 1.
And if you were to get this to millions of samples generated, you'll see that your best estimate is when a is negative 1 or when you're dividing by n minus 1. So once again, thanks TETF, tet f, for this. I think it's a really interesting way to think about why we divide by n minus 1.
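The simulation's idea can also be sketched independently in Python (this is not tetef's code; the population shape, sample size, and trial count below are made up): for each integer a, average the "variance" computed with denominator n + a over many samples and see which a brings the average closest to the population variance.

```python
import random

random.seed(0)

# Made-up population roughly matching the video's 0-400 range
population = [random.uniform(0, 400) for _ in range(10_000)]
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

n, trials = 6, 20_000
results = {}
for a in range(-3, 4):                # denominators n - 3 .. n + 3
    total = 0.0
    for _ in range(trials):
        s = random.sample(population, n)
        m = sum(s) / n
        total += sum((x - m) ** 2 for x in s) / (n + a)
    results[a] = total / trials       # "mean of variances" for this a

best = min(results, key=lambda a: abs(results[a] - sigma2))
print(best)   # the winning a is -1, i.e. dividing by n - 1
```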