Lesson 5: Variance and standard deviation of a sample

# Sample standard deviation and bias

Sal shows an example of calculating standard deviation and bias. Created by Sal Khan.

## Want to join the conversation?

- Sal says here "hopefully we're convinced now why we divide by n-1," but the previous video left off with "next time I'll show you further why we divide by n-1." Is there a video in between that I should be watching, or some other information? I can't help feeling quite confused, and this is not the first time in this course I've felt Sal mentioned something that wasn't explained previously.(81 votes)
- Here is a link to the video I think Sal was referencing:

https://www.khanacademy.org/math/ap-statistics/summarizing-quantitative-data-ap/more-standard-deviation/v/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance (47 votes)

- Are there any other ways to obtain an unbiased standard deviation from our sample population, instead of just accepting the fact that the sample variance gives you a biased standard deviation?(73 votes)
- The short answer is "no"--there is no general unbiased estimator of the population standard deviation (even though the sample variance *is* unbiased). However, for certain distributions there are *correction factors* that, when multiplied by the sample standard deviation, give you an unbiased estimator. Nevertheless, all of this is definitely beyond the scope of the video and, frankly, not that important in the grand scheme of things (i.e. unless you're a technical mathematician, don't worry about it). But it was a good question!(75 votes)

- At 8:10, what does 'nonlinear' mean?(23 votes)
- Here is a function y = f(x). When you give it an input value x, you get an output value y through some operation. If the function is linear, then when you change x by Δx, the change in y (Δy) has a fixed ratio to Δx.

Graphically, if you plot values of the function y = f(x) and connect them, you get a straight line.

Nonlinear functions are those for which Δy divided by Δx is not a fixed value when you change x by Δx. Consequently, if you plot values of such a function and connect them, you won't get a straight line. You may get a curve.(44 votes)

- I didn't get it. How does taking the square root of the unbiased sample variance lead to a biased standard deviation? Kindly explain.(17 votes)
- I'm not an expert in statistics, but here's my crack at it.

An unbiased process that outputs some value means that the expected value of the process will match some actual value. Basically, as you perform the unbiased process on more and more samples the average value will approach the actual value.

But if you have a set of values whose average is some number and then perform a non-linear operation on them (like the square root), then their new average value is NOT going to match the old average with the same non-linear operation performed on it.

For example, take the following numbers:

2, 2, 2, 2, 12

Their average is 4.

Here are their square roots:

1.41, 1.41, 1.41, 1.41, 3.46

The average of those square roots is about 1.82.

But the square root of the old average is 2.

They don't match! We've introduced some bias by performing a non-linear operation.

I imagine it's hard to remove this bias in general because its magnitude probably depends heavily on how the population data is distributed.(32 votes)
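The arithmetic in this reply can be checked with a few lines of Python. This is only a sketch of the five-number example above; the variable names are made up for illustration.

```python
import math

values = [2, 2, 2, 2, 12]

# Average first, then take the square root.
mean_of_values = sum(values) / len(values)
root_of_mean = math.sqrt(mean_of_values)

# Take the square roots first, then average them.
roots = [math.sqrt(v) for v in values]
mean_of_roots = sum(roots) / len(roots)

print(f"square root of the average:   {root_of_mean:.2f}")
print(f"average of the square roots:  {mean_of_roots:.2f}")
```

The two printed numbers differ, which is the mismatch the reply describes: averaging and taking a square root do not commute.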

- @4:02, why do we divide by 7 instead of 8? I know he says it is the unbiased sample variance, but what exactly does that mean?(10 votes)
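- It means that, averaged over many samples, dividing by n-1 gives an estimate that matches the population variance, whereas dividing by n tends to underestimate it. As Sal notes in the video, this works for any population distribution, not just a normal one.(13 votes)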
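For a concrete look at "7 instead of 8", here is a small sketch that uses the eight seed counts from the video and computes the variance both ways. It relies on Python's standard `statistics` module; the variable names are only illustrative.

```python
import statistics

# Seed counts from the eight watermelon chunks in the video.
seeds = [4, 3, 5, 7, 2, 9, 11, 7]
n = len(seeds)

mean = sum(seeds) / n                              # sample mean
sum_sq = sum((x - mean) ** 2 for x in seeds)       # sum of squared deviations

divide_by_n = sum_sq / n                           # what you would use for a whole population
divide_by_n_minus_1 = sum_sq / (n - 1)             # unbiased sample variance

# The standard library offers both versions under different names.
print(divide_by_n, statistics.pvariance(seeds))
print(divide_by_n_minus_1, statistics.variance(seeds))
```

Dividing by 7 gives the slightly larger number, which compensates for the fact that the deviations are measured from the sample mean rather than the unknown population mean.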

- My boy started to glitch @2:50(15 votes)
- Sal.exe has stopped working(2 votes)

- How would I know to divide by n-1 or n? I know this question has been asked before, but I don't really see the reason. Could someone please give me a simple answer?(6 votes)
- The reason is explained on Wikipedia: https://en.wikipedia.org/wiki/Variance#Sample_variance

The n-1 correction is called Bessel's correction. Even though I couldn't follow the proof, I did understand that you divide by n-1 instead of n when you have a sample and are estimating a value for the whole population.(10 votes)

- I don't think "unbiased sample variance" has been explained. In the previous video, Sal promised to explain why the result is more accurate when we subtract 1 from the denominator, but here he just assumes everybody knows why.(8 votes)
- It's confusing where all of these terms are coming from: "Population mean," "sample mean," "capital 'N'," "lowercase 'n'." Can someone help me with this? What do these mean/represent? None of these concepts were showcased earlier in Unit 1 or in the first video of this unit, "Sample variance."(3 votes)
- I like to compare statistics verbiage with a biology class. I usually think of "population" as an entire species and "sample" as a test strip you're putting under the microscope for better attention.

There are two different means: one of the population and one of the sample. The population mean is the overall mean, whereas the sample mean is the average of the sample you selected from the population. For instance, if you compared the average weight every person in the world could pull versus the average for the 100 strongest people, there would be two very different means. In this case, the sample is skewed because it isn't a randomized selection, but hopefully this gives you a better idea of the difference between a "population mean" and a "sample mean."

In statistics, consider the population as "big" and the sample as "small." In the same way, "N" and "n" can be determined based on their size and associated with either the population or sample size. (N is to population, and n is to sample)(8 votes)

- Instead of squaring the difference from the mean and taking the square root of the sum, isn't it more reasonable to take the mean of the absolute value of the difference from the mean? This way we won't require squares and square roots.(3 votes)
- That is an alternative method, known as the *mean absolute deviation*. To understand why the variance is more popular, I'd suggest taking a read through an old answer that I wrote up here:

https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/variance-of-a-population?qa_expand_key=ag5zfmtoYW4tYWNhZGVteXJVCxIIVXNlckRhdGEiN3VzZXJfaWRfa2V5X2h0dHA6Ly9mYWNlYm9va2lkLmtoYW5hY2FkZW15Lm9yZy84Mjg5OTkzOTIMCxIIRmVlZGJhY2sYgfoBDA (8 votes)
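To see how the two measures of spread compare on actual numbers, here is a minimal sketch that computes the mean absolute deviation next to the sample standard deviation for the seed counts used in the video. The variable names are illustrative only.

```python
import statistics

# Seed counts from the video's eight watermelon chunks.
seeds = [4, 3, 5, 7, 2, 9, 11, 7]
mean = statistics.mean(seeds)

# Mean absolute deviation: average distance from the mean, no squaring.
mad = sum(abs(x - mean) for x in seeds) / len(seeds)

# Sample standard deviation: square root of the n-1 sample variance.
sd = statistics.stdev(seeds)

print(f"mean absolute deviation:   {mad:.2f}")
print(f"sample standard deviation: {sd:.2f}")
```

Both numbers describe spread in the same units as the data; the standard deviation comes out larger here because squaring gives extra weight to the points farthest from the mean.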

## Video transcript

Let's say that you're
a watermelon farmer, and you want to study
how dense the seeds are in your watermelon. Perhaps you want to do this
because over time, you're trying to breed watermelons
that have fewer seeds, and you should see whether you
are actually making progress. And you don't want to
cut open every watermelon in your watermelon farm
or patch or whatever it might be called, because
you want to sell most of them. You just want to sample
a few watermelons, and then take samples
of those watermelons to figure out how dense the
seeds are, and hope that you can calculate statistics
on those samples that are decent estimates of the
parameters for the population. So let's start doing that. So let's say that you take these
little cubic inch chunks out of a random sample
of your watermelons. And then you count the
number of seeds in them. And you have 8
samples like this. So in one of them,
you found 4 seeds. In the next, you found
3, 5, 7, 2, 9, 11, and 7. So this is a
sample, just to make sure we're visualizing it right. If this is the population
of all of the chunks-- I guess we could view
this as a cubic inch-- the cubic inch chunks in
my entire watermelon farm, I'm sampling a very
small sample of them. Maybe I could have had
a million over here. A million chunks
of watermelon could have been produced from
my farm, but I'm only sampling-- so capital
N would be 1 million, lowercase n is equal to 8. And once again, you might
want to have more samples, but this'll make our math easy. Now, let's think about what
statistics we can measure. Well, the first one
that we often do is a measure of
central tendency. And that's the arithmetic mean. But here, we're trying to
estimate the population mean by coming up with
the sample mean. So what is the sample
mean going to be? Well, all we have to do
is add up these points, add up these measurements,
and then divide by the number of
measurements we have. So let's get our
calculator out for that. Actually, maybe I don't
need my calculator. Let's see. So 4 plus 3 is 7. 7 plus 5 is 12. 12 plus 7 is 19. 19 plus 2 is 21, plus 9 is 30,
plus 11 is 41, plus 7 is 48. So I'm going to get
48 over 8 data points. So this worked out quite well. 48 divided by 8 is equal to 6. So our sample mean is 6. It's our estimate of what
the population mean might be. But we also want to think about
how much in our population we want to estimate, how
much spread is there, or how much do our measurements
vary from this mean. So there, we say, well, we can
try to estimate the population variance by calculating
the sample variance. And we're going to calculate
the unbiased sample variance. Hopefully, we're fairly
convinced at this point why we divide by n minus 1. So we're going to calculate
the unbiased sample variance. And if we do that,
what do we get? I'll do this in a
different color. It's going to be 4 minus 6
squared plus 3 minus 6 squared plus 5 minus 6 squared
plus 7 minus 6 squared plus 2 minus 6 squared
plus 9 minus 6 squared plus 11 minus 6 squared plus
7 minus 6 squared, all of that divided by-- not by 8. Remember, we want the
unbiased sample variance. We're going to divide
it by 8 minus 1. So we're going to divide by 7. Let me give myself a little
bit more real estate. The unbiased sample
variance-- and I could even denote it by this to
make it clear that we're dividing by lowercase
n minus 1-- is going to be equal to-- let's see,
4 minus 6 is negative 2. That squared is positive 4. So I did that one. 3 minus 6 is negative 3. That squared is going to be 9. 5 minus 6 squared is
1 squared, which is 1. 7 minus 6 is once again
1 squared, which is 1. 2 minus 6, negative
4 squared is 16. 9 minus 6 squared, well,
that's going to be 9. 11 minus 6 squared, that is 25. And then finally, 7 minus 6
squared, that's another 1. And we're going
to divide it by 7. Let's see if we can add
this up in our heads. 4 plus 9 is 13, plus 1 is
14, 15, 31, 40, 65, 66. So this is going to
be equal to 66 over 7. And we could either divide--
we get that's 9 and 3/7. We could write
that as 9 and 3/7. Or if we want to write
that as a decimal, I could just take
66 divided by 7 gives us 9 point--
I'll just round it. So it's approximately 9.43. Now, that gave us our
unbiased sample variance. Well, how could we calculate
a sample standard deviation? We want to somehow get an
estimate of what the population standard deviation might be. Well, the logic, I guess,
is reasonable to say, well, this is our unbiased
sample variance. It's our best estimate of
what the true population variance is. When we think about
population parameters to get the population
standard deviation, we just take the square root
of the population variance. So if we want to get an
estimate of the sample standard deviation, why
don't we just take the square root of the
unbiased sample variance? So that's what we'll do. So we'll define it that way. We'll call it the sample
standard deviation. We're going to define it to
be equal to the square root of the unbiased sample variance. It's going to be the square
root of this quantity, and we can take
our calculator out. It's going to be the square
root of what I just typed in. I can do 2nd answer. It'll be the last entry here. So the square root of that
is-- and I'll just round. It's approximately
equal to 3.07. Now, I'm going to
tell you something very counterintuitive. Or at least initially
it's counterintuitive, but hopefully you'll
appreciate this over time. This we've already talked
about in some depth. People have even
created simulations to show that this is an unbiased
estimate of population variance when we divide it by n minus 1. And that's a good
starting point if we're going to take the
square root of anything. But it actually turns out
that because the square root function is nonlinear,
that this sample standard deviation-- and
this is how it tends to be defined-- sample standard
deviation, that this sample standard deviation, which is
the square root of our sample variance, so from
i equals 1 to n of our unbiased sample variance,
so we divide it by n minus 1. This is how we literally define
our sample standard deviation. Because the square root
function is nonlinear, it turns out that this is
not an unbiased estimate of the true population
standard deviation. And I encourage people to
make simulations of that if they're interested. But then you might say, well,
we went through great pains to divide by n minus
1 here in order to get an unbiased estimate
of the population variance. Why don't we go
through similar pains and somehow figure out a
formula for an unbiased estimate of the population
standard deviation? And the reason why
that's difficult is to unbias the
sample variance, we just have to divide by
n minus 1 instead of n. And that'd work for any
probability distribution for our population. It turns out to
do the same thing for the standard deviation. It's not that easy. It's actually dependent on how
that population is actually distributed. So in statistics, we just define
the sample standard deviation. And the one that
we typically use is based on the square root of
the unbiased sample variance. But when you take
that square root, it does give you a
biased result when you're trying to use this
to estimate the population standard deviation. But it's the simplest,
best tool we have.
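
Sal encourages viewers to build a simulation of this. The following is one possible sketch, not anything shown in the video: it assumes a hypothetical normally distributed population with mean 6 and standard deviation 3 (so variance 9), repeatedly draws samples of size 8, and averages the unbiased sample variance and the sample standard deviation across all the samples. The function names, the population, and the number of trials are all assumptions chosen for illustration.

```python
import math
import random


def sample_variance(xs):
    """Unbiased sample variance: squared deviations divided by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate(trials=20000, n=8, mu=6.0, sigma=3.0, seed=0):
    """Average the sample variance and sample SD over many random samples.

    The normal population (mu, sigma) is an assumption for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_var = 0.0
    total_sd = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        v = sample_variance(sample)
        total_var += v
        total_sd += math.sqrt(v)  # sample SD = square root of sample variance
    return total_var / trials, total_sd / trials


mean_var, mean_sd = simulate()
print(f"true variance 9.00, average sample variance {mean_var:.2f}")
print(f"true SD       3.00, average sample SD       {mean_sd:.2f}")
```

Under these assumptions, the average sample variance should land close to 9, while the average sample standard deviation should come out noticeably below 3. Trying a different population in place of `rng.gauss` changes how large that shortfall is, which is why no single correction works for every distribution.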