## Statistics and probability

Lesson 5: Variance and standard deviation of a sample

# Sample variance

AP.STATS: UNC‑1 (EU), UNC‑1.J (LO), UNC‑1.J.3 (EK)

Thinking about how we can estimate the variance of a population by looking at the data in a sample. Created by Sal Khan.

## Want to join the conversation?

- What if n=1? Then wouldn't the sample variance be infinity?(37 votes)
- It would be undefined, yes. But would that be a problem? If you only looked at one data point from a population you really wouldn't have any idea of how dispersed the data is, so an undefined estimate of the population variance is appropriate.(136 votes)
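
Strictly speaking, the n = 1 case is 0/0 rather than infinity: with a single data point the sample mean equals that point, so the sum of squared deviations is 0 and the divisor n - 1 is also 0. A minimal Python sketch of this, where the helper name `sample_variance` is an illustrative choice and not something from the video:

```python
import statistics

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

# One data point: numerator is 0.0 and the divisor n - 1 is 0.
try:
    sample_variance([2.0])
except ZeroDivisionError:
    print("hand-rolled version: undefined for n = 1")

# The standard library refuses the same input instead of returning a number.
try:
    statistics.variance([2.0])
except statistics.StatisticsError as err:
    print("statistics.variance:", err)
```

Both versions decline to produce an estimate, which matches the answer above: one observation carries no information about spread.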

- Where did the "-x{bar}" at 9:59 come from? I've totally missed that.(20 votes)
- X bar is for the sample mean whereas µ is for population mean.(28 votes)

- How do we know when it's ok to use a calculator when we're doing the math exercises?

I want to be able to do these problems as well as if I were in a 'real' classroom, so I don't want to cheat and use one when I shouldn't, but I don't know where we're meant to use one and where we should be doing the math completely on our own. For some of these topics, doing it without a calculator takes quite a while and I've wondered if it would be ok, and now Sal is using one in this video. Would we be using one with this math topic in a classroom setting?(9 votes)
- It's really up to you. You could even go through the exercise three different times, once doing the figures by hand (or at least until you got incredibly bored!), once with a statistical calculator, and maybe even once with a spreadsheet or a statistical software package. Think about drills like this not as an obligation to a teacher but as an opportunity to develop critical skills to the degree that you would like.(16 votes)

- How do we know when to divide by n and when to divide by n-1? Or is it better to always divide by n-1?(6 votes)
- Hi RJ,

We divide by n when our data covers the entire population. For example, if there are 7 tigers and we know all 7 of their ages, then we would divide by n. We divide by n-1 when we only have a sample and want to estimate the population variance. For example, we know the ages of 5 hippos but there are 42 of them. In this case, divide by n-1 because deviations measured from the sample mean tend to underestimate the population variance.

Hope that helps.(20 votes)
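
A common convention, stated here as a general rule rather than something taken from the video, is to divide by n when the data is the whole population and by n - 1 when it is a sample used to estimate the population variance. Python's standard `statistics` module has one function for each case. The ages below are made up purely for illustration:

```python
import statistics

# Hypothetical data: every member of a small population is measured.
whole_population_ages = [3.0, 5.0, 6.0, 8.0, 9.0, 11.0, 14.0]

# Hypothetical data: 5 animals sampled from a much larger group.
sample_ages = [4.0, 9.0, 15.0, 22.0, 30.0]

# pvariance divides the sum of squared deviations by n.
population_var = statistics.pvariance(whole_population_ages)

# variance divides by n - 1 (the usual sample variance).
sample_var = statistics.variance(sample_ages)

print("population variance (divide by n):  ", population_var)
print("sample variance (divide by n - 1):", sample_var)
```

The choice depends on whether the numbers are the entire population or only a sample from it, not on how large the data set happens to be.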

- Sal referenced a video to explain why you divide by n-1. It is not the next one; here is the link:

https://www.khanacademy.org/math/ap-statistics/summarizing-quantitative-data-ap/more-standard-deviation/v/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance (14 votes)

- Why not, when finding the variance, find the absolute value of each variable's distance from the mean? Why square it? Would the above procedure just give you standard deviation?(8 votes)
- First, we *could* take their absolute values but that would give us a totally different statistic, called the **Mean Absolute Deviation** (or **MAD** for short). There are various reasons why the standard deviation is preferred over the **MAD** (but that gets pretty technical). The point is that you *can* take the absolute value but it will, in general, give you a totally different number *not* equal to the standard deviation.

Lastly, when we square the distance from the mean, we also are squaring the units associated with them. So, if you are gathering data on children's heights and you want to calculate the variance, the result will be, for instance, 16 inches squared. Then, we take the square root of the variance (because it makes more sense to talk about height in terms of "inches" rather than "inches squared"), giving us a standard deviation of 4 inches. Does this make sense?(12 votes)
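
The difference between the two statistics can be checked numerically. The sketch below uses made-up heights chosen so the variance comes out to 16 square inches, as in the answer above. It uses the divide-by-n (population) versions so that MAD and standard deviation are computed on the same footing; that choice is an assumption made for the comparison.

```python
import statistics

# Hypothetical heights in inches, chosen only for illustration.
heights = [48.0, 52.0, 56.0, 44.0, 50.0]

mean = statistics.mean(heights)

# Mean Absolute Deviation: average absolute distance from the mean.
mad = sum(abs(h - mean) for h in heights) / len(heights)

# Variance (square inches) and its square root (inches), both dividing by n.
variance = statistics.pvariance(heights)
std_dev = statistics.pstdev(heights)

print("MAD:               ", mad, "inches")
print("variance:          ", variance, "inches squared")
print("standard deviation:", std_dev, "inches")
```

For this data the MAD is about 3.2 inches while the standard deviation is 4 inches, so the two measures of spread are in the same units but are not the same number.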

- Shouldn't you take the square root of the differences after you square them to get a more accurate estimate of the variance?(11 votes)
- It seems to me, after watching all the videos before this one, that statistics is not based on any reasonable methodology. Is that the truth?(3 votes)
- No, it's not the truth. Statistics can sometimes be challenging to understand when first starting, but it is based on very reasonable methodology.

Part of the issue in learning intro stats is that some of the oddities require more advanced math to understand. So it's to some degree a chicken-and-the-egg type problem. Both the theory and the methods, without the context of the other, can seem arbitrary at times.(15 votes)

- It really bothers me that these terms are introduced here without a definition. Where would I even go to get some context? It seems like Variance doesn't actually get defined until the next course, which is absurd.(8 votes)
- Why do we square the differences? And also, would dividing by n-2 also work?(8 votes)
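
The question about other divisors can be explored experimentally, in the spirit of the computer simulation Sal mentions in the video. The sketch below is not his simulation. It assumes a normally distributed population with true variance 1, a sample size of 6, and a hypothetical helper `average_estimate`. It draws many samples and averages the estimate for each divisor.

```python
import random

def average_estimate(divisor_offset, n=6, trials=20000, seed=0):
    """Average of sum-of-squares / (n - divisor_offset) over many samples.

    Samples come from a normal population with mean 0 and variance 1
    (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)
        total += sum_sq / (n - divisor_offset)
    return total / trials

# Offset 0 divides by n, offset 1 by n - 1, offset 2 by n - 2.
for offset in (0, 1, 2):
    print("divide by n -", offset, "->", round(average_estimate(offset), 3))
```

Under these assumptions, the average when dividing by n should land below the true variance of 1, the n - 1 average close to 1, and the n - 2 average above 1. That suggests n - 2 overcorrects in the same way that n undercorrects.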

## Video transcript

Let's say that you're
curious about people's TV watching habits. And in particular, how much TV
do people in the country watch? So what you are concerned
with, if we imagine the entire country-- and
we've already talked about-- especially if we're talking
about a country like the United States, but pretty
much any country, is a very large population. In the United
States, we're talking about on the order of
300 million people. So ideally, if you could
somehow magically do it, you would survey or somehow
observe all 300 million people and take the mean of
how many hours of TV they watch on a given day. And then that will give you the
parameter, the population mean. But we've already
talked about, in a case like this, that's
very impractical. Even if you tried to do
it, by the time you did it, your data might be stale because
some people might have passed away, other people
might have been born. Who knows what
might have happened. And so this is a truth
that is out there. There is a theoretical
population mean for the amount of the
average or the mean hours of TV watched per
day by Americans. There is a truth here at
any given point in time. It's just pretty much
impossible to come up with the exact answer, to
come up with this exact truth. But you don't give up. You say, well, maybe I don't
have to survey all 300 million or observe all 300 million. Instead, I'm just going
to observe a sample, right over here. And let's say, to make
the computation simple, you do a sample of six. And we'll talk about
later why six might not be as large of a sample
as you would like. But you survey how much
TV these folks watch. And you find one person who
watched 1 and 1/2 hours. Another person watched
2 and 1/2 hours. Another person watched 4 hours. And then you get one
person who watched 2 hours. And you get two people
who watched 1 hour each. So given this data
from your sample, what do you get as
your sample mean? Well, the sample mean, which
we would denote by lowercase x with a bar over
it, is just the sum of all of these divided by the
number of data points we have. So let's see, we have 1.5
plus 2.5 plus 4 plus 2 plus 1 plus 1. And all of that
divided by 6, which gives-- let's see, the numerator
1.5 plus 2.5 is 4, plus 4 is 8, plus 2 is 10, plus 2 more is 12. So it's going to
be 12 over 6, which is equal to 2 hours
of television. So at least for your
sample, you say, my sample mean is two
hours of television. It's an estimate. It's a statistic that
is trying to estimate this parameter, this thing
that's very hard to know. But it's our best shot. Maybe we get a better answer
if we get more data points. But this is what we have so far. Now the next question
you ask yourself is, well, I don't want to just
estimate my population mean. I also want to estimate
another parameter. I also am interested in
estimating my population variance. So once again, since
we can't survey everyone in the
population, this is pretty much
impossible to know. But we're going to attempt to
estimate this parameter. We attempted to
estimate the mean. Now we will also
attempt to estimate this parameter, this
variance parameter. So how would you do it? Well, reasonable logic would
say, well, maybe we'll do the same thing
with a sample as we would have done
with the population. When you're doing the
population variance, you would take each data
point in the population, find the distance between that
and the population mean, take the square of
that difference, and then add up all the
squares of those differences, and then divide by the number
of data points you have. So let's try that over here. So let's try to find-- take
each of these data points, and find the difference--
let me do that in a different color--
each of these data points, and find the difference
between that data point and our sample mean--
not the population mean, we don't know what
the population mean is-- the sample mean. So that's that first data
point, 1.5 minus 2 squared, plus the second data point-- so it's 4 minus 2
squared plus 1 minus 2 squared. And this is what
you would have done if you were taking a
population variance. If this was your
entire population, this is how you would
find a population variance here, if this was your
entire population. And you find the
squared distances from each of those data
points and then divide by the number of data points. So let's just think
about this a little bit. 1 minus 2 squared. Then you have 2.5
minus 2-- 2 being the sample mean-- squared. Let me see, this green color. Plus 2 minus 2 squared. Plus 1 minus 2 squared. And then maybe you would divide
by the number of data points that you have, where you have
the number of data points. So in this case,
we're dividing by 6. And what would we get
in this circumstance? Well, if we just do the
computation, 1.5 minus 2 is negative 0.5. We square that. This becomes a positive 0.25. 4 minus 2 squared is going
to be 2 squared, which is 4. 1 minus 2 squared--
well, that's negative 1 squared, which is just 1. 2.5 minus 2 is 0.5
squared, is 0.25. 2 minus 2 squared--
well, that's just 0. And then 1 minus 2 squared is
1, it's negative 1 squared. So we just get 1. And if we add all
of this up-- let me add the whole numbers first. 4 plus 1 is 5, plus 1 is 6,
and then we have two 0.25s. So this is going to
be equal to 6.5-- let me write this
in a neutral color. So this is going to be 6.5
over this 6 right over here. Well, there's a couple of
ways we could write this, but I'll just get
the calculator out and we can just calculate it. So 6.5 divided by 6
gets us-- if we round, it's approximately 1.08. So it's approximately
1.08 is this calculation. Now what we have
to think about is whether this is the best
calculation, whether this is the best estimate for the
population variance, given the data that we have. You can always argue that
we could have more data. But given the data we have,
is this the best calculation that we can make to estimate
the population variance? And I'll have you think
about that for a second. Well, it turns out
that this is close, this is close to the best
calculation, the best estimate that we can make,
given the data we have. And sometimes this will be
called the sample variance. But it's a particular
type of sample variance where we just divide by the
number of data points we have. And so people will write
just an n over here. So this is one way to define a
sample variance in an attempt to estimate our
population variance. But it turns out--
and in the next video I'll give you an
intuitive explanation of why it turns out this way. And then I would also like to
write a computer simulation that, at least
experimentally, makes you feel a little bit better. But it turns out, you're going
to get a better estimate-- and it's a little bit weird
and voodooish at first when you first think
about it-- you're going to get a better estimate
for your population variance if you don't divide by
6, if you don't divide by the number of
data points you have but you divide by one less
than the number of data points you have. So how would we do that? And we can denote that
as sample variance. So when most people talk
about the sample variance, they're talking about
the sample variance where you do this calculation,
but instead of dividing by 6 you were to divide by 5. You would divide by 5. So they would say you
divide by n minus 1. So what would we get
in those circumstances? Well, the top part is going
to be the exact same thing. We're going to get 6.5. But then our
denominator, our n is 6. We have 6 data points. But we're going to
divide by 1 less than 6. We're going to divide by 5. And 6.5 divided by
5 is equal to 1.3. So when we calculate our sample
variance with this technique, which is the more
mainstream technique-- and it seems voodoo. Why are we dividing
by n minus 1, whereas for a population
variance we divide by n? But remember we're trying
to estimate the population variance. And it turns out that
this is a better estimate. Because dividing by n tends
to underestimate the population variance,
dividing by n minus 1 gives a better estimate. We don't know for
sure what it is. These both could be way off. It could be just by chance
what we happen to sample. But over many samples--
and there's many ways to think about
it-- this is going to be a better calculation. It's going to give
you a better estimate. And so how would
we write this down? How would we write this down
with mathematical notation? Well, remember,
we're taking the sum. And we're taking each
of the data points. So we'll start with
the first data point all the way to the
nth data point. This lowercase n says that, hey,
we're looking at the sample. If I have an uppercase
N, that usually denotes that we're
trying to sum up everything in the population. Here we're looking at a
sample of size lowercase n. And we're taking each data
point, so each x sub i, and from it we're
subtracting the sample mean. And then we're squaring it. We're taking the sum of
the squared distances. And then we're dividing, not
by the number of data points we have, but by 1 less than the
number of data points we have. So this calculation, where
we just summed up all of this and then we divided
by 5, not by 6, this is the standard
definition of sample variance. So I'll leave you there. In the next video,
I will attempt to give you an intuition of
why we're dividing by n minus 1 instead of dividing by n.
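
In notation, the two calculations from the video are $\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$ (divide by n) and $s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ (divide by n minus 1). The arithmetic can be reproduced with a short Python sketch using the six TV-watching values from the transcript. The variable names are illustrative choices, and the last line cross-checks against the standard library's `statistics.variance`, which divides by n - 1.

```python
import statistics

# Hours of TV watched by the six people in the sample.
hours = [1.5, 2.5, 4, 2, 1, 1]

n = len(hours)
sample_mean = sum(hours) / n  # 12 / 6

# Squared distance of each data point from the sample mean.
squared_deviations = [(x - sample_mean) ** 2 for x in hours]
sum_sq = sum(squared_deviations)

divide_by_n = sum_sq / n                  # first calculation in the video
divide_by_n_minus_1 = sum_sq / (n - 1)    # standard sample variance

print("sample mean:            ", sample_mean)
print("sum of squared distances:", sum_sq)
print("dividing by n:          ", round(divide_by_n, 2))
print("dividing by n - 1:      ", divide_by_n_minus_1)
print("statistics.variance:    ", statistics.variance(hours))
```

The results should match the video: a sample mean of 2 hours, a sum of squared distances of 6.5, about 1.08 when dividing by 6, and 1.3 when dividing by 5.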