© 2023 Khan Academy

# Statistics: Alternate variance formulas

Sal explains a different variance formula and why it works! For a population, the variance is calculated as σ² = ( Σ (x-μ)² ) / N. Another equivalent formula is σ² = ( (Σ x²) / N ) - μ². If we need to calculate variance by hand, this alternate formula is easier to work with. Created by Sal Khan.
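
As a rough sketch of the two formulas above, the following Python compares them on a small made-up data set. The function names and the sample values are illustrative assumptions, not something taken from the video.

```python
def variance_definition(data):
    """Population variance from the definition: mean of squared deviations."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def variance_alternate(data):
    """Population variance as (mean of the squares) minus (square of the mean)."""
    n = len(data)
    mu = sum(data) / n
    return sum(x * x for x in data) / n - mu ** 2


data = [1, 2, 3, 4, 5]  # illustrative values, not from the video
print(variance_definition(data))  # 2.0
print(variance_alternate(data))   # 2.0
```

Both functions still compute the mean first; the alternate form only removes the per-point subtraction before squaring, which is what makes it lighter to do by hand.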

## Want to join the conversation?

- How does this work for sample variance? Do you just subtract 1 from n? (35 votes)
- Caution! As Sal states, this formula for variance works for population data only, not for sample data. It is not a general formula, so simply subtracting 1 from n won't yield the sample variance. You can simplify the sample variance formula the same way it is done in the video and you'll get the corresponding result. Thanks (36 votes)

- Around 3:30 Sal references the Calculus playlist--I'm not even CLOSE to that playlist yet. Am I watching these videos too soon? It seems like the Statistics playlist is showing up really early on my practice map and I may not have the skills to successfully accomplish the unit. Do you think this could be true? I did okay up through standard deviation, but z-scores, empirical rule and some references are throwing me off! (19 votes)
- This is like a side tour, sightseeing in a cool neighborhood. You don't need to move into the calculus house to work in statistics. For example, I think the formula for the Standard Deviation of a uniform distribution is (b-a)/sqrt(12). I wanted to know, Why 12? I asked Doctor Math and he (Doctor Anthony) gave me an explanation that I (frankly) didn't understand, but trust. I don't need to know where the 12 came from to use the formula, but I find it comforting to know that someone knows.(20 votes)

- Around 4:25, where did the 1 come from, next to the sum? (5 votes)
- First, it comes from the μ² in the "∑( … μ²)" above.

Sal moved μ² to the left of the ∑ by factoring it out of every term, so each term is divided by μ² and the μ² is multiplied back on the left side:

∑(μ²) = μ²∑(1),

because μ² ÷ μ² = 1. μ² is a constant, so it can be pulled out; it's only the xᵢ stuff that has to stay inside the sum. Got that now? =) (10 votes)
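
Written out as a small worked equation (a sketch in standard notation, using N for the population size as in the video description), the constant factors out and the leftover sum of 1's just counts the terms:

```latex
\sum_{i=1}^{N} \mu^2
  = \mu^2 \sum_{i=1}^{N} 1
  = \mu^2 \cdot \underbrace{(1 + 1 + \cdots + 1)}_{N \text{ times}}
  = N \mu^2
```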

- Can someone please let me know how this works out for sample variance? Do we need to use (N-1) instead of N in the denominator and carry out the simplifications accordingly? (5 votes)
- I re-derived it for sample variances, and I tested my solutions against the problems section. This works if you already have a mean:

∑(x_i^2) / (N-1) - (N/(N-1)) x̄^2

It's nice, and not much more complicated than the simple one he came up with in the video. Basically, divide the first term by (N-1) instead of N, and multiply the squared mean by the sample size, then divide by the sample size minus one.

For a Raw Scores method (you don't have a mean first), this works:

(N*∑(x_i^2) - (∑x_i)^2) / (N*(N-1))

or

∑(x_i^2) / (N-1) - (∑x_i)^2 / (N*(N-1)) (3 votes)
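
A minimal sketch for checking the sample-variance forms above numerically. The function names and the data values are made up for illustration; `statistics.variance` from the Python standard library is used only as an independent reference for the N-1 version.

```python
import statistics


def sample_variance_with_mean(data):
    """Sample variance using the mean: sum(x^2)/(N-1) - (N/(N-1)) * xbar^2."""
    n = len(data)
    xbar = sum(data) / n
    sum_sq = sum(x * x for x in data)
    return sum_sq / (n - 1) - (n / (n - 1)) * xbar ** 2


def sample_variance_raw(data):
    """Raw score form: (N*sum(x^2) - (sum x)^2) / (N*(N-1))."""
    n = len(data)
    total = sum(data)
    sum_sq = sum(x * x for x in data)
    return (n * sum_sq - total ** 2) / (n * (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
print(sample_variance_with_mean(data))
print(sample_variance_raw(data))
print(statistics.variance(data))  # reference value, divides by N-1
```

The three printed values should agree up to floating-point rounding.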

- Variance is the single most used formula in machine learning in supervised learning lessons. Thanks Sal! You're giving me greater intuition for the topics, making me a better engineer! (5 votes)
- If I hear "mew" again I'm going to scream. (4 votes)
- Where do I find help with Chebyshev's theorem? (2 votes)
- Doesn't that simplify to <x^2> - <x>^2? Then the standard deviation would be (<x^2>-<x>^2)^(1/2) (3 votes)
- The denominators are different for the (x^2) and the (x)^2, so you can't manipulate them that way. (2 votes)
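
For the population formula in the video (dividing by N throughout), the averaging notation can be checked directly. This is a sketch in which ⟨·⟩ is assumed to mean "average over the population"; the denominators N and N² in the raw score form are consistent with it because the second term is the square of an average. It does not carry over unchanged to the sample variance, where the denominator is N-1.

```latex
\langle x \rangle = \frac{1}{N}\sum_{i=1}^{N} x_i = \mu,
\qquad
\langle x^2 \rangle = \frac{1}{N}\sum_{i=1}^{N} x_i^2

\sigma^2
  = \frac{\sum_{i=1}^{N} x_i^2}{N} - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N^2}
  = \langle x^2 \rangle - \langle x \rangle^2,
\qquad
\sigma = \sqrt{\langle x^2 \rangle - \langle x \rangle^2}
```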

- This video was really hard to understand; I didn't understand the simplification part. (3 votes)
- At 11:30, I do not totally get why the second formula is faster than the previous one. Can someone please explain? (2 votes)
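
One way to see the difference, as a minimal sketch: the raw score form only needs three running totals (the count, the sum of the values, and the sum of their squares), so the data can be read once without knowing the mean ahead of time. The function name and values below are illustrative assumptions.

```python
def population_variance_one_pass(values):
    """Raw score method: accumulate n, sum of x, and sum of x^2 in a single pass."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    if n == 0:
        raise ValueError("need at least one value")
    # (sum of squares)/n minus (sum/n)^2, i.e. mean of squares minus squared mean
    return total_sq / n - (total / n) ** 2


# A generator can only be traversed once, so the definition-based
# formula (mean first, then deviations) could not be applied to it directly.
stream = (x for x in [1, 2, 3, 4, 5])
print(population_variance_one_pass(stream))  # 2.0
```

A caveat on the design: in floating-point arithmetic this form subtracts two numbers that can be large and nearly equal, so it can lose precision when the mean is much larger than the spread. For hand calculation with small numbers that is not a concern.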

## Video transcript

I think now is as good a time as
any to play around a little bit with the formula for variance
and see where it goes. And I think just by doing this
we'll also get a little bit better intuition of just
manipulating sigma notation, or even what it means. So we learned several times
that the formula for variance-- and let's just do
variance of a population. It's almost the same thing
as variance of a sample. You just divide by n
instead of n minus 1. Variance of a population
is equal to-- well, you take each of the
data points x sub i. You subtract from that the mean. You square it. And then you take the
average of all of these. So you add the squared distance
for each of these points from i equals 1 to i is equal to n. And you divide it by n. So let's see what
happens if we can-- maybe we want to multiply
out the squared term and see where it takes us. So let's see. And I think it'll take
us someplace interesting. So this is the same thing as the
sum from i is equal to 1 to n. This, we just multiply it out. This is the same thing as x
sub i squared minus-- this is your little
algebra going on here. So when you square it-- I
mean, we could multiply it out. We could write it. x sub i minus mu times
x sub i minus mu. So we have x sub i times x
sub i, that's x sub i squared. Then you have x sub
i times minus mu. And then you have
minus mu times x sub i. So when you add
those two together, you get minus 2x sub i mu,
because you have it twice. x sub i times minus mu, that's
one minus x sub i mu. And then you have another
one, minus mu x sub i. When you add them together,
you get minus 2x sub i mu. I know it's confusing with me
saying sub i and all of that. But it's really no
different than when you did a minus b squared. Just the variables look a
little bit more complicated. And then the last term is
minus mu times minus mu, which is plus mu squared. Fair enough. Let me switch colors just
to keep it interesting. Let me cordon that off. The sum of this
is the same thing as the sum of-- because
if you think about it, we're going to
take each x sub i. For each of the numbers
in our population, we're going to
perform this thing. And we're going to sum it up. But if you think
about it, this is the same thing
as-- if you're not familiar with sigma notation
this is a good thing to know in general, just
a little bit of intuition. That this is the same thing as--
I'll do it here to have space. The sum from i is equal to
1 to n of the first term, x sub i squared minus--
and actually, we can bring out the
constant terms. When you're summing, the
only thing that matters is the thing that
has the i-th term. So in this case, it's x sub i. So x sub 1, x sub 2. So that's the
thing that you have to leave on the right hand
side of the sigma notation. And if you've done the
calculus playlists already, sigma notation is really
like a discrete integral on some level. Because in an integral, you're
summing up a bunch of things and you're multiplying
them times dx, which is a really
small interval. But here you're
just taking a sum. And we showed in the
calculus playlist that an integral actually
is this infinite sum of infinitely
small things, but I don't want to digress too much. But this was just a long way
of saying that the sum from i equals 1 to n of the second term
is the same thing as minus 2 times mu of the sum from i is
equal to 1 to n of x sub i. And then finally,
you have plus-- well, this is just a constant term. So you can take it out. Plus mu squared times the
sum from i equals 1 to n. And what's going to be here? It's going to be a 1. We just divided this by mu squared and took the mu squared out of the
sigma sign, out of the sum. And you're just
left with a 1 there. And actually, we could have
just left the mu squared there. But either way, let's
just keep simplifying it. So this we can't really do--
well, actually we could. Well, no, we don't know
what the x sub i's are. So we just have to
leave that the same. So that's the sum. Oh sorry, and this is
just the numerator. This whole simplification, we're
just simplifying the numerator. And later, we're just
going to divide by n. So that is equal to
that divided by n, which is equal to this
thing divided by n. I'll divide by n at the end. Because it's the numerator
that's the confusing part. We just want to simplify
this term up here. So let's keep doing this. So this equals the sum
from i equals 1 to n of x sub i squared. And let's see, minus
2 times mu-- sorry, that mu doesn't look good. Edit, Undo, minus 2 times
mu times the sum from i is equal to 1 to n of xi. And then, what is this? What is another
way to write this? Essentially, we're going
to add 1 to itself n times. This is saying, just look,
whatever you have here, just iterate through it n times. If you had an x sub
i here, you would use the first x term,
then the second x term. When you have a 1 here, this
is just essentially saying, add one to itself n times,
which is the same thing as n. So this is going to be
plus mu squared times n. And then see if there's
anything else we can do here. Remember, this was
just the numerator. So this looks fine. We add up each of those terms. So we just have
minus 2 mu from i equals 1 to-- oh well,
think about this. What is this? What is this thing right here? Well actually, let's
bring back that n. So this simplified
to that divided by n, which simplifies to
that whole thing, which is simplified to this
whole thing, divided by n, which simplifies to this whole
thing divided by n, which is the same thing as each of
the terms divided by n, which is the same thing as that,
which is the same thing as that, which is the same thing as that. And now, well, how
does this simplify? This is the interesting part. Well, this, nothing
much I can do here. So that just becomes the
sum from i is equal to 1 to n of x sub i squared
divided by big N. Now this is interesting. If I take each of the terms in
my population and I add them up and then I divide it
by n, what is that? This thing right here? If I sum up all of the
terms in my population and divide by the number
of terms there are? That's the mean, right? That's the mean
of my population. So this thing right
here is also mu. So this thing
simplifies to what? Minus 2 times what? Mu, times this whole
thing, which is mu too. So minus 2 times mu squared. Mu times mu; this is the
mean of the population. So that was a nice
simplification. And then plus-- what
do you have here? Well let's see,
you have n over n. Those cancel out. So we just have plus mu squared. So that was a very
nice simplification. And then this simplifies to--
can't do much on this side. So the sum from i is equal to 1
to n of x sub i squared over n. And then you see, we have minus
2 mu squared plus mu squared. Well, that's the same
thing as minus mu squared. Minus the mean squared. So this already we've
come up with a neat way of writing the variance. You can essentially take the
average of the squares of all of the numbers in this
case, a population, and then subtract from
that the mean squared of your population. So this could be, depending
on how you're calculating things, maybe a slightly faster way
of calculating the variance. So just playing with a little
algebra, we got from this thing, where each time you have to
take each of your data points, subtract the mean from
it, and then square it. And of course, before
you do anything you have to calculate the mean. And you take the square. And then you sum them all up. Then you take the
average, essentially, when you sum and divide by n. We've simplified it, just
using a little bit of algebra, to this formula. We're getting to something
called the raw score method. And what we want to do is write
this right here just in terms of x sub i's. And then we really have
what you call the raw score method, which is
oftentimes a faster way of calculating the variance. So let's see what
is mu equal to? What is the mean? The mean is just equal to the
sum from i is equal to 1 to n of each of the terms-- you
just take the sum of each of the terms-- and you divide by
the number of terms there are. So if we look at this
thing, this thing can be written as-- let
me draw a line here. This thing can be written as the
sum from i is equal to 1 to n of x sub i squared, all of
that over n, minus mu squared. Well, mu is this. So this thing squared is what? This is the sum of x sub i, from i is equal to 1
up to n. You're going to
square this thing. And then you're going to divide
it by-- we squared it, right? You divide it by n squared. And this might seem like a
more-- out of all of them, this actually seems like
the simplest formula for me. Where you essentially
just take-- if you know the mean
of your population-- you just say, OK,
my mean is whatever and I can just square that. And just put that
aside for a second. But first, I can just
take each of the numbers, square them, and
then sum them up, and divide by the number
of numbers I have. I don't know if I
wrote-- no, I've erased the last set of numbers. But we could show
you that you'll get to the same variance. So to me, this is almost
the simplest formula. But this one's even
faster in a lot of ways because you don't really
have to even calculate the mean ahead of time. You can just say,
OK, for each xi I just perform this operation. And then I divide by n
squared or n accordingly. And I'll also get
to the variance. So you don't have to do
this calculation before you figure out the whole variance. But anyway, I thought it would
be instructive and hopefully give you a little bit more
intuition behind the algebra dealing with sigma
if we worked out these other ways
to write variances. And frankly, some books
will just say, oh yeah, you know what? The variance could
be written like this. We're talking about the
variance of a population. Or it could be
written like this, or maybe they'll even
write it like this. And it's good to know
that you can just do a little simple
algebraic manipulation and get from one to the other. Anyway, I've run out of time. See you in the next video.