Statistics: Alternate variance formulas
Sal explains a different variance formula and why it works! For a population, the variance is calculated as σ² = ( Σ (x-μ)² ) / N. Another equivalent formula is σ² = ( (Σ x²) / N ) - μ². If we need to calculate variance by hand, this alternate formula is easier to work with. Created by Sal Khan.
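As a quick numerical check of the two formulas in the description, here is a minimal Python sketch. The function names and the example data are made up for illustration and do not come from the video.

```python
def variance_definition(data):
    """Population variance as the mean of squared deviations from the mean."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def variance_alternate(data):
    """Population variance as the mean of the squares minus the squared mean."""
    n = len(data)
    mu = sum(data) / n
    return sum(x * x for x in data) / n - mu ** 2


# Hypothetical population, chosen only so the arithmetic is easy to follow.
population = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance_definition(population))  # 4.0
print(variance_alternate(population))   # 4.0
```

For this data the mean is 5 and the mean of the squares is 29, so the alternate formula gives 29 - 25 = 4, the same value as the definition.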
Want to join the conversation?
- How does this work for sample variance? Do you just subtract 1 from n?(35 votes)
- CAUTION! As Sal states, this variance formula works only for population data, not for sample data. It is not a general formula, so simply subtracting 1 from n won't give you the sample variance. If you simplify the sample variance formula the same way Sal does in the video, you'll get the right result. Thanks(36 votes)
- Around 3:30 Sal references the Calculus playlist. I'm not even CLOSE to that playlist yet. Am I watching these videos too soon? It seems like the Statistics playlist is showing up really early on my practice map and I may not have the skills to successfully accomplish the unit. Do you think this could be true? I did okay up through standard deviation, but z-scores, the empirical rule and some references are throwing me off!(19 votes)
- This is like a side tour, sightseeing in a cool neighborhood. You don't need to move into the calculus house to work in statistics. For example, I think the formula for the Standard Deviation of a uniform distribution is (b-a)/sqrt(12). I wanted to know, Why 12? I asked Doctor Math and he (Doctor Anthony) gave me an explanation that I (frankly) didn't understand, but trust. I don't need to know where the 12 came from to use the formula, but I find it comforting to know that someone knows.(20 votes)
- Around 4:25, where did the 1 next to the sum come from?(5 votes)
- First, it comes from the "∑( … μ²)" above.
Sal moved μ² to the left of the ∑ by factoring it out of each term:
∑(μ²) = μ²∑(1),
because μ² ÷ μ² = 1. Only the xᵢ terms depend on i, so they are the only thing that can't be pulled out to the left side. Got that now? =)(10 votes)
- Can someone please let me know how does this work out for sample variance? Do we need to use (N-1) instead of N in the denominator and carry out the simplifications accordingly?(5 votes)
- I re-derived it for sample variances, and I tested my solutions against the problems section. This works if you already have a mean:
∑(x_i^2) / (N-1) - (N/(N-1)) x̄^2
It's nice, and not much more complicated than the simple one he came up with in the video. Basically, divide the first term by (N-1) instead of N, and multiply the squared mean by the sample size, then divide by the sample size minus one.
For a Raw Scores method (you don't have a mean first), this works:
(N*∑(x_i^2) - (∑x_i)^2) / (N*(N-1))
= ∑(x_i^2) / (N-1) - (∑x_i)^2 / (N*(N-1))(3 votes)
- Variance is the single most used formula in Machine learning in supervised learning lessons. Thanks Sal ! You're giving me greater intuition of the topics making me a better engineer.!(5 votes)
- Doesn't that simplify to <x^2> - <x>^2. Then the standard deviation would be (<x^2>-<x>^2)^(1/2)(3 votes)
- The denominators are different for the (x^2) and the (x)^2, so you can't manipulate them that way.(2 votes)
- At 11:30, I do not totally get why the second formula is faster than the previous one. Can someone please explain?(2 votes)
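On the sample-variance question raised in the comments above, the two commenter-derived formulas can be checked against Python's standard `statistics.variance`, which computes the sample variance. This is only a sketch: the function names and the data are hypothetical, and the last function shows the "just subtract 1 from n" idea so its mismatch is visible.

```python
import statistics
from math import isclose


def sample_variance_with_mean(data):
    """Sum(x^2)/(n-1) - (n/(n-1)) * xbar^2, for when the mean is already known."""
    n = len(data)
    xbar = sum(data) / n
    return sum(x * x for x in data) / (n - 1) - (n / (n - 1)) * xbar ** 2


def sample_variance_raw(data):
    """(n*Sum(x^2) - (Sum x)^2) / (n*(n-1)), using only the raw sums."""
    n = len(data)
    s1 = sum(data)                 # sum of the values
    s2 = sum(x * x for x in data)  # sum of the squared values
    return (n * s2 - s1 ** 2) / (n * (n - 1))


def naive_swap(data):
    """Population shortcut with n replaced by n-1 in one place only (not valid)."""
    n = len(data)
    xbar = sum(data) / n
    return sum(x * x for x in data) / (n - 1) - xbar ** 2


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
reference = statistics.variance(sample)

print(isclose(sample_variance_with_mean(sample), reference))  # True
print(isclose(sample_variance_raw(sample), reference))        # True
print(isclose(naive_swap(sample), reference))                 # False
```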
I think now is as good a time as any to play around a little bit with the formula for variance and see where it goes. And I think just by doing this we'll also get a little bit better intuition of just manipulating sigma notation, or even what it means. So we learned several times that the formula for variance-- and let's just do variance of a population. It's almost the same thing as variance of a sample. You just divide by n instead of n minus 1. Variance of a population is equal to-- well, you take each of the data points x sub i. You subtract from that the mean. You square it. And then you take the average of all of these. So you add the squared distance for each of these points from i equals 1 to i is equal to n. And you divide it by n. So let's see what happens if we can-- maybe we want to multiply out the squared term and see where it takes us. So let's see. And I think it'll take us someplace interesting. So this is the same thing as the sum from i is equal to 1 to n. This, we just multiply it out. This is the same thing as x sub i squared minus-- this is your little algebra going on here. So when you square it-- I mean, we could multiply it out. We could write it. x sub i minus mu times x sub i minus mu. So we have x sub i times x sub i, that's x sub i squared. Then you have x sub i times minus mu. And then you have minus mu times x sub i. So when you add those two together, you get minus 2x sub i mu, because you have it twice. x sub i times mu, that's 1 minus x sub i mu. And then you have another one, minus mu x sub i. When you add them together, you get minus 2x sub i mu. I know it's confusing with me saying sub i and all of that. But it's really no different than when you did a minus b squared. Just the variables look a little bit more complicated. And then the last term is minus mu times minus mu, which is plus mu squared. Fair enough. Let me switch colors just to keep it interesting. Let me cordon that off. 
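Written out in symbols, the expansion described above looks like this (N is the population size, written as n in the transcript):

```latex
\[
(x_i - \mu)^2 = (x_i - \mu)(x_i - \mu) = x_i^2 - 2\mu x_i + \mu^2
\]
\[
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2
         = \frac{1}{N}\sum_{i=1}^{N}\left(x_i^2 - 2\mu x_i + \mu^2\right)
\]
```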
The sum of this is the same thing as the sum of-- because if you think about it, we're going to take each x sub i. For each of the numbers in our population, we're going to perform this thing. And we're going to sum it up. But if you think about it, this is the same thing as-- if you're not familiar with sigma notation this is a good thing to know in general, just a little bit of intuition. That this is the same thing as-- I'll do it here to have space. The sum from i is equal to 1 to n of the first term, x sub i squared minus-- and actually, we can bring out the constant terms. When you're summing, the only thing that matters is the thing that has the i-th term. So in this case, it's x sub i. So x sub 1, x sub 2. So that's the thing that you have to leave on the right hand side of the sigma notation. And if you've done the calculus playlists already, sigma notation is really like a discrete integral on some level. Because in an integral, you're summing up a bunch of things and you're multiplying them times dx, which is a really small interval. But here you're just taking a sum. And we showed in the calculus playlist that an integral actually is this infinite sum of infinitely small things, but I don't want to digress too much. But this was just a long way of saying that the sum from i equals 1 to n of the second term is the same thing as minus 2 times mu of the sum from i is equal to 1 to n of x sub i. And then finally, you have plus-- well, this is just a constant term. This is just a constant term. So you can take it out. Times mu squared times the sum from i equals 1 to n. And what's going to be here? It's going to be a 1. We just divided a 1. We just divided this by 1. And took it out of the sigma sign, out of the sum. And you're just left with a 1 there. And actually, we could have just left the mu squared there. But either way, let's just keep simplifying it. So this we can't really do-- well, actually we could. 
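The step of splitting the sum and pulling the constants outside the sigma can be written as follows. Only the numerator is shown, as in the transcript; μ does not depend on i, which is why it can move to the left of each sum.

```latex
\[
\sum_{i=1}^{N}\left(x_i^2 - 2\mu x_i + \mu^2\right)
  = \sum_{i=1}^{N} x_i^2 \;-\; 2\mu \sum_{i=1}^{N} x_i \;+\; \mu^2 \sum_{i=1}^{N} 1
\]
```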
Well, no, we don't know what the x sub i's are. So we just have to leave that the same. So that's the sum. Oh sorry, and this is just the numerator. This whole simplification, we're just simplifying the numerator. And later, we're just going to divide by n. So that is equal to that divided by n, which is equal to this thing divided by n. I'll divide by n at the end. Because it's the numerator that's the confusing part. We just want to simplify this term up here. So let's keep doing this. So this equals the sum from i equals 1 to n of x sub i squared. And let's see, minus 2 times mu-- sorry, that mu doesn't look good. Edit, Undo, minus 2 times mu times the sum from i is equal to 1 to n of xi. And then, what is this? What is another way to write this? Essentially, we're going to add 1 to itself n times. This is saying, just look, whatever you have here, just iterate through it n times. If you had an x sub i here, you would use the first x term, then the second x term. When you have a 1 here, this is just essentially saying, add one to itself n times, which is the same thing as n. So this is going to be plus mu squared times n. And then see if there's anything else we can do here. Remember, this was just the numerator. So this looks fine. We add up each of those terms. So we just have minus 2 mu from i equals 1 to-- oh well, think about this. What is this? What is this thing right here? Well actually, let's bring back that n. So this simplified to that divided by n, which simplifies to that whole thing, which is simplified to this whole thing, divided by n, which simplifies to this whole thing divided by n, which is the same thing as each of the terms divided by n, which is the same thing as that, which is the same thing as that, which is the same thing as that. And now, well, how does this simplify? This is the interesting part. Well, this, nothing much I can do here. So that just becomes the sum from i is equal to 1 to n x sub i squared divided by big N. 
Now this is interesting. If I take each of the terms in my population and I add them up and then I divide it by n, what is that? This thing right here? If I sum up all of the terms in my population and divide by the number of terms there are? That's the mean, right? That's the mean of my population. So this thing right here is also mu. So this thing simplifies to what? Minus 2 times what? Mu, times this whole thing, which is mu too. So minus 2 times mu squared. Mu times mu; this is the mean of the population. So that was a nice simplification. And then plus-- what do you have here? Well let's see, you have n over n. Those cancel out. So we just have plus mu squared. So that was a very nice simplification. And then this simplifies to-- can't do much on this side. So the sum from i is equal to 1 to n of x sub i squared over n. And then you see, we have minus 2 mu squared plus mu squared. Well, that's the same thing as minus mu squared. Minus the mean squared. So this already we've come up with a neat way of writing the variance. You can essentially take the average of the squares of all of the numbers in, in this case, a population, and then subtract from that the mean squared of your population. So this could be, depending on how you're calculating things, maybe a slightly faster way of calculating the variance. So just playing with a little algebra, we got from this thing where you have to each time take each of your data points, subtract the mean from it, and then square it. And of course, before you do anything you have to calculate the mean. And you take the square. And then you sum them all up. Then you take the average, essentially, when you sum and divide by n. We've simplified it just using a little bit of algebra to this formula. We're getting to something called the raw score method. And what we want to do is write this right here just in terms of xi's. And then we really have what you call the raw score method, which is oftentimes a faster way of calculating the variance.
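Putting the pieces together, the simplification narrated above can be summarized in one chain, using the facts that the sum of 1 taken N times is N and that the sum of the x_i divided by N is μ:

```latex
\begin{align*}
\sigma^2 &= \frac{1}{N}\left(\sum_{i=1}^{N} x_i^2 - 2\mu\sum_{i=1}^{N} x_i + N\mu^2\right) \\
         &= \frac{\sum_{i=1}^{N} x_i^2}{N} - 2\mu\cdot\frac{\sum_{i=1}^{N} x_i}{N} + \mu^2 \\
         &= \frac{\sum_{i=1}^{N} x_i^2}{N} - 2\mu^2 + \mu^2 \\
         &= \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2
\end{align*}
```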
So let's see what is mu equal to? What is the mean? The mean is just equal to the sum from i is equal to 1 to n of each of the terms-- you just take the sum of each of the terms-- and you divide by the number of terms there are. So if we look at this thing, this thing can be written as-- let me draw a line here. This thing can be written as the sum from i is equal to 1 to n of x sub i squared, all of that over n, minus mu squared. Well, mu is this. So this thing squared is what? This is the sum of x sub i from i equals 1 to n. You're going to square this thing. And then you're going to divide it by-- we squared, right? You divide it by n squared. And this might seem like a more-- out of all of them, this actually seems like the simplest formula for me. Where you essentially just take-- if you know the mean of your population-- you just say, OK, my mean is whatever and I can just square that. And just put that aside for a second. But first, I can just take each of the numbers, square them, and then sum them up, and divide by the number of numbers I have. I don't know if I wrote-- no, I've erased the last set of numbers. But we could show you that you'll get to the same variance. So to me, this is almost the simplest formula. But this one's even faster in a lot of ways because you don't really have to even calculate the mean ahead of time. You can just say, OK, for each xi I just perform this operation. And then I divide by n squared or n accordingly. And I'll also get to the variance. So you don't have to do this calculation before you figure out the whole variance. But anyway, I thought it would be instructive and hopefully give you a little bit more intuition behind the algebra dealing with sigma if we worked out these other ways to write variances. And frankly, some books will just say, oh yeah, you know what? The variance could be written like this. We're talking about the variance of a population.
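The raw score form described above, the sum of the squares over n minus the squared sum over n squared, needs only two running totals, so the data can be read once without knowing the mean first. The following Python sketch illustrates that idea. The function name and data are hypothetical, and the comparison against `statistics.pvariance` is just a sanity check.

```python
import statistics
from math import isclose


def raw_score_variance(values):
    """Population variance from one pass: Sum(x^2)/n - (Sum x)^2 / n^2."""
    n = 0
    total = 0.0     # running sum of x_i
    total_sq = 0.0  # running sum of x_i squared
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    return total_sq / n - (total * total) / (n * n)


population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# iter(...) shows that a single pass over a stream of values is enough.
print(raw_score_variance(iter(population)))  # 4.0
print(isclose(raw_score_variance(population), statistics.pvariance(population)))  # True
```

One caveat with this form on a computer: when the mean is very large compared with the spread, subtracting two nearly equal floating-point numbers can lose precision, so the deviation-based definition may be the safer choice in that situation.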
Or it could be written like this, or maybe they'll even write it like this. And it's good to know that you can just do a little simple algebraic manipulation and get from one to the other. Anyway, I've run out of time. See you in the next video.