
Measures of spread: range, variance & standard deviation

Range, variance, and standard deviation all measure the spread or variability of a data set in different ways. The range is easy to calculate: it's the difference between the largest and smallest data points in a set. Variance is the average of the squared differences between each data point and the mean, and standard deviation, the square root of the variance, measures how spread out the data is from its mean. Created by Sal Khan.
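All three measures in the description can be computed with Python's standard library; a minimal sketch using the `statistics` module, on one of the data sets from the video:

```python
import statistics

data = [-10, 0, 10, 20, 30]  # one of the data sets from the video

# Range: largest data point minus smallest
data_range = max(data) - min(data)     # 30 - (-10) = 40

# Population variance: average of squared deviations from the mean
variance = statistics.pvariance(data)  # 200

# Standard deviation: square root of the variance
std_dev = statistics.pstdev(data)      # sqrt(200), about 14.14
```

Note that `pvariance`/`pstdev` are the population versions (divide by n), which is what the video computes.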

Want to join the conversation?

  • Rob
    What's the point of squaring the differences just to make them positive when we could have taken the absolute value?
    (30 votes)
    • Jacob Kalodner
      The main reason to square the values is so they are all positive. You could take the absolute value instead, but squaring means that points farther from the mean get a higher weighting. Squaring rather than taking the absolute value also means that taking the derivative of the function is easier.
      Finally, you can also view the formula as being related to the Euclidean distance between all the points and the mean of the points (in the same way that the distance between two points (x1, y1) and (x2, y2) is the square root of (x1-x2)^2 + (y1-y2)^2).

      Credits to Peter Collingridge
      (41 votes)
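Jacob's point about weighting can be seen numerically. A small sketch (the data set below is hypothetical, chosen to contain a single outlier) compares how much of the total deviation the outlier accounts for under absolute values versus squares:

```python
data = [1, 1, 1, 1, 10]       # hypothetical set with a single outlier
mean = sum(data) / len(data)  # 2.8

abs_devs = [abs(x - mean) for x in data]
sq_devs = [(x - mean) ** 2 for x in data]

# Share of the total deviation contributed by the outlier (the 10):
outlier_abs_share = abs_devs[-1] / sum(abs_devs)  # 0.5 under absolute values
outlier_sq_share = sq_devs[-1] / sum(sq_devs)     # 0.8 under squares
```

The outlier accounts for half of the absolute deviations but four fifths of the squared deviations, which is the "higher weighting" of more variable points.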
  • Ben J
    Why is it that, for the variance, we square the deviations to make them positive? Doesn't it make more sense to simply take the sum of their absolute values and then divide that by the number of data points?
    (16 votes)
  • Enn
    In what case would either Variance or Standard Deviation be preferred over the other?
    In the video, Sal says that Variance has an odd set of units, so is standard deviation better as it has the same units as the data itself?
    (13 votes)
  • David Spector
    There are many questioners here (including myself) wondering why squaring is used in the definition of variance instead of the more sensible absolute value. I've done a quick Web search on this question, and I believe I understand this better.

    First, almost all of the reasons given have to do with ease of computation. This is largely irrelevant, since we have computers to aid us. For example, the first derivative of the abs val func has a discontinuity at zero. But computer numerical analysis can handle discontinuities, so calculations using the abs val definition should be easy using an advanced computer calculator program.

    Next, some people like Euclidean distance over Manhattan (rectangular) distance. There is not much justification for this, as variance is not obviously an unconstrained 2-D distance problem. In fact, it is an n-dimensional problem, where n is the number of data measurements.

    No, the real reason is historical: Gauss used the square variance definition to introduce his concept of normal distributions, where it is a perfect and natural fit. We might say, a least-squares fit, since one of the motivations is fitting 2nd-order polynomials to the error data.

    The point is that the use of the mean and squaring in the definition of variance works great only for normal (Gaussian) distributions. As soon as you have data derived from two or more normal distributions, or a gamma distribution, or a Poisson distribution, or anything else, using abs val works better. In fact, the mean itself only works when there IS a mean, which is when the data is normally distributed. For general data, the mean is not defined, and other, more robust statistical measures must be found. (See https://en.wikipedia.org/wiki/Robust_statistics)

    In summary, the definition of variance given here by Khan only applies to Gaussian distributions, which frequently arise in nature and in human behavior. But non-Gaussian distributions also frequently arise (such as when making many measurements with a ruler). For non-Gaussian data, this def of variance is erroneous. Instead, a measure of variance that shows the spreading of the data from each other should be used rather than the standard one, which shows the order-2 spreading from the mean.

    Note: for more detailed information on the advantages of the related Absolute Mean Deviation, see http://www.leeds.ac.uk/educol/documents/00003759.htm .
    (16 votes)
  • FinallyGoodAtMath
    What is the difference between the standard deviation and the variance? Why is the variance in units squared and not represented by the units in the measurement? If squaring the numbers is just to make them positive, why not use the average absolute deviation?
    (7 votes)
    • Arakban Haberi
      Q1) The Standard Deviation is the square root of the Variance (the mean of the squared differences between the data points and the average). Standard Deviation is the measure of how far a typical value in the set is from the average. The smaller the Standard Deviation, the more closely grouped the data points are. The standard deviation of {1,2,3} would be less than the Standard Deviation of {0,4,7,10}. You can see clearly that the data points are grouped more closely together in the first set than in the second set. And of course, you will see the same when you have endured the boring process of calculating the Variance and then the Standard Deviation.

      Therefore, the difference between Variance and the Standard Deviation is that the Variance is "the average of the squared differences from the Mean" and the Standard Deviation is its square root.

      Q2) I think we could use the absolute value, but for the official definition, you have to square the differences. Thanks to Lura Ercolano for clearing my misconception about using absolute value to get the variance. But remember, by squaring the differences, we give larger deviations more influence, which is called higher weighting.
      (17 votes)
  • Zoe Martindale
    I'm still kind of confused as to what exactly variance measures.
    (7 votes)
    • Matt B
      Variance simply tells you how spread out your data is. On its own, it does not mean much, but it is particularly helpful when you compare two different samples:
      Sample 1: 1,2,3,4,3,1,2,3. Variance: small
      Sample 2: 1,2000,-23,500. Variance: much larger
      (12 votes)
  • Beau Hansen
    So, by reading some of the questions and answers for this video, I have concluded the following: variance and standard deviation are artificial measures of dispersion, designed to be most useful in statistical calculations. The average of the absolute value of the difference of each data point from the mean COULD be used, but the square method (variance) is generally adopted by statisticians and mathematicians for various reasons (e.g., derivatives are easier). Is this conclusion correct? Or are there other reasons that more variable points are given more weight (by use of squares, not absolute values)?
    Thanks.
    (6 votes)
    • Dr C
      To some extent, I would say yes. Using squares (or the method of "least squares") certainly does often make derivations easier. Though it's not entirely the only reason. The Normal distribution goes hand-in-hand with the notion of squaring deviations, and scientists centuries ago noticed that the Normal distribution worked quite well to model their astronomical data.

      The method of least squares also results in the sample mean - a very intuitive and common measure of central tendency - being the "best" measure of center. And even better, the sampling distribution of the sample mean converges to the Normal distribution, so a lot of methods can be built on top of the Normal distribution, and as long as the sample mean is the starting point, everything should work out.

      I have a longer, more detailed answer here (I really wish the KA links were shorter):

      https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/range-variance-and-standard-deviation-as-measures-of-dispersion?qa_expand_key=kaencrypted_230b1942c74c60023e9fc29baaa3fd3e_2a53d64f8e4e84ddb90698dfc32164a7370b55e38674387fc44a77a9aa93675a7db92954f954bc475d07d0df2d6df8531fabb52aa5578a3086284bbcd09a14cf0f4833f1c1a9097f7763c5a7fecf9f46f9b173b3083ad409f140a4db31622998a483418a29c48e7b7cb59bef4f004d6edc1c381a3ac93f6e473241f6164026726e30926987a145fdab3c7817889565b99532a6d38f12105af2dcd93070dfb0dc
      (5 votes)
  • milcha02
    What is range?
    (5 votes)
  • Lori Rahn
    I thought that when you calculate variance you divide by the number of terms minus 1 (n-1)? Is this correct, or was I told wrong?
    (5 votes)
    • robshowsides
      Great question. We use (n-1) when we are estimating the variance of a population based on a sample which is much smaller than the population. We almost always use the variable "s²" for that. But in this video, we are saying that this dataset is the entire population, not just a sample. If you actually have the entire population, then you divide by n, and since that's the situation here, we use "σ²" to signal that this is an exact population variance, not just from a sample. The (n-1) is a correction factor that improves (on average) an estimated variance based on a sample.
      (5 votes)
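The distinction robshowsides describes maps directly onto Python's standard library, where `statistics.pvariance` divides by n and `statistics.variance` divides by n-1; a minimal sketch on the video's less-dispersed data set:

```python
import statistics

data = [8, 9, 10, 11, 12]  # the video's less-dispersed data set

# Population variance (sigma squared): divide by n
pop_var = statistics.pvariance(data)  # 10 / 5 = 2.0

# Sample variance (s squared): divide by n - 1
samp_var = statistics.variance(data)  # 10 / 4 = 2.5
```

The sample version is slightly larger, which is exactly the (n-1) correction compensating for estimating the mean from the same data.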
  • Tutti Frutti
    You lost me at "Standard Deviation"
    (4 votes)

Video transcript

In the last video we talked about different ways to represent the central tendency or the average of a data set. What we're going to do in this video is to expand that a little bit to understand how spread apart the data is as well. So let's just think about this a little bit. Let's say I have negative 10, 0, 10, 20 and 30. Let's say that's one data set right there. And let's say the other data set is 8, 9, 10, 11 and 12. Now let's calculate the arithmetic mean for both of these data sets. So let's calculate the mean. And when you go further on in statistics, you're going to understand the difference between a population and a sample. We're assuming that this is the entire population of our data. So we're going to be dealing with the population mean. We're going to be dealing with, as you see, the population measures of dispersion. I know these are all fancy words. In the future, you're not going to have all of the data. You're just going to have some samples of it, and you're going to try to estimate things for the entire population. So I don't want you to worry too much about that just now. But if you are going to go further in statistics, I just want to make that clarification. Now, the population mean, or the arithmetic mean of this data set right here, it is negative 10 plus 0 plus 10 plus 20 plus 30 over-- we have five data points-- over 5. And what is this equal to? That negative 10 cancels out with that 10, 20 plus 30 is 50 divided by 5, it's equal to 10. Now, what's the mean of this data set? 8 plus 9 plus 10 plus 11 plus 12, all of that over 5. And the way we could think about it, 8 plus 12 is 20, 9 plus 11 is another 20, so that's 40, and then we have a 50 there. Add another 10. So this, once again, is going to be 50 over 5. So this has the exact same population means. Or if you don't want to worry about the word population or sample and all of that, both of these data sets have the exact same arithmetic mean. 
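The mean calculation Sal walks through can be sketched in a few lines of Python (the two data sets are the ones from the video):

```python
set_a = [-10, 0, 10, 20, 30]
set_b = [8, 9, 10, 11, 12]

# Both sums are 50, and both sets have 5 points, so both means are 10
mean_a = sum(set_a) / len(set_a)  # 10.0
mean_b = sum(set_b) / len(set_b)  # 10.0
```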
When you take the sum of these numbers and divide by 5, you get 10, and when you take the sum of those numbers and divide by 5, you get 10 as well. But clearly, these sets of numbers are different. You know, if you just looked at this number, you'd say, oh, maybe these sets are very similar to each other. But when you look at these two data sets, one thing might pop out at you. All of these numbers are very close to 10. I mean, the furthest number here is two away from 10. 12 is only two away from 10. Here, these numbers are further away from 10. Even the closer ones are still 10 away and these guys are 20 away from 10. So this right here, this data set right here is more dispersed, right? These guys are further away from our mean than these guys are from this mean. So let's think about different ways we can measure dispersion, or how far away we are from the center, on average. Now one way, this is kind of the most simple way, is the range. And you won't see it used too often, but it's kind of a very simple way of understanding how far the spread is between the largest and the smallest number. You literally take the largest number, which is 30 in our example, and from that, you subtract the smallest number. So 30 minus negative 10, which is equal to 40, which tells us that the difference between the largest and the smallest number is 40, so we have a range of 40 for this data set. Here, the range is the largest number, 12, minus the smallest number, which is 8, which is equal to 4. So here range is actually a pretty good measure of dispersion. We say, OK, both of these guys have a mean of 10. But when I look at the range, this guy has a much larger range, so that tells me this is a more dispersed set. But range is not always going to tell you the whole picture.
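The range calculation above can be sketched the same way:

```python
set_a = [-10, 0, 10, 20, 30]
set_b = [8, 9, 10, 11, 12]

# Range: largest value minus smallest value
range_a = max(set_a) - min(set_a)  # 30 - (-10) = 40
range_b = max(set_b) - min(set_b)  # 12 - 8 = 4
```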
You might have two data sets with the exact same range where still, based on how things are bunched up, it could still have very different distributions of where the numbers lie. Now, the one that you'll see used most often is called the variance. Actually, we're going to see the standard deviation in this video. That's probably what's used most often, but it has a very close relationship to the variance. So the symbol for the variance-- and we're going to deal with the population variance. Once again, we're assuming that this is all of the data for our whole population, that we're not just sampling, taking a subset, of the data. So the variance, its symbol is literally this sigma, this Greek letter, squared. That is the symbol for variance. And we'll see that the sigma letter actually is the symbol for standard deviation. And that is for a reason. But anyway, the definition of a variance is you literally take each of these data points, find the difference between those data points and your mean, square them, and then take the average of those squares. I know that sounds very complicated, but when I actually calculate it, you're going to see it's not too bad. So remember, the mean here is 10. So I take the first data point. Let me do it over here. Let me scroll down a little bit. So I take the first data point. Negative 10. From that, I'm going to subtract our mean and I'm going to square that. So I just found the difference from that first data point to the mean and squared it. And that's essentially to make it positive. Plus the second data point, 0 minus 10, minus the mean-- this is the mean; this is that 10 right there-- squared plus 10 minus 10 squared-- that's the middle 10 right there-- plus 20 minus 10-- that's the 20-- squared plus 30 minus 10 squared. So this is the squared differences between each number and the mean. This is the mean right there. 
I'm finding the difference between every data point and the mean, squaring them, summing them up, and then dividing by that number of data points. So I'm taking the average of these numbers, of the squared distances. So when you say it kind of verbally, it sounds very complicated. But you're taking each number. What's the difference between that, the mean, square it, take the average of those. So I have 1, 2, 3, 4, 5, divided by 5. So what is this going to be equal to? Negative 10 minus 10 is negative 20. Negative 20 squared is 400. 0 minus 10 is negative 10 squared is 100, so plus 100. 10 minus 10 squared, that's just 0 squared, which is 0. Plus 20 minus 10 is 10 squared, is 100. Plus 30 minus 10, which is 20, squared is 400. All of that over 5. And what do we have here? 400 plus 100 is 500, plus another 500 is 1000. It's equal to 1000/5, which is equal to 200. So in this situation, our variance is going to be 200. That's our measure of dispersion there. And let's compare it to this data set over here. Let's compare it to the variance of this less-dispersed data set. So let me scroll over a little bit so we have some real estate, although I'm running out. Maybe I could scroll up here. There you go. Let me calculate the variance of this data set. So we already know its mean. So its variance of this data set is going to be equal to 8 minus 10 squared plus 9 minus 10 squared plus 10 minus 10 squared plus 11 minus 10-- let me scroll up a little bit-- squared plus 12 minus 10 squared. Remember, that 10 is just the mean that we calculated. You have to calculate the mean first. Divided by-- we have 1, 2, 3, 4, 5 squared differences. So this is going to be equal to-- 8 minus 10 is negative 2 squared, is positive 4. 9 minus 10 is negative 1 squared, is positive 1. 10 minus 10 is 0 squared. You still get 0. 11 minus 10 is 1. Square it, you get 1. 12 minus 10 is 2. Square it, you get 4. And what is this equal to? All of that over 5. This is 10/5. 
So this is going to be--all right, this is 10/5, which is equal to 2. So the variance here-- let me make sure I got that right. Yes, we have 10/5. So the variance of this less-dispersed data set is a lot smaller. The variance of this data set right here is only 2. So that gave you a sense. That tells you, look, this is definitely a less-dispersed data set than that one there. Now, the problem with the variance is you're taking these numbers, you're taking the difference between them and the mean, then you're squaring it. It kind of gives you a bit of an arbitrary number, and if you're dealing with units, let's say if these are distances. So this is negative 10 meters, 0 meters, 10 meters, this is 8 meters, so on and so forth, then when you square it, you get your variance in terms of meters squared. It's kind of an odd set of units. So what people like to do is talk in terms of standard deviation, which is just the square root of the variance, or the square root of sigma squared. And the symbol for the standard deviation is just sigma. So now that we've figured out the variance, it's very easy to figure out the standard deviation of both of these characters. The standard deviation of this first one up here, of this first data set, is going to be the square root of 200. The square root of 200 is what? The square root of 2 times 100. This is equal to 10 square roots of 2. That's that first data set. Now the standard deviation of the second data set is just going to be the square root of its variance, which is just 2. So the second data set has 1/10 the standard deviation as this first data set. This is 10 roots of 2, this is just the root of 2. So this is 10 times the standard deviation. And this, hopefully, will make a little bit more sense. Let's think about it. This has 10 times the standard deviation of this one. And let's remember how we calculated it.
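The variance computation Sal walks through can be written as a short function; a minimal sketch (the helper name `population_variance` is ours, not from the video):

```python
def population_variance(data):
    """Average of the squared differences from the mean (divide by n)."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

var_a = population_variance([-10, 0, 10, 20, 30])  # 1000 / 5 = 200.0
var_b = population_variance([8, 9, 10, 11, 12])    # 10 / 5 = 2.0
```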
Variance, we just took each data point, how far it was away from the mean, squared that, took the average of those. Then we took the square root, really just to make the units look nice, but the end result is we said that that first data set has 10 times the standard deviation as the second data set. So let's look at the two data sets. This has 10 times the standard deviation, which makes sense intuitively, right? I mean, they both have a 10 in here, but in this set, 9 is only one away from the 10, while here, 0 is 10 away from the 10. 8 is only two away. This guy is 20 away. So it's 10 times, on average, further away. So the standard deviation, at least in my sense, is giving a much better sense of how far away, on average, we are from the mean. Anyway, hopefully, you found that useful.
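The standard deviation step can be added on top of the variance; a minimal sketch (again, the helper name is ours):

```python
import math

def population_std_dev(data):
    """Square root of the population variance."""
    mu = sum(data) / len(data)
    return math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))

sd_a = population_std_dev([-10, 0, 10, 20, 30])  # sqrt(200) = 10*sqrt(2), about 14.14
sd_b = population_std_dev([8, 9, 10, 11, 12])    # sqrt(2), about 1.41
```

As in the video, the first data set's standard deviation is exactly 10 times the second's.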