Sal's old statistics videos
Statistics: Variance of a Population
- I realize I made a slight error in the last video,
- when I talk about the population and the sample mean.
- So I will rewrite the equations.
- I realize I made a slight notational error,
- and it might have confused you a little bit.
- So just to review a little bit, it never hurts.
- The mean of a population-- once again, that's mu--
- the mean of a population is equal to,
- you take the sum of each of the data points.
- So you take the sum-- that's what that big sigma is for--
- of each of the data points.
- So X sub i.
- I had written X sub n before, and if you review the last video,
- you can see why it might be a little confusing.
- And you start with the first data point.
- So i is equal to 1.
- You start with the first data point,
- and you take the sum all the way to the Nth data point,
- where we have a big capital N,
- where N is the total number of elements in the population.
- And then you divide that by N.
- So that's another way of writing X sub 1, plus X sub 2,
- plus, and you just keep adding, bum bum bum,
- however many there are. X sub N, and then you divide that by N.
- And I think that's what you're familiar with
- as just the arithmetic mean, or the average.
- You just add up all the elements
- and you divide by the total number of elements there are.
- That's just a fancy way of writing that.
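Written out, the population mean described above would look roughly like this (a sketch of the notation as spoken, with X_i as the i-th data point and N as the population size):

```latex
\[
\mu = \frac{1}{N}\sum_{i=1}^{N} X_i = \frac{X_1 + X_2 + \cdots + X_N}{N}
\]
```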
- And then the sample mean is essentially the same thing,
- although you use slightly different notation.
- The sample mean is written as x with a line over it,
- and that's equal to, once again, the sum of the elements in the sample.
- And then you have just a slight notational difference.
- You start at the first element in the sample
- and you go to the number of elements in the sample.
- And that's why they use that lowercase n.
- There are big N elements in the whole population,
- and if you took some subset of that--
- we're assuming that small n is less than or equal to big N--
- and you divide that by the total number of elements in the sample.
- So once again, this would be x1 plus x2 plus dun dun dun,
- plus x lowercase n divided by lowercase n.
- These are essentially the same thing.
- If your sample was the entire population,
- then these n's would be equal to each other,
- and these numbers would be equal to each other.
- But just the notational difference,
- if you ever see this, you know you're dealing with a sample.
- Here you know you're dealing with the entire population.
- And similarly, big N, entire population,
- small n, the sample.
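For comparison, the sample mean as described above can be sketched in the same notation, assuming lowercase n is the sample size and n is at most N:

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n},
\qquad n \le N
\]
```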
- Fair enough.
- I think we're now ready to learn a little bit about
- measures of dispersion.
- So the mean and the mode and the median that
- we covered in the first video in this playlist,
- were all ways of measuring the central tendency of a data set,
- or kind of picking a number that is
- most representative of all the numbers.
- But we lose a lot of information.
- We don't know whether all of the numbers in the data set
- are close to that number, close to the mean,
- or maybe they're all really far away from the mean.
- And that's why we want to come up with measures of dispersion.
- And let me show you what I mean.
- So let's say I have one set,
- and let's say it is a 2, a 2, a 3, and a 3.
- Let's say this is a population.
- Let's just deal with population means
- and population dispersions for now.
- So what's the mean here?
- The mean here is going to be 2 plus 2 plus 3 plus 3,
- all of that over 4.
- And what is that?
- That's equal to 2 and 1/2 right?
- 4 plus 6 divided by 4, right?
- That's equal to 2.5.
- Fair enough.
- Now what if we had this?
- What if we had the numbers, I don't know, 0, 0, and 5, and 5.
- So these are the numbers in the set.
- I'll put commas
- just so you know these are separate numbers.
- What's the mean here?
- Well the mean here-- and let's say this is the population.
- This isn't a sample, this is the entire population.
- And you'll see why I'm making that distinction later.
- So it'll be 0 plus 0 plus 5 plus 5.
- Well that's 10, divided by 4 is equal to 2.5.
- So the arithmetic means of both of these populations
- are the same number.
- They're both 2.5.
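As a quick check of the two means, here is a minimal Python sketch. The names `population_a`, `population_b`, and `population_mean` are illustrative and are not from the video.

```python
# The two populations used in the video; variable names are illustrative.
population_a = [2, 2, 3, 3]
population_b = [0, 0, 5, 5]


def population_mean(values):
    """Arithmetic mean: add up all the elements and divide by how many there are."""
    return sum(values) / len(values)


mean_a = population_mean(population_a)
mean_b = population_mean(population_b)
print(mean_a, mean_b)  # prints 2.5 2.5
```

Both populations have the same mean even though their values are spread out very differently.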
- But you'll see that these sets are different.
- Here all of the numbers are pretty close to 2.5, right?
- While here, sure their mean, their arithmetic mean is 2.5,
- but they're further away from 2.5.
- Or the distances of each of these numbers,
- each of the data, each of the numbers in the set,
- their distance from the mean is further.
- So you can kind of view them that they're more dispersed,
- they're further away from the mean.
- Or another way you can think of it is the mean,
- although it does measure central tendency,
- it's not quite as indicative of all the numbers.
- The numbers are much further away from the mean on average.
- So how do you measure that?
- Well, you measure that with the variance.
- And this is something I've found,
- it seems complicated when you first look at it.
- And most statistics textbooks use fairly complex notation.
- But the idea is almost as straightforward as
- the arithmetic mean.
- So what they'll do is they'll write the variance,
- and they'll write it as this letter sigma, this Greek letter--
- I wrote the top part too long.
- Let me actually undo that.
- I don't want you to spend the rest of your life
- writing it with a big top part.
- They write it as sigma squared.
- And we'll talk in a few seconds about why it's written as--
- you know, why don't they just write v for variance?
- Why do they write this weird letter squared?
- I'll talk about that in a second.
- But the variance of a population is defined--
- and once again, these are just human derived constructs,
- to kind of get our minds around data.
- Being able to describe a set of data without having to list all the numbers,
- and being able to kind of understand
- what that data might represent, what can represent that data.
- So what you do is, you take the sum,
- and you start with all of the points in the population.
- But instead of taking the sum of the points,
- you take each point, X sub i, and you subtract from that--
- and actually it doesn't matter if you subtract from that
- or subtract that from-- the mean, the population mean.
- And then you square it.
- So what is this?
- This is the distance between each number and the mean.
- And when you square it, it becomes a positive number.
- So you can kind of view it as just the squared absolute distance
- between each number and the mean of that set.
- And then you take the sum of all of those,
- and you divide that by N, which gives you the average.
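In symbols, the population variance just described can be sketched as follows, with mu as the population mean and N as the population size:

```latex
\[
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(X_i - \mu\right)^2
\]
```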
- That might seem like a very complicated notion,
- but let's calculate it for these two data sets.
- So here, let me rewrite that first data set.
- It's 2, 2, 3, and 3.
- So...what is...actually let me write it this way.
- This will help explain it for you a little bit better.
- So if I wrote i-- 1, 2, 3, 4.
- That's i.
- Then X sub i.
- It's kind of arbitrary, it's just saying the first term,
- the second term, the third term.
- I could have had this in any order; it doesn't matter.
- Maybe this was the first term and this was the second
- and this was the third.
- It doesn't matter, because we're just going to
- add them all up and then divide them.
- So it doesn't matter what order we do it.
- But anyway, X sub 1 is equal to 2.
- X sub 2 is equal to 2.
- X sub 3 is equal to 3.
- I'll stop writing this equal thing. X sub 4 is equal to 3.
- What's the mean?
- Well, we figured out the mean up here.
- We just took these numbers and added them and divided by 4.
- The mean is 2.5.
- So what is X sub i minus the mean?
- We're slowly building up to this equation.
- What is X sub i minus the mean?
- Well, 2 minus 2.5, that's minus 0.5.
- 2 minus 2.5, that's once again minus 0.5.
- 3 minus 2.5, that's 0.5.
- 3 minus 2.5, that's 0.5.
- Fair enough.
- Now, this equation, they want us to square this.
- So X sub i minus the mean squared.
- And there's several other properties we'll talk about later,
- but the most important thing that the squaring does--
- and the absolute value could have done it as well--
- but the squaring makes all of these positive.
- So minus 0.5 squared is positive 0.25.
- This is positive 0.25.
- Plus 0.5 squared is also positive 0.25.
- And this is positive 0.25.
- So if we wanted to know the sum, from i is equal to 1 to 4,
- of X sub i minus the mean, which is 2.5, squared.
- This is equal to the sum of all of these numbers.
- This is just saying sum all of these.
- So sum all of these-- 0.25, four times.
- So that's equal to 1.
- But this isn't the variance yet.
- The variance is this thing--
- let's look at the original formula--
- The variance is this thing
- divided by the total number of numbers you have.
- So you take this.
- So the variance is equal to this thing
- divided by the total number of numbers, which is 4,
- which is equal to 0.25.
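The same steps for the first data set can be traced in a short Python sketch. The variable names are illustrative, and each line mirrors one column of the table built up in the video.

```python
# First population from the video.
population = [2, 2, 3, 3]
N = len(population)

# Population mean (mu).
mu = sum(population) / N

# X sub i minus the mean, for each data point.
deviations = [x - mu for x in population]

# Squaring makes every deviation positive.
squared = [d ** 2 for d in deviations]

# Sum of the squared deviations, then divide by N to get the variance.
sum_of_squares = sum(squared)
variance = sum_of_squares / N

print(deviations)      # prints [-0.5, -0.5, 0.5, 0.5]
print(squared)         # prints [0.25, 0.25, 0.25, 0.25]
print(sum_of_squares)  # prints 1.0
print(variance)        # prints 0.25
```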
- And you see, here the distance from every number
- to the mean, squared, was 0.25.
- So the average of all of these--
- which is essentially what the variance is--
- the average was also 0.25.
- And I'll do another example where these are different.
- The other example in this video, actually,
- they're not different.
- But you see here, the average squared distance from the mean
- in that first data set is 0.25.
- And here what's the average squared distance from the mean?
- So let's see.
- This is how far from the mean.
- So let's say X sub i,
- and then X sub i minus the mean, for this population.
- So X sub i, there's 0, there's a 0, there's a 5, and a 5.
- This is the first term, X sub 1,
- this is X sub 2, and so forth.
- And then each of these numbers minus μ.
- 0 minus μ...that's minus 2.5.
- 0 minus 2.5-- μ here is 2.5, right?
- That's the mean.
- It's minus 2.5. 5 minus 2.5 is 2.5, and 5 minus 2.5 is 2.5.
- Now if you took X sub i minus the mean squared,
- minus 2.5 squared is what?
- 6.25, and it becomes positive.
- So 6.25.
- That's the same thing, 6.25.
- That's already positive.
- So 6.25, 6.25.
- And so the variance is the sum of all of these
- divided by the total number of numbers there are.
- So we take the sum of all of them.
- So it's just the average of these.
- And that's pretty easy to calculate.
- If you add all of these up and divide by 4,
- you're just going to get 6.25.
- So the variance of this population is 6.25.
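The arithmetic for this second population, written out as a worked equation (using the mean of 2.5 found earlier):

```latex
\[
\sigma^2 = \frac{(0-2.5)^2 + (0-2.5)^2 + (5-2.5)^2 + (5-2.5)^2}{4}
         = \frac{6.25 + 6.25 + 6.25 + 6.25}{4}
         = \frac{25}{4}
         = 6.25
\]
```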
- So there you have it.
- You have two data sets where their means are the same,
- but the variance of this data set is equal to, we figured out it was 0.25,
- while the variance of this data set is equal to 6.25.
- And it's hard right now to have an intuition of
- how the 6.25 relates to the 0.25.
- But you know that this is a larger number,
- this is a much larger number than this is,
- which tells you just kind of an intuitive feel that
- the numbers in this set are, on average,
- much further away from the mean than the numbers in this data set.
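To compare the two results side by side, one option is Python's standard `statistics` module, whose `pvariance` function computes the population variance (dividing by N). This is only a cross-check of the hand calculations, not something shown in the video, and the variable names are illustrative.

```python
import statistics

# The two populations from the video.
population_a = [2, 2, 3, 3]
population_b = [0, 0, 5, 5]

# pvariance treats the data as an entire population (divides by N).
var_a = statistics.pvariance(population_a)
var_b = statistics.pvariance(population_b)

print(var_a, var_b)   # prints 0.25 6.25
print(var_b / var_a)  # prints 25.0
```

The means match, but the second variance is 25 times the first, which reflects how much further its numbers sit from 2.5.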
- Anyway, I'm out of time.
- I'll see you in the next video.
- And we'll talk a little bit about this, and we'll move into
- the standard deviation, and then what happens
- if you take these of a sample instead of a population.
- Everything we're doing here, we're taking the mean
- and the variance of every number in the data set.
- Later we'll do it for the sample.
- See you soon.