Video transcript

Let's say I'm trying to judge how many years of experience we have at Khan Academy, or rather, on average, how many years of experience we have. The particular type of average we'll focus on is the arithmetic mean. So I go and survey the folks there. Let's say this was when Khan Academy was a smaller organization, when there were only five people in it. I'm surveying the entire population, the entire population of Khan Academy, because that's what I care about: years of experience at our organization. This is when we had five people. We're now 36 people, though I don't want to date this video too much.

Let's say I go and find one person straight out of college, with 1 year of experience, somebody with 3 years of experience, someone with 5 years of experience, someone with 7 years of experience, and someone very experienced, or reasonably experienced, with 14 years of experience.

So based on these data points, and this is our population for years of experience (I'm assuming that we only have five people in the organization at this point), what would be the population mean for the years of experience? Well, we can just calculate that. Our mean experience, and I'm going to denote it with mu (μ) because we're talking about the population now, so this is a parameter for the population, is going to be equal to the sum of each data point, from data point 1 all the way to, in this case, data point 5, since we have five data points, divided by the number of data points I have. So this is going to be equal to x1 plus x2 plus x3 plus x4 plus x sub 5, all of that over 5. As we said, this is a very
fancy way of saying I'm going to sum up all of these things and then divide by the number of things we have.

So let's do that. Let me get the calculator out. I'm going to add them all up: 1 plus 3 plus 5 (I really don't need a calculator for this) plus 7 plus 14. That's five data points, and I'm going to divide by 5, and I get 6. So the population mean for years of experience at my organization is 6 years of experience.

Well, that's, I guess, interesting. But now I want to ask another question. I want to get some measure of how much spread there is around that mean, or how much the data points vary around that mean. Obviously I can give someone all the data points, but instead I want to come up with a parameter that somehow represents how much all of these things, on average, are varying from this number right here. I will call that thing the variance. This is a population variance that I'm talking about; just to be clear, it's a parameter. The population variance I'm going to denote with the Greek letter sigma, lowercase sigma (this, the summation symbol, is capital sigma), squared: σ².

I'm going to say, well, I'm going to take the distance from each of these points to the mean, and, just so I get a positive value, I'm going to square it. Then I'm going to add those up and divide by the number of data points that I have. So essentially I'm going to find the average squared distance.

Now, that might sound very complicated, but let's actually work it out. I'll take my first data point, 1, and I will subtract our mean from it. This is going to give me a negative number, but if I square it, it's going to be positive. So it's essentially going to be the squared distance between 1 and my mean. Then to that I'm going to add the squared distance between 3 and my mean, and to that I'm going to add the squared
distance between 5 and my mean. Since I'm squaring, it doesn't matter if I do 5 minus 6 or 6 minus 5; when I square it, I'm going to get a positive result regardless. Then to that I'm going to add the squared distance between 7 and my mean, so 7 minus 6, squared. In all of these, this 6 is my population mean that I'm finding the difference from. And then finally, the squared difference between 14 and my mean. Then I'm going to find, essentially, the mean of these squared distances. I have five squared distances right over here, so let me divide by 5.

So what will I get when I make this calculation? Well, let's figure this out. 1 minus 6 is negative 5, and negative 5 squared is 25. 3 minus 6 is negative 3; if I square that, I get 9. 5 minus 6 is negative 1; if I square it, I get positive 1. 7 minus 6 is 1; if I square it, I get positive 1. And 14 minus 6 is 8; if I square it, I get 64. Then I'm going to divide all of that by 5. I don't need to use a calculator, but I tend to make a lot of careless mistakes when I do things while I'm making a video, so let me use it: 25 plus 9 plus 1 plus 1 plus 64, divided by 5. So I get 20.

So the average squared distance, or the mean squared distance, from our population mean is equal to 20. And you might say, wait, these things aren't 20 away. Remember, it's the squared distance away from my population mean. I squared each of these things. I liked that because it made it positive, and we'll see later that it has other nice properties.

Now, the last thing is: how can we represent this mathematically? We already saw how to represent a population mean and a sample mean mathematically like this, and hopefully we don't find it that daunting anymore. But how would we denote what we did right over here? Well,
let's just think it through. We're just saying that the population variance is a sum, and we're going to take each item: we'll start with the first item and go to the Nth item in our population (we're talking about a population here). We're not going to just take the item (this would just be the item); we're going to take the item, and from that we're going to subtract the population mean, and then we're going to square it. The way I've written it right now, this would just be the numerator: I've just taken the sum of the squared differences between each data point and the population mean. If I really want to get the variance the way I figured it out right over here, I have to divide the whole thing by the number of data points we have.

So this might seem very daunting and very intimidating, but all it says is: first, figure out your population mean. Then, from each data point in your population, subtract out that population mean and square it. Take the sum of all of those things, and then just divide by the number of data points you have, and you will get your population variance.
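
The recipe above can be sketched in a few lines of Python. This is only an illustrative sketch, not something shown in the video: the function names `population_mean` and `population_variance` are hypothetical, and the data are the five years-of-experience values from the example. In symbols, it computes μ = (x1 + ... + xN) / N and then σ² = ((x1 − μ)² + ... + (xN − μ)²) / N.

```python
# Illustrative sketch; function names are hypothetical, not from the video.

def population_mean(values):
    """Arithmetic mean of an entire population: sum of the values over their count."""
    return sum(values) / len(values)


def population_variance(values):
    """Mean squared distance of each value from the population mean."""
    mu = population_mean(values)
    # Squaring makes every distance non-negative, as described in the transcript.
    squared_distances = [(x - mu) ** 2 for x in values]
    # Divide by N (not N - 1) because the data are the whole population.
    return sum(squared_distances) / len(values)


# The five years-of-experience values from the example.
years_of_experience = [1, 3, 5, 7, 14]

mu = population_mean(years_of_experience)
sigma_squared = population_variance(years_of_experience)

print(mu)             # 6.0
print(sigma_squared)  # 20.0
```

Python's standard library also provides `statistics.pvariance`, which computes the same population variance; the hand-written version is used here only to mirror the steps one at a time.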