So we have nine students who recently graduated from a small school that has a class size of nine, and they want to figure out the central tendency for salaries one year after graduation. They also want to have a sense of the spread around that central tendency. So they all agree to put their salaries into a computer, and these are their salaries, measured in thousands: one makes 35,000; three make 50,000; one makes 56,000; two make 60,000; one makes 75,000; and one makes 250,000, so she's doing very well for herself.

The computer spits out a bunch of parameters based on this data. It spits out two typical measures of central tendency. The mean is roughly 76.2; the computer would calculate it by adding up all nine of these numbers and then dividing by nine. And the median is 56. The median is quite easy to calculate: you just order the numbers and take the middle number, which here is 56.

Now what I want you to do is pause this video and think about, for this data set, for this population of salaries, which measure of central tendency is a better measure.

All right, so let's think about this a little bit. I'm going to plot my data on a line here so we get a better sense, so we don't just see things as numbers but we see where those numbers sit relative to each other. Let's say this is 0, and then one, two, three, four, five marks: this is 50, 100, 150, 200, and this would be 250. And if this is 50, then this would be roughly 40. I just want to get it rough. So this would be about 60, 70, 80, 90. Close enough. I could draw this a little bit neater, so let me just clean this up and move this one a bit closer, right around here. So that's 40, and
then this would be 30, 20, 10. OK, that's pretty good.

So let's plot this data. One student makes 35,000, so that is right over there. Three make 50,000, so one, two, and three; I'll put it like that. One makes 56,000, which would put them right over here. One makes 60,000, actually two make 60,000, so it's like that. One makes 75,000, so that's 60, 70, 75,000; it's going to be right around there. And then one makes 250,000, so one salary is all the way out there.

And then when we calculate the mean, 76.2, as our measure of central tendency, 76.2 is right over there. So is this a good measure of central tendency? Well, to me it doesn't feel that good, because our measure of central tendency is higher than all of the data points except for one. The reason is that our data is skewed significantly by this data point at 250,000 dollars. It is so far from the rest of the distribution, from the rest of the data, that it has skewed the mean.

This is something that you see in general if you have data that is skewed, especially things like salary data, where most people are making 50, 60, 70 thousand dollars but someone might make two million dollars. That will skew the average, or skew the mean, I should say, when you add them all up and divide by the number of data points you have.

In this case, especially when you have data points that would skew the mean, the median is much more robust. The median, at 56, sits right over here, which seems to be much more indicative of central tendency. And think about it: even if, instead of 250,000, you made this 250,000 thousand, which would be 250 million, which is a ginormous amount of money to make, it would skew the mean incredibly, but it actually would not even change
the median, because for the median it doesn't matter how high this number gets. This could be a trillion dollars, this could be a quadrillion dollars; the median is going to stay the same. So the median is much more robust if you have a skewed data set. The mean makes a little bit more sense if you have a symmetric data set, where things are roughly balanced above and below the mean, or things aren't skewed incredibly in one direction, especially by a handful of data points like we have right over here. So in this example, the median is a much better measure of central tendency.

And so what about spread? Well, you might say, "Sal, you already told us that the mean is not so good, and the standard deviation is based on the mean." You take each of these data points, find its distance from the mean, square that number, add up those squared distances, divide by the number of data points if we're taking the population standard deviation, and then take the square root of the whole thing. Since this is based on the mean, which isn't a good measure of central tendency in this situation, that outlier is also going to skew the standard deviation. It's going to be a lot larger than what you'd want as an indication of the actual spread. Yes, you have this one data point that's way far away from either the mean or the median, depending on how you want to think about it, but most of the data points seem much closer.

So for this situation, not only are we using the median, but the interquartile range is once again more robust. How do we calculate the interquartile range? Well, you take the median, and then you take the bottom group of numbers and calculate the median of those, so that's 50 right over here. Then you take the top group of numbers, the upper group, and the median there, halfway between 60 and 75, is 67.5. If this looks unfamiliar, we have many videos on interquartile range and calculating standard deviation and
median and mean. This is just a little bit of review. And then the difference between these two is 17.5. Notice this distance between these two, the 17.5, isn't going to change even if this is 250 billion dollars. So once again, both of these measures are more robust when you have a skewed data set.

So the big takeaway here is that mean and standard deviation are not bad if you have a roughly symmetric data set. If you don't have any significant outliers, things that really skew the data set, mean and standard deviation can be quite solid. But if you're looking at something that could get really skewed by a handful of data points, median and interquartile range might be better: median for central tendency, interquartile range for spread around that central tendency.

And that's why you'll see, when people talk about salaries, they'll often talk about the median, because you could have some skewed salaries, especially on the upside. When we talk about things like home prices, you'll see the median reported more often than the mean, because a lot of the houses in a neighborhood or in a city might be in the 200,000 to 300,000 dollar range, but maybe there's one ginormous mansion that is a hundred million dollars. If you calculated the mean, that would skew it and give a false impression of the average, or the central tendency, of prices in that city.
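
If you want to check these numbers yourself, here is a minimal Python sketch using the nine salaries from the video (in thousands). The helper name `iqr` and the `inflated` list are just illustrative names, not anything from the video. The IQR is computed the way it is described above, as the median of the upper group minus the median of the lower group, leaving out the middle value when the count is odd. Other tools may use a different quartile convention and give a slightly different answer.

```python
from statistics import mean, median, pstdev

def iqr(values):
    """Median of the upper half minus median of the lower half.

    Assumption: for an odd count, the middle value is left out of both halves.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(upper) - median(lower)

# Salaries in thousands of dollars, as listed in the video.
salaries = [35, 50, 50, 50, 56, 60, 60, 75, 250]

# Same data, but the top earner makes 250,000 thousand (250 million).
inflated = salaries[:-1] + [250_000]

for label, data in [("original", salaries), ("inflated", inflated)]:
    print(label)
    print("  mean:          ", round(mean(data), 1))
    print("  median:        ", median(data))
    print("  population sd: ", round(pstdev(data), 1))
    print("  IQR:           ", iqr(data))
```

Running it, the mean and the population standard deviation jump when the top salary is inflated, while the median stays at 56 and the IQR stays at 17.5.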