# Example: Comparing distributions

AP.STATS: UNC‑1 (EU), UNC‑1.N (LO), UNC‑1.N.1 (EK), UNC‑1.O (LO), UNC‑1.O.1 (EK)

## Video transcript

What we're going to do in this video is start to compare distributions. For example, here we have two distributions that show the various temperatures different cities get during the month of January. This is the distribution for Portland: they get 8 days between 1 and 4 degrees Celsius, 12 days between 4 and 7 degrees Celsius, and so on. And this is the distribution for Minneapolis.

When we make these comparisons, what we're going to focus on is the center of the distributions and also the spread; sometimes people will talk about the variability of the distributions. These are the things that we're going to compare, and in making the comparison we're going to try to eyeball it. We're not going to pick a measure of central tendency, say the mean or the median, and then calculate precisely what those numbers are. We might want to do that if they're close, but if we can eyeball it, that would be even better. The same goes for spread and variability.

In either case there are multiple measures in our statistical toolkit. For the center, the mean and the median are valuable. For spread or variability, there are the range, the interquartile range, the mean absolute deviation, and the standard deviation. These are all measures, but sometimes you can just gauge it by looking.

So in this first comparison, which distribution has a higher center, or are they comparable? If you look at the distribution for Portland, the center of this distribution, say the mean, although I think the mean and the median would be reasonably close here, seems like it would be around 7 or maybe a little bit lower than 7. So our central tendency, either mean or median, would be in that range, maybe between 5 and 7. For Minneapolis, it looks like our center is much closer to maybe negative 2 or
negative 3 degrees Celsius. So here, even though we don't know precisely what the mean or the median is for each of these distributions, you can say that the Portland distribution has a higher center, however you want to measure it, mean or median.

Now what about the spread or variability? If you just superficially thought about range, you see here that there's nothing below 1 degree Celsius and nothing above 13, so you have at most a 12-degree range. In fact, what's contributing to this first column might be a bunch of things at 3 degrees or even 3.9 degrees, and similarly what's contributing to this last column might be a bunch of things at 10.1 degrees, so the actual range could be smaller. But at most you have a 12-degree range here, while for Minneapolis it looks like it's approaching a 27-degree range. Even if you just eyeball it, we're using the same scale for our horizontal axes, the temperature axes, and this is just a much wider distribution than what you see for Portland. So you would say that the Minneapolis distribution has more spread, or more variability.

Let's do another example, and we'll use a different representation for the data. We're told: "At the Olympic Games, many events have several rounds of competition. One of these events is the men's 100-meter backstroke. The upper dot plot shows the times (in seconds) of the top eight finishers in the final round of the 2012 Olympics." So that's in green right over here, the final round. "The lower dot plot shows the times of the same eight swimmers, but in the semifinal round."

Given these distributions, which one has a higher center? Here it's a little bit easier to eyeball what the median might be; for the mean I would probably have to do a little bit more mathematics. So let's say the median. Let's see, there are 1, 2, 3, 4,
5, 6, 7, 8 data points, so the median is going to sit between the lower four and the upper four. So the central tendency for the final round, especially if we think about the median, looks like it's around 57.1 seconds. For the semifinal round, let's see, 1, 2, 3, 4, 5, 6, 7, 8, it looks like it is right about there, so a little more than 57.3 seconds. So the semifinal round seems to have a higher central tendency. Keep in mind that these are times, so a higher center means slower swims: these eight swimmers were typically faster in the final than in the semifinal, which is what you would expect. So the semifinal round has a higher center. I just eyeballed the median, and I suspect that the mean would also be higher in this second distribution.

Now what about variability? Once again, if you just look at range, and these are both on the same scale, the range for the final round is larger than the range for the semifinal round. So you would say that the final round has higher variability: it has a higher range, and eyeballing it, it looks like it has a higher spread.

There are, of course, times when one distribution could have a higher range but a lower standard deviation. For example, you could have two data points that are really far apart while all of the other data is really closely packed. So a distribution like this, and I'll draw the horizontal axis here just so you can imagine it as a distribution, might have a higher range but a lower standard deviation than a distribution like this. I'm just drawing a very rough example. A distribution like this has a lower range but might actually have a higher standard deviation than the one above
it. In fact, I can make that even better: a distribution like this would have a lower range but a higher standard deviation. So it's not always the case that just by looking at one of these measures, the range or the standard deviation, you'll know for sure. But in cases like this, it's safe to say by inspection that the green, final-round data does seem to have a higher range and higher variability, so I'd feel pretty good about this very high-level comparison.
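
The point about range and standard deviation disagreeing can be checked with a small sketch. The two data sets below are made up for illustration (they are not the values drawn in the video), and the sketch assumes the population standard deviation, computed with Python's `statistics.pstdev`.

```python
import statistics


def value_range(data):
    """Range of a data set: largest value minus smallest value."""
    return max(data) - min(data)


# Hypothetical data: two far-apart points, everything else tightly packed.
wide_but_packed = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]

# Hypothetical data: a narrower interval, but the points pile up at both ends.
narrow_but_split = [1, 1, 1, 1, 1, 9, 9, 9, 9, 9]

for name, data in [
    ("wide_but_packed", wide_but_packed),
    ("narrow_but_split", narrow_but_split),
]:
    spread_range = value_range(data)
    spread_sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: range = {spread_range}, standard deviation = {spread_sd:.2f}")
```

Here the first list has the larger range (10 versus 8) but the smaller standard deviation (about 2.24 versus 4), so the two measures rank the spreads in opposite orders. That is the situation to watch for before trusting a comparison made by eye on range alone.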