# Judging outliers in a dataset

AP.STATS: UNC‑1 (EU), UNC‑1.K (LO), UNC‑1.K.1 (EK)

## Video transcript

We have a list of 15 numbers here, and what I want to do is think about the outliers. To help us with that, let's actually visualize the distribution of the numbers. So here on a number line I have all the numbers from 1 to 19. Let's see, we have two 1s, so that's one 1 and then two 1s. We have one 6, so let's put that 6 there. We have two 13s, so we're going to go up here: one 13 and two 13s. We have three 14s, so 14, 14, and 14. We have a couple of 15s, so 15, 15. We have one 16, so that's our 16 there. We have three 18s, so one, two, and then three. And then we have a 19.

When you look visually at the distribution of numbers, it looks like the meat of the distribution, so to speak, is in this area right over here. So some people might say, "OK, we have three outliers: these two 1s and the 6." Some people might say, "Well, the 6 is kind of close enough; maybe only these two 1s are outliers." And those would actually both be reasonable things to say.

Now, to get on the same page, statisticians will sometimes use a rule: anything that is more than one and a half times the interquartile range below Q1 or above Q3 is going to be an outlier. Well, what am I talking about? Let's figure out the median, Q1, and Q3 here. Then we can figure out the interquartile range, and then we can figure out, by that definition, what is going to be an outlier. If that all made sense to you so far, I encourage you to pause this video and try to work through it on your own, or I'll do it for you right now.

All right, so what's the median here? Well, the median is the middle number. We have 15 numbers, so the middle number is going to be whatever number has seven on either side. So that's going to be the eighth number: 1, 2, 3, 4, 5, 6, 7. Is that right? Yep, 6, 7. So that's the median, 14, and you have 1, 2, 3, 4, 5, 6, 7 numbers on the right side too. So that is the median, sometimes called Q2.

Now, what is Q1? Well, Q1 is going to be the middle of this first group. This first group has seven numbers in it, and so the middle is going to be the fourth number. It has three to the left and three to the right. So that is Q1. And then Q3 is going to be the middle of this upper group. Well, that also has seven numbers in it, so the middle is going to be right over there; it has three on either side. So that is Q3.

Now, what is the interquartile range going to be? The interquartile range is going to be equal to Q3 minus Q1, the difference between 18 and 13. Well, that is going to be 18 minus 13, which is equal to 5.

Now, to figure out outliers: outliers are going to be less than our Q1 minus 1.5 times our interquartile range. And once again, this isn't some rule of the universe. This is something that statisticians have kind of said: "Well, if we want to have a better definition for outliers, let's just agree that it's something that's more than one and a half times the interquartile range below Q1." Or an outlier could be greater than Q3 plus one and a half times the interquartile range. And once again, this is somewhat, you know, what people just decided felt right. One could argue it should be 1.6, or one could argue it should be 1 or 2 or whatever, but this is what people have tended to agree on.

So let's think about what these numbers are. Q1 we already know, so this is going to be 13 minus 1.5 times our interquartile range. Our interquartile range here is 5, so it's 1.5 times 5, which is 7.5. 13 minus 7.5 is what? 13 minus 7 is 6, and then you subtract another 0.5, which is 5.5. So outliers would be less than 5.5. Or, Q3 is 18, and this is once again 7.5. 18 plus 7.5 is 25.5. So outliers would be less than 5.5 or greater than 25.5.

So based on this, we have a kind of numerical definition for what's an outlier. We're not just subjectively saying, "Oh, this feels right" or "that feels right." And based on this, we only have two outliers: only these two 1s are less than 5.5. This is the cutoff right over here, so this dot, the 6, just happened to make it. And we don't have any outliers on the high side.

Now, another thing to think about is drawing a box-and-whiskers plot based on Q1, our median, and the whole range of numbers. You can do it either taking your outliers into consideration or not taking your outliers into consideration, so there's a couple of ways that we can do it. Let me clear all of this. We've figured out all of this stuff, so let me clear all of that out, and let's actually draw a box-and-whiskers plot. Actually, let me do two here: that's one, and then let me put another one down there.

Now, if we were to just draw a classic box-and-whisker plot here, we would say, all right, our median's at 14, and actually, I'll do it both ways. Our median's at 14, Q1 is at 13, and Q3 is at 18. So that's the box part. Let me actually draw that as a box. My best attempt; there you go, that's the box. And this is also a box. So far I'm doing the exact same thing.

Now, if we don't want to consider outliers, we would say, well, what's the entire range here? Well, we have things that go from 1 all the way to 19. So one way to do it is to say, hey, we start at 1, and, let me draw it a little bit better than that, we're going all the way from 1 to 19. Now, in this one we're including everything; we're including even these two outliers.

But if we don't want to include those outliers, and we want to make it clear that they're outliers, well, let's not include them. What we can do instead is say, all right, including only our non-outliers, we would start at 6, because 6, we're saying, is in our data set but it is not an outlier. Let me make this look better. So we are going to start at 6 and go all the way to 19. And then, to say that we have these outliers, we would put the outliers over there. So once again, this is a box-and-whisker plot of the data set that doesn't single out the outliers, and this is one of the same data set where we make it clear where the outliers actually are.
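
The arithmetic in the transcript can be checked with a short Python sketch. It assumes the quartile convention used in the video (for an odd number of values, the median is left out of both halves), and the names `quartiles`, `outlier_summary`, and the parameter `k` are hypothetical, chosen only for this illustration. Library routines for quartiles may follow a different convention and give slightly different values.

```python
from statistics import median

# The 15 numbers read off the dot plot in the transcript.
data = [1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19]


def quartiles(values):
    """Return (Q1, median, Q3).

    Assumes at least two values. When the count is odd, the median
    itself is excluded from both halves, as in the video.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def outlier_summary(values, k=1.5):
    """Apply the k * IQR rule (k = 1.5 is the convention from the video)."""
    q1, med, q3 = quartiles(values)
    iqr = q3 - q1
    low_cutoff = q1 - k * iqr
    high_cutoff = q3 + k * iqr
    outliers = [v for v in values if v < low_cutoff or v > high_cutoff]
    kept = [v for v in values if low_cutoff <= v <= high_cutoff]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "low_cutoff": low_cutoff,
        "high_cutoff": high_cutoff,
        "outliers": outliers,
        # Whisker ends for the plot that marks outliers separately.
        "whiskers": (min(kept), max(kept)),
    }


summary = outlier_summary(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this data set the sketch reproduces the values worked out above: Q1 = 13, Q3 = 18, an interquartile range of 5, cutoffs at 5.5 and 25.5, the two 1s as the only outliers, and whiskers running from 6 to 19.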
AP® is a registered trademark of the College Board, which has not reviewed this resource.