© 2024 Khan Academy
Identifying outliers with the 1.5xIQR rule
An outlier is a data point that lies outside the overall pattern in a distribution.
The distribution below shows the scores on a driver's test for applicants. How many outliers do you see?
Some people may count a certain number of outliers, but someone else might disagree and count more or fewer. Statisticians have developed many ways to identify what should and shouldn't be called an outlier.
A commonly used rule says that a data point is an outlier if it is more than 1.5 × IQR above the third quartile or below the first quartile. Said differently, low outliers are below Q1 - 1.5 × IQR and high outliers are above Q3 + 1.5 × IQR.
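The rule can be sketched as a small Python helper. The function names and the quartile values below are illustrative assumptions, not part of the lesson.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (low_fence, high_fence) for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """True if x lies below the low fence or above the high fence."""
    low, high = iqr_fences(q1, q3, k)
    return x < low or x > high


# Hypothetical quartiles, chosen only for illustration.
print(iqr_fences(q1=10, q3=20))   # (-5.0, 35.0)
print(is_outlier(36, 10, 20))     # True
print(is_outlier(35, 10, 20))     # False: exactly on the fence is not "more than" 1.5 x IQR away
```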
Let's try it out on the distribution from above.
Step 1) Find the median, quartiles, and interquartile range
Here are the scores listed out.
Step 2) Calculate 1.5 × IQR below the first quartile and check for low outliers.
Step 3) Calculate 1.5 × IQR above the third quartile and check for high outliers.
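The three steps can be sketched in Python as follows. The data set here is made up for illustration (it is not the driver's test data), and the quartiles are taken as the medians of the lower and upper halves with the middle value left out, which is one of several quartile conventions.

```python
from statistics import median


def quartiles_excluding_median(data):
    """Return (Q1, median, Q3); assumes at least two data points.

    Q1 and Q3 are the medians of the lower and upper halves. With an odd
    number of points, the middle value is left out of both halves.
    """
    s = sorted(data)
    half = len(s) // 2
    return median(s[:half]), median(s), median(s[-half:])


def iqr_outliers(data):
    q1, _, q3 = quartiles_excluding_median(data)    # Step 1
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr                      # Step 2
    high_fence = q3 + 1.5 * iqr                     # Step 3
    low = [x for x in data if x < low_fence]
    high = [x for x in data if x > high_fence]
    return low, high


# Hypothetical data: Q1 = 12.5, Q3 = 17.5, IQR = 5, fences at 5 and 25.
sample = [1, 12, 13, 14, 15, 16, 17, 18, 40]
print(iqr_outliers(sample))   # ([1], [40])
```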
Bonus learning: Showing outliers in box and whisker plots
Box and whisker plots will often show outliers as dots that are separate from the rest of the plot.
Here's a box and whisker plot of the distribution from above that does not show outliers.
Here's a box and whisker plot of the same distribution that does show outliers.
Notice how the outliers are shown as dots, and the whisker had to change. The whisker now extends only to the farthest point in the data set that wasn't an outlier.
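A minimal sketch of how the whisker ends could be chosen once outliers are drawn as separate dots. The function name, the data, and the quartiles passed in are assumptions made for illustration.

```python
def whisker_ends(data, q1, q3, k=1.5):
    """Return (low_whisker, high_whisker, outliers) for a box plot.

    Each whisker stops at the most extreme data point that is still
    inside the fences; everything outside the fences is an outlier.
    """
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return min(inside), max(inside), outliers


# Hypothetical data with Q1 = 12.5 and Q3 = 17.5 supplied by hand.
sample = [1, 12, 13, 14, 15, 16, 17, 18, 40]
print(whisker_ends(sample, q1=12.5, q3=17.5))   # (12, 18, [1, 40])
```

Without the outlier dots, the whiskers would instead run all the way to the minimum and maximum of the data.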
Here's the original data set again for comparison.
Want to join the conversation?
- Can there be a negative outlier?(13 votes)
- Yes, absolutely.
For example, let's consider
-19, -1, (0), 5, 7, (9), 12, 12, (12), 13, 13
Low threshold = Q1 - 1.5*(Q3-Q1) = 0 - 1.5*12 = -18. Our minimum value, -19, is less than -18, so it is an outlier.
Now, let's shift our numbers so that there are no negative numbers left:
0, 18, (19), 24, 26, (28), 31, 31, (31), 32, 32 - the same sequence, but with numbers shifted to be positive.
Low threshold = Q1 - 1.5*(Q3-Q1) = 19 - 1.5*(31-19) = 19-1.5*12 = 19-18 = 1.
The gap between the minimum and the threshold is the same in both cases: -19 - (-18) = 0 - 1 = -1. So the minimum is still an outlier, and negative numbers work in data sets just as well as positive ones.
If you think about it, negative and positive numbers are treated no differently here, just as there is no difference between coordinates on the (x, y) plane. For example, you can find the distance between 2 points no matter where those 2 points lie. This is no exception.(31 votes)
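The two calculations in this answer can be checked with a short Python sketch. The helper name is an assumption, and the quartiles are computed as medians of the lower and upper halves with the middle value excluded, matching the parenthesized values above.

```python
from statistics import median


def low_fence(data):
    """Q1 - 1.5*(Q3 - Q1), with quartiles taken from the halves around the median."""
    s = sorted(data)
    half = len(s) // 2
    q1 = median(s[:half])
    q3 = median(s[-half:])
    return q1 - 1.5 * (q3 - q1)


original = [-19, -1, 0, 5, 7, 9, 12, 12, 12, 13, 13]
shifted = [x + 19 for x in original]   # same sequence, moved up by 19

print(low_fence(original))                  # -18.0
print(low_fence(shifted))                   # 1.0
print(min(original) - low_fence(original))  # -1.0
print(min(shifted) - low_fence(shifted))    # -1.0
```

Shifting every value by the same amount moves the threshold by that amount too, so the same point stays an outlier.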
- In this example, and in others, Khan Academy calculates Q3 as the median of all numbers above Q2. Q2, the median of the data set, is excluded from the calculation. The same is true for Q1: it is calculated as the median of all numbers below Q2.
Using Excel, I notice Q1 and Q3 are calculated inclusive of Q2, so Q3 equals the median of the data from Q2 to the maximum, inclusive, and Q1 equals the median of the data from the minimum to Q2, inclusive. This changes the IQR from 5 (per Khan Academy) to 3.5.
Which is correct? Does it depend on whether the number of points in the data set is odd or even?(16 votes)
- Great question. For this exercise, 5 is the correct answer. As you said in your comment, the quartile values here are calculated without including the median.(2 votes)
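To see how much the two conventions in this question can differ, here is a rough Python sketch. It implements the "median of each half" idea exactly as the question words it, with a flag for whether the median belongs to both halves; it is not a claim about how any spreadsheet computes quartiles internally. The data is the 11-value list from the first answer in this thread.

```python
from statistics import median


def quartiles(data, include_median):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    With an odd number of points, include_median decides whether the
    middle value is counted in both halves or in neither. With an even
    number of points there is no single middle value, so the flag has
    no effect in this sketch.
    """
    s = sorted(data)
    n = len(s)
    half = n // 2
    if include_median and n % 2 == 1:
        lower, upper = s[:half + 1], s[half:]
    else:
        lower, upper = s[:half], s[n - half:]
    return median(lower), median(upper)


data = [-19, -1, 0, 5, 7, 9, 12, 12, 12, 13, 13]

q1_ex, q3_ex = quartiles(data, include_median=False)
q1_in, q3_in = quartiles(data, include_median=True)
print(q1_ex, q3_ex, q3_ex - q1_ex)   # 0 12 12
print(q1_in, q3_in, q3_in - q1_in)   # 2.5 12.0 9.5
```

Both are legitimate conventions, so it helps to state which one was used when reporting an IQR.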
- What if most of the data points lie outside the IQR?(5 votes)
- Although you can have "many" outliers (in a large data set), it is impossible for "most" of the data points to be outside of the IQR. The IQR, or more specifically, the zone between Q1 and Q3, by definition contains the middle 50% of the data. Extending that to 1.5*IQR above and below it is a very generous zone to encompass most of the data.(14 votes)
- thanks. now I’m a step farther from the “stressed about not knowing this” zone lol(5 votes)
- Why wouldn't we recompute the 5-number summary without the outliers?(3 votes)
- If you want to remove the outliers, then you could employ a trimmed mean, which would be more fair, as it removes numbers on both sides.(4 votes)
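A trimmed mean can be sketched in a few lines of Python. The function name, the trimming proportion, and the data are all assumptions for illustration.

```python
def trimmed_mean(data, proportion):
    """Mean after dropping the given proportion of points from EACH end.

    Assumes 0 <= proportion < 0.5 so that at least one point remains.
    """
    s = sorted(data)
    k = int(len(s) * proportion)      # points to drop from each side
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one very low and one very high value.
sample = [1, 12, 13, 14, 15, 16, 17, 18, 40]
print(trimmed_mean(sample, 0.2))   # 15.0 (drops 1 and 40)
print(trimmed_mean(sample, 0.0))   # plain mean, pulled upward by 40
```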
- I have a point which seems to be an outlier in my scatter plot, since it is nowhere near the other points. My maths teacher said I had to prove the point is an outlier with this IQR method. The y-coordinate of the point is definitely an outlier (which is why the point is at the very bottom of the graph), but the x-coordinate is not. Can I still identify the point as an outlier?(4 votes)
- Hi Zeynep, I think you're looking for a way to find outliers in 2D, for example with directional quantile envelopes. Check out https://mathematica.stackexchange.com/questions/114012/finding-outliers-in-2d-and-3d-numerical-data and/or https://mathematicaforprediction.wordpress.com/2014/11/03/directional-quantile-envelopes/ (0 votes)
- How did you get the value 5 for IQR?(0 votes)
- IQR, or interquartile range, is the difference between Q3 and Q1. Here Q1 was found to be 19, and Q3 was found to be 24. So subtracting gives, 24 - 19 = 5. Hope that helps!(6 votes)
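Continuing from those quartiles, the arithmetic for the 1.5*IQR thresholds can be sketched in Python. Only Q1 = 19 and Q3 = 24 come from this answer; the variable names are assumptions.

```python
q1, q3 = 19, 24                 # quartiles quoted in the answer above

iqr = q3 - q1                   # 24 - 19 = 5
low_fence = q1 - 1.5 * iqr      # 19 - 7.5 = 11.5
high_fence = q3 + 1.5 * iqr     # 24 + 7.5 = 31.5

print(iqr, low_fence, high_fence)   # 5 11.5 31.5

# A score of 5 is below 11.5, so the rule flags it as a low outlier.
print(5 < low_fence)                # True
```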
- In the bonus learning, how do the extra dots represent outliers? Wouldn't 5 be the lowest point, not an outlier?(2 votes)
- For the box and whisker plot showing outliers, the whiskers are shortened so that they only reach the most extreme data points that still lie between Q1-1.5*IQR and Q3+1.5*IQR. In other words, the whiskers are modified to represent the non-outliers. Any values outside that range are outliers, and are then depicted with dots.(1 vote)
- How do I draw the box and whiskers? Do I start from Q1 with all the calculations and end at Q3?(2 votes)
- On question 3, how are you using Q1-1.5*IQR, and what does that have to do with the chart?(2 votes)