Lesson 7: Box plots

# Worked example: Creating a box plot (even number of data points)

Box-and-whiskers plots help visualize the range and the median of a data set. First, arrange your numbers from least to greatest. The smallest and largest numbers mark the ends of the 'whiskers'. The median of the entire data set splits the 'box' in the middle. The medians of the lower and upper halves of the numbers (the first and third quartiles) form the 'box' boundaries.
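The steps above can be sketched in Python as a function that returns the five numbers a box-and-whiskers plot needs. This is a minimal sketch, assuming the "exclude the median" convention used in this lesson (other textbooks and software compute quartiles differently) and at least two data points; the name `five_number_summary` is our own. It is run on the 16-value list from one of the replies in the comments.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Q1 and Q3 are the medians of the lower and upper halves. When the
    count is odd, the middle value (the median) is left out of both halves.
    """
    values = sorted(data)            # step 1: least to greatest
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + (n % 2):]  # skip the middle value only if n is odd
    return values[0], median(lower), median(values), median(upper), values[-1]

# 16-value data set taken from a reply in the comments.
data = [1, 2, 3, 3, 3, 4, 5, 5, 5, 5, 6, 7, 8, 8, 9, 9]
print(five_number_summary(data))  # (1, 3.0, 5.0, 7.5, 9)
```

With 16 values the halves are the first eight and last eight numbers, so Q1 = (3 + 3)/2 and Q3 = (7 + 8)/2.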

## Want to join the conversation?

• Can I get some help? What does IQR mean?
• The interquartile range. You find it by subtracting Quartile 1 from Quartile 3: IQR = Q3 - Q1.
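For the data in the video, where Q1 = 2 and Q3 = 7, the subtraction works out as:

$$
\mathrm{IQR} = Q_3 - Q_1 = 7 - 2 = 5
$$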
• Shouldn't the upper quartile be 7.5, the average of 7 and 8, since the upper half consists of six numbers?
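• The upper half has seven numbers: 5, 5, 6, 7, 8, 8, 10.
The first 5 stays in the upper half because it is not the median (the median is 4.5), so the middle value of the upper half is 7.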
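A quick check with Python's standard `statistics.median` shows where the 7.5 in the question comes from. The upper half listed in the reply has seven values, so its median is the middle one; only if the first 5 were wrongly dropped, leaving six values, would the median become the mean of 7 and 8.

```python
from statistics import median

# Upper half of the video's data, as listed in the reply.
upper_half = [5, 5, 6, 7, 8, 8, 10]

# Seven values: the median is the 4th one.
print(len(upper_half), median(upper_half))

# Dropping the first 5 leaves six values, whose median is (7 + 8) / 2.
print(len(upper_half[1:]), median(upper_half[1:]))
```

The first line prints `7 7` and the second prints `6 7.5`.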
• The question says to exclude the median when calculating the quartiles, but in this video they have included the median.
Can you explain what "exclude" means here?
What I think is, Q3 will be 7.5 and Q1 will be 2.
• When the question says to "exclude the median when computing the quartiles," it means that when you find the first quartile (Q1) and the third quartile (Q3), the median itself is not counted in either half of the data. With an odd number of values, the median is the middle data point, and you leave it out of both halves. With an even number of values, as here, the median (4.5) falls between the two middle values, so there is nothing to leave out: Q1 is the median of the numbers less than 4.5, and Q3 is the median of the numbers greater than 4.5.

To summarize:

Q1 is the median of the lower half of the data (excluding the median).
Q3 is the median of the upper half of the data (excluding the median).
For this data set, with the median at 4.5, Q1 is indeed 2, but Q3 is 7 rather than 7.5: the seven numbers greater than 4.5 are 5, 5, 6, 7, 8, 8, 10, and the middle one is 7.
• I'm confused. In the last worked example, when we had an odd number of data points, we were taught to eliminate the median when calculating the upper and lower quartiles. Does that rule not apply with an even number of data points? That was not clear.
• We want the median to divide the data set into two equal halves.
However, with an odd number of data points the two halves can't be equal in size which is why we remove the median before we calculate the upper and lower quartiles.
With an even number of data points we don't have this problem: the median falls between the two middle values, so it is not one of the data points and there is nothing to remove.
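The odd/even rule can be sketched as a small Python function. `split_halves` is a hypothetical helper name, and it assumes the input list is already sorted; the two sample lists are taken from other comments on this page.

```python
def split_halves(sorted_data):
    """Split sorted data into lower and upper halves, excluding the median.

    Odd count: the middle value is the median and goes in neither half.
    Even count: the median falls between the two middle values, so every
    data point lands in one half or the other.
    """
    n = len(sorted_data)
    half = n // 2
    lower = sorted_data[:half]
    upper = sorted_data[half + (n % 2):]  # n % 2 is 1 only when n is odd
    return lower, upper

odd_data = [5, 5, 6, 7, 8, 8, 10]   # 7 values: the middle 7 is dropped
even_data = [2, 6, 6, 8, 9, 11]     # 6 values: nothing is dropped

print(split_halves(odd_data))   # ([5, 5, 6], [8, 8, 10])
print(split_halves(even_data))  # ([2, 6, 6], [8, 9, 11])
```

In both cases the two halves have the same size, which is the point of the rule.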
• Bro dis hard
• Good luck
P.S Just trying to help
• In the previous video, we were told to exclude the median when computing the quartiles, so Sal does. In this video, Sal includes it even though the question says to exclude it. I'm confused!
• How do outliers affect the mean?
• Outliers pull the mean toward the side the outlier is on, to the left or right of the center.
For example, take the data set {2, 6, 6, 8, 9, 11}. The mean is (2 + 6 + 6 + 8 + 9 + 11)/6 = 42/6 = 7.
If we replace the 11 with the outlier 41, the new data set is {2, 6, 6, 8, 9, 41} and the new mean is (2 + 6 + 6 + 8 + 9 + 41)/6 = 72/6 = 12.
Notice that the new mean, 12, is larger than every value in the original data set: the outlier 41 pulled the mean to the right.
On a side note, outliers have little or no effect on the median. Here the median is the mean of 6 and 8, so it is 7 in both data sets.
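The outlier example above can be checked with Python's standard `statistics` module, which computes the mean and median of both data sets side by side.

```python
from statistics import mean, median

original = [2, 6, 6, 8, 9, 11]
with_outlier = [2, 6, 6, 8, 9, 41]  # the 11 replaced by the outlier 41

for name, data in [("original", original), ("with outlier", with_outlier)]:
    # The mean moves from 7 to 12; the median stays at (6 + 8) / 2 = 7.
    print(name, "mean:", mean(data), "median:", median(data))
```

Only the largest value changed, so the two middle values (6 and 8), and therefore the median, are the same in both sets.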
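• I don't know how to create a box plot from a histogram. Help please!
• 1. Write out the data values the histogram represents, for example 1,2,3,3,3,4,5,5,5,5,6,7,8,8,9,9. (If each bar covers a range of values rather than a single value, the exact data can't be recovered, so the box plot will only be approximate.)
2. Draw the box plot from that list as shown in the video.
Hope this helps.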
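Step 1 can be sketched in Python for a histogram whose bars each stand for a single whole number. The `histogram` dictionary below is an assumption for illustration: it holds the bar heights that would produce the list in the reply. Each value is repeated as many times as its bar is tall.

```python
# Hypothetical histogram: bar value -> bar height (frequency).
histogram = {1: 1, 2: 1, 3: 3, 4: 1, 5: 4, 6: 1, 7: 1, 8: 2, 9: 2}

# Repeat each value as many times as its bar is tall; sorting the keys
# means the result is already ordered from least to greatest.
data = [value for value in sorted(histogram) for _ in range(histogram[value])]
print(data)
```

The resulting list has 16 values, one for each unit of bar height, and is ready for the box-plot steps in the video.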
• When working out the median of the first and second half, why did Sal include the 4 and 5?
• The median splits the ordered data into two halves of equal size: as many data points come before it as after it.

In this case we have an even number of data points, namely 14 of them, and so the median should be somewhere between the 7th and 8th data points (i.e. between 4 and 5).
We arrange this by letting the median be the mean of those two data points ((4 + 5)∕2 = 4.5).

– – –

When working out the lower quartile we use the same logic, but for the data points up to and not including the median (i.e. all the data points less than 4.5, which does include 4).

Similarly, for the upper quartile we consider the data points above the median (greater than 4.5, thus including 5).
• I'm not sure how Sal got 4.5 as the median. Can someone explain that to me in a little more context, please? ... THX :)
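• The median is the middle value of the ordered data, not the midpoint between the two ends of the box plot. Sal has 14 numbers, an even count, so there is no single middle number; he takes the mean of the two middle numbers, 4 and 5, which gives 4.5.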
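Spelled out for the 14 numbers in the video: with an even count $n$, the median is the mean of the values in positions $n/2$ and $n/2 + 1$ of the ordered list $x_1 \le x_2 \le \dots \le x_n$. Here that means the 7th and 8th values, which are 4 and 5:

$$
\text{median} = \frac{x_{7} + x_{8}}{2} = \frac{4 + 5}{2} = 4.5
$$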

## Video transcript

- [Voiceover] Represent the following data using a box-and-whiskers plot. Once again, exclude the median when computing the quartiles. And they gave us a bunch of data points, and it says, if it helps, you might drag the numbers around, which I will do, because that will be useful. And they say the order isn't checked, and that's because I'm doing this on Khan Academy exercises. Up here in the top right, where you can't see, there's actually a check-answer button. So I encourage you to use the exercises yourself, but let's just use this as an example. So the first thing, if I'm going to do a box-and-whiskers plot, I'm going to order these numbers. So let me order these numbers from least to greatest. So let's see. There's a one here, and we've got some twos. We've got some twos here and some threes, some threes, some fours (I have one four), and fives. I have a six. I have a seven. I have a couple of eights, and I have a 10. So there you go. I have ordered these numbers from least to greatest, and now, just like that, I can plot the whiskers, because I see the range. My lowest number is one. My largest number is 10. So the whiskers help me visualize the range.

Now let me think about what the median of my data set is. So my median here is going to be, let's see. I have one, two, three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14 numbers. Since I have an even number of numbers, the middle two numbers are going to help define my median, because there's no one middle number. I might say this number right over here, this four, but notice, there's one, two, three, four, five, six, seven above it, and there's only one, two, three, four, five, six below it. Same thing would have been true for this five. So for this four and five, the middle is actually in between these two. So when you have an even number of numbers like this, you take the middle two numbers, this four and this five, and you take the mean of the two.
So the mean of four and five is going to be four-and-a-half. So that's going to be the median of our entire data set, four-and-a-half. And now, I want to figure out the median of the bottom half of numbers and the top half of numbers. And here they say exclude the median. Of course I'm going to exclude the median. It's not even included in our data points right here, because our median is 4.5.

So now let's take this bottom half of numbers right over here and find the middle. These are the bottom seven numbers, and the median of those is going to be the one which has three on either side, so it's going to be this two right over here. So that right over there is kind of the left boundary of our box.

Then for the right boundary, we need to figure out the middle of our top half of numbers. Remember, four and five were our middle two numbers. Our median is right in between at four-and-a-half. So our top half of numbers starts at this five and goes to this 10. Seven numbers. The middle one's going to have three on both sides. The seven has three to the left (within the top half, remember) and three to the right. And so the seven is, I guess you could say, the right side of our box. And we're done. We've constructed our box-and-whiskers plot, which helps us visualize the entire range but also, you could say, roughly the middle half of our numbers.
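As a rough illustration, the finished plot can be drawn as a single line of text from the five numbers found in the video (minimum 1, Q1 = 2, median 4.5, Q3 = 7, maximum 10). `render_box_plot` is a hypothetical helper, not part of any library, and rounding values to character columns makes the drawing approximate in general.

```python
def render_box_plot(minimum, q1, med, q3, maximum, chars_per_unit=2):
    """Draw a one-line text box-and-whiskers plot from a five-number summary.

    '|' marks the whisker ends, '[' and ']' mark the box edges (Q1 and Q3),
    '#' marks the median, '-' is whisker, and '=' is the inside of the box.
    """
    def col(value):
        # Map a data value to a character column, with the minimum at column 0.
        return round((value - minimum) * chars_per_unit)

    row = ["-"] * (col(maximum) + 1)
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="
    row[col(minimum)] = "|"
    row[col(maximum)] = "|"
    row[col(q1)] = "["
    row[col(q3)] = "]"
    row[col(med)] = "#"
    return "".join(row)

# Five-number summary from the video.
print(render_box_plot(1, 2, 4.5, 7, 10))
```

With two characters per unit, every value here lands exactly on a column, so the output is `|-[====#====]-----|`: a short left whisker, a box from 2 to 7 with the median at 4.5, and a longer right whisker out to 10.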