Constructing a box plot

Here's a word problem that's perfectly suited to a box and whisker plot for analyzing data. Let's construct one together, shall we? Created by Sal Khan and Monterey Institute for Technology and Education.

Want to join the conversation?

  • olivierdoyon775
    If the number of samples is even and we have to take the median using the 2 middle numbers instead of 1, do you not use both of those middle numbers when you go to find the first and third quartiles (the 25th and 75th percentiles) too?
    (203 votes)
  • Levi Armstrong
    How do you find the range?
    (62 votes)
  • Ella
    Ugh! I still can't understand it, even though I watched this amazing video! Could someone explain it to me in a nutshell? I just don't get how you make the graph! Also, maybe someone could recommend a Khan video that can help me. Thanks in advance!
    (20 votes)
    • Pixel Reshiram
      I am also not very good at explaining, and I hope this isn't too late to answer your question, but this video is mainly about the characteristics and details of a box and whisker plot. At the beginning of the video, Sal shows when to use a box plot: when you want to see the median and the spread of the values. He then arranges all the numbers in order so we can find the median of the data set, which we need in order to create the box and whisker plot. Once you have the median, you split the numbers into two halves. As Sal said, if there is an even number of values, you take the average of the middle two numbers; the same rule applies when you find the first and third quartiles of the data set. In a box plot, you use the minimum, the first quartile, the median, the third quartile and the maximum. If you still don't get it, here is a helpful link:
      http://www.wikihow.com/Make-a-Box-and-Whisker-Plot
      If this is too complicated, don't mind the info about outliers. :)
      (23 votes)
  • SinGh
    I still don't understand box and whisker plots.
    (9 votes)
    • Mahfuzun Nabi
      So the box and whisker plot is built from five numbers that summarize your distribution. The first is the minimum value in your data. The second is Q1 (the value with 25 percent of the data to its left). The third is the median of your distribution. The fourth is Q3 (the value with 75 percent of the data to its left). The last is the maximum value in your data. The box and whisker plot is a summary of the data and can often be used to identify low and high outliers. For instance, a value is a low outlier if it is less than Q1 - 1.5 (Q3-Q1), and a high outlier if it is greater than Q3 + 1.5 (Q3-Q1). By the way, (Q3-Q1) is called the IQR (interquartile range), and it is a measure of the spread of your data. Hope this helps :)
      (27 votes)
  • goku
    What is the median?
    (3 votes)
  • Loco haha
    Hello don't have a good day have a great day!
    (10 votes)
  • Jordan Thomas
    What's the interquartile range?
    (5 votes)
  • Liz
    does the summer have to start at zero??
    (7 votes)
  • sgleano
    This is just a summary of everything Sal is doing, just in case you are confused.

    1) Order the data points from least to greatest.

    2) Find the median of all the numbers. The circled number in the video is the median.

    3) Find the median of ONLY the bottom half. Take all the numbers below the original median that we found, and take the median of those numbers.

    4) Find the median of ONLY the top half, i.e. do the same as in step 3, except for all the numbers above the median. The circled dark pink number in the video is the top median.

    5) Graph. The lower whisker starts at the smallest number in the ordered numbers, in this case, 1. The upper whisker ends at the biggest number in the ordered numbers, in this case, 22. The bottom edge of the box is at the lower median (i.e. what we found in step 3). That is where the lower whisker stops. The middle line in the box is at the median of all the numbers (i.e. what we found in step 2). The top edge of the box is at the upper median (i.e. what we found in step 4). That is where the upper whisker starts.

    Hope this helps!
    (5 votes)
  • Kate Yu
    What are the median of the bottom half and the median of the top half called?
    (2 votes)
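
The outlier rule mentioned in the replies above (Q1 - 1.5 × IQR and Q3 + 1.5 × IQR) can be sketched in a few lines of Python. This is only an illustration: the function name `outlier_fences` is made up for this example, the data list is the sorted set of distances read off the video transcript, and Q1 = 2.5 and Q3 = 12.5 are the values Sal finds in the video.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (low_fence, high_fence) for the k * IQR rule."""
    iqr = q3 - q1  # interquartile range: the width of the box
    return q1 - k * iqr, q3 + k * iqr


# Sorted distances as read from the video transcript (17 values).
distances = [1, 1, 2, 2, 3, 3, 4, 4, 6, 7, 8, 10, 11, 14, 15, 20, 22]
q1, q3 = 2.5, 12.5  # quartiles computed in the video

low_fence, high_fence = outlier_fences(q1, q3)

# Anything below the low fence or above the high fence counts as an outlier.
outliers = [d for d in distances if d < low_fence or d > high_fence]

print("IQR:", q3 - q1)
print("fences:", low_fence, high_fence)
print("outliers:", outliers)
```

With these numbers the IQR is 10, so the fences sit at -12.5 and 27.5, and none of the restaurant distances is flagged as an outlier.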

Video transcript

The owner of a restaurant wants to find out more about where his patrons are coming from. One day, he decided to gather data about the distance in miles that people commuted to get to his restaurant. People reported the following distances traveled. So here are all the distances traveled. He wants to create a graph that helps him understand the spread of the distances-- this is a key word-- the spread of distances and the median distance that people traveled or that people travel. What kind of graph should he create?

So the answer of what kind of graph he should create, that might be a little bit more straightforward than the actual creation of the graph, which we will also do. But he's trying to visualize the spread of information. And at the same time, he wants the median. So what graph captures both pieces of information? Well, a box and whisker plot. So let's actually try to draw a box and whisker plot. And to do that, we need to come up with the median. And we'll also see the median of the two halves of the data as well.

And whenever we're trying to take the median of something, it's really helpful to order our data. So let's start off by attempting to order our data. So what is the smallest number here? Well, let's see. There's one 2. So let me mark it off. And then we have another 2. So we've got all the 2's. And then we have this 3. Then we have this 3. I think we've got all the 3's. Then we have that 4. Then we have this 4. Do we have any 5's? No. Do we have any 6's? Yep. We have that 6. And that looks like the only 6. Any 7's? Yep. We have this 7 right over here. And I just realized that I missed this 1. So let me put the 1 at the beginning of our set. So I got that 1 right over there. Actually, there were two 1's. I missed both of them. So both of those 1's are right over there. So I have the 1's, 2's, 3's, 4's, no 5's. This is one 6. There was one 7. There's one 8 right over here. And then, let's see, any 9's? No 9's. Any 10s? Yep. There's a 10.
Any 11s? We have an 11 right over there. Any 12s? Nope. 13? No. 14? We have a 14. Then we have a 15. And then we have a 20 and then a 22. So we've ordered all our data.

Now it should be relatively straightforward to find the middle of our data, the median. So how many data points do we have? 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17. So the middle number is going to be a number that has 8 numbers larger than it and 8 numbers smaller than it. So let's think about it. 1, 2, 3, 4, 5, 6, 7, 8. So the number 6 here is larger than 8 of the values. And if I did the calculations right, it should be smaller than 8 of the values. 1, 2, 3, 4, 5, 6, 7, 8. So it is, indeed, the median.

Now, when we're trying to construct a box and whisker plot, the convention is, OK, we have our median. And it's essentially dividing our data into two sets. Now, let's take the median of each of those sets. And the convention is to take our median out and have the sets that are left over. Sometimes people leave it in. But the standard convention is to take this median out. And now, look separately at this set and look separately at this set.

So if we look at this first bottom half of our numbers essentially, what's the median of these numbers? Well, we have 1, 2, 3, 4, 5, 6, 7, 8 data points. So we're actually going to have two middle numbers. So the two middle numbers are this 2 and this 3, with three numbers less than these two and three numbers greater than them. And so when we're looking for a median and we have two middle numbers, we take the mean of these two numbers. So halfway in between 2 and 3 is 2.5. Or you can say 2 plus 3 is 5 divided by 2 is 2.5. So here we have a median of this bottom half of 2.5. And then the middle of the top half, once again, we have 8 data points. So our middle two numbers are going to be this 11 and this 14. And so if we want to take the mean of these two numbers, 11 plus 14 is 25. Halfway in between the two is 12.5. So 12.5 is exactly halfway between 11 and 14.
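
The median-of-halves procedure described so far can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function name `quartiles` and its `include_median` flag are invented for this example, and the list is the ordered data set as read from the transcript. The flag covers the two conventions mentioned above: taking the median out of the halves (as in the video) or leaving it in.

```python
from statistics import median


def quartiles(values, include_median=False):
    """Return (Q1, median, Q3), where Q1 and Q3 are medians of the two halves.

    include_median only matters for an odd number of values: it decides
    whether the overall median is kept in both halves or left out.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = data[:half + 1], data[half:]
    else:
        lower, upper = data[:half], data[n - half:]
    return median(lower), median(data), median(upper)


# Ordered distances from the transcript (17 values).
distances = [1, 1, 2, 2, 3, 3, 4, 4, 6, 7, 8, 10, 11, 14, 15, 20, 22]

print(quartiles(distances))                       # median left out, as in the video
print(quartiles(distances, include_median=True))  # median kept in both halves
```

Leaving the median out gives 2.5, 6 and 12.5, matching the video. Keeping it in gives 3, 6 and 11, which shows why it matters which convention a textbook or software package uses.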
And now, we've figured out all of the information we need to actually plot or actually create or actually draw our box and whisker plot. So let me draw a number line, so my best attempt at a number line. So that's my number line. And let's say that this right over here is a 0. I need to make sure I get all the way up to 22 or beyond 22. So let's say that's 0. Let's say this is 5. This is 10. That could be 15. And that could be 20. This could be 25. We could keep going-- 30, maybe 35.

So the first thing we might want to think about-- there are several ways to draw it. The box part of the box and whisker plot essentially represents the middle half of our data. So it's essentially trying to represent this data right over here, so the data between the medians of the two halves. So this is a part that we would attempt to represent with the box. So we would start right over here at this 2.5. This is essentially separating the first quartile from the second quartile, the first quarter of our numbers from the second quarter of our numbers. So let's put it right over here. So this is 2.5. 2.5 is halfway between 0 and 5. So that's 2.5. And then up here, we have 12.5. And 12.5 is right over-- let's see. This is 10. So this right over here would be halfway between, well, halfway between 10 and 15 is 12.5. So let me do this. So this is 12.5 right over here. So that separates the third quartile from the fourth quartile. And then our box is everything in between, so this is literally the middle half of our numbers.

And we'd want to show where the actual median is. And that was actually one of the things we wanted to be able to think about when the owner of the restaurant wanted to think about how far people are traveling from. So the median is 6. So we can plot it right over here. So this right here is about 6. Let me do that same pink color. So this right over here is 6.
And then the whiskers of the box and whisker plot essentially show us the range of our data. And I can do this in a different color that I haven't used yet. I'll do this in orange. So essentially, if we want to see, look, the numbers go all the way up to 22. So they go all the way up to-- so let's say that this is 22 right over here. Our numbers go all the way up to 22. And they go as low as 1. So 1 is right about here. Let me label that. So that's 1. And they go as low as 1. So there you have it. We have our box and whisker plot.

And you can see if you have a plot like this, just visually, you can immediately see, OK, what is the median? It's the line inside the box. The box shows you the middle half of the data, so it shows you where the meat of the spread is. And then the whiskers show the range, how far the total spread of our data is. So this gives a pretty good sense of both the median and the spread of our data.
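
To make the drawing step concrete, here is a rough text-only sketch of the same plot in Python. Everything about the rendering is an assumption made for illustration: the function name `ascii_box_plot`, the characters used for the whiskers and box, and a number line running from 0 to 25 at two character columns per mile. The five input numbers are the ones found in the video: minimum 1, lower-half median 2.5, median 6, upper-half median 12.5, maximum 22.

```python
def ascii_box_plot(minimum, q1, med, q3, maximum, axis_max=25, width=51):
    """Draw a one-line box plot on a number line running from 0 to axis_max."""

    def col(value):
        # Map a data value to a character column on the line.
        return round(value * (width - 1) / axis_max)

    row = [" "] * width
    for i in range(col(minimum), col(maximum) + 1):
        row[i] = "-"             # whiskers span minimum..maximum
    for i in range(col(q1), col(q3) + 1):
        row[i] = "="             # box spans Q1..Q3, the middle half
    row[col(minimum)] = "|"      # end of the lower whisker
    row[col(maximum)] = "|"      # end of the upper whisker
    row[col(q1)] = "["           # lower edge of the box
    row[col(q3)] = "]"           # upper edge of the box
    row[col(med)] = "M"          # median line inside the box
    return "".join(row)


plot = ascii_box_plot(1, 2.5, 6, 12.5, 22)

# Tick labels every 5 miles, 10 columns apart (2 columns per mile).
axis = "".join(str(tick).ljust(10) for tick in range(0, 30, 5)).rstrip()

print(plot)
print(axis)
```

In the printed line, the `M` sits closer to the `[` than to the `]`, which reflects that the median of 6 is nearer to 2.5 than to 12.5, and the right-hand whisker is much longer than the left-hand one.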