Statistics and probability
- Worked example: Creating a box plot (odd number of data points)
- Worked example: Creating a box plot (even number of data points)
- Constructing a box plot
- Creating box plots
- Reading box plots
- Interpreting box plots
- Interpreting quartiles
- Box plot review
- Judging outliers in a dataset
- Identifying outliers
- Identifying outliers with the 1.5xIQR rule
Learn how to evaluate what we know and what we don't know about a dataset given its box plot.
- I still don't understand what a 'quartile' is. Can someone please help?(12 votes)
- A quartile is just like another median, but one that considers only the lower half or the upper half of the data. Divide the data set exactly in half, then find the median of each half the same way you would find the median of the whole set.(2 votes)
- Am I the only one who is confused out of their mind? ;-;(9 votes)
- At 7:14, isn't there 50% on one side of the median and 50% on the other? So technically isn't 13 still in the middle, and therefore the statement would be true? Thanks!(6 votes)
- The curly equals sign (≈) means the value is approximate, so he can't be sure that exactly 50% is on that side. On the flip side, because we don't know, it might well be exactly 50%.(6 votes)
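As a rough illustration of the "median of each half" idea, here is a minimal Python sketch. It assumes the convention used in the video, where the middle value is left out of both halves when the count is odd; other conventions exist. The function name `quartiles` and the sample ages are made up for this example.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: with an odd count, the middle value is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # values below the middle
    upper = data[(n + 1) // 2 :]  # values above the middle
    return median(lower), median(data), median(upper)

ages = [7, 10, 11, 13, 14, 15, 16]  # hypothetical ages, not taken from the video
print(quartiles(ages))  # (10, 13, 15)
```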
- I do not get the last statement. We DO know that exactly half of the students are older than 13, because 50% is on the right side of the median which is also 13. Don't we?(4 votes)
- If he'd said that at least 50% ARE 13 or older, that would be true, because it includes the median.
For example, if I say that 5 is greater than 10/2, that would be false, because 10/2 is 5 but I said it was GREATER. On the other hand, if I said that 5 >= 10/2 (greater than or equal to), that would be true.(5 votes)
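To see how much the wording matters, here is a small Python sketch that counts "older than 13" and "13 or older" separately. The list of ages is a made-up data set with a median of 13, not the actual data behind the plot.

```python
ages = [7, 10, 11, 13, 14, 15, 16]  # hypothetical data set with median 13

older = sum(1 for a in ages if a > 13)      # strictly older than 13
at_least = sum(1 for a in ages if a >= 13)  # 13 or older, includes the median

print(f"older than 13: {older}/{len(ages)}")   # older than 13: 3/7
print(f"13 or older: {at_least}/{len(ages)}")  # 13 or older: 4/7
```

With seven students, neither count is exactly half, but "13 or older" covers at least half.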
- I am very confused. Where did he get the percents from?(4 votes)
- Think of the box-and-whisker plot as split into four parts (the first, second, third, and fourth quartiles), with each part holding about 1/4 (roughly 25%) of the data.
As shown in the video, three of those parts hold values of ten or more; that means about 3/4 of the kids are 10 or older. In other words, about 75% of the data accounts for kids 10 and older (since 3/4 can be written as 75%).
The 25% for each quartile is an estimate; the point is that the top three quartiles together should add up to at least 75% of the data.
Hope this clears things up!😄(5 votes)
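One way to check the "at least 75%" idea is to count directly. The Python sketch below uses two made-up data sets that both have a first quartile of 10 (under the median-of-halves method); the helper name `share_at_least` is hypothetical.

```python
from fractions import Fraction

def share_at_least(values, threshold):
    """Fraction of values that are greater than or equal to threshold."""
    count = sum(1 for v in values if v >= threshold)
    return Fraction(count, len(values))

# Hypothetical data sets, each with minimum 7, Q1 = 10, median 13, Q3 = 15, maximum 16.
seven_students = [7, 10, 11, 13, 14, 15, 16]
eight_students = [7, 9, 11, 12, 14, 14, 16, 16]

print(share_at_least(seven_students, 10))  # 6/7
print(share_at_least(eight_students, 10))  # 3/4
```

In both cases the share is 75% or more, which matches the statement.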
- At 1:06, when he says that it is the second quartile, wouldn't that be Q1, with the median being Q2?(5 votes)
- I don't get why the answer to the last statement is "We don't know." The median splits the numbers exactly in half. Even in the first example Sal used to prove the point, with 6 data points and 13 as the median, the median splits the numbers exactly in half, because the median is not a data point but the average of the two middle data points. Could anyone explain?(3 votes)
- I'm confused. How do you place the whiskers?(2 votes)
- The whiskers show you the minimum and maximum of the whole data set. So the end of the whisker on the left side shows you the smallest number, and the end of the whisker on the right side shows you the largest number. Sorry if this isn't a good explanation! It's easier to explain with a whiteboard!(3 votes)
- I'm about to teach this concept to my kids and I've seen problems pop up asking "what percent of students are older than 13?"; but this is unanswerable, yes? The book answer says 50% and I understand where this is coming from, but technically, we could have quite a few 13 year old kids that we cannot see in the plot. Is there a way to solve this problem, or is this a poorly written question?(2 votes)
- Is there a way to find the mean in a box plot or is it only the median? 🤔🤨(2 votes)
- To find the mean, you add up all the numbers and divide by how many numbers there are. Ex: 23+2+10+5 = 40 then you would divide by 4 and the mean would be 10. Hopefully this helps!!(1 vote)
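As for reading the mean off the plot itself: a box plot shows only the minimum, the quartiles, the median, and the maximum, so two data sets can share the same plot and still have different means. The Python sketch below illustrates this with two made-up data sets; `five_number_summary` is a hypothetical helper that assumes the median-of-halves method.

```python
from statistics import mean, median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); assumes the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    lower, upper = data[: n // 2], data[(n + 1) // 2 :]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Two hypothetical data sets that would produce the same box plot.
a = [7, 10, 10, 13, 13, 15, 16]
b = [7, 10, 13, 13, 15, 15, 16]

print(five_number_summary(a) == five_number_summary(b))  # True
print(mean(a), mean(b))  # the two means differ
```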
- [Voiceover] So I have a box and whiskers plot showing us the ages of students at a party. What I'm hoping to do in this video is get a little bit of practice interpreting it. I have here five different statements, and I want you to look at them. Pause the video and think about which of these, based on the information in the box and whiskers plot, are for sure true, which are for sure false, and which we don't have enough information for, so they could go either way.
Alright, so let's work through these. The first statement is that all of the students are less than 17 years old. Well, we see right over here that the maximum age, the right end of this right whisker, is 16. So it is the case that all of the students are less than 17 years old. This is definitely going to be true.
The next statement: at least 75% of the students are 10 years old or older. When you look at this, it feels right, because 10 is the value at the beginning of the second quartile. This is the second quartile right over there. Let me do this in a different color. So this is the second quartile. Roughly 25% of the values (sometimes it's not exactly 25%) are going to be in this second quartile, approximately 25% are going to be in the third quartile, and approximately 25% are going to be in the fourth quartile. So it seems reasonable to say that "10 years old or older" is going to be true. In fact, you could even have a couple of values in the first quartile that are 10. I'm feeling good that this is true, but let's look at a few examples to make this a little more concrete.
We don't know, based on the information here, exactly how many students are at the party, so we'll have to construct some scenarios. Let's see if I can construct something where the median is 13. We know that for sure. The median is 13, so if I have an odd number of values, I would have 13 in the middle, just like that, and maybe I have three on either side. I'm just making that number up. I'm just trying to see what I can learn about different types of data sets that could be described by this box and whiskers plot. So 10 is going to be the middle of the bottom half. That's 10 right over there. And 15 is going to be the middle of the top half. That's what this box and whiskers plot is telling us. And of course they tell us that the minimum is seven and the maximum is 16. So we know that's seven and that is 16. And then this, right over here, could be anything. It could be 10, 11, 12, or 13. It wouldn't change what these medians are. It wouldn't change this box and whiskers plot. Similarly, this could be 13, 14, or 15; any of those values wouldn't change it. And so for "75% are 10 or older": in this case, six out of seven are 10 years old or older.
We could try it out with other scenarios. Let's try to minimize the number of values that are 10 or older, given this box plot. Let's say that we have eight values. One, two, three, four, five, six, seven, eight. Here we know that the minimum is seven and the maximum is 16. We have an even number now, so the median is going to be the mean of the middle two values. So the mean of this and this is going to be 13.
And we know that the mean of this and this is going to be 10, and that the mean of this and this is going to be 15. So what could we construct? Actually, we don't even have to construct anything to answer this question. We know that this is going to have to be 10 or larger. And then all of these other values are going to be 10 or larger, so, if we assume that this one is less than 10, exactly 75% are going to be 10 years old or older. So I'm feeling very good about this one. And just to make this concrete, I'll put in some values here. This could be a nine and an 11. This could be a 12 and a 14. This could be a 14 and a 16, or it could be a 15 and a 15. You could think about it in any of those ways. But I'm feeling very good that this is definitely going to be true based on the information given in this plot.
Now they say there's only one seven-year-old at the party. Well, in this first possibility that we looked at, that was the case. There was only one seven-year-old at the party and there was one 16-year-old at the party. And actually, that was the next statement: there's only one 16-year-old at the party. So for both of these, it seems like we can definitely construct data that's consistent with this box and whiskers plot where the statement is true. But could we construct one where it's not true? Well, sure. Let's imagine we have our median at 13. And then we have one, two, three, four, five. One, two, three, four, five. This is going to be the 10, the median of this bottom half. This is going to be 15. This is going to be seven. This is going to be 16. Well, this could also be seven. It doesn't have to be. It could be seven, eight, nine, or ten. This could also be 16. Doesn't have to be.
It could be 15 as well. But just like that, I've constructed a data set, and this could be 10, 11, 12, or 13. This could be 10, 11, 12, or 13. This could be 13, 14, or 15. This one also could be 13, 14, or 15. But the basic idea here is that I can have a data set where I have multiple sevens and multiple 16s, or I could have a data set where I only have one seven or only one 16. So for both of these statements, we just plain don't know.
Now the next statement: exactly half the students are older than 13. Well, if you look at this possibility up here, we saw that three out of the seven are older than 13. So that's not exactly half. 3/7 is not 1/2. But in this one over here, we did see that exactly half are older than 13. We have an even number right over here, and so it is exactly half. So it's possible that it's true, and it's possible that it's not true, based on the information given. We once again do not know.
Anyway, hopefully you found this interesting. The whole point of me doing this is that when you look at statistics, sometimes it's easy to say, "Okay, I think it roughly means that," and that's sometimes okay. But it's very important to think about what types of statements you can actually make and what you can't, and it's very important when you're looking at statistics to be able to say, "Well, you know what, I just don't know. The data actually is not telling me that thing for sure."
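The "we don't know" reasoning can be sketched in Python: build a few data sets that all match the plot (minimum 7, first quartile 10, median 13, third quartile 15, maximum 16) and see that the last three statements come out differently. The specific ages below are made up in the spirit of the scenarios in the video, and `five_number_summary` is a hypothetical helper that assumes the median-of-halves method.

```python
from fractions import Fraction
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); assumes the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    lower, upper = data[: n // 2], data[(n + 1) // 2 :]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical parties with 7, 8, and 11 students.
scenarios = {
    "seven": [7, 10, 11, 13, 14, 15, 16],
    "eight": [7, 9, 11, 12, 14, 15, 15, 16],
    "eleven": [7, 7, 10, 11, 12, 13, 14, 14, 15, 16, 16],
}

results = {}
for name, ages in scenarios.items():
    # Every scenario must be consistent with the same box plot.
    assert five_number_summary(ages) == (7, 10, 13, 15, 16)
    results[name] = (
        ages.count(7),                                       # seven-year-olds
        ages.count(16),                                      # sixteen-year-olds
        Fraction(sum(1 for a in ages if a > 13), len(ages)), # share older than 13
    )
    print(name, results[name])
```

Since the counts of sevens and 16s and the share older than 13 change from one scenario to the next while the plot stays the same, the plot alone cannot settle those statements. A handful of examples like this can show that a statement is not guaranteed, but it cannot prove that a statement is always true.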