Clusters, gaps, peaks & outliers

This lesson explores the features of distributions in data sets: clusters, gaps, peaks, and outliers. We learn how to identify outliers, which are data points far from the rest, and how to spot peaks and clusters in a plot of the data.

Video transcript

- [Voiceover] In this video, I wanna do some examples looking at distributions, in particular, different features in distributions like clusters, gaps, and peaks. So over here, I wanna do some examples. Which of the following are accurate descriptions of the distribution below? Select all that apply. So the first statement is the distribution has an outlier. So an outlier is a data point that's way off of where the other data points are; it's way larger or way smaller than where all of the other data points seem to be clustered. And if we look over here, we have a lot of data points between zero and six. And let's just think about what they're measuring: this is shelf time for each apple at Gorg's Grocier. So, for example, we see there's one, two, three, four, five, six, seven apples that have a shelf life of zero days, so (laughs), they're about to go bad. You see you have one, two, three, four, five, six, seven, eight apples that are gonna be good for another day. You have two apples that are gonna be good for another six days, and you have one apple that's gonna be good for 10 days, and this is unusual. This is an outlier here, it has a way larger shelf life than all of the other data, so I would say this definitely does have an outlier. We just have this one data point sitting all the way to the right, way larger, way more shelf life than everything else, so it definitely has an outlier, and this one would be the outlier. The distribution has a cluster from four to six days. And we indeed do see a cluster from four to six days. A cluster, you can imagine, it's a grouping of data that's sitting there, or you have a grouping of apples that have a shelf life between four and six days, and you definitely do see that cluster there. And since I already selected two things, I'm definitely not gonna select none of the above. Let me check my answer. Let me do a few more of these.

Which of the following are accurate descriptions of the distribution below? And once again we're going to select all that apply. So the distribution has an outlier. So let's see this distribution. I do have a data point here that is at the high end and I have another data point here that's at the low end, but I don't have any data points that are sitting far above or far below the bulk of the data. If I had a data point that was out here, then yeah, I would say that was an outlier to the right, or a positive outlier. If I had a data point way to the left off the screen over here, maybe that would be an outlier, but I don't really see any obvious outliers. All of the data, it's pretty clustered together. So I would not say that the distribution has an outlier. The distribution has a peak at 22 degrees. Yeah, it does indeed look like we have, and let's just look at what we're actually measuring: high temperature each day in Edgeton, Iowa in July. So it does indeed look like we have the most number of days that had a high temperature at 22, most number of days in July had a high temperature at 22 degrees Celsius, so that is a peak. You can see it, if you imagine this as kind of a mountain, this is a peak right here, this is a high point. You have, at least locally, the most number of days at 22 degrees Celsius. So I would say it definitely has a peak there. Since I selected something, I'm not gonna select none of the above. Let's do a couple more of these.

Which of the following are accurate descriptions of the distribution below? So the first one, the distribution has an outlier. So... number of guests by day at Seth's Sandwich Shop. So, let's see, the lowest... They have no days... No days where he had between zero and 19 guests, no days where he had between 20 and 39 guests, looks like there's about nine days where he had between 40 and 59 guests, looks like 20 days where he had between 60 and 79 guests, all the way where it looks like maybe eight days that he had between 180 and 199 guests. But the question of outliers, there doesn't seem to be any day where he had an unusual number of guests. There's not a day that's way out here, where he had, like, 500 guests. So I would say this distribution does not have an outlier. The distribution has a cluster from zero to 39 guests. So zero to 39 guests is right over here, zero to 39 guests. And there are no days where he had between zero and 39 guests, neither zero to 19 nor 20 to 39. So there's definitely not a cluster there. I would say that the cluster would be between days that had between 40 and 199 guests. Definitely not zero and 39, there were no days that were between zero and 39 guests. So I would say none of the above very confidently. Let's do one more of these.

Which of the following are accurate descriptions of the distribution below? (laughs) Alright. The distribution has a peak from 12 to 13 points. Let me see what this is measuring, what this data is about. Test scores by student in Mrs. Frine's class. So you had one student who got between a zero and a one on a 20-point scale, so got between, I guess out of 20 questions, got between zero and one point. And then you see that no students got between two and three, or four and five, or six and seven. Then we have another student who got between eight and nine, looks like three students got between 10 and 11, and then we keep increasing, this looks like about 12 students got either a 16 or a 17, or something in between maybe, if you could get decimal points on that test. And then it looks like 10 students got from 18 to 19. Alright, so this says the distribution has a peak from 12 to 13 points. At 12 to 13 points, there were five students, but this isn't a peak. If you just go to 14 to 15 points, you have more students. So this is definitely not a peak. If you were looking at this as a mountain of some kind, you definitely wouldn't describe this point as a peak. You would say this distribution has a peak, it has the most number of students who got between 16 and 17 points, so that's the peak right there, not 12 to 13 points. So I would not select that first choice. The distribution has an outlier. Well, yeah, look at this: you have this outlier. Most of the students scored between eight and 19 points, and then you have this one student who got between zero and one, it's really an outlier. You even see this when you look at it visually, it's not even connected to the rest of the distribution. It's way to the left. If something is way to the left or way to the right, that's an outlier if it's unusually low or unusually high. So I would say this distribution definitely does have an outlier, and I'm not gonna pick none of the above since I found a choice. And I think we're all done.
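The reasoning in the last example can be sketched in a few lines of Python. This is only an illustration, not part of the lesson: the bin counts are read off the transcript's description of the test-score histogram, the 14-15 count is an assumption (the transcript only says it is more than five), and the helper names `is_peak`, `runs_of_data`, and `outlier_runs`, as well as the 5% cutoff, are hypothetical choices made for this sketch.

```python
# Bin labels and counts described in the transcript for the test-score histogram.
# ASSUMPTION: the 14-15 count (8) is made up; the transcript only says it is more than 5.
labels = ["0-1", "2-3", "4-5", "6-7", "8-9", "10-11", "12-13", "14-15", "16-17", "18-19"]
counts = [1, 0, 0, 0, 1, 3, 5, 8, 12, 10]


def is_peak(counts, i):
    """True if bin i is non-empty and strictly taller than each neighboring bin."""
    left = counts[i - 1] if i > 0 else 0
    right = counts[i + 1] if i < len(counts) - 1 else 0
    return counts[i] > 0 and counts[i] > left and counts[i] > right


def runs_of_data(counts):
    """Return (start, end) index pairs for each unbroken run of non-empty bins."""
    runs, start = [], None
    for i, c in enumerate(counts):
        if c > 0 and start is None:
            start = i
        elif c == 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(counts) - 1))
    return runs


def outlier_runs(counts, max_share=0.05):
    """Runs set apart by gaps that hold at most max_share of all data points.

    The 5% default is an arbitrary cutoff chosen for this sketch.
    """
    total = sum(counts)
    return [(s, e) for s, e in runs_of_data(counts)
            if sum(counts[s:e + 1]) <= max_share * total]


print(is_peak(counts, labels.index("12-13")))  # False: the 14-15 bin is taller
print(is_peak(counts, labels.index("16-17")))  # True
print([(labels[s], labels[e]) for s, e in outlier_runs(counts)])  # [('0-1', '0-1')]
```

The peak check mirrors the "mountain" picture: a bin counts as a peak only if it is taller than the bins on either side. The outlier check mirrors the observation that the lowest score is not connected to the rest of the distribution. An isolated single bin also passes the peak check, so the two checks are best read together.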