Comparing means of distributions

Sal compares the means of two different distributions given as dot plots. Created by Sal Khan.

Video transcript

Voiceover: Kenny interviewed freshmen and seniors at his high school, asking them how many pieces of fruit they eat each day. The results are shown in the two plots below. The first statement that we have to complete is "the mean number of fruits is greater for," and actually, let me go down the actual screen, "is greater for," and we have to pick between freshmen and seniors. Then they say "the mean is a good measure for the center of distribution of," and we pick either freshmen or seniors. Let me go back to my scratch pad here, and let's think about this. Let's first think about the first part. Let's just calculate the mean for each of these distributions. I encourage you to pause the video and try to calculate it out on your own. Let's first think about the mean number of fruit for freshmen. Essentially, we're just going to take each of these data points, add them all together, and then divide by the number of data points that we have. We have one data point at 0, so I'll write 0. And then we have two data points at 1, so we could say plus 2 times 1. And then we have two data points at 2, so you write plus 2 times 2. And then, let's see, we have a bunch of data. We have four data points at 3, so we could say we have four 3s. Let me circle that. So we have four 3s, plus 4 times 3. And then we have three 4s, so plus 3 times 4. And then we have a 5, so plus 5, and then we have a 6. Let me do this in a color that you can see. And then we have a 6 right over here, plus 6. How many total points did we have? We had 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, oh, actually, be careful. We had 15 points and I didn't put that one in there. Actually, let me just ... So we have 15 points, and I can't forget this one over here, so plus ... my pen is acting a little funny right now, but we'll power through that, plus 19. So what is this going to be? This is just going to be 0. This is going to be 2. This is going to be 4. This is going to be 12.
My pen is really acting up. It's almost like it's running out of digital ink or something. This is going to be another 12, and then we have 5, 6, and 19. So what is this going to be? 2 plus 4 is 6, plus 24 is 30, plus 11 is 41, plus 19 gets us to 60. 60 divided by 15 is 4, so the mean number of fruit per day for the freshmen is 4 pieces of fruit per day. This right over here, that right over there is our mean for the ... Let me put that in a color that you can actually see. Now let's do the same calculation for the seniors. We have one data point where they didn't eat any fruit at all each day, not too healthy. Then you have one 1, so I'll just write that as, we could actually write that as 1 times 1, but I'll just write that as 1. Then we have two 2s, so plus 2 times 2. Then we have one, two, three, four, five 3s, so plus 5 times 3. And then we have three 4s, so plus 3 times 4. And then we have two 5s, plus 2 times 5, and then we have a 6. We have a 6, plus 6, and we have a 7, someone eats 7 pieces of fruit each day, a lot of fiber, plus 7. And now, how many data points did we have? We have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 data points. So we're going to divide this by 16. So what is this going to be? This is just 0. Let's see. This is 1. This is 4. This is 15. This is 12. This is 10. So we have 1 plus 4 is 5, plus 15 is 20, plus 12 is 32, plus 10 is 42. 42 plus 6 is 48, and 48 plus 7 is 55. Did I do that right? Let me do that one more time. 1 plus 4 is 5, plus 15 is 20, 32, 42. 42 plus 13 is 55. So this is equal to 55 over 16, which is the same thing as, let's see, 3 times 16 is 48, so 3 and 7/16. So the mean for the seniors is 3 and 7/16, that's right around ... let's see. This is 3, that's 4, so 7/16, it's a little less than a half. It's right around there. So the mean number of fruits is definitely greater for the freshmen.
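The two means worked out above can be checked with a short script. This is only a sketch: the counts are the dot-plot values as read aloud in the transcript, and the names `freshmen`, `seniors`, and `mean_from_counts` are made up for this illustration.

```python
from fractions import Fraction

# Dot-plot counts as read aloud in the transcript (an assumption, since the
# plots themselves are not shown here): {pieces of fruit: number of students}
freshmen = {0: 1, 1: 2, 2: 2, 3: 4, 4: 3, 5: 1, 6: 1, 19: 1}
seniors = {0: 1, 1: 1, 2: 2, 3: 5, 4: 3, 5: 2, 6: 1, 7: 1}


def mean_from_counts(counts):
    """Add up value * count, then divide by the number of data points."""
    total = sum(value * n for value, n in counts.items())
    size = sum(counts.values())
    return Fraction(total, size)  # exact fraction, so 55/16 stays 55/16


freshmen_mean = mean_from_counts(freshmen)
seniors_mean = mean_from_counts(seniors)

print(freshmen_mean)        # 4
print(seniors_mean)         # 55/16
print(float(seniors_mean))  # 3.4375, that is, 3 and 7/16
```

Using `Fraction` keeps the seniors' mean as the exact value 55/16 instead of a rounded decimal, which matches the mixed number 3 and 7/16 from the video.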
They have 4 ... Their mean number of fruit eaten per day is 4 versus 3 and 7/16. Now the second statement: "the mean is a good measure for the center of the distribution of." So when we think about whether it's freshmen or seniors, the mean is fairly sensitive to when you have outliers here. For example, someone here was eating 19 pieces of fruit per day. That's an enormous amount of fruit. They must be only eating fruit. You can imagine if it was even a bigger number, if someone was eating 20 or 30 pieces of fruit, just that one data point will skew the entire mean upwards. That wouldn't be the effect on the median because the median is a middle number. Even if you change this one point all the way out here, it's not going to change what the middle number is. So the mean is more sensitive to these outliers, to these points that are really, really high or really, really low. And because the seniors don't seem to have any outliers like that, I would say that the mean is a good measure for the center of distribution for the seniors, or a better measure for the center of distribution for the seniors. Let's fill both of those out. The mean number of fruit is greater for the freshmen, and the mean is a good measure for the center of distribution for the seniors. You actually even see it here. We saw that the mean number for freshmen was at 4, but if you just ignored this person right over here and just thought about the bulk of this distribution right over here, 4 really doesn't look like the center of it. The center of it looks closer to 3 here. What happened is this one person eating 19 pieces of fruit per day skewed the mean upwards. While here, that 3 and 7/16 really did look closer to the actual distribution, closer to the ... actually, I shouldn't say ... I mean in both cases, we actually did calculate the mean of the actual distribution.
But here, since there are no outliers, the mean does seem much closer to, I guess you could say, the middle of this pile right over here. Let's check our answer, and we got it right.
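To see how much that single 19 moves the mean while leaving the middle number alone, here is a small sketch using Python's standard `statistics` module. The data lists are the dot-plot values as read aloud in the transcript, and the variable names are invented for this illustration.

```python
from statistics import mean, median

# Individual data points, as read off the dot plots in the transcript
freshmen = [0, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 6, 19]
seniors = [0, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7]

# Hypothetical variations on the freshmen data: ignore the student who eats
# 19 pieces, or push that one point even further out to 30.
freshmen_without_19 = [x for x in freshmen if x != 19]
freshmen_with_30 = [30 if x == 19 else x for x in freshmen]

for label, data in [
    ("freshmen", freshmen),
    ("freshmen without the 19", freshmen_without_19),
    ("freshmen with 30 instead of 19", freshmen_with_30),
    ("seniors", seniors),
]:
    print(f"{label}: mean = {float(mean(data)):.2f}, median = {median(data)}")
```

In every variation the median stays at 3, while the freshmen mean moves from 4 down to about 2.93 without the outlier and up to about 4.73 when the outlier is 30. The seniors' mean of about 3.44 sits close to their median of 3, which fits the conclusion that the mean describes the seniors' center better.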