Comparing and sampling populations
At the Olympic Games, many events have several rounds of competition. One of these is the men's 100 meter backstroke. The upper dot plot shows the times, in seconds, of the top 8 finishers in the semifinal round at the 2012 Olympics. The lower dot plot shows the times of the same 8 swimmers, but in the final round. Which pieces of information can be gathered from these dot plots?
In the semifinal round, we see that these are the 8 times of the 8 swimmers. We see 3 swimmers finished in exactly 53.5 seconds. One swimmer finished in 53.7 seconds, right here. And one swimmer, right over here, finished in 52.7 seconds. And we can think about similar things for each of these dots.
Now, in the final round, one swimmer here went much, much faster. This one is at 52.2 seconds. While this swimmer right over here went slower. We don't know which dot he was up here in the semifinal plot. But regardless of which dot he was, this dot took more time than all of the semifinal dots. So his time definitely got worse. And this is at 53.8 seconds.
So let's look at the statements and see which of these apply. The swimmers had faster times, on average, in the finals. Is this true? If we look at the finals right over here, we could take each of these times, add them up, and then divide by 8, the number of times we have. But let's see if we can get an intuition for where the mean is, because we're really just comparing these two plots, or these two distributions, we could say. If all the data was these three points and these three points, we could intuit that the mean would be right around there. It would be around 53.2 or 53.3 seconds. And then we have this point and this point. The mean of just those two, halfway between them, would get you right around there. So those two points would bring the mean down a little bit.
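The "add them up and divide by 8" step can be sketched in a few lines of Python. This is only an illustration: the transcript states just two of the final-round values (52.2 and 53.8), so the other six numbers below are hypothetical placeholders chosen to resemble the two clusters of three dots, not the actual Olympic results. Swap in the eight values read off the dot plot.

```python
# Hypothetical final-round times in seconds. Only 52.2 and 53.8 come from
# the transcript; the six middle values are made-up placeholders standing
# in for the two clusters of three dots.
final_times = [52.2, 53.1, 53.1, 53.1, 53.4, 53.4, 53.4, 53.8]


def average_time(times):
    """Mean of a list of times: add them up, divide by how many there are."""
    return sum(times) / len(times)


# Mean of the six clustered dots alone (the "right around 53.2 or 53.3" guess).
cluster_mean = average_time(final_times[1:7])

# Mean of the two outer dots alone (halfway between the fastest and slowest).
outer_mean = average_time([final_times[0], final_times[-1]])

# Mean of all eight dots.
overall_mean = average_time(final_times)

print(round(cluster_mean, 2), round(outer_mean, 2), round(overall_mean, 2))
```

With these placeholder values, the two outer dots average lower than the six clustered dots, so the overall mean ends up a little below the cluster mean. That is the same "bring the mean down a little bit" reasoning used by eye on the plot.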
And once again, I'm not figuring out the exact number. But maybe it would be around 53.1 or 53.2 seconds. So that's my intuition for the mean of the final round. And now let's think about the mean of the semifinal round. Let's just look at these bottom five dots. If you find their mean, you could intuit it would be maybe someplace around here, pretty close to 53.3 seconds. And then you have all these other ones that are at 53.5 and 53.3, which will bring the mean even higher. So I think it's fair to say that the mean time in the final round is less than the mean time up here in the semifinal round. You could calculate it yourself, but I'm just trying to look at the distributions and get an intuition here. And at least in this case, it looks pretty clear that the swimmers had faster times, on average, in the finals. It took them less time.
One of the swimmers was disqualified from the finals. Well, that's not true. We have 8 swimmers in the semifinal round. And we have 8 swimmers in the final round. So that one's not true.
The times in the finals vary noticeably more than the times in the semifinals. That does look to be true. We see in the semifinals that a lot of the times were clumped up right around here, at 53.3 seconds and 53.5 seconds. The high time in the semifinals isn't as high as the high time in the finals, and the low time isn't as low. So the times in the final round definitely vary noticeably more.
Individually, the swimmers all swam faster in the finals than they did in the semifinals. Well, that's not true. Whoever this was, clearly they were one of these data points up here. This data point took more time than all of these data points. So this represents someone who took more time in the finals than they did in the semifinals.
And we got it right.
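The four statements can also be checked numerically once the dots are read off the plots. The following is a minimal sketch under loud assumptions: the transcript gives only some of the values (52.7, three at 53.5, and 53.7 in the semifinal; 52.2 and 53.8 in the final), so the remaining numbers are hypothetical placeholders and not the real results. The function names are invented for this illustration.

```python
# Hypothetical times in seconds. Values not stated in the transcript are
# placeholders, NOT real Olympic results.
semifinal_times = [52.7, 53.3, 53.3, 53.3, 53.5, 53.5, 53.5, 53.7]
final_times = [52.2, 53.1, 53.1, 53.1, 53.4, 53.4, 53.4, 53.8]


def mean(times):
    """Add the times up and divide by how many there are."""
    return sum(times) / len(times)


def spread(times):
    """Range of the data: slowest time minus fastest time."""
    return max(times) - min(times)


def someone_got_slower(before, after):
    """True if the plots alone prove that at least one swimmer slowed down.

    If the slowest time in `after` exceeds every time in `before`, the
    swimmer who posted it must have been slower than their own earlier
    time, whichever earlier dot was theirs.
    """
    return max(after) > max(before)


# Statement 1: faster on average in the finals (a smaller mean time).
faster_on_average = mean(final_times) < mean(semifinal_times)

# Statement 2: a disqualification would leave fewer dots in the final plot.
someone_missing = len(final_times) < len(semifinal_times)

# Statement 3: the final times vary more (a larger range).
finals_vary_more = spread(final_times) > spread(semifinal_times)

# Statement 4: "everyone swam faster" is ruled out if someone surely slowed.
everyone_faster_possible = not someone_got_slower(semifinal_times, final_times)

print(faster_on_average, someone_missing, finals_vary_more, everyone_faster_possible)
```

The range is used here as the simplest measure of how much the times vary. The last check cannot prove that everyone got faster, since the dot plots do not say which dot belongs to which swimmer. It can only rule that claim out, which is exactly what the 53.8-second dot does.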