## Comparing distributions

## Video transcript

At the Olympic Games, many events have several rounds of competition. One of these is the men's 100 meter backstroke. The upper dot plot shows the times, in seconds, of the top 8 finishers in the semifinal round at the 2012 Olympics. The lower dot plot shows the times of the same 8 swimmers in the final round. Which pieces of information can be gathered from these dot plots?

In the semifinal round, we see the 8 times of the 8 swimmers. Three swimmers finished in exactly 53.5 seconds. One swimmer finished in 53.7 seconds, right here, and one swimmer right over here finished in 52.7 seconds. We can read each of the other dots the same way.

Now, in the final round, one swimmer here went much, much faster: this dot is at 52.2 seconds. This swimmer right over here, on the other hand, went slower. We don't know which dot in the semifinal plot was his. But whichever one it was, this dot, at 53.8 seconds, took more time than all of the semifinal dots, so his time definitely got worse.

So let's look at the statements
and see which of these apply.

"The swimmers had faster times, on average, in the finals." Is this true? To get the exact answer, we could take each of the 8 final-round times, add them up, and then divide by 8, the number of times we have. But we're really just comparing these two plots, or these two distributions. So let's see if we can get an intuition for where the mean is instead.

In the finals, suppose all the data were these three points and these three points. We could intuit that the mean would be right around there, at about 53.2 or 53.3 seconds. Then we have this point and this point. The mean of just those two points, halfway between them, is right around here, so including them brings the overall mean down a little bit. Once again, I'm not figuring out the exact number, but maybe the mean would be around 53.1 or 53.2 seconds. So that's my intuition for the mean of the final round.

And now let's think about the
mean of the semifinal round. Let's just look at these five dots along the bottom. If you find their mean, you could intuit that it would be somewhere around here, pretty close to 53.3 seconds. Then you have all the other dots at 53.5 and 53.3, which will bring the mean even higher. So I think it's fair to say that the mean time in the final round is less than the mean time in the semifinal round. You could calculate both means yourself, but I'm just looking at the distributions to get an intuition. At least in this case, it looks pretty clear that the swimmers had faster times, on average, in the finals. It took them less time.

"One of the swimmers was disqualified from the finals." Well, that's not true. We have 8 swimmers in
the semifinal round and 8 swimmers in the final round, so that one's not true.

"The times in the finals vary noticeably more than the times in the semifinals." That does look to be true. In the semifinals, a lot of the times were clumped up right around here, at 53.3 seconds and 53.5 seconds. The semifinals' high time isn't as high as this final-round time, and their low time isn't as low as this one. So the times in the final round definitely vary noticeably more.

"Individually, the swimmers all swam faster in the finals than they did in the semifinals." Well, that's not true. Whoever this swimmer was, they clearly
were one of these data points up here in the semifinals. This final-round data point took more time than all of those semifinal data points. So it represents someone who took more time in the finals than in the semifinals. And we got it right.
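The narrator estimates the final-round mean by splitting the dots into groups rather than summing all eight times. The short Python sketch below shows why that works: the mean of all the values equals the size-weighted average of the group means. It rests on two assumptions drawn from a rough reading of the narration. First, the six clustered final-round dots average about 53.25 seconds, the midpoint of "53.2 or 53.3". Second, "this point and this point" are the fastest and slowest final times, 52.2 and 53.8 seconds. The helper names `mean` and `combined_mean` are hypothetical, chosen for illustration.

```python
def mean(values):
    """Add up the values and divide by how many there are."""
    return sum(values) / len(values)


def combined_mean(groups):
    """Mean of several groups, given (group_mean, group_size) pairs.

    Each group contributes its mean weighted by its size. This equals
    summing every individual value and dividing by the total count.
    """
    total = sum(group_mean * size for group_mean, size in groups)
    count = sum(size for _, size in groups)
    return total / count


# Assumption: the six clustered final-round dots average about 53.25 s.
cluster = (53.25, 6)
# Assumption: the two remaining dots are the fastest and slowest final times.
# Their mean (53.0 s) is below 53.25 s, so they pull the overall mean down.
extremes = (mean([52.2, 53.8]), 2)

estimate = combined_mean([cluster, extremes])
print(round(estimate, 2))  # 53.19
```

The estimate of about 53.19 seconds falls inside the narrator's "53.1 or 53.2" guess. With all eight actual times in hand, calling `mean` on the full list would give the exact value instead of an estimate.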
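For the statement about variability, the narrator compares the highest and lowest times in each round. That amounts to comparing ranges, where range is the slowest time minus the fastest time. Here is a minimal Python sketch that uses only the extremes stated in the narration: 52.7 and 53.7 seconds in the semifinal, and 52.2 and 53.8 seconds in the final. The function name `time_range` is hypothetical. Range is just one way to measure spread, and it depends solely on the two extreme values, so the other dots can be left out here.

```python
def time_range(times):
    """Spread measured as slowest time minus fastest time."""
    return max(times) - min(times)


# Only the extremes affect the range, so these lists hold just the
# fastest and slowest times read off each dot plot.
semifinal_extremes = [52.7, 53.7]
final_extremes = [52.2, 53.8]

semi_spread = time_range(semifinal_extremes)
final_spread = time_range(final_extremes)
print(round(semi_spread, 1), round(final_spread, 1))  # 1.0 1.6
print(final_spread > semi_spread)  # True
```

The range ignores everything between the extremes. A measure such as the population standard deviation (for example, Python's `statistics.pstdev`) would need all eight times from each round.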
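The last statement is rejected without knowing which semifinal dot belongs to which swimmer. Every finalist's semifinal time is one of the semifinal dots. So if some final time is slower than every semifinal time, the swimmer who posted it must have gotten slower. A minimal Python sketch of that check follows. The function name `someone_got_slower` is hypothetical. The lists hold only a few of the times from the narration, on the assumption that 53.7 seconds is the slowest semifinal time and 53.8 seconds the slowest final time.

```python
def someone_got_slower(semifinal_times, final_times):
    """Return True if at least one swimmer must have been slower in the final.

    Each finalist's semifinal time is one of the semifinal values, so it is
    at most max(semifinal_times). A final time above that maximum therefore
    belongs to a swimmer who got slower, whoever that swimmer is.

    A False result does NOT prove everyone got faster. Without knowing which
    semifinal dot matches which final dot, this test simply can't tell.
    """
    return max(final_times) > max(semifinal_times)


# Only each plot's slowest time matters here. Assumption: 53.7 s is the
# slowest semifinal time and 53.8 s the slowest final time, as read from
# the narration; the remaining dots are omitted.
semifinal_times = [52.7, 53.5, 53.7]
final_times = [52.2, 53.8]

print(someone_got_slower(semifinal_times, final_times))  # True
```

If the plots labeled which dot belonged to which swimmer, you could instead compare each swimmer's two times directly. That would also catch swimmers who got slower without posting the slowest final time.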