## Comparing two means

# Statistical significance on bus speeds

## Video transcript

- [Voiceover] "Giovanna usually takes bus B to work, but now she thinks that bus A gets her to work faster. She randomized 50 workdays between a treatment group and a control group. For each day from the treatment group, she took bus A; and for each day from the control group, she took bus B. Each day she timed the length of her drive." This is really interesting
what she did, and it's very important: she randomized the 50 workdays beforehand, instead of just kind of waking up in the morning and deciding on her own which bus to take. That matters because humans are infamously
bad at being random. Even when we think we're being random, we're actually not that random. She might inadvertently be
taking bus A earlier in the week, when maybe the commute times are shorter. Or maybe she inadvertently
takes bus A when the weather is better,
when there's less traffic. Remember, there's a natural
tendency for human beings to want to confirm their hypothesis. So, if she thinks that
bus A is faster maybe she'll want to pick the days where she'll get data to confirm her hypothesis. It's really important that
she randomize the 50 workdays. What I could imagine she did
is maybe she wrote each of the work days, the dates,
on a piece of paper. She would have 50 pieces of
paper and then she turned them all upside down or
maybe she closed her eyes and then she moved them
all over her table. Then with her eyes closed
she randomly moved them to either the left or the right of the table. If she moved them to the left of the table, then those are the days she'll take bus A; if she moved them to the right of the table, those are the days she'll take bus B. That's how she can make sure
that this is truly random. So then they tell us, and this is important, "The results of the experiment showed that the median travel duration for bus A is eight minutes less than the median travel duration for bus B." Or one way to think about it: if we took the treatment group median minus the control group median, what would we get? Well, the treatment group median is eight minutes less than the control group median, right? This is A, this is B, so if this is eight less than this, then this is going to be equal to negative eight. This is just another
way of restating what I have underlined right over here. Someone's car alarm went off,
hope you're not hearing that. Anyway, I'll try to pay
attention while it's going off (chuckles). "To test whether the results could be explained by random chance, she created the table below, which summarizes the results of 1000 re-randomizations of the data, with differences between medians rounded to the nearest five minutes." What is going on over here? You might say, "Well, look, she got the result that she wanted to get. This data seems to confirm that bus A gets her to work faster. What's all this other business with re-randomization she's doing?" The important thing to realize
is, and she realizes this, is that she might have
just gotten this data that I underlined, by random chance. There's some chance maybe A
and B are completely similar, in terms of how long they take in reality. She just happened to pick bus A on days where bus A got to work faster. Maybe bus B is faster but
she just happened to take bus A on the days that it was faster. The days it just happened
to have less traffic. What she's doing here is re-randomizing the data, and she wants to see, out of these 1000 re-randomizations, in what fraction of them does she get a result like this? Does she get a result where A is eight minutes or more faster? Or you could say, where the median travel duration for bus A is eight minutes less, or even less than that, than the median travel duration for bus B. So if it was nine minutes
less, or 10 minutes less, or 15 minutes less, those
are all the interesting ones. Those are the ones that
confirm our hypothesis, that bus A gets to work faster. Let's look at this table, it's not below, it's actually to the right. Let's just remind ourselves
what she did here, because the first time
you try to process this it can seem a little bit daunting. So, in her experiment, let me write this down, experiment... The car alarm outside which
you probably, hopefully are not hearing, it's actually
a surprisingly pleasant sounding car alarm, sounds like a slightly obnoxious bird,
but anyway (laughs). Her experiment is, the way I described it, 25 days she would take bus A,
25 days she would take bus B. She would record all the
travel times and let's say that I have 25 data points in each column. Let's say they get 12 minutes,
20 minutes, 25 minutes, and you just keep going,
there's 25 data points. Let's just say that there are
12 data points less than 20 minutes and 12 data points
more than 20 minutes. In this circumstance,
her median time for bus A would be 20 minutes and I
just made this number up. So in order for this to
be eight minutes less than the median time for
bus B, the median for bus B would have to be 28 and maybe
you have data points here. Maybe this is 18 and you have
12 more that are less than 28. Then you have 12 more
that are greater than 28. So the median time for bus B would be 28; once again, I just made this data up. If you took the treatment group median, I'll just write TGM for short, minus the control group median, what do you get? 20 minus 28 is negative eight. These aren't her actual results; they're theoretical, hypothetical results that her actual experiment could have produced. Now what's all of this business over here? What she did is she took
these times and she said, "You know what, let's just imagine a world where I could have gotten any of these times randomly on either bus." So she just randomly re-sorted them between A and B, and she did that 1000 times: the first time, the second time, the third time, and so on. I'm assuming she used some type of computer program to do it. Each time, once again, she just took the data that she had and reshuffled it. Maybe in one reshuffling A gets this 18, the 25, and the 30, and, you know, she's also reshuffling all these other data points that I just have as dots. Then maybe B gets the 12, the 20, and the 28. She keeps doing this random reshuffling over and over again. In this random reshuffling,
the treatment group median minus the control group
median is going to be what? It's going to be equal to positive five. In this random shuffling,
this hypothetical scenario, bus A's median would have been five minutes longer than bus B's. If she gets this result with this random re-sorting, she would have had a column here for five, and she would have put one notch right over here. It looks like she classified things by multiples of two, or maybe she just didn't get that data. If she got this again, then she would have put a two here. Then she would have said, "Okay, in how many of these random reshufflings am I getting a scenario where there's a five minute difference, where the treatment group was five minutes longer?" What is this table saying? For example, this is saying that in 18 out of the 1000 reshufflings, where she just randomly reshuffled the data, she found a scenario where her treatment group median was 10 minutes longer than her control group median, where bus A, in this hypothetical re-randomization, is 10 minutes slower than bus B. There were 159 times where
the treatment group... Once again, in her random
reshuffling, these aren't based on observations, these
are random reshufflings. There's 159 times where
her treatment group is four minutes slower
than her control group. The whole reason for
doing this is she says, "Okay, what's the probability of getting a result like this or better?" I say "better" meaning one that even more confirms her hypothesis that the treatment group is faster than the control group. Well, this scenario is this one right over here, negative eight, and then another one where the treatment group is even faster is this one right over here. Here, the treatment group median is 10 less than the control group median. In how many of these scenarios, out of the thousand, is this occurring? Well, this one occurs 85 times, and this one occurs eight. If you add these two together, that's 93 out of the 1000 re-randomizations, or I guess you could say 9.3 percent of the time, she got data that was as validating of her hypothesis as, or more validating than, the actual experiment. One way to think about this is, the probability of randomly getting the results from her experiment, or better results, is 9.3 percent. That's low; it's a reasonably
low probability that this happened purely by chance. Now, a question is, "What's the threshold?" If it was 50 percent you'd say, "Okay, this was very likely to happen by chance." If it was 25 percent you'd be like, "Okay, it's less likely to happen by chance, but it could happen." 9.3 percent is roughly 10 percent: for every 10 people who do an experiment like she did, even if the difference was purely random, about one person would get data like this. What typically happens amongst statisticians is they draw a threshold, and the threshold for statistical significance is usually five percent. One way to think about it: the probability of her getting this result by chance, this result or a more extreme result, one that more confirms her hypothesis, is 9.3 percent. If your cut-off for significance is five percent, if you said, "Okay, this has to be five percent or less," then you say, "Okay, this is not statistically significant." There's more than a five percent chance that I could have gotten this result purely through random chance. Once again, that just depends on where you have that threshold. When we go back, I think
we've already answered the final question, "According to the simulations, what is the probability of the treatment group's median being lower than the control group's median by eight minutes or more?" Which, once again, eight minutes or more, that would be negative eight and negative 10. We just figured that out: that was 93 out of the 1000 re-randomizations, so it's a 9.3 percent chance. If you set five percent as your cut-off for statistical significance, you say, "Okay, this doesn't quite meet my cut-off, so maybe this is not a statistically significant result."