## Statistics and probability

© 2023 Khan Academy

# Statistical significance on bus speeds

Sal determines if the results of an experiment about bus speeds are statistically significant.

## Want to join the conversation?

- In all of these videos about hypothesis testing I'm left wondering how the "re-randomisation" is done. It would be helpful to have this explained in more detail.(31 votes)
- I'm assuming that the "re-randomisation" means when we take the N people and redistribute them between the two groups (let me know if that's mistaken, I'm not able to watch the video right at this moment).

If this is the case, then it's a relatively simple concept. Imagine we have a list of names, each with an associated group, A or B. Keeping the list of names as-is, take all the A and B labels and throw them into a hat. Draw one out: the first person is in that group. Draw a second: the second person is in that group, and so on. Keep drawing until the hat is empty, assigning each label to the next person on the list. Presto, we have a re-randomization of the groups. Rinse and repeat to get a second re-randomization, and so on. Various computer algorithms will do this for us very quickly, but that's the basic idea.(24 votes)
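
The hat-drawing procedure described above can be sketched in a few lines of Python. This is only an illustration: the names, the group sizes, and the helper `rerandomize_labels` are made up for the example, and the seed is fixed just so the run is repeatable.

```python
import random

def rerandomize_labels(names, labels, rng):
    """Deal the existing group labels back out to the names in random order."""
    hat = list(labels)   # copy, so the original assignment is left untouched
    rng.shuffle(hat)     # "throw the labels into a hat"
    # The i-th label drawn goes to the i-th person on the (unchanged) list.
    return dict(zip(names, hat))

# Hypothetical list of people and their original groups.
names = ["Ann", "Ben", "Cal", "Dee", "Eve", "Fay"]
labels = ["A", "A", "A", "B", "B", "B"]

rng = random.Random(0)   # fixed seed, an assumption for repeatability
new_assignment = rerandomize_labels(names, labels, rng)
print(new_assignment)
```

Calling `rerandomize_labels` again with the same `rng` gives a second re-randomization. The number of A's and B's never changes, only who receives them.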

- How and why does re-randomization work?(10 votes)
- The purpose of the re-randomization is to take all the original data, regardless of whether it came from the treatment group or the control group, and see whether a difference in trip times like the one observed could plausibly arise by random chance. Here is a very simplified version; in practice you would want more measurements. In the case of bus trip lengths, we might have the following:

1: 53 min (A)

2: 42 min (A)

3: 40 min (B)

4: 53 min (A)

5: 38 min (A)

6: 28 min (B)

7: 52 min (A)

8: 32 min (B)

9: 55 min (B)

10: 33 min (B)

For calculating the original results, we would find the median of the A bus trips and the median of the B bus trips. Then we would compare them, in this case by finding the difference of the medians.

For the simulations, we would dump all the data together and group them randomly into two groups over and over. This is re-randomization. We are trying to find out if the results were important and likely to occur as a regular event, or if the results were just a quirk, and not likely to be a regular result.

How do we do this? For each simulation trial, we would find the median of each random group and the difference between the medians. To have a good simulation, we might do this 150 times or 1000 times, as in this case. Then we would see how often a difference at least as large as the original result shows up among the RANDOM group results. If the random groups produce such a result only very rarely, then we can say that our experimental result was significant. If that is true, we can switch to Bus B and save time most of the time on our bus ride.(31 votes)
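
The simplified example above can be run as a small simulation. The sketch below uses the ten trip times listed in the comment. The helper name `median_difference`, the seed, and the choice to count re-shuffled differences at least as large as the observed one (a one-sided count) are assumptions made for illustration, not something the comment specifies.

```python
import random
from statistics import median

# The ten trips from the comment: time in minutes and the bus taken.
times = [53, 42, 40, 53, 38, 28, 52, 32, 55, 33]
labels = ["A", "A", "B", "A", "A", "B", "A", "B", "B", "B"]

def median_difference(times, labels):
    """Median of the A trips minus median of the B trips."""
    a = [t for t, g in zip(times, labels) if g == "A"]
    b = [t for t, g in zip(times, labels) if g == "B"]
    return median(a) - median(b)

observed = median_difference(times, labels)  # 52 - 33 = 19 minutes

rng = random.Random(42)   # fixed seed so the run is repeatable
n_trials = 1000
shuffled = labels[:]      # work on a copy of the labels
count_as_extreme = 0
for _ in range(n_trials):
    rng.shuffle(shuffled)  # re-randomize which trips count as A or B
    if median_difference(times, shuffled) >= observed:
        count_as_extreme += 1

fraction = count_as_extreme / n_trials
print(f"observed difference: {observed} min")
print(f"fraction of re-randomizations at least as extreme: {fraction}")
```

If `fraction` comes out small, a gap of 19 minutes rarely appears when the labels are assigned at random, which is what the comment means by a significant result.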

- In calculating statistical significance should he be counting both the frequencies that are greater than +8 as well as those that are less than -8 (and not just the +8 ones)? I thought that statistical significance was measuring the likelihood of getting a value as extreme as the one he got (regardless of direction)?(10 votes)
- Hi!

Good question.

You would be right that we would have to add the frequencies greater than +8 and smaller than -8...

IF our question was: do A and B differ from each other by more than 8 minutes?

In this question, we don't care if A is faster than B or B is faster than A.

However, here our question is: is it true (can we reasonably assume) that A is faster than B by 8 minutes?

For this reason, we are only interested in those outcomes where the difference [A-B] <= -8, that is, those outcomes where A's median time is shorter than B's by 8 minutes or more.

I hope this answers your question!(12 votes)
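
The difference between the two questions can be made concrete with a short sketch. The list of simulated differences below is invented purely for illustration (it is not the table from the video), and `tail_counts` is a hypothetical helper. The sign convention follows the video: treatment median minus control median, so "A faster" is negative.

```python
def tail_counts(diffs, observed):
    """Count simulated differences at least as extreme as a negative observed value."""
    # One-sided: only differences in the same direction as the observed one.
    one_sided = sum(1 for d in diffs if d <= observed)
    # Two-sided: differences at least as large in either direction.
    two_sided = sum(1 for d in diffs if abs(d) >= abs(observed))
    return one_sided, two_sided

# Hypothetical re-randomization results (A median minus B median, in minutes).
simulated_diffs = [-10, -8, -4, -2, 0, 0, 2, 4, 8, 10]

one_sided, two_sided = tail_counts(simulated_diffs, observed=-8)
print(one_sided, two_sided)
```

The one-sided count answers "is A faster by 8 minutes or more?", while the two-sided count answers "do A and B differ by 8 minutes or more?".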

- Why is it that if the probability you get is lower than 5%, then the result is significant? How come when you solve the probability you are actually solving the probability that the results are *random*?

Thx, Clarissa(6 votes)
- The tests that we do make some sort of assumption. We might assume that the population mean is some value, or that the probability of getting heads on a coin flip is 0.5, etc. That assumption is crucial. Once we make that assumption, we can start calculating probabilities. In particular, we want to calculate the probability of the observed result happening by chance. The reason for this is that the outcome - such as the length of time the bus trip lasts - is a "random variable." It can't be predicted exactly. So in this case we make the assumption that the two bus routes have the *same mean travel time*.

With that, we have a definite scenario to play with. The times are still random events, so there's an element of chance as to whether one route will be a minute or two longer than the other. We want to know the probability of this happening by chance, because if it's a *really small* probability, then it's very unlikely to occur by chance, right?

Now, we try to trust the data - because they're real, they're what actually happened. So if, assuming the two routes have equal travel time, our observed data are very unlikely, that makes our assumption a very poor one, and it's probably wrong. In statistical jargon, we say this is a "significant" result.(9 votes)

- In this example the median travel time was used. Is there any reason for using the median instead of the mean?(5 votes)
- Sometimes the median can give you a far more practical approach towards a situation.

For example: You want to know how rich the average person living in a city, let's call it Basin City, is. Suppose the median person earns 100 dollars per year and most people are very close to 100 dollars (e.g. 80% of the population is between 80 and 120 dollars), yet the range is ridiculously high. Imagine a rich CEO living in Basin City earning 5,000,000,000 dollars a year. This insane range may strongly influence the mean, while the median is less affected by those extremes. Now if somebody used the mean to answer the question "how rich is the average person living in Basin City?" he would get a very distorted answer, depicting the average person in Basin City as far wealthier than is actually the case.(6 votes)
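
A quick numeric check of the Basin City example, using Python's standard `statistics` module. The ten incomes below are invented to match the description (nine ordinary earners near 100 dollars plus one CEO), so treat them as an assumption rather than data.

```python
from statistics import mean, median

# Hypothetical yearly incomes in dollars: nine residents near 100, plus the CEO.
incomes = [80, 90, 95, 100, 100, 100, 105, 110, 120, 5_000_000_000]

typical = median(incomes)   # middle of the sorted list, barely affected by the CEO
average = mean(incomes)     # pulled far upward by the single extreme value

print(f"median income: {typical}")
print(f"mean income:   {average}")
```

With these numbers the median stays at 100 dollars while the mean is about 500 million dollars, which is the distortion the comment describes.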

- I don't understand. In "Statistical significance on bus speeds", if the chance of Bus A being faster than Bus B, in terms of time from source to destination, is roughly 10% across 1000 simulations by re-randomization of the sample data, to me that means the claim "Bus A is faster than Bus B" is true 10% of the time, or that almost 90% of the time the proposed claim doesn't hold, meaning Bus A DOESN'T arrive faster than Bus B.

What am I missing?(4 votes)
- You're missing the point that after randomization, the values of Bus A are not really from Bus A anymore; its values were randomly assigned from the Bus A and Bus B times of the initial experiment. You've switched the times around randomly, so the origin is lost.

That 10% means the result of your first experiment is not strong evidence for the hypothesis, because about 10% of the 1000 random re-shufflings gave a result at least as extreme. This tells us that the result of your first experiment might be due to chance, so it is not significant.

I hope that clears things up for you.(4 votes)

- Maybe because I am not an English speaker, things are not that clear to me.

According to this video, what I understood was:

1) Hypothesis: bus A is faster than bus B

2) Experiment: bus A "median" travel duration is 8 minutes less than bus B

3) Simulation: the probability of bus A being faster than bus B by 8 minutes or more is 9.3%

4) Significance: ?

I don't understand the significance (meaning and importance) of the simulation result. What is its relationship with the threshold? Does being below the threshold mean the hypothesis is valid, or the opposite? Is it good to be over or under the threshold? Is there a logical way to define the threshold, or was it chosen by chance?

Sometimes the explanations are very confusing, especially to a non-English audience. Sometimes the choice of words makes the entire explanation confusing, like when "test of pregnancy" and "test of probability" are later referred to as simply "test". Which test?(3 votes)
- Hi Marcello!

Good question. The threshold is chosen by the statistician. That's also why we always have to mention it. When we say "this experiment was significant at the 5% level", the audience knows that we chose a threshold of 5%.(2 votes)

- I didn't get the threshold thing here. In the video, Sal said if the threshold is 50%, it is very likely to happen and if it's 25% then it's less likely to happen. I thought if the probability we get after re-randomizing the previous experiment data is greater than the threshold then we assume our null hypothesis (Bus A is faster than Bus B) to be true.

So, if the threshold is 50% or 25% and the probability we got is 9.3%, the chance of Bus A being faster than Bus B is very unlikely and we reject our hypothesis. Do correct me here, I'm probably wrong.(2 votes)
- > "the chance of Bus A being faster than Bus B is very unlikely and we reject our hypothesis"

There's a key element you're missing. Our hypothesis is that the two bus routes do *not* have different population median travel times. If our hypothesis is wrong, then Bus A *is* generally faster than Bus B, and so that fact explains the faster time for Bus A. But under our hypothesis, there is no reason that Bus A should tend to be quicker, so the fact that Bus A had a sample median that was 8 minutes faster than Bus B is purely a result of chance - random variation.

In the re-randomization, we simulate a distribution of the difference in medians - this is a set of possible values that we *could have observed* if the two bus routes had equal medians, with more likely values showing up more often. If you've seen some of the other Statistics videos, it's comparable to the Sampling Distribution of the Sample Mean. We use this distribution to find the probability of Bus A being at least 8 minutes faster than Bus B *under the assumption that the two routes have no difference*.

The *observed value*, the 8 minutes difference, is derived from reality. It's what really happened. If Bus A is faster, this will be a larger number. The *simulated distribution* is forced to obey our hypothesis, that neither route is quicker. If there is only a small probability of the *observed* result when comparing against the *simulated* distribution, then we know that our hypothesis doesn't really reflect reality (or put another way: reality conflicts with our hypothesis), and we would claim that the hypothesis is wrong, and that one of the routes is indeed quicker than the other.(3 votes)

- In statistical studies like this, how would one know when to use the median vs the mean? Conceptually, what would analyzing the mean have given different from analyzing the median?(2 votes)
- When the data are skewed or contain outliers, the mean tends to be a poorer measure of center than the median. It is still sometimes preferable to use the mean instead of the median (due to some other properties, such as the sampling distribution of the sample mean being asymptotically normal).(2 votes)

- Why do we keep the names of the groups (treatment group, control group) if we are doing a random shuffle? I think that we could have taken the positive values just like we took the negative ones. And if what I said is true, which one do we pick?(2 votes)
- What do you mean by "*we could have taken the positive values just like we took the negative ones*"?

To answer your question: we keep the group labels because we have two different groups (bus routes). We hypothesize / assume that they have the same travel time. Under this assumption, it doesn't matter which route we take, they'll have basically the same travel time, just with random variation (getting stopped at a traffic light, running into heavy traffic, etc). If this assumption is wrong, then one route will have a tendency to produce lower numbers (faster times).

By shuffling the data between the two groups, we *make sure* that neither route has a tendency to produce lower or higher numbers. So, we're intrinsically concerned about the two groups, and we need to keep them around to classify the randomization of the data points.(1 vote)

## Video transcript

- [Voiceover] "Giovanna usually takes bus B to work, but now she thinks that bus A gets her to work faster. She randomized 50 workdays between a treatment group and a control group. For each day from the treatment group, she took bus A; and for each day from the control group, she took bus B. Each day she timed the length of her drive."

This is really interesting what she did, it's very important, she randomized the 50 workdays. She did this beforehand, instead of just kinda waking up in the morning and deciding on her own which bus to take, because humans are infamously bad at being random. Even when we think we're being random, we're actually not that random. She might inadvertently be taking bus A earlier in the week. Or maybe the commute times are shorter. Or maybe she inadvertently takes bus A when the weather is better, when there's less traffic. Remember, there's a natural tendency for human beings to want to confirm their hypothesis. So, if she thinks that bus A is faster, maybe she'll want to pick the days where she'll get data to confirm her hypothesis. It's really important that she randomize the 50 workdays.

What I could imagine she did is maybe she wrote each of the workdays, the dates, on a piece of paper. She would have 50 pieces of paper and then she turned them all upside down or maybe she closed her eyes and then she moved them all over her table. Then with her eyes closed she randomly moved them to either the left or the right of the table. If they moved to the left of the table then those are the days she'll take bus A, if she moves them to the right of the table those are the days she takes bus B. That's how she can make sure that this is truly random.

So then they tell us, this is important, "The results of the experiment showed that the median travel duration for bus A is eight minutes less than the median travel duration for bus B." Or one way to think about it: if we took the treatment group median minus the control group median, what would we get? Well, the treatment group is eight minutes less than the control group, right? This is A, this is B, so if this is eight less than this, then this is going to be equal to negative eight. This is just another way of restating what I have underlined right over here.

Someone's car alarm went off, hope you're not hearing that. Anyway, I'll try to pay attention while it's going off (chuckles).

"To test whether the results could be explained by random chance, she created the table below, which summarizes the results of 1000 re-randomizations of the data, with differences between medians rounded to the nearest five minutes."

What is going on over here? You might say, "Well look, she got her result that she wanted to get, this data seems to confirm that bus A gets her to work faster. What's all this other business with re-randomization she's doing?"

The important thing to realize, and she realizes this, is that she might have just gotten this data that I underlined by random chance. There's some chance maybe A and B are completely similar, in terms of how long they take in reality. She just happened to pick bus A on days where bus A got to work faster. Maybe bus B is faster but she just happened to take bus A on the days that it was faster, the days it just happened to have less traffic. What she's doing here is she re-randomized the data and she wants to see, with all this re-randomized data, out of these 1000 re-randomizations, in what fraction of them do I get a result like this? Do I get a result where A is eight minutes or more faster? Or you could say that the median travel duration for bus A is eight minutes less, or even less than that, than the median travel duration for bus B. So if it was nine minutes less, or 10 minutes less, or 15 minutes less, those are all the interesting ones. Those are the ones that confirm our hypothesis, that bus A gets to work faster.

Let's look at this table, it's not below, it's actually to the right. Let's just remind ourselves what she did here, 'cause the first time you try to process this it can seem a little bit daunting. So, in her experiment, let me write this down, experiment... The car alarm outside which you probably, hopefully are not hearing, it's actually a surprisingly pleasant sounding car alarm, sounds like a slightly obnoxious bird, but anyway (laughs).

Her experiment is, the way I described it, 25 days she would take bus A, 25 days she would take bus B. She would record all the travel times and let's say that I have 25 data points in each column. Let's say they get 12 minutes, 20 minutes, 25 minutes, and you just keep going, there's 25 data points. Let's just say that there are 12 data points less than 20 minutes and 12 data points more than 20 minutes. In this circumstance, her median time for bus A would be 20 minutes, and I just made this number up. So in order for this to be eight minutes less than the median time for bus B, the median for bus B would have to be 28, and maybe you have data points here. Maybe this is 18 and you have 12 more that are less than 28. Then you have 12 more that are greater than 28. So the median time for bus B would be 28, once again I just made this data up.

If you took treatment group median, I'll just write TGM for short, TGM minus control group median, what do you get? 20 minus 28 is negative eight. This is the actual results of... These are theoretical, potential results, hypothetical results for her actual experiment.

Now what's all of this business over here? What she did is she took these times and she said, "You know what, let's just imagine a world where I could have gotten any of these times randomly on either bus." So she just randomly re-sorted them between A and B, she did that a thousand times. The first time, the second time, the third time. She does this 1000 times. I'm assuming she used some type of computer program to do it and each time, once again, she just took the data that she had and she just rearranged it, she just reshuffled it. Maybe A on one day. Maybe it got this 18. Maybe it gets the 25. Maybe it gets a 30. Once again, I got the 18, the 25, the 30 and maybe B gets the... You know she's reshuffling all these other data points that I just have with dots and maybe B... Let's see, she had the 18, 25, 30, maybe 12, 20, and 28. So in this circumstance, this random reshuffling, and she keeps doing it over and over again. In this random reshuffling, the treatment group median minus the control group median is going to be what? It's going to be equal to positive five. In this random shuffling, this hypothetical scenario, bus A's median would have been five minutes longer than bus B's.

If she gets this result with this random re-sorting, this would have been... She would have had a column here for five. Then she would have put one notch right over here. It looks like she classified things, or maybe she didn't even get the data, but she classified them by multiples of two. If she got this again then she would have put a two here. Then she would have said, "Okay, in how many of these random reshufflings am I getting a scenario where there's a five minute difference? Or where the treatment group was five minutes longer?"

What is this saying? For example, this is saying that in 18 out of the 1000 reshufflings, where she just randomly reshuffled the data, 18 out of those 1000 times, she found a scenario where her treatment group median was 10 minutes longer than her control group. In this hypothetical re-randomization, the treatment group is 10 minutes slower than the control group. There were 159 times where the treatment group... Once again, in her random reshuffling, these aren't based on observations, these are random reshufflings. There's 159 times where her treatment group is four minutes slower than her control group.

The whole reason for doing this is she says, "Okay, what's the probability of getting a result like this or better?" I say "better" as one that even more confirms her hypothesis, that the treatment group is faster than the control group. Well, this scenario is this one right over here, and then another one where the treatment group is even faster is this right over here. Here, the treatment group median is 10 less than the control group median. In how many of these scenarios, out of the thousand, is this occurring? Well, this one occurs 85 times, this one occurs eight. If you add these two together, 93 out of the thousand times, out of her re-randomizations, or I guess you could say 9.3 percent of the time, the data... 9.3 percent of the 1000 re-randomizations, 9.3 percent of the time she got data that was as validating of the hypothesis as the actual experiment, or more.

One way to think about this is, the probability of randomly getting the results from her experiment or better results is 9.3 percent. It's low, it's a reasonably low probability that this happened purely by chance. Now, a question is, "What's the threshold?" If it was 50 percent you'd say, "Okay, this was very likely to happen by chance." If this was 25 percent you're like, "Okay, it's less likely to happen by chance but it could happen." 9.3 percent, it's roughly 10 percent. For every 10 people who do an experiment like she did, even if it was random, one person would get data like this.

What typically happens amongst statisticians is they draw a threshold, and the threshold for statistical significance is usually five percent. One way to think about it: the probability of her getting this result by chance, this result or a more extreme result, one that more confirms her hypothesis, is 9.3 percent. If your cut-off for significance is five percent, if you said, "Okay, this has to be five percent or less," then you say, "Okay, this is not statistically significant." There's more than a five percent chance that I could have gotten this result purely through random chance. Once again, that just depends on where you have that threshold.

When we go back, I think we've already answered the final question, "According to the simulations, what is the probability of the treatment group's median being lower than the control group's median by eight minutes or more?" Which once again, eight minutes or more, that would be negative eight and negative 10. We just figured that out, that was 93 out of the 1000 re-randomizations, so it's a 9.3 percent chance. If you set five percent as your cut-off for statistical significance, you say, "Okay, this doesn't quite meet my cut-off so maybe this is not a statistically significant result."