
## The idea of significance tests

# P-values and significance tests

AP.STATS: DAT‑3 (EU), DAT‑3.E (LO), DAT‑3.E.1 (EK), DAT‑3.F (LO), DAT‑3.F.1 (EK), DAT‑3.F.2 (EK), VAR‑7 (EU), VAR‑7.C (LO), VAR‑7.C.1 (EK)

## Video transcript

- Let's say that I run a website that currently has this off-white color for its background, and I know the mean amount of time that people spend on my website. Let's say it is 20 minutes, and I'm interested in making a change that will make people spend
more time on my website. My idea is to make the background color of my website yellow. But after making that change, how do I feel good about
this actually having the intended consequence? Well that's where significance
tests come into play. What I would do is first
set up some hypotheses, a null hypothesis and an
alternative hypothesis. The null hypothesis tends
to be a statement that, "Hey, your change actually had no effect, there's no news here,"
and so this would be that your mean is still equal to 20 minutes after the change to yellow, in this case, for our background. And we would also have an
alternative hypothesis. Our alternative hypothesis is actually that our mean is now greater
because of the change, that people are spending
more time on my site. So our mean is greater than
20 minutes after the change. Now the next thing we do is we set up a threshold known as
the significance level and you will see how this
comes into play in a second. So, your significance level is usually denoted by
the Greek letter Alpha and you tend to see
significance levels like 1/100 or 5/100 or 1/10, or 1%, 5%, or 10%. You might see other ones, but we're gonna set a significance level for this particular case. Let's just say it's going to be 0.05. And what we're going to
now do is we're going to take a sample of people visiting this new yellow background website and we're gonna calculate statistics. The sample mean, the
sample standard deviation, and we're gonna say,
"Hey, if we assume that the null hypothesis is true, what is the probability
of getting a sample with the statistics that we get?" And if that probability is lower than our significance level, if that probability is less than 5/100, if it's less than 5%, then
we reject the null hypothesis and say that we have
evidence for the alternative. However, if the probability
of getting the statistics for that sample is at the
significance level or higher, then we say, "Hey, we can't
reject the null hypothesis, and we aren't able to have
evidence for the alternative." So what we would then do, I
will call this step three. In step three, we would take a sample. So let's say we take a sample size, let's say we take 100 folks
who visit the new website, the yellow background website, and we measure sample statistics. We measure the sample mean here, let's say that for that sample, the mean is 25 minutes. We are also likely to, if we don't know what the actual population standard deviation is, which we typically don't know, we would also calculate the
sample standard deviation. Then the next step is
we calculate a p-value. And the p-value, which
stands for probability value, is the probability of getting a statistic at least this far away from the mean if we were to assume that
the null hypothesis is true. So one way to think about it is that it is a conditional probability. It is the probability that our sample mean, when we take a sample of size n = 100, is greater than or equal to 25 minutes, given our null hypothesis is true. And in other videos, we have
talked about how to do this. If we assume that the
sampling distribution of the sample means is roughly normal, we can use the sample mean, we can use our sample size, we can use our sample standard deviation, perhaps we use a t-statistic, to figure out what this
probability is going to be. And then we decide whether we can reject the null hypothesis. So let me call that step five. So step five, there are two situations. If my p-value, if it is less than Alpha, then I reject my null hypothesis and say that I have evidence
for my alternative hypothesis. Now, if we have the other situation, if my p-value is greater than or equal to, in this case 0.05, so if it's greater than or
equal to my significance level, then I cannot reject the null hypothesis. I wouldn't say that I
accept the null hypothesis, I would just say that we do not reject the null hypothesis. And so, let's say, when I do
all of these calculations, I get a p-value which would
put me in this scenario right over here. Let's say that I get a p-value of 0.03. 0.03 is indeed less than 0.05 so I would reject the null hypothesis and say that I have evidence
for the alternative. And this should hopefully
make logical sense because what we're saying is, hey, look, we took a sample and if we
assume the null hypothesis, the probability of getting
that sample is 3%, it's 3/100, and so since that probability is less than our
probability threshold here, we'll reject it and say we have evidence for the alternative. On the other hand, there
might have been a scenario where we do all of the calculations here and we figure out a p-value that we get is equal to 0.5, which you
can interpret as saying that hey, if we assume the
null hypothesis is true, that there's no change due to
making the background yellow, I would have a 50% chance
of getting this result. And in that situation,
since it's higher than my significance level, I wouldn't reject the null hypothesis. A world where the null hypothesis is true and I get this result, well, you know, it
seems reasonably likely. And so, this is the basis for
significance tests generally and, as you'll see, is
applicable in almost every field you'll find yourself in. Now there's one last
point of clarification that I wanna make very, very, very clear. Our p-value, the thing that we're using to decide whether or not we
reject the null hypothesis, this is the probability of
getting your sample statistics given that the null hypothesis is true. Sometimes people confuse
this and they say, "Hey, is this the probability
that the null hypothesis is true given the sample
statistics that we got?" And I would say, "Clearly,
no, that is not the case." We are not trying to gauge the probability that the null hypothesis is true or not. What we are trying to do is say, "Hey, if we assume the
null hypothesis were true, what is the probability
that we got the result that we did for our sample?" And if that probability is low, if it's below some threshold
that we set ahead of time, then we decide to reject
the null hypothesis and say that we have
evidence for the alternative.
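
The five steps from the video can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculation done in the video: the sample standard deviation of 26.6 minutes is a hypothetical value (the video never gives one), chosen so the p-value lands near the 0.03 used in the example. The sketch also swaps in a normal approximation from Python's standard library in place of the t-statistic the video mentions, which is a reasonable stand-in for n = 100 but not identical. The function names `one_sided_p_value` and `decide` are made up for this example.

```python
from statistics import NormalDist


def one_sided_p_value(sample_mean, null_mean, sample_sd, n):
    """Approximate P(sample mean >= observed value | null hypothesis true).

    Uses a normal approximation to the sampling distribution of the
    sample mean (an assumption; the video suggests a t-statistic).
    """
    standard_error = sample_sd / n ** 0.5
    z = (sample_mean - null_mean) / standard_error
    # Upper-tail probability, because the alternative is "mean > 20".
    return 1 - NormalDist().cdf(z)


def decide(p_value, alpha):
    """Step five: reject only when the p-value is strictly below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Step 1: hypotheses. H0: mean = 20 minutes, Ha: mean > 20 minutes.
null_mean = 20.0
# Step 2: significance level.
alpha = 0.05
# Step 3: sample statistics (n and the sample mean are from the video).
n = 100
sample_mean = 25.0
sample_sd = 26.6  # hypothetical value, not given in the video

# Step 4: p-value. Step 5: decision.
p_value = one_sided_p_value(sample_mean, null_mean, sample_sd, n)
print(f"p-value = {p_value:.3f}: {decide(p_value, alpha)}")
```

With these assumed numbers the p-value comes out close to 0.03, so the null hypothesis is rejected at the 0.05 level. Calling `decide(0.5, alpha)` reproduces the other scenario from the video, where we fail to reject.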