
## Statistics and probability

### Unit 12: Lesson 5

More significance testing videos

# Hypothesis testing and p-values

Sal walks through an example about a neurologist testing the effect of a drug to discuss hypothesis testing and p-values. Created by Sal Khan.

## Want to join the conversation?

• Why do we reject the null hypothesis when 99.7% of the area under the curve supports the null hypothesis?
  • The bell curve is saying that if you test groups of 100 (undrugged) rats over and over again, the average reaction times will range between 1.05 and 1.35 almost every time. The average outcome is 1.2 seconds, but just by chance you could get a group of faster-reacting rats, or slower-reacting rats; each rat can be a little different from the average. Now suppose you get a group of rats and they average out to 1.05 seconds. That would be really, really rare for normal rats. So that bunch of rats must have been drinking Starbucks coffee or something, because they are not normal rats. Think of the 99.7% part of the curve as caffeine-free rats. Null = no = caffeine free. Then along comes a bunch of rats so "hyper" they are "off the charts." Anyone claiming these are normal rats can be pretty safely "rejected" (called a liar).
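The 1.05–1.35 range in the answer above can be checked numerically. Here is a minimal sketch using Python's standard library, assuming the video's numbers (population mean 1.2 s, standard deviation 0.5 s, samples of n = 100):

```python
from statistics import NormalDist

mu, sigma, n = 1.2, 0.5, 100       # assumed values from the video
se = sigma / n ** 0.5              # standard error of the sample mean: 0.05
lo, hi = mu - 3 * se, mu + 3 * se  # the "three sigma" range for sample means
print(lo, hi)                      # 1.05 1.35 (up to floating-point rounding)

# fraction of sample means expected to fall inside that range
p_within = NormalDist().cdf(3) - NormalDist().cdf(-3)
print(round(p_within, 4))          # 0.9973
```

A sample mean of 1.05 sits right at the edge of this range, which is why it would be so surprising under the null hypothesis.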
• Starting at , why do you need to estimate the sample standard deviation when you already have it (0.5)? He goes on to say that you put a hat on it to show that you estimated the population standard deviation from the sample, but why does the sigma get a hat for the population estimate while the sample mean gets an x-bar? Is the notation in that section correct?

• Shouldn't it be the other way around when calculating the z value?

My professor always told me to do it that way. The final conclusion doesn't change in this case, but I just wanted to make sure that's the proper way.
  • Since the normal probability distribution (bell curve) is symmetric around the mean, it doesn't matter: either order gives the same result in terms of area under the curve. That's why the professor said it that way, to keep things simple. But if we were dealing with an asymmetric probability distribution, such as the F distribution, then the order would matter.
Hope that helps.
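The symmetry claim in the answer above is easy to verify directly with Python's standard library (a small sketch; the z value of 1.7 is arbitrary):

```python
from statistics import NormalDist

nd = NormalDist()            # standard normal: mean 0, sd 1
z = 1.7                      # arbitrary z value for the demonstration
upper_tail = 1 - nd.cdf(z)   # area to the right of +z
lower_tail = nd.cdf(-z)      # area to the left of -z
# identical by symmetry, so the sign convention doesn't change the p-value
assert abs(upper_tail - lower_tail) < 1e-12
```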
• Why are you not using a t-distribution to find the probability of getting the sample result? I know that when the sample size is large (n = 100), a t-distribution is essentially the same as a normal distribution, but I think this lesson can be misleading, since we are taught to use a t-distribution in the common case where the population standard deviation is not known and we estimate it from the sample.
  • The t-test is more conservative when the sample size is small. I think you would opt for the more conservative test, knowing that with a larger sample size there is essentially no difference between t and z. In general, when comparing two means, the t-test is used. Note from the results given above by ericp that the conclusion from either test is the same: the two groups differ significantly. In scientific reports, the p-value is reported to 2 decimal places, so using either the z or the t test you would report a significant difference "with p < .01".
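To make the z side of this comparison concrete, here is a sketch of the video's calculation, assuming its numbers (sample mean 1.05 s, hypothesized mean 1.2 s, estimated SD 0.5 s, n = 100). The standard library has no t distribution, so only the normal-based p-value is computed; with 99 degrees of freedom, a t-based p-value would be only slightly larger and the conclusion identical:

```python
from statistics import NormalDist

mu0, xbar, s, n = 1.2, 1.05, 0.5, 100  # assumed values from the video
z = (xbar - mu0) / (s / n ** 0.5)      # z-statistic: -3.0
p_two_tailed = 2 * NormalDist().cdf(-abs(z))
print(round(z, 1), round(p_two_tailed, 4))  # -3.0 0.0027
```

Either way, p < .01, matching the reported conclusion.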
• Is it valid to assume the sample SD is close to the population SD? Even if the sample size is high, the rats in the sample have been injected; how do we know that doesn't affect the sample SD?
  • It is an assumption you are making, justified by the fact that your H0 is that the drug has no effect, so the two populations (drug vs. no drug) would actually be identical. If the drug has no effect, then the standard deviation of drugged and undrugged rats should be the same. It is an assumption, justified with some logic, but not proven.

In a research paper, this would be recognized as a weakness, but an unavoidable one, because it is impossible to know the true standard deviation of either population; you only know the samples.
• I have a very fundamental question:

Short formulation: why is the hypothesis test designed the way it is? I want to know exactly why we can't calculate the probability of the alternative hypothesis given the sample directly, and why we have to assume the null hypothesis is true.

Long formulation: when conducting an experiment and setting up hypotheses about its outcome, what we actually want to know is whether our alternative hypothesis is true, or at least how likely it is (i.e., the probability that the alternative hypothesis is true), right?
The hypothesis test presented here only gives us the probability that the sample mean is this extreme, not the probability that the real underlying population mean is extreme.
So why do we have to go through this process of calculating the probability of the sample given that the null hypothesis is true, and then use this result to infer the likelihood of the alternative hypothesis?

From my understanding of the hypothesis test, I would answer my own question like this:
since we don't know anything about the underlying population except the tested sample, we are simply not able to do any calculations about it. This includes calculating the probability of the alternative hypothesis, because it is a hypothesis about the population.
We have to work under the assumption that the null hypothesis is true, because otherwise we cannot really do anything; we wouldn't know where to center the normal distribution we use to calculate significance... but somehow I am not entirely convinced by my own answer.
Even if my answer happens to be not far from the truth, I would appreciate it very much if someone could elaborate a bit.

Thank you!
  • H_0: the population mean that someone (possibly you) insists is true.
H_A: the population mean that someone else (possibly you, again) insists is true, claiming H_0 can't be right.
sample_mean (and sample_std): the only evidence both sides have to check who is right.

In short, what you are doing with a significance test is attacking someone's mean with a different mean, based on gathered data.

If the data is good enough to support you, you can reject H_0 and put forward your own H_A as the next H_0 (that's how scientific theories have been developed and challenged, and so forth).
If not, you can't reject it. That's it; no more, no less. (What about your precious H_A? Just forget about it; there wasn't enough evidence.)
• I don't understand where Sal got 99.7%... can anyone explain? ()

• Shouldn't we say that the alternative hypothesis is just μ < 1.2 s, and not in both directions?

• How do you calculate the critical value? I can't find an explanation for it in your video list. Thank you!
  • Short answer: critical values are generally chosen or looked up in a table (based on a chosen alpha).

--------------------
In this video there was no critical value set for the experiment. In the last seconds of the video, Sal briefly mentions a p-value of 5% (0.05), which would have a critical value of z = ±1.96. Since the experiment produced a z-score of 3, which is more extreme than 1.96, we reject the null hypothesis.

Generally, one would choose an alpha (a percentage) that represents the tolerance level for making a mistake.* Then the corresponding critical value can be looked up in a table. [* The "mistake" being to incorrectly reject the null hypothesis; in other words, claiming that the experiment had an effect when it did not.]

The critical value is the cut-off point that corresponds to that alpha; any value beyond the critical value is less than alpha(%) likely to occur by chance.

http://en.wikipedia.org/wiki/Standard_normal_table

Note that for an alpha of 5% in a cumulative table, you would first divide your alpha in half for a two-tailed test, then subtract that from 1; that is the value you look for in the table. So we get 1 - (.05/2) = 1 - .025 = 0.9750.
We find 0.9750 in the table, look at the row (1.9) and the column (0.06), and add the two together to get the corresponding z-score: 1.96.
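The table lookup above can also be done in code. A sketch using the inverse CDF from Python's standard library:

```python
from statistics import NormalDist

alpha = 0.05                                  # chosen tolerance for a false rejection
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
print(round(z_crit, 2))                       # 1.96

# the observed z-score of 3 is more extreme than the critical value,
# so we reject the null hypothesis at alpha = 0.05
print(abs(3) > z_crit)                        # True
```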
• If we assume that the null hypothesis is true, then why do we assume that the sample mean is 1.2 sec? We already know that it's 1.05 sec. 