Using P-values to make conclusions

Learn how to use a P-value and the significance level to make a conclusion in a significance test.
This article was designed to provide a bit of teaching and a whole lot of practice. The questions are ordered to build your understanding as you go, so it's probably best to do them in order. Onward!

We use p-values to make conclusions in significance testing. More specifically, we compare the p-value to a significance level α to make conclusions about our hypotheses.
If the p-value is lower than the significance level we chose, then we reject the null hypothesis H0 in favor of the alternative hypothesis Ha. If the p-value is greater than or equal to the significance level, then we fail to reject the null hypothesis H0, but this doesn't mean we accept H0. To summarize:
p-value < α  →  reject H0  →  accept Ha
p-value ≥ α  →  fail to reject H0
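The decision rule above can be written as a small function. This is only a sketch: the name `conclude` and the wording of the returned strings are illustrative assumptions, not part of the article.

```python
def conclude(p_value: float, alpha: float) -> str:
    """Compare a p-value to a significance level and state the conclusion."""
    if not (0.0 <= p_value <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    if p_value < alpha:
        return "reject H0 in favor of Ha"
    # p-value >= alpha: the evidence is not strong enough to reject H0.
    # This is not the same as accepting H0.
    return "fail to reject H0"


print(conclude(0.03, 0.05))  # reject H0 in favor of Ha
print(conclude(0.05, 0.05))  # fail to reject H0 (equality does not reject)
```

Note that a p-value exactly equal to α falls on the "fail to reject" side, matching the "greater than or equal to" wording of the rule.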
Let's try a few examples where we use p-values to make conclusions.

Example 1

Alessandra designed an experiment where subjects tasted water from four different cups and attempted to identify which cup contained bottled water. Each subject was given three cups that contained regular tap water and one cup that contained bottled water (the order was randomized). She wanted to test if the subjects could do better than simply guessing when identifying the bottled water.
Her hypotheses were H0: p = 0.25 vs. Ha: p > 0.25 (where p is the true likelihood of these subjects identifying the bottled water).
The experiment showed that 20 of the 60 subjects correctly identified the bottled water. Alessandra calculated that the statistic p̂ = 20/60 = 0.3̄ (that is, 0.333...) had an associated P-value of approximately 0.068.
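The article does not show how the 0.068 was obtained. One way to reproduce it, assuming a one-sided z-test for a proportion based on the normal approximation (the function name below is hypothetical), is the following sketch:

```python
from math import sqrt
from statistics import NormalDist


def one_sided_proportion_p_value(successes: int, n: int, p0: float) -> float:
    """Upper-tail p-value for H0: p = p0 vs. Ha: p > p0 (normal approximation)."""
    p_hat = successes / n
    # Standard deviation of the sample proportion, assuming H0 is true.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Probability of a z-score at least this large under H0.
    return 1 - NormalDist().cdf(z)


p_value = one_sided_proportion_p_value(20, 60, 0.25)
print(round(p_value, 3))  # about 0.068
```

Here the z-score is about 1.49, and the area to its right under the standard normal curve is about 0.068.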
Question A (Example 1)
What conclusion should be made using a significance level of α=0.05?
Choose 1 answer:

Question B (Example 1)
In context, what does this conclusion say?
Choose 1 answer:

Question C (Example 1)
How would the conclusion have changed if Alessandra had instead used a significance level of α=0.10?
Choose 1 answer:

Example 2

A certain bag of fertilizer advertises that it contains 7.25 kg, but the amounts these bags actually contain are normally distributed with a mean of 7.4 kg and a standard deviation of 0.15 kg.
The company installed new filling machines, and they wanted to perform a test to see if the mean amount in these bags had changed. Their hypotheses were H0: μ = 7.4 kg vs. Ha: μ ≠ 7.4 kg (where μ is the true mean weight of these bags filled by the new machines).
They took a random sample of 50 bags and observed a sample mean and standard deviation of x̄ = 7.36 kg and s_x = 0.12 kg. They calculated that these results had a P-value of approximately 0.02.
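The article does not show the calculation behind the 0.02. A minimal sketch that reproduces it, assuming a two-sided test that uses the sample standard deviation and a normal approximation (a t distribution with 49 degrees of freedom would give a slightly larger value that is also about 0.02), might look like this. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_mean_p_value(x_bar: float, mu0: float, s: float, n: int) -> float:
    """Two-sided p-value for H0: mu = mu0 vs. Ha: mu != mu0 (normal approximation)."""
    # Estimated standard error of the sample mean.
    se = s / sqrt(n)
    z = (x_bar - mu0) / se
    # Ha is two-sided, so count both tails: double the area beyond |z|.
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_value = two_sided_mean_p_value(x_bar=7.36, mu0=7.4, s=0.12, n=50)
print(round(p_value, 2))  # about 0.02
```

The sample mean is about 2.36 standard errors below 7.4 kg, and doubling the lower-tail area gives roughly 0.018.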
Question A (Example 2)
What conclusion should be made using a significance level of α=0.05?
Choose 1 answer:

Question B (Example 2)
In context, what does this conclusion say?
Choose 1 answer:

Question C (Example 2)
How would the conclusion have changed if they had instead used a significance level of α=0.01?
Choose 1 answer:

Ethics and the significance level α

These examples demonstrate how we may arrive at different conclusions from the same data depending on what we choose as our significance level α. In practice, we should make our hypotheses and set our significance level before we collect or see any data. Which specific significance level we choose depends on the consequences of various errors, and we'll cover that in videos and exercises that follow.

Want to join the conversation?

  • Stan
    Could any one explain how to get the p-value in the second example?
    (18 votes)
    • Saxon Knight
      Sure!

      The p-value is the probability of a statistic at least as deviant as ours occurring under the assumption that the null hypothesis is true.

      Under that assumption, and noting also that we are given that the population is normally distributed (or that we took a sample size of at least 30 [by the Central Limit Theorem]), we can treat the sampling distribution of the sample mean as a normal distribution.

      So now, we can use the normal cumulative distribution function or a z-table to find this probability. (We could also use a t-table, but it is allowable to just use a z-table since our sample size is larger than 30.)

      To use a z-table, we'll need to find the appropriate z-score first.

      Since the answer to what we are asking comes from the sampling distribution of the sample mean, we would find the appropriate standard deviation to use by dividing the population standard deviation by the square root of the sample size (since the variance of the sampling distribution is the population variance divided by the sample size, and the standard deviation is the square root of the variance).

      That would give us a standard deviation for the sampling distribution of the sample mean.

      I say would, because unfortunately, we don’t always know the population standard deviation, and so (as it seems they did here, despite knowing the population standard deviation), we are using the sample standard deviation in its place to find an estimate of the standard deviation for the sampling distribution of the sample mean, which is also known as the standard error of the mean.

      In our example, the standard error of the mean therefore has a value of 0.12 / 50^0.5, or approximately 0.01697.

      Taking the difference between our sample mean and the population mean and dividing it by the standard error gives us our z-score (number of standard errors our sample mean is away from the population mean), which is approximately (7.36 - 7.4) / 0.01697 or -2.36.

      Since the alternative hypothesis is not specific about the population mean being either greater than or less than the value in the null hypothesis, we have to consider both tails of the distribution, but by symmetry of the standard normal distribution, we can accomplish this by simply doubling the value we get from using our obtained z-score with a z-table.

      The value given by a z-table using a z-score of -2.36 is 0.0091, which, when doubled, is 0.0182 or approximately 0.02.

      This (or other videos before it in that section) might also help (it comes later in this unit): https://www.khanacademy.org/math/statistics-probability/significance-tests-one-sample/tests-about-population-mean/v/calculating-p-value-from-t-statistic

      :)
      (50 votes)
  • G.Gulzt
    I don't understand the p-value in example 1

    Isn't the calculation: binomial(60,20) * 0.75^40 * 0.25^20 = 0.0383?
    The problem states it is "0.068". Is this p-value wrong or did I make a mistake in my calculation?
    (6 votes)
    • Mahmood Salah
      p-value = P(p̂ ≥ 20/60, given that the actual proportion is 0.25)
      So you need to calculate:
      binomial(60,20) * 0.75^40 * 0.25^20 (probability that exactly 20 subjects identified the bottled water)
      +
      binomial(60,21) * 0.75^39 * 0.25^21 (probability that exactly 21 subjects identified the bottled water)
      +
      binomial(60,22) * 0.75^38 * 0.25^22
      +
      binomial(60,23) * 0.75^37 * 0.25^23
      :
      :
      binomial(60,60) * 0.75^0 * 0.25^60 (probability that all 60 subjects identified the bottled water)


      I guess !
      (7 votes)
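The sum described in the reply above can be checked directly. The following is a minimal sketch using only the Python standard library; the helper name `binomial_upper_tail` is hypothetical. The single term for exactly 20 successes is about 0.0383, while the full upper tail comes out near 0.09. That is larger than the article's 0.068, which instead matches a normal approximation to the sampling distribution of p̂.

```python
from math import comb


def binomial_upper_tail(n: int, k_min: int, p: float) -> float:
    """Exact P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Probability of exactly 20 correct identifications out of 60 when p = 0.25.
single_term = comb(60, 20) * 0.25**20 * 0.75**40
# Probability of 20 or more correct identifications (the exact one-sided p-value).
exact_tail = binomial_upper_tail(60, 20, 0.25)

print(round(single_term, 4))
print(round(exact_tail, 4))
```

The p-value counts every outcome at least as extreme as the observed one, which is why a single binomial term is not enough.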
  • Nishay
    How do you decide what significance level you should set?
    (2 votes)
    • Ian Pulizzotto
      A significance level of 0.05 (i.e. 5%) is commonly used, but sometimes other significance levels are used.

      Note that the significance level is the probability of a Type 1 error (rejecting a true null hypothesis). Everything else being equal, decreasing the significance level (probability of a Type 1 error) increases the probability of a Type 2 error (failing to reject a false null hypothesis), and vice versa.

      So the statistician has to weigh the cost of a Type 1 error (rejecting a true null hypothesis) versus the cost of a Type 2 error (failing to reject a false null hypothesis) in the real-world situation. If the statistician is especially concerned about the cost of a Type 1 error, then he/she will use a significance level that is less than 0.05. However, if instead the statistician is especially concerned about the cost of a Type 2 error, then he/she will use a significance level that is greater than 0.05.
      (11 votes)
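The claim that the significance level is the probability of a Type 1 error can be checked with a small simulation. This is a sketch under stated assumptions: data are drawn from a normal population for which H0 is actually true, the population standard deviation is treated as known (a two-sided z-test), and the function name, sample size, trial count, and seed are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(alpha, trials=4000, n=20, mu0=7.4, sigma=0.15, seed=0):
    """Fraction of simulated samples that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)  # standard deviation of the sample mean
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose true mean equals mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < alpha:
            rejections += 1  # a Type 1 error: H0 is true but was rejected
    return rejections / trials


for alpha in (0.01, 0.05, 0.10):
    print(alpha, type_one_error_rate(alpha))
```

Each printed rejection rate should land close to its α, which illustrates the trade-off: a smaller α means fewer false rejections, at the cost of making true effects harder to detect.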
  • Mohammad Yasser
    As far as I understand, rejecting H0 doesn't mean accepting Ha in all cases. Rejecting H0 only implies accepting Ha iff both are complements to each other, i.e. exactly one of them must be true. E.g. if H0 says x = 5, and Ha says x > 5, then maybe both are wrong and the truth is x < 5. This will be so weird though because the truth is expected to be either H0 or Ha, but I think it's theoretically possible to happen.
    (6 votes)
    • Nick Barnes
      You're confounding the truthfulness of H0 with the acceptability of Ha. In your example, not accepting Ha says we will not accept that x > 5, in other words x = 5 or x < 5. Not accepting Ha does not report on the truth that x < 5, it still allows the possibility that x = 5 - that is H0 is not rejected. It's very tempting to say H0 is "rejected" because x = 5 is a false statement. The key is to clarify what is meant by "reject". The statistics notion of reject is not based on whether the hypothesis is a true or false statement but on if it is rejected by the acceptability criteria of Ha.

      From that perspective verify these statements (the logic flows from one to the next): If you do not accept Ha, then you do not reject H0. The only way you can reject H0 is by accepting Ha. It doesn't make sense to both reject H0 and not accept Ha.
      (1 vote)
  • VVCephei
    In the first problem, is 0.068 the correct p-value? Assuming that the null hypothesis is true, and p = 0.25, the sampling distribution of the sample proportion with n = 60 should be approximately normal, with a mean = p = 0.25 and standard deviation of √((p·(1-p))/n) ≈ 0.056. So a sample with p-hat = 0.3 should only have a z-score ≈ 0.89, and there should be ≈ 0.187 probability of getting a sample with p-hat ≥ 0.3. Or am I missing something?
    (2 votes)
    • Ian Pulizzotto
      You generally had the right idea for calculating the p-value. Note that the p-hat value is not 0.3, but rather 20/60 = 1/3 = 0.3333... (perhaps you did not consider the bar on top of the decimal digit 3). So the z-score is about 1.49 instead of 0.89. The probability of equaling or exceeding a z-score of 1.49 is about 0.068.

      Have a blessed, wonderful day!
      (8 votes)
  • JorgeMercedes
    Please please please show us how those P Values come about. We are very troubled.
    (2 votes)
  • A_meginniss
    What is the equation to calculate the p-value?
    (3 votes)
    • daniella
      The equation to calculate the p-value depends on the specific hypothesis test being performed. For example:
      In a z-test for a population mean, the p-value can be calculated using the standard normal distribution tables or software functions.
      In a t-test for a population mean, the p-value is typically calculated using the t-distribution tables or software functions.
      In a chi-square test for independence, the p-value is calculated based on the chi-square distribution.
      Each test has its own formula for calculating the p-value based on the observed sample data and the assumptions of the test.
      (1 vote)
  • Junsang
    ur... are we going to be told how to calculate this P-value?

    I'm confused on what it actually is...
    (2 votes)
  • ricardoadam_
    First problem, question B, remark for answer C
    "There wasn't enough evidence to reject H0 at this significance level, but that doesn't mean we should accept H0. This experiment didn't attempt to collect evidence in support of H0."

    What would be like an experiment that would collect evidence in support of H0?
    (2 votes)
    • Bryan
      This experiment just assumed Ho was true; if p-value was below our sig level, then our assumption of Ho could be rejected, since it's unlikely we'd get such a deviant (or more deviant) sample proportion if Ho was true.
      If p-value was above our sig level, it tells us that Ha can be rejected, since it's likely enough to get a sample proportion of 0.333333333etc or more assuming Ho; there is no need for Ha to be true (no need for pop proportion to be higher).

      But Ha being rejected doesn't prove Ho (pop proportion = 0.25).

      For example, our hypothesis could be
      Ho: p = 0.245
      Ha: p > 0.245

      And then with a p̂ of 0.33333etc, we would have a p-value of around 0.056, which is still above our sig level, meaning that we reject our Ha, p > 0.245.
      This would be a contradiction if our first Ho was proven, but it wasn't, so it's not a contradiction.
      Notice how rejecting p > 0.245 doesn't conflict with rejecting p > 0.25, since we never said p had to be in between 0.245 and 0.25.

      You could maybe use the law of large numbers and coerce millions of people into guessing the water in your cups, and see if that proportion is really close to 0.25 to possibly prove that p = 0.25. There's probably a better experiment, but I'm not too experienced in thinking of them :P
      (1 vote)
  • brittshi000
    Thank you for the great questions. They helped me so much to prepare for my test.
    (2 votes)