Lesson 1: The idea of significance tests

# Idea behind hypothesis testing

## Want to join the conversation?

• Excuse me, how do I compute P(accurate #/100)? I'm lost here. Could anybody give me some help?
• C(n, k) = n! / ((n-k)! * k!)
P(100/100) = 0.99^100 * C(100, 100) = 0.99^100 * 1 ≈ 0.366
P(99/100) = (0.99^99) * (0.01^1) * C(100, 99) = 0.99^99 * 0.01 * 100 ≈ 0.370
P(98/100) = (0.99^98) * (0.01^2) * C(100, 98) = (0.99^98) * (0.01^2) * 100! / (2! * 98!) ≈ 0.185
...
P(n/100) = (0.99^n) * (0.01^(100-n)) * C(100, n)
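The general formula on the last line can be checked numerically. Here is a minimal Python sketch, assuming the test runs are independent and each is accurate with probability 0.99. The function name `prob_accurate` is made up for illustration, and `math.comb` (Python 3.8+) supplies the "n choose k" count.

```python
from math import comb

def prob_accurate(k, n=100, p=0.99):
    """Probability of exactly k accurate results in n independent runs."""
    # comb(n, k) counts the arrangements; p**k covers the accurate runs
    # and (1 - p)**(n - k) covers the inaccurate ones.
    return comb(n, k) * p**k * (1 - p)**(n - k)

for k in (100, 99, 98):
    print(k, round(prob_accurate(k), 3))
```

The printed values should land close to the 0.366, 0.370 and 0.185 worked out by hand above.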
• Since P(accurate) = 0.99,
for 100 test runs P(100/100 accurate) = 0.99^100,
similarly P(99/100 accurate) = 0.99^99,
and P(98/100 accurate) = 0.99^98, and so on. What am I doing wrong that I am not getting P(98/100 accurate) = 18.5%?
• > "What am I doing wrong"

If P(accurate) = 0.99, then:
``P(100 accurate) = (100 nCr 0) * 0.99^100 * 0.01^0 = 0.366``
``P( 99 accurate) = (100 nCr 1) * 0.99^99  * 0.01^1 = 0.3697``
``P( 98 accurate) = (100 nCr 2) * 0.99^98  * 0.01^2 = 0.1848``

You're failing to account for:
1. The probability of the non-accurate trials (hence the 0.01^(n-x) factors)
2. The number of ways to arrange the accurate/non-accurate trials.

The fact that you got the right answer for "99 accurate" is pure coincidence. The 0.01^1 and the 100 cancel each other out. This is NOT a general rule, just a coincidence of the numbers in this particular case.
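To see that the match at 99 is a coincidence, one can put the shortcut 0.99^k next to the full binomial calculation for several values of k. This is only a sketch, assuming independent runs at 0.99 accuracy each; the names `shortcut` and `binomial` are invented for the comparison.

```python
from math import comb

n, p = 100, 0.99

def shortcut(k):
    # Multiplies only the accurate runs, as in the question.
    return p**k

def binomial(k):
    # Accurate runs, inaccurate runs, and the number of arrangements.
    return comb(n, k) * p**k * (1 - p)**(n - k)

for k in (100, 99, 98, 97):
    print(f"{k}: shortcut={shortcut(k):.4f}  binomial={binomial(k):.4f}")
```

The two columns should agree at 100 and 99 and then drift apart from 98 downward, where the 0.01 factors and the arrangement count no longer cancel.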
• How did Sal get 36%?
I watched the combinatorics videos he mentions but I still don't understand...

>>>>>>>>>>>> EDIT (7 months later) <<<<<<<<<<
The videos I watched were not the ones Sal was referencing. As per @greentree096's answer, the correct videos are the ones on the binomial distribution: https://www.khanacademy.org/v/binomial-distribution
• He took the probability the test is accurate (each individual time) and raised it to the 100th power. This is because each test should be independent, so he can multiply the probability of these events. This leads to:
0.99^100 (which represents getting an accurate test each of the 100 times, all in a row).

As he starts to throw inaccurate tests into the mix, he has to start to multiply by the number of ways it can happen. For example, with the 99 accurate tests he takes 0.99^99 * 0.01^1 (this represents 99 accurate tests and 1 inaccurate test) but he has to multiply this result by the number of ways it can happen in order to represent the complete answer. This leads to:
0.99^99 * 0.01^1 * 100
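The "number of ways it can happen" can be counted directly. In the sketch below, each arrangement is identified by which of the 100 runs were inaccurate; the variable names are illustrative only.

```python
from itertools import combinations
from math import comb

n = 100  # number of test runs

# Every way to pick which runs come out inaccurate.
one_inaccurate = list(combinations(range(n), 1))
two_inaccurate = list(combinations(range(n), 2))

# The brute-force counts should match "n choose k".
print(len(one_inaccurate), comb(n, 1))
print(len(two_inaccurate), comb(n, 2))
```

With one inaccurate run there are 100 possible positions for it, which is where the factor of 100 comes from. With two inaccurate runs the count grows to 4,950.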
• How does Sal calculate these values? Is there some kind of formula I dont know?
• Short answer: It's the binomial distribution formula, P(n, r) = nCr * p^r * q^(n-r),
where n is the number of trials,
r is the number of successes,
p is the probability of success,
q is the probability of failure.
Long answer: The binomial distribution formula is used to calculate the probability that an event will be successful r times in n trials. To use the example in the video, we are given that the probability the event is successful is 99%, or 0.99. We run 100 trials, so n = 100.
The question is: of these 100 trials, what is the probability that all 100 will be successful?
Using the formula with n = 100, r = 100, p = 0.99, and q, the probability that the event will fail, equal to
1 - 0.99 = 0.01,
the probability that all 100 trials are successful is
P(100, 100) = 100C100 * 0.99^100 * 0.01^0 = 0.366, or 36.6%
• Partway through the video, Sal finishes writing the approximate probabilities that a test that is 99% accurate gives various numbers of accurate results in a sample of 100. I noticed that 96/100 is about 5 times as likely as 95/100, 97/100 about 4 times as likely as 96/100, 98/100 about 3 times as likely as 97/100, and 99/100 about 2 times as likely as 98/100. My question is: is there an underlying rule of probability or math at work here that causes this 5, 4, 3, 2 pattern, or is this simply noise? Thank you. I hope my question makes sense.
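The pattern does not appear to be noise. If the probabilities follow the binomial formula used elsewhere in this thread, the ratio of P(j inaccurate) to P(j+1 inaccurate) simplifies to (j+1)/(n-j) * p/(1-p). With n = 100 and p = 0.99 that is 99(j+1)/(100-j), which stays close to j+1 for small j. A rough Python check, with `prob_inaccurate` as a made-up helper name:

```python
from math import comb

n, p = 100, 0.99

def prob_inaccurate(j):
    """Probability of exactly j inaccurate results in n independent runs."""
    return comb(n, j) * (1 - p)**j * p**(n - j)

for j in range(1, 5):
    ratio = prob_inaccurate(j) / prob_inaccurate(j + 1)
    closed_form = (j + 1) / (n - j) * p / (1 - p)
    print(f"{n - j}/100 vs {n - j - 1}/100: "
          f"ratio={ratio:.3f}  formula={closed_form:.3f}")
```

The ratios should come out near 2, 3, 4 and 5, drifting slightly above the whole numbers as j grows.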
• Suppose I have a test that was administered to 1,000 people. The results of my testing were:
990 trials proved to be true
10 trials proved to be false
Thus, my test is 99% accurate.
I then give the same test to 100 more people. I should expect the results to be 99 true and 1 false, but according to this hypothesis, I should only expect approximately 36 out of the 100 to be true.
Seems strange.
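One way to untangle this: the 36.6% figure worked out elsewhere in this thread is the probability that all 100 results are accurate, not the expected number of accurate results. The sketch below computes both, assuming independent runs that are each accurate with probability 0.99; the names are illustrative.

```python
from math import comb

n, p = 100, 0.99

def prob_accurate(k):
    """Probability of exactly k accurate results in n independent runs."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Average number of accurate results, weighting each count by its probability.
expected_accurate = sum(k * prob_accurate(k) for k in range(n + 1))

# Chance that every single one of the n results is accurate.
prob_all_accurate = prob_accurate(n)

print(f"expected number accurate: {expected_accurate:.2f}")
print(f"P(all {n} accurate):      {prob_all_accurate:.3f}")
```

The expected count stays at about 99 out of 100, in line with the intuition in the question, while a perfect run of 100 happens only about 37% of the time.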
• What is hypothesis testing? The title advertises the "idea behind" it, but IMHO I didn't see any explanation of that (here or in the previous video, "Simple hypothesis testing"), only a bunch of pre-calculated percentages.
• Yeah, it might be clearer to skip the set-up and start watching partway through the video. Here's the gist of it. We have some experimental data that we hope confirms our hypothesis that the test is 99% accurate. So we test that hypothesis by assuming that it is true, which then gives us the ability to do the binomial distribution calculation. That calculation leads us to conclude that our experimental results would have been very unlikely to have arisen from a test with 99% accuracy. Therefore, we are left with very low confidence in the hypothesis that the test is 99% accurate, so we should reject that hypothesis.
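The reasoning above can be sketched in a few lines of Python. The observed count of 95 accurate results is a hypothetical value chosen for illustration, not a figure taken from the video, and the helper names are made up. The idea is to assume the 99% claim is true and ask how likely a result at least that bad would be.

```python
from math import comb

n, p_claimed = 100, 0.99
observed_accurate = 95  # hypothetical observation, not from the video

def prob_accurate(k):
    """Probability of exactly k accurate results if the claim is true."""
    return comb(n, k) * p_claimed**k * (1 - p_claimed)**(n - k)

# Probability of seeing observed_accurate or fewer accurate results
# under the assumption that the test really is 99% accurate.
p_value = sum(prob_accurate(k) for k in range(observed_accurate + 1))

print(f"P(at most {observed_accurate} accurate | claim true) = {p_value:.4f}")
```

A probability this small (well under 1%) is what would lead one to doubt the 99% claim. How small is small enough to reject it is a judgment call that this sketch does not settle.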
• I don't understand why the probability of getting 100 accurate results out of 100 is lower than the probability of getting 99 accurate results out of 100. This confuses me even more because for 98, 97, 96, and 95 accurate results, the probability decreases as it should.
• Doesn't one need to account for combinatorics with the inaccurate results? In other words, what about the small group of results with a 0.01 chance? For 97/100, the 3 inaccurate results can also be rearranged, even if the 97 accurate results do not change position in the permutation count.
• Yes, one does. I think Mikhail's answer to Webber Huang's question is correct.

Consider the case of getting 97 accurate results:

Simply calculating 0.99^97 isn't correct; it works out to 38% because it doesn't include the inaccurate results.

Including the three inaccurate results means multiplying 0.99^97 by 0.01^3. That calculation yields a much smaller number (about 3.8 x 10^-7), but that represents the probability of just one particular way of getting 97 accurate and 3 inaccurate results.

The number of ways to have 97 accurate and 3 inaccurate ("100 choose 97") works out to 161,700.

So, to find the probability of any run including 97 accurate and 3 inaccurate, the calculation is

0.99^97 * 0.01^3 * 161,700 = (about) 0.061

which is pretty close to the value Sal gave in the vid.