Video transcript

- [Voiceover] Let's say that you have a cholesterol test, and you somehow magically know that the probability that it is accurate, that it gives the correct result, is 99%. You have a 99 out of 100 chance that any time you apply this test, it is going to be accurate. You just magically know that; we're just assuming it. Now let's say that you get 100 folks into a room and you apply this test to all 100 of them. So apply the test 100 times. What are some of the possible outcomes here? Is it for sure that exactly 99 out of the 100 are going to be accurate and that 1 out of the 100 is going to be inaccurate? Well, that's definitely a likely possibility, but it's also possible you get a little lucky and all 100 are accurate, or you get a little unlucky and 98 are accurate and two are inaccurate. And actually, I calculated the probabilities ahead of time. The goal of this video isn't to go into the probability and combinatorics of it, but if you're curious about it, there are a lot of good videos on probability and combinatorics on Khan Academy. If you have something that has a 99% chance of being accurate, and you apply it 100 times, the probability that it is accurate 100 out of the 100 times is approximately equal to 36.6%. I rounded to the nearest tenth of a percent. So it's a little better than a one-third chance that all of the people are going to get an accurate result, even though for any one of them there's a 99% chance that it is accurate. Now we could keep going. The probability that it is accurate, I'm just gonna put these quotes here so I don't have to rewrite accurate over and over again, the probability that it is accurate 99 out of 100 times, I calculated it ahead of time, it is approximately 37.0%.
So this is what you would expect: getting 100 out of 100 doesn't seem that unlikely if each of the times you apply it has a 99% chance of being accurate, but it makes sense that you would expect 99 out of 100 to be slightly more likely. And we can of course keep going. The probability that it is accurate 98 out of 100 times is approximately 18.5%. And I'm just gonna do a few more. The probability that it is accurate 97 out of 100 times, and once again I calculated all of these ahead of time, is approximately 6%, so it's definitely in the realm of possibility, but the probability is much lower than having 99 out of 100 or 100 out of 100 being accurate. And then the probability, let me put the double quotes here, the probability that it is accurate 96 out of 100 times is approximately 1.5%. And I'll just do one more. I could keep going: there's some probability that even though each test has a 99% chance, you just get super unlucky and very few of them are accurate. But you see what's happening to the probabilities: as we have fewer and fewer of them being accurate, it becomes less and less probable. So the probability that 95 out of the 100 are accurate is approximately 0.3%. So this was just, I guess you could say, a thought experiment. If we had a test where we know for sure that every time you administer it, the probability that it is accurate is 99%, then these are the probabilities, if you administered it 100 times, that you get 100 out of 100 accurate, that you get 99 out of 100 accurate, and so on and so forth. So let's just keep that in mind, and then think a little bit about hypothesis testing and how we can use this framework. Let's put all that in the back of our minds, and let's say that you have devised a new test, and you don't know how accurate it is.
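As a side note for readers who want to check the figures quoted in this thought experiment, here is a minimal Python sketch. It assumes each of the 100 applications of the test is independent with the same 99% chance of being accurate, so the number of accurate results follows a binomial distribution; the video does not spell this out. The function name `prob_exactly_accurate` is made up for this illustration.

```python
from math import comb


def prob_exactly_accurate(k, n=100, p=0.99):
    """Probability that exactly k of n independent tests are accurate.

    Assumes a binomial model: every test is accurate with the same
    probability p, independently of the others.
    """
    return comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    # Print the probabilities for 100, 99, ..., 95 accurate results.
    for k in range(100, 94, -1):
        print(f"{k} out of 100 accurate: {prob_exactly_accurate(k):.1%}")
```

Rounded to one decimal place, the printed values line up with the percentages quoted in the video (the 97 out of 100 case comes out at about 6.1%).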
You have a new cholesterol test, you don't know how accurate it is, but you know that in order for it to be approved by whatever governing body, the probability of it being accurate has to be 99%. So it needs to have probability of accurate equal to 99%. You don't know if this is true; you just know that that's what it needs to be. And so you have your test, and let's say you set up a hypothesis. Your hypothesis could be a lot of things, and once you get deeper into statistics, there are null hypotheses and alternative hypotheses, but let's just start with a simple hypothesis. You're hopeful: your hypothesis is that the probability that your new test is accurate is 99%. You want that to be your hypothesis because if you feel good about it, then you're like, okay, yeah, maybe I'll get approved by the appropriate governing body. So you say, "Hey, my hypothesis is that the probability of my new test being accurate is 99%." So then you go off and you apply it 100 times. You apply your new test, you don't know the actual probability of it being accurate, and you apply the test 100 times. And let's say you're able to use some other test, some for-sure, super accurate test, to verify your own test results, and you see that your test is accurate 95 out of the 100 times. So the question you have is, well, does the hypothesis make sense to you? Will you accept this hypothesis? Well, what you say is, "If my hypothesis were true, if the probability of my test being accurate is 99%, what's the probability of me getting this outcome?" Well, we figured that out. If it really was accurate 99% of the time, then the probability of getting this outcome is only about 0.3%.
So if you assume the hypothesis, I'll just write "hyp," is true, the probability of the observed outcome is approximately 0.3%. And so you say, "Look, it's definitely possible that I just got very, very, very, very unlucky, but based on this, I probably should reject my hypothesis, because the probability of me getting this outcome, if the hypothesis was true, is very, very, very, very low." And as we go deeper into statistics, you'll see that there are thresholds that people often set: if the probability of something happening or not happening is above or below some threshold, then we might reject a certain hypothesis. But in this world, you could see that, look, if my test really was accurate 99% of the time, then for it to be accurate only 95 out of 100 times when I apply it to 100 people, there's only a 0.3% chance that I would have seen this observation. So based on that, it might be completely reasonable to say, "You know what, I might reject my hypothesis and look for a new test. I don't feel good about this new cholesterol test that I constructed."
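The reasoning in this last step can be sketched in Python as well. This is only an illustration under stated assumptions: it uses a binomial model (independent tests, each accurate with probability 99% under the hypothesis), and the 5% cutoff is a hypothetical threshold chosen for the example, since the video does not name one. The sketch also reports the probability of 95 or fewer accurate results, which is a common way to measure how extreme an outcome is but goes slightly beyond what the video computes.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k accurate results out of n (binomial model)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k, n, p):
    """Probability of k or fewer accurate results out of n."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


N_TESTS = 100            # number of people tested
HYPOTHESIZED_P = 0.99    # hypothesis: the test is accurate 99% of the time
OBSERVED_ACCURATE = 95   # accurate results actually observed
THRESHOLD = 0.05         # hypothetical cutoff, not given in the video

p_exact = binom_pmf(OBSERVED_ACCURATE, N_TESTS, HYPOTHESIZED_P)
p_tail = prob_at_most(OBSERVED_ACCURATE, N_TESTS, HYPOTHESIZED_P)
reject = p_tail < THRESHOLD

print(f"P(exactly 95 accurate | hypothesis) = {p_exact:.2%}")
print(f"P(95 or fewer accurate | hypothesis) = {p_tail:.2%}")
print("Reject the hypothesis" if reject else "Do not reject the hypothesis")
```

With either number the outcome falls well under the assumed cutoff, which matches the conclusion in the video that rejecting the hypothesis is reasonable.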