Conditional probability tree diagram example

AP.STATS: VAR‑4 (EU), VAR‑4.D (LO), VAR‑4.D.1 (EK), VAR‑4.D.2 (EK)
Using a tree diagram to work out a conditional probability question. If someone fails a drug test, what is the probability that they actually are taking drugs?

Want to join the conversation?

  • duskpin ultimate style avatar for user axel.afc.cardenas
    This video gave me that "AHA, I get it now!" moment. I hear that's a type of accomplishment for teachers. thanks
    (6 votes)
  • starky seed style avatar for user Felipe Montealegre
    at . I think it could be misleading for a jury to take into account only the probability that Sal is on drugs GIVEN that he tested positive as a measure of how likely or unlikely it is that Sal is on drugs. (I know that this is specifically the question we are trying to solve.) That 28% probability is GIVEN that Sal happened to be in that 2% who incorrectly test positive when in fact they are not on drugs. Any thoughts?
    (3 votes)
  • blobby green style avatar for user farhanabedin97
    Can we solve it with a formula?
    (1 vote)
    • piceratops ultimate style avatar for user Khadijah Flowers
      It is possible to use the formula here. You would still have to get the following:

      We want the percentage of people who are tested positive who are actually on drugs.

      We can get this by taking the five percent of the ten thousand applicants who are on drugs (500) and multiplying by the probability that the test is correct for them (99%), which Sal found to be 495.

      Next we find the number of people who are not on drugs but test positive: the other 95% of people (9,500) multiplied by the probability that someone who is clean tests positive (2%), which is 190.

      Here's how you'd use the conditional probability formula:

      A = On the drugs
      B = Tested positive
      P(A|B) = P(A and B) /P(B)

      In this case, the only way to find P(B) is to use the law of Total Probability which is just the sum of all the times when the event B occurs, no matter what happened before it (so Not A is included if B still happened). B occurred when someone was on drugs and tested positive and then it occurred 2% of the time when someone was not on drugs and tested positive. This is why we add 495 and 190.

      P(A and B) = 495/10,000
      P(B) = (190 + 495)/10,000, where 190 test positive but are not on drugs and 495 test positive and are on drugs
      P(A|B) = 495/685 ≈ 0.7226

      This is long winded, but the only immediate difference between this problem and maybe some other conditional probability problems is that the problem branches and the event given may occur in several places throughout your experiment, to which you only have to use the law of Total Probability to sum all of the occurrences of B.
      (8 votes)
  • blobby green style avatar for user goyalnikhilgoyal08
    Why are the favorable cases 495? The actual drug users are 500, so why not 500? I understand the denominator, but I'm confused about the numerator.
    (3 votes)
  • blobby green style avatar for user Dominic B
    Can we get this using only the formula, without assuming a specific number of applicants?
    (1 vote)
    • male robot hal style avatar for user Zev Oster
      Sure. All those numbers are just percentages of 100%. If you did what he did based on percentages:

      0.05 × 0.99 = 0.0495 = 4.95%
      0.05 × 0.01 = 0.0005 = 0.05%
      0.95 × 0.02 = 0.019 = 1.9%
      0.95 × 0.98 = 0.931 = 93.10%


      All of the percentages are the same, and therefore the final results are the same.
      (5 votes)
  • starky seed style avatar for user Lucille Dautemer
    Why wasn't the formula
    P(U+ | T+) = [P(T+ | U+) × P(U+)] / [P(T+ | U+) × P(U+) + P(T+ | U−) × P(U−)]
    used? And why won't it work?
    (3 votes)
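
The formula in the question above is Bayes' theorem, and it does appear to give the same answer as the tree. As a sketch, assuming U+ means "uses drugs" and T+ means "tests positive", and reading P(T+ | U+) = 0.99, P(T+ | U−) = 0.02 and P(U+) = 0.05 off the problem statement:

```latex
P(U^+ \mid T^+)
  = \frac{P(T^+ \mid U^+)\,P(U^+)}{P(T^+ \mid U^+)\,P(U^+) + P(T^+ \mid U^-)\,P(U^-)}
  = \frac{0.99 \times 0.05}{0.99 \times 0.05 + 0.02 \times 0.95}
  = \frac{0.0495}{0.0685}
  \approx 0.7226
```

The numerator and denominator are the same 4.95% and 4.95% + 1.9% that the tree produces, so the two approaches should agree.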
  • starky sapling style avatar for user Aravind Unni
    Can someone please help me solve this question using the tree?

    The probability that a student knows the correct answer to a multiple choice question is 2/3. If the student does not know the answer, then the student guesses the answer. The probability of the guessed answer being correct is 1/4. Given that the student has answered the question correctly, what is the conditional probability that the student knows the correct answer?

    The answer should be 8/9
    (1 vote)
    • cacteye blue style avatar for user Jerry Nilsson
      2∕3 of the time the student knows the answer and thereby also gives the correct answer to the question.
      So, the probability that the student knows the answer AND answers correctly is
      2∕3 ∙ 1 = 2∕3

      1∕3 of the time the student doesn't know the answer, in which case they answer correctly 1∕4 of the time.
      So, the probability that the student doesn't know the answer AND answers correctly is
      1∕3 ∙ 1∕4 = 1∕12

      Thereby, the student answers correctly
      2∕3 + 1∕12 = 3∕4 of the time.

      Now, for the conditional probability we want to view that 3∕4 as if it was 1 whole, which we achieve by multiplying by its reciprocal, namely 4∕3.

      What we do to one side of an equation we also have to do to the other side, and we get
      (2∕3 ∙ 4∕3) + (1∕12 ∙ 4∕3) = 3∕4 ∙ 4∕3,
      which simplifies to
      8∕9 + 1∕9 = 1

      It follows that 8∕9 of the times that the student answers a question correctly, it's because they already knew the answer and didn't have to guess it.
      (5 votes)
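
The arithmetic in this answer can be checked with exact fractions. This is a minimal sketch, assuming (as the answer does) that a student who knows the answer always answers correctly; the variable names are illustrative only.

```python
from fractions import Fraction

# Probabilities stated in the question above.
p_knows = Fraction(2, 3)             # student knows the answer
p_correct_if_knows = Fraction(1)     # assumption: knowing implies a correct answer
p_correct_if_guess = Fraction(1, 4)  # a guess is right 1/4 of the time

# Multiply along each branch of the tree that ends in a correct answer.
knows_and_correct = p_knows * p_correct_if_knows
guess_and_correct = (1 - p_knows) * p_correct_if_guess

# Law of total probability: add up every way of answering correctly.
p_correct = knows_and_correct + guess_and_correct

# Conditional probability: P(knows | correct).
p_knows_given_correct = knows_and_correct / p_correct

print(p_correct)              # 3/4
print(p_knows_given_correct)  # 8/9
```

Dividing by 3/4 here is the same step as multiplying by the reciprocal 4/3 in the answer above.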
  • blobby green style avatar for user fractionfred
    At , it is said that "0.05% is 500th of a percent", where it should be 200th.
    (2 votes)
  • leafers ultimate style avatar for user charles john
    The numerator should be 5%, right? The question gives that as the percentage of people actually taking illegal drugs. Can somebody help me?
    (1 vote)
  • winston default style avatar for user william rivera
    why are they talking about drugs
    (1 vote)

Video transcript

- [Instructor] A company screens job applicants for illegal drug use at a certain stage in their hiring process. The specific test they use has a false positive rate of two percent and a false negative rate of one percent. Suppose that five percent of all their applicants are actually using illegal drugs and we randomly select an applicant. Given that the applicant tests positive, what is the probability that they are actually on drugs? So let's work through this together. So first let's make sure we understand what they're telling us. So there is this drug test for the job applicants and then the test has a false positive rate of two percent. What does that mean? That means that in two percent of the cases when it should have read negative, when the person didn't do the drugs, it actually read positive. It is a false positive. It should have read negative but it read positive. Another way to think about it: if someone did not do drugs and they take this test, there's a two percent chance of it saying that they did do the illegal drugs. They also say that there is a false negative rate of one percent. What does that mean? That means that one percent of the time, if someone did actually take the illegal drugs, it'll say that they didn't. It is falsely giving a negative result when it should have given a positive one. And then they say that five percent of all their applicants are actually using illegal drugs. So there are several ways that we can think about it. One of the easiest ways to conceptualize it is to just make up a large number of applicants, and I'll use a number where it's fairly straightforward to do the mathematics. So let's say that we start off with 10,000 applicants. I will talk in absolute numbers, and I just made this number up. It could have been 1,000, it could have been 100,000, but I like this number 'cause it's easier to do the math with than, say, 9,785. This is also going to be 100% of the applicants.
Now they give us some crucial information here. They tell us that five percent of all their applicants are actually using illegal drugs. So we can immediately break this 10,000 group into the ones that are doing the drugs and the ones that are not. So five percent are actually on the drugs, 95% are not on the drugs. So what's five percent of 10,000? So that would be 500. So 500 on drugs. Once again, this is five percent of our original population. And then how many are not on drugs? Well, 9,500 not on drugs. And once again, this is 95% of our group of applicants. So now let's administer the test. So what is going to happen when we administer the test to the people who are on drugs? Well, the test, ideally, would give a positive result. It would say positive for all of them, but we know that it's not a perfect test. It's going to give negative for some of them. It will falsely give a negative result for some of them, and we know that because it has a false negative rate of one percent. Of these 500, 99% are going to get the correct result in that they're going to test positive. So what is 99% of 500? Well, let's see, that would be 495. 495 are going to test positive. I will just use a positive right over there. And then we're going to have one percent, which is five, who are going to test negative. They are going to falsely test negative. This is the false negative rate. If we say, what percent of our original applicant pool is on drugs and tests positive, well, that's 495 over 10,000. This is 4.95%. What percent of the original applicant pool is on drugs but tests negative for drugs? The test says that, hey, they're not doing drugs. Well, this is gonna be five out of 10,000, which is 0.05%. Another way that you could get these percentages: if you take five percent and multiply by one percent, you're going to get 0.05%. Five hundredths of a percent. If you take five percent and multiply by 99%, you're going to get 4.95%. Now let's keep going.
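
The branch for applicants who are on drugs can be sketched in a few lines of Python. The variable names are illustrative, and the integer division only works cleanly because 10,000 was chosen so that every percentage comes out to a whole number of people.

```python
# Numbers from the problem statement; names are hypothetical.
applicants = 10_000        # made-up pool size, as in the transcript
pct_users = 5              # 5% of applicants use illegal drugs
pct_false_negative = 1     # 1% of users wrongly test negative

# First split of the tree: users vs. non-users.
users = applicants * pct_users // 100
non_users = applicants - users

# Second split, users only: false negatives vs. true positives.
users_negative = users * pct_false_negative // 100
users_positive = users - users_negative

print(users, non_users)                # 500 9500
print(users_positive, users_negative)  # 495 5

# Share of the original pool at the end of each branch, in percent.
print(100 * users_positive / applicants)  # 4.95
print(100 * users_negative / applicants)  # 0.05
```
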
Now let's go to the folks who aren't taking the drugs. And this is where the false positive rate is going to come into effect. So we have a false positive rate of two percent. So two percent are going to test positive. What's two percent of 9,500? It's 190 who would test positive even though they're not on drugs. This is the false positive rate. So they are testing positive, and then the other 98% will correctly come out negative. The other 98%, so 9,500 minus 190, that's gonna be 9,310, will correctly test negative. Now what percent of the original applicant pool is this? Well, 190 is 1.9%, and we could calculate it by 190 over 10,000, or you could just say two percent of 95% is 1.9%. Once again, multiply along the path of the tree. What percent is 9,310? Well, that is going to be 93.10%. You could say this is 9,310 over 10,000, or you can multiply along the path on our probability tree here. 95% times 98% gets us to 93.10%. But now I think we are ready to answer the question. Given that the applicant tests positive, what is the probability that they are actually on drugs? So let's look at the first part. Given the applicant tests positive. So which applicants actually tested positive? You have these 495 here who tested positive, correctly tested positive, and then you have these 190 right over here who incorrectly tested positive, but they did test positive. So how many tested positive? Well, we have 495 plus 190 who tested positive. That's the total number that tested positive, and then which of them were actually on the drugs? Well, of the ones that tested positive, 495 were actually on the drugs. We have 495 divided by the sum of 495 and 190, which is equal to about 0.7226. So we could say approximately 72%. Now this is really interesting. Given the applicant tests positive, what is the probability that they are actually on drugs?
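
Putting all four leaves of the tree together, the whole calculation can be sketched as below. The function `tree_counts` and its parameter names are hypothetical, and the integer arithmetic assumes the pool size makes each percentage a whole number of people, as 10,000 does here.

```python
def tree_counts(applicants, pct_users, pct_false_pos, pct_false_neg):
    """Return the four leaf counts of the tree, using integer percentages."""
    users = applicants * pct_users // 100
    non_users = applicants - users
    users_neg = users * pct_false_neg // 100          # false negatives
    non_users_pos = non_users * pct_false_pos // 100  # false positives
    return {
        "user_pos": users - users_neg,
        "user_neg": users_neg,
        "non_user_pos": non_users_pos,
        "non_user_neg": non_users - non_users_pos,
    }

leaves = tree_counts(10_000, pct_users=5, pct_false_pos=2, pct_false_neg=1)
print(leaves)
# {'user_pos': 495, 'user_neg': 5, 'non_user_pos': 190, 'non_user_neg': 9310}

# Condition on testing positive: keep only the two positive leaves.
tested_positive = leaves["user_pos"] + leaves["non_user_pos"]
p_user_given_pos = leaves["user_pos"] / tested_positive
print(tested_positive)             # 685
print(round(p_user_given_pos, 4))  # 0.7226
```
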
When you look at these false positive and false negative rates, they seem quite low, but when you actually do the calculation, the probability that someone's actually on drugs is high, but it's not that high. It's not like if someone were to test positive that you'd say, oh, they are definitely taking the drugs. And you could also get to this result just by using the percentages. For example, you could think in terms of what percentage of the original applicants end up testing positive? Well, that's 4.95% plus 1.9%. 4.95, we'll just do it in terms of percent, plus 1.9%, and of them, what percentage were actually on the drugs? Well, that was the 4.95%. And notice this would give you the exact same result. Now there's an interesting takeaway here. Because this is saying, of the people that test positive, 72% are actually on the drugs. You could think about it the other way around. Of the people who test positive, 495 plus 190, what percentage aren't on drugs? Well, that was 190, and this comes out to be approximately 28%. 100% minus 72%. If we were in a court of law and, let's say, I tested positive for drugs and the prosecuting attorney says, look, this test is very good. It only has a false positive rate of two percent and Sal tested positive, he is probably taking drugs. A jury who doesn't really understand this well or go through the trouble that we just did might say, oh yeah, Sal probably took the drugs. But when we look at this, even if I test positive using this test, there's a 28% chance that I'm not taking drugs. That I was just in this false positive group, and the reason why this number is a good bit larger than this number is because when we looked at the original division between those who take drugs and don't take drugs, most don't take the illegal drugs.
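
The percentage version of the calculation, and the way the answer depends on how many applicants use drugs in the first place, can be sketched as a small function. The function name is hypothetical, and the 1% and 50% prevalences in the loop are made-up values for illustration, not figures from the problem.

```python
def p_user_given_positive(prevalence, false_pos_rate, false_neg_rate):
    """P(on drugs | tests positive) from the three rates, each in [0, 1]."""
    true_pos = prevalence * (1 - false_neg_rate)   # on drugs and tests positive
    false_pos = (1 - prevalence) * false_pos_rate  # not on drugs, tests positive
    return true_pos / (true_pos + false_pos)

# The problem's numbers: 5% prevalence, 2% false positives, 1% false negatives.
p = p_user_given_positive(0.05, 0.02, 0.01)
print(round(p, 4))      # 0.7226
print(round(1 - p, 4))  # 0.2774, the roughly 28% from the transcript

# Hypothetical prevalences (not from the source) to show the base-rate effect.
for prevalence in (0.01, 0.05, 0.50):
    print(prevalence, round(p_user_given_positive(prevalence, 0.02, 0.01), 4))
```

With the test's error rates held fixed, the probability rises with prevalence: when very few applicants use drugs, the false positives from the large non-user group make up a bigger share of all positives.
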
Two percent of this larger group of the ones that don't take the drugs, well this is actually a fairly large number relative to the percentage that do take the drugs and test positive. So I will leave you there. This is fascinating not just for this particular case, but you will see analysis like this all the time when we're looking at whether a certain medication is effective or a certain procedure is effective. It's important to be able to do this analysis.