Conditional probability and independence
- [Instructor] James is interested in weather conditions and whether the downtown train he sometimes takes runs on time. For a year, James records whether each day is sunny, cloudy, rainy or snowy, as well as whether this train arrives on time or is delayed. His results are displayed in the table below. Alright, this is interesting. These columns are on time, delayed and the total. So for example, there's a total of 170 sunny days that year, 167 of which the train was on time, three of which the train was delayed, and we can look at that by the different types of weather conditions. And then they say, for these days, are the events delayed and snowy independent?
To think about this, remember that we're only going to be able to figure out experimental probabilities, and you should always view experimental probabilities as somewhat suspect. The more experiments you're able to run, the more likely they are to approximate the true theoretical probability, but there's always some chance that they might be different or even quite different. Let's use this data to try to calculate the experimental probabilities. So the key question here is, what is the probability that the train is delayed? And then we wanna think about, what is the probability that the train is delayed given that it is snowy? If we knew the theoretical probabilities and they were exactly the same, if the probability of being delayed was exactly the same as the probability of being delayed given snowy, then being delayed and being snowy would be independent. But if we knew the theoretical probabilities and the probability of being delayed given snowy were different from the probability of being delayed, then we would not say that these are independent events. Now, we don't know the theoretical probabilities.
We're just going to calculate the experimental probabilities, and we do have a good number of experiments here, so if these are quite different, I would feel confident saying that the events are dependent. If the experimental probabilities are pretty close, I would say that it would be hard to make the statement that they are dependent, and you would probably lean towards independence, but let's calculate this. What is the probability that the train is just delayed? Pause this video and try to figure that out. Well, let's see. If we just think in general, we have a total of 365 trials, or 365 experiments, and of them, the train was delayed 35 times, so that's 35 out of 365.
Now, what's the probability that the train is delayed given that it is snowy? Pause the video and try to figure that out. Well, let's see. We have a total of 20 snowy days and we are delayed 12 of those 20 snowy days, and so this is going to be 12/20, which is the same thing as, if we multiply both the numerator and the denominator by five, 60/100. This is a 60% probability, or I could say a 0.6 probability, of being delayed when it is snowy. This is, of course, an experimental probability, and it is much higher than the probability of just being delayed. That one, 35 out of 365, is less than 10%. It is less than 0.1. I could get a calculator to calculate it exactly. It'll be nine point something percent, or 0.09 something.
But clearly, at least from the experimental data, it seems like a much higher proportion of your snowy days are delayed than of days in general. And so based on this data, because the experimental probability of being delayed given snowy is so much higher than the experimental probability of just being delayed, I would make the statement that these are not independent. So for these days, are the events delayed and snowy independent? No.
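If you'd like to check this arithmetic yourself, here is a minimal sketch in Python. It only uses the counts mentioned in the video (365 days, 35 delayed, 20 snowy, 12 delayed and snowy). The variable names are just illustrative choices, not anything from the original problem.

```python
from fractions import Fraction

# Counts quoted in the video (the full table is not reproduced here).
total_days = 365          # all recorded days
delayed_days = 35         # days the train was delayed
snowy_days = 20           # days it was snowy
snowy_delayed_days = 12   # snowy days on which the train was delayed

# Experimental P(delayed): delayed days out of all days.
p_delayed = Fraction(delayed_days, total_days)

# Experimental P(delayed | snowy): restrict attention to snowy days only.
p_delayed_given_snowy = Fraction(snowy_delayed_days, snowy_days)

print(f"P(delayed)         = {p_delayed} ~ {float(p_delayed):.3f}")
print(f"P(delayed | snowy) = {p_delayed_given_snowy} = {float(p_delayed_given_snowy):.1f}")

# How many times more likely a delay is on a snowy day, per this data.
ratio = p_delayed_given_snowy / p_delayed
print(f"ratio              ~ {float(ratio):.1f}")
```

With these counts, the first probability comes out to roughly 0.096 and the second to 0.6, so a delay shows up about six times as often on snowy days. A gap that large across a full year of data is what supports saying the events are not independent. A small gap would not settle the question either way.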