Introduction to Poisson Processes and the Poisson Distribution. Created by Sal Khan.
- Seriously, I just got lost when limits and the evaluation of e were introduced. Could someone please explain what that has to do with the Poisson process and, if possible, explain in simple terms what a Poisson process really is and why it is relevant?(20 votes)
- The normal distribution has a bell curve as its probability distribution. Many practical real-world measurements follow a bell curve. Averages of samples from any distribution can be approximated with a bell curve by the Central Limit Theorem. The Central Limit Theorem is very powerful, but there are certain limitations, and assumptions must be made to use it. It is used often in sampling theory and hypothesis testing.
The Poisson and Binomial distributions are discrete "counting" distributions. The regular probability distributions (not sampling distributions) are generally skewed, not symmetric like the normal distribution (which by the way is continuous, not discrete). The Binomial distribution assumes a predetermined number of trials, but the Poisson has no upper limit of possible successes. This is why limits are used to show the relationship between the two distributions.(36 votes)
- What are the prerequisites to studying more advanced statistics? I'm interested in learning statistics but I haven't done much study of calculus or any of the other higher mathematics.(7 votes)
- ^ I will correct you here. You will need to have done a first-year math course in calculus and algebra. A second-year calculus course is highly recommended. Most universities now require you to have done a basic probability course, normally offered in second year. Mathematical statistics is a very hard course to take if you don't understand basic probability, calculus and algebra.(5 votes)
- How is lambda/60 a probability?(8 votes)
- lambda = n * p, where n is the number of intervals and p is the probability of a success in each interval.
Divide both sides by n
lambda/n = p(12 votes)
- At 4:15, Sal says that P(X=k) is the probability of k cars passing in an hour. At 5:06, he says, "What happens if one car passes in an hour? Or more than one car passes in a minute?" These two statements confused me.(9 votes)
- Why DO the Poisson, Binomial, and Normal distributions act as a BELL CURVE? Is it really that fascinating, or is there some specific reason behind it?(2 votes)
- To go a bit further than the above answer, Poisson, Binomial, and Normal are all related (and all have continuous analogs, which are likewise related). Poisson is a special case of binomial in which n (the number of events) is very high and p (the probability of each event) is very low.
While you should understand the proof of this in order to use the relationship, know that there are times you can use the binomial in place of the Poisson, but the numbers can be very hard to deal with. As an example, try calculating a binomial distribution with p = .00001 and n = 2500. Mind you, this will require you to do 2500!, which is not very convenient. On the other hand, converting it into a Poisson problem makes it much more manageable.
The normal distribution on the other hand can be used with any sample mean and the Central Limit Theorem. It's all part of the awesome cycle of life. :)(13 votes)
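To make the n = 2500, p = .00001 example above concrete, here is a minimal Python sketch that compares the exact binomial probabilities with the Poisson approximation using lambda = n * p. The function names `binomial_pmf` and `poisson_pmf` are illustrative choices, not something from the video. Python's `math.comb` does the large-factorial arithmetic with exact integers, so the 2500! is never computed by hand.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


n, p = 2500, 0.00001  # values quoted in the comment above
lam = n * p           # matching Poisson mean

# Only small k are shown; with such a small mean, larger counts are negligible.
for k in range(4):
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, lam))
```

With n this large and p this small, the two columns should agree to several decimal places, which is the sense in which the Poisson version is "more manageable".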
- I believe the answer to the calculation in the end is 0.004998097 or 0.5% (we want to check “2” with lambda 9).
If I try 9 with lambda 9, I get 13%. Isn’t this value too low assuming the expected value IS 9?(3 votes)
- Good question! It's easy to think that the probability of the mean value should be higher than 13%, but when there are lots and lots of possible values, then even the most likely value won't have a very high probability. For example, if I flip a coin 2 times, and I get exactly 1 heads, that's no surprise. But if I flip a coin 1000 times and get EXACTLY 500 heads, that's actually pretty amazing, even though 500 is definitely the most likely outcome (and the mean). So in this example, if you calculate the probabilities for every value from k = 0 to k = 20, you will see that the distribution DOES peak around k = 9, but it's just spread out a bit. The values near k = 9 are:
So you see that if you have to divide up 100% into lots of possibilities, even the most likely one might not actually be very likely!(5 votes)
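The probabilities discussed in this exchange can be recomputed with a short Python sketch, assuming lambda = 9 as in the question. The helper name `poisson_pmf` is an illustrative choice. It evaluates the two values quoted in the question (k = 2 and k = 9) and then lists the values around the peak.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 9  # expected number of cars per hour, as in the question

print("P(X = 2) =", poisson_pmf(2, lam))  # the roughly 0.5% value from the question
print("P(X = 9) =", poisson_pmf(9, lam))  # the roughly 13% value from the question

# Probabilities for the values around the mean.
for k in range(6, 13):
    print(f"k = {k:2d}: {poisson_pmf(k, lam):.4f}")
```

Summing `poisson_pmf(k, lam)` over k = 0 to 20 gives a total close to 1, so the 13% at the peak is simply one share of a total spread across many possible counts.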
- I am so lost, I don't understand anything at all(4 votes)
- The easiest way to do this is:
(e to the negative Lambda) multiplied by (Lambda to the k), divided by k factorial.
It is really simple because there is a lot of repetition in the setup. e to the - Lambda, Lambda to the k, over k!; e, Lambda, Lambda, k, k(2 votes)
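Written out in symbols, the recipe in this reply would read as follows, assuming lambda is the average number of events per interval and k is the count whose probability is wanted:

```latex
P(X = k) = \frac{\lambda^{k} \, e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
```

For instance, with lambda = 9 and k = 2 this gives e^{-9} * 81 / 2, which is about 0.005, the value quoted earlier in the thread.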
- How can we model a probability distribution over time as a Binomial distribution?
Is it because there are two possible answers, yes or no, to whether a car passes in a given minute?
If yes, does that mean we can model everything in the universe with the Binomial distribution?(3 votes)
- The answer is "it depends". The Binomial Distribution is a very powerful and versatile tool in statistics, but it doesn't cover everything. The binomial depends on a number of conditions, such as independence, that do not hold in all situations.
I guess a simple way of putting it is that a lot of distributions can be thought of as "like the Binomial, but for Special Case X".
I'm oversimplifying a lot, but there are a lot of cases in statistics where a distribution is derived from another distribution for specific cases, a classic example being the Normal distribution and Student's t. Yes, in large samples, the Normal and t distributions are nearly identical, but what if you don't have a large sample? Then t works better. The same logic applies to the Binomial, the Poisson, and the whole family of distributions derived from them.
The field of density functions in statistics is vast and varied, and Khan Academy, and indeed, any intro to statistics course you may take in college, will only be able to scratch the surface of this topic.
However, statistics is easier than you think, if you focus more on the utility and less on the math. In this age of technology, nobody will care if you can calculate a density function. Any computer can do that in milliseconds. What is far more important is whether you can explain what the function means and why it is important. That's why I always tell my students that statistics is much more of a writing class than a math class. So don't be intimidated by the calculations, and focus on understanding the general idea. Good luck!(0 votes)
- Is there a specific video for e, or for the limit approaching infinity, that Sal mentions at 9:10?(3 votes)
- Check out the Precalculus section on exponential and logarithmic functions. Go to the videos on compound interest and study those. He explains how to derive e. It helps to use a calculator on your end to follow Sal's logic.(3 votes)
Let's say you're some type of traffic engineer and what you're trying to figure out is how many cars pass by a certain point on the street in a given period of time. And you want to figure out the probability that a hundred cars pass or 5 cars pass in a given hour. So a good place to start is just to define a random variable that essentially represents what you care about. So let's say the number of cars that pass in some amount of time, let's say, in an hour. And your goal is to figure out the probability distribution of this random variable, and then once you know the probability distribution you can figure out what's the probability that 100 cars pass in an hour or the probability that no cars pass in an hour, and you'd be unstoppable.
And just a little aside: to move forward with this video, there are two assumptions we need to make, because we're going to study the Poisson distribution. The first is that any hour at this point on the street is no different than any other hour. And we know that that's probably false. During rush hour in a real situation you probably would have more cars than at another hour. And you know, if you wanted to be more realistic maybe we'd do it over a day, because in a day any period of time-- actually, no. I shouldn't do a day. We have to assume that every hour is completely just like any other hour and actually, even within the hour there's really no differentiation from one second to the other in terms of the probability that a car arrives. That's a little bit of a simplifying assumption that might not truly apply to traffic, but I think we can make that assumption.
And then the other assumption we need to make is that if a bunch of cars pass in one hour that doesn't mean that fewer cars will pass in the next. That in no way does the number of cars that pass in one period affect or correlate or somehow influence the number of cars that pass in the next.
That they're really independent. Given that, we can then at least try using the skills we have to model out some type of a distribution. The first thing you do, and I'd recommend doing this for any distribution, is to estimate the mean. Let's sit out on that curb and measure what this variable is over a bunch of hours and then average it up, and that's going to be a pretty good estimator for the actual mean of our population. Or, since it's a random variable, the expected value of this random variable. Let's say you do that and your best estimate of the expected value of this random variable is-- I'll use the letter lambda. You know, this could be 9 cars per hour, or it could be 9.3 cars per hour. You sat out there over hundreds of hours and you just counted the number of cars each hour and you averaged them all up. You said, on average, there are 9.3 cars per hour and you feel that's a pretty good estimate. So that's what you have there.
And let's see what we could do. We know the binomial distribution. The binomial distribution tells us that the expected value of a random variable is equal to the number of trials that that random variable's kind of composed of, times the probability of success in each trial. In the previous videos we were counting the number of heads in a series of coin tosses, so this would be the number of coin tosses, times the probability of heads on each toss. This is what we did with the binomial distribution. So maybe we can model our traffic situation as something similar. This is the number of cars that pass in an hour. So maybe we could say lambda cars per hour is equal to-- I don't know. Let's make each experiment, or each toss of the coin, equal to whether a car passes in a given minute. There are 60 minutes per hour, so there would be 60 trials. And then the probability that we have success in each of those trials, if we modeled this as a binomial distribution, would be lambda over 60 cars per minute. And this would be a probability.
This would be n, and this would be the probability, if we said that this is a binomial distribution. And this probably wouldn't be that bad of an approximation. If you then said, oh, this is a binomial distribution, then the probability that our random variable equals some given value k, you know, the probability that exactly 3 cars pass in a given hour, would be equal to n choose k, where n would be 60 and k would be, say, 3 cars, times the probability of success, the probability that a car passes in any given minute. So it'd be lambda over 60 to the number of successes we need, so to the kth power, times the probability of no success, that no car passes, which is 1 minus lambda over 60, to the n minus k power. If we have k successes we have to have 60 minus k failures. There are 60 minus k minutes where no car passed. This actually wouldn't be that bad of an approximation, where you have 60 intervals and you say this is a binomial distribution. And you'd probably get reasonable results.
But there's a core issue here. In this model where we model it as a binomial distribution, what happens if more than one car passes in an hour? Or rather, more than one car passes in a minute? The way we have it right now, we call it a success if one car passes in a minute. And if you're kind of counting, it counts as one success even if 5 cars pass in that minute. So you say, oh, OK Sal, I know the solution there. I just have to get more granular. Instead of dividing it into minutes, why don't I divide it into seconds? So instead of 60 intervals I'll do 3,600 intervals. So the probability of k successful seconds, a success being a second where a car is passing at that moment, out of 3,600 seconds, is 3,600 choose k, times the probability that a car passes in any given second, which is the expected number of cars in an hour divided by the number of seconds in an hour, lambda over 3,600, to the kth power. We're going to have k successes.
And these are the failures: the probability of a failure is 1 minus lambda over 3,600, and you're going to have 3,600 minus k failures. And this would be an even better approximation. This actually would not be so bad, but still, you have this situation where 2 cars can come within half a second of each other. And you say, oh, OK Sal, I see the pattern here. We just have to get more and more granular. We have to just make this number larger and larger and larger. And your intuition is correct. And if you do that you'll end up getting the Poisson distribution. And this is really interesting because a lot of times people give you the formula for the Poisson distribution and you can kind of just plug in the numbers and use it. But it's neat to know that it really is just the binomial distribution taken to this limit, and the binomial distribution really did come from kind of the common sense of flipping coins. That's where everything is coming from.
But before we prove that as we take the limit as this number right here, the number of intervals, approaches infinity, this becomes the Poisson distribution, I'm going to make sure we have a couple of mathematical tools in our belt. So the first is something that you're probably reasonably familiar with by now, but I just want to make sure: the limit as x approaches infinity of 1 plus a/x to the x power is equal to e to the a. And now just to prove this to you, let's make a little substitution here. Let's say that 1 over n is equal to a over x. And then x would be equal to na: x times 1 is equal to n times a. And so as x approaches infinity, what does n approach? Well, n is x divided by a. So n would also approach infinity.
So this thing would be the same thing as, just making our substitution, the limit as n approaches infinity of 1 plus 1/n, since I substituted 1/n for a/x, raised to the power n times a, since x is, by this substitution, n times a. And this is going to be the same thing as the limit as n approaches infinity of 1 plus 1/n to the n, all of that to the a. And since there's no n out here we could just take the limit of this inner part and then take that to the a power. So that's going to be equal to the limit as n approaches infinity of 1 plus 1/n to the nth power, with that whole limit raised to the a. And this is our definition, or one of the ways to get to e, if you watched the videos on compound interest and all that. This is how we got to e. And if you tried it out on your calculator, just try larger and larger n's here and you'll get e. This inner part is equal to e, and we raised it to the a power, so it's equal to e to the a. So hopefully you're pretty satisfied that this limit is equal to e to the a.
And then one other tool I want in our belt, and I'll probably actually do the proof in the next video. The other tool is to recognize that x factorial over x minus k factorial is equal to x times x minus 1 times x minus 2, all the way down to times x minus k plus 1. And we've done this a lot of times, but this is the most abstract we've ever written it. And just so you know, there'll be exactly k terms here. First term, second term, third term, all the way down, and this is the kth term. And this is important to our derivation of the Poisson distribution. But just to make this concrete with real numbers, if I had 7 factorial over 7 minus 2 factorial, that's equal to 7 times 6 times 5 times 4 times 3 times 2 times 1, over-- 7 minus 2, this is 5. So it's over 5 times 4 times 3 times 2 times 1. These cancel out and you just have 7 times 6. And so the first term is 7 and then the last term is 7 minus 2 plus 1, which is 6. In this example, k was 2 and you had exactly 2 terms.
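Both tools can be checked numerically, in the spirit of trying larger and larger values on a calculator. The following is a minimal Python sketch. The function names `compound_limit` and `falling_product` and the sample value a = 2 are illustrative assumptions, not something from the video.

```python
import math


def compound_limit(a: float, x: float) -> float:
    """Evaluate (1 + a/x)^x, which should approach e^a as x grows."""
    return (1 + a / x) ** x


def falling_product(x: int, k: int) -> int:
    """Multiply x * (x - 1) * ... * (x - k + 1), which has exactly k factors."""
    result = 1
    for i in range(k):
        result *= x - i
    return result


a = 2.0  # hypothetical value chosen only for illustration
for x in (10, 1_000, 100_000, 10_000_000):
    print(x, compound_limit(a, x), "target:", math.exp(a))

# The 7!/(7 - 2)! example from the transcript: two factors, 7 * 6.
assert falling_product(7, 2) == math.factorial(7) // math.factorial(7 - 2) == 42
```

The printed values should creep toward e^a as x grows, and the final line confirms that the factorial ratio collapses to a product of exactly k terms.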
So once we know those two things we're now ready to derive the Poisson distribution and I'll do that in the next video. See you soon.
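Ahead of that derivation, the claim that finer and finer intervals approach the Poisson distribution can be previewed numerically. This is a rough Python sketch, not the proof promised for the next video. It assumes lambda = 9 cars per hour and k = 3 cars, takes the Poisson formula as quoted in the comments (e to the negative lambda, times lambda to the k, over k factorial), and uses illustrative function names. The interval counts are 60 (minutes), 3,600 (seconds), and 216,000 (sixtieths of a second, an extra step added here for illustration).

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes in n independent trials with success chance p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 9.0  # assumed average number of cars per hour
k = 3      # probability that exactly 3 cars pass in the hour

target = poisson_pmf(k, lam)
for n in (60, 3_600, 216_000):
    p = lam / n  # chance of a car in each small interval
    approx = binomial_pmf(k, n, p)
    print(f"n = {n:>7}: binomial = {approx:.6f}, gap to Poisson = {abs(approx - target):.2e}")

print(f"Poisson value: {target:.6f}")
```

The gap should shrink as n grows, which is the pattern the transcript describes when it says to keep making the number of intervals larger and larger.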