## Statistics and probability

### Unit 9: Lesson 9

Poisson distribution

# Poisson process 1

Introduction to Poisson Processes and the Poisson Distribution. Created by Sal Khan.

## Want to join the conversation?

- What do limits have to do with any part of statistics? Isn't that calculus? I have an extremely hard time with this already; each time something like this is introduced it throws off the entire video for me. Not one thing has been said about limits or infinity in my statistics class. If someone can edify me; well, super. I do not understand how anyone can fluently understand, or speak this mathematical language.(28 votes)
- This isn't easy, you should have a solid base/knowledge of algebra and calculus before starting statistics =).

However, if you're studying for a test, you don't really have to understand where the Poisson distribution comes from, which is what Sal's explaining here. Just use the formulas if you're in a hurry (I hate saying that).(121 votes)

- Seriously, I just got lost at the introduction of limits and the evaluation of e. Could someone please explain what that has to do with the Poisson process and, if possible, in simple terms, what a Poisson process really is and why it is relevant?(20 votes)
- The normal distribution has a bell curve as its probability distribution. Many practical real-world measurements follow a bell curve. Averages of samples from any distribution can be approximated with a bell curve by the Central Limit Theorem. The Central Limit Theorem is very powerful, but there are certain limitations, and assumptions must be made to use it. It is used often in sampling theory and hypothesis testing.

The Poisson and Binomial distributions are discrete "counting" distributions. The regular probability distributions (not sampling distributions) are generally skewed, not symmetric like the normal distribution (which by the way is continuous, not discrete). The Binomial distribution assumes a predetermined number of trials, but the Poisson has no upper limit of possible successes. This is why limits are used to show the relationship between the two distributions.(36 votes)

- What are the prerequisites to studying more advanced statistics? I'm interested in learning statistics but I haven't done much study of calculus or any of the other higher mathematics.(7 votes)
- Well, I wouldn't say calculus is compulsory, but I really don't understand why you shouldn't do it. Calculus is such a beautiful and amazing form of mathematics that I'd recommend it straight after algebra.(21 votes)

- How is lambda/60 a probability?(8 votes)
- lambda = n.p where n is number of intervals and p is probability of a success.

Divide both sides by n

lambda/n = p(12 votes)
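
As a worked illustration of the step above, assume λ = 9 cars per hour (one of the example values mentioned in the video) and split the hour first into minutes and then into seconds:

```latex
\lambda = n\,p \;\Longrightarrow\; p = \frac{\lambda}{n}, \qquad
\lambda = 9,\ n = 60:\ p = \frac{9}{60} = 0.15, \qquad
\lambda = 9,\ n = 3600:\ p = \frac{9}{3600} = 0.0025
```

Note that λ/n can only serve as a probability when λ ≤ n, which holds in both cases here.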

- At 4:15, Sal says that P(X=k) is the probability of k cars passing in an hour. At 5:06, he says, "What happens if one car passes in an hour? Or more than one car passes in a minute?" These two statements confuse me.(9 votes)
- "In an hour" was a mis-statement that Sal corrects immediately by restating "in a minute".(0 votes)

- why DOES Poisson Dist., Binomial Dist. , and Normal Dist. act as a BELL CURVE? is it really that fascinating or there is some specific reason behind it?(2 votes)
- To go a bit further than the above answer, Poisson, Binomial, and Normal are all related (and all have continuous analogs, which are likewise related). Poisson is the limiting case of the binomial in which n (the number of trials) is very high and p (the probability of success on each trial) is very low.

While you should understand the proof of this in order to use the relationship, know that there are times you can use the binomial in place of the poisson, but the numbers can be very hard to deal with. As an example, try calculating a binomial distribution with p = .00001 and n = 2500. Mind you, this will require you to do 2500!, which is not very convenient. On the other hand, converting it into a Poisson problem makes it much more manageable.

The normal distribution on the other hand can be used with any sample mean and the Central Limit Theorem. It's all part of the awesome cycle of life. :)(12 votes)
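
A minimal sketch of the comparison described above, using the commenter's values p = .00001 and n = 2500. The function names are illustrative, not from the video. Python's `math.comb` avoids computing 2500! directly, so both sides can be evaluated and compared:

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 2500, 0.00001   # values from the comment above
lam = n * p            # matching Poisson mean, n * p = 0.025

# Print the two probabilities side by side for a few small counts.
for k in range(3):
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, lam))
```

With n this large and p this small, the two columns should agree to several decimal places.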

- I believe the answer to the calculation in the end is 0.004998097 or 0.5% (we want to check “2” with lambda 9).

If I try 9 with lambda 9, I get 13%. Isn’t this value too low assuming the expected value IS 9?(3 votes)
- Good question! It's easy to think that the probability of the mean value should be higher than 13%, but when there are lots and lots of possible values, then even the most likely value won't have a very high probability. For example, if I flip a coin 2 times, and I get exactly 1 heads, that's no surprise. But if I flip a coin 1000 times and get EXACTLY 500 heads, that's actually pretty amazing, even though 500 is definitely the most likely outcome (and the mean). So in this example, if you calculate the probabilities for every value from k = 0 to k = 20, you will see that the distribution DOES peak around k = 9, but it's just spread out a bit. The values near k = 9 are:

6 9.1%

7 11.7%

8 13.2%

9 13.2%

10 11.9%

11 9.7%

12 7.3%

So you see that if you have to divide up 100% into lots of possibilities, even the most likely one might not actually be very likely!(5 votes)
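
The table above can be reproduced with a short script. This is a sketch assuming λ = 9, as in the question, and the Poisson formula quoted elsewhere in this thread; `poisson_pmf` is an illustrative name:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 9
# Probabilities for every count from k = 0 to k = 20.
probs = {k: poisson_pmf(k, lam) for k in range(0, 21)}

# Show the values near the mean as percentages.
for k in range(6, 13):
    print(f"{k:2d}  {probs[k]:.1%}")
```

The same dictionary also gives the value for k = 2 from the question, `probs[2]`, which is about 0.005.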

- I am so lost, I don't understand anything at all(4 votes)
- The easiest way to do this is:

(e to the negative Lambda) multiplied by (Lambda to the k) divided by k factorial.

So, (e^-L)(L^k)/k!

It is really simple because there is a lot of repetition in the setup. e to the - Lambda, Lambda to the k, over k!; e, Lambda, Lambda, k, k(2 votes)
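
Written out in standard notation, with λ for Lambda, the formula reads as follows. The worked case uses λ = 9 and k = 2, the values from the question earlier in this thread:

```latex
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!},
\qquad
P(X = 2 \mid \lambda = 9) = \frac{9^{2} e^{-9}}{2!} = 40.5\, e^{-9} \approx 0.005
```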

- How can we model the number of events in a period of time with the Binomial Distribution?

Is it because there are two possible answers, yes or no, to whether a car passes in a given minute?

If yes, does that mean we can model everything in the universe with the Binomial distribution?(3 votes)
- The answer is "it depends". The Binomial Distribution is a very powerful and versatile tool in statistics, but it doesn't cover everything. There are a number of conditions that the binomial depends on, such as independence, that do not apply in all situations.

I guess a simple way of putting it is that a lot of distributions can be thought of as "like the Binomial, but for Special Case X".

I'm oversimplifying a lot, but there are a lot of cases in statistics where a distribution is derived from another distribution for specific cases, a classic example being the Normal distribution and Student's t. Yes, in large samples, the Normal and t distributions are nearly identical, but what if you don't have a large sample? Then t works better. The same logic applies to the Binomial, the Poisson, and the whole family of distributions derived from them.

The field of density functions in statistics is vast and varied, and Khan Academy, and indeed, any intro to statistics course you may take in college, will only be able to scratch the surface of this topic.

However, statistics is easier than you think, if you focus more on the utility and less on the math. In this age of technology, nobody will care if you can calculate a density function. Any computer can do that in milliseconds. What is far more important is whether you can explain what the function means, and why it is important. That's why I always tell my students that statistics is much more of a writing class than a math class. So don't be intimidated by the calculations, and focus on understanding the general idea. Good luck!(0 votes)

- Is there a specific video for e, or for the limit approaching infinity, that Sal mentions at 9:10?(3 votes)
- Check out the Precalculus section on exponential and logarithmic functions. Go to the videos on compound interest and study those. He explains how to derive e. It helps to use a calculator on your end to follow Sal's logic.(3 votes)

## Video transcript

Let's say you're some type of
traffic engineer and what you're trying to figure out is,
how many cars pass by a certain point on the street at
any given point in time? And you want to figure out
the probabilities that a hundred cars pass or 5
cars pass in a given hour. So a good place to start is
just to define a random variable that essentially
represents what you care about. So let's say the number of cars
that pass in some amount of time, let's say, in an hour. And your goal is to figure out
the probability distribution of this random variable and then
once you know the probability distribution then you can
figure out what's the probability that 100 cars pass
in an hour or the probability that no cars pass in an hour
and you'd be unstoppable. And just a little aside, just
to move forward with this video, there's two assumptions
we need to make because we're going to study the
Poisson distribution. And in order to study it
there's two assumptions we have to make. That any hour at this point
on the street is no different than any other hour. And we know that that's
probably false. During rush hour in a real
situation you probably would have more cars than
at another hour. And you know, if you wanted to
be more realistic maybe we do it in the day because in a day
any period of time-- actually, no. I shouldn't do a day. We have to assume that every
hour is completely just like any other hour and actually,
even within the hour there's really no differentiation from
one second to the other in terms of the probabilities
that a car arrives. That's a little bit of a
simplifying assumption that might not truly apply to
traffic, but I think we can make that assumption. And then the other assumption
we need to make is that if a bunch of cars pass in one hour
that doesn't mean that fewer cars will pass in the next. That in no way does the number
of cars that pass in one period affect or correlate or somehow
influence the number of cars that pass in the next. That they're really
independent. Given that, we can then at
least try using the skills we have to model out some
type of a distribution. The first thing you do and I'd
recommend doing this for any distribution is maybe we
can estimate the mean. Let's sit out on that curb and
measure what this variable is over a bunch of hours and then
average it up, and that's going to be a pretty good estimator
for the actual mean of our population. Or, since it's a random
variable, the expected value of this random variable. Let's say you do that and you
get your best estimate of the expected value of this random
variable is-- I'll use the letter lambda. You know, this could
be 9 cars per hour. You sat out there-- it could
be 9.3 cars per hour. You sat out there over hundreds
of hours and you just counted the number of cars each hour
and you averaged them all up. You said, on average, there are
9.3 cars per hour and you feel that's a pretty good estimate. So that's what you have there. And let's see what we could do. We know the binomial
distribution. The binomial distribution tells
us that the expected value of a random variable is equal to the
number of trials that that random variable's kind
of composed of, right? Before, in the previous videos
we were counting the number of heads in a coin toss. So this would be the number
of coin tosses, times the probability of success
over each toss. This is what we did with
the binomial distribution. So maybe we can model
our traffic situation as something similar. This is the number of cars
that pass in an hour. So maybe we could say lambda
cars per hour is equal to-- I don't know. Let's make each experiment or
each toss of the coin equal to whether a car passes
in a given minute. So there are 60 minutes
per hour, so there would be 60 trials. And then, the probability that
we have success in each of those trials, if we modeled
this as a binomial distribution would be lambda over
60 cars per minute. And this would be
a probability. This would be n, and this would
be the probability, if we said that this is a binomial
distribution. And this probably wouldn't be
that bad of an approximation. If you actually then said,
oh, this is a binomial distribution, so the
probability that our random variable equals some
given value, k. You know, the probability that
3 cars, exactly 3 cars pass in a given hour, would
then be equal to n. So n would be 60. Choose k, and you know,
I have 3 cars times the probability of success. So the probability that a
car passes in any minute. So it'd be lambda over
60 to the number of successes we need. So to the kth power, times the
probability of no success or that no cars pass,
to the n minus k. If we have k successes we have
to have 60 minus k failures. There are 60 minus k minutes
where no car passed. This actually wouldn't be that
bad of an approximation where you have 60 intervals and you
say this is a binomial distribution. And you'd probably get
reasonable results. But there's a core issue here. In this model where we model it
as a binomial distribution, what happens if more than
one car passes in an hour? Or more than one car
passes in a minute? The way we have it right now
we call it a success if one car passes in a minute. And if you're kind of counting
it counts as one success, even if 5 cars pass in that minute. So you say, oh, OK Sal, I
know the solution there. I just have to get
more granular. Instead of dividing it
into minutes why don't I divide it into seconds? So the probability that I have
k successes-- instead of 60 intervals I'll do
3,600 intervals. So the probability of k
successful seconds, so a second where a car is passing at that
moment out of 3,600 seconds. So that's 3,600 choose k, times
the probability that a car passes in any given second. That's the expected number of
cars in an hour divided by the number of seconds in an hour. We're going to
have k successes. And these are the failures,
the probability of a failure and you're going to have
3,600 minus k failures. And this would be even a
better approximation. This actually would not be so
bad, but still, you have this situation where 2 cars
can come within a half a second of each other. And you say, oh, OK Sal,
I see the pattern here. We just have to get more
and more granular. We have to just make
this number larger and larger and larger. And your intuition is correct. And if you do that you'll
end up getting the Poisson distribution. And this is really interesting
because a lot of times people give you the formula for the
Poisson distribution and you can kind of just plug in
the numbers and use it. But it's neat to know that it
really is just the binomial distribution and the binomial
distribution really did come from kind of the common
sense of flipping coins. That's where everything
is coming from. But before we kind of prove
that if we take the limit as-- let me change colors. Before we prove that as we
take the limit as this number right here, the number of
intervals approaches infinity that this becomes the
Poisson distribution. I'm going to make sure we have
a couple of mathematical tools in our belt. So the first is something that
you're probably reasonably familiar with by now, but I
just want to make sure that the limit as x approaches infinity
of 1 plus a/x to the x power is equal to e to the
ax-- no sorry. Is equal to e to the a and now
just to prove this to you, let's make a little
substitution here. Let's say that n is equal
to-- let me say 1 over n is equal to a over x. And then
x would be equal to na. x times 1 is equal
to n times a. And so the limit as x
approaches infinity, what does a approach? a is-- sorry. As x approaches infinity
what does n approach? Well n is x divided by a. So n would also
approach infinity. So this thing would be the same
thing as just making our substitution the limit as n
approaches infinity of 1 plus-- a/x, I made the
substitution as 1/n. And x is, by this
substitution, n times a. And this is going to be the
same thing as the limit as n approaches infinity of 1 plus
1/n to the n, all of that to the a. And since there's no n out here
we could just take the limit of this and then take
that to the a power. So that's going to be equal to
the limit as n approaches infinity of 1 plus 1/n to the
nth power, all of that to the a. And this is our definition, or
one of the ways to get to e if you'd watch the videos on
compound interest and all that. This is how we got to e. And if you tried it out on your
calculator, just try larger and larger n's here
and you'll get e. This inner part is equal to e,
and we raised it to the a power, so it's equal
to e to the a. So hopefully you're pretty
satisfied that this limit is equal to e to the a. And then one other tool kit I
want in our belt, and I'll probably actually do the
proof in the next video. The other tool kit is to
recognize that x factorial over x minus k factorial is equal to
x times x minus 1 times x minus 2, all the way down
to times x minus k plus 1. And we've done this a lot of
times, but this is the most abstract we've ever written it. I can give you a couple of--
and just so you know, there'll be exactly k terms here. 1, 2, 3-- So first term, second
term, third term, all the way, and this is the kth term. And this is important to
our derivation of the Poisson distribution. But just to make this in real
numbers, if I had 7 factorial over 7 minus 2 factorial,
that's equal to 7 times 6 times 5 times 4 times
3 times 2 times 1. Over 2 times-- no sorry. 7 minus 2, this is 5. So it's over 5 times 4
times 3 times 2 times 1. These cancel out and you
just have 7 times 6. And so it's 7 and then
the last term is 7 minus 2 plus 1, which is 6. In this example, k was 2 and
you had exactly 2 terms. So once we know those two
things we're now ready to derive the Poisson
distribution and I'll do that in the next video. See you soon.