Probability density functions

Probability density functions for continuous random variables. Created by Sal Khan.

Want to join the conversation?

  • Jenny
    Sal says that the two statements P(|Y-2|<.1) and P(1.9<Y<2.1) are the same. Why?
    (26 votes)
    • kowodo
      |Y-2|<.1 is the same as 1.9<Y<2.1 because:
      solving "|Y-2|<.1" for "(Y-2) >= 0" gives "Y-2<.1", which gives "Y < 2.1"
      solving "|Y-2|<.1" for "(Y-2) < 0" gives "-Y+2<.1", which gives "-Y<-1.9", which gives "Y > 1.9"
      Search for the lecture on absolute value for more explanation.
      (33 votes)
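The case analysis above can also be written as one chain of equivalences. This is only a sketch of the usual absolute-value rule, using the tolerance 0.1 from the question:

```latex
\begin{align*}
|Y - 2| < 0.1
  &\iff -0.1 < Y - 2 < 0.1 \\
  &\iff 1.9 < Y < 2.1
\end{align*}
```

Both statements describe the same event, so P(|Y-2|<.1) and P(1.9<Y<2.1) are the same number.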
  • Oo Torsten Oo
    I have a hard time wrapping my head around infinity (probably not the first one.)
    I get the concept of continuity and that the probability at a specific point is zero. I am just curious... if the area under the curve is 1 but the curve goes on to +Inf (or in case of a normal distribution even to -Inf and +Inf) then it feels like you could add a little area to the right whenever you want to - so going on forever even with smaller and smaller probability that gets added to the area. So how can something be fixed to 1 when the area itself is not really fixed?
    I guess it's the same paradox as the finger of a spinning wheel that has 0 probability of stopping at any particular point but eventually does stop somewhere....
    (11 votes)
    • Dave McIntosh
      In answer to your question about how the total area can be fixed to 1 even though the curve may continue to infinity, try thinking of it this way: start with 0.5 and keep adding half and half again, that is, 0.5+0.25+0.125+0.0625 +.... (keep going forever). However far you go, you will not get an infinite number; you will get a number that keeps approaching but not quite reaching 1, that is, 'tending to' 1. (You may prefer to think of this as 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ..... + 1/2^n where n tends to infinity.) So, contrary to our intuitive first impression, it is actually possible to add increasingly small amounts infinitely and yet never be in 'danger' of exceeding a certain finite total. In this case it's because you are only ever adding on half of what you would actually need to reach 1 - like being 1m away from a wall and walking half a metre, then a quarter of a metre, etc...... - but I'm sure there are other cases!
      (17 votes)
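The halving argument above can be checked numerically. The sketch below is only an illustration; the helper name `partial_sum` is made up for this example. It adds the first n terms of 1/2 + 1/4 + 1/8 + ... and shows that the total creeps toward 1 without passing it.

```python
def partial_sum(n_terms):
    """Return 1/2 + 1/4 + ... + 1/2**n_terms."""
    return sum(1 / 2**k for k in range(1, n_terms + 1))


# Each extra term closes half of the remaining gap to 1.
for n in (1, 2, 5, 10, 50):
    total = partial_sum(n)
    print(f"{n:>2} terms: total = {total!r}, gap to 1 = {1 - total!r}")
```

The gap after n terms is exactly 1/2^n, so it shrinks forever but the running total never exceeds 1.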
  • tarjeism
    This might be stupid, but instead of asking for precisely P(Y = 2), couldn't we ask for the limit as Y approaches 2?
    (16 votes)
    • Mewho
      The question of "limit as Y approaches 2" is not at all stupid: it is exactly the point! You just have to be careful with the placement of the word "precisely": Prob(Y = precisely 2) = limit of Prob(Y = 2 +/- a bit) as "a bit" approaches zero, which = 0. On the other hand, the limit of Prob(Y = 2 +/- a bit) = 1 as "a bit" approaches "whatever" (i.e. as "a bit" approaches infinity).
      (4 votes)
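Both limits in the reply above can be seen numerically. The video never gives a formula for the rain curve, so the sketch below assumes a hypothetical triangular density on 0 to 4 inches that peaks at 0.5 when x = 2; the names `density` and `prob_between` are invented for this illustration.

```python
def density(x):
    """Hypothetical triangular density on [0, 4] peaking at 0.5 when x = 2."""
    if 0 <= x <= 2:
        return x / 4
    if 2 < x <= 4:
        return (4 - x) / 4
    return 0.0


def prob_between(a, b, steps=10_000):
    """Midpoint-rule approximation of the area under density between a and b."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


# Shrinking "a bit" drives the probability toward 0;
# widening it to cover the whole range drives it toward 1.
for a_bit in (2.0, 1.0, 0.1, 0.01, 0.001):
    print(a_bit, prob_between(2 - a_bit, 2 + a_bit))
```

With this assumed density, "a bit" only has to reach 2 to capture all of the probability; for a density with infinite tails it would have to grow without bound.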
  • Fabrizio Alejandro Ramos Roa
    Is there any continuation to this with multidimensional density functions? More continuous density functions, or expected values from continuous density functions? I could not find any video.
    (6 votes)
    • Ian Pulizzotto
      Nice question! Yes, there are joint probability density functions of more than one variable! If X_1, X_2, ... , X_n are continuous random variables, then their joint density function is denoted by f(x_1, x_2, ... , x_n).

      The joint cumulative distribution function of X_1, X_2, ... , X_n is given by
      F(x_1, x_2, ... , x_n) = P(X_1 <= x_1 and X_2 <= x_2 and ... and X_n <= x_n)
      = integral -infinity to x_1 integral -infinity to x_2 ... integral -infinity to x_n of f(y_1, y_2, ... , y_n) dy_n ... dy_2 dy_1.

      The joint probability density function, f(x_1, x_2, ... , x_n), can be obtained from the joint cumulative distribution function by the formula

      f(x_1, x_2, ... , x_n) = n-fold mixed partial derivative of F(x_1, x_2, ... , x_n) with respect to x_1, x_2, ... , x_n.

      If A is a subset of R^n (i.e. n-dimensional space), then the probability that (X_1, X_2, ... , X_n) is in A is given by

      P((X_1, X_2, ... , X_n) is in A) =
      n-fold integral over (x_1, x_2, ... , x_n) in A of f(x_1, x_2, ... , x_n) dV,

      where dV is the n-dimensional infinitesimal volume element.

      For a function g of these n random variables, the expectation of g is given by
      E(g(X_1, X_2, ... , X_n)) = integral -infinity to infinity integral -infinity to infinity ... integral -infinity to infinity of f(x_1, x_2, ... , x_n) g(x_1, x_2, ... , x_n) dx_n ... dx_2 dx_1.

      Have a blessed, wonderful day!
      (4 votes)
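To make the formulas above concrete for n = 2, here is a rough numerical sketch. The joint density f(x, y) = x + y on the unit square is an assumption chosen only because it is easy to check by hand, and `joint_density` and `double_integral` are hypothetical helper names.

```python
def joint_density(x, y):
    """Hypothetical joint density f(x, y) = x + y on the unit square."""
    if 0 <= x <= 1 and 0 <= y <= 1:
        return x + y
    return 0.0


def double_integral(func, x_range, y_range, steps=200):
    """Midpoint-rule approximation of a double integral over a rectangle."""
    (x0, x1), (y0, y1) = x_range, y_range
    dx = (x1 - x0) / steps
    dy = (y1 - y0) / steps
    total = 0.0
    for i in range(steps):
        x = x0 + (i + 0.5) * dx
        for j in range(steps):
            y = y0 + (j + 0.5) * dy
            total += func(x, y)
    return total * dx * dy


# Total probability over the whole square.
total_mass = double_integral(joint_density, (0, 1), (0, 1))
# Joint CDF value F(0.5, 0.5) = P(X <= 0.5 and Y <= 0.5).
cdf_half = double_integral(joint_density, (0, 0.5), (0, 0.5))
# Expectation E(g(X, Y)) for g(x, y) = x * y.
expect_xy = double_integral(
    lambda x, y: x * y * joint_density(x, y), (0, 1), (0, 1)
)
print(total_mass, cdf_half, expect_xy)
```

Working the same integrals by hand gives 1, 1/8 and 1/3, which is what the approximations should land close to.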
  • samhita
    The probability of 2 inches of rain can't be zero, can it? I get that we can't be certain, but a probability of 0 would imply that we never ever get 2 inches of rain, and we couldn't be sure of that. I would really like to get this point cleared up.
    (3 votes)
    • Matthew Daly
      The probability of exactly two inches of rain is zero. But we can think about the probability of getting between 1.9 and 2.1 inches of rain and the probability of getting between 1.99 and 2.01 inches of rain and so on, because all of those probabilities with actual intervals will be non-zero. So if you consider the ratio of those probabilities to the length of the intervals and take the limit of that ratio as the intervals become very very small, you will get, in some sense, the relative likelihood that you will get "around" two inches of rain, which is what the continuous density function is trying to measure.
      (7 votes)
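The ratio described above (probability of an interval divided by its length) can be watched as the interval shrinks. This sketch assumes a hypothetical triangular rain density on 0 to 4 inches with height 0.5 at 2 inches, and uses its cumulative distribution function in closed form; `cdf` and `average_density` are names invented for the example.

```python
def cdf(x):
    """CDF of a hypothetical triangular density on [0, 4] peaking at x = 2."""
    if x <= 0:
        return 0.0
    if x <= 2:
        return x * x / 8
    if x <= 4:
        return 1 - (4 - x) ** 2 / 8
    return 1.0


def average_density(center, half_width):
    """Probability of the interval around center, divided by its length."""
    prob = cdf(center + half_width) - cdf(center - half_width)
    return prob / (2 * half_width)


# The interval probabilities shrink toward 0, but the ratio settles
# near the height of the density curve at 2 inches.
for half_width in (0.1, 0.01, 0.001):
    print(half_width, average_density(2, half_width))
```

Under this assumption the ratio tends to 0.5, the height of the curve at 2, even though the probability of exactly 2 inches is 0.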
  • Glen
    I don't understand how you are supposed to draw a continuous probability graph properly, as it is just a probability of 0 along the whole graph.
    (3 votes)
    • Dr C
      When we plot a continuous distribution, we are actually plotting the density. The probability for the continuous distribution is defined as the integral of the density function over some range (adding up the area below the curve).

      The integral at a point is zero, but the density is non-zero.
      (7 votes)
  • Rachel
    Does Sal explain the area under the curve, and the fact that it's equal to 1, in any of his videos?
    (2 votes)
  • djpailo
    In the video on Discrete and Continuous random variables, Sal said you can have an infinite number of values for the discrete case, so long as they are countable and listable. But in this video, he says you can only have a finite number (around 20-30 seconds in). So which is it?
    (3 votes)
    • Dr C
      There's a pop-up text in the bottom-right where they mention that discrete RVs can take on countably infinite number of values. And that's the correct answer - discrete variables can have an infinite number of values, as long as it's 'countably infinite'.

      For example, there's the Poisson distribution, it's used to model things that have to do with a number of events per some unit, e.g. "How many texts do you receive per day?" This is how many events (texts) by some unit (per day). You could have 0 texts, 1 text, 2, 3, 4, etc etc. There's really no upper bound on the number of texts you can receive, so it can go up to infinity. But we can't have the in-betweens, there's no way to get 3.5 texts.

      Since we can go up to infinity, but we're restricted to whole numbers, this is known as countably infinite. Not all discrete distributions go up to infinity, but some do.
      (7 votes)
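A quick way to see that a countably infinite list of values can still carry a total probability of 1 is to add up Poisson probabilities. In the sketch below the average of 3 texts per day is a made-up number, and `poisson_pmf` is just an illustrative helper built from the standard Poisson formula.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events when the average count is rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


rate = 3.0  # hypothetical average number of texts per day

# Whole-number counts only: 0 texts, 1 text, 2 texts, ...
for k in range(6):
    print(k, poisson_pmf(k, rate))

# The list of possible counts never ends, yet the probabilities
# of the first 51 counts already add up to almost exactly 1.
partial_total = sum(poisson_pmf(k, rate) for k in range(51))
print(partial_total)
```

There is no entry for 3.5 texts because the formula is only defined on whole numbers, which is what makes the variable discrete.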
  • Shio
    I am not able to fully comprehend the probability density function. So what does the probability density function itself measure? What exactly does the y-axis of this function represent if it is not the probability?
    (3 votes)
    • Dr C
      The pdf and the y-value are talking about density.

      It's fairly math-heavy to explain, but the intuitive idea is that with discrete variables, the height of the bars of the probability distribution function can be thought of as actual probability, which is equivalent to the density.

      With continuous variables, we can't do this, and the reason is that there are SO MANY possible values that the variable can take on. Sal sort of explains this in the video. For instance, in the video, the density at x=2 is roughly 0.5, right? Well, if we don't move too far away from 2, then the height at all the points around 2 will also have density of about 0.5. So 1.9 and 2.1 have, say, density of 0.45. And then 1.95 and 2.05 might have density of 0.48. Are you seeing the problem? We have 5 numbers with various densities, if we add them up, we get 0.5+0.45+0.45+0.48+0.48 = 2.36. So already with just a few numbers, this simply cannot be probability!

      You might think that we could just make the graph shorter, but it wouldn't work, because a line is infinitely thin, there will always be just far too many possible outcomes, and we'll always wind up with the total probability going over 1.

      So we change to thinking about the probability density. What we want is for the entire area beneath the line to be 1. Or in calculus terms, we want our pdf to integrate to 1. The density function allows us to do this.
      (4 votes)
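The difference between adding up heights and adding up areas can be tried out directly. The sketch below assumes a hypothetical triangular density on 0 to 4 that peaks at 0.5 (the video gives no formula), and `height_sums` is a name made up for the example. It samples the curve at evenly spaced points, then compares the raw sum of heights with the sum of height times spacing.

```python
def density(x):
    """Hypothetical triangular density on [0, 4] peaking at 0.5 when x = 2."""
    if 0 <= x <= 2:
        return x / 4
    if 2 < x <= 4:
        return (4 - x) / 4
    return 0.0


def height_sums(n_points):
    """Return (sum of heights, sum of height * spacing) over [0, 4]."""
    spacing = 4 / n_points
    heights = [density((i + 0.5) * spacing) for i in range(n_points)]
    return sum(heights), sum(heights) * spacing


# More sample points make the raw sum of heights grow without limit,
# while height * spacing (thin strips of area) stays at about 1.
for n in (10, 100, 1000, 10000):
    raw, area = height_sums(n)
    print(n, raw, area)
```

Only the area version behaves like a probability, which is why the y-axis has to be read as density rather than as probability.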
  • Ci Qian W
    I am clear that the area under a probability density function must be 1, but can the value on the y-axis be higher than 1? For example, if the range of possible x values is very small, e.g. 0.1 < x < 0.2, then the value on the y-axis of the pdf has to be high for the total area to be 1. But a probability density > 1 seems to make that particular x-value look like it will happen for sure.
    (3 votes)
    • Dr C
      Yes, the value of the PDF can exceed 1. For instance, there is the Beta distribution for which this is quite common (http://en.wikipedia.org/wiki/Beta_distribution).

      It's important to note, however, that the value of the PDF is not the probability of that particular X-value, but the density. For continuous random variables, the probability at a given point is actually 0, even though the density may be much higher. To find a probability, we need to integrate over some interval, and when we do that, the length of the interval will cause the probability for a given interval to be less than 1.
      (4 votes)
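The range 0.1 < x < 0.2 from the question makes a handy test case. As an assumption for illustration, suppose X is spread evenly (uniformly) over that range; `uniform_density` and `uniform_prob` are hypothetical helper names. The density then has to be 10 everywhere on the range, yet no interval ever gets a probability above 1.

```python
LOW, HIGH = 0.1, 0.2  # range of possible x values taken from the question


def uniform_density(x):
    """Height of the flat density: 1 / (HIGH - LOW) inside the range."""
    return 1 / (HIGH - LOW) if LOW <= x <= HIGH else 0.0


def uniform_prob(a, b):
    """P(a < X < b): length of the overlap with [LOW, HIGH] times the height."""
    a, b = max(a, LOW), min(b, HIGH)
    return max(b - a, 0.0) * (1 / (HIGH - LOW))


print(uniform_density(0.15))     # height of the curve, well above 1
print(uniform_prob(0.12, 0.15))  # probability of a sub-interval
print(uniform_prob(0.0, 1.0))    # probability of the whole range
```

A tall density just means the probability is packed into a narrow range; multiplying by the short interval length brings the result back to 1 or less.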

Video transcript

In the last video, I introduced you to the notion of-- well, really we started with the random variable. And then we moved on to the two types of random variables. You had discrete, that took on a finite number of values. And these, I was going to say that they tend to be integers, but they don't always have to be integers. You have discrete, so finite meaning you can't have an infinite number of values for a discrete random variable. And then we have the continuous, which can take on an infinite number. And the example I gave for continuous is, let's say random variable x. And people do tend to use-- let me change it a little bit, just so you can see it can be something other than an x. Let's have the random variable capital Y. They do tend to be capital letters. Is equal to the exact amount of rain tomorrow. And I say rain because I'm in northern California. It's actually raining quite hard right now. We're short right now, so that's a positive. We've been having a drought, so that's a good thing. But the exact amount of rain tomorrow. And let's say I don't know what the actual probability distribution function for this is, but I'll draw one and then we'll interpret it. Just so you can kind of think about how you can think about continuous random variables. So let me draw a probability distribution, or they call it its probability density function. And we draw like this. And let's say that there is-- it looks something like this. Like that. All right, and then I don't know what this height is. So the x-axis here is the amount of rain. Where this is 0 inches, this is 1 inch, this is 2 inches, this is 3 inches, 4 inches. And then this is some height. Let's say it peaks out here at, I don't know, let's say this 0.5. So the way to think about it, if you were to look at this and I were to ask you, what is the probability that Y-- because that's our random variable-- that Y is exactly equal to 2 inches? That Y is exactly equal to two inches.
What's the probability of that happening? Well, based on how we thought about the probability distribution functions for the discrete random variable, you'd say OK, let's see. 2 inches, that's the case we care about right now. Let me go up here. You'd say it looks like it's about 0.5. And you'd say, I don't know, is it a 0.5 chance? And I would say no, it is not a 0.5 chance. And before we even think about how we would interpret it visually, let's just think about it logically. What is the probability that tomorrow we have exactly 2 inches of rain? Not 2.01 inches of rain, not 1.99 inches of rain. Not 1.99999 inches of rain, not 2.000001 inches of rain. Exactly 2 inches of rain. I mean, there's not a single extra atom, water molecule above the 2 inch mark. And not a single water molecule below the 2 inch mark. It's essentially 0, right? It might not be obvious to you, because you've probably heard, oh, we had 2 inches of rain last night. But think about it, exactly 2 inches, right? Normally if it's 2.01 people will say that's 2. But we're saying no, this does not count. It can't be 2 inches. We want exactly 2. 1.99 does not count. Normally our measurements, we don't even have tools that can tell us whether it is exactly 2 inches. No ruler you can even say is exactly 2 inches long. At some point, just the way we manufacture things, there's going to be an extra atom on it here or there. So the odds of actually anything being exactly a certain measurement to the exact infinite decimal point is actually 0. The way you would think about a continuous random variable, you could say what is the probability that Y is almost 2? So if we said that the absolute value of Y minus 2 is less than some tolerance? Is less than 0.1. And if that doesn't make sense to you, this is essentially just saying what is the probability that Y is greater than 1.9 and less than 2.1? These two statements are equivalent. I'll let you think about it a little bit.
But now this starts to make a little bit of sense. Now we have an interval here. So we want all Y's between 1.9 and 2.1. So we are now talking about this whole area. And area is key. So if you want to know the probability of this occurring, you actually want the area under this curve from this point to this point. And for those of you who have studied your calculus, that would essentially be the definite integral of this probability density function from this point to this point. So from-- let me see, I've run out of space down here. So let's say if this graph-- let me draw it in a different color. If this line was defined by, I'll call it f of x. I could call it p of x or something. The probability of this happening would be equal to the integral, for those of you who've studied calculus, from 1.9 to 2.1 of f of x dx. Assuming this is the x-axis. So it's a very important thing to realize. Because when a random variable can take on an infinite number of values, or it can take on any value between an interval, to get an exact value, to get exactly 1.999, the probability is actually 0. It's like asking you what is the area under a curve on just this line. Or even more specifically, it's like asking you what's the area of a line? An area of a line, if you were to just draw a line, you'd say well, area is height times base. Well the height has some dimension, but the base, what's the width of a line? As far as the way we've defined a line, a line has no width, and therefore no area. And it should make intuitive sense. That the probability of a very super-exact thing happening is pretty much 0. That you really have to say, OK, what's the probability that we'll get close to 2? And then you can define an area. And if you said oh, what's the probability that we get someplace between 1 and 3 inches of rain, then of course the probability is much higher. The probability is much higher. It would be all of this kind of stuff.
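Written out in symbols, the two ideas in this part of the video look roughly like this. Here f stands for the density curve Sal sketches; its actual formula is never given, so this is notation only, not a computed value.

```latex
\[
P(1.9 < Y < 2.1) = \int_{1.9}^{2.1} f(x)\,dx,
\qquad
P(Y = 2) = \int_{2}^{2} f(x)\,dx = 0 .
\]
```

The second integral is the "area of a line": the height f(2) may be about 0.5, but the width is zero, so the area is zero.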
You could also say what's the probability we have less than 0.1 of rain? Then you would go here and if this was 0.1, you would calculate this area. And you could say what's the probability that we have more than 4 inches of rain tomorrow? Then you would start here and you'd calculate the area in the curve all the way to infinity, if the curve has area all the way to infinity. And hopefully that's not an infinite number, right? Then your probability won't make any sense. But hopefully if you take this sum it comes to some number. And we'll say there's only a 10% chance that you have more than 4 inches tomorrow. And all of this should immediately lead to one light bulb in your head, is that the probability of all of the events that might occur can't be more than 100%. Right? All the events combined-- there's a probability of 1 that one of these events will occur. So essentially, the whole area under this curve has to be equal to 1. So if we took the integral of f of x from 0 to infinity, this thing, at least as I've drawn it, dx should be equal to 1. For those of you who've studied calculus. For those of you who haven't, an integral is just the area under a curve. And you can watch the calculus videos if you want to learn a little bit more about how to do them. And this also applies to the discrete probability distributions. Let me draw one. The sum of all of the probabilities have to be equal to 1. And that example with the dice-- or let's say, since it's faster to draw, the coin-- the two probabilities have to be equal to 1. So this is 1, 0, where x is equal to 1 if we're heads or 0 if we're tails. Each of these have to be 0.5. Or they don't have to be 0.5, but if one was 0.6, the other would have to be 0.4. They have to add to 1. If one of these was-- you can't have a 60% probability of getting a heads and then a 60% probability of getting a tails as well. 
Because then you would have essentially 120% probability of either of the outcomes happening, which makes no sense at all. So it's important to realize that a probability distribution function, in this case for a discrete random variable, they all have to add up to 1. So 0.5 plus 0.5. And in this case the area under the probability density function also has to be equal to 1. Anyway, I'm all out of time for now. In the next video I'll introduce you to the idea of an expected value. See you soon.