Example: Analyzing distribution of sum of two normally distributed random variables

Finding the probability that the total of some random variables exceeds an amount by understanding the distribution of the sum of normally distributed variables.

Want to join the conversation?

  • makvik
    So, I tried solving this problem on my own. I figured out that 25 liters is 2 standard deviations away from the mean. Using the 68, 95, 99.7 rule, I calculated that the chance that Shinji uses between 15 and 25 liters of fuel is 95 percent. So there is a 5 percent chance that he uses either between 0 and 15 liters or more than 25 liters. I divided 5% by 2 because I am only interested in more than 25 liters, and got 2.5%, which is close to what Sal got but wrong. Can someone explain the flaw in my logic?
    (21 votes)
  • Kariman Fawzi
    What if the two variables aren't independent? What would the variance of the sum be?
    (4 votes)
  • ju lee
    Why can we add variances, but can't add standard deviations?
    (3 votes)
  • rdeyke
    If the amount of fuel he uses follows a normal distribution, wouldn't there be a small but positive chance that he uses a negative amount of fuel, since the normal distribution extends to infinity in both directions?
    (3 votes)
    • daniella
      Yes, theoretically, a normal distribution extends infinitely in both directions, which implies a nonzero probability of negative values for any variable that follows such a distribution. In practical scenarios like fuel consumption, negative values don't make sense, so this is a case where the theoretical model doesn't perfectly match the real-world phenomenon. Here, zero liters is more than 6 standard deviations below the mean for the trip to work (10 / 1.5) and 5 standard deviations below the mean for the trip home (10 / 2), so the probability the model assigns to negative fuel usage is small enough to be treated as effectively zero.
      (1 vote)
  • Ganesh Prasanna
    In the video, you find the probability for values within two standard deviations of the mean, so it covers fuel amounts ranging from (mean − 2 sd) to (mean + 2 sd). Subtracting the probability from the table from 1 then gives P(fuel consumed < 15 L) + P(fuel consumed > 25 L). What we want is just P(fuel consumed > 25 L). So shouldn't you divide the final answer by two?
    (2 votes)
    • Yuya Fujikawa
      Actually, Sal is right. We are used to the 68-95-99.7 rule, which tells us the percentage that lies within ±1 sd, ±2 sd, and so on, so it leaves area on both sides. But that is not what the z-table gives us. It gives the percentage that lies below whatever z-score you look up. So when you subtract that value (0.9772) from 1, you are left with only the small area on the right-hand side.
      (3 votes)
  • Mez Cooper
    In the video, he says you can't just add the standard deviations.

    Basically, you have to sum the two variances and then take the square root to get the standard deviation he uses later on.
    (2 votes)
    • daniella
      Correct, when dealing with independent random variables, you can't directly add their standard deviations to find the combined standard deviation. Instead, you add the variances (which are the squares of the standard deviations) of these variables and then take the square root of this sum to get the standard deviation of the sum of the variables. This method correctly aggregates the spread of the total distribution resulting from the sum of two independent distributions.
      (1 vote)
  • Jimmy Zsan
    In the video at 4:56, the first row of the table goes from Z to .09. What do these numbers indicate?
    (1 vote)
    • daniella
      A z-score represents how many standard deviations a value is from the mean. In a z-table, the first column gives the z-score up to its first decimal place, and the first row (.00 through .09) gives the second decimal place. To look up a z-score of 2.00, which is two standard deviations above the mean, you go to the row labeled 2.0 and the column labeled .00. The entry there, .9772, is the cumulative probability associated with this z-score: 97.72% of a standard normal distribution lies below this value.
      (1 vote)
  • Bryan
    Is there a proof anywhere on KA or anywhere else for
    E(A+B) = E(A) + E(B)
    and
    Var(A+B) = Var(A) + Var(B)
    for independent variables
    and that summing two normally distributed variables gives you another normal distribution? Could someone link me to some good sources to find answers for these? Even if I don't understand them yet, at least I'll have a source to look forward to once I get better at math.
    (1 vote)
  • Mananisawsome
    Are there any formulas for the CDF of the normal distribution out there?
    (1 vote)
    • daniella
      The CDF of a normal distribution provides the probability that a variable takes a value less than or equal to a particular value. The formula for the CDF of a normal distribution isn't simple to write down because it involves an integral that doesn't have a straightforward elementary function as its solution. However, it is typically represented as:

      $$\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$$

      where Φ(x) is the CDF at a point x, μ is the mean, σ is the standard deviation, and erf is the error function, which is a special function that cannot be expressed in terms of simple algebraic operations. In practical applications, the CDF of the normal distribution is often looked up in tables or computed using software or calculators that have the function pre-programmed.
      (1 vote)
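
The erf-based CDF formula discussed in the last reply can be tried out directly. Below is a minimal sketch using only Python's standard `math` module; the function name `normal_cdf` and the variable names are illustrative choices, not anything from the video. It also bears on the first question in the thread: the "95%" in the 68-95-99.7 rule is a rounded figure, and the exact coverage within two standard deviations is closer to 95.45%, which is why each tail comes out near 2.28% rather than 2.5%.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Probability that the round trip needs at most 25 liters (mean 20, sd 2.5)
below = normal_cdf(25, mu=20, sigma=2.5)

# Exact coverage within +/- 2 standard deviations of a standard normal
within_2sd = normal_cdf(2) - normal_cdf(-2)

# Area in the upper tail only
upper_tail = 1 - normal_cdf(2)

print(round(below, 4))       # 0.9772
print(round(within_2sd, 4))  # 0.9545
print(round(upper_tail, 4))  # 0.0228
```

The first printed value matches the z-table entry for 2.00, so the table lookup and the formula agree.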

Video transcript

- [Instructor] Shinji commutes to work and he worries about running out of fuel. The amount of fuel he uses follows a normal distribution for each part of his commute, but the amount of fuel he uses on the way home varies more. The amounts of fuel he uses for each part of the commute are also independent of each other. Here are summary statistics for the amount of fuel Shinji uses for each part of his commute. So when he goes to work he uses a mean of 10 liters of fuel, with a standard deviation of 1.5 liters. And on the way home, he also has a mean of 10 liters, but there is more variation. There is more spread. He has a standard deviation of two liters. Suppose that Shinji has 25 liters of fuel in his tank and he intends to drive to work and back home. What is the probability that Shinji runs out of fuel?
All right, this is really interesting. We have the distributions for the amount of fuel he uses to work and to home, and they say that these are normal distributions. They say that right over here, follows a normal distribution. But here we're talking about the total amount of fuel he needs to go to work and to go home. So what we wanna do is come up with a total distribution, to work and back, I guess you could say. We could call this work plus home.
If you have two independent random variables that can be described by normal distributions and you were to define a new random variable as their sum, the distribution of that new random variable will still be a normal distribution and its mean will be the sum of the means of those other random variables. So the mean here, I'll say the mean of work plus home, is going to be equal to 20 liters. He will use a mean of 20 liters in the roundtrip. Now for the standard deviation of work plus home, you can't just add the standard deviations going and coming back.
But because the amount of fuel going to work and the amount of fuel coming home are independent random variables, because they are independent of each other, we can add the variances. And only because they are independent can we add the variances. So what you can say is that the variance of the combined trip is equal to the variance of going to work plus the variance of going home. So what's the variance of going to work? Well, the standard deviation is 1.5, so this will be 1.5 squared. And what's the variance coming home? Well, this is going to be two squared. So this is 2.25 plus four, which is equal to 6.25. So the variance on the roundtrip is equal to 6.25. If I were to take the square root of that, I get the standard deviation of the roundtrip, which is equal to 2.5. We can now describe the normal distribution of the roundtrip and use that to answer the question.
So we have this normal distribution that might look something like this. We know its mean is 20 liters. So this is 20 liters. And we want to know what is the probability that Shinji runs out of fuel. Well, to run out of fuel, he would need to require more than 25 liters of fuel. So if 25 liters of fuel is right over here, so this is 25 liters of fuel, the scenario where Shinji runs out of fuel is right over here, this is where he needs more than 25 liters. He actually has 25 liters in his tank.
So how do we figure out that area right over there? Well, we could use a z-table. We could say how many standard deviations above the mean is 25 liters? Well, it is five liters above the mean, so let me write this down. So the Z here, the Z is equal to 25 minus the mean, minus 20, divided by the standard deviation for, I guess you could say, this combined normal distribution, which is 2.5. Five divided by 2.5 is two, so this is two standard deviations above the mean, or a z-score of plus two. So if we look at a z-table and we look exactly two standard deviations above the mean, that will give us this area, the cumulative area below two standard deviations above the mean.
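
The arithmetic up to this point can be sketched in a few lines of Python. This is only an illustration of the steps described in the transcript; the helper name `combine_independent` and the variable names are made up for the example, and the calculation is valid only under the independence assumption stated in the problem.

```python
import math

# Summary statistics from the problem, in liters
mean_work, sd_work = 10.0, 1.5
mean_home, sd_home = 10.0, 2.0
tank = 25.0

def combine_independent(mean_a, sd_a, mean_b, sd_b):
    """Mean and standard deviation of A + B, assuming A and B are independent."""
    mean_sum = mean_a + mean_b
    var_sum = sd_a ** 2 + sd_b ** 2  # variances add; standard deviations do not
    return mean_sum, math.sqrt(var_sum)

mean_total, sd_total = combine_independent(mean_work, sd_work, mean_home, sd_home)

# z-score of the tank capacity under the round-trip distribution
z = (tank - mean_total) / sd_total

print(mean_total, sd_total, z)  # 20.0 2.5 2.0
print(sd_work + sd_home)        # 3.5, the incorrect "just add them" value
```

Simply adding the standard deviations would give 3.5 liters instead of 2.5, which would make 25 liters look closer to the mean than it really is.
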
And then if we subtract that from one, we will get the area that we care about. So let's get our z-table out. We care about a z-score of exactly two, so 2.00 is right over here, 0.9772. So that tells us that this area right over here is 0.9772, and so that blue area, the probability that Shinji runs out of fuel, is going to be one minus 0.9772, and what is that going to be equal to? Let's see, this is going to be equal to 0.0228. Did I do that right? I think I did that right. Yes, 0.0228 is the probability that Shinji runs out of fuel. If you want to think of it as a percent, there is a 2.28% chance that he runs out of fuel.
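
As a cross-check on the z-table lookup, the same answer can be reproduced with `statistics.NormalDist` from Python's standard library (available in Python 3.8 and later). This is a sketch, not the method used in the video; the variable names are illustrative. Adding two `NormalDist` objects treats them as independent, which matches the assumption in the problem.

```python
from statistics import NormalDist

# Fuel used on each leg of the commute, in liters
work = NormalDist(mu=10, sigma=1.5)
home = NormalDist(mu=10, sigma=2)

# Sum of two independent normal variables: means add, variances add
round_trip = work + home

# Shinji runs out of fuel if the round trip needs more than 25 liters
p_run_out = 1 - round_trip.cdf(25)

print(round_trip.mean, round_trip.stdev)  # 20.0 2.5
print(round(p_run_out, 4))                # 0.0228
```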