Bernoulli distribution mean and variance formulas

Sal continues on from the previous video to derive the mean and variance formulas for the Bernoulli distribution. Created by Sal Khan.

Want to join the conversation?

  • Mayank
    If 0 and 1 are arbitrary, why can't we use -1 and 1 instead? Wouldn't that result in a completely different set of formulas?
    (29 votes)
    • Sevology
      Let's say we were using 0 & 1 with p = 0.6, then 1-p = 0.4.
      In this case µ = p = 0.6.
      The distance from 0 to 0.6 is 0.6 and the total distance from 0 to 1 is 1, so µ lies 60% of the way from 0 to 1.

      Now let's say we were using -1 & 1 with p = 0.6 and 1-p = 0.4.
      In this case, µ = (1-p)*(-1) + p*1 = 2p - 1 = 2*0.6 - 1 = 0.2.
      The distance from -1 to 0.2 is |-1 - 0.2| = 1.2. The total distance from -1 to 1 is 2.
      Note that 1.2/2 = 0.6, meaning that µ still lies 60% of the way from -1 to 1.
      (27 votes)
  • Claudia
    How do we know which variable should be 0 and which should be 1? In the .4 and .6 example, if we set .6 as 0 and .4 as 1, the mean would be .4 rather than .6. How do we select which becomes 0 and which becomes 1?
    (6 votes)
  • maxwell.mca
    There's a problem with using Sal's simplified form for variance, sigma^2 = p(1 - p): it doesn't take into account the loss of a degree of freedom when calculating the sample variance s^2.

    In the next video, for example, if you used the p(1 - p) formula to calculate s^2 you would get 24.51/100 = 0.2451 rather than the correct answer of 24.51/99 = 0.2476 shown in the video. Does this mean that the simplified formula should only be used when calculating the POPULATION variance and not the SAMPLE variance?
    (3 votes)
    • Dr C
      No, the formulas µ = p and σ² = p(1 - p) are exact results for the Bernoulli distribution. Similarly, when we get to the binomial distribution and see µ = np and σ² = np(1 - p), these are exact for the binomial distribution.

      In practice, if we're going to make much use of these values, we will be doing an approximation of some sort anyway (e.g., assuming something follows a Normal distribution), so whether or not we're dividing by n or n-1, and what might be proper, isn't really a concern here.
      (5 votes)
  • niket kumar
    We are calculating mu = (1-p)*0 + p*1. That simplifies to p.

    Why didn't we calculate mu = p*0 + (1-p)*1, which equates to 1-p?

    I am assuming that those 0 and 1 which we are multiplying with are purely arbitrary.
    (3 votes)
    • Ray
      You could calculate mu with either equation. It depends on what the probability (p) stands for. When Sal calculated mu, p was the probability of a 1. In your 2nd equation you are using p as the probability of a 0.
      So if we use the values that Sal used in the previous video.
      (Probability of a 1 = ps = 0.6 and probability of a 0 = pf = 0.4) then...
      mu = (1-ps)*0 + ps*1 = (1-.6)*0 + .6*1 = 0.6
      mu = pf*0 + (1-pf)*1 = .4*0 + (1-0.4)*1 = 0.6
      (2 votes)
  • lareibstein
    How come, when finding the mean, you do not have to put the whole equation over 2? Don't you have to divide by the number of terms?
    (3 votes)
    • Max Well Elias
      From what I could get, I think it is because the outcomes are not strictly numerical, so we can't add them and then divide by the number of observations. For example, when flipping a coin 5 times, the outcome could be "HHTTT". These aren't numbers we can add and then divide by 5, but we can describe the result as a percentage: if we consider tails (T) a successful outcome, then we could say that we had 60% successes (3/5 = 0.6). Actually, if you analyse what a percentage is (the number of something divided by the total), you can see that dividing by the total is the same as dividing by the number of terms. :)

      If it was random numbers, for example, "10, 3, 7, 2, 4", then it would be okay to find the mean ( (10+3+7+2+4)/5 ). In the case of "HHTTT", it seems logical to explain it using a percentage.
      (1 vote)
  • Ondřej Paška
    I don't understand why he writes the expected value as m rather than E(X). Is it the same thing or not?
    (2 votes)
  • Heizzy Gateun Head
    In tossing a biased coin once, where a head is twice as likely to occur as a tail, let X be the number of heads. Find the moment generating function of X, and hence the mean and variance of X.
    (2 votes)
  • rachel jalas
    How do you know which one to define as 0 and which one to define as 1?
    (2 votes)
    • daniella
      The choice of which outcome to define as 0 and which as 1 is arbitrary and depends on how you define success and failure in your problem context. In the context of the Bernoulli distribution, one outcome is typically defined as the "success" (which could be an event occurring, a favorable outcome, etc.), and the other as "failure" (which could be the absence of the event, an unfavorable outcome, etc.). It's important to be consistent in your definitions and interpretations throughout your analysis.
      (1 vote)
  • Brendan
    I'm assuming that the mean is always the higher of the two probabilities. (If I labeled 0.40 as p and 0.60 as 1-p, the mean would be 1-p.)
    So technically, it's not that the mean is p; the mean is the higher value, which in this example happens to be p.
    Someone please correct me if I'm wrong.
    (1 vote)
    • Sergey Korotkov
      The mean of the Bernoulli distribution is p, and it depends on what you are measuring with this p, not on which value is the highest. In a Bernoulli distribution you want to measure the probability of some "success" (it can be anything: heads on coin flips, 6s on dice rolls and so on). You define the probability of this "success" as p, so logically the probability of "failure" is 1-p. This probability p can be very small, but the mean (your measure of central tendency of successes) will still be equal to p. I hope this helps!
      (2 votes)
  • James Birkin
    What am I missing? If you calculate the mean, surely you divide by two as there are two elements here? I.e. (1-p) x 0 + p x 1, all divided by two.
    (1 vote)
    • daniella
      In the case of a Bernoulli distribution, where we have two possible outcomes (0 and 1), we don't need to divide by 2 when calculating the mean. The mean is the probability-weighted sum of the possible values, not their sum divided by the number of values. So, it's the probability of the "success" outcome (p) times 1 plus the probability of the "failure" outcome (1 − p) times 0, which simplifies to p.
      (1 vote)
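
Two questions in the thread above can be checked numerically: what happens if the outcomes are labeled -1 and 1 (or the labels are swapped), and why p(1 - p) differs from a sample variance that divides by n - 1. The following is a minimal Python sketch, not something taken from the video. The helper name `two_point_moments` is hypothetical, and the sample of 43 ones and 57 zeros is an assumption chosen only because it reproduces the 24.51 figure quoted in the thread (100 × 0.43 × 0.57 = 24.51).

```python
from statistics import pvariance, variance


def two_point_moments(low, high, p):
    """Mean and variance of a variable that equals `high` with probability p
    and `low` with probability 1 - p, computed straight from the definitions."""
    mean = (1 - p) * low + p * high
    var = (1 - p) * (low - mean) ** 2 + p * (high - mean) ** 2
    return mean, var


# Labels 0 and 1: the video's formulas, mean = p and variance = p(1 - p).
mean01, var01 = two_point_moments(0, 1, 0.6)

# Labels -1 and 1: the mean becomes 2p - 1 and the variance 4p(1 - p).
mean_pm, var_pm = two_point_moments(-1, 1, 0.6)

# Swapping which outcome is called 1 (p = 0.4 instead of 0.6) changes the
# mean to 0.4 but leaves the variance unchanged.
mean_swapped, var_swapped = two_point_moments(0, 1, 0.4)

# Assumed sample (not from the source): 43 ones and 57 zeros, n = 100.
sample = [1] * 43 + [0] * 57
population_style = pvariance(sample)  # divides by n, equals p_hat * (1 - p_hat)
sample_style = variance(sample)       # divides by n - 1

print("0/1 labels:    ", mean01, var01)
print("-1/1 labels:   ", mean_pm, var_pm)
print("swapped labels:", mean_swapped, var_swapped)
print("divide by n:   ", population_style)
print("divide by n-1: ", sample_style)
```

Under these assumptions, the formulas change with the labels but the underlying picture does not: the mean always sits a fraction p of the way from the low value to the high value. The gap between the last two printed numbers comes only from dividing by n versus n - 1, which matches Dr C's point that p(1 - p) is exact for the distribution itself.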

Video transcript

In the last video we figured out the mean, variance and standard deviation for our Bernoulli distribution with specific numbers. What I want to do in this video is to generalize it: to figure out the formulas for the mean and the variance of a Bernoulli distribution if we don't have the actual numbers, if we just know that the probability of success is p and the probability of failure is 1 minus p.

So let's look at a population where the probability of success (we'll define success as 1) is p, and the probability of failure is 1 minus p, whatever this might be. And obviously, if you view these as percentages, they are going to add up to 100%. Or if you add up these two values, they are going to add to 1. And that needs to be the case because these are the only two possibilities that can occur. If there is a 60% chance of success, there has to be a 40% chance of failure. 70% chance of success, 30% chance of failure.

Now with this definition (and this is the most general definition of a Bernoulli distribution; it's really exactly what we did in the last video), I want to calculate the expected value, which is the same thing as the mean of this distribution, and I also want to calculate the variance, which is the same thing as the expected squared distance of a value from the mean.

So let's do that. What is going to be the mean? Well, that's just the probability-weighted sum of the values that this could take on. There is a 1 minus p probability that we get failure, that we get 0, so that's 1 minus p times 0. And then there is a p probability of getting 1, so plus p times 1. Well, this is pretty easy to calculate. 0 times anything is 0, so that term drops out. And then p times 1 is just going to be p. So pretty straightforward. The mean, the expected value of this distribution, is p.
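
The probability-weighted sum described here can be written out as a small Python sketch. The function name `weighted_mean` and the choice p = 0.6 (the value from the previous video's example) are illustrative assumptions, not part of the video.

```python
def weighted_mean(values, probs):
    """Expected value: each value multiplied by its probability, then summed."""
    # The probabilities must cover every possible outcome, so they sum to 1.
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(v * q for v, q in zip(values, probs))


p = 0.6
# (1 - p) * 0 + p * 1, which simplifies to p.
bernoulli_mean = weighted_mean([0, 1], [1 - p, p])

# An unweighted average of the two labels ignores p entirely.
unweighted = (0 + 1) / 2

print(bernoulli_mean, unweighted)
```

The weights already sum to 1, which is why there is no extra division by the number of outcomes; the unweighted average would only agree with the mean when p is 0.5.
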
And p might be here or something. So once again it's a value that you cannot actually take on in this distribution, which is interesting. But it is the expected value.

Now what is going to be the variance of this distribution? Remember, that is the probability-weighted sum of the squared distances from the mean. Now what's the probability that we get a 0? We already figured that out. There's a 1 minus p probability that we get a 0. So that is the probability part. And what is the squared distance from 0 to our mean? Let me write it over here. It's going to be 0, that's the value we're taking on, minus our mean, which is p, and all of that squared. Then plus the probability that we get a 1, which is just p, times the squared distance (let me be very careful: it's the probability-weighted sum of the squared distances from the mean). Now we've got a 1, and what's the difference between 1 and the mean? It's 1 minus our mean, which is going to be p over here. And we're going to want to square this as well. This right here is going to be the variance.

Now let's actually work this out. The first term is going to be 1 minus p, times something. Now 0 minus p is going to be negative p. If you square it you're just going to get p squared. So it's 1 minus p, times p squared. Then plus p times... what's the quantity 1 minus p, squared? That is going to be 1 squared, which is just 1, minus 2 times the product of 1 and p, so minus 2p right over here, and then plus negative p squared, so plus p squared, just like that.

And now let's multiply everything out. This term right over here is going to be p squared minus p to the third. And then this whole thing over here is going to be: plus p times 1 is p.
p times negative 2p is negative 2p squared. And then p times p squared is p to the third. Now we can simplify these. Negative p to the third cancels out with p to the third. And then we have p squared minus 2p squared. So you have this p right over here, and when you add p squared to negative 2p squared you're left with negative p squared. So this is equal to p minus p squared. And if you want to factor a p out of this (p divided by p gives you a 1, p squared divided by p is p), this is going to be equal to p times 1 minus p, which is a pretty neat, clean formula. So our variance is p times 1 minus p.

And if we want to take it to the next level and figure out the standard deviation, the standard deviation is just the square root of the variance, which is equal to the square root of p times 1 minus p.

And we could even verify that this actually works for the example that we did up here. Our mean is p, the probability of success. We see that indeed it was 0.6. And we know that our variance is essentially the probability of success times the probability of failure. The probability of success in this example was 0.6, and the probability of failure was 0.4. You multiply the two, you get 0.24, which is exactly what we got in the last example. And if you take its square root for the standard deviation, which is what we do right here, it's 0.49.

So hopefully you found that helpful, and we're going to build on this later on in some of our inferential statistics.
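
Since the algebra is spoken rather than shown, here is one way to write the derivation out, as a sketch of the steps described in the transcript. The symbols μ and σ² for the mean and variance follow the usual convention and the notation used in the comments.

```latex
\begin{align*}
\mu      &= (1-p)\cdot 0 + p\cdot 1 = p \\
\sigma^2 &= (1-p)(0-p)^2 + p\,(1-p)^2 \\
         &= (1-p)\,p^2 + p\,(1 - 2p + p^2) \\
         &= p^2 - p^3 + p - 2p^2 + p^3 \\
         &= p - p^2 \\
         &= p(1-p) \\
\sigma   &= \sqrt{p(1-p)}
\end{align*}
```

With p = 0.6 this gives σ² = 0.6 × 0.4 = 0.24 and σ = √0.24 ≈ 0.49, the values quoted at the end of the transcript.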