© 2024 Khan Academy

# Bernoulli distribution mean and variance formulas

Sal continues on from the previous video to derive the mean and variance formulas for the Bernoulli distribution. Created by Sal Khan.

## Want to join the conversation?

- If 0 & 1 are arbitrary, why can't we take -1 & 1 instead? Wouldn't that result in a completely different set of formulas? (29 votes)
- Let's say we were using 0 & 1 with p = 0.6, so 1-p = 0.4.

In this case µ = p = 0.6.

The distance from 0 to 0.6 is 0.6, and the total distance from 0 to 1 is 1, so µ lies 60% of the way from 0 to 1.

Now let's say we were using -1 & 1 with p = 0.6 and 1-p = 0.4.

In this case, µ = (1-p)*(-1) + p*1 = 2p - 1 = 2*0.6 - 1 = 0.2.

The distance from -1 to 0.2 is |-1 - 0.2| = 1.2. The total distance from -1 to 1 is 2.

Note that 1.2/2 = 0.6, meaning that µ still lies 60% of the way from -1 to 1. (27 votes)
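
The comparison in this reply can be checked numerically. The following is a minimal sketch, not part of the video: the helper name `two_point_moments` is made up for illustration, and it assumes a variable that takes the value `high` with probability p and `low` otherwise.

```python
def two_point_moments(low, high, p):
    """Mean and variance of a variable equal to `high` with probability p, else `low`.

    Hypothetical helper for illustration; not a library function.
    """
    mean = (1 - p) * low + p * high
    variance = (1 - p) * (low - mean) ** 2 + p * (high - mean) ** 2
    return mean, variance


p = 0.6
mean01, var01 = two_point_moments(0, 1, p)     # the usual 0/1 coding
mean_pm, var_pm = two_point_moments(-1, 1, p)  # the -1/1 coding from the question

# Fraction of the way from the low value to the high value where the mean sits.
frac01 = (mean01 - 0) / (1 - 0)
frac_pm = (mean_pm - (-1)) / (1 - (-1))

print(mean01, var01)
print(mean_pm, var_pm)
print(frac01, frac_pm)
```

With p = 0.6 the means come out as 0.6 and 0.2 and the variances as 0.24 and 0.96, up to floating-point rounding. The formulas do change with the coding (the mean becomes 2p - 1 and the variance 4p(1 - p)), but in both cases the mean sits 60% of the way from the low value to the high value.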

- How do we know which outcome should be 0 and which should be 1? In the 0.4 and 0.6 example, if we set the 0.6 outcome as 0 and the 0.4 outcome as 1, the mean would be 0.4 rather than 0.6. How do we select which becomes 0 and which becomes 1? (6 votes)
- There's a problem with using Sal's simplified form for variance: sigma^2 = p(1 - p). It doesn't take into account the loss of a degree of freedom when calculating the sample variance s^2.

In the next video for example, if you used the p(1 - p) formula to calculate s^2 you would get 24.51/100 = 0.2451 rather than the correct answer of 24.51/99 = 0.2476 as is shown in the video. Does this mean that the simplified formula should only be used when calculating POPULATION mean and not SAMPLE mean? (3 votes)
- No, the formulas µ = p and σ² = p(1 - p) are exact derivations for the Bernoulli distribution. Similarly, when we get to the Binomial distribution and see µ = np and σ² = np(1 - p), these are exact for the Binomial distribution.

In practice, if we're going to make much use of these values, we will be doing an approximation of some sort anyway (e.g., assuming something follows a Normal distribution), so whether we divide by n or n-1, and which might be proper, isn't really a concern here. (5 votes)
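
The two numbers quoted in the question differ only in the divisor. As a rough illustration, the sketch below assumes a sample of 100 observations with 43 ones and 57 zeros. That composition is an assumption, chosen because it reproduces the 24.51 in the question; it is not stated in this thread.

```python
# Hypothetical sample (assumed for illustration): 43 ones and 57 zeros.
sample = [1] * 43 + [0] * 57
n = len(sample)
p_hat = sum(sample) / n  # sample proportion of ones

# Sum of squared distances from the sample mean.
sum_sq = sum((x - p_hat) ** 2 for x in sample)

var_div_n = sum_sq / n                # same value as p_hat * (1 - p_hat)
var_div_n_minus_1 = sum_sq / (n - 1)  # sample variance with the n - 1 divisor

print(sum_sq, var_div_n, var_div_n_minus_1)
```

Dividing by n gives exactly the p(1 - p) form applied to the sample proportion, while dividing by n - 1 gives the slightly larger value from the question. The distribution formula σ² = p(1 - p) itself involves no sample, so no divisor appears in it.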

- We are calculating mu = (1-p)*0 + p*1. That simplifies to p.

Why didn't we calculate mu = p*0 + (1-p)*1, which equates to 1-p?

I am assuming that those 0 and 1 which we are multiplying with are purely arbitrary. (3 votes)
- You could calculate mu with either equation. It depends on what the probability (p) stands for. When Sal calculated mu, p was the probability of a 1. In your second equation you are using p as the probability of a 0.

So if we use the values that Sal used in the previous video (probability of a 1 = ps = 0.6 and probability of a 0 = pf = 0.4), then:

mu = (1-ps)*0 + ps*1 = (1-0.6)*0 + 0.6*1 = 0.6

mu = pf*0 + (1-pf)*1 = 0.4*0 + (1-0.4)*1 = 0.6 (2 votes)

- How come when finding the mean you do not have to put the whole equation over 2? Don't you have to divide by the number of terms? (3 votes)
- From what I could get, I think it is because the outcomes are not actual numbers, they're not strictly numerical, so we can't add them and then divide by the number of observations. For example, when flipping a coin 5 times, the outcome could be "HHTTT", so these aren't numbers we can add and then divide by 5, but we can explain it using a percentage. For example, if we consider tails (T) a successful outcome, then we could say that we had 60% successes (3/5 = 0.6). Actually, if you analyse what a percentage is (number of something divided by the total), we can realize that dividing by the total is the same as dividing by the number of terms. :)

If it was random numbers, for example, "10, 3, 7, 2, 4", then it would be okay to find the mean ( (10+3+7+2+4)/5 ). In the case of "HHTTT", it seems logical to explain it using a percentage. (1 vote)

- At 2:33, I don't understand why he writes the expected value as m rather than E(X). Is it the same thing or not? (2 votes)
- He wrote mu (µ); it just looked like an M. (2 votes)

- In tossing a biased coin once, where heads is twice as likely to occur as tails, let x be the number of heads. Find the moment generating function of x, and hence the mean and variance of x. (2 votes)
- How do you know which one to define as 0 and which one to define as 1? (2 votes)
- The choice of which outcome to define as 0 and which as 1 is arbitrary and depends on how you define success and failure in your problem context. In the context of the Bernoulli distribution, one outcome is typically defined as the "success" (which could be an event occurring, a favorable outcome, etc.), and the other as "failure" (which could be the absence of the event, an unfavorable outcome, etc.). It's important to be consistent in your definitions and interpretations throughout your analysis.(1 vote)

- I'm assuming that the mean is always the higher of the two probabilities. (If I labeled 0.40 as p and 0.60 as 1-p, the mean would be 1-p.)

So technically, it's not that the mean is p; the mean is the higher value, which in this example happens to be p.

Someone please correct me if I'm wrong. (1 vote)
- The mean of the Bernoulli distribution is p, and it depends on what you are measuring with this p, not on which value is higher. In a Bernoulli distribution you want to measure the probability of some "success" (it can be anything: heads on coin flips, 6s on dice rolls and so on), and you define the probability of this "success" as p, so logically the probability of "failure" is 1-p. This probability p can be very small, but the mean (your measure of central tendency of successes) will still be equal to p. I hope this can help! (2 votes)

- What am I missing? If you calculate the mean, surely you divide by two, as there are two elements here? I.e., ((1-p) x 0 + p x 1), all divided by two. (1 vote)
- In the case of a Bernoulli distribution, where we have two possible outcomes (0 and 1), we don't need to divide by 2 when calculating the mean. The mean is the probability-weighted sum of the possible values, and it does not depend on how many distinct values there are. So, it's the probability of the "success" outcome (p) times 1 plus the probability of the "failure" outcome (1 − p) times 0, which simplifies to p. (1 vote)
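
One way to see why no division by 2 is needed: the probabilities already play the role of "count divided by total". The sketch below is only an illustration under assumed numbers (p = 0.6 and a made-up list of ten observations in which 1 appears six times).

```python
p = 0.6

# Probability-weighted sum of the two possible values, as in the video.
weighted_mean = (1 - p) * 0 + p * 1

# Ten equally weighted observations in which 1 appears 60% of the time.
# Here we divide by the number of observations, not by the number of distinct values.
observations = [1] * 6 + [0] * 4
arithmetic_mean = sum(observations) / len(observations)

# Averaging the two possible values ignores how often each one occurs.
naive_average = (0 + 1) / 2

print(weighted_mean, arithmetic_mean, naive_average)
```

The weighted sum and the ordinary average over the observations agree at 0.6, while the naive average of the two possible values is 0.5 regardless of p.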

## Video transcript

In the last video we figured
out the mean, variance and standard deviation for our
Bernoulli Distribution with specific numbers. What I want to do in this video
is to generalize it. To figure out really the
formulas for the mean and the variance of a Bernoulli
Distribution if we don't have the actual numbers. If we just know that the
probability of success is p and the probability of failure
is 1 minus p. So let's look at this, let's
look at a population where the probability of success-- we'll
define success as 1-- as having a probability of p, and
the probability of failure, the probability of failure
is 1 minus p. Whatever this might be. And obviously, if you add these
two up, if you view them as percentages, these are
going to add up to 100%. Or if you add up these
two values, they are going to add to 1. And that needs to be the case
because these are the only two possibilities that can occur. If this is 60% chance of success
there has to be a 40% chance of failure. 70% chance of success, 30%
chance of failure. Now with this definition of
this-- and this is the most general definition of a
Bernoulli Distribution. It's really exactly what we did
in the last video, I now want to calculate the expected
value, which is the same thing as the mean of this
distribution, and I also want to calculate the variance, which
is the same thing as the expected squared distance of
a value from the mean. So let's do that. So what is the mean over here? What is going to be the mean? Well that's just the probability
weighted sum of the values that this
could take on. So there is a 1 minus p
probability that we get failure, that we get 0. So there's 1 minus
p probability of getting 0, so times 0. And then there is a p
probability of getting 1, plus p times 1. Well this is pretty
easy to calculate. 0 times anything is 0. So that cancels out. And then p times 1 is
just going to be p. So pretty straightforward. The mean, the expected value
of this distribution, is p. And p might be here
or something. So once again it's a value that
you cannot actually take on in this distribution,
which is interesting. But it is the expected value. Now what is going to
be the variance? What is the variance of
this distribution? Remember, that is the weighted
sum of the squared distances from the mean. Now what's the probability
that we get a 0? We already figured that out. There's a 1 minus p probability
that we get a 0. So that is the probability
part. And what is the squared distance
from 0 to our mean? Well the squared distance from
0 to our mean-- let me write it over here-- it's going to be
0, that's the value we're taking on-- let me do that in
blue since I already wrote the 0-- 0 minus our mean-- let
me do this in a new color-- minus our mean. That's too similar
to that orange. Let me do the mean in white. 0 minus our mean, which is p
plus the probability that we get a 1, which is just p-- this
is the squared distance, let me be very careful. It's the probability weighted
sum of the squared distances from the mean. Now what's the distance-- now
we've got a 1-- and what's the difference between
1 and the mean? It's 1 minus our mean, which
is going to be p over here. And we're going to want to
square this as well. This right here is going
to be the variance. Now let's actually
work this out. So this is going to be
equal to 1 minus p. Now 0 minus p is going
to be negative p. If you square it you're just
going to get p squared. So it's going to be p squared. Then plus p times-- what's
the square of 1 minus p? The square of 1 minus p is going to be
1 squared, which is just 1, minus 2 times the
product of 1 and p. So this is going to be minus
2p right over here. And then plus negative
p, squared. So plus p squared
just like that. And now let's multiply
everything out. This is going to be, this term
right over here is going to be p squared minus p
to the third. And then this term over here,
this whole thing over here, is going to be plus
p times 1 is p. p times negative 2p is
negative 2p squared. And then p times p squared
is p to the third. Now we can simplify these. p to the third cancels out
with p to the third. And then we have p squared
minus 2p squared. So this right here becomes,
you have this p right over here, so this is equal to p. And then when you add p squared
to negative 2p squared you're left with negative p
squared. So this is p minus p squared. And if you want to factor a p
out of this, this is going to be equal to p times, if you take
p divided by p you get a 1, p squared divided by p is p. So p times 1 minus p, which is
a pretty neat, clean formula. So our variance is p
times 1 minus p. And if we want to take it to the
next level and figure out the standard deviation, the
standard deviation is just the square root of the variance,
which is equal to the square root of p times 1 minus p. And we could even verify that
this actually works for the example that we did up here. Our mean is p, the probability
of success. We see that indeed it
was, it was 0.6. And we know that our variance is
essentially the probability of success times the probability
of failure. That's our variance
right over there. The probability of success
in this example was 0.6, probability of failure
was 0.4. You multiply the two, you get
0.24, which is exactly what we got in the last example. And if you take its square
root for the standard deviation, which is what we
do right here, it's 0.49. So hopefully you found that
helpful, and we're going to build on this later on in some
of our inferential statistics.
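
The derivation spoken in the transcript can be written out compactly. The notation below (X for the Bernoulli variable, µ for the mean, σ² for the variance) is a conventional choice assumed here for readability; the steps follow the order used in the video.

```latex
\begin{aligned}
\mu = E[X] &= (1-p)\cdot 0 + p\cdot 1 = p \\[4pt]
\sigma^2 &= (1-p)\,(0-p)^2 + p\,(1-p)^2 \\
         &= (1-p)\,p^2 + p\,(1 - 2p + p^2) \\
         &= p^2 - p^3 + p - 2p^2 + p^3 \\
         &= p - p^2 = p\,(1-p) \\[4pt]
\sigma &= \sqrt{p\,(1-p)}
\end{aligned}
```

For the example with p = 0.6, this gives µ = 0.6, σ² = 0.6 × 0.4 = 0.24, and σ = √0.24 ≈ 0.49, matching the values quoted at the end of the transcript.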