
# Mean and variance of Bernoulli distribution example

Sal calculates the mean and variance of a Bernoulli distribution (in this example the responses are either favorable or unfavorable). Created by Sal Khan.

## Want to join the conversation?

- At 4:04 Sal defines the variance as the probability-weighted sum of the squared distances from the mean, or the expected value of the squared distances from the mean. What is the relation of this formula to what we learned in earlier videos about calculating variance as the sum of the squared differences between each sample value and the sample mean, divided by (n-1)?(39 votes)
- You have all the right concepts in play, you just have to relate them. At the start of the video Sal remarks, "[Imagine] we can survey every member of the population." This indicates we are computing the population variance, not the sample variance. In a previous video he stated that the population variance is the sum of the squared distances from the mean divided by N, much like the population mean, which is simply the sum of all the values divided by N.

Now, if you remember back to the video on expected value, we can express the population mean not only with the traditional formula (sum / N), but also as the sum of each value multiplied by its frequency (also called the weighted sum). This frees us from having a fixed size for N, and we can take the expected value of an infinite set.

Essentially, the same process is happening here for the variance. Instead of dividing the squared distances by N to arrive at the variance, we multiply each one by its weight (i.e. frequency, i.e. probability) in the distribution. With this method we can calculate the variance of an infinite population.(75 votes)
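
The equivalence described above can be checked numerically. Here is a small sketch; the 10-person population is hypothetical, chosen to match the 40/60 split in the video:

```python
from collections import Counter

# Hypothetical 10-person population matching the video's 40/60 split:
# four "unfavorable" responses coded 0, six "favorable" responses coded 1.
population = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
N = len(population)

mean = sum(population) / N  # 0.6

# Population variance, "sum of squared distances divided by N" form:
var_divided = sum((x - mean) ** 2 for x in population) / N

# Probability-weighted form: sum of P(x) * (x - mean)^2 over distinct values.
counts = Counter(population)
var_weighted = sum((c / N) * (x - mean) ** 2 for x, c in counts.items())

print(var_divided, var_weighted)  # both 0.24, up to float rounding
```

Both forms give 0.24 here; the weighted form simply groups the identical squared distances by value before dividing by N.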

- (2:15-3:50) So if you assigned unfavorable as 1 and favorable as 0, you'd end up with a different mean...? How do you know what number to assign to each outcome?(25 votes)
- In fact, you could choose -1 for unfavorable and +1 for favorable. That way, a mean of 0 would represent a neutral overall favorability rating, a negative mean would indicate negative overall sentiment, and a positive mean would indicate positive overall sentiment. However, choosing 0 to represent one of the values simplifies the math.(16 votes)

- What if the population had a third choice? Let's say that part of the population didn't have a clear opinion about it and didn't vote. How would that affect the example mentioned above?(7 votes)
- That would not be a Bernoulli distribution. A Bernoulli distribution consists of only 2 options: failure or success.(18 votes)

- What is the main difference between the Bernoulli and binomial distributions? For the Bernoulli case, can I apply the binomial to it? I mean that for flipping a coin there are also 2 options, heads or tails, the same as for Bernoulli with 2 options: yes or no, right?(7 votes)
- A Bernoulli distribution is a Binomial distribution with just 1 trial.

Or, a Binomial distribution is the sum of _n_ independent Bernoulli trials with the same probability of success.(13 votes)
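
That relationship can be sketched with a quick simulation; the `bernoulli` and `binomial` helper functions below are illustrative, not from the video:

```python
import random

random.seed(0)  # reproducible illustration

def bernoulli(p):
    """One Bernoulli(p) trial: 1 (success) with probability p, else 0."""
    return 1 if random.random() < p else 0

def binomial(n, p):
    """One Binomial(n, p) draw: the sum of n independent Bernoulli(p) trials."""
    return sum(bernoulli(p) for _ in range(n))

# A Bernoulli(0.6) variable is just Binomial(1, 0.6):
draws = [binomial(1, 0.6) for _ in range(10_000)]
print(sum(draws) / len(draws))  # close to the mean, 0.6
```

With n = 1 every draw is 0 or 1, and the long-run average approaches p, which is exactly the Bernoulli mean computed in the video.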

- I thought the mean is a sum of numbers divided by the total number of data points. How can you use a mean that is not divided?(4 votes)
- When Sal says that 40% of the answers were unfavorable and 60% were favorable, that information is already calculated from the data points.

For example, suppose the population was 1000 people. Then to get 40% unfavorable, that means that 400 people answered unfavorable. Similarly, 600 people answered favorable. Then we could multiply 400*0 and add it to 600*1, then divide by 1000 to get 0.6.

If we know the percentage (or proportion) of the population in each category, that gives us enough information to calculate the mean even if we do not have access to the raw data. I can show you the algebra:

Let u be the number of people who answered unfavorable.

Let f be the number of people who answered favorable.

Let n be the number of people in the population.

We are given that u/n = 40% = 0.4 & f/n = 60% = 0.6

We calculate the mean:

mu = (u*0 + f*1)/n = (u*0)/n + (f*1)/n = (u/n)*0 + (f/n)*1 = 0.4 *0 + 0.6 * 1 = 0 + 0.6 = 0.6.(14 votes)

- So a Bernoulli distribution is just a situation where there are only 2 options? Like Yes and No or Success and Failure or Positive and Negative? And do they have to be opposites from each other necessarily? So like if the question was: do you like chocolate or vanilla ice cream better, would the responses follow a Bernoulli distribution by definition, or no?(3 votes)
- As long as people *had* to choose chocolate or vanilla, then that would be a Bernoulli distribution (if they were able to say "neither", that would be a third option, and it would not be Bernoulli).(8 votes)

- What happened to the (n-1) value in the denominator?(2 votes)
- You have to divide by (n-1) if you want to calculate the sample variance (n-1 is a better approximation than just dividing by n), but here it's the variance of the whole population.(9 votes)
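
The distinction can be illustrated numerically; the data set below is a made-up example, not from the video:

```python
# Made-up data set for illustration (not from the video).
data = [0, 0, 1, 1, 1]

n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)  # sum of squared distances

pop_var = ss / n           # divide by n: the data IS the entire population
sample_var = ss / (n - 1)  # divide by (n - 1): the data is a sample, correct the bias

print(pop_var, sample_var)  # 0.24 and 0.3, up to float rounding
```

Dividing by (n - 1) gives a slightly larger value, compensating for the fact that a sample mean underestimates spread around the true population mean.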

- How can you just decide to define u and f as 0 and 1? Why did you choose those numbers?(6 votes)
- Once you define one choice (favorable, in this case) as 1, the other must be 0 (unfavorable) by the definition of the Bernoulli distribution.

One of the conditions for a binomial distribution is that there must be 2 possible outcomes (success, failure).

You can treat the Bernoulli distribution as assigning specific numbers (1 and 0) to the two cases (success and failure) of the binomial distribution.(1 vote)

- If the mean and variance of a binomial distribution are 3 and 1.5 respectively, find the probability of (1) at least one success, (2) exactly 2 successes.(3 votes)
- Nice problem!

If n represents the number of trials and p represents the success probability on each trial, the mean and variance are np and np(1 - p), respectively.

Therefore, we have np = 3 and np(1 - p) = 1.5.

Dividing the second equation by the first equation yields 1 - p = 1.5/3 = 0.5.

So p = 1 - 0.5 = 0.5, and n = 3/p = 3/0.5 = 6.

P(at least one success) = 1 - P(no successes) = 1 - (1 - p)^n = 1 - (0.5)^6 = 0.984375.

P(exactly 2 successes) = (n choose 2) p^2 (1-p)^(n-2) = [(6*5)/(1*2)] (0.5)^2 (0.5)^4 = 0.234375.

Have a blessed, wonderful day!(3 votes)
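
The solution above can be verified numerically; this is a direct transcription of the steps, using Python's `math.comb` for the binomial coefficient:

```python
from math import comb

mean, var = 3, 1.5   # given: np = 3 and np(1 - p) = 1.5
q = var / mean       # dividing the equations: 1 - p = 0.5
p = 1 - q            # p = 0.5
n = round(mean / p)  # n = 6

p_at_least_one = 1 - (1 - p) ** n
p_exactly_two = comb(n, 2) * p ** 2 * (1 - p) ** (n - 2)

print(p_at_least_one)  # 0.984375
print(p_exactly_two)   # 0.234375
```

Both probabilities match the hand calculation above.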

- At 7:48, Sal says the distribution is skewed to the right. Isn't the distribution skewed left, because the tail is to the left of the mean?(3 votes)

## Video transcript

Let's say that I'm able to go
out and survey every single member of a population, which
we know is not normally practical, but I'm
able to do it. And I ask each of them, what do
you think of the president? And I ask them, and there's
only two options, they can either have an unfavorable
rating or they could have a favorable rating. And let's say after I survey
every single member of this population, 40% have an
unfavorable rating and 60% have a favorable rating. So if I were to draw the
probability distribution, and it's going to be a discrete one
because there's only two values that any person
can take on. They could either have an
unfavorable view or they could have a favorable view. And 40% have an unfavorable
view, and let me color code this a little bit. So this is the 40% right over
here, so 0.4 or maybe I'll just write 40% right
over there. And then 60% have a
favorable view. Let me color code this. 60% have a favorable view. And notice these two numbers
add up to 100% because everyone had to pick between
these two options. Now if I were to go and ask you
to pick a random member of that population and say what is
the expected favorability rating of that member,
what would it be? Or another way to think about it
is what is the mean of this distribution? And for a discrete distribution
like this, your mean, or your expected value,
is just going to be the probability weighted sum of the
different values that your distribution can take on. Now the way I've written it
right here, you can't take a probability weighted sum of u
and f-- you can't say 40% times u plus 60% times
f, you won't get any type of a number. So what we're going to do
is define u and f to be some type of value. So let's say that u
is 0 and f is 1. And now the notion of taking a
probability weighted sum makes some sense. So that mean, or you could say
the mean, I'll say the mean of this distribution it's going
to be 0.4-- that's this probability right here times 0
plus 0.6 times 1, which is going to be equal to-- this
is just going to be 0.6 times 1 is 0.6. So clearly, no individual can
take on the value of 0.6. No one can tell you I 60%
am favorable and 40% am unfavorable. Everyone has to pick either
favorable or unfavorable. So you will never actually find
someone who has a 0.6 favorability value. It'll either be a 1 or a 0. So this is an interesting case
where the mean or the expected value is not a value that
the distribution can actually take on. It's a value some place
over here that obviously cannot happen. But this is the mean, this
is the expected value. And the reason why that makes
sense is if you surveyed 100 people, you'd multiply 100 times
this number, you would expect 60 people to say yes,
or if you'd summed them all up, 60 would say yes, and
then 40 would say 0. You sum them all up, you would
get 60% saying yes, and that's exactly what our population
distribution told us. Now what is the variance? What is the variance of this
population right over here? So the variance-- let me write
it over here, let me pick a new color-- the variance is
just-- you could view it as the probability weighted sum of
the squared distances from the mean, or the expected value
of the squared distances from the mean. So what's that going to be? Well there's two different
values that anything can take on. You can either have a 0 or you
could either have a 1. The probability that you get a
0 is 0.4-- so there's a 0.4 probability that you get a 0. And if you get a 0 what's the
distance from 0 to the mean? The distance from 0 to the mean
is 0 minus 0.6, or I can even say 0.6 minus 0-- same
thing because we're going to square it-- 0 minus 0.6
squared-- remember, the variance is the weighted sum
of the squared distances. So this is the difference
between 0 and the mean. And then plus, there's a 0.6
chance that you get a 1. And the difference between
1 and 0.6, 1 and our mean, 0.6, is that. And then we are also going
to square this over here. Now what is this value
going to be? This is going to be 0.4 times
0.6 squared-- because 0 minus
0.6 is negative 0.6. If you square it you
get positive 0.36. So this value right here-- I'm
going to color code it. This value right here
is times 0.36. And then this value right
here-- let me do this in another-- so then we're going to
have plus 0.6 times 1 minus 0.6 squared. Now 1 minus 0.6 is 0.4. 0.4 squared is 0.16. So let me do this. So this value right here
is going to be 0.16. So let me get my calculator
out to actually calculate these values. So this is going to be 0.4 times
0.36, plus 0.6 times 0.16, which is equal to 0.24. So the variance of this distribution is 0.24. And the standard deviation of this distribution, which is just the square root of the variance, is going to be the square root of 0.24, and let's calculate what that is. That is going to be-- let's take the square root of 0.24, which is equal to 0.48-- well, I'll just round it up-- 0.49. So this is equal to 0.49. So if you were to look at this
distribution, the mean of this distribution is 0.6. So 0.6 is the mean. And the standard deviation
is 0.5. So the standard deviation is--
so it's actually out here-- because if you go add one
standard deviation you're almost getting to 1.1, so this
is one standard deviation above, and then one standard
deviation below gets you right about here. And that kind of makes sense. It's hard to kind of have a good
intuition for a discrete distribution because you really
can't take on those values, but it makes sense
that the distribution is skewed to the right over here. Anyway, I did this example
with particular numbers because I wanted to
show you why this distribution is useful. In the next video I'll do
these with just general numbers where this is going
to be p, where this is the probability of success and this
is 1 minus p, which is the probability of failure. And then we'll come up with
general formulas for the mean and variance and standard
deviation of this distribution, which is actually
called the Bernoulli Distribution. It's the simplest case of the
binomial distribution.
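
As a quick numerical check on the transcript's calculation, here is a short sketch assuming the 0/1 coding Sal uses:

```python
from math import sqrt

# The distribution from the video: P(0) = 0.4 (unfavorable), P(1) = 0.6 (favorable).
p0, p1 = 0.4, 0.6

mean = p0 * 0 + p1 * 1                                  # 0.6
variance = p0 * (0 - mean) ** 2 + p1 * (1 - mean) ** 2  # 0.24
sd = sqrt(variance)                                     # about 0.49

# For a Bernoulli(p) variable this agrees with the general formulas
# mean = p and variance = p(1 - p) that the next video derives.
print(mean, variance, round(sd, 2))
```

Note that p(1 - p) = 0.6 * 0.4 = 0.24, matching the probability-weighted sum computed step by step in the transcript.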