Course: Statistics and probability > Unit 9

Lesson 6: Binomial mean and standard deviation formulas

Mean and variance of Bernoulli distribution example

Name: Mean and variance of Bernoulli distribution example
Uploaded: 2011-02-20T16:54:18Z
Description: Sal calculates the mean and variance of a Bernoulli distribution (in this example the responses are either favorable or unfavorable).

Sal calculates the mean and variance of a Bernoulli distribution (in this example the responses are either favorable or unfavorable). Created by Sal Khan.

danyelds
Posted 12 years ago.
What if the population had a third choice? Let's say that part of the population didn't have a clear opinion about it and didn't vote. How would that affect the example mentioned above?
(7 votes)
- Gil Goens
  Posted 12 years ago.
  That would not be a Bernoulli distribution. A Bernoulli distribution has only two possible outcomes, failure or success.
  (20 votes)
Tombentom
Posted 8 years ago.
What is the main difference between the Bernoulli and binomial distributions? Can I apply the binomial to a Bernoulli case? I mean, flipping a coin also has 2 options, heads or tails, the same as a Bernoulli with 2 options, yes or no, right?
(7 votes)
- Dr C
  Posted 8 years ago.
  A Bernoulli distribution is a Binomial distribution with just 1 trial.
  
  Or, a Binomial random variable is the sum of _n_ independent Bernoulli trials with the same probability of success.
  (13 votes)
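The relationship described in this answer can be checked by brute force. The following is a minimal sketch, not part of the original thread: the helper names `sum_of_bernoulli_pmf` and `binomial_pmf` and the choice n = 4, p = 0.6 are assumptions made for illustration. It enumerates every sequence of n independent 0/1 trials and compares the distribution of their sum with the binomial formula.

```python
from itertools import product
from math import comb


def sum_of_bernoulli_pmf(n, p):
    """Exact distribution of the sum of n independent Bernoulli(p) trials,
    found by enumerating all 2**n sequences of 0s and 1s."""
    pmf = [0.0] * (n + 1)
    for outcome in product((0, 1), repeat=n):
        k = sum(outcome)  # number of successes in this sequence
        pmf[k] += p**k * (1 - p) ** (n - k)  # probability of this exact sequence
    return pmf


def binomial_pmf(n, p):
    """Binomial probabilities P(X = k) for k = 0..n from the closed formula."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


n, p = 4, 0.6  # illustrative values, not from the video
enumerated = sum_of_bernoulli_pmf(n, p)
formula = binomial_pmf(n, p)
for k in range(n + 1):
    print(k, round(enumerated[k], 4), round(formula[k], 4))
```

With n = 1 the binomial formula reduces to the two probabilities 1 - p and p, which is the Bernoulli case.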
hgeller1234
Posted 10 years ago.
I thought the mean is a sum of numbers divided by the total number of data points. How can you use a mean that is not divided?
(4 votes)
- JaniceHolz
  Posted 9 years ago.
  When Sal says that 40% of the answers were unfavorable and 60% were favorable, that information is already calculated from the data points.
  For example, suppose the population was 1000 people. Then to get 40% unfavorable, that means that 400 people answered unfavorable. Similarly, 600 people answered favorable. Then we could multiply 400*0 and add it to 600*1, then divide by 1000 to get 0.6.
  
  If we know the percentage (or proportion) of the population in each category, that gives us enough information to calculate the mean even if we do not have access to the raw data. I can show you the algebra:
  Let u be the number of people who answered unfavorable.
  Let f be the number of people who answered favorable.
  Let n be the number of people in the population.
  We are given that u/n = 40% = 0.4 and f/n = 60% = 0.6.
  We calculate the mean:
  mu = (u*0 + f*1)/n = (u*0)/n + (f*1)/n = (u/n)*0 + (f/n)*1 = 0.4*0 + 0.6*1 = 0 + 0.6 = 0.6.
  (16 votes)
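The algebra in this answer can also be run as a small check. This is only a sketch: the population of 1000 is the hypothetical one used in the answer, and the function names `mean_from_counts` and `mean_from_proportions` are made up for illustration.

```python
def mean_from_counts(counts):
    """Ordinary mean: sum of all responses divided by the number of people.
    `counts` maps each coded value (0 or 1) to how many people gave it."""
    n = sum(counts.values())
    return sum(value * count for value, count in counts.items()) / n


def mean_from_proportions(proportions):
    """Probability-weighted sum: the division by n is already inside the proportions."""
    return sum(value * share for value, share in proportions.items())


counts = {0: 400, 1: 600}        # hypothetical population of 1000
proportions = {0: 0.4, 1: 0.6}   # the same information expressed as u/n and f/n

print(mean_from_counts(counts))            # 0.6
print(mean_from_proportions(proportions))  # 0.6
```

Both routes give the same number, because dividing each count by n before summing is the same as dividing the total by n afterwards.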
SanFranGiants
Posted 9 years ago.
So a Bernoulli distribution is just a situation where there are only 2 options? Like Yes and No or Success and Failure or Positive and Negative? And do they have to be opposites from each other necessarily? So like if the question was: do you like chocolate or vanilla ice cream better, would the responses follow a Bernoulli distribution by definition, or no?
(3 votes)
- Dr C
  Posted 9 years ago.
  As long as people had to choose chocolate or vanilla, then that would be a Bernoulli distribution (if they were able to say "neither", that would be a 3rd option and would not be Bernoulli).
  (8 votes)
rachel jalas
Posted 5 years ago.
How can you just decide to define u and f as 0 and 1?
Why did you choose those numbers?
(6 votes)
- deka
  Posted 2 years ago.
  Once you define one choice (favorable, in this case) as 1, the other (unfavorable) must be 0 by the definition of the Bernoulli distribution.
  
  One of the conditions for a binomial distribution is that each trial has exactly 2 possible outcomes (success, failure).
  
  You can treat the Bernoulli distribution as assigning the specific numbers 1 and 0 to those two outcomes (success and failure) for a single trial.
  (1 vote)
GYanzit Kyap Chhaki
Posted 7 years ago.
If the mean and variance of a binomial distribution are 3 and 1.5 respectively, find the probability of (1) at least one success and (2) exactly 2 successes.
(3 votes)
- Ian Pulizzotto
  Posted 7 years ago.
  Nice problem!
  If n represents the number of trials and p represents the success probability on each trial, the mean and variance are np and np(1 - p), respectively.
  Therefore, we have np = 3 and np(1 - p) = 1.5.
  Dividing the second equation by the first equation yields 1 - p = 1.5/3 = 0.5.
  So p = 1 - 0.5 = 0.5, and n = 3/p = 3/0.5 = 6.
  
  P(at least one success) = 1 - P(no successes) = 1 - (1 - p)^n = 1 - (0.5)^6 = 0.984375.
  P(exactly 2 successes) = (n choose 2) p^2 (1-p)^(n-2) = [(6*5)/(1*2)] (0.5)^2 (0.5)^4 = 0.234375.
  
  Have a blessed, wonderful day!
  (3 votes)
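The steps in this answer can be reproduced numerically. The sketch below follows the same route (recover p and n from the mean and variance, then apply the binomial formula); the helper name `binom_pmf` is an assumption for illustration, and only the standard-library `math.comb` is used.

```python
from math import comb

mean, variance = 3.0, 1.5

# variance / mean = np(1 - p) / (np) = 1 - p
p = 1 - variance / mean
# mean = np, so n = mean / p (rounded because n must be a whole number of trials)
n = round(mean / p)


def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_at_least_one = 1 - binom_pmf(0, n, p)
p_exactly_two = binom_pmf(2, n, p)

print(n, p, p_at_least_one, p_exactly_two)  # 6 0.5 0.984375 0.234375
```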
Eric Robinson
Posted 5 years ago.
At time 7:48, Sal says the distribution is skewed to the right. Isn't the distribution skewed left because the tail is to the left of the mean?
(4 votes)
- daniella
  Posted a month ago.
  The description of the distribution as skewed to the right might have been a slip in explanation. In the context provided, where the mean is 0.6 (60% favorable), and we're dealing with a binary outcome (favorable or unfavorable), the notion of "skewness" in the traditional sense isn't the most fitting descriptor. Skewness typically refers to the asymmetry of a distribution around its mean. Since this is a binary (Bernoulli) distribution, we don't have a long tail extending to the right or left as we would with continuous data. However, if the mean were closer to one of the extremities (0 or 1) and given we're assigning "0" to unfavorable and "1" to favorable, a higher mean suggests a concentration towards the favorable side, but not "skewness" in the classical sense.
  (1 vote)
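For readers who want to settle the sign numerically, the standard skewness coefficient (the third standardized moment) is still defined for a two-point distribution. The sketch below is an editorial illustration rather than part of the thread; `bernoulli_skewness` is a made-up helper name. It computes the coefficient directly from the two outcomes for p = 0.6.

```python
from math import sqrt


def bernoulli_skewness(p):
    """Third standardized moment of a 0/1 variable with P(1) = p,
    computed directly from the two outcomes."""
    mean = p
    sd = sqrt(p * (1 - p))
    third_central_moment = (1 - p) * (0 - mean) ** 3 + p * (1 - mean) ** 3
    return third_central_moment / sd**3


print(round(bernoulli_skewness(0.6), 3))  # -0.408
```

The result is negative, which is what is conventionally called left-skewed, and it matches the closed form (1 - 2p)/sqrt(p(1 - p)). At p = 0.5 the coefficient is 0, as expected for a symmetric distribution.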
Endre
Posted 10 years ago.
What is the difference between the binomial and the Bernoulli distribution?
(2 votes)
- Haya.aalali
  Posted 10 years ago.
  • A Bernoulli trial is a random experiment with only two possible outcomes.
  • A binomial experiment is a sequence of Bernoulli trials performed independently, each with the same probability of success.
  (2 votes)
areid41
Posted 4 years ago.
Why don't you divide by 2 when taking the mean?
(2 votes)
- daniella
  Posted a month ago.
  When calculating the mean (or expected value) of a Bernoulli distribution, we don't divide by 2 because we're not simply averaging two numbers. Instead, we're calculating a weighted average where the weights are the probabilities of each outcome, and the values are the outcomes themselves (0 for unfavorable, 1 for favorable).
  (1 vote)
stevemarrocco24
Posted 8 years ago.
Sal, how come you decided to define U and F as 0 and 1? If it's arbitrary, and you defined U to be 345 and F to be 3, couldn't you get a much different outcome?

In my class, we calculate variance as n*p*(1-p) ... I like your way better because it uses the same intuition as the analysis of random variables, but I don't understand the above.
(1 vote)
- Dr C
  Posted 8 years ago.
  When we define U as 0 and F as 1, then the sample mean of our data is an estimate of the proportion, p. Could we define these numbers differently? Sure, but there is no reason to do that, and we lose interpretability.
  (3 votes)
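To see what "losing interpretability" looks like, one can compute the mean and variance under both codings. This is a rough sketch under stated assumptions: `two_point_mean_var` is a hypothetical helper, and 345 and 3 are simply the values proposed in the question.

```python
def two_point_mean_var(low, high, p):
    """Mean and variance when 'unfavorable' is coded as `low` (probability 1 - p)
    and 'favorable' is coded as `high` (probability p)."""
    mean = (1 - p) * low + p * high
    var = (1 - p) * (low - mean) ** 2 + p * (high - mean) ** 2
    return mean, var


p = 0.6
mean_01, var_01 = two_point_mean_var(0, 1, p)
mean_alt, var_alt = two_point_mean_var(345, 3, p)

print(round(mean_01, 4), round(var_01, 4))    # mean equals p, variance equals p*(1 - p)
print(round(mean_alt, 4), round(var_alt, 4))  # rescaled: no longer readable as a proportion
```

With the 0/1 coding the mean is p itself and the variance is p(1 - p), which is the class formula n*p*(1-p) with n = 1. With any other coding the variance is (high - low)^2 times p(1 - p), so the answers change scale but carry the same information.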

Video transcript

Let's say that I'm able to go out and survey every single member of a population, which we know is not normally practical, but I'm able to do it. And I ask each of them, what do you think of the president? There are only two options: they can either have an unfavorable rating or they can have a favorable rating. And let's say after I survey every single member of this population, 40% have an unfavorable rating and 60% have a favorable rating.

So if I were to draw the probability distribution, it's going to be a discrete one, because there are only two values that any person can take on. They could either have an unfavorable view or they could have a favorable view. And 40% have an unfavorable view, and let me color code this a little bit. So this is the 40% right over here, so 0.4, or maybe I'll just write 40% right over there. And then 60% have a favorable view. Let me color code this. 60% have a favorable view. And notice these two numbers add up to 100%, because everyone had to pick between these two options.

Now if I were to go and ask you to pick a random member of that population and say what is the expected favorability rating of that member, what would it be? Or another way to think about it is, what is the mean of this distribution? And for a discrete distribution like this, your mean or your expected value is just going to be the probability-weighted sum of the different values that your distribution can take on. Now the way I've written it right here, you can't take a probability-weighted sum of u and f. You can't say 40% times u plus 60% times f; you won't get any type of a number. So what we're going to do is define u and f to be some type of value. So let's say that u is 0 and f is 1. And now the notion of taking a probability-weighted sum makes some sense.
So the mean of this distribution is going to be 0.4, that's this probability right here, times 0, plus 0.6 times 1, which is just going to be 0.6. So clearly, no individual can take on the value of 0.6. No one can tell you, I'm 60% favorable and 40% unfavorable. Everyone has to pick either favorable or unfavorable. So you will never actually find someone who has a 0.6 favorability value. It'll either be a 1 or a 0. So this is an interesting case where the mean or the expected value is not a value that the distribution can actually take on. It's a value some place over here that obviously cannot happen. But this is the mean, this is the expected value. And the reason why that makes sense is if you surveyed 100 people, you'd multiply 100 times this number, and you would expect 60 people to say yes. Or if you summed them all up, 60 would say 1 and 40 would say 0, so the sum would be 60. That's 60% saying yes, and that's exactly what our population distribution told us.

Now what is the variance? What is the variance of this population right over here? Let me write it over here, and let me pick a new color. You could view the variance as the probability-weighted sum of the squared distances from the mean, or the expected value of the squared distances from the mean. So what's that going to be? Well, there are two different values that anything can take on. You can either have a 0 or you can have a 1. The probability that you get a 0 is 0.4. And if you get a 0, what's the distance from 0 to the mean? The distance from 0 to the mean is 0 minus 0.6, or I could even say 0.6 minus 0, same thing because we're going to square it. So that's 0.4 times 0 minus 0.6, squared. Remember, the variance is the weighted sum of the squared distances. So this is the difference between 0 and the mean.
And then plus, there's a 0.6 chance that you get a 1. And the difference between 1 and our mean, 0.6, is 1 minus 0.6. And then we are also going to square this over here. Now what is this value going to be? The first term is 0.4 times 0 minus 0.6, squared. 0 minus 0.6 is negative 0.6, and if you square it you get positive 0.36. So this value right here, I'm going to color code it, is 0.4 times 0.36. And then we're going to have plus 0.6 times 1 minus 0.6, squared. Now 1 minus 0.6 is 0.4, and 0.4 squared is 0.16. So this value right here is going to be 0.6 times 0.16. So let me get my calculator out to actually calculate these values. This is going to be 0.4 times 0.36, plus 0.6 times 0.16, which is equal to 0.24.

So the variance of this distribution is 0.24, and the standard deviation of this distribution, which is just the square root of the variance, is going to be the square root of 0.24. Let's calculate what that is. The square root of 0.24 is about 0.4899, well, I'll just round it up, 0.49. So this is equal to 0.49.

So if you were to look at this distribution, the mean of this distribution is 0.6. And the standard deviation is about 0.5. So one standard deviation above the mean is actually out here, because if you add one standard deviation you're almost getting to 1.1, and then one standard deviation below gets you right about here. And that kind of makes sense. It's hard to have a good intuition for a discrete distribution because you really can't take on those values, but it makes sense that the distribution is skewed to the right over here.
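The arithmetic in this part of the video can be reproduced in a few lines. This is a minimal sketch of the same probability-weighted sums, using the 0/1 coding and the 40%/60% split from the example; the variable names are chosen here for illustration.

```python
from math import sqrt

# value -> probability, with unfavorable coded as 0 and favorable coded as 1
distribution = {0: 0.4, 1: 0.6}

# Mean: probability-weighted sum of the values
mean = sum(value * prob for value, prob in distribution.items())

# Variance: probability-weighted sum of squared distances from the mean
variance = sum(prob * (value - mean) ** 2 for value, prob in distribution.items())

# Standard deviation: square root of the variance
std_dev = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))  # 0.6 0.24 0.49
```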
Anyway, I did this example with particular numbers because I wanted to show you why this distribution is useful. In the next video I'll do these with just general numbers where this is going to be p, where this is the probability of success and this is 1 minus p, which is the probability of failure. And then we'll come up with general formulas for the mean and variance and standard deviation of this distribution, which is actually called the Bernoulli Distribution. It's the simplest case of the binomial distribution.
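As a preview of those general formulas, the same two calculations can be written with p in place of 0.6 and 1 - p in place of 0.4. This is a sketch of the standard derivation, not a transcription of the next video; X denotes the 0/1 response, a symbol not used in the transcript itself.

```latex
\begin{align*}
\mu &= (1 - p)\cdot 0 + p \cdot 1 = p \\
\sigma^2 &= (1 - p)(0 - p)^2 + p\,(1 - p)^2
          = p(1 - p)\bigl[\,p + (1 - p)\,\bigr]
          = p(1 - p) \\
\sigma &= \sqrt{p(1 - p)}
\end{align*}
```

Setting p = 0.6 recovers the numbers from this example: a mean of 0.6, a variance of 0.24, and a standard deviation of about 0.49.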