Lesson 6: Binomial mean and standard deviation formulas

Bernoulli distribution mean and variance formulas

Sal continues on from the previous video to derive the mean and variance formulas for the Bernoulli distribution. Created by Sal Khan.

• If 0 & 1 are taken as arbitrary, why can't we take -1 & 1 instead? That would result in a completely different set of formulas.
• Let's say we were using 0 & 1 with p = 0.6, so 1 - p = 0.4.
In this case µ = p = 0.6. The distance from 0 to µ is 0.6, so µ lies 60% of the way from 0 to 1.

Now let's say we were using -1 & 1 with p = 0.6 and 1 - p = 0.4.
In this case, µ = (1 - p)*(-1) + p*1 = 2p - 1 = 2*0.6 - 1 = 0.2.
The distance from -1 to 0.2 is |-1 - 0.2| = 1.2, and the total distance from -1 to 1 is 2.
Note that 1.2/2 = 0.6, meaning that your µ still lies 60% of the way from -1 to 1.
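The relabeling argument above can be checked numerically. A quick sketch (the values are taken from the example in this thread, not from the video):

```python
# Verify that relabeling Bernoulli outcomes as {-1, 1} instead of {0, 1}
# shifts the mean but keeps its *relative* position between the outcomes.
p = 0.6  # probability of "success"

# Mean with outcomes coded 0 and 1
mu_01 = (1 - p) * 0 + p * 1              # = p = 0.6

# Mean with outcomes coded -1 and 1
mu_pm1 = (1 - p) * (-1) + p * 1          # = 2p - 1 = 0.2

# Relative position of each mean between its two outcome values
rel_01 = (mu_01 - 0) / (1 - 0)           # 0.6
rel_pm1 = (mu_pm1 - (-1)) / (1 - (-1))   # 1.2 / 2 = 0.6

print(mu_01, mu_pm1, rel_01, rel_pm1)
```

Both codings place the mean 60% of the way from the "failure" value to the "success" value, which is why the {0, 1} convention is a harmless (and convenient) choice.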
• How do we know which outcome should be 0 and which should be 1? In the 0.4 and 0.6 example, if we coded the 0.6 outcome as 0 and the 0.4 outcome as 1, the mean would be 0.4 rather than 0.6. How do we select which becomes 0 and which becomes 1?
• There's a problem with using Sal's simplified form for the variance, sigma^2 = p(1 - p): it doesn't take into account the loss of a degree of freedom when calculating the sample variance s^2.

In the next video, for example, if you used the p(1 - p) formula to calculate s^2 you would get 24.51/100 = 0.2451 rather than the correct answer of 24.51/99 = 0.2476, as is shown in the video. Does this mean that the simplified formula should only be used when calculating the POPULATION variance and not the SAMPLE variance?
• No, the formulas µ = p and σ² = p(1 - p) are exact derivations for the Bernoulli distribution. And similarly, when we get to the binomial distribution and see µ = np and σ² = np(1 - p), these are exact for the binomial distribution.

In practice, if we're going to make much use of these values, we will be doing an approximation of some sort anyway (e.g., assuming something follows a normal distribution), so whether we divide by n or n - 1, and which might be proper, isn't really a concern here.
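The distinction being discussed can be made concrete with a small sketch (the 100-trial data set here is assumed for illustration; it is not the data from the video):

```python
# Compare the exact Bernoulli variance p(1 - p), which divides by n,
# with the sample variance, which divides by n - 1.
data = [1] * 60 + [0] * 40   # 100 Bernoulli trials, 60 successes
n = len(data)
p_hat = sum(data) / n        # sample proportion = 0.6

# Population-style variance: the plug-in formula p(1 - p)
pop_var = p_hat * (1 - p_hat)                               # 0.24

# Sample variance: same sum of squared deviations, divided by n - 1
sample_var = sum((x - p_hat) ** 2 for x in data) / (n - 1)  # 24/99

print(pop_var, sample_var)
```

The two values differ only by the factor n/(n - 1), which shrinks toward 1 as n grows, matching the answer's point that the choice rarely matters in practice.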
• We are calculating mu = (1 - p)*0 + p*1, which simplifies to p.

Why didn't we calculate mu = p*0 + (1 - p)*1, which equates to 1 - p?

I am assuming that the 0 and 1 we are multiplying by are purely arbitrary.
• You could calculate mu with either equation; it depends on what the probability p stands for. When Sal calculated mu, p was the probability of a 1. In your second equation you are using p as the probability of a 0.
So if we use the values that Sal used in the previous video
(probability of a 1 = ps = 0.6 and probability of a 0 = pf = 0.4), then:
mu = (1 - ps)*0 + ps*1 = (1 - 0.6)*0 + 0.6*1 = 0.6
mu = pf*0 + (1 - pf)*1 = 0.4*0 + (1 - 0.4)*1 = 0.6
• How come when finding the mean you do not have to put the whole equation over 2? Don't you have to divide by the number of terms?
• From what I can tell, it is because the outcomes are not actual numbers; they're not strictly numerical, so we can't add them up and then divide by the number of observations. For example, when flipping a coin 5 times, the outcome could be "HHTTT". These aren't numbers we can add and then divide by 5, but we can describe the result with a percentage: if we consider tails (T) a successful outcome, then we could say that we had 60% successes (3/5 = 0.6). And if you look at what a percentage is (the count of something divided by the total), you realize that dividing by the total is the same as dividing by the number of terms. :)

If the data were ordinary numbers, for example "10, 3, 7, 2, 4", then it would be fine to find the mean directly ((10 + 3 + 7 + 2 + 4)/5). In the case of "HHTTT", it makes more sense to express it as a percentage.
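The point above can be seen directly in code: once the outcomes are coded as 0s and 1s, the proportion of successes *is* the ordinary mean. A quick sketch using the "HHTTT" example:

```python
# Coding heads/tails as 0/1 turns "percentage of successes" into an
# ordinary arithmetic mean: sum of values divided by number of terms.
flips = ["H", "H", "T", "T", "T"]
coded = [1 if f == "T" else 0 for f in flips]  # tails counted as success
mean = sum(coded) / len(coded)
print(coded, mean)  # [0, 0, 1, 1, 1] 0.6
```

So the division by the number of terms does happen; it is hidden inside computing the proportion 3/5.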
• I don't understand why he writes the expected value as m rather than E(X). Is it the same thing or not?
• He wrote mu (µ); it just looked like an m.
• In tossing a biased coin once, where heads is twice as likely to occur as tails, let X be the number of heads. Find the moment generating function of X, and hence the mean and variance of X.
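One way this exercise can be worked out (a sketch, not an answer from the course): heads is twice as likely as tails, so P(heads) + P(tails) = 1 gives P(heads) = 2/3, and X is Bernoulli with p = 2/3.

```latex
% X ~ Bernoulli(p) with p = 2/3
M_X(t) = E\left[e^{tX}\right] = (1-p)\,e^{0} + p\,e^{t} = \tfrac{1}{3} + \tfrac{2}{3}e^{t}
E[X]   = M_X'(0)  = p = \tfrac{2}{3}
E[X^2] = M_X''(0) = p = \tfrac{2}{3}
\mathrm{Var}(X) = E[X^2] - E[X]^2 = \tfrac{2}{3} - \tfrac{4}{9} = \tfrac{2}{9} = p(1-p)
```

Note that the mean and variance recovered from the MGF agree with the µ = p and σ² = p(1 - p) formulas derived in the video.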
• How do you know which outcome to define as 0 and which to define as 1?
• The choice of which outcome to define as 0 and which as 1 is arbitrary and depends on how you define success and failure in your problem context. In the context of the Bernoulli distribution, one outcome is typically defined as the "success" (which could be an event occurring, a favorable outcome, etc.), and the other as "failure" (which could be the absence of the event, an unfavorable outcome, etc.). It's important to be consistent in your definitions and interpretations throughout your analysis.
• I'm assuming that the mean is always the higher of the two probabilities. (If I labeled p as 0.40 and 0.60 as 1 - p, the mean would be 1 - p.)
So technically, it's not that the mean is p; the mean is the higher value, which in this example happens to be p.
Someone please correct me if I'm wrong.
• The mean of the Bernoulli distribution is p, and it depends on what you are measuring with that p, not on which value is highest. In a Bernoulli distribution you want to measure the probability of some "success" (it can be anything: heads on coin flips, sixes on dice rolls, and so on), and you define the probability of this "success" as p, so logically the probability of "failure" is 1 - p. This probability p can be very small, but the mean (your measure of the central tendency of successes) will still be equal to p. I hope this helps!
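A small simulation makes this concrete (the value p = 0.1 and the trial count are assumed for illustration):

```python
# Simulate a Bernoulli variable with a *small* success probability to see
# that the sample mean tracks p, not the larger probability 1 - p.
import random

random.seed(0)                 # fixed seed for reproducibility
p = 0.1                        # success probability; failure prob = 0.9
trials = [1 if random.random() < p else 0 for _ in range(100_000)]
sample_mean = sum(trials) / len(trials)
print(sample_mean)             # close to 0.1, not to 0.9
```

Even though "failure" is far more likely here, the mean sits near p = 0.1, exactly as the µ = p formula says.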