If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

## Statistics and probability

### Unit 4: Lesson 7

More on normal distributions

# Normal distribution excel exercise

(Long-26 minutes) Presentation on spreadsheet to show that the normal distribution approximates the binomial distribution for a large number of trials. Created by Sal Khan.

## Want to join the conversation?

• where is the excel sheet , i couldn't find it in the path given in the video • Around , the convergence chart seems to have an error whenever you input the probability of finishing an odd number of steps away from the start point. Could this possibly be why?:

Since you have an even number of steps (10, 20, 30,etc.), each one being exactly one unit in directly opposing directions (so that we could say that 1 step right is equivalent to -1 steps left), you cannot finish with an odd number of units away from the center, given an even number of total steps.

Consider this: total steps = S
left steps = L
right steps = R
Left steps + Right steps = total Steps [L+R=S]
Now, total distance from the start (D) = |L-R|
and since S = L+R, solve for L and you get L = S-R
Substitute this into the distance equation:
D = |S-R-R| or, to simplify it, D = |S-2R|
So basically. the distance from the start is the total number of steps minus 2 times the number of steps right. Since R and S have to be whole numbers, to get all the possible results for D, you take the total number of steps minus all multiples of 2. If S is odd, then subtracting any multiple of 2 will keep it odd. If S is even, then subtracting any multiple of 2 will keep it even. Therefore, you cannot have an od distance from the start with an even number of total steps. • Bug 1: An odd "Final Position" causes fractional (impossible) moves and a jaggy (wrong) graph.
Fix: Change the "Total moves" formula from =C8+2*\$C\$6 to =C8+MROUND(\$C\$6,2).
Explanation: You can't do an odd number of moves and end up in the same position. The "Right Moves" and "Left Moves" formulae will result in non-integers, which will result in a wrong (often >1) "Binomial Probability". So, only add an even number of moves to the last known good "Total moves". MROUND rounds to the nearest multiple of 2.

Bug 2: A large "Final Position" causes "Binomial Probability" to display "#NUM", limiting the graph domain.
Fix: Change the "Binomial Probability" formula from =FACT(C8)/(FACT(C10) * FACT(C8-C10)) * \$C\$3^C10 * \$C\$4^C9 to =BINOMDIST(C9,C8,0.5,FALSE).
Explanation: Excel cannot handle numbers larger than 1.8e+308. The factorial of a 171 is larger than this. So, any "Total moves" larger than 170 will result in "#NUM" for "Binomial Probability", which uses factorial. Built-in functions like BINOMDIST avoid large intermediate numbers and get the same result.
• Around what's Sal referring to when the normal curve caused the financial industry to make bad choices? • A lot of old(er) risk models were based on the assumption of normally distributed risk. What typically happens during "shocks" - financial crises, supply shocks in oil markets, etc - is this is often not the case, and this is when you hear people refer to "fat-tails" or risk located multple deviations away from the center of the distribution. Behavioral finance is a discipline that tries to address this.
• • Hello, this is the closest video I have found to what I was searching for. I need help making a histogram in Excel. I get confused on what data goes into "Input" and "Bin" and each box. If someone could help me get some clarification that would be great! • Hi Zach,
INPUT=the raw data, BIN= upper interval limit.
For Example: Raw data for heights of 10 people randomly collected are {4.5,3.4,5.5,5.7,4.2,4.5,4.8,3.2,4.1,5}. This would be the INPUT.
If I want to put the data into the follwoing class intervals
less than or equals 3,
greater than 3 but less than or equals 3.5,
greater than 3.5 but less than or eqauls 4.5
and so on...
Then 3,3.5,4,4.5 etc are the BINS.
• I noticed that you are raising e to the negative 1/2 power. My text doesn't have that so could you please address this and confirm that this is correct? • Sorry everyone, pass up my question. I didn't pick out the numbers the way the I should have read the intalicized text. MY Bad. Excuse me Sal! Thanks for your communal generous insights. I am learning above the Normal Curve. If I approach the mean, I guess that would be a relative term to what "mean"ing means. Thanks to you mathematics meaning has kept on moving forward. I am very grateful for your mission here!
• Does anybody have an idea why there is a sudden difference between Binominal Dist. and Normal Dist. at 20 moves? I mean, the rest of the graph is a nice curve but why does it suddenly drop from 0.0007 (10 moves) to 0.0001 (20 moves)? • I'm watching this in the statistics and probability playlist but I think it's in the wrong place. Binomial distribution hasn't even been covered yet. • This lesson seems out of place as part of the Statistics and probability / Modeling data distributions unit. We have not yet learned about binomial distributions, if you follow the lessons as presented in sequence. It looks like we will eventually learn these concepts in later units, possibly under Probability or under Random Variables (not sure). • Considering a case of flipping coins, finding Expected value E(X), or mean, and Variance Var(X) seems pretty straightforward to me:
p - probabiliy of success, q= 1-p (probability of "fail"), n = 10 trials
E(X)=n * p= 10 * 0.5 = 5
Var(X) = n * p * (1-p) = 10 * 0.5 * 0.5 = 0.25

But how could we find the variance considering rolling the dices?
For example we roll dice 10 times, n = 10. From the previous Khan's videos I could say that expected value (expected sum after 10 rolls):
E(X)=10* 1/6 * (1+2+3+4+5+6) = 35
But how to find Variance here? What is the p and q here? And what is considered as "success" and "failure" in this situation?
Thanks
(1 vote) • The variance is the mean square deviation from the expected value. Breaking that down...
(1) Calculate the expected value of a roll: 3.5
(2) Calculate the difference from the mean for each outcome: (1-3.5), (2-3.5), (3-3.5), (4-3.5), (5-3.5), (6-3.5).
(3) Square each of the differences, multiply by the associated probability (1/6), and sum the result.

Thus the variance of a single roll of a die is 2.917. The variance of ten rolls is ten times the variance of a single roll, or 29.17.