We take an extremely deep dive into the normal distribution to explore the parent function that generates normal distributions, and how to modify parameters in the function to produce a normal distribution with any given mean and standard deviation. We also look at relative frequency as area under the normal distribution. Created by Sal Khan.
- Why does the formula for a normal distribution contain pi?
- For normalization purposes. The integral of the rest of the function over the whole real line is √(2π) in the standard case exp(-x²/2), and σ√(2π) in general. So it must be normalized: the integral from negative to positive infinity must equal 1 for it to define a probability density. Actually, the normal distribution is based on the function exp(-x²/2). If you graph that, you'll see it already looks like the bell shape of the normal curve. If you then graph exp(-(x-mu)²/2), you'll see the same function shifted by its mean; the mean must correspond to the function's maximum. That is a basic characteristic of the normal distribution. Finally, if you try out exp(-((x-mu)/sigma)²/2), you'll find that you have the same shape, shifted by the mean and stretched or squeezed by the standard deviation. The same applies to any function f(x): f(x/2) is wider, while f(x/0.5) is narrower than the original f(x). So the core of the normal distribution is exp(-x²/2). Squaring the variable gives the exponent its parabolic shape, and the negative sign makes that parabola open downward, so the function peaks at the center. Finally, the exponential gives the function its asymptotic behavior. For more information on the nature of the normal distribution, take a look at http://courses.ncssm.edu/math/Talks/PDFS/normal.pdf
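A quick numerical check of the answer's two claims, sketched with only Python's standard library: the area under exp(-x²/2) comes out to √(2π), and dividing the input by 2 widens the curve while dividing by 0.5 narrows it. The trapezoid helper and the ±40 cutoff standing in for ±infinity are choices made for this sketch, not part of the original answer.

```python
import math

def trapezoid(f, a, b, n=100_000):
    """Composite trapezoid rule: approximate the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def core(x):
    """The unnormalized core of the normal density, exp(-x^2 / 2)."""
    return math.exp(-x * x / 2)

# The tails beyond |x| = 40 are negligible, so [-40, 40] stands in for (-inf, inf).
area = trapezoid(core, -40.0, 40.0)
print(area, math.sqrt(2 * math.pi))   # both about 2.5066

# Horizontal scaling: f(x/2) is twice as wide, f(x/0.5) = f(2x) is half as wide.
wide = trapezoid(lambda x: core(x / 2), -80.0, 80.0)
narrow = trapezoid(lambda x: core(x / 0.5), -20.0, 20.0)
print(wide / area, narrow / area)     # about 2.0 and 0.5
```

Dividing the core by √(2π) (or by σ√(2π) once it has been stretched by σ) is exactly the normalization the answer describes.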
- After listening to this video and reflecting on what I learned, I came across this thought: "The probability of something occurring is 0%" is not an equivalent statement to "It cannot happen". If my thought process is right, this is because 0% is on the chart. However, "cannot happen" is a statement that is not on the chart.
So, the probability of randomly pulling data ten thousand standard deviations away might be 0%, but it is still on the normal distribution curve. Nighttime and daytime occurring simultaneously cannot happen, so that is not on the curve.
Thus, '0% chance of happening' is not an equivalent statement to 'cannot happen'. Right?
- This is a good question about probability. The shortest answer would be: having a probability of zero is equivalent to being impossible. In fact, that is how we define impossible. The rest of this answer is a somewhat lengthy explanation, but I couldn't think of a way to shorten it without sacrificing a point.
What probability does is assign a value (the probability) to a particular outcome. Say I place 3 red marbles and 3 yellow marbles in a bag. How can we describe the probability distribution? Well, we'd start with the obvious:
P(red) = 3/6
P(yellow) = 3/6
But we can always write things like:
P(banana) = 0
P(Battlestar Galactica) = 0
P(coffee mug) = 0
We're simply only interested in the non-zero probabilities, so we don't write out all the items for which the probability is zero.
Now we get to the normal distribution. You are right that on a theoretical level, it goes out to infinity in either direction. But the curve never actually hits zero. So all of those values are possible values; they just have extremely small probability. What you are thinking about as 0% probability is actually just rounded off. If something has probability 0.0001, okay, that's pretty unlikely. What about 0.0000001? Well, that's even more unlikely. At some point, we just get sick of the whole process and round it off to 0.
Are those values actually impossible to obtain? Possibly, but not necessarily. However, regardless of whether or not something is theoretically possible, on a practical level it is impossible. For example, let's say we're measuring the rainfall in a certain city, and that the mean is 25 inches and the standard deviation is 5 inches.
Clearly the amount of rainfall cannot be negative, but we can still put a normal distribution on this. Why? Because the probabilities down near zero will be so small that they'll round off to zero, so it makes no difference really. Let's try it out, say we want to calculate the probability of less than -1 inches of rain. Doing this, we'd get P(X<-1)=0.0000001.
It's clearly an impossible event, but the probability is not equal to zero. However, it is so small as to be practically zero, so this isn't going to affect much of anything, and it is much easier than the alternative of using a different distribution to model the rainfall.
Similarly, we could ask about the probability of more than 60 inches of rain. There's not really an upper bound on rainfall, so 60 inches isn't a physically impossible value like -1, but the probability is 0.000000000001. Again, this is so small that we'd just round it off to zero, and say that it is impossible to have more than 60 inches of rain in this area.
Sorry for the length of this, but it was an interesting question that is hard to describe in a shorter post.
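The two rainfall figures can be reproduced with a few lines of Python, assuming (as the answer does) a normal model with mean 25 inches and standard deviation 5 inches. This minimal sketch uses `math.erfc` for the tail areas; the helper names are illustrative. It gives roughly 1.0e-07 for P(X < -1) and 1.3e-12 for P(X > 60), in line with the answer's rounded values.

```python
import math

def upper_tail(x, mu, sigma):
    """P(X > x) for X ~ Normal(mu, sigma^2), via the complementary error function.

    erfc stays accurate far out in the tail, where 1 - cdf(x) would lose precision.
    """
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))

def lower_tail(x, mu, sigma):
    """P(X < x), using the bell curve's symmetry about mu."""
    return upper_tail(2 * mu - x, mu, sigma)

MU, SIGMA = 25.0, 5.0   # rainfall model from the answer, in inches

p_negative = lower_tail(-1, MU, SIGMA)   # z = -5.2: physically impossible, yet nonzero
p_over_60 = upper_tail(60, MU, SIGMA)    # z = +7.0
print(f"P(X < -1) = {p_negative:.1e}")
print(f"P(X > 60) = {p_over_60:.1e}")
```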
- When trying to download the spreadsheet at http://www.khanacademy.org/downloads/NormalIntro.xls, the server is not found and I get a 404 error.
Does anyone else have this problem, and how can we solve it?
- Yes, I had this problem and the parent directory also doesn't have a link to the file. Just google the link and you'll find it in the results (I found it on weebly something).
I also opened a bug report - "Issue 17659:Missing download from class (404)"
- My teacher has only taught me how to key in the numbers into the formula instead of teaching how the formula itself has been derived! Does anybody out there face the same situation as I do?
- Unfortunately, many teachers feel that it's better to get students to use an equation than it is to understand an equation.
- If we draw two vertical lines one standard deviation on either side of the mean, will they always cross the graph at the inflection points (the points where the graph transitions from concave up to concave down)? If this is the case, it would make more sense why a decrease in standard deviation causes the graph to become narrower.
- It's interesting. I don't know, but it appears that the probability of x falling into the interval between these two points is constant, around 68%. Isn't it?
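The question in this thread can be checked numerically. Differentiating the density twice gives f''(x) = f(x)·((x - μ)² - σ²)/σ⁴, which is zero exactly at x = μ ± σ, so the lines one standard deviation from the mean do pass through the inflection points, and the area between them is the roughly 68% the reply noticed. Below is a minimal sketch using a finite-difference second derivative; the example μ and σ are arbitrary.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

mu, sigma = 3.0, 2.0   # arbitrary example parameters

def f(x):
    return normal_pdf(x, mu, sigma)

# f'' is (numerically) zero one standard deviation on either side of the mean...
for x in (mu - sigma, mu + sigma):
    print(x, second_derivative(f, x))

# ...negative (concave down) between those points and positive (concave up) outside them.
print(second_derivative(f, mu) < 0, second_derivative(f, mu + 2 * sigma) > 0)  # True True

# Area between the two inflection points, P(mu - sigma < X < mu + sigma).
within = math.erf(1 / math.sqrt(2))
print(round(within, 4))  # 0.6827
```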
- Is it possible to convert a uniform distribution to a normal distribution? I mean, my randomizer gives only a uniform distribution and I need a Gaussian. I heard about the inverse transformation method for converting from a uniform to a Cauchy distribution. Are there any recipes for making a normal distribution based on a uniform one?
- If you're trying to create a distribution that is approximately Normal (and you only have Uniformly distributed random variables), one possible approach, thanks to the Central Limit Theorem, is to take n uniform r.v.'s (for instance, n=10), calculate their mean, and then repeat many times. Then, if your original r.v.'s were Uniform with mean=μ and variance=σ², your distribution of sample means will be approximately N(μ, σ²/n).
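A small sketch of two recipes, assuming Python's `random` module as the uniform source: the Central Limit Theorem approach from the answer (here with the classic choice n = 12, so the rescaling is simple), and the inverse transformation method the question asks about, which for the normal distribution means feeding a uniform draw through the inverse normal CDF (`statistics.NormalDist.inv_cdf`, Python 3.8+). The function names are illustrative; in practice Python's `random.gauss` or `random.normalvariate` already produce normal samples for you.

```python
import math
import random
import statistics

def normal_via_clt(rng, n=12):
    """Approximately N(0, 1): a sum of n Uniform(0, 1) draws has mean n/2 and variance n/12."""
    s = sum(rng.random() for _ in range(n))
    return (s - n / 2) / math.sqrt(n / 12)

def normal_via_inverse_cdf(rng):
    """N(0, 1) by inverse transformation: push a uniform draw through the inverse normal CDF."""
    u = rng.random()
    while u == 0.0:          # inv_cdf requires 0 < p < 1
        u = rng.random()
    return statistics.NormalDist().inv_cdf(u)

rng = random.Random(42)      # seeded so the run is repeatable
clt = [normal_via_clt(rng) for _ in range(20_000)]
inv = [normal_via_inverse_cdf(rng) for _ in range(20_000)]
for name, xs in (("CLT", clt), ("inverse CDF", inv)):
    print(name, round(statistics.fmean(xs), 3), round(statistics.stdev(xs), 3))
```

The CLT version is only approximately normal (it can never go beyond ±6), while the inverse-CDF version follows the normal shape all the way into the tails, up to the resolution of the uniform generator.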
- Sal talks, towards the start of this video, about integral calculus being very helpful for this. If I'm going through all the maths on Khan Academy, is it worth me leaving this for now, doing the calculus modules, and then coming back to this?
- I noticed that the video title and some others include ck12.org in them. Is this what Khan Academy was known as previously, or is it a spinoff of Khan Academy?
- ck12.org is a completely different website that functions similarly to Khan Academy. These videos were made by Sal Khan for that website.
- This lesson is way too advanced for beginners to grasp easily. You did not explain how to calculate the deviation from the mean, how to work out the square root, or how to calculate the dispersion: the variance, which is the average of the squared deviations, and the standard deviation, which is the square root of the average of the squared deviations from the mean (if I'm right). I came here to see simple methods of calculating these, but this is kind of advanced.
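For readers in the same spot, the calculation the commenter describes can be sketched in a few lines of Python on a made-up data set. This computes the population variance (divide by the number of points); the sample variance divides by n - 1 instead, which is what `statistics.variance` does.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up sample for illustration

mean = sum(data) / len(data)             # 5.0
deviations = [x - mean for x in data]    # each point's distance from the mean
squared = [d * d for d in deviations]    # squared deviations
variance = sum(squared) / len(data)      # average squared deviation: 4.0
std_dev = math.sqrt(variance)            # square root of the variance: 2.0

print(mean, variance, std_dev)           # 5.0 4.0 2.0
```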
The normal distribution is arguably the most important concept in statistics. Everything we do, or almost everything we do, in inferential statistics, which is essentially making inferences based on data points, is to some degree based on the normal distribution. So, what I want to do in this video, in this spreadsheet, is to essentially give you as deep an understanding of the normal distribution as possible. For the rest of your life, if someone says, we're assuming a normal distribution, you can say, oh I know what that is, this is the formula and I understand how to use it, et cetera. So this spreadsheet, just so you know, is downloadable at www.khanacademy.org/downloads/ and if you just type that part in you'll see everything that's downloadable. Then add normalintro.xls to the end of that and you'll get this spreadsheet right here. I think I did this in the right standard. Anyway, if you go onto Wikipedia and type in normal distribution, or do a search for normal distribution -- let me actually get my pen tool going -- this is what you would see. I literally copied and pasted this right here from Wikipedia. I know it looks daunting, you have all these Greek letters there, but this is just -- this sigma right here -- that is just the standard deviation of the distribution. We'll play with that a little bit in this chart and see what that means. You know what the standard deviation means in general, but this is the standard deviation of this distribution, which is a probability density function. And I encourage you to rewatch the video on probability density functions, because it's a little bit of a transition going from the binomial distribution, which is discrete. The binomial distribution will say, what is the probability of getting a 5, and you just kind of look at that histogram or that bar chart and say oh, that's the probability. 
But in a continuous probability distribution, or a continuous probability density function, you can't just say what is the probability of me getting a 5. You have to say what is the probability of me getting between, let's say, 4.5 and 5.5. You have to give it some range. And then your probability isn't given by just reading this graph. The probability is given by the area under that curve. It would be given by this area. For those of you who know calculus, if p of x is our probability density function -- it doesn't have to be a normal distribution, although it often is -- the way you actually figure out the probability, let's say between 4 and a half and 5 and a half -- what is the chance of me getting between 4 and a half and 5 and a half inches of rain tomorrow? -- is the integral from 4 and a half to 5 and a half of this probability density function, p of x dx. So that's just the area under the curve. For those of you who don't know calculus yet, I encourage you to watch that playlist. But all this is saying is the area under the curve from here to here. It turns out for the normal distribution, this isn't an easy thing to evaluate analytically, and so you do it numerically. You don't have to feel bad about doing it numerically, thinking oh, how do I take the integral of this? There are actually functions for it, and you can even approximate it. One way you can approximate it is the way you approximate integrals in general. You could say well, what is the area of this? Well, it's roughly the area of this trapezoid. So you could figure out the area of that trapezoid, taking the average of the heights at that point and that point and multiplying it by the base. Or you can just take the level of -- let me change colors because I think I'm overdoing it with the green -- or you could just take the height of this line right here and multiply it by the base. 
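A minimal sketch of the trapezoid and rectangle approximations described here, compared against the exact area from the cumulative distribution function (Python's `statistics.NormalDist`, 3.8+). The rainfall model (mean 5 inches, standard deviation 1 inch) is a made-up assumption for illustration, and the rectangle uses the height at the midpoint of the range, which is one reasonable reading of "the height of this line".

```python
from statistics import NormalDist

# Hypothetical rainfall model, for illustration only: mean 5 inches, SD 1 inch.
rain = NormalDist(mu=5, sigma=1)
a, b = 4.5, 5.5
base = b - a

# One rectangle: height of the curve at the midpoint times the base.
rectangle = rain.pdf((a + b) / 2) * base
# One trapezoid: average of the heights at the two ends times the base.
trapezoid = (rain.pdf(a) + rain.pdf(b)) / 2 * base

def composite_trapezoid(f, lo, hi, n):
    """Split [lo, hi] into n slices and add up the trapezoids."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))

many_slices = composite_trapezoid(rain.pdf, a, b, 100)
exact = rain.cdf(b) - rain.cdf(a)    # area via the CDF, for comparison

print(round(rectangle, 4), round(trapezoid, 4), round(many_slices, 6), round(exact, 6))
```

Because the curve is concave down within one standard deviation of the mean, the single rectangle overshoots a little and the single trapezoid undershoots; slicing the range more finely closes the gap.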
You'll get the area of this rectangle, which might be a pretty good approximation for the area under the curve. Right? Because you'll have a little bit extra over here, but you're going to miss a little bit over there, so it might be a pretty good approximation. That's actually what I do in the other video, just to approximate the area under the curve and give you a good sense that the normal distribution is essentially what the binomial distribution becomes if you have many trials. What's interesting about the normal distribution -- I don't know if I mentioned this already -- this right here, this is the graph. People might also talk about the central limit theorem here, and this is really one of the most important or interesting things about our universe, the central limit theorem. I won't prove it here, but it essentially tells us -- and you can kind of understand it by looking at the other video where we talk about flipping coins -- that if we were to do many flips of coins, which are independent trials, and we take the sum of all of our flips, giving ourselves one point every time we get a head, then as we approach an infinite number of flips, the distribution of that sum approaches the normal distribution. What's interesting about that is that each of those trials -- in the case of flipping the coin, each trial is a flip of the coin -- doesn't have to have a normal distribution. So we could be talking about molecular interactions, and every time compound x interacts with compound y, whatever it results in doesn't have to be normally distributed. But what happens is, if you take a sum of a ton of those interactions, then all of a sudden the end result will be normally distributed. This is why this is such an important distribution. 
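The coin-flipping picture can be checked directly: the number of heads in n fair flips is binomial, and for large n its probabilities sit very close to a normal density with mean n/2 and standard deviation √n/2. A minimal sketch with n = 100 (an arbitrary choice), using `math.comb` from Python 3.8+.

```python
import math

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n independent flips."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

n = 100                              # number of fair coin flips (arbitrary choice)
mu = n * 0.5                         # expected number of heads
sigma = math.sqrt(n * 0.5 * 0.5)     # standard deviation of the number of heads

for k in (40, 45, 50, 55, 60):
    print(k, round(binomial_pmf(k, n), 4), round(normal_pdf(k, mu, sigma), 4))
```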
It shows up in nature all of the time, and if you take data points from something that is very complex and is arguably the sum of many, almost infinitely many, individual independent trials, it's a pretty good assumption to assume the normal distribution. We'll do other videos where we talk about when it is a good assumption and when it isn't. But anyway, just to digest this a little bit, let me actually rewrite it. This is what you'll see on Wikipedia, but this could be rewritten as 1 over sigma times the square root of 2 pi, times e to a power. So it's just e to this whole thing over here: minus (x minus the mean) squared over 2 sigma squared. This is the standard deviation. Standard deviation squared is just the variance. Just so you know how to use this -- you're like, oh wow, there are so many Greek letters here, what do I do? -- this tells you the height of the normal distribution function. Let's say that this is the distribution of people's heights above 5'9". Let's say that this was 5'9" and not 0. What this tells you is, if you wanted to figure out the probability of finding someone who was roughly 5 inches taller than the average right here, you would put in this number here, this 5, for x. And then you know the standard deviation, because you've taken a bunch of samples. You know the variance, which is the standard deviation squared. You know the mean, and you just put your x in there and it'll tell you the height of the function. And then you have to give it a range. You can't just say how many people are exactly 5 inches taller than average. You would actually say how many people are between 4.9 inches and 5.1 inches taller than the average. You have to give it a little bit of range, because nobody is exactly that height; it's essentially impossible for someone to be exactly 5'9", down to the atom. Even an inch isn't defined that precisely. So that's how you'd use this function. 
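A sketch of the recipe described here, under made-up assumptions: x is measured in inches above the average height of 5'9" (so the mean is 0), and the standard deviation is taken to be 3 inches. The density alone is a height, not a probability; the probability comes from a range such as 4.9 to 5.1 inches, and for a range that narrow, height times width is already very close to the exact area.

```python
from statistics import NormalDist

# Hypothetical model: x is inches above the average height of 5'9" (mean 0),
# with an assumed standard deviation of 3 inches.
heights = NormalDist(mu=0, sigma=3)

density_at_5 = heights.pdf(5)                    # height of the curve at x = 5 (not a probability)
p_band = heights.cdf(5.1) - heights.cdf(4.9)     # P(4.9 < X < 5.1): area over a range
approx = density_at_5 * 0.2                      # rectangle: curve height times width 0.2
p_exact_point = heights.cdf(5) - heights.cdf(5)  # a single point has zero width, so zero area

print(round(density_at_5, 4), round(p_band, 5), round(approx, 5), p_exact_point)
```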
This is so heavily used -- one, it shows up in nature, but also in all of inferential statistics -- that I think it behooves you to become as familiar with this formula as possible. To help make that happen, let me play around a little bit with this formula, just to kind of give you an intuition of how everything works out, et cetera. If I were to take this -- I'd like to maybe help you memorize this -- this could be rewritten as, if we take the sigma into the square root sign, if we take the standard deviation in there, 1 over the square root of 2 pi sigma squared. I've never seen it written this way, but it gives me a little intuition: it's always written as sigma squared, but it's really just the variance, and the variance is what you calculate before you calculate the standard deviation, so that's interesting. And then this top right here could be written as e to the minus 1/2 times -- and both of these things here are squared -- so we could just say (x minus the mean, over sigma) squared. And this kind of clarifies what's going on here a little bit better. Because what's this? x minus mu is the distance between whatever point we want to find -- let's say we're here -- and the mean, mu, which is here, so that's this distance, and sigma is the standard deviation, which is this distance. So this in here tells me how many standard deviations I am away from the mean. That's actually called the standard z score; I talked about it in the other video. And then we square that and multiply it by minus 1/2. Well, let me rewrite that. If I were to write e to the minus 1/2 times a, that's the same thing as e to the a, raised to the minus 1/2 power, right? If you take something to an exponent and take that to an exponent, you can just multiply the exponents. So likewise, this could be rewritten as -- this is equal to -- 1 over the square root of 2 pi sigma squared, where sigma squared is just the variance. 
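The rewriting in this part of the video can be checked numerically. Below, three hypothetical helper functions compute the density in the Wikipedia form, in the form 1/√(2π·variance) times e to the -z²/2, and in the fully combined form 1/√(2π·variance·e^(z²)) that the derivation ends up at; they agree to floating-point precision. The last form is only for intuition: e^(z²) overflows once |z| is larger than about 26.

```python
import math

def pdf_standard_form(x, mu, sigma):
    """1 / (sigma * sqrt(2 pi)) * e^(-(x - mu)^2 / (2 sigma^2)), as written on Wikipedia."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def pdf_variance_form(x, mu, sigma):
    """1 / sqrt(2 pi * variance) * e^(-z^2 / 2), where z is the z score."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi * sigma**2)

def pdf_single_root_form(x, mu, sigma):
    """1 / sqrt(2 pi * variance * e^(z^2)): everything under one square root.

    For intuition only: math.exp(z * z) overflows once |z| exceeds about 26.
    """
    z = (x - mu) / sigma
    return 1 / math.sqrt(2 * math.pi * sigma**2 * math.exp(z * z))

MU, SIGMA = 1.0, 2.0   # arbitrary example parameters
for x in (-3.0, 0.0, 1.5, 7.0):
    values = [f(x, MU, SIGMA) for f in (pdf_standard_form, pdf_variance_form, pdf_single_root_form)]
    print(x, [round(v, 6) for v in values])
```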
And I'm just playing around with the formula because I really want you to see all the ways; maybe you'll get a little intuition. I encourage you to email me if you see some insight on why this exists. Once again, I think it is cool that all of a sudden we have this other formula that has pi and e in it. So many phenomena are described by this, and once again pi and e show up together, just like e to the i pi is equal to negative 1. It tells you something about our universe. Anyway, I could rewrite this as e to the (x minus mu, over sigma) squared, and all of that to the minus 1/2. Something to the minus 1/2 power is just 1 over its square root, which is already going on here. So we could just rewrite this as 1 over the square root of 2 pi times the variance times e to our z score squared. If we say z is this thing in here, z is how many standard deviations we are from the mean, and we have z score squared. And all of a sudden this becomes very clean. We just take 2 pi, times our variance, times e to the square of the number of standard deviations we are away from the mean. You take the square root of that thing and invert it, and that's the normal distribution. Anyway, I wanted to do that. I thought it was neat, and it's interesting to play around with it. That way, if you see it in any of these other forms in the rest of your life, you won't say what's that, I thought the normal distribution was this or it was this, and now you know. With that said, let's play around a little bit with this normal distribution. So in this spreadsheet, I've plotted the normal distribution. You can change the assumptions that are in this kind of green, blue color. So right now it's plotting it with a mean of zero and a standard deviation of 4. I just write the variance here for your information; the variance is just the standard deviation squared. And so what happens when you change the mean? So let's say the mean goes from 0 to 5. 
Notice, this graph just shifted to the right by 5. Right? It was centered here, now it's centered over here. If we make it minus 5, what happens? The whole bell curve just shifts 5 to the left of the center. Now what happens when you change the standard deviation? The variance is the average squared distance from the mean, and the standard deviation is the square root of that. So it's kind of, not exactly, but kind of the average distance from the mean. So the smaller the standard deviation, the closer a lot of the points are going to be to the mean. We should get kind of a narrower graph; let's see if that happens. When the standard deviation is 2, we see that. With this graph, you're more likely to be really close to the mean than further away. If you make the standard deviation 10, all of a sudden you've got a really flat graph, and this thing keeps going on forever. And that's a key difference: the binomial distribution can only take a finite number of values, while the normal distribution is defined over the entire real number line. So if you have a mean of minus 5 and a standard deviation of 10, the probability of getting a thousand here is very low, but there's some probability. There's some probability that all of the atoms in my body just arrange perfectly so that I fall through the seat I'm sitting on. It's very unlikely, and it probably won't happen in the life of the universe, but it can happen. And that can be described by the normal distribution, because it says anything can happen, although it could be very improbable. The thing I talked about at the beginning of the video is that when you work with a normal distribution, you can't just look at this point on the graph -- let me get the pen tool back -- you have to figure out the area under the curve between two points. So if I wanted to say -- let's say this is our distribution -- what is the probability that I get 0? 
I don't know what phenomenon this is describing, but say 0 happened. If I say exactly 0, the probability is 0 -- I shouldn't use 0 too much -- because there's no area under the curve at exactly 0; it's just a line. You have to give a range. So you have to say the probability between -- and actually I can type it in here -- between minus .005 and plus .005 is -- well, it rounded -- it says it's close to 0. Let me do it between minus 1 and 1. It calculated 7 percent, and I'll show you how I calculated this in a second. So let me get the screen draw tool. So what did I just do? Between minus 1 and 1 -- and I'll show you the behind the scenes, what Excel is doing -- we're going from minus 1, which is roughly right here, to 1. And we're calculating the area under the curve. We're calculating this area, or for those of you who know calculus, we're calculating the integral from minus 1 to 1 of this function, where the standard deviation is right here, 10, and the mean is minus 5. Actually, let me put that in. So we're calculating, for this example, the way it's drawn right here, the integral of the normal distribution function: 1 over our standard deviation, 10, times the square root of 2 pi, times e to the minus 1/2 times (x minus our mean) squared over the variance. Our mean is minus 5, so x minus the mean is x plus 5, and the variance is the standard deviation squared, which is 100. Then dx. This is what this number is right here; this 7 percent, or actually .07, is the area right around there. Now, unfortunately for us in the world, this isn't an easy integral to evaluate analytically, even for those of us who know our calculus. So this tends to be done numerically. And kind of an easy way to do this is -- well, not an easy way -- a function has been defined, called the cumulative distribution function, that is a useful tool for figuring out this area. 
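Outside Excel, the same number can be reproduced with the cumulative distribution function in Python's `statistics.NormalDist` (Python 3.8+). This sketch uses the spreadsheet's settings at this point (mean -5, standard deviation 10): the area between -1 and 1 is the CDF at 1 minus the CDF at -1.

```python
from statistics import NormalDist

dist = NormalDist(mu=-5, sigma=10)   # the spreadsheet's settings at this point

p = dist.cdf(1) - dist.cdf(-1)       # area under the curve between -1 and 1
print(f"{p:.4f}")                    # 0.0703, i.e. about 7 percent

p_point = dist.cdf(0) - dist.cdf(0)            # "exactly 0": no width, so no area
p_tiny = dist.cdf(0.005) - dist.cdf(-0.005)    # a very narrow band: small but not zero
print(p_point, p_tiny)
```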
So what the cumulative distribution function is, essentially -- let me call it the cumulative distribution function -- is a function of x. It gives us the area under this curve. So let's say that this is x right here; that's our x. It tells you the area under the curve up to x. So another way to think about it: it tells you the probability that you land at some value less than your x value. So it's the integral from minus infinity to x of our probability density function. When you actually use the Excel normal distribution function, NORMDIST, you have to give it your x value, you give it the mean, and you give it the standard deviation. And then you say whether you want the cumulative distribution, in which case you say TRUE, or you want just this normal distribution, in which case you say FALSE. So if you wanted to graph this right here, you would say FALSE, in caps. If you wanted to graph the cumulative distribution function, which I do down here -- let me move this down a little bit, let me get out of the pen tool -- then you say TRUE when you make that Excel call. So this is a cumulative distribution function for this. This is a normal distribution; here's a cumulative distribution. Just so you get the intuition: say you want to know the probability that I get a value less than 20, so any value less than 20, given this distribution. On the cumulative distribution right here -- let me make it so you can see -- if you go to 20, you just go right to that point there and you say wow, the probability of getting 20 or less is pretty high. It's approaching 100 percent. That makes sense, because most of the area under this curve is less than 20. Or what if you said, what's the probability of getting less than minus 5? Well, minus 5 is the mean, so half of your results should be above that and half should be below. And if you go to this point right here, you can see that this right here is 50 percent. 
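For readers without Excel, here is a hypothetical Python helper that mirrors the argument order described here (x, mean, standard deviation, then TRUE for the cumulative distribution or FALSE for the density), built on `statistics.NormalDist`. It is not an Excel API, just a stand-in for experimenting with the same settings (mean -5, standard deviation 10).

```python
from statistics import NormalDist

def normdist(x, mean, standard_dev, cumulative):
    """Hypothetical stand-in mirroring the argument order described for Excel's normal function.

    cumulative=True  -> area under the curve up to x (the cumulative distribution)
    cumulative=False -> height of the curve at x (the density)
    """
    d = NormalDist(mu=mean, sigma=standard_dev)
    return d.cdf(x) if cumulative else d.pdf(x)

print(normdist(-5, -5, 10, True))             # 0.5: half the area lies below the mean
print(round(normdist(20, -5, 10, True), 4))   # 0.9938: "approaching 100 percent"
print(round(normdist(-5, -5, 10, False), 4))  # 0.0399: height of the peak
```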
So the probability of getting less than minus 5 is exactly 50 percent. So what you do, if I wanted to know the probability of getting between negative 1 and 1, is -- let me get back to my pen tool -- I figure out the probability of getting minus 1 or lower. So I figure out this whole area. And then I figure out the probability of getting 1 or lower, which is this whole area -- well, I'm going to give it a different color -- 1 or lower is everything there. And I subtract the yellow area from the magenta area, and I'll just get whatever is left over here. That's exactly what I did in the spreadsheet. Let me scroll down. This might be taxing my computer, since I'm also taking the screen capture. So what I did is I evaluated the cumulative distribution function at 1, right there. And I evaluated the cumulative distribution function at minus 1, which is right there. And the difference between these two -- I subtract this number from this number -- tells me essentially the probability that I'm between those two numbers. Or another way to think about it, the area right here. I really encourage you to play with this and explore the Excel formulas and everything. This area right here would be between minus 1 and 1. Just so you know this graph, the central line right here, this is the mean. And then these two lines I drew right here, these are 1 standard deviation below and 1 standard deviation above the mean. Some people ask, what's the probability that I land within one standard deviation of the mean? Well, that's easy to do. What I can do is I'll just click on this. And I'll just ask, what's the probability that I land within 1 standard deviation? The mean is minus 5, so one standard deviation below the mean is minus 15, and one standard deviation above the mean is 10 plus minus 5, which is 5. So that's between minus 15 and 5. So 68.3 percent. 
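The same CDF subtraction reproduces the 68.3 percent, and shows it doesn't depend on which mean and standard deviation you pick. A sketch with `statistics.NormalDist`; the extra (mean, standard deviation) pairs in the loop are arbitrary.

```python
from statistics import NormalDist

dist = NormalDist(mu=-5, sigma=10)
upper = dist.cdf(-5 + 10)   # CDF one standard deviation above the mean (x = 5)
lower = dist.cdf(-5 - 10)   # CDF one standard deviation below the mean (x = -15)
print(round(upper, 4), round(lower, 4), round(upper - lower, 4))  # 0.8413 0.1587 0.6827

# The same share lands within one standard deviation whatever the parameters.
shares = []
for mu, sigma in ((0, 1), (5, 2), (100, 15)):
    d = NormalDist(mu, sigma)
    shares.append(d.cdf(mu + sigma) - d.cdf(mu - sigma))
print([round(s, 4) for s in shares])   # 0.6827 each time
```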
That's actually always the case: you have a 68.3 percent probability of landing within one standard deviation of the mean, assuming you have a normal distribution. So once again, that number represents the area under the curve here, this area under the curve. And the way you get it is with the cumulative distribution function. I'll go down here. Every time I move this I have to get rid of the pen tool. You evaluate it at plus 5, which is right here. This is 1 standard deviation above the mean; that's a number right around there. It looks like it's something in the 80s, roughly 84 percent. And then you evaluate it at 1 standard deviation below the mean, which is minus 15. And this one looks like roughly 16 percent. But the big picture is, when you subtract this value from this value, you get the probability that you land between those two. And that's because this value tells you the probability that you're less. So when you go to the cumulative distribution function, you get that right there. It keeps crawling back and forth. So when you go to 5 -- and you just go right over here -- this essentially tells you this area under the curve, the probability that you're less than or equal to 5, everything up there. And then when you evaluate it at minus 15 down here, it tells you the probability that you're back here. So when you subtract this from the larger thing, you're just left with what's under the curve right there. Just to understand this spreadsheet a little bit better, because I really want you to play with it, see what happens if I change this distribution: the mean was minus 5, now let me make it 5. It just shifted to the right. It just moved over to the right by 5. Whoops. Let me use the pen tool. It just moved over to the right by 5. If I were to make the standard deviation smaller, we'll see that the whole thing just gets a little bit tighter. Let's make it 6. 
And all of a sudden this looks like a tighter curve; we make it 2, it becomes even tighter. I really want you to play with this and play with the formula and get an intuitive feeling for this and for the cumulative distribution function, and think a lot about how it relates to the binomial distribution; I cover that in the last video. To plot this, I just took each of these points. I plotted the points between minus 20 and 20, and I just incremented by 1. I just decided to increment by 1. So this isn't a continuous curve; it's actually just plotting a point at each value and connecting them with lines. Then I computed the distance between each of those points and the mean. So I just took minus 20 minus 5; this is this distance. So this just tells you the point minus 20 is 25 less than the mean. That's all I did there. Then I divided that by the standard deviation, and this is the z score, the standard z score. This tells me how many standard deviations minus 20 is away from the mean. It's 12 and a half standard deviations below the mean. Then I just plugged that into essentially this formula to figure out the height of the function. Let's say at minus 20 the height is very low. At minus 2 the height's a little bit bigger; it's going to be someplace right there. And so that gives me that value. But then to actually figure out the probability, what I do is I calculate the cumulative distribution function. Well, this is the probability that you're less than that value, so the area under the curve below it, which is very small. It's not zero. I know it looks like 0 here, but that's only because I rounded it. It's going to be something like 0.000...1, a really small number. There's some probability that we even get minus a thousand. Another intuitive thing that you really should have a sense for is that the integral over this, or the entire area under the curve, has to be 1, because that takes into account all possible circumstances. 
And that should happen if we put a suitably small number here and a suitably large number here. There you go, we get 100 percent. Although this isn't really 100 percent. We'd have to go from minus infinity to plus infinity to really get 100 percent. It's just rounding to 100 percent. It's probably 99.99999 percent or something like that. And so to actually calculate this, what I do is I take the cumulative distribution function at this point and subtract from that the cumulative distribution function at that point. And that's where I got this 100 percent from. Anyway, hopefully that will give you a good feel for the normal distribution. I really encourage you to play with the spreadsheet and to even make a spreadsheet like this yourself. In a future exercise, we'll actually use this type of spreadsheet as an input into other models. If we're doing a financial model and we say our revenue has a normal distribution around some expected value, what is the distribution of our net income? Or we could think of a hundred other types of examples. Anyway, see you in the next video.
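If you want to take up the suggestion to build this yourself, here is a minimal Python sketch of the spreadsheet's columns as described in the video: x from -20 to 20 in steps of 1, its distance from the mean, the z score, the height of the curve, and the cumulative distribution. The mean of 5 and standard deviation of 2 match the settings near the end of the video; the function names are illustrative. The cumulative column uses `math.erfc`, which keeps far-tail values like x = -20 tiny but nonzero rather than rounding them to 0.

```python
import math

MEAN, SD = 5, 2   # the spreadsheet's settings near the end of the video

def height(x):
    """Height of the bell curve at x (the density, Excel's FALSE option)."""
    z = (x - MEAN) / SD
    return math.exp(-0.5 * z * z) / (SD * math.sqrt(2 * math.pi))

def cumulative(x):
    """Area under the curve up to x (the cumulative distribution, Excel's TRUE option).

    erfc keeps far-left-tail values such as x = -20 tiny but nonzero.
    """
    return 0.5 * math.erfc(-(x - MEAN) / (SD * math.sqrt(2)))

rows = []
for x in range(-20, 21):        # plot points from -20 to 20 in steps of 1
    distance = x - MEAN         # how far x is from the mean
    z = distance / SD           # z score: standard deviations from the mean
    rows.append((x, distance, z, height(x), cumulative(x)))

print(rows[0][:3])              # (-20, -25, -12.5)
print(rows[0][4])               # tiny, but not zero

# Total area between -20 and 20: it rounds to 100%, yet is still just below 1.
total = cumulative(20) - cumulative(-20)
print(f"{total:.12f}", total < 1.0)
```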