## More on normal distributions

# Deep definition of the normal distribution

## Video transcript

The normal distribution is
arguably, the most important concept in statistics. Everything we do or almost
everything we do in inferential statistics which is
essentially, making inferences based on data points, is to
some degree, based on the normal distribution. So, what I want to do in this
video, in this spreadsheet, is to essentially give you as deep
an understanding of the normal distribution as possible. For the rest of your life if
someone says, we're assuming a normal distribution you can
say, oh I know what that is, this is a formula and I
understand how to use it et cetera. So this spreadsheet, just so
you know, is downloadable at www.khanacademy.org/downlads/
and if you just type that part in you'll see everything
that's downloadable. Then download/normalintro.xls
and you'll get this spreadsheet right here. I think I did this in
the right standard. Anyway, if you go onto
Wikipedia and if you were to type in normal distribution or
were to do a search for a normal distribution -- let me
actually get my pen tool going -- this is what you would see. I literally copied and pasted
this right here from Wikipedia. I know it looks daunting, you
have all these Greek letters there, but this is just -- this
sigma right here -- that is just the standard deviation
of the distribution. We'll play with that a little
bit in this chart and see what that means. You know what the standard
deviation means in general but this is the standard deviation
of this distribution, which is a probability density function. And I encourage you to rewatch
the video on probability density functions because it's
a little bit of a transition going from the binomial
distribution, which is discrete. The binomial distribution will
say, what is the probability of getting a 5, and you just kind
of look at that histogram or that bar chart and say oh,
that's the probability. But in a continuous probability
distribution or a continuous probability density function,
you can't just say what is the probability of me getting a 5. You have to say what is the
probability of me getting between, let's say
a 4.5 and a 5.5. You have to give it some range. And then, your probability
isn't given by just reading this graph. The probability is given by
the area under that curve. It would be given by this area. For those of you who know
calculus, if p of x is our probability density function --
doesn't have to be a normal distribution although it often
is a normal distribution -- the way you actually figure out the
probability, let's say between 4 and a half and 5 and a half. What is the probability, the
odds of me getting between 4 and a half and 5 and a half
inches of rain tomorrow? It'll actually be the integral
from 4 and a half to 5 and a half of this probability
density function, p of x, dx. So that's just the
area under the curve. For those of you who don't know
calculus yet, I encourage you to watch that playlist. But all this is saying is
the area under the curve from here to here. It turns out for the normal
distribution, this isn't an easy thing to evaluate
analytically and so you do it numerically. You don't have to feel bad
about doing it numerically and wonder, how do I take
the integral of this? There are actually functions
for it and you can even approximate it. One way you can approximate it
is the way you approximate integrals
in general. You could say well, what
is the area of this? Well it's roughly the
area of this trapezoid. So you could figure out the
area of that trapezoid, taking the average of that point and
that point and multiplying it by the base. Or you can just take the level
of -- let me change colors because I think I'm overdoing
it with the green -- or you could just take the height of
this line right here and multiply it by the base. You'll get the area of this
rectangle, which might be a pretty good approximation for
the area under the curve. Right? Because you'll have a little
bit extra over here but you're going to miss a little bit
over there, so it might be a pretty good approximation. That's actually what I do in
the other video, just to approximate the area under the
curve and give you a good sense that the normal distribution is
what the binomial distribution becomes essentially, if
you have many trials. What's interesting about the
normal distribution -- I don't know if I mentioned this
already -- this right here, this is the graph. This is just another word,
people might talk about the central limit theorem. But this is really kind of one
of the most important or interesting things about our
universe, central limit theorem. I won't prove it here but it
essentially tells us, and you could kind of understand it by
looking at the other video where we talk about
flipping coins. If we were to do many flips of
coins -- those are independent trials of each other -- and if
you take the sum of all of your flips, if you were to give
yourself one point every time you got a head -- and if
you were to take the sum of them, as you approach an infinite number
of flips, you approach the normal distribution. What's interesting about that
is each of those trials -- in the case of flipping the coin,
each trial is a flip of the coin -- each of those trials
doesn't have to have a normal distribution. So we could be talking about
molecular interactions and every time compound x interacts
with compound y, the result doesn't have
to be normally distributed. But what happens is, if you
take a sum of a ton of those interactions, then all of a
sudden the end result will be normally distributed. This is why this is such an
important distribution. It shows up in nature all of
the time and if you do take data points from something that
is very complex and it is the sum of arguably, many almost
infinite individual independent trials, it's a pretty good
assumption to assume the normal distribution. We'll do other videos where we
talk about when it is a good assumption, when it isn't
a good assumption. But anyway, just to digest
this a little bit and let me actually rewrite it. This is what you'll see on
Wikipedia but this could be rewritten as 1 over sigma times
the square root of 2 pi, times exp, which is just e to that power. So it's just e to the, this
whole thing over here, minus x minus the mean squared
over 2 sigma squared. This is the standard deviation. Standard deviation squared
is just the variance. Just so you know how to use
this, you're like, oh wow, there's so many Greek
letters here, what do I do? This tells you the
height of the normal distribution function. Let's say that this is the
distribution of people's heights above 5'9. Let's say that this
was 5'9 and not 0. What this tells you is, if you
wanted to figure out what is the probability of finding
someone who was roughly 5 inches taller than the average
right here, what you would do is you would put in this
number here, this 5 into x. And then you know the standard
deviation, because you've taken a bunch of samples. You know the variance, which is
the standard deviation squared. You know the mean, and you just
put your x in there and it'll tell you the height
of the function. And then you have to
give it a range. You can't just say how many
people are exactly 5 inches taller than average. You would actually say how
many people are between 4.9 inches and 5.1 inches
taller than the average. You have to give it a little
bit of range because no one is exactly, it's almost
impossible, down to the atom, to be exactly 5'9. Even the definition of
an inch isn't defined that precisely. So that's how you'd
use this function. This is so heavily used,
one, because it shows up in nature, but also in all of inferential
statistics, that I think it behooves you to become as familiar with
this formula as possible. So, to make that happen, let me play around a little bit
with this formula just to kind of give you an intuition of how
everything works out et cetera. If I were to take this -- I'd
like to maybe help you memorize this -- this could be rewritten
as, if we take the sigma into the square root sign, if we
take the standard deviation in there, it becomes 1 over
the square root of 2 pi sigma squared. I've never seen it written this
way but it gives me a little intuition that sigma squared,
it's always written as sigma squared, but it's really just
the variance and the variance is what you calculate before
you calculate the standard deviation, so that's
interesting. And then this top right here,
this could be written as e to the minus 1 half times, and both
of these things here are squared, so we could just say, x
minus the mean over sigma, all of that squared. And this kind of clarifies
what's going on here a little bit better. Because what's this? x minus mu is the distance
between whatever point we want to find and the mean. Let's say we're here. That's x. Mu is the mean so
that's here so that's this distance, and this is the
standard deviation which is this distance. So this in here tells me how
many standard deviations I am away from the mean. That's actually called the
standard z score, I talked about it in the other video. And then we square that
and then we multiply that by minus 1/2. Well let me rewrite that. If I were to write e to the
minus 1/2 times a, that's the same thing as e to the a to
the minus 1/2 power right? If you take something to an
exponent and take that to an exponent you can just
multiply these exponents. So likewise, this could be
rewritten as, this is equal to 1 over the square root of 2
pi sigma squared, which is just the variance. And I'm just playing around
with the formula because I really want you to see all
the ways, maybe you'll get a little intuition. I encourage you to email me
if you see some insight on why this exists. Once again, I think it is cool
that all of a sudden we have this other formula that
has pi and e in it. So many phenomena are
described by this and once again pi and e show up together
just like e to the i pi is equal to negative 1. It tells you something
about our universe. Anyway, I could rewrite this as
e to the x minus mu over sigma squared and all of that
to the minus 1/2. Something in the minus 1/2
power, that's just 1 over the square root which is
already going on here. So we could just rewrite this
over here as 1 over the square root of 2 pi times the variance
times e to essentially, our z score squared. If we say z is this thing in
here, z is how many standard deviations we are from the
mean, z score squared. And all of a sudden
this becomes very clean. We just say 2 pi times our
variance times e to the number of standard deviations we are away
from the mean, squared. You take the square root of
that thing and invert it and that's the normal distribution. Anyway, I wanted to do that. I thought it was neat
and it's interesting to play around with it. That way, if you see it in
any of these other forms in the rest of your life you won't
say what's that, I thought the normal distribution was this or
it was this and now you know. With that said, let's play
around a little bit with this normal distribution. So this spreadsheet,
I've plotted the normal distribution. You can change the assumptions
that are in this kind of green, blue color. So right now it's plotting it
with a mean of zero and a standard deviation of 4. I just write the variance here
just for your information, the variance is just the
standard deviation squared. And so what happens when
you change the mean? So if the mean goes from 0
to let's say it goes to 5. Notice, this graph just
shifted to the right by 5. Right? It was centered here, now
it's centered over here. If we make it minus
5, what happens? The whole bell curve just
shifts 5 to the left from the center. Now what happens when you
change the standard deviation. The variance is the average
squared distance from the mean, the standard deviation is
the square root of that. So it's kind of, not exactly,
but kind of the average distance from the mean. So the smaller the standard
deviation the closer a lot of the points are
going to be to the mean. We should get kind of a
narrower graph and let's see if that happens. When the standard deviation
is 2, we see that. With this graph, you're more likely
to be really close to the mean than further away. If you make the standard
deviation larger, if you make it 10, all of a sudden you get a
really flat graph and this thing keeps going on forever. And that's a key difference:
the binomial distribution is always finite. You can only have a finite
number of values while the normal distribution is
defined over the entire real number line. So the probability, if you
have a mean of minus 5 and a standard deviation of 10, the
probability of getting a thousand here is very low but
there's some probability. There's some probability that all
of the atoms in my body just arrange perfectly so that I fall
through the seat I'm sitting on. It's very unlikely and it
probably won't happen in the life of the universe
but it can happen. And that can be described by
a normal distribution because it says anything can happen,
although it could be very improbable. The thing I talked about at the
beginning of the video is when you figure out a normal
distribution you can't just look at this point on the graph
-- let me get the drop pen tool back -- you have to figure out
the area under the curve between two points. So if I wanted to say -- let's
say this is our distribution -- and I said what is the
probability that I get 0. I don't know what phenomenon
this is describing, but say that 0 happened. If I say exactly 0, the
probability is 0 -- I shouldn't use 0 too much -- because
the area under the curve, just under 0, there's no
area, it's just a line. You have to say
between a range. So you have to say the
probability between -- and actually I can type it in here
-- between minus .005 and plus .05 is -- well it rounded -- it
says they're close to 0. Let me do it, between
minus 1 and between 1. It calculated at 7 percent
and I'll show you how I calculated this in a second. So let me get the
screen draw tool. So what did I just do? Between minus 1 and 1 -- and
I'll show you the behind the scenes, what excel is doing --
we're going from minus 1, which is roughly right here, to 1. And we're calculating the
area under the curve. We're calculating this area or
for those of you who know calculus, we're calculating the
integral from minus 1 to 1 of this function where the
standard deviation is right here, is 10 and the
mean is minus 5. Actually, let me put that in. So we're calculating, for this
example, the way it's drawn right here, the normal
distribution function is 1 over our standard deviation, which is 10, times
square root of 2 pi, times e to the minus 1/2 times
x minus our mean. Our mean is negative right now. Our mean is minus 5 so it's x
plus 5, squared, over the standard deviation squared, which is the
variance, so that's 100, dx. This is what this number is
right here, this 7 percent or actually .07 is the
area right around there. Now, unfortunately for us in
the world, this isn't an easy integral to evaluate
analytically, even for those of us who know our calculus. So this tends to be
done numerically. And kind of an easy way to do
this is -- well, not an easy way -- a function has been defined
called the cumulative distribution function that is a
useful tool for figuring out this area. So what the cumulative
distribution function is essentially -- let me call it
the cumulative distribution function -- it's
a function of x. It gives us the area under
the curve, under this curve. So let's say that this is x
right here, that's our x. It tells you the area
under the curve up to x. So another way to think about
it, it tells you what is the probability that you land
at some value less than your x value. So it's the area from minus
infinity to x of our probability density function. When you actually use the Excel
normal distribution function, which is called NORMDIST, you have to give it your
x value, you give it the mean, you give it the
standard deviation. And then you say whether you
want the cumulative distribution, in which case you
say true, or you want just this normal distribution,
in which case you say false. So if you wanted to graph
this right here, you would say false in caps. If you wanted to graph the
cumulative distribution function which I do down here
-- let me move this down a little bit, let me get out of
pen tool -- then you say true when you make that Excel call. So this is a cumulative
distribution function for this. This is a normal distribution,
here's a cumulative distribution. Just so you get the intuition. If you want to know, what
is the probability that I get a value less than 20? So I can get any value
less than 20 given this distribution. The cumulative distribution
right here, -- let me make it so you can see -- if you go to
20 you just go right to that point there and you say wow,
the probability of getting 20 or less is pretty high. It's approaching 100 percent. That makes sense because
most of the area under this curve is less than 20. Or if you said what's the
probability of getting less than minus five. Well minus 5 is the mean
so half of your results should be above that and
half should be below. And if you go to this point
right here you could see that this right here is 50 percent. So the probability of getting
less than minus 5 is exactly 50 percent. So what you do is, if I wanted
to know the probability of getting between negative 1 and
1 what I do is -- let me get back to my pen tool -- I figure
out what is the probability of getting minus 1 or lower. So I figure out
this whole area. And then I figure out the
probability of getting one or lower which is this whole area
-- well I'm going to give it a different color -- 1 or
lower is everything there. And I subtract the yellow area
from the magenta area and I'll just get whatever's
left over here. That's exactly what I did
in the spreadsheet. Let me scroll down. This might be taxing my
computer by taking the screen capture with it. So what I did is I evaluated
the cumulative distribution function at 1, which is
right there. And I evaluated the cumulative
distribution function at minus 1, which is right there. And the difference between
these two, I subtract this number from this number and
that tells me essentially the probability that I'm
between those two numbers. Or another way to think about
it, the area right here. I really encourage you to play
with this and explore the excel formulas and everything. This area right here
would be minus 1 and 1. Just so you know this graph,
the central line right here, this is the mean. And then these two lines I drew
right here, these are 1 standard deviation below and 1
standard deviation above the mean. Some people think what's the
probability that land within one standard deviation
of the mean? Well that's easy to do. What I can do is I'll
just click on this. And I'll just call this, what's
the probability that I land between 1 standard deviation --
the mean is minus 5 -- one standard deviation below the
mean is minus 15 and one standard deviation above the
mean is 10 plus minus 5 is 5. So that's between 5 and 15. So 68.3 percent. That's actually always the case
that you have a 68.3 percent probability of landing within
one standard deviation of the mean, assuming you have
a normal distribution. So once again, that number
represents the area under the curve here, this
area under the curve. And the way you get it
is with the cumulative distribution function. I'll go down here. Every time I move this I have
to get rid of the pen tool. You evaluate it at plus
5, which is right here. This is 1 standard deviation
above the mean, that's a number right around there. It looks like it's like 80
something percent, maybe 90 percent roughly. And then you evaluate it at
1 standard deviation below the mean which is minus 15. And this one looks like roughly
15 percent or so, 15, 16, maybe 17 percent, I'll
say 18 percent. But the big picture is when
you subtract this value from this value you get
the probability that you land between those two. And that's because this
value tells the probability that you're less. So when you go to the
cumulative distribution function you get
that right there. It keeps crawling
back and forth. So when you go to five -- and
you just go right over here -- this essentially tells you
this area under the curve, the probability that you're
less than or equal to 5, everything up there. And then when you evaluate it
at minus 15 down here, it tells you the probability
that you're back here. So when you subtract this from
the larger thing you're just left with what's under
the curve right there. Just to understand this
spreadsheet a little bit better because I really want you to
play with it and see what happens if I make this
distribution, the mean was minus 5 now let me make it 5. It just shifted to the right. It just moved over
to the right by 5. Whoops. Let me use the pen tool. It just moved over
to the right by 5. If I were to try to make the
standard deviation smaller we'll see that the whole thing
just gets a little bit tighter. Let's make it 6. And all of a sudden this looks
like a tighter curve. We make it 2, it becomes even tighter. I really want you to play with
this and play with the formula and get an intuitive feeling
for this, the cumulative distribution function and think
a lot about how it relates to the binomial distribution and I
cover that in the last video. To plot this I just took
each of these points. I went to plot the points
between minus 20 and 20 and I just incremented by 1. I just decided to
increment by 1. So this isn't a continuous
curve, it's actually just plotting a point at each
value and connecting them with a line. Then I found the distance
between each of those points and the mean. So I just took minus 20 minus 5,
this is this distance. So this just tells you
the point minus 20 is 25 less than the mean. That's all I did there. Then I divided that by the
standard deviation and this is the z score, the
standard z score. This tells me how many standard
deviations is minus 20 away from the mean. It's 12 and a half standard
deviations below the mean. Then I use that and I just
plugged it into essentially this formula to figure out
the height of the function. Let's say at minus 20
the height is very low. Let's say at minus 2 the
height's a little bit higher, the height's going to be
some place, it's going to be right there. And so that gives
me that value. But then to actually figure out
the probability of that, what I do is I calculate the
cumulative distribution function. Well this is the probability
that you're less than that. So the area under the curve
below that which is very small. It's not zero, I know it
looks like 0 here but that's only because I round it. It's going to be 0001,
it's going to be a really small number. There's some probability that
we even get minus a thousand. Another intuitive thing that
you really should have a sense for is the integral over this
or the entire area of the curve has to be 1 because that
takes into account all possible circumstances. And that should happen if we
put a suitably small number here and a suitably
large number here. There you go, we
get 100 percent. Although, this isn't
a 100 percent. We'd have to go from minus
infinity to plus infinity to really get a 100 percent. It's just rounding
to 100 percent. It's probably 99.99999 percent
or something like that. And so to actually calculate
this, what I do is I take the cumulative distribution
function of this point and I subtract from that the
cumulative distribution function of that point. And that's where I got
this 100 percent from. Anyway, hopefully that will
give you a good feel for the normal distribution. I really encourage you to play
with the spreadsheet and to even make a spreadsheet
like this yourself. In a future exercise we'll
actually use this type of spreadsheet as an input
into other models. If we're doing a financial
model and if we say our revenue has a normal distribution
around some expected value, what is the distribution
of our net income? Or we could think of a
hundred other different types of examples. Anyway, see you in
the next video.
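
For anyone who wants to reproduce the spreadsheet's numbers without Excel, here is a minimal Python sketch of the calculations described in the transcript. It is not the spreadsheet itself: the function names (`normal_pdf`, `normal_pdf_alt`, `normal_cdf`, `prob_between`, `trapezoid_area`) are made up for illustration, and the cumulative distribution is computed with the standard library's `math.erf`, which is an assumption about one reasonable way to do it rather than what Excel does internally.

```python
import math


def normal_pdf(x, mean, sd):
    """Height of the normal curve at x (the formula copied from Wikipedia)."""
    z = (x - mean) / sd  # z score: how many standard deviations x is from the mean
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def normal_pdf_alt(x, mean, sd):
    """Same height, in the rewritten form 1 / sqrt(2 * pi * variance * e^(z^2)).

    Only for illustration: math.exp overflows once z squared exceeds about 709.
    """
    z = (x - mean) / sd
    variance = sd ** 2
    return 1 / math.sqrt(2 * math.pi * variance * math.exp(z * z))


def normal_cdf(x, mean, sd):
    """Area under the curve from minus infinity up to x, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def prob_between(lo, hi, mean, sd):
    """Probability of landing between lo and hi: CDF at hi minus CDF at lo."""
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)


def trapezoid_area(lo, hi, mean, sd, steps=1000):
    """Approximate the same area by summing thin trapezoids under the curve."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        left = lo + i * width
        right = left + width
        # Average of the two heights, multiplied by the base.
        total += 0.5 * (normal_pdf(left, mean, sd) + normal_pdf(right, mean, sd)) * width
    return total


mean, sd = -5, 10  # the values used in the transcript's example

# Between minus 1 and 1: the spreadsheet shows about 7 percent.
print(f"P(-1 < X < 1)  = {prob_between(-1, 1, mean, sd):.4f}")
print(f"trapezoids     = {trapezoid_area(-1, 1, mean, sd):.4f}")

# Within one standard deviation of the mean: about 68.3 percent.
print(f"P(-15 < X < 5) = {prob_between(mean - sd, mean + sd, mean, sd):.4f}")

# Below the mean: 50 percent.
print(f"P(X < -5)      = {normal_cdf(mean, mean, sd):.4f}")
```

In this sketch, `normal_pdf` plays the role of the Excel call with false as the last argument, and `normal_cdf` plays the role of the call with true.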