## Statistics and probability


© 2023 Khan Academy

# Normal distribution problems: Empirical rule

The empirical rule (also called the "68-95-99.7 rule") is a guideline for how data is distributed in a normal distribution.
The rule states that (approximately):
- 68% of the data points will fall within one standard deviation of the mean.
- 95% of the data points will fall within two standard deviations of the mean.
- 99.7% of the data points will fall within three standard deviations of the mean.

Created by Sal Khan.
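
One informal way to check the three percentages is to draw many values from a normal distribution and count how many land within one, two and three standard deviations of the mean. The sketch below is only an illustration: the sample size, the seed and the helper name `fraction_within` are arbitrary choices, not part of the lesson.

```python
import random

def fraction_within(samples, mean, sd, k):
    """Fraction of samples lying within k standard deviations of the mean."""
    inside = sum(1 for x in samples if abs(x - mean) <= k * sd)
    return inside / len(samples)

random.seed(0)  # fixed seed so repeated runs give the same draws
mean, sd = 0.0, 1.0  # any mean and standard deviation would do
samples = [random.gauss(mean, sd) for _ in range(100_000)]

for k in (1, 2, 3):
    print(f"within {k} sd: {fraction_within(samples, mean, sd, k):.3f}")
```

The printed fractions should come out close to 0.68, 0.95 and 0.997, with small differences because the sample is random.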

## Want to join the conversation?

- At 1:28, Sal draws what looks like an upside-down capital letter 'A' to the left of 68, 95, 99.7.

What does it mean?(24 votes)
- That was an awkwardly drawn asterisk.

For the record, ∀ is a common mathematical symbol in logic that is shorthand for "for all", but that's not what Sal was drawing. :)(85 votes)

- I'm wondering: Why use the empirical rule? How can I remember those percentages? I'd love a video on this subject that connects it to the other topics in statistics and explains why to use it!(18 votes)
- You use the empirical rule because it allows you to quickly estimate probabilities when you're dealing with a normal distribution. People often create ranges using standard deviation, so knowing what percentage of cases fall within 1, 2 and 3 standard deviations can be useful.(6 votes)

- This is a bit frustrating.

I started with the "AP Statistics" course. Ran into some exercises/quizzes with terms that were never taught.

Had to switch to the "Statistics and Probability" course to learn about those terms. In that course I ran into the term "standard deviation" and had to switch to "High school statistics" to learn about it.

Now on this course we get "normal distribution", which was never taught...

Do I now have to go to ck12.org or another course on KA to learn about normal distributions?

Is there no way to do these courses in sequence?(14 votes)
- It's out of order, but you may want to start with the normal distribution review. Further down there's a video called "Deep definition of the normal distribution" in the "More on normal distributions" section, and that is labeled an intro to the normal distribution. There's definitely some weirdness with the stats stuff, though.(5 votes)

- Why is it called the empirical (something based on observations rather than a fixed formula) rule? The 68-95-99.7% distribution can be calculated through the normal distribution formula as well. How exactly is this empirical?(9 votes)
- The empirical rule is named as such because it was originally based on observation. 18th century French mathematician Abraham de Moivre flipped fair coins and tried to understand the probability of obtaining a specific number of heads from 100 coin flips. He observed that as the number of flips increased, the distribution approached a curve (the normal distribution).(8 votes)

- So, am I right to think that the % of the distribution between 1 and 2 standard deviations (on one side of the mean) is 13.5%? 95-68=27 and 27/2=13.5

So if the question were: what % of babies born are between the weights 7.3 kg and 8.4 kg? The answer would be 13.5%?(6 votes)
- Thanks Dave :)(4 votes)
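
The arithmetic in this question can be checked against the normal CDF. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, assuming the mean of 9.5 kg and standard deviation of 1.1 kg from the video; the variable names are just for illustration.

```python
from statistics import NormalDist

# Weights in kilograms, using the mean and standard deviation from the video.
weights = NormalDist(mu=9.5, sigma=1.1)

# Empirical-rule estimate: half of what lies between the 68% and 95% bands.
empirical = (95 - 68) / 2

# Same band from the normal CDF: between 2 and 1 standard deviations below the mean.
exact = (weights.cdf(8.4) - weights.cdf(7.3)) * 100

print(f"empirical rule: {empirical:.1f}%")
print(f"normal CDF:     {exact:.2f}%")
```

The CDF value comes out at about 13.6%, so the 13.5% estimate is right to within the rounding built into the rule.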

- How would the problem be different, if the question had not specified that the data was "normally distributed"?(3 votes)
- We can say almost nothing if we do not know how our data is distributed!(5 votes)

- At 3:00 Sal said "If we go one more standard deviation then, you know where this is headed, 99.7% of a deviation in that range." But why isn't it 100%?(2 votes)
- The Normal curve doesn't ever hit 0, so technically any place that we chop it off, we'll be chopping off a little bit of the probability. It so happens that at +/- 3 standard deviations we've captured 99.7% of the area, and for many folks that is close enough to being "basically everything."(5 votes)
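
To put numbers on "a little bit of the probability", the sketch below prints how much of a standard normal distribution lies beyond 3, 4, 5 and 6 standard deviations, counting both tails. It relies on `statistics.NormalDist` from the Python standard library; the helper name `outside` is made up for this example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def outside(k):
    """Probability of landing more than k standard deviations from the mean."""
    # By symmetry, both tails together are twice the lower tail.
    return 2 * z.cdf(-k)

for k in (3, 4, 5, 6):
    print(f"beyond {k} sd: {outside(k):.2e}")
```

The leftover shrinks very quickly as k grows, but it stays above zero for every k, which is why no finite number of standard deviations captures exactly 100%.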

- Is there some part of the course missing?

We just learned about Density curves in the previous lesson.

I would expect to first learn about the Normal Distribution, and then to learn about the empirical rule for the normal distribution.

But instead, this video is about how to solve exercises about the empirical rule (which hasn't been introduced yet) for the normal distribution (which also has not been introduced yet) ...(4 votes)
- I have this in High school stat, so it's likely the video has been recycled for the course you are on. If anyone needs it, here's a link to the previous vid on normal distribution: https://www.khanacademy.org/math/probability/xa88397b6:analyze-quantitative/normal-distributions-a2ii/v/ck12-org-normal-distribution-problems-qualitative-sense-of-normal-distributions

Sorry nobody answered you sooner and I hope this is helpful! :)(1 vote)

- How do we know that the empirical rule actually works?(2 votes)
- These numerical values (68 - 95 - 99.7) come from the cumulative distribution function (CDF) of the normal distribution. For example, writing F for the CDF of the standard normal distribution, F(2) = 0.9772, or Pr(x ≤ μ + 2σ) = 0.9772. Note that this is not a symmetrical interval; it is merely the probability that an observation is less than μ + 2σ. To compute the probability that an observation is within two standard deviations of the mean:

Pr(μ − 2σ ≤ x ≤ μ + 2σ) = F(2) − F(−2) = 0.9772 − (1 − 0.9772) ≈ 0.9545, or 95.45% (small differences are due to rounding).

This is related to confidence intervals as used in statistics: μ ± 2σ is approximately a 95% interval.

For μ ± σ and μ ± 3σ we can find the probability using the same method!(5 votes)
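
The same method can be sketched in a few lines of Python. This version writes the standard normal CDF in terms of the error function `math.erf`; the function names `std_normal_cdf` and `within` are chosen here for illustration only.

```python
import math

def std_normal_cdf(z):
    """F(z): standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def within(k):
    """Pr(mu - k*sigma <= x <= mu + k*sigma) = F(k) - F(-k)."""
    return std_normal_cdf(k) - std_normal_cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {within(k) * 100:.2f}%")
```

Rounded to the figures in the rule's name, these are the 68, 95 and 99.7.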

- Does the number that the standard deviation is affect the answer? If the standard deviation was a different number would the answer still be 16%?(2 votes)
- No, the answer would no longer be 16% because 9.5 - something other than 1.1 would not be 8.4. Since 8.4 would no longer be 1 standard deviation away from the mean, the answer would no longer apply.(2 votes)
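
Here is a rough numerical illustration of that point. It keeps the mean (9.5 kg) and the cutoff (8.4 kg) from the video and tries two other standard deviations, 0.55 and 2.2, which are invented purely for comparison; `share_below` is a hypothetical helper name.

```python
from statistics import NormalDist

MEAN = 9.5    # kilograms, from the video
CUTOFF = 8.4  # kilograms, the threshold in the first part of the problem

def share_below(cutoff, mean, sd):
    """Probability that a normally distributed weight is below the cutoff."""
    return NormalDist(mean, sd).cdf(cutoff)

for sd in (1.1, 0.55, 2.2):  # 1.1 is the original; the other two are made up
    z = (CUTOFF - MEAN) / sd  # how many standard deviations the cutoff is from the mean
    p = share_below(CUTOFF, MEAN, sd)
    print(f"sd={sd}: z={z:+.1f}, P(weight < {CUTOFF}) = {p * 100:.1f}%")
```

Only with a standard deviation of 1.1 does 8.4 sit exactly one standard deviation below the mean, which is what makes the 16% estimate apply.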

## Video transcript

Let's do another problem from
the normal distribution section of ck12.org's AP
Statistics book. And I'm using this
because it's open source. It's actually quite a good book. The problems are, I think,
good practice for us. So let's see, number
three, number two. You can go to their
site, and I think you can download the book. Assume that the mean weight of
one-year-old girls in the US is normally distributed with
a mean of about 9.5 grams. That's got to be kilograms. I have a 10-month-old son,
and he weighs about 20 pounds, which is about 9 kilograms. 9.5 grams is nothing. This would be if we were talking
about like mice or something. This has got to be kilograms. But anyway. It's about 9.5 kilograms
with a standard deviation of approximately 1.1 grams. So the mean is equal to 9.5
kilograms, I'm assuming, and the standard deviation
is equal to 1.1 grams. Without using a
calculator-- so that's an interesting clue--
estimate the percentage of one-year-old
girls in the US that meet the following condition. So when they say that--
"without a calculator estimate," that's a big clue
or a big giveaway that we're supposed to
use the empirical rule, sometimes called the
68, 95, 99.7 rule. And if you remember that this
is the name of the rule, you've essentially
remembered the rule. What that tells us is if we
have a normal distribution-- I'll do a bit of a
review here before we jump into this problem. If we have a normal
distribution-- let me draw a
normal distribution. So it looks like that. That's my normal distribution. I didn't draw it perfectly,
but you get the idea. It should be symmetrical. This is our mean right there. That's our mean. If we go one standard
deviation above the mean, and one standard
deviation below the mean-- so this is our mean plus
one standard deviation, this is our mean minus
one standard deviation-- the probability of
finding a result, if we're dealing with a perfect
normal distribution that's between one standard deviation
below the mean and one standard deviation above the
mean, that would be this area. And it would be-- you
could guess-- 68%, 68% chance you're
going to get something within one standard
deviation of the mean, either a standard deviation
below or above or anywhere in between. Now if we're talking about
two standard deviations around the mean--
so if we go down another standard deviation. So we go down another
standard deviation in that direction and
another standard deviation above the mean. And we were to ask
ourselves, what's the probability of finding
something within those two or within that range? Then it's, you
could guess it, 95%. And that includes this
middle area right here. So the 68% is a subset of 95%. And I think you know
where this is going. If we go three standard
deviations below the mean and above the mean, the
empirical rule, or the 68, 95, 99.7 rule tells us
that there is a 99.7% chance of finding a result
in a normal distribution that is within three standard
deviations of the mean. So above three standard
deviations below the mean, and below three standard
deviations above the mean. That's what the
empirical rule tells us. Now, let's see if we can
apply it to this problem. So they gave us the mean
and the standard deviation. Let me draw that out. Let me draw my axis
first, as best as I can. That's my axis. Let me draw my bell curve. Let me draw the bell curve. That's about as
good of a bell curve as you can expect a
freehand drawer to do. And the mean here is-- and
this should be symmetric. This height should be the
same as that height, there. I think you get the idea. I'm not a computer. 9.5 is the mean. I won't write the units. It's all in kilograms. One standard deviation
above the mean, we should add 1.1 to that. Because they told us the
standard deviation is 1.1. That's going to be 10.6. Let me just draw a
little dotted line there. One standard deviation
below the mean, we're going to
subtract 1.1 from 9.5. And so that would be, what? 8.4. If we go two standard
deviations above the mean, we would add another
standard deviation here. We went one standard deviation,
two standard deviations. That one goes to 11.7. And if we were to go
three standard deviations, we'd add 1.1 again. That would get us to 12.8. Doing that on the other
side-- one standard deviation below the mean is 8.4. Two standard deviations below
the mean, subtract 1.1 again, would be 7.3. And then three standard
deviations below the mean, it would be right there,
would be 6.2 kilograms. So that's our setup
for the problem. So what's the
probability that we would find a one-year-old
girl in the US that weighs less
than 8.4 kilograms? Or maybe I should say whose
mass is less than 8.4 kilograms. So if we look here, the
probability of finding a baby or a female baby that's
one year old with a mass or a weight of less
than 8.4 kilograms, that's this area right here. I said mass because kilograms
is actually a unit of mass. But most people use
it as weight, as well. So that's in that
area right there. So how can we
figure out that area under this normal distribution
using the empirical rule? Well, we know what this area is. We know what this area between
minus one standard deviation and plus one standard
deviation is. We know that that is 68%. And if that's 68%, then
that means in the parts that aren't in that middle
region, you have 32%. Because the area under the
entire normal distribution is 100, or 100%, or
1, depending on how you want to think about it. Because you can't have-- well,
all the possibilities combined can only add up to 1. You can't have more
than 100% there. So if you add up this leg
and this leg-- so this plus that leg is going
to be the remainder. So 100 minus 68, that's 32%. And 32% is if you add up this
left leg and this right leg over here. And this is a perfect
normal distribution. They told us it's
normally distributed. So it's going to be
perfectly symmetrical. So if this side and
that side add up to 32, but they're both
symmetrical-- meaning they have the exact
same area-- then this side right
here-- do it in pink. This side right
here-- it ended up looking more like
purple-- would be 16%. And this side right
here would be 16%. So your probability of
getting a result more than one standard deviation
above the mean-- so that's this right-hand
side-- would be 16%. Or the probability
of having a result less than one standard deviation
below the mean-- that's this, right here, 16%. So they want to know the
probability of having a baby, at one year old, less
than 8.4 kilograms. Less than 8.4 kilograms
is this area right here, and that's 16%. So that's 16% for Part
A. Let's do Part B. Between 7.3 and 11.7
kilograms-- so between 7.3, that's right there. That's two standard
deviations below the mean. And 11.7-- it's two standard
deviations above the mean. So they're essentially
asking us what's the probability of getting
a result within two standard deviations of the mean. This was the mean, right here. This is two standard
deviations below. This is two standard
deviations above. Well, that's pretty
straightforward. The empirical rule
tells us-- between two standard deviations,
you have a 95% chance of getting that result,
or a 95% chance of getting a result that is
within two standard deviations. So the empirical rule
just gives us that answer. And then finally, Part
C-- the probability of having a one-year-old US baby
girl more than 12.8 kilograms. So 12.8 kilograms is
three standard deviations above the mean. So we want to know the
probability of having a result more than three standard
deviations above the mean. So that is this area way out
there, that I drew in orange. Maybe I should do it
in a different color to really contrast it. So it's this long tail out
here, this little small area. So what is that probability? So let's turn back to
our empirical rule. Well, we know this area. We know the area between minus
three standard deviations and plus three
standard deviations. We know this. Since this is the last problem,
I can color the whole thing in. We know this area, right here--
between minus 3 and plus 3. That is 99.7%. The bulk of the results
fall under there-- I mean, almost all of them. So what do we have left
over for the two tails? Remember, there are two tails. This is one of them. And then you have
the results that are less than three
standard deviations below the mean, this
tail right there. So that tells us that this less
than three standard deviations below the mean and more than
three standard deviations above the mean combined
have to be the rest. Well, the rest--
it's only 0.3% for the rest. And these two things
are symmetrical. They're going to be equal. So this right here it has to
be half of this, or 0.15%, and this, right here,
is going to be 0.15%. So the probability of
having a one-year-old baby girl in the US that is
more than 12.8 kilograms, if you assume a perfect
normal distribution, is the area under this
curve, the area that is more than three standard
deviations above the mean. And that is 0.15%. Anyway, hope you
found that useful.
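
The three answers from the video can be reproduced with a short sketch of the empirical-rule arithmetic. It assumes, as the video does, that the problem's "grams" should be read as kilograms; the names `WITHIN` and `one_tail` are invented for this example.

```python
MEAN, SD = 9.5, 1.1  # kilograms, reading the problem's "grams" as kilograms

# Percent of a normal distribution within 1, 2 and 3 standard deviations.
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

def one_tail(k):
    """Percent beyond k standard deviations on ONE side, using symmetry."""
    return (100.0 - WITHIN[k]) / 2

# Marks on the axis: the mean plus k standard deviations for k = -3..3.
marks = [round(MEAN + k * SD, 1) for k in range(-3, 4)]
print(marks)

part_a = one_tail(1)  # less than 8.4 kg: below the mean minus 1 SD
part_b = WITHIN[2]    # between 7.3 and 11.7 kg: within 2 SD of the mean
part_c = one_tail(3)  # more than 12.8 kg: above the mean plus 3 SD
print(part_a, part_b, round(part_c, 2))
```

The marks match the ones drawn in the video (6.2, 7.3, 8.4, 9.5, 10.6, 11.7, 12.8), and the three parts come out to 16%, 95% and 0.15%.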