The empirical rule (also called the "68-95-99.7 rule") is a guideline for how data is distributed in a normal distribution. The rule states that (approximately):
- 68% of the data points will fall within one standard deviation of the mean.
- 95% of the data points will fall within two standard deviations of the mean.
- 99.7% of the data points will fall within three standard deviations of the mean.
Created by Sal Khan.
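Combined with the symmetry of the normal curve, the three percentages also tell you, without a calculator, roughly what share of the data lies below any point that is a whole number of standard deviations from the mean. The Python sketch below illustrates this. The function name `percent_below` is hypothetical, and the outputs inherit the rule's rounding, so exact normal-distribution values differ slightly.

```python
# Empirical-rule percentages: share of data within k standard deviations of the mean.
WITHIN = {0: 0.0, 1: 68.0, 2: 95.0, 3: 99.7}

def percent_below(k: int) -> float:
    """Approximate percentage of a normal distribution below mean + k*sd,
    for integer k in [-3, 3], using only the empirical rule and symmetry."""
    if k not in range(-3, 4):
        raise ValueError("the empirical rule only covers |k| <= 3")
    # Each tail outside +/-|k| SDs holds half of what is not within them.
    tail = (100.0 - WITHIN[abs(k)]) / 2
    return 100.0 - tail if k >= 0 else tail

for k in range(-3, 4):
    print(k, round(percent_below(k), 2))
```

For example, about 16% of the data lies more than one standard deviation below the mean, and about 97.5% lies below the point two standard deviations above it.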
- At 1:28, Sal draws what looks like an upside-down capital letter 'A' to the left of 68, 95, 99.7.
What does it mean? (22 votes)
- That was an awkwardly-drawn asterisk.
For the record, ∀ is a common mathematical symbol in logic that is shorthand for "for all", but that's not what Sal was drawing. :) (80 votes)
- I'm wondering: Why use the empirical rule? How can I remember those percentages? I'd love a video on this subject that connects it to the other topics in statistics and explains why to use it! (17 votes)
- You use the empirical rule because it allows you to quickly estimate probabilities when you're dealing with a normal distribution. People often create ranges using standard deviation, so knowing what percentage of cases fall within 1, 2 and 3 standard deviations can be useful. (6 votes)
- This is a bit frustrating.
I started with the "AP Statistics" course. Ran into some exercises/quizzes with terms that were never taught.
Had to switch to the "Statistics and Probability" course to learn about those terms. In that course ran into "standard deviation" term and had to switch to "High school statistics" to learn about it.
Now on this course we get "normal distribution", which was never taught...
Do I now have to go to ck12.org or another course on KA to learn about normal distributions?
Is there no way to do these courses in sequence? (12 votes)
- It's out of order, but you may want to start with the normal distribution review. Further down, there's a video called "Deep definition of the normal distribution" in the "More on normal distributions" section, and that is labeled an intro to the normal distribution. There's definitely some weirdness with the stats stuff, though. (5 votes)
- Why is it called the empirical (something based on observations rather than a fixed formula) rule? The 68-95-99.7% distribution can be calculated through the normal distribution formula as well. How exactly is this empirical? (8 votes)
- The empirical rule is named as such because it was originally based on observation. The 18th-century French mathematician Abraham de Moivre studied coin flips, trying to understand the probability of obtaining a specific number of heads from 100 flips of a fair coin. He found that as the number of flips increased, the distribution approached a curve (the normal distribution). (6 votes)
- So, am I right to think that the % of the distribution between 1 and 2 standard deviations is 13.5%? 95-68=27 and 27/2=13.5
So if the question were: what % of babies born are between the weights 7.3 kg and 8.4 kg? The answer would be 13.5%? (6 votes)
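Under the empirical rule, yes: the band between one and two standard deviations on one side of the mean holds about (95 - 68)/2 = 13.5% of the data. The short Python sketch below checks the arithmetic, assuming the video's figures (mean 9.5 kg, standard deviation 1.1 kg); the variable names are illustrative. The exact normal-distribution value for that band is slightly higher, about 13.6%.

```python
mean, sd = 9.5, 1.1  # kg, from the video's problem

# The band between 1 and 2 SDs (on one side) is half of the
# difference between the "within 2 SD" and "within 1 SD" percentages.
one_side_band = (95 - 68) / 2

low = round(mean - 2 * sd, 1)   # two standard deviations below the mean
high = round(mean - 1 * sd, 1)  # one standard deviation below the mean
print(f"{low} kg to {high} kg: about {one_side_band}%")
```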
- How would the problem be different if the question had not specified that the data was "normally distributed"? (3 votes)
- At 3:00, Sal said "If we go one more standard deviation then, you know where this is headed, 99.7% of a deviation in that range." But why isn't it 100%? (2 votes)
- The normal curve never touches 0, so technically anywhere we chop it off, we'll be chopping off a little bit of the probability. It so happens that at ±3 standard deviations we've captured 99.7% of the area, and for many folks that is close enough to being "basically everything." (5 votes)
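To make the point about the curve never reaching zero concrete, the Python sketch below computes the exact probability left in both tails after cutting a standard normal distribution at ±k standard deviations, using the standard library's `math.erfc`. The helper name `upper_tail` is just for illustration. The leftover area shrinks very quickly but stays positive; at k = 3 it is about 0.27%, the part that 99.7% leaves out.

```python
import math

def upper_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (3, 4, 5, 6):
    # Probability left in BOTH tails after cutting at +/- k SDs.
    print(k, 2 * upper_tail(k))
```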
- Is there some part of the course missing?
We just learned about Density curves in the previous lesson.
I would expect to first learn about the Normal Distribution, and then to learn about the empirical rule for the normal distribution.
But instead, this video is about how to solve exercises about the empirical rule (which hasn't been introduced yet) for the normal distribution (which also has not been introduced yet)... (4 votes)
- How do we know that the empirical rule actually works? (2 votes)
- These numerical values (68-95-99.7) come from the cumulative distribution function (CDF) of the normal distribution. Writing F(z) for the CDF of the standard normal distribution, where z counts standard deviations from the mean, we have, for example, F(2) = 0.9772, or Pr(x ≤ μ + 2σ) = 0.9772. Note that this is not a symmetrical interval; it is merely the probability that an observation is less than μ + 2σ. To compute the probability that an observation is within two standard deviations of the mean (small differences due to rounding):
Pr(μ − 2σ ≤ x ≤ μ + 2σ) = F(2) − F(−2) = 0.9772 − (1 − 0.9772) = 0.9545 or 95.45%.
This is related to confidence intervals as used in statistics: μ ± 2σ is approximately a 95% interval.
For μ ± σ and μ ± 3σ we can find the probability using the same method! (5 votes)
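The same computation can be done in Python without a printed table. This minimal sketch writes the standard normal CDF F(z) with the standard library's `math.erf`, using the identity F(z) = (1 + erf(z/√2))/2; the name `phi` is an illustrative choice. It should report roughly 68.27%, 95.45% and 99.73% for k = 1, 2, 3, which the empirical rule rounds to 68, 95 and 99.7.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, written F(z) in the comment above."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for k in (1, 2, 3):
    within = phi(k) - phi(-k)   # Pr(mu - k*sigma <= x <= mu + k*sigma)
    print(f"F({k}) = {phi(k):.4f}, within +/-{k} sd: {within:.4%}")
```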
Let's do another problem from the normal distribution section of ck12.org's AP Statistics book. I'm using this because it's open source. It's actually quite a good book, and the problems are, I think, good practice for us. So let's see, number three, number two. You can go to their site, and I think you can download the book. "Assume that the mean weight of one-year-old girls in the US is normally distributed with a mean of about 9.5 grams." That's got to be kilograms. I have a 10-month-old son, and he weighs about 20 pounds, which is about 9 kilograms. 9.5 grams is nothing; that would be if we were talking about mice or something. This has got to be kilograms. But anyway, it's about 9.5 kilograms, with a standard deviation of approximately 1.1 kilograms. So the mean is equal to 9.5 kilograms, I'm assuming, and the standard deviation is equal to 1.1 kilograms. "Without using a calculator" (so that's an interesting clue), "estimate the percentage of one-year-old girls in the US that meet the following condition." When they say "without a calculator, estimate," that's a big clue, or a big giveaway, that we're supposed to use the empirical rule, sometimes called the 68, 95, 99.7 rule. And if you remember the name of this rule, you've essentially remembered the rule. Before we jump into this problem, I'll do a bit of a review of what it tells us about a normal distribution. Let me draw a normal distribution. So it looks like that. That's my normal distribution. I didn't draw it perfectly, but you get the idea; it should be symmetrical. This is our mean right there. That's our mean. 
If we go one standard deviation above the mean and one standard deviation below the mean-- so this is our mean plus one standard deviation, and this is our mean minus one standard deviation-- then, if we're dealing with a perfect normal distribution, the probability of finding a result between one standard deviation below the mean and one standard deviation above the mean is this area. And it would be-- you could guess-- 68%. There's a 68% chance you're going to get something within one standard deviation of the mean: up to a standard deviation below, up to a standard deviation above, or anywhere in between. Now, what if we're talking about two standard deviations around the mean? So we go down another standard deviation in that direction and another standard deviation above the mean. If we were to ask ourselves, what's the probability of finding something within that range? Then it's, you could guess it, 95%. And that includes this middle area right here, so the 68% is a subset of the 95%. And I think you know where this is going. If we go three standard deviations below the mean and above the mean, the empirical rule, or the 68, 95, 99.7 rule, tells us that there is a 99.7% chance of finding a result in a normal distribution that is within three standard deviations of the mean: above three standard deviations below the mean, and below three standard deviations above the mean. That's what the empirical rule tells us. Now, let's see if we can apply it to this problem. They gave us the mean and the standard deviation, so let me draw that out. Let me draw my axis first, as best as I can. That's my axis. Now let me draw the bell curve. That's about as good of a bell curve as you can expect from a freehand drawing. This should be symmetric; this height should be the same as that height there. I think you get the idea. I'm not a computer. The mean here is 9.5. I won't write the units. 
It's all in kilograms. For one standard deviation above the mean, we add 1.1, because they told us the standard deviation is 1.1. That's going to be 10.6. Let me just draw a little dotted line there. For one standard deviation below the mean, we subtract 1.1 from 9.5, so that would be 8.4. If we go two standard deviations above the mean, we add another standard deviation: one standard deviation, two standard deviations. That gets us to 11.7. And if we go three standard deviations, we add 1.1 again, which gets us to 12.8. Doing that on the other side: one standard deviation below the mean is 8.4. Two standard deviations below the mean, subtracting 1.1 again, is 7.3. And three standard deviations below the mean, right there, is 6.2 kilograms. So that's our setup for the problem. So what's the probability that we would find a one-year-old girl in the US who weighs less than 8.4 kilograms? Or maybe I should say whose mass is less than 8.4 kilograms. If we look here, the probability of finding a one-year-old female baby with a mass, or weight, of less than 8.4 kilograms is this area right here. I said mass because the kilogram is actually a unit of mass, but most people use it for weight as well. So how can we figure out that area under this normal distribution using the empirical rule? Well, we know the area between minus one standard deviation and plus one standard deviation. We know that that is 68%. And if that's 68%, then the parts that aren't in that middle region have 32%, because the area under the entire normal distribution is 100, or 100%, or 1, depending on how you want to think about it. All the possibilities combined can only add up to 1; you can't have more than 100% there. 
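The setup above is just repeated addition and subtraction of the standard deviation. A small Python sketch of that bookkeeping, assuming (as the video does) that the problem's units are kilograms; the name `marks` is illustrative:

```python
mean, sd = 9.5, 1.1  # kg; the video assumes the problem's "grams" means kilograms

# Mark the mean plus or minus 1, 2, and 3 standard deviations.
marks = {k: round(mean + k * sd, 1) for k in range(-3, 4)}
for k, value in marks.items():
    print(f"{k:+d} sd: {value} kg")
```

Rounding to one decimal place hides the floating-point noise in values such as 9.5 - 3 * 1.1.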
So if you add up this leg and this leg, that's going to be the remainder: 100 minus 68, which is 32%. And 32% is what you get if you add up this left leg and this right leg over here. They told us it's normally distributed, so this is a perfect normal distribution, and it's going to be perfectly symmetrical. So if this side and that side add up to 32, and they're symmetrical-- meaning they have the exact same area-- then this side right here (let me do it in pink; it ended up looking more like purple) would be 16%, and this side right here would be 16%. So your probability of getting a result more than one standard deviation above the mean-- that's this right-hand side-- would be 16%. And the probability of having a result more than one standard deviation below the mean-- that's this, right here-- is also 16%. They want to know the probability of a one-year-old baby weighing less than 8.4 kilograms. Less than 8.4 kilograms is this area right here, and that's 16%. So that's 16% for Part A. Let's do Part B: between 7.3 and 11.7 kilograms. 7.3 is right there; that's two standard deviations below the mean. And 11.7 is two standard deviations above the mean. So they're essentially asking us for the probability of getting a result within two standard deviations of the mean. This was the mean, right here. This is two standard deviations below, and this is two standard deviations above. Well, that's pretty straightforward. The empirical rule tells us that there's a 95% chance of getting a result within two standard deviations of the mean, so the empirical rule just gives us that answer. And then finally, Part C: the probability of a one-year-old US baby girl weighing more than 12.8 kilograms. 12.8 kilograms is three standard deviations above the mean, so we want to know the probability of getting a result more than three standard deviations above the mean. 
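Parts A and B reduce to a little arithmetic on the rule's percentages plus the symmetry of the curve. The Python sketch below assumes only the empirical-rule figures; `WITHIN` and `one_tail` are hypothetical names chosen for illustration.

```python
# Empirical-rule percentages within 1, 2, 3 standard deviations of the mean.
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

def one_tail(k: int) -> float:
    """Percentage beyond k SDs on ONE side: half of what lies outside +/-k SDs."""
    return (100.0 - WITHIN[k]) / 2

part_a = one_tail(1)   # below 8.4 kg = more than 1 SD below the mean
part_b = WITHIN[2]     # 7.3 kg to 11.7 kg = within 2 SDs of the mean
print(f"Part A: {part_a}%  Part B: {part_b}%")
```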
So that is this area way out there, that I drew in orange. Maybe I should do it in a different color to really contrast it. It's this long tail out here, this little small area. So what is that probability? Let's turn back to our empirical rule. We know the area between minus three standard deviations and plus three standard deviations. Since this is the last problem, I can color the whole thing in. This area right here, between minus 3 and plus 3, is 99.7%. The bulk of the results, almost all of them, fall under there. So what do we have left over for the two tails? Remember, there are two tails. This is one of them, and then you have the results that are more than three standard deviations below the mean, this tail right there. So the results more than three standard deviations below the mean and the results more than three standard deviations above the mean, combined, have to be the rest. And the rest is only 0.3%. These two tails are symmetrical, so they're going to be equal. So this one right here has to be half of that, or 0.15%, and this one, right here, is going to be 0.15%. So the probability of having a one-year-old baby girl in the US that is more than 12.8 kilograms, if you assume a perfect normal distribution, is the area under the curve more than three standard deviations above the mean. And that is 0.15%. Anyway, hope you found that useful.
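Part C applies the same symmetry argument to the three-standard-deviation figure. As a sanity check, the Python sketch below also computes the exact one-sided normal tail beyond three standard deviations with the standard library's `math.erfc`; it comes out near 0.135%, which shows that the rule's 0.15% is an approximation inherited from rounding 99.73% to 99.7%.

```python
import math

within_3 = 99.7                      # empirical rule: % within 3 SDs
part_c = (100.0 - within_3) / 2      # one tail only: above 12.8 kg

# For comparison, the exact upper tail of a normal distribution beyond 3 SDs.
exact = 100 * 0.5 * math.erfc(3 / math.sqrt(2))
print(f"Part C: about {part_c:.2f}% (exact normal tail: {exact:.2f}%)")
```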