
© 2022 Khan Academy

# Central limit theorem

AP.STATS: UNC‑3 (EU), UNC‑3.H.2 (EK), UNC‑3.H.3 (EK)

Introduction to the central limit theorem and the sampling distribution of the mean. Created by Sal Khan.

## Video transcript

In this video, I want to
talk about what is easily one of the most fundamental and
profound concepts in statistics and maybe in all of mathematics. And that's the
central limit theorem. And what it tells us
is we can start off with any distribution that
has a well-defined mean and variance-- and if it has
a well-defined variance, it has a well-defined
standard deviation. And it could be a continuous
distribution or a discrete one. I'll draw a discrete one,
just because it's easier to imagine, at least for
the purposes of this video. So let's say I have a discrete
probability distribution function. And I want to be
very careful not to make it look anything close
to a normal distribution. Because I want to show you
the power of the central limit theorem. So let's say I have
a distribution. Let's say it could take
on values 1 through 6. 1, 2, 3, 4, 5, 6. It's some kind of crazy dice. It's very likely to get a one. Let's say it's
impossible-- well, let me make that
a straight line. You have a very high
likelihood of getting a 1. Let's say it's
impossible to get a 2. Let's say it's an OK likelihood
of getting a 3 or a 4. Let's say it's
impossible to get a 5. And let's say it's very
likely to get a 6 like that. So that's my probability
distribution function. If I were to draw a
mean-- this is symmetric, so maybe the mean would
be something like that. The mean would be halfway. So that would be my
mean right there. The standard
deviation maybe would look-- it would be
that far and that far above and below the mean. But that's my discrete
probability distribution function. Now what I'm going to do
here, instead of just taking samples of this
random variable that's described by this probability
distribution function, I'm going to take samples of it. But I'm going to
average each set of samples and then
see the frequency of the
averages that I get. And when I say average,
I mean the mean. Let me define something. Let's say my sample size-- and
I could put any number here. But let's say first off we try a
sample size of n is equal to 4. And what that means is I'm going
to take four samples from this. So let's say the first
time I take four samples-- so my sample size is
four-- let's say I get a 1. Let's say I get another 1. And let's say I get a 3. And I get a 6. So that right there is my
first sample of sample size 4. I know the terminology
can get confusing. Because this is the sample
that's made up of four samples. But then when we talk about the
sample mean and the sampling distribution of the
sample mean, which we're going to talk more and more
about over the next few videos, normally the sample refers
to the set of samples from your distribution. And the sample size tells
you how many you actually took from your distribution. But the terminology
can be very confusing, because you could easily view
one of these as a sample. But we're taking four
samples from here. We have a sample size of four. And what I'm going to do is
I'm going to average them. So let's say the mean-- I
want to be very careful when I say average. The mean of this first
sample of size 4 is what? 1 plus 1 is 2. 2 plus 3 is 5. 5 plus 6 is 11. 11 divided by 4 is 2.75. That is my first sample mean
for my first sample of size 4. Let me do another one. My second sample of size 4,
let's say that I get a 3, a 4. Let's say I get another 3. And let's say I get a 1. I just didn't happen
to get a 6 that time. And notice I can't
get a 2 or a 5. It's impossible for
this distribution. The chance of getting
a 2 or 5 is 0. So I can't have any
2s or 5s over here. So for the second
sample of sample size 4, my second sample mean is
going to be 3 plus 4 is 7. 7 plus 3 is 10 plus 1 is 11. 11 divided by 4,
once again, is 2.75. Let me do one more,
because I really want to make it clear
what we're doing here. So I do one more. Actually, we're going
to do a gazillion more. But let me just do
one more in detail. So let's say my third
sample of sample size 4-- so I'm going to
literally take 4 samples. So my sample is
made up of 4 samples from this original
crazy distribution. Let's say I get a 1,
a 1, and a 6 and a 6. And so my third sample mean
is going to be 1 plus 1 is 2. 2 plus 6 is 8. 8 plus 6 is 14. 14 divided by 4 is 3 and 1/2. And as I find each
of these sample means-- so for each of my
samples of sample size 4, I figure out a mean. And as I do each
of them, I'm going to plot it on a
frequency distribution. And this is all going to
amaze you in a few seconds. So I plot this all on a
frequency distribution. So I say, OK, on
my first sample, my first sample mean was 2.75. So I'm plotting the actual
frequency of the sample means I get for each sample. So 2.75, I got it one time. So I'll put a little plot there. So that's from that
one right there. And the next time,
I also got a 2.75. That's a 2.75 there. So I got it twice. So I'll plot the
frequency right there. Then I got a 3 and 1/2. So all the possible values,
I could have a 3, I could have a 3.25, I
could have a 3 and 1/2. So then I have the 3 and 1/2,
so I'll plot it right there. And what I'm going
to do is I'm going to keep taking these samples. Maybe I'll take 10,000 of them. So I'm going to keep
taking these samples. So I go all the way to S 10,000. I just do a bunch of these. And what it's going to look like
over time is each of these-- I'm going to make it
a dot, because I'm going to have to zoom out. So if I look at it like
this, over time-- it still has all the values that it
might be able to take on, 2.75 might be here. So this first dot is
going to be-- this one right here is going
to be right there. And that second one is
going to be right there. Then that one at 3.5 is
going to be right there. But I'm going to
do it 10,000 times. Because I'm going
to have 10,000 dots. And let's say as I do it, I'm
going to just keep plotting them. I'm just going to keep
plotting the frequencies. I'm just going to
keep plotting them over and over and over again. And what you're going
to see is, as I take many, many samples
of size 4, I'm going to have
something that's going to start kind of approximating
a normal distribution. So each of these dots represents
an instance of a sample mean. So as I keep adding on
this column right here, that means I kept getting
the sample mean 2.75. So over time, I'm going to have
something that's starting to approximate
a normal distribution. And that is a neat thing about
the central limit theorem. So in orange, that's the
case for n is equal to 4. This was a sample size of 4. Now, if I did the same thing
with a sample size of maybe 20-- so in this case, instead
of just taking 4 samples from my original crazy
distribution, every sample I take 20 instances
of my random variable, and I average those 20. And then I plot the
sample mean on here. So in that case,
I'm going to have a distribution that
looks like this. And we'll discuss
this in more videos. But it turns out if I were
to plot 10,000 of the sample means here, I'm going
to have something that, two things-- it's going
to even more closely approximate a normal distribution. And we're going to
see in future videos, it's actually going to
have a smaller-- well, let me be clear. It's going to have
the same mean. So that's the mean. This is going to
have the same mean. But it's going to have a
smaller standard deviation. Well, I should plot
these from the bottom because you kind of stack them. Once you get one, then another
instance and another instance. But this is going to
more and more approach a normal distribution. So this is what's super
cool about the central limit theorem. As your sample size
becomes larger-- or you could even say as
it approaches infinity. But you really don't
have to get that close to infinity to really get
close to a normal distribution. Even if you have a
sample size of 10 or 20, you're already getting very
close to a normal distribution, in fact about as
good an approximation as we see in our everyday life. But what's cool is we can start
with some crazy distribution. This has nothing to do
with a normal distribution. This was n equals 4, but if
we have a sample size of n equals 10 or n
equals 100, and we were to take 100 of these,
instead of four here, and average them and
then plot that average, the frequency of it, then we
take 100 again, average them, take the mean, plot
that again, and if we do that a bunch
of times, in fact, if we were to do that
an infinite number of times, we would find that
we, especially if we had an
infinite sample size, we would find a perfect
normal distribution. That's the crazy thing. And it doesn't apply just
to taking the sample mean. Here we took the
sample mean every time. But you could have also
taken the sample sum. The central limit theorem
would have still applied. But that's what's so
super useful about it. Because in life, there's all
sorts of processes out there, proteins bumping into
each other, people doing crazy things, humans
interacting in weird ways. And you don't know the
probability distribution functions for any
of those things. But what the central
limit theorem tells us is if we add a
bunch of those actions together, assuming that they're independent and
all have the same distribution, or if we were to take the
mean of all of those actions together, and if we were to plot
the frequency of those means, we do get a normal distribution. And that's frankly why the
normal distribution shows up so much in statistics
and why, frankly, it's a very good
approximation for the sum or the means of a
lot of processes. What I'm going to show you in
the next video is I'm actually going to show you that this is
a reality, that as you increase your sample size, as
you increase your n, and as you take a
lot of sample means, you're going to have a frequency
plot that looks very, very close to a normal distribution.
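
The experiment described in this transcript can be sketched in Python. This is a minimal sketch under stated assumptions, not the simulation used in the next video: the weights in `WEIGHTS` are made-up values chosen only to match the description (1 and 6 very likely, 3 and 4 moderately likely, 2 and 5 impossible), and the names `sample_mean` and `simulate_sample_means` are hypothetical.

```python
import random
import statistics

# Assumed probabilities for the "crazy die". The transcript gives no numbers,
# so these weights are illustrative only: 2 and 5 can never come up.
VALUES = [1, 2, 3, 4, 5, 6]
WEIGHTS = [0.35, 0.0, 0.15, 0.15, 0.0, 0.35]

# Mean of the assumed original distribution (symmetric, so it sits halfway).
POP_MEAN = sum(v * w for v, w in zip(VALUES, WEIGHTS))


def sample_mean(n, rng):
    """Take one sample of size n from the die and return its mean."""
    draws = rng.choices(VALUES, weights=WEIGHTS, k=n)
    return sum(draws) / n


def simulate_sample_means(n, num_samples=10_000, seed=0):
    """Return num_samples sample means, each computed from a sample of size n."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    return [sample_mean(n, rng) for _ in range(num_samples)]


if __name__ == "__main__":
    print(f"mean of the original distribution = {POP_MEAN:.3f}")
    for n in (4, 20):
        means = simulate_sample_means(n)
        print(
            f"n={n}: mean of sample means = {statistics.mean(means):.3f}, "
            f"standard deviation of sample means = {statistics.pstdev(means):.3f}"
        )
```

Running it prints the mean and standard deviation of the 10,000 sample means for n = 4 and n = 20. Under these assumed weights, both means should land near the mean of the original distribution, and the spread for n = 20 should be noticeably smaller than for n = 4, which is the behavior described above. Plotting a histogram of each list would show the bell shape.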