# Sampling distribution of the sample mean

The central limit theorem and the sampling distribution of the sample mean. Created by Sal Khan.

## Video transcript

In the last video, we learned
about what is, quite possibly, the most profound
idea in statistics. And that's the central
limit theorem. And the reason why it's so neat
is we can start with any distribution that has a well
defined mean and variance. Actually, I made a mistake--
I wrote the standard deviation in the last video. That should be the mean. And let's say it
has some variance. I could write it like that. Or I could write the
standard deviation there. But as long as it has a well
defined mean and standard deviation, I don't care what
the distribution looks like. What I can do is take
samples, in the last video, of say size 4. So that means I take,
literally, four instances of this random variable. This is one example. I take their mean. And I consider this the sample
mean from my first trial. Or, you could almost say,
for my first sample. I know it's very confusing,
because you can consider the whole set to be a sample, or you can
consider each member of the set as a sample. So that can be a little
bit confusing there. But I have this
first sample mean. And then I keep doing
that over and over. In my second sample,
my sample size is 4. I got four instances of
this random variable. I average them. I have another sample mean. And the cool thing about the
central limit theorem is, as I keep plotting the frequency
distribution of my sample means, it starts to approach
something that approximates the normal distribution. And it's going to do a better
job of approximating that normal distribution
as n gets larger. And just so we have a little
terminology on our belt, this frequency distribution right
here that I plotted out. Or here or up here, that
I started plotting out. That is called-- And it's kind
of confusing because we use the word sample so much. That is called the
sampling distribution of the sample mean. And let's dissect
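The procedure described above is easy to sketch in code: repeatedly draw a sample of size 4, record each sample's mean, and the recorded means form the sampling distribution of the sample mean. In this sketch the exponential distribution is just a stand-in for "any non-normal distribution with a well defined mean and variance"; it is not the distribution from the video:

```python
import random
import statistics

random.seed(0)  # reproducible draws

def sampling_distribution_of_mean(n, trials):
    """Draw `trials` samples of size `n` from a right-skewed
    (clearly non-normal) distribution and record each sample mean."""
    means = []
    for _ in range(trials):
        sample = [random.expovariate(1.0) for _ in range(n)]  # one sample of size n
        means.append(statistics.mean(sample))                 # one sample mean
    return means

means = sampling_distribution_of_mean(n=4, trials=10_000)

# The sampling distribution is centered on the population mean,
# which for an exponential(1) distribution is 1.
print(statistics.mean(means))
```

Plotting a histogram of `means` would show the bell shape forming, even though the individual draws are heavily skewed.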
this a little bit. Just so that this long
description of this distribution starts to make
a little bit of sense. When we say it's the sampling
distribution, that's telling us that it's being derived from--
It's the distribution of some statistic, which in this case,
happens to be the sample mean. And we're deriving it from
samples of an original distribution. So each of these-- So
this is my first sample. My sample size is 4. I'm using the
mean as my statistic. I actually could have done
it with other things. I could have done the mode or
the range or other statistics. But the sampling distribution
of the sample mean is the most common one. It's probably, in my mind, the best
place to start learning about the central limit theorem. And even, frankly,
about sampling distributions. So that's what it's called. And just as a little bit of
background-- And I'll prove this to you experimentally,
not mathematically. But I think the experimental
proof is, on some level, more satisfying in statistics. This will have the same
mean as your original distribution right here. So it has the same mean. But we'll see in the next video
that this is actually going to be-- It's going to
start approximating a normal distribution. Even though my original
distribution that this is kind of generated from is
completely non-normal. So let's do that with
this app right here. And just to give proper credit
where credit is due, this is-- I think it was developed
at Rice University. This is from
onlinestatbook.com. And this is their app, which I
think is a really neat app, because it really helps you to
visualize what a sampling distribution of the
sample mean is. So I can literally create my
own custom distribution here. So let me make something
kind of crazy. So you can do this in theory
with a discrete or a continuous probability density function. But what they have here could
take on 1 of 32 values. And I'm just going to set the
different probabilities of getting any of those 32 values. So clearly this right here is
not a normal distribution. It looks a little bit bimodal,
but it doesn't have long tails. But what I want to do is first
just use a simulation to understand, or to better
understand, what the sampling distribution is all about. So what I'm going to do
is take-- We'll start with 5 at a time. So my sample size
is going to be 5. And so when I click animate,
what it's going to do is it's going to take five samples
from this probability distribution function. It's going to take five samples
and you're going to see them when I click animate. It's going to average them and
plot the average down here. And then I'm going
to click it again. It's going to do it again. So there you go. I got five samples from there. It averaged them. And it hit there. What did I just do? I clicked-- Oh. I wanted to clear that. Let me make this
bottom one none. So let me do that over again. So I'm going to
take 5 at a time. So I took five samples
from up here. And then it took its mean. And plotted the mean there. Let me do it again. Five samples from this
probability distribution function, plotted
it right there. I could keep doing-- It'll take
some time, but, as you can see, I plotted it right there. Now, I could do this
a thousand times. It's going to take forever. Let's say I just wanted
to do it 1,000 times. So it's-- This program, just
to be clear, it's actually generating the random numbers. This isn't like a
rigged program. It's actually going to generate
the random numbers according to this probability
distribution function. It's going to take five at
a time, find their means and plot the means. So if I click 10,000, it's
going to do that 10,000 times. So it's going to take 5 numbers
from here 10,000 times. And find their means
10,000 times. And then plot the
10,000 means here. So let's do that. So there you go. Notice, it's already looking a
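What the applet is doing can be sketched with `random.choices`: draw 5 values at a time from a discrete distribution over 32 values, average them, and repeat 10,000 times. The weights below are made-up placeholders, not the actual bimodal shape drawn in the video:

```python
import random
import statistics

random.seed(1)

# A hypothetical custom discrete distribution over 32 values
# (placeholder weights; in the applet these are set by hand).
values = list(range(32))
weights = [1] * 10 + [5] * 6 + [1] * 6 + [5] * 6 + [1] * 4

def one_sample_mean(n=5):
    """Take n draws according to the weights and average them."""
    return statistics.mean(random.choices(values, weights=weights, k=n))

sample_means = [one_sample_mean() for _ in range(10_000)]

# The mean of the sample means lands near the population mean.
pop_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
print(pop_mean, statistics.mean(sample_means))
```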
lot like a normal distribution. And, like I said, the
original mean of my crazy distribution here was 14.45. And after doing
10,000 samples, or 10,000 trials, the mean of my sample means here is 14.42. So I'm already getting pretty
close to the mean there. My standard deviation,
you might notice, is less than that. We'll talk about that
in a future video. And this skew and kurtosis. These are ideas-- These are
things that help us measure how normal a distribution is. And I've talked a little
bit about it in the past. And let me actually just
diverge a little bit. Just so it's interesting. And they're fairly
straightforward concepts. Skew literally tells-- So
if this is-- Let me do it in a different color. If this is a perfect normal
distribution, and clearly my drawing is very
far from perfect. If that's a perfect
distribution, this would have a skew of 0. If you have a positive skew,
that means you have a larger right tail than you
would otherwise expect. So something with a positive
skew might look like this. It would have a large
tail to the right. So this would be a positive
skew, which makes it a little less than ideal
for a normal distribution. And a negative skew
would look like this. It has a long tail to the left. So negative skew might
look like that. So that is a negative skew. If you have trouble remembering
it, just remember which direction the tail is going. This tail is going towards
the negative direction. This tail is going to
the positive direction. So something has no skew, that
means that it's nice and symmetrical around its mean. Now kurtosis, which sounds like
a very fancy word, is similarly not that fancy of an idea. Kurtosis. So, once again, if I were
to draw a perfect normal distribution-- Remember, there
is no one normal distribution. You could have different
means and different standard deviations. Let's say that's a perfect
normal distribution. If I have positive kurtosis,
what's going to happen is, I'm going to have fatter tails. Let me draw it a little
nicer than that. I'm going to have fatter
tails, but I'm going to have a more pointy peak. I didn't have to draw
it that pointy. Let me draw it like this. I'm going to have fatter tails,
and I'm going to have a more pointy peak than
a normal distribution. So this, right here,
is positive kurtosis. So something that has positive
kurtosis, depending on how positive it is, it tells you
it's a little bit more pointy than a real normal
distribution. Positive kurtosis. And negative kurtosis has
smaller tails, but it's smoother near the middle. So it's like this. So something like this would
have negative kurtosis. So maybe in future videos,
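Both measures are standardized moments of the distribution: skew is the third, and kurtosis the fourth (the applet reports excess kurtosis, the value minus the 3 that a normal distribution would have). A sketch in plain Python, using exponential draws as a hypothetical long-right-tail example, so both values come out positive:

```python
import random
import statistics

random.seed(2)

def standardized_moment(data, k):
    """k-th standardized moment: the average of ((x - mean) / sd) ** k."""
    mu = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mu) / sd) ** k for x in data) / len(data)

# Exponential draws have a long right tail, hence positive skew.
data = [random.expovariate(1.0) for _ in range(50_000)]

skew = standardized_moment(data, 3)                  # > 0: long right tail
excess_kurtosis = standardized_moment(data, 4) - 3   # > 0: fatter tails than normal
print(skew, excess_kurtosis)
```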
we'll explore that in more detail. But in the context of the
simulation, it's just telling us how normal
this distribution is. So when our sample size was n
equal to 5, and we did 10,000 trials, we got pretty close
to a normal distribution. Let's do another 10,000 trials
just to see what happens. It looks even more like
a normal distribution. Our mean is now the
exact same number. But we still have a
little bit of skew and a little bit of kurtosis. Now let's see what happens if
we were to do the same thing with a larger sample size. And we could actually do
them simultaneously. So here's n equals 5. And here, let's do n equals 25. Let me clear them. I'm going to do the sample--
sampling distribution of the sample mean. And I'm going to run 10,000
trials-- So I'll do one animated trial, just so you
remember what's going on. So I'm literally taking first
5 samples from up here. Find their mean. Now I'm taking 25
samples from up here. Find its mean. And then plotting it down here. So here the sample size is 25. Here it's 5. I'll do it one more time. I take 5, get the
mean, plot it. Take 25, get the mean, and
then plot it down there. This is a larger sample size. Now that thing that I just did,
I'm going to do 10,000 times. And that's interest-- Remember,
our first distribution was just this really crazy, very
non-normal distribution. But once we did it-- whoops. I didn't want to
make it that big. But once we-- Scroll
up a little bit. So here, what's interesting. They both look a little normal. But if you look at the skew
and the kurtosis when our sample size is larger,
it's more normal. This has a lower skew than when
our sample size was only 5. And it has a less negative
kurtosis than when our sample size was 5. So this is a more
normal distribution. And, one thing that we're going
to explore further in a future video, is not only is it more
normal in its shape, but it's also a tighter fit
around the mean. And you can even think about
why that kind of makes sense. When your sample size is
larger, your odds of getting really far away from
the mean are lower. Because it's very
unlikely, if you're taking 25 samples or 100 samples, that
you're just going to get a bunch of stuff way out here, a
bunch of stuff way out here. You're very likely to get a
reasonable spread of things. So it makes sense that your
mean-- your sample mean is less likely to be far
away from the mean. We're going to talk a
little bit more about that in the future. But hopefully this kind
of satisfies you, at least experimentally. I haven't proven it to you with
mathematical rigor, which hopefully we'll do
in the future. But hopefully this satisfies
you, at least experimentally, that the central limit theorem
really does apply to any distribution. I mean this is a
crazy distribution. I encourage you to use this
applet at onlinestatbook.com and experiment with other
crazy distributions to believe it for yourself. But the interesting thing is
that we're approaching a normal distribution, and as my sample
size gets larger, it's a better fit for a normal distribution.
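The tighter fit for larger sample sizes can be checked numerically. A sketch, again using exponential(1) draws (population standard deviation 1) as a hypothetical source distribution; the spread of the sample means shrinks roughly like sigma divided by the square root of n:

```python
import math
import random
import statistics

random.seed(3)

def sd_of_sample_means(n, trials=10_000):
    """Standard deviation of the sampling distribution for sample size n."""
    means = [
        statistics.mean(random.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

sd_5 = sd_of_sample_means(5)    # near 1 / sqrt(5), about 0.45
sd_25 = sd_of_sample_means(25)  # near 1 / sqrt(25), which is 0.2
print(sd_5, sd_25)
```

So the n = 25 sampling distribution is both closer to normal and noticeably tighter around the mean than the n = 5 one.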