We hopefully now have a
respectable working knowledge of the sampling distribution
of the sample mean. And what I want to do in this
video is explore a little bit more on how that distribution
changes as we change our sample size, n. I'll write n down right here. Our sample size n. So just as a bit of review,
we saw before, we could just start off with any
crazy distribution. Maybe it looks
something like this. I'll do a discrete distribution. Really, to model
anything, at some point, you have to make it discrete. It could be a very granular
discrete distribution, but let's say something
crazy that looks like this. This is clearly not a
normal distribution. But we saw in the first
video, if you take, let's say, sample sizes of four. So if you took four numbers
from this distribution, four random numbers where,
let's say, this is the probability of a
1, 2, 3, 4, 5, 6, 7, 8, 9. If you took four numbers
at a time and averaged them-- let me do that here. If you took four
numbers at a time, let's say we use this
distribution to generate four random numbers. Right? We're very likely to get a 9. We're definitely not going
to get any 7's or 8's. We're definitely not
going to get a 4. We might get a 1 or 2. 3 is also very likely. 5 is very likely. So we use this function
to essentially generate random numbers for us. And we take samples of four,
and then we average them up. So let's say our first
average is, I don't know, let's say it's a 9, it's
a 5, it's another 9, and then it's a 1. So what is that? That's 14 plus 10, which is 24, and 24 divided by 4 is 6. So the average for this first trial, for this first sample of four, is going to be 6. So we would plot it right here. Our average was 6 that time. Just like that. And we'll just keep doing it. And we've seen in the past that,
if you just keep doing this, this is going to start
looking something like a normal distribution. So maybe we do it again,
the average 6 again. Maybe we do it again,
the average is 5. We do it again,
the average is 7. We do it again,
the average is 6. And then if you just do
this a ton, a ton of times, your distribution
might look something that looks very much like
a normal distribution. So these boxes are really small. So we just do a bunch
of these trials. At some point, it might look a
lot like a normal distribution. Obviously, there are some average values it can never reach, so it won't be a perfect normal distribution. You can never get anything less than 1 as an average, because 1 is the smallest value in the distribution-- you can't get 0 as an average. And you can't get
anything more than 9. So it's not going to have infinitely long tails but, at least for the middle part of it, a normal distribution might be a good approximation.
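If you want to try this yourself, here's a minimal sketch of the experiment in Python. The distribution is an assumption on my part, made up in the spirit of the drawing, with zero probability at 4, 7, and 8 and a big spike at 9; any non-normal distribution will show the same effect.

```python
import random
from collections import Counter

# Made-up probabilities standing in for the "crazy" distribution above:
# nothing at 4, 7, or 8, and a big spike at 9.
values = [1, 2, 3, 5, 6, 9]
weights = [0.10, 0.10, 0.20, 0.20, 0.15, 0.25]

def sample_mean(n):
    """Draw n random numbers from the distribution and average them."""
    draws = random.choices(values, weights=weights, k=n)
    return sum(draws) / n

# Repeat the n = 4 trial many times and tally the averages in 0.5-wide bins.
trials = [sample_mean(4) for _ in range(10_000)]
bins = Counter(round(m * 2) / 2 for m in trials)
for b in sorted(bins):
    print(f"{b:4.1f} | {'#' * (bins[b] // 25)}")
```

The printout is a rough text histogram; even starting from this lumpy distribution, the averages pile up in a single bell-shaped mound around the mean.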
In this video, what I want to think about is what happens as we change n. So in this case, n was 4. n is our sample size. Every time we did a trial, we took four numbers, took their average, and plotted it. We could have had n equal 10. We could've taken 10 samples
from this population, you could say, or from
this random variable, averaged them, and
then plotted them here. And in the last video,
we ran the simulation. I'm going to go back to
that simulation in a second. We saw a couple of things. And I'll show it to you
in a little bit more depth this time. When n is pretty
small, it doesn't approach a normal
distribution that well. So when n is small-- let's
take the extreme case. What happens when
n is equal to 1? And that literally
just means I take one instance of this random
variable and average it. Well, it's just going
to be that thing. So if I just take a bunch
of trials from this thing and plot it over time,
what's it look like? Well, it's definitely
not going to look like a normal distribution. You're going to have
a couple of 1's. You're going to have
a couple of 2's. You're going to have
more 3's like that. You're going to have no 4's. You're going to
have a bunch of 5's. You're going to have some
6's that look like that. And you're going to
have a bunch of 9's. So there, your sampling
distribution of the sample mean for an n of 1
is going to look-- I don't care how
many trials you do, it's not going to look
like a normal distribution. So the central limit theorem, although I said that if you do a bunch of trials it'll look like a normal distribution, definitely doesn't work for n equals 1.
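You can check this degenerate case with the same kind of sketch (again, the probabilities are my assumption): with n equal to 1, each "sample mean" is just a single draw, so the tally reproduces the original distribution rather than a bell curve.

```python
import random
from collections import Counter

values = [1, 2, 3, 5, 6, 9]                    # assumed distribution, as before
weights = [0.10, 0.10, 0.20, 0.20, 0.15, 0.25]

# With n = 1, averaging one number changes nothing, so the histogram of
# "sample means" converges to the original distribution itself.
tally = Counter(random.choices(values, weights=weights, k=10_000))
for v in sorted(tally):
    print(v, tally[v] / 10_000)                # roughly the weights we started with
```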
As n gets larger, though, it starts to make sense. Let's see, if we've got n equals 2-- and I'm just doing this in my head. I don't know what the actual
distributions would look like. But it still would be difficult for it to become an exact normal distribution. You could get more possible values-- you might get averages built from any of the values above. But in each of your baskets that you're averaging, you're only going
to get two numbers. For example, you're never going to get an 8 in your sampling distribution of the sample mean for n equal to 2. The only two numbers that average to 8 would be a 7 with a 9, or an 8 with an 8, and it's impossible to get a 7, and it's impossible to get an 8. So you're never going to get 8 as an average. So maybe when you plot it, it looks like this, but there'll be a gap at 8 because that's impossible. And maybe it looks something like that. So it still won't be a normal distribution when n is equal to 2.
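You can see exactly which averages are reachable by brute force. Using the same assumed support as before (1, 2, 3, 5, 6, and 9), this enumerates every possible mean of two draws:

```python
from itertools import product

values = [1, 2, 3, 5, 6, 9]   # assumed support: 4, 7, and 8 can never be drawn

# Every possible mean of two draws from the distribution.
reachable = sorted({(a + b) / 2 for a, b in product(values, repeat=2)})
print(reachable)              # 8.0 is absent: it would need 7+9 or 8+8
print(8.0 in reachable)       # False
```

That hard gap in the support is one reason a small n can never give you an exact normal distribution, no matter how many trials you run.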
So there's a couple of interesting things here. One thing I didn't mention the first time, just because I really wanted you to get a gut sense of what the central limit theorem is: the central limit theorem
says that, really, only as n approaches infinity do you get the true normal distribution. But in kind of
everyday practice, you don't have to get that
much beyond n equals 2. If you get to n equals
10 or n equals 15, you're getting very close
to a normal distribution. So this converges to a normal
distribution very quickly. Now, the other thing
is you obviously want many, many trials. So this is your sample size. That is your sample size. That's the size of
each of your baskets. In the very first
video I did on this, I took a sample size of 4. In the simulation I
did in the last video, we did sample sizes of 4
and 10 and whatever else. This is a sample size of 1. So that's our sample size. So as that approaches
infinity, your actual sampling distribution of the
sample mean will approach a normal distribution. Now, in order to actually
see that normal distribution, and actually to
prove it to yourself, you would have to do this many, many times. And remember-- this up here is kind of the population, or the random variable. That tells you all
of the possibilities. In real life, we seldom know
all of the possibilities. In fact, in real life, we
seldom know the pure probability generating function. Only if we're kind
of writing it, if we're writing a
computer program. Normally we're doing
samples, and we're trying to estimate things. So normally, there's
some random variable. And then maybe we take
a bunch of samples. We take their means,
and we plot them. And then we're going to get some
type of normal distribution. Let's say we take samples
of 100 and we average them. We're going to get some
normal distribution. And in theory, as we take those
averages hundreds or thousands of times, our data set is going
to more closely approximate that pure sampling distribution
of the sample mean. This thing is a
real distribution. It's a real distribution
with a real mean. It has a pure mean. So the mean of the sampling distribution of the sample mean, we'll write it like that: mu with an x-bar subscript. Notice I didn't write it as just the x with a bar on top-- the mu is saying that this is a real population mean. This is a real random variable mean. If you looked at every
possibility of all of the samples that you
can take from your original distribution, from some other
random original distribution, and you just took all of the possible samples of a given size-- let's say we're dealing with a world where the sample size is 10. If you took all of the
combinations of 10 samples from some original distribution
and you averaged them out, this would describe
that function. Of course, in
reality, if you don't know the original
distribution, you can't take infinite
samples from it. So you won't know
every combination. But if you did the trial 1,000 times-- so 1,000 times you took 10 samples from some distribution, took the 1,000 averages, and then plotted them-- you're going to get pretty close.
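Here's that idea made concrete. With a small discrete distribution you really can enumerate every possible sample. This sketch uses the same assumed six-point distribution and n = 3 rather than 10, purely to keep the enumeration tiny (n = 10 would mean 6^10, about 60 million, ordered samples); the probability-weighted mean of all the sample means comes out exactly equal to the population mean.

```python
import math
from itertools import product

values = [1, 2, 3, 5, 6, 9]                    # assumed distribution, as before
weights = [0.10, 0.10, 0.20, 0.20, 0.15, 0.25]
pmf = dict(zip(values, weights))

population_mean = sum(v * p for v, p in pmf.items())

# Enumerate EVERY ordered sample of size n, weight each sample mean by the
# probability of drawing that sample, and add it all up.
n = 3
mean_of_sample_means = sum(
    (sum(s) / n) * math.prod(pmf[x] for x in s)
    for s in product(values, repeat=n)
)
print(population_mean)        # 5.05 with these assumed weights
print(mean_of_sample_means)   # identical, up to floating-point rounding
```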
Now, the next thing I want to touch on is what happens as n-- we know
as n approaches infinity, it becomes more of a
normal distribution. But as I said already, n
equals 10 is pretty good. And n equals 20 is even better. But we saw something
in the last video that at least I find
pretty interesting. Let's say we start with this
crazy distribution up here. It really doesn't
matter what distribution we're starting with. We saw in the simulation
that when n is equal-- let's say n is equal to 5. Our graph, after we take
samples of five, average them, and we do it 10,000
times, our graph looks something like this. It's kind of wide like that. And then when we did
n is equal to 10, our graph looked a little
bit-- it was actually a little bit squeezed in,
like that, a little bit more. So not only was it
more normal-- that's what the central limit
theorem tells us, because we're taking
larger sample sizes-- but it had a smaller standard
deviation or a smaller variance. The mean is going to be
the same, either case. But when our sample
size was larger, our standard deviation
became smaller. In fact, our standard
deviation became smaller than our original
population distribution, or our original probability
density function. Let me show you that
with a simulation. So let me clear everything. And this simulation
is as good as any. So the first thing I want to
show-- or this distribution is as good as any. The first thing I want to
show you is that n of 2 is really not that good. So let's compare an n of 2
to, let's say, an n of 16. So when you compare an n of 2 to an n of 16-- let's do it once. So you get one, two draws. You average them, and then it's going to take 16 draws, average those, and plot the results down here. Let's do that 10,000 times. So notice when you
took an n of 2, even though we did
it 10,000 times, this is not approaching
a normal distribution. And you can actually see it in the skew and kurtosis numbers. It has a rightward positive skew, which means it has a longer tail to the right than to the left. And then it has a negative kurtosis, which means it has thinner tails and a lower, flatter peak than a standard normal distribution. Now, when n is equal to 16 and
you do the same-- so every time we took 16 samples from this
distribution function up here and averaged them--
and each of these dots represents an average. We did it 10,001 times. Now notice, the mean is the same in both places. But here, all of a sudden, our kurtosis is much smaller in magnitude, and our skew is much smaller. So we are more normal in this situation. But an even more interesting thing is that our standard deviation is smaller. This is more squeezed in than that is. And it's definitely more squeezed in than our original distribution.
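If you don't have the applet in front of you, a few lines of Python reproduce the comparison. This sketch (same assumed distribution as earlier) simulates both sampling distributions and computes skew and excess kurtosis from the central moments; the n = 16 numbers come out much closer to zero, and its standard deviation comes out smaller.

```python
import random

values = [1, 2, 3, 5, 6, 9]                    # assumed distribution, as before
weights = [0.10, 0.10, 0.20, 0.20, 0.15, 0.25]

def sampling_distribution(n, trials=10_000):
    """Simulate `trials` sample means, each from a sample of size n."""
    return [sum(random.choices(values, weights=weights, k=n)) / n
            for _ in range(trials)]

def shape(data):
    """Mean, std dev, skew, and excess kurtosis via central moments."""
    mean = sum(data) / len(data)
    m2 = sum((x - mean) ** 2 for x in data) / len(data)
    m3 = sum((x - mean) ** 3 for x in data) / len(data)
    m4 = sum((x - mean) ** 4 for x in data) / len(data)
    return mean, m2 ** 0.5, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

for n in (2, 16):
    mean, sd, skew, kurt = shape(sampling_distribution(n))
    print(f"n={n:2d}  mean={mean:.2f}  sd={sd:.2f}  "
          f"skew={skew:+.2f}  excess kurtosis={kurt:+.2f}")
```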
Now, let me do it with two-- let me clear everything again. I like this distribution because it's a very non-normal distribution. It looks like a bimodal distribution of some kind. Let's take a scenario where I take two good n's. Let's take an n of 16. That's a nice, healthy n. And let's take an n of 25. And let's compare
them a little bit. I'll do one trial animated
just to-- it's always nice to see it. So first, it's going to do 16 of
these trials and average them. And there we go. Then it's going to
do 25 of these trials and then average them. And then, there we go. Now let's take what I just did animated and do it 10,000 times. Miracles of computers. Now, notice something. Now this is 10,000 times. These are both pretty
good approximations of normal distributions. The n is equal to
25 is more normal. It has slightly less skew than n is equal to 16. It has slightly less kurtosis, which means it's closer to being a normal distribution than n is equal to 16. But even more interesting,
it's more squeezed in. It has a lower
standard deviation. The standard deviation here, for n equal to 25, is 2.1, and the standard deviation here, for n equal to 16, is 2.64. So that's another thing I kind of touched on in the last video, and it kind of makes sense: the more numbers you put into each sample before you average, the smaller the standard deviation of those averages. Think of the extreme case. If instead of taking 16
samples from our distribution every time, or
instead of taking 25, I were to take a million samples from this distribution every time, then that sample mean is always going to be pretty
darn close to my mean. If I essentially try to estimate a mean by taking a million samples, I'm going to get a pretty good estimate of that mean. The probability that a million numbers are all out here is very low. So if n is a million, of
course, all of my sample means when I
average them are all going to be really tightly
focused around the mean itself. And so hopefully that kind of
makes sense to you as well. If it doesn't, just
think about it. Or even use this tool
and experiment with it just so you can trust that
that is really the case. And it actually turns out that there's a very clean formula that relates the standard deviation of the original probability distribution function to the standard deviation of the sampling distribution of the sample mean. And as you can imagine, it is a function of your sample size-- of how many values you take out in every basket before you average them. And I'll go over that in the next video.
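For reference while you wait for that video: the clean formula is the standard error of the mean. The standard deviation of the sampling distribution of the sample mean equals the standard deviation of the original distribution divided by the square root of n. You can watch it emerge from a simulation; here's a sketch with the same assumed distribution.

```python
import random

values = [1, 2, 3, 5, 6, 9]                    # assumed distribution, as before
weights = [0.10, 0.10, 0.20, 0.20, 0.15, 0.25]

# Exact mean and standard deviation of the original distribution.
mu = sum(v * w for v, w in zip(values, weights))
sigma = sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) ** 0.5

def sd_of_sample_means(n, trials=20_000):
    """Empirical std dev of simulated sample means for samples of size n."""
    means = [sum(random.choices(values, weights=weights, k=n)) / n
             for _ in range(trials)]
    m = sum(means) / trials
    return (sum((x - m) ** 2 for x in means) / trials) ** 0.5

for n in (1, 4, 16, 25, 100):
    print(f"n={n:3d}  simulated={sd_of_sample_means(n):.3f}  "
          f"sigma/sqrt(n)={sigma / n ** 0.5:.3f}")
```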