# Sampling distribution of the sample mean 2

## Video transcript

We hopefully now have a respectable working knowledge of the sampling distribution of the sample mean. And what I want to do in this video is explore a little bit more on how that distribution changes as we change our sample size, n. I'll write n down right here. Our sample size n. So just as a bit of review, we saw before, we could just start off with any crazy distribution. Maybe it looks something like this. I'll do a discrete distribution. Really, to model anything, at some point, you have to make it discrete. It could be a very granular discrete distribution, but let's say something crazy that looks like this. This is clearly not a normal distribution. But we saw in the first video, if you take, let's say, sample sizes of four. So if you took four numbers from this distribution, four random numbers where, let's say, this is the probability of a 1, 2, 3, 4, 5, 6, 7, 8, 9. If you took four numbers at a time and averaged them-- let me do that here. If you took four numbers at a time, let's say we use this distribution to generate four random numbers. Right? We're very likely to get a 9. We're definitely not going to get any 7's or 8's. We're definitely not going to get a 4. We might get a 1 or 2. 3 is also very likely. 5 is very likely. So we use this function to essentially generate random numbers for us. And we take samples of four, and then we average them up. So let's say our first average is, I don't know, let's say it's a 9, it's a 5, it's another 9, and then it's a 1. So what is that? That's 14 plus 10, 24 divided by 4. The average for this first trial, for this first sample of four, is going to be 6. They add up to 24 divided by 4. So we would plot it right here. Our average was 6 that time. Just like that. And we'll just keep doing it. And we've seen in the past that, if you just keep doing this, this is going to start looking something like a normal distribution. So maybe we do it again, the average 6 again. Maybe we do it again, the average is 5. 
We do it again, the average is 7. We do it again, the average is 6. And then if you just do this a ton, a ton of times, your distribution might look very much like a normal distribution. So these boxes are really small. So we just do a bunch of these trials. At some point, it might look a lot like a normal distribution. Obviously, there are some average values. It won't be a perfect normal distribution, because you can never get anything less than 0, or anything less than 1, really, as an average. You can't get 0 as an average. And you can't get anything more than 9. So it's not going to have infinitely long tails but, at least for the middle part of it, a normal distribution might be a good approximation. In this video, what I want to think about is what happens as we change n. So in this case, n was 4. n is our sample size. Every time we do a trial, we took four and we took their average, and we plotted it. We could have had n equal 10. We could've taken 10 samples from this population, you could say, or from this random variable, averaged them, and then plotted them here. And in the last video, we ran the simulation. I'm going to go back to that simulation in a second. We saw a couple of things. And I'll show it to you in a little bit more depth this time. When n is pretty small, it doesn't approach a normal distribution that well. So when n is small-- let's take the extreme case. What happens when n is equal to 1? And that literally just means I take one instance of this random variable and average it. Well, it's just going to be that thing. So if I just take a bunch of trials from this thing and plot it over time, what does it look like? Well, it's definitely not going to look like a normal distribution. You're going to have a couple of 1's. You're going to have a couple of 2's. You're going to have more 3's like that. You're going to have no 4's. You're going to have a bunch of 5's. You're going to have some 6's that look like that. 
And you're going to have a bunch of 9's. So there, your sampling distribution of the sample mean for an n of 1 is going to look-- I don't care how many trials you do, it's not going to look like a normal distribution. So the central limit theorem, although I said you do a bunch of trials, it'll look like a normal distribution, definitely doesn't work for n equals 1. As n gets larger, though, it starts to make sense. Let's see, if we've got n equals 2-- and I'm just doing this all in my head. I don't know what the actual distributions would look like. But then, it still would be difficult for it to become an exact normal distribution. But then you could get more instances, you could get more-- you might get things from all of the above. But in each of your baskets that you're averaging, you're only going to get two numbers. For example, you're never going to get an 8 and 1/2 in your sampling distribution of the sample mean for n is equal to 2, because that would take an 8 and a 9, and it's impossible to get an 8. So you're never going to get 8 and 1/2 as-- so maybe when you plot it, maybe it looks like this. But there'll be a gap at 8 and 1/2 because that's impossible. And maybe it looks something like that. So it still won't be a normal distribution when n is equal to 2. So there's a couple of interesting things here. So one thing-- and I didn't mention this the first time, just because I really wanted you to get the gut sense of what the central limit theorem is. The central limit theorem says as n approaches, really as it approaches infinity, that is when you get the real normal distribution. But in kind of everyday practice, you don't have to get that much beyond n equals 2. If you get to n equals 10 or n equals 15, you're getting very close to a normal distribution. So this converges to a normal distribution very quickly. Now, the other thing is you obviously want many, many trials. So this is your sample size. That is your sample size. 
That's the size of each of your baskets. In the very first video I did on this, I took a sample size of 4. In the simulation I did in the last video, we did sample sizes of 4 and 10 and whatever else. This is a sample size of 1. So that's our sample size. So as that approaches infinity, your actual sampling distribution of the sample mean will approach a normal distribution. Now, in order to actually see that normal distribution, and actually to prove it to yourself, you would have to do this many, many-- remember the normal distribution happens-- this is kind of the population, or this is the random variable. That tells you all of the possibilities. In real life, we seldom know all of the possibilities. In fact, in real life, we seldom know the pure probability generating function. Only if we're kind of writing it, if we're writing a computer program. Normally we're doing samples, and we're trying to estimate things. So normally, there's some random variable. And then maybe we take a bunch of samples. We take their means, and we plot them. And then we're going to get some type of normal distribution. Let's say we take samples of 100 and we average them. We're going to get some normal distribution. And in theory, as we take those averages hundreds or thousands of times, our data set is going to more closely approximate that pure sampling distribution of the sample mean. This thing is a real distribution. It's a real distribution with a real mean. It has a pure mean. So the mean of the sampling distribution of the sample mean, we'll write it like that. Notice I didn't write it as just the x with-- this is actually saying this is a real population mean. This is a real random variable mean. If you looked at every possibility of all of the samples that you can take from your original distribution, from some other random original distribution, and you just took all of the possibilities of, let's say, sample size. 
Let's say we're dealing with a world where a sample size is 10. If you took all of the combinations of 10 samples from some original distribution and you averaged them out, this would describe that function. Of course, in reality, if you don't know the original distribution, you can't take infinite samples from it. So you won't know every combination. But if you did it with 1,000, if you did the trial 1,000 times-- so 1,000 times you took 10 samples from some distribution, and took 1,000 averages and then plotted them, you're going to get pretty close. Now, the next thing I want to touch on is what happens as n-- we know as n approaches infinity, it becomes more of a normal distribution. But as I said already, n equals 10 is pretty good. And n equals 20 is even better. But we saw something in the last video that at least I find pretty interesting. Let's say we start with this crazy distribution up here. It really doesn't matter what distribution we're starting with. We saw in the simulation that when n is equal-- let's say n is equal to 5. Our graph, after we take samples of five, average them, and we do it 10,000 times, our graph looks something like this. It's kind of wide like that. And then when we did n is equal to 10, our graph looked a little bit-- it was actually a little bit squeezed in, like that, a little bit more. So not only was it more normal-- that's what the central limit theorem tells us, because we're taking larger sample sizes-- but it had a smaller standard deviation or a smaller variance. The mean is going to be the same, either case. But when our sample size was larger, our standard deviation became smaller. In fact, our standard deviation became smaller than our original population distribution, or our original probability density function. Let me show you that with a simulation. So let me clear everything. And this simulation is as good as any. So the first thing I want to show-- or this distribution is as good as any. 
The first thing I want to show you is that n of 2 is really not that good. So let's compare an n of 2 to, let's say, an n of 16. So when you compare an n of 2 to an n of 16-- let's do it once. So you get one, two trials. You average them, and then it's going to do 16. And then it's going to plot it down here and average there. Let's do that 10,000 times. So notice when you took an n of 2, even though we did it 10,000 times, this is not approaching a normal distribution. And you can actually see it in the skew and kurtosis numbers. It has a rightward positive skew, which means it has a longer tail to the right than to the left. And then it has a negative kurtosis, which means that it has shorter tails and smaller peaks than a standard normal distribution. Now, when n is equal to 16 and you do the same-- so every time we took 16 samples from this distribution function up here and averaged them-- and each of these dots represents an average. We did it 10,001 times. Now notice, the mean is the same in both places. But here, all of a sudden, our kurtosis is much smaller, and our skew is much smaller. So we are more normal in this situation. But even a more interesting thing is our standard deviation is smaller. This is more squeezed in than that is. And it's definitely more squeezed in than our original distribution. Now, let me do it with two-- let me clear everything again. I like this distribution because it's a very non-normal distribution. It looks like a bimodal distribution of some kind. And let's take a scenario where I take an n of-- let's take two good n's. Let's take an n of 16. That's a nice, healthy n. And let's take an n of 25. And let's compare them a little bit. I'll do one trial animated just to-- it's always nice to see it. So first, it's going to do 16 of these trials and average them. And there we go. Then it's going to do 25 of these trials and then average them. And then, there we go. Now let's do that, what I just did animated, let's do it 10,000 times. 
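The 10,000-trial run described here can be approximated with a short script. This is only a sketch, not the simulation tool used in the video: the weights in `WEIGHTS` are an assumed stand-in for the hand-drawn distribution (no 4's, 7's or 8's, with 9 the most likely value), and the helper names `sample_means` and `describe` are made up for illustration. Skew and kurtosis are computed as standardized third and fourth moments, with 3 subtracted from the kurtosis so that a normal distribution scores 0.

```python
import random
import statistics

# Assumed weights for the values 1..9. Only the rough shape comes from the
# video (no 4's, 7's or 8's; 9 most likely); the exact numbers are made up.
VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9]
WEIGHTS = [1, 1, 3, 0, 3, 2, 0, 0, 4]


def sample_means(n, trials, rng):
    """Draw `trials` samples of size n and return the mean of each sample."""
    return [
        statistics.fmean(rng.choices(VALUES, weights=WEIGHTS, k=n))
        for _ in range(trials)
    ]


def describe(means):
    """Return (mean, standard deviation, skew, excess kurtosis) of the means."""
    mu = statistics.fmean(means)
    sd = statistics.pstdev(means, mu)
    z = [(m - mu) / sd for m in means]
    skew = statistics.fmean(v ** 3 for v in z)
    excess_kurtosis = statistics.fmean(v ** 4 for v in z) - 3.0
    return mu, sd, skew, excess_kurtosis


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
results = {n: describe(sample_means(n, 10_000, rng)) for n in (16, 25)}

for n, (mu, sd, skew, kurt) in results.items():
    print(f"n={n:2d}  mean={mu:.3f}  sd={sd:.3f}  skew={skew:+.3f}  kurtosis={kurt:+.3f}")
```

With these assumed weights, both rows should report nearly the same mean, and the n = 25 row should show the smaller standard deviation. The specific values will differ from the ones on screen in the video, because the underlying distribution is different.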
Miracles of computers. Now, notice something. Now this is 10,000 times. These are both pretty good approximations of normal distributions. The n is equal to 25 is more normal. It has less skew, slightly less skew than n is equal to 16. It has slightly less kurtosis, which means it's closer to being a normal distribution than n is equal to 16. But even more interesting, it's more squeezed in. It has a lower standard deviation. The standard deviation here, for n equals 25, is 2.1, and the standard deviation here, for n equals 16, is 2.64. So that's another-- I mean, I kind of touched on that in the last video, and it kind of makes sense. For every sample you do for your average, the more you put in that sample, the smaller the standard deviation. Think of the extreme case. If instead of taking 16 samples from our distribution every time, or instead of taking 25, if I were to take a million samples from this distribution every time-- if I were to take a million samples from this distribution every time, that sample mean is always going to be pretty darn close to my mean. If I take a million samples of everything, if I essentially try to estimate a mean by taking a million samples, I'm going to get a pretty good estimate of that mean. The probability that a bunch of-- a million numbers are all out here is very low. So if n is a million, of course, all of my sample means when I average them are all going to be really tightly focused around the mean itself. And so hopefully that kind of makes sense to you as well. If it doesn't, just think about it. Or even use this tool and experiment with it just so you can trust that that is really the case. And it actually turns out that there's a very clean formula that relates the standard deviation of the original probability distribution function to the standard deviation of the sampling distribution of the sample mean. And as you can imagine, it is a function of your sample size, of how many samples you take out in every basket before you average them. 
And I'll go over that in the next video.
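The "pure" sampling distribution discussed in this video does not have to be estimated by repeated trials when the original distribution is known; for a small discrete distribution it can be computed exactly. The sketch below does that by convolving the distribution with itself n times. It rests on assumptions: `WEIGHTS` is a made-up stand-in for the hand-drawn distribution, and the function names are hypothetical. It only reports the exact mean and standard deviation for a few sample sizes and does not use the formula deferred to the next video.

```python
from fractions import Fraction
from math import sqrt

# Assumed weights for the values 1..9 (no 4's, 7's or 8's). Made up for
# illustration; only the rough shape comes from the video.
WEIGHTS = [1, 1, 3, 0, 3, 2, 0, 0, 4]
TOTAL = sum(WEIGHTS)
POPULATION = {
    value: Fraction(weight, TOTAL)
    for value, weight in zip(range(1, 10), WEIGHTS)
    if weight
}


def sum_distribution(n):
    """Exact distribution of the sum of n independent draws (repeated convolution)."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for s, p in dist.items():
            for v, q in POPULATION.items():
                nxt[s + v] = nxt.get(s + v, Fraction(0)) + p * q
        dist = nxt
    return dist


def mean_distribution(n):
    """Exact sampling distribution of the sample mean for sample size n."""
    return {Fraction(s, n): p for s, p in sum_distribution(n).items()}


def mean_and_sd(dist):
    """Mean and standard deviation of a {value: probability} distribution."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return float(mu), sqrt(var)


stats = {n: mean_and_sd(mean_distribution(n)) for n in (1, 2, 4, 16)}
for n, (mu, sd) in stats.items():
    print(f"n={n:2d}  mean={mu:.4f}  sd={sd:.4f}")
```

Every row should print the same mean, while the standard deviation shrinks as n grows, which is the "squeezing in" seen in the simulation. Exact fractions are used so that the probabilities sum to exactly 1 and no rounding error accumulates across convolutions.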