
Sampling distribution of the sample mean (part 2)

More on the Central Limit Theorem and the Sampling Distribution of the Sample Mean. Created by Sal Khan.

Want to join the conversation?

  • mainman89:
    What's the point of the central limit theorem if it doesn't provide you with the actual population distribution? For example, in this video the population distribution was totally different from the normal distribution. So what is the importance of this concept?
    • tmattson:
      Each separate sample we take from the population will be different - they will have different scores and different sample means. So how do we tell which sample gives us the best description of the population? Can we even predict how well a sample describes the population it is drawn from?

      By using the distribution of sample means, we have the ability to predict the characteristics of the sample. And one of the basic reasons for taking a sample is to use the sample data to answer questions about the larger population.

      The Central Limit Theorem helps us describe the distribution of sample means by identifying the basic characteristics of the samples - shape, central tendency, and variability. So the distribution of sample means helps us find the probability associated with each specific sample.

      And because there's always some discrepancy or error between a sample statistic and the corresponding population parameter, the CLT enables us to calculate how much error to expect. (A simulation sketch of this follows below.)
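To make the point above concrete, here is a minimal simulation sketch in Python (NumPy only). The population values and probabilities are invented for illustration -- they are not the exact distribution from the video -- but any non-normal population shows the same behavior: the sample means cluster around the population mean with a spread the CLT lets you predict.

```python
import numpy as np

# Invented non-normal discrete population (NOT the video's exact numbers):
# values 1-9 with lumpy, asymmetric probabilities, no 4, 7, or 8.
values = np.array([1, 2, 3, 5, 6, 9])
probs = np.array([0.10, 0.10, 0.20, 0.25, 0.15, 0.20])

pop_mean = np.sum(values * probs)
pop_sd = np.sqrt(np.sum(probs * (values - pop_mean) ** 2))

rng = np.random.default_rng(0)
n, trials = 10, 10_000

# Draw `trials` samples of size n and average each one.
sample_means = rng.choice(values, size=(trials, n), p=probs).mean(axis=1)

print(f"population mean {pop_mean:.3f}, population SD {pop_sd:.3f}")
print(f"mean of sample means {sample_means.mean():.3f}")  # ~ pop_mean
print(f"SD of sample means   {sample_means.std():.3f}")   # ~ pop_sd / sqrt(n)
print(f"CLT-predicted SD     {pop_sd / np.sqrt(n):.3f}")
```

The printed SD of the sample means should land close to pop_sd / sqrt(n), which is exactly the kind of "how much error to expect" calculation the answer above refers to.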
  • Robert Fraser:
    Could you please give a practical example of the utility of the Sampling Distribution of the Sample Mean (SDSM) when used with a NON-NORMAL distribution? It would seem that a non-normal distribution generated by some process would mean that the process was "out of control", had multiple processes going on, or some such. If that's the case, of what utility is the SDSM when it does not describe the scattered output of said process? Thanks much.
  • lanneeek:
    Why is the mean of the sampling distribution of sample means always equal to the population mean?
    • Dr C:
      In formulas:
      E[X] = µ
      E[x̄] = E[(1/n) Σ xᵢ]
           = (1/n) E[Σ xᵢ]
           = (1/n) Σ E[xᵢ]
           = (1/n)(n · µ)
           = µ

      Logically, it makes sense that this should be the case. If some variable has mean µ, that means we expect a given value to be µ. There'll be some variation around that, but that's what we expect, on average. So we're expecting the average to be µ. Then, if we get a lot of such sample means (that is, the sampling distribution), we're getting a whole lot of values which we expect all to be µ. The average of a lot of things that are all µ, or very close to it, should also be µ. (A quick numerical check follows below.)
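A quick numerical check of the identity above, as a sketch (the exponential population here is hypothetical, chosen only because it is clearly non-normal): the mean of the sample means stays at µ no matter the sample size.

```python
import numpy as np

# Hypothetical, clearly non-normal population: exponential with mean mu.
rng = np.random.default_rng(1)
mu = 4.0

for n in (2, 10, 100):
    sample_means = rng.exponential(scale=mu, size=(20_000, n)).mean(axis=1)
    print(f"n={n:3d}: mean of sample means = {sample_means.mean():.3f}")  # ~ mu
```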
  • VanillaDazzle:
    Actually, if you pulled a 9 and a 6, you would get 7 and a half.
  • ledaneps:
    Is there a difference between taking samples of 10 a thousand times and taking samples of 1000 ten times?
    What about taking a single sample of 10000?
    ...Or taking a sample of 1 ten thousand times?
    • Olena:
      Yes, there is a difference. The bigger the sample size, the narrower the normal distribution you will get.

      For example (a simulation sketch follows after this table):

                    Sample size = 25,    Sample size = 5,
                    5 iterations         25 iterations
      mean              13.74                14.32
      median            13.00                15.00
      SD                 1.19                 2.99
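Olena's original data set isn't given, so the sketch below (Python/NumPy, with a made-up population whose mean is near 14) just illustrates the same comparison. Note the subtlety: the sample size controls the width of the sampling distribution (roughly σ/√n), while the number of iterations only controls how precisely you estimate that width.

```python
import numpy as np

# Hypothetical population with mean near 14 (Olena's data set isn't given).
rng = np.random.default_rng(2)
population = rng.normal(loc=14, scale=6, size=100_000)

for sample_size, iterations in [(25, 5), (5, 25)]:
    means = rng.choice(population, size=(iterations, sample_size)).mean(axis=1)
    print(f"sample size={sample_size:2d}, iterations={iterations:2d}: "
          f"mean={means.mean():5.2f}  median={np.median(means):5.2f}  "
          f"SD={means.std():4.2f}")
```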
  • Nicolas Quiroz:
    Technically, Sal says that the bigger the sample size, the more normal the distribution is. But if n = the whole population, wouldn't you get a vertical line at x = mean, with y tending to infinity?
    • robshowsides:
      Exactly correct! But we almost always assume that the size of the sample is much smaller than the population. If you could really "sample" the ENTIRE POPULATION, then by definition that's not a "sample", so of course all this theory sort of becomes nonsense. The whole point of this is to try to answer the question, "What can I say about the population if I only know the values for a sample of the population?" (A quick simulation of this limiting case follows below.)
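The limiting case can be simulated directly if we sample without replacement from a finite, hypothetical population: as n approaches the population size N, the spread of the sample means collapses to zero, i.e. the sampling distribution degenerates to a spike at the population mean.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1_000
population = rng.normal(10, 3, size=N)  # hypothetical finite population

# Sample WITHOUT replacement; as n approaches N, the sampling
# distribution of the mean collapses to a spike at the population mean.
for n in (10, 100, 1_000):
    means = np.array([rng.choice(population, size=n, replace=False).mean()
                      for _ in range(2_000)])
    print(f"n={n:5d}: SD of sample means = {means.std():.4f}")
# At n = N, every "sample" is the whole population, so the SD is exactly 0.
```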
  • Mike G:
    Sal says we are never going to get 7.5 when n=2. But if 6 and 9 are randomly selected, then 7.5 would be the average, so I'm not quite understanding his reasoning here.
  • Charles Nyiha:
    What happens when you take the sampling distribution of the sample mean of the sampling distribution of the sample mean of a set of observations? And what happens when you repeat that again and again on the same set? And how would the results relate to one another?
    • Dr C:
      Let's say that X is my data, xbar1 is the sample mean, and xbar2 is the sample mean of a collection of sample means (basically, a second-level sample mean).

      For the sake of argument, let's assume that X is normally distributed with mean μ and SD σ. Then for a sample of size n, xbar1 is normally distributed with mean μ and SD σ/√n. If we thought of xbar1 as our random variable and took samples of size m, we'd apply the same logic and get that xbar2 is normally distributed with mean μ and SD σ/√(nm). And so on. (A simulation sketch of this follows below.)

      However, I'm not sure if you're quite thinking along the right lines. The purpose of a sampling distribution is to understand how a statistic varies from sample to sample. Generally speaking, we will have a sample of data, and so only the "first level" sampling distribution would be relevant. While it's possible to think of a "second level" sampling distribution (as you have: the sampling distribution of the sample mean of the sampling distribution), by and large we just won't have any use for it.
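The nesting argument above is easy to check numerically. A sketch, assuming a standard normal X and arbitrarily chosen n and m:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 0.0, 1.0
n, m = 4, 9  # first-level and second-level sample sizes (chosen arbitrarily)

# 100,000 repetitions of: draw m first-level samples of size n each.
x = rng.normal(mu, sigma, size=(100_000, m, n))
xbar1 = x.mean(axis=2)      # m first-level sample means per repetition
xbar2 = xbar1.mean(axis=1)  # one second-level sample mean per repetition

print(f"SD of xbar1: {xbar1.std():.3f}  (theory: sigma/sqrt(n)   = {sigma/np.sqrt(n):.3f})")
print(f"SD of xbar2: {xbar2.std():.3f}  (theory: sigma/sqrt(n*m) = {sigma/np.sqrt(n*m):.3f})")
```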
  • chris.freilich:
    This might be a stupid question, but why is the word 'sample' in 'the sampling distribution of the sample mean' twice?

    Sample mean = mean of all the individual elements of the sample.

    Distribution = how those are distributed.

    Sampling = ?
    • daniella:
      The term "sampling distribution of the sample mean" might sound redundant, but each word has a specific meaning. "Sample mean" refers to the mean of a sample. "Sampling distribution" refers to the distribution you would get if you took many samples and calculated each sample's mean. So it's the distribution of those means over many samples, hence the wording.
  • 1499114179:
    Why is SD(x̄) = SD(X) / n^(1/2)? As Sal said in the earlier video on scaling random variables, if x_new = X/n, then SD(x_new) = SD(X)/n. But here, why take the square root? (See the note below.)
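This question has no reply above, so here is the missing step, stated under the usual assumptions (independent, identically distributed draws). Dividing a single variable by n does scale its SD by 1/n, but x̄ divides a sum of n independent draws by n, and for independent variables it is the variances, not the SDs, that add:

$$
\operatorname{Var}(\bar{x})
= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i)
= \frac{n\sigma^2}{n^2}
= \frac{\sigma^2}{n},
\qquad\text{so}\qquad
\operatorname{SD}(\bar{x}) = \frac{\sigma}{\sqrt{n}}.
$$

The 1/n factor comes out of the variance squared (giving 1/n²), the n independent variances contribute nσ², and the square root appears when you convert the variance σ²/n back to a standard deviation.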

Video transcript

We hopefully now have a respectable working knowledge of the sampling distribution of the sample mean. And what I want to do in this video is explore a little bit more how that distribution changes as we change our sample size n. I'll write n down right here. Our sample size n.

So just as a bit of review, we saw before that we can start off with any crazy distribution. Maybe it looks something like this. I'll do a discrete distribution -- really, to model anything, at some point you have to make it discrete. It could be a very granular discrete distribution. But let's say it's something crazy that looks like this. This is clearly not a normal distribution.

But we saw in the first video that if you take, let's say, sample sizes of 4 -- so if you took 4 numbers from this distribution, 4 random numbers, where let's say this is the probability of a 1, 2, 3, 4, 5, 6, 7, 8, 9 -- if you took 4 numbers at a time and averaged them... let me do that here. Let's say we use this distribution to generate 4 random numbers. We're very likely to get a 9. We're definitely not going to get any 7's or 8's. We're definitely not going to get a 4. We might get a 1 or a 2. A 3 is also very likely. A 5 is very likely. So we use this function to essentially generate random numbers for us, and we take samples of 4 and then we average them up.

So let's say our first average is, I don't know, let's say it's a 9, a 5, another 9, and then a 1. So what is that? That's 14 plus 10, which is 24, divided by 4. The average for this first trial, for this first sample of 4, is going to be 6, right? They add up to 24, divided by 4. So we would plot it right here. Our average was 6 that time. Just like that. And we'll just keep doing it. And we've seen in the past that if you just keep doing this, this is going to start looking something like a normal distribution. So maybe we do it again and the average is 6 again. Maybe we do it again and the average is 5. We do it again, the average is 7. We do it again, the average is 6. And then if you just do this a ton, a ton of times, your distribution might start to look very much like a normal distribution. These boxes are really small. So if we just do a bunch of these trials, at some point it might look a lot like a normal distribution.

Obviously there are limits on the average values. It won't be a perfect normal distribution, because you can never get anything less than a 1 as an average -- you can't get 0 as an average -- and you can't get anything more than 9. So it's not going to have infinitely long tails, but at least for the middle part of it, a normal distribution might be a good approximation.

In this video, what I want to think about is what happens as we change n. So in this case n was 4. n is our sample size. Every time we did a trial, we took 4, took their average, and plotted it. We could have had n equal 10. We could have taken 10 samples from this population, you could say, or from this random variable, averaged them, and then plotted them here. And in the last video we ran the simulation -- I'm going to go back to that simulation in a second, and I'll show it to you in a little bit more depth this time. We saw a couple of things. When n is pretty small, it doesn't approach a normal distribution that well. So when n is small -- I mean, let's take the extreme case -- what happens when n is equal to 1? That literally just means I take 1 instance of this random variable and average it.
Well, it's just going to be that thing. So if I just take a bunch of trials from the thing and plot them over time, what's it going to look like? Well, it's definitely not going to look like a normal distribution. You're going to have a couple of 1's, you're going to have a couple of 2's. You're going to have more 3's, like that. You're going to have no 4's. You're going to have a bunch of 5's. You're going to have some 6's, that'll look like that. And you're going to have a bunch of 9's. So there, your sampling distribution of the sample mean for an n of 1 is not going to look like a normal distribution -- I don't care how many trials you do. So the central limit theorem -- although I said you do a bunch of trials and it'll look like a normal distribution -- definitely doesn't work for n equal to 1.

As n gets larger, though, it starts to make sense. Let's say we've got n equals 2 -- and I'm just doing this in my head, I don't know what the actual distributions would look like -- it still would be difficult for it to become an exact normal distribution. You might get things from all of the above, but you can only get 2 in each of the baskets that you're averaging. You're only going to get 2 numbers, right? So, for example, you're never going to get 7.5 in your sampling distribution of the sample mean for n equal to 2, because it's impossible to get a 7 and it's impossible to get an 8. So you're never going to get 7.5. So maybe when you plot it, maybe it looks like this, but there will be a gap at 7.5 because that's impossible. And maybe it looks something like that. So it still won't be a normal distribution when n is equal to 2.

So there are a couple of interesting things here. One thing -- and I didn't mention this the first time, because I really wanted you to get the gut sense of what the central limit theorem is -- the central limit theorem says that as n approaches infinity, that's when you get the real normal distribution. But in everyday practice, you don't have to get that much beyond n equals 2. If you get to n equals 10 or n equals 15, you're getting very close to a normal distribution. So this converges to a normal distribution very quickly.

Now, the other thing is that you obviously want many, many trials. So this is your sample size. That's the size of each of your baskets. In the very first video I did on this, I took a sample size of 4. And in the simulation I did in the last video, we did sample sizes of 4 and 10 and whatever else. This is a sample size of 1. So as that sample size approaches infinity, your actual sampling distribution of the sample mean will approach a normal distribution.

Now, in order to actually see that normal distribution, and actually to prove it to yourself, you would have to do this many, many times. Remember, the normal distribution happens -- this is essentially the population, or this is the random variable. That tells you all of the possibilities. In real life, we seldom know all of the possibilities. In fact, in real life we seldom know the pure probability-generating function -- only if we're writing it, or if we're writing a computer program. Normally we're taking samples and we're trying to estimate things.
So normally there's some random variable, and then maybe we'll take a bunch of samples, take their means, and plot them, and we're going to get some type of normal distribution. Let's say we take samples of 100 and we average them. We're going to get some normal distribution. And in theory, as we take those averages hundreds or thousands of times, our data set is going to more closely approximate that pure sampling distribution of the sample mean.

This thing is a real distribution. It's a real distribution with a real mean. It has a pure mean. So the mean of the sampling distribution of the sample mean -- we'll write it like that. Notice I didn't write it as just the x with a bar over it; what this notation is saying is that this is a real population mean, a real random variable mean. If you looked at every possibility of all of the samples that you could take from your original distribution -- let's say we're dealing with a world where the sample size is 10 -- if you took all of the combinations of 10 samples from some original distribution and you averaged them out, this would describe that function. Of course, in reality, if you don't know the original distribution, you can't take infinitely many samples from it, so you won't know every combination. But if you did the trial 1,000 times -- so 1,000 times you took 10 samples from some distribution, took 1,000 averages, and then plotted them -- you're going to get pretty close.

Now, the next thing I want to touch on is what happens as n grows. We know that as n approaches infinity it becomes more of a normal distribution, but as I said already, n equals 10 is pretty good and n equals 20 is even better. But we saw something in the last video that at least I find pretty interesting. Let's say we start with this crazy distribution up here -- it really doesn't matter what distribution we're starting with. We saw in the simulation that when n is equal to 5 -- we take samples of 5, average them, and do it 10,000 times -- our graph looked something like this. It's kind of wide, like that. And then when we did n equal to 10, our graph was actually squeezed in a little bit more, like that. So not only was it more normal -- that's what the central limit theorem tells us, because we're taking larger sample sizes -- but it had a smaller standard deviation, or a smaller variance, right? The mean is going to be the same in either case, but when our sample size was larger, our standard deviation became smaller. In fact, our standard deviation became smaller than that of our original population distribution -- our original probability density function.

Let me show you that with a simulation. So let me clear everything. This simulation is as good as any -- or this distribution is as good as any. The first thing I want to show you is that an n of 2 is really not that good. So let's compare an n of 2 to, let's say, an n of 16. So when you compare an n of 2 to an n of 16, let's do it once. So you get your 2 trials and you average them. And then it's going to do 16, plot it down here, and average there. Let's do that 10,000 times. So notice: when you took an n of 2, even though we did it 10,000 times, this is not approaching a normal distribution. You can actually see it in the skew and kurtosis numbers.
It has a rightward, positive skew, which means it has a longer tail to the right than to the left. And it has a negative kurtosis, which means it has shorter tails and a flatter peak than a standard normal distribution.

Now, when n is equal to 16, you do the same thing. Every time, we took 16 samples from this distribution function up here and averaged them -- each of these dots represents an average, and we did it 10,001 times. And notice: the mean is the same in both places, but here, all of a sudden, our kurtosis is much smaller and our skew is much smaller. So we are more normal in this situation. But an even more interesting thing is that our standard deviation is smaller, right? This is more squeezed in than that is. And it's definitely more squeezed in than our original distribution.

Now let me clear everything again. I like this distribution because it's a very non-normal distribution -- it looks like a bimodal distribution of some kind. And let's take a scenario with two good n's. Let's take an n of 16 -- that's a nice healthy n -- and let's take an n of 25, and let's compare them a little bit. I'll do one trial animated, just because it's always nice to see. So first it's going to take 16 of these trials and average them -- and there we go. And then it's going to take 25 of these trials and average them -- and there we go. Now let's do what I just did animated 10,000 times. Miracles of computers.

Now notice something: after 10,000 times, these are both pretty good approximations of normal distributions. The n equals 25 case is more normal. It has slightly less skew than n equals 16. It has slightly less kurtosis, which means it's closer to being a normal distribution than n equals 16. But even more interesting, it's more squeezed in. It has a lower standard deviation: the standard deviation here is 2.1, and the standard deviation here is 2.64.

And it kind of makes sense -- I touched on this in the last video. For every sample you take for your average, the more you put into that sample, the smaller the standard deviation. Think of the extreme case: if, instead of taking 16 or 25 samples from our distribution every time, I were to take 1,000,000 samples from this distribution every time, that sample mean is always going to be pretty darn close to my mean. If I essentially try to estimate a mean by taking 1,000,000 samples, I'm going to get a pretty good estimate of that mean. The probability that a million numbers are all out here is very low. So if n is 1,000,000, all of my sample means, when I average them, are going to be really tightly focused around the mean itself. So hopefully that makes sense to you as well. If it doesn't, just think about it, or even use this tool and experiment with it, just so you can trust that it really is the case.

And it actually turns out that there's a very clean formula that relates the standard deviation of the original probability distribution function to the standard deviation of the sampling distribution of the sample mean. As you can imagine, it is a function of your sample size -- of how many samples you take out in every basket before you average them. And I'll go over that in the next video.
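For readers who want to reproduce something like the on-screen simulation, here is a rough sketch (Python, using NumPy and SciPy). The population below is invented -- the video's exact bimodal distribution isn't available -- but the pattern is the same: as n grows, skew and excess kurtosis head toward 0 (the normal distribution's values) and the SD shrinks like σ/√n.

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Invented lumpy population standing in for the video's bimodal one.
values = np.array([1, 2, 3, 5, 6, 9])
probs = np.array([0.15, 0.10, 0.20, 0.20, 0.10, 0.25])
rng = np.random.default_rng(4)

for n in (2, 16, 25):
    # 10,000 trials: draw n values, average them, collect the means.
    means = rng.choice(values, size=(10_000, n), p=probs).mean(axis=1)
    print(f"n={n:2d}: SD={means.std():.2f}  skew={skew(means):+.3f}  "
          f"excess kurtosis={kurtosis(means):+.3f}")
```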