
Central limit theorem

Introduction to the central limit theorem and the sampling distribution of the mean. Created by Sal Khan.

Discussion

  • redefrec:
    If the sample size approaches infinity (or the size of the population), wouldn't the distribution of the sample means approach a distribution with a standard deviation of 0, practically becoming useless? I hope this makes sense.
    (31 votes)
    • mattgantz:
      You are correct, the standard deviation goes to 0 as the sample size increases, because you would get the same result each time (you would be sampling the entire population). However, the standard deviation of the sample means is not an indicator of the standard deviation of the entire population (as opposed to the mean of the sample means, which IS an indicator of the mean of the entire population).
      (26 votes)
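To make mattgantz's point concrete, here is a tiny sketch in Python (the six-value population below is invented for illustration): once each "sample" is the whole population, every sample mean comes out identical, so the standard deviation of the sample means is exactly 0.

```python
# If a 'sample' exhausts the population, all sample means coincide.
import numpy as np

population = np.array([1, 1, 3, 4, 6, 6])  # hypothetical tiny population
rng = np.random.default_rng(0)

# Each 'sample' is a full shuffle of the population (sampling without
# replacement with n = population size), so each mean is the population mean.
means = [rng.permutation(population).mean() for _ in range(1000)]
print(np.std(means))  # 0.0
```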
  • Margaret Field:
    What is a typical sample size that would allow for usage of the central limit theorem?
    (12 votes)
    • SteveSargentJr:
      In practice, "n = 30" is usually what distinguishes a "large" sample from a "small" one. In other words, if your sample has a size of at least 30, you can treat the sampling distribution of its mean as approximately Normal (and, hence, use the Normal distribution). If, on the other hand, your sample has a size less than 30, it's best to use the t-distribution instead.
      (14 votes)
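A quick way to see why n ≈ 30 is the usual cutoff is to compare critical values; a minimal sketch using scipy.stats (not part of the original answer): the t-distribution's critical values approach the Normal's as the sample size grows.

```python
# Two-sided 95% critical values: t approaches z as df = n - 1 grows.
from scipy import stats

z_crit = stats.norm.ppf(0.975)  # Normal critical value, ~1.960
for n in [5, 10, 30, 100]:
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:>3}: t* = {t_crit:.3f} vs z* = {z_crit:.3f}")
```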
  • Bruno Schiavo:
    Can the said "crazy dice" really approach a normal distribution? I mean, its sample average can never go above six or below one. Such results would be possible in any normal distribution, which covers the whole spectrum from minus infinity to plus infinity.
    (10 votes)
    • jamoen7:
      The sample average stays in {1, 3, 4, 6} if the sample size is 1 (because we only roll the 'crazy die' once). But we're increasing the sample size (rolling the 'crazy die' more times), which increases the number of values the sample average can take. I think this helps because it improves the 'resolution' of the distribution, so the differences between it and a normal distribution become smaller and smaller as the sample size increases.
      (3 votes)
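jamoen7's 'resolution' point can be checked by brute force; a small sketch (the face set {1, 3, 4, 6} reflects the video's die, where 2 and 5 are impossible):

```python
# Count the distinct sample means the crazy die can produce at each n.
from itertools import combinations_with_replacement

faces = [1, 3, 4, 6]  # 2 and 5 have probability zero on this die
for n in [1, 2, 4, 8]:
    sums = {sum(combo) for combo in combinations_with_replacement(faces, n)}
    print(f"n = {n}: {len(sums)} distinct sample means")
```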
  • Sarthak Kumar:
    I am slightly confused here... Any help will be very appreciated.

    1. The normal distribution obtained after averaging a large number of samples - Is it a good representation of the original distribution at all?

    For example, if we try to deduce the probability of getting a value between 4.5 and 5.5 from the resultant normal distribution, it will give us a finite value, whereas the original distribution clearly indicates that the probability of this outcome is zero. What am I missing here?

    And if the final normal distribution is NOT a good representation of the original distribution, what is its purpose in that case?

    2. In the video, Sal says that if we increase the sample size, the standard deviation will be even smaller. Does this continue indefinitely, or does the value of the standard deviation stabilize somewhere? I mean, if our sample size approaches infinity, will the standard deviation start approaching zero? If yes, will this steep distribution still be useful?

    Many Thanks!
    (6 votes)
    • Dr C:
      1. No. But you may have misunderstood something. We are not averaging a large number of samples; rather, we are obtaining the averages from many repeated samples. The distribution of the sample averages is the Normal distribution we obtained. It does not represent the original distribution well. But it's not supposed to! This Normal distribution is the distribution of the sample mean. Its use is to let us talk about the probability of the sample mean being in a given interval, to better understand the population mean, and so forth.

      2. There is no lower bound. If we get an astronomically large sample size, the standard deviation will be astronomically small. The Normal distribution is still useful - it's getting narrower and narrower, which lets us be more and more precise when we use it to try and talk about the population mean.
      (11 votes)
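Dr C's second point is the familiar sigma / sqrt(n) shrinkage of the standard error of the mean. A minimal simulation sketch (the population standard deviation and seed below are arbitrary choices, not from the discussion):

```python
# The sd of the sample mean tracks sigma / sqrt(n) and has no lower bound.
import numpy as np

rng = np.random.default_rng(0)
sigma = 10.0  # hypothetical population standard deviation

for n in [10, 100, 1_000, 10_000]:
    means = rng.normal(0, sigma, size=(2_000, n)).mean(axis=1)
    print(f"n = {n:>6}: simulated sd = {means.std():.4f}, "
          f"theory = {sigma / np.sqrt(n):.4f}")
```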
  • marimarmateo:
    Why does this happen? Why do most behaviors/characteristics etc., when sampled and plotted, result in a normal distribution?
    (6 votes)
    • Judy Hante:
      It is when the sample means from multiple samples are plotted that you get an approximately normal distribution. There are much more technical explanations, but what I tell my Intro to Stats students is that calculating a mean from any sample is going to help even out the high and low values in that sample. Even a mean from a very small sample of n = 5 or 10 has this effect, just by the nature of the mean calculation. It makes intuitive sense that calculating means will give you values that are closer to the overall mean and more tightly distributed than the original data. Eventually, even for bimodal and skewed distributions, as n increases, you can see the distributions of the means move more and more toward being unimodal and symmetrical, because the mean has the effect of pulling toward the center.
      (6 votes)
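Judy Hante's bimodal example is easy to simulate; here is a rough sketch (the two-hump population below is invented for illustration):

```python
# Means of even small samples from a bimodal population fill in the 'valley'.
import numpy as np

rng = np.random.default_rng(1)

def bimodal(size):
    """Half the values cluster near 0, half near 10: two clear humps."""
    return np.where(rng.random(size) < 0.5,
                    rng.normal(0, 1, size),
                    rng.normal(10, 1, size))

for n in [1, 5, 10]:
    means = bimodal((20_000, n)).mean(axis=1)
    in_valley = np.mean((means > 3) & (means < 7))
    print(f"n = {n:>2}: {in_valley:.0%} of sample means land between 3 and 7")
```

At n = 1 almost no values fall in the valley between the humps; by n = 5 or 10 the majority of sample means do, which is the pull toward the center she describes.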
  • Nate:
    When should you use sigma or 'mu'? Also, what's the difference between an average and a mean?
    (4 votes)
    • Jesse:
      Usually, sigma and mu are used for the standard deviation and the mean of a population, whereas S and X bar are used for the standard deviation and mean of a sample.

      The word 'average' is a bit more ambiguous. Average can legitimately mean almost any measure of central tendency: mean, median, mode, typical value, etc. However, even "mean" admits some ambiguity, as there are different types of means. The one you are probably most familiar with is the arithmetic mean, although there is also a geometric mean and a harmonic mean.
      (7 votes)
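The three means Jesse names are all in Python's standard library, which makes the ambiguity easy to demonstrate (the data values are arbitrary):

```python
# Same data, three different 'means'.
from statistics import fmean, geometric_mean, harmonic_mean

data = [1, 2, 4, 8]
print(fmean(data))           # arithmetic mean: 3.75
print(geometric_mean(data))  # ~2.83 (fourth root of 1*2*4*8 = 64)
print(harmonic_mean(data))   # ~2.13
```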
  • denispmaciel:
    Can the central limit theorem be proved abstractly? Or does one get to it only by observing an infinitely large number of samples?
    (4 votes)
  • linediting:
    I have a question about the usefulness of the Central Limit Theorem. In this video, the normal distribution curve produced by the Central Limit Theorem is based on the probability distribution function. I assume that in a real-world situation, you would create a probability distribution function based on the data you have from a specific sample. Then you could use a computer program to create a curve based on hypothetical subsamples that follow this distribution, and you could use that curve to calculate p-values and so on. But I assume that the way you have obtained the probability distribution function is by observing the results of a finite number of observations. For example, maybe the one in the video was obtained by rolling the crazy die 100 times. What if it were the case that if you actually rolled the crazy die 1,000 times, you would have gotten a couple of 2's, but no 5's? Then your normal curve would actually be lower than you originally thought. This could make a difference in the significance values that you get from a t-test.
    (3 votes)
    • Dr C:
      There are two things to keep in mind here:
      1. The distribution of the original data
      2. The (sampling) distribution of the sample mean.

      Generally, we do not know #1. With the die example, we know it - the distribution will be Uniform with values 1-6, all equally weighted (unless the die is loaded). However, in general, we don't necessarily know the distribution. And even if we knew the form of the distribution (say, Normal, or Exponential, or Poisson, etc.), we typically do not know the parameters of the distribution, so we still can't calculate probabilities and so forth.

      So we collect a sample of data, and yes, you are correct, we can get a sense of the population distribution from this sample. But here we hit a snag: what if the data are such that the distribution looks somewhat Normal, but not fully? Can we just use the Normal distribution to calculate probabilities? What if the data follow some other, similar distribution, such as the Laplace distribution:

      https://en.wikipedia.org/wiki/Laplace_distribution

      It can sometimes be difficult to tell these distributions apart just based on a potentially small sample. We could just assume the data are "Normal enough" and move forward, but is that really appropriate? How will that impact the results? How much error will get introduced into our results because of this?

      If we shift our focus to the sample mean, then the CLT can remove some of these doubts, because the sampling distribution of the sample mean converges to the Normal distribution when the sample size gets large enough. So we don't even need to care about the distribution of the original data; we can just think about the distribution of the sample mean. Since the two distributions have the same population mean, µ, this means that we can get information about µ using the sampling distribution of the sample mean instead of the distribution of the original data.
      (3 votes)
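Dr C's Laplace example can be simulated directly; a hedged sketch (the location, scale, sample size, and seed are all arbitrary choices of mine):

```python
# Sample means from Laplace data behave as the CLT predicts.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 100_000

means = rng.laplace(loc=5.0, scale=2.0, size=(reps, n)).mean(axis=1)
pop_sd = np.sqrt(2) * 2.0  # a Laplace(mu, b) has sd = sqrt(2) * b
print(f"simulated mean of sample means: {means.mean():.3f} (CLT: 5.000)")
print(f"simulated sd of sample means:   {means.std():.4f} "
      f"(CLT: {pop_sd / np.sqrt(n):.4f})")
```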
  • Ali:
    Does this mean the average of the sample means is the same as the average of the original population?
    (2 votes)
  • sandeep.sudheendra:
    This works for rolling dice, but let me ask a question where the class is not a number. Specifically:

    Suppose I have the results of 10000 chess games (only wins), where I have the frequency of winning when one of the eight pawns is moved (let's ignore knight moves). Let us only consider this for white.

    The pawns on a chess board are placed on columns marked by the letters a through h.

    If the frequency of wins by pawn is:

    a 300
    b 500
    c 1700
    d 3000
    e 3000
    f 600
    g 700
    h 200

    When I take a sample size of, say, 30, how do I average it out? In Sal's case, he had a number on which the die landed (1-6). In my case, I don't have these numbers. So how do I find the mean?
    (3 votes)

Video transcript

In this video, I want to talk about what is easily one of the most fundamental and profound concepts in statistics and maybe in all of mathematics. And that's the central limit theorem. And what it tells us is we can start off with any distribution that has a well-defined mean and variance-- and if it has a well-defined variance, it has a well-defined standard deviation. And it could be a continuous distribution or a discrete one. I'll draw a discrete one, just because it's easier to imagine, at least for the purposes of this video.

So let's say I have a discrete probability distribution function. And I want to be very careful not to make it look anything close to a normal distribution, because I want to show you the power of the central limit theorem. So let's say I have a distribution. Let's say it could take on values 1 through 6. 1, 2, 3, 4, 5, 6. It's some kind of crazy dice. It's very likely to get a 1. Let's say it's impossible-- well, let me make that a straight line. You have a very high likelihood of getting a 1. Let's say it's impossible to get a 2. Let's say it's an OK likelihood of getting a 3 or a 4. Let's say it's impossible to get a 5. And let's say it's very likely to get a 6, like that. So that's my probability distribution function. If I were to draw a mean-- this is symmetric, so maybe the mean would be something like that. The mean would be halfway. So that would be my mean right there. The standard deviation maybe would look-- it would be that far and that far above and below the mean. But that's my discrete probability distribution function.

Now what I'm going to do here, instead of just taking samples of this random variable that's described by this probability distribution function, I'm going to take samples of it, but I'm going to average the samples and then look at those samples and see the frequency of the averages that I get. And when I say average, I mean the mean. Let me define something. Let's say my sample size-- and I could put any number here, but let's say first off we try a sample size of n is equal to 4. And what that means is I'm going to take four samples from this. So let's say the first time I take four samples-- so my sample size is four-- let's say I get a 1. Let's say I get another 1. And let's say I get a 3. And I get a 6. So that right there is my first sample of sample size 4.

I know the terminology can get confusing, because this is the sample that's made up of four samples. But when we talk about the sample mean and the sampling distribution of the sample mean, which we're going to talk more and more about over the next few videos, normally the sample refers to the set of samples from your distribution, and the sample size tells you how many you actually took from your distribution. But the terminology can be very confusing, because you could easily view one of these as a sample. But we're taking four samples from here. We have a sample size of four.

And what I'm going to do is I'm going to average them. So let's say the mean-- I want to be very careful when I say average. The mean of this first sample of size 4 is what? 1 plus 1 is 2. 2 plus 3 is 5. 5 plus 6 is 11. 11 divided by 4 is 2.75. That is my first sample mean for my first sample of size 4. Let me do another one. My second sample of size 4, let's say that I get a 3, a 4. Let's say I get another 3. And let's say I get a 1. I just didn't happen to get a 6 that time. And notice I can't get a 2 or a 5. It's impossible for this distribution. The chance of getting a 2 or a 5 is 0.
So I can't have any 2s or 5s over here. So for the second sample of sample size 4, my second sample mean is going to be 3 plus 4 is 7. 7 plus 3 is 10, plus 1 is 11. 11 divided by 4, once again, is 2.75. Let me do one more, because I really want to make it clear what we're doing here. So I do one more. Actually, we're going to do a gazillion more, but let me just do one more in detail. So let's say my third sample of sample size 4-- so I'm going to literally take 4 samples. So my sample is made up of 4 samples from this original crazy distribution. Let's say I get a 1, a 1, and a 6 and a 6. And so my third sample mean is going to be 1 plus 1 is 2. 2 plus 6 is 8. 8 plus 6 is 14. 14 divided by 4 is 3 and 1/2.

And as I find each of these sample means-- so for each of my samples of sample size 4, I figure out a mean. And as I do each of them, I'm going to plot it on a frequency distribution. And this is all going to amaze you in a few seconds. So I plot this all on a frequency distribution. So I say, OK, on my first sample, my first sample mean was 2.75. So I'm plotting the actual frequency of the sample means I get for each sample. So 2.75, I got it one time, so I'll put a little plot there. So that's from that one right there. And the next time, I also got a 2.75. That's a 2.75 there. So I got it twice, so I'll plot the frequency right there. Then I got a 3 and 1/2. So of all the possible values, I could have a 3, I could have a 3.25, I could have a 3 and 1/2. So then I have the 3 and 1/2, so I'll plot it right there.

And what I'm going to do is I'm going to keep taking these samples. Maybe I'll take 10,000 of them. So I'm going to keep taking these samples. So I go all the way to S 10,000. I just do a bunch of these. And what it's going to look like over time is each of these-- I'm going to make it a dot, because I'm going to have to zoom out. So if I look at it like this, over time-- it still has all the values that it might be able to take on. 2.75 might be here. So this first dot is going to be-- this one right here is going to be right there. And that second one is going to be right there. Then that one at 3.5 is going to go right there. But I'm going to do it 10,000 times, because I'm going to have 10,000 dots. And let's say as I do it, I'm going to just keep plotting them. I'm just going to keep plotting the frequencies. I'm just going to keep plotting them over and over and over again.

And what you're going to see is, as I take many, many samples of size 4, I'm going to have something that's going to start kind of approximating a normal distribution. So each of these dots represents an incidence of a sample mean. So as I keep adding on this column right here, that means I kept getting the sample mean 2.75. So over time, I'm going to have something that's starting to approximate a normal distribution. And that is a neat thing about the central limit theorem.

So in orange, that's the case for n is equal to 4. This was a sample size of 4. Now, if I did the same thing with a sample size of maybe 20-- so in this case, instead of just taking 4 samples from my original crazy distribution, every sample I take 20 instances of my random variable, and I average those 20, and then I plot the sample mean on here. So in that case, I'm going to have a distribution that looks like this. And we'll discuss this in more videos.
But it turns out if I were to plot 10,000 of the sample means here, I'm going to have something that, two things-- it's going to even more closely approximate a normal distribution. And we're going to see in future videos, it's actually going to have a smaller-- well, let me be clear. It's going to have the same mean. So that's the mean. This is going to have the same mean. But it's going to have a smaller standard deviation. Well, I should plot these from the bottom, because you kind of stack it. Once you get one, then another instance and another instance. But this is going to more and more approach a normal distribution.

So this is what's super cool about the central limit theorem. As your sample size becomes larger-- or you could even say as it approaches infinity. But you really don't have to get that close to infinity to really get close to a normal distribution. Even if you have a sample size of 10 or 20, you're already getting very close to a normal distribution, in fact about as good an approximation as we see in our everyday life. But what's cool is we can start with some crazy distribution. This has nothing to do with a normal distribution.

This was n equals 4. But if we have a sample size of n equals 10 or n equals 100, and we were to take 100 of these, instead of four here, and average them and then plot that average, the frequency of it, then we take 100 again, average them, take the mean, plot that again, and we do that a bunch of times-- in fact, if we were to do that an infinite number of times, especially if we had an infinite sample size, we would find a perfect normal distribution. That's the crazy thing.

And it doesn't apply just to taking the sample mean. Here we took the sample mean every time. But you could have also taken the sample sum. The central limit theorem would have still applied. But that's what's so super useful about it. Because in life, there are all sorts of processes out there, proteins bumping into each other, people doing crazy things, humans interacting in weird ways. And you don't know the probability distribution functions for any of those things. But what the central limit theorem tells us is that if we add a bunch of those actions together, assuming that they all have the same distribution, or if we were to take the mean of all of those actions together, and if we were to plot the frequency of those means, we do get a normal distribution. And that's frankly why the normal distribution shows up so much in statistics, and why, frankly, it's a very good approximation for the sum or the means of a lot of processes. Normal distribution.

What I'm going to show you in the next video is that this is a reality-- that as you increase your sample size, as you increase your n, and as you take a lot of sample means, you're going to have a frequency plot that looks very, very close to a normal distribution.
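The whole experiment Sal describes is easy to reproduce in code. Here is a hedged sketch in Python: the face probabilities below are my guess at the "crazy die" he draws (1 and 6 very likely, 2 and 5 impossible, 3 and 4 moderately likely), not exact values from the video.

```python
# Simulate the sampling distribution of the sample mean for the crazy die.
import numpy as np

rng = np.random.default_rng(4)
faces = [1, 2, 3, 4, 5, 6]
probs = [0.35, 0.0, 0.15, 0.15, 0.0, 0.35]  # assumed weights; 2 and 5 impossible

# One sample of size n = 4 and its mean, as in the worked example:
sample = rng.choice(faces, size=4, p=probs)
print(sample, "-> sample mean:", sample.mean())  # e.g. [1 1 3 6] -> 2.75

# Now 10,000 sample means each for n = 4 and n = 20: same center,
# shrinking spread, increasingly bell-shaped histogram.
for n in [4, 20]:
    means = rng.choice(faces, size=(10_000, n), p=probs).mean(axis=1)
    print(f"n = {n:>2}: mean of sample means = {means.mean():.3f}, "
          f"sd = {means.std():.3f}")
```

Plotting a histogram of `means` for each n reproduces the dot plots from the video: both are centered at the same mean, and the n = 20 curve is noticeably narrower and smoother than the n = 4 one.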