Sampling distribution of the sample mean

The central limit theorem and the sampling distribution of the sample mean

Discussion and questions for this video
Where on the onlinestatbook site is this little software toy?

Thanks, John
I have a practice question that I just can't figure out. It is: "Eighteen subjects are randomly selected and given proficiency tests. The mean for this group is 492.3 and the standard deviation is 37.6. Construct the 98% confidence interval for the population standard deviation."

I don't know how to figure out the confidence interval for a standard deviation. Can you please help. Thanks. Katie
We already know that:
A range from -1 std.dev. to 1 std.dev. contains 68.3% of outcomes.
A range from -2 std.dev. to 2 std.dev. contains 95.4% of outcomes.
A range from -3 std.dev. to 3 std.dev. contains 99.7% of outcomes.

So the question is, how many Std.Dev's do we have to move away from the mean in both directions on the graph to contain 98% of outcomes. Not 95.4%, Not 99.7%, exactly 98%. Right away you know the answer will be between 2 and 3 std.dev's, as 98% is between 95.4% and 99.7%
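As a rough check on the reasoning above, the multiplier can be computed instead of read off a table. This is a minimal sketch assuming Python's standard-library statistics.NormalDist; the helper name central_z is made up for illustration. It only answers "how many standard deviations cover 98% of a normal curve". An interval for a population standard deviation is usually built from the chi-square distribution instead, which is not shown here.

```python
from statistics import NormalDist

def central_z(coverage):
    """Return z such that the range -z..+z holds `coverage` of a standard normal."""
    tail = (1.0 - coverage) / 2.0            # probability left over in each tail
    return NormalDist().inv_cdf(1.0 - tail)  # inverse of the cumulative distribution

# Compare the familiar 1, 2 and 3 std. dev. ranges with the 98% case.
for coverage in (0.683, 0.954, 0.98, 0.997):
    print(f"{coverage:.1%} of outcomes lie within {central_z(coverage):.3f} std. dev. of the mean")
```

The 98% multiplier lands between 2 and 3, as the argument above predicts.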

To
If we know the mean and the standard deviation of the population, then why are we taking samples, if we already have the data?

Thanks in advance.
Learning statistics can be a little strange. It almost seems like you're trying to lift yourself up by your own bootstraps. Basically, you learn about populations working under the assumption that you know the mean/stdev, which is silly, as you say, but later you begin to drop these assumptions and learn to make inferences about populations based on your samples.

Once you have some version of the Central Limit Theorem, you can start answering some interesting questions, but it takes a lot of study just to get there!
Is there any difference if I take 1 "sample" with 100 "instances", or I take 100 "samples" with 1 "instance"?
(By sample I mean the S_1 and S_2 and so on. With instances I mean the numbers, [1,1,3,6] and [3,4,3,1] and so on.)
Sal goes over this better than I do in the next video as well!
So if every distribution approaches normal when do I employ say a Poisson or uniform or a Bernoulli distribution? I suppose it's a concept I haven't breached yet but how do I know when or which distribution to employ so I appropriately analyze the data? End goal = solve real world problems!
Not every distribution goes to the normal. The distribution of the sample mean does, but only as the sample size increases. If you have smaller sample sizes, assuming normality of either the data or the sample mean may be wholly inappropriate.

In terms of identifying the distribution, sometimes it's a matter of considering the nature of the data (e.g. we might think "Poisson" if the data collected are a rate, number of events per some unit/interval), sometimes it's a matter of doing some exploratory data analysis (histograms, boxplots, some numerical summaries, and the like).

For actually analyzing data: I would suggest hiring someone with more extensive training in Statistics to actually do such. Taking one course in Stats, which is basically what KhanAcademy goes through, isn't really enough to prepare someone to be a data analyst. I see the primary goal of taking one or two stats courses as giving you enough information to allow you to understand the results of statistical analyses. You can better tell the statistician what you want in his/her own terms, and you can better understand what s/he gives back to you.
My friend Callum and I have been experimenting with the sampling distribution program on the online stat book used by Sal (http://onlinestatbook.com/stat_sim/sampling_dist/index.html). However, we found a result we can neither explain nor rationalise: when we ask for a sample size of 2 for the median distribution of any population, it approximates the population distribution and not a 'bell curve'. I am very disturbed by this because surely the median of 2 numbers is the same as the mean of 2 numbers and, according to the central limit theorem, should approximate a normal distribution. Is this assumption correct? Is the programme wrong? Or is there something we fail to understand?
The distribution of the sample median is not normal even if you take a larger sample size, such as n=5,10,or 25. The distribution of the sample median seems to be more related to the distribution of the population.
But I don't know why.
Why can we say that the sampling distribution of the mean follows a normal distribution for a large enough sample size even though the population may not be normally distributed?
Properly, the sampling distribution APPROXIMATES a normal distribution for a sufficiently large sample (sometimes cited as n > 30). A coin flip is not normally distributed, it is either heads or tails. But 30 coin flips will give you a binomial distribution that looks reasonably normal (at least in the middle).
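To see how close "reasonably normal" is for the coin-flip case, here is a small sketch that compares the exact binomial probabilities for 30 fair flips with a normal curve of the same mean and standard deviation. The function names are illustrative, and the choice of n = 30 simply follows the example above.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(k, n, p=0.5):
    """Exact probability of k heads in n flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mean, sd):
    """Height of the normal curve with the given mean and standard deviation."""
    return exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * sqrt(2 * pi))

n, p = 30, 0.5
mean, sd = n * p, sqrt(n * p * (1 - p))  # binomial mean and standard deviation

# Print the middle of the distribution, where the match is closest.
for k in range(10, 21):
    print(f"{k:2d} heads: binomial {binomial_pmf(k, n, p):.4f}   normal {normal_pdf(k, mean, sd):.4f}")
```

The two columns agree closely near the centre; the relative mismatch grows in the tails, which is the "at least in the middle" caveat.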
@ 9:15 two distributions are shown and compared (N=5 and N=25) and Sal explains in terms of skew and Kurzweillosis (or something) that the N=25 distribution is more normal. But wait... it does not LOOK more normal to me. Specifically, it looks a lot lumpier... as if it were composed of less data. Each bin is fatter and there are less bins. Am I making sense? Can someone explain?
The lumpier look you're seeing is exactly because there are fewer bins. If we wanted, we could go in and specify how we wanted the bins formed, but typically there's just a computer algorithm that chooses the bins in some fashion. If we chose a few more bins there, it would look much smoother.

The bottom histogram looks more normal because of the general behavior of the distribution. The one for n=5 is like a normal distribution that was smashed down a bit. It's too short in the middle and has too "fat" of tails. If you think back to, say, the Empirical Rule, the top one would probably have less than 68% of the data within 1 standard deviation of the mean.

P.S. The word is "kurtosis"; it's a way to describe the "peakedness" of the graph. A graph with high kurtosis will have a much sharper peak (picture 1 below), while a graph with low kurtosis will have much more of a rolling hill look to it (picture 2 below).

Picture 1:
http://commons.wikimedia.org/wiki/File:Orographic_lifting_of_the_air_-_NOAA.jpg

Picture 2:
http://en.wikipedia.org/wiki/File:FoothillsCO.JPG
I'm a little confused about what you're doing at 04:40. Lets say the PDF represents the 32 species of animals on a small island. So that application selects 5 types of animals lets say zebras, goats, penguins, gorillas and porcupines and plots their mean on the graph below. How the hell can you get the mean of a set of 5 species of animals? I don't get it.
@cnidoblast, selecting 5 types of animals invalidates the CLT. One of the assumptions of the most common CLT (there are actually many versions, this one is the most common) is that the observations, what Mr. Khan calls samples, are independent and identically distributed instances of a random variable. A random variable is a function that converts an observation from a random process into a number. Your animals are not numbers, so it's meaningless to sum them, much less find the mean. If you're talking about averaging their weights, then it still fails the CLT assumptions because the weights that you're averaging do not come from an identical distribution. That is, the distribution of weights of zebras is very different from the distribution of weights of goats. Hope this helps! :)
Could you define a measure of skewness as (mean-median)/standard deviation? An advantage of this would be that it is easier to calculate, and it can only take values between -1 and 1
I'm having some issues with this question.

3. For the general population, mean IQ is 100 with a standard deviation of 15. A sample of 100 people is selected at random from the population, with a sample mean of 102. This sample mean comes from a distribution of sample means with the following properties:

a. a mean of 100 and a standard error of 1.5
b. a mean of 102 and a standard error of 1.5
c. a mean of 100 and a standard error of 15
d. a mean of 102 and a standard error of 15

I think that the answer is either a or b, because you would divide the SD of 15 by the square root of the sample size (the square root of 100 is 10), which gives 1.5. But I have no idea what to do about the mean 100/102. Can anyone explain why it is one or the other?
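The general population is known to have a mean IQ of 100. That means that the distribution of sample means also has a mean of 100. The sample mean of 102 is just one value drawn from that distribution, so the answer is (a): a mean of 100 and a standard error of 15/sqrt(100) = 1.5.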
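A minimal sketch of the arithmetic for the IQ question, assuming the usual formula SE = SD / sqrt(n) and using Python's statistics.NormalDist for the sampling distribution. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

population_mean = 100.0      # given in the question
population_sd = 15.0         # given in the question
sample_size = 100            # given in the question
observed_sample_mean = 102.0

# The sampling distribution of the mean is centred on the POPULATION mean.
standard_error = population_sd / sqrt(sample_size)
sampling_dist = NormalDist(mu=population_mean, sigma=standard_error)

# How unusual is the one sample mean that was actually observed?
z = (observed_sample_mean - population_mean) / standard_error
print("standard error:", standard_error)
print("z-score of the observed sample mean:", round(z, 3))
print("P(sample mean > 102):", round(1 - sampling_dist.cdf(observed_sample_mean), 4))
```

The observed 102 never enters the centre of the distribution; it only shows up as a point on it.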
At 513 pm: For some reason, I understand this when it comes to means but in Sampling distribution of the sample proportion- Using population (4,5,9), sample size n = 2- I am struggling to construct a table that represents the sampling distribution of the sample proportion of odd numbers. Can you please explain?
How would one answer a question such as "what is the sampling distribution of the sample mean? Explain." after being given a problem where the only info given is the mean of a (normal) distribution and its standard deviation? There is also a number that is being randomly computed and averaged. Is the sample mean the mean of the normal distribution?
The sampling distribution of the sample mean from a normal distribution is itself normally distributed. The mean of the sampling distribution is the mean of the original distribution (by symmetry there is no other possible result), and the standard deviation of the sampling distribution shrinks by a factor of the square root of the sample size.

This derives from the properties of the variance. When you add two independent random variables, their variances add. Thus when you add n independent, identically distributed random variables, the variance of the sum is n times the original variance and the standard deviation (square root of the variance) is sqrt(n) times the original standard deviation. Divide this by n, to AVERAGE the n random variables, and you get the above result: sqrt(n)/n = 1/sqrt(n) times the original standard deviation.
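The variance argument can be checked with a quick simulation. This is only a sketch: the population values (mean 10, standard deviation 4), the sample size of 16 and the number of repetitions are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import pstdev

def sd_of_sample_means(mu, sigma, n, repetitions, rng):
    """Draw `repetitions` samples of size n from a normal population
    and return the standard deviation of their means."""
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    return pstdev(means)

mu, sigma, n = 10.0, 4.0, 16          # assumed values, not from the thread
rng = random.Random(0)                # fixed seed so the run is repeatable
simulated = sd_of_sample_means(mu, sigma, n, 20000, rng)

print("simulated SD of the sample means:", round(simulated, 3))
print("sigma / sqrt(n):                 ", sigma / sqrt(n))
```

The simulated value should sit close to sigma / sqrt(n), with a small gap due to simulation noise.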
how do distributions provide a link between probabilities and statistical tests
Statistical tests are generally trying to compute the probability of something. Most often, there is an assumption (hypothesis), and we find the probability of the observed results assuming that hypothesis is true.

The probabilities can be calculated in a few different ways, but a very common method is through a distribution. So, we think that the data or a function of it, like a test statistic, has a particular distribution (this is generally _proven_, so it's not just a guess), and we can use that distribution to calculate probabilities.
What is the relationship between M, mu, and mu with subscript M?
I have a question that I don't quite understand, and it goes like this: "Assume the weights of eggs produced on an egg farm have a normal distribution with mean 64 grams and standard deviation 7 grams." It also says: "Describe the distribution of weights of 12 (randomly chosen) mixed grade eggs."
9:08, how do you get five samples from the non-normally distributed probability function? How do you get a set of data from the probability function?
Computers can quite easily simulate uniform distributions (for example, the rand() function in MATLAB gives a number between 0 and 1 according to a uniform distribution). With that number you can simulate all sorts of other distributions.
For example, if you want to simulate a fair die you do:
x = rand(1);      % uniform random number between 0 and 1
if x < 1/6
    y = 1;
elseif x < 2/6
    y = 2;
elseif x < 3/6
    y = 3;
elseif x < 4/6
    y = 4;
elseif x < 5/6
    y = 5;
else
    y = 6;
end

This is how you can easily simulate discrete distributions.
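The same idea extends to any discrete distribution: cut the interval from 0 to 1 into pieces whose lengths are the probabilities, then see which piece the uniform number falls in. Below is a sketch in Python rather than MATLAB; the function name sample_discrete is hypothetical and the fair die is just the example from above.

```python
import random
from itertools import accumulate

def sample_discrete(values, probabilities, rng=random):
    """Return one of `values`, chosen with the matching probability,
    using a single uniform random number between 0 and 1."""
    x = rng.random()
    # Walk through the running totals 1/6, 2/6, ... until x falls below one.
    for value, upper in zip(values, accumulate(probabilities)):
        if x < upper:
            return value
    return values[-1]  # guard against floating-point round-off in the running total

faces = [1, 2, 3, 4, 5, 6]
rng = random.Random(42)               # fixed seed so the run is repeatable
rolls = [sample_discrete(faces, [1 / 6] * 6, rng) for _ in range(60000)]

for face in faces:
    print(face, rolls.count(face))    # each count should be near 10,000
```

Changing the probability list (for example to a loaded die) needs no change to the sampling code.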
My professor said the answer to the problem is "NOT" 0. I take meticulous notes, record lectures, online research, etc. Why can't I figure this out. Do I need to somehow calculate a sample proportion? Not sure what else to do. If the sample proportion is not given, how do I find it. The problem is the Z scores are above 3 and our Standard Normal Distribution Table stops at 3. Again, he said the answer is not 0. Below are some problems directly pasted here:

1) Given a normal distribution with a µ = 100 and σ = 10, if you select a random sample of n = 25, what is the probability that the sample mean is between 90 and 97.5?

2) Given a normal distribution with a µ = 50 and σ = 8, if you select a random sample of n = 100, what is the probability that the sample mean is between 47 and 49.5?

3) Given a normal distribution with a µ = 50 and σ = 5, if you select a random sample of n = 100, there is a 35% chance that the sample mean is above what value?



I'm really struggling here with the Z's being greater than 3. working on this for three days. Not just trolling for answers and being lazy. I desperately want to know the techniques and steps to calculate situations like this. Thank you very very much.
The key to all of these questions is using the standard error of the mean which is described in one of the next videos in the section.

Briefly, the SE (standard error) = standard deviation / sqrt(sample size).

For 1) the SE = 10 (standard deviation) / 5 (sqrt of 25) = 2. Using a z table, we are looking for the probability of z between -5 {(90-100)/2} and -1.25 {(97.5-100)/2}. The area below z = -5 is essentially zero, so the answer is the area below z = -1.25, which is .1056 using this online table (http://www2.fiu.edu/~millerr/Normal%20Table.pdf).

The other problems are solved similarly
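When the z-scores run past the end of a printed table, software can evaluate the normal curve directly. The sketch below assumes Python's standard-library statistics.NormalDist; the two helper names are made up for illustration, and the inputs are the three problems quoted in the question.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, low, high):
    """P(low < sample mean < high) for samples of size n from N(mu, sigma)."""
    standard_error = sigma / sqrt(n)
    dist = NormalDist(mu, standard_error)     # sampling distribution of the mean
    return dist.cdf(high) - dist.cdf(low)

def mean_exceeded_with_prob(mu, sigma, n, prob_above):
    """Value that the sample mean exceeds with probability `prob_above`."""
    standard_error = sigma / sqrt(n)
    return NormalDist(mu, standard_error).inv_cdf(1 - prob_above)

print("1)", round(prob_mean_between(100, 10, 25, 90, 97.5), 4))
print("2)", round(prob_mean_between(50, 8, 100, 47, 49.5), 4))
print("3)", round(mean_exceeded_with_prob(50, 5, 100, 0.35), 3))
```

A z-score below -3 just means that end of the range contributes almost nothing; the other end still carries the probability.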
I need help putting together the formula to answer the question: "A population is bimodal with a variance of 5.77. One hundred samples of size 30 are randomly selected and the 100 sample means are calculated. The standard deviation of the sample means is approximately:"
I have a question that I can't figure out, please help:
Identify the class width, class midpoints, & class boundaries for the given frequency distribution
Daily low temp (F)    Frequency
32-35                 1
36-39                 3
40-43                 5
44-47                 11
48-51                 7
52-55                 7
56-59                 1
The class width is the width of each interval, which in this case is 4 (e.g. {32, 33, 34, 35} has 4 items).

The midpoints are the middle of each class, (top + bottom)/2, which is 33.5 in the case of the first one.

The class boundaries are the dividing points between adjacent classes, placed so that data that gets rounded up or down still lands in the right class: you subtract 0.5 from each lower limit and add 0.5 to each upper limit. So they would be 31.5, 35.5, 39.5, ..., 59.5.
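The three quantities can be worked out mechanically from the class limits. This is a small sketch for the temperature table in the question; the function name class_summary is hypothetical, and it assumes integer limits and equally wide classes.

```python
# (lower limit, upper limit) of each class, taken from the table in the question
classes = [(32, 35), (36, 39), (40, 43), (44, 47), (48, 51), (52, 55), (56, 59)]

def class_summary(classes):
    """Return (class width, list of midpoints, list of class boundaries)."""
    width = classes[1][0] - classes[0][0]            # gap between consecutive lower limits
    midpoints = [(low + high) / 2 for low, high in classes]
    # One boundary below the first class, then one above every class.
    boundaries = [classes[0][0] - 0.5] + [high + 0.5 for _, high in classes]
    return width, midpoints, boundaries

width, midpoints, boundaries = class_summary(classes)
print("class width:", width)
print("midpoints:  ", midpoints)
print("boundaries: ", boundaries)
```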
I have a question I'm failing to solve: "A population has a mean of 200 and a standard deviation of 50. A simple random sample of size 100 will be taken and the sample mean x will be used to estimate the population mean. Show the sampling distribution of the sample mean."
at 8:45, it has been said that even for single samples the central limit theorem is true. It is not so, central limit theorem is applicable only for sample MEANS. For example, out of a population of 5000 if I have taken the sample of n=50, central limit theorem does NOT apply to that. It applies only when I have taken (e.g.)40 samples of n=50. However, this is as per my understanding. Please correct me if I am wrong.
What I don't understand is when you have a large Binary distribution, for example, and you approximate it using the Normal distribution. If you only have one sample consisting of x values, you haven't really got a standard deviation. We always have those kinds of questions on the exam, but I always get the formula wrong then.
As long as you know all the values in the sample, you can do the series of calculations described under "basic examples" here http://en.wikipedia.org/wiki/Standard_deviation to figure out what the sample's standard deviation is. Of course, with samples you have to divide by N-1, as the Wikipedia article (as well as Sal's video on standard deviation) explains; otherwise it's exactly the same. Perhaps you are limiting your definition of "standard deviation" to "standard deviation of the population", which you of course can't figure out with just one sample of values? If it's not specified that the population's SD is asked for in the exam question you're describing, it's safe to assume that they are asking for the sample's SD.
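Here is a short sketch of that calculation, dividing by N-1, checked against Python's built-in statistics.stdev. The eight data values are made up for illustration, and the helper name sample_sd is hypothetical.

```python
from math import sqrt
from statistics import pstdev, stdev

def sample_sd(values):
    """Sample standard deviation: squared deviations summed, divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return sqrt(squared_deviations / (n - 1))

sample = [2, 4, 4, 4, 5, 5, 7, 9]     # illustrative values only

print("by hand, dividing by N-1:", round(sample_sd(sample), 4))
print("statistics.stdev:        ", round(stdev(sample), 4))
print("dividing by N (pstdev):  ", round(pstdev(sample), 4))
```

The N-1 version is always a little larger than the divide-by-N version, and the gap shrinks as the sample grows.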
I'm trying to picture skew and kurtosis, but I have no idea how much the numbers actually mean. Is there a video that gives a good idea of how much skew is, say: 0.1, 0.5, 1, and 10? Same thing with kurtosis. I like having a feel for what the value means in my brain.
only the mean follows the CLT ?
What would be the difference between the distribution of a sample variable and the sampling distribution of the mean?..? I'm so confused between these two terms
Sal repeats "well defined mean" and "well defined variance" a couple of times at the very beginning of the video. When are these quantities not well defined?
what does the ! following a number mean
7! means 7 factorial which is the same as 7*6*5*4*3*2*1 or 5040
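For a quick check of the arithmetic, here is a sketch that multiplies it out and compares it with Python's math.factorial. The helper name factorial_by_hand is made up.

```python
from math import factorial

def factorial_by_hand(n):
    """Multiply 1 * 2 * ... * n (the product is 1 when n is 0 or 1)."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

print(factorial_by_hand(7), factorial(7))
```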
This seems to be a simple question to answer, but I'm actually not 100% certain about it:

say there's a population that's normally distributed with mean u and standard deviation s. An independent sample of N observations is drawn from the population. What is the distribution of the sample mean? I think it's still a normal distribution, but I'm not sure if this is correct and sufficient, because I'm still in the process of getting comfortable with all this stat lingo.

Thanks!
Yes. If X is normally distributed, then the sample mean xbar will also be normally distributed regardless of the sample size. If X is _not_ normally distributed, then we have to make sure the sample size is large enough for the Central Limit Theorem to kick in.
There were two cases talked about; n=5 and n=25. It was said that after 10,000 samples the n=25 was a closer fit to the normal distribution than the n=5 case. What I want to know is, if there were infinite samples, would the n=5 and the n=25 cases both be a perfect normal distribution?

If this is so: As the number of samples tends to infinity, does the n=25 case converge to the normal distribution faster than the n=5 case?
This is answered in the next video in the series.
2:20 Can you really do it with the mode? It seems like there would be some distributions in which no matter how many samples you take, the mode would not be normally distributed.
In this example Sal took 10,000 samples of 5 for a total of 50,000 samples in the first example. Why not just take 50,000 samples of the original distribution and calculate the mean and SD?
Every random variable has some sort of probability distribution. When we have a lot of data, we can plot them in a histogram and "see" the probability distribution. This is what we often do to see the distribution of the raw data. But the sampling distribution of the sample mean is trickier business. When we calculate the sample mean (xbar), we have 1 value. Xbar is still a random variable, but for a given dataset, we have only 1 value of xbar, and using just 1 value is not going to provide a very useful plot. We'd really like to see how the sampling distribution of xbar behaves, but for that we need to have a lot of xbar's.

So we can do some experiments like Sal has done. He decided how to generate some data (according to that very strange population he was making on the top panel), and then he drew 5 observations from that distribution. By calculating the mean, we get 1 observation from the sampling distribution of the sample mean. If we do this over and over again, that lets us get 10,000 observations from the sampling distribution of the sample mean. Plotting all of these together lets us see how the sampling distribution of the sample mean behaves - at least for the distribution Sal specified.

If we had put all 50,000 observations that we drew together and calculated the sample mean and SD, that would just be 1 observation from the sampling distribution of the sample mean (with n=50,000 instead of n=5). If we plotted all 50,000 together, that would be plotting the distribution of the raw data, not the distribution of the sample mean.
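The difference between the two approaches shows up clearly in a simulation. The sketch below does not reproduce Sal's population; it assumes a skewed exponential population with mean 1 and standard deviation 1, chosen only because it is easy to generate. The function names are illustrative.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def draw(rng):
    """One observation from an assumed skewed population (mean 1, SD 1)."""
    return rng.expovariate(1.0)

def simulate_sample_means(n, repetitions, rng):
    """Take `repetitions` samples of size n and return the list of their means."""
    return [fmean([draw(rng) for _ in range(n)]) for _ in range(repetitions)]

rng = random.Random(0)                             # fixed seed so the run is repeatable

sample_means = simulate_sample_means(5, 10000, rng)   # 10,000 values of xbar
pooled = [draw(rng) for _ in range(50000)]            # 50,000 raw observations

print("SD of the 10,000 sample means:", round(pstdev(sample_means), 3))
print("sigma / sqrt(5):              ", round(1 / sqrt(5), 3))
print("SD of the 50,000 raw values:  ", round(pstdev(pooled), 3))
print("mean of the pooled data (a single xbar):", round(fmean(pooled), 3))
```

The 10,000 means spread out much less than the raw data, while the pooled 50,000 values give only one mean and the spread of the raw population.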

Let me know if this helps. It's a pretty tricky concept to grasp, I've had college students struggle to understand this, and that was when I was there explaining it in person.
The video won't play, please fix.
maybe there's a bug or something.
Are the sample mean and the population mean the same? While solving questions on confidence intervals, why do we always subtract the sample mean from the value when the formula includes the population mean?
there are less videos on econometrics..:(
A manufacturer knows that their items have a normally distributed lifespan, with a mean of 2.6 years, and standard deviation of 0.5 years.
If you randomly purchase 25 items, what is the probability that their mean life will be longer than 3 years?