Sampling distribution of the sample mean
The central limit theorem and the sampling distribution of the sample mean
Discussion and questions for this video
 In the last video, we learned about what is, quite
 possibly, the most profound idea in statistics.
 And that's the central limit theorem.
 And the reason why it's so neat is that we can start with any
 distribution that has a well-defined mean and variance.
 Actually, in the last few videos I labeled this with the standard
 deviation; that should be the mean.
 And let's say it has some variance.
 I could write it like that,
 or I could write the standard deviation there.
 But as long as it has a well-defined mean and standard
 deviation, I don't care what the distribution looks like.
 What I can do is take samples. In the last
 video, they were of size 4.
 That means I take, literally, four instances
 of this random variable.
 This is one example.
 I take their mean,
 and I consider this the sample mean from my first trial,
 or, you could almost say, for my first sample.
 I know it's confusing, because you can
 consider the whole set to be a sample,
 or you can consider each member
 of the set to be a sample.
 So that can be a little bit confusing there.
 But I have this first sample mean.
 And then I keep doing that over and over.
 In my second sample, my sample size is 4.
 I got four instances of this random variable.
 I average them.
 I have another sample mean.
 And the cool thing about the central limit theorem is, as I
 keep plotting the frequency distribution of my sample
 means, it starts to approach something that approximates
 the normal distribution.
 And it's going to do a better job of approximating that
 normal distribution as n gets larger.
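To make the procedure concrete, here is a minimal Python sketch of it (not the applet used later in the video). The population values and weights are made-up assumptions, chosen only so the starting distribution is clearly non-normal: each trial draws a sample of size 4, records its mean, and the sample means are then tallied into a frequency distribution.

```python
import random
import statistics
from collections import Counter

# Hypothetical non-normal population: the values 1..6 with lopsided,
# made-up weights (its exact mean works out to 4.0).
VALUES = [1, 2, 3, 4, 5, 6]
WEIGHTS = [0.30, 0.05, 0.05, 0.05, 0.05, 0.50]

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 4                   # sample size: four instances per sample
trials = 10_000         # how many samples (and sample means) to generate

means = []
for _ in range(trials):
    sample = rng.choices(VALUES, weights=WEIGHTS, k=n)  # one sample of size n
    means.append(statistics.fmean(sample))              # that sample's mean

# Frequency distribution of the sample means, bucketed to the nearest 0.5.
freq = Counter(round(m * 2) / 2 for m in means)
for value in sorted(freq):
    print(f"{value:4.1f} {'#' * (freq[value] // 50)}")

print("mean of the sample means:", round(statistics.fmean(means), 2))
```

Raising `n` and rerunning is a quick way to watch the text histogram become more bell-shaped.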
 And just so we have a little terminology under our belt, this
 frequency distribution that I plotted out
 here, or up here, that I started plotting out,
 is called (and it's kind of confusing, because we use
 the word sample so much)
 the sampling distribution
 of the sample mean.
 And let's dissect this a little bit,
 just so that this long description of the
 distribution starts to make a little bit of sense.
 When we say it's the sampling distribution, that's telling us
 that it's the distribution of some
 statistic, which in this case happens to be the sample mean,
 and we're deriving it from samples of an original
 distribution.
 So this is my first sample.
 My sample size is 4.
 The statistic I'm using is the mean.
 I actually could have done it with other statistics,
 like the mode or the range.
 But the sampling distribution of the sample mean is
 the most common one,
 and it's probably, in my mind, the best place to start learning about
 the central limit theorem,
 and even, frankly, about sampling distributions.
 So that's what it's called.
 And just as a little bit of background (I'll show
 this to you experimentally, not prove it mathematically,
 but I think the experiment is, on some levels, more
 satisfying than a proof):
 this will have the same mean as your original
 distribution right here.
 So it has the same mean.
 But we'll also see that
 it's going to start approximating a
 normal distribution,
 even though the original distribution that this is
 generated from is completely non-normal.
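Sal shows the "same mean" claim experimentally. As a supplement, it also follows in one line from linearity of expectation, assuming each of the n instances X_1 through X_n is drawn from the original distribution with mean μ (independence is not needed for this particular step):

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
\quad\Longrightarrow\quad
E\left[\bar{X}\right] = \frac{1}{n}\sum_{i=1}^{n} E\left[X_i\right]
                      = \frac{1}{n}\,(n\mu) = \mu
```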
 So let's do that with this app right here.
 And just to give proper credit where credit is due, I think
 this was developed at Rice University.
 It's from onlinestatbook.com.
 This is their app, which I think is a really neat app,
 because it really helps you to visualize what a sampling
 distribution of the sample mean is.
 So I can literally create my own custom distribution here.
 So let me make something kind of crazy.
 So you can do this in theory with a discrete or a continuous
 probability density function.
 But the distribution they have here can take on 1 of 32 values.
 And I'm just going to set the different probabilities of
 getting any of those 32 values.
 So clearly this right here is not a normal distribution.
 It looks a little bit bimodal, but it doesn't have long tails.
 But what I want to do first is use a simulation to
 better understand what the sampling
 distribution is all about.
 So what I'm going to do is take, to
 start with, 5 at a time.
 So my sample size is going to be 5.
 And so when I click animate, it's
 going to take five samples from this probability
 distribution function,
 and you're going to see them.
 It's going to average them and plot the average down here.
 And then I'm going to click it again,
 and it's going to do it again.
 So there you go.
 I got five samples from there,
 it averaged them,
 and the mean landed there.
 What did I just do? I clicked...
 Oh, I wanted to clear that.
 Let me set this bottom one to none
 and do that over again.
 So I'm going to take 5 at a time.
 I took five samples from up here,
 then it took their mean
 and plotted the mean there.
 Let me do it again:
 five samples from this probability distribution
 function, and it plotted the mean right there.
 I could keep doing this (it'll take some time), but, as you can see,
 it plotted it right there.
 Now, I could do this a thousand times by hand,
 but it's going to take forever.
 Let's say I just wanted to do it 1,000 times.
 This program, just to be clear, is actually
 generating the random numbers.
 This isn't a rigged program.
 It's actually going to generate the random numbers according
 to this probability distribution function,
 take five at a time, find their means,
 and plot the means.
 So if I click 10,000, it's going to do that 10,000 times.
 It's going to take 5 numbers from here 10,000 times,
 find their means 10,000 times,
 and then plot the 10,000 means here.
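The applet's 10,000-trial run can be imitated with a short Python sketch. This is not the applet's code: the 32 values (0 through 31) and their roughly bimodal weights are hypothetical stand-ins for the hand-drawn distribution, so the numbers will not match the 14.45 and 14.42 read off in the video.

```python
import random
import statistics

# Hypothetical stand-in for the custom distribution drawn in the applet:
# 32 possible values (0..31) with made-up, roughly bimodal weights.
VALUES = list(range(32))
WEIGHTS = [5 if v < 8 or v > 23 else 1 for v in VALUES]

def simulate(sample_size, trials, seed=0):
    """Repeat `trials` times: draw `sample_size` values, keep their mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        draws = rng.choices(VALUES, weights=WEIGHTS, k=sample_size)
        means.append(statistics.fmean(draws))
    return means

pop_mean = sum(v * w for v, w in zip(VALUES, WEIGHTS)) / sum(WEIGHTS)
means = simulate(sample_size=5, trials=10_000)
print(f"population mean      {pop_mean:.2f}")
print(f"mean of sample means {statistics.fmean(means):.2f}")
```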
 So let's do that.
 So there you go.
 Notice, it's already looking a lot like a normal distribution.
 And the original mean of my crazy
 distribution here was 14.45.
 After 10,000 samples, or 10,000
 trials, the mean of the sample means is 14.42.
 So I'm already getting pretty close to the original mean.
 My standard deviation, you might notice,
 is less than the original one.
 We'll talk about that in a future video.
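The video defers the smaller standard deviation to a later video. The standard result, assuming the draws within each sample are independent, is that the sampling distribution of the mean has standard deviation σ/√n. The sketch below checks that numerically; the exponential population (mean 1, standard deviation 1) is an assumption chosen for illustration, not the applet's distribution.

```python
import math
import random
import statistics

def sd_of_sample_means(n, trials=20_000, seed=1):
    """Empirical standard deviation of `trials` sample means of size n."""
    rng = random.Random(seed)
    # Hypothetical skewed population: exponential with rate 1 (mean 1, SD 1).
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

sigma = 1.0  # population standard deviation of Exponential(1)
results = {n: sd_of_sample_means(n) for n in (1, 5, 25)}
for n, sd in results.items():
    print(f"n={n:>2}  simulated SD={sd:.3f}  sigma/sqrt(n)={sigma / math.sqrt(n):.3f}")
```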
 And then there's the skew and kurtosis.
 These are measures that help us gauge
 how normal a distribution is.
 I've talked a little bit about them in the past,
 but let me digress a little bit,
 just because it's interesting,
 and they're fairly straightforward concepts.
 Let me do
 it in a different color.
 If this is a perfect normal distribution (and clearly
 my drawing is very far from perfect),
 it would have a skew of 0.
 If you have a positive skew, that means you have a
 larger right tail than you would otherwise expect.
 So something with a positive skew might look like this:
 it would have a long tail to the right.
 So this would be a positive skew, which makes it a
 little less than ideal as a normal distribution.
 And a negative skew would look like this.
 It has a long tail to the left.
 So that is a negative skew.
 If you have trouble remembering it, just remember which
 direction the tail is going.
 This tail is going towards the negative direction;
 this tail is going towards the positive direction.
 So if something has no skew, that means that it's nice and
 symmetrical around its mean.
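One common way to put a number on skew is the moment skewness: the average of the cubed z-scores. The Python sketch below uses that simple formula on tiny made-up data sets; the applet may use a bias-corrected variant, so its reported values could differ slightly.

```python
import statistics

def skewness(data):
    """Moment skewness: the average cubed z-score (0 for symmetric data)."""
    mu = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return statistics.fmean(((x - mu) / sd) ** 3 for x in data)

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 2, 3, 10]     # long tail to the right
left_tailed = [-x for x in right_tailed]  # mirror image: long tail to the left

for name, data in [("symmetric", symmetric), ("right", right_tailed), ("left", left_tailed)]:
    print(f"{name:>9}: {skewness(data):+.3f}")
```

The sign follows the tail, matching the rule of thumb above: right tail positive, left tail negative.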
 Now kurtosis, which sounds like a very fancy word, is similarly
 not that fancy of an idea.
 So, once again, let me draw a perfect normal
 distribution. Remember, there is no one normal distribution;
 you could have different means and different
 standard deviations.
 Let's say that's a perfect normal distribution.
 If I have positive kurtosis, I'm
 going to have fatter tails.
 Let me draw it a little nicer than that.
 I didn't have to draw it that pointy.
 Let me draw it like this.
 I'm going to have fatter tails, and I'm going to have a
 more pointy peak than a normal distribution.
 So this, right here, is positive kurtosis.
 Something that has positive kurtosis, depending on how
 positive it is, is a little bit more pointy
 than a true normal distribution.
 And negative kurtosis means thinner tails, but it's
 flatter near the middle.
 So it's like this.
 Something like this would have negative kurtosis.
 Maybe in future videos, we'll explore that
 in more detail.
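Kurtosis can be computed the same way, as the average fourth-power z-score. That raw value is 3 for a normal distribution, so the "positive versus negative" language used here corresponds to excess kurtosis (raw minus 3); the applet appears to report excess kurtosis, since it shows negative values. The sketch compares simulated normal, uniform (thin tails, flat middle), and Laplace (fat tails, pointy peak) data; the distributions and sample size are illustrative choices.

```python
import random
import statistics

def kurtosis(data):
    """Moment kurtosis: the average fourth-power z-score (3 for a normal)."""
    mu = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return statistics.fmean(((x - mu) / sd) ** 4 for x in data)

def excess_kurtosis(data):
    """Kurtosis relative to the normal distribution (0 for a normal)."""
    return kurtosis(data) - 3.0

rng = random.Random(2)
N = 50_000
normal = [rng.gauss(0.0, 1.0) for _ in range(N)]
uniform = [rng.uniform(0.0, 1.0) for _ in range(N)]  # thin tails, flat middle
laplace = [rng.expovariate(1.0) * rng.choice((-1, 1)) for _ in range(N)]  # fat tails, pointy peak

for name, data in [("normal", normal), ("uniform", uniform), ("laplace", laplace)]:
    print(f"{name:>7}: kurtosis={kurtosis(data):.2f}  excess={excess_kurtosis(data):+.2f}")
```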
 But in the context of the simulation, it's just
 telling us how normal this distribution is.
 So when our sample size was n equals 5 and we did 10,000
 trials, we got pretty close to a normal distribution.
 Let's do another 10,000 trials just to see what happens.
 It looks even more like a normal distribution.
 Our mean is now the exact same number as the original mean.
 But we still have a little bit of skew and a
 little bit of kurtosis.
 Now let's see what happens if we were to do the same thing
 with a larger sample size.
 And we could actually do them simultaneously.
 So here's n equals 5,
 and here let's do n equals 25.
 Let me clear them.
 I'm going to do the sampling distribution
 of the sample mean.
 I'm going to run 10,000 trials, but first I'll do one
 animated trial, just so you remember what's going on.
 So I'm literally taking 5 samples from up here first
 and finding their mean.
 Now I'm taking 25 samples from up here,
 finding their mean,
 and then plotting it down here.
 So here the sample size is 25;
 here it's 5.
 I'll do it one more time.
 I take 5, get the mean, plot it.
 Take 25, get the mean, and then plot it down there.
 This is a larger sample size.
 Now that thing that I just did, I'm going to do 10,000 times.
 Remember, our first distribution was just
 this really crazy, very non-normal distribution.
 But once we did it... whoops,
 I didn't want to make it that big.
 Let me scroll up a little bit.
 So here's what's interesting:
 they both look fairly normal.
 But if you look at the skew and the kurtosis, when our
 sample size is larger, it's more normal.
 This has a lower skew than when our sample size was only 5,
 and it has a less negative kurtosis than when our
 sample size was 5.
 So this is a more normal distribution.
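The same comparison of n = 5 against n = 25 can be sketched in Python. Sal's population is bimodal, which is why his kurtosis values are negative; the exponential population used here is an assumption, and it starts out with positive skew and kurtosis instead. Either way, both numbers should move toward 0 as n grows (for exponential draws, the mean of n values has skew 2/√n and excess kurtosis 6/n).

```python
import random
import statistics

def shape_stats(data):
    """Return (skewness, excess kurtosis) using simple moment formulas."""
    mu = statistics.fmean(data)
    sd = statistics.pstdev(data)
    z = [(x - mu) / sd for x in data]
    skew = statistics.fmean(v ** 3 for v in z)
    excess_kurt = statistics.fmean(v ** 4 for v in z) - 3.0
    return skew, excess_kurt

def sample_means(n, trials, rng):
    """Means of `trials` samples of size n from a hypothetical Exponential(1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

rng = random.Random(3)
skew5, kurt5 = shape_stats(sample_means(5, 20_000, rng))
skew25, kurt25 = shape_stats(sample_means(25, 20_000, rng))
print(f"n=5:  skew={skew5:.3f}  excess kurtosis={kurt5:.3f}")
print(f"n=25: skew={skew25:.3f}  excess kurtosis={kurt25:.3f}")
```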
 And one thing that we're going to explore further in a future
 video is that not only is it more normal in its shape, but it's
 also a tighter fit around the mean.
 And you can even think about why that kind of makes sense.
 When your sample size is larger, your odds of getting
 really far away from the mean are lower,
 because if you're taking 25
 samples or 100 samples, it's very unlikely that you're just going to get a
 bunch of stuff way out here, or a bunch of stuff way out here.
 You're very likely to get a reasonable spread of things.
 So it makes sense that your sample mean is
 less likely to be far away from the population mean.
 We're going to talk a little bit more about
 that in the future.
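The "tighter fit" intuition can be checked by counting how often a sample mean lands far from the true mean. This Python sketch assumes a Uniform(0, 1) population (true mean 0.5) and an arbitrary "far away" threshold of 0.1; both are illustrative choices, not values from the video.

```python
import random
import statistics

def fraction_far(n, threshold, trials=10_000, seed=4):
    """Share of sample means (size n) farther than `threshold` from the true mean."""
    rng = random.Random(seed)
    mu = 0.5  # true mean of the hypothetical Uniform(0, 1) population
    far = 0
    for _ in range(trials):
        sample_mean = statistics.fmean(rng.random() for _ in range(n))
        if abs(sample_mean - mu) > threshold:
            far += 1
    return far / trials

results = {n: fraction_far(n, threshold=0.1) for n in (5, 25, 100)}
for n, share in results.items():
    print(f"n={n:>3}: {share:.1%} of sample means are more than 0.1 away from 0.5")
```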
 But hopefully this satisfies you, at
 least experimentally,
 that the central limit theorem really does apply to
 any distribution with a well-defined mean and variance.
 I haven't proven it to you with mathematical rigor, which
 hopefully we'll do in the future.
 I mean, this is a crazy distribution.
 I encourage you to use this applet at onlinestatbook.com
 and experiment with other crazy distributions to
 see it for yourself.
 The interesting thing is that we're approaching a normal
 distribution, and as my sample size got larger, it became a better
 fit for a normal distribution.
Where on the onlinestatbook site is this little software toy?
Thanks, John
John,
http://onlinestatbook.com/ click on "content" in the upper left "List of Simulations and Demonstrations" scroll to the bottom of the page "Simulations from the Rice Virtual Lab in Statistics" and it's the second one down "Sampling Distribution Simulation". Hope that helps.
This is not about the vid (srry), but is your name Firstjohn26 have anything to do with 1 John 2:6 (Whoever claims to live in him must live as Jesus did)?
If we know the mean and the standard deviation of the population, then why are we taking samples, if we already have the data?
Thanks in advance.
Learning statistics can be a little strange. It almost seems like you're trying to lift yourself up by your own bootstraps. Basically, you learn about populations working under the assumption that you know the mean/stdev, which is silly, as you say, but later you begin to drop these assumptions and learn to make inferences about populations based on your samples.
Once you have some version of the Central Limit Theorem, you can start answering some interesting questions, but it takes a lot of study just to get there!
What is Point Estimation and Sampling error? Please give a sample.
I have a practice question that I just can't figure out. It is: "Eighteen subjects are randomly selected and given proficiency tests. The mean for this group is 492.3 and the standard deviation is 37.6. Construct the 98% confidence interval for the population standard deviation."
I don't know how to figure out the confidence interval for a standard deviation. Can you please help? Thanks. Katie
Well, I'm not sure exactly what language the answer must be in, but hopefully I can help with the theory:
If we assume a normal distribution, which is a pivotal assumption and is not explicitly stated (if the distribution is not normal, the question does not give enough info for an answer), the question seems to be asking:
How wide would a range of the distribution, centered on the mean, have to be to contain 98% of outcomes?
We already know that:
A range from -1 std.dev. to 1 std.dev. contains 68.3% of outcomes.
A range from -2 std.dev. to 2 std.dev. contains 95.4% of outcomes.
A range from -3 std.dev. to 3 std.dev. contains 99.7% of outcomes.
So the question is, how many std.dev.'s do we have to move away from the mean in both directions on the graph to contain 98% of outcomes? Not 95.4%, not 99.7%, exactly 98%. Right away you know the answer will be between 2 and 3 std.dev.'s, as 98% is between 95.4% and 99.7%.
To find the exact answer, pull up Excel and pick a random cell. Type in "=NormsInv(0.99)".
Why not 0.98? NormsInv is a cumulative function, so there is no way to specify only between -x std.dev.'s and x std.dev.'s. It will only give you the std.dev. level below which y% of outcomes fall. So, this will tell you the std.dev. level below which 99% of outcomes fall. Because the distribution is symmetrical, you know that 100 - 99 = 1% will be below the negative of the std.dev. level Excel gives you. And then you have the std.dev. levels between which 98% of outcomes lie. Then, of course, multiply your std.dev. level by the std.dev. given in the question, subtract it from the mean for the lower end of the range, and add it to the mean for the upper end of the range. That's it. That's your 98% confidence interval.
N=18, mean=492.3, std deviation=37.6. The z-score for a 98% confidence interval is 2.326 (approx). The formula for a confidence interval is x̄ (sample mean) +/- z*(std deviation)/sqrt(N). Thus, the 98% confidence interval is 492.3 +/- (2.326*37.6/sqrt(18)), or 492.3 +/- 20.6. What this really means is that we can be 98% confident that the population mean lies between 471.7 and 512.9.
If I'm wrong please let me know!
Hi... um, aren't we supposed to use the t-distribution instead of the z, because n (the sample size) is less than 30, and one of the conditions for using the t-table is that n is less than 30?
We have to use the chi-square approach.
The 98% confidence interval for the population standard deviation is
sqrt( (n-1)*s^2 / chi^2_{alpha/2, n-1} ) < σ < sqrt( (n-1)*s^2 / chi^2_{1-alpha/2, n-1} ), with alpha = 0.02
n = 18
s^2 = 37.6^2
sqrt( 17 * 37.6^2 / 33.4088 ) < σ < sqrt( 17 * 37.6^2 / 6.4077 )
The answer is
26.82 < σ < 61.24
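The arithmetic in the answers above can be double-checked with a short Python sketch. `NormalDist().inv_cdf(0.99)` computes the same quantity as Excel's NORMSINV(0.99). Python's standard library has no chi-square quantile function, so the two critical values for 17 degrees of freedom are hard-coded from the chi-square answer (with SciPy installed, `scipy.stats.chi2.ppf(0.99, 17)` and `scipy.stats.chi2.ppf(0.01, 17)` should reproduce them). Note that with n = 18 and σ estimated by s, a t critical value with 17 degrees of freedom is the usual choice for the mean's interval and would make it somewhat wider than the z version shown here.

```python
import math
from statistics import NormalDist

n, mean, s = 18, 492.3, 37.6

# z-based 98% interval for the population MEAN.
z = NormalDist().inv_cdf(0.99)          # same quantity as Excel's NORMSINV(0.99)
half_width = z * s / math.sqrt(n)
mean_ci = (mean - half_width, mean + half_width)

# Chi-square 98% interval for the population STANDARD DEVIATION.
# Critical values for 17 degrees of freedom (upper-tail areas 0.01 and 0.99),
# hard-coded from the chi-square answer.
chi2_upper, chi2_lower = 33.4088, 6.4077
sd_ci = (math.sqrt((n - 1) * s**2 / chi2_upper),
         math.sqrt((n - 1) * s**2 / chi2_lower))

print(f"z = {z:.3f}, mean CI = ({mean_ci[0]:.1f}, {mean_ci[1]:.1f})")
print(f"sd CI = ({sd_ci[0]:.2f}, {sd_ci[1]:.2f})")
```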
Is there any difference if I take 1 "sample" with 100 "instances", or I take 100 "samples" with 1 "instance"?
(By sample I mean S_1, S_2, and so on. By instances I mean the numbers, [1,1,3,6] and [3,4,3,1] and so on.)
There is a difference. Your "samples" (random selections of values "x"), made up of "instances" (whose count is referred to as "n"), are essentially the building blocks of your Sampling Distribution of the Sample Mean. Because your "instances" determine the value of x̄ (the mean of "x"), the size of "n" affects x̄ and also the Sampling Distribution of the Sample Mean's standard deviation (defined as the original dataset's standard deviation divided by the square root of "n").
For example: if you were to take 1 "sample" with 100 "instances", you would get only one piece of data, the mean of 100 items [1,1,3,6,3,6,3,1,1,1,1,1...] from your original data. Your sampling distribution of the sample mean's standard deviation would have a value of ((the original data's S.D.)/(the square root of 100)), but that wouldn't really matter, because your one mean will likely be very close to your original data's mean, and you'd only have one sample.
Now if you take 100 samples with 1 instance [3], you'll get many pieces of data, but no reduction in standard deviation from your original data: ((the original data's S.D.)/(the square root of 1)). Functionally, with enough samples taken like this, you'll recreate your original dataset! You won't be creating a useful sampling distribution of the sample mean, because each x̄ will simply equal its single "x". With 100 "samples" of 1 "instance", you're randomly picking 100 values of "x" and replotting them.
I hope that helps.
Sal goes over this better than I do in the next video as well!
Do your sample sizes have to be the same size? E.G, at 1:05(ish) there are a bunch of samples with a sample size of four. Would it mess up any calculations if you took a sample of four and then, say, a sample of ten?
Yes, the sample sizes should be the same. The sample size is not considered to be a variable, it's considered to be a constant. The sampling distribution of the sample mean can be thought of as "For a sample of size n, the sample mean will behave according to this distribution." Any random draw from that sampling distribution would be interpreted as the mean of a sample of n observations from the original population.
So if every distribution approaches normal, when do I employ, say, a Poisson or uniform or a Bernoulli distribution? I suppose it's a concept I haven't broached yet, but how do I know when or which distribution to employ so I appropriately analyze the data? End goal = solve real world problems!
Not every distribution goes to the Normal. The distribution of the sample mean does, but that's as the sample size increases. If you have smaller sample sizes, assuming normality either of the data or of the sample mean may be wholly inappropriate.
In terms of identifying the distribution, sometimes it's a matter of considering the nature of the data (e.g. we might think "Poisson" if the data collected are a rate, number of events per some unit/interval), sometimes it's a matter of doing some exploratory data analysis (histograms, boxplots, some numerical summaries, and the like).
For actually analyzing data: I would suggest hiring someone with more extensive training in Statistics to actually do such. Taking one course in Stats, which is basically what KhanAcademy goes through, isn't really enough to prepare someone to be a data analyst. I see the primary goal of taking one or two stats courses as giving you enough information to allow you to understand the results of statistical analyses. You can better tell the statistician what you want in his/her own terms, and you can better understand what s/he gives back to you.
Why can we say that the sampling distribution of the mean follows a normal distribution for a large enough sample size, even though the population may not be normally distributed?
Properly, the sampling distribution APPROXIMATES a normal distribution for a sufficiently large sample (sometimes cited as n > 30). A coin flip is not normally distributed, it is either heads or tails. But 30 coin flips will give you a binomial distribution that looks reasonably normal (at least in the middle).
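The coin-flip example can be checked directly. This Python sketch compares the exact binomial probabilities for 30 fair flips with the density of the normal distribution that has the same mean (15) and variance (7.5), evaluated at each count without a continuity correction; p = 0.5 and the range of counts printed are illustrative choices.

```python
import math
from statistics import NormalDist

n, p = 30, 0.5  # 30 fair coin flips
normal_approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

def binom_pmf(k):
    """Exact probability of exactly k heads in n flips."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Compare exact probabilities with the normal density near the middle.
for k in range(10, 21):
    print(f"{k:>2} heads: exact={binom_pmf(k):.4f}  normal={normal_approx.pdf(k):.4f}")

max_gap = max(abs(binom_pmf(k) - normal_approx.pdf(k)) for k in range(10, 21))
```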
Is it possible to determine the sample variance without the population variance? I have an assignment that requires me to show the sampling distribution of the mean with only a population proportion and sample size.
If a question talks about a "population proportion", then you are dealing with a binomial distribution, except that you divide by the sample size to get the sample proportion rather than the sample count. If the population proportion is p, then the mean value of sample proportions will also be p (as usual, the mean of the sampling distribution is just the same as for the whole population), and the variance will be p(1 - p)/n, where n is the size of the sample. You can read about this distribution here (note they use the letter pi for population proportion. It does NOT mean 3.14159...):
http://onlinestatbook.com/2/sampling_distributions/samp_dist_p.html
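The mean p and variance p(1 - p)/n of the sample proportion can be confirmed with a quick Python simulation. The population proportion p = 0.3 and sample size n = 50 are hypothetical values chosen for illustration.

```python
import random
import statistics

def sample_proportions(p, n, trials, seed=5):
    """Proportion of 'successes' in each of `trials` samples of size n."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

p, n = 0.3, 50  # hypothetical population proportion and sample size
props = sample_proportions(p, n, trials=20_000)
print("mean:    ", round(statistics.fmean(props), 4), "vs p =", p)
print("variance:", round(statistics.pvariance(props), 5), "vs p(1-p)/n =", round(p * (1 - p) / n, 5))
```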
What is the difference between "sample distribution" and "sampling distribution"?
The sample distribution is what you get directly from taking a sample. You plot the value of each item in the sample to get the distribution of values across the single sample. When Sal took a sample in the previous video at 2:04 and got S1 = {1, 1, 3, 6}, and graphed the values that were sampled, that was a sample distribution. The 2nd graph in the video above is a sample distribution because it shows the values that were sampled from the population in the top graph.
The sampling distribution is what you get when you compare the results from several samples. You plot the mean of each sample (rather than the value of each thing sampled). In the previous video, Sal did that starting at 4:29, when he plotted the mean of each sample. The 3rd and 4th graphs above are sampling distributions because each shows a distribution of means from the many samples of a particular size.
http://www.psychstat.missouristate.edu/introbook/SBK19.htm also has an explanation.
My friend Callum and I have been experimenting with the sampling distribution program on Online Stat Book used by Sal (http://onlinestatbook.com/stat_sim/sampling_dist/index.html). However, we found a result we can neither explain nor rationalise: when we ask for a sample size of 2 for the median distribution of any population, it approximates the population distribution and not a 'bell curve'. I am very disturbed by this, because surely the median of 2 numbers is the same as the mean of 2 numbers, and according to the central limit theorem it should approximate a normal distribution. Is this assumption correct? Is the programme wrong? Or is there something we fail to understand?
The distribution of the sample median is not normal even if you take a larger sample size, such as n = 5, 10, or 25. The distribution of the sample median seems to be more related to the distribution of the population.
But I don't know why.
Thanks Jilarra, I agree. Unfortunately I cannot access the program again. However I would like to see if the simulation of n=2 for both the median and the mean are similar (or the same or completely different). If anyone happens to test this please post here to let me know the results :)
When n = 2 (n being the sample size), the central limit theorem is not going to give a very good approximation to the normal.
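On the median question: under the usual convention (average the two middle values when the count is even), the median of a sample of size 2 is exactly its mean, so in principle the n = 2 median and mean distributions should coincide. With n = 2 both are still far from normal, as the reply above notes. The Python sketch below confirms the identity using the standard library's convention; it cannot tell us how the applet itself computes medians, which may differ. The exponential population is an arbitrary illustrative choice.

```python
import math
import random
import statistics

rng = random.Random(6)
# Hypothetical skewed population (Exponential(1)); any population would do.
pairs = [(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(1_000)]

# With an even count, statistics.median averages the two middle values,
# so for a sample of size 2 it is just the mean of the pair.
same = all(math.isclose(statistics.median(pair), statistics.mean(pair)) for pair in pairs)
print(same)
```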
Some sources state that Kurtosis for ND is 0 and other books state that it is 3.
I am confused about this and I kindly ask for your advice, since I don't want to go too deep into the formulas for kurtosis.
Using kurtosis as it's properly defined, the Normal distribution has kurtosis of 3. Sometimes, though, people want the Standard Normal distribution to represent the most basic values possible (mean zero, standard deviation 1, skewness 0, etc.) so that it's a baseline (in some manner). So they'll define "excess kurtosis", which is just "kurtosis - 3", so that the Normal distribution has 0 excess kurtosis. Apparently, doing this also makes some other calculations more convenient.
By the way, you don't need to delve into formulas. Wikipedia has a lot of the information on distributions labeled. For instance, check out:
http://en.wikipedia.org/wiki/Normal_distribution
Along the right-hand side, there is a table with all sorts of information, one of which is "Ex. Kurtosis." When you click on it, there's a short description of what excess kurtosis means.
At around the 8-minute mark, the meaning behind a negative or positive kurtosis is explained. What exactly is the relevance of knowing whether the kurtosis is negative or positive? As it is explained in the video, it seems knowing the kurtosis only gives us a more specific idea of the shape of the curve. I'd like to know if it says anything about the data set? If so, what? Thanks!
In the display Sal was working with, there was a field that tracked kurtosis: how close it was to 3 (normal distributions have a kurtosis of 3) as the number of samples increases. I don't know the answer to your questions, but I think he talked about it so we would get some clue about what it is, because the closer the distribution gets to normal, the closer the kurtosis gets to 3.
I believe the tool can generate the same sample more than once, right?
Any special behavior if we plot ALL possible combinations of the population (for a particular sample size N) only once?
Will it produce a "more normal" distribution if we plot the same number of samples but with the possibility of generating the same sample more than once (and therefore leaving out some other sample combinations which won't be generated)?
"Any special behavior if we plot ALL possible combinations of the population(for a particular sample size N) only once?"
I'm going to make a guess that it depends on the distribution of the population. Kind of interesting because they're not random samples. If you made up some finite number of samples that approximated a normal sampling distribution, would the central limit theorem apply to them? In my opinion, no.
"Will it produce a "more normal" distribution if ..." How do you satisfy your conditions?
What is the difference between Xbar and mu? How do you know which one to use?
Xbar is the mean of a sample (as Sal says at 4:29 in https://www.khanacademy.org/math/probability/descriptivestatistics/central_tendency/v/statisticssamplevspopulationmean). You use Xbar for the mean calculated from data that was only gathered from part of the population (such as a survey of 1000 adults out of the entire US population).
Mu is the mean of the entire actual population. You only use mu to describe the mean if you are talking about data gathered from every element in the population, such as the 2010 census or every porcupine in the zoo.
How would one answer a question such as "what is the sampling distribution of the sample mean? Explain." after being given a problem where the only info given is the mean of a (normal) distribution and its standard deviation? There is also a number that is being randomly computed and averaged. Is the sample mean the mean of the normal distribution?
When the samples come from a normal distribution, the sampling distribution of the sample mean is itself normally distributed. The mean of the sampling distribution is the mean of the original distribution (by symmetry there is no other possible result), and its standard deviation is the original standard deviation divided by the square root of the sample size.
This derives from the properties of variance. When you add two independent random variables, the variance of the sum is the sum of their variances. Thus when you add n independent, identically distributed random variables, the variance of the sum is n times the original variance, and the standard deviation (the square root of the variance) is sqrt(n) times the original standard deviation. Divide the sum by n to AVERAGE the n random variables, which divides the standard deviation by n as well, and you get the result above: sqrt(n) times sigma, divided by n, is sigma / sqrt(n).
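The same argument written out as equations, assuming the n draws X_1, ..., X_n in one sample are independent and each has variance σ²:

```latex
\begin{aligned}
\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) &= \sum_{i=1}^{n} \operatorname{Var}(X_i) = n\sigma^2, \\
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}, \\
\sigma_{\bar{X}} &= \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```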
@ 9:15 two distributions are shown and compared (N=5 and N=25), and Sal explains in terms of skew and Kurzweillosis (or something) that the N=25 distribution is more normal. But wait... it does not LOOK more normal to me. Specifically, it looks a lot lumpier... as if it were composed of less data. Each bin is fatter and there are fewer bins. Am I making sense? Can someone explain?
The lumpier look you're seeing is exactly because of the smaller number of bins. If we wanted, we could go in and specify how we wanted the bins formed, but typically a computer algorithm just chooses the bins in some fashion. If we chose a few more bins there, it would look much smoother.
The bottom histogram looks more normal because of the general behavior of the distribution. The one for n=5 is like a normal distribution that was smashed down a bit. It's too short in the middle and its tails are too "fat." If you think back to, say, the Empirical Rule, the top one would probably have less than 68% of the data within 1 standard deviation of the mean.
P.S. The word is "kurtosis"; it's a way to describe the "peakedness" of the graph. A graph with high kurtosis will have a much sharper peak (picture 1 below), while a graph with low kurtosis will look much more like a rolling hill (picture 2).
Picture 1:
http://commons.wikimedia.org/wiki/File:Orographic_lifting_of_the_air__NOAA.jpg
Picture 2:
http://en.wikipedia.org/wiki/File:FoothillsCO.JPG
Frankly, I'm not sure if my question actually belongs here... but this seems to be in the ballpark. It's been 30 yrs since my statistics class and I'm more than a little rusty! lol If this is the wrong forum for this question, I would appreciate it if someone would point me in the right direction.
So, what I'm trying to do is figure out a possible distribution of scores given the following info: 120 scores, ranging from 879 to 900, w/ a mean of 886.
Is that enough info to produce a sample distribution of possible scores, and if it is, is there a lesson/video that can explain the procedure to calculate this?
What do you mean by the following:
"figure out a possible distribution of scores"
"produce a sample distribution of possible scores"
I have a question that I can't figure out; please help:
Identify the class width, class midpoints, & class boundaries for the given frequency distribution
Daily low temp (F)   Frequency
32-35                1
36-39                3
40-43                5
44-47                11
48-51                7
52-55                7
56-59                1
The class width is the width of each interval, which in this case is 4 (e.g., {32, 33, 34, 35} has 4 items).
The midpoints are the midpoint of each class, (top + bottom)/2, which is 33.5 for the first class.
The class boundaries are the values between the ranges, set so that data which gets rounded up or down is still included: subtract 0.5 from each lower limit and add 0.5 to each upper limit. So they would be 31.5, 35.5, 39.5, ..., 59.5.
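A small Python sketch of those three calculations, assuming the class limits are the whole-number ranges in the table above (the helper name `class_summary` is just for illustration):

```python
# Class limits (lower, upper) read off the frequency table, in degrees F.
classes = [(32, 35), (36, 39), (40, 43), (44, 47), (48, 51), (52, 55), (56, 59)]

def class_summary(classes):
    """Return (width, midpoints, boundaries) for whole-number class limits."""
    # Width: difference between consecutive lower limits.
    width = classes[1][0] - classes[0][0]
    # Midpoint of each class: (top + bottom) / 2.
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    # Boundaries: each lower limit minus 0.5, then the last upper limit plus 0.5.
    boundaries = [lo - 0.5 for lo, _ in classes] + [classes[-1][1] + 0.5]
    return width, midpoints, boundaries

width, midpoints, boundaries = class_summary(classes)
print(width)       # 4
print(midpoints)   # [33.5, 37.5, 41.5, 45.5, 49.5, 53.5, 57.5]
print(boundaries)  # [31.5, 35.5, 39.5, 43.5, 47.5, 51.5, 55.5, 59.5]
```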
I'm a little confused about what you're doing at 04:40. Let's say the PDF represents the 32 species of animals on a small island. So the application selects 5 types of animals, let's say zebras, goats, penguins, gorillas and porcupines, and plots their mean on the graph below. How the hell can you get the mean of a set of 5 species of animals? I don't get it.
@cnidoblast, selecting 5 types of animals invalidates the CLT. One of the assumptions of the most common CLT (there are actually many versions; this one is the most common) is that the observations, what Mr. Khan calls samples, are independent and identically distributed instances of a random variable. A random variable is a function that converts an observation from a random process into a number. Your animals are not numbers, so it's meaningless to sum them, much less find their mean. If you're talking about averaging their weights, then it still fails the CLT assumptions, because the weights that you're averaging do not come from an identical distribution. That is, the distribution of weights of zebras is very different from the distribution of weights of goats. Hope this helps! :)
Could you define a measure of skewness as (mean - median)/standard deviation? An advantage of this would be that it is easier to calculate, and it can only take values between -1 and 1.
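A quick way to try the proposed measure out. This is a minimal Python sketch (the name `median_skewness` is made up here). It uses the population standard deviation of the data (`statistics.pstdev`), with which |mean - median| can never exceed the standard deviation, so the result stays between -1 and 1. For reference, Pearson's second skewness coefficient is 3 times this quantity.

```python
import statistics

def median_skewness(data):
    """(mean - median) / SD, using the population SD of the data."""
    sd = statistics.pstdev(data)
    return (statistics.mean(data) - statistics.median(data)) / sd

print(median_skewness([1, 2, 3]))                      # 0.0 (symmetric data)
print(round(median_skewness([1, 1, 2, 2, 3, 10]), 2))  # 0.37 (right-skewed data)
```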
I'm having some issues with this question.
3. For the general population, mean IQ is 100 with a standard deviation of 15. A sample of 100 people is selected at random from the population, with a sample mean of 102. This sample mean comes from a distribution of sample means with the following properties:
a. a mean of 100 and a standard error of 1.5
b. a mean of 102 and a standard error of 1.5
c. a mean of 100 and a standard error of 15
d. a mean of 102 and a standard error of 15
I think that the answer is either a or b, because you would divide the SD, 15, by the square root of the sample size (√100 = 10), which gives 1.5. But I have no idea what to do about the mean, 100 or 102. Can anyone explain why it is one or the other?
The general population is known to have a mean IQ of 100. That means that the distribution of sample means also has a mean of 100; the sample mean of 102 is just one value drawn from that distribution. So the answer is a.
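As a sketch in Python (the helper name is hypothetical): only the population parameters and the sample size go into the sampling distribution, and the observed sample mean of 102 is not an input at all.

```python
import math

def sampling_distribution_of_mean(pop_mean, pop_sd, n):
    """Mean and standard error of the sample mean for random samples of size n."""
    standard_error = pop_sd / math.sqrt(n)
    return pop_mean, standard_error

mean, se = sampling_distribution_of_mean(100, 15, 100)
print(mean, se)  # 100 1.5, i.e. option a
```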
Excuse me, can anyone explain to me the difference between a sampling distribution and a population distribution, with an example of each?
thanks
At 5:13 pm: For some reason, I understand this when it comes to means, but in "Sampling distribution of the sample proportion," using population (4, 5, 9) and sample size n = 2, I am struggling to construct a table that represents the sampling distribution of the sample proportion of odd numbers. Can you please explain?
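One way to build that table is to list every possible sample and record its proportion of odd numbers. The question doesn't say whether sampling is with or without replacement, so this minimal Python sketch assumes sampling WITH replacement, counting ordered samples (3 × 3 = 9 of them); the function name is made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [4, 5, 9]
n = 2

def proportion_distribution(pop, n):
    """Sampling distribution of the proportion of odd numbers in samples of size n.

    Assumes sampling WITH replacement and lists ordered samples.
    """
    samples = list(product(pop, repeat=n))
    # Proportion of odd values in each sample, kept as exact fractions.
    props = Counter(Fraction(sum(x % 2 == 1 for x in s), n) for s in samples)
    return {p: Fraction(count, len(samples)) for p, count in sorted(props.items())}

dist = proportion_distribution(population, n)
for p, prob in dist.items():
    print(p, prob)
# 0 1/9
# 1/2 4/9
# 1 4/9
```

The mean of this distribution, 0·(1/9) + (1/2)·(4/9) + 1·(4/9) = 2/3, equals the population proportion of odd numbers (2 of 3). For sampling without replacement, `itertools.combinations(pop, n)` could be used in place of `product(pop, repeat=n)`.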
The mean of a set of data is 25 with a standard deviation of 2. What is the interval about the mean of the data within one standard deviation?
This sounds like a homework problem.
What is parameter estimation?
What is the relationship between M, mu, and mu with subscript m?
I have a question that I don't quite understand, and it goes like this: "Assume the weights of eggs produced on an egg farm have a normal distribution with mean 64 grams and standard deviation 7 grams." It also says, "Describe the distribution of weights of 12 (randomly chosen) mixed-grade eggs."
At 9:08, how do you get five samples from the non-normally distributed probability function? How do you get a set of data from the probability function?
Computers can quite easily simulate uniform distributions (for example, the rand() function in MATLAB gives a number between 0 and 1 according to a uniform distribution). With that number you can simulate all sorts of other distributions.
For example, if you want to simulate a fair die, you do:
x = rand();   % uniform random number between 0 and 1
if x < 1/6
    y = 1;
elseif x < 2/6
    y = 2;
elseif x < 3/6
    y = 3;
elseif x < 4/6
    y = 4;
elseif x < 5/6
    y = 5;
else
    y = 6;
end
This is how you can easily simulate discrete distributions.
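The same idea (split the interval from 0 to 1 into pieces whose lengths are the probabilities, then see where the uniform number lands) is known as inverse transform sampling, and it works for any discrete distribution. Here is a minimal Python sketch using `random.random()` as the uniform source; the function name `make_discrete_sampler` is hypothetical.

```python
import bisect
import itertools
import random

def make_discrete_sampler(values, probs):
    """Inverse-transform sampler for a discrete distribution (hypothetical helper)."""
    # Running totals, e.g. [1/6, 2/6, ..., 1] for a fair die.
    cumulative = list(itertools.accumulate(probs))

    def sample(u=None):
        if u is None:
            u = random.random()  # uniform on [0, 1)
        # First index whose running total exceeds u (same as the if/elseif chain).
        i = bisect.bisect_right(cumulative, u)
        return values[min(i, len(values) - 1)]  # guard against rounding at the top end

    return sample

die = make_discrete_sampler([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
rolls = [die() for _ in range(10)]
print(rolls)
```

Passing `u` explicitly (e.g. `die(0.4)`) makes it easy to check which value a given uniform number maps to.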
My professor said the answer to the problem is "NOT" 0. I take meticulous notes, record lectures, do online research, etc. Why can't I figure this out? Do I need to somehow calculate a sample proportion? Not sure what else to do. If the sample proportion is not given, how do I find it? The problem is that the Z scores are above 3 and our Standard Normal Distribution Table stops at 3. Again, he said the answer is not 0. Below are some problems pasted directly here:
1) Given a normal distribution with a µ = 100 and σ = 10, if you select a random sample of n = 25, what is the probability that the sample mean is between 90 and 97.5?
2) Given a normal distribution with a µ = 50 and σ = 8, if you select a random sample of n = 100, what is the probability that the sample mean is between 47 and 49.5?
3) Given a normal distribution with a µ = 50 and σ = 5, if you select a random sample of n = 100, there is a 35% chance that the sample mean is above what value?
I'm really struggling here with the Z's being greater than 3. I've been working on this for three days; I'm not just trolling for answers and being lazy. I desperately want to know the techniques and steps to calculate situations like this. Thank you very, very much.
The key to all of these questions is using the standard error of the mean which is described in one of the next videos in the section.
Briefly, the SE (standard error) = standard deviation / sqrt(sample size).
For 1), the SE = 10 (standard deviation) / 5 (square root of 25) = 2. Using a z table, we are looking for the probability that z is between -5 {(90 - 100)/2} and -1.25 {(97.5 - 100)/2}. This is .1056 using this online table (http://www2.fiu.edu/~millerr/Normal%20Table.pdf).
The other problems are solved similarly
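The same steps can be done in Python without a printed z table, which also handles z values beyond 3 (the tail area out there is tiny, but not exactly 0). This is a minimal sketch using the standard library's `statistics.NormalDist` (Python 3.8+); the helper names are made up for illustration. For problem 1 it should agree with the .1056 above.

```python
import math
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, low, high):
    """P(low < sample mean < high) for random samples of size n."""
    se = sigma / math.sqrt(n)  # standard error of the mean
    sampling_dist = NormalDist(mu, se)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

def mean_cutoff_above(mu, sigma, n, tail_prob):
    """Value that the sample mean exceeds with probability tail_prob."""
    se = sigma / math.sqrt(n)
    return NormalDist(mu, se).inv_cdf(1 - tail_prob)

p1 = prob_mean_between(100, 10, 25, 90, 97.5)  # z from -5 to -1.25
p2 = prob_mean_between(50, 8, 100, 47, 49.5)   # z from -3.75 to -0.625
c3 = mean_cutoff_above(50, 5, 100, 0.35)       # 65th percentile of the sample mean
print(round(p1, 4), round(p2, 4), round(c3, 2))
```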
i) Thus, rejecting the null hypothesis is what type of error?
I need help putting together the formula to answer the question: "A population is bimodal with a variance of 5.77. One hundred samples of size 30 are randomly selected and the 100 sample means are calculated. The standard deviation of the sample means is approximately:"
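Assuming the usual standard error formula applies (with samples of size 30, the Central Limit Theorem is usually taken to cover even a bimodal population), the arithmetic is just sqrt(variance / n). The number of samples (100) does not enter the formula. A short Python check:

```python
import math

population_variance = 5.77  # from the question
sample_size = 30            # size of EACH sample; the count of 100 samples is not used

standard_error = math.sqrt(population_variance / sample_size)
print(round(standard_error, 3))  # 0.439
```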
If the distribution for a histogram showing weight is normal, what does it mean for the mean, median, and mode?
How does this relate to this video? I think this is answered in the earlier video about the qualitative sense of normal distributions
https://www.khanacademy.org/math/probability/statisticsinferential/normal_distribution/v/ck12orgnormaldistributionproblemsqualitativesenseofnormaldistributions
Normal distributions are covered starting at https://www.khanacademy.org/math/probability/statisticsinferential/normal_distribution/v/introductiontothenormaldistribution.
If that video does not clarify things for you, I encourage you to ask the question there.
Thanks for the info! Just one question: what about the distribution of the means of the means? So if I take 10 samples from a population, find the averages, do this 10 times, and average all the means again, what is the standard deviation of the means of the means?
You'd apply the same ideas. Say we have the following:
Original population has `µ = 100` and `σ = 24`
Step 1:
Take samples of size n=16 and record the sample mean. If we do this over and over, we'd get the sampling distribution of the sample mean, which is a new population with `µ* = 100` and `σ* = 24 / √16 = 6`
Step 2:
Take samples of size m=9 from the population resulting from Step 1. That is, in order to get 1 observation, we'd:
a. Draw samples of size n=16 from the original population, record the mean.
b. Do this 9 times, and record the mean of the means.
If we did this over and over, then we'd have a new sampling distribution with `µ** = 100` and `σ** = 6 / √9 = 2`. This can also be expressed as `σ** = 24 / (√16 * √9) = 24 / √(16*9)`, which is `σ / √(n*m)`.
In other words: each value from the sampling distribution in Step 2 used 144 (`16*9`) draws from the original population. Doing this in "stages" of size 16, replicated 9 times, gains us nothing but more bookkeeping; the resulting sampling distribution is exactly the same as if we had just taken samples of size 144 from the original distribution.
Hence, we generally don't bother with this two-stage sampling; we just take one sample and use the theoretical idea of the sampling distribution to determine how the sample mean will behave.
There are some instances where the ideas of this two-stage sampling could conceivably come in handy. One such possibility would be meta-analysis, where the results of several studies (which implies several samples) are combined. Meta-analysis is a bit of a specialty (and importantly, not my specialty), so I don't know whether these ideas are used; it's just my speculation that they could be.
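A quick simulation can check the claim that the two-stage scheme and a single sample of 144 give the same spread. This is a minimal Python sketch; it assumes a normal population with µ = 100 and σ = 24 purely for convenience (the σ / √(n*m) result does not depend on normality), and the function names are hypothetical.

```python
import random
import statistics

POP_MU, POP_SIGMA = 100, 24  # assumed normal population for this illustration

def two_stage_mean(rng, n=16, m=9):
    """Mean of m sample means, each from n draws (n*m = 144 draws in total)."""
    stage_means = [
        statistics.fmean([rng.gauss(POP_MU, POP_SIGMA) for _ in range(n)])
        for _ in range(m)
    ]
    return statistics.fmean(stage_means)

def one_stage_mean(rng, size=144):
    """Mean of a single sample with the same total number of draws."""
    return statistics.fmean([rng.gauss(POP_MU, POP_SIGMA) for _ in range(size)])

rng = random.Random(0)
reps = 4000
sd_two_stage = statistics.stdev([two_stage_mean(rng) for _ in range(reps)])
sd_one_stage = statistics.stdev([one_stage_mean(rng) for _ in range(reps)])
print(round(sd_two_stage, 2), round(sd_one_stage, 2))  # both should be near 24 / 12 = 2
```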
Thanks a ton, Sal. This video really helped me understand the concept. Can you explain why the variance of this distribution is sigma squared over n and not just sigma squared?
What is Point Estimation?
You are given the distribution of ages of students enrolled at College of the Redwoods from a simple random sample of size 53. You plot the data and it appears to be strongly skewed to the right. You consult the college admissions office and they inform you that the population mean age of students is 21.3 years with a population standard deviation of 10.7 years. You calculate the mean from your sample of 53. What would be the sampling distribution that this value belongs to?
Is kurtosis independent of narrowness (standard deviation)? What kinds of factors influence kurtosis?
Well, the standard deviation is in the formula for kurtosis. So they are _related_ to one another, if that's what you mean.
Though "independence" has a special meaning in Statistics, and I do not know if kurtosis and the variance are independent in the Statistical sense. To explain what I mean: the sample mean is a part of the formula for the sample variance, yet these two statistics are independent.
Can it be demonstrated that a sampling mean is likely to yield more accuracy than one "large" sample mean, for the same amount of work?
What’s the goal of inferential statistics?
To use what you have, data-wise, to make an inference or to find out information about something you do not have data about. So if I know something about the sick people in a study, for example, I may want to use that knowledge to make an inference about the people who are not sick... or I might want to make an inference about the population in general from the data I have.
Given that a larger sample would result in more variance in opinions, particularly when it comes to qualitative questions, shouldn't the range be wider with a larger population?
In Stats we generally make the assumption that the phenomenon we're investigating has certain parameters. For instance, if we're measuring the number of calories consumed per day, there will be some mean (μ) and some standard deviation (σ). For the most part, the caloric intake will cluster around, say, μ +/- σ (within 1 standard deviation of the mean). There will be some folks a bit more on the extreme sides, say μ +/- 2σ. But the further out we go, the less likely a value becomes.
If we collect a larger sample, we don't really expect to see many more values that are further out; we would expect to see more values clustered around the mean. Sure, there would be some values further out on the tails, but by and large the new values would "fill in" the same area. If you think of a histogram of the observed sample values, think of it getting more and more dense, not wider and wider.
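A minimal Python sketch of this idea, with made-up parameters (a normal distribution with μ = 2000 and σ = 300 calories is an assumption for illustration only): as the sample grows, the sample standard deviation stays near σ, while the smallest and largest values creep outward only slowly. The function name is hypothetical.

```python
import random
import statistics

MU, SIGMA = 2000, 300  # hypothetical calorie-intake parameters (illustration only)

def spread_by_sample_size(sizes, seed=1):
    """For each sample size, return (sample SD, min, max) of a simulated normal sample."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        results[n] = (statistics.stdev(sample), min(sample), max(sample))
    return results

for n, (sd, lo, hi) in spread_by_sample_size([100, 1_000, 10_000, 100_000]).items():
    print(f"n={n:>6}  sd={sd:6.1f}  min={lo:7.1f}  max={hi:7.1f}")
```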
Is the central limit theorem the same as central tendency?
I'm liking these videos. Nice change from scrolling over lecture notes. Thanks Sal.
can you explain the theory of sampling distributions of sample variance?
thanks
Why is the sampling distribution important?
My guess:
We need to know how to get a reliable mean, because for most populations (unless we want to grade a class test on a curve, or do something like that) it's not practical to get the population mean.
I understand that all samples of data are going to vary from time to time and some of them may be the same, if the population is higher in one distribution than the other. Do you think that the sample of means would still be normal, or do you think that it would have to be adjusted to fit within the parameters?
I think he was saying that the distribution of the sample means will approach a normal distribution, regardless of the population the samples come from, as the number of samples increases. I wish he had discussed some more extreme examples, such as even smaller populations, and talked about the limits of using this statistic.
I have a statistics midterm tomorrow and we only need to understand what the sample mean difference is and what the standard error of the mean is. Can you help me understand those two terms?
In terms of platykurtic and leptokurtic, which is considered the "positive kurtosis" and which the "negative kurtosis" discussed in the video?
From Wikipedia:
"A high-kurtosis distribution has a sharper peak and fatter tails, while a low-kurtosis distribution has a more rounded peak and thinner tails.
Distributions with zero excess kurtosis ["excess" is the difference from kurtosis of a normal distribution, which is +3], are called mesokurtic, or mesokurtotic. The most prominent example of a mesokurtic distribution is the normal [Gaussian] distribution family, regardless of the values of its parameters. ...
A distribution with positive excess kurtosis is called leptokurtic, or leptokurtotic. "Lepto" means "slender" [as in "lepton"]. In terms of shape, a leptokurtic distribution has a more acute peak around the mean and fatter tails. ... Sometimes called 'super-Gaussian'.
A distribution with negative excess kurtosis is called platykurtic, or platykurtotic. "Platy" means "broad" [as in "platypus"]. In terms of shape, a platykurtic distribution has a lower, wider peak around the mean and thinner tails. ... Sometimes termed 'sub-Gaussian'."
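To get a feel for these labels, here is a minimal Python sketch that estimates excess kurtosis from simulated data. The three example distributions are my choice, not from the quote: a uniform distribution (theoretical excess kurtosis -1.2, platykurtic), a normal one (0, mesokurtic), and a Laplace distribution built as the difference of two exponentials (+3, leptokurtic). The estimator is the simple moment version m4 / m2² - 3, without small-sample corrections, and the function names are hypothetical.

```python
import random
import statistics

def excess_kurtosis(data):
    """Moment estimator m4 / m2**2 - 3 (no small-sample correction)."""
    mean = statistics.fmean(data)
    m2 = statistics.fmean([(x - mean) ** 2 for x in data])
    m4 = statistics.fmean([(x - mean) ** 4 for x in data])
    return m4 / m2 ** 2 - 3

def kurtosis_examples(size=100_000, seed=42):
    rng = random.Random(seed)
    samples = {
        "uniform (platykurtic)": [rng.random() for _ in range(size)],
        "normal (mesokurtic)": [rng.gauss(0, 1) for _ in range(size)],
        # Difference of two independent Exponential(1) draws is Laplace(0, 1).
        "laplace (leptokurtic)": [rng.expovariate(1) - rng.expovariate(1) for _ in range(size)],
    }
    return {name: excess_kurtosis(data) for name, data in samples.items()}

for name, k in kurtosis_examples().items():
    print(f"{name:24s} excess kurtosis ~ {k:5.2f}")
```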
WHICH tool was used to generate the 10000 samples?
THIS ISN'T HELPING. I still don't know how to find the probability of getting a specific sample mean given a normal bell curve. He's not even using the bell curve! I'm extremely confused, please help.
I have a question I'm failing to solve: "A population has a mean of 200 and a standard deviation of 50. A simple random sample of size 100 will be taken and the sample mean x will be used to estimate the population mean. Show the sampling distribution of the sample mean."
what does the distribution of means denote?
I think it means "distribution of the sample means": the distribution of all of the mean values of the (same-sized) samples taken from the original population.
What is the minimum number of sample means that should be used to get a reasonably accurate sampling distribution of the sample mean, since thousands of sample means may not be practical in a gage repeatability and reliability study?
In practice we get one, and use the _theoretical_ sampling distribution to let us draw conclusions.
At 8:45, it is said that the central limit theorem holds even for single samples. It is not so; the central limit theorem is applicable only to sample MEANS. For example, out of a population of 5000, if I have taken a sample of n=50, the central limit theorem does NOT apply to that. It applies only when I have taken (e.g.) 40 samples of n=50. However, this is as per my understanding. Please correct me if I am wrong.
What would be a real world application for this? If anyone has any examples in the manufacturing world that would be very helpful. Thanks!
What I don't understand is when you have a large binary distribution, for example, and you approximate it using a normal distribution. If you only have one sample consisting of x values, you haven't really got a standard deviation. We always have those kinds of questions on the exam, but I always get the formula wrong.
As long as you know all the values in the sample, you can do the series of calculations described under "basic examples" here http://en.wikipedia.org/wiki/Standard_deviation to figure out what the sample's standard deviation is. Of course, with a sample you have to divide by N - 1, as the Wikipedia article (as well as Sal's video on standard deviation) explains; otherwise it's exactly the same. Perhaps you are limiting your definition of "standard deviation" to "standard deviation of the population", which you of course can't figure out with just one sample of values? If the exam question you're describing doesn't specify that the population's SD is asked for, it's safe to assume that they are asking for the sample's SD.
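In Python's standard library the two versions are separate functions: `statistics.stdev` divides by N - 1 (sample SD) and `statistics.pstdev` divides by N (population SD). A short sketch with made-up data:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # example values (illustrative only)

sample_sd = statistics.stdev(data)       # divides by N - 1: data is a sample
population_sd = statistics.pstdev(data)  # divides by N: data is the whole population
print(population_sd)        # 2.0
print(round(sample_sd, 4))  # 2.1381
```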
I'm trying to picture skew and kurtosis, but I have no idea how much the numbers actually mean. Is there a video that gives a good idea of how much skew is, say: 0.1, 0.5, 1, and 10? Same thing with kurtosis. I like having a feel for what the value means in my brain.
I am bit confused:
is n = number of sets that we pick Or the size of the set that we pick ?
the size of the set!
Does only the mean follow the CLT?
What would be the difference between the distribution of a sample variable and the sampling distribution of the mean? I'm so confused between these two terms.
Where are the practice problems for this?
Sal repeats "well defined mean" and "well defined variance" a couple of times at the very beginning of the video. When are these quantities not well defined?
What does the ! following a number mean?
7! means 7 factorial which is the same as 7*6*5*4*3*2*1 or 5040
This seems to be a simple question to answer, but I'm actually not 100% certain about it:
Say there's a population that's normally distributed with mean u and standard deviation s. An independent sample of N observations is drawn from the population. What is the distribution of the sample mean? I think it's still a normal distribution, but I'm not sure if this is correct and sufficient, because I'm still in the process of getting comfortable with all this stat lingo.
Thanks!
Yes. If X is normally distributed, then the sample mean xbar will also be normally distributed regardless of the sample size. If X is _not_ normally distributed, then we have to make sure the sample size is large enough for the Central Limit Theorem to kick in.
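If you want to see this numerically, here is a rough simulation sketch. The population values (mu = 10, s = 2) and the sample size N = 4 are made up for illustration. It draws many samples of size N from a normal population and checks that the sample means are centered at mu with spread close to s / sqrt(N). The simulation only checks the mean and SD; the fact that xbar is exactly normal here is the known result stated above, not something this code proves.

```python
import random
import statistics

def sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n from Normal(mu, sigma) and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(trials)]

mu, sigma, n = 10.0, 2.0, 4  # hypothetical population and sample size
means = sample_means(mu, sigma, n, trials=20_000)

print(statistics.fmean(means))  # close to mu = 10
print(statistics.stdev(means))  # close to sigma / sqrt(n) = 2 / 2 = 1
```

Plotting a histogram of `means` would show the bell shape even with N as small as 4, because the population itself is normal.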
There were two cases talked about; n=5 and n=25. It was said that after 10,000 samples the n=25 was a closer fit to the normal distribution than the n=5 case. What I want to know is, if there were infinite samples, would the n=5 and the n=25 cases both be a perfect normal distribution?
If this is so: As the number of samples tends to infinity, does the n=25 case converge to the normal distribution faster than the n=5 case?
This is answered in the next video in the series.
2:20 Can you really do it with the mode? It seems like there would be some distributions in which no matter how many samples you take, the mode would not be normally distributed.
If you can do it with any population, then all the possible modes of all the possible size-n samples of a population also constitute a population, no?
In this example Sal took 10,000 samples of 5 for a total of 50,000 samples in the first example. Why not just take 50,000 samples of the original distribution and calculate the mean and SD?
In addition to what Casey wrote, I think you get a different sense of how confident you can be that the sample mean xbar is close to the population mean mu if you take several samples and compare them.
I might take one sample of size 400, get xbar = 11 and std. dev = 1.5, and decide that mu is probably between 10 and 12. If I take 4 samples of size 100 and get:
```
xbar  std dev
10    2.1
8     1.4
12    1.6
5     1.8
```
I will notice that many of the sample means are more than one standard deviation away from one another, so I am no longer confident that any of them are close to the population mean.
Every random variable has some sort of probability distribution. When we have a lot of data, we can plot them in a histogram and "see" the probability distribution. This is what we often do to see the distribution of the raw data. But the sampling distribution of the sample mean is a trickier business. When we calculate the sample mean (xbar), we have 1 value. Xbar is still a random variable, but for a given dataset we have only 1 value of xbar, and just 1 value is not going to give a very useful plot. We'd really like to see how the sampling distribution of xbar behaves, but for that we need a lot of xbars.
So we can do some experiments like Sal has done. He decided how to generate some data (according to that very strange population he was making on the top panel), and then he drew 5 observations from that distribution. By calculating their mean, we get 1 observation from the sampling distribution of the sample mean. Doing this over and over again gives us 10,000 observations from the sampling distribution of the sample mean. Plotting all of these together lets us see how the sampling distribution of the sample mean behaves, at least for the distribution Sal specified.
If we had put all 50,000 observations that we drew together and calculated the sample mean and SD, that would just be 1 observation from the sampling distribution of the sample mean (with n=50,000 instead of n=5). If we plotted all 50,000 together, that would be plotting the distribution of the raw data, not the distribution of the sample mean.
Let me know if this helps. It's a pretty tricky concept to grasp; I've had college students struggle to understand it, and that was when I was there explaining it in person.
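Here is a rough sketch of the experiment described above. Sal's population is a custom one drawn in the simulation, so as a stand-in this code assumes a skewed exponential population with mean 1 and SD 1 (that choice is mine, not Sal's). It takes 10,000 samples of size 5, keeps each sample's mean, and then pools the very same 50,000 raw draws to show that pooling gives just one xbar.

```python
import random
import statistics

rng = random.Random(42)
n, trials = 5, 10_000

# Assumed stand-in population: exponential with mean 1 and SD 1 (skewed, not normal).
samples = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(trials)]

# 10,000 observations from the sampling distribution of the sample mean (n = 5).
xbars = [statistics.fmean(s) for s in samples]

# Pooling the same 50,000 draws gives ONE observation of xbar (with n = 50,000).
pooled = [x for s in samples for x in s]
one_xbar = statistics.fmean(pooled)

print(len(xbars), statistics.stdev(xbars))  # SD of xbar near 1 / sqrt(5), about 0.447
print(one_xbar)                             # a single sample mean, near 1
print(statistics.stdev(pooled))             # spread of the raw data, near 1
```

A histogram of `xbars` shows the (roughly bell-shaped) sampling distribution, while a histogram of `pooled` just shows the skewed raw data.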
Video won't play, please fix.
Maybe there's a bug or something.
Are the sample mean and the population mean the same? While solving questions on confidence intervals, why do we always subtract the sample mean from the value when the formula includes the population mean?
P(1.28 < z < 1.75)
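A probability like this is the area under the standard normal curve between the two z values, that is Phi(1.75) - Phi(1.28), where Phi is the standard normal CDF. This sketch uses Python's `statistics.NormalDist` (mean 0, SD 1 by default) and takes the bounds exactly as written.

```python
from statistics import NormalDist

def prob_between(a, b):
    """P(a < Z < b) for a standard normal Z, computed as Phi(b) - Phi(a)."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(b) - z.cdf(a)

print(round(prob_between(1.28, 1.75), 4))  # about 0.0602
```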
How do distributions provide a link between probabilities and statistical tests?
Statistical tests are generally trying to compute the probability of something. Most often, there is an assumption (hypothesis), and we find the probability of the observed results assuming that hypothesis is true.
The probabilities can be calculated in a few different ways, but a very common method is through a distribution. So, we think that the data or a function of it, like a test statistic, has a particular distribution (this is generally _proven_, so it's not just a guess), and we can use that distribution to calculate probabilities.
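As one concrete instance of this idea, here is a sketch of a two-sided one-sample z-test. It assumes the population SD is known and that xbar is (approximately) normal, so the test statistic z follows a standard normal distribution under the hypothesis mu = mu0. The numbers (xbar = 10.4, mu0 = 10, sigma = 2, n = 100) are made up for illustration.

```python
import math
from statistics import NormalDist

def z_test_p_value(xbar, mu0, sigma, n):
    """Two-sided p-value for H0: mu = mu0, assuming known sigma and a normal sampling distribution of xbar."""
    se = sigma / math.sqrt(n)                    # standard error of xbar
    z = (xbar - mu0) / se                        # test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))       # probability of a result at least this extreme under H0
    return z, p

z, p = z_test_p_value(xbar=10.4, mu0=10.0, sigma=2.0, n=100)
print(z, p)  # z is about 2.0, p is about 0.0455
```

The distribution (here the standard normal) is what turns the observed test statistic into a probability.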
What's an example of a random variable with a symmetrical PDF which is not normal? Would this happen in the real world?
There are also the Laplace and Cauchy distributions: both have a similar shape to the Normal (symmetric with a peak) but are very much not Normal.
The Uniform distribution is symmetric, but has no peak. This most certainly comes up in the real world; the first examples I can think of are die rolling and board games that use spinners. There are also wait times for something that occurs at regular intervals: say a bus arrives at a given bus stop every 8 minutes, and you arrive at a random time. The length of time you wait for the bus will follow a uniform distribution.
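A quick simulation of the bus example, assuming you arrive at a uniformly random point within the 8-minute cycle. The waits spread evenly between 0 and 8 minutes, average about 4, and fall on either side of 4 about equally often (symmetric), but there is no peak and nothing below 0 or above 8, so the shape is not normal.

```python
import random
import statistics

rng = random.Random(1)
interval = 8.0  # minutes between buses, from the example

# Assumption: arrival time is uniform within one cycle, so the wait is Uniform(0, 8).
waits = [rng.uniform(0, interval) for _ in range(100_000)]

print(statistics.fmean(waits))                            # near interval / 2 = 4 minutes
print(sum(w < interval / 2 for w in waits) / len(waits))  # near 0.5 (symmetric about 4)
print(min(waits) >= 0, max(waits) <= interval)            # True True
```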
Depending on the parameters, the Beta distribution can be symmetric, and it is certainly non-normal, since it can only take values between 0 and 1. One useful application is modeling probabilities; for instance, in baseball we can model batting averages. Full disclaimer: I stole this example from the Stack Exchange answer here:
http://stats.stackexchange.com/questions/47771/what-is-the-intuition-behind-beta-distribution
See the first answer.
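To see a symmetric but bounded Beta distribution, here is a sketch using Python's `random.betavariate`. The parameters a = b = 2 are my own illustrative choice (equal parameters make a Beta distribution symmetric about 0.5); they are not the parameters used in the batting-average answer.

```python
import random
import statistics

rng = random.Random(7)
a = b = 2.0  # illustrative choice: equal parameters give a distribution symmetric about 0.5

draws = [rng.betavariate(a, b) for _ in range(100_000)]

print(min(draws) >= 0, max(draws) <= 1)               # always inside [0, 1], unlike a normal
print(statistics.fmean(draws))                        # near 0.5
print(sum(d < 0.5 for d in draws) / len(draws))       # near 0.5 (symmetric)
```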
Can someone help explain the relationship among population, sampling frame, and sample? They all are so interlaced to me.
According to wikipedia, "a sampling frame is the source material or device from which a sample is drawn. It is a list of all those within a population who can be sampled."
Let's take an example. Suppose someone wanted to find out how many houses in the city of Yelm were painted white.
The *population* is everything being considered: all houses in the city of Yelm, WA.
There are many choices of *sampling frames*. I might choose the property tax rolls of the city, since that probably lists every house in the city (so it covers the population) and probably assigns each house an ID # (so it would be easy to choose a random sample by using a random number generator).
The *sample* is which houses I actually draw from the population to see what color they are. Maybe I decide to sample 70 houses. I use a random number generator to give me random numbers across the range of ID #'s in Yelm's property tax rolls, take the first 70 valid ID #'s, go to those houses and record what color they are.
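The sampling procedure in the last paragraph can be sketched in a few lines. The tax-roll ID numbers here are entirely made up (a hypothetical frame of 1,500 houses with gaps between ID numbers); the loop generates random numbers across the ID range, keeps only valid, not-yet-chosen IDs, and stops at 70.

```python
import random

# Hypothetical sampling frame: made-up parcel IDs from a property-tax roll.
# Only odd numbers are "valid" here, to mimic gaps in a real numbering scheme.
tax_roll_ids = list(range(1001, 4001, 2))  # 1,500 houses
valid_ids = set(tax_roll_ids)
lo, hi = min(tax_roll_ids), max(tax_roll_ids)

rng = random.Random(2024)
chosen = []
chosen_set = set()
while len(chosen) < 70:
    candidate = rng.randint(lo, hi)               # random number across the ID range
    if candidate in valid_ids and candidate not in chosen_set:
        chosen.append(candidate)                  # keep the first 70 valid, distinct IDs
        chosen_set.add(candidate)

print(chosen[:5])  # the first few houses to visit
```

Python's `random.sample(tax_roll_ids, k=70)` does the same job in one call when you already have the list of valid IDs.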
There are few videos on econometrics. :(
A manufacturer knows that their items have a normally distributed lifespan, with a mean of 2.6 years, and standard deviation of 0.5 years.
If you randomly purchase 25 items, what is the probability that their mean life will be longer than 3 years?
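One way to work this out: because the lifespans are normal, the mean of 25 items is also normal with mean 2.6 and standard error 0.5 / sqrt(25) = 0.1 years. Then 3 years is (3 - 2.6) / 0.1 = 4 standard errors above the mean, and the answer is P(Z > 4). This sketch uses Python's `statistics.NormalDist` for the normal CDF.

```python
import math
from statistics import NormalDist

mu, sigma, n = 2.6, 0.5, 25        # values from the problem

se = sigma / math.sqrt(n)          # standard error of the mean: 0.1 years
z = (3.0 - mu) / se                # how many standard errors 3 years is above the mean: 4
p = 1 - NormalDist(mu, se).cdf(3.0)

print(se, z, p)  # p is about 3.2e-5, so a mean life over 3 years is very unlikely
```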
The average number of defective hard disks made by a certain manufacturer is 3. What is the probability of seeing no more than 10 defective hard disks in a large sample?
You would need at least the standard deviation of the number of defective disks in order to calculate that. With the information presented it's impossible to give an answer.