Statistics and probability
- Inferring population mean from sample mean
Much of statistics is based upon using data from a random sample that is representative of the population at large. From the sample mean, we can infer things about the mean of the greater population. We'll explain. Created by Sal Khan.
- But how could you estimate the percent of the whole population using the sample?
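- When using a sample to estimate a measure of a population, statisticians do so with a certain level of confidence and with a margin of error. For example, if our sample mean is 20 and, given the sample's size and spread, the margin of error at 95% confidence works out to 2, we would report the population mean as 20 plus-or-minus 2 with 95% confidence. In other words, we are 95% confident that the true mean of the population is between 18 and 22. The same approach works for a percentage: the percentage in the sample estimates the percentage in the population, again with a margin of error.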
- What is the difference between sample and population mean?
- The population mean is the arithmetic mean of the whole population. For large groups (say, all adult males in the United States), finding this mean is impractical. But we are not lost: we can use sampling to estimate the population mean (which we cannot know for certain). Suppose we want to know the mean height of adult males in the U.S. We could randomly select a sample of 50 men and calculate their average height. This would give us our sample mean.
This distinction is important, and it is the reason we need inferential statistics. We cannot directly measure what we want to know (the population mean), but we can use statistical techniques to estimate it to a desired degree of accuracy with a desired likelihood of being correct. The central limit theorem tells us that, for a large enough sample size, the sample means we would get from repeated random samples are approximately normally distributed around the population mean (the thing we want to know but cannot calculate directly), with a spread that shrinks as the sample size grows. So if we choose a large enough sample and make sure it is randomly selected, we can state that the population mean lies within some range of the sample mean we calculated (a range based on our sample standard deviation and sample size) with a chosen level of confidence (commonly 95% or 99%).
- At 0:34, what do you mean by geometric and arithmetic mean? I'm confused.
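- The arithmetic mean adds up the values and divides by how many there are; the geometric mean multiplies the n values together and takes the nth root of the product. They are used in different situations, but when people just say "mean," they almost always mean the arithmetic mean, which is the one used in this video.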
- The mean of 16 numbers is 8. If z is added, what is the new mean?
- If the mean of 16 numbers is 8, that means you have 16 values (a, b, c, ... , p) whose average is 8.
(a + b + c + ... + p) / 16 = 8. So, in other words: a + b + ... + p = 8 * 16 = 128
If you add a new number "z" into the mix, then the new mean is:
(a + b + ... + p + z) / 17 = (128 + z) / 17
This equals 128/17 + z/17, which is approximately 7.5294 + 0.0588z.
- What does that weird-looking E mean?
- It's the capital Greek letter sigma (Σ). It means "add up everything in the list." For example, Σ x sub i from i = 1 to n means x sub 1 + x sub 2 + ... + x sub n. It's just shorthand so that people reading the math know what is going on in the equation.
- At 7:45, what is the funny-looking E thing?
- What does the dash over x stand for? Thank you.
- At 4:42, it states that x-bar (x̄) is the sample mean.
That is the sum of all the entries in your sample divided by the number of entries.
For example, say you have 9 bags of lollies and you want to find the mean number of lollies per bag.
One bag has 6 lollies, another has 7, the next ones have 3, 5, 8, 4, 10, and 8, and the final bag has 12.
So the sample mean, or x-bar (x̄), would be:
x̄ = (6 + 7 + 3 + 5 + 8 + 4 + 10 + 8 + 12) / 9 = 63 / 9 = 7
So the mean, or x-bar, is x̄ = 7.
- Does "sub" mean anything special?
- "Sub" is short for "subscript". Each subscript (1, 2, 3, 4, 5, ...) represents a specific data point. For example, the variable x-sub-3 denotes the third element of the data set.
x-sub-i represents an arbitrary data point in our set. This is similar to how we treat variables in algebra as arbitrary placeholders for any number on the number line.
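The answers above about confidence levels and the central limit theorem can be checked with a small simulation. This is a minimal Python sketch under made-up assumptions: the "population" is 100,000 hypothetical values drawn from a skewed exponential distribution with mean about 10, each sample has n = 50 values, and the 95% interval uses the normal-approximation multiplier 1.96 (for small samples a t-distribution multiplier would normally be used instead). The names `simulate`, `population`, and `coverage` are our own, not from the video.

```python
import random
import statistics
from math import sqrt

# Hypothetical, skewed "population": 100,000 exponential values with mean about 10.
# Made-up data for illustration only; these are not real measurements.
rng = random.Random(42)
population = [rng.expovariate(1 / 10) for _ in range(100_000)]
mu = statistics.fmean(population)      # population mean (a parameter)
sigma = statistics.pstdev(population)  # population standard deviation


def simulate(n: int, trials: int) -> tuple[float, float, float]:
    """Draw `trials` random samples of size n and summarize their means.

    Returns (mean of the sample means, standard deviation of the sample means,
    fraction of samples whose approximate 95% interval contains mu).
    """
    means = []
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)  # simple random sample, no replacement
        x_bar = statistics.fmean(sample)    # sample mean (a statistic)
        s = statistics.stdev(sample)        # sample standard deviation
        margin = 1.96 * s / sqrt(n)         # normal-approximation margin of error
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
        means.append(x_bar)
    return statistics.fmean(means), statistics.stdev(means), hits / trials


n = 50
avg_of_means, sd_of_means, coverage = simulate(n, trials=2000)
print(f"population mean mu      = {mu:.3f}")
print(f"mean of sample means    = {avg_of_means:.3f}")
print(f"sigma / sqrt(n)         = {sigma / sqrt(n):.3f}")
print(f"sd of sample means      = {sd_of_means:.3f}")
print(f"interval coverage of mu = {coverage:.1%}")
```

With these settings, the mean of the sample means should land very close to μ, their standard deviation should be close to σ/√n, and the coverage should come out near 95% (the exact figure depends on the random seed and on how skewed the population is).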
Let's say you're trying to design some type of a product for men, one that is somehow based on their height. And the product is for the United States. So ideally, you would like to know the mean height of men in the United States. Let me write this down. So how would you do that? And when I talk about the mean, I'm talking about the arithmetic mean. If I were to talk about some other type of mean (and there are other types of means, like the geometric mean), I would specify it. But when people just say mean, they're usually talking about the arithmetic mean.
So how would you go about finding the mean height of men in the United States? Well, the obvious way is to go and measure every man in the United States: take their heights, add them all together, and then divide by the number of men there are in the United States. But the question you'd ask yourself is whether that is practical. There are about 300 million people in the United States. Roughly half of them will be men, or at least male, so you will have roughly 150 million men in the United States. So if you wanted the true mean height of all of the men in the United States, you would have to go and measure all 150 million men. And even if you did try to do that, by the time you're done, many of them might have passed away and new men will have been born, so your data will go stale immediately. So it is almost impossible to get the exact height of every man in the United States in a snapshot of time.
And so, instead, what you do is say, well, look, OK, I can't get every man, but maybe I can take a sample of the men in the United States. And I'm going to make an effort that it's a random sample. I don't want to just go sample 100 people who happen to play basketball, or played basketball for their college.
I don't want to go sample 100 people who are volleyball players. I want to randomly sample: maybe the first person who comes out of the mall in a random town, or in several towns, or something like that. Something that is not based in any way, or skewed in any way, by height. So you take a sample, and from that sample you can calculate the mean of at least the sample. And, especially if this was a reasonably random sample, you'll hope that it is indicative of the mean of the entire population. What you're going to see is that much of statistics is all about using things that we can calculate about a sample to infer things about a population, because we can't directly measure the entire population.
So, for example, let's say you're actually trying to do this. I would recommend taking at least 100 data points, or 1,000, and later on we'll talk about how you can think about whether you've measured enough or how confident you can be. But let's just say you're a little bit lazy, and you just sample five men. And so you get their five heights. Let's say one is 6.2 feet. Let's say one is 5.5 feet (5.5 feet would be 5 feet 6 inches). Let's say one ends up being 5.75 feet. Another one is 6.3 feet. Another is 5.9 feet. Now, if these are the ones that you happen to sample, what would you get for the mean of this sample? Well, let's get our calculator out. We get 6.2 plus 5.5 plus 5.75 plus 6.3 plus 5.9. The sum is 29.65. And then we want to divide by the number of data points we have. We have five data points. So let's divide 29.65 by 5, and we get 5.93 feet. So here, our sample mean, which I'm going to denote with an x with a bar over it (x̄), is 5.93 feet. This is our sample mean, or, if we want to make it clear, sample arithmetic mean.
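The calculation above is just "add up the data points, then divide by how many there are." A minimal Python sketch reproducing it with the five example heights; the helper name `sample_mean` is our own illustration, not something from the video.

```python
# The five sampled heights (in feet) from the example above.
heights = [6.2, 5.5, 5.75, 6.3, 5.9]


def sample_mean(xs: list[float]) -> float:
    """Arithmetic mean: add up the data points, then divide by how many there are."""
    if not xs:
        raise ValueError("need at least one data point")
    return sum(xs) / len(xs)


x_bar = sample_mean(heights)
print(f"sum = {sum(heights):.2f} ft, n = {len(heights)}, x-bar = {x_bar:.2f} ft")
# prints: sum = 29.65 ft, n = 5, x-bar = 5.93 ft
```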
And when we're doing this calculation based on a sample, and somehow we're trying to estimate it for the entire population, we call this a statistic. Now, you might be saying, well, what notation do we use if, somehow, we are able to measure it for the population? Let's say we can't even measure it for the population, but we at least want to denote what the population mean is. Well, the population mean is usually denoted by the Greek letter mu (μ). And so a lot of statistics is calculating a sample mean in an attempt to estimate this thing that you might not know, the population mean. Calculations on the entire population are called parameters; sometimes you might be able to do them, but oftentimes you will not. So what you're going to find in much of statistics is that it's all about calculating statistics for a sample in order to estimate parameters for an entire population.
Now the last thing I want to do is introduce you to some of the notation that you might see in a statistics textbook that looks very math-y and very difficult. But hopefully, after the next few minutes, you'll appreciate that it's really just doing exactly what we did here: adding up the numbers and dividing by the number of numbers you added. If you had to do the population mean, it's the exact same thing. It's just many, many more numbers in this context: you have to add up 150 million numbers and divide by 150 million. So how do mathematicians talk about an operation like that, adding up a bunch of numbers and then dividing by the number of numbers? Let's first think about the sample mean, because that's where we actually did the calculation. A mathematician might name each of these data points like this: they'll call this first one right over here x sub 1. They'll call this one x sub 2. They'll call this one x sub 3.
When I say sub, I'm really saying subscript 1, subscript 2, subscript 3. They could call this one x subscript 4, and this one x subscript 5. And if you had n of these, you would just keep going: x subscript 6, x subscript 7, all the way to x subscript n. And to take the sum of all of these, they would denote it like this. They will say that the sample mean is equal to the sum of all my x sub i's. The way you can conceptualize it is that the i's will change: the i starts at 1 and goes up to the size of our actual sample, so all the way until n. In this case, n was equal to 5. So this is literally saying x sub 1 plus x sub 2 plus x sub 3, all the way to the nth one. Once again, in this case, we only had five. Now, are we done? Is this what the sample mean is? Well, no, we aren't done. We don't just add up all of the data points. We then have to divide by the number of data points there are. So this might look like very fancy notation, but it's really just saying: add up your data points and divide by the number of data points you have. And this capital Greek letter sigma (Σ) literally means sum. Sum all of the x sub i's, from x sub 1 all the way to x sub n, and then divide by the number of data points you have.
Now let's think about how we would denote the same thing, but for the population mean instead of the sample mean. The population mean, as we already talked about, is denoted with mu. And here, once again, you're going to take the sum, but this time it's going to be the sum of all of the elements in your population. So your x sub i's, and you'll still start at i equals 1. But since you're taking the whole population, they'll often put a capital N up here as the upper limit, to denote that this is a bigger number than the smaller n. But once again, we are not done.
We have to divide by the number of data points that we are actually summing. And so this, once again, is the same thing as x sub 1 plus x sub 2 plus x sub 3, all the way to x sub capital N, all of that divided by capital N. And once again, in this situation, we found the sample mean practical to calculate and the population mean impractical. We can debate whether we took enough data points for our sample mean. But we're hoping that it's at least somehow indicative of our population mean.
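Written out in the summation notation described above, the two definitions and the worked sample calculation look like this. Here n is the sample size (5 in the example), N is the population size (roughly 150 million men in the example), and the population mean μ is a parameter we usually cannot compute directly.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}

\mu = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{x_1 + x_2 + \cdots + x_N}{N}

% Worked instance with the five sampled heights (n = 5):
\bar{x} = \frac{6.2 + 5.5 + 5.75 + 6.3 + 5.9}{5} = \frac{29.65}{5} = 5.93\ \text{ft}
```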