# Inferring population mean from sample mean

## Video transcript

Let's say you're trying to design some type of product for men, one that is somehow based on their height, and the product is for the United States. So ideally you would like to know the mean height of men in the United States. Let me write this down: mean height of men in the United States.

So how would you do that? When I talk about the mean, I'm talking about the arithmetic mean. If I were to talk about some other type of mean, and there are other types of means, like the geometric mean, I would specify it. But when people just say "mean," they're usually talking about the arithmetic mean.

So how would you go about finding the mean height of men in the United States? Well, the obvious way is to go and measure every man in the United States, take their heights, add them all together, and then divide by the number of men there are in the United States. But the question you need to ask yourself is whether that is practical. There are about 300 million people in the United States, and roughly half of them will be men, or at least they'll be male, so you will have roughly 150 million men in the United States. So if you wanted the true mean height of all of the men in the United States, you would have to somehow go and measure all 150 million of them. And even if you did try to do that, by the time you're done, many of them might have passed away and new men will have been born, so your data will go stale immediately. So it is seemingly impossible, or almost impossible, to get the exact height of every man in the United States in a snapshot of time.

So instead what you do is say, "Well, look, I can't get every man, but maybe I can take a sample." I could take a sample of the men in the United States, and I'm going to make an effort that it's a random sample. I don't want to just go
sample 100 people who happen to play basketball, or play basketball for their college. I don't want to go sample 100 people who are volleyball players. I want to randomly sample: maybe the first person who comes out of a mall in a random town, or in several towns, or something like that, something that should not be biased or skewed in any way by height.

So you take a sample, and from that sample you can calculate a mean of at least the sample. And, especially if this was a reasonably random sample, you'll hope that it is indicative of the mean of the entire population. What you're going to see is that much of statistics is all about using things that we can calculate about a sample to infer things about a population, because we can't directly measure the entire population.

So, for example (and if you're actually trying to do this I would recommend at least 100 data points, or a thousand, and later on we'll talk about how you can think about whether you've measured enough or how confident you can be), let's just say you're a little bit lazy and you just sample five men. So you get their five heights. Let's say one is 6.2 feet. Let's say one is 5.5 feet; 5.5 feet would be five foot six inches. One ends up being 5.75 feet, another one is 6.3 feet, and another is 5.9 feet.

Now, if these are the ones that you happen to sample, what would you get for the mean of this sample? Well, let's get our calculator out: 6.2 plus 5.5 plus 5.75 plus 6.3 plus 5.9. The sum is 29.65. And then we want to divide by the number of data points we have. We have five data points, so let's divide: 29.65 divided by 5, and we get 5.93 feet. So here our
sample mean, and I'm going to denote it with an x with a bar over it, is (and I already forgot the number) 5.93 feet. This is our sample mean, or, if we want to make it clear, the sample arithmetic mean. And when we make a calculation like this based on a sample, and then we're trying to use it to estimate something for the entire population, we call this a statistic.

Now you might be saying, "Well, what notation do we use if somehow we are able to measure it for the population?" Or let's say we can't even measure it for the population, but we at least want to denote what the population mean is. Well, if you want to do that, the population mean is usually denoted by the Greek letter mu. And so a lot of statistics is calculating a sample mean in an attempt to estimate this thing that you might not know, the population mean. And these calculations on the entire population, which sometimes you might be able to do and oftentimes you will not, are called parameters. So what you're going to find in much of statistics is that it's all about calculating statistics for a sample in order to estimate parameters for an entire population.

Now, the last thing I want to do is introduce you to some of the notation that you might see in a statistics textbook. It looks very mathy and very difficult, but hopefully after the next few minutes you'll appreciate that it's really just doing exactly what we did here: adding up the numbers and dividing by the number of numbers you added. If you had to do the population mean, it's the exact same thing; it's just many, many more numbers. In this context you would have to add up 150 million numbers and divide by 150 million. So how do mathematicians talk about an operation like that, adding up a bunch of numbers and dividing by the number of
numbers? Well, let's first think about the sample mean, because that's where we actually did the calculation. A mathematician might call each of these data points something like this: they'll call this first one x sub 1, this one x sub 2, and this one x sub 3. When I say "sub," I'm literally saying subscript: subscript 1, subscript 2, subscript 3. They could call this one x sub 4 and this one x sub 5. And if you had n of these, you would just keep going: x sub 6, x sub 7, all the way to x sub n.

And so to take the sum of all of these, let me write it right over here. They will say that the sample mean is equal to the sum of all my x sub i's. The way you can conceptualize it is that these i's will change. In this case the i starts at 1 and goes up to the size of our actual sample, so all the way until n. In this case n was equal to 5. So this is literally saying x sub 1 plus x sub 2 plus x sub 3, all the way to the nth one. Once again, in this case we only had 5.

Now, are we done? Is this what the sample mean is? Well, no, we aren't done. We don't just add up all of the data points; we then have to divide by the number of data points there are. So this might look like very fancy notation, but it's really just saying: add up your data points and divide by the number of data points you have. This capital Greek letter sigma literally means "sum": sum all of the x sub i's, from x sub 1 all the way to x sub n, and then divide by the number of data points you have.

Now let's think about how we would denote the same thing, but instead of for the sample mean, doing it for the population
mean. So the population mean, they will denote it with mu; we already talked about that. And here, once again, you're going to take the sum, but this time it's going to be the sum of all of the elements in your population. So you sum your x sub i's, and you'll still start at i equals 1. But it usually gets denoted that you're taking a whole population, so they'll often put a capital N right over here, to denote that this is a bigger number than this smaller n. But once again, we are not done: we have to divide by the number of data points that we are actually summing. And so this, once again, is the same thing as x sub 1 plus x sub 2 plus x sub 3, all the way to x sub capital N, all of that divided by capital N.

And once again, in this situation we found this impractical. We can debate whether we took enough data points for our sample mean right over here, but we're hoping that it's at least somehow indicative of our population mean.
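
The sample-mean calculation described above can be sketched in a few lines of Python. This is only an illustration and not part of the original video: the names `sample_heights`, `arithmetic_mean`, `n`, and `sample_mean` are made up for this sketch, and the five values are the heights read out in the transcript. The function mirrors the sigma notation: sum the x sub i's for i from 1 to n, then divide by n.

```python
# The five heights (in feet) sampled in the transcript.
sample_heights = [6.2, 5.5, 5.75, 6.3, 5.9]


def arithmetic_mean(values):
    """Add up the data points and divide by the number of data points."""
    total = sum(values)   # x_1 + x_2 + ... + x_n
    count = len(values)   # n
    return total / count


n = len(sample_heights)                        # sample size, 5 here
sample_mean = arithmetic_mean(sample_heights)  # the statistic "x bar"

# Rounded to two decimal places to match the calculator result.
print(f"n = {n}, sample mean = {sample_mean:.2f} feet")
```

The same function would give the population mean mu if it were handed all N heights in the population; only the length of the list changes, which is exactly why measuring the whole population is the impractical part.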