Video transcript

Let's say I live in a country of 100 million people, and there's a presidential election coming up. In that presidential election there are two candidates: candidate A and candidate B. Let's say I live in a very decisive country, so everyone participates in the election and everyone is going to vote for either candidate A or candidate B.

So there's some reality there: some fraction p of the population will vote for B (I could switch them around if I wanted), and the rest of the people, a fraction 1 minus p, are going to vote for A. You might already recognize that this is a Bernoulli distribution: there are only two values that a sample can take. Right here the values, as I said, are voting for candidate A or voting for candidate B, and it's very hard to deal with those values. You can't calculate a mean between A and B; those are letters, not numbers. So to make it manipulable mathematically, we're going to say that sampling someone who's going to vote for A is equivalent to sampling a 0, and sampling someone who's going to vote for B is equivalent to sampling a 1.

If you do that with a Bernoulli distribution, we learned in the video on Bernoulli distributions that the mean of this distribution is going to be equal to p, and it's a pretty straightforward proof for how we got that. So the mean of this distribution, which will actually not be a value that this distribution can take on, is going to be someplace over here, and it is going to be equal to p.

Now, my country has 100 million people. It is practically impossible for me to go and ask all 100
million people who they are going to vote for, so I won't be able to figure out exactly what these parameters are, what my mean is, what p is going to be. Instead, what I'm going to do is a random survey: I'm going to sample this population, look at that data, and then get an estimate of what p really is, because p is what I really care about. So I'm going to try to estimate p with a sample, and then we're also going to think about how good an estimate that is.

So let's say I randomly survey 100 people and get the following results. Let's say 57 people say they're going to vote for person A, which is equivalent to getting 57 samples of 0. And then the rest of the people (once again, a very decisive population, no one is undecided), so 43 people, say they're going to vote for B, which is the equivalent of sampling 43 ones.

Now, given this sample, what are my sample mean and my sample variance? My sample mean is just going to be the average of these zeros and ones. I got 57 zeros, so it's going to be 57 times 0, plus my 43 ones, so plus 43 times 1, which is the sum of all of my samples, over the total number of samples I took, over 100. So what does this get me? 57 times 0 is 0, and 43 times 1 divided by 100 is 0.43. That is my sample mean, the mean of just the 100 data points that I actually got.

Now, what is my sample variance? The sample variance is going to be equal to the sum of my squared distances to the mean divided by my number of samples minus 1. Remember, this is a sample variance, and we want to get the best estimator of the real
variance of this distribution, and to do that you don't divide by 100; you divide by 100 minus 1. We learned that many, many videos ago.

So I had 57 samples of 0, and each of those samples is 0 minus 0.43 away from the mean: you take 0 and subtract 0.43, which is the difference between 0 and 0.43. If I want the squared distance, I square it; that's how we calculate variance. There are 57 of those. And then there are 43 times that I sampled a 1 in my sample, and each 1 is 1 minus 0.43 away from the mean, because 0.43 is the mean, and I want to square that distance. And then I don't want to just divide it by n, by 100. Remember, I'm trying to estimate the true population variance, and in order for this to be the best estimator of that (I gave you an intuition for why many, many videos ago) we divide by 100 minus 1, or 99.

Let me get the calculator out to actually figure out our sample variance. I'll do the numerator first: I have 57 times (0 minus 0.43) squared, plus 43 times (1 minus 0.43) squared, and then all of that divided by 100 minus 1, or 99, is equal to about 0.2476. So my sample variance is about 0.2476.

If I want to figure out my sample standard deviation, I just take the square root of my sample variance. So I take the square root of that value I just had, which is about 0.498, so let me just round that to 0.50. So my sample standard deviation is 0.50.

Now, if you just look at this, you say, okay, your best estimate of the percentage of people voting for A or B is really what you just saw here:
your best estimate of the mean is that 43 percent of people are going to vote for B and everyone else is going to vote for A. But an interesting question is, how good an estimate is that? Let's take it to the next level: let's try to think of an interval around 43 percent for which we are reasonably confident, roughly 95 percent sure, that the real mean is in that interval.

Let me make it very clear. When we get our sample mean, we are sampling from the sampling distribution of the sample mean, so let me draw that. Since we're sampling from a discrete distribution, it's actually going to be a discrete distribution too, with 101 possible values: the sample mean can take on any multiple of 0.01 between 0 and 1. But I'll draw it as kind of continuous, because it would be hard for me to draw that many bars. If I did, you'd have a bar there and a bar there; the probability that your sample mean would be 1 would be very low, and you would have one more bar like that, and a bar like that. But that takes forever to draw, so I'm just going to approximate it with this normal curve right over there.

So this is the sampling distribution of the sample mean. It has some mean, and I can denote it with mu sub x-bar; this tells us it is the mean of the sampling distribution of the sample mean. But we know from many, many videos that this is going to be the same thing as the population mean, the mean of the distribution that each of these 100 samples comes from. So this is going to be equal to mu, which is going to be equal to p. Again, this is
going to be equal to mu, which is equal to p.

Now, the variance of this distribution, or even better, let's say the standard deviation of this distribution, that distance right over here, the standard deviation of the sampling distribution of the sample mean: we've seen it multiple times already. It's going to be the standard deviation of our population distribution (there's some standard deviation associated with that distribution, that distance over there) divided by the square root of our sample size. We saw many videos ago why that makes sense, at least experimentally and intuitively. Our sample size is 100, so the square root is 10, and it's going to be that population standard deviation divided by 10.

Now, we do not know what the population standard deviation is. The only way to figure that out is to actually survey 100 million people, which would be impossible. So to estimate it, we will use our sample standard deviation as our best estimate for the population standard deviation. Remember, this is an estimate; we cannot come up with the exact number just from a sample. But we can estimate it, because our sample standard deviation is our best estimator for the population standard deviation, and if we divide it by 10, we will have our best estimator for the standard deviation of the sampling distribution of the sample mean. So remember, this is just an estimate, and you have to take everything after this point with a bit of a grain of salt.

So it's going to be roughly equal to (remember, this is an estimate) our s over 10, and that s is going to
be 0.50. And remember, every time we take a different sample, this number is going to change. So this isn't something set in stone; it depends on our sample, so it's going to wiggle around a little bit depending on what numbers we actually get in our sample. But here it's going to be 0.50 (this is the s right over here) divided by 10, which is equal to 0.05. So our best estimate of this standard deviation is 0.05, or you could even view it as 5 percent.

Now what I want to do is come up with an interval around the sample mean where I'm reasonably confident, using all of my estimates, that there's a 95% chance that the true mean is in that interval. So let me write this down. I want to find an interval such that I am reasonably confident (and I'm putting this kind of touchy-feely language here because of the fact that I don't know for certain that the standard deviation is 0.05; I'm just estimating it) that there is a 95% chance that the true mean of the population is in that interval. That true mean is the same thing as the proportion of the population who are going to say they'll vote for person B, or the proportion of the population that is going to be a 1. We just have to remember that mu is equal to p. So: there's a 95% chance that the true p is in that interval.

And actually, since I've already gone 14 minutes into this video, I'm going to stop this
video here, and maybe I'll even let you think about it, just based on everything we've done so far. We figured out the sample mean right over here. Remember, this is just the mean of our sample; we don't know the true mean of the sampling distribution. We also don't know the true standard deviation of the sampling distribution, but we were able to estimate it with the sample standard deviation divided by 10. Now, with everything that we have so far, and based on what we've seen before on confidence intervals, how can we find an interval such that roughly (and I'm saying roughly because we had to estimate the standard deviation) there's a 95% chance that the true mean of our population, or p, the proportion of the population saying 1, is in that interval? We're going to do that in the next video.
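
For anyone who wants to check the arithmetic without a calculator, here is a minimal Python sketch of the quantities worked out so far: the sample mean, the sample variance (dividing by n minus 1), the sample standard deviation, and the estimated standard deviation of the sampling distribution of the sample mean. The variable names (`votes`, `standard_error`, and so on) are assumptions made for illustration and are not from the video; the only data used are the 57 zeros and 43 ones from the survey.

```python
import math

# Survey data from the transcript: 57 votes for A (coded 0), 43 for B (coded 1).
votes = [0] * 57 + [1] * 43
n = len(votes)

# Sample mean: sum of all samples over the number of samples.
sample_mean = sum(votes) / n

# Sample variance: squared distances to the sample mean, divided by n - 1.
sample_variance = sum((x - sample_mean) ** 2 for x in votes) / (n - 1)

# Sample standard deviation: square root of the sample variance.
sample_sd = math.sqrt(sample_variance)

# Estimated standard deviation of the sampling distribution of the sample
# mean: the sample standard deviation divided by the square root of n.
standard_error = sample_sd / math.sqrt(n)

print(f"sample mean:     {sample_mean:.2f}")
print(f"sample variance: {sample_variance:.4f}")
print(f"sample std dev:  {sample_sd:.4f}")
print(f"standard error:  {standard_error:.4f}")
```

Without the intermediate rounding of the standard deviation to 0.50, the last value comes out slightly under 0.05, which is consistent with the rounded figure used in the video.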