# Confidence interval 1

Estimating the probability that the true population mean lies within a range around a sample mean. Created by Sal Khan.

Video transcript

You sample 36 apples from
your farm's harvest of over 200,000 apples. The mean weight of the sample
is 112 grams with a 40 gram sample standard deviation. What is the probability that the
mean weight of all 200,000 apples is between 100
and 124 grams? Let's think about what
they're asking. So there's some distribution of
all of the weights of all 200,000 apples or there's
more than 200,000. We don't even know how many
apples, just a huge number. So there's some population
distribution of weights. Maybe it looks something
like that. It will have a mean weight. It has a mean weight. We don't know what that mean
weight is, and it also has a population standard deviation. So this might be one standard
deviation above the population mean, that would be one standard
deviation below. And we'll say this distance
right here is the population standard deviation. Both of these are parameters
that we do not know of the entire population. This is the population
distribution right over there. Now, we know from our experience
with the last few videos that you can repeatedly
take samples, or if you kind of visualize, repeatedly take
samples of a certain size-- in this video we're going to focus
on sample sizes of 36. And if you keep taking the means of
those samples, and you plot the frequency with which
you get those means, you would eventually get something
called the sampling distribution of the
sample mean. Let me write this down. The sampling distribution
of the sample mean. So that might look something
like this. I'll try to draw it a little
bit bigger since we're probably going to use this
one a little bit more. It is going to be pretty close
to a normal distribution. It's going to have some mean,
and we specify that-- let me draw it down here. It has a mean, and we label it to show that
this is the mean of the sampling distribution. And we know that the mean of
the sampling distribution, the mean of all of your
means, or of the actual distribution of means, is
actually going to be your original population mean. So this is going to be your
population mean over here. And it also has some
standard deviation. So maybe this is a standard
deviation above the mean, this is a standard deviation below
the mean, right over there. And we can specify that by the
standard deviation of the sampling distribution
of the means. And we know that this can
be given, or I guess approximated, because for fairly
large samples this gives a pretty good indicator. There's a couple of correction
factors if you get to smaller samples. But this is going to be our
population standard deviation divided by the square root--
and we saw this in the last two videos-- divided by the
square root of the number of items in each sample when we
calculate each of those means. And we know in this example that
each sample has 36 items. So this is the square
root of 36. This is a sampling distribution
of the sample mean-- let me write it over
here-- for n equals 36. Each of our sample buckets
or baskets has 36 items, and then we take their mean. And then essentially
each of those means is a sample from this distribution
right over here. The means are samples from
this sampling distribution, and the things that we used to calculate
the means are samples from the population distribution. Hopefully that isn't
too confusing. But this isn't the first
time we've seen it. Anyway, this distribution's
standard deviation is going to be
this population standard deviation divided by 6. But we still don't know this. We still don't know this
parameter up here. Now with that said let's
refocus on what they're actually asking us. They want to know the
probability that the mean weight of all 200,000 apples-- well, the mean weight of all
200,000 apples is that parameter right over there. And they want to know what is the
probability that it is between 100 and 124 grams. So they're
actually asking us-- if something is between 100 and 124
grams it is within 12 of our sample mean. Right? That's all they're saying. What is the probability that this
thing is within 12 of our sample mean? Because if
you're 12 less you're going to get to 100. If you're 12 more you're
going to get to 124. So what they're asking us is
what is the probability that our population mean, this
parameter, this unknown parameter, is within 12 of the
mean of our one sample. Now if I told you that I'm
within 5 feet of you, then that also means that you're
within 5 feet of me. So this is the exact same thing
as the probability that the sample mean is within
12 of the actual mean. I really want you to-- this
should make sense. If I said what's the probability
that I'm either 5 behind you or 5 ahead of you,
that's the same thing as the probability that you're either
5 behind me or 5 ahead of me. This is asking what's the
probability that we're within 12 of each other, or what's the probability
that I am within 12 feet of you. And this is the probability
that you're within 12 feet of me. They're asking the
exact same thing. But when you phrase it this way
it might dawn on you that you might be able to use the
sampling distribution of the sampling mean. There's some unknown mean here,
which is the same thing as this value right here. So this thing-- let me make it
very clear-- this is also the same thing because this
value and this value are the same thing. This is exactly the same thing
as asking what is the probability that our one sample
mean is within 12 of the actual mean of the sampling
distribution. So we're just saying what's the
probability that that one sample mean we have is
within 12 of this actual mean of the sampling distribution. At this point your brain should
be thinking, gee, if I could figure out how many
standard deviations that is, how many standard deviations
away that is on this distribution, I can then use a
Z-table to actually figure out the probability. And that's exactly what
we're going to do. But there's one slight
complication here. We don't know the actual
standard deviation of the sampling distribution. We just know that it's
this thing right over here divided by 6. But we don't know this thing. So what we're going to
do is get our best estimate of this thing. So we need a good estimator
for the actual population standard deviation. What's our best estimate
of that? Well, it's going to be our sample
standard deviation. We sampled 36 things,
and we had a sample standard deviation of 40. So we have a sample standard deviation
of-- let me write it this way. This is going to be
approximately equal to our sample
standard deviation, which we got to be 40. So we literally just took-- we
found the mean of our 36 apples, mean weight was 112
grams. Then we found the squared distance from each of the
apples' weights to this. Took the average of those. Well we didn't take the straight
up average, we divided by n minus 1. We learned all of this many,
many videos ago. And then we took the square
root of that. This gave us the sample standard
deviation, it is our best estimator for this. So if this is our best estimator
for that, our best estimator for this thing right
here is going to be equal to our sample standard deviation
divided by 6, which is equal to 40 over 6, which is
equal to-- let's get our calculator out. So if we have 40 divided by
6, we have 6.6-- I'll just write down 6.67. So this thing right
here is 6.67. So our best estimate of the
standard deviation of the sampling distribution of the
sample mean is 6.67. So this distance right
here is 6.67. So how many standard deviations
is 12 if you look at this distribution
right over here? Well we just divide
12 by 6.67. So let me get the calculator
back up. So if we have 12-- I'll just
divide it by that 6.-- actually this exact number--
12 divided by-- answer just means the last answer we got--
that gives us 1.8 exactly. The numbers just happened
to work out well. So this is completely analogous
to saying what is the probability that our one
sample mean is within 1.8 standard deviations. Let me write it this way. It's within 1.8 standard
deviations of the sample mean-- within 1.8 of these--
of our actual mean of our sampling distribution. So we're literally
just asking that. So if you look at this
distribution up here, within 1.8 standard deviations, this
is one standard deviation, another 0.8 would maybe get
us right about there. And we're within 1.8 above and
1.8 below, so this is 1.8 standard deviations above the
mean, this is 1.8 standard deviations below the mean. So we're just going to say
what's the probability when we just took this one sample of 36
apples that we lie in this space over here? And to figure this out what I'm
going to do is I'm going to use our Z-table to figure
out essentially this space over here. Just what's the probability
of being between the mean and 12 above it. And then we can just double it
because a normal distribution is symmetric. So let's go to our Z-table. So what's the probability of
being between the mean and 1.8 standard deviations
above the mean? So if you just go straight to
your Z-table, 1.80 is this right over here. Now you get this
0.9641 number. But be very careful. This 0.9641 number that gives
you-- so if I draw a normal distribution-- let me draw a
better normal distribution. If I draw a normal distribution
like that, and this is our mean, this 0.9641
number tells us the probability that we are less
than 1.8 standard deviations above the mean. So this is 1.8 standard
deviations up here, then this is our mean right over here. This is giving us this entire
area right over there. So if I want just this area
right here, what I need to do is from this value, from this
0.9641, I need to subtract this, the probability that
you're essentially directly less than the mean. And that is this. This is the probability that
you're less than the mean, or you're less than the mean plus
0 standard deviations. So this value right
here is 0.50. This whole area that I
just showed you right over there is 0.9641. So this area right over here
is going to be 0.9641 minus 0.5, which is going to
be equal to 0.4641. So this area right here, just
what's in the magenta. Just between there and
there is 0.4641. Let me make sure I
got that right. 0.4641. And if I want this entire
area I just double it. If I want to include this
as well, I just have to double that. So let me get my calculator
out, let me get the trusty calculator. So we're going to have 0.4641
times 2 is equal to 0.9282. So this whole area right over
here is equal to 0.9282. So we did something neat. The probability that the--
remember what we're answering right here. The probability that our sample
mean just happens to be within 1.8 standard deviations--
remember, that was 1.8 standard deviations of the
sample means from the actual mean is 0.9282, or there's
a 92.82% chance. But that's also saying that
there's a 92.82% chance that the actual mean is within 12 of
our measured sample mean. And that is neat. Because for the first time
ever-- we started with very little information. We just started with a little
sample over here. And we were able to get as much
information about that sample as possible. But we can now say that there's
a 92.82% chance that the actual mean is within 12
of the mean we measured. That the actual mean is between
100 and 124, or that we're 92.82% confident that
the actual mean is in the range between 100 and
124 grams. I think that's pretty neat.
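
The arithmetic in the video can be checked with a short Python sketch. This is not part of the original video: it is a minimal illustration that makes the same assumptions the video does, namely that the sampling distribution of the sample mean is approximately normal and that the sample standard deviation (40) stands in for the unknown population standard deviation. The function name `prob_within` is hypothetical. `statistics.NormalDist` is from the Python standard library (3.8 or later) and plays the role of the Z-table.

```python
from math import sqrt
from statistics import NormalDist


def prob_within(margin, sample_sd, n):
    """Approximate P(|sample mean - population mean| <= margin).

    Assumes a roughly normal sampling distribution and uses the sample
    standard deviation as an estimate of the population standard deviation.
    """
    # Estimated standard deviation of the sampling distribution: s / sqrt(n).
    standard_error = sample_sd / sqrt(n)
    # How many of those standard deviations the margin covers.
    z = margin / standard_error
    # Area under the standard normal curve between -z and +z.
    probability = 2 * NormalDist().cdf(z) - 1
    return z, probability


# Values from the apple example: 36 apples, s = 40 grams, margin = 12 grams.
z, p = prob_within(margin=12, sample_sd=40, n=36)
print(f"z = {z:.2f}, probability = {p:.4f}")
```

With these inputs the sketch should report a z of 1.80 and a probability of about 0.928. The last digit can differ slightly from the video's 0.9282 because the Z-table entry 0.9641 is already rounded to four decimal places before it is doubled.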