© 2023 Khan Academy
Confidence interval for a mean with paired data
Confidence interval for a mean difference with paired data.
- Could anyone explain why the standard deviation of the difference has this strange value? As I understood from previous videos, the variance of a difference equals the sum of the variances. Is that correct? (15 votes)
- Because the two measurements are dependent: they come from the same person. For example, if the same person did 50 flips and 30 were with the dominant hand, then 20 would be with the non-dominant hand.
When can they be independent? If we had two groups of people, we could build the set of differences from a two-way table (column names are the group 1 people's names, row names are the group 2 people's names) and then calculate the standard deviation. (1 vote)
- Why are we using a t-table when we have the population statistic in this case, seeing that the study is meant for and conducted on only the five people? Also, why is the standard deviation derived for the difference (dominant minus non-dominant)? Kindly clarify. Thanks! (8 votes)
- We're not building a confidence interval for a proportion, so using a critical z value would give an incorrect result. When estimating a mean from a sample standard deviation, we use the t-table to get the critical value. Also, we do not have the population parameter. The population would be people in general, and the sample is those five people. (1 vote)
- Why, at 9:20, does he say that we are 95% confident in capturing the true mean difference "for these friends"? Aren't we using these friends as a sample to find an interval that we are 95% confident captures the true mean difference for a wider population? If it is for these friends, then surely either we are using the full population and don't need t-tables, or we are using the maximum sample size, because if people can participate twice we lose independence? (6 votes)
- Perhaps the population is a group of people of similar ages, hand sizes, etc. (3 votes)
- How would you calculate the standard deviation of the difference if you only had the standard deviation for each set (non-dominant and dominant) and their means? (2 votes)
- Why don't we have to add the variances of the dominant and non-dominant sets to find the variance, and thereby the standard deviation, of the difference set, and then use that number for our confidence interval? For example, I am right-handed but I snap with my left hand, and when I did this experiment I snapped almost 70 times with my left hand but only 22 times with my right hand, yet the standard deviation of the difference would suggest that this is essentially impossible. Even if the difference isn't always that extreme, one would think that there are a number of people who can snap more with their non-dominant hand than with their dominant hand. (2 votes)
- What is the population and what is the sample here?(1 vote)
- At 8:00, why does Sal divide s (the sample SD) by the square root of n instead of by the df? I am confused. Isn't df used in the denominator if we are taking sample data? (1 vote)
- df is used for selecting the critical t value. To estimate the standard deviation of the sampling distribution of the mean, we still divide s by sqrt(n). (1 vote)
- Very important question in Life. Lmaoooooooo(1 vote)
- Why is this a matched-pair design? What is matching?(1 vote)
- A matched pairs design is when a researcher pairs each unit in one group with a unit in another based on a similar quality. A researcher would likely match two athletes together based on exercise, weight, work ethic, etc. Sometimes a researcher can match a person with themselves (e.g., first try on a Nike shoe and then an Adidas shoe, and then compare the comfort level). From a matched pairs design you can do a paired t-test and compare differences in qualities. (1 vote)
- I am confused as to why we cannot directly use S_diff in our interval, could someone explain what S_diff/sqrt(n) represents compared to S_diff? Does it represent the standard deviation of the sampling distribution of the mean of X_diff?(1 vote)
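Several of the questions above ask why the standard deviation of the differences is not found by adding the two hands' variances. A minimal sketch of the reason, assuming paired measurements that are positively correlated: the sum-of-variances rule only holds for independent variables, while for paired data Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y). The snap counts below are made-up numbers for illustration only (they are not the data from the video), and the variable names are hypothetical. It also shows why the two standard deviations and the two means alone are not enough: the covariance (or the individual differences) is needed as well.

```python
import math
import statistics

# Made-up paired snap counts for five people (NOT the video's data).
dominant = [41, 38, 45, 36, 40]
non_dominant = [35, 31, 38, 30, 33]
n = len(dominant)

# One difference per person: this is the single sample a paired analysis uses.
diffs = [d - nd for d, nd in zip(dominant, non_dominant)]

var_dom = statistics.variance(dominant)      # sample variance (divides by n - 1)
var_non = statistics.variance(non_dominant)

# Sample covariance computed by hand so the sketch runs on older Python versions.
mean_dom = statistics.mean(dominant)
mean_non = statistics.mean(non_dominant)
cov = sum((d - mean_dom) * (nd - mean_non)
          for d, nd in zip(dominant, non_dominant)) / (n - 1)

sd_diff_direct = statistics.stdev(diffs)                     # from the differences
sd_diff_identity = math.sqrt(var_dom + var_non - 2 * cov)    # from the identity
sd_if_independent = math.sqrt(var_dom + var_non)             # what "add the variances" gives

print(f"SD of differences (direct):    {sd_diff_direct:.3f}")
print(f"SD via variance identity:      {sd_diff_identity:.3f}")
print(f"SD if hands were independent:  {sd_if_independent:.3f}")
```

Because people who snap fast with one hand tend to snap fast with the other, the covariance is positive and the differences vary much less than either hand's counts do.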
Video transcript
- [Instructor] A group of friends wondered how much faster they could snap their fingers on one hand versus the other hand. Very important question in life. Each person snapped their fingers with their dominant hand for 10 seconds and their non-dominant hand for 10 seconds. Where, if you're right-handed, right hand would be your dominant hand. If you're left-handed, left hand would be your dominant hand. Each participant flipped a coin to determine which hand they would use first, because if you always used your dominant hand first, maybe you're tired by the time you're doing your non-dominant hand or there's something else. So here it's random which one you use first.
Here are the data for how many snaps they performed with each hand, the difference for each participant, and summary statistics. And this is actually real data from the Khan Academy Content Team, and so you see, for each of the participants, for Jeff right over here, he was able to do 44 snaps in 10 seconds on his dominant hand, which is impressive, more than I think I could do, and he was even able to do 35 on his non-dominant hand, and so the difference here, the dominant hand minus the non-dominant, was nine, and then they tabulated this data for all five members.
Now they also calculated summary statistics for them, but this is the really interesting thing right over here. This is the difference between the dominant and the non-dominant hand, and so what they did here, the mean difference, what they did is they took this row right over here and they calculated the mean, which they got to be 6.8, and then they calculated the standard deviation of these differences right over here, which they got to be approximately 1.64, and then we are asked: Create and interpret a 95% confidence interval for the mean difference in number of snaps for these participants. So pause this video. See if you can make some headway here. See if you can think about how to approach this.
So what's interesting here is, we're not trying to construct a confidence interval for just the mean number of snaps for the dominant hand or the mean number of snaps for the non-dominant hand, we're constructing a 95% confidence interval for a mean difference. Now you might say, wait, wait, wait, I have two different samples here and then this third data set is somehow constructed from these other two, but one way to think about it, this is a matched pairs design. So in a matched pairs design, what you do is, for each participant, for each member in your sample, you will make them do the control and the treatment. So for example, you could do the control as how many they can do in the dominant hand in 10 seconds, and the treatment is how many they can do in the non-dominant hand, and in a matched pairs design, you're really concerned about the difference, and so you can really view this as just one sample of size five, for which you are calculating the difference for each member of that sample and the standard deviation across that entire sample.
Now before we calculate the confidence interval, let's just remind ourselves of some of our conditions that we like to think about when we are constructing confidence intervals. The first condition we think about is whether our sample is random. Now if we were trying to make some type of judgment about all human beings and their snapping ability, this would not be a random sample. These people all work at Khan Academy. Maybe somehow, in our interview process, we select for people who snap particularly well, but whatever inferences we make, we can say, hey, this is roughly true about this group of friends.
Now the next condition we wanna think about is the normal condition. Now there's a couple of ways to think about it. If we had a sample size of 30 or larger, the central limit theorem says, okay, the sampling distribution would be roughly normal, the sampling distribution of the sample means, but obviously, our sample size is much smaller than that. One way to think about it: we could just plot our data points and see whether they seem to be skewed in any way, and if we just do a little dot plot right over here, we could say, let's see, make this zero, one, two, three, four, five, six, seven, eight, and nine. So we have one data point where the difference was nine, one data point where the difference is five, one data point where the difference is eight, one data point where the difference is six, and another data point where the difference is six, and so this doesn't look massively skewed in any way. Our mean difference was right over here. It's about 6.8. It looks roughly symmetric. So we can feel okay about the normal condition. This isn't the best study that one could conduct. This is obviously a small sample size. It's not random of the entire population, but maybe we could go with it. Also, when you think about biological processes, like how well someone snaps, which is a product of a lot of things happening in the human body, and it's the sum of many, many processes, those things also tend to have a roughly normal distribution, but I won't go into too much depth there. But all of these things, once again, this isn't a super robust study, but this is a fun thing for friends to do if they have nothing else to do.
All right. Now the third one is independence. And this one actually we can feel pretty good about, because Jeff's difference right over here really shouldn't impact David's difference, or David's difference really shouldn't impact Kim's difference, especially if they're not observing each other. And let's just say for sake of argument that they did it all independently in a closed room with an independent observer, so they weren't trying to get competitive or something like that. But needless to say, this isn't a super robust study, but we can still calculate a 95% confidence interval.
So how do we do that? Well, we've done this so many times. Our confidence interval would be our sample mean, so it would be the mean of our differences, plus or minus... Now we don't know the population standard deviation, so we're going to use our sample standard deviation, and if you're using a sample standard deviation and this confidence interval is all about the mean, then our critical value here is going to be based on a t-table, on a t-statistic, and we're going to multiply that by the sample standard deviation of the differences, divided by the square root of our sample size, divided by the square root of five.
Now we know most of this data here, and let me just write it down over here. We know the mean, the sample mean right over here, 6.8. So it's going to be 6.8 plus or minus, and now what will be our critical value here? Well, we wanna have a 95% confidence interval, and what's our degrees of freedom? Well, it's one less than our sample size, so our degrees of freedom right over here is equal to four. And so we're ready to use a t-table. So this is a truncated t-table that I could fit on my screen here, and so there's a couple of ways to think about it. Here they actually give us the confidence level, and the reason why that corresponds to a tail probability of 0.025 is that if you take the middle 95% of a distribution, you're going to have 2.5% on either end. That's going to be your tail probability, so that's all that's going on over there. So we're going to be in this column right over here, and which degrees of freedom do we use? Well, it's gonna be four degrees of freedom. Our sample size is five. Five minus one is four, so this is going to be our critical value, 2.776.
So we have 2.776 as our critical value, and then times our sample standard deviation. Well, the sample standard deviation for our difference is right over here, it's 1.64, and then we're going to divide that by the square root of our sample size. So the square root of our sample size, where I already wrote a five in there. Sometimes I just write an n there. And so, what is this going to be equal to? First, let's just calculate the margin of error right over here, so this is going to be 2.776 times 1.64 divided by the square root of five, and we get a margin of error of approximately 2.036, so this is going to be 6.8 plus or minus 2.036. It's approximately equal to that, where this is our margin of error, and if we actually wanted to write out the interval, we could just take 6.8 minus this, and 6.8 plus that, so let's do that again with the calculator. So 6.8 minus 2.036 is equal to 4.764. So our confidence interval starts at 4.764, approximately, and it goes to, let's see. I can actually do this one in my head: if I add 2.036 to 6.8, that is going to be 8.836.
Now how would we interpret this confidence interval right over here? One way to interpret it is to say that we are 95% confident that this interval captures the true mean difference in snaps for these friends. We could also say that there appears to be a difference in the mean number of snaps, since zero is not captured in this interval, and since the entire interval is above zero (zero is not captured here and it's above zero), it seems that this group right over here, this group of friends at Khan Academy, can snap faster with their dominant hands.
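The calculation walked through in the transcript can be sketched in a few lines of Python. This is only an illustration, not part of the video: the five differences (9, 5, 8, 6, 6) are the values read off the dot plot, the critical value 2.776 is the t-table entry quoted for 4 degrees of freedom at 95% confidence, and the function name `paired_t_interval` is a hypothetical helper introduced here.

```python
import math
import statistics

def paired_t_interval(diffs, t_star):
    """Return (mean, margin_of_error, lower, upper) for a t interval
    on the mean of paired differences, given a critical value t_star."""
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    s_diff = statistics.stdev(diffs)          # sample SD (divides by n - 1)
    standard_error = s_diff / math.sqrt(n)    # SD of the sampling distribution of the mean
    margin = t_star * standard_error
    return mean_diff, margin, mean_diff - margin, mean_diff + margin

# Differences (dominant minus non-dominant) as described in the transcript.
differences = [9, 5, 8, 6, 6]

# Critical value from the t-table in the video: 95% confidence, df = n - 1 = 4.
T_STAR_95_DF4 = 2.776

mean_diff, margin, lower, upper = paired_t_interval(differences, T_STAR_95_DF4)
print(f"mean difference = {mean_diff:.1f}")
print(f"margin of error = {margin:.3f}")
print(f"95% CI          = ({lower:.3f}, {upper:.3f})")
```

The result differs from the video's 4.764 to 8.836 only in the third decimal place, because the video rounds the standard deviation to 1.64 before multiplying. If SciPy is available, `scipy.stats.t.ppf(0.975, df=4)` should give the same critical value without a table.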