# Confidence interval for the slope of a regression line

AP.STATS: UNC‑4 (EU), UNC‑4.AF (LO), UNC‑4.AF.1 (LO), UNC‑4.AF.2 (LO)

Confidence interval for the slope of a regression line.

## Want to join the conversation?

- How is the SE Coef for caffeine found? We just input data from one sample of size 20 into a computer, and the computer figures out a least-squares regression line. That is, we get one particular equation with specific values for the slope and y-intercept. But how can a computer estimate the standard error of the slope if it only gets data from one sample? Shouldn't we have at least a few samples, measure the variance of the slope coefficient across those samples, and only then estimate the true variance of the sampling distribution of the slope coefficient?
- The formulas for the SE Coef for caffeine don't seem to need multiple samples with multiple least-squares regression slopes.

The formulas can be found here:

https://www.youtube.com/watch?v=THzckPB7E8Q&feature=youtu.be
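To make that answer concrete, here is a minimal Python sketch of how software can get a standard error from a single sample, assuming the usual simple-regression formulas: SE of the slope = S / √Σ(x − x̄)², where S = √(Σ residual² / (n − 2)). The population model and its parameters (intercept 2.5, slope 0.16, noise SD 1) and the x values are hypothetical; they are not Musa's data. The second half draws 5,000 samples from that same hypothetical population to show that the one-sample SE estimates the spread of slopes you would see across many samples, which is what the question proposes measuring directly.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit from one sample: returns (slope b, intercept a, S, SE of b)."""
    n = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # S: residual standard deviation, n - 2 df
    se_b = s / math.sqrt(sxx)     # SE of the slope, computed from this sample alone
    return b, a, s, se_b


# Hypothetical population (NOT Musa's data): y = 2.5 + 0.16 x + normal noise with SD 1.
TRUE_A, TRUE_B, SIGMA = 2.5, 0.16, 1.0
rng = random.Random(42)
xs = [rng.uniform(0, 40) for _ in range(20)]  # hypothetical x values, held fixed


def draw_ys():
    """One new sample of y values at the same x values."""
    return [TRUE_A + TRUE_B * x + rng.gauss(0, SIGMA) for x in xs]


# What the software reports: one sample, one SE estimate.
b, a, s, se_b = fit_line(xs, draw_ys())

# What the question proposes: many samples, then the spread of their slopes.
slopes = [fit_line(xs, draw_ys())[0] for _ in range(5000)]
sd_slopes = statistics.stdev(slopes)

# Exact SD of the slope's sampling distribution for this hypothetical population.
x_bar = statistics.fmean(xs)
true_sd = SIGMA / math.sqrt(sum((x - x_bar) ** 2 for x in xs))

print(f"slope {b:.3f}, one-sample SE {se_b:.4f}")
print(f"SD of 5000 slopes {sd_slopes:.4f}, exact SD {true_sd:.4f}")
```

The one-sample SE will not match the many-sample spread exactly, because S is itself estimated from only 20 points; that extra uncertainty is one reason the interval uses a t distribution rather than z.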

- Why are the degrees of freedom the sample size minus 2?
- "Degrees of freedom for regression coefficients are calculated using the ANOVA table, where the degrees of freedom are n − (k + 1), where k is the number of independent variables. So for a simple regression analysis there is one independent variable, k = 1, and the degrees of freedom are n − (1 + 1) = n − 2."

Credit: Monito from Analyst Forum.
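The rule in the quote can be written out for this lesson's example as below; this is a sketch of the standard simple-regression bookkeeping, in which the two estimated parameters are the intercept and the slope. The same n − 2 appears as the divisor in S, the standard deviation of the residuals. One intuition: with only two points, the fitted line passes through both exactly, so every residual is zero and nothing is left over to estimate the scatter.

```latex
\begin{aligned}
\text{df} &= n - (k + 1) = 20 - (1 + 1) = 18 \\
S &= \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
\end{aligned}
```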

- How do you find t* with a calculator?
- Why don't we divide the SE by the square root of n (the sample size) for the slope, like we do when calculating a confidence interval for a mean (mean ± t* × SD/√n)?
- What's the relationship between SE and S?
- Again, I think that caffeine should have been the dependent variable and hence on the y-axis.
- In this case, the problem is measuring the effect of caffeine consumption on the time spent studying. In this study, the variable that does not depend on the other factors is the amount of caffeine consumed (hence it is the independent variable). On the other hand, the time spent studying is treated as an effect of the amount of caffeine consumed (hence it is dependent on the amount of caffeine consumed).
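On the calculator question: TI-84-family calculators typically provide this as `invT(area, df)`, for example `invT(0.975, 18)`, and a t table works too, as in the video. Either way, what you are doing is inverting the cumulative distribution function of the t distribution. The sketch below does that with only the Python standard library, assuming Simpson's rule to integrate the t density and bisection to invert it; it is an illustration of the idea, and in practice a statistics library routine such as SciPy's `scipy.stats.t.ppf` is the usual choice.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + half_area if x >= 0 else 0.5 - half_area


def t_critical(confidence: float, df: int) -> float:
    """Critical value t* leaving (1 - confidence) / 2 in each tail, found by bisection."""
    target = 1 - (1 - confidence) / 2  # e.g. 0.975 for a 95% interval
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.95, 18), 3))  # prints 2.101, matching the table value in the video
```

For the √n and S questions above: in simple regression, SE of the slope = S / √Σ(x − x̄)² = S / (s_x √(n − 1)), where s_x is the standard deviation of the x values. So the sample size already enters through the sum over the n data points, and S is the part that measures the scatter of the points about the line.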

## Video transcript

- [Instructor] Musa is interested in the relationship between hours spent studying and caffeine consumption among students at his school. He randomly selects 20 students at his school and records their caffeine intake in milligrams and the amount of time they spend studying in a given week. Here is a computer output from a least-squares regression analysis on his sample. Assume that all conditions for inference have been met. What is the 95% confidence interval for the slope of the least-squares regression line? So if you feel inspired, pause the video and see if you can have a go at it. Otherwise, we'll do this together.

Okay, so let's first remind ourselves what's going on by visualizing the regression. Our horizontal axis, or x-axis, is caffeine intake in milligrams. Our vertical axis, or y-axis, is time spent studying, which I would assume is in hours. Musa randomly selects 20 students, and for each of them he records how much caffeine they consumed and how much time they spent studying, and plots that point here. So there'll be 20 data points. He inputs these data points into a computer in order to fit a least-squares regression line, and let's say that line looks something like this. A least-squares regression line comes from minimizing the sum of the squared vertical distances between the line and all of these points.

The output gives us information about that least-squares regression line. The most valuable things here, if we really want to visualize or understand the line, are in this column. The constant coefficient tells us the y-intercept, 2.544. The coefficient on caffeine tells us how much the predicted time studying increases for each additional milligram of caffeine; you might recognize this as the slope of the least-squares regression line, which is 0.164.

This information right over here tells us how well our least-squares regression line fits the data. R-squared, which you might already be familiar with, says what fraction of the variation in the y variable is explained by the x variable. If it were one, or 100%, all of the variation would be explained, and it would be a very good fit. If it were zero, none of it would be explained, and it would be a very bad fit. Capital S is the standard deviation of the residuals, another measure of how much these data points vary from the regression line.

Now this column right over here is going to prove useful for answering the question at hand. It gives the standard error of each coefficient, and the statistic we really care about is the slope of the regression line, so this is the standard error of the slope. You could view it as an estimate of the standard deviation of the sampling distribution of the slope. Remember, we took a sample of 20 folks and calculated a statistic, the slope of the regression line. Every time you take a different sample, you will likely get a different slope, and each slope is an estimate of the true slope in the population. This is sometimes also called the standard error of the slope of the least-squares regression line.

You don't have to worry about the last two columns in the context of this video. They're useful if you assume there is no relationship between caffeine intake and time studying, that is, that the true slope of the regression line is zero: they give the t statistic for the slope we actually calculated, and the probability of getting a result that extreme or more extreme under that assumption. Here that probability is quite low. There's only about a 1% chance of getting a slope at least this far from zero if there truly were no relationship between caffeine intake and time studying.

With all of that out of the way, let's actually answer the question. To construct a confidence interval around a statistic, you take the value of the statistic calculated from your sample, here 0.164, plus or minus a critical t value times the standard error of the statistic. The critical t value is driven by the fact that we want a 95% confidence interval and by the degrees of freedom, which I'll talk about in a second. The statistic we care about is the slope, and its standard error is 0.057:

0.164 ± t* × 0.057

We use a critical t value instead of a critical z value because the standard error of the statistic is an estimate; we don't actually know the standard deviation of the sampling distribution.

So the last thing we have to do is figure out the critical t value. You can find it using either a calculator or a table; I'll use a table. To do that, we need to know the degrees of freedom. When you're working with a regression slope, like we are right now, the degrees of freedom are the number of data points minus two, so here they are 20 − 2 = 18. Why you subtract two is beyond the scope of this video, but now we can look it up in a table. A 95% confidence level is equivalent to having 2.5% in each tail, and with 18 degrees of freedom, the critical t value is 2.101.

So our 95% confidence interval is 0.164 plus or minus our critical t value of 2.101 times the standard error of the statistic, 0.057:

0.164 ± 2.101 × (0.057)

You could type this into a calculator to get the exact endpoints; it works out to about 0.164 ± 0.120, or roughly 0.044 to 0.284. The way to interpret a 95% confidence interval is that if we repeated this sampling process many times and calculated a 95% confidence interval each time, about 95% of those intervals would contain the true value of the parameter we are estimating, here the true slope of the population regression line.
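The arithmetic from the end of the video, plus a check of the interpretation, can be sketched in Python as below. The slope 0.164, its standard error 0.057 and the critical value 2.101 come from the output and the t table used above. The simulated population (intercept 2.5, slope 0.16, noise SD 1, x drawn uniformly between 0 and 40, 20 points per sample) is hypothetical and only illustrates the claim that about 95% of intervals built this way capture the true slope.

```python
import math
import random
import statistics

# Values from the regression output and the t table in the video
b = 0.164       # sample slope (coefficient on caffeine)
se_b = 0.057    # standard error of the slope
t_star = 2.101  # critical t for 95% confidence with df = 20 - 2 = 18

margin = t_star * se_b
ci = (b - margin, b + margin)
print(f"95% CI for the slope: {ci[0]:.3f} to {ci[1]:.3f}")  # prints 0.044 to 0.284

N = 20  # sample size; t_star above is only valid for df = N - 2 = 18


def slope_and_se(xs, ys):
    """Least-squares slope and its standard error from one sample."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, math.sqrt(sse / (len(xs) - 2)) / math.sqrt(sxx)


def coverage(true_slope=0.16, trials=10_000, seed=0):
    """Fraction of 95% intervals that capture true_slope (hypothetical population)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.uniform(0, 40) for _ in range(N)]
        ys = [2.5 + true_slope * x + rng.gauss(0, 1.0) for x in xs]
        slope, se = slope_and_se(xs, ys)
        if slope - t_star * se <= true_slope <= slope + t_star * se:
            hits += 1
    return hits / trials


cov = coverage()
print(f"share of simulated intervals containing the true slope: {cov:.3f}")
```

The simulated share should land close to 0.95. Any single interval, such as Musa's, either contains the true slope or it doesn't; the 95% describes the long-run behavior of the method.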