# Confidence interval for the slope of a regression line

AP.STATS: UNC‑4 (EU), UNC‑4.AF (LO), UNC‑4.AF.1 (LO), UNC‑4.AF.2 (LO)

## Video transcript

Musa is interested in the relationship between hours spent studying and caffeine consumption among students at his school. He randomly selects 20 students at his school and records their caffeine intake in milligrams and the amount of time studying in a given week. Here is a computer output from a least-squares regression analysis on his sample. Assume that all conditions for inference have been met. What is a 95% confidence interval for the slope of the least-squares regression line?

So if you feel inspired, pause the video and see if you can have a go at it. Otherwise, we'll do this together.

Okay, so let's first remind ourselves what's even going on. Let's visualize the regression. Our horizontal axis, or our x-axis, would be our caffeine intake in milligrams, and our y-axis, or our vertical axis, would be the time studying; I'll assume it's in hours. Musa randomly selects 20 students, and for each of those students he sees how much caffeine they consumed and how much time they spent studying, and plots them here. So there'll be 20 data points: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20. He inputs these data points into a computer in order to fit a least-squares regression line, and let's say the least-squares regression line looks something like this. A least-squares regression line comes from trying to minimize the sum of the squared vertical distances between the line and all of these points.

And then this output is giving us information on that least-squares regression line. The most valuable things here, if we really want to visualize or understand the line, are what we get in this column. The constant coefficient tells us essentially what the y-intercept is here, so 2.544. And the coefficient on caffeine is one way of thinking about, well, for every incremental increase in caffeine, how much does the time studying increase? You might recognize this as the slope of the least-squares regression line. So this is the slope, and it would be equal to 0.164.

Now, this information right over here tells us how well our least-squares regression line fits the data. R-squared, you might already be familiar with; it says how much of the variance in the y variable is explainable by the x variable. If it was 1, or 100%, that means all of it could be explained, and it's a very good fit. If it was 0, that means none of it can be explained, and it would be a very bad fit. Capital S is the standard deviation of the residuals, and it's another measure of how much these data points vary from the regression line.

Now, this column right over here is going to prove to be useful for answering the question at hand. This gives us the standard error of the coefficient. The coefficient that we really care about, the statistic that we really care about, is the slope of the regression line, and this gives us the standard error for the slope of the regression line. You could view this as the estimate of the standard deviation of the sampling distribution of the slope of the regression line. Remember, we took a sample of 20 folks here, and we calculated a statistic, which is the slope of the regression line. Every time you take a different sample, you will likely get a different slope, and this slope is an estimate of some true parameter in the population. This would sometimes also be called the standard error of the slope of the least-squares regression line.

Now, these last two columns you don't have to worry about in the context of this video. They are useful if you were saying, well, assuming that there is no relationship between caffeine intake and time studying, what is the associated t-statistic for the statistic that I actually calculated, and what would be the probability of getting something that extreme or more extreme, assuming that there is no association; assuming, for example, that the actual slope of the regression line is 0? And that probability, if we were to assume that, is actually quite low. It's about a 1% chance that you would have gotten results this extreme if there truly was no relationship between caffeine intake and time studying.

But with all of that out of the way, let's actually answer the question. To construct a confidence interval around a statistic, you would take the value of the statistic that you calculated from your sample, so 0.164, and then it would be plus or minus a critical t-value, which is driven by the fact that you care about a 95% confidence interval and by the degrees of freedom, and I'll talk about that in a second. Then you would multiply that by the standard error of the statistic. In this case, the statistic that we care about is the slope, and so the standard error is 0.057; so times 0.057.

The reason why we're using a critical t-value instead of a critical z-value is that our standard error of the statistic is an estimate. We don't actually know the standard deviation of the sampling distribution.

So the last thing we have to do is figure out what this critical t-value is. You can figure it out using either a calculator or a table; I'll do it using a table. To do that, we need to know the degrees of freedom. When you're doing this with a regression slope, like we're doing right now, your degrees of freedom are going to be the number of data points you have minus two. So our degrees of freedom are going to be 20 minus 2, which is equal to 18. I'm not going to go into a bunch of depth right now as to why you subtract two here; that is beyond the scope of this video for sure. But just so that we can look it up on a table, this is our degrees of freedom.

So we care about a 95% confidence level. That's equivalent to having a two-and-a-half percent tail on either side, and our degrees of freedom are 18, so our critical t-value is 2.101. And so our 95% confidence interval is going to be 0.164 plus or minus our critical t-value, 2.101, times the standard error of the statistic, which I'll just put in parentheses: 0.057. You could type this into a calculator if you wanted to figure out the exact values here.

But the way to interpret a 95% confidence interval is that 95% of the time that you calculate a 95% confidence interval, it is going to overlap with the true value of the parameter that we are estimating.
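
For readers who want the numbers rather than a calculator, here is a minimal Python sketch of the arithmetic described in the transcript: slope plus or minus the critical t-value times the standard error. The slope (0.164), standard error (0.057), sample size (20), and critical t-value (2.101) are the values quoted in the transcript. The function name `slope_confidence_interval` is a hypothetical helper introduced only for illustration, and the critical value is hard-coded from the t-table rather than computed.

```python
def slope_confidence_interval(slope, se_slope, t_star):
    """Return (lower, upper, margin) for slope +/- t_star * se_slope.

    Hypothetical helper for illustration; it only performs the
    arithmetic described in the transcript.
    """
    margin = t_star * se_slope
    return slope - margin, slope + margin, margin


# Values read from the regression output discussed in the transcript.
n = 20            # students sampled
slope = 0.164     # coefficient on caffeine (the sample slope)
se_slope = 0.057  # standard error of that coefficient

# Degrees of freedom for inference on a regression slope: n - 2.
df = n - 2

# Critical t-value for 95% confidence with df = 18, taken from a
# t-table as in the transcript (an input here, not computed).
t_star = 2.101

lower, upper, margin = slope_confidence_interval(slope, se_slope, t_star)

print(f"degrees of freedom = {df}")
print(f"margin of error    = {margin:.3f}")
print(f"95% CI for slope   = ({lower:.3f}, {upper:.3f})")
```

With these inputs the margin of error is about 0.120, so the interval runs from roughly 0.044 to 0.284 hours of studying per milligram of caffeine. The interval does not contain 0, which is consistent with the small p-value mentioned in the transcript.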