AP®︎/College Statistics
© 2023 Khan Academy
Using least squares regression output
Worked example using least squares regression output.
Want to join the conversation?
- What are SE Coef, T, and P?(19 votes)
- This answer is 3 years late, but it may be helpful for those who are watching the video right now.
This website explains them under "Table of Coefficients": https://people.richland.edu/james/ictcm/2004/weight.html
These values are probably not that important, at least for the content of this video.
Edit: Another video further in this statistics playlist also explains that these values are not in the scope of this course; they are mainly used in inferential statistics.
(https://www.khanacademy.org/math/ap-statistics/bivariate-data-ap/assessing-fit-least-squares-regression/v/interpreting-computer-regression-data at 3:29)(6 votes)
- Why is it called "least squares regression output?"(8 votes)
- Regarding regression, the term doesn't have anything to do with what it really does...
Extracted from this nice article from http://blog.minitab.com/blog/statistics-and-quality-data-analysis/so-why-is-it-called-regression-anyway:
"... here’s the irony: The term regression, as Galton used it, didn't refer to the statistical procedure he used to determine the fit lines for the plotted data points. (...) For Galton, “regression” referred only to the tendency of extreme data values to "revert" to the overall mean value. (...)
Later, as he and other statisticians built on the methodology to quantify correlation relationships and to fit lines to data values, the term “regression” become associated with the statistical analysis that we now call regression. But it was just by chance that Galton's original results using a fit line happened to show a regression of heights. If his study had showed increasing deviance of childrens' heights from the average compared to their parents, perhaps we'd be calling it "progression" instead.
So, you see, there’s nothing particularly “regressive” about a regression analysis."(13 votes)
- At 1:35, why does slope = fertility coef? And y-intercept = constant coef?(4 votes)
- To predict life expectancy (ŷ) from fertility rate (x), you want to know how much life expectancy changes (delta y) for every 1-unit change in fertility rate (delta x = 1). Hence, the slope (delta y / delta x) is the coefficient on fertility. Multiplying a change in fertility rate by this coefficient tells you how much the predicted life expectancy changes.
For the y-intercept as a constant coefficient, I think it's because this is the point in the graph where you know for certain that the linear regression line will pass through, hence the name constant. This value could very well be the mean value of y, because every linear regression line passes through the point of means (x̄, ȳ). Hopefully, somebody passing by would confirm or correct me on this if my understanding is wrong.(2 votes)
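The point about the means can be checked with a short derivation. This is only a sketch, and it assumes the standard least-squares formula for the constant in the fitted line ŷ = a + bx, which is not stated anywhere in this thread:

```latex
\begin{aligned}
a &= \bar{y} - b\,\bar{x} \\
\hat{y}(\bar{x}) &= a + b\,\bar{x} = (\bar{y} - b\,\bar{x}) + b\,\bar{x} = \bar{y}
\end{aligned}
```

So the line does pass through (x̄, ȳ), but the constant a is the predicted value at x = 0. It equals ȳ only in the special cases b = 0 or x̄ = 0.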
- What software gives this output?
What do the other entries in the table represent?(3 votes)
- Life expectancy of whom?
The mothers, the children, or the population as a whole?(2 votes)
- the mothers, presumably, cuz childbirth mainly affects the life expectancy of the mother (for example, a difficult labor or such stuff will have a higher impact on the mother than on the child).(2 votes)
- Loving the course, but I don't think we've had "coefficient" defined/explained anywhere—recalling it dimly from school—just as I think we haven't had an explanation for suddenly swapping in "least-squares regression" for "linear regression". (I believe there was a mention that squares would be explained, but here I am with no idea.) I'm acclimating to the shift in variables—b now is a, and m now is b—must be an effect of the hat—but this vid is a leap for me from the preceding. I like the challenge, but ack.(2 votes)
- I think there's a video on the correlation coefficient in lesson 5 of unit 5, if you haven't already found it. Btw, I think a coefficient is just the number that multiplies a variable (the 2 in 2x)?(1 vote)
- Sorry a bit off the topic, but I am curious how you calculate the standard errors of the coefficients?(1 vote)
Video transcript
- [Instructor] Nkechi took a random sample of 10 countries to study fertility rate and life expectancy. She noticed a strong negative linear relationship between those variables in the sample data. Here is computer output from a least-squares regression analysis for using fertility rate to predict life expectancy. Use this model to predict the life expectancy of a country whose fertility rate is two babies per woman. And you can round your answer to the nearest whole number of years. So pause this video and see if you can do it, you might need to use a calculator.
All right, now let's do this together. So in general, this computer output is actually giving us a lot of data, more than we need actually, to do this prediction. But it's giving us the data we need to know the equation for a regression line.
for a regression line. So the general form of a regression line, a linear regression line would be, our estimate, and that little hat means
we're estimating our y value, would be equal to our y-intercept plus our slope, times our x value. Now in this situation, we're using fertility to
predict life expectancy. Or let me circle all of life expectancy. So the thing that we're trying to predict, that is y, life expectancy. And fertility, is the thing that we're using to predict that. So that is going to be
our x, right over there. Now what are a and b? Well, our computer output gives us that. It's these numbers right over here. Our constant coefficient
right over here, this is a. And our slope, is going
to be negative 5.97. You could view it as the
coefficient on fertility. Remember, this right
over here, is fertility. You could even write, rewrite this as our estimated life expectancy, estimated life expectancy. I could put a little hat on it to show this is estimated life expectancy, is going to be equal to 89.70 minus 5.97 times fertility, times fertility rate. I'll just call it, say fert. And period, right over there. Notice, this is the
coefficient on fertility, and then this is the constant coefficient. We could do that right over there. And now, we can use this to
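The equation just read off the output can be written as a small function. This is only a sketch: the two coefficients come from the transcript, while the names `CONSTANT`, `FERTILITY_COEF`, and `predict_life_expectancy` are hypothetical and chosen here for illustration.

```python
# Coefficients read from the regression output described in the transcript.
CONSTANT = 89.70        # a: the constant coefficient (y-intercept)
FERTILITY_COEF = -5.97  # b: the coefficient on fertility (slope)


def predict_life_expectancy(fert):
    """Return the estimated life expectancy in years.

    `fert` is the fertility rate in babies per woman.
    The function name is hypothetical, not from the source.
    """
    return CONSTANT + FERTILITY_COEF * fert


# The country in the question has a fertility rate of two babies per woman.
estimate = predict_life_expectancy(2)
print(round(estimate))  # rounded to the nearest whole number of years
```

Keeping the slope as a negative number, rather than subtracting a positive one, mirrors how the output lists it as the coefficient on fertility.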
And now, we can use this to estimate the life expectancy of a country whose fertility rate is two babies per woman. For fertility, you just put a two here. And then you get your estimated life expectancy. So what's that going to be? We can get out a calculator. So we can say, 5.97 times two is equal to 11.94, and then we wanna subtract that, so put in a negative there, and add that to 89.7, which is equal to 77.76, and we wanna round to the nearest whole number of years, so that's approximately 78 years. And we're done.
And just to be clear what even happened here, is that Nkechi, she did a regression: on the x-axis is fertility, and on the y-axis is life expectancy, let's call it "l.e." That's our y-axis. She took 10 data points, one, two, three, four, five, six, seven, eight, nine, 10, and tried to fit a regression line. She saw a negative linear relationship, and then we're using this regression line to estimate, hey, if fertility is, let's say this is two right over here, what is the estimated life expectancy? And we just saw that that would be roughly 78 years.
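The fitting step in that recap can be sketched in code. The transcript does not list Nkechi's 10 data points, so the fertility values below are hypothetical, and the life expectancies are generated from the reported line itself (so they sit exactly on it, with none of the scatter a real sample would have). The function name `fit_least_squares` is also an assumption, and the formulas are the standard least-squares ones rather than anything shown in the video.

```python
def fit_least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations in x, and sum of cross-deviations.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: line passes through (x_bar, y_bar)
    return a, b


# Hypothetical fertility rates (NOT Nkechi's sample), 10 points.
fertility = [1.2, 1.5, 1.8, 2.1, 2.5, 3.0, 3.6, 4.2, 5.0, 6.1]
# Synthetic life expectancies generated from the reported line.
life_exp = [89.70 - 5.97 * x for x in fertility]

a, b = fit_least_squares(fertility, life_exp)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Because the synthetic points lie exactly on the reported line, the fit should recover approximately the same constant and slope. With real, scattered data the same function would return the line that minimizes the sum of squared vertical distances.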