# Interpreting computer output for regression

Desiree is interested to see if students who consume more caffeine tend to study more as well. She randomly selects $20$ students at her school and records their caffeine intake (mg) and the number of hours spent studying. A scatterplot of the data showed a linear relationship.
This is computer output from a least-squares regression analysis on the data:

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 2.544 | 0.134 | 18.955 | 0.000 |
| Caffeine (mg) | 0.164 | 0.057 | 2.862 | 0.005 |

$S = 1.532 \qquad \text{R-Sq} = 60.032\% \qquad \text{R-Sq(adj)} = 58.621\%$

Question 1
What is the equation of the least-squares regression line?

Question 2
Which statement about the slope is true?

Question 3
Which statement about the $y$-intercept is true?

Question 4
How large is a typical prediction error when using this model to predict study time from caffeine intake?

Question 5
About what percentage of the variation in study time can be explained by the regression on caffeine intake?

Question 6
Based on these data, can we conclude that consuming more caffeine will cause someone to study more?

## Want to join the conversation?

• In the earlier video, "R-squared or coefficient of determination", you mentioned the SEline, as in, the sum of errors between the line and the points. Would S (the standard deviation of the residuals) be SEline/n?
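• SEline is the sum of the squared errors (residuals) of the regression line in predicting y, so it measures the aggregate error. S, the standard deviation of the residuals, measures the typical prediction error in y. S is not SEline/n, though: regression output computes S = √(SEline/(n - 2)), dividing by n - 2 and then taking the square root so that S is back in the units of y. One is a total; the other is a typical size.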
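A minimal Python sketch of the distinction, assuming a tiny made-up dataset (the four (x, y) pairs are hypothetical, not Desiree's data). It fits the least-squares line, then computes SEline as the sum of squared residuals and S as √(SEline/(n - 2)), the residual standard deviation that output like the table above labels S. The function names are illustrative.

```python
import math

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residual_summary(xs, ys):
    """Return (se_line, s): sum of squared residuals and residual std. dev."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    se_line = sum(r ** 2 for r in residuals)   # total (aggregate) squared error
    s = math.sqrt(se_line / (len(xs) - 2))     # typical error, in the units of y
    return se_line, s

# Hypothetical data for illustration only.
xs = [0, 1, 2, 3]
ys = [1, 3, 2, 4]
se_line, s = residual_summary(xs, ys)
print(f"SEline = {se_line:.2f}, S = {s:.3f}")  # SEline = 1.80, S = 0.949
```

Here SEline/n would be 0.45, while S is about 0.949. Dividing by n - 2 accounts for the two estimated coefficients (intercept and slope), and the square root returns S to the units of y.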
• I was under the impression that if the P-value is below 0.05, there is a relationship between the independent variable and the dependent variable. If there is also a positive relationship, at what point can we confidently determine that the model is a good fit and that the increase is caused by the independent variable? Is there a percent threshold for R-sq/adjusted R-sq?
• The significance level of 0.05 is commonly used to determine whether a relationship between the independent and dependent variables is statistically significant. However, statistical significance alone does not indicate the strength or practical importance of the relationship. The coefficient of determination (R^2) or adjusted R^2 measures the proportion of variance explained by the regression model. There isn't a specific threshold for R^2 or adjusted R^2 that defines a "good" fit; what counts as good varies with the context and field of study. Generally, higher values of R^2 indicate a better fit, but you should also check other things, such as a residual plot, to assess the model's adequacy. Finally, neither a small P-value nor a high R^2 shows that the independent variable causes the change in the dependent variable; that conclusion requires a well-designed experiment with random assignment.
• Why doesn't "bx" come first in ŷ=a+bx, whereas "mx" comes first in y=mx+b?
• I don't think the order matters as long as you have the correct value for the constant and slope.
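Because addition is commutative, ŷ = a + bx and ŷ = bx + a describe the same line. The real trap is the letter b, which is the slope in the statistics form but the intercept in the algebra form. Matching the two forms term by term, using the coefficients from this problem's output:

$$
\begin{aligned}
\hat{y} &= a + bx = 2.544 + 0.164x \\
y &= mx + b = 0.164x + 2.544
\end{aligned}
$$

So the statistics slope (b = 0.164) plays the role of the algebra slope m, and the statistics intercept (a = 2.544) plays the role of the algebra intercept b.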
• Why can't we conclude that more caffeine intake leads to more studying when there is a strong positive linear relationship?
• While there may be a strong positive linear relationship between caffeine intake and study time, it does not necessarily imply causation. Even if two variables are strongly correlated, that doesn't mean changes in one variable cause changes in the other; other variables could be influencing both. Desiree only recorded how much caffeine students already consume and how long they study, so her data come from an observational study, not an experiment in which caffeine amounts were randomly assigned. Establishing causation requires additional evidence from experimental studies or rigorous causal inference methods, so it's not accurate to conclude from this correlation alone that consuming more caffeine leads to studying more.
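A small simulation can make the "other variables" point concrete. This is a hypothetical model, not Desiree's data: a made-up `workload` variable (in standardized units) drives both `caffeine` and `study`, and caffeine is never used to generate study time. The two still come out strongly correlated.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
n = 10_000
# Hypothetical confounder: how heavy each simulated student's workload is.
workload = [random.gauss(0, 1) for _ in range(n)]
# Both variables depend on workload plus independent noise.
# Note that study time is generated WITHOUT using caffeine at all.
caffeine = [w + random.gauss(0, 0.5) for w in workload]
study = [w + random.gauss(0, 0.5) for w in workload]

r = correlation(caffeine, study)
print(f"r = {r:.2f}")  # near the theoretical value 1 / 1.25 = 0.8
```

Under these assumptions the theoretical correlation is 0.8, so any seed should give r close to that. Because `caffeine` never enters the formula for `study`, the whole association comes from the shared `workload` variable; only randomly assigning caffeine amounts would break that link.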
• Is the R-Sq always going to be the typical prediction error?
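• No. The standard deviation of the residuals, S, is the typical prediction error, measured in the units of y. R-Sq is the percentage reduction in the sum of squared prediction errors when you use the regression line instead of the horizontal line at the mean of y (the total variation in y). In this problem, S = 1.532 hours is the typical prediction error; R-Sq = 60.032% is not an error size.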
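A minimal sketch contrasting the two quantities, again assuming a tiny hypothetical dataset rather than the study's data; the name `fit_summary` is illustrative. It computes both from the same residuals: S is in the units of y, while R-Sq is a unitless proportion, 1 - SEline/SSy, where SSy is the total squared variation of y around its mean.

```python
import math

def fit_summary(xs, ys):
    """Return (s, r_sq) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    se_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # squared errors using the line
    ss_y = sum((y - y_bar) ** 2 for y in ys)                        # squared errors using the mean of y
    s = math.sqrt(se_line / (n - 2))  # typical prediction error, units of y
    r_sq = 1 - se_line / ss_y         # proportional reduction in squared error
    return s, r_sq

xs = [0, 1, 2, 3]  # hypothetical data for illustration only
ys = [1, 3, 2, 4]
s, r_sq = fit_summary(xs, ys)
print(f"S = {s:.3f}, R-Sq = {r_sq:.1%}")  # S = 0.949, R-Sq = 64.0%
```

If the y values were rescaled (say, hours to minutes), S would scale with them, but R-Sq would stay the same, which is one way to see that R-Sq cannot be a prediction error.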
• Can anybody please explain why the constant coefficient 2.544 is the y-intercept and the caffeine coefficient 0.164 is the slope in Question 1? I can't seem to get my head around this. Please help!
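• Each row of the output is labeled by what its coefficient multiplies in the equation ŷ = a + bx. The Constant row's coefficient multiplies 1, so it is the predicted value when caffeine intake is 0 mg: that's the y-intercept, a = 2.544. The Caffeine (mg) row's coefficient multiplies the caffeine intake x, so it is the slope, b = 0.164. In output like this the Constant row is usually listed first, but go by the label rather than the position. Generally, I've found that the slope, y-intercept, S, and R^2 are the most useful pieces of information in these outputs.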
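A minimal Python sketch of that reading, using the two Coef values from this problem's output. The dictionary layout and the function name `predict_study_hours` are hypothetical, chosen only for illustration.

```python
# Coef column of the regression output, keyed by the Predictor label.
coef = {"Constant": 2.544, "Caffeine (mg)": 0.164}

def predict_study_hours(caffeine_mg):
    """Predicted study time (hours) from y-hat = a + b*x."""
    a = coef["Constant"]       # multiplies 1: the y-intercept
    b = coef["Caffeine (mg)"]  # multiplies x: the slope
    return a + b * caffeine_mg

print(predict_study_hours(0))  # 2.544, the intercept itself
print(predict_study_hours(10))
```

Comparing predictions one milligram apart, for example `predict_study_hours(11) - predict_study_hours(10)`, gives about 0.164 hours, which is exactly what the slope means: each extra mg of caffeine is associated with about 0.164 more predicted hours of study.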
• What does regression mean?