### Course: Statistics and probability>Unit 5

Lesson 5: Assessing the fit in least-squares regression

# Interpreting computer regression data

Interpreting computer-generated regression data to find the equation of a least-squares regression line. Predictors and coefficients. S and R-squared.

## Want to join the conversation?

• In this case, how do we interpret the values of T and P?

• T (shown as "t value" in R's summary() output) is the coefficient divided by its standard error, i.e. how many standard errors the estimate lies from zero. A large absolute t value is strong evidence that the true coefficient is not zero, so you should generally keep in your regression model those variables with a large absolute value in this field.
P (shown as "Pr(>|t|)" in R's summary() output) is the p-value: the probability of getting a t value at least this far from zero if the true coefficient for that variable were actually zero. It is not the probability that the coefficient is zero. You will generally prefer to keep in your model those variables with a low value in this field.
Hope you find it useful!
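To make T and P concrete, here is a minimal Python sketch (standard library only) that turns a coefficient and its standard error into a t value and a two-sided p-value. It assumes the usual setup for a least-squares line with one predictor, where the t statistic has n − 2 degrees of freedom. The standard error of 0.05 and the sample size of 20 in the usage lines are hypothetical, not taken from the video's table, and the p-value is found by numerically integrating Student's t density with Simpson's rule, a stand-in for the exact routines statistical software uses.

```python
import math


def t_statistic(coef: float, se: float) -> float:
    """T = estimated coefficient / its standard error."""
    return coef / se


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom.

    lgamma is used instead of gamma so large df values do not overflow.
    """
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    norm = math.exp(log_norm) / math.sqrt(df * math.pi)
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P = P(|T| >= |t|), assuming the true coefficient is zero.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be
    even) and uses symmetry: P = 1 - 2 * area(0, |t|).
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


if __name__ == "__main__":
    # Hypothetical values: slope 0.164 with a made-up standard error of 0.05,
    # fitted to a made-up sample of n = 20 points, so df = n - 2 = 18.
    t = t_statistic(0.164, 0.05)
    print(f"T = {t:.2f}, P = {two_sided_p(t, 18):.4f}")
```

The larger |T| is, the smaller P becomes, which is why the two columns usually point to the same decision.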
• So S is exactly the same as the root-mean-square deviation (RMSD)?
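• Sal says S is the "typical error", that is, the standard deviation of the residuals, so the two are very close. One difference: in regression output S is usually computed as √(SSE/(n − 2)), dividing the sum of squared residuals by n − 2, whereas RMSD usually divides by n. The two agree closely only when n is large.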
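The difference between S and RMSD is only the divisor, which a short Python sketch makes visible. It assumes a simple line with two fitted coefficients (constant and slope), hence n − 2; the residual values are hypothetical.

```python
import math


def residual_std_error(residuals: list[float]) -> float:
    """S as printed in regression output: sqrt(SSE / (n - 2)) for a line
    with two fitted coefficients (constant and slope)."""
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (len(residuals) - 2))


def rmsd(residuals: list[float]) -> float:
    """Root-mean-square deviation: sqrt(SSE / n)."""
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / len(residuals))


if __name__ == "__main__":
    # Hypothetical residuals from some fitted line (not the video's data).
    res = [0.5, -1.0, 0.25, 0.75, -0.5]
    print(residual_std_error(res), rmsd(res))  # S is slightly larger
```

With few data points, S is noticeably larger than RMSD; as n grows, n − 2 and n become nearly the same.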
• Why is b equal to the constant coefficient and m equal to the caffeine coefficient? Can anyone elaborate?
• It's literally just how the computer labels the things it calculates.

We asked the computer to perform a least-squares regression analysis on some data with
x = caffeine consumed and y = hours studying

So imagine the data on a scatterplot, with caffeine consumed as the x-axis, and hours studying as the y-axis.

Now the computer calculates things and finds us a least-squares regression line. But, instead of just giving us the line in the form y = mx + b, it decides to put things into a weird table format.

First you have a column called "predictors", with "constant" and "caffeine" underneath. This is because with a least-squares line you can "predict" the value of y (hours studying) with

y = mx + b

where b is the "constant" that takes part in your prediction, and x is the amount of caffeine consumed. So b and x are the "predictors" of y.

Now the coefficient of x is m, and since x is the "caffeine" portion of your predictors, the coefficient of your caffeine predictor is m.

For b, you can imagine the equation as

y = mx + bx^0

And now, if you consider bx^0 as the "constant" predictor, the coefficient on it is b.

Ultimately, "constant coefficient" and "caffeine coefficient" are just the names the computer gives to b and m, respectively.
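The "constant predictor" idea can be sketched in a few lines of Python: each prediction is the sum of every coefficient times its predictor, where the constant's predictor is x⁰ = 1. The coefficient values 2.544 (constant) and 0.164 (caffeine) match the equation ŷ = 2.544 + 0.164x quoted later in this thread; the dictionary keys and the caffeine amounts in the usage loop are illustrative only.

```python
# Coefficients from the regression table, keyed by predictor name.
COEFFICIENTS = {"constant": 2.544, "caffeine": 0.164}


def predict_hours(caffeine: float) -> float:
    """Predicted hours studying: y-hat = b * x**0 + m * x."""
    predictors = {"constant": caffeine ** 0, "caffeine": caffeine}  # x**0 == 1
    return sum(COEFFICIENTS[name] * value for name, value in predictors.items())


if __name__ == "__main__":
    for x in (0, 2, 5):
        print(x, round(predict_hours(x), 3))
```

At x = 0 only the constant term survives, which is why the constant coefficient is the y-intercept b.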
• Is the correlation coefficient r always equal to the square root of the coefficient of determination R²?
I tried to derive r from the formula for R² but was not successful. Please explain.
I don't think r is the square root of R². Then why does the video say so?
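• For a least-squares line with a single predictor, the square of r (the correlation coefficient) equals R² (the coefficient of determination). So r = ±√R², where the sign of r is the same as the sign of the slope; taking only the positive square root would be wrong for a negative relationship.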
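A quick numerical check of r² = R², as a minimal Python sketch on a small hypothetical data set: it fits the least-squares line, computes r directly, and computes R² as 1 − SSE/SST. Flipping the sign of y shows that r changes sign while R² does not.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Coefficient of determination: R^2 = 1 - SSE / SST."""
    m, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]  # hypothetical data
    ys = [2, 4, 5, 4, 5]
    r = correlation(xs, ys)
    print(r, r ** 2, r_squared(xs, ys))  # r**2 and R^2 agree
    down = [-y for y in ys]  # same data, negative relationship
    print(correlation(xs, down), r_squared(xs, down))  # r < 0, R^2 unchanged
```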
• What is the least amount of information you need to know to find the equation of a line?
• To find the equation of the line, you need the slope and the y-intercept. Two points on the line with different x-values are enough: they give you the slope, and plugging either point into y = mx + b then gives you the intercept.

Hope this helps!😄
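The two-point method can be sketched in Python as follows; the function name is illustrative, and the points must have different x-values because a vertical line has no slope.

```python
def line_through(p1, p2):
    """Slope m and y-intercept b of the line through two points with different x-values."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope-intercept form")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1  # solve y1 = m*x1 + b for b
    return m, b


if __name__ == "__main__":
    print(line_through((0, 1), (2, 5)))
```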
• In this case, does the constant b, representing the y-intercept, show that the minimum number of hours spent studying was 2.544? Our slope is positive, and x = 0 (no caffeine) gives the value b.
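• I could be wrong, but I believe 2.544 is the predicted number of hours spent studying for someone who consumed no caffeine (x = 0), not the observed minimum. Because the slope is positive and caffeine intake can't be negative, it is the smallest value the line predicts, but this is a best-fit line, so individual data points can fall below it.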
• I'm confused because in the practice problems after the video, they use the same question, but it says the correct answer is ŷ = 2.544 + 0.164x. Is that because the terms are in a different order?
• Yes, it's just in a different order. The answer isn't different; it's just written differently, since 0.164x + 2.544 = 2.544 + 0.164x.
• If x has to be dates like 1955 and then 1961, how would we set up the x-values to put them into the calculator?
• Years are already numbers, so you can enter them directly as the predictor variable x, but then the intercept describes year 0, which is far outside the data and hard to interpret. A common approach is to recode the years as small numbers, for example 1955 as 1, 1956 as 2, and so on, or more generally as the number of years since a chosen reference year. Shifting x this way changes only the intercept, not the slope. Once the years are recoded, you can input them into the calculator just like any other numerical data.
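Here is a minimal Python sketch of why recoding years is safe: fitting the same hypothetical measurements against raw years and against "years since 1955" gives the same slope, and only the intercept moves. The data values are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


if __name__ == "__main__":
    # Hypothetical measurements taken every 6 years.
    years = [1955, 1961, 1967, 1973]
    values = [10, 12, 15, 17]
    m_raw, b_raw = fit_line(years, values)
    shifted = [year - 1955 for year in years]  # years since 1955
    m_shift, b_shift = fit_line(shifted, values)
    print(m_raw, b_raw)  # b_raw is the prediction for year 0
    print(m_shift, b_shift)  # b_shift is the prediction for 1955
```

The intercepts are related by b_raw = b_shift − m · 1955, so either fit can be converted to the other.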
• None of the videos explain how to find the slope and intercept of a least-squares regression line without the help of a computer.
Please explain how to do it manually.
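By hand, the least-squares slope and intercept come from two standard formulas: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², equivalently m = r · s_y / s_x, and b = ȳ − m · x̄, because the least-squares line always passes through the point of means (x̄, ȳ). The Python sketch below follows those steps one at a time on a small hypothetical data set, so each intermediate number can be checked with a calculator.

```python
from statistics import mean, stdev


def least_squares_by_hand(xs, ys):
    """Return (m, b) following the textbook hand-calculation steps."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Step 1: deviations from the means.
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    # Step 2: sum of cross-products and sum of squared x-deviations.
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    # Step 3: slope, then intercept from the point of means (x_bar, y_bar).
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b


def slope_from_r(xs, ys):
    """Same slope via m = r * s_y / s_x, using sample standard deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    return r * sy / sx


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]  # hypothetical data
    ys = [2, 4, 5, 4, 5]
    print(least_squares_by_hand(xs, ys), slope_from_r(xs, ys))
```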
• Positive slope means positive r, and negative slope means negative r. Should I just memorise that?
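It follows from writing the least-squares slope in terms of r, so no memorising is needed. A short derivation, using the sample standard deviations s_x and s_y (both positive whenever the x-values and y-values are not all identical):

```latex
\begin{aligned}
m &= \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2},
\qquad
r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x s_y},
\qquad
s_x^2 = \frac{\sum (x_i-\bar{x})^2}{n-1} \\
\Rightarrow\quad m &= \frac{(n-1)\,r\,s_x s_y}{(n-1)\,s_x^2} = r\,\frac{s_y}{s_x}.
\end{aligned}
```

Since s_y / s_x > 0, m and r always have the same sign, and m = 0 exactly when r = 0.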