## Assessing the fit in least-squares regression

# Interpreting computer regression data

AP Stats: DAT‑1 (EU), DAT‑1.G (LO)

## Video transcript

- [Narrator] In other videos, we've done linear regressions by hand, but we mentioned that most regressions are actually done using some type of computer or calculator. So what we're going to do in this video is look at an example of the output that we might see from a computer, so that we're not intimidated by it, and see how it gives us the equation for the regression line, along with some of the other data it reports.

So here it tells us,
Cheryl Dixon is interested to see if students who
consume more caffeine tend to study more as well. She randomly selects 20
students at her school, and records their caffeine
intake in milligrams and the number of hours spent studying. A scatterplot of the data
showed a linear relationship. This is a computer output
from a least-squares regression analysis on the data.

So we have these columns labeled predictor and coefficient, and then we have these other things, standard error of the coefficient, T and P, and then all of these things down here. How do we make sense of this in order to come up with an equation for our linear regression?

So let's just get straight on our variables. Let's say that Y is the thing that we're trying to predict, so this is the hours spent studying. And then let's say X is what we think explains the hours studying, or is one of the things that
explains the hours studying, and this is the amount
of caffeine ingested, so this is caffeine consumed in milligrams.

And so our regression line would have the form Y hat is equal to MX plus B. The hat tells us that this is an estimate: the line is trying to estimate the actual Y values for given Xs. Now how do we figure out what M and B are, based on this computer output?

So when you look at this table here, this first column says predictor, and it lists constant and caffeine. All this is saying is that when you're trying to predict the number of hours studying, when you're trying to predict Y, there are essentially two inputs. There is the constant value,
and there is your variable, in this case caffeine, that
you are using to predict the amount that you study. And so this tells you
the coefficients on each. The coefficient on the constant is the constant itself; you could view it as the coefficient on the X to the zeroth term. So the coefficient on the constant is 2.544, and that is our B. And then the coefficient on the caffeine, well, we just said that X is the caffeine consumed, so that coefficient, 0.164, is our M. So just like that, we actually have the equation for the regression line, which is why this computer output is useful. We can just write it out: Y hat is equal to 0.164X plus 2.544.

So that's the regression line. What is this other information they give us? Well, I won't give you a
very satisfying answer, because all of this is actually useful for inferential statistics, for thinking about things like: what is the probability that we got a fit this good purely by chance?

This right over here is the R squared, and if you wanted to figure out R from this, you would just take the square root. We could say that R is going to be equal to the square root of 0.60032, depending on how much precision you have. But you might say, how do we know whether R is the positive square root or the negative square root of that? R can take on values between negative one and positive one. And the answer is, you would look at the slope here. We have a positive slope, which tells us that R is going to be positive. If we had a negative slope, then we would take the negative square root.

Now this right here is
the adjusted R squared, and we really don't have to worry about it too much when we're thinking about just bivariate data; in this case we're only talking about caffeine and hours studying. If we started to have more variables that tried to explain the hours studying, then we would care about adjusted R squared, but we're not going to do that just yet.

Last but not least, this S value. This is the standard deviation of the residuals, which we study in other videos. And why is that useful? Well, that's a measure of how well the regression line fits the data. It's a measure of, we could say, the typical error, meaning roughly how far the actual hours studying tend to fall from what the line predicts.

So, big takeaway: computers are useful, they'll give you a lot of data, and the key thing is how
do you pick out the things that you actually need, because
if you know how to do it, it can be quite straightforward.
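
The readout described above can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the video: the function names and the 10 mg input are assumptions made for the example, while the three numbers (2.544, 0.164 and 0.60032) are the ones read off the computer output in the transcript.

```python
import math

# Values read off the computer output discussed in the transcript.
INTERCEPT = 2.544    # coefficient on the constant (B)
SLOPE = 0.164        # coefficient on caffeine (M)
R_SQUARED = 0.60032  # R squared, written as a proportion


def predict_hours(caffeine_mg: float) -> float:
    """Return y-hat, the predicted hours studying, for a caffeine intake in mg."""
    return SLOPE * caffeine_mg + INTERCEPT


def correlation_from_r_squared(r_squared: float, slope: float) -> float:
    """Return r: the square root of R squared, carrying the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# 10 mg is a hypothetical input chosen only for illustration.
print(predict_hours(10))                               # about 4.184 hours
print(correlation_from_r_squared(R_SQUARED, SLOPE))    # about 0.775
```

Because the slope here is positive, the sketch returns the positive square root; passing a negative slope would return the negative root, which matches the rule described in the transcript.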