AP.STATS: DAT‑1 (EU), DAT‑1.E (LO), DAT‑1.E.2 (EK), DAT‑1.F (LO), DAT‑1.F.1 (EK), DAT‑1.F.2 (EK)

- [Instructor] What we're going to do in this video is talk about the idea of a residual plot for a given regression and the data that it's trying to explain. So right over here, we have a fairly simple least squares regression. We're trying to fit four points. And in previous videos, we actually came up with the equation of this least squares regression line. What I'm going to do now is plot the residuals for each of these points.

So what is a residual? Well, just as a reminder, your residual for a given point is equal to the actual minus the expected. So how do I make that tangible? Well, what's the residual for
this point right over here? For this point, the actual y when x equals one is one, but the expected y when x equals one, for this least squares regression line, is 2.5 times one minus two, which is 0.5. And so our residual is one minus 0.5, so we have a positive 0.5 residual. Over for this point, you have zero residual. The actual is the expected. For this point right over here, the actual y when x equals two is two, but the expected is three: 2.5 times two minus two is three. So our residual is two minus three, which equals negative one. And then over here, the actual when x equals three is six, and our expected when x equals three is 5.5. So six minus 5.5, that is a positive 0.5. So those are the residuals,
but how do we plot them? Well, we would set up our axes. Let me do it right over here. One, two, and three. And let's see, the maximum residual here is positive 0.5 and the minimum one here is negative one. So let's see, this could be 0.5, one, negative 0.5, negative one. So this is negative one. This is positive one here. And so when x equals one,
what was the residual? Well, the actual was one, the expected was 0.5, and one minus 0.5 is 0.5. So we can plot it right over here. The residual is 0.5. When x equals two, we actually have two data points. First, I'll do this one. When we have the point two comma three, the residual there is zero. So for one of them, the residual is zero. Now for the other one, the residual is negative one. Let me do that in a different color. We would plot it right over here. And then for this last point, the residual is positive 0.5. So it is just like that.

And so this thing that I have just created, where for each x where we have a corresponding point, we plot the point above or below the horizontal axis based on the residual, is called a residual plot. Now, one question is why people even go through the trouble of creating a residual plot like this. The answer is, regardless of
whether the regression line is upward sloping or downward sloping, this gives you a sense of how good a fit it is and whether a line is good at explaining the relationship between the variables. The general idea is, if you see the points pretty evenly or randomly scattered above and below this line, and you don't really discern any trend, then a line is probably a good model for the data. But if you do see some type of trend, if the residuals had an upward trend like this, or if they were curving up and then curving down, or they had a downward trend, then you might say, "Hey, this line isn't a good fit, and maybe we would have to do a non-linear model."

What are some examples
of other residual plots? And let's try to analyze them a bit. So right here you have a regression line and its corresponding residual plot. And once again, you see here, the residual is slightly positive. The actual is slightly above the line, and you see it right over there, it's slightly positive. This one's even more positive, you see it there. But like the example we just looked at, it looks like these residuals are pretty evenly scattered above and below the line. There isn't any discernible trend. And so I would say that a linear model here, and in particular, this regression line, is a good model for this data.

But if we see something like this, a different picture emerges. When I look at just the residual plot, it doesn't look like they're evenly scattered. It looks like there's some type of trend here. I'm going down here, but then I'm going back up. When you see something like this, where on the residual plot you're going below the x-axis and then above, then you might say, "Hey, a linear model might not be appropriate. Maybe some type of non-linear model, some type of non-linear curve, might better fit the data," or that the relationship between the y and the x is non-linear. Another way you could think about it is when you have a lot of residuals that are pretty far away from the x-axis in the residual plot, you'd also say, "This line isn't such a good fit." If you calculate the r value here, it would only be slightly positive, but it would not be close to one.
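
To make the arithmetic in this video concrete, here is a minimal Python sketch. It recomputes the residuals for the four points from the first example, (1, 1), (2, 2), (2, 3), and (3, 6), and then does the same for a second data set. The helper names `fit_line` and `residuals` are made up for illustration, and the U-shaped data set is hypothetical; it is not the data shown in the video, just an assumed stand-in for a relationship that a line fits poorly.

```python
from math import sqrt

def fit_line(points):
    """Least squares slope, intercept, and correlation r for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

def residuals(points, slope, intercept):
    """Residual = actual y minus expected y, one value per data point."""
    return [y - (slope * x + intercept) for x, y in points]

# The four points from the first example in the video.
video_points = [(1, 1), (2, 2), (2, 3), (3, 6)]
slope, intercept, r = fit_line(video_points)
video_residuals = residuals(video_points, slope, intercept)
print(slope, intercept)   # 2.5 -2.0, the line y = 2.5x - 2
print(video_residuals)    # [0.5, -1.0, 0.0, 0.5]

# Hypothetical U-shaped data (an assumption, not from the video): y = (x - 2)^2.
curved_points = [(x, (x - 2) ** 2) for x in range(6)]
c_slope, c_intercept, c_r = fit_line(curved_points)
curved_residuals = residuals(curved_points, c_slope, c_intercept)

# Record only the sign of each residual, in order of increasing x.
signs = ["+" if res > 0 else "-" for res in curved_residuals]
print(signs)
print(round(c_r, 2))
```

Plotting each x against its residual gives the residual plot. For the four points from the video, the residuals sit on both sides of zero with no obvious trend. For the made-up curved data, the signs start positive, stay negative through the middle, and turn positive again, which is the kind of pattern that suggests a non-linear model, and the r value comes out positive but not close to one.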