Residual plots

Creating and analyzing residual plots based on regression lines.

  • Manolo.OrtizMonasterio
    In the last example shown, can a situation be explained by two linear relationships? In that example, the first few data points, closer to the y-axis, seem to follow a negative linear relationship, and the ones to the right a positive linear relationship. Is this possible?
    (5 votes)
    • syd.farru
      Not with a single regression line.

      A simple linear regression describes the trend in the data with just one pattern that 'best' fits the data. In general, that pattern can be a line, a curve, etc.

      If the points fall and then rise, one straight line can't capture both the negative and the positive trend. That falling-then-rising shape is itself a sign that the relationship is non-linear, so a curve would describe it better than a single line.

      Hope it helped!
      (10 votes)
  • Dissanayake,Emily
    Why does an evenly or randomly scattered residual plot indicate that the line is a good line of best fit?
    (4 votes)
    • BGG
      There are a few different assumptions we have to check to make sure simple linear regression is the correct analysis to use. Two of them, linearity and equal variance, are usually checked with a residual vs. fitted plot. If the line has captured the real trend, what's left over should be random noise: the residuals scatter evenly above and below zero, with roughly the same spread everywhere. A curved shape in the residuals is evidence against linearity, and a spread that grows or shrinks is evidence against equal variance. In either case, the results of our linear analysis are likely to be less reliable, and other analyses should be considered.
      (6 votes)
  • Alejandro_Berrio/yeet
    Is there a way I can print the worksheet out?
    (3 votes)
    • green_ninja
      If you mean the practice exercises, take a screenshot of the question you want and then print it. If your intent is to mark up the page to help you visualize what's going on, I'd suggest using your screenshot editing tool (like Snip & Sketch on Windows) or a basic photo editor to draw on the image so you don't have to print it.

      Hope this helps!😀
      (6 votes)
  • Shaghayegh
    In the second example, can we say that the residuals follow a sine-like trend, so the line is not a good fit?
    (5 votes)
    • daniella
      In the second example, if the residual plot exhibits a sinusoidal trend (oscillating above and below the x-axis), it suggests that the linear regression model may not adequately capture the underlying relationship between the variables. This could indicate that the relationship is better described by a periodic function like a sine wave rather than a straight line. In such cases, a linear model would not be appropriate, and fitting a non-linear model, such as a sine function, might provide a better fit to the data.
      (1 vote)
  • prehistoric-rishi
    Y'know, I got a residual question on i-Ready, so my teacher told me to search it up. I did, and it made -32% sense... KA is so much better.
    (5 votes)
  • Devany Ruby Gaeta Jaquez
    So, if the dots are not close to the x-axis, it is not a line of best fit?
    (2 votes)
    • Evan
      We're going off of the assumption here that the line is the line of best fit. If the dots aren't close to the x-axis in the residual plot, then most likely the data points don't follow a linear pattern. The data set may in fact have an exponential or sinusoidal form (among other things).

      So a line of best fit doesn't work well for every data set, since a line of best fit will always be a straight line, and not every data set can be described well by a line.

      Hope this makes sense! (:
      (3 votes)
  • Kawainui L. Taporco-Swaggerty
    Could you do an actual + expected for a residual plot?
    (2 votes)
  • hossein darestany
    We measure the residual from the x-axis viewpoint, or from the independent variable's perspective. But what about the y-axis? Shouldn't we also measure the residual from the y-axis viewpoint, or the dependent variable's perspective, like (residual of x) = (actual value of x) - (expected value of x)?
    (3 votes)
    • daniella
      In the context of residual plots, residuals are typically measured from the y-axis viewpoint or dependent variable perspective. The residual for a specific data point is indeed calculated as the difference between the actual value of the dependent variable (y) and the predicted value of y based on the regression line. So, you're correct that the residual is essentially the vertical distance between the observed data point and the regression line. The x-axis is used to represent the independent variable, and the y-axis represents the dependent variable. So, while we analyze the distribution of residuals along the x-axis in a residual plot, it's ultimately to assess how well the regression line explains the variability in the dependent variable (y).
      (1 vote)
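In symbols (the notation b₀ for the intercept and b₁ for the slope is assumed here, not taken from the video), the residual of the i-th point is the vertical gap between the observed y and the line. Plugging in the video's line ŷ = 2.5x - 2 and its point (2, 2) reproduces the residual of negative one computed in the transcript:

```latex
\begin{aligned}
e_i &= y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i) \\
e   &= 2 - (2.5 \cdot 2 - 2) = 2 - 3 = -1 \quad \text{for the point } (2, 2)
\end{aligned}
```

A horizontal "residual of x" would instead come from regressing x on y, which in general produces a different line.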
  • Moonbli~
    How do you find out the residual when x or y are not given?
    (1 vote)
    • cossine
      You don't need to know the precise value of the residual.

      Most likely, if you were performing regression analysis, you would be using a programming language, e.g. Python. From there you could write code to graph the residuals or build a dataframe showing the residual for each point.
      (2 votes)
  • zvandenbergh.de
    What is least squares regression?
    (0 votes)
    • daniella
      Least squares regression is a statistical method used to find the best-fitting line (or curve) through a set of data points. The goal is to minimize the sum of the squared differences between the observed values and the values predicted by the regression line. In other words, the least squares regression line is the line that minimizes the sum of the squared residuals, where the residual is the vertical distance between each observed data point and the corresponding point on the regression line. This method is commonly used in linear regression analysis to estimate the relationship between two variables and make predictions based on that relationship.
      (1 vote)
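To make "minimizes the sum of the squared residuals" concrete, here is a from-scratch Python sketch using the standard closed-form slope and intercept for a single predictor. The helper names are illustrative, and the data are the four points from the video transcript below; changing the slope away from the least-squares value makes the total squared error larger.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared vertical residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution for one predictor: slope = Sxy / Sxx.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (actual - predicted)**2 over all points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# The four points used in the video: (1, 1), (2, 2), (2, 3), (3, 6).
xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]

slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)  # 2.5 -2.0

best = sum_squared_residuals(xs, ys, slope, intercept)
print(best)  # 1.5

# Changing the slope (intercept held fixed) gives a larger total squared error.
for other_slope in (slope - 0.1, slope + 0.1):
    print(other_slope, sum_squared_residuals(xs, ys, other_slope, intercept))
```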

Video transcript

- [Instructor] What we're going to do in this video is talk about the idea of a residual plot for a given regression and the data that it's trying to explain. So right over here, we have a fairly simple least squares regression. We're trying to fit four points. And in previous videos, we actually came up with the equation of this least squares regression line. What I'm going to do now is plot the residuals for each of these points.

So what is a residual? Well, just as a reminder, your residual for a given point is equal to the actual minus the expected. So how do I make that tangible? Well, what's the residual for this point right over here? For this point here, the actual y when x equals one is one, but the expected, when x equals one, for this least squares regression line is 2.5 times one minus two, which is .5. And so our residual is one minus .5, so we have a positive 0.5 residual. For this point, you have zero residual: the actual is the expected. For this point right over here, the actual y when x equals two is two, but the expected is three. So for our residual over here, once again, the actual is y equals two when x equals two. The expected, two times 2.5 minus two, is three, so this is going to be two minus three, which equals a residual of negative one. And then over here, our actual y when x equals three is six, and our expected when x equals three is 5.5. So six minus 5.5, that is a positive 0.5.

So those are the residuals, but how do we plot them? Well, we would set up our axes. Let me do it right over here: one, two, and three. And let's see, the maximum residual here is positive .5 and the minimum one is negative one. So this could be .5, one, negative .5, negative one. So this is negative one. This is positive one here. And so when x equals one, what was the residual? Well, the actual was one, the expected was 0.5, and one minus 0.5 is 0.5. So this one we can plot right over here.
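The residual arithmetic above can be checked with a few lines of Python, using the least-squares line ŷ = 2.5x - 2 and the four points from the video (a minimal sketch; `predict` is just an illustrative helper name).

```python
# Least-squares line from the video: y-hat = 2.5x - 2
def predict(x):
    return 2.5 * x - 2

points = [(1, 1), (2, 2), (2, 3), (3, 6)]

# residual = actual - expected (predicted)
residuals = [(x, y - predict(x)) for x, y in points]
for x, r in residuals:
    print(f"x = {x}: residual = {r:+.1f}")
# residuals: +0.5, -1.0, +0.0, +0.5
```

As expected for a least-squares line, these four residuals add up to zero.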
The residual is 0.5. When x equals two, we actually have two data points. First, I'll do this one. For the point (2, 3), the residual is zero. So for one of them, the residual is zero. For the other one, the residual is negative one. Let me do that in a different color. For the other one, the residual is negative one, so we would plot it right over here. And then for this last point, the residual is positive .5. So it is just like that. And so this thing that I have just created, where for each x that has a corresponding point we plot a point above or below the line based on its residual, is called a residual plot.

Now, one question is why people even go through the trouble of creating a residual plot like this. The answer is that, regardless of whether the regression line is upward sloping or downward sloping, it gives you a sense of how good a fit the line is and whether a line is good at explaining the relationship between the variables. The general idea is that if you see the points pretty evenly or randomly scattered above and below this line, and you don't really discern any trend, then a line is probably a good model for the data. But if you do see some type of trend, if the residuals had an upward trend like this, or if they were curving up and then curving down, or they had a downward trend, then you might say, "Hey, this line isn't a good fit, and maybe we would have to use a non-linear model."

What are some examples of other residual plots? Let's try to analyze them a bit. So right here you have a regression line and its corresponding residual plot. And once again, you see here, the residual is slightly positive: the actual is slightly above the line, and you see it right over there, slightly positive. This one's even more positive; you see it there. But like the example we just looked at, it looks like these residuals are pretty evenly scattered above and below the line.
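A residual plot is normally drawn with a plotting library (for example, matplotlib's `scatter` plus a horizontal zero line via `axhline`). To keep this sketch dependency-free, the hypothetical helper below draws the video's residuals as text instead: x runs along the bottom, each row is one residual level, and the dashed row is the zero line. It assumes single-character x labels and rounds residuals to the nearest multiple of `step`.

```python
def text_residual_plot(points, step=0.5):
    """Hypothetical helper: draw (x, residual) pairs as a text residual plot.

    Each row is one residual level (a multiple of `step`); residuals that are
    not multiples of `step` are rounded to the nearest level.
    """
    xs = sorted({x for x, _ in points})
    marked = {(x, round(r / step)) for x, r in points}
    levels = [round(r / step) for _, r in points]
    top, bottom = max(max(levels), 0), min(min(levels), 0)  # always include zero

    lines = []
    for level in range(top, bottom - 1, -1):
        cells = []
        for x in xs:
            if (x, level) in marked:
                cells.append("*")  # a residual sits at this level
            elif level == 0:
                cells.append("-")  # the zero (x-axis) line
            else:
                cells.append(" ")
        lines.append(f"{level * step:+5.1f} " + "  ".join(cells))
    lines.append(" " * 6 + "  ".join(str(x) for x in xs))  # x-axis labels
    return lines


# (x, residual) pairs from the video's four points.
residuals = [(1, 0.5), (2, -1.0), (2, 0.0), (3, 0.5)]
for line in text_residual_plot(residuals):
    print(line)
```

The stars sit above and below the dashed zero row with no obvious pattern, which is the "evenly scattered" picture the video describes.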
There isn't any discernible trend. And so I would say that a linear model here, and in particular this regression line, is a good model for this data. But if we see something like this, a different picture emerges. When I look at just the residual plot, it doesn't look like the residuals are evenly scattered. It looks like there's some type of trend here: I'm going down here, but then I'm going back up. When you see something like this, where on the residual plot you're going below the x-axis and then above, then you might say, "Hey, a linear model might not be appropriate. Maybe some type of non-linear model, some type of non-linear curve, might better fit the data," or the relationship between y and x is non-linear. Another way you could think about it is that when you have a lot of residuals that are pretty far away from the x-axis in the residual plot, you'd also say, "This line isn't such a good fit." If you calculate the r value (the correlation coefficient) here, it would only be slightly positive; it would not be close to one.