Introduction to residuals

Build a basic understanding of what a residual is. 
We run into a problem in stats when we're trying to fit a line to data points in a scatter plot. The problem is this: It's hard to say for sure which line fits the data best.
For example, imagine three scientists, Andrea, Jeremy, and Brooke, are working with the same data set. If each scientist draws a different line of fit, how do they decide which line is best?
If only we had some way to measure how well each line fit each data point...

Residuals to the rescue!

A residual is a measure of how well a line fits an individual data point.
Consider this simple data set with a line of fit drawn through it, and notice how the point (2, 8) is 4 units above the line:
This vertical distance is known as a residual. For data points above the line, the residual is positive, and for data points below the line, the residual is negative.
For example, the residual for the point (4, 3) is -2:
The closer a data point's residual is to 0, the better the fit. In this case, the line fits the point (4, 3) better than it fits the point (2, 8).
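The arithmetic behind these two residuals can be sketched in a few lines of Python. The graph is not reproduced here, so the line's equation is an assumption: y = 0.5x + 3 is the straight line consistent with the two residuals stated above (it passes through (2, 4) and (4, 5)). The function name `residual` is made up for illustration.

```python
def residual(x, y, slope=0.5, intercept=3.0):
    """Observed y minus the y-value the line predicts at x.

    The default slope and intercept are an assumption inferred from
    the two residuals stated in the text, not read from the graph.
    """
    predicted = slope * x + intercept
    return y - predicted

for point in [(2, 8), (4, 3)]:
    # Positive means the point is above the line, negative means below.
    print(point, residual(*point))
```

Under that assumption, the point (2, 8) gets a residual of 4 and the point (4, 3) gets a residual of -2, matching the values in the text.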

Try to find the remaining residuals yourself

What is the residual of the point (6, 7) in the graph above?

What is the residual of the point (8, 8) in the graph above?

What is the residual of the point (1, 2) in the graph above?

Want to join the conversation?

  • just.play.game.forever
    what is the difference between error and residual?
    (2 votes)
    • tyersome
      I think ysun means that:
      An error is a deviation from the population mean.
      A residual is a deviation from the sample mean.

      Errors, like other population parameters (e.g. a population mean), are usually theoretical.
      Residuals, like other sample statistics (e.g. a sample mean), are measured values from a sample. Sample statistics are often used to estimate population parameters, so in this case the residuals can be used to estimate the error.
      (2 votes)
  • imamulhaq
    How do you do this on a calculator?
    (9 votes)
  • Joona Rauhamäki
    This article does not explain what to do with the residuals after calculating them. Are you supposed to sum them? When are you supposed to use them?
    (2 votes)
  • owen-k
    Really dumb question: Why is it called least squares regression? What does least squares mean?
    (3 votes)
    • ZeroFK
      The "squares" refers to the squares (that is, the 2nd power) of the residuals, and the "least" just means that we're trying to find the smallest total sum of those squares.

      You may ask: why squares? The best answer I could find is that it's easy (minimizing a quadratic formula is easy) and still gives good results.
      (3 votes)
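As a rough illustration of "smallest total sum of those squares", the sketch below compares two candidate lines on the five points named in the article. Line A (y = 0.5x + 3) is an assumption inferred from the residuals the article states for (2, 8) and (4, 3); line B (y = x + 1) is a made-up alternative. Neither is claimed to be the actual least squares line for this data.

```python
# Points named in the article's text and practice questions.
points = [(1, 2), (2, 8), (4, 3), (6, 7), (8, 8)]

def sum_squared_residuals(points, slope, intercept):
    """Add up (observed y - predicted y) ** 2 over all points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)

# Line A: inferred from the article's stated residuals (an assumption).
ssr_a = sum_squared_residuals(points, slope=0.5, intercept=3.0)
# Line B: a hypothetical alternative, chosen only for comparison.
ssr_b = sum_squared_residuals(points, slope=1.0, intercept=1.0)

# The line with the smaller total is the better fit by this criterion.
better = "A" if ssr_a < ssr_b else "B"
print(ssr_a, ssr_b, better)
```

Least squares regression takes this comparison to its limit: out of all possible lines, it picks the one whose total is smallest.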
  • G-Port
    If you have a point with a really positive residual that is quite far from the LSRL, is that good or bad? Like, what can you say about the residual?
    (3 votes)
    • tyersome
      That would be what is called an "outlier".

      It could suggest that the measurement that led to that point was wrong — e.g. The value was 3000, but 30000 got entered by mistake.

      Another possibility, especially if there aren't a lot of data points, is that the relationship between the variables is not linear — e.g. an exponential curve might be a better fit....

      ADDENDUM: It is also possible that the data is actually very "noisy" (highly variable).
      (2 votes)
  • Charlotte Pierrel
    What are estimates? How are they different from residuals?
    (3 votes)
  • bmanoff47
    If there are many points on a graph then how can you draw a line that is best for all of them?
    (3 votes)
    • tyersome
      The line you make is a compromise that minimizes some function of the residuals.
      The most commonly used function is the sum of squares of the residuals. You cannot just do the sum of the values of the residuals, since there are likely to be many lines for which that will be zero.
      (2 votes)
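The point about many lines having a residual sum of zero can be checked numerically. The sketch below uses the five points named in the article and a handful of arbitrarily chosen slopes. Every line that passes through the point of means (mean of x, mean of y) has residuals that cancel out, yet the sums of squares differ from line to line. The function names are made up for illustration.

```python
# Points named in the article's text and practice questions.
points = [(1, 2), (2, 8), (4, 3), (6, 7), (8, 8)]

mean_x = sum(x for x, _ in points) / len(points)
mean_y = sum(y for _, y in points) / len(points)

def residuals_through_means(points, slope):
    """Residuals for the line with this slope through (mean_x, mean_y)."""
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in points]

def plain_sum(points, slope):
    return sum(residuals_through_means(points, slope))

def squared_sum(points, slope):
    return sum(r ** 2 for r in residuals_through_means(points, slope))

# Arbitrary slopes: the plain sums are all zero up to rounding error,
# while the squared sums tell the lines apart.
for slope in [-2.0, 0.0, 0.5, 3.0]:
    print(slope, round(plain_sum(points, slope), 9), squared_sum(points, slope))
```

This is why the plain sum cannot pick out a single best line, while the sum of squares can.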
  • Iustus82437
    With residuals, how do you determine which line is best? Do you take the mean of them, or do you do something else? This article did not tell me how to.
    (4 votes)
    • Sanchit Agarwal
      We sum the squares of the distances from the mean. Just summing the residuals looks intuitively appealing, but it does not take into consideration the "magnitude" of the distance. E.g., suppose 10 and -10 are two residuals: both are far from the mean, but they add to 0.
      (1 vote)
  • Jamune
    In the article, it says that the closer the data point's residual is to zero, the better the fit. There's (4,3) and (2,8). The residuals are 4 and -2. It says 4 is closer (aka (4,3)), but isn't -2 closer to zero than 4? How is this possible?
    (2 votes)
    • Avi Mahajan
      The point (4,3) is two units below the line. It has a residual of -2. However, the point (2,8) is four units above the line. It has a residual of 4. Since -2 is closer to zero than 4, the point (4,3) fits the line better than the point (2,8). I think you misunderstood that the residual of four is closer to the line. The article really meant that the point (4,3) is closer to the line. Hope this helped!
      (2 votes)
  • kylie839692
    how can you summarize a residual plot?
    (2 votes)