what is the difference between error and residual?

I think ysun means that:` An error is *a deviation from the* population mean. A residual is *a deviation from the* sample mean.` Errors, like other population parameters (e.g. a population mean), are _usually_ theoretical. Residuals, like other sample statistics (e.g. a sample mean), are measured values from a sample. Sample statistics are often used to estimate population parameters, so in this case the residuals can be used to estimate the error.

How do you do this On a calculator

the explanation on how to do this using a calculator is confusing

If you have a really positive residual point that is quite far form the LSRL is that good or bad ? Like what can you say about the residual?

That would be what is called an "outlier". It could suggest that the measurement that led to that point was wrong — e.g. The value was 3000, but 30000 got entered by mistake. Another possibility, especially if there aren't a lot of data points, is that the relationship between the variables is not linear — e.g. an exponential curve might be a better fit.... ADDENDUM: It is also possible that the data is actually very "noisy" (highly variable).

Really dumb question: Why is it called least squares regression? What does least squares mean?

The "squares" refers to the squares (that is, the 2nd power) of the residuals, and the "least" just means that we're trying to find the smallest total sum of those squares. You may ask: why squares? The best answer I could find is that it's easy (minimizing a quadratic formula is easy) and still gives good results.

how can a residual be one sided? For example in the graphs, would being one sided mean the data points are not scattered?

In statistics, resids (short for residuals) are the differences between the predicted values and the actual values of the response variable. One-sided residuals can occur when a model is fitted to data with some specific characteristics. A one-sided residual plot is a plot of residual values against the fitted values of the model only for one side of the graph. For example, a one-sided residual plot can be observed when we have a regression model in which our residuals are constrained to be non-negative. In this case, we may have a one-sided residual plot resulting from the fact that only one side of the graph will have positive residuals, while the other side will have residuals of zero. In terms of scatterplots, being one-sided does not necessarily mean that the data points are not scattered. The scatter in the data points will still be visible in the one-sided residual plot.

If there are many points on a graph then how can you draw a line that is best for all of them?

The line you make is a compromise that minimizes some function of the residuals. The most commonly used function is the sum of squares of the residuals. You cannot just do the sum of the values of the residuals, since there are likely to be many lines for which that will be zero.

Main content

Course: Statistics and probability > Unit 5

Lesson 4: Least-squares regression equations

Introduction to residuals

Google Classroom

Build a basic understanding of what a residual is.

We run into a problem in stats when we're trying to fit a line to data points in a scatter plot. The problem is this: It's hard to say for sure which line fits the data best.

For example, imagine three scientists,

Andrea

Jeremy

, and

Brooke

, are working with the same data set. If each scientist draws a different line of fit, how do they decide which line is best?

If only we had some way to measure how well each line fit each data point...

Residuals to the rescue!

A residual is a measure of how well a line fits an individual data point.

Consider this simple data set with a line of fit drawn through it

and notice how point

(2, 8)

4

units above the line:

This vertical distance is known as a residual. For data points above the line, the residual is positive, and for data points below the line, the residual is negative.

For example, the residual for the point

(4, 3)

- 2

The closer a data point's residual is to

0

, the better the fit. In this case, the line fits the point

(4, 3)

better than it fits the point

(2, 8)

Try to find the remaining residuals yourself

What is the residual of the point $(6, 7)$ ‍ in the graph above?

What is the residual of the point $(8, 8)$ ‍ in the graph above?

What is the residual of the point $(1, 2)$ ‍ in the graph above?

Want to join the conversation?

Sort by:

just.play.game.forever
Posted 8 years ago. Direct link to just.play.game.forever's post “what is the difference be...”
what is the difference between error and residual?
Button navigates to signup pageComment on just.play.game.forever's post “what is the difference be...”
(50 votes)
Answer
- tyersome
  Posted 8 years ago. Direct link to tyersome's post “I think ysun means that:`...”
  I think ysun means that:An error is a deviation from the population mean. A residual is a deviation from the sample mean.
  Errors, like other population parameters (e.g. a population mean), are usually theoretical.
  Residuals, like other sample statistics (e.g. a sample mean), are measured values from a sample. Sample statistics are often used to estimate population parameters, so in this case the residuals can be used to estimate the error.
  Comment on tyersome's post “I think ysun means that:`...”
  (52 votes)
imamulhaq
Posted 8 years ago. Direct link to imamulhaq's post “How do you do this On a c...”
How do you do this On a calculator
Button navigates to signup pageButton navigates to signup page
(11 votes)
Answer
- Rebecca Rhone Harrison
  Posted 8 years ago. Direct link to Rebecca Rhone Harrison's post “the explanation on how to...”
  the explanation on how to do this using a calculator is confusing
  Comment on Rebecca Rhone Harrison's post “the explanation on how to...”
  (4 votes)
Joona Rauhamäki
Posted 8 years ago. Direct link to Joona Rauhamäki's post “This article does not exp...”
This article does not explain what to do with the residuals after calculating them. Are you supposed to sum them? When are you supposed to use them?
Button navigates to signup pageComment on Joona Rauhamäki's post “This article does not exp...”
(11 votes)
Answer
- Jacob Kovacs
  Posted 8 years ago. Direct link to Jacob Kovacs's post “The article is incomplete...”
  The article is incomplete. It didn't circle back around to answer the question it posed at the beginning: "If each scientist draws a different line of fit, how do they decide which line is best?" Calculating the residuals for each line helps you decide which line best fits the data.
  Button navigates to signup page
  (14 votes)
G-Port
Posted 7 years ago. Direct link to G-Port's post “If you have a really posi...”
If you have a really positive residual point that is quite far form the LSRL is that good or bad ? Like what can you say about the residual?
Button navigates to signup pageButton navigates to signup page
(3 votes)
Answer
- tyersome
  Posted 6 years ago. Direct link to tyersome's post “That would be what is cal...”
  That would be what is called an "outlier".
  
  It could suggest that the measurement that led to that point was wrong — e.g. The value was 3000, but 30000 got entered by mistake.
  
  Another possibility, especially if there aren't a lot of data points, is that the relationship between the variables is not linear — e.g. an exponential curve might be a better fit....
  
  ADDENDUM: It is also possible that the data is actually very "noisy" (highly variable).
  Button navigates to signup page
  (8 votes)
owen-k
Posted 6 years ago. Direct link to owen-k's post “Really dumb question: Why...”
Really dumb question: Why is it called least squares regression? What does least squares mean?
Button navigates to signup pageButton navigates to signup page
(3 votes)
Answer
- ZeroFK
  Posted 6 years ago. Direct link to ZeroFK's post “The "squares" refers to t...”
  The "squares" refers to the squares (that is, the 2nd power) of the residuals, and the "least" just means that we're trying to find the smallest total sum of those squares.
  
  You may ask: why squares? The best answer I could find is that it's easy (minimizing a quadratic formula is easy) and still gives good results.
  Button navigates to signup page
  (7 votes)
alyssah83
Posted 10 months ago. Direct link to alyssah83's post “how can a residual be one...”
how can a residual be one sided? For example in the graphs, would being one sided mean the data points are not scattered?
Button navigates to signup pageButton navigates to signup page
(3 votes)
Answer
- Parsa Abangah
  Posted 10 months ago. Direct link to Parsa Abangah's post “In statistics, resids (sh...”
  In statistics, resids (short for residuals) are the differences between the predicted values and the actual values of the response variable. One-sided residuals can occur when a model is fitted to data with some specific characteristics. A one-sided residual plot is a plot of residual values against the fitted values of the model only for one side of the graph.
  
  For example, a one-sided residual plot can be observed when we have a regression model in which our residuals are constrained to be non-negative. In this case, we may have a one-sided residual plot resulting from the fact that only one side of the graph will have positive residuals, while the other side will have residuals of zero.
  
  In terms of scatterplots, being one-sided does not necessarily mean that the data points are not scattered. The scatter in the data points will still be visible in the one-sided residual plot.
  Button navigates to signup page
  (2 votes)
kylie839692
Posted 7 years ago. Direct link to kylie839692's post “how can you summarize a r...”
how can you summarize a residual plot?
Button navigates to signup pageButton navigates to signup page
(3 votes)
Answer
Charlotte Pierrel
Posted 5 years ago. Direct link to Charlotte Pierrel's post “What are estimates ? How ...”
What are estimates ? How are they different from residuals ?
Button navigates to signup pageButton navigates to signup page
(3 votes)
Answer
- Ridwan Ahmed
  Posted 3 years ago. Direct link to Ridwan Ahmed's post “An estimate would be the ...”
  An estimate would be the y-value predicted by the regression line whereas a residual is the signed difference between the actual y-value and the estimate.
  Button navigates to signup page
  (1 vote)
bmanoff47
Posted 7 years ago. Direct link to bmanoff47's post “If there are many points ...”
If there are many points on a graph then how can you draw a line that is best for all of them?
Button navigates to signup pageButton navigates to signup page
(3 votes)
Answer
- tyersome
  Posted 7 years ago. Direct link to tyersome's post “The line you make is a co...”
  The line you make is a compromise that minimizes some function of the residuals.
  The most commonly used function is the sum of squares of the residuals. You cannot just do the sum of the values of the residuals, since there are likely to be many lines for which that will be zero.
  Button navigates to signup page
  (2 votes)
Iustus82437
Posted 8 years ago. Direct link to Iustus82437's post “in residuals how do you d...”
in residuals how do you determine which one is best? do you mean it or do you do something else this article did not tell me how to.
Button navigates to signup pageButton navigates to signup page
(4 votes)
Answer
- Sanchit Agarwal
  Posted 7 years ago. Direct link to Sanchit Agarwal's post “we sum the square of the ...”
  we sum the square of the distances from the mean..though just summing the residuals look intuitively appealing, but it does not take into consideration the "magnitude" of the distance.. e.g, suppose 10 and -10 are two residuals, they are too far from the mean, but they add to 0.
  Button navigates to signup page
  (1 vote)