# Introduction to residuals

Build a basic understanding of what a residual is.

We run into a problem in stats when we're trying to fit a line to data points in a scatter plot. The problem is this: It's hard to say for sure which line fits the data best.

For example, imagine three scientists, $\text{\maroonD{Andrea}}$, $\text{\tealD{Jeremy}}$, and $\text{\purpleC{Brooke}}$, are working with the same data set. If each scientist draws a different line of fit, how do they decide which line is best?

If only we had some way to measure how well each line fit each data point...

## Residuals to the rescue!

A residual is a measure of how well a line fits an individual data point.

Consider this simple data set with a line of fit drawn through it, and notice how point $(2,8)$ is $\greenD4$ units above the line.

This vertical distance is known as a **residual**. For data points above the line, the residual is positive, and for data points below the line, the residual is negative. For example, the residual for the point $(4,3)$ is $\redD{-2}$.
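Written as a formula, a residual is the actual $y$-value of the point minus the height of the line at the same $x$-value. The symbols $y$ and $\hat{y}$ are notation introduced here for convenience. The line heights $4$ and $5$ are worked out from the two residuals stated above, not read from the line's equation.

$$
\begin{aligned}
\text{residual} &= y - \hat{y} \\
\text{point } (2,8):\quad 8 - 4 &= \greenD{4} \\
\text{point } (4,3):\quad 3 - 5 &= \redD{-2}
\end{aligned}
$$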

The closer a data point's residual is to $0$, the better the fit. In this case, the line fits the point $(4,3)$ better than it fits the point $(2,8)$.
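To see the comparison as a calculation, here is a minimal sketch in Python. The text does not give the equation of the line of fit, so the function `line_of_fit` below assumes the line $y = 0.5x + 3$, which is one line that agrees with the two residuals mentioned above. The function and variable names are illustrative.

```python
def line_of_fit(x):
    # Assumed line y = 0.5x + 3 (hypothetical: chosen only because it
    # matches the residuals of 4 and -2 described in the text).
    return 0.5 * x + 3


def residual(x, y, predict):
    # Residual = actual y-value minus the line's y-value at the same x.
    return y - predict(x)


points = [(2, 8), (4, 3)]

# Compute the residual for each point.
residuals = {point: residual(point[0], point[1], line_of_fit) for point in points}

for (x, y), r in residuals.items():
    print(f"({x}, {y}): residual = {r:+g}")

# The point whose residual is closest to 0 is the one the line fits best.
best_fit_point = min(points, key=lambda point: abs(residuals[point]))
print("Best-fit point:", best_fit_point)
```

Comparing absolute values matters here: a residual of $-2$ is closer to $0$ than a residual of $4$, even though $-2$ is the smaller number.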