# R-squared intuition

When we first learned about the correlation coefficient, r, we focused on what it meant rather than how to calculate it, since the computations are lengthy and computers usually take care of them for us.
We'll do the same with $r^2$ and concentrate on how to interpret what it means.
In a way, $r^2$ measures how much prediction error is eliminated when we use least-squares regression.

## Predicting without regression

We use linear regression to predict y given some value of x. But suppose that we had to predict a y value without a corresponding x value.
Without using regression on the x variable, our most reasonable estimate would be to simply predict the average of the y values.
Here's an example, where the prediction line is simply the mean of the y data:
Notice that this line doesn't seem to fit the data very well. One way to measure the fit of the line is to calculate the sum of the squared residuals—this gives us an overall sense of how much prediction error a given model has.
So without least-squares regression, our sum of squares is $41.1879$.
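As a rough illustration of this baseline, the sketch below predicts the mean of y for every point and adds up the squared residuals. The five y values and the helper name `sum_squared_residuals` are made up for this example. They are not the data behind the article's sum of squares.

```python
# Hypothetical y values, invented for illustration (not the article's data).
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def sum_squared_residuals(observed, predicted):
    """Return the sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# With no x to work from, predict the mean of y for every point.
mean_y = sum(ys) / len(ys)
baseline_predictions = [mean_y for _ in ys]

ss_mean = sum_squared_residuals(ys, baseline_predictions)
print(f"mean of y: {mean_y}")
print(f"sum of squared residuals around the mean: {ss_mean}")
```

For these made-up values the mean is 4 and the sum of squared residuals is 6.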
Would using least-squares regression reduce the amount of prediction error? If so, by how much? Let's see!

## Predicting with regression

Here's the same data with the corresponding least-squares regression line and summary statistics:

| Equation | $r$ | $r^2$ |
| --- | --- | --- |
| $\hat{y} = 0.5x + 1.5$ | $0.816$ | $0.6659$ |

This line seems to fit the data pretty well, but to measure how much better it fits, we can look again at the sum of the squared residuals:
Using least-squares regression reduced the sum of the squared residuals from $41.1879$ to $13.7627$.
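To see this kind of comparison in code, here is a minimal sketch that fits a least-squares line with the standard closed-form slope and intercept formulas, then computes the sum of squared residuals both around the mean and around the fitted line. The data set is hypothetical and chosen only to keep the arithmetic small, so its numbers differ from the article's.

```python
# Hypothetical data, invented for illustration (not the article's data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept from the standard closed-form formulas.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Sum of squared residuals when every prediction is the mean of y.
ss_mean = sum((y - mean_y) ** 2 for y in ys)

# Sum of squared residuals around the fitted regression line.
ss_regression = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

print(f"fitted line: y-hat = {slope:.2f}x + {intercept:.2f}")
print(f"sum of squares around the mean: {ss_mean:.4f}")
print(f"sum of squares around the line: {ss_regression:.4f}")
```

With these made-up points, the sum of squares drops from 6 around the mean to 2.4 around the line.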
So using least-squares regression eliminated a considerable amount of prediction error. How much though?

## R-squared measures how much prediction error we eliminated

Without using regression, our model had an overall sum of squares of $41.1879$. Using least-squares regression reduced that down to $13.7627$.
So the total reduction there is $41.1879 - 13.7627 = 27.4252$.
We can represent this reduction as a percentage of the original amount of prediction error:
$$\frac{41.1879 - 13.7627}{41.1879} = \frac{27.4252}{41.1879} \approx 66.59\%$$
If you look back up above, you'll see that $r^2 = 0.6659$.
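The same arithmetic can be checked in a few lines of Python. The two sums of squares are the values reported in this article. The variable names are only illustrative.

```python
# Sums of squared residuals reported in the article.
ss_mean = 41.1879        # predicting the mean of y for every point
ss_regression = 13.7627  # predicting with the least-squares line

# Prediction error eliminated, as an amount and as a share of the original.
reduction = ss_mean - ss_regression
r_squared = reduction / ss_mean

print(f"reduction: {reduction:.4f}")  # reduction: 27.4252
print(f"share:     {r_squared:.2%}")  # share:     66.59%
```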
R-squared tells us what percent of the prediction error in the y variable is eliminated when we use least-squares regression on the x variable.
As a result, $r^2$ is also called the coefficient of determination.
Many formal definitions say that $r^2$ tells us what percent of the variability in the y variable is accounted for by the regression on the x variable.
It seems pretty remarkable that simply squaring r gives us this measurement. Proving this relationship between r and $r^2$ is pretty complex, and is beyond the scope of an introductory statistics course.
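Although the proof is out of scope, the relationship is easy to check numerically. The sketch below uses a small hypothetical data set (not the article's data). It computes r from its usual formula, squares it, and compares the result with the fraction of prediction error that the least-squares line eliminates.

```python
import math

# Hypothetical data, invented for illustration (not the article's data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_yy = sum((y - mean_y) ** 2 for y in ys)  # also the sum of squares around the mean

# Correlation coefficient r.
r = s_xy / math.sqrt(s_xx * s_yy)

# Fraction of prediction error removed by the least-squares line.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
ss_regression = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
fraction_eliminated = (s_yy - ss_regression) / s_yy

print(f"r squared:           {r ** 2:.4f}")
print(f"fraction eliminated: {fraction_eliminated:.4f}")
print(math.isclose(r ** 2, fraction_eliminated))
```

Agreement on one data set is not a proof, but it shows the two quantities lining up in practice.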