# Calculating residual example

We work through an example that covers least squares regression, interpreting the regression equation, calculating a residual, and interpreting what positive and negative residuals mean relative to the regression line.

## Want to join the conversation?

- How do we find the residual when there are two y values for one x value?

Thanks,

~HarleyQuinn(6 votes)
- Then it wouldn't be a function. Things that aren't functions really tick me off.(1 vote)
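
For what it's worth, a data set does not have to be a function: two customers can share a height (x) and rent different frames (y), and each data point simply gets its own residual against the same predicted value. A minimal sketch, assuming the regression line from the video; the point (155, 51) is from the video, while the second customer with a 53 cm frame and the name `predicted_frame` are made up for illustration.

```python
from fractions import Fraction


def predicted_frame(height_cm):
    """Regression line from the video: y-hat = 1/3 + (1/3) * x."""
    return Fraction(1, 3) + Fraction(1, 3) * height_cm


# Two customers with the same height (x) but different frame sizes (y).
# (155, 51) is from the video; (155, 53) is a hypothetical second customer.
points = [(155, 51), (155, 53)]

# Each point gets its own residual: actual - predicted.
residuals = [actual - predicted_frame(height) for height, actual in points]
print(residuals)
```

Both customers share the predicted value of 52, so one lands below the line (negative residual) and the other above it (positive residual).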

- Why did Sal put the line right there on the graph? I do not understand that.(4 votes)
- At around 3:52, why didn't he add the 1/3 to 52? I guess it's not too big of a difference, but wouldn't that make the residual -1 and 1/3?

Thanks,

GoldenDoodle(1 vote)
- He already added the 1/3 to 155/3 to get 156/3, which simplifies to 52.(5 votes)

- Where does the whole 1/3 part come in?(2 votes)
- The y-intercept and the slope are both 1/3. The general equation for a least squares regression line is

ŷ = b + mx,

where b is the y-intercept and m is the slope.

The 1/3 itself is just a value given in the problem.(3 votes)

- I don't understand why he puts the line through the graph right there. Help!(3 votes)
- I'm just wondering, is this the same as Ordinary Least Squares (OLS)?(2 votes)
- Sort of - to be precise, OLS is a method of optimising your parameters so that the sum of the squares of the residuals is as small as possible(2 votes)
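
The idea of choosing parameters so that the sum of squared residuals is as small as possible can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heights and frame sizes below are made up (the video does not list Vera's data), and the function names are hypothetical. It uses the standard closed-form formulas for a one-variable least squares line.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of (actual - predicted)^2 over all data points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical heights (cm) and frame sizes (cm), made up for illustration.
heights = [150, 155, 160, 170, 180]
frames = [50, 51, 54, 57, 60]

b, m = fit_least_squares(heights, frames)
best = sum_squared_residuals(heights, frames, b, m)

# Nudging the slope or intercept away from the fit increases the sum.
worse_slope = sum_squared_residuals(heights, frames, b, m + 0.01)
worse_intercept = sum_squared_residuals(heights, frames, b + 0.5, m)
print(best <= worse_slope, best <= worse_intercept)
```

Any other line through the same data gives a sum of squared residuals at least as large as the fitted one, which is what "least squares" refers to.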

- I'd like an answer as soon as possible, please! ^.^

What if the "actual" numbers are a lot larger, like 12, or 28, or larger? I have a problem like this but when I use the equation given, I get huge numbers like 101 and 429, so when I do y-r (y-value minus residual) I get numbers like -89, which are too large to plot on my graph. What am I doing wrong?(2 votes)
- Here's your answer, six years later:(2 votes)

- Where did he get the points for the graph?(1 vote)
- Sal probably had the points before we saw the video. I think he made up these numbers.(3 votes)

- I am struggling with math(2 votes)
- At 1:27 he said the line is trying to minimize the square of the distance. Why?(1 vote)
- So that it fits the data best(3 votes)

## Video transcript

- [Instructor] Vera rents bicycles to tourists. She recorded the height, in centimeters, of each customer and the frame size, in centimeters, of the bicycle that customer rented. After plotting her results, Vera noticed that the relationship between the two variables was fairly linear, so she used the data to calculate the following least squares regression equation for predicting bicycle frame size from the height of the customer. And this is the equation.

So before I even look at this question, let's just think about what she did. So she had a bunch of customers, and she recorded, given the height of the customer, what size frame that person rented. And so she might've had something like this, where in the horizontal axis you have height measured in centimeters, and in the vertical axis you have frame size that's also measured in centimeters. And so there might've been someone who measures 100 centimeters in height who gets a 25 centimeter frame. I don't know if that's reasonable or not, for you bicycle experts, but let's just go with it. And so she would've plotted it there. Maybe there was another person of 100 centimeters in height who got a frame that was slightly larger, and she plotted it there.

And then, she did a least squares regression. And a least squares regression is trying to fit a line to this data. Oftentimes, you would use a spreadsheet or use a computer. And that line is trying to minimize the square of the distance between these points and the line. And so the least squares regression, maybe it would look something like this, and this is just a rough estimate of it. It might look something, let me get my ruler tool, it might look something like, it might look something like this. So let me plot it. So this, that would be the line. So our regression line, y-hat, is equal to 1/3 plus 1/3 x. And so this, you could view this as a way of predicting, or either modeling the relationship or predicting that, hey, if I get a new person, I could take their height and put as x and figure out what frame size they're likely to rent.

But they ask us, what is the residual of a customer with a height of 155 centimeters who rents a bike with a 51 centimeter frame? So how do we think about this? Well, the residual is going to be the difference between what they actually produce and what the line, what our regression line would have predicted. So we could say residual, let me write it this way, residual is going to be actual, actual minus predicted. So if predicted is larger than actual, this is actually going to be a negative number. If predicted is smaller than actual, this is gonna be a positive number.

Well, we know the actual. They tell us that. They tell us that they rent, it's a, the 155 centimeter person rents a bike with a 51 centimeter frame, so this is 51 centimeters. But what is the predicted? Well, that's where we can use our regression equation that Vera came up with. The predicted, I'll do that in orange, the predicted is going to be equal to 1/3 plus 1/3 times the person's height. Their height is 155. That's the predicted. Y-hat is what our linear regression predicts or our line predicts. So what is this going to be? This is going to be equal to 1/3 plus 155 over three, which is equal to 156 over three, which comes out nicely to 52. So the predicted on our line is 52.

And so here, so this person is 155, we can plot 'em right over here, 155. They're coming in slightly below the line. So they're coming in slightly below the line right there, and that distance, which is, and we can see that they are below the line, so the distance is going to be, or in this case, the residual is going to be negative. So this is going to be negative one.

And so if we were to zoom in right over here, you can't see it that well, but let me draw it. So if we zoom in, let's say we were to zoom in the line, and it looks like this. And our data point is right, our data point is right over here. We know we're below the line, and it's just gonna be a negative residual. And the magnitude of that residual is how far we are below the line. And in this case, it is negative one. And so that is our residual. This is what actual, the actual data minus what was predicted by our regression line.
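
The arithmetic in the transcript can be checked with a short Python sketch. The names `predict_frame_size` and `residual` are illustrative and do not come from the video; exact fractions are used so that 1/3 plus 155/3 comes out to exactly 52 rather than a rounded decimal.

```python
from fractions import Fraction

# Regression line from the video: y-hat = 1/3 + (1/3) * x
INTERCEPT = Fraction(1, 3)
SLOPE = Fraction(1, 3)


def predict_frame_size(height_cm):
    """Frame size (cm) the regression line predicts for a given height (cm)."""
    return INTERCEPT + SLOPE * height_cm


def residual(actual, predicted):
    """Residual = actual - predicted; negative means the point is below the line."""
    return actual - predicted


predicted = predict_frame_size(155)   # 1/3 + 155/3 = 156/3 = 52
r = residual(51, predicted)           # 51 - 52 = -1
print(predicted, r)                   # prints: 52 -1
```

The sign carries the interpretation: a residual of -1 means the customer's actual frame size is 1 centimeter below what the line predicts for their height.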