## Least-squares regression equations

# Calculating residual example

AP Stats: DAT‑1 (EU), DAT‑1.E (LO), DAT‑1.E.1 (EK)

## Video transcript

- [Instructor] Vera rents bicycles to tourists. She recorded the height, in centimeters, of each customer and the frame size, in centimeters, of the bicycle that customer rented. After plotting her results, Vera noticed that the relationship between the two variables was fairly linear, so she used the data to calculate the following least-squares regression equation for predicting bicycle frame size from the height of the customer. And this is the equation: y-hat is equal to 1/3 plus 1/3 x.

So before I even look at this question, let's just think about what she did. She had a bunch of customers, and she recorded, given the height of the customer, what size frame that person rented. And so she might've had something like this, where on the horizontal axis you have height measured in centimeters, and on the vertical axis you have frame size, also measured in centimeters. And so there might've been someone who measures 100 centimeters in height who gets a 25 centimeter frame. I don't know if that's reasonable or not, for you bicycle experts, but let's just go with it. And so she would've plotted it there. Maybe there was another person of 100 centimeters in height who got a frame that was slightly larger, and she plotted it there. And then, she did a
least-squares regression. A least-squares regression is trying to fit a line to this data. Oftentimes, you would use a spreadsheet or a computer. And that line is trying to minimize the sum of the squared vertical distances between these points and the line. And so the least-squares regression line, maybe it would look something like this, and this is just a rough estimate of it. Let me get my ruler tool. It might look something like this. So that would be the line. So our regression line, y-hat, is equal to 1/3 plus 1/3 x.

And you could view this as a way of either modeling the relationship or predicting that, hey, if I get a new person, I could take their height, put it in as x, and figure out what frame size they're likely to rent. But they ask us, what is
the residual of a customer with a height of 155 centimeters who rents a bike with a 51 centimeter frame? So how do we think about this? Well, the residual is going to be the difference between the actual value and what our regression line would have predicted. So we could say, let me write it this way, residual is going to be actual minus predicted. So if predicted is larger than actual, this is going to be a negative number. If predicted is smaller than actual, this is going to be a positive number.

Well, we know the actual. They tell us that the 155 centimeter person rents a bike with a 51 centimeter frame, so this is 51 centimeters. But what is the predicted? Well, that's where we can
use the regression equation that Vera came up with. The predicted, I'll do that in orange, is going to be equal to 1/3 plus 1/3 times the person's height. Their height is 155. That's the predicted. Y-hat is what our linear regression, our line, predicts. So what is this going to be? This is going to be equal to 1/3 plus 155 over three, which is equal to 156 over three, which comes out nicely to 52. So the predicted on our line is 52.

And so here, this person is 155, we can plot 'em right over here. They're coming in slightly below the line right there, and because they are below the line, the residual is going to be negative. It's 51 minus 52, so this is going to be negative one.

And if we were to zoom in right over here, you can't see it that well, but let me draw it. Let's say we zoom in on the line, and it looks like this. And our data point is right over here. We know we're below the line, so it's going to be a negative residual. And the magnitude of that residual is how far we are below the line. In this case, we are one centimeter below the line, so the residual is negative one. And so that is our residual: the actual data minus what was predicted by our regression line.
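
To double-check the arithmetic, here is a minimal Python sketch of the same calculation. It is not part of the video: the names `predicted_frame_size` and `residual` are made up for illustration, and exact fractions are used, on the assumption that we want 1/3 to stay exact rather than be rounded as a decimal.

```python
from fractions import Fraction

# Coefficients of Vera's regression line as stated in the transcript:
# y-hat = 1/3 + (1/3) * x, where x is the customer's height in centimeters.
INTERCEPT = Fraction(1, 3)
SLOPE = Fraction(1, 3)


def predicted_frame_size(height_cm):
    """Frame size (cm) that the regression line predicts for a given height (cm)."""
    return INTERCEPT + SLOPE * height_cm


def residual(height_cm, actual_frame_cm):
    """Residual = actual - predicted (hypothetical helper, named for illustration)."""
    return actual_frame_cm - predicted_frame_size(height_cm)


predicted = predicted_frame_size(155)  # 1/3 + 155/3 = 156/3 = 52
res = residual(155, 51)                # 51 - 52 = -1
print(predicted, res)                  # prints: 52 -1
```

A negative result means the data point sits below the regression line, and a positive result means it sits above.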