Video transcript

- [Instructor] Vera rents bicycles to tourists. She recorded the height, in centimeters, of each customer and the frame size, in centimeters, of the bicycle that customer rented. After plotting her results, Vera noticed that the relationship between the two variables was fairly linear, so she used the data to calculate the following least squares regression equation for predicting bicycle frame size from the height of the customer. And this is the equation. So before I even look at this question, let's just think about what she did.
So she had a bunch of customers, and she recorded, given the height of the customer, what size frame that person rented. And so she might've had something like this, where on the horizontal axis you have height measured in centimeters, and on the vertical axis you have frame size, which is also measured in centimeters. And so there might've been someone who measures 100 centimeters in height who gets a 25 centimeter frame. I don't know if that's reasonable or not, for you bicycle experts, but let's just go with it. And so she would've plotted it there. Maybe there was another person of 100 centimeters in height who got a frame that was slightly larger, and she plotted it there.
And then she did a least squares regression. A least squares regression is trying to fit a line to this data. Oftentimes, you would use a spreadsheet or a computer. And that line is trying to minimize the sum of the squares of the vertical distances between these points and the line. And so the least squares regression line, maybe it would look something like this, and this is just a rough estimate of it. Let me get my ruler tool. It might look something like this. So let me plot it. That would be the line. So our regression line, y-hat, is equal to 1/3 plus 1/3 x.
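As a brief aside, the quantity that a least squares regression minimizes can be written out as below. This is only a sketch in notation that does not appear in the video: the symbols a (intercept), b (slope), n (number of customers), and the pairs (x_i, y_i) for each customer's height and frame size are assumptions introduced here. Vera's line corresponds to a = 1/3 and b = 1/3.

```latex
\[
  \min_{a,\,b} \; \sum_{i=1}^{n} \bigl( y_i - (a + b\,x_i) \bigr)^2 ,
  \qquad
  \hat{y} = a + b\,x = \tfrac{1}{3} + \tfrac{1}{3}\,x
\]
```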
And so this regression line, you could view it as a way of modeling the relationship, or of predicting that, hey, if I get a new person, I could take their height, put it in as x, and figure out what frame size they're likely to rent. But they ask us, what is the residual of a customer with a height of 155 centimeters who rents a bike with a 51 centimeter frame? So how do we think about this?
Well, the residual is going to be the difference between the actual value, the frame size they actually rented, and what our regression line would have predicted. So we could say, let me write it this way, residual is going to be actual minus predicted. So if predicted is larger than actual, this is going to be a negative number. If predicted is smaller than actual, this is gonna be a positive number.
Well, we know the actual. They tell us that the 155 centimeter person rents a bike with a 51 centimeter frame, so this is 51 centimeters. But what is the predicted? Well, that's where we can use the regression equation that Vera came up with. The predicted, I'll do that in orange, is going to be equal to 1/3 plus 1/3 times the person's height. Their height is 155. That's the predicted. Y-hat is what our linear regression, our line, predicts. So what is this going to be? This is going to be equal to 1/3 plus 155 over three, which is equal to 156 over three, which comes out nicely to 52. So the predicted on our line is 52.
And so here, this person is 155, we can plot 'em right over here, 155. They're coming in slightly below the line, right there. And because they are below the line, the residual is going to be negative. It's 51 minus 52, so this is going to be negative one. And so if we were to zoom in right over here, you can't see it that well, but let me draw it.
So if we zoom in, let's say the line looks like this. And our data point is right over here. We know we're below the line, so it's gonna be a negative residual. And the magnitude of that residual is how far we are below the line. In this case, we are one centimeter below the line, so the residual is negative one. And so that is our residual. This is the actual data minus what was predicted by our regression line.
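The arithmetic in this example can be checked with a short Python sketch. It is only an illustration, not something shown in the video: the function names `predicted_frame_size` and `residual` are made up here, and exact fractions are used so that 1/3 does not pick up floating-point rounding.

```python
from fractions import Fraction


def predicted_frame_size(height_cm):
    """Predicted frame size from Vera's line: y-hat = 1/3 + (1/3) * x."""
    return Fraction(1, 3) + Fraction(1, 3) * height_cm


def residual(height_cm, actual_frame_cm):
    """Residual = actual minus predicted."""
    return actual_frame_cm - predicted_frame_size(height_cm)


# The customer from the question: 155 cm tall, rented a 51 cm frame.
print(predicted_frame_size(155))  # 52
print(residual(155, 51))          # -1
```

A negative result means the point sits below the regression line, and a positive result would mean it sits above the line.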