Video transcript

In the last video, we were able to find the equation for the regression line for these four data points. What I want to do in this video is figure out the r-squared for these data points: figure out how well this line fits the data, or, even better, figure out the percentage (which is really the same thing) of the variation of these data points, especially the variation in Y, that is due to, or that can be explained by, variation in X.

To do that, I'm actually going to get a spreadsheet out. I have tried to do this with a calculator and it's much harder, so hopefully it doesn't confuse you too much to use a spreadsheet. Spreadsheets actually have functions that will do all of this automatically, but I really want to do it so that you could do it by hand if you had to.

I'm going to make a couple of columns here. This is going to be my X column. This is going to be my Y column. This next column I'll call y star; this will be the Y value that our line predicts based on our x value. The next one is going to be the squared error with the line. And then the next one is going to be the squared variation of that Y value from the mean Y. I think these columns by themselves will be enough for us to do everything.

So let's first put all the data points in. We had (-2, -3); that was one data point.
Then (-1, -1), then (1, 2), and then (4, 3).

Now, what does our line predict? Well, our line says: you give me an x value, and I'm going to tell you what Y value I'll predict. So when X is equal to negative 2, the y value on the line is going to be the slope, so this is going to be equal to 41 divided by 42, times our x value. And just a little bit of a primer on spreadsheets: I'm selecting cell D2. I was able to just move my cursor over and select it, and that gives me the x value. Then minus 5 divided by 21, just like that.

Just to be clear about what we're even doing: for this y star here I got negative 2.19. That tells us that this point on the line, right over here, is at negative 2.19. So when we figure out the error, we're going to figure out the distance between negative 3, that's our Y value, and negative 2.19.

So let's do that. The error is just going to be equal to our Y value, that's cell E2, minus the value that our line would predict. That value is the actual error, but we want to square it, so we will square it, just like that.

And then the next thing we want is the squared distance of our Y value from the mean of the Y's. So what's the mean of the Y's? The mean of the Y's is 1/4. So it's our Y value minus 0.25, which is the same thing as 1/4, and we also want to square that.

Now, this is what's fun about spreadsheets: I can apply those formulas to every row. And notice what it did when I did that. Now, all of a sudden, this is the
y-value that my line would predict: it's now using this x-value and sticking it in over here. It's now figuring out the squared distance from the line, using what the line would predict and this y-value. And then it does the same thing over here: it figures out the squared distance of this y value from the mean.

So what is the total squared error with the line? Let me just sum this up. The total squared error with the line is 2.73. And then the total variation from the mean, the sum of the squared distances from the mean of the Y's, is 22.75.

Let me be very clear about what this is, so let me write these numbers down. I'll write it up here so we can keep looking at the actual graph. Our total squared error with the line we just computed to be 2.74; I rounded a little bit. What that is: you take each data point's vertical distance to the line, so this distance squared, plus this distance squared, plus this distance squared, plus this distance squared. That's all we just calculated in Excel. That total squared variation to the line, or total squared error with the line, is 2.74.

And then the other number we figured out was the total squared distance from the mean. The mean here is y equals 1/4, so that's going to be right over here. This is one half, so y equals one fourth is right over here. Let me draw it a little bit neater than that. This is our mean Y value, or the central tendency for our Y values. So what we calculated next was the total squared error from the mean of our Y values. That's what we calculated over here in the spreadsheet. You see it in the formula: it is this
number, E2, minus 0.25, which is the mean of our Y's, squared. That's exactly what we calculated. We calculated it for each of the Y values, and then we summed them all up: it is equal to 22.75.

So this first number is essentially the error that the line does not explain, and this is the total variation in the numbers. If you wanted to know the percentage of the total variation that is not explained by the line, you could take this number divided by this number: 2.74 over 22.75. This tells us the percentage of total variation not explained by the line, or by the variation in X.

And so what is this number going to be? I could just use Excel for this. I'm just going to divide this number by this number right over there, and I get 0.12. Another way to think about it is that 12% of the total variation is not explained by the variation in X. In other words, that fraction of the total squared distance of the points from the mean, their spread or variation, is not explained by the variation in X.

So if you want the amount that is explained by the variation in X, you just subtract that from one. Let me write it right over here. Our r-squared, which is the percent of the total variation that is explained by X, is going to be 1 minus that 0.12 that we just calculated, which is going to be 0.88. So r-squared here is 0.88. It's very, very close to one; the highest number it can be is one. So what this tells us, or a way to interpret it, is that 88% of the total
variation of these Y values is explained by the line, or by the variation in X. And you can see that it looks like a pretty good fit. Each of these points is definitely much closer to the line than it is to the mean line. In fact, all of them are closer to our actual line than to the mean.
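
For anyone who would rather check the spreadsheet arithmetic in code, here is a minimal sketch in Python. It assumes the four data points and the line y = (41/42)x - 5/21 quoted in the video; the function name `r_squared` and the variable names are made up for this illustration and do not come from the video.

```python
# The four data points and the regression line quoted in the video.
xs = [-2, -1, 1, 4]
ys = [-3, -1, 2, 3]
slope = 41 / 42
intercept = -5 / 21


def r_squared(xs, ys, slope, intercept):
    """Return (squared error with line, squared error from mean, r-squared).

    Hypothetical helper that mirrors the spreadsheet columns in the video.
    """
    mean_y = sum(ys) / len(ys)
    # "y star" column: the Y value the line predicts for each x.
    predicted = [slope * x + intercept for x in xs]
    # "Squared error with line" column, summed over all rows.
    se_line = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    # "Squared variation from the mean Y" column, summed over all rows.
    se_mean = sum((y - mean_y) ** 2 for y in ys)
    # Fraction of variation explained = 1 - fraction not explained.
    return se_line, se_mean, 1 - se_line / se_mean


se_line, se_mean, r2 = r_squared(xs, ys, slope, intercept)
print(f"total squared error with line: {se_line:.2f}")    # 2.74
print(f"total squared error from mean: {se_mean:.2f}")    # 22.75
print(f"fraction not explained: {se_line / se_mean:.2f}") # 0.12
print(f"r-squared: {r2:.2f}")                             # 0.88
```

The helper takes the slope and intercept as inputs rather than refitting the line, since the video also starts from a line found earlier.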