# Fitting a line to data

## Video transcript

In this video I want to give you an example of what it means to fit a line to data. Instead of doing my traditional video using my little pen tablet, I'm going to do it straight in Excel, so you can see how to do this for yourself if you have Excel or some other type of spreadsheet program. We're not going to go into the math of it. I really just want you to get the conceptual understanding of what it means to fit data with a line, or to do a linear regression.

So let's just read the problem. "The following table shows the median California income" (remember, the median is the middle income) "from 1995 to 2002, as reported by the US Census Bureau. Draw a scatterplot and find the equation. What would you expect the median annual income of a California family to be in the year 2010? What are the meanings of the slope and the y-intercept in this problem?"

The first thing you'd want to do (I just copied and pasted this image) is get the data into a form that the spreadsheet can understand. So let's make some tables here. Let's say "years since 1995"; let's make that one column. Let me make this a little bit wider. And then let me put "median income"; this is the median income in California for a family. We start off with 0 years since 1995: 0, 1, 2, 3, 4. And actually, if you want, you can just keep dragging down and it'll figure out the trend; it'll figure out that you're just incrementing by 1.

And then for the income, I'll just copy in these numbers right there. So that's 53,807; 55,217; 55,209; 55,415; 63,100; 63,206; 63,761; and then we have 65,766 dollars. I don't need these over here, so I'm going to get rid of them; I can clear them. Let me make sure I have enough entries. This is one, two, three, four,
five, six, seven, eight, and I have one, two, three, four, five, six, seven, eight entries. Now let me check my data: 53,807; 55,217; 55,209; 415; 100; 206; 761; 766. OK, there we go.

Now, you're going to find that in Excel this is incredibly easy, if you know what to click on, to plot this data, create a scatter plot, and then, even better, create a regression of that data. All you have to do is select the data, and then you go to Insert, and I'm going to insert a scatter plot. Then you can pick the different types of scatter plots. I just want to plot the data. And there you go: it plotted the data for me. This is the actual income, and this is years since 1995. So this is 1995, when it was 53,807; in 1996 it's 55,217. So it plotted all the data.

Now what I want to do is fit a line. This isn't exactly a line, but let's assume that a line can model this data well. I'm going to get Excel to fit a line for me. I have all of these options up here for different ways to fit a line, and I'm going to pick this one here. You might not be able to see it: it looks like it has a line between dots, and it also has "fx", which tells me it's going to give me the equation of the line. So if I click on that, there you go. It not only fit the line, it replotted that same data on a different graph. Let me make it a little bit bigger. We can cover up the data now, because I think we know what's going on. So let me cover it up, right like that. Not only did it plot the various data points, it actually fit a line to that data, and it gave me the equation of that line. Let me see if I can make this a little bit bigger. I'll move it out of the way so you can read it, at least. So it tells me right here
that the equation for this line is y = 1882.3x + 52,847. If you remember what we know about slope and y-intercept, the y-intercept is 52,847, which is where this line intersects the vertical axis at year zero, or 1995. So if you use this line as a model, in 1995 the line would say that you're going to make 52,847 dollars. The actual data was a little bit off of that; it was a little bit higher, 53,807. But we're trying to get a line that gets as close as possible to all of this data. It's actually trying to minimize the sum of the squares of the vertical distances between these points and the line. We won't go into the math there, but it gave us this nice equation.

Now we can use this nice equation to predict things, if we say that this is a good model for the data. Let me bring this down a little bit and let's try to answer our question. We drew a scatterplot (really, Excel did it for us) and we found the equation right there. They ask, what would you expect the median annual income of a California family to be in the year 2010? Here we can just use the equation it gave us. This right here was 2002, so the year 2010 is 8 more years. Let me make a little column here. So this is the year: 1995, 1996, and then Excel will be able to figure out, if I select those and go to this little bottom-right square and drag down, that I want to increment by one year every time. And for years since 1995, once again I can just continue this trend right here. So 2010 would be 15 years. And so we can just apply this equation: we
could say it's going to be equal to, according to this line (I'm just going to type it in; hopefully you can read what I'm typing), 1882.3 times x, where x is the years since 1995. So I could just select this cell, or I could type in the number 15; that means times this cell, times 15. And then plus 52,847. Press Enter, and it predicts 81,081 dollars and 50 cents. So if you just continue this line for another eight or so years, it predicts that the median income in California for a family will be about 81,000 dollars.

Anyway, hopefully you found that interesting. Spreadsheets are very useful tools for manipulating data, and this gives you a sense of why linear models are interesting, why lines are interesting, and how you can actually use these tools to interpret data and maybe even extrapolate some type of prediction. This right here is an extrapolation using this linear regression.
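
If you don't have a spreadsheet handy, the same fit can be reproduced in a few lines of code. What follows is only a minimal sketch in Python: it assumes the standard least-squares formulas for a line, which the video does not derive, and the names `fit_line`, `years_since_1995`, and `median_income` are made up for this illustration. The eight data values are the incomes read out in the video.

```python
# Least-squares line through the California median income data from the video.
# fit_line is a hypothetical helper written for this sketch, not an Excel or library function.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

years_since_1995 = [0, 1, 2, 3, 4, 5, 6, 7]
median_income = [53807, 55217, 55209, 55415, 63100, 63206, 63761, 65766]

slope, intercept = fit_line(years_since_1995, median_income)
print(f"y = {slope:.2f}x + {intercept:.2f}")

# Extrapolate to 2010 (15 years since 1995) with the rounded coefficients shown in the video.
prediction_2010 = 1882.3 * 15 + 52847
print(f"Predicted 2010 median income: {prediction_2010:.2f}")
```

Run on this data, the sketch should give a slope of about 1882.25 and an intercept of about 52847.25, which round to the 1882.3 and 52,847 that Excel displayed. Plugging x = 15 into the rounded equation reproduces the 81,081.50 typed into the spreadsheet.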