© 2024 Khan Academy

# Estimating with linear regression (linear models)

A line of best fit is a straight line that models the relationship between two variables in a scatter plot. We can use the line to make predictions. To find the equation of the line, we look at its slope and its y-intercept. Remember, this is just a model, so its predictions are estimates and won't always be perfect!
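In symbols, a line like this is usually written in slope-intercept form. The letters $m$ (slope) and $b$ (y-intercept) are the conventional names rather than notation taken from this lesson, and the hat on $\hat{y}$ marks a predicted value rather than an observed one:

$$\hat{y} = mx + b$$

For the line in this lesson's video, the slope and the y-intercept are both read off the graph as 20, so the model would be

$$\hat{y} = 20x + 20$$

where $x$ is the number of hours studied and $\hat{y}$ is the predicted test score.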

## Want to join the conversation?

- For the second part of the question, could you just plug 3.8 into the equation? Isn't that what the question was asking, and not just to look off the graph?(60 votes)
- Yes. The question specifically asks you to use the equation, in which case the answer would be 96, not 97, but since the true values seem to be varying by up to about +/-10, 97 isn't too bad an estimate.(26 votes)

- There is a huge mistake in this video. The estimate is wrong. I typed in 97 and it turns out it was 96, and then I had to restart everything because of this answer!(26 votes)
- The estimate is anything near 97. It doesn't have to be 97 exactly because it is an estimate. The way I got exactly 96 was by plugging in 3.8 for x in the equation.(21 votes)

- This video doesn't explain how to calculate plots that are not on the graph. How do we guess the future plots? What's the formula?(9 votes)
- You can find the equation of the line and simply plug the x value into the equation.(8 votes)

- How can you exactly estimate this?(10 votes)
- You take your variable for x (in this case 3.8) and plug it into the equation you chose. Sal simply eyeballed it, but in order to be more accurate, you should work it out with the equation.(6 votes)

- IT'S 96! All the trust I've built up for years... where is it? GONE!(7 votes)
- It is an estimate; it doesn't have to be exact. See, I know that, and my trust is unshaken.(5 votes)

- There are multiple complaints about this. As for myself, I do not understand this "simple" format.(8 votes)
- so weird that he wouldn't just plug 3.8 into the equation.....(6 votes)
- For the second part of the question, could you just plug 3.8 into the equation? Isn't that what the question was asking, and not just to look off the graph?(5 votes)
- yes, he just estimated it by looking at the graph, but yes, you should do that.(3 votes)

- In the next Practice lesson, Estimating equations of lines of best fit, and using them to make predictions, there is a multiple-choice question with 3 possible answers in slope-intercept form. But the y has a ^ over it. Can anybody tell me what this means? Here's the link: https://www.khanacademy.org/math/cc-eighth-grade-math/cc-8th-data/cc-8th-line-of-best-fit/e/equations-of-lines-of-best-fit-to-make-predictions?modal=1 (3 votes)
- y-hat is a point estimate of y. In simple terms, it is just an estimate of y.(5 votes)

- How do you find the formula, though?(5 votes)
- To find the formula for the linear equation of the line, you need to determine the slope (m) and the y-intercept (b) from the given data points. The slope (m) can be calculated by finding the change in y divided by the change in x between two points on the line. The y-intercept (b) is the y-coordinate where the line intersects the y-axis. Once you have the values of m and b, you can write the equation of the line in the form y = mx + b.(0 votes)
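
The steps described in the replies above (find the slope and y-intercept from two points on the line, then plug in x) can be sketched in a few lines of Python. This is only an illustration: the function names `line_through` and `predict` are made up for this example, and the two points are the ones read off the line in the video (the y-intercept at 20, and 40 one hour later).

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    slope = (y2 - y1) / (x2 - x1)   # change in y divided by change in x
    intercept = y1 - slope * x1     # solve y1 = slope * x1 + b for b
    return slope, intercept


def predict(slope, intercept, x):
    """Plug x into y = slope * x + intercept."""
    return slope * x + intercept


# Two points read off the fitted line in the video: (0, 20) and (1, 40).
m, b = line_through((0, 20), (1, 40))
estimate = predict(m, b, 3.8)
print(m, b, round(estimate, 1))  # 20.0 20.0 96.0
```

Plugging 3.8 into the equation gives 96, which matches the value several commenters report, while reading the graph by eye gives roughly 97.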

## Video transcript

- [Instructor] Liz's math test included a survey question asking how many hours students spent studying for the test. The scatter plot below shows the relationship between how many hours students spent studying and their score on the test. A line was fit to the data to model the relationship. They don't tell us how the line was fit, but this actually looks like a pretty good fit if I just eyeball it. Which of these linear equations best describes the given model?

So this, you know, this point right over here, this shows that some student at least self-reported they studied a little bit more than half an hour, and they didn't actually do that well on the test, looks like they scored a 43 or a 44 on the test. This right over here shows, or like this one over here is a student who says they studied two hours, and it looks like they scored about a 64, 65 on the test. And this over here or this over here looks like a student who studied over four hours, or they reported that, and they got, looks like a 95 or a 96 on the exam. And so then, and these are all the different students, each of these points represents a student, and then they fit a line.

And when they say which of these linear equations best describes the given model, they're really saying which of these linear equations describes or is being plotted right over here by this line that's trying to fit to the, that's trying to fit to the data. So essentially, we just want to figure out what is the equation of this line?

Well, it looks like the y-intercept right over here is 20. And it looks like all of these choices here have a y-intercept of 20, so that doesn't help us much. But let's think about what the slope is. When we increase by one, when we increase along our x-axis by one, so change in x is one, what is our change in y? Our change in y looks like, let's see, we went from 20 to 40. It looks like we went up by 20. So our change in y over change in x for this model, for this line that's trying to fit to the data, is 20 over one. So this is going to be our slope. And if we look at all of these choices, only this one has a slope of 20. So it would be this choice right over here.

Based on this equation, estimate the score for a student that spent 3.8 hours studying. So we would go to 3.8, which is right around, let's see, this would be, 3.8 would be right around here. So let's estimate that score. So if I go straight up, where do we intersect our model? Where do we intersect our line? So it looks like they would get a pretty high score. Let's see, if I were to take it to the vertical axis, it looks like they would get about a 97. So I would write that my estimate is that they would get a 97 based on this model.

And once again, this is only a model. It's not a guarantee that if someone studies 3.8 hours, they're gonna get a 97, but it could give an indication of what maybe, might be reasonable to expect, assuming that the time studying is the variable that matters. But you also have to be careful with these models because it might imply if you kept going that if you get, if you study for nine hours, you're gonna get a 200 on the exam, even though something like that is impossible. So you always have to be careful extrapolating with models, and take it with a grain of salt. This is just a model that's trying to fit to this data. And you might be able to use it to estimate things or to maybe set some form of an expectation, but take it all with a grain of salt.
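
The warning about extrapolation can be made concrete with a small Python sketch. The slope and y-intercept of 20 come from the video. Everything else is an assumption made for illustration: the maximum score of 100, the rough range of study times (about half an hour to a bit over four hours, judging from the points described), and the names `predict_score` and `check_prediction`.

```python
SLOPE = 20.0      # from the video: the line rises 20 points per hour
INTERCEPT = 20.0  # from the video: y-intercept of the fitted line

# Assumptions for illustration only (not stated in the lesson):
MAX_SCORE = 100.0            # highest score the test allows
OBSERVED_HOURS = (0.5, 4.5)  # rough span of study times in the scatter plot


def predict_score(hours):
    """Predicted score from the linear model."""
    return SLOPE * hours + INTERCEPT


def check_prediction(hours):
    """Return (predicted score, list of notes about how far to trust it)."""
    score = predict_score(hours)
    notes = []
    low, high = OBSERVED_HOURS
    if not low <= hours <= high:
        notes.append("extrapolating outside the observed study times")
    if not 0.0 <= score <= MAX_SCORE:
        notes.append("predicted score is not a possible test score")
    return score, notes


inside = check_prediction(3.8)   # within the data: no notes
outside = check_prediction(9)    # far beyond the data: both notes
print(inside)
print(outside)
```

Under these assumptions, 3.8 hours gives a prediction of 96 with no notes, while nine hours gives the impossible 200 mentioned in the video and triggers both notes.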