© 2023 Khan Academy

# Linear regression review

Linear regression is a process of drawing a line through data in a scatter plot. The line summarizes the data, which is useful when making predictions.

### What is linear regression?

When we see a relationship in a scatter plot, we can use a line to summarize that relationship. We can also use the line to make predictions. This process is called **linear regression**.

*Want to see an example of linear regression? Check out this video.*

### Fitting a line to data

There are more advanced ways to fit a line to data, but in general, we want the line to go through the "middle" of the points.
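One of those more advanced ways is least-squares regression, which picks the line that minimizes the sum of squared vertical distances from the points to the line. The sketch below is only an illustration of that idea: the function name `fit_line` and the four data points are made up for this example and do not come from the lesson.

```python
def fit_line(points):
    """Return (slope, intercept) of the least-squares line through points.

    Assumes at least two points with different x-values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)

    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, invented for illustration only.
points = [(1, 2), (2, 3), (3, 5), (4, 6)]
slope, intercept = fit_line(points)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

For these made-up points the fitted line has a slope of about $1.4$ and a $y$-intercept of about $0.5$, which runs through the "middle" of the points in the sense described above.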

*Want to learn more about fitting a line to data? Check out this video.*

*Want to practice more problems like this? Check out this exercise.*

### Using equations for lines of fit

Once we fit a line to data, we find its equation and use that equation to make predictions.

#### Example: Finding the equation

The percent of adults who smoke, recorded every few years since $1967$, suggests a negative linear association with no outliers. A line was fit to the data to model the relationship.

**Write a linear equation to describe the given model.**

**Step 1:** Find the slope.

This line goes through $(0,40)$ and $(10,35)$, so the slope is $\dfrac{35-40}{10-0}=-\dfrac{1}{2}$.

**Step 2:** Find the $y$-intercept.

We can see that the line passes through $(0,40)$, so the $y$-intercept is $40$.

**Step 3:** Write the equation in $y=mx+b$ form.

The equation is $y=-0.5x+40$.

**Based on this equation, estimate what percent of adults smoked in $1997$.**

To estimate what percent of adults smoked in $1997$, we can plug in $30$ for $x$ (since $x$ represents years since $1967$):

$y=-0.5(30)+40=-15+40=25$

Based on the equation, about $25\%$ of adults smoked in $1997$.
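The same steps can be checked with a short script. This is a minimal sketch, not part of the lesson: the helper names `line_through` and `predict` are made up here, and the only inputs are the two points $(0,40)$ and $(10,35)$ read from the line in the example.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)      # Step 1: rise over run
    intercept = y1 - slope * x1        # Step 2: solve y = mx + b for b
    return slope, intercept


# Two points the fitted line passes through, taken from the example.
slope, intercept = line_through((0, 40), (10, 35))


def predict(x):
    """Step 3: evaluate y = mx + b, where x is years since 1967."""
    return slope * x + intercept


years_since_1967 = 1997 - 1967         # x = 30
estimate = predict(years_since_1967)
print(f"y = {slope}x + {intercept}")
print(f"Estimated percent of adults who smoked in 1997: {estimate}")
```

Writing the prediction as a function of $x$ makes it easy to try other years, though estimates far outside the range of the recorded data should be treated with caution.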

*Want to practice more problems like these? Check out this exercise.*

## Want to join the conversation?

- How will I know for sure if my rounding to the nearest hundred is correct?(4 votes)
- Then you check your answer again and see if you got it right or wrong.(2 votes)

- In the practice, it asks for the exact number. If I get 97 as an average for an answer, it says my answer is wrong and that the answer is something like 95 or 96.(2 votes)
- How do you calculate linear regression by hand if you don't have a graphing calculator?(2 votes)
- In a later course, Sal describes least-squares regression.(1 vote)

- Does the line have to have a positive slope for there to be a linear relationship?(1 vote)
- Absolutely not! Slopes can be negative too. That just means $m$ is a negative number in the slope-intercept form y=mx+b, for example y=-0.5x+40.(3 votes)

- What if the y-intercept is not given? How do you find it then?(2 votes)
- You can also look at the formula of the equation.(2 votes)

- Does the line have a slope?(0 votes)
- Can you explain how to find the formula? I am still not understanding.(0 votes)
- How would you apply linear regression to a data table?(0 votes)
- You first plot the data points in a scatter plot. If you had "hours playing sports" as your column header, and "mood rating" as your row header, each value could be plotted on a graph, and then you would find the regression line.(1 vote)