# Linear regression review

Linear regression is a process of drawing a line through data in a scatter plot. The line summarizes the data, which is useful when making predictions.

### What is linear regression?

When we see a relationship in a scatter plot, we can use a line to summarize that relationship. We can also use the line to make predictions. This process is called linear regression.

### Fitting a line to data

There are more advanced ways to fit a line to data, but in general, we want the line to go through the "middle" of the points.

Practice problem: Which line fits the data graphed below?
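
One of the more advanced ways of fitting a line is the least-squares method, which picks the slope and intercept that minimize the squared vertical distances from the points to the line. The sketch below is only an illustration of that idea: the function name `fit_line` and the five data points are hypothetical and do not come from this lesson.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (numerator of the slope).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x (denominator of the slope).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points, invented purely for illustration.
sample_xs = [1, 2, 3, 4, 5]
sample_ys = [2, 4, 5, 4, 5]

fit_slope, fit_intercept = fit_line(sample_xs, sample_ys)
print(f"y = {fit_slope:.2f}x + {fit_intercept:.2f}")
```

Because the fitted line passes through the point of averages, it runs through the "middle" of the data in the sense described above.
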

### Using equations for lines of fit

Once we fit a line to data, we find its equation and use that equation to make predictions.

#### Example: Finding the equation

The percent of adults who smoke, recorded every few years since $1967$, suggests a negative linear association with no outliers. A line was fit to the data to model the relationship.

Write a linear equation to describe the given model.

Step 1: Find the slope.

This line goes through $\left(0,40\right)$ and $\left(10,35\right)$, so the slope is $\frac{35-40}{10-0}=\frac{-5}{10}=-\frac{1}{2}$.

Step 2: Find the $y$-intercept.

We can see that the line passes through $\left(0,40\right)$, so the $y$-intercept is $40$.

Step 3: Write the equation in $y=mx+b$ form.

The equation is $y=-0.5x+40$.

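
The three steps above can be sketched in a few lines of Python. The helper name `line_through` is an assumption made for this illustration; the two points are the ones read from the line in the example.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points (x1, y1), (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    # Step 1: slope is the change in y divided by the change in x.
    slope = (y2 - y1) / (x2 - x1)
    # Step 2: solve y1 = slope * x1 + b for the y-intercept b.
    intercept = y1 - slope * x1
    return slope, intercept


# Points from the example: (0, 40) and (10, 35).
m, b = line_through((0, 40), (10, 35))

# Step 3: write the equation in y = mx + b form.
print(f"y = {m}x + {b}")
```

The sketch assumes the two points have different $x$-values; a vertical line has no slope and would cause a division by zero.
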
Based on this equation, estimate what percent of adults smoked in $1997$.

To estimate what percent of adults smoked in $1997$, we substitute $x=30$, because $x$ represents the number of years since $1967$ and $1997-1967=30$:

$$\begin{aligned} y &= -0.5x+40 \\ y &= \left(-0.5\right)\left(30\right)+40 \\ y &= -15+40 \\ y &= 25 \end{aligned}$$

Based on the equation, about $25\%$ of adults smoked in $1997$.
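
The same prediction can be checked with a short script. This is a minimal sketch: the names `BASE_YEAR` and `predicted_percent` are made up for the illustration, and the slope and intercept are the ones from the equation $y=-0.5x+40$.

```python
BASE_YEAR = 1967  # x = 0 corresponds to this year in the example


def predicted_percent(year, slope=-0.5, intercept=40.0):
    """Estimate the percent of adults who smoke in a given year using y = mx + b."""
    x = year - BASE_YEAR  # convert the calendar year to years since 1967
    return slope * x + intercept


print(predicted_percent(1997))  # prints 25.0
```

Converting the calendar year to "years since $1967$" before substituting is the step that is easiest to overlook when using a model like this one.
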

Practice problem: Jacob distributed a survey to his fellow students asking how many hours they had spent playing sports in the past day. He also asked them to rate their mood on a scale from $0$ to $10$, with $10$ being the happiest. A line was fit to the data to model the relationship.

Which of these linear equations best describes the given model?

Based on this equation, estimate the mood rating for a student who spent $2.5$ hours playing sports.

