Scatterplots | Lesson

A guide to scatterplots on the digital SAT

What are scatterplots?

A scatterplot displays data about two variables as a set of points in the xy-plane. Each axis of the plane usually represents a variable in a real-world scenario.
A scatterplot is graphed on the xy-plane. The data points form a tight diagonal cluster that trends upward.
In this lesson, we'll learn to:
  1. Use the line of best fit to describe scatterplots
  2. Make predictions using the line of best fit
  3. Fit functions to scatterplots
This lesson builds upon the following skills:
  • Data representations
  • Graphs of linear equations and functions
  • Quadratic graphs
  • Exponential graphs
You can learn anything. Let's do this!

How do we talk about scatterplots?

Bivariate relationship linearity, strength and direction

What is the line of best fit?

Interpreting a trend line

The line of best fit

While each point in a scatterplot represents a specific observation, the line of best fit describes the general trend based on all of the points.
For a given data point, we expect to see a difference between its actual y-value and the y-value predicted by the line of best fit. These differences, called residuals, are used in more advanced statistical analysis; for the SAT, we only need to be able to calculate them.
We can also interpret the slope and y-intercept of the line of best fit the same way we interpret line graphs:
  • The slope represents a constant rate of change.
  • The y-intercept represents an initial value.
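To make the difference calculation concrete, here is a minimal Python sketch. The line y = 2x + 3 and the data point (4, 9) are made-up values chosen only for illustration; they do not come from any scatterplot in this lesson.

```python
# Hypothetical line of best fit y = 2x + 3 (slope and intercept are made up).
SLOPE = 2.0
Y_INTERCEPT = 3.0

def predicted_y(x):
    """y-value that the line of best fit predicts for a given x."""
    return SLOPE * x + Y_INTERCEPT

def positive_difference(point):
    """Positive difference between a data point's y-value and the prediction."""
    x, actual_y = point
    return abs(actual_y - predicted_y(x))

data_point = (4.0, 9.0)  # hypothetical observation
print(predicted_y(4.0))                 # 11.0
print(positive_difference(data_point))  # 2.0
```

In this sketch, the slope of 2 means the predicted y-value rises by 2 for each 1-unit increase in x, and the y-intercept of 3 is the predicted value when x = 0.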

Try it!

Try: find the difference between predicted and actual values
A scatterplot and its line of best fit are shown in the xy-plane. The points form a linear cluster that trends upward from left to right. The line of best fit passes through the points (5, 5.2) and (15, 11). Point m is located at (15, 5).
A scatterplot and its line of best fit are shown in the xy-plane above.
The line of best fit passes through the point (15, ____).
Point m has the coordinates (15, ____).
The positive difference in y-value between the data point and the line of best fit is ____.
Each answer should be an integer.


Try: interpret the meaning of the line of best fit
A scatterplot graphs the relative housing cost, in percent of the national average cost, versus population density, in people per square mile of land area. The points form a cluster that trends linearly upward from left to right. The line of best fit is also shown.
The scatterplot above shows the relative housing cost and the population density for several large US cities in the year 2005. The equation of the line of best fit is y = 0.0125x + 61.
The constant 61 means that when the population density is 0 people per square mile of land area, the relative housing cost is ____.
The coefficient 0.0125 means that as the population density increases by 1,000 people per square mile of land area, the relative housing cost increases by ____ of the national average cost.


How do I use the line of best fit to make predictions?

Line of best fit: smoking in 1945

Predicting what we can and cannot see

When making predictions based on scatterplots, always use the line of best fit instead of individual data points.
If the prediction lies within the part of the xy-plane shown, it must lie on the line of best fit.
If the prediction lies beyond the part of the xy-plane shown, we can either extend the line of best fit or use its equation to find the prediction.
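As a rough sketch of both approaches, the Python below first uses the equation y = 0.0125x + 61 from the housing-cost example earlier in this lesson, and then rebuilds a line from two points read off a graph so that it can be extended. The density of 10,000 and the two points (2, 10) and (6, 30) are assumptions picked purely for illustration.

```python
def housing_cost(density):
    """Predicted relative housing cost (% of national average) from y = 0.0125x + 61."""
    return 0.0125 * density + 61

def line_through(p1, p2):
    """Return (slope, y_intercept) of the line through two points with different x-values."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

# Using the equation directly; 10,000 people per square mile is an illustrative input.
print(f"{housing_cost(10_000):.1f}")  # 186.0

# Extending a line beyond the graph: two hypothetical points read off a line of best fit.
slope, intercept = line_through((2, 10), (6, 30))
print(slope * 10 + intercept)  # 50.0
```

Either way, the prediction comes from the line itself, never from a single nearby data point.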

Try it!

Try: predict using the line of best fit
A scatterplot graphs the relative housing cost, in percent of the national average cost, versus population density, in people per square mile of land area. The points form a cluster that trends linearly upward from left to right. The line of best fit is also shown.
The scatterplot above shows the relative housing cost and the population density for several large US cities in the year 2005. The equation of the line of best fit is y = 0.0125x + 61.
According to the graph, the predicted relative housing cost for a population density of 15,000 people per square mile of land area is approximately ____ of the national average cost.
According to the equation of the line of best fit, the predicted relative housing cost for a population density of 5,000 people per square mile of land area is ____ of the national average cost.


How do I fit functions to scatterplots?

Use direction and intercepts to determine the best fit

On the SAT, questions that ask you to fit a function to a scatterplot are always multiple choice, and all four choices are usually functions of the same type, e.g., four linear functions or four quadratic functions.
For linear functions in the form f(x)=mx+b:
  • Sketch a line that fits the data and approximate its slope.
  • The value of m should match the slope. Make sure to pay attention to the signs!
  • Approximate the y-intercept of the function that best fits the data. Make sure the constant term b matches the y-intercept.
For quadratic functions in the form f(x)=ax²+bx+c:
  • Sketch a parabola that approximately fits the data.
  • If the parabola opens upward, a should be positive. If the parabola opens downward, a should be negative.
  • Approximate the y-intercept of the function that best fits the data. Make sure the constant term c matches the y-intercept.
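The linear checklist above can be mimicked in code. This is only a sketch: the data points and the four answer choices are invented, and a least-squares calculation stands in for sketching a line by eye, which is all the SAT expects.

```python
# Hypothetical data points that trend downward (invented for illustration).
points = [(1, 9.2), (2, 7.1), (3, 4.8), (4, 3.1), (5, 0.9)]

# Hypothetical answer choices, stored as (m, b) for f(x) = m*x + b.
choices = {
    "A": (2.0, 11.0),
    "B": (-2.0, 11.0),
    "C": (-2.0, -11.0),
    "D": (2.0, -11.0),
}

def least_squares_line(pts):
    """Estimate (slope, y_intercept) with the standard least-squares formulas."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def same_sign(a, b):
    """True when a and b are both positive or both non-positive."""
    return (a > 0) == (b > 0)

m_est, b_est = least_squares_line(points)

# Keep only the choices whose slope and y-intercept signs match the estimates.
matching = [
    label for label, (m, b) in choices.items()
    if same_sign(m, m_est) and same_sign(b, b_est)
]
print(matching)  # ['B']
```

The same kind of sign check carries over to quadratic choices: compare the sign of a with whether the parabola opens upward or downward, and compare c with the y-intercept.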

Try it!

Try: describe a modeling function for a scatterplot
A scatterplot graphs shoulder height in centimeters versus foot length in centimeters. The points form a tight cluster that trends upward from left to right.
The scatterplot above shows the foot lengths and shoulder heights of the elephants in Kruger National Park in South Africa.
According to the scatterplot, as foot length increases, shoulder height generally ____. Therefore, the slope of the line of best fit for this scatterplot is ____.
If we sketch a line of best fit for the scatterplot, the y-intercept of the line would be close to 0 and slightly ____.


Your turn!

Practice: find the difference between data and prediction
A scatterplot graphs height in inches versus width in inches for 12 picture frames. The line of best fit for the scatterplot rises through approximately (5, 4) and (25, 30). The width and height of the frames are as follows: 5 by 5, 7.5 by 5.5, 9 by 12, 11.5 by 17.5, 12 by 9, 12 by 15, 12 by 18, 13 by 9.5, 15 by 15, 18 by 13, 20 by 26, 25 by 34.
The scatterplot above shows the dimensions of 12 picture frames on Lee's wall along with the line of best fit. Which of the following statements about the widest picture frame is true?


Practice: interpret the line of best fit
A scatterplot graphs average rating in points versus price in dollars. The line of best fit for the scatterplot rises through (1, 0), (2, 5), and (3, 10).
A panel is rating different kinds of potato chips. The scatterplot above shows the relationship between their average ratings and the price of the chips. The line of best fit for the data is also shown. According to the line of best fit, which of the following is closest to the predicted increase in average rating for every $0.10 increase in price?


Practice: predict using the line of best fit
A scatterplot graphs mileage in thousands of miles versus car age in years. The line of best fit for the scatterplot rises through the points (3, 40) and (8, 100).
The scatterplot above shows data from a random sample of people who reported the age and mileage of their cars. A line of best fit for the data is also shown. Based on the line of best fit, which of the following is closest to the predicted mileage, in thousands of miles, of a car that is 13 years old?


Practice: fit a quadratic function to a scatterplot
A scatterplot graphs the number of employees remaining versus time in hours. As time increases, the number of employees remaining decreases slowly at first and more rapidly as time elapses.
The scatterplot above shows y, the number of employees remaining in an office building, x hours after the building's air conditioning stopped working. Of the following equations, which best models the data in the scatterplot?

