© 2024 Khan Academy

# Clusters in scatter plots

Learn what a cluster in a scatter plot is!

## What are clusters in scatter plots?

Sometimes the data points in a scatter plot form distinct groups. These groups are called **clusters**.

Consider the scatter plot above, which shows nutritional information for $16$ brands of hot dogs in $1986$. (Each point represents a brand.) The points form two clusters, one on the left and another on the right.

The left cluster consists of brands that tend to be low in calories and low in sodium.

The right cluster consists of brands that tend to be high in calories and high in sodium.
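One simple way to see two groups like these numerically is to sort the points by calories and split them at the widest gap. The sketch below is only an illustration under stated assumptions: the (calories, sodium) values are made up to resemble a two-cluster pattern and are not the 1986 hot dog data, and `split_at_largest_gap` is a hypothetical helper name.

```python
def split_at_largest_gap(points):
    """Split 2-D points into two groups at the widest gap along the x-axis."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    ordered = sorted(points)  # sorts by x (calories) first, then y
    # Distance between each pair of neighbouring x-values
    gaps = [ordered[i + 1][0] - ordered[i][0] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1  # first index of the right-hand group
    return ordered[:cut], ordered[cut:]


# Hypothetical (calories, sodium in mg) pairs, NOT the real 1986 data
brands = [(100, 350), (110, 380), (105, 360), (115, 400),
          (170, 500), (180, 540), (175, 520), (190, 560)]

left, right = split_at_largest_gap(brands)
print("left cluster:", left)
print("right cluster:", right)
```

This only works when the clusters are separated along one axis, as in the left/right description above. Data with more tangled groups would need a proper clustering method.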

## Practice problems

To better wrap our minds around the idea of clusters, let's try a couple of practice problems.

### Problem 1: Male and female fish

Adult male *Lamprologus callipterus* (a type of fish) are much bigger than their female counterparts. They weigh about $13$ times as much. Also, while females reach a length of $6$ centimeters, males reach a length of $15$ centimeters.

### Problem 2: SAT test scores

Some high school students in the U.S. take a test called the SAT before applying to colleges. The scatter plot below shows what percent of each state's college-bound graduates participated in the SAT in $2009$-$2010$, along with that state's average score on the math section.

There is a cluster of states with lower participation, and a cluster of states with higher participation.
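Once the points are sorted into the two clusters, one way to compare them is to average the math scores within each. The sketch below is a rough illustration: the (participation %, average math score) pairs are made up, and the 40% cutoff between clusters is an assumption. Neither comes from the 2009-2010 data.

```python
from statistics import mean

# Hypothetical (participation %, average math score) pairs, NOT real state data
states = [(4, 600), (6, 590), (8, 580), (10, 570),
          (65, 510), (70, 505), (75, 515), (80, 500)]

CUTOFF = 40  # assumed boundary between the two clusters, in percent

# Collect the math scores on each side of the cutoff
low = [score for pct, score in states if pct < CUTOFF]
high = [score for pct, score in states if pct >= CUTOFF]

low_mean = mean(low)
high_mean = mean(high)
print(f"lower-participation cluster mean: {low_mean}")
print(f"higher-participation cluster mean: {high_mean}")
```

With real data, the cutoff would be read off the scatter plot at the gap between the clusters rather than fixed in advance.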

## Why do clusters exist in data?

Explaining why clusters exist in a particular data set can be difficult. This article presented three data sets, each using data from the real world. Only in the fish data set was there a clear explanation behind the clusters.

If you have a theory that explains the clusters in either of the other data sets, please share your thoughts in the comments below.

## Want to join the conversation?

- I think that for the SAT problem, the clusters might exist because, in the states with lower participation, only the students who feel that taking the SAT is "worth it" or who have confidence in their abilities take the test. This theory makes sense because the math scores are higher in states with lower participation. I'm not sure if this is really the reason, but I gave this a shot! (77 votes)
- The ingredients in the hot dogs can affect their ratings (21 votes)
- anyone else on a chromebook?(10 votes)
- The cluster on the top left in SAT score probably has mostly east coast states lmaooo(11 votes)
- The hotdog brand clusters seem to be an example of competitive positioning in marketing. Hotdog brands need to be able to compete with other brands either by being healthier or by being tastier. Any brands with mid-range levels of sodium and calories will be cornered out of the market on all sides.(9 votes)
- Yes, that's a good point. Competitive positioning is an important aspect of marketing, and hot dog brands are no exception. Brands need to find a way to differentiate themselves from competitors and appeal to their target audience. In the case of hot dogs, some brands may choose to focus on health aspects, while others may focus on flavor and indulgence. This creates distinct clusters of brands with different value propositions, and as you mentioned, brands with mid-range levels of sodium and calories may struggle to find a place in the market.(4 votes)

- This was very helpful for me, but I am a little confused about #2. Could someone please explain it to me? (5 votes)
- The problem is categorizing the dots into either lower or higher participation. It is asking you to find the best fitting answer, but in this case, it is basically just true or false.

The first answer choice says, "The states with lower participation typically had lower math scores." If you look at the graph, all the green dots, which represent lower participation, are higher on the y-axis. The y-axis variable is test scores, so this statement is untrue.

The second choice says, "The states with lower participation typically had higher math scores." Check the graph again. The green dots are high on the y-axis, representing higher test scores. This statement is true.(11 votes)

- why is the pizza box a square if the pizza is a circle but the pizza slice is a triangle?(3 votes)
- The pizza box is square because it is easier and more efficient to manufacture, stack, and transport square boxes. A square box is also more space-efficient than a round box, which would require more material to make and take up more space. As for why the pizza is a circle, it is because a circle is the most efficient shape to hold the maximum amount of pizza on a given surface area. Finally, the pizza slice is cut into a triangle because it allows for even portions and is easy to hold and eat.(7 votes)

- Also, some don't make a lot of sense, like the college SATs (4 votes)
- There is college sat's ∞(3 votes)

- How is it that the states with lower participation get a higher math grade? 20% of my grade is participation (3 votes)
- To help clarify:

1) In this SAT data, "participation" means "participated in taking the SAT". So it's the percent of students who took the SAT out of the total number of students eligible to take it. In this case, "participation" doesn't refer to how much they participated in school or class in general.

2) To understand why the lower-participation states did better on the SAT (which sounds counter-intuitive), it's easier to think of it from the point of view of the higher-participation states. When you have a whole lot more students taking the SAT, you're going to have a whole lot more chances of that big group getting way different scores. And with a wider range of scores, the average has a chance of being pulled down by the lower scores.

---> A mini-example of this effect: you work on a small group project, your group studies together, you each get top scores on your individual parts of the project, so your group gets a high average score. But if the teacher averages the project grades for everybody in the class, some groups might not have studied as much as your group, so the class average might be at least a little lower than your group's average.

Hope this helps!(3 votes)