# Correlation and Causation | Lesson

## What is the difference between correlation and causation?

Many studies and surveys consider data on more than one variable. For example, suppose a study finds that, over the years, the prices of burgers and fries have both increased. Does this mean that an increase in the price of burgers causes an increase in the price of fries? To answer questions like this, we need to understand the difference between correlation and causation.
Correlation means there is a relationship or pattern between the values of two variables. A scatterplot displays data about two variables as a set of points in the $xy$-plane and is a useful tool for determining if there is a correlation between the variables.
Causation means that one event causes another event to occur. Causation can only be determined from an appropriately designed experiment. In such experiments, similar groups receive different treatments, and the outcomes of each group are studied. We can only conclude that a treatment causes an effect if the groups have noticeably different outcomes.
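
The experiment described above can be sketched as a small simulation. Everything here is a made-up illustration rather than data from a real study: the 1000 hypothetical participants, their baseline scores, and the assumed treatment effect of 5 points are all assumptions. The point is only to show the shape of the design: similar groups are formed by random assignment, one group receives the treatment, and the outcomes are compared.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical baseline outcomes for 1000 participants (assumed values).
baseline = [rng.gauss(70, 10) for _ in range(1000)]

# Random assignment: shuffling makes the two groups similar on average.
rng.shuffle(baseline)
ASSUMED_EFFECT = 5  # hypothetical effect of the treatment, in points

treatment_group = [score + ASSUMED_EFFECT for score in baseline[:500]]
control_group = baseline[500:]  # receives no treatment


def mean(values):
    return sum(values) / len(values)


# Because the groups were similar before treatment, a noticeable gap in
# outcomes can be attributed to the treatment itself.
difference = mean(treatment_group) - mean(control_group)
print(f"Difference in average outcome: {difference:.2f}")
```

Because assignment is random, nothing other than the treatment systematically separates the two groups, which is why a clear difference in outcomes supports a causal conclusion.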

### What skills are tested?

• Describing a relationship between variables
• Identifying statements consistent with the relationship between variables
• Identifying valid conclusions about correlation and causation for data shown in a scatterplot
• Identifying a factor that could explain why a correlation does not imply a causal relationship

## How can we determine if variables are correlated?

If there is a correlation between two variables, a pattern can be seen when the variables are plotted on a scatterplot. If this pattern can be approximated by a line, the correlation is linear. Otherwise, the correlation is non-linear.
There are three ways to describe correlations between variables.
• Positive correlation: As $x$ increases, $y$ tends to increase.
• Negative correlation: As $x$ increases, $y$ tends to decrease.
• No correlation: As $x$ increases, $y$ tends to stay about the same or have no clear pattern.
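
The three descriptions above can also be checked numerically. This lesson relies on reading a scatterplot, so the sketch below swaps in a different tool: the Pearson correlation coefficient $r$, a common measure of linear correlation that ranges from $-1$ to $1$. The function names, the sample data, and the cutoff of 0.3 are assumptions chosen for illustration, not part of the lesson.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable is treated as uncorrelated here
    return cov / math.sqrt(var_x * var_y)


def describe(r, cutoff=0.3):
    """Label r in words; the cutoff is an arbitrary illustrative choice."""
    if r >= cutoff:
        return "positive correlation"
    if r <= -cutoff:
        return "negative correlation"
    return "no correlation"


x = [1, 2, 3, 4, 5, 6]
print(describe(pearson_r(x, [2, 4, 5, 4, 6, 7])))  # y tends to rise with x
print(describe(pearson_r(x, [9, 7, 8, 5, 4, 2])))  # y tends to fall as x rises
print(describe(pearson_r(x, [5, 3, 6, 6, 3, 5])))  # no clear pattern
```

Note that $r$ only captures linear patterns, so a strong non-linear relationship can still produce a value near zero. Plotting the data remains the first check.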

## Why doesn't correlation mean causation?

Even if there is a correlation between two variables, we cannot conclude that one variable causes a change in the other. This relationship could be coincidental, or a third factor may be causing both variables to change.
For example, Liam collected data on the sales of ice cream cones and air conditioners in his hometown. He found that when ice cream sales were low, air conditioner sales tended to be low and that when ice cream sales were high, air conditioner sales tended to be high.
• Liam can conclude that sales of ice cream cones and air conditioners are positively correlated.
• Liam can't conclude that selling more ice cream cones causes more air conditioners to be sold. It is likely that the increases in the sales of both ice cream cones and air conditioners are caused by a third factor, an increase in temperature!
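
Liam's situation can be imitated with simulated data. In the sketch below, all numbers are invented: temperature is the only thing driving both sales figures, and neither product affects the other. The sales formulas, noise levels, and helper names are assumptions for illustration. The sketch measures the correlation between the two sales series, then measures it again after subtracting the part of each series that a straight-line fit on temperature explains.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, xs):
    """What remains of ys after removing a least-squares line fitted on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical daily temperatures; both sales figures depend only on these.
temperature = [rng.uniform(10, 35) for _ in range(500)]
ice_cream = [5 * t + rng.gauss(0, 10) for t in temperature]
air_conditioners = [2 * t + rng.gauss(0, 5) for t in temperature]

r_raw = pearson_r(ice_cream, air_conditioners)
r_adjusted = pearson_r(residuals(ice_cream, temperature),
                       residuals(air_conditioners, temperature))

print(f"Correlation between the two sales series: {r_raw:.2f}")
print(f"Correlation after accounting for temperature: {r_adjusted:.2f}")
```

The raw correlation comes out strongly positive even though neither product influences the other in this simulation, and it largely disappears once temperature is accounted for. With real data the third factor is usually not known in advance, which is why a correlation alone cannot establish causation.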

TRY: DESCRIBING A RELATIONSHIP
Vivek notices that students in his class with larger shoe sizes tend to have higher grade point averages. Based on this observation, what is the best description of the relationship between shoe size and grade point average?

TRY: FINDING A CONSISTENT STATEMENT
A principal collected data on all students at her high school and concluded that there is no correlation between the number of absences and grade point average. Which of the following statements are consistent with the principal's findings?

TRY: INTERPRETING A SCATTERPLOT
The scatterplot above shows the price of a hot dog and a small drink at seventeen different baseball stadiums. Based on the scatterplot, which of the following statements is true?

TRY: IDENTIFYING A CAUSAL FACTOR
Data from a certain city shows that the size of an individual's home is positively correlated with the individual's life expectancy. Which of the following factors would best explain why this correlation does not necessarily imply that the size of an individual's home is the main cause of increased life expectancy?

## Things to remember

If there is a correlation between two variables, a pattern will be seen when the variables are plotted on a scatterplot.
There are three ways to describe the correlation between variables.
• Positive correlation: As $x$ increases, $y$ tends to increase.
• Negative correlation: As $x$ increases, $y$ tends to decrease.
• No correlation: As $x$ increases, $y$ tends to stay about the same or have no clear pattern.
Causation can only be determined from an appropriately designed experiment.
• Sometimes when two variables are correlated, the relationship is coincidental or a third factor is causing them both to change.

## Want to join the conversation?

• I don't like the use of the word "linear" in question two. If there were no correlation, then the relationship could still be linear in that the "line" would be a flat line along one of the axes showing that one factor stays consistent whether or not the other factor is changed (no correlation). Do people refer to "linear" relationship to strictly mean correlated or has our definition become more precise?
• Two variables can have a linear relationship and not be correlated, or have a linear relationship and be correlated (positively or negatively).
The 'linear' is important because you could have other ways of correlating data which are not linear (for example, variables which are very strongly correlated in an exponential relationship, but only slightly correlated in a linear relationship).
• To be honest, I knew what the answer to each one was, but I didn't know how to phrase it.
• How can the data on a scatterplot be considered linear if it is not linear but instead seems to have no correlation?
• Is there a way to identify if a relationship is causal rather than correlated?
• We need explainability. If we can explain why the relationship is causal, that still only makes it a theory. In order to verify causality, we would need to design an experiment in such a way that all other variables are controlled/constant so that any change in our Y variable could only be occurring because of the changes in our X variables (as all other factors are being kept constant).

https://towardsdatascience.com/correlation-is-not-causation-ae05d03c1f53