Correlation and causality

Understanding why correlation does not imply causality (even though many in the press and some researchers often imply otherwise). Created by Sal Khan.

Want to join the conversation?

  • blobby green style avatar for user mana.malipeddi
    So how do we know, given some data, that two variables are just correlated or there's some causality between them?
    (158 votes)
    • blobby green style avatar for user pleitch
      I'm a statistician and I can categorically state that causality is ideological.

      That is, if the data are related (correlated), and you suspect one causes the other, you are making an ideological statement. It might be true, it might not be – there isn’t enough information to support or reject that assertion.

      Sometimes the statement is very obvious - the temperature is correlated with the length of the day... well... the length of the day relates to the amount of sunshine, and therefore we can safely say that the length of the day causes changes in temperature. Sometimes the statement isn't so obvious, like the example above. What appears to be a perfectly logical assumption has no basis. The same used to happen in history, where people thought bad smells gave you diseases (rather than both bad smells and diseases being related to poor hygiene and microbial action).

      So at the very least causation is a hypothesis (hypothetical thesis – unproven theory), and at best an accepted theory (i.e. previous studies have confirmed that one is likely to cause the other).

      What does this mean? If you find that data are correlated (related), you should then determine if one causes the other.
      (308 votes)
  • aqualine ultimate style avatar for user Lynn Kim
    So what is the perfect definition for the causality?
    (4 votes)
    • leaf green style avatar for user zahraszakaria
      Causality is the relation between one thing as a cause and another thing as its effect.
      So it's not "just" about a relation (correlation); there must be cause and effect. To make this clear, we have to distinguish causality from correlation.

      Let's say we have two variables: A and B.
      A and B are correlated when their values change together; for example, when A's values increase, B's values decrease. However, we cannot yet say that A causes the change in B.

      Here are great examples that correlation doesn't equal causation:
      (28 votes)
  • marcimus purple style avatar for user Navya
    What is the difference between causality and causation?
    (5 votes)
    • blobby green style avatar for user robshowsides
      Hmmm, I think they are pretty close, but used in different contexts. "Causality" is a general, absolute property of the universe, which most scientists believe is an important building block of the real world. They want their theories to respect "causality", meaning that the cause (or causes) of every specific event must happen before the event (say, the decay of a radioactive atom must happen before the click in the Geiger counter). "Causation" is usually used to refer to categories, and often only in a probabilistic sense, such as "smoking causes lung cancer" or "global warming causes floods".
      (16 votes)
  • duskpin ultimate style avatar for user Lotte
    Maybe a combination of eating healthy meals and exercise can result in a decrease in obesity?
    (9 votes)
  • purple pi pink style avatar for user Jesse
    idk if this is math
    (7 votes)
  • leaf red style avatar for user Jordan Casey
    Are there real world applications of causality?
    (1 vote)
  • blobby green style avatar for user crn_aez
    But I will add to my previous comment, just to be fair and balanced, the point of your discussion was spot on… always question the narrative, whether it be in “statistical analysis” or life in general. Regards.
    (6 votes)
  • blobby purple style avatar for user Chris stewart
    i need help
    (5 votes)
  • blobby green style avatar for user parkerjacobson
    I like how the rest is about scatter plots but then this is about obesity
    (5 votes)
  • starky sapling style avatar for user ozhang32
    A great point is addressed here: distinguishing causality from correlation. Eating breakfast might, in certain circumstances, lead to a lower likelihood of obesity, given an ideal situation where the individual practices a healthy routine.

    But I wonder if Sal has neglected one word in the title of the WebMD article: "may". When the title reads "Eating Breakfast May Beat Teen Obesity", is it necessarily suggesting causality, or is it in effect indicating a correlation?

    It sounds to me that the difference between causality and correlation is when the occurrence of event A leads to B, and it can't go the other way. This indicates 1) a strict time order, A has to happen before B, and 2) A has the strongest correlation with B amongst all the other factors that may or may not contribute to the occurrence of event B so that A has the determining power for B's occurrence which makes the correlation between A and B not simply a correlation but a causation as well.
    (5 votes)
    • leaf green style avatar for user cossine
      The title "Eating Breakfast May Beat Teen Obesity" suggests correlation, not causation, as mentioned by Sal.

      "It sounds to me that the difference between causality and correlation is when the occurrence of event A leads to B, and it can't go the other way."

      This is not what correlation is about. Causality would mean that A causes B; it does not say whether B causes A. Let's suppose that for a particular person dust exposure causes asthma. This does not mean that if a person has asthma they have been exposed to dust. There could have been some other trigger.

      "2) A has the strongest correlation with B amongst all the other factors"

      There could be many other factors involved. It might be that A does not have the highest correlation with B amongst the factors studied.
      (0 votes)

Video transcript

I have this article right here from WebMD. And the point of this isn't to poke holes at WebMD. I think they have some great articles and they have some great information on their site. But what I want to do here is to think about what a lot of articles you might read or a lot of research you might read are implying and to think about whether they really imply what they claim to be implying. So this is an excerpt of an article, and the title of the article says "Eating breakfast may beat teen obesity." So they're already trying to create this cause-and-effect relationship. The title itself says if you eat breakfast then you're less likely-- or you won't be obese. You're not going to be obese. So the title right there already sets up this. That eating breakfast may beat teen obesity. And then they tell us about the study. "In the study, published in Pediatrics, researchers analyzed the dietary and weight patterns of a group of 2,216 adolescents over a five-year period from public schools in Minneapolis-Saint Paul, Minnesota." And I won't talk too much about this. It looks like a good sample size. It was over a large period of time. I'll just give the researchers the benefit of the doubt, assume that it was over a broad audience, that they were able to control for a lot of variables. But then they go on to say, "The researchers write that teens who ate breakfast regularly had a lower percentage of total calories from saturated fat and ate more fiber and carbohydrates." And to some degree that first-- "than those who skipped breakfast." And to some degree this first sentence is obvious. Breakfast tends to be things like cereals, grains. You eat syrup, you eat waffles-- that all tends to fall in the category of carbohydrates and sugars. And frankly, that's not even necessarily a good thing. Not obvious to me whether bacon is more or less healthy than downing a bunch of syrup or Fruit Loops or whatever else. But we'll let that be right here.
"In addition, regular breakfast eaters seemed more physically active than the breakfast skippers." So over here they're once again trying to create this other cause-and-effect relationship. Regular breakfast eaters seemed more physically active than the breakfast skippers. So the implication here is that breakfast makes you more active. And then this last sentence right over here, they say "Over time, researchers found teens who regularly ate breakfast tended to gain less weight and had a lower body mass index than breakfast skippers." So you could-- they're telling us that breakfast skipping-- this is the implication here-- is more likely, or it can be a cause of making you overweight or maybe even making you obese. So the entire narrative here, from the title all the way through every paragraph, is look, breakfast prevents obesity. Breakfast makes you active. Breakfast skipping will make you obese. So you just say then, boy, I have to eat breakfast. And you should always think about the motivations and the industries around things like breakfast. But the more interesting question is does this research really tell us that eating breakfast can prevent obesity? Does it really tell us that eating breakfast will cause someone to become more active? Does it really tell us that breakfast skipping can make you overweight or make you obese? Or, more likely, are they showing that these two things tend to go together? And this is a really important difference. And let me kind of state slightly technical words here. And they sound fancy, but they really aren't that fancy. Are they pointing out causality, which is what it seems like they're implying? Eating breakfast causes you to not be obese. Breakfast causes you to be active. Breakfast skipping causes you to be obese. So it looks like they are kind of implying causality. They're implying cause and effect, but really what the study looked at is correlation.
The whole point of this is to understand the difference between causality and correlation because they're saying very different things. Causality versus correlation. And, as I said, causality says A causes B. Well, correlation just says A and B tend to be observed at the same time. Whenever I see B happening, it looks like A is happening at the same time. Whenever A is happening, it looks like it also tends to happen with B. And the reason why it's super important to notice the distinction between these is you can come to very, very, very, very, very different conclusions. So the one thing that this research does do, assuming that it was performed well, is it does show a correlation. So the study does show a correlation. It does show, if we believe all of their data, that breakfast skipping correlates with obesity and obesity correlates with breakfast skipping. We're seeing it at the same time. Activity correlates with breakfast and breakfast correlates with activity-- that all of these correlate. What they don't say-- and there's no data here that lets me know one way or the other-- is what is causing what, or maybe you have some underlying cause that is causing both. So for example, they're saying breakfast causes activity, or they're implying breakfast causes activity. They're not saying it explicitly. But maybe activity causes breakfast. Maybe. They didn't write in the study that people who are active, maybe they're more likely to be hungry in the morning. Activity causes breakfast. And then you start having a different takeaway. Then you say, wait, maybe if you're active and you skip breakfast-- and I'm not telling you that you should. I have no data one way or the other-- maybe you'll lose even more weight. Maybe it's even a healthier thing to do. We're not sure. So they're trying to say, look, if you have breakfast it's going to make you active, which is a very positive outcome. But maybe you can have the positive outcome without breakfast. Who knows?
Likewise they say breakfast skipping, or they're implying breakfast skipping, can cause obesity. But maybe it's the other way around. Maybe people who have high body fat-- maybe, for whatever reason, they're less likely to get hungry in the morning. So maybe it goes this way. Maybe there's a causality there. Or even more likely, maybe there's some underlying cause that causes both of these things to happen. And you could think of a bunch of different examples of that. One could be the physical activity. And these are all just theories. I have no proof for it. But I just want to give you different ways of thinking about the same data and maybe not just coming to the same conclusion that this article seems like it's trying to lead us to conclude. That we should eat breakfast if we don't want to become obese. So maybe if you're physically active, that leads to you being hungry in the morning, so you're more likely to eat breakfast. And obviously being physically active also makes it so that you burn calories. You have more muscle. So that you're not obese. So notice if you view things this way, if you say physical activity is causing both of these, then all of a sudden you lose this connection between breakfast and obesity. Now you can't make the claim that somehow breakfast is the magic formula for someone to not be obese. So let's say that there is an obese person-- let's say this is the reality, that physical activity is causing both of these things. And let's say that there is an obese person. What will you tell them to do? Will you tell them, eat breakfast and you won't become obese anymore? Well, that might not work, especially if they're not physically active. I mean, what's going to happen if you have an obese person who's not physically active? And then you tell them to eat breakfast? Maybe that'll make things worse. And based on that, the advice or the implication from the article is the wrong thing.
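The scenario just described, where physical activity drives both breakfast eating and body weight, can be sketched as a small simulation. This is only an illustration under made-up assumptions: the function names (`pearson`, `simulate_teens`), the probabilities, and the BMI numbers are all hypothetical and are not taken from the study. In this toy world breakfast has no effect on BMI at all, yet the two still come out correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_teens(n=20000, seed=0):
    """Toy world: activity causes both breakfast eating and lower BMI."""
    rng = random.Random(seed)
    activity, breakfast, bmi = [], [], []
    for _ in range(n):
        a = rng.random()  # hidden activity level between 0 and 1 (assumed)
        # More active teens are more likely to eat breakfast (assumed numbers).
        eats = 1 if rng.random() < 0.2 + 0.6 * a else 0
        # BMI depends on activity and noise only; breakfast plays no role.
        b = 27.0 - 6.0 * a + rng.gauss(0.0, 2.0)
        activity.append(a)
        breakfast.append(eats)
        bmi.append(b)
    return activity, breakfast, bmi


activity, breakfast, bmi = simulate_teens()

# Correlation across everyone: negative, despite no causal link.
r_all = pearson(breakfast, bmi)

# Compare only teens with similar activity: the correlation mostly vanishes.
band = [(e, b) for a, e, b in zip(activity, breakfast, bmi) if 0.45 <= a <= 0.55]
r_band = pearson([e for e, _ in band], [b for _, b in band])

print(f"all teens:        r = {r_all:+.3f}")
print(f"similar activity: r = {r_band:+.3f}")
```

Across all simulated teens, breakfast and BMI are negatively correlated, but once only teens with similar activity levels are compared the correlation is close to zero. That is the "you lose this connection between breakfast and obesity" point in numerical form.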
Physical activity maybe is the thing that should be focused on. Maybe something other than physical activity. Maybe you have sleep, maybe people who sleep late and they're not getting enough sleep, maybe that leads to obesity. And obviously, because they're not getting enough sleep, they wake up as late as possible and they have to run to the next appointment-- or they have to run to school in the case of students-- and maybe that's why they skip breakfast. So once again, if you find someone that's obese, maybe the rule here isn't to force a breakfast down your throat. Maybe it will become even worse because maybe it is the lack of sleep that's causing your metabolism to slow down or whatever. So it's very, very important when you're looking at any of these studies to try to say, is this a correlation or is this causality? If it's correlation, you cannot make the judgment that, hey, eating breakfast is necessarily going to make someone less obese. All that tells you is that these things move together. A better study would be one that is able to prove causality. And then we could think of other underlying causes that would kind of break down the narrative that this piece is trying to say. I'm not saying it's wrong. Maybe it's absolutely true that eating breakfast will fight obesity. But I think it's equally or more important to think about what the other causes are, not to just make a blanket statement like that. So for example, maybe poverty causes you to skip breakfast for multiple reasons. Maybe both of your parents are working. There's no one there to give you breakfast. Maybe there's more stress in the-- who knows what it might be? And so when you have poverty maybe you're more likely to skip breakfast and maybe when there's poverty, and maybe you have two-- both your parents are working and the kids have to make their own dinner and whatever else-- maybe they also eat less healthy at all times of day and then that leads to obesity. 
So once again in this situation, if this is the reality of things, just telling someone to also eat breakfast regardless of what that breakfast is, even if it's Fruit Loops or syrup, that's probably not going to help the situation. Maybe it's just eating unhealthy dinners is the underlying cause. And if you eat an unhealthy dinner maybe by breakfast time you're not hungry still because you've binged so much on dinner. So you skip breakfast. And this also leads to obesity. But once again, if this is the actual reality, doing the advice that that article's saying might actually be a bad thing. If you eat an unhealthy dinner and then force yourself to eat a breakfast when you're not hungry, that might make the obesity even worse. So the whole point of this video isn't to say that the implications from that article are necessarily wrong. The important thing is to just realize that they might be wrong. And that just because you saw this correlation with the data, it doesn't mean that eating breakfast is going to somehow magically fight obesity.
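One common way a study can get closer to causality is to assign the treatment at random instead of just observing who does what. The sketch below contrasts the two approaches in a toy world where a hidden activity level, not breakfast, determines BMI. The function names (`run_study`, `bmi_gap`) and every number are assumptions chosen for illustration, not data from the article.

```python
import random


def run_study(randomized, n=20000, seed=1):
    """Return (breakfast, bmi) lists for a toy population.

    BMI depends only on a hidden activity level, never on breakfast.
    """
    rng = random.Random(seed)
    breakfast, bmi = [], []
    for _ in range(n):
        a = rng.random()  # hidden activity level between 0 and 1 (assumed)
        if randomized:
            # Experiment: a coin flip decides who eats breakfast.
            eats = rng.random() < 0.5
        else:
            # Observation: active teens choose breakfast more often.
            eats = rng.random() < 0.2 + 0.6 * a
        breakfast.append(eats)
        bmi.append(27.0 - 6.0 * a + rng.gauss(0.0, 2.0))
    return breakfast, bmi


def bmi_gap(breakfast, bmi):
    """Mean BMI of breakfast eaters minus mean BMI of skippers."""
    eaters = [b for e, b in zip(breakfast, bmi) if e]
    skippers = [b for e, b in zip(breakfast, bmi) if not e]
    return sum(eaters) / len(eaters) - sum(skippers) / len(skippers)


observed_gap = bmi_gap(*run_study(randomized=False))
randomized_gap = bmi_gap(*run_study(randomized=True))

print(f"observational gap: {observed_gap:+.2f}")
print(f"randomized gap:    {randomized_gap:+.2f}")
```

Under these assumptions the observational gap comes out clearly negative while the randomized gap stays near zero, because the coin flip breaks the link between activity and breakfast. A gap that survived randomization would be much stronger evidence that breakfast itself matters.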