Video transcript

Let's say that you have a hunch that sugar is somehow causing heart disease, and so you want to research this further. What kind of statistical studies can you perform to better understand sugar intake in the population generally, and whether that seems to be causing heart disease in some way?

Well, the first thing that you might want to do is just try to get a sense of sugar intake in the population as a whole. Now clearly, there's no way of measuring exactly how much sugar every member (let's say you're talking about the United States) of that 300 million population is consuming every day. So the way that we try to get a sense of how much sugar is being consumed is by conducting a sample study. You take your population, your 300 million people, and you sample it. And you don't just sample it, you randomly sample it. Obviously you don't want to just survey people who are exiting a cupcake store or people who are exiting a gym. You want a random sample of people, where the place you're sampling them shouldn't somehow affect how much sugar they might say they are consuming. They're going to tell you how much sugar they consume, let's say on an average day, maybe by filling out a survey or through some other way. You take this data, and obviously the more samples you have the better. We talked about that in depth in other statistics videos: you get a better predictor of the actual true population parameter the more samples you take. You might do that to get a gauge of how much sugar the average American consumes in a given day. So this right over here, where you are taking random samples of the population to essentially generate a statistic that is estimating a true
parameter, which is the actual amount of sugar Americans are consuming each day: we call this a sample study. It's a way, once again, of just estimating the actual amount of sugar people are having each day.

But let's say you want to go further. This will just give you a sense of the likely amount of sugar that people are consuming each day, but you really want to see how that's related to heart disease. So instead, what you do is you go survey people, and once again, when you pick these people, you should be doing it randomly. So let's say that you go survey a random sample of 60-year-olds. Once again, you wouldn't want to sample people who are in the hospital, and you wouldn't want to sample people who are just at the gym. You would want to find a random sample, or sample them in places where it shouldn't affect which way their answer is going to go. But let's say you surveyed 300 60-year-olds, and you asked them how much sugar they have consumed over the last 30 years, and you also asked them the condition of their heart. What you get is something like this. On the horizontal axis you plot sugar consumption, and on the vertical axis you plot heart disease risk, or their level of heart disease, let's say at 60. And you find a plot that looks something like this. So this is someone who consumed, let's say, 200 grams of sugar per day, and they're now at high heart disease risk at age 60. But maybe this is someone who is at low heart disease risk at age 60 even though they consumed a lot of sugar every day. And so we just keep plotting all of these points, and obviously I'm not going to do all 300. And you say, well, actually it looks like there is a rough correlation right over here. If you try to plot a line,
there are definitely some outliers here, but it looks like there is a line that you could fit. So you might say, well, it looks like sugar and heart disease risk at age 60 are correlated, that they move together: if someone consumed a lot of sugar over the last 30 years, they seem to have a worse heart situation, and if they consumed a lot less sugar, they seem to have a better heart situation.

Now, this often happens in medical science: when people see something like this, they often jump to the conclusion, oh, therefore consuming more sugar must drive up heart disease risk. And that's very dangerous, because just seeing this data doesn't tell you that sugar is causing heart disease risk. It could go the other way around. It could be that people who end up having a high heart disease risk crave sugar, but there's actually some other underlying cause that's making it happen. Maybe they have some other deficiency that's making them crave sugar. So it's not clear which way the causality is happening. Is the sugar consumption driving the unhealthy heart, or is the unhealthy heart somehow driving the sugar consumption? Or maybe there's some other factor: maybe fat consumption is driving the heart disease, and maybe people who have more fat also have more sugar, or vice versa, or who knows. So all this is telling you is that there is a correlation. This right over here you would call an observational study. You've observed a relationship, but you really can't say what is causing what. So let me write this down: this is an observational study.

Now you're probably saying, all right, well then how could you prove, or feel better about, the idea that sugar is actually a cause, that there's actually causality there? Well, to do that you would actually have to run an experiment. And to do an
experimental study here, what you would do is try to take two groups of folks. You would have your experimental group (let me make it a circle here, so it's a pool of people). Let's say you have a hundred people in your experimental pool, and then you have your control. If you wanted to run this type of experiment (and as we'll see, this type of experiment probably wouldn't be run, because some would consider it unethical, and actually I would consider it unethical as well), what you would do is randomly take, let's say, 30-year-olds and put them in one of these two groups. And once again, when we say randomly, you don't want to put all the healthy people in one group and all the unhealthy people in the other group, or vice versa. You don't want to put all the people of one type, one demographic, one economic status in one group or the other. You want it to be random. So you randomly put people into these two groups, and then in the experimental group you would change one variable, and the variable you care about is sugar. So what you might say is, okay, all of the people in this group right over here, on top of whatever sugar they would have consumed, have to drink, I don't know, a cup of syrup every night, or they have to have a minimum sugar intake, so that you essentially force sugar into this group that you're not forcing into this group.

So one, this is probably unethical, to force people to have something that is very likely not good for their health. And two, you would have to run it for a long period of time. You would wait 30 years, until they're 60 years old, and you would see, well, what was the heart condition of these folks? How many people maybe had heart attacks? What's their health condition at age 60? And then, is it statistically unlikely that the difference would be due purely to chance alone? So for example, if you did this and let's say that now these people had a slightly higher chance for heart disease or heart attacks than these folks after 30 years, that still might not be convincing. It would be a good experiment, but it wouldn't allow you to conclude that sugar is causing it, because that might have happened through chance alone. But if, for example, after 30 years this group right over here has 10 times the risk of a heart attack, or 10 times whatever the risk factors for heart disease are, you would say statistically, well, the odds of that happening by chance alone, of the hundred people in this group having ten times the chance of heart attack as this group, are low. That's unlikely to be due to chance alone, and so you'd say, okay, we would feel good about our conclusion that this forced sugar is what is causing that.

So anyway, we'll dig deeper into each of these three types, but the whole point of this video is just to give you an appreciation that we use statistics a lot, and this gives you a context for how we're using it in different situations when we're performing statistical studies. A sample study is to estimate the true parameter for a population: what is the actual sugar intake for the population? You randomly sample, and then you use that sample data to create a statistic that estimates the true parameter. In an observational study, you observe what's going on, sugar intake versus heart risk, and you say, well, hey, there's a relationship, maybe this is worth doing an experiment on. Because only through an experiment could you attempt to find some type of causality.
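
To make the three kinds of study a little more concrete, here is a rough Python sketch, not anything shown in the video. Every number in it is invented for illustration: the population of daily intakes, the hidden factor that stands in for something like fat consumption, and the heart attack counts in each group. The function names are hypothetical too. The video doesn't name a particular test for "unlikely due to chance alone," so the sketch assumes a simple permutation test: shuffle the group labels many times and see how often a difference at least as large as the observed one shows up.

```python
import random
import statistics


def sample_mean(population, n, rng):
    """Sample study: estimate the population mean from a simple random sample."""
    return statistics.fmean(rng.sample(population, n))


def pearson_r(xs, ys):
    """Observational study: Pearson correlation between two measurements."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def confounded_survey(n, rng):
    """Made-up survey in which a hidden factor drives BOTH sugar and risk.

    Risk never depends on sugar here, yet the two end up correlated.
    """
    sugar, risk = [], []
    for _ in range(n):
        hidden = rng.gauss(0, 1)  # hypothetical lurking variable
        sugar.append(100 + 30 * hidden + rng.gauss(0, 10))
        risk.append(5 * hidden + rng.gauss(0, 1))
    return sugar, risk


def permutation_p_value(treated, control, rng, n_shuffles=2000):
    """Experiment: share of random relabelings whose difference in group
    means is at least as large as the difference actually observed."""
    observed = statistics.fmean(treated) - statistics.fmean(control)
    pooled = list(treated) + list(control)
    k = len(treated)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so reruns match

    # Sample study: 100,000 invented daily intakes stand in for the population.
    population = [rng.gauss(70, 25) for _ in range(100_000)]
    print("true mean:", statistics.fmean(population))
    print("estimate from 1,000 people:", sample_mean(population, 1000, rng))

    # Observational study: 300 surveyed 60-year-olds (simulated).
    sugar, risk = confounded_survey(300, rng)
    print("correlation:", pearson_r(sugar, risk))

    # Experiment: 1 = heart attack by age 60, 100 people per group (invented).
    control = [1] * 3 + [0] * 97
    slightly_higher = [1] * 4 + [0] * 96
    ten_times = [1] * 30 + [0] * 70
    print("p, slightly higher:", permutation_p_value(slightly_higher, control, rng))
    print("p, ten times:", permutation_p_value(ten_times, control, rng))
```

In this toy setup the survey shows a strong correlation even though sugar has no effect on risk at all, which is exactly why the observational study can't settle causality. In the experiment, the slightly higher group should give a large p-value (easily explained by chance), while the ten-times group should give a p-value near zero.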