Types of statistical studies

Created by Sal Khan.

Want to join the conversation?

  • Ryan Lacroix
    Please correct me if I am wrong; I would love to know what other people think. This video isn't so much about types of statistical studies as it is about the process from hypothesis to experiment for a single idea. Each of the "types" seems to be more of a "stage" in that progression than a different way to conduct statistical research.

    Kind Regards!
    (18 votes)
    • Jessica Perkins
      In AP Statistics we teach about observational studies (which include sample studies), which must have random selection, and experiments, which must have random assignment of treatments. Often we have to rely on volunteers for experiments, which may affect our scope of inference but otherwise does not invalidate our results. We can see a correlation from an observational study, and we can establish cause and effect from an experiment.
      (6 votes)
  • Afaque Memon
    If, for example, someone asks me to define statistics, how should I define it?
    (4 votes)
    • Arwa384
      This is not necessarily the most intelligent answer, but in the "basic probability" section Sal states that statistics is the analysis of events governed by probability. Wikipedia states that statistics is the study of the collection, organization, analysis, interpretation and presentation of data! Hope this helps!
      (12 votes)
  • Ravi Kumar Yadav
    What's the difference between a sample study and an observational study, since both of them use a sample from the original (or large) pool of data? Moreover, both techniques work on this sampled data...
    (6 votes)
    • jitter
      I had the same question after the video. From my brief gatherings, it seems the first type of study mentioned is a survey-type study, with data gathered via polling questionnaires and relying on honesty. The observational study could involve measuring actual data samples (blood samples for sugar intake). You could sample millions with a survey, but not so much with an observational study.
      (2 votes)
  • Hannah
    What does the word STATISTICAL mean? (I looked in both of my dictionaries and could not find an answer.)
    (3 votes)
    • Matthew Daly
      I'm sorry that plan didn't work out, because it was a very good idea! My dictionary is a little better than yours apparently, and it says "of, relating to, based on, or employing the principles of statistics." Zzzzzz.

      In real-person words, statistics is the branch of mathematics that deals with collecting and analyzing data to solve a problem, and something like a study is statistical if it is using statistics in the right way to answer the question it was designed to ask.
      (3 votes)
  • Isha R.
    I don't get how this describes Types of Statistical Studies...
    (1 vote)
  • Jonathan
    What about instrumental variables? Can't they be used to establish causality when experiments aren't possible?
    (2 votes)
    • Matthew Daly
      In theory, yes, although the difficulty of coming up with an appropriate instrument that will withstand criticism might affect the perceived validity of the results. At any rate, instrumental variables are an advanced topic that is beyond what Sal's going to teach in the statistics playlist.
      (3 votes)
  • Mike Runge
    I'm having trouble discerning between the different types of statistical studies. What are some bullet points for each study type?
    (2 votes)
    • Jessica Perkins
      An experiment must have random assignment of treatment(s). Often an experiment will incorporate control groups and/or blocking. We don't necessarily have random selection of participants in an experiment because we often have to rely on volunteers. This does not affect our ability to determine cause and effect, but it could affect how far those results can be generalized if the volunteers are not representative of the bigger population.
      Observational studies require random selection and generally can only show correlation (not cause and effect!) between variables. We have to be careful of biases and confounding variables. Sample studies and surveys are types of observational study.
      (2 votes)
  • medhamajety7
    With regard to a sample study, shouldn't the data be somewhat controlled so that most variables are eliminated? For example, in the sugar-leads-to-heart-disease study, shouldn't you sample people of a certain age, height, weight, ethnicity, etc.? Then, if you see any fluctuations or discrepancies within the data, you can point to sugar being a cause because all the other variables were controlled, whereas if you sampled randomly you don't know if the cause is sugar or age.
    (2 votes)
    • Thessalonika
      Sal was talking about studying the ENTIRE population of the US by gathering data from a very broad, varied selection of people. However, in the observational study part of the video, he did restrict the people to 60-year-olds. I think controlling the data like you described would be the next step if you were doing a serious study on sugar influencing heart disease and wanted to go further.

      Hope this helped and didn't make you too much more confused!
      (2 votes)
  • Jet Simon
    What is a parameter? Does only a sample study have one, or do all of the other types of statistical studies have them?
    (1 vote)
    • Jessica Perkins
      A parameter is a summary measure about an entire population. We don't know the population parameter, so we use a statistic (a sample summary measure) to estimate it. So when we take a poll and calculate "the proportion of the sample that...", it's a statistic which we're using to estimate the unknown population parameter, "the true proportion of the population that...". All statistical studies have parameters that are estimated with statistics. If we're doing a randomized experiment and testing the difference in headache reduction using two different treatments, the parameter would be the TRUE difference, which we estimate using the results of our experiment.
      (3 votes)
  • Angela
    Is there a video about blocking and the difference between that and stratified random sampling?
    (2 votes)

Video transcript

Voiceover: Let's say that you have a hunch that sugar is somehow causing heart disease. So you want to research this further. You want to see what kind of statistical studies you can perform to better understand sugar intake in the population generally and whether that seems to be causing heart disease in some way. Well, the first thing that you might want to do is just try to get a sense of sugar intake in the population as a whole. Now clearly, there's no way of measuring exactly how much sugar every member, let's say you're talking about the United States, every member of that 300 million population is consuming every day. The way that we try to get a sense of how much sugar is being consumed is by conducting a sample study. So you take your population, your 300 million people, right over here. And you sample it. You sample your population and not only do you sample it, but you randomly sample it. Obviously you don't want to just survey people who are exiting a cupcake store or people who are exiting a gym. You want it to be a random sample of people, where the place you're sampling them shouldn't somehow affect their answer, or how much sugar they might say that they are consuming. But they are gonna tell you how much sugar they consume, let's say, on an average day, maybe by filling out a survey or through some other way. And you would take this data, and obviously the more samples you have the better; we talk in depth in other statistics videos about how you get a better predictor of the actual true population parameter the more samples you take. But you might do that to get a gauge of how much sugar the average American consumes in a given day.
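As a rough illustration of the sample study described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the population is simulated rather than surveyed, the intake distribution (mean 70 grams, standard deviation 25 grams) is made up, and the names `simulate_population` and `sample_mean` are hypothetical. It only aims to show that the mean of a random sample (a statistic) tends to land closer to the population mean (the parameter) as the sample grows.

```python
import random
import statistics


def simulate_population(size, seed=0):
    """Hypothetical daily sugar intake in grams, one value per person.

    The normal distribution and its numbers are invented for this sketch.
    """
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(70.0, 25.0)) for _ in range(size)]


def sample_mean(population, n, seed=1):
    """Draw a simple random sample of n people and return its mean (a statistic)."""
    rng = random.Random(seed)
    return statistics.mean(rng.sample(population, n))


population = simulate_population(100_000)
# The parameter: in a real study this is unknown, which is why we sample.
true_mean = statistics.mean(population)

for n in (10, 100, 10_000):
    estimate = sample_mean(population, n)
    error = abs(estimate - true_mean)
    print(f"n={n:>6}: estimate={estimate:.1f} g, error={error:.1f} g")
```

A single run will not shrink the error perfectly at every step, since each sample is random, but large samples should usually sit much closer to the true mean than small ones.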
So this right over here, where you are taking random samples of the population to, essentially, generate a statistic that estimates a true parameter, which is the actual amount of sugar Americans are consuming each day, we call a sample study. It's a way, once again, of just estimating the actual amount of sugar people are having each day. Let's say you want to go further. This will give you a sense of what is likely the amount of sugar that people are consuming each day, but you really want to see how that's related to heart disease. So instead, what you do is you go survey people. You go survey people and you ask, "How much sugar have you consumed over..." and, once again, when you pick these people, you should be doing it randomly. So let's say you go survey a random sample of 60-year-olds. Once again, you wouldn't want to sample only people who are in the hospital, and you wouldn't want to sample only people who are at the gym. You would want to find a random sample, or sample them in places where it shouldn't affect their answer, which way they are going to go. Let's say you surveyed 300 60-year-olds and you asked them how much sugar they have consumed over the last 30 years. And you also asked them about the condition of their heart. And what you get is something like this. So on the horizontal axis you plot sugar consumption and then on the vertical axis you plot heart disease risk or their level of heart disease. Heart disease risk, let's say at 60. And you find a plot that looks something like this. So each of these points is a person. This is someone who consumed 200 grams of sugar per day and they're now at high heart disease risk at age 60. But maybe this is someone who is at low heart disease risk at age 60 even though they consumed a lot of sugar every day. And so we just keep plotting all of these points. And you say, "Well, you know what, it does look..." and obviously I'm not going to plot all 300.
You say, "Well, actually, it looks like there is a rough correlation right over here." If you tried to plot a line, there are definitely some outliers here, but it looks like there is a line that you could fit. And then you might say, "It looks like sugar and heart disease risk at age 60 are correlated, that they're related in this way, that they move together: if someone consumed a lot of sugar over the last 30 years they seem to have a worse heart situation, and if they consumed a lot less sugar, they seem to have a better heart situation." This often happens in medical science: when people see something like this they often jump to the conclusion, "Oh, therefore consuming more sugar must drive up heart disease risk." That's very dangerous because just seeing this data doesn't tell you that sugar is causing heart disease risk. It could go the other way around. It could be that people who end up having a high heart disease risk, maybe they crave sugar, but actually there's some other underlying cause that's making it happen. Maybe they have some other deficiency that's making them crave sugar. So it's not clear which way the causality is happening. Is the sugar consumption driving the unhealthy heart, or is somehow the unhealthy heart driving the sugar consumption, or maybe there's some other factor? Maybe fat consumption is driving the heart disease and maybe people who have more fat also will have more sugar or vice versa, who knows. All this is telling you is that there is a correlation. So this right over here, you would call this an observational study. You've observed a relationship, but you really can't say what is causing what. So let me write this down. This is an observational study. You're probably saying, "Alright, then how could you prove or feel better about the idea that sugar is actually a cause, that there's actually causality there?" To do that, you would actually have to run an experiment.
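The warning above about "some other factor" can be sketched with a small simulation. This is a hypothetical toy model, not real data: a hidden variable (called `fat` here, echoing the fat-consumption example) feeds into both simulated sugar intake and simulated heart risk, while sugar has no effect on risk at all. The helper `pearson_r` and all the numbers are assumptions made for illustration.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


rng = random.Random(42)

# Hypothetical confounder, in arbitrary standardized units, for 300 people.
fat = [rng.gauss(0.0, 1.0) for _ in range(300)]

# Sugar and risk each depend on fat plus independent noise.
# Note that risk is never computed from sugar in this model.
sugar = [f + rng.gauss(0.0, 1.0) for f in fat]
risk = [f + rng.gauss(0.0, 1.0) for f in fat]

r = pearson_r(sugar, risk)
print(f"correlation between sugar and heart risk: {r:.2f}")
```

In this toy model the two variables come out clearly positively correlated even though neither causes the other, which is why an observational study alone cannot settle the direction of causality.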
To do an experimental study here, what you would do is try to take two groups of folks. You would have your experimental group. So that's your experimental group. Actually, let me make it a circle here so it's a pool of people. Let's say you have 100 people that are experimental. That's your experimental pool. Then you have your control. What you would do, if you wanted to run this type of experiment, and as we'll see, this type of experiment probably wouldn't be run because some would consider it unethical, or actually I would consider it unethical as well. But what you would do is, let's say, take 30-year-olds and randomly put them in one of these two groups. Once again, when we say randomly, you don't want to put all the healthy people in one group and all the unhealthy people in the other group or vice versa. You want it to be random: you don't want to put all of the people of one type, one demographic, one economic status in one group or the other. So you randomly put people into these two groups, and then in the experimental group you would change one variable. The variable you care about is sugar. So what you might say is, "Okay, all of the people in this group right over here, whatever sugar they would have consumed, on top of that they have to drink, I don't know, a cup of syrup every night, or they have to have a minimum sugar intake." So you essentially force sugar into this group that you're not forcing into this group. And then, 30 years later (so one, this is probably unethical, to force people to have something that is very likely not good for their health, and two, you would have to run it for a long period of time), you would wait until they're 60 years old and you would see what the heart condition of these folks was. How many people maybe had heart attacks?
At age 60, what is their health condition? And then statistically, is it unlikely that the difference would be due purely to chance alone? For example, if you did this, and let's say that yeah, these people had a slightly higher chance for heart disease or heart attacks than these folks after 30 years. It would be a good experiment, but it wouldn't allow you to conclude that sugar is causing it, because that might have happened through chance alone. But if, for example, after 30 years, let's say this group right over here has 10 times the risk of a heart attack, or 10 times whatever the risk factors for heart disease are. You would statistically say that the 100 people in this group right over here having 10 times the chance of heart attack as this group right over here is unlikely to be due to chance alone. So you would say, "Okay!" We would feel good about our conclusion that this forced sugar is what is causing that. Anyway, we'll dig deeper into each of these three types, but the whole point of this video is to just give you an appreciation that, you know, we use statistics a lot, but this gives you a context for how we're using it in different situations when we're performing statistical studies. A sample study is to estimate the true parameter for a population. What is the actual sugar intake for the population? You randomly sample and then you use that sample data to create a statistic that estimates the true parameter. In an observational study, you observe what's going on. Sugar intake versus heart risk. You say, "Hey, there's a relationship; maybe this is worth doing an experiment on," because only through an experiment could you attempt to find some type of causality.
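The question of whether a difference between the two groups could be "due purely to chance alone" can be sketched with a permutation test. The video does not name a specific test, so this technique is an assumption chosen for illustration, as are the function names `random_assignment` and `permutation_p_value` and the made-up outcome counts (6 versus 5 heart attacks for the "slightly higher" case, 20 versus 2 for the "10 times" case, each out of 100 people).

```python
import random


def random_assignment(participants, seed=0):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_p_value(treated, control, trials=5_000, seed=0):
    """One-sided permutation test on 0/1 outcomes (1 = heart attack).

    Returns the share of random relabelings in which the treated group
    has at least as many events as were actually observed. Because the
    group sizes are fixed, comparing event counts is equivalent to
    comparing the difference in proportions.
    """
    rng = random.Random(seed)
    observed = sum(treated)
    pooled = treated + control
    n = len(treated)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if sum(pooled[:n]) >= observed:
            hits += 1
    return hits / trials


# Random assignment of 200 hypothetical volunteers, identified by number.
participants = list(range(200))
experimental, control = random_assignment(participants)

# Made-up outcomes after 30 years, 100 people per group.
slight = permutation_p_value([1] * 6 + [0] * 94, [1] * 5 + [0] * 95)
large = permutation_p_value([1] * 20 + [0] * 80, [1] * 2 + [0] * 98)
print(f"slightly higher risk: p = {slight:.3f}")
print(f"ten times the risk:   p = {large:.4f}")
```

A large p-value for the slight difference matches the point that it "might have happened through chance alone", while a very small p-value for the tenfold difference matches the point that such a gap is unlikely to be due to chance.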