Introduction to experiment design

Introduction to experiment design. Explanatory and response variables. Control and treatment groups.

  • Idan Harat
    I have never heard the terms explanatory and response variables. They are usually called the predictor/independent variable and the dependent/outcome variable.
    (8 votes)
  • UnicornMeat
    What exactly is a lurking variable?
    (2 votes)
    • Em
      If I understand Sal correctly, it's just what he calls a variable that you didn't plan for in your experiment. For example, say you're trying to see if birds' feather colors make them faster, and you have a group of blue birds and a group of red birds. But you don't take into account that in some species of birds the females are larger and therefore slower, and so the "lurking variable" of the sex of the bird throws off your experiment.
      (6 votes)
  • ellenpersson123
    What's the difference between a replicate and a repetition?
    Is there a definition of a "replicate"?
    (3 votes)
    • Vyacheslav Shults
      Replication is the strict repetition of an experimental condition so that the variability associated with the phenomenon can be estimated. It assumes that we can repeat the experiment in every detail. The formal definition is "the repetition of the set of all the treatment combinations to be compared in an experiment. Each of the repetitions is called a replicate."
      (2 votes)
  • Caroline Wong
    What if...a Drug Company wants to test a new (expensive & difficult to deliver) drug. They decide to do this:
    Phase 1: give all subjects a placebo. Remove any responders.
    Phase 2: give remaining subjects the drug. Analyze effect.

    Is this a legitimate design because all placebo responders have been removed at the start of the study?
    Is this a more ethical design?
    (2 votes)
    • Jerry Nilsson
      Sadly, the placebo effect doesn't work like that.
      It's not like an on/off switch where you either get a response or not.
      Also the effect tends to get stronger over time, so the subjects that didn't show a strong response in phase 1 may develop one in phase 2.
      (2 votes)
  • A. Yaya
    So what is the difference between a block design and an SRS? Is it just that a block design is used in an experiment and an SRS is used in a survey?
    (2 votes)
    • David Lee
      You can use an SRS in an experimental design. Block designs are for experiments, while a stratified sample is used for sampling. Blocking implies that there is some known variable that can affect the response variable or the overall experiment. In the video the example would have been gender, because maybe there were more men in the treatment group than in the control group, and women would react differently than men to the pill.
      (1 vote)
  • Cobra Coder
    In the video, Sal talks about a triple-blind experiment where neither the people taking the pill, the person giving them the pill, nor the ones analyzing the data know which pill is which. But if nobody knows which pill is given to each person, how can you test for causality between the medicine and the A1c levels?
    (1 vote)
    • daniella
      In a triple-blind experiment, neither the participants, the individuals administering the interventions, nor the researchers analyzing the data know which group received the treatment or control condition. Despite the lack of awareness about the assigned interventions, researchers can still test for causality by comparing the outcomes between the treatment and control groups. The key is to ensure that the only systematic difference between the groups is the intervention being studied. By controlling for other variables and randomizing group assignment, researchers can infer causal relationships between the intervention and the outcomes of interest. (A minimal sketch of what such a blinded comparison might look like appears after these comments.)
      (1 vote)
  • Adrian Belen
    What are the differences between grouping and sampling? Should you get a sample from a population, and then split it into an experimental and a control group?
    (1 vote)
    • daniella
      Grouping and sampling are different concepts in experimental design. Sampling involves selecting a subset of individuals or items from a population to participate in the study. Grouping, on the other hand, refers to the division of participants into different categories or conditions, such as control and treatment groups, based on the research design. While sampling ensures that the study sample is representative of the population of interest, grouping allows for comparison between different conditions to assess the effects of interventions.
      (1 vote)
  • mr.joe.bedard
    "...you could do really a version of stratified sampling that we've talked about in other videos, which is you could do what's called a block design for your random assignment where you actually split everyone into men and women..." Sal is rambling and this is confusing. To say that something is a "version of" is to say it is a synonym. This audio doesn't make a clear distinction between stratified sampling and block design. Could you please fix this video?
    (1 vote)
    • daniella
      Sal's explanation of stratified sampling and block design could be clarified to differentiate between the two concepts. Stratified sampling involves dividing the population into homogeneous subgroups (strata) based on certain characteristics (e.g., gender) and then randomly selecting samples from each subgroup. Block design, on the other hand, involves grouping participants into blocks based on specific variables (e.g., gender) and then randomly assigning treatments within each block to ensure balance across treatment groups. Clarifying these distinctions would enhance understanding.
      (1 vote)
  • Bryan Malakou
    Can you not have a placebo group but rather another drug which previously had placebo controlled trials?
    (1 vote)
    • daniella
      While a placebo group is commonly used in clinical trials to control for the placebo effect and assess the efficacy of a new treatment, it's possible to compare the new treatment to an existing standard treatment instead of a placebo. In such cases, participants in the control group receive the standard treatment, which has already undergone placebo-controlled trials to establish its efficacy. This approach allows researchers to evaluate the relative effectiveness of the new treatment compared to the established standard, providing valuable insights for clinical practice.
      (1 vote)
  • Thompson, Jenna
    An explanatory variable is what you manipulate or observe changes in, while a response variable is what changes as a result.
    (1 vote)
    • daniella
      An explanatory variable is indeed the variable that is manipulated or observed to see changes, while a response variable is the variable that changes as a result of the manipulation or observation. This distinction is crucial in experimental design as it helps researchers identify causal relationships between variables and understand the effects of interventions or treatments on outcomes.
      (1 vote)
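
To make the triple-blind discussion above more concrete, here is a minimal sketch of a blinded comparison. Everything in it is hypothetical (made-up group codes and A1c numbers); it only illustrates the idea that the analyst can compare groups while seeing coded labels, with the key revealed after the comparison is fixed.

```python
# The analyst only ever sees coded group labels ("A" and "B");
# which code means "medicine" is kept by someone outside the analysis.
a1c_change_by_coded_group = {
    "A": [-0.2, -0.1, 0.0, -0.3],   # made-up after-minus-before A1c changes
    "B": [-0.7, -0.5, -0.6, -0.4],
}

def mean(values):
    return sum(values) / len(values)

# Compare the groups without knowing which one received the drug.
summary = {code: mean(vals) for code, vals in a1c_change_by_coded_group.items()}
print(summary)

# Only after this comparison is locked in is the key unblinded,
# e.g. {"A": "placebo", "B": "medicine"}.
```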

Video transcript

- [Instructor] So let's say that I am a drug company and I have come up with a medicine that I think will help folks with diabetes, and in particular, I think it will help reduce their hemoglobin A1c levels, and for those of you who aren't familiar with what hemoglobin A1c is, I encourage you, we have a video on that on Khan Academy, but the general idea is if you have high blood sugar over roughly a three-month period of time, high blood sugar, and I can say high average blood sugar, you're going to have a high A1c, a high hemoglobin A1c level and if you have a low average blood sugar over roughly a three-month time, you're going to have a lower hemoglobin A1c. So if taking the pill seems to lower folks' A1c levels more than is likely to happen due to randomly or due to other variables, well then that means that your new pill might be effective at controlling folks' diabetes. So in this situation, when we're constructing an experiment to test this, we would say that whether or not you are taking the pill, this is the explanatory variable. Explanatory variable, and the thing that it is affecting, the thing that you're hoping has some response, in this case the A1c levels are your indicator of whether it is help controlling the blood sugar, we call that the response variable. That right over there is the response variable. So how are we actually going to conduct this experiment? Well let's say that we have a group of folks, let's say that we have been given a group of 100 folks who need to control their diabetes. So 100 people here who need to control their diabetes, and we say, "Alright, well let's take half of this group "and put them into, I guess you could say a treatment group "and another half and put them into a control group "and see if the treatment group, the one that actually "gets my pill is going to improve their A1c levels in a way "that seems like it would not be just random chance." So let's do that, so we're going to have a control group, so this is my control group, control, and this is the treatment group, this is the treatment group. And you might say, "Okay, we'll just give these folks, "the treatment group the pill and then we won't give "the pill that I created to the control group." But that might introduce a psychological aspect that maybe the benefit of the pill is just people feeling, "Hey I'm taking something that'll control my diabetes," maybe that psychologically affects their blood sugar in some way and this is actually possible, maybe it makes them act healthier in certain ways, maybe that makes them act unhealthier in certain ways 'cause they're like, "Oh I have a pill to control "my diabetes, my blood sugar, I can go eat "more sweets now and it'll control it." And so to avoid that, in order for just the very fact that someone says, "Hey I think I'm taking a medicine, "I might behave in a different way or it might even "psychologically affect my body in a certain way," what we wanna do is give both groups a pill, and we wanna do it in a way that neither group knows which pill they're getting. So what we would do here is we would give this group a placebo, a placebo, and this group would actually get the medicine, the medicine, but those pills should look the same, and people should not know which group they are in and that is a, when we do that, that is a blind experiment, experiment. Now you might have heard about double-blind experiments. 
Well that would be the case where not only do people not know which group they're in, but even their physician or the person who's administering the experiment, they don't know which one they're giving, they don't know if they're giving the placebo or the actual medicine to the group. So let's say we wanna do that. So we could do double, double-blind experiment, so even the person giving the pill doesn't know which pill they're giving. And you might say, "Well why is that important?" Well if the physician knows, or the person administering or interfacing with the patient, they might give a tell somehow, they might not put as much emphasis on the importance of taking the pill if it's a placebo, they might by accident give away some type of information. So to avoid that type of thing happening, you could do a double-blind, and there's even, some people talk about a triple-blind experiment where even the people analyzing the data don't know which group was the control group and which group was the treatment group, and once again, that's another way to avoid bias. So now that we've kinda figured out, we have a control group, we have a treatment group, we're using A1c as our response variable, so we would wanna measure folks' A1c levels, their hemoglobin A1c levels before they get either the placebo or the medicine and then maybe after three months, we would measure their A1c after, but the next question is, how do you divvy these 100 people up into these two groups, and you might say, "Well I would wanna do it randomly," and you would be right 'cause if you didn't do it randomly, if you put all the men here and all the women here, well that might, first of all, sex might explain it or behavior of men versus women might explain the differences or the non-differences you see in A1c level, if you get a lot of people of one age or one part of the country or one type of dietary habit, you don't want that, so in order to avoid having an imbalance of some of those lurking variables, you would want to randomly sample and we've done multiple videos already on ways to randomly sample, so you're going to randomly sample and put people into either groups. And a very simple way of doing that, you could give everyone here a number from one to 100, use a random number generator to do that and then, or you could use a random number generator, pick 50 names to put in the control group or 50 names to put in the treatment group and then everyone else gets put in the other group. Now, to avoid a situation, just randomly by doing a random sample, you might have a situation where there's some probability that you disproportionately have more men in one group or more women in another group and to avoid that, you could do really a version of stratified sampling that we've talked about in other videos, which is you could do what's called a block design for your random assignment where you actually split everyone into men and women and it might be 50-50 or it might even be just randomly here you got 60 women, 60 women and 40 men and what you do here is you say, "Okay let's randomly "take 30 of these women and put 'em in the control group "and 30 of the women and put 'em in the treatment group "and let's put randomly 20 of the men in the control group "and 20 of the men in the treatment group" and that way someone's sex is less likely to introduce bias into what actually happens here, so once again, doing this is called a block design, really a version of stratified sampling. 
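
As a rough illustration of the random assignment and block design described above, here is a short sketch in Python. The participant list, the 60/40 split, and the function name are hypothetical stand-ins based on the video's example, not a real trial's randomization procedure.

```python
import random
from collections import Counter

# Hypothetical participant list matching the video's example: 60 women, 40 men.
participants = [(f"P{i:03d}", "F") for i in range(60)] + \
               [(f"P{i:03d}", "M") for i in range(60, 100)]

def block_random_assign(people, seed=0):
    """Within each block (here, sex), randomly split participants half and half
    into control and treatment, so each group gets 30 women and 20 men."""
    rng = random.Random(seed)
    assignment = {}
    for sex in ("F", "M"):
        block = [pid for pid, s in people if s == sex]
        rng.shuffle(block)
        half = len(block) // 2
        for pid in block[:half]:
            assignment[pid] = "control"
        for pid in block[half:]:
            assignment[pid] = "treatment"
    return assignment

groups = block_random_assign(participants)
# Sanity check: each group should contain 30 women and 20 men.
print(Counter((groups[pid], sex) for pid, sex in participants))
```

With plain simple random assignment you would instead shuffle all 100 names at once and take the first 50, which is exactly what can leave one group with noticeably more men or women just by chance.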
Block design, and there might be other lurking variables that you wanna make sure doesn't just show up here randomly and so you might want, there's other ways of randomly assigning. Now once you do this, you see what was a change in A1c. If you see that, hey, the change in A1c, one if you see there's no difference in A1c levels between these two groups, and you're like, "Hey, there's a good probability that my pill does nothing" and once again it's all about probabilities, there's some chance that you're just unlucky and it might be a very small chance and that's why you wanna do this with a good number of people and as we forward our statistics understandings, we will better understand at what threshold levels do we think the probability is high or low enough for us to really feel good about our findings. But let's say that you do see an improvement, you need to think about, is that improvement, could that have happened due to random chance or is it very unlikely that that happened due purely to random chance, and if it was very unlikely that it happened due purely to random chance, then you would feel pretty good, and other people when you publish the results, would feel pretty good about your medicine. Now, even then, science is not done. No one will say that they are 100% sure that your medicine is good, there still might have been some lurking variables that we did not, that our experiment did not properly adjust for, that just when we even did this block design, we might have disproportionately gotten randomly older people in one of the groups or the other or people from one part of the country in one group or another so there's always things to think about and the most important thing to think about, even if you did this as good as you could, you still, some random chance might have given you a false positive, you got good results even though it was random, or a false negative, you got bad results even though it was actually random. And so a very important idea in experiments and this is in science in general is that this experiment, you should document it well and it should be, the process of replication, other people should be able to replicate this experiment and hopefully get consistent results so it's not just about the results, it's your experiment design, other people should, it should be an experiment that other people could and should replicate to reinforce the idea that your results are actually true and not just random or just due to some bad administration of the actual experiment.
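
One standard way to ask "could this difference have happened due purely to random chance?" is a permutation test: reshuffle the group labels many times and see how often a difference at least as favorable to the treatment appears. The sketch below is a simplified illustration with made-up A1c changes, not the specific procedure described in the video.

```python
import random

def permutation_test(control_changes, treatment_changes, n_resamples=10_000, seed=0):
    """Fraction of random relabelings whose treatment-minus-control difference
    in mean A1c change is at least as favorable as the observed one."""
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = mean(treatment_changes) - mean(control_changes)
    pooled = list(control_changes) + list(treatment_changes)
    n_treat = len(treatment_changes)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        # "Change" means after minus before, so a more negative difference
        # means the relabeled treatment group improved more.
        if diff <= observed:
            hits += 1
    return hits / n_resamples

# Made-up A1c changes (after minus before), purely for illustration:
control = [-0.1, 0.0, -0.2, 0.1, -0.1, 0.0]
treated = [-0.6, -0.4, -0.5, -0.3, -0.7, -0.5]
print(permutation_test(control, treated))  # a small fraction suggests chance alone is unlikely
```

If that fraction is very small, it is unlikely the improvement happened due purely to random chance, which is the judgment the transcript describes; replication by other researchers then guards against a lucky (or unlucky) single result.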