Introduction to experiment design

Video transcript

So let's say that I am a drug company, and I have come up with a medicine that I think will help folks with diabetes. In particular, I think it will help reduce their hemoglobin A1c levels. For those of you who aren't familiar with what hemoglobin A1c is, I encourage you to watch the video we have on that on Khan Academy. But the general idea is: if you have high average blood sugar over roughly a three-month period, you're going to have a high hemoglobin A1c level, and if you have low average blood sugar over roughly three months, you're going to have a lower hemoglobin A1c. So if taking the pill seems to lower folks' A1c levels more than is likely to happen due to random chance or due to other variables, then your new pill might be effective at controlling folks' diabetes.

So in this situation, when we're constructing an experiment to test this, we would say that whether or not you are taking the pill is the explanatory variable. And the thing that it is affecting, the thing that you're hoping has some response, we call the response variable. In this case, the A1c levels are your indicator of whether the pill is helping control blood sugar, so A1c is the response variable.

So how are we actually going to conduct this experiment? Well, let's say that we have a group of 100 folks who need to control their diabetes. Let's take half of this group and put them into, I guess you could say, a treatment group, and put the other half into a control group, and see if the treatment group, the one that actually gets my pill, improves their A1c levels in a way that seems like it would not be just random chance. So let's do that. So this is my control
group, and this is the treatment group. You might say, okay, we'll just give the folks in the treatment group the pill, and we won't give the pill that I created to the control group. But that might introduce a psychological aspect: maybe the benefit of the pill is just people feeling, hey, I'm taking something that'll control my diabetes. Maybe that psychologically affects their blood sugar in some way, and this is actually possible. Maybe it makes them act healthier in certain ways. Maybe it makes them act unhealthier in certain ways, because they think, oh, I have a pill to control my blood sugar, I can go eat more sweets now and it'll control it.

So, because the very fact that someone thinks, hey, I'm taking a medicine, might make them behave in a different way, or might even psychologically affect their body in a certain way, what we want to do is give both groups a pill, and do it in a way that neither group knows which pill they're getting. So we would give the control group a placebo, and the treatment group would actually get the medicine. But those pills should look the same, and people should not know which group they are in. When we do that, we call it a blind experiment.

Now, you might have heard about double-blind experiments. That would be the case where not only do people not know which group they're in, but even their physician, or the person who's administering the experiment, doesn't know whether they're giving the placebo or the actual medicine. So let's say we want to do that: we could do a double-blind experiment, so even the person giving the pill doesn't know which pill they're giving. And you might say, well, why is that important? Well, if the physician knows, they
might (or the person administering or interfacing with the patient might) give a tell somehow. They might not put as much emphasis on the importance of taking the pill if it's a placebo. They might by accident give away some type of information. So to avoid that type of thing happening, you could do a double-blind experiment. And some people even talk about a triple-blind experiment, where even the people analyzing the data don't know which group was the control group and which group was the treatment group. Once again, that's another way to avoid bias.

So now that we've figured out that we have a control group and a treatment group, and we're using A1c as our response variable, we would want to measure folks' hemoglobin A1c levels before they get either the placebo or the medicine, and then maybe after three months we would measure their A1c again.

But the next question is, how do you divvy these hundred people up into these two groups? You might say, well, I would want to do it randomly, and you would be right. Because if you didn't do it randomly, if you put all the men here and all the women here, then sex, or the behavior of men versus women, might explain the differences, or the lack of differences, that you see in A1c levels. The same goes if one group gets a lot of people of one age, or one part of the country, or one type of dietary habit. You don't want that. So in order to avoid having an imbalance in some of those lurking variables, you would want to assign people to the groups randomly, and we've done multiple videos already on ways to randomly sample. In a very simple way of doing that, you could give everyone here a number from 1 to 100 and then use a random number generator to pick 50 names to put in the control group, or 50 names to put in the treatment group,
and then everyone else gets put in the other group.

Now, just by doing a simple random assignment, there's some probability that you end up with disproportionately more men in one group or more women in the other. To avoid that, you could do what's called a block design for your random assignment, which is really a version of the stratified sampling that we've talked about in other videos. You actually split everyone into men and women. It might be 50/50, or it might be that, just randomly, you got 60 women and 40 men. Then you say, okay, let's randomly take 30 of these women and put them in the control group and the other 30 women in the treatment group, and let's randomly put 20 of the men in the control group and 20 of the men in the treatment group. That way someone's sex is less likely to introduce bias into what actually happens here. So once again, doing this is called a block design. And there might be other lurking variables that you want to make sure don't just show up here randomly, so there are other ways of randomly assigning.

Now, once you do this, you see what the change in A1c was. If you see that there's no difference in A1c levels between these two groups, you'd say, hey, there's a good probability that my pill does nothing. Even then, once again, it's all about probabilities: there's some chance that you were just unlucky, and it might be a very small chance. That's why you want to do this with a good number of people. And as we advance our understanding of statistics, we will better understand at what threshold levels we think the probability is high or low enough for us to really feel good about our findings. But let's say that
you do see an improvement. You need to think about: could that improvement have happened due to random chance, or is it very unlikely that it happened purely due to random chance? If it was very unlikely that it happened purely due to random chance, then you would feel pretty good, and other people, when you publish the results, would feel really good about your medicine.

Now, even then, science is not done here. No one will say that they are a hundred percent sure that your medicine is good. There still might have been some lurking variables that our experiment did not properly adjust for. Even with this block design, we might have randomly gotten disproportionately older people in one of the groups, or people from one part of the country in one group or the other. So there are always things to think about. And the most important thing to think about is that, even if you did this as well as you could, random chance might still have given you a false positive, where you got good results even though they were just due to chance, or a false negative, where you got bad results by chance even though the medicine actually works.

And so a very important idea in experiments, and in science in general, is replication. You should document this experiment well, and other people should be able to replicate it and hopefully get consistent results. So it's not just about the results, it's about your experiment design: it should be an experiment that other people could and should replicate, to reinforce the idea that your results are actually true and not just random or due to some bad administration of the actual experiment.
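
To make the random assignment described in the transcript concrete, here is a minimal Python sketch of both approaches: picking half the people at random, and the block design that randomizes separately within men and women. The function names, the roster layout (an ID paired with "F" or "M"), and the seed value are all assumptions made for illustration; the video does not prescribe any particular implementation.

```python
import random


def simple_assignment(participants, seed=None):
    """Shuffle everyone, then put the first half in control and the rest in treatment."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


def blocked_assignment(participants, block_of, seed=None):
    """Split participants into blocks, then randomize within each block separately."""
    rng = random.Random(seed)
    blocks = {}
    for person in participants:
        blocks.setdefault(block_of(person), []).append(person)

    groups = {"control": [], "treatment": []}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        groups["control"].extend(members[:half])
        groups["treatment"].extend(members[half:])
    return groups


# Hypothetical roster matching the transcript's example: 60 women and 40 men.
roster = [(i, "F" if i <= 60 else "M") for i in range(1, 101)]

groups = blocked_assignment(roster, block_of=lambda person: person[1], seed=0)
for name, members in groups.items():
    women = sum(1 for _, sex in members if sex == "F")
    men = len(members) - women
    print(f"{name}: {len(members)} people ({women} women, {men} men)")
```

With the block design, each group always ends up with 30 women and 20 men no matter which seed is used, whereas `simple_assignment` only balances the sexes on average. Fixing the seed is one way to make the assignment reproducible, which fits the transcript's point about documenting an experiment so that others can replicate it.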