AP.STATS: DAT‑1 (EU), DAT‑1.C (LO), DAT‑1.C.2 (EK), DAT‑2 (EU), DAT‑2.A (LO), DAT‑2.A.3 (EK), DAT‑2.A.4 (EK), DAT‑2.B (LO), DAT‑2.B.3 (EK), VAR‑3 (EU), VAR‑3.E (LO), VAR‑3.E.1 (EK), VAR‑3.E.2 (EK), VAR‑3.E.3 (EK)

Video transcript

Let's talk about the main types of statistical studies. You can have a sample study, which we've already talked about in several videos but will go over again in this one. You can have an observational study. Or you can have an experiment. Let's go through each of these. As always, pause this video and see if you can think about what these words likely mean, or you might already know.

A sample study, which we have looked at before, is where you're trying to estimate the value of a parameter for a population. What's an example of that? Let's say we take the population of people in a city, which could be hundreds of thousands of people, and the parameter you care about is how much time, on average, they spend on a computer each day. The parameter is for the entire population. If it were possible, and maybe there are a million people in the city, you would talk to all million of those people, ask them how much time they spend on a computer, and take the average. That would be the population parameter: average daily time on a computer.

Now, you determine that it's impractical to go talk to everyone, so you're not going to be able to figure out the exact population parameter. Instead, you do a sample study. You randomly sample people from your population. A lot of thought goes into whether your sample is truly random, and there are different techniques for random sampling. Then you take the average daily time on a computer for your sample, and that is going to be an estimate of the population parameter. So that's your classic sample study.

In an observational study, you're not trying to estimate a parameter. You're trying to understand how two variables in a population might move together or not. Let's say you have a population of 1,000 people, and you're curious about how computer time relates to people's blood pressure. What you do is give a survey to all 1,000 people and ask them how much time they spend on a computer and what their blood pressure is, or maybe you measure it in some way. Then you plot it all, look at the data, and see whether those two variables move together.

What does that mean? Let me draw it. Say this axis is computer time and this axis is blood pressure. There's one person who doesn't spend a lot of time on a computer and has relatively low blood pressure. There's another person who spends a lot of time on a computer and has high blood pressure. There could be someone who doesn't spend much time on a computer but has reasonably high blood pressure. You keep doing this and you get data points for all 1,000 people. I'm not going to sit here and draw 1,000 points, but you see something like this. There are definitely some outliers, but it looks like these two variables move together: in general, the more computer time, the higher the blood pressure, or the higher the blood pressure, the more computer time. So you can draw a conclusion here that these two variables are positively correlated. A reasonable conclusion, if you did the study appropriately, would be that more computer time correlates with higher blood pressure, or equivalently that higher blood pressure correlates with more computer time.

Now, when you do these observational studies, or when you interpret them, or when you read about someone else's, it's very important not to say, "This shows me that computer time causes high blood pressure," because the study is not showing causality. You also can't say that high blood pressure somehow causes people to spend more time in front of a computer. That seems even a little bit sillier, but the two claims are equally unsupported, because all the study tells you is that there's a correlation: these two variables move together. You can't draw a conclusion about causality in either direction.

Why can't you draw that conclusion? Because there could be what's called a confounding variable, sometimes also called a lurking variable. So this is computer time, and this is blood pressure, and it looks like these two things move together. We saw that right here in our data. But there could be a root variable that drives both of them, a confounding variable, and that could just be the amount of physical activity someone gets. A lack of physical activity could be driving both: people who are less active spend more time in front of a computer, and people who are less active have higher blood pressure. If you were to control for this, by taking a bunch of people who all had a similar level of activity, you might see that computer time does not correlate with blood pressure, and that both are just driven by the same thing. What you're really seeing is that in people who aren't active, the inactivity drives both of these variables. So once again, when you do an observational study, and you do it well, you can find correlations, and those might give you decent hypotheses about causality. But the study does not show causality, because you could have these confounding variables.

Now, experiments. Experiments are the basis of the scientific method, and they are all about trying to establish causality. If you wanted to do an experiment, you probably wouldn't be able to do it with a thousand people. Experiments are in some ways the hardest to do of all of these. Maybe you take a hundred people, and to keep this confounding variable from introducing error into your experiment, you randomly assign these hundred people to two groups. It's very important that they're randomly assigned. What's nice is that you might not even know all of the confounding variables, but random assignment makes it likely that the activity levels, on average, will be similar in each group. It gives you a better chance that one group doesn't have a significantly different activity level than the other.

Then you have a control group and a treatment group. Once again, people have been randomly assigned to them. You might say that, for some amount of time, everyone in the control group can spend a maximum of 30 minutes a day in front of a computer. Or, if you really wanted to be strict, you could say they have to spend exactly 30 minutes on a computer, though that's maybe a little unrealistic. And you tell the treatment group that they have to spend exactly two hours in front of a computer. I'm making up these numbers. It would also be nice to see what everyone's blood pressure was before the experiment, so you could say the averages are similar going into the experiment. Then you let some amount of time pass and you measure blood pressure again.

Suppose you see that the treatment group definitely has higher blood pressure. Once again, some of this might have happened by chance. It might have been the people you happened to put in that group, and so on. But if this was a large enough experiment and you conducted it well, it gives you evidence of causality: making these people spend more time in front of a computer actually raised their blood pressure.

So once again: in a sample study, you're trying to estimate a population parameter. In an observational study, you're seeing whether there is a correlation between two things, and you have to be careful not to say that one is causing the other, because you could have confounding variables. In an experiment, you're trying to establish or show causality. You do that by taking your group and randomly assigning people to control or treatment, which should distribute the confounding variables roughly evenly (not always, since there's some chance it doesn't). Then, for each group, you change how much of one of these variables they get and see whether it drives the other variable.

In the next few videos, we'll do some examples of identifying these types of statistical studies and thinking about what we can conclude from them.
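
To make the three study types concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration and is not from the video: the made-up model in which inactivity drives both computer time and blood pressure, all of the coefficients, the function names, and the group sizes. The experiment uses 2,000 simulated people rather than 100 so that the result is stable from run to run.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return cov / (ssx * ssy) ** 0.5


def blood_pressure(inactivity, hours, direct_effect, rng):
    """Assumed model: inactivity raises blood pressure; computer hours
    raise it only when direct_effect is greater than zero."""
    return 110 + 3 * inactivity + direct_effect * hours + rng.gauss(0, 5)


def make_population(n, direct_effect, rng):
    """Return n people as (inactivity, computer_hours, blood_pressure)."""
    people = []
    for _ in range(n):
        inactivity = rng.uniform(0, 10)  # the confounding (lurking) variable
        hours = max(0.0, 0.5 * inactivity + rng.gauss(0, 1))
        bp = blood_pressure(inactivity, hours, direct_effect, rng)
        people.append((inactivity, hours, bp))
    return people


def sample_study(people, sample_size, rng):
    """Estimate mean daily computer time from a simple random sample."""
    sample = rng.sample(people, sample_size)
    return statistics.fmean([p[1] for p in sample])


def observational_study(people):
    """Correlation between computer time and blood pressure as observed."""
    return pearson([p[1] for p in people], [p[2] for p in people])


def experiment(n, direct_effect, rng):
    """Randomly assign n people to 0.5 h (control) or 2 h (treatment) and
    return mean treatment blood pressure minus mean control blood pressure."""
    inactivity = [rng.uniform(0, 10) for _ in range(n)]
    rng.shuffle(inactivity)  # random assignment: first half is control
    half = n // 2
    control = [blood_pressure(a, 0.5, direct_effect, rng) for a in inactivity[:half]]
    treatment = [blood_pressure(a, 2.0, direct_effect, rng) for a in inactivity[half:]]
    return statistics.fmean(treatment) - statistics.fmean(control)


rng = random.Random(0)
# Population in which computer time has NO direct effect on blood pressure.
people = make_population(10_000, direct_effect=0.0, rng=rng)

print("population mean hours:", statistics.fmean([p[1] for p in people]))
print("sample estimate:      ", sample_study(people, 500, rng))
print("correlation, everyone:", observational_study(people))

# "Control for" the confounder by looking only at people with similar inactivity.
similar = [p for p in people if 4.5 <= p[0] <= 5.5]
print("correlation, similar activity:", observational_study(similar))

print("experiment, no direct effect:  ", experiment(2_000, 0.0, rng))
print("experiment, real direct effect:", experiment(2_000, 4.0, rng))
```

Under these assumed coefficients, the correlation across the whole population should come out strongly positive even though computer time has no direct effect in the model, and it should shrink toward zero among people with similar activity levels. The randomized experiment should show a clear difference between the groups only when a direct effect is built into the model.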