AP.STATS: DAT‑1 (EU), DAT‑1.C (LO), DAT‑1.C.2 (EK), DAT‑2 (EU), DAT‑2.A (LO), DAT‑2.A.3 (EK), DAT‑2.A.4 (EK), DAT‑2.B (LO), DAT‑2.B.3 (EK), VAR‑3 (EU), VAR‑3.E (LO), VAR‑3.E.1 (EK), VAR‑3.E.2 (EK), VAR‑3.E.3 (EK)

- [Instructor] Let's talk about the main types of statistical studies. You can have a sample study, which we've already talked about in several videos but will go over again in this one. You can have an observational study. Or you can have an experiment. So let's go through each of these, and as always, pause this video and see if you can think about what these words likely mean, or you might already know. Well, a sample study, we have looked at. This is really where
you're trying to estimate the value of a parameter for a population. So what's an example of that? Let's say we take the population of people in a city, which could be hundreds of thousands of people, and the parameter that you care about is how much time on average they spend on a computer. The parameter would be for the entire population. If it were possible, and maybe there are a million people in the city, you would talk to all million of those people, ask them how much time they spend on a computer, and take the average, and that would be the parameter. So the population parameter would be the average daily time on a computer. Now, you'd determine that it's impractical to go talk to everyone, so you're not going to be able to figure out the exact population parameter. So instead, you do a sample study. You randomly sample people from your population, and there's a lot of thought that goes into whether your sample is truly random, and there are also different techniques for randomly sampling. Then you take the average daily time on a computer for your sample, and that is going to be an estimate for the population parameter. So that's your classic sample study.

Now in an observational study, you're not trying to estimate a parameter. You're trying to understand
how two variables in a population might move together or not. So let's say that you have a population of 1,000 people, and you're curious about how daily time on a computer relates to people's blood pressure. So computer time versus blood pressure. What you do is give a survey to all 1,000 people, and you ask them, how much time do you spend on a computer, and what is your blood pressure? Or maybe you measure it in some way. Then you plot it all, you look at the data, and you see if those two
variables move together. So what does that mean? Well, let me draw it. Let's say this axis is computer time and this axis is blood pressure. Let's say there's one person who doesn't spend a lot of time on a computer and has relatively low blood pressure. There's another person who spends a lot of time and has high blood pressure. There could be someone who doesn't spend much time on a computer but has reasonably high blood pressure. You keep doing this and you get data points for all 1,000 people. I'm not going to sit here and draw 1,000 points, but you see something like this. And you see, hey, look, there are definitely some outliers, but it looks like these two variables move together. It looks like, in general, the more computer time, the higher the blood pressure, or the higher the blood pressure, the more computer time. So you can conclude that these two variables are positively correlated. A reasonable conclusion, if you did the study appropriately, would be that more computer time correlates with higher blood pressure, or that higher blood pressure correlates with more computer time.

Now, when you do these
observational studies, or when you interpret someone else's, it's very important not to say, oh, well, this shows me that computer time causes high blood pressure, because this is not showing causality. And you also can't say that somehow high blood pressure causes people to spend more time in front of a computer. That seems even a little bit sillier, but the two claims are actually on the same footing, because all the study shows is that there's a correlation. These two variables move together. You can't make a conclusion about causality, that computer time causes high blood pressure or that high blood pressure causes more computer time. Why can't you make that conclusion? Well, there could be what's
called a confounding variable, sometimes called a lurking variable. So this is computer time, and this is blood pressure, and it looks like these two things move together. We saw that right over here in our data. But there could be a root variable that drives both of these, a confounding variable, and that could just be the amount of physical activity someone has. So there could be a lack of physical activity driving both. People who are less active spend more time in front of a computer, and people who are less active have higher blood pressure. If you were to control for this, if you were to take a bunch of people who had a similar level of activity, you might see that computer time does not correlate with blood pressure, that both are just driven by the same thing. What you're really seeing here is that a lack of activity drives both of these variables. So once again, when you do this observational study, and if you do it well, you can find correlations, and those might give you decent hypotheses for causality, but this does not show causality, because you could have these confounding variables.

Now, experiments, and experiments are the basis of the scientific method. Experiments are all about
trying to establish causality. If you wanted to do an experiment, you probably wouldn't be able to do it with 1,000 people. Experiments in some ways are the hardest to do of all of these. Maybe you take 100 people, and to keep this confounding variable from introducing error into your experiment, you randomly assign these hundred people to two groups. It's very important that they're randomly assigned. And that's nice, because you might not know all of the confounding variables, but random assignment makes it likely that each group will have about the same number of inactive people, or similar activity levels on average. It gives you a better chance that one group doesn't have a significantly different activity level than the other. Then you have a control group and a treatment group. Once again, you've randomly assigned them. And what you might say is, okay, for some amount of time, all
of you in the control group can spend a max of 30 minutes in front of a computer. Or maybe, if you really wanted to do it, you'd say you have to spend exactly 30 minutes on a computer, and that's maybe a little unrealistic. And to the treatment group, you say, you have to spend exactly two hours in front of a computer. I'm making up these numbers at random. It would be nice to see, okay, what was everyone's blood pressure before the experiment? And you'd say, okay, well, the averages are similar going into the experiment. Then you wait some amount of time and you measure blood pressure again. Say you see that, wow, the treatment group definitely has a higher blood pressure. Once again, some of this might have just happened randomly. It might have been the people you happened to put in there, et cetera, et cetera. But if this was a large enough experiment and you conducted it well, this says, hey, look, I'm feeling like there is a causality here, that making these people spend more time in front of a computer actually raised
their blood pressure.

So once again: in a sample study, you're trying to estimate a population parameter. In an observational study, you are seeing if there is a correlation between two things, and you have to be careful not to say, hey, one is causing the other, because you could have confounding variables. In an experiment, you're trying to establish or show causality, and you do that by taking your group and randomly assigning people to a control or treatment group. That should evenly, or hopefully evenly, distribute the confounding variables. Not always, there's some chance it doesn't. Then for each group, you change how much of one of these variables they get, and you see if it drives the other variable. So anyway, in the next two videos, we'll do some examples of identifying these types of statistical studies and thinking about what we can conclude from them.
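
To make the three study types a bit more concrete, here is a minimal Python sketch that simulates each one. Everything numeric in it is an assumption made up for illustration: the distributions, the coefficients tying activity to computer time and blood pressure, and the names `pearson`, `run_experiment`, and `effect_per_hour` are all hypothetical. Only the 1,000 people, the 100 people, the 30 minutes, and the two hours come from the discussion above.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# 1. Sample study: estimate a population parameter from a random sample.
population = [rng.gauss(3.0, 1.0) for _ in range(100_000)]  # made-up daily hours
parameter = statistics.fmean(population)  # usually impractical to measure
sample = rng.sample(population, 500)      # simple random sample
estimate = statistics.fmean(sample)       # sample statistic

# 2. Observational study: activity (the confounder) drives both variables.
#    In this made-up world, computer time has NO direct effect on pressure.
activity = [rng.gauss(0.0, 1.0) for _ in range(1000)]
computer = [3.0 - 1.0 * a + rng.gauss(0.0, 0.5) for a in activity]
pressure = [120.0 - 8.0 * a + rng.gauss(0.0, 5.0) for a in activity]
r_all = pearson(computer, pressure)

# "Control for" activity by looking only at people with similar activity.
similar = [i for i, a in enumerate(activity) if abs(a) < 0.2]
r_similar = pearson([computer[i] for i in similar],
                    [pressure[i] for i in similar])


# 3. Experiment: randomly assign 100 people to control or treatment.
def run_experiment(effect_per_hour, seed=1):
    """Return mean pressure (treatment) minus mean pressure (control).

    effect_per_hour is a hypothetical causal effect of one hour of
    computer time on blood pressure; 0.0 means no causal effect.
    """
    exp_rng = random.Random(seed)
    # Each person: (activity level, individual noise in blood pressure).
    people = [(exp_rng.gauss(0.0, 1.0), exp_rng.gauss(0.0, 5.0))
              for _ in range(100)]
    order = list(range(100))
    exp_rng.shuffle(order)        # random assignment
    treatment = set(order[:50])   # the other 50 people are the control group

    def blood_pressure(i):
        act, noise = people[i]
        hours = 2.0 if i in treatment else 0.5  # two hours vs. 30 minutes
        return 120.0 - 8.0 * act + effect_per_hour * hours + noise

    treated = [blood_pressure(i) for i in range(100) if i in treatment]
    control = [blood_pressure(i) for i in range(100) if i not in treatment]
    return statistics.fmean(treated) - statistics.fmean(control)


print(f"sample study: parameter={parameter:.3f}, estimate={estimate:.3f}")
print(f"observational: r(all)={r_all:.2f}, r(similar activity)={r_similar:.2f}")
print(f"experiment, no causal effect: diff={run_experiment(0.0):.2f}")
print(f"experiment, 4 units per hour: diff={run_experiment(4.0):.2f}")
```

In this simulated world the observational data shows a strong positive correlation even though computer time has no direct effect on blood pressure, and the correlation largely disappears among people with similar activity levels. The experiment, by contrast, only shows a clear difference between groups when `effect_per_hour` is set to something other than zero, which is the sense in which random assignment lets you reason about causality.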