
## Introduction to scatterplots


# Bivariate relationship linearity, strength and direction

AP.STATS: DAT‑1 (EU), DAT‑1.A (LO), DAT‑1.A.1 (EK), DAT‑1.A.2 (EK), DAT‑1.A.3 (EK), DAT‑1.A.4 (EK), DAT‑1.A.5 (EK), DAT‑1.A.6 (EK)

## Video transcript

- [Instructor] What we have here is six different scatter plots that show the relationship between different variables. So, for example, in this one here, on the horizontal axis, we might have something like age, and then here it could be accident frequency. And I'm just making this up. And I could just show these data points, maybe from some kind of statistical survey, that, when the age is this, whatever number this is, maybe this is 20 years old, this is the accident frequency. And it could be a number of accidents per hundred. And that, when the age is 21 years old, this is the frequency. And so, these data scientists, or statisticians, went and plotted all of these in this scatter plot.

This is often known as bivariate data, which is a very fancy way of saying, hey, you're plotting things that take two variables into consideration, and you're trying to see whether there's a pattern with how they relate. And what we're going to do in this video is think about, well, can we try to fit a line? Does it look like there's a linear or non-linear relationship between the variables on the different axes? How strong is that relationship? Is it a positive, is it a negative relationship? And then, we'll think about this idea of outliers.

So let's just first think about whether there's a linear
or non-linear relationship. And I'll get my little ruler tool out here. So, this data right over here, it looks like I could put a line through it that gets pretty close to the data. It's very unlikely you're gonna be able to go through all of the data points, but you can try to get a line. There are more numerical, more precise ways of doing this, but I'm just eyeballing it right over here. And it looks like I could plot a line that looks something like that, that goes roughly through the data. So this looks pretty linear. And so I would call this a linear relationship.

And since, as we increase one variable, it looks like the other variable decreases, this is a downward-sloping line. I would say this is a negative linear relationship. And this one looks pretty strong, because the dots aren't that far from my line. This one gets a little bit further, but there aren't dots way out there. And so, most of 'em are pretty close to the line. So I would call this a negative, reasonably strong linear relationship. Negative, strong (I'll just say strong, but reasonably strong), linear relationship
between these two variables.

Now, let's look at this one. And pause this video and think about what this one would be for you. Well, let's see. I'll get my ruler tool out again. And it looks like, generally speaking, as one variable increases, the other variable increases as well, so something like this goes through the data and approximates the direction. And this looks positive. As one variable increases, the other variable increases, roughly. So this is a positive relationship. But this is weak. A lot of the data is well off of the line. So, positive, weak. But I'd say this is still linear. It seems that, as we increase one, the other one increases at roughly the same rate, although these data points are all over the place. So, I would still call this linear.

Now, there's also this notion of outliers. If I said, hey, this line is trying to describe the data, well, we have some data that is fairly far off the line. So, for example, even though we're saying it's a positive, weak, linear relationship, this one over here is reasonably high on the vertical variable, but it's low on the horizontal variable. And so, this one right over here is an outlier. It's quite far away from the line. And this is a little bit subjective. With outliers, it's a question of, well, what looks pretty far from the rest of the data? This could also be an outlier. Let me label these. Outlier.

Now, pause the video and see if you can think about this one. Is this positive or
negative, is it linear, non-linear, is it strong or weak? I'll get my ruler tool out here. So, this goes here. It seems like I can fit a line pretty well to this. Maybe I'll do the line in purple. I could fit a line that looks like that. And so, this one looks like it's positive. As one variable increases, the other one does, for these data points. So it's positive. I'd say this is pretty strong. The dots are pretty close to the line there. It really does look like a little bit of a fat line, if you just look at the dots. So, positive, strong, linear relationship. And none of these data points are really strong outliers. This one's a little bit further out. But they're all pretty close to the line, and seem to describe that trend roughly.

All right, now, let's look
at this data right over here. So, let me get my line tool out again. So, it looks like I can fit a line, and it looks like it's a positive relationship. The line would be upward sloping. It would look something like this. And, once again, I'm eyeballing it. You can use computers and other methods to actually find a more precise line that minimizes the collective distance to all of the points. But it looks like there is a positive relationship, and I would say this one is a weak linear relationship, 'cause we have a lot of points that are far off the line. So, not so strong. So, I would call this a positive, weak, linear relationship. And there are a lot of outliers here. This one over here is pretty far out.

Now, let's look at this one. Pause this video and think about, is it positive or negative,
is it strong or weak? Is this linear or non-linear? Well, the first thing we wanna do is think about whether it's linear or non-linear. I could try to put a line on it. But if I try to put a line on it, it's actually quite difficult. If I try to do a line like this, you'll notice everything is kind of bending away from the line. It looks like, generally, as one variable increases, the other variable decreases, but they're not doing it in a linear fashion. It looks like there's some other type of curve at play. So, I could try to do a fancier curve that looks something like this, and this seems to fit the data a lot better. So this one, I would describe as non-linear. And it is a negative relationship. As one variable increases, the other variable decreases. So, this is a negative, I would say, reasonably strong non-linear relationship. Pretty strong. And once again, this is subjective. So, I'll say negative, reasonably strong, non-linear relationship. And maybe you could call this one an outlier, but it's not that far, and I might even be able to fit a curve that gets a little bit closer to that. And once again, I'm eyeballing this.

Now let's do this last one. And so, this one looks like a
negative linear relationship to me, a fairly strong negative linear relationship, although there are some outliers. So, let me draw this line. So that seems to fit the data pretty well. So this is a negative, reasonably strong linear relationship. But these are very clear outliers. These are well away from the line, or from the cluster where most of the points are. So, a reasonably strong relationship, with at least these two significant outliers here.

So hopefully this makes
you a little bit familiar with some of this terminology. It's important to keep in mind that this is a little bit subjective. There'll be some cases that are more obvious than others. And oftentimes, you wanna make a comparison: this is a stronger positive linear relationship than this one right over here, 'cause you can see most of the data is closer to the line. This one, for sure, is more non-linear than linear. Oftentimes, it comes down to making a comparison, or making a subjective call, on how to describe the data.
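
The video eyeballs every line, and mentions that computers can find a more precise line that minimizes the collective distance to all of the points. Here is a rough sketch of what that could look like in Python. It assumes "collective distance" means the sum of squared vertical distances (ordinary least squares), which is one common choice and not something the video specifies. The function names `describe_relationship` and `flag_outliers`, the cutoff `k`, and the age and accident numbers are all made up for illustration.

```python
from math import sqrt


def describe_relationship(xs, ys):
    """Fit a least-squares line and compute Pearson's r for paired data.

    Assumes xs and ys have the same length and that neither is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # sign gives the direction
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)            # |r| near 1 suggests a strong linear fit
    return slope, intercept, r


def flag_outliers(xs, ys, slope, intercept, k=2.0):
    """Return indices of points far from the line.

    The rule here (residual larger than k times the root-mean-square
    residual) is a hypothetical cutoff, not a standard from the video.
    """
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    rms = sqrt(sum(e * e for e in residuals) / len(residuals))
    return [i for i, e in enumerate(residuals) if abs(e) > k * rms]


# Made-up data in the spirit of the video's age vs. accident frequency example.
ages = [20, 21, 22, 23, 24, 25, 26, 27]
accidents_per_hundred = [9.0, 8.4, 8.1, 7.2, 6.9, 6.1, 5.8, 5.0]

slope, intercept, r = describe_relationship(ages, accidents_per_hundred)
direction = "positive" if slope > 0 else "negative"
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f} ({direction})")
print("possible outliers at indices:",
      flag_outliers(ages, accidents_per_hundred, slope, intercept))
```

The sign of the slope (and of r) gives the direction, and a value of r closer to 1 or -1 suggests a stronger linear relationship. Note that r only measures linear association, so a curved pattern like the fifth plot would need a different check, and any outlier cutoff is still a judgment call, just as in the video.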