# Conditions for inference on slope

*AP®︎/College Statistics > Unit 13 > Lesson 1: Confidence intervals for the slope of a regression model*

AP.STATS: UNC‑4 (EU), UNC‑4.AD (LO), UNC‑4.AD.1 (LO), VAR‑7 (EU), VAR‑7.L (LO), VAR‑7.L.1 (EK)

Introducing the conditions for making a confidence interval or doing a test about slope in least-squares regression.
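When these conditions hold, the interval this lesson builds toward is the standard t-interval for the slope $b$ of the least-squares line, with $n - 2$ degrees of freedom:

$$
b \pm t^{*}\, SE_b, \qquad SE_b = \frac{s}{\sqrt{\sum_i (x_i - \bar{x})^2}}, \qquad s = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n-2}}
$$

where $t^{*}$ is the critical value for $n - 2$ degrees of freedom and $s$ is the standard deviation of the residuals.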

## Want to join the conversation?

- If the t-test is to see whether the linear regression is legit, why is it that the linear part would be a condition? Isn't that what we're trying to prove? (3 votes)
- Why is "equal variance" a requirement? (1 vote)
  - Variance means difference. Equal variance means equal difference: the spread around the line should be the same at any point. (2 votes)

- Hey! I learned that the "normality condition" relates to the normal distribution of the disturbances/errors. The prof mentioned that it is a common misconception that this condition relates to the normal distribution of a variable. Am I confusing something? (1 vote)
  - I'm fairly sure you're right! I think Sal meant exactly what you meant, but expressed it in an unclear way. (1 vote)
- Could you please do videos on bar graphs, but for this same topic? (1 vote)
- Is equal variance the same as the data being non-heteroscedastic? (0 votes)
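On that last question: yes, "equal variance" is the same idea as homoscedasticity, and a violation is heteroscedasticity. A rough numeric sketch of checking it, using simulated (entirely hypothetical) data whose scatter grows with x, is to compare the residual spread at low and high x after fitting the least-squares line:

```python
import math
import random

random.seed(1)

# Simulated, heteroscedastic data: noise SD grows with x.
x = [i / 10 for i in range(1, 101)]
y = [2 + 3 * xi + random.gauss(0, 0.5 + 2 * xi) for xi in x]

# Fit the least-squares line by hand.
n = len(x)
xb, yb = sum(x) / n, sum(y) / n
sxx = sum((xi - xb) ** 2 for xi in x)
b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
a = yb - b * xb
res = [yi - (a + b * xi) for xi, yi in zip(x, y)]


def spread(errors):
    """Root-mean-square of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Crude diagnostic: compare residual spread in the lower and
# upper halves of the x range (x is sorted, so res is too).
half = n // 2
low, high = spread(res[:half]), spread(res[half:])
print(f"residual spread, low x: {low:.2f}  high x: {high:.2f}")
# A large ratio suggests the equal-variance condition is violated.
```

In practice a residual-versus-x plot (looking for a fan shape) is the usual classroom check; this split-the-data comparison is just a quick numeric stand-in.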

## Video transcript

- [Instructor] In a previous video, we began to think about how we can use a regression line, and in particular the slope of a regression line based on sample data, in order to make inference about the slope of the true population regression line. In this video, we're going to think about the conditions for inference when we're dealing with regression lines. These are, in some ways, similar to the conditions for inference that we thought about when we were doing hypothesis testing and confidence intervals for means and for proportions, but there are also a few new conditions.

To help us remember these conditions, you might want to think about the LINER acronym, L-I-N-E-R. If it isn't obvious to you, this almost spells linear: if LINER had an A, it would be LINEAR. That's valuable because, remember, we're thinking about linear regression.

So the L actually does stand for linear. The condition here is that the actual relationship in the population between your x and y variables really is a linear relationship. Now, in a lot of cases you might just have to assume that this is the case when you see it on an exam, like an AP exam, for example. They might say, hey, assume this condition is met; oftentimes it'll say assume all of these conditions are met. They just want you to know about these conditions. But this is something to think about: if the underlying relationship is nonlinear, then some of your inferences might not be as robust.

Now, the next one is one we have seen before when talking about general conditions for inference, and this is the independence condition. There are a couple of ways to think about it. Either individual observations are independent of each other, so you could be sampling with replacement, or you could be thinking about the 10% rule that we used for the independence condition for proportions and for means, where we need to feel confident that the size of our sample is no more than 10% of the size of the population.

Next is the normal condition, which we have talked about when doing inference for proportions and for means, although it means something a little bit more sophisticated when we're dealing with a regression. Once again, many times people just say assume it's been met. But let me actually draw a regression line, and do it with a little perspective; I'm gonna add a third dimension. Let's say that's the x-axis, this is the y-axis, and the true population regression line looks like this. The normal condition tells us that, for any given x in the true population, the distribution of y's that you would expect is normal. So let me see if I can draw a normal distribution for the y's given that x; that would be that normal distribution there. And for this x right over here, you would expect a normal distribution as well, just like this. So if we're given x, the distribution of y's should be normal. Once again, many times you'll just be told to assume that this has been met, because, at least in an introductory statistics class, it can be a little bit hard to figure this out on your own.

The next condition is related to that, and this is the idea of having equal variance. That's just saying that each of these normal distributions should have the same spread for a given x. You could say equal variance, or you could even think about them having equal standard deviations. So, for example, if for a given x, let's say this x, all of a sudden you had a much lower variance, so it looked like this, then you would no longer meet your conditions for inference.

Last but not least, and this is one we've seen many times, is the random condition: the data comes from a well-designed random sample or some type of randomized experiment. This condition we have seen in every type of inference that we have looked at so far.

So I'll leave you there. It's good to know; it will show up on some exams. But many times, when it comes to problem solving in an introductory statistics class, they will tell you, hey, just assume the conditions for inference have been met, or ask what the conditions for inference are. They're not going to actually make you prove, for example, the normal or the equal variance condition; that might be a bit much for an introductory statistics class.
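The video stops before computing anything, so here is a minimal sketch of the interval these conditions justify: a t-interval for the slope with n − 2 degrees of freedom. The data set and the hardcoded critical value are illustrative assumptions, not from the video (a library such as SciPy would normally supply t* via `t.ppf`).

```python
import math

# Hypothetical sample: x = hours studied, y = exam score.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [52, 55, 61, 60, 68, 70, 75, 74]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx           # sample slope
a = y_bar - b * x_bar   # intercept

# Residual standard deviation s, using n - 2 degrees of freedom.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Standard error of the slope.
se_b = s / math.sqrt(sxx)

# 95% confidence interval: b ± t* · SE_b.
# t* for df = n - 2 = 6 at 95% confidence is about 2.447 (from a t-table).
t_star = 2.447
ci = (b - t_star * se_b, b + t_star * se_b)
print(f"slope = {b:.3f}, SE = {se_b:.3f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Before trusting the interval, you would check the LINER conditions on this sample: a scatterplot for linearity, the sampling design for independence and randomness, and a residual plot for normality and equal variance.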