## AP®︎/College Statistics

### Unit 13: Lesson 2

Testing for the slope of a regression model

# Using a P-value to make conclusions in a test about slope

AP.STATS: DAT‑3 (EU), DAT‑3.M (LO), DAT‑3.M.1 (EK), DAT‑3.N (LO), DAT‑3.N.1 (EK), DAT‑3.N.2 (EK)

Making conclusions in a hypothesis test about the slope of a least-squares regression line.

## Want to join the conversation?

- I wonder how many degrees of freedom were used for this sample? If we assume that it is a large sample, we can use a z-statistic. In this case the p-value will be 0.00136. (7 votes)
- In fact, we can at least try to infer the DF (degrees of freedom) using a t-table.

Unfortunately, I have a t-table only with columns for 0.01 and 0.002 as two-tail probabilities, but we have to make an inference for 0.004, which is given. So we look for the t-value closest to 2.999 between these two columns.

With a DF of 1000, the t-value is 3.098 for a two-tail probability of 0.002 and 2.581 for a two-tail probability of 0.01.

Now let's make a leap of faith that the t-value shrinks roughly linearly with the tail probability, meaning that if we move from a tail probability of 0.002 to 0.01, the value decreases from 3.098 to 2.581 in a predictable manner. Given the shape of this edge of the t-distribution, this might not be that large a leap, I believe.

Then 0.004 is 1/4 of the way from 0.002 toward 0.01, so we can estimate 3.098 - (3.098 - 2.581)/4 = 3.098 - 0.517/4 = 3.098 - 0.129 = 2.969. This is not that distant from the t-value of 2.999 that we expect.

Thus I would bet that the DF used in this example might be slightly less than 1000 (but not close to 100). (2 votes)

- Hi Sal,

You mention at 3:44 that the p-value is related to the alternative hypothesis. I thought the p-value is p(β > estimated β | B0 is True). So in a way it is not related to the alternative hypothesis. Could you please explain?

Best,

Alaa (1 vote)

- He should have added | B0 is True but he didn't add it in this scenario.

However it is a minor point.(1 vote)

- Isn't the "constant" the independent variable? Therefore price should be on the x axis and speed on the y?(1 vote)
- What if the slope turns out to be negative? Neither of the hypotheses include a negative outcome, so what would you conclude if that was the case?(1 vote)
- You base your hypotheses on the information you have, which is that the slope seems to be positive. (0 votes)
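
The degrees-of-freedom question raised in the first comment thread can be explored numerically instead of by interpolating in a printed table. The following is a minimal sketch using only the Python standard library. The function names `normal_upper_tail` and `t_upper_tail` are hypothetical helpers introduced here for illustration, and the t tail area is approximated with Simpson's rule rather than taken from a statistics library. The candidate degrees of freedom in the loop are arbitrary choices, since the problem does not state the sample size.

```python
import math


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t].

    The distribution is symmetric around zero, so the upper tail equals
    0.5 minus the area between 0 and t. `steps` must be even.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3


t_stat = 2.999  # test statistic from the computer output

# Large-sample (z) approximation suggested in the first comment.
print(f"normal one-sided tail: {normal_upper_tail(t_stat):.5f}")

# Two-sided t tail areas for a few assumed degrees of freedom.
for df in (10, 30, 60, 100, 1000):
    two_sided = 2 * t_upper_tail(t_stat, df)
    print(f"df = {df:4d}: two-sided P-value = {two_sided:.4f}")
```

Comparing the printed two-sided values with the 0.004 shown in the computer output gives a sense of which degrees of freedom are plausible. Because the output is rounded to three decimal places, more than one value of df can be consistent with it.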

## Video transcript

- [Narrator] "Alicia took a random sample of mobile phones and found a positive linear relationship between their processor speeds and their prices. Here is computer output from a least-squares regression analysis on her sample."

So just to be clear what's going on, she took a sample of phones (they're not telling us exactly how many), and she found a linear relationship between processor speed and price. So this is price right over here, and this is processor speed right over here. And then she plotted her sample. Every phone would be a data point, and so you see that, and then she put those data points into her computer, and it was able to come up with a line, a regression line, for her sample.

And her regression line for her sample, if we say that's going to be ŷ = a + bx, for her sample, a is going to be 127.092, so that's that over there, and for her sample, the slope of the regression line is going to be the coefficient on speed. Another way to think about it: this x variable right over here is speed, so the coefficient on that is the slope.

But we have to remind ourselves that these are estimates of maybe some true truth in the universe. If she were able to sample every phone in the market, then she would get the true population parameters, but since this is a sample, it's just an estimate. And just because she sees this positive linear relationship in her sample doesn't necessarily mean that this is the case for the entire population. She might have just happened to sample things that had this positive linear relationship. And so that's why she's doing this hypothesis test.

And in a hypothesis test, you actually assume that there isn't a relationship between processor speed and price. So Beta right over here, this would be the true population parameter for regression on the population. So if this is the population right over here, where it's price on the vertical axis and processor speed on the horizontal axis, and if you were able to look at the entire population (I don't know how many phones there are, but it might be billions of phones) and then do a regression line, then our null hypothesis is that the slope of the regression line is going to be zero. So the regression line might look something like that, where the equation of the regression line for the population would be ŷ = α + βx. And so our null hypothesis is that Beta is equal to zero, and the alternative hypothesis, which is her suspicion, is that the true slope of the regression line is actually greater than zero.

"Assume that all conditions for inference have been met. At the Alpha equals 0.01 level of significance, is there sufficient evidence to conclude a positive linear relationship between these variables for all mobile phones? Why?" So pause this video and see if you can have a go at it.

Well, in order to do this hypothesis test, we have to say, "Well, assuming the null hypothesis is true, assuming this is the actual slope of the population regression line, what is the probability of us getting this result right over here?" And what we can do is use this information and our estimate of the sampling distribution of the sample regression line slope, and we can come up with a T-statistic. And for this situation, where our alternative hypothesis is that our true population regression slope is greater than zero, our P-value can be viewed as the probability of getting a T-statistic greater than or equal to this. So getting a T-statistic greater than or equal to 2.999.

Now, you could be tempted to say, "Hey, look, there's this column that gives us a P-value, maybe they just figured out for us that this probability is 0.004." And we have to be very, very careful here, because here, they're actually giving us, I guess you could call it, a two-sided P-value. If you think of a T-distribution, and they would do it for the appropriate degrees of freedom, this is saying, "What's the probability of getting a result where the absolute value is 2.999 or greater?"

So if this is T equals zero right here in the middle, and this is 2.999, we care about this region. We care about this right tail. This P-value right over here, this is giving us not just the right tail, but it's also saying, "Well, what about getting something less than or equal to negative 2.999?" So it's giving us both of these areas, so if you want the P-value for this scenario, we would just look at this. And as you can see, because the T-distribution is symmetric, you take half of this. So this is going to be equal to 0.002.

And what you'd do in any significance test is then compare your P-value to your level of significance. And so if you look at 0.002 and compare it to 0.01, which of these is greater? Well, at first your eyes might say, "Hey, two is greater than one," but this is two thousandths versus one hundredth. This is 10 thousandths right over here. So in this situation, our P-value is less than our level of significance, and so we're saying, "Hey, the probability of getting a result this extreme or more extreme is so low, if we assume our null hypothesis, that in this situation we will reject. We will decide to reject our null hypothesis, which would suggest the alternative."

So, "Is there sufficient evidence to conclude a positive linear relationship between these variables for all mobile phones?" Yes. "Why?" Because the P-value is less than our significance level, and so we reject our null hypothesis, which suggests our alternative hypothesis.
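
The decision procedure in the transcript (halve the two-sided P-value, then compare it with the significance level) can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original lesson: the helper name `one_sided_p_value` is hypothetical, and the branch for a test statistic whose sign disagrees with the alternative is an addition that goes beyond the case shown in the video, where the statistic is positive.

```python
def one_sided_p_value(t_stat, two_sided_p, alternative="greater"):
    """Convert a two-sided P-value from regression output to a one-sided one.

    Halving is valid only when the sign of the test statistic agrees with
    the direction of the alternative hypothesis; otherwise the one-sided
    P-value is 1 minus half the two-sided value.
    """
    half = two_sided_p / 2
    if alternative == "greater":
        return half if t_stat > 0 else 1 - half
    if alternative == "less":
        return half if t_stat < 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")


# Values read from the computer output and the problem statement.
t_stat = 2.999       # T-statistic for the slope
two_sided_p = 0.004  # P-value column in the output (two-sided)
alpha = 0.01         # significance level

p_value = one_sided_p_value(t_stat, two_sided_p, alternative="greater")
reject_null = p_value < alpha

print(f"one-sided P-value = {p_value:.3f}")
print("reject H0" if reject_null else "fail to reject H0")
```

If the sample slope had been negative while the alternative was still that the slope is greater than zero, the same function would return a P-value close to 1, and the null hypothesis would not be rejected.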