Making conclusions about slope
Making conclusions in a hypothesis test about the slope of a least-squares regression line.
- I wonder what degrees of freedom were used for this sample? If we assume that it is a large sample, we can use a z-statistic. In that case the p-value will be about 0.00136.
- In fact, we can at least try to infer the DF (degrees of freedom) using a t-table.
Unfortunately, my t-table only has columns for 0.01 and 0.002 as two-tail probabilities, but we need 0.004, which is the value given. So we look for the row where our t-value (2.999) falls between these two columns.
With a DF of 1000, the t-value is 3.098 for a two-tail probability of 0.002 and 2.581 for a two-tail probability of 0.01.
Now let's make a leap of faith that the t-value shrinks roughly linearly with the tail probability, meaning that as we move from a tail probability of 0.002 to 0.01, the value falls from 3.098 to 2.581 in a predictable way. Given the shape of this edge of the t-distribution, I believe this might not be that large a leap.
Then 0.004 is 1/4 of the way from 0.002 to 0.01, so we can estimate 3.098 - (3.098 - 2.581)/4 = 3.098 - 0.517/4 = 3.098 - 0.129 = 2.969. This is not that far from our expected t-value of 2.999.
So I would bet that the DF used in this example is slightly less than 1000 (but not close to 100).
- What if the slope turns out to be negative? Neither of the hypotheses includes a negative outcome, so what would you conclude if that were the case?
- Hi Sal,
You mention at 3:44 that the p-value is related to the alternative hypothesis. I thought the p-value is P(β > estimated β | H0 is true). So in a way it is not related to the alternative hypothesis. Could you please explain?
- Isn't the "constant" the independent variable? Therefore price should be on the x axis and speed on the y?
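The degrees-of-freedom question raised in the first comment thread can be explored numerically. The sketch below is only a rough check, not anything from the lesson itself: it assumes the software rounded the two-sided P-value to three decimals (0.004), it treats the t-statistic 2.999 as exact, and all function names are made up for illustration. It uses only the Python standard library, so the t-distribution tail is approximated with Simpson's rule instead of a statistics package.

```python
import math

def normal_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t: float, df: int, steps: int = 400) -> float:
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the distribution lies above 0; subtract the area between 0 and t.
    return 0.5 - total * h / 3

t_stat = 2.999  # t-statistic read from the computer output

# Large-sample (z) approximation of the one-sided tail.
z_tail = normal_upper_tail(t_stat)

# Degrees of freedom whose two-sided P-value would display as 0.004
# after rounding to three decimals (an assumption about the software).
candidates = [df for df in range(2, 1001)
              if round(2 * t_upper_tail(t_stat, df), 3) == 0.004]

print(f"normal one-sided tail: {z_tail:.5f}")
print(f"df consistent with 0.004: {candidates[0]} to {candidates[-1]}")
print(f"two-sided P-value at df = 1000: {2 * t_upper_tail(t_stat, 1000):.4f}")
```

Under these assumptions the normal tail lands close to the value quoted in the first comment, while the two-sided P-value at 1000 degrees of freedom comes out noticeably below 0.004. That hints the sample may be smaller than the linear interpolation in the reply suggests, since tail probabilities do not shrink linearly with t.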
- [Narrator] "Alicia took a random sample of mobile phones and found a positive linear relationship between their processor speeds and their prices. Here is computer output from a least-squares regression analysis on her sample."
So just to be clear what's going on, she took a sample of phones (they're not telling us exactly how many) and she found a linear relationship between processor speed and price. So this is price right over here, and this is processor speed right over here. And then she plotted her sample, where every phone would be a data point. Then she put those data points into her computer, and it was able to come up with a line, a regression line, for her sample. If we say her regression line is y hat equals a plus bx, then for her sample, a is going to be 127.092, so that's that over there, and the slope of the regression line is going to be the coefficient on speed. Another way to think about it: this x variable right over here is speed, so the coefficient on that is the slope.
But we have to remind ourselves that these are estimates of maybe some true truth in the universe. If she were able to sample every phone in the market, then she would get the true population parameters, but since this is a sample, it's just an estimate. And just because she sees this positive linear relationship in her sample doesn't necessarily mean that this is the case for the entire population. She might have just happened to sample things that had this positive linear relationship. And so that's why she's doing this hypothesis test. And in a hypothesis test, you actually assume that there isn't a relationship between processor speed and price. So Beta right over here, this would be the true population parameter for regression on the population.
So if this is the population right over here, with price on the vertical axis and processor speed on the horizontal axis, and if you were able to look at the entire population (I don't know how many phones there are, but it might be billions of phones) and then do a regression line, then our null hypothesis is that the slope of the regression line is going to be zero. So the regression line might look something like that, where the equation of the regression line for the population would be y hat equals Alpha plus Beta times x. And so our null hypothesis is that Beta is equal to zero, and the alternative hypothesis, which is her suspicion, is that the true slope of the regression line is actually greater than zero.
"Assume that all conditions for inference have been met. At the Alpha equals 0.01 level of significance, is there sufficient evidence to conclude a positive linear relationship between these variables for all mobile phones? Why?" So pause this video and see if you can have a go at it.
Well, in order to do this hypothesis test, we have to say, "Well, assuming the null hypothesis is true, assuming this is the actual slope of the population regression line, what is the probability of us getting this result right over here?" And what we can do is use this information and our estimate of the sampling distribution of the sample regression line slope, and we can come up with a T-statistic. And for this situation, where our alternative hypothesis is that our true population regression slope is greater than zero, our P-value can be viewed as the probability of getting a T-statistic greater than or equal to this. So getting a T-statistic greater than or equal to 2.999. Now, you could be tempted to say, "Hey, look, there's this column that gives us a P-value, maybe they just figured out for us that this probability is 0.004."
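Written out in symbols, the setup described so far looks roughly like this. The symbols $b$ for the sample slope, $SE_b$ for its standard error, and $T$ for a t-distributed random variable are notation assumed for this sketch; the narration itself only names y hat, Alpha, Beta, and the T-statistic of 2.999.

```latex
\hat{y} = \alpha + \beta x, \qquad H_0:\ \beta = 0, \qquad H_a:\ \beta > 0

t = \frac{b - 0}{SE_b} = 2.999, \qquad \text{P-value} = P\left(T \ge 2.999 \mid H_0 \text{ is true}\right)
```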
And we have to be very, very careful here, because here they're actually giving us, I guess you could call it, a two-sided P-value. If you think of a T-distribution, and they would do it for the appropriate degrees of freedom, this is saying, "What's the probability of getting a result where the absolute value is 2.999 or greater?" So if this is T equals zero right here in the middle, and this is 2.999, we care about this region. We care about this right tail. This P-value right over here is giving us not just the right tail, but it's also saying, "Well, what about getting something less than or equal to negative 2.999?" So it's giving us both of these areas. If you want the P-value for this scenario, we would just look at the right tail. And as you can see, because the T-distribution is symmetric, you take half of this. So this is going to be equal to 0.002.
And what you'd do in any significance test is then compare your P-value to your level of significance. And so if you look at 0.002 and compare it to 0.01, which of these is greater? Well, at first your eyes might say, "Hey, two is greater than one," but this is two thousandths versus one hundredth. This is 10 thousandths right over here. So in this situation, our P-value is less than our level of significance, and so we're saying, "Hey, the probability of getting a result this extreme or more extreme is so low, if we assume our null hypothesis, that in this situation we will decide to reject our null hypothesis, which would suggest the alternative."
So, "Is there sufficient evidence to conclude a positive linear relationship between these variables for all mobile phones?" Yes. "Why?" Because the P-value is less than our significance level, and so we reject our null hypothesis, which suggests our alternative hypothesis.
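The last two steps, halving the two-sided P-value and comparing it with the significance level, can be sketched in a few lines of Python. The function names are hypothetical, and the handling of a negative t-statistic is an addition that goes slightly beyond the video: with a "greater than" alternative, a negative t-statistic leaves a one-sided P-value above 0.5, so the null hypothesis would not be rejected.

```python
def one_sided_p_value(two_sided_p: float, t_stat: float,
                      alternative: str = "greater") -> float:
    """Convert a two-sided P-value from regression output to a one-sided one.

    Relies on the t-distribution being symmetric about zero.
    """
    half = two_sided_p / 2
    if alternative == "greater":
        # Right tail: half the two-sided area if t is on the right, else the rest.
        return half if t_stat >= 0 else 1 - half
    if alternative == "less":
        return half if t_stat <= 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")

def reject_null(p_value: float, alpha: float) -> bool:
    """Reject the null hypothesis when the P-value is below the significance level."""
    return p_value < alpha

# Values from the computer output and the question in the video.
t_stat = 2.999
two_sided_p = 0.004
alpha = 0.01

p = one_sided_p_value(two_sided_p, t_stat)
print(f"one-sided P-value: {p:.3f}")                    # half of 0.004
print(f"reject the null hypothesis: {reject_null(p, alpha)}")
```

With the values from the video, this reproduces the conclusion in the transcript: the one-sided P-value of 0.002 is below 0.01, so the null hypothesis is rejected.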