Using a P-value to make conclusions in a test about slope

Video transcript

- [Narrator] "Alicia took a random sample of mobile phones and found a positive linear relationship between their processor speeds and their prices. Here is computer output from a least-squares regression analysis on her sample." So just to be clear what's going on, she took a sample of phones, they're not telling us exactly how many, but she took a number of phones and she found a linear relationship between processor speed and price. So this is price right over here, and this is processor speed right over here. And then she plotted her sample. Every phone would be a data point, and so you see that, and then she put those data points into her computer, and it was able to come up with a line, a regression line, for her sample. And her regression line for her sample, if we say that y hat is going to be a plus bx, for her sample, a is going to be 127.092, so that's that over there, and for her sample, the slope of the regression line is going to be the coefficient on speed. Another way to think about it, this x variable right over here is speed, so the coefficient on that is the slope.

But we have to remind ourselves that these are estimates of maybe some truth in the universe. If she were able to sample every phone in the market, then she would get the true population parameters, but since this is a sample, it's just an estimate. And just because she sees this positive linear relationship in her sample doesn't necessarily mean that this is the case for the entire population. She might have just happened to sample things that had this positive linear relationship. And so that's why she's doing this hypothesis test. And in a hypothesis test, you actually assume that there isn't a relationship between processor speed and price. So Beta right over here, this would be the true population parameter, the slope of the regression line for the population.
So if this is the population right over here, where it's price on the vertical axis and processor speed on the horizontal axis, and if you were somehow able to look at the entire population, I don't know how many phones there are, but it might be billions of phones, and then do a regression line, then our null hypothesis is that the slope of the regression line is going to be zero. So the regression line might look something like that, where the equation of the regression line for the population, y hat, would be Alpha plus Beta times x. And so our null hypothesis is that Beta's equal to zero, and the alternative hypothesis, which is her suspicion, is that the true slope of the regression line is actually greater than zero.

"Assume that all conditions for inference have been met. At the Alpha equals 0.01 level of significance, is there sufficient evidence to conclude a positive linear relationship between these variables for all mobile phones? Why?" So pause this video and see if you can have a go at it.

Well, in order to do this hypothesis test, we have to say, "Well, assuming the null hypothesis is true, assuming this is the actual slope of the population regression line," I guess you could think about it, "what is the probability of us getting this result right over here?" And what we can do is use this information and our estimate of the sampling distribution of the sample regression line slope, and we can come up with a T-statistic. And for this situation where our alternative hypothesis is that our true population regression slope is greater than zero, our P-value can be viewed as the probability of getting a T-statistic greater than or equal to this. So getting a T-statistic greater than or equal to 2.999. Now, you could be tempted to say, "Hey, look, there's this column that gives us a P-value, maybe they just figured out for us that this probability is 0.004."
And we have to be very, very careful here, because here, they're actually giving us, I guess you could call it a two-sided P-value. If you think of a T-distribution, and they would do it for the appropriate degrees of freedom, this is saying, "What's the probability of getting a result where the absolute value is 2.999 or greater?" So if this is T equals zero right here in the middle, and this is 2.999, we care about this region. We care about this right tail. This P-value right over here, this is giving us not just the right tail, but it's also saying, "Well, what about getting something less than negative 2.999, or including negative 2.999?" So it's giving us both of these areas, so if you want the P-value for this scenario, we would just look at this. And as you can see, because this distribution is symmetric, the T-distribution is going to be symmetric, you take half of this. So this is going to be equal to 0.002.

And what you'd do in any significance test is then compare your P-value to your level of significance. And so if you look at 0.002 and compare it to 0.01, which of these is greater? Well, at first your eyes might say, "Hey, two is greater than one," but this is two thousandths versus one hundredth. This is 10 thousandths right over here. So in this situation, our P-value is less than our level of significance, and so we're saying, "Hey, the probability of getting a result this extreme or more extreme is so low, if we assume our null hypothesis, that in this situation we will decide to reject our null hypothesis, which would suggest the alternative." So, "Is there sufficient evidence to conclude a positive linear relationship between these variables for all mobile phones?" Yes. "Why?" Because our P-value is less than our significance level, and so we reject our null hypothesis, which suggests our alternative hypothesis.
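The arithmetic in this last step can be sketched in a few lines of Python. This is only a minimal illustration, not part of the original problem: the helper name `one_sided_p_value` is made up for this sketch, and it assumes the output's P-value column is two-sided, as described above. Halving works here because the sample T-statistic is positive, which is the direction the alternative hypothesis points in; a negative T-statistic would instead leave almost all of the area in the right tail.

```python
def one_sided_p_value(two_sided_p: float, t_stat: float) -> float:
    """Right-tail P-value for H_a: Beta > 0, given a two-sided P-value.

    Hypothetical helper for illustration; relies on the T-distribution
    being symmetric around zero.
    """
    half = two_sided_p / 2
    # Positive t: the right tail is half of the two-sided area.
    # Otherwise: the right tail is everything except the left half.
    return half if t_stat > 0 else 1 - half


two_sided_p = 0.004  # value from the P-value column of the output
t_stat = 2.999       # T-statistic for the slope
alpha = 0.01         # significance level given in the problem

p_value = one_sided_p_value(two_sided_p, t_stat)
reject_null = p_value < alpha

print(f"one-sided P-value = {p_value:.3f}")
print("reject H0" if reject_null else "fail to reject H0")
```

With the numbers from the problem this prints a one-sided P-value of 0.002 and "reject H0", matching the conclusion above.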