
AP.STATS: UNC‑4 (EU), UNC‑4.AC (LO), UNC‑4.AC.3 (LO), UNC‑4.AF (LO), UNC‑4.AF.1 (LO), UNC‑4.AF.2 (LO), VAR‑1 (EU), VAR‑1.K (LO), VAR‑1.K.1 (EK)

In this video we're going to talk about regression lines. It's not the first time we're talking about them, so if the idea of regression is foreign to you, I encourage you to watch the introductory videos on it. Here we're going to think about how we can make inferences from a regression line, so if statistical inference or hypothesis testing is new to you, once again, watch those videos as well.

Let's say we think there's a positive association between shoe size and height. On the horizontal axis we put shoe size: sizes 1, 2, 3, and so on up to 12, and it could keep going up from there. On the vertical axis, our y-axis, we put height: one foot, two feet, and so on up to seven feet.

To see if there's an association, you might take a sample. Say you take a random sample of 20 people from the population (in future videos we'll talk about the conditions necessary for making appropriate inferences). Let's say those 20 people are these 20 data points: there's a young child, maybe there's a grown adult with bigger feet who's taller, and so on until we have all 20 data points.

What you're likely to do next is input them into a computer. You could do it by hand, but these days we usually have computers do it for us, and the computer fits a regression line. There are many techniques for doing this, but one typical technique, least squares, chooses the line that minimizes the sum of the squared vertical distances between the points and the line. This regression line has an equation, as any line would, and we tend to write it as ŷ = a + bx. The hat tells us that this is a regression line, a is the y-intercept, and b is the slope multiplying our x variable. Now, to be clear, if you took another sample, you might get different results here.
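To make the least-squares idea and the sample-to-sample variation concrete, here is a minimal Python sketch. The population values ALPHA, BETA, and NOISE_SD and the helper names fit_least_squares and draw_sample are hypothetical. They are made up for illustration and are not real shoe-size data. The sketch fits ŷ = a + bx to two random samples of 20 drawn from the same assumed population and prints both lines. The two lines will usually differ slightly in both intercept and slope.

```python
import random

def fit_least_squares(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope that minimizes the sum of squared vertical residuals
    a = y_bar - b * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b

# Hypothetical population: height (feet) = ALPHA + BETA * shoe size + noise.
# These parameter values are made up for illustration, not measured.
ALPHA, BETA, NOISE_SD = 3.0, 0.25, 0.3

def draw_sample(rng, n=20):
    """Draw n hypothetical (shoe size, height) pairs from the assumed population."""
    xs = [rng.uniform(1, 13) for _ in range(n)]
    ys = [ALPHA + BETA * x + rng.gauss(0, NOISE_SD) for x in xs]
    return xs, ys

rng = random.Random(0)
a1, b1 = fit_least_squares(*draw_sample(rng))  # first sample of 20
a2, b2 = fit_least_squares(*draw_sample(rng))  # second sample of 20
print(f"sample 1: y_hat = {a1:.3f} + {b1:.3f} x")
print(f"sample 2: y_hat = {a2:.3f} + {b2:.3f} x")
```

The closed-form slope Sxy / Sxx is the standard least-squares solution for a single predictor. An iterative optimizer would reach the same line.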
In fact, let's label the line from our first sample ŷ₁ = a₁ + b₁x, with y-intercept a₁ and slope b₁. If you were to take another sample of 20 folks, you'd get 20 different data points. If you fit a line to those, it might have a slightly different y-intercept and a slightly different slope, so for the second sample we could write ŷ₂ = a₂ + b₂x. Every time you take a sample, you're likely to get different values for these, and these values are statistics. Remember, statistics are things we compute from samples, and we use them to estimate true population parameters.

So what are the true population parameters we're trying to estimate? Imagine a world where you could find the true linear relationship between shoe size and height, assuming there is one. Theoretically, you could get it by measuring every human being on the planet. Depending on how you define the population, that could be all living people or all people who will ever live. This isn't practical, but let's say you actually could do it. You would have billions of data points for the true population, and if you fit a regression line to them, you could view that as the true population regression line. Its y-intercept and slope are the true population parameters, and to make that clear we write alpha (α) instead of a and beta (β) instead of b: ŷ = α + βx. It's very hard to determine exactly what α and β are, which is why we estimate them with a's and b's based on a sample.

With this in mind, we can start to make inferences based on our sample. We know, for example, that b₂ is unlikely to be exactly β. But how confident can we be that there is at least a positive linear relationship, or a nonzero linear relationship? Can we create a confidence interval around this statistic to get a good sense of where the true parameter might actually be? The simple answer is yes. To do so, we'll use the same ideas we used when we made inferences based on proportions or based on means.

Here's how you can make an inference about the true population slope of the regression line. Say I took a sample and got this slope, which I'll just call b₂. I can create a confidence interval around it. The interval is b₂ plus or minus some critical value times, ideally, the standard deviation of the sampling distribution of the statistic. In this case, that statistic is the slope of the sample regression line. Because we can't figure out that standard deviation precisely from a sample, we estimate it with what's known as the standard error of the statistic, and we'll go into more depth on this in future videos. Since we're estimating, we use a critical t value, which we have studied before. The critical t value depends on the confidence level you want (let's say 95%) and on the degrees of freedom, which, as we'll see, come from how many data points we have (for a regression slope, n - 2). From our sample we can compute b₂ and its standard error, and then we have constructed a confidence interval.

We'll also see that you could do hypothesis testing here. You could set up a null hypothesis that there's no linear relationship, that is, that the slope of the population regression line is equal to 0 (H₀: β = 0). The alternative hypothesis could be that the true slope is greater than 0, a positive linear relationship (Hₐ: β > 0), or just that it's nonzero (Hₐ: β ≠ 0). Then, assuming the null hypothesis is true, you find the probability of getting a statistic at least as extreme as the one you observed. If that probability is below some threshold, you reject the null hypothesis, which suggests the alternative.

Both of these are things we have done before: creating a confidence interval around a statistic, and doing a hypothesis test that makes an assumption about a true parameter. The only difference here is that the parameters we're trying to estimate are those of a theoretical population regression line, and we estimate them using statistics from a sample regression line.
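Here is a minimal Python sketch of both procedures for the slope. It assumes the conditions for inference hold and uses the standard formulas. The standard error of the slope is SE_b = s / √Σ(x - x̄)², where s is the residual standard deviation with n - 2 degrees of freedom. The confidence interval is b ± t*·SE_b, and the test statistic for H₀: β = 0 is t = b / SE_b.

Several parts of the sketch are assumptions or simplifications:

* The function name slope_inference is hypothetical.
* The sample is simulated from a made-up population line, not real data.
* The critical value 2.101 is the two-sided 95% value for df = 18 from a t table, so it only fits a sample of 20.
* Rather than computing a p-value, which needs a t-distribution CDF such as the one in SciPy's scipy.stats.t, it uses the equivalent rejection rule for a two-sided 5% test: reject H₀ when |t| > t*.

```python
import math
import random

# Two-sided 95% critical value t* for df = 18, read from a standard t table.
# df = n - 2 for a regression slope, so this value only fits samples of n = 20.
T_STAR_95_DF18 = 2.101

def slope_inference(xs, ys, t_star):
    """Return (b, se_b, (lower, upper), t_stat) for the least-squares slope.

    t_stat is the test statistic for H0: beta = 0.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    # Residual standard deviation s, using n - 2 degrees of freedom
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_b = s / math.sqrt(sxx)  # standard error of the slope b
    if se_b == 0:
        raise ValueError("residuals are all zero; standard error is 0")
    ci = (b - t_star * se_b, b + t_star * se_b)  # b plus or minus t* times SE
    t_stat = (b - 0) / se_b  # how many standard errors b lies from 0
    return b, se_b, ci, t_stat

# Hypothetical sample of 20 (shoe size, height in feet) pairs, simulated from
# a made-up population line with alpha = 3.0, beta = 0.25 (not real data).
rng = random.Random(1)
xs = [rng.uniform(1, 13) for _ in range(20)]
ys = [3.0 + 0.25 * x + rng.gauss(0, 0.3) for x in xs]

b, se_b, (lo, hi), t_stat = slope_inference(xs, ys, T_STAR_95_DF18)
print(f"b = {b:.3f}, SE_b = {se_b:.4f}, 95% CI for beta: ({lo:.3f}, {hi:.3f})")
# Two-sided test at the 5% level: reject H0 when |t| exceeds t*
if abs(t_stat) > T_STAR_95_DF18:
    print(f"t = {t_stat:.2f}: reject H0: beta = 0")
else:
    print(f"t = {t_stat:.2f}: fail to reject H0: beta = 0")
```

For the one-sided alternative β > 0, you would instead compare t itself to a one-sided critical value for the same degrees of freedom.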