# Worked example of linear regression using transformed data

AP.STATS: DAT‑1 (EU), DAT‑1.J (LO), DAT‑1.J.1 (EK)

## Video transcript

We are told that a conservation group with a long-term goal of preserving species believes that all at-risk species will disappear when land inhabited by those species is developed. It has an opportunity to purchase land in an area about to be developed. The group has a choice of creating one large nature preserve with an area of 45 square kilometers and containing 70 at-risk species, or five small nature preserves, each with an area of 3 square kilometers and each containing 16 at-risk species unique to that preserve. Which choice would you recommend, and why?

There's some interesting data here. It looks like data they have gathered for different islands: we have their areas, the number of species at risk in 1990, and the species extinct by 2000. So for these various islands we can see their areas and the proportion that went extinct, and it looks like they're plotted on this scatter plot.

Now, be very careful when you look at this, because look at the two axes. The vertical axis is the proportion extinct in 2000, so it's these numbers, but the horizontal axis isn't just the area. It's the natural log of the area. Why did they do this? Well, notice that when you make the horizontal axis the natural log of the area, it looks like there's a linear relationship. But be clear: it's a linear relationship between the natural log of the area and the proportion extinct in 2000. The reason it's valuable to do this type of transformation is that now we can apply our tools of linear regression to think about what the proportion extinct would be for the one 45 square kilometer preserve versus the five small 3 square kilometer preserves. So pause this video and see if you can figure it out on your own. They give us the regression data for a line that fits this data.

All right, now let's work through it together. The data is already plotted right over here, and we have our regression data, so for the regression line we know its slope and y-intercept. The y-intercept is right over here: 0.28996. It's almost 0.29, so the y-intercept would be right over here. Its slope is approximately negative 0.05. I could eyeball it; it's going to look something like this. That's the regression line.

Another way to think about it is that the regression line tells us, in general, that the proportion (I'll just say "proportion" as shorthand for proportion extinct) is going to be equal to our y-intercept, 0.28996, minus 0.05323 times something, and we have to be careful here. You might be tempted to say times the area, but no: the horizontal axis here is the natural log of the area. So it's times the natural log of the area:

proportion extinct = 0.28996 - 0.05323 × ln(area)

We can use this equation for both scenarios to think about what proportion we would expect to go extinct in either situation, and then how many actual species would go extinct. The scenario with fewer species going extinct, or the one where we can preserve more, is probably the best one.

So let's look at the two scenarios. The first scenario is the 45 square kilometer preserve, and there is just one of them. What proportion would we expect to go extinct based on this regression? It's going to be 0.28996 minus 0.05323 times the natural log of 45. And if we want to know the actual number that go extinct, the number extinct would be equal to the proportion times how many species there are. The 45 square kilometer preserve contains 70 at-risk species, so it's times 70.

Let's get our calculator out to figure that out. The proportion we would expect to go extinct in the 45 square kilometer preserve, based on our linear regression, is almost 9 percent. To figure out the actual number we would expect to go extinct, we multiply that by the number of species in that preserve, so times 70, and we get approximately 6.11. Let me write that down: approximately 6.11. This is all very approximate, so let's just say 6 extinct and approximately 64 saved.

Now let's think about the other scenario, where we have five small nature preserves: 3 square kilometers each, times five preserves. We're going to do the same exercise. The proportion that goes extinct is going to be 0.28996 (that's just the y-intercept of our regression line) minus 0.05323 (we have a negative sign there because we have a negative slope) times, not the area, but the natural log of the area, which is 3 square kilometers. Then the number extinct is going to be equal to the proportion we calculate in the line above times the number of species. There are five small nature preserves, each with an area of 3 square kilometers and each containing 16 at-risk species unique to that preserve, so that's 5 times 16, which is 80. So it's times 80.

Let's figure out what this is and get the calculator out again. This is the proportion, and it's a much higher proportion. Then we multiply that by our number of species, 80, to figure out how many species will go extinct, and we get approximately 18.52.

So another way to think about it: if we round, let's just say 19 extinct. And if we have 19 extinct, how many are we going to save? We're going to have 61 saved. Even if you said 18.5 here and 61.5 here, on either measure the one large 45 square kilometer preserve is better: you're going to have fewer species that go extinct and more that are saved.

So which choice would you recommend, and why? I'd recommend the one large preserve, because you would expect to save more species and you would expect fewer to go extinct, based on this linear regression.
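The calculator steps in the transcript can be reproduced with a short Python sketch. It only assumes the intercept and slope quoted from the regression output (0.28996 and -0.05323); the function names `predicted_proportion_extinct` and `expected_extinct` are made up for this illustration and do not come from the video.

```python
import math

# Coefficients quoted in the transcript's regression output.
INTERCEPT = 0.28996
SLOPE = -0.05323


def predicted_proportion_extinct(area_km2):
    """Predicted proportion extinct. The predictor is ln(area), not area."""
    return INTERCEPT + SLOPE * math.log(area_km2)


def expected_extinct(area_km2, species_per_preserve, n_preserves=1):
    """Expected number of extinct species across identical preserves.

    Assumes, as the problem states, that each preserve's at-risk species
    are unique to it, so totals simply add up.
    """
    total_species = species_per_preserve * n_preserves
    return predicted_proportion_extinct(area_km2) * total_species


# Scenario 1: one 45 km^2 preserve with 70 at-risk species.
large = expected_extinct(45, 70)
# Scenario 2: five 3 km^2 preserves with 16 at-risk species each (80 total).
small = expected_extinct(3, 16, n_preserves=5)

print(f"one 45 km^2 preserve:  {large:.2f} extinct, {70 - large:.2f} saved")
print(f"five 3 km^2 preserves: {small:.2f} extinct, {80 - small:.2f} saved")
# prints:
# one 45 km^2 preserve:  6.11 extinct, 63.89 saved
# five 3 km^2 preserves: 18.52 extinct, 61.48 saved
```

Note that `math.log` is the natural logarithm, which matches the transformed horizontal axis. Passing the raw area instead of its log would give a nonsensical (negative) proportion for the 45 square kilometer preserve, which is the mistake the transcript warns against.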
AP® is a registered trademark of the College Board, which has not reviewed this resource.