Worked example of linear regression using transformed data

Adapted from 2007 AP Statistics free response, Form B, question 6, part d.

Want to join the conversation?

  • Nandee
    Why don't we take the ln(3 *15)?
    (10 votes)
    • Tom
      I think you mean "why don't we take the ln(3*5)?", right? Meaning, "why don't we use the total area of the islands instead of the area of just one of them?" The line Sal calculates gives the proportion of animals to go extinct versus CONTIGUOUS area (i.e., area all together in one piece). Note that the regression line has a negative slope, so the smaller the piece of land, the bigger the proportion of animals that go extinct. So, to get an accurate proportion for each of the five smaller parcels, the area of just one must be used to calculate the proportion for that one. Then the proportion for each would be multiplied by the number of animals on that one. Since they're all the same size, and since they all have the same number of animals, once the proportion is calculated for one, using the area for just that one, or 3 sq. km, the total predicted extinctions for the group can be calculated by just multiplying by 5. So, 5*[16*(0.28996 - 0.05323*ln(3))] would be the correct calculation.
      (18 votes)
  • Christopher Hickey
    The conclusion reached is backwards. It is a question of what the original data are saying. If the regression showed a relationship between large PRESERVED islands and small PRESERVED islands, then the conclusion would make sense. However, what it appears to be asking is whether one large or five small islands should be preserved, given the proportion of extinct species that has been observed on UNPRESERVED islands of various sizes. The correct answer is to preserve the 5 small islands, thereby preserving 19 total species that would otherwise have become extinct.
    (6 votes)
  • RainyDAZE
    Why is it ln for natural log and not nl? It just confuses me.
    (2 votes)
  • vebustos1986
    Shouldn't ln(15) be used? (That is 3 times 5; there were 5 small preserves, after all, and the total of 80 species is being summed up too.) The calculator shows ln(3).
    (2 votes)
  • sofiav
    How does one go from a result in the ln form to a regular result?
    (2 votes)
  • cameron grissom
    With the five nature preserves, wouldn't they save 71 species, because with five nature preserves containing 16 species each, the total would be 80 at-risk species?
    (2 votes)
  • Victor Gutierrez
    Why doesn't the data in the table match the data in the scatter plot? I mean, data point number 12 is not represented in the scatter plot, and its proportion value has been increased.
    (1 vote)
  • Michael Letsinger
    table of linear regression of saving rate on population under 15
    (1 vote)
  • ForgottenUser
    Shouldn't we also consider the statistical error for each of those estimates? If the estimates overlap at say a 95% confidence interval, wouldn't that be good evidence that there is no effective difference between the conservation methods?
    (1 vote)

Video transcript

- [Lecturer] We are told that a conservation group with a long-term goal of preserving species believes that all at-risk species will disappear when land inhabited by those species is developed. It has an opportunity to purchase land in an area about to be developed. The group has a choice of creating one large nature preserve with an area of 45 square kilometers and containing 70 at-risk species, or five small nature preserves, each with an area of three square kilometers and each containing 16 at-risk species unique to that preserve. Which choice would you recommend and why? There are some interesting data here. It looks like data they have gathered for different islands. For each island we have its area, the number of species at risk in 1990, the number of species extinct by 2000, and the proportion that went extinct. It looks like they're plotted on this scatter plot. Now be very careful when you look at this, because look at the two axes. The vertical axis is the proportion extinct in 2000. It's these numbers. But the horizontal axis isn't just the area. It's the natural log of the area. Why did they do this? Notice, when you make the horizontal axis the natural log of the area, it looks like there is a linear relationship. But be clear, it's a linear relationship between the natural log of the area and the proportion extinct in 2000. The reason this type of transformation is valuable is that now we can apply our tools of linear regression to think about what the proportion extinct would be for the 45 square kilometers versus for the five small three-square-kilometer islands. Pause this video and see if you can figure it out on your own. They gave us the regression data for a line that fits this data. All right. Now let's work through it together. Let me make some space, because all of it is already plotted right over here, and we have our regression data.
The regression line has a slope and a y-intercept. The y-intercept is right over here, 0.28996. That's almost 0.29, so on the plot the y-intercept would be right about here. The slope is approximately negative 0.05. Eyeballing it, the line probably looks something like this. That's the regression line. Another way to think about it is that the regression line tells us that, in general, the proportion (shorthand for proportion extinct) is going to be equal to our y-intercept, 0.28996, minus 0.05323 times something. We have to be careful here. You might be tempted to say times the area, but no, the horizontal axis here is the natural log of the area. So it's times the natural log of the area. We can use this equation for both scenarios to think about what proportion we would expect to go extinct in either situation, and then how many actual species would go extinct. The option where fewer species go extinct, or where we preserve more species, is probably the better one. Let's look at the two scenarios. The first scenario is the 45 square kilometer island. This is just one, so times one. What proportion would we expect to go extinct? Based on this regression, it's going to be 0.28996 minus 0.05323 times the natural log of 45. If we want to know the actual number that go extinct, the number extinct would be equal to the proportion times the number of species. The 45 square kilometers contains 70 at-risk species, so times our 70 species. We can get our calculator out to figure that out. This is the proportion we would expect to go extinct in the 45 square kilometer island based on our linear regression. It comes out to almost 9%.
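The calculator step for the large preserve can be reproduced in a few lines. This is a minimal sketch using Python's standard `math` module. The function name `predicted_proportion_extinct` and the constant names are illustrative assumptions, not part of the original problem. The intercept and slope are the values quoted from the regression output.

```python
import math

# Coefficients quoted from the regression output in the transcript.
INTERCEPT = 0.28996
SLOPE = -0.05323  # multiplies ln(area), not the raw area


def predicted_proportion_extinct(area_km2: float) -> float:
    """Predicted proportion extinct for one contiguous area (hypothetical helper)."""
    return INTERCEPT + SLOPE * math.log(area_km2)  # math.log is the natural log


large_proportion = predicted_proportion_extinct(45)
large_extinct = 70 * large_proportion  # 70 at-risk species on the large preserve

print(f"Proportion extinct, 45 km^2: {large_proportion:.4f}")  # 0.0873
print(f"Expected number extinct:     {large_extinct:.2f}")     # 6.11
```

The key detail is that the area goes through `math.log` before it is multiplied by the slope, which mirrors the transformed horizontal axis of the scatter plot.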
If we want to figure out the actual number we would expect to go extinct, we just multiply that proportion by the number of species on that island, so times 70, and we get approximately 6.11. Let me write that down. This is all very approximate, so let's just say six extinct, which means approximately 64 saved. Now let's think about the other scenario, where we have five small nature preserves. So it's going to be five islands of three square kilometers each. We're gonna do the same exercise. Our proportion that goes extinct is gonna be 0.28996, that's just the y-intercept of our regression line, minus 0.05323, where the minus sign is there because we have a negative slope, and this is not just times the area, it's times the natural log of the area. The area is three square kilometers. Our number extinct is going to be equal to the proportion we calculate in the line above times the number of species. Let's see. Five small nature preserves, each with an area of three square kilometers and each containing 16 at-risk species. If each island has 16 and there are five islands, five times 16 is 80. So times 80. Let's figure out what this is. Get the calculator out again. This is going to be the proportion. It's a much higher proportion. We'll multiply that by our number of species, so times 80, to figure out how many species will go extinct. We get approximately 18.52. If we round, let's just say 19 extinct. And if we have 19 extinct, how many are we gonna save? We're gonna have 61 saved. Even if you said 18 1/2 extinct and 61.5 saved, on either measure the big island is better.
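The small-preserve scenario can be checked the same way. The sketch below assumes Python's standard `math` module, and all variable names are illustrative. It uses ln(3) because each preserve is a separate three-square-kilometer piece, and it shows that working preserve by preserve gives the same total as multiplying the proportion by all 80 species.

```python
import math

# Coefficients quoted from the regression output in the transcript.
INTERCEPT = 0.28996
SLOPE = -0.05323

AREA_EACH_KM2 = 3   # area of one small preserve
N_PRESERVES = 5
SPECIES_EACH = 16   # at-risk species unique to each preserve

# The model is applied to one contiguous piece of land, so the input is ln(3).
small_proportion = INTERCEPT + SLOPE * math.log(AREA_EACH_KM2)

total_species = N_PRESERVES * SPECIES_EACH            # 80
extinct_per_preserve = SPECIES_EACH * small_proportion
total_extinct = N_PRESERVES * extinct_per_preserve    # same as 80 * proportion
total_saved = total_species - total_extinct

print(f"Proportion extinct per preserve: {small_proportion:.4f}")  # 0.2315
print(f"Expected extinct in total:       {total_extinct:.2f}")     # 18.52
print(f"Expected saved in total:         {total_saved:.2f}")       # 61.48
```

Because the five preserves have the same area and the same species count, the per-preserve calculation and the "times 80" shortcut in the transcript agree.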
You're gonna have fewer species that go extinct and more that are saved. Which choice would you recommend and why? I'd recommend the one large island, because based on this linear regression you would expect to save more species and you would expect fewer to go extinct.
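The final comparison can be sketched as a side-by-side calculation. This is only an illustration under the same fitted line. The helper `expected_outcome` and the dictionary keys are hypothetical names, and the sketch compares point estimates only, with no measure of uncertainty.

```python
import math

# Coefficients quoted from the regression output in the transcript.
INTERCEPT, SLOPE = 0.28996, -0.05323


def expected_outcome(area_km2, species_each, n_preserves=1):
    """Return (expected extinct, expected saved) for n equal preserves (hypothetical helper)."""
    proportion = INTERCEPT + SLOPE * math.log(area_km2)
    total_species = species_each * n_preserves
    extinct = proportion * total_species
    return extinct, total_species - extinct


options = {
    "one large preserve": expected_outcome(45, 70),
    "five small preserves": expected_outcome(3, 16, n_preserves=5),
}

for name, (extinct, saved) in options.items():
    print(f"{name}: about {extinct:.2f} extinct, {saved:.2f} saved")

# Pick the option with the larger expected number of species saved.
best = max(options, key=lambda name: options[name][1])
print("Recommended:", best)  # one large preserve
```

The large preserve comes out ahead on both measures: fewer expected extinctions (about 6.11 versus 18.52) and more expected species saved (about 63.89 versus 61.48).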