Impact on median & mean: removing an outlier

In this golf game, Ana's lowest score of 80 was removed due to rule-breaking. This change increased both the mean and median of her remaining scores. However, the mean increased more than the median.

Want to join the conversation?

  • Redapple8787
    Won't removing an outlier be manipulating the data set? This video shows how the mean and median can change when the outlier is removed. So, if a scientist does some tests and gets an outlier, he/she can remove it to change the results to what he/she wants. So, I ask again, won't removing an outlier be unfairly changing the results?
    (31 votes)
    • Howard Bradley
      Depends. You're right that a scientist can't just arbitrarily discard a result, but if she'd been getting consistent results previously an outlier would suggest some kind of experimental error. If she can identify the source of that error then she is justified in removing the data.
      In the video, it turned out that the score of 80 was as a result of "cheating", so we are right to discount it.
      (50 votes)
  • kristofer
    I remember much about the mean, but not so much about the rest. Can someone fill me in?
    (4 votes)
    • YH
      Mean: Add all the numbers together and divide the sum by the number of data points in the data set.
      Example: Data set; 1, 2, 2, 9, 8. (1 + 2 + 2 + 9 + 8) / 5 = 22 / 5 = 4.4

      Median: Arrange all the data points from small to large and choose the number that is physically in the middle. If there is an even number of data points, then choose the two numbers in the (physical) middle and find the mean of the two numbers.
      Example: Data set; 1, 2, 2, 9, 8, 10. Small to Large; 1, 2, 2, 8, 9, 10. Find the mean of 2 & 8: (2 + 8) / 2 = 5.

      Mode: The mode is the number that appears most frequently in a data set.
      Example: Data set; 1, 2, 2, 9, 4, 10, 4. Mode: 2 and 4
      (19 votes)
  • etstraka29
    Why is Ana so bad at golf
    (11 votes)
  • Victor Covalciuc
    At . If removing a number that is larger than the mean will make the mean itself go down, what will then happen with the median in this case? (when removing a number larger than the median)
    (5 votes)
  • LandonK
    Pretty useful but how will we solve for the mean if it has a negative number?
    (5 votes)
  • MathKid
    Starting from to , how does Sal find the mean without calculating? I thought about it and still couldn't understand how the mean increases, because removing one number means decreasing the total. If he removed 80, the original mean would drop.
    (2 votes)
    • David Severin
      Actually, Sal is correct: if you remove a number that is lower than the mean, the mean increases. Remember that you are not only removing the 80, which decreases the total, but also removing one of the data points, so the denominator drops from 5 to 4. Dividing the sum of the remaining, higher scores by 4 gives a larger mean.
      (6 votes)
  • aramos15
    i just survived cardiac arrest
    (3 votes)
  • Tom Wang
    at ,why does the mean have to go up?
    (1 vote)
    • Jerry Nilsson
      80 is the lowest score.
      All the other four scores are greater than 80, so they can be written as
      80 + a, 80 + b, 80 + c, and 80 + d, for some positive values a, b, c, d.

      The mean of these five scores is
      (80 + (80 + a) + (80 + b) + (80 + c) + (80 + d)) / 5 =
      = (5 · 80 + a + b + c + d) / 5 = 80 + (a + b + c + d) / 5

      If we remove the lowest score, then the new mean will be
      ((80 + a) + (80 + b) + (80 + c) + (80 + d)) / 4 =
      = (4 · 80 + a + b + c + d) / 4 = 80 + (a + b + c + d) / 4

      a, b, c, d > 0 ⇒ a + b + c + d > 0 ⇒
      ⇒ (a + b + c + d) / 4 > (a + b + c + d) / 5, and thereby the new mean must be greater than the previous mean.
      (5 votes)
  • Marleyssa
    l,m like math
    (3 votes)
  • oliviasun116
    what does it mean by "standard deviation above the mean"?
    (2 votes)

Video transcript

- "Ana played five rounds of golf and her lowest score was an 80. The scores of the first four rounds and the lowest round are shown in the following dot plot." And we see it right over here. The lowest round she scores an 80, she also scores a 90 once, a 92 once, a 94 once, and a 96 once. "It was discovered that Ana broke some rules when she scored 80, so that score", so I guess cheating didn't help her, "so that score will be removed from the data set." So they removed that 80 right over there. We're just left with the scores from the other four rounds. "How will the removal of the lowest round affect the mean and the median?"
So let's actually think about the median first. So the median is the middle number. So over here when you had five data points the middle data point is gonna be the one that has two to the left and two to the right. So the median up here is going to be 92. The median up there is 92. And what's the median once you remove this? Now you only have four data points. When you're trying to find the median of an even number of numbers you look at the middle two numbers. So that's a 92 and a 94. And then you take the average of them. You go halfway between them to figure out the median. So the median here is going to be, let me do that a little bit clearer. The median over here is going to be halfway between 92 and 94 which is 93. So the median, the median is 93. Median is 93. So removing the lowest data point in this case increased the median. So the median, let me write it down here. So the median increased by a little bit. The median increases.
Now what's going to happen to the mean? What's going to happen to the mean? Well one way to think about it without having to do any calculations is if you remove a number that is lower than the mean, lower than the existing mean, and I haven't calculated what the existing mean is, but if you remove that the mean is going to go up. The mean is going to go up.
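The median step described above (one middle value when there are five scores, halfway between the two middle values when there are four) can be sketched in Python. This is only an illustration: the function name `median` and the variable names are made up here and do not come from the video, and the function assumes at least one score is given.

```python
def median(scores):
    """Return the median by sorting and picking the middle value(s).

    Assumes `scores` is non-empty.
    """
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle.
        return ordered[mid]
    # Even count: go halfway between the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


all_rounds = [80, 90, 92, 94, 96]   # the five scores on the dot plot
without_lowest = [90, 92, 94, 96]   # after the 80 is removed

median_before = median(all_rounds)
median_after = median(without_lowest)
print(median_before, median_after)
```

With the scores from the dot plot, this should reproduce the 92 and 93 found in the transcript.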
So hopefully that gives you some intuition. If you removed a number that's larger than the mean your mean is, your mean is going to go down cause you don't have that large number anymore. If you remove a number that's lower than the mean, well you take that out, you don't have that small number bringing the average down and so the mean will go up. But let's verify it mathematically. So let's calculate the mean over here. So we're gonna add 80, plus 90, plus 92, plus 94, plus 96. Those are our data points. And that gets us: two plus four is six, plus six is 12. And then we have one plus eight is nine, and this is, so these are nine and then you have another nine, another nine, another nine, another nine. You essentially have, this is five nines right over here. So this is going to be 452. So that's the sum of the scores of these five rounds, and then you divide it by the number of rounds you have. So it would be 452 divided by five. So 452 divided by five is going to give us, five goes into, it doesn't go into four, it goes into 45 nine times. Nine times five is 45, you subtract, get zero, bring down the two. Five goes into two zero times, zero times five is, zero times five is zero, subtract. You have two left over, so you can say that the mean here, the mean here is 90 and 2/5. Not nine and 2/5, 90 and 2/5. So the mean is right around here. So that's the mean of these data points right over there. And if you remove it what is the mean going to be? So here we're just going to take our 90, plus our 92, plus our 94, plus our 96, add 'em together. So let's see, two plus four plus six is 12. And then you add these together you're gonna get 37. 372 divided by four, cause I have four data points now, not five. Four goes into, let me do this in a place where you can see it. So four goes into 372, goes into 37 nine times. Nine times four is 36, subtract, you get a one. Bring down the two, it goes exactly three times. Three times four is 12. You have no remainder. 
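The long division above can be double-checked with exact fractions. The following is a minimal sketch, assuming the five scores read off the dot plot; the helper `exact_mean` and the variable names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

all_rounds = [80, 90, 92, 94, 96]   # the five scores on the dot plot
without_lowest = [90, 92, 94, 96]   # after the 80 is removed


def exact_mean(scores):
    """Return the mean as an exact fraction: sum divided by count."""
    return Fraction(sum(scores), len(scores))


mean_before = exact_mean(all_rounds)
mean_after = exact_mean(without_lowest)

# Quotient and remainder, as in the long division of 452 by 5.
whole, remainder = divmod(sum(all_rounds), len(all_rounds))

print(sum(all_rounds), mean_before, float(mean_before))
print(sum(without_lowest), mean_after)
print(whole, remainder)
```

The quotient and remainder from `divmod` correspond to the "90 and 2/5" in the transcript: a quotient of 90 with 2 left over out of 5.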
So the median and the mean here are both, so this is also the mean. The mean here is also 93. So you see that the median, the median went from 92 to 93, it increased. The mean went from 90 and 2/5 to 93. So the mean increased by more than the median. They both increased but the mean increased by more. And it makes sense cause this number was way, way below all of these over here. So you could imagine if you take this out the mean should increase by a good amount. But let's see which of these choices are what we just described. "Both the mean and the median will decrease", nope. "Both the mean and the median will decrease", nope. "Both the mean and the median will increase, but the mean will increase by more than the median." That's exactly, that's exactly, what happened. The mean went from 90 and 2/5 or 90.4, went from 90.4 or 90 and 2/5 to 93. And then the median only increased by one. So this is the right answer.
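The intuition used in the transcript (removing a value below the mean raises the mean, removing a value above it lowers the mean) can also be written as a short derivation. The symbols are introduced here for illustration and do not appear in the video: n is the original number of scores, S their sum, x the removed score, μ the old mean, and μ' the new mean.

```latex
\begin{aligned}
\mu &= \frac{S}{n}, \qquad
\mu' = \frac{S - x}{n - 1} = \frac{n\mu - x}{n - 1}, \\
\mu' - \mu &= \frac{n\mu - x - (n - 1)\mu}{n - 1} = \frac{\mu - x}{n - 1}.
\end{aligned}
```

The change is positive when x is below μ and negative when x is above μ. With n = 5, μ = 90.4, and x = 80, the change is (90.4 - 80) / 4 = 2.6, which matches the move from 90.4 to 93.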