Statistics: The average

Introduction to descriptive statistics and central tendency. Ways to measure the average of a set: median, mean, mode. Created by Sal Khan.

Want to join the conversation?

  • Ammar Ali
    What if you had to find the mode of 1, 2, 3, 4, 5? What would you do then?
    (1 vote)
    • Lee Merrick
      A mode is a value that shows up most frequently. If two values share the same (maximum) frequency, we say that the data is bimodal. If three values share the same (maximum) frequency, we say the data is trimodal. If more than three values share the same maximum frequency, we say there is no mode.

      In the example you provide, there is no mode.
      (21 votes)
  • likeaboss06
    What is the difference between arithmetic mean and mean?
    (2 votes)
    • mcflip
      Just as there are different types of averages (mean, median and mode), there are also further types of mean (arithmetic, geometric and harmonic).

      When someone says 'average', they tend to be referring to the mean. Similarly, when someone says 'mean' without qualification, they tend to be referring more specifically to the arithmetic mean (where you add all the terms together and divide by the number of terms). There are other, less common means, though, like the geometric mean (obtained by multiplying all n terms together and then taking the nth root). Sal alludes to these other types of mean at the start of the video.

      There's some more information here: http://en.wikipedia.org/wiki/Average and, I'm sure, throughout other Khan Academy videos.
      (6 votes)
  • Tobia
    Let's suppose I use statistics to analyze measurements. What could the mode be good for, considering that measurements often go to thousandths and it's highly improbable to get two equal measurements?
    (3 votes)
    • Dr C
      In my opinion, the only use of the mode for numeric (or quantitative) data is for the general concept of the number of modes of a distribution, not really as a practical statistic. The only exception is when there are only a few distinct values that can be observed (e.g., "On a scale of 1-5, rate such-and-such"). But that starts to blur the difference between numeric and categorical data.

      For categorical (or qualitative) data, the mode is useful. With this type of data, we have distinct categories, so the notion of "which category is the most frequently observed" is a more reasonable idea to think about.
      (3 votes)
  • yave millan
    What is the difference between central tendency and average?
    (3 votes)
    • James Xia
      Sorry, Yave Millan, if I am late, but the difference between central tendency and average depends on the situation you are using them in. First off, if you are doing statistics and you are asked to find all the averages, all the measures of central tendency (arithmetic mean, median, mode) are the averages. An average gives a number that is most representative of all the elements of a set, and the arithmetic mean, median, and/or mode does that (depending on which one suits the set best, e.g., where he talks about 3, 3, 3, 3, 3, 100), thus they are averages. Basically, the measures of central tendency are averages.
      However, if someone gives you some numbers and asks you to find the "average" of them in any day-to-day situation, you are expected to find the arithmetic mean.
      Sorry if I'm a bit confusing. If you need any help, be sure to reply.
      (2 votes)
  • Keara Brimhall
    What is central tendency and what is the central limit theorem? I don't recall him using either phrase. Can anyone give me an easy-to-comprehend definition?
    (1 vote)
    • Matthew Daly
      The Central Limit Theorem is a very complicated theorem that you'll learn later. In a nutshell, it says that if you repeat some random event enough times you can predict what the overall results are even though each individual event is unpredictable.

      Central Tendency is more basic, and it just means that you want to find some way to take a large collection of numbers and produce one "central" number that represents all of them. That's not a term we use very often, since we usually talk about the specific formula we use to generate that number, like the mean, median, or mode.
      (2 votes)
  • Existentior
    You mentioned statisticians say "there isn't really a formal definition for it, but when I see it I know it". Does the interquartile range (IQR) not show you the possible outliers (1*, 2* sd) and extreme outliers (3* sd)?
    (1 vote)
    • Dr C
      The "inner fences" are defined as:
      Q3 + 1.5*IQR
      Q1 - 1.5*IQR

      Observations falling outside of these bounds can be marked as potential outliers. However, it's not a formal (meaning, official or technical) definition. It's just a rule of thumb. There may be values that are outliers which do not fall beyond these bounds, or there may be values beyond these bounds that are not outliers. The sample mean is a formal definition - provided some data, we can calculate it with certainty. The same just isn't the case with "outlierness."

      There are similar (less frequently used, in my experience) rules of thumb using the standard deviation. Usually something like values outside of
      xbar + 3*sd
      xbar - 3*sd

      Though, using xbar and the standard deviation to detect outliers has always struck me as a bit odd, because if there are any outliers, they are inherently going to have a relatively large effect on the values of the mean and standard deviation. I prefer using the robust measures instead.
      (3 votes)
  • ddeclerk123
    Properties of mean, median, and mode
    (2 votes)
  • Lone Wolf
    You know how 0% means impossible? Would something like -1% be possible? If so, what would it mean?
    (1 vote)
  • Sibichan Joseph
    Can you have more than one mode?
    (1 vote)
  • HillBMH
    What is the mode of a set of numbers with no duplicates? (Ex: 1, 3, 5, 7, 11)
    (2 votes)
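
The inner-fence rule of thumb from Dr C's answer above can be sketched in Python. This is only an illustration, not a formal definition of an outlier: the helper name `potential_outliers` is made up for this example, and the quartiles come from the standard library's `statistics.quantiles` with its default method, which is one of several quartile conventions, so other tools may place the fences slightly differently.

```python
from statistics import quantiles


def potential_outliers(values, k=1.5):
    """Return the values outside the inner fences Q1 - k*IQR and Q3 + k*IQR.

    Hypothetical helper for illustration; k=1.5 is the rule-of-thumb multiplier.
    """
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


print(potential_outliers([3, 3, 3, 3, 3, 100]))  # the 100 lies above the upper fence
print(potential_outliers([1, 1, 2, 3, 4]))       # nothing is flagged
```

Because the fences are built from quartiles rather than from the mean and standard deviation, a single extreme value moves them far less, which matches the preference for robust measures stated in the answer.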

Video transcript

Welcome to the playlist on statistics. Something I've been meaning to do for some time. So anyway, I just want to get right into the meat of it and I'll try to do as many examples as possible and hopefully give you the feel for what statistics is all about. And, really, just to kind of start off in case you're not familiar with it -- although, I think a lot of people have an intuitive feel for what statistics is about. And essentially -- well, in very general terms it's kind of getting your head around data.

And it can broadly be classified. Well, there are maybe three categories. You have descriptive. So say you have a lot of data and you wanted to tell someone about it without giving them all of the data. Maybe you can kind of find indicative numbers that somehow represent all of that data without having to go over all of the data. That would be descriptive statistics. There's also predictive. Well, I'll kind of group them together. There's inferential statistics. And this is when you use data to essentially make conclusions about things. So let's say you've sampled some data from a population -- and we'll talk a lot about samples versus populations but I think you have just a basic sense of what that is, right? If I survey three people who are going to vote for president, I clearly haven't surveyed the entire population. I've surveyed a sample. But what inferential statistics is all about is: if we can do some math on the samples, maybe we can make inferences or conclusions about the population as a whole.

Well, anyway, that's just a big picture of what statistics is all about. Let's just get into the meat of it and we'll start with the descriptive. So the first thing that, I don't know, that I would want to do or I think most people would want to do when they are given a whole set of numbers and they're told to describe it. Well, maybe I can come up with some number that is most indicative of all of the numbers in that set.
Or some number that represents, kind of, the central tendency -- this is a term you'll see a lot in statistics books. The central tendency of a set of numbers. And this is also called the average. And I'll be a little bit more exact here than I normally am with the word "average." When I talk about it in this context, it just means that the average is a number that somehow is giving us a sense of the central tendency. Or maybe a number that is most representative of a set. And I know that sounds all very abstract but let's do a couple of examples.

So there's a bunch of ways that you can actually measure the central tendency or the average of a set of numbers. And you've probably seen these before. They are the mean. And actually, there are types of means but we'll stick with the arithmetic mean. Later, when we talk about stock returns and things, we'll do geometric means and maybe we'll cover the harmonic mean one day. There's the mean, the median, and the mode. And in statistics speak, these all can kind of be representative of a data set's or population's central tendency or a sample's central tendency. And they all are collectively -- they can all be forms of an average. And I think when we see examples, it'll make a little bit more sense.

In everyday speak, when people talk about an average, I think you've already computed averages in your life, they're usually talking about the arithmetic mean. So normally when someone says, "Let's take the average of these numbers," and they expect you to do something, they want you to figure out the arithmetic mean. They don't want you to figure out the median or the mode.

But before we go any further, let's figure out what these things are. Let me make up a set of numbers. Let's say I have the number 1. Let's say I have another 1, a 2, a 3. Let's say I have a 4. That's good enough. We just want a simple example. So the mean or the arithmetic mean is probably what you're most familiar with when people talk about average.
And that's essentially -- you add up all the numbers and you divide by how many numbers there are. So in this case, it would be 1 plus 1 plus 2 plus 3 plus 4. And you're going to divide by one, two, three, four, five numbers. It's what? 1 plus 1 is 2. 2 plus 2 is 4. 4 plus 3 is 7. 7 plus 4 is 11. So this is equal to 11/5. That's what? That's 2 1/5? So that's equal to 2.2. And so someone could say, "Hey, you know. That is a pretty good representative number of this set. That's the number that all of these numbers you can kind of say are closest to." Or, 2.2 represents the central tendency of this set. And in common speak, that would be the average. But if we're being a little bit more particular, this is the arithmetic mean of this set of numbers. And you see it kind of represents them. If I didn't want to give you the list of five numbers, I could say, "Well, you know, I have a set of five numbers and their mean is 2.2." It kind of tells you a little bit of at least, you know, where the numbers are. We'll talk a little bit more about how you know how far the numbers are from that mean in probably the next video.

So that's one measure. Another measure, instead of averaging it in this way, you can average it by putting the numbers in order, which I actually already did. So let's just write them down in order again. 1, 1, 2, 3, 4. And you just take the middle number. So let's see, there's one, two, three, four, five numbers. So the middle number's going to be right here, right? The middle number is 2. There's two numbers greater than 2 and there's two numbers less than 2. And this is called the median. So it's actually very little computation. You just have to essentially sort the numbers. And then you find whatever number has an equal number of values greater than and less than it. So the median of this set is 2. And you see, I mean, that's actually fairly close to the mean. And there's no right answer. One of these isn't a better answer for the average.
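
The two calculations just described can be sketched in Python. This is a minimal illustration under stated assumptions rather than anything shown in the video: the function names are made up for this example, and the even-length branch of `median` follows the standard convention of averaging the two middle values.

```python
def arithmetic_mean(values):
    """Add up all the numbers and divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Sort the numbers and take the middle one.

    With an even count there is no single middle number, so the two
    middle numbers are averaged (the standard convention).
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return arithmetic_mean(ordered[mid - 1:mid + 1])


numbers = [1, 1, 2, 3, 4]
print(arithmetic_mean(numbers))  # 11 / 5 = 2.2
print(median(numbers))           # 2
```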
They're just different ways of measuring the average. So here it's the median. And I know what you might be thinking. "Well, that was easy enough when we had five numbers. What if we had six numbers?" What if it was like this? What if this was our set of numbers? 1, 1, 2, 3, 4, and let's add another 4 there. So now, there's no middle number, right? I mean, 2 is not the middle number because there's two less than it and three larger than it. And 3's not the middle number because there's two larger and three smaller than it. So there's no middle number. So when you have a set with an even number of elements and someone tells you to figure out the median, what you do is you take the middle two numbers and then you take the arithmetic mean of those two numbers. So in the case of this set, the middle two numbers are 2 and 3, and the median would be 2.5. Fair enough. But let's put this aside because I want to compare the median and the means and the modes for the same set of numbers. But that's a good thing to know because sometimes it can be a little confusing.

And these are all definitions. These are all kind of mathematical tools for getting our heads around numbers. It's not like one day someone saw one of these formulas on the face of the sun and said, "Oh, that's part of the universe, that this is how the average should be calculated." These are human constructs to kind of just get our heads around large sets of data. This isn't a large set of data, but instead of five numbers, if we had five million numbers, you can imagine you wouldn't like thinking about every number individually.

Anyway, before I talk more about that, let me tell you what the mode is. And the mode, to some degree, it's the one I think most people probably forget or never learn and when they see it on an exam, it confuses them because they're like, "Oh, that sounds very advanced." But in some ways, it is the easiest of all of the measures of central tendency or of average.
The mode is essentially what number is most common in a set. So in this example, there's two 1's and then there's one of everything else, right? So the mode here is 1. So mode is the most common number. And then you could kind of say, "Whoa, hey Sal, what if this was our set? 1, 1, 2, 3, 4, 4." Here I have two 1's and I have two 4's. And this is where the mode gets a little bit tricky because either of these would have been a decent answer for the mode. You could have actually said the mode of this is 1 or the mode of this is 4 and it gets a little bit ambiguous. And you probably want a little clarity from the person asking you. Most times on a test when they ask you, there's not going to be this ambiguity. There will be a most common number in the set.

So now it's like, oh, well, you know, why wasn't just one of these good enough? You know, we learned averages, so why don't we just use averages? Or why don't we use the arithmetic mean all the time? What's median and mode good for? Well, I'll try to do one example of that and see if it rings true with you. And then you can think a little bit more.

Let's say I had this set of numbers. 3, 3, 3, 3, 3, and, I don't know, 100. So what's the arithmetic mean here? I have one, two, three, four, five 3's and 100. So it would be 115 divided by 6, right? I have one, two, three, four, five, six numbers. 115 is just the sum of all of these. So that's equal to -- how many times does 6 go into 115? 6 goes into 11 one time. 1 times 6 is 6. 11 minus 6 is 5, bring down the 5. 6 goes into 55 nine times. 9 times 6 is 54. So it's equal to 19 1/6. Fair enough. I just added all the numbers and divided by how many there are. But my question is, is this really representative of this set? I mean, I have a ton of 3's and then I have 100 all of a sudden, and we're saying that the central tendency is 19 1/6. And, I mean, 19 1/6 doesn't really seem indicative of the set. I mean maybe it does, depending on your application, but it just seems a little bit off, right?
I mean, my intuition would be that the central tendency is something closer to 3 because there's a lot of 3's here. So what would the median tell us? I already put these numbers in order, right? If I gave it to you out of order, you'd want to put it in this order and you'd say what's the middle number? Let's see, the middle two numbers, since I have an even number, are 3 and 3. So if I take the average of 3 and 3 -- or I should be particular with my language. If I take the arithmetic mean of 3 and 3, I get 3. And this is maybe a better measurement of the central tendency or of the average of this set of numbers, right? Essentially, what it does is by taking the median, I wasn't so much affected by this really large number that's very different than the others.

In statistics they call that an outlier. A number that, you know, if you talked about average home prices, maybe every house in the city is $100,000 and then there's one house that costs $1 trillion. And then if someone told you the average house price was, I don't know, $1 million, you might have a very wrong perception of that city. But the median house price would be $100,000 and you get a better sense of what the houses in that city are like. So similarly, this median, maybe, gives you a better sense of what the numbers in this set are like. Because the arithmetic mean was skewed by this, what they call an outlier. And being able to tell what an outlier is, it's kind of one of those things that a statistician will say, well, I know it when I see it. There isn't really a formal definition for it but it tends to be a number that really kind of sticks out and sometimes it's due to, you know, a measurement error or whatever.

And then finally, the mode. What is the most common number in this set? Well there's five 3's and there's 100. So the most common number is, once again, it's a 3.
So in this case, when you had this outlier, the median and the mode tend to be, you know, maybe they're a little bit better about giving you an indication of what these numbers represent. Maybe this was just a measurement error. But I don't know, we don't actually know what these represent. If these are house prices, then I would argue that these are probably more indicative measures of what the houses in an area cost. But if this is something else, if this is scores on a test, maybe, you know, maybe there is one kid in the class -- one out of six kids who did really, really well and everyone else didn't study. And this is more indicative of, kind of, how students at that level do on average.

Anyway, I'm done talking about all of this. And I encourage you to play with a lot of numbers and deal with the concepts yourself. In the next video, we'll explore more descriptive statistics. Instead of talking about the central tendency, we'll talk about how spread apart things are away from the central tendency. See you in the next video.
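
As one way to play with the numbers, the outlier example from the transcript can be checked with Python's standard library. This is a sketch under stated assumptions, not part of the video: the variable names are illustrative, `Fraction` is used only to keep the mean exact, and `multimode` requires Python 3.8 or later.

```python
from fractions import Fraction
from statistics import median, mode, multimode

with_outlier = [3, 3, 3, 3, 3, 100]

# Arithmetic mean kept as an exact fraction: 115 divided by 6.
exact_mean = Fraction(sum(with_outlier), len(with_outlier))
print(exact_mean)            # 115/6, i.e. 19 1/6
print(median(with_outlier))  # 3.0, the mean of the two middle 3's
print(mode(with_outlier))    # 3

# The ambiguous case from earlier: two values tie for most common.
tied = [1, 1, 2, 3, 4, 4]
print(multimode(tied))       # [1, 4]
```

The mean is pulled up toward the single 100 while the median and mode stay at 3, which is the contrast the transcript draws.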