Video transcript
Welcome to the playlist on statistics, something I've been meaning to do for some time. So anyway, I just want to get right into the meat of it. And I'll try to do as many examples as possible, and hopefully give you the feel for what statistics is all about. And really just to start off, in case you're not familiar with it, I think a lot of people have an intuitive feel for what statistics is about. In very general terms, it's getting your head around data. And it can broadly be classified into maybe three categories. You have descriptive, so say you have a lot of data and you wanted to tell someone about it without giving them all of the data. Maybe you can find indicative numbers that somehow represent all of that data without having to go over all the data. So that would be descriptive statistics. There's also predictive. Well, I'll kind of group them together. There's inferential statistics. This is when you use data to essentially make conclusions about things. So let's say you've sampled some data from a population, and we'll talk a lot about samples versus populations, but I think you have just a basic sense of what that is. If I survey three people who are going to vote for president, I clearly haven't surveyed the entire population. I've surveyed a sample. But what inferential statistics is all about is that if we can do some math on the samples, maybe we can make inferences or conclusions about the population as a whole. Anyway, that's just the big picture of what statistics is all about. So let's just get into the meat of it. We'll start with the descriptive. So the first thing that I would want to do, or I think most people would want to do when they're given a whole set of numbers and they're told to describe it, is to say, well, maybe I can come up with some number that is most indicative of all of the numbers in that set, or some number that represents the central tendency. This is a word you'll see a lot in statistics books. 
The central tendency of a set of numbers. This is also called the average. And I'll be a little bit more exact here than I normally am with the word average. When I talk about it in this context, it just means that the average is a number that somehow is giving us a sense of the central tendency, or maybe a number that is most representative of a set. I know that sounds all very abstract, but let's do a couple of examples. So there's a bunch of ways you can actually measure the central tendency, or the average, of a set of numbers, and you've probably seen these before. They are the mean-- actually, there are types of means, but we'll stick with the arithmetic mean. Later when we talk about stock returns and things, we'll do geometric means. And maybe we'll cover the harmonic mean one day. There's a mean, the median, and the mode. And in statistics speak, these all can kind of be representative of a data set, or a population's central tendency, or a sample's central tendency. And they all are collectively-- they can all be forms of an average. And I think when we see examples, it'll make a little bit more sense. In everyday speak when people talk about an average-- I think you've already computed averages in your life-- they're usually talking about the arithmetic mean. So normally when someone says, let's take the average of these numbers and they expect you to do something, they want you to figure out the arithmetic mean. They don't want you to figure out the median or the mode. But before we go any further, let's figure out what these things are. So let me make up a set of numbers. Let's say I have the number 1, let's say I have another 1, 2, 3, let's say I have a 4. That's good enough. We just wanted a simple example. So the mean, or the arithmetic mean, is probably what you're most familiar with when people talk about average. You add up all the numbers and you divide by how many numbers there are. 
So in this case, it would be 1 plus 1 plus 2 plus 3 plus 4, and you're going to divide by 1, 2, 3, 4, 5 numbers. 1 plus 1 is 2. 2 plus 2 is 4. 4 plus 3 is 7. 7 plus 4 is 11. So this is equal to 11 over 5. That's 2 and 1/5, so that's equal to 2.2. And so someone could say, hey, that is a pretty good representative number of this set. That's the number that all of these numbers, you can say, are closest to. 2.2 represents the central tendency of this set. And in common speak, that would be the average. But if we're being a little bit more particular, this is the arithmetic mean of this set of numbers. And you see it represents them. If I didn't want to give you the list of five numbers, I could say, well, I have a set of five numbers and their mean is 2.2. And it tells you where the numbers are. And we'll talk a little bit more about how do you know how far the numbers are from that mean in the next video. So that's one measure. Another measure-- instead of averaging it in this way, you could average it by putting the numbers in order, which I actually already did. So let's just write them down in order again-- 1, 1, 2, 3, 4. And you just take the middle number. So let's see. There's one, two, three, four, five numbers. So the middle number is going to be right here, right? The middle number is 2. There's two numbers greater than 2, and there's two numbers less than 2. And this is called the median. So it's actually very little computation. You just have to essentially sort the numbers. And then you find whatever number where you have an equal number greater than or less than that number. So the median of this set is 2. You see, that's actually fairly close to the mean. And there's no right answer. One of these isn't a better answer for the average. They're just different ways of measuring the average. So here, it's the median. And I know what you might be thinking-- well, that was easy enough when we had five numbers. But what if we had six numbers? 
What if it was like this? What if this was our set of numbers? 1, 1, 2, 3, let's add another 4 there. So now there's no middle number. 2 is not the middle number because there's two less than it and three larger than it. And then 3 is not the middle number because there's two larger and three smaller than it. So there's no middle number. So when you have a set with an even number of numbers and someone tells you to figure out the median, what you do is you take the middle two numbers, which here are 2 and 3. And then you take the arithmetic mean of those two numbers. So in this case, of this set, the median would be 2.5. Fair enough. But let's put this aside, because I want to compare the median and the means and the modes for the same set of numbers. But that's a good thing to know, because sometimes it can be a little confusing. These are all definitions. These are all mathematical tools for getting our heads around numbers. It's not like one day someone saw one of these formulas on the face of the sun and said, oh, that's part of the universe. That is how the average should be calculated. These are human constructs to just get our heads around large sets of data. I mean, this isn't a large set of data. But instead of five numbers, if we had five million numbers, you could imagine that you don't like thinking about every number individually. Anyway, before I talk more about that, let me tell you what the mode is. And the mode, to some degree, it's the one that I think most people forget or never learn. And when they see it on an exam, it confuses them because they're like, oh, that sounds very advanced. But in some ways, it is the easiest of all of the measures of central tendency or of average. The mode is essentially what number is most common in a set. So in this example, there's two ones and then there's one of everything else. So the mode here is 1. So mode you can say is the most common number. And then you could say, hey, Sal, what if this was our set? 1, 1, 2, 3, 4, 4. 
Here, I have two 1s and I have two 4s. And this is where the mode gets a little bit tricky, because either of these would have been a decent answer for the mode. You could have actually said the mode of this is 1, or the mode of this is 4. And it gets a little bit ambiguous, and you probably want a little clarity from the person asking you. Most times on a test when they ask you, there's not going to be this ambiguity. There will be a most common number in the set. Now you're saying, oh, why wasn't just one of these good enough? Why did we learn averages? Why don't we just use arithmetic mean all the time? What's median and mode good for? Well, I'll try to do one example of that and see if it rings true with you. And then you could think a little bit more. Let's say I had this set of numbers-- 3, 3, 3, 3, 3, and 100. So what's the arithmetic mean here? What's the mean here? So one, two, three, four, five 3s and 100. So it would be 115 divided by 6. Because I have one, two, three, four, five, six numbers. 115 is just the sum of all of these. So that's equal to-- how many times does 6 go into 115? 6 goes into 11 one time. 1 times 6 is 6. 11 minus 6 is 5, and bringing down the 5 gives 55. 6 goes into 55 nine times. 9 times 6 is 54, which leaves a remainder of 1. So it's equal to 19 and 1/6. Fair enough. I just added all the numbers and divided by how many there are. But my question is, is this really representative of this set? I mean, I have a ton of 3s and then I have 100 all of a sudden. And we're saying that the central tendency is 19 and 1/6. 19 and 1/6 doesn't really seem indicative of this set. I mean, maybe it does, depending on your application. But it just seems a little bit off. I mean, my intuition would be that central tendency is something closer to 3. Because there's a lot of 3s here. So what does the median tell us? We already put these numbers in order. If I had given them to you out of order, you'd want to put them in this order. And you'd say, what's the middle number? Let's see. The middle two numbers, since I have an even number, are 3 and 3. 
So if I take the average of 3 and 3-- I should be particular with my language-- if I take the arithmetic mean of 3 and 3, I get 3. And this is maybe a better measurement of the central tendency or of the average of this set of numbers. Essentially, what it does is by taking the median, I wasn't so much affected by this really large number that's very different than the others. In statistics, they call that an outlier. If you talked about average home prices, maybe every house in the city is $100,000 and then there's one house that costs a trillion dollars. And then if someone told you the average house price was a million dollars, you might have a very wrong perception of that city. But the median house price would be $100,000, and you'd get a better sense of what the houses in that city are like. So similarly, this median gives you a better sense of what the numbers in this set are like. The arithmetic mean was skewed by what they'd call an outlier. Being able to tell what an outlier is is one of those things that a statistician will say, well, I know it when I see it. There isn't really a formal definition for it, but it tends to be a number that really sticks out. And sometimes it's due to a measurement error, or whatever. And then finally, the mode. What is the most common number in this set? Well, there's five 3s and there's one 100. So the most common number is, once again, it's a 3. So in this case, when you had this outlier, the median and the mode tend to be maybe a little bit better about giving you an indication of what these numbers represent. Maybe this was just a measurement error. But I don't know. We don't actually know what these represent. If these are house prices, then I would argue that these are probably more indicative measures of what the houses in an area cost. 
But if this is something else, if this is scores on a test, maybe, maybe there is one kid in the class-- one out of six kids-- who did really, really well, and everyone else didn't study. And this is more indicative of how students at that level do on average. Anyway, I'm done talking about all of this. I encourage you to play with a lot of numbers and deal with the concepts yourself. In the next video, we'll explore more descriptive statistics. Instead of talking about the central tendency, we'll talk about how spread apart things are away from the central tendency. See you in the next video.
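If you want to play with the numbers from this video yourself, here is a minimal Python sketch that recomputes the mean, median, and mode for the three sets used above. It assumes Python 3.8 or later (for `statistics.multimode`). The helper name `summarize` and the variable names `five`, `six`, and `outlier` are made up for illustration and are not from the video. The mean is kept as an exact fraction so that 115/6 shows up as 19 and 1/6 rather than a rounded decimal.

```python
from fractions import Fraction
from statistics import median, multimode

# The three example sets from the video (variable names are illustrative).
five = [1, 1, 2, 3, 4]            # odd count: a single middle number
six = [1, 1, 2, 3, 4, 4]          # even count, and two equally common values
outlier = [3, 3, 3, 3, 3, 100]    # five 3s plus one very large number


def summarize(values):
    """Return the arithmetic mean, median, and mode(s) of a list of numbers."""
    return {
        # Arithmetic mean: add everything up, divide by how many there are.
        "mean": Fraction(sum(values), len(values)),
        # Median: the middle value after sorting; for an even count,
        # the arithmetic mean of the two middle values.
        "median": median(values),
        # Mode(s): every value tied for most common, so an ambiguous
        # set like `six` reports both candidates instead of picking one.
        "modes": multimode(values),
    }


for name, values in [("five", five), ("six", six), ("outlier", outlier)]:
    stats = summarize(values)
    print(name, values)
    print("  mean  :", stats["mean"], "=", float(stats["mean"]))
    print("  median:", stats["median"])
    print("  modes :", stats["modes"])
```

Returning every tied value for the mode is just one way to handle the ambiguous case; on a test you would still want to ask which answer is expected.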