# Statistics: The average

Video transcript

Welcome to the
playlist on statistics, something I've been meaning
to do for some time. So anyway, I just want to get
right into the meat of it. And I'll try to do as
many examples as possible, and hopefully give you the
feel for what statistics is all about. And really just to
start off, in case you're not familiar with
it, I think a lot of people have an intuitive feel
for what statistics is about. In very general terms, it's
getting your head around data. And it can broadly be classified
into maybe three categories. You have descriptive, so
say you have a lot of data and you wanted to
tell someone about it without giving them
all of the data. Maybe you can find indicative
numbers that somehow represent all of that data without
having to go over all the data. So that would be
descriptive statistics. There's also predictive statistics, which I'll
group together with inferential statistics. This is when you use
data to essentially make conclusions about things. So let's say you've sampled
some data from a population, and we'll talk a lot about
samples versus populations, but for now we just need a basic
sense of what that is. If I survey three people who
are going to vote for president, I clearly haven't surveyed
the entire population. I've surveyed a sample. But what inferential
statistics is all about is that if we can do some
math on the sample, maybe we can make
inferences or conclusions about the population as a whole. Anyway, that's just
the big picture of what statistics is all about. So let's just get
into the meat of it. We'll start with
the descriptive. So the first thing that
I would want to do, or I think most
people would want to do when they're given a
whole set of numbers and told to describe it, is to say, well,
maybe I can come up with some number that is
most indicative of all of the numbers in that
set, or some number that represents the central tendency. This is a word you'll see
a lot in statistics books. The central tendency
of a set of numbers. This is also called the average. And I'll be a little
bit more exact here than I normally am
with the word average. When I talk about it
in this context, it just means that
the average is a number that
somehow is giving us a sense of the central
tendency, or maybe a number that is most representative of a set. I know that sounds
all very abstract, but let's do a
couple of examples. So there's a bunch of
ways you can actually measure the central
tendency, or the average, of a set of numbers, and you've
probably seen these before. They are the mean-- actually,
there are several types of means, but we'll stick with
the arithmetic mean. Later when we talk about
stock returns and things, we'll do geometric means. And maybe we'll cover the
harmonic mean one day. There's the mean, the
median, and the mode. And in statistics
speak, these all can kind of be
representative of a data set, or a population's central
tendency, or a sample's central tendency. And
they can all be forms of an average. And I think when
we see examples, it'll make a little
bit more sense. In everyday speech, when people
talk about an average-- I think you've already computed
averages in your life-- they're usually talking
about the arithmetic mean. So normally when someone
says, let's take the average of these numbers and they
expect you to do something, they want you to figure
out the arithmetic mean. They don't want you to figure
out the median or the mode. But before we go
any further, let's figure out what
these things are. So let me make up
a set of numbers. Let's say I have the number
1, let's say I have another 1, 2, 3, let's say I have a 4. That's good enough. We just wanted a simple example. So the mean, or the
arithmetic mean, is probably what you're
most familiar with when people talk about average. You add up all the numbers
and you divide by how many numbers there are. So in this case, it would be
1 plus 1 plus 2 plus 3 plus 4, and you're going to divide
by one, two, three, four, five numbers. 1 plus 1 is 2. 2 plus 2 is 4. 4 plus 3 is 7. 7 plus 4 is 11. So this is equal to 11 over 5. That's 2 and 1/5, so
that's equal to 2.2. And so someone could
say, hey, that is a pretty good representative
number of this set. That's the number that all of
these numbers, you can say, are closest to. 2.2 represents the central
tendency of this set. And in common speak, that
would be the average. But if we're being a
little bit more particular, this is the arithmetic mean
of this set of numbers. And you see it represents them. If I didn't want to give you
the list of five numbers, I could say, well, I have
a set of five numbers and their mean is 2.2. And it tells you
where the numbers are. And we'll talk a little
bit more about how you know how
far the numbers are from that mean in
the next video. So that's one measure. Another measure-- instead
of averaging it in this way, you could average it
by putting the numbers in order, which I
actually already did. So let's just write them down
in order again-- 1, 1, 2, 3, 4. And you just take
the middle number. So let's see. There's one, two, three,
four, five numbers. So the middle number is going
to be right here, right? The middle number is 2. There's two numbers
greater than 2, and there's two
numbers less than 2. And this is called the median. So it's actually very
little computation. You just have to essentially
sort the numbers. And then you find
the number that has an equal count of numbers
greater than it and less than it. So the median of this set is 2. You see, that's actually
fairly close to the mean. And there's no right answer. One of these isn't a better
answer for the average. They're just different ways
of measuring the average. So here, it's the median. And I know what you
might be thinking-- well, that was easy enough
when we had five numbers. But what if we had six numbers? What if it was like this? What if this was
our set of numbers? 1, 1, 2, 3, 4, and let's
add another 4 there. So now there's no middle number. 2 is not the middle
number because there's two less than it and
three larger than it. And then 3 is not
the middle number because there's two larger
and three smaller than it. So there's no middle number. So when you have a set with
an even number of values and someone tells you to figure out the
median, what you do is you take the
middle two numbers. And then you take the arithmetic
mean of those two numbers. So in this case, the middle two numbers of this
set are 2 and 3, so the median would be 2.5. Fair enough. But let's put this
aside, because I want to compare the
median and the mean and the mode for the
same set of numbers. But that's a good thing to
know, because sometimes it can be a little confusing. These are all definitions. These are all mathematical
tools for getting our heads around numbers. It's not like one day someone
saw one of these formulas on the face of the sun
and said, oh, that's part of the universe. That is how the average
should be calculated. These are human
constructs to just get our heads around
large sets of data. I mean, this isn't
a large set of data. But instead of five numbers,
if we had five million numbers, you could imagine
that you don't like thinking about every
number individually. Anyway, before I
talk more about that, let me tell you
what the mode is. And the mode, to
some degree, it's the one that I think most
people forget or never learn. And when they see
it on an exam, it confuses them because
they're like, oh, that sounds very advanced. But in some ways,
it is the easiest of all of the measures of
central tendency or of average. The mode is essentially what
number is most common in a set. So in this example,
there's two 1s and then there's one
of everything else. So the mode here is 1. So mode you can say is
the most common number. And then you could say, hey,
Sal, what if this was our set? 1, 1, 2, 3, 4, 4. Here, I have two 1s
and I have two 4s. And this is where the mode
gets a little bit tricky, because either of
these would have been a decent
answer for the mode. You could have actually
said the mode of this is 1, or the mode of this is 4. And it gets a little
bit ambiguous, and you probably
want a little clarity from the person asking you. Most times on a test
when they ask you, there's not going to
be this ambiguity. There will be a most
common number in the set. Now you're saying,
oh, why wasn't just one of these good enough? Why did we learn three kinds of averages? Why don't we just use
the arithmetic mean all the time? What are the median and mode good for? Well, I'll try to do
one example of that and see if it rings
true with you. And then you could
think a little bit more. Let's say I had this set
of numbers-- 3, 3, 3, 3, 3, and 100. So what's the
arithmetic mean here? What's the mean here? So one, two, three,
four, five 3s and 100. So it would be 115 divided by 6. Because I have one, two,
three, four, five, six numbers. 115 is just the sum
of all of these. So that's equal to-- how many
times does 6 go into 115? 6 goes into 11 one time. 1 times 6 is 6, and 11 minus 6 is 5. Bring down the 5. 6 goes into 55 nine times. 9 times 6 is 54, which leaves a remainder of 1. So it's equal to 19 and 1/6. Fair enough. I just added all the numbers and
divided by how many there are. But my question is, is
this really representative of this set? I mean, I have a
ton of 3s and then I have 100 all of a sudden. And we're saying that the
central tendency is 19 and 1/6. 19 and 1/6 doesn't really
seem indicative of this set. I mean, maybe it does,
depending on your application. But it just seems
a little bit off. I mean, my intuition would
be that central tendency is something closer to 3. Because there's
a lot of 3s here. So what does the median tell us? We already put these
numbers in order. If I had given them to you
out of order, you'd want to put them in this order. And you'd say, what's
the middle number? Let's see. The middle two numbers,
since I have an even number, are 3 and 3. So if I take the
average of 3 and 3-- I should be particular
with my language-- if I take the arithmetic
mean of 3 and 3, I get 3. And this is maybe a
better measurement of the central tendency
or of the average of this set of numbers. Essentially,
by taking the median, I wasn't so much affected by
this really large number that's very different from the others. In statistics, they
call that an outlier. If you talked about
average home prices, maybe every house in the
city costs $100,000 and then there's one house
that costs a trillion dollars. And then if someone told
you the average house price was a million
dollars, you might have a very wrong
perception of that city. But the median house
price would be $100,000, and you'd get a better sense
of what the houses in that city are like. So similarly, this
median gives you a better sense of what the
numbers in this set are like. The arithmetic mean was skewed
by what they'd call an outlier. Being able to tell
what an outlier is is one of those things that
a statistician will say, well, I know it when I see it. There isn't really a
formal definition for it, but it tends to be a number
that really sticks out. And sometimes it's due to a
measurement error, or whatever. And then finally, the mode. What is the most common
number in this set? Well, there's five 3s
and there's one 100. So the most common number
is, once again, it's a 3. So in this case, when
you had this outlier, the median and the mode tend
to be maybe a little bit better at giving you an
indication of what these numbers represent. Maybe this was just
a measurement error. But I don't know. We don't actually know
what these represent. If these are house
prices, then I would argue that
the median and the mode are probably more indicative measures of
what the houses in an area cost. But if this is something else,
if this is scores on a test, maybe, maybe there is one kid
in the class-- one out of six kids-- who did
really, really well, and everyone else didn't study. And this is more indicative
of how students at that level do on average. Anyway, I'm done talking
about all of this. I encourage you to play
with a lot of numbers and deal with the
concepts yourself. In the next video, we'll explore
more descriptive statistics. Instead of talking about
the central tendency, we'll talk about how
spread out things are from the
central tendency. See you in the next video.
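
If you want to play with the numbers yourself, the calculations from this video can be checked with a short Python sketch. This is just one way to do it, not something shown in the video: it assumes Python 3.8 or later and uses the standard library's `statistics` and `fractions` modules. The function name `summarize` and the variable names are made up for illustration.

```python
from fractions import Fraction
from statistics import median, multimode


def summarize(numbers):
    """Return the arithmetic mean, median, and mode(s) of a list of integers."""
    return {
        # Arithmetic mean: add up the numbers, then divide by how many there are.
        # Fraction keeps the result exact (11/5 instead of a rounded float).
        "mean": Fraction(sum(numbers), len(numbers)),
        # Median: the middle value after sorting; with an even count,
        # the arithmetic mean of the two middle values.
        "median": median(numbers),
        # Mode: the most common value; multimode returns every tied value.
        "modes": multimode(numbers),
    }


# The three sets of numbers used in the video.
five_numbers = [1, 1, 2, 3, 4]
six_numbers = [1, 1, 2, 3, 4, 4]
with_outlier = [3, 3, 3, 3, 3, 100]

for data in (five_numbers, six_numbers, with_outlier):
    print(data, summarize(data))
```

For the three sets this gives means of 11/5, 5/2, and 115/6, medians of 2, 2.5, and 3, and modes of 1, then 1 and 4 tied, then 3. The tie in the second set is the ambiguity described above, which is why the sketch reports a list of modes rather than a single number.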