
## Effects of linear transformations

# How parameters change as data is shifted and scaled

## Video transcript

- [Instructor] So I have some
data here in the spreadsheet. You could use Microsoft
Excel or you could use Google Spreadsheet, and we're gonna use the spreadsheet to quickly calculate some parameters. Let's say this is a population. Let's say we're looking
at a population of students, and this is their ages, and we wanna calculate
some parameters on that. And so first I'm gonna calculate
it using the spreadsheet, and then we're gonna think about how those parameters change
as we do things to the data. If we were to shift the data up or down or if we were to multiply
all the points by some value, what does that do to
the actual parameters? So the first parameter I'm
gonna calculate is the mean. Then I'm gonna calculate
the standard deviation. Then I'm gonna calculate the median, and then I wanna calculate, let's say, the interquartile range. I'll call it IQR. So let's do this. Let's first look at the
measures of central tendency. So for the mean, the function
on most spreadsheets is the average function, and then I could use my mouse
and select all of these, or I could press Shift
with my arrow button and select all those. Okay, that's the mean of that data. Now let's think about what happens if I take all of that data and if I were to add a fixed amount to it. So if I took all the data and if I were to add five to it. So an easy way to do that in a spreadsheet is you select that, you add five, and then I can scroll down. And notice for every
data point I had before, I now have five more than that. So this is my new dataset, or as I'm calling it, Data+5. Let's see what the mean of that is. So the mean of that, notice, is exactly five more, and the same would have been true if I added or subtracted any number. The mean would change by the
amount that I add or subtract. That shouldn't surprise you, because when you're calculating the mean, you're adding all the numbers up and you're dividing by
the number of data points you have. If all the numbers are five more, you're gonna add five for each one. In this case, how many numbers are there? One, two, three, four,
five, six, seven, eight, nine, 10, 11, 12. You're gonna add 12 more fives
and then you're gonna divide by 12, and so it makes sense
that your mean goes up by five. Now let's think about how the
mean changes if you multiply. So if I take the data and multiply
it by five, what happens? So this equals this times five. So now all the data points
are five times as large. Now what happens to my mean? Notice my mean is now five times as much. So for this measure of central tendency, if I add or subtract a number, well, the mean goes up or down
by that amount, and if I scale it up by five or if I scaled it down by five, well my mean would scale up
or down by that same factor, and if you numerically looked
at how you calculate a mean, it would make sense that this
is happening mathematically. Let's look at the other typical measure of central tendency,
and that is the median, to see if that has the same properties. So let's calculate the median here. So once again you order these numbers and just find the middle number, which isn't too hard, but a computer can do it awfully fast. So that's the median for that dataset. What do you think the median's gonna be if you take all of the data plus five? Well the middle number, if you ordered all of these numbers and made them all five more, you could think of them as being in the same order, but now the one in the
middle is gonna be five more. So this should be 10.5, and yes, it is indeed 10.5, and what would happen if you
multiply everything by five? Well once again, you still
have the same ordering. It should just multiply the median by five. Yup, the middle number's now
gonna be five times as large. So both of these measures
of central tendency, if you shift all the data points, or if you scale them up, you're going to similarly
shift or scale up these measures of central tendency. Now let's think about
the measures of spread, and see if the same holds for
them. So standard deviation. So STDEV. I'm gonna take the population
standard deviation. I'm assuming that this
is my entire population. So let me analyze it. So the standard deviation of all of this is going to be 2.99. Let's see what happens when
I shift everything by five. Actually, pause the video. What do you think is going to happen? This is a measure of spread. I'll
tell you what I think. If I shift everything by the same amount, the mean shifts but the distance
of everything from the mean should not change. So the standard deviation
should not change, I don't think, in this example, and indeed, it does not change. So if we shift the dataset, in this case up by five, or if we shifted it down by one, your measure of spread, in
this case standard deviation, should not change, or at least the standard
deviation as a measure of spread does not change, but if we scale it, well
I think it should change, because you could imagine
a very simple dataset where things that were a
certain amount of distance from the mean are now going
to be five times further from the mean. So I think we should multiply by five here, and it does look like that is the case if I multiply this by five. So scaling the dataset will
scale the standard deviation in a similar way. What about the interquartile range, where essentially we're
taking the third quartile and subtracting from
that the first quartile to figure out kind of the
range of the middle 50%. Let's do that. We can use the quartile
function: equals QUARTILE, and then we want to look at our data, and we want the third quartile. So that's gonna calculate
the third quartile. Minus QUARTILE, same dataset. So now we wanna select it again. So same dataset, but
this is now going to be the first quartile. So this is gonna give us
our interquartile range. This calculates the third
quartile in that dataset and this calculates the first
quartile in that dataset. And we get 2.75. Now let's think about whether
the interquartile range should change. And I don't think it will. Because remember, everything shifts, and even though the first
quartile is gonna be five more, the third quartile is
gonna be five more as well. So the difference shouldn't change. And indeed look, the
distance does not change, or the difference does not change. But similarly, if we scale everything up, if we were to scale up the first quartile and the third quartile by five, well then their difference
should scale up by five, and we see that right over there. So here's the big takeaway. I just used the example
of shifting up by five and scaling up by five, but you could subtract any number, and you could divide by a number as well. The typical measures of central tendency, mean and median, both shift and scale as you shift and scale the data, but your typical measures of spread, standard deviation and
interquartile range, don't change if you shift the data, but they do change and they
scale as you scale the data.
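
If you'd rather check this takeaway outside a spreadsheet, here is a minimal Python sketch. The ages in it are hypothetical, not the values from the video's spreadsheet, and the `summarize` helper is a name made up for this illustration. It assumes the spreadsheet's quartile function interpolates the way Python's `"inclusive"` method does, so treat the quartile numbers as approximate matches at best.

```python
import statistics


def summarize(values):
    """Return mean, median, population standard deviation, and IQR."""
    # method="inclusive" is assumed to mirror the spreadsheet quartile function.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),  # population, not sample
        "iqr": q3 - q1,
    }


# Hypothetical ages for 12 students (illustrative only).
ages = [1, 3, 4, 4, 5, 5, 6, 7, 8, 9, 10, 12]

original = summarize(ages)
shifted = summarize([x + 5 for x in ages])  # the "Data+5" column
scaled = summarize([x * 5 for x in ages])   # the data times five

# Print each parameter for the original, shifted, and scaled data.
for name in original:
    print(name, original[name], shifted[name], scaled[name])
```

Reading the printed rows side by side, the mean and median should each go up by five in the shifted column and be five times as large in the scaled column, while the standard deviation and IQR should stay the same in the shifted column and be five times as large in the scaled column.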