

# Compositions of linear transformations 2

Provides the motivation for the definition of matrix products. Created by Sal Khan.

## Want to join the conversation?

- At 13:32, Sal says that a1 belongs to Rn; shouldn't it be Rm? (10 votes)
- Yes. In the example, A is an m x n matrix, so m is the number of rows; each column of A has m entries and therefore belongs to Rm. (3 votes)

- I don't get it. Is every column in a matrix really a vector in disguise? (5 votes)
- Yes. The converse is true as well: every vector is a matrix in disguise. For example, a column vector with 5 entries is really a 5 x 1 matrix. :D

Thumbs up! (9 votes)
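To make both directions concrete, here is a minimal Python sketch. It assumes a matrix is stored as a plain list of rows, which is only an illustrative choice, and `column` and `as_column_matrix` are hypothetical helper names, not part of any library.

```python
# A 3 x 2 matrix stored as a list of rows (an illustrative representation).
M = [[1, 4],
     [2, 5],
     [3, 6]]

def column(matrix, j):
    """Return column j of `matrix` as a vector (a plain list with one entry per row)."""
    return [row[j] for row in matrix]

def as_column_matrix(vector):
    """View a vector with k entries as a k x 1 matrix (k rows, each holding one entry)."""
    return [[entry] for entry in vector]

print(column(M, 0))   # [1, 2, 3]: the first column is a vector in R^3
print(column(M, 1))   # [4, 5, 6]

v = [1, 2, 3, 4, 5]   # a vector with 5 entries...
V = as_column_matrix(v)
print(len(V), len(V[0]))  # 5 1: ...viewed as a 5 x 1 matrix
```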

- How come we can use the same vector x for both the S and T transformations, especially when their codomains have different dimensions? I mean the statements T(x) = Bx and S(x) = Ax. Or am I just talking bollocks? (4 votes)
- **x** is just the variable used for the functions. **x** isn't going to be the same for both T and S; they just both use **x** as the placeholder for the input. You'll notice when he does T(S(**x**)) that the **x** inside the T function was replaced with S(**x**), but that doesn't mean that **x** = S(**x**); it just means that the output of S is being used as the input for T. (6 votes)
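A minimal Python sketch of the placeholder point, using hypothetical matrices chosen only for illustration: S uses a 2 x 3 matrix A (so its input lives in R^3) and T uses a 4 x 2 matrix B (so its input lives in R^2). Both functions name their parameter `x`, yet each accepts vectors from a different space; T(S(x)) works because the output of S is in R^2.

```python
def mat_vec(matrix, x):
    """Matrix (list of rows) times vector x; x must have one entry per column."""
    if any(len(row) != len(x) for row in matrix):
        raise ValueError("vector length must equal the number of columns")
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

# Hypothetical example matrices.
A = [[1, 0, 2],
     [0, 1, -1]]    # 2 x 3, so S maps R^3 -> R^2
B = [[1, 1],
     [2, 0],
     [0, 3],
     [1, -1]]       # 4 x 2, so T maps R^2 -> R^4

def S(x):
    """Here the placeholder x stands for a vector in R^3."""
    return mat_vec(A, x)

def T(x):
    """Same letter, but here x stands for a vector in R^2."""
    return mat_vec(B, x)

x = [1, 2, 3]
print(S(x))      # [7, -1]
print(T(S(x)))   # [6, 14, -3, 8]: the output of S is fed in as T's input
```

Calling `T([1, 2, 3])` directly raises the `ValueError`, since a vector in R^3 is not a valid input for T.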

- How do you prove that the composition of two linear transformations is also a linear transformation? Say we have V -> W and W -> U; is the composite V -> U also a linear transformation? (1 vote)
- Let's call the map V -> W A, the map W -> U B, and the composite map V -> U C.

Since we know A and B are linear transformations, we know that

A(x + y) = A(x) + A(y) and A(cx) = cA(x)

and similarly for B

B(x + y) = B(x) + B(y) and B(cx) = cB(x)

And now we want to prove that C(x) = B(A(x)) is a linear transformation. The same conditions apply:

1) C(x + y) must be the same as C(x) + C(y) and

2) C(cx) must be equal to cC(x).

C(x + y) = B(A(x + y)) = B(A(x) + A(y)) = B(A(x)) + B(A(y)) = C(x) + C(y)

C(cx) = B(A(cx)) = B(cA(x)) = cB(A(x)) = cC(x)

We see that C(x) satisfies both conditions, so it is also a linear transformation.(8 votes)
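The proof above is the real argument; a numerical spot-check can't replace it, but it can make the two conditions tangible. A minimal Python sketch, assuming the hypothetical matrix maps A: R^3 -> R^2 and B: R^2 -> R^2 below (any linear maps would do), tests both conditions on random integer vectors. Integer arithmetic keeps the equality checks exact.

```python
import random

def mat_vec(matrix, x):
    """Multiply a matrix (list of rows) by a vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    return [c * a for a in u]

# Hypothetical linear maps, named as in the answer above.
A_MATRIX = [[1, 2, 0],
            [0, -1, 3]]   # A: V = R^3 -> W = R^2
B_MATRIX = [[2, 1],
            [1, 1]]       # B: W = R^2 -> U = R^2

def A(x):
    return mat_vec(A_MATRIX, x)

def B(w):
    return mat_vec(B_MATRIX, w)

def C(x):
    """The composition C(x) = B(A(x))."""
    return B(A(x))

rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(100):
    x = [rng.randint(-9, 9) for _ in range(3)]
    y = [rng.randint(-9, 9) for _ in range(3)]
    c = rng.randint(-9, 9)
    assert C(add(x, y)) == add(C(x), C(y))   # condition 1: additivity
    assert C(scale(c, x)) == scale(c, C(x))  # condition 2: homogeneity
print("both conditions held on every sample")
```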

- At 5:00, why doesn't Sal just use the associative property of matrix multiplication to get (T ∘ S)(x) = B(Ax) = (BA)x, and thus C = BA? (2 votes)
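- At that point in the video, the product of two matrices hasn't been defined yet, so there is no associative property to appeal to. Later he defines matrix multiplication as we know it, and C is in fact BA. (2 votes)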
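A small numerical illustration of the answer, assuming the standard row-times-column formula for the matrix product (`mat_mul` is a hypothetical helper, and the matrices are arbitrary examples): applying A and then B to x gives the same vector as multiplying x by the single matrix BA.

```python
def mat_vec(matrix, x):
    """Matrix (list of rows) times vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

def mat_mul(left, right):
    """Row-times-column product: entry (i, j) is row i of `left` dotted with column j of `right`."""
    right_cols = list(zip(*right))  # columns of `right` as tuples
    return [[sum(a * b for a, b in zip(row, col)) for col in right_cols] for row in left]

A = [[1, 2],
     [3, 4],
     [5, 6]]       # 3 x 2
B = [[1, 0, -1],
     [2, 1, 0]]    # 2 x 3
x = [7, -2]

BA = mat_mul(B, A)                # 2 x 2 matrix [[-4, -4], [5, 8]]
print(mat_vec(B, mat_vec(A, x)))  # B(Ax) -> [-20, 19]
print(mat_vec(BA, x))             # (BA)x -> [-20, 19]
```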

- In the previous video, the transformation S maps the members of X to Y (S: X -> Y), hence S(x) = Ax.

Similarly, the transformation T: Y -> Z maps the members of Y to Z. Shouldn't it be T(y) = By, or T(S(x)) = B S(x)?

Why does Sal write T: Y -> Z as T(x) = Bx, when B is a matrix of size l x m? (2 votes)
- You're right. He's probably just not being as careful with how he labels things as he maybe should be. (1 vote)

- How do you determine the size of the matrix from the domain and the codomain? What's the trick here? (1 vote)
- The simple answer is that the matrix will be **m** x **n**, where the **domain is R^n** and the **codomain is R^m**.

The size of a matrix is given as m rows by n columns, usually written m x n. A linear transformation T from R^n (domain) to R^m (codomain) can be expressed as T(x) = A*x, where A is an m x n matrix.

For example, a linear transformation from R^3 to R^2 (e.g. projecting a 3D world onto a 2D screen) can be expressed as a 2 x 3 matrix A multiplied by a vector in R^3, which produces a vector in R^2. (2 votes)
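A minimal Python sketch of the rule, assuming a matrix stored as a list of rows. The 2 x 3 matrix P is an illustrative choice: an orthographic projection that simply drops the z-coordinate, used as a linear stand-in for the 3D-to-2D example (perspective projection is not linear). Reading the shape as rows x columns gives codomain x domain.

```python
def shape(matrix):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return len(matrix), len(matrix[0])

def mat_vec(matrix, x):
    """Matrix times vector; the input must live in R^columns."""
    rows, cols = shape(matrix)
    if len(x) != cols:
        raise ValueError(f"expected a vector in R^{cols}, got R^{len(x)}")
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

# Orthographic projection that drops the z-coordinate: R^3 (domain) -> R^2 (codomain).
P = [[1, 0, 0],
     [0, 1, 0]]

m, n = shape(P)
print(f"{m} x {n} matrix: domain R^{n}, codomain R^{m}")  # 2 x 3 matrix: domain R^3, codomain R^2
print(mat_vec(P, [4, -1, 9]))                              # [4, -1]
```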

- I have a vector with known length but unknown components, and I know the 3 angles representing its rotation relative to the x, y, and z axes. I start with V = [length, 0, 0] and combine the transformations from there. Am I on the right track? (1 vote)
- What's the difference between this and the D = C^(-1) A C thing you had to do with change-of-basis matrices? (1 vote)
- At 6:10, Sal multiplies the identity matrix with A to define matrix multiplication; isn't this circular reasoning? What other step could you have taken to define matrix multiplication? (1 vote)

## Video transcript

In the last video, we started with a linear transformation S that was a mapping from the set X, a subset of Rn, to the set Y. Then we had another transformation T that was a mapping from the set Y to the set Z. And we asked ourselves: given these two linear transformations, could we construct a linear transformation that goes all the way from X to Z?

What we did was make a definition. We said, let's create something called the composition of T with S. First you apply S to some vector in X to get some vector in Y, and then you apply T to that vector to get to Z. That's how we defined it. Our next question was whether that is a linear transformation, and we showed that it is: it meets the two requirements. And because it is a linear transformation, I left off the last video saying that it should be representable as some matrix-vector product, where the matrix has to be an l by n matrix. That's because it's a mapping from an n-dimensional space (X is a subset of Rn) to an l-dimensional space (Z is a subset of Rl).

Now in this video, let's actually construct this matrix. So at the beginning of the last
video, I told you that T of x could be written as a matrix-vector product, B times x. Let me rewrite that down here: the linear transformation T applied to some vector x can be written as the matrix-vector product B times x. And since T is a mapping from an m-dimensional space to an l-dimensional space, we know that B is going to be an l by m matrix. Similarly, I told you that the transformation S can also be written as a matrix-vector product, A times x, where A is its matrix representation. And since S is a mapping from an n-dimensional space to an m-dimensional space, A will be an m by n matrix.

Now, by definition, what was the composition of T with S? We said that you first apply the linear transformation S to x. (I'll arbitrarily switch colors.) That gets you a vector in Rm, or really a vector in Y, which is a subset of Rm. And then you apply the transformation T to that vector to get into Z. Given this, we can use our matrix representations in place of the transformation notation; they're really the same thing. What is the transformation S applied to x? It's just A times x, where A is an m by n matrix. So the composition is equal to T applied to A times x. Now, what is the transformation T applied to any vector? It's the matrix B times that vector. So this is going to be equal to B times whatever I put in there: the matrix B times (the matrix A times the vector x). That is our composition transformation, the composition of T with S applied to the vector x, which takes us from the set X all the way to the set Z, written using the matrix forms of the two transformations.

Now at the end of the last video, I
said I wanted to find some matrix that, when multiplied by this vector, is equivalent to this transformation. I know that this matrix exists because the composition is a linear transformation. So how can we find it? Well, we just do what we've always done in the past. We start with the identity matrix and apply the transformation to every column of the identity matrix, and we end up with the matrix representation of the transformation itself.

So first of all, how big is the identity matrix going to be? The vectors that we're inputting into our transformation are members of X, which is a subset of Rn, an n-dimensional space. So to figure out this matrix, which we'll call C, we start off with the n-dimensional identity matrix, because our domain is Rn. And of course we know what that looks like: it's an n by n matrix whose first column is 1 followed by 0's all the way down, whose second column is 0, 1, followed by 0's all the way down, and so on. We've seen this multiple times: the identity matrix has 1's along the diagonal from the top left to the bottom right, and everything else is 0.

Now to figure out C, the matrix
representation of our transformation, all we do is apply the transformation to each of these columns. The first column of C is the transformation applied to the first column of the identity matrix. What is the transformation? It is the matrix B times the matrix A times whatever you're taking the transformation of. In this case, we're taking the transformation of 1, 0, 0 all the way down: a 1 followed by a bunch of 0's. That's going to be our first column of C. Our second column of C is going to be B times A times the second column of the identity matrix. And, of course, remember that these columns are the standard basis vectors for Rn, so this is B times A times E2, which is 0, 1, 0 all the way down. We keep doing that until we get to the last column, which is B times A times a vector of 0's all the way down except for a 1 in the nth entry.

Now what is this going to be equal to? It looks fairly complicated right now, but all you have to do is make a realization we've seen multiple times. Write the matrix A as a collection of column vectors A1, A2, all the way to An. We already learned that A is an m by n matrix. Then what is A times a vector with entries x1, x2, all the way down to xn? It's equal to x1 times A1 plus x2 times A2, all the way to plus xn times An. It's a linear combination of the column vectors of A, where the weighting factors are the entries of the vector that we're multiplying by.

So, given that, what is this
first column, B times A times E1, going to reduce to? A times E1 is A1 times the first entry, plus A2 times the second entry, plus A3 times the third entry, and so on. But all of the other entries are 0: the entries from x2 all the way to xn are 0's. So A times E1, as we could call this vector, is just 1 times the first column of A plus 0 times the second column of A plus 0 times the third column, and so on. It's just A1. That simple. So the first column of C is B times A1. Now what is the second one going to be equal to? A times E2 is 0 times the first column of A, plus 1 times the second column of A, plus 0 times the third column of A, and the rest are 0's. So it's just the second column of A, and the second column of our transformation matrix is B times A2. I think you get the idea here. The next one is going to be B times A3, and so on, all the way until you get B times An. So that's how you would solve for your transformation matrix.

Remember what we were
trying to do. Let me write down and summarize everything that we've done so far. We had a mapping S from X to Y, where X is a subset of Rn and Y is a subset of Rm. We said that this linear transformation could be represented as some m by n matrix A times a vector x. Then I showed you another transformation, T, which is a mapping from Y to Z, where Z is a subset of Rl. The transformation T applied to some vector in Y can be represented as some matrix B times that vector. (I shouldn't have drawn parentheses there, but you get the idea.) Since T is a mapping from a subset of Rm to Rl, B is an l by m matrix. And then we said, look, if we take the composition of T with S applied to some vector x in X, this reduces to B times (A times x). First we applied the S transformation: we multiplied the matrix A times x. Then we applied the T transformation to that result: we multiplied B times it.

Now, we know this is a linear transformation, which means it can be represented as a matrix-vector product, and we just figured out what that matrix is. So the composition is equal to C times x, where C is the matrix whose first column is B times A1 (A1 being the first column vector of A), whose second column is B times A2 (A2 being the second column vector of A), and so on, all the way to B times An.

Now, this is fair enough. We can always do this if you give me an l by m matrix B and an m by n matrix A. How do I know I can always do that? Because each of the column vectors Ai of A has m entries; all of them are members of Rm. And B has m columns, so each of these matrix-vector
products is well-defined.

Now, this is an interesting result, because we were able to figure out the actual matrix representation of this composition transformation. Let's extend it a little further. Wouldn't it be nice if C were the same thing as the matrix B times the matrix A, with all of that times x? Because then we could say that the composition of T with S applied to x is equal to the matrix representation of T times the matrix representation of S, and taking the product of those two creates a new matrix, which you can call C, that you can then multiply times x. That way, you wouldn't have to work out each column individually every time.

And the truth of the matter is that there is nothing to stop us from defining C to be equal to B times A. We have not defined what a matrix times a matrix is yet, so we might as well. This is good enough motivation for us to define it this way. So let's throw in this definition. Suppose we have some l by m matrix B and some other matrix A whose column vectors are A1, A2, all the way to An. We define the product BA to be the matrix whose columns are B times each of the column vectors of A: B times A1 is its first column, B times A2 is its second column, all the way to B times An.

You may have seen this before in Algebra II, but the reason I went through almost two videos to get here is to show you the motivation for why matrix products are defined this way: it makes the notion of composing transformations natural. If you take the composition of one linear transformation with another, the resulting transformation matrix is just the product, as we've just defined it, of their two transformation matrices. For those of you who might not have a lot of experience taking products of matrices and who think this is fairly abstract, in the next video I'll do a bunch of examples and show you that this definition is actually fairly straightforward.
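The construction in the transcript can be sketched in Python as follows, assuming matrices are stored as lists of rows. All helper names (`matrix_of`, `product_by_columns`, and so on) and the example sizes l = 2, m = 3, n = 2 are hypothetical choices for illustration. `matrix_of` follows the recipe from the video, applying the composed map x -> B(Ax) to each column of the identity matrix, while `product_by_columns` implements the column definition BA = [B A1, B A2, ..., B An]; the two agree, and the resulting C reproduces B(Ax).

```python
def mat_vec(M, x):
    """Matrix (list of rows) times vector x, computed row by row.
    The result equals x1*(column 1) + ... + xn*(column n)."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def columns(M):
    """Column vectors of M as lists."""
    return [list(col) for col in zip(*M)]

def from_columns(cols):
    """Assemble a matrix (list of rows) from a list of column vectors."""
    return [list(row) for row in zip(*cols)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matrix_of(transform, n):
    """Apply a linear map on R^n to each standard basis vector E_j (the columns of I_n)."""
    return from_columns([transform(e) for e in columns(identity(n))])

def product_by_columns(B, A):
    """The video's definition: BA = [B A1, B A2, ..., B An]."""
    return from_columns([mat_vec(B, a) for a in columns(A)])

# Hypothetical sizes: l = 2, m = 3, n = 2
A = [[1, 2],
     [0, 1],
     [4, 0]]      # m x n = 3 x 2, so S maps R^2 -> R^3
B = [[1, 0, 2],
     [0, 3, 1]]   # l x m = 2 x 3, so T maps R^3 -> R^2

def composition(x):
    """(T o S)(x) = B(Ax)."""
    return mat_vec(B, mat_vec(A, x))

C = matrix_of(composition, 2)
print(C)                                # [[9, 2], [4, 3]]
print(C == product_by_columns(B, A))    # True
x = [5, -1]
print(mat_vec(C, x) == composition(x))  # True
```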