Providing the motivation for the definition of matrix products. Created by Sal Khan.
- At 13:32, Sal says that a1 belongs to Rn. Shouldn't it be Rm?(10 votes)
- I don't get it. Is every column in a matrix really a vector in disguise?(4 votes)
- Yes. The converse is true as well, every vector is a matrix in disguise, e.g. a column vector with 5 entries is really a 5 x 1 matrix. :D
Thumbs up!(8 votes)
- How do you prove that the composition of two given linear transformations is also a linear transformation? Let's say we have V -> W and W -> U. Is V -> U also a linear transformation?(1 vote)
- Let's call V->W A, W->U B and V->U C.
Since we know A and B are linear transformations, we know that
A(x + y) = A(x) + A(y) and A(cx) = cA(x)
and similarly for B
B(x + y) = B(x) + B(y) and B(cx) = cB(x)
And now we want to prove that C(x) = B(A(x)) is a linear transformation. The same conditions apply:
1) C(x + y) must be the same as C(x) + C(y) and
2) C(cx) must be equal to cC(x).
C(x + y) = B(A(x + y)) = B(A(x) + A(y)) = B(A(x)) + B(A(y)) = C(x) + C(y)
C(cx) = B(A(cx)) = B(cA(x)) = cB(A(x)) = cC(x)
We see that C(x) satisfies both conditions, so it is also a linear transformation.(8 votes)
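As a numerical sanity check of the proof above (not a replacement for it), here is a minimal sketch. The matrices `A` and `B`, the vectors `x` and `y`, the scalar `c`, and the helper `mat_vec` are all made-up illustrations, not anything from the video.

```python
def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by vector v (a list of numbers)."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

# Hypothetical example matrices: A maps R^2 -> R^3, B maps R^3 -> R^2.
A = [[1, 2], [0, 1], [3, -1]]
B = [[2, 0, 1], [1, -1, 0]]

def C(x):
    """The composition C(x) = B(A(x))."""
    return mat_vec(B, mat_vec(A, x))

x = [1, 4]
y = [-2, 3]
c = 5

# Condition 1: C(x + y) should equal C(x) + C(y).
lhs_add = C([xi + yi for xi, yi in zip(x, y)])
rhs_add = [p + q for p, q in zip(C(x), C(y))]

# Condition 2: C(cx) should equal cC(x).
lhs_scale = C([c * xi for xi in x])
rhs_scale = [c * p for p in C(x)]

print(lhs_add == rhs_add, lhs_scale == rhs_scale)
```

Integer entries are used so the comparisons are exact. A check like this only tests particular inputs; the algebra above is what covers every x, y, and c.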
- How come we can take the same vector x for both the S and T transformations, especially when the corresponding codomains are of different dimensions? I mean the statements T(x)=Bx and S(x)=Ax. Or am I just saying bollocks?(3 votes)
- x is just the variable used for the functions. x isn't going to be the same for both T and S, they just both use x as the placeholder for the input. You'll notice when he does T(S( x )) that the x inside the T function was replaced with S( x ), but that doesn't mean that x = S( x ), it just means that the output of S is being used as the input for T.(5 votes)
- At 5:00, why doesn't Sal just use the associative property of matrix multiplication to get that (T ∘ S)(x) = B(Ax) = (BA)x, and thus C = BA?(2 votes)
- In the previous video, the transformation S maps the members of X to Y (S: X -> Y), hence S(x) = A x.
Similarly, the transformation T: Y -> Z maps the members of Y to Z. Shouldn't it be T(y) = B y, or T(S(x)) = B S(x)?
Why does Sal write T: Y -> Z as T(x) = B x, where B is a matrix of size l x m?(2 votes)
- How do you determine the size of the matrix from the domain and the codomain? What's the trick here ..?(1 vote)
- The simple answer is that the matrix will be m x n where the domain is R^n and the codomain is R^m.
The size of a matrix is written as m rows by n columns, usually expressed as m x n. A linear transformation T(x) from R^n (domain) to R^m (codomain) can be expressed as T(x) = A*x, where A is an m x n matrix.
For example a transformation from R^3 to R^2 (e.g. 3D world onto a 2D screen) can be expressed as a 2 x 3 matrix A multiplied by a vector in R^3 which will produce a vector in R^2.(2 votes)
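To make the R^3 to R^2 example concrete, here is a minimal sketch. The matrix `A` (which simply drops the z-coordinate), the sample point, and the helper `mat_vec` are illustrative assumptions.

```python
def mat_vec(A, x):
    """Multiply an m x n matrix A (list of rows) by a vector x with n entries."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("every row of A must have as many entries as x")
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Hypothetical 2 x 3 matrix: keeps x and y, discards z.
A = [[1, 0, 0],
     [0, 1, 0]]

point_3d = [4, 7, 9]             # a vector in R^3 (n = 3)
point_2d = mat_vec(A, point_3d)  # a vector in R^2 (m = 2)

print(point_2d)  # [4, 7]
```

The number of columns of `A` (3) matches the dimension of the domain, and the number of rows (2) matches the dimension of the codomain.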
- I have a vector with known length but unknown components and I know the 3 angles representing the rotation against the x,y, and z- axis. V=[length, 0, 0] and combine the transformations from there. Am I on the right track?(1 vote)
- What's the difference between this and the D=(C^(-1) A C) thing you had to do with the change of basis matrices?(1 vote)
- At 6:10, Sal multiplies the identity matrix with A to define matrix multiplication; isn't this circular reasoning? What other step could you have taken to define matrix multiplication?(1 vote)
In the last video, we started with a linear transformation S that was a mapping from the set X, a subset of Rn, to the set Y. And then we had another transformation, T, that was a mapping from the set Y to the set Z. And we asked ourselves, given these two linear transformations, could we construct a linear transformation that goes all the way from X to Z? What we did was we made a definition. We said let's create something called the composition of T with S. What that is, is first you apply S to some vector in X to get some vector in Y. And that's your vector right there. And then you apply T to that, to get to Z. And so we defined it that way. And our next question was, was that a linear transformation? We showed that it was. It met the two requirements for linear transformations. And because it is a linear transformation, I left off in the last video saying that it should be able to be represented by some matrix vector product, where the matrix will have to be an l by n matrix. Because it's a mapping from an n dimensional space, which was X, a subset of Rn, to an l dimensional space, because Z is a subset of Rl.

Now in this video, let's try to actually construct this matrix. So at the beginning of the last video, I told you that T of x could be written as some matrix product, B times x. Let me rewrite that down here. So I told you that the linear transformation T applied to some vector x could be written as the matrix vector product B times the vector x. And since it was a mapping from an m dimensional space to an l dimensional space, we know this is going to be an l by m matrix. Now similarly, I told you that the transformation S can also be written as a matrix vector product, where A is its matrix representation, times a vector x. And since S was a mapping from an n dimensional space to an m dimensional space, this will be an m by n matrix.

Now by definition, what was the composition of T with S? What is this?
By definition, we said that this is equal to the following: you first apply the linear transformation S to x. And I'll arbitrarily switch colors. And that essentially gets you a vector right there. This is just a vector in Rm. Or it's really a vector in Y, which is a subset of Rm. And then you apply the transformation T to that vector to get you into Z.

Given this, we can use our matrix representations to replace this kind of transformation representation, although they're really the same thing. What is the transformation S applied to x? Well, this right here is just A times x, where A is an m by n matrix. So we can say that this is equal to the transformation T applied to A times x. Now, what is the T transformation applied to any vector x? Well, that's the matrix B times your vector x. So this thing right here is going to be equal to B times whatever I put in there. So it is the matrix B times the matrix A times the vector x, where A times x is evaluated first. This is what our composition transformation is. The composition of T with S applied to the vector x, which takes us from the set X all the way to the set Z, is this, if we use the matrix forms of the two transformations.

Now at the end of the last video I said I wanted to find just some matrix that, if I were to multiply it times this vector, is equivalent to this transformation. And I know that I can find this matrix. I know that this exists because this is a linear transformation. So how can we do that? Well, we just do what we've always done in the past. We start with the identity matrix, and we apply the transformation to every column of the identity matrix. And then you end up with your matrix representation of the transformation itself. So first of all, how big is the identity matrix going to be? Well, these guys that we're inputting into our transformation are members of X, which is a subset of Rn, an n dimensional space.
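The "first apply S, then apply T" recipe can be sketched in a few lines. The concrete matrices below (with n = 2, m = 3, l = 2), the input vector, and the function names are made-up illustrations, not values from the video.

```python
def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by vector v (a list of numbers)."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

# Hypothetical matrices chosen only for illustration.
A = [[1, 2], [0, 1], [3, -1]]  # m x n = 3 x 2, represents S: R^2 -> R^3
B = [[2, 0, 1], [1, -1, 0]]    # l x m = 2 x 3, represents T: R^3 -> R^2

def S(x):
    return mat_vec(A, x)

def T(y):
    return mat_vec(B, y)

def T_after_S(x):
    """The composition of T with S: the output of S is the input of T."""
    return T(S(x))

x = [1, 4]
y = S(x)          # a vector in R^3
z = T_after_S(x)  # a vector in R^2

print(y)  # [9, 4, -1]
print(z)  # [17, 5]
```

Note that the sketch only ever multiplies a matrix by a vector, which is all that has been defined at this point in the lecture.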
So all we do to figure out C is we start off with the identity matrix, the n dimensional identity matrix, because our domain is Rn. And of course we know what that looks like. It's going to be an n by n matrix. The first column is 1, 0, with 0's all the way down. The second column is 0, 1, with 0's all the way down. The 1's go all the way down the diagonal and everything else is 0. We've seen this multiple times. That's what your identity matrix looks like, just 1's down the diagonal from the top left to the bottom right.

Now to figure out C, the matrix representation of our transformation, all we do is we apply the transformation to each of these columns. So we can write that the first column of our matrix C is equal to the transformation applied to this first column. What is the transformation? It is the matrix B times the matrix A times whatever you're taking the transformation of. In this case we're taking the transformation of 1, 0, 0 all the way down: a 1 followed by a bunch of 0's. That's going to be our first column of C. Our second column of C is going to be B times A times the second column of our identity matrix. And, of course, you remember these are each the standard basis vectors for Rn. So this is going to be times E2, which is 0, 1, 0 all the way down. And then we're going to keep doing that until we get to the last column, which is B times A times a bunch of 0's all the way down until you get a 1. The nth term is just a 1 right there.

Now what is this going to be equal to? It looks fairly complicated right now. But all you have to do is make the realization, and we've seen this multiple times, that we can write our matrix A as just a bunch of column vectors. So these are the column vectors A1, A2, all the way to An. We already learned that this was an m by n matrix. Then what is the matrix A times, for example, the vector x1, x2, all the way down to xn? We've seen this multiple times.
This is equivalent to x1 times A1 plus x2 times A2, all the way to plus xn times An. We've seen this multiple times. It's a linear combination of these column vectors, where the weighting factors are the terms in the vector that we're taking the product with. So given that, what is this guy going to reduce to? This is going to be A1 times this first entry right here, plus A2 times the second entry, plus A3 times the third entry. But all these other entries are 0. The second entry all the way to the nth entry are 0's. So you're only going to end up with 1 times the first column here in A. So the first column is going to be B times A times this E1 vector, I guess we could call it, and A times E1 is just going to be 1 times the first column in A plus 0 times the second column in A plus 0 times the third column, and so on. So it's just 1 times the first column in A. So it's just A1. That simple. So the first column of C is B times A1.

Now what is this one going to be equal to? A times E2 is going to be 0 times the first column in A, plus 1 times the second column in A, plus 0 times the third column in A, and the rest are going to be 0's. So it's just going to be 1 times the second column in A. So the second column in our transformation matrix is just going to be B times A2. And I think you get the idea here. The next one is going to be B times A3, and all the way until you get B times An. So that's how you would solve for your transformation matrix.

Remember what we were trying to do. Let me write down and summarize everything that we've done so far. We had a mapping S, that was a mapping from X to Y. X was a subset of Rn. Y was a subset of Rm. And so we said that this linear transformation could be represented as some matrix A, where A is an m by n matrix, times a vector x. Then I showed you another transformation, which we called T, which was a mapping from Y to Z. Z is a subset of Rl.
And of course, the transformation T applied to some vector in Y can be represented as some matrix B times that vector. I shouldn't have drawn parentheses there, but you get the idea. And since it's a mapping from a subset of Rm to Rl, this will be an l by m matrix. And then we said, look, if we actually just take the composition of T with S, of some vector in X, this reduced to B times A times x. So first we applied the S transformation. We multiplied the matrix A times x. And then we applied the T transformation to this. So we just multiplied B times that. Now we know this is a linear transformation, which means it can be represented as a matrix vector product. And we just figured out what the matrix in that product is. So this thing is going to be equal to C times x, where C is the matrix whose first column is B times A1, where A1 is the first column vector in our matrix A. And then the second column is going to be B times A2, where A2 is the second column vector in A. And you can keep going all the way until you have B times An. And all of that is times x, of course.

Now this is fair enough. We can always do this. Remember B is an l by m matrix. If you give me another matrix A that is an m by n matrix, I can always do this. And how do I know I can always do that? Because each of these A's is going to have m entries, right? Each Ai is going to be a member of Rm. So this is well-defined. B has m columns. Each Ai has m entries. So each of these matrix vector products is well-defined.

Now, this is an interesting thing, because we were able to figure out the actual matrix representation of this composition transformation. Let's extend it a little bit further. Wouldn't it be nice if this were the same thing as the matrix B times the matrix A, all of that times x? Wouldn't it be nice if these were the same thing?
Because then we could say that the composition of T with S of x is equal to the matrix representation of T times the matrix representation of S, where you take the product of those two. And that will create a new matrix representation, which you can call C, that you can then multiply times x. So you won't have to do it individually every time, or do it this way. And I guess the truth of the matter is there is nothing to stop us from defining this to be equal to B times A. We have not defined what a matrix times a matrix is yet. So we might as well. This is a good enough motivation for us to define it in this way.

So let's throw in this definition. So if we have some matrix B, where B is an l by m matrix. And then we have some other matrix A, an m by n matrix, and I'll actually show what A looks like, where these are its column vectors, A1, A2, all the way to An. We're going to define the product. So this is a definition. We're going to define the product BA as being equal to the matrix whose columns are B times each of the column vectors of A. So it's B times A1. That's going to be its first column. The second column is going to be B times A2. All the way to B times An.

And you've seen this before in Algebra 2, but the reason why I went through almost two videos to get to here is to show you the motivation for why matrix products are defined this way. Because it makes the notion of compositions of transformations kind of natural. If you take the composition of one linear transformation with another, the resulting transformation matrix is just the product, as we've just defined it, of their two transformation matrices. For those of you who might not have a lot of experience taking products of matrices, and who think this is fairly abstract to look at, in the next video I'll actually do a bunch of examples and show you that this definition is actually fairly straightforward.
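The column-by-column definition of BA can be sketched directly in code. This is a minimal illustration under assumed inputs: the matrices `A` and `B`, the vector `x`, and the helper names `mat_vec`, `columns`, and `mat_mul` are hypothetical and do not come from the video.

```python
def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by vector v (a list of numbers)."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def columns(M):
    """Return the column vectors of M as a list of lists."""
    return [list(col) for col in zip(*M)]

def mat_mul(B, A):
    """Define BA column by column: column i of BA is B times column i of A."""
    if len(B[0]) != len(A):
        raise ValueError("B must have as many columns as A has rows")
    product_columns = [mat_vec(B, a_i) for a_i in columns(A)]
    # The products are columns, so transpose them back into a list of rows.
    return [list(row) for row in zip(*product_columns)]

# Hypothetical matrices: A is m x n = 3 x 2, B is l x m = 2 x 3.
A = [[1, 2], [0, 1], [3, -1]]
B = [[2, 0, 1], [1, -1, 0]]

C = mat_mul(B, A)  # l x n = 2 x 2
x = [1, 4]

print(C)                          # [[5, 3], [1, 1]]
print(mat_vec(C, x))              # [17, 5]
print(mat_vec(B, mat_vec(A, x)))  # [17, 5]
```

For this example, multiplying x by the single matrix C gives the same vector as applying A and then B, which is exactly the property the definition was chosen to guarantee.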