Matrix vector products as linear transformations
Matrix Vector Products as Linear Transformations. Created by Sal Khan.
- How exactly are matrices used in computer science or physics? I mean, yeah, I heard that it is related to graphics in computer science and to vector quantities in physics, but how exactly do I apply matrices to these? Could someone please give an example, either in computer science or physics, and explain exactly how we work with matrices? Thanks in advance!(11 votes)
- Peter,
I can give you a more in depth physics example.
Are you familiar with salt-water taffy? It's a piece of candy that is usually cylindrically shaped, about an inch long and maybe a quarter of an inch in diameter. And it's pretty "squishy" and "stretchy," kind of the consistency of Play-Doh.
Well, imagine you have a piece of taffy and you are holding it so that the long dimension is parallel to the ground. Now imagine that you pull both ends of it. What happens? Well, of course it gets longer in the direction that you are pulling it. But in the middle it also starts to get skinny. The technical name for this is "deformation" or "strain," and think of how many vectors it would take to describe the strain. It's getting longer in the direction you are pulling, so that would be one vector, but it's getting shorter in the other two directions. And of course, nothing says you have to pull perfectly along 1 axis. You could pull in some weird direction. So to describe a 3-dimensional strain it would actually take 9 values:
1 value describes how the x dimension changes due to a force in the x direction: Axx
1 value describes how the x dimension changes due to a force in the y direction: Axy
1 value describes how the x dimension changes due to a force in the z direction: Axz
And of course, there would be six more, for the other possibilities: Ayx, Ayy, Ayz, Azx, Azy, Azz.
If you put all of these together they are called the "Strain Tensor" and they can be arranged as a 2 dimensional 3 by 3 matrix. Using that matrix, you can calculate how much force it takes to stretch the taffy from 1 inch long to two inches long, for example, and then how much the taffy will "neck down" or get skinny in the middle.
Unfortunately, I can't give you more depth than that in a comment, but a quick Google search will find you a lot more! Of course, this is only 1 of a bunch of ways that matrices are used in applications.(71 votes)
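A toy numerical sketch of the idea above (my own illustration in Python with numpy; the numbers are made up and not from the original answer): put nine values like Axx, Axy, ..., Azz into a 3 by 3 matrix and multiply it by a force vector to see how each dimension responds.

import numpy as np

# Made-up "strain" matrix: entry (i, j) says how dimension i responds
# to a force along dimension j (rows and columns ordered x, y, z).
A = np.array([
    [ 0.50, -0.10, -0.10],   # Axx, Axy, Axz
    [-0.10,  0.50, -0.10],   # Ayx, Ayy, Ayz
    [-0.10, -0.10,  0.50],   # Azx, Azy, Azz
])

force = np.array([1.0, 0.0, 0.0])   # pull along the x axis only
deformation = A @ force             # a matrix-vector product
print(deformation)                  # [ 0.5 -0.1 -0.1]: stretches in x, shrinks in y and z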
- Also, some people (like myself) work much better with tangible objects than all these laws, rules and properties. If I could "see" Linear Transformations geometrically, graphed out and visualized, the theory would be much more digestible.(5 votes)
- Oh wow, we just drop the results of the sin/cos/tan functions in the rotation matrix? Seems simple enough.
What I am confused about is how we decided to use these specific trig functions....
that is
[Cos(theta) , -Sin(theta)]
[Sin(theta) , Cos(theta)]
I understand vertical V1 is multiplied by X and vertical V2 is multiplied by Y, but still don't see how they were built.
Does the "arrangement" the trig functions are in ever change (when doing rotations)? I guess I don't see how you arrived at that matrix so I'm taking you up on your offer :), that is, I'm confused on how you picked which trig functions to use in the matrix. I recognize the results of the trig functions fine (i'm more familiar with SOHCAHTOA aka hypSin(theta) or hypCos(theta) not xCos(theta) or -ySin(theta) ).
I see Wikipedia has a sheet on various R2 matrix calculations, I'm still lost as to how those Matrixes were derived, I hope you're more clear than Wiki as I mostly work in R3, and I will need to calculate rotations of Z as well.
I think the key lies in figureing out how to do any kind of transformations, not just rotations. It appears, that if for example in R2 that
[x transformation, y transformation]
[x transformation, y transformation]
Reading your response below, an R3 rotation would be described by a 4x4 matrix?
[x transformation, y transformation, z transformation, w transformation]
[x transformation, y transformation, z transformation, w transformation]
[x transformation, y transformation, z transformation, w transformation]
[x transformation, y transformation, z transformation, w transformation](3 votes)
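One way to see where those entries come from (a sketch in Python with numpy; this is my own illustration, not part of the original thread): the columns of a transformation matrix are the images of the basis vectors, so the first column is where (1, 0) lands after rotating by theta and the second column is where (0, 1) lands. The same recipe extends to R3: a rotation about the z axis leaves (0, 0, 1) alone and uses this 2x2 block for x and y.

import numpy as np

theta = np.pi / 6                                     # 30 degrees, just as an example

e1_image = np.array([np.cos(theta), np.sin(theta)])   # (1, 0) lands at (cos, sin)
e2_image = np.array([-np.sin(theta), np.cos(theta)])  # (0, 1) lands at (-sin, cos)

# Stack those images as columns and you have the rotation matrix
# [[cos(theta), -sin(theta)],
#  [sin(theta),  cos(theta)]]
R = np.column_stack([e1_image, e2_image])

v = np.array([2.0, 1.0])
print(R @ v)   # v rotated counterclockwise by theta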
- I would really like to see a demonstration on using Linear Transformations to describe a rotation and a relocation in a 3d space.
Would I need a 3x3 matrix to do that? A 3x4? All this theory is fine and well, but some examples of specific applications such as the ones mentioned above would be great.(5 votes)
- If you had an object in 3D space, with a 3x3 matrix you can rotate, scale, stretch, flip, and project it. You cannot translate (relocate) it. You don't need a 4x4 to translate. You could do that with a 3x4, as you suggest. A 3x4 would be very inconvenient, though. As it isn't square, it wouldn't have an inverse. Quite often we want to do the opposite transform, and the inverse matrix is handy in that it undoes the transformation. Another thing we want to do is combine transformations into 1 transformation. In the matrix world, we do this by multiplying the transformation matrices together. A 4x4 means they can be combined easily. The product of two 3x4 matrices, on the other hand, isn't even defined.
N.B. When using a 4x4 matrix, the 3D points are typically augmented with an extra coordinate we call w. w is typically set to 1. This augmentation is required to allow the product of a 4x4 matrix and a 4x1 vector to be defined.
I don't have any examples to point you at right now, but if I find some I'll edit this answer.(7 votes)
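Here is a small sketch of that 4x4 setup in Python with numpy (my own illustrative numbers, not from the answer above): the upper-left 3x3 block rotates about the z axis, the last column translates, and the 3D point is augmented with w = 1.

import numpy as np

theta = np.pi / 2                    # rotate 90 degrees about the z axis
tx, ty, tz = 5.0, 0.0, 0.0           # then translate by (5, 0, 0)

M = np.array([
    [np.cos(theta), -np.sin(theta), 0.0, tx],
    [np.sin(theta),  np.cos(theta), 0.0, ty],
    [0.0,            0.0,           1.0, tz],
    [0.0,            0.0,           0.0, 1.0],
])

p = np.array([1.0, 0.0, 0.0, 1.0])   # the point (1, 0, 0) with w = 1 appended
print(M @ p)                          # [5. 1. 0. 1.]: rotated to (0, 1, 0), then shifted to (5, 1, 0)

Because M is square, combining transforms is just multiplying such matrices together, and undoing one is taking its inverse, which is the convenience described above.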
- At 2:07, the matrix multiplication he performs does not make sense to me. It looks like at first he's treating v1, v2, v3... as the column vectors of matrix A, which would have dimension 1xm (causing it to have the expected mxn dimensions, as there are n vectors), but then he multiplies them by the x vector, which is an nx1 matrix. You cannot perform matrix multiplication between a 1xm and an nx1 matrix. Am I overlooking something?(6 votes)
- A has n vectors, which are each m x 1. So you can't multiply them by x as a vector (as x is n x 1), but that is not what is happening. He is multiplying them by the elements of x, so x1, x2 to xn, and then summing the result. Each element of x is just a scalar, which obviously can multiply the vector columns of A. This is just another way to go through the mechanics of multiplying: using the elements of x as coefficients of the vectors of A, and it gives the same answer as doing it, say, the dot product way.(4 votes)
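That bookkeeping is easy to check numerically (a quick sketch in Python with numpy; the matrix and vector are just example values): A times x gives the same result as summing each column of A scaled by the matching entry of x.

import numpy as np

A = np.array([[2.0, -1.0],
              [3.0,  4.0]])                       # columns v1 and v2
x = np.array([5.0, 7.0])                          # entries x1 and x2 are scalars

as_product = A @ x                                # the usual matrix-vector product
as_combination = x[0] * A[:, 0] + x[1] * A[:, 1]  # x1*v1 + x2*v2

print(as_product, as_combination)                 # [ 3. 43.] [ 3. 43.]
print(np.allclose(as_product, as_combination))    # True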
- Does every matrix A have a matrix B (where A != B) such that Ax = y equals Bx = y?
For example, in Sal's 2x2 matrix [2, -1 <below> 3, 4], the matrix vector product Bx was equal to [2x.1 - x.2 <below> 3x.1 + 4x.2], where a = 2, b = -1, c = 3, and d = 4. However, if we had a new matrix A where its a = -x.2/x.1, b = 2x.1/x.2, c = 4x.2/x.1, and d = 3x.1/x.2, then, for any x.1 and x.2, Ax = Bx = y.
Is this right, and if so, what does it mean when you deal with matrix inverses? If you have C as an inverse, and you do Cy = x, do there exist many possible C's where Cy = x instead of only one C?
Thanks.(4 votes)
- Not quite. What you have shown is that two different matrices can transform a specific vector to the same image. By making your "new matrix A" (matrix B) dependent on the vector, this holds only for that specific vector. (Also notice that your new matrix falls apart if x_1 or x_2 = 0. I think if your construction does not work for x = [1 0] and x = [0 1], then you're looking for trouble.)
You should try a specific example with x_1 and x_2 != 0. You'll get two distinct matrices A != B that will transform your x_1 and x_2 to the same x_1' and x_2'. Yay! So far so good, but the two matrices A and B will not transform a different x_1 and x_2 to the same image.
Consider for example that both a rotation and a reflection can take a specific vector to the same image, but will not take all vectors (the entire space) to the same image.
Some reflections transform specific vectors to the same vector, but that does not mean that they are the identity transformation.(4 votes)
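A concrete instance of that last point (sketched in Python with numpy; the two matrices are my own illustrative choices): a 90-degree rotation and a reflection across the line y = x send (1, 0) to the same image, but they disagree on (0, 1), so they are not the same transformation.

import numpy as np

rotation   = np.array([[0, -1],
                       [1,  0]])      # rotate 90 degrees counterclockwise
reflection = np.array([[0,  1],
                       [1,  0]])      # reflect across the line y = x

v = np.array([1, 0])
w = np.array([0, 1])

print(rotation @ v, reflection @ v)   # [0 1] [0 1] -- same image for this particular vector
print(rotation @ w, reflection @ w)   # [-1  0] [1 0] -- different images, so different matrices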
- I have an extremely basic question ...
Is multiplying a matrix with a vector the same as multiplying a vector with a matrix (i.e. does the order matter?)
Sal says in the beginning of this video that "taking a product of a vector with a matrix is equivalent to a transformation" ... should that sentence be "taking a product of a matrix with a vector is equivalent to a transformation."
Sorry about nit-picking on possibly trivial elements ... it's because one does not know if something is important or not until one has fully surveyed the subject :)(4 votes)
- Order of matrix multiplication does matter.
Transforming a vector x by a matrix A is mathematically written as Ax, and can also be described by:
"Left multiplying x by A."
When the context is clear and we say "multiplying x by A", it is obvious we mean left multiplication, i.e. Ax.(3 votes)
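A quick numerical check of that point (a sketch in Python with numpy, using an arbitrary non-symmetric matrix): left multiplication Ax and right multiplication by a row vector generally give different results.

import numpy as np

A = np.array([[2.0, -1.0],
              [3.0,  4.0]])
x = np.array([1.0, 2.0])

print(A @ x)   # left multiplication, the transformation T(x) = Ax -> [ 0. 11.]
print(x @ A)   # x treated as a row vector on the left instead -> [8. 7.]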
- At 1:08, why did Sal insist on writing a bold A?
I thought only vectors were bolded?(4 votes)
- Av is a symbol that references a vector (since it is a member of R^m), so it makes sense to make it bold. But, as Sal suggests, he forgets to do it often and the world doesn't collapse. Indeed, if it were up to me, he would have made T bold as well, since it is a vector function, but meh.(3 votes)
- Why are we checking whether things are linear transformations? are there some perks to being linear?(3 votes)
- Linear transformations are the simplest, and cover a very wide range of possible transformations of vectors. On the other hand, non-linear transformations do not work very well if you change your coordinate grid, making them very rare. But the main reason is that a linear transformation can always be represented as a matrix-vector product, which allows some neat simplifications.(3 votes)
- can you have a linear transformation if there is an absolute value in the transformation matrix?(2 votes)
- Yes. All matrix transformations are linear transformations by definition, but not all linear transformations are matrices.(4 votes)
- At 7:06, shouldn't Sal have a vector arrow over the x?(1 vote)
- I think he did forget to put one, but as long as you know that x is a vector, the notation isn't really important.(2 votes)
Video transcript
I think you're pretty familiar
with the idea of matrix vector products and what I want to do
in this video is show you that taking a product of a vector
with a matrix is equivalent to a transformation. It's actually a linear
transformation. Let's say we have some matrix A
and let's say that its terms are, or its columns are the column vectors v1, v2, all the way to vn. So this guy has n columns. Let's say it has m rows. So it's an m by n matrix. And let's say I define
some transformation. Let's say my transformation
goes from Rn to Rm. This is the domain. I can take any vector in Rn
and it will map it to some vector in Rm. And I define my transformation. So T of x, where this is some
vector in Rn, is equal to A-- this is this A. Let me write it in this
color right here. And it should be bolded. I kind of get careless sometimes
with the bolding. But big bold A times
the vector x. So the first thing you might say is,
Sal, this transformation looks very odd relative to how we've
been defining transformations or functions so far. So the first thing we have to
just feel comfortable with is the idea that this is
a transformation. So what are we doing? We're taking something
from Rn and then what does A x produce? If we write A x like this, if
this is x where it's x1, x2. It's going to have n terms
because it's in Rn. This can be rewritten as x1
times v1 plus x2 times v2, all the way to xn times vn. So it's going to be a sum of a
bunch of these column vectors. And each of these column
vectors, v1, v2, all the way to vn, what set are
they members of? This is an m by n matrix, so
they're going to have m-- the matrix has m rows, or
each of these column vectors will have m entries. So all of these guys
are members of Rm. So if I just take a linear
combination of all of these guys, I'm going to get
another member of Rm. So this guy right here is going
to be a member of Rm, another vector. So clearly, by multiplying my
vector x times A, I'm mapping, I'm creating a mapping from Rn--
and let me pick another color-- to Rm. And I'm saying it in very
general terms. Maybe n is 3, maybe m is 5. Who knows? But I'm saying it in very
general terms. And so if this is a particular instance, a
particular member of set Rn, so it's that vector, our
transformation or our function is going to map it to
this guy right here. And this guy will be a
member of Rm and we could call him Ax. Or maybe if we said Ax equaled
b we could call him the vector b-- whatever. But this is our transformation
mapping. So this does fit our kind of
definition or our terminology for a function or a
transformation as a mapping from one set to another. But it still might not be
satisfying because everything we saw before looked
kind of like this. If we had a transformation
I would write it like the transformation of-- I would
write, you know, x1 and x2 and xn is equal to. I'd write m terms
here, separated by commas. How does this relate to that? And to do that I'll do
a specific example. So let's say that I had
the matrix-- let me use a different letter. Let's say I have my matrix
B and it is a fairly simple matrix. It's a 2, minus 1, 3 and 4. And I define some
transformation. So I define some transformation
T. And it goes from R2 to R2. And I define T. T of some vector x is equal
to this matrix, B times that vector x. Now what would that equal? Well the matrix is
right there. Let me write it in purple. 2, minus 1, 3, and 4 times x. x1, x2. And so what does this equal? Well this equals
another vector. It equals a vector in the
co-domain R2 where the first term is 2 times x1. I'm just doing the definition
of matrix vector multiplication. 2 times x1 plus minus 1
times x2, or minus x2. That's that row times
our vector. And then the second row
times that vector. We get 3 times x1. Plus 4 times x2. So this is what we might
be more familiar with. I could rewrite this
transformation. I could rewrite this
transformation as T of x1 x2 is equal to 2x1 minus x2 comma--
let me scroll over a little bit, comma
3x1 plus 4x2. So hopefully you're satisfied
that a matrix multiplication, it isn't some new, exotic
form of transformation. That they really are
just another way. This statement right here is
just another way of writing this exact transformation
right here.
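As a sanity check on this specific example (a short sketch in Python with numpy, not part of the original video): multiplying the matrix B by a vector gives exactly the pair 2x1 minus x2 and 3x1 plus 4x2.

import numpy as np

B = np.array([[2.0, -1.0],
              [3.0,  4.0]])

def T(x1, x2):
    # The transformation written out component by component, as in the video.
    return np.array([2 * x1 - x2, 3 * x1 + 4 * x2])

x = np.array([1.0, 2.0])
print(B @ x)           # [ 0. 11.]
print(T(x[0], x[1]))   # [ 0. 11.] -- the same thing, just written differently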
Now, the next question you might ask and I already told you the answer to this at the
beginning of the video is, is multiplication by a matrix
always going to be a linear transformation? Now what are the two constraints
for being a linear transformation? We know that the transformation
of two vectors, a plus b, the sum of two vectors
should be equal to the sum of their transformations. The transformation of a plus
the transformation of b. And then the other requirement
is that the transformation of a scaled version of a vector
should be equal to a scaled version of the transformation. These are our two requirements
for being a linear transformation. So let's see if matrix
multiplication applies there. And I've touched on this in the
past and I've even told you that you should prove it. I've already assumed you know
it, but I'll prove it to you here because I'm tired of
telling you that you should prove it. I should do it at least once. So let's see, matrix
multiplication. If I multiply a matrix A times
some vector x, we know that-- let me write it this way. We know that this is equivalent to-- I said our matrix. Let's say this is an
m by n matrix. We can write any matrix
as just a series of column vectors. So this guy could have
n column vectors. So let's say it's v1, v2, all
the way to vn column vectors. And each of these guys are going
to have m components. Times x1, x2, all the
way down to xn. And we've seen this multiple,
multiple times before. This, by the definition of
matrix vector multiplication is equal to x1 times v1. That times that. This scalar times that vector
plus x2 times v2, all the way to plus xn times vn. This was by definition of a
matrix vector multiplication. And of course, this is going
to-- and I did this at the top of the video. This is going to have right
here, this vector is going to be a member of Rm. It's going to have
m components. So what happens if I take some
matrix A, some m by n matrix A, and I multiply it times the
sum of two vectors a plus b? So I could rewrite this as
this thing right here. So my matrix A times. The sum of a plus b, the first
term will just be a1 plus b1. Second term is a2 plus b2, all
the way down to a n plus bn. This is the same
thing as this. I'm not saying A of a plus b. I'm saying A times. Maybe I should put a
dot right there. I'm multiplying the matrix. I want to be careful
with my notation. This is the matrix vector
multiplication. It's not some type of new
matrix dot product. But this is the same
thing as this multiplication right here. And based on what I just told
you up here, which we've seen multiple, multiple times, this
is the same thing as a1 plus b1 times the first column
in A, which is that vector right there. This A is the same as this A. So times v1. Plus a2 plus b2 times v2,
all the way to plus an plus bn times vn. Each xi term here is just
being replaced by an ai plus bi term. So each x1 here is replaced
by an a1 plus b1 here. This is equivalent to this. And then from the fact that we
know that well vector products times scalars exhibit the
distributive property, we can say that this is equal
to a1 times v1. Let me actually write all of
the a1 terms. Let me write this. a1 times v1 plus b1 times
v1 plus a2 times v2 plus b2 times v2, all the way
to plus a n times vn plus bn times vn. And then if we just re-associate
this, if we just group all of the a's together,
all of the a terms together, we get a1 plus a-- sorry. a1 plus-- let me write it this
way. a1 times v1 plus a2 times v2 plus, all the way,
a n times vn. I just grabbed all
the a terms. We get that plus all the b
terms. All the b terms I'll do in this color. All the b terms are like that. So plus b1 times v1 plus b2
times v2, all the way to plus bn times vn. That's that guy right there. Is equivalent to this statement
up here; I just regrouped everything, which is
of course, equivalent to that statement over there. But what's this equal to? This is equal to my vector--
these columns are remember, the column for the
matrix capital A. So this is equal to the matrix
capital A times a1, a2, all the way down to a n, which
was our vector a. And what's this equal to? This is equal to plus
these v1's. These are the columns for the A,
so it's equal to the matrix A times my vector b. b1, b2, all the way
down to bn. This is my vector b. We just showed you that if I add
my two vectors, a and b, and then multiply it by the
matrix, it's completely equivalent to multiplying each
of the vectors times the matrix first and then
adding them up. So we've satisfied-- and this
is for an m by n matrix. So we've now satisfied this
first condition right there. And then what about the
second condition? And this one's even more
straightforward to understand. c times a1, so let me
write it this way. The vector a times-- sorry. The matrix capital A times the
vector lowercase a-- let me do it this way because
I want-- times the vector c lowercase a. So I'm multiplying my vector
times the scalar first. Is equal to-- I can write
my big matrix A. I've already labeled
its columns. It's v1, v2, all
the way to vn. That's my matrix A. And then, what does
ca look like? ca, you just multiply its
scalar times each of the terms of a. So it's ca1, ca2, all the
way down to c a n. And what does this equal? We know this, we've seen this
shown multiple times before right there. So it just equals-- I'll write
a little bit lower. That equals c a1 times this
column vector, times v1. Plus c a2 times v2 times this
guy, all the way to plus c a n times vn. And if you just factor this
c out, once again, scalar multiplication times vectors
exhibits the distributive property. I believe I've done a video
on that, but it's very easy to prove. So this will be equal to c
times-- I'll just stay in one color right now-- a1 v1
plus a2 v2 plus all the way to a n vn. And what is this
thing equal to? Well that's just our matrix A
times our vector-- or our matrix uppercase A. Maybe I'm overloading
the letter A. My matrix uppercase A times
my vector lowercase a. Where the lowercase a is just
this thing right here, a1, a2 and so forth. This thing up here was the
same thing as that. So I just showed you that if I
take my matrix and multiply it times some vector that was
multiplied by a scalar first, that's equivalent to first
multiplying the matrix times a vector and then multiplying
by the scalar. So we've shown you that matrix
times vector products or matrix vector products satisfy
this condition of linear transformations
and this condition.
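A numerical spot check of the two conditions just shown (a sketch in Python with numpy; the matrix, vectors, and scalar are arbitrary example values): both linearity conditions hold for a matrix-vector product.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))   # some m by n matrix
a = rng.standard_normal(2)
b = rng.standard_normal(2)
c = 2.5                           # some scalar

print(np.allclose(A @ (a + b), A @ a + A @ b))   # additivity: True
print(np.allclose(A @ (c * a), c * (A @ a)))     # scaling: True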
So the big takeaway right here is matrix multiplication. And this is an important
takeaway. Matrix multiplication or matrix
products with vectors is always a linear
transformation. And this is a bit
of a side note. In the next video I'm going to
show you that any linear transformation-- this is
incredibly powerful-- can be represented by a matrix
product or by-- any transformation on any vector can
be equivalently, I guess, written as a product of that
vector with a matrix. This has huge repercussions and you
know, just as a side note, kind of tying this back
to your everyday life. You have your Xbox, your Sony
Playstation and you know, you have these 3D graphic programs
where you're running around and shooting at things. And the way that the software
renders those programs where you can see things from every
different angle, you have a cube then if you kind of move
this way a little bit, the cube will look more like this
and it gets rotated, and you move up and down, these are all transformations of matrices. And we'll do this
in more detail. These are all transformations of
vectors or the positions of vectors and I'll do that
in a lot more detail. And all of that is really just
matrix multiplication. So all of these things that
you're doing in your fancy 3D games on your Xbox or your
Playstation, they're all just matrix multiplications. And I'm going to prove that
to you in the next video. And so when you have these
graphics cards or these graphics engines, all they are--
you know, we're jumping away from the theoretical. But all these graphics
processors are, are hard wired matrix multipliers. If I have just a generalized,
some type of CPU, I have to write in software how to
multiply matrices. But if I'm making an Xbox or
something and 99% of what I'm doing is just rotating these
abstract objects and displaying them in transformed
ways, I should have a dedicated piece of hardware, a
chip, that all it does-- it's hard wired into it-- is
multiplying matrices. And that's what those graphics
processors or graphics engines really are.