Multivariable chain rule and directional derivatives

Name: Multivariable chain rule and directional derivatives
Uploaded: 2016-05-21T00:29:52Z
Description: See how the multivariable chain rule can be expressed in terms of the directional derivative.

Created by Grant Sanderson.

Jersey Tsai
Posted 8 years ago.
I don't think the interpretation was well organized enough. I got very confused.
(53 votes)
- Still No Sheep
  Posted 7 years ago.
  I agree. The many rapid changes between forms of notation without worked-through examples led to a falling-off in the explanatory value of the video. I understand what Grant is getting at, but this video did not add to my understanding. It should concentrate either on explaining how the multivariable chain rule spits out the directional derivative or on showing how the rule can be expressed using different forms of notation, but not on both as this causes understanding of the relationship between the multivariable chain rule and the directional derivative to be lost.
  (10 votes)
Dave Owen
Posted 8 years ago.
This quickly became very hard to follow. The incredibly compact expression is quite complex, and confusing.
(27 votes)
- James Wu
  Posted 7 years ago.
  I also have difficulty in getting the intuition, but I think what Grant is getting at is this:
  
  Think of the derivative of f with respect to t as the rate of change of f as you change t by a little (in fact infinitesimal) bit. The chain rule says that this is equal to the directional derivative of f along v'(t), evaluated at v(t). The directional derivative itself is how much f changes as we go along in the v'(t) direction (the direction of motion). We evaluate this at v(t) because that is the point at which the change is occurring.
  
  Hope this helps!
  (18 votes)
Eugen Engel
Posted 8 years ago.
I think it is great that sometimes things are taught in general terms, but most of the time you should be doing examples with real problems. Then everyone will get a great understanding. Obviously, there is a lot of confusion from other people here. Especially in more advanced math. Where are all my example problems and all the challenges that we had in all the lessons before?
(10 votes)
clive.r.long
Posted 2 years ago.
I think the comments and criticisms that this video is confusing, badly organised and lacking examples are simply wrong in the context of all the prior videos in this series and what this video clearly set out to achieve. I suggest you look back at all the previous videos in the series where the 2-dimensional cases are clearly and painstakingly laid out and worked through. This video was clearly indicated as the step to generalise the previous examples and results, showing how the vector notation is a compact way to present results for functions of variables of any dimension.
(8 votes)
st02mcma
Posted 8 years ago.
Shouldn't the W vector in the directional derivative equation around 4:00 be a unit vector?
(5 votes)
- codenstarz
  Posted 7 years ago.
  Yes, the w vector should be normalized. The problem, I think, is that Grant has never applied this or created a simulation of the directional derivative, and that is why he forgets to normalize w.
  (3 votes)
Evgenii Neumerzhitckii
Posted 7 years ago.
I watched all preceding multivariable calculus videos, and they were easy to understand. However, this one is a bit too abstract. Maybe it would help to provide some example problems.
(6 votes)
aakash.raj1999
Posted 6 years ago.
Why are we not dividing by the modulus of w, as was done for the directional derivative?
(1 vote)
- Malith Lakshan
  Posted 6 years ago.
  Simply because we are not taking the slope of f with respect to a nudge in the direction of w in an xy plane, but the derivative of f with respect to t which causes a change in w. So the magnitude of w matters. Loosely speaking, if a change in t causes a large change in w, the derivative of f would be larger (simply because a larger change in w should cause a larger change in f) and if the change in w is small the change in f is small.
  (7 votes)
J
Posted 5 years ago.
What about when v is a function of two functions of two variables? How would we handle using the chain rule for v(u,v) = (x(u,v),y(u,v)) when x and y are both functions of (u,v)?
(2 votes)
siahleeka
Posted 5 months ago.
I think it's a little annoying that the first chain rule is introduced in a different form than the practice problems (which use vector form almost exclusively). It makes the whole thing confusing up until this video and the previous one.
(2 votes)
Seb Sae
Posted 8 years ago.
Instead of clarifying what I saw in class, this video confused me more.
(1 vote)
- Potugadu
  Posted 6 years ago.
  I actually like these videos more than Sal's because these are at a higher level, which may not be a good thing in general because these videos also have to serve high school students.
  (2 votes)

Video transcript

- [Voiceover] So, in the last video, I introduced the vector form of the multivariable chain rule, and just to remind ourselves, I'm saying you have some kind of function f, and in this case I said it comes from a 100-dimensional space, so you might imagine-- well, I can't imagine a 100-dimensional space, but in principle, you're just thinking of some area that's 100 dimensions; it can be two if you want to think more concretely in two dimensions. And it's a scalar-valued function, so it just outputs to a number line, some kind of number line that I'll think of as f, as its output. And what we're gonna do is compose it with a vector-valued function, so some function v that takes in a single number t and then outputs into that super high-dimensional space. So you're thinking, you go from the single variable t to some very high-dimensional space that we think of as full of vectors, and then you take from that over to a single variable, over to a number.
And you know, the way you'd write that out is you'd say f composed with the output of v, so f composed with v of t, and what we're interested in doing is taking its derivative. So the derivative of that composition is, and I told you and we kind of walked through where this comes from, the gradient of f, evaluated at v of t, evaluated at your original output, dot product with the derivative of v, the vectorized derivative. And what that means, you know, for v, is you're just taking the derivative of every component. So when you take this and you take the derivative with respect to t, all that means is that for each component, you're taking the derivative of it: d x1 dt, d x2 dt, on and on until d x100 dt, the one-hundredth component. So this was the vectorized form of the multivariable chain rule. And what I wanna do here is show how this looks a lot like a directional derivative.
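For reference, the rule just described can be written out as a formula. The component names x_1 through x_100 follow the spoken "x1, x2, ..." above; the arrow and bracket notation is an assumed convention, since the transcript does not show what was written on screen.

```latex
\frac{d}{dt}\, f\bigl(\vec{v}(t)\bigr)
  = \nabla f\bigl(\vec{v}(t)\bigr) \cdot \vec{v}\,'(t),
\qquad
\vec{v}\,'(t) =
\begin{bmatrix}
  dx_1/dt \\ dx_2/dt \\ \vdots \\ dx_{100}/dt
\end{bmatrix}
```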
And if you haven't watched the video on the directional derivative, maybe go back, take a look, kind of remind yourself, but in principle, you say, if you're in the input space of f, and you nudge yourself along some kind of vector v, and maybe just because I'm using v there, I'll instead say some kind of vector w. So not a function, just a vector. And you're wondering, hey, how much does that result in a change to the output of f? That's answered by the directional derivative, and you'd write directional derivative in the direction of w of f, the directional derivative of f, and I should say at some point, some input point, p for that input point, and it's a vector in this case, like a 100-dimensional vector.
And the way you evaluate it is you take the gradient of f, this is why we use the nabla notation in the first place, it's indicative of how we compute it, the gradient of f evaluated at that same input point, the same input vector p. So here, just to be clear, you'd be thinking of whatever vector points to your input point, that's p. But then the nudge, the nudge away from that input point, is w. And you take the dot product between that gradient and the vector itself, the vector that represents your nudge direction.
But that looks a lot like the multivariable chain rule up here, except instead of w, you're taking the derivative, the vector-valued derivative of v. So this whole thing you could say is the directional derivative in the direction of the derivative of v, and that's kind of confusing. Directional derivative in the direction of a derivative, of f, and at what point are you taking this directional derivative? Well, it's wherever the output of v is. So this is very compact, it's saying quite a bit here. But a way that you could be thinking about this is v of t, so I'm gonna kind of erase here. V of t has you zooming all about, and as you shift t, it kind of moves you through this space in some way.
And each one of these output points here represents the vector v of t at some point. The derivative of that, what does this derivative represent? That's the tangent vector to that motion, you know, so you're zipping about through that space, and the tangent vector to your motion, that's how we interpret v prime of t, the derivative of v with respect to t. I mean, why should that make sense? Why should the directional derivative in the direction of v prime of t, this change to the intermediary function v, have anything to do with the multivariable chain rule?
Well, remember what we're asking when we say d dt of this composition: we're saying we take a tiny nudge to t, so that tiny change here, in the value t, and we're wondering what change does that result in after the composition? Well, at a given point, that tiny nudge in t causes a change in the direction of v prime of t. That's kind of the whole meaning of this vector-valued derivative. You change t by a little bit, and that's gonna tell you how you move in the output space. But then you say, "Okay, so I've moved a little bit in this intermediary 100-dimensional space, how does that influence the output of f based on the behavior of just the multivariable function f?" Well, that's what the directional derivative is asking. It says you take a nudge in the direction of some vector, in this case, I wrote v prime of t over here. More generally, you could say any vector w, you take a nudge in that direction.
And more importantly, you know, the size of v prime of t matters here. If you're moving really quickly, you would expect that change to be larger, so the fact that v prime of t would be larger is helpful. And the directional derivative is telling you the size of the change in f as a ratio of the proportion of that directional vector that you went along. Right? You could-- another notation for the directional derivative is to say partial f, and then partial whatever that vector is.
Basically saying you take the size of that nudge along that vector as a proportion of the vector itself, and then you consider the change to the output and you're taking the ratio. So I think this is a very beautiful way of understanding the multivariable chain rule. 'Cause it gives this image of, you know, you're thinking of v of t, and you're thinking of zipping along in some way, and the direction and value of your velocity as you zip along is what determines the change in the output of the function f. So hopefully, that helps give a better understanding both of the directional derivative and of the multivariable chain rule. It's one of those nice little interpretations.
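As a rough numerical sanity check of this interpretation, here is a minimal Python sketch. The function f(x, y) = x^2 y and the path v(t) = (cos t, sin t) are hypothetical choices made only for illustration; neither appears in the video. The sketch compares the gradient-dot-velocity expression with a finite-difference estimate of the derivative of f(v(t)), and then traverses the same path twice as fast to show that the size of v'(t) matters, which is why the vector is not normalized here.

```python
import math

# Hypothetical scalar-valued function, chosen only for illustration.
def f(x, y):
    return x * x * y

# Its gradient, computed by hand: (df/dx, df/dy).
def grad_f(x, y):
    return (2 * x * y, x * x)

# Hypothetical path through the input space of f.
def v(t):
    return (math.cos(t), math.sin(t))

# Component-wise derivative of v: the velocity (tangent) vector.
def v_prime(t):
    return (-math.sin(t), math.cos(t))

def directional_derivative(point, w):
    # Unnormalized convention: gradient at the point, dotted with w.
    return sum(g * wi for g, wi in zip(grad_f(*point), w))

def central_difference(g, t, h=1e-6):
    # Numerical estimate of dg/dt for a single-variable function g.
    return (g(t + h) - g(t - h)) / (2 * h)

t0 = 0.7

# Chain rule: directional derivative of f along v'(t0), evaluated at v(t0).
analytic = directional_derivative(v(t0), v_prime(t0))

# Direct estimate of d/dt f(v(t)) at t0.
numeric = central_difference(lambda t: f(*v(t)), t0)

# Same path at double speed: u(t) = v(2t) passes through v(t0) at t = t0 / 2.
fast_numeric = central_difference(lambda t: f(*v(2 * t)), t0 / 2)

print("gradient dot velocity:     ", analytic)
print("finite difference:         ", numeric)
print("double-speed finite diff.: ", fast_numeric)
```

Under these assumptions, the first two printed values should agree up to finite-difference error, and the third should be about twice the first: the point and the direction of motion are unchanged, but the velocity vector is twice as long, so the output of f changes twice as fast.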