
- [Voiceover] So, in the last video, I introduced the vector form of the multivariable chain rule, and just to remind ourselves, I'm saying you have some kind of function f, and in this case I said it comes from a 100-dimensional space. Well, I can't imagine a 100-dimensional space, but in principle you're just thinking of some space with 100 dimensions; it could be two, if you want to think more concretely. And it's a scalar-valued function, so it just outputs to a number line, some number line that I'll think of as holding the outputs of f.

What we're gonna do is compose it with a vector-valued function, some function that takes in a single number t and then outputs into that super-high-dimensional space. So you go from the single variable t to some very high-dimensional space that we think of as full of vectors, and then from there over to a single number. The way you'd write that out is f composed with v, so f composed with v of t, and what we're interested in doing is taking its derivative.

The derivative of that composition, as we walked through where it comes from, is the gradient of f evaluated at v of t, evaluated at your original output, dot-producted with the derivative of v, the vectorized derivative. And what that means for v is that you're just taking the derivative of every component: dx1/dt, dx2/dt, on and on until dx100/dt. So this was the vectorized form of the multivariable chain rule, and what I wanna do here is show how this looks a lot like a directional derivative.
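In symbols, the rule just recapped (with v mapping t into the 100-dimensional space and f mapping that space to a number) reads:

```latex
\frac{d}{dt} f(\vec{v}(t))
  = \nabla f(\vec{v}(t)) \cdot \vec{v}\,'(t),
\qquad
\vec{v}\,'(t) =
\begin{bmatrix}
  dx_1/dt \\ dx_2/dt \\ \vdots \\ dx_{100}/dt
\end{bmatrix}
```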
If you haven't watched the video on the directional derivative, maybe go back, take a look, and kind of remind yourself. But in principle, the idea is this: if you're in the input space of f and you nudge yourself along some kind of vector (and since I'm already using v, I'll instead say some kind of vector w; not a function, just a vector), you're wondering how much that results in a change to the output of f. That's answered by the directional derivative, and you'd write it as the directional derivative in the direction of w of f, at some input point p, and p is a vector in this case, like a 100-dimensional vector.

The way you evaluate it is you take the gradient of f (this is why we use the nabla notation in the first place; it's indicative of how we compute it), evaluated at that same input point, the same input vector p. So here, just to be clear: whatever vector takes you to your input point, that's p, and the nudge away from that input point is w. Then you take the dot product between that gradient and the vector itself, the vector that represents your nudge direction.
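Written with the same names, the directional derivative just described is:

```latex
\nabla_{\vec{w}} f(\vec{p}) = \nabla f(\vec{p}) \cdot \vec{w}
```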
But that looks a lot like the multivariable chain rule up here, except instead of w, you're taking the derivative, the vector-valued derivative of v. So this whole thing, you could say, is the directional derivative in the direction of the derivative of v, and that's kind of confusing: the directional derivative, in the direction of a derivative, of f. And at what point are you taking this directional derivative? Well, it's wherever the output of v is. So this is very compact; it's saying quite a bit here.

But here's a way that you could be thinking about this (so I'm gonna kind of erase here). V of t has you zooming all about: as you shift t, it kind of moves you through this space in some way, and each one of these output points here represents the vector v of t at some point. The derivative of that, what does this derivative represent? That's the tangent vector to that motion. You're zipping about through that space, and the tangent vector to your motion is how we interpret v prime of t, the derivative of v with respect to t.

I mean, why should that make sense? Why should the directional derivative in the direction of v prime of t, this change to the intermediary function v, have anything to do with the multivariable chain rule? Well, remember what we're asking when we take d/dt of this composition: we take a tiny nudge to t, a tiny change in the value t, and we're wondering what change that results in after the composition. Well, at a given point, that tiny nudge in t causes a change in the direction of v prime of t. That's kind of the whole meaning of this vector-valued derivative: you change t by a little bit.
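That chain of nudges can be written out as follows, where each arrow is a "this little change causes that little change" step:

```latex
t \to t + dt
\quad\Longrightarrow\quad
\vec{v}(t) \to \vec{v}(t) + \vec{v}\,'(t)\,dt
\quad\Longrightarrow\quad
f \to f + \nabla f(\vec{v}(t)) \cdot \vec{v}\,'(t)\,dt
```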
And that's gonna tell you how you move in the output space. But then you say, "Okay, so I've moved a little bit in this intermediary 100-dimensional space. How does that influence the output of f, based on the behavior of just the multivariable function f?" Well, that's exactly what the directional derivative is asking. It says you take a nudge in the direction of some vector; in this case I wrote v prime of t over here, but more generally you could say any vector w, and you take a nudge in that direction.

And more importantly, the size of v prime of t matters here. If you're moving really quickly, you would expect that change to be larger, so the fact that a larger v prime of t gives a larger answer is helpful. The directional derivative is telling you the size of the change in f as a ratio against the proportion of that direction vector that you went along. Right? Another notation for the directional derivative is to say partial f, and then partial whatever that vector is, basically saying: you take the size of the nudge along that vector as a proportion of the vector itself, then you consider the change to the output, and you're taking the ratio. So I think this is a very nice way to think about it.
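That ratio description matches the limit definition of the directional derivative in the non-normalized convention used here: the nudge is h times w, a proportion h of the vector w itself, and nothing is divided by the length of w.

```latex
\nabla_{\vec{w}} f(\vec{p})
= \lim_{h \to 0} \frac{f(\vec{p} + h\vec{w}) - f(\vec{p})}{h}
```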
It gives a beautiful understanding of the multivariable chain rule, because it gives this image of v of t zipping along in some way, where the direction and magnitude of your velocity as you zip along are what determine the change in the output of the function f. So hopefully that helps give a better understanding both of the directional derivative and of the multivariable chain rule. It's one of those nice little interpretations.
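As a concrete sanity check (my own toy example, not from the video), here is a sketch with a two-dimensional f standing in for the 100-dimensional one: the chain-rule product, which is the directional derivative of f along v prime of t, is compared against a finite-difference derivative of the composition f of v of t.

```python
import math

# Toy scalar-valued function f(x, y) and its gradient (computed by hand).
def f(x, y):
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    return (2 * x * y, x**2 + math.cos(y))

# Vector-valued v(t) and its componentwise derivative v'(t),
# the tangent vector to the motion through the intermediate space.
def v(t):
    return (math.cos(t), t**3)

def v_prime(t):
    return (-math.sin(t), 3 * t**2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

t = 0.7

# Chain rule: gradient of f at v(t), dotted with v'(t).
chain_rule = dot(grad_f(*v(t)), v_prime(t))

# Centered finite difference of the composition f(v(t)).
h = 1e-6
numeric = (f(*v(t + h)) - f(*v(t - h))) / (2 * h)

print(chain_rule, numeric)  # the two values agree closely
```

Swapping in any other differentiable f and v (with grad_f and v_prime updated to match) should leave the two printed values in agreement, which is exactly the identity the video is interpreting.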