## Gradient and directional derivatives

# Directional derivatives and slope

## Video transcript

- [Voiceover] Hello everyone,
what I wanna talk about here is how to interpret the
directional derivative in terms of graphs. I have here the graph of a function, a multivariable function, f of x, y is equal
to x squared times y. In the last couple videos I talked about what the directional derivative is, how you can formally define it, how you can compute it using the gradient. Generally the setup
that you might have is, you have some kind of
vector, and this is a vector in the input space so in this case it's gonna be in the xy plane. In this case I'll just
take the vector 1, 1. Okay? And the directional
derivative, which we denote by kind of taking
the gradient symbol, except you stick the name of
that vector down in the lower part there, the directional
derivative of your function, it'll still take the same input. This is kind of a measure
of how the function changes when the input moves in that direction. So I'll show you what I mean, I mean you could imagine slicing this
graph by some kind of plane but that plane doesn't
necessarily have to be parallel to the x or y axes. That's what we did for the
partial derivative, we took a plane that represented
the constant x value or the constant y value, but
this is gonna be a plane that kind of tells you what
movement in the direction of your vector looks like, and
like I have a number of other times I'm gonna go ahead and
slice the graph along that plane, and just to make it
clear, I'm gonna color in where the graph intersects that slice. This vector here, this little
v, you'll be thinking of it as living on the xy plane and
it's determining the direction of this plane that we're
slicing things with. On the xy plane you've got this vector, it's 1, 1, it kind of points
to that diagonal direction, and then you take the whole
plane and you slice your graph. And if we want to interpret the
directional derivative here, I'm gonna go ahead and
fill in an actual value, so let's say we wanted to do it at -1, -1, 'cause I guess I chose a
plane that passes through the origin, so I've got to
make sure that the point I'm evaluating actually
goes along this plane, but you could imagine one that
points in the same direction, but you kind of slide it back and forth. If we're doing this, we can
interpret this as a slope, but you have to be very careful,
if you're gonna interpret this as a slope, it has to be
the case that you're dealing with a unit vector, that
the magnitude of your vector is equal to 1. I mean, it doesn't have to
be, you can kind of account for it later, but it makes
it easier to think about if we're just thinking of a unit vector. When I go over here, instead
of saying that it's 1, 1, I'm gonna say it's whatever
vector points in the same direction but has a unit
length, and in this case that happens to be square root of 2 over 2, for each of the components. You can kind of think about
why that would be true from the Pythagorean theorem, but this is a
vector with unit length, and its magnitude is 1, and
it points in that direction. If we're evaluating this
at a point like -1, -1, we can draw that on the graph,
see where it actually is, and in this case it'll be,
oops, moving things about when I add a point. It'll be this point, and if
you kinda look from above, you see that's -1, -1. And if we want the slope at that point, you're kinda thinking
of the tangent line here. Tangent line to that curve,
and we're wondering what its slope is, so, the reason that
the directional derivative is gonna give us this slope,
is because, another notation that might be kinda helpful
for what this directional derivative is, some people
will write partial f over partial v. You can think about
that as taking a slight nudge in the direction of
v, so this would be a little nudge, a little partial
nudge in the direction of v. And then you're saying
"what change in the value of the function results from that?" The height of the graph
tells you the value of the function. As this initial change
approaches zero and the resulting change approaches zero as
well, that ratio, the ratio of the partial f to partial
v, is going to give the slope of this tangent line. Conceptually, that's kind
of a nicer notation, but the reason we use this other
notation, the nabla sub v one, is that it's very indicative
of how you compute things once you need it computed. You take the gradient of f,
just the vector-valued function gradient of f, and take the
dot product with the vector. Let's actually do that,
just to see what this would look like, and I'll go ahead
and write it over here, use a different color. The gradient of f, first of
all, is a vector full of partial derivatives, it'll be the
partial derivative of f with respect to x and the partial
derivative of f with respect to y. When we actually evaluate
this, we take a look, partial derivative of f with respect
to x, x looks like the variable, y is just a constant, so its
partial derivative is 2 times x times y. But when we take the partial with respect to y, y now looks
like a variable, and x looks like a constant, derivative of
a constant times a variable is just that constant, x squared. And if we were to evaluate
this at the point -1, -1, then you can plug that in, 2
times -1 times -1 would be 2, and then negative 1 squared, would be 1. So that would be our gradient
at that point, which means if we want to evaluate gradient
of f dot v, we could go over here and say that's 2, 1, 'cause we evaluate the gradient
at the point we care about. And then the dot product
with v itself, in this case root 2 over 2 and root 2 over 2. The answer that we get, we
multiply the first components together, 2 times root 2
over 2, that's the square root of 2, and then here we multiply the
second components together, and that's gonna be 1 times
root 2 over 2, so root 2 plus root 2 over 2, and that would be our answer,
that would be our slope. But this only works if your
vector is a unit vector, and I showed this in the last
video where we talked about the formal definition of
the directional derivative. If you scale v by 2, and I
can do it here if instead of v you're talking about 2 v, so
I'll go ahead and make myself some room here. If you're taking the
directional derivative along 2 v of f, the way that we're
computing that, we're still taking the gradient of f, dot product
with 2 times your vector, and with a dot product, you
can pull out that 2. This is just gonna double the
value of the entire thing. Whatever this started as with v, it's
gonna be twice the value, the derivative will become
twice the value, but you don't necessarily want that because,
you see, this plane you sliced with, if instead of
doing it in the direction of v, the unit vector, you did it in
the direction of 2 times v, it's the same plane, it's
the same slice you're taking, and you'd want that same
slope, so that's gonna mess everything up. This is super important if
you're thinking about things in the context of slope, one
thing that you could say is your formula for the slope of
a graph in the direction of v, is you take your directional
derivative, that dot product between the gradient of f and v, and you just
always make sure to divide it by the magnitude of v,
divide it by that magnitude. That will always take care
of what you want, that's basically a way of making sure
that really, you're taking the directional derivative
in the direction of a certain unit vector. Some people even go so far
as to define the directional derivative to be this, to be
something where you normalize out the length of that vector. I don't really like that, but
I think that's because they're thinking of the slope context,
they're thinking of rates of change as being the slope of a graph. One thing I'd like to
emphasize as always, graphical intuition is good, and visual
intuition is always great, you should always be trying
to find a way to think about things visually, but with
multivariable functions, the graph isn't the only way. You can kind of more generally
think about just a nudge in the v direction, and in
the context where v doesn't have a length 1, the nudge
doesn't represent an actual size but it's a certain scaling
constant times that vector, you can look at the video on
the formal definition for the directional derivative, if
you want more details on that. But I do think this is actually
a good way to get a feel for what the directional
derivative is all about.
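
If you want to check the numbers from this video yourself, here is a minimal Python sketch of the same computation. It is an illustration only, not anything shown in the video: the names `grad_f`, `directional_derivative`, and `slope_in_direction` are made up for this example, and the gradient is typed in by hand from the partial derivatives worked out above. It evaluates the gradient of x squared times y at -1, -1, dots it with the unit vector, shows that using 2 v doubles the result, and shows that dividing by the magnitude of the vector gives the same slope back regardless of the vector's length.

```python
import math

def f(x, y):
    """The function from the video: f(x, y) = x^2 * y."""
    return x**2 * y

def grad_f(x, y):
    """Gradient of f, entered by hand: (df/dx, df/dy) = (2xy, x^2)."""
    return (2 * x * y, x**2)

def directional_derivative(point, v):
    """Dot product of the gradient at `point` with the vector v (hypothetical helper)."""
    gx, gy = grad_f(*point)
    return gx * v[0] + gy * v[1]

def slope_in_direction(point, v):
    """Directional derivative divided by |v|, so only the direction of v matters."""
    return directional_derivative(point, v) / math.hypot(v[0], v[1])

point = (-1.0, -1.0)
v = (1.0, 1.0)                                  # the original, non-unit vector
unit_v = (math.sqrt(2) / 2, math.sqrt(2) / 2)   # same direction, length 1
two_v = (2 * unit_v[0], 2 * unit_v[1])          # same direction, length 2

gradient_at_point = grad_f(*point)                  # (2.0, 1.0)
slope = directional_derivative(point, unit_v)       # sqrt(2) + sqrt(2)/2, about 2.121
doubled = directional_derivative(point, two_v)      # twice the slope

# Numerical sanity check: take a small nudge of size h along the unit vector
# and compare the change in f to the size of the nudge.
h = 1e-6
numeric_slope = (f(point[0] + h * unit_v[0], point[1] + h * unit_v[1]) - f(*point)) / h

print("gradient at the point:", gradient_at_point)
print("directional derivative along unit v:", slope)
print("directional derivative along 2v:", doubled)
print("normalized slope using (1, 1):", slope_in_direction(point, v))
print("normalized slope using 2v:", slope_in_direction(point, two_v))
print("finite-difference estimate:", numeric_slope)
```

The last three printed values should agree closely with the slope along the unit vector, which is the point of dividing by the magnitude: the slice of the graph is the same, so the slope should be too.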