## Gradient and directional derivatives

# Directional derivative, formal definition

## Video transcript

- [Voiceover] So I have written
here the formal definition for the partial derivative of a two-variable function
with respect to X, and what I wanna do is build up
to the formal definition of the directional derivative
of that same function in the direction of some
vector V, and you know, V with the little thing on top, this will be some vector in
the input space, and I have another video on the formal definition of the partial derivative if you want to check that out, and just to
really quickly go through here, I've drawn this diagram before, but it's worth drawing again, if you think of your input space,
which is the X Y plane, and you think of it
somehow mapping over to the real number line, which is where your output F lives,
and when you're taking the partial derivative at
a point A B, you're looking over here and you say,
maybe that's your point, some point A B, and you imagine nudging it slightly in the X direction, and saying, hey, how does that influence the function? So, maybe this is where
A B lands, and maybe the result is a nudge that's
a little bit negative. That would be a negative
partial derivative, and you think of the size of that nudge
as partial X, and the size of the resulting nudge in the
output space as partial F. So, the way that you read
this formal definition is you think of this
variable H, you know, you could say delta X, but H seems to be the common variable people
use, you think of it as that change in your input space, that slight nudge, and you
look at how that influences the function when you only
change the X component here, you know, you're only
changing the X component with that nudge, and you
say what's the change in F? What's that partial F? So, I'm gonna write this in
a slightly different way, using vector notation. Instead I'm gonna say,
you know, partial F, partial X, and instead of
saying the input is A B, I'm gonna say it's, you know, just A, and then make it clear that that's a vector, and this will be
a two-dimensional vector, so I'll put that little arrow on top to indicate that it's a
vector, and if we rewrite this definition, we'd be taking the limit, as H goes to zero, of something divided by H, but that thing, now that we're writing in terms of vector notation, is gonna be F of, so it's gonna be our
original starting point A, but plus what? I mean, up here, it was
clear we could just add it to the first component, but
if I'm not writing in terms of components, and I
have to think in terms of vector addition, really what I'm adding is that H times the
vector, the unit vector in the X direction, and it's common to use, you know, this little I with a hat to represent the unit
vector in the X direction. So when I'm adding these,
it's really the same. You know, this H is only gonna go to that first component, and the
second component is multiplied by zero, and what we subtract off is the value of the function at that original input, that original two-dimensional input
that I'm just thinking of as a vector here, and
when I write it like this, it's actually much clearer
how we might extend this idea to moving in
different directions. 'Cause now, all of the information about what direction you're moving is captured with this vector here, what you multiply your nudge
by as you're adding it to the input. So let's just rewrite that over here in the context of the directional derivative. What you would say is that
the directional derivative in the direction of
some vector, any vector, of F, evaluated at a point,
and we'll think about that input point as
being a vector itself, A. Here, I'll get rid of this guy. It's also gonna be a limit, and as usual with derivatives, you think of some variable
as going to zero, and then that's gonna be in the
denominator, and the change in the function that we're looking for is gonna be F, evaluated at
that initial input vector plus H, that scaling value,
that little nudge of a value, multiplied by the vector whose direction we care about, and then you subtract off the value of F at that original input. So, this right here is
the formal definition for the directional
derivative, and you see how it's much easier to
write in vector notation, because you're thinking of your input as a vector and your output as
just some nudge by something. So, let's take a look at what that would feel like over here. You know, instead of
thinking of D X and a nudge purely in the X direction,
and I'll erase these guys, you would think of this point as being A, as being a vector A, so just to make clear how it's a vector, you'd be thinking of it starting at the origin, and the tip represents that point, and then H times V, you know, maybe V is some vector in a direction that's neither purely X nor purely Y, but when you scale it down, it'll just
be a tiny little nudge that's gonna be H, that tiny little value, scaling your vector V, so that
tiny little nudge, and what you wonder is, hey,
what's the resulting nudge to the output? And the ratio between the size of that resulting nudge to the output
and H, the original guy there, is roughly your directional derivative,
and more precisely, as you take the limit
for that original nudge getting really really
small, that's gonna be your directional derivative,
and you can probably anticipate there's a way to interpret
this as the slope of a graph. That's what I'm gonna
talk about next video, but you actually have to
be a little bit careful, because we call this the
directional derivative, but notice, if you scale
the vector V by two, you know, if you go over here
and you start plugging in two times V and seeing how
that influences things, it'll be twice the change, because here, even if you're scaling
by the same value H, it's gonna double the initial
nudge that you had, and it's gonna double the resulting nudge out here, even though the denominator
H stays unchanged. So when you're taking the
ratio, the size of your initial nudge, and so the value you get, is actually influenced by the length of V. So, some authors, they'll actually change this definition, and
they'll throw the magnitude of the original
vector into the denominator, just to make sure that when you scale it by
something else, it doesn't influence things, and you
only care about the direction. But, I actually don't like that. I think there's some
usefulness in the definition as it is right here,
and that there's kind of a good interpretation to be had for why, if you double
the size of your vector, that should double the
size of your derivative, but I'll get to that in following videos. This right here is the formal definition to be thinking about, and
I'll see you next video.
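
For reference, the definitions built up in the transcript can be written out roughly as follows. This is a sketch reconstructed from the spoken description, not a copy of what is on screen: the symbols $\vec{a}$, $\hat{\imath}$, $\vec{v}$ and $h$ follow the narration, while the $\nabla_{\vec{v}} f$ notation for the directional derivative and the substitution variable $k$ are assumptions introduced here.

```latex
\begin{align*}
% Partial derivative with respect to x, component form
\frac{\partial f}{\partial x}(a, b)
  &= \lim_{h \to 0} \frac{f(a + h,\, b) - f(a, b)}{h} \\
% The same partial derivative, rewritten with vector notation
\frac{\partial f}{\partial x}(\vec{a})
  &= \lim_{h \to 0} \frac{f(\vec{a} + h\,\hat{\imath}) - f(\vec{a})}{h} \\
% Directional derivative along an arbitrary vector v
\nabla_{\vec{v}} f(\vec{a})
  &= \lim_{h \to 0} \frac{f(\vec{a} + h\,\vec{v}) - f(\vec{a})}{h} \\
% Scaling v by two doubles the result (substitute k = 2h)
\nabla_{2\vec{v}} f(\vec{a})
  &= \lim_{h \to 0} \frac{f(\vec{a} + 2h\,\vec{v}) - f(\vec{a})}{h}
   = 2 \lim_{k \to 0} \frac{f(\vec{a} + k\,\vec{v}) - f(\vec{a})}{k}
   = 2\,\nabla_{\vec{v}} f(\vec{a}) \\
% Variant some authors use, which depends only on the direction of v
\text{(normalized)} \quad
  &\lim_{h \to 0} \frac{f(\vec{a} + h\,\vec{v}) - f(\vec{a})}{h\,\lVert \vec{v} \rVert}
\end{align*}
```

The fourth line is the scaling behavior the transcript warns about, and the last line is the alternative definition that divides it out.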
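
To get a feel for the limit definition numerically, here is a minimal sketch that replaces the limit with a small but finite step. The function name `directional_derivative`, the example function `f(x, y) = x^2 * y`, the point `(1, 2)`, and the step size `h = 1e-6` are all illustrative assumptions and do not come from the video.

```python
from typing import Callable, Sequence


def directional_derivative(
    f: Callable[[Sequence[float]], float],
    a: Sequence[float],
    v: Sequence[float],
    h: float = 1e-6,
) -> float:
    """Approximate (f(a + h*v) - f(a)) / h with a small finite h.

    This is only a forward-difference estimate of the limit, so the
    result carries an error on the order of h.
    """
    # Nudge the input point a by h times the vector v, component by component.
    nudged = [a_i + h * v_i for a_i, v_i in zip(a, v)]
    return (f(nudged) - f(a)) / h


def f(p: Sequence[float]) -> float:
    # Hypothetical example function: f(x, y) = x^2 * y
    x, y = p
    return x * x * y


a = [1.0, 2.0]

# With v equal to the unit vector in the x direction, this is the partial derivative in x.
partial_x = directional_derivative(f, a, [1.0, 0.0])

# A direction that is neither purely x nor purely y.
d_v = directional_derivative(f, a, [1.0, 1.0])

# Doubling v roughly doubles the result, since the denominator h is unchanged.
d_2v = directional_derivative(f, a, [2.0, 2.0])

print(partial_x, d_v, d_2v)
```

For this particular `f`, the exact values at `(1, 2)` are 4, 5 and 10, so the printed numbers should land close to those, with the third about twice the second.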