Directional derivative, formal definition

Learn the limit definition of a directional derivative. This helps to clarify what it is really doing.  Created by Grant Sanderson.

Want to join the conversation?

  • Pedro Vielman
    Isn't the vector "v" supposed to be a unit vector? Because if you were taking a scalar multiple of the vector v, and then computing the directional derivative, then the value of the directional derivative would change. I'm aware Grant mentions that when you double the size of the vector "v", that should double the size of your derivative, but is that really always the case? It seems to me that it would double the value only if the function was linear.
    (32 votes)
    • Grant
      When you wish to interpret the directional derivative as a certain slope, namely the slope you get by intersecting the graph with a plane pointing in the direction of your vector, v does indeed have to be a unit vector.

      However, the directional derivative has meaning beyond the notion of slope, and often you actually do want to account for the length of your vector. For example, check out the multivariable chain rule videos.
      (41 votes)
  • john.doe.13896
    How do you get from the formal definition of the directional derivative to the formula of the previous video, the dot product of the gradient with the directional vector?
    (10 votes)
  • Taras.Pokalchuk
    Can somebody explain how adding the two derivatives with coefficients of a vector results in the derivative along that vector?
    (8 votes)
    • Joe Kern
      I don't know if this helps, but the way I thought about it was that in single variable calculus you still are sort of multiplying your h in f(a+h) by a vector. It's just that that vector happens to be (1, 0) so the y direction isn't actually taken into account. Does that make sense? So when you do the same thing in multivariable calculus, you aren't multiplying by a single dimension unit vector, instead you're multiplying by a multi dimensional vector that may not have similar values of change in the x and y position. That's the way I think about it anyway, hope that helps.
      (5 votes)
  • rohanparikh00
    Near the end of the video, he discusses that changing the value with which you scale the vector changes the derivative; for example, doubling the vector causes the derivative to double. However, since you are taking the value of the vector as h approaches 0, and the vector "v" is multiplied by h, why does the change in magnitude but no change in direction change the derivative?
    (5 votes)
    • Kerwin Yi
      I think it is because you should always remember the denominator part is the tiny movement of the input value, which means it should always be h times the scalar of the vector. If you use a unit vector, then the bottom part is h*|v| = h*1 = h; however, if you use a non-unit vector, say twice the length of the unit vector, then the bottom part should be h*|2|. I think it would be easier to understand if Grant just gives the definition with h|v| at the bottom since h itself is not the tiny movement on the vector direction.
      (2 votes)
  • Matthew4.tch
    How should we think about the connection between the dot product definition of the directional derivative and the limit definition presented here?
    (2 votes)
    • Venkata
      Nice question! I like to think of it like this: A dot product essentially tells you how much of a vector is in the direction of another. So, the directional derivative tells you how much the gradient is in the direction of our desired unit vector.

      Now, look at the formal definition. We have the term f(x + hv). This is basically the change in the value of the function f(x) by a small amount h in the direction of v.

      So, compare these ideas now. The dot product is telling us how much the gradient is changing in the direction of v. Similarly, the formal definition is also telling us how much the function is changing in the direction of v. And just to put the cherry on top, the gradient is the function at hand.
      (3 votes)
  • Omar Badran
    shouldn't it be h(i-hat) in the numerator too?
    (3 votes)
  • Maanav Khaitan
    At the start of the video, why does Grant say partial derivative with respect to x? Shouldn't it instead be with respect to a?
    (1 vote)
    • uncinoO
      Maybe when he writes partial with respect to "x", he means partial with respect to the "x" component of the vector "a"; in fact, the unit vector "i" explicitly adds the change "h" to the "x" component of the vector "a". Actually, I'm confused too about the meaning of the derivative operator's syntax when it is applied to a function that takes a vector as input.
      (3 votes)
  • mednawfalmaarouf
    Isn't it the partial derivative of f with respect to a, not x?
    (2 votes)
  • Shubham Arya
    Can someone explain the use of the absolute value in the denominator?
    (1 vote)
  • Kerwin Yi
    I don't know if my understanding of why v has to be a unit vector is correct, so please correct me if I misinterpret it. I think the reason it has to be a unit vector is that the dx part (the denominator) is the tiny movement of the input, and so it should always be not just h but h times the scalar of the vector, since h exists only to shrink the size of the movement toward zero. The denominator should always be h×|v|, but for the sake of simplicity, people make the vector a unit vector so that the bottom part becomes h×1 = h.
    (2 votes)

Video transcript

- [Voiceover] So I have written here the formal definition for the partial derivative of a two-variable function with respect to X, and what I wanna do is build up to the formal definition of the directional derivative of that same function in the direction of some vector V, and you know, V with the little thing on top, this will be some vector in the input space, and I have another video on the formal definition of the partial derivative if you want to check that out, and just to really quickly go through here, I've drawn this diagram before, but it's worth drawing again, if you think of your input space, which is the X Y plane, and you think of it somehow mapping over to the real number line, which is where your output F lives, and when you're taking the partial derivative at a point A B, you're looking over here and you say, maybe that's your point, some point A B, and you imagine nudging it slightly in the X direction, and saying, hey, how does that influence the function? So, maybe this is where A B lands, and maybe the result is a nudge that's a little bit negative. That would be a negative partial derivative, and you think of the size of that nudge as partial X, and the size of the resulting nudge in the output space as partial F. So, the way that you read this formal definition is you think of this variable H, you know, people, you could say delta X, but H seems to be the common variable people use, you think of it as that change in your input space, that slight nudge, and you look at how that influences the function when you only change the X component here, you know, you're only changing the X component with that nudge, and you say what's the change in F? What's that partial F? So, I'm gonna write this in a slightly different way, using vector notation. 
Instead I'm gonna say, you know, partial F, partial X, and instead of saying the input is A B, I'm gonna say it's a, you know, just A, and then make it clear that that's a vector, and this will be a two-dimensional vector, so I'll put that little arrow on top to indicate that it's a vector, and if we rewrite this definition, we'd be thinking the limit, as H goes to zero, of something divided by H, but that thing, now that we're writing in terms of vector notation, is gonna be F of, so it's gonna be our original starting point A, but plus what? I mean, up here, it was clear we could just add it to the first component, but if I'm not writing in terms of components, and I have to think in terms of vector addition, really what I'm adding is that H times the vector, the unit vector in the X direction, and it's common to use, you know, this little I with a hat to represent the unit vector in the X direction. So when I'm adding these, it's really the same. You know, this H is only gonna go to that first component, and the second component is multiplied by zero, and what we subtract off is the value of the function at that original input, that original two-dimensional input that I'm just thinking of as a vector here, and when I write it like this, it's actually much clearer how we might extend this idea to moving in different directions. 'Cause now, all of the information about what direction you're moving is captured with this vector here, what you multiply your nudge by as you're adding the input. So let's just rewrite that over here in the context of directional derivative. What you would say is that the directional derivative in the direction of some vector, any vector, of F, evaluated at a point, and we'll think about that input point as being a vector itself, A. Here, I'll get rid of this guy. 
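Written out, the two forms of the partial-derivative definition described so far might look like the following. This is a sketch of the board work rather than a copy of it; it assumes the input point is written as the vector a = (a, b) and that î = (1, 0) is the unit vector in the x direction, as the transcript describes.

```latex
% Component form: nudge only the first input by h
\frac{\partial f}{\partial x}(a, b)
  = \lim_{h \to 0} \frac{f(a + h,\; b) - f(a, b)}{h}

% Vector form: the same nudge written as h times the unit vector \hat{\imath}
\frac{\partial f}{\partial x}(\vec{a})
  = \lim_{h \to 0} \frac{f(\vec{a} + h\,\hat{\imath}) - f(\vec{a})}{h}
```

The two lines agree because a + hî = (a + h, b): the nudge h reaches only the first component, and the second component is multiplied by zero.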
It's also gonna be a limit, and as always, with these things, we think of some, not, I mean, always, but with derivatives, you think of some variable as going to zero, and then that's gonna be on the denominator, and the change in the function that we're looking for is gonna be F, evaluated at that initial input vector plus H, that scaling value, that little nudge of a value, multiplied by the vector whose direction we care about, and then you subtract off the value of F at that original input. So, this right here is the formal definition for the directional derivative, and you see how it's much easier to write in vector notation, because you're thinking of your input as a vector and your output as just some nudge by something. So, let's take a look at what that would feel like over here. You know, instead of thinking of D X and a nudge purely in the X direction, and I'll erase these guys, you would think of this point as being A, as being a vector valued A, so just to make clear how it's a vector, you'd be thinking of it starting at the origin, and the tip represents that point, and then H times V, you know, maybe V is some vector, often, you know, a direction that's neither purely X nor purely Y, but when you scale it down, it'll just be a tiny little nudge that's gonna be H, that tiny little value, scaling your vector V, so that tiny little nudge, and what you wonder is, hey, what's the resulting nudge to the output? And the ratio between the size of that resulting nudge to the output and the original guy there is your directional derivative, and more importantly, as you take the limit for that original nudge getting really really small, that's gonna be your directional derivative, and you can probably anticipate there's a way to interpret this as the slope of a graph. 
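To get a feel for this definition numerically, here is a minimal Python sketch that evaluates the difference quotient (f(a + hv) - f(a)) / h for a small, fixed h instead of taking a true limit. The function name `directional_derivative`, the sample function f(x, y) = x²y, the point (1, 2), and the step size are all illustrative assumptions, not anything from the video.

```python
def directional_derivative(f, a, v, h=1e-6):
    """Approximate the limit definition with a small but finite h.

    f takes two numbers (x, y); a is the input point and v the direction,
    both given as (x, y) tuples. This is a finite-difference estimate,
    not an exact limit.
    """
    ax, ay = a
    vx, vy = v
    # Nudge the input by h times the vector v, then compare outputs.
    return (f(ax + h * vx, ay + h * vy) - f(ax, ay)) / h


def f(x, y):
    # Hypothetical sample function chosen only for illustration.
    return x**2 * y


a = (1.0, 2.0)

# Direction v = (1, 1): f(1 + h, 2 + h) = 2 + 5h + 4h^2 + h^3,
# so the quotient tends to 5 as h goes to 0.
approx = directional_derivative(f, a, (1.0, 1.0))

# Direction i-hat = (1, 0) recovers the partial derivative with respect to x,
# which is 2xy = 4 at this point.
partial_x = directional_derivative(f, a, (1.0, 0.0))

print(approx, partial_x)
```

Passing (1, 0) as the direction shows how the partial derivative is just the special case where the vector is the unit vector in the x direction.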
That's what I'm gonna talk about next video, but you actually have to be a little bit careful, because we call this the directional derivative, but notice, if you scale the value V by two, you know, if you go over here and you start plugging in two times V and seeing how that influences things, it'll be twice the change, because here, even if you're scaling by the same value H, it's gonna double the initial nudge that you had, and it's gonna double the resulting nudge out here, even though the denominator H doesn't change. So when you're taking the ratio, what you're considering is the size of your initial nudge actually might be influenced. So, some authors, they'll actually change this definition, and they'll throw a little absolute value of the original vector, just to make sure that when you scale it by something else, it doesn't influence things, and you only care about the direction. But, I actually don't like that. I think there's some usefulness in the definition as it is right here, and that there's kind of a good interpretation to be had, for when, if you double the size of your vector, why that should double the size of your derivative, but I'll get to that in following videos. This right here is the formal definition to be thinking about, and I'll see you next video.
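The scaling behavior described here can be checked numerically. The sketch below compares the definition as given (dividing by h) with the variant some authors use (dividing by h times the length of v). The function names, the sample function f(x, y) = x²y, the point, and the vector (3, 4) are hypothetical choices made only for this illustration.

```python
import math


def directional_derivative(f, a, v, h=1e-6):
    """Difference quotient from the definition as given: divide by h."""
    return (f(a[0] + h * v[0], a[1] + h * v[1]) - f(a[0], a[1])) / h


def normalized_directional_derivative(f, a, v, h=1e-6):
    """Variant some authors use: divide by h * |v| so only direction matters."""
    length = math.hypot(v[0], v[1])
    return (f(a[0] + h * v[0], a[1] + h * v[1]) - f(a[0], a[1])) / (h * length)


def f(x, y):
    # Hypothetical sample function; note that it is not linear.
    return x**2 * y


a = (1.0, 2.0)
v = (3.0, 4.0)
two_v = (2 * v[0], 2 * v[1])

# With the plain definition, doubling v roughly doubles the result.
d_v = directional_derivative(f, a, v)
d_2v = directional_derivative(f, a, two_v)

# With the normalized variant, doubling v leaves the result roughly unchanged.
n_v = normalized_directional_derivative(f, a, v)
n_2v = normalized_directional_derivative(f, a, two_v)

print(d_v, d_2v, d_2v / d_v)
print(n_v, n_2v)
```

The ratio is only approximately 2 here because h is finite. In the limit the factor is exact whenever the limit exists, not only for linear functions: substituting k = 2h turns (f(a + 2hv) - f(a)) / h into 2 · (f(a + kv) - f(a)) / k, which is twice the quotient for v itself.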