Directional derivatives (introduction)

How does the value of a multivariable function change as you nudge the input in a specific direction?

What we're building to

  • If you have some multivariable function $f(x, y)$ and some vector $\vec{\textbf{v}}$ in the function's input space, the directional derivative of $f$ along $\vec{\textbf{v}}$ tells you the rate at which $f$ will change while the input moves with velocity vector $\vec{\textbf{v}}$.
  • The notation here is $\nabla_{\vec{\textbf{v}}} f$, and it is computed by taking the dot product between the gradient of $f$ and the vector $\vec{\textbf{v}}$, that is, $\nabla f \cdot \vec{\textbf{v}}$.
  • When the directional derivative is used to compute slope, be sure to normalize the vector $\vec{\textbf{v}}$ first.

Generalizing partial derivatives

Consider some multivariable function:
$$f(x, y) = x^2 - xy$$
We know that the partial derivatives with respect to $x$ and $y$ tell us the rate of change of $f$ as we nudge the input either in the $x$ or $y$ direction.
The question now is what happens when we nudge the input of $f$ in a direction which is not parallel to the $x$ or $y$ axes.
For example, the image below shows the graph of $f$ along with a small step along a vector $\vec{\textbf{v}}$ in the input space, meaning the $xy$-plane in this case. Is there an operation which tells us how the height of the graph above the tip of $\vec{\textbf{v}}$ compares to the height of the graph above its tail?
[Figure: Change input in the direction of $\vec{\textbf{v}}$]
As you have probably guessed, there is a new type of derivative, called the directional derivative, which answers this question.
Just as the partial derivative is taken with respect to some input variable (e.g., $x$ or $y$), the directional derivative is taken along some vector $\vec{\textbf{v}}$ in the input space.
One very helpful way to think about this is to picture a point in the input space moving with velocity $\vec{\textbf{v}}$. The directional derivative of $f$ along $\vec{\textbf{v}}$ is the resulting rate of change in the output of the function. So, for example, multiplying the vector $\vec{\textbf{v}}$ by two would double the value of the directional derivative since all changes would be happening twice as fast.
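The velocity picture can be tried out numerically. The sketch below is only an illustration, not the formal definition: it estimates the rate of change of $f(x, y) = x^2 - xy$ as the input moves away from a point with velocity $\vec{\textbf{v}}$, using a small central difference. The point $(1, 1)$, the vector $(1, 2)$, the step size `h`, and the helper name `rate_along` are all assumptions chosen for the example.

```python
def f(x, y):
    # The example function from this section.
    return x**2 - x * y

def rate_along(func, point, v, h=1e-6):
    """Estimate d/dt func(point + t*v) at t = 0 with a central difference."""
    x, y = point
    vx, vy = v
    forward = func(x + h * vx, y + h * vy)
    backward = func(x - h * vx, y - h * vy)
    return (forward - backward) / (2 * h)

point = (1.0, 1.0)   # hypothetical input point
v = (1.0, 2.0)       # hypothetical velocity vector

rate_v = rate_along(f, point, v)
rate_2v = rate_along(f, point, (2 * v[0], 2 * v[1]))

print("rate along v: ", rate_v)
print("rate along 2v:", rate_2v)
```

Up to rounding, the second number should be twice the first: moving twice as fast makes the output change twice as fast.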

Notation

There are quite a few different notations for this one concept:
  • $\nabla_{\vec{\textbf{v}}} f$
  • $\dfrac{\partial f}{\partial \vec{\textbf{v}}}$
  • $f'_{\vec{\textbf{v}}}$
  • $D_{\vec{\textbf{v}}} f$
  • $\partial_{\vec{\textbf{v}}} f$
All of these represent the same thing: the rate of change of $f$ as you nudge the input along the direction of $\vec{\textbf{v}}$. We'll use the $\nabla_{\vec{\textbf{v}}} f$ notation, just because it subtly hints at how you compute the directional derivative using the gradient, which you'll see in a moment.

Example 1: $\vec{\textbf{v}} = \hat{\textbf{j}}$

Before jumping into the general rule for computing $\nabla_{\vec{\textbf{v}}} f$, let's look at how we can rewrite the more familiar notion of a partial derivative as a directional derivative.
For example, the partial derivative $\dfrac{\partial f}{\partial y}$ tells us the rate at which $f$ changes as we nudge the input in the $y$ direction, in other words, as we nudge it along the vector $\hat{\textbf{j}}$. Therefore, we could equivalently write the partial derivative with respect to $y$ as $\dfrac{\partial f}{\partial y} = \nabla_{\hat{\textbf{j}}} f$.
This is all just fiddling with different notation. What's more important is to have a clear mental image of what all this notation represents.
Reflection Question: Suppose $\vec{\textbf{v}} = \hat{\textbf{i}} + \hat{\textbf{j}}$. What is your best guess for $\nabla_{\vec{\textbf{v}}} f$?

How to compute the directional derivative

Let's say you have a multivariable function $f(x, y, z)$ which takes in three variables ($x$, $y$, and $z$), and you want to compute its directional derivative along the following vector:
$$\vec{\textbf{v}} = \left[ \begin{array}{c} 2 \\ 3 \\ -1 \end{array} \right]$$
The answer, as it turns out, is
$$\nabla_{\vec{\textbf{v}}} f = 2 \dfrac{\partial f}{\partial x} + 3 \dfrac{\partial f}{\partial y} + (-1) \dfrac{\partial f}{\partial z}$$
This should make sense because a tiny nudge along $\vec{\textbf{v}}$ can be broken down into two tiny nudges in the $x$-direction, three tiny nudges in the $y$-direction, and a tiny nudge backwards, by $-1$, in the $z$-direction. We'll go through the rigorous reasoning behind this much more thoroughly in the next article.
More generally, we can write the vector $\vec{\textbf{v}}$ abstractly as follows:
$$\vec{\textbf{v}} = \left[ \begin{array}{c} v_1 \\ v_2 \\ v_3 \end{array} \right]$$
The directional derivative looks like this:
$$\nabla_{\vec{\textbf{v}}} f = v_1 \dfrac{\partial f}{\partial x} + v_2 \dfrac{\partial f}{\partial y} + v_3 \dfrac{\partial f}{\partial z}$$
That is, a tiny nudge in the $\vec{\textbf{v}}$ direction consists of $v_1$ times a tiny nudge in the $x$-direction, $v_2$ times a tiny nudge in the $y$-direction, and $v_3$ times a tiny nudge in the $z$-direction.
This can be written in a super-pleasing compact way using the dot product and the gradient:
$$
\begin{aligned}
\nabla_{\vec{\textbf{v}}} f(x, y, z)
&= v_1 \dfrac{\partial f}{\partial x}(x, y, z) + v_2 \dfrac{\partial f}{\partial y}(x, y, z) + v_3 \dfrac{\partial f}{\partial z}(x, y, z) \\
&= \left[ \begin{array}{c} \dfrac{\partial f}{\partial x}(x, y, z) \\ \dfrac{\partial f}{\partial y}(x, y, z) \\ \dfrac{\partial f}{\partial z}(x, y, z) \end{array} \right] \cdot \left[ \begin{array}{c} v_1 \\ v_2 \\ v_3 \end{array} \right] \\
&= \nabla f(x, y, z) \cdot \vec{\textbf{v}}
\end{aligned}
$$
This is why the notation $\nabla_{\vec{\textbf{v}}}$ is so suggestive of the way we compute the directional derivative:
$$\nabla_{\vec{\textbf{v}}} f = \nabla f \cdot \vec{\textbf{v}}$$
Take a moment to delight in the fact that one single operation, the gradient, packs enough information to compute the rate of change of a function in every possible direction! That's so many directions! Left, right, up, down, north-north-east, 34.8° clockwise from the $x$-axis... Madness!
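To see the formula $\nabla f \cdot \vec{\textbf{v}}$ at work, here is a minimal sketch for a three-variable function. The function $f(x, y, z) = xy + z^2$ and the point $(1, 2, 3)$ are assumptions made up for this illustration (the article does not specify them); the vector is the $[2, 3, -1]$ used above. The gradient is written out by hand, and the dot-product result is compared against a direct "tiny nudge" estimate.

```python
def f(x, y, z):
    # Hypothetical example function, not taken from the article.
    return x * y + z**2

def grad_f(x, y, z):
    # Hand-computed partial derivatives of f: (df/dx, df/dy, df/dz).
    return (y, x, 2 * z)

def directional_derivative(grad, v):
    # Dot product of the gradient with the direction vector.
    return sum(g * c for g, c in zip(grad, v))

point = (1.0, 2.0, 3.0)   # hypothetical input point
v = (2.0, 3.0, -1.0)      # the vector from the text above

via_gradient = directional_derivative(grad_f(*point), v)

# Cross-check: nudge the input by +/- h*v and measure the change in f.
h = 1e-6
forward = [p + h * c for p, c in zip(point, v)]
backward = [p - h * c for p, c in zip(point, v)]
via_nudge = (f(*forward) - f(*backward)) / (2 * h)

print("gradient dot v:   ", via_gradient)
print("nudge-based check:", via_nudge)
```

Both numbers should agree up to rounding, which is the point of the formula: once the gradient is known, every direction costs only one dot product.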

Example 2:

Problem: Take a look at the following function.
$$f(x, y) = x^2 - xy$$
What is the directional derivative of $f$ at the point $(2, -3)$ along the vector $\vec{\textbf{v}} = 0.6\,\hat{\textbf{i}} + 0.8\,\hat{\textbf{j}}$?
Solution: You can think of the directional derivative either as a weighted sum of partial derivatives, as below:
$$\nabla_{\vec{\textbf{v}}} f = 0.6 \dfrac{\partial f}{\partial x} + 0.8 \dfrac{\partial f}{\partial y}$$
Or, you can think of it as a dot product with the gradient, as you see here:
$$\nabla_{\vec{\textbf{v}}} f = \nabla f \cdot \vec{\textbf{v}}$$
The first is faster, but just for practice, let's see how the gradient interpretation unfolds. We start by computing the gradient itself:
$$\nabla f = \left[ \begin{array}{c} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \end{array} \right] = \left[ \begin{array}{c} \dfrac{\partial}{\partial x}(x^2 - xy) \\ \dfrac{\partial}{\partial y}(x^2 - xy) \end{array} \right] = \left[ \begin{array}{c} 2x - y \\ -x \end{array} \right]$$
Next, plug in the point $(x, y) = (2, -3)$ since this is the point the question asks us about.
$$\nabla f(2, -3) = \left[ \begin{array}{c} 2(2) - (-3) \\ -(2) \end{array} \right] = \left[ \begin{array}{c} 7 \\ -2 \end{array} \right]$$
To get the desired directional derivative, we take the dot product between this gradient and $\vec{\textbf{v}}$:
$$
\begin{aligned}
\nabla_{\vec{\textbf{v}}} f(2, -3) &= \nabla f(2, -3) \cdot \left( 0.6\,\hat{\textbf{i}} + 0.8\,\hat{\textbf{j}} \right) \\
&= \left[ \begin{array}{c} 7 \\ -2 \end{array} \right] \cdot \left[ \begin{array}{c} 0.6 \\ 0.8 \end{array} \right] \\
&= 7(0.6) + (-2)(0.8) \\
&= 2.6
\end{aligned}
$$
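The arithmetic in this example is easy to reproduce. The short sketch below hard-codes the partial derivatives worked out above and evaluates both views of the directional derivative (weighted sum of partials, and dot product with the gradient). The variable and function names are illustrative choices.

```python
def grad_f(x, y):
    # Partial derivatives of f(x, y) = x^2 - x*y, as derived in the example.
    return (2 * x - y, -x)

gx, gy = grad_f(2, -3)   # gradient at the point (2, -3)
v = (0.6, 0.8)           # the direction vector from the problem

# View 1: weighted sum of the partial derivatives.
weighted_sum = v[0] * gx + v[1] * gy

# View 2: dot product between the gradient and v.
dot_product = sum(g * c for g, c in zip((gx, gy), v))

print("gradient at (2, -3):", (gx, gy))
print("weighted sum:", weighted_sum)
print("dot product: ", dot_product)
```

Both views give the same value, approximately 2.6 (floating-point rounding may show a few extra digits).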

Finding slope

How do you find the slope of a graph intersected with a plane that is not parallel to the x or y axes?
[Figure: Slice graph in a direction not parallel to $x$ or $y$ directions]
You can use the directional derivative, but there is one important thing to remember:
If the directional derivative is used to compute slope, either $\vec{\textbf{v}}$ must be a unit vector or you must remember to divide by $\|\vec{\textbf{v}}\|$ at the end.
In the definition and computation above, doubling the length of $\vec{\textbf{v}}$ would double the value of the directional derivative. In terms of the computation, this is because $\nabla f \cdot (2\vec{\textbf{v}}) = 2(\nabla f \cdot \vec{\textbf{v}})$.
However, this might not always be what you want. The slope of a graph in the direction of $\vec{\textbf{v}}$, for example, depends only on the direction of $\vec{\textbf{v}}$, not the magnitude $\|\vec{\textbf{v}}\|$. Let's see why.
How can we imagine this slope? Slice the graph of $f$ with a vertical plane that cuts the $xy$-plane in the direction of $\vec{\textbf{v}}$. The slope in question is that of a line tangent to the resulting curve. As with any slope, we look for the rise over run.
[Figure: Computing slope using the directional derivative]
In this case, the run will be the distance of a small nudge in the direction of $\vec{\textbf{v}}$. We can express such a nudge as an addition of $h\vec{\textbf{v}}$ to an input point $\textbf{x}_0$, where $h$ is thought of as some small number. The magnitude of this nudge is $h\|\vec{\textbf{v}}\|$.
The resulting change in the output of f can be approximated by multiplying this little value h by the directional derivative:
$$h \nabla_{\vec{\textbf{v}}} f(x_0, y_0)$$
In fact, the rise of the tangent line (as opposed to the graph of the function) is precisely $h \nabla_{\vec{\textbf{v}}} f(x_0, y_0)$ due to this run of size $h\|\vec{\textbf{v}}\|$. For full details on why this is true, see the formal definition of the directional derivative in the next article.
Therefore, the rise-over-run slope of our graph is
$$\dfrac{h \nabla_{\vec{\textbf{v}}} f(x_0, y_0)}{h \|\vec{\textbf{v}}\|} = \boxed{\dfrac{\nabla_{\vec{\textbf{v}}} f(x_0, y_0)}{\|\vec{\textbf{v}}\|}}$$
Notice, if $\vec{\textbf{v}}$ is a unit vector, meaning $\|\vec{\textbf{v}}\| = 1$, then the directional derivative does give the slope of a graph along that direction. Otherwise, it is important to remember to divide out by the magnitude of $\vec{\textbf{v}}$.
Some authors even go so far as to include normalization in the definition of $\nabla_{\vec{\textbf{v}}} f$.
Alternate definition of directional derivative:
$$\nabla_{\vec{\textbf{v}}} f(\textbf{x}) = \lim_{h \to 0} \dfrac{f(\textbf{x} + h\vec{\textbf{v}}) - f(\textbf{x})}{h \|\vec{\textbf{v}}\|}$$
Personally, I think this definition puts too much emphasis on the particular use case of finding slope, so I prefer to use the original definition and normalize $\vec{\textbf{v}}$ when necessary.
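The difference between the directional derivative and the slope shows up clearly when the same direction is given with two different lengths. The sketch below reuses the function from Example 2, $f(x, y) = x^2 - xy$ at $(2, -3)$, with the vectors $(3, 4)$ and $(6, 8)$; those two vectors and the helper names are assumptions picked for illustration.

```python
import math

def grad_f(x, y):
    # Gradient of f(x, y) = x^2 - x*y (the Example 2 function).
    return (2 * x - y, -x)

def directional_derivative(grad, v):
    # Unnormalized: scales with the length of v.
    return sum(g * c for g, c in zip(grad, v))

def slope_along(grad, v):
    # Rise over run: divide the directional derivative by ||v||.
    return directional_derivative(grad, v) / math.hypot(*v)

grad = grad_f(2, -3)
v_short = (3, 4)   # length 5
v_long = (6, 8)    # same direction, length 10

dd_short = directional_derivative(grad, v_short)
dd_long = directional_derivative(grad, v_long)
slope_short = slope_along(grad, v_short)
slope_long = slope_along(grad, v_long)

print("directional derivatives:", dd_short, dd_long)
print("slopes:                 ", slope_short, slope_long)
```

The directional derivative doubles when the vector doubles, while the slope stays the same, because dividing by $\|\vec{\textbf{v}}\|$ removes the dependence on length.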

Example 3: Slope

Problem: On the stage for this problem we have three players.
Player 1, the function:
$$f(x, y) = \sin(xy)$$
Player 2, the point:
$$(x_0, y_0) = \left( \dfrac{\pi}{3}, \dfrac{1}{2} \right)$$
Player 3, the vector:
$$\vec{\textbf{v}} = 2\,\hat{\textbf{i}} + 3\,\hat{\textbf{j}}$$
What is the slope of the graph of $f$ at the point $(x_0, y_0)$ along the vector $\vec{\textbf{v}}$?
Answer: Since we are finding slope, we must first normalize the vector in question. The magnitude $\|\vec{\textbf{v}}\|$ is $\sqrt{2^2 + 3^2} = \sqrt{13}$, so we divide each term by $\sqrt{13}$ to get the resulting unit vector $\hat{\textbf{u}}$ in the direction of $\vec{\textbf{v}}$:
$$\hat{\textbf{u}} = \dfrac{2}{\sqrt{13}}\,\hat{\textbf{i}} + \dfrac{3}{\sqrt{13}}\,\hat{\textbf{j}}$$
Next, find the gradient of $f$:
$$\nabla f(x, y) = \left[ \begin{array}{c} \dfrac{\partial}{\partial x} \sin(xy) \\ \dfrac{\partial}{\partial y} \sin(xy) \end{array} \right] = \left[ \begin{array}{c} y\cos(xy) \\ x\cos(xy) \end{array} \right]$$
Plug in the point $(x_0, y_0) = \left( \dfrac{\pi}{3}, \dfrac{1}{2} \right)$ to this gradient.
$$\nabla f\left( \dfrac{\pi}{3}, \dfrac{1}{2} \right) = \left[ \begin{array}{c} \dfrac{1}{2}\cos\left(\dfrac{\pi}{6}\right) \\ \dfrac{\pi}{3}\cos\left(\dfrac{\pi}{6}\right) \end{array} \right] = \left[ \begin{array}{c} \dfrac{\sqrt{3}}{4} \\ \dfrac{\pi\sqrt{3}}{6} \end{array} \right]$$
Finally, take the dot product between $\hat{\textbf{u}}$ and $\nabla f(\pi/3, 1/2)$:
$$\hat{\textbf{u}} \cdot \nabla f\left( \dfrac{\pi}{3}, \dfrac{1}{2} \right) = \dfrac{2}{\sqrt{13}} \cdot \dfrac{\sqrt{3}}{4} + \dfrac{3}{\sqrt{13}} \cdot \dfrac{\pi\sqrt{3}}{6} = \dfrac{\sqrt{3}\,(1 + \pi)}{2\sqrt{13}} \approx 0.995$$
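The four steps of this answer (normalize, take the gradient, plug in the point, take the dot product) can be checked numerically. The sketch below is one way to do it; the gradient of $\sin(xy)$ is written out by hand using the chain rule, and the names are illustrative.

```python
import math

def grad_f(x, y):
    # Gradient of f(x, y) = sin(x*y), by the chain rule.
    c = math.cos(x * y)
    return (y * c, x * c)

x0, y0 = math.pi / 3, 0.5   # Player 2, the point
v = (2.0, 3.0)              # Player 3, the vector

# Step 1: normalize v to get the unit vector u.
norm_v = math.hypot(*v)
u = tuple(c / norm_v for c in v)

# Steps 2 and 3: evaluate the gradient at the point.
grad = grad_f(x0, y0)

# Step 4: the slope is the dot product of the gradient with u.
slope = sum(g * c for g, c in zip(grad, u))

print("||v|| =", norm_v)
print("u =", u)
print("gradient =", grad)
print("slope =", slope)
```

The slope comes out to roughly 0.995, which matches the closed form $\sqrt{3}\,(1 + \pi) / (2\sqrt{13})$.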

Summary

  • If you have some multivariable function $f(x, y)$ and some vector $\vec{\textbf{v}}$ in the function's input space, the directional derivative of $f$ along $\vec{\textbf{v}}$ tells you the rate at which $f$ will change while the input moves with velocity vector $\vec{\textbf{v}}$.
  • The notation here is $\nabla_{\vec{\textbf{v}}} f$, and it is computed by taking the dot product between the gradient of $f$ and the vector $\vec{\textbf{v}}$, that is, $\nabla f \cdot \vec{\textbf{v}}$.
  • When the directional derivative is used to compute slope, be sure to normalize the vector $\vec{\textbf{v}}$ first.

Want to join the conversation?

  • harrysonghurst1
    In example 3, is there an error?
    cos((1/2) * (pi/3)) =/= 1/2

    SqRt(3)/2 is what I get. I think you have taken the sin of pi/6 instead of cos.
    (20 votes)
  • gschex1112
    In example 3: slope, the magnitude of v should be sqrt(2^2+3^2) = sqrt(13). sqrt(4^2+3^2) = sqrt(25) = 5, not sqrt(13), and 4 is not part of the vector v.

    On a side note, I'm glad to see I'm not the only one who works through an operation and then puts the result back into the operation, as if it still needs to be solved. I've gotten a few KA exercises wrong that way, usually involving simple arithmetic after having taken care of the calculus.
    (5 votes)
  • Jorge Luis Borges Vázquez
    Why does he say the vector "v" is the velocity vector? I think in this context, it is the displacement vector. We are not considering time, just space.

    If we just consider the graph (independently of whether the real function represents, say, the output of benefits from two inputs of production and investment in an economics problem), we obtain the increment of Z distance (increment of the function) for an increment of a combination of distances in X and Y. Talking about velocity makes no sense here.
    (4 votes)
    • Taras.Pokalchuk
      It's helpful to think about v as a velocity vector because if you move along v twice as fast, the resulting rate of change has to be twice as fast (and it is if you double the directional vector). But if you think of it as a distance, it will not be intuitive to think that doubling the distance traveled will double the output.
      (2 votes)
  • Taras.Pokalchuk
    If h is an infinitesimal, why does the magnitude of v matter? Even if it did matter, wouldn't it be better to make the vector's magnitude approach zero too?
    (4 votes)
    • Alexander Wu
      Rate of change with h approaching zero is equivalent to the slope of a tangent. If you are using the rate of change on the original graph, h must be tiny. If you are using a tangent line, then h can be whatever, since the slope is constant.

      v can't possibly be zero since the zero vector has no direction. It has to have some length to retain its directional information. We decide 1 is the best choice because it's the most general number other than 0.

      Slope is defined as rise/run, so it is also rise when run = 1. rise/run = rise/1 = rise. We could of course have defined it as 2rise/run or run/rise, which would still retain all the useful information about how steep the graph is, but we defined it as rise/run, and so we have to use ||v|| = 1.
      (0 votes)
  • Chris
    I'm still not sure why you have to normalize vector v when computing the directional derivative for slope. Isn't the directional derivative just computing the rate at which f will change while the input moves along v, which is a lengthier way of describing the slope?
    (2 votes)
    • shayanaminnjad.sa
      The derivative means instantaneous rate of change. If you move along a vector, the bigger the magnitude of the vector, the faster you travel, so at each instant you have a bigger instantaneous rate of change. But the slope is something different. You only care about the rise over run. Two vectors with different magnitudes have the same rise over run if they point in the same direction. So if we are using the derivative as a means to get to the slope, we ignore the magnitude, because we only care about the direction. Hope my answer is clear.
      (2 votes)
  • Richard
    So if I compute the directional derivative with a unit vector as my direction, I get the slope of the surface, right? If I don't use a unit vector, what do I get? I'm asking for a physical interpretation.
    Thanks!
    (2 votes)
  • Steve Wallace
    Example 2 calculates the directional derivative and uses the dot product with the gradient and the vector components, yet Example 3, in calculating slope, converts the vector to the unit vector before the dot product. What's the difference between directional derivative and slope?
    (2 votes)
    • Tejas
      There is no difference. Whenever you calculate either, you need to make the vector specifying direction a unit vector. In example 2, the 0.6î+0.8ĵ is already a unit vector, so there was no need to convert anything.
      (1 vote)
  • Radu Marin
    In example 1, reflection question, since v = i + j, why isn't the gradient along v, sqrt(2)/2*df/dx + sqrt(2)/2*df/dy, since we have to normalize it? I'm a bit confused with having two definitions with different meaning for the gradient... some physical examples on when you use one or another?
    (2 votes)
  • Gadzookie2
    For example 3, shouldn't it be root(2 squared + 3 squared) and then 3/root(13) j?
    (2 votes)
  • Sheikheddy
    What does it mean to "normalize" a vector?
    (1 vote)