# Directional derivatives (introduction)

How does the value of a multivariable function change as you nudge the input in a specific direction?

## What we're building to

• If you have some multivariable function, $f\left(x,y\right)$ and some vector in the function's input space, $\stackrel{\to }{\mathbf{\text{v}}}$, the directional derivative of $f$ along $\stackrel{\to }{\mathbf{\text{v}}}$ tells you the rate at which $f$ will change while the input moves with velocity vector $\stackrel{\to }{\mathbf{\text{v}}}$.
• The notation here is ${\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f$, and it is computed by taking the dot product between the gradient of $f$ and the vector $\stackrel{\to }{\mathbf{\text{v}}}$, that is, $\mathrm{\nabla }f\cdot \stackrel{\to }{\mathbf{\text{v}}}$.
• When the directional derivative is used to compute slope, be sure to normalize the vector $\stackrel{\to }{\mathbf{\text{v}}}$ first.

## Generalizing partial derivatives

Consider some multivariable function:
$f\left(x,y\right)={x}^{2}-xy$
We know that the partial derivatives with respect to $x$ and $y$ tell us the rate of change of $f$ as we nudge the input either in the $x$ or $y$ direction.
The question now is what happens when we nudge the input of $f$ in a direction which is not parallel to the $x$ or $y$ axes.
For example, the image below shows the graph of $f$ along with a small step along a vector $\stackrel{\to }{\mathbf{\text{v}}}$ in the input space, meaning the $xy$-plane in this case. Is there an operation which tells us how the height of the graph above the tip of $\stackrel{\to }{\mathbf{\text{v}}}$ compares to the height of the graph above its tail?
As you have probably guessed, there is a new type of derivative, called the directional derivative, which answers this question.
Just as the partial derivative is taken with respect to some input variable—e.g., $x$ or $y$—the directional derivative is taken along some vector $\stackrel{\to }{\mathbf{\text{v}}}$ in the input space.
One very helpful way to think about this is to picture a point in the input space moving with velocity $\stackrel{\to }{\mathbf{\text{v}}}$. The directional derivative of $f$ along $\stackrel{\to }{\mathbf{\text{v}}}$ is the resulting rate of change in the output of the function. So, for example, multiplying the vector $\stackrel{\to }{\mathbf{\text{v}}}$ by two would double the value of the directional derivative since all changes would be happening twice as fast.

## Notation

There are quite a few different notations for this one concept:
• ${\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f$
• $\frac{\partial f}{\partial \stackrel{\to }{\mathbf{\text{v}}}}$
• ${f}_{\stackrel{\to }{\mathbf{\text{v}}}}^{\prime }$
• ${D}_{\stackrel{\to }{\mathbf{\text{v}}}}f$
• ${\partial }_{\stackrel{\to }{\mathbf{\text{v}}}}f$
All of these represent the same thing: the rate of change of $f$ as you nudge the input along the direction of $\stackrel{\to }{\mathbf{\text{v}}}$. We'll use the ${\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f$ notation, just because it subtly hints at how you compute the directional derivative using the gradient, which you'll see in a moment.

## Example 1: $\stackrel{\to }{\mathbf{\text{v}}}=\stackrel{^}{\mathbf{\text{j}}}$

Before jumping into the general rule for computing ${\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f$, let's look at how we can rewrite the more familiar notion of a partial derivative as a directional derivative.
For example, the partial derivative $\frac{\partial f}{\partial y}$ tells us the rate at which $f$ changes as we nudge the input in the $y$ direction, in other words, as we nudge it along the vector $\stackrel{^}{\mathbf{\text{j}}}$. Therefore, we could equivalently write the partial derivative with respect to $y$ as $\frac{\partial f}{\partial y}={\mathrm{\nabla }}_{\stackrel{^}{\mathbf{\text{j}}}}f$.
This is all just fiddling with different notation. What's more important is to have a clear mental image of what all this notation represents.
Reflection Question: Suppose $\stackrel{\to }{\mathbf{\text{v}}}=\stackrel{^}{\mathbf{\text{i}}}+\stackrel{^}{\mathbf{\text{j}}}$. What is your best guess for ${\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f$?

## How to compute the directional derivative

Let's say you have a multivariable function $f\left(x,y,z\right)$, which takes in three variables ($x$, $y$ and $z$), and you want to compute its directional derivative along the following vector:
$\stackrel{\to }{\mathbf{\text{v}}}=\left[\begin{array}{c}2\\ 3\\ -1\end{array}\right]$
The answer, as it turns out, is
${\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f=2\frac{\partial f}{\partial x}+3\frac{\partial f}{\partial y}+\left(-1\right)\frac{\partial f}{\partial z}$
This should make sense because a tiny nudge along $\stackrel{\to }{\mathbf{\text{v}}}$ can be broken down into two tiny nudges in the $x$-direction, three tiny nudges in the $y$-direction, and one tiny nudge backwards, by $-1$, in the $z$-direction. We'll go through the rigorous reasoning behind this much more thoroughly in the next article.
More generally, we can write the vector $\stackrel{\to }{\mathbf{\text{v}}}$ abstractly as follows:
$\stackrel{\to }{\mathbf{\text{v}}}=\left[\begin{array}{c}{v}_{1}\\ {v}_{2}\\ {v}_{3}\end{array}\right]$
The directional derivative looks like this:
${\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f={v}_{1}\frac{\partial f}{\partial x}+{v}_{2}\frac{\partial f}{\partial y}+{v}_{3}\frac{\partial f}{\partial z}$
That is, a tiny nudge in the $\stackrel{\to }{\mathbf{\text{v}}}$ direction consists of ${v}_{1}$ times a tiny nudge in the $x$-direction, ${v}_{2}$ times a tiny nudge in the $y$-direction, and ${v}_{3}$ times a tiny nudge in the $z$-direction.
This can be written in a super-pleasing compact way using the dot product and the gradient:
$\begin{array}{rl}& \phantom{=}{\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f\left(x,y,z\right)\\ \\ & ={v}_{1}\frac{\partial f}{\partial x}\left(x,y,z\right)+{v}_{2}\frac{\partial f}{\partial y}\left(x,y,z\right)+{v}_{3}\frac{\partial f}{\partial z}\left(x,y,z\right)\\ \\ & =\left[\begin{array}{c}\frac{\partial f}{\partial x}\left(x,y,z\right)\\ \\ \frac{\partial f}{\partial y}\left(x,y,z\right)\\ \\ \frac{\partial f}{\partial z}\left(x,y,z\right)\end{array}\right]\cdot \left[\begin{array}{c}{v}_{1}\\ \\ {v}_{2}\\ \\ {v}_{3}\end{array}\right]\\ \\ & =\mathrm{\nabla }f\left(x,y,z\right)\cdot \stackrel{\to }{\mathbf{\text{v}}}\end{array}$
This is why the notation ${\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}$ is so suggestive of the way we compute the directional derivative:
$\begin{array}{r}{\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f=\mathrm{\nabla }f\cdot \stackrel{\to }{\mathbf{\text{v}}}\end{array}$
Take a moment to delight in the fact that one single operation, the gradient, packs enough information to compute the rate of change of a function in every possible direction! That's so many directions! Left, right, up, down, north-north-east, 34.8${}^{\circ }$ clockwise from the $x$-axis... Madness!
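As a rough numerical sanity check of the formula ${\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f=\mathrm{\nabla }f\cdot \stackrel{\to }{\mathbf{\text{v}}}$, here is a minimal Python sketch. The sample function `f`, the test point, the step sizes, and the helper names `numerical_gradient` and `directional_derivative` are all illustrative assumptions rather than anything from the article, and the gradient is approximated with central differences instead of being computed symbolically.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at `point` using central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(*forward) - f(*backward)) / (2 * h))
    return grad


def directional_derivative(f, point, v, h=1e-6):
    """Dot product of the approximate gradient with v (v is NOT normalized)."""
    grad = numerical_gradient(f, point, h)
    return sum(g * v_i for g, v_i in zip(grad, v))


# Hypothetical sample function of three variables, chosen only for illustration.
def f(x, y, z):
    return x * x * y + y * z


point = (1.0, 2.0, 3.0)
v = (2.0, 3.0, -1.0)  # the vector used in the text above

# Route 1: weighted sum of partial derivatives, i.e. gradient dot v.
via_gradient = directional_derivative(f, point, v)

# Route 2: nudge the input along v itself and measure the change in f.
t = 1e-6
shifted_fwd = [p + t * v_i for p, v_i in zip(point, v)]
shifted_bwd = [p - t * v_i for p, v_i in zip(point, v)]
via_nudge = (f(*shifted_fwd) - f(*shifted_bwd)) / (2 * t)

print(via_gradient, via_nudge)  # the two estimates should agree closely
```

For this particular sample function the gradient at $\left(1,2,3\right)$ works out by hand to $\left(4,4,2\right)$, so both routes should land near $2\left(4\right)+3\left(4\right)+\left(-1\right)\left(2\right)=18$.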

## Example 2

Problem: Take a look at the following function.
$f\left(x,y\right)={x}^{2}-xy$
What is the directional derivative of $f$ at the point $\left(2,-3\right)$ along the vector $\begin{array}{r}\stackrel{\to }{\mathbf{\text{v}}}=0.6\stackrel{^}{\mathbf{\text{i}}}+0.8\stackrel{^}{\mathbf{\text{j}}}\end{array}$?
Solution: You can think of the directional derivative either as a weighted sum of partial derivatives, as below:
$\begin{array}{r}{\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f=0.6\frac{\partial f}{\partial x}+0.8\frac{\partial f}{\partial y}\end{array}$
Or, you can think of it as a dot product with the gradient, as you see here:
$\begin{array}{r}{\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f=\mathrm{\nabla }f\cdot \stackrel{\to }{\mathbf{\text{v}}}\end{array}$
The first is faster, but just for practice, let's see how the gradient interpretation unfolds. We start by computing the gradient itself:
$\mathrm{\nabla }f=\left[\begin{array}{c}\frac{\partial f}{\partial x}\\ \\ \frac{\partial f}{\partial y}\end{array}\right]=\left[\begin{array}{c}\frac{\partial }{\partial x}\left({x}^{2}-xy\right)\\ \\ \frac{\partial }{\partial y}\left({x}^{2}-xy\right)\end{array}\right]=\left[\begin{array}{c}2x-y\\ -x\end{array}\right]$
Next, plug in the point $\left(x,y\right)=\left(2,-3\right)$ since this is the point the question asks us about.
$\begin{array}{r}\mathrm{\nabla }f\left(2,-3\right)=\left[\begin{array}{c}2\left(2\right)-\left(-3\right)\\ \\ -\left(2\right)\end{array}\right]=\left[\begin{array}{c}7\\ \\ -2\end{array}\right]\end{array}$
To get the desired directional derivative, we take the dot product between this gradient and $\stackrel{\to }{\mathbf{\text{v}}}$:
$\begin{array}{rl}{\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f\left(2,-3\right)& =\mathrm{\nabla }f\left(2,-3\right)\cdot \left(0.6\stackrel{^}{\mathbf{\text{i}}}+0.8\stackrel{^}{\mathbf{\text{j}}}\right)\\ \\ & =\left[\begin{array}{c}7\\ \\ -2\end{array}\right]\cdot \left[\begin{array}{c}0.6\\ \\ 0.8\end{array}\right]\\ \\ & =7\left(0.6\right)+\left(-2\right)\left(0.8\right)\\ \\ & =2.6\end{array}$
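The arithmetic in this example can be double-checked with a few lines of Python. This is only a sketch: the names `grad_f`, `exact`, and `estimate` and the step size `h` are assumptions made for illustration. It compares the gradient dot product against a finite-difference estimate obtained by nudging the input along $\stackrel{\to }{\mathbf{\text{v}}}$.

```python
def f(x, y):
    return x**2 - x * y


def grad_f(x, y):
    # Partial derivatives of x^2 - x*y taken by hand: (2x - y, -x).
    return (2 * x - y, -x)


x0, y0 = 2.0, -3.0
v = (0.6, 0.8)

# Gradient dot v, as in the worked solution.
gx, gy = grad_f(x0, y0)
exact = gx * v[0] + gy * v[1]

# Independent check: central difference along v.
h = 1e-6
estimate = (
    f(x0 + h * v[0], y0 + h * v[1]) - f(x0 - h * v[0], y0 - h * v[1])
) / (2 * h)

print(exact, estimate)  # both should be close to 2.6
```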

## Finding slope

How do you find the slope of a graph intersected with a plane that is not parallel to the $x$ or $y$ axes?
You can use the directional derivative, but there is one important thing to remember:
If the directional derivative is used to compute slope, either $\stackrel{\to }{\mathbf{\text{v}}}$ must be a unit vector or you must remember to divide by $||\stackrel{\to }{\mathbf{\text{v}}}||$ at the end.
In the definition and computation above, doubling the length of $\stackrel{\to }{\mathbf{\text{v}}}$ would double the value of the directional derivative. In terms of the computation, this is because $\mathrm{\nabla }f\cdot \left(2\stackrel{\to }{\mathbf{\text{v}}}\right)=2\left(\mathrm{\nabla }f\cdot \stackrel{\to }{\mathbf{\text{v}}}\right)$.
However, this might not always be what you want. The slope of a graph in the direction of $\stackrel{\to }{\mathbf{\text{v}}}$, for example, depends only on the direction of $\stackrel{\to }{\mathbf{\text{v}}}$, not the magnitude $||\stackrel{\to }{\mathbf{\text{v}}}||$. Let's see why.
How can we imagine this slope? Slice the graph of $f$ with a vertical plane that cuts the $xy$-plane in the direction of $\stackrel{\to }{\mathbf{\text{v}}}$. The slope in question is that of a line tangent to the resulting curve. As with any slope, we look for the rise over run.
In this case, the run will be the distance of a small nudge in the direction of $\stackrel{\to }{\mathbf{\text{v}}}$. We can express such a nudge as an addition of $h\stackrel{\to }{\mathbf{\text{v}}}$ to an input point ${\mathbf{\text{x}}}_{0}=\left({x}_{0},{y}_{0}\right)$, where $h$ is thought of as some small positive number. The magnitude of this nudge is $h||\stackrel{\to }{\mathbf{\text{v}}}||$.
The resulting change in the output of $f$ can be approximated by multiplying this little value $h$ by the directional derivative:
$h{\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f\left({x}_{0},{y}_{0}\right)$
In fact, the rise of the tangent line (as opposed to the graph of the function) is precisely $h{\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f\left({x}_{0},{y}_{0}\right)$ due to this run of size $h||\stackrel{\to }{\mathbf{\text{v}}}||$. For full details on why this is true, see the formal definition of the directional derivative in the next article.
Therefore, the rise-over-run slope of our graph is
$\begin{array}{r}\frac{h{\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f\left({x}_{0},{y}_{0}\right)}{h||\stackrel{\to }{\mathbf{\text{v}}}||}=\frac{{\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f\left({x}_{0},{y}_{0}\right)}{||\stackrel{\to }{\mathbf{\text{v}}}||}\end{array}$
Notice, if $\stackrel{\to }{\mathbf{\text{v}}}$ is a unit vector, meaning $||\stackrel{\to }{\mathbf{\text{v}}}||=1$, then the directional derivative does give the slope of a graph along that direction. Otherwise, it is important to remember to divide out by the magnitude of $\stackrel{\to }{\mathbf{\text{v}}}$.
Some authors even go so far as to include normalization in the definition of ${\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f$.
Alternate definition of directional derivative:
$\begin{array}{r}{\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f\left(\mathbf{\text{x}}\right)=\underset{h\to 0}{lim}\frac{f\left(\mathbf{\text{x}}+h\stackrel{\to }{\mathbf{\text{v}}}\right)-f\left(\mathbf{\text{x}}\right)}{h||\stackrel{\to }{\mathbf{\text{v}}}||}\end{array}$
Personally, I think this definition puts too much emphasis on the particular use case of finding slope, so I prefer to use the original definition and normalize $\stackrel{\to }{\mathbf{\text{v}}}$ when necessary.
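To make the distinction concrete, here is a small Python sketch, offered as an illustration rather than a canonical implementation. It reuses $f\left(x,y\right)={x}^{2}-xy$ and the point $\left(2,-3\right)$ from Example 2. The vector $\left(3,4\right)$, its doubled copy, and the helper names `directional_derivative` and `slope` are assumptions chosen for this example. Doubling the vector doubles the directional derivative, while the slope, which divides by $||\stackrel{\to }{\mathbf{\text{v}}}||$, stays the same.

```python
import math


def grad_f(x, y):
    # Gradient of f(x, y) = x^2 - x*y, computed by hand: (2x - y, -x).
    return (2 * x - y, -x)


def directional_derivative(point, v):
    """Gradient dot v, with v used as given (no normalization)."""
    gx, gy = grad_f(*point)
    return gx * v[0] + gy * v[1]


def slope(point, v):
    """Directional derivative divided by the length of v."""
    return directional_derivative(point, v) / math.hypot(v[0], v[1])


point = (2.0, -3.0)
v = (3.0, 4.0)         # illustrative vector of length 5
double_v = (6.0, 8.0)  # same direction, twice the length

d1 = directional_derivative(point, v)         # 7*3 + (-2)*4 = 13
d2 = directional_derivative(point, double_v)  # 7*6 + (-2)*8 = 26
s1 = slope(point, v)                          # 13 / 5
s2 = slope(point, double_v)                   # 26 / 10

print(d1, d2)  # the second is twice the first
print(s1, s2)  # the two slopes match
```

Since $\left(3,4\right)$ points in the same direction as the unit vector $0.6\stackrel{^}{\mathbf{\text{i}}}+0.8\stackrel{^}{\mathbf{\text{j}}}$ from Example 2, the slope here agrees with the value $2.6$ found there.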

## Example 3: Slope

Problem: On the stage for this problem we have three players.
Player 1, the function:
$f\left(x,y\right)=\mathrm{sin}\left(xy\right)$
Player 2, the point:
$\left({x}_{0},{y}_{0}\right)=\left(\frac{\pi }{3},\frac{1}{2}\right)$
Player 3, the vector:
$\stackrel{\to }{\mathbf{\text{v}}}=2\stackrel{^}{\mathbf{\text{i}}}+3\stackrel{^}{\mathbf{\text{j}}}$
What is the slope of the graph of $f$ at the point $\left({x}_{0},{y}_{0}\right)$ along the vector $\stackrel{\to }{\mathbf{\text{v}}}$?
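Answer: Since we are finding slope, we must first normalize the vector in question. The magnitude $||\stackrel{\to }{\mathbf{\text{v}}}||$ is $\sqrt{{2}^{2}+{3}^{2}}=\sqrt{13}$, so we divide each term by $\sqrt{13}$ to get the resulting unit vector $\stackrel{^}{\mathbf{\text{u}}}$ in the direction of $\stackrel{\to }{\mathbf{\text{v}}}$:
$\stackrel{^}{\mathbf{\text{u}}}=\frac{2}{\sqrt{13}}\stackrel{^}{\mathbf{\text{i}}}+\frac{3}{\sqrt{13}}\stackrel{^}{\mathbf{\text{j}}}$
Next, find the gradient of $f$:
$\mathrm{\nabla }f=\left[\begin{array}{c}\frac{\partial }{\partial x}\mathrm{sin}\left(xy\right)\\ \\ \frac{\partial }{\partial y}\mathrm{sin}\left(xy\right)\end{array}\right]=\left[\begin{array}{c}y\mathrm{cos}\left(xy\right)\\ \\ x\mathrm{cos}\left(xy\right)\end{array}\right]$
Plug the point $\left({x}_{0},{y}_{0}\right)=\left(\frac{\pi }{3},\frac{1}{2}\right)$ into this gradient, using $\mathrm{cos}\left(\frac{\pi }{6}\right)=\frac{\sqrt{3}}{2}$:
$\mathrm{\nabla }f\left(\frac{\pi }{3},\frac{1}{2}\right)=\left[\begin{array}{c}\frac{1}{2}\mathrm{cos}\left(\frac{\pi }{6}\right)\\ \\ \frac{\pi }{3}\mathrm{cos}\left(\frac{\pi }{6}\right)\end{array}\right]=\left[\begin{array}{c}\frac{\sqrt{3}}{4}\\ \\ \frac{\pi \sqrt{3}}{6}\end{array}\right]$
Finally, take the dot product between $\stackrel{^}{\mathbf{\text{u}}}$ and $\mathrm{\nabla }f\left(\pi /3,1/2\right)$:
$\stackrel{^}{\mathbf{\text{u}}}\cdot \mathrm{\nabla }f\left(\frac{\pi }{3},\frac{1}{2}\right)=\frac{2}{\sqrt{13}}\cdot \frac{\sqrt{3}}{4}+\frac{3}{\sqrt{13}}\cdot \frac{\pi \sqrt{3}}{6}=\frac{\sqrt{3}\left(1+\pi \right)}{2\sqrt{13}}\approx 0.995$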
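The four steps of this example (normalize, take the gradient, plug in the point, take the dot product) can be sketched numerically in Python. The variable names below are illustrative assumptions, and the closed form $\frac{\sqrt{3}\left(1+\pi \right)}{2\sqrt{13}}$ used for comparison comes from carrying the steps out by hand with $\mathrm{cos}\left(\pi /6\right)=\sqrt{3}/2$.

```python
import math


def grad_f(x, y):
    # f(x, y) = sin(xy), so the partials are y*cos(xy) and x*cos(xy).
    c = math.cos(x * y)
    return (y * c, x * c)


x0, y0 = math.pi / 3, 0.5
v = (2.0, 3.0)

# Step 1: normalize v to get the unit vector u.
norm = math.hypot(v[0], v[1])  # sqrt(13)
u = (v[0] / norm, v[1] / norm)

# Steps 2 and 3: gradient evaluated at the point.
gx, gy = grad_f(x0, y0)

# Step 4: the dot product with the unit vector gives the slope.
slope = gx * u[0] + gy * u[1]

# Hand-derived closed form, for comparison.
closed_form = math.sqrt(3) * (1 + math.pi) / (2 * math.sqrt(13))

print(slope, closed_form)  # both roughly 0.995
```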

## Summary

• If you have some multivariable function, $f\left(x,y\right)$ and some vector in the function's input space, $\stackrel{\to }{\mathbf{\text{v}}}$, the directional derivative of $f$ along $\stackrel{\to }{\mathbf{\text{v}}}$ tells you the rate at which $f$ will change while the input moves with velocity vector $\stackrel{\to }{\mathbf{\text{v}}}$.
• The notation here is ${\mathrm{\nabla }}_{\stackrel{\to }{\mathbf{\text{v}}}}f$, and it is computed by taking the dot product between the gradient of $f$ and the vector $\stackrel{\to }{\mathbf{\text{v}}}$, that is, $\mathrm{\nabla }f\cdot \stackrel{\to }{\mathbf{\text{v}}}$.
• When the directional derivative is used to compute slope, be sure to normalize the vector $\stackrel{\to }{\mathbf{\text{v}}}$ first.

## Want to join the conversation?

• In example 3, is there an error?
cos((1/2) * (pi/3)) =/= 1/2

SqRt(3)/2 is what I get. I think you have taken the sin of pi/6 instead of cos.
• Yes, there is an error.
I found that ∇_v f(π/3, 1/2) = (√3 + π√3) / (2√13), or equivalently √3(1 + π) / (2√13).
• I'm still not sure why you have to normalize vector v when computing the directional derivative for slope. Isn't the directional derivative just computing the rate at which f will change while the input moves along v, which is a lengthier way of describing the slope?
• The derivative means instantaneous rate of change. If you move along a vector, the bigger its magnitude, the faster you travel, so at each instant you have a bigger instantaneous rate of change. But the slope is something different: you only care about the rise over run. Two vectors with different magnitudes have the same rise over run if they point in the same direction. So if we are using the derivative as a means to get the slope, we ignore the magnitude, because we only care about the direction. Hope my answer is clear.
• In example 3: slope, the magnitude of v should be sqrt(2^2+3^2) = sqrt(13). sqrt(4^2+3^2) = sqrt(25) = 5, not sqrt(13), and 4 is not part of the vector v.

On a side note, I'm glad to see I'm not the only one who works through an operation and then puts the result back in to the operation, as if it still needs to be solved. I've gotten a few KA exercises wrong that way, usually involving simple arithmetic after having taken care of the calculus.
• Even the u vector calculated is wrong as expressed but later on in the calculation the j component is corrected by using 3 instead of 2 divided by sqrt3 ........ :-)
• Why does he say, the vector "v" is the velocity vector?? I think in this context, it is the displacement vector. We are not considering time, just space.

If we just consider the graph (regardless of whether the real function represents, say, the profit from two inputs such as production and investment in an economics problem), we obtain the increment in the Z distance (the increment of the function) for a combined increment of distances in X and Y. Talking about velocity makes no sense here.
• It's helpful to think of v as a velocity vector, because if you move along v twice as fast, the resulting rate of change has to be twice as large (and it is, if you double the direction vector). But if you think of it as a distance, it is not intuitive that doubling the distance traveled would double the output.
• If h is an infinitesimal, why does the magnitude of v matter? Even if it did matter, wouldn't it be better to have the vector's magnitude approach zero too?
• Rate of change with h approaching zero is equivalent to the slope of a tangent. If you are using the rate of change on the original graph, h must be tiny. If you are using a tangent line, then h can be whatever, since the slope is constant.

v can't possibly be zero, since the zero vector has no direction. It has to have some length to retain its directional information. We decide 1 is the best choice because it's the most general number other than 0.

Slope is defined as rise/run, so it is also the rise when run = 1: rise/run = rise/1 = rise. We could of course have defined it as 2rise/run or run/rise, which would still retain all the useful information about how steep the graph is, but we defined it as rise/run, and so we have to use ||v|| = 1.
• So if I compute the directional derivative with a unit vector as my direction, I get the slope of the surface, right? If I don't use a unit vector, what do I get? I'm asking for a physical interpretation.
Thanks!
• I think the answer must be: the instantaneous rate of change.
• Example 2 calculates the directional derivative using the dot product of the gradient with the vector's components, yet Example 3, in calculating slope, converts the vector to a unit vector before the dot product. What's the difference between the directional derivative and the slope?