
### Course: Multivariable calculus > Unit 2

Lesson 3: Partial derivative and gradient (articles)

# Directional derivatives (going deeper)

A more thorough look at the formula for directional derivatives, along with an explanation for why the gradient gives the slope of steepest ascent.

## Formal definition of the directional derivative

There are a couple of reasons you might care about a formal definition. For one thing, really understanding the formal definition of a new concept can clarify what is actually going on. More importantly, it gives you the confidence to recognize when such a concept can and cannot be applied.
As a warm-up, let's review the formal definition of the partial derivative, say with respect to $x$:
$\frac{\partial f}{\partial x}(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}$
The connection between the informal way to read $\frac{\partial f}{\partial x}$ and the formal way to read the right-hand side is as follows:
| Symbol | Informal understanding | Formal understanding |
| --- | --- | --- |
| $\partial x$ | A tiny nudge in the $x$ direction. | A limiting variable $h$ which goes to $0$, and will be added to the first component of the function's input. |
| $\partial f$ | The resulting change in the output of $f$ after the nudge. | The difference between $f(x_0 + h, y_0)$ and $f(x_0, y_0)$, taken in the same limit as $h \to 0$. |
We could instead write this in vector notation, viewing the input point $\left({x}_{0},{y}_{0}\right)$ as a two-dimensional vector
$\mathbf{x}_0 = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$
Here ${\mathbf{\text{x}}}_{0}$ is written in bold to emphasize its vectoriness. It's a bit confusing to use a bold $\mathbf{\text{x}}$ for the entire input rather than some other letter, since the letter $x$ is already used in an un-bolded form to denote the first component of the input. But hey, that's convention, so we go with it.
Instead of writing the "nudged" input as $(x_0 + h, y_0)$, we write it as $\mathbf{x}_0 + h\hat{\mathbf{i}}$, where $\hat{\mathbf{i}}$ is the unit vector in the $x$-direction:
$\frac{\partial f}{\partial x}(\mathbf{x}_0) = \lim_{h \to 0} \frac{f(\mathbf{x}_0 + h\hat{\mathbf{i}}) - f(\mathbf{x}_0)}{h}$
In this notation, it's much easier to see how to generalize the partial derivative with respect to $x$ to the directional derivative along any vector $\vec{\mathbf{v}}$:
$\nabla_{\vec{\mathbf{v}}} f(\mathbf{x}_0) = \lim_{h \to 0} \frac{f(\mathbf{x}_0 + h\vec{\mathbf{v}}) - f(\mathbf{x}_0)}{h}$
In this case, adding $h\vec{\mathbf{v}}$ to the input for a limiting variable $h \to 0$ formalizes the idea of a tiny nudge in the direction of $\vec{\mathbf{v}}$.
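To see the definition in action, here is a quick numerical sketch. The function, point, and direction below are illustrative choices, not from the article; the difference quotient settles toward a limit as $h \to 0$:

```python
# Numerical check of the limit definition of the directional derivative.
# f, the point (x0, y0), and the direction (v1, v2) are illustrative choices.

def f(x, y):
    return x**2 + y**2

x0, y0 = 1.0, 2.0
v1, v2 = 0.6, 0.8   # a unit vector, though the definition allows any v

def diff_quotient(h):
    """The quantity inside the limit: [f(x0 + h*v) - f(x0)] / h."""
    return (f(x0 + h*v1, y0 + h*v2) - f(x0, y0)) / h

# As h -> 0 the quotient approaches the directional derivative,
# which for this f is v . (2*x0, 2*y0) = 0.6*2 + 0.8*4 = 4.4
for h in [1e-1, 1e-3, 1e-5]:
    print(h, diff_quotient(h))
```

For this particular $f$ the quotient works out to exactly $4.4 + h$, so the convergence is easy to see by eye.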

## Seeking connection between the definition and computation

Computing the directional derivative involves a dot product between the gradient $\nabla f$ and the vector $\vec{\mathbf{v}}$. For example, in two dimensions, here's what this would look like:
$\begin{aligned} \nabla_{\vec{\mathbf{v}}} f(x, y) &= \nabla f \cdot \vec{\mathbf{v}} \\ &= \begin{bmatrix} \frac{\partial f}{\partial x} \\[4pt] \frac{\partial f}{\partial y} \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \\ &= v_1 \frac{\partial f}{\partial x}(x, y) + v_2 \frac{\partial f}{\partial y}(x, y) \end{aligned}$
Here, $v_1$ and $v_2$ are the components of $\vec{\mathbf{v}}$.
$\vec{\mathbf{v}} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$
The central question is, what does this formula have to do with the definition given above?
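Before unpacking why, we can at least verify numerically that the dot-product formula matches the limit definition. The function, point, and vector here are illustrative assumptions:

```python
# Sketch: the dot-product formula agrees with the limit definition.
# The function and inputs are illustrative, not from the article.
import math

def f(x, y):
    return math.sin(x) * y

def grad_f(x, y):
    # Analytic gradient of sin(x) * y
    return (math.cos(x) * y, math.sin(x))

x0, y0 = 0.5, 1.5
v = (2.0, -1.0)        # any vector, not necessarily a unit vector

# Formula: directional derivative = grad f . v
gx, gy = grad_f(x0, y0)
formula = gx * v[0] + gy * v[1]

# Definition: difference quotient with a small h
h = 1e-7
definition = (f(x0 + h*v[0], y0 + h*v[1]) - f(x0, y0)) / h

print(formula, definition)   # the two agree to several decimal places
```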

## Breaking down the nudge

The computation for $\nabla_{\vec{\mathbf{v}}} f$ can be seen as a way to break down a tiny step in the direction of $\vec{\mathbf{v}}$ into its $x$ and $y$ components.
Specifically, you can imagine the following procedure:
1. Start at some point $\left({x}_{0},{y}_{0}\right)$.
2. Choose a tiny value $h$.
3. Add $hv_1$ to $x_0$, which means stepping to the point $(x_0 + hv_1, y_0)$. From what we know of partial derivatives, this will change the output of the function by about
$hv_1 \left(\frac{\partial f}{\partial x}(x_0, y_0)\right)$
4. Now add $hv_2$ to $y_0$ to bring us up/down to the point $(x_0 + hv_1, y_0 + hv_2)$. The resulting change to $f$ is now about
$hv_2 \left(\frac{\partial f}{\partial y}(x_0 + hv_1, y_0)\right)$
Adding the results of steps $3$ and $4$, the total change to the function upon moving from the input $\left({x}_{0},{y}_{0}\right)$ to the input $\left({x}_{0}+h{v}_{1},{y}_{0}+h{v}_{2}\right)$ has been about
$hv_1 \left(\frac{\partial f}{\partial x}(x_0, y_0)\right) + hv_2 \left(\frac{\partial f}{\partial y}(x_0 + hv_1, y_0)\right)$
This is very close to the expression for the directional derivative, which says the change in $f$ due to this step $h\vec{\mathbf{v}}$ should be about
$\begin{aligned} h \nabla_{\vec{\mathbf{v}}} f(x_0, y_0) &= h \vec{\mathbf{v}} \cdot \nabla f(x_0, y_0) \\ &= hv_1 \frac{\partial f}{\partial x}(x_0, y_0) + hv_2 \frac{\partial f}{\partial y}(x_0, y_0) \end{aligned}$
However, this differs slightly from the result of our step-by-step argument, in which the partial derivative with respect to $y$ is taken at the point $\left({x}_{0}+h{v}_{1},{y}_{0}\right)$, not at the point $\left({x}_{0},{y}_{0}\right)$.
Luckily, we are considering very, very small values of $h$; more technically, we should be talking about the limit as $h \to 0$. So evaluating $\frac{\partial f}{\partial y}$ at $(x_0 + hv_1, y_0)$ is almost the same as evaluating it at $(x_0, y_0)$, and the difference between the two vanishes as $h \to 0$, provided the partial derivatives of $f$ are continuous.
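The two-step argument above can also be checked numerically: compute the $x$-nudge change and the $y$-nudge change separately, and compare their sum to $h\,\nabla f \cdot \vec{\mathbf{v}}$. The function and values here are illustrative assumptions:

```python
# The two-step nudge, numerically: step in x first, then in y,
# and compare the summed changes to h * (grad f . v).
# Function, point, and vector are illustrative choices.
import math

def fx(x, y):
    # partial of f(x, y) = e^x cos(y) with respect to x
    return math.exp(x) * math.cos(y)

def fy(x, y):
    # partial of f(x, y) = e^x cos(y) with respect to y
    return -math.exp(x) * math.sin(y)

x0, y0 = 0.3, 0.7
v1, v2 = 1.0, 2.0
h = 1e-5

step_x = h * v1 * fx(x0, y0)          # change from the x-nudge
step_y = h * v2 * fy(x0 + h*v1, y0)   # change from the y-nudge, at the shifted point
two_step = step_x + step_y

dot_form = h * (v1 * fx(x0, y0) + v2 * fy(x0, y0))

print(two_step, dot_form)   # the discrepancy shrinks like h^2
```

The gap between the two numbers comes entirely from evaluating $\frac{\partial f}{\partial y}$ at the shifted point, and it is of order $h^2$, which is exactly why it disappears in the limit.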

## Why does the gradient point in the direction of steepest ascent?

Having learned about directional derivatives, we can now understand why the direction of the gradient is the direction of steepest ascent.
Specifically, here's the question at hand.
Setup:
• Let $f$ be some scalar-valued multivariable function, such as $f\left(x,y\right)={x}^{2}+{y}^{2}$.
• Let $(x_0, y_0)$ be a particular input point.
• Consider all possible directions, i.e. all unit vectors $\stackrel{^}{\mathbf{\text{u}}}$ in the input space of $f$.
Question (informal): If we start at $\left({x}_{0},{y}_{0}\right)$, which direction should we walk so that the output of $f$ increases most quickly?
Question (formal): Which unit vector $\stackrel{^}{\mathbf{\text{u}}}$ maximizes the directional derivative along $\stackrel{^}{\mathbf{\text{u}}}$?
$\nabla_{\hat{\mathbf{u}}} f(x_0, y_0) = \underbrace{\hat{\mathbf{u}} \cdot \nabla f(x_0, y_0)}_{\text{Maximize this quantity}}$
The Cauchy-Schwarz inequality tells us that this dot product is maximized by the unit vector pointing in the direction of $\nabla f(x_0, y_0)$.
Notice, the fact that the gradient points in the direction of steepest ascent is a consequence of the more fundamental fact that every directional derivative is computed by taking a dot product with $\nabla f$.
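As a sketch of this fact, the code below scans unit vectors $\hat{\mathbf{u}}(\theta)$ over a fine grid of angles and confirms the maximizing direction coincides with the gradient's direction, with maximum value $\|\nabla f\|$. The function and point are illustrative choices:

```python
# Scan unit vectors u(theta) and confirm the directional derivative
# u . grad f is largest when u points along grad f.
# The gradient below is for the illustrative choice f(x, y) = x^2 + y^2.
import math

def grad_f(x, y):
    return (2*x, 2*y)

x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)

best_theta, best_val = None, -float("inf")
for k in range(3600):
    theta = 2 * math.pi * k / 3600
    u = (math.cos(theta), math.sin(theta))   # unit vector at angle theta
    val = u[0]*gx + u[1]*gy                  # directional derivative along u
    if val > best_val:
        best_theta, best_val = theta, val

grad_dir = math.atan2(gy, gx)            # direction of the gradient itself
print(best_theta, grad_dir)              # nearly equal
print(best_val, math.hypot(gx, gy))      # the maximum value is ||grad f||
```

Note the bonus fact visible in the last line: the maximal rate of increase is the magnitude of the gradient.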

## Want to join the conversation?

• I'm having trouble understanding the 3rd step under the formal argument. If we move hv1 in the x direction, how does this imply that the output will be hv1*fx(x0,y0)? (Sorry for the notation - I'm on my phone).
• That is the definition of the derivative. Remember:
fₓ(x₀,y₀) = lim_Δx→0 [(f(x₀+Δx,y₀)-f(x₀,y₀))/Δx]
Then, since both Δx and hv₁ are tiny quantities approaching 0, we can use hv₁ as the nudge, giving the approximation:
fₓ(x₀,y₀) ≈ (f(x₀+hv₁,y₀)-f(x₀,y₀))/(hv₁)
We can then rearrange this to get:
f(x₀+hv₁,y₀) ≈ f(x₀,y₀) + hv₁ × fₓ(x₀,y₀)
so the change in output is about hv₁ × fₓ(x₀,y₀).
• Can anyone help me understand how to solve the puzzle at the end of the article? I am having trouble understanding it.
• I had trouble with this puzzle too, but then I thought about it in terms of vectors. We need to maximize 100A + 20B + 2C, right? By definition of the dot product, this expression is equal to the dot product of two vectors [100, 20, 2] * [A, B, C]. So we want to maximize the dot product. When does the dot product have the maximum value? It is maximum when two vectors are parallel, or, in other words, one vector is multiple of the other (this can be understood from the graphical interpretation of the dot product). Therefore, our vector [A, B, C] should be [100x, 20x, 2x], where x is some number.

The second insight is to express A^2 + B^2 + C^2 = 10404 equation in vector notation. Expression A^2 + B^2 + C^2 is equal to the dot product of vector [A,B,C] with itself:

[A,B,C]*[A,B,C] = 10404.

Here instead of [A,B,C] we substitute [100x, 20x, 2x] vector and solve for x:

[100x, 20x, 2x] * [100x, 20x, 2x] = 10000x^2 + 400x^2 + 4x^2 = 10404x^2 = 10404,
so x^2 = 1 and, taking the positive root, x = 1.

Therefore, our vector [A,B,C] is [100 * 1, 20 * 1, 2 * 1] = [100, 20, 2].

A = 100
B = 20
C = 2.

I hope it wasn't confusing. :)
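The answer above can be checked numerically. This sketch assumes, as the answer describes, that the puzzle is to maximize 100A + 20B + 2C subject to A² + B² + C² = 10404:

```python
# Check of the answer above: by Cauchy-Schwarz, the maximizer of
# 100A + 20B + 2C on the sphere A^2 + B^2 + C^2 = 10404 is the
# point on the sphere in the direction of (100, 20, 2).
import math

g = (100.0, 20.0, 2.0)
radius = math.sqrt(10404)                  # = 102
norm_g = math.sqrt(sum(c*c for c in g))    # also sqrt(10404) = 102

# Scale g to land on the sphere of that radius
A, B, C = (radius * c / norm_g for c in g)
print(A, B, C)               # 100, 20, 2 -- matching the answer above
print(100*A + 20*B + 2*C)    # the maximum value, 10404
```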
• I don't understand this sentence: The famous triangle inequality tells us that this will be maximized by the unit vector in the direction (nabla) f (x0, y0)

To me, this doesn't seem obvious at all, can I find explanation perhaps somewhere else on the khan academy? I have looked at triangle inequality ||x + y|| <= ||x||+||y|| , but I don't understand how the two things are related, other than that both somehow talk about vectors. Directional derivative however works with dot products and not adding vectors together, as in the triangle inequality, so I don't immediately see the connection between the two.
• I'm in the same boat and don't see how the triangle inequality can be applied; however, the closely related Cauchy-Schwarz inequality works. The Cauchy-Schwarz inequality states that
x·y <= ||x|| * ||y||
for any two vectors x and y. Cauchy-Schwarz also says the inequality becomes an equality,
x·y = ||x|| * ||y||,
exactly when x and y point in the same direction.
• For a given gradient vector, I can understand that any unit vector in the direction of gradient vector will give maximum value of dot product between itself and gradient vector.
But how does that prove that the gradient vector itself is the direction of maximum ascent?
• I don't think that quite matters. If a unit vector in the direction of the gradient vector is the direction of greatest ascent, then moving in the direction of the gradient vector is also moving in the direction of greatest ascent. One is just a positive multiple of the other; they still point in the same direction.
• Hi, I've a question on the following point: "Computing the directional derivative involves a dot product between the gradient and the vector v." When I look at the definition of the dot product, it says |a|·|b|·cos(theta),
but in this formula no cos(theta) is involved. Does that mean the gradient (which has the partial derivatives with respect to x and y) is not considered a vector here? Or is it? If yes, why is cos(theta) not involved here?
• Isn't it good to color "h" in black in the figure just below the subtitle of "Breaking down the nudge" for the consistency? Just a suggestion.
• If you meant that we should bold h like 𝐡, then this is my answer: No, because h is a scalar value, and we're taking the limit as h approaches the (scalar value) zero. This corresponds to the nudge in the input, which is the vector h𝐯, approaching the vector value 𝟎. h𝐯 is equal to the input nudge direction 𝐯 scaled (multiplied) by the step size h.
• Step 4: how does adding the change in the x direction and the y direction give us the total change in the function? We are adding "perpendicular numbers", not vectors, so shouldn't the addition look more like Pythagoras?
• The function must locally be essentially linear, i.e., there must be a linear approximation
L(x) = f(a) + Df(a)(x − a).
Ignoring the perpendicularity and adding the changes directly makes sense for L(x).
• I have more of a conceptual question. If we think a partial derivative as a little nudge to a certain direction and that nudge (h) is approaching 0, why does that concept not transfer to directional derivative? Namely, why would the directional derivative of 2v twice as big as that of v? If we think of the nudge in v as so tiny that goes to 0, why would 2v versus v even matter?
• To gain an intuitive understanding of that particular concept, you'd need to consider how single-variable differentiation deals with constant multipliers. For example, d∕dx [c·f(x)] = c·f′(x) for any constant c.

In the same respect, the scaling factor of the directional vector v is taken into consideration. Note that while we're dealing with something approaching 0, it doesn't ever quite get there.
• I think this part may be wrong (or at least I don't fully understand it):

"However, this differs slightly from the result of our step-by-step argument, in which the partial derivative with respect to y is taken at the point (x_0 + hv_1, y_0) not at the point (x_0, y_0)"

I would think we could treat the change in y just like we treated the change in x -- they are in essence happening at the same time -- and how would you chose the order anyway?
• The thing is that lim ℎ→0 𝑥₀ + ℎ𝑣₁ = 𝑥₀

With 𝒗 = (𝑣₁, 𝑣₂)
𝛻𝒗 𝑓(𝑥₀, 𝑦₀) = lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀ + ℎ𝑣₂) − 𝑓(𝑥₀, 𝑦₀)]∕ℎ

= lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀ + ℎ𝑣₂) − 𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀) + 𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀) − 𝑓(𝑥₀, 𝑦₀)]∕ℎ

= lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀ + ℎ𝑣₂) − 𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀)]∕ℎ
+ lim ℎ→0 [𝑓(𝑥₀ + ℎ𝑣₁, 𝑦₀) − 𝑓(𝑥₀, 𝑦₀)]∕ℎ

= lim ℎ→0 𝑣₂ ∙ 𝜕𝑓∕𝜕𝑦(𝑥₀ + ℎ𝑣₁, 𝑦₀) + 𝑣₁ ∙ 𝜕𝑓∕𝜕𝑥(𝑥₀, 𝑦₀)

= 𝑣₂ ∙ 𝜕𝑓∕𝜕𝑦(𝑥₀, 𝑦₀) + 𝑣₁ ∙ 𝜕𝑓∕𝜕𝑥(𝑥₀, 𝑦₀)