# Why the gradient is the direction of steepest ascent

The way we compute the gradient seems unrelated to its interpretation as the direction of steepest ascent. Here you can see how the two relate. Created by Grant Sanderson.

## Want to join the conversation?

• I did not get the logic in this proof.
Consider two vectors (a) and (b).
(Imagine you still don't know that (a).(b) gives a directional derivative.)

Now (a).(b) is maximum when (b) is in the direction of (a). I completely agree.
But how can you then say (a) must be the direction of steepest ascent?
This is all I understood from this video.
It seems as if he proved the gradient points in the direction of steepest ascent by assuming it in the first place.
• I know this question was asked a while ago, but I wanted to give it a shot.

The question we should start by asking is not "Why is the gradient the direction of steepest ascent?" but instead "What unit vector gives the direction of steepest ascent at a given point?" So, what unit vector gives the direction of steepest ascent at a given point? First off, how do we measure steepness? With slope, which in this context is given by the directional derivative at that point. This means we're looking for the unit vector that maximizes the directional derivative. So, how do we calculate the directional derivative? It's the dot product of the gradient and the unit vector.

A point of confusion that I had initially was mixing up the gradient and the directional derivative, and seeing the directional derivative as the magnitude of the gradient. This is not correct at all. Visualizing a plane, a single point has just one gradient vector corresponding to it. However, depending on the direction you are facing (left, right, down, or up), the directional derivative can be completely different.

Going back to the problem, we're now looking for a vector that would maximize (gradient) dot (vector) at a specific point. Since we are looking at a single point, the gradient part of it is constant. The vector is the only variable. As you have stated, the maximum value would occur if the vector was in the direction of the gradient.

There you have it. At a given point, the direction of steepest ascent is in the same direction as the gradient. Or, another way of putting it, the gradient is the direction of steepest ascent.
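
To make the argument above concrete, here is a minimal numerical sketch. The function f(x, y) = x²y and the point (1, 2) are illustrative assumptions, not taken from the video. The code holds the gradient fixed, scans unit vectors around the circle, and checks which one gives the largest dot product.

```python
import math

def grad_f(x, y):
    # Gradient of the illustrative function f(x, y) = x**2 * y
    # (an assumed example, not from the video).
    return (2 * x * y, x ** 2)

def directional_derivative(grad, angle):
    # Dot product of the gradient with the unit vector (cos(angle), sin(angle)).
    return grad[0] * math.cos(angle) + grad[1] * math.sin(angle)

grad = grad_f(1.0, 2.0)  # the gradient is fixed once the point is chosen

# Scan unit vectors in 0.1-degree steps and keep the one with the largest slope.
angles = [math.radians(k / 10) for k in range(3600)]
best_angle = max(angles, key=lambda a: directional_derivative(grad, a))

gradient_angle = math.atan2(grad[1], grad[0])
best_slope = directional_derivative(grad, best_angle)
grad_norm = math.hypot(grad[0], grad[1])

print("steepest direction (degrees):", math.degrees(best_angle))
print("gradient direction (degrees):", math.degrees(gradient_angle))
print("largest slope vs. gradient magnitude:", best_slope, grad_norm)
```

Up to the resolution of the scan, the best direction should match the direction of the gradient, and the largest slope should match the gradient's magnitude.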
• I found this explanation a bit backwards. This is the way I see it.
By taking partial derivatives in the [1,0] and [0,1] directions (two perpendicular vectors, so everything in the 2D plane is covered), we find out how much the function changes when x and y each increase a little. If a nudge in the y direction increases the function 4 times as much as the same nudge in the x direction, it's pretty easy to figure out that the fastest way to "climb" is to move in a 4:1 ratio of y to x.
That's all the gradient is: the ratios of output change to input change in each direction, which we interpret as the components of a vector.
The directional derivative proves nothing to me except that the dot product is biggest when the angle is smallest. The gradient is the direction of steepest ascent because of the nature of ratios of change.
If I want the magnitude of the biggest change, I just take the magnitude (length) of the gradient. If I want the unit vector in the direction of steepest ascent (as used in the directional derivative), I divide the gradient's components by its magnitude.
• In which direction should you walk to descend the fastest? My homework said it's the negative of the gradient vector, but my textbook says that moving in the opposite direction of the gradient vector results in a minimum rate of change in the direction you're walking, not the maximum.
• Both are correct, but your textbook put it in a way that seems a bit confusing. Moving in the direction of the gradient will give you the greatest rate of increase, and thus going in the opposite direction will give you the greatest rate of decrease. And the greatest rate of decrease is the minimum rate of change because that is when the rate of change is most negative.

As an example, let's say you are hiking up a mountain. Imagine the top of the mountain is to the north, so the gradient points north, and imagine it has a magnitude of 0.5, meaning that for each meter you move north, you will rise 0.5 meters. So if you walk in the opposite direction, the rate of change will be -0.5, and that is the minimum of all possible rates of change. If you walk east or west, the rate of change will be 0, which would be the minimum possible magnitude for the rate of change. But -0.5 is less than 0.
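
The hiking example can be checked with a few lines of code. This is only a sketch under the assumption that x points east and y points north, so the gradient is (0, 0.5). The names `directions` and `rates` are made up for illustration.

```python
# Assumed coordinates: x is east, y is north, so a gradient pointing
# north with magnitude 0.5 is the vector (0, 0.5).
grad = (0.0, 0.5)

# Unit vectors for the four compass directions.
directions = {
    "north": (0.0, 1.0),
    "south": (0.0, -1.0),
    "east": (1.0, 0.0),
    "west": (-1.0, 0.0),
}

# Rate of change in each direction = gradient dotted with the unit vector.
rates = {name: grad[0] * v[0] + grad[1] * v[1] for name, v in directions.items()}

steepest_ascent = max(rates, key=rates.get)
steepest_descent = min(rates, key=rates.get)

for name, rate in rates.items():
    print(name, rate)
print("most positive rate:", steepest_ascent)
print("most negative rate:", steepest_descent)
```

South has the most negative rate (the "minimum rate of change" in the textbook's wording), while east and west have the smallest magnitude.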
• That explanation does not make sense. It's circular reasoning, no? You already assume the gradient vector is the vector of steepest ascent/descent, and then explain that any projection or dot product is maximized when the other vector points in the same direction. Yes, obviously, but you didn't explain WHY the vector of partial derivatives, the gradient, actually IS the vector of steepest ascent in the first place.
• I love these videos! However, in this case, as far as I understand, it is circular reasoning, because his conclusion is the same as his assumption. I think there is a problem here with confusing what we are maximizing: the length of the dot product, or the change in the function value.

For the "steps in a direction" part, when we talk about finding where [3 5] takes you, we describe the location by saying "3 steps in the x direction and 5 steps in the y direction", but when you want to move in that direction, you don't walk in the x direction and then walk in the y direction. Instead you walk directly in the direction of the endpoint of [3 5], or about 59 degrees. That will make you ascend even faster than walking in the y direction.
• This video basically says that gradient is direction of steepest ascent BY DEFINITION of the directional derivative (to be clear, I'm referring to the informal definition of the directional derivative, which is the dot product of directional vector v and gradient). Note that slope and directional derivative (with unit vector direction) are synonymous ideas.

The logic is as follows: "Trust me when I say this, slope is the dot product of gradient and direction. We know that dot product is maximized when the vectors are parallel. Therefore, slope is maximized when direction is parallel to gradient."

What isn't exactly clear to me is why the informal definition itself is a correct way to compute the slope of the function in direction v. (I guess it kind of makes sense, as it measures a weighted sum of how much I'm taking advantage of going in the x direction and how much I'm taking advantage of going in the y direction, but it isn't mathematically clear how this weighted sum is a reliable measure of the actual graph's slope in that direction.) I DO, however, understand how the formal limit definition (which was explained in a previous video) is a valid way of computing the slope in a particular direction.

My question is: how is the dot product definition of the directional derivative equivalent to the limit definition of the directional derivative?
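
One way to see the connection, assuming f is differentiable at the point: the limit definition is the ordinary derivative of h ↦ f(a + hv) at h = 0, and the multivariable chain rule turns that derivative into (∂f/∂x)·v₁ + (∂f/∂y)·v₂, which is the dot product of the gradient with v. The sketch below checks this numerically rather than proving it. The function, the point, and the direction are illustrative assumptions.

```python
import math

def f(x, y):
    # Illustrative differentiable function (an assumption for this sketch).
    return x ** 2 * y + math.sin(y)

def grad_f(x, y):
    # Partial derivatives of f, computed by hand.
    return (2 * x * y, x ** 2 + math.cos(y))

def limit_quotient(x, y, v, h):
    # Difference quotient from the limit definition: (f(a + h*v) - f(a)) / h.
    return (f(x + h * v[0], y + h * v[1]) - f(x, y)) / h

point = (1.0, 2.0)
v = (3 / 5, 4 / 5)  # a unit vector

g = grad_f(*point)
dot_value = g[0] * v[0] + g[1] * v[1]

# Shrink h and watch the quotient approach the dot product.
step_sizes = (1e-1, 1e-2, 1e-3, 1e-4)
quotients = [limit_quotient(*point, v, h) for h in step_sizes]
errors = [abs(q - dot_value) for q in quotients]

for h, q, e in zip(step_sizes, quotients, errors):
    print(h, q, e)
print("gradient dot v:", dot_value)
```

The gap between the difference quotient and the dot product should shrink roughly in proportion to h, which is what you would expect if both definitions describe the same slope.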
• The fact that the gradient is in the direction of steepest ascent is inherent to its very definition, so to prove it by reference to the rule for computing directional derivatives seems to me a bit like begging the question. Surely, the gradient points in the direction of steepest ascent because the partial derivatives provide the maximum increases to the value of the function at a point and summing them means advancing in both of their specific directions at the same time.
• I think unfortunately I do not have the intuition of how to maximize the dot product. Where can I find the videos mentioned at ?
• I read that the gradient is orthogonal/normal to the tangential plane. How is that possible if it points in the direction of steepest ascent?
• If you look at the method to find a tangent plane, and then the method to find a normal vector to a plane in general you'll see the link. The gradient isn't directly normal, but if you have it in the form <df(A)/dx, df(A)/dy, -1> you get the normal vector. A here is whatever point you are measuring from on the surface. Here is a video from the linear algebra playlist on finding the normal vector from a plane. (Pretend my derivatives are partial derivatives)

Start at around to be at the point he has a general plane equation, then just keep in mind what A, B and C are. I hope this helped.
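
As a rough check of the <df(A)/dx, df(A)/dy, -1> form mentioned above, the sketch below uses an assumed surface z = x² + y² and an assumed point A = (1, 2). It builds two tangent directions to the graph at A and confirms that the vector is perpendicular to both. Note that the gradient itself has only two components and lives in the xy-plane, where it points in the direction of steepest ascent. The three-component vector is the one that is normal to the tangent plane.

```python
def grad_f(x, y):
    # Gradient of the illustrative surface z = f(x, y) = x**2 + y**2
    # (an assumed example, not from the comment above).
    return (2 * x, 2 * y)

def dot(a, b):
    # Dot product of two vectors of equal length.
    return sum(p * q for p, q in zip(a, b))

A = (1.0, 2.0)
fx, fy = grad_f(*A)

# Candidate normal to the tangent plane at A.
normal = (fx, fy, -1.0)

# Tangent directions to the graph: step in x (or y) and rise by fx (or fy).
tangent_x = (1.0, 0.0, fx)
tangent_y = (0.0, 1.0, fy)

print(dot(normal, tangent_x), dot(normal, tangent_y))
```

Both dot products come out to zero, so the vector is orthogonal to the tangent plane at A.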