Gradient and graphs

Learn how the gradient can be thought of as pointing in the "direction of steepest ascent". This is a rather important interpretation of the gradient. Created by Grant Sanderson.

Want to join the conversation?

  • Noah Baker-Kang
    I don't see how the gradient is the "direction of steepest ascent". Wouldn't the direction of the gradient vector just indicate how the value of the function is changing as the inputs change? I guess what I don't understand is the "steepest ascent" part.
    (23 votes)
    • Luke S.
      Let's imagine we are climbing up the side of a pyramid in Egypt. We'll also imagine it is perfectly smooth, so it only slopes up toward the peak (for simplicity, we'll say the slope towards the peak is 1). Since the pyramid is smooth, it's easy to realize that it is steepest when you walk directly up toward the peak. This is the "steepest ascent."
      Now, imagine that from our point of view, taking a step forward is moving in the +y direction, and taking a step to the right is the +x direction. If we stand so we are facing the peak, then a step forward (+y) has a slope of 1, and a step right or left has no slope. This means partial x is 0, and partial y is 1. Then the gradient is the vector [0, 1], which points straight ahead, toward the peak, which we noticed is the "steepest ascent."
      Next, imagine we turn 90 degrees to the left. This means the peak is now directly to the right. If we step right, the slope must be 1, and if we step forward or backward, there is no slope. This makes our gradient [1, 0], which is a vector pointing to our right, again directly at the peak.
      Ok, but what if we had only turned 45 degrees, so we're at a funny angle on the pyramid? Well, now a step forward brings us up a little bit, but not straight up toward the peak. The slope is about 0.71 (cos 45°, or 1/√2). A step to the right also brings us up a little bit, with a slope of about 0.71. The gradient is now [0.71, 0.71], which points at an angle toward our front-right... which is the direction of the peak, and its length is still 1, the slope straight up the face. In fact, no matter how we turn, if we just check the slope of a step forward and the slope of a step to the right, our gradient vector points straight toward the peak.
      But what if we were climbing something other than a pyramid? What if it was something bumpy? No matter how bumpy the thing we're climbing is, we can always pick the point we're standing on, and imagine taking one side off a tiny little pyramid, and setting it so that it matches the slope at the place we're standing. Then the gradient on that little pyramid side will have to point in whatever direction is steepest from where we're standing.
      (68 votes)
  • Lucas Muehleisen
    What direction would a gradient vector point at the top (a maximum) of the graph?
    (9 votes)
  • Andy Medjedo
    What if you had 2 "tops" of the graph, but 1 is much, much larger than the other? Would the vector field tell you to go directly to the highest top? Or would it just find the maximum you're closest to? Also, is there a gradient for a minimum instead of a maximum? Or is it the same thing?
    (7 votes)
  • Divyansh
    Why is there no z component in the equation x^2 + y^2?
    (3 votes)
    • loumast17
      There is. Just as in two-dimensional equations you have y equal to some function of x, in 3D graphs, if you just have a function in terms of x and y, you can think of it as saying z = x^2 + y^2. This is the function graphed in the video.

      There are also times when z appears alongside x and y: just as x^2 + y^2 = 1 is the unit circle in 2D, x^2 + y^2 + z^2 = 1 is the unit sphere in 3D.
      (4 votes)
  • Gregory Edwards
    What is the equation of the second graph that was in this video (about 5 mins in)?
    (4 votes)
  • enidad20
    Where are gradient vectors located?
    (2 votes)
    • loumast17
      The gradient has one component per input variable, so it lives in the input space, which is one dimension lower than the space the graph is drawn in. For f(x,y), whose graph is 3D (in R3), the gradient is 2D, so it is standard to draw the vectors in the xy-plane, which is R2. These vectors have no z coordinate, just x and y.

      If you had a function f(x,y,z), you couldn't actually graph it, since we can't really graph in R4, but you could graph its gradient vectors, and they would be spread throughout R3, the x,y,z space.

      Let me know if that did not help.
      (3 votes)
  • johan
    Why does the gradient only give you the steepest ascent and not descent?
    (2 votes)
  • Paramvir Singh
    So does this mean vector fields are used to interpret the slope of a multivariable function?
    (2 votes)
  • Es Mohamed
    Do we say the gradient refers to the maximum because each point has infinitely many tangents and we only take the tangents in the i and j directions?
    (2 votes)
  • eternalshenron
    I am unable to imagine why the vector created by partial derivatives should always point to the steepest ascent. I want to give an example to justify my concern.

    Suppose you are standing at the position (0,0). There are smooth slopes along the x and y axes, with a slope of 1 each. But these slopes are very narrow and the rest of the field is flat. So, for example, (0.1,1) will be flat but (0,1) will have a slope of 1. Similarly, (1,0.1) will be flat but (1,0) will have a slope of 1. So the paths of steepest ascent are along either the x axis or the y axis. But when you calculate the vector, it comes out at a 45 degree angle, since both the partial derivatives with respect to x and y are exactly the same. What am I doing wrong here?

    Note: I am very new to vectors and partial derivatives. I know there is some misconception on my part that is creating this scenario. I would like to know what that error is and how partial derivatives will give us a right answer in the example that I have given.
    (2 votes)
    • [name]
      Based on your explanation of your function, I think you are piecewise-defining a function like this:
      For x=0, f(x,y)=y
      For y=0, f(x,y)=x
      Else, f(x,y)=0
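      If you take the gradient of this function, you get [0 0] everywhere off the axes. Along x=0 the slope in the y direction is 1, and along y=0 the slope in the x direction is 1, which matches your explanation and the graph! The catch is that the function jumps as soon as you step off an axis, so it is not differentiable there. At the origin both partial derivatives equal 1, which is where your 45 degree vector comes from, but the gradient only points in the direction of steepest ascent where the function is differentiable.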

      Hope this helps :)
      (1 vote)
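
The pyramid example in the discussion above can be checked with a few lines of Python. This is only a minimal sketch: it assumes a single flat face that rises with slope 1 straight toward the peak, and the function name `gradient_in_rotated_frame` is made up for illustration. It computes the slope of a step to the right and of a step forward after the climber turns left by some angle. At 45 degrees each slope works out to cos 45° ≈ 0.71, so the gradient keeps length 1 however the climber is turned.

```python
import math


def gradient_in_rotated_frame(turn_degrees, slope=1.0):
    """Return the gradient as (right, forward) components for a climber
    standing on a flat face that rises with the given slope toward the peak.

    The climber starts out facing the peak and then turns left by
    turn_degrees, so the peak ends up that many degrees to the right
    of straight ahead.
    """
    theta = math.radians(turn_degrees)
    # Slope of a step to the right and of a step forward: the uphill
    # direction expressed in the climber's own axes.
    right = slope * math.sin(theta)
    forward = slope * math.cos(theta)
    return right, forward


for turn in (0, 45, 90):
    right, forward = gradient_in_rotated_frame(turn)
    length = math.hypot(right, forward)
    print(f"turned {turn} deg left: gradient = [{right:.2f}, {forward:.2f}], "
          f"length = {length:.2f}")
```

Whatever the turn angle, the two components always describe the same arrow on the ground: the one pointing at the peak.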

Video transcript

- [Voiceover] So here I'd like to talk about what the gradient means in the context of the graph of a function. So in the last video, I defined the gradient, but let me just take a function here. And the one that I had graphed is x-squared plus y-squared, f of x, y, equals x-squared plus y-squared. So two-dimensional input, which we think about as being kind of the xy-plane, and then a one-dimensional output that's just the height of the graph above that plane.

And I defined in the last video, the gradient, to be a certain operator. An operator just means you take in a function and you output another function, and we use this upside down triangle. So it gives you another function that's also of x and y, but this time it has a vector-valued output. And the two components of its output are the partial derivatives, partial of f with respect to x, and the partial of f with respect to y.

So for a function like this, we actually evaluated it. Let's take a look. The first one is taking the derivative with respect to x, so it looks at x and says, "You look like a variable to me. I'm gonna take your derivative, your 2x." 2x, but the y component just looks like a constant as far as the partial x is concerned. And the derivative of a constant is zero. But when you take the partial derivative with respect to y, things reverse. It looks at the x component and says, "You look like a constant. Your derivative is zero." But it looks at the y component and says, "Ah, you look like a variable. Your derivative is 2y."

So this ultimate function we get, the gradient, which takes in a two-variable input, x, y, some point on this plane, but outputs a vector, can nicely be visualized with a vector field. And I have another video on vector fields if you're feeling unsure. But I want you to just take a moment, pause if you need to, and guess, or try to think about what vector field this will look like.
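
As a quick check on the partial derivatives worked out above, the sketch below compares the gradient (2x, 2y) of f(x, y) = x^2 + y^2 against a central finite-difference estimate. The helper names `numerical_gradient` and `analytic_gradient`, the step size `h`, and the sample points are illustrative choices, not anything taken from the video.

```python
def f(x, y):
    """The function from the video: f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def analytic_gradient(x, y):
    """Gradient from the partial derivatives: (df/dx, df/dy) = (2x, 2y)."""
    return 2 * x, 2 * y


def numerical_gradient(func, x, y, h=1e-6):
    """Estimate each partial derivative by nudging one input at a time
    while holding the other fixed (central differences)."""
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy


for point in [(1.0, 2.0), (-0.5, 3.0), (0.0, 0.0)]:
    exact = analytic_gradient(*point)
    approx = numerical_gradient(f, *point)
    print(f"point {point}: analytic {exact}, numerical {approx}")
```

Holding one input fixed while nudging the other is the "you look like a constant" step from the transcript, written out as arithmetic.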
I'm gonna show you in a moment, but what's it gonna look like, the one that takes in x, y and outputs 2x, 2y? Alright, have you done it, have you thought about what it's gonna look like? Here's what we get. It's a bunch of vectors pointing away from the origin. And the basic reason for that is that if you have any given input point, and say it's got coordinates x, y, then the vector that that input point represents would, you know, if it went from the origin here, that's what that vector looks like, but the output is two times that vector. So when we attach that output to the original point, we get something that's two times that original vector but pointing in the same direction, which is away from the origin. We kind of drew it poorly here. And of course, when we draw vector fields, we don't usually draw them to scale. You scale them down just so that things don't look as cluttered. That's why everything here, they all look the same length, but color indicates length. So you should think of these red guys as being really long, the blue ones as being really short.

So what does this have to do with the graph of the function? There's actually a really cool interpretation. So imagine that you are just walking along this graph, you know, you're a hiker and this is a mountain. And you picture yourself at any old point on this graph, let's say, what color should I use? Let's say you're sitting at a point like this. And you say, "What direction should I walk to increase my altitude the fastest?" You want to get uphill as quickly as possible. And from that point, you might walk what looks like straight up there. You certainly wouldn't go around, and this way you wouldn't go down. So you might go straight up there. And if you project your point down onto the input space, so this is the point above which you are, that vector, the one that's gonna get you going uphill the fastest, the direction you should walk, for this graph, it should kind of make sense, is directly away from the origin. Here, I'll erase this 'cause once I start moving things, that won't stick. If you were to look at things from the very bottom, any point that you are on the mountain on the graph here and when you want to increase the fastest, you should just go directly away from the origin 'cause that's when it's the steepest. And all of these vectors are also pointing directly away from the origin. So people will say the gradient points in the direction of steepest ascent, that might even be worth writing down. Direction of steepest ascent.

And let's just see what that looks like in the context of another example. So I'll pull up another graph here, pull up another graph and its vector field. So this graph, it's all negative values, it's all below the xy-plane, and it's got these two different peaks. And I've also drawn the gradient field, which is the word for the vector field representing the gradient on top. And you'll notice near the peak all of the vectors are pointing kind of in the uphill direction, sort of telling you to go towards that peak in some way. And as you get a feel around, you can see here, this very top one, like the point that it's stemming from corresponds with something just a little bit shy of the peak there. And everybody's telling you to go uphill. Each vector is telling you which way to walk to increase the altitude on the graph the fastest. It's the direction of steepest ascent.

And that's what the direction means, but what does the length mean? Well, if you take a look, take a look at these red vectors here. So red means that they should be considered very, very long. And the graph itself, the point they correspond to on the graph is just way off screen for us because this graph gets really steep and really negative very fast.
So the points these correspond to have really, really steep slopes whereas these blue ones over here, you know, it's kind of a relatively shallow slope. By the time you're getting to the peak, things start leveling off. So the length of the gradient vector actually tells you the steepness of that direction of steepest ascent.

But one thing I want to point out here, it doesn't really make sense immediately looking at it, why just throwing the partial derivatives into a vector is gonna give you this direction of steepest ascent. Ultimately it will. We're gonna talk through that and I hope to make that connection pretty clear, but unless you're some kind of intuitive genius, I don't think that connection is at all obvious at first. But you will see it in due time. It's gonna require something called the directional derivative. See you next video.
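
Until that explanation arrives, both claims (the gradient's direction is the steepest one, and its length is the steepness in that direction) can at least be tested by brute force. The sketch below is not a proof and not the method from the video. It simply tries many directions around one point on f(x, y) = x^2 + y^2, measures the slope of a small step in each, and compares the winner with the gradient (2x, 2y). The function name `steepest_direction_by_search`, the starting point (1, 2), the step size, and the number of sampled directions are all assumptions chosen for illustration.

```python
import math


def f(x, y):
    """The function from the video: f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def gradient(x, y):
    """Gradient of f from its partial derivatives: (2x, 2y)."""
    return 2 * x, 2 * y


def steepest_direction_by_search(func, x, y, step=1e-4, samples=3600):
    """Try evenly spaced unit directions around (x, y) and return the
    angle (in radians) and slope of the one that climbs fastest."""
    best_angle, best_slope = 0.0, -math.inf
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        dx, dy = math.cos(angle), math.sin(angle)
        # Rise over run for a small step in this direction.
        slope = (func(x + step * dx, y + step * dy) - func(x, y)) / step
        if slope > best_slope:
            best_angle, best_slope = angle, slope
    return best_angle, best_slope


x0, y0 = 1.0, 2.0
angle, slope = steepest_direction_by_search(f, x0, y0)
gx, gy = gradient(x0, y0)

print("steepest direction found by search (deg):", math.degrees(angle))
print("direction of the gradient (deg):         ", math.degrees(math.atan2(gy, gx)))
print("slope in the steepest direction:         ", slope)
print("length of the gradient:                  ", math.hypot(gx, gy))
```

With 3600 samples the search can only resolve directions to 0.1 degree, so the two angles should agree to within that resolution, and the best slope should land close to the gradient's length, which at (1, 2) is 2√5 ≈ 4.47.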