The gradient

The gradient stores all the partial derivative information of a multivariable function. But it's more than a mere storage device: it has several wonderful interpretations and many, many uses.

What you need to be familiar with before starting this lesson:

What we're building toward

  • The gradient of a scalar-valued multivariable function $f(x, y, \dots)$, denoted $\nabla f$, packages all its partial derivative information into a vector:
    $$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \\ \vdots \end{bmatrix}$$
    In particular, this means $\nabla f$ is a vector-valued function.
  • If you imagine standing at a point $(x_0, y_0, \dots)$ in the input space of $f$, the vector $\nabla f(x_0, y_0, \dots)$ tells you which direction you should travel to increase the value of $f$ most rapidly.
  • These gradient vectors, $\nabla f(x_0, y_0, \dots)$, are also perpendicular to the contour lines of $f$.

Definition

After learning that functions with a multidimensional input have partial derivatives, you might wonder what the full derivative of such a function is. In the case of scalar-valued multivariable functions, meaning those with a multidimensional input but a one-dimensional output, the answer is the gradient.
The gradient of a function $f$, denoted as $\nabla f$, is the collection of all its partial derivatives into a vector.
This is most easily understood with an example.

Example 1: Two dimensions

If $f(x, y) = x^2 - xy$, what is $\nabla f$?
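One way to work out the answer, as a sketch: take $f(x, y) = x^2 - xy$, differentiate with respect to each input in turn while treating the other input as a constant, then stack the results into a vector.

```latex
\begin{aligned}
\frac{\partial f}{\partial x} &= \frac{\partial}{\partial x}\left(x^2 - xy\right) = 2x - y \\
\frac{\partial f}{\partial y} &= \frac{\partial}{\partial y}\left(x^2 - xy\right) = -x \\
\nabla f(x, y) &= \begin{bmatrix} 2x - y \\ -x \end{bmatrix}
\end{aligned}
```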

Notice, $\nabla f$ is a vector-valued function, specifically one with a two-dimensional input and a two-dimensional output. This means it can be nicely visualized with a vector field. That vector field lives in the input space of $f$, which is the $xy$-plane.
This vector field is often called the gradient field of $f$.
Gradient of $f(x, y) = x^2 - xy$ as a vector field.
Reflection question: Why are the vectors in this vector field so small along the upward diagonal stripe in the middle of the xy-plane?

Example 2: Three dimensions

What is the gradient of $f(x, y, z) = x - xy + z^2$?
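Working through the three partial derivatives by hand for $f(x, y, z) = x - xy + z^2$ gives $\nabla f = \begin{bmatrix} 1 - y \\ -x \\ 2z \end{bmatrix}$. The sketch below is one way to sanity-check a hand-computed gradient like this: it compares it against a central-difference approximation. The helper names (`analytic_gradient`, `numerical_gradient`), the step size `h`, and the sample point are all choices made for this illustration.

```python
def f(x, y, z):
    """Example function f(x, y, z) = x - x*y + z**2."""
    return x - x * y + z ** 2


def analytic_gradient(x, y, z):
    """Partial derivatives of f computed by hand: (1 - y, -x, 2z)."""
    return (1 - y, -x, 2 * z)


def numerical_gradient(func, point, h=1e-6):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h   # nudge only the i-th input up
        backward[i] -= h  # and down, holding the others fixed
        grad.append((func(*forward) - func(*backward)) / (2 * h))
    return tuple(grad)


point = (2.0, -1.0, 3.0)  # arbitrary sample point, chosen for illustration
exact = analytic_gradient(*point)
approx = numerical_gradient(f, point)

print(exact)   # (2.0, -2.0, 6.0)
print(approx)  # should agree with the line above to several decimal places
```

Note that each component is found by changing one input at a time, which is exactly what "partial derivative" means.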

$\nabla f$ is a function with a three-dimensional input and a three-dimensional output. As such, it is nicely visualized with a vector field in three-dimensional space.

Interpreting the gradient

In each example above, we pictured $\nabla f$ as a vector field, but how do we interpret these vector fields?
More concretely, let's think about the case where the input of $f$ is two-dimensional. The gradient turns each input point $(x_0, y_0)$ into the vector
$$\nabla f(x_0, y_0) = \begin{bmatrix} \dfrac{\partial f}{\partial x}(x_0, y_0) \\ \dfrac{\partial f}{\partial y}(x_0, y_0) \end{bmatrix}.$$
What does that vector tell us about the behavior of the function around the point $(x_0, y_0)$?
Steepest ascent concept.
Think of the graph of $f$ as a hilly terrain. If you are standing on the part of the graph directly above (or below) the point $(x_0, y_0)$, the slope of the hill depends on which direction you walk. For example, if you step straight in the positive $x$-direction, the slope is $\dfrac{\partial f}{\partial x}$; if you step straight in the positive $y$-direction, the slope is $\dfrac{\partial f}{\partial y}$. But most directions are some combination of the two.
The most important thing to remember about the gradient: The gradient of $f$, if evaluated at an input $(x_0, y_0)$, points in the direction of steepest ascent.
So, if you walk in the direction of the gradient, you will be going straight up the hill. Similarly, the magnitude of the vector $\nabla f(x_0, y_0)$ tells you what the slope of the hill is in that direction.
It is not immediately clear why putting the partial derivatives into a vector gives you the direction of steepest ascent, but this will be explained once we get to directional derivatives.
When the inputs of a function f live in more than two dimensions, we can no longer comfortably picture its graph as hilly terrain. That said, the same underlying idea holds. Whether the input space of f is two-dimensional, three-dimensional, or 1,000,000-dimensional: the gradient of f gives a vector in that input space that points in the direction that makes the function f increase the fastest.
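Until that explanation arrives, the steepest-ascent claim can at least be tested numerically. The sketch below is only an illustration, not a proof: it takes $f(x, y) = x^2 - xy$ from Example 1, tries one walking direction per degree around an arbitrarily chosen point, measures the slope in each direction with a small finite step, and compares the steepest direction found with the direction and length of the hand-computed gradient. The function names, the sample point, and the step size `h` are assumptions made for this example.

```python
import math


def f(x, y):
    """Function from Example 1: f(x, y) = x**2 - x*y."""
    return x ** 2 - x * y


def gradient(x, y):
    """Hand-computed gradient of f: (2x - y, -x)."""
    return (2 * x - y, -x)


def slope_in_direction(x, y, angle, h=1e-6):
    """Approximate the slope of f at (x, y) when stepping along the unit vector at `angle`."""
    dx, dy = math.cos(angle), math.sin(angle)
    return (f(x + h * dx, y + h * dy) - f(x - h * dx, y - h * dy)) / (2 * h)


x0, y0 = 1.0, 3.0  # arbitrary sample point

# Try one direction per degree and keep the one with the largest slope.
angles = [math.radians(d) for d in range(360)]
best_angle = max(angles, key=lambda a: slope_in_direction(x0, y0, a))
best_slope = slope_in_direction(x0, y0, best_angle)

# Direction and length of the gradient vector at the same point.
gx, gy = gradient(x0, y0)
gradient_angle = math.atan2(gy, gx) % (2 * math.pi)
gradient_length = math.hypot(gx, gy)

print(math.degrees(best_angle), math.degrees(gradient_angle))
print(best_slope, gradient_length)
```

At this sample point the gradient is $(-1, -1)$, so both angles should come out at about 225 degrees and both slopes at about 1.414, matching the two claims above: the gradient's direction is the steepest one, and its magnitude is the slope in that direction.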

Example 3: What local maxima look like

Consider the function $f(x, y) = -x^4 + 4(x^2 - y^2) - 3$. What is its gradient?
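Here is a worked sketch of the answer. It assumes the function reads $f(x, y) = -x^4 + 4(x^2 - y^2) - 3$; the minus signs are an assumption chosen to match the two-peaked graph described in this example. Expanding gives $f = -x^4 + 4x^2 - 4y^2 - 3$, and each term can be differentiated separately.

```latex
\begin{aligned}
\frac{\partial f}{\partial x} &= \frac{\partial}{\partial x}\left(-x^4 + 4x^2 - 4y^2 - 3\right) = -4x^3 + 8x \\
\frac{\partial f}{\partial y} &= \frac{\partial}{\partial y}\left(-x^4 + 4x^2 - 4y^2 - 3\right) = -8y \\
\nabla f(x, y) &= \begin{bmatrix} -4x^3 + 8x \\ -8y \end{bmatrix}
\end{aligned}
```

Both components vanish at $(\sqrt{2}, 0)$ and $(-\sqrt{2}, 0)$, which is where the two peaks sit, and also at the origin, which lies between them.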

Here's what the graph of f looks like:
Graph of $f(x, y) = -x^4 + 4(x^2 - y^2) - 3$
Notice that it has two peaks. Here's what the vector field for $\nabla f$ looks like. Vectors colored more red should be understood to be longer, and vectors colored more blue should be understood to be shorter:
The two input points corresponding with the peaks in the graph of f are surrounded by arrows directed towards those points. Why?
This is because near the top of a hill, the direction of steepest ascent always points towards the peak.
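A small numerical check of this picture, as a sketch: it assumes the function is $f(x, y) = -x^4 + 4(x^2 - y^2) - 3$ with gradient $(-4x^3 + 8x, -8y)$ and peaks at $(\pm\sqrt{2}, 0)$. For points on a small circle around each peak, it tests whether the gradient makes an angle of less than 90 degrees with the straight line back to the peak. The helper name `points_toward`, the circle radius, and the number of samples are arbitrary choices for illustration.

```python
import math


def gradient(x, y):
    """Gradient of f(x, y) = -x**4 + 4*(x**2 - y**2) - 3, computed by hand."""
    return (-4 * x ** 3 + 8 * x, -8 * y)


# The gradient vanishes at (sqrt(2), 0) and (-sqrt(2), 0), the two peaks.
peaks = [(math.sqrt(2), 0.0), (-math.sqrt(2), 0.0)]


def points_toward(peak, radius=0.1, samples=12):
    """Check that gradients on a small circle around `peak` aim back toward it."""
    px, py = peak
    for k in range(samples):
        t = 2 * math.pi * k / samples
        x, y = px + radius * math.cos(t), py + radius * math.sin(t)
        gx, gy = gradient(x, y)
        # A positive dot product with (peak - point) means the arrow makes
        # an angle of less than 90 degrees with the way to the peak.
        if gx * (px - x) + gy * (py - y) <= 0:
            return False
    return True


results = [points_toward(p) for p in peaks]
print(results)  # [True, True]
```

The arrows do not all aim exactly at the peak, but every one of them has a component heading toward it, which is what the vector field picture shows.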
Reflection question: What would the gradient field of a function look like near the local minimum of that function?

The gradient is perpendicular to contour lines

Like vector fields, contour maps are also drawn on a function's input space, so we might ask what happens if the vector field of $\nabla f$ sits on top of the contour map corresponding to $f$.
For example, let's take the function f(x,y)=xy:
Contour map of xy
Gradient field of xy
Both contour map and gradient field of xy
Looking at the image above, you might notice something interesting: Each vector is perpendicular to the contour line it touches.
To see why this is true, take a particular contour line, say the one representing the output 2, and zoom in to a point on that line. We know that the gradient $\nabla f$ points in the direction which increases the value of $f$ most quickly. There are two ways to think about this direction:
  1. Choose a fixed step size, and find the direction such that a step of that size increases f the most.
    Given steps of a constant size away from a particular point, the gradient is the one which increases f the most.
    Figure 1
  2. Choose a fixed increase in f, and find the direction such that it takes the shortest step to increase f by that amount.
    Given steps which increase f by a given size, the gradient direction is the shortest among these.
    Figure 2
Either way, you're trying to maximize the rise over run, either by maximizing the rise, or minimizing the run.
Contour maps provide a good illustration of what this second perspective might look like. In Figure 2 above, there is a second contour line representing 2.1, which is slightly greater than the value 2 represented by the initial line. The gradient of f should point in the direction that will get to this second line with as short a step as possible.
The more we zoom in, the more these lines will look like straight, parallel lines. The shortest path from one line to another that is parallel to it is always perpendicular to both lines, so the gradient will look perpendicular to the contour line.
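For the specific function $f(x, y) = xy$ used here, the perpendicularity can also be checked directly. The sketch below assumes a parametrization of the contour line $xy = 2$ as $(t, 2/t)$, so a vector along the line is $(1, -2/t^2)$, while the gradient is $(y, x)$. Two vectors are perpendicular when their dot product is zero. The function names and the sample values of `t` are choices made for this illustration.

```python
def gradient(x, y):
    """Gradient of f(x, y) = x*y is (y, x)."""
    return (y, x)


def contour_point(t, level=2.0):
    """Point on the contour line x*y = level, parametrized by x = t."""
    return (t, level / t)


def contour_tangent(t, level=2.0):
    """Derivative of contour_point with respect to t: a vector along the line."""
    return (1.0, -level / t ** 2)


dot_products = []
for t in [0.5, 1.0, 2.0, 4.0]:  # sample x-values, chosen arbitrarily
    x, y = contour_point(t)
    gx, gy = gradient(x, y)
    tx, ty = contour_tangent(t)
    # Dot product of the gradient with the direction of the contour line.
    dot_products.append(gx * tx + gy * ty)

print(dot_products)  # [0.0, 0.0, 0.0, 0.0]
```

The same cancellation happens for every $t$: the dot product is $\frac{2}{t} \cdot 1 + t \cdot \left(-\frac{2}{t^2}\right) = 0$.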

The del operator

In multivariable calculus—and beyond—the word operator comes up a lot. This might sound fancy, but for the most part, you can think of operator as meaning "thing which turns a function into another function".
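The derivative is one example of an operator since it turns a function $f$ into a new function $f'$. Differential operators are all operators that extend the idea of a derivative to a different context.
Example differential operators
| Name | Symbol | Example |
|---|---|---|
| Derivative | $\dfrac{d}{dx}$ | $\dfrac{d}{dx}(x^2) = 2x$ |
| Partial derivative | $\dfrac{\partial}{\partial x}$ | $\dfrac{\partial}{\partial x}(x^2 - xy) = 2x - y$ |
| Gradient | $\nabla$ | $\nabla(x^2 - xy) = \begin{bmatrix} 2x - y \\ -x \end{bmatrix}$ |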
The symbol $\nabla$ is referred to either as nabla or del. Typically nabla refers to the symbol itself while del refers to the operator it represents. This can be confusing since del can also refer to the symbol $\partial$, but hey, when has math terminology ever been reasonable?
Whatever you want to call it, the operator $\nabla$ can be loosely thought of as a vector of partial derivative operators:
$$\nabla = \begin{bmatrix} \dfrac{\partial}{\partial x} \\ \dfrac{\partial}{\partial y} \\ \vdots \end{bmatrix}$$
This isn't quite a real definition. For one thing, the dimension of this vector is not defined, since it depends on how many inputs there are in the function $\nabla$ is applied to. Furthermore, it's playing things pretty fast and loose to make a vector out of operators. But, because in practice the meaning is usually clear, people rarely worry about it.
Imagine "multiplying" this vector by a scalar-valued function:
$$\nabla f = \begin{bmatrix} \dfrac{\partial}{\partial x} \\ \dfrac{\partial}{\partial y} \\ \vdots \end{bmatrix} f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \\ \vdots \end{bmatrix}$$
Of course, this is not multiplication; you are really just evaluating each partial derivative operator on the function. Nevertheless, this is a super helpful way to think about $\nabla$, since it comes up again in the context of several more operators we will learn about later: divergence, curl, and the Laplacian.
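The "vector of operators" picture translates fairly naturally into code, where an operator is just a function that takes a function and returns a function. The sketch below is a loose model under that reading, not a standard library API: `partial`, `del_operator`, and `apply_del` are hypothetical names, and the partial derivatives are approximated with central differences rather than computed symbolically.

```python
def partial(index, h=1e-6):
    """Return an operator that maps a function to its partial derivative function.

    The derivative with respect to input number `index` is approximated
    by a central difference with step size h.
    """
    def operator(func):
        def derivative(*point):
            forward = list(point)
            backward = list(point)
            forward[index] += h
            backward[index] -= h
            return (func(*forward) - func(*backward)) / (2 * h)
        return derivative
    return operator


def del_operator(dimension):
    """Model nabla as a list of partial derivative operators, one per input."""
    return [partial(i) for i in range(dimension)]


def apply_del(func, dimension):
    """'Multiply' nabla by func: apply each operator, giving the gradient function."""
    components = [op(func) for op in del_operator(dimension)]
    return lambda *point: tuple(c(*point) for c in components)


def f(x, y):
    """Example function f(x, y) = x**2 - x*y, whose exact gradient is (2x - y, -x)."""
    return x ** 2 - x * y


grad_f = apply_del(f, 2)
print(grad_f(1.0, 3.0))  # close to (-1.0, -1.0)
```

Notice that `del_operator` needs to be told the dimension up front, which mirrors the point above that the size of $\nabla$ depends on the function it is applied to.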

Summary

  • The gradient of a scalar-valued multivariable function $f(x, y, \dots)$, denoted $\nabla f$, packages all its partial derivative information into a vector:
    $$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \\ \vdots \end{bmatrix}$$
    In particular, this means $\nabla f$ is a vector-valued function.
  • If you imagine standing at a point $(x_0, y_0, \dots)$ in the input space of $f$, the vector $\nabla f(x_0, y_0, \dots)$ tells you which direction you should travel to increase the value of $f$ most rapidly.
  • These gradient vectors $\nabla f(x_0, y_0, \dots)$ are also perpendicular to the contour lines of $f$.

Want to join the conversation?

  • Vesti Sterlingov
    How would I be able to type in the symbol "nabla" on a mac keyboard?
    (14 votes)
    • Sam Ley
      I know this is a bit late, but hopefully still helpful for someone. On recent versions of OS X, you can get to the "Character Viewer" from any application by going to Edit > Emoji and Symbols. If you don't see the symbol you are looking for in the list (many math symbols aren't there), click the button on the upper right corner to expand the list, giving you the full character viewer. From there, the "Math Symbols" section includes many helpful symbols, including ∇!
      (32 votes)
  • blackmarkt
    Does anyone know of a good online resource that has Multivariable practice questions?
    (9 votes)
  • Gabriel Costa
    I have a question about the gradient: after watching linear algebra videos on linear transformations, we've learned that a transformation T from R^n → R^m is represented by a matrix with m rows and n columns. Now being aware of this fact, let's assume a function f(x,y) = x^2 - xy, where f: R^2 → R. Why is the gradient represented as a 2x1 matrix (2 rows and 1 column) and not as a 1x2 matrix (1 row and 2 columns)? The gradient here is represented as a vector field, not as a scalar field, but why?
    (7 votes)
    • Qeeko
      Whether you represent the gradient as a 2x1 or as a 1x2 matrix (column vector vs. row vector) does not really matter, as they can be transformed to each other by matrix transposition. If a is a point in R², we have, by definition, that the gradient of ƒ at a is given by the vector

      ∇ƒ(a) = (∂ƒ/∂x(a), ∂ƒ/∂y(a)),

      provided the partial derivatives ∂ƒ/∂x and ∂ƒ/∂y of ƒ exist at a. Note that ∇ƒ(a) is a vector. Thus ∇ƒ maps a vector a in R² to the vector ∇ƒ(a) in R², so that ∇ƒ: R² ➝ R² is a vector field (and not a scalar field).

      Edit
      Going slightly on a tangent here: the gradient ∇ƒ is closely related to the (total) derivative of ƒ. The total derivative of ƒ at a (if it exists) is the unique linear transformation ƒ'(a): R² ➝ R such that

      |ƒ(x) - ƒ(a) - ƒ'(a)(x - a)| / ‖x - a‖ ➝ 0

      as x ➝ a. In this case, the matrix of ƒ'(a) (that is, the matrix representation of the linear transformation ƒ'(a)) is given by the 1x2 matrix

      Dƒ(a) = [∂ƒ/∂x(a) ∂ƒ/∂y(a)].
      (10 votes)
  • Anitej Banerjee
    Where does the gradient point if there are 2 equally steep directions to go in? For that matter, there could be any number n of directions. How does the gradient decide which one to point in if they are equal?
    An example I can think of is the origin in the graph z = x^2 - y^2. If you go along the x-axis in either direction, the curve will increase at the same rate on both sides. What does the gradient vector do in such cases? (In the case of the origin of x^2 - y^2, I believe it gives the 0 vector, as if we're at a local maximum, which makes sense along the y direction but not along the x direction...)
    (7 votes)
    • grvsinghal
      If you actually take the gradient, it becomes [2x, -2y]. So on the x-axis, where y = 0, the gradient becomes [2x, 0]. Now if you are at x = 0, then the gradient is [0,0], which does not tell you to go anywhere, i.e. it does not point in any direction. But as you deviate slightly in either direction, to [h,0] or [-h,0], the gradient starts pointing in a specific direction, which is the direction of steepest ascent.
      (6 votes)
  • René Wortel
    hi all,
    I am relatively new to Khan Academy and I like it a lot!
    I just started with “multivariable calculus” and I was curious whether I could be of some help (and get some help!) on this forum. (it’s almost 50 years ago that I was taught this stuff; it’s a trip down memory lane for me; I have to refresh it all and that will take me some time).
    Khan Academy makes it very clear that it hopes (rather: expects) that we are teaching each other.
    Unfortunately I see very few well-formulated clear questions; some posts are not even questions at all! No wonder that you (we!) get little or no response. Let’s try to change that! Hope you don’t find me presumptuous. Do you agree with me that we all should make better use of this site and its possibilities? In the near future I hope to comment on some of your questions (please don’t take offense!) and I will probably pose some questions myself. See You!!
    Let me emphasize my question: Do you agree with me that we all should make better use of this site and its possibilities?
    (7 votes)
  • André Carneiro
    I'm missing exercises on multivariable calculus.
    This was a great article though.
    (3 votes)
  • muhtasim adib
    My question is suppose we are standing in the xy-plane now-

    1. the gradient of the function shows us the direction of the steepest ascent?

    OR

    2. the gradient of the function shows us its value or length, by which we can see in which way its length is minimum, and by that we can get the steepest ascent of the function?

    I am actually confused about the direction of the gradient: whether it's parallel to the xy-plane or in the direction of the steepest ascent?
    (2 votes)
    • loumast17
      The gradient gives us a vector, specifically a 2D vector. This vector is going to be parallel to the xy-plane.

      Now, you will have a point (x,y,z) on the graph of f(x,y). The gradient says that at the point (x,y,z), if you rotate yourself to face the vector you get from the gradient at that point and proceed forward, the rate of change along the z-axis will be the greatest in that direction. I will do an example.

      let's just use f(x,y) = x^2 + y^2
      gradient = <2x, 2y>

      let's use the point (2, 3, 13). Here the gradient is <2*2, 2*3> = <4, 6>. If you need to, find the angle on the xy-plane.

      Now, at the point (2,3,13), if you imagine yourself standing there holding a compass, you would use the compass to make sure you are facing in the direction of the gradient. Once you are doing that, walking forward gives you the fastest path up the "hill" you are standing on.

      Worth saying the negative gradient is the steepest path down the hill.

      The vector that actually points up the hill along the surface, which is not parallel to the xy-plane, is a different vector.

      Let me know if that didn't help
      (4 votes)
  • Takashi
    "If you imagine standing at a point (x0,y0,…) in the input space of f, the vector ∇f(x0,y0,…) tells you which direction you should travel to increase the value of f most rapidly."

    So this means that the gradient does not point towards the top of a mountain, but to the steepest point, correct?

    If I have a tilted plane, what would the gradient be? Zero?
    (2 votes)
    • Naira
      Might be a little late, but I'll answer in case someone finds it useful...

      The gradient points along the steepest path, like the text you quoted says. It does not point to the steepest point. If you were trying to climb a mountain as quickly as possible, you could use the gradient as a "compass" that would always tell you the fastest way to get to the top (without considering physical restrictions, of course).

      It's easy to see a plane example, let's say: f(x,y) = x + y
      what's the gradient? [1,1]
      what does it mean? it means that, no matter where you are, the steepest direction is always that one ([1,1]).

      To get a gradient that is always zero, your function would have to be constant.
      (4 votes)
  • Kunjaan
    This is regarding the question "Why are the vectors in this vector field so small along the upward diagonal stripe?"

    I understood why the vector along the line y=2x is going to be 0 but how can you deduce that the vectors close to that line have small horizontal component? Can't they just jump around? Is it because the functions are linear? What is the intuition behind this? Thanks.
    (3 votes)
    • René Wortel
      I’m not sure why I am doing this: answering a 2-year old question by someone who probably already got on with his life. Kunjaan unfortunately does not refer explicitly to the “reflection question” under example 1 of the gradient article.
      In the answer that you can find by clicking “show answer”, the author explains (quite lucidly in my opinion) that the gradient (of which the x-component = 2x - y and the y-component = -x) is smallest in the vicinity of the line y=2x and close to the y-axis. In that region the gradient approaches (0,0), which gives us small vectors. Just read the text closely.
      If you formulate your questions precisely, you are more likely to get a timely answer.
      (2 votes)
  • Vibha Agarwal
    When defining the gradient of f as a vector, there is a typo: it says partial derivative of "0" but should be partial derivative of "y"
    (3 votes)