The gradient stores all the partial derivative information of a multivariable function. But it's more than a mere storage device; it has several wonderful interpretations and many, many uses.

What we're building toward

  • The gradient of a scalar-valued multivariable function f(x, y, \dots), denoted \nabla f, packages all its partial derivative information into a vector:
    \nabla f = \left[ \begin{array}{c} \dfrac{\partial f}{\partial x} \\[2ex] \dfrac{\partial f}{\partial y} \\[2ex] \vdots \end{array} \right]
    In particular, this means \nabla f is a vector-valued function.
  • If you imagine standing at a point (x_0, y_0, \dots) in the input space of f, the vector \nabla f(x_0, y_0, \dots) tells you which direction you should travel to increase the value of f most rapidly.
  • These gradient vectors \nabla f(x_0, y_0, \dots) are also perpendicular to the contour lines of f.

Definition

After learning that functions with a multidimensional input have partial derivatives, you might wonder what the full derivative of such a function is. In the case of scalar-valued multivariable functions, meaning those with a multidimensional input but a one-dimensional output, the answer is the gradient.
The gradient of a function f, denoted as \nabla f, is the collection of all its partial derivatives into a vector.
This is most easily understood with an example.

Example 1: Two dimensions

If f(x, y) = x^2 - xy, what is \nabla f?
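To compute it, take the partial derivative of f with respect to each input variable and stack the results into a vector:
    \nabla f = \left[ \begin{array}{c} \dfrac{\partial}{\partial x}(x^2 - xy) \\[2ex] \dfrac{\partial}{\partial y}(x^2 - xy) \end{array} \right] = \left[ \begin{array}{c} 2x - y \\[1ex] -x \end{array} \right]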
Notice, \nabla f is a vector-valued function, specifically one with a two-dimensional input and a two-dimensional output. This means it can be nicely visualized with a vector field. That vector field lives in the input space of f, which is the xy-plane.
This vector field is often called the gradient field of f.
Reflection question: Why are the vectors in this vector field so small along the upward diagonal stripe in the middle of the xy-plane?
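One way to answer: along the line y = 2x, the first component of \nabla f vanishes, so the vectors there are much shorter than their neighbors.
    \|\nabla f(x, y)\| = \sqrt{(2x - y)^2 + (-x)^2}, \qquad \text{which reduces to } |x| \text{ on the line } y = 2x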

Example 2: Three dimensions

What is the gradient of f(x, y, z) = x - xy + z^2?
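As before, compute one partial derivative per input variable:
    \nabla f = \left[ \begin{array}{c} 1 - y \\[1ex] -x \\[1ex] 2z \end{array} \right]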
\nabla f is a function with a three-dimensional input and a three-dimensional output. As such, it is nicely visualized with a vector field in three-dimensional space.

Interpreting the gradient

In each example above, we pictured \nabla f as a vector field, but how do we interpret these vector fields?
More concretely, let's think about the case where the input of f is two-dimensional. The gradient turns each input point (x_0, y_0) into the vector
\nabla f(x_0, y_0) = \left[ \begin{array}{c} \dfrac{\partial f}{\partial x}(x_0, y_0) \\[2ex] \dfrac{\partial f}{\partial y}(x_0, y_0) \end{array} \right].
What does that vector tell us about the behavior of the function around the point (x_0, y_0)?
Think of the graph of f as a hilly terrain. If you are standing on the part of the graph directly above—or below—the point (x_0, y_0), the slope of the hill depends on which direction you walk. For example, if you step straight in the positive x direction, the slope is \frac{\partial f}{\partial x}; if you step straight in the positive y direction, the slope is \frac{\partial f}{\partial y}. But most directions are some combination of the two.
The most important thing to remember about the gradient: the gradient of f, evaluated at an input (x_0, y_0), points in the direction of steepest ascent.
So, if you walk in the direction of the gradient, you will be going straight up the hill. Similarly, the magnitude of the vector \nabla f(x_0, y_0) tells you what the slope of the hill is in that direction.
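As a concrete check, take the function f(x, y) = x^2 - xy from Example 1 and stand at the input point (1, 0):
    \nabla f(1, 0) = \left[ \begin{array}{c} 2(1) - 0 \\[1ex] -1 \end{array} \right] = \left[ \begin{array}{c} 2 \\[1ex] -1 \end{array} \right]
Walking in the direction of this vector takes you up the hill most steeply, and the slope in that direction is its magnitude, \sqrt{2^2 + (-1)^2} = \sqrt{5}.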
It is not immediately clear why putting the partial derivatives into a vector gives you the direction of steepest ascent, but this will be explained once we get to directional derivatives.
When the inputs of a function f live in more than two dimensions, we can no longer comfortably picture its graph as hilly terrain. That said, the same underlying idea holds. Whether the input space of f is two-dimensional, three-dimensional, or 1,000,000-dimensional: the gradient of f gives a vector in that input space that points in the direction that makes the function f increase the fastest.

Example 3: What local maxima look like

Consider the function f(x, y) = -x^4 + 4(x^2 - y^2) - 3. What is its gradient?
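Taking one partial derivative per input variable:
    \nabla f = \left[ \begin{array}{c} -4x^3 + 8x \\[1ex] -8y \end{array} \right]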
Here's what the graph of f looks like:
Notice that it has two peaks. Here's what the vector field for \nabla f looks like—vectors colored more red should be understood to be longer, and vectors colored more blue should be understood to be shorter:
The two input points corresponding to the peaks in the graph of f are surrounded by arrows directed towards those points. Why?
This is because near the top of a hill, the direction of steepest ascent always points towards the peak.
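In fact, at each peak the gradient is the zero vector, since no direction of travel increases f. Setting both components of \nabla f to zero shows where these special points are:
    -4x^3 + 8x = 0 \quad \text{and} \quad -8y = 0 \quad \Longrightarrow \quad (x, y) = (0, 0), \ (\sqrt{2}, 0), \ (-\sqrt{2}, 0)
The two peaks sit at (\sqrt{2}, 0) and (-\sqrt{2}, 0); the point (0, 0) between them turns out to be a saddle point rather than a peak.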
Reflection question: What would the gradient field of a function look like near the local minimum of that function?

The gradient is perpendicular to contour lines

Like vector fields, contour maps are also drawn on a function's input space, so we might ask what happens if the vector field of \nabla f sits on top of the contour map of f.
For example, let's take the function f(x, y) = xy:
Looking at the image above, you might notice something interesting: Each vector is perpendicular to the contour line it touches.
To see why this is true, take a particular contour line, say the one representing the output 2, and zoom in to a point on that line. We know that the gradient \nabla f points in the direction which increases the value of f most quickly. There are two ways to think about this direction:
  1. Choose a fixed step size, and find the direction such that a step of that size increases f the most.
  2. Choose a fixed increase in f, and find the direction such that it takes the shortest step to increase f by that amount.
Either way, you're trying to maximize the rise over run, either by maximizing the rise, or minimizing the run.
Contour maps provide a good illustration of what this second perspective might look like. In Figure 2 above, there is a second contour line representing 2.1, which is slightly greater than the value 2 represented by the initial line. The gradient of f should point in the direction that will get to this second line with as short a step as possible.
The more we zoom in, the more these lines will look like straight, parallel lines. The shortest path from one line to another that is parallel to it is always perpendicular to both lines, so the gradient will look perpendicular to the contour line.
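You can also verify the perpendicularity algebraically for f(x, y) = xy. Along a contour line xy = c, the direction (x, -y) is tangent to the curve, since nudging the point that way leaves the product xy unchanged to first order, and the gradient is perpendicular to that tangent direction:
    \nabla f = \left[ \begin{array}{c} y \\[1ex] x \end{array} \right], \qquad \left[ \begin{array}{c} y \\ x \end{array} \right] \cdot \left[ \begin{array}{c} x \\ -y \end{array} \right] = yx - xy = 0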

The del operator

In multivariable calculus—and beyond—the word operator comes up a lot. This might sound fancy, but for the most part, you can think of operator as meaning "thing which turns a function into another function".
The derivative is one example of an operator, since it turns a function f into a new function f'. Operators that extend the idea of the derivative to other contexts are called differential operators.
Example differential operators

| Name | Symbol | Example |
| --- | --- | --- |
| Derivative | \frac{d}{dx} | \frac{d}{dx}(x^2) = 2x |
| Partial derivative | \frac{\partial}{\partial x} | \frac{\partial}{\partial x}(x^2 - xy) = 2x - y |
| Gradient | \nabla | \nabla(x^2 - xy) = \left[\begin{array}{c} 2x - y \\ -x \end{array}\right] |
This symbol \nabla is referred to either as nabla or del. Typically nabla refers to the symbol itself while del refers to the operator it represents. This can be confusing since del can also refer to the symbol \partial, but hey, when has math terminology ever been reasonable?
Whatever you want to call it, the operator \nabla can be loosely thought of as a vector of partial derivative operators:
\nabla = \left[ \begin{array}{c} \dfrac{\partial}{\partial x} \\[2ex] \dfrac{\partial}{\partial y} \\[2ex] \vdots \end{array} \right]
This isn't quite a real definition. For one thing, the dimension of this vector is not defined since it depends on how many inputs there are in the function \nabla is applied to. Furthermore, it's playing things pretty fast and loose to make a vector out of operators. But, because in practice the meaning is usually clear, people rarely worry about it.
Imagine "multiplying" this vector by a scalar-valued function:
\nabla f = \left[ \begin{array}{c} \dfrac{\partial}{\partial x} \\[2ex] \dfrac{\partial}{\partial y} \\[2ex] \vdots \end{array} \right] f = \left[ \begin{array}{c} \dfrac{\partial f}{\partial x} \\[2ex] \dfrac{\partial f}{\partial y} \\[2ex] \vdots \end{array} \right]
Of course, this is not really multiplication; you are just evaluating each partial derivative operator on the function. Nevertheless, this is a super helpful way to think about \nabla, since it comes up again in the context of several more operators we will learn about later: divergence, curl, and the Laplacian.

Summary

  • The gradient of a scalar-valued multivariable function f(x, y, \dots), denoted \nabla f, packages all its partial derivative information into a vector:
    \nabla f = \left[ \begin{array}{c} \dfrac{\partial f}{\partial x} \\[2ex] \dfrac{\partial f}{\partial y} \\[2ex] \vdots \end{array} \right]
    In particular, this means \nabla f is a vector-valued function.
  • If you imagine standing at a point (x_0, y_0, \dots) in the input space of f, the vector \nabla f(x_0, y_0, \dots) tells you which direction you should travel to increase the value of f most rapidly.
  • These gradient vectors \nabla f(x_0, y_0, \dots) are also perpendicular to the contour lines of f.