# The gradient

The gradient stores all the partial derivative information of a multivariable function. But it is more than a mere storage device: it has several wonderful interpretations and many, many uses.

## What you need to be familiar with before starting this lesson:

- Partial derivatives
- Vector fields
- Contour maps—only necessary for one section of this lesson.

## What we're building toward

- The gradient of a scalar-valued multivariable function $f(x, y, \dots)$, denoted $\nabla f$, packages all its partial derivative information into a vector: $\nabla f = \left[\begin{array}{c} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \vdots \end{array}\right]$. In particular, this means $\nabla f$ is a vector-valued function.
- If you imagine standing at a point $(x_0, y_0, \dots)$ in the input space of $f$, the vector $\nabla f(x_0, y_0, \dots)$ tells you which direction you should travel to increase the value of $f$ most rapidly.
- These gradient vectors—$\nabla f(x_0, y_0, \dots)$—are also perpendicular to the contour lines of $f$.

## Definition

After learning that functions with a multidimensional input have partial derivatives, you might wonder what the full derivative of such a function is. In the case of **scalar-valued multivariable functions**, meaning those with a multidimensional input but a one-dimensional output, the answer is the gradient.

The **gradient** of a function $f$, denoted $\nabla f$, is the collection of all its partial derivatives into a vector. This is most easily understood with an example.

## Example 1: Two dimensions

If $f(x, y) = x^2 - xy$, what is $\nabla f$?

$$\nabla f = \left[\begin{array}{c} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{array}\right] = \left[\begin{array}{c} 2x - y \\ -x \end{array}\right]$$

Notice, $\nabla f$ is a **vector-valued function**, specifically one with a two-dimensional input and a two-dimensional output. This means it can be nicely visualized with a vector field. That vector field lives in the input space of $f$, which is the $xy$-plane. This vector field is often called the **gradient field** of $f$.

**Reflection question**: Why are the vectors in this vector field so small along the upward diagonal stripe in the middle of the $xy$-plane?
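Before moving on, the partial derivatives of $f(x, y) = x^2 - xy$ can be sanity-checked numerically. The following is a minimal sketch in plain Python (the helper name `grad_fd` is invented for illustration) that approximates the gradient by central differences and should agree with the analytic answer $(2x - y, -x)$:

```python
def f(x, y):
    return x**2 - x*y

def grad_fd(f, x, y, h=1e-6):
    """Central-difference approximation of (df/dx, df/dy) at (x, y)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

# Analytic gradient of x^2 - xy is (2x - y, -x), so at the point (3, 2)
# we expect approximately (4, -3).
print(grad_fd(f, 3, 2))
```

The point $(3, 2)$ is an arbitrary choice; any point would do for this kind of spot check.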

## Example 2: Three dimensions

What is the gradient of $f(x, y, z) = x - xy + z^2$?

$$\nabla f = \left[\begin{array}{c} 1 - y \\ -x \\ 2z \end{array}\right]$$

$\nabla f$ is a function with a three-dimensional input and a three-dimensional output. As such, it is nicely visualized with a vector field in three-dimensional space.
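The same central-difference idea extends to any number of input dimensions. A sketch (plain Python, treating the input as a list) that checks the three partial derivatives of $f(x, y, z) = x - xy + z^2$, which should come out near $(1 - y,\ -x,\ 2z)$:

```python
def grad_fd(f, p, h=1e-6):
    """Approximate the gradient of f at the point p (a list) by central differences."""
    g = []
    for i in range(len(p)):
        up, dn = list(p), list(p)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

f = lambda v: v[0] - v[0]*v[1] + v[2]**2

# Analytic gradient: (1 - y, -x, 2z); at the point (1, 2, 3) that is (-1, -1, 6).
print(grad_fd(f, [1.0, 2.0, 3.0]))
```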

## Interpreting the gradient

In each example above, we pictured $\nabla f$ as a vector field, but how do we interpret these vector fields?

More concretely, let's think about the case where the input of $f$ is two-dimensional. The gradient turns each input point $(x_0, y_0)$ into the vector

$$\nabla f(x_0, y_0) = \left[\begin{array}{c} \frac{\partial f}{\partial x}(x_0, y_0) \\ \frac{\partial f}{\partial y}(x_0, y_0) \end{array}\right]$$

What does that vector tell us about the behavior of the function around the point $(x_0, y_0)$?

Think of the graph of $f$ as a hilly terrain. If you are standing on the part of the graph directly above—or below—the point $(x_0, y_0)$, the slope of the hill depends on which direction you walk. For example, if you step straight in the positive $x$ direction, the slope is $\frac{\partial f}{\partial x}(x_0, y_0)$; if you step straight in the positive $y$ direction, the slope is $\frac{\partial f}{\partial y}(x_0, y_0)$. But most directions are some combination of the two.

The most important thing to remember about the gradient: The gradient of $f$, if evaluated at an input $(x_0, y_0)$, points in the direction of steepest ascent.

So, if you walk in the direction of the gradient, you will be going straight up the hill. Similarly, *the magnitude of the vector $\nabla f(x_0, y_0)$ tells you what the slope of the hill is* in that direction.

It is not immediately clear *why* putting the partial derivatives into a vector gives you the slope of steepest ascent, but this will be explained once we get to directional derivatives.

When the inputs of a function $f$ live in more than two dimensions, we can no longer comfortably picture its graph as hilly terrain. That said, the same underlying idea holds. Whether the input space of $f$ is two-dimensional, three-dimensional, or 1,000,000-dimensional: the gradient of $f$ gives a vector in that input space that points in the direction that makes the function $f$ increase the fastest.
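The steepest-ascent claim can be tested directly: sample many unit directions around a point, measure the slope of $f$ in each, and see which direction wins. A sketch in plain Python (the function $f(x, y) = x^2 - xy$ and the point $(2, 1)$ are just illustrative choices):

```python
import math

def f(x, y):
    return x**2 - x*y

x0, y0, h = 2.0, 1.0, 1e-5
grad = (2*x0 - y0, -x0)  # analytic gradient at (2, 1): (3, -2)

# Slope of f in the unit direction u: (f(p + h*u) - f(p)) / h.
best_dir, best_slope = None, float("-inf")
for k in range(3600):
    t = 2 * math.pi * k / 3600
    u = (math.cos(t), math.sin(t))
    slope = (f(x0 + h*u[0], y0 + h*u[1]) - f(x0, y0)) / h
    if slope > best_slope:
        best_dir, best_slope = u, slope

norm = math.hypot(grad[0], grad[1])
unit_grad = (grad[0] / norm, grad[1] / norm)

# The winning direction lines up with the normalized gradient, and the
# best slope is close to the gradient's magnitude, sqrt(13).
print(best_dir, unit_grad, best_slope)
```

This also previews the magnitude claim: the largest directional slope found equals $\|\nabla f(2, 1)\| = \sqrt{13}$ up to discretization error.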

## Example 3: What local maxima look like

Consider the function $f(x, y) = -x^4 + 4(x^2 - y^2) - 3$. What is its gradient?

$$\nabla f = \left[\begin{array}{c} -4x^3 + 8x \\ -8y \end{array}\right]$$

Here's what the graph of $f$ looks like:

Notice that it has two peaks. Here's what the vector field for $\nabla f$ looks like—vectors colored more red should be understood to be longer, and vectors colored more blue should be understood to be shorter:

The two input points corresponding with the peaks in the graph of $f$ are surrounded by arrows directed towards those points. Why?

This is because near the top of a hill, the direction of steepest ascent always points towards the peak.
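A quick numerical illustration of this, using the analytic gradient $\nabla f = (-4x^3 + 8x,\ -8y)$ of the example above: at the peak $(\sqrt{2}, 0)$ the gradient vanishes (up to floating-point rounding), and at nearby points the gradient has a positive dot product with the vector pointing toward the peak, i.e. it points uphill toward it. The sample points are arbitrary choices:

```python
import math

def grad(x, y):
    # Analytic gradient of f(x, y) = -x**4 + 4*(x**2 - y**2) - 3.
    return (-4*x**3 + 8*x, -8*y)

peak = (math.sqrt(2), 0.0)  # one of the two peaks
print(grad(*peak))          # vanishes at the peak, up to rounding

# Gradients at nearby points point toward the peak: positive dot product
# with the vector from the point to the peak.
for p in [(1.2, 0.3), (1.6, -0.2)]:
    g = grad(*p)
    to_peak = (peak[0] - p[0], peak[1] - p[1])
    print(g[0]*to_peak[0] + g[1]*to_peak[1] > 0)
```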

**Reflection question**: What would the gradient field of a function look like near the local minimum of that function?

## The gradient is perpendicular to contour lines

Like vector fields, contour maps are also drawn on a function's input space, so we might ask what happens if the vector field of $\nabla f$ sits on top of the contour map of $f$.

For example, let's take the function
$f(x, y) = xy$:

Looking at the image above, you might notice something interesting:

*Each vector is perpendicular to the contour line it touches.*

To see why this is true, take a particular contour line, say the one representing the output $2$, and zoom in to a point on that line. We know that the gradient $\nabla f$ points in the direction which increases the value of $f$ most quickly. There are two ways to think about this direction:

- Choose a fixed step size, and find the direction such that a step of that size increases $f$ the most.
- Choose a fixed increase in $f$, and find the direction such that it takes the *shortest* step to increase $f$ by that amount.

Either way, you're trying to maximize the rise over run, either by maximizing the rise, or minimizing the run.
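For $f(x, y) = xy$, the example above, the perpendicularity can even be checked by hand: on the contour $xy = 2$, parametrized as $(t, 2/t)$, the tangent direction is $(1, -2/t^2)$, while the gradient is $(y, x) = (2/t, t)$, so their dot product is $2/t - 2/t = 0$. A short sketch confirming this at a few sample points:

```python
# On the contour xy = 2, parametrized as (t, 2/t), the tangent is (1, -2/t^2)
# and the gradient of f(x, y) = x*y is (y, x) = (2/t, t).
for t in [0.5, 1.0, 2.0, 3.0]:
    grad = (2/t, t)
    tangent = (1.0, -2/t**2)
    dot = grad[0]*tangent[0] + grad[1]*tangent[1]
    print(t, dot)  # the dot product is 2/t - 2/t = 0, up to rounding
```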

Contour maps provide a good illustration of what this second perspective might look like. In Figure 2 above, there is a second contour line representing 2.1, which is slightly greater than the value 2 represented by the initial line. The gradient of $f$ should point in the direction that will get to this second line with as short a step as possible.

The more we zoom in, the more these lines will look like straight, parallel lines.

*The shortest path from one line to another that is parallel to it is always perpendicular to both lines*, so the gradient will look perpendicular to the contour line.

## The del operator

In multivariable calculus—and beyond—the word **operator** comes up a lot. This might sound fancy, but for the most part, you can think of an operator as meaning "thing which turns a function into another function". The derivative is one example of an operator, since it turns a function $f$ into a new function $f'$. **Differential operators** are all operators that extend the idea of a derivative to a different context.

**Example differential operators**

| Name | Symbol | Example |
| --- | --- | --- |
| Derivative | $\frac{d}{dx}$ | $\frac{d}{dx}(x^2) = 2x$ |
| Partial derivative | $\frac{\partial}{\partial x}$ | $\frac{\partial}{\partial x}(x^2 - xy) = 2x - y$ |
| Gradient | $\nabla$ | $\nabla(x^2 - xy) = \left[\begin{array}{c} 2x - y \\ -x \end{array}\right]$ |

This symbol $\nabla$ is referred to either as nabla or del. Typically nabla refers to the symbol itself while del refers to the operator it represents. This can be confusing since del can also refer to the symbol $\partial$, but hey, when has math terminology ever been reasonable?

Whatever you want to call it, the operator $\nabla$ can be loosely thought of as a vector of partial derivative operators:

$$\nabla = \left[\begin{array}{c} \frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \vdots \end{array}\right]$$

This isn't quite a real definition. For one thing, the dimension of this vector is not defined, since it depends on how many inputs there are in the function that $\nabla$ is applied to. Furthermore, it's playing things pretty fast and loose to make a vector out of operators. But, because in practice the meaning is usually clear, people rarely worry about it.

Imagine "multiplying" this vector by a scalar-valued function:

$$\nabla f = \left[\begin{array}{c} \frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \vdots \end{array}\right] f = \left[\begin{array}{c} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \vdots \end{array}\right]$$
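One loose way to make this picture concrete in code: represent $\nabla$ as a tuple of partial-derivative operators (here approximated by central differences; all names are invented for illustration), and "multiply" it by a function $f$ by applying each operator to $f$:

```python
def d_dx(f, h=1e-6):
    # Operator: takes a function of (x, y), returns (approximately) its x-partial.
    return lambda x, y: (f(x + h, y) - f(x - h, y)) / (2 * h)

def d_dy(f, h=1e-6):
    return lambda x, y: (f(x, y + h) - f(x, y - h)) / (2 * h)

nabla = (d_dx, d_dy)  # the "vector of operators"

def apply_nabla(f):
    # "Multiplying" nabla by f just means applying each operator to f.
    return [op(f) for op in nabla]

f = lambda x, y: x**2 - x*y
grad_f = apply_nabla(f)

# Analytic gradient is (2x - y, -x); at (3, 2) approximately (4, -3).
print(grad_f[0](3, 2), grad_f[1](3, 2))
```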

Of course, this is not multiplication; you are really just evaluating each partial derivative operator on the function. Nevertheless, this is a *super* helpful way to think about $\nabla$, since it comes up again in the context of several more operators we will learn about later: divergence, curl, and the Laplacian.

## Summary

- The gradient of a scalar-valued multivariable function $f(x, y, \dots)$, denoted $\nabla f$, packages all its partial derivative information into a vector: $\nabla f = \left[\begin{array}{c} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \vdots \end{array}\right]$. In particular, this means $\nabla f$ is a vector-valued function.
- If you imagine standing at a point $(x_0, y_0, \dots)$ in the input space of $f$, the vector $\nabla f(x_0, y_0, \dots)$ tells you which direction you should travel to increase the value of $f$ most rapidly.
- These gradient vectors $\nabla f(x_0, y_0, \dots)$ are also perpendicular to the contour lines of $f$.