## Multivariable calculus

### Unit 3, Lesson 2: Quadratic approximations


# The Hessian

The Hessian is a matrix that organizes all the second partial derivatives of a function.


## The Hessian matrix

The **Hessian matrix** of a multivariable function $f(x, y, z, \dots)$, which different authors write as $\mathbf{H}(f)$, $\mathbf{H}f$, or $\mathbf{H}_f$, organizes all the second partial derivatives into a matrix:

$$
\mathbf{H}f = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \, \partial y} & \dfrac{\partial^2 f}{\partial x \, \partial z} & \cdots \\[6pt]
\dfrac{\partial^2 f}{\partial y \, \partial x} & \dfrac{\partial^2 f}{\partial y^2} & \dfrac{\partial^2 f}{\partial y \, \partial z} & \cdots \\[6pt]
\dfrac{\partial^2 f}{\partial z \, \partial x} & \dfrac{\partial^2 f}{\partial z \, \partial y} & \dfrac{\partial^2 f}{\partial z^2} & \cdots \\[6pt]
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
$$

So, two things to notice here:

- This only makes sense for scalar-valued functions.
- This object $\mathbf{H}f$ is no ordinary matrix; it is a matrix with *functions* as entries. In other words, it is meant to be evaluated at some point $(x_0, y_0, \dots)$.

As such, you might call this object $\mathbf{H}f$ a "matrix-valued" function. Funky, right?
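To make the "matrix of second partial derivatives" idea concrete, here is a small numerical sketch (our own, not part of the article): it approximates the Hessian of a scalar-valued function with central differences. The helper name `hessian` and the step size `h` are illustrative choices.

```python
def hessian(f, point, h=1e-4):
    """Approximate the matrix of second partial derivatives of f at `point`."""
    n = len(point)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f at `point` nudged by di*h along axis i
                # and dj*h along axis j.
                p = list(point)
                p[i] += di * h
                p[j] += dj * h
                return f(*p)
            # Central-difference estimate of d^2 f / (dx_i dx_j).
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

# f(x, y) = x^2 * y has f_xx = 2y, f_xy = f_yx = 2x, f_yy = 0.
H = hessian(lambda x, y: x**2 * y, (1.0, 3.0))
```

At the point $(1, 3)$ this returns approximately `[[6, 2], [2, 0]]`, matching $f_{xx} = 2y$, $f_{xy} = 2x$, and $f_{yy} = 0$ up to floating-point error.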

One more important thing: **the word "Hessian" also sometimes refers to the determinant of this matrix,** instead of to the matrix itself.

## Example: Computing a Hessian

**Problem**: Compute the Hessian of $f(x, y) = x^3 - 2xy - y^6$ at the point $(1, 2)$.

**Solution**: Ultimately we need all the second partial derivatives of $f$, so let's first compute both partial derivatives:

$$
f_x(x, y) = 3x^2 - 2y \qquad f_y(x, y) = -2x - 6y^5
$$

With these, we compute all four second partial derivatives:

$$
f_{xx}(x, y) = 6x \qquad f_{xy}(x, y) = -2
$$

$$
f_{yx}(x, y) = -2 \qquad f_{yy}(x, y) = -30y^4
$$

The Hessian matrix in this case is a $2 \times 2$ matrix with these functions as entries:

$$
\mathbf{H}f(x, y) = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} 6x & -2 \\ -2 & -30y^4 \end{bmatrix}
$$

We were asked to evaluate this at the point $(x, y) = (1, 2)$, so we plug in these values:

$$
\mathbf{H}f(1, 2) = \begin{bmatrix} 6 & -2 \\ -2 & -480 \end{bmatrix}
$$

Now, the problem is ambiguous, since the "Hessian" can refer either to this matrix or to its determinant. What you want depends on context. For example, in optimizing multivariable functions, there is something called the "second partial derivative test" which uses the Hessian determinant. When the Hessian is used to approximate functions, you just use the matrix itself.

If it's the determinant we want, here's what we get:

$$
\det \mathbf{H}f(1, 2) = (6)(-480) - (-2)(-2) = -2880 - 4 = -2884
$$
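As a quick sanity check on the worked example, here is a short sketch (our own, not part of the article) that builds the Hessian of $f(x, y) = x^3 - 2xy - y^6$ at $(1, 2)$ from the hand-computed second partials and takes its determinant. The function names are our own labels.

```python
# Hand-computed second partials of f(x, y) = x**3 - 2*x*y - y**6.
def f_xx(x, y): return 6 * x
def f_xy(x, y): return -2
def f_yx(x, y): return -2
def f_yy(x, y): return -30 * y**4

x0, y0 = 1, 2
H = [[f_xx(x0, y0), f_xy(x0, y0)],
     [f_yx(x0, y0), f_yy(x0, y0)]]
# Determinant of a 2x2 matrix: ad - bc.
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
print(H)    # [[6, -2], [-2, -480]]
print(det)  # -2884
```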

## Uses

By capturing all the second-derivative information of a multivariable function, the Hessian matrix often plays a role analogous to the ordinary second derivative in single-variable calculus. Most notably, it arises in these two cases:

- Quadratic approximations of multivariable functions, which act a bit like a second-order Taylor expansion, but for multivariable functions.
- The second partial derivative test, which helps you find the maximum/minimum of a multivariable function.
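The second item above can be sketched in code. This is a simplified two-variable version of the second partial derivative test (our own sketch; the function name and return strings are illustrative), which classifies a critical point by the sign of the Hessian determinant:

```python
def classify_critical_point(f_xx, f_yy, f_xy, x0, y0):
    """Classify a critical point of a two-variable function using the
    Hessian determinant d = f_xx * f_yy - f_xy**2 evaluated there."""
    d = f_xx(x0, y0) * f_yy(x0, y0) - f_xy(x0, y0) ** 2
    if d < 0:
        return "saddle point"
    if d > 0:
        # Determinant positive: a local extremum; f_xx's sign picks which.
        return "local minimum" if f_xx(x0, y0) > 0 else "local maximum"
    return "inconclusive"

# f(x, y) = x**2 + y**2 has a critical point at the origin:
print(classify_critical_point(lambda x, y: 2.0,   # f_xx
                              lambda x, y: 2.0,   # f_yy
                              lambda x, y: 0.0,   # f_xy
                              0, 0))              # local minimum
```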

## Want to join the conversation?

- will there be videos and exercises (mostly interested in the exercises) for these topics any time soon?(30 votes)
- Me too. Would love to see exercises in multivariable calculus, differential equations and linear algebra.(22 votes)

- Should the determinant in the final step be: 180xy^4 - 4?(7 votes)
- I agree partially with Marcel Brown; since the determinant of a 2x2 matrix is ad - bc, here bc = (-2)^2 = 4, hence -bc = -4. However, the ad coefficient is 6 * (-30) = -180, not 180 as Marcel stated.(12 votes)

- Is the Hessian in any way related to the Jacobian matrix?(5 votes)
- More formally:
**H**(f(**x**)) =**J**(∇ f(**x**))^T.

It also relates to the Laplacian as an operator: Δf = ∇²f = trace (**H**(f)).(7 votes)

- Why is the last second partial derivative not -30y^4?(5 votes)
- Why is fyx= d/dy(3x^2-2y) and not d/dx(-2x-6y^5)? Wouldn't it be the partial derivative with respect to x of the first partial derivative with respect to y? I ask the same for fxy= d/dx(-2x-6y^5) not being d/dy(3x^2-2y).(3 votes)
- It is both! Whether you differentiate with respect to x first then y, or with respect to y first then x, you get the same answer. Notice here that fxy = fyx = -2. That is Clairaut's theorem.(4 votes)

- why hessian makes sense only for a scalar valued function?(3 votes)
- What are some of the practical applications of the determinant of a Hessian matrix?(2 votes)
- Evaluating it can tell you whether you are at a maximum, minimum, or a saddle point. It has all the same abilities as a second derivative in a uni-variate function.(2 votes)

- Around the last paragraph, the determinant is used to evaluate specific points. But I was wondering why, in my multivariable calculus classes, we also use the Hessian determinant and f_xx to determine local extrema. What is the background or the basis for using the determinant of the Hessian matrix to decide whether critical points are local maxima or local minima? Basically, what is important about the Hessian determinant? Thank you!(3 votes)
- We actually don't use the Hessian to determine whether the critical points are local maxima or local minima. We actually use the Hessian to determine whether they are local extrema or saddle points. As for using fxx, it doesn't have to be fxx. You could just as easily use fyy to determine whether the local extremum is a maximum or minimum.

If it is a local minimum, the gradient is pointing away from this point. If it is a local maximum, the gradient is always pointing toward this point. Of course, at all critical points, the gradient is 0. That should mean that the gradient of nearby points would be tangent to the change in the gradient. In other words, fxx and fyy would be high and fxy and fyx would be low.

On the other hand, if the point is a saddle point, then the gradient vectors will all be pointing around the critical point. Therefore at nearby points, the change in the gradient will be orthogonal to the gradient, not tangent. In other words, fxy and fyx would be high and fxx and fyy would be low, or fxx and fyy would have opposite signs. Either way, the Hessian determinant would be negative.(0 votes)

- Since Hessian matrices are only used for trigonometric functions, what can be used to optimise a 3 variable trigonometric function?(1 vote)
- Where it says The Hessian matrix in this case is a 2 x 2 matrix with these functions as entries:

the fxx and fxy are flipped. Cuz under fxx it should be fxy. Then under fyx it should be fyy.(1 vote)