# Second partial derivative test intuition

The second partial derivative test is based on a formula which seems to come out of nowhere. Here, you can see a little more intuition for why it looks the way it does. Created by Grant Sanderson.

## Want to join the conversation?

• Where is the written form of that article you talked about?
• I know this video is meant with the best intention but I feel the intuition for the dxdy (cross term) was not explained well. Can someone clarify it to me why it matters and why it measures how "similar" we are to f(x,y) = xy?
• I think the intuition is that if we check concavity along only the x-input and y-input, we may get what appears to be a consistent result. For example, they may both have second partial derivatives that are positive, indicating the output is concave up along both axes. However, if we look at the concavity along inputs that include both x and y (i.e. dxdy), it could be revealed that the concavity is not consistent and we may have a saddle point. I'm not sure what you mean about "how 'similar' we are to f(x,y) = xy."
• (Might be helpful for future readers) The point is just that for the point to be a maximum or minimum, the directional second derivative must always have the same sign, i.e. (u.Del)(u.Del) f = 0 has no roots for unit vectors u. You can rewrite this as u^T H u = 0 having no roots.

If you simplify this, setting the y-component of u, u_y, as sqrt(1-u_x^2), then isolate the square root and square both sides, you get a quadratic in u_x^2. So you look at the discriminant of this, which simplifies (up to a positive constant factor) to f_xy^2*(f_xy^2 - f_xx f_yy).

If this is positive (i.e. the term given in the video is negative), it means you have two roots, which give you the directions where the double-derivative flips sign, and the point is a saddle point.

If it is negative, it has no roots, and therefore the sign is always the same, i.e. all the second directional derivatives agree.
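The sign behaviour described in this comment can be checked numerically. The following is only a minimal sketch, not the video's method: it samples unit vectors u = (cos theta, sin theta) and records which signs u^T H u takes. The function names, the sample count, and the tolerance are all assumptions chosen for illustration.

```python
import math

def directional_second_derivative(fxx, fxy, fyy, theta):
    """Evaluate u^T H u for the unit vector u = (cos theta, sin theta)."""
    ux, uy = math.cos(theta), math.sin(theta)
    return fxx * ux * ux + 2.0 * fxy * ux * uy + fyy * uy * uy

def sign_pattern(fxx, fxy, fyy, samples=360, tol=1e-12):
    """Return the set of signs (+1 and/or -1) seen over the sampled directions."""
    signs = set()
    for k in range(samples):
        # Half a circle is enough: u and -u give the same value of u^T H u.
        theta = math.pi * k / samples
        value = directional_second_derivative(fxx, fxy, fyy, theta)
        if value > tol:
            signs.add(1)
        elif value < -tol:
            signs.add(-1)
    return signs

# f(x, y) = x^2 + 3xy + y^2 at the origin: f_xx = 2, f_xy = 3, f_yy = 2,
# so f_xy^2 - f_xx f_yy = 5 > 0 and both signs should show up (saddle).
print(sign_pattern(2.0, 3.0, 2.0))

# f(x, y) = x^2 + xy + y^2 at the origin: f_xx = 2, f_xy = 1, f_yy = 2,
# so f_xy^2 - f_xx f_yy = -3 < 0 and only one sign should show up.
print(sign_pattern(2.0, 1.0, 2.0))
```

Note that in the first example f_xx and f_yy are both positive, yet the cross term is large enough to produce directions with negative second derivative.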

If the discriminant is zero, it means that there is exactly one direction in which the double-derivative is zero, i.e. you have a straight line (since the directional derivative is also zero, this means the straight line is flat, parallel to the xy-plane like in the video).
• It seems to me that the f_xy^2 term is really just (f_xy)(f_yx) in disguise. The fact that f_xy = f_yx is a convenient computational coincidence, but it seems conceptually inconsistent with the (f_xx)(f_yy) term.
• What would really help Grant's graphs, and make things clearer, is if he had big, clear X, Y, and Z labels on his axes, and even if he color coded the axes and their labels on top.
• Will it be a saddle point if the second partial derivative with respect to x and the second partial derivative with respect to y have different signs? As in one is positive and the other is negative? If this is the case then we probably won't have to check the mixed partial derivative.
• What if we have more than two variables? How do we know whether it is a max, a min, or a saddle?
• Is there any reason why the expression used in the test is the determinant of the Hessian matrix?
• I'm not sure why the interesting parts here were skipped and why he skipped the important reason for that specific formula. I get that he tries to explain it without using linear algebra, which is strange as it was a prerequisite for this course. Anyway, let me give you a brief intuition.

The whole reason why we have that formula is because it is the DETERMINANT of the Hessian matrix. Why, you may ask? Because for a diagonalisable matrix, where we have a linearly independent eigenbasis, the determinant of the matrix equals the product of the eigenvalues. So what does this have to do with second partial derivatives and the Hessian?

Well, do you remember the quadratic approximation of a function? It looks very similar to the function near a specific point, in this case a critical point, and it has a very similar shape near a maximum or minimum, as parabola-shaped functions can closely "hug" the graph.

So why is this important? This comes from the quadratic form from linear algebra, x^T A x, where A is a symmetric matrix, and guess what? The Hessian is a symmetric matrix. Therefore all the properties of quadratic forms apply in this case.

If we approximate a function using the Taylor series at a critical point p_0, the first-derivative term will of course be zero, so we will have something like T(p) = f(p_0) + (1/2)(p - p_0)^T H (p - p_0). As we are basically only after the "form" of the function (and by form I mean how it looks around that point), we can remove all the shifting up/down and left/right, which places it at the origin, and drop the positive factor 1/2, which does not change any signs. We end up with the standard quadratic form from linear algebra: T(p) = (p)^T Hp.

Without going too deep here, as it would require explaining change of basis (another large topic), there exists a diagonal representation of the Hessian matrix in another basis such that:

(p)^T Hp = (q)^T D q where q is the coordinate vector of p in the new basis. Anyway, guess what values D contains? The eigenvalues! Because D is diagonal, (q)^T D q is just a sum of terms of the form (eigenvalue) * (component of q)^2. The squared components are never negative, so the eigenvalues alone decide the sign of each term. They need to all have the same sign for every term to "move" in the same direction, i.e. for the function to take only positive or only negative values away from the critical point and be definite.
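For a 2x2 symmetric matrix, this change of basis can be written out explicitly and checked numerically. The sketch below assumes the Hessian is [[a, b], [b, c]] and uses the standard closed-form eigenvalues and rotation angle for that case; the function names are hypothetical and the example numbers are arbitrary.

```python
import math

def eigen_decompose_2x2(a, b, c):
    """Eigenvalues and eigenvector angle for the symmetric matrix [[a, b], [b, c]].

    The eigenvector for the first (larger) eigenvalue is (cos theta, sin theta);
    the eigenvector for the second is (-sin theta, cos theta).
    """
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return (mean + radius, mean - radius), theta

def quadratic_form(a, b, c, x, y):
    """p^T H p in the original coordinates p = (x, y)."""
    return a * x * x + 2.0 * b * x * y + c * y * y

def diagonal_form(a, b, c, x, y):
    """q^T D q, where q holds the coordinates of p in the eigenvector basis."""
    (lam1, lam2), theta = eigen_decompose_2x2(a, b, c)
    q1 = math.cos(theta) * x + math.sin(theta) * y
    q2 = -math.sin(theta) * x + math.cos(theta) * y
    # Each term is an eigenvalue times a squared (hence non-negative) coordinate.
    return lam1 * q1 * q1 + lam2 * q2 * q2

# Hypothetical Hessian [[2, 3], [3, 2]] and an arbitrary point p = (1.5, -0.7).
print(quadratic_form(2.0, 3.0, 2.0, 1.5, -0.7))
print(diagonal_form(2.0, 3.0, 2.0, 1.5, -0.7))
```

Both calls should print the same value up to rounding, which is the content of (p)^T Hp = (q)^T D q for this example.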

In order to get a negative term, the corresponding eigenvalue needs to be negative, and if the eigenvalues have different signs, the quadratic function will be positive in some directions and negative in others, i.e. an indefinite form, or a saddle point.

As the determinant is the product of the eigenvalues, the only way to get a negative value (with two variables) is if the signs are different; otherwise we will always get a positive value. We can of course also get zero, which is the case when the columns of the Hessian are linearly dependent, i.e. at least one eigenvalue is zero. I guess zero can be viewed as the quadratic approximation being flat in at least one direction, so the function is neither concave nor convex in both directions, only in one or in none (flat plane). And btw, the eigenvalues directly affect how concave or convex a quadratic function is.
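To make the link between the determinant and the eigenvalue signs concrete, here is a small two-variable sketch. It assumes the second partials at a critical point are already known; the function name, the returned labels, and the tolerance are illustrative choices, not part of the video.

```python
import math

def classify_critical_point(fxx, fxy, fyy, tol=1e-12):
    """Classify a critical point from the 2x2 Hessian [[fxx, fxy], [fxy, fyy]].

    Returns (label, determinant, (larger eigenvalue, smaller eigenvalue)).
    """
    mean = 0.5 * (fxx + fyy)
    radius = math.hypot(0.5 * (fxx - fyy), fxy)
    lam1, lam2 = mean + radius, mean - radius
    det = fxx * fyy - fxy * fxy  # equals lam1 * lam2 up to rounding
    if det > tol:
        # Same-sign eigenvalues: checking the larger one settles which sign.
        label = "local minimum" if lam1 > 0 else "local maximum"
    elif det < -tol:
        label = "saddle point"  # eigenvalues of opposite sign
    else:
        label = "inconclusive"  # at least one eigenvalue is (numerically) zero
    return label, det, (lam1, lam2)

print(classify_critical_point(2.0, 1.0, 2.0))
print(classify_critical_point(2.0, 3.0, 2.0))
print(classify_critical_point(-2.0, 1.0, -2.0))
print(classify_critical_point(1.0, 1.0, 1.0))
```

With more than two variables the determinant alone is no longer enough, since an even number of negative eigenvalues also gives a positive product, so one would look at the signs of all the eigenvalues instead.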