The Hessian matrix

Name: The Hessian matrix
Uploaded: 2016-06-16T19:33:03Z
Description: The Hessian matrix is a way of organizing all the second partial derivative information of a multivariable function.

Created by Grant Sanderson.

Alexander Wu
Posted 8 years ago.
How do you write the Hessian matrix notation by hand? Surely Boldface H is only used in printed form?

I mean, that's the case with vectors. Written by hand, you draw an arrow over the letter.

And does the Hessian matrix have anything to do with the symbol "Ĥ"? They look similar. (I found this symbol in Schrödinger's Equations, as in Ĥ|ɸ> = i ∂/∂t |ɸ>.)
(5 votes)
- SteveG
  Posted 8 years ago.
  There are numerous ways to denote the Hessian, but the most common form (when writing) is just to use a capital 'H' followed by the function (say, 'f') for which the second partial derivatives are being taken. For example, H(f).
  It is not necessary to bold, but it does help.
  The capital letter also helps identify it as a matrix.
  
  Furthermore, the 'Ĥ' in Schrödinger's Equation in Quantum Mechanics is known as the Hamiltonian, which is different from the Hessian.
  
  Hope that clears things up.
  (22 votes)
Surya Raju
Posted 4 years ago.
Is the Hessian just the Jacobian of the gradient function?
(10 votes)
- Yuya Fujikawa
  Posted 4 years ago.
  That sounds right. I believe that is another legitimate way of interpreting the Hessian.
  (3 votes)
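As a sketch of why this interpretation works, take a function f(x, y) of two variables. The notation below is one common convention and is an assumption here, not something taken from the video. The gradient is a vector-valued function, and differentiating each of its components with respect to each variable gives its Jacobian:

```latex
\[
\nabla f =
\begin{bmatrix}
\dfrac{\partial f}{\partial x} \\[6pt]
\dfrac{\partial f}{\partial y}
\end{bmatrix},
\qquad
J(\nabla f) =
\begin{bmatrix}
\dfrac{\partial}{\partial x}\!\left(\dfrac{\partial f}{\partial x}\right) & \dfrac{\partial}{\partial y}\!\left(\dfrac{\partial f}{\partial x}\right) \\[6pt]
\dfrac{\partial}{\partial x}\!\left(\dfrac{\partial f}{\partial y}\right) & \dfrac{\partial}{\partial y}\!\left(\dfrac{\partial f}{\partial y}\right)
\end{bmatrix}
=
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial y\,\partial x} \\[6pt]
\dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2}
\end{bmatrix}
\]
```

Written this way, the matrix is the transpose of the layout described in the video transcript, where the first column holds the derivatives taken with respect to x first. The two layouts coincide whenever the mixed partial derivatives are equal, which is the case when the second partial derivatives are continuous.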
richard.l.schuurmans
Posted 8 years ago.
I thought that constants become 0 when taking the derivative, the same way, for instance, a "4" would go away when taking a derivative. Why is that not the case here?
(1 vote)
- Stefen
  Posted 8 years ago.
  If you had f(x) = x + 4, then f'(x) = d/dx[x + 4] = d/dx[x] + d/dx[4] = 1 + 0 = 1, so the 4 "went away" because it is an added constant.
  But
  if you have f(x) = 4x, then f'(x) = d/dx[4x] = 4 d/dx[x] = (4)(1) = 4. Here the 4 stays because it is a constant multiplier, not an added term.
  
  So if I had f(x,y) = e^x + sin(y), and wanted ∂/∂x[f(x,y)], we would have
  ∂/∂x[f(x,y)] = ∂/∂x[e^x] + ∂/∂x[sin(y)] = e^x + 0 = e^x, and the sin(y) "goes away".
  But,
  if you have f(x,y) = (e^x)(sin(y)) and want ∂/∂x[f(x,y)], we can think of sin(y) as a constant and pull it out of the differential operator just like we did with the 4 above:
  ∂/∂x[f(x,y)] = ∂/∂x[(e^x)(sin(y))] = sin(y) ∂/∂x[e^x] = sin(y)e^x, or (e^x)(sin(y)).
  
  When an expression is multiplied by a constant, we can take the constant out of the differential operator. In this case, sin(y) counts as a constant because we are differentiating with respect to x.
  
  I hope that helped.
  Stefen
  (7 votes)
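The difference between an added sin(y) and a multiplied sin(y) can also be checked numerically. This is a minimal sketch that assumes a central finite-difference estimate of the partial derivative. The helper name `partial_x`, the step size `h`, and the sample point are illustrative choices, not part of the answer above.

```python
import math

def partial_x(f, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative of f with respect to x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def f_sum(x, y):
    # sin(y) is an added term, so it contributes nothing to the x-derivative.
    return math.exp(x) + math.sin(y)

def f_product(x, y):
    # sin(y) is a multiplier, so it survives in the x-derivative.
    return math.exp(x) * math.sin(y)

x0, y0 = 0.5, 1.2  # arbitrary sample point

d_sum = partial_x(f_sum, x0, y0)          # should be close to e^x0
d_product = partial_x(f_product, x0, y0)  # should be close to sin(y0) * e^x0

print("sum:    ", d_sum, "vs", math.exp(x0))
print("product:", d_product, "vs", math.sin(y0) * math.exp(x0))
```

In each printed pair the two numbers should agree to several decimal places, matching the hand computation: e^x for the sum and sin(y)e^x for the product.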
Moonchilde
Posted a year ago.
In the video, where he is giving d^2f/dydx, he says in explanation that this is first with respect to x (which he puts second), then with respect to y (which he puts first). And then he does the opposite for d^2f/dxdy.

Here is the interpretation I got from google, as well as ChatGPT: "The notation "d^2F/dydx" typically refers to taking the derivative of F with respect to y first and then taking the derivative of the result with respect to x."

Which is correct, the video or the web?
(2 votes)
- Elijah Daniels
  Posted a year ago.
  The video is correct. Khan Academy 1, internet 0.
  Of course, if the second partial derivatives are continuous, then by Clairaut's Theorem it doesn't matter in which order you differentiate. But if order matters, then you read the differentials in the denominator from right to left (backwards). For example, for d^3F/dydxdz, you would differentiate with respect to z first, then x, and finally y. You work nested derivatives from the inside out.
  (5 votes)
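Written out as nested operators, the right-to-left reading looks like this. It is a sketch of the standard Leibniz convention, using the same two examples as the question and the answer above:

```latex
\[
\frac{\partial^2 f}{\partial y\,\partial x}
  = \frac{\partial}{\partial y}\!\left(\frac{\partial f}{\partial x}\right)
  \quad \text{(first } x \text{, then } y \text{)}
\]
\[
\frac{\partial^3 F}{\partial y\,\partial x\,\partial z}
  = \frac{\partial}{\partial y}\!\left(\frac{\partial}{\partial x}\!\left(\frac{\partial F}{\partial z}\right)\right)
  \quad \text{(first } z \text{, then } x \text{, then } y \text{)}
\]
```

The innermost derivative is the one applied first, which is why the variable written last in the denominator is differentiated first.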
Mohammed Ghaïth
Posted a year ago.
Why is the name pronounced with a 'sh' sound and not with an 's' sound?!
(2 votes)
ЕБА(БЕ)_Anatolii_Onyshchenko
Posted a month ago.
Is the Hessian just the Jacobian of the gradient function?
(1 vote)
bohdankolisnyk
Posted a month ago.
What are possible uses of the Hessian matrix?
(1 vote)
- joshua
  Posted 19 days ago.
  Hessian matrices belong to a class of mathematical structures that involve second order derivatives. They are often used in machine learning and data science algorithms for optimizing a function of interest.
  
  https://machinelearningmastery.com/a-gentle-introduction-to-hessian-matrices/
  (1 vote)
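One concrete way a Hessian shows up in optimization is Newton's method, which uses the gradient and the Hessian together to choose a step. The sketch below is only an illustration under stated assumptions: the function `newton_step`, the quadratic test function, and the starting point are hypothetical, and none of them comes from the answer above or the linked article.

```python
def newton_step(grad, hess, x, y):
    """One Newton step for a two-variable function.

    grad(x, y) returns (fx, fy); hess(x, y) returns ((fxx, fxy), (fyx, fyy)).
    """
    gx, gy = grad(x, y)
    (a, b), (c, d) = hess(x, y)
    det = a * d - b * c
    # Solve H * step = gradient using the closed-form inverse of a 2x2 matrix.
    step_x = (d * gx - b * gy) / det
    step_y = (-c * gx + a * gy) / det
    return x - step_x, y - step_y

# Hypothetical test function: f(x, y) = x^2 + x*y + 2*y^2 - 3*x
def grad_f(x, y):
    return (2 * x + y - 3, x + 4 * y)

def hess_f(x, y):
    # The Hessian of a quadratic function is constant.
    return ((2.0, 1.0), (1.0, 4.0))

x1, y1 = newton_step(grad_f, hess_f, 0.0, 0.0)
print(x1, y1)
```

Because the test function is quadratic, a single step from any starting point lands on the point where the gradient is zero, here (12/7, -3/7). For non-quadratic functions the step is repeated.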
Ardra
Posted 17 days ago.
So the Hessian is the gradient equivalent for the second derivative of scalar valued functions.

Is the Hessian a rank-2 tensor?
(1 vote)

Video transcript

- [Voiceover] Hey guys. Before talking about the vector form for the quadratic approximation of multivariable functions, I've got to introduce this thing called the Hessian matrix. Essentially what this is, is just a way to package all the information of the second derivatives of a function. Let's say you have some kind of multivariable function like the example we had in the last video, e to the x halves multiplied by sine of y, so some kind of multivariable function. What the Hessian matrix is, and it's often denoted with an H, but a bold faced H, is it's a matrix, incidentally enough, that contains all the second partial derivatives of f. The first component is gonna be the partial derivative of f with respect to x twice in a row, and everything in this first column is kind of like you first do it with respect to x, because the next part is the second derivative where first you do it with respect to x and then you do it with respect to y. That's the first column of the matrix. Then up here it's the partial derivative where first you do it with respect to y and then you do it with respect to x, and then over here it's where you do it with respect to y both times in a row. Partial with respect to y both times in a row. Let's go ahead and actually compute this and think about what this would look like in the case of our specific function here. In order to get all the second partial derivatives we first should keep a record of the first partial derivatives. The partial derivative of f with respect to x. The only place x shows up is in this e to the x halves. Bring down that 1/2, e to the x halves, and sine of y just looks like a constant as far as x is concerned. Sine of y. Then the partial derivative with respect to y. Partial derivative of f with respect to y. Now e to the x halves looks like a constant and it's being multiplied by something that has a y in it, e to the x halves. 
The derivative of sine of y, since we're doing it with respect to y, is cosine of y. These terms won't be included in the Hessian itself but we're just keeping a record of them because now when we go in to fill in the matrix, this upper left component, we're taking the second partial derivative where we do it with respect to x then x again. Up here is when we did it with respect to x, if we did it with respect to x again we bring down another 1/2 so that becomes 1/4 times e to the x halves and that sine of y just still looks like a constant. Then this mixed partial derivative where we do it with respect to x then y, so we did it with respect to x here. When we differentiate this with respect to y, the 1/2 e to the x halves just looks like a constant but then derivative of sine of y ends up as cosine of y. Then up here, it's gonna be the same thing but let's see when you do it in the other direction, when you do it first with respect to y then x. Over here we did it first with respect to y. If we took this derivative with respect to x, the 1/2 would come down, so that would be 1/2 e to the x halves multiplied by cosine of y because that just looks like a constant since we're doing it with respect to x the second time. That would be cosine of y, and it shouldn't feel like a surprise that both of these terms turn out to be the same. With most functions that's the case. Technically not all functions. You can come up with some crazy things where this won't be symmetric, where you'll have different terms off the diagonal, but for the most part those you can expect to be the same. In this last term here where we do it with respect to y twice, we now think of taking the derivative of this whole term with respect to y, that e to the x halves looks like a constant and derivative of cosine is negative sine of y. This whole thing, a matrix, each of whose components is a multivariable function, is the Hessian. 
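For reference, the computation described so far can be written out as follows. This is a sketch in LaTeX that follows the layout the transcript describes, with the first column holding the derivatives taken with respect to x first. The symbol choices are assumptions, since the on-screen notation is not part of the transcript.

```latex
\[
f(x, y) = e^{x/2}\sin y,
\qquad
\frac{\partial f}{\partial x} = \tfrac{1}{2}\,e^{x/2}\sin y,
\qquad
\frac{\partial f}{\partial y} = e^{x/2}\cos y
\]
\[
\mathbf{H}f(x, y) =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[6pt]
\dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2}
\end{bmatrix}
=
\begin{bmatrix}
\tfrac{1}{4}\,e^{x/2}\sin y & \tfrac{1}{2}\,e^{x/2}\cos y \\[6pt]
\tfrac{1}{2}\,e^{x/2}\cos y & -e^{x/2}\sin y
\end{bmatrix}
\]
```

The two off-diagonal entries are equal, which matches the remark that the mixed partial derivatives come out the same for this function.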
This is the Hessian of f, and sometimes people write it as the Hessian of f, specifying what function it's of. You could think of it as a matrix valued function, which feels kind of weird, but you plug in two different values, x and y, and you'll get a matrix, so it's this matrix valued function. The nice thing about writing it like this is that you can actually extend this so that rather than just for functions that have two variables, let's say you had a function, let me kind of clear this up, let's say you had a function that had three variables or four variables or any number. Let's say it was a function of x, y, and z, then you can follow this pattern and following down the first column here the next term that you would get would be the second partial derivative of f, where first you do it with respect to x, and then you do it with respect to z. Then over here it would be the second partial derivative of f, where first you did it with respect to y and then you do it with respect to z. I'll clear up even more room here, because you'd have another column where you'd have the second partial derivative, where this time everything first you do it with respect to z and then with respect to x. Then over here you'd have the second partial derivative where first you do it with respect to z and then with respect to y. Then as the very last component you'd have the second partial derivative where first you do it with respect to, well, I guess you do it with respect to z twice. This whole thing, this three by three matrix, would be the Hessian of a three variable function. You can see how you could extend this pattern where if it was a four variable function you'd get a four by four matrix of all of the possible second partial derivatives. If it was a 100 variable function you would have a 100 by 100 matrix. 
The nice thing about having this is that then we can talk about it by just referencing the symbol, and we'll see in the next video how this makes it very nice to express, for example, the quadratic approximation of any kind of multivariable function, not just a two variable function, and the symbols don't get way out of hand 'cause you don't have to reference each one of these individual components. You can just reference the matrix as a whole and start doing matrix operations. I will see you in that next video.
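The pattern of an n by n matrix for an n variable function can be tried out numerically. The following is a minimal sketch that assumes central finite differences in place of the hand differentiation done in the video. The function name `hessian`, the step size `h`, the three-variable function `g`, and the sample points are illustrative choices, not from the video.

```python
import math

def hessian(f, point, h=1e-4):
    """Estimate the Hessian of f at `point` with central finite differences.

    Entry [i][j] approximates the second partial derivative of f taken with
    respect to variable i and variable j.
    """
    n = len(point)

    def shifted(i, di, j, dj):
        # Evaluate f with variable i moved by di and variable j moved by dj.
        p = list(point)
        p[i] += di
        p[j] += dj
        return f(*p)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (
                shifted(i, h, j, h) - shifted(i, h, j, -h)
                - shifted(i, -h, j, h) + shifted(i, -h, j, -h)
            ) / (4 * h * h)
    return H

def f(x, y):
    # The function from the video: e^(x/2) * sin(y)
    return math.exp(x / 2) * math.sin(y)

def g(x, y, z):
    # Hypothetical three-variable function, used only to show the 3 by 3 shape.
    return x * y * z + z ** 2

H2 = hessian(f, (1.0, 0.5))
H3 = hessian(g, (1.0, 2.0, 3.0))

for row in H2:
    print(row)
print(len(H3), "by", len(H3[0]))
```

At the point (1.0, 0.5) the entries of `H2` should be close to the hand-computed values 1/4 e^(x/2) sin(y), 1/2 e^(x/2) cos(y), and -e^(x/2) sin(y). `H3` has three rows and three columns, matching the three variable case described in the transcript.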