The Hessian matrix
The Hessian matrix is a way of organizing all the second partial derivative information of a multivariable function. Created by Grant Sanderson.
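Concretely, for a two-variable function f(x, y), the Hessian packages the four second partial derivatives into one matrix; in LaTeX notation:

H f = \begin{bmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \, \partial y} \\
\frac{\partial^2 f}{\partial y \, \partial x} & \frac{\partial^2 f}{\partial y^2}
\end{bmatrix}

(When the second partial derivatives are continuous, the two mixed entries are equal, so the order of differentiation doesn't matter.)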
Want to join the conversation?
- Q: How do you write the Hessian matrix notation by hand? Surely boldface H is only used in printed form? I mean, that's the case with vectors: written by hand, you draw an arrow over the letter. And does the Hessian matrix have anything to do with the symbol "Ĥ"? They look similar. (I found this symbol in Schrödinger's equation, as in Ĥ|ɸ⟩ = iħ ∂/∂t |ɸ⟩.)
- A: There are numerous ways to denote the Hessian, but the most common form when writing by hand is just a capital 'H' followed by the function (say, 'f') whose second partial derivatives are being taken: for example, H(f). It is not necessary to bold it, but bolding does help, and the capital letter helps identify it as a matrix. Furthermore, the 'Ĥ' in Schrödinger's equation in quantum mechanics is the Hamiltonian, which is different from the Hessian. Hope that clears things up.
- Q: I thought that constants become 0 when taking the derivative, the same way, for instance, a "4" would go away when taking a derivative. Why is that not the case here?
- A: If you had f(x) = x + 4, then f'(x) = d/dx[x + 4] = d/dx[x] + d/dx[4] = 1 + 0 = 1, and the 4 "went away."
But if you have f(x) = 4x, then f'(x) = d/dx[4x] = 4 d/dx[x] = (4)(1) = 4; here the 4 stays. Why? An added constant term differentiates to zero, but a constant factor comes out in front of the derivative.
So if I had f(x, y) = e^x + sin(y) and wanted ∂/∂x[f(x, y)], we would have
∂/∂x[f(x, y)] = ∂/∂x[e^x] + ∂/∂x[sin(y)] = e^x + 0 = e^x, and the sin(y) "goes away."
But if you have f(x, y) = (e^x)(sin(y)) and want ∂/∂x[f(x, y)], we can think of sin(y) as a constant factor and take it out of the differential operator, just like we did with the 4 above:
∂/∂x[f(x, y)] = ∂/∂x[(e^x)(sin(y))] = sin(y) ∂/∂x[e^x] = sin(y)e^x, or (e^x)(sin(y)).
When a variable is multiplied by a constant, we can take the constant out of the differential operator. In this case, sin(y) acts as a constant because we are differentiating with respect to x.
I hope that helped.
Stefen
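As a quick sanity check of the rule Stefen describes, here is a minimal sketch using Python's sympy library (assuming sympy is installed; the functions mirror the examples above):

# Minimal sketch checking the constant-term vs. constant-factor rule with sympy.
from sympy import symbols, exp, sin, diff

x, y = symbols('x y')

# Product: sin(y) is a constant *factor* with respect to x, so it stays.
f = exp(x) * sin(y)
print(diff(f, x))   # exp(x)*sin(y)

# Sum: sin(y) is a constant *term* with respect to x, so it goes away.
g = exp(x) + sin(y)
print(diff(g, x))   # exp(x)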
- Q: Is the Hessian just the Jacobian of the gradient function?
- A: Yes, that's a legitimate way of interpreting it: the Hessian of f is exactly the Jacobian matrix of the gradient ∇f.
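For a concrete check of this interpretation, here is a small sympy sketch using the example function from the video (assuming sympy is installed):

# Minimal sketch: the Hessian as the Jacobian of the gradient.
from sympy import symbols, exp, sin, hessian, Matrix

x, y = symbols('x y')
f = exp(x / 2) * sin(y)                # the example function from the video

grad = Matrix([f.diff(x), f.diff(y)])  # gradient of f as a column vector
J = grad.jacobian([x, y])              # Jacobian of the gradient
H = hessian(f, [x, y])                 # sympy's built-in Hessian

print(J == H)   # True: both give the same 2x2 matrix of second partials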
- Q: Why is the name pronounced with a 'sh' sound and not with an 's' sound?!
Video transcript
- [Voiceover] Hey guys. Before talking about the vector form for the quadratic approximation of multivariable functions, I've got to introduce this thing called the Hessian matrix. Essentially, it's just a way to package all the information of the second derivatives of a function.

Let's say you have some kind of multivariable function, like the example we had in the last video: e to the x halves multiplied by sine of y. The Hessian matrix, often denoted with a boldfaced H, is a matrix, incidentally enough, that contains all the second partial derivatives of f. The first component is the partial derivative of f with respect to x twice in a row, and everything in this first column is differentiated first with respect to x: the next entry down is the second derivative where first you do it with respect to x and then with respect to y. That's the first column of the matrix. Then up here it's the partial derivative where first you do it with respect to y and then with respect to x, and over here it's the partial derivative with respect to y both times in a row.
Let's go ahead and actually compute this and think about what it would look like in the case of our specific function. In order to get all the second partial derivatives, we first should keep a record of the first partial derivatives. For the partial derivative of f with respect to x: the only place x shows up is in e to the x halves, so bring down that one half, e to the x halves, and sine of y just looks like a constant as far as x is concerned. Then the partial derivative of f with respect to y: now e to the x halves looks like a constant, and it's being multiplied by something that has a y in it; the derivative of sine of y, since we're doing it with respect to y, is cosine of y.
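In symbols, the two first partial derivatives just recorded are:

\frac{\partial f}{\partial x} = \frac{1}{2} e^{x/2} \sin(y),
\qquad
\frac{\partial f}{\partial y} = e^{x/2} \cos(y)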
These terms won't be included in the Hessian itself, but we're keeping a record of them because now, when we go to fill in the matrix, this upper-left component is the second partial derivative where we differentiate with respect to x and then x again. Up here is where we did it with respect to x; if we do it with respect to x again, we bring down another one half, so that becomes one fourth e to the x halves, and that sine of y still looks like a constant.

Then there's the mixed partial derivative, where we do it with respect to x and then y. We did it with respect to x here; when we differentiate this with respect to y, the one half e to the x halves just looks like a constant, but the derivative of sine of y ends up as cosine of y.

Then up here it's going to be the same thing, but let's see what happens when you do it in the other direction, first with respect to y and then x. Over here we did it first with respect to y; if we take this derivative with respect to x, the one half comes down, so that would be one half e to the x halves multiplied by cosine of y, because cosine of y just looks like a constant since we're doing it with respect to x the second time. It shouldn't feel like a surprise that both of these mixed terms turn out to be the same. With most functions that's the case; technically not all functions, since you can come up with some crazy examples where this won't be symmetric and the two off-diagonal terms differ, but as long as the second partial derivatives are continuous the mixed partials are equal, so for the most part you can expect them to be the same.
For this last term, where we do it with respect to y twice, we now take the derivative of that whole term with respect to y: the e to the x halves looks like a constant, and the derivative of cosine of y is negative sine of y. This whole thing, a matrix each of whose components is a multivariable function, is the Hessian of f.
Sometimes people write it as "Hessian of f," specifying what function it's of. You could think of it as a matrix-valued function, which feels kind of weird, but you plug in two different values, x and y, and you get a matrix, so it's a matrix-valued function.
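To make the "matrix-valued function" idea concrete, here is a minimal sympy sketch that builds the Hessian of the video's example and evaluates it at a specific point (the point (0, π/2) is just an arbitrary choice for illustration):

# Minimal sketch: the Hessian as a matrix-valued function.
from sympy import symbols, exp, sin, hessian, pi

x, y = symbols('x y')
f = exp(x / 2) * sin(y)

H = hessian(f, [x, y])            # symbolic 2x2 matrix of second partials
print(H)

# Plugging in a point returns a concrete matrix of numbers.
print(H.subs({x: 0, y: pi / 2}))  # Matrix([[1/4, 0], [0, -1]])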
The nice thing about writing it like this is that you can extend the idea beyond functions of two variables. Let's say you had a function of three variables, or four, or any number; say it's a function of x, y, and z. Then you follow the same pattern. Going down the first column, the next term you'd get is the second partial derivative of f where first you do it with respect to x and then with respect to z. Over here it would be the second partial derivative where first you do it with respect to y and then with respect to z. Then there's another column, where everything is differentiated first with respect to z: the second partial derivative with respect to z and then x, then the one with respect to z and then y, and finally the very last component, where you do it with respect to z twice.
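Written out, the three-variable Hessian described here is:

H f = \begin{bmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \, \partial y} & \frac{\partial^2 f}{\partial x \, \partial z} \\
\frac{\partial^2 f}{\partial y \, \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \, \partial z} \\
\frac{\partial^2 f}{\partial z \, \partial x} & \frac{\partial^2 f}{\partial z \, \partial y} & \frac{\partial^2 f}{\partial z^2}
\end{bmatrix}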
This whole three-by-three matrix would be the Hessian of a three-variable function. You can see how to extend the pattern: if it were a four-variable function, you'd get a four-by-four matrix of all the possible second partial derivatives, and if it were a 100-variable function, you'd have a 100-by-100 matrix.

The nice thing about having this is that we can talk about it just by referencing the symbol, and we'll see in the next video how this makes it very nice to express, for example, the quadratic approximation of any kind of multivariable function, not just a two-variable function. The symbols don't get way out of hand, because you don't have to reference each one of these individual components; you can just reference the matrix as a whole and start doing matrix operations. I will see you in that next video.