Main content

## Multivariable calculus

### Course: Multivariable calculus > Unit 3

Lesson 5: Lagrange multipliers and constrained optimization- Constrained optimization introduction
- Lagrange multipliers, using tangency to solve constrained optimization
- Finishing the intro lagrange multiplier example
- Lagrange multiplier example, part 1
- Lagrange multiplier example, part 2
- The Lagrangian
- Meaning of the Lagrange multiplier
- Proof for the meaning of Lagrange multipliers

© 2023 Khan AcademyTerms of usePrivacy PolicyCookie Notice

# Lagrange multipliers, using tangency to solve constrained optimization

The Lagrange multiplier technique is how we take advantage of the observation made in the last video, that the solution to a constrained optimization problem occurs when the contour lines of the function being maximized are tangent to the constraint curve. Created by Grant Sanderson.

## Want to join the conversation?

- Why didn't grant use use the cross product of two gradient vector in order two show that they are in the same direction ? He could easily say that the cross product of them must be zero .(10 votes)
- This works in 2 or 3 dimensions, but the cross product is not defined higher than 3 dimensions, so it would not work in those situations. In contrast, the gradient is defined as long as the multivariate function is continuous.(17 votes)

- Why not just use the constraint to plug it in the function and then maximize the function?

like use x^2 = 1 - y^2 and plug it in the upper function to get a function only in y and them maximize it. When does this work?(8 votes)- Solving for a variable and substituting works for some functions where the constraint is simple, but when the constraint is really complex, it might not be possible or desirable. In those cases, Lagrange multipliers are the way to go.(3 votes)

- I don't get why the two curves have to be tangential to be maximized--doesn't that assume that as you move farther out, the f value increases? I get that in terms of this specific problem, but what about other functions that increase as you move in?(6 votes)
- That's a good question. I'm not sure this is the right explanation, but my intuition is that even if the function increases as you move in, eventually it's going to move in so much that it's still going to eventually end up tangential, just at some other point.(2 votes)

- What if there was a contour line of the function _f_ that would intersect the constraint curve, wouldn't be tangent to it and its value would be higher than the value of the contour lines of f tangent to the constraint curve. How would we optimize the function in such case ?(5 votes)
- If the gradient of G happens to be a square matrix, is it possible to multiply the right side of the gradient of F by the inverse of the matrix gradient G to find Lambda?

F*G^-1 = Lambda ?(3 votes)- Wait, in what case would a function have a square matrix as its gradient?(4 votes)

- so lambda is the eigenvalue of the matrix satisfying gradF = A*gradG?(3 votes)
- Is there a name given to the vector of first order partial derivatives shown at5:25? Why do we use such a vector? Thanks.(2 votes)
- Those can be called by a couple of different names, depending on the uses, including "gradient of f" (used in this video and denoted as ∇f) and the Jacobian matrix, denoted as J. The extension of this for second partial derivatives is the Hessian matrix, denoted as H. Hope this helps.(3 votes)

- Does this work for more than 2 variables?(3 votes)
- So just as a refresher: a gradient is basically the derivative of a set of vectors, right?(2 votes)
- Why can't we work with tangents directly instead of gradients? More explicitly, I'm proposing to find out the tangent equation to both curves in question and solving for their equality. Would this approach be an issue with more variables?(2 votes)

## Video transcript

- [Instructor] In the
last video I introduced a constrained optimization problem where we were trying to
maximize this function, f of x, y equals x squared times y, but subject to a constraint
that your values of x and y have to satisfy x squared
plus y squared equals one. And the way we were visualizing this was to look at the x, y
plane where this circle here represents our constraint. All of the points that make up this set, x squared plus y squared equals one, and then this curvy line here
is one of the contours of f, meaning, we're setting f of
x, y equal to some constant. And then I was varying
around that constant c. So for high values of c,
the contour would look something like this,
this is where the value of x squared times y is big. And then for small values of c, the contours would look like this. So all the points on this
line would be f of x, y equals like 0.01 in this
case, something like that. Then the way to think about
maximizing this function is to try to increase that value of c as much as you can without
it falling off the circle. And the key observation is that happens when they're tangent. So, you know, you might
kind of draw this out in a little sketch and
say there's some curve representing your constraint,
which in this case would be, you know, where our circle is. And then the curve
representing the contour would just kiss that curve, just barely touch it in some way. Now, that's pretty, but in
terms of solving the problem, we still have some work to do. And the main tool we're gonna
use here is the gradient. So let me go ahead and draw
a lot more contour lines than there already are
for x squared times y. So this is many of the contour lines, and I'll draw the gradient
field, the gradient field of f. So I've made a video
about the relationship between the gradient and contour lines. And the upshot of it is
that these gradient vectors, every time they pass
through a contour line, they're perpendicular to it. And the basic reason for that is if you walk along the contour line, the function isn't changing value, so if you want it to change most rapidly, you know, it kind of makes
sense you should walk in the perpendicular direction, so that no component of
the walk that you're taking is, you know, useless, is along the line where the function doesn't change. But again, there's a whole video on that that's worthy checking out
if this feels unfamiliar. For our purposes, what it means is that when we're considering
this point of tangency, the gradient of f at that
point is gonna be some vector perpendicular to both
the curves at that point. So that little vector
represents the gradient of f at this point on the plane. And we can do something very similar to understand the other curve. Right now I've just
written it as a constraint, x squared plus y squared equals one. But you know, to give
that function a name, let's say we've defined g of x, y to be x squared plus y squared, x squared plus y squared. In that case, this
constraint is pretty much just one of the contour
lines for the function g, and we can take a look at that. If we go over here and we look at all of the other contour
lines for this function g, and it should make sense
that they're circles, because this function is
x squared plus y squared. And if we took a look
at the gradient of g, and we go over and ask
about the gradient of g, it has that same property, that every gradient vector, if it passes through a contour line, is perpendicular to it. So over on our drawing here, the gradient vector of g
would also be perpendicular to both these curves. And you know, maybe in this case, it's not as long as the gradient
of f, or maybe it's longer. There's no reason that it
would be the same length, but the important fact is
that it's proportional. And the way that we're
gonna write this in formulas is to say that the
gradient of f evaluated, let's see, evaluated at
whatever the maximizing value of x and y are, so we should
give that a name probably. Maybe x sub m, y sub
m, the specific values of x and y that are gonna be at this point that maximizes the function
subject to our constraint. So that's gonna be related
to the gradient of g, it's not gonna be quite equal, so I'll leave some room here. Related to the gradient of g, evaluated at that same point, xm, ym. And like I said, they're not
equal, they're proportional. So we need to have some kind of proportionality constant in there. You almost always use the variable lambda, and this guy has a fancy name, it's called a Lagrange multiplier. Lagrange, Lagrange was one of those famous French mathematicians. I always get him confused
with some of the other French mathematicians at the
time like Legendre or Laplace, there's a whole bunch of things. Let's see, multiplier,
distracting myself talking here. So Lagrange multiplier. So there's a number of things
in multivariable calculus named after Lagrange, and
this is one of the big ones. This is a technique that
he kind of developed or at the very least popularized. And the core idea is to just set these gradients equal to each other, 'cause that represents
when the contour line for one function is tangent to
the contour line of another. So this, this is something
that we can actually work with. And let's start working it out, right, let's see what this
translates to in formulas. So I already have g written here, so let's go ahead and just evaluate what the gradient of g should be. And that's the gradient of
x squared plus y squared. And the way that we take our gradient is it's gonna be a vector whose components are
all partial derivatives. So the first component
is the partial derivative with respect to x. So we treat x as a variable,
y looks like a constant. The derivative is two x. The second component
the partial derivative with respect to y, so now we're
treating y as the variable, x is the constant, so the
derivative looks like two y. So that's the gradient of g. Then the gradient of f, gradient of f. It's gonna look like gradient of, let's see, what is x? What is f? It's x squared times y. So x squared times y. We do the same thing. First component partial
derivative with respect to x, x looks like a variable, so
it's derivative is two times x and then that y looks like a
constant when we're up here. But then partial derivative
with respect to y, that y looks like a variable,
that x squared just looks like a constant sitting in front of it. So that's what we get. And now if we kind of work out this Lagrange mulitplier expression using these two vectors,
what we have written, what we're gonna have
written is that this vector, two xy x squared is proportional with the
proportionality constant lambda to the gradient vector for g. So two x two y. And if you want you can think about this as two separate equations. I mean right now it's one
equation with vectors, but really what this is saying is you've got two separate equations, two times xy is equal to lambda, ah, gotta change colors a lot here. Lambda times two x. Hm, gonna be stickler for color. Keep red all of the
things associated with g. And then, this second equation is that x squared is equal to lambda times two y. And this might seem like a problem, because we have three unknowns, x, y, and this new lambda that we introduced. Kind of shot ourselves in the foot by giving ourselves a new
variable to deal with. But we only have two equations. So in order to solve this, we're
gonna need three equations. And the third equation is something that we've known the whole time. It's been part of the original problem. It's the constraint itself, x squared plus y squared equals one. So that, that third equation, x squared plus y squared is equal to one. So these are the three equations that characterize our
constrained optimization problem. The bottom one just tells you that we have to be on
this unit circle here. Allow me to just highlight it. We have to be on this unit circle. And then these top two
tell us what's necessary in order for our contour lines, the contour of f and the contour of g to be perfectly tangent with each other. So, in the next video, I'll
go ahead and solve this. At this point, it's pretty
much just algebra to deal with, but it's worthy going through. And then in the next couple ones, I'll talk about a way
that you can encapsulate all three of these equations
into one expression, and also a little bit
about the interpretation of this lambda that we introduced. 'Cause it's not actually
just a dummy variable, it has a pretty interesting
meaning in physical contexts once you're actually dealing with a constrained optimization
problem in practice. So I'll see you in the next video.