
# Proof for the meaning of Lagrange multipliers

## Video transcript

All right, so last video I showed you this really crazy fact. We have our usual setup here for this constrained optimization situation: a function we want to maximize, which I'm thinking of as revenue for some company, and a constraint, which I'm thinking of as some kind of budget for that company. As you know, if you've gotten to this video, one way to solve this constrained optimization problem is to define this function here, the Lagrangian, which involves taking the function you're trying to maximize, in this case the revenue, and subtracting a new variable lambda, what's called the Lagrange multiplier, times this quantity: the budget function, however much you spend as a function of your input parameters, minus the budget itself, which you might think of as \$10,000 in our example.

So that's all the usual setup. The crazy fact, which I just declared, is that when you set this gradient equal to zero and find some solution, and there will be three variables in this solution, h-star, s-star, and lambda-star, this lambda-star is not meaningless. It's not just a proportionality constant between these gradient vectors; it actually tells you how much the maximum possible revenue changes as a function of your budget.

The way to start writing all of that in formulas is to make explicit the fact that if you consider this value, the \$10,000 that is your budget, which I'm calling b, a variable and not a constant, then you have to acknowledge that h-star and s-star are dependent on b. It's a very implicit relationship, something that's kind of hard to think about at first, because as you change b, it changes what the Lagrangian is, which changes where its gradient equals zero, which changes what h-star, s-star, and lambda-star are. But in principle they are some function of that budget b, and the maximum possible revenue is whatever you get when you plug that solution into your function R. And the claim
I made, the one I just pulled out of the hat, is that lambda-star, the lambda value that comes packaged with these two when you set the gradient of the Lagrangian equal to zero, equals the derivative of this maximum value, thought of as a function of b (maybe I should emphasize that: we're thinking of this maximum value as a function of b), with respect to b. That's kind of a mouthful; it takes a lot just to phrase what's going on. But in the context of an economic example, it has a very clear, precise meaning: if you increase your budget by a dollar, from \$10,000 to \$10,001, you're wondering, for that tiny change in budget, that tiny db, what is the ratio of the resulting change in revenue? So in a sense, this lambda-star tells you, for every dollar that you increase the budget, how much your revenue can increase, if you're always maximizing it.

So why on earth is this true? It just seems like it comes out of nowhere. Well, there are a couple of clever observations that go into proving it. The first is to notice what happens if we evaluate this Lagrangian function itself at the critical point, when you input h-star, s-star, and lambda-star. Remember, the way these guys are defined is that you look at all the values where the gradient of the Lagrangian equals the zero vector, and if you get multiple options (sometimes when you set the gradient equal to zero, you get multiple solutions), whichever one maximizes R, that is h-star, s-star, lambda-star. So now I'm asking: if you plug that not into the gradient of the Lagrangian but into the Lagrangian itself, what do you get? Well, we just look at its definition up here: R evaluated at h-star and s-star, and we subtract off lambda-star times B of h-star and s-star, minus the constant that is your budget, something you might think of as \$10,000, whatever you set little b equal to. Okay, great.
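Written out symbolically (using R for the revenue, B for the spending function, b for the budget, and M-star for the maximum revenue, matching the quantities named in the transcript), the setup and the claim so far are:

```latex
% The Lagrangian, with b treated as a constant
\mathcal{L}(h, s, \lambda) = R(h, s) - \lambda\,\bigl(B(h, s) - b\bigr)

% (h^*, s^*, \lambda^*) is the solution of \nabla \mathcal{L} = \vec{0}
% that maximizes R, and M^* is the resulting maximum revenue
M^*(b) = R\bigl(h^*(b),\, s^*(b)\bigr)

% The claim to be proved
\lambda^*(b) = \frac{dM^*}{db}

% First observation: evaluate the Lagrangian itself at the critical point
\mathcal{L}(h^*, s^*, \lambda^*) = R(h^*, s^*) - \lambda^*\bigl(B(h^*, s^*) - b\bigr)
```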
Why does this tell us anything? You're just plugging in stars instead of the usual variables. But the key is that if you plug in h-star and s-star, this value has to equal zero, because h-star and s-star have to satisfy the constraint. Remember, one of the cool parts about this Lagrangian function as a whole is that when you take its partial derivative with respect to lambda, all that's left is this constraint function minus the constraint amount. When you set the gradient of the Lagrangian equal to the zero vector, one component of that is setting the partial derivative with respect to lambda equal to zero, and if you remember from the Lagrangian video, all that really boils down to is the fact that the constraint holds: your budget hits the \$10,000. When you plug the appropriate h-star and s-star into this value, you are hitting the constrained amount of money you can spend. So by virtue of how h-star and s-star are defined, the fact that they are solutions to the constrained optimization problem means this whole portion goes to zero, and we can just cancel all of that out.

What's left is the maximum possible revenue. So evidently, when you evaluate the Lagrangian at this critical point, at h-star, s-star, and lambda-star, it equals M-star, the maximum possible value of the function you're trying to maximize. Ultimately, what we want is to understand how that maximum value changes when you consider it a function of the budget, so evidently what we can do is ask how the Lagrangian changes as you consider it a function of the budget.

Now, this is an interesting thing to observe, because if we just look up at the definition of the Lagrangian, at this formula, and I told you to take the derivative of it with respect to little b, how much does it change with respect to little b, you would notice that this term goes to zero, since it doesn't have a little b in it, this would also go to zero, and
all you'd be left with would be negative lambda times negative b, and the derivative of that with respect to b would be lambda. So you might say, oh yeah, of course: once we work out the derivative of the Lagrangian with respect to b, the only term left is the lambda. And that's compelling, but it's not entirely right, because it overlooks the fact that L is not actually defined as a function of b. When we defined the Lagrangian, we were considering b to be a constant. So if you really want to consider this a function that involves b, the way we should write it (and I'll go ahead and erase this guy) is to say it's a function of h-star, which itself is dependent on b, and s-star, which is also a function of b. As soon as we start considering b a variable and not a constant, we have to acknowledge that this critical point, h-star, s-star, and lambda-star, depends on the value of b. So likewise, lambda-star is also going to be a function of b. And then we consider, as a fourth variable, the value of b itself; so we're adding on yet another variable to this function, the value of b itself.

So now we want to know: what is the value of the Lagrangian at the critical point h-star, s-star, lambda-star, considered as a function of b? That can be kind of confusing. What you basically have is a function that only really depends on one value, b, but it kind of goes through a four-variable function. Just to make it explicit, this would equal the value of R as a function of h-star and s-star, where each of those is a function of little b. So this term is saying: what's your revenue, evaluated at the maximizing h and s for the given budget? And then you subtract off lambda-star (I'm not going to have room here, am I?), so what you subtract off is minus lambda-star times B of h-star and s-star, where each of these guys is also a function
of little b, minus little b. So you have this large, kind of complicated multivariable function. It's defined in terms of h-stars and s-stars, which are themselves very implicit; we just say, by definition, these are whatever values make the gradient of L equal zero. So it's very hard to think about what that means concretely, but all of it is really just dependent on the single value little b.

From here, if we want to evaluate the derivative of this Lagrangian with respect to little b, which is really the only thing it depends on (it's just via all of these other variables), we use the multivariable chain rule. At this point, if you don't know the multivariable chain rule, I have a video on that: definitely pause, go take a look, and make sure it all makes sense. Right here, I'm just going to assume you know what it is.

So what we do is look at the partial derivatives with respect to all four of these inputs. We start with the partial derivative of L with respect to h-star, and we multiply that by the derivative of h-star with respect to b. This might seem like a very hard thing to think about, like, how do we know how h-star changes as b changes? But don't worry about it; you'll see something magic happen in just a moment. Then we add in the partial derivative of L with respect to that second variable, s-star, times the derivative of s-star with respect to b. (You can see how you really need to know the multivariable chain rule; otherwise this would all seem kind of out of the blue.) Then we add in the partial derivative of L with respect to lambda-star, times the derivative of lambda-star with respect to little b. And then finally, we take the partial derivative of this Lagrangian with respect to that little b, which we're now
considering a variable (we're no longer considering b a constant), times, well, something kind of silly: the derivative of b with respect to itself.

Now, if you're thinking this is going to be horrifying to compute, I can understand where you're coming from. You'd have to know the derivative of lambda-star with respect to b; you'd have to somehow be intimately familiar with how this lambda-star changes as you change b. And like I said, that's such an implicit relationship: we just said that lambda-star is, by definition, whatever the solution to this gradient equation is, so somehow you're supposed to know how that changes when you slightly alter b over here. Well, you don't really have to worry, because by definition, h-star, s-star, and lambda-star are whatever values make the gradient of L equal to zero. Think about what it means for the gradient of L to equal the zero vector: when you take the derivative with respect to that first variable, h-star, it equals zero; when you take the derivative with respect to the second variable, that equals zero as well; and with respect to this third variable, that's going to equal zero too. By definition, h-star, s-star, and lambda-star are whatever values make it the case that when you plug them in, the partial derivative of the Lagrangian with respect to any one of those variables equals zero.

So we don't even have to worry about most of this equation. The only part that matters is the partial derivative of L with respect to b, the one we're now considering a variable, times, well, what's db/db, the rate of change of a variable with respect to itself? It's one. So all of this stuff, this entire multivariable chain rule, boils down to a single innocent-looking factor: the partial derivative of L with respect to little b.

Now, there's something very subtle here, because this might seem obvious. I'm saying the derivative of L with respect
to b equals the derivative of L with respect to b. But maybe I should use different notation here, because when I'm taking this derivative, I'm really considering L as a single-variable function: I'm considering not what happens as you freely change all four of these variables, since three of them are locked into place by b. So maybe I should give that a different name; I'll call it L-star. L-star is a single-variable function, whereas this L is a multivariable function, the one where you can freely change the values of h and s and lambda and b as you put them in.

If we scroll up to look at its definition, which I've written all over, well, let me actually rewrite it; I think that'll be useful. I'm going to rewrite that L. If I consider it as a four-variable function of h, s, lambda, and b, what it equals is R evaluated at h and s, minus lambda multiplied by this constraint function B evaluated at h and s, minus little b, where I'm now considering little b to be a variable. So this is the Lagrangian when you consider all four of these to be freely changing as you want, whereas the thing up here, which I'm considering a single-variable function, has three of its inputs locked into place, so effectively it's just a single-variable function of b.

So it's actually quite miraculous that the single-variable derivative of that L, or maybe I should say L-star, with respect to b ends up being the same as the partial derivative of this L, the one where you're free to change all the variables. In any usual circumstance, all of these other terms would come into play somehow. But what's special here is that, by the definition of this L-star, the specific way in which h-star, s-star, and lambda-star are locked into place happens to be one in which all of those partial derivatives go to zero. So that's pretty subtle, and I think it's quite clever. And what it leaves us with is
that we just have to evaluate this partial derivative, which is quite simple, because we look down here and ask: what's the partial derivative of L with respect to b? Well, this R has no b's in it, so we don't need to care about that. This term over here, its partial derivative is negative one, just because there's a b there, and that's multiplied by the constant lambda, so all of that just equals lambda. And if we're in the situation where lambda is locked into place as a function of little b, then we'd write lambda-star as a function of little b. If that feels a little notationally confusing, I'm right there with you.

But the important thing to remember is that we just started considering b as a variable, and we were looking at h-star, s-star, and lambda-star as they depended on that variable. We made the observation that the Lagrangian evaluated at that critical point equals the revenue evaluated at that critical point; all the rest of the stuff cancels out. So if you want to know the derivative of M-star, the maximum revenue, with respect to the budget, that is, how much your maximum revenue changes for tiny changes in your budget, that's the same as looking at the derivative of the Lagrangian with respect to the budget, so long as you're only considering it on values h-star, s-star, lambda-star that are critical points of the Lagrangian. And all of that really nicely boils down to taking a simple partial derivative, which gives us the relation we want.
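To make the fact concrete, here is a small sanity check in Python with sympy, using a made-up example of my own (the video never pins down specific R and B): maximize R(h, s) = h·s subject to B(h, s) = h + s = b. Solving the gradient equation symbolically, the multiplier lambda-star comes out equal to dM-star/db:

```python
# Sanity check of lambda* = dM*/db on a toy problem (hypothetical R and B,
# not the ones from the video): maximize R = h*s subject to h + s = b.
import sympy as sp

h, s, lam, b = sp.symbols("h s lam b", positive=True)

R = h * s              # made-up "revenue" function
B = h + s              # made-up "budget" (spending) function
L = R - lam * (B - b)  # the Lagrangian

# Set the gradient of the Lagrangian equal to the zero vector and solve.
sol = sp.solve(
    [sp.diff(L, h), sp.diff(L, s), sp.diff(L, lam)],
    [h, s, lam],
    dict=True,
)[0]

lam_star = sol[lam]                      # the Lagrange multiplier, as a function of b
M_star = R.subs({h: sol[h], s: sol[s]})  # maximum revenue, as a function of b

# Both print b/2: the multiplier equals the derivative of the maximum w.r.t. b.
print("lambda* =", lam_star)
print("dM*/db  =", sp.diff(M_star, b))
```

Here M-star works out to b²/4, so a small budget increase db buys roughly (b/2)·db more revenue, which is exactly what lambda-star = b/2 reports.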