Video transcript

All right, so today I'm going to be talking about the Lagrangian. Now, we've talked about Lagrange multipliers, and this is a highly related concept. In fact, it's not really teaching anything new; it's just repackaging stuff that we already know.

To remind you of the setup, this is going to be a constrained optimization problem. We'll have some kind of multivariable function f(x, y), and the one I have pictured here is f(x, y) = x^2 · e^y · y. What I have shown here is a contour line for this function. That is, we ask what happens if we set f equal to some constant, and we look at all values of x and y such that the function outputs that constant. If I choose a different constant, the contour line could look a little bit different; it's kind of nice that it has similar shapes.

So that's the function, and the goal is to maximize it. But of course it's not just that. The reason we call it a constrained optimization problem is that there's some kind of constraint, some other function g(x, y), in this case x^2 + y^2, and we want this to equal some specific amount, in this case 4. So we're saying you can't look at just any (x, y) to maximize the function; you're limited to the values of x and y that satisfy this property.

I talked about this in the last couple of videos, and the cool thing we found was that if you look through the various contour lines of f, the maximum will be achieved when a contour line is just perfectly tangent to this contour of g. A pretty classic example of what these sorts of things could mean, or how they're used in practice: say this was a revenue function for some kind of company, modeling your revenue based on different choices you could make running that company, and the constraint that you'd have would be, let's say, a
budget. So I'm just going to go ahead and write "budget," or B for budget, here. You're trying to maximize revenue, and you have some sort of dollar limit on what you're willing to spend. These are, of course, just made-up functions; you'd never have a budget that looks like a circle, or this kind of random configuration for your revenue, but in principle you know what I mean.

So the way that we took advantage of this tangency property, and I think this is pretty clever (let me just redraw it over here), is to look at the point where the two curves are just tangent to each other. There, the gradient vector of the thing we're maximizing, which in this case is R, is going to be parallel, or proportional, to the gradient vector of the constraint, which in this case is B. What this means, if we were going to solve a set of equations, is that you compute the gradient of R, which will involve two different partial derivatives, and you set it equal not to the gradient of B, because it's not necessarily equal to the gradient of B, but proportional to it, with some kind of proportionality constant lambda: ∇R = λ∇B. (That was kind of a squirrelly lambda; lambda is so hard to draw. All right, that one looks fine.)

So the gradient of the revenue is proportional to the gradient of the budget, and we did a couple of examples of solving this kind of thing. It gives you two separate equations from the two partial derivatives, and then you use this budget constraint right here as your third equation.

The Lagrangian, the point of this video, is basically just a way to package up that gradient equation along with the constraint equation into a single entity. It's not really adding new information, and if you're solving things by hand it doesn't really do anything for you, but what makes it nice is that it's
something easier to hand to a computer, and I'll show you what I mean.

So I'm going to define the Lagrangian itself, which we write with this kind of funky-looking script L. It's a function with the same inputs as your revenue function, the thing you're maximizing, along with lambda, that Lagrange multiplier. The way we define it (I'm going to need some extra room, so I'll define it down here) is the function you're maximizing, minus lambda, which is just another input to this new function we're defining, multiplied by the constraint function evaluated at (x, y) minus whatever the constraint value is:

L(x, y, λ) = R(x, y) − λ(B(x, y) − b)

In this case I put in 4, so you'd write minus 4; if we want to be more general, we write b for whatever your budget is, so over here you're subtracting off little b.

So this here is a new multivariable function. It's something where you can input x, y, and lambda, plug it all in, and get some kind of value. And remember, b in this case is a constant; I'll go ahead and note that this right here is not considered a variable. Your variables are x, y, and lambda.

This would seem like a totally weird and random thing to do if you just saw it out of context, or if it were unmotivated. But what's kind of neat, and we'll go ahead and work through this right now, is that when you take the gradient of this function called the Lagrangian and set it equal to zero, that's going to encapsulate all three equations that you need, and I'll show you what I mean by that.

So let's just remember what the gradient of L is: it's a vector with three different components, since L has three different inputs. You're going to have the partial derivative of L with respect to x, you're going to have the partial derivative of L
with respect to y, and then finally the partial derivative of L with respect to lambda, our Lagrange multiplier, which we're considering an input to this function. And remember, whenever we write that a vector equals zero, we really mean the zero vector; you'll often see it in bold if it's in a textbook. What we're really saying is that we set those three different partial derivatives all equal to zero. So ∇L = 0 is just a nice, compact, closed-form way of saying that all of its partial derivatives equal zero.

Now let's go ahead and think about what those partial derivatives actually are. The first one, the partial derivative of the Lagrangian with respect to x: it's kind of fun, you have all these curly symbols, the curly d, the curly L, and it makes it look like you're doing some truly advanced math, but really it's just kind of artificial fanciness. Anyway, we take the partial derivative with respect to x, and what that equals is whatever the partial derivative of R with respect to x is, minus: from x's perspective lambda just looks like a constant, so it stays lambda, multiplied by the partial derivative with respect to x of what's inside the parentheses, which is just whatever the partial derivative of B is with respect to x, since subtracting off that constant b doesn't change the derivative. So this right here is the partial derivative of the Lagrangian with respect to x:

∂L/∂x = ∂R/∂x − λ · ∂B/∂x

Now if you set that equal to zero (and I know I've kind of run out of room on the right here), that's the same as just saying that ∂R/∂x = λ · ∂B/∂x. And if you think about what's going to happen when you unfold this property that the gradient of R is proportional to the gradient of B, written up here, that's just the first portion of it: if we're setting the gradients proportional,
then the first component of that is to say that the partial derivative of R with respect to x is equal to lambda times the partial derivative of B with respect to x.

Then if you do this for y, taking the partial derivative of this Lagrangian function with respect to y, it's very similar. In fact, it all looks just about identical: whatever R is, you take its partial derivative with respect to y, and then we subtract off lambda, which looks like a constant as far as y is concerned, multiplied by the partial derivative with respect to y of the term inside the parentheses, which is the partial of B with respect to y:

∂L/∂y = ∂R/∂y − λ · ∂B/∂y

And again, if you imagine setting that equal to zero, that's going to be the same as setting ∂R/∂y equal to λ · ∂B/∂y; you kind of just bring one term to the other side. So this second component of our "gradient of the Lagrangian equals zero" equation is just the second equation we've seen in a lot of these examples we've been doing, where you set one of the gradient vectors proportional to the other one.

The only real difference here from stuff that we've seen already, and even then it's not that different, is what happens when we take the partial derivative of this Lagrangian with respect to lambda (we'll go ahead and give it that kind of green lambda color here). Well, when we take that partial derivative, if we look up at the definition of the function, R never has a lambda in it; it's purely a function of x and y, so it looks just like a constant when we're differentiating with respect to lambda, and it's just going to be zero when we take its partial derivative. And then this next component, B(x, y) minus b, all of that just looks like a constant as far as lambda is concerned too. There are x's, there are y's, there's this constant b, but none of these things have lambdas in them,
so when we take the partial derivative with respect to lambda, this just looks like some big constant times lambda itself. What we're going to get, since we're subtracting off all the stuff that was in those parentheses, B(x, y) minus b, is:

∂L/∂λ = −(B(x, y) − b)

And if we set that whole thing equal to zero, well, that's pretty much the same as setting B(x, y) − b = 0, and that's really just the same as saying, hey, we're setting B(x, y) equal to that little b. So setting this partial derivative of the Lagrangian with respect to the Lagrange multiplier equal to zero boils down to the constraint, the third equation that we need.

So in that way, setting the gradient of this Lagrangian function equal to zero is just a very compact way of packaging the three separate equations we need to solve the constrained optimization problem. And I'll emphasize that in practice, if you actually see a function for R, the thing you're maximizing, and a function for the budget, I think it's much better to just directly think about these parallel gradients and solve it from there, because if you construct the Lagrangian and then compute its gradient, all you're really doing is repackaging things up only to unpackage them again.

But the point, the reason this is a very useful construct, is that computers often have really fast ways of solving things like this, things like "the gradient of some function equals zero." And the reason is that that's how you solve unconstrained maximization problems. This is very similar to if we just looked at this function L out of context and were asked, hey, what is its maximum value, what are the critical points that it has, and you set its gradient equal to zero. So kind of the whole point of this Lagrangian is that it turns our constrained optimization problem, involving R and B and this new made-up
variable lambda, into an unconstrained optimization problem, where we're just setting the gradient of some function equal to zero, and computers can often do that really quickly. So if you just hand the computer this function, it will be able to find you an answer, whereas it's harder to say, hey computer, I want you to think about when the gradients are parallel, and also consider this constraint function. It's just kind of a cleaner way to package it all up.

So with that, I'll see you next video, where I'm going to talk about the significance of this lambda term: how it's not just a ghost variable but actually has a pretty nice interpretation for a given constrained problem.
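The "hand it to a computer" idea can be sketched concretely. Below is a minimal example, assuming Python with SymPy and SciPy available, using the functions from the video: f(x, y) = x^2 · e^y · y with the constraint x^2 + y^2 = 4. It builds the Lagrangian symbolically, takes its gradient, and lets a numerical root-finder solve ∇L = 0. The starting guess for the root-finder is an assumption chosen near the tangency point; a different guess could land on a different critical point.

```python
# Sketch: solve the video's constrained problem by handing the computer
# grad(L) = 0, where L(x, y, lam) = f(x, y) - lam * (g(x, y) - 4).
# Assumes SymPy and SciPy are available; the initial guess is a made-up choice.
import sympy as sp
from scipy.optimize import fsolve

x, y, lam = sp.symbols('x y lam')

f = x**2 * sp.exp(y) * y   # the function being maximized
g = x**2 + y**2            # the constraint function
L = f - lam * (g - 4)      # the Lagrangian (4 is the constraint value)

# gradient of L: one partial derivative per input (x, y, lambda)
grad_L = [sp.diff(L, v) for v in (x, y, lam)]

# turn the symbolic gradient into a fast numerical function
grad_fn = sp.lambdify((x, y, lam), grad_L)

# root-find grad(L) = 0; the starting guess (1.3, 1.5, 6.0) is an assumption
sol_x, sol_y, sol_lam = fsolve(lambda v: grad_fn(*v), [1.3, 1.5, 6.0])

print(sol_x, sol_y, sol_lam)   # a constrained critical point and its multiplier
print(sol_x**2 + sol_y**2)     # sits on the constraint circle, i.e. equals 4
```

Note that the third component of ∇L is −(g − 4), so the solver enforces the constraint automatically; that is exactly the repackaging described in the video.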