Interpretation of Lagrange multipliers
Lagrange multipliers are more than mere ghost variables that help to solve constrained optimization problems...
Lagrange multipliers technique, quick recap
When you want to maximize (or minimize) a multivariable function $f(x, y, \dots)$ subject to the constraint that another multivariable function equals a constant, $g(x, y, \dots) = c$, follow these steps:
- Step 1: Introduce a new variable $\lambda$, and define a new function $\mathcal{L}$ as follows: $$\mathcal{L}(x, y, \dots, \lambda) = f(x, y, \dots) - \lambda\,(g(x, y, \dots) - c)$$ This function $\mathcal{L}$ is called the "Lagrangian", and the new variable $\lambda$ is referred to as a "Lagrange multiplier".
- Step 2: Set the gradient of $\mathcal{L}$ equal to the zero vector: $$\nabla \mathcal{L}(x, y, \dots, \lambda) = \mathbf{0}$$ In other words, find the critical points of $\mathcal{L}$.
- Step 3: Consider each solution, which will look something like $(x_0, y_0, \dots, \lambda_0)$. Plug each one into $f$. Or rather, first remove the $\lambda_0$ component, then plug it into $f$, since $f$ does not have $\lambda$ as an input. Whichever one gives the greatest (or smallest) value is the maximum (or minimum) point you are seeking.
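As a minimal worked sketch of the three steps, consider a toy problem that is not part of the original example: maximize $f(x, y) = xy$ subject to $x + y = c$. The choice of $f$, the constraint, and the symbol names are assumptions made purely for illustration.

```latex
% Toy problem (an illustrative assumption): maximize f(x, y) = xy subject to x + y = c.
\begin{align*}
% Step 1: write the Lagrangian.
\mathcal{L}(x, y, \lambda) &= xy - \lambda\,(x + y - c) \\
% Step 2: set every partial derivative of the Lagrangian to zero.
\frac{\partial \mathcal{L}}{\partial x} &= y - \lambda = 0 \\
\frac{\partial \mathcal{L}}{\partial y} &= x - \lambda = 0 \\
\frac{\partial \mathcal{L}}{\partial \lambda} &= -(x + y - c) = 0 \\
% The only solution of these three equations:
(x_0, y_0, \lambda_0) &= \left(\tfrac{c}{2}, \tfrac{c}{2}, \tfrac{c}{2}\right) \\
% Step 3: drop the lambda component and evaluate f.
f\!\left(\tfrac{c}{2}, \tfrac{c}{2}\right) &= \tfrac{c^2}{4}
\end{align*}
```

There is only one critical point here, and it is a maximum because along the constraint line $xy = x(c - x)$ is a downward-opening parabola in $x$.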
Budgetary constraints, revisited
The last article covering examples of the Lagrange multiplier technique included the following problem.
- Problem: Suppose you are running a factory, producing some sort of widget that requires steel as a raw material. Your costs are predominantly human labor, which is paid at an hourly wage, and the steel itself, which is priced per ton. Suppose your revenue $R$ is loosely modeled by a function $R(h, s)$, where
- $h$ represents hours of labor
- $s$ represents tons of steel
If your budget is a fixed dollar amount, what is the maximum possible revenue? (The specific wage, steel price, revenue formula, and budget are given in the last article.)
You can get a feel for this problem using the following interactive diagram, which lets you see which values of $(h, s)$ yield a given revenue (blue curve) and which values satisfy the constraint (red line).
The full details of the solution can be found in the last article. For our purposes here, you just need to know what happens in principle as we follow the steps of the Lagrange multiplier technique.
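- We start by writing the Lagrangian $\mathcal{L}(h, s, \lambda) = R(h, s) - \lambda\,(B(h, s) - b)$ based on the revenue function $R(h, s)$ and the budget constraint $B(h, s) = b$, where $B(h, s)$ denotes the total spent on labor and steel and $b$ is the budget.
- Then we find the critical points of $\mathcal{L}$, meaning the solutions to $$\nabla \mathcal{L}(h, s, \lambda) = \mathbf{0}$$
- There might be several solutions $(h, s, \lambda)$ to this equation, so for each one you plug the $h$ and $s$ components into the revenue function $R(h, s)$ to see which one actually corresponds to the maximum.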
It's common to write this maximizing critical point as $(h^*, s^*, \lambda^*)$, using asterisk superscripts to indicate that this is a solution. This means $h^*$ and $s^*$ represent the hours of labor and tons of steel you should allocate to maximize revenue subject to your budget. But how can we interpret the Lagrange multiplier $\lambda^*$ that comes with these maximizing values? This is the core question of the article.
It turns out that $\lambda^*$ tells us how much more money we can make by changing our budget.
Let's get a feel for what it means to change the budget. The following tool is similar to the one above, but now the red line representing which points satisfy the budget constraint will shift as you let the budget vary around its original value. This budget is represented with the variable $b$.
For each value of the budget $b$, try to maximize $R$ while ensuring that the curves still touch each other. Notice that the maximum $R$-value you can achieve changes as $b$ changes. We are interested in studying the specifics of that change.
Let $M^*$ represent the maximum revenue you achieve. In the next interactive diagram, the only variable you can change is $b$, and you can see how the value of $M^*$ depends on $b$.
In other words, this maximum revenue is a function of the budget $b$, so we write it as $M^*(b)$.
We can now express a truly wonderful fact: The Lagrange multiplier $\lambda^*(b)$ gives the derivative of $M^*$:
$$\frac{dM^*}{db}(b) = \lambda^*(b)$$
In terms of the interactive diagram above, this means $\lambda^*(b)$ tells you the rate of change of the black dot representing $M^*(b)$ as you move around the green dot representing $b$.
Showing why this is true is a bit tricky, but first, let's take a moment to interpret it. For example, if we found that $\lambda^*(b) = k$ for some number $k$, it would mean each additional dollar you spend over your budget would yield roughly another $k$ dollars in revenue. Conversely, decreasing your budget by a dollar will cost you that much in lost revenue.
This interpretation of $\lambda^*$ comes up commonly enough in economics to deserve a name: "Shadow price". It is the money gained by loosening the constraint by a single dollar, or conversely the price of tightening the constraint by one dollar.
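The claim that the multiplier equals the derivative of maximum revenue with respect to the budget can be checked numerically. The sketch below assumes a Cobb-Douglas revenue model and made-up prices; the names `WAGE`, `STEEL_PRICE`, `SCALE`, `solve`, and `max_revenue` and all numeric values are illustrative assumptions, not figures taken from the text. For this particular model the Lagrange conditions have a closed-form solution (two thirds of the budget goes to labor, one third to steel), which keeps the example dependency-free.

```python
# Illustrative assumptions only: none of these numbers come from the text.
WAGE = 20.0          # assumed cost of one hour of labor, in dollars
STEEL_PRICE = 170.0  # assumed cost of one ton of steel, in dollars
SCALE = 200.0        # assumed scale factor in the revenue model


def revenue(h, s):
    """Assumed revenue model R(h, s) = SCALE * h^(2/3) * s^(1/3)."""
    return SCALE * h ** (2 / 3) * s ** (1 / 3)


def solve(b):
    """Return (h*, s*, lambda*) for the budget constraint WAGE*h + STEEL_PRICE*s = b.

    For this model, grad R = lambda * grad B forces labor spending to be
    twice steel spending, so the optimum is available in closed form.
    """
    h = (2 / 3) * b / WAGE
    s = (1 / 3) * b / STEEL_PRICE
    # Recover lambda from the h-component of grad R = lambda * grad B.
    dR_dh = SCALE * (2 / 3) * h ** (-1 / 3) * s ** (1 / 3)
    lam = dR_dh / WAGE
    return h, s, lam


def max_revenue(b):
    """M*(b): the maximum revenue achievable with budget b."""
    h, s, _ = solve(b)
    return revenue(h, s)


b = 20_000.0  # assumed budget
db = 1.0      # one extra dollar of budget
_, _, lam = solve(b)
# Central finite difference approximating dM*/db at b.
finite_diff = (max_revenue(b + db) - max_revenue(b - db)) / (2 * db)

print(f"lambda*(b)      = {lam:.6f}")
print(f"dM*/db (approx) = {finite_diff:.6f}")
```

The two printed numbers should agree closely, which is the shadow-price interpretation in action: the multiplier is the extra revenue bought by one more dollar of budget.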
Let's generalize what we just did with the budget example and see why it's true. Spelling out the full result is actually quite a mouthful, but it should be made clear by holding the following mantra in the back of your mind: "How does the solution change as the constraint changes?"
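We start with the usual Lagrange multiplier setup. There is a function we want to maximize,
$$f(x, y)$$
and a constraint,
$$g(x, y) = c$$
We start by writing the Lagrangian,
$$\mathcal{L}(x, y, \lambda) = f(x, y) - \lambda\,(g(x, y) - c)$$
Let $(x^*, y^*, \lambda^*)$ be the critical point of $\mathcal{L}$ which solves our constrained optimization problem. In other words,
$$\nabla \mathcal{L}(x^*, y^*, \lambda^*) = \mathbf{0}$$
and $(x^*, y^*)$ maximizes $f$ (subject to the constraint).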
When we start to think of $c$ as a variable, we must account for the fact that the solution $(x^*, y^*, \lambda^*)$ changes as the constraint changes. To do this, we start writing each component as a function of $c$:
$$x^*(c), \quad y^*(c), \quad \lambda^*(c)$$
In other words, when the constraint equals some value $c$, the solution triplet to the Lagrange multiplier problem is $(x^*(c), y^*(c), \lambda^*(c))$.
We now let $M^*(c)$ represent the (constrained) maximum value of $f$ as a function of $c$, which can be written in terms of $x^*(c)$ and $y^*(c)$ as follows:
$$M^*(c) = f(x^*(c), y^*(c))$$
The core result we wish to show is that
$$\frac{dM^*}{dc}(c) = \lambda^*(c)$$
This says that the Lagrange multiplier $\lambda^*$ gives the rate of change of the maximum value $M^*$ of the constrained maximization problem as the constraint constant $c$ varies.
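Before proving the result, it can help to see it hold numerically on a problem with no budget interpretation. The sketch below assumes a toy problem, maximizing $f(x, y) = x^2 y$ on the circle $x^2 + y^2 = c$; the functions, the grid-search approach, and every name in the code are illustrative assumptions. It finds the constrained maximum by brute force, estimates the multiplier from the gradients at the maximizer, and compares that with a finite-difference derivative of the maximum value.

```python
import math


def f(x, y):
    """Toy objective (an assumption for illustration): f(x, y) = x^2 * y."""
    return x * x * y


def grad_f(x, y):
    return (2 * x * y, x * x)


def grad_g(x, y):
    """Gradient of the toy constraint function g(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)


def constrained_max(c, n=20001):
    """Grid-search the maximum of f on the circle x^2 + y^2 = c.

    Only the first quadrant is searched; f is symmetric in x and is
    non-positive when y <= 0, so a maximizer always lies there.
    Returns (max value, x, y).
    """
    r = math.sqrt(c)
    best = None
    for i in range(n):
        t = (math.pi / 2) * i / (n - 1)
        x, y = r * math.cos(t), r * math.sin(t)
        val = f(x, y)
        if best is None or val > best[0]:
            best = (val, x, y)
    return best


def multiplier(x, y):
    """Least-squares lambda satisfying grad f = lambda * grad g as closely as possible."""
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return (fx * gx + fy * gy) / (gx * gx + gy * gy)


c, dc = 3.0, 1e-3
m_star, x_star, y_star = constrained_max(c)
lam_star = multiplier(x_star, y_star)
# Central finite difference approximating dM*/dc at c.
dM_dc = (constrained_max(c + dc)[0] - constrained_max(c - dc)[0]) / (2 * dc)

print(f"lambda*(c)      = {lam_star:.5f}")
print(f"dM*/dc (approx) = {dM_dc:.5f}")
```

Grid search is crude but makes no use of the result being tested, so agreement between the two printed values is independent evidence for the claim.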
Want to outsmart your teacher?
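Proving this result could be an algebraic nightmare, since there is no explicit formula for the functions $x^*(c)$, $y^*(c)$, $\lambda^*(c)$ or $M^*(c)$. This means you would have to start with the defining property of $x^*$, $y^*$ and $\lambda^*$, namely that $\nabla \mathcal{L}(x^*, y^*, \lambda^*) = \mathbf{0}$, and reason your way towards $\frac{dM^*}{dc} = \lambda^*$. This is not at all straightforward (try it!).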
There is a fun story, in which a professor was asked what the harshest truth he ever learned from a student was. He recalled a class he taught when he went through a long and algebraically heavy proof, only to be shown by a student that there is a much simpler approach. The lesson, he said, was that he was not as smart as he thought he was.
The result he was talking about just so happens to be what we are now trying to prove. Although the student's approach is not quite so simple as the story makes it out to be, it is still a clean way to view the problem. More importantly, it is easier to remember than other proofs, so I'll spell it out in full here. As happens so often in math, a little insight can save us from excessive algebra.
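The underlying insight is that evaluating the Lagrangian itself at a solution $(x^*, y^*, \lambda^*)$ will give the maximum value $M^*$. This is because the "$g(x, y) - c$" term in the Lagrangian goes to zero (since a solution must satisfy the constraint), so we have
$$\mathcal{L}(x^*, y^*, \lambda^*) = f(x^*, y^*) - \lambda^*\,(g(x^*, y^*) - c) = f(x^*, y^*) = M^*$$
Given that we want to find $\frac{dM^*}{dc}$, this suggests that we should find a way to treat $\mathcal{L}$ as a function of $c$. Then we might be able to relate the derivative we want to a derivative of $\mathcal{L}$ with respect to $c$.
Start by treating $\mathcal{L}$ as a function of four variables instead of three, since $c$ is now modeled as a changing value:
$$\mathcal{L}(x, y, \lambda, c) = f(x, y) - \lambda\,(g(x, y) - c)$$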
Reflection question: When $\mathcal{L}$ is written as a four-variable function like this, what is $\frac{\partial \mathcal{L}}{\partial c}$?
This partial derivative is promising, since our goal is to show that $\frac{dM^*}{dc} = \lambda^*$, and we know that $\mathcal{L} = M^*$ at solutions. However, we still have work to do.
To encode the fact that we only care about the value of $\mathcal{L}$ at a solution for a given value of $c$, we replace $x$, $y$ and $\lambda$ with $x^*(c)$, $y^*(c)$ and $\lambda^*(c)$. These are functions of $c$ which correspond to the solution of the Lagrangian problem for a given choice of the "constant" $c$.
This lets us write $M^*$ as a function of $c$ as follows:
$$M^*(c) = \mathcal{L}(x^*(c), y^*(c), \lambda^*(c), c)$$
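Even though this expression has only one variable, $c$, there is a four-variable function $\mathcal{L}$ as an intermediary. Therefore, to take its (ordinary) derivative with respect to $c$, we use the multivariable chain rule:
$$\frac{dM^*}{dc} = \frac{\partial \mathcal{L}}{\partial x}\frac{dx^*}{dc} + \frac{\partial \mathcal{L}}{\partial y}\frac{dy^*}{dc} + \frac{\partial \mathcal{L}}{\partial \lambda}\frac{d\lambda^*}{dc} + \frac{\partial \mathcal{L}}{\partial c}\frac{dc}{dc}$$
Note, each partial derivative in the expression above should be evaluated at $(x^*(c), y^*(c), \lambda^*(c), c)$, but writing that would make the expression more messy than it already is.
This might seem like a lot, but remember where the terms $x^*$, $y^*$ and $\lambda^*$ each came from. Each partial derivative $\frac{\partial \mathcal{L}}{\partial x}$, $\frac{\partial \mathcal{L}}{\partial y}$, and $\frac{\partial \mathcal{L}}{\partial \lambda}$ is zero when evaluated at $(x^*, y^*, \lambda^*)$. That's how a solution is defined! This means the first three terms go to zero.
Moreover, since $\frac{dc}{dc} = 1$, the entire expression simplifies to
$$\frac{dM^*}{dc} = \frac{\partial \mathcal{L}}{\partial c}$$
It's important to notice that the reason for this simplification relies on the special properties of solution points $(x^*, y^*, \lambda^*)$. Otherwise, working out the full derivative based on the multivariable chain rule could have been a nightmare!
For the sake of notational cleanliness, we left out the inputs to these derivatives, but let's write them in.
$$\frac{dM^*}{dc}(c) = \frac{\partial \mathcal{L}}{\partial c}(x^*(c), y^*(c), \lambda^*(c), c)$$
Since we saw in the reflection question above that $\frac{\partial \mathcal{L}}{\partial c} = \lambda$, this means
$$\frac{dM^*}{dc}(c) = \lambda^*(c)$$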
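To see the chain-rule argument play out on something concrete, here is a sketch for a toy problem that is an assumption for illustration only: maximize $f(x, y) = xy$ subject to $x + y = c$, whose Lagrange solution is $x^*(c) = y^*(c) = \lambda^*(c) = c/2$.

```latex
% Toy problem (an illustrative assumption): f(x, y) = xy, g(x, y) = x + y,
% with solution x*(c) = y*(c) = lambda*(c) = c/2.
\begin{align*}
\mathcal{L}(x, y, \lambda, c) &= xy - \lambda\,(x + y - c) \\
% The four partial derivatives, evaluated at (c/2, c/2, c/2, c):
\frac{\partial \mathcal{L}}{\partial x} &= y - \lambda = 0, &
\frac{\partial \mathcal{L}}{\partial y} &= x - \lambda = 0, \\
\frac{\partial \mathcal{L}}{\partial \lambda} &= -(x + y - c) = 0, &
\frac{\partial \mathcal{L}}{\partial c} &= \lambda = \tfrac{c}{2} \\
% Chain rule: the first three terms vanish, only the last survives.
\frac{dM^*}{dc} &= 0 \cdot \tfrac{1}{2} + 0 \cdot \tfrac{1}{2} + 0 \cdot \tfrac{1}{2} + \tfrac{c}{2} \cdot 1 = \tfrac{c}{2} = \lambda^*(c) \\
% Direct check, using M*(c) = f(c/2, c/2) = c^2/4:
\frac{d}{dc}\!\left(\tfrac{c^2}{4}\right) &= \tfrac{c}{2}
\end{align*}
```

In this toy case the functions $x^*(c)$, $y^*(c)$ and $\lambda^*(c)$ happen to have explicit formulas, so the direct derivative is easy; the point of the general argument is that it works even when they do not.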
Want to join the conversation?
- While calculating dM*/dc, why do we take partial derivatives with respect to x, y and λ and not x*, y* and λ*?(3 votes)
- You mean: "you can't differentiate with respect to a constant".(3 votes)
- In the previous article, there was an example with λ = 0. Does this mean that increasing the budget does not affect the revenue? And how are the constraints related to the budget now?(1 vote)
- Yes, this isn't explained all that clearly.
We are implicitly assuming that you are constrained by the budget - and thus increasing your budget should give you further revenue.
Mathematically, if you are constrained by your budget, then the optimal solution is at the boundary of the surface, meaning for optimal x*, and optimal y*, g(x*,y*) = c . In this case you have a positive lambda. Increasing c will lead to different, better x* and y*.
If you are not constrained by your budget, in the optimal case, you have g(x*, y*) < c . Thus increasing c doesn't give you any extra juice, as x* and y* don't change. In this case, lambda is 0.
In this article, it is implicitly assumed that you are constrained by your budget (or whatever your constraint is) so that increasing c will lead to different solutions. Otherwise, it becomes trivial.(1 vote)
- Very nice explanation! I'm not sure why we are interested in how the solution changes with a change in c?(1 vote)
- Yeah, I can kinda understand and track the explanation, but I still have this feeling like I am leaving something out.
I just can't say I understand the why of the Lagrange multipliers at all.(1 vote)
- Weird that so much time was spent on Lagrangians in this unit, but it doesn't appear on the unit test at all and there's not even a quiz. I'd have liked to test my understanding of it.(1 vote)
- A lot of textbooks interpret the Lagrange multiplier this way (see Strang, Gilbert). But there is an easier way without having to invent an auxiliary function with four variables.
dM*/dc = df(x*, y*)/dc
df(x*, y*)/dc = f_x(x*, y*) (dx*/dc) + f_y(x*, y*) (dy*/dc),
where the _x and _y are subscripts representing partial derivatives.
But f_x(x*, y*) = λ* g_x(x*, y*)
and f_y(x*, y*) = λ* g_y(x*, y*), so
df(x*, y*)/dc = λ* [g_x(x*, y*) (dx*/dc) + g_y(x*, y*) (dy*/dc)] = λ* dg(x*, y*)/dc
Since g(x*, y*) = c,
λ* dg(x*, y*)/dc = λ* dc/dc = λ*(1 vote)