In the last couple of videos, I showed how you can take a function, just a function with two inputs, and find the tangent plane to its graph. The way you think about this is that you first find a point, some kind of input point, which I'll just write abstractly as x naught and y naught, and then you see where that point ends up on the graph. You want to find a new function, which we were calling L, or maybe L sub f, which is also a function of x and y, and you want the graph of that function to be a plane tangent to the graph of f.

Now, this often goes by another name: local linearization. It's kind of a long word. The word "local" means you're looking at a specific input point, so in this case it's the specific input point x naught, y naught. And the idea of a linearization is that you're approximating the function with something simpler, with something that's actually linear. I'll tell you what I mean by linear in just a moment. But the whole idea here is that we don't really care about tangent planes in an abstract 3D space to some kind of graph. The whole reason for doing this is that it's a really good way to approximate a function, which is potentially a very complicated function, with something that's much easier, something that has constant partial derivatives.

My goal in this video is to show how we write this local linearization in vector form, because it'll be both more compact and hopefully easier to remember, and it's also more general: it'll apply to functions that have more than just two input variables, like this one does.

So, just to remind us of where we were and what we got to in the last couple of videos, I'll write it a little more abstractly this time rather than with a specific example. The way you do this local linearization is first you find the partial derivative of f with respect to x, which I'll write with the
subscript notation, and you evaluate that at x naught, y naught. You evaluate it at the point about which you're approximating, and then you multiply that by x minus that constant, x naught. So everything here is a constant; the only variable part is that x. Then we add to that, basically doing the same thing with y: you take the partial derivative with respect to y, you evaluate it at the input point, the point about which you are linearizing, and then you multiply it by y minus y naught.

And then to this entire thing you add one more term, because you want to make sure the linearization is right when you evaluate it at the input point itself. You see, when you plug in x naught and y naught, this term goes to zero, because x naught minus x naught is zero, and this term goes to zero as well. That's the whole reason we paired up these terms and organized the constants in this way. This way, you can just think about adding whatever the function itself evaluates to at that point, and this will ensure that your linearization actually equals the function itself at the local point, because hopefully, if you're approximating it near a point, then at that point it's actually equal.

So what do I mean by this word "linear"? The word has a very precise formulation, especially in the context of linear algebra, and admittedly this is not actually a linear function in the technical sense. But loosely what it means, and the reason people call it linear, is that this x term here, this variable term, doesn't have anything fancy going on with it. It's just being multiplied by a constant. And similarly this y term is just being multiplied by a constant. It's not squared, there's no square root, it's not in an exponent or anything like that. Although there is a more technical meaning of the word linear, this is all it really means in this context, and this is all you need to think about: each variable is just multiplied by a constant.

Now, you might see this in a more complicated form, or what's at first a more complicated form, using
vectors. So first of all, let's think about how we would start describing everything going on here with vectors. Rather than talking about the input as being a pair of numbers, what I want to say is that there's some vector that has these as its components, and we just want to capture that all and give it a name. Kind of unfortunately, it's very common to just call it x, maybe a bold-faced x. That's easier to do when typing than when writing by hand, so I'll just try to emphasize it: bold-faced x equals this vector. That can be confusing, because x is already one of the input variables, which is just a number, but I'll try to emphasize it by making it bold. You'll see this in writing a lot: bold x is this input vector.

Similarly, the specified input about which we are approximating you would call, and I'll make it a nice bold-faced one, x naught. We add that naught just to indicate that it's a constant of some kind. What that is, it's a vector containing the two numbers x naught and y naught. So this is just us starting to write things in a more vectorized way, and the convenience here is that if you're dealing with a function with three input variables, or four, or a hundred, you could still just write it as this bold-faced x, with the understanding that the vector has a lot more components.

So now let's take a look at these first two terms in our linearization. We can start thinking of this as a dot product, actually. Let me first just move this guy out of the way and give ourselves some room. He's just going to go up there, the same guy. I want to think about writing this other part here as a dot product, and what that looks like is we have the two partial derivatives, f sub x and f sub y, indicating the partial derivatives with respect to x and y, and each one of them is evaluated, let's see, I'll write it as evaluated at our bold-faced x naught, and then this
one is also evaluated at that bold-faced x naught. So really, you're thinking about this as being a vector that contains two different numbers; you're just packing it into a single symbol. The dot product here is against a vector whose first component is x minus x naught, the number, and then similarly y minus, let's do it in the same color, y naught, the number.

But we can write each one of these in a more compact form. The vector that has the partial derivatives, that's the gradient, and if that feels unfamiliar, maybe go back and check out the videos on the gradient. This whole vector is basically just saying: take the gradient and evaluate it at that vector input, bold x naught. And then the second vector here is telling you that you've got x and y minus x naught and y naught. So what you're basically doing is taking the bold-faced input, the variable vector bold x, and then subtracting off bold x naught, where bold x naught is some kind of constant.

So this right here is all just in vector terms, where you're thinking of this as being a vector with two components, and this one is a vector with two components. But if your function happened to be something more complicated, with, you know, a hundred input variables, this would be the same thing you write down. You would just understand that when you expand this, there are going to be a hundred different components in the vector. And this is what a linear term looks like in vector terminology, because this dot product is telling you that all of the components of that bold-faced x vector, which expands into the non-bold x, y, z, or whatever else it expands to, are just being multiplied by some kind of constant.

So that's how you simplify the first couple of terms here, and of course we just add on the value of the function itself. So you take that as the linear term, and I kind of like to add the function value onto the front, actually,
where you think about taking the function itself and evaluating it at that constant input, bold x naught, because that way you can think of this as your constant term, and then the rest of the stuff here is your linear term. That's useful because later on, if we start adding other terms, like a quadratic term or more complicated things, you can keep adding them on the end.

So this right here is the expression that you will often see for the local linearization. The only place where the actual variable shows up, the variable vector, is right here, this guy. When you evaluate the function f at a specified input, that's just a constant. When you evaluate the gradient at that input, it's just a constant. And we're subtracting off that specified input, which is just a constant. So this is the only place where your variable shows up.

So once all is said and done, and once you do your computations, this is a very simple function, and the important part is that it may be much simpler than the function f itself. That allows you to, you know, maybe compute something more quickly, if you're writing a program that needs to deal with some kind of complicated function but run time is an issue. Or maybe it's a function that you never knew in the first place, but you were able to approximate its value at a point and approximate its gradient, so this is what lets you approximate the function as a whole near that point.

So again, this might look very abstract, but if you just unravel everything, think back to where it came from, and look at the specific example of a tangent plane, hopefully it all makes a little bit of sense. You see that this is really just the simplest possible function that evaluates to the same value as f when you input this point, and whose partial derivatives all evaluate to the same values as those of f at that specified point. And if you want to see more examples of this
and what it looks like, and maybe how you can use it to approximate certain functions, I have an article on that which you can go check out. It would be particularly good to go in with a piece of paper and work through the examples yourself as you read it. And with that said, I will see you next video.
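
As a rough sketch of the vector form described in the video, L(x) = f(x0) + grad f(x0) . (x - x0), here is one way it might look in Python. The function f(x, y) = x^2 * y, its hand-computed gradient, and the helper name `linearize` are all assumptions chosen for illustration; none of them come from the video.

```python
def linearize(f, grad_f, x0):
    """Build L(x) = f(x0) + grad_f(x0) . (x - x0) around the fixed point x0.

    f takes a tuple of numbers and returns a number;
    grad_f takes a tuple and returns a tuple of partial derivatives.
    """
    f0 = f(x0)        # constant term: the function value at x0
    g0 = grad_f(x0)   # constant vector: the gradient at x0

    def L(x):
        # Dot product of the gradient with (x - x0), plus the constant term.
        return f0 + sum(g * (xi - x0i) for g, xi, x0i in zip(g0, x, x0))

    return L


# Hypothetical example function (an assumption for illustration only).
def f(v):
    x, y = v
    return x**2 * y


# Its partial derivatives, worked out by hand: f_x = 2xy, f_y = x^2.
def grad_f(v):
    x, y = v
    return (2 * x * y, x**2)


x0 = (1.0, 2.0)
L = linearize(f, grad_f, x0)

print(L(x0), f(x0))                   # the two agree at x0 itself
print(L((1.1, 2.1)), f((1.1, 2.1)))   # close to each other near x0
```

Because `linearize` only zips over the components, the same code should work unchanged for three inputs or a hundred, which is the point of writing things in vector form. The variable vector appears only inside `L`; everything captured from `x0` is a constant.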