# Local linearization

A "local linearization" is a generalization of tangent plane functions, one that applies to multivariable functions with any number of inputs. Created by Grant Sanderson.
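For reference, the formula that the transcript below builds up to can be written out as follows. This is a sketch in LaTeX notation rather than something shown on the page itself: the names L_f, x_0, and y_0 follow the video's spoken names, f_x and f_y are the partial derivatives, and the bold symbols are the input vectors described in the transcript.

$$
\begin{aligned}
L_f(x, y) &= f(x_0, y_0) + f_x(x_0, y_0)\,(x - x_0) + f_y(x_0, y_0)\,(y - y_0) \\
L_f(\mathbf{x}) &= f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0)
\end{aligned}
$$

The first line is the two-input form; the second is the vector form, which reads the same for any number of inputs.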

## Want to join the conversation?

- The formula for local linearization reminded me of the Taylor polynomial (Taylor approximation) for single-variable functions. Are these two related in some way? (15 votes)
- Yes. When you take a Taylor polynomial and discard every term involving a derivative of order higher than one, you get a local linearization of your single-variable function: a line approximating the function near a given point. You can do the same in multivariable calculus, where for a function of two variables you get a plane instead of a line. And of course, you can approximate your surfaces further with higher-order partial derivatives. For example, quadratic approximations of multivariable functions are analogous to approximating single-variable functions with 2nd-degree Taylor polynomials. (17 votes)
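
The relationship described in this answer can be checked numerically. The snippet below is a minimal sketch, not part of the original thread: the helper names `taylor_linear` and `taylor_quadratic` are made up for illustration, and exp(x) about x0 = 0 is an arbitrary example function.

```python
import math

def taylor_linear(f0, f1, x0):
    """First-order Taylor polynomial about x0, given f(x0) and f'(x0)."""
    return lambda x: f0 + f1 * (x - x0)

def taylor_quadratic(f0, f1, f2, x0):
    """Second-order Taylor polynomial about x0; adds the f''(x0) term."""
    return lambda x: f0 + f1 * (x - x0) + 0.5 * f2 * (x - x0) ** 2

# Illustrative choice: f(x) = exp(x) about x0 = 0, where f, f', f'' all equal 1.
x0 = 0.0
linear = taylor_linear(1.0, 1.0, x0)        # the local linearization (tangent line)
quadratic = taylor_quadratic(1.0, 1.0, 1.0, x0)

x = 0.1
linear_error = abs(math.exp(x) - linear(x))
quadratic_error = abs(math.exp(x) - quadratic(x))
print(linear_error, quadratic_error)
```

Dropping the second-order term from `taylor_quadratic` gives back `taylor_linear`, which is the sense in which the linearization is a truncated Taylor polynomial.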

- Why did Grant say that it is not technically a linear function? (6 votes)
- I think he meant the definition of a linear function that requires the graph of the function to **go through the origin**. The tangent planes in this video do not (in general) pass through the origin, so strictly speaking they are not represented by linear functions. (10 votes)

- If it is in 3D, can I pick two arbitrary directional derivatives to get the equation of the plane? For example:

We have a function f(x, y), and we can find its gradient.

I randomly use a vector, say [1, 0], and find its directional derivative at a specific point P(a, b, c), and let's say the result is 5. Then we have a vector [1, 0, 5].

I do this again with a different vector, and get another vector, say [0, 1, -2].

Therefore the equation for the plane will be:

[x, y, z] = [a, b, c] + t[1, 0, 5] + s[0, 1, -2]

Is this method correct? If it is, is there a way to make this more general, to make it work in higher dimensions, or in other words, to vectorize it? (3 votes)
- Yes, this works perfectly fine. The simplest way is to always use the coordinate vectors, (1, 0) and (0, 1). If the plane is z = ax + by + c, then the gradient is (a, b) everywhere. Taking the directional derivative in the x direction, we get a. In the y direction, it's b. So the two vectors are (1, 0, a) and (0, 1, b), and we shift them by (0, 0, c). The parameterization is x = s, y = t, z = as + bt + c. But, as you might see, this isn't a really useful trick, since you've just obtained a relatively obvious result. (4 votes)
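
The construction in this thread can be sanity-checked with a short script. This is only a sketch under stated assumptions: the function f(x, y) = x² + y², the base point (1, 2), and the two directions are arbitrary illustrative choices, and the "directional derivative" here is the unnormalized dot product of the gradient with the chosen direction.

```python
import math

def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 + y**2

def grad_f(x, y):
    # Gradient of the example function, worked out by hand.
    return (2 * x, 2 * y)

def linearization(x, y, x0, y0):
    # L(x, y) = f(x0, y0) + f_x * (x - x0) + f_y * (y - y0)
    gx, gy = grad_f(x0, y0)
    return f(x0, y0) + gx * (x - x0) + gy * (y - y0)

def tangent_vector(v, x0, y0):
    # Lift a 2D direction v to 3D; the last component is grad f . v.
    gx, gy = grad_f(x0, y0)
    return (v[0], v[1], gx * v[0] + gy * v[1])

def plane_point(s, t, x0, y0, v, w):
    # Point on the parametric plane P + s * (lifted v) + t * (lifted w).
    p = (x0, y0, f(x0, y0))
    tv, tw = tangent_vector(v, x0, y0), tangent_vector(w, x0, y0)
    return tuple(p[i] + s * tv[i] + t * tw[i] for i in range(3))

x0, y0 = 1.0, 2.0
v, w = (1.0, 1.0), (1.0, -1.0)  # two arbitrary, non-parallel directions
for s, t in [(0.0, 0.0), (0.5, -1.0), (2.0, 3.0)]:
    x, y, z = plane_point(s, t, x0, y0, v, w)
    # Every point of the parametric plane lies on the graph of the linearization.
    assert math.isclose(z, linearization(x, y, x0, y0))
```

Any two non-parallel directions give the same plane; the coordinate directions (1, 0) and (0, 1) are simply the most convenient choice, as the answer notes.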

- What's really cool to me is that this formula echoes exactly the equation for tangent lines in single variable calculus. (4 votes)
- ∇f(x₀) ⋅ (x - x₀) + f(x₀) is the 3D equivalent of:

f′(x₀) ⋅ (x - x₀) + f(x₀) (4 votes)
- Should it be ∇f(x₀, y₀) in the vector form? (3 votes)
- At 6:03, why is the formula expressed as f(x0) + ∇f(x0) dotted with (x-x0), instead of f(x0, y0) + ∇f(x0, y0) dotted with (x-x0, y-y0)? (1 vote)
- The bold **x** represents the vector <x, y>, and the bold **x0** is the vector <x0, y0>. (2 votes)

- What does Grant mean when he says "variable multiplied by a constant"? It will still be a variable so why does he mention it?(1 vote)
- For higher dimensional inputs would we have tangent n-spaces?(1 vote)
- No, a tangent plane of a graph of a function requires 2 partial derivatives and a point of reference in space. For a function with n number of inputs, the condition still applies. However, if you have to find approximations or direction of steepest ascent in higher dimensional graphs, there are different tools for the same.(1 vote)

- So in the multivariate case this is just the Jacobian matrix of f evaluated at a point p0, the best linear approximation to f at p0. In this univariate case, can we see this as the directional derivative in the direction of a "nudge vector" made up of a nudge in the x direction (dx) and a nudge in the y direction (dy), from x0 to x, plus an offset (f evaluated at x0)? I'm trying to see how we generalize this from regular single-variable scalar functions to multivariable scalar functions and multivariable vector functions, and whether this applies to higher dimensions, i.e., tensors of rank 2 and above. (1 vote)

## Video transcript

- [Voiceover] In the last couple videos, I showed how you can take a function, just a function with two inputs, and find the tangent plane to its graph. The way that you think about this, you first find a point, some kind of input point, which I'll just write abstractly as x nought and y nought. And you see where that point ends up on the graph, and you wanna find a new function, which we were calling L, and maybe you say L sub f, which also is a function of x and y. And you want the graph of that function to be a plane tangent to the graph.

Now this often goes by another name. This will go under the name local linearization, which is kind of a long word. And what this basically means: the word local means you're looking at a specific input point. So in this case, it's a specific input point x nought, y nought. And the idea of a linearization means you're approximating the function with something simpler, with something that's actually linear, and I'll tell you what I mean by linear in just a moment. But the whole idea here is that we don't really care about, you know, tangent planes in an abstract 3D space to some kind of graph. The whole reason for doing this is that this is a really good way to approximate a function, which is potentially a very complicated function, with something that's much easier, something that has constant partial derivatives.

Now my goal of this
video is gonna be to show how we write this local linearization here in vector form, because it'll be both more compact and hopefully easier to remember, and also it's more general. It'll apply to things that have more than just two input variables like this one does.

So just to remind us of where we were, and what we got to in the last couple videos, I'll write a little bit more abstractly this time, rather than a specific example. The way you do this local linearization is first you find the partial derivative of f with respect to x, which I'll write with the subscript notation. And you evaluate that at x sub o, or x nought, y nought. You evaluate it at the point about which you're approximating, and then you multiply that by x minus that constant. So everything right here is a constant, and the only variable part is that x. And then we add to that, basically doing the same thing with y. You take the partial derivative with respect to y, you evaluate it at the input point, the point about which you are linearizing, and then you multiply it by y minus y sub o.

And then you add something to this entire thing, because you wanna make sure it's right when you evaluate this function at the input point itself. You see, when you plug in x nought and y nought, this term goes to zero, cause x nought minus x nought is zero. This term goes to zero too, and this is the whole reason we kind of paired up these terms and organized the constants in this way. This way, you can just think about adding whatever the function itself evaluates to at that point. And this will ensure that your linearization actually equals the function itself at the local point. Cause hopefully if you're approximating it near a point, then at that point, it's actually equal.

So what do I mean by this word linear? The word linear has a
very precise formulation, especially in the context of linear algebra, and admittedly, this is not actually a linear function in the technical sense. But loosely what it means, and the reason people call it linear, is that this x term here, this variable term, doesn't have anything fancy going on with it. It's just being multiplied by a constant, and similarly this y term is just being multiplied by a constant. It's not squared, there's no square root, it's not in an exponent or anything like that. And although there is a more technical meaning of the word linear, this is all it really means in this context. This is all you need to think about. Each variable is just multiplied by a constant.

Now you might see this in a more complicated form, or what's at first a more complicated form, using vectors. So first of all, let's think about how we would start describing everything going on here with vectors. So rather than talk about the input as being a pair of numbers, what I wanna say is that there's some vector that has these as its components, and I wanna give that a name. And kind of unfortunately, it's very common to just call it x, maybe a bold-faced x, and that's easier to do typing than it is writing, so I'll just kind of try to emphasize: bold-faced x equals this vector. That's confusing, cause x is already one of the input variables that's just a number, but I'll try to emphasize it just by making it bold. You'll see this in writing a lot. Bold x is this input vector, and then similarly, the specified input about which we are approximating, you would call, see, I'll make it a nice bold-faced x nought. We'll do that nought to just kind of indicate that it's a constant of some kind. And what that is, it's a vector containing the two numbers x nought, y nought. So this is just us starting to write things in a more vectorized way, and the convenience here is that if you're dealing with a function with three input variables or four or a hundred, you could still just write it as this bold-faced x, with the understanding that the vector has a lot more components.

So now, let's take a look at these first two terms
in our linearization. We can start thinking of this as a dot product, actually. So let me first just kind of move this guy out of the way and give ourselves some room. So he's gonna just go up there, this is the same guy, and now I wanna think about writing this other term here as a dot product. And what that looks like is we have the two partial derivatives, f sub x and f sub y, indicating the partial derivatives with respect to x and y, and each one of them is evaluated at our bold-faced x nought. So really, you're thinking about this as being, you know, a vector that contains two different numbers. You're just packing it into a single symbol. And the dot product here is against a vector whose first component is x minus x nought, so I'd write that as x minus x nought the number, and then similarly, y minus y nought the number.

But we can write each one of these in a more compact form. The vector that has the partial derivatives, that's the gradient, and if that feels unfamiliar, maybe go back and check out the videos on the gradient. This whole vector is basically just saying, take the gradient and evaluate it at that vector input, you know, bold-faced x nought. And the second vector here, that's telling you you've got x and y minus x nought and y nought. So what you're basically doing is taking the, you know, bold-faced input, the variable vector x, and then you're subtracting off bold-faced x nought, where x nought is some kind of constant.

So this right here, this is just vector terms, where you're thinking of this as being a vector with two components, and this one is a vector with two components. But if your function happened to be something more complicated, with, you know, a hundred input variables, this would be the same thing you write down. You would just understand that when you expand this, there's gonna be a hundred different components in the vector. And this is what a linear term looks like in vector terminology, cause this dot product is telling you that all of the components of that bold-faced x vector, which expands into, you know, not bold-faced x, y, z, whatever else it expands to, are just being multiplied by some kind of constant.

So we take that whole thing, that's how you simplify
the first couple terms here, and of course, we just add on the value of the function itself. So you would take that as the linear term. And I kind of like to add the function value on to the front, actually, where you think about taking the function itself and evaluating it at that constant input x nought, cause that way you can kind of think this is your constant term, and then the rest of the stuff here is your linear term. Cause later on, if we start adding other terms like a quadratic term or more complicated things, you can kind of keep adding them on the end. So this right here is the expression that you will often see for the local linearization.

And the only place where the actual variable shows up, the variable vector, is right here, is this guy. Cause, you know, when you evaluate the function f at a specified input, that's just a constant. When you evaluate the gradient at that input, it's just a constant. And we're subtracting off that specified input, that's just a constant. So this is the only place where your variable shows up.

So once all is said and done, and once you do your computations, this is a very simple function. And the important part
is that this is maybe much simpler than the function f itself, which allows you to, you know, maybe compute something more quickly if you're writing a program that needs to deal with some kind of complicated function but runtime is an issue. Or maybe it's a function that you never knew in the first place, but you were able to approximate its value at a point, and approximate its gradient. So this is what lets you approximate the function as a whole near that point.

So again, this might look very abstract, but if you just kind of unravel everything and think back to where it came from, and look at the specific example of a, you know, tangent plane, hopefully it all makes a little bit of sense, and you see that this is really just the simplest possible function that evaluates to the same value as f when you input this point, and whose partial derivatives all evaluate to the same values as those of f at that specified point.

And if you wanna see more examples of this, and what it looks like, and maybe how you can use it to approximate certain functions, I have an article on that that you can go check out, and it would be particularly good to kind of go in with a piece of paper and sort of work through the examples yourself as you read through it. And with that said, I will see you next video.
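
As a rough illustration of the vector form described in the transcript, here is a minimal Python sketch. It is not from the video: the helper names `numerical_gradient` and `local_linearization`, the central-difference step `h`, and the three-input example function are all assumptions made for this example. It mirrors the case mentioned near the end, where only the function's value and an approximate gradient at one point are available.

```python
import math

def numerical_gradient(f, x0, h=1e-6):
    """Approximate the gradient of f at x0 with central differences."""
    grad = []
    for i in range(len(x0)):
        forward = list(x0)
        backward = list(x0)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

def local_linearization(f, x0, h=1e-6):
    """Return L(x) = f(x0) + grad f(x0) . (x - x0) as a function of x."""
    f0 = f(x0)                           # constant term
    grad = numerical_gradient(f, x0, h)  # constant vector
    def L(x):
        # The variable x appears only here, inside the dot product.
        return f0 + sum(g * (xi - x0i) for g, xi, x0i in zip(grad, x, x0))
    return L

def f(p):
    # Hypothetical three-input function, chosen only for illustration.
    x, y, z = p
    return math.sin(x) * math.exp(y) + z**2

x0 = [0.5, 0.2, 1.0]
L = local_linearization(f, x0)
near = [0.51, 0.19, 1.02]
print(f(near), L(near))  # the two values should be close near x0
```

The same two helpers work unchanged for two inputs or a hundred, since the inputs are ordinary lists and the dot product runs over however many components they have.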