
## Multivariable chain rule


## Video transcript

- [Voiceover] So I've written here three different functions. The first one is a multivariable function: it has a two-variable input, x, y, and a single-variable output, x squared times y, which is just a number. And then the other two
functions are each just regular old single variable functions. And what I want to do is start thinking about the composition of them. So, I'm going to take,
as the first component, the value of the function x of t, so you pump t through that, and then you make that
the first component of f. And the second component will be the value of the function y of t. So, the image that you
might have in your head for something like this is you can think of t as just living on a
number line of some kind. Then you have x and y, which is just the plane, so that will be your x-coordinate and your y-coordinate, two-dimensional space. And then you have your output, which is just whatever the value of f is. For this whole composition of functions, you're thinking of x of t, y of t as taking a single point in t and moving it over to two-dimensional space somewhere, and then from there, our
multivariable function takes that back down to a single number. So, overall this is just a
single-variable function, nothing too fancy going on
in terms of where you start and where you end up, it's just what's happening in the middle. And what I want to know is
what's the derivative of this function. If I take this, and it's
just an ordinary derivative, not a partial derivative, because this is just a
single-variable function, one-variable input, one-variable output, how do you take its derivative? There's a special rule for this, it's called the chain rule, the multivariable chain rule, but you don't actually need it here. So, let's walk through this without it. It's not that you'll never need it, it's just that for computations like this you could go without it. It's a very useful theoretical tool, a very useful model to have in mind for what function composition looks like and implies for derivatives
in the multivariable world. So, let's just start
plugging things in here. If I have f of x of t, y of t, the first thing I might do is write f, and instead of x of t,
write in cosine of t, since that's the function
that I have for x of t, and then replace y of t with sine of t. And of course I'm hoping to
take the derivative of this. From there, we can
go to the definition of f: f of x, y equals x squared times y, which means we take that
first component squared. So we'll take that first
component, cosine of t, and square it, and then we'll multiply it
by the second component, sine of t, and again we're just
taking this derivative. And you might be wondering, okay, why am I doing this, you're just showing me how
to take a first derivative, an ordinary derivative? But the pattern that
we'll see is gonna lead us to the multivariable chain rule. And it's actually kind of
surprising when you see it in this context, because it pops out in a way
that you might not expect things to pop out. So, continuing our chugging along, when you take the derivative of this, you do the product rule: left d right, plus right d left. In this case, the left is cosine squared of t, and we just leave that as it is and multiply it by the derivative of the right, d right, which is going to be cosine of t. Then we add to that the right, sine of t, kept unchanged, multiplied by the derivative of the left. For that we use the chain rule, the single-variable chain rule, where you take the derivative of the outside, so you bring the two down, like you're taking the derivative of x squared to get two x, but you're writing in cosine instead of x, so two cosine of t. And then you multiply that by the derivative of the inside, that's a tongue twister, which is negative sine of t.

I'm afraid I'm gonna run off the edge here, certainly with the many parentheses that I need, but I'm gonna rewrite this anyway because there's a certain pattern that I hope to make clear. So, let me just rewrite this right-hand term. It'll become clear in just a moment why I want to do this. I'm gonna write it as two times cosine of t, times sine of t, all multiplied by negative sine of t. So this is the derivative of
the composition of functions that ultimately was a
single-variable function, but it kind of wound through
two different variables. And I just want to make an observation in terms of the partial derivatives of f. So, let me just make a copy of this guy, give ourselves a little
bit of room down here, paste that over here. So let's look at the
partial derivatives of f for a second here. So, if I took the partial
derivative with respect to x, partial x, which means y is treated as a constant. So I take the derivative
of x squared to get two x, and then multiply it by that constant, which is just y, so that's two x y. And if I also do it with respect to y, now y looks like a variable and x looks like a constant, so x squared also looks like a constant. That's a constant times a variable, so the derivative is just that constant, x squared. These two partial derivatives show up as a pattern
in the ultimate result that we got. And this is the whole
reason that I rewrote it. If you look at this two x y, you can see that over here, where cosine corresponds to x and sine corresponds to y, based
on our original functions. And the x squared corresponds to the cosine squared over here, from squaring
the x that we put in there. Then if we take the derivative
of our two intermediary functions, the ordinary derivative
of x, with respect to t, that's derivative of cosine, negative sine of t, and then similarly derivative of y, just the ordinary derivative,
no partials going on here, with respect to t, that's equal to cosine of t, since the derivative of sine is cosine. And these guys show up, right? You see negative sine over here, and you see cosine over here. We can generalize this: we can write it down and say that, at least for this specific example, it looks like the derivative
of the composition is this part, which is the partial of f with respect to y, at least that's
what it looks like here
once we've plugged in the intermediary functions, multiplied by this guy, which was
the ordinary derivative of y with respect to t. And then very similarly, this guy was the partial of f with respect to x, and we're multiplying it
by the ordinary derivative of x with respect to t. And of course, when I write
this partial f, partial y, what I really mean is
you plug in for x and y, the two coordinate functions, x of t, y of t. So, if I say partial
f, partial y over here, what I really mean is you take that x squared
and then you plug in x of t for x to get cosine squared of t. And same deal over here: you're always plugging things in, so you ultimately have a function of t. But this right here has a name: this is the multivariable chain rule. And it's important enough that I'll just write it out
all on its own here.

If we take the ordinary
derivative, with respect to t, of a composition of a
multivariable function, in this case of just two variables, where we're plugging in
two intermediary functions, x of t and y of t, each of which is just single-variable, the result is that we take
the partial derivative of f with respect to x and multiply it by the derivative of x with respect to t, and then we add to that the partial derivative of f with respect to y, multiplied by the derivative
of y with respect to t. So, this entire expression here is what you might call the simple version of the multivariable chain rule. There's a more general version, and we'll kind of build up to it, but this is the simplest
example you can think of, where you start with one dimension, and then you move over
to two dimensions somehow, and then you move from those
two dimensions down to one. So, this is that, and in the next video I'm gonna talk about the intuition for why this is true. You know, here I just went
through an example and showed that it happens to be true, that it fits this pattern. But there's a very nice line of reasoning for where this comes from, and I'll also talk about
a more generalized form, where you'll see that once we start using vector notation, it makes things look very clean. And I might even get around
to a more formal argument for why this is true. So, I'll see you in the next video.
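
Written out in symbols, the worked example from this video might look like the following. This is a sketch reconstructed from the spoken description, so the layout and the choice of Leibniz notation are assumptions rather than a copy of what appears on screen. The partial derivatives in the last line are understood to be evaluated at x = x(t), y = y(t).

```latex
\begin{align*}
f(x, y) &= x^2 y, \qquad x(t) = \cos t, \qquad y(t) = \sin t \\[4pt]
\frac{d}{dt} f\bigl(x(t), y(t)\bigr)
  &= \frac{d}{dt}\left[\cos^2 t \,\sin t\right] \\
  &= \underbrace{\cos^2 t}_{\text{left}} \cdot \underbrace{\cos t}_{d\,\text{right}}
   \;+\; \underbrace{\sin t}_{\text{right}} \cdot \underbrace{2\cos t\,(-\sin t)}_{d\,\text{left}} \\[4pt]
\frac{\partial f}{\partial x} &= 2xy, \qquad \frac{\partial f}{\partial y} = x^2,
  \qquad \frac{dx}{dt} = -\sin t, \qquad \frac{dy}{dt} = \cos t \\[4pt]
\frac{d}{dt} f\bigl(x(t), y(t)\bigr)
  &= \frac{\partial f}{\partial x}\,\frac{dx}{dt} + \frac{\partial f}{\partial y}\,\frac{dy}{dt}
\end{align*}
```

Substituting x = cos t and y = sin t into the last line gives 2 cos t sin t (−sin t) + cos² t cos t, which matches the product-rule result term by term.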
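
As a quick sanity check on the pattern found in the example, here is a minimal Python sketch that computes the derivative three ways at a few values of t: with the chain-rule formula, with the product-rule result from the direct computation, and with a central finite difference. The function names (`chain_rule_derivative`, `direct_derivative`, `numerical_derivative`), the step size `h`, and the sample points are illustrative choices, not something from the video.

```python
import math


def f(x, y):
    """The multivariable function from the example: f(x, y) = x^2 * y."""
    return x ** 2 * y


def x_of_t(t):
    """First intermediary function: x(t) = cos(t)."""
    return math.cos(t)


def y_of_t(t):
    """Second intermediary function: y(t) = sin(t)."""
    return math.sin(t)


def chain_rule_derivative(t):
    """d/dt f(x(t), y(t)) via (df/dx)(dx/dt) + (df/dy)(dy/dt)."""
    x, y = x_of_t(t), y_of_t(t)
    df_dx = 2 * x * y        # partial of f with respect to x
    df_dy = x ** 2           # partial of f with respect to y
    dx_dt = -math.sin(t)     # ordinary derivative of cos(t)
    dy_dt = math.cos(t)      # ordinary derivative of sin(t)
    return df_dx * dx_dt + df_dy * dy_dt


def direct_derivative(t):
    """Product-rule result for d/dt [cos^2(t) * sin(t)], as in the example."""
    left, right = math.cos(t) ** 2, math.sin(t)
    d_left = 2 * math.cos(t) * (-math.sin(t))
    d_right = math.cos(t)
    return left * d_right + right * d_left


def numerical_derivative(t, h=1e-6):
    """Central finite-difference estimate of the same derivative."""
    def g(s):
        return f(x_of_t(s), y_of_t(s))
    return (g(t + h) - g(t - h)) / (2 * h)


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, chain_rule_derivative(t), direct_derivative(t), numerical_derivative(t))
```

The first two columns should agree up to rounding, since they are the same expression with the terms in a different order; the finite-difference column should agree only approximately.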