Multivariable chain rule

This is the simplest case of taking the derivative of a composition involving multivariable functions. Created by Grant Sanderson.

Video transcript

- [Voiceover] So I've written here three different functions. The first one is a multivariable function, it has a two variable input, x, y, and a single variable output, that's x squared times y, that's just a number, and then the other two functions are each just regular old single variable functions. And what I want to do is start thinking about the composition of them. So, I'm going to take, as the first component, the value of the function x of t, so you pump t through that, and then you make that the first component of f. And the second component will be the value of the function y of t. So, the image that you might have in your head for something like this is you can think of t as just living on a number line of some kind, then you have x and y, which is just the plane, so that will be, you know, your x-coordinate, your y-coordinate, two-dimensional space, and then you have your output, which is just whatever the value of f is. And for this whole function, for this whole composition of functions, you're thinking of x of t and y of t as taking a single point in t and kind of moving it over to two-dimensional space somewhere, and then from there, our multivariable function takes that back down. So, this is just a single variable function, nothing too fancy going on in terms of where you start and where you end up, it's just what's happening in the middle.

And what I want to know is, what's the derivative of this function? If I take this, it's just an ordinary derivative, not a partial derivative, because this is just a single variable function, one variable input, one variable output. How do you take its derivative? And there's a special rule for this, it's called the chain rule, the multivariable chain rule, but you don't actually need it. So, let's actually walk through this, showing that you don't need it. It's not that you'll never need it, it's just that for computations like this you could go without it. It's a very useful theoretical tool, a very useful model to have in mind for what function composition looks like and implies for derivatives in the multivariable world.

So, let's just start plugging things in here. If I have f of x of t and y of t, the first thing I might do is write, okay, f, and instead of x of t, just write in cosine of t, since that's the function that I have for x of t, and then for y we replace that with sine of t, and of course I'm hoping to take the derivative of this. And then from there, we can go to the definition of f, f of x, y equals x squared times y, which means we take that first component squared. So we'll take that first component, cosine of t, and square it, square that guy, and then we'll multiply it by the second component, sine of t, and again we're just taking this derivative. And you might be wondering, okay, why am I doing this, you're just showing me how to take a first derivative, an ordinary derivative? But the pattern that we'll see is gonna lead us to the multivariable chain rule. And it's actually kind of surprising when you see it in this context, because it pops out in a way that you might not expect things to pop out.
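For reference, the setup described in this passage, written out in symbols:

\[
f(x, y) = x^2 y, \qquad x(t) = \cos(t), \qquad y(t) = \sin(t),
\]

so the composition is the single variable function

\[
f\big(x(t), y(t)\big) = \cos^2(t)\,\sin(t).
\]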
So, continuing our chugging along, when you take the derivative of this, you do the product rule, left d-right plus right d-left. So in this case, the left is cosine squared of t, we just leave that as it is, cosine squared of t, and multiply it by the derivative of the right, d-right, so that's going to be cosine of t. And then we add to that the right, keeping that right side, sine of t, unchanged, multiplied by the derivative of the left, and for that we use the chain rule, the single variable chain rule, where you think of taking the derivative of the outside, so you bring the two down, like you're taking the derivative of x squared to get two x, but you're just writing in cosine instead of x, cosine of t, and then you multiply that by the derivative of the inside, that's a tongue twister, which is negative sine of t. And I'm afraid I'm gonna run off the edge here, certainly with the many many parentheses that I need. I'll go ahead and rewrite this though. I'm gonna rewrite it anyway because there's a certain pattern that I hope to make clear. So, let me just rewrite this side, let's copy that down here, I just want to rewrite this guy. You might be wondering why, but it'll become clear in just a moment why I want to do this. So, in this case, I'm gonna write this as two times cosine of t, times sine of t, then all of that multiplied by negative sine of t. So this is the derivative, this is the derivative of the composition of functions that ultimately was a single variable function, but it kind of winds through two different variables.

And I just want to make an observation in terms of the partial derivatives of f. So, let me just make a copy of this guy, give ourselves a little bit of room down here, paste that over here. So let's look at the partial derivatives of f for a second here. If I take the partial derivative with respect to x, partial x, that means y is treated as a constant, so I take the derivative of x squared to get two x, and then multiply it by that constant, which is just y. And if I also do it with respect to y, to get all of them in there, now y looks like a variable, x looks like a constant, so x squared also looks like a constant, and a constant times a variable, the derivative is just that constant. These two, their pattern comes up in the ultimate result that we got, and this is the whole reason that I rewrote it. If you look at this two x y, you can see that over here, where cosine corresponds to x, sine corresponds to y, based on our original functions, and the x squared here corresponds with squaring the x that we put in there. Then if we take the derivatives of our two intermediary functions, the ordinary derivative of x with respect to t, that's the derivative of cosine, negative sine of t, and then similarly the derivative of y, just the ordinary derivative, no partials going on here, with respect to t, that's equal to cosine, the derivative of sine is cosine. And these guys show up, right, you see negative sine over here, and you see cosine show up over here. And we can generalize this, we can write it down and say, at least for this specific example, it looks like the derivative of the composition is this part, which is the partial of f with respect to y, right, that's kind of what it looks like here, once we've plugged in the intermediary functions, multiplied by this guy, which was the ordinary derivative of y with respect to t.
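For reference, the computation in this passage written out: the product rule applied to the composition, and the partial and ordinary derivatives whose pattern the rewritten terms are matching up with.

\[
\frac{d}{dt}\Big[\cos^2(t)\,\sin(t)\Big]
= \cos^2(t)\,\cos(t) + 2\cos(t)\sin(t)\,\big(-\sin(t)\big),
\]

\[
\frac{\partial f}{\partial x} = 2xy, \qquad
\frac{\partial f}{\partial y} = x^2, \qquad
\frac{dx}{dt} = -\sin(t), \qquad
\frac{dy}{dt} = \cos(t).
\]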
And then very similarly, this guy was the partial of f with respect to x, partial x, and we're multiplying it by the ordinary derivative of x with respect to t. And of course, when I write this partial f, partial y, what I really mean is you plug in for x and y the two coordinate functions, x of t, y of t. So, if I say partial f, partial y over here, what I really mean is you take that x squared and then you plug in x of t squared to get cosine squared. And same deal over here, you're always plugging things in, so you ultimately have a function of t. But this right here has a name, this is the multivariable chain rule. And it's important enough that I'll just write it out all on its own here. If we take the ordinary derivative, with respect to t, of a composition of a multivariable function, in this case just two variables, where we're plugging in two intermediary functions, x of t, y of t, each of which is just a single variable function, the result is that we take the partial derivative with respect to x and multiply it by the derivative of x with respect to t, and then we add to that the partial derivative with respect to y, multiplied by the derivative of y with respect to t. So, this entire expression here is what you might call the simple version of the multivariable chain rule. There's a more general version, and we'll kind of build up to it, but this is the simplest example you can think of, where you start with one dimension, and then you move over to two dimensions somehow, and then you move from those two dimensions down to one. So, this is that, and in the next video I'm gonna talk about the intuition for why this is true. You know, here I just went through an example and showed, oh, it just happens to be true, it fits this pattern. But there's a very nice line of reasoning for where this comes about, and I'll also talk about a more generalized form, where we start using vector notation, which makes things look very clean, and I might even get around to a more formal argument for why this is true. So, I'll see you in the next video.
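For reference, the rule stated in this passage, written out in symbols:

\[
\frac{d}{dt}\,f\big(x(t), y(t)\big)
= \frac{\partial f}{\partial x}\,\frac{dx}{dt}
+ \frac{\partial f}{\partial y}\,\frac{dy}{dt},
\]

with the partial derivatives evaluated at (x(t), y(t)). For the running example, the terms line up as

\[
\frac{d}{dt}\Big[\cos^2(t)\,\sin(t)\Big]
= \underbrace{2\cos(t)\sin(t)}_{\partial f/\partial x}\,\underbrace{\big(-\sin(t)\big)}_{dx/dt}
+ \underbrace{\cos^2(t)}_{\partial f/\partial y}\,\underbrace{\cos(t)}_{dy/dt}.
\]

And for anyone who wants to confirm that the direct derivative and the chain rule expression really do agree, here is a quick symbolic check, a minimal sketch assuming SymPy is available (this is an illustration, not part of the video):

```python
import sympy as sp

t, x, y = sp.symbols('t x y')

f = x**2 * y          # f(x, y) = x^2 * y
x_t = sp.cos(t)       # x(t) = cos(t)
y_t = sp.sin(t)       # y(t) = sin(t)

# Ordinary derivative of the composition, computed directly
direct = sp.diff(f.subs({x: x_t, y: y_t}), t)

# Multivariable chain rule: (df/dx) * dx/dt + (df/dy) * dy/dt,
# with the partials evaluated at (x(t), y(t))
chain = (sp.diff(f, x).subs({x: x_t, y: y_t}) * sp.diff(x_t, t)
         + sp.diff(f, y).subs({x: x_t, y: y_t}) * sp.diff(y_t, t))

print(sp.simplify(direct - chain))  # prints 0: the two expressions agree
```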