Multivariable chain rule intuition

Get a feel for what the multivariable chain rule is really saying, and how thinking about various "nudges" in space makes it intuitive. Created by Grant Sanderson.

Comments

  • Taras.Pokalchuk
    But what if df = (∂f/∂x)(dx/dt)dt multiplied by (∂f/∂y)(dy/dt)dt? Here both x and y affect df, so how do you know you add them?
    (7 votes)
    • Rohit Rao
      Even in this video, x and y both affect f. But notice that f has a single-variable output. This means the output space is a number "line", not a plane. Essentially, a change in x produces a certain change on the number line, and a change in y produces another change on the number line. So the total change in the output space is given by adding the individual changes due to x and y. The total change is the rate of change ∂f/∂x times the actual change in x, plus the same kind of term for y.
      (8 votes)
  • Nils Petter
    How wrong is it to view dy/dx as a fraction? Does dy = 2 dx mean the same as dy/dx = 2?
    (5 votes)
    • Taras.Pokalchuk
      dy/dx is a fraction, but you might lose some information if you treat it like one. d(x^2)/dx = 2x, where dx is defined as dx = lim_{h→0} h, and the definition of d(x^2) is contained in the definition of the derivative. But if you write d(x^4)/d(x^2), you mean that d(x^2) = h, and the top differential is a function of the bottom one. Using these examples,
      d(x^2) (from the second) / dx (from the first) = 1.

      The bottom differential is always h, and the top differential is a function of h and whatever else.

      Now a new example: d(x^2)/dx = 2x, so (d(x^2)/dx)·dx = d(x^2). But now the bottom differential is gone and you are left with d(x^2), so you can't tell whether d(x^2) = h or not
      unless you look at your previous step.
      (3 votes)
  • Paras Pokharel
    Why is the change in z given by "adding" the change in x and the change in y? Yes, the change in x and the change in y are responsible for a change in z, but why is this expressed as a simple addition?
    (6 votes)
    • laura lee
      It is a vectorial notation. Since the direction is already implied by the i and j notation, we can simply "add" both the magnitude and the direction to get z. It's really just like saying 3 steps to the right and 4 steps ahead are the same thing as 5 steps diagonally (the 3-4-5 right triangle), assuming our directions "right" and "ahead" are perpendicular and therefore do not interfere with each other.
      (4 votes)
  • Surya Raju
    Let r(t) = x(t)i + y(t)j. Then can we say that d/dt[f(r(t))] is the directional derivative of f(r(t)) in the direction of r'(t)?
    (2 votes)
  • Fasteric Algorithm
    What if x and y are also multivariable functions?
    (2 votes)
  • gurungdebendra82
    How about finding the multivariable chain rule without parametrizing with the parameter t, since z = x²y is itself a function?
    (2 votes)
  • Etudiant
    Why does he say that f is a one-dimensional number line? Isn't f ultimately taking t as an input, which would make it a 2D graph with t on the x-axis and f on the y-axis?
    (1 vote)
    • whale2
      I believe the graph you mention plots the outputs of f(t) against the inputs, the different values of t. A two-dimensional graph needs a set of ordered pairs (x, y): the inputs are used as x-coordinates and the corresponding outputs as y-coordinates, so (x, y) = (input, output).
      So for a two-dimensional plot you need to consider both the inputs and the outputs of f(t). If you ignored the inputs and wrote out only the outputs of f(t), you would have a list of single numbers, not pairs of numbers.
      It seems that in the video he just considers the possible outputs of f(t). f(t) outputs single numbers (not ordered pairs), so if you graphed only the outputs, you would get a number line.
      Admittedly, at first it doesn't seem to make much sense to focus only on the outputs of a function (and not consider the inputs too), but it makes more sense if you think of functions as transformations. We started with a single number line t, then we got ordered pairs of points (x(t), y(t)), which form a two-dimensional plot, and finally we had f(x(t), y(t)), which collapsed the two-dimensional inputs x(t), y(t) back to a single list of numbers, which is a number line when plotted.
      (2 votes)
  • dmonkoff
    There is still one thing which intuitively doesn't make sense to me.

    This splitting into a sum of two components dx and dy makes sense if we are talking about vectors, but in our case the thing we want to get is a scalar, the magnitude of change. If we think of it as the length of a vector, we should take into account the triangle inequality, which states that the sum of the lengths of any two sides of a triangle is always greater than the length of the third side. Therefore our estimate of the change should always be greater than the actual change.

    I guess I am missing something, but I'm not sure what.
    (1 vote)
  • White
    Cancelling the dx's and dy's also adds an intuitive feel:
    df/dt = ∂f/dt + ∂f/dt
    Both of the partials add up to the full derivative.
    (1 vote)
  • bkteach
    The intuition behind df being affected by a combination of the changes due to dx and that due to dy is very nice. At , what is the intuition to determine that df is simply the sum of these two components? How do we know that these two factors do not combine in some other way to affect df? How do we know, for example, that df is not the product of these two, or perhaps the change caused by dx plus twice the change caused by dy, or some other more complicated function involving both of the changes?
    (1 vote)
    • kubleeka
      Because the small change in f is a vector, and can be written as the sum of a small change in the x-direction plus a small change in the y-direction. Each term in the multivariable chain rule is one of these single-coordinate vectors.
      (1 vote)
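Several comments above ask why the change due to dx and the change due to dy are added, rather than multiplied or combined in some other way. Each contribution is a signed number on the f number line, not a vector length, so the triangle inequality does not apply. For small nudges, their sum matches the true change up to much smaller second-order terms, at least for a smooth f like the one used here. The following is a minimal numerical sketch under stated assumptions. It uses the illustrative f(x, y) = x²y, the arbitrary point (0.5, 0.8), and a nudge with dy = 2 dx; none of these come from the video. The sketch compares three candidate rules with the true change f(x + dx, y + dy) - f(x, y).

```python
def f(x: float, y: float) -> float:
    return x ** 2 * y                 # assumed example f(x, y) = x^2 y


x0, y0 = 0.5, 0.8                     # arbitrary starting point (assumption)
results = []
for eps in (1e-1, 1e-2, 1e-3):
    dx, dy = eps, 2 * eps             # a small nudge in an arbitrary direction
    actual = f(x0 + dx, y0 + dy) - f(x0, y0)
    due_to_dx = 2 * x0 * y0 * dx      # (∂f/∂x) dx
    due_to_dy = x0 ** 2 * dy          # (∂f/∂y) dy
    candidates = {
        "sum": due_to_dx + due_to_dy,
        "product": due_to_dx * due_to_dy,
        "dx part + 2 * dy part": due_to_dx + 2 * due_to_dy,
    }
    # Ratio of each candidate rule to the true change in f.
    ratios = {name: est / actual for name, est in candidates.items()}
    results.append((eps, ratios))
    print(eps, {name: round(r, 4) for name, r in ratios.items()})
```

As eps shrinks, the "sum" ratio approaches 1. The leftover comes from terms like dx² and dx·dy. The product, by contrast, is itself a second-order quantity and captures almost none of the change, while the reweighted rule stays off by a fixed factor.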

Video transcript

- [Voiceover] So in the last video, I introduced this multivariable chain rule, and here I want to explain a loose intuition for why it's true, why you would expect something like this to happen. So here's the way to think about an expression like this. You have this multivariable function f of x and y, and you're plugging things in. But that function by itself you'd think of as taking a two-dimensional space, you know, here's our xy-plane, and mapping it to just a real number line, and I'll think of this as f, as the output. So somehow our whole function takes things from this two-dimensional space and puts them onto this output. And t you're thinking of as just another number line up here, so t, and then you've got separate functions here, x of t and y of t. Each of them takes that same value as its input. It's not that they're acting on different inputs, x of some t and y of some other t; it's the same one. And then they move that point somewhere in the xy-plane, which itself gets moved over to the output. In this way you're thinking of the whole thing as just a single-variable function that goes from t and ultimately outputs f; it's just that there's multidimensional stuff happening in between. Now if we start thinking about the derivative of it, what does that mean for the picture that we have going on here? Well, that bottom part, that dt, you're thinking of as a tiny change to t, right? So you're thinking of it as a kind of nudge. I'll draw it as a sizable line here, like moving from some original input over, but in principle you might think of it as a very, very tiny nudge in t. 
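The picture described above can be sketched in Python as three maps composed into one single-variable function. The concrete choices here are assumptions for illustration only: f(x, y) = x²y (the function one commenter mentions), x(t) = cos t and y(t) = sin t. The sketch applies a tiny nudge dt and measures the resulting nudge on the f number line.

```python
import math


def x(t: float) -> float:
    return math.cos(t)                 # assumed example: x(t) = cos t


def y(t: float) -> float:
    return math.sin(t)                 # assumed example: y(t) = sin t


def f(x_val: float, y_val: float) -> float:
    return x_val ** 2 * y_val          # assumed example: f(x, y) = x^2 y


def composed(t: float) -> float:
    # t -> (x(t), y(t)) -> f(x(t), y(t)): one number in, one number out,
    # with a two-dimensional point in between.
    return f(x(t), y(t))


t0 = 1.0
dt = 1e-6                                   # a "very, very tiny nudge" in t
intermediate = (x(t0), y(t0))               # where t0 lands in the xy-plane
df = composed(t0 + dt) - composed(t0)       # resulting nudge on the f number line
print("point in xy-plane:", intermediate)
print("df/dt is approximately", df / dt)
```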
And over here you'd say, well, that's going to move your intermediate output in the xy-plane by some amount. Again, imagine this is a very small nudge; I'm giving it some size here just so I can write into it. Whatever that nudge in the xy-plane is, it's a nudge in some direction that's going to correspond to some change in f, some change based on the differential properties of the multivariable function itself. If we think about this change, you might break it into components and say this shift has some kind of dx, some shift in the x direction, and some kind of dy, some shift in the y direction. But you can actually reason about what these should be, because it's not just an arbitrary change in x or an arbitrary change in y; it's the one that was caused by dt. So if I go over here, I might say that dx is caused by that dt, and the whole meaning of the single-variable derivative dx/dt is that it's the factor telling us, for a tiny nudge in t, how much that changes the x component. If you want, you could think of this as cancelling out the dt's so you're just left with dx. But really you're saying there's a tiny nudge in t, that results in a change in x, and this derivative is what tells you the ratio between those sizes. Similarly, that change in y is going to be proportional to the change in t, and that proportion is given by the derivative of y with respect to t; that's the whole point of the derivative. And again you can kind of think of it as cancelling out the dt's, and this is why the fractional writing, this Leibniz notation, is actually pretty helpful. 
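The "ratio between those sizes" reading of dx/dt can be checked numerically. This sketch again assumes x(t) = cos t and y(t) = sin t purely as an illustration. For shrinking dt, it compares the actual shift in each component with the derivative times dt.

```python
import math


def x(t: float) -> float:
    return math.cos(t)          # assumed example x(t)


def y(t: float) -> float:
    return math.sin(t)          # assumed example y(t)


def dx_dt(t: float) -> float:
    return -math.sin(t)         # derivative of cos t


def dy_dt(t: float) -> float:
    return math.cos(t)          # derivative of sin t


t0 = 1.0
errors_x, errors_y = [], []
for dt in (1e-1, 1e-2, 1e-3):
    actual_dx = x(t0 + dt) - x(t0)   # shift in the x component caused by dt
    actual_dy = y(t0 + dt) - y(t0)   # shift in the y component caused by dt
    # Compare with "derivative times nudge": dx ≈ (dx/dt) dt, dy ≈ (dy/dt) dt
    errors_x.append(abs(actual_dx - dx_dt(t0) * dt))
    errors_y.append(abs(actual_dy - dy_dt(t0) * dt))
    print(f"dt={dt:g}  error in dx={errors_x[-1]:.1e}  error in dy={errors_y[-1]:.1e}")
```

Each tenfold shrink of dt shrinks the error roughly a hundredfold. The error therefore vanishes faster than dt itself, which is the precise sense in which the derivative is the ratio of the two nudges.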
You know, people will say, oh, mathematicians would shake their heads at the idea of treating these like fractions. But not only is it a useful thing to do because it's a helpful mnemonic, it's reflective of what you're going to do when you make a very formal argument. I think I'll do that in one of the following videos: I'll describe this in a much more formal way that's a little more airtight than this kind of hand-waving nudging around. But the intuition you get from writing this as a fraction is basically the scaffolding for that formal argument, so it's a fine thing to do. I don't think mathematicians are shaking their heads every time a student or a teacher does this. Anyway, this kind of gives you what that dx is and what that dy is. Then over here, if you're asking how much that changes the ultimate output of f, you could say, well, you've got your nudge of size dx over here, and you're wondering how much that changes the output of f. That's the meaning of the partial derivative, right? If we take the partial derivative with respect to x, what that means is that if you take a tiny nudge of size dx, this gives you the ratio between that nudge and the ultimate change to the output that you want. You could think of it as this partial x cancelling out with that dx if you wanted. Or you could just say: this is a tiny nudge in x, it's going to result in some change in f, I'm not sure what, but the meaning of the derivative is the ratio between those two, and that's what lets you figure it out. Similarly, you might call this the change in f caused by x, like, due to x, or I should say, due to dx. But that's not the only thing changing the value of f, right? That's not the only change happening in the input space. You also have another change in f, and this one I might say is due to dy. 
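Here is a small sketch of the partial-derivative step, again assuming the illustrative f(x, y) = x²y at an arbitrary point. It nudges only x (holding y fixed) and compares the change in f with ∂f/∂x · dx, then does the same for y. The helper names `partial_f_x` and `partial_f_y` are hypothetical, chosen just for this example.

```python
def f(x: float, y: float) -> float:
    return x ** 2 * y            # assumed example f(x, y) = x^2 y


def partial_f_x(x: float, y: float) -> float:
    return 2 * x * y             # ∂f/∂x, holding y fixed


def partial_f_y(x: float, y: float) -> float:
    return x ** 2                # ∂f/∂y, holding x fixed


x0, y0 = 0.5, 0.8                # an arbitrary point in the xy-plane
dx = dy = 1e-4                   # tiny nudges

# Change in f due to dx alone (y unchanged) and due to dy alone (x unchanged).
change_due_to_dx = f(x0 + dx, y0) - f(x0, y0)
change_due_to_dy = f(x0, y0 + dy) - f(x0, y0)

print(change_due_to_dx, partial_f_x(x0, y0) * dx)   # nearly equal
print(change_due_to_dy, partial_f_y(x0, y0) * dy)   # equal up to rounding (f is linear in y)
```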
Due to that tiny shift in y. We know it's going to be proportional to that tiny shift in y, and the proportionality constant is the meaning of the partial derivative: when you nudge y in some way, it results in some kind of nudge in f, and the ratio between those two is what the derivative gives. So ultimately, if you put this all together, what you'd say is there are two different things causing an ultimate change to f. So if you want to know the total change in f, I might go over here and say that one part of the total change in f is given by partial f, partial x. I can multiply it by dx here, but really, we know that dx, the change there, was in turn caused by dt, so it's the change in the x component that was due to dt: dx/dt times dt. Then, for similar reasons, the other part, coming from the change in the y direction, is the partial of f with respect to y, times what caused that initial shift in y. That was a shift in y due to t, and its size is dy/dt times dt, you could think of it. So a slight nudge in t causes a change in y, that change in y causes a change in f, and when you add those two together, that's everything that's going on, everything that influences the ultimate change in f. Then if you take this whole expression and divide everything by dt, so you kind of erase it from this side and put it over here under df, this is your multivariable chain rule. Of course, I've just written the same thing again, but hopefully this gives a little bit of intuition for how you're composing different nudges and why you want to think about it that way. 
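Written as equations, the bookkeeping in this paragraph is the informal "nudge" version of the chain rule. It is a sketch of the reasoning, not the formal argument mentioned earlier in the transcript. The last line works the rule out for an assumed example, f(x, y) = x²y with x = cos t and y = sin t.

```latex
\begin{align*}
df &= \frac{\partial f}{\partial x}\,\frac{dx}{dt}\,dt + \frac{\partial f}{\partial y}\,\frac{dy}{dt}\,dt \\
\frac{df}{dt} &= \frac{\partial f}{\partial x}\,\frac{dx}{dt} + \frac{\partial f}{\partial y}\,\frac{dy}{dt} \\
% Assumed example: f(x, y) = x^2 y, x = \cos t, y = \sin t
\frac{d}{dt}\left(\cos^2 t \,\sin t\right) &= (2xy)(-\sin t) + x^2 \cos t = -2\cos t\,\sin^2 t + \cos^3 t
\end{align*}
```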
Of course, you can see this: the partial x kind of cancels out with that dx, and this partial y kind of cancels out with that dy, and you're left with the two different things that constitute a change in f. This one is only partially the change in f, and this one is also partially the change in f, but together they give the ultimate change in f. I think that, if you break it down like that, gives a very strong reason why this should be true.
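To check the assembled rule end to end, this sketch compares (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) with a central finite difference of the composed function t → f(x(t), y(t)). It uses the same assumed example f(x, y) = x²y, x(t) = cos t, y(t) = sin t, and the step size h = 1e-5 is an arbitrary choice. If the two contributions really add up to the full derivative, the two columns should agree closely.

```python
import math


def x(t: float) -> float:
    return math.cos(t)            # assumed example x(t)


def y(t: float) -> float:
    return math.sin(t)            # assumed example y(t)


def f(x_val: float, y_val: float) -> float:
    return x_val ** 2 * y_val     # assumed example f(x, y) = x^2 y


def chain_rule(t: float) -> float:
    # df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)
    xv, yv = x(t), y(t)
    partial_f_x = 2 * xv * yv
    partial_f_y = xv ** 2
    return partial_f_x * (-math.sin(t)) + partial_f_y * math.cos(t)


def finite_difference(t: float, h: float = 1e-5) -> float:
    # Central difference of the composed single-variable function g(t) = f(x(t), y(t)).
    def g(s: float) -> float:
        return f(x(s), y(s))
    return (g(t + h) - g(t - h)) / (2 * h)


for t in (0.0, 0.5, 1.0, 2.0):
    print(f"t={t}: chain rule {chain_rule(t):.8f}, finite difference {finite_difference(t):.8f}")
```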