Chain rule proof

Here we use the formal properties of continuity and differentiability to see why the chain rule is true.

Want to join the conversation?

  • Morgan Chafe
    "lim_{Δx->0} (Δy/Δu) * (Δu/Δx)"

    But if Δu = 0 (even when Δx ≠ 0), you'd be dividing by zero? This reasoning suggests that the chain rule is true, but I don't think it's rigorous enough.
    (22 votes)
    • Stefen
      You are correct. This is the "intuitive" proof. A rigorous proof does exist, and the chain rule is sound.
      To prove the Chain Rule correctly you need to show that if f(u) is a differentiable function of u and u = g(x) is a differentiable function of x, then the composite y = f(g(x)) is a differentiable function of x. Since a function is differentiable if and only if it has a derivative at each point in its domain, it must be shown that whenever g is differentiable at x₀ and f is differentiable at g(x₀), then the composite is differentiable at x₀ and the derivative of the composite satisfies the equation:
      dy/dx = f'(g(x₀))·g'(x₀) (at x = x₀)
      Good eye!
      (35 votes)
  • spencer marlen-starr
    Is it true that every type of derivative is actually also a chain rule on top of whatever other type it is? I've always had the sneaking suspicion that this is true and I haven't yet found a counterexample, but in math you need formal proofs. Let me provide an example of what I am talking about. When I take the derivative of f(x) = 3x^2 and get my result of d/dx(3x^2) = 2*3x^(2-1) = 6x, isn't that still a chain rule? I just didn't have to write out the second part of the chain rule because the derivative of the inside x is just 1. The (complete) way to take the derivative using the chain rule and power rule would be d/dx(3x^2) = [2*3x^(2-1)] * (d/dx(x)) = 6x * 1 = 6x. I get the same result, but it shows that the chain rule still holds for different types of derivatives besides just standard chain rule problems. But does this discovery hold for every derivative?
    (14 votes)
    • Just Keith
      Yes, the chain rule applies to all derivatives (at least all of the derivatives of the type you deal with in an introductory course such as this). However, as you point out, we often get trivial results from the chain rule that we don't need to show explicitly.
      (11 votes)
  • Nabasindhu Das
    Shouldn't the function y be differentiable at u(x), not at x?
    (11 votes)
  • userbrianjiang
    Can't you cancel out du directly in dy/du * du/dx = dy/dx, like you would cancel out the 2 in 2/3 * 1/2 = 1/3?
    (2 votes)
  • Austin Cheng
    In the video, Sal says d/dx [y(u(x))] = (dy/du) * (du/dx). Shouldn't this be (d [y(u(x))]/du) * (du/dx)? Because Sal is implying that d/dx [f(g(x))] = (d [f(x)]/d [g(x)]) * (d [g(x)]/dx).
    (3 votes)
    • Andrew
      So you wrote: "Shouldn't this be (d [y(u(x))]/du) * (du/dx)?"
      Try this instead: ( d[y(u(x))] / d[u(x)] ) * ( d[u(x)] / dx )

      Notice I included the whole "u(x)" in place of the lone "u" that you sometimes wrote. By writing just "u", you used shorthand notation for "u(x)". This is what Sal is doing by writing (dy/du) * (du/dx) instead of ( d[y(u(x))] / d[u(x)] ) * ( d[u(x)] / dx ). Either way is fine, if you know that y is short for y(u(x)).
      (4 votes)
  • Felix Chen
    Why does f'(g(x)) equal dy/du? Can someone please explain? Thanks a lot. ☻
    (2 votes)
  • Reeshav
    In the video, Sal calls the chain rule infamous. Just asking, but why?
    (3 votes)
  • Felix Chen
    By the way, what does dy/du mean? It doesn't make sense to me because u is a function not a variable...
    (3 votes)
    • Anon
      Exesssr is incorrect. Sal is talking about differentiating y(u(x)) with respect to u(x). A function is a dependent variable with only one value for any given value of the independent variable. Therefore, y(u(x)) is a variable dependent on u(x), which in turn is a variable dependent on x. u(x) is not necessarily equal to x, however, so in general dy/du ≠ dy/dx. In fact, dy/du * du/dx = dy/dx, so dy/du equals dy/dx only when du/dx equals 1 or dy/du is 0 (du/dx is identically 1 only if u = x + C for some constant C).
      (2 votes)
  • Avinash Suresh
    Why can't I just say that dx, dy and du are infinitesimal changes and hence directly prove the chain rule by multiplying and dividing by du?
    dy/dx = dy/du · du/dx.
    Even if 'd' corresponds to a very, very small change, it is still a change in a variable, so shouldn't I be able to do algebraic manipulation?
    (2 votes)
    • Anon
      Derivatives can be defined in two ways, using limits or using infinitesimals. We cannot assume that the infinitesimal change du in dy/du is equal to the infinitesimal change du in du/dx. However, assuming y is differentiable with respect to u and u is differentiable with respect to x, the standard part of dy/du and du/dx must be constant for ANY infinitesimal du or dx. In order to be differentiable, the standard part of dy/du and du/dx must also be defined, and therefore du in du/dx must be infinitesimal when dx is infinitesimal. Therefore, we can define du in dy/du to be the infinitesimal change of u for a given dx, knowing that the standard part will be the same for any other du. From there, we can algebraically simplify dy/du * du/dx to get dy/dx. Sal essentially did the same thing that I'm doing here, except he used limits instead of infinitesimals and worked in reverse order.
      (2 votes)
  • Fredde9311
    I came up with this alternative proof:
    We know that: df/dt=f'(t) <=> df=f'(t)*dt
    Now, if t itself is a function of another variable x then we have that: t=t(x)=g(x). Also dt=dg (that is, an infinitesimal change in t results in an infinitesimal change in g)
    if we plug this into the first equation we have that: df=f'(g(x))*dg
    Then we divide both sides by dx: df/dx=f'(g(x))*dg/dx=f'(g(x))*g'(x)
    (2 votes)
    • Moon Bears
      That's a good "intuitive" reason for why the chain rule should be true; however, it falls short of complete mathematical rigor. What happens if the change dt = dg is 0 while dx is not? The complete proof is a slight modification of yours, creating a piecewise function (sometimes called a fudge function) for this case.
      (1 vote)
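
Several comments above point out that the quotient Δy/Δu is undefined whenever Δu = 0, and one reply mentions a piecewise "fudge function" that repairs this. The following is a sketch of one standard way to carry that out, not the argument used in the video. It assumes g is differentiable at x_0 and f is differentiable at u_0 = g(x_0); the names \varphi, k, and h are chosen here for illustration.

```latex
% Auxiliary ("fudge") function: the error in the difference quotient of f at u_0,
% defined to be 0 at k = 0 so that no division by zero ever occurs.
\[
\varphi(k) =
\begin{cases}
\dfrac{f(u_0 + k) - f(u_0)}{k} - f'(u_0), & k \neq 0,\\[1ex]
0, & k = 0.
\end{cases}
\]
% \varphi is continuous at 0 because f is differentiable at u_0,
% and the identity below holds for every k, including k = 0:
\[
f(u_0 + k) - f(u_0) = \bigl(f'(u_0) + \varphi(k)\bigr)\,k .
\]
% Put k = \Delta u = g(x_0 + h) - g(x_0) and divide by h \neq 0:
\[
\frac{f(g(x_0 + h)) - f(g(x_0))}{h}
  = \bigl(f'(u_0) + \varphi(\Delta u)\bigr)\,\frac{\Delta u}{h}.
\]
% As h \to 0, \Delta u \to 0 (g is continuous at x_0), so \varphi(\Delta u) \to 0
% and \Delta u / h \to g'(x_0). Taking the limit gives
\[
\left.\frac{dy}{dx}\right|_{x = x_0} = f'(g(x_0))\,g'(x_0).
\]
```

The only division in this version is by h, which is nonzero by the definition of the limit, so the case Δu = 0 needs no special treatment.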

Video transcript

- What I hope to do in this video is a proof of the famous and useful and somewhat elegant and sometimes infamous chain rule. And, if you've been following some of the videos on "differentiability implies continuity", and what happens to a continuous function as our change in x, if x is our independent variable, as that approaches zero, how the change in our function approaches zero, then this proof is actually surprisingly straightforward, so let's just get to it, and this is just one of many proofs of the chain rule. So the chain rule tells us that if y is a function of u, which is a function of x, and we want to figure out the derivative of this, so we want to differentiate this with respect to x, so we're gonna differentiate this with respect to x, we could write this as the derivative of y with respect to x, which is going to be equal to the derivative of y with respect to u, times the derivative of u with respect to x. This is what the chain rule tells us. But how do we actually go about proving it? Well we just have to remind ourselves that the derivative of y with respect to x... the derivative of y with respect to x, is equal to the limit as delta x approaches zero of change in y over change in x. Now we can do a little bit of algebraic manipulation here to introduce a change in u, so let's do that. So this is going to be the same thing as the limit as delta x approaches zero, and I'm gonna rewrite this part right over here. I'm gonna essentially divide and multiply by a change in u. So I could rewrite this as delta y over delta u times delta u, whoops... times delta u over delta x. Change in y over change in u, times change in u over change in x. And you can see, these are just going to be numbers here, so our change in u, this would cancel with that, and you'd be left with change in y over change x, which is exactly what we had here. So nothing earth-shattering just yet. But what's this going to be equal to? What's this going to be equal to? 
Well the limit of the product is the same thing as the product of the limit, so this is going to be the same thing as the limit as delta x approaches zero of, and I'll color-code it, of this stuff, of delta y over delta u, times-- maybe I'll put parentheses around it, times the limit... the limit as delta x approaches zero, delta x approaches zero, of this business. So let me put some parentheses around it. Delta u over delta x. So what does this simplify to? Well this right over here, this is the definition, and if we're assuming, in order for this to even be true, we have to assume that u and y are differentiable at x. So we assume, in order for this to be true, we're assuming... we're assuming y comma u are differentiable... are differentiable at x. And remember also, if they're differentiable at x, that means they're continuous at x. But if u is differentiable at x, then this limit exists, and this is the derivative of... this is u prime of x, or du/dx, so this right over here... we can rewrite as du/dx, I think you see where this is going. Now this right over here, just looking at it the way it's written out right here, we can't quite yet call this dy/du, because this is the limit as delta x approaches zero, not the limit as delta u approaches zero. But we just have to remind ourselves the results from, probably, the previous video depending on how you're watching it, which is, if we have a function u that is continuous at a point, that, as delta x approaches zero, delta u approaches zero. So we can actually rewrite this... we can rewrite this right over here, instead of saying delta x approaches zero, that's just going to have the effect, because u is differentiable at x, which means it's continuous at x, that means that delta u is going to approach zero. As our change in x gets smaller and smaller and smaller, our change in u is going to get smaller and smaller and smaller. 
So we can rewrite this, as our change in u approaches zero, and when we rewrite it like that, well then this is just dy/du. This is just dy, the derivative of y, with respect to u. So just like that, if we assume y and u are differentiable at x, or you could say that y is a function of u, which is a function of x, we've just shown, in fairly simple algebra here, and using some assumptions about differentiability and continuity, that it is indeed the case that the derivative of y with respect to x is equal to the derivative of y with respect to u times the derivative of u with respect to x. Hopefully you find that convincing.
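
The factoring step in the transcript, Δy/Δx = (Δy/Δu) · (Δu/Δx), can be checked numerically. The sketch below is only an illustration: the helper name `difference_quotients` and the sample functions y(u) = sin(u) and u(x) = x² are assumptions made for this example, not anything from the video. It also raises an error when Δu is exactly zero, which is the case the intuitive argument does not cover.

```python
import math


def difference_quotients(y, u, x, dx):
    """Return (Δy/Δu, Δu/Δx) for the composite y(u(x)) at x with step dx.

    Hypothetical helper for illustration only.
    """
    du = u(x + dx) - u(x)
    dy = y(u(x + dx)) - y(u(x))
    if du == 0:
        # The intuitive proof divides by Δu, so it breaks down here.
        raise ZeroDivisionError("Δu is zero, so Δy/Δu is undefined for this Δx")
    return dy / du, du / dx


# Assumed sample functions: y(u) = sin(u), u(x) = x**2, evaluated at x = 1.5.
x0 = 1.5
exact = math.cos(x0 ** 2) * 2 * x0  # (dy/du) * (du/dx) from the chain rule

for dx in (1e-1, 1e-3, 1e-5):
    dy_du, du_dx = difference_quotients(math.sin, lambda x: x ** 2, x0, dx)
    print(f"dx={dx:g}  product={dy_du * du_dx:.6f}  exact={exact:.6f}")
```

As dx shrinks, the printed product should move toward the chain-rule value. Passing a constant inner function, such as `lambda x: 1.0`, triggers the `ZeroDivisionError` branch and shows why a fully rigorous proof has to handle Δu = 0 separately.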