# Chain rule

The chain rule states that the derivative of f(g(x)) is f'(g(x))⋅g'(x). In other words, it helps us differentiate *composite functions*. For example, sin(x²) is a composite function because it can be constructed as f(g(x)) for f(x)=sin(x) and g(x)=x². Using the chain rule and the derivatives of sin(x) and x², we can then find the derivative of sin(x²). Created by Sal Khan.
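As a quick sanity check on the example above, here is a minimal Python sketch (standard library only). It compares the chain-rule derivative of sin(x²), which is cos(x²)·2x, with a central-difference approximation at a few points. The helper names `numeric_derivative` and `chain_rule_derivative` and the step size `h = 1e-6` are illustrative choices, not anything from the video. A numerical match at a few points is evidence, not a proof.

```python
import math

def numeric_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x) (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(u):
    return math.sin(u)   # outer function

def g(x):
    return x ** 2        # inner function

def composite(x):
    return f(g(x))       # sin(x^2)

def chain_rule_derivative(x):
    # f'(g(x)) * g'(x), using f'(u) = cos(u) and g'(x) = 2x
    return math.cos(g(x)) * (2 * x)

for x in [0.0, 0.5, 1.0, 2.0]:
    print(x, numeric_derivative(composite, x), chain_rule_derivative(x))
```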

## Want to join the conversation?

• What is the standard formula for the chain rule?
(55 votes)
• The standard form is
`d/dx f(u) = f'(u)(u'), where u = g(x)`.
Basically, it just uses u as a placeholder for the inner function. This is how we end up with "d[sin^2(x)]/d sin(x)".
The way I learned it was to recognize the compositions, write them as separate functions, and find their derivatives.
`f(x) = sin^2(x)` becomes `g(h(x))` where:
`g(x) = x^2`
`g'(x) = 2x`
`h(x) = sin(x)`
`h'(x) = cos(x)`
Then write out the chain:
`g'(h(x))(h'(x))`.
Then plug back in the terms for each function.
`f'(x) = 2(sin(x))(cos(x))`
On tough problems you can end up with 4 or 5 compositions. Until you are comfortable doing them in your head, it really helps to write each part out as a separate function, find its derivative, write out the chain using only function notation, and finally plug in the terms from your list of functions and derivatives. That keeps you from missing a step.
(113 votes)
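The write-each-part-out method above translates almost directly into code. In this hypothetical Python sketch, `g`, `g_prime`, `h`, and `h_prime` mirror the list of functions and derivatives, and `f_prime` is the chain written only in function notation, g'(h(x))·h'(x). The names and the test point x = 0.7 are illustrative.

```python
import math

# Outer function g and its derivative
def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x

# Inner function h and its derivative
def h(x):
    return math.sin(x)

def h_prime(x):
    return math.cos(x)

def f(x):
    # f(x) = sin^2(x) = g(h(x))
    return g(h(x))

def f_prime(x):
    # The chain in function notation: g'(h(x)) * h'(x)
    return g_prime(h(x)) * h_prime(x)

x = 0.7
print(f_prime(x))                      # chain-rule value
print(2 * math.sin(x) * math.cos(x))   # 2 sin(x) cos(x), should agree
```

Adding a fifth or sixth layer only means adding another function/derivative pair and one more factor in `f_prime`.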
• At one point in the video, Sal says that we can't cancel d(sin x). Why is that?
(27 votes)
• Treating differentials as regular numbers is useful in some cases, but it is not entirely correct, and in some cases it can simply confuse you.
If it helps you remember the chain rule, you can think of it as canceling the differentials. However, do not mistake that for what is actually happening.
(55 votes)
• Can anyone please help me understand how there are two functions in y = sin x^2 but only one in y = 2x^2 + 3x? I'm really getting confused.
Thank you!
(20 votes)
• The function in the video is y = (sin x)² [more commonly written as y = sin² x]. I think of this as square(sin(x)), that is, a square function of a sine function of x.

Think of y = 2x² + 3x as y = f(x) + g(x) where f(x) is 2x² and g(x) is 3x. The functions of x are not being composed/chained as above (so the chain rule doesn't apply), and they are not being multiplied (so the product rule doesn't apply). They're simply being added. In this situation, the derivative of a sum is the sum of the derivatives, and each function of x is so simple that we can apply the power rule to each term.

* thanks to John Kollar for pointing out how my answer could be clarified.
(27 votes)
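To make the sum-versus-composition distinction concrete, here is a small Python sketch using the answer's f(x) = 2x² and g(x) = 3x. The composite f(g(x)) = 2(3x)² is an extra illustration not in the original question, and `numeric_derivative` is a hypothetical helper used only to check both results.

```python
def f(x):
    return 2 * x ** 2

def f_prime(x):
    return 4 * x        # power rule

def g(x):
    return 3 * x

def g_prime(x):
    return 3.0

def numeric_derivative(func, x, h=1e-6):
    """Central-difference approximation (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.5
# Sum: y = f(x) + g(x), so y' = f'(x) + g'(x) = 4x + 3
sum_rule = f_prime(x) + g_prime(x)
# Composition: y = f(g(x)) = 2(3x)^2, so y' = f'(g(x)) * g'(x) = 36x
chain_rule = f_prime(g(x)) * g_prime(x)

print(sum_rule, numeric_derivative(lambda t: f(t) + g(t), x))
print(chain_rule, numeric_derivative(lambda t: f(g(t)), x))
```

Adding the two functions calls for the sum rule; feeding one into the other is what calls for the chain rule.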
• Why is it called the chain rule?
(12 votes)
• When functions are composed, they operate as if they were links in a chain: the input goes into the first function, which spits out an output that becomes the input to the next function, which spits out an output that may become the input of a third function, and so on. We don't necessarily see this immediately from the way the function is written, but that's the way a composite function operates. So if our function is sin^2(x + pi), we feed the input (x) into the first function, which increases it by pi and hands the result to the next function, which finds the sine and hands that result to the last function, which squares it.
(29 votes)
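The chain picture in this answer can be sketched in Python as a list of links, each paired with its derivative. This is an illustrative sketch (the `chain` list and `evaluate` function are made-up names): it feeds x through each link in order, as described above for sin^2(x + pi), and multiplies together each link's derivative evaluated at that link's own input.

```python
import math

# Each link in the chain: (function, its derivative)
chain = [
    (lambda x: x + math.pi, lambda x: 1.0),   # add pi
    (math.sin, math.cos),                     # take the sine
    (lambda x: x ** 2, lambda x: 2 * x),      # square
]

def evaluate(links, x):
    """Feed x through each link in order; return (output, derivative)."""
    value, slope = x, 1.0
    for func, deriv in links:
        slope *= deriv(value)   # derivative of this link at its own input
        value = func(value)     # this output becomes the next link's input
    return value, slope

print(evaluate(chain, 1.0))
```

For example, `evaluate(chain, 1.0)` returns sin²(1 + π) together with 2 sin(1 + π) cos(1 + π). The detail that matters is evaluating each derivative before updating `value`: a link's derivative is taken at its input, not its output.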
• In the video's result, 2sin(x)cos(x) is equal to sin(2x). Is there any deeper meaning to this?
(17 votes)
• Can you give some sort of logic behind the chain rule on a graph? We learned about basic differentiation using graphs, so why is the chain rule stated here and in my textbook as a fact?
(11 votes)
• Hmm... very interesting question...

When you apply the chain rule, you're taking into account how the slope of the function is influenced by the inner functions. For example:

Say f(x) = (2x+1)^2
then, f'(x) = 2(2x+1)(2) = 4(2x+1) = 8x+4

If you graph (2x+1)^2 you will see that it is a parabola... then, if you graph 8x+4 on the same sheet of graphing paper... you will see something very interesting...

Toward the top of the parabola on the left side, the curve almost looks like a straight line, and we know the function is decreasing there. At the same x-value, the graph of the derivative gives a large negative y-value, which is the slope of f(x) at the x-value of interest (sometimes called a). And where the slope of f(x) is 0, at the vertex x = -1/2, the graph of the derivative crosses the x-axis at that same x = -1/2.

A lot of this comes from looking at the graphs of a function and its derivative on the same graphing sheet. That's when it all starts to make sense.

In sum, the chain rule takes into consideration how the functions within a function determine the function's slope at some input.

Hope that helps... You may want to review some of Sal's videos on derivatives - especially the ones where he graphs the derivatives intuitively. They seem to get the point across very efficiently.

Happy learning =)
(18 votes)
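The numbers in this answer can be checked with a short Python sketch. It evaluates the chain-rule form 2(2x + 1)(2), the simplified 8x + 4, and a central-difference slope at a few x-values, including the vertex x = -1/2, where the slope is 0. The function names and sample points are illustrative.

```python
def f(x):
    return (2 * x + 1) ** 2

def f_prime_chain(x):
    # outer: u^2 -> 2u; inner: 2x + 1 -> 2
    return 2 * (2 * x + 1) * 2

def f_prime_simplified(x):
    return 8 * x + 4

def numeric_slope(func, x, h=1e-6):
    """Central-difference slope (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in [-3.0, -2.0, -0.5, 0.0, 1.0]:
    print(f"x={x:5.1f}  chain={f_prime_chain(x):6.1f}  "
          f"8x+4={f_prime_simplified(x):6.1f}  numeric={numeric_slope(f, x):9.5f}")
```

The slope is strongly negative far to the left, passes through 0 at x = -0.5, and turns positive to the right, matching the graph of the line 8x + 4.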
• Okay, so I sort of understand how we have to go through the function by layers. It's like we unzip it. But I don't understand why we multiply the different layers once we've taken it apart. In the video's example, we get h'(x) = 2sin(x) · cos(x). Why isn't it h'(x) = 2cos(x)? When we solve for the inner function, why doesn't our answer replace the sin(x) in the final answer?
(8 votes)
• Consider: what is the derivative of 2sin(x)? The answer is 2cos(x), and if that's the derivative of 2sin(x), we shouldn't expect it to also be the derivative of (sin(x))^2.

Why do we multiply when applying the chain rule? Derivatives are rates of change, which means they're essentially multipliers. For example, the derivative of x^2 is 2x, which means at any point on the curve, y is growing at a rate of two times x. If we apply another function to that function, we have another multiplier applied to the first one. That's the essence of why the chain rule works the way it does.
(15 votes)
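The "multiplier" idea can be seen numerically. This Python sketch (x = 1.0 and dx = 1e-6 are arbitrary choices) nudges x, measures how much the inner output u = sin(x) moves and how much the outer output y = u² moves, and shows that the overall rate is the product of the two rates. The result is close to 2 sin(x) cos(x), not 2 cos(x).

```python
import math

x = 1.0
dx = 1e-6

u = math.sin(x)              # inner function value
du = math.sin(x + dx) - u    # how far the inner output moves
y = u ** 2                   # outer function value
dy = (u + du) ** 2 - y       # how far the outer output moves

inner_rate = du / dx         # approximately cos(x)
outer_rate = dy / du         # approximately 2u = 2 sin(x)
overall_rate = dy / dx       # approximately 2 sin(x) cos(x)

print(inner_rate, math.cos(x))
print(outer_rate, 2 * math.sin(x))
print(overall_rate, inner_rate * outer_rate)
print("2cos(x) would give", 2 * math.cos(x))
```

Each rate acts as a multiplier on the small change passed to it, so the two multipliers stack, which is exactly the product the chain rule describes.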
• What is the derivative of a complex number?
(5 votes)
• Since a complex number in itself is a constant, its derivative is zero. Did you mean to ask about the differentiation of complex-valued functions defined on subsets of the complex plane? Such functions may (sometimes) be differentiated. Let `C` denote the set of complex numbers, and suppose `U` is some subset of `C`. Suppose further that `ƒ: U → C` is a complex-valued function defined on `U`, and suppose `w` is an interior point of `U`. If the limit

`lim (z → w) [ƒ(z) - ƒ(w)] / [z - w]`

exists, we say that `ƒ` is (complex) differentiable at `w`, and we denote the value of this limit by `ƒ'(w)`. If `U` is open, and if `ƒ` is differentiable at every point of `U`, we say that `ƒ` is differentiable on `U`. If `ƒ` is differentiable on an open set `U`, one also says that `ƒ` is holomorphic on `U`, or sometimes that `ƒ` is analytic on `U`. Holomorphic functions are central in the theory of complex functions.

More specifically, to say that `ƒ: U → C` is differentiable at an interior point `w` in `U` means the following: there exists some complex number `L` such that for every real number `ε > 0` there exists a real number `δ > 0` with the property that for all complex numbers `z` in `U` with `0 < |z - w| < δ`, we have `|[ƒ(z) - ƒ(w)]/[z - w] - L| < ε`. If such a number `L` exists, we usually denote it by `ƒ'(w)`. This property may also be cast in terms of convergent sequences in `U`.

The process of differentiation of complex-valued functions defined on subsets of the complex plane shares many properties with differentiation of real-valued functions defined on subsets of the real numbers. For instance, the differentiation operator is linear. Furthermore, the product rule, the quotient rule, and the chain rule all hold for such complex functions.

As an example, consider the function `ƒ: C → C` defined by `ƒ(z) = (1 - 3𝑖)z - 2`. It can be shown that `ƒ` is holomorphic, and that `ƒ'(z) = 1 - 3𝑖` for every complex number `z`.
(9 votes)
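Python's built-in complex numbers (written with `j` instead of 𝑖) make it easy to look at this limit numerically. The sketch below assumes a small step size t = 1e-6 is a fair stand-in for the limit. It evaluates the difference quotient of the example ƒ(z) = (1 - 3𝑖)z - 2 at an arbitrary point `w` while approaching from three directions, and it also checks the chain rule for the composite h(z) = ƒ(z)², whose derivative should be 2ƒ(z)ƒ'(z). The names `difference_quotient`, `h`, and `w` are illustrative.

```python
import cmath

def f(z):
    return (1 - 3j) * z - 2

def h(z):
    # Composite function: the square of f
    return f(z) ** 2

def difference_quotient(func, w, direction, t=1e-6):
    """[func(w + s) - func(w)] / s with s = t * direction (a unit complex number)."""
    step = t * direction
    return (func(w + step) - func(w)) / step

w = 0.5 + 2j
# Approach along the real axis, the imaginary axis, and a diagonal
directions = [1, 1j, cmath.exp(1j * cmath.pi / 4)]

for d in directions:
    print(difference_quotient(f, w, d))                       # close to 1 - 3j
    print(difference_quotient(h, w, d), 2 * f(w) * (1 - 3j))  # chain rule check
```

Getting the same value from every direction is what complex differentiability requires; a check along three directions is only a spot check, not a proof.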
• Hello, I have a question that I am unable to solve.
The question is to differentiate cos x^3 . sin^2*(x^5) with respect to x.
Could you please guide me on how to go about this problem?
(4 votes)
• Did you mean d/dx{cos(x³) * sin²(x⁵)}?
If so, this is a bit of a tricky one. Here's how to do it:
Step 1: Use the product rule.
d/dx{cos(x³) * sin²(x⁵)}
= cos(x³)d/dx{sin²(x⁵)} + sin²(x⁵)d/dx{cos(x³)}

Step 2: Now we have a sum of two terms, each containing a derivative. So we will find d/dx{sin²(x⁵)} and d/dx{cos(x³)} separately and then plug the results into cos(x³)d/dx{sin²(x⁵)} + sin²(x⁵)d/dx{cos(x³)}.

Step 2a: First, let us do d/dx{sin²(x⁵)}
We need to use the chain rule twice:
d/dx{sin²(x⁵)}
= 2sin(x⁵) d/dx(sin(x⁵))
= 2 sin(x⁵)cos(x⁵) d/dx(x⁵)
= 2 cos(x⁵) sin(x⁵)[5x⁴]
Simplify:
= 10 x⁴ cos(x⁵) sin(x⁵)

Step 2b: Now let us do d/dx{cos(x³)}
We use the chain rule:
d/dx{cos(x³)}
= -sin(x³) d/dx(x³)
= -sin(x³)[3x²]
= -3x² sin(x³)

Step 3: Now let us plug the derivatives we found in steps 2a and 2b into
cos(x³)d/dx{sin²(x⁵)} + sin²(x⁵)d/dx{cos(x³)}
= cos(x³)[10x⁴ cos(x⁵) sin(x⁵)] + sin²(x⁵)d/dx{cos(x³)}
= cos(x³)[10x⁴ cos(x⁵) sin(x⁵)] + sin²(x⁵)[-3x² sin(x³)]
Simplify:
= 10x⁴ cos(x³) cos(x⁵) sin(x⁵) - 3x² sin²(x⁵) sin(x³)
(8 votes)
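A quick way to catch algebra slips in a long product-rule and chain-rule calculation like this one is to compare the final expression with a numerical derivative. Here is a minimal Python sketch; the test points and the helper `numeric_derivative` are arbitrary choices, not part of the original answer.

```python
import math

def y(x):
    # cos(x^3) * sin^2(x^5)
    return math.cos(x ** 3) * math.sin(x ** 5) ** 2

def dy_dx(x):
    # Final result from Step 3
    return (10 * x ** 4 * math.cos(x ** 3) * math.cos(x ** 5) * math.sin(x ** 5)
            - 3 * x ** 2 * math.sin(x ** 5) ** 2 * math.sin(x ** 3))

def numeric_derivative(func, x, h=1e-6):
    """Central-difference approximation (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in [0.3, 0.8, 1.1]:
    print(x, dy_dx(x), numeric_derivative(y, x))
```

If a factor such as the 5x⁴ from the innermost x⁵ were dropped, the two columns would visibly disagree.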
• We know that the Chain Rule works only under certain conditions. How do we know whether a particular function obeys those conditions or not?
(5 votes)
• Technically, you're always using the chain rule.
The derivative of y = x^2 with respect to x is dy/dx = 2x dx/dx, where dx/dx is 1.
The derivative of y = u^2 with respect to x is dy/dx = 2u du/dx, whatever du/dx is as a function of x.
(4 votes)

## Video transcript

- [Instructor] What we're going to go over in this video is one of the core principles in calculus, and you're going to use it any time you take the derivative of anything even reasonably complex. And it's called the chain rule. And when you're first exposed to it, it can seem a little daunting and a little bit convoluted. But as you see more and more examples, it'll start to make sense, and hopefully it'll even start to seem a little bit simple and intuitive over time. So let's say that I had a function. Let's say I have a function h of x, and it is equal to, just for example, let's say it's equal to sine of x, let's say it's equal to sine of x squared. Now, I could've written that, I could've written it like this, sine squared of x, but it'll be a little bit clearer using that type of notation. So let me make it so I have h of x. And what I'm curious about is what is h prime of x? So I want to know h prime of x, and another way of writing it is the derivative of h with respect to x. These are just different notations. And to do this, I'm going to use the chain rule. I'm going to use the chain rule, and the chain rule comes into play every time, any time your function can be written as a composition of more than one function. And that might not seem obvious right now, but it will hopefully, maybe by the end of this video or the next one. Now, what I want to do is a little bit of a thought experiment, a little bit of a thought experiment. If I were to ask you what is the derivative with respect to x, if I were to just apply the derivative operator to x squared with respect to x, what do I get? Well, this gives me two x. We've seen that many, many, many, many times. Now, what if I were to take the derivative with respect to a of a squared? Well, it's the exact same thing. I just swapped an a for the x's. This is still going to be equal to two a. Now I will do something that might be a little bit more bizarre. 
What if I were to take the derivative with respect to sine of x, with respect to sine of x, of sine of x squared? Well, wherever I had the x's up here, the a's over here, I just replace them with a sine of x. So this is just going to be two times whatever I'm taking the derivative with respect to. Here it was with respect to x. Here with respect to a. Here it's with respect to sine of x. So it's going to be two times sine of x. Now, the chain rule tells us that this derivative is going to be the derivative of our whole function with respect, or the derivative of this outer function, x squared, the derivative of x squared, the derivative of this outer function with respect to sine of x. So that's going to be two sine of x, two sine of x. So we could view it as the derivative of the outer function with respect to the inner, two sine of x. We could just treat sine of x like it's kind of an x. And it would've been just two x, but instead it's a sine of x. We say two sine of x times, times the derivative, I'll do this in green, times the derivative of sine of x with respect to x. Times the derivative of sine of x with respect to x, well, that's more straightforward, a little bit more intuitive. The derivative of sine of x with respect to x, we've seen multiple times, is cosine of x, so times cosine of x. And so there we've applied the chain rule. It was the derivative of the outer function with respect to the inner. So the derivative of sine of x squared with respect to sine of x is two sine of x, and then we multiply that times the derivative of sine of x with respect to x. So let me make it clear. This right over here is the derivative. We're taking the derivative of, we're taking the derivative of sine of x squared. So let me make it clear. That's what we were taking the derivative of with respect to sine of x, with respect to sine of x. 
And then we're multiplying that times the derivative of sine of x, the derivative of sine of x with respect to, with respect to x. And this is where it might start to make a little bit of intuitive sense. You can't really treat these differentials, this d whatever, this dx, this d sine of x, as numbers. And you really can't, but this notation makes it look like a fraction because intuitively that's what we're doing. But if you were to treat 'em like fractions, then you could think about canceling that and that. And once again, this isn't a rigorous thing to do, but it can help with the intuition. And then what you're left with is the derivative of this whole sine of x squared with respect to x. So you're left with, you're left with the derivative of essentially our original function, sine of x squared, with respect to x, which is exactly what dh/dx is. This right over here, this right over here is our original function h. That's our original function h. So it might seem a little bit daunting now. What I'll do in the next video is several more examples, and then we'll try to abstract this a little bit.
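Written out in Leibniz notation, the computation from the video reads as follows. As the video cautions, the apparent cancellation of d(sin x) is only a memory aid, not a rigorous step.

```latex
\frac{dh}{dx}
  = \frac{d\left[(\sin x)^2\right]}{d(\sin x)} \cdot \frac{d(\sin x)}{dx}
  = 2\sin x \cdot \cos x
```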