This site contains: mathematics courses and book; covers: image analysis, data analysis, and discrete modelling; provides: image analysis software. Created and run by Peter Saveliev.

# Chain rule of differentiation


The derivative of the composition is equal to the product of the derivatives.
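
A quick one-variable illustration may help; the functions here are chosen only as an example. Take $f(x) = x^2$ and $g(y) = \sin y$, so that $h(x) = g( f(x) ) = \sin ( x^2 )$. Then

$$h'(x) = g'( f(x) ) f'(x) = \cos ( x^2 ) \cdot 2x.$$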

For vector functions: the derivative of the composition is equal to the composition of the derivatives, which is represented by the product of their matrices.

Theorem (Chain Rule). Suppose $h(x) = g( f(x) )$, where

$$f: {\bf R}^n {\rightarrow} {\bf R}^m, g: {\bf R}^m {\rightarrow} {\bf R}^q, h: {\bf R}^n {\rightarrow} {\bf R}^q.$$

Suppose further that

$f$ is differentiable at $x = a$ and
$g$ is differentiable at $y = f(a)$.

Then

$h$ is differentiable at $x = a$ and
$h'(a) = g'( f(a) ) f'(a)$.
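
The statement can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the functions `f`, `g`, the point `a`, and the helper `jacobian` are illustrative choices that do not come from the text above. It approximates the three derivatives by central differences and compares $h'(a)$ with the matrix product $g'( f(a) ) f'(a)$.

```python
import numpy as np

def f(x):
    # Illustrative f: R^2 -> R^3 (an assumption, not from the article)
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def g(y):
    # Illustrative g: R^3 -> R^2 (an assumption, not from the article)
    return np.array([y[0] + y[1] * y[2], np.exp(y[0])])

def h(x):
    # The composition h = g o f: R^2 -> R^2
    return g(f(x))

def jacobian(func, a, eps=1e-6):
    """Central-difference approximation of the derivative matrix of func at a."""
    a = np.asarray(a, dtype=float)
    cols = []
    for i in range(a.size):
        step = np.zeros_like(a)
        step[i] = eps
        cols.append((func(a + step) - func(a - step)) / (2 * eps))
    return np.column_stack(cols)

a = np.array([0.7, -1.3])
Jf = jacobian(f, a)      # matrix of f'(a), shape (3, 2)
Jg = jacobian(g, f(a))   # matrix of g'(f(a)), shape (2, 3)
Jh = jacobian(h, a)      # matrix of h'(a), shape (2, 2)

# Largest entrywise gap between h'(a) and the product g'(f(a)) f'(a)
chain_rule_error = np.max(np.abs(Jh - Jg @ Jf))
print(Jf.shape, Jg.shape, Jh.shape)
print(chain_rule_error)
```

The shapes show why the order of the factors matters: a $q \times m$ matrix times an $m \times n$ matrix gives the $q \times n$ matrix of $h'(a)$. The printed gap should be at the level of the finite-difference error.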

## Proof

Let

$f: {\bf R}^n {\rightarrow} {\bf R}^m$ differentiable at $x = a$,
$g: {\bf R}^m {\rightarrow} {\bf R}^q$ differentiable at $y = b = f(a)$.

We need to show that

$h = g {\circ} f: {\bf R}^n {\rightarrow} {\bf R}^q$ is differentiable at $x = a$,

and that the derivative of the composition is equal to the composition of the derivatives.

By definition, the differentiability of $f$ and $g$ means that their best affine approximations exist:

(1) $T_f(x) = f(a) + f'(a) ( x - a), \frac{f(x) - T_f(x)}{|| x - a ||} {\rightarrow} 0$ as $x {\rightarrow} a$,

(2) $T_g(y) = g(b) + g'(b) ( y - b), \frac{g(y) - T_g(y)}{|| y - b ||} {\rightarrow} 0$ as $y {\rightarrow} b$.

Now we want to find the best affine approximation of the composition:

(3) $T_h(x) = h(a) + h'(a) ( x - a ), \frac{h(x) - T_h(x)}{|| x - a ||} {\rightarrow} 0$ as $x {\rightarrow} a$.

Idea: Prove that $T_g {\circ} T_f$ satisfies (3), i.e., that it is the best affine approximation of $h$.

(1) => Let $E_f(x) = \frac{f(x) - T_f(x)}{|| x - a ||}, E_f(x) {\rightarrow} 0$ as $x {\rightarrow} a$. Then

$$f(x) = T_f(x) + E_f(x) || x - a ||.$$

(2) => Let $E_g(y) = \frac{g(y) - T_g(y)}{|| y - b ||}, E_g(y) {\rightarrow} 0$ as $y {\rightarrow} b$. Then

$$g(y) = T_g(y) + E_g(y) || y - b ||.$$

Consider

$$h(x) = g( f(x) ) = T_g( f(x) ) + E_g( f(x) ) || f(x) - b ||,$$

where

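the last term is small relative to $|| x - a ||$. Indeed, $f(x) {\rightarrow} f(a) = b$ as $x {\rightarrow} a$, so $E_g( f(x) ) {\rightarrow} 0$ (we set $E_g(b) = 0$), while by (1) the ratio $\frac{|| f(x) - b ||}{|| x - a ||} \leq || f'(a) || + || E_f(x) ||$ stays bounded. Hence $\frac{E_g( f(x) ) || f(x) - b ||}{|| x - a ||} {\rightarrow} 0$ as $x {\rightarrow} a$. Below, "small term" always means a term that tends to $0$ after division by $|| x - a ||$.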

Then

$$\begin{array}{} h(x) &= T_g( T_f(x) + E_f(x) || x - a || ) + {\rm \hspace{3pt} small \hspace{3pt} term} \\ &= g(b) + g'(b) ( y - b ) + {\rm \hspace{3pt} small \hspace{3pt} term}. \end{array}$$

With

$$y = T_f(x) + E_f(x) || x - a ||$$

we get

$$\begin{array}{} h(x) &= g(b) + g'(b) ( T_f(x) + E_f(x) || x - a || - b ) + {\rm \hspace{3pt} small \hspace{3pt} term} \\ &= g(b) + g'(b) ( T_f(x) - b ) + g'(b) ( E_f(x) || x - a || ) + {\rm \hspace{3pt} small \hspace{3pt} term} \\ &= g(b) + g'(b) ( T_f(x) - b ) + {\rm \hspace{3pt} small \hspace{3pt} term} \end{array}$$

since

$E_f(x) {\rightarrow} 0$ as $x {\rightarrow} a$ and $g'(b)$ is linear, hence continuous, so
$\frac{g'(b) ( E_f(x) || x - a || )}{|| x - a ||} = g'(b) ( E_f(x) ) {\rightarrow} 0$ as $x {\rightarrow} a$.

Now

$$\begin{array}{} h(x) &= g(b) + g'(b) ( f(a) + f'(a) ( x - a ) - b ) + {\rm \hspace{3pt} small \hspace{3pt} terms} \\ &= g( f(a) ) + g'( f(a) ) ( f'(a) ( x - a ) ) + {\rm \hspace{3pt} small \hspace{3pt} terms} \\ &= h(a) + g'( f(a) ) f'(a) ( x - a ) + {\rm \hspace{3pt} small \hspace{3pt} terms}. \end{array}$$

Define

$$T_h(x) := h(a) + g'( f(a) ) f'(a) ( x - a ),$$

then $h(x) - T_h(x)$ is small in the sense that

$\frac{h(x) - T_h(x)}{|| x - a ||} {\rightarrow} 0$ as $x {\rightarrow} a$.

So $T_h$ is the best affine approximation of $h$ at $a$, and its linear part, $g'( f(a) ) {\circ} f'(a)$, is the derivative $h'(a)$. This is the Chain Rule.
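
The key property of $T_h$ can also be observed numerically. The following is a minimal sketch, assuming NumPy is available; the functions `f`, `g`, their hand-computed derivative matrices `f_prime`, `g_prime`, the point `a`, and the direction of approach are all illustrative choices that do not come from the text above. It builds $T_h$ from the product $g'( f(a) ) f'(a)$ and evaluates the ratio $\frac{|| h(x) - T_h(x) ||}{|| x - a ||}$ as $x$ approaches $a$.

```python
import numpy as np

def f(x):
    # Illustrative f: R^2 -> R^3 (an assumption, not from the article)
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def f_prime(x):
    # Matrix of f'(x), computed by hand, shape (3, 2)
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 0.0],
                     [0.0, 2 * x[1]]])

def g(y):
    # Illustrative g: R^3 -> R^2 (an assumption, not from the article)
    return np.array([y[0] + y[1] * y[2], np.exp(y[0])])

def g_prime(y):
    # Matrix of g'(y), computed by hand, shape (2, 3)
    return np.array([[1.0, y[2], y[1]],
                     [np.exp(y[0]), 0.0, 0.0]])

def h(x):
    # The composition h = g o f
    return g(f(x))

a = np.array([0.7, -1.3])
A = g_prime(f(a)) @ f_prime(a)   # candidate for h'(a) given by the Chain Rule

def T_h(x):
    # Candidate best affine approximation of h at a
    return h(a) + A @ (x - a)

direction = np.array([0.6, 0.8])  # a unit vector, chosen arbitrarily
ratios = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    x = a + t * direction
    ratios.append(np.linalg.norm(h(x) - T_h(x)) / np.linalg.norm(x - a))
print(ratios)
```

The printed ratios should shrink as the step `t` shrinks, which is what condition (3) requires. This checks a single direction of approach only, so it illustrates the limit rather than proving it.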