This site contains: mathematics courses and book; covers: image analysis, data analysis, and discrete modelling; provides: image analysis software. Created and run by Peter Saveliev.
Chain rule of differentiation
From Intelligent Perception
The derivative of the composition is equal to the product of the derivatives.
For vector functions, the derivative of the composition is equal to the composition of the derivatives (which are linear maps), i.e., to the product of their matrices.
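For instance, take the illustrative functions $f(x_1, x_2) = (x_1 x_2, \ x_1 + x_2)$ and $g(y_1, y_2) = y_1^2 + y_2$ (chosen here for demonstration, not taken from the text), so that $h(x) = g(f(x)) = x_1^2 x_2^2 + x_1 + x_2$. The product of the derivative matrices reproduces the derivative of $h$ computed directly:

$$f'(x) = \begin{bmatrix} x_2 & x_1 \\ 1 & 1 \end{bmatrix}, \quad g'(y) = \begin{bmatrix} 2y_1 & 1 \end{bmatrix},$$
$$g'( f(x) ) f'(x) = \begin{bmatrix} 2x_1 x_2 & 1 \end{bmatrix} \begin{bmatrix} x_2 & x_1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2x_1 x_2^2 + 1 & 2x_1^2 x_2 + 1 \end{bmatrix} = \begin{bmatrix} \frac{\partial h}{\partial x_1} & \frac{\partial h}{\partial x_2} \end{bmatrix}.$$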
Theorem (Chain Rule). Suppose $h(x) = g( f(x) )$, where
$$f: {\bf R}^n {\rightarrow} {\bf R}^m, g: {\bf R}^m {\rightarrow} {\bf R}^q, h: {\bf R}^n {\rightarrow} {\bf R}^q. $$
Suppose further that
$f$ is differentiable at $x = a$, and
$g$ is differentiable at $y = f(a)$.
Then $h$ is differentiable at $x = a$ and
$h'(a) = g'( f(a) ) f'(a)$.
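The theorem can be checked numerically. Below is a minimal sketch, assuming the illustrative functions $f(x_1, x_2) = (x_1 x_2, \ x_1 + x_2)$, $g(y_1, y_2) = y_1^2 + y_2$ and the point $a = (1, 2)$ (none of these come from the text). It approximates each derivative matrix by central differences and compares $h'(a)$ with the product $g'( f(a) ) f'(a)$. The helpers `jacobian` and `matmul` are hypothetical names, written in plain Python to keep the sketch self-contained.

```python
# Numerical sketch of the Chain Rule h'(a) = g'(f(a)) f'(a).
# f, g and the point a are illustrative assumptions, not from the text.

def f(x):
    # f: R^2 -> R^2
    return [x[0] * x[1], x[0] + x[1]]

def g(y):
    # g: R^2 -> R^1
    return [y[0] ** 2 + y[1]]

def h(x):
    # the composition h = g o f
    return g(f(x))

def jacobian(F, p, eps=1e-6):
    """Central-difference approximation of the m x n derivative matrix of F at p."""
    n = len(p)
    m = len(F(p))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(p)
        plus[j] += eps
        minus = list(p)
        minus[j] -= eps
        Fp, Fm = F(plus), F(minus)
        for i in range(m):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

a = [1.0, 2.0]
lhs = jacobian(h, a)                              # h'(a)
rhs = matmul(jacobian(g, f(a)), jacobian(f, a))   # g'(f(a)) f'(a)
print(lhs)
print(rhs)
```

Here $f(a) = (2, 3)$, $g'( f(a) ) = [4 \ \ 1]$ and $f'(a)$ has rows $[2 \ \ 1]$ and $[1 \ \ 1]$, so both printed matrices should be approximately `[[9, 5]]`, up to finite-difference error.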
Proof
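Let
$f: {\bf R}^n {\rightarrow} {\bf R}^m$ be differentiable at $x = a$ and
$g: {\bf R}^m {\rightarrow} {\bf R}^q$ be differentiable at $y = b = f(a)$.
Then $h = g {\circ} f$ is differentiable at $x = a$,
and the derivative of the composition is equal to the composition of the derivatives.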
By definition, differentiability means the existence of a best affine approximation. For $f$ at $a$ and $g$ at $b$ this gives:
(1) $T_f(x) = f(a) + f'(a) ( x - a), \frac{f(x) - T_f(x)}{|| x - a ||} {\rightarrow} 0$ as $x {\rightarrow} a$,
(2) $T_g(y) = g(b) + g'(b) ( y - b), \frac{g(y) - T_g(y)}{|| y - b ||} {\rightarrow} 0$ as $y {\rightarrow} b$.
Now we want to find the best affine approximation of the composition $h$ at $a$, i.e., an affine function $T_h$ such that
(3) $T_h(x) = h(a) + h'(a) ( x - a ), \frac{h(x) - T_h(x)}{|| x - a ||} {\rightarrow} 0$ as $x {\rightarrow} a$.
Idea: prove that $T_g {\circ} T_f$ satisfies (3), i.e., that it is the best affine approximation of $h$.
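A direct computation sketches what this candidate looks like. It uses only the formulas for $T_f$ and $T_g$ in (1) and (2), together with $b = f(a)$ and $g(b) = h(a)$:

$$\begin{aligned} (T_g {\circ} T_f)(x) &= g(b) + g'(b) \big( T_f(x) - b \big) \\ &= g(b) + g'(b) \big( f(a) + f'(a) ( x - a ) - b \big) \\ &= h(a) + g'( f(a) ) f'(a) ( x - a ). \end{aligned}$$

So $T_g {\circ} T_f$ is affine with linear part $g'( f(a) ) f'(a)$; what remains is to check the error condition in (3).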
From (1), let $E_f(x) = \frac{f(x) - T_f(x)}{|| x - a ||}$ for $x \ne a$ and $E_f(a) = 0$, so that $E_f(x) {\rightarrow} 0$ as $x {\rightarrow} a$. Then, for all $x$,
$$f(x) = T_f(x) + E_f(x) || x - a ||.$$
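From (2), let $E_g(y) = \frac{g(y) - T_g(y)}{|| y - b ||}$ for $y \ne b$ and $E_g(b) = 0$, so that $E_g(y) {\rightarrow} 0$ as $y {\rightarrow} b$ and $E_g$ is continuous at $b$ (this matters if $f(x) = b$ for some $x \ne a$). Then, for all $y$,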
$$g(y) = T_g(y) + E_g(y) || y - b ||.$$
Consider
$$h(x) = g( f(x) ) = T_g( f(x) ) + E_g( f(x) ) || f(x) - b ||,$$
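where the last term is small: since $f(x) - b = f'(a) ( x - a ) + E_f(x) || x - a ||$, we have $f(x) {\rightarrow} b$, hence $E_g( f(x) ) {\rightarrow} 0$, as $x {\rightarrow} a$, while $|| f(x) - b ||$ is at most a bounded multiple of $|| x - a ||$. (Here and below, a "small term" is one that, divided by $|| x - a ||$, tends to $0$ as $x {\rightarrow} a$.)
Then, substituting $f(x) = T_f(x) + E_f(x) || x - a ||$ into $T_g$,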
$$\begin{aligned} h(x) &= T_g\big( T_f(x) + E_f(x) || x - a || \big) + \text{small term} \\ &= g(b) + g'(b) ( y - b ) + \text{small term}. \end{aligned}$$
With
$$y = T_f(x) + E_f(x) || x - a ||$$
we get
$$\begin{aligned} h(x) &= g(b) + g'(b) \big( T_f(x) + E_f(x) || x - a || - b \big) + \text{small term} \\ &= g(b) + g'(b) \big( T_f(x) - b \big) + g'(b) \big( E_f(x) || x - a || \big) + \text{small term} \\ &= g(b) + g'(b) \big( T_f(x) - b \big) + \text{small term}, \end{aligned}$$
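since $g'(b) ( E_f(x) || x - a || ) = || x - a || \, g'(b) ( E_f(x) )$ by linearity, and $g'(b) ( E_f(x) ) {\rightarrow} 0$ as $x {\rightarrow} a$ because the linear map $g'(b)$ is continuous and $E_f(x) {\rightarrow} 0$. So this term is small.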
Now, substituting $T_f(x) = f(a) + f'(a) ( x - a )$ and using $b = f(a)$,
$$\begin{aligned} h(x) &= g(b) + g'(b) \big( f(a) + f'(a) ( x - a ) - b \big) + \text{small terms} \\ &= g( f(a) ) + g'( f(a) ) \big( f'(a) ( x - a ) \big) + \text{small terms} \\ &= h(a) + g'( f(a) ) f'(a) ( x - a ) + \text{small terms}. \end{aligned}$$
Define
$$T_h(x) := h(a) + g'( f(a) ) f'(a) ( x - a ),$$
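then $h(x) - T_h(x)$ is small in the sense that
$$\frac{h(x) - T_h(x)}{|| x - a ||} {\rightarrow} 0 \text{ as } x {\rightarrow} a,$$
which is (3). So $T_h$ is the best affine approximation of $h$ at $a$. Therefore $h$ is differentiable at $a$, and its derivative is the linear part of $T_h$:
$$h'(a) = g'( f(a) ) {\circ} f'(a) = g'( f(a) ) f'(a).$$
This is the Chain Rule.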
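The conclusion can also be observed numerically. Below is a minimal sketch, assuming the illustrative functions $f(x_1, x_2) = (x_1 x_2, \ x_1 + x_2)$, $g(y_1, y_2) = y_1^2 + y_2$ and the point $a = (1, 2)$ (not from the text). For these, $g'( f(a) ) = [4 \ \ 1]$ and $f'(a)$ has rows $[2 \ \ 1]$ and $[1 \ \ 1]$, so $g'( f(a) ) f'(a) = [9 \ \ 5]$; the sketch evaluates the ratio $|| h(x) - T_h(x) || / || x - a ||$ as $x$ approaches $a$ along a fixed direction.

```python
# Sketch: the error of T_h(x) = h(a) + g'(f(a)) f'(a) (x - a), divided by
# ||x - a||, tends to 0 as x -> a. f, g, a are illustrative assumptions.
import math

def f(x):
    return [x[0] * x[1], x[0] + x[1]]

def g(y):
    return [y[0] ** 2 + y[1]]

def h(x):
    return g(f(x))

a = [1.0, 2.0]
# Hand-computed product g'(f(a)) f'(a) = [4, 1] [[2, 1], [1, 1]] = [9, 5]
Dh = [9.0, 5.0]

def T_h(x):
    """Affine approximation of h at a with linear part g'(f(a)) f'(a)."""
    return h(a)[0] + sum(Dh[j] * (x[j] - a[j]) for j in range(2))

direction = [0.6, 0.8]  # a unit vector; any direction would do
ratios = []
for k in range(1, 6):
    t = 10.0 ** (-k)
    x = [a[0] + t * direction[0], a[1] + t * direction[1]]
    err = abs(h(x)[0] - T_h(x))
    ratios.append(err / math.dist(x, a))
print(ratios)
```

The ratios shrink by roughly a factor of 10 at each step, consistent with the error being of order $|| x - a ||^2$ for this polynomial example.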
