This site is devoted to mathematics and its applications. Created and run by Peter Saveliev.

# Differentiation

### From Mathematics Is A Science

## Contents

- 1 Differentiation over addition and constant multiple: the linearity
- 2 Differentiation over compositions: the Chain Rule
- 3 Differentiation over multiplication and division
- 4 The rate of change of the rate of change
- 5 Repeated differentiation
- 6 Change of variables and the derivative
- 7 Implicit differentiation and related rates
- 8 Radar gun: the math
- 9 The derivative of the inverse function
- 10 Reversing differentiation

## 1 Differentiation over addition and constant multiple: the linearity

In this chapter, we will be taking a broader look at how we compute the rate of change.

If a function is defined at the nodes of a partition, it is simply a sequence of numbers, and so is its difference quotient. This means that computing the difference quotient is a special kind of function, a *function of functions*:
$$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!}
\newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.}
%
\begin{array}{ccccccccccccccc}
f & \mapsto & \begin{array}{|c|}\hline\quad \frac{\Delta }{\Delta x} \quad \\ \hline\end{array} & \mapsto & u=\frac{\Delta f}{\Delta x} .
\end{array}$$
Furthermore, the derivative is defined as a limit. Unlike the limits we saw prior to derivatives, this one has a parameter, the location $x$. That is why, when the input is a differentiable function $f$, the output of this limit is another function, $f'$. This means that differentiation is a special kind of function too, a *function of functions*:
$$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!}
\newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.}
%
\begin{array}{ccccccccccccccc}
f & \mapsto & \begin{array}{|c|}\hline\quad \frac{d}{dx} \quad \\ \hline\end{array} & \mapsto & f' .
\end{array}$$
We need to understand how these two functions operate. We would like to develop shortcuts and algebraic rules for evaluating both the difference quotients and the derivatives. The latter will be found without resorting to limits!
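To make the idea of a “function of functions” concrete, here is a minimal Python sketch. The name `difference_quotient` is hypothetical, and the secondary nodes are assumed to be the midpoints of the intervals. It takes a function known only at the nodes of a partition, i.e., a list of numbers, and returns another list of numbers: its difference quotient.

```python
def difference_quotient(nodes, values):
    """Map a function known at the nodes of a partition to its difference quotient.

    Returns the secondary nodes (assumed here to be the midpoints of the
    intervals) and the difference quotients placed at them.
    """
    if len(nodes) != len(values):
        raise ValueError("need exactly one value per node")
    secondary, quotients = [], []
    for k in range(1, len(nodes)):
        delta_x = nodes[k] - nodes[k - 1]
        delta_f = values[k] - values[k - 1]
        secondary.append((nodes[k - 1] + nodes[k]) / 2)
        quotients.append(delta_f / delta_x)
    return secondary, quotients


xs = [0.0, 0.5, 1.0, 2.0]          # a non-uniform partition
ys = [x ** 2 for x in xs]          # f(x) = x^2 known only at the nodes
cs, qs = difference_quotient(xs, ys)
print(cs)  # [0.25, 0.75, 1.5]
print(qs)  # [0.5, 1.5, 3.0]
```

For $f(x)=x^2$ the output happens to equal $2c$ at each midpoint $c$, which foreshadows $(x^2)'=2x$.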

What happens to the output function of differentiation as we perform algebraic operations with the input functions?

The idea of *addition* of the change is illustrated below:

Here, the bars that represent the change of the output variable are stacked on top of each other, then the heights are added to each other and so are the height differences. The algebra behind this geometry is very simple: $$(A+B)-(a+b)=(A-a)+(B-b).$$

**Theorem (Sum Rule).** (A) The difference of the sum of two functions is the sum of their differences and the difference quotient of the sum of two functions is the sum of their difference quotients; i.e., for any two functions $f,g$ defined at the adjacent nodes $x$ and $x+\Delta x$ of a partition, the differences and the difference quotients (defined at the corresponding secondary node) satisfy:
$$\Delta(f+g)=\Delta f+\Delta g,$$
and
$$\frac{\Delta(f+g)}{\Delta x}=\frac{\Delta f}{\Delta x}+\frac{\Delta g}{\Delta x}.$$
(B) The sum of two functions differentiable at a point is differentiable at that point and its derivative is equal to the sum of their derivatives; i.e., for any two functions $f,g$ differentiable at $x$, we have at $x$:
$$\frac{d(f+g)}{d x}=\frac{d f}{d x}+\frac{d g}{d x}.$$

**Proof.** Applying the definition to the function $f+g$, we have:
$$\begin{array}{lll}
\Delta(f+g)(c)&=(f+g)(x+\Delta x)-(f+g)(x)\\
&=f(x+\Delta x)+g(x+\Delta x)-f(x)-g(x)\\
&=\big( f(x+\Delta x)-f(x) \big) +\big(g(x+\Delta x)-g(x) \big)\\
&=\Delta f(c)+\Delta g(c).
\end{array}$$
Now, the limit with $c=x$:
$$\begin{array}{lll}
\frac{\Delta(f+g)}{\Delta x}(x)&=\frac{\Delta f}{\Delta x}(x)+\frac{\Delta g}{\Delta x}(x)&\text{ ...by SR...}\\
&\to\frac{d f}{d x}+\frac{d g}{d x} &\text{ as } \Delta x\to 0.\\
\end{array}$$
$\blacksquare$

In terms of motion, if two runners are running *away* from each other starting from a common location, then the distance between them is the sum of the distances they have covered.

The formula in the Lagrange notation is as follows: $$(f + g)'(x)= f'(x) + g'(x).$$

The same proof applies to *subtraction* of the change.

**Exercise.** State the Difference Rule.

In terms of motion, if two runners are running *along* with each other starting from a common location, then the distance between them is the difference of the distances they have covered.

The idea of *proportion* of the change is illustrated below:

Here, if the heights triple then so do the height differences. The algebra behind this geometry is very simple: $$kA-ka=k(A-a).$$

**Theorem (Constant Multiple Rule).** (A) The difference of a multiple of a function is the multiple of the function's difference and the difference quotient of a multiple of a function is the multiple of the function's difference quotient; i.e., for any function $f$ defined at the adjacent nodes $x$ and $x+\Delta x$ of a partition and any real $k$, the differences and the difference quotients (defined at the corresponding secondary node) satisfy:
$$\Delta(kf)=k\Delta f,$$
and
$$\frac{\Delta(kf)}{\Delta x}=k\frac{\Delta f}{\Delta x}.$$
(B) A multiple of a function differentiable at a point is differentiable at that point and its derivative is equal to the multiple of the function's derivative; i.e., for any function $f$ differentiable at $x$ and any real $k$, we have at $x$:
$$\frac{d(kf)}{dx}=k\frac{d f}{dx}.$$

**Proof.** Applying the definition to the function $k\,f$, we have:
$$\begin{array}{lll}
\Delta(k\cdot f)(c)&=(k\cdot f)(x+\Delta x)-(k\cdot f)(x)\\
&=k\cdot f(x+\Delta x)-k\cdot f(x)\\
&=k\cdot \big( f(x+\Delta x)-f(x) \big)\\
&=k\cdot \Delta f\, (c).
\end{array}$$
Now, the limit with $c=x$:
$$\begin{array}{lll}
\frac{\Delta(kf)}{\Delta x}(x)&=\frac{k\Delta f}{\Delta x}(x)\\
&=k\frac{\Delta f}{\Delta x}(x)&\text{ ...by CMR...}\\
&\to k\frac{d f}{d x}(x)&\text{ as } \Delta x\to 0.\\
\end{array}$$
$\blacksquare$

In terms of motion, if the distance is re-scaled, such as from miles to kilometers, then so is the velocity -- at the same proportion.

The formula in the Lagrange notation is as follows: $$(k\cdot f)'(x) = k\cdot f'(x).$$ Here is another way to write these formulas in the Leibniz notation. This is the Sum Rule: $$\frac{d}{dx}\big( u+v \big) = \frac{du}{dx} + \frac{dv}{dx},$$ and the Constant Multiple Rule: $$\frac{d}{dx}\big( cu \big) = c\frac{du}{dx}.$$

The two theorems can be combined into one. It relies on the following idea: given two functions $f,g$, their *linear combination* is a new function $pf+qg$, where $p,q$ are two constant numbers.

**Theorem (Linearity of Differentiation).** (A) The difference of a linear combination of two functions is the linear combination of their differences and the difference quotient of a linear combination of two functions is the linear combination of their difference quotients; i.e., for any two functions $f,g$ defined at the adjacent nodes $x$ and $x+\Delta x$ of a partition and any real numbers $p,q$, the differences and the difference quotients (defined at the corresponding secondary node) satisfy:
$$\Delta(pf+qg)=p\Delta f+q\Delta g,$$
and
$$\frac{\Delta(pf+qg)}{\Delta x}=p\frac{\Delta f}{\Delta x}+q\frac{\Delta g}{\Delta x}.$$
(B) A linear combination of two functions differentiable at a point is differentiable at that point and its derivative is equal to the linear combination of their derivatives; i.e., for any two functions $f,g$ differentiable at $x$ and any real numbers $p,q$, we have at $x$:
$$\frac{d(pf+qg)}{d x}=p\frac{d f}{d x}+q\frac{d g}{d x}.$$

In other words, our “function of functions” has the same property as a linear polynomial: $$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} % \begin{array}{ccccccccccccccc} pf+qg & \mapsto & \begin{array}{|c|}\hline\quad \frac{d}{dx} \quad \\ \hline\end{array} & \mapsto & pf' +qg'. \end{array}$$
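Part (A) of the theorem can be checked numerically. The Python sketch below uses an arbitrary non-uniform partition, the sample functions $\sin$ and $\exp$, and the constants $p=2$, $q=-3$ (all illustrative choices); the two sides agree up to floating-point rounding.

```python
import math


def dq(nodes, values):
    """Difference quotients of a function known at the nodes of a partition."""
    return [(values[k] - values[k - 1]) / (nodes[k] - nodes[k - 1])
            for k in range(1, len(nodes))]


nodes = [0.0, 0.3, 0.7, 1.2, 2.0]       # an arbitrary, non-uniform partition
f = [math.sin(x) for x in nodes]
g = [math.exp(x) for x in nodes]
p, q = 2.0, -3.0

# Left: the difference quotient of the linear combination p*f + q*g.
lhs = dq(nodes, [p * a + q * b for a, b in zip(f, g)])
# Right: the same linear combination of the two difference quotients.
rhs = [p * a + q * b for a, b in zip(dq(nodes, f), dq(nodes, g))]

for left, right in zip(lhs, rhs):
    print(f"{left:.12f}  {right:.12f}")
```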

The hierarchy of polynomials and their derivatives was used in the last chapter to model free fall.

- The derivative of a constant polynomial is zero:

$$(c)'=0.$$

- The derivative of a linear polynomial is constant:

$$(mx+b)'=(mx)'+(b)'=m(x)'+0=m\cdot 1=m.$$

- The derivative of a quadratic polynomial is linear:

$$(ax^2+bx+c)'=(ax^2)'+(bx)'+(c)'=a(x^2)'+b(x)'+0=a\cdot 2x+b\cdot 1=2ax+b.$$
And so on: combined with the Power Formula, the two rules above allow us to differentiate *all* polynomials. Every time, the degree goes down by $1$! The general result is as follows.

**Theorem.** The derivative of a polynomial of degree $n>0$,
$$f(x)=a_nx^n+a_{n-1}x^{n-1}+...+a_{2}x^2+a_{1}x+a_0,\ a_n\ne 0,$$
is a polynomial of degree $n-1$,
$$f'(x)=na_nx^{n-1}+(n-1)a_{n-1}x^{n-2}+...+2a_{2}x+a_{1},\ a_n\ne 0.$$
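The theorem translates directly into an operation on lists of coefficients. In this hypothetical Python helper, a polynomial $a_0+a_1x+\dots+a_nx^n$ is stored as the list `[a0, a1, ..., an]` (an assumed representation). The Power Formula turns $a_kx^k$ into $ka_kx^{k-1}$, and linearity lets us do this term by term.

```python
def poly_derivative(coeffs):
    """Differentiate a0 + a1*x + ... + an*x^n given as [a0, a1, ..., an].

    The coefficient of x^(k-1) in the derivative is k*a_k; the constant
    term a0 disappears, so the list gets one entry shorter.
    """
    return [k * a for k, a in enumerate(coeffs)][1:]


quadratic = [5, -4, 3]                   # 3x^2 - 4x + 5
print(poly_derivative(quadratic))        # [-4, 6], i.e., 6x - 4

p = [0] * 78                             # x^77 + 5x^18 + 6x^3 - x^2 + 88
p[77], p[18], p[3], p[2], p[0] = 1, 5, 6, -1, 88
dp = poly_derivative(p)
print({power: c for power, c in enumerate(dp) if c != 0})
# {1: -2, 2: 18, 17: 90, 76: 77}, i.e., 77x^76 + 90x^17 + 18x^2 - 2x
```

The second polynomial is the one differentiated by hand in a later example; the degree drops from $77$ to $76$, as the theorem predicts.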

**Exercise.** Prove the theorem.

## 2 Differentiation over compositions: the Chain Rule

How does one express the derivative of the composition of two functions in terms of their derivatives?

**Example.** Treating functions as transformations suggests an easy answer.

- If the first transformation is a stretch by a factor of $2$, i.e., the derivative is $2$, and
- the second transformation is a stretch by a factor of $3$, i.e., the derivative is $3$, then
- the composition of the two transformations is a stretch by a factor of $3\cdot 2=6$, i.e., the derivative is $6$:

We *multiply* the derivatives. $\square$

**Example.** Let's confirm this idea with a very simple example. Consider two *linear polynomials*:
$$\begin{array}{llllll}
x&=qt&\Longrightarrow & \frac{\Delta x}{\Delta t}=\frac{dx}{dt}&=q\\
\quad\quad\circ&&&&\ \ \times\\
y&=mx&\Longrightarrow& \frac{\Delta y}{\Delta x}=\frac{dy}{dx}&=m\\
\hline
y&=m(qt)=mqt&\Longrightarrow& \frac{\Delta y}{\Delta t}=\frac{dy}{dt}&=m\cdot q&=\frac{\Delta x}{\Delta t}\cdot\frac{\Delta y}{\Delta x}=\frac{dx}{dt}\cdot\frac{dy}{dx}
\end{array}$$
We see their derivatives and, which is the same thing for linear polynomials, their difference quotients. In either case, we see how the increment of the intermediate variable, $\Delta x$ or $dx$, is “cancelled”:
$$\frac{\tiny{\Delta x}}{\Delta t}\cdot\frac{\Delta y}{\tiny{\Delta x}}=\frac{\Delta y}{\Delta t},\quad \frac{\tiny{dx}}{dt}\cdot\frac{dy}{\tiny{dx}}=\frac{dy}{dt}.$$

$\square$

**Example.** We pose the following problem. Suppose a car is driven through mountainous terrain. Its location and its speed, as seen on a map, are known. The *grade* of the road is also known. How fast is the car climbing?

We set up two functions, for the location and the altitude. Then their composition is what we are interested in:

The graph of the second function is literally the profile of the road.

We already know that if the location, $f$, depends on time continuously and the altitude, $g$, depends continuously on location, then the altitude depends on time continuously as well, $g\circ f$. We shall also see that the differentiability of both functions implies the differentiability of the composition.

However, let's first dispose of the “Naive Composition Rule”:
$$(g \circ f)' \neq g'\circ f'.$$
We carry out, again, a “unit analysis” to show that such a formula simply *cannot* be true. Suppose

- $t$ is time measured in $\text{hr}$,
- $x=f(t)$ is the location of the car as a function of time -- measured in $\text{mi}$,
- $y=g(x)$ is the altitude of the road as a function of (horizontal) location -- measured in $\text{ft}$, and
- $y=h(t)=g(f(t))$ is the altitude of the road as a function of time -- measured in $\text{ft}$.

Then,

- $f'(t)$ is the (horizontal) velocity of the car on the road -- measured in $\frac{\text{mi}}{\text{hr}}$, and
- $g'(x)$ is the rate of incline (slope) of the road -- measured in $\frac{\text{ft}}{\text{mi}}$, with the input still measured in $\text{mi}$.

It doesn't even matter now what $h'$ is measured in; just try to compose these two functions...
It is impossible because the units of the output of the former and the input of the latter don't match! However, this *is* possible:

- $f'(t)\cdot g'(x)$ is their product -- measured in $\frac{\text{mi}}{\text{hr}}\cdot \frac{\text{ft}}{\text{mi}}=\frac{\text{ft}}{\text{hr}}$; compare to
- $h'(t)$ is the rate of change of the altitude with respect to time, i.e., how fast the car is climbing -- measured in $\frac{\text{ft}}{\text{hr}}$.

Why does this make sense?

1. How fast you are climbing is proportional to your horizontal speed.
2. How fast you are climbing is proportional to the slope of the road.

$\square$

Thus, the derivative of the composition of two linear functions is the product of the two derivatives! Considering the fact that, as far as derivatives at a fixed point are concerned, *all functions are linear*, we have strong evidence in support of this conjecture.

Unfortunately, derivatives aren't fractions! But difference quotients are:
$$\frac{\Delta y}{\Delta x}\cdot\frac{\Delta x}{\Delta t}=\frac{\Delta y}{\Delta t}.$$
The only difference from the other rules we have considered is that there are *two* partitions and $f$ must map the partition for $t$ to the partition of $x$:

**Theorem (Chain Rule).** (A) The difference quotient of the composition of two functions is found as the product of the two difference quotients; i.e., for any function $x=f(t)$ defined at two adjacent nodes $t$ and $t+\Delta t$ of a partition and any function $y=g(x)$ defined at the two adjacent nodes $x=f(t)$ and $x+\Delta x=f(t+\Delta t)$ of a partition, the differences and the difference quotients (defined at the secondary nodes $c$ and $q=f(c)$ within these edges of the two partitions respectively) satisfy:
$$\Delta (g\circ f)(c)= \Delta g\, (q),$$
and, provided $\Delta x\ne 0$,
$$\frac{\Delta (g\circ f)}{\Delta t}(c)= \frac{\Delta g}{\Delta x}(q) \cdot \frac{\Delta f}{\Delta t}(c).$$
(B) The composition of a function differentiable at a point and a function differentiable at the image of that point is differentiable at that point and its derivative is found as a product of the two derivatives; specifically, if $x=f(t)$ is differentiable at $t=c$ and $y=g(x)$ is differentiable at $x=q=f(c)$, then we have:
$$\frac{d (g\circ f)}{dt}(c)= \frac{dg}{dx}(q) \cdot \frac{df}{dt}(c).$$

**Proof.** The formula for difference quotients is deduced as follows:
$$\begin{array}{lll}
\frac{\Delta (g\circ f)}{\Delta t}(c)&=\frac{(g\circ f)(t+\Delta t)-(g\circ f)(t)}{\Delta t}\\
&=\frac{g(f(t+\Delta t))-g(f(t))}{f(t+\Delta t)-f(t)}\frac{f(t+\Delta t)-f(t)}{\Delta t}\\
&=\frac{g(x+\Delta x)-g(x)}{\Delta x}\frac{f(t+\Delta t)-f(t)}{\Delta t}\\
&=\frac{\Delta g}{\Delta x}(q) \cdot \frac{\Delta f}{\Delta t}(c).
\end{array}$$
Now we are to take the limit of the formula, with $c=t$, as
$$\Delta t \to 0.$$
Since $x=f(t)$ is differentiable and, therefore, continuous, we conclude that we also have $\Delta x \to 0$. Therefore, we have:
$$\begin{array}{ccccc}
\ \frac{\Delta (g\circ f)}{\Delta t}(t) &=&\ \frac{\Delta g}{\Delta x}(f(t))&\cdot&\ \ \frac{\Delta f}{\Delta t}(t)\\
\quad \downarrow&&\quad \downarrow&&\quad \downarrow\\
\ \frac{d (g\circ f)}{dt}(t) & = &\ \frac{dg}{dx}(f(t))&\cdot&\ \ \frac{df}{dt}(t)
\end{array} $$
The idea seems to have worked out... The trouble is, we assumed that $\Delta x \neq 0$! What if $x=f(t)$ is constant in the vicinity of $t$? A complete proof will be provided later. $\blacksquare$

**Exercise.** Find another, non-constant, example of a function $x=f(t)$ such that $\Delta f$ may be zero even for small values of $\Delta t$.

The formula in the Lagrange notation is as follows: $$(g\circ f)'(t) = g'(f(t))\cdot f'(t).$$
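Here is a numerical sketch of both parts of the Chain Rule in Python, with the illustrative choices $x=f(t)=t^2+1$, $y=g(x)=\sin x$, and $t=0.7$. For each $\Delta t$, the difference quotient of $g\circ f$ equals the product of the two difference quotients (part (A), as long as $\Delta x\ne 0$), and as $\Delta t$ shrinks both approach $g'(f(t))\,f'(t)=\cos(t^2+1)\cdot 2t$ (part (B)). The helper name `chain_quotients` is hypothetical.

```python
import math


def f(t):
    """Inner function x = f(t) (an arbitrary example)."""
    return t ** 2 + 1


def g(x):
    """Outer function y = g(x) (an arbitrary example)."""
    return math.sin(x)


def chain_quotients(t, dt):
    """Return Δ(g∘f)/Δt and the product Δg/Δx · Δf/Δt for one step dt."""
    x0, x1 = f(t), f(t + dt)
    dx = x1 - x0                       # must be nonzero, as in part (A)
    composite = (g(x1) - g(x0)) / dt
    product = ((g(x1) - g(x0)) / dx) * (dx / dt)
    return composite, product


t = 0.7
exact = math.cos(f(t)) * 2 * t         # g'(f(t)) · f'(t) from part (B)
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    composite, product = chain_quotients(t, dt)
    print(f"dt={dt:g}: {composite:.6f} {product:.6f} exact={exact:.6f}")
```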

**Example.** Find the derivative of:
$$y = (1 + 2x)^{2}.$$
The function is computed in two consecutive steps (that's how we know this is a composition):

- step 1: from $x$ we compute $1+2x$, and then
- step 2: we square the outcome of the first step.

We then introduce an additional, disposable, variable in order to store the outcome of step 1: $$u=1+2x.$$ Then step 2 becomes: $$y=u^2.$$ This is our decomposition: $x \mapsto u \mapsto y$. Now the derivatives: $$\begin{array}{lllll} u & = 1 + 2x &\Longrightarrow&\frac{du}{dx} &= 2 \\ y & = u^{2} &\Longrightarrow&\frac{dy}{du}&= 2u \\ \text{CR } & &\Longrightarrow&\frac{dy}{dx} & = \frac{dy}{du}\cdot\frac{du}{dx} = 2u\cdot 2 = 4u. \end{array} $$

Done. But the answer must be in terms of $x$! Last step: substitute $u = 1 + 2x$. Then the answer is $4(1+2x)$. To verify, expand, $y=1 + 4x + 4x^{2}$, then use PF: $y'=4+8x=4(1+2x)$. $\square$

**Example.** Now a very simple example that doesn't allow us to circumvent CR. Let
$$y=\sqrt{3x+1}.$$
This is the abbreviated computation (decomposition, the derivatives, CR):
$$\begin{array}{llll}
x \mapsto u=3x+1 \mapsto y=\sqrt{u}\\
\underbrace{x \mapsto u=3x+1} \\
\qquad \frac{du}{dx} = 3 \\
\qquad\qquad\qquad\underbrace{u \mapsto y=\sqrt{u}}\\
\underbrace{ \qquad\qquad\qquad \frac{dy}{du}= \frac{1}{2\sqrt{u}} } \\
\frac{dy}{dx} = \frac{du}{dx}\cdot\frac{dy}{du} = 3\cdot \frac{1}{2\sqrt{u}}= 3\cdot \frac{1}{2\sqrt{3x+1}}.
\end{array} $$

$\square$

**Example.** Find the derivative of:
$$z = e^{\sqrt{3x+1}}$$
*Three* functions this time:
$$ x \mapsto u = 3x+1 \ \mapsto y = \sqrt{u} \ \mapsto z = e^{y}.$$
Fortunately, we already know the derivative of the exponent, $y=\sqrt{3x+1}$, from the last example. We just *append* one extra step to that solution:
$$\begin{array}{llll}
x \mapsto u=3x+1 \mapsto y=\sqrt{u} \mapsto z = e^{y}\\
\underbrace{x \mapsto u=3x+1} \\
\qquad \frac{du}{dx} = 3 \\
\qquad\qquad\qquad\underbrace{u \mapsto y=\sqrt{u}}\\
\underbrace{ \qquad\qquad\qquad \frac{dy}{du}= \frac{1}{2\sqrt{u}} } \\
\frac{dy}{dx} = \frac{du}{dx}\cdot\frac{dy}{du} = 3\cdot \frac{1}{2\sqrt{u}} \\
\qquad\qquad\qquad\qquad\qquad\qquad \underbrace{ y \mapsto z = e^{y} }\\
\underbrace{ \qquad\qquad\qquad\qquad\qquad\qquad \frac{dz}{dy}=e^y }\\
\frac{dz}{dx} = \left( \frac{du}{dx}\cdot\frac{dy}{du} \right) \cdot\frac{dz}{dy} =3\cdot \frac{1}{2\sqrt{u}}\cdot e^y=3\frac{1}{2\sqrt{3x+1}} e^{\sqrt{3x+1}}.
\end{array} $$
We have applied CR twice! $\square$

The *lesson* we have learned is: three functions -- three derivatives -- multiply them:
$$\begin{array}{rrrrr}
&x &\mapsto u&\mapsto y&\mapsto z \\
\frac{dz}{dx} & = \frac{du}{dx} &\cdot \frac{dy}{du} &\cdot \frac{dz}{dy}
\end{array}$$
These “fractions” appear to cancel again...
$$\frac{dz}{dx} = \frac{\not{du}}{dx} \cdot \frac{\not{dy}}{\not{du}} \cdot \frac{dz}{\not{dy}}.$$
This is the *Generalized Chain Rule* for the derivative of the composition (a “chain”!) of $n$ functions.
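The Generalized Chain Rule suggests a simple procedure: walk along the chain, evaluating each link and multiplying the derivatives as we go. Below is a minimal Python sketch, assuming each link is supplied as a pair (function, derivative); the helper name `chain_value_and_derivative` is hypothetical. It reproduces the example $z=e^{\sqrt{3x+1}}$ at $x=1$.

```python
import math

# Each link of the chain x -> u -> y -> z is a pair (function, its derivative).
chain = [
    (lambda x: 3 * x + 1, lambda x: 3.0),                        # u = 3x + 1
    (lambda u: math.sqrt(u), lambda u: 1 / (2 * math.sqrt(u))),  # y = sqrt(u)
    (lambda y: math.exp(y), lambda y: math.exp(y)),              # z = e^y
]


def chain_value_and_derivative(links, x):
    """Evaluate the composition at x and multiply the derivatives along the chain."""
    value, derivative = x, 1.0
    for func, deriv in links:
        derivative *= deriv(value)   # derivative of this link at its own input
        value = func(value)          # the output becomes the next link's input
    return value, derivative


x = 1.0
z, dz_dx = chain_value_and_derivative(chain, x)
by_hand = 3 / (2 * math.sqrt(3 * x + 1)) * math.exp(math.sqrt(3 * x + 1))
print(z, dz_dx, by_hand)   # dz_dx and by_hand are both 0.75 * e^2
```

Each derivative is evaluated at the output of the previous link, which is exactly the substitution $u=3x+1$, $y=\sqrt{u}$ carried out by hand in the example.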

The short version of the Chain Rule says:

*the derivative of the composition is the product of the derivatives*,

as functions.

**Example.** However, if we fix the location $x=a$, we can make sense of the derivative of the composition as the *composition of the derivatives*, after all. Indeed, suppose at point $a$ we have the derivative
$$\frac{dy}{dx}=m.$$
What if we think of $dx$ and $dy$ as two new variables -- related to each other by the above equation -- as we have done before?

Then we think of the derivative, $m$, not as a number but a linear function: $$dy=m\cdot dx.$$ If now there is another variable with $$\frac{dx}{dt}=q,$$ we think of $q$ as a linear function: $$dx=q\cdot dt.$$ Then, we have to substitute: $$\begin{array}{lllll} x=x(t)&=qt&\Longrightarrow& dx&=q\cdot dt\\ \quad\quad\circ&\quad\circ&&&\quad\quad\circ\\ y=y(x)&=mx&\Longrightarrow& dy&=m\cdot dx\\ \hline y=y(x(t))&=m(qt)&\Longleftrightarrow& dy&=m\cdot (q\cdot dt) \end{array}$$ We have the composition! $\square$
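The same bookkeeping can be sketched in Python by representing a derivative $m$ at a point as the linear function $d\mapsto m\cdot d$ (an illustrative encoding; the helper names `linear` and `compose` are hypothetical). Composing the maps for $dx=q\cdot dt$ and $dy=m\cdot dx$, with the stretch factors $q=2$ and $m=3$ from the first example, gives multiplication by $m\cdot q=6$.

```python
def linear(m):
    """A derivative m at a point, viewed as the linear function d -> m * d."""
    return lambda d: m * d


def compose(outer, inner):
    """The composition outer ∘ inner: apply inner first, then outer."""
    return lambda d: outer(inner(d))


q, m = 2.0, 3.0
dx_from_dt = linear(q)                     # dx = q * dt
dy_from_dx = linear(m)                     # dy = m * dx
dy_from_dt = compose(dy_from_dx, dx_from_dt)

for dt in (1.0, 0.5, -2.0):
    print(dt, dy_from_dt(dt), m * q * dt)  # the last two columns agree
```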

We can use the Chain Rule to find formulas for other important functions.

**Theorem.** For any $a>0$, we have:
$$\left( a^x\right)'=a^x\ln a.$$

**Proof.** We represent this exponential function in terms of the natural exponential function:
$$a^x=e^{\ln a^x}=e^{x\ln a}.$$
Then,
$$\left( a^x\right)'=\left( e^{x\ln a} \right)'\ \overset{\text{CR}}{=\! =\! =}\ e^{x\ln a} \cdot (x\ln a)'=a^x\cdot \ln a.$$
$\blacksquare$
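A quick numerical sanity check of the theorem, as a Python sketch: it compares the formula $a^x\ln a$ with a symmetric difference quotient $\frac{F(x+h)-F(x-h)}{2h}$. The symmetric quotient is a swapped-in approximation, not the one used in the text, and the values of $a$, $x$, and $h$ are arbitrary choices; `symmetric_quotient` is a hypothetical name.

```python
import math


def symmetric_quotient(F, x, h=1e-6):
    """Symmetric difference quotient (F(x+h) - F(x-h)) / (2h), approximating F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


x = 1.3
for a in (0.5, 2.0, 10.0):
    numeric = symmetric_quotient(lambda s: a ** s, x)
    formula = a ** x * math.log(a)          # (a^x)' = a^x ln a
    print(f"a = {a}: numeric {numeric:.8f}, formula {formula:.8f}")
```

For $a<1$ both numbers come out negative, since $\ln a<0$ and the exponential function is decreasing.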

**Exercise.** Use the idea from the proof above to find the derivative of $x^x$.

## 3 Differentiation over multiplication and division

What happens to the output function of differentiation as we perform such an algebraic operation as *multiplication* with the input functions?

We already know that if the width and the height ($f$ and $g$) of a rectangle are changing continuously then so is its area ($f\cdot g$):

We shall also see that the differentiability of both dimensions implies the differentiability of the area.

However, let's first make sure we avoid the so-called “Naive Product Rule”:
$$(f\cdot g)' \neq f'\cdot g'.$$
The formula is extrapolated from the Sum Rule, but it simply *cannot* be true. Let's recast the problem in terms of motion and take a good look at the *units*. Suppose

- $x$ is time measured in $\text{sec}$,
- $y=f(x)$ is the location of the first person -- measured in $\text{ft}$, and
- $y=g(x)$ is the location of the second person -- measured in $\text{ft}$.

Then

- $f'(x)$ is the velocity of the first person -- measured in $\frac{\text{ft}}{\text{sec}}$, and
- $g'(x)$ is the velocity of the second person -- measured in $\frac{\text{ft}}{\text{sec}}$.

Suppose they are running in two perpendicular directions (east and north), then

- $y=f(x)\cdot g(x)$ is the area of the rectangle enclosed by the two persons -- measured in $\text{ft}^2$.

Therefore,

- $\left( f(x)\cdot g(x) \right)'$ is the rate of change of the area -- measured in $\frac{\text{ft}^2}{\text{sec}}$.

Meanwhile,

- $f'(x)\cdot g'(x)$ is an unknown quantity -- measured in $\frac{\text{ft}}{\text{sec}}\cdot \frac{\text{ft}}{\text{sec}}=\frac{\text{ft}^2}{\text{sec}^2}$!

We do notice now that the product of the location and velocity gives the right units: $$f'f,\ g'g \text{ and also } f'g,\ g'f.$$ Which one(s)?

The correct idea -- *cross-multiplication* -- is illustrated below:

As the width and the height are increasing, so is the area of the rectangle. But the increase of the area cannot be expressed entirely in terms of the increases of the width and the height! This increase is split into two parts corresponding to the two terms in the right-hand side of the formula below.

**Theorem (Product Rule).** (A) The difference quotient of the product of two functions is found as a combination of these functions and their difference quotients. In other words, for any two functions $f,g$ defined at the adjacent nodes $x$ and $x+\Delta x$ of a partition, the differences and the difference quotients (defined at the corresponding secondary node $c$) satisfy:
$$\Delta (f \cdot g)(c)=f(x+\Delta x) \cdot \Delta g(c) + \Delta f(c) \cdot g(x),$$
and
$$\frac{\Delta (f\cdot g)}{\Delta x}(c)=f(x+\Delta x) \cdot \frac{\Delta g}{\Delta x}(c) + \frac{\Delta f}{\Delta x}(c) \cdot g(x).$$
(B) The product of two functions differentiable at a point is differentiable at that point and its derivative is found as a combination of these functions and their derivatives; specifically, given two functions $f,g$ differentiable at $x$, we have:
$$\frac{d (f\cdot g)}{dx}(x)=f(x) \cdot \frac{dg}{dx}(x) + \frac{df}{dx}(x) \cdot g(x).$$

**Proof.**
$$\begin{array}{lll}
\Delta (f \cdot g)(c)&=(f \cdot g)(x+\Delta x)- (f \cdot g)(x)\\
&=f(x+\Delta x) \cdot g(x+\Delta x)- f(x) \cdot g(x)\\
&=f(x+\Delta x) \cdot g(x+\Delta x)- f(x+\Delta x) \cdot g(x) +f(x+\Delta x) \cdot g(x)- f(x) \cdot g(x)\\
&=f(x+\Delta x) \cdot (g(x+\Delta x)- g(x)) +(f(x+\Delta x) - f(x)) \cdot g(x)\\
&=f(x+\Delta x) \cdot \Delta g(c) + \Delta f(c) \cdot g(x).
\end{array}$$
Now, the limit with $c=x$:
$$\begin{array}{llll}
\frac{\Delta (f \cdot g)}{\Delta x}(x)&=f(x+\Delta x) \cdot \frac{\Delta g}{\Delta x} (x)&+ \frac{\Delta f}{\Delta x}(x) \cdot g(x)\\
&\quad\quad \downarrow\quad\quad \quad\ \downarrow&\quad\ \downarrow\quad \quad \quad \\
&\quad\ f(x)\quad \quad \cdot\frac{d g}{d x}(x)&+\ \frac{d f}{d x}(x)\ \cdot g(x)&\text{ as } \Delta x\to 0.\\
\end{array}$$
The first limit is justified by the fact that $f$, as a differentiable function, is continuous. $\blacksquare$

In terms of motion, it is as if two runners are unfurling a flag while running east and north respectively.

The formula in the Lagrange notation is as follows: $$(f \cdot g)'(x) = f(x)\cdot g'(x) + f'(x)\cdot g(x).$$
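Part (A) is an exact algebraic identity, while part (B) is its limit. A short Python sketch, with the illustrative choices $f=e^x$, $g=\cos x$ at $x=1$, checks both; it also prints the “naive” product $f'(1)\,g'(1)$, which is visibly different. The helper name `product_rule_check` is hypothetical.

```python
import math


def product_rule_check(f, g, x, dx):
    """Return Δ(f·g)/Δx and f(x+Δx)·Δg/Δx + Δf/Δx·g(x) for one step dx."""
    dq_f = (f(x + dx) - f(x)) / dx
    dq_g = (g(x + dx) - g(x)) / dx
    lhs = (f(x + dx) * g(x + dx) - f(x) * g(x)) / dx
    rhs = f(x + dx) * dq_g + dq_f * g(x)
    return lhs, rhs


f, g = math.exp, math.cos
for dx in (0.5, 0.1, 0.001):
    print(dx, *product_rule_check(f, g, 1.0, dx))   # the two columns agree

exact = math.exp(1) * math.cos(1) - math.exp(1) * math.sin(1)   # f' g + f g' at x = 1
naive = math.exp(1) * (-math.sin(1))                            # f' g', not the derivative
print(exact, naive)
```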

**Example.** Let
$$y = xe^{x}. $$
Then,
$$ \begin{array}{lllll}
u & = x & \Longrightarrow &\frac{du}{dx} &= (x)' = 1, \\
v & = e^{x} & \Longrightarrow &\frac{dv}{dx} &= (e^{x})' = e^{x}.
\end{array} $$
Apply PR via “cross-multiplication”, the idea of which comes from the picture above:
$$\frac{dy}{dx} = x\cdot e^{x} + e^{x}\cdot 1 = e^{x}(x + 1).$$
$\square$

Next, what happens to the derivatives under *division*? We already know that if the height and the width ($f$ and $g$) of a right triangle are changing continuously, then so is the tangent of its base angle ($f/g$):

We shall also see that the differentiability of both dimensions implies the differentiability of the tangent.

However, let's first make sure we avoid the so-called “Naive Quotient Rule”:
$$(f/ g)' \neq f'/ g'.$$
We can repeat the “unit analysis” to show that such a formula simply *cannot* be true. The runners are still running in two perpendicular directions, and we have:

- $y=f(x)/ g(x)$ is unitless, and then
- $\left( f(x)/ g(x) \right)'$ is measured in $\frac{1}{\text{sec}}$, while
- $f'(x)/ g'(x)$ is unitless!

**Theorem (Quotient Rule).** (A) The difference quotient of the quotient of two functions is found as a combination of these functions and their difference quotients. In other words, for any two functions $f,g$ defined at the adjacent nodes $x$ and $x+\Delta x$ of a partition, the differences and the difference quotients (defined at the corresponding secondary node $c$) satisfy:
$$\Delta (f / g)(c)=\frac{\Delta f(c) \cdot g(x+\Delta x) - f(x+\Delta x) \cdot \Delta g(c)}{g(x)g(x+\Delta x)},$$
and
$$\frac{\Delta (f/ g)}{\Delta x}(c)=\frac{\frac{\Delta f}{\Delta x}(c) \cdot g(x+\Delta x) - f(x+\Delta x) \cdot \frac{\Delta g}{\Delta x}(c)}{g(x)g(x+\Delta x)},$$
provided $g(x),g(x+\Delta x) \ne 0$. (B) The quotient of two functions differentiable at a point is differentiable at that point and its derivative is found as a combination of these functions and their derivatives; specifically, given two functions $f,g$ differentiable at $x$, we have:
$$\frac{d (f/ g)}{dx}(x)=\frac{\frac{df}{dx}(x) \cdot g(x) - f(x) \cdot \frac{dg}{dx}(x)}{g(x)^2},$$
provided $g(x) \ne 0$.

**Proof.** We start with the case $f=1$. Then we have:
$$\begin{array}{lll}
\frac{\Delta (1/g)(x)}{\Delta x}&=\frac{\frac{1}{g(x+\Delta x)}- \frac{1}{g(x)}}{\Delta x}\\
&=\frac{g(x)- g(x+\Delta x)}{\Delta x g(x+\Delta x)g(x)} \\
&=-\frac{g(x+\Delta x)- g(x)}{\Delta x}\cdot \frac{1}{g(x+\Delta x)\cdot g(x)} \\
&=-\frac{\Delta g}{\Delta x}(c)\cdot \frac{1}{g(x+\Delta x)\cdot g(x)} &\text{ with }c=x\\
&\to -\frac{dg}{dx}(x)\cdot\frac{1}{g(x) \cdot g(x)}&\text{ as } \Delta x\to 0.
\end{array}$$
The limit of the second fraction is justified by the fact that $g$, as a differentiable function, is continuous. Alternatively, we represent the reciprocal of $g$ as a composition:
$$z=\frac{1}{g(x)}\ \Longrightarrow\ z=\frac{1}{y},\ y=g(x)\ \Longrightarrow\ \frac{dz}{dy}=-\frac{1}{y^2},\ \frac{dy}{dx}=g'(x)\ \Longrightarrow\ \frac{dz}{dx}=-\frac{1}{g(x)^2}g'(x),$$
by the *Chain Rule*. Now the general formula follows from the *Product Rule* applied to $f\cdot\frac{1}{g}$. $\blacksquare$
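The algebra can be double-checked numerically. This Python sketch (illustrative choices: $f=\sin$, $g=\cos$, so $f/g=\tan$, at $x=0.4$) compares the difference quotient of $f/g$ with the cross-multiplication form $\frac{\frac{\Delta f}{\Delta x}\,g(x+\Delta x)-f(x+\Delta x)\,\frac{\Delta g}{\Delta x}}{g(x)\,g(x+\Delta x)}$, which is what the reciprocal formula combined with part (A) of the Product Rule produces. As $\Delta x\to0$, both approach $\sec^2 0.4$. The helper name `quotient_rule_check` is hypothetical.

```python
import math


def quotient_rule_check(f, g, x, dx):
    """Compare Δ(f/g)/Δx with the cross-multiplication formula."""
    f0, f1 = f(x), f(x + dx)
    g0, g1 = g(x), g(x + dx)            # both assumed nonzero
    dq_f = (f1 - f0) / dx
    dq_g = (g1 - g0) / dx
    lhs = (f1 / g1 - f0 / g0) / dx
    rhs = (dq_f * g1 - f1 * dq_g) / (g0 * g1)
    return lhs, rhs


for dx in (0.3, 0.03, 0.003):
    print(dx, *quotient_rule_check(math.sin, math.cos, 0.4, dx))

# The limit for f/g = tan should be sec^2(0.4) = 1 / cos(0.4)^2.
print(1 / math.cos(0.4) ** 2)
```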

The formula is similar to the *Product Rule* in the sense that it also involves *cross-multiplication*.

The formula in the Lagrange notation is as follows: $$\left( \frac{f(x)}{g(x)} \right)' = \frac{f'(x)\cdot g(x) - f(x)\cdot g'(x)}{g(x)^2}.$$

**Example.** The tangent:
$$\begin{aligned}
(\tan x)' & = \left( \frac{\sin x}{\cos x} \right)'\\
& \overset{\text{QR}}{=} \frac{(\sin x)' \cos x - \sin x (\cos x)'}{(\cos x)^{2}} \\
& = \frac{\cos x \cos x - \sin x (-\sin x)}{\cos^{2} x} \\
& = \frac{\cos^{2}x + \sin^{2}x}{\cos^{2}x} \quad \text{...use the Pythagorean Theorem...} \\
& = \sec^{2}x.
\end{aligned} $$
$\square$

In the Leibniz notation, this is the form of the *Product Rule*:
$$\frac{d}{dx} \left(uv \right) = \dfrac{du}{dx}\cdot v + \dfrac{dv}{dx}\cdot u,$$
and the Quotient Rule:
$$\frac{d}{dx} \left(\frac{u}{v}\right) = \dfrac{\dfrac{du}{dx}\cdot v - \dfrac{dv}{dx}\cdot u}{v^{2}}.$$

More examples of differentiation...

**Example.** Find
$$(x^{2} + x^{3})' = \lim_{h \to 0} \frac{(x + h)^{2} +(x + h)^{3} - x^{2} - x^{3}}{h}=...$$
Seems like a lot of work... Instead use SR and PF:
$$\begin{array}{lllll}
(x^{2} + x^{3})' & = (x^{2})' + (x^{3})' \\
& = 2x +3x^{2}.
\end{array} $$
$\square$

**Example.** We can differentiate any polynomial easily now:
$$\begin{array}{lllll}
(x^{77} + & 5x^{18} + 6x^{3} - x^{2} + 88)'& \text{ ...try to expand } (x+h)^{77} !\\
& \overset{\text{SR}}{=} (x^{77})' + (5x^{18})' + (6x^{3})' - (x^{2})' + (88)' \\
& \overset{\text{CMR}}{=} (x^{77})' + 5(x^{18})' + 6(x^{3})' - (x^{2})' + 0 \\
& \overset{\text{PF}}{=} 77x^{77 - 1} + 5\cdot 18x^{18 - 1} + 6\cdot 3x^{3 - 1} - 2x^{2 - 1} \\
& = 77x^{76} + 90x^{17} + 18x^{2} - 2x.
\end{array}$$
$\square$

**Example.** Find
$$ \left( \frac{\sqrt{x}}{x^{2} + 1} \right)'.$$
Consider:
$$ \begin{array}{lllll}
u & = \sqrt{x} &\Longrightarrow &\frac{du}{dx} &= \frac{1}{2\sqrt{x}}, \\
v & = x^{2} + 1 &\Longrightarrow &\frac{dv}{dx} &= 2x.
\end{array} $$
Then,
$$\frac{d}{dx} \left( \frac{u}{v} \right) = \frac{ \dfrac{1}{2\sqrt{x}} (x^2 + 1) - \sqrt{x}\cdot 2x}{( x^2 + 1)^2}. $$
No need to simplify. $\square$

**Example.** This is a different kind of example. Evaluate:
$$\lim_{x \to 5} \frac{2^{x} - 32}{x - 5}. $$
It's just a limit. But we recognize that this is the derivative of some function. We compare the expression to the formula in the definition:
$$ f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}, $$
and match. So, we have here:
$$a = 5 ,\ f(x) = 2^{x}, \ f(5) = 2^{5} = 32.$$
Therefore, our limit is equal to $f'(5)$ for $f(x) = 2^{x}$. Compute:
$$f'(x) = (2^{x})' = 2^{x} \ln 2, $$
so
$$f'(5) = 2^{5} \ln 2 = 32 \ln 2.$$
$\square$

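This is another interpretation of the formulas. Let's represent the *Sum Rule*, the *Constant Multiple Rule*, and the *Chain Rule* as diagrams; in the last one, the derivatives are understood as linear functions at a fixed point, as in the example with $dy=m\cdot dx$ (the derivative of $f$ is taken at $g(x)$ and that of $g$ at $x$):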
$$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!}
\newcommand{\la}[1]{\!\!\!\!\!\xleftarrow{\quad#1\quad}\!\!\!\!\!}
\newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.}
\newcommand{\ua}[1]{\left\uparrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.}
\begin{array}{ccc}
f,g&\ra{\frac{d}{dx}}&f',g'\\
\ \da{+}&SR &\ \da{+}\\
f+g & \ra{\frac{d}{dx}}&(f+g)'=f'+g'
\end{array}\qquad
\begin{array}{ccc}
f&\ra{\frac{d}{dx}}&f'\\
\ \da{\cdot c}& CMR &\ \da{\cdot c}\\
cf & \ra{\frac{d}{dx}}&(cf)'=cf'
\end{array}\qquad
\begin{array}{ccc}
f,g&\ra{\frac{d}{dx}}&f',g'\\
\ \da{\circ}& CR &\ \da{\circ }\\
f\circ g & \ra{\frac{d}{dx}}&(f\circ g)'=f'\circ g'
\end{array}
$$
In the first diagram, we start with a pair of functions at the top left and then we proceed in two ways:

- right: differentiate, then down: add the results; or
- down: add them, then right: differentiate the result.

The result is the same! (Neither the Product Rule nor the Quotient Rule has such an interpretation.)
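The commuting diagrams can be tested on polynomials. In this Python sketch, polynomials are coefficient lists `[a0, a1, ..., an]` (an assumed representation, with hypothetical helpers `deriv`, `add`, `scale`, `mul`, and `trim`). Going around the Sum Rule and Constant Multiple Rule squares in either order gives the same result, while for multiplication only the Product Rule, not the naive product of the derivatives, matches.

```python
from itertools import zip_longest


def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def deriv(p):
    """Derivative of a0 + a1*x + ... + an*x^n given as [a0, ..., an]."""
    return trim([k * a for k, a in enumerate(p)][1:])


def add(p, r):
    return trim([a + b for a, b in zip_longest(p, r, fillvalue=0)])


def scale(c, p):
    return trim([c * a for a in p])


def mul(p, r):
    if not p or not r:          # the zero polynomial
        return []
    out = [0] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(r):
            out[i + j] += a * b
    return trim(out)


f = [1, 2, 0, 4]     # 4x^3 + 2x + 1
g = [0, -1, 3]       # 3x^2 - x

print(deriv(add(f, g)) == add(deriv(f), deriv(g)))                  # True  (SR)
print(deriv(scale(5, f)) == scale(5, deriv(f)))                     # True  (CMR)
print(deriv(mul(f, g)) == mul(deriv(f), deriv(g)))                  # False (naive rule)
print(deriv(mul(f, g)) == add(mul(f, deriv(g)), mul(deriv(f), g)))  # True  (PR)
```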

## 4 The rate of change of the rate of change

If a function is known at the nodes of a partition, its difference quotient is also a function -- known at the secondary nodes. Can we treat the latter as a function too? What is the partition then? We saw in the last chapter how this idea is implemented in order to derive the acceleration from the velocity.

What can we say about the rate of change of this change? If we know only *three* values of a function (first line), at the nodes of two adjacent intervals, we compute the difference quotients over these two intervals (second line) and place each result at the corresponding edge:
$$\begin{array}{cccccccc}
-&f(x_1)&---&f(x_2)&---&f(x_3)&-&\\
-&-\bullet-&\frac{\Delta f}{\Delta x_2}&-\bullet-&\frac{\Delta f}{\Delta x_3}&-\bullet-&-\\
-&-\bullet-&---&\frac{\frac{\Delta f}{\Delta x_3} -\frac{\Delta f}{\Delta x_2}}{c_3-c_2}&---&-\bullet-&-&\\
&x_1&c_2&x_2&c_3&x_3&\\
\end{array}$$
To find the change of this new function, we carry out the same operation and place the result in the middle (third line).

Let's review the construction of the difference quotient.

First, we have an *augmented partition* of an interval $[a,b]$. We partition it into $n$ intervals with the help of the nodes (the end-points of the intervals):
$$a=x_{0},\ x_{1},\ x_{2},\ ... ,\ x_{n-1},\ x_{n}=b;$$
and also provide secondary nodes:
$$ c_{1} \text{ in } [x_{0},x_{1}], \ c_{2} \text{ in } [x_{1},x_{2}],\ ... ,\ c_{n} \text{ in } [x_{n-1},x_{n}].$$

If a function $y=f(x)$ is defined at the nodes $x_k,\ k=0,1,2,...,n$, the difference quotient of $f$ is defined at the secondary nodes of the partition by: $$\frac{\Delta f}{\Delta x}(c_{k})=\frac{f(x_{k})-f(x_{k-1})}{x_{k}-x_{k-1}},\ k=1,2,...,n.$$

The function represents the slopes of the secant lines over the intervals of the partition. In particular, when the location is represented by a function known only at the nodes of the partition, this is how the velocity is found. It is now especially important that we have used the secondary nodes as the inputs of the new function. Indeed, we can now carry out a similar construction with this function and find the *acceleration*!

We now have a new *augmented partition*, of what? The interval is
$$[p,q],\ \text{ with } p=c_1 \text{ and } q=c_n.$$
We partition it into $n-1$ intervals with the help of the nodes that used to be the secondary nodes in the last partition:
$$p=c_{1},\ c_{2},\ c_{3},\ ... ,\ c_{n-1},\ c_{n}=q.$$
Then the increments are:
$$\Delta c_k=c_{k+1}-c_k.$$
Now, what are the secondary nodes? The primary nodes of the last partition of course! Indeed, we have:
$$ x_{1} \text{ in } [c_{1},c_{2}], \ x_{2} \text{ in } [c_{2},c_{3}],\ ... ,\ x_{n-1} \text{ in } [c_{n-1},c_{n}].$$

We apply the same construction, over this new partition, to the function $g=\frac{\Delta f}{\Delta x}$. The difference quotient of $g$ is defined at the secondary nodes of the new partition by: $$\frac{\Delta g}{\Delta x}(x_{k})=\frac{g(c_{k+1})-g(c_k)}{c_{k+1}-c_k},\ k=1,2,...,n-1.$$

**Definition.** The *second difference quotient* of $f$ is defined at the nodes of the partition (**denoted**) by:
$$\frac{\Delta^2 f}{\Delta x^2}(x_{k})=\frac{\frac{\Delta f}{\Delta x}(c_{k+1})-\frac{\Delta f}{\Delta x}(c_k)}{c_{k+1}-c_k},\ k=1,2,...,n-1.$$

Note that there are:

- $n+1$ values of $f$ (at the nodes),
- $n$ values of $\frac{\Delta f}{\Delta x}$ (at the secondary nodes), and
- $n-1$ values of $\frac{\Delta^2 f}{\Delta x^2}$ (at the nodes except $a$ and $b$).

We will often omit the subscripts for the simplified **notation**:
$$\frac{\Delta^2 f}{\Delta x^2}(x)=\frac{\frac{\Delta f}{\Delta x}(c+\Delta c)-\frac{\Delta f}{\Delta x}(c)}{\Delta c}.$$
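The construction can be sketched in Python as follows. This is a minimal sketch that assumes mid-point secondary nodes (any $c_k$ in $[x_{k-1},x_k]$ would do); the function names are hypothetical. It also confirms the counts $n+1$, $n$, and $n-1$:

```python
def first_diff_quot(xs, fs):
    """Secondary nodes c_k (mid-points here) and the quotients Δf/Δx(c_k), k = 1..n."""
    cs = [(xs[k - 1] + xs[k]) / 2 for k in range(1, len(xs))]
    qs = [(fs[k] - fs[k - 1]) / (xs[k] - xs[k - 1]) for k in range(1, len(xs))]
    return cs, qs

def second_diff_quot(xs, fs):
    """Δ²f/Δx²(x_k) at the interior nodes x_1..x_{n-1}."""
    cs, qs = first_diff_quot(xs, fs)
    # The old secondary nodes c_k are the new nodes;
    # the interior nodes x_k serve as the new secondary nodes.
    d2 = [(qs[k] - qs[k - 1]) / (cs[k] - cs[k - 1]) for k in range(1, len(cs))]
    return xs[1:-1], d2

xs = [0.0, 0.5, 1.0, 1.5, 2.0]      # n = 4 intervals, n + 1 = 5 nodes
fs = [x ** 3 for x in xs]            # f(x) = x^3
nodes, d2 = second_diff_quot(xs, fs)
print(len(fs), len(first_diff_quot(xs, fs)[1]), len(d2))   # 5 4 3
print(nodes, d2)                     # [0.5, 1.0, 1.5] [3.0, 6.0, 9.0]
```

For this cubic on a uniform partition, the values happen to coincide with $f' '(x)=6x$.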

Notice that a larger absolute value of the second difference quotient indicates a sharper bend, i.e., a larger *curvature*, of the graph of $y=f(x)$. As another way to see this, imagine yourself driving along a straight part of the road and seeing the trees ahead remain in place (no curvature); then, as you start to turn, the trees start to pass across your field of vision from right to left (curvature).

This construction will be repeatedly used for approximations and simulations. It will be followed, when necessary, by taking its limit.

Let's differentiate $\sin x$ for the second time. In the last chapter, we found its difference quotient over a mid-point partition with a single interval. This time we will need at least two intervals:

- three nodes $x$: $a-h$, $a$, and $a+h$, and
- two secondary nodes $c$: $a-h/2$ and $a+h/2$.

We use the two formulas for the difference quotients of $\sin x$ and $\cos x$ from the last chapter. We write the former for the two secondary nodes, but we re-write the latter for the partition with two nodes $a-h/2,\ a+h/2$ and a single secondary node $x=a$: $$\begin{array}{lllll} \frac{\Delta}{\Delta x}(\sin x)&=\frac{ \sin (h/2)}{h/2}\cdot\cos c,& \frac{\Delta }{\Delta x}(\cos x)=-\frac{ \sin (h/2)}{h/2}\cdot\sin a,\\ \end{array}$$ Therefore, we have at $a$: $$\begin{array}{lllll} \frac{\Delta^2}{\Delta x^2}(\sin x)&=\frac{\Delta }{\Delta x}\left(\frac{\Delta}{\Delta x}( \sin x)\right)(a)\\ &=\frac{\Delta}{\Delta x}\left(\frac{ \sin (h/2)}{h/2}\cdot\cos c\right)&\text{ ...by the first formula... }\\ &=\frac{ \sin (h/2)}{h/2}\frac{\Delta \cos}{\Delta x}(a)&\text{ ...by CMR... }\\ &=\frac{ \sin (h/2)}{h/2}\left(-\frac{ \sin (h/2)}{h/2}\cdot\sin a\right)&\text{ ...by the second formula }\\ &=-\left(\frac{ \sin (h/2)}{h/2}\right)^2\cdot\sin a. \end{array}$$

Similarly, we find: $$\frac{\Delta }{\Delta x}(\cos x)=-\frac{ \sin (h/2)}{h/2}\cdot\sin c\ \Longrightarrow\ \frac{\Delta^2}{\Delta x^2}(\cos x)=-\left(\frac{ \sin (h/2)}{h/2}\right)^2\cdot\cos a.$$
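Both formulas can be checked numerically. A sketch, assuming the three-node partition $a-h,\ a,\ a+h$ with secondary nodes $a\pm h/2$; the values of `a` and `h` are arbitrary:

```python
import math

def second_dq_three_nodes(f, a, h):
    """Δ²f/Δx² at a for nodes a-h, a, a+h and secondary nodes a-h/2, a+h/2."""
    q_left = (f(a) - f(a - h)) / h        # Δf/Δx at c = a - h/2
    q_right = (f(a + h) - f(a)) / h       # Δf/Δx at c = a + h/2
    return (q_right - q_left) / h         # the secondary nodes are h apart

a, h = 0.8, 0.1
K = math.sin(h / 2) / (h / 2)
print(second_dq_three_nodes(math.sin, a, h), -K ** 2 * math.sin(a))   # agree
print(second_dq_three_nodes(math.cos, a, h), -K ** 2 * math.cos(a))   # agree
```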

For the exponential function, we need a left-end partition with two intervals:

- three nodes $x$: $a-h$, $a$, and $a+h$, and
- two secondary nodes $c$: $a-h$ and $a$.

Then, we find at $a$: $$\frac{\Delta }{\Delta x}(e^x)=\frac{ e^h-1}{h}\cdot e^{c}\ \Longrightarrow\ \frac{\Delta^2}{\Delta x^2}(e^x)=\left(\frac{ e^h-1}{h}\right)^2\cdot e^{a-h}.$$
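The same kind of check for the exponential function, a sketch assuming the left-end partition above: the first-level quotients sit at $c=a-h$ and $c=a$, and these two secondary nodes are $h$ apart. The values of `a` and `h` are arbitrary:

```python
import math

a, h = 0.5, 0.2
q_at_a_minus_h = (math.exp(a) - math.exp(a - h)) / h    # Δ(e^x)/Δx at c = a - h
q_at_a = (math.exp(a + h) - math.exp(a)) / h            # Δ(e^x)/Δx at c = a
second = (q_at_a - q_at_a_minus_h) / h                  # the two c-nodes are h apart
predicted = ((math.exp(h) - 1) / h) ** 2 * math.exp(a - h)
print(second, predicted)                                # agree up to rounding
```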

## 5 Repeated differentiation

**Example.** Let's continue to differentiate the sine:
$$\begin{array}{llll}
(\sin x)' & = \cos x &\\
(\cos x)' & = -\sin x & \Longrightarrow &(\sin x)' ' &=-\sin x\\
(-\sin x)' & = -\cos x & \Longrightarrow &(\sin x)' ' ' &=-\cos x\\
(-\cos x)' & = \sin x & \Longrightarrow &(\sin x )' ' ' ' &= \sin x.
\end{array} $$
And we are back where we started, i.e., the differentiation process for this particular function is cyclic! $\square$

We use the following terminology and **notation** for the consecutive derivatives of function $f$:
$$\begin{array}{|l|l|l|l|}
\hline
\text{function } & f & f^{(0)}&\\
\text{first derivative } & f' & f^{(1)}&\frac{df}{dx}\\
\text{second derivative } & f' '=(f')' & f^{(2)}=\left(f^{(1)}\right)'&\frac{d^2f}{dx^2}=\frac{d}{dx}\left( \frac{df}{dx} \right)\\
\text{third derivative } & f' ' '=(f' ')'& f^{(3)}=\left(f^{(2)}\right)'&\frac{d^3f}{dx^3}=\frac{d}{dx}\left( \frac{d^2f}{dx^2} \right)\\
...&&...&...\\
n\text{th derivative } & & f^{(n)}=\left(f^{(n-1)}\right)'&\frac{d^nf}{dx^n}=\frac{d}{dx}\left( \frac{d^{n-1}f}{dx^{n-1}} \right)\\
...&&...&...\\
\hline
\end{array}$$

Thus, a given differentiable function may produce a *sequence of functions*:
$$
\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!}
\newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.}
%
\begin{array}{ccccccccccccccccc}
f & \mapsto & \begin{array}{|c|}\hline\quad \tfrac{d}{dx} \quad \\ \hline\end{array} & \mapsto & f' & \mapsto & \begin{array}{|c|}\hline\quad \tfrac{d}{dx} \quad \\ \hline\end{array} & \mapsto & f' '& \mapsto & ...& \mapsto & f^{(n)}& \mapsto &...
\end{array}$$
provided the outcome of each step is differentiable as well. In the abbreviated form the sequence is:
$$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!}
\newcommand{\la}[1]{\!\!\!\!\!\xleftarrow{\quad#1\quad}\!\!\!\!\!}
\newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.}
\newcommand{\ua}[1]{\left\uparrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.}\begin{array}{cccccccccccc}
f &\ra{\frac{d}{dx}} &f' &\ra{\frac{d}{dx}} & f' ' &\ra{\frac{d}{dx}} &...&\ra{\frac{d}{dx}} & f^{(n)} &\ra{\frac{d}{dx}} & ...
\end{array}$$

Note that, for a fixed $x$, the sequence of numbers:
$$f(x),\ f'(x),\ f' '(x),\ ...,\ f^{(n)}(x),\ ...$$
is just that, a *sequence*, a concept familiar from the last chapter. However, as the example of $\sin x$ shows, this sequence doesn't have to converge:
$$\left( \sin x \right)^{(n)}\Big|_{x=0},\ n=0,1,2,3,...\ \leadsto\ 0,1,0,-1,0,...$$
We will see in Part II that some “linear combinations” of the derivatives produce convergent sequences...

Let's try to compute as many consecutive derivatives as possible, or even all of them, for the functions below.

**Example.** The positive integer powers. The PF applies:
$$(x^{n})' = nx^{n-1}.$$
The power decreases by $1$ every time, so after $n$ steps we reach a constant:
$$(x^{n})^{(n)} = n(n-1)\cdots 2\cdot 1=n!.$$
Therefore,
$$(x^{n})^{ (n+1)} = 0.$$
Then, it stays $0$:
$$(x^{n})^{ (n+1)} = (x^{n})^{ (n+2)}=...=0.$$
The powers in the sequence of derivatives decrease to $0$, and after that the derivatives themselves are $0$.
$\square$
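The drop to $0$ is easy to watch if a polynomial is stored as its list of coefficients, with `coeffs[k]` the coefficient of $x^k$ (a representation chosen here for illustration). A minimal sketch applying PF term by term:

```python
def derivative(coeffs):
    """Derivative of sum(coeffs[k] * x**k): the term k * coeffs[k] moves to x**(k-1)."""
    return [k * coeffs[k] for k in range(1, len(coeffs))] or [0]

def nth_derivative(coeffs, n):
    """Apply derivative() n times."""
    for _ in range(n):
        coeffs = derivative(coeffs)
    return coeffs

x5 = [0, 0, 0, 0, 0, 1]          # the coefficients of x^5
print(nth_derivative(x5, 1))     # [0, 0, 0, 0, 5], i.e., 5x^4
print(nth_derivative(x5, 5))     # [120], the constant 5!
print(nth_derivative(x5, 6))     # [0]
print(nth_derivative(x5, 9))     # [0], it stays 0
```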

**Example.** The exponential function. Since
$$(e^{x})' = e^{x},$$
we have:
$$(e^{x})^{(n)} = e^{x}.$$
The function remains the same! The sequence of derivatives is constant. $\square$

**Example.** The trig functions. Same for both sine and cosine:
$$\begin{aligned}
(\sin x)^{(4n)} & = \sin x \\
(\cos x)^{(4n)} & = \cos x
\end{aligned}$$
The sequence of derivatives is cyclic for both functions. $\square$

**Example.** The negative integer powers. We apply PF again:
$$\begin{aligned}
(x^{-1})' & = -x^{-2}, \\
(-x^{-2})' & = 2x^{-3},\\
...
\end{aligned}$$
The power goes down by $1$ every time and, as a result, tends to $-\infty$. The sequence never reaches $0$. $\square$

**Exercise.** Show that the same happens with all non-integer powers.

Differentiation creates a certain dynamic among the functions: $$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\la}[1]{\!\!\!\!\!\xleftarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\lra}[1]{\xleftarrow{\quad\quad#1\quad}\!\to} \newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} \newcommand{\ua}[1]{\left\uparrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.}\begin{array}{|ccccccccc|} \hline &&&&&&&& \ \curvearrowleft^{\frac{d}{dx}}\\ x^n &\ra{\frac{d}{dx}} &nx^{n-1} &\ra{\frac{d}{dx}} &...&\ra{\frac{d}{dx}} & \text{constant} &\ra{\frac{d}{dx}} & 0\\ \hline \frac{1}{x} &\ra{\frac{d}{dx}} &-\frac{1}{x^2} &\ra{\frac{d}{dx}} & \frac{2}{x^3} &\ra{\frac{d}{dx}} & ...\\ \hline \sin x&\ra{\frac{d}{dx}}&\cos x\\ \ \ua{\frac{d}{dx}}& &\ \da{\frac{d}{dx}}\\ -\cos x & \la{\frac{d}{dx}}& -\sin x\\ \hline e^{-x}&\lra{\frac{d}{dx}}&-e^{-x}\\ \hline \ \curvearrowleft^{\frac{d}{dx}}\\ e^x \\ \hline \end{array}$$

Warning: In Part III, we will see that the function and its derivative are two animals of very different breeds. As a result, the dynamics discussed above will disappear in higher dimensions.

The repeated differentiation process may fail to continue at a point $a$ when the $(k-1)$st derivative is *not* differentiable there, i.e., when the following limit does not exist:
$$f^{(k)}(a)=\lim_{h\to 0} \frac{f^{(k-1)}(a+h)-f^{(k-1)}(a)}{h}.$$

**Definition.** A function $f$ is called *twice, thrice, ..., $n$ times differentiable* when, respectively, $f' '$, $f' ' '$, ..., $f^{(n)}$ exists. When $f^{(n)}$ exists for every $n$, we call the function *smooth*.

The functions that we have treated above are smooth inside their domains.

**Example.** This function is differentiable but not twice differentiable:
$$f(x)=\begin{cases}
-x^2&\text{ if } x<0;\\
x^2&\text{ if } x\ge 0.
\end{cases}$$
Its graph *looks* smooth:

There is no doubt in which direction a beam of light would bounce off such a surface. However, let's compute the derivatives. It is easy for $x\ne 0$ because near each such point only one formula applies: $$f'(x)=\begin{cases} -2x&\text{ if } x<0;\\ 2x&\text{ if } x> 0. \end{cases}$$ For the case of $x=0$, we consider the two one-sided limits: $$\lim_{h\to 0^-}\frac{f(0+h)-f(0)}{h}=\lim_{h\to 0^-}\frac{f(h)}{h}=\lim_{h\to 0^-}\frac{-h^2}{h}=\lim_{h\to 0^-}(-h)=0;$$ $$\lim_{h\to 0^+}\frac{f(0+h)-f(0)}{h}=\lim_{h\to 0^+}\frac{f(h)}{h}=\lim_{h\to 0^+}\frac{h^2}{h}=\lim_{h\to 0^+}h=0.$$ They match! Therefore, $$f'(0)=0.$$ We have discovered that $f'(x)=2|x|$. It's not differentiable at $0$: its one-sided difference quotients at $0$ are $-2$ and $2$! $\square$
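The last claim can be seen numerically: a sketch computing the one-sided difference quotients of $f'(x)=2|x|$ at $0$ for a few illustrative step sizes. They stay at $-2$ and $2$, so they cannot approach a common limit:

```python
def f_prime(x):
    return 2 * abs(x)    # f'(x) = 2|x|, as found above

def one_sided_quotients(g, a, h):
    """Difference quotients of g at a from the left and from the right, step h > 0."""
    left = (g(a - h) - g(a)) / (-h)
    right = (g(a + h) - g(a)) / h
    return left, right

for h in [0.1, 0.01, 0.001]:
    print(h, one_sided_quotients(f_prime, 0.0, h))   # (-2.0, 2.0) every time
```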

**Example.** More examples of this kind, with each function extended by the value $0$ at $x=0$:

- $\sin\frac{1}{x}$ is discontinuous at $x=0$;
- $x\sin\frac{1}{x}$ is continuous at $x=0$ but not differentiable;
- $x^2\sin\frac{1}{x}$ is differentiable at $x=0$ but not twice differentiable.

$\square$

**Exercise.** Prove the above statements.

Below we visualize the relation between these classes of functions:

What is the geometric meaning of these *higher derivatives* for a given function?

Let's consider the first derivative. It represents the slopes of the function. Then the second derivative represents the rate of change of these slopes. Notice how changing slopes are seen as rotating tangents:

Specifically, we see:

- decreasing slopes = tangents rotate clockwise;
- increasing slopes = tangents rotate counter-clockwise.

This matches our convention from trigonometry that counter-clockwise is the positive direction for rotations.

Even though the functions we typically deal with have an $n$th derivative for every positive integer $n$, only the first two reveal something visible about the graph of the original function.

Above we compared:

- the *shapes* of the patches of the graph of the function $f$ to the signs of the *values* of the first derivative $f'$; and
- the *shapes* of the patches of the graph of the function $f$ to the signs of the *values* of the second derivative $f' '$.

There are three main levels of analysis of a function:

- *Analysis at level* $0$: the values of $f$. We ask, how large? The findings are about the values, $x$- and $y$-intercepts, asymptotes and other large-scale behavior, periodicity, etc.
- *Analysis at level* $1$: the slopes of $f$. We ask, up or down? The findings are about the angles, increasing/decreasing behavior, critical points, etc.
- *Analysis at level* $2$: the rate of change of the slopes of $f$. We ask, concave up or down? The findings are about the change of steepness, concavity, telling a maximum from a minimum, etc.

We can go on and continue to discover more and more subtle but less and less significant properties of the function...

This three-level analysis also applies to our study of motion, below.

The derivative of the velocity and, therefore, the second derivative of the position, is called the *acceleration*. The concept allows one to add another level of analysis of motion:

- Analysis at level $0$: the location, where?
- Analysis at level $1$: the velocity, how fast? forward or back?
- Analysis at level $2$: the acceleration, how large is the force?

Suppose $t$ is time and $y$ is the vertical dimension, the height. Consider the specific case of *free fall*. These are the initial conditions:

- $y_0$ is the initial height, $y_0=y\Big|_{t=0}$, and
- $v_y$ is the initial vertical component of velocity, $\frac{dy}{dt}\Big|_{t=0}$.

Then, with $g$ the acceleration due to gravity, we have: $$y=y_0+v_yt-\tfrac{1}{2}gt^2\ \Longrightarrow\ \frac{dy}{dt}=v_y-gt\ \Longrightarrow\ \frac{d^2y}{dt^2}=-g.$$ Now, from the point of view of physics, the derivation should go in the opposite direction:

- when there is no force, the velocity is constant;
- when the force is constant, the velocity is linear in time, etc.
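Going in the first direction can be simulated with difference quotients. For a quadratic position, the second difference quotient over a uniform partition equals $-g$ exactly (up to rounding), not just in the limit. A sketch with illustrative values $y_0=100$, $v_y=20$, $g=9.8$ (metric units assumed) and mid-point secondary nodes:

```python
y0, vy, g = 100.0, 20.0, 9.8      # illustrative initial height, initial velocity, gravity

def height(t):
    return y0 + vy * t - 0.5 * g * t ** 2

h = 0.25
ts = [k * h for k in range(9)]                     # uniform partition of [0, 2]
ys = [height(t) for t in ts]                       # location at the nodes
velocities = [(ys[k] - ys[k - 1]) / h for k in range(1, len(ts))]              # at mid-points
accelerations = [(velocities[k] - velocities[k - 1]) / h for k in range(1, len(velocities))]
print(velocities)       # each is g*h = 2.45 less than the previous one
print(accelerations)    # each equals -g = -9.8, up to rounding
```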

However, at this point we are still unable to answer these questions:

- How do we know that only constant functions, and no others, have zero derivatives?
- How do we know that only linear functions, and no others, have constant derivatives?
- How do we know that only quadratic functions, and no others, have linear derivatives?

This reversed process is called *antidifferentiation*. So far, we cannot justify even the simplest conclusion:
$$f'=0 \Longrightarrow f=c,\ \text{ for some real number }c.$$
We will study these and related questions in the next chapter.

## 6 Change of variables and the derivative

If the distance is measured in *miles* and time in *hours*, the velocity is measured in *miles per hour*. If the distance is measured in *kilometers* and time in *minutes*, the velocity is measured in *kilometers per minute*. In either case, we are dealing with the same functions just measured in different units. If the two distance functions match, do the velocity functions too?

Let's recall that we can interpret every composition as a change of variables. We are especially interested in a change of *units* because we often measure quantities in multiple ways:

- length and distance: inches, miles, kilometers, light years;
- time: minutes, seconds, hours, years;
- weight: pounds, kilograms, carats;
- temperature: degrees Celsius, degrees Fahrenheit;
- etc.

How does such a change affect calculus as we know it?

If $$y=f(x)$$ is a relation between two quantities $x$ and $y$, then either one may be replaced with a new variable. Let's call them $t$ and $z$ respectively and suppose these replacements are given by some functions:

- case 1: $x=g(t)$;
- case 2: $z=h(y)$.

These substitutions create new relations:

- case 1: $y=k(t)=f(g(t))$;
- case 2: $z=k(x)=h(f(x))$.

The *Chain Rule* gives us the rate of change for each pair:

- case 1:

$$\frac{dk}{dt}=\frac{df}{dx}\frac{dg}{dt};$$

- case 2:

$$\frac{dk}{dx}=\frac{dh}{dy}\frac{df}{dx}.$$

Most often, the conversion formula of a change of units is *linear*.

This is for Case 1.

**Theorem (Linear Chain Rule I).** If
$$g(t)=mt+b$$
and $y=f(x)$ is differentiable, then the derivative of $y=k(t)=f(g(t))$ is given by:
$$k'(t)=mf'(mt+b).$$

**Example.** What if $x$ is time and we change the moment from which we start measuring time, e.g., for “daylight saving time”? We have:
$$g(t)=t+t_0\ \Longrightarrow\ k'(t)=f'(t+t_0).$$
$\square$

**Example.** Suppose $x$ is time and $y$ is the location, then function $g$ may represent the change of units of time, such as to seconds, $x$, from minutes, $t$:
$$x=g(t)=60t.$$
Then, changing the units changes little about our calculus:

- if $f$ is the location as a function of seconds, $k$ is the location as a function of minutes, and $k(t)=f(60t)$;
- also $f'$ is the velocity as a function of seconds, $k'$ is the velocity as a function of minutes, and $k'(t)=60f'(60t)$;
- also $f' '$ is the acceleration as a function of seconds, then $k' '$ is the acceleration as a function of minutes, and $k' '(t)=60^2f' '(60t)$.

Thus, the graphs of the new quantities describing motion are simply *re-scaled* versions of the graphs of the old ones. $\square$
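A numerical check of $k'(t)=60f'(60t)$. This sketch assumes the sample location function $f(x)=\sin x$ (purely illustrative) and uses a symmetric difference quotient with a small step as a stand-in for $k'$:

```python
import math

def f(x):                  # location as a function of seconds (illustrative)
    return math.sin(x)

def f_prime(x):            # its exact derivative, the velocity per second
    return math.cos(x)

def k(t):                  # the same location as a function of minutes
    return f(60 * t)

def sym_dq(func, t, h=1e-6):
    """Symmetric difference quotient, an approximation of func'(t)."""
    return (func(t + h) - func(t - h)) / (2 * h)

t = 0.01                   # 0.01 minutes, i.e., 0.6 seconds
print(sym_dq(k, t), 60 * f_prime(60 * t))   # nearly equal
```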

This is for Case 2.

**Theorem (Linear Chain Rule II).** If
$$h(y)=my+b,$$
and $y=f(x)$ is differentiable, then the derivative of $z=k(x)=h(f(x))$ is given by:
$$k'(x)=mf'(x).$$

**Example.** What if $y$ is the location and we change the place from which we start measuring, e.g., the Greenwich meridian? We have:
$$h(y)=y+y_0\ \Longrightarrow\ k'(x)=f'(x).$$
We can also change the direction of the $y$-axis:
$$h(y)=-y\ \Longrightarrow\ k'(x)=-f'(x).$$
$\square$

**Example.** Suppose $x$ is time and $y$ is the location; then function $h$ may represent the change of units of length, such as from miles, $y$, to kilometers, $z$:
$$z=h(y)=1.6y.$$
Then, the change of the units will change very little about the calculus that we have developed; the coefficient, $m=1.6$, is the only adjustment necessary. Furthermore,

- if $f$ is the location in miles, then $k$ is the location in kilometers: $k(x)=1.6f(x)$;
- also $f'$ is the velocity in miles per unit of time, $k'$ is the velocity in kilometers per unit of time, and $k'(x)=1.6f'(x)$;
- also $f' '$ is the acceleration in miles per unit of time squared, $k' '$ is the acceleration in kilometers per unit of time squared, and $k' '(x)=1.6f' '(x)$.

Thus, the quantities describing motion are simply replaced with their *multiples*. The new graphs are the vertically stretched versions of the old ones. $\square$

**Example.** Recall the example in which a function $f$ that records the temperature, in Fahrenheit, as a function of time, in minutes, is replaced with a function $g$ that records the temperature in Celsius as a function of time in seconds:

- $s$ time in seconds;
- $m$ time in minutes;
- $F$ temperature in Fahrenheit;
- $C$ temperature in Celsius.

The conversion formulas are: $$m=s/60,$$ and $$C=(F-32)/1.8.$$

These are the relations between the four quantities:
$$g:\quad s \xrightarrow{\quad s/60 \quad} m \xrightarrow{\quad f\quad} F \xrightarrow{\quad (F-32)/1.8\quad} C.$$
And this is the new function:
$$C=g(s)=(f(s/60)-32)/1.8.$$
Then, by the *Chain Rule*, we have:
$$\frac{dC}{ds}=\frac{dC}{dF}\frac{dF}{dm}\frac{dm}{ds}=\frac{1}{1.8}\cdot f'(m)\cdot \frac{1}{60},\ \text{ where } m=s/60.$$
$\square$
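A numerical check of this rate, $\frac{1}{1.8}\cdot f'(s/60)\cdot\frac{1}{60}$. The Fahrenheit record $f(m)=50+0.1m^2$ below is invented for illustration, not data:

```python
def f(m):                       # temperature in Fahrenheit vs. time in minutes (invented)
    return 50 + 0.1 * m ** 2

def f_prime(m):
    return 0.2 * m

def g(s):                       # temperature in Celsius vs. time in seconds
    return (f(s / 60) - 32) / 1.8

def sym_dq(func, x, h=1e-3):
    """Symmetric difference quotient, an approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

s = 600.0                        # 10 minutes
chain_rule = (1 / 1.8) * f_prime(s / 60) * (1 / 60)
print(sym_dq(g, s), chain_rule)  # nearly equal
```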

**Exercise.** Provide a similar analysis for the sizes of shoes and clothing.

**Example.** The conversion of the number of degrees $y$ to the number of radians $x$ is:
$$x=\frac{\pi}{180}y.$$
Then,
$$\frac{dx}{dy}=\frac{\pi}{180}.$$
Therefore, the trigonometric differentiation formulas, such as $\left( \sin x \right)'=\cos x$, don't hold for angles measured in degrees! Indeed, let's denote *sine and cosine for degrees* by $\sin_dy$ and $\cos_dy$ respectively:
$$\sin_dy=\sin \left( \frac{\pi}{180}y \right) \text{ and } \cos_dy=\cos \left( \frac{\pi}{180}y \right).$$
Then,
$$\begin{array}{lll}
\frac{d}{dy}\sin_d y&=\frac{d}{dy}\sin \left( \frac{\pi}{180}y \right)\\
&=\frac{\pi}{180}\cos \left( \frac{\pi}{180}y \right)\\
&=\frac{\pi}{180}\cos_dy.
\end{array}$$
$\square$
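The extra factor $\frac{\pi}{180}$ shows up numerically as well. A sketch with hypothetical helpers `sin_d` and `cos_d`, comparing a symmetric difference quotient of $\sin_d$ at $y=30$ with both candidate formulas:

```python
import math

def sin_d(y):                  # sine of an angle measured in degrees
    return math.sin(math.pi / 180 * y)

def cos_d(y):                  # cosine of an angle measured in degrees
    return math.cos(math.pi / 180 * y)

y, h = 30.0, 1e-4
slope = (sin_d(y + h) - sin_d(y - h)) / (2 * h)    # symmetric difference quotient
print(slope, math.pi / 180 * cos_d(y), cos_d(y))   # matches the second, not the third
```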

**Example.** What if we are to change our unit to a *logarithmic scale*? For example,
$$x=10^t.$$
Then, for any function $y=f(x)$, we have by the Chain Rule:
$$\frac{dy}{dt}=\frac{dy}{dx}\Bigg|_{x=10^t}\cdot \left( 10^t \right)'=\frac{dy}{dx}\Bigg|_{x=10^t}10^t\ln 10.$$
Unlike with a linear change of units, the derivative is not just multiplied by a constant: the factor $10^t\ln 10$ depends on $t$! $\square$

This is the summary of the Chain Rule: $$ \newcommand{\ra}[1]{\!\!\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} \newcommand{\la}[1]{\!\!\!\!\!\!\!\xleftarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\ua}[1]{\left\uparrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} \begin{array}{ccccccccc} & f(g(x)) &\ra{\frac{d}{dx}} &f'(g(x))g'(x) \\ \small\text{substitution }&\quad \da{u=g(x)} & &\ \ \ua{CR} \\ & f(u) &\ra{\frac{d}{du}} &f'(u) \end{array}$$ The method allows us to get from left to right at the top (differentiation with respect to $x$) by taking a detour. We follow the path around the square: substitution, differentiation with respect to $u$, the Chain Rule formula with back-substitution.

## 7 Implicit differentiation and related rates

We differentiate functions; can we differentiate *relations*?

Recall from Chapter 1 that relations are represented by equations, but not of the kind we are used to. The familiar kind is an equation of numbers:
$$\underbrace{x^{2}}_{\text{a number}} - \underbrace{1}_{\text{a number}}=0\quad \leadsto \text{ find a particular number } x.$$
After the substitution, the equation should be true. The equations we are interested in are equations of *functions*, such as the familiar equation of the circle:
$$\underbrace{x^2}_{\text{a function}}+\underbrace{y^{2}}_{\text{a composition of two functions}} =1 \quad \leadsto \text{ find a particular function } y=y(x).$$
After the substitution, the equation should be true for *all* $x$.

The equation *implicitly* defines this function. As we have done in the past, we can make the function $y=y(x)$ *explicit* by solving the equation for $y$:
$$y = \sqrt{1 - x^{2}} \text{ or } y=-\sqrt{1 - x^{2}}.$$

However, what if we want only the *rate of change* of this, unknown, function?

We will rely on the following fact: if two functions are equal at all nodes $x$ of a partition, then their difference quotients are equal at all secondary nodes $c$: $$f(x)=g(x) \text{ for all } x\ \Longrightarrow\ \frac{\Delta f}{\Delta x}(c)=\frac{\Delta g}{\Delta x}(c) \text{ for all } c.$$

**Example (circle).** Find the *secant line* through the two points on the circle of radius $1$ centered at $0$:
$$(0,1) \text{ and } \left( \tfrac{\sqrt{2}}{2},\tfrac{\sqrt{2}}{2} \right).$$

So far, a curve has typically been the graph of a function, such as $y = x^{2}$ or $y = \sin x$, given explicitly. This time the equation is: $$x^{2} + y^{2} = 1.$$ To find the slope of the secant line, we need the difference quotient of the function, but there is no explicit function!

The idea is to consider the above equation as a relation between the two variables. In fact, we think of $y=y(x)$ as a function of $x$, i.e.: $$ x^{2} + y(x)^{2} = 1.$$ We will also assume that

- the two $x$-values $x_0 =0$ and $x_1=\frac{\sqrt{2}}{2}$ are nodes of a partition of the $x$-axis, and
- the two $y$-values $y_0 =1$ and $y_1= \frac{\sqrt{2}}{2}$ are nodes of a partition of the $y$-axis.

We apply the *Chain Rule* to both sides of the equation:
$$\begin{array}{rll}
\frac{\Delta }{\Delta x} \left( x^{2} + y^{2} \right) & = \frac{\Delta }{\Delta x} (1) &\Longrightarrow\\
\frac{\Delta }{\Delta x} x^{2} + \frac{\Delta }{\Delta x}y^{2} &= 0 &\Longrightarrow\\
(x_0+x_1) + (y_0+y_1) \frac{\Delta y}{\Delta x} &= 0 &\Longrightarrow\\
\frac{\Delta y}{\Delta x} &= -\frac{x_0+x_1}{y_0+y_1} &\text{ for } y_0+y_1\ne 0.
\end{array}$$
We have found a formula for the difference quotient but it is still implicit -- because we don't have a formula for $y=y(x)$. Fortunately, we don't need the whole function, just those two points on its graph. We substitute these into the formula above to find:
$$\frac{\Delta y}{\Delta x}= -\frac{0+\frac{\sqrt{2}}{2}}{1+\frac{\sqrt{2}}{2}}= -\frac{\sqrt{2}}{2+\sqrt{2}}=1-\sqrt{2}.$$
Finally, from the *point-slope formula* we obtain the answer:
$$y - \frac{\sqrt{2}}{2} = \left(1-\sqrt{2}\right)\left( x - \frac{\sqrt{2}}{2}\right). $$
We can automate this formula and find more secant lines:

$\square$
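One way to automate the formula, a sketch assuming a hypothetical helper `secant_slope` that applies $-\frac{x_0+x_1}{y_0+y_1}$ to two points of the unit circle; the result is compared with the slope computed directly from the two points:

```python
import math

def secant_slope(p0, p1):
    """Δy/Δx = -(x0 + x1)/(y0 + y1) for two points on x^2 + y^2 = 1, y0 + y1 != 0."""
    (x0, y0), (x1, y1) = p0, p1
    return -(x0 + x1) / (y0 + y1)

p0 = (0.0, 1.0)
p1 = (math.sqrt(2) / 2, math.sqrt(2) / 2)
direct = (p1[1] - p0[1]) / (p1[0] - p0[0])
print(secant_slope(p0, p1), direct, 1 - math.sqrt(2))   # all three agree

# more secant lines from (0, 1) to other points of the circle
for angle in [math.pi / 3, math.pi / 6, 0.0]:
    p = (math.cos(angle), math.sin(angle))
    print(p, secant_slope(p0, p))
```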

What about the *derivative*? We will rely on the following fact: if the values of two functions are equal for all $x$ then so are the values of their derivatives:
$$f(x)=g(x) \text{ for all } x\ \Longrightarrow\ f'(x)=g'(x) \text{ for all } x.$$
We can put it simply as: if two functions are equal then so are their derivatives; i.e.,
$$\begin{array}{|c|}\hline\quad f=g \ \Longrightarrow\ f'=g' \quad \\ \hline\end{array}$$

Differentiating an equation of functions and finding the derivative of a function defined by this equation is called *implicit differentiation*.

Let's consider two examples of how this idea may help us with finding tangents to implicit curves.

**Example (circle).** Find the *tangent line* for the circle of radius $1$ centered at $0$ at the point $\left( \tfrac{\sqrt{2}}{2},\tfrac{\sqrt{2}}{2} \right)$.

So far, a curve has typically been the graph of a function, such as $y = x^{2}$ or $y = \sin x$, given explicitly. This time the equation is: $$x^{2} + y^{2} = 1.$$ To find the slope of the tangent line, we need the derivative, but there is no function to differentiate!

Our approach is to *differentiate the equation* above as a relation between the two variables. As we differentiate, we think of $y=y(x)$ as a function of $x$, i.e.:
$$ x^{2} + y(x)^{2} = 1.$$
This is the result, via the Chain Rule:
$$\begin{array}{rll}
\frac{d}{dx} \left( x^{2} + y^{2} \right) & = \frac{d}{dx} (1) &\Longrightarrow\\
\frac{d}{dx} x^{2} + \frac{d}{dx}y^{2} &= 0 &\Longrightarrow\\
2x + 2y \frac{dy}{dx} &= 0 &\Longrightarrow\\
\frac{dy}{dx} &= -\frac{x}{y} &\text{ for } y\ne 0.
\end{array}$$

We have found a formula for the derivative, but it is still implicit -- because we don't have a formula for $y=y(x)$. Fortunately, we don't need the whole function, just a single point on its graph: $$ x = \frac{\sqrt{2}}{2},\ y = \frac{\sqrt{2}}{2} $$ We substitute these into the formula above to find: $$\frac{dy}{dx}\Bigg|_{x = \frac{\sqrt{2}}{2},\ y = \frac{\sqrt{2}}{2}}= -\frac{x}{y}\Bigg|_{x = \frac{\sqrt{2}}{2},\ y = \frac{\sqrt{2}}{2}}= -1.$$ Finally, from the point-slope formula we obtain the answer: $$y - \frac{\sqrt{2}}{2} = -1\left( x - \frac{\sqrt{2}}{2}\right). $$

Note that we could use the explicit formula $y = \sqrt{1 - x^{2}}$ with the same result: $$\frac{dy}{dx} \overset{\text{CR}}{=} \frac{-2x}{2\sqrt{1 - x^{2}}} = -\frac{x}{\sqrt{1 - x^{2}}},$$ which is $-1$ at $x = \frac{\sqrt{2}}{2}$. However, this formula is explicit only for the upper half of the circle. For a point below the $x$-axis, we'd need to start over and use the other formula, $y = -\sqrt{1 - x^{2}}$.

Observe also that the derivative $\frac{dy}{dx}$ is undefined at $x= \pm 1$ (implicit or explicit) because the denominator is $0$. How do we find the tangent there? From the equation, we can proceed in two directions:
$$x^{2} + y^{2} = 1 \leadsto
\begin{cases}
y \text{ depends on } x,\\
x \text{ depends on } y.
\end{cases} $$
Then, we can try implicit differentiation of the same equation -- but with respect to $y$ this time. The computation is very similar, and the result is:
$$\frac{dx}{dy} = -\frac{y}{x}.$$
The formula *is* defined for $y = 0$, i.e., at the points $(-1,0)$ and $(1, 0)$, where $\frac{dx}{dy} = 0$. Therefore, the tangent line at $(1,0)$ is $x - 1 = 0 (y-0)$, or $x = 1$; similarly, the tangent line at $(-1,0)$ is $x=-1$. $\square$

**Example (Folium of Descartes).** This curve is given by:
$$x^{3} + y^{3} = 6xy.$$

We differentiate the equation as before:
$$\frac{d}{dx} \left( x^{3} + y^{3} \right) = \frac{d}{dx} (6xy). $$
Using *CR*, we notice that every time we differentiate a term containing $y$, the factor $\frac{dy}{dx}$ appears:
$$\begin{array}{rll}
\frac{d}{dx} (x^{3}) + \frac{d}{dx} (y^{3}) & = 6\frac{d}{dx} (xy) \\
3x^{2} + 3y^{2}\cdot \frac{dy}{dx} &= 6 \left(y + x\frac{dy}{dx} \right) .
\end{array}$$
Solve for $\frac{dy}{dx}$.
$$\begin{array}{rll}
3x^{2} + 3y^{2} \frac{dy}{dx} & = 6y + 6x \frac{dy}{dx} \\
(3y^{2} - 6x) \frac{dy}{dx} & = 6y - 3x^{2} \\
\frac{dy}{dx} & = \underbrace{\frac{6y - 3x^{2}}{3y^{2} - 6x}}_{\text{Fails at } (0,0)!}
\end{array}$$
The end result is: if we know the location $(x, y)$, we know the slope of the tangent at that point. For example, at the tip of the curve, the point $(3,3)$, we have $x=y$. Therefore, the slope is $\frac{dy}{dx}=-1$. $\square$
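A numerical check of the slope formula. This sketch relies on the standard parametrization $x=\frac{6t}{1+t^3},\ y=\frac{6t^2}{1+t^3}$ of this curve (not used in the text above; it satisfies $x^3+y^3=6xy$), whose slope $\frac{dy/dt}{dx/dt}$ should agree with $\frac{6y-3x^2}{3y^2-6x}$ away from the points where that denominator vanishes:

```python
def point(t):
    """A point of x^3 + y^3 = 6xy, for t != -1."""
    return 6 * t / (1 + t ** 3), 6 * t ** 2 / (1 + t ** 3)

def implicit_slope(x, y):
    """The slope found by implicit differentiation."""
    return (6 * y - 3 * x ** 2) / (3 * y ** 2 - 6 * x)

def parametric_slope(t, h=1e-6):
    """Approximately (dy/dt) / (dx/dt), via symmetric differences in t."""
    (xa, ya), (xb, yb) = point(t - h), point(t + h)
    return (yb - ya) / (xb - xa)

for t in [0.5, 1.0, 2.0]:        # t = 1 gives the tip (3, 3)
    x, y = point(t)
    print(t, (x, y), implicit_slope(x, y), parametric_slope(t))
```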

Note that in either example, we can cut the curve into pieces each of which is the graph of a function:

Implicit differentiation also helps in situations where several quantities depend on each other implicitly as well as on time. If we differentiate the equation that relates them, we get an equation that relates their derivatives. The result is *related rates*.

**Example (air balloon).** Suppose we have an air balloon, spherical in shape. Air is pumped into it at the rate of $5\ \text{in}^{3}/\text{sec}$. What is the rate of growth of the radius at different radii?

Step one in word problems: introduce variables; let

- $t$ be time,
- $V$ be the volume, and
- $r$ be the radius.

Next, $V$ depends on $t$, and, according to the condition, at every moment we have
$$\frac{dV}{dt} = 5.$$
Furthermore, this is a sphere, so
$$V = \frac{4}{3}\pi r^{3}.$$
Here we see that $V$ also depends on $r$; altogether, these are the dependencies we face:
$$\begin{array}{cccc}
t &\to &r\\
&\searrow&\downarrow\\
&&V
\end{array}$$
We *could* reverse the last arrow by finding the inverse: $r = \sqrt[3]{\frac{3}{4\pi}V}$. Instead, we *differentiate the equation itself*. The idea is that if two variables are related (via an equation), then so are their derivatives, i.e., the rates of change (hence, “related rates”).

Keeping in mind that both $V$ and $r$ are functions of time, we differentiate the relation with respect to $t$: $$V= \frac{4}{3} \pi r^{3}.$$ The left-hand side is very simple: $$\frac{d}{dt}V=\frac{dV}{dt},$$ but in the right-hand side $r(t)^{3}$ is a composition: $$\frac{d}{dt}\left( \frac{4}{3} \pi r^{3} \right) = \frac{4}{3} \pi \cdot 3r^{2} \frac{dr}{dt}.$$ Thus, we have: $$\frac{dV}{dt} = \frac{4}{3} \pi \cdot 3r^{2} \frac{dr}{dt}.$$

Recall that the rate of change of $V$ is $5$ (at a given moment), so: $$5 = 4\pi r^{2} \frac{dr}{dt},$$ or $$\frac{dr}{dt} = \frac{5}{4\pi r^{2}}.$$

Next, what's the rate of growth of $r$ when $r = 1,\ r = 2,\ r = 3$? $$\begin{array}{lll} r = 1: & \frac{dr}{dt} = \frac{5}{4\pi}; \\ r = 2: & \frac{dr}{dt} = \frac{5}{16\pi}; \\ r = 3: & \frac{dr}{dt} = \frac{5}{36\pi}. \end{array} $$ Indeed, we see a slow-down. $\square$
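These rates can be compared with a direct computation. A sketch assuming, for illustration, that the balloon starts empty, so $V=5t$ and $r(t)=\sqrt[3]{\frac{3\cdot 5t}{4\pi}}$; a symmetric difference quotient of $r(t)$ stands in for $\frac{dr}{dt}$:

```python
import math

RATE = 5.0                                   # in^3/sec, from the problem

def radius_at(t):
    """Radius after t seconds, assuming the balloon starts empty (V = 5t)."""
    return (3 * RATE * t / (4 * math.pi)) ** (1 / 3)

def time_for_radius(r):
    """The moment when the radius equals r, under the same assumption."""
    return (4 / 3) * math.pi * r ** 3 / RATE

for r in [1.0, 2.0, 3.0]:
    t, h = time_for_radius(r), 1e-4
    numeric = (radius_at(t + h) - radius_at(t - h)) / (2 * h)
    print(r, numeric, RATE / (4 * math.pi * r ** 2))   # nearly equal
```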

**Example (sliding ladder).** Suppose a $10$ ft ladder stands against a wall and its bottom is sliding away from the wall at $2$ ft/sec. How fast is the top moving when it is $6$ ft above the floor?

Introduce variables:

- $x$ the distance of the bottom from the wall,
- $y$ the distance of the top from the floor, both functions of
- $t$ the time.

Translate the information, as well as the question, about the variables into equations: $$\begin{array}{ll|l} &\text{quantities:}&\text{functions:}\\ \hline \text{always}& x^{2} + y^{2} = 10^{2}&(x(t))^{2} + (y(t))^{2} = 10^{2}\\ \text{now}&\frac{dx}{dt} = 2&x'(t_0)=2\\ \text{now}&y = 6&y(t_0)=6 \\ \text{now}&\frac{dy}{dt} = ?& y'(t_0)=? \end{array}$$ That's the purely mathematical problem to be solved.

We differentiate the equation with respect to the independent variable, $t$: $$\begin{array}{rlll} \frac{d}{dt}\left( x^{2} + y^{2} \right) & = \frac{d}{dt}\left(100\right) \\ 2x\frac{dx}{dt} + 2y\frac{dy}{dt} & = 0,& \text{ solve for } \frac{dy}{dt} \\ \frac{dy}{dt} &= - \frac{x}{y}\frac{dx}{dt},& \text{ substitute } \\ &= -\frac{x}{6} 2, & \text{ now } x=8 \text{ comes from } x^{2} + y^{2} = 100, \\ &= -\frac{8}{6} 2 \\ & = -\frac{8}{3}. \end{array} $$ It is going down! $\square$
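A quick numerical check of this answer, as a minimal Python sketch: it applies $\frac{dy}{dt}=-\frac{x}{y}\frac{dx}{dt}$ directly and then compares with a difference quotient of the top's height. For the check we assume (the problem only tells us about "now") that the bottom keeps sliding at exactly $2$ ft/sec for a moment before and after.

```python
import math

L = 10.0  # ladder length, ft

def dy_dt(x, y, dx_dt):
    """From 2x x' + 2y y' = 0: y' = -(x/y) x'."""
    return -x / y * dx_dt

y0 = 6.0
x0 = math.sqrt(L**2 - y0**2)       # = 8, from x^2 + y^2 = 100
print(dy_dt(x0, y0, 2.0))          # -8/3 ≈ -2.667 ft/sec

def top_height(t, x_start=x0, speed=2.0):
    """Height of the top when the bottom slides at a constant speed (assumption)."""
    x = x_start + speed * t
    return math.sqrt(L**2 - x**2)

h = 1e-6
print((top_height(h) - top_height(-h)) / (2 * h))   # close to -8/3
```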

**Exercise.** Solve the problem for the moment when the ladder hits the floor.

## 8 Radar gun: the math

**Problem.** Suppose you are driving at a speed of $80$ mph when you see a police car positioned $40$ feet off the road. What is the radar gun's reading?

How does the radar gun work? In fact, how does a *radar* work? A signal is sent, it bounces off an object, and, when it comes back, the time lapse is recorded. Since the signal travels to the object and back, the distance to the object is computed as:
$$S = \underbrace{\text{ signal's speed }}_{\text{ known }} \cdot \frac{1}{2}\underbrace{\text{ time passed }}_{\text{ measured }}. $$

A radar gun does this *twice*.

A signal is sent, it comes back, and the time is measured. Then the same is done a second time:

- $S_{1} = \frac{1}{2}\,$speed$\,\cdot\,$time, at time $t=t_1$,
- $S_{2} = \frac{1}{2}\,$speed$\,\cdot\,$time, at time $t=t_2$.

Then, the reading is computed as: $$\text{ estimated speed }= \frac{\text{ change of distance }}{\text{ time between signals }}. $$ No radar gun can do better than that!

To summarize: $$\frac{dS}{dt}\approx \frac{\Delta S}{\Delta t}, $$ where

- $\Delta S=S_{2} - S_{1}$ is the change of distance between the two cars,
- $\Delta t=t_2-t_1$ is the time between signals.

Now the question: is the reading of the radar gun $80$ mph?

To get an idea of what can happen, consider this extreme example: what if you are just passing in front of the police car, like this?

It is conceivable that at time $t_1$ your car is the same distance *before* the intersection as it is *past* the intersection at time $t_2$. Then $\Delta S=0$! So, the reading *can* be off by a lot...
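The two-signal procedure, including this extreme case, can be simulated. In the Python sketch below all names are ours; we assume the signal travels at the speed of light, take your position $P$ along the road as *signed* (negative before the intersection), and ignore your car's motion while a signal is in flight.

```python
import math

C = 299_792_458 / 0.3048           # speed of light in ft/sec

def distance_from_echo(round_trip_time):
    """The signal goes to the object and back, so S = C * time / 2."""
    return C * round_trip_time / 2

def radar_reading(echo1, echo2, t1, t2):
    """Estimated speed: change of distance over the time between the two signals."""
    S1 = distance_from_echo(echo1)
    S2 = distance_from_echo(echo2)
    return (S2 - S1) / (t2 - t1)

D = 40.0                           # police car's distance from the road, ft
SPEED = 80 * 5280 / 3600           # 80 mph in ft/sec

def echo_time(P):
    """Round-trip time for your car at signed position P (ft) along the road."""
    return 2 * math.hypot(P, D) / C

def to_mph(ft_per_sec):
    return ft_per_sec * 3600 / 5280

# Far away: two signals 0.01 sec apart, about 5000 ft before the intersection.
t1, t2 = 0.0, 0.01
far = radar_reading(echo_time(-5000.0), echo_time(-5000.0 + SPEED * t2), t1, t2)
print(round(to_mph(far), 2))       # negative: the distance is shrinking

# The extreme case: signals at 50 ft before and 50 ft past the intersection.
near = radar_reading(echo_time(-50.0), echo_time(50.0), 0.0, 100.0 / SPEED)
print(near)                        # 0.0: the gun "sees" no motion at all
```

Far from the intersection the magnitude of the reading is very close to $80$ mph; in the symmetric case the two distances coincide and the reading is exactly zero.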

These are the variables:

- $S$, the distance from the police car to yours,
- $P$, the distance from your car to the intersection,
- $t$, the time, the independent variable, and also
- $D=40$, the distance from the police car to the road.

Since your speed is $80$ mph, $\frac{dP}{dt} = 80$. That's what the radar gun is meant to detect. But what the radar actually measures is $\frac{dS}{dt}$!

How good an approximation of the real velocity $\frac{dP}{dt}$ is the perceived velocity $\frac{dS}{dt}$? The spreadsheet contains a column of locations $P$ of your car (distances to the intersection), then a column for the distance $S$ to the police car (plotted first), and finally a column for the average rate of change of $S$.
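Such a spreadsheet is easy to reproduce. The Python sketch below is a stand-in under our own assumptions: $P$ runs from $-300$ to $300$ ft in steps of $10$ ft, $P$ is *signed* (negative before the intersection), and the time between rows is the time needed to cover $10$ ft at $80$ mph. The last column is the average rate of change $\frac{\Delta S}{\Delta t}$ converted back to mph; it is negative while you approach, and the gun reports its absolute value.

```python
import math

D = 40.0                             # distance from the police car to the road, ft
FT_PER_SEC = 80.0 * 5280 / 3600      # 80 mph ≈ 117.33 ft/sec

def spreadsheet(P_start=-300.0, dP=10.0, rows=61):
    """Columns P, S, and the average rate of change of S (in mph)."""
    Ps = [P_start + k * dP for k in range(rows)]
    Ss = [math.hypot(P, D) for P in Ps]          # S = sqrt(P^2 + D^2)
    dt = dP / FT_PER_SEC                         # time to cover dP feet
    rates = [(Ss[k + 1] - Ss[k]) / dt * 3600 / 5280 for k in range(rows - 1)]
    return Ps, Ss, rates

Ps, Ss, rates = spreadsheet()
for k in range(0, 60, 5):
    print(f"P = {Ps[k]:7.1f} ft   S = {Ss[k]:6.1f} ft   avg dS/dt = {rates[k]:6.1f} mph")
```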

As we can see, the approximation is best far from the intersection. But within about $75$ feet of the intersection, the reading drops to about $70$ mph or less!

Next, we establish a functional relation between the two via the Pythagorean Theorem: $$P^{2} + D^{2} = S^{2} \gets \text{These aren't numbers, but variables, i.e., functions.} $$ This connects $P$ and $S$, but not $\frac{dP}{dt}$ and $\frac{dS}{dt}$ yet. We differentiate the equation with respect to $t$ to get there: $$\begin{aligned} \frac{d}{dt}\left(P^{2} + D^{2}\right) & = \frac{d}{dt}\left(S^{2}\right)\ \Longrightarrow \\ 2P\cdot \frac{dP}{dt} + 2D\underbrace{\frac{dD}{dt}}_{=0} &= 2S\cdot \frac{dS}{dt}\ \Longrightarrow \\ P\cdot\frac{dP}{dt} &= S\cdot \frac{dS}{dt}\ \Longrightarrow \\ \frac{dS}{dt} &= \frac{P}{S}\frac{dP}{dt}. \end{aligned}$$ Thus, we finally have a relation between the two rates of change. Since $S=\sqrt{P^2+D^2}$ and $\frac{dP}{dt}=80$, this is what the radar gun shows: $$\frac{dS}{dt} = \frac{P}{\sqrt{P^2+D^2}}\cdot 80. $$ A plot of this relation confirms our earlier conclusions.
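Here is this formula evaluated directly, as a minimal Python sketch; the function name is ours, and $D=40$ ft and $80$ mph are the defaults. For instance, at $P=75$ ft we get $S=\sqrt{75^2+40^2}=85$ ft and a reading of $\frac{75}{85}\cdot 80\approx 70.6$ mph.

```python
import math

def radar_reading_mph(P, D=40.0, speed_mph=80.0):
    """dS/dt = P / sqrt(P^2 + D^2) * dP/dt, with P >= 0 and D in feet."""
    return P / math.sqrt(P**2 + D**2) * speed_mph

for P in (1000, 200, 75, 30, 0):
    print(P, round(radar_reading_mph(P), 2))
```

Only the ratio $\frac{P}{S}$ matters, so the mixed units (feet for distances, mph for speeds) cause no trouble here.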

Furthermore, we can simplify this relation: $$\cos \alpha = \frac{P}{S}, $$ where $\alpha$ is the angle between the road ahead of you and the direction to the police car; therefore, $\frac{dS}{dt} = 80\cos\alpha$.

How does $\alpha$ change as you drive?

- Early: $\alpha$ is close to $0$, so $\cos \alpha$ is close to $1$, and, therefore, $\frac{dS}{dt}$ is close to $80$.
- Then, as $\alpha$ increases, $\cos \alpha$ decreases toward $0$, and so does $\frac{dS}{dt}$.
- In the middle, we have $\alpha = \frac{\pi}{2}$, $\cos \alpha = 0$, $\frac{dS}{dt} = 0$.
- As $\alpha$ passes $\frac{\pi}{2}$, $\cos \alpha$ decreases to negative values, and so does $\frac{dS}{dt}$.
- Late: $\alpha$ approaches $\pi$, $\cos \alpha$ approaches $-1$, and, therefore, $\frac{dS}{dt}$ approaches $-80$.

**Conclusion:** The radar gun always *under*estimates your speed:
$$\left| \frac{dS}{dt} \right| < 80, $$
unless the police car is *on* the road! When it isn't, what can you do to “improve” the reading? What do you want $\alpha$ to be? As close to $\frac{\pi}{2}$ as possible!

## 9 The derivative of the inverse function

Let's recall from Chapter 2 that for a given one-to-one and onto function $y=f(x)$, its *inverse* is the function, $x=f^{-1}(y)$, that satisfies
$$f^{-1}(y)=x \text{ if and only if }f(x)=y.$$
The idea is that a function and its inverse represent the *same relation*:

- $x$ and $y$ are related when $y=f(x)$, or
- $x$ and $y$ are related when $x=f^{-1}(y)$.

For example, these are pairs of functions inverse to each other: $$\begin{array}{rcl} y=x+2& \text{ vs. } & x=y-2,\\ y=3x&\text{ vs. } & x=\frac{1}{3}y,\\ y=x^2&\text{ vs. } & x=\sqrt{y} \quad\text{ for }x,y\ge 0,\\ y=e^x&\text{ vs. } & x=\ln y \quad\text{ for }y > 0. \end{array}$$

Can we express *the derivative of the inverse of a function in terms of the derivative of the function*?

Let's recall that the inverse “undoes” the effect of the function. The idea applies especially well to the interpretation of the functions as *transformations*. Indeed, what is the meaning of the derivative of a transformation? It is its stretch ratio. Now, if $f$ *stretches* the $x$-axis at the rate of $k$ (at $x=a$), then $f^{-1}$ *shrinks* the $y$-axis at the rate of $k$ (at $b=f(a)$); i.e.,
$$\frac{df^{-1}}{dy}(b)=\frac{1}{\frac{df}{dx}(a)}.$$
It's the *reciprocal*!

We can also guess this relation from the following simple picture:

As the $xy$-plane is flipped about the diagonal, this is what happens: $$\text{slope of }f=\frac{\text{ change of }y}{\text{ change of } x}=\frac{A}{B}=\frac{1}{B/A}=\frac{1}{\text{slope of }f^{-1}}.$$

Now the algebra.

We will need the following algebraic property of the inverse presented in Chapter 2: $$\begin{array}{lll} f^{-1}(f(x))=x\quad\text{for all }x;\\ f\left(f^{-1}(y) \right) = y \quad\text{for all }y. \end{array}$$ Here is a flowchart representation of this idea: $$\begin{array}{ccccccccccccccc} x & \mapsto & \begin{array}{|c|}\hline\quad f \quad \\ \hline\end{array} & \mapsto & y & \mapsto & \begin{array}{|c|}\hline\quad f^{-1} \quad \\ \hline\end{array} & \mapsto & x &(\text{same }x);\\ y & \mapsto & \begin{array}{|c|}\hline\quad f^{-1} \quad \\ \hline\end{array} & \mapsto & x & \mapsto & \begin{array}{|c|}\hline\quad f \quad \\ \hline\end{array} & \mapsto & y &(\text{same }y). \end{array}$$

**Example.** Suppose we want to find the derivative of the logarithm, using only its definition via the exponential function. We differentiate the following equation of functions, which amounts to the definition of the logarithm:
$$e^{\ln x} = x.$$
The flow chart below shows the dependencies:
$$ x \mapsto u = \ln x \mapsto y =e^{u} $$
By CR, we have:
$$\begin{aligned}
(e^{\ln x})' & = (\ln x)'\cdot e^{u} \\
& = (\ln x)' e^{\ln x} \\
& = (\ln x)' x &=(x)'=1.
\end{aligned}$$
Therefore,
$$(\ln x)' = \frac{1}{x},$$
whenever $x > 0$. $\square$
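This is only a numerical sanity check, not a proof. In the spirit of the example, the Python sketch below computes $\ln x$ from the exponential function alone, by solving $e^u=x$ for $u$ with bisection (a method we chose for simplicity; the search interval $[-50,50]$ is an assumption), and then compares a difference quotient of the result with $\frac{1}{x}$.

```python
import math

def ln_via_exp(x, lo=-50.0, hi=50.0):
    """Solve e^u = x for u by bisection, using only exp (assumes e^lo < x < e^hi)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if math.exp(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def derivative(func, x, h=1e-5):
    """Central difference quotient as a stand-in for the derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (0.5, 1.0, 2.0, 10.0):
    print(x, derivative(ln_via_exp, x), 1 / x)
```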

Similarly, we can find the derivatives of $\sin^{-1} x$, $\cos^{-1} x$, etc. Let's find the general formula instead.

We differentiate the equation:
$$f^{-1}(f(x))=x.$$
Then, by *CR*, we have
$$\frac{\Delta f^{-1}}{\Delta y}\frac{\Delta f}{\Delta x}=1 \text{ and } \frac{d f^{-1}}{dy}\frac{df}{dx} = 1.$$
The other equation produces the same result!

We have proven the theorem below. We just need to be careful with the variables, as follows.

**Theorem (Inverse Rule).** (A) The difference quotient of the inverse of a function is the reciprocal of its difference quotient; i.e., for any function $f$ defined at two adjacent nodes $x$ and $x+\Delta x$ of a partition, with $f(x)\ne f(x+\Delta x)$ so that its inverse function $f^{-1}$ is defined at the two adjacent nodes $f(x)$ and $f(x+\Delta x)$ of a partition, the difference quotients (defined at the secondary nodes $c$ and $q$ within these edges of the two partitions, respectively) satisfy:
$$\frac{\Delta f^{-1}}{\Delta y}(q)= \frac{1}{\frac{\Delta f}{\Delta x}(c)}.$$
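(B) For any function $y=f(x)$ that is continuous and one-to-one on an open interval and differentiable at a point $x=a$ of this interval with $\frac{df}{dx}(a)\ne 0$, its inverse $x=f^{-1}(y)$ is differentiable at $b=f(a)$ and we have: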
$$\frac{df^{-1}}{dy}(f(a))= \frac{1}{\frac{df}{dx}(a)},$$
or
$$\frac{df^{-1}}{dy}(b)= \frac{1}{\frac{df}{dx}(f^{-1}(b))}.$$

The formulas in the Lagrange notation are as follows: $$\left( f^{-1}\right)' (f(a)) =\frac{1}{f'(a)}$$ and $$\left( f^{-1}\right)' (b) =\frac{1}{f'(f^{-1}(b))}.$$
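Both parts of the theorem can be tested numerically. In the Python sketch below, $f(x)=x^3+x$ is a hypothetical example, chosen because $f'(x)=3x^2+1>0$ makes it increasing and hence one-to-one; since there is no convenient formula for its inverse, we compute $f^{-1}$ by bisection. Part (A) is checked on the single edge $[1,1.5]$, and part (B) at $a=1$, $b=f(1)=2$, where the theorem predicts $\left(f^{-1}\right)'(2)=\frac{1}{f'(1)}=\frac{1}{4}$.

```python
def f(x):
    return x**3 + x              # increasing, so one-to-one

def f_prime(x):
    return 3 * x**2 + 1

def f_inverse(y, lo=-100.0, hi=100.0):
    """Invert the increasing function f by bisection (assumes f(lo) <= y <= f(hi))."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# (A) Difference quotients over one edge [x0, x1] and the matching edge [f(x0), f(x1)].
x0, x1 = 1.0, 1.5
dq_f = (f(x1) - f(x0)) / (x1 - x0)
dq_inv = (f_inverse(f(x1)) - f_inverse(f(x0))) / (f(x1) - f(x0))
print(dq_f * dq_inv)             # 1, up to rounding

# (B) Derivatives: (f^{-1})'(b) = 1 / f'(f^{-1}(b)).
a = 1.0
b = f(a)                         # b = 2
h = 1e-6
numeric = (f_inverse(b + h) - f_inverse(b - h)) / (2 * h)
print(numeric, 1 / f_prime(a))   # both close to 0.25
```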

**Example.** Find $(\sin^{-1} y)'$. There is no formula for this function, but its meaning is (for $-\pi/2\le x\le \pi/2$) as follows:
$$y = \sin x, \text{ or } x = \sin^{-1}y.$$
Since $(\sin x)'=\cos x$, we conclude:
$$(\sin^{-1} y)' =\frac{1}{\cos x}= \frac{1}{\cos(\sin^{-1}y)}.$$
That may be the answer, but it's too cumbersome and should be simplified. We need to express $\cos x$ in terms of $\sin x$, which is $y$. By the Pythagorean identity
$$\sin^{2} x + \cos^{2} x = 1,$$
and since $\cos x\ge 0$ for $-\pi/2\le x\le \pi/2$, we have
$$\begin{aligned}
\cos x & = \sqrt{1 - \sin^{2} x} \\
& = \sqrt{ 1 - y^{2}}.
\end{aligned} $$
Therefore,
$$(\sin^{-1} y)'=\frac{1}{\sqrt{ 1 - y^{2}}}.$$
$\square$

We can apply the theorem to other trigonometric functions. These are the results: $$\begin{aligned} (\sin^{-1}x)' & = \frac{1}{\sqrt{1 - x^{2}}}; \\ (\cos^{-1}x)' & = -\frac{1}{\sqrt{1 - x^{2}}}; \\ (\tan^{-1}x)' & = \frac{1}{1+x^{2}}. \end{aligned}$$
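Before proving these formulas, we can test them. The Python sketch below compares each formula with a central difference quotient of `math.asin`, `math.acos`, and `math.atan` at a few points of $(-1,1)$; the step size $h=10^{-6}$ is our choice. Agreement is evidence, not a proof.

```python
import math

def central_diff(func, x, h=1e-6):
    """Central difference quotient as a stand-in for the derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)

checks = {
    "asin": (math.asin, lambda x: 1 / math.sqrt(1 - x**2)),
    "acos": (math.acos, lambda x: -1 / math.sqrt(1 - x**2)),
    "atan": (math.atan, lambda x: 1 / (1 + x**2)),
}

for name, (func, formula) in checks.items():
    for x in (-0.9, -0.3, 0.0, 0.5, 0.8):
        assert abs(central_diff(func, x) - formula(x)) < 1e-6, (name, x)
print("all three formulas agree with the difference quotients")
```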

**Exercise.** Prove the formulas.

**Exercise.** Since $(\sin^{-1}x)' = -(\cos^{-1}x)'$, does it mean that $\sin^{-1}x = -\cos^{-1}x$?

We can re-write the Inverse Rule in the Leibniz notation:
$$\frac{dx}{dy}=\frac{1}{\frac{dy}{dx}}\text{ or }\frac{dx}{dy}\frac{dy}{dx}=1.$$
Then *the derivatives of inverses are the reciprocals* of each other. Even though the derivatives aren't fractions, the difference quotients, i.e., the slopes of the secant lines, are fractions.

Then the formula follows from these limits: $$\begin{array}{lll} \frac{\Delta x}{\Delta y}&\cdot& \frac{\Delta y}{\Delta x} &=1\\ \quad \downarrow&&\quad \downarrow\\ \ \frac{dx}{dy}&\cdot&\ \frac{dy}{dx}&=1& \text{ as } \Delta x\to 0,\ \Delta y\to 0. \end{array}$$ The fact that $$\Delta x\to 0 \Longrightarrow\ \Delta y\to 0$$ follows from the continuity of $f$.

Furthermore, if we concentrate on a single point $(a,b)$, where $b=f(a)$, on the graph of $y=f(x)$ and its tangent line, the derivatives $$\frac{dy}{dx}\Bigg|_{x=a} \text{ and } \frac{dx}{dy}\Bigg|_{y=b}$$ are indeed fractions and the reciprocals of each other.

**Theorem (Reciprocal Powers).** For any positive integer $n$ and any $y>0$, we have:
$$\frac{ d y^{\frac{1}{n}} }{dy}=\frac{1}{n} y^{ \frac{1}{n}-1 }.$$

**Proof.** The inverse of $x=y^{\frac{1}{n}}$ is $y=x^n$. Therefore,
$$\begin{aligned}
\frac{ d y^{\frac{1}{n}} }{dy}&=\frac{1}{\frac{d x^{n}}{dx}}
=\frac{1}{nx^{n-1}}
=\frac{1}{n\left( y^{\frac{1}{n}} \right)^{n-1}}\\
&=\frac{1}{n y^{\frac{n-1}{n}} }
=\frac{1}{n}y^{\frac{1}{n}-1}.
\end{aligned}$$
$\blacksquare$
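The steps of the proof can be traced numerically. In the Python sketch below (function names are ours), `via_inverse` is the middle step $\frac{1}{nx^{n-1}}$ with $x=y^{1/n}$, `reciprocal_power_rule` is the final formula, and a central difference quotient of $y^{1/n}$ serves as an independent check for a few sample values of $n$ and $y>0$.

```python
def root(y, n):
    """y^(1/n) for y > 0."""
    return y ** (1 / n)

def reciprocal_power_rule(y, n):
    """The theorem's formula: (1/n) * y^(1/n - 1)."""
    return (1 / n) * y ** (1 / n - 1)

def via_inverse(y, n):
    """The proof's middle step: 1 / (n x^(n-1)) with x = y^(1/n)."""
    x = root(y, n)
    return 1 / (n * x ** (n - 1))

def central_diff(func, y, h=1e-6):
    return (func(y + h) - func(y - h)) / (2 * h)

for n in (2, 3, 5):
    for y in (0.5, 2.0, 10.0):
        numeric = central_diff(lambda t: root(t, n), y)
        print(n, y, numeric, reciprocal_power_rule(y, n), via_inverse(y, n))
```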

**Theorem (Rational Powers).** For any positive integers $n$ and $m$ and any $y>0$, we have:
$$\frac{ d y^{\frac{m}{n}} }{dy}=\frac{m}{n} y^{ \frac{m}{n}-1 }.$$

**Exercise.** Prove the theorem.

## 10 Reversing differentiation

We have encountered the following question several times by now: when we know the velocity at every moment of time, how do we find the location? The question applies equally to the velocity acquired from the location as its difference quotient or as its derivative. We need to “reverse” the effect of differentiation on a function.

But let's start with an even simpler problem: if we know the displacements during each of the time periods, can we find our location? Just add them together to find the total displacement! This is about the *difference*. Suppose we have a function $y=p(x)$ defined at the secondary nodes, $c$, of a partition. How do we find a function $y=f(x)$ defined at the nodes, $x$, of the partition so that $p$ is its difference:
$$\Delta f(c)=p(c)?$$
In other words, we need to solve this equation for $f$:
$$\Delta f=p.$$
Suppose this function $p$ is known but $f$ isn't, except for one, initial, value: $y_0=f(a)$, where $a=x_0$ is the first node. Then the above equation becomes:
$$\Delta f(c_1)=f(x_0+\Delta x_1)-f(x_0)=p(c_1),$$
and we can solve it:
$$f(x_1)= f(x_0+\Delta x_1)=f(x_0)+p(c_1).$$
We continue in this manner and find the rest of the values of $f$:
$$f(x_{k+1})= f(x_k+\Delta x_{k+1})=f(x_k)+p(c_{k+1}).$$
This formula is *recursive*: we need to know the previous value of $f$ in order to find the next one. Though not an explicit formula, the solution is very simple!
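In code, this recursion is just a running sum. A minimal Python sketch (the function name and the sample displacements are ours):

```python
from itertools import accumulate

def reconstruct_from_differences(f0, p_values):
    """Given f(x_0) and the differences p(c_1), ..., p(c_n), return f(x_0), ..., f(x_n)."""
    f_values = [f0]
    for p in p_values:
        f_values.append(f_values[-1] + p)   # f(x_{k+1}) = f(x_k) + p(c_{k+1})
    return f_values

displacements = [3, -1, 2, 5]
print(reconstruct_from_differences(10, displacements))   # [10, 13, 12, 14, 19]
# The same running sum with the standard library (Python 3.8+ for `initial`):
print(list(accumulate(displacements, initial=10)))        # [10, 13, 12, 14, 19]
```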

Now, the *difference quotient*. Suppose we have a function $y=g(x)$ defined at the secondary nodes, $c$, of a partition. How do we find a function $y=f(x)$ defined at the nodes, $x$, of the partition so that $g$ is its difference quotient:
$$\frac{\Delta f}{\Delta x}(c)=g(c)?$$
In other words, we need to solve this equation for $f$:
$$\frac{\Delta f}{\Delta x}=g.$$
We follow exactly the process above. Suppose this function $g$ is known but $f$ isn't, except for one, initial, value: $y_0=f(a)$, where $a=x_0$ is the first node. Then the above equation becomes:
$$\frac{\Delta f}{\Delta x}(c_1)=\frac{f(x_0+\Delta x_1)-f(x_0)}{\Delta x_1}=g(c_1),$$
and we can solve it:
$$f(x_1)= f(x_0+\Delta x_1)=f(x_0)+g(c_1)\Delta x_1.$$
We continue in this manner and find the rest of the values of $f$:
$$f(x_{k+1})= f(x_k+\Delta x_{k+1})=f(x_k)+g(c_{k+1})\Delta x_{k+1}.$$
This formula is also recursive, but, within this limitation, the problem of reversing the effect of the difference quotient is solved!
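Here is the same recursion with the difference quotient, as a Python sketch on an uneven partition. Taking each secondary node $c$ to be the midpoint of its edge is our assumption; with that choice, $g(c)=2c$ recovers $x^2$ exactly, because $\frac{(x+\Delta x)^2-x^2}{\Delta x}=2x+\Delta x=2c$.

```python
def reconstruct_from_quotients(f0, nodes, g):
    """Given f(x_0) and the difference quotients g(c), return f at every node.

    Recursion: f(x_{k+1}) = f(x_k) + g(c_{k+1}) * Δx_{k+1},
    with c the midpoint of each edge (an assumption).
    """
    f_values = [f0]
    for left, right in zip(nodes, nodes[1:]):
        c = (left + right) / 2
        f_values.append(f_values[-1] + g(c) * (right - left))
    return f_values

nodes = [0.0, 0.5, 1.25, 2.0, 3.0]          # an uneven partition
print(reconstruct_from_quotients(0.0, nodes, lambda c: 2 * c))
# [0.0, 0.25, 1.5625, 4.0, 9.0], i.e., x^2 at the nodes
print(reconstruct_from_quotients(1.0, nodes, lambda c: 2 * c))
# [1.0, 1.25, 2.5625, 5.0, 10.0], i.e., x^2 + 1 at the nodes
```

The same quotients with a different initial value produce a different function; the initial value is essential.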

These two problems are similar to the one of finding the *inverse* of a function. This is how inverse functions appear in algebra; they come from solving equations, for $x$:
$$\begin{array}{llllll}
x^{2} & = 4 & \Longrightarrow& x = 2 & \text{ via } \sqrt{\ \cdot\ }; \\
2^{x} & = 8 & \Longrightarrow& x = 3 & \text{ via } \log_{2}(\cdot ); \\
\sin x & = 0 & \Longrightarrow& x = 0 & \text{ via } \sin^{-1}(\cdot ), \text{ etc.}
\end{array}$$
Now, what if we know the result of differentiation and want to know where it came from? We have just discovered that the inverse of the difference is the sum, no surprise! There may also be some explicit formulas. For example, we can solve this equation, for $f$ (with $h=\Delta x$ and $c$ the left end of each edge):
$$\Delta f=(e^h-1)\cdot e^c.$$
This is a solution:
$$f=e^x.$$
Similarly, we solve the equation (with $c$ the midpoint of each edge):
$$\frac{\Delta f}{\Delta x}=\frac{ \sin (h/2)}{h/2}\cdot\cos c$$
by
$$f=\sin x.$$

This is called *anti-differentiation*.
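The two identities behind these answers can be confirmed numerically. The Python sketch below checks them on a few edges $[x,x+h]$ with $h=0.3$ (our choice), using the left end as $c$ for $e^x$ and the midpoint as $c$ for $\sin x$; these are the choices of $c$ for which the identities hold exactly.

```python
import math

h = 0.3                                  # Δx, the same for every edge (an assumption)
for x in (-1.0, 0.0, 0.7, 2.0):         # left ends of a few edges [x, x + h]
    # Δ(e^x) with c = x, the left end of the edge
    assert math.isclose(math.exp(x + h) - math.exp(x),
                        (math.exp(h) - 1) * math.exp(x), rel_tol=1e-12)
    # Δ(sin x)/Δx with c = x + h/2, the midpoint of the edge
    c = x + h / 2
    assert math.isclose((math.sin(x + h) - math.sin(x)) / h,
                        math.sin(h / 2) / (h / 2) * math.cos(c), rel_tol=1e-12)
print("both identities hold on every edge tested")
```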

What about the *derivative*? Because it is not a fraction but a limit of a fraction, there is no such formula, not even a recursive one. Some particular cases are considered in the next chapter.

The above approach still applies. We solve equations with respect to a variable function; for example: $$\begin{array}{llllll} f' & = 2x & \Longrightarrow& f &= x^{2}; \\ f' & = \cos x & \Longrightarrow& f &= \sin x ;\\ f' & =e^x & \Longrightarrow& f &= e^x. \end{array}$$

**Example.** The importance of this “inverse” problem stems from the need to find location from velocity or velocity from acceleration. For example, this is what we derive from our experience with differentiation. For an object in free fall, we have the following for the horizontal component:

- the acceleration is zero $\Longrightarrow$ the velocity is constant $\Longrightarrow$ the location is a linear function.

And for the vertical component, we have:

- the acceleration is constant $\Longrightarrow$ the velocity is a linear function $\Longrightarrow$ the location is a quadratic function.

$\square$
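These chains of conclusions can be run with the recursion that adds "rate times time step" at each step. In the Python sketch below, the initial velocities ($48$ ft/sec up, $20$ ft/sec sideways) are made up, and the vertical acceleration is $-32$ ft/sec², roughly Earth's gravity in these units. Evaluating the velocity at the midpoint of each time interval is our choice; for a linear velocity it makes the recursion exact.

```python
def integrate(initial, rates, dt):
    """f(t_{k+1}) = f(t_k) + rate_k * dt, starting from f(t_0) = initial."""
    values = [initial]
    for r in rates:
        values.append(values[-1] + r * dt)
    return values

def midpoints(values):
    """Average of adjacent node values: the exact midpoint value of a linear function."""
    return [(left + right) / 2 for left, right in zip(values, values[1:])]

dt, steps = 0.25, 8
times = [k * dt for k in range(steps + 1)]

# Vertical: constant acceleration, so linear velocity and quadratic location.
a = -32.0
vy = integrate(48.0, [a] * steps, dt)
y = integrate(0.0, midpoints(vy), dt)

# Horizontal: zero acceleration, so constant velocity and linear location.
vx = integrate(20.0, [0.0] * steps, dt)
x = integrate(0.0, midpoints(vx), dt)

for t, xk, yk in zip(times, x, y):
    print(f"t = {t:4.2f}  x = {xk:5.1f}  y = {yk:5.1f}  (48t - 16t^2 = {48*t - 16*t**2:5.1f})")
```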

We illustrate the idea with a diagram: $$ \newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} % \begin{array}{ccccccccccccccc} x^2 & \mapsto & \begin{array}{|c|}\hline\quad \tfrac{d}{dx} \quad \\ \hline\end{array} & \mapsto & 2x; \\ 2x & \mapsto & \begin{array}{|c|}\hline\quad \left( \tfrac{d}{dx}\right)^{-1} \quad \\ \hline\end{array} & \mapsto & x^2 ... \end{array}$$ ... are there any others? Yes, $(x^{2} + 1)' = 2x $. As a function, $\tfrac{d}{dx}$ isn't one-to-one!

$$\begin{array}{cclc} &&x^2+1\\ &\nearrow&\\ 2x&\to&x^2\\ &\searrow\\ &&x^2-1 \end{array}$$

**Exercise.** We can make any function one-to-one by restricting its domain. How?