This site is devoted to mathematics and its applications. Created and run by Peter Saveliev.

# Functions of several variables

### From Mathematics Is A Science


## 1 Overview

Let's review multidimensional functions. We have two axes: the dimension of the domain and the dimension of the range:

We covered the very first cell in Part I. In the last chapter, we made a step in the vertical direction and explored the first column of this table. It is now time to move to the right. We retreat to the first cell because the new material does not depend on the material of the last chapter -- or vice versa -- even though the two interact via compositions. We will not jump diagonally!

We need to appreciate, however, the different challenges these two steps present. Every *two* numerical functions make a planar parametric curve and, conversely, every planar parametric curve is just a pair of numerical functions. On the other hand, we can see that the surface that is the graph of a function of two variables produces -- through cutting by vertical planes -- *infinitely many* graphs of numerical functions.

Note that the first cell has curves but not all of them because some of them fail the vertical line test -- such as the circle -- and can't be represented by graphs of numerical functions. Hence the need for parametric curves. Similarly, the first cell of the second column has surfaces but not all of them because some of them fail the vertical line test -- such as the sphere -- and can't be represented by graphs of functions of two variables. Hence the need for parametric surfaces shown higher in this column. They are presented in the next chapter.

We represent a function diagrammatically as a *black box* that processes the input and produces the output of whatever nature:
$$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!}
\newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.}
%
\begin{array}{ccccccccccccccc}
\text{input} & & \text{function} & & \text{output} \\
x & \mapsto & \begin{array}{|c|}\hline\quad f \quad \\ \hline\end{array} & \mapsto & y
\end{array}$$
Let's compare the two:
$$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!}
\newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.}
%
\begin{array}{rcccl}
\text{} & & \text{parametric } & & \text{} \\
\text{input} & & \text{curve} & & \text{output} \\
t & \mapsto & \begin{array}{|c|}\hline\quad F \quad \\ \hline\end{array} & \mapsto & X\\
{\bf R}& &&&{\bf R}^m\\
\text{number}& &&&\text{point or vector}
\end{array}\quad
\begin{array}{rcccl}
\text{} & & \text{function of} & & \text{} \\
\text{input} & & \text{two variables} & & \text{output} \\
X & \mapsto & \begin{array}{|c|}\hline\quad f \quad \\ \hline\end{array} & \mapsto & z\\
{\bf R}^m& &&&{\bf R}\\
\text{point or vector}& &&&\text{number}
\end{array}$$
They can be linked up and produce a composition...

But our interest is the latter starting with dimension $m=2$:

The main metaphor for a function of two variables will remain *terrain*:

Here, every pair $X=(x,y)$ represents a location on a map and $z=f(x,y)$ is the elevation of the terrain at that location. This is how it is plotted by a spreadsheet:
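A spreadsheet produces such a plot by tabulating the elevation over a grid of locations. As a minimal sketch (the terrain function here is a hypothetical stand-in, not the one pictured):

```python
import math

# A hypothetical terrain: z is the elevation at map location (x, y).
def elevation(x, y):
    return math.sin(x) * math.cos(y)

# Tabulate z over a grid -- one row per x-value, one column per y-value --
# exactly as a spreadsheet does before plotting the surface.
xs = [i * 0.5 for i in range(-4, 5)]
ys = [j * 0.5 for j in range(-4, 5)]
table = [[elevation(x, y) for y in ys] for x in xs]
```

Each row of `table` is one east-west profile of the terrain; each column is one north-south profile.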

Now *calculus*.

One subject of calculus is change and *motion* and, among other things, we will address the question: if a drop of water lands on this surface, in what direction will it flow? We will also consider the issue of the rate of change of the function -- in any direction.

Secondly, calculus studies tangency and *linear approximations* and, among other things, we will address the question: if we zoom in on a particular location on the surface, what does it look like? The short answer is: like a plane. It is discussed in the next section.

Examples of this issue have been seen previously. Indeed, recall that the *Tangent Problem* asks for a tangent line to a curve at a given point. It has been solved for parametric curves in the last chapter. However, in real life we see *surfaces* rather than curves. The examples are familiar.

**Example.** In which direction will a radar signal bounce off a plane when the plane's surface is curved?

In what direction will light bounce off a curved mirror?

What if it is a whole building?

$\square$

If we zoom in on a surface, it starts to look straight like a *plane*.

Recall the $1$-dimensional case. A linear approximation of a function $y=f(x)$ at $x=a$ is a function that defines a secant line, i.e., a line on the $xy$-plane through the point of interest $(a,f(a))$ and another point on the graph $(x,f(x))$. Its slope is the difference quotient of the function: $$\frac{\Delta f}{\Delta x}=\frac{f(x)-f(a)}{x-a}.$$

Now, let's see how this plan applies to functions of two variables. A linear approximation of a function $z=f(x,y)$ at $(x,y)=(a,b)$ is a function that represents a secant *plane*, i.e., a plane in the $xyz$-space through the point of interest $(a,b,f(a,b))$ and *two* other points on the graph. In order to ensure that these points define a plane, they should be chosen in such a way that they aren't on the same line. The easiest way to accomplish that is to choose the last two to lie in the $x$- and the $y$-directions from $(a,b)$, i.e., $(x,b)$ and $(a,y)$ with $x\ne a$ and $y\ne b$.

The two slopes in these two directions are the two difference quotients, with respect to $x$ and with respect to $y$: $$\frac{\Delta f}{\Delta x}=\frac{f(x,b)-f(a,b)}{x-a} \text{ and } \frac{\Delta f}{\Delta y}=\frac{f(a,y)-f(a,b)}{y-b}.$$
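The two quotients are straightforward to compute; a small sketch with a hypothetical sample function:

```python
# Difference quotients of z = f(x, y) at (a, b): the slopes of the secant
# lines through (a, b, f(a, b)) in the x- and y-directions. The sample
# function below is a hypothetical choice for illustration.

def f(x, y):
    return x * x + y * y

def difference_quotients(f, a, b, x, y):
    dfdx = (f(x, b) - f(a, b)) / (x - a)   # Delta f / Delta x
    dfdy = (f(a, y) - f(a, b)) / (y - b)   # Delta f / Delta y
    return dfdx, dfdy

m, n = difference_quotients(f, 1.0, 2.0, 1.5, 2.5)
```

Here `m` and `n` are the slopes of the two secant lines that will span the secant plane.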

The two lines form our plane but, for approximation, we keep only a triangle.

Furthermore, we build such a triangle in the other three directions along the axes. When done with every sample point on the surface, the result is a *mesh* of triangles:

## 2 Linear functions and planes in ${\bf R}^3$

We will approach the issue in a manner analogous to that for lines.

The standard, slope-intercept, form of the equation of a line in the $xy$-plane is:
$$y=mx+p.$$
A similar, also in some sense *slope-intercept*, form of the equation of a plane in ${\bf R}^3$ is:
$$z=mx+ny+p.$$
Indeed, if we substitute $x=y=0$ we have $z=p$.

Then $p$ is the $z$-intercept! In what sense are $m$ and $n$ slopes? Let's substitute $y=0$ first. We have $z=mx+p$, an equation of a line -- in the $xz$-plane. Its slope is $m$. Now we substitute $x=0$. We have $z=ny+p$, an equation of a line -- in the $yz$-plane. Its slope is $n$. Now, if we cut the plane with any plane parallel to the $xz$-plane, or respectively $yz$-plane, the resulting line has the same slope; for example: $$y=1\ \Longrightarrow\ z=mx+n+p;\quad x=1\ \Longrightarrow\ z=m+ny+p.$$

Therefore, we are justified to say that $m$ and $n$ are the *slopes* of the plane with respect to the variables $x$ and $y$ respectively:
$$\begin{array}{cccccccc}
z=&m&\cdot x&+&n&\cdot y&+&p\\
&x\text{-slope}&&&y\text{-slope}&&&z\text{-intercept}
\end{array}$$
Cutting with a horizontal plane also produces a line:
$$z=1\ \Longrightarrow\ 1=mx+ny+p.$$
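The claim that every cut $y=$ const has the same slope $m$ can be checked numerically; a small sketch with hypothetical coefficients:

```python
# Cutting the plane z = m*x + n*y + p with y = const produces a line in x
# whose slope is always m, whatever the constant. The coefficients below
# are hypothetical.

m, n, p = 3.0, -2.0, 5.0

def z(x, y):
    return m * x + n * y + p

def slope_in_x(y0):
    # slope of the cut y = y0, computed from two points on it
    return (z(1.0, y0) - z(0.0, y0)) / (1.0 - 0.0)

slopes = [slope_in_x(y0) for y0 in (-2.0, 0.0, 1.0, 7.0)]
```

All four cuts report the same slope `m`, independent of the constant `y0`.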

Next, the point-slope form of the equation of a line in the $xy$-plane is:
$$y-b=m(x-a).$$
This is how we can plot this line. We start with the point $(a,b)$ in ${\bf R}^2$. Then we make a step along the $x$-axis with the slope $m$, i.e., we end up at $(a+1,b+m)$ or $(a+1/m,b+1)$, etc. These two points determine the line. There is a similar, also in some sense *point-slope*, form of the equation of a plane. This is the step we make:
$$\begin{array}{r|l}
\text{in }{\bf R}^2&\text{in }{\bf R}^3\\
\hline
\begin{array}{rrr}
\text{point }(a,b)\\
\text{slope }m
\end{array}&
\begin{array}{lll}
\text{point }(a,b,c)\\
\text{slopes }m,n
\end{array}
\end{array}$$
This analogy produces the following formula:
$$z-c=m(x-a)+n(y-b).$$
Expanding it takes us back to the slope-intercept formula.

This is how we can plot this plane. We start by plotting the point $(a,b,c)$ in ${\bf R}^3$. Now we treat one variable at a time. First, we fix $y$ and change $x$. From the point $(a,b,c)$, we make a step along the $x$-axis with the $x$-slope, i.e., we end up at $(a+1,b,c+m)$ or $(a+1/m,b,c+1)$, etc. The equation of this line is: $$z-c=m(x-a),\ y=b.$$ Second, we fix $x$ and change $y$. We make a step along the $y$-axis with the $y$-slope, i.e., we end up at $(a,b+1,c+n)$ or $(a,b+1/n,c+1)$, etc. The equation of this line is: $$x=a,\ z-c=n(y-b).$$ These three points (or those two lines through the same point) determine the plane. Below, we have $$(a,b,c)=(2,4,1),\ m=-1,\ n=-2.$$
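The two steps can be traced numerically with the values above:

```python
# The plane z - c = m*(x - a) + n*(y - b) with the values from the text:
# (a, b, c) = (2, 4, 1), m = -1, n = -2.

a, b, c = 2.0, 4.0, 1.0
m, n = -1.0, -2.0

def z(x, y):
    return c + m * (x - a) + n * (y - b)

# The three points that determine the plane:
P0 = (a, b, z(a, b))            # the base point
P1 = (a + 1, b, z(a + 1, b))    # one step in the x-direction
P2 = (a, b + 1, z(a, b + 1))    # one step in the y-direction
```

The step in the $x$-direction drops the elevation by $1$ (the $x$-slope is $-1$) and the step in the $y$-direction drops it by $2$.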

What is a plane anyway? The planes we have considered so far are the graphs of (linear) functions of two variables:
$$z=f(x,y)=mx+ny+p.$$
They have to satisfy the *Vertical Line Test*. Therefore, the vertical planes -- even the two $xz$- and $yz$-planes -- are excluded. This is very similar to the situation with lines: it is impossible to represent *all* lines in the standard form $y=mx+p$, and we need the general (implicit) equation of a line in ${\bf R}^2$:
$$m(x-a)+n(y-b)=0.$$
The general (implicit) equation of a plane in ${\bf R}^3$ is:
$$m(x-a)+n(y-b)+k(z-c)=0.$$
Let's take a careful look at the left-hand side of this expression. There is a symmetry here over the three coordinates:
$$\begin{array}{lll}
m&\cdot&(x-a)+\\
n&\cdot&(y-b)+\\
k&\cdot&(z-c)=0
\end{array}\quad\leadsto\quad\left[
\begin{array}{lll}m\\n\\k\end{array}\right]\cdot \left[
\begin{array}{lll}x-a\\y-b\\z-c\end{array}\right]=0.$$
This is the *dot product*! The equation becomes:
$$<m,n,k>\cdot <x-a,y-b,z-c>=0,$$
or even better:
$$<m,n,k>\cdot \bigg( (x,y,z)-(a,b,c) \bigg) =0.$$
Finally a coordinate-free version:
$$N\cdot (P-P_0)=0 \text{ or } N\cdot P_0P=0.$$
Here we have in ${\bf R}^3$:

- $P$ is the variable point,
- $P_0$ is the fixed point, and
- $N$ is any vector that somehow represents the slope of the plane.

The meaning of the vector $N$ is revealed once we remember that the dot product of two vectors is $0$ if and only if they are perpendicular.

These vectors $P_0P$ are like a bicycle wheel's *spokes*, all perpendicular to the axle $N$. This idea gives us our definition.

**Definition.** Suppose a point $P_0$ and a non-zero vector $N$ are given. Then the *plane through $P_0$ with normal vector $N$* is the collection of all points $P$ -- and $P_0$ itself -- that satisfy:
$$P_0P\perp N.$$

This definition gives us different results in different dimensions: $$\begin{array}{llll} \text{dimension}&\text{ambient space}&\text{“hyperplane”}&\\ \hline 2&{\bf R}^2&{\bf R}^1&\text{line}\\ 3&{\bf R}^3&{\bf R}^2&\text{plane}\\ 4&{\bf R}^4&{\bf R}^3&\text{--}\\ ...&...&...&...\\ \end{array}$$ A hyperplane is something very “thin” relative to the whole space but not as thin as, say, a curve.

This applicability of the result shows that learning the dot product really pays off!

We will need this formula to study parametric surfaces in the next chapter. In this chapter we limit ourselves to functions of two variables and, therefore, non-vertical planes. What makes a plane non-vertical? A non-zero vertical component of the normal vector. Since lengths don't matter here (only the angles do), we can simply assume that this component is equal to negative one: $$N=<m,n,-1>.$$ Then the equation of a plane simplifies: $$0=N\cdot (P-P_0)=<m,n,-1>\cdot<x-a,y-b,z-c>=m(x-a)+n(y-b)-(z-c),$$ or the familiar $$z=c+m(x-a)+n(y-b).$$ In the vector notation, we have: $$z=c+M\cdot (Q-Q_0).$$ In the case of dimension $2$ we have here:

- $Q=(x,y)$ is the variable point,
- $Q_0=(a,b)$ is the fixed point, and
- $M=<m,n>$ is any vector the components of which are the two slopes of the plane.

This is a *linear function*:
$$z=f(x,y)=c+m(x-a)+n(y-b)=p+mx+ny.$$
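The perpendicularity behind all of this can be verified numerically. For the non-vertical plane $z=c+m(x-a)+n(y-b)$, one consistent choice of normal vector is $<m,n,-1>$ (its opposite $<-m,-n,1>$ serves equally well); a minimal check, reusing the point and slopes from the plotting example:

```python
# For the plane z = c + m*(x - a) + n*(y - b), the vector N = <m, n, -1>
# (or its opposite) is normal: N . P0P = 0 for every point P on the plane.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a, b, c = 2.0, 4.0, 1.0     # base point, borrowed from the earlier example
m, n = -1.0, -2.0           # the two slopes

def z(x, y):
    return c + m * (x - a) + n * (y - b)

N = (m, n, -1.0)

# Check perpendicularity for a few points P on the plane:
checks = []
for x, y in [(0.0, 0.0), (3.0, 4.0), (-1.0, 5.0)]:
    P0P = (x - a, y - b, z(x, y) - c)
    checks.append(abs(dot(N, P0P)) < 1e-9)
```

Every spoke `P0P` lying in the plane is perpendicular to the axle `N`.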

## 3 An example of a non-linear function

What makes a function *linear*? The implicit answer has been: the variable can only be multiplied by a constant and added to a constant. The first part of the answer still applies, even to functions of many variables as it prohibits multiplication by another variable. You can still add them.

The non-linearity of a function of two variables is seen as soon as it is plotted -- its graph is not a plane -- but it can also be detected by *limiting its domain*.

So, this function isn't linear:
$$f(x,y)=xy.$$
In fact, $xy=x^1y^1$ is seen as *quadratic* if we add the powers of $x$ and $y$: $1+1=2$. We come to the same conclusion when we limit the domain of the function to the line $y=x$ in the $xy$-plane; using $y=x$ as a substitution we arrive at a function of one variable:
$$g(x)=f(x,x)=x\cdot x=x^2.$$
So, the part of the graph of $f$ that lies exactly above the line $y=x$ is a *parabola*. And so is the part that lies above the line $y=-x$; it's just open down instead of up:
$$h(x)=f(x,-x)=x\cdot (-x)=-x^2.$$
These two parabolas have a single point in common and therefore make up the essential part of a *saddle point*:

The former parabola gives room for the horse's front and back while the latter for the horseman's legs.
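The two restrictions can be tabulated directly:

```python
# Restricting f(x, y) = x*y to the lines y = x and y = -x produces the two
# parabolas of the saddle: z = x**2 opening up and z = -x**2 opening down.

def f(x, y):
    return x * y

xs = [i * 0.25 for i in range(-8, 9)]
up   = [f(x,  x) for x in xs]   # z = x**2
down = [f(x, -x) for x in xs]   # z = -x**2
```

The two sampled parabolas agree only at the origin, the saddle point itself.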

A simpler way to limit the domain is to fix one independent variable at a time. We fix $y$ first: $$\begin{array}{lrl} \text{plane}&\text{equation}&\text{curve}\\ \hline y=2&z=2x&\text{line with slope }2\\ y=1&z=x&\text{line with slope }1\\ y=0&z=0&\text{line with slope }0\\ y=-1&z=-x&\text{line with slope }-1\\ y=-2&z=-2x&\text{line with slope }-2\\ \end{array}$$ The view shown below is from the direction of the $y$-axis:

The data for each line comes from the $x$-column of the spreadsheet and one of the $z$-columns. These lines give the lines of elevation of this terrain in a particular, say, east-west direction. This is equivalent to cutting the graph by a vertical plane parallel to the $xz$-plane.

We fix $x$ second: $$\begin{array}{lrl} \text{plane}&\text{equation}&\text{curve}\\ \hline x=2&z=2y&\text{line with slope }2\\ x=1&z=y&\text{line with slope }1\\ x=0&z=0&\text{line with slope }0\\ x=-1&z=-y&\text{line with slope }-1\\ x=-2&z=-2y&\text{line with slope }-2\\ \end{array}$$ This is equivalent to cutting the graph by a vertical plane parallel to the $yz$-plane. The view shown below is from the direction of the $x$-axis:

The data for each line comes from the $y$-row of the spreadsheet and one of the $z$-rows. These lines give the lines of elevation of this terrain in a particular, say, north-south direction.

Thus the surface of this graph is made of these straight lines! It's a potato chip:

Another way to analyze the graph is to limit the *range* instead of the domain. We fix the dependent variable this time:
$$\begin{array}{lrl}
\text{elevation}&\text{equation}&\text{curve}\\
\hline
z=2&2=xy&\text{hyperbola}\\
z=1&1=xy&\text{hyperbola}\\
z=0&0=xy&\text{the two axes}\\
z=-1&-1=xy&\text{hyperbola}\\
z=-2&-2=xy&\text{hyperbola}
\end{array}$$
The result is a family of curves:

Each is labelled with the corresponding value of $z$, two branches for each. These lines are the *lines of equal elevation* of this terrain.

We can see how these lines come from cutting the graph by a horizontal plane (parallel to the $xy$-plane).

We can use them to reassemble the surface by lifting each to the elevation indicated by its label:

In the meantime, the colored parts of the graph correspond to *intervals* of outputs.

The surface presented here is called the *hyperbolic paraboloid*.

## 4 Graphs

**Definition.** The *graph* of a function of one variable $y=f(x)$ is the set of all points in ${\bf R}^{2}$ of the form $(x,f(x))$.

In spite of a few exceptions, the graphs of functions of one variable have been *curves*.

**Definition.** The *graph* of a function of two variables $z=f(x,y)$ is the set of all points in ${\bf R}^{3}$ of the form $(x,y,f(x,y))$.

In spite of possible exceptions, the graphs of functions of two variables we encounter will probably be *surfaces*.

It is important to remember that the theory of functions of several variables includes that of functions of one variable! The formula for $f$, for example, may have no mention of $y$, such as $z=f(x,y)=x^2$, etc. The graph of such a function will be featureless, i.e., constant, in the $y$-direction. It will look like it is made of planks, like a park bench.

In fact, the graph can be acquired from the graph of $z=x^2$ (a curve) by shifting it in the $y$-direction producing the surface.

In the last section we fixed one independent variable at a time making a function of *two* variables a function of *one* variable subject to the familiar methods. The idea applies to higher dimensions.

**Definition.** Suppose $z=f(x_1,...,x_n)$ is a function of several variables, i.e., $x_1,...,x_n$ and $z$ are real numbers. Then for each value of $k=1,2,...,n$ and any collection of numbers $x_i=a_i,\ i\ne k$, the numerical function defined by:
$$z=h(x)=f(a_1,...,a_{k-1},x,a_{k+1},...,a_n),$$
is called a *variable function* of $f$. When $n=2$, its graph is called a *variable curve*.

We thus fix all variables but one, making the function of $n$ variables a function of a single variable.
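The construction is mechanical enough to be written generically; a sketch (the helper name `variable_function` is ours, not standard):

```python
# A "variable function" of f: fix every variable except the k-th (k is
# 1-based, as in the definition) and let that one vary.

def variable_function(f, k, anchors):
    # anchors = (a_1, ..., a_n); the k-th entry is replaced by the input x
    def h(x):
        args = list(anchors)
        args[k - 1] = x
        return f(*args)
    return h

# Example: f(x, y) = x*y with y fixed at 2 gives the line z = 2x.
h = variable_function(lambda x, y: x * y, 1, (0.0, 2.0))
```

Here `h` is exactly the numerical function $h(x)=f(x,2)$ from the definition.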

**Exercise.** What happens when $n=1$?

Meanwhile, fixing the value of the dependent variable doesn't have to produce a new function. The result is instead an *implicit relation*. The idea applies to higher dimensions too.

**Definition.** Suppose $z=f(X)$ is a function of several variables, i.e., $X$ belongs to some ${\bf R}^n$ and $z$ is a real number. Then for each value of $z=c$, the subset
$$\{X:f(X)=c\}$$
of ${\bf R}^n$ is called a *level set* of $f$. When $n=2$, it is informally called a *level curve* or a *contour curve*.

**Exercise.** What happens when $n=1$?

In general, a level set doesn't have to be a curve as the example of $f(x,y)=1$ shows.
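Level sets can be sampled by brute force: scan a grid and keep the points where the function value matches the level. A rough sketch, not how plotting software actually traces contours:

```python
# A brute-force sample of a level set: scan a grid of points X and keep
# those where f(X) equals the level c to within a tolerance.

def level_set_sample(f, c, lo, hi, steps, tol):
    pts = []
    h = (hi - lo) / steps
    for i in range(steps + 1):
        for j in range(steps + 1):
            x, y = lo + i * h, lo + j * h
            if abs(f(x, y) - c) <= tol:
                pts.append((x, y))
    return pts

# The level set x*y = 0 is the union of the two axes, not a single curve:
pts = level_set_sample(lambda x, y: x * y, 0.0, -1.0, 1.0, 4, 1e-12)
```

On this $5\times 5$ grid, the sampled points are exactly those on one of the two axes.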

**Theorem.** Level sets don't intersect.

**Proof.** It follows from the *Vertical Line Test*. $\blacksquare$

**Exercise.** Provide the proof.

**Example.** Let's consider:
$$f(x,y)=\sqrt{x^2+y^2}.$$
The level curves are implicit curves and they are plotted point by point...

...unless we recognize them:
$$c=\sqrt{x^2+y^2}\ \Longrightarrow\ c^2=x^2+y^2.$$
They are circles! This surface is a half of a *cone*. How do we know it's a cone and not another shape? We limit the domain to this line on the $xy$-plane:
$$y=mx\ \Longrightarrow\ z=\sqrt{x^2+(mx)^2}=\sqrt{1+m^2}\,|x|.$$
These are V-shaped curves. $\square$
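The radial restriction in the example can be verified numerically:

```python
import math

# Restricting f(x, y) = sqrt(x**2 + y**2) to a line y = m*x gives the
# V-shaped curve z = sqrt(1 + m**2) * |x|, confirming the cone shape.

def f(x, y):
    return math.sqrt(x * x + y * y)

ok = all(
    abs(f(x, m * x) - math.sqrt(1 + m * m) * abs(x)) < 1e-9
    for m in (0.0, 1.0, -2.5)
    for x in (-2.0, 0.5, 3.0)
)
```

The straight-sided V profile along every radial line is what distinguishes the cone from, say, the paraboloid.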

**Example.** Consider this function of two variables:
$$f(x,y)=y^2-x^2.$$
Its level curves are hyperbolas again just as in the first example.

This is the familiar saddle point from the last section. Its variable curves are different simply because the angle of the graph relative to the axes is different. $\square$

**Example.** If we replace $-$ with $+$, the function of two variables becomes
$$f(x,y)=y^2+x^2$$
with a very different graph; it has an *extreme point*. The variable curves are still parabolas but both point upward this time:

The level curves
$$y^2+x^2=c$$
are circles except when $c=0$ (a point) or $c<0$ (empty). However, unlike with the cone, their radii don't grow uniformly with $c$. The surface is called the *paraboloid of revolution*: a parabola through the origin rotated around the $z$-axis produces this surface. $\square$

The method of level curves has been used for centuries to create actual *maps*, i.e., $2$-dimensional visualizations of $3$-dimensional terrains.

The collection of all level curves is called the *contour map* of the function. We will see that zooming in on any point of the contour map is either a generic point with parallel level curves or a singular point exemplified by a saddle point and an extreme point.

The singular points are little islands in the sea of generic points...

Thus, the variable curves are the result of *restricting the domain* of the function while the level curves are the result of *restricting the image*. Either method is applied in the hope of simplifying the function to the degree that makes it subject to the tools we already have. However, the original graph is made of infinitely many of those...

Informally, the meaning of the domain is the same: the set of all possible inputs.

**Definition.** The *natural domain* of a function $z=f(X)$ is the set of all $X$ in ${\bf R}^n$ for which $f(X)$ makes sense.

Just as before, the issue is that of division by zero, square roots, etc. The difference comes from the complexity of the space of inputs: the plane (and further ${\bf R}^n$) vs. the line.

**Example.** The domain of the function
$$f(x,y)=\frac{1}{xy}$$
is the whole plane minus the axes.

$\square$

**Example.** The domain of the function
$$f(x,y)=\sqrt{x-y}$$
is the half of the plane given by the inequality $y\le x$.

$\square$

**Example.** The domain of the function that represents the magnitude of the gravitational force is all points but $0$:

It is a multiple of the function: $$f(X)=\frac{1}{d(X,0)}.$$ $\square$

**Exercise.** Provide the level and the variable curves for the function that represents the magnitude of the gravitational force.

**Definition.** The *graph* of a function $z=f(X)$, where $X$ belongs to ${\bf R}^{n}$, is the set of all points in ${\bf R}^{n+1}$ so that the first $n$ coordinates are those of $X$ and the last is $f(X)$.

We have used the restriction of the image and the level curves that come from it as a tool of *reduction*. We reduce the study of a complex object -- the graph of $z=f(x,y)$ -- to a collection of simpler ones -- implicit curves $c=f(x,y)$. The idea is even more useful in the higher dimensions. In fact, we can't simply plot the graph of a function of three variables anymore -- it is located in ${\bf R}^4$ -- as we did for functions of two variables. Level sets are the only way to visualize it.

The domains of functions of three variables are in our physical space. They may represent:

- the temperature or the humidity,
- the air or water pressure,
- the magnitude of a force (such as gravitation), etc.

**Example.** The level sets of a linear function
$$f(x,y,z)=A+mx+ny+kz$$
are planes. They are all parallel to each other because they have the same normal vector $<m,n,k>$. $\square$

**Example.** We start with a familiar function of *two* variables,
$$f(x,y)=\sin(xy),$$
and subtract $z$ as a third variable, producing a new function of *three* variables:
$$h(x,y,z)=\sin(xy)-z.$$
Then each of its level sets is given by:
$$d=\sin(xy)-z,$$
for some real $d$. What is this? We can make the relation explicit:
$$z=\sin(xy)-d.$$
This is nothing but the graph of $f$ shifted down by $d$ (or up, when $d$ is negative)!

The function is decreasing as we move in the direction of $z$. $\square$

**Example.** Let's consider this function of three variables:
$$f(x,y,z)=\sqrt{x^2+y^2+z^2}.$$
Then each of its level sets is given by this implicit equation:
$$d^2=x^2+y^2+z^2,$$
for some real $d\ge 0$. Each is an equation of the sphere of radius $d$ centered at $0$ (and the origin itself when $d=0$). They are concentric!

The radii also grow uniformly with $d$. So, the function is growing -- and at a constant rate -- as we move in any direction away from $0$. What is this function? It's the *distance function* itself:
$$f(X)=f(x,y,z)=\sqrt{x^2+y^2+z^2}=||X||.$$

The magnitude of the gravitational force also has concentric spheres as its level surfaces, but their radii do not change uniformly with $d$. The value of the function is decreasing... $\square$

This is the best we can do with functions of three variables. Beyond $3$ variables, we are out of dimensions...

The approach via fixing the independent variables is considered later.

## 5 Limits

We now study small scale behavior of functions; we zoom in on a single point of the graph. Just as in the lower dimensions, one of the most crucial properties of a function is the integrity of its graph: *is there a break or a cut or a hole?* For example, if we think of the graph as a terrain, is there a vertical drop?

We approach the issue via the *limits*.

In spite of all the differences between the functions we have seen -- such as parametric curves vs. functions of several variables -- the idea of limit is identical: as the input approaches a point, the output is also forced to approach a point. Here is the context, with the arrow “$\to$” read as “approaches”: $$\begin{array}{cccc} &\text{numerical functions}\\ &y=f(x)\to l\text{ as }x\to a\\ \text{parametric curves} && \text{functions of several variables}\\ Y=F(t)\to L\text{ as }t\to a && z=f(X)\to l\text{ as }X\to A\\ \end{array}$$ We use lower case for scalars and upper case for anything multi-dimensional (points or vectors). We can then see how the complexity shifts from the output to the input. And so does the challenge of multi-dimensionality. We are ready though; this is the familiar meaning of convergence of a sequence in ${\bf R}^m$ to be used throughout: $$X_n\to A\ \Longleftrightarrow\ d(X_n,A)\to 0 \text{ or }||X_n-A||\to 0.$$

**Definition.** The *limit of a function* $z=f(X)$ at a point $X=A$ is defined to be the limit
$$\lim_{n\to \infty} f(X_n)$$
considered for all sequences $\{X_n\}$ within the domain of $f$ excluding $A$ that converge to $A$,
$$A\ne X_n\to A \text{ as } n\to \infty,$$
when all these limits exist and are equal to each other. In that case, we use the **notation**:
$$\lim_{X\to A} f(X).$$
Otherwise, *the limit does not exist*.
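The definition can be probed numerically: feed a sequence $X_n\to A$ into $f$ and watch the outputs. A sketch with a hypothetical function (undefined at the origin), not one from the text:

```python
# The sequential definition in practice: feed sequences X_n -> (0, 0) into
# f and watch the outputs f(X_n). The function here is a hypothetical probe.

def f(x, y):
    return (x * x - y * y) / (x * x + y * y)   # undefined at (0, 0)

def probe(path, n_terms=6):
    # path(n) produces the n-th term of a sequence X_n -> (0, 0)
    return [f(*path(n)) for n in range(1, n_terms + 1)]

along_x = probe(lambda n: (1.0 / n, 0.0))   # every term is 1
along_y = probe(lambda n: (0.0, 1.0 / n))   # every term is -1
```

Two sequences converging to the same point produce two different output limits, so for this function the limit at $(0,0)$ does not exist.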

We use this construction to understand what is happening to $z=f(X)$ when a point $X$ is in the vicinity of a chosen point $X=A$, where $f$ might be undefined. We start with an arbitrary sequence on the $xy$-plane that converges to this point, $X_n\to A$, then go vertically from each of these points to find the corresponding points on the graph of the function, $(X_n,f(X_n))$, and finally plot the output values (just numbers) on the $z$-axis, $z_n=f(X_n)$.

Is there a limit of this sequence? What about other sequences? Are all these limits the same?

The ability to approach the point from different directions had to be dealt with even for numerical functions: $$\operatorname{sign}(x)\to -1 \text{ as } x\to 0^- \text{ but } \operatorname{sign}(x)\to 1 \text{ as } x\to 0^+.$$ In the multi-dimensional case things are even more complicated as we can approach a point on the plane from infinitely many directions.

**Example.** It is easy to come up with an example of a function that has different limits from different directions. We take one that will be important in the future:
$$\lim_{(x,y)\to (0,0)}\frac{|2x+y|}{\sqrt{x^2+y^2}}.$$
By the way, this is how one might try to calculate the “slope” (rise over the run) at $0$ of the plane $z=2x+y$. First, we approach along the $x$-axis, i.e., $y$ is fixed at $0$:
$$\lim_{x\to 0}\frac{|2x+0|}{\sqrt{x^2+0^2}}=\lim_{x\to 0}\frac{|2x|}{|x|}=2.$$
Second, we approach along the $y$-axis, i.e., $x$ is fixed at $0$:
$$\lim_{y\to 0}\frac{|2\cdot 0+y|}{\sqrt{0^2+y^2}}=\lim_{y\to 0}\frac{|y|}{|y|}=1.$$
There can't be just one slope for a plane... $\square$
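The two directional limits in this example can be traced numerically:

```python
import math

# The quotient |2x + y| / sqrt(x**2 + y**2) from the example: along the
# x-axis it is |2x|/|x| = 2, along the y-axis it is |y|/|y| = 1.

def q(x, y):
    return abs(2 * x + y) / math.sqrt(x * x + y * y)

along_x = [q(1.0 / n, 0.0) for n in (1, 10, 100)]
along_y = [q(0.0, 1.0 / n) for n in (1, 10, 100)]
```

The values along the two axes settle at $2$ and $1$ respectively, so the two-dimensional limit cannot exist.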

Things might be bad in an even more subtle way.

**Example.** Not all functions of two variables have graphs like the one above... Let's consider this function:
$$f(x,y)=\frac{x^2y}{x^4+y^2}.$$
and its limit $(x,y)\to (0,0)$. Does it exist? It all depends on how $X$ approaches $A=0$:

The two green horizontal lines indicate that we get $0$ if we approach $0$ from either the $x$- or the $y$-direction. That's the horizontal cross we see on the surface (just like the one in the hyperbolic paraboloid). In fact, any (linear) direction is OK: $$y=mx\ \Longrightarrow\ f(x,mx)=\frac{x^2mx}{x^4+(mx)^2}=\frac{mx^3}{x^4+m^2x^2}=\frac{mx}{x^2+m^2}\to 0 \text{ as }x\to 0.$$ So, the limits from all directions are the same! However, there seem to be two curved cliffs, on the left and right, visible from above... What if we approach $0$ along the parabola $y=x^2$ instead of a straight line? The result is surprising: $$y=x^2\ \Longrightarrow\ f(x,x^2)=\frac{x^2x^2}{x^4+(x^2)^2}=\frac{x^4}{2x^4}=\frac{1}{2}.$$ This is the height of the cliffs! So, this limit is different from the others and, according to our definition, the limit of the function does not exist: $$\lim_{(x,y)\to (0,0)}\frac{x^2y}{x^4+y^2}\ DNE.$$ In fact, the illustration created by the spreadsheet attempts to make an unbroken surface from these points while in reality there is no passage between the two cliffs, as we just demonstrated.

$\square$
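The contrast between the straight-line and parabolic approaches can be checked directly:

```python
# The example above, traced numerically: along every straight line y = m*x
# the values tend to 0, yet on the parabola y = x**2 they sit at 1/2.

def f(x, y):
    return (x * x * y) / (x ** 4 + y * y)

on_lines = [f(0.01, m * 0.01) for m in (1.0, -3.0, 0.5)]   # all near 0
on_parabola = [f(x, x * x) for x in (0.5, 0.1, 0.01)]      # all equal 1/2
```

The parabolic path sees the constant value $1/2$ no matter how close to the origin it gets, while every straight-line path sees values shrinking to $0$.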

So, not only do we have to approach the point from all directions at once, but also along every possible path.

A simpler but very important conclusion is that studying functions of several variables one variable at a time might be insufficient or even misleading.

A special note about the limit of a function at a point that lies on the *boundary* of the domain... It doesn't matter! The case of one-sided limits at $a$ or $b$ of the domain $[a,b]$ is now included in our new definition.

The algebraic theory of limits is almost identical to that for numerical functions. There are just as many rules because whatever you can do with the outputs you can do with the functions.

We will use the algebraic properties of the limits of sequences -- of numbers -- to prove virtually identical facts about limits of functions.

**Theorem (Algebra of Limits of Sequences).** Suppose $a_n\to a$ and $b_n\to b$. Then
$$\begin{array}{|ll|ll|}
\hline
\text{SR: }& a_n + b_n\to a + b& \text{CMR: }& c\cdot a_n\to ca& \text{ for any real }c\\
\text{PR: }& a_n \cdot b_n\to ab& \text{QR: }& a_n/b_n\to a/b &\text{ provided }b\ne 0\\
\hline
\end{array}$$

Each property is matched by its analog for functions.

**Theorem (Algebra of Limits of Functions).** Suppose $f(X)\to a$ and $g(X)\to b$ as $X\to A$ . Then
$$\begin{array}{|ll|ll|}
\hline
\text{SR: }& f(X)+g(X)\to a+b & \text{CMR: }& c\cdot f(X)\to ca& \text{ for any real }c\\
\text{PR: }& f(X)\cdot g(X)\to ab& \text{QR: }& f(X)/g(X)\to a/b &\text{ provided }b\ne 0\\
\hline
\end{array}$$

Note that there were no PR or QR for the parametric curves...

Let's consider them one by one.

Now, limits behave well with respect to the usual arithmetic operations.

**Theorem (Sum Rule).** If the limits at $A$ of functions $f(X),g(X)$ exist then so does that of their sum, $f(X) + g(X)$, and the limit of the sum is equal to the sum of the limits:
$$\lim_{X\to A} (f(X) + g(X)) = \lim_{X\to A} f(X) + \lim_{X\to A} g(X).$$

Since the outputs and the limits are just numbers, in the case of infinite limits we follow the same rules of the algebra of infinities as in Part I: $$\begin{array}{lll} \text{number } &+& (+\infty)&=+\infty\\ \text{number } &+& (-\infty)&=-\infty\\ +\infty &+& (+\infty)&=+\infty\\ -\infty &+& (-\infty)&=-\infty\\ \end{array}$$

The proofs of the rest of the properties are identical.

**Theorem (Constant Multiple Rule).** If the limit at $X=A$ of function $f(X)$ exists then so does that of its multiple, $c f(X)$, and the limit of the multiple is equal to the multiple of the limit:
$$\lim_{X\to A} c f(X) = c \cdot \lim_{X\to A} f(X).$$

**Theorem (Product Rule).** If the limits at $X=A$ of functions $f(X),g(X)$ exist then so does that of their product, $f(X) \cdot g(X)$, and the limit of the product is equal to the product of the limits:
$$\lim_{X\to A} (f(X) \cdot g(X)) = (\lim_{X\to A} f(X))\cdot( \lim_{X\to A} g(X)).$$

**Theorem (Quotient Rule).** If the limits at $X=A$ of functions $f(X) ,g(X)$ exist then so does that of their ratio, $f(X) / g(X)$, provided $\lim_{X\to A} g(X) \ne 0$, and the limit of the ratio is equal to the ratio of the limits:
$$\lim_{X\to A} \left(\frac{f(X)}{g(X)}\right) = \frac{\lim\limits_{X\to A} f(X)}{\lim\limits_{X\to A} g(X)}.$$

Just as with sequences, we can represent these rules as commutative diagrams: $$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\la}[1]{\!\!\!\!\!\xleftarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} \newcommand{\ua}[1]{\left\uparrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} \begin{array}{ccc} f,g&\ra{\lim}&l,m\\ \ \da{+}&SR &\ \da{+}\\ f+g & \ra{\lim}&\lim(f+g)=l+m \end{array}$$

**Example.**
$$\lim_{(x,y)\to (0,0)}\frac{x^2+y^2}{x+y}=...$$
$\square$

We can take advantage of the fact that the domain ${\bf R}^m$ of $f$ can also be seen as made of *vectors* and vectors are subject to algebraic operations.

**Theorem (Alternative formula for limits).** The limit of a function $z=f(X)$ at $X=A$ is equal to $l$ if and only if
$$\lim_{||H|| \to 0} f(A + H) = l.$$

The next result stands virtually unchanged from Part I.

**Theorem (Squeeze Theorem).** If a function is squeezed between two functions with the same limit at a point, its limit also exists and is equal to that number; i.e., if
$$f(X) \leq h(X) \leq g(X) ,$$
for all $X$ within some distance from $X=A$, and
$$\lim_{X\to A} f(X) = \lim_{X\to A} g(X) = l,$$
then the following limit exists and is equal to that number:
$$\lim_{X\to A} h(X) = l.$$
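To illustrate numerically (a Python sketch; the function $h$ and the bounding functions are our own choices, not from the text): the function $h(x,y)=\frac{x^2y^2}{x^2+y^2}$ satisfies $0\le h(x,y)\le x^2$, since $\frac{y^2}{x^2+y^2}\le 1$, so the Squeeze Theorem forces $h(x,y)\to 0$ as $(x,y)\to (0,0)$:

```python
# Numerical illustration of the Squeeze Theorem: for
# h(x, y) = x^2 * y^2 / (x^2 + y^2) we have 0 <= h(x, y) <= x^2,
# so h -> 0 as (x, y) -> (0, 0).

def h(x, y):
    return x**2 * y**2 / (x**2 + y**2)

samples = [(0.1, 0.2), (0.01, -0.03), (-0.001, 0.002)]
for x, y in samples:
    assert 0 <= h(x, y) <= x**2  # the squeeze holds at each sample
    print(h(x, y))  # values shrink toward 0
```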

The easiest way to handle limits is *coordinate-wise*, when possible.

**Theorem.** If the limit of a function of several variables exists then it exists with respect to each of the variables; i.e., for any function $z=f(X)=f(x_1,...,x_m)$ and any $A=(a_1,...,a_m)$ in ${\bf R}^m$, we have
$$f(X)\to l \text{ as }X\to A\ \Longrightarrow\ f(a_1,...,a_{k-1},x,a_{k+1},...,a_m)\to l \text{ as }x\to a_k \text{ for each } k=1,2,...,m.$$

The converse isn't true!

**Example.** Recall that this function has limits at $(0,0)$ with respect to either of the variables:
$$f(x,y)=\frac{x^2y}{x^4+y^2},$$
but the limit as $(x,y)\to (0,0)$ does not exist. $\square$

So, we have to establish first that the limit exists -- in the “omni-directional” sense -- and only then can we use the limits with respect to each variable -- in the “uni-directional” sense -- to find it.
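A numerical experiment (a Python sketch; the sample paths are our choice) makes the failure visible: along either axis the function above is identically $0$, but along the parabola $y=x^2$ its values stay at $1/2$:

```python
# Numerical check (not a proof) that the limit of f(x, y) = x^2*y / (x^4 + y^2)
# at (0, 0) depends on the path of approach: it is 0 along either axis,
# but constantly 1/2 along the parabola y = x^2.

def f(x, y):
    return x**2 * y / (x**4 + y**2)

# Along the axes the function is identically 0, so sample along y = x^2:
along_parabola = [f(x, x**2) for x in (0.1, 0.01, 0.001)]
print(along_parabola)  # each value is (approximately) 0.5
```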

## 6 Continuity

The idea of continuity is identical to that for numerical functions or parametric curves: as the input approaches a point, the output is forced to approach the value of the function at that point. The concept flows from the idea of limit just as before.

**Definition.** A function $z=f(X)$ is called *continuous at point* $X=A$ if

- $f(X)$ is defined at $X=A$,
- the limit of $f$ exists at $A$, and
- the two are equal to each other:

$$\lim_{X\to A}f(X)=f(A).$$
A function is *continuous* if it is continuous at every point of its domain.

Thus, the limits of continuous functions can be found by *substitution*.
$$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!}
\newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.}
%
\begin{array}{|c|}\hline\quad \lim_{X\to A}f(X)=f(A) \quad \\ \hline\end{array} \text{ or } \begin{array}{|c|}\hline\quad f(X)\to f(A) \text{ as } X\to A.\quad \\ \hline\end{array} $$

Equivalently, a function $f$ is continuous at $A$ if $$\lim_{n\to \infty}f(X_n)=f(A),$$ for any sequence $X_n\to A$.

A typical function we encounter is continuous at every point of its domain.

**Theorem.** Suppose $f$ and $g$ are continuous at $X=A$. Then so are the following functions:

- 1. (SR) $f\pm g$,
- 2. (CMR) $c\cdot f$ for any real $c$,
- 3. (PR) $f \cdot g$, and
- 4. (QR) $f/g$ provided $g(A)\ne 0$.

As an illustration, we can say that if the floor and the ceiling of a cave, represented by $f$ and $g$ respectively, are changing continuously then so is its height, which is $g-f$:

Or, if the floor and the ceiling ($-f$ and $g$) are changing continuously then so is the height ($g+f$). And so on.

There are two ways to compose a function of several variables with another function...

**Theorem (Composition Rule I).** If the limit at $X=A$ of function $z=f(X)$ exists and is equal to $l$ then so does that of its composition with any numerical function $u=g(z)$ continuous at $z=l$ and
$$\lim_{X\to A} (g\circ f)(X) = g(l).$$

**Proof.** Suppose we have a sequence,
$$X_n\to A.$$
Then, we also have another sequence,
$$b_n=f(X_n).$$
The condition $f(X)\to l$ as $X\to A$ is restated as follows:
$$b_n\to l \text{ as } n\to \infty.$$
Therefore, continuity of $g$ implies,
$$g(b_n)\to g(l)\text{ as } n\to \infty.$$
In other words,
$$(g\circ f)(X_n)=g(f(X_n))\to g(l)\text{ as } n\to \infty.$$
Since sequence $X_n\to A$ was chosen arbitrarily, this condition is restated as,
$$(g\circ f)(X)\to g(l)\text{ as } X\to A.$$
$\blacksquare$

We can re-write the result as follows: $$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} % \begin{array}{|c|}\hline\quad \lim_{X\to A} (g\circ f)(X) = g(l)\Bigg|_{l=\lim_{X\to A} f(X)} \quad \\ \hline\end{array} $$ Furthermore, we have $$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} % \begin{array}{|c|}\hline\quad \lim_{X\to A} g\bigg( f(X) \bigg) = g\left( \lim_{X\to A} f(X) \right) \quad \\ \hline\end{array} $$

**Theorem (Composition Rule II).** If the limit at $t=a$ of a parametric curve $X=F(t)$ exists and is equal to $L$ then so does that of its composition with any function $z=g(X)$ of several variables continuous at $X=L$ and
$$\lim_{t\to a} (g\circ F)(t) = g(L).$$

**Proof.** Suppose we have a sequence,
$$t_n\to a.$$
Then, we also have another sequence,
$$B_n=F(t_n).$$
The condition $F(t)\to L$ as $t\to a$ is restated as follows:
$$B_n\to L \text{ as } n\to \infty.$$
Therefore, continuity of $g$ implies,
$$g(B_n)\to g(L)\text{ as } n\to \infty.$$
In other words,
$$(g\circ F)(t_n)=g(F(t_n))\to g(L)\text{ as } n\to \infty.$$
Since sequence $t_n\to a$ was chosen arbitrarily, this condition is restated as,
$$(g\circ F)(t)\to g(L)\text{ as } t\to a.$$
$\blacksquare$

We can re-write the result as follows: $$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} % \begin{array}{|c|}\hline\quad \lim_{t\to a} (g\circ F)(t) = g(L)\Bigg|_{L=\lim_{t\to a} F(t)} \quad \\ \hline\end{array} $$ Furthermore, we have: $$\newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} % \begin{array}{|c|}\hline\quad \lim_{t\to a} g\bigg( F(t) \bigg) = g\left( \lim_{t\to a} F(t) \right) \quad \\ \hline\end{array} $$

**Corollary.** The composition $g\circ f$ of a function $f$ continuous at $x=a$ and a function $g$ continuous at $y=f(a)$ is continuous at $x=a$.

The easiest way to handle continuity is *coordinate-wise*, when possible.

**Theorem.** If a function of several variables is continuous then it is also continuous with respect to each of the variables.

The converse isn't true!

**Example.** Recall that this function (with $f(0,0)=0$) is continuous at $(0,0)$ with respect to either of the variables:
$$f(x,y)=\frac{x^2y}{x^4+y^2},$$
but it is not continuous at $(0,0)$: the limit as $(x,y)\to (0,0)$ simply does not exist. $\square$

So, we have to establish continuity -- in the sense of the “omni-directional” limit -- first, and only then can we draw conclusions about continuity with respect to each variable -- in the “uni-directional” sense.

A special note about the continuity of a function at a point that lies on the *boundary* of the domain... Once again, it doesn't matter! The case of one-sided continuity at $a$ or $b$ of the domain $[a,b]$ is now included in our new definition.

The definition of continuity is purely *local*: only the behavior of the function in an arbitrarily small vicinity of the point matters. If the function is continuous on a whole set, what can we say about its *global* behavior?

Our understanding of continuity of numerical functions has been as the property of having *no gaps in their graphs*.

This idea is more precisely expressed by the *Intermediate Value Theorem*:

- if a function $f$ is defined and is continuous on an interval $[a,b]$, then for any $c$ between $f(a)$ and $f(b)$, there is $d$ in $[a,b]$ such that $f(d) = c$.

An often more convenient way to state it is:

- if the domain of a continuous function is an interval then so is its image.

Now the plane is more complex than a line and we can't limit ourselves to intervals. But what is the analog of an interval in a multidimensional space?

**Theorem (Intermediate Value Theorem for functions of several variables).** Suppose a function $f$ is defined and is continuous on a set that contains the path $C$ of a continuous parametric curve. Then the image of this path under $f$ is an interval.

**Proof.** It follows from the *Composition Rule II* and the *Intermediate Value Theorem* for numerical functions. $\blacksquare$

The theorem says that there are no missing values in the image of such a set.

**Exercise.** Show that the converse of the theorem isn't true.

Recall that a function $f$ is called *bounded* on a set $S$ in ${\bf R}^n$ if its image is bounded, i.e., there is such a real number $Q$ that
$$|f(X)| \le Q$$
for all $X$ in $S$.

**Theorem.** If the limit at $X=A$ of function $z=f(X)$ exists then $f$ is bounded on some open disk that contains $A$:
$$\lim_{X\to A}f(X) \text{ exists }\ \Longrightarrow\ |f(X)| \le Q$$
for all $X$ with $d(X,A)<\delta$ for some $\delta >0$ and some $Q$.

**Exercise.** Prove the theorem.

The *global* version of the above theorem guarantees that the function is bounded under certain circumstances.

The version of the theorem for numerical functions is simply: a function continuous on $[a,b]$ is bounded.

But what is the multi-dimensional analog of a closed bounded interval?

We already know that a set $S$ in ${\bf R}^n$ is *bounded* if it fits in a sphere (or a box) of a large enough size:
$$||X|| < Q \text{ for all } X \text{ in } S;$$
and a set in ${\bf R}^n$ is called *closed* if it contains the limits of all of its convergent sequences.

**Theorem (Boundedness).** A function continuous on a closed bounded set is bounded.

**Proof.** Suppose, to the contrary, that $z=f(X)$ is unbounded on set $S$. Then there is a sequence $\{X_n\}$ in $S$ such that $f(X_n)\to \infty$. Then, by the *Bolzano-Weierstrass Theorem*, sequence $\{X_n\}$ has a convergent subsequence $\{Y_k\}$:
$$Y_k \to Y.$$
This point belongs to $S$! From the continuity, it follows that
$$f(Y_k) \to f(Y).$$
This contradicts the fact that $\{f(Y_k)\}$ is a subsequence of $\{f(X_n)\}$, which diverges to $\infty$. $\blacksquare$

**Exercise.** Why are we justified to conclude in the proof that the limit $Y$ of $\{Y_k\}$ is in $S$?

Is the image of a *closed* set closed? If it is, the function *reaches its extreme values*, i.e., the least upper bound $\sup$ and the greatest lower bound $\inf$.

**Definition.** Given a function $z=f(X)$. Then $X=D$ is called a *global maximum point* of $f$ on set $S$ if
$$f(D)\ge f(X) \text{ for all } X \text{ in }S;$$
and $X=C$ is called a *global minimum point* of $f$ on set $S$ if
$$f(C)\le f(X) \text{ for all } X \text{ in }S.$$
(They are also called *absolute maximum and minimum points*.) Collectively they are all called *global extreme points*.

Just because something is described doesn't mean that it can be found.

**Theorem (Extreme Value Theorem).** A continuous function on a bounded closed set has a global maximum and a global minimum, i.e., if $z=f(X)$ is continuous on a bounded closed set $S$, then there are $C,D$ in $S$ such that
$$f(C)\le f(X) \le f(D),$$
for all $X$ in $S$.

**Proof.** It follows from the *Bolzano-Weierstrass Theorem*. $\blacksquare$
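The theorem is non-constructive, but on a closed bounded set a brute-force grid search will locate the extreme points approximately. A Python sketch, with an example function of our choosing on the closed unit square:

```python
# Locating global extreme points of a continuous function on a closed bounded
# set by a grid search: f(x, y) = x + y on the square [0, 1] x [0, 1].
# The Extreme Value Theorem guarantees the extreme points exist.

def f(x, y):
    return x + y

n = 100  # grid resolution
points = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]
max_point = max(points, key=lambda p: f(*p))
min_point = min(points, key=lambda p: f(*p))
print(max_point, f(*max_point))  # (1.0, 1.0) 2.0
print(min_point, f(*min_point))  # (0.0, 0.0) 0.0
```

For this function the extreme points lie at corners of the square, so the grid finds them exactly; in general the grid only approximates them.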

**Definition.** Given a function $z=f(X)$. Then $z=M$ is called the *global maximum value* of $f$ on set $S$ if $M=f(D)$ for some global maximum point $D$, i.e.,
$$M\ge f(X) \text{ for all } X \text{ in }S;$$
and $z=m$ is called the *global minimum value* of $f$ on set $S$ if $m=f(C)$ for some global minimum point $C$, i.e.,
$$m\le f(X) \text{ for all } X \text{ in }S.$$
(They are also called *absolute maximum and minimum values*.) Collectively they are called *global extreme values*.

Then *the* global max (or min) value is reached by the function at *any* of its global max (or min) points.

Note that the reason we need the *Extreme Value Theorem* is to ensure that the optimization problem we are facing has a solution.

We can define limits and continuity of functions of several variables without invoking limits of sequences. Let's re-write what we want to say about the meaning of the limits in progressively more and more precise terms. $$\begin{array}{l|ll} X&z=f(X)\\ \hline \text{As } X\to A, & \text{we have } z\to l.\\ \text{As } X\text{ approaches } A, & z\text{ approaches } l. \\ \text{As } \text{the distance from }X \text{ to } A \text{ approaches } 0, & \text{the distance from }z \text{ to } l \text{ approaches } 0. \\ \text{As } d(X,A)\to 0, & \text{we have } |z-l|\to 0.\\ \text{By making } d(X,A) \text{ smaller and smaller},& \text{we make } |z-l| \text{ as small as needed}.\\ \text{By making } d(X,A) \text{ less than some } \delta>0 ,& \text{we make } |z-l| \text{ smaller than any given } \varepsilon>0. \end{array}$$

**Definition.** The *limit of function* $z=f(X)$ at $X=A$ is a number $l$, if exists, such that for any $\varepsilon >0$ there is such a $\delta> 0$ that
$$0<d(X,A)<\delta \ \Longrightarrow\ |f(X)-l|<\varepsilon.$$

This is the geometric meaning of the definition: if $X$ is within $\delta$ from $A$, then $f(X)$ is supposed to be within $\varepsilon$ from $l$. In other words, this part of the graph fits within the $\varepsilon$-band around the *plane* $z=l$.
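A numerical spot-check of the definition (a Python sketch; the function and the choice $\delta=\varepsilon/2$ are ours) for $f(x,y)=x+y$ at $A=(0,0)$ with $l=0$: since $|x+y|\le |x|+|y|\le 2\,d(X,A)$, taking $\delta=\varepsilon/2$ works.

```python
import random

# Spot-check of the epsilon-delta definition for f(x, y) = x + y at (0, 0)
# with limit l = 0: delta = epsilon / 2 suffices since |x + y| <= 2 * d(X, A).

def f(x, y):
    return x + y

def within_epsilon(epsilon, trials=1000):
    delta = epsilon / 2
    random.seed(0)
    for _ in range(trials):
        x = random.uniform(-delta, delta)
        y = random.uniform(-delta, delta)
        if 0 < (x * x + y * y) ** 0.5 < delta:  # X is delta-close to A
            if not abs(f(x, y) - 0) < epsilon:
                return False
    return True

print(within_epsilon(0.01))  # True
```

Random sampling cannot prove the implication, of course; it only fails to find a counterexample, while the inequality above proves it.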

## 7 Differentiability and linear approximations

This is the build-up for the introduction of the derivative of functions of two variables:

- the slopes of a surface in different directions,
- secant lines,
- secant planes,
- planes and ways to represent them, and finally
- limits.

Due to the complexity of the problem we will focus on the *derivative at a single point* for now.

Let's review what we know about the derivatives so far. The definition of the derivative of a parametric curve (at a point) is virtually identical to that for a numerical function because the fraction of the difference quotient is allowed by vector algebra: $$\begin{array}{cccc} &\text{numerical functions}\\ &f'(a)=\lim_{x\to a}\frac{f(x)-f(a)}{x-a}\\ \text{parametric curves} && \text{functions of several variables}\\ F'(a)=\lim_{t\to a}\frac{F(t)-F(a)}{t-a} && f'(A)=\lim_{X\to A}\frac{f(X)-f(A)}{X-A}\ ???\\ \end{array}$$ The same formula fails for a function of several variables. We can't divide by a vector! This failure is the reason why we start studying multi-dimensional calculus with parametric curves and not functions of several variables.

Can we fix the definition?

How about we divide by the *magnitude* of this vector? This is allowed by vector algebra and the result is something similar to the rise over the run definition of the slope.

**Example.** Let's carry out this idea for a linear function, the simplest kind of function:
$$f(x,y)=2x+y.$$
The limit:
$$\lim_{(x,y)\to (0,0)}\frac{|2x+y|}{\sqrt{x^2+y^2}},$$
is to be evaluated along either of the axes:
$$\lim_{x\to 0}\frac{|2x+0|}{\sqrt{x^2+0^2}}=\lim_{x\to 0}\frac{|2x|}{|x|}=2\ \ne\ \lim_{y\to 0}\frac{|2\cdot 0+y|}{\sqrt{0^2+y^2}}=\lim_{y\to 0}\frac{|y|}{|y|}=1.$$
The limit doesn't exist because the slopes are different in different directions... $\square$
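The computation can be replayed numerically (a Python sketch): the ratio is constant along each line through the origin, but the constant depends on the line, so no single limit exists.

```python
import math

# The ratio |2x + y| / ||(x, y)|| for f(x, y) = 2x + y takes different
# constant values along different lines through the origin.

def slope_ratio(x, y):
    return abs(2 * x + y) / math.hypot(x, y)

along_x_axis = [slope_ratio(t, 0.0) for t in (0.1, 0.01, 0.001)]
along_y_axis = [slope_ratio(0.0, t) for t in (0.1, 0.01, 0.001)]
print(along_x_axis)  # constant 2.0
print(along_y_axis)  # constant 1.0
```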

The line of attack on the meaning of the derivative then shifts -- to finding that tangent plane. Some of the functions shown in the last section have no tangent planes at their points of discontinuity. But such a plane is just a linear function! As such, it is a linear approximation of the function.

In the $1$-dimensional case, among the linear approximations of a function $y=f(x)$ at $x=a$ are the functions that define secant lines, i.e., lines on the $xy$-plane through the point of interest $(a,f(a))$ and another point on the graph $(x,f(x))$. Its slope is the difference quotient of the function: $$\frac{\Delta f}{\Delta x}=\frac{f(x)-f(a)}{x-a}.$$ In general, they are just linear functions and their graphs are lines through $(a,f(a))$; in the point-slope form they are: $$l(x)= f(a)+ m(x - a). $$ When you zoom in on the point, the tangent line -- but no other line -- will merge with the graph:

We reformulate our theory from Part I slightly.

**Definition.** Suppose $y=f(x)$ is defined at $x=a$ and
$$l(x)=f(a)+m(x-a)$$
is any of its linear approximations at that point. Then, $y=l(x)$ is called the *best linear approximation* of $f$ at $x=a$ if the following is satisfied:
$$\lim_{x\to a} \frac{ f(x) -l(x) }{|x-a|}=0.$$

The definition is about the decline of the error of the approximation, i.e., the difference between the two functions: $$\text{error } = | f(x) -l(x) | . $$

Thus, not only does the error vanish, but it also vanishes relative to how close we are to the point of interest, i.e., the run. The following result is from Part II.

**Theorem.** If
$$l(x)=f(a)+m(x-a)$$
is the best linear approximation of $f$ at $x=a$, then
$$m=f'(a).$$

Therefore, the graph of this function is the tangent line.

For a function of two variables, we think by analogy. The linear approximations of a function $z=f(x,y)$ at $(x,y)=(a,b)$ are linear functions and some of them represent secant *planes*. There are two slopes in the two directions equal to the two difference quotients:
$$\frac{\Delta f}{\Delta x}=\frac{f(x,b)-f(a,b)}{x-a} \text{ and } \frac{\Delta f}{\Delta y}=\frac{f(a,y)-f(a,b)}{y-b}.$$

Their limits are what we are interested in.

**Definition.** The *partial derivatives of $f$ with respect to $x$ and $y$ at* $(x,y)=(a,b)$ are defined to be the limits of the difference quotient with respect to $x$ at $x=a$ and with respect to $y$ at $y=b$ respectively, **denoted** by:
$$\frac{\partial f}{\partial x}(a,b) = \lim_{x\to a}\frac{f(x,b)-f(a,b)}{x-a}\text{ and } \frac{\partial f}{\partial y}(a,b)= \lim_{y\to b}\frac{f(a,y)-f(a,b)}{y-b},$$
as well as
$$f_x'(a,b)\text{ and } f'_y(a,b).$$

The following is an obvious conclusion.

**Theorem.** The partial derivatives of $f$ at $(x,y)=(a,b)$ are found as the derivatives of $f$ with respect to $x$ and to $y$:
$$\frac{\partial f}{\partial x}(a,b) = \frac{d}{dx}f(x,b)\bigg|_{x=a},\ \frac{\partial f}{\partial y}(a,b)= \frac{d}{dy}f(a,y)\bigg|_{y=b}.$$

**Example.** Find the partial derivatives of the function:
$$f(x,y)=(x-y)e^{x+y^2}$$
at $(x,y)=(0,0)$:
$$\begin{array}{ll}
\frac{\partial f}{\partial x}(0,0) &= \frac{d}{dx}(x-0)e^{x+0^2}\bigg|_{x=0}&= \frac{d}{dx}xe^{x}\bigg|_{x=0}&=e^x+xe^x\bigg|_{x=0}&=1,\\
\frac{\partial f}{\partial y}(0,0) &= \frac{d}{dy}(0-y)e^{0+y^2}\bigg|_{y=0}&= \frac{d}{dy}\left(-ye^{y^2}\right)\bigg|_{y=0}&=-e^{y^2}-2y^2e^{y^2}\bigg|_{y=0}&=-1.
\end{array}$$
$\square$
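The answers can be double-checked numerically with central differences (a Python sketch, not part of the computation above):

```python
import math

# Finite-difference check of the partial derivatives of
# f(x, y) = (x - y) * e^(x + y^2) at (0, 0).

def f(x, y):
    return (x - y) * math.exp(x + y**2)

h = 1e-6
fx = (f(h, 0.0) - f(-h, 0.0)) / (2 * h)  # central difference in x
fy = (f(0.0, h) - f(0.0, -h)) / (2 * h)  # central difference in y
print(round(fx, 4), round(fy, 4))  # 1.0 -1.0
```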

In general these approximations are just linear functions and their graphs are planes through the point $(a,b,f(a,b))$; in the point-slope form they are: $$l(x,y)= f(a,b)+ m(x - a)+n(y-b). $$ When you zoom in on the point, the tangent plane -- but no other plane -- will merge with the graph:

**Definition.** Suppose $z=f(x,y)$ is a function defined at $(x,y)=(a,b)$ and
$$l(x,y)=f(a,b)+m(x-a)+n(y-b)$$
is any of its linear approximations at that point. Then, $z=l(x,y)$ is called the *best linear approximation* of $f$ at $(x,y)=(a,b)$ if the following is satisfied:
$$\lim_{(x,y)\to (a,b)} \frac{ f(x,y) -l(x,y) }{||(x,y)-(a,b)||}=0.$$
In that case, the function $f$ is called *differentiable* at $(a,b)$ and the graph of $z=l(x,y)$ is called the *tangent plane*.

In other words, we stick to the functions that look like a plane on a small scale!

The definition is about the decline of the error of the approximation: $$\text{error } = | f(x,y) -l(x,y) | . $$ The limit of $\frac{\text{error}}{\text{run}}$ is required to be zero.

**Theorem.** If
$$l(x,y)=f(a,b)+m(x-a)+n(y-b)$$
is the best linear approximation of $f$ at $(x,y)=(a,b)$, then
$$m=\frac{\partial f}{\partial x}(a,b) \text{ and }n=\frac{\partial f}{\partial y}(a,b) .$$

**Proof.** The limit in the definition allows us to approach $(a,b)$ in any way we like. Let's start with the direction parallel to the $x$-axis and see how $\frac{\text{error}}{\text{run}}$ is converted into $\frac{\text{rise}}{\text{run}}$:
$$\begin{array}{llcc}
0&=\lim_{x\to a^+,\ y=b} \frac{ f(x,y) -l(x,y) }{||(x,y)-(a,b)||}\\
&=\lim_{x\to a^+}\frac{ f(x,b) -(f(a,b) + m(x-a) +n(b-b))}{x-a}\\
&=\lim_{x\to a^+}\frac{ f(x,b) -f(a,b)}{x-a} -m \\
&= \frac{\partial f}{\partial x}(a,b) -m .
\end{array}$$
The same computation for the $y$-axis produces the second identity. $\blacksquare$

**Example.** Let's confirm that the tangent plane to the hyperbolic paraboloid, given by the graph of $f(x,y)=xy$, at $(0,0)$ is horizontal:
$$\frac{\partial f}{\partial x}(0,0)= \frac{d}{dx}f(x,0)\bigg|_{x=0}=\frac{d}{dx}(x\cdot 0)\bigg|_{x=0} =0,\ \frac{\partial f}{\partial y}(0,0)= \frac{d}{dy}f(0,y)\bigg|_{y=0}=\frac{d}{dy}(0\cdot y)\bigg|_{y=0} =0.$$

The two tangent lines form a cross and the tangent plane is spanned by this cross. $\square$

Just as before, replacing a function with its linear approximation is called *linearization* and it can be used to estimate values of new functions.

**Example.** Let's review a familiar example: approximation of $\sqrt{4.1}$. We can't compute $\sqrt{x}$ by hand because, in a sense, the function $f(x)=\sqrt{x}$ is *unknown*. The best linear approximation of $y=f(x)=\sqrt{x}$ is known and, as a linear function, it can be computed by hand:
$$f'(x) = \frac{1}{2\sqrt{x}} \ \Longrightarrow\ f'(4) = \frac{1}{2\sqrt{4}} = \frac{1}{4}.$$
The best linear approximation is:
$$l(x) = f(a) + f'(a) (x - a) = 2 + \frac{1}{4} (x - 4).$$
Finally, our approximation of $\sqrt{4.1}$ is
$$l(4.1) = 2 + \frac{1}{4}(4.1 - 4) = 2 + \frac{1}{4} \cdot 0.1 = 2 + 0.025 = 2.025.$$

Let's now approximate $\sqrt{4.1}\sqrt[3]{7.8}$. Instead of approximating the two factors separately, we will find the best linear approximation of the product $f(x,y)=\sqrt{x}\sqrt[3]{y}$ at $(x,y)=(4,8)$. We compute the partial derivative with respect to $x$: $$\frac{\partial f}{\partial x}(4,8)= \frac{d}{dx}f(x,8)\bigg|_{x=4}=\frac{d}{dx}\left(\sqrt{x}\sqrt[3]{8}\right)\bigg|_{x=4} =\frac{1}{2\sqrt{x}}\cdot 2\bigg|_{x=4}=\frac{1}{2};$$ and with respect to $y$: $$\frac{\partial f}{\partial y}(4,8)= \frac{d}{dy}f(4,y)\bigg|_{y=8}=\frac{d}{dy}\left(\sqrt{4}\sqrt[3]{y}\right)\bigg|_{y=8} =2\cdot\frac{1/3}{\sqrt[3]{y^2}}\bigg|_{y=8}=\frac{1}{6}.$$ The best linear approximation is: $$l(x,y) = f(a,b) + \frac{\partial f}{\partial x}(a,b)(x - a) +\frac{\partial f}{\partial y}(a,b)(y-b)= 2\cdot 2 + \frac{1}{2} (x - 4)+\frac{1}{6} (y - 8).$$ Finally, our approximation of $\sqrt{4.1}\sqrt[3]{7.8}$ is $$l(4.1,7.8) = 4 + \frac{1}{2}(.1)+ \frac{1}{6}(-.2) = 4.016666....$$ $\square$
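A numerical cross-check of this linearization (a Python sketch; here the partial derivatives are estimated by central differences rather than computed by hand):

```python
# Linearization of f(x, y) = sqrt(x) * y^(1/3) near (4, 8), with the slopes
# estimated numerically by central differences.

def f(x, y):
    return x ** 0.5 * y ** (1 / 3)

a, b, h = 4.0, 8.0, 1e-6
fx = (f(a + h, b) - f(a - h, b)) / (2 * h)  # close to 1/2
fy = (f(a, b + h) - f(a, b - h)) / (2 * h)  # close to 1/6

def l(x, y):  # best linear approximation at (4, 8)
    return f(a, b) + fx * (x - a) + fy * (y - b)

print(l(4.1, 7.8))  # approximately 4.0167
print(f(4.1, 7.8))  # approximately 4.0157, for comparison
```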

**Exercise.** Find the best linear approximation of $f(x,y)=(xy)^{1/3}$ at $(1,1)$.

To summarize,

- under the limit $x\to a$, the secant line parallel to the $x$-axis turns into a tangent line with the slope $f_x'(a,b)$, and
- under the limit $y\to b$, the secant line parallel to the $y$-axis turns into a tangent line with the slope $f'_y(a,b)$,

provided $f$ is differentiable with respect to these two variables at this point. Meanwhile,

- the secant plane turns into the tangent plane,

provided $f$ is differentiable at this point, with the slopes $f_x'(a,b)$ and $f'_y(a,b)$ in the directions of the coordinate axes.

## 8 The derivative

We progress to the $n$-dimensional case.

First let's look at the point-slope form of *linear functions*:
$$l(x_1,...,x_n)=p+m_1(x_1-a_1)+...+m_n(x_n-a_n),$$
where $p$ is the $z$-intercept, $m_1,...,m_n$ are the chosen slopes of the plane along the axes, and $a_1,...,a_n$ are the coordinates of the chosen point in ${\bf R}^n$. Let's rewrite this, as before, in terms of the *dot product* with the increment of the independent variable:
$$l(X)=p+M\cdot (X-A),$$
where

- $M=<m_1,...,m_n>$ is the vector of slopes,
- $A=(a_1,...,a_n)$ is the point in ${\bf R}^n$, and
- $X=(x_1,...,x_n)$ is our variable point in ${\bf R}^n$.

Then we can say that the vector $N=<m_1,...,m_n,-1>$ is perpendicular to this “plane” in ${\bf R}^{n+1}$. The conclusion holds independently of any choice of a coordinate system!

The linear approximations of a function $z=f(X)$ at $X=A$ in ${\bf R}^n$ are linear functions with $n$ slopes in the directions of the axes: $$\frac{\Delta f}{\Delta x_k}=\frac{f(a_1,...,a_{k-1},x_k,a_{k+1},...,a_n)-f(a_1,...,a_n)}{x_k-a_k}.$$ Their limits are what we are interested in.

**Definition.** The *partial derivative of $z=f(X)=f(x_1,...,x_n)$ with respect to $x_k$ at* $X=A=(a_1,...,a_n)$ is defined to be the limit of the difference quotient with respect to $x_k$ at $x_k=a_k$, **denoted** by:
$$\frac{\partial f}{\partial x_k}(A) \text{ or } f_k'(A).$$

The following is an obvious conclusion.

**Theorem.** The partial derivative of $z=f(X)$ at $X=A=(a_1,...,a_n)$ is found as the derivative of $z=f(x_1,...,x_n)$ with respect to $x_k$ with the other variables fixed at their values from $A$:
$$\frac{\partial f}{\partial x_k}(A) = \frac{d}{dx_k}f(a_1,...,a_{k-1},x_k,a_{k+1},...,a_n)\bigg|_{x_k=a_k}.$$

**Definition.** Suppose $z=f(X)$ is defined at $X=A$ and
$$l(X)=f(A)+M\cdot (X-A)$$
is any of its linear approximations at that point. Then, $z=l(X)$ is called the *best linear approximation* of $f$ at $X=A$ if the following is satisfied:
$$\lim_{X\to A} \frac{ f(X) -l(X) }{||X-A||}=0.$$
In that case, the function $f$ is called *differentiable* at $X=A$.

In other words, we stick to the functions whose graphs look like a copy of ${\bf R}^n$ on a small scale!

Below is a visualization of a differentiable function of three variables given by its level surfaces:

Not only do these surfaces look like planes when we zoom in, but they also progress in a parallel and uniform fashion.

**Theorem.** If
$$l(X)=f(A)+M\cdot (X-A)$$
is the best linear approximation of $z=f(X)$ at $X=A$, then
$$M=\operatorname{grad}f(A)=\left<\frac{\partial f}{\partial x_1}(A),..., \frac{\partial f}{\partial x_n}(A)\right>,$$
called the *gradient* of $f$ at $A$.

Note that the gradient notation is to be read as: $$\bigg(\operatorname{grad}f\bigg)(A),$$ i.e., the gradient is computed first and then evaluated at $X=A$.

**Exercise.** Prove the theorem.
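A numerical sketch (in Python, with an example function of our choosing) of how the slope vector $M$ is assembled from the partial derivatives, each estimated by a central difference:

```python
# Assembling the gradient of f(x, y, z) = x*y + z^2 at A = (1, 2, 3)
# from finite-difference estimates of its partial derivatives.

def f(X):
    x, y, z = X
    return x * y + z ** 2

def grad(f, A, h=1e-6):
    A = list(A)
    g = []
    for k in range(len(A)):
        forward, backward = A.copy(), A.copy()
        forward[k] += h   # perturb only the k-th coordinate
        backward[k] -= h
        g.append((f(forward) - f(backward)) / (2 * h))
    return g

print([round(v, 4) for v in grad(f, (1.0, 2.0, 3.0))])  # [2.0, 1.0, 6.0]
```

For this polynomial the exact gradient at $(1,2,3)$ is $\left<2,1,6\right>$, which the estimate reproduces up to rounding.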

The gradient serves as *the* derivative of a differentiable function. When it is not differentiable, combining its variables, $x$ and $y$, into one, $X=(x,y)$, may be a bad idea but the partial derivatives might still make sense.

Warning: Even though the derivative of a parametric curve in ${\bf R}^n$ at a point and the derivative of a function of $n$ variables at a point are both vectors in ${\bf R}^n$, don't read anything into it.