Lecture 8. Inverse and Implicit Function Theorems

Motivation

Differentiation in $\mathbb{R}^n$

Definition (Partial Derivative)

Let $f:\mathbb{R}^n\to \mathbb{R}$ and $x_0 \in \mathbb{R}^n$ be given. For $j\in\{1,\dots,n\}$, the partial derivative of $f$ at $x_0$ with respect to $x_j$ is defined through $$ \frac{\partial f}{\partial x_j} (x_0) = \lim_{h \to 0} \frac{f(x_0 + he_j) - f(x_0)}{h}, $$ where $e_j$ denotes the $j$-th standard basis vector, i.e. $e_j^k=\begin{cases} 1\, ,&k=j\, ,\\ 0\, ,&k\neq j\, .\end{cases}$

Example

Let $f(x,y,z) = x^2 + y ^2 + z^2$ be a function on $\mathbb{R}^3$. Then $$ \frac{\partial f}{\partial x} = 2x\, ,\:\frac{\partial f}{\partial y} = 2y\, ,\text{ and }\frac{\partial f}{\partial z} = 2z. $$
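These formulas are easy to confirm numerically. The following sketch (assuming NumPy is available; the test point is an arbitrary illustrative choice) compares central difference quotients with the exact partial derivatives:

```python
import numpy as np

def f(p):
    # f(x, y, z) = x^2 + y^2 + z^2
    return np.sum(p**2)

def partial(f, p, j, h=1e-6):
    # central difference quotient approximating (df/dx_j)(p)
    e = np.zeros_like(p)
    e[j] = 1.0
    return (f(p + h * e) - f(p - h * e)) / (2 * h)

p = np.array([1.0, -2.0, 0.5])
exact = 2 * p  # (2x, 2y, 2z)
approx = np.array([partial(f, p, j) for j in range(3)])
print(np.allclose(approx, exact))  # True
```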
In the one variable case, the existence of $f'(x_0)$ implies continuity of $f$ at $x_0$. In the case of multiple variables, things are slightly more complicated.

Definition (Differentiability)

Let $f:\mathbb{R}^n\to \mathbb{R}$ and $x_0 \in\mathbb{R}^n$. Then $f$ is said to be differentiable at $x_0$ iff there is a linear map from $\mathbb{R}^n$ to $\mathbb{R}$, called $Df(x_0)$, with $$ \lim_{x\to x_0} \frac{ \| f(x) - f(x_0) - Df(x_0)(x - x_0)\|}{\|x-x_0\|} = 0\, , $$ or, equivalently, iff $$ f(x) - f(x_0) - Df(x_0)(x - x_0)=o\bigl(\| x-x_0\|\bigr)\text{ as }x\to x_0. $$ If we tacitly agree that $\mathbb{R}^n$ is equipped with its standard basis $e_1,\dots,e_n$, then $Df(x_0)$ can be represented as a specific $\mathbb{R}^{1\times n}$ matrix. Which one?
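Without giving the answer away, any candidate matrix $A$ can be tested against the definition numerically: if $A$ is the right one, the error quotient must tend to $0$ as $x\to x_0$. A rough sketch for the example above, with an illustrative candidate built from its partial derivatives:

```python
import numpy as np

def f(p):
    # the example f(x, y, z) = x^2 + y^2 + z^2 from above
    return np.sum(p**2)

p0 = np.array([1.0, -2.0, 0.5])
A = 2 * p0  # candidate 1 x n matrix, stored as a vector

rng = np.random.default_rng(0)
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    x = p0 + t * rng.standard_normal(3)
    ratio = abs(f(x) - f(p0) - A @ (x - p0)) / np.linalg.norm(x - p0)
    print(ratio)  # shrinks proportionally to ||x - x0||
```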

Remark

We will often write $f'(x_0)$ for $Df(x_0)$ if no confusion seems likely.

Example

Let $n>1$ and assume that $\frac{\partial f}{\partial x_1}(x),\dots,\frac{\partial f}{\partial x_n}(x)$ exist for $x$ near $x_0$. Does it follow that $f$ is continuous at $x_0$?

Discussion
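A classic candidate for this discussion is $f(x,y)=\frac{xy}{x^2+y^2}$ for $(x,y)\neq (0,0)$ and $f(0,0)=0$: both partial derivatives exist everywhere ($f$ vanishes identically on the coordinate axes), yet $f(t,t)=\tfrac12$ for all $t\neq 0$, so $f$ is not continuous at the origin. A minimal numerical illustration in plain Python:

```python
def f(x, y):
    # xy / (x^2 + y^2), extended by 0 at the origin
    return 0.0 if x == y == 0 else x * y / (x**2 + y**2)

# both partial derivatives at the origin exist (f vanishes on the axes) ...
h = 1e-8
print((f(h, 0.0) - f(0.0, 0.0)) / h)  # 0.0
print((f(0.0, h) - f(0.0, 0.0)) / h)  # 0.0

# ... but f does not tend to f(0, 0) = 0 along the diagonal
for t in [1e-1, 1e-3, 1e-6]:
    print(f(t, t))  # 0.5 every time
```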

Definition (Gradient)

The gradient of $f$ at $x_0$ is the unique vector $\nabla f(x_0)$ satisfying $$ \nabla f(x_0)\cdot h= Df(x_0)h\qquad\forall\, h\in \mathbb{R}^n\, . $$ It follows that $$ \nabla f(x_0)=\begin{bmatrix}\frac{\partial f}{\partial x_1} (x_0)\\\vdots\\\frac{\partial f}{\partial x_n} (x_0)\end{bmatrix}=Df(x_0)^\top. $$

Proposition

If $f$ is differentiable at $x_0$, then
(i) $\frac{\partial f}{\partial x_j}(x_0)$ exists for $j=1,\dots,n$.
(ii) $f$ is continuous at $x_0$.

What is a natural sufficient condition to test whether $f:\mathbb{R}^n\to \mathbb{R}$ is differentiable at a point $x_0\in \mathbb{R}^n$?

Theorem

If $\frac{\partial f}{\partial x_j}(x)$ exists in a neighborhood of $x_0$ and is continuous at $x_0$ for $j=1,\dots,n$, then $f$ is differentiable at $x_0$.

Proof

Example

Suppose that $\frac{\partial^2f}{\partial x\partial y}(x_0, y_0)$ and $\frac{\partial^2f}{\partial y\partial x}(x_0, y_0)$ exist. Does it necessarily follow that $$ \frac{\partial^2 f}{\partial x\partial y} (x_0,y_0) = \frac{\partial^2 f}{\partial y\partial x} (x_0,y_0)? $$

Discussion
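A standard candidate for this discussion is $f(x,y)=\frac{xy(x^2-y^2)}{x^2+y^2}$ with $f(0,0)=0$, for which the two mixed second partials at the origin equal $-1$ and $+1$. The sketch below illustrates this with nested central differences (the inner step must be much smaller than the outer one for the nested quotients to be meaningful):

```python
def f(x, y):
    # xy (x^2 - y^2) / (x^2 + y^2), extended by 0 at the origin
    return 0.0 if x == y == 0 else x * y * (x**2 - y**2) / (x**2 + y**2)

s = 1e-7  # inner step, for the first derivatives
k = 1e-3  # outer step, for the second derivatives; keep s << k

def fx(x, y):  # df/dx by central differences
    return (f(x + s, y) - f(x - s, y)) / (2 * s)

def fy(x, y):  # df/dy by central differences
    return (f(x, y + s) - f(x, y - s)) / (2 * s)

# the two mixed second partial derivatives at the origin disagree:
print((fx(0.0, k) - fx(0.0, -k)) / (2 * k))  # approx -1
print((fy(k, 0.0) - fy(-k, 0.0)) / (2 * k))  # approx +1
```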

It is natural to ask under which assumptions one has $\frac{\partial^2 f}{\partial x\partial y} = \frac{\partial^2 f}{\partial y\partial x}$ in general.

Theorem

Assume that $$ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial^2f}{\partial x\partial y} $$ exist in a neighborhood of $(x_0, y_0)$ and that $\frac{\partial^2 f}{\partial x\partial y}$ is continuous at $(x_0,y_0)$. Then $\frac{\partial^2f}{\partial y\partial x}$ also exists and $$ \frac{\partial^2f}{\partial y\partial x}(x_0,y_0) = \frac{\partial^2f}{\partial x\partial y} (x_0,y_0). $$ In particular, if $f$ is twice continuously differentiable, then the mixed partial derivatives coincide.

Proof


Next, consider a map $f: \mathbb{R}^n \to \mathbb{R}^m$, $$ f(x) = \begin{bmatrix}f_1(x) \\ \vdots \\ f_m(x)\end{bmatrix}\, ,\:x = \begin{bmatrix}x_1 \\ \vdots \\ x_n \end{bmatrix},\ x_0 = \begin{bmatrix}x^0_1 \\ \vdots \\ x^0_n \end{bmatrix}. $$ Then $f$ is said to be differentiable at $x_0$ if there is an $m \times n$ matrix $A$ such that $$ \lim_{x \to x_0} \frac{ \| f(x) - f(x_0) - A(x- x_0) \|}{\|x- x_0\|} = 0. $$ Such a matrix is again denoted by $Df(x_0)$ or $f'(x_0)$.

Remark

Notice that differentiability of a function $f:X\to Y$ between general normed vector spaces $\bigl( X,\|\cdot\| _X\bigr)$ and $\bigl( Y,\|\cdot\| _Y\bigr)$ at a point $x_0\in X$ is defined as the existence of a bounded linear map $A:X\to Y$ with the property that $$ \| f(x)-f(x_0)-A(x-x_0)\| _Y=o\bigl( \| x-x_0\| _X\bigr)\, . $$ A linear map $A:X\to Y$ is called bounded if it satisfies $$ \| Ax\| _Y\leq C\,\| x\| _X\, ,\: x\in X\, , $$ for some constant $C\geq 0$. Again the notation $Df(x_0)=f'(x_0)=A$ is typically used. With this definition it is apparent that $f$ is differentiable at $x_0$ iff it can be approximated by an affine map to better than first order.

Exercise

Prove that $$ \| A\|:=\sup _{\| x\| _X=1}\| Ax\| _Y=\sup _{x\neq 0}\frac{\| Ax\| _Y}{\| x\|_X} $$ is a norm on the space $\mathcal{L}(X,Y)$ of linear and continuous maps from $X$ to $Y$. If $B\in \mathcal{L}(Y,Z)$ for an additional normed vector space $(Z,\|\cdot\| _Z)$, prove that $\| BA\|\leq \| B\| \| A\|$.
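For matrices between Euclidean spaces this operator norm is the largest singular value, which NumPy exposes as `np.linalg.norm(M, 2)`; the following sketch (with randomly generated matrices, purely illustrative) checks both the definition as a supremum and the submultiplicativity to be proved:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))  # A : R^4 -> R^3
B = rng.standard_normal((2, 3))  # B : R^3 -> R^2

op = lambda M: np.linalg.norm(M, 2)  # largest singular value

# the supremum over unit vectors, approximated by sampling, stays below op(A)
xs = rng.standard_normal((4, 10000))
xs /= np.linalg.norm(xs, axis=0)
print(np.max(np.linalg.norm(A @ xs, axis=0)) <= op(A) + 1e-12)  # True

# submultiplicativity
print(op(B @ A) <= op(B) * op(A) + 1e-12)  # True
```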

Proposition

If $f:\mathbb{R}^n\to \mathbb{R}^m$ is differentiable at $x_0$, then $\frac{\partial f_i}{\partial x_j} (x_0)$ exists for $1 \le i \le m$ and $1 \le j \le n$ and $$ Df(x_0) = \begin{bmatrix}\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n}\\\vdots & & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}(x_0). $$
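As a sanity check, the matrix of partial derivatives can be compared against a finite-difference Jacobian for a concrete map; the map $f(x_1,x_2)=(x_1x_2,\ \sin x_1,\ x_1+x_2^2)$ below is just an illustrative choice:

```python
import numpy as np

def f(x):
    # f : R^2 -> R^3
    return np.array([x[0] * x[1], np.sin(x[0]), x[0] + x[1]**2])

def Df(x):
    # matrix of partial derivatives, computed by hand
    return np.array([[x[1],         x[0]],
                     [np.cos(x[0]), 0.0],
                     [1.0,          2 * x[1]]])

def num_jacobian(f, x, h=1e-6):
    # central differences, one column per variable
    cols = [(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(len(x))]
    return np.column_stack(cols)

x0 = np.array([0.7, -1.3])
print(np.allclose(Df(x0), num_jacobian(f, x0)))  # True
```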

Notation

The Jacobian $\operatorname{Jac}(f)$ of $f$ at $x$ is defined as $\operatorname{Jac}(f)(x)=\det Df(x)$.

Theorem

Let $E\subset \mathbb{R}^n$ be convex, let $f: E\to \mathbb{R}^m$ be differentiable, and assume that $\sup _{x\in E}\|Df(x)\| \le M<\infty$. Then $$ \|f(x) - f(y)\| \le M \|x - y\|\, ,\: x, y\in E\, . $$

The proof is left as an exercise.

Theorem (Chain Rule)

Let $f: \mathbb{R}^n \to \mathbb{R}^m$ and $g: \mathbb{R}^m \to \mathbb{R}^k$ be differentiable functions. Then $g \circ f: \mathbb{R}^n \to \mathbb{R}^k$ is differentiable and $$ D\bigl(g \circ f\bigr)(x) = Dg\bigl(f(x)\bigr)Df(x)\, . $$ Denoting the composition of linear maps on the right-hand side of the last identity as a multiplication is motivated by the fact that it amounts to matrix multiplication of the corresponding representation matrices.
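For concrete maps the identity is again easy to test numerically; $f$ and $g$ in the sketch below are illustrative choices, and the derivative of $g\circ f$ is approximated by central differences:

```python
import numpy as np

f  = lambda x: np.array([x[0]**2, x[0] * x[1]])
Df = lambda x: np.array([[2 * x[0], 0.0],
                         [x[1],     x[0]]])

g  = lambda u: np.array([np.sin(u[0]) + u[1], u[0] * u[1], u[1]**2])
Dg = lambda u: np.array([[np.cos(u[0]), 1.0],
                         [u[1],         u[0]],
                         [0.0,          2 * u[1]]])

def num_jacobian(F, x, h=1e-6):
    cols = [(F(x + h * e) - F(x - h * e)) / (2 * h) for e in np.eye(len(x))]
    return np.column_stack(cols)

x0 = np.array([0.4, 1.2])
lhs = num_jacobian(lambda x: g(f(x)), x0)  # D(g o f)(x0), numerically
rhs = Dg(f(x0)) @ Df(x0)                   # chain rule
print(np.allclose(lhs, rhs))               # True
```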

Inverse Function and Open Mapping Theorems

Theorem (Inverse Function Theorem)

Let $E$ be a domain in $\mathbb{R}^n$ and $f: E \to \mathbb{R}^n$ be a continuously differentiable function. Let $x_0\in E$ and assume that $Df(x_0)$ is invertible. Then
(i) There is $\delta>0$ such that $f$ is 1-1 on $\mathbb{B}(x_0, \delta)$ and $f\bigl(\mathbb{B}(x_0, \delta)\bigr)$ is open.
(ii) If $g: V=f\bigl(\mathbb{B}(x_0, \delta)\bigr)\to \mathbb{B}(x_0, \delta)$ is the inverse function of $f$, then it is continuously differentiable and $$ Dg(y) =\big[Df\bigl(g(y)\bigr)\big]^{-1}\, ,\: y\in V\, . $$
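Before turning to the proof, formula (ii) can be checked numerically on a familiar map: for the polar-coordinate map $f(r,\theta)=(r\cos\theta,\ r\sin\theta)$ the local inverse near a point with $r>0$ is explicit, and a finite-difference Jacobian of $g$ should match $\big[Df\bigl(g(y)\bigr)\big]^{-1}$. A minimal sketch:

```python
import numpy as np

f  = lambda p: np.array([p[0] * np.cos(p[1]), p[0] * np.sin(p[1])])
Df = lambda p: np.array([[np.cos(p[1]), -p[0] * np.sin(p[1])],
                         [np.sin(p[1]),  p[0] * np.cos(p[1])]])

# explicit local inverse, valid near points with r > 0, -pi < theta < pi
g = lambda y: np.array([np.hypot(y[0], y[1]), np.arctan2(y[1], y[0])])

def num_jacobian(F, x, h=1e-6):
    cols = [(F(x + h * e) - F(x - h * e)) / (2 * h) for e in np.eye(len(x))]
    return np.column_stack(cols)

y0 = f(np.array([2.0, 0.3]))  # a point in V
print(np.allclose(num_jacobian(g, y0),
                  np.linalg.inv(Df(g(y0)))))  # True
```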

Proof

Exercise

Based on the proof of the Inverse Function Theorem, what corresponds to the $\delta >0$ appearing in the statement of the theorem?

Corollary (Open Mapping Theorem)

If $f:E\to \mathbb{R}^n$ is continuously differentiable and such that $\operatorname{det} Df(x)\ne 0$ for all $x \in E$, then $f$ is open, i.e. it maps open sets to open sets.

Implicit Function Theorem

Next we consider an important consequence of the inverse function theorem.

Theorem (Implicit Function Theorem)

Let $$ f =(f_1,\dots, f_n):\mathbb{R}^{n+m} \to \mathbb{R}^n $$ be a continuously differentiable map. Assume there is $(a,b) \in \mathbb{R}^{n+m}$ such that $f(a,b) =0$ and that $D_xf(a,b)$ is invertible, where $D_x$ indicates that we are taking the derivative with respect to the $x$-variables only. Then there are open sets $U \subset \mathbb{R}^{n+m}$ and $W \subset \mathbb{R}^m$ with $$ (a,b)\in U\text{ and }b\in W, $$ and a continuously differentiable function $g:W\to \mathbb{R}^n$ such that, for fixed $y\in W$, $\bigl(g(y), y\bigr)$ is the only point of the form $(x,y)$ in $U$ solving $f(x,y)=0$. In particular $$ f\bigl(g(y),y\bigr)=0\, ,\: y\in W. $$

Proof

Remark

The theorem implies that the relation $f(x,y) =0$ locally determines a function $g$ of the variable $y$ such that $f\bigl(g(y),y\bigr)=0$ in the vicinity of any point satisfying the assumptions. Moreover $$ Dg(y) = -\big[D_xf\bigl(g(y),y\bigr)\big]^{-1} D_yf\bigl(g(y),y\bigr)\, ,\: y\in W, $$ where $$ D_xf = \begin{bmatrix}\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}, \quad D_yf = \begin{bmatrix}\frac{\partial f_1}{\partial y_1} & \cdots & \frac{\partial f_1}{\partial y_m} \\ \vdots & & \vdots \\ \frac{\partial f_n}{\partial y_1} & \cdots & \frac{\partial f_n}{\partial y_m} \end{bmatrix}. $$

Example

Let $f = (f_1, f_2): \mathbb{R}^{2+3} \to \mathbb{R}^2$ be defined by $$ f(x_1,x_2,y_1,y_2,y_3)= \begin{bmatrix}2e^{x_1} + x_2 y_1 - 4y_2 + 3\\ x_2 \cos(x_1) - 6x_1 + 2y_1-y_3 \end{bmatrix}. $$ Let $a=(0,1)$, $b=(3,2,7)$. Then $$ f(a,b) = \begin{bmatrix}2+3-4\cdot2+3 \\1- 0 +2\cdot 3 -7 \end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix}. $$ It is clear that $f$ is a continuously differentiable map and that $$ Df=\begin{bmatrix} 2e^{x_1} & y_1 & x_2 & -4 & 0 \\ -x_2 \sin(x_1) -6 & \cos(x_1) & 2 & 0 & -1\end{bmatrix}\, . $$ Then $$ D_xf(a,b) = \begin{bmatrix}2e^{x_1} & y_1 \\ -x_2 \sin(x_1) -6 & \cos(x_1) \end{bmatrix}\bigg|_{(a,b)}= \begin{bmatrix}2 & 3 \\ -6 & 1 \end{bmatrix}, $$ which is non-singular since its determinant is $20\ne 0$. The implicit function theorem then yields a function $x=g(y)$ defined in some neighborhood $W$ of $b$ with $$ Dg(y) = -\big[D_xf\bigl(g(y),y\bigr)\big]^{-1} D_yf\bigl(g(y),y\bigr)\, . $$ In particular $$ Dg(b) = -\frac{1}{20}\begin{bmatrix}1 & -3 \\ 6 & 2 \end{bmatrix} \begin{bmatrix}1 & -4 & 0 \\ 2 & 0 & -1\end{bmatrix}=- \frac{1}{20}\begin{bmatrix}-5 & -4 & 3 \\ 10 & -24 & -2\end{bmatrix}. $$
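The computed $Dg(b)$ can be double-checked numerically: for $y$ near $b$, solve $f(x,y)=0$ for $x$ starting from $a$ (below with `scipy.optimize.fsolve`; a hand-written Newton iteration would serve equally well) and difference the solutions:

```python
import numpy as np
from scipy.optimize import fsolve

def f(x, y):
    return np.array([2 * np.exp(x[0]) + x[1] * y[0] - 4 * y[1] + 3,
                     x[1] * np.cos(x[0]) - 6 * x[0] + 2 * y[0] - y[2]])

a = np.array([0.0, 1.0])
b = np.array([3.0, 2.0, 7.0])

def g(y):
    # solve f(x, y) = 0 for x, starting from a
    return fsolve(lambda x: f(x, y), a)

h = 1e-5
Dg = np.column_stack([(g(b + h * e) - g(b - h * e)) / (2 * h)
                      for e in np.eye(3)])
print(np.round(Dg, 3))
# approx -(1/20) * [[-5, -4, 3], [10, -24, -2]], i.e.
# [[ 0.25   0.2   -0.15]
#  [-0.5    1.2    0.1 ]]
```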