A prime example of the usefulness of differentiability is the widely
used Inverse Function Theorem (formulated and proved later in this
lecture). Here we do not yet give a proof but point to the intuitive
nature of the result, which also informs the way in which a rigorous
proof can and will be given. Let $n\in \mathbb{N}$, consider a
continuously differentiable function $f:\mathbb{R}^n\to \mathbb{R}^n$,
and fix a point $x_0\in \mathbb{R}^n$, which, without loss of
generality, can be taken to be the origin, i.e. $x_0=0$. The function
$f$ will assume some value $f(0)$ there, which we can also assume to
vanish. The question we care to answer is the following:
if we are given $y\simeq 0$, when is it possible to say that there is
exactly one $x\simeq 0$ which solves $f(x)=y$?
Before we try and answer this question, let us investigate the limits
of our expectations. First notice that the question is purely local
and has to be. If you consider the function $f(x)=\sin(x)$ in one
dimension, then $f(0)=0$ and we can ask the question above: is there a
unique small solution of the equation $\sin(x)=y$ for given small
$y$? It is clear that we would have infinitely many solutions if we
were to drop the requirement that $x$ be small, due to the periodicity
of the function. Choosing $f(x)=|x|^2e_1$ for $x\in \mathbb{R}^n$ and
any $n\in \mathbb{N}$, where $e_1=(1,0,\dots,0)$, one has that
$f(0)=0$, but it is easy to find small $y\in \mathbb{R}^n$ for which
no solution can exist (e.g. any small $y$ with negative first
component). This example shows that we need to require local
surjectivity somehow. Think about how this could be achieved before
reading on.
Next, let us get a grasp of the nature of the problem at hand. Since
we assume differentiability we know that
$$
f(x)= f(0)+Df(0)x+o(|x|)=Df(0)x+o(|x|)\text{ as }x\to 0.
$$
Given small $y$, the problem $f(x)=y$ is therefore approximately given
by
$$
Df(0)x=y,
$$
and it would appear that the invertibility of $Df(0)$ should be sufficient to
solve the problem. This is indeed the case in a precise way as
formulated in the Inverse Function Theorem. Notice, however, that this
invertibility condition is not necessary as the one dimensional
example $f(x)=x^3$ shows.
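To see the heuristic at work in one dimension, return to
$f(x)=\sin(x)$: there $Df(0)=\cos(0)=1$ is invertible, the linearized
problem reads $x=y$, and indeed $x=\arcsin(y)$ is the unique small
solution of $\sin(x)=y$ for small $y$; it deviates from the linear
prediction only at higher order, since
$$
\arcsin(y)=y+\frac{y^3}{6}+O(y^5)\text{ as }y\to 0.
$$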
Differentiation in $\mathbb{R}^n$
Definition (Partial Derivative)
Let $f:\mathbb{R}^n\to \mathbb{R}$ and $x_0 \in \mathbb{R}^n$ be given. For
$j\in\{1,\dots,n\}$, the partial derivative of $f$ at
$x_0$ with respect to $x_j$ is defined through
$$
\frac{\partial f}{\partial x_j} (x_0) = \lim_{h \to 0} \frac{f(x_0 +
he_j) - f(x_0)}{h},
$$
where the $k$-th component of the standard basis vector $e_j$ is given by
$e_j^k=\begin{cases} 1\, ,&k=j\, ,\\ 0\, ,&k\neq j\, .\end{cases}$
Example
Let $f(x,y,z) = x^2 + y ^2 + z^2$ be a function on $\mathbb{R}^3$. Then
$$
\frac{\partial f}{\partial x} = 2x\, ,\:\frac{\partial f}{\partial y} = 2y\, ,\text{ and
}\frac{\partial f}{\partial z} = 2z.
$$
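The defining difference quotients can also be checked numerically. The
following is a minimal Python sketch, assuming numpy is available; the
helper name partial_derivative and the step size are of course our own
choices, not canonical:
\begin{verbatim}
import numpy as np

def partial_derivative(f, x0, j, h=1e-6):
    # forward difference quotient approximating df/dx_j at x0
    e_j = np.zeros_like(x0)
    e_j[j] = 1.0
    return (f(x0 + h * e_j) - f(x0)) / h

f = lambda x: x[0]**2 + x[1]**2 + x[2]**2
x0 = np.array([1.0, 2.0, 3.0])
# compare with the exact values 2x = 2, 2y = 4, 2z = 6
print([partial_derivative(f, x0, j) for j in range(3)])
\end{verbatim}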
In the one variable case, the existence of $f'(x_0)$ implies
continuity of $f$ at $x_0$. In the case of multiple variables, things
are slightly more complicated.
Definition (Differentiability)
Let $f:\mathbb{R}^n\to \mathbb{R}$ and $x_0 \in\mathbb{R}^n$. Then $f$
is said to be differentiable at $x_0$ iff there is a
linear map from $\mathbb{R}^n$ to $\mathbb{R}$, denoted by $Df(x_0)$, with
$$
\lim_{x\to x_0} \frac{ \| f(x) - f(x_0) - Df(x_0)(x - x_0)\|}{\|x-x_0\|} = 0\, ,
$$
or, equivalently, iff
$$
f(x) - f(x_0) - Df(x_0)(x - x_0)=o\bigl(\| x-x_0\|\bigr)\text{ as
}x\to x_0.
$$
If we tacitly agree that $\mathbb{R}^n$ is equipped with its standard
basis $e_1,\dots,e_n$, then $Df(x_0)$ can be represented as a specific
$\mathbb{R}^{1\times n}$ matrix. Which one?
Remark
We often will write $f'(x_0)$ for $Df(x_0)$ if no confusion seems likely.
Example
Let $n>1$ and assume that $\frac{\partial f}{\partial
x_1}(x),\dots,\frac{\partial f}{\partial x_n}(x)$ exist for $x$ near
$x_0$. Does it follow that $f$ is continuous at $x_0$?
The following counterexample shows that the answer is no. Let $f$
be given by
$$
f(x,y) =\begin{cases}\frac{xy}{x^2 + y^2}\, ,&(x,y) \ne (0,0)\\
0\, ,&(x,y) = (0,0)\, .\end{cases}
$$
Then $f$ is not continuous at $(0,0)$: along the diagonal one has
$f(t,t)=\frac{1}{2}$ for all $t\neq 0$, while $f(0,0)=0$. However,
$$
\frac{\partial f}{\partial x}(0,0)=\lim_{h\to
0}\frac{f(h,0)-f(0,0)}{h}=\lim_{h\to 0}\frac{0}{h}=0
$$
and
$$
\frac{\partial f}{\partial y}(0,0)=\lim_{h\to
0}\frac{f(0,h)-f(0,0)}{h}=\lim_{h\to 0}\frac{0}{h}=0
$$
both exist as do all partial derivatives at any $(x,y)\neq(0,0)$.
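The discontinuity along the diagonal is easily observed numerically as
well; a small Python sketch in the same spirit as before (the function
name is ours):
\begin{verbatim}
def f(x, y):
    # the counterexample: xy/(x^2+y^2) away from the origin, 0 at it
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x**2 + y**2)

for t in [1e-1, 1e-3, 1e-6]:
    # along the axis f vanishes, but along the diagonal f(t,t) = 1/2
    print(f(t, 0.0), f(t, t))
\end{verbatim}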
Definition (Gradient)
The gradient of a function $f$ which is differentiable at $x_0$ is the
vector $\nabla f(x_0)\in \mathbb{R}^n$ defined by the validity of
$$
\nabla f(x_0)\cdot h= Df(x_0)h\qquad\forall\, h\in \mathbb{R}^n\, .
$$
It follows that
$$
\nabla f(x_0)=\begin{bmatrix}\frac{\partial f}{\partial x_1}
(x_0)\\\vdots\\\frac{\partial f}{\partial x_n}
(x_0)\end{bmatrix}=Df(x_0)^\top.
$$
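For the function $f(x,y,z)=x^2+y^2+z^2$ from the example above, this reads
$$
Df(x,y,z)=\begin{bmatrix}2x & 2y & 2z\end{bmatrix}\quad\text{and}\quad
\nabla f(x,y,z)=\begin{bmatrix}2x\\2y\\2z\end{bmatrix}.
$$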
Proposition
If $f$ is differentiable at $x_0$, then
(i) $\frac{\partial f}{\partial x_j}(x_0)$ exists for $j=1,\dots,n$.
(ii) $f$ is continuous at $x_0$.
What is a natural sufficient condition to test whether
$f:\mathbb{R}^n\to \mathbb{R}$ is differentiable at a point $x_0\in
\mathbb{R}^n$?
Theorem
If $\frac{\partial f}{\partial x_j}(x)$ exists in a neighborhood of
$x_0$ and is continuous at $x_0$ for $j=1,\dots,n$, then $f$ is
differentiable at $x_0$.
Consider the telescopic decomposition
\begin{multline*}
f(x_0^1+h^1,\dots, x_0^n+h^n)-f(x_0)=f(x_0^1+h^1,x_0^2+h^2,\dots,
x_0^n+h^n)-f(x_0^1,x_0^2+h^2,\dots,
x_0^n+h^n)\\
+\dots+f(x_0^1,\dots,x^{n-1}_0,x_0^n+h^n)-
f(x_0^1,\dots,x_0^n).
\end{multline*}
The existence of the partial derivatives and the one dimensional Mean
Value Theorem yield $\tilde h^j\in(0,h^j)$ for $j=1,\dots,n$ such that
\begin{multline*}
f(x_0^1,\dots, x_0^{j-1},x_0^j+h^j,\dots,x_0^n+h^n)-f(x_0^1,\dots,
x_0^j,x_0^{j+1}+h^{j+1},\dots, x_0^n+h^n)\\ =\frac{\partial}{\partial
x_j}f(x_0^1,\dots,x_0^{j-1},x_0^j+\tilde
h^j,x_0^{j+1}+h^{j+1},\dots, x_0^n+h^n)\,({x_0^j}+h^j-{x_0^j}).
\end{multline*}
Summarizing we see that
$$
f(x_0+h)-f(x_0)-\sum_{j=1}^n \frac{\partial}{\partial x_j}f(x_0)\,
h^j= \sum_{j=1}^n \big[\frac{\partial}{\partial x_j}f(x_0)-\frac{\partial}{\partial
x_j}f(\tilde x_j)\big]\, h^j
$$
for $\tilde x_j=(x_0^1,\dots,x_0^{j-1},x_0^j+\tilde
h^j,x_0^{j+1}+h^{j+1},\dots,x_0^n+h^n)$. Now the continuity of all
partial derivatives at $x_0$, combined with the Cauchy-Schwarz
inequality, yields that
\begin{equation*}
\frac{1}{\| h\|}\big\|\sum_{j=1}^n \big[\frac{\partial}{\partial
x_j}f(x_0)-\frac{\partial}{\partial
x_j}f(\tilde x_j)\big]\, h^j\big\|\leq\Bigl( \sum_{j=1}^n
\big[\frac{\partial}{\partial x_j}f(x_0)-\frac{\partial}{\partial
x_j}f(\tilde x_j)\big]^2\Bigr)^{1/2}\longrightarrow 0\text{ as }h\to 0\, ,
\end{equation*}
since $\tilde x_j\to x_0$ as $h\to 0$. Differentiability at $x_0$ is
therefore established along with
$$
Df(x_0)=\begin{bmatrix} \frac{\partial}{\partial x_1}f(x_0)\:\cdots
\:\frac{\partial}{\partial x_n}f(x_0)\end{bmatrix}\in \mathbb{R}^{1\times n}\, .
$$
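The $o(\| h\|)$ quality of the approximation can be observed
numerically: the error of the linear approximation, divided by
$\| h\|$, should tend to zero. A minimal Python sketch (function,
point, and direction are arbitrary choices of ours):
\begin{verbatim}
import numpy as np

f = lambda x: x[0]**2 + x[1]**2 + x[2]**2   # Df(x) = [2x 2y 2z]
x0 = np.array([1.0, 2.0, 3.0])
Df = 2 * x0

for r in [1e-1, 1e-2, 1e-3]:
    h = r * np.array([1.0, -1.0, 0.5])
    err = abs(f(x0 + h) - f(x0) - Df @ h) / np.linalg.norm(h)
    print(err)   # decays proportionally to r, i.e. the error is o(|h|)
\end{verbatim}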
Example
Suppose that $\frac{\partial^2f}{\partial x\partial y}(x_0, y_0)$ and
$\frac{\partial^2f}{\partial y\partial x}(x_0, y_0)$
exist. Does it necessarily follow that
$$
\frac{\partial^2 f}{\partial x\partial y} (x_0,y_0) = \frac{\partial^2
f}{\partial y\partial x} (x_0,y_0)?
$$
The answer is no in general as the following counter-example shows:
$$
f(x,y)=\begin{cases}xy \frac{x^2 - y^2}{x^2 + y^2}\,
,&(x,y)\ne(0,0)\\0\, ,&(x,y) = (0,0)\end{cases}
$$
Notice that, for $(x,y) \ne (0,0)$, one has that
$$
f(x,y)=xy - \frac{2xy^3}{x^2 + y^2} = -xy + \frac{2x^3y}{x^2 + y^2}
$$
and thus
\begin{equation*}
\frac{\partial f}{\partial x}(x,y)= y - \frac{2y^3(y^2-x^2)}{(x^2 +
y^2)^2}=\frac{y(x^4+4x^2y^2-y^4)}{(x^2+y^2)^2}\, ,\:
\frac{\partial f}{\partial y}(x,y)= -x + \frac{2x^3(x^2-y^2)}{(x^2 +
y^2)^2}=\frac{x(x^4-4x^2y^2-y^4)}{(x^2+y^2)^2}\, .
\end{equation*}
For $(x,y) = (0,0)$ it can be verified directly from the difference
quotients that
$$
\frac{\partial f}{\partial x}(0,0) = 0\, ,\quad \frac{\partial f}{\partial
y}(0,0) = 0.
$$
Moving to the second order derivatives, a somewhat lengthy computation
with the quotient rule shows that
\begin{align*}
\frac{\partial^2f}{\partial x\partial y}&=\frac{\partial}{\partial
x}\left( \frac{x(x^4-4x^2y^2-y^4)}{(x^2+y^2)^2} \right)
=\frac{x^6+9x^4y^2-9x^2y^4-y^6}{(x^2+y^2)^3}\\
&=\frac{\partial}{\partial
y}\left( \frac{y(x^4+4x^2y^2-y^4)}{(x^2+y^2)^2} \right)
=\frac{\partial^2f}{\partial y\partial x}
\end{align*}
for $(x,y)\neq(0,0)$, so that the mixed partial derivatives actually
coincide away from the origin. At the origin itself, however,
\begin{equation*}
\frac{\partial^2 f}{\partial x\partial y}(0,0)= \lim_{ x \to 0}
\frac{\frac{\partial f}{\partial y}(x,0) - \frac{\partial
f}{\partial y} (0,0)}{x}= \lim_{x \to 0} \frac{x-0}{x} = 1,\:
\frac{\partial^2 f}{\partial y\partial x}(0,0) = \lim_{ y \to 0}
\frac{\frac{\partial f}{\partial x}(0,y) -
\frac{\partial f}{\partial x} (0,0)}{y}=\lim_{y \to 0} \frac{-y-0}{y} = -1
\end{equation*}
so that clearly
$$
\frac{\partial^2 f}{\partial x\partial y}(0,0) \ne \frac{\partial^2
f}{\partial y\partial x}(0,0)\, .
$$
It is natural to ask under which assumptions one has that
$\frac{\partial^2 f}{\partial x\partial y} = \frac{\partial^2
f}{\partial y\partial x}$ in general.
Theorem
Assume that
$$
\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y},
\frac{\partial^2f}{\partial x\partial y}
$$
exist in a neighborhood of $(x_0, y_0)$ and that $\frac{\partial^2
f}{\partial x\partial y}$ is continuous at $(x_0,y_0)$. Then
$\frac{\partial^2f}{\partial y\partial x}$ also exists and
$$
\frac{\partial^2f}{\partial y\partial x}(x_0,y_0) =
\frac{\partial^2f}{\partial x\partial y} (x_0,y_0).
$$
In particular, if $f$ is twice continuously differentiable, then
the mixed partial derivatives coincide.
For $h,k \in \mathbb{R}$ define
$$
v(h,k)=f(x_0 + h, y_0 + k) - f(x_0 +h, y_0) - f(x_0, y_0 +k) + f(x_0,
y_0),
$$
and
$$
u(h,t) = f(x_0 +h, t) - f(x_0, t)\, .
$$
Then the one dimensional Mean Value Theorem, applied first in the
$y$- and then in the $x$-direction, yields $\theta,\tilde\theta\in(0,1)$
such that
\begin{equation*}
u(h,y_0 + k) - u(h,y_0)=\frac{\partial u}{\partial t}(h,y_0+\theta
k)k=\big[\frac{\partial f}{\partial y}(x_0+h, y_0 +
\theta k) - \frac{\partial f}{\partial y} (x_0, y_0+ \theta k)\big]k
=\frac{\partial^2f}{\partial x\partial y}(x_0 +
\tilde\theta h, y_0 + \theta k)h k\, .
\end{equation*}
Given $\epsilon>0$, the continuity of $\frac{\partial^2 f}{\partial
x\partial y}$ at $(x_0, y_0)$ implies the
existence of $\delta>0$ such that
$$
\big |\frac{\partial^2f}{\partial x\partial y} (x_0 + \tilde\theta h,
y_0 +\theta k) -
\frac{\partial^2f}{\partial x\partial y}(x_0, y_0)
\big|\leq\epsilon\, ,\text{ if }|h| + |k| <
\delta\, .
$$
Since $v(h,k)=u(h,y_0+k)-u(h,y_0)$, it also holds that
$$
\big |\frac{v(h,k)}{hk} - \frac{\partial^2f}{\partial x\partial
y}(x_0,y_0)\big |\leq\epsilon\,
,\text{ when }hk\neq 0\text{ and }|h| + |k| < \delta.
$$
On the other hand,
\begin{multline*}
\lim_{k \to 0} \lim_{h \to 0} \frac{v(h, k)}{hk}
=\lim_{k \to 0}\frac{1}{k} \lim_{h \to 0} \frac{f(x_0 +h, y_0 +k) -
f(x_0, y_0 +k) -
f(x_0 +h, y_0) + f(x_0, y_0)}{h} \\
=\lim_{k \to 0} \frac{1}{k} \left [ \frac{\partial f}{\partial x}(x_0,
y_0 +k) - \frac{\partial f}{\partial
x} ( x_0, y_0) \right ]=\frac{\partial ^2f}{\partial y\partial x}
(x_0, y_0),
\end{multline*}
provided the iterated limit $\lim_{k \to 0} \lim_{h \to 0}
\frac{v(h,k)}{hk}$ exists. The inner limit does exist since
$\frac{\partial f}{\partial x}$ exists near $(x_0,y_0)$, and the
estimate above then yields
$$
\lim_{k \to 0} \lim_{h \to 0}\Bigl[\frac{v(h,k)}{hk}
-\frac{\partial^2f}{\partial x\partial y} ( x_0,
y_0)\Bigr]=0\, .
$$
Hence $\frac{\partial^2f}{\partial y\partial x} (x_0, y_0)$ exists and
coincides with $\frac{\partial^2f}{\partial
x\partial y}(x_0, y_0)$.
Next, consider a map $f: \mathbb{R}^n \to \mathbb{R}^m$,
$$
f(x) = \begin{bmatrix}f_1(x) \\ \vdots \\ f_m(x)\end{bmatrix}\, ,\:x =
\begin{bmatrix}x_1 \\ \vdots \\ x_n \end{bmatrix},\
x_0 = \begin{bmatrix}x^0_1 \\ \vdots \\ x^0_n \end{bmatrix}.
$$
Then $f$ is said to be differentiable at $x_0$ if there is an $m \times
n$ matrix $A$ such that
$$
\lim_{x \to x_0} \frac{ \| f(x) - f(x_0) - A(x- x_0) \|}{\| x- x_0\|} = 0.
$$
Such a matrix is again denoted by $Df(x_0)$ or $f'(x_0)$.
Remark
Notice that differentiability of a function $f:X\to Y$ between general
normed vector spaces $\bigl( X,\|\cdot\| _X\bigr)$ and $\bigl(
Y,\|\cdot\| _Y\bigr)$ at a point $x_0\in X$ is defined as the
existence of a bounded, linear map $A:X\to Y$ with the property that
$$
\| f(x)-f(x_0)-A(x-x_0)\| _Y=o\bigl( \| x-x_0\| _X\bigr)\text{ as }x\to x_0\, .
$$
A linear map $A:X\to Y$ is called bounded if it satisfies
$$
\| Ax\| _Y\leq C\,\| x\| _X\, ,\: x\in X\, ,
$$
for some constant $C\geq 0$. Again the notation $Df(x_0)=f'(x_0)=A$ is
typically used. With this definition it is apparent that $f$ is
differentiable at $x_0$ iff it can be approximated by an affine map to
better than first order.
Exercise
Prove that
$$
\| A\|:=\sup _{\| x\| _X=1}\| Ax\| _Y=\sup _{x\neq 0}\frac{\| Ax\|
_Y}{\| x\|_X}
$$
is a norm on the space $\mathcal{L}(X,Y)$ of linear and continuous
maps from $X$ to $Y$. If $B\in \mathcal{L}(Y,Z)$ for an additional
normed vector space $(Z,\|\cdot\| _Z)$, prove that $\| BA\|\leq \| B\|
\| A\|$.
Proposition
If $f:\mathbb{R}^n\to \mathbb{R}^m$ is differentiable at $x_0$, then
$\frac{\partial f_i}{\partial x_j} (x_0)$ exists for $1 \le i \le m$
and $1 \le j \le n$ and
$$
Df(x_0) = \begin{bmatrix}\frac{\partial f_1}{\partial x_1} & \cdots &
\frac{\partial f_1}{\partial x_n}\\\vdots & & \vdots \\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial
f_m}{\partial x_n} \end{bmatrix}(x_0)\, .
Notation
The Jacobian $\operatorname{Jac}(f)$ of $f$ at $x$ is defined, for
$m=n$, as $\operatorname{Jac}(f)(x)=\det Df(x)$.
Theorem
Let $E\subset \mathbb{R}^n$ be convex, let $f: E\to \mathbb{R}^m$ be
differentiable, and assume that $\sup _{x\in E}\| Df(x)\| \le
M<\infty$. Then
$$
\| f(x) - f(y)\| \le M \| x - y\|\, ,\: x, y\in E\, .
$$
The proof is left as an exercise.
Theorem (Chain Rule)
Let $f: \mathbb{R}^n \to \mathbb{R}^m$ and $g: \mathbb{R}^m \to
\mathbb{R}^k$ be differentiable functions. Then
$g \circ f: \mathbb{R}^n \to \mathbb{R}^k$ is differentiable and
$$
D\bigl(g \circ f\bigr)(x) = Dg\bigl(f(x)\bigr)Df(x)\, .
$$
Denoting the composition of linear maps on the right-hand-side of the
last identity as a multiplication is motivated by the fact that it
amounts to matrix multiplication of the corresponding representation
matrices.
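The identity can be sanity-checked numerically for concrete maps by
comparing the matrix product with a finite difference approximation of
$D(g\circ f)$; the maps in the Python sketch below are arbitrary
illustrations of ours:
\begin{verbatim}
import numpy as np

f  = lambda x: np.array([x[0] * x[1], np.sin(x[0])])
Df = lambda x: np.array([[x[1], x[0]], [np.cos(x[0]), 0.0]])
g  = lambda u: np.array([u[0]**2 + u[1]])
Dg = lambda u: np.array([[2 * u[0], 1.0]])

x = np.array([0.3, 0.7])
chain = Dg(f(x)) @ Df(x)          # Dg(f(x)) Df(x)

h = 1e-6                          # finite difference Jacobian of g o f
fd = np.array([[(g(f(x + h * e)) - g(f(x)))[0] / h for e in np.eye(2)]])
print(chain, fd)                  # agree up to O(h)
\end{verbatim}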
Inverse Function and Open Mapping Theorems
Theorem (Inverse Function Theorem)
Let $E$ be a domain in $\mathbb{R}^n$ and $f: E \to \mathbb{R}^n$ be a
continuously differentiable function. Let $x_0\in E$ and assume that
$Df(x_0)$ is invertible. Then
(i) There is $\delta>0$ such that $f$ is 1-1 on $\mathbb{B}(x_0,
\delta)$ and $f\bigl(\mathbb{B}(x_0, \delta)\bigr)$ is open.
(ii) If $g: V=f\bigl(\mathbb{B}(x_0, \delta)\bigr)\to \mathbb{B}(x_0,
\delta)$ denotes the inverse function to $f$, then it is continuously
differentiable and
$$
Dg(y) =\big[Df\bigl(g(y)\bigr)\big]^{-1}\, ,\: y\in V\, .
$$
First we observe that, without loss of generality, we can assume that
$x_0=0$ and that $y_0=f(x_0)=0$. Indeed if $x_0\neq 0$ and/or if
$f(x_0)=y_0\neq 0$, simply consider
$$
\tilde f(x) =f(x+x_0)-y_0
$$
instead of $f$ itself. Clearly $\tilde f(0)=f(x_0)-y_0=0$, the smoothness
assumption remains valid, and so does the invertibility assumption,
since $D\tilde f(0)=Df(x_0)$.
Next rewrite $f(x)=y$ as $Df(0)x=Df(0)x+y-f(x)$, which, under the
invertibility assumption, amounts to
$$
x=x+Df(0)^{-1}\bigl( y-f(x)\bigr)=:\Phi_y(x).
$$
In other words, we have that, for any fixed $y$,
$$
x\text{ solves }f(x)=y\:\Longleftrightarrow\: x\text{ is a fixed
point of }\Phi_y.
$$
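This reformulation is constructive: once $\Phi_y$ is shown below to be
a contraction, iterating it from $x=0$ produces the solution. A
minimal Python sketch of the resulting iteration, for an arbitrary
illustrative $f$ with $f(0)=0$ and $Df(0)=\mathbb{1}_2$:
\begin{verbatim}
import numpy as np

f = lambda x: np.array([x[0] + x[1]**2, np.sin(x[1])])  # f(0) = 0
Df0_inv = np.linalg.inv(np.array([[1.0, 0.0],
                                  [0.0, 1.0]]))         # Df(0) = id

def solve(y, steps=50):
    # fixed point iteration x -> Phi_y(x) = x + Df(0)^{-1} (y - f(x))
    x = np.zeros(2)
    for _ in range(steps):
        x = x + Df0_inv @ (y - f(x))
    return x

y = np.array([0.01, 0.02])
x = solve(y)
print(x, f(x) - y)   # the residual f(x) - y is numerically zero
\end{verbatim}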
Next observe that
$$
\big |\Phi_y(x)\big |= \big | x+Df(0)^{-1}\bigl( y-f(x)\bigr)\big
|=\big | Df(0)^{-1}\bigl( y-o(|x|)\bigr)\big |\leq \| Df(0)^{-1}\|
\bigl( |y|+o(|x|)\bigr)=c\, |y|+o(|x|)\text{ as }x\to 0,
$$
where $c:=\| Df(0)^{-1}\|$, since it holds by assumption that
$f(x)=Df(0)x+o(|x|)$. Since $\delta _1>0$ can be found such that
$$
o(|x|)\leq \frac{1}{2}|x|\leq \frac{1}{2}\delta _1\text{ for }|x|\leq
\delta_1,
$$
it can be inferred that
$$
\big |\Phi_y(x)\big |\leq \delta_1/2+\delta_1/2=\delta_1\text{ if
}|x|\leq \delta_1\text{ and }|y|\leq \delta_1/(2c).
$$
This means that
$$
\Phi_y \bigl( \overline{\mathbb{B}}(0,\delta_1)\bigr)\subset
\overline{\mathbb{B}}(0,\delta_1)\:\forall\: y\in \mathbb{B}(0,\delta_1/(2c)),
$$
and $\Phi_y$ is a self-map of the given closed ball for any choice of
$y$ in the corresponding ball. On the other hand, we also see that
$$
\big |\Phi_y(x_1)-\Phi_y(x_2) \big |=\big | x_1-x_2-Df(0)^{-1} \bigl(
f(x_1)-f(x_2)\bigr)\big |= \big | x_1-x_2-Df(0)^{-1}Df\bigl(
x_2+\tau(x_1-x_2)\bigr)(x_1-x_2)\big |,
$$
for some $\tau\in(0,1)$. Now $Df$ is continuous at $x_0=0$ and
therefore
$$
Df\bigl( x_2+\tau(x_1-x_2)\bigr)=Df(0)+o(1)\text{ as }x_1,x_2\to 0,
$$
so that we obtain
$$
\big |\Phi_y(x_1)-\Phi_y(x_2) \big |\leq o(1)|x_1-x_2|\text{ as
}x_1,x_2\to 0,
$$
independently of $y$. There is therefore $\delta _2>0$ with
$$
\big |\Phi_y(x_1)-\Phi_y(x_2) \big |\leq
\frac{1}{2}|x_1-x_2|\:\forall\: x_1,x_2\in
\mathbb{B}(0,\delta_2)\text{ and }\forall\: y\in
\mathbb{B}(0,\delta_1/(2c)),
$$
by the definition of $o(1)$. Choosing
$\delta=\min(\delta_1,\delta_2)>0$, it is seen that $\Phi_y$ is a
contractive self-map of $\overline{\mathbb{B}}(0,\delta)$ regardless of the
choice of $y\in \mathbb{B}(0,\delta/(2c))$. The Banach Fixed Point Theorem
now implies
now implies
\begin{equation}\label{soleq}
\forall\: y\in \mathbb{B}(0,\delta/(2c))\: \exists !\: x\in
\overline{\mathbb{B}}(0,\delta)\text{ such that }\Phi_y(x)=x\text{
or, equivalently, such that }f(x)=y.
\end{equation}
Continuity of $f$ at $0$ yields $\tilde \delta >0$ (which can be taken
to satisfy $\tilde\delta\leq\delta$) for which
$$
f \bigl( \mathbb{B}(0,\tilde \delta )\bigr)\subset
\mathbb{B}(0,\delta/(2c)),
$$
so that $f\big |_{\mathbb{B}(0,\tilde \delta )}$ is injective and thus
bijective onto its range. Using \eqref{soleq}, one obtains a map
$X=X(y)$ satisfying $X(y)=\Phi_y \bigl( X(y)\bigr)$. Differentiating
this fixed point equation with respect to $y$ (taking the
differentiability of $X$ for granted here), one obtains
$$
DX(y)=D_y\Phi_y\bigl( X(y)\bigr)+D_x\Phi_y\bigl( X(y)\bigr)DX(y)=
Df(0)^{-1}+\Bigl[ \mathbb{1}_n-Df(0)^{-1}Df\bigl( X(y)\bigr)\Bigr] DX(y),
$$
which can be rewritten as
$$
Df\bigl( X(y)\bigr)DX(y)=\mathbb{1}_n\text{ or }DX(y)=Df\bigl(
X(y)\bigr)^{-1}.
$$
It follows that $[f\big |_{\mathbb{B}(0,\tilde \delta )}]^{-1}$ is
differentiable, that $D\bigl[f^{-1}\bigr](y)=\bigl[Df(x)\bigr]^{-1}$
if $y=f(x)$, and that
$f \bigl( \mathbb{B}(0,\tilde \delta )\bigr)$ is open since
$$
f \bigl( \mathbb{B}(0,\tilde \delta)\bigr)=
\bigl[f^{-1}\bigr]^{-1}\bigl( \mathbb{B}(0,\tilde
\delta)\bigr)
$$
is the preimage of an open ball under the continuous map $f^{-1}$.
Exercise
Based on the proof of the Inverse Function Theorem, what corresponds
to the $\delta >0$ appearing in the formulation of the theorem?
Corollary (Open Mapping Theorem)
If $f:E\to \mathbb{R}^n$ is continuously differentiable and such that
$\operatorname{det} Df(x)\ne 0$ for all $x \in E$, then $f$ is open,
i.e. it maps open sets to open sets.
Implicit Function Theorem
Next we consider an important consequence of the inverse function
theorem.
Theorem (Implicit Function Theorem)
Let
$$
f =(f_1,\dots, f_n):\mathbb{R}^{n+m} \to \mathbb{R}^n
$$
be a continuously differentiable map. Assume there is $(a,b) \in
\mathbb{R}^{n+m}$ such that $f(a,b) =0$ and that $D_xf(a,b)$ is
invertible, where $D_x$ indicates that we are taking the derivative
with respect to the $x$-variables only. Then there are open sets $U
\subset \mathbb{R}^{n+m}$ and $W \subset \mathbb{R}^m$ with
$$
(a,b)\in U\text{ and }b\in W,
$$
and a function $g:W\to \mathbb{R}^n$ such that, for each fixed
$y\in W$, the point $\bigl(g(y), y\bigr)$ is the only solution
$(x,y)\in U$ of $f(x,y)=0$. In particular
$$
f\bigl(g(y),y\bigr)=0\, ,\: y\in W.
$$
Define the map
$$
F: \mathbb{R}^{n+m} \to \mathbb{R}^{n+m}\,
,\: \begin{bmatrix}x\\y\end{bmatrix}\mapsto \begin{bmatrix}f(x,y)\\
y\end{bmatrix}\, ,
$$
and observe that $F(a,b) =(0,b)$ and that $F$ is continuously
differentiable. Its derivative is given by
$$
DF(x,y) = \begin{bmatrix}D_xf & D_yf \\ 0 & \mathbb{1}_m \end{bmatrix}
$$
so that $\det DF(a,b) =\det D_xf(a,b) \ne 0$ by assumption. The inverse
function theorem then yields a function $G: V \to U$, where $U
=\mathbb{B}\bigl((a,b), r\bigr)$ for some $r>0$ and $V = F(U)$, such
that $G \circ F =\operatorname{id}_U$ and $F\circ G
=\operatorname{id}_V$. In particular
$$
\begin{bmatrix}0 \\ y \end{bmatrix} = \bigl(F\circ
G\bigr)(0,y)=\begin{bmatrix}
f(G_1(0,y),G_2(0,y) )\\G_2(0,y)\end{bmatrix}\, .
$$
It follows that $G_2(0,y)=y$. Defining the function $g$ by
$g(y)=G_1(0, y)$ for $y\in W=\{\tilde y\in \mathbb{R}^m: (0,\tilde y)
\in V\}$, which is open since $V$ is, one has finally that
$$
f\bigl(g(y),y\bigr) = 0,\quad y\in W\, ,
$$
as claimed. The differentiability claim of the Inverse Function
Theorem and the chain rule now yield
\begin{equation*}
D_xf\bigl(g(y), y\bigr) Dg(y) + D_yf\bigl(g(y),y\bigr) =
0\Leftrightarrow Dg(y) = -\big[D_xf\bigl(g(y),y\bigr)\big]^{-1}
D_yf\bigl(g(y),y\bigr).
\end{equation*}
Remark
The theorem implies that the relation $f(x,y) =0$ locally determines a
function $g$ of the variable $y$ such that $f\bigl(g(y),y\bigr)=0$ in
the vicinity of any point satisfying the assumptions. Moreover
$$
Dg(y) = -D_xf\bigl(g(y),y\bigr)^{-1} D_yf\bigl(g(y),y\bigr)\, ,\: y\in
W\, ,
$$
where
$$
D_xf = \begin{bmatrix}\frac{\partial f_1}{\partial x_1} & \cdots &
\frac{\partial f_1}{\partial x_n} \\
\vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots &
\frac{\partial f_n}{\partial
x_n} \end{bmatrix},
\quad
D_yf = \begin{bmatrix}\frac{\partial f_1}{\partial y_1} & \cdots &
\frac{\partial f_1}{\partial y_m} \\
\vdots & & \vdots \\ \frac{\partial f_n}{\partial y_1} & \cdots &
\frac{\partial f_n}{\partial
y_m} \end{bmatrix}
$$
Example
Let $f = (f_1, f_2): \mathbb{R}^{2+3} \to \mathbb{R}^2$ be defined by
$$
f(x_1,x_2,y_1,y_2,y_3)= \begin{bmatrix}2e^{x_1} + x_2 y_1 - 4y_2 +
3\\ x_2 \cos(x_1) - 6x_1 + 2y_1-y_3 \end{bmatrix}
$$
Let $a=(0,1)$, $b=(3,2,7)$. Then
$$
f(a,b) = \begin{bmatrix}2+3-4\cdot2+3 \\1- 0 +2\cdot 3 -7
\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix}.
$$
It is clear that $f$ is a continuously differentiable map and that
\begin{equation*}
Df=\begin{bmatrix} 2e^{x_1} & y_1 & x_2 & -4 & 0 \\ -x_2 \sin(x_1) -6
& \cos(x_1) & 2 & 0 & -1\end{bmatrix}\, .
\end{equation*}
Then
$$
D_xf(a,b) = \begin{bmatrix}2e^{x_1} & y_1 \\ -x_2 \sin(x_1) -6 &
\cos(x_1) \end{bmatrix}\bigg|_{(x,y)=(a,b)}= \begin{bmatrix}2 & 3 \\ -6 & 1
\end{bmatrix},
$$
which is non-singular since its determinant is $20\ne 0$. The implicit
function theorem then yields a function $x=g(y)$ defined in some
neighborhood $W$ of $b$ such that $f\bigl(g(y),y\bigr)=0$ for $y\in
W$, and
$$
Dg(y) = -\big[D_xf\bigl(g(y),y\bigr)\big]^{-1} D_yf\bigl(g(y),y\bigr)\, .
$$
In particular
$$
Dg(b) = -\frac{1}{20}\begin{bmatrix}1 & -3 \\ 6 & 2 \end{bmatrix}
\begin{bmatrix}1 & -4 & 0 \\ 2 & 0 & -1\end{bmatrix}=-
\frac{1}{20}\begin{bmatrix}-5 & -4 & 3 \\ 10 & -24 & -2\end{bmatrix}\, .
$$
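The linear algebra in this last step can be reproduced numerically,
e.g. with numpy; a short Python sketch:
\begin{verbatim}
import numpy as np

Dxf = np.array([[2.0, 3.0], [-6.0, 1.0]])
Dyf = np.array([[1.0, -4.0, 0.0], [2.0, 0.0, -1.0]])

Dg_b = -np.linalg.solve(Dxf, Dyf)   # = -Dxf^{-1} Dyf
print(-20 * Dg_b)                   # matrix inside the -1/20 factor
\end{verbatim}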