A prime example of the usefulness of differentiability is the widely
used Inverse Function Theorem (formulated and proved later in this
lecture). Here we do not yet give a proof but point to the intuitive
nature of the result, which also informs the way in which a rigorous
proof can and will be given. Let $n\in \mathbb{N}$, consider a
continuously differentiable function $f:\mathbb{R}^n\to \mathbb{R}^n$,
and fix a point $x_0\in \mathbb{R}^n$, which, without loss of
generality, can be taken to be the origin, i.e. $x_0=0$. The function
$f$ will assume some value $f(0)$ there, which we can also assume to
vanish. The question we care to answer is the following:
if we are given $y\simeq 0$, when is it possible to say that there is
exactly one $x\simeq 0$ which solves $f(x)=y$?
Before we try and answer this question, let us investigate the limits
of our expectations. First notice that the question is purely local
and has to be. If you consider the function $f(x)=\sin(x)$ in one
dimension, then $f(0)=0$ and we can ask the question above: is there a
unique small solution of the equation $\sin(x)=y$ for given small
$y$? It is clear that we would have infinitely many solutions if we
were to drop the requirement that $x$ be small, due to the periodicity
of the function. Choosing $f(x)=|x|^2e_1$ for $x\in \mathbb{R}^n$ and
any $n\in \mathbb{N}$, where $e_1=(1,0,\dots,0)$, one has that
$f(0)=0$, but it is easy to find small $y\in \mathbb{R}^n$ for which
no solution can exist (e.g. any small $y$ with negative first
component). This example shows that we need to require local
surjectivity somehow. Think about how this could be achieved before
reading on.
Next, let us get a grasp of the nature of the problem at hand. Since
we assume differentiability we know that
$$
f(x)= f(0)+Df(0)x+o(|x|)=Df(0)x+o(|x|)\text{ as }x\to 0.
$$
Given small $y$, the problem $f(x)=y$ is therefore approximately given
by
$$
Df(0)x=y,
$$
and it would appear that the invertibility of $Df(0)$ should be sufficient to
solve the problem. This is indeed the case in a precise way as
formulated in the Inverse Function Theorem. Notice, however, that this
invertibility condition is not necessary as the one dimensional
example $f(x)=x^3$ shows.
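To see the heuristic at work in one dimension, return to
$f(x)=\sin(x)$: there $Df(0)=\cos(0)=1$ is invertible, the linearized
problem reads $x=y$, and indeed $x=\arcsin(y)$ is the unique small
solution of $\sin(x)=y$ for small $y$; it deviates from the linear
prediction only at higher order, since
$$
\arcsin(y)=y+\frac{y^3}{6}+O(y^5)\text{ as }y\to 0.
$$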
Differentiation in $\mathbb{R}^n$
Definition (Partial Derivative)
Let $f:\mathbb{R}^n\to \mathbb{R}$ and $x_0 \in \mathbb{R}^n$ be given. For
$j\in\{1,\dots,n\}$, the partial derivative of $f$ at
$x_0$ with respect to $x_j$ is defined through
$$
\frac{\partial f}{\partial x_j} (x_0) = \lim_{h \to 0} \frac{f(x_0 +
he_j) - f(x_0)}{h},
$$
where the $k$-th component of the standard basis vector $e_j$ is given by
$e_j^k=\begin{cases} 1\, ,&k=j\, ,\\ 0\, ,&k\neq j\, .\end{cases}$
Example
Let $f(x,y,z) = x^2 + y ^2 + z^2$ be a function on $\mathbb{R}^3$. Then
$$
\frac{\partial f}{\partial x} = 2x\, ,\:\frac{\partial f}{\partial y} = 2y\, ,\text{ and
}\frac{\partial f}{\partial z} = 2z.
$$
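The defining difference quotients can also be checked numerically. The
following is a minimal Python sketch, assuming numpy is available; the
helper name partial_derivative and the step size are of course our own
choices, not canonical:
\begin{verbatim}
import numpy as np

def partial_derivative(f, x0, j, h=1e-6):
    # forward difference quotient approximating df/dx_j at x0
    e_j = np.zeros_like(x0)
    e_j[j] = 1.0
    return (f(x0 + h * e_j) - f(x0)) / h

f = lambda x: x[0]**2 + x[1]**2 + x[2]**2
x0 = np.array([1.0, 2.0, 3.0])
# compare with the exact values 2x = 2, 2y = 4, 2z = 6
print([partial_derivative(f, x0, j) for j in range(3)])
\end{verbatim}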
In the one variable case, the existence of $f'(x_0)$ implies
continuity of $f$ at $x_0$. In the case of multiple variables, things
are slightly more complicated.
Definition (Differentiability)
Let $f:\mathbb{R}^n\to \mathbb{R}$ and $x_0 \in\mathbb{R}^n$. Then $f$
is said to be differentiable at $x_0$ iff there is a
linear map from $\mathbb{R}^n$ to $\mathbb{R}$, denoted by $Df(x_0)$, with
$$
\lim_{x\to x_0} \frac{ \| f(x) - f(x_0) - Df(x_0)(x - x_0)\|}{\|x-x_0\|} = 0\, ,
$$
or, equivalently, iff
$$
f(x) - f(x_0) - Df(x_0)(x - x_0)=o\bigl(\| x-x_0\|\bigr)\text{ as
}x\to x_0.
$$
If we tacitly agree that $\mathbb{R}^n$ is equipped with its standard
basis $e_1,\dots,e_n$, then $Df(x_0)$ can be represented as a specific
$\mathbb{R}^{1\times n}$ matrix. Which one?
Remark
We often will write $f'(x_0)$ for $Df(x_0)$ if no confusion seems likely.
Example
Let $n>1$ and assume that $\frac{\partial f}{\partial
x_1}(x),\dots,\frac{\partial f}{\partial x_n}(x)$ exist for $x$ near
$x_0$. Does it follow that $f$ is continuous at $x_0$?
The following counterexample shows that the answer is no. Let $f$
be given by
$$
f(x,y) =\begin{cases}\frac{xy}{x^2 + y^2}\, ,&(x,y) \ne (0,0)\\
0\, ,&(x,y) = (0,0)\, .\end{cases}
$$
Then $f$ is not continuous at $(0,0)$: along the diagonal one has
$f(t,t)=\frac{1}{2}$ for all $t\neq 0$, while $f(0,0)=0$. However,
$$
\frac{\partial f}{\partial x}(0,0)=\lim_{h\to
0}\frac{f(h,0)-f(0,0)}{h}=\lim_{h\to 0}\frac{0}{h}=0
$$
and
$$
\frac{\partial f}{\partial y}(0,0)=\lim_{h\to
0}\frac{f(0,h)-f(0,0)}{h}=\lim_{h\to 0}\frac{0}{h}=0
$$
both exist as do all partial derivatives at any $(x,y)\neq(0,0)$.
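The discontinuity along the diagonal is easily observed numerically as
well; a small Python sketch in the same spirit as before (the function
name is ours):
\begin{verbatim}
def f(x, y):
    # the counterexample: xy/(x^2+y^2) away from the origin, 0 at it
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x**2 + y**2)

for t in [1e-1, 1e-3, 1e-6]:
    # along the axis f vanishes, but along the diagonal f(t,t) = 1/2
    print(f(t, 0.0), f(t, t))
\end{verbatim}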
Definition (Gradient)
The gradient of a function $f$ which is differentiable at $x_0$ is the
vector $\nabla f(x_0)\in \mathbb{R}^n$ defined by the validity of
$$
\nabla f(x_0)\cdot h= Df(x_0)h\qquad\forall\, h\in \mathbb{R}^n\, .
$$
It follows that
$$
\nabla f(x_0)=\begin{bmatrix}\frac{\partial f}{\partial x_1}
(x_0)\\\vdots\\\frac{\partial f}{\partial x_n}
(x_0)\end{bmatrix}=Df(x_0)^\top.
$$
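For the function $f(x,y,z)=x^2+y^2+z^2$ from the example above, this reads
$$
Df(x,y,z)=\begin{bmatrix}2x & 2y & 2z\end{bmatrix}\quad\text{and}\quad
\nabla f(x,y,z)=\begin{bmatrix}2x\\2y\\2z\end{bmatrix}.
$$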
Proposition
If $f$ is differentiable at $x_0$, then
(i) $\frac{\partial f}{\partial x_j}(x_0)$ exists for $j=1,\dots,n$.
(ii) $f$ is continuous at $x_0$.
What is a natural sufficient condition to test whether
$f:\mathbb{R}^n\to \mathbb{R}$ is differentiable at a point $x_0\in
\mathbb{R}^n$?
Theorem
If $\frac{\partial f}{\partial x_j}(x)$ exists in a neighborhood of
$x_0$ and is continuous at $x_0$ for $j=1,\dots,n$, then $f$ is
differentiable at $x_0$.
Consider the telescopic decomposition
\begin{multline*}
f(x_0^1+h^1,\dots, x_0^n+h^n)-f(x_0)=f(x_0^1+h^1,x_0^2+h^2,\dots,
x_0^n+h^n)-f(x_0^1,x_0^2+h^2,\dots,
x_0^n+h^n)\\
+\dots+f(x_0^1,\dots,x^{n-1}_0,x_0^n+h^n)-
f(x_0^1,\dots,x_0^n).
\end{multline*}
The existence of the partial derivatives and the one dimensional Mean
Value Theorem yield $\tilde h^j\in(0,h^j)$ for $j=1,\dots,n$ such that
\begin{multline*}
f(x_0^1,\dots, x_0^{j-1},x_0^j+h^j,\dots,x_0^n+h^n)-f(x_0^1,\dots,
x_0^j,x_0^{j+1}+h^{j+1},\dots, x_0^n+h^n)\\ =\frac{\partial}{\partial
x_j}f(x_0^1,\dots,x_0^{j-1},x_0^j+\tilde
h^j,x_0^{j+1}+h^{j+1},\dots, x_0^n+h^n)\,({x_0^j}+h^j-{x_0^j}).
\end{multline*}
Summarizing we see that
$$
f(x_0+h)-f(x_0)-\sum_{j=1}^n \frac{\partial}{\partial x_j}f(x_0)\,
h^j= \sum_{j=1}^n \big[\frac{\partial}{\partial x_j}f(x_0)-\frac{\partial}{\partial
x_j}f(\tilde x_j)\big]\, h^j
$$
for $\tilde x_j=(x_0^1,\dots,x_0^{j-1},x_0^j+\tilde
h^j,x_0^{j+1}+h^{j+1},\dots,x_0^n+h^n)$. Now the continuity of all
partial derivatives at $x_0$, combined with the Cauchy-Schwarz
inequality, yields that
\begin{equation*}
\frac{1}{\| h\|}\big\|\sum_{j=1}^n \big[\frac{\partial}{\partial
x_j}f(x_0)-\frac{\partial}{\partial
x_j}f(\tilde x_j)\big]\, h^j\big\|\leq\Bigl( \sum_{j=1}^n
\big[\frac{\partial}{\partial x_j}f(x_0)-\frac{\partial}{\partial
x_j}f(\tilde x_j)\big]^2\Bigr)^{1/2}\longrightarrow 0\text{ as }h\to 0\, ,
\end{equation*}
since $\tilde x_j\to x_0$ as $h\to 0$. Differentiability at $x_0$ is
therefore established along with
$$
Df(x_0)=\begin{bmatrix} \frac{\partial}{\partial x_1}f(x_0)\:\cdots
\:\frac{\partial}{\partial x_n}f(x_0)\end{bmatrix}\in \mathbb{R}^{1\times n}\, .
$$
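The $o(\| h\|)$ quality of the approximation can be observed
numerically: the error of the linear approximation, divided by
$\| h\|$, should tend to zero. A minimal Python sketch (function,
point, and direction are arbitrary choices of ours):
\begin{verbatim}
import numpy as np

f = lambda x: x[0]**2 + x[1]**2 + x[2]**2   # Df(x) = [2x 2y 2z]
x0 = np.array([1.0, 2.0, 3.0])
Df = 2 * x0

for r in [1e-1, 1e-2, 1e-3]:
    h = r * np.array([1.0, -1.0, 0.5])
    err = abs(f(x0 + h) - f(x0) - Df @ h) / np.linalg.norm(h)
    print(err)   # decays proportionally to r, i.e. the error is o(|h|)
\end{verbatim}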
Example
Suppose that $\frac{\partial^2f}{\partial x\partial y}(x_0, y_0)$ and
$\frac{\partial^2f}{\partial y\partial x}(x_0, y_0)$
exist. Does it necessarily follow that
$$
\frac{\partial^2 f}{\partial x\partial y} (x_0,y_0) = \frac{\partial^2
f}{\partial y\partial x} (x_0,y_0)?
$$
The answer is no in general as the following counter-example shows:
$$
f(x,y)=\begin{cases}xy \frac{x^2 - y^2}{x^2 + y^2}\,
,&(x,y)\ne(0,0)\\0\, ,&(x,y) = (0,0)\end{cases}
$$
Notice that, for $(x,y) \ne (0,0)$, one has that
$$
f(x,y)=xy - \frac{2xy^3}{x^2 + y^2} = -xy + \frac{2x^3y}{x^2 + y^2}
$$
and thus
\begin{equation*}
\frac{\partial f}{\partial x}(x,y)= y - \frac{2y^3(y^2-x^2)}{(x^2 +
y^2)^2}=\frac{y(x^4+4x^2y^2-y^4)}{(x^2+y^2)^2}\, ,\:
\frac{\partial f}{\partial y}(x,y)= -x + \frac{2x^3(x^2-y^2)}{(x^2 +
y^2)^2}=\frac{x(x^4-4x^2y^2-y^4)}{(x^2+y^2)^2}\, .
\end{equation*}
For $(x,y) = (0,0)$ it can be verified directly from the difference
quotients that
$$
\frac{\partial f}{\partial x}(0,0) = 0\, ,\quad \frac{\partial f}{\partial
y}(0,0) = 0.
$$
Moving to the second order derivatives, a somewhat lengthy computation
with the quotient rule shows that
\begin{align*}
\frac{\partial^2f}{\partial x\partial y}&=\frac{\partial}{\partial
x}\left( \frac{x(x^4-4x^2y^2-y^4)}{(x^2+y^2)^2} \right)
=\frac{x^6+9x^4y^2-9x^2y^4-y^6}{(x^2+y^2)^3}\\
&=\frac{\partial}{\partial
y}\left( \frac{y(x^4+4x^2y^2-y^4)}{(x^2+y^2)^2} \right)
=\frac{\partial^2f}{\partial y\partial x}
\end{align*}
for $(x,y)\neq(0,0)$, so that the mixed partial derivatives actually
coincide away from the origin. At the origin itself, however,
\begin{equation*}
\frac{\partial^2 f}{\partial x\partial y}(0,0)= \lim_{ x \to 0}
\frac{\frac{\partial f}{\partial y}(x,0) - \frac{\partial
f}{\partial y} (0,0)}{x}= \lim_{x \to 0} \frac{x-0}{x} = 1,\:
\frac{\partial^2 f}{\partial y\partial x}(0,0) = \lim_{ y \to 0}
\frac{\frac{\partial f}{\partial x}(0,y) -
\frac{\partial f}{\partial x} (0,0)}{y}=\lim_{y \to 0} \frac{-y-0}{y} = -1
\end{equation*}
so that clearly
$$
\frac{\partial^2 f}{\partial x\partial y}(0,0) \ne \frac{\partial^2
f}{\partial y\partial x}(0,0)\, .
$$
It is natural to ask under which assumptions one has that
$\frac{\partial^2 f}{\partial x\partial y} = \frac{\partial^2
f}{\partial y\partial x}$ in general.
Theorem
Assume that
$$
\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y},
\frac{\partial^2f}{\partial x\partial y}
$$
exist in a neighborhood of $(x_0, y_0)$ and that $\frac{\partial^2
f}{\partial x\partial y}$ is continuous at $(x_0,y_0)$. Then
$\frac{\partial^2f}{\partial y\partial x}$ also exists and
$$
\frac{\partial^2f}{\partial y\partial x}(x_0,y_0) =
\frac{\partial^2f}{\partial x\partial y} (x_0,y_0).
$$
In particular, if $f$ is twice continuously differentiable, then
the mixed partial derivatives coincide.
For $h,k \in \mathbb{R}$ define
$$
v(h,k)=f(x_0 + h, y_0 + k) - f(x_0 +h, y_0) - f(x_0, y_0 +k) + f(x_0,
y_0),
$$
and
$$
u(h,t) = f(x_0 +h, t) - f(x_0, t)\, .
$$
Then the one dimensional Mean Value Theorem, applied first in the
$y$- and then in the $x$-direction, yields $\theta,\tilde\theta\in(0,1)$
such that
\begin{equation*}
u(h,y_0 + k) - u(h,y_0)=\frac{\partial u}{\partial t}(h,y_0+\theta
k)k=\big[\frac{\partial f}{\partial y}(x_0+h, y_0 +
\theta k) - \frac{\partial f}{\partial y} (x_0, y_0+ \theta k)\big]k
=\frac{\partial^2f}{\partial x\partial y}(x_0 +
\tilde\theta h, y_0 + \theta k)h k\, .
\end{equation*}
Given $\epsilon>0$, the continuity of $\frac{\partial^2 f}{\partial
x\partial y}$ at $(x_0, y_0)$ implies the
existence of $\delta>0$ such that
$$
\big |\frac{\partial^2f}{\partial x\partial y} (x_0 + \tilde\theta h,
y_0 +\theta k) -
\frac{\partial^2f}{\partial x\partial y}(x_0, y_0)
\big|\leq\epsilon\, ,\text{ if }|h| + |k| <
\delta\, .
$$
Since $v(h,k)=u(h,y_0+k)-u(h,y_0)$, it also holds that
$$
\big |\frac{v(h,k)}{hk} - \frac{\partial^2f}{\partial x\partial
y}(x_0,y_0)\big |\leq\epsilon\,
,\text{ when }hk\neq 0\text{ and }|h| + |k| < \delta.
$$
On the other hand,
\begin{multline*}
\lim_{k \to 0} \lim_{h \to 0} \frac{v(h, k)}{hk}
=\lim_{k \to 0}\frac{1}{k} \lim_{h \to 0} \frac{f(x_0 +h, y_0 +k) -
f(x_0, y_0 +k) -
f(x_0 +h, y_0) + f(x_0, y_0)}{h} \\
=\lim_{k \to 0} \frac{1}{k} \left [ \frac{\partial f}{\partial x}(x_0,
y_0 +k) - \frac{\partial f}{\partial
x} ( x_0, y_0) \right ]=\frac{\partial ^2f}{\partial y\partial x}
(x_0, y_0),
\end{multline*}
provided the iterated limit $\lim_{k \to 0} \lim_{h \to 0}
\frac{v(h,k)}{hk}$ exists. The inner limit does exist since
$\frac{\partial f}{\partial x}$ exists near $(x_0,y_0)$, and the
estimate above then yields
$$
\lim_{k \to 0} \lim_{h \to 0}\Bigl[\frac{v(h,k)}{hk}
-\frac{\partial^2f}{\partial x\partial y} ( x_0,
y_0)\Bigr]=0\, .
$$
Hence $\frac{\partial^2f}{\partial y\partial x} (x_0, y_0)$ exists and
coincides with $\frac{\partial^2f}{\partial
x\partial y}(x_0, y_0)$.
Next, consider a map $f: \mathbb{R}^n \to \mathbb{R}^m$,
$$
f(x) = \begin{bmatrix}f_1(x) \\ \vdots \\ f_m(x)\end{bmatrix}\, ,\:x =
\begin{bmatrix}x_1 \\ \vdots \\ x_n \end{bmatrix},\
x_0 = \begin{bmatrix}x^0_1 \\ \vdots \\ x^0_n \end{bmatrix}.
$$
Then $f$ is said to be differentiable at $x_0$ if there is an $m \times
n$ matrix $A$ such that
$$
\lim_{x \to x_0} \frac{ \| f(x) - f(x_0) - A(x- x_0) \|}{\| x- x_0\|} = 0.
$$
Such a matrix is again denoted by $Df(x_0)$ or $f'(x_0)$.
Remark
Notice that differentiability of a function $f:X\to Y$ between general
normed vector spaces $\bigl( X,\|\cdot\| _X\bigr)$ and $\bigl(
Y,\|\cdot\| _Y\bigr)$ at a point $x_0\in X$ is defined as the
existence of a bounded, linear map $A:X\to Y$ with the property that
$$
\| f(x)-f(x_0)-A(x-x_0)\| _Y=o\bigl( \| x-x_0\| _X\bigr)\text{ as }x\to x_0\, .
$$
A linear map $A:X\to Y$ is called bounded if it satisfies
$$
\| Ax\| _Y\leq C\,\| x\| _X\, ,\: x\in X\, ,
$$
for some constant $C\geq 0$. Again the notation $Df(x_0)=f'(x_0)=A$ is
typically used. With this definition it is apparent that $f$ is
differentiable at $x_0$ iff it can be approximated by an affine map to
better than first order.
Exercise
Prove that
$$
\| A\|:=\sup _{\| x\| _X=1}\| Ax\| _Y=\sup _{x\neq 0}\frac{\| Ax\|
_Y}{\| x\|_X}
$$
is a norm on the space $\mathcal{L}(X,Y)$ of linear and continuous
maps from $X$ to $Y$. If $B\in \mathcal{L}(Y,Z)$ for an additional
normed vector space $(Z,\|\cdot\| _Z)$, prove that $\| BA\|\leq \| B\|
\| A\|$.
Proposition
If $f:\mathbb{R}^n\to \mathbb{R}^m$ is differentiable at $x_0$, then
$\frac{\partial f_i}{\partial x_j} (x_0)$ exists for $1 \le i \le m$
and $1 \le j \le n$ and
$$
Df(x_0) = \begin{bmatrix}\frac{\partial f_1}{\partial x_1} & \cdots &
\frac{\partial f_1}{\partial x_n}\\\vdots & & \vdots \\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial
f_m}{\partial x_n} \end{bmatrix}(x_0)\, .
Notation
The Jacobian $\operatorname{Jac}(f)$ of $f$ at $x$ is defined, for
$m=n$, as $\operatorname{Jac}(f)(x)=\det Df(x)$.
Theorem
Let $E\subset \mathbb{R}^n$ be convex, let $f: E\to \mathbb{R}^m$ be
differentiable, and assume that $\sup _{x\in E}\| Df(x)\| \le
M<\infty$. Then
$$
\| f(x) - f(y)\| \le M \| x - y\|\, ,\: x, y\in E\, .
$$
The proof is left as an exercise.
Theorem (Chain Rule)
Let $f: \mathbb{R}^n \to \mathbb{R}^m$ and $g: \mathbb{R}^m \to
\mathbb{R}^k$ be differentiable functions. Then
$g \circ f: \mathbb{R}^n \to \mathbb{R}^k$ is differentiable and
$$
D\bigl(g \circ f\bigr)(x) = Dg\bigl(f(x)\bigr)Df(x)\, .
$$
Denoting the composition of linear maps on the right-hand-side of the
last identity as a multiplication is motivated by the fact that it
amounts to matrix multiplication of the corresponding representation
matrices.
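The identity can be sanity-checked numerically for concrete maps by
comparing the matrix product with a finite difference approximation of
$D(g\circ f)$; the maps in the Python sketch below are arbitrary
illustrations of ours:
\begin{verbatim}
import numpy as np

f  = lambda x: np.array([x[0] * x[1], np.sin(x[0])])
Df = lambda x: np.array([[x[1], x[0]], [np.cos(x[0]), 0.0]])
g  = lambda u: np.array([u[0]**2 + u[1]])
Dg = lambda u: np.array([[2 * u[0], 1.0]])

x = np.array([0.3, 0.7])
chain = Dg(f(x)) @ Df(x)          # Dg(f(x)) Df(x)

h = 1e-6                          # finite difference Jacobian of g o f
fd = np.array([[(g(f(x + h * e)) - g(f(x)))[0] / h for e in np.eye(2)]])
print(chain, fd)                  # agree up to O(h)
\end{verbatim}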
Inverse Function and Open Mapping Theorems
Theorem (Inverse Function Theorem)
Let $E$ be a domain in $\mathbb{R}^n$ and $f: E \to \mathbb{R}^n$ be a
continuously differentiable function. Let $x_0\in E$ and assume that
$Df(x_0)$ is invertible. Then
(i) There is $\delta>0$ such that $f$ is 1-1 on $\mathbb{B}(x_0,
\delta)$ and $f\bigl(\mathbb{B}(x_0, \delta)\bigr)$ is open.
(ii) If $g: V=f\bigl(\mathbb{B}(x_0, \delta)\bigr)\to \mathbb{B}(x_0,
\delta)$ denotes the inverse function to $f$, then it is continuously
differentiable and
$$
Dg(y) =\big[Df\bigl(g(y)\bigr)\big]^{-1}\, ,\: y\in V\, .
$$
First we observe that, without loss of generality, we can assume that
$x_0=0$ and that $y_0=f(x_0)=0$. Indeed if $x_0\neq 0$ and/or if
$f(x_0)=y_0\neq 0$, simply consider
$$
\tilde f(x) =f(x+x_0)-y_0
$$
instead of $f$ itself. Clearly $\tilde f(0)=f(x_0)-y_0=0$, the smoothness
assumption remains valid, and so does the invertibility assumption,
since $D\tilde f(0)=Df(x_0)$.
Next rewrite $f(x)=y$ as $Df(0)x=Df(0)x+y-f(x)$, which, under the
invertibility assumption, amounts to
$$
x=x+Df(0)^{-1}\bigl( y-f(x)\bigr)=:\Phi_y(x).
$$
In other words, we have that, for any fixed $y$,
$$
x\text{ solves }f(x)=y\:\Longleftrightarrow\: x\text{ is a fixed
point of }\Phi_y.
$$
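This reformulation is constructive: once $\Phi_y$ is shown below to be
a contraction, iterating it from $x=0$ produces the solution. A
minimal Python sketch of the resulting iteration, for an arbitrary
illustrative $f$ with $f(0)=0$ and $Df(0)=\mathbb{1}_2$:
\begin{verbatim}
import numpy as np

f = lambda x: np.array([x[0] + x[1]**2, np.sin(x[1])])  # f(0) = 0
Df0_inv = np.linalg.inv(np.array([[1.0, 0.0],
                                  [0.0, 1.0]]))         # Df(0) = id

def solve(y, steps=50):
    # fixed point iteration x -> Phi_y(x) = x + Df(0)^{-1} (y - f(x))
    x = np.zeros(2)
    for _ in range(steps):
        x = x + Df0_inv @ (y - f(x))
    return x

y = np.array([0.01, 0.02])
x = solve(y)
print(x, f(x) - y)   # the residual f(x) - y is numerically zero
\end{verbatim}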
Next observe that
$$
\big |\Phi_y(x)\big |= \big | x+Df(0)^{-1}\bigl( y-f(x)\bigr)\big
|=\big | Df(0)^{-1}\bigl( y-o(|x|)\bigr)\big |\leq \| Df(0)^{-1}\|
\bigl( |y|+o(|x|)\bigr)=c\, |y|+o(|x|)\text{ as }x\to 0,
$$
where $c:=\| Df(0)^{-1}\|$, since it holds by assumption that
$f(x)=Df(0)x+o(|x|)$. Since $\delta _1>0$ can be found such that
$$
o(|x|)\leq \frac{1}{2}|x|\leq \frac{1}{2}\delta _1\text{ for }|x|\leq
\delta_1,
$$
it can be inferred that
$$
\big |\Phi_y(x)\big |\leq \delta_1/2+\delta_1/2=\delta_1\text{ if
}|x|\leq \delta_1\text{ and }|y|\leq \delta_1/(2c).
$$
This means that
$$
\Phi_y \bigl( \overline{\mathbb{B}}(0,\delta_1)\bigr)\subset
\overline{\mathbb{B}}(0,\delta_1)\:\forall\: y\in \mathbb{B}(0,\delta_1/(2c)),
$$
and $\Phi_y$ is a self-map of the given closed ball for any choice of
$y$ in the corresponding ball. On the other hand, we also see that
$$
\big |\Phi_y(x_1)-\Phi_y(x_2) \big |=\big | x_1-x_2-Df(0)^{-1} \bigl(
f(x_1)-f(x_2)\bigr)\big |= \big | x_1-x_2-Df(0)^{-1}Df\bigl(
x_2+\tau(x_1-x_2)\bigr)(x_1-x_2)\big |,
$$
for some $\tau\in(0,1)$. Now $Df$ is continuous at $x_0=0$ and
therefore
$$
Df\bigl( x_2+\tau(x_1-x_2)\bigr)=Df(0)+o(1)\text{ as }x_1,x_2\to 0,
$$
so that we obtain
$$
\big |\Phi_y(x_1)-\Phi_y(x_2) \big |\leq o(1)|x_1-x_2|\text{ as
}x_1,x_2\to 0,
$$
independently of $y$. There is therefore $\delta _2>0$ with
$$
\big |\Phi_y(x_1)-\Phi_y(x_2) \big |\leq
\frac{1}{2}|x_1-x_2|\:\forall\: x_1,x_2\in
\mathbb{B}(0,\delta_2)\text{ and }\forall\: y\in
\mathbb{B}(0,\delta_1/(2c)),
$$
by the definition of $o(1)$. Choosing
$\delta=\min(\delta_1,\delta_2)>0$, it is seen that $\Phi_y$ is a
contractive self-map of $\overline{\mathbb{B}}(0,\delta)$ regardless of the
choice of $y\in \mathbb{B}(0,\delta/(2c))$. The Banach Fixed Point Theorem
now implies
now implies
\begin{equation}\label{soleq}
\forall\: y\in \mathbb{B}(0,\delta/(2c))\: \exists !\: x\in
\overline{\mathbb{B}}(0,\delta)\text{ such that }\Phi_y(x)=x\text{
or, equivalently, such that }f(x)=y.
\end{equation}
Continuity of $f$ at $0$ yields $\tilde \delta >0$ (which can be taken
to satisfy $\tilde\delta\leq\delta$) for which
$$
f \bigl( \mathbb{B}(0,\tilde \delta )\bigr)\subset
\mathbb{B}(0,\delta/(2c)),
$$
so that $f\big |_{\mathbb{B}(0,\tilde \delta )}$ is injective and thus
bijective onto its range. Using \eqref{soleq}, one obtains a map
$X=X(y)$ satisfying $X(y)=\Phi_y \bigl( X(y)\bigr)$. Differentiating
this fixed point equation with respect to $y$ (taking the
differentiability of $X$ for granted here), one obtains
$$
DX(y)=D_y\Phi_y\bigl( X(y)\bigr)+D_x\Phi_y\bigl( X(y)\bigr)DX(y)=
Df(0)^{-1}+\Bigl[ \mathbb{1}_n-Df(0)^{-1}Df\bigl( X(y)\bigr)\Bigr] DX(y),
$$
which can be rewritten as
$$
Df\bigl( X(y)\bigr)DX(y)=\mathbb{1}_n\text{ or }DX(y)=Df\bigl(
X(y)\bigr)^{-1}.
$$
It follows that $[f\big |_{\mathbb{B}(0,\tilde \delta )}]^{-1}$ is
differentiable, that $D\bigl[f^{-1}\bigr](y)=\bigl[Df(x)\bigr]^{-1}$
if $y=f(x)$, and that
$f \bigl( \mathbb{B}(0,\tilde \delta )\bigr)$ is open since
$$
f \bigl( \mathbb{B}(0,\tilde \delta)\bigr)=
\bigl[f^{-1}\bigr]^{-1}\bigl( \mathbb{B}(0,\tilde
\delta)\bigr)
$$
is the preimage of an open ball under the continuous map $f^{-1}$.
Exercise
Based on the proof of the Inverse Function Theorem, what corresponds
to the $\delta >0$ appearing in the formulation of the theorem?
Corollary (Open Mapping Theorem)
If $f:E\to \mathbb{R}^n$ is continuously differentiable and such that
$\operatorname{det} Df(x)\ne 0$ for all $x \in E$, then $f$ is open,
i.e. it maps open sets to open sets.
Implicit Function Theorem
Next we consider an important consequence of the inverse function
theorem.
Theorem (Implicit Function Theorem)
Let
$$
f =(f_1,\dots, f_n):\mathbb{R}^{n+m} \to \mathbb{R}^n
$$
be a continuously differentiable map. Assume there is $(a,b) \in
\mathbb{R}^{n+m}$ such that $f(a,b) =0$ and that $D_xf(a,b)$ is
invertible, where $D_x$ indicates that we are taking the derivative
with respect to the $x$-variables only. Then there are open sets $U
\subset \mathbb{R}^{n+m}$ and $W \subset \mathbb{R}^m$ with
$$
(a,b)\in U\text{ and }b\in W,
$$
and a function $g:W\to \mathbb{R}^n$ such that, for each fixed
$y\in W$, the point $\bigl(g(y), y\bigr)$ is the only solution
$(x,y)\in U$ of $f(x,y)=0$. In particular
$$
f\bigl(g(y),y\bigr)=0\, ,\: y\in W.
$$
Define the map
$$
F: \mathbb{R}^{n+m} \to \mathbb{R}^{n+m}\,
,\: \begin{bmatrix}x\\y\end{bmatrix}\mapsto \begin{bmatrix}f(x,y)\\
y\end{bmatrix}\, ,
$$
and observe that $F(a,b) =(0,b)$ and that $F$ is continuously
differentiable. Its derivative is given by
$$
DF(x,y) = \begin{bmatrix}D_xf & D_yf \\ 0 & \mathbb{1}_m \end{bmatrix}
$$
so that $\det DF(a,b) =\det D_xf(a,b) \ne 0$ by assumption. The inverse
function theorem then yields a function $G: V \to U$, where $U
=\mathbb{B}\bigl((a,b), r\bigr)$ for some $r>0$ and $V = F(U)$, such
that $G \circ F =\operatorname{id}_U$ and $F\circ G
=\operatorname{id}_V$. In particular
$$
\begin{bmatrix}0 \\ y \end{bmatrix} = \bigl(F\circ
G\bigr)(0,y)=\begin{bmatrix}
f(G_1(0,y),G_2(0,y) )\\G_2(0,y)\end{bmatrix}\, .
$$
It follows that $G_2(0,y)=y$. Defining the function $g$ by
$g(y)=G_1(0, y)$ for $y\in W=\{\tilde y\in \mathbb{R}^m: (0,\tilde y)
\in V\}$, which is open since $V$ is, one has finally that
$$
f\bigl(g(y),y\bigr) = 0,\quad y\in W\, ,
$$
as claimed. The differentiability claim of the Inverse Function
Theorem and the chain rule now yield
\begin{equation*}
D_xf\bigl(g(y), y\bigr) Dg(y) + D_yf\bigl(g(y),y\bigr) =
0\Leftrightarrow Dg(y) = -\big[D_xf\bigl(g(y),y\bigr)\big]^{-1}
D_yf\bigl(g(y),y\bigr).
\end{equation*}
Remark
The theorem implies that the relation $f(x,y) =0$ locally determines a
function $g$ of the variable $y$ such that $f\bigl(g(y),y\bigr)=0$ in
the vicinity of any point satisfying the assumptions. Moreover
$$
Dg(y) = -D_xf\bigl(g(y),y\bigr)^{-1} D_yf\bigl(g(y),y\bigr)\, ,\: y\in
W\, ,
$$
where
$$
D_xf = \begin{bmatrix}\frac{\partial f_1}{\partial x_1} & \cdots &
\frac{\partial f_1}{\partial x_n} \\
\vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots &
\frac{\partial f_n}{\partial
x_n} \end{bmatrix},
\quad
D_yf = \begin{bmatrix}\frac{\partial f_1}{\partial y_1} & \cdots &
\frac{\partial f_1}{\partial y_m} \\
\vdots & & \vdots \\ \frac{\partial f_n}{\partial y_1} & \cdots &
\frac{\partial f_n}{\partial
y_m} \end{bmatrix}
$$
Example
Let $f = (f_1, f_2): \mathbb{R}^{2+3} \to \mathbb{R}^2$ be defined by
$$
f(x_1,x_2,y_1,y_2,y_3)= \begin{bmatrix}2e^{x_1} + x_2 y_1 - 4y_2 +
3\\ x_2 \cos(x_1) - 6x_1 + 2y_1-y_3 \end{bmatrix}
$$
Let $a=(0,1)$, $b=(3,2,7)$. Then
$$
f(a,b) = \begin{bmatrix}2+3-4\cdot2+3 \\1- 0 +2\cdot 3 -7
\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix}.
$$
It is clear that $f$ is a continuously differentiable map and that
\begin{equation*}
Df=\begin{bmatrix} 2e^{x_1} & y_1 & x_2 & -4 & 0 \\ -x_2 \sin(x_1) -6
& \cos(x_1) & 2 & 0 & -1\end{bmatrix}\, .
\end{equation*}
Then
$$
D_xf(a,b) = \begin{bmatrix}2e^{x_1} & y_1 \\ -x_2 \sin(x_1) -6 &
\cos(x_1) \end{bmatrix}\bigg|_{(x,y)=(a,b)}= \begin{bmatrix}2 & 3 \\ -6 & 1
\end{bmatrix},
$$
which is non-singular since its determinant is $20\ne 0$. The implicit
function theorem then yields a function $x=g(y)$ defined in some
neighborhood $W$ of $b$ such that $f\bigl(g(y),y\bigr)=0$ for $y\in
W$, and
$$
Dg(y) = -\big[D_xf\bigl(g(y),y\bigr)\big]^{-1} D_yf\bigl(g(y),y\bigr)\, .
$$
In particular
$$
Dg(b) = -\frac{1}{20}\begin{bmatrix}1 & -3 \\ 6 & 2 \end{bmatrix}
\begin{bmatrix}1 & -4 & 0 \\ 2 & 0 & -1\end{bmatrix}=-
\frac{1}{20}\begin{bmatrix}-5 & -4 & 3 \\ 10 & -24 & -2\end{bmatrix}\, .
$$
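The linear algebra in this last step can be reproduced numerically,
e.g. with numpy; a short Python sketch:
\begin{verbatim}
import numpy as np

Dxf = np.array([[2.0, 3.0], [-6.0, 1.0]])
Dyf = np.array([[1.0, -4.0, 0.0], [2.0, 0.0, -1.0]])

Dg_b = -np.linalg.solve(Dxf, Dyf)   # = -Dxf^{-1} Dyf
print(-20 * Dg_b)                   # matrix inside the -1/20 factor
\end{verbatim}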