In order to motivate the search for minima and maxima, we present an
extended example which finds widespread use in science and technology:
the singular value decomposition of a matrix.
Consider a matrix $A\in \mathbb{R}^{m\times n}$. Recall that the rank
of $A$
$$
\operatorname{rank}(A)=\operatorname{dim}\Bigl( \operatorname{span}\{
A_{\bullet 1},A_{\bullet 2},\dots,A_{\bullet n}\}\Bigr)
$$
is the dimension of the range of $A$. We use the notation $A_{\bullet
k}$ for the $k$-th column of $A$. Now the "simplest" possible matrices
are column vectors, $\mathbb{R}^{m\times 1}$, and row vectors,
$\mathbb{R}^{1\times n}$. They represent the simple linear maps
$$
\underline{u}:\mathbb{R}\to \mathbb{R}^m,\: x\mapsto x\, u
$$
and
$$
\underline{v}:\mathbb{R}^n\to \mathbb{R},\: x\mapsto v\, x
$$
for $u\in \mathbb{R}^{m\times 1}\simeq \mathbb{R}^m$ and for
$v\in\mathbb{R}^{1\times n}\simeq \mathbb{R}^n$. If we interpret $v$
as a column vector as well, then we would write $v^\mathsf{T}x$ instead
of $v x$. Combining two of these we obtain a rank-one matrix
$$
u\, v^\mathsf{T}\in \mathbb{R}^{m\times n}
$$
which we think of as the map $\mathbb{R}^n\ni x\mapsto (v^\mathsf{T}x)u\in
\mathbb{R}^m$. Of course, for this matrix to actually be rank-one, it needs to be
assumed that $u\neq 0$ and $v\neq 0$. Notice that
$$
u\, v^\mathsf{T}=\| u\| \| v\| \frac{u}{\| u\|}\frac{v^\mathsf{T}}{\|
v\|}= \sigma \bar u \bar v ^\mathsf{T},
$$
where $\sigma \geq 0$ and $\| \bar u\|=1=\|\bar v\|$.
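The following minimal NumPy sketch (the vectors and their sizes are arbitrary choices made for illustration) confirms numerically that $u\, v^\mathsf{T}$ has rank one and factors as $\sigma\,\bar u\bar v^\mathsf{T}$.
```python
import numpy as np

# Arbitrary nonzero vectors (sizes m = 3, n = 4 chosen purely for illustration).
u = np.array([3.0, -1.0, 2.0])
v = np.array([1.0, 0.0, 2.0, -2.0])

M = np.outer(u, v)                        # the m x n matrix u v^T
print(np.linalg.matrix_rank(M))           # 1

# Normalized factorization u v^T = sigma * ubar vbar^T with sigma >= 0.
sigma = np.linalg.norm(u) * np.linalg.norm(v)
ubar, vbar = u / np.linalg.norm(u), v / np.linalg.norm(v)
print(np.allclose(M, sigma * np.outer(ubar, vbar)))   # True
```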
Notice that
$$
Ax=\sum _{i=1}^m \Bigl( \sum _{j=1}^n a_{ij}x_j\Bigr)e_i=
\sum _{i=1}^m \sum _{j=1}^n a_{ij}e^m_i \bigl( e^n_j\bigr)^\mathsf{T}x,
$$
so that
$$
A=\sum _{i=1}^m \sum _{j=1}^n a_{ij}e^m_i \bigl( e^n_j\bigr)^\mathsf{T},
$$
and the superscripts are used to indicate the space in which the
natural basis is taken. Clearly if $\operatorname{rank}(A)=k$, then
$k$ rank-one terms "should suffice", that is, we should have that
$$
A=\sum _{j=1}^k \sigma_j u_j\, v_j^\mathsf{T}.
$$
The question we would like to answer is: does this actually work? Can
we find
$$
\sigma_j, u_j\in \mathbb{R}^m,v_j\in \mathbb{R}^n\text{ such that }
A=\sum _{j=1}^k \sigma_j u_j\, v_j^\mathsf{T}\text{, if }\operatorname{rank}(A)=k?
$$
As an exercise, prove that
$\operatorname{rank}(A)=\operatorname{rank}(A^\mathsf{T})$. We define
the Frobenius norm of a matrix by
$$
\| A\| _{F}=\sqrt{\operatorname{tr}(A^\mathsf{T}A)}=\Bigl[\sum _{i=1}^m
\sum _{j=1}^na_{ij}^2\Bigr]^{1/2}
$$
and consider the problem
$$
\operatorname{argmin}_{\sigma_j,u_j,v_j}\frac{1}{2}\| A-\sum _{j=1}^k
\sigma_j u_j\, v_j^\mathsf{T}\| ^2_F
$$
If a minimum is reached and is $0$, then we would be done! We start
with
$$
\operatorname{argmin}\big\{ \frac{1}{2}\| A-\sigma u\,
v^\mathsf{T}\| ^2_F \, :\, \sigma\in\mathbb{R},\, u\in \mathbb{R}^m,\, v\in
\mathbb{R}^n, \|u\|=1=\|v\|\big\}.
$$
Convince yourself that
$$
E(\sigma,u,v):=\frac{1}{2}\| A-\sigma u\, v^\mathsf{T}\|^2_F=\frac{1}{2}\| A\|^2
_F-\sigma\, u^\mathsf{T}Av+\frac{\sigma^2}{2}\underbrace{\| u\|^2\| v\|^2}_{=1}.
$$
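Both the trace formula for $\|\cdot\|_F$ and this expansion of $E$ are easy to verify numerically; here is a small NumPy sketch with an arbitrary test matrix and randomly chosen unit vectors.
```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))                      # arbitrary test matrix

# The two expressions for the Frobenius norm agree.
print(np.allclose(np.sqrt(np.trace(A.T @ A)), np.linalg.norm(A, 'fro')))   # True

# Expansion of E(sigma, u, v) for unit vectors u, v and an arbitrary sigma.
u = rng.standard_normal(4); u /= np.linalg.norm(u)
v = rng.standard_normal(3); v /= np.linalg.norm(v)
sigma = 0.7

lhs = 0.5 * np.linalg.norm(A - sigma * np.outer(u, v), 'fro') ** 2
rhs = 0.5 * np.linalg.norm(A, 'fro') ** 2 - sigma * (u @ A @ v) + sigma ** 2 / 2
print(np.allclose(lhs, rhs))                                                # True
```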
If the minimum is attained at a point with $\sigma=0$, then we have that
$$
\frac{1}{2}\| A\|_F^2\leq \frac{1}{2}\| A-\sigma\, uv^\mathsf{T}\|
_F^2=\frac{1}{2}\| A\|_F^2-\sigma \, u^\mathsf{T}Av+\frac{\sigma^2}{2}\text{ for
all }\sigma\in\mathbb{R}\text{ and all unit vectors }u,v.
$$
This implies that $(u^\mathsf{T}Av)\sigma -\sigma^2/2\leq 0$ for all $\sigma$ no
matter what $u$ and $v$ are, which yields that
$\frac{1}{2}(u^\mathsf{T}Av)^2\leq 0$ for all $u,v$, since the maximum
of the expression over $\sigma$ is attained at $\sigma=u^\mathsf{T}Av$. It follows
that, in this case, $u^\mathsf{T}Av=0$ for all $u,v$ and therefore that
$A=0$. We conclude that, if $A\neq 0$, any minimizer must have $\sigma\neq 0$;
replacing $u$ by $-u$ if necessary, we may assume $\sigma>0$.
Notice that
$$
DE(\sigma,u,v)=\begin{bmatrix} -u^\mathsf{T}Av+\sigma & -\sigma
Av+\sigma^2u & -\sigma A^\mathsf{T}u+\sigma^2 v\end{bmatrix},
$$
so that for a critical point $\sigma_*,u_*,v_*$ it holds that
$$
\sigma_*=u_*^\mathsf{T}Av_*\text{ and }\sigma_* \bigl(
Av_*-\sigma_*u_*\bigr) =0=\sigma_* \bigl( A^\mathsf{T}u_*-\sigma_*v_*\bigr).
$$
We can therefore consider
$$
\tilde E(u,v)=E \bigl( \sigma(u,v),u,v\bigr)\text{ on
}\mathbb{S}^m\times \mathbb{S}^n,
$$
where $\sigma(u,v)=u^\mathsf{T}Av$ and $\mathbb{S}^l$ denotes the unit
sphere in $\mathbb{R}^l$. It follows from the compactness of $
\mathbb{S}^m\times \mathbb{S}^n$ and the continuity of $\tilde E$ that
a minimum exists, i.e. unit vectors $u_1,v_1$ can be found (replacing $u_1$ by
$-u_1$ if necessary) such that
$$
\sigma_1:=u_1^\mathsf{T}Av_1>0\text{ whenever } A\neq 0.
$$
Notice that $\tilde E(u_1,v_1)\leq \tilde E(u,v)$ for all unit vectors $u,v$ and that
$$
\tilde E(u,v)= E \bigl( \sigma(u,v),u,v\bigr)\leq E(\sigma,u,v) \text{ for all
}\sigma ,u,v,
$$
since $\sigma(u,v)$ minimizes $\frac{1}{2}\| A-\sigma\, uv^\mathsf{T}\|
_F^2$ for any fixed given $u,v$. Next observe that
$$
\operatorname{rank}\bigl( A-\sigma
_1u_1v_1^\mathsf{T}\bigr)<\operatorname{rank}(A),
$$
since $Av_1=\sigma_1u_1\neq 0$ but
$$
\bigl( A-\sigma _1u_1v_1^\mathsf{T}\bigr)v_1=Av_1-\sigma_1 u_1=0,
$$
by the stationarity conditions $DE=0$ and the fact that $\sigma_1>0$
for $A\neq 0$. Indeed, $v_1=\frac{1}{\sigma_1}A^\mathsf{T}u_1$ is orthogonal to
$\ker(A)$, so that $\ker(A)\cup\{v_1\}\subset\ker\bigl( A-\sigma_1u_1v_1^\mathsf{T}\bigr)$
and the kernel strictly grows, i.e. the rank strictly drops.
Now it only remains to replace $A$ by $A_1=A-\sigma _1u_1v_1^\mathsf{T}$
and consider
$$
\operatorname{argmin}\big\{ \frac{1}{2}\| A_1-\sigma u\,
v^\mathsf{T}\| ^2_F \, :\, \sigma\in\mathbb{R},\, u\in \mathbb{R}^m,\, v\in
\mathbb{R}^n, \|u\|=1=\|v\|\big\}.
$$
If the minimum is 0, then $A_1=0$ and we are done since $A=\sigma
_1u_1v_1^\mathsf{T}$. If not, we can find $\sigma_2>0$ as well as
$u_2,v_2$ of norm 1 which minimize the new energy and such that
$$
\operatorname{rank}(A_2)=\operatorname{rank}\bigl( A_1-\sigma
_2u_2v_2^\mathsf{T}\bigr)<\operatorname{rank}(A_1).
$$
Since the rank is strictly decreasing in each step, continuing in
this fashion, a $k\in \mathbb{N}$ is found along with
$\sigma_j,u_j,v_j$ for $j=1,\dots,k$ such that
$$
A-\sum _{j=1}^k \sigma_j u_j\, v_j^\mathsf{T}=0.
$$
The numbers $\sigma _j$ are called singular values, and the vectors
$u_j,v_j$, singular vectors. Using the vectors $u_j$ and $v_j$ as the
columns of two matrices $U\in \mathbb{R}^{m\times k}$ and $V\in
\mathbb{R}^{n\times k}$ and the values $\sigma_j$ as the diagonal entries
of an otherwise zero square matrix $\Sigma\in \mathbb{R}^{k\times k}$,
this can be written as
$$
A=U\Sigma V^\mathsf{T}.
$$
As an exercise, prove that the columns of $U$ and of $V$ are orthonormal, i.e. that
$U^\mathsf{T}U=I_k=V^\mathsf{T}V$.
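The greedy construction above can be mimicked numerically. The following sketch is only an illustration under simplifying assumptions (a generic random test matrix and a fixed number of alternating iterations), not a robust SVD routine: each step approximates the best rank-one factor by alternating the stationarity relations $Av=\sigma u$ and $A^\mathsf{T}u=\sigma v$, deflates, and finally checks both $A=U\Sigma V^\mathsf{T}$ and the orthonormality asked for in the exercise.
```python
import numpy as np

def best_rank_one(A, iters=500, seed=0):
    """Minimize E by alternating the stationarity relations Av = sigma*u, A^T u = sigma*v."""
    v = np.random.default_rng(seed).standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        u = A @ v;   u /= np.linalg.norm(u)
        v = A.T @ u; v /= np.linalg.norm(v)
    return u @ A @ v, u, v                  # sigma_* = u_*^T A v_*

def greedy_svd(A, tol=1e-10):
    """Deflate A by its best rank-one approximation until (numerically) nothing is left."""
    sigmas, us, vs = [], [], []
    R = A.copy()
    for _ in range(min(A.shape)):           # at most min(m, n) steps are needed
        if np.linalg.norm(R, 'fro') <= tol:
            break
        s, u, v = best_rank_one(R)
        sigmas.append(s); us.append(u); vs.append(v)
        R = R - s * np.outer(u, v)          # A_{j+1} = A_j - sigma_j u_j v_j^T
    return np.array(sigmas), np.column_stack(us), np.column_stack(vs)

A = np.random.default_rng(1).standard_normal((5, 3))
S, U, V = greedy_svd(A)
print(np.allclose(A, U @ np.diag(S) @ V.T))            # A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(len(S))),            # columns of U are orthonormal
      np.allclose(V.T @ V, np.eye(len(S))))            # columns of V are orthonormal
```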
Taylor's Theorem in $\mathbb{R}^n$
Multi-indices are useful to deal with higher order partial derivatives. For $\alpha
=(\alpha_1,\dots,\alpha_n)\in (\mathbb{N}\cup\{0\})^n$ set $|\alpha|=\sum_{j=1}^n
\alpha_j$ and define
$$
\frac{\partial^{|\alpha|}f}{\partial x^\alpha}=\frac{\partial^{|\alpha|}f }{\partial x_1^{\alpha_1} \cdots \partial
x_n^{\alpha_n}}\quad\text{and}\quad y^\alpha=\prod_{j=1}^n y_j^{\alpha_j}\text{ for }y\in\mathbb{R}^n.
$$
Theorem (Taylor's theorem)
Let $U\subset \mathbb{R}^n$ be open and convex and $\bar x\in U$. Let $f\in
\operatorname{C}^{m+1} (U)$, that is, $(m+1)$ times continuously differentiable. Then
$$
f(x) = \sum_{|\alpha|=0}^m {1\over \alpha!} {\partial^{|\alpha|} f(\bar x) \over \partial x^\alpha
} (x-\bar x)^\alpha +\sum_{|\alpha| = m+1} {1\over \alpha!} {\partial^{|\alpha|} f(\xi)\over
\partial x^\alpha } (x-\bar x)^{\alpha},
$$
for every $x\in U$, where $\alpha! =\prod _{j=1}^n\alpha_j!$ and $\xi=\xi(x)$ is some
point on the segment joining $\bar x$ and $x$.
Since $U$ is convex and $\bar x, x \in U$, one has that
$$
[\bar x,x] =\big\{\bar x+t(x-\bar x): t\in [0,1]\big\}\subset U\, .
$$
Defining $g(t) = f\bigl(tx+(1-t)\bar x\bigr)$ for $t\in[0,1]$ and using the one-variable
Taylor's Theorem it follows that
$$
f(x) = g(1) = \sum_{j=0}^m \frac{g^{(j)} (0)}{j!} (1-0)^j + \frac{g^{(m+1)} (\theta)}{(m+1)!}(1-0)^{m+1}
$$
for some $\theta\in (0,1)$. Notice that
$$
g(0) = f(\bar x)\text{ and }g'(0) = \frac{d}{d t}f\bigl(\bar x+t(x-\bar x)\bigr) \big |_{t=0} =
\sum_{j=1}^n \frac{\partial f}{\partial x_j} (\bar x)(x_j-\bar x_j)
$$
and
\begin{align*}
g''(0)&=\frac{d}{dt}\Big[\sum_{j=1}^n \frac{\partial f}{\partial x_j}(tx+(1-t)\bar x)(x_j-\bar x_j) \Big]\Big|_{t=0}\\
&=\sum_{i,j=1}^n \frac{\partial^2f}{\partial x_i \partial x_j}(\bar x)(x_i-\bar x_i)(x_j -\bar x_j)\, .
\end{align*}
Notice that
$$
\frac{1}{2!}g''(0) = \sum_{|\alpha| = 2 } \frac{1}{\alpha!}\frac{\partial^2 f (\bar x)}{\partial
x^{\alpha} }(x-\bar x)^\alpha\, .
$$
Similarly, one can prove
$$
\frac{1}{j!}g^{(j)}(t) =\sum_{|\alpha| = j }\frac{1}{\alpha!}\frac{\partial^j f\bigl(t
x+(1-t)\bar x\bigr)}{\partial x^\alpha} (x-\bar x)^\alpha\, ,
$$
for $j=1,2,\dots,m+1$ and the result follows.
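The multi-index formula can be checked symbolically. The following SymPy sketch uses an arbitrarily chosen test function and expansion point and compares the second-order Taylor polynomial built from the formula with the expansion of $g(t)=f\bigl(\bar x+t(x-\bar x)\bigr)$.
```python
import sympy as sp

x, y, hx, hy, t = sp.symbols('x y hx hy t')
f = sp.exp(x) * sp.sin(y)                 # arbitrary smooth test function
xbar, ybar = 0, 0                         # expansion point (xbar, ybar) = (0, 0)

# Second-order Taylor polynomial from the multi-index formula:
#   sum_{|alpha| <= 2} (1/alpha!) * d^{|alpha|} f(xbar)/dx^alpha * h^alpha
taylor2 = sp.S(0)
for a1 in range(3):
    for a2 in range(3 - a1):
        deriv = f
        for _ in range(a1):
            deriv = sp.diff(deriv, x)
        for _ in range(a2):
            deriv = sp.diff(deriv, y)
        deriv = deriv.subs({x: xbar, y: ybar})
        taylor2 += deriv / (sp.factorial(a1) * sp.factorial(a2)) * hx**a1 * hy**a2

# Compare with the expansion of g(t) = f(xbar + t*hx, ybar + t*hy) up to order t^2.
g = f.subs({x: xbar + t * hx, y: ybar + t * hy})
series2 = sp.series(g, t, 0, 3).removeO().subs(t, 1)
print(sp.simplify(taylor2 - series2))     # 0
```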
Extremal problems
Definition (Extrema)
Let $U$ be a domain in $\mathbb{R}^n$, $f:U\to \mathbb{R}$ be continuous, and $x_0\in U$. Then we say
that
(i) $f$ attains a local maximum [minimum] at $x_0$ if there is a $\delta>0$ such that
$$
f(x_0)\geq f(x)\quad [ f(x_0)\leq\, f(x) ]\, ,\:x\in \mathbb{B}(x_0, \delta)\, .
$$
(ii) $f$ attains a global maximum [minimum] on $U$ at $x_0$ if
$$
f(x_0)\geq f(x)\quad[ f(x_0)\leq \, f(x) ]\, ,\:x\in U\, .
$$
In these cases it is also said that $x_0$ is a (local) maximizer/minimizer.
A pervasive issue in mathematics is how to find the maxima (or maximizers) and minima
(or minimizers) of $f$ in $U$, provided they exist.
Proposition
Let $f\in \operatorname{C}^1(U)$. If $x_0\in U$ is a local maximizer or a local
minimizer of $f$ in $U$, then $\nabla f(x_0) = 0$.
Assume that $x_0$ is a local minimizer for $f$ (the case of a maximizer is analogous).
Let $e_j=(0,\cdots, 0, 1, 0, \cdots, 0)$ (the $j$th position is $1$, others are zero)
and let
$$
g_j(t)=f(x_0+ t e_j)
$$
Then $t=0$ is a local minimizer for $g_j$. Thus, $g_j'(0)=0$. This implies
$$
{\partial f (x_0)\over \partial x_j}=g_j'(0)=0,\quad 1\le j\le n.
$$
Therefore, $\nabla f(x_0)=0$.
Definition (Critical points)
A point $x_0\in U$ is called a critical point of $f$ in $U$ if either $\nabla f(x_0)
=0$ or $\nabla f(x_0)$ does not exist.
Now, if $x_0$ is a critical point for $f$ in $U$, how can it be determined whether
$x_0$ is a local maximizer, minimizer, or a saddle point?
Definition (Positive definite matrix)
Let $A$ be an $n\times n$ symmetric matrix. Then
(i) $A$ is positive [negative] definite if and only if there is a constant
$\lambda>0$ such that
$$
\langle A x, x\rangle \ge \lambda |x|^2 \quad [ \langle A x, x\rangle \le -\lambda |x|^2 ],\quad x
\in \mathbb{R}^n .
$$
(ii) $A$ is positive semi-definite if and only if
$$
\langle Ax, x\rangle \ge 0,\quad x\in\mathbb{R}^n\, .
$$
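For a symmetric matrix, the constant $\lambda$ in (i) can be taken to be the smallest eigenvalue. A small NumPy sketch (the matrix is an arbitrary positive definite example) illustrates the inequality $\langle Ax,x\rangle\geq\lambda|x|^2$:
```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                       # arbitrary symmetric positive definite matrix

lam_min = np.linalg.eigvalsh(A).min()            # smallest eigenvalue, here (5 - sqrt(5))/2 > 0

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))               # random test vectors x
quadratic = np.einsum('ij,jk,ik->i', X, A, X)    # <A x, x> for every row x of X
print(np.all(quadratic >= lam_min * (X**2).sum(axis=1) - 1e-12))   # True
```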
Theorem (Second derivatives test)
Let $f\in \operatorname{C}^2(U)$ and $x_0\in U$ be a critical point of $f$. Defining
the Hessian $H_f(x_0)$ of $f$ at $x_0$ by
$$
H_f(x_0) =\begin{bmatrix} \frac{\partial ^2f}{\partial x_1^2} & \cdots & \frac{\partial
^2f}{\partial x_1\partial x_n} \\ \vdots & & \vdots \\
\frac{\partial^2f}{\partial x_n \partial x_1} & \cdots & \frac{\partial ^2f}{\partial
x_n^2}\end{bmatrix}(x_0)\, ,
$$
the following statements hold:
(i) If $H_f(x_0)$ is positive definite, then $x_0$ is a local minimizer.
(ii) If $H_f(x_0)$ is negative definite, then $x_0$ is a local maximizer.
(iii) If $H_f(x_0)$ is indefinite, i.e. $\langle H_f(x_0)\,\xi,\xi\rangle$ takes both
positive and negative values, then $x_0$ is a saddle point.
Example
Let $f(x_1,x_2) = x_1^2 + x_2^2$ for $x\in \mathbb{R} ^2$. Then
$(0,0)$ is a critical point and
$$
H_f(0,0) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}
$$
is positive definite. Therefore $(0,0)$ is a local minimizer.
Example
Let $f(x_1,x_2) = -(x_1^2 + x_2^2)$ for $x\in \mathbb{R} ^2$. Then
$(0,0)$ is a critical point and
$$
H_f(0,0) = \begin{bmatrix}-2 & 0 \\ 0 & -2 \end{bmatrix}
$$
is negative definite. Therefore $(0,0)$ is a local maximizer.
Example
Let $f(x_1,x_2) = x_1^2 - x_2^2$ for $x\in \mathbb{R} ^2$. Then
$(0,0)$ is a critical point and
$$
H_f(0,0) =\begin{bmatrix} 2 & 0 \\0 & -2 \end{bmatrix}
$$
is indefinite. Therefore $(0,0)$ is a saddle point.
Example
Let $f(x,y) = x^4 + y^4$ for $(x,y)\in \mathbb{R} ^2$. Then
$(0,0)$ is a critical point and
$$
H_f(x,y) = \begin{bmatrix}12x^2 & 0 \\ 0 & 12y^2 \end{bmatrix},\quad H_f(0,0)
= \begin{bmatrix}0 & 0 \\ 0 & 0 \end{bmatrix}\, .
$$
The second derivative test is therefore inconclusive. However, $(0,0)$ is the global
minimizer for $f$ in $\mathbb{R}^2$.
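The classification in the preceding examples can be reproduced by inspecting the eigenvalues of the Hessian: for a symmetric matrix, positive definite means all eigenvalues are positive, negative definite all negative, and indefinite eigenvalues of both signs. A minimal NumPy sketch:
```python
import numpy as np

def classify(H, tol=1e-12):
    """Classify a critical point from the eigenvalues of its (symmetric) Hessian."""
    ev = np.linalg.eigvalsh(H)
    if np.all(ev > tol):
        return "local minimizer"
    if np.all(ev < -tol):
        return "local maximizer"
    if ev.min() < -tol and ev.max() > tol:
        return "saddle point"
    return "test inconclusive"

print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))     # x1^2 + x2^2      -> local minimizer
print(classify(np.array([[-2.0, 0.0], [0.0, -2.0]])))   # -(x1^2 + x2^2)   -> local maximizer
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))    # x1^2 - x2^2      -> saddle point
print(classify(np.array([[0.0, 0.0], [0.0, 0.0]])))     # x^4 + y^4        -> test inconclusive
```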
Theorem
Let $U$ be convex. A function $f\in \operatorname{C}^2(U)$ is convex if and only if
$H_f(x)$ is positive semidefinite for every $x\in U$, i.e. if and only if
$$
\sum_{i,j=1}^n {\partial^2 f(x) \over \partial x_i \partial x_j} \xi_i \xi_j \ge 0,\: \xi\in\mathbb{R}^n, \quad x\in U.
$$
It is easy to see that $f$ is convex in $U$ if and only if
$$
g: [0,1]\to \mathbb{R}\, ,\: t\mapsto g(t)= f\bigl(tx + (1-t)y\bigr)
$$
is convex for all $x,y \in U$, which, in turn, is the case iff $g''\geq 0$ on
$[0, 1]$. Now observe that
$$
g''(t)=\sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}\bigl(tx+(1-t) y\bigr)(x_i-y_i)(x_j-y_j)
$$
so that $f$ is convex iff
$$
\sum_{i,j=1}^n {\partial^2 f\over \partial x_i \partial x_j} (x) (x_i-y_i)(x_j-y_j)\ge 0,\: x,y\in U.
$$
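As an illustration of the criterion (a SymPy sketch reusing the function $f(x,y)=x^4+y^4$ from the example above), the Hessian is positive semidefinite at every point, so $f$ is convex; this is consistent with $(0,0)$ being a global minimizer even though the second derivative test is inconclusive there.
```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**4 + y**4                          # earlier example with H_f(0, 0) = 0

H = sp.hessian(f, (x, y))                # [[12*x**2, 0], [0, 12*y**2]]

# A 2x2 symmetric matrix is positive semidefinite iff all principal minors
# (both diagonal entries and the determinant) are nonnegative.
print(H[0, 0], H[1, 1], sp.simplify(H.det()))   # 12*x**2  12*y**2  144*x**2*y**2
```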
Theorem
If $f\in \operatorname{C}^2(U)$ is convex, every critical point of $f$ is a global
minimizer.
Let $\bar x\in U$ be a critical point of $f$. Then $\nabla f(\bar x) = 0$ and, by Taylor expansion,
$$
f(x)=f(\bar x) + \nabla f(\bar x) \cdot (x-\bar x)+\frac{1}{2!}
\sum_{i,j=1}^n\frac{\partial^2f}{\partial x_i \partial x_j} \bigl(\theta \bar x + (1-\theta)x\bigr)(x_i -
\bar x_i)(x_j - \bar x_j)\geq f(\bar x)
$$
for any $x\in U$ and some $\theta\in(0,1)$, since $\nabla f(\bar x)=0$ and the Hessian
term is nonnegative by the convexity of $f$. Thus $f(\bar x)$ is a global minimum of $f$.
Example
Find all global minimizers of $f:\mathbb{R}^2\to \mathbb{R}$ where
$$
f(x, y)=x^4+y^4-32x-2y^2,\: (x,y)\in \mathbb{R} ^2\, .
$$
Since
$$
\frac{\partial f}{\partial x}=4x^3-32\quad \text{and}\quad \frac{\partial f}{\partial y}=4y^3-4y\, ,
$$
this function is continuously differentiable and $\nabla f=0$ has the three solutions
$(2, 0),\ (2, 1)$ and $(2,-1)$, which are the three critical points of $f$. Computing
$$
H_f=\begin{bmatrix} {\partial^2 f\over \partial x^2} & {\partial^2
f\over \partial x\partial y}\\ {\partial^2 f\over \partial
y\partial x} & {\partial^2 f\over \partial y^2}\end{bmatrix}
=\begin{bmatrix} 12 x^2 & 0 \\ 0 & 12 y^2-4\end{bmatrix}\, ,
$$
it is seen that $H_f(2,0)$ is indefinite and so $(2,0)$ is a saddle point of $f$. On
the other hand $H_f(2, \pm 1)$ are positive definite and $(2,\pm 1)$ are local
minimizers. Since $f(2,\pm 1)=-48-1=-49$ and $f(x, y)\to +\infty$ as $x^2+y^2\to
+\infty$, both $ (2, \pm 1)$ must be global minimizers of $f$.
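The computation in this example can be reproduced symbolically. The following SymPy sketch solves $\nabla f=0$ over the reals and classifies each critical point via the eigenvalues of the Hessian.
```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**4 + y**4 - 32*x - 2*y**2

critical_points = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
H = sp.hessian(f, (x, y))

for p in critical_points:
    eigenvalues = list(H.subs(p).eigenvals())
    print((p[x], p[y]), f.subs(p), eigenvalues)
# (2, 0):  f = -48, eigenvalues 48 and -4 -> indefinite Hessian, saddle point
# (2, ±1): f = -49, eigenvalues 48 and 8  -> positive definite Hessian, local minimizers
```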
Lagrange Multipliers
Often extrema of functionals need to be found which are subject to additional
constraints, such as in the case where extrema are sought in the zero set of some
function only. For given $f,g:\mathbb{R}^n\to \mathbb{R}$, a typical example is
$$
\min \Big\{f(x)\, :\, x\in[g=c] \Big\} ,
$$
which is interpreted as the problem of finding extremal points of $f$ on the
level set of $g$
$$
[g=c]:=\big\{ x\in \mathbb{R}^n\, :\, g(x)=c\big\}
$$
for a given constant $c$ (which can w.l.o.g. be taken to vanish). If you think
geometrically, it is easy to see that, if $x_0$ is an extremal point of $f$ on
$[g=c]$, then $\nabla f(x_0)$ has to point in a direction orthogonal to $[g=c]$ at the
point $x_0$ (otherwise the function value could be increased/decreased by moving along
$[g=c]$ in the direction of the component of the gradient tangential to it).
Since $\nabla g(x_0)$ is the direction perpendicular to $[g=c]$ at $x_0$ it follows
that
$$\begin{cases}
\nabla f(x_0) =\lambda \nabla g(x_0)&\\
g(x_0) =c&\end{cases}
$$
for some $\lambda\in \mathbb{R}$. The parameter $\lambda$ is called a Lagrange
multiplier for the problem. This is, of course, provided the functions involved are
smooth and $x_0$ is not a critical point of $g$.
Example
Find the maximum and minimum of $f(x,y,z) = x+y$ on the unit sphere $S^2=[x^2 + y^2 + z^2=1]$,
i.e. take $g(x,y,z)=x^2+y^2+z^2$ and $c=1$.
Set up the system
$$\begin{cases}
\nabla f(x,y,z)=\lambda \nabla g(x,y,z)&\\
g(x,y,z)=c&
\end{cases}
$$
which reads
$$\begin{cases}
1=2 \lambda x&\\1=2\lambda y&\\0=2\lambda z&\\1=x^2+y^2+z^2&
\end{cases}$$
Since $1=2\lambda x$ forces $\lambda\neq 0$, this implies that $z=0$ and $x=y=\frac{1}{2\lambda}$, so that
$$
1=\frac{1}{4\lambda ^2}+\frac{1}{4\lambda^2}\, .
$$
It follows that $\lambda=\pm \frac{1}{\sqrt{ 2}}$ corresponding to the extremal points
$$
\bigl(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0\bigr), \bigl(-\frac{1}{\sqrt{2}},
-\frac{1}{\sqrt{2}}, 0\bigr)
$$
Since $f\bigl(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0\bigr)=\sqrt{2}$ and
$f\bigl(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 0\bigr)=-\sqrt{2}$, the first is the
maximizer, while the second is the minimizer.
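The same system can be solved symbolically. The following SymPy sketch sets up the Lagrange conditions for this example and recovers the two extremal points and the corresponding values of $f$.
```python
import sympy as sp

x, y, z, lam = sp.symbols('x y z lambda', real=True)
f = x + y
g = x**2 + y**2 + z**2 - 1                  # constraint g = 0, i.e. x^2 + y^2 + z^2 = 1

# Lagrange conditions: grad f = lambda * grad g together with the constraint.
equations = [sp.diff(f, v) - lam * sp.diff(g, v) for v in (x, y, z)] + [g]
solutions = sp.solve(equations, [x, y, z, lam], dict=True)

for s in solutions:
    print((s[x], s[y], s[z]), 'f =', f.subs(s))
# (sqrt(2)/2,  sqrt(2)/2,  0) with f =  sqrt(2)  (maximizer)
# (-sqrt(2)/2, -sqrt(2)/2, 0) with f = -sqrt(2)  (minimizer)
```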