Long Chen
We introduce the conjugate gradient (CG) method and its preconditioned version (PCG) for solving
$$Au = b,$$
where $A$ is a symmetric and positive definite (SPD) operator defined on a Hilbert space $\mathbb V$ with $\dim \mathbb V = N < \infty$. When $\mathbb V = \mathbb R^N$, $A$ is an SPD matrix.
Equation $Au = b$ can be derived from the following optimization problem
$$\min_{v \in \mathbb V} f(v) := \frac{1}{2}(Av, v) - (b, v).$$
As $f$ is strongly convex, the global minimizer exists, is unique, and satisfies the Euler equation $\nabla f(u) = Au - b = 0$, which is exactly equation $Au = b$.
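The Euler equation can be verified by a direct expansion (a standard computation, recorded here for completeness): for any $v \in \mathbb V$,
$$f(u + v) - f(u) = (Au - b, v) + \frac{1}{2}(Av, v),$$
so $\nabla f(u) = Au - b$, and since the quadratic term is nonnegative for SPD $A$, $u$ is the global minimizer if and only if $Au = b$.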
We shall derive CG from the $A$-orthogonal projection. We use $(\cdot, \cdot)$ for the standard inner product and $(\cdot, \cdot)_A$ for the inner product introduced by the SPD operator $A$. That is,
$$(u, v)_A := (Au, v).$$
When talking about orthogonality, we refer to the default inner product $(\cdot, \cdot)$ and emphasize the orthogonality in $(\cdot, \cdot)_A$ by "$A$-orthogonal" or "conjugate".
Given a vector $u \in \mathbb V$, the $A$-orthogonal projection of $u$ to a subspace $\mathbb S \subseteq \mathbb V$ is the vector, denoted by $P_{\mathbb S} u$, defined by the relation
$$(P_{\mathbb S} u, v)_A = (u, v)_A \quad \text{for all } v \in \mathbb S.$$
Start from an initial guess $u_0$. Let $r_0 = b - A u_0$. For $k \ge 0$, let $\mathbb V_k = \mathrm{span}\{p_0, p_1, \ldots, p_k\}$ be a subspace spanned by an $A$-orthogonal basis, i.e. $(p_i, p_j)_A = 0$ for $i \ne j$.
CG consists of three steps:
1. compute $u_{k+1} = u_0 + P_{\mathbb V_k}(u - u_0)$;
2. compute the residual $r_{k+1} = b - A u_{k+1}$ and expand the subspace to $\mathbb V_{k+1} = \mathbb V_k + \mathrm{span}\{r_{k+1}\}$;
3. apply the Gram–Schmidt process to obtain a new $A$-orthogonal direction $p_{k+1}$.
We now briefly explain each step and present recursive formulae.
Remark 1. We treat $u_0, u_1, \ldots$ as points and their differences, such as $u - u_0$, as vectors. Spaces $\mathbb V_k$ are vector spaces, and $u_{k+1} = u_0 + P_{\mathbb V_k}(u - u_0)$ means the point obtained by moving $u_0$ along the projected vector. If we mix the points with the vectors, the space we are looking for is indeed the affine space $u_0 + \mathbb V_k$.
Given the subspace $\mathbb V_k$ spanned by the $A$-orthogonal basis $\{p_0, \ldots, p_k\}$, we compute $u_{k+1}$ by
$$u_{k+1} = u_0 + P_{\mathbb V_k}(u - u_0) = u_0 + \sum_{i=0}^{k} \alpha_i p_i.$$
The projection can be computed without knowing $u$, as $(u - u_0, p_i)_A = (A(u - u_0), p_i) = (r_0, p_i)$ is known. Indeed, by definition, $(u - u_0, p_i)_A = \alpha_i (p_i, p_i)_A$ for $0 \le i \le k$, which leads to the formulae
$$\alpha_i = \frac{(u - u_0, p_i)_A}{(p_i, p_i)_A} = \frac{(r_0, p_i)}{(A p_i, p_i)}.$$
With $u_{k+1}$, we can compute a new residual vector $r_{k+1} = b - A u_{k+1}$. If $r_{k+1} = 0$, which means $u_{k+1}$ is the solution, then we stop. Otherwise we expand the subspace to a larger one $\mathbb V_{k+1} = \mathbb V_k + \mathrm{span}\{r_{k+1}\}$.
Then we use the Gram–Schmidt process to make the newly added vector $A$-orthogonal to the others. The new conjugate direction is
$$p_{k+1} = r_{k+1} + \beta_k p_k, \qquad \beta_k = -\frac{(r_{k+1}, A p_k)}{(A p_k, p_k)}.$$
This formula, which drops the terms involving $p_0, \ldots, p_{k-1}$, is justified by the following orthogonality, which will be proved later on.
Lemma 1. The residual $r_{k+1}$ is orthogonal to $\mathbb V_k$ and $A$-orthogonal to $\mathbb V_{k-1}$.
Last, we present the recursive formulae. Starting from an initial guess $u_0$ and $p_0 = r_0 = b - A u_0$, for $k \ge 0$, we use three recursion formulae to compute
$$u_{k+1} = u_k + \alpha_k p_k, \qquad \alpha_k = \frac{(r_k, p_k)}{(A p_k, p_k)},$$
$$r_{k+1} = r_k - \alpha_k A p_k,$$
$$p_{k+1} = r_{k+1} + \beta_k p_k, \qquad \beta_k = -\frac{(r_{k+1}, A p_k)}{(A p_k, p_k)}.$$
The method is called the conjugate gradient method since the search direction $p_{k+1}$ is obtained by a correction of the negative gradient (i.e. the residual) direction and is conjugate to all previous directions.
More efficient formulae for $\alpha_k$ and $\beta_k$ can be derived using the orthogonality presented in Lemma 1:
$$\alpha_k = \frac{\|r_k\|^2}{(A p_k, p_k)}, \qquad \beta_k = \frac{\|r_{k+1}\|^2}{\|r_k\|^2}.$$
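The efficient formulae follow from a short computation using Lemma 1. Since $r_k \perp \mathbb V_{k-1}$ and $p_k = r_k + \beta_{k-1} p_{k-1}$,
$$(r_k, p_k) = \|r_k\|^2 + \beta_{k-1}(r_k, p_{k-1}) = \|r_k\|^2,$$
so $\alpha_k = \|r_k\|^2/(A p_k, p_k)$. From $r_{k+1} = r_k - \alpha_k A p_k$ and $(r_{k+1}, r_k) = 0$,
$$(r_{k+1}, A p_k) = \frac{1}{\alpha_k}(r_{k+1}, r_k - r_{k+1}) = -\frac{\|r_{k+1}\|^2}{\alpha_k},$$
hence $\beta_k = -\dfrac{(r_{k+1}, A p_k)}{(A p_k, p_k)} = \dfrac{\|r_{k+1}\|^2}{\alpha_k (A p_k, p_k)} = \dfrac{\|r_{k+1}\|^2}{\|r_k\|^2}$.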
We present the algorithm of the conjugate gradient method as follows.
function u = CG(A,b,u,tol)
% CG solves A*u = b by the conjugate gradient method.
tol = tol*norm(b);
k = 1;
r = b - A*u;
p = r;
r2 = r'*r;
while sqrt(r2) >= tol && k < length(b)
    Ap = A*p;
    alpha = r2/(p'*Ap);     % alpha_k = ||r_k||^2/(A p_k, p_k)
    u = u + alpha*p;
    r = r - alpha*Ap;       % r_{k+1} = r_k - alpha_k A p_k
    r2old = r2;
    r2 = r'*r;
    beta = r2/r2old;        % beta_k = ||r_{k+1}||^2/||r_k||^2
    p = r + beta*p;         % p_{k+1} = r_{k+1} + beta_k p_k
    k = k + 1;
end
Several remarks on the realization are listed below:
The matrix A is only involved through the product A*p. There is no need to form the matrix explicitly. A subroutine to compute the matrix-vector multiplication is enough. This is an attractive feature of Krylov subspace methods.
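The matrix-free point of view can be sketched in Python/NumPy (this sketch is ours, not from the notes; names such as `cg`, `apply_A`, and `lap` are illustrative). The operator enters only through the callable `apply_A`:

```python
import numpy as np

def cg(apply_A, b, u0, tol=1e-8, maxit=None):
    """CG for SPD A, where A is available only as a matrix-vector subroutine."""
    u = u0.copy()
    r = b - apply_A(u)
    p = r.copy()
    r2 = r @ r
    stop = tol * np.linalg.norm(b)
    if maxit is None:
        maxit = len(b)
    for _ in range(maxit):
        if np.sqrt(r2) < stop:
            break
        Ap = apply_A(p)
        alpha = r2 / (p @ Ap)      # alpha_k = ||r_k||^2 / (A p_k, p_k)
        u = u + alpha * p
        r = r - alpha * Ap         # r_{k+1} = r_k - alpha_k A p_k
        r2old = r2
        r2 = r @ r
        p = r + (r2 / r2old) * p   # p_{k+1} = r_{k+1} + beta_k p_k
    return u

# Usage: apply the 1D Laplacian stencil [-1 2 -1] without ever forming the matrix.
def lap(v):
    w = 2.0 * v
    w[:-1] -= v[1:]
    w[1:] -= v[:-1]
    return w

b = np.ones(20)
u = cg(lap, b, np.zeros(20), maxit=200)
```

Here `lap` plays the role of the subroutine replacing `A*p`; any function realizing an SPD operator would do.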
We explore the orthogonality and optimality due to the $A$-projection to the subspace $\mathbb V_k$. By definition of the $A$-projection, we have the orthogonality
$$(u - u_{k+1}, v)_A = 0 \quad \text{for all } v \in \mathbb V_k.$$
Consequently,
$$\|u - u_{k+1}\|_A = \min_{v \in u_0 + \mathbb V_k} \|u - v\|_A.$$
That is, the shortest distance measured in the $A$-norm is achieved by the $A$-orthogonal projection.
We will use the $A$-orthogonality to present a proof of Lemma 1.
Lemma 1. The residual $r_{k+1}$ is orthogonal to $\mathbb V_k$ and $A$-orthogonal to $\mathbb V_{k-1}$.
Proof. To simplify notation, we denote $\mathbb V_k$ by $\mathbb V$. We write $u - u_{k+1} = (u - u_0) - P_{\mathbb V}(u - u_0)$. Therefore $(u - u_{k+1}, v)_A = 0$ for all $v \in \mathbb V$, which is equivalent to $(r_{k+1}, v) = (A(u - u_{k+1}), v) = 0$ for all $v \in \mathbb V$. The first orthogonality is thus proved.
Since at every step the residual vector is added to expand the subspace, it can be easily proved by induction that
$$\mathbb V_k = \mathrm{span}\{r_0, r_1, \ldots, r_k\}.$$
By the recursive formula for the residual, we get $A p_i = (r_i - r_{i+1})/\alpha_i \in \mathbb V_k$ for $0 \le i \le k-1$. Since we have proved $r_{k+1} \perp \mathbb V_k$, we get $(r_{k+1}, p_i)_A = (r_{k+1}, A p_i) = 0$ for $0 \le i \le k-1$, i.e. $r_{k+1}$ is $A$-orthogonal to $\mathbb V_{k-1}$.
In view of $\mathbb V_k = \mathrm{span}\{r_0, r_1, \ldots, r_k\}$, we have the optimality
$$\|u - u_{k+1}\|_A = \min_{v \in u_0 + \mathrm{span}\{r_0, \ldots, r_k\}} \|u - v\|_A.$$
As $f(v) - f(u) = \frac{1}{2}\|v - u\|_A^2$, we can obtain another optimality
$$f(u_{k+1}) = \min_{v \in u_0 + \mathbb V_k} f(v).$$
We can also show
$$f(u_{k+1}) = \min_{\alpha \in \mathbb R} f(u_k + \alpha p_k).$$
Exercise. Let $\tilde u_{k+1} = u_k + \tilde\alpha_k p_k$ be computed by the exact line search $\tilde\alpha_k = \arg\min_{\alpha \in \mathbb R} f(u_k + \alpha p_k)$. Prove that
$$\tilde u_{k+1} = u_{k+1}.$$
That is, $u_{k+1}$ can be found by the steepest descent method starting from $u_k$ and moving along the searching direction $p_k$. From $\frac{d}{d\alpha} f(u_k + \alpha p_k) = 0$ we get another formula $\alpha_k = (r_k, p_k)/(A p_k, p_k)$.
Let $B$ be an SPD matrix. We apply the Gram–Schmidt process to the subspace
$$\mathbb V_{k+1} = \mathbb V_k + \mathrm{span}\{B r_{k+1}\}$$
to get another $A$-orthogonal basis. The three steps are still the same, except in the second step we add $B r_{k+1}$ instead of $r_{k+1}$.
Starting from an initial guess $u_0$ and $p_0 = B r_0$, for $k \ge 0$, the three-term recursion formulae are
$$u_{k+1} = u_k + \alpha_k p_k, \qquad \alpha_k = \frac{(B r_k, r_k)}{(A p_k, p_k)},$$
$$r_{k+1} = r_k - \alpha_k A p_k,$$
$$p_{k+1} = B r_{k+1} + \beta_k p_k, \qquad \beta_k = \frac{(B r_{k+1}, r_{k+1})}{(B r_k, r_k)}.$$
The proof of the following lemma is almost identical to that of Lemma 1.
Lemma 2. The residual $r_{k+1}$ is $B$-orthogonal to $\mathrm{span}\{r_0, \ldots, r_k\}$, and $B r_{k+1}$ is $A$-orthogonal to $\mathbb V_{k-1}$.
We present the algorithm of the preconditioned conjugate gradient method as follows.
function u = pcg(A,b,u,B,tol)
% PCG solves A*u = b by the preconditioned conjugate gradient method
% with an SPD preconditioner B.
tol = tol*norm(b);
k = 1;
r = b - A*u;
rho = 1;
while sqrt(rho) >= tol && k < length(b)
    Br = B*r;                 % preconditioned residual
    rho = r'*Br;              % rho_k = (B r_k, r_k)
    if k == 1
        p = Br;
    else
        beta = rho/rho_old;   % beta_k = rho_k/rho_{k-1}
        p = Br + beta*p;
    end
    Ap = A*p;
    alpha = rho/(p'*Ap);      % alpha_k = rho_k/(A p_k, p_k)
    u = u + alpha*p;
    r = r - alpha*Ap;
    rho_old = rho;
    k = k + 1;
end
Similarly, we collect several remarks on the implementation:
The preconditioner B only enters through the product B*r, which is crucial. The matrix B does not have to be formed explicitly. All we need is the matrix-vector multiplication, which can be replaced by a subroutine.
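A Python/NumPy sketch of PCG with a diagonal (Jacobi) preconditioner may help fix ideas (our sketch, not from the notes; the test matrix and the name `apply_B` are illustrative). As in the MATLAB listing, the preconditioner enters only through its action on the residual:

```python
import numpy as np

def pcg(A, b, u0, apply_B, tol=1e-8, maxit=200):
    """PCG for SPD A; the SPD preconditioner enters only via apply_B(r) ~ B @ r."""
    u = u0.copy()
    r = b - A @ u
    stop = tol * np.linalg.norm(b)
    rho_old, p = None, None
    for k in range(maxit):
        if np.linalg.norm(r) < stop:
            break
        Br = apply_B(r)
        rho = r @ Br                      # rho_k = (B r_k, r_k)
        if k == 0:
            p = Br
        else:
            p = Br + (rho / rho_old) * p  # beta_k = rho_k / rho_{k-1}
        Ap = A @ p
        alpha = rho / (p @ Ap)            # alpha_k = rho_k / (A p_k, p_k)
        u = u + alpha * p
        r = r - alpha * Ap
        rho_old = rho
    return u

# Usage: a badly scaled SPD matrix with the diagonal (Jacobi) preconditioner.
rng = np.random.default_rng(0)
n = 30
d = 10.0 ** rng.uniform(-1.5, 1.5, n)                    # uneven scales
M = rng.standard_normal((n, n))
A = np.diag(d) @ (M @ M.T + n * np.eye(n)) @ np.diag(d)  # SPD by construction
b = rng.standard_normal(n)
dinv = 1.0 / np.diag(A)
u = pcg(A, b, np.zeros(n), lambda r: dinv * r)
```

The lambda realizes $B = D^{-1}$ elementwise; replacing it with a multigrid cycle or an incomplete factorization solve changes nothing else in the loop.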
Recall that we have two bases for the subspace
$$\mathbb V_k = \mathrm{span}\{p_0, p_1, \ldots, p_k\} = \mathrm{span}\{r_0, r_1, \ldots, r_k\}.$$
Another basis is given by the following lemma.
Lemma 3. $\mathbb V_k = \mathrm{span}\{r_0, A r_0, A^2 r_0, \ldots, A^k r_0\}$.
Proof. We prove it by induction. For $k = 0$, it is trivial. Suppose the statement holds for $k$. We prove it holds for $k+1$ by noting that
$$r_{k+1} = r_k - \alpha_k A p_k \in \mathrm{span}\{r_0, A r_0, \ldots, A^{k+1} r_0\},$$
since $p_k \in \mathbb V_k = \mathrm{span}\{r_0, A r_0, \ldots, A^k r_0\}$.
The space $\mathrm{span}\{r_0, A r_0, \ldots, A^k r_0\}$ is called a Krylov subspace. The CG method belongs to a large class of Krylov subspace methods. In the sequel, we use $\mathcal P_k$ to denote the space of polynomials of degree at most $k$.
Theorem 1. Let $u_k$ be the $k$th iteration in the CG method with an initial guess $u_0$. Then
$$\|u - u_k\|_A = \min_{p \in \mathcal P_k,\ p(0)=1} \|p(A)(u - u_0)\|_A \le \min_{p \in \mathcal P_k,\ p(0)=1}\ \max_{\lambda \in \sigma(A)} |p(\lambda)|\, \|u - u_0\|_A.$$
Proof. For $v \in u_0 + \mathbb V_{k-1}$, it can be expanded as
$$v = u_0 + \sum_{i=0}^{k-1} c_i A^i r_0.$$
Let $q(x) = \sum_{i=0}^{k-1} c_i x^i$ and $p(x) = 1 - x\,q(x)$, so $p \in \mathcal P_k$ with $p(0) = 1$. Then
$$u - v = (u - u_0) - q(A) r_0 = (I - q(A) A)(u - u_0) = p(A)(u - u_0).$$
The identity then follows from the optimality $\|u - u_k\|_A = \min_{v \in u_0 + \mathbb V_{k-1}} \|u - v\|_A$.
Since $p(A)$ is symmetric in the $(\cdot, \cdot)_A$-inner product, we have
$$\|p(A)(u - u_0)\|_A \le \max_{\lambda \in \sigma(A)} |p(\lambda)|\, \|u - u_0\|_A,$$
which leads to the estimate.
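The spectral bound used in this estimate can be seen by expanding in an orthonormal eigenbasis $\{\phi_i\}$ of $A$ with $A\phi_i = \lambda_i \phi_i$: writing $v = \sum_i c_i \phi_i$,
$$\|p(A) v\|_A^2 = \sum_i \lambda_i\, p(\lambda_i)^2 c_i^2 \le \max_{\lambda \in \sigma(A)} p(\lambda)^2 \sum_i \lambda_i c_i^2 = \max_{\lambda \in \sigma(A)} |p(\lambda)|^2\, \|v\|_A^2.$$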
The polynomial $p$ with the constraint $p(0) = 1$ will be called a residual polynomial. Various convergence results of the CG method can be obtained by choosing specific residual polynomials.
Corollary 1. Let $\{\phi_i\}_{i=1}^N$ be an orthonormal eigenvector basis of $A$. Let $u - u_0 = \sum_{i \in I} c_i \phi_i$ with $|I| = k$. Then the CG method with initial guess $u_0$ will find the solution in at most $k$ iterations. In particular, for a given $N \times N$ SPD matrix $A$, the CG algorithm will find the solution within $N$ iterations. Namely CG can be viewed as a direct method.
Proof. Denote by $\mathbb E = \mathrm{span}\{\phi_i,\ i \in I\}$. Let $v = \sum_{i \in I} a_i \phi_i \in \mathbb E$. Then $Av = \sum_{i \in I} \lambda_i a_i \phi_i \in \mathbb E$. Namely if $v \in \mathbb E$, so is $Av$. Note that here the eigenvectors are introduced for the analysis and are unknown in the algorithm.
The subspace $\mathbb E$ is invariant under the action of $A$, i.e. $A\mathbb E \subseteq \mathbb E$. Therefore, starting from $r_0 = A(u - u_0) \in \mathbb E$, the Krylov space $\mathbb V_j \subseteq \mathbb E$ for all $j$. If $r_{j+1} \ne 0$, the dimension increases by 1. So after at most $k$ iterations, $\mathbb V_{k-1} = \mathbb E$, and the projection of $u - u_0 \in \mathbb E$ to $\mathbb E$ is itself, i.e. $u_k = u$.
Alternatively, choose the polynomial $p(x) = \prod_{i \in I}(1 - x/\lambda_i)$. Then $p$ is a residual polynomial of order at most $k$ with $p(0) = 1$, and
$$p(A)(u - u_0) = \sum_{i \in I} p(\lambda_i) c_i \phi_i = 0.$$
The assertion then follows from the identity in Theorem 1.
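The finite-termination property is easy to observe numerically. Below is a small NumPy check (ours, not from the notes): we build an SPD matrix with only three distinct eigenvalues and run exactly three CG iterations.

```python
import numpy as np

# A is SPD with only 3 distinct eigenvalues {1, 2, 3}; by the corollary, CG
# started from u_0 = 0 finds the solution in at most 3 iterations (up to round-off).
rng = np.random.default_rng(1)
n = 60
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal basis
lam = np.repeat([1.0, 2.0, 3.0], n // 3)
A = Q @ np.diag(lam) @ Q.T
b = rng.standard_normal(n)

u = np.zeros(n)
r = b.copy()
p = r.copy()
r2 = r @ r
for _ in range(3):                 # exactly 3 CG iterations
    Ap = A @ p
    alpha = r2 / (p @ Ap)
    u = u + alpha * p
    r = r - alpha * Ap
    r2old, r2 = r2, r @ r
    p = r + (r2 / r2old) * p

rel_res = np.linalg.norm(A @ u - b) / np.linalg.norm(b)
```

After three steps the relative residual is at round-off level, even though $n = 60$.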
Remark 2. The CG method can also be applied to a symmetric and positive semi-definite matrix $A$. Let $\{\phi_i\}_{i \in I}$ be the eigenvectors associated to the nonzero eigenvalues of $A$. Then, from Corollary 1, if $b \in \mathrm{span}\{\phi_i,\ i \in I\} = \mathrm{range}(A)$, the CG method with $u_0 = 0$ will find a solution within $|I|$ iterations.
CG was invented as a direct method but is more effective when used as an iterative method. The rate of convergence depends crucially on the distribution of the eigenvalues of $A$, and the iteration may converge to the solution within a given tolerance in far fewer than $N$ steps.
Theorem 2. Let $u_k$ be the $k$-th iteration of the CG method with initial guess $u_0$, and let $\kappa = \kappa(A)$. Then
$$\|u - u_k\|_A \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k \|u - u_0\|_A.$$
Proof. Let $\lambda_{\min}$ and $\lambda_{\max}$ be the extreme eigenvalues of $A$ and $\kappa = \lambda_{\max}/\lambda_{\min}$. Choose
$$p_k(x) = T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min} - 2x}{\lambda_{\max} - \lambda_{\min}}\right) \Big/\, T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}}\right),$$
where $T_k$ is the Chebyshev polynomial of degree $k$ given by
$$T_k(t) = \cos(k \arccos t) \ \text{ for } |t| \le 1, \qquad T_k(t) = \frac{1}{2}\left[\left(t + \sqrt{t^2 - 1}\right)^k + \left(t - \sqrt{t^2 - 1}\right)^k\right] \ \text{ for } |t| \ge 1.$$
It is easy to see that $p_k(0) = 1$, and if $x \in [\lambda_{\min}, \lambda_{\max}]$, then
$$\left|T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min} - 2x}{\lambda_{\max} - \lambda_{\min}}\right)\right| \le 1.$$
Hence
$$\max_{\lambda \in \sigma(A)} |p_k(\lambda)| \le \left[T_k\!\left(\frac{\kappa + 1}{\kappa - 1}\right)\right]^{-1}.$$
We set
$$\frac{\kappa + 1}{\kappa - 1} = \frac{1}{2}\left(s + \frac{1}{s}\right), \qquad s > 1.$$
Solving this equation for $s$, we have
$$s = \frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1},$$
with $t + \sqrt{t^2 - 1} = s$ for $t = (\kappa + 1)/(\kappa - 1)$. We then obtain
$$T_k\!\left(\frac{\kappa + 1}{\kappa - 1}\right) = \frac{1}{2}\left(s^k + s^{-k}\right) \ge \frac{1}{2}\left(\frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1}\right)^k,$$
which, combined with Theorem 1, completes the proof.
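The bound of Theorem 2 can be checked numerically. The NumPy sketch below (ours, not from the notes) runs CG on a diagonal matrix with $\kappa = 100$ and verifies the $A$-norm error bound at every step:

```python
import numpy as np

# Check ||u - u_k||_A <= 2 ((sqrt(kappa)-1)/(sqrt(kappa)+1))^k ||u - u_0||_A
# on a diagonal SPD matrix with kappa = 100.
rng = np.random.default_rng(2)
n = 40
lam = np.linspace(1.0, 100.0, n)      # eigenvalues of A = diag(lam)
b = rng.standard_normal(n)
u_exact = b / lam
kappa = lam[-1] / lam[0]
rate = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)

def err_A(u):                         # A-norm of the error u_exact - u
    e = u_exact - u
    return np.sqrt(e @ (lam * e))

u = np.zeros(n)
r = b.copy()
p = r.copy()
r2 = r @ r
e0 = err_A(u)
ok = True
for k in range(1, 16):
    Ap = lam * p                      # A is diagonal, so A @ p = lam * p
    alpha = r2 / (p @ Ap)
    u = u + alpha * p
    r = r - alpha * Ap
    r2old, r2 = r2, r @ r
    p = r + (r2 / r2old) * p
    ok = ok and err_A(u) <= 2.0 * rate**k * e0 + 1e-12
```

In practice the actual error decays noticeably faster than the Chebyshev bound, since the bound only uses the extreme eigenvalues.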
The estimate shows that CG is in general better than the gradient descent method. Furthermore, if the condition number $\kappa(A)$ is close to one, the CG iteration converges very fast. Even if $\kappa(A)$ is large, the iteration performs well if the eigenvalues are clustered in a few small intervals; see the following estimate.
Corollary 2. Assume that $\sigma(A) = \sigma_0(A) \cup \sigma_1(A)$ with $\sigma_1(A) \subset [a, b]$, and $m$ is the number of elements in $\sigma_0(A)$. Then
$$\|u - u_k\|_A \le 2M \left(\frac{\sqrt{b/a} - 1}{\sqrt{b/a} + 1}\right)^{k - m} \|u - u_0\|_A,$$
where
$$M = \max_{\lambda \in \sigma_1(A)} \prod_{\mu \in \sigma_0(A)} \left|1 - \frac{\lambda}{\mu}\right|.$$
To perform the convergence analysis of PCG, we let $\tilde A = BA$. Then one can easily verify that PCG for $Au = b$ is equivalent to CG applied to the preconditioned equation $\tilde A u = Bb$. Since the optimality still holds, and $\tilde A$ is symmetric in both the $(\cdot, \cdot)_A$- and $(\cdot, \cdot)_{B^{-1}}$-inner products, we obtain the following convergence rate of PCG.
Theorem 3. Let $A$ and $B$ be SPD and let $u_k$ be the $k$th iteration in the PCG method with the preconditioner $B$ and an initial guess $u_0$. Then
$$\|u - u_k\|_A \le 2\left(\frac{\sqrt{\kappa(BA)} - 1}{\sqrt{\kappa(BA)} + 1}\right)^k \|u - u_0\|_A.$$
The SPD matrix $B$ is called the preconditioner. A good preconditioner should have the properties that the action of $B$ is easy to compute and that $\kappa(BA)$ is significantly smaller than $\kappa(A)$. The design of a good preconditioner requires knowledge of the problem, and finding a good preconditioner is a central topic of scientific computing.
An interesting fact is that the iterative method induced by $B$ may not be convergent while $B$ can still be a good preconditioner. For example, the Jacobi method is not convergent for all SPD systems, but $B = D^{-1}$, the inverse of the diagonal of $A$, can always be used as a preconditioner; it is often known as the diagonal preconditioner. Another popular preconditioner is obtained by an incomplete Cholesky factorization of $A$.
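The effect of the diagonal preconditioner is easy to see numerically. In the NumPy sketch below (ours, not from the notes), a well-conditioned SPD matrix is spoiled by bad row/column scaling; Jacobi preconditioning essentially undoes the scaling. We compare $\kappa(A)$ with the condition number of the symmetrized matrix $D^{-1/2} A D^{-1/2}$, which is similar to $D^{-1}A$ and hence has the same spectrum:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)            # well-conditioned SPD "core"
d = 10.0 ** rng.uniform(-2.0, 2.0, n)  # bad row/column scaling
A = np.diag(d) @ S @ np.diag(d)        # badly scaled SPD matrix

Dinv_half = np.diag(1.0 / np.sqrt(np.diag(A)))
A_prec = Dinv_half @ A @ Dinv_half     # similar to D^{-1} A

kA = np.linalg.cond(A)                 # huge, driven by the scaling
kBA = np.linalg.cond(A_prec)           # back to a modest value
```

By Theorem 3, the PCG convergence rate is governed by the much smaller $\kappa(BA)$.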