Conjugate Gradient Methods

Long Chen

 

Problem Setting

We introduce the conjugate gradient (CG) method and its preconditioned version (PCG) for solving the linear system
$$A u = b,$$
where $A$ is a symmetric and positive definite (SPD) operator defined on a Hilbert space $\mathbb V$ with $\dim \mathbb V = N$. When $\mathbb V = \mathbb R^N$, $A$ is an SPD matrix.

The equation $Au = b$ can be derived from the following optimization problem
$$\min_{v \in \mathbb V} f(v) := \frac{1}{2}(A v, v) - (b, v).$$

As $f$ is strongly convex, the global minimizer exists, is unique, and satisfies the Euler equation $\nabla f(u) = A u - b = 0$, which is exactly the equation $Au = b$.

We shall derive CG from the $A$-orthogonal projection. We use $(\cdot, \cdot)$ for the standard inner product and $(\cdot, \cdot)_A$ for the inner product introduced by the SPD operator $A$. That is,
$$(u, v)_A := (A u, v), \qquad \|u\|_A := \sqrt{(u, u)_A}.$$

When talking about orthogonality, we refer to the default inner product $(\cdot, \cdot)$ and emphasize the orthogonality in $(\cdot, \cdot)_A$ by $A$-orthogonal or conjugate.

Given a vector $v \in \mathbb V$, the $A$-orthogonal projection of $v$ onto a subspace $\mathcal W \subseteq \mathbb V$ is the vector, denoted by $P_{\mathcal W} v$, determined by the relation
$$(v - P_{\mathcal W} v, w)_A = 0 \quad \text{for all } w \in \mathcal W.$$

 

CG Algorithm

Start from an initial guess $u_0$. Let $r_0 = b - A u_0$. For $k \ge 1$, let $\mathcal V_k = \operatorname{span}\{p_0, p_1, \ldots, p_{k-1}\}$ be a subspace spanned by an $A$-orthogonal basis, i.e. $(p_i, p_j)_A = 0$ for $i \neq j$.

CG consists of three steps:

  1. compute $u_k$ by the $A$-orthogonal projection of $u$ to $\mathcal V_k$.
  2. add the residual vector $r_k = b - A u_k$ to $\mathcal V_k$ to get $\mathcal V_{k+1} = \mathcal V_k + \operatorname{span}\{r_k\}$.
  3. apply the Gram-Schmidt process to get a new $A$-orthogonal vector $p_k$.

We now briefly explain each step and present recursive formulae.

Remark 1. We treat $u, u_0, u_k$ as points and their differences as vectors. The spaces $\mathcal V_k$ are vector spaces, and $u_k \in \mathcal V_k$ means $u_k - u_0 \in \mathcal V_k$. If we mix the points with vectors, the space we are looking for is indeed the affine space $u_0 + \mathcal V_k$.

Given the subspace $\mathcal V_k$ spanned by the $A$-orthogonal basis $\{p_0, p_1, \ldots, p_{k-1}\}$, we compute $u_k$ by
$$u_k = u_0 + \sum_{i=0}^{k-1} \frac{(u - u_0, p_i)_A}{(p_i, p_i)_A}\, p_i.$$

The projection can be computed without knowing $u$ as $A u = b$ is known. Indeed, by definition, $(u - u_0, p_i)_A = (A(u - u_0), p_i) = (b - A u_0, p_i) = (r_0, p_i)$ for $0 \le i \le k-1$, which leads to the formula
$$u_k = u_0 + \sum_{i=0}^{k-1} \frac{(r_0, p_i)}{(p_i, p_i)_A}\, p_i.$$

With $u_k$, we can compute a new residual vector $r_k = b - A u_k$. If $r_k = 0$, which means $u_k$ is the solution, then we stop. Otherwise we expand the subspace to a larger one $\mathcal V_{k+1} = \mathcal V_k + \operatorname{span}\{r_k\}$.

Then we use the Gram-Schmidt process to make the newly added vector $A$-orthogonal to the others. The new conjugate direction is
$$p_k = r_k - \sum_{i=0}^{k-1} \frac{(r_k, p_i)_A}{(p_i, p_i)_A}\, p_i = r_k - \frac{(r_k, p_{k-1})_A}{(p_{k-1}, p_{k-1})_A}\, p_{k-1}.$$

The formula, in particular the fact that only the last term of the Gram-Schmidt sum survives, is justified by the following orthogonality, which will be proved later on.

Lemma 1. The residual $r_k$ is $\ell^2$-orthogonal to $\mathcal V_k$ and $A$-orthogonal to $\mathcal V_{k-1}$.

Last, we present the recursive formulae. Starting from an initial guess $u_0$ and $p_0 = r_0 = b - A u_0$, for $k = 0, 1, 2, \ldots$, we use three recursion formulae to compute
$$u_{k+1} = u_k + \alpha_k p_k, \qquad \alpha_k = \frac{(r_k, p_k)}{(p_k, p_k)_A},$$
$$r_{k+1} = r_k - \alpha_k A p_k,$$
$$p_{k+1} = r_{k+1} + \beta_k p_k, \qquad \beta_k = -\frac{(r_{k+1}, p_k)_A}{(p_k, p_k)_A}.$$

The method is called the conjugate gradient method since the search direction $p_{k+1}$ is obtained by a correction of the negative gradient direction $-\nabla f(u_{k+1}) = r_{k+1}$ (the residual) and is conjugate to all previous directions.

More efficient formulae for $\alpha_k$ and $\beta_k$ can be derived using the orthogonality presented in Lemma 1:
$$\alpha_k = \frac{(r_k, r_k)}{(p_k, p_k)_A}, \qquad \beta_k = \frac{(r_{k+1}, r_{k+1})}{(r_k, r_k)}.$$

 

We present the algorithm of the conjugate gradient method as follows.
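A minimal sketch of this algorithm in Python/NumPy, using the more efficient formulae for $\alpha_k$ and $\beta_k$, might read as follows; the function name `cg`, the relative-residual stopping test, and the iteration cap `maxit` are our own illustrative choices, not part of the notes.

```python
import numpy as np

def cg(A, b, u0, tol=1e-8, maxit=None):
    """Minimal conjugate gradient sketch for a dense SPD matrix A."""
    u = u0.astype(float)          # work on a float copy of the initial guess
    r = b - A @ u                 # r_0 = b - A u_0
    p = r.copy()                  # p_0 = r_0
    rho = r @ r                   # (r_k, r_k)
    tol2 = tol**2 * (b @ b)       # squared relative-residual tolerance
    maxit = maxit or len(b)
    for _ in range(maxit):
        if rho <= tol2:           # stop once the relative residual is small
            break
        Ap = A @ p                # one matrix-vector product per iteration
        alpha = rho / (p @ Ap)    # alpha_k = (r_k, r_k) / (p_k, p_k)_A
        u += alpha * p            # u_{k+1} = u_k + alpha_k p_k
        r -= alpha * Ap           # r_{k+1} = r_k - alpha_k A p_k
        rho_new = r @ r
        beta = rho_new / rho      # beta_k = (r_{k+1}, r_{k+1}) / (r_k, r_k)
        p = r + beta * p          # p_{k+1} = r_{k+1} + beta_k p_k
        rho = rho_new
    return u
```

For instance, calling `cg(A, b, np.zeros_like(b, dtype=float))` on a small SPD matrix such as a tridiagonal Laplacian reproduces the direct solution up to the chosen tolerance.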

Several remarks on the realization are listed below:

 

Orthogonality and Optimality

We explore the orthogonality and optimality due to the $A$-projection to the subspace. By the definition of the $A$-projection, cf. the relation $(v - P_{\mathcal W} v, w)_A = 0$, we have the orthogonality
$$(u - u_k, v)_A = 0 \quad \text{for all } v \in \mathcal V_k.$$

Consequently,
$$\|u - v\|_A^2 = \|u - u_k\|_A^2 + \|u_k - v\|_A^2 \quad \text{for all } v \in u_0 + \mathcal V_k.$$

That is, the shortest distance measured in the $A$-norm is given by the $A$-orthogonal projection.

We will use the $A$-orthogonality to present a proof of Lemma 1.

Lemma 1. The residual $r_k$ is $\ell^2$-orthogonal to $\mathcal V_k$ and $A$-orthogonal to $\mathcal V_{k-1}$.

Proof. To simplify notation, we denote $P_{\mathcal V_k}$ by $P$. We write $u - u_k = (u - u_0) - (u_k - u_0) = (I - P)(u - u_0)$. Therefore $(u - u_k, v)_A = 0$ for all $v \in \mathcal V_k$, which is equivalent to $(r_k, v) = (A(u - u_k), v) = 0$ for all $v \in \mathcal V_k$. The first orthogonality is thus proved.

Since at every step the residual vector is added to expand the subspace, it can be easily proved by induction that
$$\mathcal V_k = \operatorname{span}\{r_0, r_1, \ldots, r_{k-1}\}.$$

By the recursive formula for the residual, $r_{i+1} = r_i - \alpha_i A p_i$, we get $A p_i = (r_i - r_{i+1})/\alpha_i \in \mathcal V_k$ for $0 \le i \le k-2$. Since we have proved $(r_k, v) = 0$ for all $v \in \mathcal V_k$, we get $(r_k, A p_i) = 0$ for $0 \le i \le k-2$, i.e. $r_k$ is $A$-orthogonal to $\mathcal V_{k-1}$.

In view of this, we have the optimality
$$\|u - u_k\|_A = \min_{v \in u_0 + \mathcal V_k} \|u - v\|_A.$$

As $f(v) = \frac{1}{2}\|v - u\|_A^2 - \frac{1}{2}\|u\|_A^2$, we can obtain another optimality
$$f(u_k) = \min_{v \in u_0 + \mathcal V_k} f(v).$$

We can also show
$$f(u_{k+1}) = \min_{\alpha \in \mathbb R} f(u_k + \alpha p_k).$$

Exercise. Let $\tilde u_{k+1} = u_k + \tilde\alpha_k p_k$ be computed by the exact line search $\tilde\alpha_k = \arg\min_{\alpha \in \mathbb R} f(u_k + \alpha p_k)$. Prove that
$$\tilde u_{k+1} = u_{k+1}, \qquad \tilde\alpha_k = \frac{(r_k, p_k)}{(p_k, p_k)_A} = \alpha_k.$$

That is, $u_{k+1}$ can be found by the steepest descent method starting from $u_k$ and moving along the search direction $p_k$. From this we get another formula $\alpha_k = \arg\min_{\alpha \in \mathbb R} f(u_k + \alpha p_k)$.
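As a hint (our own addition, not part of the original exercise), the line-search step size follows from one line of calculus: setting the derivative of $\alpha \mapsto f(u_k + \alpha p_k)$ to zero gives
$$0 = \frac{d}{d\alpha} f(u_k + \alpha p_k) = \big(A(u_k + \alpha p_k) - b,\, p_k\big) = \alpha\, (p_k, p_k)_A - (r_k, p_k),$$
so the minimizer is $\alpha = (r_k, p_k)/(p_k, p_k)_A$, which coincides with the coefficient produced by the $A$-orthogonal projection.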

 

PCG Algorithm

Let $B$ be an SPD matrix. We apply the Gram-Schmidt process to the subspace
$$\mathcal V_{k+1} = \mathcal V_k + \operatorname{span}\{B r_k\}$$

to get another $A$-orthogonal basis. The three steps are still the same except that in the second step, we add $B r_k$ instead of $r_k$.

  1. compute $u_k$ by the $A$-orthogonal projection of $u$ to $\mathcal V_k$.
  2. add the preconditioned residual vector $B r_k$ to $\mathcal V_k$ to get $\mathcal V_{k+1} = \mathcal V_k + \operatorname{span}\{B r_k\}$.
  3. apply the Gram-Schmidt process to get a new $A$-orthogonal vector $p_k$.

Starting from an initial guess $u_0$, with $r_0 = b - A u_0$ and $p_0 = B r_0$, for $k = 0, 1, 2, \ldots$, the three-term recursion formulae are
$$u_{k+1} = u_k + \alpha_k p_k, \qquad \alpha_k = \frac{(B r_k, r_k)}{(p_k, p_k)_A},$$
$$r_{k+1} = r_k - \alpha_k A p_k,$$
$$p_{k+1} = B r_{k+1} + \beta_k p_k, \qquad \beta_k = \frac{(B r_{k+1}, r_{k+1})}{(B r_k, r_k)}.$$

Proof of the following lemma is almost identical to that of Lemma 1.

Lemma 2. The residual $r_k$ is $\ell^2$-orthogonal to $\mathcal V_k$ and the preconditioned residual $B r_k$ is $A$-orthogonal to $\mathcal V_{k-1}$.

We present the algorithm of the preconditioned conjugate gradient method.
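A minimal PCG sketch in Python/NumPy, following the recursion above, could read as follows; here the preconditioner is passed as a function `Binv` applying the action of $B$ to a vector, and the names and stopping test are again our own choices for illustration.

```python
import numpy as np

def pcg(A, b, u0, Binv, tol=1e-8, maxit=None):
    """Minimal preconditioned CG sketch; Binv(r) applies the preconditioner B to r."""
    u = u0.astype(float)
    r = b - A @ u                 # r_0 = b - A u_0
    z = Binv(r)                   # z_0 = B r_0 (preconditioned residual)
    p = z.copy()                  # p_0 = B r_0
    rho = r @ z                   # (B r_k, r_k)
    tol2 = tol**2 * (b @ b)
    maxit = maxit or len(b)
    for _ in range(maxit):
        if r @ r <= tol2:         # stop on small relative residual
            break
        Ap = A @ p
        alpha = rho / (p @ Ap)    # alpha_k = (B r_k, r_k) / (p_k, p_k)_A
        u += alpha * p            # u_{k+1} = u_k + alpha_k p_k
        r -= alpha * Ap           # r_{k+1} = r_k - alpha_k A p_k
        z = Binv(r)               # z_{k+1} = B r_{k+1}
        rho_new = r @ z
        beta = rho_new / rho      # beta_k = (B r_{k+1}, r_{k+1}) / (B r_k, r_k)
        p = z + beta * p          # p_{k+1} = B r_{k+1} + beta_k p_k
        rho = rho_new
    return u
```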

 

Similarly, we collect several remarks for the implementation.

 

Convergence Analysis

Recall that we have two bases for the subspace
$$\mathcal V_k = \operatorname{span}\{p_0, p_1, \ldots, p_{k-1}\} = \operatorname{span}\{r_0, r_1, \ldots, r_{k-1}\}.$$

Another basis is given in the following lemma.

Lemma 3. $\mathcal V_k = \operatorname{span}\{r_0, A r_0, A^2 r_0, \ldots, A^{k-1} r_0\}$.

Proof. We prove it by induction. For $k = 1$, it is trivial. Suppose the statement holds for $k$. We prove it holds for $k + 1$ by noting that
$$\mathcal V_{k+1} = \mathcal V_k + \operatorname{span}\{r_k\}, \qquad r_k = r_{k-1} - \alpha_{k-1} A p_{k-1} \in \operatorname{span}\{r_0, A r_0, \ldots, A^{k} r_0\}.$$

The space $\operatorname{span}\{r_0, A r_0, \ldots, A^{k-1} r_0\}$ is called a Krylov subspace. The CG method belongs to a large class of Krylov subspace methods. In the sequel, we use $\mathcal P_k$ to denote the space of polynomials of degree at most $k$.

Theorem 1. Let $u_k$ be the $k$th iteration in the CG method with an initial guess $u_0$. Then
$$\|u - u_k\|_A = \min_{\substack{p \in \mathcal P_k \\ p(0) = 1}} \|p(A)(u - u_0)\|_A \le \min_{\substack{p \in \mathcal P_k \\ p(0) = 1}} \ \max_{\lambda \in \sigma(A)} |p(\lambda)| \ \|u - u_0\|_A.$$

Proof. For $v \in u_0 + \mathcal V_k$, it can be expanded as
$$v = u_0 + \sum_{i=0}^{k-1} c_i A^{i} r_0 = u_0 + q(A) r_0, \qquad q \in \mathcal P_{k-1}.$$

Let $p(\lambda) = 1 - \lambda q(\lambda)$. Then $p \in \mathcal P_k$ with $p(0) = 1$ and, using $r_0 = A(u - u_0)$,
$$u - v = (u - u_0) - q(A) A (u - u_0) = p(A)(u - u_0).$$

The identity then follows from the optimality $\|u - u_k\|_A = \min_{v \in u_0 + \mathcal V_k} \|u - v\|_A$.

Since $p(A)$ is symmetric in the $(\cdot, \cdot)_A$-inner product, we have
$$\|p(A)(u - u_0)\|_A \le \max_{\lambda \in \sigma(A)} |p(\lambda)| \ \|u - u_0\|_A,$$

which leads to the estimate.

A polynomial $p \in \mathcal P_k$ with the constraint $p(0) = 1$ will be called a residual polynomial. Various convergence results of the CG method can be obtained by choosing specific residual polynomials.

Corollary 1. Let $\{\phi_i\}_{i=1}^N$ be an eigenvector basis of $A$ with eigenvalues $\{\lambda_i\}_{i=1}^N$. Let $u - u_0 = \sum_{i=1}^{m} c_i \phi_i$, $m \le N$. Then the CG method with the initial guess $u_0$ will find the solution in at most $m$ iterations. In particular, for a given $b \in \mathbb R^N$, the CG algorithm will find the solution within $N$ iterations. Namely, CG can be viewed as a direct method.

Proof. Denote by $V_m = \operatorname{span}\{\phi_1, \ldots, \phi_m\}$. Let $v = \sum_{i=1}^{m} v_i \phi_i \in V_m$. Then $A v = \sum_{i=1}^{m} \lambda_i v_i \phi_i$ with $A v \in V_m$. Namely if $v \in V_m$, so is $A v$. Note that here the eigenvectors are introduced for the analysis and are unknown in the algorithm.

The subspace $V_m$ is invariant under the action of $A$, i.e. $A V_m \subseteq V_m$. Therefore, starting from $r_0 = A(u - u_0) \in V_m$, the Krylov space $\mathcal V_k \subseteq V_m$ for all $k$. If $r_k \neq 0$, the dimension increases by 1. So in at most $m$ iterations, $\mathcal V_m = V_m$. The projection of $u - u_0$ to $V_m$ is $u - u_0$ itself, i.e. $u_m = u$.

Alternatively, choose the polynomial $p(\lambda) = \prod_{i=1}^{m} (1 - \lambda/\lambda_i)$. Then $p$ is a residual polynomial of order $m$ and $p(\lambda_i) = 0$ for $i = 1, \ldots, m$, and
$$p(A)(u - u_0) = \sum_{i=1}^{m} c_i\, p(\lambda_i)\, \phi_i = 0.$$

The assertion then follows from the identity in Theorem 1.

Remark 2. The CG method can also be applied to a symmetric and positive semi-definite matrix $A$. Let $\{\phi_i\}$ be the eigenvectors associated to the nonzero eigenvalues of $A$. Then from Corollary 1, if $b \in \operatorname{span}\{\phi_i\} = \operatorname{range}(A)$, then the CG method with $u_0 = 0$ will find a solution within $\operatorname{rank}(A)$ iterations.

CG was invented as a direct method, but it is more effective to use it as an iterative method. The rate of convergence depends crucially on the distribution of the eigenvalues of $A$, and the iteration could converge to the solution within a given tolerance in $k \ll N$ steps.

Theorem 2. Let $u_k$ be the $k$-th iteration of the CG method with the initial guess $u_0$, and let $\kappa(A) = \lambda_{\max}(A)/\lambda_{\min}(A)$ be the condition number of $A$. Then
$$\|u - u_k\|_A \le 2\left(\frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1}\right)^{k} \|u - u_0\|_A.$$

Proof. Let $\lambda_{\min}$ and $\lambda_{\max}$ denote the extreme eigenvalues of $A$, so that $\sigma(A) \subset [\lambda_{\min}, \lambda_{\max}]$ and $\kappa = \lambda_{\max}/\lambda_{\min}$. Choose
$$p_k(\lambda) = T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min} - 2\lambda}{\lambda_{\max} - \lambda_{\min}}\right) \Big/ \, T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}}\right),$$

where $T_k$ is the Chebyshev polynomial of degree $k$ given by
$$T_k(t) = \begin{cases} \cos(k \arccos t), & |t| \le 1, \\[4pt] \dfrac{1}{2}\left[\left(t + \sqrt{t^2 - 1}\right)^{k} + \left(t - \sqrt{t^2 - 1}\right)^{k}\right], & |t| \ge 1. \end{cases}$$

It is easy to see that $p_k \in \mathcal P_k$, $p_k(0) = 1$, and if $\lambda \in [\lambda_{\min}, \lambda_{\max}]$, then
$$|p_k(\lambda)| \le \left[\, T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}}\right) \right]^{-1} = \left[\, T_k\!\left(\frac{\kappa + 1}{\kappa - 1}\right) \right]^{-1}.$$

Hence
$$\|u - u_k\|_A \le \max_{\lambda \in \sigma(A)} |p_k(\lambda)| \ \|u - u_0\|_A \le \left[\, T_k\!\left(\frac{\kappa + 1}{\kappa - 1}\right) \right]^{-1} \|u - u_0\|_A.$$

We set
$$\frac{\kappa + 1}{\kappa - 1} = \frac{1}{2}\left(s + \frac{1}{s}\right), \qquad s > 1.$$

Solving this equation for $s$, we have
$$s = \frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1},$$

with $\kappa = \kappa(A)$. We then obtain
$$T_k\!\left(\frac{\kappa + 1}{\kappa - 1}\right) = \frac{1}{2}\left(s^{k} + s^{-k}\right) \ge \frac{1}{2}\left(\frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1}\right)^{k},$$

which completes the proof.

The estimate shows that CG is in general better than the gradient method. Furthermore, if the condition number of $A$ is close to one, the CG iteration will converge very fast. Even if $\kappa(A)$ is large, the iteration will perform well if the eigenvalues are clustered in a few small intervals; see Corollary 2 below.
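For comparison, recall the standard bound for the gradient (steepest descent) method in the $A$-norm, quoted here for reference and not derived in these notes:
$$\|u - u_k\|_A \le \left(\frac{\kappa(A) - 1}{\kappa(A) + 1}\right)^{k} \|u - u_0\|_A.$$
As a rough illustration, for $\kappa(A) = 100$ the contraction factor per step is $99/101 \approx 0.98$ for the gradient method versus $(\sqrt{100} - 1)/(\sqrt{100} + 1) = 9/11 \approx 0.82$ for CG, so reducing the error by a fixed factor takes roughly ten times fewer CG iterations.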

Corollary 2. Assume that $\sigma(A) = \sigma_0(A) \cup \sigma_1(A)$ with $\sigma_1(A) \subset [a, b]$, and $m$ is the number of elements in $\sigma_0(A)$. Then
$$\|u - u_{m+k}\|_A \le 2 M \left(\frac{\sqrt{b/a} - 1}{\sqrt{b/a} + 1}\right)^{k} \|u - u_0\|_A,$$

where
$$M = \max_{\lambda \in [a, b]} \ \prod_{\mu \in \sigma_0(A)} \left|1 - \frac{\lambda}{\mu}\right|.$$

To perform the convergence analysis of PCG, we let $\tilde r_k = B r_k$. Then one can easily verify that
$$\mathcal V_k = \operatorname{span}\{\tilde r_0, (BA)\tilde r_0, \ldots, (BA)^{k-1}\tilde r_0\}.$$
Since the optimality $\|u - u_k\|_A = \min_{v \in u_0 + \mathcal V_k} \|u - v\|_A$ still holds, and $BA$ is symmetric in the $(\cdot, \cdot)_A$-inner product, we obtain the following convergence rate of PCG.

Theorem 3. Let $B$ be SPD and let $u_k$ be the $k$th iteration in the PCG method with the preconditioner $B$ and an initial guess $u_0$. Then
$$\|u - u_k\|_A \le 2\left(\frac{\sqrt{\kappa(BA)} - 1}{\sqrt{\kappa(BA)} + 1}\right)^{k} \|u - u_0\|_A.$$

The SPD matrix $B$ is called a preconditioner. A good preconditioner should have the properties that the action of $B$ is easy to compute and that $\kappa(BA)$ is significantly smaller than $\kappa(A)$. The design of a good preconditioner requires knowledge of the problem, and finding a good preconditioner is a central topic of scientific computing.

An interesting fact is that an iterative method built from $B$ may not be convergent, whereas $B$ can still serve as a preconditioner. For example, the Jacobi method is not convergent for all SPD systems, but $B = D^{-1}$ with $D = \operatorname{diag}(A)$ can always be used as a preconditioner, which is often known as the diagonal preconditioner. Another popular preconditioner is obtained by an incomplete Cholesky factorization of $A$.
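As a small illustration of the diagonal (Jacobi) preconditioner, the snippet below takes $B = D^{-1}$ and reuses the `pcg` sketch defined earlier; the test matrix (a 1D Laplacian-type tridiagonal matrix), the size `n`, and the right-hand side are our own choices for illustration only.

```python
import numpy as np

# Small SPD test matrix: 1D Laplacian-type tridiagonal matrix (illustrative choice).
n = 100
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

d = np.diag(A)                          # D = diag(A)
jacobi = lambda r: r / d                # action of the diagonal preconditioner B = D^{-1}

u = pcg(A, b, np.zeros(n), jacobi)      # pcg: the sketch from the PCG section above
print(np.linalg.norm(b - A @ u))        # residual norm should be near the tolerance
```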