It turns out that the Gram-Schmidt procedure we introduced previously suffers from numerical instability: Round-off errors can accumulate and destroy orthogonality of the resulting vectors. We introduce the modified Gram-Schmidt procedure to help remedy this issue.

for $j = 1:n$

$v_j = x_j$

for $k = 1:j-1$

$ v_j = v_j - \left( \frac{v_k^Tx_j}{v_k^Tv_k} \right) v_k$

endfor

endfor

Note that the output of this is an **orthogonal** set $\{v_1,\ldots,v_n\}$ and

gives an **orthonormal set**

for $j = 1:n$

$v_j = x_j$

for $k = 1:j-1$

$ v_j = v_j - ({q_k^Tx_j}) q_k$

endfor

$q_j = v_j/\|v_j\|_2$

endfor

Note that the output of this is an **orthonormal** set $\{q_1,\ldots,q_n\}.$

for $j = 1:n$

$v_j = x_j$

endfor

for $j = 1:n$

$q_j = v_j/\|v_j\|_2 $

for $k = j+1:n$

$ v_k = v_k - ({q_j^Tv_k}) q_j$

endfor

endfor

Note that the output of this is an **orthonormal** set $\{q_1,\ldots,q_n\}$.

In [4]:

```
n = 200;
A = rand(n); % Construct a "typical" matrix
Q = CGS(A); % Minimal accumulation of round-off errors (CGS)
norm(eye(n)-Q'*Q)
Q = MGS(A); % Less accumulation of round-off errors (MGS)
norm(eye(n)-Q'*Q)
```

In [3]:

```
n = 200;
A = 0.00001*eye(n)+hilb(n); % Construct a difficult matrix (ill-conditioned)
Q = CGS(A); % Clear accumulation of round-off errors (CGS)
norm(eye(n)-Q'*Q)
Q = MGS(A); % Some accumulation of round-off errors (MGS)
norm(eye(n)-Q'*Q)
```

Think carefully about these pseudocodes. In classical Gram-Schmidt (CGS), we take each vector, one at a time, and make it orthogonal to all previous vectors. In modified Gram-Schmidt (MGS), we take each vector, and modify all forthcoming vectors to be orthogonal to it. Once you argue this way, it is clear that both methods are performing the same operations, and are mathematically equivalent.

But, importantly, modified Gram-Schmidt suffers from round-off instability to a significantly less degree. This can be explained, in part, from the formulae:

$$\text{GS: } v_j = v_j - ({v_k^Tx_j}) v_k \quad \text{ and } \quad \text{MGS: } v_k = v_k - ({v_j^Tv_k}) v_j.$$If an error is made in computing $q_2$ in CGS, so that $q_1^Tq_2 = \delta$ is small, but non-zero, this will not be corrected for in any of the computations that follow:

$$v_3 = x_3 - (q_1^Tx_3)q_1 - (q_2^T x_3)q_2,$$$$q_2^Tv_3 = q_2^T x_3 - (q_1^Tx_3) \delta - (q_2^T x_3) = - (q_1^Tx_3) \delta.$$$$q_1^Tv_3 = q_1^T x_3 - (q_1^Tx_3) - (q_2^T x_3)\delta = - (q_2^Tx_3) \delta.$$We see that $v_3$ is not orthogonal to $q_1$ or $q_2$. On the other hand, assume the same for MGS: $q_1^Tq_2 = \delta$ is small, but non-zero. Let's examine how the third vector $v_3$ changes:

$\text{Initially: } v_3^{(0)} = x_3,$

${j = 1:}~~ v_3^{(1)} = v_3^{(0)} - (q_1^Tv_3^{(0)})q_1,$

${j = 2:}~~ v_3 = v_3^{(1)} - (q_2^Tv_3^{(1)})q_2.$

Then $v_3$ is the final form of the third vector (before normalization). Let's check orthogonality, assuming no more errors are made:

$$q_2^Tv_3 = q_2^T v_3^{(1)} - (q_2^Tv_3^{(1)}) = 0.$$So, we perserve orthogonality to $q_2$. Then for $q_1$:

$$q_1^Tv_3 = q_1^T v_3^{(1)} - (q_2^Tv_3^{(1)})\delta,$$$$q_2^Tv_3^{(1)} = q_2^Tv_3^{(0)} - (q_1^Tv_3^{(0)})\delta,$$$$q_1^Tv_3^{(1)} = q_1^Tv_3^{(0)} - (q_1^Tv_3^{(0)}) = 0.$$This gives

$$q_1^Tv_3 = - (q_2^Tv_3^{(0)} - (q_1^Tv_3^{(0)})\delta)\delta.$$$$ = -q_2^Tv_3^{(0)} \delta + q_1^Tv_3^{(0)}\delta^2.$$Since $\delta$ is very small, $\delta^2$ is much smaller. And so the error in $q_1^Tv_3$ is no worse than in CGS, but we've eliminated the errors in $q_2^Tv_3$, an improvement.

Just as the LU factorization is "Gaussian elimination with bookkeeping" the QR factorization is "Gram-Schmidt with bookkeeping". Given linearly independent vectors $\{x_1,x_2,\ldots,x_n\}$ in $\mathbb R^n$, we form a matrix

$$X = [x_1,x_2,\ldots,x_n],$$and relate $[x_1,x_2,\ldots,x_n]$ to $Q = [q_1,q_2,\ldots,q_n]$ found via Gram-Schmidt with an upper-triangular matrix $R$:

$$X = Q R = [q_1,q_2,\ldots,q_n] \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n}\\ 0 & r_{22} & \cdots & r_{2n}\\ \vdots & \ddots & \ddots & \vdots\\ 0 & \cdots & 0 & r_{nn}\end{bmatrix}.$$The QR factorization can be completed with both CGS and MGS, but numerically, one should use MGS.

for $j = 1:n$

$v_j = x_j$

endfor

for $j = 1:n$

$r_{jj} = \|v_j\|_2$

$q_j = v_j/r_{jj} $

for $k = j+1:n$

$r_{jk} = {q_j^Tv_k}$

$ v_k = v_k - r_{jk} q_j$

endfor

endfor

In the next lecture, we will talk about how this factorization allows us to compute eigenvalues. But for now, consider solving

$$ Ax = b,$$$$ QR = b,$$$$ R = Q^T b.$$This last equality follows from $Q^{-1} = Q^T$ because $Q$ is orthogonal. Then $R$ is upper-triangular and this can be solve with backward substituion. This provides an alternate algorithm to Gaussian elimination or the LU factorization to solve a linear system. But, it turns out that the complexity of the QR algorithm is slightly worse than that of the LU algorithm, so it is not used for this purpose often.

If $\det A \neq 0$ and $r_{ii} > 0$ for all $i$, then the QR factorization is unique

Let $A = Q_0 R_0 = Q_1 R_1$ be two QR factorization where $R_1$ and $R_2$ have positive diagonal entries. Then

$$ Q_0 R_0 = Q_1 R_1.$$It follows that $Q_1^TQ_0 = R_1 R_0^{-1}$ is an orthogonal matrix.

From the cofactor expansion for the inverse matrix (which we did not cover), it follows any upper-triangular matrix with positive diagonal entries has its inverse satisfy the same conditions. Furthermore, the product of two upper-triangular matrices with positive diagonal entries, also satifies these same conditions. The same is true of lower-triangular matrices with positive diagonal entries.

So, because $R_1 R_0^{-1}$ is orthogonal, its inverse satisfies

$$ (R_1 R_0^{-1})^{-1} = R_0 R_1^{-1} =R_0^{-T} R_1.$$The equation $R_0 R_1^{-1} =R_0^{-T} R_1$ states that we have an upper-triangular orthogonal matrix with positive diagonal entries that is equal to a lower-triangular matrix. This shows that $R_0 R_1^{-1}$ is diagonal, orthogonal, with positive diagonal entries. The ONLY matrix that satisfies this is the identity matrix, therefore $R_0 = R_1$ and the factorization is unique.

Find the QR factorization of

$$ A = \begin{bmatrix} 1 & 0 & 1\\ 0 & -2 & 0 \\ 1 & -2 & 2 \end{bmatrix}. $$