2.6 Orthogonal Projections
Recall the discussion of the Gram–Schmidt process, where we saw that any finite-dimensional subspace $W$ of an inner product space $V$ has an orthonormal basis $\beta_W = \{w_1, \ldots, w_n\}$. In such a situation, we can define the orthogonal projections onto $W$ and $W^\perp$ via
$$\pi_W : V \to W : x \mapsto \sum_{j=1}^n \langle x, w_j \rangle w_j, \qquad \pi_{W^\perp} : V \to W^\perp : x \mapsto x - \pi_W(x)$$
Our previous goal was to use orthonormal bases to ease computation. In this section we develop
projections more generally. First recall the notion of a direct sum within a vector space V:
$$V = X \oplus Y \iff \forall v \in V,\ \exists \text{ unique } x \in X,\ y \in Y \text{ such that } v = x + y$$
Definition 2.54. A linear map $T \in \mathcal{L}(V)$ is a projection if
$$V = R(T) \oplus N(T) \quad\text{and}\quad T\big|_{R(T)} = I\big|_{R(T)}$$
Otherwise said, $T(r + n) = r$ whenever $r \in R(T)$ and $n \in N(T)$.
Alternatively, given $V = X \oplus Y$, the projection along $Y$ onto $X$ is the map $v = x + y \mapsto x$.
We call $A \in M_n(F)$ a projection matrix if $L_A \in \mathcal{L}(F^n)$ is a projection.
(Figure: a vector $v$ decomposed as $v = x + y$ with $x = T(v) \in X$ and $y \in Y$.)
Example 2.55. $A = \frac{1}{5}\begin{pmatrix} 6 & -2 \\ 3 & -1 \end{pmatrix}$ is a projection matrix with $R(A) = \operatorname{Span}\begin{pmatrix}2\\1\end{pmatrix}$ and $N(A) = \operatorname{Span}\begin{pmatrix}1\\3\end{pmatrix}$.
Indeed, it is straightforward to describe all projection matrices in $M_2(\mathbb{R})$. There are three cases:
1. $A = I$ is the identity matrix: $R(A) = \mathbb{R}^2$ and $N(A) = \{0\}$;
2. $A = 0$ is the zero matrix: $R(A) = \{0\}$ and $N(A) = \mathbb{R}^2$;
3. Choose distinct subspaces $R(A) = \operatorname{Span}\binom{a}{b}$ and $N(A) = \operatorname{Span}\binom{c}{d}$, then
$$A = \frac{1}{ad - bc}\begin{pmatrix}a\\b\end{pmatrix}\begin{pmatrix}d & -c\end{pmatrix} = \frac{1}{ad - bc}\begin{pmatrix}ad & -ac \\ bd & -bc\end{pmatrix}$$
Think about why this last does what we claim; a numerical sanity check is sketched below.
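The following is a small sketch (not part of the original notes) which verifies the type-3 formula numerically with NumPy, using the spans of Example 2.55 as a test case.

```python
import numpy as np

# A minimal sketch: build the projection onto Span{(a,b)} along Span{(c,d)}
# using the formula above, then confirm it behaves as claimed.
def oblique_projection(a, b, c, d):
    """Projection matrix with range Span{(a,b)} and nullspace Span{(c,d)}."""
    return np.array([[a*d, -a*c], [b*d, -b*c]]) / (a*d - b*c)

A = oblique_projection(2, 1, 1, 3)                  # the matrix of Example 2.55
assert np.allclose(A @ A, A)                        # idempotent, so a projection
assert np.allclose(A @ np.array([2, 1]), [2, 1])    # fixes its range
assert np.allclose(A @ np.array([1, 3]), [0, 0])    # kills its nullspace
print(A)                                            # (1/5)[[6,-2],[3,-1]]
```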
It should be clear that every projection $T$ has (at most) two eigenspaces:
• $R(T)$ is an eigenspace with eigenvalue 1;
• $N(T)$ is an eigenspace with eigenvalue 0.
If $V$ is finite-dimensional and $\rho, \eta$ are bases of $R(T), N(T)$ respectively, then the matrix of $T$ with respect to $\rho \cup \eta$ has block form
$$[T]_{\rho \cup \eta} = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}$$
where $\operatorname{rank} I = \operatorname{rank} T$. In particular, every finite-dimensional projection is diagonalizable.
Lemma 2.56. $T \in \mathcal{L}(V)$ is a projection if and only if $T^2 = T$.
Proof. Throughout, assume $r \in R(T)$ and $n \in N(T)$.
($\Rightarrow$) Since every vector in $V$ has a unique representation $v = r + n$, simply compute
$$T^2(v) = T\big(T(r + n)\big) = T(r) = r = T(v)$$
($\Leftarrow$) Suppose $T^2 = T$. Note first that if $r \in R(T)$, then $r = T(v)$ for some $v \in V$, whence
$$T(r) = T^2(v) = T(v) = r \tag{†}$$
Thus $T$ is the identity on $R(T)$. Moreover, if $x \in R(T) \cap N(T)$, (†) says that $x = T(x) = 0$, whence
$$R(T) \cap N(T) = \{0\}$$
and so $R(T) \oplus N(T)$ is a well-defined subspace of $V$.¹
To finish things off, let $v \in V$ and observe that
$$T\big(v - T(v)\big) = T(v) - T^2(v) = 0 \implies v - T(v) \in N(T)$$
so that $v = T(v) + \big(v - T(v)\big)$ is a decomposition into $R(T)$- and $N(T)$-parts. We conclude that $V = R(T) \oplus N(T)$ and that $T$ is a projection.
Thus far the discussion hasn’t had anything to do with inner products. . .
Definition 2.57. An orthogonal projection is a projection $T \in \mathcal{L}(V)$ on an inner product space for which we additionally have
$$N(T) = R(T)^\perp \quad\text{and}\quad R(T) = N(T)^\perp$$
Alternatively, given $V = W \oplus W^\perp$, the orthogonal projection $\pi_W$ is the projection along $W^\perp$ onto $W$: that is
$$R(\pi_W) = W \quad\text{and}\quad N(\pi_W) = W^\perp$$
The complementary orthogonal projection $\pi_{W^\perp} = I - \pi_W$ has $R(\pi_{W^\perp}) = W^\perp$ and $N(\pi_{W^\perp}) = W$.
Example (2.55 continued). The identity and zero matrices are both $2 \times 2$ orthogonal projection matrices, while those of type 3 are orthogonal if $\binom{a}{b} \cdot \binom{c}{d} = 0$: we obtain
$$A = \frac{1}{a^2 + b^2}\begin{pmatrix}a\\b\end{pmatrix}\begin{pmatrix}a & b\end{pmatrix} = \frac{1}{a^2 + b^2}\begin{pmatrix}a^2 & ab \\ ab & b^2\end{pmatrix}$$
More generally, if $W \le F^n$ has orthonormal basis $\{w_1, \ldots, w_k\}$, then the matrix of $\pi_W$ is $\sum_{j=1}^k w_j w_j^*$.

¹ In finite dimensions, the rank–nullity theorem and dimension counting finishes the proof here without having to proceed further:
$$\dim\big(R(T) \oplus N(T)\big) = \operatorname{rank} T + \operatorname{null} T = \dim V \implies R(T) \oplus N(T) = V$$
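Here is a brief sketch (an assumed set-up, not from the notes) of the formula $\pi_W = \sum_j w_j w_j^*$ in NumPy; the orthonormal basis is produced from an arbitrary spanning set via QR.

```python
import numpy as np

# A small sketch: the matrix of pi_W as a sum of rank-one projections w_j w_j^T,
# where the w_j come from orthonormalizing a spanning set of W.
def projection_matrix(spanning_vectors):
    """Orthogonal projection onto W = Span(spanning_vectors) in R^n."""
    V = np.column_stack(spanning_vectors)
    Q, _ = np.linalg.qr(V)                      # columns of Q: orthonormal basis of W
    return sum(np.outer(q, q) for q in Q.T)     # sum of the rank-one pieces w_j w_j^T

P = projection_matrix([np.array([1., 2., 1.]), np.array([1., 0., -1.])])
assert np.allclose(P @ P, P)      # idempotent
assert np.allclose(P, P.T)        # self-adjoint, hence an orthogonal projection
```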
Theorem 2.58. A projection $T \in \mathcal{L}(V)$ is orthogonal if and only if it is self-adjoint: $T = T^*$.
Proof. ($\Rightarrow$) By assumption, $R(T)$ and $N(T)$ are orthogonal subspaces. Letting $x, y \in V$ and using subscripts to denote $R(T)$- and $N(T)$-parts, we see that
$$\langle x, T(y) \rangle = \langle x_r + x_n, y_r \rangle = \langle x_r, y_r + y_n \rangle = \langle T(x), y \rangle \implies T = T^*$$
($\Leftarrow$) Suppose $T$ is a self-adjoint projection. By the fundamental subspaces theorem,
$$N(T) = N(T^*) = R(T)^\perp$$
Since $T$ is a projection already, we have $V = R(T) \oplus N(T) = R(T) \oplus R(T)^\perp$, from which
$$R(T) = \big(R(T)^\perp\big)^\perp = N(T)^\perp$$
The language of projections allows us to rephrase the Spectral Theorem.
Theorem 2.59 (Spectral Theorem, mk. II). Let $V$ be a finite-dimensional complex/real inner product space and $T \in \mathcal{L}(V)$ be normal/self-adjoint with spectrum $\{\lambda_1, \ldots, \lambda_k\}$ and corresponding eigenspaces $E_1, \ldots, E_k$. Let $\pi_j \in \mathcal{L}(V)$ be the orthogonal projection onto $E_j$. Then:
1. $V = E_1 \oplus \cdots \oplus E_k$ is a direct sum of orthogonal subspaces; in particular, $E_j^\perp$ is the direct sum of the remaining eigenspaces.
2. $\pi_i \pi_j = 0$ if $i \neq j$.
3. (Resolution of the identity) $I_V = \pi_1 + \cdots + \pi_k$
4. (Spectral decomposition) $T = \lambda_1 \pi_1 + \cdots + \lambda_k \pi_k$
Proof. 1. $T$ is diagonalizable and so $V$ is the direct sum of the eigenspaces of $T$. Since $T$ is normal, the eigenvectors corresponding to distinct eigenvalues are orthogonal, whence the eigenspaces are mutually orthogonal. In particular, this says that
$$\hat{E}_j := \bigoplus_{i \neq j} E_i \subseteq E_j^\perp$$
Since $V$ is finite-dimensional, we have $V = E_j \oplus E_j^\perp$, whence
$$\dim \hat{E}_j = \sum_{i \neq j} \dim E_i = \dim V - \dim E_j = \dim E_j^\perp \implies \hat{E}_j = E_j^\perp$$
2. This is clear by part 1, since $N(\pi_j) = E_j^\perp = \hat{E}_j$.
3. Write $x = \sum_{j=1}^k x_j$ where each $x_j \in E_j$. Then $\pi_j(x) = x_j$: now add. . .
4. $T(x) = \sum_{j=1}^k T(x_j) = \sum_{j=1}^k \lambda_j x_j = \sum_{j=1}^k \lambda_j \pi_j(x)$.
Examples 2.60. We verify the resolution of the identity and the spectral decomposition; for clarity,
we index projections and eigenspaces by eigenvalue rather than the natural numbers.
1. The symmetric matrix $A = \begin{pmatrix} 10 & 2 \\ 2 & 7 \end{pmatrix}$ has spectrum $\{6, 11\}$ and orthonormal eigenvectors
$$w_6 = \frac{1}{\sqrt{5}}\begin{pmatrix}1\\-2\end{pmatrix}, \qquad w_{11} = \frac{1}{\sqrt{5}}\begin{pmatrix}2\\1\end{pmatrix}$$
The corresponding projections therefore have matrices
$$\pi_6 = w_6 w_6^T = \frac{1}{5}\begin{pmatrix}1\\-2\end{pmatrix}\begin{pmatrix}1 & -2\end{pmatrix} = \frac{1}{5}\begin{pmatrix}1 & -2 \\ -2 & 4\end{pmatrix}, \qquad \pi_{11} = w_{11} w_{11}^T = \frac{1}{5}\begin{pmatrix}4 & 2 \\ 2 & 1\end{pmatrix}$$
from which the resolution of the identity and the spectral decomposition are readily verified:
$$\pi_6 + \pi_{11} = \begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix} \quad\text{and}\quad 6\pi_6 + 11\pi_{11} = \frac{1}{5}\begin{pmatrix}6 + 44 & -12 + 22 \\ -12 + 22 & 24 + 11\end{pmatrix} = A$$
2. The normal matrix $B = \begin{pmatrix} 1+i & 1+i \\ -1-i & 1+i \end{pmatrix} \in M_2(\mathbb{C})$ has spectrum $\{2, 2i\}$ and corresponding orthonormal eigenvectors
$$w_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}i\\1\end{pmatrix}, \qquad w_{2i} = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\i\end{pmatrix}$$
The orthogonal projection matrices are therefore
$$\pi_2 = w_2 w_2^* = \frac{1}{2}\begin{pmatrix}i\\1\end{pmatrix}\begin{pmatrix}-i & 1\end{pmatrix} = \frac{1}{2}\begin{pmatrix}1 & i \\ -i & 1\end{pmatrix}, \qquad \pi_{2i} = w_{2i} w_{2i}^* = \frac{1}{2}\begin{pmatrix}1 & -i \\ i & 1\end{pmatrix}$$
from which
$$\pi_2 + \pi_{2i} = \begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix} \quad\text{and}\quad 2\pi_2 + 2i\pi_{2i} = \begin{pmatrix}1 & i \\ -i & 1\end{pmatrix} + \begin{pmatrix}i & 1 \\ -1 & i\end{pmatrix} = B$$
3. The matrix $C = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$ has spectrum $\{-1, 2\}$, an orthonormal eigenbasis
$$\{u, v, w\} = \left\{\frac{1}{\sqrt{3}}\begin{pmatrix}1\\1\\1\end{pmatrix},\ \frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\\0\end{pmatrix},\ \frac{1}{\sqrt{6}}\begin{pmatrix}1\\1\\-2\end{pmatrix}\right\}$$
and eigenspaces $E_2 = \operatorname{Span}\{u\}$ and $E_{-1} = \operatorname{Span}\{v, w\}$. The orthogonal projections have matrices
$$\pi_2 = uu^T = \frac{1}{3}\begin{pmatrix}1\\1\\1\end{pmatrix}\begin{pmatrix}1 & 1 & 1\end{pmatrix} = \frac{1}{3}\begin{pmatrix}1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1\end{pmatrix}$$
$$\pi_{-1} = vv^T + ww^T = \frac{1}{2}\begin{pmatrix}1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0\end{pmatrix} + \frac{1}{6}\begin{pmatrix}1 & 1 & -2 \\ 1 & 1 & -2 \\ -2 & -2 & 4\end{pmatrix} = \frac{1}{3}\begin{pmatrix}2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2\end{pmatrix}$$
It is now easy to check the resolution of the identity and the spectral decomposition:
$$\pi_2 + \pi_{-1} = I \quad\text{and}\quad 2\pi_2 - \pi_{-1} = C$$
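The following is a hedged numerical check of the last example (not part of the original notes): NumPy's symmetric eigensolver builds the eigenvalue projections of $C$, and the two identities above follow.

```python
import numpy as np

# Build the projections onto the eigenspaces of C and verify parts 3 and 4
# of the Spectral Theorem numerically.
C = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
eigvals, eigvecs = np.linalg.eigh(C)          # C is symmetric, so eigh applies

projections = {}
for lam in np.unique(np.round(eigvals, 8)):   # group eigenvector columns by eigenvalue
    cols = eigvecs[:, np.isclose(eigvals, lam)]
    projections[lam] = cols @ cols.T          # sum of w w^T over that eigenspace

assert np.allclose(sum(projections.values()), np.eye(3))               # resolution of the identity
assert np.allclose(sum(lam * P for lam, P in projections.items()), C)  # spectral decomposition
```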
Orthogonal Projections and Minimization Problems
We finish this section with an important observation that drives much of the application of inner
product spaces to other parts of mathematics and beyond. Throughout this discussion, X and Y
denote inner product spaces.
Theorem 2.61. Suppose $Y = W \oplus W^\perp$. For any $y \in Y$, the orthogonal projection $\pi_W(y)$ is the unique element of $W$ which minimizes the distance to $y$:
$$\forall w \in W, \qquad \|y - \pi_W(y)\| \le \|y - w\|$$
Proof. Apply Pythagoras' Theorem: since $\pi_{W^\perp}(y) = y - \pi_W(y) \in W^\perp$,
$$\|y - w\|^2 = \|y - \pi_W(y) + \pi_W(y) - w\|^2 = \|y - \pi_W(y)\|^2 + \|\pi_W(y) - w\|^2 \ge \|y - \pi_W(y)\|^2$$
with equality if and only if $w = \pi_W(y)$.
This setup can be used to compute accurate approximations in many contexts.
Examples 2.62. 1. To obtain a quadratic polynomial approximation $p(x) = a + bx + cx^2$ to $e^x$ on the interval $[-1, 1]$, we choose to minimize the integral $\int_{-1}^1 |e^x - p(x)|^2\,dx$, namely the squared $L^2$-norm $\|e^x - p(x)\|^2$ on $C[-1, 1]$. If we let $W = \operatorname{Span}\{1, x, x^2\}$, then the finite-dimensionality of $W$ means that $C[-1, 1] = W \oplus W^\perp$. By the Theorem, the solution is $p(x) = \pi_W(e^x)$.
To compute this, recall that we have an orthonormal basis for $W$, namely
$$\left\{\frac{1}{\sqrt{2}},\ \sqrt{\tfrac{3}{2}}\,x,\ \sqrt{\tfrac{5}{8}}(3x^2 - 1)\right\}$$
from which
$$p(x) = \frac{1}{2}\langle 1, e^x\rangle + \frac{3}{2}\langle x, e^x\rangle\, x + \frac{5}{8}(3x^2 - 1)\langle 3x^2 - 1, e^x\rangle$$
$$= \frac{1}{2}\int_{-1}^1 e^x\,dx + \frac{3}{2}x\int_{-1}^1 xe^x\,dx + \frac{5}{8}(3x^2 - 1)\int_{-1}^1 (3x^2 - 1)e^x\,dx$$
$$= \frac{1}{2}(e - e^{-1}) + 3e^{-1}x + \frac{5}{4}(e - 7e^{-1})(3x^2 - 1)$$
$$\approx 1.18 + 1.10x + 0.179(3x^2 - 1) \approx 1 + 1.1x + 0.537x^2$$
(Figure: the linear and quadratic approximations to $y = e^x$ on $[-1, 1]$.) Compare this with the Maclaurin polynomial $e^x \approx 1 + x + \frac{1}{2}x^2$ from calculus. A numerical version of this computation is sketched below.
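Here is a short sketch (assumed set-up, not from the notes) which computes the same projection numerically: the coefficient of each orthonormal basis function is just the inner product with $e^x$, evaluated with `scipy.integrate.quad`.

```python
import numpy as np
from scipy.integrate import quad

# Compute pi_W(e^x) on [-1, 1] against the orthonormal basis of Example 2.62.1.
basis = [
    lambda x: 1 / np.sqrt(2),
    lambda x: np.sqrt(3 / 2) * x,
    lambda x: np.sqrt(5 / 8) * (3 * x**2 - 1),
]

# coefficient of each basis function is the inner product <e^x, q_j>
coeffs = [quad(lambda x, q=q: np.exp(x) * q(x), -1, 1)[0] for q in basis]

def p(x):
    """Best L^2 quadratic approximation to e^x on [-1, 1]."""
    return sum(c * q(x) for c, q in zip(coeffs, basis))

print(p(0.0), np.exp(0.0))   # roughly 0.996 vs 1.0
```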
2. The $n$th Fourier approximation of a function $f(x)$ is its orthogonal projection onto the finite-dimensional subspace
$$W_n = \operatorname{Span}\{1, e^{ix}, e^{-ix}, \ldots, e^{inx}, e^{-inx}\} = \operatorname{Span}\{1, \cos x, \sin x, \cos 2x, \sin 2x, \ldots, \cos nx, \sin nx\}$$
within $L^2[-\pi, \pi]$, namely
$$F_n(x) = \frac{1}{2\pi}\sum_{k=-n}^n \big\langle f(x), e^{ikx}\big\rangle e^{ikx} = \frac{1}{2\pi}\langle f(x), 1\rangle + \frac{1}{\pi}\sum_{k=1}^n \big(\langle f(x), \cos kx\rangle \cos kx + \langle f(x), \sin kx\rangle \sin kx\big)$$
According to the Theorem, this is the unique function $F_n(x) \in W_n$ minimizing the integral
$$\|f(x) - F_n(x)\|^2 = \int_{-\pi}^{\pi} |f(x) - F_n(x)|^2\,dx$$
For example, if $f(x) = \begin{cases} 1 & \text{if } 0 < x \le \pi \\ -1 & \text{if } -\pi < x \le 0\end{cases}$ is extended periodically, then
$$F_{2n-1}(x) = \frac{4}{\pi}\sum_{j=1}^n \frac{\sin(2j-1)x}{2j-1} = \frac{4}{\pi}\left(\sin x + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \cdots + \frac{\sin(2n-1)x}{2n-1}\right)$$
(Figure: $y = f(x)$ and its eleventh Fourier approximation $y = F_{11}(x)$ on $[-2\pi, 2\pi]$.)
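A quick sketch (not from the notes) evaluating the partial sum $F_{2n-1}$ of the square wave:

```python
import numpy as np

# Evaluate the Fourier approximation F_{2n-1} of the square wave above.
def F(x, n):
    """Partial sum (4/pi) * sum_{j=1}^{n} sin((2j-1)x)/(2j-1)."""
    j = np.arange(1, n + 1)
    return (4 / np.pi) * np.sum(np.sin(np.outer(x, 2 * j - 1)) / (2 * j - 1), axis=1)

x = np.linspace(0.3, np.pi - 0.3, 5)
print(np.round(F(x, 6), 3))     # F_11 at points of (0, pi): oscillates about +1, the value of f there
```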
We’ll return to related minimization problems in the next section.
Exercises 2.6 1. Compute the matrices of the orthogonal projections onto $W$, viewed as subspaces of the standard inner product spaces $\mathbb{R}^n$ or $\mathbb{C}^n$.
(a) $W = \operatorname{Span}\left\{\begin{pmatrix}4\\1\end{pmatrix}\right\}$
(b) $W = \operatorname{Span}\left\{\begin{pmatrix}1\\2\\1\end{pmatrix}, \begin{pmatrix}1\\0\\-1\end{pmatrix}\right\}$
(c) $W = \operatorname{Span}\left\{\begin{pmatrix}i\\1\\0\end{pmatrix}, \begin{pmatrix}1\\i\\1\end{pmatrix}\right\}$
(d) $W = \operatorname{Span}\left\{\begin{pmatrix}1\\1\\0\end{pmatrix}, \begin{pmatrix}1\\2\\1\end{pmatrix}\right\}$ (watch out, these vectors aren't orthogonal!)
2. For each of the following matrices, compute the projections onto each eigenspace, and verify the resolution of the identity and the spectral decomposition.
(a) $\begin{pmatrix}1 & 2 \\ 2 & 1\end{pmatrix}$  (b) $\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}$  (c) $\begin{pmatrix}2 & 3-3i \\ 3+3i & 5\end{pmatrix}$  (d) $\begin{pmatrix}2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2\end{pmatrix}$
(You should have orthonormal eigenbases from the previous section.)
3. Let $W$ be a finite-dimensional subspace of an inner product space $V$. If $T = \pi_W$ is the orthogonal projection onto $W$, prove that $I - T$ is the orthogonal projection onto $W^\perp$.
4. Let $T \in \mathcal{L}(V)$ where $V$ is finite-dimensional.
(a) If $T$ is an orthogonal projection, prove that $\|T(x)\| \le \|x\|$ for all $x \in V$.
(b) Give an example of a projection for which the inequality in (a) is false.
(c) If $T$ is a projection for which $\|T(x)\| = \|x\|$ for all $x \in V$, what is $T$?
(d) If $T$ is a projection for which $\|T(x)\| \le \|x\|$ for all $x \in V$, prove that $T$ is an orthogonal projection.
5. Let $T$ be a normal operator on a finite-dimensional inner product space. If $T$ is a projection, prove that it must be an orthogonal projection.
6. Let $T$ be a normal operator on a finite-dimensional complex inner product space $V$. Use the spectral decomposition $T = \lambda_1\pi_1 + \cdots + \lambda_k\pi_k$ to prove:
(a) If $T^n$ is the zero map for some $n \in \mathbb{N}$, then $T$ is the zero map.
(b) $U \in \mathcal{L}(V)$ commutes with $T$ if and only if $U$ commutes with each $\pi_j$.
(c) There exists a normal $U \in \mathcal{L}(V)$ such that $U^2 = T$.
(d) $T$ is invertible if and only if $\lambda_j \neq 0$ for all $j$.
(e) $T$ is a projection if and only if every $\lambda_j = 0$ or $1$.
(f) $T = -T^*$ if and only if every $\lambda_j$ is imaginary.
7. Find a linear approximation to $f(x) = e^x$ on $[0, 1]$ using the $L^2$ inner product.
8. Consider the $L^2$ inner product on $C[-\pi, \pi]$.
(a) Explain why $\langle \sin x, x^{2n}\rangle = 0$ for all $n$.
(b) Find linear and cubic approximations to $f(x) = \sin x$.
(Feel free to use a computer algebra package to evaluate the integrals!)
9. Revisit Example 2.62.2.
(a) Verify that the general complex ($e^{ikx}$) and real ($\cos kx$, $\sin kx$) expressions for the Fourier approximation are correct.
(Hint: use Euler's formula $e^{ikx} = \cos kx + i\sin kx$.)
(b) Verify the explicit expression for $F_{2n-1}(x)$ when $f(x)$ is the given step-function. What is $F_{2n}(x)$ in this case?
2.7 The Singular Value Decomposition and the Pseudoinverse
Given $T \in \mathcal{L}(V, W)$ between finite-dimensional inner product spaces, the overarching concern of this chapter is the existence and computation of bases $\beta, \gamma$ of $V, W$ with two properties:
• That $\beta, \gamma$ be orthonormal, thus facilitating easy calculation within $V, W$;
• That the matrix $[T]^\gamma_\beta$ be as simple as possible.
We have already addressed two special cases:
Spectral Theorem: When $V = W$ and $T$ is normal/self-adjoint, $\exists \beta = \gamma$ such that $[T]_\beta$ is diagonal.
Schur's Lemma: When $V = W$ and $p(t)$ splits, $\exists \beta = \gamma$ such that $[T]_\beta$ is upper triangular.
In this section we allow $V \neq W$ and $\beta \neq \gamma$, and obtain a result that applies to any linear map between finite-dimensional inner product spaces.
Example 2.63. Let $A = \begin{pmatrix} 3 & 1 \\ 2 & -2 \\ 1 & 3 \end{pmatrix}$ and consider orthonormal bases $\beta = \{v_1, v_2\}$ of $\mathbb{R}^2$ and $\gamma = \{w_1, w_2, w_3\}$ of $\mathbb{R}^3$ respectively:
$$\beta = \left\{\frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix},\ \frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}\right\}, \qquad \gamma = \left\{\frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\1\end{pmatrix},\ \frac{1}{\sqrt{6}}\begin{pmatrix}1\\2\\-1\end{pmatrix},\ \frac{1}{\sqrt{3}}\begin{pmatrix}1\\-1\\-1\end{pmatrix}\right\}$$
Since $Av_1 = 4w_1$ and $Av_2 = 2\sqrt{3}\,w_2$, the matrix $[L_A]^\gamma_\beta = \begin{pmatrix} 4 & 0 \\ 0 & 2\sqrt{3} \\ 0 & 0 \end{pmatrix}$ is almost diagonal.
Our main result says that such bases always exist.
Theorem 2.64 (Singular Value Decomposition). Suppose $V, W$ are finite-dimensional inner product spaces and that $T \in \mathcal{L}(V, W)$ has rank $r$. Then:
1. There exist orthonormal bases $\beta = \{v_1, \ldots, v_n\}$ of $V$ and $\gamma = \{w_1, \ldots, w_m\}$ of $W$, and positive scalars $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$, such that
$$T(v_j) = \begin{cases} \sigma_j w_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases} \qquad\text{equivalently}\qquad [T]^\gamma_\beta = \begin{pmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O \\ O & O \end{pmatrix}$$
2. Any such $\beta$ is an eigenbasis of $T^*T$, whence the scalars $\sigma_j$ are uniquely determined by $T$: indeed
$$T^*T(v_j) = \begin{cases} \sigma_j^2 v_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases} \qquad\text{and}\qquad T^*(w_j) = \begin{cases} \sigma_j v_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases}$$
3. If $A \in M_{m\times n}(F)$ with $\operatorname{rank} A = r$, then $A = P\Sigma Q^*$ where
$$\Sigma = [L_A]^\gamma_\beta = \begin{pmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O \\ O & O \end{pmatrix}, \qquad P = (w_1 \cdots w_m), \qquad Q = (v_1 \cdots v_n)$$
Since the columns of $P, Q$ are orthonormal, these matrices are unitary.
Definition 2.65. The numbers $\sigma_1, \ldots, \sigma_r$ are the singular values of $T$. If $T$ is not of maximum rank, we have additional zero singular values $\sigma_{r+1} = \cdots = \sigma_{\min(m,n)} = 0$.
Freedom of Choice: While the singular values are uniquely determined, there is often significant freedom regarding the bases $\beta$ and $\gamma$, particularly if any eigenspace of $T^*T$ has dimension $\ge 2$.
Special Case (Spectral Theorem): If $V = W$ and $T$ is normal/self-adjoint, we may choose $\beta$ to be an eigenbasis of $T$; then $\sigma_j$ is the modulus of the corresponding eigenvalue (see Exercise 7).
Rank-one decomposition: If we write $g_j : V \to F$ for the linear map $g_j : v \mapsto \langle v, v_j\rangle$ (recall Riesz's Theorem), then the singular value decomposition says
$$T = \sum_{j=1}^r \sigma_j w_j g_j, \qquad\text{that is}\qquad T(v) = \sum_{j=1}^r \sigma_j \langle v, v_j\rangle w_j$$
thus rewriting $T$ as a linear combination of rank-one maps ($w_j g_j : V \to W$).
For matrices, $g_j$ is simply multiplication by the row vector $v_j^*$ and we may write²
$$A = \sum_{j=1}^r \sigma_j w_j v_j^*$$
Example (2.63 cont). We apply the method in the Theorem to $A = \begin{pmatrix} 3 & 1 \\ 2 & -2 \\ 1 & 3 \end{pmatrix}$.
The symmetric(!) matrix $A^TA = \begin{pmatrix} 14 & 2 \\ 2 & 14 \end{pmatrix}$ has eigenvalues $\sigma_1^2 = 16$, $\sigma_2^2 = 12$ and orthonormal eigenvectors $v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}$, $v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}$. The singular values are therefore $\sigma_1 = 4$, $\sigma_2 = 2\sqrt{3}$. Now compute
$$w_1 = \frac{1}{\sigma_1}Av_1 = \frac{1}{4\sqrt{2}}A\begin{pmatrix}1\\1\end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\1\end{pmatrix}, \qquad w_2 = \frac{1}{\sigma_2}Av_2 = \frac{1}{2\sqrt{6}}A\begin{pmatrix}1\\-1\end{pmatrix} = \frac{1}{\sqrt{6}}\begin{pmatrix}1\\2\\-1\end{pmatrix}$$
and observe that these are orthonormal. Finally choose $w_3 = \frac{1}{\sqrt{3}}\begin{pmatrix}1\\-1\\-1\end{pmatrix}$ to complete the orthonormal basis $\gamma$ of $\mathbb{R}^3$. A singular value decomposition is therefore
$$A = \begin{pmatrix} 3 & 1 \\ 2 & -2 \\ 1 & 3 \end{pmatrix} = P\Sigma Q^* = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{3}} \end{pmatrix}\begin{pmatrix} 4 & 0 \\ 0 & 2\sqrt{3} \\ 0 & 0 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}$$
By expanding the decomposition, $A$ is expressed as the sum of rank-one matrices:
$$\sigma_1 w_1 v_1^T + \sigma_2 w_2 v_2^T = 4\begin{pmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} + 2\sqrt{3}\begin{pmatrix} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 0 & 0 \\ 2 & 2 \end{pmatrix} + \begin{pmatrix} 1 & -1 \\ 2 & -2 \\ -1 & 1 \end{pmatrix}$$
² Since $\beta$ is orthonormal, it is common to write $v_j^*$ for the map $g_j = \langle \cdot, v_j\rangle$ in general contexts. To those familiar with the dual space $V^* = \mathcal{L}(V, F)$, the set $\{g_1, \ldots, g_n\} = \{v_1^*, \ldots, v_n^*\}$ is the dual basis to $\beta$. In this course $v_j^*$ will only ever mean the conjugate-transpose of a column vector in $F^n$. This discussion is part of why physicists write inner products differently!
Proof. 1. Since $T^*T$ is self-adjoint, the spectral theorem says it has an orthonormal basis of eigenvectors $\beta = \{v_1, \ldots, v_n\}$. If $T^*T(v_j) = \lambda_j v_j$, then
$$\langle T(v_j), T(v_k)\rangle = \langle T^*T(v_j), v_k\rangle = \lambda_j\langle v_j, v_k\rangle = \begin{cases} \lambda_j & \text{if } j = k \\ 0 & \text{if } j \neq k\end{cases} \tag{$*$}$$
whence every eigenvalue is a non-negative real number: $\lambda_j = \|T(v_j)\|^2 \ge 0$.
Since $\operatorname{rank} T^*T = \operatorname{rank} T = r$ (Exercise 8), exactly $r$ eigenvalues are non-zero; by reordering basis vectors if necessary, we may assume
$$\lambda_1 \ge \cdots \ge \lambda_r > 0$$
If $j \le r$, define $\sigma_j := \sqrt{\lambda_j} > 0$ and $w_j := \frac{1}{\sigma_j}T(v_j)$; then the set $\{w_1, \ldots, w_r\}$ is orthonormal ($*$). If necessary, extend this to an orthonormal basis $\gamma$ of $W$.
2. If orthonormal bases $\beta$ and $\gamma$ exist such that $[T]^\gamma_\beta = \left(\begin{smallmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O \\ O & O\end{smallmatrix}\right)$, then $[T^*]^\beta_\gamma$ is essentially the same matrix, just with its dimensions reversed:
$$[T^*]^\beta_\gamma = \begin{pmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O \\ O & O\end{pmatrix} \implies [T^*T]_\beta = \begin{pmatrix} \operatorname{diag}(\sigma_1^2, \ldots, \sigma_r^2) & O \\ O & O\end{pmatrix}$$
whence $T^*$ and $T^*T$ are as claimed.
3. This is merely part 1 in the context of $T = L_A \in \mathcal{L}(F^n, F^m)$. The orthonormal bases $\beta, \gamma$ consist of column vectors and so the (change of co-ordinate) matrices $P, Q$ are unitary.
Examples 2.66. 1. The matrix $A = \begin{pmatrix} 2 & 3 \\ 0 & 2 \end{pmatrix}$ has $A^TA = \begin{pmatrix} 4 & 6 \\ 6 & 13 \end{pmatrix}$ with eigenvalues $\sigma_1^2 = 16$ and $\sigma_2^2 = 1$ and orthonormal eigenbasis
$$\beta = \left\{\frac{1}{\sqrt{5}}\begin{pmatrix}1\\2\end{pmatrix},\ \frac{1}{\sqrt{5}}\begin{pmatrix}-2\\1\end{pmatrix}\right\}$$
The singular values are therefore $\sigma_1 = 4$ and $\sigma_2 = 1$, from which we obtain
$$\gamma = \left\{\frac{1}{\sigma_1}Av_1,\ \frac{1}{\sigma_2}Av_2\right\} = \left\{\frac{1}{\sqrt{5}}\begin{pmatrix}2\\1\end{pmatrix},\ \frac{1}{\sqrt{5}}\begin{pmatrix}-1\\2\end{pmatrix}\right\}$$
and the decomposition
$$A = P\Sigma Q^* = \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{pmatrix}\begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{pmatrix}$$
Multiplying out, we may write $A$ as a sum of rank-one matrices
$$A = \frac{4}{5}\begin{pmatrix}2\\1\end{pmatrix}\begin{pmatrix}1 & 2\end{pmatrix} + \frac{1}{5}\begin{pmatrix}-1\\2\end{pmatrix}\begin{pmatrix}-2 & 1\end{pmatrix} = \frac{4}{5}\begin{pmatrix}2 & 4 \\ 1 & 2\end{pmatrix} + \frac{1}{5}\begin{pmatrix}2 & -1 \\ -4 & 2\end{pmatrix}$$
(A numerical check of this decomposition with numpy is sketched after these examples.)
2. The decomposition can be very messy to find in non-matrix situations. Here is a classic example where we simply observe the structure directly.
The $L^2$ inner product $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$ on $P_2(\mathbb{R})$ and $P_1(\mathbb{R})$ admits orthonormal bases
$$\beta = \left\{\sqrt{5}(6x^2 - 6x + 1),\ \sqrt{3}(2x - 1),\ 1\right\}, \qquad \gamma = \left\{\sqrt{3}(2x - 1),\ 1\right\}$$
Let $T = \frac{d}{dx}$ be the derivative operator. The matrix of $T$ is already in the required form!
$$[T]^\gamma_\beta = \begin{pmatrix} 2\sqrt{15} & 0 & 0 \\ 0 & 2\sqrt{3} & 0 \end{pmatrix}$$
thus $\beta, \gamma$ are suitable bases and the singular values of $T$ are $\sigma_1 = 2\sqrt{15}$ and $\sigma_2 = 2\sqrt{3}$.
Since $\beta, \gamma$ are orthonormal, we could have used the adjoint method to evaluate this directly:
$$[T^*T]_\beta = ([T]^\gamma_\beta)^T[T]^\gamma_\beta = \begin{pmatrix} 60 & 0 & 0 \\ 0 & 12 & 0 \\ 0 & 0 & 0 \end{pmatrix} \implies \sigma_1^2 = 60,\ \sigma_2^2 = 12$$
Up to sign, $\{[v_1]_\beta, [v_2]_\beta, [v_3]_\beta\}$ is therefore forced to be the standard ordered basis of $\mathbb{R}^3$, confirming that $\beta$ was the correct basis of $P_2(\mathbb{R})$ all along!
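The promised check (not part of the original notes): NumPy's SVD of the matrix from Example 2.66.1 recovers the singular values computed by hand.

```python
import numpy as np

A = np.array([[2., 3.], [0., 2.]])
P, sigma, Qstar = np.linalg.svd(A)          # A = P @ diag(sigma) @ Qstar

print(sigma)                                 # [4., 1.]
assert np.allclose(P @ np.diag(sigma) @ Qstar, A)

# The columns of P and the rows of Qstar may differ from the hand computation
# by signs (or by a rotation within a repeated singular value); the singular
# values themselves are unique.
```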
The Pseudoinverse
The singular value decomposition of a map $T \in \mathcal{L}(V, W)$ gives rise to a natural map from $W$ back to $V$. This map behaves somewhat like an inverse even when the operator is non-invertible!
Definition 2.67. Given the singular value decomposition of a rank $r$ map $T \in \mathcal{L}(V, W)$, the pseudoinverse of $T$ is the linear map $T^\dagger \in \mathcal{L}(W, V)$ defined by
$$T^\dagger(w_j) = \begin{cases} \frac{1}{\sigma_j}v_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases}$$
Restricted to $\operatorname{Span}\{v_1, \ldots, v_r\}$ and $\operatorname{Span}\{w_1, \ldots, w_r\}$, the pseudoinverse really does invert $T$:
$$T^\dagger T(v_j) = \begin{cases} v_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases} \qquad\qquad TT^\dagger(w_j) = \begin{cases} w_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases}$$
(Diagram: $V = \underbrace{\operatorname{Span}\{v_1, \ldots, v_r\}}_{N(T)^\perp = R(T^\dagger)} \oplus \underbrace{\operatorname{Span}\{v_{r+1}, \ldots, v_n\}}_{N(T) = R(T^\dagger)^\perp}$ and $W = \underbrace{\operatorname{Span}\{w_1, \ldots, w_r\}}_{R(T) = N(T^\dagger)^\perp} \oplus \underbrace{\operatorname{Span}\{w_{r+1}, \ldots, w_m\}}_{R(T)^\perp = N(T^\dagger)}$; $T$ and $T^\dagger$ restrict to mutually inverse bijections between the first summands, while $T$ sends $N(T)$ to $\{0_W\}$ and $T^\dagger$ sends $N(T^\dagger)$ to $\{0_V\}$.)
Otherwise said, the combinations are orthogonal projections: $T^\dagger T = \pi_{N(T)^\perp}$ and $TT^\dagger = \pi_{R(T)}$.
Given the singular value decomposition of a matrix $A = P\Sigma Q^*$, its pseudoinverse is the matrix of $(L_A)^\dagger$, namely
$$A^\dagger = \sum_{j=1}^r \frac{1}{\sigma_j}v_j w_j^* = Q\Sigma^\dagger P^* \qquad\text{where}\qquad \Sigma^\dagger = \begin{pmatrix} \operatorname{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}) & O \\ O & O\end{pmatrix}$$
Examples 2.68. 1. Again continuing Example 2.63, $A = \begin{pmatrix} 3 & 1 \\ 2 & -2 \\ 1 & 3 \end{pmatrix}$ has pseudoinverse
$$A^\dagger = \frac{1}{\sigma_1}v_1 w_1^* + \frac{1}{\sigma_2}v_2 w_2^* = \frac{1}{4\cdot 2}\begin{pmatrix}1\\1\end{pmatrix}\begin{pmatrix}1 & 0 & 1\end{pmatrix} + \frac{1}{2\sqrt{3}\cdot 2\sqrt{3}}\begin{pmatrix}1\\-1\end{pmatrix}\begin{pmatrix}1 & 2 & -1\end{pmatrix}$$
$$= \frac{1}{8}\begin{pmatrix}1 & 0 & 1 \\ 1 & 0 & 1\end{pmatrix} + \frac{1}{12}\begin{pmatrix}1 & 2 & -1 \\ -1 & -2 & 1\end{pmatrix} = \frac{1}{24}\begin{pmatrix}5 & 4 & 1 \\ 1 & -4 & 5\end{pmatrix}$$
which is exactly what we would have found by computing $A^\dagger = Q\Sigma^\dagger P^*$. Observe that
$$A^\dagger A = \begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix} \qquad\text{and}\qquad AA^\dagger = \frac{1}{3}\begin{pmatrix}2 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 2\end{pmatrix}$$
are the orthogonal projection matrices onto the spaces $N(A)^\perp = \operatorname{Span}\{v_1, v_2\} = \mathbb{R}^2$ and $R(A) = \operatorname{Span}\{w_1, w_2\} \le \mathbb{R}^3$ respectively. Both spaces have dimension 2, since $\operatorname{rank} A = 2$.
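A hedged check with NumPy (not from the notes), using the matrix as reconstructed in Example 2.68.1:

```python
import numpy as np

# numpy's pinv agrees with the hand computation, and the two products are the
# projections described above.
A = np.array([[3., 1.], [2., -2.], [1., 3.]])
A_dagger = np.linalg.pinv(A)

print(np.round(24 * A_dagger))               # [[ 5.  4.  1.], [ 1. -4.  5.]]
assert np.allclose(A_dagger @ A, np.eye(2))  # projection onto N(A)^perp = R^2

P_range = A @ A_dagger
assert np.allclose(P_range @ P_range, P_range)   # A A^dagger is idempotent...
assert np.allclose(P_range, P_range.T)           # ...and symmetric: the orthogonal projection onto R(A)
```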
2. The pseudoinverse of $T = \frac{d}{dx} : P_2(\mathbb{R}) \to P_1(\mathbb{R})$, as seen in Example 2.66.2, maps
$$T^\dagger\big(\sqrt{3}(2x - 1)\big) = \frac{1}{2\sqrt{15}}\sqrt{5}(6x^2 - 6x + 1) = \frac{1}{2\sqrt{3}}(6x^2 - 6x + 1), \qquad T^\dagger(1) = \frac{1}{2\sqrt{3}}\sqrt{3}(2x - 1) = x - \frac{1}{2}$$
$$\implies T^\dagger(a + bx) = T^\dagger\left(\Big(a + \frac{b}{2}\Big) + \frac{b}{2\sqrt{3}}\sqrt{3}(2x - 1)\right) = \Big(a + \frac{b}{2}\Big)\Big(x - \frac{1}{2}\Big) + \frac{b}{12}(6x^2 - 6x + 1) = \frac{b}{2}x^2 + ax - \frac{a}{2} - \frac{b}{6}$$
The pseudoinverse of 'differentiation' therefore returns a particular choice of anti-derivative, namely the unique anti-derivative of $a + bx$ lying in $\operatorname{Span}\{\sqrt{5}(6x^2 - 6x + 1), \sqrt{3}(2x - 1)\}$.
Exercises 2.7 1. Find the ingredients $\beta$, $\gamma$ and the singular values for each of the following:
(a) $T \in \mathcal{L}(\mathbb{R}^2, \mathbb{R}^3)$ where $T\binom{x}{y} = \begin{pmatrix}x\\x+y\\x-y\end{pmatrix}$
(b) $T : P_2(\mathbb{R}) \to P_1(\mathbb{R})$ and $T(f) = f''$, where $\langle f, g\rangle := \int_0^1 f(x)g(x)\,dx$
(c) $V = W = \operatorname{Span}\{1, \sin x, \cos x\}$ and $\langle f, g\rangle = \int_0^{2\pi} f(x)g(x)\,dx$, with $T(f) = f' + 2f$
2. Find a singular value decomposition of each of the matrices:
(a) $\begin{pmatrix}1 & 1 \\ 1 & 1 \\ 1 & 1\end{pmatrix}$  (b) $\begin{pmatrix}1 & 0 & 1 \\ 1 & 0 & 1\end{pmatrix}$  (c) $\begin{pmatrix}1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1\end{pmatrix}$
3. Find an explicit formula for $T^\dagger$ for each map in Exercise 1.
4. Find the pseudoinverse of each of the matrices in Exercise 2.
5. Suppose $A = P\Sigma Q^*$ is a singular value decomposition.
(a) Describe a singular value decomposition of $A^*$.
(b) Explain why $A^\dagger = Q\Sigma^\dagger P^*$ isn't a singular value decomposition of $A^\dagger$; what would be the correct decomposition? (Hint: what is wrong with $\Sigma^\dagger$?)
6. Suppose $T : V \to W$ is written according to the singular value theorem. Prove that $\gamma$ is a basis of eigenvectors of $TT^*$ with the same non-zero eigenvalues as $T^*T$, including repetitions.
7. (a) Suppose $T \in \mathcal{L}(V)$ is normal. Prove that each $v_j$ in the singular value theorem may be chosen to be an eigenvector of $T$ and that $\sigma_j$ is the modulus of the corresponding eigenvalue.
(b) Let $A = \begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}$. Show that any orthonormal basis $\beta$ of $\mathbb{R}^2$ satisfies the singular value theorem. What is $\gamma$ here? What is it about the eigenvalues of $A$ that make this possible?
(Even when $T$ is self-adjoint, the vectors in $\beta$ need not also be eigenvectors of $T$!)
8. In the proof of the singular value theorem we claimed that $\operatorname{rank} T^*T = \operatorname{rank} T$. Verify this by checking explicitly that $N(T^*T) = N(T)$.
(This is circular logic if you use the decomposition, so you must do without!)
9. Let $V, W$ be finite-dimensional inner product spaces and $T \in \mathcal{L}(V, W)$. Prove:
(a) $T^*TT^\dagger = T^\dagger TT^* = T^*$.
(Hint: evaluate on the basis $\gamma = \{w_1, \ldots, w_m\}$ in the singular value theorem.)
(b) If $T$ is injective, then $T^*T$ is invertible and $T^\dagger = (T^*T)^{-1}T^*$.
(c) If $T$ is surjective, then $TT^*$ is invertible and $T^\dagger = T^*(TT^*)^{-1}$.
10. Consider the equation $T(x) = b$, where $T$ is a linear map between finite-dimensional inner product spaces. A least-squares solution is a vector $x$ which minimizes $\|T(x) - b\|$.
(a) Prove that $x_0 = T^\dagger(b)$ is a least-squares solution and that any other has the form $x_0 + n$ for some $n \in N(T)$.
(Hint: Theorem 2.61 says that $x_0$ is a least-squares solution if and only if $T(x_0) = \pi_{R(T)}(b)$.)
(b) Prove that $x_0 = T^\dagger(b)$ has smaller norm than any other least-squares solution.
(c) If $T$ is injective, prove that $x_0 = T^\dagger(b)$ is the unique least-squares solution.
11. Find the minimal-norm solution to the first system, and the least-squares solution to the second:
$$\begin{cases} 3x + 2y + z = 9 \\ x - 2y + 3z = 3 \end{cases} \qquad\qquad \begin{cases} 3x + y = 1 \\ 2x - 2y = 0 \\ x + 3y = 0 \end{cases}$$
Linear Regression (non-examinable)
Given a data set $\{(t_j, y_j) : 1 \le j \le m\}$, we may employ the least-squares method to find a best-fitting line $y = c_0 + c_1 t$; often used to predict $y$ given a value of $t$.
The trick is to minimize the sum of the squares of the vertical deviations of the line from the data set:
$$\sum_{j=1}^m \big(y_j - c_0 - c_1 t_j\big)^2 = \|\mathbf{y} - A\mathbf{x}\|^2 \qquad\text{where}\qquad A = \begin{pmatrix} t_1 & 1 \\ \vdots & \vdots \\ t_m & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} c_1 \\ c_0 \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}$$
With the indicated notation, we recognize this as a least-squares problem. Indeed, if there are at least two distinct $t$-values in the data set, then $\operatorname{rank} A = 2$ is maximal and we have a unique best-fitting line with coefficients given by
$$\begin{pmatrix} c_1 \\ c_0 \end{pmatrix} = A^\dagger\mathbf{y} = (A^TA)^{-1}A^T\mathbf{y}$$
Example 2.69. Given the data set $\{(0, 1), (1, 1), (2, 0), (3, 2), (4, 2)\}$, we compute
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 4 & 1 \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 2 \\ 2 \end{pmatrix} \implies \mathbf{x}_0 = \begin{pmatrix} c_1 \\ c_0 \end{pmatrix} = (A^TA)^{-1}A^T\mathbf{y} = \begin{pmatrix} 30 & 10 \\ 10 & 5 \end{pmatrix}^{-1}\begin{pmatrix} 15 \\ 6 \end{pmatrix} = \frac{3}{10}\begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
The regression line therefore has equation $y = \frac{3}{10}(t + 2)$.
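A short numerical sketch (not from the notes) recovering the regression line of Example 2.69:

```python
import numpy as np

t = np.array([0., 1., 2., 3., 4.])
y = np.array([1., 1., 0., 2., 2.])
A = np.column_stack([t, np.ones_like(t)])        # rows (t_j, 1)

c1, c0 = np.linalg.lstsq(A, y, rcond=None)[0]    # least-squares solution of A x = y
print(c1, c0)                                    # 0.3 and 0.6, i.e. y = (3/10)(t + 2)

# Equivalent formulations:
assert np.allclose([c1, c0], np.linalg.pinv(A) @ y)             # x0 = A^dagger y
assert np.allclose([c1, c0], np.linalg.solve(A.T @ A, A.T @ y)) # normal equations
```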
The process can be applied more generally to approximate using other functions. To find the best-fitting quadratic polynomial $y = c_0 + c_1 t + c_2 t^2$, we'd instead work with
$$A = \begin{pmatrix} t_1^2 & t_1 & 1 \\ \vdots & \vdots & \vdots \\ t_m^2 & t_m & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} c_2 \\ c_1 \\ c_0 \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} \implies \sum_{j=1}^m \big(y_j - c_0 - c_1 t_j - c_2 t_j^2\big)^2 = \|\mathbf{y} - A\mathbf{x}\|^2$$
Provided we have at least three distinct values $t_1, t_2, t_3$, the matrix $A$ is guaranteed to have rank 3 and there will be a best-fitting least-squares quadratic: in this case
$$y = \frac{1}{70}(15t^2 - 39t + 72)$$
This curve and the best-fitting straight line are shown below.
(Figure: the best-fitting least-squares line and quadratic for the data of Example 2.69.)
Optional Problems (Use a computer to invert any $3 \times 3$ matrices!)
1. Check the calculation for the best-fitting least-squares quadratic in Example 2.69.
2. Find the best-fitting least-squares linear and quadratic approximations to the data set
$$\{(1, 2), (3, 4), (5, 7), (7, 9), (9, 12)\}$$
3. Suppose a data set $\{(t_j, y_j) : 1 \le j \le m\}$ has unique regression line $y = ct + d$.
(a) Show that the equations $A^TA\mathbf{x}_0 = A^T\mathbf{y}$ can be written in matrix form
$$\begin{pmatrix} \sum t_j^2 & \sum t_j \\ \sum t_j & m \end{pmatrix}\begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} \sum t_jy_j \\ \sum y_j \end{pmatrix}$$
(b) Recover the standard expressions from statistics:
$$c = \frac{\operatorname{Cov}(t, y)}{\sigma_t^2} \qquad\text{and}\qquad d = \bar{y} - c\bar{t}$$
where
$$\bar{t} = \frac{1}{m}\sum_{j=1}^m t_j \ \text{ and }\ \bar{y} = \frac{1}{m}\sum_{j=1}^m y_j \ \text{ are the means (averages)},$$
$$\sigma_t^2 = \frac{1}{m}\sum_{j=1}^m (t_j - \bar{t})^2 \ \text{ is the variance}, \qquad \operatorname{Cov}(t, y) = \frac{1}{m}\sum_{j=1}^m (t_j - \bar{t})(y_j - \bar{y}) \ \text{ is the covariance}.$$
2.8 Bilinear and Quadratic Forms
In this section we slightly generalize the idea of an inner product. Throughout, V is a vector space
over a field F; it need not be an inner product space and F can be any field (not just R or C).
Definition 2.70. A bilinear form $B : V \times V \to F$ is linear in each entry: $\forall v, x, y \in V$, $\lambda \in F$,
$$B(\lambda x + y, v) = \lambda B(x, v) + B(y, v), \qquad B(v, \lambda x + y) = \lambda B(v, x) + B(v, y)$$
Additionally, $B$ is symmetric if $\forall x, y \in V$, $B(x, y) = B(y, x)$.
Examples 2.71. 1. If $V$ is a real inner product space, then the inner product $\langle\ ,\ \rangle$ is a symmetric bilinear form. Note that a complex inner product is not bilinear!
2. If $A \in M_n(F)$, then $B(\mathbf{x}, \mathbf{y}) := \mathbf{x}^TA\mathbf{y}$ is a bilinear form on $F^n$. For instance, on $\mathbb{R}^2$,
$$B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}1 & 2 \\ 2 & 0\end{pmatrix}\mathbf{y} = x_1y_1 + 2x_1y_2 + 2x_2y_1$$
defines a symmetric bilinear form, though not an inner product since it isn't positive definite; for example $B(\mathbf{j}, \mathbf{j}) = 0$.
As seen above, we often make use of a matrix.
Definition 2.72. Let $B$ be a bilinear form on a finite-dimensional space with basis $\epsilon = \{v_1, \ldots, v_n\}$. The matrix of $B$ with respect to $\epsilon$ is the matrix $[B]_\epsilon = A \in M_n(F)$ with $ij$th entry
$$A_{ij} = B(v_i, v_j)$$
Given $x, y \in V$, compute their co-ordinate vectors $[x]_\epsilon$, $[y]_\epsilon$ with respect to $\epsilon$; then
$$B(x, y) = [x]_\epsilon^T A [y]_\epsilon$$
The set of bilinear forms on $V$ is therefore in bijective correspondence with $M_n(F)$. Moreover,
$$B(y, x) = [y]_\epsilon^T A [x]_\epsilon = \big([y]_\epsilon^T A [x]_\epsilon\big)^T = [x]_\epsilon^T A^T [y]_\epsilon$$
Finally, if $\beta$ is another basis of $V$, then an appeal to the change of co-ordinate matrix $Q^\epsilon_\beta$ yields
$$B(x, y) = [x]_\epsilon^T A [y]_\epsilon = \big(Q^\epsilon_\beta[x]_\beta\big)^T A \big(Q^\epsilon_\beta[y]_\beta\big) = [x]_\beta^T (Q^\epsilon_\beta)^T A Q^\epsilon_\beta [y]_\beta \implies [B]_\beta = (Q^\epsilon_\beta)^T[B]_\epsilon Q^\epsilon_\beta$$
To summarize:
Lemma 2.73. Let $B$ be a bilinear form on a finite-dimensional vector space.
1. If $A$ is the matrix of $B$ with respect to some basis, then every other matrix of $B$ has the form $Q^TAQ$ for some invertible $Q$.
2. $B$ is symmetric if and only if its matrix with respect to any (and all) bases is symmetric.
Naturally, the simplest situation is when the matrix of $B$ is diagonal. . .
Examples 2.74. 1. Example 2.71.2 can be written
$$B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}1 & 2 \\ 2 & 0\end{pmatrix}\mathbf{y} = x_1y_1 + 2x_1y_2 + 2x_2y_1 = (x_1 + 2x_2)(y_1 + 2y_2) - 4x_2y_2$$
$$= \begin{pmatrix}x_1 + 2x_2 & x_2\end{pmatrix}\begin{pmatrix}1 & 0 \\ 0 & -4\end{pmatrix}\begin{pmatrix}y_1 + 2y_2 \\ y_2\end{pmatrix} = \mathbf{x}^T\begin{pmatrix}1 & 2 \\ 0 & 1\end{pmatrix}^T\begin{pmatrix}1 & 0 \\ 0 & -4\end{pmatrix}\begin{pmatrix}1 & 2 \\ 0 & 1\end{pmatrix}\mathbf{y}$$
If $\epsilon$ is the standard basis, then $[B]_\beta = \begin{pmatrix}1 & 0 \\ 0 & -4\end{pmatrix}$ where $Q^\beta_\epsilon = \begin{pmatrix}1 & 2 \\ 0 & 1\end{pmatrix}$. It follows that $Q^\epsilon_\beta = \begin{pmatrix}1 & -2 \\ 0 & 1\end{pmatrix}$, from which $\beta = \left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}-2\\1\end{pmatrix}\right\}$ is a diagonalizing basis.
2. In general, we may perform a sequence of simultaneous row and column operations to diagonalize any symmetric $B$; we require only elementary matrices $E^{(\lambda)}_{ij}$ of type III.³ For instance:
$$B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^TA\mathbf{y} = \mathbf{x}^T\begin{pmatrix}1 & -2 & 3 \\ -2 & 0 & 4 \\ 3 & 4 & -1\end{pmatrix}\mathbf{y} \qquad (A = [B]_\epsilon \text{ with respect to the standard basis})$$
$$E^{(2)}_{21}AE^{(2)}_{12} = \begin{pmatrix}1 & 0 & 3 \\ 0 & -4 & 10 \\ 3 & 10 & -1\end{pmatrix} \qquad \text{(add twice row 1 to row 2, columns similarly)}$$
$$E^{(-3)}_{31}E^{(2)}_{21}AE^{(2)}_{12}E^{(-3)}_{13} = \begin{pmatrix}1 & 0 & 0 \\ 0 & -4 & 10 \\ 0 & 10 & -10\end{pmatrix} \qquad \text{(subtract 3 times row 1 from row 3, etc.)}$$
$$E^{(1)}_{23}E^{(-3)}_{31}E^{(2)}_{21}AE^{(2)}_{12}E^{(-3)}_{13}E^{(1)}_{32} = \begin{pmatrix}1 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & -10\end{pmatrix} \qquad \text{(add row 3 to row 2, etc.)}$$
If $\beta$ is the diagonalizing basis, then the change of co-ordinate matrix is
$$Q^\epsilon_\beta = E^{(2)}_{12}E^{(-3)}_{13}E^{(1)}_{32} = \begin{pmatrix}1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1\end{pmatrix} = \begin{pmatrix}1 & -1 & -3 \\ 0 & 1 & 0 \\ 0 & 1 & 1\end{pmatrix}$$
from which $\beta = \left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}-1\\1\\1\end{pmatrix}, \begin{pmatrix}-3\\0\\1\end{pmatrix}\right\}$. If you're having trouble believing this, invert the change of co-ordinate matrix and check that
$$B(\mathbf{x}, \mathbf{y}) = (x_1 - 2x_2 + 3x_3)(y_1 - 2y_2 + 3y_3) + 6x_2y_2 - 10(x_3 - x_2)(y_3 - y_2)$$
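The following is a rough code sketch (not part of the original notes) of this simultaneous row/column algorithm. It assumes every pivot encountered is non-zero, as in the example above; a complete implementation would also need to handle zero pivots (and the characteristic-two issue discussed below).

```python
import numpy as np

def congruence_diagonalize(A):
    """Return (D, Q) with Q.T @ A @ Q = D diagonal, for symmetric A (non-zero pivots assumed)."""
    n = A.shape[0]
    Q = np.eye(n)
    D = A.astype(float).copy()
    for i in range(n):
        for j in range(i + 1, n):
            if D[j, i] != 0:
                lam = -D[j, i] / D[i, i]
                D[j, :] += lam * D[i, :]   # row operation (left multiplication)
                D[:, j] += lam * D[:, i]   # matching column operation (right multiplication)
                Q[:, j] += lam * Q[:, i]   # accumulate Q as a product of E^(lam)_{ij}
    return D, Q

A = np.array([[1., -2., 3.], [-2., 0., 4.], [3., 4., -1.]])
D, Q = congruence_diagonalize(A)
print(np.round(D))                         # diagonal, with the same signature as diag(1, 6, -10)
assert np.allclose(Q.T @ A @ Q, D)
```

Note that the diagonal produced depends on the operations chosen; only the signature (see Sylvester's Law below) is forced.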
Warning! If $F = \mathbb{R}$ then every symmetric $B$ may be diagonalized by an orthonormal basis (for the usual dot product on $\mathbb{R}^n$). It is very unlikely that our algorithm will produce such! The algorithm has two main advantages over the spectral theorem: it is typically faster and it applies to vector spaces over any field. As a disadvantage, it is highly non-unique.

³ Recall that $E^{(\lambda)}_{ij}$ is the identity matrix with an additional $\lambda$ in the $ij$th entry.
As a column operation (right-multiplication), $A \mapsto AE^{(\lambda)}_{ij}$ adds $\lambda$ times the $i$th column to the $j$th.
As a row operation (left-multiplication), $A \mapsto E^{(\lambda)}_{ji}A = (E^{(\lambda)}_{ij})^TA$ adds $\lambda$ times the $i$th row to the $j$th.
Example 2.75. We diagonalize $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}1 & 6 \\ 6 & 3\end{pmatrix}\mathbf{y} = x_1y_1 + 6x_1y_2 + 6x_2y_1 + 3x_2y_2$ in three ways.
• $\begin{pmatrix}1 & 0 \\ -6 & 1\end{pmatrix}\begin{pmatrix}1 & 6 \\ 6 & 3\end{pmatrix}\begin{pmatrix}1 & -6 \\ 0 & 1\end{pmatrix} = \begin{pmatrix}1 & 0 \\ 0 & -33\end{pmatrix} = [B]_\beta$ where $\beta = \left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}-6\\1\end{pmatrix}\right\}$. This corresponds to
$$B(\mathbf{x}, \mathbf{y}) = (x_1 + 6x_2)(y_1 + 6y_2) - 33x_2y_2$$
• $\begin{pmatrix}1 & -2 \\ 0 & 1\end{pmatrix}\begin{pmatrix}1 & 6 \\ 6 & 3\end{pmatrix}\begin{pmatrix}1 & 0 \\ -2 & 1\end{pmatrix} = \begin{pmatrix}-11 & 0 \\ 0 & 3\end{pmatrix} = [B]_\gamma$ where $\gamma = \left\{\begin{pmatrix}1\\-2\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right\}$. This corresponds to
$$B(\mathbf{x}, \mathbf{y}) = -11x_1y_1 + 3(2x_1 + x_2)(2y_1 + y_2)$$
• If $F = \mathbb{R}$, we may apply the spectral theorem to see that $[B]_\eta = \begin{pmatrix}2 + \sqrt{37} & 0 \\ 0 & 2 - \sqrt{37}\end{pmatrix}$ is diagonal with respect to the orthonormal basis $\eta$ obtained by normalizing $\left\{\begin{pmatrix}6\\1+\sqrt{37}\end{pmatrix}, \begin{pmatrix}-1-\sqrt{37}\\6\end{pmatrix}\right\}$. The expression for $B(\mathbf{x}, \mathbf{y})$ in these co-ordinates is disgusting, so we omit it; that $B$ can be diagonalized orthogonally doesn't mean it should be!
Theorem 2.76. Suppose $B$ is a bilinear form on a finite-dimensional space $V$ over $F$.
1. If $B$ is diagonalizable, then it is symmetric.
2. If $B$ is symmetric and $F$ does not have characteristic two (see aside), then $B$ is diagonalizable.
Proof. 1. If $B$ is diagonalizable, $\exists\beta$ such that $[B]_\beta$ is diagonal and thus symmetric.
2. Suppose $B$ is non-zero (otherwise the result is trivial). We prove by induction on $n = \dim V$.
If $n = 1$, the result is trivial: $B(x, y) = axy$ for some $a \in F$, and any $1 \times 1$ matrix is already diagonal.
Fix $n \in \mathbb{N}$ and assume that every non-zero symmetric bilinear form on a dimension-$n$ vector space over $F$ is diagonalizable. Let $\dim V = n + 1$. By the discussion below, $\exists x \in V$ such that $B(x, x) \neq 0$. Consider the linear map
$$T : V \to F : v \mapsto B(x, v)$$
Clearly $\operatorname{rank} T = 1 \implies \dim N(T) = n$. Moreover, $B$ is symmetric when restricted to $N(T)$; by the induction hypothesis there exists a basis $\beta$ of $N(T)$ such that $[B|_{N(T)}]_\beta$ is diagonal. But then $B$ is diagonal with respect to the basis $\beta \cup \{x\}$.
Aside: Characteristic two fields. This means $1 + 1 = 0$ in $F$; this holds, for instance, in the field $\mathbb{Z}_2 = \{0, 1\}$ of remainders modulo 2. We now see the importance of $\operatorname{char} F \neq 2$ to the above result.
The proof uses the existence of $x \in V$ such that $B(x, x) \neq 0$. If $B$ is non-zero, $\exists u, v$ such that $B(u, v) \neq 0$. If both $B(u, u) = 0 = B(v, v)$, then $x = u + v$ does the job whenever $\operatorname{char} F \neq 2$:
$$B(x, x) = B(u, v) + B(v, u) = 2B(u, v) \neq 0$$
To see that the requirement isn't idle, consider $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}\mathbf{y}$ on the finite vector space $\mathbb{Z}_2^2 = \left\{\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right\}$ over $\mathbb{Z}_2$. Every element of this space satisfies $B(\mathbf{x}, \mathbf{x}) = 0$! Perhaps surprisingly, the matrix of $B$ is identical with respect to any basis of $\mathbb{Z}_2^2$, whence $B$ is symmetric but non-diagonalizable.
In Example 2.75, notice what the three diagonal matrix representations have in common: each has exactly one positive and one negative diagonal entry. This is a general phenomenon:
Theorem 2.77 (Sylvester's Law of Inertia). Suppose $B$ is a symmetric bilinear form on a real vector space $V$ with diagonal matrix representation $\operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Then the number of entries $\lambda_j$ which are positive/negative/zero is independent of the diagonal representation.
Definition 2.78. The signature of a symmetric bilinear form $B$ is the triple $(n_+, n_-, n_0)$ representing how many positive, negative and zero terms are in any diagonal representation. Sylvester's Law says that the signature is an invariant of a symmetric bilinear form.
Positive-definiteness says that a real inner product on an $n$-dimensional space has signature $(n, 0, 0)$. Practitioners of relativity often work in Minkowski spacetime: $\mathbb{R}^4$ equipped with a signature $(1, 3, 0)$ bilinear form, typically
$$B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}c^2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1\end{pmatrix}\mathbf{y} = c^2x_1y_1 - x_2y_2 - x_3y_3 - x_4y_4$$
where $c$ is the speed of light. Vectors are time-, space-, or light-like depending on whether $B(\mathbf{x}, \mathbf{x})$ is positive, negative or zero. For instance $\mathbf{x} = 3c^{-1}\mathbf{e}_1 + 2\mathbf{e}_2 + 2\mathbf{e}_3 + \mathbf{e}_4$ is light-like.
Proof. For simplicity, let $V = \mathbb{R}^n$ and write $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^TA\mathbf{y}$ where $A$ is symmetric.
1. Define $\operatorname{rank} B := \operatorname{rank} A$ and observe this is independent of basis (exercises).
2. Let $\beta$, $\gamma$ be diagonalizing bases, ordered according to whether $B(v_j, v_j)$ is positive, negative or zero:
$$\beta = \{v_1, \ldots, v_p, v_{p+1}, \ldots, v_r, v_{r+1}, \ldots, v_n\}, \qquad \gamma = \{w_1, \ldots, w_q, w_{q+1}, \ldots, w_r, w_{r+1}, \ldots, w_n\}$$
Here $r = \operatorname{rank} B$ in accordance with part 1: our goal is to prove that $p = q$.
3. Assume $p < q$, define the matrix $C$ and check what follows:
$$C = \begin{pmatrix} v_1^TA \\ \vdots \\ v_p^TA \\ w_{q+1}^TA \\ \vdots \\ w_r^TA \end{pmatrix} \in M_{(r-q+p)\times n}(\mathbb{R})$$
(a) $\operatorname{rank} C \le r - q + p < r \implies \operatorname{null} C > n - r$, thus
$$\exists \mathbf{x} \in \mathbb{R}^n \text{ such that } C\mathbf{x} = 0 \text{ and } \mathbf{x} \notin \operatorname{Span}\{v_{r+1}, \ldots, v_n\}$$
(b) The first $p$ entries of $C\mathbf{x} = 0$ mean that $\mathbf{x} \in \operatorname{Span}\{v_{p+1}, \ldots, v_r, v_{r+1}, \ldots, v_n\}$ and so $B(\mathbf{x}, \mathbf{x}) < 0$. Note how we use part (a) to get a strict inequality here.
(c) Now write $\mathbf{x}$ with respect to $\gamma$: this time we see that $\mathbf{x} \in \operatorname{Span}\{w_1, \ldots, w_q, w_{r+1}, \ldots, w_n\}$, whence $B(\mathbf{x}, \mathbf{x}) \ge 0$: a contradiction.
Quadratic Forms & Diagonalizing Conics
Definition 2.79. To every symmetric bilinear form $B : V \times V \to F$ is associated a quadratic form
$$K : V \to F : x \mapsto B(x, x)$$
A function $K : V \to F$ is termed a quadratic form when such a symmetric bilinear form exists.
Examples 2.80. 1. If $B$ is a real inner product, then $K(v) = \langle v, v\rangle = \|v\|^2$ is the square of the norm.
2. Let $\dim V = n$ and $A$ be the matrix of $B$ with respect to a basis $\beta$. By the symmetry of $A$,
$$K(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = \sum_{i,j=1}^n x_iA_{ij}x_j = \sum_{1 \le i \le j \le n} \tilde{a}_{ij}x_ix_j \qquad\text{where}\qquad \tilde{a}_{ij} = \begin{cases} A_{ij} & \text{if } i = j \\ 2A_{ij} & \text{if } i \neq j\end{cases}$$
E.g., $K(\mathbf{x}) = 3x_1^2 + 4x_2^2 - 2x_1x_2$ corresponds to the bilinear form $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}3 & -1 \\ -1 & 4\end{pmatrix}\mathbf{y}$
As a fun application, we consider the diagonalization of conics in $\mathbb{R}^2$. The general non-zero conic has equation
$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0$$
where the first three terms comprise a quadratic form
$$K(\mathbf{x}) = ax^2 + 2bxy + cy^2 \qquad\longleftrightarrow\qquad B(\mathbf{v}, \mathbf{w}) = \mathbf{v}^T\begin{pmatrix}a & b \\ b & c\end{pmatrix}\mathbf{w}$$
If $\{v_1, v_2\}$ is a diagonalizing basis, then there exist scalars $\lambda_1, \lambda_2$ for which
$$K(t_1v_1 + t_2v_2) = \lambda_1t_1^2 + \lambda_2t_2^2$$
whence the general conic becomes
$$\lambda_1t_1^2 + \lambda_2t_2^2 + \mu_1t_1 + \mu_2t_2 = \eta, \qquad \lambda_1, \lambda_2, \mu_1, \mu_2, \eta \in \mathbb{R}$$
If $\lambda_i \neq 0$, we may complete the square via the linear transformation $s_j = t_j + \frac{\mu_j}{2\lambda_j}$. The canonical forms are then recovered:
• Parabola: $\lambda_1$ or $\lambda_2 = 0$ (but not both).
• Ellipse: $\lambda_1s_1^2 + \lambda_2s_2^2 = k \neq 0$ where $\lambda_1, \lambda_2, k$ have the same sign.
• Hyperbola: $\lambda_1s_1^2 + \lambda_2s_2^2 = k \neq 0$ where $\lambda_1, \lambda_2$ have opposite signs.
Since $B$ is symmetric, we may take $\{v_1, v_2\}$ to be an orthonormal basis of $\mathbb{R}^2$, whence any conic may be put in canonical form by applying only a rotation/reflection and translation (completing the square). Alternatively, we could diagonalize $K$ using our earlier algorithm; this additionally permits shear transforms. By Sylvester's Law, the diagonal entries will have the same number of $(+, -, 0)$ terms regardless of the method, so the canonical form will be unchanged.
Examples 2.81. 1. We describe and plot the conic with equation $7x^2 + 24xy = 144$.
The matrix of the associated bilinear form is $\begin{pmatrix}7 & 12 \\ 12 & 0\end{pmatrix}$, which has orthonormal eigenbasis
$$\beta = \{v_1, v_2\} = \left\{\frac{1}{5}\begin{pmatrix}4\\3\end{pmatrix}, \frac{1}{5}\begin{pmatrix}-3\\4\end{pmatrix}\right\}$$
with eigenvalues $(\lambda_1, \lambda_2) = (16, -9)$. In the rotated basis, this is the canonical hyperbola
$$16t_1^2 - 9t_2^2 = 144 \iff \frac{t_1^2}{3^2} - \frac{t_2^2}{4^2} = 1$$
which is easily plotted. (Figure: the hyperbola drawn against both the $xy$-axes and the rotated $t_1t_2$-axes along $v_1, v_2$.) In case this is too fast, use the change of co-ordinate matrix to compute directly:
$$Q^\epsilon_\beta = \frac{1}{5}\begin{pmatrix}4 & -3 \\ 3 & 4\end{pmatrix} \implies \begin{pmatrix}t_1 \\ t_2\end{pmatrix} = [\mathbf{x}]_\beta = Q^\beta_\epsilon\begin{pmatrix}x \\ y\end{pmatrix} = \frac{1}{5}\begin{pmatrix}4x + 3y \\ -3x + 4y\end{pmatrix}$$
which quickly recovers the original conic by substitution.
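A hedged numerical sketch (not from the notes) of the same diagonalization:

```python
import numpy as np

# Diagonalize the quadratic form of Example 2.81.1 and read off the conic type.
M = np.array([[7., 12.], [12., 0.]])
eigvals, eigvecs = np.linalg.eigh(M)          # symmetric, so an orthonormal eigenbasis exists

print(eigvals)                                 # [-9., 16.]: opposite signs, so a hyperbola

# Change of variables: with (t1, t2) = Q^T (x, y) the form becomes diagonal.
x = np.random.default_rng(0).normal(size=2)
t = eigvecs.T @ x
assert np.isclose(x @ M @ x, eigvals @ t**2)   # 7x^2 + 24xy = -9 t1^2 + 16 t2^2 in these co-ordinates
```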
2. The conic $x^2 + 12xy + 3y^2 = 33$ is a hyperbola, in accordance with Example 2.75. With respect to the basis $\gamma = \left\{\begin{pmatrix}1\\-2\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right\}$, we see that
$$K(\mathbf{x}) = -11x^2 + 3(2x + y)^2 = -11t_1^2 + 3t_2^2 = 33 \iff \frac{t_2^2}{11} - \frac{t_1^2}{3} = 1$$
Even though $\gamma$ is non-orthogonal, we can still plot the conic! If we instead used the orthonormal basis $\eta$, we'd obtain
$$K(\mathbf{x}) = (\sqrt{37} + 2)s_1^2 - (\sqrt{37} - 2)s_2^2 = 33$$
however the calculation to find $\eta$ is time-consuming and the expressions for $s_1, s_2$ are extremely ugly.
(Figure: the hyperbola $x^2 + 12xy + 3y^2 = 33$ plotted with the skewed $t_1t_2$-axes along $\gamma$.)
A similar approach can be applied to higher-degree quadratic equations/manifolds: e.g. ellipsoids, paraboloids and hyperboloids in $\mathbb{R}^3$.
Exercises 2.8 1. Prove that the sum of any two bilinear forms is bilinear, and that any scalar multiple of a bilinear form is bilinear: thus the set of bilinear forms on $V$ is a vector space.
(You can't use matrices here, since $V$ could be infinite-dimensional!)
2. Compute the matrix of the bilinear form
$$B(\mathbf{x}, \mathbf{y}) = x_1y_1 - 2x_1y_2 + x_2y_1 - x_3y_3$$
on $\mathbb{R}^3$ with respect to the basis $\beta = \left\{\begin{pmatrix}1\\0\\1\end{pmatrix}, \begin{pmatrix}1\\0\\-1\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}\right\}$.
3. Check that the function $B(f, g) = f'(0)g''(0)$ is a bilinear form on the vector space of twice-differentiable functions. Find the matrix of $B$ with respect to $\beta = \{\cos t, \sin t, \cos 2t, \sin 2t\}$ when restricted to the subspace $\operatorname{Span}\beta$.
4. For each matrix $A$, find a diagonal matrix $D$ and an invertible matrix $Q$ such that $Q^TAQ = D$.
(a) $\begin{pmatrix}1 & 3 \\ 3 & 2\end{pmatrix}$  (b) $\begin{pmatrix}3 & 1 & 2 \\ 1 & 4 & 0 \\ 2 & 0 & 1\end{pmatrix}$
5. If $K$ is a quadratic form and $K(\mathbf{x}) = 2$, what is the value of $K(3\mathbf{x})$?
6. If $F$ does not have characteristic 2, and $K(x) = B(x, x)$ is a quadratic form, prove that we can recover the bilinear form $B$ via
$$B(x, y) = \frac{1}{2}\big(K(x + y) - K(x) - K(y)\big)$$
7. If $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}\mathbf{y}$ is a bilinear form on $F^2$, compute the quadratic form $K(\mathbf{x})$.
8. Suppose $B$ is a symmetric bilinear form on a real finite-dimensional space. With reference to the proof of Sylvester's Law, explain why $\operatorname{rank} B$ is independent of the choice of diagonalizing basis.
9. If $\operatorname{char} F \neq 2$, apply the diagonalizing algorithm to the symmetric bilinear form $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}\mathbf{y}$ on $F^2$. What goes wrong if $\operatorname{char} F = 2$?
10. Describe and plot the following conics:
(a) $x^2 + y^2 + xy = 6$  (b) $35x^2 + 120xy = 4x + 3y$
11. Suppose that a non-empty, non-degenerate⁴ conic $C$ in $\mathbb{R}^2$ has the form $ax^2 + 2bxy + cy^2 + dx + ey + f = 0$, where at least one of $a, b, c \neq 0$, and define $\Delta = b^2 - ac$. Prove that:
• $C$ is a parabola if and only if $\Delta = 0$;
• $C$ is an ellipse if and only if $\Delta < 0$;
• $C$ is a hyperbola if and only if $\Delta > 0$.
(Hint: $\lambda_1, \lambda_2$ are the eigenvalues of a symmetric matrix, so. . .)

⁴ The conic contains at least two points and cannot be factorized as a product of two straight lines: for example, the following are disallowed:
• $x^2 + y^2 + 1 = 0$ is empty (unless one allows conics over $\mathbb{C}$. . .);
• $x^2 + y^2 = 0$ contains only one point;
• $x^2 - xy - x + y = (x - 1)(x - y) = 0$ is the product of two lines.