Math 121B Linear Algebra
Neil Donaldson
Fall 2022
Linear Algebra, Stephen Friedberg, Arnold Insel & Lawrence Spence, 4th Ed 2003, Prentice Hall.
Review from 121A
We begin by recalling a few basic notions and notations.
Vector Spaces  Bold-face $\mathbf{v}$ denotes a vector in a vector space $V$ over a field $\mathbb{F}$. A vector space is closed under vector addition and scalar multiplication:
$$\forall v_1, v_2 \in V,\ \forall\lambda_1, \lambda_2 \in \mathbb{F}, \qquad \lambda_1 v_1 + \lambda_2 v_2 \in V$$
Examples. Here are four (families of) vector spaces over the field $\mathbb{R}$.
• $\mathbb{R}^2 = \{x\mathbf{i} + y\mathbf{j} : x, y \in \mathbb{R}\} = \left\{\begin{pmatrix}x\\ y\end{pmatrix} : x, y \in \mathbb{R}\right\}$ is a vector space over the field $\mathbb{R}$.
• $P_n(\mathbb{R})$: polynomials with degree $\le n$ and coefficients in $\mathbb{R}$.
• $P(\mathbb{R})$: polynomials over $\mathbb{R}$ of any degree.
• $C(\mathbb{R})$: continuous functions from $\mathbb{R}$ to $\mathbb{R}$.
Linear Combinations and Spans  Let $\beta \subseteq V$ be a subset of a vector space $V$ over $\mathbb{F}$. A linear combination of vectors in $\beta$ is any finite sum
$$\lambda_1 v_1 + \cdots + \lambda_n v_n$$
where $\lambda_j \in \mathbb{F}$ and $v_j \in \beta$. The span of $\beta$ comprises all linear combinations: this is a subspace of $V$.
Bases and Co-ordinates  A set $\beta \subseteq V$ is a basis of $V$ if it has two properties:

Linear Independence  Any linear combination yielding the zero vector is trivial; for distinct $v_j \in \beta$,
$$\lambda_1 v_1 + \cdots + \lambda_n v_n = 0 \implies \forall j,\ \lambda_j = 0$$

Spanning Set  $V = \operatorname{Span}\beta$; every vector in $V$ is a (finite!) linear combination of elements of $\beta$.

Theorem. $\beta$ is a basis of $V$ $\iff$ every $v \in V$ is a unique linear combination of elements of $\beta$.

The cardinality of all basis sets is identical; this is the dimension $\dim_{\mathbb{F}} V$.
Example. $P_2(\mathbb{R})$ has standard basis $\beta = \{1, x, x^2\}$: every polynomial of degree $\le 2$ is unique as a linear combination $p(x) = a + bx + cx^2$, and so $\dim P_2(\mathbb{R}) = 3$. The real numbers $a, b, c$ are the co-ordinates of $p$ with respect to $\beta$; the co-ordinate vector of $p$ is written
$$[p]_\beta = \begin{pmatrix}a\\ b\\ c\end{pmatrix}$$
Linearity and Linear Maps  A function $T : V \to W$ between vector spaces $V, W$ over the same field $\mathbb{F}$ is ($\mathbb{F}$-)linear if it respects the linearity properties of $V, W$:
$$\forall v_1, v_2 \in V,\ \forall\lambda_1, \lambda_2 \in \mathbb{F}, \qquad T(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 T(v_1) + \lambda_2 T(v_2)$$
We write L(V, W) for the set (indeed vector space!) of linear maps from V to W: this is shortened to
L(V) if V = W. An isomorphism is an invertible/bijective linear map.
Theorem. If $\dim_{\mathbb{F}} V = n$ and $\beta$ is a basis of $V$, then the co-ordinate map $v \mapsto [v]_\beta$ is an isomorphism of vector spaces $V \cong \mathbb{F}^n$.
Matrices and Linear Maps  If $V, W$ are finite-dimensional, then any linear map $T : V \to W$ can be described using matrix multiplication.

Example. If $A = \begin{pmatrix}2 & -1\\ 0 & 1\\ -4 & 3\end{pmatrix}$, then the linear map $L_A : \mathbb{R}^2 \to \mathbb{R}^3$ (left-multiplication by $A$) is
$$L_A\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}2 & -1\\ 0 & 1\\ -4 & 3\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}2x - y\\ y\\ 3y - 4x\end{pmatrix}$$
The linear map in fact defines the matrix A; we recover the columns of the matrix by feeding the
standard basis vectors to the linear map.
$$\begin{pmatrix}2\\ 0\\ -4\end{pmatrix} = L_A\begin{pmatrix}1\\ 0\end{pmatrix}, \qquad \begin{pmatrix}-1\\ 1\\ 3\end{pmatrix} = L_A\begin{pmatrix}0\\ 1\end{pmatrix}$$
More generally, if $T \in \mathcal{L}(V, W)$ and $\beta = \{v_1, \ldots, v_n\}$ and $\gamma = \{w_1, \ldots, w_m\}$ are bases of $V, W$ respectively, then the matrix of $T$ with respect to $\beta$ and $\gamma$ is
$$[T]^\gamma_\beta = \bigl([T(v_1)]_\gamma \ \cdots \ [T(v_n)]_\gamma\bigr) \in M_{m\times n}(\mathbb{F})$$
whose $j$th column is obtained by feeding the $j$th basis vector of $\beta$ to $T$ and taking its co-ordinate vector with respect to $\gamma$. This fits naturally with the co-ordinate isomorphisms:
$$T(v) = w \iff [T]^\gamma_\beta\,[v]_\beta = [w]_\gamma$$
There are two special cases when $V = W$:
• If $\beta = \gamma$, then we simply write $[T]_\beta$ instead of $[T]^\beta_\beta$.
• If $T = I$ is the identity map, then $Q^\gamma_\beta := [I]^\gamma_\beta$ is the change of co-ordinate matrix from $\beta$ to $\gamma$.
Being able to convert linear maps into matrix multiplication is a central skill in linear algebra. Test
your comfort by working through the following; if everything feels familiar, you should consider
yourself in a good place as far as pre-requisites are concerned!
Example. Let $T : P_2(\mathbb{R}) \to P_1(\mathbb{R})$ be the linear map defined by differentiation
$$T(a + bx + cx^2) = b + 2cx \qquad (*)$$
The standard bases of $P_2(\mathbb{R})$ and $P_1(\mathbb{R})$ are, respectively, $\beta = \{1, x, x^2\}$ and $\gamma = \{1, x\}$. Observe that
$$[T(1)]_\gamma = [0]_\gamma = \begin{pmatrix}0\\ 0\end{pmatrix}, \qquad [T(x)]_\gamma = [1]_\gamma = \begin{pmatrix}1\\ 0\end{pmatrix}, \qquad [T(x^2)]_\gamma = [2x]_\gamma = \begin{pmatrix}0\\ 2\end{pmatrix}$$
$$\implies [T]^\gamma_\beta = \bigl([T(1)]_\gamma \ [T(x)]_\gamma \ [T(x^2)]_\gamma\bigr) = \begin{pmatrix}0 & 1 & 0\\ 0 & 0 & 2\end{pmatrix}$$
Written in co-ordinates, we see the original linear map $(*)$:
$$\bigl[T(a + bx + cx^2)\bigr]_\gamma = [T]^\gamma_\beta\,[a + bx + cx^2]_\beta = \begin{pmatrix}0 & 1 & 0\\ 0 & 0 & 2\end{pmatrix}\begin{pmatrix}a\\ b\\ c\end{pmatrix} = \begin{pmatrix}b\\ 2c\end{pmatrix} = \bigl[b + 2cx\bigr]_\gamma \qquad (\dagger)$$
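Here is a minimal numerical sketch of the identity $(\dagger)$, assuming numpy is available: multiplying the co-ordinate vector of a polynomial by the matrix of $T$ produces the co-ordinates of its derivative. The specific polynomial is just an illustrative choice.

```python
import numpy as np

# Matrix of differentiation T : P_2(R) -> P_1(R) with respect to the standard
# bases beta = {1, x, x^2} and gamma = {1, x}, as computed above.
T = np.array([[0., 1., 0.],
              [0., 0., 2.]])

a, b, c = 3., -1., 5.              # the polynomial p(x) = 3 - x + 5x^2
p_beta = np.array([a, b, c])       # co-ordinate vector [p]_beta
Tp_gamma = T @ p_beta              # co-ordinates of p'(x) = b + 2cx
print(Tp_gamma)                    # [-1. 10.], i.e. p'(x) = -1 + 10x
```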
1. $\eta = \{1 + x,\ x + x^2,\ x^2 + 1\}$ is also a basis of $P_2(\mathbb{R})$. Show that
$$[T]^\gamma_\eta = \begin{pmatrix}1 & 1 & 0\\ 0 & 2 & 2\end{pmatrix}$$

2. As in $(\dagger)$ above, the matrix multiplication
$$\begin{pmatrix}1 & 1 & 0\\ 0 & 2 & 2\end{pmatrix}\begin{pmatrix}a\\ b\\ c\end{pmatrix} = \begin{pmatrix}a + b\\ 2b + 2c\end{pmatrix}$$
corresponds to an equation $[T(p)]_\gamma = [T]^\gamma_\eta\,[p]_\eta$ for some polynomial $p(x)$; what is $p(x)$ in terms of $a, b, c$?

3. Find the change of co-ordinate matrix $Q^\beta_\eta$ and check that the matrices of $T$ are related by
$$[T]^\gamma_\eta = [T]^\gamma_\beta\, Q^\beta_\eta$$
1 Diagonalizability & the Cayley–Hamilton Theorem
1.1 Eigenvalues, Eigenvectors & Diagonalization (Review)
Definition 1.1. Suppose $V$ is a vector space over $\mathbb{F}$ and $T \in \mathcal{L}(V)$. A non-zero $v \in V$ is an eigenvector of $T$ with eigenvalue $\lambda \in \mathbb{F}$ (together an eigenpair) if
$$T(v) = \lambda v$$
For matrices, the eigenvalues/vectors of $A \in M_n(\mathbb{F})$ are precisely those of $L_A \in \mathcal{L}(\mathbb{F}^n)$.

Suppose $\lambda$ is an eigenvalue of $T$:
1. The eigenspace of $\lambda$ is the nullspace $E_\lambda := N(T - \lambda I)$.
2. The geometric multiplicity of $\lambda$ is the dimension $\dim E_\lambda$.

We say that $T$ is diagonalizable if there exists a basis of eigenvectors; an eigenbasis.
We start by recalling a couple of basic facts, the first of which is easily proved by induction.
Lemma 1.2. If $v_1, \ldots, v_k$ are eigenvectors corresponding to distinct eigenvalues, then $\{v_1, \ldots, v_k\}$ is linearly independent.

Moreover, if $\dim_{\mathbb{F}} V = n$ and $T \in \mathcal{L}(V)$ has $n$ distinct eigenvalues, then $T$ is diagonalizable.
Eigenvalues and Eigenvectors in finite dimensions

If $\dim_{\mathbb{F}} V = n$ and $\epsilon$ is a basis, then the eigenvector definition is equivalent to a matrix equation
$$[T]_\epsilon\,[v]_\epsilon = \lambda\,[v]_\epsilon$$
In such a situation, $T$ being diagonalizable means $\exists\beta$ such that $[T]_\beta$ is a diagonal matrix
$$[T]_\beta = \begin{pmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \lambda_n\end{pmatrix}$$
Thankfully there is a systematic way to find eigenvalues and eigenvectors in finite dimensions (a numerical sketch of these steps follows below):

1. Choose any basis $\epsilon$ of $V$ and compute the matrix $A = [T]_\epsilon \in M_n(\mathbb{F})$.

2. Observe that
$$\lambda \in \mathbb{F} \text{ is an eigenvalue} \iff \exists\,[v]_\epsilon \in \mathbb{F}^n\setminus\{0\} \text{ such that } A[v]_\epsilon = \lambda[v]_\epsilon$$
$$\iff \exists\,[v]_\epsilon \in \mathbb{F}^n\setminus\{0\} \text{ such that } (A - \lambda I)[v]_\epsilon = 0 \iff \det(A - \lambda I) = 0$$
This last is a degree-$n$ polynomial equation whose roots are the eigenvalues.

3. For each eigenvalue $\lambda_j$, compute the eigenspace $E_{\lambda_j} = N(T - \lambda_j I)$ to find the eigenvectors. Remember that $E_{\lambda_j}$ is a subspace of the original vector space $V$, so translate back if necessary!
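The following sketch (assuming numpy) mirrors the three steps for a sample matrix — the one that appears in Example 1.4.2 below. The roots of the characteristic polynomial give the eigenvalues and an SVD-based nullspace gives each eigenspace; this is an illustration, not the method a numerical library would use in practice.

```python
import numpy as np

A = np.array([[1., -1.,  0.],
              [0.,  2., -2.],
              [0.,  0.,  3.]])

# Step 2: the characteristic polynomial and its roots.
char_coeffs = np.poly(A)            # coefficients of det(tI - A), highest degree first
eigenvalues = np.roots(char_coeffs)
print(eigenvalues)                  # approximately [3. 2. 1.]

# Step 3: the eigenspace E_lambda = N(A - lambda*I) as a numerical nullspace.
def nullspace(M, tol=1e-10):
    _, s, vt = np.linalg.svd(M)
    return vt[s < tol].conj().T     # columns span the nullspace

for lam in eigenvalues:
    print(lam, nullspace(A - lam * np.eye(3)).round(3))
```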
Definition 1.3. The characteristic polynomial of $T \in \mathcal{L}(V)$ is the degree-$n$ polynomial
$$p(t) := \det(T - tI)$$
The eigenvalues of $T$ are precisely the solutions to the characteristic equation $p(t) = 0$.
Examples 1.4. 1. $A = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}$ has characteristic polynomial $p(t) = t^2 + 1 = (t + i)(t - i)$. As a linear map $L_A \in \mathcal{L}(\mathbb{R}^2)$, $A$ has no eigenvalues and no eigenvectors!

As a linear map $L_A \in \mathcal{L}(\mathbb{C}^2)$, we have two eigenvalues $\pm i$. Indeed
$$(A - iI)v = \begin{pmatrix}-i & -1\\ 1 & -i\end{pmatrix}v = 0 \implies E_i = \operatorname{Span}\left\{\begin{pmatrix}i\\ 1\end{pmatrix}\right\}$$
and similarly $E_{-i} = \operatorname{Span}\left\{\begin{pmatrix}-i\\ 1\end{pmatrix}\right\}$. We therefore have an eigenbasis $\beta = \left\{\begin{pmatrix}i\\ 1\end{pmatrix}, \begin{pmatrix}-i\\ 1\end{pmatrix}\right\}$ (of $\mathbb{C}^2$), with respect to which
$$[L_A]_\beta = \begin{pmatrix}i & 0\\ 0 & -i\end{pmatrix}$$
2. Let $T \in \mathcal{L}(P_2(\mathbb{R}))$ be defined by
$$T(f)(x) = f(x) + (x - 1)f'(x)$$
With respect to the standard basis $\epsilon = \{1, x, x^2\}$, we have the non-diagonal matrix
$$A = [T]_\epsilon = \begin{pmatrix}1 & -1 & 0\\ 0 & 2 & -2\\ 0 & 0 & 3\end{pmatrix} \implies p(t) = \det(A - tI) = (1 - t)(2 - t)(3 - t)$$
With three distinct eigenvalues, $T$ is diagonalizable. To find the eigenvectors, compute the nullspaces:
$$\lambda_1 = 1:\quad 0 = (A - \lambda_1 I)[v_1]_\epsilon = \begin{pmatrix}0 & -1 & 0\\ 0 & 1 & -2\\ 0 & 0 & 2\end{pmatrix}[v_1]_\epsilon \implies [v_1]_\epsilon \in \operatorname{Span}\left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}\right\} \implies E_1 = \operatorname{Span}\{1\}$$
$$\lambda_2 = 2:\quad A - \lambda_2 I = \begin{pmatrix}-1 & -1 & 0\\ 0 & 0 & -2\\ 0 & 0 & 1\end{pmatrix} \implies [v_2]_\epsilon \in \operatorname{Span}\left\{\begin{pmatrix}1\\ -1\\ 0\end{pmatrix}\right\} \implies E_2 = \operatorname{Span}\{1 - x\}$$
$$\lambda_3 = 3:\quad A - \lambda_3 I = \begin{pmatrix}-2 & -1 & 0\\ 0 & -1 & -2\\ 0 & 0 & 0\end{pmatrix} \implies [v_3]_\epsilon \in \operatorname{Span}\left\{\begin{pmatrix}1\\ -2\\ 1\end{pmatrix}\right\} \implies E_3 = \operatorname{Span}\{1 - 2x + x^2\}$$
Making a sensible choice of non-zero eigenvectors, we obtain an eigenbasis, with respect to which the linear map is necessarily diagonal:
$$\beta = \{v_1, v_2, v_3\} = \{1,\ 1 - x,\ 1 - 2x + x^2\} = \{1,\ 1 - x,\ (1 - x)^2\}$$
$$T\bigl(a + b(1 - x) + c(1 - x)^2\bigr) = a + 2b(1 - x) + 3c(1 - x)^2, \qquad [T]_\beta = \begin{pmatrix}1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3\end{pmatrix}$$
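A quick numerical confirmation of this diagonalization, assuming numpy: the columns of $P$ below are the co-ordinate vectors of the eigenbasis $\{1,\ 1-x,\ (1-x)^2\}$, and conjugating $A$ by $P$ produces the diagonal matrix.

```python
import numpy as np

A = np.array([[1., -1.,  0.],
              [0.,  2., -2.],
              [0.,  0.,  3.]])
P = np.array([[1.,  1.,  1.],     # columns: [1]_eps, [1-x]_eps, [(1-x)^2]_eps
              [0., -1., -2.],
              [0.,  0.,  1.]])
print(np.linalg.inv(P) @ A @ P)   # diag(1, 2, 3) up to rounding
```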
Conditions for diagonalizability of finite-dimensional operators

We now borrow a little terminology from the theory of polynomials.

Definition 1.5. Let $\mathbb{F}$ be a field and $p(t)$ a polynomial with coefficients in $\mathbb{F}$.

1. Let $\lambda \in \mathbb{F}$ be a root; $p(\lambda) = 0$. The algebraic multiplicity $\operatorname{mult}(\lambda)$ is the largest power of $\lambda - t$ to divide $p(t)$. Otherwise said, there exists¹ some polynomial $q(t)$ such that
$$p(t) = (\lambda - t)^{\operatorname{mult}(\lambda)}q(t) \quad\text{and}\quad q(\lambda) \neq 0$$

2. We say that $p(t)$ splits over $\mathbb{F}$ if it factorizes completely into linear factors; equivalently,
$$\exists\, a, \lambda_1, \ldots, \lambda_k \in \mathbb{F} \text{ such that } p(t) = a(\lambda_1 - t)^{m_1}\cdots(\lambda_k - t)^{m_k}$$

When $p(t)$ splits, the algebraic multiplicities sum to the degree $n$ of the polynomial:
$$n = m_1 + \cdots + m_k$$

Of course, we are most interested when $p(t)$ is the characteristic polynomial of a linear map $T \in \mathcal{L}(V)$. If such a polynomial splits, then $a = 1$ and $\lambda_1, \ldots, \lambda_k$ are necessarily the (distinct) eigenvalues of $T$.
Example 1.6. The field matters! For instance $p(t) = t^2 + 1 = (t - i)(t + i) = (i - t)(-i - t)$ splits over $\mathbb{C}$ but not over $\mathbb{R}$. Its roots are plainly $\pm i$.
For the purposes of review, we state the main result; this will be proved in the next section.

Theorem 1.7. Let $V$ be finite-dimensional. A linear map $T \in \mathcal{L}(V)$ is diagonalizable if and only if,
1. Its characteristic polynomial splits over $\mathbb{F}$, and,
2. The geometric and algebraic multiplicities of each eigenvalue are equal: $\dim E_{\lambda_j} = \operatorname{mult}(\lambda_j)$.
Example 1.8. The matrix $A = \begin{pmatrix}3 & 1 & 0\\ 0 & 3 & 0\\ 0 & 0 & 5\end{pmatrix}$ is easily seen to have eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 5$. Indeed
$$p(t) = (3 - t)^2(5 - t), \qquad \operatorname{mult}(3) = 2, \quad \operatorname{mult}(5) = 1$$
$$E_3 = \operatorname{Span}\left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}\right\}, \qquad E_5 = \operatorname{Span}\left\{\begin{pmatrix}0\\ 0\\ 1\end{pmatrix}\right\}, \qquad \dim E_3 = \dim E_5 = 1$$
This matrix is non-diagonalizable since $\dim E_3 = 1 \neq 2 = \operatorname{mult}(3)$.

Everything prior to this should be review. If it feels very unfamiliar, revisit your notes from 121A, particularly sections 5.1 and 5.2 of the textbook.
¹ The existence follows from Descartes' factor theorem and the division algorithm for polynomials.
Exercises 1.1  1. For each matrix over $\mathbb{R}$, find its characteristic polynomial, its eigenvalues/spaces, and its algebraic and geometric multiplicities; decide if it is diagonalizable.
$$\text{(a)}\quad A = \begin{pmatrix}2 & 0 & 0 & 0\\ 0 & 3 & 1 & 0\\ 0 & 0 & 3 & 1\\ 0 & 0 & 0 & 3\end{pmatrix} \qquad\qquad \text{(b)}\quad B = \begin{pmatrix}1 & 6 & 0 & 0\\ 2 & 6 & 0 & 0\\ 0 & 0 & 3 & 0\\ 0 & 0 & 0 & 3\end{pmatrix}$$

2. Suppose $A$ is a real matrix with eigenpair $(\lambda, v)$. If $\lambda \notin \mathbb{R}$, show that $(\overline{\lambda}, \overline{v})$ is also an eigenpair.

3. Show that the characteristic polynomial of $A = \begin{pmatrix}3 & -4\\ 4 & 3\end{pmatrix}$ does not split over $\mathbb{R}$. Diagonalize $A$ over $\mathbb{C}$.

4. Give an example of a $2\times 2$ matrix whose entries are rational numbers and whose characteristic polynomial splits over $\mathbb{R}$, but not over $\mathbb{Q}$.

5. Diagonalize $L_C \in \mathcal{L}(\mathbb{C}^2)$ where $C = \begin{pmatrix}2i & 1\\ 2 & 0\end{pmatrix}$.

6. If $p(t)$ splits, explain why
$$\det T = \lambda_1^{\operatorname{mult}(\lambda_1)}\cdots\lambda_k^{\operatorname{mult}(\lambda_k)}$$
where $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues of $T$.

7. Suppose $T \in \mathcal{L}(V)$ is invertible with eigenvalue $\lambda$. Prove that $\lambda^{-1}$ is an eigenvalue of $T^{-1}$ with the same eigenspace $E_\lambda$. If $T$ is diagonalizable, prove that $T^{-1}$ is also diagonalizable.

8. If $V$ is finite-dimensional and $T \in \mathcal{L}(V)$, we may define $\det T$ to equal $\det[T]_\beta$, where $\beta$ is any basis of $V$. Explain why the choice of basis does not matter; that is, if $\gamma$ is any other basis of $V$, we have $\det[T]_\gamma = \det[T]_\beta$.
1.2 Invariant Subspaces and the Cayley–Hamilton Theorem

The proof of Theorem 1.7 is facilitated by a new concept, of which eigenspaces are a special case.

Definition 1.9. Suppose $T \in \mathcal{L}(V)$. A subspace $W$ of $V$ is $T$-invariant if $T(W) \subseteq W$. In such a case, the restriction of $T$ to $W$ is the linear map
$$T_W : W \to W : w \mapsto T(w)$$

Examples 1.10. 1. The trivial subspace $\{0\}$ and the entire vector space $V$ are invariant for any linear map $T \in \mathcal{L}(V)$.

2. Every eigenspace is invariant; if $v \in E_\lambda$, then $T(v) = \lambda v \in E_\lambda$.

3. Continuing Example 1.8, if $A = \begin{pmatrix}3 & 1 & 0\\ 0 & 3 & 0\\ 0 & 0 & 5\end{pmatrix}$ then $W = \operatorname{Span}\{\mathbf{i}, \mathbf{j}\}$ is an invariant subspace for the linear map $L_A$. Indeed
$$A(x\mathbf{i} + y\mathbf{j}) = (3x + y)\mathbf{i} + 3y\mathbf{j} \in W$$
$W$ is an example of a generalized eigenspace; we'll study these properly at the end of term.
To prove our diagonalization criterion, we need to see how to factorize the characteristic polynomial. It turns out that factors of $p(t)$ correspond to $T$-invariant subspaces!

Example 1.11. $W = \operatorname{Span}\{\mathbf{i}, \mathbf{j}\}$ is an invariant subspace of $A = \begin{pmatrix}1 & 2 & 4\\ 0 & 3 & 1\\ 0 & 0 & 2\end{pmatrix} \in M_3(\mathbb{R})$. With respect to the standard basis, the restriction $(L_A)_W$ has matrix $\begin{pmatrix}1 & 2\\ 0 & 3\end{pmatrix}$. The characteristic polynomial $p_W(t)$ of the restriction is plainly a factor of the whole,
$$p(t) = (1 - t)(2 - t)(3 - t) = (2 - t)\,p_W(t)$$
Theorem 1.12. Suppose $T \in \mathcal{L}(V)$, that $\dim V$ is finite and that $W$ is a $T$-invariant subspace of $V$. Then the characteristic polynomial of the restriction $T_W$ divides that of $T$.

The proof simply abstracts the approach of the example.

Proof. Extend a basis $\beta_W$ of $W$ to a basis $\beta$ of $V$. Since $T(w) \in \operatorname{Span}\beta_W$ for each $w \in W$, we see that the matrix $[T]_\beta$ has block form
$$[T]_\beta = \begin{pmatrix}A & B\\ O & C\end{pmatrix} \implies p(t) = \det(A - tI)\det(C - tI) = p_W(t)\det(C - tI)$$
where $p_W(t)$ is the characteristic polynomial, and $A = [T_W]_{\beta_W}$ the matrix of the restriction $T_W$.
Corollary 1.13. If $\lambda$ is an eigenvalue of $T$, then $T_{E_\lambda} = \lambda I_{E_\lambda}$ is a multiple of the identity, whence,
1. The characteristic polynomial of the restriction $T_{E_\lambda}$ is $p_\lambda(t) = (\lambda - t)^{\dim E_\lambda}$.
2. $p_\lambda(t)$ divides the characteristic polynomial of $T$. In particular $\dim E_\lambda \le \operatorname{mult}(\lambda)$.
We are now in a position to state and prove an extended version of Theorem 1.7.

Theorem 1.14. Suppose $\dim_{\mathbb{F}} V = n$ and that $T \in \mathcal{L}(V)$ has distinct eigenvalues $\lambda_1, \ldots, \lambda_k$. The following are equivalent:

1. $T$ is diagonalizable.
2. The characteristic polynomial splits over $\mathbb{F}$ and $\dim E_{\lambda_j} = \operatorname{mult}(\lambda_j)$ for each $j$; indeed
$$p(t) = p_{\lambda_1}(t)\cdots p_{\lambda_k}(t) = (\lambda_1 - t)^{\dim E_{\lambda_1}}\cdots(\lambda_k - t)^{\dim E_{\lambda_k}}$$
3. $\displaystyle\sum_{j=1}^k \dim E_{\lambda_j} = n$
4. $V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}$
Example 1.15. $A = \begin{pmatrix}7 & 0 & -12\\ 0 & 1 & 0\\ 2 & 0 & -3\end{pmatrix}$ is diagonalizable. Indeed $p(t) = (1 - t)^2(3 - t)$ splits, and we have
$$\begin{array}{c|c|c}
\lambda & 1 & 3\\ \hline
\operatorname{mult}(\lambda) & 2 & 1\\ \hline
E_\lambda & \operatorname{Span}\left\{\begin{pmatrix}2\\ 0\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 1\\ 0\end{pmatrix}\right\} & \operatorname{Span}\left\{\begin{pmatrix}3\\ 0\\ 1\end{pmatrix}\right\}\\ \hline
\dim E_\lambda & 2 & 1
\end{array}$$
and $\mathbb{R}^3 = E_1 \oplus E_3$. With respect to the eigenbasis $\beta = \left\{\begin{pmatrix}2\\ 0\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 1\\ 0\end{pmatrix}, \begin{pmatrix}3\\ 0\\ 1\end{pmatrix}\right\}$, the map is diagonal: $[L_A]_\beta = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 3\end{pmatrix}$.
Proof. (1 ⇒ 2) If $T$ is diagonalizable with eigenbasis $\beta$, then $[T]_\beta$ is diagonal. But then
$$p(t) = (\lambda_1 - t)^{m_1}\cdots(\lambda_k - t)^{m_k}$$
splits and $\sum\operatorname{mult}(\lambda_i) = n$. The cardinality $n$ of an eigenbasis is at most $\sum\dim E_{\lambda_i}$ since every element is an (independent) eigenvector. By Corollary 1.13 ($\dim E_{\lambda_j} \le \operatorname{mult}(\lambda_j)$) we see that
$$n \le \sum\dim E_{\lambda_j} \le \sum\operatorname{mult}(\lambda_j) = n$$
whence the inequalities are equalities with each pair equal: $\dim E_{\lambda_j} = \operatorname{mult}(\lambda_j)$.

(2 ⇒ 3) $p(t)$ splits $\implies n = \sum\operatorname{mult}(\lambda_j) = \sum\dim E_{\lambda_j}$.

(3 ⇒ 4) Assume $E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_j}$ exists.² If $(\lambda_{j+1}, v_{j+1})$ is an eigenpair, then $v_{j+1} \notin E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_j}$, for otherwise this would contradict Lemma 1.2. By induction, $E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}$ exists; by assumption it has dimension $n = \dim V$ and therefore equals $V$.

(4 ⇒ 1) For each $j$, choose a basis $\beta_j$ of $E_{\lambda_j}$. Then $\beta := \beta_1 \cup \cdots \cup \beta_k$ is a basis of $V$ consisting of eigenvectors of $T$; an eigenbasis.

² Distinct eigenspaces have trivial intersection: $i_1 \neq i_2 \implies E_{\lambda_{i_1}} \cap E_{\lambda_{i_2}} = \{0\}$.
T-cyclic Subspaces and the Cayley–Hamilton Theorem

We finish this chapter by introducing a general family of invariant subspaces and using them to prove a startling result.

Definition 1.16. Let $T \in \mathcal{L}(V)$ and let $v \in V$. The $T$-cyclic subspace generated by $v$ is the span
$$\langle v\rangle = \operatorname{Span}\{v, T(v), T^2(v), \ldots\}$$

Example 1.17. Recalling Example 1.10.3, let $A = \begin{pmatrix}3 & 1 & 0\\ 0 & 3 & 0\\ 0 & 0 & 5\end{pmatrix}$ and $v = \mathbf{i} + \mathbf{k}$. It is easy to see that
$$Av = 3\mathbf{i} + 5\mathbf{k}, \quad A^2v = 9\mathbf{i} + 25\mathbf{k}, \quad\ldots,\quad A^mv = 3^m\mathbf{i} + 5^m\mathbf{k}$$
all of which lie in $\operatorname{Span}\{\mathbf{i}, \mathbf{k}\}$. Plainly this is the $L_A$-cyclic subspace $\langle\mathbf{i} + \mathbf{k}\rangle$.
The proof of the following basic result is left as an exercise.

Lemma 1.18. $\langle v\rangle$ is the smallest $T$-invariant subspace of $V$ containing $v$, specifically:
1. $\langle v\rangle$ is $T$-invariant.
2. If $W \le V$ is $T$-invariant and $v \in W$, then $\langle v\rangle \subseteq W$.
3. $\dim\langle v\rangle = 1 \iff v$ is an eigenvector of $T$.
We were lucky in the example that the general form $A^mv$ was so clear. It is helpful to develop a more precise test for identifying the dimension and a basis of a $T$-cyclic subspace.

Suppose a $T$-cyclic subspace $\langle v\rangle = \operatorname{Span}\{v, T(v), T^2(v), \ldots\}$ has finite dimension.³ Let $k \ge 1$ be maximal such that the set
$$\{v, T(v), \ldots, T^{k-1}(v)\}$$
is linearly independent.
• If $k$ doesn't exist, the infinite linearly independent set $\{v, T(v), \ldots\}$ contradicts $\dim\langle v\rangle < \infty$.
• By the maximality of $k$, $T^k(v) \in \operatorname{Span}\{v, T(v), \ldots, T^{k-1}(v)\}$; by induction this extends to
$$j \ge k \implies T^j(v) \in \operatorname{Span}\{v, T(v), \ldots, T^{k-1}(v)\}$$
It follows that $\langle v\rangle = \operatorname{Span}\{v, T(v), \ldots, T^{k-1}(v)\}$, and we've proved a useful criterion.

Theorem 1.19. Suppose $v \neq 0$, then
$$\dim\langle v\rangle = k \iff \{v, T(v), \ldots, T^{k-1}(v)\} \text{ is a basis of } \langle v\rangle$$
$$\iff k \text{ is maximal such that } \{v, T(v), \ldots, T^{k-1}(v)\} \text{ is linearly independent}$$
$$\iff k \text{ is minimal such that } T^k(v) \in \operatorname{Span}\{v, T(v), \ldots, T^{k-1}(v)\}$$

³ Necessarily the situation if $\dim V < \infty$, when we are thinking about characteristic polynomials.
Examples 1.20. 1. According to the Theorem, in Example 1.17 we need only have noticed:
• $v = \mathbf{i} + \mathbf{k}$ and $Av = 3\mathbf{i} + 5\mathbf{k}$ are linearly independent.
• $A^2(\mathbf{i} + \mathbf{k}) = 9\mathbf{i} + 25\mathbf{k} \in \operatorname{Span}\{v, Av\}$.
We could then conclude that $\langle v\rangle = \operatorname{Span}\{v, Av\}$ has dimension 2.

2. Let $T(p(x)) = 3p(x) - p''(x)$, viewed as a linear map $T \in \mathcal{L}(P_2(\mathbb{R}))$, and consider the $T$-cyclic subspace generated by the polynomial $p(x) = x^2$:
$$T(x^2) = 3x^2 - 2, \qquad T^2(x^2) = T(3x^2 - 2) = 3(3x^2 - 2) - 6 = 9x^2 - 12, \quad\ldots$$
Observe that $\{x^2, T(x^2)\}$ is linearly independent, but that
$$T^2(x^2) = 9x^2 - 12 = -9x^2 + 6(3x^2 - 2) \in \operatorname{Span}\{x^2, T(x^2)\}$$
We conclude that $\dim\langle x^2\rangle = 2$. An alternative basis for $\langle x^2\rangle$ is plainly $\{1, x^2\}$.
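In co-ordinates, the dimension test of Theorem 1.19 is just a rank computation. Below is a sketch (assuming numpy) for Example 1.20.2: with respect to $\{1, x, x^2\}$ the operator $T(p) = 3p - p''$ has the matrix $M$ shown, and the rank of the matrix whose columns are $v, Mv, M^2v, \ldots$ equals $\dim\langle v\rangle$.

```python
import numpy as np

M = np.array([[3., 0., -2.],     # columns: [T(1)], [T(x)], [T(x^2)] w.r.t. {1, x, x^2}
              [0., 3.,  0.],
              [0., 0.,  3.]])
v = np.array([0., 0., 1.])       # co-ordinates of p(x) = x^2

vectors = [v]
for _ in range(3):               # dim P_2(R) = 3, so three iterates suffice
    vectors.append(M @ vectors[-1])

print(np.linalg.matrix_rank(np.column_stack(vectors)))   # 2, so dim<x^2> = 2
```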
We finish by considering the interaction of a $T$-cyclic subspace with the characteristic polynomial. Surprisingly, the coefficients of the characteristic polynomial and of the linear combination coincide. Continuing the Example, if $W = \langle x^2\rangle$ and $\beta_W = \{x^2, T(x^2)\} = \{x^2, 3x^2 - 2\}$, then
$$[T_W]_{\beta_W} = \begin{pmatrix}0 & -9\\ 1 & 6\end{pmatrix} \implies p_W(t) = t^2 - 6t + 9$$
Theorem 1.21. Let $T \in \mathcal{L}(V)$ and suppose $W = \langle w\rangle$ has $\dim W = k$ with basis
$$\beta_W = \{w, T(w), \ldots, T^{k-1}(w)\}$$
in accordance with Theorem 1.19, then
1. If $T^k(w) + a_{k-1}T^{k-1}(w) + \cdots + a_0w = 0$, then the characteristic polynomial of $T_W$ is
$$p_W(t) = (-1)^k\bigl(t^k + a_{k-1}t^{k-1} + \cdots + a_1t + a_0\bigr)$$
2. $p_W(T_W) = 0$ is the zero map on $W$.

Proof. 1. This is an exercise.

2. Write $S \in \mathcal{L}(V)$ for the linear map
$$S := p_W(T) = (-1)^k\bigl(T^k + a_{k-1}T^{k-1} + \cdots + a_0I\bigr)$$
Part 1 says $S(w) = 0$. Since $S$ is a polynomial in $T$, it commutes with all powers of $T$:
$$\forall j, \qquad S(T^j(w)) = T^j(S(w)) = 0$$
Since $S$ is zero on the basis $\beta_W$ of $W$, we see that $S_W$ is the zero function.
With a little sneakiness, we can drop the $W$'s in the second part of the Theorem and observe an intimate relation between a linear map and its characteristic polynomial.

Corollary 1.22 (Cayley–Hamilton). If $V$ is finite-dimensional, then $T \in \mathcal{L}(V)$ satisfies its characteristic polynomial; $p(T) = 0$.

Proof. Let $w \in V$ and consider the cyclic subspace $W = \langle w\rangle$ generated by $w$. By Theorem 1.12,
$$p(t) = q_W(t)\,p_W(t)$$
for some polynomial $q_W$. But the previous result says that $p_W(T)(w) = 0$, whence
$$p(T)(w) = 0$$
Since we may apply this reasoning to any $w \in V$, we conclude that $p(T)$ is the zero function.

Examples 1.23. 1. $A = \begin{pmatrix}2 & 1\\ 3 & 4\end{pmatrix}$ has $p(t) = t^2 - 6t + 5$ and we confirm:
$$A^2 - 6A = \begin{pmatrix}7 & 6\\ 18 & 19\end{pmatrix} - 6\begin{pmatrix}2 & 1\\ 3 & 4\end{pmatrix} = -5I$$
It may seem like a strange thing to do for this matrix, but the characteristic equation can be used to calculate the inverse of $A$:
$$A^2 - 6A + 5I = 0 \implies A(A - 6I) = -5I \implies A^{-1} = \tfrac15(6I - A) = \tfrac15\begin{pmatrix}4 & -1\\ -3 & 2\end{pmatrix}$$
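A quick numerical check of Example 1.23.1, assuming numpy: the matrix satisfies its characteristic polynomial, and the same identity reproduces the inverse.

```python
import numpy as np

A = np.array([[2., 1.],
              [3., 4.]])
I = np.eye(2)

print(A @ A - 6 * A + 5 * I)                   # the zero matrix (Cayley-Hamilton)
A_inv = (6 * I - A) / 5                        # from A(6I - A) = 5I
print(np.allclose(A_inv, np.linalg.inv(A)))    # True
```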
2. We use the Cayley–Hamilton Theorem to compute $A^4$ when
$$A = \begin{pmatrix}2 & 1 & \frac83\\ 0 & 1 & 6\\ 0 & 0 & 2\end{pmatrix}$$
The characteristic polynomial is
$$p(t) = (2 - t)^2(1 - t) = 4 - 8t + 5t^2 - t^3$$
By Cayley–Hamilton, $A^3 = 5A^2 - 8A + 4I$, whence
$$A^4 = AA^3 = A(5A^2 - 8A + 4I) = 5A^3 - 8A^2 + 4A = 5(5A^2 - 8A + 4I) - 8A^2 + 4A$$
$$= 17A^2 - 36A + 20I = 17\begin{pmatrix}4 & 3 & \frac{50}{3}\\ 0 & 1 & 18\\ 0 & 0 & 4\end{pmatrix} - 36\begin{pmatrix}2 & 1 & \frac83\\ 0 & 1 & 6\\ 0 & 0 & 2\end{pmatrix} + 20\begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix} = \begin{pmatrix}16 & 15 & \frac{562}{3}\\ 0 & 1 & 90\\ 0 & 0 & 16\end{pmatrix}$$
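The identity $A^4 = 17A^2 - 36A + 20I$ can be verified directly; here is a one-line check assuming numpy.

```python
import numpy as np
from numpy.linalg import matrix_power

A = np.array([[2., 1., 8/3],
              [0., 1., 6. ],
              [0., 0., 2. ]])
lhs = matrix_power(A, 4)
rhs = 17 * (A @ A) - 36 * A + 20 * np.eye(3)
print(np.allclose(lhs, rhs))   # True
print(rhs)                     # [[16, 15, 562/3], [0, 1, 90], [0, 0, 16]]
```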
3. Recall Example 1.4.2, where the linear map $T(f(x)) = f(x) + (x - 1)f'(x)$ had
$$p(t) = (1 - t)(2 - t)(3 - t) = -t^3 + 6t^2 - 11t + 6$$
By Cayley–Hamilton, $T^3 = 6T^2 - 11T + 6I$. You can check this explicitly, after first computing
$$T^2(f(x)) = f(x) + 3(x - 1)f'(x) + (x - 1)^2f''(x), \quad\text{etc.}$$
Cayley–Hamilton can also be used to simplify higher powers of $T$ and even to compute the inverse!
$$I = \tfrac16\bigl(T^3 - 6T^2 + 11T\bigr) \implies T^{-1} = \tfrac16\bigl(T^2 - 6T + 11I\bigr)$$
$$\implies T^{-1}(f(x)) = f(x) - \tfrac12(x - 1)f'(x) + \tfrac16(x - 1)^2f''(x)$$
Exercises 1.2  1. For the linear map $T = L_A : \mathbb{R}^3 \to \mathbb{R}^3$ where $A = \begin{pmatrix}3 & 0 & 0\\ 0 & 2 & 4\\ 0 & 0 & 2\end{pmatrix}$, find the $T$-cyclic subspace generated by the standard basis vector $e_3 = \begin{pmatrix}0\\ 0\\ 1\end{pmatrix}$.

2. Let $T = L_A$, where $A = \begin{pmatrix}1 & 2 & 4\\ 0 & 3 & 1\\ 0 & 0 & 2\end{pmatrix}$ and let $v = \begin{pmatrix}0\\ 1\\ 1\end{pmatrix}$. Compute $T(v)$ and $T^2(v)$. Hence describe the $T$-cyclic subspace $\langle v\rangle$ and its dimension.

3. Given $A = \begin{pmatrix}2 & 0 & 0 & 0\\ 0 & 3 & 1 & 0\\ 0 & 0 & 3 & 1\\ 0 & 0 & 0 & 3\end{pmatrix}$, find two distinct $L_A$-invariant subspaces $W \le \mathbb{R}^4$ such that $\dim W = 3$.

4. Suppose that $W$ and $X$ are $T$-invariant subspaces of $V$. Prove that the sum
$$W + X = \{w + x : w \in W,\ x \in X\}$$
is also $T$-invariant.

5. Prove Lemma 1.18.

6. Give an example of an infinite-dimensional vector space $V$, a linear map $T \in \mathcal{L}(V)$, and a vector $v$ such that $\langle v\rangle = V$.

7. Let $\beta = \{\sin x,\ \cos x,\ 2x\sin x,\ 3x\cos x\}$ and $T = \frac{d}{dx} \in \mathcal{L}(\operatorname{Span}\beta)$. Plainly the subspace $W := \operatorname{Span}\{\sin x, \cos x\}$ is $T$-invariant. Compute the matrices $[T]_\beta$ and $[T_W]_{\beta_W}$ and observe that
$$p(t) = p_W(t)^2$$

8. Verify explicitly that $A = \begin{pmatrix}2 & 3\\ 0 & 2\end{pmatrix}$ satisfies its characteristic polynomial.

9. Check the details of Example 1.23.3 and evaluate $T^4$ as a linear combination of $I$, $T$ and $T^2$. In particular, check the evaluation of $T^{-1}(f(x))$.
10. Suppose $a, b$ are constants with $a \neq 0$ and define $T(f(x)) = af(x) + bf'(x)$.
(a) Find an expression for the inverse $T^{-1}(f(x))$ if $T \in \mathcal{L}(P_1(\mathbb{R}))$.
(b) Find an expression for the inverse $T^{-1}(f(x))$ if $T \in \mathcal{L}(P_2(\mathbb{R}))$.
Your answers should be written in terms of $f$ and its derivatives.

11. Let $T(f)(x) = f'(x) + \frac1x\int_0^x f(t)\,dt$ be a linear map $T \in \mathcal{L}(P_2(\mathbb{R}))$.
(a) Find the characteristic polynomial of $T$ and identify its eigenspaces. Is $T$ diagonalizable?
(b) Find $a, b, c \in \mathbb{R}$ such that $T^3 = aT^2 + bT + cI$.
(c) What are $\dim\mathcal{L}(P_2(\mathbb{R}))$ and $\dim\operatorname{Span}\{T^k : k \in \mathbb{N}_0\}$? Explain.

12. If $A = \begin{pmatrix}a & b\\ c & d\end{pmatrix}$ has non-zero determinant, use the Cayley–Hamilton Theorem to obtain the usual expression for $A^{-1}$.

13. Recall Examples 1.10.3, 1.17, and 1.20.1 with $A = \begin{pmatrix}3 & 1 & 0\\ 0 & 3 & 0\\ 0 & 0 & 5\end{pmatrix}$.
(a) If $v = \begin{pmatrix}x\\ y\\ z\end{pmatrix}$, show that $\det(v, Av, A^2v) = -4y^2z$.
(b) Hence determine all $L_A$-cyclic subspaces of $\mathbb{R}^3$.

14. (a) Consider Example 1.20.2 where $T \in \mathcal{L}(P_2(\mathbb{R}))$ is defined by $T(p(x)) = 3p(x) - p''(x)$. Prove that all $T$-cyclic subspaces have dimension $\le 2$.
(b) What if we instead consider $S \in \mathcal{L}(P_2(\mathbb{R}))$ defined by $S(p(x)) = 3p(x) - p'(x)$?
15. We prove part 1 of Theorem 1.21.
(a) Explain why the matrix of $T_W$ with respect to the basis $\beta_W$ is
$$[T_W]_{\beta_W} = \begin{pmatrix}0 & 0 & 0 & \cdots & 0 & -a_0\\ 1 & 0 & 0 & & 0 & -a_1\\ 0 & 1 & 0 & & 0 & -a_2\\ \vdots & & & \ddots & & \vdots\\ 0 & 0 & 0 & & 0 & -a_{k-2}\\ 0 & 0 & 0 & \cdots & 1 & -a_{k-1}\end{pmatrix} \in M_k(\mathbb{F})$$
(b) Compute the characteristic polynomial $p_W(t) = \det\bigl([T_W]_{\beta_W} - tI_k\bigr)$ by expanding the determinant along the first row.
2 Inner Product Spaces, part 1

You should be familiar with the scalar/dot product in $\mathbb{R}^2$. For any vectors $x = \begin{pmatrix}x_1\\ x_2\end{pmatrix}$, $y = \begin{pmatrix}y_1\\ y_2\end{pmatrix}$, define
$$x\cdot y := x_1y_1 + x_2y_2$$
• The norm or length of a vector is $\|x\| = \sqrt{x\cdot x}$.
• The angle $\theta$ between vectors satisfies $x\cdot y = \|x\|\,\|y\|\cos\theta$.
• Vectors are orthogonal or perpendicular precisely when $x\cdot y = 0$.

[Figure: two vectors $x$ and $y$ with the angle $\theta$ between them.]

The dot product is what allows us to compute lengths of and angles between vectors in $\mathbb{R}^2$. An inner product is an algebraic structure that generalizes this idea to other vector spaces.
2.1 Real and Complex Inner Product Spaces

Unless explicitly stated otherwise, throughout this chapter $\mathbb{F}$ is either the real $\mathbb{R}$ or complex $\mathbb{C}$ field.

Definition 2.1. An inner product space $(V, \langle\ ,\ \rangle)$ is a vector space $V$ over $\mathbb{F}$ together with an inner product: a function $\langle\ ,\ \rangle : V\times V \to \mathbb{F}$ satisfying the following properties $\forall x, y, z \in V$, $\lambda \in \mathbb{F}$:
(a) Linear: $\langle\lambda x + y, z\rangle = \lambda\langle x, z\rangle + \langle y, z\rangle$
(b) Conjugate-Symmetric: $\langle y, x\rangle = \overline{\langle x, y\rangle}$ (complex conjugate!)
(c) Positive-definite: $x \neq 0 \implies \langle x, x\rangle > 0$

The norm or length of $x \in V$ is $\|x\| := \sqrt{\langle x, x\rangle}$. A unit vector has $\|x\| = 1$.
Vectors $x, y$ are perpendicular/orthogonal if $\langle x, y\rangle = 0$ and orthonormal if they are additionally unit vectors.
Real inner product spaces

The definition simplifies slightly when $\mathbb{F} = \mathbb{R}$.
• Conjugate-symmetry becomes plain symmetry: $\langle y, x\rangle = \langle x, y\rangle$.
• Linearity + symmetry yields bilinearity: a real inner product is also linear in its second slot
$$\langle x, \lambda y + z\rangle = \lambda\langle x, y\rangle + \langle x, z\rangle$$
A real inner product is often termed a positive-definite, symmetric, bilinear form.
The simplest example is the natural generalization of the dot product.

Definition 2.2. Euclidean space means $\mathbb{R}^n$ equipped with the standard inner (dot) product,
$$\langle x, y\rangle = x\cdot y = y^Tx = \sum_{j=1}^n x_jy_j = x_1y_1 + \cdots + x_ny_n, \qquad \|x\| = \sqrt{\sum_{j=1}^n x_j^2}$$

Unless the inner product is stated explicitly, if we refer to $\mathbb{R}^n$ as an inner product space, we mean Euclidean space. However, there are many other ways to make $\mathbb{R}^n$ into an inner product space. . .
Example 2.3. Define an alternative inner product on $\mathbb{R}^2$ via
$$\langle x, y\rangle = x_1y_1 + 3x_2y_2$$
It is easy to check that this satisfies the required properties:
(a) Linearity: follows from the associative/distributive laws in $\mathbb{R}$,
$$\langle\lambda x + y, z\rangle = (\lambda x_1 + y_1)z_1 + 3(\lambda x_2 + y_2)z_2 = \lambda(x_1z_1 + 3x_2z_2) + (y_1z_1 + 3y_2z_2) = \lambda\langle x, z\rangle + \langle y, z\rangle$$
(b) Symmetry: follows from the commutativity of multiplication in $\mathbb{R}$,
$$\langle y, x\rangle = y_1x_1 + 3y_2x_2 = x_1y_1 + 3x_2y_2 = \langle x, y\rangle$$
(c) Positive-definiteness: if $x \neq 0$, then $\langle x, x\rangle = x_1^2 + 3x_2^2 > 0$.

With respect to $\langle\ ,\ \rangle$, the concept of orthogonality feels strange: e.g., $\{x, y\} = \left\{\frac12\begin{pmatrix}1\\ 1\end{pmatrix},\ \frac{1}{2\sqrt3}\begin{pmatrix}3\\ -1\end{pmatrix}\right\}$ is an orthonormal set!
$$\|x\|^2 = \tfrac14(1^2 + 3\cdot1^2) = 1, \qquad \|y\|^2 = \tfrac{1}{12}(3^2 + 3\cdot1^2) = 1, \qquad \langle x, y\rangle = \tfrac{1}{4\sqrt3}(3 - 3) = 0$$
However, with respect to the standard dot product, these vectors are not special:
$$x\cdot x = \tfrac12, \qquad y\cdot y = \tfrac56, \qquad x\cdot y = \tfrac{1}{2\sqrt3}$$
We have the same vector space $\mathbb{R}^2$, but different inner product spaces: $(\mathbb{R}^2, \langle\ ,\ \rangle) \neq (\mathbb{R}^2, \cdot)$.

The above is an example of a weighted inner product: choose weights $a_1, \ldots, a_n \in \mathbb{R}^+$ and define
$$\langle x, y\rangle = \sum_{j=1}^n a_jx_jy_j = a_1x_1y_1 + \cdots + a_nx_ny_n$$
It is a simple exercise to check that this defines an inner product on $\mathbb{R}^n$. In particular, $\mathbb{R}^n$ may be equipped with infinitely many distinct inner products!

More generally, a symmetric matrix $A \in M_n(\mathbb{R})$ is positive-definite if $x^TAx > 0$ for all non-zero $x \in \mathbb{R}^n$. It is straightforward to check that
$$\langle x, y\rangle := y^TAx$$
defines an inner product on $\mathbb{R}^n$. In fact all inner products on $\mathbb{R}^n$ arise in this fashion! The weighted inner products correspond to $A$ being diagonal (Euclidean space is $A = I$), but this is not required.
Example 2.4. The matrix $A = \begin{pmatrix}3 & 1\\ 1 & 1\end{pmatrix}$ is positive-definite and thus defines an inner product
$$\langle x, y\rangle = 3x_1y_1 + x_1y_2 + x_2y_1 + x_2y_2$$
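A small sketch of the matrix inner product $\langle x, y\rangle = y^TAx$ from Example 2.4, assuming numpy. Positive-definiteness is checked here via the eigenvalues (all eigenvalues of a symmetric positive-definite matrix are positive; compare Exercise 7 below); the test vectors are arbitrary illustrative choices.

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 1.]])
print(np.linalg.eigvalsh(A))     # both eigenvalues are positive

def ip(x, y):
    """Inner product <x, y> = y^T A x on R^2."""
    return y @ A @ x

x = np.array([1., -1.])
y = np.array([2.,  1.])
print(ip(x, y), ip(x, x))        # <x, y> and ||x||^2 = 2 > 0
```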
Lemma 2.5 (Basic properties). Let $V$ be an inner product space, let $x, y, z \in V$ and $\lambda \in \mathbb{F}$.
1. $\langle 0, x\rangle = 0$
2. $\|x\| = 0 \iff x = 0$
3. $\|\lambda x\| = |\lambda|\,\|x\|$
4. $\langle x, z\rangle = \langle y, z\rangle$ for all $z$ $\implies x = y$
5. (Cauchy–Schwarz inequality) $|\langle x, y\rangle| \le \|x\|\,\|y\|$ with equality if and only if $x, y$ are parallel
6. (Triangle Inequality) $\|x + y\| \le \|x\| + \|y\|$ with equality if and only if $x, y$ are parallel and point in the same direction

[Figure: the triangle inequality, a triangle with sides $x$, $y$ and $x + y$.]

Be careful with notation: $|\lambda|$ means the absolute value/modulus (of a scalar), while $\|x\|$ means the norm (of a vector).

In the real case, Cauchy–Schwarz allows us to define angle via $\cos\theta = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}$, since the right-hand side lies in the interval $[-1, 1]$. However, except in Euclidean $\mathbb{R}^2$ and $\mathbb{R}^3$, this notion is of limited use; orthogonality ($\langle x, y\rangle = 0$) and orthonormality are usually all we care about.
Proof. Parts 1–3 are exercises. For simplicity, we prove 5 and 6 only when $\mathbb{F} = \mathbb{R}$.

4. Let $z = x - y$, apply the linearity condition and part 2:
$$\langle x, z\rangle = \langle y, z\rangle \implies 0 = \langle x - y, z\rangle = \langle x - y, x - y\rangle = \|x - y\|^2 \implies x = y$$

5. If $y = 0$, the result is trivial. WLOG (and by part 3) we may assume $\|y\| = 1$; if the inequality holds for this, then it holds for all non-zero $y$ by parts 1 and 4. Now expand:
$$0 \le \|x - \langle x, y\rangle y\|^2 \qquad\text{(positive-definiteness)}$$
$$= \|x\|^2 + |\langle x, y\rangle|^2\|y\|^2 - \langle x, y\rangle\langle x, y\rangle - \langle x, y\rangle\langle y, x\rangle \qquad\text{(bilinearity)}$$
$$= \|x\|^2 - |\langle x, y\rangle|^2 \qquad\text{(symmetry)}$$
Taking square-roots establishes the inequality. By part 2, equality holds if and only if $x = \langle x, y\rangle y$, which is precisely when $x, y$ are linearly dependent.

6. We establish the squared result.
$$\bigl(\|x\| + \|y\|\bigr)^2 - \|x + y\|^2 = \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 - \|x\|^2 - \langle x, y\rangle - \langle y, x\rangle - \|y\|^2$$
$$= 2\bigl(\|x\|\,\|y\| - \langle x, y\rangle\bigr) \ge 2\bigl(\|x\|\,\|y\| - |\langle x, y\rangle|\bigr) \ge 0 \qquad\text{(Cauchy–Schwarz)}$$
Equality requires both equality in Cauchy–Schwarz ($x, y$ parallel) and that $\langle x, y\rangle \ge 0$; since $x, y$ are already parallel, this means that one is a non-negative multiple of the other.
Complex Inner Product Spaces

Definition 2.1 is already set up nicely when $\mathbb{F} = \mathbb{C}$. One subtle difference comes from how we expand linear combinations in the second slot.

Lemma 2.6. An inner product is a positive-definite, conjugate-symmetric, sesquilinear⁴ form: it is conjugate-linear (anti-linear) in the second slot,
$$\langle z, \lambda x + y\rangle = \overline{\lambda}\langle z, x\rangle + \langle z, y\rangle$$
The proof is very easy if you remember your complex conjugates; try it!

Warning! If you dabble in the dark arts of Physics, be aware that their convention⁵ is for an inner product to be conjugate-linear in the first entry and linear in the second!
Definition 2.7. The standard (Hermitian) inner product and norm on $\mathbb{C}^n$ are
$$\langle x, y\rangle = y^*x = \sum_{j=1}^n x_j\overline{y_j} = x_1\overline{y_1} + \cdots + x_n\overline{y_n}, \qquad \|x\| = \sqrt{\sum_{j=1}^n |x_j|^2}$$
where $y^* = \overline{y}^T$ is the conjugate-transpose of $y$ and $|x_j|$ is the modulus.

Weighted inner products may be defined just as in the real case:
$$\langle x, y\rangle = \sum_{j=1}^n a_jx_j\overline{y_j} = a_1x_1\overline{y_1} + \cdots + a_nx_n\overline{y_n}$$
Note that the weights $a_j$ must still be positive real numbers. We may similarly define inner products in terms of positive-definite matrices: $\langle x, y\rangle = y^*Ax$.
Definition 2.8. A matrix $A \in M_n(\mathbb{C})$ is Hermitian (self-adjoint) if $A^* = A$. It is moreover positive-definite if $x^*Ax > 0$ for all non-zero $x \in \mathbb{C}^n$.

The self-adjoint condition reduces to symmetry ($A^T = A$) if $A$ is a real matrix.
Example 2.9. It can be seen (Exercise 6) that $A = \begin{pmatrix}3 & -i\\ i & 3\end{pmatrix}$ is positive-definite, whence
$$\langle x, y\rangle = \begin{pmatrix}\overline{y_1} & \overline{y_2}\end{pmatrix}\begin{pmatrix}3 & -i\\ i & 3\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = 3x_1\overline{y_1} + ix_1\overline{y_2} - ix_2\overline{y_1} + 3x_2\overline{y_2}$$
defines an inner product on $\mathbb{C}^2$.

Almost all results in this chapter will be written for general inner product spaces, thus covering the real and complex cases simultaneously. If you don't feel confident with complex numbers, simply let $\mathbb{F} = \mathbb{R}$ and delete all complex conjugates at first read! Very occasionally a different proof will be required depending on the field. For simplicity, examples will more often use real inner products.

⁴ The prefix sesqui- means one-and-a-half; for instance a sesquicentenary is a 150-year anniversary.
⁵ The common Physics notation relates to ours via $\langle x\,|\,y\rangle = \langle y, x\rangle$.
Further Examples

As before, the field $\mathbb{F}$ must be either $\mathbb{R}$ or $\mathbb{C}$.

Definition 2.10 (Frobenius inner product). If $A, B \in M_{m\times n}(\mathbb{F})$, define
$$\langle A, B\rangle = \operatorname{tr}(B^*A)$$
where $\operatorname{tr}$ is the trace of an $n\times n$ matrix; this makes $M_{m\times n}(\mathbb{F})$ into an inner product space.

This isn't really a new example: if we map $A \mapsto \mathbb{F}^{mn}$ by stacking the columns of $A$, then the Frobenius inner product is the standard inner product in disguise.
Example 2.11. In $M_{3\times2}(\mathbb{C})$,
$$\left\langle\begin{pmatrix}1 & i\\ 2-i & 0\\ 0 & 1\end{pmatrix}, \begin{pmatrix}0 & 7\\ -1 & -2i\\ 3-2i & 4\end{pmatrix}\right\rangle = \operatorname{tr}\left(\begin{pmatrix}0 & -1 & 3+2i\\ 7 & 2i & 4\end{pmatrix}\begin{pmatrix}1 & i\\ 2-i & 0\\ 0 & 1\end{pmatrix}\right) = \operatorname{tr}\begin{pmatrix}-2+i & 3+2i\\ 9+4i & 4+7i\end{pmatrix} = -2+i+4+7i = 2+8i$$
$$\left\|\begin{pmatrix}1 & i\\ 2-i & 0\\ 0 & 1\end{pmatrix}\right\|^2 = \operatorname{tr}\left(\begin{pmatrix}1 & 2+i & 0\\ -i & 0 & 1\end{pmatrix}\begin{pmatrix}1 & i\\ 2-i & 0\\ 0 & 1\end{pmatrix}\right) = \operatorname{tr}\begin{pmatrix}6 & i\\ -i & 2\end{pmatrix} = 8$$
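The Frobenius inner product is easy to check numerically. The sketch below (assuming numpy) uses the matrices as reconstructed in the example above.

```python
import numpy as np

A = np.array([[1, 1j], [2 - 1j, 0], [0, 1]])
B = np.array([[0, 7], [-1, -2j], [3 - 2j, 4]])

frob = lambda X, Y: np.trace(Y.conj().T @ X)   # <X, Y> = tr(Y* X)
print(frob(A, B))        # (2+8j)
print(frob(A, A).real)   # 8.0, i.e. ||A||^2
```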
Definition 2.12 ($L^2$ inner product). Given a real interval $[a, b]$, the function
$$\langle f, g\rangle := \int_a^b f(t)\overline{g(t)}\,dt$$
defines an inner product on the space $C[a, b]$ of continuous functions $f : [a, b] \to \mathbb{F}$.

With careful restriction, this works even for infinite intervals and a larger class of functions.⁶ Verifying the required properties is straightforward if you know a little analysis; for instance continuity allows us to conclude
$$\|f\|^2 = \int_a^b |f(x)|^2\,dx = 0 \iff f(x) \equiv 0$$
This is our first example of an infinite-dimensional inner product space.
Example 2.13. Let $f(x) = x$ and $g(x) = x^2$; these lie in the inner product space $C[-1, 1]$ with respect to the $L^2$ inner product.
$$\langle f, g\rangle = \int_{-1}^1 x^3\,dx = 0, \qquad \|f\|^2 = \int_{-1}^1 x^2\,dx = \frac23, \qquad \|g\|^2 = \int_{-1}^1 x^4\,dx = \frac25$$
With some simple scaling, we see that $\left\{\frac{1}{\|f\|}f,\ \frac{1}{\|g\|}g\right\} = \left\{\sqrt{\tfrac32}\,x,\ \sqrt{\tfrac52}\,x^2\right\}$ forms an orthonormal set.

⁶ For us, functions will always be continuous (often polynomials) on closed bounded intervals. The square-integrable functions and $L^2$-spaces for which the inner product is named are a more complicated business and beyond this course.
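The integrals in Example 2.13 can be approximated numerically. Here is a minimal sketch assuming numpy and scipy are available; the quadrature routine stands in for the exact $L^2$ inner product on $C[-1, 1]$.

```python
import numpy as np
from scipy.integrate import quad

def ip(f, g, a=-1.0, b=1.0):
    """Approximate L^2 inner product of real-valued f, g on [a, b]."""
    value, _ = quad(lambda t: f(t) * g(t), a, b)
    return value

f = lambda x: x
g = lambda x: x**2

print(ip(f, g))   # ~0.0     (orthogonal)
print(ip(f, f))   # ~0.6667  (= 2/3)
print(ip(g, g))   # ~0.4     (= 2/5)
```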
Definition 2.14 ($\ell^2$ inner product). A sequence $(x_n)$ is square-summable if $\sum_{n=1}^\infty |x_n|^2 < \infty$. These sequences form a vector space on which we can define an inner product⁷
$$\bigl\langle(x_n), (y_n)\bigr\rangle = \sum_{n=1}^\infty x_n\overline{y_n}$$
In essence we've taken the standard inner product on $\mathbb{F}^n$ and let $n \to \infty$! This example, and its $L^2$ cousin, are the prototypical Hilbert spaces, which have great application to differential equations, signal processing, etc. Since a rigorous discussion requires a significant amount of analysis (convergence of series, completeness, integrability), these objects are generally beyond the course.

Our final example of an inner product is a useful, and hopefully obvious, hack to which we shall repeatedly appeal in examples.

Lemma 2.15. Let $V$ be a vector space over $\mathbb{R}$ or $\mathbb{C}$. If $\beta$ is a basis of $V$, then there exists exactly one inner product on $V$ for which $\beta$ is an orthonormal set.
Exercises 2.1  1. Evaluate the inner product of the given vectors.
(a) $x = \begin{pmatrix}1\\ 2\\ 1\end{pmatrix}$, $y = \begin{pmatrix}1\\ 1\\ 1\end{pmatrix}$ where $\langle x, y\rangle = 2x_1y_1 + 3x_2y_2 + x_3y_3$
(b) $x = \begin{pmatrix}1\\ 2i\end{pmatrix}$, $y = \begin{pmatrix}5i\\ 4\end{pmatrix}$ where $\langle x, y\rangle$ is the standard Hermitian inner product on $\mathbb{C}^2$
(c) $x = \begin{pmatrix}1\\ 2i\end{pmatrix}$, $y = \begin{pmatrix}5i\\ 4\end{pmatrix}$ where $\langle x, y\rangle = y^*\begin{pmatrix}2 & i\\ -i & 2\end{pmatrix}x$
(d) $f(x) = x - 1$, $g(x) = x + 1$ where $\langle f, g\rangle$ is the $L^2$ inner product on $C[0, 2]$

2. Suppose $\langle x, y\rangle = \langle x, z\rangle$ for all $x$. Prove that $y = z$.

3. For each $z \in V$, the linearity condition says that the map $T_z : V \to \mathbb{F}$ defined by $T_z(x) = \langle x, z\rangle$ is linear ($T_z$ is an element of the dual space $V^*$). What, if anything, can you say about the function $U_z : x \mapsto \langle z, x\rangle$?

4. Define $\langle x, y\rangle := \sum_{j=1}^n x_jy_j$ on $\mathbb{C}^n$. Is this an inner product? Which of the properties (a), (b), (c) from Definition 2.1 does it satisfy?

5. (a) Verify that the matrix in Example 2.4 is positive-definite and that $\langle x, y\rangle = 3x_1y_1 + x_1y_2 + x_2y_1 + x_2y_2$ therefore defines an inner product on $\mathbb{R}^2$.
(Hint: Try to write $\|x\|^2$ as a sum of squares. . . )
(b) Let $x = \begin{pmatrix}1\\ 1\end{pmatrix}$. With respect to the inner product in part (a), find a non-zero unit vector $y$ which is orthogonal to $x$.

6. By multiplying out $|x_1 - ix_2|^2$, show that the matrix $\begin{pmatrix}3 & -i\\ i & 3\end{pmatrix}$ is positive-definite.
(Hint: recall that $|a|^2 = a\overline{a}$ for complex numbers!)

⁷ Neither of these facts are obvious; in particular, we'd need to see that the sum of two square-summable sequences is also square-summable.
7. Show that every eigenvalue of a positive-definite matrix is positive.

8. Prove parts 1, 2 and 3 of Lemma 2.5.

9. Let $V$ be an inner product space; prove Pythagoras' Theorem:
$$\text{If } \langle x, y\rangle = 0, \text{ then } \|x + y\|^2 = \|x\|^2 + \|y\|^2$$

10. Use basic algebra to prove the Cauchy–Schwarz inequality for vectors $x = \begin{pmatrix}a\\ b\end{pmatrix}$ and $y = \begin{pmatrix}c\\ d\end{pmatrix}$ in $\mathbb{R}^2$ with the standard (dot) product.

11. Prove the Cauchy–Schwarz and triangle inequalities for a complex inner product space. What has to change compared to the proof of Lemma 2.5?

12. Prove the polarization identities:
(a) In any real inner product space: $\langle x, y\rangle = \frac14\|x + y\|^2 - \frac14\|x - y\|^2$
(b) In any complex inner product space: $\langle x, y\rangle = \frac14\sum_{k=1}^4 i^k\bigl\|x + i^ky\bigr\|^2$
If you know the length of every vector, then you know the inner product!
13. Prove that $\displaystyle\int_0^2\frac{\sqrt{x}}{x+1}\,dx \le \frac{2}{\sqrt3}$. (Hint: use Cauchy–Schwarz)
14. Let $m \in \mathbb{Z}$ and consider the complex-valued function $f_m(x) = \frac{1}{\sqrt{2\pi}}e^{imx}$. If $\langle\ ,\ \rangle$ is the $L^2$ inner product on $C[-\pi, \pi]$, prove that $\{f_m : m \in \mathbb{Z}\}$ is an orthonormal set.
This example is central to the study of Fourier series.
(Hint: If complex functions are scary, use Euler's formula $e^{imx} = \cos mx + i\sin mx$ and work with the real-valued functions $\cos mx$ and $\sin mx$. The difficulty is that you then need integration by parts. . . )

15. Let $\langle\ ,\ \rangle$ be an inner product on $\mathbb{F}^n$ (recall that $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$). Define the matrix $A \in M_n(\mathbb{F})$ by $A_{jk} = \langle e_k, e_j\rangle$, where $\{e_1, \ldots, e_n\}$ is the standard basis. Verify that $A$ is the matrix of the inner product:
$$\forall x, y \in \mathbb{F}^n, \qquad \langle x, y\rangle = y^*Ax$$
In particular,
• $A$ is a Hermitian/self-adjoint matrix: $A^* = A$ (if $\mathbb{F} = \mathbb{R}$ this is simply symmetric);
• $A$ is positive-definite: for all non-zero $x \in \mathbb{F}^n$, $x^*Ax > 0$.
More generally, if $\beta = \{v_1, \ldots, v_n\}$ is a basis then $A_{jk} = \langle v_k, v_j\rangle$ defines the matrix of the inner product with respect to $\beta$: $\langle x, y\rangle = [y]_\beta^*A[x]_\beta$.
2.2 Orthogonal Sets and the Gram–Schmidt Process

We start with a simple definition, relating subspaces to orthogonality.

Definition 2.16. Let $U$ be a subspace of an inner product space $V$. The orthogonal complement $U^\perp$ is the set
$$U^\perp = \{x \in V : \forall u \in U,\ \langle x, u\rangle = 0\}$$

It is easy to check that $U^\perp$ is itself a subspace of $V$ and that $U \cap U^\perp = \{0\}$. It can moreover be seen that $U \subseteq (U^\perp)^\perp$, though equality need not hold in infinite dimensions (see Exercise 7).

Example 2.17. $U = \operatorname{Span}\left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}, \begin{pmatrix}0\\ 1\\ -3\end{pmatrix}\right\} \le \mathbb{R}^3$ has orthogonal complement $U^\perp = \operatorname{Span}\left\{\begin{pmatrix}0\\ 3\\ 1\end{pmatrix}\right\}$.

As in the example, we often have a direct sum decomposition $V = U \oplus U^\perp$: otherwise said,
$$\forall x \in V,\ \exists\text{ unique } u \in U,\ w \in U^\perp \text{ such that } x = u + w$$
In such a case, we'll call $u = \pi_U(x)$ and $w = \pi_{U^\perp}(x)$ the orthogonal projections of $x$ onto $U$ and $U^\perp$ respectively. As our first result shows, these are easy to compute whenever $U$ has a finite orthogonal basis.

Theorem 2.18. Let $V$ be an inner product space and let $U = \operatorname{Span}\beta$ where $\beta = \{u_1, \ldots, u_n\}$ is an orthogonal set of non-zero vectors;
$$\langle u_j, u_k\rangle = \begin{cases}0 & \text{if } j \neq k\\ \|u_j\|^2 \neq 0 & \text{if } j = k\end{cases}$$
Then:
1. $\beta$ is a basis of $U$ and each $x \in U$ has unique representation
$$x = \sum_{j=1}^n\frac{\langle x, u_j\rangle}{\|u_j\|^2}\,u_j \qquad(*)$$
This simplifies to $x = \sum\langle x, u_j\rangle u_j$ if $\beta$ is an orthonormal set.
2. $V = U \oplus U^\perp$. For any $x \in V$, we may write $x = u + w$ where
$$u = \sum_{j=1}^n\frac{\langle x, u_j\rangle}{\|u_j\|^2}\,u_j \in U \quad\text{and}\quad w = x - u \in U^\perp$$

Observe that $(*)$ essentially calculates the co-ordinate vector $[x]_\beta \in \mathbb{F}^n$. Recalling how unpleasant such calculations have been in the past, often requiring large matrix inversions, we immediately see the power of inner products and orthogonal bases.
Proof. 1. Since $\beta$ spans $U$, a given $x \in U$ may be written
$$x = \sum_{k=1}^n a_ku_k$$
for some scalars $a_k$. The orthogonality of $\beta$ recovers the required expression for $a_j$:
$$\langle x, u_j\rangle = \sum_{k=1}^n a_k\langle u_k, u_j\rangle = a_j\|u_j\|^2$$
Finally, let $x = 0$ to see that $\beta$ is linearly independent.

2. Clearly $u \in U$. For each $u_k$, the orthogonality of $\beta$ tells us that
$$\langle w, u_k\rangle = \langle x - u, u_k\rangle = \langle x, u_k\rangle - \sum_{j=1}^n\frac{\langle x, u_j\rangle}{\|u_j\|^2}\langle u_j, u_k\rangle = \langle x, u_k\rangle - \langle x, u_k\rangle = 0$$
Since $w$ is orthogonal to a basis of $U$ it is orthogonal to any element of $U$; we conclude that $w \in U^\perp$. Finally, $U \cap U^\perp = \{0\}$ forces the uniqueness of the decomposition $x = u + w$, whence $V = U \oplus U^\perp$.
Examples 2.19. 1. Consider the standard orthonormal basis $\beta = \{e_1, e_2\}$ of $\mathbb{R}^2$. For any $x = \begin{pmatrix}x_1\\ x_2\end{pmatrix}$, we easily check that
$$\sum_{j=1}^2\langle x, e_j\rangle e_j = x_1e_1 + x_2e_2 = x$$

2. In $\mathbb{R}^3$, $\beta = \{u_1, u_2, u_3\} = \left\{\begin{pmatrix}1\\ 2\\ 3\end{pmatrix}, \begin{pmatrix}2\\ -1\\ 0\end{pmatrix}, \begin{pmatrix}3\\ 6\\ -5\end{pmatrix}\right\}$ is an orthogonal set and thus a basis. We compute the co-ordinates of $x = \begin{pmatrix}7\\ 4\\ 2\end{pmatrix}$ with respect to $\beta$:
$$x = \sum_{j=1}^3\frac{\langle x, u_j\rangle}{\|u_j\|^2}\,u_j = \frac{7 + 8 + 6}{1 + 4 + 9}\begin{pmatrix}1\\ 2\\ 3\end{pmatrix} + \frac{14 - 4 + 0}{4 + 1 + 0}\begin{pmatrix}2\\ -1\\ 0\end{pmatrix} + \frac{21 + 24 - 10}{9 + 36 + 25}\begin{pmatrix}3\\ 6\\ -5\end{pmatrix}$$
$$= \frac32\begin{pmatrix}1\\ 2\\ 3\end{pmatrix} + 2\begin{pmatrix}2\\ -1\\ 0\end{pmatrix} + \frac12\begin{pmatrix}3\\ 6\\ -5\end{pmatrix} \implies [x]_\beta = \begin{pmatrix}3/2\\ 2\\ 1/2\end{pmatrix}$$
Compare this with the painfully slow augmented matrix method for finding co-ordinates!
3. Revisiting Example 2.17, let $x = \begin{pmatrix}1\\ 1\\ 1\end{pmatrix}$. Since $\beta = \left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}, \begin{pmatrix}0\\ 1\\ -3\end{pmatrix}\right\}$ is an orthogonal basis of $U$, we observe that
$$\pi_U(x) = 1\begin{pmatrix}1\\ 0\\ 0\end{pmatrix} + \frac{-2}{10}\begin{pmatrix}0\\ 1\\ -3\end{pmatrix} = \frac15\begin{pmatrix}5\\ -1\\ 3\end{pmatrix}, \qquad \pi_{U^\perp}(x) = x - \pi_U(x) = \frac25\begin{pmatrix}0\\ 3\\ 1\end{pmatrix}$$
are the orthogonal projections corresponding to $\mathbb{R}^3 = U \oplus U^\perp$.
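The projection formula of Theorem 2.18 is short to code. Below is a sketch (assuming numpy) for Example 2.19.3, using the orthogonal basis of $U$ as reconstructed above.

```python
import numpy as np

u1 = np.array([1., 0.,  0.])
u2 = np.array([0., 1., -3.])
x  = np.array([1., 1.,  1.])

proj_U = sum((x @ u) / (u @ u) * u for u in (u1, u2))   # sum of <x,u>/||u||^2 * u
proj_perp = x - proj_U
print(proj_U)      # [ 1.  -0.2  0.6] = (1/5)(5, -1, 3)
print(proj_perp)   # [ 0.   1.2  0.4] = (2/5)(0, 3, 1)
```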
The Gram–Schmidt Process

Theorem 2.18 tells us how to compute the orthogonal projections corresponding to $V = U \oplus U^\perp$, provided $U$ has a finite, orthogonal basis. Given how useful such bases are, our next goal is to see that such exist for any finite-dimensional subspace. Helpfully there exists a constructive algorithm.

Theorem 2.20 (Gram–Schmidt). Suppose $S = \{s_1, \ldots, s_n\}$ is a linearly independent subset of an inner product space $V$. Construct a sequence of vectors $u_1, \ldots, u_n$ inductively:
• Choose $u_1 = a_1s_1$ where $a_1 \neq 0$.
• For each $k \ge 2$, choose
$$u_k = a_k\left(s_k - \sum_{j=1}^{k-1}\frac{\langle s_k, u_j\rangle}{\|u_j\|^2}\,u_j\right) \quad\text{where } a_k \neq 0 \qquad(**)$$
Then $\beta := \{u_1, \ldots, u_n\}$ is an orthogonal basis of $\operatorname{Span}S$.

The purpose of the scalars $a_k$ is to give you some freedom; choose them to avoid unpleasant fractions! If you want a set of orthonormal vectors, it is easier to scale everything after the algorithm is complete. Indeed, by taking $S$ to be a basis of $V$ and normalizing the resulting $\beta$, we conclude:
Corollary 2.21. Every finite-dimensional inner product space has an orthonormal basis.
Example 2.22. $S = \{s_1, s_2, s_3\} = \left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}, \begin{pmatrix}2\\ 1\\ -3\end{pmatrix}, \begin{pmatrix}1\\ 1\\ 1\end{pmatrix}\right\}$ is a linearly independent subset of $\mathbb{R}^3$.

1. Choose $u_1 = s_1 = \begin{pmatrix}1\\ 0\\ 0\end{pmatrix} \implies \|u_1\|^2 = 1$

2. $s_2 - \dfrac{\langle s_2, u_1\rangle}{\|u_1\|^2}u_1 = \begin{pmatrix}2\\ 1\\ -3\end{pmatrix} - 2\begin{pmatrix}1\\ 0\\ 0\end{pmatrix} = \begin{pmatrix}0\\ 1\\ -3\end{pmatrix}$: choose $u_2 = \begin{pmatrix}0\\ 1\\ -3\end{pmatrix} \implies \|u_2\|^2 = 10$

3. $s_3 - \dfrac{\langle s_3, u_1\rangle}{\|u_1\|^2}u_1 - \dfrac{\langle s_3, u_2\rangle}{\|u_2\|^2}u_2 = \begin{pmatrix}1\\ 1\\ 1\end{pmatrix} - \begin{pmatrix}1\\ 0\\ 0\end{pmatrix} + \frac15\begin{pmatrix}0\\ 1\\ -3\end{pmatrix} = \frac25\begin{pmatrix}0\\ 3\\ 1\end{pmatrix}$: choose $u_3 = \begin{pmatrix}0\\ 3\\ 1\end{pmatrix}$

The orthogonality of $\beta = \{u_1, u_2, u_3\}$ is clear. It is now trivial to observe that $\left\{u_1,\ \frac{1}{\sqrt{10}}u_2,\ \frac{1}{\sqrt{10}}u_3\right\}$ is an orthonormal basis of $\mathbb{R}^3$.
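A minimal Gram–Schmidt sketch (assuming numpy), checked against Example 2.22. Here every scalar $a_k$ is taken to be 1, so the output may differ from the hand computation by a non-zero rescaling of each vector.

```python
import numpy as np

def gram_schmidt(S):
    """Return an orthogonal basis of Span S (S assumed linearly independent)."""
    basis = []
    for s in S:
        u = s - sum((s @ u) / (u @ u) * u for u in basis)
        basis.append(u)
    return basis

S = [np.array([1., 0., 0.]),
     np.array([2., 1., -3.]),
     np.array([1., 1., 1.])]
for u in gram_schmidt(S):
    print(u)      # (1,0,0), (0,1,-3), (0, 1.2, 0.4) = (2/5)(0,3,1)
```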
Proof of Theorem 2.20. For each $k \le n$, define $S_k = \{s_1, \ldots, s_k\}$ and $\beta_k = \{u_1, \ldots, u_k\}$. We prove by induction that each $\beta_k$ is an orthogonal set of non-zero vectors and that $\operatorname{Span}\beta_k = \operatorname{Span}S_k$. The Theorem is then the terminal case $k = n$.

(Base case $k = 1$) Certainly $\beta_1 = \{a_1s_1\}$ is an orthogonal set and $\operatorname{Span}\beta_1 = \operatorname{Span}S_1$.

(Induction step) Fix $k \ge 2$, assume $\beta_{k-1}$ is an orthogonal set of non-zero vectors and that $\operatorname{Span}\beta_{k-1} = \operatorname{Span}S_{k-1}$. By Theorem 2.18, $u_k \in (\operatorname{Span}\beta_{k-1})^\perp$. We also see that $u_k \neq 0$, for if not,
$$(**) \implies s_k \in \operatorname{Span}\beta_{k-1} = \operatorname{Span}S_{k-1}$$
and $S$ would be linearly dependent. It follows that $\beta_k$ is an orthogonal set of non-zero vectors.
Moreover, $s_k \in \operatorname{Span}\beta_k \implies \operatorname{Span}S_k \subseteq \operatorname{Span}\beta_k$. Since these spaces have the same (finite) dimension $k$, we conclude that $\operatorname{Span}\beta_k = \operatorname{Span}S_k$.

By induction, $\beta$ is an orthogonal, non-zero spanning set for $\operatorname{Span}S$; by Theorem 2.18, it is a basis.
Example 2.23. This time we work in the space of polynomials $P(\mathbb{R})$ equipped with the $L^2$ inner product $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$ on the interval $[0, 1]$. Let $S = \{1, x, x^2\}$ and apply the algorithm:

1. Choose $f_1(x) = 1 \implies \|f_1\|^2 = \int_0^1 1\,dx = 1$

2. $x - \dfrac{\langle x, f_1\rangle}{\|f_1\|^2}f_1 = x - \int_0^1 x\,dx = x - \frac12$. We choose $f_2(x) = 2x - 1$, with $\|f_2\|^2 = \int_0^1(2x - 1)^2\,dx = \frac13$

3. $x^2 - \dfrac{\langle x^2, f_1\rangle}{\|f_1\|^2}f_1 - \dfrac{\langle x^2, f_2\rangle}{\|f_2\|^2}f_2 = x^2 - \int_0^1 x^2\,dx - \dfrac{\int_0^1 x^2(2x - 1)\,dx}{1/3}(2x - 1) = x^2 - x + \frac16$. We choose $f_3(x) = 6x^2 - 6x + 1$ with $\|f_3\|^2 = \int_0^1(6x^2 - 6x + 1)^2\,dx = \frac15$

It follows that $\operatorname{Span}S$ has an orthonormal basis
$$\beta = \left\{1,\ \sqrt3(2x - 1),\ \sqrt5(6x^2 - 6x + 1)\right\}$$
This example can be extended to arbitrary degree since the countable set $\{1, x, x^2, \ldots\}$ is a basis of $P(\mathbb{R})$. Indeed this shows that $(P(\mathbb{R}), \langle\ ,\ \rangle)$ has an orthonormal basis.
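The same Gram–Schmidt loop works verbatim with the $L^2$ inner product in place of the dot product. Here is a sketch assuming sympy is available for exact integration; it reproduces the polynomials of Example 2.23 up to the scaling freedom in the $a_k$.

```python
import sympy as sp

x = sp.symbols('x')
ip = lambda f, g: sp.integrate(f * g, (x, 0, 1))   # L^2 inner product on [0, 1]

S = [sp.Integer(1), x, x**2]
basis = []
for s in S:
    u = sp.expand(s - sum(ip(s, u) / ip(u, u) * u for u in basis))
    basis.append(u)

print(basis)   # [1, x - 1/2, x**2 - x + 1/6], proportional to {1, 2x-1, 6x^2-6x+1}
```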
Gram–Schmidt also shows that our earlier discussion of orthogonal projections is generic.

Corollary 2.24. If $U$ is a finite-dimensional subspace of an inner product space $V$, then $V = U \oplus U^\perp$ and the orthogonal projections may be computed as in Theorem 2.18.

If $U$ is an infinite-dimensional subspace of $V$, then we need not have $V = U \oplus U^\perp$ and the orthogonal projections might not be well-defined (see, for example, Exercises 7 and 8). Instead, if $\beta$ is an orthonormal basis of $U$, it is common to describe the coefficients $\langle x, u\rangle$ for each $u \in \beta$ as the Fourier coefficients, and the infinite sum
$$\sum_{u\in\beta}\langle x, u\rangle u$$
as the Fourier series of $x$, provided the sum converges.
Exercises 2.2  1. Apply Gram–Schmidt to obtain an orthogonal basis $\beta$ for $\operatorname{Span}S$. Then obtain the co-ordinate representation (Fourier coefficients) of the given vector with respect to $\beta$.
(a) $S = \left\{\begin{pmatrix}1\\ 1\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 1\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 0\\ 1\end{pmatrix}\right\} \subseteq \mathbb{R}^3$, $x = \begin{pmatrix}1\\ 0\\ 1\end{pmatrix}$
(b) $S = \left\{\begin{pmatrix}3 & 5\\ 1 & 1\end{pmatrix}, \begin{pmatrix}1 & 9\\ 5 & 1\end{pmatrix}, \begin{pmatrix}7 & 17\\ 2 & 6\end{pmatrix}\right\} \subseteq M_2(\mathbb{R})$, $X = \begin{pmatrix}1 & 27\\ 4 & 8\end{pmatrix}$ (use the Frobenius product)
(c) $S = \left\{\begin{pmatrix}1\\ i\\ 0\end{pmatrix}, \begin{pmatrix}0\\ 1\\ i\end{pmatrix}, \begin{pmatrix}0\\ 0\\ 1\end{pmatrix}\right\} \subseteq \mathbb{C}^3$ with $x = \begin{pmatrix}1\\ 1\\ 1\end{pmatrix}$
(d) $S = \{1, x, x^2\}$ with $\langle f, g\rangle = \int_{-1}^1 f(x)g(x)\,dx$ and $f(x) = x^2$.

Important! You'll likely need much more practice than this to get comfortable with Gram–Schmidt; make up your own problems!
2. Let $S = \{s_1, s_2\} = \left\{\begin{pmatrix}1\\ 0\\ 3\end{pmatrix}, \begin{pmatrix}2\\ 1\\ 0\end{pmatrix}\right\}$ and $U = \operatorname{Span}S \le \mathbb{R}^3$. Find $\pi_U(x)$ if $x = \begin{pmatrix}3\\ 1\\ 2\end{pmatrix}$.
(Hint: First apply Gram–Schmidt)

3. Find the orthogonal complement to $U = \operatorname{Span}\{x^2\} \le P_2(\mathbb{R})$ with respect to the inner product $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$.
4. Let $T \in \mathcal{L}(V, W)$ where $V, W$ are inner product spaces with orthonormal bases $\beta = \{v_1, \ldots, v_n\}$ and $\gamma = \{w_1, \ldots, w_m\}$ respectively. Prove that the matrix $A = [T]^\gamma_\beta \in M_{m\times n}(\mathbb{F})$ of $T$ with respect to these bases has $jk$th entry
$$A_{jk} = \langle T(v_k), w_j\rangle$$

5. Suppose that $\beta$ is an orthonormal basis of an $n$-dimensional inner product space $V$. Prove that, $\forall x, y \in V$,
$$\langle x, y\rangle = [y]_\beta^*\,[x]_\beta$$
Otherwise said, the co-ordinate isomorphism $\varphi_\beta : V \to \mathbb{F}^n$ defined by $\varphi_\beta(x) = [x]_\beta$ is an isomorphism of inner product spaces, where we use the standard inner product on $\mathbb{F}^n$.

6. Let $U$ be a subspace of an inner product space $V$. Prove the following:
(a) $U^\perp$ is a subspace of $V$.
(b) $U \cap U^\perp = \{0\}$
(c) $U \subseteq (U^\perp)^\perp$
(d) If $V = U \oplus U^\perp$, then $U = (U^\perp)^\perp$ (this is always the case when $\dim U < \infty$)
7. Let $\ell^2$ be the set of square-summable sequences of real numbers (Definition 2.14). Consider the sequences $u_1, u_2, u_3, \ldots$, where $u_j$ is the zero sequence except for a single 1 in the $j$th entry. For instance,
$$u_4 = (0, 0, 0, 1, 0, 0, 0, 0, \ldots)$$
(a) Let $U = \operatorname{Span}\{u_j : j \in \mathbb{N}\}$. Prove that $U^\perp$ contains only the zero sequence.
(b) Show that the sequence $y = \left(\frac1n\right)$ lies in $\ell^2$, but does not lie in $U$.
$U$ is therefore a proper subset of $(U^\perp)^\perp = \ell^2$ and $\ell^2 \neq U \oplus U^\perp$.

8. Recall Exercise 2.1.14 where we saw that the set $\beta = \{\frac{1}{\sqrt{2\pi}}e^{imx} : m \in \mathbb{Z}\}$ is orthonormal with respect to $\langle f, g\rangle = \int_{-\pi}^\pi f(t)\overline{g(t)}\,dt$.
(a) Show that the Fourier series of $f(x) = x$ is
$$F(x) := \sum_{m=-\infty}^\infty\left\langle x, \tfrac{1}{\sqrt{2\pi}}e^{imx}\right\rangle\tfrac{1}{\sqrt{2\pi}}e^{imx} = \sum_{m=1}^\infty\frac{2(-1)^{m+1}}{m}\sin mx$$
(b) Briefly explain why the Fourier series is not an element of $\operatorname{Span}\beta$.
(c) Sketch a few of the Fourier approximations (sum up to $m = 5$ or $7$. . . ) and observe, when extended to $\mathbb{R}$, how they approximate a discontinuous periodic function.

9. (Hard) Much of Theorem 2.18 remains true, with suitable modifications, even if $\beta$ is an infinite set. Restate and prove as much as you can, and identify the false part(s).
2.3 The Adjoint of a Linear Operator

Recall how the standard inner product on $\mathbb{F}^n$ may be written in terms of the conjugate-transpose
$$\langle x, y\rangle = y^*x = \overline{y}^Tx$$
We start by inserting a matrix into this expression and interpreting in two different ways. Suppose $A \in M_{m\times n}(\mathbb{F})$, $v \in \mathbb{F}^n$ and $w \in \mathbb{F}^m$, then
$$\underbrace{\langle A^*w, v\rangle}_{\text{in }\mathbb{F}^n} = v^*(A^*w) = (v^*A^*)w = (Av)^*w = \underbrace{\langle w, Av\rangle}_{\text{in }\mathbb{F}^m} \qquad(\dagger)$$

Example 2.25. As a sanity check, let $A = \begin{pmatrix}1 & 2\\ 0 & 3\end{pmatrix} \in M_2(\mathbb{R})$, $w = \begin{pmatrix}x\\ y\end{pmatrix}$ and $v = \begin{pmatrix}p\\ q\end{pmatrix}$. Then,
$$\left\langle A^T\begin{pmatrix}x\\ y\end{pmatrix}, \begin{pmatrix}p\\ q\end{pmatrix}\right\rangle = \left\langle\begin{pmatrix}1 & 0\\ 2 & 3\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}, \begin{pmatrix}p\\ q\end{pmatrix}\right\rangle = \left\langle\begin{pmatrix}x\\ 2x + 3y\end{pmatrix}, \begin{pmatrix}p\\ q\end{pmatrix}\right\rangle = xp + (2x + 3y)q$$
$$\left\langle\begin{pmatrix}x\\ y\end{pmatrix}, A\begin{pmatrix}p\\ q\end{pmatrix}\right\rangle = \left\langle\begin{pmatrix}x\\ y\end{pmatrix}, \begin{pmatrix}1 & 2\\ 0 & 3\end{pmatrix}\begin{pmatrix}p\\ q\end{pmatrix}\right\rangle = \left\langle\begin{pmatrix}x\\ y\end{pmatrix}, \begin{pmatrix}p + 2q\\ 3q\end{pmatrix}\right\rangle = x(p + 2q) + 3yq$$
Note how the inner products are evaluated on different spaces. At the level of linear maps this is a relationship between $L_A \in \mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$ and $L_{A^*} \in \mathcal{L}(\mathbb{F}^m, \mathbb{F}^n)$, one that is easily generalizable.
Definition 2.26. Let $T \in \mathcal{L}(V, W)$ where $V, W$ are inner product spaces over the same field $\mathbb{F}$. The adjoint of $T$ is a function $T^* : W \to V$ (read '$T$-star') satisfying
$$\forall v \in V,\ w \in W, \qquad \langle T^*(w), v\rangle = \langle w, T(v)\rangle$$
Note that the first inner product is computed within $V$ and the second within $W$.

The adjoint effectively extends the conjugate-transpose to linear maps. We now use the same notation for three objects, so be careful!
• If $A$ is a real or complex matrix, then $A^* = \overline{A}^T$ is its conjugate-transpose.
• If $T$ is a linear map, then $T^*$ is its adjoint.
• If $V$ is a vector space, then $V^* = \mathcal{L}(V, \mathbb{F})$ is its dual space.
Thankfully the two notations line up nicely, as part 3 of our first result shows.

Theorem 2.27 (Basic Properties). 1. If an adjoint exists,⁸ then it is unique and linear.
2. If $T$ and $S$ have adjoints, then
$$(T^*)^* = T, \qquad (TS)^* = S^*T^*, \qquad (\lambda T + S)^* = \overline{\lambda}T^* + S^*$$
3. Suppose $V, W$ are finite-dimensional with orthonormal bases $\beta, \gamma$ respectively. Then the matrix of the adjoint of $T \in \mathcal{L}(V, W)$ is the conjugate-transpose of the original: $[T^*]^\beta_\gamma = \bigl([T]^\gamma_\beta\bigr)^*$.

⁸ Existence of adjoints is trickier, so we postpone this a little: see Corollary 2.34 and Exercise 12.
Proof. 1. (Uniqueness) Suppose $T^*$ and $S^*$ are adjoints of $T$. Then
$$\langle T^*(x), y\rangle = \langle x, T(y)\rangle = \langle S^*(x), y\rangle$$
Since this holds for all $y$, Lemma 2.5 part 4 says that $\forall x$, $T^*(x) = S^*(x)$, whence $T^* = S^*$.

(Linearity) Simply translate across, use the linearity of $T$, and again appeal to Lemma 2.5:
$$\forall z, \quad \langle T^*(\lambda x + y), z\rangle = \langle\lambda x + y, T(z)\rangle = \lambda\langle x, T(z)\rangle + \langle y, T(z)\rangle = \lambda\langle T^*(x), z\rangle + \langle T^*(y), z\rangle = \langle\lambda T^*(x) + T^*(y), z\rangle$$
$$\implies T^*(\lambda x + y) = \lambda T^*(x) + T^*(y)$$

2. These may be proved similarly to part 1 and are left as an exercise.

3. By Exercise 2.2.4, the $jk$th entry of $[T^*]^\beta_\gamma$ is
$$\langle T^*(w_k), v_j\rangle = \langle w_k, T(v_j)\rangle = \overline{\langle T(v_j), w_k\rangle} = \overline{A_{kj}}$$
We revisit our motivating set-up $(\dagger)$ in the language of part 3. Suppose:
• $V = \mathbb{F}^n$ and $W = \mathbb{F}^m$ have standard orthonormal bases $\beta = \{e_1, \ldots, e_n\}$ and $\gamma = \{e_1, \ldots, e_m\}$.
• $T = L_A \in \mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$.
Since the matrix of $T$ with respect to the standard bases is simply $A$ itself, the theorem confirms our earlier observation that the adjoint of $L_A$ is left multiplication by the conjugate-transpose $A^*$:
$$[T^*]^\beta_\gamma = \bigl([T]^\gamma_\beta\bigr)^* = A^* = [L_{A^*}]^\beta_\gamma \implies T^* = (L_A)^* = L_{A^*}$$
Here is another straightforward example, this time using the standard inner products on $\mathbb{C}^2$ and $\mathbb{C}^3$.

Example 2.28. Let $T = L_A \in \mathcal{L}(\mathbb{C}^3, \mathbb{C}^2)$ where $A = \begin{pmatrix}i & 1 & 3\\ 2 & 1-i & 4+2i\end{pmatrix}$.
Plainly $A = [T]^\gamma_\beta$ with respect to $\beta = \{e_1, e_2, e_3\}$ and $\gamma = \{e_1, e_2\}$. We conclude that $T^* = L_{A^*}$:
$$[T^*]^\beta_\gamma = A^* = \begin{pmatrix}-i & 2\\ 1 & 1+i\\ 3 & 4-2i\end{pmatrix}$$
As a sanity check, multiply out a few examples of $\langle A^*w, v\rangle = \langle w, Av\rangle$; make sure you're comfortable with the fact that the left inner product is computed in $\mathbb{C}^3$ and the right in $\mathbb{C}^2$!

The Theorem tells us that every linear map $T \in \mathcal{L}(V, W)$ between finite-dimensional spaces has an adjoint and moreover how to compute it:
1. Choose orthonormal bases (these exist by Corollary 2.21) and find the matrix $[T]^\gamma_\beta$.
2. Take the conjugate-transpose $\bigl([T]^\gamma_\beta\bigr)^*$ and translate back to find $T^* \in \mathcal{L}(W, V)$.

The prospect of twice applying Gram–Schmidt and translating between linear maps and their matrices is unappealing; calculating this way can quickly become an enormous mess! In practice, it is often better to try a modified approach; see for instance part 2(b) of the next Example.
Examples 2.29. Let $T = \frac{d}{dx} \in \mathcal{L}(P_1(\mathbb{R}))$ be the derivative operator; $T(a + bx) = b$. We treat $P_1(\mathbb{R})$ as an inner product space in two ways.

1. Equip the inner product for which the standard basis $\epsilon = \{1, x\}$ is orthonormal. Then
$$[T]_\epsilon = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix} \implies [T^*]_\epsilon = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix} \implies T^*(a + bx) = ax$$

2. Equip the $L^2$ inner product $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$. As we saw in Example 2.23, the basis $\beta = \{f_1, f_2\} = \{1, 2x - 1\}$ is orthogonal with $\|f_1\| = 1$ and $\|f_2\| = \frac{1}{\sqrt3}$. We compute the adjoint of $T = \frac{d}{dx}$ in two different ways.

(a) The basis $\gamma = \{g_1, g_2\} = \bigl\{f_1, \sqrt3f_2\bigr\} = \bigl\{1, \sqrt3(2x - 1)\bigr\}$ is orthonormal. Observe that
$$T(g_1) = 0, \quad T(g_2) = 2\sqrt3 \implies [T]_\gamma = \begin{pmatrix}0 & 2\sqrt3\\ 0 & 0\end{pmatrix} \implies [T^*]_\gamma = \begin{pmatrix}0 & 0\\ 2\sqrt3 & 0\end{pmatrix}$$
$$\implies T^*(a + bx) = T^*\left(\left(a + \tfrac b2\right) + \tfrac{b}{2\sqrt3}\cdot\sqrt3(2x - 1)\right) = \left(a + \tfrac b2\right)\cdot2\sqrt3\,g_2 = 3(2a + b)(2x - 1)$$

(b) Use the orthogonal basis $\beta$ and the projection formula (Theorem 2.18). With $p(x) = a + bx$,
$$T^*(p) = \frac{\langle T^*(p), f_1\rangle}{\|f_1\|^2}f_1 + \frac{\langle T^*(p), f_2\rangle}{\|f_2\|^2}f_2 = \langle p, T(1)\rangle + \langle p, T(2x - 1)\rangle\cdot3(2x - 1)$$
$$= \langle p, 0\rangle + 3\langle p, 2\rangle(2x - 1) = 3\left(\int_0^1 2(a + bx)\,dx\right)(2x - 1) = 3(2a + b)(2x - 1)$$
Note the advantages here: no square roots and no need to change basis at the end!

The calculations for the second example were much nastier, even though we were already in possession of an orthogonal basis. The crucial point is that the two examples produce different maps $T^*$: the adjoint depends on the inner product!
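One can check the defining adjoint property symbolically. The sketch below (assuming sympy) verifies $\langle T^*(p), q\rangle = \langle p, T(q)\rangle$ for the $L^2$ case of Example 2.29.2, with arbitrary $p = a + bx$ and $q = c + dx$.

```python
import sympy as sp

x, a, b, c, d = sp.symbols('x a b c d')
ip = lambda f, g: sp.integrate(f * g, (x, 0, 1))   # L^2 inner product on [0, 1]

p = a + b*x
q = c + d*x
T_q = sp.diff(q, x)                      # T(q) = q'
Tstar_p = 3*(2*a + b)*(2*x - 1)          # the formula derived above

print(sp.simplify(ip(Tstar_p, q) - ip(p, T_q)))   # 0, as required
```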
Why should we care about adjoints?
Adjoints might seem merely to be an abstraction of something simple (transposes) for its own sake.
A convincing explanation of why adjoints are useful takes a lot of work; here is a short version.
Given a linear map T L(V) on an inner product space, we now have two desirable types of basis.
1. Eigenbasis: diagonalizes T.
2. Orthonormal basis: recall (Theorem 2.18) how these simplify computations.
The capstone result of this course is the famous spectral theorem (Theorem 2.37) which says, in short,
that self-adjoint operators ($T^* = T$) have an orthonormal eigenbasis, the holy grail of easy computation!
Such operators are important both theoretically and in applications such as quantum mechanics.
The Fundamental Subspaces Theorem

To every linear map are associated its range and nullspace. These interact nicely with the adjoint. . .

Theorem 2.30. If $T \in \mathcal{L}(V, W)$ has adjoint $T^*$, then,
1. $R(T^*)^\perp = N(T)$
2. If $V$ is finite-dimensional, then $R(T^*) = N(T)^\perp$
The corresponding results hold if we swap $V \leftrightarrow W$ and $T \leftrightarrow T^*$.

The proof is left to Exercise 6. You've likely observed this with transposes of small matrices.

Example 2.31. Let $A = \begin{pmatrix}1 & 2 & 1\\ 0 & 3 & 2\end{pmatrix}$. Viewed as a linear map between Euclidean spaces, $T = L_A$ has adjoint $T^* = L_{A^T}$. It is easy to compute the relevant subspaces:
$$R(A) = \mathbb{R}^2, \quad N(A^T) = \{0\}, \quad R(A^T) = \operatorname{Span}\left\{\begin{pmatrix}1\\ 2\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 3\\ 2\end{pmatrix}\right\}, \quad N(A) = \operatorname{Span}\left\{\begin{pmatrix}1\\ -2\\ 3\end{pmatrix}\right\}$$
The Riesz Representation Theorem

This powerful result demonstrates a natural relation between an inner product space and its dual space $V^* = \mathcal{L}(V, \mathbb{F})$.

Theorem 2.32. If $V$ is finite-dimensional and $g : V \to \mathbb{F}$ is linear, then there exists a unique $y \in V$ such that $g(x) = \langle x, y\rangle$.

Example 2.33. $g(p) := \int_0^1 p(x)\,dx$ is a linear map $g : P_2(\mathbb{R}) \to \mathbb{R}$. Equip $P_2(\mathbb{R})$ with the inner product for which the standard basis $\{1, x, x^2\}$ is orthonormal. Then
$$g(a + bx + cx^2) = a + \tfrac12b + \tfrac13c = \left\langle a + bx + cx^2,\ 1 + \tfrac12x + \tfrac13x^2\right\rangle$$
We conclude that $g(p) = \langle p, q\rangle$, where $q(x) = 1 + \tfrac12x + \tfrac13x^2$.

The idea of the proof is very simple: if $g(x) = \langle x, y\rangle$ then the nullspace of $g$ must equal $\operatorname{Span}\{y\}^\perp$. . .

Proof. If $g$ is the zero map, take $y = 0$. Otherwise, $\operatorname{rank}g = 1$ and
$$V = N(g) \oplus N(g)^\perp \quad\text{where } \dim N(g)^\perp = 1 \quad\text{(rank–nullity theorem and Exercise 2.2.6)}$$
Let $u \in N(g)^\perp$ be either of the two unit vectors and define, independently of $u$,
$$y := \overline{g(u)}\,u \in V$$
Following the decomposition, write $x = n + \alpha u$ where $n \in N(g)$ and observe that
$$\langle x, y\rangle = \bigl\langle n + \alpha u,\ \overline{g(u)}\,u\bigr\rangle = g(u)\alpha = 0 + g(\alpha u) = g(n + \alpha u) = g(x)$$
The uniqueness of $y$ follows from the cancellation property (Lemma 2.5, part 4).
Due to the tight correspondence, the map is often decorated as $g_y$. Riesz's theorem indeed says that $y \mapsto g_y$ is an isomorphism $V \cong V^*$. While there are infinitely many isomorphisms between these spaces, the inner product structure identifies a canonical or preferred choice.

Corollary 2.34. Every linear map on a finite-dimensional inner product space has an adjoint.

Note how only the domain is required to be finite-dimensional! Riesz's Theorem and the Corollary also apply to continuous linear operators on (infinite-dimensional) Hilbert spaces, though the proof is a little trickier.

Proof. Let $T \in \mathcal{L}(V, W)$ where $\dim V < \infty$, and suppose $z \in W$ is given. Simply define $T^*(z) := y$ where $y \in V$ is the unique vector in Riesz's Theorem arising from the linear map
$$g : V \to \mathbb{F}, \qquad g(x) = \langle T(x), z\rangle$$
and check the required property:
$$\langle T^*(z), x\rangle = \langle y, x\rangle = \overline{g(x)} = \overline{\langle T(x), z\rangle} = \langle z, T(x)\rangle$$
Exercises 2.3  1. For each inner product space $V$ and linear operator $T \in \mathcal{L}(V)$, evaluate $T^*$ on the given vector.
(a) $V = \mathbb{R}^2$ with the standard inner product, $T\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}2x + y\\ x - 3y\end{pmatrix}$ and $x = \begin{pmatrix}3\\ 5\end{pmatrix}$
(b) $V = \mathbb{C}^2$ with the standard inner product, $T\begin{pmatrix}z\\ w\end{pmatrix} = \begin{pmatrix}2z + iw\\ (1 - i)z\end{pmatrix}$ and $x = \begin{pmatrix}3i\\ 1 + 2i\end{pmatrix}$
(c) $V = P_1(\mathbb{R})$ with $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$, $T(f) = f' + 3f$ and $f(t) = 4 - 2t$

2. Suppose $A = \begin{pmatrix}1 & 1\\ 4 & 3\end{pmatrix}$ and consider the linear map $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $\mathbb{R}^2$ is equipped with the weighted inner product $\langle x, y\rangle = 4x_1y_1 + x_2y_2$.
(a) Find the matrix of $T$ with respect to the orthonormal basis $\beta = \{v_1, v_2\} = \{\frac12e_1, e_2\}$.
(b) Find the adjoint $T^*$ and its matrix with respect to the standard basis $\epsilon = \{e_1, e_2\}$.
(Hint: the answer isn't $A^T$!)

3. Extending Examples 2.29, find the adjoint of $T = \frac{d}{dx} \in \mathcal{L}(P_2(\mathbb{R}))$ with respect to:
(a) The inner product where the standard basis $\epsilon = \{1, x, x^2\}$ is orthonormal.
(b) (Hard!) The $L^2$ inner product $\int_0^1 f(x)g(x)\,dx$.

4. Let $T(f) = f''$ be a linear transformation of $P_2(\mathbb{R})$ and let $\epsilon = \{1, x, x^2\}$ be the standard basis. Find $T^*(a + bx + cx^2)$:
(a) With respect to the inner product where $\epsilon$ is orthonormal;
(b) With respect to the $L^2$ inner product $\langle f, g\rangle = \int_{-1}^1 f(t)g(t)\,dt$.
(Hint: $\{1, x, 3x^2 - 1\}$ is orthogonal)
5. Prove part 2 of Theorem 2.27.

6. Prove the Fundamental Subspaces Theorem 2.30.

7. For each inner product space $V$ and linear transformation $g : V \to \mathbb{F}$, find a vector $y \in V$ such that $g(x) = \langle x, y\rangle$ for all $x \in V$.
(a) $V = \mathbb{R}^3$ with the standard inner product, and $g\begin{pmatrix}x\\ y\\ z\end{pmatrix} = x - 2y + 4z$
(b) $V = \mathbb{C}^2$ with the standard inner product, and $g\begin{pmatrix}z\\ w\end{pmatrix} = iz - 2w$
(c) $V = P_2(\mathbb{R})$ with the $L^2$ inner product $\langle f, h\rangle = \int_0^1 f(x)h(x)\,dx$, and $g(f) = f'(1)$

8. (a) In the proof of Theorem 2.32, explain why $y$ depends only on $g$ (not $u$).
(b) In the proof of Corollary 2.34, check that $g(x) := \langle T(x), z\rangle$ is linear.

9. Let $y, z \in V$ be fixed vectors and define $T \in \mathcal{L}(V)$ by $T(x) = \langle x, y\rangle z$. Show that $T^*$ exists and find an explicit expression.

10. Suppose $A \in M_{m\times n}(\mathbb{F})$. Prove that $A^*A$ is diagonal if and only if the columns of $A$ are orthogonal. What additionally would it mean if $A^*A = I$?

11. Suppose $T \in \mathcal{L}(V)$ where $V$ is a finite-dimensional inner product space.
(a) Prove that the eigenvalues of $T^*$ are the complex conjugates of those of $T$.
(Hint: relate the characteristic polynomial $p^*(t) = \det(T^* - tI)$ to that of $T$)
(b) Prove that $T^*$ is diagonalizable if and only if $T$ is.

12. (Hard) We present two linear maps which do not have an adjoint!
(a) Since $\epsilon = \{1, x, x^2, \ldots\}$ is a basis of $P(\mathbb{R})$, we may define a linear map $T \in \mathcal{L}(P(\mathbb{R}))$ via $T(x^n) = 1$ for all $n$; for instance
$$T(4 + 3x + 2x^5) = 9$$
Let $\langle\ ,\ \rangle$ be the inner product for which $\epsilon$ is orthonormal. If $T^*$ existed, show that
$$T^*(1) = \sum_{n=0}^\infty x^n$$
would be an infinite series: $T^*$ therefore does not exist.
(b) For a related challenge, recall the space $\ell^2$ of square-summable real sequences. For any sequence $(x_n)_{n=1}^\infty \in \ell^2$, define $T \in \mathcal{L}(\ell^2)$ via
$$T\bigl((x_n)\bigr) = \left(\sum_{n=1}^\infty\frac1nx_n,\ 0, 0, 0, 0, 0, \ldots\right)$$
Find the adjoint $T^*$. If $V \le \ell^2$ is the subspace whose elements have only finitely many non-zero terms, show that the restriction $T_V$ does not have an adjoint.
2.4 Normal & Self-Adjoint Operators and the Spectral Theorem
We now come to the fundamental question of this chapter: for which linear operators T L( V) can
we find an orthonormal eigenbasis? Many linear maps are, of course, not even diagonalizable, so in
general this is far too much to hope for! Let’s see what happens if such a basis exists. . .
If β is an orthonormal basis of eigenvectors of T, then
[T]
β
= diag(λ
1
, ··· , λ
n
) = [T
]
β
= diag(λ
1
, ··· , λ
n
)
If V is a real inner product space, then these matrices are identical and so T
= T. In the complex
case, we instead observe that
[TT
]
β
= diag(
|
λ
1
|
2
, ··· ,
|
λ
n
|
2
) = [T
T]
β
= TT
= T
T
Definition 2.35. Suppose T is a linear operator on an inner product space V and assume T has an
adjoint. We say that T is,
Normal if TT
= T
T,
Self-adjoint if T
= T.
The definitions for square matrices over R and C are identical, where
now denotes the conjugate-
transpose.
A real self-adjoint matrix A M
n
(R ) is plainly symmetric: A
T
= A. A complex self-adjoint matrix is
also called Hermitian: A
= A.
If T is self-adjoint then it is certainly normal, but the converse is false as the next example shows.
Example 2.36. The (non-symmetric) real matrix $A = \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}$ is normal but not self-adjoint:
$$AA^{\mathsf{T}} = \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix} = A^{\mathsf{T}}A$$
More generally, every non-zero skew-hermitian matrix $A^* = -A$ is normal but not self-adjoint:
$$A^* = -A \implies AA^* = -A^2 = A^*A$$
We saw above that linear maps with an orthonormal eigenbasis are either self-adjoint or normal
depending whether the inner product space is real or complex. Amazingly, this provides a complete
characterisation of such maps!
Theorem 2.37 (Spectral Theorem, version 1). Let T be a linear operator on a finite-dimensional
inner product space V.
Complex case: V has an orthonormal basis of eigenvectors of T if and only if T is normal.
Real case: V has an orthonormal basis of eigenvectors of T if and only if T is self-adjoint.
The theorem gets its name from the spectrum (set of eigenvalues) of T.
33
Examples 2.38. 1. We diagonalize the self-adjoint linear map $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $A = \begin{pmatrix} 6 & 3 \\ 3 & -2 \end{pmatrix}$.

Characteristic polynomial: $p(t) = (6-t)(-2-t) - 9 = t^2 - 4t - 21 = (t-7)(t+3)$
Eigenvalues: $\lambda_1 = 7$, $\lambda_2 = -3$
Eigenvectors (normalized): $w_1 = \frac{1}{\sqrt{10}}\begin{pmatrix} 3 \\ 1 \end{pmatrix}$, $w_2 = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 \\ -3 \end{pmatrix}$

The basis $\beta = \{w_1, w_2\}$ is orthonormal, with respect to which $[T]_\beta = \begin{pmatrix} 7 & 0 \\ 0 & -3 \end{pmatrix}$ is diagonal.
2. The map $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $A = \begin{pmatrix} 1 & 3 \\ 0 & 2 \end{pmatrix}$ is neither self-adjoint nor normal:
$$AA^* = \begin{pmatrix} 1 & 3 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 3 & 2 \end{pmatrix} = \begin{pmatrix} 10 & 6 \\ 6 & 4 \end{pmatrix} \neq \begin{pmatrix} 1 & 3 \\ 3 & 13 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 3 & 2 \end{pmatrix}\begin{pmatrix} 1 & 3 \\ 0 & 2 \end{pmatrix} = A^*A$$
It is diagonalizable, indeed
$$\gamma = \left\{\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ 1 \end{pmatrix}\right\} \implies [T]_\gamma = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$$
In accordance with the spectral theorem, γ is not orthogonal.
3. Let $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and consider $T = L_A$ acting on both $\mathbb{C}^2$ and $\mathbb{R}^2$. Since T is normal but not self-adjoint, we'll see how the field really matters in the spectral theorem.

First the complex case: $T \in \mathcal{L}(\mathbb{C}^2)$ is normal and thus diagonalizable with respect to an orthonormal basis of eigenvectors. Here are the details.

Characteristic polynomial: $p(t) = t^2 + 1 = (t-i)(t+i)$
Eigenvalues: $\lambda_1 = i$, $\lambda_2 = -i$
Eigenvectors (normalized): $w_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} i \\ 1 \end{pmatrix}$, $w_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} -i \\ 1 \end{pmatrix}$

Certainly $\langle w_1, w_2\rangle = 0$, so $\beta = \{w_1, w_2\}$ is orthonormal, and $[T]_\beta = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$ is diagonal.

Now for the real case: $T \in \mathcal{L}(\mathbb{R}^2)$ is not self-adjoint and thus should not be diagonalizable with respect to an orthonormal basis of eigenvectors. Indeed this is trivial; the characteristic polynomial has no roots in $\mathbb{R}$ and so there are no real eigenvalues! It is also clear geometrically: T is rotation by 90° counter-clockwise around the origin, so it has no eigenvectors.

4. Let $A = \begin{pmatrix} 3 & i \\ -i & 3 \end{pmatrix}$ and consider the self-adjoint operator $T = L_A \in \mathcal{L}(\mathbb{C}^2)$.

Characteristic polynomial: $p(t) = t^2 - 6t + 9 - 1 = (t-2)(t-4)$
Eigenvalues: $\lambda_1 = 2$, $\lambda_2 = 4$
Eigenvectors (normalized): $w_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}$, $w_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}$

With respect to the orthonormal basis $\beta = \{w_1, w_2\}$, we have $[T]_\beta = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}$.
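As a quick sanity check of Example 1, a numerical eigensolver applied to a real symmetric matrix returns exactly the orthonormal eigenbasis promised by the spectral theorem. The following sketch assumes NumPy is available; it is a verification aid, not part of the theory.

```python
import numpy as np

A = np.array([[6.0, 3.0],
              [3.0, -2.0]])            # the self-adjoint matrix of Example 2.38.1

evals, Q = np.linalg.eigh(A)           # eigh: for symmetric/Hermitian input, Q has orthonormal columns
print(evals)                           # [-3.  7.]

# Q's columns form an orthonormal eigenbasis, so Q^T A Q is diagonal
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(Q.T @ A @ Q, np.diag(evals))
```

For the non-normal matrix of Example 2 the analogous check fails: its eigenvector matrix is invertible but not orthogonal.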
34
Proving the Spectral Theorem for Self-Adjoint Operators
Lemma 2.39 (Basic properties of self-adjoint operators). Let T L(V) be self-adjoint.
1. If W V is T-invariant then the restriction T
W
is self-adjoint;
2. Every eigenvalue of T is real;
3. If dim V is finite then T has an eigenvalue.
It is irrelevant whether V is real or complex. The previous example demonstrates part 2; even when
V = C
2
is a complex inner product space, the eigenvalues of a self-adjoint matrix are real.
Proof. 1. Let $w_1, w_2 \in W$. Then
$$\langle T_W(w_1), w_2\rangle = \langle T(w_1), w_2\rangle = \langle w_1, T(w_2)\rangle = \langle w_1, T_W(w_2)\rangle$$
2. Suppose $(\lambda, x)$ is an eigenpair. Then
$$\lambda\,\|x\|^2 = \langle T(x), x\rangle = \langle x, T(x)\rangle = \overline{\lambda}\,\|x\|^2 \implies \lambda \in \mathbb{R}$$
3. This is trivial if V is complex since every characteristic polynomial splits over C. We therefore
assume V is real. Choose any orthonormal basis γ of V, let A = [T]
γ
M
n
(R ), and define
S := L
A
L(C
n
). Then;
The characteristic polynomial of S splits over C, whence there exists an eigenvalue λ C.
The characteristic polynomials of S and T are identical (to that of A).
S is self-adjoint and thus (part 2) λ R.
It follows that T has the same real eigenvalue λ.
We are now able to prove the spectral theorem for self-adjoint operators on a finite-dimensional inner
product space V. The argument applies regardless of whether V is real or complex.
Proof of the Spectral Theorem (self-adjoint case). We prove by induction on dim V.
(Base case) If dim V = 1, then V = Span{x} and T(x) = λx for some unit vector x and scalar λ R;
plainly {x} is an orthonormal eigenbasis for T.
(Induction step) Fix n N and assume that every self-adjoint operator on every inner product space
of dimension n satisfies the spectral theorem. Let dim V = n + 1 and T L(V) be self-adjoint.
By part 3 of the Lemma, T has an eigenpair $(\lambda, x)$ where we may assume x has unit length. Let $W = \operatorname{Span}\{x\}^\perp$. If $w \in W$, then
$$\langle x, T(w)\rangle = \langle T(x), w\rangle = \lambda\langle x, w\rangle = 0 \qquad (*)$$
whence W is T-invariant. Plainly $\dim W = n$. By part 1 of the Lemma, $T_W$ is self-adjoint. By the induction hypothesis, $T_W$ is diagonalized by some orthonormal basis γ of W. But then T is diagonalized by the orthonormal basis $\beta = \gamma \cup \{x\}$ of V.
35
Proving the Spectral Theorem for Normal Operators
What changes for normal operators on complex inner product spaces? Not much! Indeed the proof
is almost identical when T is merely normal.
We don’t need parts 2 and 3 of Lemma 2.39: every linear operator on a finite-dimensional
complex inner product space has an eigenvalue and we no longer care whether eigenvalues are
real.
Two parts of the induction step need to be fixed:
W being T-invariant: This isn’t quite as simple as (), but thankfully part 3 of the next
result provides the needed correction.
T
W
being normal: We need a replacement for part 1 of Lemma 2.39; this is a little more
involved.
Rather than write out all the details, we leave this to Exercises 6 and 7.
For completeness, and as an analogue/extension of Lemma 2.39, we summarize some of the basic
properties of normal operators. These also apply to self-adjoint operators as a special case.
Lemma 2.40 (Basic properties of normal operators). Let T be normal on V. Then:
1. x V,
||
T(x)
||
=
||
T
( x)
||
.
2. T tI is normal for any scalar t.
3. T(x) = λx T
( x) =
λx so that T and T
have the same eigenvectors and conjugate
eigenvalues. This recovers the previously established fact that λ R if T is self-adjoint.
4. Distinct eigenvalues of T have orthogonal eigenvectors.
Proof. 1. $\|T(x)\|^2 = \langle T(x), T(x)\rangle = \langle T^*T(x), x\rangle = \langle TT^*(x), x\rangle = \langle T^*(x), T^*(x)\rangle = \|T^*(x)\|^2$.

2. $\langle x, (T - tI)(y)\rangle = \langle x, T(y)\rangle - \overline{t}\langle x, y\rangle = \langle T^*(x), y\rangle - \langle \overline{t}x, y\rangle = \langle (T^* - \overline{t}I)(x), y\rangle$
shows that $T - tI$ has adjoint $T^* - \overline{t}I$. It is trivial to check that these commute.

3. $T(x) = \lambda x \iff \|(T - \lambda I)(x)\| = 0 \overset{\text{parts 1\&2}}{\iff} \|(T^* - \overline{\lambda}I)(x)\| = 0 \iff T^*(x) = \overline{\lambda}x$.

4. In part this follows from the spectral theorem, but we can also prove it more straightforwardly. Suppose $T(x) = \lambda x$ and $T(y) = \mu y$ where $\lambda \neq \mu$. By part 3,
$$\lambda\langle x, y\rangle = \langle \lambda x, y\rangle = \langle T(x), y\rangle = \langle x, T^*(y)\rangle = \langle x, \overline{\mu}y\rangle = \mu\langle x, y\rangle$$
This is a contradiction unless $\langle x, y\rangle = 0$.
36
Schur's Lemma
It is reasonable to ask how useful an orthonormal basis can be in general. Here is one answer.
Lemma 2.41 (Schur). Suppose T is a linear operator on a finite-dimensional inner product space V.
If the characteristic polynomial of T splits, then there exists an orthonormal basis β of V such that
[T]
β
is upper-triangular.
The spectral theorem is a special case; since the proof is similar, we leave it to the exercises.
The conclusion of Schur’s lemma is weaker than the spectral theorem, though it applies to more
operators: indeed if V is complex, it applies to any T! Every example of the spectral theorem is also
an example of Schur’s lemma. Example 2.38.2 provides another, since the matrix A is already upper
triangular with respect to the standard orthonormal basis. Here is another example.
Example 2.42. Consider $T(f) = 2f'(x) + x f(1)$ as a linear map $T \in \mathcal{L}(P_1(\mathbb{R}))$ with respect to the $L^2$ inner product $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$. We have
$$T(a + bx) = 2b + (a + b)x$$
If $[T]_\beta$ is to be upper-triangular, the first vector in β must be an eigenvector of T. It is easily checked that $f_1 = 1 + x$ is such with eigenvalue 2. To find a basis satisfying Schur's lemma, we need only find $f_2$ orthogonal to this and then normalize. This can be done by brute force since the problem is small, but for the sake of practice we apply Gram–Schmidt to the polynomial 1:
$$1 - \frac{\langle 1, 1+x\rangle}{\|1+x\|^2}(1+x) = 1 - \frac{1 + \frac{1}{2}}{1 + 1 + \frac{1}{3}}(1+x) = \frac{1}{14}(5 - 9x) \implies f_2 = 5 - 9x$$
Indeed we obtain an upper-triangular matrix for T:
$$T(f_2) = -18 - 4x = -13(1+x) - (5-9x) = -13f_1 - f_2 \implies [T]_{\{f_1, f_2\}} = \begin{pmatrix} 2 & -13 \\ 0 & -1 \end{pmatrix}$$
We can also work with the corresponding orthonormal basis as posited in the theorem, though the matrix is messier:
$$\beta = \{g_1, g_2\} = \left\{\sqrt{\tfrac{3}{7}}(1+x),\ \tfrac{1}{\sqrt{7}}(5-9x)\right\} \implies [T]_\beta = \begin{pmatrix} 2 & -\frac{13}{\sqrt{3}} \\ 0 & -1 \end{pmatrix}$$
Alternatively, we could have started with the other eigenvector $h_1 = 2 - x$: an orthogonal vector to this is $h_2 = 4 - 9x$, with respect to which
$$[T]_{\{h_1, h_2\}} = \begin{pmatrix} -1 & -13 \\ 0 & 2 \end{pmatrix}$$
In both cases the eigenvalues are down the diagonal, as must be the case for an upper-triangular matrix.
In general, it is difficult to quickly find a suitable basis satisfying Schur’s lemma. After trying the
proof in the exercises, you should be able to describe a method, though it is impractically slow!
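For matrices, the numerically practical route is a library Schur factorization: SciPy's `scipy.linalg.schur` returns a unitary Q and an upper-triangular T with $A = QTQ^*$. The sketch below is only a numerical illustration of the matrix form of Schur's lemma, under the assumption that SciPy is available; the example matrix is arbitrary.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 4.0],
              [1.0, 0.0, 1.0]])

# 'complex' output guarantees an upper-triangular factor even when the
# real characteristic polynomial does not split over R
T, Q = schur(A, output='complex')

assert np.allclose(Q @ Q.conj().T, np.eye(3))       # Q is unitary
assert np.allclose(np.tril(T, -1), 0)                # T is upper triangular
assert np.allclose(Q @ T @ Q.conj().T, A)            # A = Q T Q*
print(np.diag(T))                                     # the eigenvalues of A sit on the diagonal
```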
37
Exercises 2.4 1. For each linear operator T on an inner product space V, decide whether T is nor-
mal, self-adjoint, or neither. If the spectral theorem permits, find an orthonormal eigenbasis.
(a) V = R
2
and T
(
x
y
)
=
2ab
2a+5b
(b) V = R
3
and T
x
y
z
=
ab
5b
4a2b+5c
(c) V = C
2
and T
(
z
w
)
=
2z+iw
z+2w
(d) V = R
4
with T : (e
1
, e
2
, e
3
, e
4
) 7 ( e
3
, e
4
, e
1
, e
2
)
(e) V = P
2
(R ) with
f , g
=
R
1
0
f (t)g(t) dt and T( f ) = f
(Hint: Don’t compute T
! Instead assume T is normal and aim for a contradiction. . . )
2. Let T( f (x)) = f
(x) + 4x f (0) where T L(P
1
(R )) and
f , g
=
R
1
1
f (t)g(t) dt. Find an
orthonormal basis of P
1
(R ) with respect to which the matrix of T is upper-triangular.
3. Suppose S, T are self-adjoint operators on an inner product space V. Prove that ST is self-adjoint
if and only if ST = TS.
(Hint: recall Theorem 2.27)
4. Let T be normal on a finite-dimensional inner product space V. Prove that N(T
) = N(T) and
that R(T
) = R(T).
(Hint: Use Lemma 2.40 and the Fundamental Subspaces Theorem 2.30)
5. Let T be self-adjoint on a finite-dimensional inner product space V. Prove that
x V,
||
T(x) ±ix
||
2
=
||
T(x)
||
2
+
||
x
||
2
Hence prove that T iI is invertible and that [(T iI)
1
]
= (T + iI)
1
.
6. Let W be a T-invariant subspace of an inner product space V and let T
W
L(W) be the restric-
tion of T to W. Prove:
(a) W
is T
-invariant.
(b) If W is both T- and T
-invariant, then ( T
W
)
= (T
)
W
.
(c) If W is both T- and T
-invariant and T is normal, then T
W
is normal.
7. Use the previous question to complete the proof of the spectral theorem for a normal operator
on a finite-dimensional complex inner product space.
8. (a) Suppose S is a normal operator on a finite-dimensional complex inner product space, all of
whose eigenvalues are real. Prove that S is self-adjoint.
(b) Let T be a normal operator on a finite-dimensional real inner product space V whose char-
acteristic polynomial splits. Prove that T is self-adjoint and that there exists an orthonor-
mal basis of V of eigenvectors of T.
(Hint: Mimic the proof of Lemma 2.39 part 3 and use part (a))
9. Prove Schur’s lemma by induction, similarly to the proof of the spectral theorem.
(Hint: T
has an eigenvector x; why? Now show that W = Span{x}
is T-invariant. . . )
38
2.5 Unitary and Orthogonal Operators and their Matrices
In this section we focus on length-preserving transformations of an inner product space.
Definition 2.43. A linear
9
isometry of an inner product space V is a linear map T satisfying
x V,
||
T(x)
||
=
||
x
||
Every eigenvalue of an isometry must have modulus 1: if $T(w) = \lambda w$, then
$$\|w\|^2 = \|T(w)\|^2 = \|\lambda w\|^2 = |\lambda|^2\,\|w\|^2$$
Example 2.44. Let T = L
A
L(R
2
), where A =
1
5
4 3
3 4
. Then
T
x
y
2
=
1
5
4x 3y
3x + 4y
2
=
1
25
(4x 3y)
2
+ (3x + 4y)
2
= x
2
+ y
2
=
x
y
2
This matrix is very special in that its inverse equals its transpose:
A
1
=
1
16
25
+
9
25
4 3
3 4
=
1
5
4 3
3 4
= A
T
We call such matrices orthogonal. The simple version of what follows is that every linear isometry on
R
n
is multiplication by an orthogonal matrix.
Definition 2.45. A unitary operator T on an inner product space V is an invertible linear map satis-
fying T
T = I = TT
. A unitary matrix is a (real or complex) matrix satisfying A
A = I.
If V is real, we usually call these orthogonal operators/matrices; this isn’t necessary, since unitary en-
compasses both real and complex spaces. An orthogonal matrix satisfies A
T
A = I.
Example 2.46. The matrix A =
1
3
i 2+2i
22i i
is unitary:
A
A =
1
9
i 2 + 2i
2 2i i
i 2 + 2i
2 2i i
=
1
9
i
2
+ 4 + 4 ( i + i)( 2 + 2i)
(i i)(2 2i) 4 + 4 i
2
= I
If V is finite-dimensional, the operator/matrix notions correspond straightforwardly. By Theorem
2.27, if we choose any orthonormal basis β of V, then
T L(V) is unitary/orthogonal [T]
β
is unitary/orthogonal
Moreover, we need only assume T
T = I (or TT
= I) if V is finite-dimensional: if β is an orthonormal
basis, then
T
T = I [T
]
β
[T]
β
= I [T]
β
[T
]
β
= I TT
= I
In infinite dimensions, we need T
to be both the left- and right-inverse of T. This isn’t an empty
requirement (see Exercise 13).
9
There also exist non-linear isometries: for instance translations (T(x) = x + a for any constant a) and complex conjugation
(T(x) = x) on C
n
. Together with linear isometries, these essentially comprise all isometries in finite dimensions.
39
We now tackle the correspondence between unitary operators and isometries.
Theorem 2.47. Let T be a linear operator on an inner product space V.
1. If T is a unitary/orthogonal operator, then it is a linear isometry.
2. If T is a linear isometry and V is finite-dimensional, then T is unitary/orthogonal.
Proof. 1. If T is unitary, then
$$\forall x, y \in V, \qquad \langle x, y\rangle = \langle T^*T(x), y\rangle = \langle T(x), T(y)\rangle \qquad (\dagger)$$
In particular taking x = y shows that T is an isometry.

2. $(I - T^*T)^* = I - (T^*T)^* = I - T^*T$ is self-adjoint. By the spectral theorem, there exists an orthonormal basis of V of eigenvectors of $I - T^*T$. For any such x with (real) eigenvalue λ,
$$0 = \|x\|^2 - \|T(x)\|^2 = \langle x, x\rangle - \langle T(x), T(x)\rangle = \langle x, (I - T^*T)x\rangle = \lambda\|x\|^2 \implies \lambda = 0$$
Since $I - T^*T = 0$ on a basis, $T^*T = I$. Since V is finite-dimensional, we also have $TT^* = I$, whence T is unitary.
The finite-dimensional restriction is important in part 2: we use the existence of adjoints, the spectral
theorem, and that a left-inverse is also a right-inverse. See Exercise 13 for an example of a non-unitary
isometry in infinite dimensions.
The proof shows a little more:
Corollary 2.48. On a finite-dimensional space, being unitary is equivalent to each of the following:
(a) Preservation of the inner product
10
( ).
(b) The existence of an orthonormal basis β = {w
1
, . . . , w
n
} such that T(β) = {T(w
1
), . . . , T(w
n
) }
is also orthonormal.
(c) That every orthonormal basis β of V is mapped to an orthonormal basis T(β).
While (a) is simply (), claims (b) and (c) are also worth proving explicitly: see Exercise 9. If β is the
standard orthonormal basis of F
n
and T = L
A
, then the columns of A form the orthonormal set T(β).
This makes identifying unitary/orthogonal matrices easy:
Corollary 2.49. A matrix A M
n
(R ) is orthogonal if and only if its columns form an orthonormal
basis of R
n
with respect to the standard (dot) inner product.
A matrix A M
n
(C ) is unitary if and only if its columns form an orthonormal basis of C
n
with
respect to the standard (Hermitian) inner product.
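Corollary 2.49 gives an easy computational test: check that the columns are orthonormal, i.e. that $A^*A = I$. Below is a minimal sketch assuming NumPy; the helper name is mine, and the test matrices are $\frac{1}{\sqrt{2}}\begin{pmatrix}1 & i\\ i & 1\end{pmatrix}$ (which reappears as the unitary matrix of Example 2.50.3 below) and the non-normal matrix of Example 2.38.2.

```python
import numpy as np

def is_unitary(A, tol=1e-12):
    """Check that the columns of A are orthonormal, i.e. A* A = I (Corollary 2.49)."""
    A = np.asarray(A, dtype=complex)
    return np.allclose(A.conj().T @ A, np.eye(A.shape[1]), atol=tol)

A = np.array([[1, 1j],
              [1j, 1]]) / np.sqrt(2)
print(is_unitary(A))                      # True

B = np.array([[1, 3],
              [0, 2]], dtype=complex)
print(is_unitary(B))                      # False
```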
10
In particular, in a real inner product space isometries also preserve the angle θ between vectors since cos θ =
x,y
||
x
||||
y
||
.
40
Examples 2.50. 1. The matrix A
θ
=
cos θ sin θ
sin θ cos θ
M
2
(R ) is orthogonal for any θ. Example 2.44 is
this with θ = tan
1
3
4
= sin
1
3
5
= cos
1
4
5
. More generally (Exercise 6), it can be seen that every
real orthogonal 2 ×2 matrix has the form A
θ
or
B
θ
=
cos θ sin θ
sin θ cos θ
for some angle θ. The effect of the L
A
θ
is to rotate counter-clockwise by θ, while that of L
B
θ
is to
reflect across the line making angle
1
2
θ with the positive x-axis.
2. A =
1
6
2
3 1
2 0 2
2
3 1
!
M
3
(R ) is orthogonal: check the columns!.
3. A =
1
2
1 i
i 1
M
2
(C ) is unitary: indeed it maps the standard basis to the orthonormal basis
T(β ) =
1
2
1
i
,
1
2
i
1
It is also easy to check that the characteristic polynomial is
p(t) = det
1
2
t
i
2
i
2
1
2
t
!
=
t
1
2
2
+
1
2
= t =
1
2
(1 ±i) = e
±πi/4
whence the eigenvalues of T both have modulus 1.
4. Here is an example of an infinite-dimensional unitary operator. On the space C[π, π], the
function T( f (x)) = e
ix
f (x) is linear. Moreover
D
e
ix
f (x), g(x)
E
=
1
2π
Z
π
π
e
ix
f (x)g(x) dx =
1
2π
Z
π
π
f (x)e
ix
g(x) dx =
D
f (x), e
ix
g(x)
E
whence T
( f (x)) = e
ix
f (x). Indeed T
= T
1
and so T is a unitary operator.
Since C[π, π] is infinite-dimensional, we don’t expect all parts of the Corollary to hold:
(a) T does preserve the inner product.
(b), (c) C[π, π] doesn’t have an orthonormal basis; there is no orthonormal set β = {f
k
} so that
every continuous function is a finite linear combination.
11
We cannot therefore claim that T
maps orthonormal bases to orthonormal bases!
11
An infinite orthonormal set β = {f
k
: k Z} can be found so that every function f ‘equals’ an infinite series in the
sense that
||
f
a
k
f
k
||
= 0. Since these are not finite sums, β isn’t strictly a basis, though it isn’t uncommon for it to be
so described. Moreover, given that the norm is defined by an integral, this also isn’t a claim that f and
a
k
f
k
are equal
as functions. Indeed the infinite series need not be continuous! For these reasons, when working with Fourier series, one
tends to consider a broader class than the continuous functions.
41
Unitary and Orthogonal Equivalence
Suppose A M
n
(R ) is symmetric (self-adjoint) A
T
= A. By the spectral theorem, A has an orthonor-
mal eigenbasis β = {w
1
, . . . , w
n
}: Aw
j
= λ
j
w
j
. Arranging the eigenbasis as the columns of a matrix,
we see that the columns of U = (w
1
···w
n
) are orthonormal and so U is an orthogonal matrix. We
can therefore write
A = UDU
1
= U
λ
1
··· 0
.
.
.
.
.
.
.
.
.
0 ··· λ
n
U
T
A similar approach works if A M
n
(C ) is normal: we now have A = UDU
where U is unitary.
Example 2.51. The matrix A =
1+i 1+i
1i 1+i
is normal as can easily be checked. Its characteristic
polynomial is
p(t) = t
2
2(1 + i)t + 4i = (t 2i)(t 2)
with corresponding orthonormal eigenvectors
w
2
=
1
2
1
i
, w
2i
=
1
2
1
i
We conclude that
A =
1
2
1 1
i i
2 0
0 2i
1
2
1 1
i i
1
=
1 1
i i
1 0
0 i
1 i
1 i
This is an example of unitary equivalence.
Definition 2.52. Square matrices A, B are unitarily equivalent if there exists a unitary matrix U such
that B = U
AU. Orthogonal equivalence is similar: B = U
T
AU.
The above discussion proves half the following:
Theorem 2.53. A M
n
(C ) is normal if and only if it is unitarily equivalent to a diagonal matrix
(the matrix of its eigenvalues).
A M
n
(R ) is self-adjoint (symmetric) if and only if it is orthogonally equivalent to a diagonal matrix.
Proof. We’ve already observed the () direction.
For the converse, let D be diagonal, U unitary, and A = U
DU. Then
A
A = (U
DU)
U
DU = U
D
UU
DU = U
DDU = U
DDU = U
DUU
DU
= U
DU(U
DU)
= AA
since U
= U
1
and because diagonal matrices commute: DD = DD.
In the special case where A is real and U is orthogonal, then A is symmetric:
$$A^{\mathsf{T}} = (U^{\mathsf{T}}DU)^{\mathsf{T}} = U^{\mathsf{T}}D^{\mathsf{T}}U = U^{\mathsf{T}}DU = A$$
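Numerically, Theorem 2.53 can be seen through the complex Schur factorization: for a normal matrix the triangular factor comes out (numerically) diagonal, exhibiting the unitary equivalence $A = UDU^*$. A sketch assuming SciPy is available; the test matrix is the normal, non-symmetric matrix of Example 2.36.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[2.0, -1.0],
              [1.0,  2.0]])                      # normal but not symmetric (Example 2.36)
assert np.allclose(A @ A.T, A.T @ A)

D, U = schur(A, output='complex')                 # A = U D U* with U unitary
assert np.allclose(U @ U.conj().T, np.eye(2))
assert np.allclose(np.tril(D, -1), 0) and np.allclose(np.triu(D, 1), 0)   # D is (numerically) diagonal
assert np.allclose(U @ D @ U.conj().T, A)
print(np.diag(D))                                 # eigenvalues 2 ± i
```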
42
Exercises 2.5 1. For each matrix A find an orthogonal or unitary U and a diagonal D = U
AU.
(a)
1 2
2 1
(b)
0 1
1 0
(c)
2 33i
3+3i 5
(d)
2 1 1
1 2 1
1 1 2
2. Which of the following pairs are unitarily/orthogonally equivalent? Explain your answers.
(a) A =
0 1
1 0
and B =
0 2
2 0
(b) A =
0 1 0
1 0 0
0 0 1
and B =
2 0 0
0 1 0
0 0 0
(c) A =
0 1 0
1 0 0
0 0 1
and B =
1 0 0
0 i 0
0 0 i
3. Let a, b C be such that
|
a
|
2
+
|
b
|
2
= 1. Prove that every 2 ×2 matrix of the form
a e
iθ
b
b e
iθ
a
is
unitary. Are these all the unitary 2 ×2 matrices? Prove or disprove.
4. If A, B are orthogonal/unitary, prove that AB and A
1
are also orthogonal/unitary.
(This proves that orthogonal/unitary matrices are groups under matrix multiplication)
5. Check that A =
1
3
5 4i
4i 5
M
2
(C ) satisfies A
T
A = I (it is a complex orthogonal matrix).
(These don’t have the same nice relationship with inner products, and are thus less useful to us)
6. Supply the details of Exercise 2.50.1.
(Hints: β = {i, j} is orthonormal, whence {Ai, Aj} must be orthonormal. Now draw pictures to
compute the result of rotating and reflecting the vectors i and j.)
7. Show that the linear map in Example 2.50.4 has no eigenvectors.
8. Prove that A M
n
(C ) has an orthonormal basis of eigenvectors whose eigenvalues have mod-
ulus 1, if and only if A is unitary.
9. Prove parts (b) and (c) of Corollary 2.48 for a finite-dimensional inner product space:
(a) If β is an orthonormal basis such that T(β) is orthonormal, then T is unitary.
(b) If T is unitary, and η is an orthonormal basis, then T(η) is an orthonormal basis.
10. Let T be a linear operator on a finite-dimensional inner product space V. If
||
T(x)
||
=
||
x
||
for
all x in some orthonormal basis of V, must T be unitary? Prove or disprove.
11. Let T be a unitary operator on an inner product space V and let W be a finite-dimensional
T-invariant subspace of V. Prove:
(a) T(W) = W (Hint: show that T
W
is injective);
(b) W
is T-invariant.
12. Let W a subspace of an inner product space V such that V = W W
. Define T L(V) by
T(u + w) = u w where u W and w W
. Prove that T is unitary and self-adjoint.
13. In the inner product space
2
of square-summable sequences, consider the linear operator
T(x
1
, x
2
, . . .) = (0, x
1
, x
2
, . . .). Prove that T is an isometry and compute its adjoint. Check that T
is non-invertible and non-unitary.
14. Prove Schur’s Lemma for matrices. Every A M
n
(R ) is orthogonally equivalent and every
A M
n
(C ) is unitarily equivalent to an upper triangular matrix.
43
2.6 Orthogonal Projections
Recall the discussion of the Gram-Schmidt process, where we saw that any finite-dimensional sub-
space W of an inner product space V has an orthonormal basis β
W
= {w
1
, . . . , w
n
}. In such a situa-
tion, we can define the orthogonal projections onto W and W
via
π
W
: V W : x 7
n
j=1
x, w
j
w
j
, π
W
: V W
: x 7 x π
W
( x)
Our previous goal was to use orthonormal bases to ease computation. In this section we develop
projections more generally. First recall the notion of a direct sum within a vector space V:
V = X Y v V, unique x X, y Y such that v = x + y
Definition 2.54. A linear map T L(V) is a projection if:
V = R(T) N(T) and T
R(T)
= I
R(T)
Otherwise said, T(r + n) = r whenever r R( T) and n N(T).
Alternatively, given $V = X \oplus Y$, the projection along Y onto X is the map $v = x + y \mapsto x$.
We call $A \in M_n(F)$ a projection matrix if $L_A \in \mathcal{L}(F^n)$ is a projection.
[Figure: the projection T along Y onto X sends v = x + y to x = T(v).]
Example 2.55. $A = \frac{1}{5}\begin{pmatrix} 6 & -2 \\ 3 & -1 \end{pmatrix}$ is a projection matrix with $R(A) = \operatorname{Span}\begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and $N(A) = \operatorname{Span}\begin{pmatrix} 1 \\ 3 \end{pmatrix}$.
Indeed, it is straightforward to describe all projection matrices in $M_2(\mathbb{R})$. There are three cases:
1. A = I is the identity matrix: $R(A) = \mathbb{R}^2$ and $N(A) = \{0\}$;
2. A = 0 is the zero matrix: $R(A) = \{0\}$ and $N(A) = \mathbb{R}^2$;
3. Choose distinct subspaces $R(A) = \operatorname{Span}\begin{pmatrix} a \\ b \end{pmatrix}$ and $N(A) = \operatorname{Span}\begin{pmatrix} c \\ d \end{pmatrix}$, then
$$A = \frac{1}{ad - bc}\begin{pmatrix} a \\ b \end{pmatrix}\begin{pmatrix} d & -c \end{pmatrix} = \frac{1}{ad - bc}\begin{pmatrix} ad & -ac \\ bd & -bc \end{pmatrix}$$
Think about why this last matrix does what we claim.
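A quick numerical sanity check of case 3, and of Example 2.55 where $(a,b) = (2,1)$ and $(c,d) = (1,3)$: the formula really does produce an idempotent matrix with the stated range and null space. NumPy assumed; the helper function is only for this demonstration.

```python
import numpy as np

def projection_matrix(a, b, c, d):
    """Projection onto Span{(a,b)} along Span{(c,d)}, per case 3 above."""
    return np.outer([a, b], [d, -c]) / (a * d - b * c)

A = projection_matrix(2, 1, 1, 3)            # Example 2.55: A = (1/5)[[6,-2],[3,-1]]
assert np.allclose(A @ A, A)                  # idempotent, so a projection (see Lemma 2.56 below)
assert np.allclose(A @ [2, 1], [2, 1])        # identity on R(A)
assert np.allclose(A @ [1, 3], [0, 0])        # kills N(A)
```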
It should be clear that every projection T has (at most) two eigenspaces:
R(T) is an eigenspace with eigenvalue 1
N(T) is an eigenspace with eigenvalue 0
If V is finite-dimensional and ρ, η are bases of R(T), N(T) respectively, then the matrix of T with
respect to ρ η has block form
[T]
ρη
=
I 0
0 0
where rank I = rank T. In particular, every finite-dimensional projection is diagonalizable.
44
Lemma 2.56. T L(V) is a projection if and only if T
2
= T.
Proof. Throughout, assume r R(T) and n N(T).
( ) Since every vector in V has a unique representation v = r + n, simply compute
T
2
( v) = T
T(r + n)
= T(r) = r = T(v)
( ) Suppose T
2
= T. Note first that if r R(T), then r = T(v) for some v V, whence
T(r) = T
2
( v) = T(v) = r (†)
Thus T is the identity on R(T). Moreover, if x R(T) N(T), (†) says that x = T(x) = 0, whence
R(T) N(T) = {0}
and so R(T) N(T) is a well-defined subspace of V.
12
To finish things off, let v V and observe that
T
v T(v)
= T(v) T
2
( v) = 0 = v T(v) N(T)
so that v = T(v) +
v T(v)
is a decomposition into R(T)- and N(T)-parts. We conclude that
V = R(T) N(T) and that T is a projection.
Thus far the discussion hasn’t had anything to do with inner products. . .
Definition 2.57. An orthogonal projection is a projection T L(V) on an inner product space for
which we additionally have
N(T) = R(T)
and R(T) = N(T)
Alternatively, given V = W W
, the orthogonal projection π
W
is the projection along W
onto W:
that is
R(π
W
) = W and N(π
W
) = W
The complementary orthogonal projection π
W
= I π
W
has R(π
W
) = W
and N(π
W
) = W.
Example (2.55 continued). The identity and zero matrices are both 2×2 orthogonal projection matrices, while those of type 3 are orthogonal if $\begin{pmatrix} a \\ b\end{pmatrix}\cdot\begin{pmatrix} c \\ d\end{pmatrix} = 0$: we obtain
$$A = \frac{1}{a^2 + b^2}\begin{pmatrix} a \\ b \end{pmatrix}\begin{pmatrix} a & b \end{pmatrix} = \frac{1}{a^2 + b^2}\begin{pmatrix} a^2 & ab \\ ab & b^2 \end{pmatrix}$$
More generally, if $W \le F^n$ has orthonormal basis $\{w_1, \ldots, w_k\}$, then the matrix of $\pi_W$ is $\sum_{j=1}^k w_j w_j^*$.
12
In finite dimensions, the rank–nullity theorem and dimension counting finishes the proof here without having to proceed further:
$$\dim\bigl(R(T) \oplus N(T)\bigr) = \operatorname{rank} T + \operatorname{null} T = \dim V \implies R(T) \oplus N(T) = V$$
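In coordinates, the recipe $\pi_W = \sum_j w_j w_j^*$ is easy to implement: orthonormalize a spanning set (a QR factorization performs Gram–Schmidt for you) and sum the outer products, which is the same as $QQ^*$. The sketch below assumes NumPy; the spanning vectors are an arbitrary illustration.

```python
import numpy as np

def orthogonal_projection(vectors):
    """Matrix of the orthogonal projection onto W = Span(vectors) in F^n.
    QR orthonormalizes the (independent) spanning set; then pi_W = sum_j w_j w_j^* = Q Q^*."""
    V = np.column_stack(vectors).astype(complex)
    Q, _ = np.linalg.qr(V)                 # columns of Q: an orthonormal basis of W
    return Q @ Q.conj().T

P = orthogonal_projection([[1, 0, 2], [0, 1, 1]])    # the spanning vectors need not be orthogonal
assert np.allclose(P @ P, P)                          # a projection
assert np.allclose(P, P.conj().T)                     # self-adjoint, hence orthogonal (Theorem 2.58 below)
```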
45
Theorem 2.58. A projection T L(V) is orthogonal if and only if it is self-adjoint T = T
.
Proof. (⇒) By assumption, R(T) and N(T) are orthogonal subspaces. Letting $x, y \in V$ and using subscripts to denote R(T)- and N(T)-parts, we see that
$$\langle x, T(y)\rangle = \langle x_r + x_n, y_r\rangle = \langle x_r, y_r + y_n\rangle = \langle T(x), y\rangle \implies T = T^*$$
(⇐) Suppose T is a self-adjoint projection. By the fundamental subspaces theorem,
$$N(T) = N(T^*) = R(T)^\perp$$
Since T is a projection already, we have $V = R(T) \oplus N(T) = R(T) \oplus R(T)^\perp$, from which
$$R(T) = \bigl(R(T)^\perp\bigr)^\perp = N(T)^\perp$$
The language of projections allows us to rephrase the Spectral Theorem.
Theorem 2.59 (Spectral Theorem, mk. II). Let V be a finite-dimensional complex/real inner prod-
uct space and T L(V) be normal/self-adjoint with spectrum {λ
1
, . . . , λ
k
} and corresponding
eigenspaces E
1
, . . . , E
k
. Let π
j
L(V) be the orthogonal projection onto E
j
. Then:
1. V = E
1
···E
k
is a direct sum of orthogonal subspaces, in particular, E
j
is the direct sum of
the remaining eigenspaces.
2. π
i
π
j
= 0 if i = j.
3. (Resolution of the identity) I
V
= π
1
+ ··· + π
k
4. (Spectral decomposition) T = λ
1
π
1
+ ··· + λ
k
π
k
Proof. 1. T is diagonalizable and so V is the direct sum of the eigenspaces of T. Since T is normal,
the eigenvectors corresponding to distinct eigenvalues are orthogonal, whence the eigenspaces
are mutually orthogonal. In particular, this says that
ˆ
E
j
:=
M
i=j
E
i
E
j
Since V is finite-dimensional, we have V = E
j
E
j
, whence
dim
ˆ
E
j
=
i=j
dim E
i
= dim V dim E
j
= dim E
j
=
ˆ
E
j
= E
j
2. This is clear by part 1, since N(π
j
) = E
j
=
ˆ
E
j
.
3. Write x =
k
j=1
x
j
where each x
j
E
j
. Then π
j
( x) = x
j
: now add. . .
4. T(x) =
k
j=1
T(x
j
) =
k
j=1
λ
j
x
j
=
k
j=1
λ
j
π
j
( x).
46
Examples 2.60. We verify the resolution of the identity and the spectral decomposition; for clarity, we index projections and eigenspaces by eigenvalue rather than the natural numbers.

1. The symmetric matrix $A = \begin{pmatrix} 10 & -2 \\ -2 & 7 \end{pmatrix}$ has spectrum {6, 11} and orthonormal eigenvectors
$$w_6 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad w_{11} = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ -1 \end{pmatrix}$$
The corresponding projections therefore have matrices
$$\pi_6 = w_6 w_6^{\mathsf{T}} = \frac{1}{5}\begin{pmatrix} 1 \\ 2 \end{pmatrix}\begin{pmatrix} 1 & 2 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \qquad \pi_{11} = w_{11} w_{11}^{\mathsf{T}} = \frac{1}{5}\begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix}$$
from which the resolution of the identity and the spectral decomposition are readily verified:
$$\pi_6 + \pi_{11} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad\text{and}\qquad 6\pi_6 + 11\pi_{11} = \frac{1}{5}\begin{pmatrix} 6 + 44 & 12 - 22 \\ 12 - 22 & 24 + 11 \end{pmatrix} = A$$
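The same verification takes a few lines numerically (NumPy assumed; this is a cross-check only):

```python
import numpy as np

A = np.array([[10.0, -2.0],
              [-2.0,  7.0]])

w6  = np.array([1.0, 2.0]) / np.sqrt(5)
w11 = np.array([2.0, -1.0]) / np.sqrt(5)

pi6, pi11 = np.outer(w6, w6), np.outer(w11, w11)     # orthogonal projections onto the eigenspaces

assert np.allclose(pi6 + pi11, np.eye(2))            # resolution of the identity
assert np.allclose(6 * pi6 + 11 * pi11, A)           # spectral decomposition
assert np.allclose(pi6 @ pi11, 0)                    # pi_i pi_j = 0 for i != j
```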
2. The normal matrix B =
1+i 1+i
1i 1+i
M
2
(C ) has spectrum {2, 2i} and corresponding orthonor-
mal eigenvectors
w
2
=
1
2
i
1
, w
2i
=
1
2
1
i
The orthogonal projection matrices are therefore
π
2
= w
2
w
2
=
1
2
i
1
( i 1) =
1
2
1 i
i 1
π
2i
= w
2i
w
2i
=
1
2
1 i
i 1
from which
π
2
+ π
2i
=
1 0
0 1
and 2π
2
+ 2iπ
2i
=
1 i
i 1
+
i 1
1 i
= B
3. The matrix C =
0 1 1
1 0 1
1 1 0
has spectrum {1, 2}, an orthonormal eigenbasis
{u, v, w} =
1
3
1
1
1
,
1
2
1
1
0
,
1
6
1
1
2
and eigenspaces E
2
= Span{u} and E
1
= Span{v, w}. The orthogonal projections have ma-
trices
π
2
= uu
T
=
1
3
1
1
1
(1 1 1) =
1
3
1 1 1
1 1 1
1 1 1
π
1
= vv
T
+ ww
T
=
1
2
1
1
0
(1 1 0) +
1
6
1
1
2
(1 1 2)
=
1
2
1 1 0
1 1 0
0 0 0
+
1
6
1 1 2
1 1 2
2 2 4
=
1
3
2 1 1
1 2 1
1 1 2
It is now easy to check the resolution of the identity and the spectral decomposition:
π
2
+ π
1
= I and 2π
2
π
1
= C
47
Orthogonal Projections and Minimization Problems
We finish this section with an important observation that drives much of the application of inner
product spaces to other parts of mathematics and beyond. Throughout this discussion, X and Y
denote inner product spaces.
Theorem 2.61. Suppose Y = W W
. For any y Y, the orthogonal projection π
W
( y) is the
unique element of W which minimizes the distance to y:
w W,
||
y π
W
( y)
||
||
y w
||
Proof. Apply Pythagoras’ Theorem: since π
W
( y) = y π
W
( y) W
,
||
y w
||
2
=
||
y π
W
( y) + π
W
( y) w
||
2
=
||
y π
W
( y)
||
2
+
||
π
W
( y) w
||
2
||
y π
W
( y)
||
2
with equality if and only if w = π
W
( y).
This set up can be used to compute accurate approximations in many contexts.
Examples 2.62. 1. To obtain a quadratic polynomial approximation p(x) = a + bx + cx
2
to e
x
on
the interval [1, 1] we choose to minimize the integral
R
1
1
|
e
x
p(x)
|
2
dx, namely the squared
L
2
-norm
||
e
x
p(x)
||
2
on C[1, 1]. If we let W = Span{1, x, x
2
}, then the finite-dimensionality
of W means that C[1, 1] = W W
. By the Theorem, the solution is p(x) = π
W
( e
x
).
To compute this, recall that we have an orthonormal basis for W, namely
1
2
,
q
3
2
x,
q
5
8
(3x
2
1)
from which
p(x) =
1
2
1, e
x
+
3
2
x, e
x
x +
5
8
(3x
2
1)
3x
2
1, e
x
=
1
2
Z
1
1
e
x
dx +
3
2
x
Z
1
1
xe
x
dx
+
5
8
(3x
2
1)
Z
1
1
(3x
2
1)e
x
dx
=
1
2
( e e
1
) + 3e
1
x +
5
4
( e 7e
1
)(3x
2
1)
$\approx 1.18 + 1.10x + 0.179(3x^2 - 1) \approx 1 + 1.1x + 0.537x^2$
The linear and quadratic approximations to $y = e^x$ are drawn. Compare this with the Maclaurin polynomial $e^x \approx 1 + x + \frac{1}{2}x^2$ from calculus.
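For readers who want to reproduce the coefficients numerically: projecting onto $W = \operatorname{Span}\{1, x, x^2\}$ amounts to three integrals against the orthogonal polynomials $1, x, 3x^2 - 1$. A sketch assuming NumPy and SciPy; the quadrature is numerical, so the printed values match 1.18, 1.10, 0.179 only to a few decimals.

```python
import numpy as np
from scipy.integrate import quad

ip = lambda f, g: quad(lambda x: f(x) * g(x), -1, 1)[0]   # L^2 inner product on [-1, 1]
f = np.exp

# coefficients against 1, x, 3x^2 - 1 (dividing by the squared norms 2, 2/3, 8/5)
a0 = ip(f, lambda x: 0 * x + 1) / 2
a1 = ip(f, lambda x: x) * 3 / 2
a2 = ip(f, lambda x: 3 * x**2 - 1) * 5 / 8
print(a0, a1, a2)                            # ~1.175, ~1.104, ~0.179, as in the example

p = lambda x: a0 + a1 * x + a2 * (3 * x**2 - 1)
print(p(0.5), np.exp(0.5))                   # the projection tracks e^x closely on [-1, 1]
```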
48
2. The n
th
Fourier approximation of a function f (x) is its orthogonal projection onto the finite-
dimensional subspace
W
n
= Span{1, e
ix
, e
ix
, . . . , e
inx
, e
inx
}
= Span{1, cos x, sin x, cos 2x, sin 2x, . . . , cos nx, sin nx}
within L
2
[π, π], namely
F
n
(x) =
1
2π
n
k=n
D
f (x), e
ikx
E
e
ikx
=
1
2π
f (x), 1
+
1
π
n
k=1
f (x), cos kx
cos kx +
f (x), sin kx
sin kx
According to the Theorem, this is the unique function in F
n
(x) W
n
minimizing the integral
||
f (x) F
n
(x)
||
2
=
Z
π
π
|
f (x) F
n
(x)
|
2
dx
For example, if $f(x) = \begin{cases} 1 & \text{if } 0 < x \le \pi \\ -1 & \text{if } -\pi < x \le 0\end{cases}$ is extended periodically, then
$$F_{2n-1}(x) = \frac{4}{\pi}\sum_{j=1}^{n}\frac{\sin(2j-1)x}{2j-1} = \frac{4}{\pi}\left(\sin x + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \cdots + \frac{\sin(2n-1)x}{2n-1}\right)$$
y = f (x) and its eleventh Fourier approximation y = F
11
(x)
We’ll return to related minimization problems in the next section.
Exercises 2.6 1. Compute the matrices of the orthogonal projections onto W viewed as subspaces
of the standard inner product spaces R
n
or C
n
.
(a) W = Span
4
1
(b) W = Span
1
2
1
,
1
0
1
(c) W = Span
i
1
0
,
1
i
1
(d) W = Span
1
1
0
,
1
2
1
(watch out, these vectors aren’t orthogonal!)
49
2. For each of the following matrices, compute the projections onto each eigenspace, verify the
resolution of the identity and the spectral decomposition.
(a)
1 2
2 1
(b)
0 1
1 0
(c)
2 3 3i
3 + 3i 5
(d)
2 1 1
1 2 1
1 1 2
(You should have orthonormal eigenbases from the previous section)
3. If W be a finite-dimensional subspace of an inner product space V. If T = π
W
is the orthogonal
projection onto W, prove that I T is the orthogonal projection onto W
.
4. Let T L(V) where V is finite-dimensional.
(a) If T is an orthogonal projection, prove that
||
T(x)
||
||
x
||
for all x V.
(b) Give an example of a projection for which the inequality in (a) is false.
(c) If T is a projection for which
||
T(x)
||
=
||
x
||
for all x V, what is T?
(d) If T is a projection for which
||
T(x)
||
||
x
||
for all x V, prove that T is an orthogonal
projection.
5. Let T be a normal operator on a finite-dimensional inner product space. If T is a projection,
prove that it must be an orthogonal projection.
6. Let T be a normal operator on a finite-dimensional complex inner product space V. Use the
spectral decomposition T = λ
1
π
1
+ ··· + λ
k
π
k
to prove:
(a) If T
n
is the zero map for some n N, then T is the zero map.
(b) U L(V) commutes with T if and only if U commutes with each π
j
.
(c) There exists a normal U L(V) such that U
2
= T.
(d) T is invertible if and only if λ
j
= 0 for all j.
(e) T is a projection if and only if every λ
j
= 0 or 1.
(f) T = T
if and only if every λ
j
is imaginary.
7. Find a linear approximation to f (x) = e
x
on [0, 1] using the L
2
inner product.
8. Consider the L
2
inner product on C[π, π] inner product.
(a) Explain why
sin x, x
2n
= 0 for all n.
(b) Find linear and cubic approximations to f (x) = sin x.
(Feel free to use a computer algebra package to evaluate the integrals!)
9. Revisit Example 2.62.2
(a) Verify that the general complex (e
ikx
) and real (cos kx, sin kx) expressions for the Fourier
approximation are correct.
(Hint: use Eulers formula e
ikx
= cos kx + i sin kx)
(b) Verify the explicit expression for F
2n1
(x) when f (x) is the given step-function. What is
F
2n
(x) in this case?
50
2.7 The Singular Value Decomposition and the Pseudoinverse
Given T L(V, W) between finite-dimensional inner product spaces, the overarching concern of this
chapter is the existence and computation of bases β, γ of V, W with two properties:
That β, γ be orthonormal, thus facilitating easy calculation within V, W;
That the matrix [T]
γ
β
be as simple as possible.
We have already addressed two special cases:
Spectral Theorem: when V = W and T is normal/self-adjoint, there exists β = γ such that $[T]_\beta$ is diagonal.
Schur's Lemma: when V = W and p(t) splits, there exists β = γ such that $[T]_\beta$ is upper triangular.
In this section we allow V = W and β = γ, and obtain a result that applies to any linear map between
finite-dimensional inner product spaces.
Example 2.63. Let $A = \begin{pmatrix} 3 & 1 \\ -2 & 2 \\ 1 & 3 \end{pmatrix}$ and consider orthonormal bases $\beta = \{v_1, v_2\}$ of $\mathbb{R}^2$ and $\gamma = \{w_1, w_2, w_3\}$ of $\mathbb{R}^3$ respectively:
$$\beta = \left\{\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix},\ \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}, \qquad \gamma = \left\{\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},\ \frac{1}{\sqrt{6}}\begin{pmatrix} 1 \\ -2 \\ -1 \end{pmatrix},\ \frac{1}{\sqrt{3}}\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}\right\}$$
Since $Av_1 = 4w_1$ and $Av_2 = 2\sqrt{3}\,w_2$, the matrix $[L_A]^\gamma_\beta = \begin{pmatrix} 4 & 0 \\ 0 & 2\sqrt{3} \\ 0 & 0 \end{pmatrix}$ is almost diagonal.
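NumPy's SVD routine reproduces these numbers directly (up to sign choices in the singular vectors). A verification sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [-2.0, 2.0],
              [1.0, 3.0]])

P, s, Qt = np.linalg.svd(A)          # A = P @ Sigma @ Qt, with orthonormal columns in P and rows in Qt
print(s)                             # [4.0, 3.4641...], i.e. sigma_1 = 4, sigma_2 = 2*sqrt(3)

Sigma = np.zeros((3, 2))
Sigma[:2, :2] = np.diag(s)
assert np.allclose(P @ Sigma @ Qt, A)
assert np.allclose(Qt @ Qt.T, np.eye(2)) and np.allclose(P @ P.T, np.eye(3))
```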
Our main result says that such bases always exist.
Theorem 2.64 (Singular Value Decomposition). Suppose V, W are finite-dimensional inner prod-
uct spaces and that T L(V, W) has rank r. Then:
1. There exist orthonormal bases β = {v
1
, . . . , v
n
} of V and γ = {w
1
, . . . , w
m
} of W, and positive
scalars σ
1
σ
2
··· σ
r
such that
T(v
j
) =
(
σ
j
w
j
if j r
0 otherwise
equivalently [T]
γ
β
=
diag(σ
1
, . . . , σ
r
) O
O O
2. Any such β is an eigenbasis of T
T, whence the scalars σ
j
are uniquely determined by T: indeed
T
T(v
j
) =
(
σ
2
j
v
j
if j r
0 otherwise
and T
( w
j
) =
(
σ
j
v
j
if j r
0 otherwise
3. If A M
m×n
(F ) with rank A = r, then A = PΣQ
where
Σ = [L
A
]
γ
β
=
diag(σ
1
, . . . , σ
r
) O
O O
, P = (w
1
, . . . , w
m
), Q = (v
1
, . . . , v
n
)
Since the columns of P, Q are orthonormal, these matrices are unitary.
51
Definition 2.65. The numbers σ
1
, . . . , σ
r
are the singular values of T. If T is not maximum rank, we
have additional zero singular values σ
r+1
= ··· = σ
min(m,n)
= 0.
Freedom of Choice While the singular values are uniquely determined, there is often significant free-
dom regarding the bases β and γ, particularly if any eigenspace of T
T has dimension 2.
Special Case (Spectral Theorem) If V = W and T is normal/self-adjoint, we may choose β to be an
eigenbasis of T, then σ
j
is the modulus of the corresponding eigenvalue (see Exercise 7).
Rank-one decomposition If we write g
j
: V F for the linear map g
j
: v 7
v, v
j
(recall Riesz’s
Theorem), then the singular value decomposition says
T =
r
j=1
σ
j
w
j
g
j
that is T(v) =
r
j=1
σ
j
v, v
j
w
j
thus rewriting T as a linear combination of rank-one maps (w
j
g
j
: V W).
For matrices, g
j
is simply multiplication by the row vector v
j
and we may write
13
A =
r
j=1
σ
j
w
j
v
j
Example (2.63 cont). We apply the method in the Theorem to A =
3 1
2 2
1 3
.
The symmetric(!) matrix A
T
A =
14 2
2 14
has eigenvalues σ
2
1
= 16, σ
2
2
= 12 and orthonormal eigen-
vectors v
1
=
1
2
1
1
, v
2
=
1
2
1
1
. The singular values are therefore σ
1
= 4, σ
2
= 2
3. Now
compute
w
1
=
1
σ
1
Av
1
=
1
4
2
A
1
1
=
1
2
1
0
1
, w
2
=
1
σ
2
Av
2
=
1
2
6
A
1
1
=
1
6
1
2
1
and observe that these are orthonormal. Finally choose w
3
=
1
3
1
1
1
to complete the orthonormal
basis γ of R
3
. A singular value decomposition is therefore
A =
3 1
2 2
1 3
= PΣQ
=
1
2
1
6
1
3
0
2
6
1
3
1
2
1
6
1
3
4 0
0 2
3
0 0
1
2
1
2
1
2
1
2
!
By expanding the decomposition, A is expressed as the sum of rank-one matrices:
σ
1
w
1
v
T
1
+ σ
2
w
2
v
T
2
= 4
1
2
0
1
2
1
2
1
2
+ 2
3
1
6
2
6
1
6
1
2
1
2
=
2 2
0 0
2 2
+
1 1
2 2
1 1
13
Since β is orthonormal, it is common to write v
j
for the map g
j
= , v
j
in general contexts. To those familiar with
the dual space V
= L(V, F), the set {g
1
, . . . , g
n
} = {v
1
, . . . , v
n
} is the dual basis to β. In this course v
j
will only ever mean
the conjugate-transpose of a column vector in F
n
. This discussion is part of why physicists write inner products differently!
52
Proof. 1. Since T
T is self-adjoint, the spectral theorem says it has an orthonormal basis of eigen-
vectors β = {v
1
, . . . , v
n
}. If T
T(v
j
) = λ
j
v
j
, then
T(v
j
), T(v
k
)
=
T
T(v
j
), v
k
= λ
j
v
j
, v
k
=
(
λ
j
if j = k
0 if j = k
()
whence every eigenvalue is a non-negative real number: λ
j
=
T(v
j
)
2
0.
Since rank T
T = rank T = r (Exercise 8), exactly r eigenvalues are non-zero; by reordering
basis vectors if necessary, we may assume
λ
1
··· λ
r
> 0
If j r, define σ
j
:=
p
λ
j
> 0 and w
j
:=
1
σ
j
T(v
j
), then the set {w
1
, . . . , w
r
} is orthonormal ().
If necessary, extend this to an orthonormal basis γ of W.
2. If orthonormal bases β and γ exist such that [T]
γ
β
=
diag(σ
1
,...,σ
r
) O
O O
, then [T
]
β
γ
is essentially the
same matrix just that its dimensions have been reversed:
[T
]
β
γ
=
diag(σ
1
, . . . , σ
r
) O
O O
= T
T =
diag(σ
2
1
, . . . , σ
2
r
) O
O O
whence T
and T
T are as claimed.
3. This is merely part 1 in the context of T = L
A
L(F
n
, F
m
). The orthonormal bases β, γ consist
of column vectors and so the (change of co-ordinate) matrices P, Q are unitary.
Examples 2.66. 1. The matrix A =
2 3
0 2
has A
T
A =
4 6
6 13
with eigenvalues σ
2
1
= 16 and σ
2
2
= 1
and orthonormal eigenbasis
β =
1
5
1
2
,
1
5
2
1
The singular values are therefore σ
1
= 4 and σ
2
= 1, from which we obtain
γ =
1
σ
1
Av
1
,
1
σ
2
Av
2
=
1
5
2
1
,
1
5
1
2
and the decomposition
A = PΣQ
=
2
5
1
5
1
5
2
5
!
4 0
0 1
1
5
2
5
2
5
1
5
!
Multiplying out, we may write A as a sum of rank-one matrices
A =
4
5
2
1
1 2
+
1
5
1
2
2 1
=
4
5
2 4
1 2
+
1
5
2 1
4 2
53
2. The decomposition can be very messy to find in non-matrix situations. Here is a classic example
where we simply observe the structure directly.
The L
2
inner product
f , g
=
R
1
0
f (x)g(x) dx on P
2
(R ) and P
1
(R ) admits orthonormal bases
β =
n
5( 6x
2
6x + 1),
3( 2x 1), 1
o
, γ =
n
3( 2x 1), 1
o
Let T =
d
dx
be the derivative operator. The matrix of T is already in the required form!
[T]
γ
β
=
2
15 0 0
0 2
3 0
thus β, γ are suitable bases and the singular values of T are σ
1
= 2
15 and σ
2
= 2
3.
Since β, γ are orthonormal, we could have used the adjoint method to evaluate this directly,
[T
T]
β
= ([T]
γ
β
)
T
[T]
γ
β
=
60 0 0
0 12 0
0 0 0
= σ
2
1
= 60, σ
2
2
= 12
Up to sign, {[v
1
]
β
, [v
2
]
β
, [v
3
]
β
} is therefore forced to be the standard ordered basis of R
3
, con-
firming that β was the correct basis of P
2
(R ) all along!
The Pseudoinverse
The singular value decomposition of a map T L(V, W) gives rise to a natural map from W back to
V. This map behaves somewhat like an inverse even when the operator is non-invertible!
Definition 2.67. Given the singular value decomposition of a rank r map T L(V, W), the pseu-
doinverse of T is the linear map T
L(W, V) defined by
T
( w
j
) =
(
1
σ
j
v
j
if j r
0 otherwise
Restricted to Span{v
1
, . . . , v
r
} Span{w
1
, . . . , w
r
}, the pseudoinverse really does invert T:
T
T(v
j
) =
(
v
j
if j r
0 otherwise
TT
( w
j
) =
(
w
j
if j r
0 otherwise
V
T
= Span{v
1
, . . . , v
r
}
| {z }
N(T)
=R(T
)
Span{v
r+1
, . . . , v
n
}
| {z }
N(T)=R(T
)
{0
V
}
bijection
OO
W
T
OO
=
R(T)=N(T
)
z }| {
Span{w
1
, . . . , w
r
}
R(T)
=N(T
)
z }| {
Span{w
r+1
, . . . , w
m
}
CC
{0
W
}
Otherwise said, the combinations are orthogonal projections: T
T = π
N(T)
and TT
= π
R(T)
54
Given the singular value decomposition of a matrix A = PΣQ
, its pseudoinverse is that of matrix of
(L
A
)
, namely
A
=
r
j=1
1
σ
j
v
j
w
j
= QΣ
P
where Σ
=
diag(σ
1
1
, . . . , σ
1
r
) O
O O
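Numerically, this is exactly what `np.linalg.pinv` computes: invert only the non-zero singular values. A sketch assuming NumPy, using an arbitrary rank-deficient matrix to emphasise that no ordinary inverse exists:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])                 # rank 1, so not invertible in any ordinary sense

P, s, Qt = np.linalg.svd(A)
r = np.sum(s > 1e-12)                       # numerical rank
Sigma_dag = np.zeros((2, 3))
Sigma_dag[:r, :r] = np.diag(1 / s[:r])      # invert only the non-zero singular values

A_dag = Qt.T @ Sigma_dag @ P.T              # A^dagger = Q Sigma^dagger P^*
assert np.allclose(A_dag, np.linalg.pinv(A))
assert np.allclose(A @ A_dag @ A, A)        # A A^dagger projects onto R(A)
assert np.allclose(A_dag @ A @ A_dag, A_dag)
```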
Examples 2.68. 1. Again continuing Example 2.63, A =
3 1
2 2
1 3
has pseudoinverse
A
=
1
σ
1
v
1
w
1
+
1
σ
2
v
2
w
2
=
1
4
2
2
1
1
(1 0 1) +
1
2
3
2
6
1
1
( 1 2 1)
=
1
8
1 0 1
1 0 1
+
1
12
1 2 1
1 2 1
=
1
24
5 4 1
1 4 5
which is exactly what we would have found by computing A
= QΣ
P
. Observe that
A
A =
1 0
0 1
and AA
=
1
3
2 1 1
1 2 1
1 1 2
are the orthogonal projection matrices onto the spaces N(A)
= Span{v
1
, v
2
} = R
2
and
R(A) = Span{w
1
, w
2
} R
3
respectively. Both spaces have dimension 2, since rank A = 2.
2. The pseudoinverse of T =
d
dx
: P
2
(R ) P
1
(R ), as seen in Example 2.66.2, maps
T
(
3( 2x 1)) =
1
2
15
5( 6x
2
6x + 1) =
1
2
3
(6x
2
6x + 1)
T
(1) =
1
2
3
3( 2x 1) = x
1
2
= T
(a + bx) = T
a +
b
2
+
b
2
3
3( 2x 1)
=
a +
b
2
x
1
2
+
b
12
(6x
2
6x + 1)
=
b
2
x
2
+ ax
a
2
b
6
The pseudoinverse of ‘differentiation’ therefore returns a particular choice of anti-derivative,
namely the unique anti-derivative of a + bx lying in Span{
5( 6x
2
6x + 1),
3( 2x 1)}.
Exercises 2.7 1. Find the ingredients β, γ and the singular values for each of the following:
(a) T L(R
2
, R
3
) where T
(
x
y
)
=
x
x+y
xy
(b) T : P
2
(R ) P
1
(R ) and T( f ) = f
′′
where
f , g
:=
R
1
0
f (x)g(x) dx
(c) V = W = Span{1, sin x, cos x} and
f , g
=
R
2π
0
f (x)g(x) dx, with T( f ) = f
+ 2f
55
2. Find a singular value decomposition of each of the matrices:
(a)
1 1
1 1
1 1
(b)
1 0 1
1 0 1
(c)
1 1 1
1 1 0
1 0 1
3. Find an explicit formula for T
for each map in Exercise 1.
4. Find the pseudoinverse of each of the matrix in Exercise 2.
5. Suppose A = PΣQ
is a singular value decomposition.
(a) Describe a singular value decomposition of A
.
(b) Explain why A
= QΣ
P
isn’t a singular value decomposition of A
; what would be the
correct decomposition? (Hint: what is wrong with Σ
?)
6. Suppose T : V W is written according to the singular value theorem. Prove that γ is a basis
of eigenvectors of TT
with the same non-zero eigenvalues as T
T, including repetitions.
7. (a) Suppose T = L(V) is normal. Prove that each v
j
in the singular value theorem may be
chosen to be an eigenvector of T and that σ
j
is the modulus of the corresponding eigenvalue.
(b) Let A =
0 1
1 0
. Show that any orthonormal basis β of R
2
satisfies the singular value theo-
rem. What is γ here? What is it about the eigenvalues of A that make this possible?
(Even when T is self-adjoint, the vectors in β need not also be eigenvectors of T!)
8. In the proof of the singular value theorem we claimed that rank T
T = rank T. Verify this by
checking explicitly that N(T
T) = N(T).
(This is circular logic if you use the decomposition, so you must do without!)
9. Let V, W be finite-dimensional inner product spaces and T L(V, W). Prove:
(a) T
TT
= T
TT
= T
.
(Hint: evaluate on the basis γ = {w
1
, . . . , w
m
} in the singular value theorem)
(b) If T is injective, then T
T is invertible and T
= (T
T)
1
T
.
(c) If T is surjective, then TT
is invertible and T
= T
(TT
)
1
.
10. Consider the equation T(x) = b, where T is a linear map between finite-dimensional inner
product spaces. A least-squares solution is a vector x which minimizes
||
T(x) b
||
.
(a) Prove that x
0
= T
( b) is a least-squares solution and that any other has the form x
0
+ n
for some n N(T).
(Hint: Theorem 2.61 says that x
0
is a least-squares solution if and only if T(x
0
) = π
R(T)
( b))
(b) Prove that x
0
= T
( b) has smaller norm than any other least-squares solution.
(c) If T is injective, prove that x
0
= T
( b) is the unique least-squares solution.
11. Find the minimal norm solution to the first system, and the least-squares solution to the second:
(
3x + 2y + z = 9
x 2y + 3z = 3
3x + y = 1
2x 2y = 0
x + 3y = 0
56
Linear Regression (non-examinable)
Given a data set {(t
j
, y
j
) : 1 j m}, we may employ the least-squares method to find a best-fitting
line y = c
0
+ c
1
t; often used to predict y given a value of t.
The trick is to minimize the sum of the squares of the vertical deviations of the line from the data set.
m
j=1
y
j
c
0
c
1
t
j
2
=
||
y Ax
||
2
where A =
t
1
1
.
.
.
.
.
.
t
m
1
x =
c
1
c
0
y =
y
1
.
.
.
y
m
With the indicated notation, we recognize this as a least-squares problem. Indeed if there are at least
two distinct t-values in the data set, then rank A = 2 is maximal and we have a unique best-fitting
line with coefficients given by
c
1
c
0
= A
y = (A
T
A)
1
A
T
y
Example 2.69. Given the data set {(0, 1), (1, 1), (2, 0), (3, 2), (4, 2)}, we compute
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 4 & 1 \end{pmatrix}, \quad y = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 2 \\ 2 \end{pmatrix} \implies x_0 = \begin{pmatrix} c_1 \\ c_0 \end{pmatrix} = (A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}y = \begin{pmatrix} 30 & 10 \\ 10 & 5 \end{pmatrix}^{-1}\begin{pmatrix} 15 \\ 6 \end{pmatrix} = \frac{3}{10}\begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
The regression line therefore has equation $y = \frac{3}{10}(t + 2)$.
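The same regression in NumPy, either through the normal equations or through `np.linalg.lstsq` (which uses the SVD/pseudoinverse internally). A verification sketch only, assuming NumPy:

```python
import numpy as np

t = np.array([0.0, 1, 2, 3, 4])
y = np.array([1.0, 1, 0, 2, 2])

A = np.column_stack([t, np.ones_like(t)])           # columns: t, 1
c1, c0 = np.linalg.lstsq(A, y, rcond=None)[0]
print(c1, c0)                                        # 0.3 and 0.6, i.e. y = (3/10)(t + 2)

# equivalently, via the normal equations (A^T A) x = A^T y
x0 = np.linalg.solve(A.T @ A, A.T @ y)
assert np.allclose(x0, [c1, c0])
```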
The process can be applied more generally to approximate using other functions. To find the best-
fitting quadratic polynomial y = c
0
+ c
1
t + c
2
t
2
, we’d instead work with
A =
t
2
1
t
1
1
.
.
.
.
.
.
.
.
.
t
2
m
t
m
1
x =
c
2
c
1
c
0
y =
y
1
.
.
.
y
m
=
m
j=1
y
j
c
0
c
1
t c
2
t
2
j
2
=
||
y Ax
||
2
Provided we have at least three distinct values t
1
, t
2
, t
3
, the matrix A is guaranteed to have rank 3 and
there will be a best-fitting least-squares quadratic: in this case
$$y = \frac{1}{70}\bigl(15t^2 - 39t + 72\bigr)$$
This curve and the best-fitting straight line are shown below.
57
Optional Problems Use a computer to invert any 3 ×3 matrices!
1. Check the calculation for the best-fitting least-squares quadratic in Example 2.69.
2. Find the best-fitting least-squares linear and quadratic approximations to the data set
{
(1, 2), (3, 4), (5, 7), (7, 9), (9, 12)
}
3. Suppose a data set {(t
j
, y
j
) : 1 j m} has unique regression line y = ct + d.
(a) Show that the equations A
T
Ax
0
= A
T
y can be written in matrix form
t
2
j
t
j
t
j
m
c
d
=
t
j
y
j
y
j
(b) Recover the standard expressions from statistics:
c =
Cov(t, y)
σ
2
t
and d = y ct
where
t =
1
m
m
j=1
t
j
and y =
1
m
m
j=1
y
j
are the means (averages),
σ
2
t
=
1
m
m
j=1
( t
j
t)
2
is the variance,
Cov(t, y) =
1
m
m
j=1
( t
j
t)(y
j
y) is the covariance.
58
2.8 Bilinear and Quadratic Forms
In this section we slightly generalize the idea of an inner product. Throughout, V is a vector space
over a field F; it need not be an inner product space and F can be any field (not just R or C).
Definition 2.70. A bilinear form B : V ×V F is linear in each entry: v, x, y V, λ F,
B(λx + y, v) = λB(x, v) + B(y, v), B(v, λx + y) = λB(v, x) + B(v, y)
Additionally, B is symmetric if x, y V, B(x, y) = B(y, x).
Examples 2.71. 1. If V is a real inner product space, then the inner product
,
is a symmetric
bilinear form. Note that a complex inner product is not bilinear!
2. If A M
n
(F ), then B(x, y) := x
T
Ay is a bilinear form on F
n
. For instance, on R
2
,
B(x, y) = x
T
1 2
2 0
y = x
1
y
1
+ 2x
1
y
2
+ 2x
2
y
1
defines a symmetric bilinear form, though not an inner product since it isn’t positive definite;
for example B( j, j) = 0.
As seen above, we often make use of a matrix.
Definition 2.72. Let B be a bilinear form on a finite-dimensional space with basis ϵ = {v
1
, . . . , v
n
}.
The matrix of B with respect to ϵ is the matrix [B]
ϵ
= A M
n
(F ) with ij
th
entry
A
ij
= B(v
i
, v
j
)
Given x, y V, compute their co-ordinate vectors [x]
ϵ
, [y]
ϵ
with respect to ϵ, then
B(x, y) = [x]
T
ϵ
A[y]
ϵ
The set of bilinear forms on V is therefore in bijective correspondence with M
n
(F ). Moreover,
B(y, x) = [y]
T
ϵ
A[x]
ϵ
=
[y]
T
ϵ
A[x]
ϵ
T
= [x]
T
ϵ
A
T
[y]
ϵ
Finally, if β is another basis of V, then an appeal to the change of co-ordinate matrix Q
β
ϵ
yields
B(x, y) = [x]
T
ϵ
A[y]
ϵ
= (Q
ϵ
β
[x]
β
)
T
A(Q
ϵ
β
[y]
ϵ
) = [x]
T
β
(Q
ϵ
β
)
T
AQ
ϵ
β
[y]
β
= [B]
β
= (Q
ϵ
β
)
T
[B]
ϵ
Q
ϵ
β
To summarize:
Lemma 2.73. Let B be a bilinear form on a finite-dimensional vector space.
1. If A is the matrix of B with respect to some basis, then every other matrix of B has the form
Q
T
AQ for some invertible Q.
2. B is symmetric if and only if its matrix with respect to any (and all) bases is symmetric.
Naturally, the simplest situation is when the matrix of B is diagonal. . .
59
Examples 2.74. 1. Example 2.71.2 can be written
B(x, y) = x
T
1 2
2 0
y = x
1
y
1
+ 2x
1
y
2
+ 2x
2
y
1
= (x
1
+ 2x
2
)( y
1
+ 2y
2
) 4x
2
y
2
=
x
1
+ 2x
2
x
2
1 0
0 4
y
1
+ 2y
2
y
2
= x
T
1 2
0 1
T
1 0
0 4
1 2
0 1
y
If ϵ is the standard basis, then [B]
β
=
1 0
0 4
where Q
β
ϵ
=
1 2
0 1
. It follows that Q
ϵ
β
=
1 2
0 1
from which β =
1
0
,
2
1
is a diagonalizing basis.
2. In general, we may perform a sequence of simultaneous row and column operations to diago-
nalize any symmetric B; we require only elementary matrices E
(λ)
ij
of type III.
14
For instance:
B(x, y) = x
T
Ay = x
T
1 2 3
2 0 4
3 4 1
y (A = [B]
ϵ
with respect to the standard basis)
E
(2)
21
AE
(2)
12
=
1 0 3
0 4 10
3 10 1
(add twice row 1 to row 2, columns similarly)
E
(3)
31
E
(2)
21
AE
(2)
12
E
(3)
13
=
1 0 0
0 4 10
0 10 10
(subtract 3 times row 1 from row 3, etc.)
E
(1)
23
E
(3)
31
E
(2)
21
AE
(2)
12
E
(3)
13
E
(1)
32
=
1 0 0
0 6 0
0 0 10
(add row 3 to row 2, etc.)
If β is the diagonalizing basis, then the change of co-ordinate matrix is
Q
ϵ
β
= E
(2)
12
E
(3)
13
E
(1)
32
=
1 2 0
0 1 0
0 0 1
1 0 3
0 1 0
0 0 1
1 0 0
0 1 0
0 1 1
=
1 1 3
0 1 0
0 1 1
from which β =
n
1
0
0
,
1
1
1
,
3
0
1
o
. If you’re having trouble believing this, invert the
change of co-ordinate matrix and check that
B(x, y) = (x
1
2x
2
+ 3x
3
)( y
1
2y
2
+ 3y
3
) + 6x
2
y
2
10(x
2
+ x
3
)( y
2
+ y
3
)
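The simultaneous row/column reduction used in the last example is easy to automate. The sketch below (NumPy assumed; generic symmetric input; no pivoting for the zero-diagonal or characteristic-two corner cases discussed later) returns Q with $Q^{\mathsf{T}}AQ$ diagonal, so the columns of Q form a diagonalizing basis.

```python
import numpy as np

def congruence_diagonalize(A):
    """Return (D, Q) with D = Q^T A Q diagonal, for symmetric A whose pivots stay non-zero.
    Each step adds a multiple of row k to a lower row and does the same column operation,
    exactly as with the type III elementary matrices E_ij^(lambda) above."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    Q = np.eye(n)
    for k in range(n):
        if np.isclose(A[k, k], 0):
            raise ValueError("zero pivot: first swap in a vector x with B(x, x) != 0")
        for i in range(k + 1, n):
            lam = -A[i, k] / A[k, k]
            A[i, :] += lam * A[k, :]     # row operation
            A[:, i] += lam * A[:, k]     # matching column operation
            Q[:, i] += lam * Q[:, k]     # accumulate the change-of-coordinates matrix
    return np.diag(A).copy(), Q

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 3.0],
              [0.0, 3.0, 2.0]])          # an arbitrary symmetric example
D, Q = congruence_diagonalize(A)
assert np.allclose(Q.T @ A @ Q, np.diag(D))
```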
Warning! If F = R then every symmetric B may be diagonalized by an orthonormal basis (for the
usual dot product on R
n
). It is very unlikely that our algorithm will produce such! The algorithm has
two main advantages over the spectral theorem: it is typically faster and it applies to vector spaces
over any field. As a disadvantage, it is highly non-unique.
14
Recall that E
(λ)
ij
is the identity matrix with an additional λ in the ij
th
entry.
As a column operation (right-multiplication), A 7→ AE
(λ)
ij
adds λ times the i
th
column to the j
th
.
As a row operation (left-multiplication), A 7 E
(λ)
ji
A = (E
(λ)
ij
)
T
A adds λ times the i
th
row to the j
th
.
60
Example 2.75. We diagonalize B( x, y) = x
T
1 6
6 3
y = x
1
y
1
+ 6x
1
y
2
+ 6x
2
y
1
+ 3x
2
y
2
in three ways.
1 0
6 1
1 6
6 3
1 6
0 1
=
1 0
0 33
= [B]
β
where β =
1
0
,
6
1
. This corresponds to
B(x, y) = (x
1
+ 6x
2
)( y
1
+ 6y
2
) 33x
2
y
2
1 2
0 1
1 6
6 3
1 0
2 1
=
11 0
0 3
= [B]
γ
where γ =
1
2
,
0
1
. This corresponds to
B(x, y) = 11x
1
y
1
+ 3(2x
1
+ x
2
)(2y
1
+ y
2
)
If F = R, we may apply the spectral theorem to see that [B]
η
=
2+
37 0
0 2
37
is diagonal
with respect to η =
n
6
1+
37
,
1
37
6
o
. The expression for B(x, y) in these co-ordinates is
disgusting, so we omit it; that B can be diagonalized orthogonally doesn’t mean it should be!
Theorem 2.76. Suppose B is a bilinear form of a finite-dimensional space V over F.
1. If B is diagonalizable, then it is symmetric.
2. If B is symmetric and F does not have characteristic two (see aside), then B is diagonalizable.
Proof. 1. If B is diagonalizable, β such that [B]
β
is diagonal and thus symmetric.
2. Suppose B is non-zero (otherwise the result is trivial). We prove by induction on n = dim V.
If n = 1, the result is trivial: B(x, y) = axy for some a F is clearly symmetric.
Fix n N and assume that every non-zero symmetric bilinear form on a dimension n vector
space over F is diagonalizable. Let dim V = n + 1. By the discussion below, x V such that
B(x, x) = 0. Consider the linear map
T : V F : v 7 B(x, v)
Clearly rank T = 1, whence dim N(T) = n. Moreover, B is symmetric when restricted to N(T);
by the induction hypothesis there exists a basis β of N(T) such that [B
N(T)
]
β
is diagonal. But
then B is diagonal with respect to the basis β {x}.
Aside: Characteristic two fields This means 1 + 1 = 0 in F; this holds, for instance, in the field
Z
2
= {0, 1} of remainders modulo 2. We now see the importance of char F = 2 to the above result.
The proof uses the existence of x V such that B(x, x) = 0. If B is non-zero, u, v such that
B(u, v) = 0. If both B(u, u) = 0 = B(v, v), then x = u + v does the job whenever char F = 2:
B(x, x) = B(u, v) + B(v, u) = 2B(u, v) = 0
To see that the requirement isn’t idle, consider B(x, y) = x
T
0 1
1 0
y on the finite vector space
Z
2
2
= {
0
0
,
1
0
,
0
1
,
1
1
} over Z
2
. Every element of this space satisfies B(x, x) = 0! Perhaps
surprisingly, the matrix of B is identical with respect to any basis of Z
2
2
, whence B is symmetric
but non-diagonalizable.
61
In Example 2.75 notice how the three diagonal matrix representations have something in common: each has one positive and one negative diagonal entry. This is a general phenomenon:
Theorem 2.77 (Sylvesters Law of Inertia). Suppose B is a symmetric bilinear form on a real vector
space V with diagonal matrix representation diag(λ
1
, . . . , λ
n
). Then the number of entries λ
j
which
are positive/negative/zero is independent of the diagonal representation.
Definition 2.78. The signature of a symmetric bilinear form B is the triple (n
+
, n
, n
0
) representing
how many positive, negative and zero terms are in any diagonal representation. Sylvester’s Law says
that the signature is an invariant of a symmetric bilinear form.
Positive-definiteness says that a real inner product on an n-dimensional space has signature (n, 0, 0).
Practitioners of relativity often work in Minkowski spacetime: R
4
equipped with a signature (1, 3, 0)
bilinear form, typically
B(x, y) = x
T
c
2
0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
y = c
2
x
1
y
1
x
2
y
2
x
3
y
3
x
4
y
4
where c is the speed of light. Vectors are time-, space-, or light-like depending on whether B(x, x) is
positive, negative or zero. For instance x = 3c
1
e
1
+ 2e
2
+ 2e
3
+ e
4
is light-like.
Proof. For simplicity, let V = R^n and write B(x, y) = x^T A y where A is symmetric.
1. Define rank B := rank A and observe this is independent of basis (exercises).
2. Let β, γ be diagonalizing bases, ordered according to whether B is positive, negative or zero:
β = {v_1, . . . , v_p, v_{p+1}, . . . , v_r, v_{r+1}, . . . , v_n}
γ = {w_1, . . . , w_q, w_{q+1}, . . . , w_r, w_{r+1}, . . . , w_n}
Here r = rank B in accordance with part 1: our goal is to prove that p = q.
3. Assume p < q, define the matrix C and check what follows:
C = \begin{pmatrix} v_1^T A \\ \vdots \\ v_p^T A \\ w_{q+1}^T A \\ \vdots \\ w_r^T A \end{pmatrix} ∈ M_{(r-q+p)×n}(R)
(a) rank C ≤ r - q + p < r ⟹ null C > n - r, thus
∃x ∈ R^n such that Cx = 0 and x ∉ Span{v_{r+1}, . . . , v_n}
(b) The first p entries of Cx = 0 mean that x ∈ Span{v_{p+1}, . . . , v_r, v_{r+1}, . . . , v_n} and so B(x, x) < 0. Note how we use part (a) to get a strict inequality here.
(c) Now write x with respect to γ: this time we see that x ∈ Span{w_1, . . . , w_q, w_{r+1}, . . . , w_n}, whence B(x, x) ≥ 0: a contradiction.
Quadratic Forms & Diagonalizing Conics
Definition 2.79. To every symmetric bilinear form B : V × V → F is associated a quadratic form
K : V → F : x ↦ B(x, x)
A function K : V → F is termed a quadratic form when such a symmetric bilinear form exists.
Examples 2.80. 1. If B is a real inner product, then K(v) = ⟨v, v⟩ = ||v||^2 is the square of the norm.
2. Let dim V = n and A be the matrix of B with respect to a basis β. By the symmetry of A,
K(x) = x^T A x = Σ_{i,j=1}^n x_i A_ij x_j = Σ_{1≤i≤j≤n} ã_ij x_i x_j,  where ã_ij = A_ij if i = j and ã_ij = 2A_ij if i ≠ j
E.g., K(x) = 3x_1^2 + 4x_2^2 - 2x_1x_2 corresponds to the bilinear form B(x, y) = x^T \begin{pmatrix} 3 & -1 \\ -1 & 4 \end{pmatrix} y
As a fun application, we consider the diagonalization of conics in R^2. The general non-zero conic has equation
ax^2 + 2bxy + cy^2 + dx + ey + f = 0
where the first three terms comprise a quadratic form
K(x) = ax^2 + 2bxy + cy^2,   B(v, w) = v^T \begin{pmatrix} a & b \\ b & c \end{pmatrix} w
If {v_1, v_2} is a diagonalizing basis, then there exist scalars λ_1, λ_2 for which
K(t_1 v_1 + t_2 v_2) = λ_1 t_1^2 + λ_2 t_2^2
whence the general conic becomes
λ_1 t_1^2 + λ_2 t_2^2 + µ_1 t_1 + µ_2 t_2 = η,   λ_1, λ_2, µ_1, µ_2, η ∈ R
If λ_j ≠ 0, we may complete the square via the linear transformation s_j = t_j + µ_j/(2λ_j). The canonical forms are then recovered:
Parabola: λ_1 or λ_2 = 0 (but not both).
Ellipse: λ_1 s_1^2 + λ_2 s_2^2 = k ≠ 0 where λ_1, λ_2, k have the same sign.
Hyperbola: λ_1 s_1^2 + λ_2 s_2^2 = k ≠ 0 where λ_1, λ_2 have opposite signs.
Since B is symmetric, we may take {v_1, v_2} to be an orthonormal basis of R^2, whence any conic may be put in canonical form by applying only a rotation/reflection and translation (completing the square). Alternatively, we could diagonalize K using our earlier algorithm; this additionally permits shear transforms. By Sylvester's Law, the diagonal entries will have the same number of (+, -, 0) terms regardless of the method, so the canonical form will be unchanged.
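Before the worked examples, here is a rough computational version of this classification. It is a Python/numpy sketch written for these notes (the function name classify_conic is ours), and it only inspects the quadratic part, so it presumes the conic is non-empty and non-degenerate.

import numpy as np

def classify_conic(a, b, c):
    """Classify ax^2 + 2bxy + cy^2 + dx + ey + f = 0 from its quadratic part.

    The eigenvalues of the symmetric matrix [[a, b], [b, c]] determine the type,
    exactly as in the canonical forms listed above.
    """
    lam1, lam2 = np.linalg.eigvalsh(np.array([[a, b], [b, c]], dtype=float))
    if abs(lam1 * lam2) < 1e-12:
        return "parabola"              # one eigenvalue is zero
    return "ellipse" if lam1 * lam2 > 0 else "hyperbola"

print(classify_conic(7, 12, 0))        # the conic 7x^2 + 24xy = 144 below: hyperbola
print(classify_conic(1, 6, 3))         # x^2 + 12xy + 3y^2 = -33 below: hyperbola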
Examples 2.81. 1. We describe and plot the conic with equation 7x^2 + 24xy = 144.
The matrix of the associated bilinear form is \begin{pmatrix} 7 & 12 \\ 12 & 0 \end{pmatrix}, which has orthonormal eigenbasis
β = {v_1, v_2} = { (1/5)(4, 3), (1/5)(-3, 4) }
with eigenvalues (λ_1, λ_2) = (16, -9). In the rotated basis, this is the canonical hyperbola
16t_1^2 - 9t_2^2 = 144 ⟺ t_1^2/3^2 - t_2^2/4^2 = 1
which is easily plotted. In case this is too fast, use the change of co-ordinate matrix to compute directly:
Q^ϵ_β = (1/5)\begin{pmatrix} 4 & -3 \\ 3 & 4 \end{pmatrix} ⟹ \begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = [x]_β = Q^β_ϵ \begin{pmatrix} x \\ y \end{pmatrix} = (1/5)\begin{pmatrix} 4x + 3y \\ -3x + 4y \end{pmatrix}
which quickly recovers the original conic by substitution.
[Plot omitted: the hyperbola in the (x, y)-plane together with the rotated axes t_1, t_2 along v_1, v_2.]
2. The conic defined by K(x) = x^2 + 12xy + 3y^2 = -33 defines a hyperbola in accordance with Example 2.75. With respect to the basis γ = { (1, -2), (0, 1) }, we see that
K(x) = -11x^2 + 3(2x + y)^2 = -11t_1^2 + 3t_2^2 = -33 ⟹ t_1^2/3 - t_2^2/11 = 1
Even though γ is non-orthogonal, we can still plot the conic! If we instead used the orthonormal basis η, we'd obtain
K(x) = (√37 + 2)s_1^2 - (√37 - 2)s_2^2 = -33
however the calculation to find η is time-consuming and the expressions for s_1, s_2 are extremely ugly.
[Plot omitted: the hyperbola in the (x, y)-plane together with the skewed axes t_1, t_2 along the basis γ.]
A similar approach can be applied to higher degree quadratic equations/manifolds: e.g. ellipsoids, paraboloids and hyperboloids in R^3.
Exercises 2.8 1. Prove that the sum of any two bilinear forms is bilinear, and that any scalar multi-
ple of a bilinear form is bilinear: thus the set of bilinear forms on V is a vector space.
(You can’t use matrices here, since V could be infinite-dimensional!)
2. Compute the matrix of the bilinear form
B(x, y) = x_1y_1 - 2x_1y_2 + x_2y_1 - x_3y_3
on R^3 with respect to the basis β = { (1, 0, 1), (1, 0, -1), (0, 1, 0) }.
3. Check that the function B(f, g) = f′(0)g′′(0) is a bilinear form on the vector space of twice-differentiable functions. Find the matrix of B with respect to β = {cos t, sin t, cos 2t, sin 2t} when restricted to the subspace Span β.
4. For each matrix A, find a diagonal matrix D and an invertible matrix Q such that Q^T AQ = D.
(a) \begin{pmatrix} 1 & 3 \\ 3 & 2 \end{pmatrix}   (b) \begin{pmatrix} 3 & 1 & 2 \\ 1 & 4 & 0 \\ 2 & 0 & 1 \end{pmatrix}
5. If K is a quadratic form and K(x) = 2, what is the value of K(3x)?
6. If F does not have characteristic 2, and K(x) = B(x, x) is a quadratic form, prove that we can recover the bilinear form B via
B(x, y) = (1/2)( K(x + y) - K(x) - K(y) )
7. If B(x, y) = x^T \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} y is a bilinear form on F^2, compute the quadratic form K(x).
8. Suppose B is a symmetric bilinear form on a real finite-dimensional space. With reference to the proof of Sylvester's Law, explain why rank B is independent of the choice of diagonalizing basis.
9. If char F ≠ 2, apply the diagonalizing algorithm to the symmetric bilinear form B = x^T \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} y on F^2. What goes wrong if char F = 2?
10. Describe and plot the following conics:
(a) x^2 + y^2 + xy = 6
(b) 35x^2 + 120xy = 4x + 3y
11. Suppose that a non-empty, non-degenerate^15 conic C in R^2 has the form ax^2 + 2bxy + cy^2 + dx + ey + f = 0, where at least one of a, b, c ≠ 0, and define Δ = b^2 - ac. Prove that:
C is a parabola if and only if Δ = 0;
C is an ellipse if and only if Δ < 0;
C is a hyperbola if and only if Δ > 0.
(Hint: λ_1, λ_2 are the eigenvalues of a symmetric matrix, so. . . )
^15 The conic contains at least two points and cannot be factorized as a product of two straight lines: for example, the following are disallowed;
x^2 + y^2 + 1 = 0 is empty (unless one allows conics over C . . .)
x^2 + y^2 = 0 contains only one point;
x^2 - xy - x + y = (x - 1)(x - y) = 0 is the product of two lines.
3 Canonical Forms
3.1 Jordan Forms & Generalized Eigenvectors
Throughout this course we've concerned ourselves with variations of a general question: for a given map T ∈ L(V), find a basis β such that the matrix [T]_β is as close to diagonal as possible. In this chapter we see what is possible when T is non-diagonalizable.
Example 3.1. The matrix A = \begin{pmatrix} -8 & 4 \\ -25 & 12 \end{pmatrix} ∈ M_2(R) has characteristic equation
p(t) = (-8 - t)(12 - t) + 4 · 25 = t^2 - 4t + 4 = (t - 2)^2
and thus a single eigenvalue λ = 2. It is non-diagonalizable since the eigenspace is one-dimensional
E_2 = N\begin{pmatrix} -10 & 4 \\ -25 & 10 \end{pmatrix} = Span{ (2, 5) }
However, if we consider a basis β = {v_1, v_2} where v_1 = (2, 5) is an eigenvector, then [L_A]_β is upper-triangular, which is better than nothing! How simple can we make this matrix? Let v_2 = (x, y), then
Av_2 = \begin{pmatrix} -8x + 4y \\ -25x + 12y \end{pmatrix} = 2\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -10x + 4y \\ -25x + 10y \end{pmatrix} = 2v_2 + (-5x + 2y)v_1 ⟹ [L_A]_β = \begin{pmatrix} 2 & -5x + 2y \\ 0 & 2 \end{pmatrix}
Since v_2 cannot be parallel to v_1, the only thing we cannot have is a diagonal matrix. The next best thing is for the upper right corner to be 1; for instance we could choose
β = {v_1, v_2} = { (2, 5), (1, 3) } ⟹ [L_A]_β = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}
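Computations like this are easy to check with a computer algebra system. The following sympy sketch (an addition to these notes, not part of the original text) verifies the basis chosen above and also asks sympy for a Jordan form directly.

import sympy as sp

A = sp.Matrix([[-8, 4], [-25, 12]])      # the matrix of Example 3.1
Q = sp.Matrix([[2, 1], [5, 3]])          # columns: the basis beta = {(2,5), (1,3)}

print(Q.inv() * A * Q)                   # Matrix([[2, 1], [0, 2]]) = [L_A]_beta
print(A.jordan_form()[1])                # sympy's own Jordan form: Matrix([[2, 1], [0, 2]])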
Definition 3.2. A Jordan block is a square matrix of the form
J = \begin{pmatrix} λ & 1 & & \\ & λ & \ddots & \\ & & \ddots & 1 \\ & & & λ \end{pmatrix}
where all non-indicated entries are zero. Any 1 × 1 matrix is also a Jordan block.
A Jordan canonical form is a block-diagonal matrix diag(J_1, . . . , J_m) where each J_k is a Jordan block.
A Jordan canonical basis for T ∈ L(V) is a basis β of V such that [T]_β is a Jordan canonical form.
If a map is diagonalizable, then any eigenbasis is Jordan canonical and the corresponding Jordan canonical form is diagonal. What about more generally? Does every non-diagonalizable map have a Jordan canonical basis? If so, how can we find such?
Example 3.3. It can easily be checked that β = {v_1, v_2, v_3} = { (1, 0, 1), (1, 2, 0), (1, 1, 1) } is a Jordan canonical basis for
A = \begin{pmatrix} -1 & 2 & 3 \\ -4 & 5 & 4 \\ -2 & 1 & 4 \end{pmatrix}
(really L_A ∈ L(R^3)). Indeed
Av_1 = 2v_1,   Av_2 = 3v_2,   Av_3 = \begin{pmatrix} 4 \\ 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 + 3 \\ 2 + 3 \\ 0 + 3 \end{pmatrix} = v_2 + 3v_3   ⟹   [L_A]_β = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}
Generalized Eigenvectors
Example 3.3 was easy to check, but how would we go about finding a suitable β if we were merely
given A? We brute-forced this in Example 3.1, but such is not a reasonable approach in general.
Eigenvectors get us some of the way:
v_1 is an eigenvector in Example 3.1, but v_2 is not.
v_1 and v_2 are eigenvectors in Example 3.3, but v_3 is not.
The practical question is how to fill out a Jordan canonical basis once we have a maximal independent
set of eigenvectors. We now define the necessary objects.
Definition 3.4. Suppose T ∈ L(V) has an eigenvalue λ. Its generalized eigenspace is
K_λ := { x ∈ V : (T - λI)^k(x) = 0 for some k ∈ N } = ⋃_{k∈N} N(T - λI)^k
A generalized eigenvector is any non-zero v ∈ K_λ.
As with eigenspaces, the generalized eigenspaces of A ∈ M_n(F) are those of the map L_A ∈ L(F^n).
It is easy to check that our earlier Jordan canonical bases consist of generalized eigenvectors.
Example 3.1: We have one eigenvalue λ = 2. Since (A - 2I)^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} is the zero matrix, every non-zero vector is a generalized eigenvector; plainly K_2 = R^2.
Example 3.3: We see that
(A - 2I)v_1 = 0,   (A - 3I)v_2 = 0,   (A - 3I)^2 v_3 = (A - 3I)v_2 = 0
whence β is a basis of generalized eigenvectors. Indeed
K_2 = E_2 = Span{v_1},   K_3 = Span{v_2, v_3}
though verifying this with current technology is a little awkward. . .
In order to easily compute generalized eigenspaces, it is useful to invoke the main result of this
section. We postpone the proof for a while due to its meatiness.
Theorem 3.5. Suppose that the characteristic polynomial of T ∈ L(V) splits over F:
p(t) = (λ_1 - t)^{m_1} · · · (λ_k - t)^{m_k}
where the λ_j are the distinct eigenvalues of T with algebraic multiplicities m_j. Then:
1. For each eigenvalue λ with multiplicity m: (a) K_λ = N(T - λI)^m and (b) dim K_λ = m.
2. V = K_{λ_1} ⊕ · · · ⊕ K_{λ_k}: there exists a basis of generalized eigenvectors.
Compare this with the statement on diagonalizability from the start of the course.
With regard to part 2; we shall eventually be able to choose this to be a Jordan canonical basis. In conclusion: a map has a Jordan canonical basis if and only if its characteristic polynomial splits.
Compare this with the statement on diagonalizability from the start of the course.
With regard to part 2; we shall eventually be able to choose this to be a Jordan canonical basis. In
conclusion: a map has a Jordan canonical basis if and only if its characteristic polynomial splits.
Examples 3.6. 1. Observe how Example 3.3 works in this language:
A = \begin{pmatrix} -1 & 2 & 3 \\ -4 & 5 & 4 \\ -2 & 1 & 4 \end{pmatrix} ⟹ p(t) = (2 - t)^1 (3 - t)^2
K_2 = N(A - 2I)^1 = Span{ (1, 0, 1) } ⟹ dim K_2 = 1
K_3 = N(A - 3I)^2 = N\begin{pmatrix} 2 & -1 & -1 \\ 0 & 0 & 0 \\ 2 & -1 & -1 \end{pmatrix} = Span{ (1, 2, 0), (1, 1, 1) } ⟹ dim K_3 = 2
R^3 = K_2 ⊕ K_3
2. We find the generalized eigenspaces of the matrix A = \begin{pmatrix} 5 & -2 & -1 \\ 0 & 0 & 0 \\ 9 & -6 & -1 \end{pmatrix}
The characteristic polynomial is
p(t) = det(A - tI) = -t \begin{vmatrix} 5 - t & -1 \\ 9 & -1 - t \end{vmatrix} = -t(t^2 - 5t + t - 5 + 9) = (0 - t)^1 (2 - t)^2
λ = 0 has multiplicity 1; indeed K_0 = N(A - 0I)^1 = N(A) = Span{ (1, 1, 3) } is just the eigenspace E_0.
λ = 2 has multiplicity 2,
K_2 = N(A - 2I)^2 = N\begin{pmatrix} 3 & -2 & -1 \\ 0 & -2 & 0 \\ 9 & -6 & -3 \end{pmatrix}^2 = N\begin{pmatrix} 0 & 4 & 0 \\ 0 & 4 & 0 \\ 0 & 12 & 0 \end{pmatrix} = Span{ (1, 0, 0), (0, 0, 1) }
In this case the corresponding eigenspace is one-dimensional, E_2 = Span{ (1, 0, 3) } ⊊ K_2, and the matrix is non-diagonalizable.
Observe also that R^3 = K_0 ⊕ K_2 in accordance with the Theorem.
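These nullspace computations are also easily mechanized. Here is a short sympy sketch added to the notes, using the matrix of part 2 with the signs as reconstructed above:

import sympy as sp

A = sp.Matrix([[5, -2, -1], [0, 0, 0], [9, -6, -1]])
I = sp.eye(3)

print(A.nullspace())                  # K_0 = E_0: spanned by (1, 1, 3)
print(((A - 2*I)**2).nullspace())     # K_2: spanned by e_1 and e_3
print((A - 2*I).nullspace())          # E_2 is only one-dimensional, so A is not diagonalizable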
Properties of Generalized Eigenspaces and the Proof of Theorem 3.5
A lot of work is required to justify our main result. Feel free to skip the proofs at first reading.
Lemma 3.7. Let λ be an eigenvalue of T ∈ L(V). Then:
1. E_λ is a subspace of K_λ, which is itself a subspace of V.
2. K_λ is T-invariant.
3. Suppose K_λ is finite-dimensional and µ ≠ λ. Then:
(a) K_λ is (T - µI)-invariant and the restriction of T - µI to K_λ is an isomorphism.
(b) If µ is another eigenvalue, then K_λ ∩ K_µ = {0}. In particular K_λ contains no eigenvectors other than those in E_λ.
Proof. 1. These are an easy exercise.
2. Let x ∈ K_λ, then ∃k such that (T - λI)^k(x) = 0. But then
(T - λI)^k( T(x) ) = (T - λI)^k( T(x) - λx + λx ) = (T - λI)^{k+1}(x) + λ(T - λI)^k(x) = 0
Otherwise said, T(x) ∈ K_λ.
3. (a) Let x ∈ K_λ. Part 2 tells us that
(T - µI)(x) = T(x) - µx ∈ K_λ
whence K_λ is (T - µI)-invariant.
Suppose, for a contradiction, that T - µI is not injective on K_λ. Then
∃y ∈ K_λ \ {0} such that (T - µI)(y) = 0
Let k ∈ N be minimal such that (T - λI)^k(y) = 0 and let z = (T - λI)^{k-1}(y). Plainly z ≠ 0, for otherwise k is not minimal. Moreover,
(T - λI)(z) = (T - λI)^k(y) = 0 ⟹ z ∈ E_λ
Since T - µI and T - λI commute, we can also compute the effect of T - µI;
(T - µI)(z) = (T - µI)(T - λI)^{k-1}(y) = (T - λI)^{k-1}(T - µI)(y) = 0
which says that z is an eigenvector in E_µ; if µ isn't an eigenvalue, then we already have our contradiction! Even if µ is an eigenvalue, E_µ ∩ E_λ = {0} provides the desired contradiction.
We conclude that (T - µI)|_{K_λ} ∈ L(K_λ) is injective. Since dim K_λ < ∞, the restriction is automatically an isomorphism.
(b) This is another exercise.
Now to prove Theorem 3.5: remember that the characteristic polynomial of T is assumed to split.
Proof. (Part 1(a)) Fix an eigenvalue λ. By definition, we have N(T - λI)^m ⊆ K_λ.
For the converse, parts 2 and 3 of the Lemma tell us (why?) that the characteristic polynomial of the restriction T|_{K_λ} is
p_λ(t) = (λ - t)^{dim K_λ},   from which dim K_λ ≤ m   (∗)
By the Cayley–Hamilton Theorem, T|_{K_λ} satisfies its characteristic polynomial, whence
∀x ∈ K_λ,   (λI - T)^{dim K_λ}(x) = 0 ⟹ K_λ ⊆ N(T - λI)^m
(Parts 1(b) and 2) We prove simultaneously by induction on the number of distinct eigenvalues of T.
(Base case) If T has only one eigenvalue, then p(t) = (λ - t)^m. Another appeal to Cayley–Hamilton says (T - λI)^m(x) = 0 for all x ∈ V. Thus V = K_λ and dim K_λ = m.
(Induction step) Fix k and suppose the results hold for maps with k distinct eigenvalues. Let T have distinct eigenvalues λ_1, . . . , λ_k, µ, with multiplicities m_1, . . . , m_k, m respectively. Define^16
W = R(T - µI)^m
The subspace W has the following properties, the first two of which we leave as exercises:
W is T-invariant.
W ∩ K_µ = {0} so that µ is not an eigenvalue of the restriction T|_W.
Each K_{λ_j} ⊆ W: since (T - µI)|_{K_{λ_j}} is an isomorphism (Lemma part 3), we can invert,
x ∈ K_{λ_j} ⟹ x = (T - µI)^m ( (T - µI)|_{K_{λ_j}} )^{-m}(x) ∈ R(T - µI)^m = W
We conclude that λ_j is an eigenvalue of the restriction T|_W with generalized eigenspace K_{λ_j}. Since T|_W has k distinct eigenvalues, the induction hypotheses apply:
W = K_{λ_1} ⊕ · · · ⊕ K_{λ_k}   and   p_W(t) = (λ_1 - t)^{dim K_{λ_1}} · · · (λ_k - t)^{dim K_{λ_k}}
Since W ∩ K_µ = {0} it is enough finally to use the rank–nullity theorem and count dimensions:
dim V = rank(T - µI)^m + null(T - µI)^m = dim W + dim K_µ = Σ_{j=1}^k dim K_{λ_j} + dim K_µ ≤_(∗) m_1 + · · · + m_k + m = deg(p(t)) = dim V
The inequality is thus an equality; each dim K_{λ_j} = m_j and dim K_µ = m. We conclude that
V = K_{λ_1} ⊕ · · · ⊕ K_{λ_k} ⊕ K_µ
which completes the induction step and thus the proof. Whew!
^16 This is yet another argument where we consider a suitable subspace to which we can apply an induction hypothesis; recall the spectral theorem, Schur's lemma, bilinear form diagonalization, etc. Theorem 3.12 will provide one more!
Cycles of Generalized Eigenvectors
By Theorem 3.5, for every linear map whose characteristic polynomial splits there exists a generalized eigenbasis. This isn't the same as a Jordan canonical basis, but we're very close!
Example 3.8. The matrix A = \begin{pmatrix} 5 & 1 & 0 \\ 0 & 5 & 1 \\ 0 & 0 & 5 \end{pmatrix} ∈ M_3(R) is a single Jordan block, whence there is a single generalized eigenspace K_5 = R^3 and the standard basis ϵ = {e_1, e_2, e_3} is Jordan canonical.
The crucial observation for what follows is that one of these vectors, e_3, generates the others via repeated applications of A - 5I:
e_2 = (A - 5I)e_3,   e_1 = (A - 5I)e_2 = (A - 5I)^2 e_3
Definition 3.9. A cycle of generalized eigenvectors for a linear operator T is a set
β_x := { (T - λI)^{k-1}(x), . . . , (T - λI)(x), x }
where the generator x ∈ K_λ is non-zero and k is minimal such that (T - λI)^k(x) = 0.
Note that the first element (T - λI)^{k-1}(x) is an eigenvector.
Our goal is to show that K_λ has a basis consisting of cycles of generalized eigenvectors; putting these together results in a Jordan canonical basis.
Lemma 3.10. Let β_x be a cycle of generalized eigenvectors of T with length k. Then:
1. β_x is linearly independent and thus a basis of Span β_x.
2. Span β_x is T-invariant. With respect to β_x, the matrix of the restriction of T is the k × k Jordan block
[T|_{Span β_x}]_{β_x} = \begin{pmatrix} λ & 1 & & \\ & λ & \ddots & \\ & & \ddots & 1 \\ & & & λ \end{pmatrix}
In what follows, it will be useful to consider the linear map U = T - λI. Note the following:
The nullspace of U is the eigenspace: N(U) = E_λ ⊆ K_λ.
T commutes with U: that is TU = UT.
β_x = {U^{k-1}(x), . . . , U(x), x}; that is, Span β_x = ⟨x⟩ is the U-cyclic subspace generated by x.
Proof. 1. Feed the linear combination Σ_{j=0}^{k-1} a_j U^j(x) = 0 to U^{k-1} to obtain
a_0 U^{k-1}(x) = 0 ⟹ a_0 = 0
Now feed the same combination to U^{k-2}, etc., to see that all coefficients a_j = 0.
2. Since T and U commute, we see that
T( U^j(x) ) = U^j( T(x) ) = U^j( (U + λI)(x) ) = U^{j+1}(x) + λU^j(x) ∈ Span β_x
This justifies both T-invariance and the Jordan block claim!
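In computations it is handy to generate a cycle directly from its generator. Here is a minimal numpy sketch (the function cycle is an invention for these notes); it assumes x really lies in K_λ and that the matrix entries are exact (e.g. integers), since it tests for exact zero.

import numpy as np

def cycle(A, lam, x):
    """The cycle of Definition 3.9: [U^(k-1) x, ..., U x, x] with U = A - lam*I."""
    U = A - lam * np.eye(A.shape[0])
    vecs, v = [x], U @ x
    while np.any(v != 0):
        vecs.append(v)
        v = U @ v
    return vecs[::-1]                  # eigenvector first, generator last

# Example 3.8: a single Jordan block; e_3 generates the whole standard basis
A = np.array([[5., 1., 0.], [0., 5., 1.], [0., 0., 5.]])
print(cycle(A, 5.0, np.array([0., 0., 1.])))    # [e_1, e_2, e_3]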
The basic approach to finding a Jordan canonical basis is to find the generalized eigenspaces and play with cycles until you find a basis for each K_λ. Many choices of canonical basis exist for a given map! We'll consider a more systematic method in the next section.
Examples 3.11. 1. The characteristic polynomial of A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 6 \\ 6 & -2 & 1 \end{pmatrix} ∈ M_3(R) splits:
p(t) = (1 - t)\begin{vmatrix} 1 - t & 6 \\ -2 & 1 - t \end{vmatrix} + 2\begin{vmatrix} 0 & 1 - t \\ 6 & -2 \end{vmatrix} = (1 - t)[ (1 - t)^2 + 12 - 12 ] = (1 - t)^3
With only one eigenvalue we see that K_1 = R^3. Simply choose any vector in R^3 and see what U = A - I does to it! For instance, with x = e_1,
β_x = { U^2(1, 0, 0), U(1, 0, 0), (1, 0, 0) } = { (12, 36, 0), (0, 0, 6), (1, 0, 0) }
provides a Jordan canonical basis of R^3. We conclude
A = QJQ^{-1} = \begin{pmatrix} 12 & 0 & 1 \\ 36 & 0 & 0 \\ 0 & 6 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 12 & 0 & 1 \\ 36 & 0 & 0 \\ 0 & 6 & 0 \end{pmatrix}^{-1}
In practice, almost any choice of x ∈ R^3 will generate a cycle of length three!
2. The matrix B = \begin{pmatrix} 7 & 1 & -4 \\ 0 & 3 & 0 \\ 8 & 1 & -5 \end{pmatrix} ∈ M_3(R) has characteristic equation
p(t) = (3 - t)(t^2 - 2t - 3) = -(t + 1)^1 (t - 3)^2
dim K_{-1} = 1 ⟹ K_{-1} = E_{-1} = Span{ (1, 0, 2) }, spanned by a cycle of length one.
Since dim K_3 = 2, we have
K_3 = N(B - 3I)^2 = N\begin{pmatrix} 4 & 1 & -4 \\ 0 & 0 & 0 \\ 8 & 1 & -8 \end{pmatrix}^2 = N\begin{pmatrix} -16 & 0 & 16 \\ 0 & 0 & 0 \\ -32 & 0 & 32 \end{pmatrix} = Span{ (1, 0, 1), (0, 1, 0) }
This is spanned by a cycle of length two: (1, 0, 1) is an eigenvector and
(1, 0, 1) = (B - 3I)(0, 1, 0)
We conclude that β = { (1, 0, 2), (1, 0, 1), (0, 1, 0) } is a Jordan canonical basis for B, and that
B = QJQ^{-1} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} -1 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix}^{-1}
3. Let T = d/dx on P_3(R). With respect to the standard basis ϵ = {1, x, x^2, x^3},
A = [T]_ϵ = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}
With only one eigenvalue λ = 0, we have a single generalized eigenspace K_0 = P_3(R). It is easy to check that f(x) = x^3 generates a cycle of length four and thus a Jordan canonical basis:
β = {6, 6x, 3x^2, x^3} ⟹ [T]_β = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 6 & 0 & 0 & 0 \\ 0 & 6 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 6 & 0 & 0 & 0 \\ 0 & 6 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
Our final results state that this process works generally.
Theorem 3.12. Let T ∈ L(V) have an eigenvalue λ. If dim K_λ < ∞, then there exists a basis
β_λ = β_{x_1} ∪ · · · ∪ β_{x_n}
of K_λ consisting of finitely many linearly independent cycles.
Intuition suggests that we create cycles β_{x_j} by starting with a basis of the eigenspace E_λ and extending backwards: for each x, if x = (T - λI)(y), then x ∈ β_y; now repeat until you have a maximum length cycle. This is essentially what we do, though a sneaky induction is required to make sure we keep track of everything and guarantee that the result really is a basis of K_λ.
Proof. We prove by induction on m = dim K_λ.
(Base case) If m = 1, then K_λ = E_λ = Span x for some eigenvector x. Plainly {x} = β_x.
(Induction step) Fix m ≥ 2. Write n = dim E_λ ≤ m and U = (T - λI)|_{K_λ}.
(i) For the induction hypothesis, suppose every generalized eigenspace with dimension < m (for any linear map!) has a basis consisting of independent cycles of generalized eigenvectors.
(ii) Define W = R(U) ∩ E_λ: that is
w ∈ W ⟺ U(w) = 0 and w = U(v) for some v ∈ K_λ
Let k = dim W, choose a complementary subspace X such that E_λ = W ⊕ X and select a basis {x_{k+1}, . . . , x_n} of X. If k = 0, the induction step is finished (why?). Otherwise we continue. . .
(iii) The calculation in the proof of Lemma 3.10 (take j = 1) shows that R(U) is T-invariant; it is therefore the single generalized eigenspace K̃_λ of T|_{R(U)}.
(iv) By the rank–nullity theorem,
dim R(U) = rank U = dim K_λ - null U = m - dim E_λ < m
By the induction hypothesis, R(U) has a basis of independent cycles. Since the last non-zero element in each cycle is an eigenvector, this basis consists of k distinct cycles β_{x̂_1} ∪ · · · ∪ β_{x̂_k} whose terminal vectors form a basis of W.
(v) Since each x̂_j ∈ R(U), there exist vectors x_1, . . . , x_k such that x̂_j = U(x_j). Including the length-one cycles generated by the basis of X, the cycles β_{x_1}, . . . , β_{x_n} now contain
dim R(U) + k + (n - k) = rank U + null U = m
vectors. We leave as an exercise the verification that these vectors are linearly independent.
Corollary 3.13. Suppose that the characteristic polynomial of T ∈ L(V) splits (necessarily dim V < ∞). Then there exists a Jordan canonical basis, namely the union of bases β_λ from Theorem 3.12.
Proof. By Theorem 3.5, V is the direct sum of generalized eigenspaces. By the previous result, each K_λ has a basis β_λ consisting of finitely many cycles. By Lemma 3.10, the matrix of T|_{K_λ} has Jordan canonical form with respect to β_λ. It follows that β = ⋃ β_λ is a Jordan canonical basis for T.
Exercises 3.1 1. For each matrix, find the generalized eigenspaces K_λ, find bases consisting of unions of disjoint cycles of generalized eigenvectors, and thus find a Jordan canonical form J and invertible Q so that the matrix may be expressed as QJQ^{-1}.
(a) A = \begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix}   (b) B = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}   (c) C = \begin{pmatrix} 11 & 4 & 5 \\ 21 & 8 & 11 \\ 3 & 1 & 0 \end{pmatrix}   (d) D = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 1 & 1 & 3 \end{pmatrix}
2. If β = {v_1, . . . , v_n} is a Jordan canonical basis, what can you say about v_1? Briefly explain why the linear map L_A ∈ L(R^2) where A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} has no Jordan canonical form.
3. Find a Jordan canonical basis for each linear map T:
(a) T ∈ L(P_2(R)) defined by T(f(x)) = 2f(x) - f′(x)
(b) T(f) = f′ defined on Span{1, t, t^2, e^t, te^t}
(c) T(A) = 2A + A^T defined on M_2(R)
4. In Example 3.11.1, suppose x = (a, b, c). Show that almost any choice of a, b, c produces a Jordan canonical basis β_x.
5. We complete the proof of Lemma 3.7.
(a) Prove part 1: that E_λ ≤ K_λ ≤ V.
(b) Verify that T - µI and T - λI commute.
(c) Prove part 3(b): generalized eigenspaces for distinct eigenvalues have trivial intersection.
6. Consider the induction step in the proof of Theorem 3.5.
(a) Prove that W is T-invariant.
(b) Explain why W ∩ K_µ = {0}.
(c) The assumption p_W(t) = (λ_1 - t)^{dim K_{λ_1}} · · · (λ_k - t)^{dim K_{λ_k}} near the end of the proof is the induction hypothesis for part 1(b). Why can't we also assume that dim K_{λ_j} = m_j and thus tidy the inequality argument near the end of the proof?
7. We finish some of the details of Theorem 3.12.
(a) In step (ii), suppose dim W = k = 0. Explain why {x_1, . . . , x_n} is in fact a basis of K_λ, so that the rest of the proof is unnecessary.
(b) In step (v), prove that the m vectors in the cycles β_{x_1}, . . . , β_{x_n} are linearly independent.
(Hint: model your argument on part 1 of Lemma 3.10)
3.2 Cycle Patterns and the Dot Diagram
In this section we obtain a useful result that helps us compute Jordan forms more efficiently and
systematically. To give us some clues how to proceed, here is a lengthy example.
Example 3.14. Precisely three Jordan canonical forms A, B, C ∈ M_3(R) correspond to the characteristic polynomial p(t) = (5 - t)^3:
A = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{pmatrix}   B = \begin{pmatrix} 5 & 1 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{pmatrix}   C = \begin{pmatrix} 5 & 1 & 0 \\ 0 & 5 & 1 \\ 0 & 0 & 5 \end{pmatrix}
In all three cases the standard basis β = {e_1, e_2, e_3} is Jordan canonical, so how do we distinguish things? By considering the number and lengths of the cycles of generalized eigenvectors.
A has eigenspace E_5 = K_5 = R^3. Since (A - 5I)v = 0 for all v ∈ R^3, we have maximum cycle-length one. We therefore need three distinct cycles to construct a Jordan basis, e.g.
β_{e_1} = {e_1},  β_{e_2} = {e_2},  β_{e_3} = {e_3}  ⟹  β = β_{e_1} ∪ β_{e_2} ∪ β_{e_3} = {e_1, e_2, e_3}
B has eigenspace E_5 = Span{e_1, e_3}. By computing
v = (a, b, c)  ⟹  (B - 5I)v = (b, 0, 0)  ⟹  (B - 5I)^2 v = 0
we see that β_v is a cycle with maximum length two, provided b ≠ 0 (v ∉ E_5). We therefore need two distinct cycles, of lengths two and one, to construct a Jordan basis, e.g.
β_{e_2} = { (B - 5I)e_2, e_2 } = {e_1, e_2},  β_{e_3} = {e_3}  ⟹  β = β_{e_2} ∪ β_{e_3} = {e_1, e_2, e_3}
C has eigenspace E_5 = Span e_1. This time
v = (a, b, c)  ⟹  (C - 5I)v = (b, c, 0),  (C - 5I)^2 v = (c, 0, 0),  (C - 5I)^3 v = 0
generates a cycle with maximum length three, provided c ≠ 0. Indeed this cycle is a Jordan basis, so one cycle is all we need:
β = β_{e_3} = { (C - 5I)^2 e_3, (C - 5I)e_3, e_3 } = {e_1, e_2, e_3}
Why is the example relevant? Suppose that dim_R V = 3 and that T ∈ L(V) has characteristic polynomial p(t) = (5 - t)^3. Theorem 3.12 tells us that T has a Jordan canonical form, and that it is moreover one of the above matrices A, B, C. Our goal is to develop a method whereby the pattern of cycle-lengths can be determined, thus allowing us to discern which Jordan form is correct. As a side-effect, this will also demonstrate that the pattern of cycle lengths for a given T is independent of the Jordan basis so that, up to some reasonable restriction, the Jordan form of T is unique. To aid us in this endeavor, we require some terminology. . .
Definition 3.15. Let V be finite dimensional and K_λ a generalized eigenspace of T ∈ L(V). Following Theorem 3.12, assume that β_λ = β_{x_1} ∪ · · · ∪ β_{x_n} is a Jordan canonical basis of T|_{K_λ}, where the cycles are arranged in non-increasing length. That is:
1. β_{x_j} = { (T - λI)^{k_j - 1}(x_j), . . . , x_j } has length k_j, and
2. k_1 ≥ k_2 ≥ · · · ≥ k_n
The dot diagram of T|_{K_λ} is a representation of the elements of β_λ, one dot for each vector: the j-th column represents the elements of β_{x_j} arranged vertically with x_j at the bottom.
Given a linear map, our eventual goal is to identify the dot diagram as an intermediate step in the
computation of a Jordan basis. First, however, we observe how the conversion of dot diagrams to a
Jordan form is essentially trivial.
Example 3.16. Suppose dim V = 14 and that T ∈ L(V) has the following eigenvalues and dot diagrams:
λ_1 = -4:  • • • •      λ_2 = 7:  • •      λ_3 = 12:  • • •
           • •                     • •
                                   •
Then the generalized eigenspaces of T satisfy:
K_{-4} = N(T + 4I)^2 and dim K_{-4} = 6;
K_7 = N(T - 7I)^3 and dim K_7 = 5;
K_{12} = N(T - 12I) = E_{12} and dim K_{12} = 3;
T has a Jordan canonical basis β with respect to which its Jordan canonical form is
[T]_β = diag( \begin{pmatrix} -4 & 1 \\ 0 & -4 \end{pmatrix}, \begin{pmatrix} -4 & 1 \\ 0 & -4 \end{pmatrix}, -4, -4, \begin{pmatrix} 7 & 1 & 0 \\ 0 & 7 & 1 \\ 0 & 0 & 7 \end{pmatrix}, \begin{pmatrix} 7 & 1 \\ 0 & 7 \end{pmatrix}, 12, 12, 12 )
Note how the sizes of the Jordan blocks are non-increasing within each eigenvalue. For instance, for λ_1 = -4, the sequence of cycle lengths (k_j) is 2 ≥ 2 ≥ 1 ≥ 1.
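The passage from a dot diagram to block sizes is just conjugating a partition: the j-th cycle length is the number of rows with at least j dots. A tiny Python sketch of this bookkeeping (the helper block_sizes is an addition to these notes):

def block_sizes(rows):
    """Jordan block sizes k_1 >= k_2 >= ... from dot-diagram row lengths r_1 >= r_2 >= ..."""
    return [sum(1 for r in rows if r >= j) for j in range(1, max(rows) + 1)]

print(block_sizes([4, 2]))        # lambda_1 = -4 in Example 3.16: blocks of sizes 2, 2, 1, 1
print(block_sizes([2, 2, 1]))     # lambda_2 = 7: blocks of sizes 3, 2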
Theorem 3.17. Suppose β_λ is a Jordan canonical basis of T|_{K_λ} as described in Definition 3.15, and suppose the i-th row of the dot diagram has r_i entries. Then:
1. For each r ∈ N, the vectors associated to the dots in the first r rows form a basis of N(T - λI)^r.
2. r_1 = null(T - λI) = dim V - rank(T - λI)
3. When i > 1, r_i = null(T - λI)^i - null(T - λI)^{i-1} = rank(T - λI)^{i-1} - rank(T - λI)^i
Example (3.14 cont). We describe the dot diagrams of the three matrices A, B, C, along with the corresponding vectors in the Jordan canonical basis β and the values r_i.
A: one row of three dots, labelled x_1 = e_1, x_2 = e_2, x_3 = e_3.
Since A - 5I is the zero matrix, r_1 = 3 - rank(A - 5I) = 3. The dot diagram has one row, corresponding to three independent cycles of length one: β = β_{e_1} ∪ β_{e_2} ∪ β_{e_3}.
B: two columns; the first contains (B - 5I)x_1 = e_1 above x_1 = e_2, the second contains x_2 = e_3.
Row 1: B - 5I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} ⟹ rank(B - 5I) = 1 and r_1 = 3 - 1 = 2. The first row {e_1, e_3} is a basis of E_5 = N(B - 5I).
Row 2: (B - 5I)^2 is the zero matrix, whence r_2 = rank(B - 5I) - rank(B - 5I)^2 = 1 - 0 = 1.
The dot diagram corresponds to β = β_{e_2} ∪ β_{e_3} = {e_1, e_2} ∪ {e_3}.
C: a single column; from top to bottom (C - 5I)^2 x_1 = e_1, (C - 5I)x_1 = e_2, x_1 = e_3.
Row 1: C - 5I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} ⟹ r_1 = 3 - rank(C - 5I) = 1. The first row {e_1} is a basis of E_5 = N(C - 5I).
Row 2: (C - 5I)^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} ⟹ r_2 = rank(C - 5I) - rank(C - 5I)^2 = 2 - 1 = 1. The first two rows {e_1, e_2} form a basis of N(C - 5I)^2.
Row 3: (C - 5I)^3 is the zero matrix, whence r_3 = rank(C - 5I)^2 - rank(C - 5I)^3 = 1 - 0 = 1.
Proof. As previously, let U = T - λI.
1. Since each dot represents a basis vector U^p(x_j), any v ∈ K_λ may be written uniquely as a linear combination of the dots. Applying U simply moves all the dots up a row and all dots in the top row to 0. It follows that v ∈ N(U^r) ⟺ it lies in the span of the first r rows. Since the dots are linearly independent, they form a basis.
2. By part 1, r_1 = dim N(U) = null(T - λI) = dim V - rank(T - λI).
3. More generally,
r_i = (r_1 + · · · + r_i) - (r_1 + · · · + r_{i-1}) = dim N(U^i) - dim N(U^{i-1}) = null(U^i) - null(U^{i-1}) = rank(T - λI)^{i-1} - rank(T - λI)^i
Since the ranks of the maps (T - λI)^i are independent of basis, so also is the dot diagram. . .
Corollary 3.18. For any eigenvalue λ, the dot diagram is uniquely determined by T and λ. If we list Jordan blocks for each eigenspace in non-increasing order, then the Jordan form of a linear map is unique up to the order of the eigenvalues.
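Theorem 3.17 is easy to turn into code: the row lengths come straight from ranks of powers. Here is a small numpy sketch added to the notes (the function name is ours; it assumes λ really is an eigenvalue and trusts numpy's numerical rank).

import numpy as np

def dot_diagram_rows(A, lam):
    """Row lengths r_1, r_2, ... of the dot diagram for eigenvalue lam (Theorem 3.17)."""
    n = A.shape[0]
    U = A - lam * np.eye(n)
    rows, prev, P = [], n, U                 # rank(U^0) = n
    while True:
        r = np.linalg.matrix_rank(P)
        rows.append(prev - r)                # r_i = rank(U^(i-1)) - rank(U^i)
        if prev - r == 0 or sum(rows) >= n:
            break
        prev, P = r, P @ U
    return [r for r in rows if r > 0]

# matrix C of Example 3.14: a single cycle of length three, so one dot per row
C = np.array([[5., 1., 0.], [0., 5., 1.], [0., 0., 5.]])
print(dot_diagram_rows(C, 5.0))              # [1, 1, 1]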
We now have a slightly more systematic method for finding Jordan canonical bases.
Example 3.19. The matrix A = \begin{pmatrix} 6 & 2 & -4 & -6 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 2 & 1 & -2 & -1 \end{pmatrix} has characteristic equation
p(t) = (3 - t)^2 \begin{vmatrix} 6 - t & -6 \\ 2 & -1 - t \end{vmatrix} = (2 - t)(3 - t)^3
We have two generalized eigenspaces:
K_2 = E_2 = N(A - 2I) = N\begin{pmatrix} 4 & 2 & -4 & -6 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 2 & 1 & -2 & -3 \end{pmatrix} = Span{ (3, 0, 0, 2) }. The trivial one-dot diagram corresponds to this single eigenvector.
K_3 = N(A - 3I)^3. To find the dot diagram, compute powers of A - 3I:
Row 1: A - 3I = \begin{pmatrix} 3 & 2 & -4 & -6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 2 & 1 & -2 & -4 \end{pmatrix} has rank 2 and the first row has r_1 = 4 - 2 = 2 entries.
Row 2: (A - 3I)^2 = \begin{pmatrix} -3 & 0 & 0 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -2 & 0 & 0 & 4 \end{pmatrix} has rank 1 and the second row has r_2 = 2 - 1 = 1 entry.
Since we now have three dots (equalling dim K_3), the algorithm terminates and the dot diagram for K_3 has two dots in its first row and one in its second.
For the single dot in the second row, we choose something in N(A - 3I)^2 which isn't an eigenvector; perhaps the simplest choice is x_1 = e_2, which yields the two-cycle
β_{x_1} = { (A - 3I)x_1, x_1 } = { (2, 0, 0, 1), (0, 1, 0, 0) }
To complete the first row, choose any eigenvector to complete the span: for instance x_2 = (0, 2, 1, 0).
We now have suitable cycles and a Jordan canonical basis/form:
β = { (3, 0, 0, 2), (2, 0, 0, 1), (0, 1, 0, 0), (0, 2, 1, 0) },
A = QJQ^{-1} = \begin{pmatrix} 3 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 2 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 3 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 2 & 1 & 0 & 0 \end{pmatrix}^{-1}
Other choices are available! For instance, if we'd chosen the two-cycle generated by x_1 = e_3, we'd obtain a different Jordan basis but the same canonical form J:
β̃ = { (3, 0, 0, 2), (-4, 0, 0, -2), (0, 0, 1, 0), (0, 2, 1, 0) },
A = \begin{pmatrix} 3 & -4 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 2 & -2 & 0 & 0 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 3 & -4 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 2 & -2 & 0 & 0 \end{pmatrix}^{-1}
We do one final example for a non-matrix map.
Example 3.20. Let ϵ = {1, x, y, x^2, y^2, xy} and define T(f(x, y)) = 2 ∂f/∂x - ∂f/∂y as a linear operator on V = Span_R ϵ. The matrix and characteristic polynomial of T are easy to compute:
[T]_ϵ = \begin{pmatrix} 0 & 2 & -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 4 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} ⟹ p(t) = t^6,   [T^2]_ϵ = \begin{pmatrix} 0 & 0 & 0 & 8 & 2 & -4 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},   [T^3]_ϵ = O
There is only one eigenvalue λ = 0 and therefore one generalized eigenspace K_0 = V. We could keep working with matrices, but it is easy to translate the nullspaces of the matrices back to subspaces of V, from which the necessary data can be read off:
N(T) = Span{1, x + 2y, x^2 + 4y^2 + 4xy}  ⟹  null T = 3, rank T = 3, r_1 = 3
N(T^2) = Span{1, x, y, x^2 + 2xy, 2y^2 + xy}  ⟹  null T^2 = 5, rank T^2 = 1, r_2 = 3 - 1 = 2
We now have five dots; since dim K_0 = 6, the last row has one, and the dot diagram has rows of three, two and one dots.
Since the first two rows span N(T^2), we may choose any f_1 ∉ N(T^2) for the final dot: f_1 = xy is suitable, from which the first column of the dot diagram becomes
T^2(xy) = -4,   T(xy) = 2y - x,   xy
Now choose the second dot on the second row to be anything in N(T^2) such that the first two rows span N(T^2): this time f_2 = x^2 - 4y^2 is suitable, and the diagram becomes:
first row: T^2(xy) = -4,  T(x^2 - 4y^2) = 4x + 8y
second row: T(xy) = 2y - x,  x^2 - 4y^2
third row: xy
The final dot is now chosen so that the first row spans N(T): this time f_3 = x^2 + 4y^2 + 4xy works. The result is a Jordan canonical basis and form for T
β = { -4, 2y - x, xy, 4x + 8y, x^2 - 4y^2, x^2 + 4y^2 + 4xy },   J = [T]_β = diag( \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, 0 )
As previously, many other choices of cycle-generators f_1, f_2, f_3 are available; while these result in different Jordan canonical bases, Corollary 3.18 assures us that we'll always obtain the same canonical form J.
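For operators like this one, sympy can build the matrix [T]_ϵ and the ranks of its powers automatically. The sketch below is an addition to the notes (the helper names T and coords are ours) and simply re-derives the numbers used above.

import sympy as sp

x, y = sp.symbols('x y')
eps  = [sp.Integer(1), x, y, x**2, y**2, x*y]                  # the basis epsilon
exps = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (1, 1)]        # exponents of each basis monomial

def T(f):
    return 2*sp.diff(f, x) - sp.diff(f, y)                     # the operator of Example 3.20

def coords(f):
    # coefficient of x^a y^b, read off by differentiating and evaluating at the origin
    cs = []
    for a, b in exps:
        g = sp.diff(f, x, a) if a else f
        g = sp.diff(g, y, b) if b else g
        cs.append(g.subs({x: 0, y: 0}) / (sp.factorial(a) * sp.factorial(b)))
    return cs

M = sp.Matrix([coords(T(b)) for b in eps]).T                   # columns = coordinates of T(basis element)
print(M)                                                       # matches the matrix [T]_eps displayed above
print([(M**k).rank() for k in (1, 2, 3)])                      # 3, 1, 0, hence r_1 = 3, r_2 = 2, r_3 = 1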
Exercises 3.2 1. Let T be a linear operator whose characteristic polynomial splits. Suppose the eigenvalues and the dot diagrams for the generalized eigenspaces K_{λ_i} are as follows:
λ_1 = 2   λ_2 = 4   λ_3 = 3
Find the Jordan form J of T.
2. Suppose T has Jordan canonical form
J = diag( \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}, 3, 3 )
(a) Find the characteristic polynomial of T.
(b) Find the dot diagram for each eigenvalue.
(c) For each eigenvalue find the smallest k_j such that K_{λ_j} = N(T - λ_j I)^{k_j}.
3. For each matrix A find a Jordan canonical form and an invertible Q such that A = QJQ^{-1}.
(a) A = \begin{pmatrix} 3 & 3 & 2 \\ 7 & 6 & 3 \\ 1 & 1 & 2 \end{pmatrix}   (b) A = \begin{pmatrix} 0 & 1 & 1 \\ 4 & 4 & 2 \\ 2 & 1 & 1 \end{pmatrix}   (c) A = \begin{pmatrix} 0 & 3 & 1 & 2 \\ 2 & 1 & 1 & 2 \\ 2 & 1 & 1 & 2 \\ 2 & 3 & 1 & 4 \end{pmatrix}
4. For each linear operator T, find a Jordan canonical form J and basis β:
(a) T(f) = f′ on Span_R{e^t, te^t, t^2 e^t, e^{2t}}
(b) T(f(x)) = x f′′(x) on P_3(R)
(c) T(f) = a ∂f/∂x + b ∂f/∂y on Span_R{1, x, y, x^2, y^2, xy}. How does your answer depend on a, b?
5. (Generalized Eigenvector Method for ODEs) Let A ∈ M_n(R) have an eigenvalue λ and suppose β_{v_0} = {v_{k-1}, . . . , v_1, v_0} is a cycle of generalized eigenvectors for this eigenvalue. Show that
x(t) := e^{λt} Σ_{j=0}^{k-1} b_j(t) v_j
satisfies x′(t) = Ax ⟺ b_0′(t) = 0, and b_j′(t) = b_{j-1}(t) when j ≥ 1.
Use this method to solve the system of differential equations
x′ = \begin{pmatrix} 3 & 1 & 0 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix} x
3.3 The Rational Canonical Form (non-examinable)
We finish the course with a very quick discussion of what can be done when the characteristic poly-
nomial of a linear map does not split. In such a situation, we may assume that
p(t) = (-1)^n ( ϕ_1(t) )^{m_1} · · · ( ϕ_k(t) )^{m_k}   (∗)
where each ϕ_j(t) is an irreducible monic polynomial over the field.
Example 3.21. The following matrix has characteristic equation p(t) = (t^2 + 1)^2 (3 - t):
A = \begin{pmatrix} 0 & -1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 3 \end{pmatrix} ∈ M_5(R)
This doesn't split over R since t^2 + 1 = 0 has no real roots. It is, however, diagonalizable over C.
A couple of basic facts from algebra:
Every polynomial splits over C: every A ∈ M_n(C) therefore has a Jordan form.
Every polynomial over R factorizes into linear or irreducible quadratic factors.
The question is how to deal with non-linear irreducible factors in the characteristic polynomial.
Definition 3.22. The monic polynomial t^k + a_{k-1}t^{k-1} + · · · + a_0 has companion matrix
\begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & 0 & & 0 & -a_1 \\ 0 & 1 & 0 & & 0 & -a_2 \\ \vdots & & \ddots & & \vdots & \vdots \\ 0 & 0 & 0 & & 0 & -a_{k-2} \\ 0 & 0 & 0 & \cdots & 1 & -a_{k-1} \end{pmatrix}   (when k = 1, this is the 1 × 1 matrix (-a_0))
If T ∈ L(V) has characteristic polynomial (∗), then a rational canonical basis is a basis β for which
[T]_β = \begin{pmatrix} C_1 & O & \cdots & O \\ O & C_2 & & O \\ \vdots & & \ddots & \vdots \\ O & O & \cdots & C_r \end{pmatrix}
where each C_j is a companion matrix of some (ϕ_j(t))^{s_j} where s_j ≤ m_j. We call [T]_β a rational canonical form of T.
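Building a companion matrix from the coefficients is mechanical; here is a short numpy sketch (the helper companion is an addition to these notes), matching the sign convention of Definition 3.22.

import numpy as np

def companion(coeffs):
    """Companion matrix of t^k + a_{k-1} t^{k-1} + ... + a_0, with coeffs = (a_0, ..., a_{k-1})."""
    a = np.asarray(coeffs, dtype=float)
    k = len(a)
    C = np.zeros((k, k))
    C[1:, :-1] = np.eye(k - 1)      # 1's on the subdiagonal
    C[:, -1] = -a                   # last column: -a_0, ..., -a_{k-1}
    return C

print(companion([1, 0]))            # companion of t^2 + 1: [[0, -1], [1, 0]], as in Example 3.21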
We state the main result without proof:
Theorem 3.23. A rational canonical basis exists for any linear operator T on a finite-dimensional vector space V. The canonical form is unique up to ordering of companion matrices.
Example (3.21 cont). The matrix A is already in rational canonical form: the standard basis is rational canonical with three companion blocks,
C_1 = C_2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},   C_3 = (3)
Example 3.24. Let A = \begin{pmatrix} 4 & -3 \\ 2 & 2 \end{pmatrix} ∈ M_2(R). Its characteristic polynomial
p(t) = t^2 - 6t + 14 = (t - 3)^2 + 5
doesn't split over R and so it has no eigenvalues. Instead simply pick a vector, x = (1, 0) (say), define y = Ax = (4, 2), let β = {x, y} and observe that
[L_A]_β = \begin{pmatrix} 0 & -14 \\ 1 & 6 \end{pmatrix}
is a rational canonical form. Indeed this works for any x ≠ 0: if β := {x, Ax}, then Cayley–Hamilton forces
A^2 x = (6A - 14I)x = -14x + 6Ax ⟹ [L_A]_β = \begin{pmatrix} 0 & -14 \\ 1 & 6 \end{pmatrix}
whence β is a rational canonical basis and the form [L_A]_β is independent of x!
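The observation generalizes into an easy computation: pick x, build β = {x, Ax}, and read off the matrix. A small sympy sketch for the matrix above (added here; names mirror the example):

import sympy as sp

A = sp.Matrix([[4, -3], [2, 2]])            # the matrix of Example 3.24
x = sp.Matrix([1, 0])
Q = sp.Matrix.hstack(x, A*x)                # beta = {x, Ax} as columns
print(Q.inv() * A * Q)                      # [[0, -14], [1, 6]]: the companion matrix of t^2 - 6t + 14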
A systematic approach to finding rational canonical forms is similar to that for Jordan forms: for each irreducible divisor ϕ(t) of p(t), the subspace K_ϕ = N( ϕ(T)^m ) plays a role analogous to a generalized eigenspace; indeed K_λ = K_ϕ for the linear irreducible factor ϕ(t) = t - λ!
We finish with two examples; hopefully the approach is intuitive, even without theoretical justification.
Examples 3.25. If the characteristic polynomial of T ∈ L(R^4) is
p(t) = ( ϕ(t) )^2 = (t^2 - 2t + 3)^2 = t^4 - 4t^3 + 10t^2 - 12t + 9
then there are two possible rational canonical forms; here is an example of each.
1. If A = \begin{pmatrix} 0 & -15 & 0 & 9 \\ 2 & 2 & 3 & 0 \\ 0 & 9 & 0 & -6 \\ 3 & 0 & 5 & 2 \end{pmatrix}, then ϕ(A) = O is the zero matrix, whence N(ϕ(A)) = R^4. Since ϕ(t) isn't the full characteristic polynomial, we expect there to be two independent cycles of length two in the canonical basis. Start with something simple as a guess:
x_1 = (1, 0, 0, 0) ⟹ x_2 = Ax_1 = (0, 2, 0, 3) ⟹ Ax_2 = (-3, 4, 0, 6) = -3x_1 + 2x_2
Now make another choice that isn't in the span of {x_1, x_2}:
x_3 = (0, 0, 1, 0) ⟹ x_4 = Ax_3 = (0, 3, 0, 5) ⟹ Ax_4 = (0, 6, -3, 10) = -3x_3 + 2x_4
We therefore have a rational canonical basis β = {x_1, x_2, x_3, x_4} and
A = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 3 & 0 & 5 \end{pmatrix} \begin{pmatrix} 0 & -3 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 0 & -3 \\ 0 & 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 3 & 0 & 5 \end{pmatrix}^{-1}
Over C, this example is diagonalizable. Indeed each of the 2 × 2 companion matrices is diagonalizable over C.
2. Let B = \begin{pmatrix} 0 & 0 & 2 & 1 \\ 1 & 1 & -1 & -1 \\ 0 & 1 & -2 & -16 \\ 0 & 0 & 1 & 5 \end{pmatrix}. This time
ϕ(B) = B^2 - 2B + 3I = \begin{pmatrix} 3 & 2 & -7 & -29 \\ -1 & 1 & 4 & 13 \\ 1 & -3 & -6 & -17 \\ 0 & 1 & 1 & 2 \end{pmatrix} ⟹ N(ϕ(B)) = Span{ (3, -1, 1, 0), (11, -2, 0, 1) }
Anything not in this span will suffice as a generator for a single cycle of length four: e.g.,
x_1 = (1, 0, 0, 0),   x_2 = Bx_1 = (0, 1, 0, 0),   x_3 = Bx_2 = (0, 1, 1, 0),   x_4 = Bx_3 = (2, 0, -1, 1)
Bx_4 = (-1, 2, -14, 4) = -9(1, 0, 0, 0) + 12(0, 1, 0, 0) - 10(0, 1, 1, 0) + 4(2, 0, -1, 1)
We therefore have a rational canonical basis β = {x_1, x_2, x_3, x_4} and
B = \begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 & -9 \\ 1 & 0 & 0 & 12 \\ 0 & 1 & 0 & -10 \\ 0 & 0 & 1 & 4 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix}^{-1}
In contrast to the first example, B isn't diagonalizable over C. It has Jordan form
J = \begin{pmatrix} λ & 1 & 0 & 0 \\ 0 & λ & 0 & 0 \\ 0 & 0 & \bar{λ} & 1 \\ 0 & 0 & 0 & \bar{λ} \end{pmatrix}   where λ = 1 + i√2.