Math 121B Linear Algebra
Neil Donaldson
Fall 2022
Linear Algebra, Stephen Friedberg, Arnold Insel & Lawrence Spence, 4th Ed 2003, Prentice Hall.
Review from 121A
We begin by recalling a few basic notions and notations.
Vector Spaces  Bold-face $\mathbf{v}$ denotes a vector in a vector space $V$ over a field $\mathbb{F}$. A vector space is closed under vector addition and scalar multiplication:
$$\forall v_1, v_2 \in V,\ \forall\lambda_1, \lambda_2 \in \mathbb{F}, \qquad \lambda_1 v_1 + \lambda_2 v_2 \in V$$
Examples. Here are four (families of) vector spaces over the field $\mathbb{R}$.
• $\mathbb{R}^2 = \{x\mathbf{i} + y\mathbf{j} : x, y \in \mathbb{R}\} = \left\{\begin{pmatrix}x\\ y\end{pmatrix} : x, y \in \mathbb{R}\right\}$ is a vector space over the field $\mathbb{R}$.
• $P_n(\mathbb{R})$: polynomials with degree $\le n$ and coefficients in $\mathbb{R}$.
• $P(\mathbb{R})$: polynomials over $\mathbb{R}$ of any degree.
• $C(\mathbb{R})$: continuous functions from $\mathbb{R}$ to $\mathbb{R}$.
Linear Combinations and Spans  Let $\beta \subseteq V$ be a subset of a vector space $V$ over $\mathbb{F}$. A linear combination of vectors in $\beta$ is any finite sum
$$\lambda_1 v_1 + \cdots + \lambda_n v_n$$
where $\lambda_j \in \mathbb{F}$ and $v_j \in \beta$. The span of $\beta$ comprises all linear combinations: this is a subspace of $V$.
Bases and Co-ordinates  A set $\beta \subseteq V$ is a basis of $V$ if it has two properties:

Linear Independence  Any linear combination yielding the zero vector is trivial; for distinct $v_j \in \beta$,
$$\lambda_1 v_1 + \cdots + \lambda_n v_n = 0 \implies \forall j,\ \lambda_j = 0$$

Spanning Set  $V = \operatorname{Span}\beta$; every vector in $V$ is a (finite!) linear combination of elements of $\beta$.

Theorem. $\beta$ is a basis of $V$ $\iff$ every $v \in V$ is a unique linear combination of elements of $\beta$.

The cardinality of all basis sets is identical; this is the dimension $\dim_{\mathbb{F}} V$.
Example. $P_2(\mathbb{R})$ has standard basis $\beta = \{1, x, x^2\}$: every polynomial of degree $\le 2$ is unique as a linear combination $p(x) = a + bx + cx^2$, and so $\dim P_2(\mathbb{R}) = 3$. The real numbers $a, b, c$ are the co-ordinates of $p$ with respect to $\beta$; the co-ordinate vector of $p$ is written
$$[p]_\beta = \begin{pmatrix}a\\ b\\ c\end{pmatrix}$$
Linearity and Linear Maps  A function $T : V \to W$ between vector spaces $V, W$ over the same field $\mathbb{F}$ is ($\mathbb{F}$-)linear if it respects the linearity properties of $V, W$:
$$\forall v_1, v_2 \in V,\ \forall\lambda_1, \lambda_2 \in \mathbb{F}, \qquad T(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 T(v_1) + \lambda_2 T(v_2)$$
We write L(V, W) for the set (indeed vector space!) of linear maps from V to W: this is shortened to
L(V) if V = W. An isomorphism is an invertible/bijective linear map.
Theorem. If $\dim_{\mathbb{F}} V = n$ and $\beta$ is a basis of $V$, then the co-ordinate map $v \mapsto [v]_\beta$ is an isomorphism of vector spaces $V \cong \mathbb{F}^n$.
Matrices and Linear Maps  If $V, W$ are finite-dimensional, then any linear map $T : V \to W$ can be described using matrix multiplication.

Example. If $A = \begin{pmatrix}2 & -1\\ 0 & 1\\ -4 & 3\end{pmatrix}$, then the linear map $L_A : \mathbb{R}^2 \to \mathbb{R}^3$ (left-multiplication by $A$) is
$$L_A\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}2 & -1\\ 0 & 1\\ -4 & 3\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}2x - y\\ y\\ 3y - 4x\end{pmatrix}$$
The linear map in fact defines the matrix A; we recover the columns of the matrix by feeding the
standard basis vectors to the linear map.
$$\begin{pmatrix}2\\ 0\\ -4\end{pmatrix} = L_A\begin{pmatrix}1\\ 0\end{pmatrix}, \qquad \begin{pmatrix}-1\\ 1\\ 3\end{pmatrix} = L_A\begin{pmatrix}0\\ 1\end{pmatrix}$$
More generally, if $T \in \mathcal{L}(V, W)$ and $\beta = \{v_1, \ldots, v_n\}$ and $\gamma = \{w_1, \ldots, w_m\}$ are bases of $V, W$ respectively, then the matrix of $T$ with respect to $\beta$ and $\gamma$ is
$$[T]^\gamma_\beta = \bigl([T(v_1)]_\gamma \ \cdots \ [T(v_n)]_\gamma\bigr) \in M_{m\times n}(\mathbb{F})$$
whose $j$th column is obtained by feeding the $j$th basis vector of $\beta$ to $T$ and taking its co-ordinate vector with respect to $\gamma$. This fits naturally with the co-ordinate isomorphisms:
$$T(v) = w \iff [T]^\gamma_\beta\,[v]_\beta = [w]_\gamma$$
There are two special cases when $V = W$:
• If $\beta = \gamma$, then we simply write $[T]_\beta$ instead of $[T]^\beta_\beta$.
• If $T = I$ is the identity map, then $Q^\gamma_\beta := [I]^\gamma_\beta$ is the change of co-ordinate matrix from $\beta$ to $\gamma$.
Being able to convert linear maps into matrix multiplication is a central skill in linear algebra. Test
your comfort by working through the following; if everything feels familiar, you should consider
yourself in a good place as far as pre-requisites are concerned!
Example. Let $T : P_2(\mathbb{R}) \to P_1(\mathbb{R})$ be the linear map defined by differentiation
$$T(a + bx + cx^2) = b + 2cx \qquad (*)$$
The standard bases of $P_2(\mathbb{R})$ and $P_1(\mathbb{R})$ are, respectively, $\beta = \{1, x, x^2\}$ and $\gamma = \{1, x\}$. Observe that
$$[T(1)]_\gamma = [0]_\gamma = \begin{pmatrix}0\\ 0\end{pmatrix}, \qquad [T(x)]_\gamma = [1]_\gamma = \begin{pmatrix}1\\ 0\end{pmatrix}, \qquad [T(x^2)]_\gamma = [2x]_\gamma = \begin{pmatrix}0\\ 2\end{pmatrix}$$
$$\implies [T]^\gamma_\beta = \bigl([T(1)]_\gamma \ [T(x)]_\gamma \ [T(x^2)]_\gamma\bigr) = \begin{pmatrix}0 & 1 & 0\\ 0 & 0 & 2\end{pmatrix}$$
Written in co-ordinates, we see the original linear map $(*)$:
$$\bigl[T(a + bx + cx^2)\bigr]_\gamma = [T]^\gamma_\beta\,[a + bx + cx^2]_\beta = \begin{pmatrix}0 & 1 & 0\\ 0 & 0 & 2\end{pmatrix}\begin{pmatrix}a\\ b\\ c\end{pmatrix} = \begin{pmatrix}b\\ 2c\end{pmatrix} = \bigl[b + 2cx\bigr]_\gamma \qquad (\dagger)$$
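Here is a minimal numerical sketch of the identity $(\dagger)$, assuming numpy is available: multiplying the co-ordinate vector of a polynomial by the matrix of $T$ produces the co-ordinates of its derivative. The specific polynomial is just an illustrative choice.

```python
import numpy as np

# Matrix of differentiation T : P_2(R) -> P_1(R) with respect to the standard
# bases beta = {1, x, x^2} and gamma = {1, x}, as computed above.
T = np.array([[0., 1., 0.],
              [0., 0., 2.]])

a, b, c = 3., -1., 5.              # the polynomial p(x) = 3 - x + 5x^2
p_beta = np.array([a, b, c])       # co-ordinate vector [p]_beta
Tp_gamma = T @ p_beta              # co-ordinates of p'(x) = b + 2cx
print(Tp_gamma)                    # [-1. 10.], i.e. p'(x) = -1 + 10x
```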
1. $\eta = \{1 + x,\ x + x^2,\ x^2 + 1\}$ is also a basis of $P_2(\mathbb{R})$. Show that
$$[T]^\gamma_\eta = \begin{pmatrix}1 & 1 & 0\\ 0 & 2 & 2\end{pmatrix}$$

2. As in $(\dagger)$ above, the matrix multiplication
$$\begin{pmatrix}1 & 1 & 0\\ 0 & 2 & 2\end{pmatrix}\begin{pmatrix}a\\ b\\ c\end{pmatrix} = \begin{pmatrix}a + b\\ 2b + 2c\end{pmatrix}$$
corresponds to an equation $[T(p)]_\gamma = [T]^\gamma_\eta\,[p]_\eta$ for some polynomial $p(x)$; what is $p(x)$ in terms of $a, b, c$?

3. Find the change of co-ordinate matrix $Q^\beta_\eta$ and check that the matrices of $T$ are related by
$$[T]^\gamma_\eta = [T]^\gamma_\beta\, Q^\beta_\eta$$
1 Diagonalizability & the Cayley–Hamilton Theorem
1.1 Eigenvalues, Eigenvectors & Diagonalization (Review)
Definition 1.1. Suppose $V$ is a vector space over $\mathbb{F}$ and $T \in \mathcal{L}(V)$. A non-zero $v \in V$ is an eigenvector of $T$ with eigenvalue $\lambda \in \mathbb{F}$ (together an eigenpair) if
$$T(v) = \lambda v$$
For matrices, the eigenvalues/vectors of $A \in M_n(\mathbb{F})$ are precisely those of $L_A \in \mathcal{L}(\mathbb{F}^n)$.

Suppose $\lambda$ is an eigenvalue of $T$:
1. The eigenspace of $\lambda$ is the nullspace $E_\lambda := N(T - \lambda I)$.
2. The geometric multiplicity of $\lambda$ is the dimension $\dim E_\lambda$.

We say that $T$ is diagonalizable if there exists a basis of eigenvectors; an eigenbasis.
We start by recalling a couple of basic facts, the first of which is easily proved by induction.
Lemma 1.2. If $v_1, \ldots, v_k$ are eigenvectors corresponding to distinct eigenvalues, then $\{v_1, \ldots, v_k\}$ is linearly independent.

Moreover, if $\dim_{\mathbb{F}} V = n$ and $T \in \mathcal{L}(V)$ has $n$ distinct eigenvalues, then $T$ is diagonalizable.
Eigenvalues and Eigenvectors in finite dimensions

If $\dim_{\mathbb{F}} V = n$ and $\epsilon$ is a basis, then the eigenvector definition is equivalent to a matrix equation
$$[T]_\epsilon\,[v]_\epsilon = \lambda\,[v]_\epsilon$$
In such a situation, $T$ being diagonalizable means $\exists\beta$ such that $[T]_\beta$ is a diagonal matrix
$$[T]_\beta = \begin{pmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \lambda_n\end{pmatrix}$$
Thankfully there is a systematic way to find eigenvalues and eigenvectors in finite dimensions (a numerical sketch of these steps follows below):

1. Choose any basis $\epsilon$ of $V$ and compute the matrix $A = [T]_\epsilon \in M_n(\mathbb{F})$.

2. Observe that
$$\lambda \in \mathbb{F} \text{ is an eigenvalue} \iff \exists\,[v]_\epsilon \in \mathbb{F}^n\setminus\{0\} \text{ such that } A[v]_\epsilon = \lambda[v]_\epsilon$$
$$\iff \exists\,[v]_\epsilon \in \mathbb{F}^n\setminus\{0\} \text{ such that } (A - \lambda I)[v]_\epsilon = 0 \iff \det(A - \lambda I) = 0$$
This last is a degree-$n$ polynomial equation whose roots are the eigenvalues.

3. For each eigenvalue $\lambda_j$, compute the eigenspace $E_{\lambda_j} = N(T - \lambda_j I)$ to find the eigenvectors. Remember that $E_{\lambda_j}$ is a subspace of the original vector space $V$, so translate back if necessary!
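The following sketch (assuming numpy) mirrors the three steps for a sample matrix — the one that appears in Example 1.4.2 below. The roots of the characteristic polynomial give the eigenvalues and an SVD-based nullspace gives each eigenspace; this is an illustration, not the method a numerical library would use in practice.

```python
import numpy as np

A = np.array([[1., -1.,  0.],
              [0.,  2., -2.],
              [0.,  0.,  3.]])

# Step 2: the characteristic polynomial and its roots.
char_coeffs = np.poly(A)            # coefficients of det(tI - A), highest degree first
eigenvalues = np.roots(char_coeffs)
print(eigenvalues)                  # approximately [3. 2. 1.]

# Step 3: the eigenspace E_lambda = N(A - lambda*I) as a numerical nullspace.
def nullspace(M, tol=1e-10):
    _, s, vt = np.linalg.svd(M)
    return vt[s < tol].conj().T     # columns span the nullspace

for lam in eigenvalues:
    print(lam, nullspace(A - lam * np.eye(3)).round(3))
```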
Definition 1.3. The characteristic polynomial of $T \in \mathcal{L}(V)$ is the degree-$n$ polynomial
$$p(t) := \det(T - tI)$$
The eigenvalues of $T$ are precisely the solutions to the characteristic equation $p(t) = 0$.
Examples 1.4. 1. $A = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}$ has characteristic polynomial $p(t) = t^2 + 1 = (t + i)(t - i)$. As a linear map $L_A \in \mathcal{L}(\mathbb{R}^2)$, $A$ has no eigenvalues and no eigenvectors!

As a linear map $L_A \in \mathcal{L}(\mathbb{C}^2)$, we have two eigenvalues $\pm i$. Indeed
$$(A - iI)v = \begin{pmatrix}-i & -1\\ 1 & -i\end{pmatrix}v = 0 \implies E_i = \operatorname{Span}\left\{\begin{pmatrix}i\\ 1\end{pmatrix}\right\}$$
and similarly $E_{-i} = \operatorname{Span}\left\{\begin{pmatrix}-i\\ 1\end{pmatrix}\right\}$. We therefore have an eigenbasis $\beta = \left\{\begin{pmatrix}i\\ 1\end{pmatrix}, \begin{pmatrix}-i\\ 1\end{pmatrix}\right\}$ (of $\mathbb{C}^2$), with respect to which
$$[L_A]_\beta = \begin{pmatrix}i & 0\\ 0 & -i\end{pmatrix}$$
2. Let $T \in \mathcal{L}(P_2(\mathbb{R}))$ be defined by
$$T(f)(x) = f(x) + (x - 1)f'(x)$$
With respect to the standard basis $\epsilon = \{1, x, x^2\}$, we have the non-diagonal matrix
$$A = [T]_\epsilon = \begin{pmatrix}1 & -1 & 0\\ 0 & 2 & -2\\ 0 & 0 & 3\end{pmatrix} \implies p(t) = \det(A - tI) = (1 - t)(2 - t)(3 - t)$$
With three distinct eigenvalues, $T$ is diagonalizable. To find the eigenvectors, compute the nullspaces:
$$\lambda_1 = 1:\quad 0 = (A - \lambda_1 I)[v_1]_\epsilon = \begin{pmatrix}0 & -1 & 0\\ 0 & 1 & -2\\ 0 & 0 & 2\end{pmatrix}[v_1]_\epsilon \implies [v_1]_\epsilon \in \operatorname{Span}\left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}\right\} \implies E_1 = \operatorname{Span}\{1\}$$
$$\lambda_2 = 2:\quad A - \lambda_2 I = \begin{pmatrix}-1 & -1 & 0\\ 0 & 0 & -2\\ 0 & 0 & 1\end{pmatrix} \implies [v_2]_\epsilon \in \operatorname{Span}\left\{\begin{pmatrix}1\\ -1\\ 0\end{pmatrix}\right\} \implies E_2 = \operatorname{Span}\{1 - x\}$$
$$\lambda_3 = 3:\quad A - \lambda_3 I = \begin{pmatrix}-2 & -1 & 0\\ 0 & -1 & -2\\ 0 & 0 & 0\end{pmatrix} \implies [v_3]_\epsilon \in \operatorname{Span}\left\{\begin{pmatrix}1\\ -2\\ 1\end{pmatrix}\right\} \implies E_3 = \operatorname{Span}\{1 - 2x + x^2\}$$
Making a sensible choice of non-zero eigenvectors, we obtain an eigenbasis, with respect to which the linear map is necessarily diagonal:
$$\beta = \{v_1, v_2, v_3\} = \{1,\ 1 - x,\ 1 - 2x + x^2\} = \{1,\ 1 - x,\ (1 - x)^2\}$$
$$T\bigl(a + b(1 - x) + c(1 - x)^2\bigr) = a + 2b(1 - x) + 3c(1 - x)^2, \qquad [T]_\beta = \begin{pmatrix}1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3\end{pmatrix}$$
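A quick numerical confirmation of this diagonalization, assuming numpy: the columns of $P$ below are the co-ordinate vectors of the eigenbasis $\{1,\ 1-x,\ (1-x)^2\}$, and conjugating $A$ by $P$ produces the diagonal matrix.

```python
import numpy as np

A = np.array([[1., -1.,  0.],
              [0.,  2., -2.],
              [0.,  0.,  3.]])
P = np.array([[1.,  1.,  1.],     # columns: [1]_eps, [1-x]_eps, [(1-x)^2]_eps
              [0., -1., -2.],
              [0.,  0.,  1.]])
print(np.linalg.inv(P) @ A @ P)   # diag(1, 2, 3) up to rounding
```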
Conditions for diagonalizability of finite-dimensional operators

We now borrow a little terminology from the theory of polynomials.

Definition 1.5. Let $\mathbb{F}$ be a field and $p(t)$ a polynomial with coefficients in $\mathbb{F}$.

1. Let $\lambda \in \mathbb{F}$ be a root; $p(\lambda) = 0$. The algebraic multiplicity $\operatorname{mult}(\lambda)$ is the largest power of $\lambda - t$ to divide $p(t)$. Otherwise said, there exists¹ some polynomial $q(t)$ such that
$$p(t) = (\lambda - t)^{\operatorname{mult}(\lambda)}q(t) \quad\text{and}\quad q(\lambda) \neq 0$$

2. We say that $p(t)$ splits over $\mathbb{F}$ if it factorizes completely into linear factors; equivalently,
$$\exists\, a, \lambda_1, \ldots, \lambda_k \in \mathbb{F} \text{ such that } p(t) = a(\lambda_1 - t)^{m_1}\cdots(\lambda_k - t)^{m_k}$$

When $p(t)$ splits, the algebraic multiplicities sum to the degree $n$ of the polynomial:
$$n = m_1 + \cdots + m_k$$

Of course, we are most interested when $p(t)$ is the characteristic polynomial of a linear map $T \in \mathcal{L}(V)$. If such a polynomial splits, then $a = 1$ and $\lambda_1, \ldots, \lambda_k$ are necessarily the (distinct) eigenvalues of $T$.
Example 1.6. The field matters! For instance $p(t) = t^2 + 1 = (t - i)(t + i) = (i - t)(-i - t)$ splits over $\mathbb{C}$ but not over $\mathbb{R}$. Its roots are plainly $\pm i$.
For the purposes of review, we state the main result; this will be proved in the next section.

Theorem 1.7. Let $V$ be finite-dimensional. A linear map $T \in \mathcal{L}(V)$ is diagonalizable if and only if,
1. Its characteristic polynomial splits over $\mathbb{F}$, and,
2. The geometric and algebraic multiplicities of each eigenvalue are equal: $\dim E_{\lambda_j} = \operatorname{mult}(\lambda_j)$.
Example 1.8. The matrix $A = \begin{pmatrix}3 & 1 & 0\\ 0 & 3 & 0\\ 0 & 0 & 5\end{pmatrix}$ is easily seen to have eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 5$. Indeed
$$p(t) = (3 - t)^2(5 - t), \qquad \operatorname{mult}(3) = 2, \quad \operatorname{mult}(5) = 1$$
$$E_3 = \operatorname{Span}\left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}\right\}, \qquad E_5 = \operatorname{Span}\left\{\begin{pmatrix}0\\ 0\\ 1\end{pmatrix}\right\}, \qquad \dim E_3 = \dim E_5 = 1$$
This matrix is non-diagonalizable since $\dim E_3 = 1 \neq 2 = \operatorname{mult}(3)$.

Everything prior to this should be review. If it feels very unfamiliar, revisit your notes from 121A, particularly sections 5.1 and 5.2 of the textbook.
¹ The existence follows from Descartes' factor theorem and the division algorithm for polynomials.
Exercises 1.1  1. For each matrix over $\mathbb{R}$, find its characteristic polynomial, its eigenvalues/spaces, and its algebraic and geometric multiplicities; decide if it is diagonalizable.
$$\text{(a)}\quad A = \begin{pmatrix}2 & 0 & 0 & 0\\ 0 & 3 & 1 & 0\\ 0 & 0 & 3 & 1\\ 0 & 0 & 0 & 3\end{pmatrix} \qquad\qquad \text{(b)}\quad B = \begin{pmatrix}1 & 6 & 0 & 0\\ 2 & 6 & 0 & 0\\ 0 & 0 & 3 & 0\\ 0 & 0 & 0 & 3\end{pmatrix}$$

2. Suppose $A$ is a real matrix with eigenpair $(\lambda, v)$. If $\lambda \notin \mathbb{R}$, show that $(\overline{\lambda}, \overline{v})$ is also an eigenpair.

3. Show that the characteristic polynomial of $A = \begin{pmatrix}3 & -4\\ 4 & 3\end{pmatrix}$ does not split over $\mathbb{R}$. Diagonalize $A$ over $\mathbb{C}$.

4. Give an example of a $2\times 2$ matrix whose entries are rational numbers and whose characteristic polynomial splits over $\mathbb{R}$, but not over $\mathbb{Q}$.

5. Diagonalize $L_C \in \mathcal{L}(\mathbb{C}^2)$ where $C = \begin{pmatrix}2i & 1\\ 2 & 0\end{pmatrix}$.

6. If $p(t)$ splits, explain why
$$\det T = \lambda_1^{\operatorname{mult}(\lambda_1)}\cdots\lambda_k^{\operatorname{mult}(\lambda_k)}$$
where $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues of $T$.

7. Suppose $T \in \mathcal{L}(V)$ is invertible with eigenvalue $\lambda$. Prove that $\lambda^{-1}$ is an eigenvalue of $T^{-1}$ with the same eigenspace $E_\lambda$. If $T$ is diagonalizable, prove that $T^{-1}$ is also diagonalizable.

8. If $V$ is finite-dimensional and $T \in \mathcal{L}(V)$, we may define $\det T$ to equal $\det[T]_\beta$, where $\beta$ is any basis of $V$. Explain why the choice of basis does not matter; that is, if $\gamma$ is any other basis of $V$, we have $\det[T]_\gamma = \det[T]_\beta$.
1.2 Invariant Subspaces and the Cayley–Hamilton Theorem

The proof of Theorem 1.7 is facilitated by a new concept, of which eigenspaces are a special case.

Definition 1.9. Suppose $T \in \mathcal{L}(V)$. A subspace $W$ of $V$ is $T$-invariant if $T(W) \subseteq W$. In such a case, the restriction of $T$ to $W$ is the linear map
$$T_W : W \to W : w \mapsto T(w)$$

Examples 1.10. 1. The trivial subspace $\{0\}$ and the entire vector space $V$ are invariant for any linear map $T \in \mathcal{L}(V)$.

2. Every eigenspace is invariant; if $v \in E_\lambda$, then $T(v) = \lambda v \in E_\lambda$.

3. Continuing Example 1.8, if $A = \begin{pmatrix}3 & 1 & 0\\ 0 & 3 & 0\\ 0 & 0 & 5\end{pmatrix}$ then $W = \operatorname{Span}\{\mathbf{i}, \mathbf{j}\}$ is an invariant subspace for the linear map $L_A$. Indeed
$$A(x\mathbf{i} + y\mathbf{j}) = (3x + y)\mathbf{i} + 3y\mathbf{j} \in W$$
$W$ is an example of a generalized eigenspace; we'll study these properly at the end of term.
To prove our diagonalization criterion, we need to see how to factorize the characteristic polynomial. It turns out that factors of $p(t)$ correspond to $T$-invariant subspaces!

Example 1.11. $W = \operatorname{Span}\{\mathbf{i}, \mathbf{j}\}$ is an invariant subspace of $A = \begin{pmatrix}1 & 2 & 4\\ 0 & 3 & 1\\ 0 & 0 & 2\end{pmatrix} \in M_3(\mathbb{R})$. With respect to the standard basis, the restriction $(L_A)_W$ has matrix $\begin{pmatrix}1 & 2\\ 0 & 3\end{pmatrix}$. The characteristic polynomial $p_W(t)$ of the restriction is plainly a factor of the whole,
$$p(t) = (1 - t)(2 - t)(3 - t) = (2 - t)\,p_W(t)$$
Theorem 1.12. Suppose $T \in \mathcal{L}(V)$, that $\dim V$ is finite and that $W$ is a $T$-invariant subspace of $V$. Then the characteristic polynomial of the restriction $T_W$ divides that of $T$.

The proof simply abstracts the approach of the example.

Proof. Extend a basis $\beta_W$ of $W$ to a basis $\beta$ of $V$. Since $T(w) \in \operatorname{Span}\beta_W$ for each $w \in W$, we see that the matrix $[T]_\beta$ has block form
$$[T]_\beta = \begin{pmatrix}A & B\\ O & C\end{pmatrix} \implies p(t) = \det(A - tI)\det(C - tI) = p_W(t)\det(C - tI)$$
where $p_W(t)$ is the characteristic polynomial, and $A = [T_W]_{\beta_W}$ the matrix of the restriction $T_W$.
Corollary 1.13. If $\lambda$ is an eigenvalue of $T$, then $T_{E_\lambda} = \lambda I_{E_\lambda}$ is a multiple of the identity, whence,
1. The characteristic polynomial of the restriction $T_{E_\lambda}$ is $p_\lambda(t) = (\lambda - t)^{\dim E_\lambda}$.
2. $p_\lambda(t)$ divides the characteristic polynomial of $T$. In particular $\dim E_\lambda \le \operatorname{mult}(\lambda)$.
We are now in a position to state and prove an extended version of Theorem 1.7.

Theorem 1.14. Suppose $\dim_{\mathbb{F}} V = n$ and that $T \in \mathcal{L}(V)$ has distinct eigenvalues $\lambda_1, \ldots, \lambda_k$. The following are equivalent:

1. $T$ is diagonalizable.
2. The characteristic polynomial splits over $\mathbb{F}$ and $\dim E_{\lambda_j} = \operatorname{mult}(\lambda_j)$ for each $j$; indeed
$$p(t) = p_{\lambda_1}(t)\cdots p_{\lambda_k}(t) = (\lambda_1 - t)^{\dim E_{\lambda_1}}\cdots(\lambda_k - t)^{\dim E_{\lambda_k}}$$
3. $\displaystyle\sum_{j=1}^k \dim E_{\lambda_j} = n$
4. $V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}$
Example 1.15. $A = \begin{pmatrix}7 & 0 & -12\\ 0 & 1 & 0\\ 2 & 0 & -3\end{pmatrix}$ is diagonalizable. Indeed $p(t) = (1 - t)^2(3 - t)$ splits, and we have
$$\begin{array}{c|c|c}
\lambda & 1 & 3\\ \hline
\operatorname{mult}(\lambda) & 2 & 1\\ \hline
E_\lambda & \operatorname{Span}\left\{\begin{pmatrix}2\\ 0\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 1\\ 0\end{pmatrix}\right\} & \operatorname{Span}\left\{\begin{pmatrix}3\\ 0\\ 1\end{pmatrix}\right\}\\ \hline
\dim E_\lambda & 2 & 1
\end{array}$$
and $\mathbb{R}^3 = E_1 \oplus E_3$. With respect to the eigenbasis $\beta = \left\{\begin{pmatrix}2\\ 0\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 1\\ 0\end{pmatrix}, \begin{pmatrix}3\\ 0\\ 1\end{pmatrix}\right\}$, the map is diagonal: $[L_A]_\beta = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 3\end{pmatrix}$.
Proof. (1 ⇒ 2) If $T$ is diagonalizable with eigenbasis $\beta$, then $[T]_\beta$ is diagonal. But then
$$p(t) = (\lambda_1 - t)^{m_1}\cdots(\lambda_k - t)^{m_k}$$
splits and $\sum\operatorname{mult}(\lambda_i) = n$. The cardinality $n$ of an eigenbasis is at most $\sum\dim E_{\lambda_i}$ since every element is an (independent) eigenvector. By Corollary 1.13 ($\dim E_{\lambda_j} \le \operatorname{mult}(\lambda_j)$) we see that
$$n \le \sum\dim E_{\lambda_j} \le \sum\operatorname{mult}(\lambda_j) = n$$
whence the inequalities are equalities with each pair equal: $\dim E_{\lambda_j} = \operatorname{mult}(\lambda_j)$.

(2 ⇒ 3) $p(t)$ splits $\implies n = \sum\operatorname{mult}(\lambda_j) = \sum\dim E_{\lambda_j}$.

(3 ⇒ 4) Assume $E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_j}$ exists.² If $(\lambda_{j+1}, v_{j+1})$ is an eigenpair, then $v_{j+1} \notin E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_j}$, for otherwise this would contradict Lemma 1.2. By induction, $E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}$ exists; by assumption it has dimension $n = \dim V$ and therefore equals $V$.

(4 ⇒ 1) For each $j$, choose a basis $\beta_j$ of $E_{\lambda_j}$. Then $\beta := \beta_1 \cup \cdots \cup \beta_k$ is a basis of $V$ consisting of eigenvectors of $T$; an eigenbasis.

² Distinct eigenspaces have trivial intersection: $i_1 \neq i_2 \implies E_{\lambda_{i_1}} \cap E_{\lambda_{i_2}} = \{0\}$.
T-cyclic Subspaces and the Cayley–Hamilton Theorem

We finish this chapter by introducing a general family of invariant subspaces and using them to prove a startling result.

Definition 1.16. Let $T \in \mathcal{L}(V)$ and let $v \in V$. The $T$-cyclic subspace generated by $v$ is the span
$$\langle v\rangle = \operatorname{Span}\{v, T(v), T^2(v), \ldots\}$$

Example 1.17. Recalling Example 1.10.3, let $A = \begin{pmatrix}3 & 1 & 0\\ 0 & 3 & 0\\ 0 & 0 & 5\end{pmatrix}$ and $v = \mathbf{i} + \mathbf{k}$. It is easy to see that
$$Av = 3\mathbf{i} + 5\mathbf{k}, \quad A^2v = 9\mathbf{i} + 25\mathbf{k}, \quad\ldots,\quad A^mv = 3^m\mathbf{i} + 5^m\mathbf{k}$$
all of which lie in $\operatorname{Span}\{\mathbf{i}, \mathbf{k}\}$. Plainly this is the $L_A$-cyclic subspace $\langle\mathbf{i} + \mathbf{k}\rangle$.
The proof of the following basic result is left as an exercise.

Lemma 1.18. $\langle v\rangle$ is the smallest $T$-invariant subspace of $V$ containing $v$, specifically:
1. $\langle v\rangle$ is $T$-invariant.
2. If $W \le V$ is $T$-invariant and $v \in W$, then $\langle v\rangle \subseteq W$.
3. $\dim\langle v\rangle = 1 \iff v$ is an eigenvector of $T$.
We were lucky in the example that the general form $A^mv$ was so clear. It is helpful to develop a more precise test for identifying the dimension and a basis of a $T$-cyclic subspace.

Suppose a $T$-cyclic subspace $\langle v\rangle = \operatorname{Span}\{v, T(v), T^2(v), \ldots\}$ has finite dimension.³ Let $k \ge 1$ be maximal such that the set
$$\{v, T(v), \ldots, T^{k-1}(v)\}$$
is linearly independent.
• If $k$ doesn't exist, the infinite linearly independent set $\{v, T(v), \ldots\}$ contradicts $\dim\langle v\rangle < \infty$.
• By the maximality of $k$, $T^k(v) \in \operatorname{Span}\{v, T(v), \ldots, T^{k-1}(v)\}$; by induction this extends to
$$j \ge k \implies T^j(v) \in \operatorname{Span}\{v, T(v), \ldots, T^{k-1}(v)\}$$
It follows that $\langle v\rangle = \operatorname{Span}\{v, T(v), \ldots, T^{k-1}(v)\}$, and we've proved a useful criterion.

Theorem 1.19. Suppose $v \neq 0$, then
$$\dim\langle v\rangle = k \iff \{v, T(v), \ldots, T^{k-1}(v)\} \text{ is a basis of } \langle v\rangle$$
$$\iff k \text{ is maximal such that } \{v, T(v), \ldots, T^{k-1}(v)\} \text{ is linearly independent}$$
$$\iff k \text{ is minimal such that } T^k(v) \in \operatorname{Span}\{v, T(v), \ldots, T^{k-1}(v)\}$$

³ Necessarily the situation if $\dim V < \infty$, when we are thinking about characteristic polynomials.
Examples 1.20. 1. According to the Theorem, in Example 1.17 we need only have noticed:
• $v = \mathbf{i} + \mathbf{k}$ and $Av = 3\mathbf{i} + 5\mathbf{k}$ are linearly independent.
• $A^2(\mathbf{i} + \mathbf{k}) = 9\mathbf{i} + 25\mathbf{k} \in \operatorname{Span}\{v, Av\}$.
We could then conclude that $\langle v\rangle = \operatorname{Span}\{v, Av\}$ has dimension 2.

2. Let $T(p(x)) = 3p(x) - p''(x)$, viewed as a linear map $T \in \mathcal{L}(P_2(\mathbb{R}))$, and consider the $T$-cyclic subspace generated by the polynomial $p(x) = x^2$:
$$T(x^2) = 3x^2 - 2, \qquad T^2(x^2) = T(3x^2 - 2) = 3(3x^2 - 2) - 6 = 9x^2 - 12, \quad\ldots$$
Observe that $\{x^2, T(x^2)\}$ is linearly independent, but that
$$T^2(x^2) = 9x^2 - 12 = -9x^2 + 6(3x^2 - 2) \in \operatorname{Span}\{x^2, T(x^2)\}$$
We conclude that $\dim\langle x^2\rangle = 2$. An alternative basis for $\langle x^2\rangle$ is plainly $\{1, x^2\}$.
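In co-ordinates, the dimension test of Theorem 1.19 is just a rank computation. Below is a sketch (assuming numpy) for Example 1.20.2: with respect to $\{1, x, x^2\}$ the operator $T(p) = 3p - p''$ has the matrix $M$ shown, and the rank of the matrix whose columns are $v, Mv, M^2v, \ldots$ equals $\dim\langle v\rangle$.

```python
import numpy as np

M = np.array([[3., 0., -2.],     # columns: [T(1)], [T(x)], [T(x^2)] w.r.t. {1, x, x^2}
              [0., 3.,  0.],
              [0., 0.,  3.]])
v = np.array([0., 0., 1.])       # co-ordinates of p(x) = x^2

vectors = [v]
for _ in range(3):               # dim P_2(R) = 3, so three iterates suffice
    vectors.append(M @ vectors[-1])

print(np.linalg.matrix_rank(np.column_stack(vectors)))   # 2, so dim<x^2> = 2
```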
We finish by considering the interaction of a $T$-cyclic subspace with the characteristic polynomial. Surprisingly, the coefficients of the characteristic polynomial and of the linear combination coincide. Continuing the Example, if $W = \langle x^2\rangle$ and $\beta_W = \{x^2, T(x^2)\} = \{x^2, 3x^2 - 2\}$, then
$$[T_W]_{\beta_W} = \begin{pmatrix}0 & -9\\ 1 & 6\end{pmatrix} \implies p_W(t) = t^2 - 6t + 9$$
Theorem 1.21. Let $T \in \mathcal{L}(V)$ and suppose $W = \langle w\rangle$ has $\dim W = k$ with basis
$$\beta_W = \{w, T(w), \ldots, T^{k-1}(w)\}$$
in accordance with Theorem 1.19, then
1. If $T^k(w) + a_{k-1}T^{k-1}(w) + \cdots + a_0w = 0$, then the characteristic polynomial of $T_W$ is
$$p_W(t) = (-1)^k\bigl(t^k + a_{k-1}t^{k-1} + \cdots + a_1t + a_0\bigr)$$
2. $p_W(T_W) = 0$ is the zero map on $W$.

Proof. 1. This is an exercise.

2. Write $S \in \mathcal{L}(V)$ for the linear map
$$S := p_W(T) = (-1)^k\bigl(T^k + a_{k-1}T^{k-1} + \cdots + a_0I\bigr)$$
Part 1 says $S(w) = 0$. Since $S$ is a polynomial in $T$, it commutes with all powers of $T$:
$$\forall j, \qquad S(T^j(w)) = T^j(S(w)) = 0$$
Since $S$ is zero on the basis $\beta_W$ of $W$, we see that $S_W$ is the zero function.
With a little sneakiness, we can drop the $W$'s in the second part of the Theorem and observe an intimate relation between a linear map and its characteristic polynomial.

Corollary 1.22 (Cayley–Hamilton). If $V$ is finite-dimensional, then $T \in \mathcal{L}(V)$ satisfies its characteristic polynomial; $p(T) = 0$.

Proof. Let $w \in V$ and consider the cyclic subspace $W = \langle w\rangle$ generated by $w$. By Theorem 1.12,
$$p(t) = q_W(t)\,p_W(t)$$
for some polynomial $q_W$. But the previous result says that $p_W(T)(w) = 0$, whence
$$p(T)(w) = 0$$
Since we may apply this reasoning to any $w \in V$, we conclude that $p(T)$ is the zero function.

Examples 1.23. 1. $A = \begin{pmatrix}2 & 1\\ 3 & 4\end{pmatrix}$ has $p(t) = t^2 - 6t + 5$ and we confirm:
$$A^2 - 6A = \begin{pmatrix}7 & 6\\ 18 & 19\end{pmatrix} - 6\begin{pmatrix}2 & 1\\ 3 & 4\end{pmatrix} = -5I$$
It may seem like a strange thing to do for this matrix, but the characteristic equation can be used to calculate the inverse of $A$:
$$A^2 - 6A + 5I = 0 \implies A(A - 6I) = -5I \implies A^{-1} = \tfrac15(6I - A) = \tfrac15\begin{pmatrix}4 & -1\\ -3 & 2\end{pmatrix}$$
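A quick numerical check of Example 1.23.1, assuming numpy: the matrix satisfies its characteristic polynomial, and the same identity reproduces the inverse.

```python
import numpy as np

A = np.array([[2., 1.],
              [3., 4.]])
I = np.eye(2)

print(A @ A - 6 * A + 5 * I)                   # the zero matrix (Cayley-Hamilton)
A_inv = (6 * I - A) / 5                        # from A(6I - A) = 5I
print(np.allclose(A_inv, np.linalg.inv(A)))    # True
```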
2. We use the Cayley–Hamilton Theorem to compute $A^4$ when
$$A = \begin{pmatrix}2 & 1 & \frac83\\ 0 & 1 & 6\\ 0 & 0 & 2\end{pmatrix}$$
The characteristic polynomial is
$$p(t) = (2 - t)^2(1 - t) = 4 - 8t + 5t^2 - t^3$$
By Cayley–Hamilton, $A^3 = 5A^2 - 8A + 4I$, whence
$$A^4 = AA^3 = A(5A^2 - 8A + 4I) = 5A^3 - 8A^2 + 4A = 5(5A^2 - 8A + 4I) - 8A^2 + 4A$$
$$= 17A^2 - 36A + 20I = 17\begin{pmatrix}4 & 3 & \frac{50}{3}\\ 0 & 1 & 18\\ 0 & 0 & 4\end{pmatrix} - 36\begin{pmatrix}2 & 1 & \frac83\\ 0 & 1 & 6\\ 0 & 0 & 2\end{pmatrix} + 20\begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix} = \begin{pmatrix}16 & 15 & \frac{562}{3}\\ 0 & 1 & 90\\ 0 & 0 & 16\end{pmatrix}$$
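The identity $A^4 = 17A^2 - 36A + 20I$ can be verified directly; here is a one-line check assuming numpy.

```python
import numpy as np
from numpy.linalg import matrix_power

A = np.array([[2., 1., 8/3],
              [0., 1., 6. ],
              [0., 0., 2. ]])
lhs = matrix_power(A, 4)
rhs = 17 * (A @ A) - 36 * A + 20 * np.eye(3)
print(np.allclose(lhs, rhs))   # True
print(rhs)                     # [[16, 15, 562/3], [0, 1, 90], [0, 0, 16]]
```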
3. Recall Example 1.4.2, where the linear map $T(f(x)) = f(x) + (x - 1)f'(x)$ had
$$p(t) = (1 - t)(2 - t)(3 - t) = -t^3 + 6t^2 - 11t + 6$$
By Cayley–Hamilton, $T^3 = 6T^2 - 11T + 6I$. You can check this explicitly, after first computing
$$T^2(f(x)) = f(x) + 3(x - 1)f'(x) + (x - 1)^2f''(x), \quad\text{etc.}$$
Cayley–Hamilton can also be used to simplify higher powers of $T$ and even to compute the inverse!
$$I = \tfrac16\bigl(T^3 - 6T^2 + 11T\bigr) \implies T^{-1} = \tfrac16\bigl(T^2 - 6T + 11I\bigr)$$
$$\implies T^{-1}(f(x)) = f(x) - \tfrac12(x - 1)f'(x) + \tfrac16(x - 1)^2f''(x)$$
Exercises 1.2  1. For the linear map $T = L_A : \mathbb{R}^3 \to \mathbb{R}^3$ where $A = \begin{pmatrix}3 & 0 & 0\\ 0 & 2 & 4\\ 0 & 0 & 2\end{pmatrix}$, find the $T$-cyclic subspace generated by the standard basis vector $e_3 = \begin{pmatrix}0\\ 0\\ 1\end{pmatrix}$.

2. Let $T = L_A$, where $A = \begin{pmatrix}1 & 2 & 4\\ 0 & 3 & 1\\ 0 & 0 & 2\end{pmatrix}$ and let $v = \begin{pmatrix}0\\ 1\\ 1\end{pmatrix}$. Compute $T(v)$ and $T^2(v)$. Hence describe the $T$-cyclic subspace $\langle v\rangle$ and its dimension.

3. Given $A = \begin{pmatrix}2 & 0 & 0 & 0\\ 0 & 3 & 1 & 0\\ 0 & 0 & 3 & 1\\ 0 & 0 & 0 & 3\end{pmatrix}$, find two distinct $L_A$-invariant subspaces $W \le \mathbb{R}^4$ such that $\dim W = 3$.

4. Suppose that $W$ and $X$ are $T$-invariant subspaces of $V$. Prove that the sum
$$W + X = \{w + x : w \in W,\ x \in X\}$$
is also $T$-invariant.

5. Prove Lemma 1.18.

6. Give an example of an infinite-dimensional vector space $V$, a linear map $T \in \mathcal{L}(V)$, and a vector $v$ such that $\langle v\rangle = V$.

7. Let $\beta = \{\sin x,\ \cos x,\ 2x\sin x,\ 3x\cos x\}$ and $T = \frac{d}{dx} \in \mathcal{L}(\operatorname{Span}\beta)$. Plainly the subspace $W := \operatorname{Span}\{\sin x, \cos x\}$ is $T$-invariant. Compute the matrices $[T]_\beta$ and $[T_W]_{\beta_W}$ and observe that
$$p(t) = p_W(t)^2$$

8. Verify explicitly that $A = \begin{pmatrix}2 & 3\\ 0 & 2\end{pmatrix}$ satisfies its characteristic polynomial.

9. Check the details of Example 1.23.3 and evaluate $T^4$ as a linear combination of $I$, $T$ and $T^2$. In particular, check the evaluation of $T^{-1}(f(x))$.
10. Suppose $a, b$ are constants with $a \neq 0$ and define $T(f(x)) = af(x) + bf'(x)$.
(a) Find an expression for the inverse $T^{-1}(f(x))$ if $T \in \mathcal{L}(P_1(\mathbb{R}))$.
(b) Find an expression for the inverse $T^{-1}(f(x))$ if $T \in \mathcal{L}(P_2(\mathbb{R}))$.
Your answers should be written in terms of $f$ and its derivatives.

11. Let $T(f)(x) = f'(x) + \frac1x\int_0^x f(t)\,dt$ be a linear map $T \in \mathcal{L}(P_2(\mathbb{R}))$.
(a) Find the characteristic polynomial of $T$ and identify its eigenspaces. Is $T$ diagonalizable?
(b) Find $a, b, c \in \mathbb{R}$ such that $T^3 = aT^2 + bT + cI$.
(c) What are $\dim\mathcal{L}(P_2(\mathbb{R}))$ and $\dim\operatorname{Span}\{T^k : k \in \mathbb{N}_0\}$? Explain.

12. If $A = \begin{pmatrix}a & b\\ c & d\end{pmatrix}$ has non-zero determinant, use the Cayley–Hamilton Theorem to obtain the usual expression for $A^{-1}$.

13. Recall Examples 1.10.3, 1.17, and 1.20.1 with $A = \begin{pmatrix}3 & 1 & 0\\ 0 & 3 & 0\\ 0 & 0 & 5\end{pmatrix}$.
(a) If $v = \begin{pmatrix}x\\ y\\ z\end{pmatrix}$, show that $\det(v, Av, A^2v) = -4y^2z$.
(b) Hence determine all $L_A$-cyclic subspaces of $\mathbb{R}^3$.

14. (a) Consider Example 1.20.2 where $T \in \mathcal{L}(P_2(\mathbb{R}))$ is defined by $T(p(x)) = 3p(x) - p''(x)$. Prove that all $T$-cyclic subspaces have dimension $\le 2$.
(b) What if we instead consider $S \in \mathcal{L}(P_2(\mathbb{R}))$ defined by $S(p(x)) = 3p(x) - p'(x)$?
15. We prove part 1 of Theorem 1.21.
(a) Explain why the matrix of $T_W$ with respect to the basis $\beta_W$ is
$$[T_W]_{\beta_W} = \begin{pmatrix}0 & 0 & 0 & \cdots & 0 & -a_0\\ 1 & 0 & 0 & & 0 & -a_1\\ 0 & 1 & 0 & & 0 & -a_2\\ \vdots & & & \ddots & & \vdots\\ 0 & 0 & 0 & & 0 & -a_{k-2}\\ 0 & 0 & 0 & \cdots & 1 & -a_{k-1}\end{pmatrix} \in M_k(\mathbb{F})$$
(b) Compute the characteristic polynomial $p_W(t) = \det\bigl([T_W]_{\beta_W} - tI_k\bigr)$ by expanding the determinant along the first row.
2 Inner Product Spaces, part 1

You should be familiar with the scalar/dot product in $\mathbb{R}^2$. For any vectors $x = \begin{pmatrix}x_1\\ x_2\end{pmatrix}$, $y = \begin{pmatrix}y_1\\ y_2\end{pmatrix}$, define
$$x\cdot y := x_1y_1 + x_2y_2$$
• The norm or length of a vector is $\|x\| = \sqrt{x\cdot x}$.
• The angle $\theta$ between vectors satisfies $x\cdot y = \|x\|\,\|y\|\cos\theta$.
• Vectors are orthogonal or perpendicular precisely when $x\cdot y = 0$.

[Figure: two vectors $x$ and $y$ with the angle $\theta$ between them.]

The dot product is what allows us to compute lengths of and angles between vectors in $\mathbb{R}^2$. An inner product is an algebraic structure that generalizes this idea to other vector spaces.
2.1 Real and Complex Inner Product Spaces

Unless explicitly stated otherwise, throughout this chapter $\mathbb{F}$ is either the real $\mathbb{R}$ or complex $\mathbb{C}$ field.

Definition 2.1. An inner product space $(V, \langle\ ,\ \rangle)$ is a vector space $V$ over $\mathbb{F}$ together with an inner product: a function $\langle\ ,\ \rangle : V\times V \to \mathbb{F}$ satisfying the following properties $\forall x, y, z \in V$, $\lambda \in \mathbb{F}$:
(a) Linear: $\langle\lambda x + y, z\rangle = \lambda\langle x, z\rangle + \langle y, z\rangle$
(b) Conjugate-Symmetric: $\langle y, x\rangle = \overline{\langle x, y\rangle}$ (complex conjugate!)
(c) Positive-definite: $x \neq 0 \implies \langle x, x\rangle > 0$

The norm or length of $x \in V$ is $\|x\| := \sqrt{\langle x, x\rangle}$. A unit vector has $\|x\| = 1$.
Vectors $x, y$ are perpendicular/orthogonal if $\langle x, y\rangle = 0$ and orthonormal if they are additionally unit vectors.
Real inner product spaces

The definition simplifies slightly when $\mathbb{F} = \mathbb{R}$.
• Conjugate-symmetry becomes plain symmetry: $\langle y, x\rangle = \langle x, y\rangle$.
• Linearity + symmetry yields bilinearity: a real inner product is also linear in its second slot
$$\langle x, \lambda y + z\rangle = \lambda\langle x, y\rangle + \langle x, z\rangle$$
A real inner product is often termed a positive-definite, symmetric, bilinear form.
The simplest example is the natural generalization of the dot product.

Definition 2.2. Euclidean space means $\mathbb{R}^n$ equipped with the standard inner (dot) product,
$$\langle x, y\rangle = x\cdot y = y^Tx = \sum_{j=1}^n x_jy_j = x_1y_1 + \cdots + x_ny_n, \qquad \|x\| = \sqrt{\sum_{j=1}^n x_j^2}$$

Unless the inner product is stated explicitly, if we refer to $\mathbb{R}^n$ as an inner product space, we mean Euclidean space. However, there are many other ways to make $\mathbb{R}^n$ into an inner product space. . .
Example 2.3. Define an alternative inner product on $\mathbb{R}^2$ via
$$\langle x, y\rangle = x_1y_1 + 3x_2y_2$$
It is easy to check that this satisfies the required properties:
(a) Linearity: follows from the associative/distributive laws in $\mathbb{R}$,
$$\langle\lambda x + y, z\rangle = (\lambda x_1 + y_1)z_1 + 3(\lambda x_2 + y_2)z_2 = \lambda(x_1z_1 + 3x_2z_2) + (y_1z_1 + 3y_2z_2) = \lambda\langle x, z\rangle + \langle y, z\rangle$$
(b) Symmetry: follows from the commutativity of multiplication in $\mathbb{R}$,
$$\langle y, x\rangle = y_1x_1 + 3y_2x_2 = x_1y_1 + 3x_2y_2 = \langle x, y\rangle$$
(c) Positive-definiteness: if $x \neq 0$, then $\langle x, x\rangle = x_1^2 + 3x_2^2 > 0$.

With respect to $\langle\ ,\ \rangle$, the concept of orthogonality feels strange: e.g., $\{x, y\} = \left\{\frac12\begin{pmatrix}1\\ 1\end{pmatrix},\ \frac{1}{2\sqrt3}\begin{pmatrix}3\\ -1\end{pmatrix}\right\}$ is an orthonormal set!
$$\|x\|^2 = \tfrac14(1^2 + 3\cdot1^2) = 1, \qquad \|y\|^2 = \tfrac{1}{12}(3^2 + 3\cdot1^2) = 1, \qquad \langle x, y\rangle = \tfrac{1}{4\sqrt3}(3 - 3) = 0$$
However, with respect to the standard dot product, these vectors are not special:
$$x\cdot x = \tfrac12, \qquad y\cdot y = \tfrac56, \qquad x\cdot y = \tfrac{1}{2\sqrt3}$$
We have the same vector space $\mathbb{R}^2$, but different inner product spaces: $(\mathbb{R}^2, \langle\ ,\ \rangle) \neq (\mathbb{R}^2, \cdot)$.

The above is an example of a weighted inner product: choose weights $a_1, \ldots, a_n \in \mathbb{R}^+$ and define
$$\langle x, y\rangle = \sum_{j=1}^n a_jx_jy_j = a_1x_1y_1 + \cdots + a_nx_ny_n$$
It is a simple exercise to check that this defines an inner product on $\mathbb{R}^n$. In particular, $\mathbb{R}^n$ may be equipped with infinitely many distinct inner products!

More generally, a symmetric matrix $A \in M_n(\mathbb{R})$ is positive-definite if $x^TAx > 0$ for all non-zero $x \in \mathbb{R}^n$. It is straightforward to check that
$$\langle x, y\rangle := y^TAx$$
defines an inner product on $\mathbb{R}^n$. In fact all inner products on $\mathbb{R}^n$ arise in this fashion! The weighted inner products correspond to $A$ being diagonal (Euclidean space is $A = I$), but this is not required.
Example 2.4. The matrix $A = \begin{pmatrix}3 & 1\\ 1 & 1\end{pmatrix}$ is positive-definite and thus defines an inner product
$$\langle x, y\rangle = 3x_1y_1 + x_1y_2 + x_2y_1 + x_2y_2$$
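A small sketch of the matrix inner product $\langle x, y\rangle = y^TAx$ from Example 2.4, assuming numpy. Positive-definiteness is checked here via the eigenvalues (all eigenvalues of a symmetric positive-definite matrix are positive; compare Exercise 7 below); the test vectors are arbitrary illustrative choices.

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 1.]])
print(np.linalg.eigvalsh(A))     # both eigenvalues are positive

def ip(x, y):
    """Inner product <x, y> = y^T A x on R^2."""
    return y @ A @ x

x = np.array([1., -1.])
y = np.array([2.,  1.])
print(ip(x, y), ip(x, x))        # <x, y> and ||x||^2 = 2 > 0
```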
Lemma 2.5 (Basic properties). Let $V$ be an inner product space, let $x, y, z \in V$ and $\lambda \in \mathbb{F}$.
1. $\langle 0, x\rangle = 0$
2. $\|x\| = 0 \iff x = 0$
3. $\|\lambda x\| = |\lambda|\,\|x\|$
4. $\langle x, z\rangle = \langle y, z\rangle$ for all $z$ $\implies x = y$
5. (Cauchy–Schwarz inequality) $|\langle x, y\rangle| \le \|x\|\,\|y\|$ with equality if and only if $x, y$ are parallel
6. (Triangle Inequality) $\|x + y\| \le \|x\| + \|y\|$ with equality if and only if $x, y$ are parallel and point in the same direction

[Figure: the triangle inequality, a triangle with sides $x$, $y$ and $x + y$.]

Be careful with notation: $|\lambda|$ means the absolute value/modulus (of a scalar), while $\|x\|$ means the norm (of a vector).

In the real case, Cauchy–Schwarz allows us to define angle via $\cos\theta = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}$, since the right-hand side lies in the interval $[-1, 1]$. However, except in Euclidean $\mathbb{R}^2$ and $\mathbb{R}^3$, this notion is of limited use; orthogonality ($\langle x, y\rangle = 0$) and orthonormality are usually all we care about.
Proof. Parts 1–3 are exercises. For simplicity, we prove 5 and 6 only when $\mathbb{F} = \mathbb{R}$.

4. Let $z = x - y$, apply the linearity condition and part 2:
$$\langle x, z\rangle = \langle y, z\rangle \implies 0 = \langle x - y, z\rangle = \langle x - y, x - y\rangle = \|x - y\|^2 \implies x = y$$

5. If $y = 0$, the result is trivial. WLOG (and by part 3) we may assume $\|y\| = 1$; if the inequality holds for this, then it holds for all non-zero $y$ by parts 1 and 4. Now expand:
$$0 \le \|x - \langle x, y\rangle y\|^2 \qquad\text{(positive-definiteness)}$$
$$= \|x\|^2 + |\langle x, y\rangle|^2\|y\|^2 - \langle x, y\rangle\langle x, y\rangle - \langle x, y\rangle\langle y, x\rangle \qquad\text{(bilinearity)}$$
$$= \|x\|^2 - |\langle x, y\rangle|^2 \qquad\text{(symmetry)}$$
Taking square-roots establishes the inequality. By part 2, equality holds if and only if $x = \langle x, y\rangle y$, which is precisely when $x, y$ are linearly dependent.

6. We establish the squared result.
$$\bigl(\|x\| + \|y\|\bigr)^2 - \|x + y\|^2 = \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 - \|x\|^2 - \langle x, y\rangle - \langle y, x\rangle - \|y\|^2$$
$$= 2\bigl(\|x\|\,\|y\| - \langle x, y\rangle\bigr) \ge 2\bigl(\|x\|\,\|y\| - |\langle x, y\rangle|\bigr) \ge 0 \qquad\text{(Cauchy–Schwarz)}$$
Equality requires both equality in Cauchy–Schwarz ($x, y$ parallel) and that $\langle x, y\rangle \ge 0$; since $x, y$ are already parallel, this means that one is a non-negative multiple of the other.
Complex Inner Product Spaces

Definition 2.1 is already set up nicely when $\mathbb{F} = \mathbb{C}$. One subtle difference comes from how we expand linear combinations in the second slot.

Lemma 2.6. An inner product is a positive-definite, conjugate-symmetric, sesquilinear⁴ form: it is conjugate-linear (anti-linear) in the second slot,
$$\langle z, \lambda x + y\rangle = \overline{\lambda}\langle z, x\rangle + \langle z, y\rangle$$
The proof is very easy if you remember your complex conjugates; try it!

Warning! If you dabble in the dark arts of Physics, be aware that their convention⁵ is for an inner product to be conjugate-linear in the first entry and linear in the second!
Definition 2.7. The standard (Hermitian) inner product and norm on $\mathbb{C}^n$ are
$$\langle x, y\rangle = y^*x = \sum_{j=1}^n x_j\overline{y_j} = x_1\overline{y_1} + \cdots + x_n\overline{y_n}, \qquad \|x\| = \sqrt{\sum_{j=1}^n |x_j|^2}$$
where $y^* = \overline{y}^T$ is the conjugate-transpose of $y$ and $|x_j|$ is the modulus.

Weighted inner products may be defined just as in the real case:
$$\langle x, y\rangle = \sum_{j=1}^n a_jx_j\overline{y_j} = a_1x_1\overline{y_1} + \cdots + a_nx_n\overline{y_n}$$
Note that the weights $a_j$ must still be positive real numbers. We may similarly define inner products in terms of positive-definite matrices: $\langle x, y\rangle = y^*Ax$.
Definition 2.8. A matrix $A \in M_n(\mathbb{C})$ is Hermitian (self-adjoint) if $A^* = A$. It is moreover positive-definite if $x^*Ax > 0$ for all non-zero $x \in \mathbb{C}^n$.

The self-adjoint condition reduces to symmetry ($A^T = A$) if $A$ is a real matrix.
Example 2.9. It can be seen (Exercise 6) that $A = \begin{pmatrix}3 & -i\\ i & 3\end{pmatrix}$ is positive-definite, whence
$$\langle x, y\rangle = \begin{pmatrix}\overline{y_1} & \overline{y_2}\end{pmatrix}\begin{pmatrix}3 & -i\\ i & 3\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = 3x_1\overline{y_1} + ix_1\overline{y_2} - ix_2\overline{y_1} + 3x_2\overline{y_2}$$
defines an inner product on $\mathbb{C}^2$.

Almost all results in this chapter will be written for general inner product spaces, thus covering the real and complex cases simultaneously. If you don't feel confident with complex numbers, simply let $\mathbb{F} = \mathbb{R}$ and delete all complex conjugates at first read! Very occasionally a different proof will be required depending on the field. For simplicity, examples will more often use real inner products.

⁴ The prefix sesqui- means one-and-a-half; for instance a sesquicentenary is a 150-year anniversary.
⁵ The common Physics notation relates to ours via $\langle x\,|\,y\rangle = \langle y, x\rangle$.
Further Examples

As before, the field $\mathbb{F}$ must be either $\mathbb{R}$ or $\mathbb{C}$.

Definition 2.10 (Frobenius inner product). If $A, B \in M_{m\times n}(\mathbb{F})$, define
$$\langle A, B\rangle = \operatorname{tr}(B^*A)$$
where $\operatorname{tr}$ is the trace of an $n\times n$ matrix; this makes $M_{m\times n}(\mathbb{F})$ into an inner product space.

This isn't really a new example: if we map $A \mapsto \mathbb{F}^{mn}$ by stacking the columns of $A$, then the Frobenius inner product is the standard inner product in disguise.
Example 2.11. In $M_{3\times2}(\mathbb{C})$,
$$\left\langle\begin{pmatrix}1 & i\\ 2-i & 0\\ 0 & 1\end{pmatrix}, \begin{pmatrix}0 & 7\\ -1 & -2i\\ 3-2i & 4\end{pmatrix}\right\rangle = \operatorname{tr}\left(\begin{pmatrix}0 & -1 & 3+2i\\ 7 & 2i & 4\end{pmatrix}\begin{pmatrix}1 & i\\ 2-i & 0\\ 0 & 1\end{pmatrix}\right) = \operatorname{tr}\begin{pmatrix}-2+i & 3+2i\\ 9+4i & 4+7i\end{pmatrix} = -2+i+4+7i = 2+8i$$
$$\left\|\begin{pmatrix}1 & i\\ 2-i & 0\\ 0 & 1\end{pmatrix}\right\|^2 = \operatorname{tr}\left(\begin{pmatrix}1 & 2+i & 0\\ -i & 0 & 1\end{pmatrix}\begin{pmatrix}1 & i\\ 2-i & 0\\ 0 & 1\end{pmatrix}\right) = \operatorname{tr}\begin{pmatrix}6 & i\\ -i & 2\end{pmatrix} = 8$$
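The Frobenius inner product is easy to check numerically. The sketch below (assuming numpy) uses the matrices as reconstructed in the example above.

```python
import numpy as np

A = np.array([[1, 1j], [2 - 1j, 0], [0, 1]])
B = np.array([[0, 7], [-1, -2j], [3 - 2j, 4]])

frob = lambda X, Y: np.trace(Y.conj().T @ X)   # <X, Y> = tr(Y* X)
print(frob(A, B))        # (2+8j)
print(frob(A, A).real)   # 8.0, i.e. ||A||^2
```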
Definition 2.12 ($L^2$ inner product). Given a real interval $[a, b]$, the function
$$\langle f, g\rangle := \int_a^b f(t)\overline{g(t)}\,dt$$
defines an inner product on the space $C[a, b]$ of continuous functions $f : [a, b] \to \mathbb{F}$.

With careful restriction, this works even for infinite intervals and a larger class of functions.⁶ Verifying the required properties is straightforward if you know a little analysis; for instance continuity allows us to conclude
$$\|f\|^2 = \int_a^b |f(x)|^2\,dx = 0 \iff f(x) \equiv 0$$
This is our first example of an infinite-dimensional inner product space.
Example 2.13. Let $f(x) = x$ and $g(x) = x^2$; these lie in the inner product space $C[-1, 1]$ with respect to the $L^2$ inner product.
$$\langle f, g\rangle = \int_{-1}^1 x^3\,dx = 0, \qquad \|f\|^2 = \int_{-1}^1 x^2\,dx = \frac23, \qquad \|g\|^2 = \int_{-1}^1 x^4\,dx = \frac25$$
With some simple scaling, we see that $\left\{\frac{1}{\|f\|}f,\ \frac{1}{\|g\|}g\right\} = \left\{\sqrt{\tfrac32}\,x,\ \sqrt{\tfrac52}\,x^2\right\}$ forms an orthonormal set.

⁶ For us, functions will always be continuous (often polynomials) on closed bounded intervals. The square-integrable functions and $L^2$-spaces for which the inner product is named are a more complicated business and beyond this course.
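The integrals in Example 2.13 can be approximated numerically. Here is a minimal sketch assuming numpy and scipy are available; the quadrature routine stands in for the exact $L^2$ inner product on $C[-1, 1]$.

```python
import numpy as np
from scipy.integrate import quad

def ip(f, g, a=-1.0, b=1.0):
    """Approximate L^2 inner product of real-valued f, g on [a, b]."""
    value, _ = quad(lambda t: f(t) * g(t), a, b)
    return value

f = lambda x: x
g = lambda x: x**2

print(ip(f, g))   # ~0.0     (orthogonal)
print(ip(f, f))   # ~0.6667  (= 2/3)
print(ip(g, g))   # ~0.4     (= 2/5)
```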
Definition 2.14 ($\ell^2$ inner product). A sequence $(x_n)$ is square-summable if $\sum_{n=1}^\infty |x_n|^2 < \infty$. These sequences form a vector space on which we can define an inner product⁷
$$\bigl\langle(x_n), (y_n)\bigr\rangle = \sum_{n=1}^\infty x_n\overline{y_n}$$
In essence we've taken the standard inner product on $\mathbb{F}^n$ and let $n \to \infty$! This example, and its $L^2$ cousin, are the prototypical Hilbert spaces, which have great application to differential equations, signal processing, etc. Since a rigorous discussion requires a significant amount of analysis (convergence of series, completeness, integrability), these objects are generally beyond the course.

Our final example of an inner product is a useful, and hopefully obvious, hack to which we shall repeatedly appeal in examples.

Lemma 2.15. Let $V$ be a vector space over $\mathbb{R}$ or $\mathbb{C}$. If $\beta$ is a basis of $V$, then there exists exactly one inner product on $V$ for which $\beta$ is an orthonormal set.
Exercises 2.1  1. Evaluate the inner product of the given vectors.
(a) $x = \begin{pmatrix}1\\ 2\\ 1\end{pmatrix}$, $y = \begin{pmatrix}1\\ 1\\ 1\end{pmatrix}$ where $\langle x, y\rangle = 2x_1y_1 + 3x_2y_2 + x_3y_3$
(b) $x = \begin{pmatrix}1\\ 2i\end{pmatrix}$, $y = \begin{pmatrix}5i\\ 4\end{pmatrix}$ where $\langle x, y\rangle$ is the standard Hermitian inner product on $\mathbb{C}^2$
(c) $x = \begin{pmatrix}1\\ 2i\end{pmatrix}$, $y = \begin{pmatrix}5i\\ 4\end{pmatrix}$ where $\langle x, y\rangle = y^*\begin{pmatrix}2 & i\\ -i & 2\end{pmatrix}x$
(d) $f(x) = x - 1$, $g(x) = x + 1$ where $\langle f, g\rangle$ is the $L^2$ inner product on $C[0, 2]$

2. Suppose $\langle x, y\rangle = \langle x, z\rangle$ for all $x$. Prove that $y = z$.

3. For each $z \in V$, the linearity condition says that the map $T_z : V \to \mathbb{F}$ defined by $T_z(x) = \langle x, z\rangle$ is linear ($T_z$ is an element of the dual space $V^*$). What, if anything, can you say about the function $U_z : x \mapsto \langle z, x\rangle$?

4. Define $\langle x, y\rangle := \sum_{j=1}^n x_jy_j$ on $\mathbb{C}^n$. Is this an inner product? Which of the properties (a), (b), (c) from Definition 2.1 does it satisfy?

5. (a) Verify that the matrix in Example 2.4 is positive-definite and that $\langle x, y\rangle = 3x_1y_1 + x_1y_2 + x_2y_1 + x_2y_2$ therefore defines an inner product on $\mathbb{R}^2$.
(Hint: Try to write $\|x\|^2$ as a sum of squares. . . )
(b) Let $x = \begin{pmatrix}1\\ 1\end{pmatrix}$. With respect to the inner product in part (a), find a non-zero unit vector $y$ which is orthogonal to $x$.

6. By multiplying out $|x_1 - ix_2|^2$, show that the matrix $\begin{pmatrix}3 & -i\\ i & 3\end{pmatrix}$ is positive-definite.
(Hint: recall that $|a|^2 = a\overline{a}$ for complex numbers!)

⁷ Neither of these facts are obvious; in particular, we'd need to see that the sum of two square-summable sequences is also square-summable.
7. Show that every eigenvalue of a positive-definite matrix is positive.

8. Prove parts 1, 2 and 3 of Lemma 2.5.

9. Let $V$ be an inner product space; prove Pythagoras' Theorem:
$$\text{If } \langle x, y\rangle = 0, \text{ then } \|x + y\|^2 = \|x\|^2 + \|y\|^2$$

10. Use basic algebra to prove the Cauchy–Schwarz inequality for vectors $x = \begin{pmatrix}a\\ b\end{pmatrix}$ and $y = \begin{pmatrix}c\\ d\end{pmatrix}$ in $\mathbb{R}^2$ with the standard (dot) product.

11. Prove the Cauchy–Schwarz and triangle inequalities for a complex inner product space. What has to change compared to the proof of Lemma 2.5?

12. Prove the polarization identities:
(a) In any real inner product space: $\langle x, y\rangle = \frac14\|x + y\|^2 - \frac14\|x - y\|^2$
(b) In any complex inner product space: $\langle x, y\rangle = \frac14\sum_{k=1}^4 i^k\bigl\|x + i^ky\bigr\|^2$
If you know the length of every vector, then you know the inner product!
13. Prove that $\displaystyle\int_0^2\frac{\sqrt{x}}{x+1}\,dx \le \frac{2}{\sqrt3}$. (Hint: use Cauchy–Schwarz)
14. Let $m \in \mathbb{Z}$ and consider the complex-valued function $f_m(x) = \frac{1}{\sqrt{2\pi}}e^{imx}$. If $\langle\ ,\ \rangle$ is the $L^2$ inner product on $C[-\pi, \pi]$, prove that $\{f_m : m \in \mathbb{Z}\}$ is an orthonormal set.
This example is central to the study of Fourier series.
(Hint: If complex functions are scary, use Euler's formula $e^{imx} = \cos mx + i\sin mx$ and work with the real-valued functions $\cos mx$ and $\sin mx$. The difficulty is that you then need integration by parts. . . )

15. Let $\langle\ ,\ \rangle$ be an inner product on $\mathbb{F}^n$ (recall that $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$). Define the matrix $A \in M_n(\mathbb{F})$ by $A_{jk} = \langle e_k, e_j\rangle$, where $\{e_1, \ldots, e_n\}$ is the standard basis. Verify that $A$ is the matrix of the inner product:
$$\forall x, y \in \mathbb{F}^n, \qquad \langle x, y\rangle = y^*Ax$$
In particular,
• $A$ is a Hermitian/self-adjoint matrix: $A^* = A$ (if $\mathbb{F} = \mathbb{R}$ this is simply symmetric);
• $A$ is positive-definite: for all non-zero $x \in \mathbb{F}^n$, $x^*Ax > 0$.
More generally, if $\beta = \{v_1, \ldots, v_n\}$ is a basis then $A_{jk} = \langle v_k, v_j\rangle$ defines the matrix of the inner product with respect to $\beta$: $\langle x, y\rangle = [y]_\beta^*A[x]_\beta$.
2.2 Orthogonal Sets and the Gram–Schmidt Process

We start with a simple definition, relating subspaces to orthogonality.

Definition 2.16. Let $U$ be a subspace of an inner product space $V$. The orthogonal complement $U^\perp$ is the set
$$U^\perp = \{x \in V : \forall u \in U,\ \langle x, u\rangle = 0\}$$

It is easy to check that $U^\perp$ is itself a subspace of $V$ and that $U \cap U^\perp = \{0\}$. It can moreover be seen that $U \subseteq (U^\perp)^\perp$, though equality need not hold in infinite dimensions (see Exercise 7).

Example 2.17. $U = \operatorname{Span}\left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}, \begin{pmatrix}0\\ 1\\ -3\end{pmatrix}\right\} \le \mathbb{R}^3$ has orthogonal complement $U^\perp = \operatorname{Span}\left\{\begin{pmatrix}0\\ 3\\ 1\end{pmatrix}\right\}$.

As in the example, we often have a direct sum decomposition $V = U \oplus U^\perp$: otherwise said,
$$\forall x \in V,\ \exists\text{ unique } u \in U,\ w \in U^\perp \text{ such that } x = u + w$$
In such a case, we'll call $u = \pi_U(x)$ and $w = \pi_{U^\perp}(x)$ the orthogonal projections of $x$ onto $U$ and $U^\perp$ respectively. As our first result shows, these are easy to compute whenever $U$ has a finite orthogonal basis.

Theorem 2.18. Let $V$ be an inner product space and let $U = \operatorname{Span}\beta$ where $\beta = \{u_1, \ldots, u_n\}$ is an orthogonal set of non-zero vectors;
$$\langle u_j, u_k\rangle = \begin{cases}0 & \text{if } j \neq k\\ \|u_j\|^2 \neq 0 & \text{if } j = k\end{cases}$$
Then:
1. $\beta$ is a basis of $U$ and each $x \in U$ has unique representation
$$x = \sum_{j=1}^n\frac{\langle x, u_j\rangle}{\|u_j\|^2}\,u_j \qquad(*)$$
This simplifies to $x = \sum\langle x, u_j\rangle u_j$ if $\beta$ is an orthonormal set.
2. $V = U \oplus U^\perp$. For any $x \in V$, we may write $x = u + w$ where
$$u = \sum_{j=1}^n\frac{\langle x, u_j\rangle}{\|u_j\|^2}\,u_j \in U \quad\text{and}\quad w = x - u \in U^\perp$$

Observe that $(*)$ essentially calculates the co-ordinate vector $[x]_\beta \in \mathbb{F}^n$. Recalling how unpleasant such calculations have been in the past, often requiring large matrix inversions, we immediately see the power of inner products and orthogonal bases.
Proof. 1. Since $\beta$ spans $U$, a given $x \in U$ may be written
$$x = \sum_{k=1}^n a_ku_k$$
for some scalars $a_k$. The orthogonality of $\beta$ recovers the required expression for $a_j$:
$$\langle x, u_j\rangle = \sum_{k=1}^n a_k\langle u_k, u_j\rangle = a_j\|u_j\|^2$$
Finally, let $x = 0$ to see that $\beta$ is linearly independent.

2. Clearly $u \in U$. For each $u_k$, the orthogonality of $\beta$ tells us that
$$\langle w, u_k\rangle = \langle x - u, u_k\rangle = \langle x, u_k\rangle - \sum_{j=1}^n\frac{\langle x, u_j\rangle}{\|u_j\|^2}\langle u_j, u_k\rangle = \langle x, u_k\rangle - \langle x, u_k\rangle = 0$$
Since $w$ is orthogonal to a basis of $U$ it is orthogonal to any element of $U$; we conclude that $w \in U^\perp$. Finally, $U \cap U^\perp = \{0\}$ forces the uniqueness of the decomposition $x = u + w$, whence $V = U \oplus U^\perp$.
Examples 2.19. 1. Consider the standard orthonormal basis $\beta = \{e_1, e_2\}$ of $\mathbb{R}^2$. For any $x = \begin{pmatrix}x_1\\ x_2\end{pmatrix}$, we easily check that
$$\sum_{j=1}^2\langle x, e_j\rangle e_j = x_1e_1 + x_2e_2 = x$$

2. In $\mathbb{R}^3$, $\beta = \{u_1, u_2, u_3\} = \left\{\begin{pmatrix}1\\ 2\\ 3\end{pmatrix}, \begin{pmatrix}2\\ -1\\ 0\end{pmatrix}, \begin{pmatrix}3\\ 6\\ -5\end{pmatrix}\right\}$ is an orthogonal set and thus a basis. We compute the co-ordinates of $x = \begin{pmatrix}7\\ 4\\ 2\end{pmatrix}$ with respect to $\beta$:
$$x = \sum_{j=1}^3\frac{\langle x, u_j\rangle}{\|u_j\|^2}\,u_j = \frac{7 + 8 + 6}{1 + 4 + 9}\begin{pmatrix}1\\ 2\\ 3\end{pmatrix} + \frac{14 - 4 + 0}{4 + 1 + 0}\begin{pmatrix}2\\ -1\\ 0\end{pmatrix} + \frac{21 + 24 - 10}{9 + 36 + 25}\begin{pmatrix}3\\ 6\\ -5\end{pmatrix}$$
$$= \frac32\begin{pmatrix}1\\ 2\\ 3\end{pmatrix} + 2\begin{pmatrix}2\\ -1\\ 0\end{pmatrix} + \frac12\begin{pmatrix}3\\ 6\\ -5\end{pmatrix} \implies [x]_\beta = \begin{pmatrix}3/2\\ 2\\ 1/2\end{pmatrix}$$
Compare this with the painfully slow augmented matrix method for finding co-ordinates!
3. Revisiting Example 2.17, let $x = \begin{pmatrix}1\\ 1\\ 1\end{pmatrix}$. Since $\beta = \left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}, \begin{pmatrix}0\\ 1\\ -3\end{pmatrix}\right\}$ is an orthogonal basis of $U$, we observe that
$$\pi_U(x) = 1\begin{pmatrix}1\\ 0\\ 0\end{pmatrix} + \frac{-2}{10}\begin{pmatrix}0\\ 1\\ -3\end{pmatrix} = \frac15\begin{pmatrix}5\\ -1\\ 3\end{pmatrix}, \qquad \pi_{U^\perp}(x) = x - \pi_U(x) = \frac25\begin{pmatrix}0\\ 3\\ 1\end{pmatrix}$$
are the orthogonal projections corresponding to $\mathbb{R}^3 = U \oplus U^\perp$.
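The projection formula of Theorem 2.18 is short to code. Below is a sketch (assuming numpy) for Example 2.19.3, using the orthogonal basis of $U$ as reconstructed above.

```python
import numpy as np

u1 = np.array([1., 0.,  0.])
u2 = np.array([0., 1., -3.])
x  = np.array([1., 1.,  1.])

proj_U = sum((x @ u) / (u @ u) * u for u in (u1, u2))   # sum of <x,u>/||u||^2 * u
proj_perp = x - proj_U
print(proj_U)      # [ 1.  -0.2  0.6] = (1/5)(5, -1, 3)
print(proj_perp)   # [ 0.   1.2  0.4] = (2/5)(0, 3, 1)
```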
The Gram–Schmidt Process

Theorem 2.18 tells us how to compute the orthogonal projections corresponding to $V = U \oplus U^\perp$, provided $U$ has a finite, orthogonal basis. Given how useful such bases are, our next goal is to see that such exist for any finite-dimensional subspace. Helpfully there exists a constructive algorithm.

Theorem 2.20 (Gram–Schmidt). Suppose $S = \{s_1, \ldots, s_n\}$ is a linearly independent subset of an inner product space $V$. Construct a sequence of vectors $u_1, \ldots, u_n$ inductively:
• Choose $u_1 = a_1s_1$ where $a_1 \neq 0$.
• For each $k \ge 2$, choose
$$u_k = a_k\left(s_k - \sum_{j=1}^{k-1}\frac{\langle s_k, u_j\rangle}{\|u_j\|^2}\,u_j\right) \quad\text{where } a_k \neq 0 \qquad(**)$$
Then $\beta := \{u_1, \ldots, u_n\}$ is an orthogonal basis of $\operatorname{Span}S$.

The purpose of the scalars $a_k$ is to give you some freedom; choose them to avoid unpleasant fractions! If you want a set of orthonormal vectors, it is easier to scale everything after the algorithm is complete. Indeed, by taking $S$ to be a basis of $V$ and normalizing the resulting $\beta$, we conclude:
Corollary 2.21. Every finite-dimensional inner product space has an orthonormal basis.
Example 2.22. $S = \{s_1, s_2, s_3\} = \left\{\begin{pmatrix}1\\ 0\\ 0\end{pmatrix}, \begin{pmatrix}2\\ 1\\ -3\end{pmatrix}, \begin{pmatrix}1\\ 1\\ 1\end{pmatrix}\right\}$ is a linearly independent subset of $\mathbb{R}^3$.

1. Choose $u_1 = s_1 = \begin{pmatrix}1\\ 0\\ 0\end{pmatrix} \implies \|u_1\|^2 = 1$

2. $s_2 - \dfrac{\langle s_2, u_1\rangle}{\|u_1\|^2}u_1 = \begin{pmatrix}2\\ 1\\ -3\end{pmatrix} - 2\begin{pmatrix}1\\ 0\\ 0\end{pmatrix} = \begin{pmatrix}0\\ 1\\ -3\end{pmatrix}$: choose $u_2 = \begin{pmatrix}0\\ 1\\ -3\end{pmatrix} \implies \|u_2\|^2 = 10$

3. $s_3 - \dfrac{\langle s_3, u_1\rangle}{\|u_1\|^2}u_1 - \dfrac{\langle s_3, u_2\rangle}{\|u_2\|^2}u_2 = \begin{pmatrix}1\\ 1\\ 1\end{pmatrix} - \begin{pmatrix}1\\ 0\\ 0\end{pmatrix} + \frac15\begin{pmatrix}0\\ 1\\ -3\end{pmatrix} = \frac25\begin{pmatrix}0\\ 3\\ 1\end{pmatrix}$: choose $u_3 = \begin{pmatrix}0\\ 3\\ 1\end{pmatrix}$

The orthogonality of $\beta = \{u_1, u_2, u_3\}$ is clear. It is now trivial to observe that $\left\{u_1,\ \frac{1}{\sqrt{10}}u_2,\ \frac{1}{\sqrt{10}}u_3\right\}$ is an orthonormal basis of $\mathbb{R}^3$.
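A minimal Gram–Schmidt sketch (assuming numpy), checked against Example 2.22. Here every scalar $a_k$ is taken to be 1, so the output may differ from the hand computation by a non-zero rescaling of each vector.

```python
import numpy as np

def gram_schmidt(S):
    """Return an orthogonal basis of Span S (S assumed linearly independent)."""
    basis = []
    for s in S:
        u = s - sum((s @ u) / (u @ u) * u for u in basis)
        basis.append(u)
    return basis

S = [np.array([1., 0., 0.]),
     np.array([2., 1., -3.]),
     np.array([1., 1., 1.])]
for u in gram_schmidt(S):
    print(u)      # (1,0,0), (0,1,-3), (0, 1.2, 0.4) = (2/5)(0,3,1)
```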
Proof of Theorem 2.20. For each $k \le n$, define $S_k = \{s_1, \ldots, s_k\}$ and $\beta_k = \{u_1, \ldots, u_k\}$. We prove by induction that each $\beta_k$ is an orthogonal set of non-zero vectors and that $\operatorname{Span}\beta_k = \operatorname{Span}S_k$. The Theorem is then the terminal case $k = n$.

(Base case $k = 1$) Certainly $\beta_1 = \{a_1s_1\}$ is an orthogonal set and $\operatorname{Span}\beta_1 = \operatorname{Span}S_1$.

(Induction step) Fix $k \ge 2$, assume $\beta_{k-1}$ is an orthogonal set of non-zero vectors and that $\operatorname{Span}\beta_{k-1} = \operatorname{Span}S_{k-1}$. By Theorem 2.18, $u_k \in (\operatorname{Span}\beta_{k-1})^\perp$. We also see that $u_k \neq 0$, for if not,
$$(**) \implies s_k \in \operatorname{Span}\beta_{k-1} = \operatorname{Span}S_{k-1}$$
and $S$ would be linearly dependent. It follows that $\beta_k$ is an orthogonal set of non-zero vectors.
Moreover, $s_k \in \operatorname{Span}\beta_k \implies \operatorname{Span}S_k \subseteq \operatorname{Span}\beta_k$. Since these spaces have the same (finite) dimension $k$, we conclude that $\operatorname{Span}\beta_k = \operatorname{Span}S_k$.

By induction, $\beta$ is an orthogonal, non-zero spanning set for $\operatorname{Span}S$; by Theorem 2.18, it is a basis.
Example 2.23. This time we work in the space of polynomials $P(\mathbb{R})$ equipped with the $L^2$ inner product $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$ on the interval $[0, 1]$. Let $S = \{1, x, x^2\}$ and apply the algorithm:

1. Choose $f_1(x) = 1 \implies \|f_1\|^2 = \int_0^1 1\,dx = 1$

2. $x - \dfrac{\langle x, f_1\rangle}{\|f_1\|^2}f_1 = x - \int_0^1 x\,dx = x - \frac12$. We choose $f_2(x) = 2x - 1$, with $\|f_2\|^2 = \int_0^1(2x - 1)^2\,dx = \frac13$

3. $x^2 - \dfrac{\langle x^2, f_1\rangle}{\|f_1\|^2}f_1 - \dfrac{\langle x^2, f_2\rangle}{\|f_2\|^2}f_2 = x^2 - \int_0^1 x^2\,dx - \dfrac{\int_0^1 x^2(2x - 1)\,dx}{1/3}(2x - 1) = x^2 - x + \frac16$. We choose $f_3(x) = 6x^2 - 6x + 1$ with $\|f_3\|^2 = \int_0^1(6x^2 - 6x + 1)^2\,dx = \frac15$

It follows that $\operatorname{Span}S$ has an orthonormal basis
$$\beta = \left\{1,\ \sqrt3(2x - 1),\ \sqrt5(6x^2 - 6x + 1)\right\}$$
This example can be extended to arbitrary degree since the countable set $\{1, x, x^2, \ldots\}$ is a basis of $P(\mathbb{R})$. Indeed this shows that $(P(\mathbb{R}), \langle\ ,\ \rangle)$ has an orthonormal basis.
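The same Gram–Schmidt loop works verbatim with the $L^2$ inner product in place of the dot product. Here is a sketch assuming sympy is available for exact integration; it reproduces the polynomials of Example 2.23 up to the scaling freedom in the $a_k$.

```python
import sympy as sp

x = sp.symbols('x')
ip = lambda f, g: sp.integrate(f * g, (x, 0, 1))   # L^2 inner product on [0, 1]

S = [sp.Integer(1), x, x**2]
basis = []
for s in S:
    u = sp.expand(s - sum(ip(s, u) / ip(u, u) * u for u in basis))
    basis.append(u)

print(basis)   # [1, x - 1/2, x**2 - x + 1/6], proportional to {1, 2x-1, 6x^2-6x+1}
```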
Gram–Schmidt also shows that our earlier discussion of orthogonal projections is generic.

Corollary 2.24. If $U$ is a finite-dimensional subspace of an inner product space $V$, then $V = U \oplus U^\perp$ and the orthogonal projections may be computed as in Theorem 2.18.

If $U$ is an infinite-dimensional subspace of $V$, then we need not have $V = U \oplus U^\perp$ and the orthogonal projections might not be well-defined (see, for example, Exercises 7 and 8). Instead, if $\beta$ is an orthonormal basis of $U$, it is common to describe the coefficients $\langle x, u\rangle$ for each $u \in \beta$ as the Fourier coefficients, and the infinite sum
$$\sum_{u\in\beta}\langle x, u\rangle u$$
as the Fourier series of $x$, provided the sum converges.
Exercises 2.2  1. Apply Gram–Schmidt to obtain an orthogonal basis $\beta$ for $\operatorname{Span}S$. Then obtain the co-ordinate representation (Fourier coefficients) of the given vector with respect to $\beta$.
(a) $S = \left\{\begin{pmatrix}1\\ 1\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 1\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 0\\ 1\end{pmatrix}\right\} \subseteq \mathbb{R}^3$, $x = \begin{pmatrix}1\\ 0\\ 1\end{pmatrix}$
(b) $S = \left\{\begin{pmatrix}3 & 5\\ 1 & 1\end{pmatrix}, \begin{pmatrix}1 & 9\\ 5 & 1\end{pmatrix}, \begin{pmatrix}7 & 17\\ 2 & 6\end{pmatrix}\right\} \subseteq M_2(\mathbb{R})$, $X = \begin{pmatrix}1 & 27\\ 4 & 8\end{pmatrix}$ (use the Frobenius product)
(c) $S = \left\{\begin{pmatrix}1\\ i\\ 0\end{pmatrix}, \begin{pmatrix}0\\ 1\\ i\end{pmatrix}, \begin{pmatrix}0\\ 0\\ 1\end{pmatrix}\right\} \subseteq \mathbb{C}^3$ with $x = \begin{pmatrix}1\\ 1\\ 1\end{pmatrix}$
(d) $S = \{1, x, x^2\}$ with $\langle f, g\rangle = \int_{-1}^1 f(x)g(x)\,dx$ and $f(x) = x^2$.

Important! You'll likely need much more practice than this to get comfortable with Gram–Schmidt; make up your own problems!
2. Let $S = \{s_1, s_2\} = \left\{\begin{pmatrix}1\\ 0\\ 3\end{pmatrix}, \begin{pmatrix}2\\ 1\\ 0\end{pmatrix}\right\}$ and $U = \operatorname{Span}S \le \mathbb{R}^3$. Find $\pi_U(x)$ if $x = \begin{pmatrix}3\\ 1\\ 2\end{pmatrix}$.
(Hint: First apply Gram–Schmidt)

3. Find the orthogonal complement to $U = \operatorname{Span}\{x^2\} \le P_2(\mathbb{R})$ with respect to the inner product $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$.
4. Let $T \in \mathcal{L}(V, W)$ where $V, W$ are inner product spaces with orthonormal bases $\beta = \{v_1, \ldots, v_n\}$ and $\gamma = \{w_1, \ldots, w_m\}$ respectively. Prove that the matrix $A = [T]^\gamma_\beta \in M_{m\times n}(\mathbb{F})$ of $T$ with respect to these bases has $jk$th entry
$$A_{jk} = \langle T(v_k), w_j\rangle$$

5. Suppose that $\beta$ is an orthonormal basis of an $n$-dimensional inner product space $V$. Prove that, $\forall x, y \in V$,
$$\langle x, y\rangle = [y]_\beta^*\,[x]_\beta$$
Otherwise said, the co-ordinate isomorphism $\varphi_\beta : V \to \mathbb{F}^n$ defined by $\varphi_\beta(x) = [x]_\beta$ is an isomorphism of inner product spaces, where we use the standard inner product on $\mathbb{F}^n$.

6. Let $U$ be a subspace of an inner product space $V$. Prove the following:
(a) $U^\perp$ is a subspace of $V$.
(b) $U \cap U^\perp = \{0\}$
(c) $U \subseteq (U^\perp)^\perp$
(d) If $V = U \oplus U^\perp$, then $U = (U^\perp)^\perp$ (this is always the case when $\dim U < \infty$)
7. Let $\ell^2$ be the set of square-summable sequences of real numbers (Definition 2.14). Consider the sequences $u_1, u_2, u_3, \ldots$, where $u_j$ is the zero sequence except for a single 1 in the $j$th entry. For instance,
$$u_4 = (0, 0, 0, 1, 0, 0, 0, 0, \ldots)$$
(a) Let $U = \operatorname{Span}\{u_j : j \in \mathbb{N}\}$. Prove that $U^\perp$ contains only the zero sequence.
(b) Show that the sequence $y = \left(\frac1n\right)$ lies in $\ell^2$, but does not lie in $U$.
$U$ is therefore a proper subset of $(U^\perp)^\perp = \ell^2$ and $\ell^2 \neq U \oplus U^\perp$.

8. Recall Exercise 2.1.14 where we saw that the set $\beta = \{\frac{1}{\sqrt{2\pi}}e^{imx} : m \in \mathbb{Z}\}$ is orthonormal with respect to $\langle f, g\rangle = \int_{-\pi}^\pi f(t)\overline{g(t)}\,dt$.
(a) Show that the Fourier series of $f(x) = x$ is
$$F(x) := \sum_{m=-\infty}^\infty\left\langle x, \tfrac{1}{\sqrt{2\pi}}e^{imx}\right\rangle\tfrac{1}{\sqrt{2\pi}}e^{imx} = \sum_{m=1}^\infty\frac{2(-1)^{m+1}}{m}\sin mx$$
(b) Briefly explain why the Fourier series is not an element of $\operatorname{Span}\beta$.
(c) Sketch a few of the Fourier approximations (sum up to $m = 5$ or $7$. . . ) and observe, when extended to $\mathbb{R}$, how they approximate a discontinuous periodic function.

9. (Hard) Much of Theorem 2.18 remains true, with suitable modifications, even if $\beta$ is an infinite set. Restate and prove as much as you can, and identify the false part(s).
2.3 The Adjoint of a Linear Operator

Recall how the standard inner product on $\mathbb{F}^n$ may be written in terms of the conjugate-transpose
$$\langle x, y\rangle = y^*x = \overline{y}^Tx$$
We start by inserting a matrix into this expression and interpreting in two different ways. Suppose $A \in M_{m\times n}(\mathbb{F})$, $v \in \mathbb{F}^n$ and $w \in \mathbb{F}^m$, then
$$\underbrace{\langle A^*w, v\rangle}_{\text{in }\mathbb{F}^n} = v^*(A^*w) = (v^*A^*)w = (Av)^*w = \underbrace{\langle w, Av\rangle}_{\text{in }\mathbb{F}^m} \qquad(\dagger)$$

Example 2.25. As a sanity check, let $A = \begin{pmatrix}1 & 2\\ 0 & 3\end{pmatrix} \in M_2(\mathbb{R})$, $w = \begin{pmatrix}x\\ y\end{pmatrix}$ and $v = \begin{pmatrix}p\\ q\end{pmatrix}$. Then,
$$\left\langle A^T\begin{pmatrix}x\\ y\end{pmatrix}, \begin{pmatrix}p\\ q\end{pmatrix}\right\rangle = \left\langle\begin{pmatrix}1 & 0\\ 2 & 3\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}, \begin{pmatrix}p\\ q\end{pmatrix}\right\rangle = \left\langle\begin{pmatrix}x\\ 2x + 3y\end{pmatrix}, \begin{pmatrix}p\\ q\end{pmatrix}\right\rangle = xp + (2x + 3y)q$$
$$\left\langle\begin{pmatrix}x\\ y\end{pmatrix}, A\begin{pmatrix}p\\ q\end{pmatrix}\right\rangle = \left\langle\begin{pmatrix}x\\ y\end{pmatrix}, \begin{pmatrix}1 & 2\\ 0 & 3\end{pmatrix}\begin{pmatrix}p\\ q\end{pmatrix}\right\rangle = \left\langle\begin{pmatrix}x\\ y\end{pmatrix}, \begin{pmatrix}p + 2q\\ 3q\end{pmatrix}\right\rangle = x(p + 2q) + 3yq$$
Note how the inner products are evaluated on different spaces. At the level of linear maps this is a relationship between $L_A \in \mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$ and $L_{A^*} \in \mathcal{L}(\mathbb{F}^m, \mathbb{F}^n)$, one that is easily generalizable.
Definition 2.26. Let $T \in \mathcal{L}(V, W)$ where $V, W$ are inner product spaces over the same field $\mathbb{F}$. The adjoint of $T$ is a function $T^* : W \to V$ (read '$T$-star') satisfying
$$\forall v \in V,\ w \in W, \qquad \langle T^*(w), v\rangle = \langle w, T(v)\rangle$$
Note that the first inner product is computed within $V$ and the second within $W$.

The adjoint effectively extends the conjugate-transpose to linear maps. We now use the same notation for three objects, so be careful!
• If $A$ is a real or complex matrix, then $A^* = \overline{A}^T$ is its conjugate-transpose.
• If $T$ is a linear map, then $T^*$ is its adjoint.
• If $V$ is a vector space, then $V^* = \mathcal{L}(V, \mathbb{F})$ is its dual space.
Thankfully the two notations line up nicely, as part 3 of our first result shows.

Theorem 2.27 (Basic Properties). 1. If an adjoint exists,⁸ then it is unique and linear.
2. If $T$ and $S$ have adjoints, then
$$(T^*)^* = T, \qquad (TS)^* = S^*T^*, \qquad (\lambda T + S)^* = \overline{\lambda}T^* + S^*$$
3. Suppose $V, W$ are finite-dimensional with orthonormal bases $\beta, \gamma$ respectively. Then the matrix of the adjoint of $T \in \mathcal{L}(V, W)$ is the conjugate-transpose of the original: $[T^*]^\beta_\gamma = \bigl([T]^\gamma_\beta\bigr)^*$.

⁸ Existence of adjoints is trickier, so we postpone this a little: see Corollary 2.34 and Exercise 12.
Proof. 1. (Uniqueness) Suppose $T^*$ and $S^*$ are adjoints of $T$. Then
$$\langle T^*(x), y\rangle = \langle x, T(y)\rangle = \langle S^*(x), y\rangle$$
Since this holds for all $y$, Lemma 2.5 part 4 says that $\forall x$, $T^*(x) = S^*(x)$, whence $T^* = S^*$.

(Linearity) Simply translate across, use the linearity of $T$, and again appeal to Lemma 2.5:
$$\forall z, \quad \langle T^*(\lambda x + y), z\rangle = \langle\lambda x + y, T(z)\rangle = \lambda\langle x, T(z)\rangle + \langle y, T(z)\rangle = \lambda\langle T^*(x), z\rangle + \langle T^*(y), z\rangle = \langle\lambda T^*(x) + T^*(y), z\rangle$$
$$\implies T^*(\lambda x + y) = \lambda T^*(x) + T^*(y)$$

2. These may be proved similarly to part 1 and are left as an exercise.

3. By Exercise 2.2.4, the $jk$th entry of $[T^*]^\beta_\gamma$ is
$$\langle T^*(w_k), v_j\rangle = \langle w_k, T(v_j)\rangle = \overline{\langle T(v_j), w_k\rangle} = \overline{A_{kj}}$$
We revisit our motivating set-up $(\dagger)$ in the language of part 3. Suppose:
• $V = \mathbb{F}^n$ and $W = \mathbb{F}^m$ have standard orthonormal bases $\beta = \{e_1, \ldots, e_n\}$ and $\gamma = \{e_1, \ldots, e_m\}$.
• $T = L_A \in \mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$.
Since the matrix of $T$ with respect to the standard bases is simply $A$ itself, the theorem confirms our earlier observation that the adjoint of $L_A$ is left multiplication by the conjugate-transpose $A^*$:
$$[T^*]^\beta_\gamma = \bigl([T]^\gamma_\beta\bigr)^* = A^* = [L_{A^*}]^\beta_\gamma \implies T^* = (L_A)^* = L_{A^*}$$
Here is another straightforward example, this time using the standard inner products on $\mathbb{C}^2$ and $\mathbb{C}^3$.

Example 2.28. Let $T = L_A \in \mathcal{L}(\mathbb{C}^3, \mathbb{C}^2)$ where $A = \begin{pmatrix}i & 1 & 3\\ 2 & 1-i & 4+2i\end{pmatrix}$.
Plainly $A = [T]^\gamma_\beta$ with respect to $\beta = \{e_1, e_2, e_3\}$ and $\gamma = \{e_1, e_2\}$. We conclude that $T^* = L_{A^*}$:
$$[T^*]^\beta_\gamma = A^* = \begin{pmatrix}-i & 2\\ 1 & 1+i\\ 3 & 4-2i\end{pmatrix}$$
As a sanity check, multiply out a few examples of $\langle A^*w, v\rangle = \langle w, Av\rangle$; make sure you're comfortable with the fact that the left inner product is computed in $\mathbb{C}^3$ and the right in $\mathbb{C}^2$!

The Theorem tells us that every linear map $T \in \mathcal{L}(V, W)$ between finite-dimensional spaces has an adjoint and moreover how to compute it:
1. Choose orthonormal bases (these exist by Corollary 2.21) and find the matrix $[T]^\gamma_\beta$.
2. Take the conjugate-transpose $\bigl([T]^\gamma_\beta\bigr)^*$ and translate back to find $T^* \in \mathcal{L}(W, V)$.

The prospect of twice applying Gram–Schmidt and translating between linear maps and their matrices is unappealing; calculating this way can quickly become an enormous mess! In practice, it is often better to try a modified approach; see for instance part 2(b) of the next Example.
Examples 2.29. Let $T = \frac{d}{dx} \in \mathcal{L}(P_1(\mathbb{R}))$ be the derivative operator; $T(a + bx) = b$. We treat $P_1(\mathbb{R})$ as an inner product space in two ways.

1. Equip the inner product for which the standard basis $\epsilon = \{1, x\}$ is orthonormal. Then
$$[T]_\epsilon = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix} \implies [T^*]_\epsilon = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix} \implies T^*(a + bx) = ax$$

2. Equip the $L^2$ inner product $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$. As we saw in Example 2.23, the basis $\beta = \{f_1, f_2\} = \{1, 2x - 1\}$ is orthogonal with $\|f_1\| = 1$ and $\|f_2\| = \frac{1}{\sqrt3}$. We compute the adjoint of $T = \frac{d}{dx}$ in two different ways.

(a) The basis $\gamma = \{g_1, g_2\} = \bigl\{f_1, \sqrt3f_2\bigr\} = \bigl\{1, \sqrt3(2x - 1)\bigr\}$ is orthonormal. Observe that
$$T(g_1) = 0, \quad T(g_2) = 2\sqrt3 \implies [T]_\gamma = \begin{pmatrix}0 & 2\sqrt3\\ 0 & 0\end{pmatrix} \implies [T^*]_\gamma = \begin{pmatrix}0 & 0\\ 2\sqrt3 & 0\end{pmatrix}$$
$$\implies T^*(a + bx) = T^*\left(\left(a + \tfrac b2\right) + \tfrac{b}{2\sqrt3}\cdot\sqrt3(2x - 1)\right) = \left(a + \tfrac b2\right)\cdot2\sqrt3\,g_2 = 3(2a + b)(2x - 1)$$

(b) Use the orthogonal basis $\beta$ and the projection formula (Theorem 2.18). With $p(x) = a + bx$,
$$T^*(p) = \frac{\langle T^*(p), f_1\rangle}{\|f_1\|^2}f_1 + \frac{\langle T^*(p), f_2\rangle}{\|f_2\|^2}f_2 = \langle p, T(1)\rangle + \langle p, T(2x - 1)\rangle\cdot3(2x - 1)$$
$$= \langle p, 0\rangle + 3\langle p, 2\rangle(2x - 1) = 3\left(\int_0^1 2(a + bx)\,dx\right)(2x - 1) = 3(2a + b)(2x - 1)$$
Note the advantages here: no square roots and no need to change basis at the end!

The calculations for the second example were much nastier, even though we were already in possession of an orthogonal basis. The crucial point is that the two examples produce different maps $T^*$: the adjoint depends on the inner product!
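One can check the defining adjoint property symbolically. The sketch below (assuming sympy) verifies $\langle T^*(p), q\rangle = \langle p, T(q)\rangle$ for the $L^2$ case of Example 2.29.2, with arbitrary $p = a + bx$ and $q = c + dx$.

```python
import sympy as sp

x, a, b, c, d = sp.symbols('x a b c d')
ip = lambda f, g: sp.integrate(f * g, (x, 0, 1))   # L^2 inner product on [0, 1]

p = a + b*x
q = c + d*x
T_q = sp.diff(q, x)                      # T(q) = q'
Tstar_p = 3*(2*a + b)*(2*x - 1)          # the formula derived above

print(sp.simplify(ip(Tstar_p, q) - ip(p, T_q)))   # 0, as required
```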
Why should we care about adjoints?
Adjoints might seem merely to be an abstraction of something simple (transposes) for its own sake.
A convincing explanation of why adjoints are useful takes a lot of work; here is a short version.
Given a linear map T L(V) on an inner product space, we now have two desirable types of basis.
1. Eigenbasis: diagonalizes T.
2. Orthonormal basis: recall (Theorem 2.18) how these simplify computations.
The capstone result of this course is the famous spectral theorem (Theorem 2.37) which says, in short,
that self-adjoint operators ($T^* = T$) have an orthonormal eigenbasis, the holy grail of easy computation!
Such operators are important both theoretically and in applications such as quantum mechanics.
The Fundamental Subspaces Theorem

To every linear map are associated its range and nullspace. These interact nicely with the adjoint. . .

Theorem 2.30. If $T \in \mathcal{L}(V, W)$ has adjoint $T^*$, then,
1. $R(T^*)^\perp = N(T)$
2. If $V$ is finite-dimensional, then $R(T^*) = N(T)^\perp$
The corresponding results hold if we swap $V \leftrightarrow W$ and $T \leftrightarrow T^*$.

The proof is left to Exercise 6. You've likely observed this with transposes of small matrices.

Example 2.31. Let $A = \begin{pmatrix}1 & 2 & 1\\ 0 & 3 & 2\end{pmatrix}$. Viewed as a linear map between Euclidean spaces, $T = L_A$ has adjoint $T^* = L_{A^T}$. It is easy to compute the relevant subspaces:
$$R(A) = \mathbb{R}^2, \quad N(A^T) = \{0\}, \quad R(A^T) = \operatorname{Span}\left\{\begin{pmatrix}1\\ 2\\ 1\end{pmatrix}, \begin{pmatrix}0\\ 3\\ 2\end{pmatrix}\right\}, \quad N(A) = \operatorname{Span}\left\{\begin{pmatrix}1\\ -2\\ 3\end{pmatrix}\right\}$$
The Riesz Representation Theorem

This powerful result demonstrates a natural relation between an inner product space and its dual space $V^* = \mathcal{L}(V, \mathbb{F})$.

Theorem 2.32. If $V$ is finite-dimensional and $g : V \to \mathbb{F}$ is linear, then there exists a unique $y \in V$ such that $g(x) = \langle x, y\rangle$.

Example 2.33. $g(p) := \int_0^1 p(x)\,dx$ is a linear map $g : P_2(\mathbb{R}) \to \mathbb{R}$. Equip $P_2(\mathbb{R})$ with the inner product for which the standard basis $\{1, x, x^2\}$ is orthonormal. Then
$$g(a + bx + cx^2) = a + \tfrac12b + \tfrac13c = \left\langle a + bx + cx^2,\ 1 + \tfrac12x + \tfrac13x^2\right\rangle$$
We conclude that $g(p) = \langle p, q\rangle$, where $q(x) = 1 + \tfrac12x + \tfrac13x^2$.

The idea of the proof is very simple: if $g(x) = \langle x, y\rangle$ then the nullspace of $g$ must equal $\operatorname{Span}\{y\}^\perp$. . .

Proof. If $g$ is the zero map, take $y = 0$. Otherwise, $\operatorname{rank}g = 1$ and
$$V = N(g) \oplus N(g)^\perp \quad\text{where } \dim N(g)^\perp = 1 \quad\text{(rank–nullity theorem and Exercise 2.2.6)}$$
Let $u \in N(g)^\perp$ be either of the two unit vectors and define, independently of $u$,
$$y := \overline{g(u)}\,u \in V$$
Following the decomposition, write $x = n + \alpha u$ where $n \in N(g)$ and observe that
$$\langle x, y\rangle = \bigl\langle n + \alpha u,\ \overline{g(u)}\,u\bigr\rangle = g(u)\alpha = 0 + g(\alpha u) = g(n + \alpha u) = g(x)$$
The uniqueness of $y$ follows from the cancellation property (Lemma 2.5, part 4).
Due to the tight correspondence, the map is often decorated as $g_y$. Riesz's theorem indeed says that $y \mapsto g_y$ is an isomorphism $V \cong V^*$. While there are infinitely many isomorphisms between these spaces, the inner product structure identifies a canonical or preferred choice.

Corollary 2.34. Every linear map on a finite-dimensional inner product space has an adjoint.

Note how only the domain is required to be finite-dimensional! Riesz's Theorem and the Corollary also apply to continuous linear operators on (infinite-dimensional) Hilbert spaces, though the proof is a little trickier.

Proof. Let $T \in \mathcal{L}(V, W)$ where $\dim V < \infty$, and suppose $z \in W$ is given. Simply define $T^*(z) := y$ where $y \in V$ is the unique vector in Riesz's Theorem arising from the linear map
$$g : V \to \mathbb{F}, \qquad g(x) = \langle T(x), z\rangle$$
and check the required property:
$$\langle T^*(z), x\rangle = \langle y, x\rangle = \overline{g(x)} = \overline{\langle T(x), z\rangle} = \langle z, T(x)\rangle$$
Exercises 2.3  1. For each inner product space $V$ and linear operator $T \in \mathcal{L}(V)$, evaluate $T^*$ on the given vector.
(a) $V = \mathbb{R}^2$ with the standard inner product, $T\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}2x + y\\ x - 3y\end{pmatrix}$ and $x = \begin{pmatrix}3\\ 5\end{pmatrix}$
(b) $V = \mathbb{C}^2$ with the standard inner product, $T\begin{pmatrix}z\\ w\end{pmatrix} = \begin{pmatrix}2z + iw\\ (1 - i)z\end{pmatrix}$ and $x = \begin{pmatrix}3i\\ 1 + 2i\end{pmatrix}$
(c) $V = P_1(\mathbb{R})$ with $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$, $T(f) = f' + 3f$ and $f(t) = 4 - 2t$

2. Suppose $A = \begin{pmatrix}1 & 1\\ 4 & 3\end{pmatrix}$ and consider the linear map $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $\mathbb{R}^2$ is equipped with the weighted inner product $\langle x, y\rangle = 4x_1y_1 + x_2y_2$.
(a) Find the matrix of $T$ with respect to the orthonormal basis $\beta = \{v_1, v_2\} = \{\frac12e_1, e_2\}$.
(b) Find the adjoint $T^*$ and its matrix with respect to the standard basis $\epsilon = \{e_1, e_2\}$.
(Hint: the answer isn't $A^T$!)

3. Extending Examples 2.29, find the adjoint of $T = \frac{d}{dx} \in \mathcal{L}(P_2(\mathbb{R}))$ with respect to:
(a) The inner product where the standard basis $\epsilon = \{1, x, x^2\}$ is orthonormal.
(b) (Hard!) The $L^2$ inner product $\int_0^1 f(x)g(x)\,dx$.

4. Let $T(f) = f''$ be a linear transformation of $P_2(\mathbb{R})$ and let $\epsilon = \{1, x, x^2\}$ be the standard basis. Find $T^*(a + bx + cx^2)$:
(a) With respect to the inner product where $\epsilon$ is orthonormal;
(b) With respect to the $L^2$ inner product $\langle f, g\rangle = \int_{-1}^1 f(t)g(t)\,dt$.
(Hint: $\{1, x, 3x^2 - 1\}$ is orthogonal)
5. Prove part 2 of Theorem 2.27.

6. Prove the Fundamental Subspaces Theorem 2.30.

7. For each inner product space $V$ and linear transformation $g : V \to \mathbb{F}$, find a vector $y \in V$ such that $g(x) = \langle x, y\rangle$ for all $x \in V$.
(a) $V = \mathbb{R}^3$ with the standard inner product, and $g\begin{pmatrix}x\\ y\\ z\end{pmatrix} = x - 2y + 4z$
(b) $V = \mathbb{C}^2$ with the standard inner product, and $g\begin{pmatrix}z\\ w\end{pmatrix} = iz - 2w$
(c) $V = P_2(\mathbb{R})$ with the $L^2$ inner product $\langle f, h\rangle = \int_0^1 f(x)h(x)\,dx$, and $g(f) = f'(1)$

8. (a) In the proof of Theorem 2.32, explain why $y$ depends only on $g$ (not $u$).
(b) In the proof of Corollary 2.34, check that $g(x) := \langle T(x), z\rangle$ is linear.

9. Let $y, z \in V$ be fixed vectors and define $T \in \mathcal{L}(V)$ by $T(x) = \langle x, y\rangle z$. Show that $T^*$ exists and find an explicit expression.

10. Suppose $A \in M_{m\times n}(\mathbb{F})$. Prove that $A^*A$ is diagonal if and only if the columns of $A$ are orthogonal. What additionally would it mean if $A^*A = I$?

11. Suppose $T \in \mathcal{L}(V)$ where $V$ is a finite-dimensional inner product space.
(a) Prove that the eigenvalues of $T^*$ are the complex conjugates of those of $T$.
(Hint: relate the characteristic polynomial $p^*(t) = \det(T^* - tI)$ to that of $T$)
(b) Prove that $T^*$ is diagonalizable if and only if $T$ is.

12. (Hard) We present two linear maps which do not have an adjoint!
(a) Since $\epsilon = \{1, x, x^2, \ldots\}$ is a basis of $P(\mathbb{R})$, we may define a linear map $T \in \mathcal{L}(P(\mathbb{R}))$ via $T(x^n) = 1$ for all $n$; for instance
$$T(4 + 3x + 2x^5) = 9$$
Let $\langle\ ,\ \rangle$ be the inner product for which $\epsilon$ is orthonormal. If $T^*$ existed, show that
$$T^*(1) = \sum_{n=0}^\infty x^n$$
would be an infinite series: $T^*$ therefore does not exist.
(b) For a related challenge, recall the space $\ell^2$ of square-summable real sequences. For any sequence $(x_n)_{n=1}^\infty \in \ell^2$, define $T \in \mathcal{L}(\ell^2)$ via
$$T\bigl((x_n)\bigr) = \left(\sum_{n=1}^\infty\frac1nx_n,\ 0, 0, 0, 0, 0, \ldots\right)$$
Find the adjoint $T^*$. If $V \le \ell^2$ is the subspace whose elements have only finitely many non-zero terms, show that the restriction $T_V$ does not have an adjoint.
2.4 Normal & Self-Adjoint Operators and the Spectral Theorem
We now come to the fundamental question of this chapter: for which linear operators T L( V) can
we find an orthonormal eigenbasis? Many linear maps are, of course, not even diagonalizable, so in
general this is far too much to hope for! Let’s see what happens if such a basis exists. . .
If β is an orthonormal basis of eigenvectors of T, then
[T]
β
= diag(λ
1
, ··· , λ
n
) = [T
]
β
= diag(λ
1
, ··· , λ
n
)
If V is a real inner product space, then these matrices are identical and so T
= T. In the complex
case, we instead observe that
[TT
]
β
= diag(
|
λ
1
|
2
, ··· ,
|
λ
n
|
2
) = [T
T]
β
= TT
= T
T
Definition 2.35. Suppose T is a linear operator on an inner product space V and assume T has an
adjoint. We say that T is,
Normal if TT
= T
T,
Self-adjoint if T
= T.
The definitions for square matrices over R and C are identical, where
now denotes the conjugate-
transpose.
A real self-adjoint matrix A M
n
(R ) is plainly symmetric: A
T
= A. A complex self-adjoint matrix is
also called Hermitian: A
= A.
If T is self-adjoint then it is certainly normal, but the converse is false as the next example shows.
Example 2.36. The (non-symmetric) real matrix $A = \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}$ is normal but not self-adjoint:
$$AA^{\mathsf{T}} = \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix} = A^{\mathsf{T}}A$$
More generally, every non-zero skew-hermitian matrix $A^* = -A$ is normal but not self-adjoint:
$$A^* = -A \implies AA^* = -A^2 = A^*A$$
We saw above that linear maps with an orthonormal eigenbasis are either self-adjoint or normal
depending whether the inner product space is real or complex. Amazingly, this provides a complete
characterisation of such maps!
Theorem 2.37 (Spectral Theorem, version 1). Let T be a linear operator on a finite-dimensional
inner product space V.
Complex case: V has an orthonormal basis of eigenvectors of T if and only if T is normal.
Real case: V has an orthonormal basis of eigenvectors of T if and only if T is self-adjoint.
The theorem gets its name from the spectrum (set of eigenvalues) of T.
33
Examples 2.38. 1. We diagonalize the self-adjoint linear map $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $A = \begin{pmatrix} 6 & 3 \\ 3 & -2 \end{pmatrix}$.

Characteristic polynomial: $p(t) = (6-t)(-2-t) - 9 = t^2 - 4t - 21 = (t-7)(t+3)$
Eigenvalues: $\lambda_1 = 7$, $\lambda_2 = -3$
Eigenvectors (normalized): $w_1 = \frac{1}{\sqrt{10}}\begin{pmatrix} 3 \\ 1 \end{pmatrix}$, $w_2 = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 \\ -3 \end{pmatrix}$

The basis $\beta = \{w_1, w_2\}$ is orthonormal, with respect to which $[T]_\beta = \begin{pmatrix} 7 & 0 \\ 0 & -3 \end{pmatrix}$ is diagonal.
2. The map $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $A = \begin{pmatrix} 1 & 3 \\ 0 & 2 \end{pmatrix}$ is neither self-adjoint nor normal:
$$AA^* = \begin{pmatrix} 1 & 3 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 3 & 2 \end{pmatrix} = \begin{pmatrix} 10 & 6 \\ 6 & 4 \end{pmatrix} \neq \begin{pmatrix} 1 & 3 \\ 3 & 13 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 3 & 2 \end{pmatrix}\begin{pmatrix} 1 & 3 \\ 0 & 2 \end{pmatrix} = A^*A$$
It is diagonalizable, indeed
$$\gamma = \left\{\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ 1 \end{pmatrix}\right\} \implies [T]_\gamma = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$$
In accordance with the spectral theorem, γ is not orthogonal.
3. Let $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and consider $T = L_A$ acting on both $\mathbb{C}^2$ and $\mathbb{R}^2$. Since T is normal but not self-adjoint, we'll see how the field really matters in the spectral theorem.

First the complex case: $T \in \mathcal{L}(\mathbb{C}^2)$ is normal and thus diagonalizable with respect to an orthonormal basis of eigenvectors. Here are the details.

Characteristic polynomial: $p(t) = t^2 + 1 = (t-i)(t+i)$
Eigenvalues: $\lambda_1 = i$, $\lambda_2 = -i$
Eigenvectors (normalized): $w_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} i \\ 1 \end{pmatrix}$, $w_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} -i \\ 1 \end{pmatrix}$

Certainly $\langle w_1, w_2\rangle = 0$, so $\beta = \{w_1, w_2\}$ is orthonormal, and $[T]_\beta = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$ is diagonal.

Now for the real case: $T \in \mathcal{L}(\mathbb{R}^2)$ is not self-adjoint and thus should not be diagonalizable with respect to an orthonormal basis of eigenvectors. Indeed this is trivial; the characteristic polynomial has no roots in $\mathbb{R}$ and so there are no real eigenvalues! It is also clear geometrically: T is rotation by 90° counter-clockwise around the origin, so it has no eigenvectors.

4. Let $A = \begin{pmatrix} 3 & i \\ -i & 3 \end{pmatrix}$ and consider the self-adjoint operator $T = L_A \in \mathcal{L}(\mathbb{C}^2)$.

Characteristic polynomial: $p(t) = t^2 - 6t + 9 - 1 = (t-2)(t-4)$
Eigenvalues: $\lambda_1 = 2$, $\lambda_2 = 4$
Eigenvectors (normalized): $w_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}$, $w_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}$

With respect to the orthonormal basis $\beta = \{w_1, w_2\}$, we have $[T]_\beta = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}$.
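As a quick sanity check of Example 1, a numerical eigensolver applied to a real symmetric matrix returns exactly the orthonormal eigenbasis promised by the spectral theorem. The following sketch assumes NumPy is available; it is a verification aid, not part of the theory.

```python
import numpy as np

A = np.array([[6.0, 3.0],
              [3.0, -2.0]])            # the self-adjoint matrix of Example 2.38.1

evals, Q = np.linalg.eigh(A)           # eigh: for symmetric/Hermitian input, Q has orthonormal columns
print(evals)                           # [-3.  7.]

# Q's columns form an orthonormal eigenbasis, so Q^T A Q is diagonal
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(Q.T @ A @ Q, np.diag(evals))
```

For the non-normal matrix of Example 2 the analogous check fails: its eigenvector matrix is invertible but not orthogonal.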
34
Proving the Spectral Theorem for Self-Adjoint Operators
Lemma 2.39 (Basic properties of self-adjoint operators). Let T L(V) be self-adjoint.
1. If W V is T-invariant then the restriction T
W
is self-adjoint;
2. Every eigenvalue of T is real;
3. If dim V is finite then T has an eigenvalue.
It is irrelevant whether V is real or complex. The previous example demonstrates part 2; even when
V = C
2
is a complex inner product space, the eigenvalues of a self-adjoint matrix are real.
Proof. 1. Let $w_1, w_2 \in W$. Then
$$\langle T_W(w_1), w_2\rangle = \langle T(w_1), w_2\rangle = \langle w_1, T(w_2)\rangle = \langle w_1, T_W(w_2)\rangle$$
2. Suppose $(\lambda, x)$ is an eigenpair. Then
$$\lambda\,\|x\|^2 = \langle T(x), x\rangle = \langle x, T(x)\rangle = \overline{\lambda}\,\|x\|^2 \implies \lambda \in \mathbb{R}$$
3. This is trivial if V is complex since every characteristic polynomial splits over C. We therefore
assume V is real. Choose any orthonormal basis γ of V, let A = [T]
γ
M
n
(R ), and define
S := L
A
L(C
n
). Then;
The characteristic polynomial of S splits over C, whence there exists an eigenvalue λ C.
The characteristic polynomials of S and T are identical (to that of A).
S is self-adjoint and thus (part 2) λ R.
It follows that T has the same real eigenvalue λ.
We are now able to prove the spectral theorem for self-adjoint operators on a finite-dimensional inner
product space V. The argument applies regardless of whether V is real or complex.
Proof of the Spectral Theorem (self-adjoint case). We prove by induction on dim V.
(Base case) If dim V = 1, then V = Span{x} and T(x) = λx for some unit vector x and scalar λ R;
plainly {x} is an orthonormal eigenbasis for T.
(Induction step) Fix n N and assume that every self-adjoint operator on every inner product space
of dimension n satisfies the spectral theorem. Let dim V = n + 1 and T L(V) be self-adjoint.
By part 3 of the Lemma, T has an eigenpair $(\lambda, x)$ where we may assume x has unit length. Let $W = \operatorname{Span}\{x\}^\perp$. If $w \in W$, then
$$\langle x, T(w)\rangle = \langle T(x), w\rangle = \lambda\langle x, w\rangle = 0 \qquad (*)$$
whence W is T-invariant. Plainly $\dim W = n$. By part 1 of the Lemma, $T_W$ is self-adjoint. By the induction hypothesis, $T_W$ is diagonalized by some orthonormal basis γ of W. But then T is diagonalized by the orthonormal basis $\beta = \gamma \cup \{x\}$ of V.
35
Proving the Spectral Theorem for Normal Operators
What changes for normal operators on complex inner product spaces? Not much! Indeed the proof
is almost identical when T is merely normal.
We don’t need parts 2 and 3 of Lemma 2.39: every linear operator on a finite-dimensional
complex inner product space has an eigenvalue and we no longer care whether eigenvalues are
real.
Two parts of the induction step need to be fixed:
W being T-invariant: This isn’t quite as simple as (), but thankfully part 3 of the next
result provides the needed correction.
T
W
being normal: We need a replacement for part 1 of Lemma 2.39; this is a little more
involved.
Rather than write out all the details, we leave this to Exercises 6 and 7.
For completeness, and as an analogue/extension of Lemma 2.39, we summarize some of the basic
properties of normal operators. These also apply to self-adjoint operators as a special case.
Lemma 2.40 (Basic properties of normal operators). Let T be normal on V. Then:
1. x V,
||
T(x)
||
=
||
T
( x)
||
.
2. T tI is normal for any scalar t.
3. T(x) = λx T
( x) =
λx so that T and T
have the same eigenvectors and conjugate
eigenvalues. This recovers the previously established fact that λ R if T is self-adjoint.
4. Distinct eigenvalues of T have orthogonal eigenvectors.
Proof. 1. $\|T(x)\|^2 = \langle T(x), T(x)\rangle = \langle T^*T(x), x\rangle = \langle TT^*(x), x\rangle = \langle T^*(x), T^*(x)\rangle = \|T^*(x)\|^2$.

2. $\langle x, (T - tI)(y)\rangle = \langle x, T(y)\rangle - \overline{t}\langle x, y\rangle = \langle T^*(x), y\rangle - \langle \overline{t}x, y\rangle = \langle (T^* - \overline{t}I)(x), y\rangle$
shows that $T - tI$ has adjoint $T^* - \overline{t}I$. It is trivial to check that these commute.

3. $T(x) = \lambda x \iff \|(T - \lambda I)(x)\| = 0 \overset{\text{parts 1\&2}}{\iff} \|(T^* - \overline{\lambda}I)(x)\| = 0 \iff T^*(x) = \overline{\lambda}x$.

4. In part this follows from the spectral theorem, but we can also prove it more straightforwardly. Suppose $T(x) = \lambda x$ and $T(y) = \mu y$ where $\lambda \neq \mu$. By part 3,
$$\lambda\langle x, y\rangle = \langle \lambda x, y\rangle = \langle T(x), y\rangle = \langle x, T^*(y)\rangle = \langle x, \overline{\mu}y\rangle = \mu\langle x, y\rangle$$
This is a contradiction unless $\langle x, y\rangle = 0$.
36
Schur's Lemma
It is reasonable to ask how useful an orthonormal basis can be in general. Here is one answer.
Lemma 2.41 (Schur). Suppose T is a linear operator on a finite-dimensional inner product space V.
If the characteristic polynomial of T splits, then there exists an orthonormal basis β of V such that
[T]
β
is upper-triangular.
The spectral theorem is a special case; since the proof is similar, we leave it to the exercises.
The conclusion of Schur’s lemma is weaker than the spectral theorem, though it applies to more
operators: indeed if V is complex, it applies to any T! Every example of the spectral theorem is also
an example of Schur’s lemma. Example 2.38.2 provides another, since the matrix A is already upper
triangular with respect to the standard orthonormal basis. Here is another example.
Example 2.42. Consider $T(f) = 2f'(x) + x f(1)$ as a linear map $T \in \mathcal{L}(P_1(\mathbb{R}))$ with respect to the $L^2$ inner product $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$. We have
$$T(a + bx) = 2b + (a + b)x$$
If $[T]_\beta$ is to be upper-triangular, the first vector in β must be an eigenvector of T. It is easily checked that $f_1 = 1 + x$ is such with eigenvalue 2. To find a basis satisfying Schur's lemma, we need only find $f_2$ orthogonal to this and then normalize. This can be done by brute force since the problem is small, but for the sake of practice we apply Gram–Schmidt to the polynomial 1:
$$1 - \frac{\langle 1, 1+x\rangle}{\|1+x\|^2}(1+x) = 1 - \frac{1 + \frac{1}{2}}{1 + 1 + \frac{1}{3}}(1+x) = \frac{1}{14}(5 - 9x) \implies f_2 = 5 - 9x$$
Indeed we obtain an upper-triangular matrix for T:
$$T(f_2) = -18 - 4x = -13(1+x) - (5-9x) = -13f_1 - f_2 \implies [T]_{\{f_1, f_2\}} = \begin{pmatrix} 2 & -13 \\ 0 & -1 \end{pmatrix}$$
We can also work with the corresponding orthonormal basis as posited in the theorem, though the matrix is messier:
$$\beta = \{g_1, g_2\} = \left\{\sqrt{\tfrac{3}{7}}(1+x),\ \tfrac{1}{\sqrt{7}}(5-9x)\right\} \implies [T]_\beta = \begin{pmatrix} 2 & -\frac{13}{\sqrt{3}} \\ 0 & -1 \end{pmatrix}$$
Alternatively, we could have started with the other eigenvector $h_1 = 2 - x$: an orthogonal vector to this is $h_2 = 4 - 9x$, with respect to which
$$[T]_{\{h_1, h_2\}} = \begin{pmatrix} -1 & -13 \\ 0 & 2 \end{pmatrix}$$
In both cases the eigenvalues are down the diagonal, as must be the case for an upper-triangular matrix.
In general, it is difficult to quickly find a suitable basis satisfying Schur’s lemma. After trying the
proof in the exercises, you should be able to describe a method, though it is impractically slow!
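For matrices, the numerically practical route is a library Schur factorization: SciPy's `scipy.linalg.schur` returns a unitary Q and an upper-triangular T with $A = QTQ^*$. The sketch below is only a numerical illustration of the matrix form of Schur's lemma, under the assumption that SciPy is available; the example matrix is arbitrary.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 4.0],
              [1.0, 0.0, 1.0]])

# 'complex' output guarantees an upper-triangular factor even when the
# real characteristic polynomial does not split over R
T, Q = schur(A, output='complex')

assert np.allclose(Q @ Q.conj().T, np.eye(3))       # Q is unitary
assert np.allclose(np.tril(T, -1), 0)                # T is upper triangular
assert np.allclose(Q @ T @ Q.conj().T, A)            # A = Q T Q*
print(np.diag(T))                                     # the eigenvalues of A sit on the diagonal
```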
37
Exercises 2.4 1. For each linear operator T on an inner product space V, decide whether T is nor-
mal, self-adjoint, or neither. If the spectral theorem permits, find an orthonormal eigenbasis.
(a) V = R
2
and T
(
x
y
)
=
2ab
2a+5b
(b) V = R
3
and T
x
y
z
=
ab
5b
4a2b+5c
(c) V = C
2
and T
(
z
w
)
=
2z+iw
z+2w
(d) V = R
4
with T : (e
1
, e
2
, e
3
, e
4
) 7 ( e
3
, e
4
, e
1
, e
2
)
(e) V = P
2
(R ) with
f , g
=
R
1
0
f (t)g(t) dt and T( f ) = f
(Hint: Don’t compute T
! Instead assume T is normal and aim for a contradiction. . . )
2. Let T( f (x)) = f
(x) + 4x f (0) where T L(P
1
(R )) and
f , g
=
R
1
1
f (t)g(t) dt. Find an
orthonormal basis of P
1
(R ) with respect to which the matrix of T is upper-triangular.
3. Suppose S, T are self-adjoint operators on an inner product space V. Prove that ST is self-adjoint
if and only if ST = TS.
(Hint: recall Theorem 2.27)
4. Let T be normal on a finite-dimensional inner product space V. Prove that N(T
) = N(T) and
that R(T
) = R(T).
(Hint: Use Lemma 2.40 and the Fundamental Subspaces Theorem 2.30)
5. Let T be self-adjoint on a finite-dimensional inner product space V. Prove that
x V,
||
T(x) ±ix
||
2
=
||
T(x)
||
2
+
||
x
||
2
Hence prove that T iI is invertible and that [(T iI)
1
]
= (T + iI)
1
.
6. Let W be a T-invariant subspace of an inner product space V and let T
W
L(W) be the restric-
tion of T to W. Prove:
(a) W
is T
-invariant.
(b) If W is both T- and T
-invariant, then ( T
W
)
= (T
)
W
.
(c) If W is both T- and T
-invariant and T is normal, then T
W
is normal.
7. Use the previous question to complete the proof of the spectral theorem for a normal operator
on a finite-dimensional complex inner product space.
8. (a) Suppose S is a normal operator on a finite-dimensional complex inner product space, all of
whose eigenvalues are real. Prove that S is self-adjoint.
(b) Let T be a normal operator on a finite-dimensional real inner product space V whose char-
acteristic polynomial splits. Prove that T is self-adjoint and that there exists an orthonor-
mal basis of V of eigenvectors of T.
(Hint: Mimic the proof of Lemma 2.39 part 3 and use part (a))
9. Prove Schur’s lemma by induction, similarly to the proof of the spectral theorem.
(Hint: T
has an eigenvector x; why? Now show that W = Span{x}
is T-invariant. . . )
38
2.5 Unitary and Orthogonal Operators and their Matrices
In this section we focus on length-preserving transformations of an inner product space.
Definition 2.43. A linear
9
isometry of an inner product space V is a linear map T satisfying
x V,
||
T(x)
||
=
||
x
||
Every eigenvalue of an isometry must have modulus 1: if $T(w) = \lambda w$, then
$$\|w\|^2 = \|T(w)\|^2 = \|\lambda w\|^2 = |\lambda|^2\,\|w\|^2$$
Example 2.44. Let T = L
A
L(R
2
), where A =
1
5
4 3
3 4
. Then
T
x
y
2
=
1
5
4x 3y
3x + 4y
2
=
1
25
(4x 3y)
2
+ (3x + 4y)
2
= x
2
+ y
2
=
x
y
2
This matrix is very special in that its inverse equals its transpose:
A
1
=
1
16
25
+
9
25
4 3
3 4
=
1
5
4 3
3 4
= A
T
We call such matrices orthogonal. The simple version of what follows is that every linear isometry on
R
n
is multiplication by an orthogonal matrix.
Definition 2.45. A unitary operator T on an inner product space V is an invertible linear map satis-
fying T
T = I = TT
. A unitary matrix is a (real or complex) matrix satisfying A
A = I.
If V is real, we usually call these orthogonal operators/matrices; this isn’t necessary, since unitary en-
compasses both real and complex spaces. An orthogonal matrix satisfies A
T
A = I.
Example 2.46. The matrix A =
1
3
i 2+2i
22i i
is unitary:
A
A =
1
9
i 2 + 2i
2 2i i
i 2 + 2i
2 2i i
=
1
9
i
2
+ 4 + 4 ( i + i)( 2 + 2i)
(i i)(2 2i) 4 + 4 i
2
= I
If V is finite-dimensional, the operator/matrix notions correspond straightforwardly. By Theorem
2.27, if we choose any orthonormal basis β of V, then
T L(V) is unitary/orthogonal [T]
β
is unitary/orthogonal
Moreover, we need only assume T
T = I (or TT
= I) if V is finite-dimensional: if β is an orthonormal
basis, then
T
T = I [T
]
β
[T]
β
= I [T]
β
[T
]
β
= I TT
= I
In infinite dimensions, we need T
to be both the left- and right-inverse of T. This isn’t an empty
requirement (see Exercise 13).
9
There also exist non-linear isometries: for instance translations (T(x) = x + a for any constant a) and complex conjugation
(T(x) = x) on C
n
. Together with linear isometries, these essentially comprise all isometries in finite dimensions.
39
We now tackle the correspondence between unitary operators and isometries.
Theorem 2.47. Let T be a linear operator on an inner product space V.
1. If T is a unitary/orthogonal operator, then it is a linear isometry.
2. If T is a linear isometry and V is finite-dimensional, then T is unitary/orthogonal.
Proof. 1. If T is unitary, then
$$\forall x, y \in V, \qquad \langle x, y\rangle = \langle T^*T(x), y\rangle = \langle T(x), T(y)\rangle \qquad (\dagger)$$
In particular taking x = y shows that T is an isometry.

2. $(I - T^*T)^* = I - (T^*T)^* = I - T^*T$ is self-adjoint. By the spectral theorem, there exists an orthonormal basis of V of eigenvectors of $I - T^*T$. For any such x with (real) eigenvalue λ,
$$0 = \|x\|^2 - \|T(x)\|^2 = \langle x, x\rangle - \langle T(x), T(x)\rangle = \langle x, (I - T^*T)x\rangle = \lambda\|x\|^2 \implies \lambda = 0$$
Since $I - T^*T = 0$ on a basis, $T^*T = I$. Since V is finite-dimensional, we also have $TT^* = I$, whence T is unitary.
The finite-dimensional restriction is important in part 2: we use the existence of adjoints, the spectral
theorem, and that a left-inverse is also a right-inverse. See Exercise 13 for an example of a non-unitary
isometry in infinite dimensions.
The proof shows a little more:
Corollary 2.48. On a finite-dimensional space, being unitary is equivalent to each of the following:
(a) Preservation of the inner product
10
( ).
(b) The existence of an orthonormal basis β = {w
1
, . . . , w
n
} such that T(β) = {T(w
1
), . . . , T(w
n
) }
is also orthonormal.
(c) That every orthonormal basis β of V is mapped to an orthonormal basis T(β).
While (a) is simply (), claims (b) and (c) are also worth proving explicitly: see Exercise 9. If β is the
standard orthonormal basis of F
n
and T = L
A
, then the columns of A form the orthonormal set T(β).
This makes identifying unitary/orthogonal matrices easy:
Corollary 2.49. A matrix A M
n
(R ) is orthogonal if and only if its columns form an orthonormal
basis of R
n
with respect to the standard (dot) inner product.
A matrix A M
n
(C ) is unitary if and only if its columns form an orthonormal basis of C
n
with
respect to the standard (Hermitian) inner product.
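Corollary 2.49 gives an easy computational test: check that the columns are orthonormal, i.e. that $A^*A = I$. Below is a minimal sketch assuming NumPy; the helper name is mine, and the test matrices are $\frac{1}{\sqrt{2}}\begin{pmatrix}1 & i\\ i & 1\end{pmatrix}$ (which reappears as the unitary matrix of Example 2.50.3 below) and the non-normal matrix of Example 2.38.2.

```python
import numpy as np

def is_unitary(A, tol=1e-12):
    """Check that the columns of A are orthonormal, i.e. A* A = I (Corollary 2.49)."""
    A = np.asarray(A, dtype=complex)
    return np.allclose(A.conj().T @ A, np.eye(A.shape[1]), atol=tol)

A = np.array([[1, 1j],
              [1j, 1]]) / np.sqrt(2)
print(is_unitary(A))                      # True

B = np.array([[1, 3],
              [0, 2]], dtype=complex)
print(is_unitary(B))                      # False
```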
10
In particular, in a real inner product space isometries also preserve the angle θ between vectors since cos θ =
x,y
||
x
||||
y
||
.
40
Examples 2.50. 1. The matrix A
θ
=
cos θ sin θ
sin θ cos θ
M
2
(R ) is orthogonal for any θ. Example 2.44 is
this with θ = tan
1
3
4
= sin
1
3
5
= cos
1
4
5
. More generally (Exercise 6), it can be seen that every
real orthogonal 2 ×2 matrix has the form A
θ
or
B
θ
=
cos θ sin θ
sin θ cos θ
for some angle θ. The effect of the L
A
θ
is to rotate counter-clockwise by θ, while that of L
B
θ
is to
reflect across the line making angle
1
2
θ with the positive x-axis.
2. A =
1
6
2
3 1
2 0 2
2
3 1
!
M
3
(R ) is orthogonal: check the columns!.
3. A =
1
2
1 i
i 1
M
2
(C ) is unitary: indeed it maps the standard basis to the orthonormal basis
T(β ) =
1
2
1
i
,
1
2
i
1
It is also easy to check that the characteristic polynomial is
p(t) = det
1
2
t
i
2
i
2
1
2
t
!
=
t
1
2
2
+
1
2
= t =
1
2
(1 ±i) = e
±πi/4
whence the eigenvalues of T both have modulus 1.
4. Here is an example of an infinite-dimensional unitary operator. On the space C[π, π], the
function T( f (x)) = e
ix
f (x) is linear. Moreover
D
e
ix
f (x), g(x)
E
=
1
2π
Z
π
π
e
ix
f (x)g(x) dx =
1
2π
Z
π
π
f (x)e
ix
g(x) dx =
D
f (x), e
ix
g(x)
E
whence T
( f (x)) = e
ix
f (x). Indeed T
= T
1
and so T is a unitary operator.
Since C[π, π] is infinite-dimensional, we don’t expect all parts of the Corollary to hold:
(a) T does preserve the inner product.
(b), (c) C[π, π] doesn’t have an orthonormal basis; there is no orthonormal set β = {f
k
} so that
every continuous function is a finite linear combination.
11
We cannot therefore claim that T
maps orthonormal bases to orthonormal bases!
11
An infinite orthonormal set β = {f
k
: k Z} can be found so that every function f ‘equals’ an infinite series in the
sense that
||
f
a
k
f
k
||
= 0. Since these are not finite sums, β isn’t strictly a basis, though it isn’t uncommon for it to be
so described. Moreover, given that the norm is defined by an integral, this also isn’t a claim that f and
a
k
f
k
are equal
as functions. Indeed the infinite series need not be continuous! For these reasons, when working with Fourier series, one
tends to consider a broader class than the continuous functions.
41
Unitary and Orthogonal Equivalence
Suppose A M
n
(R ) is symmetric (self-adjoint) A
T
= A. By the spectral theorem, A has an orthonor-
mal eigenbasis β = {w
1
, . . . , w
n
}: Aw
j
= λ
j
w
j
. Arranging the eigenbasis as the columns of a matrix,
we see that the columns of U = (w
1
···w
n
) are orthonormal and so U is an orthogonal matrix. We
can therefore write
A = UDU
1
= U
λ
1
··· 0
.
.
.
.
.
.
.
.
.
0 ··· λ
n
U
T
A similar approach works if A M
n
(C ) is normal: we now have A = UDU
where U is unitary.
Example 2.51. The matrix A =
1+i 1+i
1i 1+i
is normal as can easily be checked. Its characteristic
polynomial is
p(t) = t
2
2(1 + i)t + 4i = (t 2i)(t 2)
with corresponding orthonormal eigenvectors
w
2
=
1
2
1
i
, w
2i
=
1
2
1
i
We conclude that
A =
1
2
1 1
i i
2 0
0 2i
1
2
1 1
i i
1
=
1 1
i i
1 0
0 i
1 i
1 i
This is an example of unitary equivalence.
Definition 2.52. Square matrices A, B are unitarily equivalent if there exists a unitary matrix U such
that B = U
AU. Orthogonal equivalence is similar: B = U
T
AU.
The above discussion proves half the following:
Theorem 2.53. A M
n
(C ) is normal if and only if it is unitarily equivalent to a diagonal matrix
(the matrix of its eigenvalues).
A M
n
(R ) is self-adjoint (symmetric) if and only if it is orthogonally equivalent to a diagonal matrix.
Proof. We’ve already observed the () direction.
For the converse, let D be diagonal, U unitary, and A = U
DU. Then
A
A = (U
DU)
U
DU = U
D
UU
DU = U
DDU = U
DDU = U
DUU
DU
= U
DU(U
DU)
= AA
since U
= U
1
and because diagonal matrices commute: DD = DD.
In the special case where A is real and U is orthogonal, then A is symmetric:
$$A^{\mathsf{T}} = (U^{\mathsf{T}}DU)^{\mathsf{T}} = U^{\mathsf{T}}D^{\mathsf{T}}U = U^{\mathsf{T}}DU = A$$
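Numerically, Theorem 2.53 can be seen through the complex Schur factorization: for a normal matrix the triangular factor comes out (numerically) diagonal, exhibiting the unitary equivalence $A = UDU^*$. A sketch assuming SciPy is available; the test matrix is the normal, non-symmetric matrix of Example 2.36.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[2.0, -1.0],
              [1.0,  2.0]])                      # normal but not symmetric (Example 2.36)
assert np.allclose(A @ A.T, A.T @ A)

D, U = schur(A, output='complex')                 # A = U D U* with U unitary
assert np.allclose(U @ U.conj().T, np.eye(2))
assert np.allclose(np.tril(D, -1), 0) and np.allclose(np.triu(D, 1), 0)   # D is (numerically) diagonal
assert np.allclose(U @ D @ U.conj().T, A)
print(np.diag(D))                                 # eigenvalues 2 ± i
```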
42
Exercises 2.5 1. For each matrix A find an orthogonal or unitary U and a diagonal D = U
AU.
(a)
1 2
2 1
(b)
0 1
1 0
(c)
2 33i
3+3i 5
(d)
2 1 1
1 2 1
1 1 2
2. Which of the following pairs are unitarily/orthogonally equivalent? Explain your answers.
(a) A =
0 1
1 0
and B =
0 2
2 0
(b) A =
0 1 0
1 0 0
0 0 1
and B =
2 0 0
0 1 0
0 0 0
(c) A =
0 1 0
1 0 0
0 0 1
and B =
1 0 0
0 i 0
0 0 i
3. Let a, b C be such that
|
a
|
2
+
|
b
|
2
= 1. Prove that every 2 ×2 matrix of the form
a e
iθ
b
b e
iθ
a
is
unitary. Are these all the unitary 2 ×2 matrices? Prove or disprove.
4. If A, B are orthogonal/unitary, prove that AB and A
1
are also orthogonal/unitary.
(This proves that orthogonal/unitary matrices are groups under matrix multiplication)
5. Check that A =
1
3
5 4i
4i 5
M
2
(C ) satisfies A
T
A = I (it is a complex orthogonal matrix).
(These don’t have the same nice relationship with inner products, and are thus less useful to us)
6. Supply the details of Exercise 2.50.1.
(Hints: β = {i, j} is orthonormal, whence {Ai, Aj} must be orthonormal. Now draw pictures to
compute the result of rotating and reflecting the vectors i and j.)
7. Show that the linear map in Example 2.50.4 has no eigenvectors.
8. Prove that A M
n
(C ) has an orthonormal basis of eigenvectors whose eigenvalues have mod-
ulus 1, if and only if A is unitary.
9. Prove parts (b) and (c) of Corollary 2.48 for a finite-dimensional inner product space:
(a) If β is an orthonormal basis such that T(β) is orthonormal, then T is unitary.
(b) If T is unitary, and η is an orthonormal basis, then T(η) is an orthonormal basis.
10. Let T be a linear operator on a finite-dimensional inner product space V. If
||
T(x)
||
=
||
x
||
for
all x in some orthonormal basis of V, must T be unitary? Prove or disprove.
11. Let T be a unitary operator on an inner product space V and let W be a finite-dimensional
T-invariant subspace of V. Prove:
(a) T(W) = W (Hint: show that T
W
is injective);
(b) W
is T-invariant.
12. Let W a subspace of an inner product space V such that V = W W
. Define T L(V) by
T(u + w) = u w where u W and w W
. Prove that T is unitary and self-adjoint.
13. In the inner product space
2
of square-summable sequences, consider the linear operator
T(x
1
, x
2
, . . .) = (0, x
1
, x
2
, . . .). Prove that T is an isometry and compute its adjoint. Check that T
is non-invertible and non-unitary.
14. Prove Schur’s Lemma for matrices. Every A M
n
(R ) is orthogonally equivalent and every
A M
n
(C ) is unitarily equivalent to an upper triangular matrix.
43
2.6 Orthogonal Projections
Recall the discussion of the Gram-Schmidt process, where we saw that any finite-dimensional sub-
space W of an inner product space V has an orthonormal basis β
W
= {w
1
, . . . , w
n
}. In such a situa-
tion, we can define the orthogonal projections onto W and W
via
π
W
: V W : x 7
n
j=1
x, w
j
w
j
, π
W
: V W
: x 7 x π
W
( x)
Our previous goal was to use orthonormal bases to ease computation. In this section we develop
projections more generally. First recall the notion of a direct sum within a vector space V:
V = X Y v V, unique x X, y Y such that v = x + y
Definition 2.54. A linear map T L(V) is a projection if:
V = R(T) N(T) and T
R(T)
= I
R(T)
Otherwise said, T(r + n) = r whenever r R( T) and n N(T).
Alternatively, given $V = X \oplus Y$, the projection along Y onto X is the map $v = x + y \mapsto x$.
We call $A \in M_n(F)$ a projection matrix if $L_A \in \mathcal{L}(F^n)$ is a projection.
[Figure: the projection T along Y onto X sends v = x + y to x = T(v).]
Example 2.55. $A = \frac{1}{5}\begin{pmatrix} 6 & -2 \\ 3 & -1 \end{pmatrix}$ is a projection matrix with $R(A) = \operatorname{Span}\begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and $N(A) = \operatorname{Span}\begin{pmatrix} 1 \\ 3 \end{pmatrix}$.
Indeed, it is straightforward to describe all projection matrices in $M_2(\mathbb{R})$. There are three cases:
1. A = I is the identity matrix: $R(A) = \mathbb{R}^2$ and $N(A) = \{0\}$;
2. A = 0 is the zero matrix: $R(A) = \{0\}$ and $N(A) = \mathbb{R}^2$;
3. Choose distinct subspaces $R(A) = \operatorname{Span}\begin{pmatrix} a \\ b \end{pmatrix}$ and $N(A) = \operatorname{Span}\begin{pmatrix} c \\ d \end{pmatrix}$, then
$$A = \frac{1}{ad - bc}\begin{pmatrix} a \\ b \end{pmatrix}\begin{pmatrix} d & -c \end{pmatrix} = \frac{1}{ad - bc}\begin{pmatrix} ad & -ac \\ bd & -bc \end{pmatrix}$$
Think about why this last matrix does what we claim.
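A quick numerical sanity check of case 3, and of Example 2.55 where $(a,b) = (2,1)$ and $(c,d) = (1,3)$: the formula really does produce an idempotent matrix with the stated range and null space. NumPy assumed; the helper function is only for this demonstration.

```python
import numpy as np

def projection_matrix(a, b, c, d):
    """Projection onto Span{(a,b)} along Span{(c,d)}, per case 3 above."""
    return np.outer([a, b], [d, -c]) / (a * d - b * c)

A = projection_matrix(2, 1, 1, 3)            # Example 2.55: A = (1/5)[[6,-2],[3,-1]]
assert np.allclose(A @ A, A)                  # idempotent, so a projection (see Lemma 2.56 below)
assert np.allclose(A @ [2, 1], [2, 1])        # identity on R(A)
assert np.allclose(A @ [1, 3], [0, 0])        # kills N(A)
```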
It should be clear that every projection T has (at most) two eigenspaces:
R(T) is an eigenspace with eigenvalue 1
N(T) is an eigenspace with eigenvalue 0
If V is finite-dimensional and ρ, η are bases of R(T), N(T) respectively, then the matrix of T with
respect to ρ η has block form
[T]
ρη
=
I 0
0 0
where rank I = rank T. In particular, every finite-dimensional projection is diagonalizable.
44
Lemma 2.56. T L(V) is a projection if and only if T
2
= T.
Proof. Throughout, assume r R(T) and n N(T).
( ) Since every vector in V has a unique representation v = r + n, simply compute
T
2
( v) = T
T(r + n)
= T(r) = r = T(v)
( ) Suppose T
2
= T. Note first that if r R(T), then r = T(v) for some v V, whence
T(r) = T
2
( v) = T(v) = r (†)
Thus T is the identity on R(T). Moreover, if x R(T) N(T), (†) says that x = T(x) = 0, whence
R(T) N(T) = {0}
and so R(T) N(T) is a well-defined subspace of V.
12
To finish things off, let v V and observe that
T
v T(v)
= T(v) T
2
( v) = 0 = v T(v) N(T)
so that v = T(v) +
v T(v)
is a decomposition into R(T)- and N(T)-parts. We conclude that
V = R(T) N(T) and that T is a projection.
Thus far the discussion hasn’t had anything to do with inner products. . .
Definition 2.57. An orthogonal projection is a projection T L(V) on an inner product space for
which we additionally have
N(T) = R(T)
and R(T) = N(T)
Alternatively, given V = W W
, the orthogonal projection π
W
is the projection along W
onto W:
that is
R(π
W
) = W and N(π
W
) = W
The complementary orthogonal projection π
W
= I π
W
has R(π
W
) = W
and N(π
W
) = W.
Example (2.55 continued). The identity and zero matrices are both 2×2 orthogonal projection matrices, while those of type 3 are orthogonal if $\begin{pmatrix} a \\ b\end{pmatrix}\cdot\begin{pmatrix} c \\ d\end{pmatrix} = 0$: we obtain
$$A = \frac{1}{a^2 + b^2}\begin{pmatrix} a \\ b \end{pmatrix}\begin{pmatrix} a & b \end{pmatrix} = \frac{1}{a^2 + b^2}\begin{pmatrix} a^2 & ab \\ ab & b^2 \end{pmatrix}$$
More generally, if $W \le F^n$ has orthonormal basis $\{w_1, \ldots, w_k\}$, then the matrix of $\pi_W$ is $\sum_{j=1}^k w_j w_j^*$.
12
In finite dimensions, the rank–nullity theorem and dimension counting finishes the proof here without having to proceed further:
$$\dim\bigl(R(T) \oplus N(T)\bigr) = \operatorname{rank} T + \operatorname{null} T = \dim V \implies R(T) \oplus N(T) = V$$
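In coordinates, the recipe $\pi_W = \sum_j w_j w_j^*$ is easy to implement: orthonormalize a spanning set (a QR factorization performs Gram–Schmidt for you) and sum the outer products, which is the same as $QQ^*$. The sketch below assumes NumPy; the spanning vectors are an arbitrary illustration.

```python
import numpy as np

def orthogonal_projection(vectors):
    """Matrix of the orthogonal projection onto W = Span(vectors) in F^n.
    QR orthonormalizes the (independent) spanning set; then pi_W = sum_j w_j w_j^* = Q Q^*."""
    V = np.column_stack(vectors).astype(complex)
    Q, _ = np.linalg.qr(V)                 # columns of Q: an orthonormal basis of W
    return Q @ Q.conj().T

P = orthogonal_projection([[1, 0, 2], [0, 1, 1]])    # the spanning vectors need not be orthogonal
assert np.allclose(P @ P, P)                          # a projection
assert np.allclose(P, P.conj().T)                     # self-adjoint, hence orthogonal (Theorem 2.58 below)
```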
45
Theorem 2.58. A projection T L(V) is orthogonal if and only if it is self-adjoint T = T
.
Proof. (⇒) By assumption, R(T) and N(T) are orthogonal subspaces. Letting $x, y \in V$ and using subscripts to denote R(T)- and N(T)-parts, we see that
$$\langle x, T(y)\rangle = \langle x_r + x_n, y_r\rangle = \langle x_r, y_r + y_n\rangle = \langle T(x), y\rangle \implies T = T^*$$
(⇐) Suppose T is a self-adjoint projection. By the fundamental subspaces theorem,
$$N(T) = N(T^*) = R(T)^\perp$$
Since T is a projection already, we have $V = R(T) \oplus N(T) = R(T) \oplus R(T)^\perp$, from which
$$R(T) = \bigl(R(T)^\perp\bigr)^\perp = N(T)^\perp$$
The language of projections allows us to rephrase the Spectral Theorem.
Theorem 2.59 (Spectral Theorem, mk. II). Let V be a finite-dimensional complex/real inner prod-
uct space and T L(V) be normal/self-adjoint with spectrum {λ
1
, . . . , λ
k
} and corresponding
eigenspaces E
1
, . . . , E
k
. Let π
j
L(V) be the orthogonal projection onto E
j
. Then:
1. V = E
1
···E
k
is a direct sum of orthogonal subspaces, in particular, E
j
is the direct sum of
the remaining eigenspaces.
2. π
i
π
j
= 0 if i = j.
3. (Resolution of the identity) I
V
= π
1
+ ··· + π
k
4. (Spectral decomposition) T = λ
1
π
1
+ ··· + λ
k
π
k
Proof. 1. T is diagonalizable and so V is the direct sum of the eigenspaces of T. Since T is normal,
the eigenvectors corresponding to distinct eigenvalues are orthogonal, whence the eigenspaces
are mutually orthogonal. In particular, this says that
ˆ
E
j
:=
M
i=j
E
i
E
j
Since V is finite-dimensional, we have V = E
j
E
j
, whence
dim
ˆ
E
j
=
i=j
dim E
i
= dim V dim E
j
= dim E
j
=
ˆ
E
j
= E
j
2. This is clear by part 1, since N(π
j
) = E
j
=
ˆ
E
j
.
3. Write x =
k
j=1
x
j
where each x
j
E
j
. Then π
j
( x) = x
j
: now add. . .
4. T(x) =
k
j=1
T(x
j
) =
k
j=1
λ
j
x
j
=
k
j=1
λ
j
π
j
( x).
46
Examples 2.60. We verify the resolution of the identity and the spectral decomposition; for clarity, we index projections and eigenspaces by eigenvalue rather than the natural numbers.

1. The symmetric matrix $A = \begin{pmatrix} 10 & -2 \\ -2 & 7 \end{pmatrix}$ has spectrum {6, 11} and orthonormal eigenvectors
$$w_6 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad w_{11} = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ -1 \end{pmatrix}$$
The corresponding projections therefore have matrices
$$\pi_6 = w_6 w_6^{\mathsf{T}} = \frac{1}{5}\begin{pmatrix} 1 \\ 2 \end{pmatrix}\begin{pmatrix} 1 & 2 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \qquad \pi_{11} = w_{11} w_{11}^{\mathsf{T}} = \frac{1}{5}\begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix}$$
from which the resolution of the identity and the spectral decomposition are readily verified:
$$\pi_6 + \pi_{11} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad\text{and}\qquad 6\pi_6 + 11\pi_{11} = \frac{1}{5}\begin{pmatrix} 6 + 44 & 12 - 22 \\ 12 - 22 & 24 + 11 \end{pmatrix} = A$$
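The same verification takes a few lines numerically (NumPy assumed; this is a cross-check only):

```python
import numpy as np

A = np.array([[10.0, -2.0],
              [-2.0,  7.0]])

w6  = np.array([1.0, 2.0]) / np.sqrt(5)
w11 = np.array([2.0, -1.0]) / np.sqrt(5)

pi6, pi11 = np.outer(w6, w6), np.outer(w11, w11)     # orthogonal projections onto the eigenspaces

assert np.allclose(pi6 + pi11, np.eye(2))            # resolution of the identity
assert np.allclose(6 * pi6 + 11 * pi11, A)           # spectral decomposition
assert np.allclose(pi6 @ pi11, 0)                    # pi_i pi_j = 0 for i != j
```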
2. The normal matrix B =
1+i 1+i
1i 1+i
M
2
(C ) has spectrum {2, 2i} and corresponding orthonor-
mal eigenvectors
w
2
=
1
2
i
1
, w
2i
=
1
2
1
i
The orthogonal projection matrices are therefore
π
2
= w
2
w
2
=
1
2
i
1
( i 1) =
1
2
1 i
i 1
π
2i
= w
2i
w
2i
=
1
2
1 i
i 1
from which
π
2
+ π
2i
=
1 0
0 1
and 2π
2
+ 2iπ
2i
=
1 i
i 1
+
i 1
1 i
= B
3. The matrix C =
0 1 1
1 0 1
1 1 0
has spectrum {1, 2}, an orthonormal eigenbasis
{u, v, w} =
1
3
1
1
1
,
1
2
1
1
0
,
1
6
1
1
2
and eigenspaces E
2
= Span{u} and E
1
= Span{v, w}. The orthogonal projections have ma-
trices
π
2
= uu
T
=
1
3
1
1
1
(1 1 1) =
1
3
1 1 1
1 1 1
1 1 1
π
1
= vv
T
+ ww
T
=
1
2
1
1
0
(1 1 0) +
1
6
1
1
2
(1 1 2)
=
1
2
1 1 0
1 1 0
0 0 0
+
1
6
1 1 2
1 1 2
2 2 4
=
1
3
2 1 1
1 2 1
1 1 2
It is now easy to check the resolution of the identity and the spectral decomposition:
π
2
+ π
1
= I and 2π
2
π
1
= C
47
Orthogonal Projections and Minimization Problems
We finish this section with an important observation that drives much of the application of inner
product spaces to other parts of mathematics and beyond. Throughout this discussion, X and Y
denote inner product spaces.
Theorem 2.61. Suppose Y = W W
. For any y Y, the orthogonal projection π
W
( y) is the
unique element of W which minimizes the distance to y:
w W,
||
y π
W
( y)
||
||
y w
||
Proof. Apply Pythagoras’ Theorem: since π
W
( y) = y π
W
( y) W
,
||
y w
||
2
=
||
y π
W
( y) + π
W
( y) w
||
2
=
||
y π
W
( y)
||
2
+
||
π
W
( y) w
||
2
||
y π
W
( y)
||
2
with equality if and only if w = π
W
( y).
This set up can be used to compute accurate approximations in many contexts.
Examples 2.62. 1. To obtain a quadratic polynomial approximation p(x) = a + bx + cx
2
to e
x
on
the interval [1, 1] we choose to minimize the integral
R
1
1
|
e
x
p(x)
|
2
dx, namely the squared
L
2
-norm
||
e
x
p(x)
||
2
on C[1, 1]. If we let W = Span{1, x, x
2
}, then the finite-dimensionality
of W means that C[1, 1] = W W
. By the Theorem, the solution is p(x) = π
W
( e
x
).
To compute this, recall that we have an orthonormal basis for W, namely
1
2
,
q
3
2
x,
q
5
8
(3x
2
1)
from which
p(x) =
1
2
1, e
x
+
3
2
x, e
x
x +
5
8
(3x
2
1)
3x
2
1, e
x
=
1
2
Z
1
1
e
x
dx +
3
2
x
Z
1
1
xe
x
dx
+
5
8
(3x
2
1)
Z
1
1
(3x
2
1)e
x
dx
=
1
2
( e e
1
) + 3e
1
x +
5
4
( e 7e
1
)(3x
2
1)
$\approx 1.18 + 1.10x + 0.179(3x^2 - 1) \approx 1 + 1.1x + 0.537x^2$
The linear and quadratic approximations to $y = e^x$ are drawn. Compare this with the Maclaurin polynomial $e^x \approx 1 + x + \frac{1}{2}x^2$ from calculus.
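For readers who want to reproduce the coefficients numerically: projecting onto $W = \operatorname{Span}\{1, x, x^2\}$ amounts to three integrals against the orthogonal polynomials $1, x, 3x^2 - 1$. A sketch assuming NumPy and SciPy; the quadrature is numerical, so the printed values match 1.18, 1.10, 0.179 only to a few decimals.

```python
import numpy as np
from scipy.integrate import quad

ip = lambda f, g: quad(lambda x: f(x) * g(x), -1, 1)[0]   # L^2 inner product on [-1, 1]
f = np.exp

# coefficients against 1, x, 3x^2 - 1 (dividing by the squared norms 2, 2/3, 8/5)
a0 = ip(f, lambda x: 0 * x + 1) / 2
a1 = ip(f, lambda x: x) * 3 / 2
a2 = ip(f, lambda x: 3 * x**2 - 1) * 5 / 8
print(a0, a1, a2)                            # ~1.175, ~1.104, ~0.179, as in the example

p = lambda x: a0 + a1 * x + a2 * (3 * x**2 - 1)
print(p(0.5), np.exp(0.5))                   # the projection tracks e^x closely on [-1, 1]
```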
48
2. The n
th
Fourier approximation of a function f (x) is its orthogonal projection onto the finite-
dimensional subspace
W
n
= Span{1, e
ix
, e
ix
, . . . , e
inx
, e
inx
}
= Span{1, cos x, sin x, cos 2x, sin 2x, . . . , cos nx, sin nx}
within L
2
[π, π], namely
F
n
(x) =
1
2π
n
k=n
D
f (x), e
ikx
E
e
ikx
=
1
2π
f (x), 1
+
1
π
n
k=1
f (x), cos kx
cos kx +
f (x), sin kx
sin kx
According to the Theorem, this is the unique function in F
n
(x) W
n
minimizing the integral
||
f (x) F
n
(x)
||
2
=
Z
π
π
|
f (x) F
n
(x)
|
2
dx
For example, if $f(x) = \begin{cases} 1 & \text{if } 0 < x \le \pi \\ -1 & \text{if } -\pi < x \le 0\end{cases}$ is extended periodically, then
$$F_{2n-1}(x) = \frac{4}{\pi}\sum_{j=1}^{n}\frac{\sin(2j-1)x}{2j-1} = \frac{4}{\pi}\left(\sin x + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \cdots + \frac{\sin(2n-1)x}{2n-1}\right)$$
y = f (x) and its eleventh Fourier approximation y = F
11
(x)
We’ll return to related minimization problems in the next section.
Exercises 2.6 1. Compute the matrices of the orthogonal projections onto W viewed as subspaces
of the standard inner product spaces R
n
or C
n
.
(a) W = Span
4
1
(b) W = Span
1
2
1
,
1
0
1
(c) W = Span
i
1
0
,
1
i
1
(d) W = Span
1
1
0
,
1
2
1
(watch out, these vectors aren’t orthogonal!)
49
2. For each of the following matrices, compute the projections onto each eigenspace, verify the
resolution of the identity and the spectral decomposition.
(a)
1 2
2 1
(b)
0 1
1 0
(c)
2 3 3i
3 + 3i 5
(d)
2 1 1
1 2 1
1 1 2
(You should have orthonormal eigenbases from the previous section)
3. If W be a finite-dimensional subspace of an inner product space V. If T = π
W
is the orthogonal
projection onto W, prove that I T is the orthogonal projection onto W
.
4. Let T L(V) where V is finite-dimensional.
(a) If T is an orthogonal projection, prove that
||
T(x)
||
||
x
||
for all x V.
(b) Give an example of a projection for which the inequality in (a) is false.
(c) If T is a projection for which
||
T(x)
||
=
||
x
||
for all x V, what is T?
(d) If T is a projection for which
||
T(x)
||
||
x
||
for all x V, prove that T is an orthogonal
projection.
5. Let T be a normal operator on a finite-dimensional inner product space. If T is a projection,
prove that it must be an orthogonal projection.
6. Let T be a normal operator on a finite-dimensional complex inner product space V. Use the
spectral decomposition T = λ
1
π
1
+ ··· + λ
k
π
k
to prove:
(a) If T
n
is the zero map for some n N, then T is the zero map.
(b) U L(V) commutes with T if and only if U commutes with each π
j
.
(c) There exists a normal U L(V) such that U
2
= T.
(d) T is invertible if and only if λ
j
= 0 for all j.
(e) T is a projection if and only if every λ
j
= 0 or 1.
(f) T = T
if and only if every λ
j
is imaginary.
7. Find a linear approximation to f (x) = e
x
on [0, 1] using the L
2
inner product.
8. Consider the L
2
inner product on C[π, π] inner product.
(a) Explain why
sin x, x
2n
= 0 for all n.
(b) Find linear and cubic approximations to f (x) = sin x.
(Feel free to use a computer algebra package to evaluate the integrals!)
9. Revisit Example 2.62.2
(a) Verify that the general complex (e
ikx
) and real (cos kx, sin kx) expressions for the Fourier
approximation are correct.
(Hint: use Eulers formula e
ikx
= cos kx + i sin kx)
(b) Verify the explicit expression for F
2n1
(x) when f (x) is the given step-function. What is
F
2n
(x) in this case?
50
2.7 The Singular Value Decomposition and the Pseudoinverse
Given T L(V, W) between finite-dimensional inner product spaces, the overarching concern of this
chapter is the existence and computation of bases β, γ of V, W with two properties:
That β, γ be orthonormal, thus facilitating easy calculation within V, W;
That the matrix [T]
γ
β
be as simple as possible.
We have already addressed two special cases:
Spectral Theorem: when V = W and T is normal/self-adjoint, there exists β = γ such that $[T]_\beta$ is diagonal.
Schur's Lemma: when V = W and p(t) splits, there exists β = γ such that $[T]_\beta$ is upper triangular.
In this section we allow V = W and β = γ, and obtain a result that applies to any linear map between
finite-dimensional inner product spaces.
Example 2.63. Let $A = \begin{pmatrix} 3 & 1 \\ -2 & 2 \\ 1 & 3 \end{pmatrix}$ and consider orthonormal bases $\beta = \{v_1, v_2\}$ of $\mathbb{R}^2$ and $\gamma = \{w_1, w_2, w_3\}$ of $\mathbb{R}^3$ respectively:
$$\beta = \left\{\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix},\ \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}, \qquad \gamma = \left\{\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},\ \frac{1}{\sqrt{6}}\begin{pmatrix} 1 \\ -2 \\ -1 \end{pmatrix},\ \frac{1}{\sqrt{3}}\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}\right\}$$
Since $Av_1 = 4w_1$ and $Av_2 = 2\sqrt{3}\,w_2$, the matrix $[L_A]^\gamma_\beta = \begin{pmatrix} 4 & 0 \\ 0 & 2\sqrt{3} \\ 0 & 0 \end{pmatrix}$ is almost diagonal.
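NumPy's SVD routine reproduces these numbers directly (up to sign choices in the singular vectors). A verification sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [-2.0, 2.0],
              [1.0, 3.0]])

P, s, Qt = np.linalg.svd(A)          # A = P @ Sigma @ Qt, with orthonormal columns in P and rows in Qt
print(s)                             # [4.0, 3.4641...], i.e. sigma_1 = 4, sigma_2 = 2*sqrt(3)

Sigma = np.zeros((3, 2))
Sigma[:2, :2] = np.diag(s)
assert np.allclose(P @ Sigma @ Qt, A)
assert np.allclose(Qt @ Qt.T, np.eye(2)) and np.allclose(P @ P.T, np.eye(3))
```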
Our main result says that such bases always exist.
Theorem 2.64 (Singular Value Decomposition). Suppose V, W are finite-dimensional inner prod-
uct spaces and that T L(V, W) has rank r. Then:
1. There exist orthonormal bases β = {v
1
, . . . , v
n
} of V and γ = {w
1
, . . . , w
m
} of W, and positive
scalars σ
1
σ
2
··· σ
r
such that
T(v
j
) =
(
σ
j
w
j
if j r
0 otherwise
equivalently [T]
γ
β
=
diag(σ
1
, . . . , σ
r
) O
O O
2. Any such β is an eigenbasis of T
T, whence the scalars σ
j
are uniquely determined by T: indeed
T
T(v
j
) =
(
σ
2
j
v
j
if j r
0 otherwise
and T
( w
j
) =
(
σ
j
v
j
if j r
0 otherwise
3. If A M
m×n
(F ) with rank A = r, then A = PΣQ
where
Σ = [L
A
]
γ
β
=
diag(σ
1
, . . . , σ
r
) O
O O
, P = (w
1
, . . . , w
m
), Q = (v
1
, . . . , v
n
)
Since the columns of P, Q are orthonormal, these matrices are unitary.
51
Definition 2.65. The numbers σ
1
, . . . , σ
r
are the singular values of T. If T is not maximum rank, we
have additional zero singular values σ
r+1
= ··· = σ
min(m,n)
= 0.
Freedom of Choice While the singular values are uniquely determined, there is often significant free-
dom regarding the bases β and γ, particularly if any eigenspace of T
T has dimension 2.
Special Case (Spectral Theorem) If V = W and T is normal/self-adjoint, we may choose β to be an
eigenbasis of T, then σ
j
is the modulus of the corresponding eigenvalue (see Exercise 7).
Rank-one decomposition If we write g
j
: V F for the linear map g
j
: v 7
v, v
j
(recall Riesz’s
Theorem), then the singular value decomposition says
T =
r
j=1
σ
j
w
j
g
j
that is T(v) =
r
j=1
σ
j
v, v
j
w
j
thus rewriting T as a linear combination of rank-one maps (w
j
g
j
: V W).
For matrices, g
j
is simply multiplication by the row vector v
j
and we may write
13
A =
r
j=1
σ
j
w
j
v
j
Example (2.63 cont). We apply the method in the Theorem to A =
3 1
2 2
1 3
.
The symmetric(!) matrix A
T
A =
14 2
2 14
has eigenvalues σ
2
1
= 16, σ
2
2
= 12 and orthonormal eigen-
vectors v
1
=
1
2
1
1
, v
2
=
1
2
1
1
. The singular values are therefore σ
1
= 4, σ
2
= 2
3. Now
compute
w
1
=
1
σ
1
Av
1
=
1
4
2
A
1
1
=
1
2
1
0
1
, w
2
=
1
σ
2
Av
2
=
1
2
6
A
1
1
=
1
6
1
2
1
and observe that these are orthonormal. Finally choose w
3
=
1
3
1
1
1
to complete the orthonormal
basis γ of R
3
. A singular value decomposition is therefore
A =
3 1
2 2
1 3
= PΣQ
=
1
2
1
6
1
3
0
2
6
1
3
1
2
1
6
1
3
4 0
0 2
3
0 0
1
2
1
2
1
2
1
2
!
By expanding the decomposition, A is expressed as the sum of rank-one matrices:
σ
1
w
1
v
T
1
+ σ
2
w
2
v
T
2
= 4
1
2
0
1
2
1
2
1
2
+ 2
3
1
6
2
6
1
6
1
2
1
2
=
2 2
0 0
2 2
+
1 1
2 2
1 1
13
Since β is orthonormal, it is common to write v
j
for the map g
j
= , v
j
in general contexts. To those familiar with
the dual space V
= L(V, F), the set {g
1
, . . . , g
n
} = {v
1
, . . . , v
n
} is the dual basis to β. In this course v
j
will only ever mean
the conjugate-transpose of a column vector in F
n
. This discussion is part of why physicists write inner products differently!
52
Proof. 1. Since T
T is self-adjoint, the spectral theorem says it has an orthonormal basis of eigen-
vectors β = {v
1
, . . . , v
n
}. If T
T(v
j
) = λ
j
v
j
, then
T(v
j
), T(v
k
)
=
T
T(v
j
), v
k
= λ
j
v
j
, v
k
=
(
λ
j
if j = k
0 if j = k
()
whence every eigenvalue is a non-negative real number: λ
j
=
T(v
j
)
2
0.
Since rank T
T = rank T = r (Exercise 8), exactly r eigenvalues are non-zero; by reordering
basis vectors if necessary, we may assume
λ
1
··· λ
r
> 0
If j r, define σ
j
:=
p
λ
j
> 0 and w
j
:=
1
σ
j
T(v
j
), then the set {w
1
, . . . , w
r
} is orthonormal ().
If necessary, extend this to an orthonormal basis γ of W.
2. If orthonormal bases β and γ exist such that [T]
γ
β
=
diag(σ
1
,...,σ
r
) O
O O
, then [T
]
β
γ
is essentially the
same matrix just that its dimensions have been reversed:
[T
]
β
γ
=
diag(σ
1
, . . . , σ
r
) O
O O
= T
T =
diag(σ
2
1
, . . . , σ
2
r
) O
O O
whence T
and T
T are as claimed.
3. This is merely part 1 in the context of T = L
A
L(F
n
, F
m
). The orthonormal bases β, γ consist
of column vectors and so the (change of co-ordinate) matrices P, Q are unitary.
Examples 2.66. 1. The matrix A =
2 3
0 2
has A
T
A =
4 6
6 13
with eigenvalues σ
2
1
= 16 and σ
2
2
= 1
and orthonormal eigenbasis
β =
1
5
1
2
,
1
5
2
1
The singular values are therefore σ
1
= 4 and σ
2
= 1, from which we obtain
γ =
1
σ
1
Av
1
,
1
σ
2
Av
2
=
1
5
2
1
,
1
5
1
2
and the decomposition
A = PΣQ
=
2
5
1
5
1
5
2
5
!
4 0
0 1
1
5
2
5
2
5
1
5
!
Multiplying out, we may write A as a sum of rank-one matrices
A =
4
5
2
1
1 2
+
1
5
1
2
2 1
=
4
5
2 4
1 2
+
1
5
2 1
4 2
53
2. The decomposition can be very messy to find in non-matrix situations. Here is a classic example
where we simply observe the structure directly.
The L
2
inner product
f , g
=
R
1
0
f (x)g(x) dx on P
2
(R ) and P
1
(R ) admits orthonormal bases
β =
n
5( 6x
2
6x + 1),
3( 2x 1), 1
o
, γ =
n
3( 2x 1), 1
o
Let T =
d
dx
be the derivative operator. The matrix of T is already in the required form!
[T]
γ
β
=
2
15 0 0
0 2
3 0
thus β, γ are suitable bases and the singular values of T are σ
1
= 2
15 and σ
2
= 2
3.
Since β, γ are orthonormal, we could have used the adjoint method to evaluate this directly,
[T
T]
β
= ([T]
γ
β
)
T
[T]
γ
β
=
60 0 0
0 12 0
0 0 0
= σ
2
1
= 60, σ
2
2
= 12
Up to sign, {[v
1
]
β
, [v
2
]
β
, [v
3
]
β
} is therefore forced to be the standard ordered basis of R
3
, con-
firming that β was the correct basis of P
2
(R ) all along!
The Pseudoinverse
The singular value decomposition of a map T L(V, W) gives rise to a natural map from W back to
V. This map behaves somewhat like an inverse even when the operator is non-invertible!
Definition 2.67. Given the singular value decomposition of a rank r map T L(V, W), the pseu-
doinverse of T is the linear map T
L(W, V) defined by
T
( w
j
) =
(
1
σ
j
v
j
if j r
0 otherwise
Restricted to Span{v
1
, . . . , v
r
} Span{w
1
, . . . , w
r
}, the pseudoinverse really does invert T:
T
T(v
j
) =
(
v
j
if j r
0 otherwise
TT
( w
j
) =
(
w
j
if j r
0 otherwise
V
T
= Span{v
1
, . . . , v
r
}
| {z }
N(T)
=R(T
)
Span{v
r+1
, . . . , v
n
}
| {z }
N(T)=R(T
)
{0
V
}
bijection
OO
W
T
OO
=
R(T)=N(T
)
z }| {
Span{w
1
, . . . , w
r
}
R(T)
=N(T
)
z }| {
Span{w
r+1
, . . . , w
m
}
CC
{0
W
}
Otherwise said, the combinations are orthogonal projections: T
T = π
N(T)
and TT
= π
R(T)
54
Given the singular value decomposition of a matrix A = PΣQ
, its pseudoinverse is that of matrix of
(L
A
)
, namely
A
=
r
j=1
1
σ
j
v
j
w
j
= QΣ
P
where Σ
=
diag(σ
1
1
, . . . , σ
1
r
) O
O O
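Numerically, this is exactly what `np.linalg.pinv` computes: invert only the non-zero singular values. A sketch assuming NumPy, using an arbitrary rank-deficient matrix to emphasise that no ordinary inverse exists:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])                 # rank 1, so not invertible in any ordinary sense

P, s, Qt = np.linalg.svd(A)
r = np.sum(s > 1e-12)                       # numerical rank
Sigma_dag = np.zeros((2, 3))
Sigma_dag[:r, :r] = np.diag(1 / s[:r])      # invert only the non-zero singular values

A_dag = Qt.T @ Sigma_dag @ P.T              # A^dagger = Q Sigma^dagger P^*
assert np.allclose(A_dag, np.linalg.pinv(A))
assert np.allclose(A @ A_dag @ A, A)        # A A^dagger projects onto R(A)
assert np.allclose(A_dag @ A @ A_dag, A_dag)
```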
Examples 2.68. 1. Again continuing Example 2.63, A =
3 1
2 2
1 3
has pseudoinverse
A
=
1
σ
1
v
1
w
1
+
1
σ
2
v
2
w
2
=
1
4
2
2
1
1
(1 0 1) +
1
2
3
2
6
1
1
( 1 2 1)
=
1
8
1 0 1
1 0 1
+
1
12
1 2 1
1 2 1
=
1
24
5 4 1
1 4 5
which is exactly what we would have found by computing A
= QΣ
P
. Observe that
A
A =
1 0
0 1
and AA
=
1
3
2 1 1
1 2 1
1 1 2
are the orthogonal projection matrices onto the spaces N(A)
= Span{v
1
, v
2
} = R
2
and
R(A) = Span{w
1
, w
2
} R
3
respectively. Both spaces have dimension 2, since rank A = 2.
2. The pseudoinverse of T =
d
dx
: P
2
(R ) P
1
(R ), as seen in Example 2.66.2, maps
T
(
3( 2x 1)) =
1
2
15
5( 6x
2
6x + 1) =
1
2
3
(6x
2
6x + 1)
T
(1) =
1
2
3
3( 2x 1) = x
1
2
= T
(a + bx) = T
a +
b
2
+
b
2
3
3( 2x 1)
=
a +
b
2
x
1
2
+
b
12
(6x
2
6x + 1)
=
b
2
x
2
+ ax
a
2
b
6
The pseudoinverse of ‘differentiation’ therefore returns a particular choice of anti-derivative,
namely the unique anti-derivative of a + bx lying in Span{
5( 6x
2
6x + 1),
3( 2x 1)}.
Exercises 2.7 1. Find the ingredients β, γ and the singular values for each of the following:
(a) T L(R
2
, R
3
) where T
(
x
y
)
=
x
x+y
xy
(b) T : P
2
(R ) P
1
(R ) and T( f ) = f
′′
where
f , g
:=
R
1
0
f (x)g(x) dx
(c) V = W = Span{1, sin x, cos x} and
f , g
=
R
2π
0
f (x)g(x) dx, with T( f ) = f
+ 2f
55
2. Find a singular value decomposition of each of the matrices:
(a)
1 1
1 1
1 1
(b)
1 0 1
1 0 1
(c)
1 1 1
1 1 0
1 0 1
3. Find an explicit formula for T
for each map in Exercise 1.
4. Find the pseudoinverse of each of the matrix in Exercise 2.
5. Suppose A = PΣQ
is a singular value decomposition.
(a) Describe a singular value decomposition of A
.
(b) Explain why A
= QΣ
P
isn’t a singular value decomposition of A
; what would be the
correct decomposition? (Hint: what is wrong with Σ
?)
6. Suppose T : V W is written according to the singular value theorem. Prove that γ is a basis
of eigenvectors of TT
with the same non-zero eigenvalues as T
T, including repetitions.
7. (a) Suppose T = L(V) is normal. Prove that each v
j
in the singular value theorem may be
chosen to be an eigenvector of T and that σ
j
is the modulus of the corresponding eigenvalue.
(b) Let A =
0 1
1 0
. Show that any orthonormal basis β of R
2
satisfies the singular value theo-
rem. What is γ here? What is it about the eigenvalues of A that make this possible?
(Even when T is self-adjoint, the vectors in β need not also be eigenvectors of T!)
8. In the proof of the singular value theorem we claimed that rank T
T = rank T. Verify this by
checking explicitly that N(T
T) = N(T).
(This is circular logic if you use the decomposition, so you must do without!)
9. Let V, W be finite-dimensional inner product spaces and T L(V, W). Prove:
(a) T
TT
= T
TT
= T
.
(Hint: evaluate on the basis γ = {w
1
, . . . , w
m
} in the singular value theorem)
(b) If T is injective, then T
T is invertible and T
= (T
T)
1
T
.
(c) If T is surjective, then TT
is invertible and T
= T
(TT
)
1
.
10. Consider the equation T(x) = b, where T is a linear map between finite-dimensional inner
product spaces. A least-squares solution is a vector x which minimizes
||
T(x) b
||
.
(a) Prove that x
0
= T
( b) is a least-squares solution and that any other has the form x
0
+ n
for some n N(T).
(Hint: Theorem 2.61 says that x
0
is a least-squares solution if and only if T(x
0
) = π
R(T)
( b))
(b) Prove that x
0
= T
( b) has smaller norm than any other least-squares solution.
(c) If T is injective, prove that x
0
= T
( b) is the unique least-squares solution.
11. Find the minimal norm solution to the first system, and the least-squares solution to the second:
(
3x + 2y + z = 9
x 2y + 3z = 3
3x + y = 1
2x 2y = 0
x + 3y = 0
56
Linear Regression (non-examinable)
Given a data set {(t
j
, y
j
) : 1 j m}, we may employ the least-squares method to find a best-fitting
line y = c
0
+ c
1
t; often used to predict y given a value of t.
The trick is to minimize the sum of the squares of the vertical deviations of the line from the data set.
m
j=1
y
j
c
0
c
1
t
j
2
=
||
y Ax
||
2
where A =
t
1
1
.
.
.
.
.
.
t
m
1
x =
c
1
c
0
y =
y
1
.
.
.
y
m
With the indicated notation, we recognize this as a least-squares problem. Indeed if there are at least
two distinct t-values in the data set, then rank A = 2 is maximal and we have a unique best-fitting
line with coefficients given by
c
1
c
0
= A
y = (A
T
A)
1
A
T
y
Example 2.69. Given the data set {(0, 1), (1, 1), (2, 0), (3, 2), (4, 2)}, we compute
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 4 & 1 \end{pmatrix}, \quad y = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 2 \\ 2 \end{pmatrix} \implies x_0 = \begin{pmatrix} c_1 \\ c_0 \end{pmatrix} = (A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}y = \begin{pmatrix} 30 & 10 \\ 10 & 5 \end{pmatrix}^{-1}\begin{pmatrix} 15 \\ 6 \end{pmatrix} = \frac{3}{10}\begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
The regression line therefore has equation $y = \frac{3}{10}(t + 2)$.
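The same regression in NumPy, either through the normal equations or through `np.linalg.lstsq` (which uses the SVD/pseudoinverse internally). A verification sketch only, assuming NumPy:

```python
import numpy as np

t = np.array([0.0, 1, 2, 3, 4])
y = np.array([1.0, 1, 0, 2, 2])

A = np.column_stack([t, np.ones_like(t)])           # columns: t, 1
c1, c0 = np.linalg.lstsq(A, y, rcond=None)[0]
print(c1, c0)                                        # 0.3 and 0.6, i.e. y = (3/10)(t + 2)

# equivalently, via the normal equations (A^T A) x = A^T y
x0 = np.linalg.solve(A.T @ A, A.T @ y)
assert np.allclose(x0, [c1, c0])
```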
The process can be applied more generally to approximate using other functions. To find the best-
fitting quadratic polynomial y = c
0
+ c
1
t + c
2
t
2
, we’d instead work with
A =
t
2
1
t
1
1
.
.
.
.
.
.
.
.
.
t
2
m
t
m
1
x =
c
2
c
1
c
0
y =
y
1
.
.
.
y
m
=
m
j=1
y
j
c
0
c
1
t c
2
t
2
j
2
=
||
y Ax
||
2
Provided we have at least three distinct values t
1
, t
2
, t
3
, the matrix A is guaranteed to have rank 3 and
there will be a best-fitting least-squares quadratic: in this case
$$y = \frac{1}{70}\bigl(15t^2 - 39t + 72\bigr)$$
This curve and the best-fitting straight line are shown below.
57
Optional Problems Use a computer to invert any 3 ×3 matrices!
1. Check the calculation for the best-fitting least-squares quadratic in Example 2.69.
2. Find the best-fitting least-squares linear and quadratic approximations to the data set
{
(1, 2), (3, 4), (5, 7), (7, 9), (9, 12)
}
3. Suppose a data set {(t
j
, y
j
) : 1 j m} has unique regression line y = ct + d.
(a) Show that the equations A
T
Ax
0
= A
T
y can be written in matrix form
t
2
j
t
j
t
j
m
c
d
=
t
j
y
j
y
j
(b) Recover the standard expressions from statistics:
c =
Cov(t, y)
σ
2
t
and d = y ct
where
t =
1
m
m
j=1
t
j
and y =
1
m
m
j=1
y
j
are the means (averages),
σ
2
t
=
1
m
m
j=1
( t
j
t)
2
is the variance,
Cov(t, y) =
1
m
m
j=1
( t
j
t)(y
j
y) is the covariance.
58
2.8 Bilinear and Quadratic Forms
In this section we slightly generalize the idea of an inner product. Throughout, V is a vector space
over a field F; it need not be an inner product space and F can be any field (not just R or C).
Definition 2.70. A bilinear form B : V ×V F is linear in each entry: v, x, y V, λ F,
B(λx + y, v) = λB(x, v) + B(y, v), B(v, λx + y) = λB(v, x) + B(v, y)
Additionally, B is symmetric if x, y V, B(x, y) = B(y, x).
Examples 2.71. 1. If V is a real inner product space, then the inner product
,
is a symmetric
bilinear form. Note that a complex inner product is not bilinear!
2. If A M
n
(F ), then B(x, y) := x
T
Ay is a bilinear form on F
n
. For instance, on R
2
,
B(x, y) = x
T
1 2
2 0
y = x
1
y
1
+ 2x
1
y
2
+ 2x
2
y
1
defines a symmetric bilinear form, though not an inner product since it isn’t positive definite;
for example B( j, j) = 0.
As seen above, we often make use of a matrix.
Definition 2.72. Let B be a bilinear form on a finite-dimensional space with basis ϵ = {v
1
, . . . , v
n
}.
The matrix of B with respect to ϵ is the matrix [B]
ϵ
= A M
n
(F ) with ij
th
entry
A
ij
= B(v
i
, v
j
)
Given x, y V, compute their co-ordinate vectors [x]
ϵ
, [y]
ϵ
with respect to ϵ, then
B(x, y) = [x]
T
ϵ
A[y]
ϵ
The set of bilinear forms on V is therefore in bijective correspondence with M
n
(F ). Moreover,
B(y, x) = [y]
T
ϵ
A[x]
ϵ
=
[y]
T
ϵ
A[x]
ϵ
T
= [x]
T
ϵ
A
T
[y]
ϵ
Finally, if β is another basis of V, then an appeal to the change of co-ordinate matrix Q
β
ϵ
yields
B(x, y) = [x]
T
ϵ
A[y]
ϵ
= (Q
ϵ
β
[x]
β
)
T
A(Q
ϵ
β
[y]
ϵ
) = [x]
T
β
(Q
ϵ
β
)
T
AQ
ϵ
β
[y]
β
= [B]
β
= (Q
ϵ
β
)
T
[B]
ϵ
Q
ϵ
β
To summarize:
Lemma 2.73. Let B be a bilinear form on a finite-dimensional vector space.
1. If A is the matrix of B with respect to some basis, then every other matrix of B has the form
Q
T
AQ for some invertible Q.
2. B is symmetric if and only if its matrix with respect to any (and all) bases is symmetric.
Naturally, the simplest situation is when the matrix of B is diagonal. . .
59
Examples 2.74. 1. Example 2.71.2 can be written
B(x, y) = x
T
1 2
2 0
y = x
1
y
1
+ 2x
1
y
2
+ 2x
2
y
1
= (x
1
+ 2x
2
)( y
1
+ 2y
2
) 4x
2
y
2
=
x
1
+ 2x
2
x
2
1 0
0 4
y
1
+ 2y
2
y
2
= x
T
1 2
0 1
T
1 0
0 4
1 2
0 1
y
If ϵ is the standard basis, then [B]
β
=
1 0
0 4
where Q
β
ϵ
=
1 2
0 1
. It follows that Q
ϵ
β
=
1 2
0 1
from which β =
1
0
,
2
1
is a diagonalizing basis.
2. In general, we may perform a sequence of simultaneous row and column operations to diago-
nalize any symmetric B; we require only elementary matrices E
(λ)
ij
of type III.
14
For instance:
B(x, y) = x
T
Ay = x
T
1 2 3
2 0 4
3 4 1
y (A = [B]
ϵ
with respect to the standard basis)
E
(2)
21
AE
(2)
12
=
1 0 3
0 4 10
3 10 1
(add twice row 1 to row 2, columns similarly)
E
(3)
31
E
(2)
21
AE
(2)
12
E
(3)
13
=
1 0 0
0 4 10
0 10 10
(subtract 3 times row 1 from row 3, etc.)
E
(1)
23
E
(3)
31
E
(2)
21
AE
(2)
12
E
(3)
13
E
(1)
32
=
1 0 0
0 6 0
0 0 10
(add row 3 to row 2, etc.)
If β is the diagonalizing basis, then the change of co-ordinate matrix is
Q
ϵ
β
= E
(2)
12
E
(3)
13
E
(1)
32
=
1 2 0
0 1 0
0 0 1
1 0 3
0 1 0
0 0 1
1 0 0
0 1 0
0 1 1
=
1 1 3
0 1 0
0 1 1
from which β =
n
1
0
0
,
1
1
1
,
3
0
1
o
. If you’re having trouble believing this, invert the
change of co-ordinate matrix and check that
B(x, y) = (x
1
2x
2
+ 3x
3
)( y
1
2y
2
+ 3y
3
) + 6x
2
y
2
10(x
2
+ x
3
)( y
2
+ y
3
)
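The simultaneous row/column reduction used in the last example is easy to automate. The sketch below (NumPy assumed; generic symmetric input; no pivoting for the zero-diagonal or characteristic-two corner cases discussed later) returns Q with $Q^{\mathsf{T}}AQ$ diagonal, so the columns of Q form a diagonalizing basis.

```python
import numpy as np

def congruence_diagonalize(A):
    """Return (D, Q) with D = Q^T A Q diagonal, for symmetric A whose pivots stay non-zero.
    Each step adds a multiple of row k to a lower row and does the same column operation,
    exactly as with the type III elementary matrices E_ij^(lambda) above."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    Q = np.eye(n)
    for k in range(n):
        if np.isclose(A[k, k], 0):
            raise ValueError("zero pivot: first swap in a vector x with B(x, x) != 0")
        for i in range(k + 1, n):
            lam = -A[i, k] / A[k, k]
            A[i, :] += lam * A[k, :]     # row operation
            A[:, i] += lam * A[:, k]     # matching column operation
            Q[:, i] += lam * Q[:, k]     # accumulate the change-of-coordinates matrix
    return np.diag(A).copy(), Q

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 3.0],
              [0.0, 3.0, 2.0]])          # an arbitrary symmetric example
D, Q = congruence_diagonalize(A)
assert np.allclose(Q.T @ A @ Q, np.diag(D))
```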
Warning! If F = R then every symmetric B may be diagonalized by an orthonormal basis (for the
usual dot product on R
n
). It is very unlikely that our algorithm will produce such! The algorithm has
two main advantages over the spectral theorem: it is typically faster and it applies to vector spaces
over any field. As a disadvantage, it is highly non-unique.
14
Recall that E
(λ)
ij
is the identity matrix with an additional λ in the ij
th
entry.
As a column operation (right-multiplication), A 7→ AE
(λ)
ij
adds λ times the i
th
column to the j
th
.
As a row operation (left-multiplication), A 7 E
(λ)
ji
A = (E
(λ)
ij
)
T
A adds λ times the i
th
row to the j
th
.
60
Example 2.75. We diagonalize B( x, y) = x
T
1 6
6 3
y = x
1
y
1
+ 6x
1
y
2
+ 6x
2
y
1
+ 3x
2
y
2
in three ways.
1 0
6 1
1 6
6 3
1 6
0 1
=
1 0
0 33
= [B]
β
where β =
1
0
,
6
1
. This corresponds to
B(x, y) = (x
1
+ 6x
2
)( y
1
+ 6y
2
) 33x
2
y
2
1 2
0 1
1 6
6 3
1 0
2 1
=
11 0
0 3
= [B]
γ
where γ =
1
2
,
0
1
. This corresponds to
B(x, y) = 11x
1
y
1
+ 3(2x
1
+ x
2
)(2y
1
+ y
2
)
If F = R, we may apply the spectral theorem to see that [B]
η
=
2+
37 0
0 2
37
is diagonal
with respect to η =
n
6
1+
37
,
1
37
6
o
. The expression for B(x, y) in these co-ordinates is
disgusting, so we omit it; that B can be diagonalized orthogonally doesn’t mean it should be!
Theorem 2.76. Suppose B is a bilinear form of a finite-dimensional space V over F.
1. If B is diagonalizable, then it is symmetric.
2. If B is symmetric and F does not have characteristic two (see aside), then B is diagonalizable.
Proof. 1. If B is diagonalizable, β such that [B]
β
is diagonal and thus symmetric.
2. Suppose B is non-zero (otherwise the result is trivial). We prove by induction on n = dim V.
If n = 1, the result is trivial: B(x, y) = axy for some a F is clearly symmetric.
Fix n N and assume that every non-zero symmetric bilinear form on a dimension n vector
space over F is diagonalizable. Let dim V = n + 1. By the discussion below, x V such that
B(x, x) = 0. Consider the linear map
T : V F : v 7 B(x, v)
Clearly rank T = 1, whence dim N(T) = n. Moreover, B is symmetric when restricted to N(T);
by the induction hypothesis there exists a basis β of N(T) such that [B
N(T)
]
β
is diagonal. But
then B is diagonal with respect to the basis β {x}.
Aside: Characteristic two fields This means 1 + 1 = 0 in F; this holds, for instance, in the field
Z
2
= {0, 1} of remainders modulo 2. We now see the importance of char F = 2 to the above result.
The proof uses the existence of x V such that B(x, x) = 0. If B is non-zero, u, v such that
B(u, v) = 0. If both B(u, u) = 0 = B(v, v), then x = u + v does the job whenever char F = 2:
B(x, x) = B(u, v) + B(v, u) = 2B(u, v) = 0
To see that the requirement isn’t idle, consider B(x, y) = x
T
0 1
1 0
y on the finite vector space
Z
2
2
= {
0
0
,
1
0
,
0
1
,
1
1
} over Z
2
. Every element of this space satisfies B(x, x) = 0! Perhaps
surprisingly, the matrix of B is identical with respect to any basis of Z
2
2
, whence B is symmetric
but non-diagonalizable.
61
In Example 2.75 notice how the three diagonal matrix representations have something in common: each has one positive and one negative diagonal entry. This is a general phenomenon:
Theorem 2.77 (Sylvesters Law of Inertia). Suppose B is a symmetric bilinear form on a real vector
space V with diagonal matrix representation diag(λ
1
, . . . , λ
n
). Then the number of entries λ
j
which
are positive/negative/zero is independent of the diagonal representation.
Definition 2.78. The signature of a symmetric bilinear form B is the triple (n
+
, n
, n
0
) representing
how many positive, negative and zero terms are in any diagonal representation. Sylvester’s Law says
that the signature is an invariant of a symmetric bilinear form.
Positive-definiteness says that a real inner product on an n-dimensional space has signature (n, 0, 0).
Practitioners of relativity often work in Minkowski spacetime: R
4
equipped with a signature (1, 3, 0)
bilinear form, typically
B(x, y) = x
T
c
2
0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
y = c
2
x
1
y
1
x
2
y
2
x
3
y
3
x
4
y
4
where c is the speed of light. Vectors are time-, space-, or light-like depending on whether B(x, x) is
positive, negative or zero. For instance x = 3c
1
e
1
+ 2e
2
+ 2e
3
+ e
4
is light-like.
Proof. For simplicity, let V = R^n and write B(x, y) = x^T A y where A is symmetric.
1. Define rank B := rank A and observe this is independent of basis (exercises).
2. Let β, γ be diagonalizing bases, ordered according to whether B is positive, negative or zero:
β = {v_1, . . . , v_p, v_{p+1}, . . . , v_r, v_{r+1}, . . . , v_n}
γ = {w_1, . . . , w_q, w_{q+1}, . . . , w_r, w_{r+1}, . . . , w_n}
Here r = rank B in accordance with part 1: our goal is to prove that p = q.
3. Assume p < q, define the matrix C and check what follows:
C = \begin{pmatrix} v_1^T A \\ \vdots \\ v_p^T A \\ w_{q+1}^T A \\ \vdots \\ w_r^T A \end{pmatrix} ∈ M_{(r-q+p)×n}(R)
(a) rank C ≤ r - q + p < r ⟹ null C > n - r, thus
∃x ∈ R^n such that Cx = 0 and x ∉ Span{v_{r+1}, . . . , v_n}
(b) The first p entries of Cx = 0 mean that x ∈ Span{v_{p+1}, . . . , v_r, v_{r+1}, . . . , v_n} and so B(x, x) < 0. Note how we use part (a) to get a strict inequality here.
(c) Now write x with respect to γ: this time we see that x ∈ Span{w_1, . . . , w_q, w_{r+1}, . . . , w_n}, whence B(x, x) ≥ 0: a contradiction.
Quadratic Forms & Diagonalizing Conics
Definition 2.79. To every symmetric bilinear form B : V × V → F is associated a quadratic form
K : V → F : x ↦ B(x, x)
A function K : V → F is termed a quadratic form when such a symmetric bilinear form exists.
Examples 2.80. 1. If B is a real inner product, then K(v) = ⟨v, v⟩ = ||v||^2 is the square of the norm.
2. Let dim V = n and A be the matrix of B with respect to a basis β. By the symmetry of A,
K(x) = x^T A x = Σ_{i,j=1}^n x_i A_ij x_j = Σ_{1≤i≤j≤n} ã_ij x_i x_j,  where ã_ij = A_ij if i = j and ã_ij = 2A_ij if i ≠ j
E.g., K(x) = 3x_1^2 + 4x_2^2 - 2x_1x_2 corresponds to the bilinear form B(x, y) = x^T \begin{pmatrix} 3 & -1 \\ -1 & 4 \end{pmatrix} y
As a fun application, we consider the diagonalization of conics in R^2. The general non-zero conic has equation
ax^2 + 2bxy + cy^2 + dx + ey + f = 0
where the first three terms comprise a quadratic form
K(x) = ax^2 + 2bxy + cy^2,   B(v, w) = v^T \begin{pmatrix} a & b \\ b & c \end{pmatrix} w
If {v_1, v_2} is a diagonalizing basis, then there exist scalars λ_1, λ_2 for which
K(t_1 v_1 + t_2 v_2) = λ_1 t_1^2 + λ_2 t_2^2
whence the general conic becomes
λ_1 t_1^2 + λ_2 t_2^2 + µ_1 t_1 + µ_2 t_2 = η,   λ_1, λ_2, µ_1, µ_2, η ∈ R
If λ_j ≠ 0, we may complete the square via the linear transformation s_j = t_j + µ_j/(2λ_j). The canonical forms are then recovered:
Parabola: λ_1 or λ_2 = 0 (but not both).
Ellipse: λ_1 s_1^2 + λ_2 s_2^2 = k ≠ 0 where λ_1, λ_2, k have the same sign.
Hyperbola: λ_1 s_1^2 + λ_2 s_2^2 = k ≠ 0 where λ_1, λ_2 have opposite signs.
Since B is symmetric, we may take {v_1, v_2} to be an orthonormal basis of R^2, whence any conic may be put in canonical form by applying only a rotation/reflection and translation (completing the square). Alternatively, we could diagonalize K using our earlier algorithm; this additionally permits shear transforms. By Sylvester's Law, the diagonal entries will have the same number of (+, -, 0) terms regardless of the method, so the canonical form will be unchanged.
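Before the worked examples, here is a rough computational version of this classification. It is a Python/numpy sketch written for these notes (the function name classify_conic is ours), and it only inspects the quadratic part, so it presumes the conic is non-empty and non-degenerate.

import numpy as np

def classify_conic(a, b, c):
    """Classify ax^2 + 2bxy + cy^2 + dx + ey + f = 0 from its quadratic part.

    The eigenvalues of the symmetric matrix [[a, b], [b, c]] determine the type,
    exactly as in the canonical forms listed above.
    """
    lam1, lam2 = np.linalg.eigvalsh(np.array([[a, b], [b, c]], dtype=float))
    if abs(lam1 * lam2) < 1e-12:
        return "parabola"              # one eigenvalue is zero
    return "ellipse" if lam1 * lam2 > 0 else "hyperbola"

print(classify_conic(7, 12, 0))        # the conic 7x^2 + 24xy = 144 below: hyperbola
print(classify_conic(1, 6, 3))         # x^2 + 12xy + 3y^2 = -33 below: hyperbola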
Examples 2.81. 1. We describe and plot the conic with equation 7x^2 + 24xy = 144.
The matrix of the associated bilinear form is \begin{pmatrix} 7 & 12 \\ 12 & 0 \end{pmatrix}, which has orthonormal eigenbasis
β = {v_1, v_2} = { (1/5)(4, 3), (1/5)(-3, 4) }
with eigenvalues (λ_1, λ_2) = (16, -9). In the rotated basis, this is the canonical hyperbola
16t_1^2 - 9t_2^2 = 144 ⟺ t_1^2/3^2 - t_2^2/4^2 = 1
which is easily plotted. In case this is too fast, use the change of co-ordinate matrix to compute directly:
Q^ϵ_β = (1/5)\begin{pmatrix} 4 & -3 \\ 3 & 4 \end{pmatrix} ⟹ \begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = [x]_β = Q^β_ϵ \begin{pmatrix} x \\ y \end{pmatrix} = (1/5)\begin{pmatrix} 4x + 3y \\ -3x + 4y \end{pmatrix}
which quickly recovers the original conic by substitution.
[Plot omitted: the hyperbola in the (x, y)-plane together with the rotated axes t_1, t_2 along v_1, v_2.]
2. The conic defined by K(x) = x^2 + 12xy + 3y^2 = -33 defines a hyperbola in accordance with Example 2.75. With respect to the basis γ = { (1, -2), (0, 1) }, we see that
K(x) = -11x^2 + 3(2x + y)^2 = -11t_1^2 + 3t_2^2 = -33 ⟹ t_1^2/3 - t_2^2/11 = 1
Even though γ is non-orthogonal, we can still plot the conic! If we instead used the orthonormal basis η, we'd obtain
K(x) = (√37 + 2)s_1^2 - (√37 - 2)s_2^2 = -33
however the calculation to find η is time-consuming and the expressions for s_1, s_2 are extremely ugly.
[Plot omitted: the hyperbola in the (x, y)-plane together with the skewed axes t_1, t_2 along the basis γ.]
A similar approach can be applied to higher degree quadratic equations/manifolds: e.g. ellipsoids, paraboloids and hyperboloids in R^3.
Exercises 2.8 1. Prove that the sum of any two bilinear forms is bilinear, and that any scalar multi-
ple of a bilinear form is bilinear: thus the set of bilinear forms on V is a vector space.
(You can’t use matrices here, since V could be infinite-dimensional!)
2. Compute the matrix of the bilinear form
B(x, y) = x_1y_1 - 2x_1y_2 + x_2y_1 - x_3y_3
on R^3 with respect to the basis β = { (1, 0, 1), (1, 0, -1), (0, 1, 0) }.
3. Check that the function B(f, g) = f′(0)g′′(0) is a bilinear form on the vector space of twice-differentiable functions. Find the matrix of B with respect to β = {cos t, sin t, cos 2t, sin 2t} when restricted to the subspace Span β.
4. For each matrix A, find a diagonal matrix D and an invertible matrix Q such that Q^T AQ = D.
(a) \begin{pmatrix} 1 & 3 \\ 3 & 2 \end{pmatrix}   (b) \begin{pmatrix} 3 & 1 & 2 \\ 1 & 4 & 0 \\ 2 & 0 & 1 \end{pmatrix}
5. If K is a quadratic form and K(x) = 2, what is the value of K(3x)?
6. If F does not have characteristic 2, and K(x) = B(x, x) is a quadratic form, prove that we can recover the bilinear form B via
B(x, y) = (1/2)( K(x + y) - K(x) - K(y) )
7. If B(x, y) = x^T \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} y is a bilinear form on F^2, compute the quadratic form K(x).
8. Suppose B is a symmetric bilinear form on a real finite-dimensional space. With reference to the proof of Sylvester's Law, explain why rank B is independent of the choice of diagonalizing basis.
9. If char F ≠ 2, apply the diagonalizing algorithm to the symmetric bilinear form B = x^T \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} y on F^2. What goes wrong if char F = 2?
10. Describe and plot the following conics:
(a) x^2 + y^2 + xy = 6
(b) 35x^2 + 120xy = 4x + 3y
11. Suppose that a non-empty, non-degenerate^15 conic C in R^2 has the form ax^2 + 2bxy + cy^2 + dx + ey + f = 0, where at least one of a, b, c ≠ 0, and define Δ = b^2 - ac. Prove that:
C is a parabola if and only if Δ = 0;
C is an ellipse if and only if Δ < 0;
C is a hyperbola if and only if Δ > 0.
(Hint: λ_1, λ_2 are the eigenvalues of a symmetric matrix, so. . . )
^15 The conic contains at least two points and cannot be factorized as a product of two straight lines: for example, the following are disallowed;
x^2 + y^2 + 1 = 0 is empty (unless one allows conics over C . . .)
x^2 + y^2 = 0 contains only one point;
x^2 - xy - x + y = (x - 1)(x - y) = 0 is the product of two lines.
3 Canonical Forms
3.1 Jordan Forms & Generalized Eigenvectors
Throughout this course we've concerned ourselves with variations of a general question: for a given map T ∈ L(V), find a basis β such that the matrix [T]_β is as close to diagonal as possible. In this chapter we see what is possible when T is non-diagonalizable.
Example 3.1. The matrix A = \begin{pmatrix} -8 & 4 \\ -25 & 12 \end{pmatrix} ∈ M_2(R) has characteristic equation
p(t) = (-8 - t)(12 - t) + 4 · 25 = t^2 - 4t + 4 = (t - 2)^2
and thus a single eigenvalue λ = 2. It is non-diagonalizable since the eigenspace is one-dimensional
E_2 = N\begin{pmatrix} -10 & 4 \\ -25 & 10 \end{pmatrix} = Span{ (2, 5) }
However, if we consider a basis β = {v_1, v_2} where v_1 = (2, 5) is an eigenvector, then [L_A]_β is upper-triangular, which is better than nothing! How simple can we make this matrix? Let v_2 = (x, y), then
Av_2 = \begin{pmatrix} -8x + 4y \\ -25x + 12y \end{pmatrix} = 2\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -10x + 4y \\ -25x + 10y \end{pmatrix} = 2v_2 + (-5x + 2y)v_1 ⟹ [L_A]_β = \begin{pmatrix} 2 & -5x + 2y \\ 0 & 2 \end{pmatrix}
Since v_2 cannot be parallel to v_1, the only thing we cannot have is a diagonal matrix. The next best thing is for the upper right corner to be 1; for instance we could choose
β = {v_1, v_2} = { (2, 5), (1, 3) } ⟹ [L_A]_β = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}
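Computations like this are easy to check with a computer algebra system. The following sympy sketch (an addition to these notes, not part of the original text) verifies the basis chosen above and also asks sympy for a Jordan form directly.

import sympy as sp

A = sp.Matrix([[-8, 4], [-25, 12]])      # the matrix of Example 3.1
Q = sp.Matrix([[2, 1], [5, 3]])          # columns: the basis beta = {(2,5), (1,3)}

print(Q.inv() * A * Q)                   # Matrix([[2, 1], [0, 2]]) = [L_A]_beta
print(A.jordan_form()[1])                # sympy's own Jordan form: Matrix([[2, 1], [0, 2]])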
Definition 3.2. A Jordan block is a square matrix of the form
J = \begin{pmatrix} λ & 1 & & \\ & λ & \ddots & \\ & & \ddots & 1 \\ & & & λ \end{pmatrix}
where all non-indicated entries are zero. Any 1 × 1 matrix is also a Jordan block.
A Jordan canonical form is a block-diagonal matrix diag(J_1, . . . , J_m) where each J_k is a Jordan block.
A Jordan canonical basis for T ∈ L(V) is a basis β of V such that [T]_β is a Jordan canonical form.
If a map is diagonalizable, then any eigenbasis is Jordan canonical and the corresponding Jordan canonical form is diagonal. What about more generally? Does every non-diagonalizable map have a Jordan canonical basis? If so, how can we find such?
Example 3.3. It can easily be checked that β = {v_1, v_2, v_3} = { (1, 0, 1), (1, 2, 0), (1, 1, 1) } is a Jordan canonical basis for
A = \begin{pmatrix} -1 & 2 & 3 \\ -4 & 5 & 4 \\ -2 & 1 & 4 \end{pmatrix}
(really L_A ∈ L(R^3)). Indeed
Av_1 = 2v_1,   Av_2 = 3v_2,   Av_3 = \begin{pmatrix} 4 \\ 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 + 3 \\ 2 + 3 \\ 0 + 3 \end{pmatrix} = v_2 + 3v_3   ⟹   [L_A]_β = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}
Generalized Eigenvectors
Example 3.3 was easy to check, but how would we go about finding a suitable β if we were merely
given A? We brute-forced this in Example 3.1, but such is not a reasonable approach in general.
Eigenvectors get us some of the way:
v_1 is an eigenvector in Example 3.1, but v_2 is not.
v_1 and v_2 are eigenvectors in Example 3.3, but v_3 is not.
The practical question is how to fill out a Jordan canonical basis once we have a maximal independent
set of eigenvectors. We now define the necessary objects.
Definition 3.4. Suppose T ∈ L(V) has an eigenvalue λ. Its generalized eigenspace is
K_λ := { x ∈ V : (T - λI)^k(x) = 0 for some k ∈ N } = ⋃_{k∈N} N(T - λI)^k
A generalized eigenvector is any non-zero v ∈ K_λ.
As with eigenspaces, the generalized eigenspaces of A ∈ M_n(F) are those of the map L_A ∈ L(F^n).
It is easy to check that our earlier Jordan canonical bases consist of generalized eigenvectors.
Example 3.1: We have one eigenvalue λ = 2. Since (A - 2I)^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} is the zero matrix, every non-zero vector is a generalized eigenvector; plainly K_2 = R^2.
Example 3.3: We see that
(A - 2I)v_1 = 0,   (A - 3I)v_2 = 0,   (A - 3I)^2 v_3 = (A - 3I)v_2 = 0
whence β is a basis of generalized eigenvectors. Indeed
K_2 = E_2 = Span{v_1},   K_3 = Span{v_2, v_3}
though verifying this with current technology is a little awkward. . .
In order to easily compute generalized eigenspaces, it is useful to invoke the main result of this
section. We postpone the proof for a while due to its meatiness.
Theorem 3.5. Suppose that the characteristic polynomial of T ∈ L(V) splits over F:
p(t) = (λ_1 - t)^{m_1} · · · (λ_k - t)^{m_k}
where the λ_j are the distinct eigenvalues of T with algebraic multiplicities m_j. Then:
1. For each eigenvalue λ with multiplicity m: (a) K_λ = N(T - λI)^m and (b) dim K_λ = m.
2. V = K_{λ_1} ⊕ · · · ⊕ K_{λ_k}: there exists a basis of generalized eigenvectors.
Compare this with the statement on diagonalizability from the start of the course.
With regard to part 2; we shall eventually be able to choose this to be a Jordan canonical basis. In conclusion: a map has a Jordan canonical basis if and only if its characteristic polynomial splits.
Compare this with the statement on diagonalizability from the start of the course.
With regard to part 2; we shall eventually be able to choose this to be a Jordan canonical basis. In
conclusion: a map has a Jordan canonical basis if and only if its characteristic polynomial splits.
Examples 3.6. 1. Observe how Example 3.3 works in this language:
A = \begin{pmatrix} -1 & 2 & 3 \\ -4 & 5 & 4 \\ -2 & 1 & 4 \end{pmatrix} ⟹ p(t) = (2 - t)^1 (3 - t)^2
K_2 = N(A - 2I)^1 = Span{ (1, 0, 1) } ⟹ dim K_2 = 1
K_3 = N(A - 3I)^2 = N\begin{pmatrix} 2 & -1 & -1 \\ 0 & 0 & 0 \\ 2 & -1 & -1 \end{pmatrix} = Span{ (1, 2, 0), (1, 1, 1) } ⟹ dim K_3 = 2
R^3 = K_2 ⊕ K_3
2. We find the generalized eigenspaces of the matrix A = \begin{pmatrix} 5 & -2 & -1 \\ 0 & 0 & 0 \\ 9 & -6 & -1 \end{pmatrix}
The characteristic polynomial is
p(t) = det(A - tI) = -t \begin{vmatrix} 5 - t & -1 \\ 9 & -1 - t \end{vmatrix} = -t(t^2 - 5t + t - 5 + 9) = (0 - t)^1 (2 - t)^2
λ = 0 has multiplicity 1; indeed K_0 = N(A - 0I)^1 = N(A) = Span{ (1, 1, 3) } is just the eigenspace E_0.
λ = 2 has multiplicity 2,
K_2 = N(A - 2I)^2 = N\begin{pmatrix} 3 & -2 & -1 \\ 0 & -2 & 0 \\ 9 & -6 & -3 \end{pmatrix}^2 = N\begin{pmatrix} 0 & 4 & 0 \\ 0 & 4 & 0 \\ 0 & 12 & 0 \end{pmatrix} = Span{ (1, 0, 0), (0, 0, 1) }
In this case the corresponding eigenspace is one-dimensional, E_2 = Span{ (1, 0, 3) } ⊊ K_2, and the matrix is non-diagonalizable.
Observe also that R^3 = K_0 ⊕ K_2 in accordance with the Theorem.
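These nullspace computations are also easily mechanized. Here is a short sympy sketch added to the notes, using the matrix of part 2 with the signs as reconstructed above:

import sympy as sp

A = sp.Matrix([[5, -2, -1], [0, 0, 0], [9, -6, -1]])
I = sp.eye(3)

print(A.nullspace())                  # K_0 = E_0: spanned by (1, 1, 3)
print(((A - 2*I)**2).nullspace())     # K_2: spanned by e_1 and e_3
print((A - 2*I).nullspace())          # E_2 is only one-dimensional, so A is not diagonalizable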
Properties of Generalized Eigenspaces and the Proof of Theorem 3.5
A lot of work is required to justify our main result. Feel free to skip the proofs at first reading.
Lemma 3.7. Let λ be an eigenvalue of T ∈ L(V). Then:
1. E_λ is a subspace of K_λ, which is itself a subspace of V.
2. K_λ is T-invariant.
3. Suppose K_λ is finite-dimensional and µ ≠ λ. Then:
(a) K_λ is (T - µI)-invariant and the restriction of T - µI to K_λ is an isomorphism.
(b) If µ is another eigenvalue, then K_λ ∩ K_µ = {0}. In particular K_λ contains no eigenvectors other than those in E_λ.
Proof. 1. These are an easy exercise.
2. Let x ∈ K_λ, then ∃k such that (T - λI)^k(x) = 0. But then
(T - λI)^k( T(x) ) = (T - λI)^k( T(x) - λx + λx ) = (T - λI)^{k+1}(x) + λ(T - λI)^k(x) = 0
Otherwise said, T(x) ∈ K_λ.
3. (a) Let x ∈ K_λ. Part 2 tells us that
(T - µI)(x) = T(x) - µx ∈ K_λ
whence K_λ is (T - µI)-invariant.
Suppose, for a contradiction, that T - µI is not injective on K_λ. Then
∃y ∈ K_λ \ {0} such that (T - µI)(y) = 0
Let k ∈ N be minimal such that (T - λI)^k(y) = 0 and let z = (T - λI)^{k-1}(y). Plainly z ≠ 0, for otherwise k is not minimal. Moreover,
(T - λI)(z) = (T - λI)^k(y) = 0 ⟹ z ∈ E_λ
Since T - µI and T - λI commute, we can also compute the effect of T - µI;
(T - µI)(z) = (T - µI)(T - λI)^{k-1}(y) = (T - λI)^{k-1}(T - µI)(y) = 0
which says that z is an eigenvector in E_µ; if µ isn't an eigenvalue, then we already have our contradiction! Even if µ is an eigenvalue, E_µ ∩ E_λ = {0} provides the desired contradiction.
We conclude that (T - µI)|_{K_λ} ∈ L(K_λ) is injective. Since dim K_λ < ∞, the restriction is automatically an isomorphism.
(b) This is another exercise.
Now to prove Theorem 3.5: remember that the characteristic polynomial of T is assumed to split.
Proof. (Part 1(a)) Fix an eigenvalue λ. By definition, we have N(T - λI)^m ⊆ K_λ.
For the converse, parts 2 and 3 of the Lemma tell us (why?) that the characteristic polynomial of the restriction T|_{K_λ} is
p_λ(t) = (λ - t)^{dim K_λ},   from which dim K_λ ≤ m   (∗)
By the Cayley–Hamilton Theorem, T|_{K_λ} satisfies its characteristic polynomial, whence
∀x ∈ K_λ,   (λI - T)^{dim K_λ}(x) = 0 ⟹ K_λ ⊆ N(T - λI)^m
(Parts 1(b) and 2) We prove simultaneously by induction on the number of distinct eigenvalues of T.
(Base case) If T has only one eigenvalue, then p(t) = (λ - t)^m. Another appeal to Cayley–Hamilton says (T - λI)^m(x) = 0 for all x ∈ V. Thus V = K_λ and dim K_λ = m.
(Induction step) Fix k and suppose the results hold for maps with k distinct eigenvalues. Let T have distinct eigenvalues λ_1, . . . , λ_k, µ, with multiplicities m_1, . . . , m_k, m respectively. Define^16
W = R(T - µI)^m
The subspace W has the following properties, the first two of which we leave as exercises:
W is T-invariant.
W ∩ K_µ = {0} so that µ is not an eigenvalue of the restriction T|_W.
Each K_{λ_j} ⊆ W: since (T - µI)|_{K_{λ_j}} is an isomorphism (Lemma part 3), we can invert,
x ∈ K_{λ_j} ⟹ x = (T - µI)^m ( (T - µI)|_{K_{λ_j}} )^{-m}(x) ∈ R(T - µI)^m = W
We conclude that λ_j is an eigenvalue of the restriction T|_W with generalized eigenspace K_{λ_j}. Since T|_W has k distinct eigenvalues, the induction hypotheses apply:
W = K_{λ_1} ⊕ · · · ⊕ K_{λ_k}   and   p_W(t) = (λ_1 - t)^{dim K_{λ_1}} · · · (λ_k - t)^{dim K_{λ_k}}
Since W ∩ K_µ = {0} it is enough finally to use the rank–nullity theorem and count dimensions:
dim V = rank(T - µI)^m + null(T - µI)^m = dim W + dim K_µ = Σ_{j=1}^k dim K_{λ_j} + dim K_µ ≤_(∗) m_1 + · · · + m_k + m = deg(p(t)) = dim V
The inequality is thus an equality; each dim K_{λ_j} = m_j and dim K_µ = m. We conclude that
V = K_{λ_1} ⊕ · · · ⊕ K_{λ_k} ⊕ K_µ
which completes the induction step and thus the proof. Whew!
^16 This is yet another argument where we consider a suitable subspace to which we can apply an induction hypothesis; recall the spectral theorem, Schur's lemma, bilinear form diagonalization, etc. Theorem 3.12 will provide one more!
Cycles of Generalized Eigenvectors
By Theorem 3.5, for every linear map whose characteristic polynomial splits there exists a generalized eigenbasis. This isn't the same as a Jordan canonical basis, but we're very close!
Example 3.8. The matrix A = \begin{pmatrix} 5 & 1 & 0 \\ 0 & 5 & 1 \\ 0 & 0 & 5 \end{pmatrix} ∈ M_3(R) is a single Jordan block, whence there is a single generalized eigenspace K_5 = R^3 and the standard basis ϵ = {e_1, e_2, e_3} is Jordan canonical.
The crucial observation for what follows is that one of these vectors, e_3, generates the others via repeated applications of A - 5I:
e_2 = (A - 5I)e_3,   e_1 = (A - 5I)e_2 = (A - 5I)^2 e_3
Definition 3.9. A cycle of generalized eigenvectors for a linear operator T is a set
β_x := { (T - λI)^{k-1}(x), . . . , (T - λI)(x), x }
where the generator x ∈ K_λ is non-zero and k is minimal such that (T - λI)^k(x) = 0.
Note that the first element (T - λI)^{k-1}(x) is an eigenvector.
Our goal is to show that K_λ has a basis consisting of cycles of generalized eigenvectors; putting these together results in a Jordan canonical basis.
Lemma 3.10. Let β_x be a cycle of generalized eigenvectors of T with length k. Then:
1. β_x is linearly independent and thus a basis of Span β_x.
2. Span β_x is T-invariant. With respect to β_x, the matrix of the restriction of T is the k × k Jordan block
[T|_{Span β_x}]_{β_x} = \begin{pmatrix} λ & 1 & & \\ & λ & \ddots & \\ & & \ddots & 1 \\ & & & λ \end{pmatrix}
In what follows, it will be useful to consider the linear map U = T - λI. Note the following:
The nullspace of U is the eigenspace: N(U) = E_λ ⊆ K_λ.
T commutes with U: that is TU = UT.
β_x = {U^{k-1}(x), . . . , U(x), x}; that is, Span β_x = ⟨x⟩ is the U-cyclic subspace generated by x.
Proof. 1. Feed the linear combination Σ_{j=0}^{k-1} a_j U^j(x) = 0 to U^{k-1} to obtain
a_0 U^{k-1}(x) = 0 ⟹ a_0 = 0
Now feed the same combination to U^{k-2}, etc., to see that all coefficients a_j = 0.
2. Since T and U commute, we see that
T( U^j(x) ) = U^j( T(x) ) = U^j( (U + λI)(x) ) = U^{j+1}(x) + λU^j(x) ∈ Span β_x
This justifies both T-invariance and the Jordan block claim!
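In computations it is handy to generate a cycle directly from its generator. Here is a minimal numpy sketch (the function cycle is an invention for these notes); it assumes x really lies in K_λ and that the matrix entries are exact (e.g. integers), since it tests for exact zero.

import numpy as np

def cycle(A, lam, x):
    """The cycle of Definition 3.9: [U^(k-1) x, ..., U x, x] with U = A - lam*I."""
    U = A - lam * np.eye(A.shape[0])
    vecs, v = [x], U @ x
    while np.any(v != 0):
        vecs.append(v)
        v = U @ v
    return vecs[::-1]                  # eigenvector first, generator last

# Example 3.8: a single Jordan block; e_3 generates the whole standard basis
A = np.array([[5., 1., 0.], [0., 5., 1.], [0., 0., 5.]])
print(cycle(A, 5.0, np.array([0., 0., 1.])))    # [e_1, e_2, e_3]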
The basic approach to finding a Jordan canonical basis is to find the generalized eigenspaces and play with cycles until you find a basis for each K_λ. Many choices of canonical basis exist for a given map! We'll consider a more systematic method in the next section.
Examples 3.11. 1. The characteristic polynomial of A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 6 \\ 6 & -2 & 1 \end{pmatrix} ∈ M_3(R) splits:
p(t) = (1 - t)\begin{vmatrix} 1 - t & 6 \\ -2 & 1 - t \end{vmatrix} + 2\begin{vmatrix} 0 & 1 - t \\ 6 & -2 \end{vmatrix} = (1 - t)[ (1 - t)^2 + 12 - 12 ] = (1 - t)^3
With only one eigenvalue we see that K_1 = R^3. Simply choose any vector in R^3 and see what U = A - I does to it! For instance, with x = e_1,
β_x = { U^2(1, 0, 0), U(1, 0, 0), (1, 0, 0) } = { (12, 36, 0), (0, 0, 6), (1, 0, 0) }
provides a Jordan canonical basis of R^3. We conclude
A = QJQ^{-1} = \begin{pmatrix} 12 & 0 & 1 \\ 36 & 0 & 0 \\ 0 & 6 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 12 & 0 & 1 \\ 36 & 0 & 0 \\ 0 & 6 & 0 \end{pmatrix}^{-1}
In practice, almost any choice of x ∈ R^3 will generate a cycle of length three!
2. The matrix B = \begin{pmatrix} 7 & 1 & -4 \\ 0 & 3 & 0 \\ 8 & 1 & -5 \end{pmatrix} ∈ M_3(R) has characteristic equation
p(t) = (3 - t)(t^2 - 2t - 3) = -(t + 1)^1 (t - 3)^2
dim K_{-1} = 1 ⟹ K_{-1} = E_{-1} = Span{ (1, 0, 2) }, spanned by a cycle of length one.
Since dim K_3 = 2, we have
K_3 = N(B - 3I)^2 = N\begin{pmatrix} 4 & 1 & -4 \\ 0 & 0 & 0 \\ 8 & 1 & -8 \end{pmatrix}^2 = N\begin{pmatrix} -16 & 0 & 16 \\ 0 & 0 & 0 \\ -32 & 0 & 32 \end{pmatrix} = Span{ (1, 0, 1), (0, 1, 0) }
This is spanned by a cycle of length two: (1, 0, 1) is an eigenvector and
(1, 0, 1) = (B - 3I)(0, 1, 0)
We conclude that β = { (1, 0, 2), (1, 0, 1), (0, 1, 0) } is a Jordan canonical basis for B, and that
B = QJQ^{-1} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} -1 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix}^{-1}
3. Let T = d/dx on P_3(R). With respect to the standard basis ϵ = {1, x, x^2, x^3},
A = [T]_ϵ = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}
With only one eigenvalue λ = 0, we have a single generalized eigenspace K_0 = P_3(R). It is easy to check that f(x) = x^3 generates a cycle of length four and thus a Jordan canonical basis:
β = {6, 6x, 3x^2, x^3} ⟹ [T]_β = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 6 & 0 & 0 & 0 \\ 0 & 6 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 6 & 0 & 0 & 0 \\ 0 & 6 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
Our final results state that this process works generally.
Theorem 3.12. Let T ∈ L(V) have an eigenvalue λ. If dim K_λ < ∞, then there exists a basis
β_λ = β_{x_1} ∪ · · · ∪ β_{x_n}
of K_λ consisting of finitely many linearly independent cycles.
Intuition suggests that we create cycles β_{x_j} by starting with a basis of the eigenspace E_λ and extending backwards: for each x, if x = (T - λI)(y), then x ∈ β_y; now repeat until you have a maximum length cycle. This is essentially what we do, though a sneaky induction is required to make sure we keep track of everything and guarantee that the result really is a basis of K_λ.
Proof. We prove by induction on m = dim K_λ.
(Base case) If m = 1, then K_λ = E_λ = Span x for some eigenvector x. Plainly {x} = β_x.
(Induction step) Fix m ≥ 2. Write n = dim E_λ ≤ m and U = (T - λI)|_{K_λ}.
(i) For the induction hypothesis, suppose every generalized eigenspace with dimension < m (for any linear map!) has a basis consisting of independent cycles of generalized eigenvectors.
(ii) Define W = R(U) ∩ E_λ: that is
w ∈ W ⟺ U(w) = 0 and w = U(v) for some v ∈ K_λ
Let k = dim W, choose a complementary subspace X such that E_λ = W ⊕ X and select a basis {x_{k+1}, . . . , x_n} of X. If k = 0, the induction step is finished (why?). Otherwise we continue. . .
(iii) The calculation in the proof of Lemma 3.10 (take j = 1) shows that R(U) is T-invariant; it is therefore the single generalized eigenspace K̃_λ of T|_{R(U)}.
(iv) By the rank–nullity theorem,
dim R(U) = rank U = dim K_λ - null U = m - dim E_λ < m
By the induction hypothesis, R(U) has a basis of independent cycles. Since the last non-zero element in each cycle is an eigenvector, this basis consists of k distinct cycles β_{x̂_1} ∪ · · · ∪ β_{x̂_k} whose terminal vectors form a basis of W.
(v) Since each x̂_j ∈ R(U), there exist vectors x_1, . . . , x_k such that x̂_j = U(x_j). Including the length-one cycles generated by the basis of X, the cycles β_{x_1}, . . . , β_{x_n} now contain
dim R(U) + k + (n - k) = rank U + null U = m
vectors. We leave as an exercise the verification that these vectors are linearly independent.
Corollary 3.13. Suppose that the characteristic polynomial of T ∈ L(V) splits (necessarily dim V < ∞). Then there exists a Jordan canonical basis, namely the union of bases β_λ from Theorem 3.12.
Proof. By Theorem 3.5, V is the direct sum of generalized eigenspaces. By the previous result, each K_λ has a basis β_λ consisting of finitely many cycles. By Lemma 3.10, the matrix of T|_{K_λ} has Jordan canonical form with respect to β_λ. It follows that β = ⋃ β_λ is a Jordan canonical basis for T.
Exercises 3.1 1. For each matrix, find the generalized eigenspaces K_λ, find bases consisting of unions of disjoint cycles of generalized eigenvectors, and thus find a Jordan canonical form J and invertible Q so that the matrix may be expressed as QJQ^{-1}.
(a) A = \begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix}   (b) B = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}   (c) C = \begin{pmatrix} 11 & 4 & 5 \\ 21 & 8 & 11 \\ 3 & 1 & 0 \end{pmatrix}   (d) D = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 1 & 1 & 3 \end{pmatrix}
2. If β = {v_1, . . . , v_n} is a Jordan canonical basis, what can you say about v_1? Briefly explain why the linear map L_A ∈ L(R^2) where A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} has no Jordan canonical form.
3. Find a Jordan canonical basis for each linear map T:
(a) T ∈ L(P_2(R)) defined by T(f(x)) = 2f(x) - f′(x)
(b) T(f) = f′ defined on Span{1, t, t^2, e^t, te^t}
(c) T(A) = 2A + A^T defined on M_2(R)
4. In Example 3.11.1, suppose x = (a, b, c). Show that almost any choice of a, b, c produces a Jordan canonical basis β_x.
5. We complete the proof of Lemma 3.7.
(a) Prove part 1: that E_λ ≤ K_λ ≤ V.
(b) Verify that T - µI and T - λI commute.
(c) Prove part 3(b): generalized eigenspaces for distinct eigenvalues have trivial intersection.
6. Consider the induction step in the proof of Theorem 3.5.
(a) Prove that W is T-invariant.
(b) Explain why W ∩ K_µ = {0}.
(c) The assumption p_W(t) = (λ_1 - t)^{dim K_{λ_1}} · · · (λ_k - t)^{dim K_{λ_k}} near the end of the proof is the induction hypothesis for part 1(b). Why can't we also assume that dim K_{λ_j} = m_j and thus tidy the inequality argument near the end of the proof?
7. We finish some of the details of Theorem 3.12.
(a) In step (ii), suppose dim W = k = 0. Explain why {x_1, . . . , x_n} is in fact a basis of K_λ, so that the rest of the proof is unnecessary.
(b) In step (v), prove that the m vectors in the cycles β_{x_1}, . . . , β_{x_n} are linearly independent.
(Hint: model your argument on part 1 of Lemma 3.10)
3.2 Cycle Patterns and the Dot Diagram
In this section we obtain a useful result that helps us compute Jordan forms more efficiently and
systematically. To give us some clues how to proceed, here is a lengthy example.
Example 3.14. Precisely three Jordan canonical forms A, B, C ∈ M_3(R) correspond to the characteristic polynomial p(t) = (5 - t)^3:
A = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{pmatrix}   B = \begin{pmatrix} 5 & 1 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{pmatrix}   C = \begin{pmatrix} 5 & 1 & 0 \\ 0 & 5 & 1 \\ 0 & 0 & 5 \end{pmatrix}
In all three cases the standard basis β = {e_1, e_2, e_3} is Jordan canonical, so how do we distinguish things? By considering the number and lengths of the cycles of generalized eigenvectors.
A has eigenspace E_5 = K_5 = R^3. Since (A - 5I)v = 0 for all v ∈ R^3, we have maximum cycle-length one. We therefore need three distinct cycles to construct a Jordan basis, e.g.
β_{e_1} = {e_1},  β_{e_2} = {e_2},  β_{e_3} = {e_3}  ⟹  β = β_{e_1} ∪ β_{e_2} ∪ β_{e_3} = {e_1, e_2, e_3}
B has eigenspace E_5 = Span{e_1, e_3}. By computing
v = (a, b, c)  ⟹  (B - 5I)v = (b, 0, 0)  ⟹  (B - 5I)^2 v = 0
we see that β_v is a cycle with maximum length two, provided b ≠ 0 (v ∉ E_5). We therefore need two distinct cycles, of lengths two and one, to construct a Jordan basis, e.g.
β_{e_2} = { (B - 5I)e_2, e_2 } = {e_1, e_2},  β_{e_3} = {e_3}  ⟹  β = β_{e_2} ∪ β_{e_3} = {e_1, e_2, e_3}
C has eigenspace E_5 = Span e_1. This time
v = (a, b, c)  ⟹  (C - 5I)v = (b, c, 0),  (C - 5I)^2 v = (c, 0, 0),  (C - 5I)^3 v = 0
generates a cycle with maximum length three, provided c ≠ 0. Indeed this cycle is a Jordan basis, so one cycle is all we need:
β = β_{e_3} = { (C - 5I)^2 e_3, (C - 5I)e_3, e_3 } = {e_1, e_2, e_3}
Why is the example relevant? Suppose that dim_R V = 3 and that T ∈ L(V) has characteristic polynomial p(t) = (5 - t)^3. Theorem 3.12 tells us that T has a Jordan canonical form, and that it is moreover one of the above matrices A, B, C. Our goal is to develop a method whereby the pattern of cycle-lengths can be determined, thus allowing us to discern which Jordan form is correct. As a side-effect, this will also demonstrate that the pattern of cycle lengths for a given T is independent of the Jordan basis so that, up to some reasonable restriction, the Jordan form of T is unique. To aid us in this endeavor, we require some terminology. . .
Definition 3.15. Let V be finite dimensional and K_λ a generalized eigenspace of T ∈ L(V). Following Theorem 3.12, assume that β_λ = β_{x_1} ∪ · · · ∪ β_{x_n} is a Jordan canonical basis of T|_{K_λ}, where the cycles are arranged in non-increasing length. That is:
1. β_{x_j} = { (T - λI)^{k_j - 1}(x_j), . . . , x_j } has length k_j, and
2. k_1 ≥ k_2 ≥ · · · ≥ k_n
The dot diagram of T|_{K_λ} is a representation of the elements of β_λ, one dot for each vector: the j-th column represents the elements of β_{x_j} arranged vertically with x_j at the bottom.
Given a linear map, our eventual goal is to identify the dot diagram as an intermediate step in the
computation of a Jordan basis. First, however, we observe how the conversion of dot diagrams to a
Jordan form is essentially trivial.
Example 3.16. Suppose dim V = 14 and that T ∈ L(V) has the following eigenvalues and dot diagrams:
λ_1 = -4:  • • • •      λ_2 = 7:  • •      λ_3 = 12:  • • •
           • •                     • •
                                   •
Then the generalized eigenspaces of T satisfy:
K_{-4} = N(T + 4I)^2 and dim K_{-4} = 6;
K_7 = N(T - 7I)^3 and dim K_7 = 5;
K_{12} = N(T - 12I) = E_{12} and dim K_{12} = 3;
T has a Jordan canonical basis β with respect to which its Jordan canonical form is
[T]_β = diag( \begin{pmatrix} -4 & 1 \\ 0 & -4 \end{pmatrix}, \begin{pmatrix} -4 & 1 \\ 0 & -4 \end{pmatrix}, -4, -4, \begin{pmatrix} 7 & 1 & 0 \\ 0 & 7 & 1 \\ 0 & 0 & 7 \end{pmatrix}, \begin{pmatrix} 7 & 1 \\ 0 & 7 \end{pmatrix}, 12, 12, 12 )
Note how the sizes of the Jordan blocks are non-increasing within each eigenvalue. For instance, for λ_1 = -4, the sequence of cycle lengths (k_j) is 2 ≥ 2 ≥ 1 ≥ 1.
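The passage from a dot diagram to block sizes is just conjugating a partition: the j-th cycle length is the number of rows with at least j dots. A tiny Python sketch of this bookkeeping (the helper block_sizes is an addition to these notes):

def block_sizes(rows):
    """Jordan block sizes k_1 >= k_2 >= ... from dot-diagram row lengths r_1 >= r_2 >= ..."""
    return [sum(1 for r in rows if r >= j) for j in range(1, max(rows) + 1)]

print(block_sizes([4, 2]))        # lambda_1 = -4 in Example 3.16: blocks of sizes 2, 2, 1, 1
print(block_sizes([2, 2, 1]))     # lambda_2 = 7: blocks of sizes 3, 2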
Theorem 3.17. Suppose β_λ is a Jordan canonical basis of T|_{K_λ} as described in Definition 3.15, and suppose the i-th row of the dot diagram has r_i entries. Then:
1. For each r ∈ N, the vectors associated to the dots in the first r rows form a basis of N(T - λI)^r.
2. r_1 = null(T - λI) = dim V - rank(T - λI)
3. When i > 1, r_i = null(T - λI)^i - null(T - λI)^{i-1} = rank(T - λI)^{i-1} - rank(T - λI)^i
Example (3.14 cont). We describe the dot diagrams of the three matrices A, B, C, along with the corresponding vectors in the Jordan canonical basis β and the values r_i.
A: one row of three dots, labelled x_1 = e_1, x_2 = e_2, x_3 = e_3.
Since A - 5I is the zero matrix, r_1 = 3 - rank(A - 5I) = 3. The dot diagram has one row, corresponding to three independent cycles of length one: β = β_{e_1} ∪ β_{e_2} ∪ β_{e_3}.
B: two columns; the first contains (B - 5I)x_1 = e_1 above x_1 = e_2, the second contains x_2 = e_3.
Row 1: B - 5I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} ⟹ rank(B - 5I) = 1 and r_1 = 3 - 1 = 2. The first row {e_1, e_3} is a basis of E_5 = N(B - 5I).
Row 2: (B - 5I)^2 is the zero matrix, whence r_2 = rank(B - 5I) - rank(B - 5I)^2 = 1 - 0 = 1.
The dot diagram corresponds to β = β_{e_2} ∪ β_{e_3} = {e_1, e_2} ∪ {e_3}.
C: a single column; from top to bottom (C - 5I)^2 x_1 = e_1, (C - 5I)x_1 = e_2, x_1 = e_3.
Row 1: C - 5I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} ⟹ r_1 = 3 - rank(C - 5I) = 1. The first row {e_1} is a basis of E_5 = N(C - 5I).
Row 2: (C - 5I)^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} ⟹ r_2 = rank(C - 5I) - rank(C - 5I)^2 = 2 - 1 = 1. The first two rows {e_1, e_2} form a basis of N(C - 5I)^2.
Row 3: (C - 5I)^3 is the zero matrix, whence r_3 = rank(C - 5I)^2 - rank(C - 5I)^3 = 1 - 0 = 1.
Proof. As previously, let U = T - λI.
1. Since each dot represents a basis vector U^p(x_j), any v ∈ K_λ may be written uniquely as a linear combination of the dots. Applying U simply moves all the dots up a row and all dots in the top row to 0. It follows that v ∈ N(U^r) ⟺ it lies in the span of the first r rows. Since the dots are linearly independent, they form a basis.
2. By part 1, r_1 = dim N(U) = null(T - λI) = dim V - rank(T - λI).
3. More generally,
r_i = (r_1 + · · · + r_i) - (r_1 + · · · + r_{i-1}) = dim N(U^i) - dim N(U^{i-1}) = null(U^i) - null(U^{i-1}) = rank(T - λI)^{i-1} - rank(T - λI)^i
Since the ranks of the maps (T - λI)^i are independent of basis, so also is the dot diagram. . .
Corollary 3.18. For any eigenvalue λ, the dot diagram is uniquely determined by T and λ. If we list Jordan blocks for each eigenspace in non-increasing order, then the Jordan form of a linear map is unique up to the order of the eigenvalues.
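Theorem 3.17 is easy to turn into code: the row lengths come straight from ranks of powers. Here is a small numpy sketch added to the notes (the function name is ours; it assumes λ really is an eigenvalue and trusts numpy's numerical rank).

import numpy as np

def dot_diagram_rows(A, lam):
    """Row lengths r_1, r_2, ... of the dot diagram for eigenvalue lam (Theorem 3.17)."""
    n = A.shape[0]
    U = A - lam * np.eye(n)
    rows, prev, P = [], n, U                 # rank(U^0) = n
    while True:
        r = np.linalg.matrix_rank(P)
        rows.append(prev - r)                # r_i = rank(U^(i-1)) - rank(U^i)
        if prev - r == 0 or sum(rows) >= n:
            break
        prev, P = r, P @ U
    return [r for r in rows if r > 0]

# matrix C of Example 3.14: a single cycle of length three, so one dot per row
C = np.array([[5., 1., 0.], [0., 5., 1.], [0., 0., 5.]])
print(dot_diagram_rows(C, 5.0))              # [1, 1, 1]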
We now have a slightly more systematic method for finding Jordan canonical bases.
Example 3.19. The matrix A = \begin{pmatrix} 6 & 2 & -4 & -6 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 2 & 1 & -2 & -1 \end{pmatrix} has characteristic equation
p(t) = (3 - t)^2 \begin{vmatrix} 6 - t & -6 \\ 2 & -1 - t \end{vmatrix} = (2 - t)(3 - t)^3
We have two generalized eigenspaces:
K_2 = E_2 = N(A - 2I) = N\begin{pmatrix} 4 & 2 & -4 & -6 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 2 & 1 & -2 & -3 \end{pmatrix} = Span{ (3, 0, 0, 2) }. The trivial one-dot diagram corresponds to this single eigenvector.
K_3 = N(A - 3I)^3. To find the dot diagram, compute powers of A - 3I:
Row 1: A - 3I = \begin{pmatrix} 3 & 2 & -4 & -6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 2 & 1 & -2 & -4 \end{pmatrix} has rank 2 and the first row has r_1 = 4 - 2 = 2 entries.
Row 2: (A - 3I)^2 = \begin{pmatrix} -3 & 0 & 0 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -2 & 0 & 0 & 4 \end{pmatrix} has rank 1 and the second row has r_2 = 2 - 1 = 1 entry.
Since we now have three dots (equalling dim K_3), the algorithm terminates and the dot diagram for K_3 has two dots in its first row and one in its second.
For the single dot in the second row, we choose something in N(A - 3I)^2 which isn't an eigenvector; perhaps the simplest choice is x_1 = e_2, which yields the two-cycle
β_{x_1} = { (A - 3I)x_1, x_1 } = { (2, 0, 0, 1), (0, 1, 0, 0) }
To complete the first row, choose any eigenvector to complete the span: for instance x_2 = (0, 2, 1, 0).
We now have suitable cycles and a Jordan canonical basis/form:
β = { (3, 0, 0, 2), (2, 0, 0, 1), (0, 1, 0, 0), (0, 2, 1, 0) },
A = QJQ^{-1} = \begin{pmatrix} 3 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 2 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 3 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 2 & 1 & 0 & 0 \end{pmatrix}^{-1}
Other choices are available! For instance, if we'd chosen the two-cycle generated by x_1 = e_3, we'd obtain a different Jordan basis but the same canonical form J:
β̃ = { (3, 0, 0, 2), (-4, 0, 0, -2), (0, 0, 1, 0), (0, 2, 1, 0) },
A = \begin{pmatrix} 3 & -4 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 2 & -2 & 0 & 0 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 3 & -4 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 2 & -2 & 0 & 0 \end{pmatrix}^{-1}
We do one final example for a non-matrix map.
Example 3.20. Let ϵ = {1, x, y, x^2, y^2, xy} and define T(f(x, y)) = 2 ∂f/∂x - ∂f/∂y as a linear operator on V = Span_R ϵ. The matrix and characteristic polynomial of T are easy to compute:
[T]_ϵ = \begin{pmatrix} 0 & 2 & -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 4 & 0 & -1 \\ 0 & 0 & 0 & 0 & -2 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} ⟹ p(t) = t^6,   [T^2]_ϵ = \begin{pmatrix} 0 & 0 & 0 & 8 & 2 & -4 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},   [T^3]_ϵ = O
There is only one eigenvalue λ = 0 and therefore one generalized eigenspace K_0 = V. We could keep working with matrices, but it is easy to translate the nullspaces of the matrices back to subspaces of V, from which the necessary data can be read off:
N(T) = Span{1, x + 2y, x^2 + 4y^2 + 4xy}  ⟹  null T = 3, rank T = 3, r_1 = 3
N(T^2) = Span{1, x, y, x^2 + 2xy, 2y^2 + xy}  ⟹  null T^2 = 5, rank T^2 = 1, r_2 = 3 - 1 = 2
We now have five dots; since dim K_0 = 6, the last row has one, and the dot diagram has rows of three, two and one dots.
Since the first two rows span N(T^2), we may choose any f_1 ∉ N(T^2) for the final dot: f_1 = xy is suitable, from which the first column of the dot diagram becomes
T^2(xy) = -4,   T(xy) = 2y - x,   xy
Now choose the second dot on the second row to be anything in N(T^2) such that the first two rows span N(T^2): this time f_2 = x^2 - 4y^2 is suitable, and the diagram becomes:
first row: T^2(xy) = -4,  T(x^2 - 4y^2) = 4x + 8y
second row: T(xy) = 2y - x,  x^2 - 4y^2
third row: xy
The final dot is now chosen so that the first row spans N(T): this time f_3 = x^2 + 4y^2 + 4xy works. The result is a Jordan canonical basis and form for T
β = { -4, 2y - x, xy, 4x + 8y, x^2 - 4y^2, x^2 + 4y^2 + 4xy },   J = [T]_β = diag( \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, 0 )
As previously, many other choices of cycle-generators f_1, f_2, f_3 are available; while these result in different Jordan canonical bases, Corollary 3.18 assures us that we'll always obtain the same canonical form J.
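For operators like this one, sympy can build the matrix [T]_ϵ and the ranks of its powers automatically. The sketch below is an addition to the notes (the helper names T and coords are ours) and simply re-derives the numbers used above.

import sympy as sp

x, y = sp.symbols('x y')
eps  = [sp.Integer(1), x, y, x**2, y**2, x*y]                  # the basis epsilon
exps = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (1, 1)]        # exponents of each basis monomial

def T(f):
    return 2*sp.diff(f, x) - sp.diff(f, y)                     # the operator of Example 3.20

def coords(f):
    # coefficient of x^a y^b, read off by differentiating and evaluating at the origin
    cs = []
    for a, b in exps:
        g = sp.diff(f, x, a) if a else f
        g = sp.diff(g, y, b) if b else g
        cs.append(g.subs({x: 0, y: 0}) / (sp.factorial(a) * sp.factorial(b)))
    return cs

M = sp.Matrix([coords(T(b)) for b in eps]).T                   # columns = coordinates of T(basis element)
print(M)                                                       # matches the matrix [T]_eps displayed above
print([(M**k).rank() for k in (1, 2, 3)])                      # 3, 1, 0, hence r_1 = 3, r_2 = 2, r_3 = 1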
Exercises 3.2 1. Let T be a linear operator whose characteristic polynomial splits. Suppose the eigenvalues and the dot diagrams for the generalized eigenspaces K_{λ_i} are as follows:
λ_1 = 2   λ_2 = 4   λ_3 = 3
Find the Jordan form J of T.
2. Suppose T has Jordan canonical form
J = diag( \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}, 3, 3 )
(a) Find the characteristic polynomial of T.
(b) Find the dot diagram for each eigenvalue.
(c) For each eigenvalue find the smallest k_j such that K_{λ_j} = N(T - λ_j I)^{k_j}.
3. For each matrix A find a Jordan canonical form and an invertible Q such that A = QJQ^{-1}.
(a) A = \begin{pmatrix} 3 & 3 & 2 \\ 7 & 6 & 3 \\ 1 & 1 & 2 \end{pmatrix}   (b) A = \begin{pmatrix} 0 & 1 & 1 \\ 4 & 4 & 2 \\ 2 & 1 & 1 \end{pmatrix}   (c) A = \begin{pmatrix} 0 & 3 & 1 & 2 \\ 2 & 1 & 1 & 2 \\ 2 & 1 & 1 & 2 \\ 2 & 3 & 1 & 4 \end{pmatrix}
4. For each linear operator T, find a Jordan canonical form J and basis β:
(a) T(f) = f′ on Span_R{e^t, te^t, t^2 e^t, e^{2t}}
(b) T(f(x)) = x f′′(x) on P_3(R)
(c) T(f) = a ∂f/∂x + b ∂f/∂y on Span_R{1, x, y, x^2, y^2, xy}. How does your answer depend on a, b?
5. (Generalized Eigenvector Method for ODEs) Let A ∈ M_n(R) have an eigenvalue λ and suppose β_{v_0} = {v_{k-1}, . . . , v_1, v_0} is a cycle of generalized eigenvectors for this eigenvalue. Show that
x(t) := e^{λt} Σ_{j=0}^{k-1} b_j(t) v_j
satisfies x′(t) = Ax ⟺ b_0′(t) = 0, and b_j′(t) = b_{j-1}(t) when j ≥ 1.
Use this method to solve the system of differential equations
x′ = \begin{pmatrix} 3 & 1 & 0 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix} x
3.3 The Rational Canonical Form (non-examinable)
We finish the course with a very quick discussion of what can be done when the characteristic poly-
nomial of a linear map does not split. In such a situation, we may assume that
p(t) = (-1)^n ( ϕ_1(t) )^{m_1} · · · ( ϕ_k(t) )^{m_k}   (∗)
where each ϕ_j(t) is an irreducible monic polynomial over the field.
Example 3.21. The following matrix has characteristic equation p(t) = (t^2 + 1)^2 (3 - t):
A = \begin{pmatrix} 0 & -1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 3 \end{pmatrix} ∈ M_5(R)
This doesn't split over R since t^2 + 1 = 0 has no real roots. It is, however, diagonalizable over C.
A couple of basic facts from algebra:
Every polynomial splits over C: every A ∈ M_n(C) therefore has a Jordan form.
Every polynomial over R factorizes into linear or irreducible quadratic factors.
The question is how to deal with non-linear irreducible factors in the characteristic polynomial.
Definition 3.22. The monic polynomial t^k + a_{k-1}t^{k-1} + · · · + a_0 has companion matrix
\begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & 0 & & 0 & -a_1 \\ 0 & 1 & 0 & & 0 & -a_2 \\ \vdots & & \ddots & & \vdots & \vdots \\ 0 & 0 & 0 & & 0 & -a_{k-2} \\ 0 & 0 & 0 & \cdots & 1 & -a_{k-1} \end{pmatrix}   (when k = 1, this is the 1 × 1 matrix (-a_0))
If T ∈ L(V) has characteristic polynomial (∗), then a rational canonical basis is a basis β for which
[T]_β = \begin{pmatrix} C_1 & O & \cdots & O \\ O & C_2 & & O \\ \vdots & & \ddots & \vdots \\ O & O & \cdots & C_r \end{pmatrix}
where each C_j is a companion matrix of some (ϕ_j(t))^{s_j} where s_j ≤ m_j. We call [T]_β a rational canonical form of T.
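Building a companion matrix from the coefficients is mechanical; here is a short numpy sketch (the helper companion is an addition to these notes), matching the sign convention of Definition 3.22.

import numpy as np

def companion(coeffs):
    """Companion matrix of t^k + a_{k-1} t^{k-1} + ... + a_0, with coeffs = (a_0, ..., a_{k-1})."""
    a = np.asarray(coeffs, dtype=float)
    k = len(a)
    C = np.zeros((k, k))
    C[1:, :-1] = np.eye(k - 1)      # 1's on the subdiagonal
    C[:, -1] = -a                   # last column: -a_0, ..., -a_{k-1}
    return C

print(companion([1, 0]))            # companion of t^2 + 1: [[0, -1], [1, 0]], as in Example 3.21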
We state the main result without proof:
Theorem 3.23. A rational canonical basis exists for any linear operator T on a finite-dimensional vector space V. The canonical form is unique up to ordering of companion matrices.
Example (3.21 cont). The matrix A is already in rational canonical form: the standard basis is rational canonical with three companion blocks,
C_1 = C_2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},   C_3 = (3)
Example 3.24. Let A = \begin{pmatrix} 4 & -3 \\ 2 & 2 \end{pmatrix} ∈ M_2(R). Its characteristic polynomial
p(t) = t^2 - 6t + 14 = (t - 3)^2 + 5
doesn't split over R and so it has no eigenvalues. Instead simply pick a vector, x = (1, 0) (say), define y = Ax = (4, 2), let β = {x, y} and observe that
[L_A]_β = \begin{pmatrix} 0 & -14 \\ 1 & 6 \end{pmatrix}
is a rational canonical form. Indeed this works for any x ≠ 0: if β := {x, Ax}, then Cayley–Hamilton forces
A^2 x = (6A - 14I)x = -14x + 6Ax ⟹ [L_A]_β = \begin{pmatrix} 0 & -14 \\ 1 & 6 \end{pmatrix}
whence β is a rational canonical basis and the form [L_A]_β is independent of x!
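The observation generalizes into an easy computation: pick x, build β = {x, Ax}, and read off the matrix. A small sympy sketch for the matrix above (added here; names mirror the example):

import sympy as sp

A = sp.Matrix([[4, -3], [2, 2]])            # the matrix of Example 3.24
x = sp.Matrix([1, 0])
Q = sp.Matrix.hstack(x, A*x)                # beta = {x, Ax} as columns
print(Q.inv() * A * Q)                      # [[0, -14], [1, 6]]: the companion matrix of t^2 - 6t + 14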
A systematic approach to finding rational canonical forms is similar to that for Jordan forms: for each irreducible divisor ϕ(t) of p(t), the subspace K_ϕ = N( ϕ(T)^m ) plays a role analogous to a generalized eigenspace; indeed K_λ = K_ϕ for the linear irreducible factor ϕ(t) = t - λ!
We finish with two examples; hopefully the approach is intuitive, even without theoretical justification.
Examples 3.25. If the characteristic polynomial of T ∈ L(R^4) is
p(t) = ( ϕ(t) )^2 = (t^2 - 2t + 3)^2 = t^4 - 4t^3 + 10t^2 - 12t + 9
then there are two possible rational canonical forms; here is an example of each.
1. If A = \begin{pmatrix} 0 & -15 & 0 & 9 \\ 2 & 2 & 3 & 0 \\ 0 & 9 & 0 & -6 \\ 3 & 0 & 5 & 2 \end{pmatrix}, then ϕ(A) = O is the zero matrix, whence N(ϕ(A)) = R^4. Since ϕ(t) isn't the full characteristic polynomial, we expect there to be two independent cycles of length two in the canonical basis. Start with something simple as a guess:
x_1 = (1, 0, 0, 0) ⟹ x_2 = Ax_1 = (0, 2, 0, 3) ⟹ Ax_2 = (-3, 4, 0, 6) = -3x_1 + 2x_2
Now make another choice that isn't in the span of {x_1, x_2}:
x_3 = (0, 0, 1, 0) ⟹ x_4 = Ax_3 = (0, 3, 0, 5) ⟹ Ax_4 = (0, 6, -3, 10) = -3x_3 + 2x_4
We therefore have a rational canonical basis β = {x_1, x_2, x_3, x_4} and
A = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 3 & 0 & 5 \end{pmatrix} \begin{pmatrix} 0 & -3 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 0 & -3 \\ 0 & 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 3 & 0 & 5 \end{pmatrix}^{-1}
Over C, this example is diagonalizable. Indeed each of the 2 × 2 companion matrices is diagonalizable over C.
2. Let B = \begin{pmatrix} 0 & 0 & 2 & 1 \\ 1 & 1 & -1 & -1 \\ 0 & 1 & -2 & -16 \\ 0 & 0 & 1 & 5 \end{pmatrix}. This time
ϕ(B) = B^2 - 2B + 3I = \begin{pmatrix} 3 & 2 & -7 & -29 \\ -1 & 1 & 4 & 13 \\ 1 & -3 & -6 & -17 \\ 0 & 1 & 1 & 2 \end{pmatrix} ⟹ N(ϕ(B)) = Span{ (3, -1, 1, 0), (11, -2, 0, 1) }
Anything not in this span will suffice as a generator for a single cycle of length four: e.g.,
x_1 = (1, 0, 0, 0),   x_2 = Bx_1 = (0, 1, 0, 0),   x_3 = Bx_2 = (0, 1, 1, 0),   x_4 = Bx_3 = (2, 0, -1, 1)
Bx_4 = (-1, 2, -14, 4) = -9(1, 0, 0, 0) + 12(0, 1, 0, 0) - 10(0, 1, 1, 0) + 4(2, 0, -1, 1)
We therefore have a rational canonical basis β = {x_1, x_2, x_3, x_4} and
B = \begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 & -9 \\ 1 & 0 & 0 & 12 \\ 0 & 1 & 0 & -10 \\ 0 & 0 & 1 & 4 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix}^{-1}
In contrast to the first example, B isn't diagonalizable over C. It has Jordan form
J = \begin{pmatrix} λ & 1 & 0 & 0 \\ 0 & λ & 0 & 0 \\ 0 & 0 & \bar{λ} & 1 \\ 0 & 0 & 0 & \bar{λ} \end{pmatrix}   where λ = 1 + i√2.