2 Inner Product Spaces, part 1

You should be familiar with the scalar/dot product in $\mathbb{R}^2$. For any vectors $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, $y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$, define
$$x \cdot y := x_1 y_1 + x_2 y_2$$
The norm or length of a vector is $\|x\| = \sqrt{x \cdot x}$. The angle $\theta$ between vectors satisfies $x \cdot y = \|x\| \, \|y\| \cos\theta$. Vectors are orthogonal or perpendicular precisely when $x \cdot y = 0$.

(Figure: two vectors $x$ and $y$ separated by an angle $\theta$.)

The dot product is what allows us to compute lengths of and angles between vectors in $\mathbb{R}^2$. An inner product is an algebraic structure that generalizes this idea to other vector spaces.
2.1 Real and Complex Inner Product Spaces

Unless explicitly stated otherwise, throughout this chapter $\mathbb{F}$ is either the real field $\mathbb{R}$ or the complex field $\mathbb{C}$.

Definition 2.1. An inner product space $(V, \langle\,,\rangle)$ is a vector space $V$ over $\mathbb{F}$ together with an inner product: a function $\langle\,,\rangle : V \times V \to \mathbb{F}$ satisfying the following properties $\forall x, y, z \in V$, $\lambda \in \mathbb{F}$:

(a) Linear: $\langle \lambda x + y, z \rangle = \lambda \langle x, z \rangle + \langle y, z \rangle$
(b) Conjugate-Symmetric: $\langle y, x \rangle = \overline{\langle x, y \rangle}$ (complex conjugate!)
(c) Positive-definite: $x \neq 0 \implies \langle x, x \rangle > 0$

The norm or length of $x \in V$ is $\|x\| := \sqrt{\langle x, x \rangle}$. A unit vector has $\|x\| = 1$. Vectors $x, y$ are perpendicular/orthogonal if $\langle x, y \rangle = 0$ and orthonormal if they are additionally unit vectors.

Real inner product spaces

The definition simplifies slightly when $\mathbb{F} = \mathbb{R}$. Conjugate-symmetry becomes plain symmetry: $\langle y, x \rangle = \langle x, y \rangle$. Linearity + symmetry yields bilinearity: a real inner product is also linear in its second slot,
$$\langle x, \lambda y + z \rangle = \lambda \langle x, y \rangle + \langle x, z \rangle$$
A real inner product is often termed a positive-definite, symmetric, bilinear form.
The simplest example is the natural generalization of the dot product.

Definition 2.2. Euclidean space means $\mathbb{R}^n$ equipped with the standard inner (dot) product,
$$\langle x, y \rangle = x \cdot y = y^T x = \sum_{j=1}^n x_j y_j = x_1 y_1 + \cdots + x_n y_n, \qquad \|x\| = \sqrt{\sum_{j=1}^n x_j^2}$$

Unless the inner product is stated explicitly, if we refer to $\mathbb{R}^n$ as an inner product space, we mean Euclidean space. However, there are many other ways to make $\mathbb{R}^n$ into an inner product space. . .
Example 2.3. Define an alternative inner product on $\mathbb{R}^2$ via
$$\langle x, y \rangle = x_1 y_1 + 3 x_2 y_2$$
It is easy to check that this satisfies the required properties:

(a) Linearity: follows from the associative/distributive laws in $\mathbb{R}$,
$$\langle \lambda x + y, z \rangle = (\lambda x_1 + y_1) z_1 + 3(\lambda x_2 + y_2) z_2 = \lambda (x_1 z_1 + 3 x_2 z_2) + (y_1 z_1 + 3 y_2 z_2) = \lambda \langle x, z \rangle + \langle y, z \rangle$$

(b) Symmetry: follows from the commutativity of multiplication in $\mathbb{R}$,
$$\langle y, x \rangle = y_1 x_1 + 3 y_2 x_2 = x_1 y_1 + 3 x_2 y_2 = \langle x, y \rangle$$

(c) Positive-definiteness: if $x \neq 0$, then $\langle x, x \rangle = x_1^2 + 3 x_2^2 > 0$.

With respect to $\langle\,,\rangle$, the concept of orthogonality feels strange: e.g., $\{x, y\} = \left\{ \frac{1}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \frac{1}{2\sqrt{3}}\begin{pmatrix} 3 \\ -1 \end{pmatrix} \right\}$ is an orthonormal set!
$$\|x\|^2 = \tfrac{1}{4}(1^2 + 3 \cdot 1^2) = 1, \qquad \|y\|^2 = \tfrac{1}{12}\bigl(3^2 + 3 \cdot (-1)^2\bigr) = 1, \qquad \langle x, y \rangle = \tfrac{1}{4\sqrt{3}}(3 - 3) = 0$$
However, with respect to the standard dot product, these vectors are not special:
$$x \cdot x = \tfrac{1}{2}, \qquad y \cdot y = \tfrac{5}{6}, \qquad x \cdot y = \tfrac{1}{2\sqrt{3}}$$
We have the same vector space $\mathbb{R}^2$, but different inner product spaces: $(\mathbb{R}^2, \langle\,,\rangle) \neq (\mathbb{R}^2, \cdot)$.
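The numbers in Example 2.3 are easy to confirm numerically. The following short script is our own illustration (the helper names `ip` and `dot` are ours, not from the notes):

```python
import math

def ip(u, v):
    """The weighted inner product <u, v> = u1*v1 + 3*u2*v2 of Example 2.3."""
    return u[0]*v[0] + 3*u[1]*v[1]

def dot(u, v):
    """The standard dot product on R^2."""
    return u[0]*v[0] + u[1]*v[1]

x = (1/2, 1/2)                                   # (1/2)(1, 1)^T
y = (3/(2*math.sqrt(3)), -1/(2*math.sqrt(3)))    # (1/(2*sqrt(3)))(3, -1)^T

print(ip(x, x), ip(y, y))   # both 1 (up to rounding): unit vectors for <,>
print(ip(x, y))             # 0 (up to rounding): <,>-orthogonal
print(dot(x, x), dot(y, y), dot(x, y))  # 1/2, 5/6, 1/(2*sqrt(3)): nothing special
```

The same pair of vectors is orthonormal for one inner product and not for the other, which is exactly the point of the example.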
The above is an example of a weighted inner product: choose weights $a_1, \ldots, a_n \in \mathbb{R}^+$ and define
$$\langle x, y \rangle = \sum_{j=1}^n a_j x_j y_j = a_1 x_1 y_1 + \cdots + a_n x_n y_n$$
It is a simple exercise to check that this defines an inner product on $\mathbb{R}^n$. In particular, $\mathbb{R}^n$ may be equipped with infinitely many distinct inner products!

More generally, a symmetric matrix $A \in M_n(\mathbb{R})$ is positive-definite if $x^T A x > 0$ for all non-zero $x \in \mathbb{R}^n$. It is straightforward to check that
$$\langle x, y \rangle := y^T A x$$
defines an inner product on $\mathbb{R}^n$. In fact all inner products on $\mathbb{R}^n$ arise in this fashion! The weighted inner products correspond to $A$ being diagonal (Euclidean space is $A = I$), but this is not required.
Example 2.4. The matrix $A = \begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix}$ is positive-definite and thus defines an inner product
$$\langle x, y \rangle = 3 x_1 y_1 + x_1 y_2 + x_2 y_1 + x_2 y_2$$
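Example 2.4 can be sanity-checked in a few lines. This sketch is ours (the name `ip_A` is an assumption); the sum-of-squares identity it tests is the one suggested by the hint in Exercise 5:

```python
def ip_A(x, y):
    """<x, y> = y^T A x for A = [[3, 1], [1, 1]] (Example 2.4), multiplied out."""
    return 3*x[0]*y[0] + x[0]*y[1] + x[1]*y[0] + x[1]*y[1]

# Positive-definiteness via a sum of squares:
#   <x, x> = 3*x1^2 + 2*x1*x2 + x2^2 = 2*x1^2 + (x1 + x2)^2 > 0 for x != 0
for x in [(1, 0), (0, 1), (1, -2), (-3, 5)]:
    assert ip_A(x, x) == 2*x[0]**2 + (x[0] + x[1])**2 > 0

print(ip_A((1, 2), (3, -1)))  # 12
```

Spot-checking a handful of vectors is no proof, of course; the sum-of-squares rewrite is what actually establishes positivity for every non-zero $x$.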
Lemma 2.5 (Basic properties). Let $V$ be an inner product space, let $x, y, z \in V$ and $\lambda \in \mathbb{F}$.

1. $\langle 0, x \rangle = 0$
2. $\|x\| = 0 \iff x = 0$
3. $\|\lambda x\| = |\lambda| \, \|x\|$
4. $\langle x, z \rangle = \langle y, z \rangle$ for all $z \implies x = y$
5. (Cauchy–Schwarz inequality) $|\langle x, y \rangle| \leq \|x\| \, \|y\|$, with equality if and only if $x, y$ are parallel
6. (Triangle Inequality) $\|x + y\| \leq \|x\| + \|y\|$, with equality if and only if $x, y$ are parallel and point in the same direction

(Figure: the triangle inequality; a triangle with sides $x$, $y$ and $x + y$.)

Be careful with notation: $|\lambda|$ means the absolute value/modulus (of a scalar), while $\|x\|$ means the norm (of a vector).

In the real case, Cauchy–Schwarz allows us to define angle via $\cos\theta = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}$, since the right-hand side lies in the interval $[-1, 1]$. However, except in Euclidean $\mathbb{R}^2$ and $\mathbb{R}^3$, this notion is of limited use; orthogonality ($\langle x, y \rangle = 0$) and orthonormality are usually all we care about.
Proof. Parts 1–3 are exercises. For simplicity, we prove 5 and 6 only when $\mathbb{F} = \mathbb{R}$.

4. Let $z = x - y$, apply the linearity condition and part 2:
$$\langle x, z \rangle = \langle y, z \rangle \implies 0 = \langle x - y, z \rangle = \langle x - y, x - y \rangle = \|x - y\|^2 \implies x = y$$

5. If $y = 0$, the result is trivial. WLOG (and by part 3) we may assume $\|y\| = 1$; if the inequality holds for this, then it holds for all non-zero $y$ by parts 1 and 4. Now expand:
$$0 \leq \|x - \langle x, y \rangle y\|^2 \qquad \text{(positive-definiteness)}$$
$$= \|x\|^2 + |\langle x, y \rangle|^2 \|y\|^2 - \langle x, y \rangle \langle x, y \rangle - \langle x, y \rangle \langle y, x \rangle \qquad \text{(bilinearity)}$$
$$= \|x\|^2 - |\langle x, y \rangle|^2 \qquad \text{(symmetry and } \|y\| = 1\text{)}$$
Taking square-roots establishes the inequality. By part 2, equality holds if and only if $x = \langle x, y \rangle y$, which is precisely when $x, y$ are linearly dependent.

6. We establish the squared result.
$$\bigl(\|x\| + \|y\|\bigr)^2 - \|x + y\|^2 = \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 - \bigl(\|x\|^2 + \langle x, y \rangle + \langle y, x \rangle + \|y\|^2\bigr)$$
$$= 2\bigl(\|x\|\|y\| - \langle x, y \rangle\bigr) \geq 2\bigl(\|x\|\|y\| - |\langle x, y \rangle|\bigr) \geq 0 \qquad \text{(Cauchy–Schwarz)}$$
Equality requires both equality in Cauchy–Schwarz ($x, y$ parallel) and that $\langle x, y \rangle \geq 0$; since $x, y$ are already parallel, this means that one is a non-negative multiple of the other.
Complex Inner Product Spaces

Definition 2.1 is already set up nicely when $\mathbb{F} = \mathbb{C}$. One subtle difference comes from how we expand linear combinations in the second slot.

Lemma 2.6. An inner product is a positive-definite, conjugate-symmetric, sesquilinear(1) form: it is conjugate-linear (anti-linear) in the second slot,
$$\langle z, \lambda x + y \rangle = \overline{\lambda} \langle z, x \rangle + \langle z, y \rangle$$
The proof is very easy if you remember your complex conjugates; try it!

Warning! If you dabble in the dark arts of Physics, be aware that their convention(2) is for an inner product to be conjugate-linear in the first entry and linear in the second!

Definition 2.7. The standard (Hermitian) inner product and norm on $\mathbb{C}^n$ are
$$\langle x, y \rangle = y^* x = \sum_{j=1}^n x_j \overline{y_j} = x_1 \overline{y_1} + \cdots + x_n \overline{y_n}, \qquad \|x\| = \sqrt{\sum_{j=1}^n |x_j|^2}$$
where $y^* = \overline{y}^T$ is the conjugate-transpose of $y$ and $|x_j|$ is the modulus.

Weighted inner products may be defined as before:
$$\langle x, y \rangle = \sum_{j=1}^n a_j x_j \overline{y_j} = a_1 x_1 \overline{y_1} + \cdots + a_n x_n \overline{y_n}$$
Note that the weights $a_j$ must still be positive real numbers. We may similarly define inner products in terms of positive-definite matrices: $\langle x, y \rangle = y^* A x$.

Definition 2.8. A matrix $A \in M_n(\mathbb{C})$ is Hermitian (self-adjoint) if $A^* = A$. It is moreover positive-definite if $x^* A x > 0$ for all non-zero $x \in \mathbb{C}^n$.

The self-adjoint condition reduces to symmetry ($A^T = A$) if $A$ is a real matrix.

Example 2.9. It can be seen (Exercise 6) that $A = \begin{pmatrix} 3 & -i \\ i & 3 \end{pmatrix}$ is positive-definite, whence
$$\langle x, y \rangle = \begin{pmatrix} \overline{y_1} & \overline{y_2} \end{pmatrix} \begin{pmatrix} 3 & -i \\ i & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 3 x_1 \overline{y_1} + i x_1 \overline{y_2} - i x_2 \overline{y_1} + 3 x_2 \overline{y_2}$$
defines an inner product on $\mathbb{C}^2$.
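Python's built-in complex type makes it painless to experiment with Example 2.9. The snippet below is our own check (the function name `ip` is ours); it tests conjugate-symmetry and positivity on sample vectors:

```python
def ip(x, y):
    """<x, y> = y* A x on C^2 with A = [[3, -i], [i, 3]], multiplied out as in Example 2.9."""
    x1, x2 = x
    y1, y2 = y
    return (3*x1*y1.conjugate() + 1j*x1*y2.conjugate()
            - 1j*x2*y1.conjugate() + 3*x2*y2.conjugate())

x = (1 + 1j, 2 - 1j)
y = (3j, -1)

print(ip(x, y) == ip(y, x).conjugate())  # True: conjugate-symmetry
print(ip(x, x))                          # (15+0j): <x, x> is real and positive
```

Two sample vectors do not prove positive-definiteness, but Exercise 6's sum-of-modulus-squares argument does.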
Almost all results in this chapter will be written for general inner product spaces, thus covering the real and complex cases simultaneously. If you don't feel confident with complex numbers, simply let $\mathbb{F} = \mathbb{R}$ and delete all complex conjugates at first read! Very occasionally a different proof will be required depending on the field. For simplicity, examples will more often use real inner products.

(1) The prefix sesqui- means one-and-a-half; for instance a sesquicentenary is a 150-year anniversary.
(2) The common Physics notation relates to ours via $\langle x \,|\, y \rangle = \langle y, x \rangle$.
Further Examples

As before, the field $\mathbb{F}$ must be either $\mathbb{R}$ or $\mathbb{C}$.

Definition 2.10 (Frobenius inner product). If $A, B \in M_{m \times n}(\mathbb{F})$, define
$$\langle A, B \rangle = \operatorname{tr}(B^* A)$$
where $\operatorname{tr}$ is the trace of an $n \times n$ matrix; this makes $M_{m \times n}(\mathbb{F})$ into an inner product space.

This isn't really a new example: if we identify $M_{m \times n}(\mathbb{F})$ with $\mathbb{F}^{mn}$ by stacking the columns of $A$, then the Frobenius inner product is the standard inner product in disguise.

Example 2.11. In $M_{3 \times 2}(\mathbb{C})$,
$$\left\langle \begin{pmatrix} 1 & i \\ 2-i & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 7 \\ -1 & -2i \\ 3-2i & 4 \end{pmatrix} \right\rangle = \operatorname{tr}\left[ \begin{pmatrix} 0 & -1 & 3+2i \\ 7 & 2i & 4 \end{pmatrix} \begin{pmatrix} 1 & i \\ 2-i & 0 \\ 0 & 1 \end{pmatrix} \right] = \operatorname{tr} \begin{pmatrix} -2+i & 3+2i \\ 9+4i & 4+7i \end{pmatrix} = (-2+i) + (4+7i) = 2+8i$$
$$\left\| \begin{pmatrix} 1 & i \\ 2-i & 0 \\ 0 & 1 \end{pmatrix} \right\|^2 = \operatorname{tr}\left[ \begin{pmatrix} 1 & 2+i & 0 \\ -i & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & i \\ 2-i & 0 \\ 0 & 1 \end{pmatrix} \right] = \operatorname{tr} \begin{pmatrix} 6 & i \\ -i & 2 \end{pmatrix} = 8$$
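Since $\operatorname{tr}(B^* A) = \sum_{j,k} A_{jk} \overline{B_{jk}}$, the Frobenius inner product can be computed entrywise without forming $B^* A$. A small check of Example 2.11, written by us as an illustration (the name `frobenius` is ours):

```python
def frobenius(A, B):
    """Frobenius inner product <A, B> = tr(B* A) = sum of A_jk * conj(B_jk)."""
    return sum(a * b.conjugate()
               for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

A = [[1, 1j], [2 - 1j, 0], [0, 1]]
B = [[0, 7], [-1, -2j], [3 - 2j, 4]]

print(frobenius(A, B))  # (2+8j)
print(frobenius(A, A))  # (8+0j): ||A||^2 = 8
```

The entrywise formula is exactly the "stack the columns" identification mentioned above: it is the standard inner product on $\mathbb{C}^6$.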
Definition 2.12 ($L^2$ inner product). Given a real interval $[a, b]$, the formula
$$\langle f, g \rangle := \int_a^b f(t) \overline{g(t)} \, dt$$
defines an inner product on the space $C[a, b]$ of continuous functions $f : [a, b] \to \mathbb{F}$.

With careful restriction, this works even for infinite intervals and a larger class of functions.(3) Verifying the required properties is straightforward if you know a little analysis; for instance continuity allows us to conclude
$$\|f\|^2 = \int_a^b |f(x)|^2 \, dx = 0 \iff f(x) \equiv 0$$
This is our first example of an infinite-dimensional inner product space.

Example 2.13. Let $f(x) = x$ and $g(x) = x^2$; these lie in the inner product space $C[-1, 1]$ with respect to the $L^2$ inner product.
$$\langle f, g \rangle = \int_{-1}^1 x^3 \, dx = 0, \qquad \|f\|^2 = \int_{-1}^1 x^2 \, dx = \frac{2}{3}, \qquad \|g\|^2 = \int_{-1}^1 x^4 \, dx = \frac{2}{5}$$
With some simple scaling, we see that
$$\left\{ \frac{1}{\|f\|} f, \ \frac{1}{\|g\|} g \right\} = \left\{ \sqrt{\tfrac{3}{2}}\, x, \ \sqrt{\tfrac{5}{2}}\, x^2 \right\}$$
forms an orthonormal set.
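For polynomials the $L^2$ integrals can be evaluated exactly, since $\int_{-1}^{1} x^k \, dx$ is $0$ for odd $k$ and $\frac{2}{k+1}$ for even $k$. The following exact-arithmetic check of Example 2.13 is our own sketch (the helper `l2_sym` is ours; polynomials are coefficient lists, constant term first):

```python
from fractions import Fraction

def l2_sym(p, q):
    """<p, q> = integral over [-1, 1] of p(x)q(x) dx, computed exactly."""
    prod = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            prod[i + j] += Fraction(a) * Fraction(b)
    # integral of x^k over [-1, 1] is 0 for odd k and 2/(k+1) for even k
    return sum(Fraction(2, k + 1) * c for k, c in enumerate(prod) if k % 2 == 0)

f, g = [0, 1], [0, 0, 1]           # f(x) = x, g(x) = x^2
print(l2_sym(f, g))                # 0
print(l2_sym(f, f), l2_sym(g, g))  # 2/3 2/5
```

Using `fractions.Fraction` avoids the rounding error a numerical quadrature would introduce.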
(3) For us, functions will always be continuous (often polynomials) on closed bounded intervals. The square-integrable functions and $L^2$-spaces for which the inner product is named are a more complicated business and beyond this course.
Definition 2.14 ($\ell^2$ inner product). A sequence $(x_n)$ is square-summable if $\sum_{n=1}^\infty |x_n|^2 < \infty$. These sequences form a vector space on which we can define an inner product(4)
$$\bigl\langle (x_n), (y_n) \bigr\rangle = \sum_{n=1}^\infty x_n \overline{y_n}$$
In essence we've taken the standard inner product on $\mathbb{F}^n$ and let $n \to \infty$! This example, and its $L^2$ cousin, are the prototypical Hilbert spaces, which have great application to differential equations, signal processing, etc. Since a rigorous discussion requires a significant amount of analysis (convergence of series, completeness, integrability), these objects are generally beyond the course.

Our final example of an inner product is a useful, and hopefully obvious, hack to which we shall repeatedly appeal in examples.

Lemma 2.15. Let $V$ be a vector space over $\mathbb{R}$ or $\mathbb{C}$. If $\beta$ is a basis of $V$, then there exists exactly one inner product on $V$ for which $\beta$ is an orthonormal set.
Exercises 2.1

1. Evaluate the inner product of the given vectors.
(a) $x = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$, $y = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ where $\langle x, y \rangle = 2 x_1 y_1 + 3 x_2 y_2 + x_3 y_3$
(b) $x = \begin{pmatrix} 1 \\ 2i \end{pmatrix}$, $y = \begin{pmatrix} 5i \\ 4 \end{pmatrix}$ where $\langle x, y \rangle$ is the standard Hermitian inner product on $\mathbb{C}^2$
(c) $x = \begin{pmatrix} 1 \\ 2i \end{pmatrix}$, $y = \begin{pmatrix} 5i \\ 4 \end{pmatrix}$ where $\langle x, y \rangle = y^* \begin{pmatrix} 2 & i \\ -i & 2 \end{pmatrix} x$
(d) $f(x) = x - 1$, $g(x) = x + 1$ where $\langle f, g \rangle$ is the $L^2$ inner product on $C[0, 2]$

2. Suppose $\langle x, y \rangle = \langle x, z \rangle$ for all $x$. Prove that $y = z$.

3. For each $z \in V$, the linearity condition says that the map $T_z : V \to \mathbb{F}$ defined by $T_z(x) = \langle x, z \rangle$ is linear ($T_z$ is an element of the dual space $V^*$). What, if anything, can you say about the function $U_z : x \mapsto \langle z, x \rangle$?

4. Define $\langle x, y \rangle := \sum_{j=1}^n x_j y_j$ on $\mathbb{C}^n$. Is this an inner product? Which of the properties (a), (b), (c) from Definition 2.1 does it satisfy?

5. (a) Verify that the matrix in Example 2.4 is positive-definite and that $\langle x, y \rangle = 3 x_1 y_1 + x_1 y_2 + x_2 y_1 + x_2 y_2$ therefore defines an inner product on $\mathbb{R}^2$.
(Hint: Try to write $\|x\|^2$ as a sum of squares. . . )
(b) Let $x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. With respect to the inner product in part (a), find a non-zero unit vector $y$ which is orthogonal to $x$.

6. By multiplying out $|x_1 - i x_2|^2$, show that the matrix $\begin{pmatrix} 3 & -i \\ i & 3 \end{pmatrix}$ is positive-definite.
(Hint: recall that $|a|^2 = a\overline{a}$ for complex numbers!)
(4) Neither of these facts is obvious; in particular, we'd need to see that the sum of two square-summable sequences is also square-summable.
7. Show that every eigenvalue of a positive-definite matrix is positive.

8. Prove parts 1, 2 and 3 of Lemma 2.5.

9. Let $V$ be an inner product space; prove Pythagoras' Theorem:
If $\langle x, y \rangle = 0$, then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.

10. Use basic algebra to prove the Cauchy–Schwarz inequality for vectors $x = \begin{pmatrix} a \\ b \end{pmatrix}$ and $y = \begin{pmatrix} c \\ d \end{pmatrix}$ in $\mathbb{R}^2$ with the standard (dot) product.

11. Prove the Cauchy–Schwarz and triangle inequalities for a complex inner product space. What has to change compared to the proof of Lemma 2.5?

12. Prove the polarization identities:
(a) In any real inner product space: $\langle x, y \rangle = \frac{1}{4}\|x + y\|^2 - \frac{1}{4}\|x - y\|^2$
(b) In any complex inner product space: $\langle x, y \rangle = \frac{1}{4} \sum_{k=1}^4 i^k \|x + i^k y\|^2$
If you know the length of every vector, then you know the inner product!

13. Prove that $\int_0^2 \frac{\sqrt{x}}{x+1} \, dx \leq \frac{2}{\sqrt{3}}$. (Hint: use Cauchy–Schwarz)

14. Let $m \in \mathbb{Z}$ and consider the complex-valued function $f_m(x) = \frac{1}{\sqrt{2\pi}} e^{imx}$. If $\langle\,,\rangle$ is the $L^2$ inner product on $C[-\pi, \pi]$, prove that $\{f_m : m \in \mathbb{Z}\}$ is an orthonormal set.
This example is central to the study of Fourier series.
(Hint: If complex functions are scary, use Euler's formula $e^{imx} = \cos mx + i \sin mx$ and work with the real-valued functions $\cos mx$ and $\sin mx$. The difficulty is that you then need integration by parts. . . )

15. Let $\langle\,,\rangle$ be an inner product on $\mathbb{F}^n$ (recall that $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$). Define the matrix $A \in M_n(\mathbb{F})$ by $A_{jk} = \langle e_k, e_j \rangle$, where $\{e_1, \ldots, e_n\}$ is the standard basis. Verify that $A$ is the matrix of the inner product:
$$\forall x, y \in \mathbb{F}^n, \qquad \langle x, y \rangle = y^* A x$$
In particular,
- $A$ is a Hermitian/self-adjoint matrix: $A^* = A$ (if $\mathbb{F} = \mathbb{R}$ this is simply symmetric);
- $A$ is positive-definite: for all non-zero $x \in \mathbb{F}^n$, $x^* A x > 0$.
More generally, if $\beta = \{v_1, \ldots, v_n\}$ is a basis then $A_{jk} = \langle v_k, v_j \rangle$ defines the matrix of the inner product with respect to $\beta$: $\langle x, y \rangle = [y]_\beta^* A [x]_\beta$.
2.2 Orthogonal Sets and the Gram–Schmidt Process

We start with a simple definition, relating subspaces to orthogonality.

Definition 2.16. Let $U$ be a subspace of an inner product space $V$. The orthogonal complement $U^\perp$ is the set
$$U^\perp = \{ x \in V : \forall u \in U, \ \langle x, u \rangle = 0 \}$$

It is easy to check that $U^\perp$ is itself a subspace of $V$ and that $U \cap U^\perp = \{0\}$. It can moreover be seen that $U \subseteq (U^\perp)^\perp$, though equality need not hold in infinite dimensions (see Exercise 7).

Example 2.17. $U = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ -3 \end{pmatrix} \right\} \leq \mathbb{R}^3$ has orthogonal complement $U^\perp = \operatorname{Span}\left\{ \begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix} \right\}$.

As in the example, we often have a direct sum decomposition $V = U \oplus U^\perp$: otherwise said,
$$\forall x \in V, \ \exists! \, u \in U, \ w \in U^\perp \ \text{such that} \ x = u + w$$
In such a case, we'll call $u = \pi_U(x)$ and $w = \pi_{U^\perp}(x)$ the orthogonal projections of $x$ onto $U$ and $U^\perp$ respectively. As our first result shows, these are easy to compute whenever $U$ has a finite orthogonal basis.
Theorem 2.18. Let $V$ be an inner product space and let $U = \operatorname{Span} \beta$ where $\beta = \{u_1, \ldots, u_n\}$ is an orthogonal set of non-zero vectors;
$$\langle u_j, u_k \rangle = \begin{cases} 0 & \text{if } j \neq k \\ \|u_j\|^2 \neq 0 & \text{if } j = k \end{cases}$$
Then:

1. $\beta$ is a basis of $U$ and each $x \in U$ has unique representation
$$x = \sum_{j=1}^n \frac{\langle x, u_j \rangle}{\|u_j\|^2} \, u_j \qquad (*)$$
This simplifies to $x = \sum \langle x, u_j \rangle u_j$ if $\beta$ is an orthonormal set.

2. $V = U \oplus U^\perp$. For any $x \in V$, we may write $x = u + w$ where
$$u = \sum_{j=1}^n \frac{\langle x, u_j \rangle}{\|u_j\|^2} \, u_j \in U \quad \text{and} \quad w = x - u \in U^\perp$$

Observe that $(*)$ essentially calculates the co-ordinate vector $[x]_\beta \in \mathbb{F}^n$. Recalling how unpleasant such calculations have been in the past, often requiring large matrix inversions, we immediately see the power of inner products and orthogonal bases.
Proof. 1. Since $\beta$ spans $U$, a given $x \in U$ may be written $x = \sum_{k=1}^n a_k u_k$ for some scalars $a_k$. The orthogonality of $\beta$ recovers the required expression for $a_j$:
$$\langle x, u_j \rangle = \sum_{k=1}^n a_k \langle u_k, u_j \rangle = a_j \|u_j\|^2$$
Finally, let $x = 0$ to see that $\beta$ is linearly independent.

2. Clearly $u \in U$. For each $u_k$, the orthogonality of $\beta$ tells us that
$$\langle w, u_k \rangle = \langle x - u, u_k \rangle = \langle x, u_k \rangle - \sum_{j=1}^n \frac{\langle x, u_j \rangle}{\|u_j\|^2} \langle u_j, u_k \rangle = \langle x, u_k \rangle - \langle x, u_k \rangle = 0$$
Since $w$ is orthogonal to a basis of $U$ it is orthogonal to every element of $U$; we conclude that $w \in U^\perp$. Finally, $U \cap U^\perp = \{0\}$ forces the uniqueness of the decomposition $x = u + w$, whence $V = U \oplus U^\perp$.
Examples 2.19. 1. Consider the standard orthonormal basis $\beta = \{e_1, e_2\}$ of $\mathbb{R}^2$. For any $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, we easily check that
$$\sum_{j=1}^2 \langle x, e_j \rangle e_j = x_1 e_1 + x_2 e_2 = x$$

2. In $\mathbb{R}^3$, $\beta = \{u_1, u_2, u_3\} = \left\{ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ 6 \\ -5 \end{pmatrix} \right\}$ is an orthogonal set and thus a basis. We compute the co-ordinates of $x = \begin{pmatrix} 7 \\ 4 \\ 2 \end{pmatrix}$ with respect to $\beta$:
$$x = \sum_{j=1}^3 \frac{\langle x, u_j \rangle}{\|u_j\|^2} \, u_j = \frac{7 + 8 + 6}{1 + 4 + 9} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \frac{14 - 4 + 0}{4 + 1 + 0} \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} + \frac{21 + 24 - 10}{9 + 36 + 25} \begin{pmatrix} 3 \\ 6 \\ -5 \end{pmatrix}$$
$$= \frac{3}{2} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + 2 \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 3 \\ 6 \\ -5 \end{pmatrix} \implies [x]_\beta = \begin{pmatrix} 3/2 \\ 2 \\ 1/2 \end{pmatrix}$$
Compare this with the painfully slow augmented matrix method for finding co-ordinates!

3. Revisiting Example 2.17, let $x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$. Since $\beta = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ -3 \end{pmatrix} \right\}$ is an orthogonal basis of $U$, we observe that
$$\pi_U(x) = 1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \frac{-2}{10} \begin{pmatrix} 0 \\ 1 \\ -3 \end{pmatrix} = \frac{1}{5} \begin{pmatrix} 5 \\ -1 \\ 3 \end{pmatrix}, \qquad \pi_{U^\perp}(x) = x - \pi_U(x) = \frac{2}{5} \begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}$$
are the orthogonal projections corresponding to $\mathbb{R}^3 = U \oplus U^\perp$.
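The co-ordinate formula of Theorem 2.18 is a one-liner in code. The script below is an illustration we have added, reproducing the arithmetic of Example 2.19.2 (helper names are ours):

```python
def dot(u, v):
    return sum(a*b for a, b in zip(u, v))

beta = [(1, 2, 3), (2, -1, 0), (3, 6, -5)]   # the orthogonal basis of Example 2.19.2
x = (7, 4, 2)

# co-ordinates [x]_beta via <x, u_j> / ||u_j||^2 -- no matrix inversion needed
coords = [dot(x, u) / dot(u, u) for u in beta]
print(coords)   # [1.5, 2.0, 0.5]

# reassemble x from its co-ordinates as a check
rebuilt = [sum(c * u[i] for c, u in zip(coords, beta)) for i in range(3)]
print(rebuilt)  # [7.0, 4.0, 2.0]
```

Compare the single loop here with solving an augmented linear system: orthogonality replaces elimination entirely.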
The Gram–Schmidt Process

Theorem 2.18 tells us how to compute the orthogonal projections corresponding to $V = U \oplus U^\perp$, provided $U$ has a finite, orthogonal basis. Given how useful such bases are, our next goal is to see that they exist for any finite-dimensional subspace. Helpfully there exists a constructive algorithm.

Theorem 2.20 (Gram–Schmidt). Suppose $S = \{s_1, \ldots, s_n\}$ is a linearly independent subset of an inner product space $V$. Construct a sequence of vectors $u_1, \ldots, u_n$ inductively:
- Choose $u_1 = a_1 s_1$ where $a_1 \neq 0$
- For each $k \geq 2$, choose
$$u_k = a_k \left( s_k - \sum_{j=1}^{k-1} \frac{\langle s_k, u_j \rangle}{\|u_j\|^2} \, u_j \right) \ \text{where} \ a_k \neq 0 \qquad (**)$$
Then $\beta := \{u_1, \ldots, u_n\}$ is an orthogonal basis of $\operatorname{Span} S$.

The purpose of the scalars $a_k$ is to give you some freedom; choose them to avoid unpleasant fractions! If you want a set of orthonormal vectors, it is easier to scale everything after the algorithm is complete. Indeed, by taking $S$ to be a basis of $V$ and normalizing the resulting $\beta$, we conclude:

Corollary 2.21. Every finite-dimensional inner product space has an orthonormal basis.
Example 2.22. $S = \{s_1, s_2, s_3\} = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ -3 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}$ is a linearly independent subset of $\mathbb{R}^3$.

1. Choose $u_1 = s_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \implies \|u_1\|^2 = 1$

2. $s_2 - \frac{\langle s_2, u_1 \rangle}{\|u_1\|^2} u_1 = \begin{pmatrix} 2 \\ 1 \\ -3 \end{pmatrix} - 2 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ -3 \end{pmatrix}$: choose $u_2 = \begin{pmatrix} 0 \\ 1 \\ -3 \end{pmatrix} \implies \|u_2\|^2 = 10$

3. $s_3 - \frac{\langle s_3, u_1 \rangle}{\|u_1\|^2} u_1 - \frac{\langle s_3, u_2 \rangle}{\|u_2\|^2} u_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - 1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \frac{2}{10} \begin{pmatrix} 0 \\ 1 \\ -3 \end{pmatrix} = \frac{2}{5} \begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}$: choose $u_3 = \begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}$

The orthogonality of $\beta = \{u_1, u_2, u_3\}$ is clear. It is now trivial to observe that $\left\{ u_1, \frac{1}{\sqrt{10}} u_2, \frac{1}{\sqrt{10}} u_3 \right\}$ is an orthonormal basis of $\mathbb{R}^3$.
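The recipe of Theorem 2.20 can be sketched in a few lines of Python. This is our own illustration (classical Gram–Schmidt with every scale factor $a_k = 1$, so the output differs from the hand computation only by the scalings chosen there):

```python
def dot(u, v):
    return sum(a*b for a, b in zip(u, v))

def gram_schmidt(S):
    """Orthogonalize a linearly independent list of vectors (Theorem 2.20, a_k = 1)."""
    basis = []
    for s in S:
        u = list(s)
        for b in basis:
            c = dot(s, b) / dot(b, b)              # <s_k, u_j> / ||u_j||^2
            u = [ui - c*bi for ui, bi in zip(u, b)]
        basis.append(u)
    return basis

S = [(1, 0, 0), (2, 1, -3), (1, 1, 1)]             # the set S of Example 2.22
for u in gram_schmidt(S):
    print(u)   # multiples of (1, 0, 0), (0, 1, -3) and (0, 3, 1), as in the example
```

In exact arithmetic, projecting $s_k$ onto the already-orthogonal $u_j$ one at a time gives the same result as the displayed formula; in floating point this "modified" ordering is also the numerically safer habit.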
Proof of Theorem 2.20. For each $k \leq n$, define $S_k = \{s_1, \ldots, s_k\}$ and $\beta_k = \{u_1, \ldots, u_k\}$. We prove by induction that each $\beta_k$ is an orthogonal set of non-zero vectors and that $\operatorname{Span} \beta_k = \operatorname{Span} S_k$. The Theorem is then the terminal case $k = n$.

(Base case $k = 1$) Certainly $\beta_1 = \{a_1 s_1\}$ is an orthogonal set and $\operatorname{Span} \beta_1 = \operatorname{Span} S_1$.

(Induction step) Fix $k \geq 2$, assume $\beta_{k-1}$ is an orthogonal set of non-zero vectors and that $\operatorname{Span} \beta_{k-1} = \operatorname{Span} S_{k-1}$. By Theorem 2.18, $u_k \in (\operatorname{Span} \beta_{k-1})^\perp$. We also see that $u_k \neq 0$, for if not,
$$(**) \implies s_k \in \operatorname{Span} \beta_{k-1} = \operatorname{Span} S_{k-1}$$
and $S$ would be linearly dependent. It follows that $\beta_k$ is an orthogonal set of non-zero vectors. Moreover, $s_k \in \operatorname{Span} \beta_k \implies \operatorname{Span} S_k \subseteq \operatorname{Span} \beta_k$. Since these spaces have the same (finite) dimension $k$, we conclude that $\operatorname{Span} \beta_k = \operatorname{Span} S_k$.

By induction, $\beta$ is an orthogonal, non-zero spanning set for $\operatorname{Span} S$; by Theorem 2.18, it is a basis.
Example 2.23. This time we work in the space of polynomials $P(\mathbb{R})$ equipped with the $L^2$ inner product $\langle f, g \rangle = \int_0^1 f(x) g(x) \, dx$ on the interval $[0, 1]$. Let $S = \{1, x, x^2\}$ and apply the algorithm:

1. Choose $f_1(x) = 1 \implies \|f_1\|^2 = \int_0^1 1 \, dx = 1$

2. $x - \frac{\langle x, f_1 \rangle}{\|f_1\|^2} f_1 = x - \int_0^1 x \, dx = x - \frac{1}{2}$. We choose $f_2(x) = 2x - 1$, with $\|f_2\|^2 = \int_0^1 (2x - 1)^2 \, dx = \frac{1}{3}$

3. $x^2 - \frac{\langle x^2, f_1 \rangle}{\|f_1\|^2} f_1 - \frac{\langle x^2, f_2 \rangle}{\|f_2\|^2} f_2 = x^2 - \int_0^1 x^2 \, dx - \frac{\int_0^1 x^2 (2x - 1) \, dx}{1/3} (2x - 1) = x^2 - x + \frac{1}{6}$. We choose $f_3(x) = 6x^2 - 6x + 1$, with $\|f_3\|^2 = \int_0^1 (6x^2 - 6x + 1)^2 \, dx = \frac{1}{5}$

It follows that $\operatorname{Span} S$ has an orthonormal basis
$$\beta = \left\{ 1, \ \sqrt{3}(2x - 1), \ \sqrt{5}(6x^2 - 6x + 1) \right\}$$

This example can be extended to arbitrary degree since the countable set $\{1, x, x^2, \ldots\}$ is a basis of $P(\mathbb{R})$. Indeed this shows that $(P(\mathbb{R}), \langle\,,\rangle)$ has an orthonormal basis.
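The orthogonality and norms claimed in Example 2.23 can be verified exactly, since $\int_0^1 x^k \, dx = \frac{1}{k+1}$. The following exact-arithmetic check is our own sketch (the helper `l2_01` is ours; polynomials are coefficient lists, constant term first):

```python
from fractions import Fraction

def l2_01(p, q):
    """<p, q> = integral_0^1 p(x) q(x) dx, computed exactly for coefficient lists."""
    prod = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            prod[i + j] += Fraction(a) * Fraction(b)
    return sum(c / (k + 1) for k, c in enumerate(prod))   # integral of x^k is 1/(k+1)

f1, f2, f3 = [1], [-1, 2], [1, -6, 6]   # 1, 2x - 1, 6x^2 - 6x + 1
print(l2_01(f1, f2), l2_01(f1, f3), l2_01(f2, f3))  # 0 0 0
print(l2_01(f2, f2), l2_01(f3, f3))                 # 1/3 1/5
```

Mutual orthogonality plus the norms $1$, $\frac{1}{\sqrt{3}}$, $\frac{1}{\sqrt{5}}$ confirm that the scaled set $\{1, \sqrt{3}(2x-1), \sqrt{5}(6x^2 - 6x + 1)\}$ is orthonormal.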
Gram–Schmidt also shows that our earlier discussion of orthogonal projections is generic.

Corollary 2.24. If $U$ is a finite-dimensional subspace of an inner product space $V$, then $V = U \oplus U^\perp$ and the orthogonal projections may be computed as in Theorem 2.18.

If $U$ is an infinite-dimensional subspace of $V$, then we need not have $V = U \oplus U^\perp$ and the orthogonal projections might not be well-defined (see, for example, Exercises 7 and 8). Instead, if $\beta$ is an orthonormal basis of $U$, it is common to describe the coefficients $\langle x, u \rangle$ for each $u \in \beta$ as the Fourier coefficients, and the infinite sum $\sum_{u \in \beta} \langle x, u \rangle u$ as the Fourier series of $x$, provided the sum converges.
Exercises 2.2

1. Apply Gram–Schmidt to obtain an orthogonal basis $\beta$ for $\operatorname{Span} S$. Then obtain the co-ordinate representation (Fourier coefficients) of the given vector with respect to $\beta$.
(a) $S = \left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\} \subseteq \mathbb{R}^3$, $x = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$
(b) $S = \left\{ \begin{pmatrix} 3 & 5 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 9 \\ 5 & 1 \end{pmatrix}, \begin{pmatrix} 7 & 17 \\ 2 & 6 \end{pmatrix} \right\} \subseteq M_2(\mathbb{R})$, $X = \begin{pmatrix} 1 & 27 \\ 4 & 8 \end{pmatrix}$ (use the Frobenius product)
(c) $S = \left\{ \begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ i \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\} \subseteq \mathbb{C}^3$ with $x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$
(d) $S = \{1, x, x^2\}$ with $\langle f, g \rangle = \int_{-1}^1 f(x) g(x) \, dx$ and $f(x) = x^2$.

Important! You'll likely need much more practice than this to get comfortable with Gram–Schmidt; make up your own problems!
2. Let $S = \{s_1, s_2\} = \left\{ \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} \right\}$ and $U = \operatorname{Span} S \leq \mathbb{R}^3$. Find $\pi_U(x)$ if $x = \begin{pmatrix} 3 \\ 1 \\ 2 \end{pmatrix}$.
(Hint: First apply Gram–Schmidt)

3. Find the orthogonal complement to $U = \operatorname{Span}\{x^2\} \leq P_2(\mathbb{R})$ with respect to the inner product $\langle f, g \rangle = \int_0^1 f(t) g(t) \, dt$.

4. Let $T \in L(V, W)$ where $V, W$ are inner product spaces with orthonormal bases $\beta = \{v_1, \ldots, v_n\}$ and $\gamma = \{w_1, \ldots, w_m\}$ respectively. Prove that the matrix $A = [T]^\gamma_\beta \in M_{m \times n}(\mathbb{F})$ of $T$ with respect to these bases has $jk$-th entry $A_{jk} = \langle T(v_k), w_j \rangle$.

5. Suppose that $\beta$ is an orthonormal basis of an $n$-dimensional inner product space $V$. Prove that, $\forall x, y \in V$,
$$\langle x, y \rangle = [y]_\beta^* [x]_\beta$$
Otherwise said, the co-ordinate isomorphism $\varphi_\beta : V \to \mathbb{F}^n$ defined by $\varphi_\beta(x) = [x]_\beta$ is an isomorphism of inner product spaces, where we use the standard inner product on $\mathbb{F}^n$.

6. Let $U$ be a subspace of an inner product space $V$. Prove the following:
(a) $U^\perp$ is a subspace of $V$.
(b) $U \cap U^\perp = \{0\}$
(c) $U \subseteq (U^\perp)^\perp$
(d) If $V = U \oplus U^\perp$, then $U = (U^\perp)^\perp$ (this is always the case when $\dim U < \infty$)

7. Let $\ell^2$ be the set of square-summable sequences of real numbers (Definition 2.14). Consider the sequences $u_1, u_2, u_3, \ldots$, where $u_j$ is the zero sequence except for a single 1 in the $j$-th entry. For instance,
$$u_4 = (0, 0, 0, 1, 0, 0, 0, 0, \ldots)$$
(a) Let $U = \operatorname{Span}\{u_j : j \in \mathbb{N}\}$. Prove that $U^\perp$ contains only the zero sequence.
(b) Show that the sequence $y = \left(\frac{1}{n}\right)$ lies in $\ell^2$, but does not lie in $U$.
$U$ is therefore a proper subset of $(U^\perp)^\perp = \ell^2$, and $\ell^2 \neq U \oplus U^\perp$.

8. Recall Exercise 2.1.14 where we saw that the set $\beta = \{\frac{1}{\sqrt{2\pi}} e^{imx} : m \in \mathbb{Z}\}$ is orthonormal with respect to $\langle f, g \rangle = \int_{-\pi}^\pi f(t) \overline{g(t)} \, dt$.
(a) Show that the Fourier series of $f(x) = x$ is
$$F(x) := \sum_{m=-\infty}^\infty \left\langle x, \tfrac{1}{\sqrt{2\pi}} e^{imx} \right\rangle \tfrac{1}{\sqrt{2\pi}} e^{imx} = \sum_{m=1}^\infty \frac{2(-1)^{m+1}}{m} \sin mx$$
(b) Briefly explain why the Fourier series is not an element of $\operatorname{Span} \beta$.
(c) Sketch a few of the Fourier approximations (sum up to $m = 5$ or 7. . . ) and observe, when extended to $\mathbb{R}$, how they approximate a discontinuous periodic function.

9. (Hard) Much of Theorem 2.18 remains true, with suitable modifications, even if $\beta$ is an infinite set. Restate and prove as much as you can, and identify the false part(s).
2.3 The Adjoint of a Linear Operator

Recall how the standard inner product on $\mathbb{F}^n$ may be written in terms of the conjugate-transpose:
$$\langle x, y \rangle = y^* x = \overline{y}^T x$$
We start by inserting a matrix into this expression and interpreting it in two different ways. Suppose $A \in M_{m \times n}(\mathbb{F})$, $v \in \mathbb{F}^n$ and $w \in \mathbb{F}^m$; then
$$\underbrace{\langle A^* w, v \rangle}_{\text{in } \mathbb{F}^n} = v^* (A^* w) = (v^* A^*) w = (A v)^* w = \underbrace{\langle w, A v \rangle}_{\text{in } \mathbb{F}^m} \qquad (\dagger)$$

Example 2.25. As a sanity check, let $A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix} \in M_2(\mathbb{R})$, $w = \begin{pmatrix} x \\ y \end{pmatrix}$ and $v = \begin{pmatrix} p \\ q \end{pmatrix}$. Then,
$$\left\langle A^T \begin{pmatrix} x \\ y \end{pmatrix}, \begin{pmatrix} p \\ q \end{pmatrix} \right\rangle = \left\langle \begin{pmatrix} 1 & 0 \\ 2 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \begin{pmatrix} p \\ q \end{pmatrix} \right\rangle = \left\langle \begin{pmatrix} x \\ 2x + 3y \end{pmatrix}, \begin{pmatrix} p \\ q \end{pmatrix} \right\rangle = xp + (2x + 3y)q$$
$$\left\langle \begin{pmatrix} x \\ y \end{pmatrix}, A \begin{pmatrix} p \\ q \end{pmatrix} \right\rangle = \left\langle \begin{pmatrix} x \\ y \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} p \\ q \end{pmatrix} \right\rangle = \left\langle \begin{pmatrix} x \\ y \end{pmatrix}, \begin{pmatrix} p + 2q \\ 3q \end{pmatrix} \right\rangle = x(p + 2q) + 3yq$$

Note how the inner products are evaluated on different spaces. At the level of linear maps this is a relationship between $L_A \in L(\mathbb{F}^n, \mathbb{F}^m)$ and $L_{A^*} \in L(\mathbb{F}^m, \mathbb{F}^n)$, one that is easily generalizable.

Definition 2.26. Let $T \in L(V, W)$ where $V, W$ are inner product spaces over the same field $\mathbb{F}$. The adjoint of $T$ is a function $T^* : W \to V$ (read 'T-star') satisfying
$$\forall v \in V, \ w \in W, \qquad \langle T^*(w), v \rangle = \langle w, T(v) \rangle$$
Note that the first inner product is computed within $V$ and the second within $W$.

The adjoint effectively extends the conjugate-transpose to linear maps. We now use the same notation for three objects, so be careful!
- If $A$ is a real or complex matrix, then $A^* = \overline{A}^T$ is its conjugate-transpose.
- If $T$ is a linear map, then $T^*$ is its adjoint.
- If $V$ is a vector space, then $V^* = L(V, \mathbb{F})$ is its dual space.
Thankfully the two notations line up nicely, as part 3 of our first result shows.

Theorem 2.27 (Basic Properties).
1. If an adjoint exists,(5) then it is unique and linear.
2. If $T$ and $S$ have adjoints, then $(T^*)^* = T$, $(TS)^* = S^* T^*$ and $(\lambda T + S)^* = \overline{\lambda} T^* + S^*$.
3. Suppose $V, W$ are finite-dimensional with orthonormal bases $\beta, \gamma$ respectively. Then the matrix of the adjoint of $T \in L(V, W)$ is the conjugate-transpose of the original: $[T^*]^\beta_\gamma = ([T]^\gamma_\beta)^*$.

(5) Existence of adjoints is trickier, so we postpone this a little: see Corollary 2.34 and Exercise 12.
Proof. 1. (Uniqueness) Suppose $T^*$ and $S^*$ are adjoints of $T$. Then
$$\langle T^*(x), y \rangle = \langle x, T(y) \rangle = \langle S^*(x), y \rangle$$
Since this holds for all $y$, Lemma 2.5 part 4 says that $\forall x$, $T^*(x) = S^*(x)$, whence $T^* = S^*$.

(Linearity) Simply translate across, use the linearity of $T$, and again appeal to Lemma 2.5: $\forall z$,
$$\langle T^*(\lambda x + y), z \rangle = \langle \lambda x + y, T(z) \rangle = \lambda \langle x, T(z) \rangle + \langle y, T(z) \rangle = \lambda \langle T^*(x), z \rangle + \langle T^*(y), z \rangle = \langle \lambda T^*(x) + T^*(y), z \rangle$$
$$\implies T^*(\lambda x + y) = \lambda T^*(x) + T^*(y)$$

2. These may be proved similarly to part 1 and are left as an exercise.

3. By Exercise 2.2.4, the $jk$-th entry of $[T^*]^\beta_\gamma$ is
$$\langle T^*(w_k), v_j \rangle = \langle w_k, T(v_j) \rangle = \overline{\langle T(v_j), w_k \rangle} = \overline{A_{kj}}$$
We revisit our motivating set-up $(\dagger)$ in the language of part 3. Suppose:
- $V = \mathbb{F}^n$ and $W = \mathbb{F}^m$ have standard orthonormal bases $\beta = \{e_1, \ldots, e_n\}$ and $\gamma = \{e_1, \ldots, e_m\}$.
- $T = L_A \in L(\mathbb{F}^n, \mathbb{F}^m)$.

Since the matrix of $T$ with respect to the standard bases is simply $A$ itself, the theorem confirms our earlier observation that the adjoint of $L_A$ is left multiplication by the conjugate-transpose $A^*$:
$$[T^*]^\beta_\gamma = ([T]^\gamma_\beta)^* = A^* = [L_{A^*}]^\beta_\gamma \implies T^* = (L_A)^* = L_{A^*}$$
Here is another straightforward example, this time using the standard inner products on $\mathbb{C}^2$ and $\mathbb{C}^3$.
Example 2.28. Let $T = L_A \in L(\mathbb{C}^3, \mathbb{C}^2)$ where $A = \begin{pmatrix} i & 1 & 3 \\ 2 & 1-i & 4+2i \end{pmatrix}$.

Plainly $A = [T]^\gamma_\beta$ with respect to $\beta = \{e_1, e_2, e_3\}$ and $\gamma = \{e_1, e_2\}$. We conclude that $T^* = L_{A^*}$:
$$[T^*]^\beta_\gamma = A^* = \begin{pmatrix} -i & 2 \\ 1 & 1+i \\ 3 & 4-2i \end{pmatrix}$$
As a sanity check, multiply out a few examples of $\langle A^* w, v \rangle = \langle w, Av \rangle$; make sure you're comfortable with the fact that the left inner product is on $\mathbb{C}^3$ and the right on $\mathbb{C}^2$!
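The suggested sanity check is easy to automate. Here is our own illustration for the matrix of Example 2.28 (helper names are ours); it verifies $\langle A^* w, v \rangle = \langle w, Av \rangle$ for a sample pair of vectors:

```python
def hdot(x, y):
    """Standard Hermitian inner product <x, y> = sum of x_j * conj(y_j)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(M, v):
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def adjoint(M):
    """Conjugate-transpose of a matrix given as a list of rows."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]

A = [[1j, 1, 3], [2, 1 - 1j, 4 + 2j]]      # Example 2.28 (2 x 3)
v = [1, 1j, -2]                             # in C^3
w = [2 - 1j, 3]                             # in C^2

lhs = hdot(matvec(adjoint(A), w), v)        # <A* w, v>, computed in C^3
rhs = hdot(w, matvec(A, v))                 # <w, A v>,  computed in C^2
print(lhs == rhs)  # True
```

Note that `lhs` is a length-3 inner product and `rhs` a length-2 one, yet they agree, exactly as $(\dagger)$ predicts.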
The Theorem tells us that every linear map $T \in L(V, W)$ between finite-dimensional spaces has an adjoint, and moreover how to compute it:

1. Choose orthonormal bases (these exist by Corollary 2.21) and find the matrix $[T]^\gamma_\beta$.
2. Take the conjugate-transpose $\bigl([T]^\gamma_\beta\bigr)^*$ and translate back to find $T^* \in L(W, V)$.

The prospect of twice applying Gram–Schmidt and translating between linear maps and their matrices is unappealing; calculating this way can quickly become an enormous mess! In practice, it is often better to try a modified approach; see for instance part 2(b) of the next Example.
Examples 2.29. Let $T = \frac{d}{dx} \in L(P_1(\mathbb{R}))$ be the derivative operator; $T(a + bx) = b$. We treat $P_1(\mathbb{R})$ as an inner product space in two ways.

1. Equip the inner product for which the standard basis $\epsilon = \{1, x\}$ is orthonormal. Then
$$[T]_\epsilon = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \implies [T^*]_\epsilon = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \implies T^*(a + bx) = ax$$

2. Equip the $L^2$ inner product $\langle f, g \rangle = \int_0^1 f(x) g(x) \, dx$. As we saw in Example 2.23, the basis $\beta = \{f_1, f_2\} = \{1, 2x - 1\}$ is orthogonal with $\|f_1\| = 1$ and $\|f_2\| = \frac{1}{\sqrt{3}}$. We compute the adjoint of $T = \frac{d}{dx}$ in two different ways.

(a) The basis $\gamma = \{g_1, g_2\} = \{f_1, \sqrt{3} f_2\} = \{1, \sqrt{3}(2x - 1)\}$ is orthonormal. Observe that
$$T(g_1) = 0, \quad T(g_2) = 2\sqrt{3} \implies [T]_\gamma = \begin{pmatrix} 0 & 2\sqrt{3} \\ 0 & 0 \end{pmatrix} \implies [T^*]_\gamma = \begin{pmatrix} 0 & 0 \\ 2\sqrt{3} & 0 \end{pmatrix}$$
$$\implies T^*(a + bx) = T^*\left( \left(a + \tfrac{b}{2}\right) g_1 + \tfrac{b}{2\sqrt{3}} g_2 \right) = \left(a + \tfrac{b}{2}\right) \cdot 2\sqrt{3}\, g_2 = 3(2a + b)(2x - 1)$$

(b) Use the orthogonal basis $\beta$ and the projection formula (Theorem 2.18). With $p(x) = a + bx$,
$$T^*(p) = \frac{\langle T^*(p), f_1 \rangle}{\|f_1\|^2} f_1 + \frac{\langle T^*(p), f_2 \rangle}{\|f_2\|^2} f_2 = \langle p, T(1) \rangle + \langle p, T(2x - 1) \rangle \cdot 3(2x - 1)$$
$$= \langle p, 0 \rangle + 3 \langle p, 2 \rangle (2x - 1) = 3\left( \int_0^1 2(a + bx) \, dx \right)(2x - 1) = 3(2a + b)(2x - 1)$$
Note the advantages here: no square roots and no need to change basis at the end!

The calculations for the second example were much nastier, even though we were already in possession of an orthogonal basis. The crucial point is that the two examples produce different maps $T^*$: the adjoint depends on the inner product!
Why should we care about adjoints?

Adjoints might seem merely to be an abstraction of something simple (transposes) for its own sake. A convincing explanation of why adjoints are useful takes a lot of work; here is a short version. Given a linear map $T \in L(V)$ on an inner product space, we now have two desirable types of basis.

1. Eigenbasis: diagonalizes $T$.
2. Orthonormal basis: recall (Theorem 2.18) how these simplify computations.

The capstone result of this course is the famous spectral theorem (Theorem 2.37) which says, in short, that self-adjoint operators ($T^* = T$) have an orthonormal eigenbasis, the holy grail of easy computation! Such operators are important both theoretically and in applications such as quantum mechanics.
The Fundamental Subspaces Theorem

To every linear map are associated its range and nullspace. These interact nicely with the adjoint. . .

Theorem 2.30. If $T \in L(V, W)$ has adjoint $T^*$, then:
1. $R(T^*)^\perp = N(T)$
2. If $V$ is finite-dimensional, then $R(T^*) = N(T)^\perp$

The corresponding results hold if we swap $V \leftrightarrow W$ and $T \leftrightarrow T^*$.

The proof is left to Exercise 6. You've likely observed this with transposes of small matrices.

Example 2.31. Let $A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 3 & 2 \end{pmatrix}$. Viewed as a linear map between Euclidean spaces, $T = L_A$ has adjoint $T^* = L_{A^T}$. It is easy to compute the relevant subspaces:
$$R(A) = \mathbb{R}^2, \quad N(A^T) = \{0\}, \quad R(A^T) = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 3 \\ 2 \end{pmatrix} \right\}, \quad N(A) = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix} \right\}$$
The Riesz Representation Theorem

This powerful result demonstrates a natural relation between an inner product space and its dual space $V^* = L(V, \mathbb{F})$.

Theorem 2.32. If $V$ is finite-dimensional and $g : V \to \mathbb{F}$ is linear, then there exists a unique $y \in V$ such that $g(x) = \langle x, y \rangle$.

Example 2.33. $g(p) := \int_0^1 p(x) \, dx$ is a linear map $g : P_2(\mathbb{R}) \to \mathbb{R}$. Equip $P_2(\mathbb{R})$ with the inner product for which the standard basis $\{1, x, x^2\}$ is orthonormal. Then
$$g(a + bx + cx^2) = a + \tfrac{1}{2} b + \tfrac{1}{3} c = \left\langle a + bx + cx^2, \ 1 + \tfrac{1}{2} x + \tfrac{1}{3} x^2 \right\rangle$$
We conclude that $g(p) = \langle p, q \rangle$, where $q(x) = 1 + \tfrac{1}{2} x + \tfrac{1}{3} x^2$.
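In the basis-orthonormal inner product of Example 2.33, polynomials behave exactly like their coefficient vectors, so the Riesz vector can be checked mechanically. This is our own illustration (the names `ip` and `g` are ours):

```python
from fractions import Fraction as F

def ip(p, q):
    """Inner product on P_2(R) making {1, x, x^2} orthonormal: dot of coefficient vectors."""
    return sum(a * b for a, b in zip(p, q))

def g(p):
    """g(p) = integral_0^1 p(x) dx for p = (a, b, c) representing a + b x + c x^2."""
    a, b, c = p
    return a + b/2 + c/3

q = (F(1), F(1, 2), F(1, 3))      # the Riesz vector q(x) = 1 + x/2 + x^2/3
p = (F(2), F(-3), F(6))           # sample polynomial 2 - 3x + 6x^2
print(g(p), ip(p, q))             # 5/2 5/2
print(g(p) == ip(p, q))           # True
```

One sample polynomial suffices here only because both sides are linear in $p$ and agree on the basis $\{1, x, x^2\}$; checking those three basis cases is the honest verification.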
The idea of the proof is very simple: if $g(x) = \langle x, y \rangle$, then the nullspace of $g$ must equal $\operatorname{Span}\{y\}^\perp$. . .

Proof. If $g$ is the zero map, take $y = 0$. Otherwise $\operatorname{rank} g = 1$ and
$$V = N(g) \oplus N(g)^\perp \quad \text{where} \ \dim N(g)^\perp = 1 \quad \text{(rank–nullity theorem and Exercise 2.2.6)}$$
Let $u \in N(g)^\perp$ be either of the two unit vectors and define, independently of $u$,
$$y := \overline{g(u)}\, u \in V$$
Following the decomposition, write $x = n + \alpha u$ where $n \in N(g)$ and observe that
$$\langle x, y \rangle = \left\langle n + \alpha u, \ \overline{g(u)}\, u \right\rangle = g(u)\, \alpha = 0 + g(\alpha u) = g(n + \alpha u) = g(x)$$
The uniqueness of $y$ follows from the cancellation property (Lemma 2.5, part 4).
Due to the tight correspondence, the map is often decorated as $g_y$. Riesz's theorem indeed says that $y \mapsto g_y$ is an isomorphism $V \cong V^*$. While there are infinitely many isomorphisms between these spaces, the inner product structure identifies a canonical or preferred choice.

Corollary 2.34. Every linear map on a finite-dimensional inner product space has an adjoint.

Note how only the domain is required to be finite-dimensional! Riesz's Theorem and the Corollary also apply to continuous linear operators on (infinite-dimensional) Hilbert spaces, though the proof is a little trickier.

Proof. Let $T \in L(V, W)$ where $\dim V < \infty$, and suppose $z \in W$ is given. Simply define $T^*(z) := y$, where $y \in V$ is the unique vector in Riesz's Theorem arising from the linear map
$$g : V \to \mathbb{F}, \qquad g(x) = \langle T(x), z \rangle$$
and check the required property:
$$\langle T^*(z), x \rangle = \langle y, x \rangle = \overline{g(x)} = \overline{\langle T(x), z \rangle} = \langle z, T(x) \rangle$$
Exercises 2.3 1. For each inner product space $V$ and linear operator $T \in \mathcal{L}(V)$, evaluate $T^*$ on the given vector.
(a) $V = \mathbb{R}^2$ with the standard inner product, $T\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}2x + y\\x - 3y\end{pmatrix}$ and $x = \begin{pmatrix}3\\5\end{pmatrix}$
(b) $V = \mathbb{C}^2$ with the standard inner product, $T\begin{pmatrix}z\\w\end{pmatrix} = \begin{pmatrix}2z + iw\\(1 - i)z\end{pmatrix}$ and $x = \begin{pmatrix}3 - i\\1 + 2i\end{pmatrix}$
(c) $V = P_1(\mathbb{R})$ with $\langle f, g \rangle = \int_0^1 f(t)g(t)\,dt$, $T(f) = f' + 3f$ and $f(t) = 4 - 2t$
2. Suppose $A = \begin{pmatrix}1 & 1\\4 & 3\end{pmatrix}$ and consider the linear map $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $\mathbb{R}^2$ is equipped with the weighted inner product $\langle x, y \rangle = 4x_1y_1 + x_2y_2$.
(a) Find the matrix of $T$ with respect to the orthonormal basis $\beta = \{v_1, v_2\} = \{\frac{1}{2}e_1, e_2\}$.
(b) Find the adjoint $T^*$ and its matrix with respect to the standard basis $\epsilon = \{e_1, e_2\}$.
(Hint: the answer isn't $A^T$!)
3. Extending Examples 2.29, find the adjoint of $T = \frac{d}{dx} \in \mathcal{L}(P_2(\mathbb{R}))$ with respect to:
(a) The inner product where the standard basis $\epsilon = \{1, x, x^2\}$ is orthonormal.
(b) (Hard!) The $L^2$ inner product $\int_0^1 f(x)g(x)\,dx$.
4. Let $T(f) = f''$ be a linear transformation of $P_2(\mathbb{R})$ and let $\epsilon = \{1, x, x^2\}$ be the standard basis. Find $T^*(a + bx + cx^2)$:
(a) With respect to the inner product where $\epsilon$ is orthonormal;
(b) With respect to the $L^2$ inner product $\langle f, g \rangle = \int_{-1}^1 f(t)g(t)\,dt$.
(Hint: $\{1, x, 3x^2 - 1\}$ is orthogonal)
5. Prove part 2 of Theorem 2.27.
6. Prove the Fundamental Subspaces Theorem 2.30.
7. For each inner product space $V$ and linear transformation $g : V \to \mathbb{F}$, find a vector $y \in V$ such that $g(x) = \langle x, y \rangle$ for all $x \in V$.
(a) $V = \mathbb{R}^3$ with the standard inner product, and $g\begin{pmatrix}x\\y\\z\end{pmatrix} = x - 2y + 4z$
(b) $V = \mathbb{C}^2$ with the standard inner product, and $g\begin{pmatrix}z\\w\end{pmatrix} = iz - 2w$
(c) $V = P_2(\mathbb{R})$ with the $L^2$ inner product $\langle f, h \rangle = \int_0^1 f(x)h(x)\,dx$, and $g(f) = f(1)$
8. (a) In the proof of Theorem 2.32, explain why $y$ depends only on $g$ (not $u$).
(b) In the proof of Corollary 2.34, check that $g(x) := \langle T(x), z \rangle$ is linear.

9. Let $y, z \in V$ be fixed vectors and define $T \in \mathcal{L}(V)$ by $T(x) = \langle x, y \rangle z$. Show that $T^*$ exists and find an explicit expression.
10. Suppose $A \in M_{m \times n}(\mathbb{F})$. Prove that $A^*A$ is diagonal if and only if the columns of $A$ are orthogonal. What additionally would it mean if $A^*A = I$?
11. Suppose $T \in \mathcal{L}(V)$ where $V$ is a finite-dimensional inner product space.
(a) Prove that the eigenvalues of $T^*$ are the complex conjugates of those of $T$.
(Hint: relate the characteristic polynomial $p^*(t) = \det(T^* - tI)$ to that of $T$)
(b) Prove that $T^*$ is diagonalizable if and only if $T$ is.
12. (Hard) We present two linear maps which do not have an adjoint!
(a) Since $\epsilon = \{1, x, x^2, \ldots\}$ is a basis of $P(\mathbb{R})$, we may define a linear map $T \in \mathcal{L}(P(\mathbb{R}))$ via $T(x^n) = 1$ for all $n$; for instance
$$T(4 + 3x + 2x^5) = 9$$
Let $\langle\ ,\ \rangle$ be the inner product for which $\epsilon$ is orthonormal. If $T^*$ existed, show that
$$T^*(1) = \sum_{n=0}^\infty x^n$$
would be an infinite series: $T^*$ therefore does not exist.
(b) For a related challenge, recall the space $\ell^2$ of square-summable real sequences. For any sequence $(x_n)_{n=1}^\infty \in \ell^2$, define $T \in \mathcal{L}(\ell^2)$ via
$$T\big((x_n)\big) = \left( \sum_{n=1}^\infty \frac{1}{n}x_n,\ 0, 0, 0, 0, 0, \ldots \right)$$
Find the adjoint $T^*$. If $V \le \ell^2$ is the subspace whose elements have only finitely many non-zero terms, show that the restriction $T|_V$ does not have an adjoint.
2.4 Normal & Self-Adjoint Operators and the Spectral Theorem

We now come to the fundamental question of this chapter: for which linear operators $T \in \mathcal{L}(V)$ can we find an orthonormal eigenbasis? Many linear maps are, of course, not even diagonalizable, so in general this is far too much to hope for! Let's see what happens if such a basis exists...

If $\beta$ is an orthonormal basis of eigenvectors of $T$, then
$$[T]_\beta = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) \implies [T^*]_\beta = \operatorname{diag}(\overline{\lambda_1}, \ldots, \overline{\lambda_n})$$
If $V$ is a real inner product space, then these matrices are identical and so $T^* = T$. In the complex case, we instead observe that
$$[TT^*]_\beta = \operatorname{diag}(|\lambda_1|^2, \ldots, |\lambda_n|^2) = [T^*T]_\beta \implies TT^* = T^*T$$
Definition 2.35. Suppose $T$ is a linear operator on an inner product space $V$ and assume $T$ has an adjoint. We say that $T$ is:
• Normal if $TT^* = T^*T$,
• Self-adjoint if $T^* = T$.
The definitions for square matrices over $\mathbb{R}$ and $\mathbb{C}$ are identical, where $*$ now denotes the conjugate-transpose.

A real self-adjoint matrix $A \in M_n(\mathbb{R})$ is plainly symmetric: $A^T = A$. A complex self-adjoint matrix is also called Hermitian: $A^* = A$.
If $T$ is self-adjoint then it is certainly normal, but the converse is false as the next example shows.

Example 2.36. The (non-symmetric) real matrix $A = \begin{pmatrix}2 & 1\\-1 & 2\end{pmatrix}$ is normal but not self-adjoint:
$$AA^T = \begin{pmatrix}2 & 1\\-1 & 2\end{pmatrix}\begin{pmatrix}2 & -1\\1 & 2\end{pmatrix} = \begin{pmatrix}5 & 0\\0 & 5\end{pmatrix} = \begin{pmatrix}2 & -1\\1 & 2\end{pmatrix}\begin{pmatrix}2 & 1\\-1 & 2\end{pmatrix} = A^TA$$
More generally, every non-zero skew-Hermitian matrix ($A^* = -A$) is normal but not self-adjoint:
$$A^* = -A \implies AA^* = -A^2 = A^*A$$
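Example 2.36 is easily checked numerically (a small sketch, my addition):

```python
# Check that A = [[2, 1], [-1, 2]] is normal (A A^T = A^T A) but not symmetric.
import numpy as np

A = np.array([[2.0, 1.0], [-1.0, 2.0]])
is_normal = np.allclose(A @ A.T, A.T @ A)     # both products equal 5I
is_self_adjoint = np.allclose(A, A.T)
```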
We saw above that linear maps with an orthonormal eigenbasis are either self-adjoint or normal depending on whether the inner product space is real or complex. Amazingly, this provides a complete characterisation of such maps!

Theorem 2.37 (Spectral Theorem, version 1). Let $T$ be a linear operator on a finite-dimensional inner product space $V$.
Complex case: $V$ has an orthonormal basis of eigenvectors of $T$ if and only if $T$ is normal.
Real case: $V$ has an orthonormal basis of eigenvectors of $T$ if and only if $T$ is self-adjoint.

The theorem gets its name from the spectrum (set of eigenvalues) of $T$.
Examples 2.38. 1. We diagonalize the self-adjoint linear map $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $A = \begin{pmatrix}6 & 3\\3 & -2\end{pmatrix}$.

Characteristic polynomial: $p(t) = (6 - t)(-2 - t) - 9 = t^2 - 4t - 21 = (t - 7)(t + 3)$
Eigenvalues: $\lambda_1 = 7$, $\lambda_2 = -3$
Eigenvectors (normalized): $w_1 = \frac{1}{\sqrt{10}}\begin{pmatrix}3\\1\end{pmatrix}$, $w_2 = \frac{1}{\sqrt{10}}\begin{pmatrix}1\\-3\end{pmatrix}$

The basis $\beta = \{w_1, w_2\}$ is orthonormal, with respect to which $[T]_\beta = \begin{pmatrix}7 & 0\\0 & -3\end{pmatrix}$ is diagonal.
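The same diagonalization can be reproduced with numpy's symmetric eigensolver (a sketch, my addition; `eigh` returns eigenvalues in ascending order):

```python
# Diagonalize the symmetric matrix of Example 2.38.1: the columns of W form
# an orthonormal eigenbasis and W^T A W is diagonal.
import numpy as np

A = np.array([[6.0, 3.0], [3.0, -2.0]])
evals, W = np.linalg.eigh(A)                 # eigh: for symmetric/Hermitian input
orthonormal = np.allclose(W.T @ W, np.eye(2))
diagonalized = np.allclose(W.T @ A @ W, np.diag(evals))
```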
2. The map $T = L_A \in \mathcal{L}(\mathbb{R}^2)$ where $A = \begin{pmatrix}1 & 3\\0 & 2\end{pmatrix}$ is neither self-adjoint nor normal:
$$AA^* = \begin{pmatrix}1 & 3\\0 & 2\end{pmatrix}\begin{pmatrix}1 & 0\\3 & 2\end{pmatrix} = \begin{pmatrix}10 & 6\\6 & 4\end{pmatrix} \neq \begin{pmatrix}1 & 3\\3 & 13\end{pmatrix} = \begin{pmatrix}1 & 0\\3 & 2\end{pmatrix}\begin{pmatrix}1 & 3\\0 & 2\end{pmatrix} = A^*A$$
It is diagonalizable, indeed
$$\gamma = \left\{ \begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}3\\1\end{pmatrix} \right\} \implies [T]_\gamma = \begin{pmatrix}1 & 0\\0 & 2\end{pmatrix}$$
In accordance with the spectral theorem, $\gamma$ is not orthogonal.
3. Let $A = \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix}$ and consider $T = L_A$ acting on both $\mathbb{C}^2$ and $\mathbb{R}^2$. Since $T$ is normal but not self-adjoint, we'll see how the field really matters in the spectral theorem.

First the complex case: $T \in \mathcal{L}(\mathbb{C}^2)$ is normal and thus diagonalizable with respect to an orthonormal basis of eigenvectors. Here are the details.

Characteristic polynomial: $p(t) = t^2 + 1 = (t - i)(t + i)$
Eigenvalues: $\lambda_1 = i$, $\lambda_2 = -i$
Eigenvectors (normalized): $w_1 = \frac{1}{\sqrt2}\begin{pmatrix}i\\1\end{pmatrix}$, $w_2 = \frac{1}{\sqrt2}\begin{pmatrix}-i\\1\end{pmatrix}$

Certainly $\langle w_1, w_2 \rangle = 0$, $\beta = \{w_1, w_2\}$ is orthonormal, and $[T]_\beta = \begin{pmatrix}i & 0\\0 & -i\end{pmatrix}$ is diagonal.

Now for the real case: $T \in \mathcal{L}(\mathbb{R}^2)$ is not self-adjoint and thus should not be diagonalizable with respect to an orthonormal basis of eigenvectors. Indeed this is trivial; the characteristic polynomial has no roots in $\mathbb{R}$ and so there are no real eigenvalues! It is also clear geometrically: $T$ is rotation by 90° counter-clockwise around the origin, so it has no eigenvectors.
4. Let $A = \begin{pmatrix}3 & i\\-i & 3\end{pmatrix}$ and consider the self-adjoint operator $T = L_A \in \mathcal{L}(\mathbb{C}^2)$.

Characteristic polynomial: $p(t) = t^2 - 6t + 9 - 1 = (t - 2)(t - 4)$
Eigenvalues: $\lambda_1 = 2$, $\lambda_2 = 4$
Eigenvectors (normalized): $w_1 = \frac{1}{\sqrt2}\begin{pmatrix}1\\i\end{pmatrix}$, $w_2 = \frac{1}{\sqrt2}\begin{pmatrix}1\\-i\end{pmatrix}$

With respect to the orthonormal basis $\beta = \{w_1, w_2\}$, we have $[T]_\beta = \begin{pmatrix}2 & 0\\0 & 4\end{pmatrix}$.
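The Hermitian matrix of Example 2.38.4 has real eigenvalues even though its entries are complex, as a quick numerical check confirms (my addition):

```python
# The eigenvalues of the Hermitian matrix [[3, i], [-i, 3]] are real: 2 and 4.
import numpy as np

A = np.array([[3.0, 1j], [-1j, 3.0]])
evals = np.linalg.eigvalsh(A)                 # eigvalsh assumes Hermitian input
evals_ok = np.allclose(evals, [2.0, 4.0])     # real, ascending order
```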
Proving the Spectral Theorem for Self-Adjoint Operators

Lemma 2.39 (Basic properties of self-adjoint operators). Let $T \in \mathcal{L}(V)$ be self-adjoint.
1. If $W \le V$ is $T$-invariant then the restriction $T_W$ is self-adjoint;
2. Every eigenvalue of $T$ is real;
3. If $\dim V$ is finite then $T$ has an eigenvalue.

It is irrelevant whether $V$ is real or complex. The previous example demonstrates part 2; even when $V = \mathbb{C}^2$ is a complex inner product space, the eigenvalues of a self-adjoint matrix are real.
Proof. 1. Let $w_1, w_2 \in W$. Then
$$\langle T_W(w_1), w_2 \rangle = \langle T(w_1), w_2 \rangle = \langle w_1, T(w_2) \rangle = \langle w_1, T_W(w_2) \rangle$$
2. Suppose $(\lambda, x)$ is an eigenpair. Then
$$\lambda \lVert x \rVert^2 = \langle T(x), x \rangle = \langle x, T(x) \rangle = \overline{\lambda} \lVert x \rVert^2 \implies \lambda \in \mathbb{R}$$
3. This is trivial if $V$ is complex since every characteristic polynomial splits over $\mathbb{C}$. We therefore assume $V$ is real. Choose any orthonormal basis $\gamma$ of $V$, let $A = [T]_\gamma \in M_n(\mathbb{R})$, and define $S := L_A \in \mathcal{L}(\mathbb{C}^n)$. Then:
• The characteristic polynomial of $S$ splits over $\mathbb{C}$, whence there exists an eigenvalue $\lambda \in \mathbb{C}$.
• The characteristic polynomials of $S$ and $T$ are identical (to that of $A$).
• $S$ is self-adjoint and thus (part 2) $\lambda \in \mathbb{R}$.
It follows that $T$ has the same real eigenvalue $\lambda$.
We are now able to prove the spectral theorem for self-adjoint operators on a finite-dimensional inner product space $V$. The argument applies regardless of whether $V$ is real or complex.

Proof of the Spectral Theorem (self-adjoint case). We prove by induction on $\dim V$.

(Base case) If $\dim V = 1$, then $V = \operatorname{Span}\{x\}$ and $T(x) = \lambda x$ for some unit vector $x$ and scalar $\lambda \in \mathbb{R}$; plainly $\{x\}$ is an orthonormal eigenbasis for $T$.

(Induction step) Fix $n \in \mathbb{N}$ and assume that every self-adjoint operator on every inner product space of dimension $n$ satisfies the spectral theorem. Let $\dim V = n + 1$ and $T \in \mathcal{L}(V)$ be self-adjoint. By part 3 of the Lemma, $T$ has an eigenpair $(\lambda, x)$ where we may assume $x$ has unit length. Let $W = \operatorname{Span}\{x\}^\perp$. If $w \in W$, then
$$\langle x, T(w) \rangle = \langle T(x), w \rangle = \lambda \langle x, w \rangle = 0 \qquad (*)$$
whence $W$ is $T$-invariant. Plainly $\dim W = n$. By part 1 of the Lemma, $T_W$ is self-adjoint. By the induction hypothesis, $T_W$ is diagonalized by some orthonormal basis $\gamma$ of $W$. But then $T$ is diagonalized by the orthonormal basis $\beta = \gamma \cup \{x\}$ of $V$.
Proving the Spectral Theorem for Normal Operators

What changes for normal operators on complex inner product spaces? Not much! Indeed the proof is almost identical when $T$ is merely normal.
• We don't need parts 2 and 3 of Lemma 2.39: every linear operator on a finite-dimensional complex inner product space has an eigenvalue, and we no longer care whether eigenvalues are real.
• Two parts of the induction step need to be fixed:
– $W$ being $T$-invariant: This isn't quite as simple as $(*)$, but thankfully part 3 of the next result provides the needed correction.
– $T_W$ being normal: We need a replacement for part 1 of Lemma 2.39; this is a little more involved.
Rather than write out all the details, we leave this to Exercises 6 and 7.

For completeness, and as an analogue/extension of Lemma 2.39, we summarize some of the basic properties of normal operators. These also apply to self-adjoint operators as a special case.
Lemma 2.40 (Basic properties of normal operators). Let $T$ be normal on $V$. Then:
1. $\forall x \in V$, $\lVert T(x) \rVert = \lVert T^*(x) \rVert$.
2. $T - tI$ is normal for any scalar $t$.
3. $T(x) = \lambda x \iff T^*(x) = \overline{\lambda}x$, so that $T$ and $T^*$ have the same eigenvectors and conjugate eigenvalues. This recovers the previously established fact that $\lambda \in \mathbb{R}$ if $T$ is self-adjoint.
4. Distinct eigenvalues of $T$ have orthogonal eigenvectors.
Proof. 1. $\lVert T(x) \rVert^2 = \langle T(x), T(x) \rangle = \langle T^*T(x), x \rangle = \langle TT^*(x), x \rangle = \langle T^*(x), T^*(x) \rangle = \lVert T^*(x) \rVert^2$.
2. The computation
$$\langle x, (T - tI)(y) \rangle = \langle x, T(y) \rangle - \overline{t}\langle x, y \rangle = \langle T^*(x), y \rangle - \langle \overline{t}x, y \rangle = \langle (T^* - \overline{t}I)(x), y \rangle$$
shows that $T - tI$ has adjoint $T^* - \overline{t}I$. It is trivial to check that these commute.
3. $T(x) = \lambda x \iff \lVert (T - \lambda I)(x) \rVert = 0 \iff \lVert (T^* - \overline{\lambda}I)(x) \rVert = 0 \iff T^*(x) = \overline{\lambda}x$, where the middle step follows from parts 1 and 2.
4. In part this follows from the spectral theorem, but we can also prove it more directly. Suppose $T(x) = \lambda x$ and $T(y) = \mu y$ where $\lambda \neq \mu$. By part 3,
$$\lambda \langle x, y \rangle = \langle \lambda x, y \rangle = \langle T(x), y \rangle = \langle x, T^*(y) \rangle = \langle x, \overline{\mu}y \rangle = \mu \langle x, y \rangle$$
This is a contradiction unless $\langle x, y \rangle = 0$.
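Part 1 of Lemma 2.40 is easy to test numerically on a normal matrix (a sketch, my addition; the rotation matrix below is normal but not self-adjoint):

```python
# Lemma 2.40 part 1: for a normal matrix, ||T(x)|| = ||T*(x)|| for every x.
import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])   # normal: A A^T = A^T A = I
rng = np.random.default_rng(2)
x = rng.standard_normal(2)
norms_match = np.isclose(np.linalg.norm(A @ x), np.linalg.norm(A.T @ x))
```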
Schur's Lemma

It is reasonable to ask how useful an orthonormal basis can be in general. Here is one answer.

Lemma 2.41 (Schur). Suppose $T$ is a linear operator on a finite-dimensional inner product space $V$. If the characteristic polynomial of $T$ splits, then there exists an orthonormal basis $\beta$ of $V$ such that $[T]_\beta$ is upper-triangular.

The spectral theorem is a special case; since the proof is similar, we leave it to the exercises. The conclusion of Schur's lemma is weaker than the spectral theorem, though it applies to more operators: indeed if $V$ is complex, it applies to any $T$! Every example of the spectral theorem is also an example of Schur's lemma. Example 2.38.2 provides another, since the matrix $A$ is already upper triangular with respect to the standard orthonormal basis. Here is another example.
Example 2.42. Consider $T(f) = 2f'(x) + x f(1)$ as a linear map $T \in \mathcal{L}(P_1(\mathbb{R}))$ with respect to the $L^2$ inner product $\langle f, g \rangle = \int_0^1 f(t)g(t)\,dt$. We have
$$T(a + bx) = 2b + (a + b)x$$
If $[T]_\beta$ is to be upper-triangular, the first vector in $\beta$ must be an eigenvector of $T$. It is easily checked that $f_1 = 1 + x$ is such with eigenvalue 2. To find a basis satisfying Schur's lemma, we need only find $f_2$ orthogonal to this and then normalize. This can be done by brute force since the problem is small, but for the sake of practice we apply Gram–Schmidt to the polynomial 1:
$$1 - \frac{\langle 1, 1 + x \rangle}{\lVert 1 + x \rVert^2}(1 + x) = 1 - \frac{1 + \frac12}{1 + 1 + \frac13}(1 + x) = \frac{1}{14}(5 - 9x) \implies f_2 = 5 - 9x$$
Indeed we obtain an upper-triangular matrix for $T$:
$$T(f_2) = -18 - 4x = -13(1 + x) - (5 - 9x) = -13f_1 - f_2 \implies [T]_{\{f_1, f_2\}} = \begin{pmatrix}2 & -13\\0 & -1\end{pmatrix}$$
We can also work with the corresponding orthonormal basis as posited in the theorem, though the matrix is messier:
$$\beta = \{g_1, g_2\} = \left\{ \sqrt{\tfrac{3}{7}}(1 + x),\ \tfrac{1}{\sqrt7}(5 - 9x) \right\} \implies [T]_\beta = \begin{pmatrix}2 & -\frac{13}{\sqrt3}\\0 & -1\end{pmatrix}$$
Alternatively, we could have started with the other eigenvector $h_1 = 2 - x$: an orthogonal vector to this is $h_2 = 4 - 9x$, with respect to which
$$[T]_{\{h_1, h_2\}} = \begin{pmatrix}-1 & -13\\0 & 2\end{pmatrix}$$
In both cases the eigenvalues are down the diagonal, as must be the case for an upper-triangular matrix.
In general, it is difficult to quickly find a suitable basis satisfying Schur’s lemma. After trying the
proof in the exercises, you should be able to describe a method, though it is impractically slow!
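One way to verify the computation in Example 2.42 is by an explicit change of basis (a sketch, my addition): in the basis $\{1, x\}$ the operator $T(f) = 2f' + x f(1)$ has matrix $M$, and conjugating by the matrix whose columns are the coefficients of $f_1 = 1 + x$ and $f_2 = 5 - 9x$ produces the upper-triangular matrix of the example.

```python
# Verify Example 2.42: P^{-1} M P is upper-triangular with the eigenvalues
# 2 and -1 on the diagonal.
import numpy as np

M = np.array([[0.0, 2.0], [1.0, 1.0]])    # T(1) = x, T(x) = 2 + x in basis {1, x}
P = np.array([[1.0, 5.0], [1.0, -9.0]])   # columns: f1 = 1 + x, f2 = 5 - 9x
upper = np.linalg.inv(P) @ M @ P
schur_ok = np.allclose(upper, [[2.0, -13.0], [0.0, -1.0]])
```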
Exercises 2.4 1. For each linear operator $T$ on an inner product space $V$, decide whether $T$ is normal, self-adjoint, or neither. If the spectral theorem permits, find an orthonormal eigenbasis.
(a) $V = \mathbb{R}^2$ and $T\begin{pmatrix}a\\b\end{pmatrix} = \begin{pmatrix}2a - 2b\\-2a + 5b\end{pmatrix}$
(b) $V = \mathbb{R}^3$ and $T\begin{pmatrix}a\\b\\c\end{pmatrix} = \begin{pmatrix}-a + b\\5b\\4a - 2b + 5c\end{pmatrix}$
(c) $V = \mathbb{C}^2$ and $T\begin{pmatrix}z\\w\end{pmatrix} = \begin{pmatrix}2z + iw\\z + 2w\end{pmatrix}$
(d) $V = \mathbb{R}^4$ with $T : (e_1, e_2, e_3, e_4) \mapsto (e_3, e_4, e_1, e_2)$
(e) $V = P_2(\mathbb{R})$ with $\langle f, g \rangle = \int_0^1 f(t)g(t)\,dt$ and $T(f) = f'$
(Hint: Don't compute $T^*$! Instead assume $T$ is normal and aim for a contradiction...)
2. Let $T(f(x)) = f'(x) + 4x f(0)$ where $T \in \mathcal{L}(P_1(\mathbb{R}))$ and $\langle f, g \rangle = \int_{-1}^1 f(t)g(t)\,dt$. Find an orthonormal basis of $P_1(\mathbb{R})$ with respect to which the matrix of $T$ is upper-triangular.

3. Suppose $S, T$ are self-adjoint operators on an inner product space $V$. Prove that $ST$ is self-adjoint if and only if $ST = TS$.
(Hint: recall Theorem 2.27)
4. Let $T$ be normal on a finite-dimensional inner product space $V$. Prove that $\mathcal{N}(T^*) = \mathcal{N}(T)$ and that $\mathcal{R}(T^*) = \mathcal{R}(T)$.
(Hint: Use Lemma 2.40 and the Fundamental Subspaces Theorem 2.30)

5. Let $T$ be self-adjoint on a finite-dimensional inner product space $V$. Prove that
$$\forall x \in V, \qquad \lVert T(x) \pm ix \rVert^2 = \lVert T(x) \rVert^2 + \lVert x \rVert^2$$
Hence prove that $T - iI$ is invertible and that $[(T - iI)^{-1}]^* = (T + iI)^{-1}$.
6. Let $W$ be a $T$-invariant subspace of an inner product space $V$ and let $T_W \in \mathcal{L}(W)$ be the restriction of $T$ to $W$. Prove:
(a) $W^\perp$ is $T^*$-invariant.
(b) If $W$ is both $T$- and $T^*$-invariant, then $(T_W)^* = (T^*)_W$.
(c) If $W$ is both $T$- and $T^*$-invariant and $T$ is normal, then $T_W$ is normal.
7. Use the previous question to complete the proof of the spectral theorem for a normal operator
on a finite-dimensional complex inner product space.
8. (a) Suppose $S$ is a normal operator on a finite-dimensional complex inner product space, all of whose eigenvalues are real. Prove that $S$ is self-adjoint.
(b) Let $T$ be a normal operator on a finite-dimensional real inner product space $V$ whose characteristic polynomial splits. Prove that $T$ is self-adjoint and that there exists an orthonormal basis of $V$ of eigenvectors of $T$.
(Hint: Mimic the proof of Lemma 2.39 part 3 and use part (a))

9. Prove Schur's lemma by induction, similarly to the proof of the spectral theorem.
(Hint: $T^*$ has an eigenvector $x$; why? Now show that $W = \operatorname{Span}\{x\}^\perp$ is $T$-invariant...)
2.5 Unitary and Orthogonal Operators and their Matrices

In this section we focus on length-preserving transformations of an inner product space.

Definition 2.43. A linear$^6$ isometry of an inner product space $V$ is a linear map $T$ satisfying
$$\forall x \in V, \qquad \lVert T(x) \rVert = \lVert x \rVert$$
Every eigenvalue of an isometry must have modulus 1: if $T(w) = \lambda w$, then
$$\lVert w \rVert^2 = \lVert T(w) \rVert^2 = \lVert \lambda w \rVert^2 = |\lambda|^2 \lVert w \rVert^2$$
Example 2.44. Let $T = L_A \in \mathcal{L}(\mathbb{R}^2)$, where $A = \frac15\begin{pmatrix}4 & -3\\3 & 4\end{pmatrix}$. Then
$$\left\lVert T\begin{pmatrix}x\\y\end{pmatrix} \right\rVert^2 = \left\lVert \frac15\begin{pmatrix}4x - 3y\\3x + 4y\end{pmatrix} \right\rVert^2 = \frac{1}{25}\left[ (4x - 3y)^2 + (3x + 4y)^2 \right] = x^2 + y^2 = \left\lVert \begin{pmatrix}x\\y\end{pmatrix} \right\rVert^2$$
This matrix is very special in that its inverse equals its transpose:
$$A^{-1} = \frac{1}{\frac{16}{25} + \frac{9}{25}} \cdot \frac15\begin{pmatrix}4 & 3\\-3 & 4\end{pmatrix} = \frac15\begin{pmatrix}4 & 3\\-3 & 4\end{pmatrix} = A^T$$
We call such matrices orthogonal. The simple version of what follows is that every linear isometry on $\mathbb{R}^n$ is multiplication by an orthogonal matrix.
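Both properties of Example 2.44 are one-liners to check numerically (a sketch, my addition):

```python
# Example 2.44: A is orthogonal (A^T A = I), hence an isometry of R^2.
import numpy as np

A = np.array([[4.0, -3.0], [3.0, 4.0]]) / 5
orthogonal = np.allclose(A.T @ A, np.eye(2))
x = np.array([1.0, 2.0])
isometry = np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x))
```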
Definition 2.45. A unitary operator $T$ on an inner product space $V$ is an invertible linear map satisfying $T^*T = I = TT^*$. A unitary matrix is a (real or complex) matrix satisfying $A^*A = I$.

If $V$ is real, we usually call these orthogonal operators/matrices; this isn't necessary, since unitary encompasses both real and complex spaces. An orthogonal matrix satisfies $A^TA = I$.
Example 2.46. The matrix $A = \frac13\begin{pmatrix}i & 2 + 2i\\2 - 2i & i\end{pmatrix}$ is unitary:
$$A^*A = \frac19\begin{pmatrix}-i & 2 + 2i\\2 - 2i & -i\end{pmatrix}\begin{pmatrix}i & 2 + 2i\\2 - 2i & i\end{pmatrix} = \frac19\begin{pmatrix}-i^2 + 4 + 4 & (-i + i)(2 + 2i)\\(i - i)(2 - 2i) & 4 + 4 - i^2\end{pmatrix} = I$$
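The same check in numpy, where the conjugate-transpose is `A.conj().T` (a sketch, my addition):

```python
# Example 2.46: A* A = I, so A is unitary.
import numpy as np

A = np.array([[1j, 2 + 2j], [2 - 2j, 1j]]) / 3
unitary = np.allclose(A.conj().T @ A, np.eye(2))
```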
If $V$ is finite-dimensional, the operator/matrix notions correspond straightforwardly. By Theorem 2.27, if we choose any orthonormal basis $\beta$ of $V$, then
$$T \in \mathcal{L}(V) \text{ is unitary/orthogonal} \iff [T]_\beta \text{ is unitary/orthogonal}$$
Moreover, we need only assume $T^*T = I$ (or $TT^* = I$) if $V$ is finite-dimensional: if $\beta$ is an orthonormal basis, then
$$T^*T = I \iff [T^*]_\beta [T]_\beta = I \iff [T]_\beta [T^*]_\beta = I \iff TT^* = I$$
In infinite dimensions, we need $T^*$ to be both the left- and right-inverse of $T$. This isn't an empty requirement (see Exercise 13).
$^6$There also exist non-linear isometries: for instance translations ($T(x) = x + a$ for any constant $a$) and complex conjugation ($T(x) = \overline{x}$) on $\mathbb{C}^n$. Together with linear isometries, these essentially comprise all isometries in finite dimensions.
We now tackle the correspondence between unitary operators and isometries.

Theorem 2.47. Let $T$ be a linear operator on an inner product space $V$.
1. If $T$ is a unitary/orthogonal operator, then it is a linear isometry.
2. If $T$ is a linear isometry and $V$ is finite-dimensional, then $T$ is unitary/orthogonal.

Proof. 1. If $T$ is unitary, then
$$\forall x, y \in V, \qquad \langle x, y \rangle = \langle T^*T(x), y \rangle = \langle T(x), T(y) \rangle \qquad (\dagger)$$
In particular, taking $x = y$ shows that $T$ is an isometry.
2. $(I - T^*T)^* = I^* - (T^*T)^* = I - T^*T$ is self-adjoint. By the spectral theorem, there exists an orthonormal basis of $V$ of eigenvectors of $I - T^*T$. For any such $x$ with (real) eigenvalue $\lambda$,
$$0 = \lVert x \rVert^2 - \lVert T(x) \rVert^2 = \langle x, x \rangle - \langle T(x), T(x) \rangle = \langle x, (I - T^*T)x \rangle = \lambda \lVert x \rVert^2 \implies \lambda = 0$$
Since $I - T^*T = 0$ on a basis, $T^*T = I$. Since $V$ is finite-dimensional, we also have $TT^* = I$, whence $T$ is unitary.

The finite-dimensional restriction is important in part 2: we use the existence of adjoints, the spectral theorem, and that a left-inverse is also a right-inverse. See Exercise 13 for an example of a non-unitary isometry in infinite dimensions.
The proof shows a little more:

Corollary 2.48. On a finite-dimensional space, being unitary is equivalent to each of the following:
(a) Preservation of the inner product$^7$ $(\dagger)$.
(b) The existence of an orthonormal basis $\beta = \{w_1, \ldots, w_n\}$ such that $T(\beta) = \{T(w_1), \ldots, T(w_n)\}$ is also orthonormal.
(c) That every orthonormal basis $\beta$ of $V$ is mapped to an orthonormal basis $T(\beta)$.

While (a) is simply $(\dagger)$, claims (b) and (c) are also worth proving explicitly: see Exercise 9. If $\beta$ is the standard orthonormal basis of $\mathbb{F}^n$ and $T = L_A$, then the columns of $A$ form the orthonormal set $T(\beta)$. This makes identifying unitary/orthogonal matrices easy:
Corollary 2.49. A matrix $A \in M_n(\mathbb{R})$ is orthogonal if and only if its columns form an orthonormal basis of $\mathbb{R}^n$ with respect to the standard (dot) inner product.
A matrix $A \in M_n(\mathbb{C})$ is unitary if and only if its columns form an orthonormal basis of $\mathbb{C}^n$ with respect to the standard (Hermitian) inner product.
$^7$In particular, in a real inner product space isometries also preserve the angle $\theta$ between vectors, since $\cos\theta = \frac{\langle x, y \rangle}{\lVert x \rVert \lVert y \rVert}$.
Examples 2.50. 1. The matrix $A_\theta = \begin{pmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{pmatrix} \in M_2(\mathbb{R})$ is orthogonal for any $\theta$. Example 2.44 is this with $\theta = \tan^{-1}\frac34 = \sin^{-1}\frac35 = \cos^{-1}\frac45$. More generally (Exercise 6), it can be seen that every real orthogonal $2 \times 2$ matrix has the form $A_\theta$ or
$$B_\theta = \begin{pmatrix}\cos\theta & \sin\theta\\\sin\theta & -\cos\theta\end{pmatrix}$$
for some angle $\theta$. The effect of $L_{A_\theta}$ is to rotate counter-clockwise by $\theta$, while that of $L_{B_\theta}$ is to reflect across the line making angle $\frac12\theta$ with the positive $x$-axis.
2. $A = \frac{1}{\sqrt6}\begin{pmatrix}\sqrt2 & \sqrt3 & 1\\\sqrt2 & 0 & -2\\\sqrt2 & -\sqrt3 & 1\end{pmatrix} \in M_3(\mathbb{R})$ is orthogonal: check the columns!
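Checking the columns by hand is routine; here is the same check in numpy (a sketch, my addition):

```python
# The 3x3 example: orthonormal columns are equivalent to A^T A = I.
import numpy as np

A = np.array([[np.sqrt(2), np.sqrt(3), 1.0],
              [np.sqrt(2), 0.0, -2.0],
              [np.sqrt(2), -np.sqrt(3), 1.0]]) / np.sqrt(6)
orthogonal = np.allclose(A.T @ A, np.eye(3))
```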
3. $A = \frac{1}{\sqrt2}\begin{pmatrix}1 & i\\i & 1\end{pmatrix} \in M_2(\mathbb{C})$ is unitary: indeed it maps the standard basis to the orthonormal basis
$$T(\beta) = \left\{ \frac{1}{\sqrt2}\begin{pmatrix}1\\i\end{pmatrix}, \frac{1}{\sqrt2}\begin{pmatrix}i\\1\end{pmatrix} \right\}$$
It is also easy to check that the characteristic polynomial is
$$p(t) = \det\begin{pmatrix}\frac{1}{\sqrt2} - t & \frac{i}{\sqrt2}\\\frac{i}{\sqrt2} & \frac{1}{\sqrt2} - t\end{pmatrix} = \left(t - \frac{1}{\sqrt2}\right)^2 + \frac12 \implies t = \frac{1}{\sqrt2}(1 \pm i) = e^{\pm\pi i/4}$$
whence the eigenvalues of $T$ both have modulus 1.
4. Here is an example of an infinite-dimensional unitary operator. On the space $C[-\pi, \pi]$, the function $T(f(x)) = e^{ix}f(x)$ is linear. Moreover,
$$\left\langle e^{ix}f(x), g(x) \right\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{ix}f(x)\overline{g(x)}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{e^{-ix}g(x)}\,dx = \left\langle f(x), e^{-ix}g(x) \right\rangle$$
whence $T^*(f(x)) = e^{-ix}f(x)$. Indeed $T^* = T^{-1}$ and so $T$ is a unitary operator.
Since $C[-\pi, \pi]$ is infinite-dimensional, we don't expect all parts of the Corollary to hold:
(a) $T$ does preserve the inner product.
(b), (c) $C[-\pi, \pi]$ doesn't have an orthonormal basis; there is no orthonormal set $\beta = \{f_k\}$ so that every continuous function is a finite linear combination.$^8$ We cannot therefore claim that $T$ maps orthonormal bases to orthonormal bases!
$^8$An infinite orthonormal set $\beta = \{f_k : k \in \mathbb{Z}\}$ can be found so that every function $f$ 'equals' an infinite series in the sense that $\lVert f - \sum a_k f_k \rVert = 0$. Since these are not finite sums, $\beta$ isn't strictly a basis, though it isn't uncommon for it to be so described. Moreover, given that the norm is defined by an integral, this also isn't a claim that $f$ and $\sum a_k f_k$ are equal as functions. Indeed the infinite series need not be continuous! For these reasons, when working with Fourier series, one tends to consider a broader class than the continuous functions.
Unitary and Orthogonal Equivalence

Suppose $A \in M_n(\mathbb{R})$ is symmetric (self-adjoint): $A^T = A$. By the spectral theorem, $A$ has an orthonormal eigenbasis $\beta = \{w_1, \ldots, w_n\}$: $Aw_j = \lambda_j w_j$. Arranging the eigenbasis as the columns of a matrix, we see that the columns of $U = (w_1 \cdots w_n)$ are orthonormal and so $U$ is an orthogonal matrix. We can therefore write
$$A = UDU^{-1} = U\begin{pmatrix}\lambda_1 & \cdots & 0\\\vdots & \ddots & \vdots\\0 & \cdots & \lambda_n\end{pmatrix}U^T$$
A similar approach works if $A \in M_n(\mathbb{C})$ is normal: we now have $A = UDU^*$ where $U$ is unitary.
Example 2.51. The matrix $A = \begin{pmatrix}1 + i & 1 + i\\-1 - i & 1 + i\end{pmatrix}$ is normal, as can easily be checked. Its characteristic polynomial is
$$p(t) = t^2 - 2(1 + i)t + 4i = (t - 2i)(t - 2)$$
with corresponding orthonormal eigenvectors
$$w_2 = \frac{1}{\sqrt2}\begin{pmatrix}1\\-i\end{pmatrix}, \qquad w_{2i} = \frac{1}{\sqrt2}\begin{pmatrix}1\\i\end{pmatrix}$$
We conclude that
$$A = \frac{1}{\sqrt2}\begin{pmatrix}1 & 1\\-i & i\end{pmatrix}\begin{pmatrix}2 & 0\\0 & 2i\end{pmatrix}\left[\frac{1}{\sqrt2}\begin{pmatrix}1 & 1\\-i & i\end{pmatrix}\right]^{-1} = \begin{pmatrix}1 & 1\\-i & i\end{pmatrix}\begin{pmatrix}1 & 0\\0 & i\end{pmatrix}\begin{pmatrix}1 & i\\1 & -i\end{pmatrix}$$
This is an example of unitary equivalence.
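The decomposition of Example 2.51 can be confirmed numerically (a sketch, my addition): the matrix $U$ with columns $w_2, w_{2i}$ is unitary, and $UDU^*$ recovers $A$.

```python
# Verify Example 2.51: A = U D U* with U unitary and D = diag(2, 2i).
import numpy as np

A = np.array([[1 + 1j, 1 + 1j], [-1 - 1j, 1 + 1j]])
U = np.array([[1.0, 1.0], [-1j, 1j]]) / np.sqrt(2)   # columns: w_2, w_2i
D = np.diag([2.0, 2j])
unitary = np.allclose(U.conj().T @ U, np.eye(2))
decomposed = np.allclose(U @ D @ U.conj().T, A)
```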
Definition 2.52. Square matrices $A, B$ are unitarily equivalent if there exists a unitary matrix $U$ such that $B = U^*AU$. Orthogonal equivalence is similar: $B = U^TAU$.
The above discussion proves half the following:
Theorem 2.53. $A \in M_n(\mathbb{C})$ is normal if and only if it is unitarily equivalent to a diagonal matrix (the matrix of its eigenvalues).
$A \in M_n(\mathbb{R})$ is self-adjoint (symmetric) if and only if it is orthogonally equivalent to a diagonal matrix.
Proof. We’ve already observed the () direction.
For the converse, let D be diagonal, U unitary, and A = U
DU. Then
A
A = (U
DU)
U
DU = U
D
UU
DU = U
DDU = U
DDU = U
DUU
DU
= U
DU(U
DU)
= AA
since U
= U
1
and because diagonal matrices commute: DD = DD.
In the special case where A is real and U is orthogonal, then A is symmetric:
A
T
= ( U
T
DU)
T
= U
T
D
T
U = U
T
DU = A
Exercises 2.5 1. For each matrix $A$, find an orthogonal or unitary $U$ and a diagonal $D = U^*AU$.
(a) $\begin{pmatrix}1 & 2\\2 & 1\end{pmatrix}$ (b) $\begin{pmatrix}0 & -1\\1 & 0\end{pmatrix}$ (c) $\begin{pmatrix}2 & 3 - 3i\\3 + 3i & 5\end{pmatrix}$ (d) $\begin{pmatrix}2 & 1 & 1\\1 & 2 & 1\\1 & 1 & 2\end{pmatrix}$
2. Which of the following pairs are unitarily/orthogonally equivalent? Explain your answers.
(a) $A = \begin{pmatrix}0 & 1\\1 & 0\end{pmatrix}$ and $B = \begin{pmatrix}0 & 2\\2 & 0\end{pmatrix}$
(b) $A = \begin{pmatrix}0 & 1 & 0\\1 & 0 & 0\\0 & 0 & 1\end{pmatrix}$ and $B = \begin{pmatrix}2 & 0 & 0\\0 & 1 & 0\\0 & 0 & 0\end{pmatrix}$
(c) $A = \begin{pmatrix}0 & -1 & 0\\1 & 0 & 0\\0 & 0 & 1\end{pmatrix}$ and $B = \begin{pmatrix}1 & 0 & 0\\0 & i & 0\\0 & 0 & -i\end{pmatrix}$
3. Let $a, b \in \mathbb{C}$ be such that $|a|^2 + |b|^2 = 1$. Prove that every $2 \times 2$ matrix of the form $\begin{pmatrix}a & -e^{i\theta}\overline{b}\\b & e^{i\theta}\overline{a}\end{pmatrix}$ is unitary. Are these all the unitary $2 \times 2$ matrices? Prove or disprove.
4. If $A, B$ are orthogonal/unitary, prove that $AB$ and $A^{-1}$ are also orthogonal/unitary.
(This proves that the orthogonal/unitary matrices form groups under matrix multiplication)
5. Check that $A = \frac13\begin{pmatrix}5 & 4i\\-4i & 5\end{pmatrix} \in M_2(\mathbb{C})$ satisfies $A^TA = I$ (it is a complex orthogonal matrix).
(These don't have the same nice relationship with inner products, and are thus less useful to us)
6. Supply the details of Example 2.50.1.
(Hints: $\beta = \{i, j\}$ is orthonormal, whence $\{Ai, Aj\}$ must be orthonormal. Now draw pictures to compute the result of rotating and reflecting the vectors $i$ and $j$.)
7. Show that the linear map in Example 2.50.4 has no eigenvectors.
8. Prove that $A \in M_n(\mathbb{C})$ has an orthonormal basis of eigenvectors whose eigenvalues have modulus 1, if and only if $A$ is unitary.
9. Prove parts (b) and (c) of Corollary 2.48 for a finite-dimensional inner product space:
(a) If β is an orthonormal basis such that T(β) is orthonormal, then T is unitary.
(b) If T is unitary, and η is an orthonormal basis, then T(η) is an orthonormal basis.
10. Let $T$ be a linear operator on a finite-dimensional inner product space $V$. If $\lVert T(x) \rVert = \lVert x \rVert$ for all $x$ in some orthonormal basis of $V$, must $T$ be unitary? Prove or disprove.
11. Let $T$ be a unitary operator on an inner product space $V$ and let $W$ be a finite-dimensional $T$-invariant subspace of $V$. Prove:
(a) $T(W) = W$ (Hint: show that $T_W$ is injective);
(b) $W^\perp$ is $T$-invariant.
12. Let $W$ be a subspace of an inner product space $V$ such that $V = W \oplus W^\perp$. Define $T \in \mathcal{L}(V)$ by $T(u + w) = u - w$ where $u \in W$ and $w \in W^\perp$. Prove that $T$ is unitary and self-adjoint.
13. In the inner product space $\ell^2$ of square-summable sequences, consider the linear operator $T(x_1, x_2, \ldots) = (0, x_1, x_2, \ldots)$. Prove that $T$ is an isometry and compute its adjoint. Check that $T$ is non-invertible and non-unitary.
14. Prove Schur's Lemma for matrices: every $A \in M_n(\mathbb{R})$ whose characteristic polynomial splits is orthogonally equivalent, and every $A \in M_n(\mathbb{C})$ is unitarily equivalent, to an upper-triangular matrix.