2.6 Orthogonal Projections
Recall the discussion of the Gram–Schmidt process, where we saw that any finite-dimensional subspace $W$ of an inner product space $V$ has an orthonormal basis $\beta_W = \{w_1, \ldots, w_n\}$. In such a situation, we can define the orthogonal projections onto $W$ and $W^\perp$ via
$$\pi_W : V \to W : x \mapsto \sum_{j=1}^n \langle x, w_j \rangle w_j, \qquad \pi_{W^\perp} : V \to W^\perp : x \mapsto x - \pi_W(x)$$
Our previous goal was to use orthonormal bases to ease computation. In this section we develop
projections more generally. First recall the notion of a direct sum within a vector space V:
$$V = X \oplus Y \iff \forall v \in V,\ \exists \text{ unique } x \in X,\ y \in Y \text{ such that } v = x + y$$
Definition 2.54. A linear map $T \in \mathcal{L}(V)$ is a projection if
$$V = R(T) \oplus N(T) \quad\text{and}\quad T\big|_{R(T)} = I\big|_{R(T)}$$
Otherwise said, $T(r + n) = r$ whenever $r \in R(T)$ and $n \in N(T)$.
Alternatively, given $V = X \oplus Y$, the projection along $Y$ onto $X$ is the map $v = x + y \mapsto x$.
We call $A \in M_n(F)$ a projection matrix if $L_A \in \mathcal{L}(F^n)$ is a projection.
(Figure: a vector $v$ decomposed as $v = x + y$ with $x = T(v) \in X$ and $y \in Y$.)
Example 2.55. $A = \frac{1}{5}\begin{pmatrix} 6 & -2 \\ 3 & -1 \end{pmatrix}$ is a projection matrix with $R(A) = \operatorname{Span}\begin{pmatrix}2\\1\end{pmatrix}$ and $N(A) = \operatorname{Span}\begin{pmatrix}1\\3\end{pmatrix}$.
Indeed, it is straightforward to describe all projection matrices in $M_2(\mathbb{R})$. There are three cases:
1. $A = I$ is the identity matrix: $R(A) = \mathbb{R}^2$ and $N(A) = \{0\}$;
2. $A = 0$ is the zero matrix: $R(A) = \{0\}$ and $N(A) = \mathbb{R}^2$;
3. Choose distinct subspaces $R(A) = \operatorname{Span}\binom{a}{b}$ and $N(A) = \operatorname{Span}\binom{c}{d}$, then
$$A = \frac{1}{ad - bc}\begin{pmatrix}a\\b\end{pmatrix}\begin{pmatrix}d & -c\end{pmatrix} = \frac{1}{ad - bc}\begin{pmatrix}ad & -ac \\ bd & -bc\end{pmatrix}$$
Think about why this last does what we claim; a numerical sanity check is sketched below.
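The following is a small sketch (not part of the original notes) which verifies the type-3 formula numerically with NumPy, using the spans of Example 2.55 as a test case.

```python
import numpy as np

# A minimal sketch: build the projection onto Span{(a,b)} along Span{(c,d)}
# using the formula above, then confirm it behaves as claimed.
def oblique_projection(a, b, c, d):
    """Projection matrix with range Span{(a,b)} and nullspace Span{(c,d)}."""
    return np.array([[a*d, -a*c], [b*d, -b*c]]) / (a*d - b*c)

A = oblique_projection(2, 1, 1, 3)                  # the matrix of Example 2.55
assert np.allclose(A @ A, A)                        # idempotent, so a projection
assert np.allclose(A @ np.array([2, 1]), [2, 1])    # fixes its range
assert np.allclose(A @ np.array([1, 3]), [0, 0])    # kills its nullspace
print(A)                                            # (1/5)[[6,-2],[3,-1]]
```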
It should be clear that every projection $T$ has (at most) two eigenspaces:
• $R(T)$ is an eigenspace with eigenvalue 1;
• $N(T)$ is an eigenspace with eigenvalue 0.
If $V$ is finite-dimensional and $\rho, \eta$ are bases of $R(T), N(T)$ respectively, then the matrix of $T$ with respect to $\rho \cup \eta$ has block form
$$[T]_{\rho \cup \eta} = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}$$
where $\operatorname{rank} I = \operatorname{rank} T$. In particular, every finite-dimensional projection is diagonalizable.
Lemma 2.56. $T \in \mathcal{L}(V)$ is a projection if and only if $T^2 = T$.
Proof. Throughout, assume $r \in R(T)$ and $n \in N(T)$.
($\Rightarrow$) Since every vector in $V$ has a unique representation $v = r + n$, simply compute
$$T^2(v) = T\big(T(r + n)\big) = T(r) = r = T(v)$$
($\Leftarrow$) Suppose $T^2 = T$. Note first that if $r \in R(T)$, then $r = T(v)$ for some $v \in V$, whence
$$T(r) = T^2(v) = T(v) = r \tag{†}$$
Thus $T$ is the identity on $R(T)$. Moreover, if $x \in R(T) \cap N(T)$, (†) says that $x = T(x) = 0$, whence
$$R(T) \cap N(T) = \{0\}$$
and so $R(T) \oplus N(T)$ is a well-defined subspace of $V$.¹
To finish things off, let $v \in V$ and observe that
$$T\big(v - T(v)\big) = T(v) - T^2(v) = 0 \implies v - T(v) \in N(T)$$
so that $v = T(v) + \big(v - T(v)\big)$ is a decomposition into $R(T)$- and $N(T)$-parts. We conclude that $V = R(T) \oplus N(T)$ and that $T$ is a projection.
Thus far the discussion hasn’t had anything to do with inner products. . .
Definition 2.57. An orthogonal projection is a projection $T \in \mathcal{L}(V)$ on an inner product space for which we additionally have
$$N(T) = R(T)^\perp \quad\text{and}\quad R(T) = N(T)^\perp$$
Alternatively, given $V = W \oplus W^\perp$, the orthogonal projection $\pi_W$ is the projection along $W^\perp$ onto $W$: that is
$$R(\pi_W) = W \quad\text{and}\quad N(\pi_W) = W^\perp$$
The complementary orthogonal projection $\pi_{W^\perp} = I - \pi_W$ has $R(\pi_{W^\perp}) = W^\perp$ and $N(\pi_{W^\perp}) = W$.
Example (2.55 continued). The identity and zero matrices are both $2 \times 2$ orthogonal projection matrices, while those of type 3 are orthogonal if $\binom{a}{b} \cdot \binom{c}{d} = 0$: we obtain
$$A = \frac{1}{a^2 + b^2}\begin{pmatrix}a\\b\end{pmatrix}\begin{pmatrix}a & b\end{pmatrix} = \frac{1}{a^2 + b^2}\begin{pmatrix}a^2 & ab \\ ab & b^2\end{pmatrix}$$
More generally, if $W \le F^n$ has orthonormal basis $\{w_1, \ldots, w_k\}$, then the matrix of $\pi_W$ is $\sum_{j=1}^k w_j w_j^*$.

¹ In finite dimensions, the rank–nullity theorem and dimension counting finishes the proof here without having to proceed further:
$$\dim\big(R(T) \oplus N(T)\big) = \operatorname{rank} T + \operatorname{null} T = \dim V \implies R(T) \oplus N(T) = V$$
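Here is a brief sketch (an assumed set-up, not from the notes) of the formula $\pi_W = \sum_j w_j w_j^*$ in NumPy; the orthonormal basis is produced from an arbitrary spanning set via QR.

```python
import numpy as np

# A small sketch: the matrix of pi_W as a sum of rank-one projections w_j w_j^T,
# where the w_j come from orthonormalizing a spanning set of W.
def projection_matrix(spanning_vectors):
    """Orthogonal projection onto W = Span(spanning_vectors) in R^n."""
    V = np.column_stack(spanning_vectors)
    Q, _ = np.linalg.qr(V)                      # columns of Q: orthonormal basis of W
    return sum(np.outer(q, q) for q in Q.T)     # sum of the rank-one pieces w_j w_j^T

P = projection_matrix([np.array([1., 2., 1.]), np.array([1., 0., -1.])])
assert np.allclose(P @ P, P)      # idempotent
assert np.allclose(P, P.T)        # self-adjoint, hence an orthogonal projection
```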
Theorem 2.58. A projection $T \in \mathcal{L}(V)$ is orthogonal if and only if it is self-adjoint: $T = T^*$.
Proof. ($\Rightarrow$) By assumption, $R(T)$ and $N(T)$ are orthogonal subspaces. Letting $x, y \in V$ and using subscripts to denote $R(T)$- and $N(T)$-parts, we see that
$$\langle x, T(y) \rangle = \langle x_r + x_n, y_r \rangle = \langle x_r, y_r + y_n \rangle = \langle T(x), y \rangle \implies T = T^*$$
($\Leftarrow$) Suppose $T$ is a self-adjoint projection. By the fundamental subspaces theorem,
$$N(T) = N(T^*) = R(T)^\perp$$
Since $T$ is a projection already, we have $V = R(T) \oplus N(T) = R(T) \oplus R(T)^\perp$, from which
$$R(T) = \big(R(T)^\perp\big)^\perp = N(T)^\perp$$
The language of projections allows us to rephrase the Spectral Theorem.
Theorem 2.59 (Spectral Theorem, mk. II). Let $V$ be a finite-dimensional complex/real inner product space and $T \in \mathcal{L}(V)$ be normal/self-adjoint with spectrum $\{\lambda_1, \ldots, \lambda_k\}$ and corresponding eigenspaces $E_1, \ldots, E_k$. Let $\pi_j \in \mathcal{L}(V)$ be the orthogonal projection onto $E_j$. Then:
1. $V = E_1 \oplus \cdots \oplus E_k$ is a direct sum of orthogonal subspaces; in particular, $E_j^\perp$ is the direct sum of the remaining eigenspaces.
2. $\pi_i \pi_j = 0$ if $i \neq j$.
3. (Resolution of the identity) $I_V = \pi_1 + \cdots + \pi_k$
4. (Spectral decomposition) $T = \lambda_1 \pi_1 + \cdots + \lambda_k \pi_k$
Proof. 1. $T$ is diagonalizable and so $V$ is the direct sum of the eigenspaces of $T$. Since $T$ is normal, the eigenvectors corresponding to distinct eigenvalues are orthogonal, whence the eigenspaces are mutually orthogonal. In particular, this says that
$$\hat{E}_j := \bigoplus_{i \neq j} E_i \subseteq E_j^\perp$$
Since $V$ is finite-dimensional, we have $V = E_j \oplus E_j^\perp$, whence
$$\dim \hat{E}_j = \sum_{i \neq j} \dim E_i = \dim V - \dim E_j = \dim E_j^\perp \implies \hat{E}_j = E_j^\perp$$
2. This is clear by part 1, since $N(\pi_j) = E_j^\perp = \hat{E}_j$.
3. Write $x = \sum_{j=1}^k x_j$ where each $x_j \in E_j$. Then $\pi_j(x) = x_j$: now add. . .
4. $T(x) = \sum_{j=1}^k T(x_j) = \sum_{j=1}^k \lambda_j x_j = \sum_{j=1}^k \lambda_j \pi_j(x)$.
Examples 2.60. We verify the resolution of the identity and the spectral decomposition; for clarity,
we index projections and eigenspaces by eigenvalue rather than the natural numbers.
1. The symmetric matrix $A = \begin{pmatrix} 10 & 2 \\ 2 & 7 \end{pmatrix}$ has spectrum $\{6, 11\}$ and orthonormal eigenvectors
$$w_6 = \frac{1}{\sqrt{5}}\begin{pmatrix}1\\-2\end{pmatrix}, \qquad w_{11} = \frac{1}{\sqrt{5}}\begin{pmatrix}2\\1\end{pmatrix}$$
The corresponding projections therefore have matrices
$$\pi_6 = w_6 w_6^T = \frac{1}{5}\begin{pmatrix}1\\-2\end{pmatrix}\begin{pmatrix}1 & -2\end{pmatrix} = \frac{1}{5}\begin{pmatrix}1 & -2 \\ -2 & 4\end{pmatrix}, \qquad \pi_{11} = w_{11} w_{11}^T = \frac{1}{5}\begin{pmatrix}4 & 2 \\ 2 & 1\end{pmatrix}$$
from which the resolution of the identity and the spectral decomposition are readily verified:
$$\pi_6 + \pi_{11} = \begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix} \quad\text{and}\quad 6\pi_6 + 11\pi_{11} = \frac{1}{5}\begin{pmatrix}6 + 44 & -12 + 22 \\ -12 + 22 & 24 + 11\end{pmatrix} = A$$
2. The normal matrix $B = \begin{pmatrix} 1+i & 1+i \\ -1-i & 1+i \end{pmatrix} \in M_2(\mathbb{C})$ has spectrum $\{2, 2i\}$ and corresponding orthonormal eigenvectors
$$w_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}i\\1\end{pmatrix}, \qquad w_{2i} = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\i\end{pmatrix}$$
The orthogonal projection matrices are therefore
$$\pi_2 = w_2 w_2^* = \frac{1}{2}\begin{pmatrix}i\\1\end{pmatrix}\begin{pmatrix}-i & 1\end{pmatrix} = \frac{1}{2}\begin{pmatrix}1 & i \\ -i & 1\end{pmatrix}, \qquad \pi_{2i} = w_{2i} w_{2i}^* = \frac{1}{2}\begin{pmatrix}1 & -i \\ i & 1\end{pmatrix}$$
from which
$$\pi_2 + \pi_{2i} = \begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix} \quad\text{and}\quad 2\pi_2 + 2i\pi_{2i} = \begin{pmatrix}1 & i \\ -i & 1\end{pmatrix} + \begin{pmatrix}i & 1 \\ -1 & i\end{pmatrix} = B$$
3. The matrix $C = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$ has spectrum $\{-1, 2\}$, an orthonormal eigenbasis
$$\{u, v, w\} = \left\{\frac{1}{\sqrt{3}}\begin{pmatrix}1\\1\\1\end{pmatrix},\ \frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\\0\end{pmatrix},\ \frac{1}{\sqrt{6}}\begin{pmatrix}1\\1\\-2\end{pmatrix}\right\}$$
and eigenspaces $E_2 = \operatorname{Span}\{u\}$ and $E_{-1} = \operatorname{Span}\{v, w\}$. The orthogonal projections have matrices
$$\pi_2 = uu^T = \frac{1}{3}\begin{pmatrix}1\\1\\1\end{pmatrix}\begin{pmatrix}1 & 1 & 1\end{pmatrix} = \frac{1}{3}\begin{pmatrix}1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1\end{pmatrix}$$
$$\pi_{-1} = vv^T + ww^T = \frac{1}{2}\begin{pmatrix}1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0\end{pmatrix} + \frac{1}{6}\begin{pmatrix}1 & 1 & -2 \\ 1 & 1 & -2 \\ -2 & -2 & 4\end{pmatrix} = \frac{1}{3}\begin{pmatrix}2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2\end{pmatrix}$$
It is now easy to check the resolution of the identity and the spectral decomposition:
$$\pi_2 + \pi_{-1} = I \quad\text{and}\quad 2\pi_2 - \pi_{-1} = C$$
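The following is a hedged numerical check of the last example (not part of the original notes): NumPy's symmetric eigensolver builds the eigenvalue projections of $C$, and the two identities above follow.

```python
import numpy as np

# Build the projections onto the eigenspaces of C and verify parts 3 and 4
# of the Spectral Theorem numerically.
C = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
eigvals, eigvecs = np.linalg.eigh(C)          # C is symmetric, so eigh applies

projections = {}
for lam in np.unique(np.round(eigvals, 8)):   # group eigenvector columns by eigenvalue
    cols = eigvecs[:, np.isclose(eigvals, lam)]
    projections[lam] = cols @ cols.T          # sum of w w^T over that eigenspace

assert np.allclose(sum(projections.values()), np.eye(3))               # resolution of the identity
assert np.allclose(sum(lam * P for lam, P in projections.items()), C)  # spectral decomposition
```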
Orthogonal Projections and Minimization Problems
We finish this section with an important observation that drives much of the application of inner
product spaces to other parts of mathematics and beyond. Throughout this discussion, X and Y
denote inner product spaces.
Theorem 2.61. Suppose $Y = W \oplus W^\perp$. For any $y \in Y$, the orthogonal projection $\pi_W(y)$ is the unique element of $W$ which minimizes the distance to $y$:
$$\forall w \in W, \qquad \|y - \pi_W(y)\| \le \|y - w\|$$
Proof. Apply Pythagoras' Theorem: since $\pi_{W^\perp}(y) = y - \pi_W(y) \in W^\perp$,
$$\|y - w\|^2 = \|y - \pi_W(y) + \pi_W(y) - w\|^2 = \|y - \pi_W(y)\|^2 + \|\pi_W(y) - w\|^2 \ge \|y - \pi_W(y)\|^2$$
with equality if and only if $w = \pi_W(y)$.
This setup can be used to compute accurate approximations in many contexts.
Examples 2.62. 1. To obtain a quadratic polynomial approximation $p(x) = a + bx + cx^2$ to $e^x$ on the interval $[-1, 1]$, we choose to minimize the integral $\int_{-1}^1 |e^x - p(x)|^2\,dx$, namely the squared $L^2$-norm $\|e^x - p(x)\|^2$ on $C[-1, 1]$. If we let $W = \operatorname{Span}\{1, x, x^2\}$, then the finite-dimensionality of $W$ means that $C[-1, 1] = W \oplus W^\perp$. By the Theorem, the solution is $p(x) = \pi_W(e^x)$.
To compute this, recall that we have an orthonormal basis for $W$, namely
$$\left\{\frac{1}{\sqrt{2}},\ \sqrt{\tfrac{3}{2}}\,x,\ \sqrt{\tfrac{5}{8}}(3x^2 - 1)\right\}$$
from which
$$p(x) = \frac{1}{2}\langle 1, e^x\rangle + \frac{3}{2}\langle x, e^x\rangle\, x + \frac{5}{8}(3x^2 - 1)\langle 3x^2 - 1, e^x\rangle$$
$$= \frac{1}{2}\int_{-1}^1 e^x\,dx + \frac{3}{2}x\int_{-1}^1 xe^x\,dx + \frac{5}{8}(3x^2 - 1)\int_{-1}^1 (3x^2 - 1)e^x\,dx$$
$$= \frac{1}{2}(e - e^{-1}) + 3e^{-1}x + \frac{5}{4}(e - 7e^{-1})(3x^2 - 1)$$
$$\approx 1.18 + 1.10x + 0.179(3x^2 - 1) \approx 1 + 1.1x + 0.537x^2$$
(Figure: the linear and quadratic approximations to $y = e^x$ on $[-1, 1]$.) Compare this with the Maclaurin polynomial $e^x \approx 1 + x + \frac{1}{2}x^2$ from calculus. A numerical version of this computation is sketched below.
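Here is a short sketch (assumed set-up, not from the notes) which computes the same projection numerically: the coefficient of each orthonormal basis function is just the inner product with $e^x$, evaluated with `scipy.integrate.quad`.

```python
import numpy as np
from scipy.integrate import quad

# Compute pi_W(e^x) on [-1, 1] against the orthonormal basis of Example 2.62.1.
basis = [
    lambda x: 1 / np.sqrt(2),
    lambda x: np.sqrt(3 / 2) * x,
    lambda x: np.sqrt(5 / 8) * (3 * x**2 - 1),
]

# coefficient of each basis function is the inner product <e^x, q_j>
coeffs = [quad(lambda x, q=q: np.exp(x) * q(x), -1, 1)[0] for q in basis]

def p(x):
    """Best L^2 quadratic approximation to e^x on [-1, 1]."""
    return sum(c * q(x) for c, q in zip(coeffs, basis))

print(p(0.0), np.exp(0.0))   # roughly 0.996 vs 1.0
```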
2. The $n$th Fourier approximation of a function $f(x)$ is its orthogonal projection onto the finite-dimensional subspace
$$W_n = \operatorname{Span}\{1, e^{ix}, e^{-ix}, \ldots, e^{inx}, e^{-inx}\} = \operatorname{Span}\{1, \cos x, \sin x, \cos 2x, \sin 2x, \ldots, \cos nx, \sin nx\}$$
within $L^2[-\pi, \pi]$, namely
$$F_n(x) = \frac{1}{2\pi}\sum_{k=-n}^n \big\langle f(x), e^{ikx}\big\rangle e^{ikx} = \frac{1}{2\pi}\langle f(x), 1\rangle + \frac{1}{\pi}\sum_{k=1}^n \big(\langle f(x), \cos kx\rangle \cos kx + \langle f(x), \sin kx\rangle \sin kx\big)$$
According to the Theorem, this is the unique function $F_n(x) \in W_n$ minimizing the integral
$$\|f(x) - F_n(x)\|^2 = \int_{-\pi}^{\pi} |f(x) - F_n(x)|^2\,dx$$
For example, if $f(x) = \begin{cases} 1 & \text{if } 0 < x \le \pi \\ -1 & \text{if } -\pi < x \le 0\end{cases}$ is extended periodically, then
$$F_{2n-1}(x) = \frac{4}{\pi}\sum_{j=1}^n \frac{\sin(2j-1)x}{2j-1} = \frac{4}{\pi}\left(\sin x + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \cdots + \frac{\sin(2n-1)x}{2n-1}\right)$$
(Figure: $y = f(x)$ and its eleventh Fourier approximation $y = F_{11}(x)$ on $[-2\pi, 2\pi]$.)
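A quick sketch (not from the notes) evaluating the partial sum $F_{2n-1}$ of the square wave:

```python
import numpy as np

# Evaluate the Fourier approximation F_{2n-1} of the square wave above.
def F(x, n):
    """Partial sum (4/pi) * sum_{j=1}^{n} sin((2j-1)x)/(2j-1)."""
    j = np.arange(1, n + 1)
    return (4 / np.pi) * np.sum(np.sin(np.outer(x, 2 * j - 1)) / (2 * j - 1), axis=1)

x = np.linspace(0.3, np.pi - 0.3, 5)
print(np.round(F(x, 6), 3))     # F_11 at points of (0, pi): oscillates about +1, the value of f there
```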
We’ll return to related minimization problems in the next section.
Exercises 2.6 1. Compute the matrices of the orthogonal projections onto $W$, viewed as subspaces of the standard inner product spaces $\mathbb{R}^n$ or $\mathbb{C}^n$.
(a) $W = \operatorname{Span}\left\{\begin{pmatrix}4\\1\end{pmatrix}\right\}$
(b) $W = \operatorname{Span}\left\{\begin{pmatrix}1\\2\\1\end{pmatrix}, \begin{pmatrix}1\\0\\-1\end{pmatrix}\right\}$
(c) $W = \operatorname{Span}\left\{\begin{pmatrix}i\\1\\0\end{pmatrix}, \begin{pmatrix}1\\i\\1\end{pmatrix}\right\}$
(d) $W = \operatorname{Span}\left\{\begin{pmatrix}1\\1\\0\end{pmatrix}, \begin{pmatrix}1\\2\\1\end{pmatrix}\right\}$ (watch out, these vectors aren't orthogonal!)
2. For each of the following matrices, compute the projections onto each eigenspace, and verify the resolution of the identity and the spectral decomposition.
(a) $\begin{pmatrix}1 & 2 \\ 2 & 1\end{pmatrix}$  (b) $\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}$  (c) $\begin{pmatrix}2 & 3-3i \\ 3+3i & 5\end{pmatrix}$  (d) $\begin{pmatrix}2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2\end{pmatrix}$
(You should have orthonormal eigenbases from the previous section.)
3. Let $W$ be a finite-dimensional subspace of an inner product space $V$. If $T = \pi_W$ is the orthogonal projection onto $W$, prove that $I - T$ is the orthogonal projection onto $W^\perp$.
4. Let $T \in \mathcal{L}(V)$ where $V$ is finite-dimensional.
(a) If $T$ is an orthogonal projection, prove that $\|T(x)\| \le \|x\|$ for all $x \in V$.
(b) Give an example of a projection for which the inequality in (a) is false.
(c) If $T$ is a projection for which $\|T(x)\| = \|x\|$ for all $x \in V$, what is $T$?
(d) If $T$ is a projection for which $\|T(x)\| \le \|x\|$ for all $x \in V$, prove that $T$ is an orthogonal projection.
5. Let $T$ be a normal operator on a finite-dimensional inner product space. If $T$ is a projection, prove that it must be an orthogonal projection.
6. Let $T$ be a normal operator on a finite-dimensional complex inner product space $V$. Use the spectral decomposition $T = \lambda_1\pi_1 + \cdots + \lambda_k\pi_k$ to prove:
(a) If $T^n$ is the zero map for some $n \in \mathbb{N}$, then $T$ is the zero map.
(b) $U \in \mathcal{L}(V)$ commutes with $T$ if and only if $U$ commutes with each $\pi_j$.
(c) There exists a normal $U \in \mathcal{L}(V)$ such that $U^2 = T$.
(d) $T$ is invertible if and only if $\lambda_j \neq 0$ for all $j$.
(e) $T$ is a projection if and only if every $\lambda_j = 0$ or $1$.
(f) $T = -T^*$ if and only if every $\lambda_j$ is imaginary.
7. Find a linear approximation to $f(x) = e^x$ on $[0, 1]$ using the $L^2$ inner product.
8. Consider the $L^2$ inner product on $C[-\pi, \pi]$.
(a) Explain why $\langle \sin x, x^{2n}\rangle = 0$ for all $n$.
(b) Find linear and cubic approximations to $f(x) = \sin x$.
(Feel free to use a computer algebra package to evaluate the integrals!)
9. Revisit Example 2.62.2.
(a) Verify that the general complex ($e^{ikx}$) and real ($\cos kx$, $\sin kx$) expressions for the Fourier approximation are correct.
(Hint: use Euler's formula $e^{ikx} = \cos kx + i\sin kx$.)
(b) Verify the explicit expression for $F_{2n-1}(x)$ when $f(x)$ is the given step-function. What is $F_{2n}(x)$ in this case?
2.7 The Singular Value Decomposition and the Pseudoinverse
Given $T \in \mathcal{L}(V, W)$ between finite-dimensional inner product spaces, the overarching concern of this chapter is the existence and computation of bases $\beta, \gamma$ of $V, W$ with two properties:
• That $\beta, \gamma$ be orthonormal, thus facilitating easy calculation within $V, W$;
• That the matrix $[T]^\gamma_\beta$ be as simple as possible.
We have already addressed two special cases:
Spectral Theorem: When $V = W$ and $T$ is normal/self-adjoint, $\exists \beta = \gamma$ such that $[T]_\beta$ is diagonal.
Schur's Lemma: When $V = W$ and $p(t)$ splits, $\exists \beta = \gamma$ such that $[T]_\beta$ is upper triangular.
In this section we allow $V \neq W$ and $\beta \neq \gamma$, and obtain a result that applies to any linear map between finite-dimensional inner product spaces.
Example 2.63. Let $A = \begin{pmatrix} 3 & 1 \\ 2 & -2 \\ 1 & 3 \end{pmatrix}$ and consider orthonormal bases $\beta = \{v_1, v_2\}$ of $\mathbb{R}^2$ and $\gamma = \{w_1, w_2, w_3\}$ of $\mathbb{R}^3$ respectively:
$$\beta = \left\{\frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix},\ \frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}\right\}, \qquad \gamma = \left\{\frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\1\end{pmatrix},\ \frac{1}{\sqrt{6}}\begin{pmatrix}1\\2\\-1\end{pmatrix},\ \frac{1}{\sqrt{3}}\begin{pmatrix}1\\-1\\-1\end{pmatrix}\right\}$$
Since $Av_1 = 4w_1$ and $Av_2 = 2\sqrt{3}\,w_2$, the matrix $[L_A]^\gamma_\beta = \begin{pmatrix} 4 & 0 \\ 0 & 2\sqrt{3} \\ 0 & 0 \end{pmatrix}$ is almost diagonal.
Our main result says that such bases always exist.
Theorem 2.64 (Singular Value Decomposition). Suppose $V, W$ are finite-dimensional inner product spaces and that $T \in \mathcal{L}(V, W)$ has rank $r$. Then:
1. There exist orthonormal bases $\beta = \{v_1, \ldots, v_n\}$ of $V$ and $\gamma = \{w_1, \ldots, w_m\}$ of $W$, and positive scalars $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$, such that
$$T(v_j) = \begin{cases} \sigma_j w_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases} \qquad\text{equivalently}\qquad [T]^\gamma_\beta = \begin{pmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O \\ O & O \end{pmatrix}$$
2. Any such $\beta$ is an eigenbasis of $T^*T$, whence the scalars $\sigma_j$ are uniquely determined by $T$: indeed
$$T^*T(v_j) = \begin{cases} \sigma_j^2 v_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases} \qquad\text{and}\qquad T^*(w_j) = \begin{cases} \sigma_j v_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases}$$
3. If $A \in M_{m\times n}(F)$ with $\operatorname{rank} A = r$, then $A = P\Sigma Q^*$ where
$$\Sigma = [L_A]^\gamma_\beta = \begin{pmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O \\ O & O \end{pmatrix}, \qquad P = (w_1 \cdots w_m), \qquad Q = (v_1 \cdots v_n)$$
Since the columns of $P, Q$ are orthonormal, these matrices are unitary.
Definition 2.65. The numbers $\sigma_1, \ldots, \sigma_r$ are the singular values of $T$. If $T$ is not of maximum rank, we have additional zero singular values $\sigma_{r+1} = \cdots = \sigma_{\min(m,n)} = 0$.
Freedom of Choice: While the singular values are uniquely determined, there is often significant freedom regarding the bases $\beta$ and $\gamma$, particularly if any eigenspace of $T^*T$ has dimension $\ge 2$.
Special Case (Spectral Theorem): If $V = W$ and $T$ is normal/self-adjoint, we may choose $\beta$ to be an eigenbasis of $T$; then $\sigma_j$ is the modulus of the corresponding eigenvalue (see Exercise 7).
Rank-one decomposition: If we write $g_j : V \to F$ for the linear map $g_j : v \mapsto \langle v, v_j\rangle$ (recall Riesz's Theorem), then the singular value decomposition says
$$T = \sum_{j=1}^r \sigma_j w_j g_j, \qquad\text{that is}\qquad T(v) = \sum_{j=1}^r \sigma_j \langle v, v_j\rangle w_j$$
thus rewriting $T$ as a linear combination of rank-one maps ($w_j g_j : V \to W$).
For matrices, $g_j$ is simply multiplication by the row vector $v_j^*$ and we may write²
$$A = \sum_{j=1}^r \sigma_j w_j v_j^*$$
Example (2.63 cont). We apply the method in the Theorem to $A = \begin{pmatrix} 3 & 1 \\ 2 & -2 \\ 1 & 3 \end{pmatrix}$.
The symmetric(!) matrix $A^TA = \begin{pmatrix} 14 & 2 \\ 2 & 14 \end{pmatrix}$ has eigenvalues $\sigma_1^2 = 16$, $\sigma_2^2 = 12$ and orthonormal eigenvectors $v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}$, $v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}$. The singular values are therefore $\sigma_1 = 4$, $\sigma_2 = 2\sqrt{3}$. Now compute
$$w_1 = \frac{1}{\sigma_1}Av_1 = \frac{1}{4\sqrt{2}}A\begin{pmatrix}1\\1\end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\1\end{pmatrix}, \qquad w_2 = \frac{1}{\sigma_2}Av_2 = \frac{1}{2\sqrt{6}}A\begin{pmatrix}1\\-1\end{pmatrix} = \frac{1}{\sqrt{6}}\begin{pmatrix}1\\2\\-1\end{pmatrix}$$
and observe that these are orthonormal. Finally choose $w_3 = \frac{1}{\sqrt{3}}\begin{pmatrix}1\\-1\\-1\end{pmatrix}$ to complete the orthonormal basis $\gamma$ of $\mathbb{R}^3$. A singular value decomposition is therefore
$$A = \begin{pmatrix} 3 & 1 \\ 2 & -2 \\ 1 & 3 \end{pmatrix} = P\Sigma Q^* = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{3}} \end{pmatrix}\begin{pmatrix} 4 & 0 \\ 0 & 2\sqrt{3} \\ 0 & 0 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}$$
By expanding the decomposition, $A$ is expressed as the sum of rank-one matrices:
$$\sigma_1 w_1 v_1^T + \sigma_2 w_2 v_2^T = 4\begin{pmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} + 2\sqrt{3}\begin{pmatrix} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 0 & 0 \\ 2 & 2 \end{pmatrix} + \begin{pmatrix} 1 & -1 \\ 2 & -2 \\ -1 & 1 \end{pmatrix}$$
² Since $\beta$ is orthonormal, it is common to write $v_j^*$ for the map $g_j = \langle \cdot, v_j\rangle$ in general contexts. To those familiar with the dual space $V^* = \mathcal{L}(V, F)$, the set $\{g_1, \ldots, g_n\} = \{v_1^*, \ldots, v_n^*\}$ is the dual basis to $\beta$. In this course $v_j^*$ will only ever mean the conjugate-transpose of a column vector in $F^n$. This discussion is part of why physicists write inner products differently!
Proof. 1. Since $T^*T$ is self-adjoint, the spectral theorem says it has an orthonormal basis of eigenvectors $\beta = \{v_1, \ldots, v_n\}$. If $T^*T(v_j) = \lambda_j v_j$, then
$$\langle T(v_j), T(v_k)\rangle = \langle T^*T(v_j), v_k\rangle = \lambda_j\langle v_j, v_k\rangle = \begin{cases} \lambda_j & \text{if } j = k \\ 0 & \text{if } j \neq k\end{cases} \tag{$*$}$$
whence every eigenvalue is a non-negative real number: $\lambda_j = \|T(v_j)\|^2 \ge 0$.
Since $\operatorname{rank} T^*T = \operatorname{rank} T = r$ (Exercise 8), exactly $r$ eigenvalues are non-zero; by reordering basis vectors if necessary, we may assume
$$\lambda_1 \ge \cdots \ge \lambda_r > 0$$
If $j \le r$, define $\sigma_j := \sqrt{\lambda_j} > 0$ and $w_j := \frac{1}{\sigma_j}T(v_j)$; then the set $\{w_1, \ldots, w_r\}$ is orthonormal ($*$). If necessary, extend this to an orthonormal basis $\gamma$ of $W$.
2. If orthonormal bases $\beta$ and $\gamma$ exist such that $[T]^\gamma_\beta = \left(\begin{smallmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O \\ O & O\end{smallmatrix}\right)$, then $[T^*]^\beta_\gamma$ is essentially the same matrix, just with its dimensions reversed:
$$[T^*]^\beta_\gamma = \begin{pmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O \\ O & O\end{pmatrix} \implies [T^*T]_\beta = \begin{pmatrix} \operatorname{diag}(\sigma_1^2, \ldots, \sigma_r^2) & O \\ O & O\end{pmatrix}$$
whence $T^*$ and $T^*T$ are as claimed.
3. This is merely part 1 in the context of $T = L_A \in \mathcal{L}(F^n, F^m)$. The orthonormal bases $\beta, \gamma$ consist of column vectors and so the (change of co-ordinate) matrices $P, Q$ are unitary.
Examples 2.66. 1. The matrix $A = \begin{pmatrix} 2 & 3 \\ 0 & 2 \end{pmatrix}$ has $A^TA = \begin{pmatrix} 4 & 6 \\ 6 & 13 \end{pmatrix}$ with eigenvalues $\sigma_1^2 = 16$ and $\sigma_2^2 = 1$ and orthonormal eigenbasis
$$\beta = \left\{\frac{1}{\sqrt{5}}\begin{pmatrix}1\\2\end{pmatrix},\ \frac{1}{\sqrt{5}}\begin{pmatrix}-2\\1\end{pmatrix}\right\}$$
The singular values are therefore $\sigma_1 = 4$ and $\sigma_2 = 1$, from which we obtain
$$\gamma = \left\{\frac{1}{\sigma_1}Av_1,\ \frac{1}{\sigma_2}Av_2\right\} = \left\{\frac{1}{\sqrt{5}}\begin{pmatrix}2\\1\end{pmatrix},\ \frac{1}{\sqrt{5}}\begin{pmatrix}-1\\2\end{pmatrix}\right\}$$
and the decomposition
$$A = P\Sigma Q^* = \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{pmatrix}\begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{pmatrix}$$
Multiplying out, we may write $A$ as a sum of rank-one matrices
$$A = \frac{4}{5}\begin{pmatrix}2\\1\end{pmatrix}\begin{pmatrix}1 & 2\end{pmatrix} + \frac{1}{5}\begin{pmatrix}-1\\2\end{pmatrix}\begin{pmatrix}-2 & 1\end{pmatrix} = \frac{4}{5}\begin{pmatrix}2 & 4 \\ 1 & 2\end{pmatrix} + \frac{1}{5}\begin{pmatrix}2 & -1 \\ -4 & 2\end{pmatrix}$$
(A numerical check of this decomposition with numpy is sketched after these examples.)
2. The decomposition can be very messy to find in non-matrix situations. Here is a classic example where we simply observe the structure directly.
The $L^2$ inner product $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$ on $P_2(\mathbb{R})$ and $P_1(\mathbb{R})$ admits orthonormal bases
$$\beta = \left\{\sqrt{5}(6x^2 - 6x + 1),\ \sqrt{3}(2x - 1),\ 1\right\}, \qquad \gamma = \left\{\sqrt{3}(2x - 1),\ 1\right\}$$
Let $T = \frac{d}{dx}$ be the derivative operator. The matrix of $T$ is already in the required form!
$$[T]^\gamma_\beta = \begin{pmatrix} 2\sqrt{15} & 0 & 0 \\ 0 & 2\sqrt{3} & 0 \end{pmatrix}$$
thus $\beta, \gamma$ are suitable bases and the singular values of $T$ are $\sigma_1 = 2\sqrt{15}$ and $\sigma_2 = 2\sqrt{3}$.
Since $\beta, \gamma$ are orthonormal, we could have used the adjoint method to evaluate this directly:
$$[T^*T]_\beta = ([T]^\gamma_\beta)^T[T]^\gamma_\beta = \begin{pmatrix} 60 & 0 & 0 \\ 0 & 12 & 0 \\ 0 & 0 & 0 \end{pmatrix} \implies \sigma_1^2 = 60,\ \sigma_2^2 = 12$$
Up to sign, $\{[v_1]_\beta, [v_2]_\beta, [v_3]_\beta\}$ is therefore forced to be the standard ordered basis of $\mathbb{R}^3$, confirming that $\beta$ was the correct basis of $P_2(\mathbb{R})$ all along!
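The promised check (not part of the original notes): NumPy's SVD of the matrix from Example 2.66.1 recovers the singular values computed by hand.

```python
import numpy as np

A = np.array([[2., 3.], [0., 2.]])
P, sigma, Qstar = np.linalg.svd(A)          # A = P @ diag(sigma) @ Qstar

print(sigma)                                 # [4., 1.]
assert np.allclose(P @ np.diag(sigma) @ Qstar, A)

# The columns of P and the rows of Qstar may differ from the hand computation
# by signs (or by a rotation within a repeated singular value); the singular
# values themselves are unique.
```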
The Pseudoinverse
The singular value decomposition of a map $T \in \mathcal{L}(V, W)$ gives rise to a natural map from $W$ back to $V$. This map behaves somewhat like an inverse even when the operator is non-invertible!
Definition 2.67. Given the singular value decomposition of a rank $r$ map $T \in \mathcal{L}(V, W)$, the pseudoinverse of $T$ is the linear map $T^\dagger \in \mathcal{L}(W, V)$ defined by
$$T^\dagger(w_j) = \begin{cases} \frac{1}{\sigma_j}v_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases}$$
Restricted to $\operatorname{Span}\{v_1, \ldots, v_r\}$ and $\operatorname{Span}\{w_1, \ldots, w_r\}$, the pseudoinverse really does invert $T$:
$$T^\dagger T(v_j) = \begin{cases} v_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases} \qquad\qquad TT^\dagger(w_j) = \begin{cases} w_j & \text{if } j \le r \\ 0 & \text{otherwise}\end{cases}$$
(Diagram: $V = \underbrace{\operatorname{Span}\{v_1, \ldots, v_r\}}_{N(T)^\perp = R(T^\dagger)} \oplus \underbrace{\operatorname{Span}\{v_{r+1}, \ldots, v_n\}}_{N(T) = R(T^\dagger)^\perp}$ and $W = \underbrace{\operatorname{Span}\{w_1, \ldots, w_r\}}_{R(T) = N(T^\dagger)^\perp} \oplus \underbrace{\operatorname{Span}\{w_{r+1}, \ldots, w_m\}}_{R(T)^\perp = N(T^\dagger)}$; $T$ and $T^\dagger$ restrict to mutually inverse bijections between the first summands, while $T$ sends $N(T)$ to $\{0_W\}$ and $T^\dagger$ sends $N(T^\dagger)$ to $\{0_V\}$.)
Otherwise said, the combinations are orthogonal projections: $T^\dagger T = \pi_{N(T)^\perp}$ and $TT^\dagger = \pi_{R(T)}$.
Given the singular value decomposition of a matrix $A = P\Sigma Q^*$, its pseudoinverse is the matrix of $(L_A)^\dagger$, namely
$$A^\dagger = \sum_{j=1}^r \frac{1}{\sigma_j}v_j w_j^* = Q\Sigma^\dagger P^* \qquad\text{where}\qquad \Sigma^\dagger = \begin{pmatrix} \operatorname{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}) & O \\ O & O\end{pmatrix}$$
Examples 2.68. 1. Again continuing Example 2.63, $A = \begin{pmatrix} 3 & 1 \\ 2 & -2 \\ 1 & 3 \end{pmatrix}$ has pseudoinverse
$$A^\dagger = \frac{1}{\sigma_1}v_1 w_1^* + \frac{1}{\sigma_2}v_2 w_2^* = \frac{1}{4\cdot 2}\begin{pmatrix}1\\1\end{pmatrix}\begin{pmatrix}1 & 0 & 1\end{pmatrix} + \frac{1}{2\sqrt{3}\cdot 2\sqrt{3}}\begin{pmatrix}1\\-1\end{pmatrix}\begin{pmatrix}1 & 2 & -1\end{pmatrix}$$
$$= \frac{1}{8}\begin{pmatrix}1 & 0 & 1 \\ 1 & 0 & 1\end{pmatrix} + \frac{1}{12}\begin{pmatrix}1 & 2 & -1 \\ -1 & -2 & 1\end{pmatrix} = \frac{1}{24}\begin{pmatrix}5 & 4 & 1 \\ 1 & -4 & 5\end{pmatrix}$$
which is exactly what we would have found by computing $A^\dagger = Q\Sigma^\dagger P^*$. Observe that
$$A^\dagger A = \begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix} \qquad\text{and}\qquad AA^\dagger = \frac{1}{3}\begin{pmatrix}2 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 2\end{pmatrix}$$
are the orthogonal projection matrices onto the spaces $N(A)^\perp = \operatorname{Span}\{v_1, v_2\} = \mathbb{R}^2$ and $R(A) = \operatorname{Span}\{w_1, w_2\} \le \mathbb{R}^3$ respectively. Both spaces have dimension 2, since $\operatorname{rank} A = 2$.
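A hedged check with NumPy (not from the notes), using the matrix as reconstructed in Example 2.68.1:

```python
import numpy as np

# numpy's pinv agrees with the hand computation, and the two products are the
# projections described above.
A = np.array([[3., 1.], [2., -2.], [1., 3.]])
A_dagger = np.linalg.pinv(A)

print(np.round(24 * A_dagger))               # [[ 5.  4.  1.], [ 1. -4.  5.]]
assert np.allclose(A_dagger @ A, np.eye(2))  # projection onto N(A)^perp = R^2

P_range = A @ A_dagger
assert np.allclose(P_range @ P_range, P_range)   # A A^dagger is idempotent...
assert np.allclose(P_range, P_range.T)           # ...and symmetric: the orthogonal projection onto R(A)
```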
2. The pseudoinverse of $T = \frac{d}{dx} : P_2(\mathbb{R}) \to P_1(\mathbb{R})$, as seen in Example 2.66.2, maps
$$T^\dagger\big(\sqrt{3}(2x - 1)\big) = \frac{1}{2\sqrt{15}}\sqrt{5}(6x^2 - 6x + 1) = \frac{1}{2\sqrt{3}}(6x^2 - 6x + 1), \qquad T^\dagger(1) = \frac{1}{2\sqrt{3}}\sqrt{3}(2x - 1) = x - \frac{1}{2}$$
$$\implies T^\dagger(a + bx) = T^\dagger\left(\Big(a + \frac{b}{2}\Big) + \frac{b}{2\sqrt{3}}\sqrt{3}(2x - 1)\right) = \Big(a + \frac{b}{2}\Big)\Big(x - \frac{1}{2}\Big) + \frac{b}{12}(6x^2 - 6x + 1) = \frac{b}{2}x^2 + ax - \frac{a}{2} - \frac{b}{6}$$
The pseudoinverse of 'differentiation' therefore returns a particular choice of anti-derivative, namely the unique anti-derivative of $a + bx$ lying in $\operatorname{Span}\{\sqrt{5}(6x^2 - 6x + 1), \sqrt{3}(2x - 1)\}$.
Exercises 2.7 1. Find the ingredients $\beta$, $\gamma$ and the singular values for each of the following:
(a) $T \in \mathcal{L}(\mathbb{R}^2, \mathbb{R}^3)$ where $T\binom{x}{y} = \begin{pmatrix}x\\x+y\\x-y\end{pmatrix}$
(b) $T : P_2(\mathbb{R}) \to P_1(\mathbb{R})$ and $T(f) = f''$, where $\langle f, g\rangle := \int_0^1 f(x)g(x)\,dx$
(c) $V = W = \operatorname{Span}\{1, \sin x, \cos x\}$ and $\langle f, g\rangle = \int_0^{2\pi} f(x)g(x)\,dx$, with $T(f) = f' + 2f$
2. Find a singular value decomposition of each of the matrices:
(a) $\begin{pmatrix}1 & 1 \\ 1 & 1 \\ 1 & 1\end{pmatrix}$  (b) $\begin{pmatrix}1 & 0 & 1 \\ 1 & 0 & 1\end{pmatrix}$  (c) $\begin{pmatrix}1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1\end{pmatrix}$
3. Find an explicit formula for $T^\dagger$ for each map in Exercise 1.
4. Find the pseudoinverse of each of the matrices in Exercise 2.
5. Suppose $A = P\Sigma Q^*$ is a singular value decomposition.
(a) Describe a singular value decomposition of $A^*$.
(b) Explain why $A^\dagger = Q\Sigma^\dagger P^*$ isn't a singular value decomposition of $A^\dagger$; what would be the correct decomposition? (Hint: what is wrong with $\Sigma^\dagger$?)
6. Suppose $T : V \to W$ is written according to the singular value theorem. Prove that $\gamma$ is a basis of eigenvectors of $TT^*$ with the same non-zero eigenvalues as $T^*T$, including repetitions.
7. (a) Suppose $T \in \mathcal{L}(V)$ is normal. Prove that each $v_j$ in the singular value theorem may be chosen to be an eigenvector of $T$ and that $\sigma_j$ is the modulus of the corresponding eigenvalue.
(b) Let $A = \begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}$. Show that any orthonormal basis $\beta$ of $\mathbb{R}^2$ satisfies the singular value theorem. What is $\gamma$ here? What is it about the eigenvalues of $A$ that make this possible?
(Even when $T$ is self-adjoint, the vectors in $\beta$ need not also be eigenvectors of $T$!)
8. In the proof of the singular value theorem we claimed that $\operatorname{rank} T^*T = \operatorname{rank} T$. Verify this by checking explicitly that $N(T^*T) = N(T)$.
(This is circular logic if you use the decomposition, so you must do without!)
9. Let $V, W$ be finite-dimensional inner product spaces and $T \in \mathcal{L}(V, W)$. Prove:
(a) $T^*TT^\dagger = T^\dagger TT^* = T^*$.
(Hint: evaluate on the basis $\gamma = \{w_1, \ldots, w_m\}$ in the singular value theorem.)
(b) If $T$ is injective, then $T^*T$ is invertible and $T^\dagger = (T^*T)^{-1}T^*$.
(c) If $T$ is surjective, then $TT^*$ is invertible and $T^\dagger = T^*(TT^*)^{-1}$.
10. Consider the equation $T(x) = b$, where $T$ is a linear map between finite-dimensional inner product spaces. A least-squares solution is a vector $x$ which minimizes $\|T(x) - b\|$.
(a) Prove that $x_0 = T^\dagger(b)$ is a least-squares solution and that any other has the form $x_0 + n$ for some $n \in N(T)$.
(Hint: Theorem 2.61 says that $x_0$ is a least-squares solution if and only if $T(x_0) = \pi_{R(T)}(b)$.)
(b) Prove that $x_0 = T^\dagger(b)$ has smaller norm than any other least-squares solution.
(c) If $T$ is injective, prove that $x_0 = T^\dagger(b)$ is the unique least-squares solution.
11. Find the minimal-norm solution to the first system, and the least-squares solution to the second:
$$\begin{cases} 3x + 2y + z = 9 \\ x - 2y + 3z = 3 \end{cases} \qquad\qquad \begin{cases} 3x + y = 1 \\ 2x - 2y = 0 \\ x + 3y = 0 \end{cases}$$
Linear Regression (non-examinable)
Given a data set $\{(t_j, y_j) : 1 \le j \le m\}$, we may employ the least-squares method to find a best-fitting line $y = c_0 + c_1 t$; often used to predict $y$ given a value of $t$.
The trick is to minimize the sum of the squares of the vertical deviations of the line from the data set:
$$\sum_{j=1}^m \big(y_j - c_0 - c_1 t_j\big)^2 = \|\mathbf{y} - A\mathbf{x}\|^2 \qquad\text{where}\qquad A = \begin{pmatrix} t_1 & 1 \\ \vdots & \vdots \\ t_m & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} c_1 \\ c_0 \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}$$
With the indicated notation, we recognize this as a least-squares problem. Indeed, if there are at least two distinct $t$-values in the data set, then $\operatorname{rank} A = 2$ is maximal and we have a unique best-fitting line with coefficients given by
$$\begin{pmatrix} c_1 \\ c_0 \end{pmatrix} = A^\dagger\mathbf{y} = (A^TA)^{-1}A^T\mathbf{y}$$
Example 2.69. Given the data set $\{(0, 1), (1, 1), (2, 0), (3, 2), (4, 2)\}$, we compute
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 4 & 1 \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 2 \\ 2 \end{pmatrix} \implies \mathbf{x}_0 = \begin{pmatrix} c_1 \\ c_0 \end{pmatrix} = (A^TA)^{-1}A^T\mathbf{y} = \begin{pmatrix} 30 & 10 \\ 10 & 5 \end{pmatrix}^{-1}\begin{pmatrix} 15 \\ 6 \end{pmatrix} = \frac{3}{10}\begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
The regression line therefore has equation $y = \frac{3}{10}(t + 2)$.
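A short numerical sketch (not from the notes) recovering the regression line of Example 2.69:

```python
import numpy as np

t = np.array([0., 1., 2., 3., 4.])
y = np.array([1., 1., 0., 2., 2.])
A = np.column_stack([t, np.ones_like(t)])        # rows (t_j, 1)

c1, c0 = np.linalg.lstsq(A, y, rcond=None)[0]    # least-squares solution of A x = y
print(c1, c0)                                    # 0.3 and 0.6, i.e. y = (3/10)(t + 2)

# Equivalent formulations:
assert np.allclose([c1, c0], np.linalg.pinv(A) @ y)             # x0 = A^dagger y
assert np.allclose([c1, c0], np.linalg.solve(A.T @ A, A.T @ y)) # normal equations
```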
The process can be applied more generally to approximate using other functions. To find the best-fitting quadratic polynomial $y = c_0 + c_1 t + c_2 t^2$, we'd instead work with
$$A = \begin{pmatrix} t_1^2 & t_1 & 1 \\ \vdots & \vdots & \vdots \\ t_m^2 & t_m & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} c_2 \\ c_1 \\ c_0 \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} \implies \sum_{j=1}^m \big(y_j - c_0 - c_1 t_j - c_2 t_j^2\big)^2 = \|\mathbf{y} - A\mathbf{x}\|^2$$
Provided we have at least three distinct values $t_1, t_2, t_3$, the matrix $A$ is guaranteed to have rank 3 and there will be a best-fitting least-squares quadratic: in this case
$$y = \frac{1}{70}(15t^2 - 39t + 72)$$
This curve and the best-fitting straight line are shown below.
(Figure: the best-fitting least-squares line and quadratic for the data of Example 2.69.)
Optional Problems (Use a computer to invert any $3 \times 3$ matrices!)
1. Check the calculation for the best-fitting least-squares quadratic in Example 2.69.
2. Find the best-fitting least-squares linear and quadratic approximations to the data set
$$\{(1, 2), (3, 4), (5, 7), (7, 9), (9, 12)\}$$
3. Suppose a data set $\{(t_j, y_j) : 1 \le j \le m\}$ has unique regression line $y = ct + d$.
(a) Show that the equations $A^TA\mathbf{x}_0 = A^T\mathbf{y}$ can be written in matrix form
$$\begin{pmatrix} \sum t_j^2 & \sum t_j \\ \sum t_j & m \end{pmatrix}\begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} \sum t_jy_j \\ \sum y_j \end{pmatrix}$$
(b) Recover the standard expressions from statistics:
$$c = \frac{\operatorname{Cov}(t, y)}{\sigma_t^2} \qquad\text{and}\qquad d = \bar{y} - c\bar{t}$$
where
$$\bar{t} = \frac{1}{m}\sum_{j=1}^m t_j \ \text{ and }\ \bar{y} = \frac{1}{m}\sum_{j=1}^m y_j \ \text{ are the means (averages)},$$
$$\sigma_t^2 = \frac{1}{m}\sum_{j=1}^m (t_j - \bar{t})^2 \ \text{ is the variance}, \qquad \operatorname{Cov}(t, y) = \frac{1}{m}\sum_{j=1}^m (t_j - \bar{t})(y_j - \bar{y}) \ \text{ is the covariance}.$$
2.8 Bilinear and Quadratic Forms
In this section we slightly generalize the idea of an inner product. Throughout, V is a vector space
over a field F; it need not be an inner product space and F can be any field (not just R or C).
Definition 2.70. A bilinear form $B : V \times V \to F$ is linear in each entry: $\forall v, x, y \in V$, $\lambda \in F$,
$$B(\lambda x + y, v) = \lambda B(x, v) + B(y, v), \qquad B(v, \lambda x + y) = \lambda B(v, x) + B(v, y)$$
Additionally, $B$ is symmetric if $\forall x, y \in V$, $B(x, y) = B(y, x)$.
Examples 2.71. 1. If $V$ is a real inner product space, then the inner product $\langle\ ,\ \rangle$ is a symmetric bilinear form. Note that a complex inner product is not bilinear!
2. If $A \in M_n(F)$, then $B(\mathbf{x}, \mathbf{y}) := \mathbf{x}^TA\mathbf{y}$ is a bilinear form on $F^n$. For instance, on $\mathbb{R}^2$,
$$B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}1 & 2 \\ 2 & 0\end{pmatrix}\mathbf{y} = x_1y_1 + 2x_1y_2 + 2x_2y_1$$
defines a symmetric bilinear form, though not an inner product since it isn't positive definite; for example $B(\mathbf{j}, \mathbf{j}) = 0$.
As seen above, we often make use of a matrix.
Definition 2.72. Let $B$ be a bilinear form on a finite-dimensional space with basis $\epsilon = \{v_1, \ldots, v_n\}$. The matrix of $B$ with respect to $\epsilon$ is the matrix $[B]_\epsilon = A \in M_n(F)$ with $ij$th entry
$$A_{ij} = B(v_i, v_j)$$
Given $x, y \in V$, compute their co-ordinate vectors $[x]_\epsilon$, $[y]_\epsilon$ with respect to $\epsilon$; then
$$B(x, y) = [x]_\epsilon^T A [y]_\epsilon$$
The set of bilinear forms on $V$ is therefore in bijective correspondence with $M_n(F)$. Moreover,
$$B(y, x) = [y]_\epsilon^T A [x]_\epsilon = \big([y]_\epsilon^T A [x]_\epsilon\big)^T = [x]_\epsilon^T A^T [y]_\epsilon$$
Finally, if $\beta$ is another basis of $V$, then an appeal to the change of co-ordinate matrix $Q^\epsilon_\beta$ yields
$$B(x, y) = [x]_\epsilon^T A [y]_\epsilon = \big(Q^\epsilon_\beta[x]_\beta\big)^T A \big(Q^\epsilon_\beta[y]_\beta\big) = [x]_\beta^T (Q^\epsilon_\beta)^T A Q^\epsilon_\beta [y]_\beta \implies [B]_\beta = (Q^\epsilon_\beta)^T[B]_\epsilon Q^\epsilon_\beta$$
To summarize:
Lemma 2.73. Let $B$ be a bilinear form on a finite-dimensional vector space.
1. If $A$ is the matrix of $B$ with respect to some basis, then every other matrix of $B$ has the form $Q^TAQ$ for some invertible $Q$.
2. $B$ is symmetric if and only if its matrix with respect to any (and all) bases is symmetric.
Naturally, the simplest situation is when the matrix of $B$ is diagonal. . .
Examples 2.74. 1. Example 2.71.2 can be written
$$B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}1 & 2 \\ 2 & 0\end{pmatrix}\mathbf{y} = x_1y_1 + 2x_1y_2 + 2x_2y_1 = (x_1 + 2x_2)(y_1 + 2y_2) - 4x_2y_2$$
$$= \begin{pmatrix}x_1 + 2x_2 & x_2\end{pmatrix}\begin{pmatrix}1 & 0 \\ 0 & -4\end{pmatrix}\begin{pmatrix}y_1 + 2y_2 \\ y_2\end{pmatrix} = \mathbf{x}^T\begin{pmatrix}1 & 2 \\ 0 & 1\end{pmatrix}^T\begin{pmatrix}1 & 0 \\ 0 & -4\end{pmatrix}\begin{pmatrix}1 & 2 \\ 0 & 1\end{pmatrix}\mathbf{y}$$
If $\epsilon$ is the standard basis, then $[B]_\beta = \begin{pmatrix}1 & 0 \\ 0 & -4\end{pmatrix}$ where $Q^\beta_\epsilon = \begin{pmatrix}1 & 2 \\ 0 & 1\end{pmatrix}$. It follows that $Q^\epsilon_\beta = \begin{pmatrix}1 & -2 \\ 0 & 1\end{pmatrix}$, from which $\beta = \left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}-2\\1\end{pmatrix}\right\}$ is a diagonalizing basis.
2. In general, we may perform a sequence of simultaneous row and column operations to diagonalize any symmetric $B$; we require only elementary matrices $E^{(\lambda)}_{ij}$ of type III.³ For instance:
$$B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^TA\mathbf{y} = \mathbf{x}^T\begin{pmatrix}1 & -2 & 3 \\ -2 & 0 & 4 \\ 3 & 4 & -1\end{pmatrix}\mathbf{y} \qquad (A = [B]_\epsilon \text{ with respect to the standard basis})$$
$$E^{(2)}_{21}AE^{(2)}_{12} = \begin{pmatrix}1 & 0 & 3 \\ 0 & -4 & 10 \\ 3 & 10 & -1\end{pmatrix} \qquad \text{(add twice row 1 to row 2, columns similarly)}$$
$$E^{(-3)}_{31}E^{(2)}_{21}AE^{(2)}_{12}E^{(-3)}_{13} = \begin{pmatrix}1 & 0 & 0 \\ 0 & -4 & 10 \\ 0 & 10 & -10\end{pmatrix} \qquad \text{(subtract 3 times row 1 from row 3, etc.)}$$
$$E^{(1)}_{23}E^{(-3)}_{31}E^{(2)}_{21}AE^{(2)}_{12}E^{(-3)}_{13}E^{(1)}_{32} = \begin{pmatrix}1 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & -10\end{pmatrix} \qquad \text{(add row 3 to row 2, etc.)}$$
If $\beta$ is the diagonalizing basis, then the change of co-ordinate matrix is
$$Q^\epsilon_\beta = E^{(2)}_{12}E^{(-3)}_{13}E^{(1)}_{32} = \begin{pmatrix}1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1\end{pmatrix} = \begin{pmatrix}1 & -1 & -3 \\ 0 & 1 & 0 \\ 0 & 1 & 1\end{pmatrix}$$
from which $\beta = \left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}-1\\1\\1\end{pmatrix}, \begin{pmatrix}-3\\0\\1\end{pmatrix}\right\}$. If you're having trouble believing this, invert the change of co-ordinate matrix and check that
$$B(\mathbf{x}, \mathbf{y}) = (x_1 - 2x_2 + 3x_3)(y_1 - 2y_2 + 3y_3) + 6x_2y_2 - 10(x_3 - x_2)(y_3 - y_2)$$
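The following is a rough code sketch (not part of the original notes) of this simultaneous row/column algorithm. It assumes every pivot encountered is non-zero, as in the example above; a complete implementation would also need to handle zero pivots (and the characteristic-two issue discussed below).

```python
import numpy as np

def congruence_diagonalize(A):
    """Return (D, Q) with Q.T @ A @ Q = D diagonal, for symmetric A (non-zero pivots assumed)."""
    n = A.shape[0]
    Q = np.eye(n)
    D = A.astype(float).copy()
    for i in range(n):
        for j in range(i + 1, n):
            if D[j, i] != 0:
                lam = -D[j, i] / D[i, i]
                D[j, :] += lam * D[i, :]   # row operation (left multiplication)
                D[:, j] += lam * D[:, i]   # matching column operation (right multiplication)
                Q[:, j] += lam * Q[:, i]   # accumulate Q as a product of E^(lam)_{ij}
    return D, Q

A = np.array([[1., -2., 3.], [-2., 0., 4.], [3., 4., -1.]])
D, Q = congruence_diagonalize(A)
print(np.round(D))                         # diagonal, with the same signature as diag(1, 6, -10)
assert np.allclose(Q.T @ A @ Q, D)
```

Note that the diagonal produced depends on the operations chosen; only the signature (see Sylvester's Law below) is forced.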
Warning! If $F = \mathbb{R}$ then every symmetric $B$ may be diagonalized by an orthonormal basis (for the usual dot product on $\mathbb{R}^n$). It is very unlikely that our algorithm will produce such! The algorithm has two main advantages over the spectral theorem: it is typically faster and it applies to vector spaces over any field. As a disadvantage, it is highly non-unique.

³ Recall that $E^{(\lambda)}_{ij}$ is the identity matrix with an additional $\lambda$ in the $ij$th entry.
As a column operation (right-multiplication), $A \mapsto AE^{(\lambda)}_{ij}$ adds $\lambda$ times the $i$th column to the $j$th.
As a row operation (left-multiplication), $A \mapsto E^{(\lambda)}_{ji}A = (E^{(\lambda)}_{ij})^TA$ adds $\lambda$ times the $i$th row to the $j$th.
Example 2.75. We diagonalize $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}1 & 6 \\ 6 & 3\end{pmatrix}\mathbf{y} = x_1y_1 + 6x_1y_2 + 6x_2y_1 + 3x_2y_2$ in three ways.
• $\begin{pmatrix}1 & 0 \\ -6 & 1\end{pmatrix}\begin{pmatrix}1 & 6 \\ 6 & 3\end{pmatrix}\begin{pmatrix}1 & -6 \\ 0 & 1\end{pmatrix} = \begin{pmatrix}1 & 0 \\ 0 & -33\end{pmatrix} = [B]_\beta$ where $\beta = \left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}-6\\1\end{pmatrix}\right\}$. This corresponds to
$$B(\mathbf{x}, \mathbf{y}) = (x_1 + 6x_2)(y_1 + 6y_2) - 33x_2y_2$$
• $\begin{pmatrix}1 & -2 \\ 0 & 1\end{pmatrix}\begin{pmatrix}1 & 6 \\ 6 & 3\end{pmatrix}\begin{pmatrix}1 & 0 \\ -2 & 1\end{pmatrix} = \begin{pmatrix}-11 & 0 \\ 0 & 3\end{pmatrix} = [B]_\gamma$ where $\gamma = \left\{\begin{pmatrix}1\\-2\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right\}$. This corresponds to
$$B(\mathbf{x}, \mathbf{y}) = -11x_1y_1 + 3(2x_1 + x_2)(2y_1 + y_2)$$
• If $F = \mathbb{R}$, we may apply the spectral theorem to see that $[B]_\eta = \begin{pmatrix}2 + \sqrt{37} & 0 \\ 0 & 2 - \sqrt{37}\end{pmatrix}$ is diagonal with respect to the orthonormal basis $\eta$ obtained by normalizing $\left\{\begin{pmatrix}6\\1+\sqrt{37}\end{pmatrix}, \begin{pmatrix}-1-\sqrt{37}\\6\end{pmatrix}\right\}$. The expression for $B(\mathbf{x}, \mathbf{y})$ in these co-ordinates is disgusting, so we omit it; that $B$ can be diagonalized orthogonally doesn't mean it should be!
Theorem 2.76. Suppose $B$ is a bilinear form on a finite-dimensional space $V$ over $F$.
1. If $B$ is diagonalizable, then it is symmetric.
2. If $B$ is symmetric and $F$ does not have characteristic two (see aside), then $B$ is diagonalizable.
Proof. 1. If $B$ is diagonalizable, $\exists\beta$ such that $[B]_\beta$ is diagonal and thus symmetric.
2. Suppose $B$ is non-zero (otherwise the result is trivial). We prove by induction on $n = \dim V$.
If $n = 1$, the result is trivial: $B(x, y) = axy$ for some $a \in F$, and any $1 \times 1$ matrix is already diagonal.
Fix $n \in \mathbb{N}$ and assume that every non-zero symmetric bilinear form on a dimension-$n$ vector space over $F$ is diagonalizable. Let $\dim V = n + 1$. By the discussion below, $\exists x \in V$ such that $B(x, x) \neq 0$. Consider the linear map
$$T : V \to F : v \mapsto B(x, v)$$
Clearly $\operatorname{rank} T = 1 \implies \dim N(T) = n$. Moreover, $B$ is symmetric when restricted to $N(T)$; by the induction hypothesis there exists a basis $\beta$ of $N(T)$ such that $[B|_{N(T)}]_\beta$ is diagonal. But then $B$ is diagonal with respect to the basis $\beta \cup \{x\}$.
Aside: Characteristic two fields. This means $1 + 1 = 0$ in $F$; this holds, for instance, in the field $\mathbb{Z}_2 = \{0, 1\}$ of remainders modulo 2. We now see the importance of $\operatorname{char} F \neq 2$ to the above result.
The proof uses the existence of $x \in V$ such that $B(x, x) \neq 0$. If $B$ is non-zero, $\exists u, v$ such that $B(u, v) \neq 0$. If both $B(u, u) = 0 = B(v, v)$, then $x = u + v$ does the job whenever $\operatorname{char} F \neq 2$:
$$B(x, x) = B(u, v) + B(v, u) = 2B(u, v) \neq 0$$
To see that the requirement isn't idle, consider $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}\mathbf{y}$ on the finite vector space $\mathbb{Z}_2^2 = \left\{\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right\}$ over $\mathbb{Z}_2$. Every element of this space satisfies $B(\mathbf{x}, \mathbf{x}) = 0$! Perhaps surprisingly, the matrix of $B$ is identical with respect to any basis of $\mathbb{Z}_2^2$, whence $B$ is symmetric but non-diagonalizable.
In Example 2.75, notice what the three diagonal matrix representations have in common: each has exactly one positive and one negative diagonal entry. This is a general phenomenon:
Theorem 2.77 (Sylvester's Law of Inertia). Suppose $B$ is a symmetric bilinear form on a real vector space $V$ with diagonal matrix representation $\operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Then the number of entries $\lambda_j$ which are positive/negative/zero is independent of the diagonal representation.
Definition 2.78. The signature of a symmetric bilinear form $B$ is the triple $(n_+, n_-, n_0)$ representing how many positive, negative and zero terms are in any diagonal representation. Sylvester's Law says that the signature is an invariant of a symmetric bilinear form.
Positive-definiteness says that a real inner product on an $n$-dimensional space has signature $(n, 0, 0)$. Practitioners of relativity often work in Minkowski spacetime: $\mathbb{R}^4$ equipped with a signature $(1, 3, 0)$ bilinear form, typically
$$B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}c^2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1\end{pmatrix}\mathbf{y} = c^2x_1y_1 - x_2y_2 - x_3y_3 - x_4y_4$$
where $c$ is the speed of light. Vectors are time-, space-, or light-like depending on whether $B(\mathbf{x}, \mathbf{x})$ is positive, negative or zero. For instance $\mathbf{x} = 3c^{-1}\mathbf{e}_1 + 2\mathbf{e}_2 + 2\mathbf{e}_3 + \mathbf{e}_4$ is light-like.
Proof. For simplicity, let $V = \mathbb{R}^n$ and write $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^TA\mathbf{y}$ where $A$ is symmetric.
1. Define $\operatorname{rank} B := \operatorname{rank} A$ and observe this is independent of basis (exercises).
2. Let $\beta$, $\gamma$ be diagonalizing bases, ordered according to whether $B(v_j, v_j)$ is positive, negative or zero:
$$\beta = \{v_1, \ldots, v_p, v_{p+1}, \ldots, v_r, v_{r+1}, \ldots, v_n\}, \qquad \gamma = \{w_1, \ldots, w_q, w_{q+1}, \ldots, w_r, w_{r+1}, \ldots, w_n\}$$
Here $r = \operatorname{rank} B$ in accordance with part 1: our goal is to prove that $p = q$.
3. Assume $p < q$, define the matrix $C$ and check what follows:
$$C = \begin{pmatrix} v_1^TA \\ \vdots \\ v_p^TA \\ w_{q+1}^TA \\ \vdots \\ w_r^TA \end{pmatrix} \in M_{(r-q+p)\times n}(\mathbb{R})$$
(a) $\operatorname{rank} C \le r - q + p < r \implies \operatorname{null} C > n - r$, thus
$$\exists \mathbf{x} \in \mathbb{R}^n \text{ such that } C\mathbf{x} = 0 \text{ and } \mathbf{x} \notin \operatorname{Span}\{v_{r+1}, \ldots, v_n\}$$
(b) The first $p$ entries of $C\mathbf{x} = 0$ mean that $\mathbf{x} \in \operatorname{Span}\{v_{p+1}, \ldots, v_r, v_{r+1}, \ldots, v_n\}$ and so $B(\mathbf{x}, \mathbf{x}) < 0$. Note how we use part (a) to get a strict inequality here.
(c) Now write $\mathbf{x}$ with respect to $\gamma$: this time we see that $\mathbf{x} \in \operatorname{Span}\{w_1, \ldots, w_q, w_{r+1}, \ldots, w_n\}$, whence $B(\mathbf{x}, \mathbf{x}) \ge 0$: a contradiction.
Quadratic Forms & Diagonalizing Conics
Definition 2.79. To every symmetric bilinear form $B : V \times V \to F$ is associated a quadratic form
$$K : V \to F : x \mapsto B(x, x)$$
A function $K : V \to F$ is termed a quadratic form when such a symmetric bilinear form exists.
Examples 2.80. 1. If $B$ is a real inner product, then $K(v) = \langle v, v\rangle = \|v\|^2$ is the square of the norm.
2. Let $\dim V = n$ and $A$ be the matrix of $B$ with respect to a basis $\beta$. By the symmetry of $A$,
$$K(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = \sum_{i,j=1}^n x_iA_{ij}x_j = \sum_{1 \le i \le j \le n} \tilde{a}_{ij}x_ix_j \qquad\text{where}\qquad \tilde{a}_{ij} = \begin{cases} A_{ij} & \text{if } i = j \\ 2A_{ij} & \text{if } i \neq j\end{cases}$$
E.g., $K(\mathbf{x}) = 3x_1^2 + 4x_2^2 - 2x_1x_2$ corresponds to the bilinear form $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}3 & -1 \\ -1 & 4\end{pmatrix}\mathbf{y}$
As a fun application, we consider the diagonalization of conics in $\mathbb{R}^2$. The general non-zero conic has equation
$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0$$
where the first three terms comprise a quadratic form
$$K(\mathbf{x}) = ax^2 + 2bxy + cy^2 \qquad\longleftrightarrow\qquad B(\mathbf{v}, \mathbf{w}) = \mathbf{v}^T\begin{pmatrix}a & b \\ b & c\end{pmatrix}\mathbf{w}$$
If $\{v_1, v_2\}$ is a diagonalizing basis, then there exist scalars $\lambda_1, \lambda_2$ for which
$$K(t_1v_1 + t_2v_2) = \lambda_1t_1^2 + \lambda_2t_2^2$$
whence the general conic becomes
$$\lambda_1t_1^2 + \lambda_2t_2^2 + \mu_1t_1 + \mu_2t_2 = \eta, \qquad \lambda_1, \lambda_2, \mu_1, \mu_2, \eta \in \mathbb{R}$$
If $\lambda_i \neq 0$, we may complete the square via the linear transformation $s_j = t_j + \frac{\mu_j}{2\lambda_j}$. The canonical forms are then recovered:
• Parabola: $\lambda_1$ or $\lambda_2 = 0$ (but not both).
• Ellipse: $\lambda_1s_1^2 + \lambda_2s_2^2 = k \neq 0$ where $\lambda_1, \lambda_2, k$ have the same sign.
• Hyperbola: $\lambda_1s_1^2 + \lambda_2s_2^2 = k \neq 0$ where $\lambda_1, \lambda_2$ have opposite signs.
Since $B$ is symmetric, we may take $\{v_1, v_2\}$ to be an orthonormal basis of $\mathbb{R}^2$, whence any conic may be put in canonical form by applying only a rotation/reflection and translation (completing the square). Alternatively, we could diagonalize $K$ using our earlier algorithm; this additionally permits shear transforms. By Sylvester's Law, the diagonal entries will have the same number of $(+, -, 0)$ terms regardless of the method, so the canonical form will be unchanged.
Examples 2.81. 1. We describe and plot the conic with equation $7x^2 + 24xy = 144$.
The matrix of the associated bilinear form is $\begin{pmatrix}7 & 12 \\ 12 & 0\end{pmatrix}$, which has orthonormal eigenbasis
$$\beta = \{v_1, v_2\} = \left\{\frac{1}{5}\begin{pmatrix}4\\3\end{pmatrix}, \frac{1}{5}\begin{pmatrix}-3\\4\end{pmatrix}\right\}$$
with eigenvalues $(\lambda_1, \lambda_2) = (16, -9)$. In the rotated basis, this is the canonical hyperbola
$$16t_1^2 - 9t_2^2 = 144 \iff \frac{t_1^2}{3^2} - \frac{t_2^2}{4^2} = 1$$
which is easily plotted. (Figure: the hyperbola drawn against both the $xy$-axes and the rotated $t_1t_2$-axes along $v_1, v_2$.) In case this is too fast, use the change of co-ordinate matrix to compute directly:
$$Q^\epsilon_\beta = \frac{1}{5}\begin{pmatrix}4 & -3 \\ 3 & 4\end{pmatrix} \implies \begin{pmatrix}t_1 \\ t_2\end{pmatrix} = [\mathbf{x}]_\beta = Q^\beta_\epsilon\begin{pmatrix}x \\ y\end{pmatrix} = \frac{1}{5}\begin{pmatrix}4x + 3y \\ -3x + 4y\end{pmatrix}$$
which quickly recovers the original conic by substitution.
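A hedged numerical sketch (not from the notes) of the same diagonalization:

```python
import numpy as np

# Diagonalize the quadratic form of Example 2.81.1 and read off the conic type.
M = np.array([[7., 12.], [12., 0.]])
eigvals, eigvecs = np.linalg.eigh(M)          # symmetric, so an orthonormal eigenbasis exists

print(eigvals)                                 # [-9., 16.]: opposite signs, so a hyperbola

# Change of variables: with (t1, t2) = Q^T (x, y) the form becomes diagonal.
x = np.random.default_rng(0).normal(size=2)
t = eigvecs.T @ x
assert np.isclose(x @ M @ x, eigvals @ t**2)   # 7x^2 + 24xy = -9 t1^2 + 16 t2^2 in these co-ordinates
```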
2. The conic $x^2 + 12xy + 3y^2 = 33$ is a hyperbola, in accordance with Example 2.75. With respect to the basis $\gamma = \left\{\begin{pmatrix}1\\-2\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right\}$, we see that
$$K(\mathbf{x}) = -11x^2 + 3(2x + y)^2 = -11t_1^2 + 3t_2^2 = 33 \iff \frac{t_2^2}{11} - \frac{t_1^2}{3} = 1$$
Even though $\gamma$ is non-orthogonal, we can still plot the conic! If we instead used the orthonormal basis $\eta$, we'd obtain
$$K(\mathbf{x}) = (\sqrt{37} + 2)s_1^2 - (\sqrt{37} - 2)s_2^2 = 33$$
however the calculation to find $\eta$ is time-consuming and the expressions for $s_1, s_2$ are extremely ugly.
(Figure: the hyperbola $x^2 + 12xy + 3y^2 = 33$ plotted with the skewed $t_1t_2$-axes along $\gamma$.)
A similar approach can be applied to higher-degree quadratic equations/manifolds: e.g. ellipsoids, paraboloids and hyperboloids in $\mathbb{R}^3$.
Exercises 2.8 1. Prove that the sum of any two bilinear forms is bilinear, and that any scalar multiple of a bilinear form is bilinear: thus the set of bilinear forms on $V$ is a vector space.
(You can't use matrices here, since $V$ could be infinite-dimensional!)
2. Compute the matrix of the bilinear form
$$B(\mathbf{x}, \mathbf{y}) = x_1y_1 - 2x_1y_2 + x_2y_1 - x_3y_3$$
on $\mathbb{R}^3$ with respect to the basis $\beta = \left\{\begin{pmatrix}1\\0\\1\end{pmatrix}, \begin{pmatrix}1\\0\\-1\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}\right\}$.
3. Check that the function $B(f, g) = f'(0)g''(0)$ is a bilinear form on the vector space of twice-differentiable functions. Find the matrix of $B$ with respect to $\beta = \{\cos t, \sin t, \cos 2t, \sin 2t\}$ when restricted to the subspace $\operatorname{Span}\beta$.
4. For each matrix $A$, find a diagonal matrix $D$ and an invertible matrix $Q$ such that $Q^TAQ = D$.
(a) $\begin{pmatrix}1 & 3 \\ 3 & 2\end{pmatrix}$  (b) $\begin{pmatrix}3 & 1 & 2 \\ 1 & 4 & 0 \\ 2 & 0 & 1\end{pmatrix}$
5. If $K$ is a quadratic form and $K(\mathbf{x}) = 2$, what is the value of $K(3\mathbf{x})$?
6. If $F$ does not have characteristic 2, and $K(x) = B(x, x)$ is a quadratic form, prove that we can recover the bilinear form $B$ via
$$B(x, y) = \frac{1}{2}\big(K(x + y) - K(x) - K(y)\big)$$
7. If $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}\mathbf{y}$ is a bilinear form on $F^2$, compute the quadratic form $K(\mathbf{x})$.
8. Suppose $B$ is a symmetric bilinear form on a real finite-dimensional space. With reference to the proof of Sylvester's Law, explain why $\operatorname{rank} B$ is independent of the choice of diagonalizing basis.
9. If $\operatorname{char} F \neq 2$, apply the diagonalizing algorithm to the symmetric bilinear form $B(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}\mathbf{y}$ on $F^2$. What goes wrong if $\operatorname{char} F = 2$?
10. Describe and plot the following conics:
(a) $x^2 + y^2 + xy = 6$  (b) $35x^2 + 120xy = 4x + 3y$
11. Suppose that a non-empty, non-degenerate⁴ conic $C$ in $\mathbb{R}^2$ has the form $ax^2 + 2bxy + cy^2 + dx + ey + f = 0$, where at least one of $a, b, c \neq 0$, and define $\Delta = b^2 - ac$. Prove that:
• $C$ is a parabola if and only if $\Delta = 0$;
• $C$ is an ellipse if and only if $\Delta < 0$;
• $C$ is a hyperbola if and only if $\Delta > 0$.
(Hint: $\lambda_1, \lambda_2$ are the eigenvalues of a symmetric matrix, so. . .)

⁴ The conic contains at least two points and cannot be factorized as a product of two straight lines: for example, the following are disallowed:
• $x^2 + y^2 + 1 = 0$ is empty (unless one allows conics over $\mathbb{C}$. . .);
• $x^2 + y^2 = 0$ contains only one point;
• $x^2 - xy - x + y = (x - 1)(x - y) = 0$ is the product of two lines.