Math 140B - Notes
Neil Donaldson
March 11, 2026
1 Continuity
The overarching goal of this course and its prequel is to make elementary calculus rigorous. We begin
with a review of some basic concepts and conventions.
Sets & Functions In these notes, essentially all functions have the form $f : U \to V$ where both $U, V$ are subsets of the real numbers $\mathbb{R}$. To $f$ are associated several concepts:
Domain: $\operatorname{dom}(f) = U$; the inputs to $f$. Often implied to be the largest set on which a formula is defined. In calculus examples, the domain is typically a union of (open) intervals.
Codomain: $\operatorname{codom}(f) = V$; the potential outputs of $f$. By convention, $V = \mathbb{R}$ unless necessary.
Range: $\operatorname{range}(f) = f(U) = \{f(x) : x \in U\}$; the realized outputs of $f$ and a subset of $V$.
Injectivity: $f$ is injective/one-to-one if $f(x) = f(y) \implies x = y$: distinct inputs produce distinct outputs.
Surjectivity: $f$ is surjective/onto if $f(U) = V$: all potential outputs are realized.
Inverses: $f$ is bijective/invertible if it is injective and surjective. Equivalently, $\exists f^{-1} : V \to U$ satisfying
\[ \forall u \in U,\ f^{-1}\bigl(f(u)\bigr) = u \quad\text{and}\quad \forall v \in V,\ f\bigl(f^{-1}(v)\bigr) = v \]
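Example 1.1. The function defined by $f(x) = \frac{1}{x(x-2)}$ has implied
\[ \operatorname{dom}(f) = \mathbb{R} \setminus \{0, 2\} = (-\infty, 0) \cup (0, 2) \cup (2, \infty), \qquad \operatorname{range}(f) = (-\infty, -1] \cup (0, \infty) \]
The function is neither injective nor surjective. By restricting the domain & codomain, we obtain a bijection $\hat f$:
\[ \operatorname{dom}(\hat f) = [1, 2) \cup (2, \infty), \qquad \operatorname{codom}(\hat f) = (-\infty, -1] \cup (0, \infty) \]
with inverse
\[ \hat f^{-1}(y) = \begin{cases} 1 + y^{-1}\sqrt{y^2 + y} & \text{if } y > 0 \\ 1 - y^{-1}\sqrt{y^2 + y} & \text{if } y \le -1 \end{cases} \]
Now $\operatorname{dom}(\hat f^{-1}) = \operatorname{codom}(\hat f)$ and $\operatorname{codom}(\hat f^{-1}) = \operatorname{dom}(\hat f)$.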
[Figure: graphs of $f(x)$ against $x$ and of the inverse $\hat f^{-1}(y)$ against $y$.]
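Suprema and Infima A set $U \subseteq \mathbb{R}$ is bounded above if it has an upper bound $M$:
\[ \exists M \in \mathbb{R} \text{ such that } \forall u \in U,\ u \le M \]
Axiom 1.2 (Completeness). If $U \subseteq \mathbb{R}$ is non-empty and bounded above, then it has a least upper bound, the supremum of $U$:
\[ \sup U = \min\{M \in \mathbb{R} : \forall u \in U,\ u \le M\} \]
By convention, $\sup U := \infty$ if $U$ is unbounded above and $\sup \emptyset := -\infty$; now every subset of $\mathbb{R}$ has a supremum. Similarly, the infimum of $U$ is its greatest lower bound:
\[ \inf U = \begin{cases} \max\{m \in \mathbb{R} : \forall u \in U,\ u \ge m\} & \text{if } U \ne \emptyset \text{ is bounded below} \\ -\infty & \text{if } U \ne \emptyset \text{ is unbounded below} \\ \infty & \text{if } U = \emptyset \end{cases} \]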
Examples 1.3. Here are four sets with their suprema and infima stated. You should be able to verify
these assertions directly from the definitions.
U {1, 2, 3, 4} (0, 5) (, π] R {
1
n
: n N}
sup U 4 5 π 1
inf U 1 0 0
Note how the supremum/infimum might or might not lie in the set itself.
Interiors, closures, boundaries and neighborhoods These last concepts might not be review, but
they will be used repeatedly.
Definition 1.4. Let $U \subseteq \mathbb{R}$. A value $a \in \mathbb{R}$ is interior to $U$ if it lies in some open subinterval of $U$:
\[ \exists \delta > 0 \text{ such that } (a - \delta, a + \delta) \subseteq U \]
A neighborhood of $a$ is any set to which $a$ is interior: the interval $(a - \delta, a + \delta)$ is an open $\delta$-neighborhood of $a$. A punctured neighborhood of $a$ is a neighborhood with $a$ deleted.
The set of points interior to $U$ is denoted $U^\circ$.
A limit point of $U$ is the limit of some sequence $(x_n) \subseteq U$. The closure $\overline{U}$ is the set of limit points.
The boundary of $U$ is the set $\partial U = \overline{U} \setminus U^\circ$.
Examples 1.5. 1. If $U = [1, 3)$, then $U^\circ = (1, 3)$, $\overline{U} = [1, 3]$ and $\partial U = \{1, 3\}$.
2. $\mathbb{Q}^\circ = \emptyset$ and $\overline{\mathbb{Q}} = \partial\mathbb{Q} = \mathbb{R}$.
3. $(3, 5) \cup (5, 7]$ is a punctured neighborhood of 5.
1.17 Continuity of Functions
Everything in this section should be review.
Definition 1.6. A function $f : U \to \mathbb{R}$ is continuous at $u \in U$ if either/both of the following hold:
1. For all sequences $(x_n) \subseteq U$ converging to $u$, the sequence $(f(x_n))$ converges to $f(u)$.
2. $\forall \epsilon > 0$, $\exists \delta > 0$ such that $(\forall x \in U)$, $|x - u| < \delta \implies |f(x) - f(u)| < \epsilon$.
A function $f$ is continuous on $U$ if it is continuous at every point $u \in U$.
Examples 1.7. 1. We prove that $f(x) = x^3$ is continuous at $u = 2$.
(a) (Limit method) Let $x_n \to 2$. By the limit laws (i.e. $\lim(x_n^k) = (\lim x_n)^k$),
\[ \lim f(x_n) = \lim x_n^3 = (\lim x_n)^3 = 2^3 = f(2) \]
(b) ($\epsilon$-$\delta$ method) Let $\epsilon > 0$ be given and let $\delta = \min\{1, \frac{\epsilon}{19}\}$. Then
\[ |x - 2| < \delta \implies |x - 2| < 1 \implies 1 < x < 3 \]
from which
\[ |x^3 - 2^3| = |x - 2|\,|x^2 + 2x + 2^2| < 19|x - 2| \le \epsilon \]
where we used the triangle inequality.
2. Let $g(x) = \begin{cases} x \sin\frac{1}{x} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0 \end{cases}$
Then $g$ is continuous at $x = 0$. Again this can be done with limits or an $\epsilon$-$\delta$ argument; both are essentially the squeeze theorem.
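3. The function defined by
\[ h(x) = \begin{cases} 1 + 2x^2 & \text{if } x < 1 \\ 2 - x & \text{if } x \ge 1 \end{cases} \]
is discontinuous at $x = 1$.
(a) The sequence with $x_n = 1 - \frac{1}{n}$ converges to 1, yet
\[ \lim h(x_n) = 3 \ne 1 = h(1) \]
(b) Choose $\epsilon = 1$ and suppose $\delta > 0$ is given. Now choose $x = \max\{1 - \frac{\delta}{2}, \frac{1}{\sqrt{2}}\}$ to see that
\[ |x - 1| < \delta \quad\text{and}\quad |h(x) - h(1)| = 2x^2 \ge 1 = \epsilon \]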
[Figure: graphs of $g(x)$ and $h(x)$ against $x$.]
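The choice $\delta = \min\{1, \epsilon/19\}$ for $f(x) = x^3$ at $u = 2$ can be sanity-checked numerically. The following is only a sketch: the helper names `delta_for` and `worst_error` are hypothetical, and sampling finitely many points illustrates the inequality but does not prove it.

```python
def delta_for(eps):
    """The delta proposed in the epsilon-delta argument for x**3 at u = 2."""
    return min(1.0, eps / 19.0)


def worst_error(eps, samples=10_000):
    """Largest sampled value of |x**3 - 2**3| over the open interval |x - 2| < delta."""
    d = delta_for(eps)
    worst = 0.0
    for i in range(1, samples):
        # x runs over interior points of (2 - d, 2 + d); endpoints are excluded
        x = 2.0 - d + 2.0 * d * i / samples
        worst = max(worst, abs(x ** 3 - 8.0))
    return worst


for eps in (100.0, 10.0, 1.0, 0.1, 0.001):
    print(eps, delta_for(eps), worst_error(eps))
```

In every row the sampled error stays below the corresponding `eps`, as the estimate $|x^3 - 2^3| < 19|x - 2|$ predicts.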
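Theorem 1.8. The two parts of Definition 1.6 are equivalent.
Proof. ($1 \Rightarrow 2$) We prove the contrapositive. Suppose condition 2 is false; that is,
\[ \exists \epsilon > 0 \text{ such that } \forall \delta > 0,\ \exists x \in U \text{ with } |x - u| < \delta \text{ and } |f(x) - f(u)| \ge \epsilon \]
In particular, for any $n \in \mathbb{N}$ we may let $\delta = \frac{1}{n}$ to obtain
\[ \exists \epsilon > 0 \text{ such that } \forall n \in \mathbb{N},\ \exists x_n \in U \text{ with } |x_n - u| < \tfrac{1}{n} \text{ and } |f(x_n) - f(u)| \ge \epsilon \]
The sequence $(x_n)$ shows that condition 1 is false:
$\forall n$, $|x_n - u| < \frac{1}{n}$, whence $x_n \to u$.
$\forall n$, $|f(x_n) - f(u)| \ge \epsilon > 0$, whence $f(x_n)$ does not converge to $f(u)$.
($2 \Rightarrow 1$) Suppose condition 2 is true, that $(x_n) \subseteq U$ converges to $u$ and that $\epsilon > 0$ is given. Then
\[ \exists \delta > 0 \text{ such that } |x - u| < \delta \implies |f(x) - f(u)| < \epsilon \]
However, by the definition of convergence ($x_n \to u$),
\[ \exists N \in \mathbb{N} \text{ such that } n > N \implies |x_n - u| < \delta \implies |f(x_n) - f(u)| < \epsilon \]
Otherwise said, $f(x_n) \to f(u)$.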
Rather than use these definitions every time, it is helpful to have a working dictionary.
Theorem 1.9 (Dictionary of Common Continuous Functions).
1. Suppose $f$ and $g$ are continuous at $u$ and that $k$ is constant. Then the following are continuous at $u$ (if defined; don't divide by zero!):
\[ f + g,\quad f - g,\quad fg,\quad \frac{f}{g},\quad |f|,\quad kf,\quad \max(f, g),\quad \min(f, g) \]
2. If $f$ is continuous at $u$ and $h$ is continuous at $f(u)$, then $h \circ f$ is continuous at $u$.
3. Algebraic functions are continuous: these are functions constructed using finitely many addition/subtraction, multiplication/division and $n$th root operations.
4. The familiar transcendental functions are continuous: exp, ln, sin, etc.
Example 1.10. f (x) = sin
3
x
2
+7
x2
+ cos
1
e
x
1
is continuous on its domain (, 0) (0, 1) (1, ).
These claims are tedious to prove using elementary definitions. In particular, it is better to defer a proof of the transcendental claim until we can define such functions using power series, after which continuity comes for free.
Exercises 1.17. Key concepts/results: Suprema/Completeness, Sequential & $\epsilon$-$\delta$ continuity
1. Give examples to show that $g \circ f$ being continuous can happen with:
(a) $f$ continuous and $g$ discontinuous. (b) $g$ continuous and $f$ discontinuous.
(c) Both $f, g$ discontinuous.
You may use pictures, but make sure they clearly describe the functions $f, g$.
2. (a) Prove that the function $f(x) = x^3$ is continuous at $x = 2$ using an $\epsilon$-$\delta$ argument.
(b) Prove that $f(x) = x^3$ is continuous at $x = u$ using an $\epsilon$-$\delta$ argument.
3. Prove that the following are discontinuous at $x = 0$: use both definitions of continuity.
(a) $f(x) = 1$ for $x < 0$ and $f(x) = 0$ for $x \ge 0$.
(b) $g(x) = \sin(1/x)$ for $x \ne 0$ and $g(0) = 0$.
4. If $f$ is continuous at $u$, prove that it is bounded on some set $(u - \delta, u + \delta) \cap \operatorname{dom}(f)$.
5. Prove the following parts of Theorem 1.9 using $\epsilon$-$\delta$ arguments.
(a) If $f, g$ are continuous at $u$, then $f - g$ is continuous at $u$.
(b) If $f, g$ are continuous at $u$, then $fg$ is continuous at $u$.
(c) If $f$ is continuous at $u$ and $h$ at $f(u)$, then $h \circ f$ is continuous at $u$.
6. Suppose $f : U \to \mathbb{R}$ is a function whose domain $U$ contains an isolated point $a$: i.e. $\exists r > 0$ such that $(a - r, a + r) \cap U = \{a\}$. Prove that $f$ is continuous at $a$.
7. Refresh your prerequisites by giving formal proofs:
(a) (Suprema and sequences) If $M = \sup U$, then $\exists (x_n) \subseteq U$ such that $x_n \to M$.
(This has to work even if $M = \infty$!)
(b) (Limit of a bounded sequence) If $(x_n) \subseteq [a, b]$ and $x_n \to x$, then $x \in [a, b]$.
(c) (Bolzano–Weierstraß) Every bounded sequence in $\mathbb{R}$ has a convergent subsequence.
(Hint: If $(x_n) \subseteq [a, b]$, explain why there exist intervals $I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots$ such that infinitely many terms of $(x_n)$ lie in each interval $I_k$. Hence obtain a subsequence $(x_{n_k})$ and prove that it is Cauchy.¹)
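8. (Very Hard) Consider the function $f : \mathbb{R} \to \mathbb{R}$ where
\[ f(x) = \begin{cases} \frac{1}{q} & \text{whenever } x = \frac{p}{q} \in \mathbb{Q} \text{ with } q > 0 \text{ and } \gcd(p, q) = 1 \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} \]
For example, $f(1) = f(2) = f(7) = 1$, and $f(\frac{1}{2}) = f(-\frac{1}{2}) = f(\frac{3}{2}) = \cdots = \frac{1}{2}$, etc. Prove that $f$ is continuous at each point of $\mathbb{R} \setminus \mathbb{Q}$ and discontinuous at each point of $\mathbb{Q}$.
(Hint: for continuity, consider $A = \{r \in \mathbb{Q} : f(r) \ge \frac{1}{q}\}$ where $q \ge \frac{1}{\epsilon}$ . . . )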
¹ This is a good moment to review Cauchy completeness: a sequence is convergent if and only if it is Cauchy:
\[ \forall \epsilon > 0,\ \exists N \text{ such that } m, n > N \implies |x_m - x_n| < \epsilon \]
1.18 Properties of Continuous Functions
In this section we describe the behavior of a continuous function on an interval. We first consider the
special case when the domain is a closed bounded interval [a, b] .
Theorem 1.11 (Extreme Value Theorem). A continuous function on a closed, bounded interval is bounded and attains its bounds. Otherwise said, if $f : [a, b] \to \mathbb{R}$ is continuous, then
\[ \exists x, y \in [a, b] \text{ such that } f(x) = \sup \operatorname{range}(f) \text{ and } f(y) = \inf \operatorname{range}(f) \]
In particular, the supremum and infimum are finite.
Proof. Suppose $f$ is continuous with domain $[a, b]$ and let $M = \sup\{f(x) : x \in [a, b]\}$. We invoke the three parts of Exercise 1.17.7:
(Part a) There exists a sequence $(x_n) \subseteq [a, b]$ such that $f(x_n) \to M$.
(Part c) There exists a convergent subsequence $(x_{n_k})$ with limit $x$.
(Part b) $x \in [a, b]$.
Since $f$ is continuous, we now have $f(x) = \lim_{k \to \infty} f(x_{n_k}) = M$. This shows that $M$ is finite and that $f$ attains its least upper bound. For the lower bound, apply this to $-f$.
It is worth considering how the result can fail when one of the hypotheses is weakened. For example:
$f$ discontinuous: $f : [0, 1] \to \mathbb{R} : x \mapsto \begin{cases} x & \text{if } x \ne 1 \\ 0 & \text{if } x = 1 \end{cases}$ is bounded but does not attain its bounds.
$\operatorname{dom}(f)$ not closed: $f : [0, 1) \to \mathbb{R} : x \mapsto x$ is bounded but does not attain its bounds.
$\operatorname{dom}(f)$ not bounded: $f : [0, \infty) \to \mathbb{R} : x \mapsto x$ is unbounded.
We now consider continuous functions on arbitrary intervals. The next result should be familiar from elementary calculus and is intuitively obvious from the naïve notion of continuity (draw the graph without taking your pen off the page).
Theorem 1.12 (Intermediate Value Theorem). Let $f : I \to \mathbb{R}$ be continuous on an interval $I$. Suppose $a, b \in I$ with $a < b$ and that $f(a) \ne f(b)$. If $L$ lies between $f(a)$ and $f(b)$, then $\exists \xi \in (a, b)$ such that $f(\xi) = L$.
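Example 1.13. Let $f(x) = \cos x$ with $a = \frac{\pi}{4}$, $b = 3\pi$ and $L = \frac{1}{2}$; then
\[ f(\xi) = L \iff \xi \in \left\{ \frac{\pi}{3}, \frac{5\pi}{3}, \frac{7\pi}{3} \right\} \]
There may therefore be several suitable values of $\xi$. It is even possible (Exercise 2) for there to be infinitely many.
[Figure: graph of $f(x) = \cos x$ between $a$ and $b$, with the level $L$ marked.]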
Proof. Suppose WLOG that $f(a) < L < f(b)$ and let
\[ S = \{x \in [a, b] : f(x) < L\} \]
Plainly $S \subseteq [a, b)$ is non-empty, hence $\xi := \sup S$ exists and $\xi \in [a, b]$. It remains to show that $\xi$ satisfies the required properties.
By Exercise 1.17.7(a), $\exists (s_n) \subseteq S$ with $\lim s_n = \xi$. Since $f$ is continuous, $f(\xi) = \lim f(s_n) \le L$. In particular, $\xi \ne b$.
[Figure: graph of $f$ on $[a, b]$ showing $f(a)$, $f(b)$, the level $L$, the set $S$ and the point $\xi$.]
To finish, play a similar game with the sequence defined by $t_n = \min\{b, \xi + \frac{1}{n}\}$ (see Exercise 4).
Example 1.14. The intermediate value theorem is useful for demonstrating the existence of solutions to equations. For example, we show that the equation $x 2^x = 1$ has a solution.
Observe that $g(x) = x 2^x - 1$ is continuous.
$g(0) = -1 < 0$.
$g(1) = 1 > 0$.
By the intermediate value theorem $\exists \xi \in (0, 1)$ such that $g(\xi) = 0$: that is $\xi \cdot 2^\xi = 1$.
[Figure: graph of $g(x)$ on $[0, 1]$ with the root $\xi$ marked.]
It is inefficient, but one can home in on $\xi$ by repeatedly halving the size of the interval: for instance,
\[ g(\tfrac{1}{2}) = \tfrac{\sqrt{2}}{2} - 1 < 0, \quad g(\tfrac{3}{4}) = \tfrac{3}{4} \cdot 2^{3/4} - 1 \approx 0.26 > 0 \quad \ldots \implies \tfrac{1}{2} < \xi < \tfrac{3}{4} \]
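The interval-halving idea can be sketched in a few lines of Python. This is a minimal illustration rather than part of the notes: the function name `bisect` and the tolerance `tol` are hypothetical choices, and the loop relies only on the sign change that the intermediate value theorem requires.

```python
def g(x):
    """The function from Example 1.14: g(x) = x * 2**x - 1."""
    return x * 2.0 ** x - 1.0


def bisect(func, lo, hi, tol=1e-10):
    """Halve [lo, hi] until it is shorter than tol, keeping a sign change inside."""
    if func(lo) * func(hi) > 0:
        raise ValueError("func(lo) and func(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half-interval on which the sign change survives.
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


xi = bisect(g, 0.0, 1.0)
print(xi, g(xi))
```

Each pass halves the interval, so roughly 34 passes are needed to shrink $[0, 1]$ below a length of $10^{-10}$; the returned value lies between $\frac{1}{2}$ and $\frac{3}{4}$, in agreement with the estimate above.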
Corollary 1.15. Continuous functions map intervals to intervals (or points).
Proof. An interval $I$ is characterized by the following property:
\[ \forall x_1, x_2 \in I,\ \forall x \in \mathbb{R},\quad x_1 < x < x_2 \implies x \in I \]
Let $f : I \to \mathbb{R}$ be continuous and suppose its range $f(I)$ is not a single point. If $f(a) < L < f(b)$, then $\exists \xi$ between $a, b$ such that $f(\xi) = L$. Otherwise said, $L \in f(I)$ and so $f(I)$ is an interval.
More generally, if $\operatorname{dom}(f) = \bigcup I_n$ is written as a union of disjoint intervals and $f$ is continuous, then
\[ \operatorname{range}(f) = \bigcup f(I_n) \]
is also a union of intervals, though these need not be disjoint: a continuous function can bring intervals together, but cannot break them apart.²
Example 1.16. The function $f(x) = \sqrt{x^2 - 4}$ has implied domain $(-\infty, -2] \cup [2, \infty)$ and range $[0, \infty)$. Both halves of the domain are mapped onto the same interval $\operatorname{range}(f)$.
Exercises 1.18. Key concepts: Extreme Value Theorem, Intermediate Value Theorem, Continuous functions preserve intervals
1. Give examples of the following:
(a) An unbounded discontinuous function on a closed bounded interval.
(b) An unbounded continuous function on a non-closed bounded interval.
(c) A bounded continuous function on a closed unbounded interval which fails to attain its bounds.
2. Consider the function $f(x) = \begin{cases} x \sin\frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$
(a) Explain why $f$ is continuous on any interval $I$.
(b) Suppose $a < 0 < b$ and that $f(a), f(b)$ have opposite signs. If $L = 0$, show that the intermediate value theorem is satisfied by infinitely many distinct values $\xi$.
3. Use the intermediate value theorem to prove that the equation $8x^3 - 12x^2 - 2x + 1 = 0$ has at least 3 real solutions (and thus, by the fundamental theorem of algebra, exactly 3).
4. Complete the proof of the intermediate value theorem by defining $t_n = \min(b, \xi + \frac{1}{n})$.
5. (a) Suppose $f : U \to \mathbb{R}$ is continuous and that $U = \bigcup_{k=1}^{n} I_k$ is the union of a finite sequence $(I_k)$ of closed bounded intervals. Prove that $f$ is bounded and attains its bounds.
(b) Let $U = \bigcup_{n=1}^{\infty} I_n$, where $I_n = [\frac{1}{2n}, \frac{1}{2n-1}]$ for each $n \in \mathbb{N}$. Give an example of a continuous function $f : U \to \mathbb{R}$ which is either unbounded or does not attain its bounds. Explain.
² More generally, if $f : U \to V$ is a continuous function between topological spaces and $a, b$ lie in the same component of $U$, then $f(a), f(b)$ lie in the same component of $f(U)$. In our examples each component is an interval.
1.19 Uniform Continuity
Recall Definition 1.6: $f : U \to \mathbb{R}$ is continuous at all points³ $y \in U$ provided
\[ \forall y \in U,\ \forall \epsilon > 0,\ \exists \delta > 0 \text{ such that } (\forall x \in U)\ |x - y| < \delta \implies |f(x) - f(y)| < \epsilon \]
Note the order of the quantifiers: $\delta$ is permitted to depend both on $y$ and $\epsilon$. In the naïve sense of continuity ($x$ close to $y$ $\implies$ $f(x)$ close to $f(y)$), the meaning of close can depend on the location $y$. Uniform continuity is a stronger condition where the meaning of close is independent of location.
Definition 1.17. $f : U \to \mathbb{R}$ is uniformly continuous if
\[ \forall \epsilon > 0,\ \exists \delta > 0 \text{ such that } (\forall x, y \in U)\ |x - y| < \delta \implies |f(x) - f(y)| < \epsilon \]
We've included the (typically) hidden quantifiers ($\forall x, y$) to make clear that $\delta$ is independent of $x, y$. Note also that the definition is now symmetric in $x, y$.
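Example 1.18. Consider $f(x) = \frac{1}{x}$.
1. If $0 < a < b \le \infty$, then $f$ is uniformly continuous on $[a, b)$.
Let $\epsilon > 0$ be given and let $\delta = a^2 \epsilon$. Then $\forall x, y \in [a, b)$,
\[ |x - y| < \delta \implies \left| \frac{1}{x} - \frac{1}{y} \right| = \frac{|y - x|}{xy} < \frac{\delta}{xy} \le \frac{\delta}{a^2} = \epsilon \]
2. If $0 < b \le \infty$, then $f$ is not uniformly continuous on $(0, b)$.
Let $\epsilon = 1$ and suppose $\delta > 0$ is given. Let $x = \min(\delta, 1, \frac{b}{2})$ and $y = \frac{x}{2}$. Certainly $x, y \in (0, b)$ and $|x - y| = \frac{x}{2} \le \frac{\delta}{2} < \delta$. However,
\[ |f(x) - f(y)| = \frac{1}{x} \ge 1 = \epsilon \]
[Figure: graph of $f(x) = \frac{1}{x}$ over $[a, b)$, with a horizontal interval of width $\delta$ and a vertical interval of height $\epsilon$ marked.]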
Think about how ϵ and δ must relate as one slides the intervals in the picture up/down and left/right.
Some intuition will help make sense of the example.
Bounded/unbounded gradient: In part 1, $\epsilon = \delta a^{-2}$, where $\frac{1}{a^2} = |f'(a)|$ bounds the gradient of $f$. By contrast, the slope of $f$ is unbounded in part 2.
Extensibility: In part 1 the domain of $f$ may be extended to the closed interval $[a, b]$ when $b$ is finite: $g : [a, b] \to \mathbb{R} : x \mapsto \frac{1}{x}$ is continuous. In part 2, this is impossible: there is no continuous function $g : [0, b) \to \mathbb{R}$ such that $g(x) = \frac{1}{x}$ whenever $x > 0$.
Informally, if a continuous function f has bounded gradient, or if you can ‘fill in the holes’ at the
endpoints of dom( f ), then f is uniformly continuous. When uniform continuity is used abstractly in
a proof, it is often one of the above properties that is being invoked. The remainder of this section
involves making these observations watertight.
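Both halves of Example 1.18 can be probed numerically. The sketch below is illustrative only: the helper names `delta_part1`, `largest_gap` and `witness_part2` are hypothetical, and a finite sample can support the claims without proving them.

```python
import math


def f(x):
    return 1.0 / x


def delta_part1(a, eps):
    """Part 1: the choice delta = a**2 * eps for the domain [a, b) with a > 0."""
    return a * a * eps


def largest_gap(a, b, delta, steps=2000):
    """Largest sampled |f(x) - f(y)| over pairs in [a, b) with |x - y| < delta."""
    worst = 0.0
    shift = 0.999 * delta  # strictly less than delta
    for i in range(steps):
        x = a + (b - a - shift) * i / steps
        y = x + shift
        worst = max(worst, abs(f(x) - f(y)))
    return worst


def witness_part2(delta, b):
    """Part 2: a pair x, y in (0, b) with |x - y| < delta but |f(x) - f(y)| >= 1."""
    x = min(delta, 1.0, b / 2)
    return x, x / 2


eps = 0.01
print(largest_gap(0.5, 3.0, delta_part1(0.5, eps)), "<", eps)
for delta in (1.0, 0.1, 0.001):
    x, y = witness_part2(delta, math.inf)
    print(delta, abs(x - y), abs(f(x) - f(y)))
```

On $[a, b)$ one value of `delta` serves every location, whereas near zero each proposed `delta` is defeated by its own witness pair: this is precisely the difference between the two quantifier orders.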
³ To promote symmetry, we use $y$ instead of $u$ for a generic point of $\operatorname{dom}(f)$.
Theorem 1.19. Let $f : I \to \mathbb{R}$ be continuous on an interval $I$ and differentiable with bounded derivative on the interior $I^\circ$. Then $f$ is uniformly continuous on $I$.
The proof depends on the mean value theorem, which should be familiar from elementary calculus; we'll discuss a proof later on.
Proof. Suppose $|f'(x)| \le M$ on $I^\circ$. Let $\epsilon > 0$ be given, let $\delta = \frac{\epsilon}{M}$ and suppose $x, y \in I$ with $y < x$. Then
\[ |x - y| < \delta \implies \exists \xi \in I^\circ \text{ such that } f'(\xi) = \frac{f(x) - f(y)}{x - y} \quad \text{(MVT)} \]
\[ \implies |f(x) - f(y)| = |f'(\xi)|\,|x - y| \le M|x - y| < M\delta = \epsilon \]
Theorem 1.19 isn't a biconditional: for instance, Exercise 1.19.5 shows that $f(x) = \sqrt{x}$ on $[0, \infty)$ and $g(x) = x^{1/3}$ on $\mathbb{R}$ are both uniformly continuous even though both have unbounded slope.
We now discuss extensibility and how uniform continuity relates to continuity on closed sets. First
we see that for closed bounded sets, uniform continuity is nothing new.
Theorem 1.20. If $g : [a, b] \to \mathbb{R}$ is continuous, then it is uniformly continuous.
Proof. Suppose $g$ is continuous but not uniformly so. Then
\[ \exists \epsilon > 0 \text{ such that } \forall \delta > 0,\ \exists x, y \in [a, b] \text{ for which } |x - y| < \delta \text{ and } |g(x) - g(y)| \ge \epsilon \qquad (\ast) \]
For each $n \in \mathbb{N}$, let $\delta = \frac{1}{n}$ to see that there exist sequences $(x_n), (y_n) \subseteq [a, b]$ satisfying the above. By Bolzano–Weierstraß, the bounded sequence $(x_n)$ has a convergent subsequence $x_{n_k} \to x \in [a, b]$. Clearly
\[ |x_{n_k} - y_{n_k}| < \frac{1}{n_k} \to 0 \implies y_{n_k} \to x \]
But then, since $g$ is continuous at $x$, $|g(x_{n_k}) - g(y_{n_k})| \to 0$, which contradicts $(\ast)$.
Now we build towards a partial converse.
Lemma 1.21. If $f : U \to \mathbb{R}$ is uniformly continuous and $(x_n) \subseteq U$ is a Cauchy sequence, then $(f(x_n))$ is also Cauchy.
Proof. Let $\epsilon > 0$ be given. Then:
(Uniform Continuity) $\exists \delta > 0$ such that $|x - y| < \delta \implies |f(x) - f(y)| < \epsilon$.
(Cauchy) $\exists N \in \mathbb{N}$ such that $m, n > N \implies |x_m - x_n| < \delta$.
Putting these together, we see that
\[ \exists N \in \mathbb{N} \text{ such that } m, n > N \implies |f(x_m) - f(x_n)| < \epsilon \]
Otherwise said, $(f(x_n))$ is Cauchy.
We now see that a function $f : I \to \mathbb{R}$ is uniformly continuous on a bounded interval if and only if it has a continuous extension $g : \overline{I} \to \mathbb{R}$ defined on the closure of its domain.
Theorem 1.22. Suppose $f : I \to \mathbb{R}$ is continuous where $I$ is a bounded interval with endpoints $a < b$. Define $g : [a, b] \to \mathbb{R}$ via
\[ g(x) = \begin{cases} f(x) & \text{if } x \in I \\ \lim f(x_n) & \text{if } x = a, \text{ whenever } (x_n) \subseteq I \text{ and } x_n \to a \\ \lim f(x_n) & \text{if } x = b, \text{ whenever } (x_n) \subseteq I \text{ and } x_n \to b \end{cases} \]
Then $f$ is uniformly continuous if and only if $g$ is well-defined ($g$ is continuous, if well-defined).
Proof. ($\Rightarrow$) Suppose $f$ is uniformly continuous on $I$ and that $a \notin I$. Let $(x_n), (y_n) \subseteq I$ be sequences converging to $a$. To show that $g$ is well-defined, we must prove that $(f(x_n))$ and $(f(y_n))$ are convergent, and to the same limit. For this, we define a sequence
\[ (u_n) = (x_1, y_1, x_2, y_2, x_3, y_3, \ldots) \]
Since $(x_n)$ and $(y_n)$ have the same limit $a$, we conclude that $u_n \to a$. But then $(u_n)$ is Cauchy. By Lemma 1.21, $(f(u_n))$ is also Cauchy and thus convergent. Since $(f(x_n))$ and $(f(y_n))$ are subsequences of a convergent sequence, they must also converge to the same (finite) limit. The argument when $b \notin I$ is identical.
($\Leftarrow$) If $g$ is well-defined then it is continuous (Definition 1.6, part 1); by Theorem 1.20 it is uniformly so. Since $f = g$ on a subset of $\operatorname{dom}(g)$, the same choice of $\delta$ will work for $f$ as for $g$: $f$ is therefore uniformly continuous.
Examples 1.23. 1. Consider $f : x \mapsto x^2$.
(a) If $\operatorname{dom}(f)$ is the open interval $(-3, 10)$, then $f$ is uniformly continuous since its derivative $f'(x) = 2x$ is bounded ($|f'(x)| \le 20$). The continuous extension is $g(x) = x^2$ on $[-3, 10]$.
(b) If $\operatorname{dom}(f)$ is the infinite interval $(-3, \infty)$, then neither Theorem 1.19 nor 1.22 applies: both $f'$ and the domain $(-3, \infty)$ are unbounded.
Instead, note that if $\epsilon = 1$, then for any $\delta > 0$, we can choose $x = \frac{1}{\delta}$ and $y = \frac{1}{\delta} + \frac{\delta}{2}$. Clearly
\[ |x - y| = \frac{\delta}{2} < \delta \quad\text{and}\quad |x^2 - y^2| = 1 + \frac{\delta^2}{4} > 1 = \epsilon \]
whence $f$ is not uniformly continuous.
2. $f(x) = x \sin\frac{1}{x}$ is continuous on the interval $(0, \infty)$. Strictly, neither Theorem 1.19 nor 1.22 applies since the derivative
\[ f'(x) = \sin\frac{1}{x} - \frac{1}{x}\cos\frac{1}{x} \]
is unbounded as is the domain. However, by breaking the domain into two pieces. . .
On $[1, \infty)$, the derivative is bounded: $|f'(x)| \le 1 + \frac{1}{|x|} \le 2$ by the triangle inequality. Theorem 1.19 says $f$ is uniformly continuous on $[1, \infty)$.
$f$ is continuous on $(0, 1]$ and, by the squeeze theorem,
\[ x_n \to 0^+ \implies \lim f(x_n) = 0 \]
Extending $f$ so that $f(0) = 0$ defines a continuous extension. By Theorem 1.22, $f$ is uniformly continuous on $(0, 1]$.
Putting this together (Exercise 6), $f$ is uniformly continuous on $(0, \infty)$. Indeed the function
\[ h(x) = \begin{cases} x \sin\frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases} \]
is uniformly continuous on $\mathbb{R}$.
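The witness pair used in Examples 1.23, part 1(b), to show that $x \mapsto x^2$ is not uniformly continuous on an unbounded interval can be checked with exact rational arithmetic. This is a small sketch under the assumption that $\delta$ is rational; the helper name `witness` is hypothetical.

```python
from fractions import Fraction


def witness(delta):
    """The pair x = 1/delta, y = 1/delta + delta/2 from Examples 1.23, part 1(b)."""
    x = 1 / delta
    y = x + delta / 2
    return x, y


for delta in (Fraction(7, 2), Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    x, y = witness(delta)
    gap_in = abs(x - y)           # equals delta / 2, so it is smaller than delta
    gap_out = abs(x * x - y * y)  # equals 1 + delta**2 / 4, so it exceeds 1
    print(delta, gap_in, gap_out)
```

However small `delta` is chosen, the inputs are within `delta` of each other while the outputs differ by more than $\epsilon = 1$.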
Exercises 1.19. Key concepts: Uniform Continuity (same δ for all locations), Bounded gradient,
Continuous extensions
1. Which functions are uniformly continuous? Justify your answers.
(a) $f(x) = x^4$ on $[-1, 1]$ (b) $f(x) = x^4$ on $(-1, 1]$
(c) $f(x) = x^{-4}$ on $(0, 2]$ (d) $f(x) = x^{-4}$ on $(1, 2]$
(e) $f(x) = x^2 \sin\frac{1}{x}$ on $(0, 1]$
2. Prove that each function is uniformly continuous by verifying the $\epsilon$-$\delta$ property.
(a) $f(x) = 2x - 14$ on $\mathbb{R}$ (b) $f(x) = x^3$ on $[1, 5]$
(c) $f(x) = x^{-1}$ on $(1, \infty)$ (d) $f(x) = \frac{x+1}{x+2}$ on $[0, 1]$
3. Prove that $f(x) = x^4$ is not uniformly continuous on $\mathbb{R}$.
4. (a) Suppose $f$ is uniformly continuous on a bounded interval $I$. Prove that $f$ is bounded on $I$.
(b) Use part (a) to write down a bounded interval on which the function $f(x) = \tan x$ is defined, but not uniformly continuous.
5. Both parts of this question are easy using Exercise 6. Do them explicitly using the $\epsilon$-$\delta$ property.
(a) Let $f(x) = \sqrt{x}$ with domain $[0, \infty)$. Show that $f'(x)$ is unbounded, but that $f$ is still uniformly continuous on $[0, \infty)$.
(Hint: let $\delta = \epsilon^2$ and WLOG assume $0 \le y \le x$. Now compute $(\sqrt{y} + \epsilon)^2$ . . . )
(b) Prove that $g(x) = x^{1/3}$ is uniformly continuous on $\mathbb{R}$.
(Hint: let $\delta = (\frac{\epsilon}{2})^3$ and consider the cases $x \ge y \ge 0$, $x \le y \le 0$ and $x > 0 > y$ separately)
6. Suppose $f$ is uniformly continuous on intervals $U_1, U_2$ for which $U_1 \cap U_2$ is non-empty. Prove that $f$ is uniformly continuous on $U_1 \cup U_2$.
(Hint: if $x, y$ do not lie in the same $U_1, U_2$, choose some $a \in U_1 \cap U_2$ between $x$ and $y$)
1.20 Limits of Functions
In elementary calculus you likely saw many calculations of the following form:
\[ \lim_{x \to 3} \frac{x^2 - 9}{x - 3} = \lim_{x \to 3} \frac{(x - 3)(x + 3)}{x - 3} = \lim_{x \to 3} (x + 3) = 6 \]
Loosely speaking, this means that if $f(x) = \frac{x^2 - 9}{x - 3}$ and $(x_n) \subseteq \mathbb{R} \setminus \{3\}$ is a sequence converging to 3, then $(f(x_n))$ converges to 6. Our goal in this section is to make this notation precise.
Definition 1.24. Suppose $f : U \to \mathbb{R}$, that $S \subseteq U$, and that $a$ is the limit of some sequence in $S$. We say that $L$ is the limit of $f(x)$ as $x$ tends to $a$ along $S$, written $\lim_{x \to a^S} f(x) = L$, provided
\[ \forall (x_n) \subseteq S,\quad \lim x_n = a \implies \lim f(x_n) = L \]
We can now define one-sided and two-sided limits of functions:
Right-hand limit: $\lim_{x \to a^+} f(x) = L$ means $\exists S = (a, b) \subseteq U$ for which $\lim_{x \to a^S} f(x) = L$
Left-hand limit: $\lim_{x \to a^-} f(x) = L$ means $\exists S = (c, a) \subseteq U$ for which $\lim_{x \to a^S} f(x) = L$
Two-sided limit: $\lim_{x \to a} f(x) = L$ means $\exists S = (c, a) \cup (a, b) \subseteq U$ for which $\lim_{x \to a^S} f(x) = L$
If $U = \operatorname{dom}(f)$ is unbounded, then the one-sided definitions apply when $a = \pm\infty$. We omit the $\pm$ modifiers: for instance,
\[ \lim_{x \to \infty} f(x) = L \iff \lim_{x \to \infty^S} f(x) = L \text{ for some } S = (c, \infty) \subseteq U \]
Note that $f$ need not be defined at $a$, though $U = \operatorname{dom}(f)$ must contain at least some punctured neighborhood of $a$ (one-sided for a one-sided limit). This will certainly happen if $U$ is a union of intervals of positive length. In such a case, one may simply replace $S$ with $U \setminus \{a\}$ in the definition: this is precisely what we did in the motivating example where $U = \mathbb{R} \setminus \{3\}$.
Moreover, in such a situation, Definition 1.6 recovers a familiar idea from elementary calculus:
\[ f \text{ is continuous at } a \in U \iff f(a) = \begin{cases} \lim_{x \to a} f(x) & \text{when } a \in U^\circ \\ \lim_{x \to a^\pm} f(x) & \text{when } a \in U \setminus U^\circ \end{cases} \qquad (\ast) \]
Warning! When $\operatorname{dom}(f)$ does not contain a punctured neighborhood of $a$, the right hand side doesn't exist and the assertion is false!
By modifying the proof of Theorem 1.8 when $a, L \in \mathbb{R}$ are finite, we can restate using $\epsilon$-$\delta$ language. For instance, $\lim_{x \to a} f(x) = L$ means
\[ \forall \epsilon > 0,\ \exists \delta > 0 \text{ such that } (\forall x \in \operatorname{dom} f)\ 0 < |x - a| < \delta \implies |f(x) - L| < \epsilon \]
If $a$ and/or $L$ is infinite, use the language of unboundedness: e.g., $\lim_{x \to a} f(x) = \infty$ means
\[ \forall M > 0,\ \exists \delta > 0 \text{ such that } 0 < |x - a| < \delta \implies f(x) > M \]
There are fifteen distinct combinations: three two-sided and six each of the one-sided limits!
13
Examples 1.25. 1. Let $f(x) = \frac{2 + x}{x}$ where $\operatorname{dom}(f) = U = \mathbb{R} \setminus \{0\} = (-\infty, 0) \cup (0, \infty)$.
The following should be clear:
\[ \lim_{x \to 3} f(x) = \frac{5}{3}, \qquad \lim_{x \to \infty} f(x) = 1 \]
To compute the first, for instance, we could choose $S = (0, 3) \cup (3, \infty)$; if $(x_n) \subseteq S$ and $x_n \to 3$, then the limit laws justify the first claim
\[ \lim_{n \to \infty} f(x_n) = \frac{2 + 3}{3} = \frac{5}{3} \]
as does the fact that $f$ is continuous at $x = 3$. The second claim can be checked similarly.
We can take one-sided limits at $x = 0$:
\[ \lim_{x \to 0^+} f(x) = \infty \quad\text{and}\quad \lim_{x \to 0^-} f(x) = -\infty \]
For instance, let $(x_n) \subseteq (0, \infty)$ satisfy $x_n \to 0$. Again, the limit laws show that $\lim_{n \to \infty} f(x_n) = \infty$, which is enough to justify the first claim.
Finally, the sequences defined by $x_n = \frac{1}{n}$ and $y_n = -\frac{1}{n}$ both lie in $S = \mathbb{R} \setminus \{0\}$ and converge to zero, yet
\[ \lim_{n \to \infty} f(x_n) = \infty \ne -\infty = \lim_{n \to \infty} f(y_n) \]
It follows that the two-sided limit $\lim_{x \to 0} f(x)$ does not exist.
[Figure: graph of $f(x)$ against $x$, showing the sequences $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ together with the values $f(x_n)$ and $f(y_n)$.]
2. Let $f(x) = \frac{1}{x^2}$ whenever $x \ne 0$ and additionally let $f(0) = 0$. Here the two-sided limit exists
\[ \lim_{x \to 0} f(x) = \infty \]
However the value of the function at $x = 0$ does not equal this limit: clearly $f$ is discontinuous at $x = 0$.
3. We revisit our motivating example. Let $f(x) = \frac{x^2 - 9}{x - 3}$ have domain $U = \mathbb{R} \setminus \{3\}$. Whenever $x_n \ne 3$, we see that
\[ f(x_n) = \frac{(x_n - 3)(x_n + 3)}{x_n - 3} = x_n + 3 \]
By the limit laws, we conclude that $\lim f(x_n) = 3 + 3 = 6$ and so
\[ \lim_{x \to 3} \frac{x^2 - 9}{x - 3} = 6 \]
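The sequential definition of a limit lends itself to a quick numerical experiment with the motivating example. The sketch below evaluates $f$ along three sample sequences in $\mathbb{R} \setminus \{3\}$ converging to 3; the helper name `along` and the particular sequences are hypothetical choices, and checking a few sequences only illustrates the definition, which quantifies over all of them.

```python
def f(x):
    """The motivating example, defined only for x != 3."""
    return (x * x - 9.0) / (x - 3.0)


def along(term, indices):
    """Values f(x_n) for the sequence n -> term(n)."""
    return [f(term(n)) for n in indices]


indices = [10, 100, 1000, 10_000, 100_000]
from_right = along(lambda n: 3 + 1 / n, indices)
from_left = along(lambda n: 3 - 1 / n, indices)
alternating = along(lambda n: 3 + (-1) ** n / n, indices)

for row in zip(indices, from_right, from_left, alternating):
    print(row)
```

All three columns approach 6, in agreement with the limit computed above.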
Since we referenced the limit laws for sequences so often in the examples, it is appropriate to update
them to this new context. We do so without proof.
Corollary 1.26 (Limit Laws for functions). Suppose $f, g : U \to \mathbb{R}$ satisfy $L = \lim_{x \to a} f(x)$ and $M = \lim_{x \to a} g(x)$ exist. Then,
1. $\lim_{x \to a} (f + g)(x) = L + M$.
2. $\lim_{x \to a} (fg)(x) = LM$.
3. $\lim_{x \to a} \left(\frac{f}{g}\right)(x) = \frac{L}{M}$ (requires $M \ne 0$).
4. If $L \in \mathbb{R}$ and $h$ is continuous at $L$, then $\lim_{x \to a} (h \circ f)(x) = h(L)$.
5. (Squeeze Theorem) If $L = M$ and $f(x) \le h(x) \le g(x)$ for all $x \in U$, then $\lim_{x \to a} h(x) = L$.
The corresponding results for one-sided limits also hold.
As with the original limit laws for sequences, parts 1–3 apply provided the limits are not indeterminate forms (e.g. $\infty - \infty$, $0 \cdot \infty$, $\frac{0}{0}$, $\frac{\infty}{\infty}$). We'll see later how l'Hôpital's rule may be applied to such cases.
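Examples 1.27. 1. Since $f(x) = \frac{x^2 + 5}{3x^2 - 2}$ is a rational function (continuous at all points of its domain), we quickly conclude that
\[ \lim_{x \to 2} \frac{x^2 + 5}{3x^2 - 2} = f(2) = \frac{9}{10} \]
Alternatively, we may tediously invoke the other parts of the theorem:
\[ \lim_{x \to 2} \frac{x^2 + 5}{3x^2 - 2} \overset{(3)}{=} \frac{\lim(x^2 + 5)}{\lim(3x^2 - 2)} \overset{(1)}{=} \frac{\lim x^2 + \lim 5}{\lim 3x^2 - \lim 2} \overset{(2)}{=} \frac{(\lim x)^2 + 5}{(\lim 3)(\lim x)^2 - 2} = \frac{2^2 + 5}{3 \cdot 2^2 - 2} = \frac{9}{10} \]
2. As $x \to \infty$, the simplistic approach results in a nonsense indeterminate form:
\[ \lim_{x \to \infty} \frac{x^2 + 5}{3x^2 - 2} \overset{?}{=} \frac{\lim(x^2 + 5)}{\lim(3x^2 - 2)} \overset{?}{=} \frac{\infty}{\infty} \]
However, a little pre-theorem algebra quickly yields⁴
\[ \lim_{x \to \infty} \frac{x^2 + 5}{3x^2 - 2} = \lim_{x \to \infty} \frac{1 + 5x^{-2}}{3 - 2x^{-2}} = \frac{\lim(1 + 5x^{-2})}{\lim(3 - 2x^{-2})} = \frac{1}{3} \]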
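Both limits in Examples 1.27 can be checked with exact rational arithmetic, which avoids any rounding questions. The sketch below is an illustration only, and the list name `gaps` is a hypothetical choice; it evaluates $f$ at $x = 2$ and then at $x = 10, 10^2, \ldots, 10^6$.

```python
from fractions import Fraction


def f(x):
    """The rational function of Examples 1.27; exact when x is a Fraction."""
    return (x * x + 5) / (3 * x * x - 2)


print(f(Fraction(2)))  # the value at x = 2

# Distance from 1/3 at x = 10, 100, ..., 10**6.
gaps = [abs(f(Fraction(10) ** k) - Fraction(1, 3)) for k in range(1, 7)]
for k, gap in enumerate(gaps, start=1):
    print(k, gap, float(gap))
```

A short calculation gives $f(x) - \frac{1}{3} = \frac{17}{3(3x^2 - 2)}$, so the printed gaps shrink like $x^{-2}$, consistent with the limit $\frac{1}{3}$.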
⁴ Be careful! The expressions $\frac{x^2 + 5}{3x^2 - 2}$ and $\frac{1 + 5x^{-2}}{3 - 2x^{-2}}$ do not describe the same function, yet their limits at $\infty$ are equal. The ease of equating these limits is one of the advantages of the $S$ formulation in Definition 1.24. Think about why; what is a suitable set $S$ in this context?
Classification of Discontinuities
We now consider the ways in which a function can fail to be continuous.
Definition 1.28. Suppose that a function is continuous on an interval except at finitely many values:
we call these isolated discontinuities.
Examples 1.29. 1. $f(x) = \frac{1}{x}$ has a discontinuity at $x = 0$ since it is continuous on the interval $\mathbb{R}$, except at one point $x = 0$. Note that a function need not be defined at a discontinuity!
2. $f(x) = \frac{1}{\sin\frac{1}{x}}$ has a non-isolated discontinuity at $x = 0$: on any interval containing zero, $f$ has infinitely many discontinuities, namely every point $x = \frac{1}{\pi n}$ ($n$ a non-zero integer) that lies in the interval.
The next result helps us classify isolated discontinuities.
Theorem 1.30. Let $f$ be defined on a punctured neighborhood of $a \in \mathbb{R}$. Then
\[ \lim_{x \to a} f(x) = L \iff \lim_{x \to a^+} f(x) = L = \lim_{x \to a^-} f(x) \]
Proof. ($\Rightarrow$) Let $S = (c, a) \cup (a, b)$ satisfy the definition for $\lim_{x \to a} f(x) = L$. Since any sequence (say) in $S^+$ is also in $S$, plainly $S^+ = (a, b)$ and $S^- = (c, a)$ satisfy the one-sided definitions.
($\Leftarrow$) Suppose $S^- = (c, a)$ and $S^+ = (a, b)$ satisfy the one-sided definitions and denote $S = S^- \cup S^+$. Let $(x_n) \subseteq S$ be such that $x_n \to a$. Clearly $(x_n)$ is the disjoint union of two subsequences $(x_n) \cap S^+$ and $(x_n) \cap S^-$, both of which⁵ converge to $a$. There are three cases:
$L$ finite: Let $\epsilon > 0$ be given. Because of the one-sided limits,
\[ \exists N_1 \text{ such that } n > N_1 \text{ and } x_n > a \implies |f(x_n) - L| < \epsilon \]
\[ \exists N_2 \text{ such that } n > N_2 \text{ and } x_n < a \implies |f(x_n) - L| < \epsilon \]
Now let $N = \max(N_1, N_2)$ in the definition of limit to see that $\lim f(x_n) = L$. Since this holds for all sequences $(x_n) \subseteq S$ converging to $a$, we conclude that $\lim_{x \to a} f(x) = L$.
$L = \pm\infty$: This is an exercise.
Example 1.31. Recalling elementary calculus, we show that the following is continuous at $x = 1$:
\[ f(x) = \begin{cases} x^2 - 3 & \text{if } x \ge 1 \\ 3 - 5x & \text{if } x < 1 \end{cases} \]
Step 1: Compute the left- and right-handed limits and check that these are equal:
\[ \lim_{x \to 1^-} f(x) = \lim_{x \to 1^-} (3 - 5x) = -2, \qquad \lim_{x \to 1^+} f(x) = \lim_{x \to 1^+} (x^2 - 3) = -2 \]
Step 2: Check that the value of the limits equals that of the function: $f(1) = 1^2 - 3 = -2$.
⁵ It is possible for one of these subsequences to be finite; say if $x_n > a$ for all large $n$. This is of no concern; one of the $\epsilon$-$N$ conditions would be empty and thus vacuously true.
Recalling $(\ast)$ on page 13, we describe the different types of isolated discontinuity at some point $a$.
Removable discontinuity: The two-sided limit $\lim_{x \to a} f(x) = L$ is finite, and either $f(a) \ne L$ or $f(a)$ is undefined. The term comes from the fact that we can remove the discontinuity by changing the behavior of $f$ only at $x = a$:
\[ \tilde f(x) := \begin{cases} f(x) & \text{if } x \ne a \\ \lim_{x \to a} f(x) & \text{if } x = a \end{cases} \]
is now continuous at $x = a$. In the pictures,
\[ f_1(x) = \frac{x^2 - 9}{x - 3} \quad\text{and}\quad f_2(x) = \begin{cases} x \sin(\frac{1}{x}) & \text{if } x \ne 0 \\ 1 & \text{if } x = 0 \end{cases} \]
have removable discontinuities at $x = 3$ and $0$ respectively.
[Figure: graphs of $f_1(x)$ and $f_2(x)$.]
Jump discontinuity: The one-sided limits are finite but not equal. A jump discontinuity cannot be removed by changing or inserting a value at $x = a$. The picture shows
\[ g(x) = \frac{|x|}{x} = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases} \]
with a jump discontinuity at $x = 0$.
[Figure: graph of $g(x)$.]
Infinite discontinuity: The one-sided limits exist but at least one is infinite. We call the line $x = a$ a vertical asymptote. The picture shows
\[ h(x) = \frac{1}{x^2} \]
with an infinite discontinuity at $x = 0$. The fact that the one-sided limits of $h$ are equal (and infinite) is irrelevant.
[Figure: graph of $h(x)$.]
Essential discontinuity: At least one of the one-sided limits does not exist. The picture shows $j(x) = \sin\frac{1}{x}$ for which neither of the limits $\lim_{x \to 0^\pm} j(x)$ exists.
[Figure: graph of $j(x)$.]
It is also reasonable to refer to removable, infinite or essential discontinuities at interval endpoints.
Exercises 1.20. Key concepts: $\lim_{x \to a} f(x) = L$, $\epsilon$, $\delta$, $M$, $N$ versions, Limit Laws, Discontinuities
1. Given $f(x) = \frac{x^3}{|x|}$, find $\lim_{x \to \infty} f(x)$, $\lim_{x \to -\infty} f(x)$, $\lim_{x \to 0^-} f(x)$, $\lim_{x \to 0^+} f(x)$ and $\lim_{x \to 0} f(x)$, if they exist.
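2. Evaluate the following limits using the methods of this section
(a) $\lim_{x \to a} \frac{\sqrt{x} - \sqrt{a}}{x - a}$ (b) $\lim_{x \to a} \frac{x^{3/2} - a^{3/2}}{x - a}$
(c) $\lim_{x \to 0} \frac{\sqrt{1 + 3x^2} - 1}{x^2}$ (d) $\lim_{x \to -\infty} \frac{\sqrt{4 + 3x^2} - 2}{x}$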
3. Suppose that the limits $L = \lim_{x \to a^+} f(x)$ and $M = \lim_{x \to a^+} g(x)$ exist.
(a) Suppose $f(x) \le g(x)$ for all $x$ in some interval $(a, b)$. Prove that $L \le M$.
(b) Do we have the same conclusion if we have $f(x) < g(x)$ on $(a, b)$, or can we conclude that $L < M$? Prove your assertion, or give a counter-example.
4. Suppose that $\lim_{x \to \infty} f(x) = \lim_{x \to \infty} g(x) = \infty$. Using only this information, which of the following can you always evaluate? Prove your assertions in each case.
(a) $\lim_{x \to \infty} (f + g)(x)$ (b) $\lim_{x \to \infty} (f - g)(x)$ (c) $\lim_{x \to \infty} (fg)(x)$ (d) $\lim_{x \to \infty} (f/g)(x)$
5. Complete the proof of Theorem 1.30 by considering the $L = \pm\infty$ cases.
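6. Graph $f : \mathbb{R} \to \mathbb{R}$, find and identify the types of its discontinuities.
\[ f(x) = \begin{cases} 0 & x = 0, \pm 1 \\ \frac{x}{|x|} & 0 < |x| < 1 \\ x^2 & |x| > 1 \end{cases} \]
7. Find the discontinuities and identify their types for the following function
\[ f(x) = \begin{cases} \frac{1}{x} \sin\frac{1}{x} & \text{if } x < 0 \text{ or } x > 1 \\ \frac{1}{x} & \text{if } 0 < x \le 1 \end{cases} \]
8. Verify the claim following Definition 1.24: $\lim_{x \to a} f(x) = L$ if and only if
\[ \forall \epsilon > 0,\ \exists \delta > 0 \text{ such that } 0 < |x - a| < \delta \implies |f(x) - L| < \epsilon \]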
9. Recall Exercise 1.17.6, where we saw that a function $f : U \to \mathbb{R}$ is continuous at any isolated point $a \in U$.
(a) Any function with domain $\mathrm{dom}(f) = \mathbb{Z}$ is continuous everywhere! Explain why we cannot define any limits $\lim_{x \to a^{(\pm)}} f(x)$ for such a function.
(Hint: Being unable to define a limit is different from saying $\lim f(x) = \text{DNE}$: see page 13.)
(b) Suppose $g(x) = x^2 h(x)$ has $\mathrm{dom}(g) = \{0\} \cup \{\frac{1}{n} : n \in \mathbb{Z}\}$, where $h$ is any function taking values in the interval $[-1, 1]$. Explain why $g$ is continuous at every point of its domain.
(These awkward examples of continuity can be avoided if we follow our usual approach where a domain is a union of intervals of positive length. This restriction is essentially baked into Definition 1.24.)
2 Sequences and Series of Functions
If $(f_n)$ is a sequence of functions, what should we mean by $\lim f_n$? This question is of huge relevance to the history of calculus: Isaac Newton's work in the late 1600s made great use of power series, which are naturally constructed as limits of sequences of polynomials.
For instance, for each $n \in \mathbb{N}_0$, we might consider the polynomial function $f_n : \mathbb{R} \to \mathbb{R}$ defined by
\[ f_n(x) = \sum_{k=0}^n x^k = 1 + x + \cdots + x^n \]
This is easily differentiated and integrated using the power law. What, however, are we to make of the series
\[ f(x) := \sum_{n=0}^\infty x^n = 1 + x + x^2 + \cdots \ ? \]
Does this make sense as a function? What is its domain? Does it equal the limit of the sequence $(f_n)$ in any meaningful way? Is it continuous, differentiable, integrable? If so, can we compute its derivative or integral term-by-term: for instance, is it legitimate to write
\[ f'(x) = \sum_{n=1}^\infty n x^{n-1} = 1 + 2x + 3x^2 + \cdots \ ? \]
To many in Newton's time, such technical questions were less important than the application of calculus to the natural sciences. For the 18th and 19th century mathematicians who followed, however, the widespread application of calculus only increased the imperative to rigorously address these issues.
2.23 Power Series
First we review some of the important definitions, examples and results concerning infinite series.
Definition 2.1. Let $(b_n)_{n=m}^\infty$ be a sequence of real numbers. The (infinite) series $\sum b_n$ is the limit of the sequence $(s_n)$ of partial sums,
\[ s_n = \sum_{k=m}^n b_k = b_m + b_{m+1} + \cdots + b_n, \qquad \sum_{n=m}^\infty b_n = \lim_{n \to \infty} s_n \]
The series $\sum b_n$ is said to converge, diverge to infinity or diverge by oscillation$^6$ as does $(s_n)$.
$\sum b_n$ is absolutely convergent if $\sum |b_n|$ converges. A convergent series that is not absolutely convergent is conditionally convergent.
$^6$ Recall that every sequence $(s_n)$ has subsequences tending to each of
\[ \limsup s_n = \lim_{N \to \infty} \sup\{s_n : n > N\} \quad \text{and} \quad \liminf s_n = \lim_{N \to \infty} \inf\{s_n : n > N\} \]
If $(s_n)$ converges, or diverges to $\pm\infty$, then $\lim s_n = \limsup s_n = \liminf s_n$. The remaining case, divergence by oscillation, is when $\liminf s_n \neq \limsup s_n$: there exist (at least) two subsequences tending to different limits.
Examples 2.2. These examples form the standard reference dictionary for analysis of more complicated series. Make sure they are familiar!$^7$
1. (Geometric series) If $r \neq 1$ is constant, then $s_n = \sum_{k=0}^n r^k = \frac{1 - r^{n+1}}{1 - r}$. It follows that $\sum_{n=0}^\infty r^n$
• converges (absolutely) to $\frac{1}{1 - r}$ if $-1 < r < 1$
• diverges to $\infty$ if $r \ge 1$
• diverges by oscillation if $r \le -1$
2. (Telescoping series) If $b_n = \frac{1}{n(n+1)}$, then $s_n = \sum_{k=1}^n b_k = 1 - \frac{1}{n+1} \implies \sum_{n=1}^\infty \frac{1}{n(n+1)} = 1$.
3. $\sum_{n=1}^\infty \frac{1}{n^2}$ is (absolutely) convergent. In fact $\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}$, though checking this explicitly is tricky.
4. (Harmonic series) $\sum_{n=1}^\infty \frac{1}{n}$ is divergent to $\infty$.
5. (Alternating harmonic series) $\sum_{n=1}^\infty \frac{(-1)^n}{n}$ is conditionally convergent.
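The closed forms for the geometric and telescoping partial sums can be sanity-checked with exact rational arithmetic. The following is only an illustrative sketch: the function names are hypothetical and not part of the notes.

```python
from fractions import Fraction

def geometric_partial_sum(r, n):
    """Directly add 1 + r + ... + r^n."""
    return sum(r**k for k in range(n + 1))

def geometric_closed_form(r, n):
    """Closed form (1 - r^(n+1)) / (1 - r), valid only for r != 1."""
    return (1 - r**(n + 1)) / (1 - r)

def telescoping_partial_sum(n):
    """s_n = sum_{k=1}^n 1/(k(k+1)), computed exactly."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

r = Fraction(1, 2)
for n in (1, 5, 10):
    # Both formulas from Examples 2.2 agree exactly for these n.
    assert geometric_partial_sum(r, n) == geometric_closed_form(r, n)
    assert telescoping_partial_sum(n) == 1 - Fraction(1, n + 1)

print(float(geometric_partial_sum(r, 30)))   # close to 1/(1 - r) = 2
print(float(telescoping_partial_sum(1000)))  # close to 1
```

A finite check of this kind proves nothing about the limits, but it is a quick way to catch an index error in a partial-sum formula.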
Theorem 2.3 (Root Test). Given a series $\sum b_n$, let $\beta = \limsup |b_n|^{1/n}$.
• If $\beta < 1$ then the series converges absolutely.
• If $\beta > 1$ then the series diverges.
$^7$ We give sketch proofs or refer to a standard 'test.' Review these if you are unfamiliar.
1. $s_n - r s_n = 1 + r + \cdots + r^n - (r + \cdots + r^n + r^{n+1}) = 1 - r^{n+1} \implies s_n = \frac{1 - r^{n+1}}{1 - r}$.
2. By partial fractions, $b_n = \frac{1}{n} - \frac{1}{n+1} \implies s_n = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n} - \frac{1}{n+1}\right) = 1 - \frac{1}{n+1}$.
3. Use the comparison or integral tests. Alternatively: For each $n \ge 2$, we have $\frac{1}{n^2} < \frac{1}{n(n-1)}$. By part 2,
\[ s_n = \sum_{k=1}^n \frac{1}{k^2} < 1 + \sum_{k=2}^n \frac{1}{k(k-1)} < 1 + \sum_{n=2}^\infty \frac{1}{n(n-1)} = 2 \]
Since $(s_n)$ is monotone-up and bounded above by 2, we conclude that $\sum \frac{1}{n^2}$ is convergent.
4. Use the integral test. Alternatively, observe that
\[ s_{2^{n+1}} - s_{2^n} = \sum_{k=2^n+1}^{2^{n+1}} \frac{1}{k} \ge \frac{2^n}{2^{n+1}} = \frac{1}{2} \implies s_{2^n} \ge \frac{n}{2} \xrightarrow[n \to \infty]{} \infty \]
Since $s_n = \sum_{k=1}^n \frac{1}{k}$ is monotone-up, we conclude that $s_n \to \infty$.
5. Use the alternating series test, or explicitly check that both the even and odd partial sums $(s_{2n})$ and $(s_{2n+1})$ are convergent (monotone and bounded) to the same limit (essentially the proof of the alternating series test).
Root Test: $\beta < 1 \implies \exists \epsilon > 0$ such that $|b_n|^{1/n} \le 1 - \epsilon$ (for large $n$) $\implies \sum |b_n|$ converges by comparison with $\sum (1 - \epsilon)^n$.
$\beta > 1 \implies$ some subsequence of $(|b_n|^{1/n})$ converges to $\beta > 1 \implies b_n \not\to 0 \implies \sum b_n$ diverges ($n$th-term test).
The root test is inconclusive if $\beta = 1$. Some simple inequalities$^8$ yield a test that is often easier to apply.
Corollary 2.4 (Ratio Test). Given a series $\sum b_n$:
• If $\limsup \left|\frac{b_{n+1}}{b_n}\right| < 1$ then $\sum b_n$ converges absolutely.
• If $\liminf \left|\frac{b_{n+1}}{b_n}\right| > 1$ then $\sum b_n$ diverges.
We are now ready to properly define and analyze our main objects of interest.
Definition 2.5. A power series centered at $c \in \mathbb{R}$ with coefficients $a_n \in \mathbb{R}$ is a formal expression
\[ \sum_{n=m}^\infty a_n (x - c)^n \]
where $x \in \mathbb{R}$ is considered a variable. A power series is a function whose implied domain is the set of $x$ for which the resulting infinite series converges.
It is common to refer simply to a series, and modify by infinite/power only when clarity requires.
Almost always m = 0 or 1, and it is common for examples to be centered at c = 0.
Example 2.6. By the geometric series formula,
\[ \sum_{n=0}^\infty \frac{(-1)^n}{2^n} (x - 4)^n = \frac{1}{1 + \frac{x - 4}{2}} = \frac{2}{x - 2} \quad \text{whenever} \quad \left| -\frac{x - 4}{2} \right| < 1 \iff 2 < x < 6 \]
The series is valid (converges) only on the subinterval $(2, 6)$ of the implied domain of the function $x \mapsto \frac{2}{x - 2}$.
The behavior as $x \to 2^+$ is unsurprising, since evaluating the power series results in the divergent infinite series $\sum 1 = +\infty$.
By contrast, as $x \to 6^-$ we see that limits and infinite series do not interact as we might expect,
\[ \lim_{x \to 6^-} \sum_{n=0}^\infty \frac{(-1)^n}{2^n} (x - 4)^n = \lim_{x \to 6^-} \frac{2}{x - 2} = \frac{1}{2} \]
\[ \sum_{n=0}^\infty \lim_{x \to 6^-} \frac{(-1)^n}{2^n} (x - 4)^n = \sum (-1)^n = \text{DNE} \]
with the last series being divergent by oscillation.
The example shows that we cannot blindly take limits inside an infinite sum; understanding precisely
when this is possible is one of our primary goals.
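A short numerical sketch of Example 2.6 follows. The function names and the sample point $x = 5$ are illustrative choices, not part of the notes: inside $(2, 6)$ the partial sums settle on $\frac{2}{x-2}$, while at the endpoint $x = 6$ they oscillate.

```python
def partial_sum(x, terms):
    """Sum of the first `terms` terms of sum_{n>=0} (-1)^n / 2^n * (x - 4)^n."""
    ratio = -(x - 4) / 2          # the series is geometric with this ratio
    return sum(ratio**n for n in range(terms))

def closed_form(x):
    """The function 2/(x - 2) that the series represents on (2, 6)."""
    return 2 / (x - 2)

inside = 5.0                      # a point of the interval (2, 6)
gap = abs(partial_sum(inside, 60) - closed_form(inside))

# At the endpoint x = 6 the ratio is -1, so the partial sums alternate 1, 0, 1, 0, ...
endpoint_sums = [partial_sum(6.0, n) for n in range(1, 7)]
print(gap, endpoint_sums)
```

The closed form still takes the value $\frac{1}{2}$ at $x = 6$, even though the series itself has no sum there.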
$^8$ You should have encountered these previously:
\[ \liminf \left|\frac{b_{n+1}}{b_n}\right| \le \liminf |b_n|^{1/n} \le \limsup |b_n|^{1/n} \le \limsup \left|\frac{b_{n+1}}{b_n}\right| \]
Radius and Interval of Convergence
The implied domain of the series in Example 2.6 turned out to be an interval $(2, 6)$. Somewhat amazingly, the root test (Theorem 2.3) shows that the same is true for every power series!
Theorem 2.7 (Root Test for Power Series). Given a power series $\sum a_n (x - c)^n$, define$^9$
\[ R = \frac{1}{\limsup |a_n|^{1/n}} \]
Then precisely one of the following statements holds:
• $R \in (0, \infty)$: the series converges absolutely when $|x - c| < R$ and diverges when $|x - c| > R$
• $R = \infty$: the series converges absolutely for all $x \in \mathbb{R}$
• $R = 0$: the series converges only at the center $x = c$
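Proof. For each fixed $x \in \mathbb{R}$, let $b_n = a_n (x - c)^n$ and apply the root test to $\sum b_n$, noting that
\[ \limsup |b_n|^{1/n} = \left( \limsup |a_n|^{1/n} \right) |x - c| = \begin{cases} \frac{1}{R} |x - c| & \text{if } R \in (0, \infty) \\ 0 & \text{if } R = \infty \text{ or } x = c \\ \infty & \text{if } R = 0 \text{ and } x \neq c \end{cases} \]
In the first situation, $\limsup |b_n|^{1/n} < 1 \iff |x - c| < R$, etc.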
Definition 2.8. The radius of convergence is the value $R$ defined in Theorem 2.7. The interval of convergence is the set of $x \in \mathbb{R}$ for which the series converges; its implied domain.
Radius of convergence     Interval of convergence
$R \in (0, \infty)$       $(c - R, c + R)$, $(c - R, c + R]$, $[c - R, c + R)$ or $[c - R, c + R]$
$R = \infty$              $(-\infty, \infty)$
$R = 0$                   $\{c\}$
In the first case, convergence/divergence at the endpoints of the interval of convergence must be tested for separately.
The ratio test (Corollary 2.4) provides a more user-friendly version.
Corollary 2.9 (Ratio Test for Power Series). If the limit exists, $R = \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right|$.
The ratio test is weaker than the root test: as Example 2.10.5 shows, there exist series for which the ratio test is inconclusive.
$^9$ Since $|a_n| \ge 0$, we here adopt the conventions $\frac{1}{0} = \infty$, $\frac{1}{\infty} = 0$. With similar caveats, one can write $R = \liminf |a_n|^{-1/n}$.
Since every sequence has a limit superior, this really is a definition. Whether one can easily compute $R$ is another matter...
Examples 2.10. 1. The series $\sum_{n=1}^\infty \frac{1}{n} x^n$ is centered at 0. The ratio test tells us that
\[ R = \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right| = \lim_{n \to \infty} \frac{1/n}{1/(n + 1)} = \lim_{n \to \infty} \frac{n + 1}{n} = 1 \]
Test the endpoints of the interval of convergence separately:
$x = 1$: $\sum \frac{1}{n} = \infty$ diverges
$x = -1$: $\sum \frac{(-1)^n}{n}$ converges (conditionally)
We conclude that the interval of convergence is $[-1, 1)$.
It can be seen (later) that the series converges to $-\ln(1 - x)$ on its interval of convergence. As in Example 2.6, this function has a larger domain, $(-\infty, 1)$, than that of the series.
[Figure: graphs of $y = \sum_{n=1}^\infty \frac{1}{n} x^n$ and $y = -\ln(1 - x)$]
2. The series $\sum_{n=1}^\infty \frac{1}{n^2} x^n$ similarly has
\[ R = \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right| = \lim_{n \to \infty} \frac{(n + 1)^2}{n^2} = 1 \]
Since $\sum \frac{1}{n^2}$ is absolutely convergent, we conclude that the power series also converges absolutely at $x = \pm 1$; the interval of convergence is $[-1, 1]$.
3. The series $\sum_{n=0}^\infty \frac{1}{n!} x^n$ converges absolutely for all $x \in \mathbb{R}$, since
\[ R = \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right| = \lim_{n \to \infty} \frac{(n + 1)!}{n!} = \lim_{n \to \infty} (n + 1) = \infty \]
You should recall from elementary calculus that this series converges to the natural exponential function $\exp(x) = e^x$ everywhere on $\mathbb{R}$; indeed this is one of the common definitions of the exponential function.
4. The series $\sum_{n=0}^\infty n! x^n$ has $R = \lim \frac{n!}{(n+1)!} = 0$. It therefore converges only at its center $x = 0$.
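5. Let $a_n = \left(\frac{2}{3}\right)^n$ if $n$ is even and $\left(\frac{3}{2}\right)^n$ if $n$ is odd. If we try to apply the ratio test to the series $\sum_{n=0}^\infty a_n x^n$, we see that
\[ \left| \frac{a_n}{a_{n+1}} \right| = \begin{cases} \left(\frac{2}{3}\right)^{2n+1} & \text{if } n \text{ even} \\ \left(\frac{3}{2}\right)^{2n+1} & \text{if } n \text{ odd} \end{cases} \implies \limsup \left| \frac{a_n}{a_{n+1}} \right| = \infty \neq 0 = \liminf \left| \frac{a_n}{a_{n+1}} \right| \]
The ratio test is therefore inconclusive. However, by the root test,
\[ |a_n|^{1/n} = \begin{cases} \frac{2}{3} & \text{if } n \text{ even} \\ \frac{3}{2} & \text{if } n \text{ odd} \end{cases} \implies R = \frac{1}{\limsup |a_n|^{1/n}} = \frac{1}{3/2} = \frac{2}{3} \]
It is easy to check that the series diverges at $x = \pm\frac{2}{3}$; the interval of convergence is $\left(-\frac{2}{3}, \frac{2}{3}\right)$.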
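The contrast between the two tests in item 5 can be seen numerically. This is a rough sketch with hypothetical helper names; taking the maximum over a small sample of $n$ is only a stand-in for the limit superior, which happens to work here because the roots take just two values.

```python
from fractions import Fraction

def a(n):
    """Coefficients of Example 2.10.5: (2/3)^n for even n, (3/2)^n for odd n."""
    base = Fraction(2, 3) if n % 2 == 0 else Fraction(3, 2)
    return base**n

def ratio(n):
    """The quotient a_n / a_{n+1} used by the ratio test (exact)."""
    return a(n) / a(n + 1)

def nth_root(n):
    """|a_n|^(1/n) as a float, as used by the root test."""
    return float(a(n)) ** (1.0 / n)

even_ratios = [float(ratio(n)) for n in (2, 10, 20)]   # shrink towards 0
odd_ratios = [float(ratio(n)) for n in (1, 11, 21)]    # grow without bound
roots = [nth_root(n) for n in (10, 11, 20, 21)]        # alternate near 2/3 and 3/2
radius = 1 / max(roots)                                # approximately 2/3
print(even_ratios, odd_ratios, roots, radius)
```

The ratios have no limit, so Corollary 2.9 says nothing, whereas the sampled roots already display the value $\frac{3}{2}$ that determines $R$.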
With the help of the root test the domain of a power series is fully understood. Limits, continuity,
differentiability and integrability are more delicate. We will return to these once we’ve developed
some of the ideas around convergence for sequences of functions.
Exercises 2.23. Key concepts: Power Series, Radius/interval of convergence $R = \frac{1}{\limsup |a_n|^{1/n}}$
1. For each power series, find the radius and interval of convergence:
(a) $\sum \frac{(-1)^n}{n^2 4^n} x^n$
(b) $\sum \frac{(n + 1)^2}{n^3} (x - 3)^n$
(c) $\sum n x^n$
(d) $\sum \frac{1}{n^n} (x + 7)^n$
(e) $\sum (x - \pi)^{n!}$
(f) $\sum \frac{3^n}{n} x^{2n+1}$
2. For each $n \in \mathbb{N}$ let $a_n = \left( \frac{4 + 2(-1)^n}{5} \right)^n$
(a) Find $\limsup |a_n|^{1/n}$, $\liminf |a_n|^{1/n}$, $\limsup \left| \frac{a_{n+1}}{a_n} \right|$ and $\liminf \left| \frac{a_{n+1}}{a_n} \right|$.
(b) Does the series $\sum a_n$ converge? What about $\sum (-1)^n a_n$? Why?
(c) Find the interval of convergence of the power series $\sum a_n x^n$.
3. Suppose that $\sum a_n x^n$ has radius of convergence $R$. If $\limsup |a_n| > 0$, prove that $R \le 1$.
4. On the interval $\left(-\frac{2}{3}, \frac{2}{3}\right)$, express the series in Example 2.10.5 as a simple function.
(Hints: Use geometric series formulæ and the fact that the value of an absolutely convergent series is independent of rearrangements)
5. Consider the power series
\[ \sum_{n=1}^\infty \frac{1}{3^n n} (x - 7)^{5n+1} = \frac{1}{3} (x - 7)^6 + \frac{1}{18} (x - 7)^{11} + \frac{1}{81} (x - 7)^{16} + \cdots \]
Since only one in five of the terms is non-zero, it is a little tricky to analyze using a naïve application of our standard tests.
(a) Explain why the ratio test for power series (Corollary 2.9) does not apply.
(b) Writing the series as $\sum a_m (x - 7)^m$, observe that
\[ a_m = \begin{cases} \frac{5}{3^{\frac{m-1}{5}} (m - 1)} & \text{if } m \equiv 1 \bmod 5 \text{ and } m \ge 6 \\ 0 & \text{otherwise} \end{cases} \]
Use the root test (Theorem 2.7) and your understanding of elementary limits to directly compute the radius of convergence.
(c) Alternatively, write $\sum \frac{1}{3^n n} (x - 7)^{5n+1} = \sum b_n$. Apply the ratio test for infinite series (Corollary 2.4): what do you observe? Use your observation to compute the radius of convergence of the original series in a simpler manner than part (b).
(d) Finally, check the endpoints to determine the interval of convergence.
2.24 Uniform Convergence
In this section we consider sequences $(f_n)$ of functions $f_n : U \to \mathbb{R}$ and their limits.
Example 2.11. For each $n \in \mathbb{N}$, define $f_n : (0, 1) \to \mathbb{R} : x \mapsto x^n$. Several examples are graphed.
[Figure: graphs of $f_1(x)$, $f_2(x)$, $f_5(x)$ and $f_{50}(x)$ against $x$ on $(0, 1)$]
There are several useful notions of convergence for sequences of functions. The simplest is where, for each input $x$, $(f_n(x))$ is treated as a distinct sequence of real numbers.
Definition 2.12. Suppose a function $f$ and a sequence of functions $(f_n)$ are given, all with domain $U$. We say that $(f_n)$ converges pointwise to $f$ on $U$ if,
\[ \forall x \in U, \ \lim_{n \to \infty} f_n(x) = f(x) \]
It is common to write '$f_n \to f$ pointwise.' For reference, here are two equivalent rephrasings:
1. $\forall x \in U$, $\lim_{n \to \infty} |f_n(x) - f(x)| = 0$;
2. $\forall x \in U$, $\forall \epsilon > 0$, $\exists N$ such that $n > N \implies |f_n(x) - f(x)| < \epsilon$.
As we'll see shortly, the relative position of the quantifiers ($\forall x$, $\exists N$) is crucial: in this definition, the value of $N$ is permitted to depend on $x$ as well as $\epsilon$.
Example (2.11, mk. II). The sequence $(f_n)$ converges pointwise on the domain $U = (0, 1)$ to
\[ f : (0, 1) \to \mathbb{R} : x \mapsto 0 \]
As a sanity check, we prove this explicitly. First observe that
\[ |f_n(x) - f(x)| = x^n \]
Suppose $x \in (0, 1)$, that $\epsilon > 0$ is given, and let $N = \frac{\ln \epsilon}{\ln x}$. Then
\[ n > N \implies n \ln x < \ln \epsilon \implies x^n < \epsilon \]
where the inequality switches sign since $\ln x < 0$.
[Figure: graphs of several $f_n(x)$ against $x$ on $(0, 1)$]
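The dependence of $N$ on $x$ in this argument can be made concrete. The sketch below uses a hypothetical helper `threshold` and evaluates $N = \frac{\ln \epsilon}{\ln x}$ for a fixed $\epsilon$ and several $x$ approaching 1; the sample values are arbitrary.

```python
import math

def threshold(x, eps):
    """N = ln(eps) / ln(x) from the example, for 0 < x < 1 and 0 < eps < 1."""
    return math.log(eps) / math.log(x)

eps = 0.01
for x in (0.5, 0.9, 0.99, 0.999):
    N = threshold(x, eps)
    n = math.floor(N) + 1            # smallest integer strictly greater than N
    assert x**n < eps                # the conclusion of the proof
    print(f"x = {x}: N = {N:.1f}")
```

The printed thresholds grow without bound as $x$ approaches 1, so no single $N$ serves every $x \in (0, 1)$ at once.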
The example is nice in that a sequence of continuous functions converges pointwise to a continuous
function. Unfortunately, this desirable situation is not universal. . .
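Example (2.11, mk. III). Define
\[ g_n : (0, 1] \to \mathbb{R} : x \mapsto x^n \]
Each $g_n$ is a continuous function, however its pointwise limit
\[ g(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 & \text{if } x = 1 \end{cases} \]
has a jump discontinuity at $x = 1$.
[Figure: graphs of several $g_n(x)$ against $x$ on $(0, 1]$]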
We’d like the limit of a sequence of continuous functions to itself be continuous. With this goal in
mind, we make a tighter definition.
Definition 2.13. $(f_n)$ converges uniformly to $f$ on $U$ if either
1. $\sup_{x \in U} |f_n(x) - f(x)| \xrightarrow[n \to \infty]{} 0$, or,
2. $\forall \epsilon > 0$, $\exists N$ such that $\forall x \in U$, $n > N \implies |f_n(x) - f(x)| < \epsilon$
A common notation is $f_n \rightrightarrows f$, though we won't use it.
[Figure: the graph of $f_n(x)$ inside a band of width $2\epsilon$ around the graph of $f(x)$]
As pictured, whenever $n > N$, the graph of $f_n(x)$ must lie between $f(x) \pm \epsilon$.
We'll show that statements 1 and 2 are equivalent momentarily. For the present, compare with the corresponding statements for pointwise convergence:
• As with continuity versus uniform continuity, the distinction comes in the order of the quantifiers: in uniform convergence, $x$ is quantified after $N$ and so the same $N$ works for all locations $x$.
• Uniform convergence implies pointwise convergence.
Example (2.11, mk. IV). For the final time we revisit our main example. If $f_n(x) = x^n$ and $f(x) = 0$ are defined on $U = (0, 1)$, then $f_n \to f$ non-uniformly. We show this using both criteria.
1. For every $n$,
\[ \sup_{x \in (0,1)} |f_n(x) - f(x)| = \sup\{x^n : 0 < x < 1\} = 1 \not\to 0 \]
which plainly fails to converge to zero.
2. Suppose the convergence were uniform and let $\epsilon = \frac{1}{2}$. Then
\[ \exists N \in \mathbb{N} \text{ such that } \forall x \in (0, 1), \ n > N \implies x^n < \frac{1}{2} \]
Since $N \in \mathbb{N}$, a simple choice results in a contradiction;
\[ x = \left(\frac{1}{2}\right)^{\frac{1}{N+1}} \in (0, 1) \implies x^{N+1} = \frac{1}{2} \]
[Figure: plot against $x$ on $(0, 1)$ with $\epsilon$ marked]
Theorem 2.14. The criteria for uniform convergence in Definition 2.13 are equivalent.
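Proof. ($1 \Rightarrow 2$) This follows immediately from the fact that
\[ \forall x \in U, \ |f_n(x) - f(x)| \le \sup_{x \in U} |f_n(x) - f(x)| \]
($2 \Rightarrow 1$) Suppose $\epsilon > 0$ is given. Then
\[ \exists N \in \mathbb{R} \text{ such that } \forall x \in U, \ n > N \implies |f_n(x) - f(x)| < \frac{\epsilon}{2} \]
But then
\[ n > N \implies \sup_{x \in U} |f_n(x) - f(x)| \le \frac{\epsilon}{2} < \epsilon \]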
Amazingly, this subtle change of definition is enough to preserve continuity.
Theorem 2.15. Suppose $(f_n)$ is a sequence of continuous functions. If $f_n \to f$ uniformly, then $f$ is continuous.
Proof. We demonstrate the continuity of $f$ at $a \in U$. Let $\epsilon > 0$ be given.
• Since $f_n \to f$ uniformly,
\[ \exists N \text{ such that } \forall x \in U, \ n > N \implies |f(x) - f_n(x)| < \frac{\epsilon}{3} \]
• Choose any $n > N$. Since $f_n$ is continuous at $a$,
\[ \exists \delta > 0 \text{ such that } |x - a| < \delta \implies |f_n(x) - f_n(a)| < \frac{\epsilon}{3} \qquad (\dagger) \]
Put these together with the triangle inequality to see that
\[ |x - a| < \delta \implies |f(x) - f(a)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(a)| + |f_n(a) - f(a)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon \]
We need not have fixed $a$ at the start of the proof. Rewriting ($\dagger$) to become
\[ \exists \delta > 0 \text{ such that } \forall x, a \in U, \ |x - a| < \delta \implies |f_n(x) - f_n(a)| < \frac{\epsilon}{3} \]
proves a related result.
Corollary 2.16. Suppose $(f_n)$ is a sequence of uniformly continuous functions. If $f_n \to f$ uniformly, then $f$ is also uniformly continuous.
Examples 2.17. 1. Let $f_n(x) = x + \frac{1}{n} x^2$. This is continuous on $\mathbb{R}$ for every $n$, and converges pointwise to the continuous function $f : x \mapsto x$.
(a) On any bounded interval $[-M, M]$ the convergence $f_n \to f$ is uniform,
\[ \sup_{x \in [-M, M]} |f_n(x) - f(x)| = \sup\left\{ \frac{1}{n} x^2 : x \in [-M, M] \right\} = \frac{M^2}{n} \xrightarrow[n \to \infty]{} 0 \]
(b) On any unbounded interval, $\mathbb{R}$ say, the convergence is non-uniform,
\[ \sup_{x \in \mathbb{R}} |f_n(x) - f(x)| = \sup\left\{ \frac{1}{n} x^2 : x \in \mathbb{R} \right\} = \infty \]
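2. Consider $f_n(x) = \frac{1}{1 + x^n}$; this is continuous on $(-1, \infty)$ and converges pointwise to
\[ f(x) = \begin{cases} 0 & \text{if } x > 1 \\ \frac{1}{2} & \text{if } x = 1 \\ 1 & \text{if } -1 < x < 1 \end{cases} \]
We consider the convergence $f_n \to f$ on several intervals.
[Figure: graphs of several $f_n(x)$ against $x$ on $(-1, 3)$]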
(a) On $[2, \infty)$, the pointwise limit is continuous. Moreover, $f_n(x)$ is decreasing, whence
\[ \sup_{x \in [2, \infty)} |f_n(x) - 0| = \frac{1}{1 + 2^n} \xrightarrow[n \to \infty]{} 0 \]
and the convergence is uniform. Alternatively; if $\epsilon \in (0, 1)$, let $N = \log_2(\epsilon^{-1} - 1)$, then
\[ x \ge 2, \ n > N \implies |f_n(x) - 0| = \frac{1}{1 + x^n} \le \frac{1}{1 + 2^n} < \frac{1}{1 + 2^N} = \epsilon \]
The same argument shows that $f_n \to f$ uniformly on any interval $[a, \infty)$ where $a > 1$.
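(b) On $[1, \infty)$ the convergence is not uniform, since the pointwise limit is discontinuous,
\[ f(x) = \begin{cases} 0 & \text{if } x > 1 \\ \frac{1}{2} & \text{if } x = 1 \end{cases} \]
(c) The convergence is not even uniform on the open interval $(1, \infty)$,
\[ \sup_{x \in (1, \infty)} |f_n(x) - f(x)| = \sup\left\{ \frac{1}{1 + x^n} : x > 1 \right\} = \frac{1}{2} \not\to 0 \]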
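(d) Similarly, for any $a \in (0, 1)$, the convergence $f_n \to f$ is uniform on $[0, a]$, this time to the (continuous) constant function $f(x) = 1$,
\[ \sup_{x \in [0, a]} |f_n(x) - 1| = 1 - \frac{1}{1 + a^n} = \frac{a^n}{1 + a^n} \xrightarrow[n \to \infty]{} 0 \]
(e) Finally, on $(-1, 1)$ the convergence is not uniform, since already on the subinterval $[0, 1)$,
\[ \sup_{x \in [0, 1)} |f_n(x) - f(x)| = \sup\left\{ \frac{x^n}{1 + x^n} : x \in [0, 1) \right\} = \frac{1}{2} \not\to 0 \]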
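The suprema in Example 2.17.2 can be estimated on a grid. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the interval $[2, \infty)$ is truncated to $[2, 10]$ (harmless here because each $f_n$ is decreasing), and a grid maximum only approximates a supremum.

```python
def f(n, x):
    """f_n(x) = 1 / (1 + x^n) from Example 2.17.2."""
    return 1.0 / (1.0 + x**n)

def sup_on_grid(n, a, b, points=2001):
    """Largest value of |f_n(x) - 0| over an evenly spaced grid on [a, b]."""
    step = (b - a) / (points - 1)
    return max(f(n, a + i * step) for i in range(points))

for n in (1, 5, 20):
    # On [2, 10] the grid maximum is attained at x = 2, matching 1/(1 + 2^n).
    assert abs(sup_on_grid(n, 2.0, 10.0) - 1.0 / (1.0 + 2.0**n)) < 1e-12

# Just to the right of 1 the values stay near 1/2, however large n is.
near_one = [f(n, 1.0 + 1.0 / n**2) for n in (10, 100, 1000)]
print(near_one)
```

The sample points $1 + \frac{1}{n^2}$ move towards 1 as $n$ grows, which is exactly the freedom that uniform convergence on $(1, \infty)$ would have to rule out.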
Exercises 2.24. Key concepts: Pointwise & Uniform Convergence, Uniform convergence preserves continuity
1. For each sequence of functions defined on $[0, \infty)$:
(i) Find the pointwise limit $f(x)$ as $n \to \infty$.
(ii) Determine whether $f_n \to f$ uniformly on $[0, 1]$.
(iii) Determine whether $f_n \to f$ uniformly on $[1, \infty)$.
(a) $f_n(x) = \frac{x}{n}$
(b) $f_n(x) = \frac{x^n}{1 + x^n}$
(c) $f_n(x) = \frac{x^n}{n + x^n}$
(d) $f_n(x) = \frac{x}{1 + nx^2}$
(e) $f_n(x) = \frac{nx}{1 + nx^2}$
2. Let $f_n(x) = \left(x - \frac{1}{n}\right)^2$. If $f(x) = x^2$, we clearly have $f_n \to f$ pointwise on any domain.
(a) Prove that the convergence is uniform on $[-1, 1]$.
(b) Prove that the convergence is non-uniform on $\mathbb{R}$.
3. For each sequence, find the pointwise limit and decide if the convergence is uniform.
(a) $f_n(x) = \frac{1 + 2\cos^2(nx)}{n}$ for $x \in \mathbb{R}$.
(b) $f_n(x) = \cos^n(x)$ on $[-\pi/2, \pi/2]$.
4. For each $n \in \mathbb{N}$, consider the continuous function
\[ f_n : [0, 1] \to \mathbb{R} : x \mapsto n x^n (1 - x) \]
(a) Given $0 \le x < 1$, let $a \in (x, 1)$. Explain why $\exists N$ such that
\[ n > N \implies |f_{n+1}(x)| \le a |f_n(x)| \]
Hence conclude that the pointwise limit of $(f_n)$ is the zero function.
(b) Use elementary calculus ($f_n'(x) = 0$ ...) to prove that the maximum value of $f_n$ is located at $x_n = \frac{n}{1 + n}$. Hence compute
\[ \sup_{x \in [0, 1]} |f_n(x) - f(x)| \]
and use it to show that the convergence $f_n \to 0$ is non-uniform.
This shows that the converse to Theorem 2.15 is false, even on a bounded interval: the continuous sequence $(f_n)$ converges non-uniformly to a continuous function. Sketches of several $f_n$ are below.
[Figure: sketches of $y = f_1(x)$, $y = f_2(x)$, $y = f_5(x)$ and $y = f_{50}(x)$ on $[0, 1]$, each with the level $e^{-1}$ marked]
5. Explain where the proof of Theorem 2.15 fails if $f_n \to f$ non-uniformly.
2.25 More on Uniform Convergence
While we haven’t yet developed calculus, our familiarity with basic differentiation and integration
makes it natural to pause to consider the interaction of these concepts with sequences of functions.
We also consider a Cauchy-criterion for uniform convergence, which leads to the useful Weierstraß
M-test.
Example 2.18. Recall that $f_n(x) = x^n$ converges uniformly to $f(x) = 0$ on any interval $[0, a]$ where $a < 1$. We easily check that
\[ \int_0^a f_n(x) \, dx = \frac{1}{n + 1} a^{n+1} \xrightarrow[n \to \infty]{} 0 = \int_0^a f(x) \, dx \]
In fact the sequence of derivatives converges here also
\[ \frac{d}{dx} f_n(x) = n x^{n-1} \xrightarrow[n \to \infty]{} 0 = f'(x) \]
It is perhaps surprising that integration interacts more nicely with uniform limits than does differentiation. We therefore consider integration first.
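Theorem 2.19. Let $f_n \to f$ uniformly on $[a, b]$ where the functions $f_n$ are integrable. Then $f$ is integrable on $[a, b]$ and
\[ \lim_{n \to \infty} \int_a^b f_n(x) \, dx = \int_a^b f(x) \, dx \]
Proof. Given $\epsilon > 0$, note that $\int_a^b \frac{\epsilon}{2(b - a)} \, dx = \frac{\epsilon}{2}$. Since $f_n \to f$ uniformly, $\exists N$ such that$^{10}$
\[ \forall x \in [a, b], \ n > N \implies |f_n(x) - f(x)| < \frac{\epsilon}{2(b - a)} \]
\[ \implies f_n(x) - \frac{\epsilon}{2(b - a)} < f(x) < f_n(x) + \frac{\epsilon}{2(b - a)} \]
\[ \implies \int_a^b f_n(x) \, dx - \frac{\epsilon}{2} \le \int_a^b f(x) \, dx \le \int_a^b f_n(x) \, dx + \frac{\epsilon}{2} \]
\[ \implies \left| \int_a^b f_n(x) \, dx - \int_a^b f(x) \, dx \right| \le \frac{\epsilon}{2} < \epsilon \]
The appearance of uniform convergence in the proof is subtle. If $N = N(\epsilon)$ were allowed to depend on $x$, then the integral $\int_a^b f_n(x) \, dx$ would be meaningless: Which $n$ would we consider? Larger than $N(x, \epsilon)$ for which $x$? Taking $n$ 'larger than all the $N(x, \epsilon)$' might produce the absurdity $n = \infty$!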
$^{10}$ This assumes $f$ is already integrable. Once we've properly defined (Riemann/Darboux) integrability at the end of the course, we can insert the following
\[ \int_a^b f_n(x) \, dx - \frac{\epsilon}{2} \le L(f) \le U(f) \le \int_a^b f_n(x) \, dx + \frac{\epsilon}{2} \implies 0 \le U(f) - L(f) \le \epsilon \implies U(f) = L(f) \]
where $U(f)$ and $L(f)$ are the upper and lower Darboux integrals of $f$; equality shows that $f$ is integrable on $[a, b]$.
Examples 2.20. 1. Uniform convergence is not required for the integrals to converge as we'd like. For instance, recall that extending the previous example to the domain $[0, 1]$ results in non-uniform convergence; however, we still have
\[ \int_0^1 f_n(x) \, dx = \frac{1}{n + 1} \xrightarrow[n \to \infty]{} 0 = \int_0^1 f(x) \, dx \]
2. To obtain a sequence of functions $f_n \to f$ for which $\int f_n \not\to \int f$ requires a bit of creativity. Consider the sequence
\[ f_n : [-1, 1] \to \mathbb{R} : x \mapsto \begin{cases} n - n^2 x & \text{if } 0 < x < \frac{1}{n} \\ 0 & \text{otherwise} \end{cases} \]
If $0 < x < 1$, then for large $n \in \mathbb{N}$ we have
\[ x \ge \frac{1}{n} \implies f_n(x) = 0 \]
[Figure: graph of $f_n(x)$ on $[-1, 1]$, a spike of height $n$ over $(0, \frac{1}{n})$]
We conclude that $f_n \to 0$ pointwise. Since the area under $f_n$ is a triangle with base $\frac{1}{n}$ and height $n$, the integral is constant and non-zero;
\[ \int_{-1}^1 f_n(x) \, dx = \frac{1}{2} \neq 0 = \int_{-1}^1 f(x) \, dx \]
It should be obvious that the convergence $f_n \to 0$ is non-uniform; why?
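The behaviour in Example 2.20.2 can be checked numerically. The sketch below is only illustrative: the midpoint rule, the cell count and the sample point $x = 0.05$ are arbitrary choices, and the function names are hypothetical.

```python
def f(n, x):
    """Spike from Example 2.20.2: n - n^2 x on (0, 1/n), zero elsewhere."""
    return n - n * n * x if 0 < x < 1.0 / n else 0.0

def midpoint_integral(n, cells=20_000):
    """Midpoint-rule approximation of the integral of f_n over [-1, 1]."""
    width = 2.0 / cells
    return width * sum(f(n, -1.0 + (i + 0.5) * width) for i in range(cells))

areas = [midpoint_integral(n) for n in (1, 10, 100)]        # each close to 1/2
value_at_fixed_x = [f(n, 0.05) for n in (10, 20, 21, 100)]  # zero once 1/n <= 0.05
print(areas, value_at_fixed_x)
```

The approximate areas stay near $\frac{1}{2}$ while the values at the fixed point drop to zero, which is the mismatch between the pointwise limit and the limit of the integrals.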
Derivatives and Uniform Limits We’ve already seen that a uniform limit of differentiable functions
might be differentiable (Example 2.18). As the next example shows, this should not be expected in
general, since even uniform limits of differentiable functions can have corners.
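Example 2.21. For each $n \in \mathbb{N}$, consider the function
\[ f_n : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} |x| & \text{if } |x| \ge \frac{1}{n} \\ \frac{n}{2} x^2 + \frac{1}{2n} & \text{if } |x| < \frac{1}{n} \end{cases} \]
• Each $f_n$ is differentiable:
\[ f_n'(x) = \begin{cases} -1 & \text{if } x \le -\frac{1}{n} \\ nx & \text{if } |x| < \frac{1}{n} \\ 1 & \text{if } x \ge \frac{1}{n} \end{cases} \]
• $f_n$ converges pointwise to $f(x) = |x|$, which is non-differentiable at $x = 0$.
• $f_n \to f$ uniformly since
\[ \sup_{x \in [-1, 1]} |f_n(x) - f(x)| = f_n(0) = \frac{1}{2n} \to 0 \]
[Figure: graphs of $f_n(x)$ and of $f_n'(x)$ against $x$ on $[-1, 1]$, with $\pm\frac{1}{n}$ marked]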
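The value $\frac{1}{2n}$ of the supremum in Example 2.21 can be confirmed on a grid. This is a small sketch with hypothetical function names; the grid is chosen so that it contains $x = 0$, where the gap is largest.

```python
def f(n, x):
    """Smoothed absolute value from Example 2.21."""
    if abs(x) >= 1.0 / n:
        return abs(x)
    return 0.5 * n * x * x + 1.0 / (2 * n)

def sup_gap(n, points=4001):
    """Grid estimate of sup |f_n(x) - |x|| over [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (points - 1) for i in range(points)]
    return max(abs(f(n, x) - abs(x)) for x in xs)

for n in (1, 4, 25):
    # The gap equals (n/2)(|x| - 1/n)^2 inside (-1/n, 1/n), largest at x = 0.
    assert abs(sup_gap(n) - 1.0 / (2 * n)) < 1e-12
```

The estimate shrinks like $\frac{1}{2n}$, in line with uniform convergence, even though the limit $|x|$ has a corner at the origin.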
If our goal is to transfer differentiability to the limit of a sequence of functions, then we have some
work to do.
Theorem 2.22. Suppose $(f_n)$ is a sequence and $f$, $g$ functions, all with domain $[a, b]$. Suppose also:
• $f_n \to f$ pointwise;
• Each $f_n$ is differentiable with continuous derivative;$^{11}$
• $f_n' \to g$ uniformly.
Then $f_n \to f$ uniformly on $[a, b]$ and $f$ is differentiable with derivative $g$.
The issue in the previous example is that the pointwise limit of the derived sequence $(f_n')$ is discontinuous at $x = 0$ and therefore $f_n' \to g$ isn't uniform!
Proof. For any $x \in [a, b]$, the fundamental theorem of calculus (part II) tells us that
\[ \int_a^x f_n'(t) \, dt = f_n(x) - f_n(a) \]
As $n \to \infty$, Theorem 2.19 says the left side converges to $\int_a^x g(t) \, dt$ and the right to $f(x) - f(a)$ (both pointwise). Since $f_n' \to g$ uniformly, we see that $g$ is continuous and can apply the fundamental theorem (part I): $\int_a^x g(t) \, dt = f(x) - f(a)$ is differentiable with derivative $g$.
The uniformity of the convergence $f_n \to f$ follows from Exercise 10.
Uniformly Cauchy Sequences and the Weierstraß M-Test
Recall that one may use Cauchy sequences to demonstrate convergence without knowing the limit in
advance. An analogous discussion is available for sequences of functions.
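Definition 2.23. A sequence of functions $(f_n)$ is uniformly Cauchy on $U$ if
\[ \forall \epsilon > 0, \ \exists N \in \mathbb{N} \text{ such that } \forall x \in U, \ m, n > N \implies |f_n(x) - f_m(x)| < \epsilon \]
Example 2.24. Let $f_n(x) = \sum_{k=1}^n \frac{1}{k^2} \sin k^2 x$ be defined on $\mathbb{R}$. Given $\epsilon > 0$, let $N = \frac{1}{\epsilon}$, then
\[ m > n > N \implies |f_m(x) - f_n(x)| = \left| \sum_{k=n+1}^m \frac{1}{k^2} \sin k^2 x \right| \le \sum_{k=n+1}^m \frac{1}{k^2} \le \sum_{k=n+1}^m \frac{1}{k(k - 1)} = \sum_{k=n+1}^m \left( \frac{1}{k - 1} - \frac{1}{k} \right) = \frac{1}{n} - \frac{1}{m} < \frac{1}{N} = \epsilon \]
whence $(f_n)$ is uniformly Cauchy.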
$^{11}$ Without this continuity assumption, the fundamental theorem of calculus doesn't apply and the proof requires an alternative approach. One can also weaken the hypotheses: if $f_n' \to g$ uniformly and $(f_n(x))$ converges for at least one $x \in [a, b]$, then there exists $f$ such that $f_n \to f$ is uniform and $f' = g$.
As with sequences of real numbers, uniformly Cauchy sequences converge; in fact uniformly!
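Theorem 2.25. A sequence $(f_n)$ is uniformly Cauchy on $U$ if and only if it converges uniformly to some $f : U \to \mathbb{R}$.
Proof. ($\Rightarrow$) Let $(f_n)$ be uniformly Cauchy on $U$. For each $x \in U$, the sequence $(f_n(x)) \subseteq \mathbb{R}$ is Cauchy and thus convergent. Define $f : U \to \mathbb{R}$ to be the pointwise limit:
\[ f(x) := \lim_{n \to \infty} f_n(x) \]
We claim that $f_n \to f$ uniformly. Let $\epsilon > 0$ be given, then $\exists N \in \mathbb{N}$ such that
\[ m > n > N \implies |f_n(x) - f_m(x)| < \frac{\epsilon}{2} \]
\[ \implies f_n(x) - \frac{\epsilon}{2} < f_m(x) < f_n(x) + \frac{\epsilon}{2} \]
\[ \implies f_n(x) - \frac{\epsilon}{2} \le f(x) \le f_n(x) + \frac{\epsilon}{2} \qquad \text{(take limits as } m \to \infty) \]
\[ \implies |f_n(x) - f(x)| \le \frac{\epsilon}{2} < \epsilon \]
($\Leftarrow$) This is Exercise 2.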
Example (2.24, mk. II). Since $(f_n)$ is uniformly Cauchy on $\mathbb{R}$, it converges uniformly to some $f : \mathbb{R} \to \mathbb{R}$. It seems reasonable to write
\[ f(x) = \sum_{n=1}^\infty \frac{1}{n^2} \sin n^2 x \]
The graph of this function looks somewhat bizarre:
[Figure: graph of $f(x)$ against $x$ on $[-2\pi, 2\pi]$]
Since each $f_n$ is (uniformly) continuous, Theorem 2.15 says that $f$ is also (uniformly) continuous. By Theorem 2.19, $f(x)$ is integrable, indeed
\[ \int_a^b f(x) \, dx = \lim_{n \to \infty} \sum_{k=1}^n \left[ -k^{-4} \cos k^2 x \right]_a^b = \sum_{n=1}^\infty \frac{1}{n^4} (\cos n^2 a - \cos n^2 b) \]
which converges (comparison test) for all $a, b$. By contrast, the derived sequence
\[ f_n'(x) = \sum_{k=1}^n \cos k^2 x \]
does not converge for any $x$ since $\lim_{n \to \infty} \cos n^2 x \neq 0$. We should therefore expect (though we offer no proof) that $f$ is nowhere differentiable.
The example generalizes. Suppose $(g_k)$ is a sequence of functions on $U$ and define the series $\sum g_k(x)$ as the pointwise limit of the sequence $(f_n)$ of partial sums
\[ \sum_{k=k_0}^\infty g_k(x) := \lim_{n \to \infty} f_n(x) \quad \text{where} \quad f_n(x) = \sum_{k=k_0}^n g_k(x) \]
whenever the limit exists. The series is said to converge uniformly whenever $(f_n)$ does so. Theorems 2.15, 2.19 and 2.22 immediately translate.
Corollary 2.26. Let $\sum g_k$ be a series of functions converging uniformly on $U$. Then:
1. If each $g_k$ is (uniformly) continuous then $\sum g_k$ is (uniformly) continuous.
2. If each $g_k$ is integrable, then $\int \sum g_k(x) \, dx = \sum \int g_k(x) \, dx$.
3. If each $g_k$ is continuously differentiable, and the sequence of derived partial sums $f_n'$ converges uniformly, then $\sum g_k$ is differentiable and $\frac{d}{dx} \sum g_k(x) = \sum g_k'(x)$.
As an application of the uniform Cauchy criterion, we obtain an easy test for uniform convergence.
Theorem 2.27 (Weierstraß M-test). Suppose $(g_k)$ is a sequence of functions on $U$. Moreover assume:
1. $(M_k)$ is a non-negative sequence such that $\sum M_k$ converges.
2. Each $g_k$ is bounded by $M_k$; that is $|g_k(x)| \le M_k$.
Then $\sum g_k(x)$ converges uniformly on $U$.
Proof. Let $f_n(x) = \sum_{k=k_0}^n g_k(x)$ define the sequence of partial sums. Since $\sum M_k$ converges, its sequence of partial sums is Cauchy (the Cauchy criterion for infinite series); given $\epsilon > 0$,
\[ \exists N \text{ such that } m > n > N \implies \sum_{k=n+1}^m M_k < \epsilon \]
However, by assumption,
\[ m > n > N \implies |f_m(x) - f_n(x)| = \left| \sum_{k=n+1}^m g_k(x) \right| \le \sum_{k=n+1}^m |g_k(x)| \le \sum_{k=n+1}^m M_k < \epsilon \]
The sequence of partial sums is uniformly Cauchy and thus uniformly convergent.
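Example 2.28. Given the series $\sum_{n=1}^\infty \frac{1 + \cos^2(nx)}{n^2} \sin(nx)$, we clearly have
\[ \left| \frac{1 + \cos^2(nx)}{n^2} \sin(nx) \right| \le \frac{2}{n^2} \quad \text{for all } x \in \mathbb{R} \]
Since $\sum \frac{2}{n^2}$ converges, the M-test shows that the original series converges uniformly on $\mathbb{R}$.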
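The mechanism behind the M-test in Example 2.28 can be watched numerically. The sketch below is illustrative only: the helper names, the cut-offs $N = 50$, $M = 400$ and the sample of $x$ values are all arbitrary choices. It compares the gap between two partial sums with the $x$-independent tail $\sum 2/n^2$.

```python
import math

def term(n, x):
    """n-th term of the series in Example 2.28."""
    return (1 + math.cos(n * x) ** 2) / n**2 * math.sin(n * x)

def partial_sum(x, N):
    return sum(term(n, x) for n in range(1, N + 1))

def tail_bound(N, M):
    """sum_{n=N+1}^{M} 2/n^2, which dominates |f_M(x) - f_N(x)| for every x."""
    return sum(2.0 / n**2 for n in range(N + 1, M + 1))

N, M = 50, 400
xs = [-10 + 0.1 * i for i in range(201)]
worst = max(abs(partial_sum(x, M) - partial_sum(x, N)) for x in xs)
# The same bound controls every sampled x, and it is itself below 2/N.
assert worst <= tail_bound(N, M) <= 2.0 / N
print(worst, tail_bound(N, M))
```

The point of the comparison is that the bound involves no $x$ at all, which is what makes the partial sums uniformly Cauchy.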
Exercises 2.25. Key concepts: Uniform convergence preserves integration, Uniform Cauchyness, M-test
1. For each $n \in \mathbb{N}$, let $f_n(x) = n x^n$ when $x \in [0, 1)$ and $f_n(1) = 0$.
(a) Prove that $f_n \to 0$ pointwise on $[0, 1]$.
(Hint: recall Exercise 2.24.4 if you're not sure how to prove this)
(b) By considering the integrals $\int_0^1 f_n(x) \, dx$ show that $f_n \to 0$ is not uniform.
2. Prove that if $f_n \to f$ uniformly, then the sequence $(f_n)$ is uniformly Cauchy.
3. (a) Suppose $(f_n)$ is a sequence of bounded functions on $U$ and suppose that $f_n \to f$ converges uniformly on $U$. Prove that $f$ is bounded on $U$.
(b) Give an example of a sequence of bounded functions $(f_n)$ converging pointwise to $f$ on $[0, \infty)$, but for which $f$ is unbounded.
4. The sequence defined by $f_n(x) = \frac{nx}{1 + nx^2}$ (Exercise 2.24.1) converges uniformly on any closed interval $[a, b]$ where $0 < a < b$.
(a) Check explicitly that $\int_a^b f_n(x) \, dx \to \int_a^b f(x) \, dx$, where $f = \lim f_n$.
(b) Is the same thing true for derivatives?
5. Let $f_n(x) = n^{-1} \sin n^2 x$ be defined on $\mathbb{R}$.
(a) Prove that $f_n$ converges uniformly on $\mathbb{R}$.
(b) Check that $\int_0^x f_n(t) \, dt$ converges for any $x \in \mathbb{R}$.
(c) Does the derived sequence $(f_n')$ converge? Explain.
6. Use the M-test to prove that $\sum_{n=1}^\infty \frac{x^n}{n^2}$ defines a continuous function on $[-1, 1]$.
7. Prove that $\sum_{n=1}^\infty \frac{x^n \sin x}{(n+1)^3 2^n}$ converges uniformly to a continuous function on the interval $[-2, 2]$.
8. Prove that if $\sum g_k$ converges uniformly on a set $U$ and if $h$ is a bounded function on $U$, then $\sum h g_k$ converges uniformly on $U$.
(Warning: you cannot simply write $\sum h g_k = h \sum g_k$)
9. Consider Example 2.20.2.
(a) Check explicitly that the convergence isn't uniform by computing $\sup_{x \in [-1, 1]} |f_n(x) - f(x)|$
(b) Prove that $f_n \to 0$ pointwise on $(0, 1]$ using the $\epsilon$–$N$ definition of convergence: that is, given $\epsilon > 0$ and $x \in (0, 1]$, find an explicit $N(x, \epsilon)$ such that
\[ n > N \implies |f_n(x)| < \epsilon \]
What happens to your choice of $N(x, \epsilon)$ as $x \to 0^+$?
10. Suppose $(f_n')$ converges uniformly on $[a, b]$ and that each $f_n'$ is continuous.
(a) Use the fact that $(f_n')$ is uniformly Cauchy to prove that $(f_n)$ is uniformly Cauchy and thus converges uniformly to some function $f$.
(Hint: $|f_n(x) - f_m(x)| = \left| \int_a^x f_n'(t) - f_m'(t) \, dt \right|$ ...)
(b) Explain why we need not have assumed the existence of $f$ in Theorem 2.22.
2.26 Differentiation and Integration of Power Series
We now specialize our recent results to power series. While everything will be stated for series
centered at x = 0, all are easily translated to arbitrary centers.
Theorem 2.29. Let $\sum a_n x^n$ be a power series with radius of convergence $R > 0$ and let $T \in (0, R)$. Then:
1. The series converges uniformly on $[-T, T]$.
2. The series is uniformly continuous on $[-T, T]$ and continuous on $(-R, R)$.
Proof. This is a straightforward application of the Weierstraß M-test (Theorem 2.27). For each $k$, define $M_k = |a_k| T^k$, and observe that
\[ T < R \implies \sum a_n T^n \text{ converges absolutely} \implies \sum M_k \text{ converges} \]
By the M-test and Corollary 2.26, the power series converges uniformly on $[-T, T]$ to a uniformly continuous function.
Finally, every $x \in (-R, R)$ lies in some such interval (take $T = |x|$), whence the power series is continuous on $(-R, R)$.
Example 2.30. On its interval of convergence $(-1, 1)$, the geometric series $\sum_{n=0}^\infty x^n$ converges pointwise to $\frac{1}{1 - x}$; convergence is uniform on any interval $[-T, T] \subseteq (-1, 1)$.
We needn't use the Theorem for this is simple to verify directly: writing $f$, $f_n$ for the series and its partial sums,
\[ |f_n(x) - f(x)| = \left| \frac{1 - x^{n+1}}{1 - x} - \frac{1}{1 - x} \right| = \frac{|x|^{n+1}}{1 - x} \implies \sup_{x \in [-T, T]} |f_n(x) - f(x)| = \frac{T^{n+1}}{1 - T} \xrightarrow[n \to \infty]{} 0 \]
By contrast, the convergence is non-uniform on $(-1, 1)$, since
\[ \sup_{x \in (-1, 1)} |f_n(x) - f(x)| = \infty \]
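The formula $\frac{T^{n+1}}{1-T}$ for the supremum on $[-T, T]$ can be compared with a grid estimate. This is a minimal sketch: the helper names and the choice $T = 0.9$ are illustrative, and the grid maximum matches the true supremum here only because it is attained at the grid point $x = T$.

```python
def partial_sum(x, n):
    """f_n(x) = 1 + x + ... + x^n."""
    return sum(x**k for k in range(n + 1))

def sup_error(T, n, points=2001):
    """Grid estimate of the sup over [-T, T] of |f_n(x) - 1/(1 - x)|."""
    xs = [-T + 2 * T * i / (points - 1) for i in range(points)]
    return max(abs(partial_sum(x, n) - 1 / (1 - x)) for x in xs)

T = 0.9
for n in (5, 20, 50):
    predicted = T ** (n + 1) / (1 - T)   # value derived in Example 2.30
    assert abs(sup_error(T, n) - predicted) < 1e-9
    print(n, predicted)
```

Letting $T$ creep towards 1 makes the predicted value blow up for each fixed $n$, consistent with the infinite supremum on $(-1, 1)$.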
Theorem 2.31. Suppose a power series $\sum a_n x^n$ has radius of convergence $R > 0$. Then the series is integrable and differentiable term-by-term on the interval $(-R, R)$. Indeed for any $x \in (-R, R)$,
\[ \frac{d}{dx} \sum_{n=0}^\infty a_n x^n = \sum_{n=1}^\infty n a_n x^{n-1} \quad \text{and} \quad \int_0^x \sum_{n=0}^\infty a_n t^n \, dt = \sum_{n=0}^\infty \frac{a_n}{n + 1} x^{n+1} \]
where both series also have radius of convergence $R$.
Proof. Let $f(x) = \sum a_n x^n$ have radius of convergence $R$, and observe that
\[ \limsup |n a_n|^{1/n} = \lim n^{1/n} \cdot \limsup |a_n|^{1/n} = \frac{1}{R} \]
whence $\sum n a_n x^n$ also has radius of convergence $R$. At any given non-zero $x \in (-R, R)$, we may write
\[ \sum_{n=1}^\infty n a_n x^{n-1} = x^{-1} \sum_{n=1}^\infty n a_n x^n \]
to see that the derived series also has radius of convergence $R$. On any interval $[-T, T] \subseteq (-R, R)$, the derived series converges uniformly (Theorem 2.29). Since each $a_n x^n$ is continuously differentiable, Corollary 2.26 says that $f$ is differentiable on $[-T, T]$ and that
\[ f'(x) = \sum_{n=0}^\infty \frac{d}{dx} a_n x^n = \sum_{n=1}^\infty n a_n x^{n-1} \]
Since any $x \in (-R, R)$ lies in some such interval $[-T, T]$, we are done.
Exercise 7 discusses the corresponding result for integration.
We postpone the canonical examples until after the next result.
Continuity at Endpoints?
There is one small hole in our analysis. A series $\sum a_n x^n$ with radius of convergence $R$ converges and
is continuous on $(-R, R)$. But what if it also converges at $x = \pm R$? Is the series continuous at the
endpoints? The answer is yes, though demonstrating this small benefit requires a lot of work!
Theorem 2.32 (Abel’s Theorem). Power series are continuous on their full interval of convergence.
Examples 2.33. 1. Apply our results to the geometric series:
$$\frac{1}{(1-x)^2} = \frac{d}{dx} \frac{1}{1-x} = \sum_{n=1}^\infty n x^{n-1} = \sum_{n=0}^\infty (n+1) x^n = 1 + 2x + 3x^2 + 4x^3 + \cdots$$
$$-\ln(1-x) = \int_0^x \frac{1}{1-t} \,dt = \sum_{n=0}^\infty \frac{1}{n+1} x^{n+1} = \sum_{n=1}^\infty \frac{1}{n} x^n = x + \frac{1}{2} x^2 + \frac{1}{3} x^3 + \cdots$$
where both are valid on $(-1, 1)$. In fact the first series has exactly this interval of convergence,
whereas the second has $[-1, 1)$. By Abel's Theorem and the fact that logarithms are continuous,
we have equality at $x = -1$ and recover the famous identity
$$\ln 2 = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$$
This also shows that while integrated and differentiated series have the same radius of convergence
as the original, convergence at the endpoints need not be the same.
2. Substitute $x \mapsto -x^2$ in the geometric series and integrate term-by-term: if $|x| < 1$, then
$$\frac{1}{1 + x^2} = \sum_{n=0}^\infty (-1)^n x^{2n} \implies \arctan x = \sum_{n=0}^\infty \frac{(-1)^n}{2n+1} x^{2n+1}$$
In fact the arctangent series also converges at $x = \pm 1$; Abel's Theorem says it is continuous on
$[-1, 1]$. Since arctangent is continuous (on $\mathbb{R}$!) we recover another famous identity
$$\frac{\pi}{4} = \arctan 1 = \sum_{n=0}^\infty \frac{(-1)^n}{2n+1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$
As with the identity for $\ln 2$, this is a very slowly converging alternating series and therefore
doesn't provide an efficient method for approximating $\pi$.
3. The series $f(x) = \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n}$ has radius of convergence $\infty$. Differentiate to obtain
$$f'(x) = \sum_{n=1}^\infty \frac{(-1)^n x^{2n-1}}{(2n-1)!} = \sum_{n=0}^\infty \frac{(-1)^{n+1}}{(2n+1)!} x^{2n+1}$$
This series is also valid for all $x \in \mathbb{R}$. Differentiating again,
$$f''(x) = \sum_{n=0}^\infty \frac{(-1)^{n+1}}{(2n)!} x^{2n} = -\sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n} = -f(x)$$
Recalling that $f(x) = \cos x$ is the unique solution to the initial value problem
$$\begin{cases} f''(x) = -f(x) \\ f(0) = 1,\ f'(0) = 0 \end{cases}$$
we conclude that, $\forall x \in \mathbb{R}$,
$$\cos x = \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n} \qquad \sin x = -f'(x) = \sum_{n=0}^\infty \frac{(-1)^n}{(2n+1)!} x^{2n+1}$$
These last expressions can be taken as the definitions of sine and cosine. As promised earlier, continuity
and differentiability of these functions now come for free! The only real downside of this
definition is believing that it has anything to do with right-triangles!
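Returning to the identities for $\ln 2$ and $\frac{\pi}{4}$ in parts 1 and 2, a short numerical sketch makes the slow convergence concrete. The function names are chosen here purely for illustration.

```python
import math

def alternating_harmonic(terms):
    """Partial sum of 1 - 1/2 + 1/3 - ... (the series for ln 2)."""
    return sum((-1)**(n + 1) / n for n in range(1, terms + 1))

def leibniz(terms):
    """Partial sum of 1 - 1/3 + 1/5 - ... (the series for pi/4)."""
    return sum((-1)**n / (2 * n + 1) for n in range(terms))

for terms in (10, 100, 1000, 10000):
    # Columns: number of terms, error in ln 2, error in pi.
    print(terms,
          abs(alternating_harmonic(terms) - math.log(2)),
          abs(4 * leibniz(terms) - math.pi))
```

Both errors are bounded by the first omitted term (the standard alternating series estimate), so each extra decimal digit costs roughly ten times as many terms.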
We can similarly define other common transcendental functions using power series: for instance
$$\exp(x) = \sum_{n=0}^\infty \frac{1}{n!} x^n$$
Example 2.33.1 could also be taken as a definition of the logarithm on the interval $(0, 2]$,
$$\ln x = \ln\bigl(1 - (1 - x)\bigr) = -\sum_{n=1}^\infty \frac{1}{n} (1 - x)^n = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} (x - 1)^n$$
though this is unnecessary since it is more natural to define $\ln$ as the inverse of the exponential.
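As a sanity check on these series definitions, one can compare truncated sums against a standard library. This is a minimal sketch: the truncation at 30 terms and the function names are assumptions made for illustration, not part of the notes.

```python
import math

def exp_series(x, terms=30):
    """Partial sum of sum_{n>=0} x^n / n!."""
    return sum(x**n / math.factorial(n) for n in range(terms))

def cos_series(x, terms=30):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n) / (2n)!."""
    return sum((-1)**n * x**(2 * n) / math.factorial(2 * n) for n in range(terms))

def sin_series(x, terms=30):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / (2n+1)!."""
    return sum((-1)**n * x**(2 * n + 1) / math.factorial(2 * n + 1) for n in range(terms))

x = 1.5
print(exp_series(x), math.exp(x))
print(cos_series(x), math.cos(x))
print(sin_series(x), math.sin(x))
print(cos_series(x)**2 + sin_series(x)**2)  # should be very close to 1
```

Because the factorials grow so quickly, a modest number of terms already agrees with the library values to near machine precision for moderate $x$.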
Proof of Abel’s Theorem (non-examinable)
This requires a lot of work, so feel free to omit on a first reading!
First observe that there is nothing to check unless $0 < R < \infty$. By the change of variable $x \mapsto \pm\frac{x}{R}$, it
is enough for us to prove the following:
$$\sum_{n=0}^\infty a_n \text{ convergent and } f(x) = \sum_{n=0}^\infty a_n x^n \text{ on } (-1, 1) \implies \lim_{x \to 1^-} f(x) = \sum_{n=0}^\infty a_n$$
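Proof. Let $s_n = \sum_{k=0}^n a_k$ and write $s = \lim s_n = \sum a_n$. It is an easy exercise to check that
$$\sum_{k=0}^n a_k x^k = s_n x^n + (1 - x) \sum_{k=0}^{n-1} s_k x^k$$
If $|x| < 1$, then (since $s_n \to s$) $\lim s_n x^n = 0$, whence we obtain
$$\forall x \in (-1, 1), \quad f(x) = (1 - x) \sum_{n=0}^\infty s_n x^n$$
Let $\epsilon \in (0, 1)$ be given and fix $x \in (0, 1)$. Then
$$\exists N \in \mathbb{N} \text{ such that } n > N \implies |s_n - s| < \frac{\epsilon}{2} \tag{$*$}$$
Use the geometric series formula $\sum_{n=0}^\infty x^n = \frac{1}{1-x}$ and write $h(x) = (1 - x) \left| \sum_{n=0}^N (s_n - s) x^n \right|$ to observe
$$\begin{aligned}
|f(x) - s| &= \left| (1 - x) \sum_{n=0}^\infty s_n x^n - s \right| = \left| (1 - x) \sum_{n=0}^\infty s_n x^n - s(1 - x) \sum_{n=0}^\infty x^n \right| \\
&= \left| (1 - x) \sum_{n=0}^\infty (s_n - s) x^n \right| = (1 - x) \left| \sum_{n=0}^N (s_n - s) x^n + \sum_{n=N+1}^\infty (s_n - s) x^n \right| \\
&\le (1 - x) \left| \sum_{n=0}^N (s_n - s) x^n \right| + (1 - x) \left| \sum_{n=N+1}^\infty (s_n - s) x^n \right| && (\triangle\text{-inequality}) \\
&< h(x) + \frac{\epsilon}{2} (1 - x) \sum_{n=N+1}^\infty x^n && (\text{by } (*)) \\
&\le h(x) + \frac{\epsilon}{2}
\end{aligned}$$
Since $h \ge 0$ is continuous and $h(1) = 0$, $\exists \delta > 0$ such that $x \in (1 - \delta, 1) \implies h(x) < \frac{\epsilon}{2}$ (the computation
of a suitable $\delta$ is another exercise).
We conclude that $\lim_{x \to 1^-} f(x) = s$.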
Exercises 2.26. Key concepts: Power series continuous on $(-R, R)$, uniformly on any $[-T, T] \subseteq (-R, R)$,
Power series differentiable and integrable term-by-term on $(-R, R)$
1. (a) Prove that $\sum_{n=1}^\infty n x^n = \frac{x}{(1-x)^2}$ for $|x| < 1$.
(b) Evaluate $\sum_{n=1}^\infty \frac{n}{2^n}$, $\sum_{n=1}^\infty \frac{n}{4^n}$ and $\sum_{n=1}^\infty \frac{(-1)^n n}{4^n}$.
2. (a) Starting with a power series centered at $x = 0$, evaluate the integral $\int_0^{1/2} \frac{1}{1 + x^4} \,dx$ as an
infinite series.
(b) (Harder) Repeat part (a) but for $\int_0^1 \frac{1}{1 + x^4} \,dx$. What extra ingredients do you need?
3. The probability that a standard normally distributed random variable $X$ lies in the interval $[a, b]$
is given by the integral
$$P(a \le X \le b) = \frac{1}{\sqrt{2\pi}} \int_a^b \exp\left(-\frac{x^2}{2}\right) dx$$
Find $P(-1 \le X \le 1)$ as an infinite series.
4. If $f(x) = \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n}$ is defined as in Example 2.33.3, prove that $f(x)^2 + f'(x)^2 = 1$.
What does (the converse of) Pythagoras' Theorem say about $f(x)$, at least when both it and
$f'(x)$ are positive?
(Hint: Differentiate and evaluate at zero!)
5. Define $c(x) = \sum_{n=0}^\infty \frac{x^{2n}}{(2n)!}$ and $s(x) = \sum_{n=0}^\infty \frac{x^{2n+1}}{(2n+1)!}$.
(a) Prove that $c'(x) = s(x)$ and that $s'(x) = c(x)$.
(b) Prove that $c(x)^2 - s(x)^2 = 1$ for all $x \in \mathbb{R}$.
(These functions are the hyperbolic sine and cosine: $s(x) = \sinh x$ and $c(x) = \cosh x$)
6. Let $a, b \in (-1, 1)$. Extending Example 2.30, show that the convergence $\sum x^n = \frac{1}{1-x}$ is non-uniform
on any interval of the form $(-1, a)$ or $(b, 1)$.
7. Prove the integration part of Theorem 2.31.
8. Prove or disprove: If a series converges absolutely at the endpoints of its interval of convergence
then its convergence is uniform on the entire interval.
9. Complete the proof of Abel's Theorem:
(a) Let $s_n = \sum_{k=0}^n a_k$ be the partial sum of the series $\sum a_n$. For each $n$, prove that
$$\sum_{k=0}^n a_k x^k = s_n x^n + (1 - x) \sum_{k=0}^{n-1} s_k x^k$$
(b) Suppose $x > 0$. Let $S = \max\{|s_n - s| : n \le N\}$ and prove that $h(x) \le S(1 - x^{N+1})$. Hence
find an explicit $\delta$ that completes the final step.
2.27 The Weierstraß Approximation Theorem
A major theme of analysis is approximation; for instance power series are an example of (uniform)
approximation by polynomials. It is reasonable to ask whether any function can be so approximated.
In 1885, Weierstraß answered a specific case in the affirmative.
Theorem 2.34 (Weierstraß). If $f : [a, b] \to \mathbb{R}$ is continuous, then there exists a sequence of polynomials
converging uniformly to $f$ on $[a, b]$.
Suitable polynomials can be defined in various ways. By scaling the domain, it is enough to do this
on $[a, b] = [0, 1]$ where perhaps the simplest approach is via the Bernstein Polynomials,
$$B_n f(x) := \sum_{k=0}^n \binom{n}{k} f\left(\frac{k}{n}\right) x^k (1 - x)^{n-k} \qquad \left( \binom{n}{k} = \frac{n!}{k!(n-k)!} \text{ is the binomial coefficient} \right)$$
We omit the proof due to length; Weierstraß' original argument was completely different. Instead we
compute a couple of examples and give an important interpretation/application.
Examples 2.35. 1. Suppose $f(x) = 2x$ if $x < \frac{1}{2}$ and $f(x) = 1$ otherwise.
$$B_1 f(x) = f(0)(1 - x) + f(1) x = x$$
$$B_2 f(x) = f(0)(1 - x)^2 + 2 f(\tfrac{1}{2}) x (1 - x) + f(1) x^2 = 2x(1 - x) + x^2 = x(2 - x)$$
$$B_3 f(x) = f(0)(1 - x)^3 + 3 f(\tfrac{1}{3}) x (1 - x)^2 + 3 f(\tfrac{2}{3}) x^2 (1 - x) + f(1) x^3 = 0(1 - x)^3 + 2x(1 - x)^2 + 3x^2(1 - x) + x^3 = x(2 - x) = B_2 f(x)$$
$$B_4 f(x) = 0(1 - x)^4 + 2x(1 - x)^3 + 6x^2(1 - x)^2 + 4x^3(1 - x) + x^4 = x(x^3 - 2x^2 + 2)$$
The Bernstein polynomials $B_2 f(x)$, $B_4 f(x)$ and $B_{50} f(x)$ are drawn.
2. Now assume $f(x) = x$ if $x < \frac{1}{2}$ and $f(x) = 1 - x$ otherwise.
$$B_1 f(x) = f(0)(1 - x) + f(1) x = 0$$
$$B_2 f(x) = x(1 - x)$$
$$B_3 f(x) = 0(1 - x)^3 + x(1 - x)^2 + x^2(1 - x) + 0x^3 = x(1 - x) = B_2 f(x)$$
$$B_4 f(x) = f(0)(1 - x)^4 + f(\tfrac{1}{4}) \cdot 4x(1 - x)^3 + f(\tfrac{1}{2}) \cdot 6x^2(1 - x)^2 + f(\tfrac{3}{4}) \cdot 4x^3(1 - x) + f(1) x^4 = x(1 - x)^3 + 3x^2(1 - x)^2 + x^3(1 - x) = x(1 - x)(1 + x - x^2)$$
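The hand computations in Examples 2.35 can be checked by evaluating the defining sum directly. The sketch below is illustrative only; the function names `bernstein`, `ramp`, `tent` and `max_error`, and the grid used to estimate the uniform error, are choices made here.

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate the n-th Bernstein polynomial B_n f at x in [0, 1]."""
    return sum(comb(n, k) * f(k / n) * x**k * (1 - x)**(n - k) for k in range(n + 1))

def ramp(x):
    """Example 1: f(x) = 2x for x < 1/2, and 1 otherwise."""
    return 2 * x if x < 0.5 else 1.0

def tent(x):
    """Example 2: f(x) = x for x < 1/2, and 1 - x otherwise."""
    return x if x < 0.5 else 1 - x

def max_error(f, n, samples=201):
    """Largest value of |B_n f(x) - f(x)| over a uniform grid on [0, 1]."""
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)

x = 0.3
print(bernstein(ramp, 2, x), x * (2 - x))                 # B_2 f for Example 1
print(bernstein(ramp, 4, x), x * (x**3 - 2 * x**2 + 2))   # B_4 f for Example 1
print(bernstein(tent, 4, x), x * (1 - x) * (1 + x - x**2))  # B_4 f for Example 2
print(max_error(ramp, 4), max_error(ramp, 50))
```

The last line compares the grid error for $n = 4$ and $n = 50$; the error is largest near the corner at $x = \frac{1}{2}$ and decreases only slowly with $n$.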
Bézier curves (just for fun!)
The Bernstein polynomials arise naturally when considering Bézier curves. These have many applications,
particularly in computer graphics. Given three points $A, B, C$, define points on the line segments $AB$ and $BC$
for each $t \in [0, 1]$, via
$$AB(t) = (1 - t) A + t B \qquad BC(t) = (1 - t) B + t C$$
These points move at a constant speed along the corresponding segments. Now consider a point on the
moving segment between the points defined above:
$$R(t) := (1 - t)\, AB(t) + t\, BC(t) = (1 - t)^2 A + 2t(1 - t) B + t^2 C$$
[Figure: quadratic Bézier curve with control points $A, B, C$ and the moving points $AB(t)$, $BC(t)$.]
This is the quadratic Bézier curve with control points $A, B, C$. The 2nd Bernstein polynomial for a function
$f$ is simply the quadratic Bézier curve with control points $\bigl(0, f(0)\bigr)$, $\bigl(\tfrac{1}{2}, f(\tfrac{1}{2})\bigr)$ and $\bigl(1, f(1)\bigr)$. The
picture above shows $B_2 f(x)$ for the above example.
We can repeat the construction with more control points: with four points $A, B, C, D$, one constructs
$AB(t)$, $BC(t)$, $CD(t)$, then the second-order points between these, and finally the cubic Bézier curve
$$R(t) := (1 - t) \bigl[ (1 - t)\, AB(t) + t\, BC(t) \bigr] + t \bigl[ (1 - t)\, BC(t) + t\, CD(t) \bigr] = (1 - t)^3 A + 3t(1 - t)^2 B + 3t^2(1 - t) C + t^3 D$$
where we now recognize the relationship to the 3rd Bernstein polynomial.
[Figure: two cubic Bézier curves, each with control points $A, B, C, D$.]
The pictures show cubic Bézier curves: the first is the graph of the Bernstein polynomial
$$B_3 f(x) = 0(1 - x)^3 + 3x(1 - x)^2 + 3x^2(1 - x) + \frac{2}{3} x^3$$
while the second is for the four given control points $A, B, C, D$.
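The repeated-interpolation construction above is usually called de Casteljau's algorithm. Here is a minimal sketch of it, compared against the expanded Bernstein form; the control points are hypothetical values chosen only for illustration.

```python
def lerp(p, q, t):
    """Point (1 - t) p + t q on the segment from p to q."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def de_casteljau(points, t):
    """Evaluate a Bezier curve by repeatedly interpolating between neighbouring points."""
    pts = list(points)
    while len(pts) > 1:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]

def cubic_bernstein_form(a, b, c, d, t):
    """Evaluate (1-t)^3 A + 3t(1-t)^2 B + 3t^2(1-t) C + t^3 D coordinate-wise."""
    weights = ((1 - t)**3, 3 * t * (1 - t)**2, 3 * t**2 * (1 - t), t**3)
    return tuple(sum(w * p[i] for w, p in zip(weights, (a, b, c, d))) for i in range(2))

# Hypothetical control points for illustration.
A, B, C, D = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)
t = 0.3
print(de_casteljau([A, B, C, D], t))
print(cubic_bernstein_form(A, B, C, D, t))

# Quadratic case with control points (0, f(0)), (1/2, f(1/2)), (1, f(1)) for
# f(0) = 0, f(1/2) = 1, f(1) = 1: the curve is the graph (t, B_2 f(t)).
print(de_casteljau([(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)], t))
```

The two evaluations of the cubic agree up to rounding, and the quadratic example returns a point whose first coordinate is $t$ itself.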
Exercises 2.27. Key concepts: Every continuous function is the uniform limit of a polynomial sequence
1. Show that the closed bounded interval assumption in the approximation theorem is required
by giving an example of a continuous function $f : (-1, 1) \to \mathbb{R}$ which is not the uniform limit
of a sequence of polynomials.
2. If $g : [a, b] \to \mathbb{R}$ is continuous, then $f(x) := g\bigl((b - a)x + a\bigr)$ is continuous on $[0, 1]$. If $P_n \to f$
uniformly on $[0, 1]$, prove that $Q_n \to g$ uniformly on $[a, b]$, where
$$Q_n(x) = P_n\left(\frac{x - a}{b - a}\right)$$
3. Use the binomial theorem to check that every Bernstein polynomial for $f(x) = x$ is $B_n f(x) = x$
itself!
4. Find a parametrization of the cubic Bézier curve with control points (1, 0), (0, 1), (1, 0) and
(0, 1). Now sketch the curve.
(Use a computer algebra package if you like!)
5. (Hard) Show that the Bernstein polynomials for $f(x) = x^2$ are given by
$$B_n f(x) = \frac{1}{n} x + \frac{n - 1}{n} x^2$$
and thus verify explicitly that $B_n f \to f$ uniformly.
3 Differentiation
Differentiation grew out of the problem of instantaneous velocity. Velocity can only easily be measured
as an average over a time interval:$^{13}$ if an object travels $d$ meters in $t$ seconds, then its average
velocity is $v_{av} = \frac{d}{t}$ ms$^{-1}$. An early 'definition' (dating to the 1300s) makes the instantaneous velocity
equal to the constant velocity that would be observed if a body were to stop accelerating: while
useless for the purposes of measurement, this is essentially Newton's first law regarding inertial
motion (1687). We also see the concept of the tangent line beginning to appear. Indeed if one graphs
position against time, intuition tells us:
The graph of inertial (constant speed) motion is a straight line whose slope is the velocity.
The tangent line to a curve has slope equal to the instantaneous velocity.
The problem of finding, defining and computing instantaneous velocity thus morphed into the
consideration of tangent lines to curves. With the advent of analytic geometry in the early 1600s,
mathematicians such as Fermat and Descartes pioneered versions of the familiar secant ('cutting') line
method for computing tangents.
[Figure, two panels plotting distance $d$ against time $t$. Left: instantaneous velocity equals the constant velocity corresponding to the tangent line. Right: secant lines approximate the tangent line as $t \to a$.]
The average velocity of the particle over the time interval $[a, t]$ is the slope of the secant line, namely
$$v_{av}(a, t) = \frac{d(t) - d(a)}{t - a}$$
Since the secant lines approximate the tangent line as $t$ approaches $a$, it seems reasonable that we
should compute the instantaneous velocity in this manner:
$$v(a) = \lim_{t \to a} v_{av}(a, t) = \lim_{t \to a} \frac{d(t) - d(a)}{t - a}$$
This is, of course, the modern definition of the derivative.
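The limiting process can be watched numerically. In the sketch below, the position function is a hypothetical choice made for illustration; the secant slopes over $[a, a + h]$ approach the instantaneous velocity as $h$ shrinks.

```python
def average_velocity(d, a, t):
    """Slope of the secant line of the position function d between times a and t."""
    return (d(t) - d(a)) / (t - a)

def position(t):
    # Hypothetical position function, chosen only for illustration.
    return t**2 + 4 * t

a = 1.0
for h in (1.0, 0.1, 0.01, 0.001):
    # For this position function the secant slope is exactly 2a + 4 + h.
    print(h, average_velocity(position, a, a + h))
```

For this choice the printed slopes are $6 + h$ (up to rounding), so they approach $6$, the derivative of $t^2 + 4t$ at $t = 1$.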
$^{13}$ Even a modern technique such as Doppler-shift compares measurements separated by the extremely small period of a
light or sound wave. These are still therefore average velocities, albeit taken over very small time intervals.
3.28 Basic Properties of the Derivative
Definition 3.1. Let $f : U \to \mathbb{R}$ and $a \in U$ an interior point. We say that $f$ is differentiable at $a$ if the
following limit exists (is finite!)
$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$
We call this limit the derivative of $f$ at $a$ and denote its value by $\left. \frac{df}{dx} \right|_{x=a}$ or $f'(a)$.
If $f'(a)$ exists for all $a \in U$ then $f$ is differentiable (on $U$); the derivative becomes a function $f'(x) = \frac{df}{dx}$.
The two notations are partly attributable to the primary founders of calculus: Isaac Newton and
Gottfried Leibniz. Each has its pros and cons and you should be comfortable with both.
One-sided derivatives Differentiability only makes sense at interior points of U since the defining
limit is two-sided. Left- and right-derivatives may be defined using one-sided limits; differentiability
is then equivalent to these being equal. All results in this section hold for one-sided derivatives with
suitable (sometimes tedious) modifications. It is common, though strictly incorrect, to say that f is
differentiable on [a, b) if it is differentiable on the interior (a, b) and right-differentiable at a. In these
notes we will strictly adhere to Definition 3.1: differentiable means two-sided.
Examples 3.2. Basic examples should be familiar from elementary calculus.
1. Let $f(x) = x^2 + 4x$. Then, for any $a \in \mathbb{R}$,
$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \frac{x^2 + 4x - a^2 - 4a}{x - a} = \lim_{x \to a} \frac{(x - a)(x + a + 4)}{x - a} = \lim_{x \to a} (x + a + 4) = 2a + 4$$
Note how the definition of $\lim_{x \to a}$ allows us to cancel the $x - a$ terms from the numerator and
denominator. We conclude that $f$ is differentiable (on $\mathbb{R}$) and that $f'(x) = 2x + 4$.
2. Let $g(x) = \frac{x + 1}{2x - 3}$. Then, for any $a \neq \frac{3}{2}$,
$$\lim_{x \to a} \frac{g(x) - g(a)}{x - a} = \lim_{x \to a} \frac{1}{x - a} \left( \frac{x + 1}{2x - 3} - \frac{a + 1}{2a - 3} \right) = \lim_{x \to a} \frac{5a - 5x}{(x - a)(2x - 3)(2a - 3)} = \lim_{x \to a} \frac{-5}{(2x - 3)(2a - 3)} = \frac{-5}{(2a - 3)^2}$$
$g$ is therefore differentiable on its domain $\mathbb{R} \setminus \{\frac{3}{2}\}$ with derivative $g'(x) = \frac{-5}{(2x - 3)^2}$.
The familiar expressions
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}, \qquad f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
are equivalent to the original definition (Exercise 5). While seemingly simpler, they sometimes lead
to nastier calculations: see what happens if you try the previous example in this language. . .
We now turn to perhaps the most well-known result of elementary calculus.
Theorem 3.3 (Power Law). Let $r \in \mathbb{R}$. Then $f(x) = x^r$ is differentiable with $f'(x) = r x^{r-1}$.
The domains of $f$ and $f'$ depend messily on $r$, but the formula holds at least on the interval $(0, \infty)$.
We leave a complete proof to the exercises and instead consider a few generalizable examples.
Examples 3.4. 1. If $n \in \mathbb{N}$ and $a \in \mathbb{R}$, a simple factorization yields
$$\lim_{x \to a} \frac{x^n - a^n}{x - a} = \lim_{x \to a} \frac{(x - a)(x^{n-1} + a x^{n-2} + \cdots + a^{n-2} x + a^{n-1})}{x - a} \tag{$*$}$$
$$= \lim_{x \to a} (x^{n-1} + a x^{n-2} + \cdots + a^{n-2} x + a^{n-1}) = n a^{n-1}$$
We conclude that $\frac{d}{dx} x^n = n x^{n-1}$.
2. If $f(x) = x^{-1}$ and $a \neq 0$, then
$$\lim_{x \to a} \frac{x^{-1} - a^{-1}}{x - a} = \lim_{x \to a} \frac{a - x}{a x (x - a)} = \lim_{x \to a} \frac{-1}{a x} = -\frac{1}{a^2}$$
from which we conclude that $f'(x) = -x^{-2}$.
A similar approach followed by the factorization $(*)$ proves the
power law for all negative integer exponents:
$$\frac{x^{-n} - a^{-n}}{x - a} = \frac{a^n - x^n}{a^n x^n (x - a)} = \cdots$$
3. To differentiate $x^{1/n}$, substitute $x = y^n$ and observe case 1. For
instance, if $g(x) = x^{1/3}$ and $a \neq 0$, then $y = x^{1/3}$ and $b = a^{1/3}$ yield
$$\lim_{x \to a} \frac{x^{1/3} - a^{1/3}}{x - a} = \lim_{y \to b} \frac{y - b}{y^3 - b^3} = \frac{1}{3b^2} = \frac{1}{3} a^{-2/3} \implies g'(x) = \frac{1}{3} x^{-2/3}$$
Note that $g$ is not differentiable at $x = 0$!
We could similarly compute the derivative for all rational exponents, though it is much easier to wait
for the chain rule. The power law for irrational exponents is somewhat more ticklish.
Corollary 3.5 (Basic Transcendental Functions). Recalling our development of power series in
Chapter 2, the power law (for positive integers!) is all we need to see that
$$\frac{d}{dx} \exp(x) = \exp(x), \qquad \frac{d}{dx} \sin x = \cos x, \qquad \frac{d}{dx} \cos x = -\sin x$$
It is also possible to develop these results independently of power series (see e.g. Exercise 12).
Failure of differentiability
It is instructive to consider how a function might fail to be differentiable. Firstly, a familiar fact shows
that functions are not differentiable at discontinuities.
Lemma 3.6. If f is differentiable at a then f is continuous at a.
Proof. Just take the limit (think carefully why this works!):
$$\lim_{x \to a} f(x) = \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} (x - a) + f(a) \right) = f'(a) \cdot 0 + f(a) = f(a)$$
It remains to consider situations when a function is continuous but not differentiable.
Examples 3.7. The following exemplify all situations where a function is continuous on an interval
and differentiable everywhere except at a single interior point. As with isolated discontinuities, these
are classified by considering the three ways in which the derivative limit might not converge.
1. A vertical tangent line occurs when the limit is infinite. For instance, $g(x) = x^{1/3}$ at $x = 0$.
2. Corners occur when the one-sided limits are unequal (could be infinite). For instance, $f(x) = |x|$
is not differentiable at zero, with one-sided limits
$$\lim_{x \to 0^+} \frac{|x| - |0|}{x - 0} = \lim_{x \to 0^+} \frac{x}{x} = 1 \neq \lim_{x \to 0^-} \frac{|x| - |0|}{x - 0} = \lim_{x \to 0^-} \frac{-x}{x} = -1$$
Indeed $f$ is differentiable everywhere except at zero, with
$$f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases}$$
A cusp describes the special case where the one-sided limits are infinite with opposite signs ($\pm\infty \neq \mp\infty$).
3. A singularity is where left- and/or right-limits do not exist.
The standard example is
$$f(x) = \begin{cases} x \sin\frac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$
which is continuous on $\mathbb{R}$ and differentiable everywhere except at zero: the details are in Exercise 10.
Singularities and vertical tangent lines can also prevent one-sided differentiability.
More esoteric examples of non-differentiability are possible:
Utilizing series, we can create functions which are continuous on an interval but nowhere
differentiable! For an example, see Exercise 15.
It is also possible to construct a function which is differentiable (and thus continuous) at precisely
one point; can you think of an example?
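For the singularity in Example 3.7.3, the difference quotient at zero is $\frac{f(h) - f(0)}{h} = \sin\frac{1}{h}$, which keeps oscillating as $h \to 0$. A small numerical sketch (the sample points $h_k$ are chosen here so that the quotient alternates between $\pm 1$):

```python
import math

def f(x):
    """f(x) = x sin(1/x) for x != 0, and f(0) = 0."""
    return x * math.sin(1 / x) if x != 0 else 0.0

def quotient_at_zero(h):
    """Difference quotient (f(h) - f(0)) / h, which equals sin(1/h)."""
    return (f(h) - f(0)) / h

for k in range(1, 7):
    # h_k = 1 / ((k + 1/2) pi) tends to 0 and gives sin(1/h_k) = (-1)^k.
    h = 1 / ((k + 0.5) * math.pi)
    print(h, quotient_at_zero(h))
```

Since the quotients take the values $+1$ and $-1$ along a sequence tending to zero, the right-hand derivative limit cannot exist.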
The Basic Rules of Differentiation
Theorem 3.8. Let $f, g$ be differentiable and $k, l$ be constants.
1. (Linearity) The function $kf + lg$ is differentiable with $(kf + lg)' = kf' + lg'$.
2. (Product rule) The function $fg$ is differentiable with $(fg)' = f'g + fg'$.
3. (Inverse functions) Suppose $f$ is bijective, $b = f(a)$ is an interior point of $\operatorname{dom} f^{-1}$, and
$f'(a) \neq 0$; then $f^{-1}$ is differentiable at $b$ and
$$\left. \frac{d}{dy} \right|_{y=b} f^{-1}(y) = \frac{1}{f'(a)} = \frac{1}{f'\bigl(f^{-1}(b)\bigr)}$$
Proof. Parts 1 and 2 follow from the limit laws:
$$\lim_{x \to a} \frac{(kf + lg)(x) - (kf + lg)(a)}{x - a} = \lim_{x \to a} \left( k \frac{f(x) - f(a)}{x - a} + l \frac{g(x) - g(a)}{x - a} \right) = k f'(a) + l g'(a)$$
$$\lim_{x \to a} \frac{f(x) g(x) - f(a) g(a)}{x - a} = \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} g(x) + f(a) \frac{g(x) - g(a)}{x - a} \right) = f'(a) g(a) + f(a) g'(a)$$
Note where we used the continuity of $g$ in the second line ($\lim g(x) = g(a)$). Part 3 is an exercise.
The inverse function rule should be intuitive: since the graphs of $f$ and $f^{-1}$ are related by reflection
in the diagonal $y = x$, gradients at corresponding points are reciprocals. The result feels even more
natural in Leibniz's notation: $\frac{dx}{dy} = \frac{1}{dy/dx}$.
Examples 3.9. 1. Linearity permits the differentiation of any polynomial: e.g.,
$$\frac{d}{dx} \left( 7x^2 + 13x^4 \right) = 7 \frac{d}{dx} x^2 + 13 \frac{d}{dx} x^4 = 14x + 52x^3$$
2. The product rule extends the reach of differentiation to include simple combinations: e.g.,
$$\frac{d}{dx} (x^4 \sin x) = \left( \frac{d}{dx} x^4 \right) \sin x + x^4 \frac{d}{dx} \sin x = 4x^3 \sin x + x^4 \cos x$$
3. Inverse trigonometric functions can now be differentiated: e.g.,
$$y = \sin^{-1} x \implies \frac{d}{dx} \sin^{-1} x = \frac{dy}{dx} = \left( \frac{dx}{dy} \right)^{-1} = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - \sin^2 y}} = \frac{1}{\sqrt{1 - x^2}}$$
4. Define the natural logarithm to be the inverse of the (bijective!) exponential function $\exp(x)$:
$$y = \ln x \iff x = \exp y$$
It follows that
$$\frac{d}{dx} \ln x = \left( \frac{dx}{dy} \right)^{-1} = \frac{1}{\exp y} = \frac{1}{x}$$
The full details, and the justification that $\exp x = e^x$, are in Exercise 14.
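Theorem 3.10 (Chain Rule). If $g$ is differentiable at $a$, and $f$ is differentiable at $g(a)$, then $f \circ g$ is
differentiable at $a$ with derivative
$$(f \circ g)'(a) = f'\bigl(g(a)\bigr)\, g'(a)$$
In Leibniz's notation, $\frac{d(f \circ g)}{dx} = \frac{df}{dg} \frac{dg}{dx}$: this looks like a simple cancellation of the $dg$ terms. . .$^{14}$
Proof. Since $f$ and $g$ are differentiable, $a$ is interior to $\operatorname{dom}(g)$ and $g(a)$ is interior to $\operatorname{dom}(f)$. Since $g$
is continuous at $a$, there must exist some open interval $U \ni a$ for which $x \in U \implies g(x) \in \operatorname{dom}(f)$.
Define $\gamma : \operatorname{dom}(f) \to \mathbb{R}$ via
$$\gamma(v) = \begin{cases} \dfrac{f(v) - f\bigl(g(a)\bigr)}{v - g(a)} & \text{if } v \neq g(a) \\[2ex] f'\bigl(g(a)\bigr) & \text{if } v = g(a) \end{cases} \tag{$*$}$$
Since $f$ is differentiable at $g(a)$, we see that $\gamma$ is continuous there: indeed $\lim_{v \to g(a)} \gamma(v) = f'\bigl(g(a)\bigr)$.
For any $x \in U \setminus \{a\}$, let $v = g(x)$ in $(*)$. Then
$$\frac{f\bigl(g(x)\bigr) - f\bigl(g(a)\bigr)}{x - a} = \gamma\bigl(g(x)\bigr) \cdot \frac{g(x) - g(a)}{x - a}$$
Take limits as $x \to a$ for the result.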
Corollary 3.11 (Quotient Rule). Suppose $f$ and $g$ are differentiable. Then $\frac{f}{g}$ is differentiable
whenever $g(x) \neq 0$. Moreover
$$\left( \frac{f}{g} \right)' = \frac{f'g - fg'}{g^2}$$
The proof is an exercise.
Examples 3.12. 1. By the quotient rule,
$$\frac{d}{dx} \tan x = \frac{d}{dx} \frac{\sin x}{\cos x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \sec^2 x$$
2. We can now differentiate highly involved combinations of elementary functions:
$$\frac{d}{dx} \left( \tan(e^{4x^2}) - \frac{7x}{\sin x} \right) = 8x e^{4x^2} \sec^2(e^{4x^2}) - \frac{7 \sin x - 7x \cos x}{\sin^2 x}$$
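A formula like the one in Example 3.12.2 is easy to get wrong by a sign, so a numerical cross-check can be reassuring. The sketch below compares the stated derivative with a central difference quotient; the step size, test point and function names are assumptions chosen for illustration.

```python
import math

def F(x):
    """F(x) = tan(e^(4x^2)) - 7x / sin x."""
    return math.tan(math.exp(4 * x**2)) - 7 * x / math.sin(x)

def F_prime(x):
    """Derivative obtained from the chain and quotient rules."""
    u = math.exp(4 * x**2)
    sec_squared = 1 / math.cos(u)**2
    return 8 * x * u * sec_squared - (7 * math.sin(x) - 7 * x * math.cos(x)) / math.sin(x)**2

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.5
print(F_prime(x), central_difference(F, x))
```

Agreement to several decimal places does not prove the formula, but a disagreement would immediately flag an algebra slip.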
$^{14}$ This is completely unjustified since $dg$ does not (for us) have independent meaning. The same problem appears in a
famously flawed one-line 'proof' of the chain rule:
$$\lim_{x \to a} \frac{f\bigl(g(x)\bigr) - f\bigl(g(a)\bigr)}{x - a} \overset{?}{=} \lim_{x \to a} \frac{f\bigl(g(x)\bigr) - f\bigl(g(a)\bigr)}{g(x) - g(a)} \cdot \lim_{x \to a} \frac{g(x) - g(a)}{x - a}$$
The second limit doesn't make sense unless $g(x) \neq g(a)$ for all $x$ on some punctured neighborhood of $a$: in particular, $g(x)$
cannot be constant! The faulty argument may be repaired by replacing this difference quotient with $f'\bigl(g(a)\bigr)$ whenever
$g(x) = g(a)$, before taking the limit. This is precisely what $\gamma\bigl(g(x)\bigr)$ does in the correct proof.
Exercises 3.28. Key concepts: Differentiability, Basic rules: linearity, power, product, chain, quotient
1. Use Definition 3.1 to calculate the derivatives.
(a) f (x) = x
3
at x = 2 (b) g(x) = x + 2 at x = a
(c) f (x) = x
2
cos x at x = 0 (d) r(x) =
3x+4
2x1
at x = 1
2. Differentiate the function f (x) = cos
e
x
5
3x
using the chain and product rules.
3. (a) Prove the quotient rule (Corollary 3.11) by combining the chain and product rules.
(b) Prove the inverse derivative rule (Theorem 3.8, part 3).
(Hint: You can't simply differentiate $1 = \frac{dx}{dx} = \frac{d}{dx} f\bigl(f^{-1}(x)\bigr)$ using the chain rule; why not?)
4. (a) Find the derivatives of secant, cosecant and cotangent using the quotient rule.
(b) Why did we choose the positive square-root when computing $\frac{d}{dx} \sin^{-1} x$? What is the
standard domain of arcsine, and what happens at $x = \pm 1$?
(c) Find the derivatives of the inverse trigonometric functions using the inverse function rule.
5. Using the definition of the derivative, and supposing that $f$ is differentiable at $a$, prove that
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{h \to 0} \frac{f(a + h) - f(a - h)}{2h}$$
6. Use induction to prove the power law $\frac{d}{dx} x^n = n x^{n-1}$ when $n \in \mathbb{N}$ using only the product rule
and the fact that $\frac{d}{dx} x = 1$.
7. Prove that $f(x) = x|x|$ is differentiable everywhere and compute its derivative.
8. Show that $f(x) = x^{2/3}$ has a cusp (see Example 3.7.2) at $x = 0$.
9. Show that the following function is differentiable everywhere and compute its derivative:
$$f(x) = \begin{cases} x^2 \sin\frac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$
Moreover, prove that the derivative $f'$ is discontinuous at $x = 0$.
10. Prove that the function in Example 3.7.3 is differentiable everywhere except at $x = 0$.
11. Suppose $f(x) = x^2$ whenever $x \in \mathbb{Q}$ and $f(x) = 0$ whenever $x \notin \mathbb{Q}$. At what values of $x$ is $f$
differentiable? Prove your assertion.
12. (a) Suppose $0 < h < \frac{\pi}{2}$. Use the picture to show that
$$0 < \frac{1 - \cos h}{h} < \sin\frac{h}{2} \quad\text{and}\quad \sin h < h < \tan h$$
Hence conclude that $\lim_{h \to 0} \frac{\sin h}{h} = 1$ and $\lim_{h \to 0} \frac{1 - \cos h}{h} = 0$.
(b) Use part (a) to prove that $\frac{d}{dx} \sin x = \cos x$.
[Figure: angle $h$ with labelled lengths $\cos h$, $\sin h$, $\tan h$, $1$ and $h$.]
13. (Hard) Use induction to prove the Leibniz rule (general product rule):
$$(fg)^{(n)} = \sum_{k=0}^n \binom{n}{k} f^{(k)} g^{(n-k)}$$
Warning! The last two exercises are much longer and tougher: have a go if you appreciate a
challenge.
14. The Exponential Function & the Power Law
The ratio test shows that the power series $\exp(x) := \sum_{n=0}^\infty \frac{x^n}{n!}$ converges for all real $x$. Define
$e := \exp(1)$. Certainly $e^x$ makes sense whenever $x \in \mathbb{Q}$. If $x$ is irrational, instead define
$$e^x := \sup\{e^q : q \in \mathbb{Q},\ q < x\}$$
The goal of this question is to prove that $\exp(x) = e^x$. As a nice bonus we recover Bernoulli's
limit identity $e = \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right)^n$ and obtain a complete proof of the power law!
(a) For all $x, y \in \mathbb{R}$, prove that $\exp(x + y) = \exp(x) \exp(y)$
(Hint: use the binomial theorem and change the order of summation)
(b) Show that $\exp(x)$ is always positive, even when $x < 0$.
(c) Prove that $\exp : \mathbb{R} \to (0, \infty)$ is bijective.
(Hint: $x \ge 0 \implies \exp(x) \ge 1 + x$; take limits then apply part (a))
(d) Prove that $e^x = \exp(x)$. Do this in three stages:
If $x \in \mathbb{N}$, use part (a). Now check for $x \in \mathbb{Z}^-$.
If $x = \frac{m}{n} \in \mathbb{Q}$, first compute $\left( \exp(\frac{m}{n}) \right)^n$.
If $x$ is irrational, consider a sequence of rational numbers $q_n < x$ with $e^{q_n} \to e^x$. . .
(e) Let $\ln : (0, \infty) \to \mathbb{R}$ be the inverse function of $\exp$. Prove the logarithm laws:
$$\ln(xy) = \ln x + \ln y \quad\text{and}\quad \ln x^r = r \ln x$$
(Just do this when $r \in \mathbb{N}$; in general, another argument like part (d) is required)
(f) We've already seen that $\frac{d}{dy} \ln y = \frac{1}{y}$. Use the fact that
$$\frac{d}{dy} \ln y = \lim_{h \to 0} \frac{\ln(y + h) - \ln y}{h}$$
to prove that $\exp(x) = \lim_{n \to \infty} \left( 1 + \frac{x}{n} \right)^n$, thus recovering Bernoulli's definition of $e$.
(g) For any $r \in \mathbb{R}$, define $x^r := \exp(r \ln x)$. Hence obtain the power law for any exponent.
15. A Very Strange Function
Here is a classic example of a continuous but nowhere-differentiable function!
Let $f$ be the sawtooth function defined by $f(x) = |x|$ whenever $x \in [-1, 1]$ and extending
periodically to $\mathbb{R}$ so that $f(x + 2) = f(x)$. Now define $g : \mathbb{R} \to \mathbb{R}$ via
$$g(x) = \sum_{n=0}^\infty \left( \frac{3}{4} \right)^n f(4^n x)$$
[Figure, two panels. Left: $f(x)$ and iterations to $n = 3$. Right: $g(x)$ (really $n = 6$, but can you tell?!)]
(a) Prove that $g$ is well-defined and continuous on $\mathbb{R}$.
(b) Let $x \in \mathbb{R}$ and $m \in \mathbb{N}$ be fixed. Define $h_m = \pm\frac{1}{2} \cdot 4^{-m}$ where the sign is chosen so that no
integers lie strictly between $4^m x$ and $4^m (x + h_m) = 4^m x \pm \frac{1}{2}$.
For each $n \in \mathbb{N}_0$, define
$$k_n = \frac{f\bigl(4^n (x + h_m)\bigr) - f(4^n x)}{h_m}$$
Prove the following:
i. $|k_n| \le 4^n$ with equality when $n = m$.
ii. $n > m \implies k_n = 0$.
(Hint: $|f(y) - f(z)| \le |y - z|$: when is this an equality?)
(c) Use part (b) to prove that
$$\left| \frac{g(x + h_m) - g(x)}{h_m} \right| \ge \frac{1}{2} (3^m + 1)$$
Hence conclude that $g$ is nowhere differentiable.
3.29 The Mean Value Theorem
A key result in elementary calculus, this should be very familiar from your previous studies.
Theorem 3.13 (Mean Value Theorem/MVT). Let $f$ be continuous on $[a, b]$ and differentiable on
$(a, b)$. Then there exists $\xi \in (a, b)$ such that $f'(\xi) = \frac{f(b) - f(a)}{b - a}$.
This follows easily from two lemmas.
Lemma 3.14. 1. (Critical Points) Suppose $g$ is bounded on $(a, b)$ and attains its maximum or minimum
at $\xi \in (a, b)$. If $g$ is differentiable at $\xi$ then $g'(\xi) = 0$.
2. (Rolle's Theorem) Suppose $g$ is continuous on $[a, b]$, differentiable on $(a, b)$, and $g(a) = g(b)$.
Then there exists $\xi \in (a, b)$ such that $g'(\xi) = 0$.
The main result is obtained by subtracting a straight line and applying Rolle's theorem to
$$g(x) = f(x) - \frac{f(b) - f(a)}{b - a} (x - a)$$
and observing that $g(a) = f(a) = g(b)$ and $g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}$.
[Figure, two panels. Left: Critical Points/Rolle's Theorem, graph of $g(x)$ with $\xi$ between $a$ and $b$. Right: Mean Value Theorem, graph of $f(x)$ with $\xi$ between $a$ and $b$.]
In the pictures, the orange and green lines are parallel: the average slope over the interval $[a, b]$ equals
the gradient/derivative $f'(\xi)$.
Proof of Lemma. 1. Suppose $\xi \in (a, b)$ is a maximum: that is, $g(x) \le g(\xi)$ for all $x \neq \xi$. Then
$$\frac{g(x) - g(\xi)}{x - \xi} \begin{cases} \le 0 & \text{whenever } x > \xi \\ \ge 0 & \text{whenever } x < \xi \end{cases}$$
Now take the one-sided limits: since $g$ is differentiable at $\xi$, we see that
$$0 \ge \lim_{x \to \xi^+} \frac{g(x) - g(\xi)}{x - \xi} = g'(\xi) = \lim_{x \to \xi^-} \frac{g(x) - g(\xi)}{x - \xi} \ge 0$$
Otherwise said $g'(\xi) = 0$. The case when $\xi$ is a minimum is similar.
2. By the Extreme Value Theorem (1.11), $g$ is bounded and attains its bounds. If the extrema both
occur at the endpoints $a, b$, then $g$ is constant: any $\xi \in (a, b)$ satisfies the result. Otherwise, at
least one extreme occurs at some $\xi \in (a, b)$: part 1 says that $g'(\xi) = 0$.
Examples 3.15. 1. Let $f(x) = (x - 1)^2 (4 - x) + x$ on $[a, b] = [1, 4]$: this is roughly the above picture
illustrating the mean value theorem. Compute the average slope and the derivative,
$$\frac{f(b) - f(a)}{b - a} = 1, \qquad f'(x) = 2(x - 1)(4 - x) - (x - 1)^2 + 1 = -3x^2 + 12x - 8$$
and observe that
$$f'(\xi) = \frac{f(b) - f(a)}{b - a} \iff 3\xi^2 - 12\xi + 9 = 0 \iff \xi = 1 \text{ or } 3$$
Since only 3 lies in the interval $(1, 4)$, this is the value $\xi$ satisfying the mean value theorem.
2. We find the maximum and minimum values of $g(x) = x^4 - 14x^2 + 24x$ on the interval $[0, 2]$.
The function is differentiable, with
$$g'(x) = 4x^3 - 28x + 24 = 4(x - 2)(x - 1)(x + 3)$$
By the Lemma, the locations of the extrema are either the endpoints $x = 0, 2$ or locations with zero
derivative ($x = 1$). Since
$$g(0) = 0, \quad g(1) = 11, \quad g(2) = 8$$
we conclude that $\max(g) = g(1) = 11$ and $\min(g) = g(0) = 0$.
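Both parts of Examples 3.15 reduce to small computations, sketched below. The variable names are chosen here for illustration; the quadratic formula is applied to $f'(\xi) = $ average slope, and the extrema are found by comparing $g$ at the endpoints and interior critical points.

```python
import math

# Part 1: the mean value theorem for f(x) = (x - 1)^2 (4 - x) + x on [1, 4].
def f(x):
    return (x - 1)**2 * (4 - x) + x

a, b = 1.0, 4.0
slope = (f(b) - f(a)) / (b - a)

# f'(x) = -3x^2 + 12x - 8, so f'(xi) = slope  <=>  -3 xi^2 + 12 xi - (8 + slope) = 0.
A, B, C = -3.0, 12.0, -(8.0 + slope)
disc = math.sqrt(B * B - 4 * A * C)
roots = sorted([(-B + disc) / (2 * A), (-B - disc) / (2 * A)])
xi_values = [r for r in roots if a < r < b]  # keep only roots in the open interval
print(slope, roots, xi_values)

# Part 2: extrema of g(x) = x^4 - 14x^2 + 24x on [0, 2].
def g(x):
    return x**4 - 14 * x**2 + 24 * x

critical_points = (-3.0, 1.0, 2.0)  # zeros of g'(x) = 4(x - 2)(x - 1)(x + 3)
candidates = [0.0, 2.0] + [c for c in critical_points if 0.0 < c < 2.0]
values = {c: g(c) for c in candidates}
print(values, max(values.values()), min(values.values()))
```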
Consequences of the Mean Value Theorem Several simple corollaries relate to monotonicity.
Definition 3.16. Suppose $f : I \to \mathbb{R}$ is defined on an interval $I$. We say that $f$ is:
Increasing (monotone-up) on $I$ if $x < y \implies f(x) \le f(y)$
Decreasing (monotone-down) on $I$ if $x < y \implies f(x) \ge f(y)$
We say strictly increasing/decreasing if the inequalities are strict.
Examples 3.17. 1. $f : x \mapsto x^2$ is strictly increasing on $[0, \infty)$
and strictly decreasing on $(-\infty, 0]$.
2. The floor function $f : x \mapsto \lfloor x \rfloor$ (the greatest integer less
than or equal to $x$) is increasing, but not strictly, on $\mathbb{R}$.
Corollary 3.18. Suppose $f$ is differentiable on an interval $I$. Then
1. $f' \ge 0$ on $I \iff f$ is increasing on $I$
2. $f' \le 0$ on $I \iff f$ is decreasing on $I$
3. $f' = 0$ on $I \iff f$ is constant on $I$
Proof. (Part 1, $\Rightarrow$) Let $x < y$ where $x, y \in I$. By the mean value theorem, $\exists \xi \in (x, y)$ such that
$$\frac{f(y) - f(x)}{y - x} = f'(\xi) \quad\text{whence}\quad f'(\xi) \ge 0 \implies f(y) \ge f(x)$$
($\Leftarrow$) For the converse, use the definition of derivative: $f'(\xi) = \lim_{x \to \xi} \frac{f(x) - f(\xi)}{x - \xi}$. If $f$ is increasing, then
$$x > \xi \implies f(x) \ge f(\xi) \implies f'(\xi) \ge 0$$
Parts 2 and 3 are similar.
More care is required when relating $f' > 0$ to $f$ being strictly increasing (see Exercise 5). The corollary
also yields a couple of (hopefully familiar) flashbacks to elementary calculus.
Corollary 3.19. Let $I$ be an open interval.
1. (Anti-derivatives on an interval) If $f'(x) = g'(x)$ on $I$, then $\exists c$ such that $g(x) = f(x) + c$ on $I$.
2. (First derivative test) Suppose $f$ is continuous on $I$ and differentiable except perhaps at $\xi$. If
$$\begin{cases} f'(x) < 0 \text{ whenever } x < \xi, \text{ and} \\ f'(x) > 0 \text{ whenever } x > \xi \end{cases}$$
then $f$ has its minimum value at $x = \xi$.
The statement for a maximum is similar.
Examples 3.20. 1. Since $\frac{d}{dx} \sin(3x^2 + x) = (6x + 1) \cos(3x^2 + x)$ on (the interval) $\mathbb{R}$, all
anti-derivatives of $f(x) = (6x + 1) \cos(3x^2 + x)$ are given by
$$\int f(x) \,dx = \int (6x + 1) \cos(3x^2 + x) \,dx = \sin(3x^2 + x) + c$$
As is typical in calculus, we use the indefinite integral notation $\int f(x) \,dx$ for anti-derivatives.
2. If $f(x) = x^{2/3} e^{x/3}$, then $f'(x) = \frac{1}{3} x^{-1/3} (2 + x) e^{x/3}$.
By Lemma 3.14, the only possible critical points are at
$x = 0$ or $-2$. The sign of the derivative is also clear:
$f'(x) > 0$ for $x < -2$, $f'(x) < 0$ for $-2 < x < 0$, and $f'(x) > 0$ for $x > 0$.
By the 1st derivative test, $f$ has a local maximum at $x = -2$ and a minimum at $x = 0$.
We finish this section by tying together the mean and intermediate value theorems.
Theorem 3.21 (IVT for Derivatives). Suppose $f$ is differentiable on an interval $I$ containing $a < b$,
and that $L$ lies between $f'(a)$ and $f'(b)$. Then $\exists \xi \in (a, b)$ such that $f'(\xi) = L$.
If $f'(x)$ is continuous, this is just the intermediate value theorem applied to $f'$; surprisingly, continuity
of $f'$ is not required. A full proof is in Exercise 7.
Exercises 3.29. Key concepts: Differentiability, Basic rules: linearity, power, product, chain, quotient
1. Determine whether the conclusion of the mean value theorem holds for each function on the
given interval. If so, find a suitable point $\xi$. If not, state which hypothesis fails.
(a) $x^2$ on $[-1, 2]$ (b) $\sin x$ on $[0, \pi]$ (c) $|x|$ on $[-1, 2]$
(d) $1/x$ on $[-1, 1]$ (e) $1/x$ on $[1, 3]$
2. Suppose f and g are differentiable on an interval I containing $a < b$ and that $f(a) = f(b) = 0$.
By considering $h(x) = f(x) e^{g(x)}$, prove that $f'(\xi) + f(\xi) g'(\xi) = 0$ for some $\xi \in (a, b)$.
3. (a) Use the Mean Value Theorem to prove that $x < \tan x$ for all $x \in (0, \frac{\pi}{2})$.
(b) Prove that $\frac{x}{\sin x}$ is strictly increasing on $(0, \frac{\pi}{2})$.
(c) Prove that $x \le \frac{\pi}{2} \sin x$ for all $x \in [0, \frac{\pi}{2}]$.
4. Suppose that $|f(x) - f(y)| \le (x - y)^2$ for all $x, y \in \mathbb{R}$. Prove that f is a constant function.
5. (a) Prove that $f' > 0$ on an interval I $\implies$ f is strictly increasing on I.
(b) Show that the converse of part (a) is false.
(c) Carefully prove the first derivative test (Corollary 3.19).
6. If f is differentiable on an interval I such that $f'(x) \ne 0$ for all $x \in I$, use the intermediate value
theorem for derivatives to prove that f is either strictly increasing or strictly decreasing.
7. (Intermediate value theorem for derivatives) Let f, a, b and L be as in Theorem 3.21, define
$g : I \to \mathbb{R}$ by $g(x) = f(x) - Lx$, and let $\xi \in [a, b]$ be such that
$$g(\xi) = \min\{ g(x) : x \in [a, b] \}$$
(a) Why can we be sure that $\xi$ exists? If $\xi \in (a, b)$, explain why $f'(\xi) = L$.
(b) Assume WLOG that $f'(a) < f'(b)$. Prove that $g'(a) < 0 < g'(b)$. By considering
$\lim_{x \to a^+} \frac{g(x) - g(a)}{x - a}$, show that $\exists x > a$ for which $g(x) < g(a)$. Hence complete the proof.
8. Suppose $f'$ exists on $(a, b)$, and is continuous except for a discontinuity at $c \in (a, b)$.
(a) Suppose $\lim_{x \to c^+} f'(x) = L < f'(c)$. By taking $\epsilon = \frac{f'(c) - L}{2}$ in the definition of this limit
and applying IVT for derivatives, obtain a contradiction.
Hence argue that c cannot be a removable or a jump discontinuity.
(b) Similarly, show that $f'$ cannot have an infinite discontinuity by considering $\lim_{x \to c^+} f'(x) = \infty$.
(c) By parts (a) and (b), it remains to see that $f'$ can have an essential discontinuity. Recall
(Exercise 3.28.9) that
$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} x^2 \sin \frac{1}{x} & x \ne 0 \\ 0 & x = 0 \end{cases}$$
is differentiable on $\mathbb{R}$, but has discontinuous derivative at $x = 0$.
i. Use $x_n = \frac{1}{2n\pi}$ and $y_n = \frac{1}{(2n+1)\pi}$ to show that $f'$ has an essential discontinuity at $x = 0$.
ii. Prove that if $\lim s_n = 0$ and $\lim f'(s_n) = M$, then $M \in [-1, 1]$.
iii. Prove that for any $L \in [-1, 1]$, there is a sequence $(t_n)$ for which $\lim f'(t_n) = L$.
(Hint: Use IVT for derivatives)
3.30 L'Hôpital's Rule
We are often required to consider indeterminate forms: limits which do not yield easily to the standard
limit laws. For instance, while it is tempting to write
$$\lim_{x \to 0} \frac{\sin 2x}{e^{3x} - 1} = \frac{\lim \sin 2x}{\lim (e^{3x} - 1)} = \frac{0}{0} \tag{$*$}$$
this is an incorrect application of the limit laws since the resulting quotient has no meaning.
Definition 3.22. An indeterminate form is any limit where a naïve application of the limit laws results
in a meaningless expression: the primary types are
$\frac{0}{0}$, $\frac{\infty}{\infty}$, $\infty - \infty$, $0 \cdot \infty$, $0^0$, $0^\infty$, and $1^\infty$.
Examples 3.23. 1. $\lim_{x \to 7^+} (x - 7)^{\frac{1}{x - 7}}$ is an indeterminate form of type $0^\infty$.
2. Our motivating example $(*)$ may correctly be evaluated using the definition of the derivative:
$$\lim_{x \to 0} \frac{\sin 2x}{e^{3x} - 1} = \lim_{x \to 0} \frac{\sin 2x - 0}{x - 0} \cdot \frac{x - 0}{e^{3x} - 1} = \frac{\frac{d}{dx}\big|_{x=0} \sin 2x}{\frac{d}{dx}\big|_{x=0} (e^{3x} - 1)} = \frac{2}{3}$$
By considering $\lim_{x \to 0} \frac{3a \sin 2x}{2(e^{3x} - 1)}$, we see that an indeterminate form of type $\frac{0}{0}$ can take any value a!
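A quick numerical experiment makes the last claim concrete. This is a sketch only: the function name `quotient` and the sample values of a and x are our own, and `math.expm1` is used for $e^{3x} - 1$ to limit cancellation error near 0.

```python
import math

def quotient(x, a=1.0):
    """(3a sin 2x) / (2 (e^{3x} - 1)), an indeterminate form of type 0/0 at x = 0."""
    return 3.0 * a * math.sin(2.0 * x) / (2.0 * math.expm1(3.0 * x))

for a in (1.0, -4.0, 0.5):
    # the values should drift towards a as x shrinks
    print(a, [quotient(x, a) for x in (1e-2, 1e-4, 1e-6)])
```

Numerical evidence of this kind only suggests the value of a limit; the computation via the derivative in the example is what proves it.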
The approach generalizes, if non-rigorously: if f, g are differentiable at a and $f(a) = 0 = g(a)$, then
$$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \cdot \frac{x - a}{g(x) - g(a)} = \frac{f'(a)}{g'(a)}$$
Our goal is to fully justify this result and extend to several situations:
• One-sided limits, including when $a = \pm\infty$.
• When $\lim f(x) = 0$ exists, but $f(a)$ does not ($g(x)$, $g(a)$ similarly).
• Indeterminate forms of type $\frac{\infty}{\infty}$ ($\lim f(x) = \infty$, etc.).
• When the RHS cannot be cleanly evaluated: for instance $g'(a) = 0$ or if the original limit is $\pm\infty$.
Here is the full result.
Theorem 3.24 (L'Hôpital's Rule). Let $a \in \mathbb{R} \cup \{\pm\infty\}$ and suppose functions f and g satisfy:
1. $\lim_{x \to a} \frac{f'(x)}{g'(x)} = L$ for some $L \in \mathbb{R} \cup \{\pm\infty\}$, and,
2. (a) $\lim_{x \to a} f(x) = \lim_{x \to a} g(x) = 0$, or (b) $\lim_{x \to a} g(x) = \infty$ (no condition on f)
Then $\lim_{x \to a} \frac{f(x)}{g(x)} = L$. The same result holds for one-sided limits.
The full proof is a behemoth—we postpone this until after several examples. In part because of this,
and because examples can often be evaluated more instructively using elementary methods (as in the
above example), l'Hôpital's rule is often discouraged in elementary calculus.
Examples 3.25. 1. If $f(x) = e^{4x}$ and $g(x) = 21x - 17$, then $\lim_{x \to \infty} \frac{f(x)}{g(x)}$ has type $\frac{\infty}{\infty}$. By l'Hôpital's rule,
$$\lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} \frac{4e^{4x}}{21} = \infty \implies \lim_{x \to \infty} \frac{e^{4x}}{21x - 17} = \infty$$
2. For an example of type $\frac{0}{0}$, consider $f(x) = x^2 - 9$ and $g(x) = \ln(4 - x)$:
$$\lim_{x \to 3} \frac{f'(x)}{g'(x)} = \lim_{x \to 3} \frac{2x}{-1/(4 - x)} = \lim_{x \to 3} 2x(x - 4) = -6 \implies \lim_{x \to 3} \frac{x^2 - 9}{\ln(4 - x)} = -6$$
3. One can apply the rule repeatedly: for example
$$\lim_{x \to 0} \frac{e^{4x} - 1 - 4x}{x^2} = \lim_{x \to 0} \frac{4e^{4x} - 4}{2x} = \lim_{x \to 0} \frac{16e^{4x}}{2} = 8$$
This is a generally accepted abuse of protocol: one shouldn't really state the first limit until one
knows the last limit exists! As long as everything works, you are fine. However. . .
4. It is crucially important that the limit $\lim \frac{f'}{g'}$ exists before applying l'Hôpital's rule! Consider
$f(x) = x + \cos x$ and $g(x) = x$: certainly $\lim_{x \to \infty} \frac{f(x)}{g(x)}$ has type $\frac{\infty}{\infty}$, however
$$\lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} (1 - \sin x)$$
does not exist! In this case the rule is unnecessary: appealing to the squeeze theorem,
$$\frac{f(x)}{g(x)} = 1 + \frac{\cos x}{x} \xrightarrow[x \to \infty]{} 1$$
5. For another reason why l'Hôpital's rule is often prohibited in Freshman calculus, consider
$$\lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{\cos x}{1} = 1$$
This appears legitimate. However, recall (Exercise 3.28.12) that this limit is used to demonstrate
$\frac{d}{dx} \sin x = \cos x$; to use this to calculate the limit on which it depends is circular logic!
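The fourth example can be illustrated numerically: the quotient $f/g$ settles near 1 while $f'/g' = 1 - \sin x$ keeps returning to both 0 and 2. The sketch below is only an illustration; the function names and the choice $k = 1000$ are ours.

```python
import math

def ratio(x):
    """f(x)/g(x) for f(x) = x + cos x and g(x) = x."""
    return (x + math.cos(x)) / x

def derivative_ratio(x):
    """f'(x)/g'(x) = 1 - sin x."""
    return 1.0 - math.sin(x)

k = 1000                                         # any large integer would do
x_zero = math.pi / 2 + 2 * math.pi * k           # sin x = 1 here, so f'/g' = 0
x_two = 3 * math.pi / 2 + 2 * math.pi * k        # sin x = -1 here, so f'/g' = 2
print(ratio(x_zero), ratio(x_two))
print(derivative_ratio(x_zero), derivative_ratio(x_two))
```

Since both values 0 and 2 recur for arbitrarily large x, condition 1 of the theorem fails and the rule says nothing about this limit.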
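The remaining indeterminate forms (Definition 3.22) may be modified so that l'Hôpital's rule applies.
Examples 3.26. 1. An indeterminate form of type $\infty - \infty$ may be transformed to one of type $\frac{0}{0}$ before
applying the rule (twice):
$$\lim_{x \to 0^+} \left( \frac{1}{e^x - 1} - \frac{1}{x} \right) = \lim_{x \to 0^+} \frac{x + 1 - e^x}{x(e^x - 1)} \quad \text{(type } \tfrac{0}{0}\text{)}$$
$$= \lim_{x \to 0^+} \frac{1 - e^x}{e^x - 1 + xe^x} \quad \text{(still type } \tfrac{0}{0}\text{)}$$
$$= \lim_{x \to 0^+} \frac{-e^x}{2e^x + xe^x} = -\frac{1}{2}$$
2. For an indeterminate form of type $1^\infty$, we use the log laws & continuity of the exponential:
$$\lim_{x \to 0^+} (1 + \sin x)^{1/x} = \exp\left( \lim_{x \to 0^+} \frac{1}{x} \ln(1 + \sin x) \right) \quad \text{(type } \tfrac{0}{0}\text{)}$$
$$= \exp\left( \lim_{x \to 0^+} \frac{\cos x}{1 + \sin x} \right) = e^1 = e$$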
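Both limits in Examples 3.26 can be sanity-checked by evaluating the expressions at small positive x. This is a rough sketch under our own naming (`diff_form`, `power_form`); it gives evidence for the values $-\frac{1}{2}$ and e, not a proof.

```python
import math

def diff_form(x):
    """1/(e^x - 1) - 1/x, an 'infinity minus infinity' form as x -> 0+."""
    return 1.0 / math.expm1(x) - 1.0 / x

def power_form(x):
    """(1 + sin x)^(1/x), a '1 to the infinity' form as x -> 0+."""
    return (1.0 + math.sin(x)) ** (1.0 / x)

for x in (1e-1, 1e-2, 1e-3):
    print(x, diff_form(x), power_form(x))
```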
Proving l'Hôpital's Rule
The complete argument is very lengthy. It starts with an extension of the Mean Value Theorem.
Lemma 3.27 (Extended Mean Value Theorem). Fix $a < b$, suppose f, g are continuous on $[a, b]$ and
differentiable on $(a, b)$. Then there exists $\xi \in (a, b)$ such that
$$\big( f(b) - f(a) \big) g'(\xi) = \big( g(b) - g(a) \big) f'(\xi)$$
Proof. Apply the standard mean value theorem (really Rolle's theorem) to
$$h(t) = \big( f(b) - f(a) \big) g(t) - \big( g(b) - g(a) \big) f(t)$$
which satisfies $h(a) = h(b)$.
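Now for the main event. If you do nothing else, read the following proof of the simplest case. Everything else is a modification.
Proof (Case (a)/type $\frac{0}{0}$, with right limits). Suppose we have a form of type $\frac{0}{0} = \lim_{x \to a^+} \frac{f(x)}{g(x)}$ taking right-limits
at a finite location a, and that the resulting limit L is finite.
First observe that condition 1 forces the existence of an interval $(a, b)$ on which f, g are differentiable
and $g'(x) \ne 0$. Everything follows from the definition of the limit in condition 1, and Lemma 3.27:
$$\text{Given } \epsilon > 0,\ \exists \delta \in (0, b - a) \text{ such that } a < \xi < a + \delta \implies \left| \frac{f'(\xi)}{g'(\xi)} - L \right| < \frac{\epsilon}{2} \tag{$*$}$$
$$a < y < x < a + \delta \implies \exists \xi \in (y, x) \text{ such that } \frac{f(x) - f(y)}{g(x) - g(y)} = \frac{f'(\xi)}{g'(\xi)} \tag{$\dagger$}$$
In $(*)$, the clause "$\exists \delta \in (0, b - a)$ such that $a < \xi < a + \delta$" is the blue part, while "Given $\epsilon > 0$" and the final inequality are the green parts.
Since $g' \ne 0$, the usual mean value theorem says
$$\exists c \in (y, x) \text{ such that } g(x) - g(y) = g'(c)(x - y) \ne 0$$
whence we never divide by zero in $(\dagger)$. Combining $(*)$ and $(\dagger)$, observe that
$$a < x < a + \delta \implies \left| \frac{f(x)}{g(x)} - L \right| \overset{2(a)}{=} \lim_{y \to a^+} \left| \frac{f(x) - f(y)}{g(x) - g(y)} - L \right| \overset{(\dagger)}{=} \lim_{y \to a^+} \left| \frac{f'(\xi)}{g'(\xi)} - L \right| \overset{(*)}{\le} \frac{\epsilon}{2} < \epsilon$$
Note that $a < y < \xi(x, y) < x$ is a function of x, y here! Since $\epsilon > 0$ is arbitrary, this is the required
result.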
A complete proof for all indeterminate forms of type $\frac{0}{0}$ follows from some simple modifications.
• If $a = -\infty$: Replace the blue part of $(*)$ as follows:
$$\text{Given } \epsilon > 0,\ \exists m \le b \text{ such that } \xi < m \implies \left| \frac{f'(\xi)}{g'(\xi)} - L \right| < \frac{\epsilon}{2}$$
The rest of the proof goes through after replacing a with $-\infty$ and $a + \delta$ with m.
• If $L = \infty$: Replace the green parts of $(*)$ with "Given $M > 0$" and "$\frac{f'(\xi)}{g'(\xi)} > 2M$". Fixing the rest of the
proof is again straightforward.
• If $L = -\infty$: Replace the green parts of $(*)$ with "Given $M > 0$" and "$\frac{f'(\xi)}{g'(\xi)} < -2M$".
• Left-limits: If f, g are differentiable on $(c, a)$, then the blue part may be replaced with either:
  (a finite) $\exists \delta \in (0, a - c)$ such that $a - \delta < \xi < a$
  ($a = \infty$) $\exists m \ge c$ such that $\xi > m$
The blue and green parts of $(*)$ may be replaced independently.
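Proof (Case (b), $\lim g(x) = \infty$). This requires a little more care.¹⁵ Since $g' \ne 0$, and $\lim_{x \to a^+} g(x) = \infty$,
Exercise 3.29.6 says that g is strictly decreasing on $(a, b)$. By replacing b by some $\tilde{b} \in (a, b)$, if necessary,
we may assume that
$$a < y < x < b \implies 0 < g(x) < g(y) \tag{$\ddagger$}$$
Assume a and L are finite and obtain $(*)$ and $(\dagger)$ as before. Let $x \in (a, a + \delta)$ be fixed and multiply
$(\dagger)$ by $\frac{g(y) - g(x)}{g(y)}$ (this is positive by $(\ddagger)$): a little algebra and the triangle inequality tell us that
$$a < y < x \implies \frac{f(y)}{g(y)} = \frac{f'(\xi)}{g'(\xi)} + \frac{f(x)}{g(y)} - \frac{g(x)}{g(y)} \cdot \frac{f'(\xi)}{g'(\xi)}$$
$$\implies \left| \frac{f(y)}{g(y)} - L \right| \le \left| \frac{f'(\xi)}{g'(\xi)} - L \right| + \frac{1}{g(y)} \left( |f(x)| + |g(x)| \left( |L| + \frac{\epsilon}{2} \right) \right)$$
Since $\lim_{y \to a^+} g(y) = \infty$ and x is fixed, we see that there exists $\eta \le x - a < \delta$ such that
$$y \in (a, a + \eta) \implies \frac{1}{g(y)} \left( |f(x)| + |g(x)| \left( |L| + \frac{\epsilon}{2} \right) \right) < \frac{\epsilon}{2}$$
Finally combine with $(*)$: given $\epsilon > 0$, $\exists \eta > 0$ such that $y \in (a, a + \eta) \implies \left| \frac{f(y)}{g(y)} - L \right| < \epsilon$.
The same modifications listed above complete the proof.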
¹⁵ Forms of type $\frac{\infty}{\infty}$? Instead of assumption 2. (b), why not simply assume $\lim f = \lim g = \infty$ and write $\frac{f}{g} = \frac{1/g}{1/f}$ to obtain
a form of type $\frac{0}{0}$? The problem is that the derivative of the 'new' denominator $\frac{d}{dx} \frac{1}{f} = -\frac{f'}{f^2}$ need not be non-zero on any
interval $(a, b)$ and so condition 1. need not hold. We could modify this, but it would make for a weaker theorem. Example
3.25.4 illustrates the issue: $f'(x) = 1 - \sin x$ has zeros on any unbounded interval.
After the 2. (b) case is proved and we know that $\lim \frac{f}{g} = L$, it is then clear that $\lim f$ must also be infinite (unless $L = 0$ in
which case $\lim f$ could be anything and need not exist). This situation therefore really does deal with forms of type $\frac{\infty}{\infty}$.
Exercises 3.30. Key concepts: Types of indeterminate forms, Formal statement of l'Hôpital's rule
1. Evaluate the limits, if they exist:
(a) $\lim_{x \to 0} \frac{x^3}{\sin x - x}$ (b) $\lim_{x \to \frac{\pi}{2}} \left( \tan x - \frac{2}{\pi - 2x} \right)$ (c) $\lim_{x \to 0} (\cos x)^{1/x^2}$
(d) $\lim_{x \to 0} (1 + 2x)^{1/x}$ (e) $\lim_{x \to \infty} (e^x + x)^{1/x}$
2. Suppose f is differentiable on $(c, \infty)$ and that $\lim_{x \to \infty} [f(x) + f'(x)] = L$ is finite.
(a) Prove that $\lim_{x \to \infty} f(x) = L$ and that $\lim_{x \to \infty} f'(x) = 0$.
(Hint: write $f(x) = \frac{f(x) e^x}{e^x}$)
(b) Does anything change if L exists and is infinite?
3. If $p_n(x)$ is a polynomial of degree n, use induction to prove that $\lim_{x \to \infty} p_n(x) e^{-x} = 0$.
4. Let $f(x) = x + \sin x \cos x$, $g(x) = e^{\sin x} f(x)$ and $h(x) = \frac{2 \cos x}{e^{\sin x} (f(x) + 2 \cos x)}$.
(a) Prove that $\lim_{x \to \infty} f(x) = \infty = \lim_{x \to \infty} g(x)$ but that $\lim_{x \to \infty} \frac{f(x)}{g(x)}$ does not exist.
(b) If $\cos x \ne 0$, and x is large, show that $\frac{f'(x)}{g'(x)} = h(x)$.
(c) Prove that $\lim_{x \to \infty} h(x) = 0$. Explain why this does not contradict part (a)!
3.31 Taylor's Theorem
A primary goal of power series is the approximation of functions. With this in mind, there are two
natural questions to ask of a function f:
1. Given $c \in \operatorname{dom}(f)$, is there a series $\sum a_n (x - c)^n$ which equals $f(x)$ on an interval containing c?
2. If we take the first n terms of such a series, how accurate is this polynomial approximation?
Example 3.28. Recall the geometric series
$$f(x) = \frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n \quad \text{whenever } -1 < x < 1$$
The polynomial approximation
$$p_n(x) = \sum_{k=0}^{n} x^k = 1 + x + \cdots + x^n = \frac{1 - x^{n+1}}{1 - x}$$
has error
$$R_n(x) = f(x) - p_n(x) = \frac{x^{n+1}}{1 - x}$$
[Figure: graphs of $f(x)$ and $p_3(x) = 1 + x + x^2 + x^3$ for $-1 < x < 1$]
If x is close to 0, this is likely very small; for instance if $x \in \left[ -\frac{1}{2}, \frac{1}{2} \right]$, then
$$|R_n(x)| \le \frac{1}{1 - \frac{1}{2}} \left( \frac{1}{2} \right)^{n+1} = 2^{-n}$$
However, when x is close to 1 the error is unbounded!
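The remainder formula and the bound $2^{-n}$ can be confirmed in exact rational arithmetic. The following is a small sketch; the function name `remainder` and the sample point are our own choices.

```python
from fractions import Fraction

def remainder(x, n):
    """R_n(x) = f(x) - p_n(x) for f(x) = 1/(1 - x), computed exactly."""
    f = 1 / (1 - x)
    p_n = sum(x ** k for k in range(n + 1))
    return f - p_n

x = Fraction(1, 2)
for n in range(6):
    r = remainder(x, n)
    # compare with the closed form x^(n+1)/(1 - x) and the bound 2^(-n)
    print(n, r, r == x ** (n + 1) / (1 - x), abs(r) <= Fraction(1, 2 ** n))
```

At $x = \frac{1}{2}$ the bound is attained exactly, so it cannot be improved on the interval $\left[ -\frac{1}{2}, \frac{1}{2} \right]$.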
The above behavior occurs in general: the truncated polynomials provide better approximations
nearer the center of the series. To see this, we first need to consider higher-order derivatives.
Definition 3.29. We write $f''$ for the second derivative of f, namely the derivative of its derivative
$$f''(a) = \lim_{x \to a} \frac{f'(x) - f'(a)}{x - a}$$
The existence of $f''(a)$ presupposes that $f'$ exists on an (open) interval containing a. We can similarly
consider third, fourth, and higher-order derivatives. As a function, the nth derivative is written
$$f^{(n)}(x) = \frac{d^n f}{dx^n}$$
By convention, the zeroth derivative is the function itself $f^{(0)}(x) = f(x)$. We say that f is n times
differentiable at a if $f^{(n)}(a)$ exists, and infinitely differentiable (or smooth) if derivatives of all orders exist.
Example 3.30. $f(x) = x^2 |x|$ is twice differentiable, with $f''(x) = 6|x|$. It is smooth everywhere
except at $x = 0$, where third (and higher-order) derivatives do not exist.
Definition 3.31. Suppose f is n times differentiable at $x = c$. The nth Taylor polynomial $p_n$ of f
centered at c is
$$p_n(x) := \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!} (x - c)^k = f(c) + f'(c)(x - c) + \frac{f''(c)}{2} (x - c)^2 + \cdots + \frac{f^{(n)}(c)}{n!} (x - c)^n$$
The remainder $R_n(x)$ is the error in the polynomial approximation
$$R_n(x) = f(x) - p_n(x) = f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!} (x - c)^k$$
If f is infinitely differentiable at $x = c$, then its Taylor series centered at $x = c$ is the power series
$$T_c f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(c)}{n!} (x - c)^n$$
When $c = 0$ this is known as a Maclaurin series.¹⁶
For simplicity we'll mostly work with Maclaurin series, with the general situation hopefully being clear.
Examples 3.32. 1. If $f(x) = e^{3x}$, then $f^{(n)}(x) = 3^n e^{3x}$, from which the Maclaurin series is
$$T_0 f(x) = \sum_{n=0}^{\infty} \frac{3^n}{n!} x^n$$
2. If $g(x) = \sin 7x$, then the sequence of derivatives is
$$7 \cos 7x,\ -7^2 \sin 7x,\ -7^3 \cos 7x,\ 7^4 \sin 7x,\ 7^5 \cos 7x,\ -7^6 \sin 7x,\ \ldots$$
At $x = 0$, every even derivative is zero whereas the odd derivatives alternate in sign. The
Maclaurin series is easily seen to be
$$T_0 g(x) = \sum_{n=0}^{\infty} \frac{(-1)^n 7^{2n+1}}{(2n + 1)!} x^{2n+1}$$
3. If $h(x) = \sqrt{x}$, then $h'(x) = \frac{1}{2} x^{-1/2}$, $h''(x) = -\frac{1}{2^2} x^{-3/2}$, and $h'''(x) = \frac{3}{2^3} x^{-5/2}$, from which the
third Taylor polynomial centered at $c = 1$ is
$$p_3(x) = h(1) + h'(1)(x - 1) + \frac{h''(1)}{2} (x - 1)^2 + \frac{h'''(1)}{6} (x - 1)^3$$
$$= 1 + \frac{1}{2} (x - 1) - \frac{1}{8} (x - 1)^2 + \frac{1}{16} (x - 1)^3$$
Rather than computing further examples, we first develop a little theory that makes verifying Taylor
series much easier.
¹⁶ Named for Englishman Brook Taylor (1685–1731) and Scotsman Colin Maclaurin (1698–1746). Taylor's general method
expanded on examples discovered by James Gregory and Isaac Newton in the mid-to-late 1600s.
Differentiation of Taylor Polynomials and Series
Suppose $P(x) = \sum a_j x^j$ is a power series with radius of convergence $R > 0$. As we saw previously
(Theorem 2.31), $P(x)$ is differentiable term-by-term on $(-R, R)$. Indeed,
$$P'(x) = \sum_{j=1}^{\infty} a_j j x^{j-1} \implies P'(0) = a_1$$
$$P''(x) = \sum_{j=2}^{\infty} a_j j(j - 1) x^{j-2} \implies P''(0) = 2a_2$$
$$P'''(x) = \sum_{j=3}^{\infty} a_j j(j - 1)(j - 2) x^{j-3} \implies P'''(0) = 3!\,a_3$$
$$\vdots$$
$$P^{(k)}(x) = \sum_{j=k}^{\infty} a_j j(j - 1) \cdots (j - k + 1) x^{j-k} = \sum_{j=k}^{\infty} \frac{j!\,a_j}{(j - k)!} x^{j-k} \implies P^{(k)}(0) = k!\,a_k$$
Otherwise said, P is its own Maclaurin series! The same discussion holds for polynomials. Indeed if
$P(x) = a_0 + a_1 x + \cdots + a_n x^n$ is a polynomial and f a function, then
$$P^{(k)}(0) = f^{(k)}(0) \iff a_k = \frac{f^{(k)}(0)}{k!}$$
If this holds for all $k \le n$, then $P = p_n$ is the nth Taylor polynomial of f! With a little modification,
we've proved the following:
Theorem 3.33. 1. If $f(x) = \sum a_n (x - c)^n$ is a power series defined on a neighborhood of c, then
$T_c f(x) = f(x)$: the function is its own Taylor series!
2. The nth Taylor polynomial of f centered at $x = c$ is the unique polynomial $p_n$ of degree $\le n$
whose value and first n derivatives agree with those of f at $x = c$: that is
$$\forall k \le n, \quad p_n^{(k)}(c) = f^{(k)}(c)$$
This answers our first motivating question: a function can equal at most one power series with a
given center. The second question requires a careful study of the remainder: we'll do this shortly.
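Examples 3.34 (Common Maclaurin Series). These should be familiar from elementary calculus.
Each function equals the given series from our previous discussions of power series: by the Theorem,
each series is immediately the Maclaurin series of the given function.
$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, $x \in \mathbb{R}$
$\frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n$, $x \in (-1, 1)$
$\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n + 1)!} x^{2n+1}$, $x \in \mathbb{R}$
$\ln(1 + x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n$, $x \in (-1, 1]$
$\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n}$, $x \in \mathbb{R}$
$\tan^{-1} x = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n + 1} x^{2n+1}$, $x \in [-1, 1]$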
Examples 3.35 (Modifying Maclaurin Series). By substituting for x in a common series, we quickly
obtain new series.
1. Substitute $x \mapsto 7x$ in the Maclaurin series for $\sin x$, to recover our earlier example
$$\sin 7x = \sum_{n=0}^{\infty} \frac{(-1)^n 7^{2n+1}}{(2n + 1)!} x^{2n+1}, \quad x \in \mathbb{R}$$
Note how this requires almost no calculation: since the function equals a series, the Theorem
says we have the Maclaurin series for $\sin 7x$!
2. Substitute $x \mapsto x^2$ in the Maclaurin series for $e^x$ to obtain
$$e^{x^2} = \exp(x^2) = \sum_{n=0}^{\infty} \frac{1}{n!} x^{2n}, \quad x \in \mathbb{R}$$
This would be disgusting to verify directly, given the difficulty of repeatedly differentiating $e^{x^2}$.
3. We find the Taylor series for $f(x) = \frac{1}{5 - x}$ centered at $x = 2$:
$$f(x) = \frac{1}{3 + 2 - x} = \frac{1}{3 \left( 1 - \frac{x - 2}{3} \right)} = \frac{1}{3} \sum_{n=0}^{\infty} \left( \frac{x - 2}{3} \right)^n$$
which is valid whenever $-1 < \frac{x - 2}{3} < 1 \iff -1 < x < 5$.
4. Fix $c \in \mathbb{R}$ and observe that, for all $x \in \mathbb{R}$,
$$e^x = e^{c + x - c} = e^c e^{x - c} = \sum_{n=0}^{\infty} \frac{e^c}{n!} (x - c)^n$$
We conclude that the series is the Taylor series of $e^x$ centered at $x = c$. Of course this is easily
verified using the definition, since $\frac{d^n}{dx^n}\big|_{x=c} e^x = e^c$.
5. Combining the Theorem with the multiple-angle formula, we obtain the Taylor series for $\sin x$
centered at $x = c$:
$$\sin x = \sin(c + x - c) = \sin c \cos(x - c) + \cos c \sin(x - c)$$
$$= \sum_{n=0}^{\infty} \frac{(-1)^n \sin c}{(2n)!} (x - c)^{2n} + \sum_{n=0}^{\infty} \frac{(-1)^n \cos c}{(2n + 1)!} (x - c)^{2n+1}$$
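The third example, the series for $\frac{1}{5 - x}$ centered at 2, is easy to test with exact fractions. This is a sketch; the helper name `partial_sum` and the sample point $x = 3$ are our own choices.

```python
from fractions import Fraction

def partial_sum(x, N):
    """(1/3) * sum_{n=0}^{N} ((x - 2)/3)^n, the truncated series centered at 2."""
    r = (x - 2) / 3
    return Fraction(1, 3) * sum(r ** n for n in range(N + 1))

x = Fraction(3)                 # a sample point inside (-1, 5)
exact = 1 / (5 - x)             # f(3) = 1/2
errors = [abs(exact - partial_sum(x, N)) for N in (5, 10, 20)]
print([float(e) for e in errors])
```

The errors shrink geometrically with ratio $\frac{x - 2}{3}$, which is why convergence slows down as x approaches either endpoint of $(-1, 5)$.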
Definition 3.36. A function f is analytic on its domain if every $c \in \operatorname{dom} f$ has a neighborhood on
which $f(x)$ equals its Taylor series centered at c.
All the examples we’ve thus far seen are analytic on their domains; indeed the last two of Exam-
ples 3.35 prove this for the exponential and sine functions. Every analytic function is automatically
smooth (infinitely differentiable), however the converse is false (Exercise 10). Analyticity is of greater
importance in complex analysis where (amazingly!) it is equivalent to complex-differentiability.
Accuracy of Taylor Approximations
Our final goal is to estimate the accuracy of a Taylor polynomial as an approximation to its generating
function. Otherwise said, we want to estimate the size of the remainder $R_n(x) = f(x) - p_n(x)$.
Theorem 3.37 (Taylor's Theorem: Lagrange's form). Suppose f is $n + 1$ times differentiable on an
open interval I containing c and let $x \in I \setminus \{c\}$. Then there exists $\xi$ between c and x for which the
remainder centered at c satisfies
$$R_n(x) = \frac{f^{(n+1)}(\xi)}{(n + 1)!} (x - c)^{n+1}$$
Proof. For simplicity let $c = 0$. Fix $x \ne 0$, define a constant $M_x$ and a function $g : I \to \mathbb{R}$ by
$$R_n(x) = \frac{M_x}{(n + 1)!} x^{n+1} \quad \text{and} \quad g(t) = \frac{M_x}{(n + 1)!} t^{n+1} + p_n(t) - f(t) = \frac{M_x}{(n + 1)!} t^{n+1} - R_n(t)$$
Observe that
$$k \le n + 1 \implies g^{(k)}(t) = \frac{M_x}{(n + 1 - k)!} t^{n+1-k} + p_n^{(k)}(t) - f^{(k)}(t) \tag{$*$}$$
$$\implies g^{(k)}(0) = p_n^{(k)}(0) - f^{(k)}(0) = 0 \text{ if } k \le n$$
where we invoked Theorem 3.33.
Now apply Rolle's Theorem repeatedly (WLOG assume $x > 0$), starting from $g(0) = 0 = g(x)$:
• $\exists \xi_1$ between 0 and x such that $g'(\xi_1) = 0$.
• $\exists \xi_2$ between 0 and $\xi_1$ such that $g''(\xi_2) = 0$, etc.
Iterate to obtain a sequence $(\xi_k)$ such that
$$0 < \xi_{n+1} < \xi_n < \cdots < \xi_1 < x \quad \text{and} \quad g^{(k)}(\xi_k) = 0$$
Take $\xi = \xi_{n+1}$ and consider $(*)$: since $\deg p_n \le n$, we see that
$$0 = g^{(n+1)}(\xi) = M_x - f^{(n+1)}(\xi) \implies R_n(x) = f(x) - p_n(x) = \frac{f^{(n+1)}(\xi)}{(n + 1)!} x^{n+1}$$
Corollary 3.38. Suppose f is smooth on an open interval I containing c and that the derivatives $f^{(n)}$
of all orders are bounded on I by a common constant. Then f equals its Taylor series (centered at c) on I.
Proof. For simplicity, let $c = 0$. Suppose $\left| f^{(n+1)}(\xi) \right| \le K$ for all n and all $\xi \in I$. Choose any integer $N > |x|$ and
observe that
$$n > N \implies |R_n(x)| \le \frac{K |x|^{n+1}}{(n + 1)!} = \frac{K |x|^{n+1}}{N!\,(N + 1) \cdots (n + 1)} \le \frac{K |x|^N}{N!} \left( \frac{|x|}{N} \right)^{n+1-N} \xrightarrow[n \to \infty]{} 0$$
Examples 3.39. 1. The functions sine and cosine have derivatives bounded by 1 on R, and thus both
functions equal their Maclaurin series on R. This removes the need to have previously justified
these facts using the theory of differential equations.
2. The exponential function does not have bounded derivatives, however we can still apply Taylor's Theorem. For any fixed x, $\exists \xi$ between 0 and x such that
$$|R_n(x)| = \left| \frac{e^\xi}{(n + 1)!} x^{n+1} \right| \xrightarrow[n \to \infty]{} 0$$
by the same argument as in the Corollary. Thus $e^x$ equals its Maclaurin series on the real line (we
knew this already from Exercise 3.28.14).
3. Extending Example 3.32.3, we see that $h(x) = \sqrt{x}$ has linear approximation (1st Taylor polynomial) centered at $c = 9$
$$p_1(x) = h(9) + h'(9)(x - 9) = 3 + \frac{1}{6} (x - 9)$$
This yields the simple approximation
$$\sqrt{10} \approx p_1(10) = 3 + \frac{1}{6} = \frac{19}{6}$$
Taylor's Theorem can be used to estimate its accuracy (remember to shift the center to 9!):
$$R_1(10) = \frac{h''(\xi)}{2!} (10 - 9)^2 = -\frac{1}{2^2 \cdot 2!} \xi^{-3/2} = -\frac{1}{8 \xi^{3/2}} \quad \text{for some } \xi \in (9, 10)$$
Certainly $\xi^{-3/2} < 9^{-3/2} = \frac{1}{27}$, whence
$$-\frac{1}{216} < R_1(10) < 0 \implies \frac{19}{6} - \frac{1}{216} = \frac{683}{216} < \sqrt{10} < \frac{684}{216} = \frac{19}{6}$$
$\frac{19}{6}$ is therefore an overestimate for $\sqrt{10}$, but is accurate to within $\frac{1}{216} < 0.005$.
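The bracket for $\sqrt{10}$ can be verified without computing any square root, by comparing squares in exact arithmetic. A minimal sketch follows; the variable names are ours.

```python
from fractions import Fraction

estimate = Fraction(19, 6)                  # p_1(10) = 3 + 1/6
lower = estimate - Fraction(1, 216)         # 683/216, from -1/216 < R_1(10) < 0

# Compare squares so that no floating-point square root is needed.
print(lower, lower ** 2 < 10 < estimate ** 2)
print(float(estimate - lower))              # width of the bracket
```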
Alternative Versions of Taylor's Theorem
There are two further common expressions for the remainder in Taylor's Theorem. These are typically less easy to use than Lagrange's form but can sometimes provide sharper estimates for the
remainder, particularly when x is far from the center of the series.
Corollary 3.40. Suppose $f^{(n+1)}$ is continuous on an open interval I containing c, let $x \in I \setminus \{c\}$, and
let $R_n(x) = f(x) - p_n(x)$ be the remainder for the Taylor polynomial centered at c. Then:
1. (Integral Remainder) $R_n(x) = \int_c^x \frac{(x - t)^n}{n!} f^{(n+1)}(t)\,dt$
2. (Cauchy's Form) $\exists \xi$ between c and x such that $R_n(x) = \frac{(x - \xi)^n}{n!} (x - c) f^{(n+1)}(\xi)$
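Using these expressions it is possible to explicitly prove Newton's binomial series formula.
Corollary 3.41. If $\alpha \in \mathbb{R}$ and $|x| < 1$, then
$$(1 + x)^\alpha = 1 + \sum_{n=1}^{\infty} \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!} x^n$$
$$= 1 + \alpha x + \frac{\alpha(\alpha - 1)}{2!} x^2 + \frac{\alpha(\alpha - 1)(\alpha - 2)}{3!} x^3 + \frac{\alpha(\alpha - 1)(\alpha - 2)(\alpha - 3)}{4!} x^4 + \cdots$$
If $\alpha \in \mathbb{N}_0$, this is the usual binomial theorem. Otherwise it is more interesting: for instance,
$$\sqrt{1 + x} = (1 + x)^{1/2} = 1 + \frac{1}{2} x - \frac{1}{8} x^2 + \frac{1}{16} x^3 - \frac{5}{128} x^4 + \cdots$$
$$\frac{1}{(1 + x)^3} = 1 - 3x + 6x^2 - 10x^3 + 15x^4 - \cdots$$
Of course this last could easily be obtained from $\frac{1}{1 + x} = \sum (-1)^n x^n$ by differentiating twice!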
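The two sample expansions can be reproduced from the coefficient formula. The sketch below uses the recurrence $a_n = a_{n-1} \cdot \frac{\alpha - n + 1}{n}$, which follows directly from the product in the Corollary; the function name `binomial_coefficients` is our own.

```python
from fractions import Fraction

def binomial_coefficients(alpha, count):
    """First `count` coefficients of the binomial series for (1 + x)^alpha."""
    coeffs = [Fraction(1)]
    for n in range(1, count):
        # a_n = a_{n-1} * (alpha - n + 1) / n
        coeffs.append(coeffs[-1] * (alpha - n + 1) / n)
    return coeffs

print(binomial_coefficients(Fraction(1, 2), 5))   # coefficients for sqrt(1 + x)
print(binomial_coefficients(Fraction(-3), 5))     # coefficients for 1/(1 + x)^3
```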
Exercises 3.31. Key concepts: Taylor Series/Polynomials, Lagrange's form for Remainder
1. Compute the Maclaurin series for $\cos x$ directly from the definition and use Taylor's Theorem
to indicate why it converges to $\cos x$ for all $x \in \mathbb{R}$.
2. Repeat the previous exercise for $\sinh x = \frac{1}{2} (e^x - e^{-x})$ and $\cosh x = \frac{1}{2} (e^x + e^{-x})$.
3. Find the Maclaurin series for the function $\sin(3x^2)$. How do you know you are correct?
4. Find the Taylor series of $f(x) = x^4 - 3x^2 + 2x - 5$ at $x = 2$ and show that $T_2 f(x) = f(x)$.
5. Find a rational approximation to $\sqrt[3]{9}$ using the first Taylor polynomial for $f(x) = \sqrt[3]{x}$. Now use
Taylor's Theorem to estimate its accuracy.
6. If $c \ne 1$, use the fact that $1 - x = (1 - c) \left( 1 - \frac{x - c}{1 - c} \right)$ to obtain the Taylor series of $\frac{1}{1 - x}$ centered at
c. Hence conclude that $\frac{1}{1 - x}$ is analytic on its domain $\mathbb{R} \setminus \{1\}$.
7. We prove that the Maclaurin series $\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n$ converges to $\ln(1 + x)$ whenever $0 < x \le 1$.
(a) Explicitly compute $\frac{d^{n+1}}{dx^{n+1}} \ln(1 + x)$.
(b) Suppose $0 < x \le 1$. Using Taylor's Theorem, prove that $\lim_{n \to \infty} R_n(x) = 0$.
(If $-1 < x < 0$, the argument is tougher, being similar to Exercise 11)
8. Why can't we use Taylor's Theorem to approximate the error in $\frac{1}{1 - x} = 1 + x + R_1(x)$ when
$x \ge 1$? Try it when $x = 2$, what happens? What about when $x = -2$?
9. Prove Taylor's Theorem with integral remainder when $c = 0$ by using the following as an
induction step: for each $n \in \mathbb{N}$, define
$$A_n(x) = \int_0^x \frac{(x - t)^n}{n!} f^{(n+1)}(t)\,dt$$
and use integration by parts to prove that $A_{n+1} = A_n - \frac{x^{n+1}}{(n + 1)!} f^{(n+1)}(0)$.
(The Cauchy form follows from the intermediate value theorem for integrals which we'll see later)
10. Consider the function
$$f(x) = \begin{cases} e^{-1/x} & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
(a) Prove by induction that there exists a degree 2n polynomial $q_n$ for which
$$f^{(n)}(x) = q_n\!\left( \frac{1}{x} \right) e^{-1/x} \quad \text{whenever } x > 0$$
(b) Prove that f is infinitely differentiable at $x = 0$ with $f^{(n)}(0) = 0$ (use Exercise 3.30.3).
The Maclaurin series of f is identically zero! Moreover, f is smooth (infinitely differentiable) on $\mathbb{R}$ but
non-analytic at zero since it does not equal its Taylor series on any open interval containing zero.
A modification allows us to create bump functions, which find wide use in analysis. If $a < b$, define
$$g_{a,b} : x \mapsto f(x - a) f(b - x)$$
This is smooth on $\mathbb{R}$ but non-zero only on the interval $(a, b)$. A
further modification involving two such functions $g_{a,b}$ creates
a smooth function on $\mathbb{R}$ which satisfies
$$h_{a,b,\epsilon}(x) = \begin{cases} 0 & \text{if } x \le a - \epsilon \text{ or } x \ge b + \epsilon \\ 1 & \text{if } a \le x \le b \end{cases}$$
This 'switches on' rapidly from 0 to 1 near a and switches off
similarly near b. By letting $\epsilon$ be small, we smoothly (but not
uniformly) approximate the indicator function on $[a, b]$.
[Figure: graph of $h_{a,b,\epsilon}(x)$, rising from 0 to 1 between $a - \epsilon$ and a, and falling back to 0 between b and $b + \epsilon$]
11. (Hard) We prove the binomial series formula (Corollary 3.41).
Let $f(x) = (1 + x)^\alpha$ and $g(x) = 1 + \sum_{n=1}^{\infty} a_n x^n$ where $a_n = \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!}$. Our goal is to prove
that $f = g$ on the interval $(-1, 1)$.
(a) Check that $f^{(n)}(0) = n!\,a_n$ so that g really is the Maclaurin series of f.
(b) i. Prove that the radius of convergence of g is 1.
ii. Prove that $\lim_{n \to \infty} n a_n x^n = 0$ whenever $|x| < 1$.
iii. If $|x| < 1$ and $\xi$ lies between 0 and x, prove that $\left| \frac{x - \xi}{1 + \xi} \right| \le |x|$.
(Hint: write $\xi = tx$ for some $t \in (0, 1)$. . . )
(c) Use Taylor's Theorem with Cauchy remainder to prove that
$$|R_n(x)| < (n + 1) |a_{n+1}| |x|^{n+1} (1 + \xi)^{\alpha - 1}$$
Hence conclude that $g = f$ whenever $|x| < 1$.
(d) Here is an alternative argument for the full result:
i. Show that $(n + 1) a_{n+1} + n a_n = \alpha a_n$.
ii. Differentiate term-by-term to prove directly that g satisfies the differential equation
$(1 + x) g'(x) = \alpha g(x)$. Solve this to show that $g = f$ whenever $|x| < 1$.
4 Integration
The theory of infinite series addresses how to sum infinitely many finite quantities. Integration, by
contrast, is the business of summing infinitely many infinitesimal quantities. Attempts to do both have
been part of mathematics for well over 2000 years, and the philosophical objections are just as old.¹⁷
The development and increased application of calculus from the late 1600s onward spurred mathe-
maticians to put the theory on a firmer footing, though from Newton and Leibniz it took another 150
years before Bernhard Riemann (1856) provided a thorough development of the integral.
4.32 The Riemann Integral
The basic idea behind Riemann integration is to approximate area using a sequence of rectangles
whose width tends to zero. The following discussion illustrates the essential idea, which should be
familiar from elementary calculus.
Example 4.1. Suppose $f(x) = x^2$ is defined on $[0, 1]$.
For each $n \in \mathbb{N}$, let $\Delta x = \frac{1}{n}$ and define $x_i = i \Delta x$.
Above each subinterval $[x_{i-1}, x_i]$, raise a rectangle of height
$f(x_i) = x_i^2$. The sum of the areas of these rectangles is the Riemann sum with right-endpoints¹⁸
$$R_n = \sum_{i=1}^{n} f(x_i) \Delta x = \sum_{i=1}^{n} \frac{i^2}{n^3} = \frac{n(n + 1)(2n + 1)}{6n^3} = \frac{1}{3} + \frac{3n + 1}{6n^2}$$
The Riemann sum with left-endpoints is defined similarly:
$$L_n = \sum_{i=1}^{n} f(x_{i-1}) \Delta x = \sum_{i=1}^{n} \frac{(i - 1)^2}{n^3} = \frac{1}{3} - \frac{3n - 1}{6n^2}$$
Since f is an increasing function, the area A under the curve
plainly satisfies
$$L_n \le A \le R_n$$
By the squeeze theorem, we conclude that $A = \frac{1}{3}$.
[Figure: right-endpoint rectangles with $n = 16$, $R_n = 0.365234$; left-endpoint rectangles with $n = 16$, $L_n = 0.302734$]
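The two Riemann sums are simple to compute exactly, which also reproduces the values quoted for $n = 16$. A short sketch, with function names of our own choosing:

```python
from fractions import Fraction

def right_sum(n):
    """R_n for f(x) = x^2 on [0, 1] with n equal subintervals."""
    return sum(Fraction(i, n) ** 2 for i in range(1, n + 1)) / n

def left_sum(n):
    """L_n for the same function and partition."""
    return sum(Fraction(i - 1, n) ** 2 for i in range(1, n + 1)) / n

for n in (4, 16, 256):
    # L_n increases and R_n decreases towards 1/3 as n grows
    print(n, float(left_sum(n)), float(right_sum(n)))
```

The gap $R_n - L_n = \frac{1}{n}$ shows directly why the squeeze theorem applies.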
The example should feel convincing, though perhaps this is due to the simplicity of the function. To
apply this approach to more general functions, we need to be significantly more rigorous.
¹⁷ Two of Zeno's ancient paradoxes are relevant here: Achilles and the Tortoise concerns a convergent infinite series,
while the Arrow Paradox toys with integration by questioning whether time can be viewed as a sum of instants. Perhaps
the most famous contemporary criticism comes from Bishop George Berkeley, who gave his name to the city and first
UC campus: in 1734's The Analyst, Berkeley savaged the foundations of calculus, describing the infinitesimal increments
required in Newton's theory of fluxions (derivatives) as merely the "ghosts of departed quantities."
¹⁸ Recall some basic identities: $\sum_{i=1}^{n} i = \frac{1}{2} n(n + 1)$, $\sum_{i=1}^{n} i^2 = \frac{1}{6} n(n + 1)(2n + 1)$, $\sum_{i=1}^{n} i^3 = \frac{1}{4} n^2 (n + 1)^2$
Definition 4.2. A partition $P = \{x_0, \ldots, x_n\}$ of an interval $[a, b]$ is a finite sequence for which
$$a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b$$
Choosing a sample point $x_i^*$ in each subinterval $[x_{i-1}, x_i]$ results in a tagged partition.
The mesh of the partition is $\operatorname{mesh}(P) := \max \Delta x_i$, the width $\Delta x_i = x_i - x_{i-1}$ of the largest subinterval.
If $f : [a, b] \to \mathbb{R}$, the Riemann sum $\sum_{i=1}^{n} f(x_i^*) \Delta x_i$ evaluates the area of a family of n rectangles, as
pictured. The heights $f(x_i^*)$ and thus areas can be negative or zero.
[Figure: rectangles of a Riemann sum for a tagged partition $a = x_0 < x_1 < \cdots < x_7 = b$ with sample points $x_1^*, \ldots, x_7^*$]
In elementary calculus, one typically computes Riemann sums for equally-spaced partitions with left,
right or middle sample points. The flexibility of tagged partitions makes applying Riemann’s defini-
tion a challenge, so we instead consider two special families of rectangles.
Definition 4.3. Given a partition P of $[a, b]$ and a bounded function f on $[a, b]$, define
$$M_i = \sup_{x \in [x_{i-1}, x_i]} f(x) \qquad U(f, P) = \sum_{i=1}^{n} M_i \Delta x_i$$
$$m_i = \inf_{x \in [x_{i-1}, x_i]} f(x) \qquad L(f, P) = \sum_{i=1}^{n} m_i \Delta x_i$$
$U(f, P)$ and $L(f, P)$ are the upper and lower Darboux sums for
f with respect to P. The upper and lower Darboux integrals are
$$U(f) = \inf U(f, P) \qquad L(f) = \sup L(f, P)$$
where the supremum/infimum are taken over all partitions.
Necessarily both integrals are finite.
We say that f is (Riemann) integrable on $[a, b]$ if $U(f) = L(f)$.
We denote this value by
$$\int_a^b f \quad \text{or} \quad \int_a^b f(x)\,dx$$
[Figure: upper Darboux sum $U(f, P)$ and lower Darboux sum $L(f, P)$ for the same partition of $[a, b]$]
If the interval is understood or irrelevant, one often simply says that f is integrable and writes $\int f$.
Intuitively, L( f , P) is the sum of the areas of rectangles built on P which just fit under the graph of f .
It is also the infimum of all Riemann sums on P. If f is discontinuous, then L( f , P) need not itself be
a Riemann sum, as there might not exist suitable sample points!
Examples 4.4. 1. We revisit Example 4.1 in this language.
Given a partition $Q = \{x_0, \ldots, x_n\}$ of $[0, 1]$ and sample points $x_i^* \in [x_{i-1}, x_i]$, we compute the
Riemann sum for $f(x) = x^2$
$$\sum_{i=1}^{n} f(x_i^*) \Delta x_i = \sum_{i=1}^{n} (x_i^*)^2 (x_i - x_{i-1})$$
Since f is increasing, we have $x_{i-1}^2 \le (x_i^*)^2 \le x_i^2$ on each interval, whence
$$L(f, Q) = \sum_{i=1}^{n} (x_{i-1})^2 (x_i - x_{i-1}) \le \sum_{i=1}^{n} (x_i^*)^2 (x_i - x_{i-1}) \le \sum_{i=1}^{n} (x_i)^2 (x_i - x_{i-1}) = U(f, Q)$$
The Darboux sums are therefore the Riemann sums for left- and right-endpoints.
If we take $Q_n$ to be the partition with subintervals of equal width $\Delta x = \frac{1}{n}$, then
$$U(f) = \inf_P U(f, P) \le U(f, Q_n) = \sum_{i=1}^{n} \left( \frac{i}{n} \right)^2 \Delta x = R_n$$
is the right Riemann sum discussed originally. Similarly $L(f) \ge L_n$. Since $L_n$ and $R_n$ both
converge to $\frac{1}{3}$ as $n \to \infty$, the squeeze theorem forces
$$L_n \le L(f) \le U(f) \le R_n \implies L(f) = U(f) = \frac{1}{3}$$
Otherwise said, f is integrable on $[0, 1]$ with $\int_0^1 x^2\,dx = \frac{1}{3}$.
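2. Suppose $f(x) = kx + c$ on $[a, b]$, and that $k > 0$. Take
the evenly spaced partition $P_n$ where $x_i = a + \frac{b - a}{n} i$.
Since f is increasing, the upper Darboux sum is again
the Riemann sum with right-endpoints:
$$U(f, P_n) = R_n = \sum_{i=1}^{n} f(x_i) \Delta x = \frac{b - a}{n} \sum_{i=1}^{n} \left( \frac{k(b - a)}{n} i + ak + c \right)$$
$$= \frac{b - a}{n} \left( \frac{k(b - a)}{n} \cdot \frac{1}{2} n(n + 1) + (ak + c) n \right)$$
$$\xrightarrow[n \to \infty]{} \frac{1}{2} k(b - a)^2 + (b - a)(ak + c) = \frac{k}{2} (b^2 - a^2) + c(b - a)$$
[Figure: upper Darboux sum $U(f, P_n)$ under the line from height $ak + c$ at $x = a$ to height $bk + c$ at $x = b$]
Similarly, the lower Darboux sum is the Riemann sum with left-endpoints:
$$L(f, P_n) = L_n = \frac{b - a}{n} \left( \frac{k(b - a)}{n} \cdot \frac{1}{2} n(n - 1) + (ak + c) n \right) \xrightarrow[n \to \infty]{} \frac{k}{2} (b^2 - a^2) + c(b - a)$$
As above, $L_n \le L(f) \le U(f) \le R_n$ and the squeeze theorem prove that f is integrable on $[a, b]$
with $\int_a^b f = \frac{k}{2} (b^2 - a^2) + c(b - a)$.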
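The computation for the linear function can be checked exactly for specific numbers. In this sketch the function name `darboux_sums` and the sample values of k, c, a, b are hypothetical choices of ours; because f is increasing, the lower and upper sums are evaluated at left and right endpoints respectively.

```python
from fractions import Fraction

def darboux_sums(k, c, a, b, n):
    """Lower and upper Darboux sums of f(x) = kx + c (k > 0) on an even partition."""
    dx = Fraction(b - a, n)
    f = lambda x: k * x + c
    lower = sum(f(a + (i - 1) * dx) for i in range(1, n + 1)) * dx  # left endpoints
    upper = sum(f(a + i * dx) for i in range(1, n + 1)) * dx        # right endpoints
    return lower, upper

k, c, a, b = 2, 1, 0, 3          # hypothetical sample values
integral = Fraction(k, 2) * (b**2 - a**2) + c * (b - a)
for n in (1, 10, 100):
    lower, upper = darboux_sums(k, c, a, b, n)
    print(n, lower, integral, upper, upper - lower)
```

The printed gap equals $\frac{k(b - a)^2}{n}$, so the two sums pinch the value $\frac{k}{2}(b^2 - a^2) + c(b - a)$ as n grows.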
Now that we have some examples, a few remarks are in order.
Riemann versus Darboux Definition 4.3 is really that of the Darboux integral. Here is Riemann's definition: $f : [a, b] \to \mathbb{R}$ being integrable with integral $\int_a^b f$ means
$$\forall \epsilon > 0,\ \exists \delta \text{ such that } \forall (P, x_i^*),\ \operatorname{mesh}(P) < \delta \implies \left| \sum_{i=1}^{n} f(x_i^*) \Delta x_i - \int_a^b f \right| < \epsilon$$
This is significantly more difficult to work with, though it can be shown to be equivalent to the
Darboux integral. We won't pursue Riemann's formulation further, except to observe that if
a function is integrable and $\operatorname{mesh}(P_n) \to 0$, then $\int_a^b f = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^*) \Delta x_i$: this allows us to
approximate integrals using any sample points we choose, which is why right-endpoints ($x_i^* = x_i$)
are so common in Freshman calculus.
Monotone Functions Darboux sums are easy to compute for monotone functions. As in the examples, if $f$ is increasing, then each $M_i = f(x_i)$, from which $U(f, P)$ is the Riemann sum with right-endpoints. Similarly, $L(f, P)$ is the Riemann sum with left-endpoints.
Area If $f$ is positive and continuous,$^{19}$ the Riemann integral $\int_a^b f$ serves as a definition for the area under the curve $y = f(x)$. This should make intuitive sense:
1. In the second example where we have a straight line, we obtain the same value for the area by computing directly as the sum of a rectangle and a triangle!
2. For any partition $P$, the area under the curve should satisfy the inequalities
\[ L(f, P) \le \text{Area} \le U(f, P) \]
But these are precisely the same inequalities satisfied by the integral itself!
\[ L(f, P) \le L(f) = \int_a^b f = U(f) \le U(f, P) \]
In the examples we exhibited a sequence of partitions $(P_n)$ where $U(f, P_n)$ and $L(f, P_n)$ converged to
the same limit. The remaining results in this section develop some basic properties of partitions and
make this limiting process rigorous.
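As an informal illustration of the earlier remark that any sample points may be used once the mesh shrinks, here is a small Python sketch. The helper `riemann_sum` and the fixed random seed are our own choices, not part of the notes; the partitions are equally spaced, so the mesh is $\frac{b-a}{n}$.

```python
import random


def riemann_sum(f, a, b, n, rng):
    """Riemann sum on n equal subintervals with a random sample point in each."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        left = a + i * dx
        sample = left + rng.random() * dx  # any point of the subinterval is allowed
        total += f(sample) * dx
    return total


rng = random.Random(0)  # fixed seed, purely for reproducibility
for n in (10, 100, 1000, 10000):
    print(n, riemann_sum(lambda x: x * x, 0.0, 1.0, n, rng))
```

Every such sum is trapped between the lower and upper Darboux sums for the same partition, so for $f(x) = x^2$ on $[0, 1]$ the printed values drift towards $\frac{1}{3}$ regardless of the random choices.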
Definition 4.5. If $P \subseteq Q$ are both partitions of $[a, b]$, we call $Q$ a refinement of $P$.
To refine a partition, we simply throw some more points in!
Lemma 4.6. Suppose $f : [a, b] \to \mathbb{R}$ is bounded.
1. If $Q$ is a refinement of $P$ (on $[a, b]$), then
\[ L(f, P) \le L(f, Q) \le U(f, Q) \le U(f, P) \]
2. For any partitions $P, Q$ of $[a, b]$, we have $L(f, P) \le U(f, Q)$.
3. $L(f) \le U(f)$
19
We’ll see in Theorem 4.17 that every continuous function is integrable.
Proof. 1. We prove inductively. Suppose first that $Q = P \cup \{t\}$ contains exactly one additional point $t \in (x_{k-1}, x_k)$. Write
\[ m_1 = \inf\{f(x) : x \in [x_{k-1}, t]\}, \qquad m_2 = \inf\{f(x) : x \in [t, x_k]\}, \]
\[ m = \inf\{f(x) : x \in [x_{k-1}, x_k]\} = \min\{m_1, m_2\} \]
The Darboux sums $L(f, P)$ and $L(f, Q)$ are identical except for the terms involving $t$. This results in extra area:
[Figure: the subinterval $[x_{k-1}, x_k]$ split at $t$, with infima $m_1$, $m_2$ and the extra area shaded.]
\[ L(f, Q) - L(f, P) = m_1(t - x_{k-1}) + m_2(x_k - t) - m(x_k - x_{k-1}) \]
\[ = (m_1 - m)(t - x_{k-1}) + (m_2 - m)(x_k - t) \ge 0 \]
More generally, since a refinement $Q$ is obtained by adding finitely many new points, induction tells us that $P \subseteq Q \implies L(f, P) \le L(f, Q)$.
The argument for $U(f, Q) \le U(f, P)$ is similar, and the middle inequality is trivial.
2. If $P$ and $Q$ are partitions, then $P \cup Q$ is a refinement of both $P$ and $Q$. By part 1,
\[ L(f, P) \le L(f, P \cup Q) \le U(f, P \cup Q) \le U(f, Q) \qquad (*) \]
3. This is an exercise.
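To see part 1 of the lemma in action, here is a hedged Python sketch. It assumes an increasing function, so that infima and suprema sit at the left and right endpoints; the helper `darboux_sums` and the particular partitions are illustrative choices of ours.

```python
def darboux_sums(f, partition):
    """Lower/upper Darboux sums of an INCREASING f on a sorted partition."""
    lower = upper = 0.0
    for left, right in zip(partition, partition[1:]):
        lower += f(left) * (right - left)   # infimum at the left endpoint
        upper += f(right) * (right - left)  # supremum at the right endpoint
    return lower, upper


f = lambda x: x ** 3
P = [0.0, 0.5, 1.0]
Q = sorted(set(P) | {0.25, 0.8})  # a refinement of P: two extra points

print(darboux_sums(f, P))
print(darboux_sums(f, Q))
```

Adding points raises the lower sum and lowers the upper sum, exactly as the chain $L(f, P) \le L(f, Q) \le U(f, Q) \le U(f, P)$ predicts.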
Theorem 4.7. Suppose $f : [a, b] \to \mathbb{R}$ is bounded.
1. (Cauchy criterion) $f$ is integrable $\iff$ $\forall \epsilon > 0$, $\exists P$ such that $U(f, P) - L(f, P) < \epsilon$.
2. $f$ is integrable $\iff$ $\exists (P_n)_{n \in \mathbb{N}}$ such that $U(f, P_n) - L(f, P_n) \to 0$. In such a situation, both sequences $U(f, P_n)$ and $L(f, P_n)$ converge to $\int_a^b f$.
Part 1 is termed a 'Cauchy' criterion since it doesn't mention the integral (limit).
Proof. We prove the Cauchy criterion, leaving part 2 as an exercise.
($\Rightarrow$) Suppose $f$ is integrable and that $\epsilon > 0$ is given. Since $\inf U(f, Q) = \int f = \sup L(f, R)$, there exist partitions $Q, R$ such that
\[ U(f, Q) < \int f + \frac{\epsilon}{2} \quad\text{and}\quad L(f, R) > \int f - \frac{\epsilon}{2} \]
Let $P = Q \cup R$ and apply $(*)$ from the proof of Lemma 4.6: $L(f, R) \le L(f, P) \le U(f, P) \le U(f, Q)$. But then
\[ U(f, P) - L(f, P) \le U(f, Q) - L(f, R) = U(f, Q) - \int f + \int f - L(f, R) < \epsilon \]
($\Leftarrow$) Assume the right hand side. For every partition, $L(f, P) \le L(f) \le U(f) \le U(f, P)$. Thus
\[ 0 \le U(f) - L(f) \le U(f, P) - L(f, P) < \epsilon \]
Since this holds for all $\epsilon > 0$, we see that $U(f) = L(f)$: that is, $f$ is integrable.
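Examples 4.8. 1. Consider $f(x) = \sqrt{x}$ on the interval $[0, b]$. We choose a sequence of partitions $(P_n)$ that evaluate nicely when fed to this function:
\[ P_n = \{x_0, \ldots, x_n\} \text{ where } x_i = \Big(\frac{i}{n}\Big)^2 b \implies \Delta x_i = x_i - x_{i-1} = \frac{b}{n^2}\big(i^2 - (i-1)^2\big) = \frac{(2i-1)b}{n^2} \]
Since $f$ is increasing on $[0, b]$, we see that
\[ U(f, P_n) = \sum_{i=1}^n f(x_i)\,\Delta x_i = \sum_{i=1}^n \frac{i}{n}\sqrt{b} \cdot \frac{(2i-1)b}{n^2} = \frac{b^{3/2}}{n^3} \sum_{i=1}^n (2i^2 - i) \]
\[ = \frac{b^{3/2}}{n^3} \Big[ \frac{1}{3} n(n+1)(2n+1) - \frac{1}{2} n(n+1) \Big] \xrightarrow[n \to \infty]{} \frac{2}{3} b^{3/2} \]
Similarly
\[ L(f, P_n) = \sum_{i=1}^n f(x_{i-1})\,\Delta x_i = \sum_{i=1}^n \frac{i-1}{n}\sqrt{b} \cdot \frac{(2i-1)b}{n^2} = \frac{b^{3/2}}{n^3} \sum_{i=1}^n (2i^2 - 3i + 1) \]
\[ = \frac{b^{3/2}}{n^3} \Big[ \frac{1}{3} n(n+1)(2n+1) - \frac{3}{2} n(n+1) + n \Big] \xrightarrow[n \to \infty]{} \frac{2}{3} b^{3/2} \]
Since the limits are equal, we conclude that $f$ is integrable and $\int_0^b \sqrt{x}\,dx = \frac{2}{3} b^{3/2}$.
[Figure: two panels on $[0, b]$ showing the rectangles of the upper sum $U(f, P_n)$ and the lower sum $L(f, P_n)$ for $f(x) = \sqrt{x}$.]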
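The unevenly spaced partition $x_i = (\frac{i}{n})^2 b$ is easy to experiment with. The sketch below is a minimal Python illustration (the name `sqrt_darboux` and the choice $b = 4$ are ours): it builds the partition and evaluates both Darboux sums directly from the definition rather than from the closed formulas.

```python
import math


def sqrt_darboux(b, n):
    """Darboux sums of sqrt(x) on [0, b] for the partition x_i = (i/n)^2 * b."""
    xs = [(i / n) ** 2 * b for i in range(n + 1)]
    # sqrt is increasing: infimum at the left endpoint, supremum at the right
    lower = sum(math.sqrt(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    upper = sum(math.sqrt(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    return lower, upper


b = 4.0
target = 2.0 / 3.0 * b ** 1.5
for n in (10, 100, 1000):
    lower, upper = sqrt_darboux(b, n)
    print(n, lower, upper, target)
```

Subtracting the two closed formulas shows that the gap between the sums is $b^{3/2}/n$, so the partition need not be evenly spaced for the squeeze to work.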
2. Here is the classic example of a non-integrable function. Let $f : [a, b] \to \mathbb{R}$ be the indicator function of the irrational numbers,
\[ f(x) = \begin{cases} 1 & \text{if } x \notin \mathbb{Q} \\ 0 & \text{if } x \in \mathbb{Q} \end{cases} \]
Suppose $P = \{x_0, \ldots, x_n\}$ is any partition of $[a, b]$. Since any interval of positive length contains both rational and irrational numbers, we see that
\[ \sup\{f(x) : x \in [x_{i-1}, x_i]\} = 1 \implies U(f, P) = \sum_{i=1}^n (x_i - x_{i-1}) = b - a \implies U(f) = b - a \]
\[ \inf\{f(x) : x \in [x_{i-1}, x_i]\} = 0 \implies L(f, P) = 0 \implies L(f) = 0 \]
Since the upper and lower Darboux integrals differ, $f$ is not (Riemann) integrable.
As any freshman calculus student can attest, if you can find an anti-derivative, then the fundamen-
tal theorem of calculus (Section 4.34) makes evaluating integrals far easier. For instance, you are
probably desperate to write
\[ \frac{d}{dx}\,\frac{2}{3} x^{3/2} = x^{1/2} \implies \int_0^b \sqrt{x}\,dx = \frac{2}{3} x^{3/2} \Big|_0^b = \frac{2}{3} b^{3/2} \]
rather than computing Riemann/Darboux sums as in the previous example! However, in most prac-
tical situations, no easy-to-compute anti-derivative exists; the best we can do is to approximate using
Riemann sums for progressively finer partitions. Thankfully computers excel at such tedious work!
Exercises 4.32. Key concepts: Darboux sums/integrals, Partitions, sample points & refinements,
Cauchy & sequential criteria for integrability
1. Use partitions to find the upper and lower Darboux integrals on the interval $[0, b]$. Hence prove that the function is integrable and compute its integral.
(a) $f(x) = x^3$ (b) $g(x) = \sqrt[3]{x}$
2. Repeat question 1 for the following two functions. You cannot simply compute Riemann sums for left and right endpoints and take limits: why not?
(a) $h(x) = x(2 - x)$ on $[0, 2]$
(Hint: choose a partition with $2n$ subintervals such that $x_n = 1$ and observe that $h(2 - x) = h(x)$)
(b) On the interval $[0, 3]$, let $k(x) = \begin{cases} 2x & \text{if } x \le 1 \\ 5 - x & \text{if } x > 1 \end{cases}$
(Hint: this time try a partition with $3n$ subintervals)
3. Let $f(x) = x$ for rational $x$ and $f(x) = 0$ for irrational $x$. Calculate the upper and lower Darboux integrals for $f$ on the interval $[0, b]$. Is $f$ integrable on $[0, b]$?
4. Prove part 3 of Lemma 4.6: $L(f) \le U(f)$.
5. Prove part 2 of Theorem 4.7.
\[ f \text{ is integrable} \iff \exists (P_n)_{n \in \mathbb{N}} \text{ such that } \lim_{n \to \infty} \big[ U(f, P_n) - L(f, P_n) \big] = 0 \]
Moreover, prove that both $U(f, P_n)$ and $L(f, P_n)$ converge to $\int f$.
6. (a) Reread Definition 4.3. What happens if we allow $f : [a, b] \to \mathbb{R}$ to be unbounded?
(b) (Hard) Read Riemann versus Darboux on page 73. Explain why being Riemann integrable also forces $f$ to be bounded.
(c) (Hard) Explain the observation that $L(f, P)$ is the infimum of the set of all Riemann sums on $P$.
7. (If you like coding) Write a short program to estimate
R
b
a
f (x) dx using Riemann sums. This
can be very simple (equal partitions with right endpoints), or more complex (random partition
and sample points given a mesh). Apply your program to estimate
R
5
0
sin(x
2
e
x
) dx.
4.33 Properties of the Riemann Integral
The rough take-away of this long section is that everything you think is integrable probably is! Ex-
amples will be few, since we have not established many explicit values for integrals.
Theorem 4.9 (Linearity). If $f, g$ are integrable and $k, l$ are constant, then $kf + lg$ is integrable and
\[ \int (kf + lg) = k \int f + l \int g \]
Example 4.10. Thanks to examples in the previous section, we can now calculate, e.g.,
\[ \int_0^2 \big(5x^3 - 3\sqrt{x}\big)\,dx = 5 \cdot \frac{1}{4} \cdot 2^4 - 3 \cdot \frac{2}{3} \cdot 2^{3/2} = 20 - 4\sqrt{2} \]
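As a sanity check on Example 4.10, the following Python sketch approximates the three integrals involved with midpoint Riemann sums. The helper `midpoint_sum` and the number of subintervals are our own assumptions; midpoints are just one admissible choice of sample points.

```python
import math


def midpoint_sum(f, a, b, n):
    """Riemann sum of f on [a, b] using n equal subintervals and midpoint samples."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


n = 100_000
cube = midpoint_sum(lambda x: x ** 3, 0.0, 2.0, n)   # approximates 2^4 / 4
root = midpoint_sum(math.sqrt, 0.0, 2.0, n)          # approximates (2/3) * 2^(3/2)
combined = midpoint_sum(lambda x: 5 * x ** 3 - 3 * math.sqrt(x), 0.0, 2.0, n)
exact = 20 - 4 * math.sqrt(2)

print(combined, 5 * cube - 3 * root, exact)
```

Riemann sums over a common partition with common sample points are themselves linear in the integrand, which is the finite shadow of the theorem; the real content of the proof is that integrability survives the combination.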
Proof. Suppose $\epsilon > 0$ is given. By the Cauchy criterion (Theorem 4.7, part 1), there exist partitions $R, S$ such that
\[ U(f, R) - L(f, R) < \frac{\epsilon}{2} \quad\text{and}\quad U(g, S) - L(g, S) < \frac{\epsilon}{2} \]
If $P = R \cup S$, then both inequalities are satisfied by $P$ (Lemma 4.6). On each subinterval,
\[ \inf f(x) + \inf g(x) \le \inf\big(f(x) + g(x)\big) \quad\text{and}\quad \sup\big(f(x) + g(x)\big) \le \sup f(x) + \sup g(x) \]
since the individual suprema/infima could be 'evaluated' at different places. Thus
\[ L(f, P) + L(g, P) \le L(f + g, P) \le U(f + g, P) \le U(f, P) + U(g, P) \]
whence $U(f + g, P) - L(f + g, P) < \epsilon$ and $f + g$ is integrable. Moreover,
\[ \int (f + g) - \int f - \int g \le \Big[ U(f, P) - \int f \Big] + \Big[ U(g, P) - \int g \Big] < \epsilon \]
Using the lower Darboux sums similarly obtains the other half of the inequality
\[ -\epsilon < \int (f + g) - \int f - \int g < \epsilon \]
Since this holds for all $\epsilon > 0$, we conclude that $\int (f + g) = \int f + \int g$.
That $kf$ is integrable with $\int kf = k \int f$ is an exercise. Put these together for the result.
Corollary 4.11 (Changing endvalues). Suppose $f$ is integrable on $[a, b]$ and $g : [a, b] \to \mathbb{R}$ satisfies $f(x) = g(x)$ on $(a, b)$. Then $g$ is also integrable on $[a, b]$ and $\int_a^b g = \int_a^b f$.
Definition 4.12 (Integration on an open interval). A bounded function $g : (a, b) \to \mathbb{R}$ is integrable if it has an integrable extension $f : [a, b] \to \mathbb{R}$ where $f(x) = g(x)$ on $(a, b)$. In such a case, we define $\int_a^b g := \int_a^b f$.
The Corollary (its proof is an exercise) shows that the choice of extension is irrelevant.
Theorem 4.13 (Basic integral comparisons). Suppose $f$ and $g$ are integrable on $[a, b]$. Then:
1. $f(x) \le g(x) \implies \int f \le \int g$
2. $m \le f(x) \le M \implies m(b - a) \le \int_a^b f \le M(b - a)$
3. $fg$ is integrable.
4. $|f|$ is integrable and $\big| \int f \big| \le \int |f|$
5. $\max(f, g)$ and $\min(f, g)$ are both integrable.
Part 3 is not integration by parts since it doesn't tell us how $\int fg$ relates to $\int f$ and $\int g$!
Proof. 1. Since $g - f$ is non-negative and integrable, $L(g - f, P) \ge 0$ for all partitions $P$. But then
\[ 0 \le \sup_P L(g - f, P) = L(g - f) = \int (g - f) = \int g - \int f \]
2. Apply part 1 twice.
3. This is an exercise.
4. The integrability is an exercise. For the comparison, apply part 1 to $-|f| \le f \le |f|$.
5. Use $\max(f, g) = \frac{1}{2}(f + g) + \frac{1}{2}|f - g|$, etc., together with the previous parts.
Theorem 4.14 (Domain splitting). Suppose $f : [a, b] \to \mathbb{R}$ and let $c \in (a, b)$. If $f$ is integrable on both $[a, c]$ and $[c, b]$, then it is integrable on $[a, b]$ and
\[ \int_a^b f = \int_a^c f + \int_c^b f \]
[Figure: graph of $f$ over $[a, b]$ with the regions $\int_a^c f$ and $\int_c^b f$ on either side of $c$.]
In light of this result, it is conventional to allow integral limits to be reversed: if $a < b$, then
\[ \int_b^a f := -\int_a^b f \quad\text{is consistent with}\quad \int_a^a f = 0 \]
Proof. Let $\epsilon > 0$ be given, then $\exists R, S$ partitions of $[a, c]$, $[c, b]$ such that
\[ U(f, R) - L(f, R) < \frac{\epsilon}{2}, \qquad U(f, S) - L(f, S) < \frac{\epsilon}{2} \]
Choose $P = R \cup S$ to partition $[a, b]$, then
\[ U(f, P) - L(f, P) = U(f, R) + U(f, S) - L(f, R) - L(f, S) < \epsilon \]
[Figure: the partitions $R$ of $[a, c]$ and $S$ of $[c, b]$ combined into a partition of $[a, b]$.]
Moreover
\[ \int_a^b f - \int_a^c f - \int_c^b f \le U(f, P) - L(f, R) - L(f, S) = U(f, P) - L(f, P) < \epsilon \]
Showing that this expression is greater than $-\epsilon$ is similar.
Example 4.15. If $f(x) = \sqrt{x}$ on $[0, 1]$ and $f(x) = 1$ on $[1, 2]$, then
\[ \int_0^2 f = \int_0^1 \sqrt{x}\,dx + \int_1^2 1\,dx = \frac{2}{3} + 1 = \frac{5}{3} \]
Monotonic & Continuous Functions We establish the integrability of two large classes of functions.
Definition 4.16. A function $f : [a, b] \to \mathbb{R}$ is:
Monotonic if it is either increasing ($x < y \implies f(x) \le f(y)$) or decreasing.
Piecewise monotonic if there is a partition $P = \{x_0, \ldots, x_n\}$ (finite!) of $[a, b]$ such that $f$ is monotonic on each open subinterval $(x_{k-1}, x_k)$.
Piecewise continuous if there is a partition such that $f$ is uniformly continuous on each $(x_{k-1}, x_k)$.
Theorem 4.17. If $f$ is monotonic or continuous on $[a, b]$, then it is integrable.
Examples 4.18. 1. Since sine is continuous, we can approximate via a sequence of Riemann sums
\[ \int_0^\pi \sin x\,dx = \lim_{n \to \infty} \sum_{i=1}^n \frac{\pi}{n} \sin\frac{\pi i}{n} \]
Evaluating this limit is another matter entirely, one best handled in the next section...
2. Similarly, $e^{\sqrt{x}}$ is integrable and therefore may be approximated via Riemann sums:
\[ \int_0^1 e^{\sqrt{x}}\,dx = \lim_{n \to \infty} \sum_{i=1}^n \frac{1}{n} \exp\sqrt{\frac{i}{n}} = \lim_{n \to \infty} \sum_{j=1}^n \frac{2j - 1}{n^2} \exp\frac{j}{n} \]
Both sums use right endpoints: the first has equal subintervals, while the second uses the partition $x_j = (\frac{j}{n})^2$ with $\Delta x_j = \frac{2j-1}{n^2}$, analogous to Example 4.8.1. These limits would typically be estimated using a computer.
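Taking the last sentence at its word, here is a short Python sketch estimating both Riemann sums for $e^{\sqrt{x}}$ on $[0, 1]$. The function names are our own; the second sum assumes the partition $x_j = (j/n)^2$, whose widths are $(2j-1)/n^2$.

```python
import math


def equal_width_sum(n):
    """Right-endpoint sum for exp(sqrt(x)) on [0, 1] with equal subintervals."""
    return sum(math.exp(math.sqrt(i / n)) / n for i in range(1, n + 1))


def squared_partition_sum(n):
    """Right-endpoint sum on the partition x_j = (j/n)^2, widths (2j - 1)/n^2."""
    return sum((2 * j - 1) / n ** 2 * math.exp(j / n) for j in range(1, n + 1))


for n in (10, 100, 1000, 10000):
    print(n, equal_width_sum(n), squared_partition_sum(n))
```

Since the integrand is increasing, both are upper Darboux sums and so decrease towards the integral from above; the two columns should agree to more and more digits as $n$ grows.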
Proof. Since $[a, b]$ is closed and bounded, a continuous function $f$ is uniformly so. Let $\epsilon > 0$ be given:
\[ \exists \delta > 0 \text{ such that } \forall x, y \in [a, b],\quad |x - y| < \delta \implies |f(x) - f(y)| < \frac{\epsilon}{b - a} \]
Let $P$ be a partition with $\operatorname{mesh} P < \delta$. Since $f$ attains its bounds on each $[x_{i-1}, x_i]$,
\[ \exists x_i^*, y_i^* \in [x_{i-1}, x_i] \text{ such that } M_i - m_i = f(x_i^*) - f(y_i^*) < \frac{\epsilon}{b - a} \]
from which
\[ U(f, P) - L(f, P) < \sum_{i=1}^n \frac{\epsilon}{b - a}(x_i - x_{i-1}) = \epsilon \]
The monotonicity argument is an exercise.
Combining the proof with Definition 4.12: every uniformly continuous $f : (a, b) \to \mathbb{R}$ is integrable.
Corollary 4.19. Piecewise continuous and bounded piecewise monotonic functions are integrable.
Proof. If $f$ is piecewise continuous, then the restriction of $f$ to $(x_{k-1}, x_k)$ has a continuous extension $g_k : [x_{k-1}, x_k] \to \mathbb{R}$; this is integrable by Theorem 4.17. By Corollary 4.11, $f$ is integrable on $[x_{k-1}, x_k]$ with $\int_{x_{k-1}}^{x_k} f = \int_{x_{k-1}}^{x_k} g_k$. Theorem 4.14 ($n - 1$ times!) finishes things off:
\[ \int_a^b f = \sum_{k=1}^n \int_{x_{k-1}}^{x_k} f \]
The argument for piecewise monotonicity is similar.
Example 4.20. The 'fractional part' function $f(x) = x - \lfloor x \rfloor$ is both piecewise continuous and piecewise monotone on any bounded interval. It is therefore integrable on any such interval.
[Figure: sawtooth graph of $f(x) = x - \lfloor x \rfloor$ on $[0, 5]$.]
For a final corollary, here is one more incarnation of the intermediate value theorem.
Corollary 4.21 (IVT for integrals). If $f$ is continuous on $[a, b]$, then $\exists \xi \in (a, b)$ for which
\[ f(\xi) = \frac{1}{b - a} \int_a^b f \]
Proof. Since $f$ is continuous, it is integrable on $[a, b]$. By the extreme value theorem it is also bounded and attains its bounds: $\exists p, q \in [a, b]$ such that
\[ f(p) = \inf_{x \in [a, b]} f(x), \qquad f(q) = \sup_{x \in [a, b]} f(x) \]
Applying Theorem 4.13, part 2, with $m = f(p)$ and $M = f(q)$, we see that
\[ (b - a) f(p) \le \int_a^b f \le (b - a) f(q) \]
[Figure: graph of $f$ on $[a, b]$ showing the levels $m$, $M$ and $f_{av}$, together with the points $p$, $q$ and $\xi$.]
Divide by $b - a$ and apply the usual intermediate value theorem for $f$ to see that the required $\xi$ exists between $p$ and $q$.
In the picture, when $f$ is positive and continuous, the grey area equals that under the curve; imagine levelling off the blue hill with a bulldozer... The notation $f_{av} = \frac{1}{b-a} \int_a^b f$ indicates the average value of $f$ on $[a, b]$: to see why this interpretation is sensible, take a sequence of Riemann sums on equally-spaced partitions $P_n$ ($\Delta x = \frac{b-a}{n}$) to see that
\[ \frac{1}{b - a} \int_a^b f = \lim_{n \to \infty} \sum_{i=1}^n f(x_i^*) \frac{\Delta x}{b - a} = \lim_{n \to \infty} \frac{f(x_1^*) + \cdots + f(x_n^*)}{n} \]
is the limit of a sequence of averages of equally-spaced samples $f(x_i^*)$.
What can/cannot be integrated?
We now know a great many examples of integrable functions:
Piecewise continuous & monotonic functions are integrable.
Linear combinations, products, absolute values, maximums and minimums of (already) inte-
grable functions.
By contrast, we’ve only seen one non-integrable function (Example 4.8.2). After so many positive
integrability conditions, it is reasonable to ask precisely which functions are Riemann integrable.
Here is the answer, though it is quite tricky to understand.
Theorem 4.22 (Lebesgue). Suppose $f : [a, b] \to \mathbb{R}$ is bounded. Then
\[ f \text{ is Riemann integrable} \iff \text{it is continuous except on a set of measure zero} \]
Naïvely, the measure of a set is the sum of the lengths of its maximal subintervals, though unfortunately this doesn't make for a very useful definition.$^{20}$ Any countable subset has measure zero, so Lebesgue's result is almost as if we can extend Corollary 4.19 to allow for infinite sums. For instance, Exercise 1.17.8 describes a function which is continuous only on the irrationals: it is thus Riemann integrable (indeed $\int_a^b f = 0$ for any $a < b$). There are also uncountable sets with measure zero such as Cantor's middle-third set $C$: the function
\[ f(x) = \begin{cases} 1 & \text{if } x \in C \\ 0 & \text{otherwise} \end{cases} \]
is continuous except on $C$ and therefore Riemann integrable; again $\int_0^1 f(x)\,dx = 0$.
Exercises 4.33. Key concepts: Linear combinations, products, etc., of integrable functions are integrable,
Continuous and monotone functions are integrable, Integrability on open intervals
1. Explain why $\int_0^{2\pi} x^2 \sin^8(e^x)\,dx \le \frac{8}{3}\pi^3$
2. If $f$ is integrable on $[a, b]$ prove that it is integrable on any interval $[c, d] \subseteq [a, b]$.
3. We complete the proof of Theorem 4.9 (linearity of integration).
(a) Suppose $k > 0$, let $A \subseteq \mathbb{R}$ and define $kA := \{kx : x \in A\}$. Prove that $\sup kA = k \sup A$ and $\inf kA = k \inf A$.
(b) If $k > 0$ prove that $kf$ is integrable on any interval and that $\int kf = k \int f$.
(c) How should you modify your argument if $k < 0$?
20 Formally, the length of an open interval $(a, b)$ is $b - a$ and a set $A \subseteq \mathbb{R}$ has measure zero if
\[ \forall \epsilon > 0,\ \exists \text{ open intervals } I_n \text{ such that } A \subseteq \bigcup_{n=1}^\infty I_n \text{ and } \sum_{n=1}^\infty \operatorname{length}(I_n) < \epsilon \]
More generally, the Lebesgue measure of a set (subject to a technical condition) is the infimum of the sum of the lengths of any countable collection of open covering intervals. Measure theory is properly a matter for graduate study. Surprisingly, there exist sets with positive measure that contain no subintervals, and even sets which are non-measurable!
4. Give an example of an integrable but discontinuous function on a closed bounded interval $[a, b]$ for which the conclusion of the Intermediate Value Theorem for Integrals is false.
5. Use Darboux sums to compute the value of the integral $\int_{1/2}^{15/2} \big(x - \lfloor x \rfloor\big)\,dx$ (Example 4.20).
6. We prove and extend Corollary 4.11. Suppose $f$ is integrable on $[a, b]$.
(a) If $g : [a, b] \to \mathbb{R}$ satisfies $f(x) = g(x)$ for all $x \in (a, b)$, prove that $g$ is integrable and $\int_a^b g = \int_a^b f$.
(Hint: consider $h = f - g$ and show that $\int h = 0$)
(b) Now suppose $g : [a, b] \to \mathbb{R}$ satisfies $f(x) = g(x)$ for all $x \in [a, b]$ except at finitely many points. Prove that $g$ is integrable and $\int_a^b g = \int_a^b f$.
7. Show that an increasing function on $[a, b]$ is integrable and thus complete Theorem 4.17.
(Hint: Choose a partition with $\operatorname{mesh} P < \frac{\epsilon}{f(b) - f(a)}$)
8. Suppose $f$ and $g$ are integrable on $[a, b]$.
(a) Define $h(x) = \big(f(x)\big)^2$. We know:
$f$ is bounded: $\exists K$ such that $|f(x)| \le K$ on $[a, b]$.
Given $\epsilon > 0$, $\exists P$ such that $U(f, P) - L(f, P) < \frac{\epsilon}{2K}$. For each subinterval $[x_{i-1}, x_i]$, let
\[ M_i = \sup f(x), \quad m_i = \inf f(x), \quad \tilde{M}_i = \sup h(x), \quad \tilde{m}_i = \inf h(x) \]
Prove that $\tilde{M}_i - \tilde{m}_i \le 2(M_i - m_i)K$. Hence conclude that $h$ is integrable.
(b) Prove that $fg$ is integrable.
(Hint: $fg = \frac{1}{4}(f + g)^2 - \frac{1}{4}(f - g)^2$)
(c) Prove that $U(|f|, P) - L(|f|, P) \le U(f, P) - L(f, P)$ for any partition $P$. Hence conclude that $|f|$ is integrable.
(One can extend these arguments to show that if $j$ is continuous, then $j \circ f$ is integrable. Parts (a) and (c) correspond, respectively, to $j(x) = x^2$ and $j(x) = |x|$.)
9. (Hard) Let $f(x) = \begin{cases} x & \text{if } x \ne 0 \text{ and } \sin\frac{1}{x} > 0 \\ -x & \text{if } x \ne 0 \text{ and } \sin\frac{1}{x} < 0 \\ 0 & \text{if } x = 0 \end{cases}$
(a) Show that $f$ is not piecewise continuous on $[0, 1]$.
(b) Show that $f$ is not piecewise monotonic on $[0, 1]$.
(c) Show that $f$ is integrable on $[0, 1]$.
(Hint: given $\epsilon$, hunt for a suitable partition to make $U(f, P) - L(f, P) < \epsilon$ by considering $[0, x_1]$ differently to the other subintervals)
(d) Make a similar argument which proves that $g = \sin\frac{1}{x}$ is integrable on $(0, 1]$.
(Hint: Show that $g$ has an integrable extension on $[0, 1]$)
4.34 The Fundamental Theorem of Calculus
The key result linking integration and differentiation is usually presented in two parts. While there
are significant subtleties, the rough statements are as follows (we follow the traditional numbering):
Part I Differentiation reverses integration: $\frac{d}{dx} \int_a^x f(t)\,dt = f(x)$
Part II Integration reverses differentiation: $\int_a^b F'(x)\,dx = F(b) - F(a)$
These facts seemed intuitively obvious to early practitioners of calculus. Given a continuous positive function $f$:
Let $F(x)$ denote the area under $y = f(x)$ between $0$ and $x$.
A small increase $\Delta x$ results in the area increasing by $\Delta F$.
$\Delta F \approx f(x)\,\Delta x$ is approximately the area of a rectangle, whence $\frac{\Delta F}{\Delta x} \approx f(x)$. This is part I.
$F(b) - F(a) = \sum \Delta F_i \approx \sum f(x_i)\,\Delta x_i$. Since $F' = f$, this is part II.
[Figure: the area $F$ under $y = f(x)$, with a thin strip of width $\Delta x$ and height $f(x)$ contributing $\Delta F$.]
When Leibniz introduced the symbols $\int$ and $d$ in the late 1600s, it was partly to reflect the fundamental theorem.$^{21}$
If you’re happy with non-rigorous notions of limit, rate of change, area, and (infinite)
sums, the above is all you need!
Of course we are very much concerned with the details: What must we assume about f and F, and
how are these properties used in the proof?
Theorem 4.23 (FTC, part I). Suppose $f$ is integrable on $[a, b]$. For any $x \in [a, b]$, define
\[ F(x) := \int_a^x f(t)\,dt \]
Then:
1. $F$ is uniformly continuous on $[a, b]$;
2. If $f$ is continuous at $c \in [a, b]$, then $F$ is differentiable$^{22}$ at $c$ with $F'(c) = f(c)$.
Compare this with the naïve version above where we assumed $f$ was continuous. We now require only the integrability of $f$, and its continuity at one point for the full result.
21 $\int$ is a stylized S for sum, while $d$ stands for difference. Given a sequence $F = (F_0, F_1, F_2, \ldots, F_n)$, construct a new sequence of differences
\[ dF = (F_1 - F_0,\ F_2 - F_1,\ \ldots,\ F_n - F_{n-1}) \]
which can then be summed:
\[ \int dF = (F_1 - F_0) + (F_2 - F_1) + \cdots + (F_n - F_{n-1}) = F_n - F_0 \qquad (*) \]
Viewing a function as an 'infinite sequence' of values spaced along an interval, $dF$ becomes a sequence of infinitesimals and $(*)$ is essentially the fundamental theorem: $\int dF = F(b) - F(a)$. It is the concept of function that is suspect here, not the essential relationship between sums and differences.
22 Strictly: if $c = a$, then $F$ is right-differentiable, etc.
Examples 4.24. Examples in every elementary calculus course.
1. Since $f(x) = \sin^2(x^3 - 7)$ is continuous on any bounded interval, we conclude that
\[ \frac{d}{dx} \int_4^x \sin^2(t^3 - 7)\,dt = \sin^2(x^3 - 7) \]
If one follows Theorem 4.14 and its conventions, then this is valid for all $x \in \mathbb{R}$.
2. The chain rule permits more complicated examples. For instance: $f(t) = \sin\sqrt{t}$ is continuous on its domain $[0, \infty)$ and $y(x) = x^2 + 3$ has range $[3, \infty) \subseteq \operatorname{dom}(f)$, whence
\[ \frac{d}{dx} \int_0^{x^2 + 3} \sin\sqrt{t}\,dt = \frac{dy}{dx}\,\frac{d}{dy} \int_0^y \sin\sqrt{t}\,dt = 2x \sin\sqrt{x^2 + 3} \]
3. For a final positive example, we consider when
\[ \frac{d}{dx} \int_{\sin x}^{e^x} \tan(t^2)\,dt = e^x \tan(e^{2x}) - \cos x \tan(\sin^2 x) \]
makes sense. To evaluate this, first choose any constant $a$ and write
\[ \int_{\sin x}^{e^x} = \int_a^{e^x} + \int_{\sin x}^a = \int_a^{e^x} - \int_a^{\sin x} \]
before differentiating. This is valid provided $\sin x$, $e^x$ and $a$ all lie in the same subinterval of
\[ \operatorname{dom} \tan(t^2) = \mathbb{R} \setminus \Big\{ \pm\sqrt{\tfrac{\pi}{2}},\ \pm\sqrt{\tfrac{3\pi}{2}},\ \pm\sqrt{\tfrac{5\pi}{2}},\ \ldots \Big\} \]
Since $|\sin x| \le 1 < \sqrt{\frac{\pi}{2}}$, this requires
\[ e^{2x} < \frac{\pi}{2} \iff x < \frac{1}{2} \ln\frac{\pi}{2} \]
Choosing $a = 1$ would certainly suffice.
4. Now consider why the theorem requires continuity. The piecewise continuous function
\[ f : [0, 2] \to \mathbb{R} : x \mapsto \begin{cases} 2x & \text{if } x \le 1 \\ \frac{1}{2} & \text{if } x > 1 \end{cases} \]
has a jump discontinuity at $x = 1$. We can still compute
\[ F(x) = \begin{cases} \int_0^x 2t\,dt = x^2 & \text{if } x \le 1 \\ \int_0^1 2t\,dt + \int_1^x \frac{1}{2}\,dt = \frac{1}{2}(x + 1) & \text{if } x > 1 \end{cases} \]
This is continuous, indeed uniformly so! However the discontinuity of $f$ results in $F$ having a corner and thus being non-differentiable at $x = 1$. Indeed $F'(x) = f(x)$ whenever $x \ne 1$: that is, at all values of $x$ where $f$ is continuous.
[Figure: graphs of $f(x)$ and $F(x)$ on $[0, 2]$; $f$ jumps at $x = 1$, where $F$ has a corner.]
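The corner of $F$ at $x = 1$ shows up clearly in one-sided difference quotients. Here is a minimal Python sketch using the closed form of $F$ computed above; the helper `difference_quotient` and the step size are our own choices.

```python
def F(x):
    """F(x) = integral of f from 0 to x, where f(t) = 2t for t <= 1 and 1/2 for t > 1."""
    return x * x if x <= 1 else 0.5 * (x + 1)


def difference_quotient(x, h):
    """Slope of the secant line of F between x and x + h (h may be negative)."""
    return (F(x + h) - F(x)) / h


h = 1e-6
print(difference_quotient(1.0, -h))  # close to 2:   the left-hand slope, f(1-) = 2
print(difference_quotient(1.0, h))   # close to 0.5: the right-hand slope, f(1+) = 1/2
print(difference_quotient(0.5, h))   # close to 1:   f is continuous at 0.5, f(0.5) = 1
```

The two one-sided slopes at $x = 1$ disagree, so $F'(1)$ does not exist, while at points of continuity the quotient settles on $f(x)$ as FTC I promises.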
Proving FTC I Neither half of the theorem is particularly difficult once you write down what you know and what you need to prove. Here are the key ingredients:
1. Uniform continuity for $F$ means we must control the size of
\[ |F(y) - F(x)| = \Big| \int_a^y f(t)\,dt - \int_a^x f(t)\,dt \Big| = \Big| \int_x^y f(t)\,dt \Big| \le \int_x^y |f(t)|\,dt \qquad (x < y) \]
But the boundedness of $f$ allows us to control this last integral...
2. $F'(c) = f(c)$ means showing that $\lim_{x \to c} \frac{F(x) - F(c)}{x - c} = f(c)$, which means controlling the size of
\[ \Big| \frac{F(x) - F(c)}{x - c} - f(c) \Big| = \Big| \frac{1}{x - c} \int_c^x f(t)\,dt - f(c) \Big| \]
The trick here is to bring the constant $f(c)$ inside the integral as $\frac{1}{x - c} \int_c^x f(c)\,dt$ so that the above is at most $\frac{1}{|x - c|} \big| \int_c^x |f(t) - f(c)|\,dt \big|$. This may now be controlled via the continuity of $f$...
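Proof. 1. Since $f$ is integrable, it is bounded: $\exists M > 0$ such that $|f(x)| \le M$ for all $x$.
Let $\epsilon > 0$ be given and define $\delta = \frac{\epsilon}{M}$. Then, for any $x, y \in [a, b]$,
\[ 0 < y - x < \delta \implies |F(y) - F(x)| = \Big| \int_x^y f(t)\,dt \Big| \le \int_x^y |f(t)|\,dt \quad \text{(Theorem 4.13, part 4)} \]
\[ \le M(y - x) \quad \text{(Theorem 4.13, part 2)} \]
\[ < M\delta = \epsilon \]
We conclude that $F$ is uniformly continuous on $[a, b]$.
2. Let $\epsilon > 0$ be given. Since $f$ is continuous at $c$, $\exists \delta > 0$ such that, for all $t \in [a, b]$,
\[ |t - c| < \delta \implies |f(t) - f(c)| < \frac{\epsilon}{2} \]
Now for all $x \in [a, b]$ (except $c$),
\[ 0 < |x - c| < \delta \implies \Big| \frac{F(x) - F(c)}{x - c} - f(c) \Big| = \Big| \frac{1}{x - c} \int_c^x \big(f(t) - f(c)\big)\,dt \Big| \quad \text{(Theorem 4.9)} \]
\[ \le \frac{1}{|x - c|} \Big| \int_c^x |f(t) - f(c)|\,dt \Big| \quad \text{(Theorem 4.13)} \]
\[ \le \frac{1}{|x - c|} \cdot \frac{\epsilon}{2} |x - c| = \frac{\epsilon}{2} < \epsilon \]
Clearly $\lim_{x \to c} \frac{F(x) - F(c)}{x - c} = f(c)$. Otherwise said, $F$ is differentiable at $c$ with $F'(c) = f(c)$.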
The Fundamental Theorem, part II As with part I, the formulaic part of the result should be familiar,
though we are more interested in the assumptions and where they are needed.
Theorem 4.25 (FTC, part II). Suppose $g$ is continuous on $[a, b]$, differentiable on $(a, b)$, and moreover that $g'$ is integrable on $(a, b)$ (recall Definition 4.12). Then,
\[ \int_a^b g' = g(b) - g(a) \]
Part II is often expressed in terms of anti-derivatives: $F$ being an anti-derivative of $f$ if $F' = f$. Combined with FTC, part I, we recover the familiar $+c$ result and a simpler version of the fundamental theorem often seen in elementary calculus.
Corollary 4.26. Let $f$ be continuous on $[a, b]$.
If $F$ is an anti-derivative of $f$, then $\int_a^b f = F(b) - F(a)$.
Every anti-derivative of $f$ has the form $F(x) = \int_a^x f(t)\,dt + c$ for some constant $c$.
Examples 4.27. Again, basic examples should be familiar.
1. Plainly $g(x) = x^2 + 2x^{3/2}$ is continuous on $[1, 4]$ and differentiable on $(1, 4)$ with derivative $g'(x) = 2x + 3\sqrt{x}$; this last is continuous (and thus integrable) on $(1, 4)$. We conclude that
\[ \int_1^4 \big(2x + 3\sqrt{x}\big)\,dx = \Big[ x^2 + 2x^{3/2} \Big]_1^4 = (16 + 16) - (1 + 2) = 29 \]
2. If $g(x) = \sin(3x^2)$, then $g'(x) = 6x\cos(3x^2)$. Certainly $g$ satisfies the hypotheses of the theorem on any bounded interval $[a, b]$. We conclude
\[ \int_a^b 6x\cos(3x^2)\,dx = \sin(3b^2) - \sin(3a^2) \]
Moreover, every anti-derivative of $f(x) = 6x\cos(3x^2)$ has the form $F(x) = \sin(3x^2) + c$.
3. Recall Example 4.24.4 where the discontinuity of $f$ at $x = 1$ led to the non-differentiability of $F(x) = \int_0^x f(t)\,dt$. The function $F$ therefore fails the hypotheses of FTC II on the interval $[0, 2]$. It almost, however, satisfies the conclusions of FTC II, though this is somewhat tautological given the definition of $F$: except at $x = 1$, $F$ is certainly an anti-derivative of $f$, and moreover $\int_0^2 f(x)\,dx = F(2) - F(0)$.
In case you're worried that this makes the theorem trivial, note that other anti-derivatives $\hat{F}$ of $f$ exist (except at $x = 1$) which fail to satisfy the conclusion. For instance
\[ \hat{F}(x) = \begin{cases} x^2 & \text{if } x < 1 \\ \frac{1}{2}x & \text{if } x > 1 \end{cases} \implies \hat{F}(2) - \hat{F}(0) = 1 \ne \frac{3}{2} = \int_0^2 f(x)\,dx \]
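For a quick numerical cross-check of the second example, the Python sketch below compares a midpoint Riemann sum of $g'(x) = 6x\cos(3x^2)$ with $g(b) - g(a)$. The helper `midpoint_sum`, the interval $[0, 1]$ and the number of subintervals are all our own illustrative choices.

```python
import math


def midpoint_sum(f, a, b, n):
    """Riemann sum of f on [a, b] using n equal subintervals and midpoint samples."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


a, b = 0.0, 1.0
approx = midpoint_sum(lambda x: 6 * x * math.cos(3 * x * x), a, b, 10_000)
exact = math.sin(3 * b * b) - math.sin(3 * a * a)  # g(b) - g(a) with g(x) = sin(3x^2)

print(approx, exact)
```

The agreement is no accident: FTC II says the limit of such sums is exactly the change in $g$ across the interval.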
Proving FTC II Exercise 10 offers a relatively easy proof when $g' = f$ is continuous. For the real McCoy, we can only rely on the integrability of $g'$: the trick is to use the mean value theorem to write $g(b) - g(a)$ as a Riemann sum over a suitable partition.
Proof. Suppose $\epsilon > 0$ is given. Since $g'$ is integrable, we may choose some partition $P$ satisfying $U(g', P) - L(g', P) < \epsilon$. Since $g$ satisfies the mean value theorem on each subinterval,
\[ \exists \xi_i \in (x_{i-1}, x_i) \text{ such that } g'(\xi_i) = \frac{g(x_i) - g(x_{i-1})}{x_i - x_{i-1}} \]
from which
\[ g(b) - g(a) = \sum_{i=1}^n \big( g(x_i) - g(x_{i-1}) \big) = \sum_{i=1}^n g'(\xi_i)(x_i - x_{i-1}) \]
This is a Riemann sum for $g'$ associated to the partition $P$. Since the upper and lower Darboux sums are the supremum and infimum of these, we see that
\[ L(g', P) \le g(b) - g(a) \le U(g', P) \]
However $\int_a^b g'$ satisfies the same inequality: $L(g', P) \le \int_a^b g' \le U(g', P)$. Both quantities therefore lie in an interval of length less than $\epsilon$, so that $\big| \int_a^b g' - \big(g(b) - g(a)\big) \big| < \epsilon$. Since this holds for all $\epsilon > 0$, we conclude that $\int_a^b g' = g(b) - g(a)$.
While we certainly used the integrability of $g'$ in the proof, it might seem strange that we assumed it at all: shouldn't every derivative be integrable? Perhaps surprisingly, the answer is no! If you want a challenge, look up the Volterra function, which is differentiable everywhere but whose derivative is non-integrable!
The Rules of Integration
If one wants to evaluate an integral, rather than merely show it exists, there are really only two options:
1. Evaluate Riemann sums and take limits. This is often difficult if not impossible to do explicitly.
2. Use FTC II. The problem now becomes the finding of anti-derivatives, for which the core method
is essentially guess and differentiate. To obtain general rules, we can attempt to reverse the rules
of differentiation.
Integration by Parts Recall the product rule: the product $g = uv$ of two differentiable functions is differentiable with $g' = u'v + uv'$. Now apply Theorems 4.9, 4.13 and FTC II.
Corollary 4.28 (Integration by Parts). Suppose $u, v$ are continuous on $[a, b]$, differentiable on $(a, b)$, and that $u', v'$ are integrable on $(a, b)$. Then
\[ \int_a^b u'(x)v(x)\,dx = u(b)v(b) - u(a)v(a) - \int_a^b u(x)v'(x)\,dx \]
This is significantly less useful than the product rule since it merely transforms the integral of one product into the integral of another.
Examples 4.29. With practice, there is no need to explicitly state $u$ and $v$.
1. Let $u(x) = x$ and $v'(x) = \cos x$. Then $u'(x) = 1$ and $v(x) = \sin x$. These certainly satisfy the hypotheses. We conclude
\[ \int_0^{\pi/2} x\cos x\,dx = \big[ x\sin x \big]_0^{\pi/2} - \int_0^{\pi/2} \sin x\,dx = \frac{\pi}{2}\sin\frac{\pi}{2} - 0 + \big[ \cos x \big]_0^{\pi/2} \]
\[ = \frac{\pi}{2} + \cos\frac{\pi}{2} - \cos 0 = \frac{\pi}{2} - 1 \]
2. Let $u(x) = \ln x$ and $v'(x) = 1$. Then $u'(x) = \frac{1}{x}$ and $v(x) = x$, whence
\[ \int_e^{e^2} \ln x\,dx = \big[ x\ln x \big]_e^{e^2} - \int_e^{e^2} \frac{x}{x}\,dx = e^2\ln e^2 - e\ln e - \big[ x \big]_e^{e^2} = 2e^2 - e - e^2 + e = e^2 \]
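Both answers can be checked against direct Riemann sum estimates. The following Python sketch is only a numerical sanity check, not a proof; the helper `midpoint_sum` and the subinterval count are our own assumptions.

```python
import math


def midpoint_sum(f, a, b, n):
    """Riemann sum of f on [a, b] using n equal subintervals and midpoint samples."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


n = 20_000
approx1 = midpoint_sum(lambda x: x * math.cos(x), 0.0, math.pi / 2, n)
exact1 = math.pi / 2 - 1           # value obtained by parts in example 1

approx2 = midpoint_sum(math.log, math.e, math.e ** 2, n)
exact2 = math.e ** 2               # value obtained by parts in example 2

print(approx1, exact1)
print(approx2, exact2)
```

A check of this kind is a cheap way to catch sign errors, which are the most common slip when integrating by parts.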
Change of Variables/Substitution We now turn our attention to the chain rule. If $g(x) = F\big(u(x)\big)$, where $F$ and $u$ are differentiable, then $g$ is differentiable with
\[ g'(x) = \frac{dg}{dx} = \frac{dF}{du}\,\frac{du}{dx} = F'\big(u(x)\big)\,u'(x) \]
Now integrate both sides; the only issue is what assumptions are needed to invoke FTC II.
Theorem 4.30 (Substitution Rule). Suppose $u : [a, b] \to \mathbb{R}$ and $f : \operatorname{range}(u) \to \mathbb{R}$ are continuous. Suppose also that $u$ is differentiable on $(a, b)$ with integrable derivative $u'$. Then
\[ \int_a^b f\big(u(x)\big)\,u'(x)\,dx = \int_{u(a)}^{u(b)} f(u)\,du \]
This is the famous 'u-sub'/change-of-variables formula from elementary calculus.
Proof. We leave as an exercise the verification that both integrals exist. By the intermediate and extreme value theorems, $\operatorname{range}(u)$ is a closed bounded interval. Assume $\operatorname{range}(u)$ has positive length for otherwise both integrals are trivially zero.
Choose any $c \in \operatorname{range}(u)$ and define
\[ F : \operatorname{range}(u) \to \mathbb{R} \quad\text{by}\quad F(v) := \int_c^v f(t)\,dt \]
Since $f$ is continuous, FTC I says that $F$ is differentiable with $F'(u) = f(u)$. But now
\[ \int_a^b f\big(u(x)\big)\,u'(x)\,dx = \int_a^b \frac{d}{dx} F\big(u(x)\big)\,dx \quad \text{(chain rule)} \]
\[ = F\big(u(b)\big) - F\big(u(a)\big) \quad \text{(FTC II)} \]
\[ = \int_{u(a)}^{u(b)} f(u)\,du \]
Examples 4.31. Successfully applying the substitution rule can require significant creativity.$^{23}$
1. To evaluate $\int_0^{\sqrt{\pi}} 2x\sin x^2\,dx$, we consider the substitution $u(x) = x^2$ defined on $[0, \sqrt{\pi}]$. Certainly $u$ is continuous; moreover its derivative $u'(x) = 2x$ is integrable on $(0, \sqrt{\pi})$. Finally $f(u) = \sin u$ is continuous on $\operatorname{range}(u) = [0, \pi]$. The hypotheses are satisfied, whence
\[ \int_0^{\sqrt{\pi}} 2x\sin x^2\,dx = \int_0^{\sqrt{\pi}} f\big(u(x)\big)\,u'(x)\,dx = \int_{u(0)}^{u(\sqrt{\pi})} f(u)\,du = \int_0^\pi \sin u\,du = -\cos u \Big|_0^\pi = 2 \]
2. For the following integral, a simple factorization suggests the substitution $u(x) = x^2 - 2$. Plainly $u : [\sqrt{2}, \sqrt{3}] \to [0, 1]$ and $u'(x) = 2x$ is integrable. Moreover, $f(u) = \frac{1}{u^2 + 1}$ is continuous on $\operatorname{range}(u) = [0, 1]$. We conclude
\[ \int_{\sqrt{2}}^{\sqrt{3}} \frac{2x}{x^4 - 4x^2 + 5}\,dx = \int_{\sqrt{2}}^{\sqrt{3}} \frac{2x}{(x^2 - 2)^2 + 1}\,dx = \int_0^1 \frac{1}{u^2 + 1}\,du = \arctan u \Big|_0^1 = \frac{\pi}{4} \]
3. The hypotheses on $u$ really are all that's necessary. In particular, $u$ need not be left-/right-differentiable at the endpoints of $[a, b]$. For instance, with $f(u) = u^2$ and $u(x) = \sqrt{x}$ on $[0, 4]$, we easily verify
\[ \frac{8}{3} = \int_0^4 \frac{1}{2}\sqrt{x}\,dx = \int_0^4 \frac{x}{2\sqrt{x}}\,dx = \int_0^4 f\big(u(x)\big)\,u'(x)\,dx = \int_0^2 f(u)\,du = \int_0^2 u^2\,du = \frac{8}{3} \]
4. Sloppy 'substitutions' might lead to utter nonsense. For instance, $u(x) = x^2$ suggests
\[ \int_{-1}^2 \frac{1}{x}\,dx = \int_{-1}^2 \frac{1}{2x^2}\,2x\,dx = \int_1^4 \frac{1}{2u}\,du = \frac{1}{2}(\ln 4 - \ln 1) = \ln 2 \]
This is total gibberish: the first integral does not exist since $\frac{1}{x}$ is undefined at $0 \in (-1, 2)$. Thankfully, the hypotheses of the substitution rule prevent this: $f(u) = \frac{1}{2u}$ is not continuous on $\operatorname{range}(u) = [0, 4]$.
While you are very unlikely to make precisely this mistake, the risk is real in more complicated or abstract situations...
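The second example lends itself to a numerical comparison of the two sides of the substitution rule. In the Python sketch below (the helper `midpoint_sum` and the subinterval count are our own choices), the left-hand side is estimated in the variable $x$ on $[\sqrt{2}, \sqrt{3}]$ and the right-hand side in the variable $u$ on $[0, 1]$.

```python
import math


def midpoint_sum(f, a, b, n):
    """Riemann sum of f on [a, b] using n equal subintervals and midpoint samples."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


n = 20_000
# Left-hand side: integrand f(u(x)) u'(x) with u(x) = x^2 - 2, f(u) = 1/(u^2 + 1)
lhs = midpoint_sum(lambda x: 2 * x / (x ** 4 - 4 * x ** 2 + 5),
                   math.sqrt(2), math.sqrt(3), n)
# Right-hand side: integrand f(u) over [u(sqrt 2), u(sqrt 3)] = [0, 1]
rhs = midpoint_sum(lambda u: 1 / (u * u + 1), 0.0, 1.0, n)

print(lhs, rhs, math.pi / 4)
```

The two estimates use different integrands on different intervals, yet agree with each other and with $\frac{\pi}{4}$, which is precisely what the theorem asserts.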
23 Hence the old adage, "Differentiation is a science, whereas integration is an art." To illustrate by example, consider $f(x) = \arctan\big(e^x\cos(3x^2) + 4x^3\big)$. The derivative is easily found using the product and chain rules:
\[ \frac{df}{dx} = \frac{1}{1 + \big(e^x\cos(3x^2) + 4x^3\big)^2} \Big( e^x\cos(3x^2) - 6xe^x\sin(3x^2) + 12x^2 \Big) \]
By contrast, if you want to find an explicit anti-derivative of $f(x)$, the integration analogues (parts/substitution) are essentially useless. Similarly, the integral
\[ \int_0^1 \arctan\big(e^x\cos(3x^2) + 4x^3\big)\,dx \]
is likely impossible to evaluate explicitly and can only be approximated, say by using Riemann sums.
Exercises 4.34. Key concepts: Complete statements of FTC parts I & II, Integration by Parts/Substitution
1. Calculate the following limits:
   (a) $\displaystyle\lim_{x \to 0} \frac{1}{x} \int_0^x e^{t^2}\,dt$   (b) $\displaystyle\lim_{h \to 0} \frac{1}{h} \int_3^{3+h} e^{t^2}\,dt$
2. Let $f(t) = \begin{cases} 0 & \text{if } t < 0 \\ t & \text{if } 0 \le t \le 1 \\ 4 & \text{if } t > 1 \end{cases}$
   (a) Determine the function $F(x) = \int_0^x f(t)\,dt$ and sketch it. Where is $F$ continuous?
   (b) Where is $F$ differentiable? Calculate $F'$ at the points of differentiability.
3. Let $f$ be continuous on $\mathbb{R}$.
   (a) Define $F(x) = \int_{x-1}^{x+1} f(t)\,dt$. Carefully show that $F$ is differentiable on $\mathbb{R}$ and compute $F'$.
   (b) Repeat for $G(x) = \int_0^{\sin x} f(t)\,dt$.
4. Recall Examples 4.24.4 and 4.27.3. Describe all anti-derivatives $F$ of $f$ on $[0, 1) \cup (1, 2]$. Which satisfy $\int_0^2 f(x)\,dx = F(2) - F(0)$?
5. Suppose $u, v$ satisfy the hypotheses of integration by parts. By FTC I, $\int_a^x u'(t)v(t)\,dt$ is an anti-derivative of $u'(x)v(x)$: what does integration by parts say is another?
6. Use a substitution to integrate $\int_0^1 x\sqrt{1 - x^2}\,dx$.
7. Use integration by parts and the substitution rule to evaluate $\int_0^b \arcsin x\,dx$ for any $b < 1$.
8. Use integration by parts to evaluate $\int_0^b x \arctan x\,dx$ for any $b > 0$.
9. If $f$ and $u$ satisfy the hypotheses of the substitution rule, explain why both $(f \circ u)u'$ and $f$ are integrable on the required intervals.
10. We prove a simpler version of the fundamental theorem when $f : [a, b] \to \mathbb{R}$ is continuous.
    Part I  Define $F(x) = \int_a^x f(t)\,dt$. If $c, x \in [a, b]$ where $c \ne x$, prove that
    \[
    m \le \frac{F(x) - F(c)}{x - c} \le M
    \]
    where $m, M$ are the minimum and maximum values of $f(t)$ on the closed interval with endpoints $c, x$; why do $m, M$ exist? Now deduce that $F'(c) = f(c)$.
    Part II  Now suppose $F$ is any anti-derivative of $f$ on $[a, b]$. Use Part I and the mean value theorem to prove that $\int_a^b f(t)\,dt = F(b) - F(a)$.
4.36 Improper Integrals
The Riemann integral has several limitations. Even allowing for functions to be integrable on open intervals (Definition 4.12), the existence of $\int_a^b f(x)\,dx$ requires both:
• That $(a, b)$ be a bounded interval.
• That $f$ be bounded on $(a, b)$.
Limits provide a natural way to extend the Riemann integral to unbounded intervals and functions.
Definition 4.32. Suppose $f : [a, b) \to \mathbb{R}$ satisfies the following properties:
• $f$ is integrable on every closed bounded subinterval $[a, t] \subseteq [a, b)$.
• If $b$ is finite, then $f$ is unbounded at $b$ ($b$ can be $\infty$!)
The improper integral of $f$ on $[a, b)$ is
\[
\int_a^b f(x)\,dx := \lim_{t \to b^-} \int_a^t f(x)\,dx
\]
This is convergent or divergent as is the limit.
If an integral is improper at its lower limit ($f : (a, b] \to \mathbb{R}$, etc.), then $\int_a^b f(x)\,dx := \lim_{s \to a^+} \int_s^b f(x)\,dx$.
If an integral is improper at both ends, choose any $c \in (a, b)$ and define
\[
\int_a^b f(x)\,dx = \lim_{s \to a^+} \int_s^c f(x)\,dx + \lim_{t \to b^-} \int_c^t f(x)\,dx
\]
provided both one-sided improper integrals exist and the sum of the limits makes sense (it is not of the form $\infty - \infty$).
Theorem 4.14 says that the choice of c for a doubly-improper integral is irrelevant.
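The limit in the definition can be watched numerically. The sketch below uses $f(x) = 1/\sqrt{x}$ on $(0, 1]$, an example chosen here for illustration rather than taken from the notes: it is unbounded at the lower limit, integrable on every $[s, 1]$, and the truncated integrals equal $2 - 2\sqrt{s}$. The helper `midpoint` and the sample values of `s` are assumptions of this sketch.

```python
import math

def midpoint(g, a, b, n=50_000):
    """Midpoint Riemann sum of g over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    # Unbounded as x -> 0+, so the integral over (0, 1] is improper at its lower limit
    return 1 / math.sqrt(x)

# Each row: (s, midpoint estimate of the integral over [s, 1], exact value 2 - 2*sqrt(s))
rows = []
for s in (1e-1, 1e-2, 1e-3, 1e-4):
    rows.append((s, midpoint(f, s, 1.0), 2 - 2 * math.sqrt(s)))

for s, approx, exact in rows:
    print(f"s = {s:g}: numeric = {approx:.5f}, exact = {exact:.5f}")
```

The exact column tends to 2 as $s \to 0^+$, which is the value of the improper integral in this illustration.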
Many properties of the Riemann integral transfer naturally to improper integrals, though not everything... For example, part 1 of Theorem 4.13 extends:
Theorem 4.33. If $0 \le f(x) \le g(x)$ on $[a, b)$, then $\int_a^b f \le \int_a^b g$ whenever the integrals exist (standard or improper). In particular:
• $\int_a^b f = \infty \implies \int_a^b g = \infty$
• $\int_a^b g$ convergent $\implies \int_a^b f$ converges to some value $\le \int_a^b g$
We leave some of the detail to Exercise 7.
Examples 4.34. 1. $\int_0^t x^2\,dx = \frac{1}{3}t^3$ for any $t > 0$. Clearly
\[
\int_0^\infty x^2\,dx = \lim_{t \to \infty} \frac{1}{3}t^3 = \infty
\]
More formally, the improper integral $\int_0^\infty x^2\,dx$ diverges to infinity.
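2. With $f(x) = x^{-4/3}$ defined on $[1, \infty)$,
\[
\int_1^\infty x^{-4/3}\,dx = \lim_{t \to \infty} \int_1^t x^{-4/3}\,dx = \lim_{t \to \infty} \Big[-3x^{-1/3}\Big]_1^t = \lim_{t \to \infty} \big(3 - 3t^{-1/3}\big) = 3
\]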
3. Consider $f(x) = |x|\,e^{-x^2/2}$ on $(-\infty, \infty)$. On any bounded interval $[0, t]$,
\[
\int_0^t f(x)\,dx = \int_0^t x e^{-x^2/2}\,dx = \Big[-e^{-x^2/2}\Big]_0^t = 1 - e^{-t^2/2} \xrightarrow[t \to \infty]{} 1
\]
By symmetry,
\[
\int_{-\infty}^{\infty} |x|\,e^{-x^2/2}\,dx = 1 + 1 = 2
\]
This example arises naturally in probability: multiplying by $\frac{1}{\sqrt{2\pi}}$ computes the expectation of $|X|$ when $X$ is a standard normally-distributed random variable
\[
E(|X|) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,|x|\,e^{-x^2/2}\,dx = \sqrt{\frac{2}{\pi}}
\]
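4. Our knowledge of derivatives $\frac{d}{dx}\sin^{-1} x = \frac{1}{\sqrt{1 - x^2}}$ (or the substitution rule) allows us to evaluate
\[
\int_0^1 \frac{1}{\sqrt{1 - x^2}}\,dx = \lim_{t \to 1^-} \int_0^t \frac{1}{\sqrt{1 - x^2}}\,dx = \lim_{t \to 1^-} \sin^{-1} t = \frac{\pi}{2}
\]
By symmetry, $\int_{-1}^{1} \frac{1}{\sqrt{1 - x^2}}\,dx = \pi$. By comparison, we obtain bounds on another improper integral:
\[
\frac{1}{\sqrt{1 - x^4}} \le \frac{1}{\sqrt{1 - x^2}} \implies \int_{-1}^{1} \frac{1}{\sqrt{1 - x^4}}\,dx \le \int_{-1}^{1} \frac{1}{\sqrt{1 - x^2}}\,dx = \pi
\]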
5. Improper integrals need not exist. For instance,
\[
\lim_{t \to \infty} \int_0^t \sin x\,dx = \lim_{t \to \infty} (1 - \cos t)
\]
diverges by oscillation.
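The behaviour in Examples 2, 4 and 5 can be illustrated numerically. This is only a sketch: the helper `midpoint`, the sample values of $t$, and the substitution $x = \sin\theta$ used to remove the endpoint singularities in Example 4 are choices made here, not part of the notes.

```python
import math

def midpoint(g, a, b, n=50_000):
    """Midpoint Riemann sum of g over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Example 2: the truncated integrals 3 - 3 t^(-1/3) creep up towards 3
partial2 = [3 - 3 * t ** (-1 / 3) for t in (1e1, 1e3, 1e6, 1e9)]
# Cross-check one of them (t = 1000) against a Riemann sum
numeric2 = midpoint(lambda x: x ** (-4 / 3), 1.0, 1000.0)

# Example 5: the truncated integrals 1 - cos t at t = pi, 2pi, ... never settle down
partial5 = [1 - math.cos(k * math.pi) for k in range(1, 7)]

# Example 4: with x = sin(theta), dx / sqrt(1 - x^4) becomes dtheta / sqrt(1 + sin^2 theta),
# a proper integral over [-pi/2, pi/2] that a Riemann sum handles comfortably
bounded = midpoint(lambda th: 1 / math.sqrt(1 + math.sin(th) ** 2),
                   -math.pi / 2, math.pi / 2)

print("Example 2 truncations:", [round(v, 4) for v in partial2])
print("Example 5 truncations:", [round(v, 4) for v in partial5])
print("Example 4 estimate:", round(bounded, 5), "vs bound pi =", round(math.pi, 5))
```

The estimate for Example 4 sits below $\pi$, in line with the comparison bound.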
Exercises 4.36. Key concepts: Formal definition and careful calculation of Improper Integrals
1. Use your answers from Section 4.34 to decide whether the improper integrals $\int_0^1 \arcsin x\,dx$ and $\int_0^\infty x \arctan x\,dx$ exist. If so, what are their values?
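2. Let $p$ be a positive constant. Prove:
\[
\int_0^1 \frac{1}{x^p}\,dx = \begin{cases} \frac{1}{1-p} & \text{if } p < 1 \\ \infty & \text{if } p \ge 1 \end{cases}
\qquad
\int_1^\infty \frac{1}{x^p}\,dx = \begin{cases} \frac{1}{p-1} & \text{if } p > 1 \\ \infty & \text{if } p \le 1 \end{cases}
\]
(The second of these justifies the convergence/divergence properties of $p$-series via the integral test)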
3. Suppose $f$ is integrable on $[a, b]$. Explain why $\int_a^b f(x)\,dx = \lim_{t \to b^-} \int_a^t f(x)\,dx$ is still true, even though the integral is not improper.
4. State a version of integration by parts modified for when $\int_a^b u'(x)v(x)\,dx$ is improper at $b$. Now evaluate $\int_0^\infty x e^{-4x}\,dx$.
5. What is wrong with the following calculation?
\[
\int_{-\infty}^{\infty} x\,dx = \lim_{t \to \infty} \frac{1}{2}x^2\,\Big|_{-t}^{t} = \lim_{t \to \infty} \frac{1}{2}(t^2 - t^2) = \lim_{t \to \infty} 0 = 0
\]
6. Prove or disprove: if $\int f$ and $\int g$ are convergent improper integrals, so is $\int fg$.
7. Prove part of Theorem 4.33. Suppose $0 \le f(x) \le g(x)$ for all $x \in [a, b)$, and that $\int_a^b g$ is a convergent improper integral. Prove that $\int_a^b f$ converges and that $\int_a^b f \le \int_a^b g$.
Extensions of the Riemann Integral (just for fun)
In the 1890s, Thomas Stieltjes[24] offered a generalization of the Riemann integral.
Definition 4.35. Let $f : [a, b] \to \mathbb{R}$ be bounded and $\alpha : [a, b] \to \mathbb{R}$ monotonically increasing. Given a partition $P = \{x_0, \ldots, x_n\}$ of $[a, b]$, define the sequence of differences
\[
\Delta\alpha_i = \alpha(x_i) - \alpha(x_{i-1})
\]
The upper/lower Darboux–Stieltjes sums/integrals are defined analogously to the pure Riemann case:
\[
U(f, P, \alpha) = \sum_{i=1}^n \sup_{[x_{i-1}, x_i]} f(x)\,\Delta\alpha_i
\qquad
L(f, P, \alpha) = \sum_{i=1}^n \inf_{[x_{i-1}, x_i]} f(x)\,\Delta\alpha_i
\]
\[
U(f, \alpha) = \inf_P U(f, P, \alpha)
\qquad
L(f, \alpha) = \sup_P L(f, P, \alpha)
\]
If $U(f, \alpha) = L(f, \alpha)$, we say that $f$ is Riemann–Stieltjes integrable of class $\mathcal{R}(\alpha)$ and denote its value $\int_a^b f(x)\,d\alpha$.
The standard Riemann integral corresponds to α(x) = x. It is the ability to choose other functions α
that makes the Riemann–Stieltjes integral both powerful and applicable.
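To make the definition concrete, here is a small sketch computing upper and lower Darboux–Stieltjes sums. It assumes, purely for simplicity, that $f$ is increasing, so that the supremum and infimum on each subinterval are attained at the right and left endpoints. The function name `darboux_stieltjes_sums` and the sample choice $f(x) = x$, $\alpha(x) = x^2$ on $[0, 1]$ are illustrative, not from the notes.

```python
def darboux_stieltjes_sums(f, alpha, partition):
    """Return (U(f, P, alpha), L(f, P, alpha)) for a partition given as a sorted list.

    ASSUMPTION: f is increasing, so on [x_{i-1}, x_i] the supremum of f is f(x_i)
    and the infimum is f(x_{i-1}). A general bounded f would need a genuine sup/inf.
    """
    upper = 0.0
    lower = 0.0
    for x_prev, x_next in zip(partition, partition[1:]):
        d_alpha = alpha(x_next) - alpha(x_prev)   # Delta alpha_i >= 0 since alpha is increasing
        upper += f(x_next) * d_alpha
        lower += f(x_prev) * d_alpha
    return upper, lower

n = 1000
P = [i / n for i in range(n + 1)]                  # uniform partition of [0, 1]
U, L = darboux_stieltjes_sums(lambda x: x, lambda x: x * x, P)
print(f"L = {L:.6f} <= integral <= U = {U:.6f}, gap = {U - L:.6f}")
```

For this choice the gap $U - L$ equals $1/n$, so refining the partition squeezes both sums onto a common value; since $\alpha$ is differentiable here, that value is $\int_0^1 x \cdot 2x\,dx = \frac{2}{3}$.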
Standard Properties Most results in sections 4.32 and 4.33 hold with suitable modifications, as does the discussion of improper integrals. For instance,
\[
f \in \mathcal{R}(\alpha) \iff \forall \epsilon > 0,\ \exists P \text{ such that } U(f, P, \alpha) - L(f, P, \alpha) < \epsilon
\]
The result regarding the piecewise continuity of $f$ is a notable exception: depending on $\alpha$, a piecewise continuous $f$ might not lie in $\mathcal{R}(\alpha)$.
Weighted integrals If $\alpha$ is differentiable, we obtain a standard Riemann integral
\[
\int_a^b f(x)\,d\alpha = \int_a^b f(x)\,\alpha'(x)\,dx
\]
weighted so that $f(x)$ contributes more when $\alpha$ is increasing rapidly.
Probability If $\alpha(a) = 0$ and $\alpha(b) = 1$, then $\alpha$ may be viewed as a probability distribution function and its derivative $\alpha'$ as the corresponding probability density function. For example:
1. The uniform distribution on $[a, b]$ has $\alpha(x) = \frac{1}{b-a}(x - a)$ so that
\[
\int_a^b f(x)\,d\alpha = \frac{1}{b - a} \int_a^b f(x)\,dx
\]
Since $\alpha'$ is constant, the integrals weigh all values of $x$ uniformly.
2. The standard normal distribution has $\alpha(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt$. The fact that $\alpha'(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$ is maximal when $x = 0$ reflects the fact that a normally distributed variable is clustered near its mean.
In all cases, $\int f(x)\,d\alpha = E(f(X))$ computes an expectation (see, e.g., Example 4.34.3).
[24] Stieltjes was Dutch; the pronunciation is roughly ‘steelchez.’
Non-differentiable or discontinuous $\alpha$ This provides major flexibility! For example, if $Q = \{s_0, \ldots, s_n\}$ partitions $[a, b]$, and $(c_k)_{k=1}^n$ is a positive sequence, then
\[
\alpha(x) = \begin{cases} 0 & \text{if } x = a \\ \sum_{i=1}^k c_i & \text{if } x \in (s_{k-1}, s_k] \end{cases}
\]
defines an increasing step function, and the Riemann–Stieltjes integral a weighted sum
\[
\int_a^b f(x)\,d\alpha = \sum_{i=1}^n c_i f(s_i)
\]
Taking an infinite increasing sequence $(s_n) \subseteq [a, b]$ results in an infinite series, which helps explain why so many results for series and integrals look similar!
This also touches on probability. For example, let $p \in [0, 1]$, $n \in \mathbb{N}$, and $s_k = k$ on the interval $[0, n]$. If $c_k = \binom{n}{k} p^k (1 - p)^{n-k}$, then
\[
\int f(x)\,d\alpha = \sum_{k=0}^n \binom{n}{k} p^k (1 - p)^{n-k} f(k) = E(f(X))
\]
is the expectation of $f(X)$ when $X \sim B(n, p)$ is binomially distributed.
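The binomial weighted sum is easy to evaluate directly. In the sketch below, the function name `binomial_expectation` and the parameters $n = 10$, $p = 0.3$ are illustrative choices; the checks use the standard facts that the weights sum to 1 and that a $B(n, p)$ variable has mean $np$ and variance $np(1-p)$.

```python
from math import comb

def binomial_expectation(f, n, p):
    """Weighted sum  sum_{k=0}^{n} C(n, k) p^k (1 - p)^(n - k) f(k)  =  E(f(X)), X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) * f(k) for k in range(n + 1))

n, p = 10, 0.3
total_mass = binomial_expectation(lambda k: 1, n, p)        # the weights c_k sum to 1
mean = binomial_expectation(lambda k: k, n, p)              # E(X) = n p
second_moment = binomial_expectation(lambda k: k * k, n, p) # E(X^2) = n p (1 - p) + (n p)^2

print(total_mass, mean, second_moment - mean**2)
```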
Lebesgue Integration: Integrals and Convergence
Lebesgue’s extension essentially uses rectangles whose heights tend to zero: cutting up the area under
a curve using horizontal instead of vertical strips. One of its major purposes is to permit a more general
interchange of limits and integration in many cases of pointwise (non-uniform) convergence. To see
the problem, consider the sequence of piecewise continuous functions
\[
f_n : [0, 1] \to \mathbb{R} : x \mapsto \begin{cases} 1 & \text{if } x = \frac{p}{q} \in \mathbb{Q} \text{ with } q \le n \\ 0 & \text{otherwise} \end{cases}
\]
Each $f_n$ is Riemann integrable with $\int_0^1 f_n(x)\,dx = 0$. However, the pointwise limit
\[
f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases}
\]
is not Riemann integrable (compare Example 4.8.2). In the Lebesgue theory, the limit $f$ turns out to be integrable with integral 0, so that
\[
\lim_{n \to \infty} \int_0^1 f_n(x)\,dx = \int_0^1 \lim_{n \to \infty} f_n(x)\,dx
\]
Recall (Theorem 2.19) that the interchange of limits and integrals would be automatic if the convergence $f_n \to f$ were uniform: of course the convergence isn’t uniform here.
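One way to see why each $f_n$ has integral 0 is that it is non-zero at only finitely many points, and a function vanishing off a finite set is Riemann integrable with integral 0. The sketch below simply lists those points with exact rational arithmetic; the function name `support_of_f_n` is hypothetical.

```python
from fractions import Fraction

def support_of_f_n(n):
    """Points of [0, 1] where f_n = 1: rationals p/q in [0, 1] with denominator q <= n.

    Fraction reduces to lowest terms, so the set removes duplicates such as 1/2 = 2/4.
    """
    return sorted({Fraction(p, q) for q in range(1, n + 1) for p in range(q + 1)})

sizes = [len(support_of_f_n(n)) for n in (1, 2, 5, 10)]
print(sizes)                                   # finite for every n, but growing with n
print([str(x) for x in support_of_f_n(5)])
```

The supports grow without bound and exhaust $\mathbb{Q} \cap [0, 1]$ in the limit, which is where Riemann integrability is lost.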
Like measure theory (recall Theorem 4.22), Lebesgue integration is a central topic in graduate analysis.