Math 140B - Notes
Neil Donaldson
Fall 2022
1 Continuity
The primary goal of this course is to make elementary calculus rigorous. We begin with a review of
some basic concepts and conventions.
Sets & Functions We are concerned with functions $f : U \to V$ where both $U, V$ are subsets of the
real numbers $\mathbb{R}$:
Domain $\operatorname{dom}(f) = U$; the inputs to $f$. Often implied to be the largest set on which a formula is
defined. In calculus examples, the domain is typically a union of intervals of positive length.
Codomain $\operatorname{codom}(f) = V$. We often take $V = \mathbb{R}$ by default.
Range $\operatorname{range}(f) = f(U) = \{f(x) : x \in U\}$; the outputs of $f$ and a subset of $V$.
Injectivity $f$ is injective/one-to-one if $f(x) = f(y) \implies x = y$.
Surjectivity $f$ is surjective/onto if $f(U) = V$.
Inverses $f$ is bijective/invertible if it is injective and surjective. Equivalently, $\exists f^{-1} : V \to U$ satisfying
\[ \forall u \in U,\ f^{-1}\bigl(f(u)\bigr) = u \quad\text{and}\quad \forall v \in V,\ f\bigl(f^{-1}(v)\bigr) = v \]
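Example 1.1. The function defined by $f(x) = \frac{1}{x(x-2)}$ has implied
\[ \operatorname{dom}(f) = \mathbb{R} \setminus \{0, 2\} = (-\infty, 0) \cup (0, 2) \cup (2, \infty), \qquad \operatorname{range}(f) = (-\infty, -1] \cup (0, \infty) \]
The function is neither injective nor surjective.
By restricting the domain/codomain, we obtain a bijection:
\[ \operatorname{dom}(\hat f) = [1, 2) \cup (2, \infty), \qquad \operatorname{codom}(\hat f) = (-\infty, -1] \cup (0, \infty) \]
with inverse
\[ \hat f^{-1}(y) = \begin{cases} 1 + y^{-1}\sqrt{y(y+1)} & \text{if } y > 0 \\ 1 - y^{-1}\sqrt{y(y+1)} & \text{if } y \le -1 \end{cases} \]
Now $\operatorname{dom}(\hat f^{-1}) = \operatorname{codom}(\hat f)$ and $\operatorname{codom}(\hat f^{-1}) = \operatorname{dom}(\hat f)$.
[Figure: graphs of $f(x)$ against $x$ and of $\hat f^{-1}(y)$ against $y$.]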
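As a quick sanity check on the restricted bijection in Example 1.1, the following sketch evaluates both round trips numerically. It assumes the single closed form $1 + \sqrt{1 + 1/y}$ for the inverse, obtained by solving $y = 1/(x^2 - 2x)$ for the root with $x \ge 1$; the names `f` and `f_hat_inv` are illustrative and not part of the notes.

```python
import math

def f(x):
    """f(x) = 1 / (x (x - 2)), defined for x not in {0, 2}."""
    return 1 / (x * (x - 2))

def f_hat_inv(y):
    """Assumed closed form for the inverse of f restricted to [1, 2) U (2, oo).

    Solving y = 1/(x^2 - 2x) for the root with x >= 1 gives
    x = 1 + sqrt(1 + 1/y), valid for y > 0 and for y <= -1.
    """
    return 1 + math.sqrt(1 + 1 / y)

# Round trip x -> f(x) -> x on the restricted domain.
for x in (1.0, 1.5, 1.9, 2.5, 3.0, 10.0):
    print(x, f_hat_inv(f(x)))

# Round trip y -> f_hat_inv(y) -> y on the restricted codomain.
for y in (-5.0, -1.0, 0.5, 4.0):
    print(y, f(f_hat_inv(y)))
```

Each printed pair should agree up to floating-point rounding, which is a numerical illustration of the bijection rather than a proof.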
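Suprema and Infima A set $U \subseteq \mathbb{R}$ is bounded above if it has an upper bound $M$:
\[ \exists M \in \mathbb{R} \text{ such that } \forall u \in U,\ u \le M \]
Axiom 1.2 (Completeness). If $U \subseteq \mathbb{R}$ is non-empty and bounded above then it has a least upper
bound, the supremum of $U$
\[ \sup U = \min\{M \in \mathbb{R} : \forall u \in U,\ u \le M\} \]
By convention, $\sup U = \infty$ if $U$ is unbounded above and $\sup \emptyset = -\infty$; now every subset of $\mathbb{R}$ has a
supremum. Similarly, the infimum of $U$ is its greatest lower bound:
\[ \inf U = \begin{cases} \max\{m \in \mathbb{R} : \forall u \in U,\ u \ge m\} & \text{if } U \ne \emptyset \text{ is bounded below} \\ -\infty & \text{if } U \ne \emptyset \text{ is unbounded below} \\ \infty & \text{if } U = \emptyset \end{cases} \]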
Examples 1.3. Here are four sets with their suprema and infima stated. You should be able to verify
these assertions directly from the definitions.
$U$        $\{1, 2, 3, 4\}$    $(0, 5)$    $(-\infty, \pi] \cap \mathbb{R}$    $\{\frac{1}{n} : n \in \mathbb{N}\}$
$\sup U$   $4$                 $5$         $\pi$                               $1$
$\inf U$   $1$                 $0$         $-\infty$                           $0$
Note how the supremum/infimum might or might not lie in the set itself.
Interiors, closures, boundaries and neighborhoods These last concepts might not be review, but
they will be used repeatedly.
Definition 1.4. Let $U \subseteq \mathbb{R}$. A value $a \in \mathbb{R}$ is interior to $U$ if it lies in some open subinterval of $U$:
\[ \exists \delta > 0 \text{ such that } (a - \delta, a + \delta) \subseteq U \]
A neighborhood of $a$ is any set to which $a$ is interior: the interval $(a - \delta, a + \delta)$ is an open $\delta$-neighborhood
of $a$. A punctured neighborhood of $a$ is a neighborhood with $a$ deleted.
The set of points interior to $U$ is denoted $U^\circ$.
A limit point of $U$ is the limit of some sequence $(x_n) \subseteq U$. The closure $\overline{U}$ is the set of limit points.
The boundary is the set $\partial U = \overline{U} \setminus U^\circ$.
Examples 1.5. 1. If $U = [1, 3)$, then $U^\circ = (1, 3)$, $\overline{U} = [1, 3]$ and $\partial U = \{1, 3\}$.
2. $\mathbb{Q}^\circ = \emptyset$ and $\overline{\mathbb{Q}} = \partial\mathbb{Q} = \mathbb{R}$.
3. $(3, 5) \cup (5, 7]$ is a punctured neighborhood of 5.
17 Continuity of Functions
Everything in this section^1 should be review.
Definition 1.6. A function $f : U \to \mathbb{R}$ is continuous at $u \in U$ if either of the following hold:
1. For all sequences $(x_n) \subseteq U$ converging to $u$, the sequence $(f(x_n))$ converges to $f(u)$.
2. $\forall \epsilon > 0$, $\exists \delta > 0$ such that $\forall x \in U$, $|x - u| < \delta \implies |f(x) - f(u)| < \epsilon$.
A function $f$ is continuous on $U$ if it is continuous at every point $u \in U$.
Examples 1.7. 1. We prove that $f(x) = x^3$ is continuous at $u = 2$.
(a) (Limit method) Let $x_n \to 2$. By the limit laws (i.e. $\lim(x_n^k) = (\lim x_n)^k$),
\[ \lim_{x_n \to 2} f(x_n) = \lim_{x_n \to 2} x_n^3 = \Bigl(\lim_{x_n \to 2} x_n\Bigr)^3 = 2^3 = f(2) \]
(b) ($\epsilon$-$\delta$ method) Let $\epsilon > 0$ be given and let $\delta = \min\{1, \frac{\epsilon}{19}\}$. Then
\[ |x - 2| < \delta \implies |x - 2| < 1 \implies 1 < x < 3 \]
from which
\[ |x^3 - 2^3| = |x - 2|\,|x^2 + 2x + 2^2| < 19\,|x - 2| < 19\delta \le \epsilon \]
where we used the triangle inequality.
2. Let
\[ g(x) = \begin{cases} x \sin\frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases} \]
Then $g$ is continuous at $x = 0$. Again this can be done with limits or an $\epsilon$-$\delta$ argument; both are essentially the squeeze theorem.
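3. The function defined by
\[ h(x) = \begin{cases} 1 + 2x^2 & \text{if } x < 1 \\ 2 - x & \text{if } x \ge 1 \end{cases} \]
is discontinuous at $x = 1$.
(a) The sequence with $x_n = 1 - \frac{1}{n}$ converges to 1, yet
\[ \lim h(x_n) = 3 \ne 1 = h(1) \]
(b) Choose $\epsilon = 1$ and suppose $\delta > 0$ is given. Now choose $x = \max\{1 - \frac{\delta}{2}, \frac{1}{\sqrt{2}}\}$ to see that
\[ |x - 1| < \delta \quad\text{and}\quad |h(x) - h(1)| = 2x^2 \ge 1 = \epsilon \]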
[Figure: graphs of $g(x)$ and $h(x)$ against $x$.]
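The choice $\delta = \min\{1, \epsilon/19\}$ from Examples 1.7, part 1(b), can be spot-checked numerically. The sketch below samples points of the open interval $(2 - \delta, 2 + \delta)$ and confirms that $|x^3 - 8| < \epsilon$ at each; the helper names `delta_for` and `check` and the sample count are illustrative assumptions, and sampling is of course no substitute for the proof.

```python
def f(x):
    return x ** 3

def delta_for(eps):
    # The choice made in the example: delta = min(1, eps / 19).
    return min(1.0, eps / 19.0)

def check(eps, samples=10_000):
    """Sample points with |x - 2| < delta and confirm |f(x) - f(2)| < eps."""
    d = delta_for(eps)
    for i in range(1, samples):
        # x runs through interior points of (2 - d, 2 + d).
        x = 2 - d + 2 * d * i / samples
        if abs(f(x) - f(2)) >= eps:
            return False
    return True

for eps in (10.0, 1.0, 0.1, 1e-3):
    print(eps, delta_for(eps), check(eps))
```

Every line should report `True` in its last column, in line with the estimate $|x^3 - 8| < 19|x - 2|$ on $1 < x < 3$.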
^1 Section numbers are identical to those in the official textbook.
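Theorem 1.8. The two parts of Definition 1.6 are equivalent.
Proof. ($1 \Rightarrow 2$) We prove the contrapositive. Suppose condition 2 is false; that is,
\[ \exists \epsilon > 0 \text{ such that } \forall \delta > 0,\ \exists x \in U \text{ with } |x - u| < \delta \text{ and } |f(x) - f(u)| \ge \epsilon \]
In particular, for any $n \in \mathbb{N}$ we may let $\delta = \frac{1}{n}$ to obtain
\[ \exists \epsilon > 0 \text{ such that } \forall n \in \mathbb{N},\ \exists x_n \in U \text{ with } |x_n - u| < \tfrac{1}{n} \text{ and } |f(x_n) - f(u)| \ge \epsilon \]
The sequence $(x_n)$ shows that condition 1 is false:
$\forall n$, $|x_n - u| < \frac{1}{n}$, whence $x_n \to u$.
$\forall n$, $|f(x_n) - f(u)| \ge \epsilon > 0$, whence $f(x_n)$ does not converge to $f(u)$.
($2 \Rightarrow 1$) Suppose condition 2 is true, that $(x_n) \subseteq U$ converges to $u$ and that $\epsilon > 0$ is given. Then
\[ \exists \delta > 0 \text{ such that } |x - u| < \delta \implies |f(x) - f(u)| < \epsilon \]
However, by the definition of convergence ($x_n \to u$),
\[ \exists N \in \mathbb{N} \text{ such that } n > N \implies |x_n - u| < \delta \implies |f(x_n) - f(u)| < \epsilon \]
Otherwise said, $f(x_n) \to f(u)$.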
Rather than use these definitions every time, it is helpful to have a working dictionary.
Theorem 1.9 (Common Continuous Functions).
1. Suppose $f$ and $g$ are continuous at $u$, that $h$ is continuous at $f(u)$ and that $k$ is constant. Then
the following are continuous at $u$ (if defined):
\[ f + g,\quad f - g,\quad fg,\quad \frac{f}{g},\quad |f|,\quad kf,\quad \max(f, g),\quad \min(f, g),\quad h \circ f \]
2. Algebraic^2 functions are continuous.
3. The common transcendental functions are continuous: exp, ln, sin, etc.
Example 1.10. f (x) = sin
3
x
2
+7
x2
+ cos
1
e
x
1
is continuous on its domain $(-\infty, 0) \cup (0, 1) \cup (1, \infty)$.
These claims are tedious to prove using elementary definitions. The first two require many uses
of the limit laws, while the transcendental claim is easier to defer until we can define the common
functions using power series, after which continuity comes for free.
^2 Constructed using finitely many addition/subtraction, multiplication/division and $n$th root operations.
Exercises 17 1. Give examples to show that $g \circ f$ being continuous can happen with:
(a) $f$ continuous and $g$ discontinuous.
(b) $g$ continuous and $f$ discontinuous.
(c) Both $f, g$ discontinuous.
You may use pictures, but make sure they clearly describe the functions $f, g$.
2. (a) Prove that the function $f(x) = x^3$ is continuous at $x = 2$ using an $\epsilon$-$\delta$ argument.
(b) Prove that $f(x) = x^3$ is continuous at $x = u$ using an $\epsilon$-$\delta$ argument.
3. Prove that the following are discontinuous at $x = 0$: use both definitions of continuity.
(a) $f(x) = 1$ for $x < 0$ and $f(x) = 0$ for $x \ge 0$.
(b) $g(x) = \sin(1/x)$ for $x \ne 0$ and $g(0) = 0$.
4. Suppose $f$ and $g$ are continuous at $u$. Prove the following using $\epsilon$-$\delta$ arguments.
(a) f g is continuous at u.
(b) If $h$ is continuous at $f(u)$, then $h \circ f$ is continuous at $u$.
5. Contrary to our standing assumption, suppose $f : U \to \mathbb{R}$ is a function whose domain $U$
contains an isolated point $a$: i.e. $\exists r > 0$ such that $(a - r, a + r) \cap U = \{a\}$. Prove that $f$ is
continuous at $a$.
6. Refresh your prerequisites by giving formal proofs of the following:
(a) (Suprema and sequences) If $M = \sup U$, then $\exists (x_n) \subseteq U$ such that $x_n \to M$.
(Remember that this has to work even if $M = \infty$ . . . )
(b) (Limit of a bounded sequence) If $(x_n) \subseteq [a, b]$ and $x_n \to x$, then $x \in [a, b]$.
(c) (Bolzano–Weierstraß) Every bounded sequence in $\mathbb{R}$ has a convergent subsequence.
(Hint: If $(x_n) \subseteq [a, b]$, explain why there exists a family of nested intervals $I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots$
such that infinitely many of the terms $(x_n)$ lie in each interval $I_k$. Hence obtain a subsequence
$(x_{n_k})$ and prove that it is Cauchy.^3)
7. (Hard) Consider the function $f : \mathbb{R} \to \mathbb{R}$ where
\[ f(x) = \begin{cases} \frac{1}{q} & \text{whenever } x = \frac{p}{q} \in \mathbb{Q} \text{ with } q > 0 \text{ and } \gcd(p, q) = 1 \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} \]
For example, $f(1) = f(2) = f(7) = 1$, and $f(\frac{1}{2}) = f(-\frac{1}{2}) = f(\frac{3}{2}) = \cdots = \frac{1}{2}$, etc. Prove that $f$
is continuous at each point of $\mathbb{R} \setminus \mathbb{Q}$ and discontinuous at each point of $\mathbb{Q}$.
^3 This is a good moment to review the notion of a Cauchy sequence
\[ \forall \epsilon > 0,\ \exists N \text{ such that } m, n > N \implies |x_m - x_n| < \epsilon \]
and the discussion of Cauchy completeness: $(x_n) \subseteq \mathbb{R}$ is convergent if and only if it is Cauchy.
18 Properties of Continuous Functions
The goal of this section is to describe the behavior of a continuous function on an interval. We first
consider the special case when the domain is a closed bounded interval [a, b].
Theorem 1.11 (Extreme Value Theorem). A continuous function on a closed, bounded interval is
bounded and attains its bounds. Otherwise said, if $f : [a, b] \to \mathbb{R}$ is continuous, then
\[ \exists x, y \in [a, b] \text{ such that } f(x) = \sup \operatorname{range}(f) \text{ and } f(y) = \inf \operatorname{range}(f) \]
In particular, the supremum and infimum are finite.
Proof. Suppose $f$ is continuous with domain $[a, b]$ and let $M = \sup\{f(x) : x \in [a, b]\}$. We invoke the
three parts of Exercise 17.6:
(Part a) There exists a sequence $(x_n) \subseteq [a, b]$ such that $f(x_n) \to M$.
(Part c) There exists a convergent subsequence $(x_{n_k})$ with limit $x$.
(Part b) $x \in [a, b]$.
Since $f$ is continuous, we now have $f(x) = \lim_{k \to \infty} f(x_{n_k}) = M$. This shows that $M$ is finite and that $f$
attains its least upper bound. For the lower bound, apply this to $-f$.
It is worth considering how the result can fail when one of the hypotheses is weakened. For example:
$f$ discontinuous $f : [0, 1] \to \mathbb{R} : x \mapsto \begin{cases} x & \text{if } x \ne 1 \\ 0 & \text{if } x = 1 \end{cases}$ is bounded but does not attain its bounds.
$\operatorname{dom}(f)$ not closed $f : [0, 1) \to \mathbb{R} : x \mapsto x$ is bounded but does not attain its bounds.
$\operatorname{dom}(f)$ not bounded $f : [0, \infty) \to \mathbb{R} : x \mapsto x$ is unbounded.
We now generalize to functions on arbitrary intervals. Our next result should be familiar from elementary
calculus and is intuitively obvious from the naïve notion of continuity: graph such a function
without taking your pen from the page.
Theorem 1.12 (Intermediate Value Theorem). Let $f : I \to \mathbb{R}$ be continuous on an interval $I$. Suppose
$a, b \in I$ with $a < b$ and that $f(a) \ne f(b)$. If $L$ lies between $f(a)$ and $f(b)$, then $\exists \xi \in (a, b)$ such
that $f(\xi) = L$.
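Example 1.13. Let $f(x) = \cos x$ with $a = \frac{\pi}{4}$, $b = 3\pi$ and $L = \frac{1}{2}$; then
\[ f(\xi) = L \iff \xi \in \Bigl\{\frac{\pi}{3}, \frac{5\pi}{3}, \frac{7\pi}{3}\Bigr\} \]
There may therefore be several suitable values of $\xi$. It is even possible (see Exercise 18.2) for there to be infinitely
many.
[Figure: graph of $f(x) = \cos x$ between $a$ and $b$ with the level $L$ marked.]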
Proof. Suppose WLOG that $f(a) < L < f(b)$ and let
\[ S = \{x \in [a, b] : f(x) < L\} \]
Plainly $S \subseteq [a, b)$ is non-empty, hence $\xi := \sup S$ exists and $\xi \in [a, b]$. It remains to show that $\xi$ satisfies the
required properties.
By Exercise 17.6, $\exists (s_n) \subseteq S$ with $\lim s_n = \xi$. Since $f$ is continuous, $f(\xi) = \lim f(s_n) \le L$. In particular, $\xi \ne b$.
[Figure: graph of $f$ on $[a, b]$ showing the set $S$, the level $L$ and the point $\xi = \sup S$.]
To finish the proof, we can play a similar game with the sequence defined by $t_n = \min\{b, \xi + \frac{1}{n}\}$; this
is left to Exercise 18.4.
Example 1.14. The intermediate value theorem is particularly useful for demonstrating the existence
of solutions to equations. For example, we can use the following steps to show that the equation
$x 2^x = 1$ has a solution.
$g(x) = x 2^x - 1$ is continuous.
$g(0) = -1 < 0$.
$g(1) = 1 > 0$.
By the intermediate value theorem, $\exists \xi \in (0, 1)$ such that $g(\xi) = 0$: that is, $\xi \cdot 2^\xi = 1$.
[Figure: graph of $g(x)$ crossing the $x$-axis at $\xi$.]
It is inefficient, but one can home in on $\xi$ by repeatedly halving the size of the interval: for instance,
\[ g(\tfrac{1}{2}) = \tfrac{\sqrt{2}}{2} - 1 < 0, \quad g(\tfrac{3}{4}) = \tfrac{3}{4} \cdot 2^{3/4} - 1 \approx 0.26 > 0 \quad \ldots \implies \tfrac{1}{2} < \xi < \tfrac{3}{4} \]
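The interval-halving idea at the end of Example 1.14 is the bisection method. The following is a minimal sketch under the assumptions of the example (a continuous $g$ with a sign change on $[0, 1]$); the function name `bisect` and the tolerance `tol` are illustrative choices rather than anything fixed by the notes.

```python
def g(x):
    # g(x) = x * 2^x - 1, as in Example 1.14.
    return x * 2 ** x - 1

def bisect(func, lo, hi, tol=1e-10):
    """Shrink [lo, hi] by halving while keeping a sign change inside it."""
    if func(lo) * func(hi) > 0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid   # a root lies in [lo, mid]
        else:
            lo = mid   # a root lies in [mid, hi]
    return (lo + hi) / 2

xi = bisect(g, 0.0, 1.0)
print(xi, g(xi))
```

Each pass is one application of the intermediate value theorem to a half-length interval, so the printed value lies between $\frac{1}{2}$ and $\frac{3}{4}$, as the example predicts.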
We finish with a useful corollary.
Corollary 1.15. Continuous functions map intervals to intervals (or points).
Proof. An interval $I$ is characterized by the following property
\[ \forall x_1, x_2 \in I,\ \forall x \in \mathbb{R},\ x_1 < x < x_2 \implies x \in I \]
Let $f : I \to \mathbb{R}$ be continuous and suppose its range $f(I)$ is not a single point. If $f(a) < L < f(b)$, then
$\exists \xi$ between $a, b$ such that $f(\xi) = L$. Otherwise said, $L \in f(I)$ and so $f(I)$ is an interval.
More generally, if $\operatorname{dom}(f) = \bigcup I_n$ is written as a union of disjoint intervals and $f$ is continuous, then
\[ \operatorname{range}(f) = \bigcup f(I_n) \]
is also a union of intervals, though these need not be disjoint: a continuous function can bring intervals
together, but cannot break an interval apart.
For example, $f(x) = \sqrt{x^2 - 4}$ has domain $(-\infty, -2] \cup [2, \infty)$ and range $[0, \infty)$: both original intervals
get mapped to the same interval by $f$.
A more general statement from topology says that if $f : U \to V$ is continuous between topological
spaces and $a, b$ lie in the same component of $U$, then $f(a)$ and $f(b)$ lie in the same component of $f(U)$.
In single-variable real analysis each component is an interval.
Exercises 18 1. Give examples of the following:
(a) An unbounded discontinuous function on a closed bounded interval.
(b) An unbounded continuous function on a non-closed bounded interval.
(c) A bounded continuous function on a closed unbounded interval which fails to attain its
bounds.
2. Consider the function $f(x) = \begin{cases} x \sin\frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$
(a) Explain why $f$ is continuous on any interval $I$.
(b) Suppose $a < 0 < b$ and that $f(a), f(b)$ have opposite signs. If $L = 0$, show that the
intermediate value theorem is satisfied by infinitely many distinct values $\xi$.
3. Use the intermediate value theorem to prove that the equation 8x
3
12x
2
2x + 1 = 0 has at
least 3 real solutions (and thus, by the fundamental theorem of algebra, exactly 3).
4. Complete the proof of the intermediate value theorem by defining $t_n = \min(b, \xi + \frac{1}{n})$.
5. (a) Suppose $f : U \to \mathbb{R}$ is continuous and that $U = \bigcup_{k=1}^{n} I_k$ is the union of a finite sequence $(I_k)$
of closed bounded intervals. Prove that $f$ is bounded and attains its bounds.
(b) Let $U = \bigcup_{n=1}^{\infty} I_n$, where $I_n = [\frac{1}{2n}, \frac{1}{2n-1}]$ for each $n \in \mathbb{N}$. Give an example of a continuous
function $f : U \to \mathbb{R}$ which is either unbounded or does not attain its bounds. Explain.
19 Uniform Continuity
Recall the $\epsilon$-$\delta$ definition of continuity: for $f : U \to \mathbb{R}$ to be continuous at all points^4 $y \in U$, we require
\[ \forall y \in U,\ \forall \epsilon > 0,\ \exists \delta > 0 \text{ such that } (\forall x \in U)\ |x - y| < \delta \implies |f(x) - f(y)| < \epsilon \]
Note the order of the quantifiers: $\delta$ is permitted to depend on both $y$ and $\epsilon$. In the naïve sense of
continuity ($x$ close to $y$ $\implies$ $f(x)$ close to $f(y)$), the meaning of ‘close’ is seen to depend on the location
$y$. Uniform continuity is a stronger condition where the meaning of ‘close’ is independent of location.
Definition 1.16. $f : U \to \mathbb{R}$ is uniformly continuous if
\[ \forall \epsilon > 0,\ \exists \delta > 0 \text{ such that } (\forall x, y \in U)\ |x - y| < \delta \implies |f(x) - f(y)| < \epsilon \]
We’ve included the (typically) hidden quantifiers ($\forall x, y$) in both definitions to make clear that $\epsilon$ and
$\delta$ are independent of $x$ and $y$. Note also that the definition is now symmetric in $x$ and $y$.
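Example 1.17. Consider $f(x) = \frac{1}{x}$.
1. If $0 < a < b \le \infty$, then $f$ is uniformly continuous on $[a, b)$.
Let $\epsilon > 0$ be given and let $\delta = a^2 \epsilon$. Then $\forall x, y \in [a, b)$,
\[ |x - y| < \delta \implies \Bigl|\frac{1}{x} - \frac{1}{y}\Bigr| = \frac{|y - x|}{xy} < \frac{\delta}{xy} \le \frac{\delta}{a^2} = \epsilon \]
2. If $0 < b \le \infty$, then $f$ is not uniformly continuous on $(0, b)$.
Let $\epsilon = 1$ and suppose $\delta > 0$ is given.
Let $x = \min(\delta, 1, \frac{b}{2})$ and $y = \frac{x}{2}$.
Certainly $x, y \in (0, b)$ and $|x - y| = \frac{x}{2} \le \frac{\delta}{2} < \delta$. However,
\[ |f(x) - f(y)| = \frac{1}{x} \ge 1 = \epsilon \]
[Figure: graph of $f(x) = \frac{1}{x}$ over $[a, b)$ with intervals of width $\delta$ and height $\epsilon$ marked.]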
Think about how ϵ and δ must relate as one slides the intervals in the picture up/down and left/right.
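The failure of uniform continuity in part 2 of Example 1.17 can be seen numerically: however small $\delta$ is, the pair $x = \min(\delta, 1, \frac{b}{2})$, $y = \frac{x}{2}$ stays within $\delta$ while the values of $f$ differ by at least $\epsilon = 1$. The sketch below assumes a finite right endpoint `b`; the name `witness_reciprocal` is hypothetical.

```python
def witness_reciprocal(delta, b=1.0):
    """Return the pair (x, y) used in part 2 of the example for f(x) = 1/x."""
    x = min(delta, 1.0, b / 2)
    y = x / 2
    return x, y

for delta in (1.0, 0.1, 1e-3, 1e-6):
    x, y = witness_reciprocal(delta)
    close_inputs = abs(x - y) < delta          # the inputs are within delta
    far_outputs = abs(1 / x - 1 / y) >= 1      # yet the outputs differ by >= 1
    print(delta, x, y, close_inputs, far_outputs)
```

Both flags should be `True` on every line, which is exactly the negation of Definition 1.16 with $\epsilon = 1$.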
Some intuition will help make sense of the above examples.
Bounded/unbounded gradient In part 1, $\epsilon = \delta a^{-2}$, where $\frac{1}{a^2} = |f'(a)|$ bounds the gradient of $f$.
By contrast, the slope of $f$ is unbounded in part 2.
Extendability In part 1 (if $b \ne \infty$), the domain of $f$ may be extended: $g : [a, b] \to \mathbb{R} : x \mapsto \frac{1}{x}$ is
continuous. In part 2, this is impossible: there is no continuous function $g : [0, b) \to \mathbb{R}$ such
that $g(x) = \frac{1}{x}$ whenever $x > 0$.
If the gradient of a continuous function is bounded or if you can ‘fill in the holes’ at the endpoints of
its domain, then the function is uniformly continuous. While uniform continuity is most useful
in proofs where the independence of $\delta$ from location is critical, it is often one of the above properties
that is being invoked. The remainder of this section involves making these observations watertight.
^4 To promote the symmetry in the coming definition, we use $y$ instead of $u$ for a generic point of $\operatorname{dom}(f)$.
Theorem 1.18. Let $f : I \to \mathbb{R}$ be differentiable on an interval $I$. If the derivative $f'$ is bounded on
the interior $I^\circ$, then $f$ is uniformly continuous on $I$.
The proof depends on the mean value theorem, which we’ll prove later in the term.
Proof. Suppose $|f'(x)| \le M$ on $I^\circ$. Let $\epsilon > 0$ be given, let $\delta = \frac{\epsilon}{M}$ and suppose $x, y \in I$ with $x > y$.
Then
\[ |x - y| < \delta \implies \exists \xi \in I^\circ \text{ such that } f'(\xi) = \frac{f(x) - f(y)}{x - y} \quad \text{(MVT)} \]
\[ \implies |f(x) - f(y)| = |f'(\xi)|\,|x - y| < M\delta = \epsilon \]
Theorem 1.18 isn’t a biconditional: for instance, Exercise 19.5 shows that $f(x) = \sqrt{x}$ on $[0, \infty)$ and
$g(x) = x^{1/3}$ on $\mathbb{R}$ are both uniformly continuous even though they have unbounded slope.
We now discuss the idea of extendability and how uniform continuity relates to continuity on closed
sets. First we see that for closed bounded sets, uniform continuity is nothing new.
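Theorem 1.19. If $g : [a, b] \to \mathbb{R}$ is continuous, then it is uniformly continuous.
Proof. Suppose $g$ is continuous but not uniformly so. Then
\[ \exists \epsilon > 0 \text{ such that } \forall \delta > 0,\ \exists x, y \in [a, b] \text{ for which } |x - y| < \delta \text{ and } |g(x) - g(y)| \ge \epsilon \tag{$\ast$} \]
For each $n \in \mathbb{N}$, let $\delta = \frac{1}{n}$ to see that there exist sequences $(x_n), (y_n) \subseteq [a, b]$ satisfying the above.
By Bolzano–Weierstraß, the bounded sequence $(x_n)$ has a convergent subsequence $x_{n_k} \to x \in [a, b]$.
Clearly
\[ |x_{n_k} - y_{n_k}| < \frac{1}{n_k} \to 0 \implies y_{n_k} \to x \]
But then, since $g$ is continuous at $x$, $|g(x_{n_k}) - g(y_{n_k})| \to |g(x) - g(x)| = 0$, which contradicts ($\ast$).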
Now we build to a partial converse of this.
Lemma 1.20. If $f : U \to \mathbb{R}$ is uniformly continuous and $(x_n) \subseteq U$ is a Cauchy sequence, then
$(f(x_n))$ is also Cauchy.
Proof. Let $\epsilon > 0$ be given. Then:
(Uniform Continuity) $\exists \delta > 0$ such that $|x - y| < \delta \implies |f(x) - f(y)| < \epsilon$.
(Cauchy) $\exists N \in \mathbb{N}$ such that $m, n > N \implies |x_m - x_n| < \delta$.
Putting these together, we see that
\[ \exists N \in \mathbb{N} \text{ such that } m, n > N \implies |f(x_m) - f(x_n)| < \epsilon \]
Otherwise said, $(f(x_n))$ is Cauchy.
We now see that a function $f : I \to \mathbb{R}$ is uniformly continuous on a bounded interval if and only if it
has a continuous extension $g : \overline{I} \to \mathbb{R}$ defined on the closure of its domain.
Theorem 1.21. Suppose $f : I \to \mathbb{R}$ is continuous where $I$ is a bounded interval with endpoints
$a < b$. Define $g : [a, b] \to \mathbb{R}$ via
\[ g(x) = \begin{cases} f(x) & \text{if } x \in I \\ \lim f(x_n) & \text{whenever } (x_n) \subseteq I \text{ and } x_n \to a \\ \lim f(x_n) & \text{whenever } (x_n) \subseteq I \text{ and } x_n \to b \end{cases} \]
Then $f$ is uniformly continuous if and only if $g$ is well-defined ($g$ is continuous, if well-defined).
Proof. ($\Rightarrow$) Suppose $f$ is uniformly continuous on $I$ and that $a \notin I$. Let $(x_n), (y_n) \subseteq I$ be sequences
converging to $a$. To show that $g$ is well-defined, we must prove that $(f(x_n))$ and $(f(y_n))$ are
convergent, and to the same limit.
Define a sequence
\[ (u_n) = (x_1, y_1, x_2, y_2, x_3, y_3, \ldots) \]
Since $(x_n)$ and $(y_n)$ have the same limit $a$, we see that $u_n \to a$. But then $(u_n)$ is Cauchy;
by Lemma 1.20, $(f(u_n))$ is also Cauchy and thus convergent. Since $(f(x_n))$ and $(f(y_n))$ are
subsequences of a convergent sequence, they must also converge to the same (finite!) limit.
The argument when $b \notin I$ is identical.
($\Leftarrow$) Certainly if $g$ is well-defined then it is continuous. By Theorem 1.19 it is uniformly so. Since
$f = g$ on a subset of $\operatorname{dom}(g)$, the same choice of $\delta$ will work for $f$ as for $g$: $f$ is therefore
uniformly continuous.
Examples 1.22. 1. $f : x \mapsto x^2$ is uniformly continuous on $(-3, 10)$ since its derivative $f'(x) = 2x$
is bounded ($|f'(x)| = 2|x| < 20$) on its domain. It has the obvious continuous extension
$g(x) = x^2$ on $[-3, 10]$.
2. Neither argument works for $f(x) = x^2$ on the domain $(-3, \infty)$: both $f'$ and the domain $(-3, \infty)$
are unbounded, so neither Theorem 1.18 nor 1.21 applies.
Instead, note that if $\epsilon = 1$, then for any $\delta > 0$, we can choose $x = \frac{1}{\delta}$ and $y = \frac{1}{\delta} + \frac{\delta}{2}$. Clearly
\[ |x - y| = \frac{\delta}{2} < \delta \quad\text{and}\quad |x^2 - y^2| = 1 + \frac{\delta^2}{4} > 1 = \epsilon \]
whence $f$ is not uniformly continuous.
3. $f(x) = x \sin\frac{1}{x}$ is continuous on the interval $(0, \infty)$. Strictly, neither Theorem 1.18 nor 1.21
applies since the derivative
\[ f'(x) = \sin\frac{1}{x} - \frac{1}{x}\cos\frac{1}{x} \]
is unbounded as is the domain. However, by breaking the domain into two pieces. . .
On $(1, \infty)$, the derivative is bounded: $|f'(x)| \le 1 + \frac{1}{x} \le 2$ by the triangle inequality.
Theorem 1.18 says $f$ is uniformly continuous on $(1, \infty)$.
$f$ is continuous on $(0, 1]$ and, by the squeeze theorem,
\[ x_n \to 0^+ \implies \lim f(x_n) = 0 \]
Extending $f$ so that $f(0) = 0$ defines a continuous extension. By Theorem 1.21, $f$ is uniformly
continuous on $(0, 1]$.
Putting this together, $f$ is uniformly continuous on $(0, \infty)$. Indeed the function
\[ h(x) = \begin{cases} x \sin\frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases} \]
is uniformly continuous on $\mathbb{R}$.
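The witness pair in part 2 of Examples 1.22 can be verified with exact rational arithmetic, which avoids the rounding that would blur $\frac{1}{\delta} + \frac{\delta}{2}$ for very small $\delta$. This is only a sketch: the function name `witness_square` is hypothetical, and $\delta$ is assumed to be a positive rational.

```python
from fractions import Fraction

def witness_square(delta):
    """Return (x, y) = (1/delta, 1/delta + delta/2) for f(x) = x^2."""
    delta = Fraction(delta)
    x = 1 / delta
    y = 1 / delta + delta / 2
    return x, y

for delta in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    x, y = witness_square(delta)
    # |x - y| = delta/2 < delta, yet |x^2 - y^2| = 1 + delta^2/4 > 1.
    print(delta, abs(x - y) < delta, abs(x**2 - y**2) > 1)
```

Because the arithmetic is exact, the identity $|x^2 - y^2| = 1 + \frac{\delta^2}{4}$ holds on the nose for every rational $\delta$ tried, not just approximately.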
Exercises 19 1. Which of the following functions are uniformly continuous on the specified set?
Justify your answers.
(a) f (x) = x
4
on [1, 1].
(b) f (x) = x
4
on (1, 1].
(c) f (x) = x
4
on ( 0, 2].
(d) f (x) = x
4
on ( 1, 2].
(e) f (x) = x
2
sin
1
x
on ( 0, 1].
2. Prove that each of the following functions is uniformly continuous on the indicated set by
verifying the $\epsilon$-$\delta$ property.
(a) $f(x) = 2x - 14$ on $\mathbb{R}$.
(b) $f(x) = x^3$ on $[1, 5]$.
(c) $f(x) = x^{-1}$ on $(1, \infty)$.
(d) $f(x) = \frac{x+1}{x+2}$ on $[0, 1]$.
3. Prove that $f(x) = x^4$ is not uniformly continuous on $\mathbb{R}$.
4. (a) Suppose that $f$ is uniformly continuous on a bounded interval $I$. Prove that $f$ is bounded
on $I$.
(b) Use part (a) to write down a bounded interval on which the function $f(x) = \tan x$ is
defined, but not uniformly continuous.
5. (a) Let $f(x) = \sqrt{x}$ with domain $[0, \infty)$. Show that $f'(x)$ is unbounded, but that $f$ is still
uniformly continuous on $[0, \infty)$.
(Hint: try $\delta = \epsilon^2$ and WLOG assume $0 \le y \le x$. Now compute $(\sqrt{y} + \epsilon)^2$ . . . )
(b) Prove that $g(x) = x^{1/3}$ is uniformly continuous on $\mathbb{R}$.
(Hint: try $\delta = (\frac{\epsilon}{2})^3$ and consider the cases $x \ge y \ge 0$, $x \le y \le 0$ and $x > 0 > y$ separately)
20 Limits of Functions
You’ve likely seen many calculations of the following form in elementary calculus:
\[ \lim_{x \to 3} \frac{x^2 - 9}{x - 3} = \lim_{x \to 3} \frac{(x - 3)(x + 3)}{x - 3} = \lim_{x \to 3} (x + 3) = 6 \]
Our next goal is to make this notation precise and to tie it to our earlier notion of limit.
Definition 1.23. Suppose $f : U \to \mathbb{R}$, that $S \subseteq U$, and that $a$ is the limit of a sequence^5 in $S$.
We write $\lim_{x \to a^S} f(x) = L$ and say that $L$ is the limit of $f(x)$ as $x$ tends to $a$ along $S$, provided
\[ \forall (x_n) \subseteq S,\ \lim x_n = a \implies \lim f(x_n) = L \]
We can now define one-sided and two-sided limits:
Right-hand limit: $\lim_{x \to a^+} f(x) = L$ means $\exists S = (a, b) \subseteq U$ for which $\lim_{x \to a^S} f(x) = L$
Left-hand limit: $\lim_{x \to a^-} f(x) = L$ means $\exists S = (c, a) \subseteq U$ for which $\lim_{x \to a^S} f(x) = L$
Two-sided limit: $\lim_{x \to a} f(x) = L$ means $\exists S = (c, a) \cup (a, b) \subseteq U$ for which $\lim_{x \to a^S} f(x) = L$
The one-sided definitions apply when $a = \pm\infty$, though we omit the $\pm$ modifiers: for instance,
\[ \lim_{x \to \infty} f(x) = L \iff \lim_{x \to \infty^S} f(x) = L \text{ for some } S = (c, \infty) \subseteq U \]
The subtlety in the definition is that for $\lim_{x \to a} f(x)$ to be defined, the domain $U$ of $f$ must contain
a punctured neighborhood $S$ of $a$: i.e. $a \in \overline{U}^\circ$. The one-sided limits similarly require a one-sided
punctured neighborhood. These conditions are always satisfied if $U$ is a disjoint union of intervals
of positive length, in which case $\lim_{x \to a^{(\pm)}} f(x) = L$ if and only if
\[ \lim f(x_n) = L,\quad \forall (x_n) \subseteq U \setminus \{a\} \text{ tending to } a \text{ (from above/below)} \]
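In this situation, Definition 1.6 recovers the familiar idea from elementary calculus:
\[ f \text{ is continuous at } a \in U \iff f(a) = \begin{cases} \lim_{x \to a} f(x) & \text{when } a \in U^\circ \\ \lim_{x \to a^\pm} f(x) & \text{when } a \in U \setminus U^\circ \end{cases} \tag{$\ast$} \]
By modifying the proof of Theorem 1.8 in the case that $a, L \in \mathbb{R}$ are finite, the above can be
written in $\epsilon$-language. For example $\lim_{x \to a} f(x) = L$ means
\[ \forall \epsilon > 0,\ \exists \delta > 0 \text{ such that } (\forall x \in \mathbb{R})\ 0 < |x - a| < \delta \implies |f(x) - L| < \epsilon \]
If $a$ and/or $L$ is infinite, use the language of unboundedness: e.g. $\lim_{x \to a} f(x) = \infty$ means
\[ \forall M > 0,\ \exists \delta > 0 \text{ such that } 0 < |x - a| < \delta \implies f(x) > M \]
There are fifteen distinct combinations: three two-sided and six each of the one-sided limits!
^5 I.e. $a \in \overline{S}$ or perhaps $a = \pm\infty$ if $S$ is unbounded.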
Examples 1.24. 1. Let $f(x) = \frac{2 + x}{x}$ where $\operatorname{dom}(f) = U = \mathbb{R} \setminus \{0\} = (-\infty, 0) \cup (0, \infty)$
The following should be clear:
\[ \lim_{x \to 3} f(x) = \frac{5}{3} \qquad \lim_{x \to \infty} f(x) = 1 \]
To compute the first, for instance, we could choose $S = (0, 3) \cup (3, \infty)$; if $(x_n) \subseteq S$ and $x_n \to 3$,
then the limit laws justify the first claim
\[ \lim_{n \to \infty} f(x_n) = \frac{2 + 3}{3} = \frac{5}{3} \]
as does the fact that $f$ is continuous at $x = 3$. The second claim can be checked similarly.
We can take one-sided limits at $x = 0$:
\[ \lim_{x \to 0^+} f(x) = \infty \quad\text{and}\quad \lim_{x \to 0^-} f(x) = -\infty \]
For instance, let $(x_n) \subseteq (0, \infty)$ satisfy $x_n \to 0$. Again, the
limit laws show that $\lim_{n \to \infty} f(x_n) = \infty$, which is enough to
justify the first claim.
Finally, the sequences defined by $x_n = \frac{1}{n}$ and $y_n = -\frac{1}{n}$
both lie in $S = \mathbb{R} \setminus \{0\}$ and converge to zero, yet
\[ \lim_{n \to \infty} f(x_n) = \infty \ne -\infty = \lim_{n \to \infty} f(y_n) \]
It follows that the two-sided limit $\lim_{x \to 0} f(x)$ does not exist.
[Figure: graph of $f(x)$ with the values $f(x_n)$ and $f(y_n)$ marked for $x_1, x_2, y_1, y_2$.]
2. Let $f(x) = \frac{1}{x^2}$ whenever $x \ne 0$ and additionally let $f(0) = 0$. Here the two-sided limit exists
\[ \lim_{x \to 0} f(x) = \infty \]
However the value of the function at $x = 0$ does not equal this limit: clearly $f$ is discontinuous
at $x = 0$.
3. We revisit our motivating example. Let $f(x) = \frac{x^2 - 9}{x - 3}$ have domain $U = \mathbb{R} \setminus \{3\}$. Whenever
$x_n \ne 3$, we see that
\[ f(x_n) = \frac{(x_n - 3)(x_n + 3)}{x_n - 3} = x_n + 3 \]
By the limit laws, we conclude that $\lim f(x_n) = 3 + 3 = 6$ and so
\[ \lim_{x \to 3} \frac{x^2 - 9}{x - 3} = 6 \]
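The sequence formulation of Definition 1.23 lends itself to a small experiment: evaluate $f$ along sequences in $S = \mathbb{R} \setminus \{3\}$ that tend to 3 and watch the values approach 6. The sketch below uses exact rationals so that the cancellation in $\frac{x^2 - 9}{x - 3}$ is not disturbed by rounding; the sequences $3 \pm \frac{1}{n}$ are illustrative choices, and checking two sequences is of course not a proof.

```python
from fractions import Fraction

def f(x):
    """f(x) = (x^2 - 9)/(x - 3), undefined at x = 3."""
    return (x**2 - 9) / (x - 3)

for n in (1, 10, 100, 1000):
    above = 3 + Fraction(1, n)   # x_n -> 3 from above
    below = 3 - Fraction(1, n)   # x_n -> 3 from below
    print(n, float(f(above)), float(f(below)))
```

The printed values are $6 \pm \frac{1}{n}$, while evaluating `f` at exactly 3 raises `ZeroDivisionError`, mirroring the fact that $3 \notin U$.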
Since we referenced the limit laws so often in the above examples, it is appropriate to update them
to this new context. We do so without proof.
Corollary 1.25 (Limit Laws). Suppose $f, g : U \to \mathbb{R}$ are such that $L = \lim_{x \to a} f(x)$ and $M = \lim_{x \to a} g(x)$ exist.
Then,
1. $\lim_{x \to a} (f + g)(x) = L + M$.
2. $\lim_{x \to a} (fg)(x) = LM$.
3. $\lim_{x \to a} \frac{f}{g}(x) = \frac{L}{M}$ (requires $M \ne 0$).
4. If $L \in \mathbb{R}$ and $h$ is continuous at $L$, then $\lim_{x \to a} (h \circ f)(x) = h(L)$.
5. (Squeeze Theorem) If $L = M$ and $f(x) \le h(x) \le g(x)$ for all $x \in U$, then $\lim_{x \to a} h(x) = L$.
The corresponding results for one-sided limits also hold.
As with the original limit laws for sequences, parts 1–3 apply provided the limits are not indeterminate
forms (e.g. $\infty - \infty$, $0 \cdot \infty$, $\frac{0}{0}$, $\frac{\infty}{\infty}$). We’ll see later how l’Hôpital’s rule may be applied to such cases.
Examples 1.26. 1. Since $f(x) = \frac{x^2 + 5}{3x^2 - 2}$ is a rational function (continuous at all points of its domain),
we quickly conclude that
\[ \lim_{x \to 2} \frac{x^2 + 5}{3x^2 - 2} = f(2) = \frac{9}{10} \]
Alternatively, we may tediously invoke the other parts of the theorem:
\[ \lim_{x \to 2} \frac{x^2 + 5}{3x^2 - 2} \overset{(3)}{=} \frac{\lim(x^2 + 5)}{\lim(3x^2 - 2)} \overset{(1)}{=} \frac{\lim x^2 + \lim 5}{\lim 3x^2 - \lim 2} \overset{(2)}{=} \frac{(\lim x)^2 + 5}{(\lim 3)(\lim x)^2 - 2} = \frac{2^2 + 5}{3 \cdot 2^2 - 2} = \frac{9}{10} \]
2. As $x \to \infty$, the simplistic approach results in a nonsense indeterminate form:
\[ \lim_{x \to \infty} \frac{x^2 + 5}{3x^2 - 2} \overset{?}{=} \frac{\lim(x^2 + 5)}{\lim(3x^2 - 2)} \overset{?}{=} \frac{\infty}{\infty} \]
However, a little pre-theorem algebra quickly yields^6
\[ \lim_{x \to \infty} \frac{x^2 + 5}{3x^2 - 2} = \lim_{x \to \infty} \frac{1 + 5x^{-2}}{3 - 2x^{-2}} = \frac{\lim(1 + 5x^{-2})}{\lim(3 - 2x^{-2})} = \frac{1}{3} \]
^6 Be careful! The expressions $\frac{x^2 + 5}{3x^2 - 2}$ and $\frac{1 + 5x^{-2}}{3 - 2x^{-2}}$ do not describe the same function, yet their limits at $\infty$ are equal. Being
able easily to equate these limits is one of the advantages of the $S$ formulation of Definition 1.23. Think about why; what
is a suitable set $S$ in this context?
Classification of Discontinuities
We finish this section by considering the ways in which a function can fail to be continuous.
Definition 1.27. Suppose that a function is continuous on an interval except at finitely many values:
we call these isolated discontinuities.
Examples 1.28. 1. $f(x) = \frac{1}{x}$ has a discontinuity at $x = 0$ since it is continuous on the interval $\mathbb{R}$,
except at one point $x = 0$. Note that a function need not be defined at a discontinuity!
2. $f(x) = \frac{1}{\sin\frac{1}{x}}$ has a non-isolated discontinuity at $x = 0$: on any interval containing zero, $f$ has
infinitely many discontinuities: $x = \frac{1}{\pi n}$ where $|n| \in \mathbb{N}$.
The next result helps us classify isolated discontinuities.
Theorem 1.29. Let $f : U \to \mathbb{R}$ and suppose $a \in U^\circ$ is an interior point. Then
\[ \lim_{x \to a} f(x) = L \iff \lim_{x \to a^+} f(x) = L = \lim_{x \to a^-} f(x) \]
Proof. ($\Rightarrow$) Let $S = (c, a) \cup (a, b)$ satisfy the definition for $\lim_{x \to a} f(x) = L$. Since any sequence (say) in
$S^+$ is also in $S$, plainly $S^+ = (a, b)$ and $S^- = (c, a)$ satisfy the one-sided definitions.
($\Leftarrow$) Suppose $S^- = (c, a)$ and $S^+ = (a, b)$ satisfy the one-sided definitions and denote $S = S^- \cup S^+$.
Let $(x_n) \subseteq S$ be such that $x_n \to a$. Clearly $(x_n)$ is the disjoint union of two subsequences
$(x_n) \cap S^+$ and $(x_n) \cap S^-$, both of which^7 converge to $a$. There are three cases:
$L$ finite: Let $\epsilon > 0$ be given. Because of the one-sided limits,
\[ \exists N_1 \text{ such that } n > N_1 \text{ and } x_n > a \implies |f(x_n) - L| < \epsilon \]
\[ \exists N_2 \text{ such that } n > N_2 \text{ and } x_n < a \implies |f(x_n) - L| < \epsilon \]
Now let $N = \max(N_1, N_2)$ in the definition of limit to see that $\lim f(x_n) = L$. Since this
holds for all sequences $(x_n) \subseteq S$ converging to $a$, we conclude that $\lim_{x \to a} f(x) = L$.
$L = \pm\infty$: This is an exercise.
Example 1.30. Recalling elementary calculus, we show that the following is continuous at $x = 1$:
\[ f(x) = \begin{cases} x^2 - 3 & \text{if } x \ge 1 \\ 3 - 5x & \text{if } x < 1 \end{cases} \]
Step 1: Compute the left- and right-handed limits and check that these are equal:
\[ \lim_{x \to 1^-} f(x) = \lim_{x \to 1^-} (3 - 5x) = -2, \qquad \lim_{x \to 1^+} f(x) = \lim_{x \to 1^+} (x^2 - 3) = -2 \]
Step 2: Check that the value of the limits equals that of the function: $f(1) = 1^2 - 3 = -2$.
^7 It is possible for one of these subsequences to be finite; say if $x_n > a$ for all large $n$. This is of no concern; one of the $\epsilon$-$N$
conditions would be empty and thus vacuously true.
Recalling the characterization of continuity via limits that follows Definition 1.23, we describe the different types of isolated discontinuity at some point $a$.
Removable discontinuity The two-sided limit $\lim_{x \to a} f(x) = L$ is finite, and either
$f(a) \ne L$ or $f(a)$ is undefined.
The term comes from the fact that we can remove the discontinuity
by changing the behavior of $f$ only at $x = a$:
\[ \tilde f(x) := \begin{cases} f(x) & \text{if } x \ne a \\ \lim_{x \to a} f(x) & \text{if } x = a \end{cases} \]
is now continuous at $x = a$. In the pictures,
\[ f_1(x) = \frac{x^2 - 9}{x - 3} \quad\text{and}\quad f_2(x) = \begin{cases} x \sin(\frac{1}{x}) & \text{if } x \ne 0 \\ 1 & \text{if } x = 0 \end{cases} \]
have removable discontinuities at $x = 3$ and $0$ respectively.
[Figure: graphs of $f_1(x)$ and $f_2(x)$.]
Jump Discontinuity The one-sided limits are finite but not equal. A
jump discontinuity cannot be removed by changing or inserting
a value at $x = a$. The picture shows
\[ g(x) = \frac{|x|}{x} = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases} \]
with a jump discontinuity at $x = 0$.
[Figure: graph of $g(x)$.]
Infinite discontinuity The one-sided limits exist but at least one is
infinite. We call the line $x = a$ a vertical asymptote. The picture
shows $h(x) = \frac{1}{x^2}$ with an infinite discontinuity at $x = 0$. The fact that the
one-sided limits of $h$ are equal (and infinite) is irrelevant.
[Figure: graph of $h(x)$.]
Essential discontinuity At least one of the one-sided limits does
not exist. The picture shows $j(x) = \sin\frac{1}{x}$ for which neither of
the limits $\lim_{x \to 0^\pm} j(x)$ exists.
[Figure: graph of $j(x)$.]
It is also reasonable to refer to removable, infinite or essential discontinuities at interval endpoints.
Exercises 20 1. For the function f (x) =
x
3
|
x
|
, determine the limits lim
x
f (x), lim
x→−
f (x), lim
x0
f (x),
lim
x0
+
f (x) and lim
x0
f (x), if they exist.
2. Evaluate the following limits using the methods of this section
(a) lim
xa
x
a
x a
(b) lim
xa
x
3/2
a
3/2
x a
(c) lim
x0
1 + 3x
2
1
x
2
(d) lim
x→−
4 + 3x
2
2
x
3. Suppose that the limits $L = \lim_{x \to a^+} f(x)$ and $M = \lim_{x \to a^+} g(x)$ exist.
(a) Suppose $f(x) \le g(x)$ for all $x$ in some interval $(a, b)$. Prove that $L \le M$.
(b) Do we have the same conclusion if we have $f(x) < g(x)$ on $(a, b)$, or can we conclude that
$L < M$? Prove your assertion, or give a counter-example.
4. Suppose that $\lim_{x \to \infty} f(x) = \lim_{x \to \infty} g(x) = \infty$. Using only this information, which of the following
can you evaluate? Prove your assertions in each case.
(a) $\lim_{x \to \infty} (f + g)(x)$ (b) $\lim_{x \to \infty} (f - g)(x)$
(c) $\lim_{x \to \infty} (fg)(x)$ (d) $\lim_{x \to \infty} (f/g)(x)$
5. Complete the proof of Theorem 1.29 by considering the $L = \pm\infty$ cases.
6. Graph $f : \mathbb{R} \to \mathbb{R}$, find and identify the types of its discontinuities.
\[ f(x) = \begin{cases} 0 & x = 0, \pm 1 \\ \frac{x}{|x|} & 0 < |x| < 1 \\ x^2 & |x| > 1 \end{cases} \]
7. Find the discontinuities and identify their types for the following function
f (x) =
(
1
x
sin
1
x
if x < 0 or x > 1
1
x
if 0 < x 1
8. Let $a \in \overline{U}^\circ$. Verify the claim following Definition 1.23: $\lim_{x \to a} f(x) = L$ if and only if
\[ \forall \epsilon > 0,\ \exists \delta > 0 \text{ such that } 0 < |x - a| < \delta \implies |f(x) - L| < \epsilon \]
9. Recall Exercise 17.5, where we saw that a function f : U R is continuous at any isolated
point a U.
(a) Any function with domain dom( f ) = Z is continuous everywhere! Explain why we
cannot define any limits lim
xa
(±)
f (x) for such a function.
(Hint: Being unable to define a limit is different from saying lim f (x) = DNE: see page 13.)
(b) Suppose g(x) = x
2
h(x) has dom(g) = {0} {
1
n
: n Z}, where h is any function taking
values in the interval [1, 1]. Explain why g is continuous at every point of its domain.
(These awkward examples of continuity can be avoided if we follow our usual approach where a domain
is a union of intervals of positive length. This restriction is essentially baked in to the Definition 1.23.)
2 Sequences and Series of Functions
If $(f_n)$ is a sequence of functions, what should we mean by $\lim f_n$? This question is of great relevance to the history of calculus; Isaac Newton's work in the late 1600s made great use of power series, which are naturally constructed as limits of sequences of polynomials.
For instance, for each $n \in \mathbb{N}_0$, we might consider the polynomial function $f_n : \mathbb{R} \to \mathbb{R}$ defined by
$$f_n(x) = \sum_{k=0}^{n} x^k = 1 + x + \cdots + x^n$$
This is easy to work with, to differentiate and integrate using the power law. What, however, are we to make of the following series?
$$f(x) := \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + \cdots$$
Does this make sense? What is its domain? Does it equal the limit of the sequence $(f_n)$ in a meaningful way? Is it continuous, differentiable or integrable? Can we compute its limit/derivative/integral term-by-term in the obvious way; for instance, is it legitimate to write
$$f'(x) = \sum_{n=1}^{\infty} n x^{n-1} = 1 + 2x + 3x^2 + \cdots?$$
To many in Newton's time, these questions were of diminished importance when compared to the burgeoning applications of calculus to the natural sciences. However, for the 18th and 19th century mathematicians who followed, the widespread application of calculus only increased the imperative to rigorously address these issues.
23 Power Series
First we recall some of the important definitions, examples and results regarding infinite series.
Definition 2.1. Let $(b_n)_{n=m}^{\infty}$ be a sequence of real numbers. The (infinite) series $\sum b_n$ is the limit of the sequence $(s_n)$ of partial sums,
$$s_n = \sum_{k=m}^{n} b_k = b_m + b_{m+1} + \cdots + b_n, \qquad \sum_{n=m}^{\infty} b_n = \lim_{n\to\infty} s_n$$
The series $\sum b_n$ converges, diverges to infinity or diverges by oscillation$^{8}$ if the sequence $(s_n)$ does so.
$\sum b_n$ is absolutely convergent if $\sum |b_n|$ converges. A convergent series that is not absolutely convergent is conditionally convergent.
8 Recall that every sequence $(s_n)$ has subsequences tending to each of
$$\limsup s_n = \lim_{N\to\infty} \sup\{s_n : n > N\} \quad\text{and}\quad \liminf s_n = \lim_{N\to\infty} \inf\{s_n : n > N\}$$
If $(s_n)$ converges, or diverges to $\pm\infty$, then $\lim s_n = \limsup s_n = \liminf s_n$. The remaining case, divergence by oscillation, is when $\liminf s_n \neq \limsup s_n$: there exist (at least) two subsequences tending to different limits.
Examples 2.2. These examples form the standard reference dictionary for analysis of more complex series. Make sure you are familiar with them!$^{9}$
1. (Geometric series) Let $r$ be a constant; then $s_n = \sum_{k=0}^{n} r^k = \frac{1 - r^{n+1}}{1 - r}$ whenever $r \neq 1$. It follows that $\sum_{n=0}^{\infty} r^n$
   converges (absolutely) to $\frac{1}{1-r}$ if $-1 < r < 1$
   diverges to $\infty$ if $r \ge 1$
   diverges by oscillation if $r \le -1$
2. (Telescoping series) If $b_n = \frac{1}{n(n+1)}$, then $s_n = \sum_{k=1}^{n} b_k = 1 - \frac{1}{n+1} \implies \sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 1$.
3. $\sum_{n=1}^{\infty} \frac{1}{n^2}$ is (absolutely) convergent. In fact $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$, though checking this explicitly is tricky.
4. (Harmonic series) $\sum_{n=1}^{\infty} \frac{1}{n}$ is divergent to $\infty$.
5. (Alternating harmonic series) $\sum_{n=1}^{\infty} \frac{(-1)^n}{n}$ is conditionally convergent.
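As a sanity check on these reference examples, partial sums can be computed directly. The following is a minimal sketch in Python (the helper name `partial_sums` is ours, not part of the notes); it uses exact rational arithmetic so that the closed forms for the telescoping and geometric series can be compared without rounding.

```python
from fractions import Fraction

def partial_sums(term, start, count):
    """Return the first `count` partial sums of sum_{k >= start} term(k)."""
    sums, total = [], Fraction(0)
    for k in range(start, start + count):
        total += term(k)
        sums.append(total)
    return sums

# Telescoping series: b_k = 1/(k(k+1)), so s_n = 1 - 1/(n+1) exactly.
telescoping = partial_sums(lambda k: Fraction(1, k * (k + 1)), 1, 50)

# Geometric series with r = 1/2: s_n = (1 - r^(n+1)) / (1 - r).
r = Fraction(1, 2)
geometric = partial_sums(lambda k: r ** k, 0, 50)

# Harmonic series: the partial sums are unbounded, but grow very slowly.
harmonic = partial_sums(lambda k: Fraction(1, k), 1, 1024)

print(float(telescoping[-1]))   # close to 1
print(float(geometric[-1]))     # close to 1 / (1 - r) = 2
print(float(harmonic[-1]))      # s_1024, which exceeds 10/2 + 1 = 6
```

Numerical evidence of this kind can suggest convergence or divergence but never proves it: the harmonic partial sums grow so slowly that they look bounded on any short run.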
Theorem 2.3 (Root Test). Given a series $\sum b_n$, let $\beta = \limsup |b_n|^{1/n}$,
• If $\beta < 1$ then the series converges absolutely.
• If $\beta > 1$ then the series diverges.
9 We give sketch proofs, and/or refer you to a standard ‘test.’ Review these if you are unfamiliar.
1. $s_n - r s_n = 1 + r + \cdots + r^n - (r + \cdots + r^n + r^{n+1}) = 1 - r^{n+1} \implies s_n = \frac{1 - r^{n+1}}{1 - r}$.
2. $b_n = \frac{1}{n} - \frac{1}{n+1} \implies s_n = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n} - \frac{1}{n+1}\right) = 1 - \frac{1}{n+1}$.
3. Use the comparison or integral tests. Alternatively: For each $n \ge 2$, we have $\frac{1}{n^2} < \frac{1}{n(n-1)}$. By part 2,
$$s_n = \sum_{k=1}^{n} \frac{1}{k^2} < 1 + \sum_{k=2}^{n} \frac{1}{k(k-1)} < 2$$
Since $(s_n)$ is a monotone up sequence, bounded above by 2, we conclude that $\sum \frac{1}{n^2}$ is convergent.
4. Use the integral test. Alternatively, observe that
$$s_{2^{n+1}} - s_{2^n} = \sum_{k=2^n+1}^{2^{n+1}} \frac{1}{k} \ge \frac{2^n}{2^{n+1}} = \frac{1}{2} \implies s_{2^n} \ge \frac{n}{2} \xrightarrow[n\to\infty]{} \infty$$
Since $s_n = \sum_{k=1}^{n} \frac{1}{k}$ defines an increasing sequence we conclude that $s_n \to \infty$.
5. Use the alternating series test, or explicitly check that both the even and odd partial sums $(s_{2n})$ and $(s_{2n+1})$ are convergent (monotone and bounded) to the same limit.
Root Test: $\beta < 1 \implies \exists \epsilon > 0$ such that $|b_n|^{1/n} \le 1 - \epsilon$ (for large $n$) $\implies \sum |b_n|$ converges by comparison with the convergent geometric series $\sum (1 - \epsilon)^n$.
$\beta > 1 \implies$ a subsequence of $(|b_n|^{1/n})$ converges to $\beta > 1$, whence $b_n \not\to 0 \implies \sum b_n$ diverges ($n$th-term test).
The root test is inconclusive if $\beta = 1$. Some simple inequalities$^{10}$ yield a simpler test.
Corollary 2.4 (Ratio Test). Given a series $\sum b_n$,
• If $\limsup \left|\frac{b_{n+1}}{b_n}\right| < 1$ then $\sum b_n$ converges absolutely.
• If $\liminf \left|\frac{b_{n+1}}{b_n}\right| > 1$ then $\sum b_n$ diverges.
We can now properly define and analyze our main objects of interest.
Definition 2.5. A power series centered at $c \in \mathbb{R}$ is a formal expression
$$\sum_{n=m}^{\infty} a_n (x - c)^n$$
where $(a_n)_{n=m}^{\infty}$ is a sequence of real numbers and $x$ is considered a variable.
It is common to refer simply to a series, and modify by infinite/power when clarity requires. We
almost always have m = 0 or 1, and it is common for examples to be centered at c = 0.
Example 2.6. Using the geometric series formula, we see that
$$\sum_{n=0}^{\infty} \frac{(-1)^n}{2^n} (x - 4)^n = \frac{1}{1 + \frac{x-4}{2}} = \frac{2}{x - 2} \quad\text{whenever}\quad \left|-\frac{x - 4}{2}\right| < 1 \iff 2 < x < 6$$
The series is valid (converges) only on a small subinterval of the implied domain of the function $x \mapsto \frac{2}{x-2}$. The behavior of both as $x \to 2^+$ should not be a surprise; evaluating the power series at $x = 2$ results in the divergent infinite series $\sum 1 = +\infty$.
By contrast, as $x \to 6^-$, we see that limits and infinite series do not interact the way we might expect,
$$\lim_{x\to 6^-} \sum_{n=0}^{\infty} \frac{(-1)^n}{2^n} (x - 4)^n = \lim_{x\to 6^-} \frac{2}{x - 2} = \frac{1}{2}$$
$$\sum_{n=0}^{\infty} \lim_{x\to 6^-} \frac{(-1)^n}{2^n} (x - 4)^n = \sum (-1)^n = \text{DNE}$$
with the last divergent by oscillation.
As the example shows, we cannot always take limits inside an infinite sum; understanding when we can do this is one of our primary goals.
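The failure in Example 2.6 is easy to observe numerically. The sketch below (function names are our own) compares partial sums with the closed form $\frac{2}{x-2}$ just inside the interval, then evaluates the partial sums at the endpoint $x = 6$, where every term has already been replaced by its limit $(-1)^n$.

```python
def partial_sum(x, n_terms):
    """Sum of the first n_terms terms of sum_{n>=0} (-1)^n / 2^n * (x - 4)^n."""
    ratio = -(x - 4) / 2          # the series is geometric with this common ratio
    return sum(ratio ** n for n in range(n_terms))

def closed_form(x):
    """The function 2 / (x - 2) that the series represents on (2, 6)."""
    return 2 / (x - 2)

# Just inside the interval of convergence the partial sums approach 2 / (x - 2).
x = 5.9
print(partial_sum(x, 2000), closed_form(x))

# At x = 6 the terms are (-1)^n, so the partial sums oscillate between 1 and 0.
print([partial_sum(6.0, k) for k in range(1, 7)])
```

Convergence at $x = 5.9$ is slow because the common ratio is $-0.95$; this is why many terms are used.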
10 $\liminf \left|\frac{b_{n+1}}{b_n}\right| \le \liminf |b_n|^{1/n} \le \limsup |b_n|^{1/n} \le \limsup \left|\frac{b_{n+1}}{b_n}\right|$
Radius and Interval of Convergence
At any real number $x$, a power series may converge absolutely, converge conditionally, diverge to $\pm\infty$, or diverge by oscillation. A power series defines a function whose implied domain is the set on which the series converges. In the previous example, the domain was an interval $(2, 6)$. By applying the root test (Theorem 2.3), we can show that the domain of every power series is an interval.
Theorem 2.7 (Root Test for Power Series). Given a series $\sum_{n=0}^{\infty} a_n (x - c)^n$, define$^{11}$
$$R = \frac{1}{\limsup |a_n|^{1/n}}$$
Exactly one of the following is true:
• $R = \infty \implies$ the series converges absolutely for all $x \in \mathbb{R}$
• $R = 0 \implies$ the series converges only when $x = c$
• $R \in \mathbb{R}^+ \implies$ the series converges absolutely when $|x - c| < R$ and diverges when $|x - c| > R$
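Proof. For each fixed $x \in \mathbb{R}$, let $b_n = a_n (x - c)^n$ and apply the root test to $\sum b_n$, noting that
$$\limsup |b_n|^{1/n} = \begin{cases} 0 & \text{if } x = c \text{ or } R = \infty \\ \infty & \text{if } x \neq c \text{ and } R = 0 \\ \limsup |a_n|^{1/n} \, |x - c| = \frac{1}{R} |x - c| & \text{otherwise} \end{cases}$$
In the final situation, $\limsup |b_n|^{1/n} < 1 \iff |x - c| < R$, etc.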
Definition 2.8. The radius of convergence is the value $R$ defined in Theorem 2.7. The interval of convergence is the set of values $x$ for which the series converges; the implied domain.

Radius of convergence | Interval of convergence
$R = \infty$ | $(-\infty, \infty)$
$R = 0$ | $\{c\}$
$R \in \mathbb{R}^+$ | $(c - R, c + R)$, $(c - R, c + R]$, $[c - R, c + R)$, or $[c - R, c + R]$

In the third case convergence/divergence at the endpoints of the interval of convergence must be tested separately.
By applying Corollary 2.4, we obtain a more user-friendly result.
Corollary 2.9 (Ratio Test for Power Series). If the limit exists, $R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right|$.
11 Since $|a_n| \ge 0$, we here adopt the convention that $\frac{1}{0} = \infty$ and $\frac{1}{\infty} = 0$. With similar caveats, it is also reasonable to write $R = \liminf |a_n|^{-1/n}$.
Examples 2.10. 1. The series $\sum_{n=1}^{\infty} \frac{1}{n} x^n$ is centered at 0. The ratio test tells us that
$$R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty} \frac{1/n}{1/(n+1)} = \lim_{n\to\infty} \frac{n+1}{n} = 1$$
Test the endpoints of the interval of convergence separately:
$x = 1$: $\sum \frac{1}{n} = \infty$ diverges
$x = -1$: $\sum \frac{(-1)^n}{n}$ converges (conditionally)
We conclude that the interval of convergence is $[-1, 1)$.
It can be seen that the series converges to $-\ln(1 - x)$ on its interval of convergence. As in Example 2.6, this function has a larger domain, $(-\infty, 1)$, than that of the series.
[Figure: graphs of $y = \sum \frac{1}{n} x^n$ and $y = -\ln(1 - x)$.]
2. The series $\sum_{n=1}^{\infty} \frac{1}{n^2} x^n$ similarly has
$$R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty} \frac{(n+1)^2}{n^2} = 1$$
Since $\sum \frac{1}{n^2}$ is absolutely convergent, we conclude that the power series also converges absolutely at $x = \pm 1$; the interval of convergence is $[-1, 1]$.
3. The series $\sum_{n=0}^{\infty} \frac{1}{n!} x^n$ converges absolutely for all $x \in \mathbb{R}$, since
$$R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty} \frac{(n+1)!}{n!} = \lim_{n\to\infty} (n + 1) = \infty$$
You should recall from elementary calculus that this series converges to the natural exponential function $\exp(x) = e^x$ everywhere on $\mathbb{R}$; indeed this is one of the common definitions of the exponential!
4. The series $\sum_{n=0}^{\infty} n!\, x^n$ has $R = 0$, and thus only converges at its center $x = 0$.
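5. Let $a_n = \left(\frac{2}{3}\right)^n$ if $n$ is even and $\left(\frac{3}{2}\right)^n$ if $n$ is odd. If we try to apply the ratio test to the series $\sum_{n=0}^{\infty} a_n x^n$, we see that
$$\left|\frac{a_n}{a_{n+1}}\right| = \begin{cases} \left(\frac{2}{3}\right)^{2n+1} & \text{if } n \text{ even} \\ \left(\frac{3}{2}\right)^{2n+1} & \text{if } n \text{ odd} \end{cases} \implies \limsup \left|\frac{a_n}{a_{n+1}}\right| = \infty \neq 0 = \liminf \left|\frac{a_n}{a_{n+1}}\right|$$
The ratio test therefore fails. However, by the root test,
$$|a_n|^{1/n} = \begin{cases} \frac{2}{3} & \text{if } n \text{ even} \\ \frac{3}{2} & \text{if } n \text{ odd} \end{cases} \implies R = \frac{1}{\limsup |a_n|^{1/n}} = \frac{1}{3/2} = \frac{2}{3}$$
It is easy to check that the series diverges at $x = \pm\frac{2}{3}$; the interval of convergence is $\left(-\frac{2}{3}, \frac{2}{3}\right)$.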
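The contrast between the two tests in Example 2.10.5 can be seen numerically. The following sketch is illustrative only: the helper names are ours, and the maximum over a late tail of the sequence is a crude stand-in for a genuine lim sup.

```python
def a(n):
    """Coefficients of Example 2.10.5: (2/3)^n for even n, (3/2)^n for odd n."""
    return (2 / 3) ** n if n % 2 == 0 else (3 / 2) ** n

def root_terms(coeff, n_max):
    """The values |a_n|^(1/n) for n = 1, ..., n_max."""
    return [abs(coeff(n)) ** (1 / n) for n in range(1, n_max + 1)]

def ratio_terms(coeff, n_max):
    """The values |a_n / a_(n+1)| for n = 1, ..., n_max."""
    return [abs(coeff(n) / coeff(n + 1)) for n in range(1, n_max + 1)]

roots = root_terms(a, 200)
limsup_estimate = max(roots[100:])   # largest value over a late tail
R_estimate = 1 / limsup_estimate
print(R_estimate)                    # approximately 2/3

ratios = ratio_terms(a, 40)
print(min(ratios), max(ratios))      # one tiny, one huge: the ratios have no limit
```

The root values simply alternate between $\frac{2}{3}$ and $\frac{3}{2}$, so the tail maximum recovers the lim sup exactly here; for less regular coefficients such an estimate should be treated with caution.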
With the help of the root test, we can understand the domain of a power series. The issues of limits,
continuity, differentiability and integrability are more delicate. We will return to these once we’ve
developed some of the ideas around convergence for sequences of functions.
Exercises 23 1. For each power series, find the radius and interval of convergence:
(a)
( 1)
n
n
2
4
n
x
n
(b)
( n + 1)
2
n
3
(x 3)
n
(c)
nx
n
(d)
1
n
n
(x + 7)
n
(e)
(x π)
n!
(f)
3
n
n
x
2n+1
2. For each $n \in \mathbb{N}$ let $a_n = \left(\frac{4 + 2(-1)^n}{5}\right)^n$
(a) Find $\limsup |a_n|^{1/n}$, $\liminf |a_n|^{1/n}$, $\limsup \left|\frac{a_{n+1}}{a_n}\right|$ and $\liminf \left|\frac{a_{n+1}}{a_n}\right|$.
(b) Do the series $\sum a_n$ and $\sum (-1)^n a_n$ converge? Why?
(c) Find the interval of convergence of the power series $\sum a_n x^n$.
3. Suppose that $\sum a_n x^n$ has radius of convergence $R$. If $\limsup |a_n| > 0$, prove that $R \le 1$.
4. On the interval $\left(-\frac{2}{3}, \frac{2}{3}\right)$, express the series in Example 2.10.5 as a simple function.
(Hints: Use geometric series formulæ and the fact that the value of an absolutely convergent series is independent of rearrangements)
5. Consider the power series
$$\sum_{n=1}^{\infty} \frac{1}{3^n n} (x - 7)^{5n+1} = \frac{1}{3}(x - 7)^{6} + \frac{1}{18}(x - 7)^{11} + \frac{1}{81}(x - 7)^{16} + \cdots$$
Since only one in five of the terms are non-zero, it is a little tricky to analyze using a naïve application of our standard tests.
(a) Explain why the ratio test for power series (Corollary 2.9) does not apply.
(b) Writing the series as $\sum a_m (x - 7)^m$, observe that
$$a_m = \begin{cases} \frac{5}{3^{\frac{m-1}{5}} (m - 1)} & \text{if } m \equiv 1 \bmod 5 \\ 0 & \text{otherwise} \end{cases}$$
Use the root test (Theorem 2.7) and your understanding of elementary limits to directly
compute the radius of convergence.
(c) Alternatively, write $\sum \frac{1}{3^n n} (x - 7)^{5n+1} = \sum b_n$. Apply the ratio test for infinite series (Corollary 2.4): what do you observe? Use your observation to compute the radius of convergence of the original series in a simpler manner than part (b).
(d) Finally, check the endpoints to determine the interval of convergence.
24 Uniform Convergence
In this section we consider sequences $(f_n)$ of functions $f_n : U \to \mathbb{R}$.
Example 2.11. For each $n \in \mathbb{N}$, consider $f_n : (0, 1) \to \mathbb{R} : x \mapsto x^n$.
[Figure: graphs of $f_1(x)$, $f_2(x)$, $f_5(x)$ and $f_{50}(x)$ on $(0, 1)$.]
There turn out to be several good notions of convergence for sequences of functions; the simplest is where, for each $x$, $(f_n(x))$ is treated as a separate sequence of real numbers.
Definition 2.12. Suppose a function $f$ and a sequence of functions $(f_n)$ are given. We say that $(f_n)$ converges pointwise to $f$ on $U$ if,
$$\forall x \in U, \quad \lim_{n\to\infty} f_n(x) = f(x)$$
It is common to write ‘$f_n \to f$ pointwise.’ For reference, we state two equivalent rephrasings:
1. $\forall x \in U$, $|f_n(x) - f(x)| \xrightarrow[n\to\infty]{} 0$;
2. $\forall x \in U$, $\forall \epsilon > 0$, $\exists N$ such that $n > N \implies |f_n(x) - f(x)| < \epsilon$.
As we’ll see shortly, the relative position of the quantifiers ($\forall x$ and $\exists N$) is crucial: in this definition, the value of $N$ is permitted to depend on $x$ as well as $\epsilon$.
Example (2.11, mk. II). The sequence $(f_n)$ converges pointwise on the domain $(0, 1)$ to
$$f : (0, 1) \to \mathbb{R} : x \mapsto 0$$
We prove this explicitly as a sanity check. First observe that
$$|f_n(x) - f(x)| = x^n$$
Suppose $x \in (0, 1)$, that $\epsilon > 0$ is given, and let $N = \frac{\ln \epsilon}{\ln x}$. Then
$$n > N \implies n \ln x < \ln \epsilon \implies x^n < \epsilon$$
where the inequality switches sign since $\ln x < 0$.
[Figure: graphs of several $f_n(x)$ on $(0, 1)$.]
The example is nice in that a sequence of continuous functions converges pointwise to a continuous
function. Unfortunately, this desirable situation is not universal.
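The dependence of $N$ on $x$ in the pointwise argument for $f_n(x) = x^n$ can be made concrete. The sketch below (the function name is ours) evaluates the threshold $N = \frac{\ln\epsilon}{\ln x}$ from the proof for a fixed $\epsilon$ and several values of $x$ approaching 1.

```python
import math

def N_pointwise(x, eps):
    """The threshold N = ln(eps) / ln(x) from the proof, for 0 < x < 1 and 0 < eps < 1."""
    return math.log(eps) / math.log(x)

eps = 0.01
for x in (0.5, 0.9, 0.99, 0.999):
    N = N_pointwise(x, eps)
    n = math.floor(N) + 1          # the first integer n with n > N
    print(x, N, x ** n < eps)      # the bound holds, but N grows as x -> 1
```

No single $N$ serves every $x \in (0, 1)$ at once, which is the phenomenon the uniform definition is designed to rule out.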
Example (2.11, mk. III). Extend the domain to include $x = 1$; define
$$g_n : (0, 1] \to \mathbb{R} : x \mapsto x^n$$
Each $g_n$ is a continuous function; however, the pointwise limit
$$g(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 & \text{if } x = 1 \end{cases}$$
has a jump discontinuity at $x = 1$.
[Figure: graphs of several $g_n(x)$ on $(0, 1]$.]
With the goal of having convergence of functions preserve continuity, we make a tighter definition.
Definition 2.13. $(f_n)$ converges uniformly to $f$ on $U$ if either of the following (equivalent) conditions holds:
1. $\sup_{x\in U} |f_n(x) - f(x)| \xrightarrow[n\to\infty]{} 0$, or,
2. $\forall \epsilon > 0$, $\exists N$ such that $\forall x \in U$, $n > N \implies |f_n(x) - f(x)| < \epsilon$
A common notation is $f_n \rightrightarrows f$, though we won’t use it.
[Figure: the graph of $f_n(x)$ lying inside a band of width $2\epsilon$ about the graph of $f(x)$.] Whenever $n > N$, the graph of $f_n(x)$ must lie between those of $f(x) \pm \epsilon$.
We’ll show that statements 1 and 2 are equivalent momentarily. For the present, compare with the corresponding statements for pointwise convergence:
• As with continuity versus uniform continuity, the distinction comes in the order of the quantifiers: in uniform convergence, $x$ is quantified after $N$ and so the same $N$ works for all $x$.
• Uniform convergence implies pointwise convergence.
For the last time, we revisit our main example.
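Example (2.11, mk. IV). If $f_n : (0, 1) \to \mathbb{R} : x \mapsto x^n$ and $f$ are defined as before, then the pointwise convergence $f_n \to f$ is non-uniform. We show this using both criteria.
1. For every $n$,
$$\sup_{x\in(0,1)} |f_n(x) - f(x)| = \sup\{x^n : 0 < x < 1\} = 1 \not\to 0$$
2. Suppose the convergence were uniform and let $\epsilon = \frac{1}{2}$. Then
$$\exists N \in \mathbb{N} \text{ such that } \forall x \in (0, 1),\ n > N \implies x^n < \frac{1}{2}$$
Since $N \in \mathbb{N}$, a simple choice results in a contradiction;
$$x = \left(\frac{1}{2}\right)^{\frac{1}{N+1}} \in (0, 1) \implies x^{N+1} = \frac{1}{2}$$
[Figure: graphs of $f_n$ on $(0, 1)$ leaving the band of width $\epsilon$ about the zero function near $x = 1$.]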
Theorem 2.14. The criteria for uniform convergence in Definition 2.13 are equivalent.
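Proof. ($1 \Rightarrow 2$) This follows from the fact that
$$\forall x \in U, \quad |f_n(x) - f(x)| \le \sup_{x\in U} |f_n(x) - f(x)|$$
($2 \Rightarrow 1$) Suppose $\epsilon > 0$ is given. Then
$$\exists N \in \mathbb{R} \text{ such that } \forall x \in U,\ n > N \implies |f_n(x) - f(x)| < \frac{\epsilon}{2}$$
But then
$$n > N \implies \sup_{x\in U} |f_n(x) - f(x)| \le \frac{\epsilon}{2} < \epsilon$$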
Somewhat amazingly, the subtle change of definition results in the preservation of continuity.
Theorem 2.15. Suppose that $(f_n)$ is a sequence of continuous functions. If $f_n \to f$ uniformly, then $f$ is continuous.
Proof. We demonstrate the continuity of $f$ at $a \in U$. Let $\epsilon > 0$ be given.
Since $f_n \to f$ uniformly,
$$\exists N \text{ such that } \forall x \in U,\ n > N \implies |f(x) - f_n(x)| < \frac{\epsilon}{3}$$
Choose any $n > N$. Since $f_n$ is continuous at $a$,
$$\exists \delta > 0 \text{ such that } |x - a| < \delta \implies |f_n(x) - f_n(a)| < \frac{\epsilon}{3} \qquad (\dagger)$$
Simply put these together with the triangle inequality to see that
$$|x - a| < \delta \implies |f(x) - f(a)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(a)| + |f_n(a) - f(a)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon$$
We need not have fixed $a$ at the start of the proof. Rewriting $(\dagger)$ to become
$$\exists \delta > 0 \text{ such that } \forall x, a \in U,\ |x - a| < \delta \implies |f_n(x) - f_n(a)| < \frac{\epsilon}{3}$$
proves a related result.
Corollary 2.16. If $f_n \to f$ uniformly where each $f_n$ is uniformly continuous, then $f$ is uniformly continuous.
Examples 2.17. 1. Let $f_n(x) = x + \frac{1}{n} x^2$. Each $f_n$ is continuous on $\mathbb{R}$, and the sequence converges pointwise to the continuous function $f : x \mapsto x$.
(a) On any bounded interval $[-M, M]$ the convergence $f_n \to f$ is uniform,
$$\sup_{x\in[-M,M]} |f_n(x) - f(x)| = \sup\left\{\frac{1}{n} x^2 : x \in [-M, M]\right\} = \frac{M^2}{n} \xrightarrow[n\to\infty]{} 0$$
(b) On any unbounded interval, $\mathbb{R}$ say, the convergence is non-uniform,
$$\sup_{x\in\mathbb{R}} |f_n(x) - f(x)| = \sup\left\{\frac{1}{n} x^2 : x \in \mathbb{R}\right\} = \infty$$
2. Consider $f_n(x) = \frac{1}{1 + x^n}$; this is continuous on $(-1, \infty)$ and converges pointwise to
$$f(x) = \begin{cases} 0 & \text{if } x > 1 \\ \frac{1}{2} & \text{if } x = 1 \\ 1 & \text{if } -1 < x < 1 \end{cases}$$
We consider the convergence $f_n \to f$ on several intervals.
[Figure: graphs of several $f_n(x)$ for $-1 < x \le 3$.]
(a) On $[2, \infty)$, the pointwise limit is continuous. Moreover, $f_n(x)$ is decreasing, whence
$$\sup_{x\in[2,\infty)} |f_n(x) - 0| = \frac{1}{1 + 2^n} \xrightarrow[n\to\infty]{} 0$$
and the convergence is uniform. Alternatively; if $\epsilon \in (0, 1)$, let $N = \log_2(\epsilon^{-1} - 1)$, then
$$x \ge 2,\ n > N \implies |f_n(x) - 0| = \frac{1}{1 + x^n} \le \frac{1}{1 + 2^n} < \frac{1}{1 + 2^N} = \epsilon$$
The same argument shows that $f_n \to f$ uniformly on any interval $[a, \infty)$ where $a > 1$.
(b) On $[1, \infty)$ the convergence is not uniform, since the pointwise limit is discontinuous,
$$f(x) = \begin{cases} 0 & \text{if } x > 1 \\ \frac{1}{2} & \text{if } x = 1 \end{cases}$$
(c) The convergence is not even uniform on the open interval $(1, \infty)$,
$$\sup_{x\in(1,\infty)} |f_n(x) - f(x)| = \sup\left\{\frac{1}{1 + x^n} : x > 1\right\} = \frac{1}{2} \not\to 0$$
(d) Similarly, for any $a \in (0, 1)$, the convergence $f_n \to f$ is uniform on $[0, a]$, this time to the (continuous) constant function $f(x) = 1$,
$$\sup_{x\in[0,a]} |f_n(x) - 1| = 1 - \frac{1}{1 + a^n} = \frac{a^n}{1 + a^n} \xrightarrow[n\to\infty]{} 0$$
(e) Finally, on $(-1, 1)$ the convergence is not uniform, since already
$$\sup_{x\in[0,1)} |f_n(x) - f(x)| = \sup\left\{\frac{x^n}{1 + x^n} : x \in [0, 1)\right\} = \frac{1}{2} \not\to 0$$
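The supremum criterion of Definition 2.13 lends itself to a quick numerical check. The following is a rough sketch for Example 2.17.1 on $[-M, M]$; the helper `sup_on_grid` is our own, and since it only samples finitely many points it returns a lower bound for the true supremum rather than the supremum itself.

```python
def sup_on_grid(g, a, b, samples=10001):
    """Approximate sup |g| on [a, b] by sampling; a lower bound for the true sup."""
    step = (b - a) / (samples - 1)
    return max(abs(g(a + i * step)) for i in range(samples))

def f_n(n):
    """The function x -> x + x^2 / n of Example 2.17.1."""
    return lambda x: x + x * x / n

f = lambda x: x                     # the pointwise limit

M = 3.0
for n in (1, 10, 100, 1000):
    err = sup_on_grid(lambda x: f_n(n)(x) - f(x), -M, M)
    print(n, err, M * M / n)        # the sampled sup matches M^2 / n
```

Here the supremum is attained at the endpoints, which lie on the grid, so the sampled value agrees with $\frac{M^2}{n}$ up to rounding. When the supremum is not attained (as on an open interval) a grid can badly underestimate it.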
Exercises 24 1. For each sequence of functions defined on $[0, \infty)$:
(i) Find the pointwise limit $f(x)$ as $n \to \infty$.
(ii) Determine whether $f_n \to f$ uniformly on $[0, 1]$.
(iii) Determine whether $f_n \to f$ uniformly on $[1, \infty)$.
(a) $f_n(x) = \frac{x}{n}$ (b) $f_n(x) = \frac{x^n}{1 + x^n}$ (c) $f_n(x) = \frac{x^n}{n + x^n}$ (d) $f_n(x) = \frac{x}{1 + nx^2}$ (e) $f_n(x) = \frac{nx}{1 + nx^2}$
2. Let $f_n(x) = \left(x - \frac{1}{n}\right)^2$. If $f(x) = x^2$, we clearly have $f_n \to f$ pointwise on any domain.
(a) Prove that the convergence is uniform on $[-1, 1]$.
(b) Prove that the convergence is non-uniform on $\mathbb{R}$.
3. For each sequence, find the pointwise limit and decide if the convergence is uniform.
(a) f
n
(x) =
1+2 cos
2
(nx)
n
for x R.
(b) f
n
(x) = cos
n
(x) on [π/2, π/2].
4. For each $n \in \mathbb{N}$, consider the continuous function
$$f_n : [0, 1] \to \mathbb{R} : x \mapsto n x^n (1 - x)$$
(a) Given $0 \le x < 1$, let $a \in (x, 1)$. Explain why $\exists N$ such that
$$n > N \implies |f_{n+1}(x)| \le a\, |f_n(x)|$$
Hence conclude that the pointwise limit of $(f_n)$ is the zero function.
(b) Use elementary calculus ($f_n'(x) = 0$ . . .) to prove that the maximum value of $f_n$ is located at $x_n = \frac{n}{1+n}$. Hence compute
$$\sup_{x\in[0,1]} |f_n(x) - f(x)|$$
and use it to show that the convergence $f_n \to 0$ is non-uniform.
This shows that the converse to Theorem 2.15 is false, even on a bounded interval: the continuous sequence $(f_n)$ converges non-uniformly to a continuous function. Sketches of several $f_n$ are below.
[Figure: graphs of $y = f_1(x)$, $y = f_2(x)$, $y = f_5(x)$ and $y = f_{50}(x)$ on $[0, 1]$, with the height $e^{-1}$ marked on the vertical axis.]
5. Explain where the proof of Theorem 2.15 fails if $f_n \to f$ non-uniformly.
25 More on Uniform Convergence
While we haven’t yet developed calculus, our familiarity with basic differentiation and integration
makes it natural to pause to consider the interaction of these operations with sequences of functions.
We also consider a Cauchy-criterion for uniform convergence, which leads to the useful Weierstraß
M-test.
Example 2.18. Recall that $f_n(x) = x^n$ converges uniformly to $f(x) = 0$ on any interval $[0, a]$ where $a < 1$. We easily check that
$$\int_0^a f_n(x)\,dx = \frac{1}{n+1} a^{n+1} \xrightarrow[n\to\infty]{} 0 = \int_0^a f(x)\,dx$$
In fact the sequence of derivatives converges here also
$$\frac{d}{dx} f_n(x) = n x^{n-1} \xrightarrow[n\to\infty]{} 0 = f'(x)$$
It is perhaps surprising that integration interacts more nicely with uniform limits than does differen-
tiation. We therefore consider integration first.
Theorem 2.19. Let $f_n \to f$ uniformly on $[a, b]$ where the functions $f_n$ are integrable. Then $f$ is integrable on $[a, b]$ and
$$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$$
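Proof. Given $\epsilon > 0$, note that $\int_a^b \frac{\epsilon}{2(b-a)}\,dx = \frac{\epsilon}{2}$. Since $f_n \to f$ uniformly, $\exists N$ such that$^{12}$
$$\forall x \in [a, b],\ n > N \implies |f_n(x) - f(x)| < \frac{\epsilon}{2(b - a)}$$
$$\implies f_n(x) - \frac{\epsilon}{2(b - a)} < f(x) < f_n(x) + \frac{\epsilon}{2(b - a)}$$
$$\implies \int_a^b f_n(x)\,dx - \frac{\epsilon}{2} \le \int_a^b f(x)\,dx \le \int_a^b f_n(x)\,dx + \frac{\epsilon}{2}$$
$$\implies \left|\int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx\right| \le \frac{\epsilon}{2} < \epsilon$$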
The appearance of uniform convergence in the proof is subtle. If $N = N(\epsilon)$ were allowed to depend on $x$, then the integral $\int_a^b f_n(x)\,dx$ would be meaningless: Which $n$ would we consider? Larger than $N(x, \epsilon)$ for which $x$? Taking $n$ ‘larger than all the $N(x, \epsilon)$’ might produce the absurdity $n = \infty$!
12 This assumes $f$ is already integrable. Once we’ve properly defined (Riemann) integrability at the end of the course, we can insert the following
$$\int_a^b f_n(x)\,dx - \frac{\epsilon}{2} \le L(f) \le U(f) \le \int_a^b f_n(x)\,dx + \frac{\epsilon}{2} \implies 0 \le U(f) - L(f) \le \epsilon \implies U(f) = L(f)$$
where $U(f)$ and $L(f)$ are the upper and lower Darboux integrals of $f$; their equality shows that $f$ is integrable on $[a, b]$.
Examples 2.20. 1. Uniform convergence is not required for the integrals to converge as we’d like. For instance, recall that extending the previous example to the domain $[0, 1]$ results in non-uniform convergence; however, we still have
$$\int_0^1 f_n(x)\,dx = \frac{1}{n+1} \xrightarrow[n\to\infty]{} 0 = \int_0^1 f(x)\,dx$$
2. To obtain a sequence of functions $f_n \to f$ for which $\int f_n \not\to \int f$ requires a bit of creativity. Consider the sequence
$$f_n : [-1, 1] \to \mathbb{R} : x \mapsto \begin{cases} n - n^2 x & \text{if } 0 < x < \frac{1}{n} \\ 0 & \text{otherwise} \end{cases}$$
If $0 < x < 1$, then for large $n \in \mathbb{N}$ we have
$$x \ge \frac{1}{n} \implies f_n(x) = 0$$
[Figure: graph of $f_n(x)$ on $[-1, 1]$, a triangular spike of height $n$ over $\left(0, \frac{1}{n}\right)$.]
We conclude that $f_n \to 0$ pointwise. Since the area under $f_n$ is a triangle with base $\frac{1}{n}$ and height $n$, the integral is constant and non-zero;
$$\int_{-1}^{1} f_n(x)\,dx = \frac{1}{2} \neq 0 = \int_{-1}^{1} f(x)\,dx$$
It should be obvious why the convergence $f_n \to 0$ is non-uniform; why?
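The spike functions of Example 2.20.2 can be integrated numerically to watch the area refuse to shrink. This is a minimal sketch with a hand-rolled midpoint rule (both function names are ours); the number of pieces is chosen so that the spike is well resolved for the values of $n$ shown.

```python
def f_n(n, x):
    """The spike of Example 2.20.2: n - n^2 x on (0, 1/n), zero elsewhere on [-1, 1]."""
    return n - n * n * x if 0 < x < 1 / n else 0.0

def midpoint_integral(g, a, b, pieces=200000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / pieces
    return h * sum(g(a + (i + 0.5) * h) for i in range(pieces))

for n in (1, 10, 100):
    area = midpoint_integral(lambda x: f_n(n, x), -1.0, 1.0)
    print(n, area, f_n(n, 0.5))   # the area stays near 1/2 while f_n(0.5) is 0 once n >= 2
```

For much larger $n$ the spike becomes narrower than the grid spacing and the quadrature would miss it, so the number of pieces would need to grow with $n$.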
Derivatives and Uniform Limits We’ve already seen that a uniform limit of differentiable functions
might be differentiable (Example 2.18), but this shouldn’t be expected in general since even uniform
limits of differentiable functions can have corners!
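Example 2.21. For each $n \in \mathbb{N}$, consider the function
$$f_n : [-1, 1] \to \mathbb{R} : x \mapsto \begin{cases} |x| & \text{if } |x| \ge \frac{1}{n} \\ \frac{n}{2} x^2 + \frac{1}{2n} & \text{if } |x| < \frac{1}{n} \end{cases}$$
• $f_n$ converges pointwise to $f(x) = |x|$.
• $f_n \to f$ uniformly since
$$\sup_{x\in[-1,1]} |f_n(x) - f(x)| = \frac{1}{2n} \to 0$$
• Each $f_n$ is differentiable:
$$f_n'(x) = \begin{cases} 1 & \text{if } x \ge \frac{1}{n} \\ nx & \text{if } |x| < \frac{1}{n} \\ -1 & \text{if } x \le -\frac{1}{n} \end{cases}$$
• The uniform limit $f$ is not differentiable at $x = 0$.
[Figure: graphs of $f_n(x)$ and $f_n'(x)$ on $[-1, 1]$, with $\pm\frac{1}{n}$ marked on the horizontal axis.]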
Transferring differentiability to the limit of a sequence of functions is a bit messy.
Theorem 2.22. Suppose $(f_n)$ is a sequence and $g$ is a function on $[a, b]$ for which:
• $f_n \to f$ pointwise;
• Each $f_n$ is differentiable with continuous derivative;$^{13}$
• $f_n' \to g$ uniformly.
Then $f_n \to f$ uniformly on $[a, b]$ and $f$ is differentiable with derivative $g$.
The issue in the previous example is that the pointwise limit of the derived sequence $(f_n')$ is discontinuous at $x = 0$ and therefore $f_n' \to g$ isn’t uniform!
Proof. For any $x \in [a, b]$, the fundamental theorem of calculus tells us that
$$\int_a^x f_n'(t)\,dt = f_n(x) - f_n(a)$$
By Theorem 2.19, the left side converges to $\int_a^x g(t)\,dt$, while the right converges to $f(x) - f(a)$. Since $f_n' \to g$ uniformly, we see that $g$ is continuous and we can apply the fundamental theorem again: $\int_a^x g(t)\,dt = f(x) - f(a)$ is differentiable with derivative $g$.
The uniformity of the convergence $f_n \to f$ follows from Exercise 10.
Uniformly Cauchy Sequences and the Weierstraß M-Test
Recall that one may use Cauchy sequences to demonstrate convergence without knowing the limit in
advance. An analogous discussion is available for sequences of functions.
Definition 2.23. A sequence of functions $(f_n)$ is uniformly Cauchy on $U$ if
$$\forall \epsilon > 0,\ \exists N \in \mathbb{N} \text{ such that } \forall x \in U,\ m, n > N \implies |f_n(x) - f_m(x)| < \epsilon$$
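Example 2.24. Let $f_n(x) = \sum_{k=1}^{n} \frac{1}{k^2} \sin k^2 x$ be defined on $\mathbb{R}$. Given $\epsilon > 0$, let $N = \frac{1}{\epsilon}$, then
$$m > n > N \implies |f_m(x) - f_n(x)| = \left|\sum_{k=n+1}^{m} \frac{1}{k^2} \sin k^2 x\right| \le \sum_{k=n+1}^{m} \frac{1}{k^2} \le \sum_{k=n+1}^{m} \frac{1}{k(k-1)}$$
$$= \sum_{k=n+1}^{m} \left(\frac{1}{k-1} - \frac{1}{k}\right) = \frac{1}{n} - \frac{1}{m} < \frac{1}{N} = \epsilon$$
whence $(f_n)$ is uniformly Cauchy.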
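The telescoping bound $|f_m(x) - f_n(x)| \le \frac{1}{n} - \frac{1}{m}$ from Example 2.24 can be spot-checked on a grid. The sketch below is only an illustration (the names `f` and `cauchy_gap` are ours, and a finite sample of points proves nothing about all of $\mathbb{R}$).

```python
import math

def f(n, x):
    """Partial sum f_n(x) = sum_{k=1}^n sin(k^2 x) / k^2 of Example 2.24."""
    return sum(math.sin(k * k * x) / (k * k) for k in range(1, n + 1))

def cauchy_gap(n, m, xs):
    """Largest sampled value of |f_m(x) - f_n(x)| over the points xs."""
    return max(abs(f(m, x) - f(n, x)) for x in xs)

xs = [i * 0.01 for i in range(-300, 301)]     # sample points in [-3, 3]
n, m = 20, 200
print(cauchy_gap(n, m, xs), 1 / n - 1 / m)    # the gap sits below 1/n - 1/m
```

The key feature is that the bound on the right does not involve $x$ at all, which is exactly what makes the sequence uniformly Cauchy.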
13 Without the continuity assumption, the fundamental theorem of calculus doesn’t apply and the proof requires an alternative approach. One can also weaken the hypotheses: if $f_n' \to g$ uniformly and $(f_n(x))$ converges for at least one $x \in [a, b]$, then there exists $f$ such that $f_n \to f$ is uniform and $f' = g$.
As with sequences of real numbers, uniformly Cauchy sequences converge; in fact uniformly!
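Theorem 2.25. A sequence $(f_n)$ is uniformly Cauchy on $U$ if and only if it converges uniformly to some $f : U \to \mathbb{R}$.
Proof. ($\Rightarrow$) Let $(f_n)$ be uniformly Cauchy on $U$. For each $x \in U$, the sequence $(f_n(x)) \subseteq \mathbb{R}$ is Cauchy and thus convergent. Define $f : U \to \mathbb{R}$ via
$$f(x) := \lim_{n\to\infty} f_n(x)$$
We claim that $f_n \to f$ uniformly. Let $\epsilon > 0$ be given, then $\exists N \in \mathbb{N}$ such that, for all $x \in U$,
$$m > n > N \implies |f_n(x) - f_m(x)| < \frac{\epsilon}{2}$$
$$\implies f_n(x) - \frac{\epsilon}{2} < f_m(x) < f_n(x) + \frac{\epsilon}{2}$$
$$\implies f_n(x) - \frac{\epsilon}{2} \le f(x) \le f_n(x) + \frac{\epsilon}{2} \qquad \text{(take limits as } m \to \infty\text{)}$$
$$\implies |f_n(x) - f(x)| \le \frac{\epsilon}{2} < \epsilon$$
($\Leftarrow$) This is Exercise 2.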
Example (2.24, mk. II). Since $(f_n)$ is uniformly Cauchy on $\mathbb{R}$, it converges uniformly to some $f : \mathbb{R} \to \mathbb{R}$. It seems reasonable to write
$$f(x) = \sum_{n=1}^{\infty} \frac{1}{n^2} \sin n^2 x$$
The graph of this function looks somewhat bizarre:
[Figure: graph of $f(x)$ for $-2\pi \le x \le 2\pi$.]
Since each $f_n$ is (uniformly) continuous, Theorem 2.15 says that $f$ is also (uniformly) continuous. By Theorem 2.19, $f(x)$ is integrable, indeed
$$\int_a^b f(x)\,dx = \lim_{n\to\infty} \sum_{k=1}^{n} \left[-k^{-4} \cos k^2 x\right]_a^b = \sum_{n=1}^{\infty} \frac{1}{n^4} \left(\cos n^2 a - \cos n^2 b\right)$$
which converges (comparison test) for all $a, b$. By contrast, the derived sequence
$$f_n'(x) = \sum_{k=1}^{n} \cos k^2 x$$
does not converge for any $x$ since $\cos k^2 x \not\to 0$ as $k \to \infty$. We should thus expect (though we offer no proof) that $f$ is nowhere differentiable.
The example generalizes. Suppose $(g_k)$ is a sequence of functions on $U$ and define the series $\sum g_k(x)$ as the pointwise limit of the sequence $(f_n)$ of partial sums
$$\sum_{k=k_0}^{\infty} g_k(x) := \lim_{n\to\infty} f_n(x) \quad\text{where}\quad f_n(x) = \sum_{k=k_0}^{n} g_k(x)$$
whenever the limit exists. The series is said to converge uniformly whenever $(f_n)$ does so. Theorems 2.15, 2.19 and 2.22 immediately translate.
Corollary 2.26. Let $\sum g_k$ be a series of functions converging uniformly on $U$. Then:
1. If each $g_k$ is (uniformly) continuous then $\sum g_k$ is (uniformly) continuous.
2. If each $g_k$ is integrable, then $\int \sum g_k(x)\,dx = \sum \int g_k(x)\,dx$.
3. If each $g_k$ is continuously differentiable, and the sequence of derived partial sums $f_n'$ converges uniformly, then $\sum g_k$ is differentiable and $\frac{d}{dx} \sum g_k(x) = \sum g_k'(x)$.
As an application of the uniform Cauchy criterion, we obtain an easy test for uniform convergence.
Theorem 2.27 (Weierstraß M-test). Suppose $(g_k)$ is a sequence of functions on $U$. Moreover assume:
1. $(M_k)$ is a non-negative sequence such that $\sum M_k$ converges.
2. Each $g_k$ is bounded by $M_k$; that is $|g_k(x)| \le M_k$.
Then $\sum g_k(x)$ converges uniformly on $U$.
Proof. Let $f_n(x) = \sum_{k=k_0}^{n} g_k(x)$ define the sequence of partial sums. Since $\sum M_k$ converges, its sequence of partial sums is Cauchy (the Cauchy criterion for infinite series); given $\epsilon > 0$,
$$\exists N \text{ such that } m > n > N \implies \sum_{k=n+1}^{m} M_k < \epsilon$$
However, by assumption,
$$m > n > N \implies |f_m(x) - f_n(x)| = \left|\sum_{k=n+1}^{m} g_k(x)\right| \le \sum_{k=n+1}^{m} |g_k(x)| \le \sum_{k=n+1}^{m} M_k < \epsilon$$
The sequence of partial sums is uniformly Cauchy and thus uniformly convergent.
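Example 2.28. Given the series $\sum_{n=1}^{\infty} \frac{1 + \cos^2(nx)}{n^2} \sin(nx)$, we clearly have
$$\left|\frac{1 + \cos^2(nx)}{n^2} \sin(nx)\right| \le \frac{2}{n^2} \quad\text{for all } x \in \mathbb{R}$$
Since $\sum \frac{2}{n^2}$ converges, the M-test shows that the original series converges uniformly on $\mathbb{R}$.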
Exercises 25 1. For each $n \in \mathbb{N}$, let $f_n(x) = n x^n$ when $x \in [0, 1)$ and $f_n(1) = 0$.
(a) Prove that $f_n \to 0$ pointwise on $[0, 1]$.
(Hint: recall Exercise 24.4 if you’re not sure how to prove this)
(b) By considering the integrals $\int_0^1 f_n(x)\,dx$ show that $f_n \to 0$ is not uniform.
2. Prove that if $f_n \to f$ uniformly, then the sequence $(f_n)$ is uniformly Cauchy.
3. (a) Suppose $(f_n)$ is a sequence of bounded functions on $U$ and suppose that $f_n \to f$ converges uniformly on $U$. Prove that $f$ is bounded on $U$.
(b) Give an example of a sequence of bounded functions $(f_n)$ converging pointwise to $f$ on $[0, \infty)$, but for which $f$ is unbounded.
4. The sequence defined by $f_n(x) = \frac{nx}{1 + nx^2}$ (Exercise 24.1) converges uniformly on any closed interval $[a, b]$ where $0 < a < b$.
(a) Check explicitly that $\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx$, where $f = \lim f_n$.
(b) Is the same thing true for derivatives?
5. Let $f_n(x) = n^{-1} \sin n^2 x$ be defined on $\mathbb{R}$.
(a) Prove that $f_n$ converges uniformly on $\mathbb{R}$.
(b) Check that $\int_0^x f_n(t)\,dt$ converges for any $x \in \mathbb{R}$.
(c) Does the derived sequence $(f_n')$ converge? Explain.
6. Use the M-test to prove that $\sum_{n=1}^{\infty} \frac{x^n}{n^2}$ defines a continuous function on $[-1, 1]$.
7. Prove that $\sum_{n=1}^{\infty} \frac{x^n \sin x}{(n + 1)^3 2^n}$ converges uniformly to a continuous function on the interval $[-2, 2]$.
8. Prove that if $\sum g_k$ converges uniformly on a set $U$ and if $h$ is a bounded function on $U$, then $\sum h g_k$ converges uniformly on $U$.
(Warning: you cannot simply write $\sum h g_k = h \sum g_k$)
9. Consider Example 2.20.2.
(a) Check explicitly that the convergence isn’t uniform by computing $\sup_{x\in[-1,1]} |f_n(x) - f(x)|$
(b) Prove that $f_n \to 0$ pointwise on $(0, 1]$ using the $\epsilon$–$N$ definition of convergence: that is, given $\epsilon > 0$ and $x \in (0, 1]$, find an explicit $N(x, \epsilon)$ such that
$$n > N \implies |f_n(x)| < \epsilon$$
What happens to your choice of $N(x, \epsilon)$ as $x \to 0^+$?
10. Suppose $(f_n')$ converges uniformly on $[a, b]$ and that each $f_n'$ is continuous.
(a) Use the fact that $(f_n')$ is uniformly Cauchy to prove that $(f_n)$ is uniformly Cauchy and thus converges uniformly to some function $f$.
(Hint:
|
f
n
(x) f
m
(x)
|
=
R
x
a
f
n
( t) f
m
( t) dt
. . .)
(b) Explain why we need not have assumed the existence of f in Theorem 2.22.
26 Differentiation and Integration of Power Series
In this section we specialize our recent results to power series. While everything will be stated for
series centered at x = 0, all are easily translated to arbitrary centers.
Theorem 2.29. Let $\sum a_n x^n$ be a power series with radius of convergence $R > 0$ and let $T \in (0, R)$. Then:
1. The series converges uniformly on $[-T, T]$.
2. The series is uniformly continuous on $[-T, T]$ and continuous on $(-R, R)$.
Proof. This is an easy application of the M-test. For each $k$, define $M_k = |a_k| T^k$,
$$T < R \implies \sum a_n T^n \text{ converges absolutely} \implies \sum M_k \text{ converges}$$
Since $|a_k x^k| \le M_k$ whenever $|x| \le T$, the M-test and Corollary 2.26 show that the power series converges uniformly on $[-T, T]$ to a uniformly continuous function.
Finally, every $x \in (-R, R)$ lies in the interior of some such interval (take any $T$ with $|x| < T < R$), whence the power series is continuous on $(-R, R)$.
Example 2.30. On its interval of convergence $(-1, 1)$, the geometric series $\sum_{n=0}^{\infty} x^n$ converges pointwise to $\frac{1}{1-x}$; convergence is uniform on any interval $[-T, T] \subseteq (-1, 1)$.
We needn’t use the Theorem for this is simple to verify directly: writing $f$, $f_n$ for the series and its partial sums,
$$|f_n(x) - f(x)| = \left|\frac{1 - x^{n+1}}{1 - x} - \frac{1}{1 - x}\right| = \frac{|x|^{n+1}}{1 - x} \implies \sup_{x\in[-T,T]} |f_n(x) - f(x)| = \frac{T^{n+1}}{1 - T} \xrightarrow[n\to\infty]{} 0$$
By contrast, the convergence is non-uniform on $(-1, 1)$;
$$\sup_{x\in(-1,1)} |f_n(x) - f(x)| = \infty$$
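The supremum formula for the geometric series on $[-T, T]$ is easy to compare against sampled values. In this sketch (function names ours) the error is sampled on a grid that includes the endpoints, where the supremum is attained at $x = T$.

```python
def geometric_error(n, x):
    """|f_n(x) - f(x)| for the geometric series, with f_n the n-th partial sum."""
    partial = sum(x ** k for k in range(n + 1))
    return abs(partial - 1 / (1 - x))

def sup_error(n, T, samples=2001):
    """Sampled sup of the error on [-T, T]."""
    step = 2 * T / (samples - 1)
    return max(geometric_error(n, -T + i * step) for i in range(samples))

T = 0.9
for n in (10, 50, 100):
    print(n, sup_error(n, T), T ** (n + 1) / (1 - T))
```

Repeating the experiment with $T$ closer to 1 shows the decay becoming slower and slower, consistent with the failure of uniform convergence on the whole of $(-1, 1)$.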
Theorem 2.31. Suppose $\sum_{n=0}^{\infty} a_n x^n$ has radius of convergence $R > 0$. Then the series is integrable and differentiable term-by-term on the interval $(-R, R)$. Indeed for any $x \in (-R, R)$,
$$\frac{d}{dx} \sum_{n=0}^{\infty} a_n x^n = \sum_{n=1}^{\infty} n a_n x^{n-1} \quad\text{and}\quad \int_0^x \sum_{n=0}^{\infty} a_n t^n\,dt = \sum_{n=0}^{\infty} \frac{a_n}{n + 1} x^{n+1}$$
where both series also have radius of convergence $R$.
Proof. Let $f(x) = \sum a_n x^n$ have radius of convergence $R$, and observe that
$$\limsup |n a_n|^{1/n} = \lim n^{1/n} \, \limsup |a_n|^{1/n} = \frac{1}{R}$$
whence $\sum n a_n x^n$ also has radius of convergence $R$. At any given non-zero $x \in (-R, R)$, we may write
$$\sum_{n=1}^{\infty} n a_n x^{n-1} = x^{-1} \sum_{n=1}^{\infty} n a_n x^n$$
to see that the derived series also has radius of convergence $R$. On any interval $[-T, T] \subseteq (-R, R)$, the derived series converges uniformly (Theorem 2.29). Since each $a_n x^n$ is continuously differentiable, Corollary 2.26 says that $f$ is differentiable on $[-T, T]$ and that
$$f'(x) = \sum_{n=0}^{\infty} \frac{d}{dx} a_n x^n = \sum_{n=1}^{\infty} n a_n x^{n-1}$$
Since any $x \in (-R, R)$ lies in some such interval $[-T, T]$, we are done.
The corresponding result for integrals is Exercise 6.
We postpone the canonical examples until after the next result.
Continuity at Endpoints?
There is one small hole in our analysis. If a series has radius of convergence $R$ we know that it converges and is continuous on $(-R, R)$. But what if it additionally converges at $x = \pm R$? Is the series continuous at the endpoints? The answer is an unequivocal yes, though this small benefit requires a lot of work!
Theorem 2.32 (Abel’s Theorem). Power series are continuous on their full interval of convergence.
Examples 2.33. 1. Apply our results to the geometric series:
$$\frac{1}{(1-x)^2} = \frac{d}{dx} \frac{1}{1-x} = \sum_{n=1}^\infty n x^{n-1} = \sum_{n=0}^\infty (n+1) x^n = 1 + 2x + 3x^2 + 4x^3 + \cdots$$
$$-\ln(1-x) = \int_0^x \frac{1}{1-t} \, dt = \sum_{n=0}^\infty \frac{1}{n+1} x^{n+1} = \sum_{n=1}^\infty \frac{1}{n} x^n = x + \frac{1}{2} x^2 + \frac{1}{3} x^3 + \cdots$$
with both series valid on $(-1, 1)$. In fact the first series has the same interval of convergence,
while the second is $[-1, 1)$. By Abel’s Theorem and the fact that logarithms are continuous, we
have equality at $x = -1$ and the famous identity
$$\ln 2 = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$$
This example shows that while the integrated and differentiated series have the same radius of
convergence as the original, convergence at the endpoints need not be the same in all cases.
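As a numerical sanity check of term-by-term differentiation and integration of the geometric series, here is a minimal Python sketch. The function names and the choice of 200 terms are illustrative assumptions, not part of the notes.

```python
import math

def derived_geometric(x, terms=200):
    """Partial sum of sum_{n>=1} n x^(n-1), the term-by-term derivative of sum x^n."""
    return sum(n * x ** (n - 1) for n in range(1, terms + 1))

def integrated_geometric(x, terms=200):
    """Partial sum of sum_{n>=1} x^n / n, the term-by-term integral of sum x^n."""
    return sum(x ** n / n for n in range(1, terms + 1))

x = 0.5  # any point with |x| < 1
print(derived_geometric(x), 1 / (1 - x) ** 2)      # compare with 1/(1-x)^2
print(integrated_geometric(x), -math.log(1 - x))   # compare with -ln(1-x)
```

Inside the interval of convergence both partial sums settle quickly; the closer $x$ is to $\pm 1$, the more terms are needed.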
2. Substitute $x \mapsto -x^2$ in the geometric series and integrate term-by-term: if $|x| < 1$, then
$$\frac{1}{1 + x^2} = \sum_{n=0}^\infty (-1)^n x^{2n} \implies \arctan x = \sum_{n=0}^\infty \frac{(-1)^n}{2n+1} x^{2n+1}$$
In fact the arctangent series also converges at $x = \pm 1$; Abel’s Theorem says it is continuous on
$[-1, 1]$. Since arctangent is continuous (on $\mathbb{R}$!) we recover another famous identity
$$\frac{\pi}{4} = \arctan 1 = \sum_{n=0}^\infty \frac{(-1)^n}{2n+1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$
As with the identity for $\ln 2$, this is a very slowly converging alternating series and therefore
doesn’t provide an efficient method for approximating $\pi$.
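To see how slow this convergence is, the following Python sketch (function names are hypothetical) prints the error of the partial sums of both alternating series. The errors shrink only like the reciprocal of the number of terms.

```python
import math

def alternating_harmonic(terms):
    """Partial sum 1 - 1/2 + 1/3 - ... using the given number of terms."""
    return sum((-1) ** (n + 1) / n for n in range(1, terms + 1))

def leibniz(terms):
    """Partial sum 1 - 1/3 + 1/5 - ... using the given number of terms."""
    return sum((-1) ** n / (2 * n + 1) for n in range(terms))

for terms in (10, 100, 1000):
    err_ln2 = abs(alternating_harmonic(terms) - math.log(2))
    err_pi4 = abs(leibniz(terms) - math.pi / 4)
    print(terms, err_ln2, err_pi4)
```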
3. The series $f(x) = \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n}$ has radius of convergence $\infty$. Differentiate to obtain
$$f'(x) = \sum_{n=1}^\infty \frac{(-1)^n x^{2n-1}}{(2n-1)!} = \sum_{n=0}^\infty \frac{(-1)^{n+1}}{(2n+1)!} x^{2n+1}$$
This series is also valid for all $x \in \mathbb{R}$. Differentiating again,
$$f''(x) = \sum_{n=0}^\infty \frac{(-1)^{n+1}}{(2n)!} x^{2n} = -\sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n} = -f(x)$$
Recall that $f(x) = \cos x$ is the unique solution to the initial value problem
$$\begin{cases} f''(x) = -f(x) \\ f(0) = 1, \ f'(0) = 0 \end{cases}$$
We conclude that, $\forall x \in \mathbb{R}$,
$$\cos x = \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n} \qquad \sin x = -f'(x) = \sum_{n=0}^\infty \frac{(-1)^n}{(2n+1)!} x^{2n+1}$$
These expressions can instead be taken as the definitions of sine and cosine. As promised earlier in
the course, continuity and differentiability now come for free. One difficulty with this definition is
believing that it has anything to do with right-triangles!
We can similarly define other common transcendental functions using power series: for instance
$$\exp(x) = \sum_{n=0}^\infty \frac{1}{n!} x^n$$
Example 2.33.1 could be taken as a definition of the logarithm on the interval $(0, 2]$,
$$\ln x = \ln(1 - (1 - x)) = -\sum_{n=1}^\infty \frac{1}{n} (1 - x)^n = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} (x - 1)^n$$
though this is unnecessary since it is more natural to define $\ln$ as the inverse of the exponential.
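The series definitions can be compared against a standard library implementation. The sketch below truncates each series after a fixed number of terms; the function names and truncation lengths are assumptions made for illustration.

```python
import math

def cos_series(x, terms=20):
    """Truncated series sum (-1)^n x^(2n) / (2n)!."""
    return sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n) for n in range(terms))

def sin_series(x, terms=20):
    """Truncated series sum (-1)^n x^(2n+1) / (2n+1)!."""
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1) for n in range(terms))

def exp_series(x, terms=30):
    """Truncated series sum x^n / n!."""
    return sum(x ** n / math.factorial(n) for n in range(terms))

x = 1.0
print(cos_series(x) - math.cos(x))
print(sin_series(x) - math.sin(x))
print(exp_series(x) - math.exp(x))
```

Because the radius of convergence is infinite, the factorials eventually dominate any fixed power of $x$, which is why a few dozen terms already suffice for moderate $x$.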
Proof of Abel’s Theorem (non-examinable)
This requires a lot of work, so feel free to omit on a first reading!
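First observe that there is nothing to check unless $0 < R < \infty$. By the change of variable $x \mapsto \pm\frac{x}{R}$, it
is enough for us to prove the following:
$$\sum_{n=0}^\infty a_n \text{ convergent and } f(x) = \sum_{n=0}^\infty a_n x^n \text{ on } (-1, 1) \implies \lim_{x \to 1^-} f(x) = \sum_{n=0}^\infty a_n$$
Proof. Let $s_n = \sum_{k=0}^n a_k$ and write $s = \lim s_n = \sum a_n$. It is an easy exercise to check that
$$\sum_{k=0}^n a_k x^k = s_n x^n + (1 - x) \sum_{k=0}^{n-1} s_k x^k$$
If $|x| < 1$, then (since $s_n \to s$) $\lim s_n x^n = 0$, whence we obtain
$$\forall x \in (-1, 1), \quad f(x) = (1 - x) \sum_{n=0}^\infty s_n x^n$$
Let $\epsilon \in (0, 1)$ be given and fix $x \in (0, 1)$. Then
$$\exists N \in \mathbb{N} \text{ such that } n > N \implies |s_n - s| < \frac{\epsilon}{2} \tag{$*$}$$
Use the geometric series formula $\sum_{n=0}^\infty x^n = \frac{1}{1-x}$ and write $h(x) = (1 - x) \left| \sum_{n=0}^N (s_n - s) x^n \right|$ to observe
$$\begin{aligned}
|f(x) - s| &= \left| (1 - x) \sum_{n=0}^\infty s_n x^n - s \right| = \left| (1 - x) \sum_{n=0}^\infty s_n x^n - s (1 - x) \sum_{n=0}^\infty x^n \right| \\
&= \left| (1 - x) \sum_{n=0}^\infty (s_n - s) x^n \right| = (1 - x) \left| \sum_{n=0}^N (s_n - s) x^n + \sum_{n=N+1}^\infty (s_n - s) x^n \right| \\
&\le (1 - x) \left| \sum_{n=0}^N (s_n - s) x^n \right| + (1 - x) \left| \sum_{n=N+1}^\infty (s_n - s) x^n \right| && (\triangle\text{-inequality}) \\
&< h(x) + \frac{\epsilon}{2} (1 - x) \sum_{n=N+1}^\infty x^n && (\text{by } (*)) \\
&\le h(x) + \frac{\epsilon}{2}
\end{aligned}$$
Since $h \ge 0$ is continuous and $h(1) = 0$, $\exists \delta > 0$ such that $x \in (1 - \delta, 1) \implies h(x) < \frac{\epsilon}{2}$ (the
computation of a suitable $\delta$ is another exercise).
We conclude that $\lim_{x \to 1^-} f(x) = s$.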
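The summation-by-parts identity at the start of the proof is easy to test numerically before proving it. The following Python sketch does so for the sample coefficients $a_k = \frac{(-1)^k}{k+1}$; the coefficient choice and all function names are illustrative assumptions.

```python
def partial_sums(a):
    """Return the list s with s[n] = a[0] + ... + a[n]."""
    s, total = [], 0.0
    for term in a:
        total += term
        s.append(total)
    return s

def lhs(a, x, n):
    """Left side: sum_{k=0}^{n} a_k x^k."""
    return sum(a[k] * x ** k for k in range(n + 1))

def rhs(a, x, n):
    """Right side: s_n x^n + (1 - x) sum_{k=0}^{n-1} s_k x^k."""
    s = partial_sums(a)
    return s[n] * x ** n + (1 - x) * sum(s[k] * x ** k for k in range(n))

a = [(-1) ** k / (k + 1) for k in range(50)]  # sample coefficients
for x in (-0.5, 0.3, 0.9):
    print(x, lhs(a, x, 49) - rhs(a, x, 49))  # differences should be at rounding level
```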
Exercises 26
1. (a) Prove that $\sum_{n=1}^\infty n x^n = \frac{x}{(1-x)^2}$ for $|x| < 1$.
(b) Evaluate $\sum_{n=1}^\infty \frac{n}{2^n}$, $\sum_{n=1}^\infty \frac{n}{4^n}$ and $\sum_{n=1}^\infty \frac{(-1)^n n}{4^n}$.
2. (a) Starting with a power series centered at $x = 0$, evaluate the integral $\int_0^{1/2} \frac{1}{1 + x^4} \, dx$ as an
infinite series.
(b) (Harder) Repeat part (a) but for $\int_0^1 \frac{1}{1 + x^4} \, dx$. What extra ingredients do you need?
3. The probability that a standard normally distributed random variable $X$ lies in the interval $[a, b]$
is given by the integral
$$P(a \le X \le b) = \frac{1}{\sqrt{2\pi}} \int_a^b \exp\left( -\frac{x^2}{2} \right) dx$$
Find $P(-1 \le X \le 1)$ as an infinite series.
4. Define $c(x) = \sum_{n=0}^\infty \frac{x^{2n}}{(2n)!}$ and $s(x) = \sum_{n=0}^\infty \frac{x^{2n+1}}{(2n+1)!}$.
(a) Prove that $c'(x) = s(x)$ and that $s'(x) = c(x)$.
(b) Prove that $c(x)^2 - s(x)^2 = 1$ for all $x \in \mathbb{R}$.
(These functions are the hyperbolic sine and cosine: $s(x) = \sinh x$ and $c(x) = \cosh x$)
5. Let $a, b \in (-1, 1)$. Extending Example 2.30, show that the convergence $\sum x^n = \frac{1}{1-x}$ is non-uniform
on any interval of the form $(-1, a)$ or $(b, 1)$.
6. Prove the integration part of Theorem 2.31.
7. Prove or disprove: If a series converges absolutely at the endpoints of its interval of convergence
then its convergence is uniform on the entire interval.
8. Complete the proof of Abel’s Theorem:
(a) Let $s_n = \sum_{k=0}^n a_k$ be the partial sum of the series $\sum a_n$. For each $n$, prove that
$$\sum_{k=0}^n a_k x^k = s_n x^n + (1 - x) \sum_{k=0}^{n-1} s_k x^k$$
(b) Suppose $x > 0$. Let $S = \max\{ |s_n - s| : n \le N \}$ and prove that $h(x) \le S(1 - x^{N+1})$. Hence
find an explicit $\delta$ that completes the final step.
27 The Weierstraß Approximation Theorem
A major theme of analysis is approximation; for instance power series are an example of (uniform)
approximation by polynomials. It is reasonable to ask whether any function can be so approximated.
In 1885, Weierstraß answered a specific case in the affirmative.
Theorem 2.34 (Weierstraß). If $f : [a, b] \to \mathbb{R}$ is continuous, then there exists a sequence of polynomials
converging uniformly to $f$ on $[a, b]$.
Suitable polynomials can be defined in various ways. By scaling the domain, it is enough to do this
on $[a, b] = [0, 1]$ where perhaps the simplest approach is via the Bernstein Polynomials,
$$B_n f(x) := \sum_{k=0}^n \binom{n}{k} f\left( \frac{k}{n} \right) x^k (1 - x)^{n-k} \qquad \left( \binom{n}{k} = \frac{n!}{k!(n-k)!} \text{ is the binomial coefficient} \right)$$
We omit the proof due to length; Weierstraß’ original argument was completely different. Instead we
compute a couple of examples and give an important interpretation/application.
Examples 2.35. 1. Suppose $f(x) = 2x$ if $x < \frac{1}{2}$ and $f(x) = 1$ otherwise.
$$B_1 f(x) = f(0)(1 - x) + f(1) x = x$$
$$B_2 f(x) = f(0)(1 - x)^2 + 2 f(\tfrac{1}{2}) x (1 - x) + f(1) x^2 = 2x(1 - x) + x^2 = x(2 - x)$$
$$\begin{aligned}
B_3 f(x) &= f(0)(1 - x)^3 + 3 f(\tfrac{1}{3}) x (1 - x)^2 + 3 f(\tfrac{2}{3}) x^2 (1 - x) + f(1) x^3 \\
&= 0(1 - x)^3 + 2x(1 - x)^2 + 3x^2(1 - x) + x^3 = x(2 - x) = B_2 f(x)
\end{aligned}$$
$$B_4 f(x) = 0(1 - x)^4 + 2x(1 - x)^3 + 6x^2(1 - x)^2 + 4x^3(1 - x) + x^4 = x(x^3 - 2x^2 + 2)$$
The Bernstein polynomials $B_2 f(x)$, $B_4 f(x)$ and $B_{50} f(x)$ are drawn.
[Figure: graph of $f$ on $[0, 1]$ with the Bernstein polynomials $B_2 f$, $B_4 f$ and $B_{50} f$.]
2. Now assume $f(x) = x$ if $x < \frac{1}{2}$ and $1 - x$ otherwise.
$$B_1 f(x) = f(0)(1 - x) + f(1) x = 0$$
$$B_2 f(x) = x(1 - x)$$
$$B_3 f(x) = 0(1 - x)^3 + x(1 - x)^2 + x^2(1 - x) + 0x^3 = x(1 - x) = B_2 f(x)$$
$$\begin{aligned}
B_4 f(x) &= f(0)(1 - x)^4 + f(\tfrac{1}{4}) \cdot 4x(1 - x)^3 + f(\tfrac{1}{2}) \cdot 6x^2(1 - x)^2 + f(\tfrac{3}{4}) \cdot 4x^3(1 - x) + f(1) x^4 \\
&= x(1 - x)^3 + 3x^2(1 - x)^2 + x^3(1 - x) \\
&= x(1 - x)(1 + x - x^2)
\end{aligned}$$
[Figure: graph of $f$ on $[0, 1]$ with several of its Bernstein polynomials.]
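The hand computations above can be checked directly from the definition of $B_n f$. Here is a minimal Python sketch; the names `bernstein`, `f1` and `f2` are hypothetical, with `f1` and `f2` standing for the functions of Examples 1 and 2.

```python
from math import comb

def bernstein(f, n):
    """Return the n-th Bernstein polynomial of f on [0, 1] as a Python function."""
    def B(x):
        return sum(comb(n, k) * f(k / n) * x ** k * (1 - x) ** (n - k)
                   for k in range(n + 1))
    return B

def f1(x):
    """Example 1: 2x for x < 1/2 and 1 otherwise."""
    return 2 * x if x < 0.5 else 1.0

def f2(x):
    """Example 2: x for x < 1/2 and 1 - x otherwise."""
    return x if x < 0.5 else 1 - x

for x in (0.0, 0.25, 0.5, 0.9):
    # Each pair should agree up to rounding.
    print(bernstein(f1, 2)(x), x * (2 - x))
    print(bernstein(f1, 4)(x), x * (x ** 3 - 2 * x ** 2 + 2))
    print(bernstein(f2, 4)(x), x * (1 - x) * (1 + x - x ** 2))
```

Evaluating `bernstein(f1, 50)` on a grid gives a feel for the uniform convergence promised by the theorem, although the convergence near the corner at $x = \frac{1}{2}$ is visibly slow.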
Bézier curves (just for fun!)
The Bernstein polynomials arise naturally when considering Bézier curves. These have many applications,
particularly in computer graphics. Given three points $A, B, C$, define points on the line segments
$\overline{AB}$ and $\overline{BC}$ for each $t \in [0, 1]$, via
$$\overline{AB}(t) = (1 - t) A + t B \qquad \overline{BC}(t) = (1 - t) B + t C$$
These points move at a constant speed along the corresponding segments. Now consider a point on the
moving segment between the points defined above:
$$R(t) := (1 - t) \overline{AB}(t) + t \overline{BC}(t) = (1 - t)^2 A + 2t(1 - t) B + t^2 C$$
[Figure: quadratic Bézier curve in the $xy$-plane with control points $A$, $B$, $C$ and the moving points $\overline{AB}$, $\overline{BC}$.]
This is the quadratic Bézier curve with control points $A, B, C$. The 2nd Bernstein polynomial for a function
$f$ is simply the quadratic Bézier curve with control points $(0, f(0))$, $\left( \frac{1}{2}, f(\frac{1}{2}) \right)$ and $(1, f(1))$. The
picture$^{14}$ above shows $B_2 f(x)$ for the above example.
We can repeat the construction with more control points: with four points $A, B, C, D$, one constructs
$\overline{AB}(t)$, $\overline{BC}(t)$, $\overline{CD}(t)$, then the second-order points between these, and finally the cubic Bézier curve
$$\begin{aligned}
R(t) &:= (1 - t) \left[ (1 - t) \overline{AB}(t) + t \overline{BC}(t) \right] + t \left[ (1 - t) \overline{BC}(t) + t \overline{CD}(t) \right] \\
&= (1 - t)^3 A + 3t(1 - t)^2 B + 3t^2(1 - t) C + t^3 D
\end{aligned}$$
where we now recognize the relationship to the 3rd Bernstein polynomial.
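The repeated-interpolation construction just described is commonly known as de Casteljau's algorithm (a name not used in these notes). The Python sketch below, with hypothetical function names and arbitrarily chosen control points, checks that it agrees with the closed Bernstein form of the cubic curve.

```python
def lerp(P, Q, t):
    """Point (1 - t) P + t Q on the segment from P to Q."""
    return tuple((1 - t) * p + t * q for p, q in zip(P, Q))

def de_casteljau(points, t):
    """Repeatedly interpolate between neighbouring points until one remains."""
    pts = list(points)
    while len(pts) > 1:
        pts = [lerp(P, Q, t) for P, Q in zip(pts, pts[1:])]
    return pts[0]

def cubic_bernstein(A, B, C, D, t):
    """Closed form (1-t)^3 A + 3t(1-t)^2 B + 3t^2(1-t) C + t^3 D."""
    w = ((1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3)
    return tuple(sum(wi * P[i] for wi, P in zip(w, (A, B, C, D))) for i in range(2))

A, B, C, D = (0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)  # sample control points
print(de_casteljau([A, B, C, D], 0.3))
print(cubic_bernstein(A, B, C, D, 0.3))
```

The same `de_casteljau` function handles any number of control points, which mirrors how the construction extends to higher-degree Bernstein polynomials.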
[Figure: two cubic Bézier curves in the $xy$-plane, each with control points $A$, $B$, $C$, $D$.]
The pictures show cubic Bézier curves: the first is the graph of the Bernstein polynomial
$$B_3 f(x) = 0(1 - x)^3 + 3x(1 - x)^2 + 3x^2(1 - x) + \frac{2}{3} x^3$$
while the second is for the four given control points $A, B, C, D$.
$^{14}$ To see these pictures move, visit https://www.math.uci.edu/~ndonalds/math140b/bezier.html
Exercises 27
1. Show that the closed bounded interval assumption in the approximation theorem
is required by giving an example of a continuous function $f : (-1, 1) \to \mathbb{R}$ which is not the
uniform limit of a sequence of polynomials.
2. If $g : [a, b] \to \mathbb{R}$ is continuous, then $f(x) := g\big( (b - a) x + a \big)$ is continuous on $[0, 1]$. If $P_n \to f$
uniformly on $[0, 1]$, prove that $Q_n \to g$ uniformly on $[a, b]$, where
$$Q_n(x) = P_n\left( \frac{x - a}{b - a} \right)$$
3. Use the binomial theorem to check that every Bernstein polynomial for $f(x) = x$ is $B_n f(x) = x$
itself!
4. Find a parametrization of the cubic Bézier curve with control points $(1, 0)$, $(0, 1)$, $(-1, 0)$ and
$(0, -1)$. Now sketch the curve.
(Use a computer algebra package if you like!)
5. (Hard) Show that the Bernstein polynomials for $f(x) = x^2$ are given by
$$B_n f(x) = \frac{1}{n} x + \frac{n - 1}{n} x^2$$
and thus verify explicitly that $B_n f \to f$ uniformly.
3 Differentiation
Differentiation grew out of the problem of instantaneous velocity. Velocity can only easily be measured
as an average over a time interval:$^{15}$ if an object travels $d$ meters in $t$ seconds, then its average
velocity is $v_{\mathrm{av}} = \frac{d}{t}$ ms$^{-1}$. An early ‘definition’ (dating to the 1300s) makes the instantaneous velocity
equal to the constant velocity that would be observed if a body were to stop accelerating: while useless
for the purposes of measurement, this is essentially Newton’s first law regarding inertial motion
(1687). We also see the concept of the tangent line beginning to appear: if one graphs position against
time, then a couple of things should be clear:
• The graph of inertial (constant speed) motion is a straight line whose slope is the velocity.
• The tangent line to a curve at a point has slope equal to the instantaneous velocity at that point.
The problem of finding, defining and computing instantaneous velocity thus morphed into the consideration
of tangent lines to curves. With the advent of analytic geometry in the early 1600s,
mathematicians such as Fermat and Descartes pioneered versions of the familiar secant (‘cutting’)
line method for computing tangents.
[Figures: graphs of distance $d$ against time $t$. Left: instantaneous velocity equals the constant velocity corresponding to the tangent line. Right: secant lines approximate the tangent line as $t \to a$.]
The average velocity of the particle over the time interval $[a, t]$ is the slope of the secant line, namely
$$v_{\mathrm{av}}(a, t) = \frac{d(t) - d(a)}{t - a}$$
Since the secant lines approximate the tangent line as $t$ approaches $a$, it seems reasonable that we
should compute the instantaneous velocity in this manner:
$$v(a) = \lim_{t \to a} v_{\mathrm{av}}(a, t) = \lim_{t \to a} \frac{d(t) - d(a)}{t - a}$$
This is, of course, the modern definition of the derivative.
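To illustrate the secant-line idea numerically, the Python sketch below computes average velocities over shrinking time intervals. The position function $d(t) = 4.9t^2$ and the names used are illustrative assumptions, not taken from the notes.

```python
def average_velocity(d, a, t):
    """Slope of the secant line of the position function d over [a, t]."""
    return (d(t) - d(a)) / (t - a)

def d(t):
    """Hypothetical position function: free fall from rest, in meters."""
    return 4.9 * t ** 2

a = 1.0
for step in (1.0, 0.1, 0.01, 0.001):
    print(step, average_velocity(d, a, a + step))
```

The printed values approach $9.8$, the slope of the tangent line at $a = 1$, as the interval shrinks.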
$^{15}$ Even a modern technique such as Doppler-shift compares measurements separated by the extremely small period of a
light or soundwave. These are still therefore average velocities, albeit taken over very small time intervals.
28 Basic Properties of the Derivative
Definition 3.1. Let $f : U \to \mathbb{R}$ and let $a \in U$. We say that $f$ is differentiable at $a$ if the following limit
exists (is finite!)
$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$
We call this limit the derivative of $f$ at $a$ and denote its value by either $f'(a)$ or $\left. \frac{df}{dx} \right|_{x=a}$.
If $f'(a)$ exists for all $a \in U$ then $f$ is differentiable (on $U$); the derivative becomes a function $f'(x) = \frac{df}{dx}$.
Notation The contrasting styles are partly attributable to the primary founders of calculus, Isaac
Newton and Gottfried Leibniz. Each has its pros and cons and you should be comfortable with both.
One-sided derivatives Since the defining limit is two-sided, differentiability only makes sense at
interior points of $U$. Left- and right-derivatives may be defined via one-sided limits; differentiability
is equivalent to these being equal. All results in this section hold for one-sided derivatives with
suitable (sometimes tedious) modifications. It is quite common, though strictly incorrect, to say that
$f$ is differentiable on an interval $[a, b)$ if it is differentiable on the interior $(a, b)$ and right-differentiable
at $a$; however, we will strictly adhere to differentiable meaning two-sided.
Examples 3.2. 1. Let $f(x) = x^2 + 4x$. Then, for any $a \in \mathbb{R}$,
$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \frac{x^2 + 4x - a^2 - 4a}{x - a} = \lim_{x \to a} \frac{(x - a)(x + a + 4)}{x - a} = \lim_{x \to a} (x + a + 4) = 2a + 4$$
Note how the definition of $\lim_{x \to a}$ allows us to cancel the $x - a$ terms from the numerator and
denominator. We conclude that $f$ is differentiable (on $\mathbb{R}$) and that $f'(x) = 2x + 4$.
2. Let $g(x) = \frac{x+1}{2x-3}$. Then, for any $a \ne \frac{3}{2}$,
$$\begin{aligned}
\lim_{x \to a} \frac{g(x) - g(a)}{x - a} &= \lim_{x \to a} \frac{1}{x - a} \left( \frac{x + 1}{2x - 3} - \frac{a + 1}{2a - 3} \right) = \lim_{x \to a} \frac{5a - 5x}{(x - a)(2x - 3)(2a - 3)} \\
&= \lim_{x \to a} \frac{-5}{(2x - 3)(2a - 3)} = \frac{-5}{(2a - 3)^2}
\end{aligned}$$
$g$ is therefore differentiable on its domain $\mathbb{R} \setminus \{\frac{3}{2}\}$ with derivative $g'(x) = \frac{-5}{(2x-3)^2}$.
The familiar expressions
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}, \qquad f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
are equivalent to the original definition (see Exercise 5). While seemingly simpler, they sometimes
lead to nastier calculations: see what happens if you try the previous example in this language. . .
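The limit computed for $g(x) = \frac{x+1}{2x-3}$ can be compared with difference quotients at small $h$. The Python sketch below (hypothetical names) uses both the one-sided quotient from the text and the symmetric quotient that appears in Exercise 5.

```python
def g(x):
    return (x + 1) / (2 * x - 3)

def g_prime(a):
    """Derivative found from the definition: -5 / (2a - 3)^2."""
    return -5 / (2 * a - 3) ** 2

def forward_quotient(f, a, h):
    return (f(a + h) - f(a)) / h

def symmetric_quotient(f, a, h):
    return (f(a + h) - f(a - h)) / (2 * h)

a, h = 0.0, 1e-4
print(g_prime(a))
print(forward_quotient(g, a, h))
print(symmetric_quotient(g, a, h))
```

For smooth functions the symmetric quotient is typically the more accurate of the two at a given $h$, though neither replaces the exact limit.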
We now turn to possibly the most well-known result of Freshman Calculus.
Theorem 3.3 (Power Law). Let $r \in \mathbb{R}$. Then $f(x) = x^r$ is differentiable with $f'(x) = r x^{r-1}$.
The domains of $f$ and $f'$ depend messily on $r$, but the above certainly holds on the interval $(0, \infty)$.
We leave a complete proof to the exercises and instead consider a few generalizable examples.
Examples 3.4. 1. If $n \in \mathbb{N}$ and $a \in \mathbb{R}$, a simple factorization yields
$$\begin{aligned}
\lim_{x \to a} \frac{x^n - a^n}{x - a} &= \lim_{x \to a} \frac{(x - a)(x^{n-1} + a x^{n-2} + \cdots + a^{n-2} x + a^{n-1})}{x - a} && (*) \\
&= \lim_{x \to a} (x^{n-1} + a x^{n-2} + \cdots + a^{n-2} x + a^{n-1}) = n a^{n-1}
\end{aligned}$$
We conclude that $\frac{d}{dx} x^n = n x^{n-1}$.
2. If $f(x) = x^{-1}$ and $a \ne 0$, then
$$\lim_{x \to a} \frac{x^{-1} - a^{-1}}{x - a} = \lim_{x \to a} \frac{a - x}{a x (x - a)} = \lim_{x \to a} \frac{-1}{a x} = -\frac{1}{a^2}$$
from which we conclude that $f'(x) = -x^{-2}$.
A similar approach followed by the factorization $(*)$ proves the
power law for all negative integer exponents:
$$\frac{x^{-n} - a^{-n}}{x - a} = \frac{a^n - x^n}{a^n x^n (x - a)} = \cdots$$
[Figure: graph of $y = x^{-1}$.]
3. To differentiate $x^{1/n}$, simply substitute $x = y^n$ and observe case 1.
If $g(x) = x^{1/3}$ and $a \ne 0$, then $y = x^{1/3}$ and $b = a^{1/3}$ yield
$$\lim_{x \to a} \frac{x^{1/3} - a^{1/3}}{x - a} = \lim_{y \to b} \frac{y - b}{y^3 - b^3} = \frac{1}{3 b^2} = \frac{1}{3} a^{-2/3} \implies g'(x) = \frac{1}{3} x^{-2/3}$$
Note that $g$ is not differentiable at $x = 0$!
[Figure: graph of $y = x^{1/3}$.]
We could similarly compute the derivative for all rational exponents, though it is much easier to wait
for the chain rule. The power law for irrational exponents is somewhat more ticklish.
Corollary 3.5 (Basic Transcendental Functions). Recalling our development of power series in the
previous chapter, the power law (for positive integers!) is all we need to see that
$$\frac{d}{dx} \exp(x) = \exp(x), \qquad \frac{d}{dx} \sin x = \cos x, \qquad \frac{d}{dx} \cos x = -\sin x$$
It is also possible to develop these results independently of power series (see e.g. Exercise 9).
Failure of differentiability
It is instructive to consider when a function can fail to be differentiable. First a simple result shows
that functions are not differentiable at discontinuities.
Theorem 3.6. If $f$ is differentiable at $a$ then $f$ is continuous at $a$.
Proof. Simply take the limit (think carefully why this works!):
$$\lim_{x \to a} f(x) = \lim_{x \to a} \left[ \frac{f(x) - f(a)}{x - a} (x - a) + f(a) \right] = f'(a) \cdot 0 + f(a) = f(a)$$
It remains to consider situations when a function is continuous but not differentiable.
Examples 3.7. The following cover all situations where a function is continuous on an interval and
differentiable everywhere except at a single interior point; similarly to isolated discontinuities, these
are classified by considering the three ways in which the derivative limit might not exist.
1. A vertical tangent line occurs when the derivative is infinite. For instance, $g(x) = x^{1/3}$ at $x = 0$.
2. Corners occur when the one-sided derivatives are unequal (could be infinite). For instance,
$f(x) = |x|$ is not differentiable at zero, the one-sided derivatives being
$$\lim_{x \to 0^+} \frac{|x| - |0|}{x - 0} = \lim_{x \to 0^+} \frac{x}{x} = 1 \ne \lim_{x \to 0^-} \frac{|x| - |0|}{x - 0} = \lim_{x \to 0^-} \frac{-x}{x} = -1$$
Indeed $f$ is differentiable everywhere except at zero, with
$$f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases}$$
3. A singularity is where left- and/or right-derivatives do not exist.
The standard example in this case is
$$f(x) = \begin{cases} x \sin \frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$
which is continuous on $\mathbb{R}$ and differentiable everywhere except
at zero: the details are in Exercise 8.
[Figure: graph of $x \sin \frac{1}{x}$ near $x = 0$.]
Singularities and vertical tangent lines can also prevent one-sided differentiability.
More esoteric examples of non-differentiability are also possible:
• Utilizing series, we can create functions which are continuous on an interval but nowhere
differentiable! For a classic example, see page 28.
• It is also possible to construct a function which is differentiable (and thus continuous) at precisely
one point; can you think of an example?
The Basic Rules of Differentiation
Theorem 3.8. Let $f, g$ be differentiable and $k, l$ be constants.
1. (Linearity) The function $k f + l g$ is differentiable with $(k f + l g)' = k f' + l g'$.
2. (Product rule) The function $f g$ is differentiable with $(f g)' = f' g + f g'$.
3. (Inverse functions) If $f$ is bijective with non-zero derivative, then $f^{-1}$ is differentiable and
$$\frac{d}{dx} f^{-1}(x) = \frac{1}{f'\big( f^{-1}(x) \big)}$$
Proof. Parts 1 and 2 follow from the limit laws:
$$\lim_{x \to a} \frac{(k f + l g)(x) - (k f + l g)(a)}{x - a} = \lim_{x \to a} \left[ k \frac{f(x) - f(a)}{x - a} + l \frac{g(x) - g(a)}{x - a} \right] = k f'(a) + l g'(a)$$
$$\lim_{x \to a} \frac{f(x) g(x) - f(a) g(a)}{x - a} = \lim_{x \to a} \left[ \frac{f(x) - f(a)}{x - a} g(x) + f(a) \frac{g(x) - g(a)}{x - a} \right] = f'(a) g(a) + f(a) g'(a)$$
Note where we used the continuity of $g$ in the second line ($\lim g(x) = g(a)$). Part 3 is an exercise.
The inverse function rule is intuitive since the graphs of $f$ and $f^{-1}$ are related by reflection in the line
$y = x$; gradients at corresponding points are therefore reciprocal. In Leibniz notation the result reads
$$\frac{dx}{dy} = \left( \frac{dy}{dx} \right)^{-1}$$
Examples 3.9. 1. Linearity allows us to differentiate any polynomial: for instance
$$\frac{d}{dx} \left( 7x^2 + 13x^4 \right) = 7 \frac{d}{dx} x^2 + 13 \frac{d}{dx} x^4 = 14x + 52x^3$$
2. The product rule extends the reach of differentiation somewhat:
$$\frac{d}{dx} (x^4 \sin x) = \left( \frac{d}{dx} x^4 \right) \sin x + x^4 \frac{d}{dx} \sin x = 4x^3 \sin x + x^4 \cos x$$
3. The inverse trigonometric functions can now be differentiated. For instance,
$$y = \sin^{-1} x \implies \frac{d}{dx} \sin^{-1} x = \frac{dy}{dx} = \left( \frac{dx}{dy} \right)^{-1} = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - \sin^2 y}} = \frac{1}{\sqrt{1 - x^2}}$$
4. Define natural log to be the inverse of the (bijective!) exponential function $\exp(x)$:
$$y = \ln x \iff x = \exp y$$
It follows that
$$\frac{d}{dx} \ln x = \left( \frac{dx}{dy} \right)^{-1} = \frac{1}{\exp y} = \frac{1}{x}$$
The full details, and the justification that $\exp x = e^x$, form an optional exercise.
Theorem 3.10 (Chain Rule). If $g$ is differentiable at $a$ and $f$ is differentiable at $g(a)$ then $f \circ g$ is
differentiable at $a$ with derivative
$$(f \circ g)'(a) = f'\big( g(a) \big) g'(a)$$
In Leibniz notation this reads
$$\frac{d(f \circ g)}{dx} = \frac{df}{dg} \frac{dg}{dx}$$
which looks like a simple cancellation of the $dg$ terms!$^{16}$
Proof. Define $\gamma : \operatorname{dom}(f) \to \mathbb{R}$ via
$$\gamma(v) = \begin{cases} \dfrac{f(v) - f\big( g(a) \big)}{v - g(a)} & \text{if } v \ne g(a) \\[2ex] f'\big( g(a) \big) & \text{if } v = g(a) \end{cases} \tag{$*$}$$
Since $f$ is differentiable at $g(a)$, we see that $\gamma$ is continuous and $\lim_{v \to g(a)} \gamma(v) = f'\big( g(a) \big)$.
Since $g$ is differentiable at $a$, there exists an open interval $U \ni a$ for which $x \in U \implies g(x) \in \operatorname{dom}(f)$.
Now compute: for any $x \in U \setminus \{a\}$, let $v = g(x)$ in $(*)$, whence
$$\frac{f\big( g(x) \big) - f\big( g(a) \big)}{x - a} = \gamma\big( g(x) \big) \cdot \frac{g(x) - g(a)}{x - a}$$
Take limits as $x \to a$ for the result.
Corollary 3.11 (Quotient Rule). Suppose $f$ and $g$ are differentiable. Then $\frac{f}{g}$ is differentiable whenever
$g(x) \ne 0$. Moreover
$$\left( \frac{f}{g} \right)' = \frac{f' g - f g'}{g^2}$$
The proof is an exercise.
Examples 3.12. 1. By the quotient rule,
$$\frac{d}{dx} \tan x = \frac{d}{dx} \frac{\sin x}{\cos x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \sec^2 x$$
2. We can now differentiate highly involved combinations of elementary functions:
$$\frac{d}{dx} \left[ \tan(e^{4x^2}) - \frac{7x}{\sin x} \right] = 8x e^{4x^2} \sec^2(e^{4x^2}) - \frac{7 \sin x - 7x \cos x}{\sin^2 x}$$
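A quick way to catch algebra slips in a computation like the last example is to compare the claimed derivative with a central difference quotient. The Python sketch below does this at the sample point $x = 0.2$; the function names, the step size and the sample point are assumptions for illustration.

```python
import math

def F(x):
    """The function tan(e^(4x^2)) - 7x / sin x from the example."""
    return math.tan(math.exp(4 * x * x)) - 7 * x / math.sin(x)

def F_prime(x):
    """Derivative obtained from the chain and quotient rules."""
    u = math.exp(4 * x * x)
    chain_part = 8 * x * u / math.cos(u) ** 2  # sec^2(u) = 1 / cos^2(u)
    quotient_part = (7 * math.sin(x) - 7 * x * math.cos(x)) / math.sin(x) ** 2
    return chain_part - quotient_part

def central_difference(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.2
print(F_prime(x), central_difference(F, x))
```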
$^{16}$ This is completely unjustified since $dg$ does not (for us) mean anything on its own! The same problem appears in the
famously faulty one-line ‘proof’ of the chain rule:
$$\lim_{x \to a} \frac{f\big( g(x) \big) - f\big( g(a) \big)}{x - a} \overset{?}{=} \lim_{x \to a} \frac{f\big( g(x) \big) - f\big( g(a) \big)}{g(x) - g(a)} \cdot \lim_{x \to a} \frac{g(x) - g(a)}{x - a}$$
The second limit cannot exist unless $g(x) \ne g(a)$ for all $x$ near, but not equal to, $a$. The faulty argument is repaired by
replacing the second difference quotient with $f'\big( g(a) \big)$ whenever $g(x) = g(a)$, before taking the limit. This is precisely what
$\gamma\big( g(x) \big)$ does in the correct proof.
Exercises 28
1. Use Definition 3.1 to calculate the derivatives.
(a) $f(x) = x^3$ at $x = 2$ (b) $g(x) = x + 2$ at $x = a$
(c) $f(x) = x^2 \cos x$ at $x = 0$ (d) $r(x) = \frac{3x + 4}{2x - 1}$ at $x = 1$
2. Differentiate the function f (x) = cos
e
x
5
3x
using the chain and product rules.
3. (a) Prove the quotient rule (Corollary 3.11) by combining the chain and product rules.
(b) Prove the inverse derivative rule (Theorem 3.8, part 3).
(Hint: You can’t simply differentiate $1 = \frac{dx}{dx} = \frac{d}{dx} f(f^{-1}(x))$ using the chain rule; why not?)
4. (a) Find the derivatives of secant, cosecant and cotangent using the quotient rule.
(b) Why did we choose the positive square-root when computing $\frac{d}{dx} \sin^{-1} x$? What is the
standard domain of arcsine, and what happens at $x = \pm 1$?
(c) Find the derivatives of the inverse trigonometric functions using the inverse function rule.
5. Using the definition of the derivative, and supposing that $f$ is differentiable at $a$, prove that
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{h \to 0} \frac{f(a + h) - f(a - h)}{2h}$$
6. Prove that the function $f(x) = x |x|$ is differentiable everywhere and compute its derivative.
7. Show that the following function is differentiable everywhere and compute its derivative:
$$f(x) = \begin{cases} x^2 \sin \frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$
Moreover, prove that the derivative $f'$ is discontinuous at $x = 0$.
8. Show that the following function is differentiable everywhere except at zero:
$$f(x) = \begin{cases} x \sin \frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$
9. (a) Suppose $0 < h < \frac{\pi}{2}$. Use the picture to show that
$$0 < \frac{1 - \cos h}{h} < \sin \frac{h}{2} \quad \text{and} \quad \sin h < h < \tan h$$
Hence conclude that $\lim_{h \to 0} \frac{\sin h}{h} = 1$ and $\lim_{h \to 0} \frac{1 - \cos h}{h} = 0$.
(b) Use part (a) to prove that $\frac{d}{dx} \sin x = \cos x$
[Figure: picture with the lengths $\cos h$, $\sin h$, $\tan h$, $1$ and $h$ labeled.]
10. (Hard) Use induction to prove the Leibniz rule (general product rule):
$$(f g)^{(n)} = \sum_{k=0}^n \binom{n}{k} f^{(k)} g^{(n-k)}$$
Masochists Corner (non-examinable)
We finish with two very hard bonus exercises, though the first is somewhat easier. If you want a
challenge, give ’em a go!
The Exponential Function & the General Power Law
Consider the function $\exp(x) := \sum_{n=0}^\infty \frac{x^n}{n!}$ which converges for all real $x$.
As we saw when discussing power series, this function satisfies the initial value problem
$$\frac{d}{dx} \exp(x) = \exp(x), \qquad \exp(0) = 1$$
Define $e := \exp(1)$. Certainly $e^x$ makes sense whenever $x \in \mathbb{Q}$. When $x$ is irrational, define
$$e^x := \sup\{ e^q : q \in \mathbb{Q}, \ q < x \}$$
Our primary goal is to prove that $\exp(x) = e^x$. As a nice bonus we recover Bernoulli’s limit
identity $e = \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right)^n$ and obtain a complete proof of the power law.
(a) For all $x, y \in \mathbb{R}$, prove that $\exp(x + y) = \exp(x) \exp(y)$
(Hint: use the binomial theorem and change the order of summation)
(b) Show that $\exp(x)$ is always positive, even when $x < 0$.
(c) Prove that $\exp : \mathbb{R} \to (0, \infty)$ is bijective.
(Hint: $x \ge 0 \implies \exp(x) \ge 1 + x$; take limits then apply part (a))
(d) Prove that $e^x = \exp(x)$. Do this in three stages:
• If $x \in \mathbb{N}$, use part (a). Now check for $x \in \mathbb{Z}^-$.
• If $x = \frac{m}{n} \in \mathbb{Q}$, first compute $\left( \exp(\frac{m}{n}) \right)^n$.
• If $x$ is irrational, start with $(q_n) \subseteq \mathbb{Q}$ such that $q_n < x$ and $e^{q_n} \to e^x$. . .
(e) Let $\ln : (0, \infty) \to \mathbb{R}$ be the inverse function of $\exp$. Prove the logarithm laws:
$$\ln(xy) = \ln x + \ln y \quad \text{and} \quad \ln x^r = r \ln x$$
(Just do this when $r \in \mathbb{N}$; another argument like part (d) is required in general)
(f) We’ve already seen that $\frac{d}{dy} \ln y = \frac{1}{y}$. Use the fact that
$$\frac{d}{dy} \ln y = \lim_{h \to 0} \frac{\ln(y + h) - \ln y}{h}$$
to prove that $\exp(x) = \lim_{n \to \infty} \left( 1 + \frac{x}{n} \right)^n$, thus recovering Bernoulli’s definition of $e$.
(g) For any $r \in \mathbb{R}$, define $x^r := \exp(r \ln x)$. Hence obtain the power law for any exponent.
A Very Strange Function
Here is a classic example of a continuous but nowhere-differentiable function!
Let $f$ be the sawtooth function defined by $f(x) = |x|$ whenever $x \in [-1, 1]$ and extending
periodically to $\mathbb{R}$ so that $f(x + 2) = f(x)$. Now define $g : \mathbb{R} \to \mathbb{R}$ via
$$g(x) = \sum_{n=0}^\infty \left( \frac{3}{4} \right)^n f(4^n x)$$
[Figures: left, $f(x)$ and iterations to $n = 3$; right, $g(x)$ (really $n = 6$, but can you tell?!)]
(a) Prove that $g$ is well-defined and continuous on $\mathbb{R}$.
(b) Let $x \in \mathbb{R}$ and $m \in \mathbb{N}$ be fixed. Define $h_m = \pm \frac{1}{2} \cdot 4^{-m}$ where the $\pm$-sign is chosen so that
no integers lie strictly between $4^m x$ and $4^m (x + h_m) = 4^m x \pm \frac{1}{2}$.
For each $n \in \mathbb{N}_0$, define
$$k_n = \frac{f\big( 4^n (x + h_m) \big) - f(4^n x)}{h_m}$$
Prove the following
i. $|k_n| \le 4^n$ with equality when $n = m$.
ii. $n > m \implies k_n = 0$.
(Hint: $|f(y) - f(z)| \le |y - z|$: when is this an equality?)
(c) Use part (b) to prove that
$$\left| \frac{g(x + h_m) - g(x)}{h_m} \right| \ge \frac{1}{2} (3^m + 1)$$
Hence conclude that $g$ is nowhere differentiable.
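The lower bound in part (c) can be observed numerically. The Python sketch below approximates $g$ by a partial sum and evaluates the difference quotient at the steps $h_m$; the truncation at $n = 10$, the sample point $x = 0.3$ and all function names are assumptions. It is only an illustration and not a substitute for the proof.

```python
import math

def sawtooth(x):
    """Period-2 extension of |x| on [-1, 1]: distance from x to the nearest even integer."""
    return abs(x - 2 * round(x / 2))

def g_partial(x, terms=10):
    """Partial sum of (3/4)^n f(4^n x) for n = 0, ..., terms."""
    return sum(0.75 ** n * sawtooth(4 ** n * x) for n in range(terms + 1))

def step(x, m):
    """h_m = +/- (1/2) 4^(-m), signed so that no integer lies strictly
    between 4^m x and 4^m (x + h_m)."""
    y = 4 ** m * x
    sign = 1 if y - math.floor(y) <= 0.5 else -1
    return sign * 0.5 * 4 ** (-m)

def quotient(x, m):
    h = step(x, m)
    return abs(g_partial(x + h) - g_partial(x)) / abs(h)

for m in range(1, 6):
    print(m, quotient(0.3, m), (3 ** m + 1) / 2)
```

Since the terms with $n > m$ contribute nothing to the difference quotient, truncating the series at any $n \ge m$ does not change its value (up to rounding), which is why a partial sum suffices here.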
29 The Mean Value Theorem
We now turn to one of the central results in calculus.
Theorem 3.13 (Mean Value Theorem/MVT). Let $f$ be continuous on $[a, b]$ and differentiable on
$(a, b)$. Then there exists $\xi \in (a, b)$ such that $f'(\xi) = \frac{f(b) - f(a)}{b - a}$.
This follows easily from two lemmas.
Lemma 3.14. 1. (Critical Points) Suppose $g$ is bounded on $(a, b)$ and attains its maximum or minimum
at $\xi \in (a, b)$. If $g$ is differentiable at $\xi$ then $g'(\xi) = 0$.
2. (Rolle’s Theorem) Suppose $g$ is continuous on $[a, b]$, differentiable on $(a, b)$, and that $g(a) = g(b)$.
Then there exists $\xi \in (a, b)$ such that $g'(\xi) = 0$.
The main result follows by applying Rolle’s theorem to
$$g(x) = f(x) - \frac{f(b) - f(a)}{b - a} (x - b)$$
and observing that $g(a) = f(b) = g(b)$ and $g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}$.
[Figures: left, Critical Points/Rolle’s Theorem, graph of $g(x)$ over $[a, b]$ with the point $\xi$ marked; right, Mean Value Theorem, graph of $f(x)$ over $[a, b]$ with the point $\xi$ marked.]
In the pictures, the orange and green lines are parallel: the average slope over the interval $[a, b]$ equals
the gradient/derivative $f'(\xi)$.
Proof of Lemma. 1. Suppose, for a contradiction, that
$$g'(\xi) = \lim_{x \to \xi} \frac{g(x) - g(\xi)}{x - \xi} > 0$$
Let $\epsilon = g'(\xi)$ in the definition of limit: $\exists \delta > 0$ such that
$$0 < |x - \xi| < \delta \implies \left| \frac{g(x) - g(\xi)}{x - \xi} - g'(\xi) \right| < g'(\xi) \implies 0 < \frac{g(x) - g(\xi)}{x - \xi} < 2 g'(\xi)$$
In particular, if $x \in (\xi, \xi + \delta)$, then $g(x) > g(\xi)$, contradicting the maximality at $\xi$.
The argument when $g'(\xi) < 0$ is similar. Finally, apply to $-g$ for the result at a minimum.
2. By the extreme value theorem, $g$ is bounded and attains its bounds. If the maximum and minimum
both occur at the endpoints $a, b$, then $g$ is constant: any $\xi \in (a, b)$ satisfies the result.
Otherwise, at least one extreme value occurs at some $\xi \in (a, b)$: part 1 says that $g'(\xi) = 0$.
Examples 3.15. 1. Let $f(x) = (x - 1)^2 (4 - x) + x$ on $[a, b] = [1, 4]$: this is roughly the above picture
illustrating the mean value theorem. We compute the average slope and the derivative:
$$\frac{f(b) - f(a)}{b - a} = 1, \qquad f'(x) = 2(x - 1)(4 - x) - (x - 1)^2 + 1 = -3x^2 + 12x - 8$$
and observe that
$$f'(\xi) = \frac{f(b) - f(a)}{b - a} \iff 3\xi^2 - 12\xi + 9 = 0 \iff \xi = 1 \text{ or } 3$$
Since only 3 lies in the interval $(1, 4)$, this is the value $\xi$ satisfying the mean value theorem.
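The computation in this example can be replayed in a few lines of Python. The sketch below (variable names are illustrative) solves $f'(\xi) = \frac{f(b) - f(a)}{b - a}$ with the quadratic formula and keeps only the roots in the open interval.

```python
import math

def f(x):
    return (x - 1) ** 2 * (4 - x) + x

def f_prime(x):
    return -3 * x ** 2 + 12 * x - 8

a, b = 1.0, 4.0
slope = (f(b) - f(a)) / (b - a)  # average slope over [a, b]

# f'(xi) = slope  <=>  3 xi^2 - 12 xi + (8 + slope) = 0
disc = 12 ** 2 - 4 * 3 * (8 + slope)
roots = [(12 - math.sqrt(disc)) / 6, (12 + math.sqrt(disc)) / 6]
interior = [xi for xi in roots if a < xi < b]
print(slope, roots, interior)
```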
2. We find the maximum and minimum values of $g(x) = x^4 - 14x^2 + 24x$ on the interval $[0, 2]$.
The function is differentiable, with
$$g'(x) = 4x^3 - 28x + 24 = 4(x - 2)(x - 1)(x + 3)$$
By the Lemma, the locations of the extrema are either the endpoints
$x = 0, 2$ or locations with zero derivative ($x = 1$). Since
$$g(0) = 0, \qquad g(1) = 11, \qquad g(2) = 8$$
we conclude that $\max(g) = g(1) = 11$ and $\min(g) = g(0) = 0$.
[Figure: graph of $g(x)$ on $[0, 2]$.]
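The candidate-point method used here is easy to mechanize: evaluate the function at the endpoints and at the interior critical points, then compare. A minimal Python sketch follows, with the critical point $x = 1$ entered by hand from the factorization above; names are illustrative.

```python
def g(x):
    return x ** 4 - 14 * x ** 2 + 24 * x

# Endpoints of [0, 2] together with the interior critical point x = 1.
candidates = [0.0, 1.0, 2.0]
values = {x: g(x) for x in candidates}

x_max = max(values, key=values.get)
x_min = min(values, key=values.get)
print(values, x_max, x_min)
```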
Consequences of the Mean Value Theorem Several simple corollaries relate to monotonicity.
Definition 3.16. Suppose $f : I \to \mathbb{R}$ is defined on an interval $I$. We say that $f$ is:
• Increasing (monotone-up) on $I$ if $x < y \implies f(x) \le f(y)$
• Decreasing (monotone-down) on $I$ if $x < y \implies f(x) \ge f(y)$
We say strictly increasing/decreasing if the inequalities are strict.
Examples 3.17. 1. $f : x \mapsto x^2$ is strictly increasing on $[0, \infty)$
and strictly decreasing on $(-\infty, 0]$.
2. The floor function $f : x \mapsto \lfloor x \rfloor$ (the greatest integer less
than or equal to $x$) is increasing, but not strictly, on $\mathbb{R}$.
[Figure: graph of the floor function.]
Corollary 3.18. Suppose $f$ is differentiable on an interval $I$, then
1. $f' \ge 0$ on $I \iff f$ is increasing on $I$
2. $f' \le 0$ on $I \iff f$ is decreasing on $I$
3. $f' = 0$ on $I \iff f$ is constant on $I$
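Proof. ($\Rightarrow$) Let $x < y$ where $x, y \in I$. By the mean value theorem, $\exists \xi \in (x, y)$ such that
$$\frac{f(y) - f(x)}{y - x} = f'(\xi) \quad \text{whence} \quad f'(\xi) \ge 0 \implies f(y) \ge f(x)$$
($\Leftarrow$) For the converse, use the definition of derivative: $f'(\xi) = \lim_{x \to \xi} \frac{f(x) - f(\xi)}{x - \xi}$. If $f$ is increasing, then
$$x > \xi \implies f(x) \ge f(\xi) \implies f'(\xi) \ge 0$$
Parts 2 and 3 are similar.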
Corollary 3.18 yields a couple of flashbacks to elementary calculus.
Corollary 3.19. Let $I$ be an interval.
1. (Anti-derivatives on an interval) If $f'(x) = g'(x)$ on $I$, then $\exists c$ such that $g(x) = f(x) + c$ on $I$.
2. (First derivative test) Suppose $f$ is continuous on $I$ and differentiable except perhaps at $\xi$. If
$$\begin{cases} f'(x) < 0 \text{ whenever } x < \xi, \text{ and} \\ f'(x) > 0 \text{ whenever } x > \xi \end{cases}$$
then $f$ has its minimum value at $x = \xi$.
The statement for a maximum is similar.
Examples 3.20. 1. Since $\frac{d}{dx} \sin(3x^2 + x) = (6x + 1) \cos(3x^2 + x)$ on (the interval) $\mathbb{R}$, all
anti-derivatives of $f(x) = (6x + 1) \cos(3x^2 + x)$ are given by
$$\int f(x) \, dx = \int (6x + 1) \cos(3x^2 + x) \, dx = \sin(3x^2 + x) + c$$
As is typical, we use the indefinite integral notation $\int f(x) \, dx$ for anti-derivatives.
2. If $f(x) = x^{2/3} e^{x/3}$, then $f'(x) = \frac{1}{3} x^{-1/3} (2 + x) e^{x/3}$.
By Lemma 3.14, the only possible critical points are at
$x = 0$ or $-2$. The sign of the derivative is also clear:
$$f'(x) > 0 \text{ if } x < -2, \qquad f'(x) < 0 \text{ if } -2 < x < 0, \qquad f'(x) > 0 \text{ if } x > 0$$
[Figure: graph of $f(x)$.]
By the 1st derivative test, $f$ has a maximum at $x = -2$ and a minimum at $x = 0$.
We finish this section by tying together the mean and intermediate value theorems.
Theorem 3.21 (IVT for Derivatives). Suppose $f$ is differentiable on an interval $I$ containing $a < b$,
and that $L$ lies between $f'(a)$ and $f'(b)$. Then $\exists \xi \in (a, b)$ such that $f'(\xi) = L$.
If $f'(x)$ is continuous, this is just the intermediate value theorem applied to $f'$. A full proof is left to
the exercises; surprisingly, continuity is not required. . .
Exercises 29
1. Determine whether the conclusion of the mean value theorem holds for each function
on the given interval. If so, find a suitable point $\xi$. If not, state which hypothesis fails.
(a) $x^2$ on $[-1, 2]$ (b) $\sin x$ on $[0, \pi]$ (c) $|x|$ on $[-1, 2]$
(d) $1/x$ on $[-1, 1]$ (e) $1/x$ on $[1, 3]$
2. Suppose $f$ and $g$ are differentiable on an open interval $I$, that $a < b$ and $f(a) = f(b) = 0$. By
considering $h(x) = f(x) e^{g(x)}$, prove that $f'(\xi) + f(\xi) g'(\xi) = 0$ for some $\xi \in (a, b)$.
3. Use the Mean Value Theorem to prove the following:
(a) $x < \tan x$ for all $x \in (0, \pi/2)$.
(b) $\frac{x}{\sin x}$ is a strictly increasing function on $(0, \pi/2)$.
(c) $x \le \frac{\pi}{2} \sin x$ for all $x \in [0, \pi/2]$.
4. Suppose that $|f(x) - f(y)| \le (x - y)^2$ for all $x, y \in \mathbb{R}$. Prove that $f$ is a constant function.
5. (a) Prove that $f' > 0$ on an interval $I \implies f$ is strictly increasing on $I$.
(b) Show that the converse of part (a) is false.
(c) Carefully prove the first derivative test (Corollary 3.19).
6. If $f$ is differentiable on an interval $I$ such that $f'(x) \ne 0$ for all $x \in I$, use the intermediate value
theorem for derivatives to prove that $f$ is either strictly increasing or strictly decreasing.
7. We prove the intermediate value theorem for derivatives. Let $f$, $a$, $b$ and $L$ be as in the Theorem,
define $g : I \to \mathbb{R}$ by $g(x) = f(x) - Lx$, and let $\xi \in [a, b]$ be such that
$$g(\xi) = \min\{ g(x) : x \in [a, b] \}$$
(a) Why can we be sure that $\xi$ exists? If $\xi \in (a, b)$, explain why $f'(\xi) = L$.
(b) Now assume WLOG that $f'(a) < f'(b)$. Prove that $g'(a) < 0 < g'(b)$. By considering
$\lim_{x \to a^+} \frac{g(x) - g(a)}{x - a}$, show that $\exists x > a$ for which $g(x) < g(a)$. Hence complete the proof.
8. Suppose $f'$ exists on $(a, b)$, and is continuous except for a discontinuity at $c \in (a, b)$.
(a) Obtain a contradiction if $\lim_{x \to c^+} f'(x) = L < f'(c)$. Hence argue that $f'$ cannot have a
removable or a jump discontinuity at $x = c$.
(Hint: let $\epsilon = \frac{f'(c) - L}{2}$ in the definition of limit then apply IVT for derivatives)
(b) Similarly, obtain a contradiction if $\lim_{x \to c^+} f'(x) = \infty$ and conclude that $f'$ cannot have an
infinite discontinuity at $x = c$.
(c) It remains to see that $f'$ can have an essential discontinuity. Recall (Exercise 28.7) that
$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} x^2 \sin(1/x) & x \ne 0 \\ 0 & x = 0 \end{cases}$$
is differentiable on $\mathbb{R}$, but has discontinuous derivative at $x = 0$.
i. By considering $x_n = \frac{1}{2n\pi}$ and $y_n = \frac{1}{(2n+1)\pi}$, show that $f'$ has an essential discontinuity
at $x = 0$.
ii. Prove that if $s_n \to 0$ and $f'(s_n)$ converges to some $M$, then $M \in [-1, 1]$.
iii. Use IVT for derivatives to show that for any $L \in [-1, 1]$, $\exists (t_n) \subseteq \mathbb{R} \setminus \{0\}$ such that
$\lim_{n \to \infty} f'(t_n) = L$.
30 L’Hôpital’s Rule
We are often forced to consider limits known as indeterminate forms, which do not yield easily to the standard limit laws. For example, it is tempting to try to write
\[ \lim_{x \to 0} \frac{\sin 2x}{e^{3x} - 1} = \frac{\lim_{x \to 0} \sin 2x}{\lim_{x \to 0} (e^{3x} - 1)} = \frac{0}{0} \tag{$*$} \]
This is an incorrect application of the limit laws since the resulting quotient has no meaning.
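Definition 3.22. An indeterminate form is a limit where a naïve application of the limit laws results in a meaningless expression: the primary types are
\[ \frac{0}{0}, \quad \frac{\infty}{\infty}, \quad \infty - \infty, \quad 0 \cdot \infty, \quad 0^0, \quad 0^\infty, \quad \text{and} \quad 1^\infty. \]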
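Examples 3.23. 1. $\lim_{x \to 7^+} (x - 7)^{\frac{1}{x - 7}}$ is an indeterminate form of type $0^\infty$.
2. The above indeterminate form $(*)$ may be evaluated using the definition of the derivative
\[ \lim_{x \to 0} \frac{\sin 2x}{e^{3x} - 1} = \lim_{x \to 0} \frac{\sin 2x - 0}{x - 0} \cdot \frac{x - 0}{e^{3x} - 1} = \frac{\frac{d}{dx}\big|_{x=0} \sin 2x}{\frac{d}{dx}\big|_{x=0} (e^{3x} - 1)} = \frac{2}{3} \]
By considering $\lim_{x \to 0} \frac{3a \sin 2x}{2(e^{3x} - 1)}$, we see that an indeterminate form of type $\frac{0}{0}$ can take any value $a$!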
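The claim in the second example can be checked numerically. The following is only a sketch, not part of the notes: the function name `quotient` and the sample points are our own choices, and `math.expm1` is used to evaluate $e^{3x} - 1$ accurately near zero.

```python
import math

def quotient(x, a):
    """Evaluate 3a*sin(2x) / (2*(e^{3x} - 1)) for x != 0.

    Taking a = 2/3 recovers sin(2x)/(e^{3x} - 1) from the example.
    """
    return 3 * a * math.sin(2 * x) / (2 * math.expm1(3 * x))

# As x shrinks, the quotient should approach the chosen value a.
for x in (1e-1, 1e-3, 1e-6):
    print(x, quotient(x, 2 / 3), quotient(x, 5.0))
```

The printed columns should drift towards $\frac{2}{3}$ and $5$ respectively, in line with the assertion that a form of type $\frac{0}{0}$ can take any value.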
This approach generalizes: if $f(a) = 0 = g(a)$, we obtain the simplest version of l’Hôpital’s rule:
\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \cdot \frac{x - a}{g(x) - g(a)} = \frac{f'(a)}{g'(a)} \]
This obviously isn’t rigorous. Our goal is to make it so and to extend to the following situations:
• Limits where $a = \pm\infty$.
• When the RHS cannot be cleanly evaluated: for instance $g'(a) = 0$ or if the original limit is $\pm\infty$.
Covering all cases makes the proof an absolute behemoth! Because of this, and because such limits
can often be evaluated more instructively using elementary methods, the rule is often discouraged
in Freshman calculus. To prepare for the upcoming monster, we first generalize the MVT.
Lemma 3.24 (Extended Mean Value Theorem). Fix $a < b$, suppose $f, g$ are continuous on $[a, b]$ and differentiable on $(a, b)$. Then there exists $\xi \in (a, b)$ such that
\[ \big(f(b) - f(a)\big) g'(\xi) = \big(g(b) - g(a)\big) f'(\xi) \]
Proof. Simply apply the standard mean value theorem (really Rolle’s Theorem) to
\[ h(t) = \big(f(b) - f(a)\big) g(t) - \big(g(b) - g(a)\big) f(t) \]
which satisfies $h(a) = h(b)$.
Theorem 3.25 (l’Hôpital’s rule). Let $a \in \mathbb{R} \cup \{\pm\infty\}$ and suppose functions $f$ and $g$ satisfy:
1. $\lim_{x \to a} \frac{f'(x)}{g'(x)} = L$ for some $L \in \mathbb{R} \cup \{\pm\infty\}$
2. (a) $\lim_{x \to a} f(x) = \lim_{x \to a} g(x) = 0$, or (b) $\lim_{x \to a} g(x) = \infty$ (no condition on $f$)
Then $\lim_{x \to a} \frac{f(x)}{g(x)} = L$. The same result holds for one-sided limits.
Examples 3.26. 1. If $f(x) = e^{4x}$ and $g(x) = 21x - 17$, then
\[ \lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} \frac{4e^{4x}}{21} = \infty \implies \lim_{x \to \infty} \frac{e^{4x}}{21x - 17} = \infty \]
This is an example of type $\frac{\infty}{\infty}$.
2. For an example of type $\frac{0}{0}$, consider $f(x) = x^2 - 9$ and $g(x) = \ln(4 - x)$:
\[ \lim_{x \to 3} \frac{f'(x)}{g'(x)} = \lim_{x \to 3} \frac{2x}{-1/(4 - x)} = \lim_{x \to 3} 2x(x - 4) = -6 \implies \lim_{x \to 3} \frac{x^2 - 9}{\ln(4 - x)} = -6 \]
3. One can apply the rule repeatedly: for example
\[ \lim_{x \to 0} \frac{e^{4x} - 1 - 4x}{x^2} = \lim_{x \to 0} \frac{4e^{4x} - 4}{2x} = \lim_{x \to 0} \frac{16e^{4x}}{2} = 8 \]
There is an abuse of protocol here, since the existence of the first limit is dependent on the last. The approach is acceptable, though you should understand why it is an abuse. Indeed. . .
4. It is important that the limit $\lim \frac{f'}{g'}$ be seen to exist before applying l’Hôpital’s rule! Consider $f(x) = x + \cos x$ and $g(x) = x$: certainly $\lim_{x \to \infty} \frac{f(x)}{g(x)}$ has type $\frac{\infty}{\infty}$, however
\[ \lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} (1 - \sin x) \]
does not exist! In this case the rule is unnecessary, since
\[ \frac{f(x)}{g(x)} = 1 + \frac{\cos x}{x} \xrightarrow[x \to \infty]{} 1 \]
by the squeeze theorem.
5. Finally, a short example to explain why l’Hôpital’s rule is often prohibited in Freshman calculus. Consider the calculation:
\[ \lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{\cos x}{1} = 1 \]
This appears to be a legitimate application of the rule. However, recall (Exercise 28.9) that one purpose of this limit is to demonstrate that $\frac{d}{dx} \sin x = \cos x$; to use this fact to calculate the limit on which it depends is the very definition of circular logic!
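The fourth example is easy to explore numerically. The sketch below is illustrative only; the helper names `ratio` and `derivative_ratio` and the sample points are our own. It evaluates $\frac{f(x)}{g(x)}$ at large $x$ and samples $\frac{f'(x)}{g'(x)} = 1 - \sin x$ along two sequences tending to infinity.

```python
import math

def ratio(x):
    """f(x)/g(x) for f(x) = x + cos x and g(x) = x."""
    return (x + math.cos(x)) / x

def derivative_ratio(x):
    """f'(x)/g'(x) = 1 - sin x."""
    return 1 - math.sin(x)

# The original quotient settles down to 1 ...
ratios = [ratio(10.0 ** k) for k in range(1, 7)]

# ... while the quotient of derivatives keeps returning to 0 and to 2.
lows = [derivative_ratio(math.pi / 2 + 2 * math.pi * n) for n in (10, 100, 1000)]
highs = [derivative_ratio(3 * math.pi / 2 + 2 * math.pi * n) for n in (10, 100, 1000)]

print(ratios)
print(lows, highs)
```

Since the sampled derivative quotients cluster at two different values, no limit can exist, so hypothesis 1 of Theorem 3.25 fails even though the original limit is 1.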
Other Indeterminate Forms
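The remaining indeterminate forms listed in Definition 3.22 may be modified so that l’Hôpital’s rule applies. Since you’ve likely seen several such examples in elementary calculus, we give just a couple.
Examples 3.27. 1. An indeterminate form of type $\infty - \infty$ is transformed to one of type $\frac{0}{0}$ before applying the rule (twice):
\begin{align*}
\lim_{x \to 0^+} \left( \frac{1}{e^x - 1} - \frac{1}{x} \right) &= \lim_{x \to 0^+} \frac{x + 1 - e^x}{x(e^x - 1)} && \text{(type $\tfrac{0}{0}$)} \\
&= \lim_{x \to 0^+} \frac{1 - e^x}{e^x - 1 + xe^x} && \text{(still type $\tfrac{0}{0}$)} \\
&= \lim_{x \to 0^+} \frac{-e^x}{2e^x + xe^x} = -\frac{1}{2}
\end{align*}
2. For an indeterminate form of type $1^\infty$, we use the log laws & the continuity of the exponential:
\begin{align*}
\lim_{x \to 0^+} (1 + \sin x)^{1/x} &= \exp\left( \lim_{x \to 0^+} \frac{1}{x} \ln(1 + \sin x) \right) && \text{(type $\tfrac{0}{0}$)} \\
&= \exp\left( \lim_{x \to 0^+} \frac{\cos x}{1 + \sin x} \right) = e^1 = e
\end{align*}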
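As a sanity check on both computations, one can simply evaluate the expressions at small positive $x$. This is a rough numerical sketch under our own naming (`difference`, `power`); it is no substitute for the argument above, and floating-point cancellation limits how small $x$ can usefully be taken.

```python
import math

def difference(x):
    """1/(e^x - 1) - 1/x, an 'infinity minus infinity' form as x -> 0+."""
    return 1 / math.expm1(x) - 1 / x

def power(x):
    """(1 + sin x)^(1/x), a '1 to the infinity' form as x -> 0+."""
    return (1 + math.sin(x)) ** (1 / x)

for x in (1e-1, 1e-2, 1e-4):
    print(x, difference(x), power(x))
```

The first column of values should approach $-\frac{1}{2}$ and the second should approach $e$.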
Proving l’Hôpital’s Rule
The complete argument is very long; if you do nothing else, read the following proof of the simplest case. Everything else is a modification.
Proof (type $\frac{0}{0}$ with right limits). We prove first for right-limits $x \to a^+$. First observe that condition 1 forces the existence of an interval $(a, b)$ on which $f, g$ are differentiable and $g'(x) \neq 0$.
Assume we have a form of type $\frac{0}{0}$ (case 2(a)) and assume additionally that $a$ and $L$ are finite. Everything follows from the definition of limit (condition 1) and Lemma 3.24:
\[ \text{Given } \epsilon > 0, \ \exists \delta \in (0, b - a) \text{ such that } a < \xi < a + \delta \implies \left| \frac{f'(\xi)}{g'(\xi)} - L \right| < \frac{\epsilon}{2} \tag{$*$} \]
\[ a < y < x < a + \delta \implies \exists \xi \in (y, x) \text{ such that } \frac{f(x) - f(y)}{g(x) - g(y)} = \frac{f'(\xi)}{g'(\xi)} \tag{$\dagger$} \]
Since $g' \neq 0$, the usual mean value theorem says we never divide by zero in $(\dagger)$:
\[ \exists c \in (y, x) \text{ such that } g(x) - g(y) = g'(c)(x - y) \neq 0 \]
Observe that $\left| \frac{f(x) - f(y)}{g(x) - g(y)} - L \right| = \left| \frac{f'(\xi)}{g'(\xi)} - L \right| < \frac{\epsilon}{2}$, let $y \to a^+$ and use 2(a) to see that
\[ \forall x \in (a, a + \delta), \quad \left| \frac{f(x)}{g(x)} - L \right| \le \frac{\epsilon}{2} < \epsilon \]
which is the required result.
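We now describe some modifications. In the notes’ color scheme, the blue part of $(*)$ is the clause “$\exists \delta \in (0, b - a)$ such that $a < \xi < a + \delta$” and the green parts are “Given $\epsilon > 0$” and the final inequality.
If $a = -\infty$: Replace the blue part of $(*)$ as follows:
\[ \text{Given } \epsilon > 0, \ \exists m \le b \text{ such that } \xi < m \implies \left| \frac{f'(\xi)}{g'(\xi)} - L \right| < \frac{\epsilon}{2} \]
The rest of the proof goes through after replacing $a$ with $-\infty$ and $a + \delta$ with $m$.
If $L = \infty$: Replace the green parts of $(*)$ with “Given $M > 0$” and “$\frac{f'(\xi)}{g'(\xi)} > 2M$.” Fixing the rest of the proof is again straightforward.
If $L = -\infty$: Replace the green parts of $(*)$ with “Given $M > 0$” and “$\frac{f'(\xi)}{g'(\xi)} < -2M$.”
Left-limits: If $f, g$ are differentiable on $(c, a)$, then the blue part may be replaced with either:
($a$ finite) $\exists \delta \in (0, a - c)$ such that $a - \delta < \xi < a$
($a = \infty$) $\exists m \ge c$ such that $\xi > m$
The blue and green parts of $(*)$ can be replaced independently. This completes the proof for all indeterminate forms of type $\frac{0}{0}$.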
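Proof (case 2(b) when $\lim g(x) = \infty$). This requires a little more modification.$^{17}$ Since $g' \neq 0$ and $\lim_{x \to a^+} g(x) = \infty$, Exercise 29.6 says that $g$ is strictly decreasing on $(a, b)$. By replacing $b$ by some $\tilde{b} \in (a, b)$, if necessary, we may assume that
\[ a < y < x < b \implies 0 < g(x) < g(y) \tag{$\ddagger$} \]
Assume $a$ and $L$ are finite, and obtain $(*)$ and $(\dagger)$ as before. Let $x \in (a, a + \delta)$ be fixed and multiply $(\dagger)$ by $\frac{g(y) - g(x)}{g(y)}$ (this is positive by $(\ddagger)$): a little algebra and the triangle inequality tell us that
\begin{align*}
a < y < x &\implies \frac{f(y)}{g(y)} = \frac{f'(\xi)}{g'(\xi)} + \frac{f(x)}{g(y)} - \frac{g(x)}{g(y)} \cdot \frac{f'(\xi)}{g'(\xi)} \\
&\implies \left| \frac{f(y)}{g(y)} - L \right| \le \left| \frac{f'(\xi)}{g'(\xi)} - L \right| + \frac{1}{g(y)} \left( |f(x)| + |g(x)| \left( |L| + \frac{\epsilon}{2} \right) \right)
\end{align*}
Since $\lim_{y \to a^+} g(y) = \infty$ and $x$ is fixed, we see that there exists $\eta \le x - a < \delta$ such that
\[ y \in (a, a + \eta) \implies \frac{1}{g(y)} \left( |f(x)| + |g(x)| \left( |L| + \frac{\epsilon}{2} \right) \right) < \frac{\epsilon}{2} \]
Finally combine with $(*)$: given $\epsilon > 0$, $\exists \eta > 0$ such that $y \in (a, a + \eta) \implies \left| \frac{f(y)}{g(y)} - L \right| < \epsilon$.
The same modifications listed previously complete the proof.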
$^{17}$ Forms of type $\frac{\infty}{\infty}$? Instead of assumption 2(b), why not simply assume $\lim f = \lim g = \infty$ and write $\frac{f}{g} = \frac{1/g}{1/f}$ to obtain a form of type $\frac{0}{0}$? The problem is that the derivative of the ‘new’ denominator $\frac{d}{dx} \frac{1}{f} = -\frac{f'}{f^2}$ need not be non-zero on any interval $(a, b)$ and so condition 1 need not hold. We could modify this, but it would make for a weaker theorem. Example 3.26.4 illustrates this: $f'(x) = 1 - \sin x$ has zeros on any unbounded interval.
After the 2(b) case is proved and we know that $\lim \frac{f}{g} = L$, it is then clear that $\lim f$ must also be infinite (unless $L = 0$, in which case $\lim f$ could be anything and need not exist). This situation therefore really does deal with forms of type $\frac{\infty}{\infty}$.
Exercises 30 1. Evaluate the following limits, if they exist:
(a) $\lim_{x \to 0} \frac{x^3}{\sin x - x}$
(b) $\lim_{x \to \frac{\pi}{2}} \left( \tan x - \frac{2}{\pi - 2x} \right)$
(c) $\lim_{x \to 0} (\cos x)^{1/x^2}$
(d) $\lim_{x \to 0} (1 + 2x)^{1/x}$
(e) $\lim_{x \to \infty} (e^x + x)^{1/x}$
2. Let $f$ be differentiable on $(c, \infty)$ and suppose that $\lim_{x \to \infty} [f(x) + f'(x)] = L$ is finite.
(a) Prove that $\lim_{x \to \infty} f(x) = L$ and that $\lim_{x \to \infty} f'(x) = 0$.
(Hint: write $f(x) = \frac{f(x) e^x}{e^x}$)
(b) Does anything change if $L$ exists and is infinite?
3. If $p_n(x)$ is a polynomial of degree $n$, use induction to prove that $\lim_{x \to \infty} p_n(x) e^{-x} = 0$.
4. Let $f(x) = x + \sin x \cos x$, $g(x) = e^{\sin x} f(x)$ and $h(x) = \frac{2 \cos x}{e^{\sin x} (f(x) + 2 \cos x)}$.
(a) Prove that $\lim_{x \to \infty} f(x) = \infty = \lim_{x \to \infty} g(x)$ but that $\lim_{x \to \infty} \frac{f(x)}{g(x)}$ does not exist.
(b) If $\cos x \neq 0$, and $x$ is large, show that $\frac{f'(x)}{g'(x)} = h(x)$.
(c) Prove that $\lim_{x \to \infty} h(x) = 0$. Explain why this does not contradict part (a)!
31 Taylor’s Theorem
A primary goal of power series is the approximation of functions. As such, there are two natural questions to ask of a given function $f$:
1. Given $c \in \operatorname{dom}(f)$, is there a series $\sum a_n (x - c)^n$ which equals $f(x)$ on an interval containing $c$?
2. If we take the first $n$ terms of such a series, how accurate is this polynomial approximation?
Example 3.28. Recall the geometric series
\[ f(x) = \frac{1}{1 - x} = \sum_{n=0}^\infty x^n \quad \text{whenever } -1 < x < 1 \]
The polynomial approximation
\[ p_n(x) = \sum_{k=0}^n x^k = 1 + x + \cdots + x^n = \frac{1 - x^{n+1}}{1 - x} \]
has error
\[ R_n(x) = f(x) - p_n(x) = \frac{x^{n+1}}{1 - x} \]
[Figure: the polynomial approximation $p_3(x) = 1 + x + x^2 + x^3$ plotted for $-1 \le x \le 1$.]
If $x$ is close to 0, this is likely very small; for instance if $x \in \left[ -\frac{1}{2}, \frac{1}{2} \right]$, then
\[ |R_n(x)| \le \frac{1}{1 - \frac{1}{2}} \left( \frac{1}{2} \right)^{n+1} = 2^{-n} \]
However, when $x$ is close to 1, the error is unbounded!
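The bound $|R_n(x)| \le 2^{-n}$ on $\left[ -\frac{1}{2}, \frac{1}{2} \right]$ can be spot-checked directly. The sketch below is purely illustrative; `partial_sum` and `remainder` are names we introduce here, and the sample points are arbitrary.

```python
def partial_sum(x, n):
    """p_n(x) = 1 + x + ... + x^n."""
    return sum(x ** k for k in range(n + 1))

def remainder(x, n):
    """R_n(x) = 1/(1 - x) - p_n(x), valid for x != 1."""
    return 1 / (1 - x) - partial_sum(x, n)

# Inside [-1/2, 1/2] the error is at most 2^{-n} (up to rounding).
for x in (-0.5, -0.25, 0.0, 0.3, 0.5):
    for n in range(10):
        assert abs(remainder(x, n)) <= 2.0 ** (-n) + 1e-12

# Close to x = 1 the same approximation is very poor.
print(remainder(0.999, 3))
```

The final line prints a remainder of several hundred, illustrating how the error blows up as $x \to 1^-$.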
The behavior in the Example occurs in general: the truncated polynomial approximations are better
near the center of the series. To see this, we first need to consider higher-order derivatives.
Definition 3.29. We write $f''$ for the second derivative of $f$, namely the derivative of its derivative
\[ f''(a) = \lim_{x \to a} \frac{f'(x) - f'(a)}{x - a} \]
The existence of $f''(a)$ presupposes that $f'$ exists on an (open) interval containing $a$. We can similarly consider third, fourth, and higher-order derivatives. As a function, the $n$th derivative is written
\[ f^{(n)}(x) = \frac{d^n f}{dx^n} \]
By convention, the zeroth derivative is the function itself $f^{(0)}(x) = f(x)$. We say that $f$ is $n$ times differentiable at $a$ if $f^{(n)}(a)$ exists, and infinitely differentiable (or smooth) if derivatives of all orders exist.
Example 3.30. $f(x) = x^2 |x|$ is twice differentiable, with $f''(x) = 6|x|$. It is smooth everywhere except at $x = 0$, where third (and higher-order) derivatives do not exist.
Definition 3.31. Suppose $f$ is $n$ times differentiable at $x = c$. The $n$th Taylor polynomial $p_n$ of $f$ centered at $c$ is
\[ p_n(x) := \sum_{k=0}^n \frac{f^{(k)}(c)}{k!} (x - c)^k = f(c) + f'(c)(x - c) + \frac{f''(c)}{2} (x - c)^2 + \cdots + \frac{f^{(n)}(c)}{n!} (x - c)^n \]
The remainder $R_n(x)$ is the error in the polynomial approximation
\[ R_n(x) = f(x) - p_n(x) = f(x) - \sum_{k=0}^n \frac{f^{(k)}(c)}{k!} (x - c)^k \]
If $f$ is infinitely differentiable at $x = c$, then its Taylor series centered at $x = c$ is the power series
\[ Tf(x) = \sum_{n=0}^\infty \frac{f^{(n)}(c)}{n!} (x - c)^n \]
When $c = 0$ this is known as a Maclaurin series.$^{18}$
For simplicity we’ll most often work with Maclaurin series, with general cases hopefully being clear.
Examples 3.32. 1. If $f(x) = e^{3x}$, then $f^{(n)}(x) = 3^n e^{3x}$, from which the Maclaurin series is
\[ Tf(x) = \sum_{n=0}^\infty \frac{3^n}{n!} x^n \]
2. If $g(x) = \sin 7x$, then the sequence of derivatives is
\[ 7 \cos 7x, \ -7^2 \sin 7x, \ -7^3 \cos 7x, \ 7^4 \sin 7x, \ 7^5 \cos 7x, \ -7^6 \sin 7x, \ \ldots \]
At $x = 0$, every even derivative is zero, while the odd derivatives alternate in sign; the Maclaurin series is easily seen to be
\[ Tg(x) = \sum_{n=0}^\infty \frac{(-1)^n 7^{2n+1}}{(2n + 1)!} x^{2n+1} \]
3. If $h(x) = \sqrt{x}$, then $h'(x) = \frac{1}{2} x^{-1/2}$, $h''(x) = -\frac{1}{2^2} x^{-3/2}$, and $h'''(x) = \frac{3}{2^3} x^{-5/2}$, from which the third Taylor polynomial centered at $c = 1$ is
\begin{align*}
p_3(x) &= h(1) + h'(1)(x - 1) + \frac{h''(1)}{2} (x - 1)^2 + \frac{h'''(1)}{6} (x - 1)^3 \\
&= 1 + \frac{1}{2} (x - 1) - \frac{1}{8} (x - 1)^2 + \frac{1}{16} (x - 1)^3
\end{align*}
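To get a feel for how well the cubic in the third example tracks $\sqrt{x}$, one can compare values near and far from the center $c = 1$. This is a small illustrative sketch; the function name `p3` and the test points are ours.

```python
import math

def p3(x):
    """Third Taylor polynomial of sqrt(x) centered at c = 1."""
    u = x - 1
    return 1 + u / 2 - u ** 2 / 8 + u ** 3 / 16

# Error is tiny near the center and grows as x moves away from it.
for x in (1.0, 1.1, 1.5, 2.0):
    print(x, p3(x), math.sqrt(x), abs(p3(x) - math.sqrt(x)))
```

The errors printed in the last column increase with $|x - 1|$, consistent with the remark that truncated approximations are better near the center of the series.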
Rather than compute more examples, we develop a little theory that makes verifying Taylor series
much easier.
$^{18}$ Named for Englishman Brook Taylor (1685–1731) and Scotsman Colin Maclaurin (1698–1746). Taylor’s general method expanded on examples discovered by James Gregory and Isaac Newton in the mid-to-late 1600s.
Differentiation of Taylor Polynomials and Series
Suppose $P(x) = \sum a_j x^j$ is a power series with radius of convergence $R > 0$. As we discovered previously, this is differentiable term-by-term on $(-R, R)$. Indeed
\begin{align*}
P'(x) &= \sum_{j=1}^\infty a_j j x^{j-1} &&\implies P'(0) = a_1 \\
P''(x) &= \sum_{j=2}^\infty a_j j(j - 1) x^{j-2} &&\implies P''(0) = 2a_2 \\
P'''(x) &= \sum_{j=3}^\infty a_j j(j - 1)(j - 2) x^{j-3} &&\implies P'''(0) = 3! \, a_3 \\
&\ \ \vdots \\
P^{(k)}(x) &= \sum_{j=k}^\infty a_j j(j - 1) \cdots (j - k + 1) x^{j-k} = \sum_{j=k}^\infty \frac{j! \, a_j}{(j - k)!} x^{j-k} &&\implies P^{(k)}(0) = k! \, a_k
\end{align*}
Otherwise said, $P$ is its own Maclaurin series! The same discussion holds for polynomials: indeed if $P(x) = a_0 + a_1 x + \cdots + a_n x^n$ is a polynomial, then for all $k \le n$,
\[ P^{(k)}(0) = f^{(k)}(0) \iff a_k = \frac{f^{(k)}(0)}{k!} \]
If this holds for all $k \le n$, then $P$ must be the Taylor polynomial of $f$! With a little modification, we’ve proved the following:
Theorem 3.33. 1. If $f(x) = \sum_{n=0}^\infty a_n (x - c)^n$ on a neighborhood of $c$, then $\sum_{n=0}^\infty a_n (x - c)^n$ is the Taylor series of $f$.
2. The $n$th Taylor polynomial of $f$ centered at $x = c$ is the unique polynomial $p_n$ of degree $\le n$ whose value and first $n$ derivatives agree with those of $f$ at $x = c$: that is
\[ \forall k \le n, \quad p_n^{(k)}(c) = f^{(k)}(c) \]
This answers our first motivating question: a function can equal at most one power series with a given center. The second question requires a careful study of the remainder: we’ll do this shortly.
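Examples 3.34 (Common Maclaurin Series). These should be familiar from elementary calculus. Each of these functions equals the given series by our previous discussion of power series: by the Theorem, each series is therefore the Maclaurin series of the given function with no requirement to calculate directly!
\begin{align*}
e^x &= \sum_{n=0}^\infty \frac{x^n}{n!}, & x &\in \mathbb{R} &
\frac{1}{1 - x} &= \sum_{n=0}^\infty x^n, & x &\in (-1, 1) \\
\sin x &= \sum_{n=0}^\infty \frac{(-1)^n}{(2n + 1)!} x^{2n+1}, & x &\in \mathbb{R} &
\ln(1 + x) &= \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} x^n, & x &\in (-1, 1] \\
\cos x &= \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n}, & x &\in \mathbb{R} &
\tan^{-1} x &= \sum_{n=0}^\infty \frac{(-1)^n}{2n + 1} x^{2n+1}, & x &\in [-1, 1]
\end{align*}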
Examples 3.35 (Modifying Maclaurin Series). By substituting for $x$ in a common series, we quickly obtain new ones.
1. Substitute $x \mapsto 7x$ in the Maclaurin series for $\sin x$, to recover our earlier example
\[ \sin 7x = \sum_{n=0}^\infty \frac{(-1)^n 7^{2n+1}}{(2n + 1)!} x^{2n+1}, \quad x \in \mathbb{R} \]
Note how this requires almost no calculation: since the function equals a series, the Theorem says we have the Maclaurin series for $\sin 7x$!
2. Substitute $x \mapsto x^2$ in the Maclaurin series for $e^x$ to obtain
\[ e^{x^2} = \exp(x^2) = \sum_{n=0}^\infty \frac{1}{n!} x^{2n}, \quad x \in \mathbb{R} \]
This would be disgusting to verify directly, given the difficulty of repeatedly differentiating $e^{x^2}$.
3. We find the Taylor series for $f(x) = \frac{1}{5 - x}$ centered at $x = 2$:
\[ f(x) = \frac{1}{3 + 2 - x} = \frac{1}{3 \left( 1 - \frac{x - 2}{3} \right)} = \frac{1}{3} \sum_{n=0}^\infty \left( \frac{x - 2}{3} \right)^n \]
which is valid whenever $-1 < \frac{x - 2}{3} < 1 \iff -1 < x < 5$.
4. Fix $c \in \mathbb{R}$ and observe that, for all $x \in \mathbb{R}$,
\[ e^x = e^{c + x - c} = e^c e^{x - c} = \sum_{n=0}^\infty \frac{e^c}{n!} (x - c)^n \]
We conclude that the series is the Taylor series of $e^x$ centered at $x = c$. Of course this is easily verified using the definition, since $\frac{d^n}{dx^n}\big|_{x=c} e^x = e^c$.
5. Combining the Theorem with the multiple-angle formula, we obtain the Taylor series for $\sin x$ centered at $x = c$:
\begin{align*}
\sin x &= \sin(c + x - c) = \sin c \cos(x - c) + \cos c \sin(x - c) \\
&= \sum_{n=0}^\infty \frac{(-1)^n \sin c}{(2n)!} (x - c)^{2n} + \sum_{n=0}^\infty \frac{(-1)^n \cos c}{(2n + 1)!} (x - c)^{2n+1}
\end{align*}
Definition 3.36. A function is analytic on a domain if for each c there exists a neighborhood of c on
which the function equals its Taylor series centered at c.
All the examples we’ve so far seen are analytic on their domains; indeed the last two of Examples
3.35 prove this for the exponential and sine functions. Every analytic function is automatically smooth
(infinitely differentiable), however the converse is false in that not every smooth function is analytic
(see Exercise 10). Analyticity is of greater importance in complex analysis where it is seen to be
equivalent to complex-differentiability.
Accuracy of Taylor Approximations
Our final goal is to estimate the accuracy of a Taylor polynomial as an approximation to its generating function. Otherwise said, we want to estimate the size of the remainder $R_n(x) = f(x) - p_n(x)$.
Theorem 3.37 (Taylor’s Theorem: Lagrange’s form). Suppose $f$ is $n + 1$ times differentiable on an open interval $I$ containing $c$ and let $x \in I \setminus \{c\}$. Then there exists some $\xi$ between $c$ and $x$ for which the remainder centered at $c$ satisfies
\[ R_n(x) = \frac{f^{(n+1)}(\xi)}{(n + 1)!} (x - c)^{n+1} \]
Proof. For simplicity let $c = 0$. Fix $x \neq 0$, define a constant $M_x$ and a function $g : I \to \mathbb{R}$ by
\[ R_n(x) = \frac{M_x}{(n + 1)!} x^{n+1} \quad \text{and} \quad g(t) = \frac{M_x}{(n + 1)!} t^{n+1} + p_n(t) - f(t) = \frac{M_x}{(n + 1)!} t^{n+1} - R_n(t) \]
Observe that
\begin{align*}
k \le n + 1 &\implies g^{(k)}(t) = \frac{M_x}{(n + 1 - k)!} t^{n+1-k} + p_n^{(k)}(t) - f^{(k)}(t) \tag{$*$} \\
&\implies g^{(k)}(0) = p_n^{(k)}(0) - f^{(k)}(0) = 0 \text{ if } k \le n
\end{align*}
where we invoked Theorem 3.33. Note also that $g(x) = 0 = g(0)$ by the definition of $M_x$.
Apply Rolle’s Theorem repeatedly (WLOG assume $x > 0$):
$\exists \xi_1$ between 0 and $x$ such that $g'(\xi_1) = 0$.
$\exists \xi_2$ between 0 and $\xi_1$ such that $g''(\xi_2) = 0$, etc.
Iterate to obtain a sequence $(\xi_k)$ such that
\[ 0 < \xi_{n+1} < \xi_n < \cdots < \xi_1 < x \quad \text{and} \quad g^{(k)}(\xi_k) = 0 \]
Take $\xi = \xi_{n+1}$ and consider $(*)$: since $\deg p_n \le n$, we see that
\[ 0 = g^{(n+1)}(\xi) = M_x - f^{(n+1)}(\xi) \implies R_n(x) = f(x) - p_n(x) = \frac{f^{(n+1)}(\xi)}{(n + 1)!} x^{n+1} \]
Corollary 3.38. Suppose $f$ is smooth on an open interval $I$ containing $c$ and that the derivatives $f^{(n)}$ of all orders are bounded on $I$ by a common constant. Then $f$ equals its Taylor series (centered at $c$) on $I$.
Proof. For simplicity, let $c = 0$. Suppose $\left| f^{(n+1)}(\xi) \right| \le K$ for all $\xi \in I$ and all $n$. Choose any integer $N > |x|$ and observe that
\[ n > N \implies |R_n(x)| \le \frac{K |x|^{n+1}}{(n + 1)!} = \frac{K |x|^{n+1}}{N! \, (N + 1) \cdots (n + 1)} \le \frac{K |x|^N}{N!} \left( \frac{|x|}{N} \right)^{n+1-N} \xrightarrow[n \to \infty]{} 0 \]
Examples 3.39. 1. The functions sine and cosine have derivatives bounded by 1 on $\mathbb{R}$, and thus both functions equal their Maclaurin series on $\mathbb{R}$. This removes the need to have previously justified these facts using the theory of differential equations.
2. The exponential function does not have bounded derivatives, however we can still apply Taylor’s Theorem. For any fixed $x$, $\exists \xi$ between 0 and $x$ such that
\[ |R_n(x)| = \left| \frac{e^\xi}{(n + 1)!} x^{n+1} \right| \xrightarrow[n \to \infty]{} 0 \]
by the same argument as in the Corollary. Thus $e^x$ equals its Maclaurin series on the real line.
3. Extending Example 3.32.3, we see that the function $h(x) = \sqrt{x}$ has the following linear approximation (1st Taylor polynomial) centered at $c = 9$
\[ p_1(x) = h(9) + h'(9)(x - 9) = 3 + \frac{1}{6} (x - 9) \]
This yields the simple approximation
\[ \sqrt{10} \approx p_1(10) = 3 + \frac{1}{6} = \frac{19}{6} \]
Taylor’s Theorem can be used to estimate its accuracy (remember to shift the center to 9!):
\[ R_1(10) = \frac{h''(\xi)}{2!} (10 - 9)^2 = -\frac{1}{2^2 \cdot 2!} \xi^{-3/2} = -\frac{1}{8 \xi^{3/2}} \quad \text{for some } \xi \in (9, 10) \]
Certainly $\xi^{-3/2} < 9^{-3/2} = \frac{1}{27}$, whence
\[ -\frac{1}{216} < R_1(10) < 0 \implies \frac{19}{6} - \frac{1}{216} = \frac{683}{216} < \sqrt{10} < \frac{684}{216} = \frac{19}{6} \]
$\frac{19}{6}$ is therefore an overestimate for $\sqrt{10}$, but is accurate to within $\frac{1}{216} < 0.005$.
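The arithmetic in the third example can be confirmed with exact rational numbers. The sketch below uses Python’s standard `fractions` module; the variable names are our own and the check works by squaring, so no floating-point square root is needed.

```python
from fractions import Fraction

# Linear Taylor estimate p_1(10) = 3 + 1/6 and the Lagrange error bound 1/216.
estimate = Fraction(3) + Fraction(1, 6)
error_bound = Fraction(1, 216)
lower = estimate - error_bound

# Squaring both bounds avoids any floating-point square root:
# lower < sqrt(10) < estimate  is equivalent to  lower^2 < 10 < estimate^2.
print(lower, estimate)
print(lower ** 2 < 10 < estimate ** 2)
```

Both bounds are positive, so comparing squares is a legitimate way to verify the sandwich $\frac{683}{216} < \sqrt{10} < \frac{19}{6}$.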
Alternative Versions of Taylor’s Theorem
The two other common expressions for the remainder are typically less easy to use than Lagrange’s form, but can sometimes provide sharper estimates for the remainder, particularly when $x$ is far from the center of the series.
Corollary 3.40. Suppose $f^{(n+1)}$ is continuous on an open interval $I$ containing $c$, let $x \in I \setminus \{c\}$, and let $R_n(x) = f(x) - p_n(x)$ be the remainder for the Taylor polynomial centered at $c$. Then:
1. (Integral Remainder) $R_n(x) = \int_c^x \frac{(x - t)^n}{n!} f^{(n+1)}(t) \, dt$
2. (Cauchy’s Form) $\exists \xi$ between $c$ and $x$ such that $R_n(x) = \frac{(x - \xi)^n}{n!} (x - c) f^{(n+1)}(\xi)$
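Using these expressions it is possible to explicitly prove Newton’s binomial series formula:
Theorem 3.41. If $\alpha \in \mathbb{R}$ and $|x| < 1$, then
\begin{align*}
(1 + x)^\alpha &= 1 + \sum_{n=1}^\infty \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!} x^n \\
&= 1 + \alpha x + \frac{\alpha(\alpha - 1)}{2!} x^2 + \frac{\alpha(\alpha - 1)(\alpha - 2)}{3!} x^3 + \frac{\alpha(\alpha - 1)(\alpha - 2)(\alpha - 3)}{4!} x^4 + \cdots
\end{align*}
If $\alpha \in \mathbb{N}_0$, this is the usual binomial theorem. Otherwise it is more interesting, for instance,
\begin{align*}
\sqrt{1 + x} = (1 + x)^{1/2} &= 1 + \frac{1}{2} x - \frac{1}{8} x^2 + \frac{1}{16} x^3 - \frac{5}{128} x^4 + \cdots \\
\frac{1}{(1 + x)^3} &= 1 - 3x + 6x^2 - 10x^3 + 15x^4 - \cdots
\end{align*}
Of course this last could easily be obtained from $\frac{1}{1 + x} = \sum (-1)^n x^n$ by differentiating twice!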
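The coefficients in the two displayed expansions can be generated mechanically from the recurrence $a_n = a_{n-1} \cdot \frac{\alpha - n + 1}{n}$, which follows from the formula in the Theorem. The helper below is a hypothetical sketch (the name `binomial_coefficients` is ours) using exact rational arithmetic.

```python
from fractions import Fraction

def binomial_coefficients(alpha, count):
    """First `count` coefficients of the binomial series for (1 + x)^alpha.

    Uses a_0 = 1 and a_n = a_{n-1} * (alpha - n + 1) / n.
    """
    coeffs = [Fraction(1)]
    for n in range(1, count):
        coeffs.append(coeffs[-1] * (alpha - (n - 1)) / n)
    return coeffs

print(binomial_coefficients(Fraction(1, 2), 5))   # coefficients for sqrt(1 + x)
print(binomial_coefficients(Fraction(-3), 5))     # coefficients for 1/(1 + x)^3
```

Passing a non-negative integer for `alpha` makes the coefficients vanish from some point on, recovering the usual binomial theorem.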
Exercises 31 1. Compute the Maclaurin series for $\cos x$ directly from the definition and use Taylor’s Theorem to indicate why it converges to $\cos x$ for all $x \in \mathbb{R}$.
2. Repeat the previous exercise for $\sinh x = \frac{1}{2} (e^x - e^{-x})$ and $\cosh x = \frac{1}{2} (e^x + e^{-x})$.
3. Find the Maclaurin series for the function $\sin(3x^2)$. How do you know you are correct?
4. Find the Taylor series of $f(x) = x^4 - 3x^2 + 2x - 5$ centered at $x = 2$ and show that $Tf(x) = f(x)$.
5. Find a rational approximation to $\sqrt[3]{9}$ using the first Taylor polynomial for $f(x) = \sqrt[3]{x}$. Now use Taylor’s Theorem to estimate its accuracy.
6. If $c \neq 1$, use the fact that $1 - x = (1 - c) \left( 1 - \frac{x - c}{1 - c} \right)$ to obtain the Taylor series of $\frac{1}{1 - x}$ centered at $c$. Hence conclude that $\frac{1}{1 - x}$ is analytic on its domain $\mathbb{R} \setminus \{1\}$.
7. We use Taylor’s Theorem to prove that the Maclaurin series $\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} x^n$ converges to $\ln(1 + x)$ whenever $0 < x \le 1$.
(a) Explicitly compute $\frac{d^{n+1}}{dx^{n+1}} \ln(1 + x)$.
(b) Suppose $0 < x \le 1$. Using Taylor’s Theorem, prove that $\lim_{n \to \infty} R_n(x) = 0$.
(If $-1 < x < 0$, the argument is tougher, being similar to Exercise 11)
8. Why can’t we use Taylor’s Theorem to approximate the error in $\frac{1}{1 - x} = 1 + x + R_1(x)$ when $x \ge 1$? Try it when $x = 2$, what happens? What about when $x = -2$?
9. Prove Taylor’s Theorem with integral remainder when $c = 0$ by using the following as an induction step: for each $n \in \mathbb{N}$, define
\[ A_n(x) = \int_0^x \frac{(x - t)^n}{n!} f^{(n+1)}(t) \, dt \]
and use integration by parts to prove that $A_{n+1} = A_n - \frac{x^{n+1}}{(n + 1)!} f^{(n+1)}(0)$.
(The Cauchy form follows from the intermediate value theorem for integrals which we’ll see later)
10. Consider the function
\[ f(x) = \begin{cases} e^{-1/x} & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases} \]
(a) Prove by induction that there exists a degree $2n$ polynomial $q_n$ for which
\[ f^{(n)}(x) = q_n\!\left( \frac{1}{x} \right) e^{-1/x} \quad \text{whenever } x > 0 \]
(b) Prove that $f$ is infinitely differentiable at $x = 0$ with $f^{(n)}(0) = 0$ (use Exercise 30.3).
The Maclaurin series of $f$ is identically zero! Moreover, $f$ is smooth (infinitely differentiable) on $\mathbb{R}$ but non-analytic at zero since it does not equal its Taylor series on any open interval containing zero.
A modification allows us to create bump functions, which find wide use in analysis. If $a < b$, define
\[ g_{a,b} : x \mapsto f(x - a) f(b - x) \]
This is smooth on $\mathbb{R}$ but non-zero only on the interval $(a, b)$. A further modification involving two such functions $g_{a,b}$ creates a smooth function on $\mathbb{R}$ which satisfies
\[ h_{a,b,\epsilon}(x) = \begin{cases} 0 & \text{if } x \le a - \epsilon \text{ or } x \ge b + \epsilon \\ 1 & \text{if } a \le x \le b \end{cases} \]
This ‘switches on’ rapidly from 0 to 1 near $a$ and switches off similarly near $b$. By letting $\epsilon$ be small, we smoothly (but not uniformly) approximate the indicator function on $[a, b]$.
[Figure: graph of $h_{a,b,\epsilon}(x)$, with $a - \epsilon$, $a$, $b$ and $b + \epsilon$ marked on the $x$-axis and heights 0 and 1.]
11. (Hard) We prove the binomial series formula. Let $f(x) = (1 + x)^\alpha$ and $g(x) = 1 + \sum_{n=1}^\infty a_n x^n$ where $a_n = \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!}$. Our goal is to prove that $f = g$ on the interval $(-1, 1)$.
(a) Check that $f^{(n)}(0) = n! \, a_n$ so that $g$ really is the Maclaurin series of $f$.
(b) i. Prove that the radius of convergence of $g$ is 1.
ii. Prove that $\lim_{n \to \infty} n a_n x^n = 0$ whenever $|x| < 1$.
iii. If $|x| < 1$ and $\xi$ lies between 0 and $x$, prove that $\left| \frac{x - \xi}{1 + \xi} \right| \le |x|$.
(Hint: $\xi = tx$ for some $t \in (0, 1)$. . . )
(c) Use Taylor’s Theorem with Cauchy remainder to prove that
\[ |R_n(x)| < (n + 1) |a_{n+1}| |x|^{n+1} (1 + \xi)^{\alpha - 1} \]
Hence conclude that $g = f$ whenever $|x| < 1$.
(d) Here is an alternative argument:
i. Show that $(n + 1) a_{n+1} + n a_n = \alpha a_n$.
ii. Differentiate term-by-term to prove directly that $g$ satisfies the differential equation $(1 + x) g'(x) = \alpha g(x)$. Solve this to show that $g = f$ whenever $|x| < 1$.
4 Integration
The theory of infinite series addresses the problem of summing infinitely many finite quantities. By contrast, integration is the business of summing infinitely many infinitesimal quantities. Mathematicians have attempted to do both for well over 2000 years, and the philosophical objections are just as old.$^{19}$ The development and increased application of calculus from the late 1600s spurred mathematicians to try to put the theory on a firmer footing, though from Newton and Leibniz it took another 150 years before Bernhard Riemann (1856) provided a thorough development of the integral.
32 The Riemann Integral
The basic idea behind Riemann integration is to approximate area using a sequence of rectangles
whose width tends to zero. The following discussion is hopefully familiar.
Example 4.1. Consider $f(x) = x^2$ defined on $[0, 1]$.
For each $n \in \mathbb{N}$, let $\Delta x = \frac{1}{n}$ and define $x_i = i \Delta x$.
Above each subinterval $[x_{i-1}, x_i]$, raise a rectangle of height $f(x_i) = x_i^2$.
The sum of the areas of these rectangles is the Riemann sum with right-endpoints$^{20}$
\[ R_n = \sum_{i=1}^n f(x_i) \Delta x = \sum_{i=1}^n \frac{i^2}{n^3} = \frac{n(n + 1)(2n + 1)}{6n^3} = \frac{1}{3} + \frac{3n + 1}{6n^2} \]
The Riemann sum with left-endpoints is defined similarly:
\[ L_n = \sum_{i=1}^n f(x_{i-1}) \Delta x = \sum_{i=1}^n \frac{(i - 1)^2}{n^3} = \frac{1}{3} - \frac{3n - 1}{6n^2} \]
Since $f$ is increasing, the area $A$ under the curve satisfies
\[ L_n \le A \le R_n \]
and the squeeze theorem allows us to conclude that $A = \frac{1}{3}$.
[Figures: right-endpoint rectangles with $n = 16$, $R_n = 0.365234$; left-endpoint rectangles with $n = 16$, $L_n = 0.302734$.]
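The values for $n = 16$ are easy to reproduce. The following is a minimal sketch, assuming equally spaced partitions of $[0, 1]$ as in the example; the function names `right_sum` and `left_sum` are our own.

```python
def right_sum(f, n):
    """Riemann sum of f on [0, 1] with n equal subintervals and right endpoints."""
    dx = 1 / n
    return sum(f(i * dx) for i in range(1, n + 1)) * dx

def left_sum(f, n):
    """Riemann sum of f on [0, 1] with n equal subintervals and left endpoints."""
    dx = 1 / n
    return sum(f((i - 1) * dx) for i in range(1, n + 1)) * dx

def square(x):
    return x * x

for n in (16, 100, 1000):
    print(n, left_sum(square, n), right_sum(square, n))
```

For $n = 16$ the two sums agree with the closed forms $\frac{1}{3} \mp \frac{3n \mp 1}{6n^2}$, and both columns close in on $\frac{1}{3}$ as $n$ grows.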
The example contains the essential idea, but more flexibility is needed. To get further, we must
properly define the concepts of partition and Riemann sum.
$^{19}$ Two of Zeno’s ancient paradoxes are relevant here: Achilles and the Tortoise concerns a convergent infinite series, while the Arrow Paradox discusses a difficulty with integration by questioning whether time can be considered as a sum of instants. Perhaps the most famous contemporary criticism comes from Bishop George Berkeley, who gave his name to the Californian city and thus the first UC campus: in The Analyst (1734), Berkeley savaged the foundations of calculus, describing the infinitesimal increments required in Newton’s theory of fluxions (derivatives) as merely the “ghosts of departed quantities.”
$^{20}$ Now is a good time to review some identities:
\[ \sum_{i=1}^n i = \frac{1}{2} n(n + 1), \qquad \sum_{i=1}^n i^2 = \frac{1}{6} n(n + 1)(2n + 1), \qquad \sum_{i=1}^n i^3 = \frac{1}{4} n^2 (n + 1)^2 \]
Definition 4.2. A partition $P = \{x_0, \ldots, x_n\}$ of an interval $[a, b]$ is a finite sequence such that
\[ a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b \]
For each $1 \le i \le n$, define $\Delta x_i = x_i - x_{i-1}$. The mesh of the partition is $\operatorname{mesh}(P) := \max \Delta x_i$.
Choose a sample point $x_i^*$ in each subinterval $[x_{i-1}, x_i]$.
If $f : [a, b] \to \mathbb{R}$, the Riemann sum $\sum_{i=1}^n f(x_i^*) \Delta x_i$ computes the area of a family of $n$ rectangles.
[Figure: a Riemann sum for $f(x)$ on a partition $a = x_0 < x_1 < \cdots < x_7 = b$ with sample points $x_1^*, \ldots, x_7^*$.]
In elementary calculus, one typically computes Riemann sums for equally-spaced partitions with left, right or middle sample points. The double freedom of partition & sample points makes applying the definition a challenge, so instead we consider two special families of rectangles.
Definition 4.3. Given a partition $P$ of $[a, b]$ and a bounded function $f$ on $[a, b]$, define
\begin{align*}
M_i &= \sup_{x \in [x_{i-1}, x_i]} f(x) & U(f, P) &= \sum_{i=1}^n M_i \Delta x_i \\
m_i &= \inf_{x \in [x_{i-1}, x_i]} f(x) & L(f, P) &= \sum_{i=1}^n m_i \Delta x_i
\end{align*}
$U(f, P)$ and $L(f, P)$ are the upper and lower Darboux sums for $f$ with respect to $P$. The upper and lower Darboux integrals are
\[ U(f) = \inf U(f, P) \qquad L(f) = \sup L(f, P) \]
where the supremum and infimum are over all partitions.
We say that $f$ is (Riemann) integrable on $[a, b]$ if $U(f) = L(f)$ and denote this value by
\[ \int_a^b f \quad \text{or} \quad \int_a^b f(x) \, dx \]
If the interval is understood or irrelevant, it is common just to say that $f$ is integrable and write $\int f$.
[Figures: the upper Darboux sum $U(f, P)$ and the lower Darboux sum $L(f, P)$ on a partition of $[a, b]$.]
Intuitively, $L(f, P)$ is the sum of the areas of rectangles built on $P$ which just fit under the graph of $f$. It is also the infimum of all Riemann sums on $P$. If $f$ is discontinuous, then $L(f, P)$ need not be a Riemann sum; there might not be suitable sample points!
Examples 4.4. 1. We revisit Example 4.1 in this language.
Given a partition $Q = \{x_0, \ldots, x_n\}$ of $[0, 1]$ and sample points $x_i^* \in [x_{i-1}, x_i]$, we compute the Riemann sum for $f(x) = x^2$
\[ \sum_{i=1}^n f(x_i^*) \Delta x_i = \sum_{i=1}^n (x_i^*)^2 (x_i - x_{i-1}) \]
Since $f$ is increasing, we have $x_{i-1}^2 \le (x_i^*)^2 \le x_i^2$ on each interval, whence
\[ L(f, Q) = \sum_{i=1}^n (x_{i-1})^2 (x_i - x_{i-1}) \le \sum_{i=1}^n (x_i^*)^2 (x_i - x_{i-1}) \le \sum_{i=1}^n (x_i)^2 (x_i - x_{i-1}) = U(f, Q) \]
The lower and upper Darboux sums are therefore the Riemann sums with left and right endpoints respectively.
If we take $Q_n$ to be the partition with subintervals of equal width $\Delta x = \frac{1}{n}$, then
\[ U(f) = \inf_P U(f, P) \le U(f, Q_n) = \sum_{i=1}^n \left( \frac{i}{n} \right)^2 \Delta x = R_n \]
is the right Riemann sum discussed originally. Similarly $L(f) \ge L_n$. Since $L_n$ and $R_n$ both converge to $\frac{1}{3}$ as $n \to \infty$, the squeeze theorem forces
\[ L_n \le L(f) \le U(f) \le R_n \implies L(f) = U(f) = \frac{1}{3} \]
whence $f$ is Riemann integrable on $[0, 1]$ with $\int_0^1 x^2 \, dx = \frac{1}{3}$.
2. Suppose $f(x) = kx + c$ on the interval $[a, b]$ where $k > 0$. Take the evenly spaced partition $P_n$ where $x_i = a + \frac{b - a}{n} i$ with $\Delta x_i = \frac{b - a}{n}$. Since $f$ is increasing, the upper Darboux sum is again the Riemann sum with right-endpoints:
\begin{align*}
U(f, P_n) = R_n &= \sum_{i=1}^n f(x_i) \Delta x = \frac{b - a}{n} \sum_{i=1}^n \left[ \frac{k(b - a)}{n} i + ak + c \right] \\
&= \frac{b - a}{n} \left[ \frac{k(b - a)}{n} \cdot \frac{1}{2} n(n + 1) + (ak + c) n \right] \\
&\xrightarrow[n \to \infty]{} \frac{1}{2} k (b - a)^2 + (b - a)(ak + c) = \frac{k}{2} (b^2 - a^2) + c(b - a)
\end{align*}
[Figure: the upper Darboux sum $U(f, P_n)$ for the line $y = kx + c$ on $[a, b]$, with heights $c$, $ak + c$ and $bk + c$ marked.]
We similarly see that the lower Darboux sum is given by the Riemann sum with left endpoints, and that
\[ L(f, P_n) = L_n = \frac{b - a}{n} \left[ \frac{k(b - a)}{n} \cdot \frac{1}{2} n(n - 1) + (ak + c) n \right] \xrightarrow[n \to \infty]{} \frac{k}{2} (b^2 - a^2) + c(b - a) \]
By the same argument as above, $L_n \le L(f) \le U(f) \le R_n$ and the squeeze theorem show that $f$ is integrable with $\int_a^b f = \frac{k}{2} (b^2 - a^2) + c(b - a)$.
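The squeeze in the second example can be watched numerically. The sketch below is illustrative only: `darboux_sums_increasing` is a hypothetical helper that relies on $f$ being increasing (so the infimum and supremum on each subinterval sit at the left and right endpoints), and the values $k = 2$, $c = 1$, $a = 1$, $b = 3$ are arbitrary choices of ours.

```python
def darboux_sums_increasing(f, a, b, n):
    """Lower and upper Darboux sums of an increasing f on [a, b].

    Uses the evenly spaced partition with n subintervals; for increasing f
    these are the left- and right-endpoint Riemann sums.
    """
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]
    lower = sum(f(xs[i - 1]) for i in range(1, n + 1)) * dx
    upper = sum(f(xs[i]) for i in range(1, n + 1)) * dx
    return lower, upper

k, c, a, b = 2.0, 1.0, 1.0, 3.0

def line(x):
    return k * x + c

exact = k / 2 * (b ** 2 - a ** 2) + c * (b - a)
for n in (10, 100, 1000):
    lower, upper = darboux_sums_increasing(line, a, b, n)
    print(n, lower, exact, upper)
```

The gap between the two sums is $\frac{k(b - a)^2}{n}$, so it shrinks like $\frac{1}{n}$ while the exact value stays trapped in between.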
Following the examples, a few remarks are in order.
Riemann versus Darboux: Definition 4.3 is really that of the Darboux integral. Riemann’s definition is as follows: for $f : [a, b] \to \mathbb{R}$ to be integrable with integral $\int_a^b f$ means
\[ \forall \epsilon > 0, \ \exists \delta \text{ such that } \forall P, x_i^*, \quad \operatorname{mesh}(P) < \delta \implies \left| \sum_{i=1}^n f(x_i^*) \Delta x_i - \int_a^b f \right| < \epsilon \]
It can be shown that this is equivalent to the Darboux integral. We won’t pursue Riemann’s formulation further, except to observe that if a function is integrable and $\operatorname{mesh}(P_n) \to 0$, then $\int_a^b f = \lim_{n \to \infty} \sum_{i=1}^n f(x_i^*) \Delta x_i$: this allows us to approximate integrals using any sample points we choose, which is why right endpoints ($x_i^* = x_i$) are so common in Freshman calculus.
Monotone Functions: Darboux sums are particularly easy to compute for monotone functions. As in the examples, if $f$ is increasing, then each $M_i = f(x_i)$, from which $U(f, P)$ is the Riemann sum with right-endpoints. Similarly, $L(f, P)$ is the Riemann sum with left-endpoints. The roles reverse if $f$ is decreasing.
Area: If $f$ is positive and continuous,$^{21}$ the Riemann integral $\int_a^b f$ serves as a definition for the area under the curve $y = f(x)$. This should make intuitive sense:
1. In the second example where we have a straight line, we obtain the same value for the area by computing directly as the sum of a rectangle and a triangle!
2. If the area under the curve is to make sense, then, for any partition $P$, it plainly satisfies the inequalities
\[ L(f, P) \le \text{Area} \le U(f, P) \]
But these are exactly the same as those satisfied by the integral itself:
\[ L(f, P) \le L(f) = \int_a^b f = U(f) \le U(f, P) \]
In the examples we exhibited a sequence of partitions $(P_n)$ where $U(f, P_n)$ and $L(f, P_n)$ both converged to the same limit. The next results develop some basic properties of partitions and make this process rigorous.
Lemma 4.5. Suppose $f : [a, b] \to \mathbb{R}$ is bounded and suppose $P, Q$ are partitions of $[a, b]$.
1. If $Q$ is a refinement of $P$, that is $P \subseteq Q$, then
\[ L(f, P) \le L(f, Q) \le U(f, Q) \le U(f, P) \]
2. For any partitions $P, Q$, we have $L(f, P) \le U(f, Q)$
3. $L(f) \le U(f)$
$^{21}$ We’ll see later (Theorem 4.16) that every continuous function is integrable.
Proof. 1. We prove inductively. First suppose that $Q = P \cup \{t\}$ contains exactly one additional point $t \in (x_{k-1}, x_k)$. Write
\begin{align*}
m_1 &= \inf\{f(x) : x \in [x_{k-1}, t]\} \\
m_2 &= \inf\{f(x) : x \in [t, x_k]\} \\
m &= \inf\{f(x) : x \in [x_{k-1}, x_k]\} = \min\{m_1, m_2\}
\end{align*}
The Darboux sums $L(f, P)$ and $L(f, Q)$ are identical except for the terms involving $t$; indeed
\begin{align*}
L(f, Q) - L(f, P) &= m_1 (t - x_{k-1}) + m_2 (x_k - t) - m(x_k - x_{k-1}) \\
&= (m_1 - m)(t - x_{k-1}) + (m_2 - m)(x_k - t) \ge 0
\end{align*}
[Figure: inserting the point $t$ into $[x_{k-1}, x_k]$; the rectangles of heights $m_1$ and $m_2$ enclose extra area compared with the single rectangle of height $m$.]
Since partitions are finite sets, by induction we see that $P \subseteq Q \implies L(f, P) \le L(f, Q)$.
The argument for $U(f, Q) \le U(f, P)$ is similar, and the middle inequality is trivial.
2. If $P$ and $Q$ are partitions, then $P \cup Q$ is a refinement of both $P$ and $Q$. By part 1,
\[ L(f, P) \le L(f, P \cup Q) \le U(f, P \cup Q) \le U(f, Q) \tag{$*$} \]
3. We leave this as an exercise.
Theorem 4.6 (Cauchy criterion for integrability). Suppose $f : [a, b] \to \mathbb{R}$ is bounded.
1. $f$ is integrable $\iff$ $\forall \epsilon > 0$, $\exists P$ such that $U(f, P) - L(f, P) < \epsilon$.
2. $f$ is integrable $\iff$ $\exists (P_n)_{n \in \mathbb{N}}$ such that $U(f, P_n) - L(f, P_n) \to 0$. Moreover, in such a case both sequences $U(f, P_n)$ and $L(f, P_n)$ converge to $\int_a^b f$.
We call this a Cauchy criterion since integrability is demonstrated without mention of the integral!
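Proof. 1. ($\Rightarrow$) Suppose $f$ is integrable and let $\epsilon > 0$ be given. Since $\inf_Q U(f, Q) = \int f = \sup_R L(f, R)$, $\exists Q, R$ such that
\[ U(f, Q) - \int f < \frac{\epsilon}{2} \quad\text{and}\quad \int f - L(f, R) < \frac{\epsilon}{2} \]
Let $P = Q \cup R$ and apply ($*$): $L(f, R) \le L(f, P) \le U(f, P) \le U(f, Q)$. But then
\[ U(f, P) - L(f, P) \le U(f, Q) - L(f, R) = U(f, Q) - \int f + \int f - L(f, R) < \epsilon \]
($\Leftarrow$) Let $\epsilon > 0$ be given and choose $P$ such that $U(f, P) - L(f, P) < \epsilon$. For every partition, $L(f, P) \le L(f) \le U(f) \le U(f, P)$. Thus
\[ 0 \le U(f) - L(f) \le U(f, P) - L(f, P) < \epsilon \]
Since this holds for all $\epsilon > 0$, we see that $U(f) = L(f)$.
2. This is an exercise.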
74
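Examples 4.7. 1. The freedom to choose a partition can be very useful. Consider $f(x) = \sqrt{x}$ on the interval $[0, b]$. We choose a partition that evaluates nicely when fed to this function:
\[ P_n = \{x_0, \ldots, x_n\} \text{ where } x_i = \left(\frac{i}{n}\right)^2 b \implies \Delta x_i = x_i - x_{i-1} = \frac{b}{n^2}\bigl(i^2 - (i-1)^2\bigr) = \frac{(2i-1)b}{n^2} \]
Since $f$ is increasing on $[0, b]$, we see that
\[ U(f, P_n) = \sum_{i=1}^n f(x_i)\,\Delta x_i = \sum_{i=1}^n \frac{i}{n}\sqrt{b} \cdot \frac{(2i-1)b}{n^2} = \frac{b^{3/2}}{n^3}\sum_{i=1}^n (2i^2 - i) \]
\[ = \frac{b^{3/2}}{n^3}\left[\frac{1}{3}n(n+1)(2n+1) - \frac{1}{2}n(n+1)\right] \xrightarrow[n \to \infty]{} \frac{2}{3}b^{3/2} \]
Similarly
\[ L(f, P_n) = \sum_{i=1}^n f(x_{i-1})\,\Delta x_i = \sum_{i=1}^n \frac{i-1}{n}\sqrt{b} \cdot \frac{(2i-1)b}{n^2} = \frac{b^{3/2}}{n^3}\sum_{i=1}^n (2i^2 - 3i + 1) \]
\[ = \frac{b^{3/2}}{n^3}\left[\frac{1}{3}n(n+1)(2n+1) - \frac{3}{2}n(n+1) + n\right] \xrightarrow[n \to \infty]{} \frac{2}{3}b^{3/2} \]
Since these limits are equal, we conclude that $f$ is integrable and that $\int_0^b \sqrt{x}\,dx = \frac{2}{3}b^{3/2}$.
[Figure: the upper sum $U(f, P_n)$ and the lower sum $L(f, P_n)$ for $f(x) = \sqrt{x}$ on $[0, b]$.]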
2. We finish this section with the classic example of a non-integrable function. Let $f : [a, b] \to \mathbb{R}$ be the indicator function of the rational numbers,
\[ f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} \]
Since any interval of positive length contains both rational and irrational numbers, we see that
\[ \sup\{f(x) : x \in [x_{i-1}, x_i]\} = 1 \quad\text{and}\quad \inf\{f(x) : x \in [x_{i-1}, x_i]\} = 0 \]
for any partition $P = \{x_0, \ldots, x_n\}$. We conclude that
\[ U(f, P) = \sum_{i=1}^n (x_i - x_{i-1}) = b - a \implies U(f) = b - a \quad\text{and}\quad L(f, P) = 0 \implies L(f) = 0 \]
Since the upper and lower integrals are unequal, $f$ is not Riemann integrable.
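The closed forms in Example 4.7.1 can be sanity-checked by computing the two Darboux sums directly on the partition $x_i = (i/n)^2 b$. This is only an illustrative sketch: the function name `sqrt_darboux` and the choice $b = 4$ are assumptions made here, not part of the notes.

```python
import math

def sqrt_darboux(b, n):
    """Lower and upper Darboux sums of sqrt on [0, b] for x_i = (i/n)^2 * b.

    sqrt is increasing, so the infimum and supremum on [x_{i-1}, x_i]
    are attained at the left and right endpoints respectively.
    """
    xs = [(i / n) ** 2 * b for i in range(n + 1)]
    lower = sum(math.sqrt(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    upper = sum(math.sqrt(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    return lower, upper

b = 4.0
exact = 2.0 / 3.0 * b ** 1.5   # the limit (2/3) b^{3/2} from the example
for n in (10, 100, 1000):
    lower, upper = sqrt_darboux(b, n)
    print(n, lower, upper, upper - lower)
```

Both sums squeeze the value $\frac{2}{3}b^{3/2}$, and the gap $U - L$ shrinks like $b^{3/2}/n$, in line with the Cauchy criterion (Theorem 4.6).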
75
As any freshman calculus student can attest, if you can find an anti-derivative, then the fundamental
theorem of calculus (Section 34) makes evaluating integrals far easier. For instance, you are probably
desperate to write
\[ \frac{d}{dx}\,\frac{2}{3}x^{3/2} = x^{1/2} \implies \int_0^b \sqrt{x}\,dx = \frac{2}{3}x^{3/2}\Big|_0^b = \frac{2}{3}b^{3/2} \]
rather than computing Riemann/Darboux sums as in the previous example! In most practical cases,
however, no easy-to-compute anti-derivative exists, so the best we can do is approximate integrals
by evaluating Riemann sums for progressively finer partitions. Thankfully computers excel at such
tedious work!
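As a hedged illustration of this last point, here is a minimal right-endpoint Riemann sum on equal subintervals. The integrand $e^{-x^2}$ is chosen here (it is not taken from the notes) because it has no elementary anti-derivative; the function name `riemann_right` is likewise an assumption.

```python
import math

def riemann_right(f, a, b, n):
    """Right-endpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(1, n + 1)) * dx

def integrand(x):
    # No elementary anti-derivative exists for exp(-x^2).
    return math.exp(-x * x)

# Progressively finer partitions of [0, 1].
for n in (10, 100, 1000, 10000):
    print(n, riemann_right(integrand, 0.0, 1.0, n))
```

The printed values settle down as $n$ grows; the limit can be written using the error function as $\frac{\sqrt{\pi}}{2}\operatorname{erf}(1)$, which is itself defined by an integral.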
Exercises 32 1. For each function on the given interval, use partitions to find the upper and lower Darboux integrals. Hence prove that the function is integrable and compute its integral.
(a) $f(x) = x^3$ on $[0, b]$ for any $b > 0$.
(b) $g(x) = \sqrt[3]{x}$ on $[0, b]$.
(Hint: mimic Example 4.7.1)
2. Repeat question 1 for the following two functions. You cannot simply compute Riemann sums for left and right endpoints and take limits: why not?
(a) $h(x) = x(2 - x)$ on $[0, 2]$
(Hint: choose a partition with $2n$ points such that $x_n = 1$ and observe that $h(2 - x) = h(x)$)
(b) $k(x) = \begin{cases} 2x & \text{if } x \le 1 \\ 5 - x & \text{if } x > 1 \end{cases}$ on $[0, 3]$.
(Hint: this time try a partition with $3n$ points...)
3. Let $f(x) = x$ for rational $x$ and $f(x) = 0$ for irrational $x$.
(a) Calculate the upper and lower Darboux integrals for $f$ on the interval $[0, b]$.
(b) Is $f$ integrable on $[0, b]$?
4. Prove part 3 of Lemma 4.5: $L(f) \le U(f)$.
5. Prove part 2 of Theorem 4.6.
$f$ is integrable $\iff$ $\exists (P_n)_{n \in \mathbb{N}}$ such that $U(f, P_n) - L(f, P_n) \to 0$.
Moreover, both $U(f, P_n)$ and $L(f, P_n)$ converge to $\int f$.
6. (a) Reread Definition 4.3. What happens if we allow $f : [a, b] \to \mathbb{R}$ to be unbounded?
(b) Read Riemann versus Darboux on page 73. Explain why being Riemann integrable also forces $f$ to be bounded.
7. (If you like coding) Write a short program to estimate
R
b
a
f (x) dx using Riemann sums. This
can be very simple (equal partitions with right endpoints), or more complex (random partition
and sample points given a mesh). Apply your program to estimate
R
5
0
sin(x
2
e
x
) dx.
76
33 Properties of the Riemann Integral
The rough take-away of this long section is that everything you think is integrable probably is! There
will not be many examples since we have not established many explicit values for integrals.
Theorem 4.8 (Linearity). If $f, g$ are integrable and $k, l$ are constant, then $kf + lg$ is integrable and
\[ \int (kf + lg) = k\int f + l\int g \]
Example 4.9. Thanks to examples in the previous section, we can now calculate, for instance
\[ \int_0^2 5x^3 - 3\sqrt{x}\,dx = 5 \cdot \frac{1}{4} \cdot 2^4 - 3 \cdot \frac{2}{3} \cdot 2^{3/2} = 20 - 4\sqrt{2} \]
Proof. Suppose $\epsilon > 0$ is given. By Theorem 4.6 part 1, there exist partitions $R, S$ such that
\[ U(f, R) - L(f, R) < \frac{\epsilon}{2} \quad\text{and}\quad U(g, S) - L(g, S) < \frac{\epsilon}{2} \]
By Lemma 4.5 part 1, if $P := R \cup S$, then both inequalities are satisfied by $P$. On each subinterval,
\[ \inf f(x) + \inf g(x) \le \inf\bigl(f(x) + g(x)\bigr) \quad\text{and}\quad \sup\bigl(f(x) + g(x)\bigr) \le \sup f(x) + \sup g(x) \]
since the individual suprema/infima could be 'evaluated' at different places. Thus
\[ L(f, P) + L(g, P) \le L(f + g, P) \le U(f + g, P) \le U(f, P) + U(g, P) \]
whence $U(f + g, P) - L(f + g, P) < \epsilon$ and $f + g$ is integrable. Moreover,
\[ \int (f + g) - \int f - \int g \le \Bigl[U(f, P) - \int f\Bigr] + \Bigl[U(g, P) - \int g\Bigr] < \epsilon \]
Using lower Darboux integrals similarly, we see that
\[ -\epsilon < \int (f + g) - \int f - \int g < \epsilon \]
Since this holds for all $\epsilon > 0$, we conclude that $\int (f + g) = \int f + \int g$.
That $kf$ is integrable with $\int kf = k\int f$ is an exercise. Put these together for the result.
Corollary 4.10 (Changing endvalues). Suppose $g$ is integrable on $[a, b]$ and that $f : [a, b] \to \mathbb{R}$ satisfies $f(x) = g(x)$ on $(a, b)$. Then $f$ is integrable on $[a, b]$ and $\int_a^b f = \int_a^b g$.
Definition 4.11 (Integration on an open interval). A bounded function $f : (a, b) \to \mathbb{R}$ is integrable if it has an integrable extension $g : [a, b] \to \mathbb{R}$ where $f(x) = g(x)$ on $(a, b)$. In such a case, we define $\int_a^b f := \int_a^b g$.
The Corollary (its proof is an exercise) shows that the choice of extension is irrelevant.
77
Theorem 4.12 (Basic Comparisons). Suppose $f$ and $g$ are integrable on $[a, b]$.
1. If $f(x) \le g(x)$, then $\int f \le \int g$.
2. If $m \le f(x) \le M$ then $m(b - a) \le \int_a^b f \le M(b - a)$.
3. $fg$ is integrable.
4. $|f|$ is integrable and $\left|\int f\right| \le \int |f|$.
5. $\max(f, g)$ and $\min(f, g)$ are integrable.
Part 3 is not integration by parts and does not tell us how $\int fg$ relates to $\int f$ and $\int g$!
Proof. 1. Since $g(x) - f(x) \ge 0$ is integrable, $L(g - f, P) \ge 0$ for all partitions $P$, and so
\[ 0 \le L(g - f) = \int (g - f) = \int g - \int f \]
2. Apply part 1 twice.
3. This is an exercise.
4. The integrability is an exercise. For the comparison, apply part 1 to $-|f| \le f \le |f|$.
5. $\max(f, g) = \frac{1}{2}(f + g) + \frac{1}{2}|f - g|$, etc.
Theorem 4.13 (Domain splitting). Suppose that $f : [a, b] \to \mathbb{R}$ and let $c \in (a, b)$. If $f$ is integrable on both $[a, c]$ and $[c, b]$, then it is integrable on $[a, b]$ and
\[ \int_a^b f = \int_a^c f + \int_c^b f \]
[Figure: the graph of $f$ over $[a, b]$ with the areas $\int_a^c f$ and $\int_c^b f$ on either side of $c$.]
In light of this result, it is conventional to allow integral limits to be reversed:
\[ \int_b^a f := -\int_a^b f \quad\text{is consistent with}\quad \int_a^a f = 0 \]
Proof. Let $\epsilon > 0$ be given, then $\exists R, S$ partitions of $[a, c]$, $[c, b]$ such that
\[ U(f, R) - L(f, R) < \frac{\epsilon}{2}, \qquad U(f, S) - L(f, S) < \frac{\epsilon}{2} \]
Choose $P = R \cup S$ to partition $[a, b]$, then
\[ U(f, P) - L(f, P) = U(f, R) + U(f, S) - L(f, R) - L(f, S) < \epsilon \]
[Figure: the partition $P$ of $[a, b]$ formed from $R$ on $[a, c]$ and $S$ on $[c, b]$.]
Moreover
\[ \int_a^b f - \int_a^c f - \int_c^b f \le U(f, P) - L(f, R) - L(f, S) = U(f, P) - L(f, P) < \epsilon \]
The other side is similar.
78
Example 4.14. If $f(x) = \sqrt{x}$ on $[0, 1]$ and $f(x) = 1$ on $[1, 2]$, then
\[ \int_0^2 f = \int_0^1 \sqrt{x}\,dx + \int_1^2 1\,dx = \frac{2}{3} + 1 = \frac{5}{3} \]
Monotonic & Continuous Functions
We are now in a position to establish the integrability of two large classes of functions.
Definition 4.15. A function $f : [a, b] \to \mathbb{R}$ is:
Monotonic if it is either increasing ($x < y \implies f(x) \le f(y)$) or decreasing.
Piecewise monotonic if there is a partition $P = \{x_0, \ldots, x_n\}$ of $[a, b]$ such that $f$ is monotonic on each open subinterval $(x_{k-1}, x_k)$.
Piecewise continuous if there is a partition such that $f$ is uniformly continuous on each $(x_{k-1}, x_k)$.
Theorem 4.16. If $f$ is monotonic or continuous on $[a, b]$, then it is integrable.
Examples 4.17. 1. Since sine is continuous, we can approximate via a sequence of Riemann sums
\[ \int_0^\pi \sin x\,dx = \lim_{n \to \infty} \frac{\pi}{n}\sum_{i=1}^n \sin\frac{\pi i}{n} \]
Evaluating this limit is another matter entirely, one best handled in the next section...
2. Similarly, $e^{\sqrt{x}}$ can be integrated and therefore approximated via Riemann sums:
\[ \int_0^1 e^{\sqrt{x}}\,dx = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^n \exp\sqrt{\frac{i}{n}} = \lim_{n \to \infty} \sum_{j=1}^n \frac{2j - 1}{n^2}\exp\frac{j}{n} \]
Both sums use right endpoints; the first has equal subintervals and the second uses the partition $x_j = (j/n)^2$, analogous to Example 4.7.1. These limits would typically be estimated using a computer.
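The sums in Examples 4.17 are easy to estimate by machine. The sketch below is illustrative only; the function names are assumptions, and the second sum for $e^{\sqrt{x}}$ uses the partition $x_j = (j/n)^2$ with $\Delta x_j = (2j-1)/n^2$ as in Example 4.7.1.

```python
import math

def sine_sum(n):
    """(pi/n) * sum of sin(pi*i/n): right endpoints, equal subintervals of [0, pi]."""
    return math.pi / n * sum(math.sin(math.pi * i / n) for i in range(1, n + 1))

def exp_sqrt_equal(n):
    """(1/n) * sum of exp(sqrt(i/n)): right endpoints, equal subintervals of [0, 1]."""
    return sum(math.exp(math.sqrt(i / n)) for i in range(1, n + 1)) / n

def exp_sqrt_squares(n):
    """Right endpoints on x_j = (j/n)^2, where sqrt(x_j) = j/n and dx_j = (2j-1)/n^2."""
    return sum((2 * j - 1) / n**2 * math.exp(j / n) for j in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, sine_sum(n), exp_sqrt_equal(n), exp_sqrt_squares(n))
```

The two estimates of $\int_0^1 e^{\sqrt{x}}\,dx$ approach a common value, as they must since the integrand is continuous and hence integrable.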
Proof. Suppose $f : [a, b] \to \mathbb{R}$ is continuous. Since $[a, b]$ is closed and bounded, $f$ is uniformly continuous. Let $\epsilon > 0$ be given:
\[ \exists \delta > 0 \text{ such that } \forall x, y \in [a, b],\ |x - y| < \delta \implies |f(x) - f(y)| < \frac{\epsilon}{b - a} \]
Let $P$ be a partition with $\operatorname{mesh} P < \delta$. Since $f$ attains its bounds on each $[x_{i-1}, x_i]$ (extreme value theorem),
\[ \exists x_i^*, y_i^* \in [x_{i-1}, x_i] \text{ such that } M_i - m_i = f(x_i^*) - f(y_i^*) < \frac{\epsilon}{b - a} \]
from which
\[ U(f, P) - L(f, P) < \sum_{i=1}^n \frac{\epsilon}{b - a}(x_i - x_{i-1}) = \epsilon \]
The monotonicity argument is an exercise.
79
Corollary 4.18. Piecewise continuous and bounded piecewise monotonic functions are integrable.
Proof. If $f$ is piecewise continuous, then the restriction of $f$ to $(x_{k-1}, x_k)$ has a continuous extension $g_k : [x_{k-1}, x_k] \to \mathbb{R}$; integrable by Theorem 4.16. By Corollary 4.10, $f$ is integrable on $[x_{k-1}, x_k]$ with $\int_{x_{k-1}}^{x_k} f = \int_{x_{k-1}}^{x_k} g_k$. Several applications of Theorem 4.13 finish things off:
\[ \int_a^b f = \sum_{k=1}^n \int_{x_{k-1}}^{x_k} f \]
The argument for piecewise monotonicity is similar.
Example 4.19. The 'fractional part' function $f(x) = x - \lfloor x \rfloor$ is both piecewise continuous and piecewise monotone on any bounded interval. It is therefore integrable on any $[a, b]$.
[Figure: graph of the fractional part function on $[0, 5]$.]
We finish with the final incarnation of the intermediate value theorem.
Corollary 4.20 (IVT for integrals). If $f$ is continuous on $[a, b]$, then $\exists \xi \in (a, b)$ for which
\[ f(\xi) = \frac{1}{b - a}\int_a^b f \]
Proof. Since $f$ is continuous, it is integrable on $[a, b]$. By the extreme value theorem it is also bounded and attains its bounds: $\exists p, q \in [a, b]$ such that
\[ f(p) = \inf_{x \in [a, b]} f(x), \qquad f(q) = \sup_{x \in [a, b]} f(x) \]
Applying Theorem 4.12, part 2, with $m = f(p)$ and $M = f(q)$, we see that
\[ (b - a) f(p) \le \int_a^b f \le (b - a) f(q) \]
[Figure: the graph of $f$ on $[a, b]$ with the levels $m$, $f_{\mathrm{av}}$, $M$ and the points $p$, $q$, $\xi$ marked.]
Now divide by $b - a$ and apply the usual intermediate value theorem for $f$ to see that the required $\xi$ exists between $p$ and $q$.
In the picture, when $f$ is positive and continuous, the grey area equals that under the curve; imagine levelling off the blue hill with a bulldozer... The notation $f_{\mathrm{av}} = \frac{1}{b - a}\int_a^b f$ is short for the average value of $f$ on $[a, b]$: to see why this interpretation is sensible, approach $\int f$ via a sequence of Riemann sums on equally-spaced partitions $P_n$, for which $\Delta x = \frac{b - a}{n}$; then
\[ \frac{1}{b - a}\int_a^b f = \lim_{n \to \infty} \frac{1}{b - a}\sum_{i=1}^n f(x_i)\,\Delta x = \lim_{n \to \infty} \frac{f(x_1) + \cdots + f(x_n)}{n} \]
is the limit of a sequence of averages of equally-spaced samples $f(x_i)$.
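To see the averaging interpretation in action, the following minimal sketch averages equally-spaced samples of $\sin$ on $[0, \pi]$. The function name `sample_average` and the test function are assumptions chosen for illustration; the comparison value $2/\pi$ uses $\int_0^\pi \sin x\,dx = 2$.

```python
import math

def sample_average(f, a, b, n):
    """Average of f at the n equally-spaced right endpoints of [a, b]."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(1, n + 1)) / n

# The averages should approach f_av = (1/pi) * 2 for sin on [0, pi].
for n in (10, 100, 1000):
    print(n, sample_average(math.sin, 0.0, math.pi, n), 2 / math.pi)
```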
80
What can/can’t be integrated? (non-examinable)
We now know a great many examples of integrable functions: essentially
Piecewise continuous & monotonic functions are integrable.
Linear combinations, products, absolute values, maximums and minimums of (already) inte-
grable functions.
After so many positive integrability conditions, it is reasonable to ask precisely which functions are
Riemann integrable. There is a precise answer, though it is quite tricky to understand.
Theorem 4.21 (Lebesgue). Suppose $f : [a, b] \to \mathbb{R}$ is bounded. Then
\[ f \text{ is Riemann integrable} \iff \text{it is continuous except on a set of measure zero} \]
Naïvely, the measure of a set is the sum of the lengths of its maximal subintervals; though unfortunately this doesn't make for a very useful definition.$^{22}$ Any countable subset has measure zero; Lebesgue's result is almost as if we can extend Corollary 4.18 to allow for infinite sums. Indeed you might have encountered a function which is continuous only on the irrationals; such a function is Riemann integrable. There are also some uncountable sets with measure zero such as Cantor's middle-third set: if $f$ is the indicator function of Cantor's set
\[ f(x) = \begin{cases} 1 & \text{if } x \in C \\ 0 & \text{otherwise} \end{cases} \]
then $f$ is continuous except on $C$, and is Riemann integrable with $\int_0^1 f(x)\,dx = 0$.
Exercises 33 1. Explain why $\int_0^{2\pi} x^2 \sin^8(e^x)\,dx \le \frac{8}{3}\pi^3$.
2. If $f$ is integrable on $[a, b]$ prove that it is integrable on any interval $[c, d] \subseteq [a, b]$.
3. We complete the proof of Theorem 4.8 (linearity of integration).
(a) Suppose $k > 0$, let $A \subseteq \mathbb{R}$ and define $kA := \{kx : x \in A\}$. Prove that $\sup kA = k \sup A$ and $\inf kA = k \inf A$.
(b) If $k > 0$ prove that $kf$ is integrable on any interval and that $\int kf = k\int f$.
(c) How should you modify your argument if $k < 0$?
4. Give an example of an integrable but discontinuous function on a closed bounded interval $[a, b]$ for which the conclusion of the Intermediate Value Theorem for Integrals is false.
22 Formally, the length of an open interval $(a, b)$ is $b - a$ and a set $A \subseteq \mathbb{R}$ has measure zero if
\[ \forall \epsilon > 0,\ \exists \text{ open intervals } I_n \text{ such that } A \subseteq \bigcup_{n=1}^\infty I_n \text{ and } \sum_{n=1}^\infty \operatorname{length}(I_n) < \epsilon \]
More generally, the measure of a set (subject to a technical condition) is the infimum of the sum of the lengths of any countable collection of open covering intervals. A rigorous discussion of measure theory is properly a matter for graduate analysis. Somewhat surprisingly, there exist sets with positive measure that contain no subintervals, and even sets which are non-measurable!
81
5. Explicitly compute the value of the integral $\int_{1/2}^{15/2} x - \lfloor x \rfloor\,dx$ (recall Example 4.19).
6. We prove and extend Corollary 4.10. Suppose $f$ is integrable on $[a, b]$.
(a) If $g : [a, b] \to \mathbb{R}$ satisfies $f(x) = g(x)$ for all $x \in (a, b)$, prove that $g$ is integrable and $\int_a^b g = \int_a^b f$.
(Hint: consider $h = f - g$ and show that $\int h = 0$)
(b) Now suppose $g : [a, b] \to \mathbb{R}$ satisfies $f(x) = g(x)$ for all $x \in [a, b]$ except at finitely many points. Prove that $g$ is integrable and $\int_a^b g = \int_a^b f$.
7. Show that an increasing function on $[a, b]$ is integrable and thus complete Theorem 4.16.
(Hint: Choose a partition $P$ with $\operatorname{mesh} P < \frac{\epsilon}{f(b) - f(a)}$)
8. Suppose $f$ and $g$ are integrable on $[a, b]$.
(a) Define $h(x) = (f(x))^2$. We know:
$f$ is bounded: $\exists K$ such that $|f(x)| \le K$ on $[a, b]$.
Given $\epsilon > 0$, $\exists P$ such that $U(f, P) - L(f, P) < \frac{\epsilon}{2K}$. For each subinterval $[x_{i-1}, x_i]$, let
\[ M_i = \sup f(x), \quad m_i = \inf f(x), \quad \widetilde{M}_i = \sup h(x), \quad \widetilde{m}_i = \inf h(x) \]
Prove that $\widetilde{M}_i - \widetilde{m}_i \le 2(M_i - m_i)K$ and use this to conclude that $h$ is integrable.
(b) Prove that $fg$ is integrable.
(Hint: $fg = \frac{1}{4}(f + g)^2 - \frac{1}{4}(f - g)^2$)
(c) Prove that $U(|f|, P) - L(|f|, P) \le U(f, P) - L(f, P)$ for any partition $P$. Hence conclude that $|f|$ is integrable.
(One can extend these arguments, though it's a bit harder, to show that if $j$ is continuous, then $j \circ f$ is integrable. Parts (a) and (c) correspond, respectively, to $j(x) = x^2$ and $j(x) = |x|$.)
9. (Hard) Let $f(x) = \begin{cases} x & \text{if } x \ne 0 \text{ and } \sin\frac{1}{x} > 0 \\ -x & \text{if } x \ne 0 \text{ and } \sin\frac{1}{x} < 0 \\ 0 & \text{if } x = 0 \end{cases}$
(a) Show that $f$ is not piecewise continuous on $[0, 1]$.
(b) Show that $f$ is not piecewise monotonic on $[0, 1]$.
(c) Show that $f$ is integrable on $[0, 1]$.
(Hint: given $\epsilon$, hunt for a suitable partition to make $U(f, P) - L(f, P) < \epsilon$ by considering $[0, x_1]$ differently to the other subintervals)
(d) Make a similar argument to show that $g = \sin\frac{1}{x}$ is integrable on $(0, 1]$, where
\[ g(x) = \begin{cases} \sin\frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases} \]
Note that neither argument evaluates the integrals!
82
34 The Fundamental Theorem of Calculus
The key result linking integration and differentiation is usually presented in two parts.$^{23}$ While there are significant subtleties, the rough statements are as follows:
Part I Differentiation reverses integration: $\frac{d}{dx}\int_a^x f(t)\,dt = f(x)$
Part II Integration reverses differentiation: $\int_a^b F'(x)\,dx = F(b) - F(a)$
These facts seemed intuitively obvious to early practitioners of calculus: indeed, given a continuous positive function $f$:
Let $F(x)$ denote the area under the curve between 0 and $x$.
A small increase $\Delta x$ results in the area increasing by $\Delta F$.
$\Delta F \approx f(x)\Delta x$ is approximately the area of a rectangle, whence $\frac{\Delta F}{\Delta x} \approx f(x)$. This is part I.
$F(b) - F(a) = \sum \Delta F_i \approx \sum f(x_i)\,\Delta x_i$. Since $F' = f$, this is part II.
[Figure: the area $F$ under the graph of $f$ up to $x$, with the thin strip $\Delta F$ of width $\Delta x$.]
In fact when Leibniz introduced the symbols $\int$ and $d$ in the late 1600's, it was partly to reflect the fundamental theorem.$^{24}$ If you're happy with non-rigorous notions of limit, rate of change, area, and (infinite) sums, the above is all you need!
Of course, we are very much concerned with the details: What must we assume regarding f and F,
and how are these properties used in the proof?
Theorem 4.22 (FTC, part I). Suppose $f$ is integrable on $[a, b]$. For any $x \in [a, b]$, define
\[ F(x) := \int_a^x f(t)\,dt \]
Then:
1. $F$ is uniformly continuous on $[a, b]$;
2. If $f$ is continuous at $c \in [a, b]$, then $F$ is differentiable at $c$ with $F'(c) = f(c)$.
As ever, the condition at $c = a$ should be right-continuous and the conclusion right-differentiable, etc.
Compare this with the naïve version above where we assumed $f$ was continuous. We now require only the integrability of $f$, and its continuity at one point for the full result.
23 We follow the traditional numbering; some authors reverse these.
24 $\int$ is a stylized S for sum, while $d$ stands for difference. Given a sequence $F = (F_0, F_1, F_2, \ldots, F_n)$, construct a new sequence of differences
\[ dF = (F_1 - F_0,\ F_2 - F_1,\ \ldots,\ F_n - F_{n-1}) \]
which can then be summed:
\[ \int dF = (F_1 - F_0) + (F_2 - F_1) + \cdots + (F_n - F_{n-1}) = F_n - F_0 \tag{$*$} \]
Viewing a function as an 'infinite sequence' of values spaced along an interval, $dF$ becomes a sequence of infinitesimals and ($*$) is essentially the fundamental theorem: $\int dF = F(b) - F(a)$. It is the conception of a function that is suspect here, not the essential relationship between sums and differences.
83
Examples 4.23. You should have seen many examples in an elementary calculus course.
1. Since $f(x) = \sin^2(x^3 - 7)$ is continuous on any bounded interval, we conclude that
\[ \frac{d}{dx}\int_4^x \sin^2(t^3 - 7)\,dt = \sin^2(x^3 - 7) \]
If one follows Theorem 4.13 and its resulting conventions, then this is valid for all $x \in \mathbb{R}$.
2. The chain rule permits more complicated examples. For instance, since $f(t) = \sin\sqrt{t}$ is continuous on its domain $[0, \infty)$ and $y(x) = x^2 + 3$ has range $[3, \infty) \subseteq \operatorname{dom}(f)$, we have
\[ \frac{d}{dx}\int_0^{x^2 + 3} \sin\sqrt{t}\,dt = \frac{dy}{dx}\,\frac{d}{dy}\int_0^y \sin\sqrt{t}\,dt = 2x \sin\sqrt{x^2 + 3} \]
3. For a final positive example, observe that
\[ \frac{d}{dx}\int_{\sin x}^{e^x} \tan(t^2)\,dt = e^x \tan(e^{2x}) - \cos x \tan(\sin^2 x) \]
To evaluate this, one first chooses any constant $a$ and writes
\[ \int_{\sin x}^{e^x} = \int_a^{e^x} + \int_{\sin x}^a = \int_a^{e^x} - \int_a^{\sin x} \]
before differentiating. This is valid provided $\sin x$, $e^x$ and $a$ all lie in the same subinterval of
\[ \operatorname{dom} \tan(t^2) = \mathbb{R} \setminus \left\{\pm\sqrt{\tfrac{\pi}{2}},\ \pm\sqrt{\tfrac{3\pi}{2}},\ \pm\sqrt{\tfrac{5\pi}{2}},\ \ldots\right\} \]
Since $|\sin x| \le 1 < \sqrt{\frac{\pi}{2}}$, this requires
\[ e^{2x} < \frac{\pi}{2} \iff x < \frac{1}{2}\ln\frac{\pi}{2} \]
Choosing $a = 1$ would certainly suffice.
4. Now consider why the theorem requires continuity. The piecewise continuous function
\[ f : [0, 2] \to \mathbb{R} : x \mapsto \begin{cases} 2x & \text{if } x \le 1 \\ \frac{1}{2} & \text{if } x > 1 \end{cases} \]
has a jump discontinuity at $x = 1$. We can still compute
\[ F(x) = \begin{cases} \int_0^x 2t\,dt = x^2 & \text{if } x \le 1 \\ \int_0^1 2t\,dt + \int_1^x \frac{1}{2}\,dt = \frac{1}{2}(x + 1) & \text{if } x > 1 \end{cases} \]
This is continuous, indeed uniformly so. However the discontinuity of $f$ results in $F$ having a corner and thus being non-differentiable at $x = 1$. Indeed, $F'(x) = f(x)$ for all $x \ne 1$; that is, at all values of $x$ where $f$ is continuous.
[Figure: graphs of $f(x)$ and $F(x)$ on $[0, 2]$, showing the jump of $f$ and the corner of $F$ at $x = 1$.]
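The corner of $F$ in Example 4.23.4 can be seen numerically by comparing one-sided difference quotients. This is an illustrative sketch only; the helper `one_sided_slopes` and the step size `h` are assumptions, and `F` simply encodes the closed form computed in the example.

```python
def f(t):
    """The piecewise continuous integrand from Example 4.23.4."""
    return 2 * t if t <= 1 else 0.5

def F(x):
    """Closed form of the integral of f from 0 to x, valid for 0 <= x <= 2."""
    return x * x if x <= 1 else 0.5 * (x + 1)

def one_sided_slopes(G, c, h=1e-6):
    """Left and right difference quotients of G at c (a numerical stand-in
    for the one-sided derivatives)."""
    left = (G(c) - G(c - h)) / h
    right = (G(c + h) - G(c)) / h
    return left, right

for c in (0.5, 1.0, 1.5):
    print(c, one_sided_slopes(F, c), f(c))
```

Away from $x = 1$ the two quotients agree with $f(c)$, as FTC I predicts; at $x = 1$ they approach the two different one-sided limits of $f$, so $F$ is not differentiable there.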
84
Proving FTC I Neither half of the theorem is particularly difficult once you write down what you know and what you need to prove. Here are the key ingredients:
1. $F$ uniformly continuous means controlling the size of
\[ |F(y) - F(x)| = \left|\int_a^y f(t)\,dt - \int_a^x f(t)\,dt\right| = \left|\int_x^y f(t)\,dt\right| \le \int_x^y |f(t)|\,dt \]
But the boundedness of $f$ allows us to bound this last integral...
2. $F'(c) = f(c)$ means showing that $\lim_{x \to c} \frac{F(x) - F(c)}{x - c} = f(c)$, which means controlling the size of
\[ \left|\frac{F(x) - F(c)}{x - c} - f(c)\right| = \left|\frac{1}{x - c}\int_c^x f(t)\,dt - f(c)\right| \]
The trick here will be to bring the constant $f(c)$ inside the integral as $\frac{1}{x - c}\int_c^x f(c)\,dt$ so that what we really have to control is the size of $\frac{1}{|x - c|}\int_c^x |f(t) - f(c)|\,dt$. This is where the continuity of $f$ comes in...
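Proof. 1. Since $f$ is integrable, it is bounded: $\exists M > 0$ such that $|f(x)| \le M$ for all $x$.
Let $\epsilon > 0$ be given and define $\delta = \frac{\epsilon}{M}$. Then, for any $x, y \in [a, b]$,
\[ 0 < y - x < \delta \implies |F(y) - F(x)| = \left|\int_x^y f(t)\,dt\right| \le \int_x^y |f(t)|\,dt \quad\text{(Theorem 4.12, part 4)} \]
\[ \le M(y - x) \quad\text{(Theorem 4.12, part 2)} \]
\[ < M\delta = \epsilon \]
We conclude that $F$ is uniformly continuous on $[a, b]$.
2. Let $\epsilon > 0$ be given. Since $f$ is continuous at $c$, $\exists \delta > 0$ such that, for all $t \in [a, b]$,
\[ |t - c| < \delta \implies |f(t) - f(c)| < \frac{\epsilon}{2} \]
Now for all $x \in [a, b]$ (except $c$),
\[ 0 < |x - c| < \delta \implies \left|\frac{F(x) - F(c)}{x - c} - f(c)\right| = \left|\frac{1}{x - c}\int_c^x \bigl(f(t) - f(c)\bigr)\,dt\right| \quad\text{(Theorem 4.8)} \]
\[ \le \frac{1}{|x - c|}\left|\int_c^x |f(t) - f(c)|\,dt\right| \quad\text{(Theorem 4.12)} \]
\[ \le \frac{1}{|x - c|} \cdot \frac{\epsilon}{2}\,|x - c| = \frac{\epsilon}{2} < \epsilon \]
Clearly $\lim_{x \to c} \frac{F(x) - F(c)}{x - c} = f(c)$, and so $F$ is differentiable at $c$ with $F'(c) = f(c)$.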
85
The Fundamental Theorem, part II As with part I, the formulaic part of the result should be familiar,
though we are more interested in the assumptions and where they are needed.
Theorem 4.24 (FTC, part II). Suppose $g$ is continuous on $[a, b]$, differentiable on $(a, b)$, and that $g'$ is integrable$^{25}$ on $(a, b)$. Then,
\[ \int_a^b g' = g(b) - g(a) \]
Part II is often expressed in terms of anti-derivatives: $F$ being an anti-derivative of $f$ if $F' = f$. Combined with FTC I, we recover the familiar $+c$ result and a simpler version of the fundamental theorem often seen in elementary calculus.
Corollary 4.25. Let $f$ be continuous on $[a, b]$.
If $F$ is an anti-derivative of $f$, then $\int_a^b f = F(b) - F(a)$.
Every anti-derivative has the form $F(x) = \int_a^x f(t)\,dt + c$ for some constant $c$.
Examples 4.26. Again, basic examples should be familiar.
1. Plainly $g(x) = x^2 + 2x^{3/2}$ is continuous on $[1, 4]$ and differentiable on $(1, 4)$ with derivative $g'(x) = 2x + 3\sqrt{x}$; this last is continuous (and thus integrable) on $(1, 4)$. We conclude that
\[ \int_1^4 2x + 3\sqrt{x}\,dx = x^2 + 2x^{3/2}\Big|_1^4 = (16 + 16) - (1 + 2) = 29 \]
2. If $g(x) = \sin(3x^2)$, then $g'(x) = 6x\cos(3x^2)$. Certainly $g$ satisfies the hypotheses of the theorem on any bounded interval $[a, b]$. We conclude
\[ \int_a^b 6x\cos(3x^2)\,dx = \sin(3b^2) - \sin(3a^2) \]
Moreover, every anti-derivative of $f(x) = 6x\cos(3x^2)$ has the form $F(x) = \sin(3x^2) + c$.
3. Recall Example 4.23.4 where we saw that the discontinuity of $f$ led to the non-differentiability of $F(x) = \int_0^x f(t)\,dt$ at $x = 1$. The function $F$ therefore fails the hypotheses of FTC II on the interval $[0, 2]$.
However, except at $x = 1$, $F$ is an anti-derivative of $f$ and moreover $\int_0^2 f(x)\,dx = F(2) - F(0)$, so we appear to have the formulaic conclusion of FTC II, though this is tautological given the definition of $F$!
The way out of this conundrum is to note that other anti-derivatives $\hat{F}$ of $f$ exist (except at $x = 1$), and which fail to satisfy the conclusion. For instance
\[ \hat{F}(x) = \begin{cases} x^2 & \text{if } x < 1 \\ \frac{1}{2}x & \text{if } x > 1 \end{cases} \implies \hat{F}(2) - \hat{F}(0) = 1 \ne \frac{3}{2} = \int_0^2 f(x)\,dx \]
25 See Definition 4.11 if you're unsure what it means for $g'$ to be integrable on a bounded open interval.
86
Proving FTC II See Exercise 10 for a relatively easy proof when $g' = f$ is continuous. For the real McCoy, we can only rely on the integrability of $g'$: the trick is to use the mean value theorem to write $g(b) - g(a)$ as a Riemann sum over a suitable partition.
Proof. Let $\epsilon > 0$ be given and choose a partition $P$ such that $U(g', P) - L(g', P) < \epsilon$. Since $g$ satisfies the mean value theorem on each subinterval of the partition $P$, we see that
\[ \exists \xi_i \in (x_{i-1}, x_i) \text{ such that } g'(\xi_i) = \frac{g(x_i) - g(x_{i-1})}{x_i - x_{i-1}} \]
from which
\[ g(b) - g(a) = \sum_{i=1}^n \bigl(g(x_i) - g(x_{i-1})\bigr) = \sum_{i=1}^n g'(\xi_i)(x_i - x_{i-1}) \]
This is a Riemann sum for $g'$ associated to the partition $P$, hence,
\[ L(g', P) \le g(b) - g(a) \le U(g', P) \]
However we also have $L(g', P) \le \int_a^b g' \le U(g', P)$, so that $\left|\int_a^b g' - \bigl(g(b) - g(a)\bigr)\right| < \epsilon$. Since this holds for all $\epsilon$, the proof is complete.
While we certainly used the integrability of $g'$ in the proof, it might seem strange that we assumed it at all: shouldn't every derivative be integrable? Perhaps surprisingly, the answer is no! If you want a challenge, look up the Volterra function, which is differentiable everywhere, but whose derivative is non-integrable (on, for instance, $[0, 1]$)!
The Rules of Integration
If one wants to evaluate an integral, rather than merely show it exists, there are really only two options:
1. Evaluate Riemann sums and take limits: often difficult if not impossible to do explicitly.
2. Use FTC II. The problem now becomes the finding of anti-derivatives, for which the core method
is essentially guess and differentiate. To obtain general rules, we attempt to reverse the rules of
differentiation.
Integration by Parts First consider the product rule: the product $g = uv$ of two differentiable functions is differentiable with $g' = u'v + uv'$. Now apply Theorems 4.8, 4.12 and FTC II.
Corollary 4.27 (Integration by Parts). Suppose $u, v$ are continuous on $[a, b]$, differentiable on $(a, b)$, and that $u', v'$ are integrable on $(a, b)$. Then
\[ \int_a^b u'(x)v(x)\,dx = u(b)v(b) - u(a)v(a) - \int_a^b u(x)v'(x)\,dx \]
This is significantly less useful than the product rule since it is only capable of transforming the
integral of a product into another such integral.
87
Examples 4.28. You should have seen myriad examples in a previous course. With practice, there is no need to explicitly state $u$ and $v$.
1. Let $u(x) = x$ and $v'(x) = \cos x$. Then $u'(x) = 1$ and $v(x) = \sin x$, whence
\[ \int_0^{\pi/2} x\cos x\,dx = \bigl[x\sin x\bigr]_0^{\pi/2} - \int_0^{\pi/2} \sin x\,dx = \frac{\pi}{2}\sin\frac{\pi}{2} - 0 - \bigl[-\cos x\bigr]_0^{\pi/2} = \frac{\pi}{2} + \cos\frac{\pi}{2} - \cos 0 = \frac{\pi}{2} - 1 \]
2. Let $u(x) = \ln x$ and $v'(x) = 1$. Then $u'(x) = \frac{1}{x}$ and $v(x) = x$, whence
\[ \int_e^{e^2} \ln x\,dx = \bigl[x\ln x\bigr]_e^{e^2} - \int_e^{e^2} \frac{x}{x}\,dx = e^2\ln e^2 - e\ln e - \bigl[x\bigr]_e^{e^2} = 2e^2 - e - e^2 + e = e^2 \]
Change of Variables/Substitution We now turn our attention to the chain rule. If $g(x) = F\bigl(u(x)\bigr)$, where $F$ and $u$ are differentiable, then $g$ is differentiable with
\[ g'(x) = \frac{dg}{dx} = \frac{dF}{du}\,\frac{du}{dx} = F'\bigl(u(x)\bigr)\,u'(x) \]
Now integrate both sides; the only issue is what assumptions are needed to invoke FTC II.
Theorem 4.29 (Substitution Rule). Suppose we have two continuous functions: $u : [a, b] \to \mathbb{R}$ and $f : \operatorname{range}(u) \to \mathbb{R}$. Suppose also that $u$ is differentiable on $(a, b)$ with integrable derivative $u'$. Then
\[ \int_a^b f\bigl(u(x)\bigr)\,u'(x)\,dx = \int_{u(a)}^{u(b)} f(u)\,du \]
This is the famous 'u-sub' formula from elementary calculus.
Proof. We leave as an exercise the verification that both integrals exist. We may also assume that $\operatorname{range}(u)$ is an interval of positive length$^{26}$ for otherwise both integrals are trivially zero.
Choose any $c \in \operatorname{range}(u)$ and define
\[ F : \operatorname{range}(u) \to \mathbb{R} \text{ by } F(v) := \int_c^v f(t)\,dt \]
Since $f$ is continuous, by FTC I we see that $F$ is differentiable with $F'(u) = f(u)$. But now
\[ \int_a^b f\bigl(u(x)\bigr)\,u'(x)\,dx = \int_a^b \frac{d}{dx}F\bigl(u(x)\bigr)\,dx \quad\text{(chain rule)} \]
\[ = F\bigl(u(b)\bigr) - F\bigl(u(a)\bigr) \quad\text{(FTC II)} \]
\[ = \int_{u(a)}^{u(b)} f(u)\,du \]
26 By the intermediate and extreme value theorems, $\operatorname{range}(u)$ is already a closed bounded interval.
88
Examples 4.30. Reading the theorem is bad enough; its application often requires significant creativity in order to recognize a suitable substitution.$^{27}$
1. To evaluate the integral $\int_0^{\sqrt{\pi}} 2x\sin x^2\,dx$, consider the substitution $u(x) = x^2$ defined on $[0, \sqrt{\pi}]$. Certainly $u$ is continuous, and its derivative $u'(x) = 2x$ is integrable on $(0, \sqrt{\pi})$. Finally $f(u) = \sin u$ is continuous on $\operatorname{range}(u) = [0, \pi]$. The hypotheses are satisfied, whence
\[ \int_0^{\sqrt{\pi}} 2x\sin x^2\,dx = \int_0^{\sqrt{\pi}} f\bigl(u(x)\bigr)\,u'(x)\,dx = \int_0^\pi \sin u\,du = -\cos u\Big|_0^\pi = 2 \]
2. For the following integral with $f(u) = \frac{1}{u^2 + 1}$, we make the substitution $u(x) = x^2 - 2$. Note that $u : [\sqrt{2}, \sqrt{3}] \to [0, 1]$ and that $u'(x) = 2x$ is integrable; moreover, $f(u)$ is continuous on $\operatorname{range}(u) = [0, 1]$. We conclude that
\[ \int_{\sqrt{2}}^{\sqrt{3}} \frac{2x}{x^4 - 4x^2 + 5}\,dx = \int_{\sqrt{2}}^{\sqrt{3}} \frac{2x}{(x^2 - 2)^2 + 1}\,dx = \int_0^1 \frac{1}{u^2 + 1}\,du = \arctan u\Big|_0^1 = \frac{\pi}{4} \]
3. The hypotheses on $u$ really are all that is necessary. In particular, $u$ doesn't need to be left-/right-differentiable at the endpoints of $[a, b]$! For instance, with $f(u) = u^2$ and $u(x) = \sqrt{x}$ on $[0, 4]$, we easily verify
\[ \frac{8}{3} = \int_0^4 \frac{1}{2}\sqrt{x}\,dx = \int_0^4 \frac{x}{2\sqrt{x}}\,dx = \int_0^4 f\bigl(u(x)\bigr)\,u'(x)\,dx = \int_0^2 f(u)\,du = \int_0^2 u^2\,du = \frac{8}{3} \]
4. Sloppy use of the substitution rule might lead to utter nonsense. For instance, consider the 'substitution' $u = x^2$ in the following:
\[ \int_{-1}^2 \frac{1}{x}\,dx = \int_{-1}^2 \frac{1}{2x^2}\,2x\,dx = \int_1^4 \frac{1}{2u}\,du = \frac{1}{2}(\ln 4 - \ln 1) = \ln 2 \]
Of course the left hand integral does not exist since $\frac{1}{x}$ is undefined at $0 \in (-1, 2)$, so the conclusion is false. In the language of the substitution rule, $f(u) = \frac{1}{2u}$ is not continuous on $\operatorname{range}(u) = [0, 4]$: it is not even defined at $u = 0$! You are very unlikely to make precisely this mistake since the first integral is so clearly undefined, but for more complicated functions...
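The substitution in Example 4.30.2 can be checked numerically by approximating both sides of the formula separately. This is a hedged sketch: the midpoint rule and the helper name `midpoint` are choices made here for illustration, not methods used in the notes.

```python
import math

def midpoint(f, a, b, n=10000):
    """Riemann sum of f on [a, b] using midpoints of n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Left side: the integral in x, with u(x) = x^2 - 2 hidden in the integrand.
lhs = midpoint(lambda x: 2 * x / (x**4 - 4 * x**2 + 5), math.sqrt(2), math.sqrt(3))

# Right side: the integral in u over [u(sqrt 2), u(sqrt 3)] = [0, 1].
rhs = midpoint(lambda u: 1 / (u * u + 1), 0.0, 1.0)

print(lhs, rhs, math.pi / 4)
```

Agreement of the two estimates is of course not a proof, but a mismatch would be a quick warning that a hypothesis of Theorem 4.29 has been overlooked, as in the 'sloppy' example above.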
27 Hence the old adage, "Differentiation is a science; integration an art." To illustrate via an example, consider the function $f(x) = \tan(e^x\cos(3x^2) + 4x^3)$. The product and chain rules allow one to explicitly compute the derivative
\[ \frac{df}{dx} = \sec^2\bigl(e^x\cos(3x^2) + 4x^3\bigr)\left[e^x\cos(3x^2) - 6xe^x\sin(3x^2) + 12x^2\right] \]
By contrast, the integration analogues (integration by parts/substitution) are essentially useless in attempting to find an explicit anti-derivative facilitating the integration of the same function via FTC II; for instance, the integral
\[ \int_0^1 \tan(e^x\cos(3x^2) + 4x^3)\,dx \]
is likely impossible to evaluate explicitly and can only be approximated (e.g. via Riemann sums).
89
Exercises 34 1. Calculate the following limits:
(a) $\lim_{x \to 0} \frac{1}{x}\int_0^x e^{t^2}\,dt$ (b) $\lim_{h \to 0} \frac{1}{h}\int_3^{3+h} e^{t^2}\,dt$
2. Let $f(t) = \begin{cases} 0 & \text{if } t < 0 \\ t & \text{if } 0 \le t \le 1 \\ 4 & \text{if } t > 1 \end{cases}$
(a) Determine the function $F(x) = \int_0^x f(t)\,dt$.
(b) Sketch $F$. Where is $F$ continuous?
(c) Where is $F$ differentiable? Calculate $F'$ at the points of differentiability.
3. Let $f$ be a continuous function on $\mathbb{R}$.
(a) Define $F(x) = \int_{x-1}^{x+1} f(t)\,dt$. Carefully show that $F$ is differentiable on $\mathbb{R}$ and compute $F'$.
(b) Repeat for the function $G(x) = \int_0^{\sin x} f(t)\,dt$.
4. Recall Examples 4.23.4 and 4.26.3. Find all anti-derivatives $F$ of $f$ on $[0, 1) \cup (1, 2]$. How many satisfy $\int_0^2 f(x)\,dx = F(2) - F(0)$?
5. Consider integration by parts. Plainly $\int_a^x u'(t)v(t)\,dt$ is an anti-derivative of $u'(x)v(x)$ by FTC I: what does integration by parts say is another?
6. Use change of variables to integrate $\int_0^1 x\sqrt{1 - x^2}\,dx$.
7. Use integration by parts and the substitution rule to evaluate $\int_0^b \arcsin x\,dx$ for any $b < 1$.
8. Use integration by parts to evaluate $\int_0^b x\arctan x\,dx$ for any $b > 0$.
9. Check that the assumptions of the substitution rule (Theorem 4.29) guarantee that both integrals are well-defined (i.e. that $(f \circ u)u'$ and $f$ are integrable on the required intervals).
10. We prove a simpler version of the fundamental theorem of calculus.
(a) Suppose $f$ is continuous on $[a, b]$ and define $F(x) = \int_a^x f(t)\,dt$. For any $c, x \in [a, b]$ where $c \ne x$, prove that
\[ m \le \frac{F(x) - F(c)}{x - c} \le M \]
where $m, M$ are the minimum and maximum values of $f(t)$ on the closed interval bounded by $c, x$. Make sure to explain why $m, M$ exist, and use this to deduce that $F'(c) = f(c)$.
(b) Suppose $f$ is continuous on $[a, b]$ and that $F$ is any anti-derivative of $f$ on $[a, b]$ (that is, $F' = f$). Use part (a) and the mean value theorem to prove that $\int_a^b f(t)\,dt = F(b) - F(a)$.
90
36 Improper Integrals
The Riemann integral has several limitations. Even allowing for functions to be integrable on open
intervals (Exercise 32.6), the definition of
R
b
a
f (x) dx requires the following:
That (a, b) be a bounded interval.
That f be bounded on (a, b).
There is a natural way to extend the Riemann integral to unbounded intervals and functions: limits.
Definition 4.31. Suppose f : [a, b) R satisfies the following properties:
f is integrable on every closed bounded subinterval [a, t] [a, b).
Either b = , or b is finite and f is unbounded at b,
The improper integral of f on [a, b) is defined to be
Z
b
a
f (x) dx := lim
tb
Z
t
a
f (x) dx
This is convergent or divergent in the same manner as the limit.
If an integral is improper at its lower limit then
R
b
a
f (x) dx := lim
sa
+
R
b
s
f (x) dx.
If an integral is improper at both ends, choose any c (a, b) and define
Z
b
a
f (x) dx = lim
sa
+
Z
c
s
f (x) dx + lim
tb
Z
t
c
f (x) dx
provided both one-sided improper integrals exist and the limit sum makes sense.
Theorem 4.13 says that the choice of c for a doubly-improper integral is irrelevant.
Many properties of the Riemann integral transfer to improper integrals, though not all. For example,
part 1 of Theorem 4.12 extends:
Theorem 4.32. If $0 \le f(x) \le g(x)$ on $[a, b)$, then $\int_a^b f \le \int_a^b g$, whenever the integrals exist (standard or improper). In particular:
• $\int_a^b f = \infty \implies \int_a^b g = \infty$
• $\int_a^b g$ converges $\implies$ $\int_a^b f$ converges to a value $\le \int_a^b g$.
We leave some of the detail to Exercise 36.7.
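To see the comparison theorem at work numerically, here is a minimal sketch under our own assumptions: the pair $f(x) = e^{-x^2}$, $g(x) = e^{-x}$ on $[1, \infty)$ is not taken from the notes, and the midpoint rule is simply a convenient stand-in for the Riemann integral on each bounded interval $[1, t]$. Since $0 \le f \le g$ there and $\int_1^\infty e^{-x}\,dx = e^{-1}$, the partial integrals of $f$ should stay below $e^{-1}$.

```python
import math

def midpoint(f, a, b, n=20000):
    """Midpoint-rule approximation of the Riemann integral of f on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x):
    return math.exp(-x * x)   # 0 <= f(x) <= g(x) for x >= 1

def g(x):
    return math.exp(-x)

bound = math.exp(-1)          # exact value of the improper integral of g on [1, inf)
partials = [midpoint(f, 1.0, t) for t in (2.0, 4.0, 8.0, 16.0)]
print(partials)
print(bound)
```

The partial integrals settle down quickly and remain well under the bound, which is what the second bullet of the theorem predicts.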
Examples 4.33. 1. $\int_0^t x^2\,dx = \frac{1}{3}t^3$ for any $t > 0$. Clearly
$$\int_0^\infty x^2\,dx = \lim_{t \to \infty} \frac{1}{3}t^3 = \infty$$
More formally, the improper integral $\int_0^\infty x^2\,dx$ diverges to infinity.
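2. With $f(x) = x^{-4/3}$ defined on $[1, \infty)$,
$$\int_1^\infty x^{-4/3}\,dx = \lim_{t \to \infty} \int_1^t x^{-4/3}\,dx = \lim_{t \to \infty} \left[-3x^{-1/3}\right]_1^t = \lim_{t \to \infty} \left(3 - 3t^{-1/3}\right) = 3$$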
3. Consider $f(x) = |x| e^{-x^2/2}$ on $(-\infty, \infty)$. On a bounded interval $[0, t)$, we have
$$\int_0^t f(x)\,dx = \int_0^t x e^{-x^2/2}\,dx = \left[-e^{-x^2/2}\right]_0^t = 1 - e^{-t^2/2} \xrightarrow[t \to \infty]{} 1$$
By symmetry, we conclude that
$$\int_{-\infty}^\infty |x| e^{-x^2/2}\,dx = 1 + 1 = 2$$
This example is important in probability: multiplying by $\frac{1}{\sqrt{2\pi}}$, we have computed the expectation of $|X|$ when $X$ is a standard normally-distributed random variable
$$E(|X|) = \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} |x| e^{-x^2/2}\,dx = \sqrt{\frac{2}{\pi}}$$
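4. If $t \in [0, 1)$, we can use our knowledge of derivatives $\frac{d}{dx} \sin^{-1} x = \frac{1}{\sqrt{1 - x^2}}$ to evaluate
$$\int_0^1 \frac{1}{\sqrt{1 - x^2}}\,dx = \lim_{t \to 1^-} \int_0^t \frac{1}{\sqrt{1 - x^2}}\,dx = \lim_{t \to 1^-} \sin^{-1} t = \frac{\pi}{2}$$
Since the integrand is even, it follows moreover that $\int_{-1}^1 \frac{1}{\sqrt{1 - x^2}}\,dx = \pi$. By comparison, we see that
$$\frac{1}{\sqrt{1 - x^4}} \le \frac{1}{\sqrt{1 - x^2}} \implies \int_{-1}^1 \frac{1}{\sqrt{1 - x^4}}\,dx \le \int_{-1}^1 \frac{1}{\sqrt{1 - x^2}}\,dx = \pi$$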
5. Improper integrals need not exist. For instance,
$$\lim_{t \to \infty} \int_0^t \sin x\,dx = \lim_{t \to \infty} (1 - \cos t)$$
diverges by oscillation.
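The limits in Examples 4.33 can be watched numerically. The sketch below simply tabulates the closed-form partial integrals from parts 2 to 5 as $t$ approaches the improper endpoint. The function names `F2` to `F5` and the sample values of $t$ are our own labels, chosen for illustration only.

```python
import math

# Closed-form partial integrals taken from Examples 4.33.
def F2(t):
    return 3 - 3 * t ** (-1 / 3)        # integral of x^(-4/3) over [1, t]

def F3(t):
    return 1 - math.exp(-t * t / 2)     # integral of x e^(-x^2/2) over [0, t]

def F4(t):
    return math.asin(t)                 # integral of 1/sqrt(1 - x^2) over [0, t], t < 1

def F5(t):
    return 1 - math.cos(t)              # integral of sin x over [0, t]

large = (10.0, 1e3, 1e6, 1e9)
print("F2:", [round(F2(t), 6) for t in large])   # creeps up towards 3
print("F3:", [round(F3(t), 6) for t in large])   # reaches 1 almost at once
print("F5:", [round(F5(t), 6) for t in large])   # wanders inside [0, 2]

# Example 4 is improper at the finite endpoint 1, so let t -> 1 from below.
print("F4:", [round(F4(1 - 10.0 ** (-k)), 6) for k in (2, 4, 8)], math.pi / 2)
```

Note how slowly `F2` approaches its limit compared with `F3`: the rate of convergence depends on how fast the integrand decays.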
Exercises 36 1. Use your answers from the previous section to decide whether the improper integrals $\int_0^1 \arcsin x\,dx$ and $\int_0^\infty x \arctan x\,dx$ exist. If so, what are their values?
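2. Let $p$ be a positive constant. Prove the following:
$$\int_0^1 \frac{1}{x^p}\,dx = \begin{cases} \frac{1}{1-p} & \text{if } p < 1 \\ \infty & \text{if } p \ge 1 \end{cases} \qquad \int_1^\infty \frac{1}{x^p}\,dx = \begin{cases} \frac{1}{p-1} & \text{if } p > 1 \\ \infty & \text{if } p \le 1 \end{cases}$$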
3. Explain why $\int_a^b f(x)\,dx = \lim_{t \to b^-} \int_a^t f(x)\,dx$ holds, even when $f$ is integrable on $[a, b]$.
4. State a version of integration by parts modified for when $\int_a^b u'(x)v(x)\,dx$ is an improper integral. Now evaluate $\int_0^\infty x e^{-4x}\,dx$.
5. What is wrong with the following calculation?
$$\int_{-\infty}^\infty x\,dx = \lim_{t \to \infty} \frac{1}{2}x^2 \Big|_{-t}^{t} = \lim_{t \to \infty} \frac{1}{2}(t^2 - t^2) = \lim_{t \to \infty} 0 = 0$$
6. Prove or disprove: if $\int f$ and $\int g$ are convergent improper integrals, so is $\int fg$.
7. Prove part of Theorem 4.32. Suppose $0 \le f(x) \le g(x)$ for all $x \in [a, b)$, and that $\int_a^b g$ is a convergent improper integral. Prove that $\int_a^b f$ converges and that $\int_a^b f \le \int_a^b g$.
Generalizing the Riemann Integral (non-examinable)
In the 1890s, Thomas Stieltjes²⁸ offered a generalization of the Riemann integral.
Definition 4.34. Let $\alpha$ be a monotonically increasing function on an interval $[a, b]$. Given a partition $P = \{x_0, \ldots, x_n\}$ of $[a, b]$ and a function $f$, define the differences
$$\Delta\alpha_i = \alpha(x_i) - \alpha(x_{i-1})$$
The upper/lower Darboux–Stieltjes sums/integrals are defined analogously to the pure Riemann case:
$$U(f, P, \alpha) = \sum_{i=1}^n M_i \,\Delta\alpha_i \qquad L(f, P, \alpha) = \sum_{i=1}^n m_i \,\Delta\alpha_i$$
$$U(f, \alpha) = \inf U(f, P, \alpha) \qquad L(f, \alpha) = \sup L(f, P, \alpha)$$
$f$ is Riemann–Stieltjes integrable of class $\mathcal{R}(\alpha)$ if $U(f, \alpha) = L(f, \alpha)$: we denote this value $\int_a^b f(x)\,d\alpha$.
The standard Riemann integral corresponds to α(x) = x. It is the ability to choose other functions α
that makes the Riemann–Stieltjes integral both powerful and applicable.
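The upper and lower sums of Definition 4.34 are easy to compute when $f$ is increasing, since then $M_i = f(x_i)$ and $m_i = f(x_{i-1})$. The following is a minimal sketch under that assumption. The function name `darboux_stieltjes_sums`, the uniform partition, and the test pair $f(x) = x$, $\alpha(x) = x^2$ on $[0, 1]$ are our own choices, not taken from the notes.

```python
def darboux_stieltjes_sums(f, alpha, a, b, n):
    """Upper and lower Darboux-Stieltjes sums on the uniform partition of [a, b].

    Assumes f is increasing, so that on each [x_{i-1}, x_i] the supremum
    M_i is f(x_i) and the infimum m_i is f(x_{i-1}).
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    upper = lower = 0.0
    for x_prev, x_next in zip(xs, xs[1:]):
        d_alpha = alpha(x_next) - alpha(x_prev)   # Delta alpha_i, non-negative
        upper += f(x_next) * d_alpha              # M_i * Delta alpha_i
        lower += f(x_prev) * d_alpha              # m_i * Delta alpha_i
    return upper, lower

for n in (10, 100, 1000):
    U, L = darboux_stieltjes_sums(lambda x: x, lambda x: x * x, 0.0, 1.0, n)
    print(n, L, U)
```

For this pair the gap $U - L$ equals $\frac{1}{n}\bigl(\alpha(1) - \alpha(0)\bigr) = \frac{1}{n}$, so the sums squeeze a common value (here $\frac{2}{3}$) as the partition is refined.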
Standard Properties Most results in sections 32 and 33 hold with suitable modifications, as does the discussion of improper integrals. For instance,
$$f \in \mathcal{R}(\alpha) \iff \forall \epsilon > 0,\ \exists P \text{ such that } U(f, P, \alpha) - L(f, P, \alpha) < \epsilon$$
The result regarding piecewise continuity of $f$ is a notable exception: if $f$ and $\alpha$ are simultaneously piecewise continuous then $f$ might not lie in $\mathcal{R}(\alpha)$.
Weighted integrals If $\alpha$ is differentiable, then we obtain a standard Riemann integral
$$\int_a^b f(x)\,d\alpha = \int_a^b f(x)\alpha'(x)\,dx$$
weighted so that $f(x)$ contributes more when $\alpha$ is increasing rapidly.
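The weighted-integral formula can be sanity-checked numerically. The sketch below compares a tagged Riemann–Stieltjes sum $\sum f(x_i^*)\,\Delta\alpha_i$ (which lies between the lower and upper sums for the same partition) with a midpoint Riemann sum of $f\alpha'$. The choices $f = \cos$, $\alpha(x) = x^3$ on $[0, 1]$ and the helper names are assumptions made purely for illustration.

```python
import math

def stieltjes_sum(f, alpha, a, b, n):
    """Tagged sum of f(x_i*) * (alpha(x_i) - alpha(x_{i-1})) with midpoint tags."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        left, right = a + i * h, a + (i + 1) * h
        total += f(0.5 * (left + right)) * (alpha(right) - alpha(left))
    return total

def riemann_sum(g, a, b, n):
    """Midpoint Riemann sum of g on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def alpha(x):
    return x ** 3

def alpha_prime(x):
    return 3 * x ** 2

lhs = stieltjes_sum(math.cos, alpha, 0.0, 1.0, 2000)
rhs = riemann_sum(lambda x: math.cos(x) * alpha_prime(x), 0.0, 1.0, 2000)
print(lhs, rhs)
```

Both numbers should agree to several decimal places with the exact value $3(2\cos 1 - \sin 1)$ obtained by integrating $3x^2 \cos x$ by parts.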
Probability If $\alpha(a) = 0$ and $\alpha(b) = 1$, then $\alpha$ may be viewed as a probability distribution function. Its derivative $\alpha'$ is the corresponding probability density function. For example:
1. The uniform distribution on $[a, b]$ has $\alpha(x) = \frac{1}{b-a}(x - a)$ so that
$$\int_a^b f(x)\,d\alpha = \frac{1}{b - a} \int_a^b f(x)\,dx$$
Since $\alpha'$ is constant, the integral weighs all values of $x$ uniformly.
2. The standard normal distribution has $\alpha(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$. The fact that $\alpha'(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is maximal when $x = 0$ reflects the fact that a normally distributed variable is clustered near its mean.
In all cases, $\int f(x)\,d\alpha = E(f(X))$ computes an expectation (see, for instance, Example 4.33.3).
²⁸ Stieltjes was Dutch; for the pronunciation try ‘steelchez.’
Non-differentiable α A major flexibility comes when we allow $\alpha$ to be non-differentiable, or even discontinuous! For example, given a partition $Q = \{s_0, \ldots, s_n\}$ of $[a, b]$, and a positive sequence $(c_k)_{k=1}^n$, define
$$\alpha(x) = \begin{cases} 0 & \text{if } x = a \\ \sum_{i=1}^k c_i & \text{if } x \in (s_{k-1}, s_k] \end{cases}$$
This is an increasing step function on $[a, b]$. The Riemann–Stieltjes integral becomes a weighted sum
$$\int_a^b f(x)\,d\alpha = \sum_{i=1}^n c_i f(s_i)$$
Taking instead an infinite sequence $(s_n) \subseteq [a, b]$ results in an infinite series, which helps explain why so many results for series and integrals look similar!
This also touches on probability. For example, let $p \in [0, 1]$, $n \in \mathbb{N}$, and $s_k = k$ on the interval $[0, n]$. If $c_k = \binom{n}{k} p^k (1 - p)^{n-k}$, then
$$\int f(x)\,d\alpha = \sum_{k=0}^n \binom{n}{k} p^k (1 - p)^{n-k} f(k) = E(f(X))$$
is the expectation of $f(X)$ when $X \sim B(n, p)$ is a binomially distributed random variable.
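The binomial expectation is just a finite weighted sum over $k = 0, \ldots, n$ with weights $\binom{n}{k} p^k (1-p)^{n-k}$, so it can be evaluated directly. In the sketch below the function name `binomial_expectation` and the parameter values $n = 10$, $p = 0.3$ are hypothetical choices for illustration.

```python
from math import comb

def binomial_expectation(f, n, p):
    """E(f(X)) for X ~ B(n, p), computed as the weighted sum over k = 0, ..., n."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) * f(k) for k in range(n + 1))

n, p = 10, 0.3
total = binomial_expectation(lambda k: 1, n, p)        # the weights sum to 1
mean = binomial_expectation(lambda k: k, n, p)         # E(X) = n p
second = binomial_expectation(lambda k: k * k, n, p)   # E(X^2) = n p (1 - p) + (n p)^2
print(total, mean, second)
```

The three values recover the standard facts that the weights form a probability distribution, that the mean is $np$, and that the variance is $np(1-p)$.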
Integrals and Convergence
The Lebesgue integral is another common generalization. Its main purpose is to permit the transfer of integrability to the limit of a sequence of integrable functions.²⁹ To see the problem, consider the sequence
$$f_n : [0, 1] \to \mathbb{R} : x \mapsto \begin{cases} 1 & \text{if } x = \frac{p}{q} \in \mathbb{Q} \text{ with } q \le n \\ 0 & \text{otherwise} \end{cases}$$
Each $f_n$ is piecewise continuous and thus Riemann integrable with $\int_0^1 f_n(x)\,dx = 0$. However, the pointwise limit of $f_n$ is the function
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases}$$
which is not Riemann integrable. In the Lebesgue theory, the limit $f$ turns out to be integrable with integral 0, so that
$$\lim_{n \to \infty} \int_0^1 f_n(x)\,dx = \int_0^1 \lim_{n \to \infty} f_n(x)\,dx$$
Recall that the interchange of limits and integrals would be automatic if the convergence $f_n \to f$ were uniform: of course the convergence isn’t uniform here.
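Both claims about the sequence $(f_n)$ can be explored with exact rational arithmetic: each $f_n$ is non-zero at only finitely many points, yet for every $n$ there is a rational where $f_n$ and the limit $f$ differ by 1. The sketch below models only rational inputs, and the helper names `f_n` and `exceptional_points` are our own.

```python
from fractions import Fraction

def f_n(n, x):
    """f_n(x) = 1 if the rational x has lowest-terms denominator q <= n, else 0.

    Only rational inputs (Fraction) are modelled; an irrational x would give 0.
    """
    return 1 if Fraction(x).denominator <= n else 0

def exceptional_points(n):
    """The finitely many x in [0, 1] at which f_n(x) = 1."""
    return sorted({Fraction(p, q) for q in range(1, n + 1) for p in range(q + 1)})

for n in (1, 2, 3, 4):
    print(n, [str(x) for x in exceptional_points(n)])

# The rational 1/(n+1) has f_n = 0 while the limit function has f = 1 there,
# so sup |f_n - f| = 1 for every n: the convergence cannot be uniform.
print([f_n(n, Fraction(1, n + 1)) for n in range(1, 6)])
```

Because each $f_n$ differs from the zero function at finitely many points, its Riemann integral is 0, whereas the limit picks up every rational in $[0, 1]$.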
²⁹ Recall how uniform convergence does this for continuity.