Math 140B - Notes

Neil Donaldson

Fall 2022

1 Continuity

The primary goal of this course is to make elementary calculus rigorous. We begin with a review of

some basic concepts and conventions.

Sets & Functions We are concerned with functions f : U → V where both U, V are subsets of the

real numbers R:

Domain dom( f ) = U; the inputs to f . Often implied to be the largest set on which a formula is

deﬁned. In calculus examples, the domain is typically a union of intervals of positive length.

Codomain codom( f ) = V. We often take V = R by default.

Range range( f ) = f (U) = {f (x) : x ∈ U}; the outputs of f and a subset of V.

Injectivity f is injective/one-to-one if f (x) = f (y) =⇒ x = y.

Surjectivity f is surjective/onto if f (U) = V.

Inverses f is bijective/invertible if it is injective and surjective. Equivalently, ∃f⁻¹ : V → U satisfying

∀u ∈ U, f⁻¹(f(u)) = u and ∀v ∈ V, f(f⁻¹(v)) = v

Example 1.1. The function defined by f(x) = 1/(x(x − 2)) has implied

dom(f) = R \ {0, 2} = (−∞, 0) ∪ (0, 2) ∪ (2, ∞)

range(f) = (−∞, −1] ∪ (0, ∞)

The function is neither injective nor surjective.

By restricting the domain/codomain, we obtain a bijection f̂ with

dom(f̂) = [1, 2) ∪ (2, ∞)

codom(f̂) = (−∞, −1] ∪ (0, ∞)

with inverse

f̂⁻¹(y) =
  1 + y⁻¹ √(y² + y) if y > 0
  1 − y⁻¹ √(y² + y) if y ≤ −1

Now dom(f̂⁻¹) = codom(f̂) and codom(f̂⁻¹) = dom(f̂).

[Figure: graphs of f(x) and f̂⁻¹(y)]
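The inverse in Example 1.1 can be sanity-checked numerically. The sketch below is only an illustration: it assumes f(x) = 1/(x(x − 2)) restricted to [1, 2) ∪ (2, ∞), and it writes the inverse in the single form 1 + √(1 + 1/y), which agrees with both cases above on the restricted codomain. The names `f`, `f_inv` and `max_round_trip_error` are hypothetical helpers, not part of the notes.

```python
import math


def f(x):
    """f(x) = 1 / (x (x - 2)), defined for x not in {0, 2}."""
    return 1.0 / (x * (x - 2.0))


def f_inv(y):
    """Inverse of f restricted to [1, 2) ∪ (2, ∞); needs y <= -1 or y > 0."""
    if not (y <= -1.0 or y > 0.0):
        raise ValueError("y lies outside the restricted codomain")
    return 1.0 + math.sqrt(1.0 + 1.0 / y)


def max_round_trip_error(xs):
    """Largest value of |f_inv(f(x)) - x| over the sample points xs."""
    return max(abs(f_inv(f(x)) - x) for x in xs)


# Sample points taken from both pieces of the restricted domain.
sample_points = [1.0, 1.25, 1.5, 1.9, 2.5, 3.0, 10.0]
error = max_round_trip_error(sample_points)
print("largest round-trip error:", error)
```

A small round-trip error on a handful of points is evidence, not a proof; the algebra (solve y = 1/(x² − 2x) for x ≥ 1) is what justifies the formula.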

Suprema and Inﬁma A set U ⊆ R is bounded above if it has an upper bound M:

∃M ∈ R such that ∀u ∈ U, u ≤ M

Axiom 1.2 (Completeness). If U ⊆ R is non-empty and bounded above then it has a least upper

bound, the supremum of U

sup U = min{M ∈ R : ∀u ∈ U, u ≤ M}

By convention, sup U = ∞ if U is unbounded above and sup ∅ = −∞; now every subset of R has a

supremum. Similarly, the inﬁmum of U is its greatest lower bound:

inf U =
  max{m ∈ R : ∀u ∈ U, u ≥ m} if U ≠ ∅ is bounded below
  −∞ if U ≠ ∅ is unbounded below
  ∞ if U = ∅

Examples 1.3. Here are five sets with their suprema and infima stated. You should be able to verify these assertions directly from the definitions.

U        {1, 2, 3, 4}    (0, 5)    (−∞, π]    R      {1/n : n ∈ N}
sup U    4               5         π          ∞      1
inf U    1               0         −∞         −∞     0

Note how the supremum/inﬁmum might or might not lie in the set itself.

Interiors, closures, boundaries and neighborhoods These last concepts might not be review, but

they will be used repeatedly.

Deﬁnition 1.4. Let U ⊆ R. A value a ∈ R is interior to U if it lies in some open subinterval of U:

∃δ > 0 such that (a − δ, a + δ) ⊆ U

A neighborhood of a is any set to which a is interior: the interval (a −δ, a + δ) is an open δ-neighborhood

of a. A punctured neighborhood of a is a neighborhood with a deleted.

The set of points interior to U is denoted U°.

A limit point of U is the limit of some sequence (xₙ) ⊆ U. The closure Ū is the set of limit points.

The boundary is the set ∂U = Ū \ U°.

Examples 1.5. 1. If U = [1, 3), then U° = (1, 3), Ū = [1, 3] and ∂U = {1, 3}.

2. Q° = ∅ and ∂Q = Q̄ = R.

3. (−3, 5) ∪ (5, 7] is a punctured neighborhood of 5.

17 Continuity of Functions

Everything in this section

should be review.

Deﬁnition 1.6. A function f : U → R is continuous at u ∈ U if either of the following hold:

1. For all sequences (xₙ) ⊆ U converging to u, the sequence (f(xₙ)) converges to f(u).

2. ∀ϵ > 0, ∃δ > 0 such that ∀x ∈ U, |x − u| < δ =⇒ |f(x) − f(u)| < ϵ.

A function f is continuous on U if it is continuous at every point u ∈ U.

Examples 1.7. 1. We prove that f(x) = x³ is continuous at u = 2.

(a) (Limit method) Let xₙ → 2. By the limit laws (i.e. lim(xₙ³) = (lim xₙ)³),

lim f(xₙ) = lim xₙ³ = (lim xₙ)³ = 2³ = f(2)

(b) (ϵ–δ method) Let ϵ > 0 be given and let δ = min{1, ϵ/19}. Then

|x − 2| < δ =⇒ |x − 2| < 1 =⇒ 1 < x < 3

from which

|x³ − 2³| = |x − 2| |x² + 2x + 2²| ≤ 19|x − 2| < 19δ ≤ ϵ

where we used the triangle inequality: |x² + 2x + 2²| ≤ |x|² + 2|x| + 4 < 9 + 6 + 4 = 19.

2. Let

g(x) =
  x sin(1/x) if x ≠ 0,
  0 if x = 0

Then g is continuous at x = 0. Again this can be done with limits or an ϵ–δ argument; both are essentially the squeeze theorem.

3. The function defined by

h(x) =
  1 + 2x² if x < 1
  2 − x if x ≥ 1

is discontinuous at x = 1.

(a) The sequence with xₙ = 1 − 1/n converges to 1, yet

lim h(xₙ) = 3 ≠ 1 = h(1)

(b) Choose ϵ = 1 and suppose δ > 0 is given. Now choose x = max{1 − δ/2, 1/√2} to see that

|x − 1| < δ and |h(x) − h(1)| ≥ 1 = ϵ

[Figure: graphs of g(x) and h(x)]
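The ϵ–δ bookkeeping for f(x) = x³ at u = 2 can be spot-checked by sampling. The following is a minimal sketch, assuming the choice δ = min{1, ϵ/19}; the helper names `delta_for` and `worst_gap` are hypothetical. Sampling finitely many points cannot replace the proof, but it catches a wrong constant quickly.

```python
def delta_for(eps):
    """The delta used in the epsilon-delta argument: min{1, eps/19}."""
    return min(1.0, eps / 19.0)


def worst_gap(eps, samples=10_000):
    """Largest sampled |x**3 - 8| over points strictly inside (2 - delta, 2 + delta)."""
    d = delta_for(eps)
    worst = 0.0
    for k in range(1, samples):
        x = 2.0 - d + 2.0 * d * k / samples
        worst = max(worst, abs(x**3 - 8.0))
    return worst


for eps in (100.0, 10.0, 1.0, 0.1, 0.001):
    print(eps, delta_for(eps), worst_gap(eps))
```

In every row the sampled gap stays below the corresponding ϵ, as the estimate |x³ − 8| ≤ 19|x − 2| predicts.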

Section numbers are identical to those in the ofﬁcial textbook.

Theorem 1.8. The two parts of Deﬁnition 1.6 are equivalent.

Proof. (1 ⇒ 2) We prove the contrapositive. Suppose condition 2 is false; that is,

∃ϵ > 0 such that ∀δ > 0, ∃x ∈ U with |x − u| < δ and |f(x) − f(u)| ≥ ϵ

In particular, for any n ∈ N we may let δ = 1/n to obtain

∃ϵ > 0 such that ∀n ∈ N, ∃xₙ ∈ U with |xₙ − u| < 1/n and |f(xₙ) − f(u)| ≥ ϵ

The sequence (xₙ) shows that condition 1 is false:

• ∀n, |xₙ − u| < 1/n, whence xₙ → u.

• ∀n, |f(xₙ) − f(u)| ≥ ϵ > 0, whence f(xₙ) does not converge to f(u).

(2 ⇒ 1) Suppose condition 2 is true, that (xₙ) ⊆ U converges to u and that ϵ > 0 is given. Then

∃δ > 0 such that |x − u| < δ =⇒ |f(x) − f(u)| < ϵ

However, by the definition of convergence (xₙ → u),

∃N ∈ N such that n > N =⇒ |xₙ − u| < δ =⇒ |f(xₙ) − f(u)| < ϵ

Otherwise said, f(xₙ) → f(u).

Rather than use these deﬁnitions every time, it is helpful to have a working dictionary.

Theorem 1.9 (Common Continuous Functions).

1. Suppose f and g are continuous at u, that h is continuous at f (u) and that k is constant. Then

the following are continuous at u (if deﬁned):

f + g, f − g, fg, f/g, kf, max(f, g), min(f, g), h ◦ f

2. Algebraic

functions are continuous.

3. The common transcendental functions are continuous: exp, ln, sin, etc.

Example 1.10. f (x) = sin

√

x−2

+ cos

−1

is continuous on its domain (−∞, 0) ∪ (0, 1) ∪ (1, ∞).

These claims are tedious to prove using elementary definitions. The first two require many uses of the limit laws, while the transcendental claim is easier to defer until we can define the common functions using power series, after which continuity comes for free.

Algebraic functions are those constructed using finitely many addition/subtraction, multiplication/division and nth root operations.

Exercises 17 1. Give examples to show that g ◦ f being continuous can happen with:

(a) f continuous and g discontinuous.

(b) g continuous and f discontinuous.

You may use pictures, but make sure they clearly describe the functions f , g.

2. (a) Prove that the function f (x) = x

is continuous at x = −2 using an ϵ–δ argument.

(b) Prove that f (x) = x

is continuous at x = u using an ϵ–δ argument.

3. Prove that the following are discontinuous at x = 0: use both deﬁnitions of continuity.

(a) f (x) = 1 for x < 0 and f (x) = 0 for x ≥ 0.

(b) g(x) = sin(1/x) for x = 0 and g(0) = 0.

4. Suppose f and g are continuous at u. Prove the following using ϵ –δ arguments.

(a) f − g is continuous at u.

(b) If h is continuous at f (u), then h ◦ f is continuous at u.

5. Contrary to our standing assumption, suppose f : U → R is a function whose domain U

contains an isolated point a: i.e. ∃r > 0 such that (a − r, a + r) ∩ U = {a}. Prove that f is

continuous at a.

6. Refresh your prerequisites by giving formal proofs of the following:

(a) (Suprema and sequences) If M = sup U, then ∃(xₙ) ⊆ U such that xₙ → M.

(Remember that this has to work even if M = ∞...)

(b) (Limit of a bounded sequence) If (xₙ) ⊆ [a, b] and xₙ → x, then x ∈ [a, b].

(c) (Convergent subsequences) If (xₙ) ⊆ [a, b], then (xₙ) has a convergent subsequence.

(Hint: If (xₙ) ⊆ [a, b], explain why there exists a family of nested intervals I₁ ⊇ I₂ ⊇ ··· such that infinitely many of the terms (xₙ) lie in each interval Iₖ. Hence obtain a subsequence (x_{n_k}) and prove that it is Cauchy.)

7. (Hard) Consider the function f : R → R where

f(x) =
  1/q whenever x = p/q ∈ Q with q > 0 and gcd(p, q) = 1
  0 if x ∉ Q

For example, f (1) = f (2) = f (−7) = 1, and f (

) = f (−

) = f (

) = ··· =

, etc. Prove that f

is continuous at each point of R \Q and discontinuous at each point of Q.

This is a good moment to review the notion of a Cauchy sequence

∀ϵ > 0, ∃N such that m, n > N =⇒ |xₙ − xₘ| < ϵ

and the discussion of Cauchy completeness: (xₙ) ⊆ R is convergent if and only if it is Cauchy.

18 Properties of Continuous Functions

The goal of this section is to describe the behavior of a continuous function on an interval. We ﬁrst

consider the special case when the domain is a closed bounded interval [a, b].

Theorem 1.11 (Extreme Value Theorem). A continuous function on a closed, bounded interval is

bounded and attains its bounds. Otherwise said, if f : [a, b] → R is continuous, then

∃x, y ∈ [a, b] such that f (x) = sup range( f ) and f (y) = inf range( f )

In particular, the supremum and inﬁmum are ﬁnite.

Proof. Suppose f is continuous with domain [a, b] and let M = sup{f (x) : x ∈ [a, b]}. We invoke the

three parts of Exercise 17.6:

• (Part a) There exists a sequence (xₙ) ⊆ [a, b] such that f(xₙ) → M.

• (Part c) There exists a convergent subsequence (x_{n_k}) with limit x.

• (Part b) x ∈ [a, b].

Since f is continuous, we now have f(x) = lim_{k→∞} f(x_{n_k}) = M. This shows that M is finite and that f attains its least upper bound. For the lower bound, apply this to −f.

It is worth considering how the result can fail when one of the hypotheses is weakened. For example:

f discontinuous f : [0, 1] → R with f(x) = x if x ≠ 1 and f(1) = 0 is bounded but does not attain its supremum.

dom(f) not closed f : [0, 1) → R : x ↦ x is bounded but does not attain its supremum.

dom(f) not bounded f : [0, ∞) → R : x ↦ x is unbounded.

We now generalize to functions on arbitrary intervals. Our next result should be familiar from elementary calculus and is intuitively obvious from the naïve notion of continuity: graph such a function without taking your pen from the page.

Theorem 1.12 (Intermediate Value Theorem). Let f : I → R be continuous on an interval I. Suppose a, b ∈ I with a < b and that f(a) ≠ f(b). If L lies between f(a) and f(b), then ∃ξ ∈ (a, b) such that f(ξ) = L.

Example 1.13. Let f (x) = cos x with a =

, b = 3π

and L =

; then

f (ξ) = L ⇐⇒ ξ ∈



5π

7π



There may therefore be several suitable values of ξ. It is

even possible (see Exercise 18.2) for there to be inﬁnitely

many.

[Figure: graph of f(x) = cos x with a and b marked]

Proof. Suppose WLOG that f (a) < L < f (b) and let

S = {x ∈ [a, b] : f (x) < L}

Plainly S ⊆ [a, b) is non-empty, hence ξ := sup S exists

and ξ ∈ [a, b]. It remains to show that ξ satisﬁes the

required properties.

By Exercise 6, ∃(sₙ) ⊆ S with lim sₙ = ξ. Since f is continuous, f(ξ) = lim f(sₙ) ≤ L. In particular, ξ ≠ b.

[Figure: graph of f on [a, b] with f(a) and f(b) marked]

To finish the proof, we can play a similar game with the sequence defined by tₙ = min{b, ξ + 1/n}; this is left to Exercise 4.

Example 1.14. The intermediate value theorem is particularly useful for demonstrating the existence of solutions to equations. For example, we can use the following steps to show that the equation x·2^x = 1 has a solution.

• g(x) = x·2^x − 1 is continuous.

• g(0) = −1 < 0.

• g(1) = 1 > 0.

• By the intermediate value theorem ∃ξ ∈ (0, 1) such that g(ξ) = 0: that is ξ·2^ξ = 1.

[Figure: graph of g(x)]

It is inefficient, but one can home in on ξ by repeatedly halving the size of the interval: for instance,

g(1/2) = 1/√2 − 1 < 0, g(3/4) = (3/4)·2^{3/4} − 1 ≈ 0.26 > 0 ... =⇒ 1/2 < ξ < 3/4
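The interval-halving idea can be written out as a short bisection routine. This is a sketch under the assumptions of Example 1.14 (g continuous, g(0) < 0 < g(1)); the function name `bisect` and the tolerance `tol` are illustrative choices rather than anything fixed by the notes.

```python
def g(x):
    """g(x) = x * 2**x - 1, whose zero solves x * 2**x = 1."""
    return x * 2.0**x - 1.0


def bisect(func, lo, hi, tol=1e-10):
    """Halve [lo, hi] until it is shorter than tol.

    Assumes func is continuous with func(lo) < 0 < func(hi), so the
    intermediate value theorem guarantees a zero inside every interval
    produced along the way.
    """
    if not (func(lo) < 0.0 < func(hi)):
        raise ValueError("need func(lo) < 0 < func(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if func(mid) < 0.0:
            lo = mid  # the sign change now lies in [mid, hi]
        else:
            hi = mid  # the sign change now lies in [lo, mid]
    return (lo + hi) / 2.0


xi = bisect(g, 0.0, 1.0)
print("approximate solution:", xi, "residual:", g(xi))
```

The first two halvings reproduce the bracketing 1/2 < ξ < 3/4 from the example; each further step adds roughly one binary digit of accuracy, which is why the method is reliable but slow.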

We ﬁnish with a useful corollary.

Corollary 1.15. Continuous functions map intervals to intervals (or points).

Proof. An interval I is characterized by the following property

∀x₁, x₂ ∈ I, x ∈ R, x₁ < x < x₂ =⇒ x ∈ I

Let f : I → R be continuous and suppose its range f (I) is not a single point. If f (a) < L < f (b), then

∃ξ between a, b such that f (ξ) = L. Otherwise said, L ∈ f (I) and so f (I) is an interval.

More generally, if dom(f) = ⋃ Iₙ is written as a union of disjoint intervals and f is continuous, then

range(f) = ⋃ f(Iₙ)

is also a union of intervals, though these need not be disjoint: a continuous function can bring intervals together, but cannot break an interval apart.

For example, f(x) = √(x² − 4) has domain (−∞, −2] ∪ [2, ∞) and range [0, ∞): both original intervals get mapped to the same interval by f.

A more general statement from topology says that if f : U → V is continuous between topological

spaces and a, b lie in the same component of U, then f (a) and f (b) lie in the same component of f (U).

In single-variable real analysis each component is an interval.

Exercises 18 1. Give examples of the following:

(a) An unbounded discontinuous function on a closed bounded interval.

(b) An unbounded continuous function on a non-closed bounded interval.

bounds.

2. Consider the function

f(x) =
  x sin(1/x) if x ≠ 0
  0 if x = 0

(a) Explain why f is continuous on any interval I.

(b) Suppose a < 0 < b and that f (a), f (b) have opposite signs. If L = 0, show that the

intermediate value theorem is satisﬁed by inﬁnitely many distinct values ξ.

3. Use the intermediate value theorem to prove that the equation 8x³ − 12x² − 2x + 1 = 0 has at least 3 real solutions (and thus, by the fundamental theorem of algebra, exactly 3).

4. Complete the proof of the intermediate value theorem by defining tₙ = min(b, ξ + 1/n).

5. (a) Suppose f : U → R is continuous and that U = ⋃_{k=1}^{n} Iₖ is the union of a finite sequence (Iₖ) of closed bounded intervals. Prove that f is bounded and attains its bounds.

(b) Let U =

∞

n=1

, where I

= [

2n−1

] for each n ∈ N. Give an example of a continuous

function f : U → R which is either unbounded or does not attain its bounds. Explain.

19 Uniform Continuity

Recall the ϵ–δ definition of continuity: for f : U → R to be continuous at all points y ∈ U, we require

∀y ∈ U, ∀ϵ > 0, ∃δ > 0 such that (∀x ∈ U) |x − y| < δ =⇒ |f(x) − f(y)| < ϵ

Note the order of the quantifiers: δ is permitted to depend on both y and ϵ. In the naïve sense of continuity (x close to y =⇒ f(x) close to f(y)), the meaning of ‘close’ is seen to depend on the location y. Uniform continuity is a stronger condition where the meaning of ‘close’ is independent of location.

Deﬁnition 1.16. f : U → R is uniformly continuous if

∀ϵ > 0, ∃δ > 0 such that (∀x, y ∈ U) |x − y| < δ =⇒ |f(x) − f(y)| < ϵ

We’ve included the (typically) hidden quantiﬁers (∀x, y) in both deﬁnitions to make clear that ϵ and

δ are independent of x and y. Note also that the deﬁnition is now symmetric in x and y.

Example 1.17. Consider f(x) = 1/x.

1. If 0 < a < b ≤ ∞, then f is uniformly continuous on [a, b).

Let ϵ > 0 be given and let δ = a²ϵ. Then ∀x, y ∈ [a, b),

|x − y| < δ =⇒ |1/x − 1/y| = |y − x|/(xy) ≤ |y − x|/a² < δ/a² = ϵ

2. If 0 < b ≤ ∞, then f is not uniformly continuous on (0, b).

Let ϵ = 1 and suppose δ > 0 is given. Let x = min(δ, 1, b/2) and y = x/2. Certainly x, y ∈ (0, b) and |x − y| = x/2 ≤ δ/2 < δ. However,

|f(x) − f(y)| = 1/x ≥ 1 = ϵ

[Figure: graph of f(x) = 1/x with a and b marked]

Think about how ϵ and δ must relate as one slides the intervals in the picture up/down and left/right.
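The failure of uniform continuity for 1/x near zero can be illustrated by computing, for several values of δ, a pair of points that are δ-close while their images are at least ϵ = 1 apart. The sketch assumes the witness pair x = min(δ, 1, b/2), y = x/2; the function name `witness` and the choice b = 5 are hypothetical.

```python
def f(x):
    """f(x) = 1/x on (0, b)."""
    return 1.0 / x


def witness(delta, b):
    """A pair (x, y) in (0, b) with |x - y| < delta but |f(x) - f(y)| >= 1."""
    x = min(delta, 1.0, b / 2.0)
    y = x / 2.0
    return x, y


b = 5.0
for delta in (1.0, 0.1, 1e-3, 1e-6):
    x, y = witness(delta, b)
    print(delta, abs(x - y), abs(f(x) - f(y)))
```

However small δ is made, the last column never drops below 1: no single δ works for ϵ = 1 across the whole interval (0, b).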

Some intuition will help make sense of the above examples.

Bounded/unbounded gradient In part 1, ϵ = δa⁻², where a⁻² = |f′(a)| bounds the gradient of f. By contrast, the slope of f is unbounded in part 2.

Extendability In part 1 (if b ≠ ∞), the domain of f may be extended: g : [a, b] → R : x ↦ 1/x is continuous. In part 2, this is impossible: there is no continuous function g : [0, b) → R such that g(x) = 1/x whenever x > 0.

If the gradient of a continuous function is bounded or if you can ‘fill in the holes’ at the endpoints of its domain, then the function is uniformly continuous. While the utility of uniform continuity is often in proofs when the independence of ϵ and location are critical, it is often one of the above properties that is being invoked. The remainder of this section involves making these observations watertight.

To promote the symmetry in the coming deﬁnition, we use y instead of u for a generic point of dom( f ).

Theorem 1.18. Let f : I → R be differentiable on an interval I. If the derivative f′ is bounded on the interior I°, then f is uniformly continuous on I.

The proof depends on the mean value theorem, which we’ll prove later in the term.

Proof. Suppose |f′(x)| ≤ M on I°. Let ϵ > 0 be given, let δ = ϵ/M and suppose x, y ∈ I with x > y. Then

|x − y| < δ =⇒ ∃ξ ∈ I° such that f′(ξ) = (f(x) − f(y))/(x − y) (MVT)

=⇒ |f(x) − f(y)| = |f′(ξ)| |x − y| ≤ M|x − y| < Mδ = ϵ

Theorem 1.18 isn’t a biconditional: for instance, Exercise 19.5 shows that f(x) = √x on [0, ∞) and g(x) = x^{1/3} on R are both uniformly continuous even though they have unbounded slope.

We now discuss the idea of extendability and how uniform continuity relates to continuity on closed

sets. First we see that for closed bounded sets, uniform continuity is nothing new.

Theorem 1.19. If g : [a, b] → R is continuous, then it is uniformly continuous.

Proof. Suppose g is continuous but not uniformly so. Then

∃ϵ > 0 such that ∀δ > 0, ∃x, y ∈ [a, b] for which |x − y| < δ and |g(x) − g(y)| ≥ ϵ (∗)

For each n ∈ N, let δ = 1/n to see that there exist sequences (xₙ), (yₙ) ⊆ [a, b] satisfying the above. By Bolzano–Weierstraß, the bounded sequence (xₙ) has a convergent subsequence x_{n_k} → x ∈ [a, b]. Clearly

|x_{n_k} − y_{n_k}| < 1/n_k → 0 =⇒ y_{n_k} → x

But then, since g is continuous at x, |g(x_{n_k}) − g(y_{n_k})| → 0, which contradicts (∗).

Now we build to a partial converse of this.

Lemma 1.20. If f : U → R is uniformly continuous and (xₙ) ⊆ U is a Cauchy sequence, then (f(xₙ)) is also Cauchy.

Proof. Let ϵ > 0 be given. Then:

• (Uniform Continuity) ∃δ > 0 such that |x − y| < δ =⇒ |f(x) − f(y)| < ϵ.

• (Cauchy) ∃N ∈ N such that m, n > N =⇒ |xₙ − xₘ| < δ.

Putting these together, we see that

∃N ∈ N such that m, n > N =⇒ |f(xₙ) − f(xₘ)| < ϵ

Otherwise said, (f(xₙ)) is Cauchy.

We now see that a function f : I → R is uniformly continuous on a bounded interval if and only if it has a continuous extension g : Ī → R defined on the closure of its domain.

Theorem 1.21. Suppose f : I → R is continuous where I is a bounded interval with endpoints a < b. Define g : [a, b] → R via

g(x) =
  f(x) if x ∈ I
  lim f(xₙ) whenever (xₙ) ⊆ I and xₙ → a
  lim f(xₙ) whenever (xₙ) ⊆ I and xₙ → b

Then f is uniformly continuous if and only if g is well-defined (g is continuous, if well-defined).

Proof. (⇒) Suppose f is uniformly continuous on I and that a ∉ I. Let (xₙ), (yₙ) ⊆ I be sequences converging to a. To show that g is well-defined, we must prove that (f(xₙ)) and (f(yₙ)) are convergent, and to the same limit.

Define a sequence

(uₙ) = (x₁, y₁, x₂, y₂, x₃, y₃, ...)

Since (xₙ) and (yₙ) have the same limit a, we see that uₙ → a. But then (uₙ) is Cauchy; by Lemma 1.20, (f(uₙ)) is also Cauchy and thus convergent. Since (f(xₙ)) and (f(yₙ)) are subsequences of a convergent sequence, they must also converge to the same (finite!) limit.

The argument when b ∉ I is identical.

(⇐) Certainly if g is well-deﬁned then it is continuous. By Theorem 1.19 it is uniformly so. Since

f = g on a subset of dom(g), the same choice of δ will work for f as for g: f is therefore

uniformly continuous.

Examples 1.22. 1. f : x ↦ x² is uniformly continuous on (−3, 10) since its derivative f′(x) = 2x is bounded (|f′(x)| = 2|x| ≤ 20) on its domain. It has the obvious continuous extension g(x) = x² on [−3, 10].

2. Neither argument works for f(x) = x² on the domain (−3, ∞): both f′ and the domain (−3, ∞) are unbounded, so neither Theorem 1.18 nor 1.21 applies.

Instead, note that if ϵ = 1, then for any δ > 0, we can choose x = 1/δ + δ/2 and y = 1/δ. Clearly |x − y| = δ/2 < δ and

|x² − y²| = 1 + δ²/4 > 1 = ϵ

whence f is not uniformly continuous.

3. f(x) = x sin(1/x) is continuous on the interval (0, ∞). Strictly, neither Theorem 1.18 nor 1.21 applies since the derivative

f′(x) = sin(1/x) − (1/x) cos(1/x)

is unbounded as is the domain. However, by breaking the domain into two pieces...

• On (1, ∞), the derivative is bounded: |f′(x)| ≤ 1 + 1/x ≤ 2 by the triangle inequality. Theorem 1.18 says f is uniformly continuous on (1, ∞).

• f is continuous on (0, 1] and, by the squeeze theorem,

xₙ → 0 =⇒ lim f(xₙ) = 0

Extending f so that f(0) = 0 defines a continuous extension. By Theorem 1.21, f is uniformly continuous on (0, 1].

Putting this together, f is uniformly continuous on (0, ∞). Indeed the function

h(x) =
  x sin(1/x) if x ≠ 0
  0 if x = 0

is uniformly continuous on R.
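The computation showing that x² is not uniformly continuous on an unbounded domain can be verified with exact rational arithmetic, which avoids the rounding that floating point would introduce when squaring numbers of size 1/δ. This is a sketch assuming the witness pair x = 1/δ + δ/2, y = 1/δ; the names `witness` and `gap` are hypothetical.

```python
from fractions import Fraction


def witness(delta):
    """Points x, y with |x - y| = delta/2 < delta, lying far out when delta is small."""
    x = 1 / delta + delta / 2
    y = 1 / delta
    return x, y


def gap(delta):
    """Exact value of |x**2 - y**2| for the witness pair."""
    x, y = witness(delta)
    return abs(x * x - y * y)


for delta in (Fraction(1), Fraction(1, 10), Fraction(1, 10**6)):
    print(delta, gap(delta), gap(delta) == 1 + delta**2 / 4)
```

The exact gap equals 1 + δ²/4 for every δ, so it always exceeds ϵ = 1 even though the two points are less than δ apart.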

Exercises 19 1. Which of the following functions are uniformly continuous on the speciﬁed set?

Justify your answers.

(a) f (x) = x

on [−1, 1].

(b) f (x) = x

on (−1, 1].

−4

on ( 0, 2].

(d) f (x) = x

−4

on ( 1, 2].

(e) f (x) = x

sin

on ( 0, 1].

2. Prove that each of the following functions is uniformly continuous on the indicated set by

verifying the ϵ–δ property.

(a) f (x) = 2x −14 on R.

(b) f (x) = x

on [1, 5].

−1

on ( 1, ∞).

(d) f(x) = (x + 1)/(x + 2) on [0, 1].

3. Prove that f (x) = x

is not uniformly continuous on R.

4. (a) Suppose that f is uniformly continuous on a bounded interval I. Prove that f is bounded

on I.

(b) Use part (a) to write down a bounded interval on which the function f (x) = tan x is

deﬁned, but not uniformly continuous.

5. (a) Let f(x) = √x with domain [0, ∞). Show that f′(x) is unbounded, but that f is still uniformly continuous on [0, ∞).

(Hint: try δ = ϵ² and WLOG assume 0 ≤ y ≤ x. Now compute (√y + ϵ)²...)

(b) Prove that g(x) = x^{1/3} is uniformly continuous on R.

(Hint: try δ = (ϵ/2)³ and consider the cases x ≥ y ≥ 0, x ≤ y ≤ 0 and x > 0 > y separately)

20 Limits of Functions

You’ve likely seen many calculations of the following form in elementary calculus:

lim_{x→3} (x² − 9)/(x − 3) = lim_{x→3} (x − 3)(x + 3)/(x − 3) = lim_{x→3} (x + 3) = 6

Our next goal is to make this notation precise and to tie it to our earlier notion of limit.

Definition 1.23. Suppose f : U → R, that S ⊆ U, and that a is the limit of a sequence in S. We write lim_{x→a^S} f(x) = L and say that L is the limit of f(x) as x tends to a along S, provided

∀(xₙ) ⊆ S, lim xₙ = a =⇒ lim f(xₙ) = L

We can now define one-sided and two-sided limits:

Right-hand limit: lim_{x→a⁺} f(x) = L means ∃S = (a, b) ⊆ U for which lim_{x→a^S} f(x) = L

Left-hand limit: lim_{x→a⁻} f(x) = L means ∃S = (c, a) ⊆ U for which lim_{x→a^S} f(x) = L

Two-sided limit: lim_{x→a} f(x) = L means ∃S = (c, a) ∪ (a, b) ⊆ U for which lim_{x→a^S} f(x) = L

• The one-sided definitions apply when a = ±∞, though we omit the ± modifiers: for instance,

lim_{x→∞} f(x) = L ⇐⇒ lim_{x→∞^S} f(x) = L for some S = (c, ∞) ⊆ U

• The subtlety in the definition is that for lim_{x→a} f(x) to be defined, the domain U of f must contain a punctured neighborhood S of a: i.e. a ∈ U°. The one-sided limits similarly require a one-sided punctured neighborhood. These conditions are always satisfied if U is a disjoint union of intervals of positive length, in which case lim_{x→a^{(±)}} f(x) = L if and only if

lim f(xₙ) = L, ∀(xₙ) ⊆ U \ {a} tending to a (from above/below)

In this situation, Definition 1.6 recovers the familiar idea from elementary calculus:

f is continuous at a ∈ U ⇐⇒ f(a) =
  lim_{x→a} f(x) when a ∈ U°
  lim_{x→a^±} f(x) when a ∈ U ∩ ∂U
(∗)

• By modifying the proof of Theorem 1.8 in the case that a, L ∈ R are finite, the above can be written in ϵ-language. For example lim_{x→a} f(x) = L means

∀ϵ > 0, ∃δ > 0 such that (∀x ∈ R) 0 < |x − a| < δ =⇒ |f(x) − L| < ϵ

If a and/or L is infinite, use the language of unboundedness: e.g. lim_{x→a} f(x) = ∞ means

∀M > 0, ∃δ > 0 such that 0 < |x − a| < δ =⇒ f(x) > M

There are fifteen distinct combinations: three two-sided and six each of the one-sided limits!

I.e. a ∈ S̄ or perhaps a = ±∞ if S is unbounded.

Examples 1.24. 1. Let f(x) = (2 + x)/x where dom(f) = U = R \ {0} = (−∞, 0) ∪ (0, ∞)

The following should be clear:

lim_{x→3} f(x) = 5/3, lim_{x→∞} f(x) = 1

To compute the first, for instance, we could choose S = (0, 3) ∪ (3, ∞); if (xₙ) ⊆ S and xₙ → 3, then the limit laws justify the first claim

lim_{n→∞} f(xₙ) = (2 + 3)/3 = 5/3

as does the fact that f is continuous at x = 3. The second claim can be checked similarly.

We can take one-sided limits at x = 0:

lim_{x→0⁺} f(x) = ∞ and lim_{x→0⁻} f(x) = −∞

For instance, let (xₙ) ⊆ (0, ∞) satisfy xₙ → 0. Again, the limit laws show that lim_{n→∞} f(xₙ) = ∞, which is enough to justify the first claim.

Finally, the sequences defined by xₙ = 1/n and yₙ = −1/n both lie in S = R \ {0} and converge to zero, yet

lim_{n→∞} f(xₙ) = ∞ ≠ −∞ = lim_{n→∞} f(yₙ)

It follows that the two-sided limit lim_{x→0} f(x) does not exist.

[Figure: graph of f(x) with the values f(xₙ) and f(yₙ) marked]

2. Let f(x) = 1/x² whenever x ≠ 0 and additionally let f(0) = 0. Here the two-sided limit exists

lim_{x→0} f(x) = ∞

However the value of the function at x = 0 does not equal this limit: clearly f is discontinuous at x = 0.

3. We revisit our motivating example. Let f(x) = (x² − 9)/(x − 3) have domain U = R \ {3}. Whenever xₙ ≠ 3, we see that

f(xₙ) = (xₙ − 3)(xₙ + 3)/(xₙ − 3) = xₙ + 3

By the limit laws, we conclude that lim f(xₙ) = 3 + 3 = 6 whenever xₙ → 3, and so

lim_{x→3} (x² − 9)/(x − 3) = 6
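The sequence definition of a limit lends itself to a quick numerical illustration: evaluate the function along sequences that tend to 3 without ever equalling 3. The sketch below assumes the motivating function (x² − 9)/(x − 3) and the two sample sequences 3 ± 1/n; the helper name `values_along` is hypothetical. Two sequences cannot establish a limit, since the definition quantifies over all of them, but they show what the algebra predicts.

```python
def f(x):
    """f(x) = (x**2 - 9) / (x - 3); undefined at x = 3."""
    return (x * x - 9.0) / (x - 3.0)


def values_along(seq, terms):
    """Evaluate f at the terms seq(n) for each n in terms."""
    return [f(seq(n)) for n in terms]


terms = [10, 100, 1000, 10_000]
from_above = values_along(lambda n: 3.0 + 1.0 / n, terms)
from_below = values_along(lambda n: 3.0 - 1.0 / n, terms)

print("x_n = 3 + 1/n:", from_above)
print("x_n = 3 - 1/n:", from_below)
```

Both lists approach 6, consistent with f(xₙ) = xₙ + 3 whenever xₙ ≠ 3.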

Since we referenced the limit laws so often in the above examples, it is appropriate to update them

to this new context. We do so without proof.

Corollary 1.25 (Limit Laws). Suppose f, g : U → R are such that L = lim_{x→a} f(x) and M = lim_{x→a} g(x) exist. Then,

1. lim_{x→a} (f + g)(x) = L + M.

2. lim_{x→a} (fg)(x) = LM.

3. lim_{x→a} (f/g)(x) = L/M (requires M ≠ 0).

4. If L ∈ R and h is continuous at L, then lim_{x→a} (h ◦ f)(x) = h(L).

5. (Squeeze Theorem) If L = M and f(x) ≤ h(x) ≤ g(x) for all x ∈ U, then lim_{x→a} h(x) = L.

The corresponding results for one-sided limits also hold.

As with the original limit laws for sequences, parts 1–3 apply provided the limits are not indeterminate forms (e.g. ∞ − ∞, 0 · ∞, ∞/∞). We’ll see later how l’Hôpital’s rule may be applied to such cases.

Examples 1.26. 1. Since f(x) = (x² + 5)/(3x² − 2) is a rational function (continuous at all points of its domain), we quickly conclude that

lim_{x→2} (x² + 5)/(3x² − 2) = f(2) = 9/10

Alternatively, we may tediously invoke the other parts of the theorem:

lim_{x→2} (x² + 5)/(3x² − 2)
  = lim(x² + 5) / lim(3x² − 2)    (part 3)
  = (lim x² + lim 5) / (lim 3x² − lim 2)    (part 1)
  = ((lim x)² + 5) / ((lim 3)(lim x)² − 2)    (part 2)
  = (2² + 5)/(3·2² − 2) = 9/10

2. As x → ∞, the simplistic approach results in a nonsense indeterminate form:

lim_{x→∞} (x² + 5)/(3x² − 2) = lim(x² + 5) / lim(3x² − 2) = ∞/∞

However, a little pre-theorem algebra quickly yields

lim_{x→∞} (x² + 5)/(3x² − 2) = lim_{x→∞} (1 + 5x⁻²)/(3 − 2x⁻²) = lim(1 + 5x⁻²) / lim(3 − 2x⁻²) = 1/3

Be careful! The expressions (x² + 5)/(3x² − 2) and (1 + 5x⁻²)/(3 − 2x⁻²) do not describe the same function, yet their limits at ∞ are equal. Being able easily to equate these limits is one of the advantages of the ‘∃S’ formulation of Definition 1.23. Think about why; what is a suitable set S in this context?

Classiﬁcation of Discontinuities

We ﬁnish this section by considering the ways in which a function can fail to be continuous.

Deﬁnition 1.27. Suppose that a function is continuous on an interval except at ﬁnitely many values:

we call these isolated discontinuities.

Examples 1.28. 1. f(x) = 1/x has a discontinuity at x = 0 since it is continuous on the interval R, except at one point x = 0. Note that a function need not be defined at a discontinuity!

2. f(x) = 1/sin(1/x) has a non-isolated discontinuity at x = 0: on any interval containing zero, f has infinitely many discontinuities: x = 1/(πn) where n ∈ N.

The next result helps us classify isolated discontinuities.

Theorem 1.29. Let f : U → R and suppose a ∈ U° is an interior point. Then

lim_{x→a} f(x) = L ⇐⇒ lim_{x→a⁺} f(x) = L = lim_{x→a⁻} f(x)

Proof. (⇒) Let S = (c, a) ∪ (a, b) satisfy the definition for lim_{x→a} f(x) = L. Since any sequence (say) in S⁺ is also in S, plainly S⁺ = (a, b) and S⁻ = (c, a) satisfy the one-sided definitions.

(⇐) Suppose S⁻ = (c, a) and S⁺ = (a, b) satisfy the one-sided definitions and denote S = S⁻ ∪ S⁺. Let (xₙ) ⊆ S be such that xₙ → a. Clearly (xₙ) is the disjoint union of two subsequences (xₙ) ∩ S⁺ and (xₙ) ∩ S⁻, both of which converge to a. There are three cases:

L finite: Let ϵ > 0 be given. Because of the one-sided limits,

• ∃N₁ such that n > N₁ and xₙ > a =⇒ |f(xₙ) − L| < ϵ

• ∃N₂ such that n > N₂ and xₙ < a =⇒ |f(xₙ) − L| < ϵ

Now let N = max(N₁, N₂) in the definition of limit to see that lim f(xₙ) = L. Since this holds for all sequences (xₙ) ⊆ S converging to a, we conclude that lim_{x→a} f(x) = L.

L = ±∞: This is an exercise.

Example 1.30. Recalling elementary calculus, we show that the following is continuous at x = 1:

f(x) =
  x² − 3 if x ≥ 1
  3 − 5x if x < 1

Step 1: Compute the left- and right-handed limits and check that these are equal:

lim_{x→1⁻} f(x) = lim_{x→1⁻} (3 − 5x) = −2, lim_{x→1⁺} f(x) = lim_{x→1⁺} (x² − 3) = −2

Step 2: Check that the value of the limits equals that of the function: f(1) = 1² − 3 = −2.

It is possible for one of these subsequences to be finite; say if xₙ > a for all large n. This is of no concern; one of the ϵ-N conditions would be empty and thus vacuously true.

Recalling (∗) on page 13, we describe the different types of isolated discontinuity at some point a.

Removable discontinuity The two-sided limit lim_{x→a} f(x) = L is finite, and either f(a) ≠ L or f(a) is undefined. The term comes from the fact that we can remove the discontinuity by changing the behavior of f only at x = a:

f̃(x) :=
  f(x) if x ≠ a
  lim_{x→a} f(x) if x = a

is now continuous at x = a. In the pictures,

f₁(x) = (x² − 9)/(x − 3) and f₂(x) =
  x sin(1/x) if x ≠ 0
  1 if x = 0

have removable discontinuities at x = 3 and 0 respectively.

[Figure: graphs of f₁(x) and f₂(x)]

Jump Discontinuity The one-sided limits are finite but not equal. A jump discontinuity cannot be removed by changing or inserting a value at x = a. The picture shows

g(x) =
  1 if x > 0
  −1 if x < 0

with a jump discontinuity at x = 0.

[Figure: graph of g(x)]

Infinite discontinuity The one-sided limits exist but at least one is infinite. We call the line x = a a vertical asymptote. The picture shows h(x) = 1/x² with an infinite discontinuity at x = 0. The fact that the one-sided limits of h are equal (and infinite) is irrelevant.

[Figure: graph of h(x)]

Essential discontinuity At least one of the one-sided limits does not exist. The picture shows j(x) = sin(1/x) for which neither of the limits lim_{x→0±} j(x) exists.

[Figure: graph of j(x)]

It is also reasonable to refer to removable, inﬁnite or essential discontinuities at interval endpoints.

Exercises 20 1. For the function f (x) =

, determine the limits lim

x→∞

f (x), lim

x→−∞

f (x), lim

x→0

−

f (x),

lim

x→0

f (x) and lim

x→0

f (x), if they exist.

2. Evaluate the following limits using the methods of this section

(a) lim

x→a

√

x −

√

x −a

(b) lim

x→a

−3/2

− a

−3/2

x −a

x→0

√

1 + 3x

−1

(d) lim

x→−∞

√

4 + 3x

−2

3. Suppose that the limits L = lim

x→a

f (x) and M = lim

x→a

g(x) exist.

(a) Suppose f (x) ≤ g(x) for all x in some interval (a, b). Prove that L ≤ M.

(b) Do we have the same conclusion if we have f (x) < g(x) on (a, b), or can we conclude that

L < M? Prove your assertion, or give a counter-example.

4. Suppose that lim

x→∞

f (x) = lim

x→∞

g(x) = ∞. Using only this information, which of the following

can you evaluate? Prove your assertions in each case.

(a) lim

x→∞

( f + g)(x) (b) lim

x→∞

( f − g)(x)

x→∞

( f g)(x) (d) lim

x→∞

( f /g)(x)

5. Complete the proof of Theorem 1.29 by considering the L = ±∞ cases.

6. Graph f : R → R, ﬁnd and identify the types of its discontinuities.

f (x) =











0 x = 0, ±1

0 <

< 1

> 1

7. Find the discontinuities and identify their types for the following function

f (x) =

(

sin

if x < 0 or x > 1

if 0 < x ≤ 1

8. Let a ∈ U°. Verify the claim following Definition 1.23: lim_{x→a} f(x) = L if and only if

∀ϵ > 0, ∃δ > 0 such that 0 < |x − a| < δ =⇒ |f(x) − L| < ϵ

9. Recall Exercise 17.5, where we saw that a function f : U → R is continuous at any isolated

point a ∈ U.

(a) Any function with domain dom(f) = Z is continuous everywhere! Explain why we cannot define any limits lim_{x→a^{(±)}} f(x) for such a function.

(Hint: Being unable to define a limit is different from saying lim f(x) = DNE: see page 13.)

(b) Suppose g(x) = x

h(x) has dom(g) = {0} ∪ {

: n ∈ Z}, where h is any function taking

values in the interval [−1, 1]. Explain why g is continuous at every point of its domain.

(These awkward examples of continuity can be avoided if we follow our usual approach where a domain is a union of intervals of positive length. This restriction is essentially baked into Definition 1.23.)

2 Sequences and Series of Functions

If $(f_n)$ is a sequence of functions, what should we mean by $\lim f_n$? This question is of great relevance to the history of calculus; Isaac Newton's work in the late 1600s made great use of power series, which are naturally constructed as limits of sequences of polynomials.

For instance, for each $n \in \mathbb{N}_0$, we might consider the polynomial function $f_n : \mathbb{R} \to \mathbb{R}$ defined by

$f_n(x) = \sum_{k=0}^{n} x^k = 1 + x + \cdots + x^n$

This is easy to work with, to differentiate and integrate using the power law. What, however, are we

to make of the following series?

$f(x) := \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + \cdots$

Does this make sense? What is its domain? Does it equal the limit of the sequence $(f_n)$ in a meaningful way? Is it continuous, differentiable or integrable? Can we compute its limit/derivative/integral term-by-term in the obvious way; for instance, is it legitimate to write

$f'(x) = \sum_{n=1}^{\infty} n x^{n-1} = 1 + 2x + 3x^2 + \cdots$?

To many in Newton’s time, these questions were of diminished importance when compared to the

burgeoning applications of calculus to the natural sciences. However, for the 18th and 19th century

mathematicians who followed, the widespread application of calculus only increased the imperative

to rigorously address these issues.

23 Power Series

First we recall some of the important deﬁnitions, examples and results regarding inﬁnite series.

Definition 2.1. Let $(b_n)_{n=m}^{\infty}$ be a sequence of real numbers. The (infinite) series $\sum b_n$ is the limit of the sequence $(s_n)$ of partial sums,

$s_n = \sum_{k=m}^{n} b_k = b_m + b_{m+1} + \cdots + b_n, \qquad \sum_{n=m}^{\infty} b_n = \lim_{n\to\infty} s_n$

The series $\sum b_n$ converges, diverges to infinity or diverges by oscillation if the sequence $(s_n)$ does so.

$\sum b_n$ is absolutely convergent if $\sum |b_n|$ converges. A convergent series that is not absolutely convergent is conditionally convergent.

Recall that every sequence $(s_n)$ has subsequences tending to each of

$\limsup s_n = \lim_{N\to\infty} \sup\{s_n : n > N\}$ and $\liminf s_n = \lim_{N\to\infty} \inf\{s_n : n > N\}$

If $(s_n)$ converges, or diverges to $\pm\infty$, then $\lim s_n = \limsup s_n = \liminf s_n$. The remaining case, divergence by oscillation, is when $\liminf s_n \neq \limsup s_n$: there exist (at least) two subsequences tending to different limits.

Examples 2.2. These examples form the standard reference dictionary for analysis of more complex

series. Make sure you are familiar with them!

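1. (Geometric series) Let $r \neq 1$ be a constant, then $s_n = \sum_{k=0}^{n} r^k = \frac{1-r^{n+1}}{1-r}$. It follows that

$\sum_{n=0}^{\infty} r^n \quad \begin{cases} \text{converges (absolutely) to } \frac{1}{1-r} & \text{if } -1 < r < 1 \\ \text{diverges to } \infty & \text{if } r \ge 1 \\ \text{diverges by oscillation} & \text{if } r \le -1 \end{cases}$

2. (Telescoping series) If $b_n = \frac{1}{n(n+1)}$, then $s_n = \sum_{k=1}^{n} b_k = 1 - \frac{1}{n+1} \implies \sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 1$.

3. $\sum_{n=1}^{\infty} \frac{1}{n^2}$ is (absolutely) convergent. In fact $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$, though checking this explicitly is tricky.

4. (Harmonic series) $\sum_{n=1}^{\infty} \frac{1}{n}$ is divergent to $\infty$.

5. (Alternating harmonic series) $\sum_{n=1}^{\infty} \frac{(-1)^n}{n}$ is conditionally convergent.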
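As a numerical sanity check of these reference examples, the following sketch computes partial sums directly. It is only an illustration: the helper name `partial_sum`, the cutoff n = 1000 and the choice r = 1/2 are our own assumptions, and finite partial sums can suggest, but never prove, convergence or divergence.

```python
import math

def partial_sum(term, start, n):
    """Return s_n = term(start) + ... + term(n)."""
    return sum(term(k) for k in range(start, n + 1))

n = 1000
geometric = partial_sum(lambda k: 0.5 ** k, 0, n)             # r = 1/2; limit 1/(1 - r) = 2
telescoping = partial_sum(lambda k: 1 / (k * (k + 1)), 1, n)  # equals 1 - 1/(n + 1)
squares = partial_sum(lambda k: 1 / k ** 2, 1, n)             # increases towards pi^2 / 6
harmonic = partial_sum(lambda k: 1 / k, 1, n)                 # unbounded; exceeds ln(n)
alternating = partial_sum(lambda k: (-1) ** k / k, 1, n)      # tends to -ln(2)

print(geometric, telescoping, squares, harmonic, alternating)
```

Increasing n shows the harmonic partial sums creeping upwards roughly like ln n, while the others settle down.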

Theorem 2.3 (Root Test). Given a series $\sum b_n$, let $\beta = \limsup |b_n|^{1/n}$.

• If β < 1 then the series converges absolutely.

• If β > 1 then the series diverges.

We give sketch proofs, and/or refer you to a standard ‘test.’ Review these if you are unfamiliar.

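1. $s_n - r s_n = 1 + r + \cdots + r^n - (r + \cdots + r^n + r^{n+1}) = 1 - r^{n+1} \implies s_n = \frac{1-r^{n+1}}{1-r}$

2. $b_n = \frac{1}{n} - \frac{1}{n+1} \implies s_n = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n} - \frac{1}{n+1}\right) = 1 - \frac{1}{n+1}$

3. Use the comparison or integral tests. Alternatively: For each $n \ge 2$, we have $\frac{1}{n^2} < \frac{1}{n(n-1)}$. By part 2,

$s_n = \sum_{k=1}^{n} \frac{1}{k^2} < 1 + \sum_{k=2}^{n} \frac{1}{k(k-1)} \le 2$

Since $(s_n)$ is a monotone up sequence, bounded above by 2, we conclude that $\sum \frac{1}{n^2}$ is convergent.

4. Use the integral test. Alternatively, observe that

$s_{2^{n+1}} - s_{2^n} = \sum_{k=2^n+1}^{2^{n+1}} \frac{1}{k} \ge \frac{2^n}{2^{n+1}} = \frac{1}{2} \implies s_{2^n} \ge \frac{n}{2} \xrightarrow[n\to\infty]{} \infty$

Since $s_n = \sum_{k=1}^{n} \frac{1}{k}$ defines an increasing sequence we conclude that $s_n \to \infty$.

5. Use the alternating series test, or explicitly check that both the even and odd partial sums $(s_{2n})$ and $(s_{2n+1})$ are convergent (monotone and bounded) to the same limit.

Root Test: $\beta < 1 \implies \exists \epsilon > 0$ such that $|b_n|^{1/n} \le 1 - \epsilon$ (for large n) $\implies \sum |b_n|$ converges by comparison with the convergent geometric series $\sum (1-\epsilon)^n$.

$\beta > 1 \implies$ a subsequence of $(|b_n|^{1/n})$ converges to $\beta > 1$, whence $b_n \not\to 0 \implies \sum b_n$ diverges (n-th term test).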

The root test is inconclusive if β = 1. Some simple inequalities

yield a simpler test.

Corollary 2.4 (Ratio Test). Given a series $\sum b_n$:

• If $\limsup \left|\frac{b_{n+1}}{b_n}\right| < 1$ then $\sum b_n$ converges absolutely.

• If $\liminf \left|\frac{b_{n+1}}{b_n}\right| > 1$ then $\sum b_n$ diverges.

We can now properly deﬁne and analyze our main objects of interest.

Definition 2.5. A power series centered at $c \in \mathbb{R}$ is a formal expression

$\sum_{n=m}^{\infty} a_n (x-c)^n$

where $(a_n)_{n=m}^{\infty}$ is a sequence of real numbers and x is considered a variable.

It is common to refer simply to a series, and modify by inﬁnite/power when clarity requires. We

almost always have m = 0 or 1, and it is common for examples to be centered at c = 0.

Example 2.6. Using the geometric series formula, we see that

$\sum_{n=0}^{\infty} \frac{(-1)^n}{2^n}(x-4)^n = \frac{1}{1 - \frac{-(x-4)}{2}} = \frac{2}{x-2}$ whenever $\left|-\frac{x-4}{2}\right| < 1 \iff 2 < x < 6$

The series is valid (converges) only on a small subinterval of the implied domain of the function $x \mapsto \frac{2}{x-2}$. The behavior of both as $x \to 2^+$ should not be a surprise; evaluating the power series at x = 2 results in the divergent infinite series

$\sum 1 = +\infty$

By contrast, as $x \to 6^-$, we see that limits and infinite series do not interact the way we might expect,

$\lim_{x\to 6^-} \sum_{n=0}^{\infty} \frac{(-1)^n}{2^n}(x-4)^n = \lim_{x\to 6^-} \frac{2}{x-2} = \frac{1}{2}$

$\sum_{n=0}^{\infty} \lim_{x\to 6^-} \frac{(-1)^n}{2^n}(x-4)^n = \sum (-1)^n = \text{DNE}$

with the last divergent by oscillation.

[Figure: graphs of the power series and of $f(x) = \frac{2}{x-2}$]

As the example shows, we cannot in general take limits inside an infinite sum; understanding when we can do this is one of our primary goals.

$\liminf \left|\frac{b_{n+1}}{b_n}\right| \le \liminf |b_n|^{1/n} \le \limsup |b_n|^{1/n} \le \limsup \left|\frac{b_{n+1}}{b_n}\right|$

Radius and Interval of Convergence

At any real number x, a series may converge absolutely, converge conditionally, diverge to ±∞, or

diverge by oscillation. A series deﬁnes a function whose implied domain is the set on which the

series converges. In the previous example, the domain was the interval (2, 6). By applying the root test (Theorem 2.3), we can show that the domain of every power series is an interval.

Theorem 2.7 (Root Test for Power Series). Given a series $\sum_{n=0}^{\infty} a_n (x-c)^n$, define

$R = \frac{1}{\limsup |a_n|^{1/n}}$

Exactly one of the following is true:

$R = \infty$: the series converges absolutely for all $x \in \mathbb{R}$

$R = 0$: the series converges only when x = c

$R \in \mathbb{R}^+$: the series converges absolutely when $|x-c| < R$ and diverges when $|x-c| > R$

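Proof. For each fixed $x \in \mathbb{R}$, let $b_n = a_n (x-c)^n$ and apply the root test to $\sum b_n$, noting that

$\limsup |b_n|^{1/n} = \begin{cases} 0 & \text{if } x = c \text{ or } R = \infty \\ \infty & \text{if } x \neq c \text{ and } R = 0 \\ \limsup |a_n|^{1/n}\,|x-c| = \frac{1}{R}|x-c| & \text{otherwise} \end{cases}$

In the final situation, $\limsup |b_n|^{1/n} < 1 \iff |x-c| < R$, etc.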

Deﬁnition 2.8. The radius of convergence is the value R deﬁned in Theorem 2.7. The interval of

convergence is the set of values x for which the series converges; the implied domain.

Radius of convergence      Interval of convergence

$\infty$                   $\mathbb{R} = (-\infty, \infty)$

$0$                        $\{c\}$

$R \in \mathbb{R}^+$       $(c-R, c+R)$, $(c-R, c+R]$, $[c-R, c+R)$, or $[c-R, c+R]$

In the third case convergence/divergence at the endpoints of the interval of convergence must be

tested separately.

By applying Corollary 2.4, we obtain a more user-friendly result.

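Corollary 2.9 (Ratio Test for Power Series). If the limit exists, $R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right|$.

In the definition of R in Theorem 2.7, since $\limsup |a_n|^{1/n} \ge 0$, we here adopt the convention that $\frac{1}{0} = \infty$ and $\frac{1}{\infty} = 0$. With similar caveats, it is also reasonable to write

$R = \liminf |a_n|^{-1/n}$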
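The two formulas for R can be explored numerically. The sketch below evaluates $|a_n|^{-1/n}$ and $|a_n / a_{n+1}|$ at a single large n for the coefficient sequences $a_n = 1/n$ and $a_n = 1/n!$. The function names and the choice n = 10000 are our own assumptions, logarithms are used only to avoid overflow, and a single finite n merely hints at the limit.

```python
import math

def root_estimate(log_abs_a, n):
    """|a_n|^(-1/n), computed from log|a_n| to avoid overflow."""
    return math.exp(-log_abs_a(n) / n)

def ratio_estimate(log_abs_a, n):
    """|a_n / a_(n+1)|, again computed via logarithms."""
    return math.exp(log_abs_a(n) - log_abs_a(n + 1))

def log_recip(n):
    """log|a_n| for a_n = 1/n (radius of convergence 1)."""
    return -math.log(n)

def log_recip_factorial(n):
    """log|a_n| for a_n = 1/n! (radius of convergence infinity)."""
    return -math.lgamma(n + 1)

n = 10_000
r_root = root_estimate(log_recip, n)             # n^(1/n), slowly approaching 1
r_ratio = ratio_estimate(log_recip, n)           # (n + 1)/n, approaching 1
r_root_fact = root_estimate(log_recip_factorial, n)    # (n!)^(1/n), unbounded
r_ratio_fact = ratio_estimate(log_recip_factorial, n)  # n + 1, unbounded

print(r_root, r_ratio, r_root_fact, r_ratio_fact)
```

Both estimates agree in the limit when the ratio limit exists, though the ratio version typically settles down faster.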

Examples 2.10. 1. The series $\sum_{n=1}^{\infty} \frac{1}{n} x^n$ is centered at 0. The ratio test tells us that

$R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty} \frac{1/n}{1/(n+1)} = \lim_{n\to\infty} \frac{n+1}{n} = 1$

Test the endpoints of the interval of convergence separately:

x = 1: $\sum \frac{1}{n} = \infty$ diverges

x = −1: $\sum \frac{(-1)^n}{n}$ converges (conditionally)

We conclude that the interval of convergence is [−1, 1).

It can be seen that the series converges to $-\ln(1-x)$ on its interval of convergence. As in Example 2.6, this function has a larger domain, $(-\infty, 1)$, than that of the series.

[Figure: graphs of $y = \sum_{n=1}^{\infty} \frac{1}{n} x^n$ and $y = -\ln(1-x)$]

2. The series $\sum_{n=1}^{\infty} \frac{1}{n^2} x^n$ similarly has

$R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty} \frac{(n+1)^2}{n^2} = 1$

Since $\sum \frac{1}{n^2}$ is absolutely convergent, we conclude that the power series also converges absolutely at x = ±1; the interval of convergence is [−1, 1].

3. The series $\sum_{n=0}^{\infty} \frac{1}{n!} x^n$ converges absolutely for all $x \in \mathbb{R}$, since

$R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty} \frac{(n+1)!}{n!} = \lim_{n\to\infty} (n+1) = \infty$

You should recall from elementary calculus that this series converges to the natural exponential function $\exp(x) = e^x$ everywhere on $\mathbb{R}$; indeed this is one of the common definitions of the exponential!

4. The series $\sum_{n=0}^{\infty} n!\,x^n$ has R = 0, and thus only converges at its center x = 0.

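5. Let $a_n = \left(\frac{2}{3}\right)^n$ if n is even and $\left(\frac{3}{2}\right)^n$ if n is odd. If we try to apply the ratio test to the series $\sum_{n=0}^{\infty} a_n x^n$, we see that

$\left|\frac{a_n}{a_{n+1}}\right| = \begin{cases} \left(\frac{2}{3}\right)^{2n+1} & \text{if } n \text{ even} \\ \left(\frac{3}{2}\right)^{2n+1} & \text{if } n \text{ odd} \end{cases} \implies \limsup \left|\frac{a_n}{a_{n+1}}\right| = \infty \neq 0 = \liminf \left|\frac{a_n}{a_{n+1}}\right|$

The ratio test therefore fails. However, by the root test,

$|a_n|^{1/n} = \begin{cases} \frac{2}{3} & \text{if } n \text{ even} \\ \frac{3}{2} & \text{if } n \text{ odd} \end{cases} \implies R = \frac{1}{\limsup |a_n|^{1/n}} = \frac{1}{3/2} = \frac{2}{3}$

It is easy to check that the series diverges at $x = \pm\frac{2}{3}$; the interval of convergence is $\left(-\frac{2}{3}, \frac{2}{3}\right)$.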

With the help of the root test, we can understand the domain of a power series. The issues of limits,

continuity, differentiability and integrability are more delicate. We will return to these once we’ve

developed some of the ideas around convergence for sequences of functions.

Exercises 23 1. For each power series, ﬁnd the radius and interval of convergence:

(a)

∑

( −1)

(b)

∑

( n + 1)

(x −3)

(c)

∑

√

(d)

∑

√

(x + 7)

(e)

∑

(x −π)

(f)

∑

√

2n+1

2. For each n ∈ N let a



4+2(−1)



(a) Find lim sup

1/n

, lim inf

1/n

, lim sup



n+1



and lim inf



n+1



(b) Do the series

∑

and

∑

( −1)

converge? Why?

∑

3. Suppose that $\sum a_n x^n$ has radius of convergence R. If $\limsup |a_n| > 0$, prove that R ≤ 1.

4. On the interval $\left(-\frac{2}{3}, \frac{2}{3}\right)$, express the series in Example 2.10.5 as a simple function.

(Hints: Use geometric series formulæ and the fact that the value of an absolutely convergent series is

independent of rearrangements)

5. Consider the power series

∞

∑

n=1

(x −7)

5n+1

(x −7) +

(x −7)

+ ···

Since only one in five of the terms are non-zero, it is a little tricky to analyze using a naive application of our standard tests.

(a) Explain why the ratio test for power series (Corollary 2.9) does not apply.

(b) Writing the series as

∑

(x −7)

, observe that







m−1

(m−1)

if m ≡ 1 mod 5

0 otherwise

Use the root test (Theorem 2.7) and your understanding of elementary limits to directly

compute the radius of convergence.

∑

(x −7)

5n+1

∑

. Apply the ratio test for inﬁnite series (Corol-

lary 2.4): what do you observe? Use your observation to compute the radius of conver-

gence of the original series in a simpler manner than part (a).

(d) Finally, check the endpoints to determine the interval of convergence.

24 Uniform Convergence

In this section we consider sequences $(f_n)$ of functions $f_n : U \to \mathbb{R}$.

Example 2.11. For each $n \in \mathbb{N}$, consider $f_n : (0, 1) \to \mathbb{R} : x \mapsto x^n$

[Figure: graphs of several of the functions $f_n(x)$ on (0, 1)]

There turn out to be several good notions of convergence for sequences of functions; the simplest is where, for each x, $(f_n(x))$ is treated as a separate sequence of real numbers.

Definition 2.12. Suppose a function f and a sequence of functions $(f_n)$ are given. We say that $(f_n)$ converges pointwise to f on U if,

$\forall x \in U,\ \lim_{n\to\infty} f_n(x) = f(x)$

It is common to write '$f_n \to f$ pointwise.' For reference, we state two equivalent rephrasings:

1. $\forall x \in U,\ |f_n(x) - f(x)| \xrightarrow[n\to\infty]{} 0$

2. $\forall x \in U,\ \forall \epsilon > 0,\ \exists N$ such that $n > N \implies |f_n(x) - f(x)| < \epsilon$.

As we'll see shortly, the relative position of the quantifiers ($\forall x$ and $\exists N$) is crucial: in this definition, the value of N is permitted to depend on x as well as $\epsilon$.

Example (2.11, mk. II). The sequence $(f_n)$ converges pointwise on the domain (0, 1) to

$f : (0, 1) \to \mathbb{R} : x \mapsto 0$

We prove this explicitly as a sanity check. First observe that

$|f_n(x) - f(x)| = x^n$

Suppose $x \in (0, 1)$, that $\epsilon > 0$ is given, and let $N = \frac{\ln \epsilon}{\ln x}$. Then

$n > N \implies n \ln x < \ln \epsilon \implies x^n < \epsilon$

where the inequality switches sign since $\ln x < 0$.

[Figure: graphs of $f_n(x)$ on (0, 1) for increasing n]
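To see how the threshold N in this argument depends on x, the following sketch tabulates $N = \ln\epsilon / \ln x$ for a fixed $\epsilon$ and several sample points. The function name `threshold` and the sample values are our own illustrative choices, not part of the notes.

```python
import math

def threshold(x, eps):
    """N = ln(eps) / ln(x) for 0 < x < 1 and 0 < eps < 1."""
    return math.log(eps) / math.log(x)

eps = 1e-3
for x in (0.5, 0.9, 0.99, 0.999):
    N = threshold(x, eps)
    n = math.floor(N) + 1   # smallest integer strictly greater than N
    assert x ** n < eps     # the conclusion of the proof, checked numerically
    print(f"x = {x}: N = {N:.1f}")
```

The printed thresholds grow without bound as x approaches 1, which is the first hint that no single N can serve every x simultaneously.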

The example is nice in that a sequence of continuous functions converges pointwise to a continuous

function. Unfortunately, this desirable situation is not universal.

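Example (2.11, mk. III). Extend the domain to include x = 1; define

$g_n : (0, 1] \to \mathbb{R} : x \mapsto x^n$

Each $g_n$ is a continuous function, however the pointwise limit

$g(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 & \text{if } x = 1 \end{cases}$

has a jump discontinuity at x = 1.

[Figure: graphs of $g_n(x)$ on (0, 1] for increasing n]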

With the goal of having convergence of functions preserve continuity, we make a tighter deﬁnition.

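Definition 2.13. $(f_n)$ converges uniformly to f on U if either

1. $\sup_{x\in U} |f_n(x) - f(x)| \xrightarrow[n\to\infty]{} 0$, or,

2. $\forall \epsilon > 0,\ \exists N$ such that $\forall x \in U,\ n > N \implies |f_n(x) - f(x)| < \epsilon$

A common notation is $f_n \rightrightarrows f$, though we won't use it.

[Figure: a band of width $2\epsilon$ around the graph of f(x), containing the graph of $f_n(x)$]

Whenever n > N, the graph of $f_n(x)$ must lie between those of $f(x) \pm \epsilon$.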

We’ll show that statements 1 and 2 are equivalent momentarily. For the present, compare with the

corresponding statements for pointwise convergence:

• As with continuity versus uniform continuity, the distinction comes in the order of the quantiﬁers:

in uniform convergence, x is quantiﬁed after N and so the same N works for all x.

• Uniform convergence implies pointwise convergence.

For the last time, we revisit our main example.

Example (2.11, mk. IV). If $f_n : (0, 1) \to \mathbb{R} : x \mapsto x^n$ and f are defined as before, then the pointwise convergence $f_n \to f$ is non-uniform. We show this using both criteria.

1. For every n,

$\sup_{x\in(0,1)} |f_n(x) - f(x)| = \sup\{x^n : 0 < x < 1\} = 1 \not\to 0$

2. Suppose the convergence were uniform and let $\epsilon = \frac{1}{2}$. Then

$\exists N \in \mathbb{N}$ such that $\forall x \in (0, 1),\ n > N \implies x^n < \frac{1}{2}$

Since $N \in \mathbb{N}$, a simple choice results in a contradiction;

$x = \left(\frac{1}{2}\right)^{\frac{1}{N+1}} \in (0, 1) \implies x^{N+1} = \frac{1}{2} = \epsilon$
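The failure of uniformity can also be seen numerically: for any n and any $\epsilon \in (0, 1)$, the point $x = \epsilon^{1/n}$ lies in (0, 1) and satisfies $x^n = \epsilon$, so $x^n$ is never below $\epsilon$ on the whole interval. The sketch below is only an illustration; the function name `witness` and the default $\epsilon = 1/2$ are our own assumptions.

```python
def witness(n, eps=0.5):
    """A point x in (0, 1) with x**n equal to eps, up to rounding."""
    return eps ** (1.0 / n)

for n in (1, 10, 100, 1000):
    x = witness(n)
    print(f"n = {n}: x = {x:.6f}, x**n = {x ** n:.6f}")
```

The witnesses drift towards 1 as n grows, matching the picture that the graphs of $x^n$ only stay large near the right-hand end of the interval.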

Theorem 2.14. The criteria for uniform convergence in Deﬁnition 2.13 are equivalent.

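Proof. (1 ⇒ 2) This follows from the fact that

$\forall x \in U,\ |f_n(x) - f(x)| \le \sup_{x\in U} |f_n(x) - f(x)|$

(2 ⇒ 1) Suppose $\epsilon > 0$ is given. Then

$\exists N \in \mathbb{R}$ such that $\forall x \in U,\ n > N \implies |f_n(x) - f(x)| < \frac{\epsilon}{2}$

But then

$n > N \implies \sup_{x\in U} |f_n(x) - f(x)| \le \frac{\epsilon}{2} < \epsilon$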

Somewhat amazingly, the subtle change of deﬁnition results in the preservation of continuity.

Theorem 2.15. Suppose that $(f_n)$ is a sequence of continuous functions. If $f_n \to f$ uniformly, then f is continuous.

Proof. We demonstrate the continuity of f at $a \in U$. Let $\epsilon > 0$ be given.

• Since $f_n \to f$ uniformly,

$\exists N$ such that $\forall x \in U,\ n > N \implies |f(x) - f_n(x)| < \frac{\epsilon}{3}$

• Choose any n > N. Since $f_n$ is continuous at a,

$\exists \delta > 0$ such that $|x-a| < \delta \implies |f_n(x) - f_n(a)| < \frac{\epsilon}{3}$ (†)

Simply put these together with the triangle inequality to see that

$|x-a| < \delta \implies |f(x) - f(a)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(a)| + |f_n(a) - f(a)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon$

We need not have fixed a at the start of the proof. Rewriting (†) to become

$\exists \delta > 0$ such that $\forall x, a \in U,\ |x-a| < \delta \implies |f_n(x) - f_n(a)| < \frac{\epsilon}{3}$

proves a related result.

Corollary 2.16. If $f_n \to f$ uniformly where each $f_n$ is uniformly continuous, then f is uniformly continuous.

Examples 2.17. 1. Let f

(x) = x +

. This is continuous on R for all x, and converges pointwise

to the continuous function f : x 7→ x.

(a) On any bounded interval [−M, M] the convergence f

→ f is uniform,

sup

x∈[−M,M]

(x) − f (x)

= sup



: x ∈ [−M, M]



−−−→

n→∞

(b) On any unbounded interval, R say, the convergence is non-uniform,

sup

x∈R

(x) − f (x)

= sup



: x ∈ R



= ∞

2. Consider $f_n(x) = \frac{1}{1+x^n}$; this is continuous on (−1, ∞) and converges pointwise to

$f(x) = \begin{cases} 0 & \text{if } x > 1 \\ \frac{1}{2} & \text{if } x = 1 \\ 1 & \text{if } -1 < x < 1 \end{cases}$

We consider the convergence $f_n \to f$ on several intervals.

[Figure: graphs of $f_n(x)$ for −1 < x ≤ 3]

(a) On [2, ∞), the pointwise limit is continuous. Moreover, $f_n(x)$ is decreasing, whence

$\sup_{x\in[2,\infty)} |f_n(x) - 0| = \frac{1}{1+2^n} \xrightarrow[n\to\infty]{} 0$

and the convergence is uniform. Alternatively; if $\epsilon \in (0, 1)$, let $N = \log_2(\epsilon^{-1} - 1)$, then

$\forall x \ge 2,\ n > N \implies |f_n(x) - 0| = \frac{1}{1+x^n} \le \frac{1}{1+2^n} < \frac{1}{1+2^N} = \epsilon$

The same argument shows that $f_n \to f$ uniformly on any interval [a, ∞) where a > 1.

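(b) On [1, ∞) the convergence is not uniform, since the pointwise limit is discontinuous,

$f(x) = \begin{cases} 0 & \text{if } x > 1 \\ \frac{1}{2} & \text{if } x = 1 \end{cases}$

Explicitly,

$\sup_{x\in[1,\infty)} |f_n(x) - f(x)| = \sup\left\{\frac{1}{1+x^n} : x > 1\right\} = \frac{1}{2}$, which does not tend to 0 as $n \to \infty$.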

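(d) Similarly, for any $a \in (0, 1)$, the convergence $f_n \to f$ is uniform on [0, a], this time to the (continuous) constant function f(x) = 1,

$\sup_{x\in[0,a]} |f_n(x) - 1| = \left|1 - \frac{1}{1+a^n}\right| = \frac{a^n}{1+a^n} \xrightarrow[n\to\infty]{} 0$

(e) Finally, on (−1, 1) the convergence is not uniform,

$\sup_{x\in[0,1)} |f_n(x) - f(x)| = \sup\left\{\frac{x^n}{1+x^n} : x \in [0, 1)\right\} = \frac{1}{2}$, which does not tend to 0 as $n \to \infty$.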

Exercises 24 1. For each sequence of functions deﬁned on [0, ∞):

(i) Find the pointwise limit f (x) as n → ∞.

(ii) Determine whether $f_n \to f$ uniformly on [0, 1].

(iii) Determine whether $f_n \to f$ uniformly on [1, ∞).

(a) f

(x) =

(b) f

(x) =

1 + x

(x) =

n + x

(d) f

(x) =

1 + nx

(e) f

(x) =

1 + nx

2. Let $f_n(x) = \left(x - \frac{1}{n}\right)^2$. If $f(x) = x^2$, we clearly have $f_n \to f$ pointwise on any domain.

(a) Prove that the convergence is uniform on [−1, 1].

(b) Prove that the convergence is non-uniform on R.

3. For each sequence, ﬁnd the pointwise limit and decide if the convergence is uniform.

(a) $f_n(x) = \frac{1 + 2\cos^2(nx)}{\sqrt{n}}$ for $x \in \mathbb{R}$.

(b) $f_n(x) = \cos^n(x)$ on [−π/2, π/2].

4. For each $n \in \mathbb{N}$, consider the continuous function

$f_n : [0, 1] \to \mathbb{R} : x \mapsto n x^n (1 - x)$

(a) Given 0 ≤ x < 1, let $a \in (x, 1)$. Explain why $\exists N$ such that

$n > N \implies |f_{n+1}(x)| \le a\,|f_n(x)|$

Hence conclude that the pointwise limit of $(f_n)$ is the zero function.

(b) Use elementary calculus ($f_n'(x) = 0 \iff \ldots$) to prove that the maximum value of $f_n$ is located at $x_n = \frac{n}{1+n}$. Hence compute

$\sup_{x\in[0,1]} |f_n(x) - f(x)|$

and use it to show that the convergence $f_n \to 0$ is non-uniform.

This shows that the converse to Theorem 2.15 is false, even on a bounded interval: the continuous sequence $(f_n)$ converges non-uniformly to a continuous function. Sketches of several $f_n$ are below.

[Figure: four panels showing graphs of $y = f_n(x)$ on [0, 1]]

5. Explain where the proof of Theorem 2.15 fails if $f_n \to f$ non-uniformly.

25 More on Uniform Convergence

While we haven’t yet developed calculus, our familiarity with basic differentiation and integration

makes it natural to pause to consider the interaction of these operations with sequences of functions.

We also consider a Cauchy-criterion for uniform convergence, which leads to the useful Weierstraß

M-test.

Example 2.18. Recall that $f_n(x) = x^n$ converges uniformly to f(x) = 0 on any interval [0, a] where a < 1. We easily check that

$\int_0^a f_n(x)\,dx = \frac{1}{n+1} a^{n+1} \xrightarrow[n\to\infty]{} 0 = \int_0^a f(x)\,dx$

In fact the sequence of derivatives converges here also

$f_n'(x) = n x^{n-1} \xrightarrow[n\to\infty]{} 0 = f'(x)$

It is perhaps surprising that integration interacts more nicely with uniform limits than does differen-

tiation. We therefore consider integration ﬁrst.

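Theorem 2.19. Let $f_n \to f$ uniformly on [a, b] where the functions $f_n$ are integrable. Then f is integrable on [a, b] and

$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$

Proof. Given $\epsilon > 0$, note that $\int_a^b \frac{\epsilon}{2(b-a)}\,dx = \frac{\epsilon}{2}$. Since $f_n \to f$ uniformly, $\exists N$ such that

$\forall x \in [a, b],\ n > N \implies |f_n(x) - f(x)| < \frac{\epsilon}{2(b-a)}$

$\implies f_n(x) - \frac{\epsilon}{2(b-a)} < f(x) < f_n(x) + \frac{\epsilon}{2(b-a)}$

$\implies \int_a^b f_n(x)\,dx - \frac{\epsilon}{2} \le \int_a^b f(x)\,dx \le \int_a^b f_n(x)\,dx + \frac{\epsilon}{2}$

$\implies \left|\int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx\right| \le \frac{\epsilon}{2} < \epsilon$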

The appearance of uniform convergence in the proof is subtle. If $N = N(\epsilon)$ were allowed to depend on x, then the integral $\int_a^b f_n(x)\,dx$ would be meaningless: Which n would we consider? Larger than $N(x, \epsilon)$ for which x? Taking n 'larger' than all the $N(x, \epsilon)$ might produce the absurdity n = ∞!

The integral inequality in the proof assumes that f is already integrable. Once we've properly defined (Riemann) integrability at the end of the course, we can insert the following

$\int_a^b f_n(x)\,dx - \frac{\epsilon}{2} \le L(f) \le U(f) \le \int_a^b f_n(x)\,dx + \frac{\epsilon}{2} \implies 0 \le U(f) - L(f) \le \epsilon \implies U(f) = L(f)$

where U(f) and L(f) are the upper and lower Darboux integrals of f; their equality shows that f is integrable on [a, b].

Examples 2.20. 1. Uniform convergence is not required for the integrals to converge as we'd like. For instance, recall that extending the previous example to the domain [0, 1] results in non-uniform convergence; however, we still have

$\int_0^1 f_n(x)\,dx = \frac{1}{n+1} \xrightarrow[n\to\infty]{} 0 = \int_0^1 f(x)\,dx$

2. Obtaining a sequence of functions $f_n \to f$ for which $\int f_n \not\to \int f$ requires a bit of creativity. Consider the sequence

$f_n : [-1, 1] \to \mathbb{R} : x \mapsto \begin{cases} n - n^2 x & \text{if } 0 < x < \frac{1}{n} \\ 0 & \text{otherwise} \end{cases}$

If 0 < x < 1, then for large $n \in \mathbb{N}$ we have

$x \ge \frac{1}{n} \implies f_n(x) = 0$

[Figure: graph of $f_n(x)$ on [−1, 1]]

We conclude that $f_n \to 0$ pointwise. Since the area under $f_n$ is a triangle with base $\frac{1}{n}$ and height n, the integral is constant and non-zero;

$\int_{-1}^{1} f_n(x)\,dx = \frac{1}{2} \neq 0 = \int_{-1}^{1} f(x)\,dx$

The convergence $f_n \to 0$ is non-uniform; can you see why?
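A short numerical experiment makes the spike example concrete. The sketch assumes the spike is $f_n(x) = n - n^2 x$ on $(0, 1/n)$ and zero elsewhere (consistent with the triangle of base 1/n and height n described above); the midpoint-rule integrator and its step count are our own illustrative choices.

```python
def f(n, x):
    """Spike f_n(x) = n - n^2 x on (0, 1/n), and 0 elsewhere on [-1, 1]."""
    return n - n * n * x if 0 < x < 1 / n else 0.0

def midpoint_integral(g, a, b, steps=200_000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# The integrals stay near 1/2 for every n ...
integrals = [midpoint_integral(lambda x, n=n: f(n, x), -1.0, 1.0) for n in (1, 5, 25)]
# ... while the values at any fixed x are eventually zero.
values_at_fixed_x = [f(n, 0.1) for n in (1, 5, 25, 125)]

print(integrals)
print(values_at_fixed_x)
```

The spike becomes taller and narrower while keeping its area, so the integrals do not follow the pointwise limit.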

Derivatives and Uniform Limits We’ve already seen that a uniform limit of differentiable functions

might be differentiable (Example 2.18), but this shouldn’t be expected in general since even uniform

limits of differentiable functions can have corners!

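Example 2.21. For each $n \in \mathbb{N}$, consider the function

$f_n : [-1, 1] \to \mathbb{R} : x \mapsto \begin{cases} |x| & \text{if } |x| \ge \frac{1}{n} \\ \frac{n}{2} x^2 + \frac{1}{2n} & \text{if } |x| < \frac{1}{n} \end{cases}$

• $f_n$ converges pointwise to $f(x) = |x|$

• $f_n \to f$ uniformly since

$\sup_{x\in[-1,1]} |f_n(x) - f(x)| = \frac{1}{2n} \to 0$

• Each $f_n$ is differentiable: $f_n'(x) = \begin{cases} 1 & \text{if } x \ge \frac{1}{n} \\ nx & \text{if } |x| < \frac{1}{n} \\ -1 & \text{if } x \le -\frac{1}{n} \end{cases}$

• The uniform limit f is not differentiable at x = 0.

[Figure: graphs of $f_n(x)$ and $f_n'(x)$ on [−1, 1]]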

Transferring differentiability to the limit of a sequence of functions is a bit messy.

Theorem 2.22. Suppose $(f_n)$ is a sequence of functions and f, g are functions on [a, b] for which:

• $f_n \to f$ pointwise;

• Each $f_n$ is differentiable with continuous derivative;

• $f_n' \to g$ uniformly.

Then $f_n \to f$ uniformly on [a, b] and f is differentiable with derivative g.

The issue in the previous example is that the pointwise limit of the derived sequence $(f_n')$ is discontinuous at x = 0 and therefore $f_n' \to g$ isn't uniform!

Proof. For any $x \in [a, b]$, the fundamental theorem of calculus tells us that

$\int_a^x f_n'(t)\,dt = f_n(x) - f_n(a)$

By Theorem 2.19, the left side converges to $\int_a^x g(t)\,dt$, while the right converges to $f(x) - f(a)$. Since $f_n' \to g$ uniformly, we see that g is continuous and we can apply the fundamental theorem again: $\int_a^x g(t)\,dt = f(x) - f(a)$ is differentiable with derivative g.

The uniformity of the convergence $f_n \to f$ follows from Exercise 10.

Uniformly Cauchy Sequences and the Weierstraß M-Test

Recall that one may use Cauchy sequences to demonstrate convergence without knowing the limit in

advance. An analogous discussion is available for sequences of functions.

Definition 2.23. A sequence of functions $(f_n)$ is uniformly Cauchy on U if

$\forall \epsilon > 0,\ \exists N \in \mathbb{N}$ such that $\forall x \in U,\ m, n > N \implies |f_n(x) - f_m(x)| < \epsilon$

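Example 2.24. Let $f_n(x) = \sum_{k=1}^{n} \frac{1}{k^2} \sin k^2 x$ be defined on $\mathbb{R}$. Given $\epsilon > 0$, let $N = \frac{1}{\epsilon}$, then

$m > n > N \implies |f_m(x) - f_n(x)| = \left|\sum_{k=n+1}^{m} \frac{1}{k^2} \sin k^2 x\right| \le \sum_{k=n+1}^{m} \frac{1}{k^2} \le \sum_{k=n+1}^{m} \frac{1}{k(k-1)} = \sum_{k=n+1}^{m} \left(\frac{1}{k-1} - \frac{1}{k}\right) = \frac{1}{n} - \frac{1}{m} < \frac{1}{N} = \epsilon$

whence $(f_n)$ is uniformly Cauchy.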

Without the continuity assumption, the fundamental theorem of calculus doesn't apply and the proof requires an alternative approach. One can also weaken the hypotheses: if $f_n' \to g$ uniformly and $(f_n(x))$ converges for at least one $x \in [a, b]$, then there exists f such that $f_n \to f$ is uniform and $f' = g$.

As with sequences of real numbers, uniformly Cauchy sequences converge; in fact uniformly!

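Theorem 2.25. A sequence $(f_n)$ is uniformly Cauchy on U if and only if it converges uniformly to some $f : U \to \mathbb{R}$.

Proof. (⇒) Let $(f_n)$ be uniformly Cauchy on U. For each $x \in U$, the sequence $(f_n(x)) \subseteq \mathbb{R}$ is Cauchy and thus convergent. Define $f : U \to \mathbb{R}$ via

$f(x) := \lim_{n\to\infty} f_n(x)$

We claim that $f_n \to f$ uniformly. Let $\epsilon > 0$ be given, then $\exists N \in \mathbb{N}$ such that, for all $x \in U$,

$m > n > N \implies |f_n(x) - f_m(x)| < \frac{\epsilon}{2}$

$\implies f_n(x) - \frac{\epsilon}{2} < f_m(x) < f_n(x) + \frac{\epsilon}{2}$

$\implies f_n(x) - \frac{\epsilon}{2} \le f(x) \le f_n(x) + \frac{\epsilon}{2}$ (take limits as $m \to \infty$)

$\implies |f_n(x) - f(x)| \le \frac{\epsilon}{2} < \epsilon$

(⇐) This is Exercise 2.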

Example (2.24, mk. II). Since $(f_n)$ is uniformly Cauchy on $\mathbb{R}$, it converges uniformly to some $f : \mathbb{R} \to \mathbb{R}$. It seems reasonable to write

$f(x) = \sum_{n=1}^{\infty} \frac{1}{n^2} \sin n^2 x$

The graph of this function looks somewhat bizarre:

[Figure: graph of f(x) for $-2\pi \le x \le 2\pi$]

Since each $f_n$ is (uniformly) continuous, Theorem 2.15 (with Corollary 2.16) says that f is also (uniformly) continuous. By Theorem 2.19, f(x) is integrable, indeed

$\int_a^b f(x)\,dx = \lim_{n\to\infty} \sum_{k=1}^{n} \left[-k^{-4} \cos k^2 x\right]_a^b = \sum_{n=1}^{\infty} \frac{\cos n^2 a - \cos n^2 b}{n^4}$

which converges (comparison test) for all a, b. By contrast, the derived sequence

$f_n'(x) = \sum_{k=1}^{n} \cos k^2 x$

does not converge for any x since $\cos k^2 x \not\to 0$ as $k \to \infty$. We should thus expect (though we offer no proof) that f is nowhere differentiable.

The example generalizes. Suppose $(g_k)$ is a sequence of functions on U and define the series $\sum g_k(x)$ as the pointwise limit of the sequence $(f_n)$ of partial sums

$\sum_{k=k_0}^{\infty} g_k(x) := \lim_{n\to\infty} f_n(x)$ where $f_n(x) = \sum_{k=k_0}^{n} g_k(x)$

whenever the limit exists. The series is said to converge uniformly whenever $(f_n)$ does so. Theorems 2.15, 2.19 and 2.22 immediately translate.

Corollary 2.26. Let $\sum g_k$ be a series of functions converging uniformly on U. Then:

1. If each $g_k$ is (uniformly) continuous then $\sum g_k$ is (uniformly) continuous.

2. If each $g_k$ is integrable, then $\int \sum g_k(x)\,dx = \sum \int g_k(x)\,dx$.

3. If each $g_k$ is continuously differentiable, and the sequence of derived partial sums $f_n'$ converges uniformly, then $\sum g_k$ is differentiable and $\frac{d}{dx} \sum g_k(x) = \sum g_k'(x)$.

As an application of the uniform Cauchy criterion, we obtain an easy test for uniform convergence.

Theorem 2.27 (Weierstraß M-test). Suppose $(g_k)$ is a sequence of functions on U. Moreover assume:

1. $(M_k)$ is a non-negative sequence such that $\sum M_k$ converges.

2. Each $g_k$ is bounded by $M_k$; that is $|g_k(x)| \le M_k$ for all $x \in U$.

Then $\sum g_k(x)$ converges uniformly on U.

Proof. Let $f_n(x) = \sum_{k=k_0}^{n} g_k(x)$ define the sequence of partial sums. Since $\sum M_k$ converges, its sequence of partial sums is Cauchy (the Cauchy criterion for infinite series); given $\epsilon > 0$,

$\exists N$ such that $m > n > N \implies \sum_{k=n+1}^{m} M_k < \epsilon$

However, by assumption,

$m > n > N \implies |f_m(x) - f_n(x)| = \left|\sum_{k=n+1}^{m} g_k(x)\right| \le \sum_{k=n+1}^{m} |g_k(x)| \le \sum_{k=n+1}^{m} M_k < \epsilon$

The sequence of partial sums is uniformly Cauchy and thus uniformly convergent.
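The chain of inequalities in this proof can be checked numerically on a sample series. The sketch below assumes the series of Example 2.24, read as $g_k(x) = \sin(k^2 x)/k^2$ with $M_k = 1/k^2$; the grid of x-values and the indices n = 20, m = 200 are arbitrary illustrative choices, and a finite grid only samples the supremum.

```python
import math

def partial_sum(n, x):
    """f_n(x) = sum_{k=1}^{n} sin(k^2 x) / k^2."""
    return sum(math.sin(k * k * x) / (k * k) for k in range(1, n + 1))

def tail_bound(n, m):
    """sum_{k=n+1}^{m} M_k with M_k = 1/k^2."""
    return sum(1.0 / (k * k) for k in range(n + 1, m + 1))

xs = [i * 0.01 for i in range(-700, 701)]   # sample points in [-7, 7]
n, m = 20, 200
gap = max(abs(partial_sum(m, x) - partial_sum(n, x)) for x in xs)
bound = tail_bound(n, m)

print(gap, bound, 1 / n)   # expect gap <= bound < 1/n
```

The bound does not depend on x at all, which is exactly what makes the convergence uniform.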

Example 2.28. Given the series

∞

∑

n=1

1+cos

(nx)

sin(nx), we clearly have



1 + cos

( nx)

sin(nx)



≤

for all x ∈ R

Since

∑

converges, the M-test shows that the original series converges uniformly on R.

Exercises 25 1. For each $n \in \mathbb{N}$, let $f_n(x) = n x^n$ when $x \in [0, 1)$ and $f_n(1) = 0$.

(a) Prove that $f_n \to 0$ pointwise on [0, 1].

(Hint: recall Exercise 24.4 if you're not sure how to prove this)

(b) By considering the integrals $\int_0^1 f_n(x)\,dx$ show that $f_n \to 0$ is not uniform.

2. Prove that if $f_n \to f$ uniformly, then the sequence $(f_n)$ is uniformly Cauchy.

3. (a) Suppose $(f_n)$ is a sequence of bounded functions on U and suppose that $f_n \to f$ converges uniformly on U. Prove that f is bounded on U.

(b) Give an example of a sequence of bounded functions $(f_n)$ converging pointwise to f on [0, ∞), but for which f is unbounded.

4. The sequence deﬁned by f

(x) =

1+nx

(Exercise 24.1) converges uniformly on any closed

interval [a, b] where 0 < a < b.

(a) Check explicitly that

(x) dx →

f (x) dx, where f = lim f

(b) Is the same thing true for derivatives?

5. Let $f_n(x) = n^{-1} \sin n^2 x$ be defined on $\mathbb{R}$.

(a) Prove that $f_n$ converges uniformly on $\mathbb{R}$.

(b) Check that $\int_0^x f_n(t)\,dt$ converges for any $x \in \mathbb{R}$.

(c) Does $(f_n')$ converge? Explain.

6. Use the M-test to prove that

∞

∑

n=1

deﬁnes a continuous function on [−1, 1].

7. Prove that

∞

∑

n=1

sin x

( n + 1)

converges uniformly to a continuous function on the interval [−2, 2].

8. Prove that if $\sum g_k$ converges uniformly on a set U and if h is a bounded function on U, then $\sum h g_k$ converges uniformly on U.

(Warning: you cannot simply write $\sum h g_k = h \sum g_k$)

9. Consider Example 2.20.2.

(a) Check explicitly that the convergence isn't uniform by computing $\sup_{x\in[-1,1]} |f_n(x) - f(x)|$

(b) Prove that $f_n \to 0$ pointwise on (0, 1] using the $\epsilon$–N definition of convergence: that is, given $\epsilon > 0$ and $x \in (0, 1]$, find an explicit $N(x, \epsilon)$ such that

$n > N \implies |f_n(x)| < \epsilon$

What happens to your choice of $N(x, \epsilon)$ as $x \to 0^+$?

10. Suppose $(f_n')$ converges uniformly on [a, b] and that each $f_n'$ is continuous.

(a) Use the fact that $(f_n')$ is uniformly Cauchy to prove that $(f_n)$ is uniformly Cauchy and thus converges uniformly to some function f.

(Hint:

(x) − f

(x)



′

( t) − f

′

( t) dt



. . .)

(b) Explain why we need not have assumed the existence of f in Theorem 2.22.

26 Differentiation and Integration of Power Series

In this section we specialize our recent results to power series. While everything will be stated for

series centered at x = 0, all are easily translated to arbitrary centers.

Theorem 2.29. Let $\sum a_n x^n$ be a power series with radius of convergence R > 0 and let $T \in (0, R)$. Then:

1. The series converges uniformly on [−T, T].

2. The series is uniformly continuous on [−T, T] and continuous on (−R, R).

Proof. This is an easy application of the M-test. For each k, define $M_k = |a_k| T^k$, so that $|a_k x^k| \le M_k$ whenever $x \in [-T, T]$. Then

$T < R \implies \sum a_k T^k$ converges absolutely $\implies \sum M_k$ converges

By the M-test and Corollary 2.26, the power series converges uniformly on [−T, T] to a uniformly continuous function.

Finally, every $x \in (-R, R)$ lies in some such interval (take any T with $|x| < T < R$, for instance $T = \frac{|x| + R}{2}$ when R is finite), whence the power series is continuous on (−R, R).

Example 2.30. On its interval of convergence (−1, 1), the geometric series $\sum_{n=0}^{\infty} x^n$ converges pointwise to $\frac{1}{1-x}$; convergence is uniform on any interval [−T, T] ⊆ (−1, 1).

We needn't use the Theorem for this is simple to verify directly: writing f, $f_n$ for the series and its partial sums,

$|f_n(x) - f(x)| = \left|\frac{1 - x^{n+1}}{1 - x} - \frac{1}{1 - x}\right| = \frac{|x|^{n+1}}{1 - x} \implies \sup_{x\in[-T,T]} |f_n(x) - f(x)| = \frac{T^{n+1}}{1 - T} \xrightarrow[n\to\infty]{} 0$

By contrast, the convergence is non-uniform on (−1, 1);

$\sup_{x\in(-1,1)} |f_n(x) - f(x)| = \infty$
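The supremum formula for the geometric series can be compared against a sampled maximum. In the sketch below the values T = 0.9 and n = 50, the grid size and the function names are our own illustrative assumptions; a grid only approximates the supremum from below, but here the maximum is attained at the endpoint x = T, which the grid includes.

```python
def partial_sum(n, x):
    """f_n(x) = 1 + x + ... + x^n."""
    return sum(x ** k for k in range(n + 1))

def sampled_sup_error(n, T, samples=2001):
    """Largest |f_n(x) - 1/(1 - x)| over an evenly spaced grid in [-T, T]."""
    xs = [-T + 2 * T * i / (samples - 1) for i in range(samples)]
    return max(abs(partial_sum(n, x) - 1 / (1 - x)) for x in xs)

T, n = 0.9, 50
observed = sampled_sup_error(n, T)
predicted = T ** (n + 1) / (1 - T)

print(observed, predicted)
```

Pushing T towards 1 with n fixed makes the predicted value blow up, in line with the non-uniform convergence on (−1, 1).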

Theorem 2.31. Suppose $\sum_{n=0}^{\infty} a_n x^n$ has radius of convergence R > 0. Then the series is integrable and differentiable term-by-term on the interval (−R, R). Indeed for any $x \in (-R, R)$,

$\frac{d}{dx} \sum_{n=0}^{\infty} a_n x^n = \sum_{n=1}^{\infty} n a_n x^{n-1}$ and $\int_0^x \sum_{n=0}^{\infty} a_n t^n\,dt = \sum_{n=0}^{\infty} \frac{a_n}{n+1} x^{n+1}$

where both series also have radius of convergence R.

Proof. Let $f(x) = \sum a_n x^n$ have radius of convergence R, and observe that

$\limsup |n a_n|^{1/n} = \lim n^{1/n} \limsup |a_n|^{1/n} = \limsup |a_n|^{1/n} = \frac{1}{R}$

whence $\sum n a_n x^n$ also has radius of convergence R. At any given non-zero $x \in (-R, R)$, we may write

$\sum_{n=1}^{\infty} n a_n x^{n-1} = x^{-1} \sum_{n=1}^{\infty} n a_n x^n$

to see that the derived series also has radius of convergence R. On any interval [−T, T] ⊆ (−R, R), the derived series converges uniformly (Theorem 2.29). Since each $a_n x^n$ is continuously differentiable, Corollary 2.26 says that f is differentiable on [−T, T] and that

$f'(x) = \frac{d}{dx} \sum_{n=0}^{\infty} a_n x^n = \sum_{n=1}^{\infty} n a_n x^{n-1}$

Since any $x \in (-R, R)$ lies in some such interval [−T, T], we are done.

The corresponding result for integrals is Exercise 6.

We postpone the canonical examples until after the next result.

Continuity at Endpoints?

There is one small hole in our analysis. If a series has radius of convergence R we know that it

converges and is continuous on (−R, R). But what if it additionally converges at x = ±R? Is the

series continuous at the endpoints? The answer is an unequivocal yes, though this small beneﬁt

requires a lot of work!

Theorem 2.32 (Abel’s Theorem). Power series are continuous on their full interval of convergence.

Examples 2.33. 1. Apply our results to the geometric series;

$\frac{1}{(1-x)^2} = \frac{d}{dx} \frac{1}{1-x} = \sum_{n=1}^{\infty} n x^{n-1} = \sum_{n=0}^{\infty} (n+1) x^n = 1 + 2x + 3x^2 + 4x^3 + \cdots$

$\ln(1-x) = -\int_0^x \frac{1}{1-t}\,dt = -\sum_{n=0}^{\infty} \frac{1}{n+1} x^{n+1} = -\sum_{n=1}^{\infty} \frac{x^n}{n} = -\left(x + \frac{x^2}{2} + \frac{x^3}{3} + \cdots\right)$

with both series valid on (−1, 1). In fact the first series has the same interval of convergence, while the second is [−1, 1). By Abel's Theorem and the fact that logarithms are continuous, we have equality at x = −1 and the famous identity

$\ln 2 = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$

This example shows that while the integrated and differentiated series have the same radius of convergence as the original, convergence at the endpoints need not be the same in all cases.

2. Substitute $x \mapsto -x^2$ in the geometric series and integrate term-by-term: if $|x| < 1$, then

\[ \frac{1}{1 + x^2} = \sum_{n=0}^\infty (-1)^n x^{2n} \implies \arctan x = \sum_{n=0}^\infty \frac{(-1)^n}{2n+1} x^{2n+1} \]

In fact the arctangent series also converges at $x = \pm 1$; Abel’s Theorem says it is continuous on $[-1, 1]$. Since arctangent is continuous (on $\mathbb{R}$!) we recover another famous identity

\[ \frac{\pi}{4} = \arctan 1 = \sum_{n=0}^\infty \frac{(-1)^n}{2n+1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \]

As with the identity for $\ln 2$, this is a very slowly converging alternating series and therefore doesn’t provide an efficient method for approximating $\pi$.
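To get a feel for how slowly these two alternating series converge, one can simply sum them. The sketch below is illustrative only; the function names `leibniz_pi` and `ln2_series` are our own.

```python
import math


def leibniz_pi(terms):
    """Approximate pi as 4 * (1 - 1/3 + 1/5 - ...) using the given number of terms."""
    return 4 * sum((-1) ** n / (2 * n + 1) for n in range(terms))


def ln2_series(terms):
    """Approximate ln 2 as 1 - 1/2 + 1/3 - ... using the given number of terms."""
    return sum((-1) ** (n + 1) / n for n in range(1, terms + 1))


# The errors decay only like 1/terms, so each extra digit costs ten times more work.
for terms in (10, 100, 1000, 10000):
    print(terms, abs(leibniz_pi(terms) - math.pi), abs(ln2_series(terms) - math.log(2)))
```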

3. The series $f(x) = \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n}$ has radius of convergence $\infty$. Differentiate to obtain

\[ f'(x) = \sum_{n=1}^\infty \frac{(-1)^n}{(2n-1)!} x^{2n-1} = \sum_{n=0}^\infty \frac{(-1)^{n+1}}{(2n+1)!} x^{2n+1} \]

This series is also valid for all $x \in \mathbb{R}$. Differentiating again,

\[ f''(x) = \sum_{n=0}^\infty \frac{(-1)^{n+1}}{(2n)!} x^{2n} = -\sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n} = -f(x) \]

Recalling that $f(x) = \cos x$ is the unique solution to the initial value problem

\[ \begin{cases} f''(x) = -f(x) \\ f(0) = 1, \ f'(0) = 0 \end{cases} \]

we conclude that, $\forall x \in \mathbb{R}$,

\[ \cos x = \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n}, \qquad \sin x = -f'(x) = \sum_{n=0}^\infty \frac{(-1)^n}{(2n+1)!} x^{2n+1} \]

These expressions can instead be taken as the definitions of sine and cosine. As promised earlier in the course, continuity and differentiability now come for free. One difficulty with this definition is believing that it has anything to do with right-triangles!

We can similarly define other common transcendental functions using power series: for instance

\[ \exp(x) = \sum_{n=0}^\infty \frac{x^n}{n!} \]

Example 2.33.1 could be taken as a definition of the logarithm on the interval $(0, 2]$,

\[ \ln x = \ln(1 - (1 - x)) = -\sum_{n=1}^\infty \frac{(1-x)^n}{n} = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} (x-1)^n \]

though this is unnecessary since it is more natural to define $\ln$ as the inverse of the exponential.

Proof of Abel’s Theorem (non-examinable)

This requires a lot of work, so feel free to omit on a ﬁrst reading!

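First observe that there is nothing to check unless $0 < R < \infty$. By the change of variable $x \mapsto \pm\frac{x}{R}$, it is enough for us to prove the following:

\[ \sum_{n=0}^\infty a_n \text{ convergent and } f(x) = \sum_{n=0}^\infty a_n x^n \text{ on } (-1, 1) \implies \lim_{x \to 1^-} f(x) = \sum_{n=0}^\infty a_n \]

Proof. Let $s_n = \sum_{k=0}^n a_k$ and write $s = \lim s_n = \sum a_n$. It is an easy exercise to check that

\[ \sum_{k=0}^n a_k x^k = s_n x^n + (1 - x) \sum_{k=0}^{n-1} s_k x^k \]

If $|x| < 1$, then (since $s_n \to s$) $\lim s_n x^n = 0$, whence we obtain

\[ \forall x \in (-1, 1), \quad f(x) = (1 - x) \sum_{n=0}^\infty s_n x^n \]

Let $\epsilon \in (0, 1)$ be given and fix $x \in (0, 1)$. Then

\[ \exists N \in \mathbb{N} \text{ such that } n > N \implies |s_n - s| < \frac{\epsilon}{2} \tag{$*$} \]

Use the geometric series formula $\sum_{n=0}^\infty x^n = \frac{1}{1-x}$ and write $h(x) = (1 - x) \left| \sum_{n=0}^N (s_n - s) x^n \right|$ to observe

\begin{align*}
|f(x) - s| &= \left| (1 - x) \sum_{n=0}^\infty s_n x^n - s \right| = \left| (1 - x) \sum_{n=0}^\infty s_n x^n - s(1 - x) \sum_{n=0}^\infty x^n \right| = \left| (1 - x) \sum_{n=0}^\infty (s_n - s) x^n \right| \\
&= (1 - x) \left| \sum_{n=0}^N (s_n - s) x^n + \sum_{n=N+1}^\infty (s_n - s) x^n \right| \\
&\le (1 - x) \left| \sum_{n=0}^N (s_n - s) x^n \right| + (1 - x) \left| \sum_{n=N+1}^\infty (s_n - s) x^n \right| && \text{($\triangle$-inequality)} \\
&< h(x) + \frac{\epsilon}{2} (1 - x) \sum_{n=N+1}^\infty x^n && \text{(by $(*)$)} \\
&\le h(x) + \frac{\epsilon}{2}
\end{align*}

Since $h \ge 0$ is continuous and $h(1) = 0$, $\exists \delta > 0$ such that $x \in (1 - \delta, 1) \implies h(x) < \frac{\epsilon}{2}$ (the computation of a suitable $\delta$ is another exercise).

We conclude that $\lim_{x \to 1^-} f(x) = s$.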

Exercises 26 1. (a) Prove that

∞

∑

n=1

(1 − x)

for

< 1.

(b) Evaluate

∞

∑

n=1

∞

∑

n=1

and

∞

∑

n=1

( −1)

2. (a) Starting with a power series centered at x = 0, evaluate the integral

1/2

1 + x

dx as an

inﬁnite series.

(b) (Harder) Repeat part (a) but for

1 + x

dx. What extra ingredients do you need?

3. The probability that a standard normally distributed random variable $X$ lies in the interval $[a, b]$ is given by the integral

\[ P(a \le X \le b) = \frac{1}{\sqrt{2\pi}} \int_a^b \exp\left( -\frac{x^2}{2} \right) dx \]

Find $P(-1 \le X \le 1)$ as an infinite series.

4. Define $c(x) = \sum_{n=0}^\infty \frac{x^{2n}}{(2n)!}$ and $s(x) = \sum_{n=0}^\infty \frac{x^{2n+1}}{(2n+1)!}$.

(a) Prove that $c'(x) = s(x)$ and that $s'(x) = c(x)$.

(b) Prove that $c(x)^2 - s(x)^2 = 1$ for all $x \in \mathbb{R}$.

(These functions are the hyperbolic sine and cosine: $s(x) = \sinh x$ and $c(x) = \cosh x$)

5. Let $a, b \in (-1, 1)$. Extending Example 2.30, show that the convergence $\sum x^n = \frac{1}{1-x}$ is non-uniform on any interval of the form $(-1, a)$ or $(b, 1)$.

6. Prove the integration part of Theorem 2.31.

7. Prove or disprove: If a series converges absolutely at the endpoints of its interval of convergence

then its convergence is uniform on the entire interval.

8. Complete the proof of Abel’s Theorem:

(a) Let $s_n = \sum_{k=0}^n a_k$ be the partial sum of the series $\sum a_n$. For each $n$, prove that

\[ \sum_{k=0}^n a_k x^k = s_n x^n + (1 - x) \sum_{k=0}^{n-1} s_k x^k \]

(b) Suppose $x > 0$. Let $S = \max\{ |s_n - s| : n \le N \}$ and prove that $h(x) \le S(1 - x^{N+1})$. Hence find an explicit $\delta$ that completes the final step.

27 The Weierstraß Approximation Theorem

A major theme of analysis is approximation; for instance power series are an example of (uniform)

approximation by polynomials. It is reasonable to ask whether any function can be so approximated.

In 1885, Weierstraß answered a speciﬁc case in the afﬁrmative.

Theorem 2.34 (Weierstraß). If f : [a, b] → R is continuous, then there exists a sequence of polyno-

mials converging uniformly to f on [a, b].

Suitable polynomials can be defined in various ways. By scaling the domain, it is enough to do this on $[a, b] = [0, 1]$ where perhaps the simplest approach is via the Bernstein Polynomials,

\[ B_n f(x) := \sum_{k=0}^n \binom{n}{k} f\left( \frac{k}{n} \right) x^k (1 - x)^{n-k} \qquad \left( \binom{n}{k} = \frac{n!}{k!(n-k)!} \text{ is the binomial coefficient} \right) \]

We omit the proof due to length; Weierstraß’ original argument was completely different. Instead we

compute a couple of examples and give an important interpretation/application.

Examples 2.35. 1. Suppose $f(x) = 2x$ if $x < \frac{1}{2}$ and $f(x) = 1$ otherwise.

\begin{align*}
B_1 f(x) &= f(0)(1 - x) + f(1)x = x \\
B_2 f(x) &= f(0)(1 - x)^2 + 2 f(\tfrac{1}{2}) x(1 - x) + f(1) x^2 = 2x(1 - x) + x^2 = x(2 - x) \\
B_3 f(x) &= f(0)(1 - x)^3 + 3 f(\tfrac{1}{3}) x(1 - x)^2 + 3 f(\tfrac{2}{3}) x^2 (1 - x) + f(1) x^3 \\
&= 0(1 - x)^3 + 2x(1 - x)^2 + 3x^2(1 - x) + x^3 = x(2 - x) = B_2 f(x) \\
B_4 f(x) &= 0(1 - x)^4 + 2x(1 - x)^3 + 6x^2(1 - x)^2 + 4x^3(1 - x) + x^4 = x(x^3 - 2x^2 + 2)
\end{align*}

(Figure: graphs of $f$ and several of its Bernstein polynomials on $[0, 1]$.)

2. Now assume $f(x) = x$ if $x < \frac{1}{2}$ and $1 - x$ otherwise.

\begin{align*}
B_1 f(x) &= f(0)(1 - x) + f(1)x = 0 \\
B_2 f(x) &= x(1 - x) \\
B_3 f(x) &= 0(1 - x)^3 + x(1 - x)^2 + x^2(1 - x) + 0x^3 = x(1 - x) = B_2 f(x) \\
B_4 f(x) &= f(0)(1 - x)^4 + f(\tfrac{1}{4}) \cdot 4x(1 - x)^3 + f(\tfrac{1}{2}) \cdot 6x^2(1 - x)^2 + f(\tfrac{3}{4}) \cdot 4x^3(1 - x) + f(1) x^4 \\
&= x(1 - x)^3 + 3x^2(1 - x)^2 + x^3(1 - x) = x(1 - x)(1 + x - x^2)
\end{align*}

(Figure: graphs of $f$ and several of its Bernstein polynomials on $[0, 1]$.)
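The Bernstein polynomials in these examples are easy to evaluate by machine, which gives a quick check on the hand computations and a feel for the (slow) uniform convergence. This is only a sketch: the names `bernstein`, `ramp`, `tent` and `max_error` are ours, and the uniform error is estimated on a finite grid.

```python
from math import comb


def bernstein(f, n, x):
    """Evaluate the n-th Bernstein polynomial of f at x in [0, 1]."""
    return sum(comb(n, k) * f(k / n) * x**k * (1 - x) ** (n - k) for k in range(n + 1))


def ramp(x):
    """The function of the first example: 2x for x < 1/2, and 1 otherwise."""
    return 2 * x if x < 0.5 else 1.0


def tent(x):
    """The function of the second example: x for x < 1/2, and 1 - x otherwise."""
    return x if x < 0.5 else 1 - x


def max_error(f, n, samples=101):
    """Grid estimate of sup |B_n f - f| on [0, 1]."""
    xs = [i / (samples - 1) for i in range(samples)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)


for n in (4, 16, 64, 256):
    print(n, max_error(ramp, n), max_error(tent, n))
```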

Bézier curves (just for fun!)

The Bernstein polynomials arise naturally when considering Bézier curves. These have many applications, particularly in computer graphics. Given three points $A, B, C$, define points on the line segments $\overrightarrow{AB}$ and $\overrightarrow{BC}$ for each $t \in [0, 1]$, via

\[ \overrightarrow{AB}(t) = (1 - t)A + tB, \qquad \overrightarrow{BC}(t) = (1 - t)B + tC \]

These points move at a constant speed along the corresponding segments. Now consider a point on the moving segment between the points defined above:

\[ R(t) := (1 - t) \overrightarrow{AB}(t) + t \overrightarrow{BC}(t) = (1 - t)^2 A + 2t(1 - t) B + t^2 C \]

This is the quadratic Bézier curve with control points $A, B, C$. The 2nd Bernstein polynomial for a function $f$ is simply the quadratic Bézier curve with control points $\big( 0, f(0) \big)$, $\big( \frac{1}{2}, f(\frac{1}{2}) \big)$ and $\big( 1, f(1) \big)$. The picture above shows $B_2 f(x)$ for the above example.

We can repeat the construction with more control points: with four points $A, B, C, D$, one constructs $\overrightarrow{AB}(t)$, $\overrightarrow{BC}(t)$, $\overrightarrow{CD}(t)$, then the second-order points between these, and finally the cubic Bézier curve

\begin{align*}
R(t) &:= (1 - t) \Big( (1 - t) \overrightarrow{AB}(t) + t \overrightarrow{BC}(t) \Big) + t \Big( (1 - t) \overrightarrow{BC}(t) + t \overrightarrow{CD}(t) \Big) \\
&= (1 - t)^3 A + 3t(1 - t)^2 B + 3t^2(1 - t) C + t^3 D
\end{align*}

where we now recognize the relationship to the 3rd Bernstein polynomial.

The pictures show cubic Bézier curves: the first is the graph of the Bernstein polynomial

f (x) = 0(1 − x)

+ 3x(1 − x)

+ 3x

(1 − x) +

while the second is for the four given control points A, B, C, D.

To see these pictures move, visit https://www.math.uci.edu/~ndonalds/math140b/bezier.html
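The repeated-interpolation construction described above is usually called de Casteljau’s algorithm, and it translates directly into code. The following sketch (with our own helper names `lerp` and `bezier`) handles any number of control points, represented here as tuples of coordinates.

```python
def lerp(P, Q, t):
    """The point (1 - t) P + t Q on the segment from P to Q."""
    return tuple((1 - t) * p + t * q for p, q in zip(P, Q))


def bezier(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] by repeated linear interpolation."""
    points = list(control_points)
    while len(points) > 1:
        # Replace each adjacent pair of points by the moving point between them.
        points = [lerp(points[i], points[i + 1], t) for i in range(len(points) - 1)]
    return points[0]


# Quadratic example: three control points A, B, C (chosen arbitrarily for illustration).
print(bezier([(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)], 0.5))
```

With three control points this reproduces $(1-t)^2 A + 2t(1-t)B + t^2 C$, and with four the cubic formula.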

Exercises 27 1. Show that the closed bounded interval assumption in the approximation theorem

is required by giving an example of a continuous function f : (−1, 1) → R which is not the

uniform limit of a sequence of polynomials.

2. If $g : [a, b] \to \mathbb{R}$ is continuous, then $f(x) := g\big( (b - a)x + a \big)$ is continuous on $[0, 1]$. If $P_n \to f$ uniformly on $[0, 1]$, prove that $Q_n \to g$ uniformly on $[a, b]$, where

\[ Q_n(x) = P_n\left( \frac{x - a}{b - a} \right) \]

3. Use the binomial theorem to check that every Bernstein polynomial for $f(x) = x$ is $B_n f(x) = x$ itself!

4. Find a parametrization of the cubic Bézier curve with control points $(1, 0)$, $(0, 1)$, $(-1, 0)$ and $(0, -1)$. Now sketch the curve.

(Use a computer algebra package if you like!)

5. (Hard) Show that the Bernstein polynomials for $f(x) = x^2$ are given by

\[ B_n f(x) = \frac{1}{n} x + \frac{n - 1}{n} x^2 \]

and thus verify explicitly that $B_n f \to f$ uniformly.

3 Differentiation

Differentiation grew out of the problem of instantaneous velocity. Velocity can only easily be measured

as an average over a time interval:

if an object travels ∆d meters in ∆t seconds, then its average velocity is $v_{\mathrm{av}} = \frac{\Delta d}{\Delta t}$ m s$^{-1}$. An early ‘definition’ (dating to the 1300s) makes the instantaneous velocity

equal to the constant velocity that would be observed if a body were to stop accelerating: while use-

less for the purposes of measurement, this is essentially Newton’s ﬁrst law regarding inertial motion

(1687). We also see the concept of the tangent line beginning to appear: if one graphs position against

time, then a couple of things should be clear:

• The graph of inertial (constant speed) motion is a straight line whose slope is the velocity.

• The tangent line to a curve at a point has slope equal to the instantaneous velocity at that point.

The problem of ﬁnding, deﬁning and computing instantaneous velocity thus morphed into the con-

sideration of tangent lines to curves. With the advent of analytic geometry in the early 1600’s,

mathematicians such as Fermat and Descartes pioneered versions of the familiar secant (‘cutting’)

line method for computing tangents.

(Figure, two panels. Left: instantaneous velocity equals the constant velocity corresponding to the tangent line, with $v = \frac{\Delta d}{\Delta t}$. Right: secant lines approximate the tangent line as $t \to a$.)

The average velocity of the particle over the time interval $[a, t]$ is the slope of the secant line, namely

\[ v_{\mathrm{av}}(a, t) = \frac{d(t) - d(a)}{t - a} \]

Since the secant lines approximate the tangent line as $t$ approaches $a$, it seems reasonable that we should compute the instantaneous velocity in this manner:

\[ v(a) = \lim_{t \to a} v_{\mathrm{av}}(a, t) = \lim_{t \to a} \frac{d(t) - d(a)}{t - a} \]

This is, of course, the modern deﬁnition of the derivative.
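The secant-line idea is easy to try out numerically. In the sketch below the position function `position` is a made-up example ($d(t) = t^2 + 4t$), and `secant_slope` is our own name for the average velocity $v_{\mathrm{av}}(a, t)$.

```python
def secant_slope(d, a, t):
    """Average velocity of d over the interval between a and t (requires t != a)."""
    return (d(t) - d(a)) / (t - a)


def position(t):
    """Hypothetical position function, chosen only for illustration."""
    return t**2 + 4 * t


a = 1.0
# The slopes approach 6, the instantaneous velocity at a = 1.
for step in (1.0, 0.1, 0.01, 0.001):
    print(step, secant_slope(position, a, a + step))
```

For this choice the secant slope is exactly $t + a + 4$, so the printed values should approach $2a + 4 = 6$.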

Even a modern technique such as Doppler-shift compares measurements separated by the extremely small period of a

light or soundwave. These are still therefore average velocities, albeit taken over very small time intervals.

28 Basic Properties of the Derivative

Definition 3.1. Let $f : U \to \mathbb{R}$ and let $a \in U$. We say that $f$ is differentiable at $a$ if the following limit exists (is finite!)

\[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \]

We call this limit the derivative of $f$ at $a$ and denote its value by either $f'(a)$ or $\frac{df}{dx}\Big|_{x=a}$.

If $f'(a)$ exists for all $a \in U$ then $f$ is differentiable (on $U$); the derivative becomes a function $f'(x) = \frac{df}{dx}$.

Notation The contrasting styles are partly attributable to the primary founders of calculus, Isaac Newton and Gottfried Leibniz. Each has its pros and cons and you should be comfortable with both.

One-sided derivatives Since the deﬁning limit is two-sided, differentiability only makes sense at

interior points of U. Left- and right-derivatives may be deﬁned via one-sided limits; differentiability

is equivalent to these being equal. All results in this section hold for one-sided derivatives with

suitable (sometimes tedious) modiﬁcations. It is quite common, though strictly incorrect, to say that

f is differentiable on an interval [a, b) if it is differentiable on the interior (a, b) and right-differentiable

at a; however, we will strictly adhere to differentiable meaning two-sided.

Examples 3.2. 1. Let $f(x) = x^2 + 4x$. Then, for any $a \in \mathbb{R}$,

\[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \frac{x^2 + 4x - a^2 - 4a}{x - a} = \lim_{x \to a} \frac{(x - a)(x + a + 4)}{x - a} = \lim_{x \to a} (x + a + 4) = 2a + 4 \]

Note how the definition of $\lim_{x \to a}$ allows us to cancel the $x - a$ terms from the numerator and denominator. We conclude that $f$ is differentiable (on $\mathbb{R}$) and that $f'(x) = 2x + 4$.

2. Let $g(x) = \frac{x+1}{2x-3}$. Then, for any $a \ne \frac{3}{2}$,

\begin{align*}
\lim_{x \to a} \frac{g(x) - g(a)}{x - a} &= \lim_{x \to a} \frac{1}{x - a} \left( \frac{x + 1}{2x - 3} - \frac{a + 1}{2a - 3} \right) = \lim_{x \to a} \frac{5a - 5x}{(x - a)(2x - 3)(2a - 3)} \\
&= \lim_{x \to a} \frac{-5}{(2x - 3)(2a - 3)} = \frac{-5}{(2a - 3)^2}
\end{align*}

$g$ is therefore differentiable on its domain $\mathbb{R} \setminus \{ \frac{3}{2} \}$ with derivative $g'(x) = \frac{-5}{(2x-3)^2}$.

The familiar expressions

\[ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}, \qquad f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \]

are equivalent to the original definition (see Exercise 5). While seemingly simpler, they sometimes lead to nastier calculations: see what happens if you try the previous example in this language. . .

We now turn to possibly the most well-known result of Freshman Calculus.

Theorem 3.3 (Power Law). Let $r \in \mathbb{R}$. Then $f(x) = x^r$ is differentiable with $f'(x) = r x^{r-1}$.

The domains of f and f

′

depend messily on r, but the above certainly holds on the interval (0, ∞).

We leave a complete proof to the exercises and instead consider a few generalizable examples.

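Examples 3.4. 1. If $n \in \mathbb{N}$ and $a \in \mathbb{R}$, a simple factorization yields

\begin{align*}
\lim_{x \to a} \frac{x^n - a^n}{x - a} &= \lim_{x \to a} \frac{(x - a)(x^{n-1} + a x^{n-2} + \cdots + a^{n-2} x + a^{n-1})}{x - a} \tag{$*$} \\
&= \lim_{x \to a} (x^{n-1} + a x^{n-2} + \cdots + a^{n-2} x + a^{n-1}) = n a^{n-1}
\end{align*}

We conclude that $\frac{d}{dx} x^n = n x^{n-1}$.

2. If $f(x) = x^{-1}$ and $a \ne 0$, then

\[ \lim_{x \to a} \frac{x^{-1} - a^{-1}}{x - a} = \lim_{x \to a} \frac{a - x}{ax(x - a)} = \lim_{x \to a} \frac{-1}{ax} = -\frac{1}{a^2} \]

from which we conclude that $f'(x) = -x^{-2}$. A similar approach followed by the factorization $(*)$ proves the power law for all negative integer exponents:

\[ \frac{x^{-n} - a^{-n}}{x - a} = \frac{a^n - x^n}{a^n x^n (x - a)} = \cdots \]

(Figure: graph of $f(x) = x^{-1}$.)

3. To differentiate $x^{1/n}$, simply substitute $x = y^n$ and observe case 1. If $g(x) = x^{1/3}$ and $a \ne 0$, then $y = x^{1/3}$ and $b = a^{1/3}$ yield

\[ \lim_{x \to a} \frac{x^{1/3} - a^{1/3}}{x - a} = \lim_{y \to b} \frac{y - b}{y^3 - b^3} = \frac{1}{3b^2} = \frac{1}{3} a^{-2/3} \implies g'(x) = \frac{1}{3} x^{-2/3} \]

Note that $g$ is not differentiable at $x = 0$!

(Figure: graph of $g(x) = x^{1/3}$.)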

We could similarly compute the derivative for all rational exponents, though it is much easier to wait

for the chain rule. The power law for irrational exponents is somewhat more ticklish.

Corollary 3.5 (Basic Transcendental Functions). Recalling our development of power series in the previous chapter, the power law (for positive integers!) is all we need to see that

\[ \frac{d}{dx} \exp(x) = \exp(x), \qquad \frac{d}{dx} \sin x = \cos x, \qquad \frac{d}{dx} \cos x = -\sin x \]

It is also possible to develop these results independently of power series (see e.g. Exercise 9).

Failure of differentiability

It is instructive to consider when a function can fail to be differentiable. First a simple result shows

that functions are not differentiable at discontinuities.

Theorem 3.6. If f is differentiable at a then f is continuous at a.

Proof. Simply take the limit (think carefully why this works!):

\[ \lim_{x \to a} f(x) = \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} (x - a) + f(a) \right) = f'(a) \cdot 0 + f(a) = f(a) \]

It remains to consider situations when a function is continuous but not differentiable.

Examples 3.7. The following cover all situations where a function is continuous on an interval and

differentiable everywhere except at a single interior point; similarly to isolated discontinuities, these

are classiﬁed by considering the three ways in which the derivative limit might not exist.

1. A vertical tangent line occurs when the derivative is infinite. For instance, $g(x) = x^{1/3}$ at $x = 0$.

2. Corners occur when the one-sided derivatives are unequal (could be infinite). For instance, $f(x) = |x|$ is not differentiable at zero, the one-sided derivatives being

\[ \lim_{x \to 0^+} \frac{|x| - 0}{x - 0} = \lim_{x \to 0^+} \frac{x}{x} = 1 \ne \lim_{x \to 0^-} \frac{|x| - 0}{x - 0} = \lim_{x \to 0^-} \frac{-x}{x} = -1 \]

Indeed $f$ is differentiable everywhere except at zero, with

\[ f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases} \]

3. A singularity is where left- and/or right-derivatives do not exist. The standard example in this case is

\[ f(x) = \begin{cases} x \sin \frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases} \]

which is continuous on $\mathbb{R}$ and differentiable everywhere except at zero: the details are in Exercise 8.

Singularities and vertical tangent lines can also prevent one-sided differentiability.

More esoteric examples of non-differentiability are also possible:

• Utilizing series, we can create functions which are continuous on an interval but nowhere differ-

entiable! For a classic example, see page 28.

• It is also possible to construct a function which is differentiable (and thus continuous) at precisely one point; can you think of an example?

The Basic Rules of Differentiation

Theorem 3.8. Let $f, g$ be differentiable and $k, l$ be constants.

1. (Linearity) The function $kf + lg$ is differentiable with $(kf + lg)' = kf' + lg'$

2. (Product rule) The function $fg$ is differentiable with $(fg)' = f'g + fg'$

3. (Inverse functions) If $f$ is bijective with non-zero derivative, then $f^{-1}$ is differentiable and

\[ \frac{d}{dx} f^{-1}(x) = \frac{1}{f'\big( f^{-1}(x) \big)} \]

Proof. Parts 1 and 2 follow from the limit laws:

\[ \lim_{x \to a} \frac{(kf + lg)(x) - (kf + lg)(a)}{x - a} = \lim_{x \to a} \left( k \frac{f(x) - f(a)}{x - a} + l \frac{g(x) - g(a)}{x - a} \right) = kf'(a) + lg'(a) \]

\[ \lim_{x \to a} \frac{f(x)g(x) - f(a)g(a)}{x - a} = \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} g(x) + f(a) \frac{g(x) - g(a)}{x - a} \right) = f'(a)g(a) + f(a)g'(a) \]

Note where we used the continuity of $g$ in the second line ($\lim g(x) = g(a)$). Part 3 is an exercise.

The inverse function rule is intuitive since the graphs of $f$ and $f^{-1}$ are related by reflection in the line $y = x$; gradients at corresponding points are therefore reciprocal. In Leibniz notation the result reads

\[ \frac{dy}{dx} = \left( \frac{dx}{dy} \right)^{-1} \]

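Examples 3.9. 1. Linearity allows us to differentiate any polynomial: for instance

\[ \frac{d}{dx} \left( 7x^2 + 13x^4 \right) = 7 \frac{d}{dx} x^2 + 13 \frac{d}{dx} x^4 = 14x + 52x^3 \]

2. The product rule extends the reach of differentiation somewhat:

\[ \frac{d}{dx} (x^4 \sin x) = \left( \frac{d}{dx} x^4 \right) \sin x + x^4 \frac{d}{dx} \sin x = 4x^3 \sin x + x^4 \cos x \]

3. The inverse trigonometric functions can now be differentiated. For instance,

\[ y = \sin^{-1} x \implies \frac{d}{dx} \sin^{-1} x = \frac{dy}{dx} = \left( \frac{dx}{dy} \right)^{-1} = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - \sin^2 y}} = \frac{1}{\sqrt{1 - x^2}} \]

4. Define natural log to be the inverse of the (bijective!) exponential function $\exp(x)$:

\[ y = \ln x \iff x = \exp y \]

It follows that

\[ \frac{d}{dx} \ln x = \frac{dy}{dx} = \left( \frac{dx}{dy} \right)^{-1} = \frac{1}{\exp y} = \frac{1}{x} \]

The full details, and the justification that $\exp x = e^x$, form an optional exercise.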

Theorem 3.10 (Chain Rule). If $g$ is differentiable at $a$ and $f$ is differentiable at $g(a)$ then $f \circ g$ is differentiable at $a$ with derivative

\[ (f \circ g)'(a) = f'\big( g(a) \big) g'(a) \]

In Leibniz notation this reads $\frac{d(f \circ g)}{dx} = \frac{df}{dg} \frac{dg}{dx}$, which looks like a simple cancellation of the $dg$ terms!

Proof. Define $\gamma : \operatorname{dom}(f) \to \mathbb{R}$ via

\[ \gamma(v) = \begin{cases} \dfrac{f(v) - f\big( g(a) \big)}{v - g(a)} & \text{if } v \ne g(a) \\[2ex] f'\big( g(a) \big) & \text{if } v = g(a) \end{cases} \tag{$*$} \]

Since $f$ is differentiable at $g(a)$, we see that $\gamma$ is continuous at $g(a)$ and $\lim_{v \to g(a)} \gamma(v) = f'\big( g(a) \big)$.

Since $g$ is differentiable at $a$, there exists an open interval $U \ni a$ for which $x \in U \implies g(x) \in \operatorname{dom}(f)$.

Now compute: for any $x \in U \setminus \{a\}$, let $v = g(x)$ in $(*)$, whence

\[ \frac{f\big( g(x) \big) - f\big( g(a) \big)}{x - a} = \gamma\big( g(x) \big) \frac{g(x) - g(a)}{x - a} \]

Take limits as $x \to a$ for the result.

Corollary 3.11 (Quotient Rule). Suppose $f$ and $g$ are differentiable. Then $\frac{f}{g}$ is differentiable whenever $g(x) \ne 0$. Moreover

\[ \left( \frac{f}{g} \right)' = \frac{f'g - fg'}{g^2} \]

The proof is an exercise.

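Examples 3.12. 1. By the quotient rule,

\[ \frac{d}{dx} \tan x = \frac{d}{dx} \frac{\sin x}{\cos x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \sec^2 x \]

2. We can now differentiate highly involved combinations of elementary functions:

\[ \frac{d}{dx} \left( \tan(e^{4x^2}) - \frac{7x}{\sin x} \right) = 8x e^{4x^2} \sec^2(e^{4x^2}) - \frac{7 \sin x - 7x \cos x}{\sin^2 x} \]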

This is completely unjustified since $dg$ does not (for us) mean anything on its own! The same problem appears in the famously faulty one-line ‘proof’ of the chain rule:

\[ \lim_{x \to a} \frac{f\big( g(x) \big) - f\big( g(a) \big)}{x - a} = \lim_{x \to a} \frac{f\big( g(x) \big) - f\big( g(a) \big)}{g(x) - g(a)} \, \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \]

The second limit cannot exist unless $g(x) \ne g(a)$ for all $x$ near, but not equal to, $a$. The faulty argument is repaired by replacing the second difference quotient with $f'\big( g(a) \big)$ whenever $g(x) = g(a)$, before taking the limit. This is precisely what $\gamma\big( g(x) \big)$ does in the correct proof.

Exercises 28 1. Use Deﬁnition 3.1 to calculate the derivatives.

(a) f (x) = x

at x = 2 (b) g(x) = x + 2 at x = a

cos x at x = 0 (d) r(x) =

3x + 4

2x −1

at x = 1

2. Differentiate the function f (x) = cos



−3x



using the chain and product rules.

3. (a) Prove the quotient rule (Corollary 3.11) by combining the chain and product rules.

(b) Prove the inverse derivative rule (Theorem 3.8, part 3).

(Hint: You can’t simply differentiate 1 =

f ( f

−1

(x)) using the chain rule; why not?)

4. (a) Find the derivatives of secant, cosecant and cotangent using the quotient rule.

(b) Why did we choose the positive square-root when computing

sin

−1

x? What is the

standard domain of arcsine, and what happens at x = ±1?

5. Using the definition of the derivative, and supposing that $f$ is differentiable at $a$, prove that

\[ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{h \to 0} \frac{f(a + h) - f(a - h)}{2h} \]

6. Prove that the function f (x) = x

is differentiable everywhere and compute its derivative.

7. Show that the following function is differentiable everywhere and compute its derivative:

\[ f(x) = \begin{cases} x^2 \sin \frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases} \]

Moreover, prove that the derivative $f'$ is discontinuous at $x = 0$.

8. Show that the following function is differentiable everywhere except at zero:

\[ f(x) = \begin{cases} x \sin \frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases} \]

9. (a) Suppose 0 < h <

. Use the picture to show that

0 <

1 −cos h

< sin

and sin h < h < tan h

Hence conclude that lim

h→0

sin h

= 1 and lim

h→0

1−cos h

= 0.

(b) Use part (a) to prove that

sin x = cos x

cos h

sin h

tan h

10. (Hard) Use induction to prove the Leibniz rule (general product rule):

\[ (fg)^{(n)} = \sum_{k=0}^n \binom{n}{k} f^{(k)} g^{(n-k)} \]

Masochists Corner (non-examinable)

We ﬁnish with two very hard bonus exercises, though the ﬁrst is somewhat easier. If you want a

challenge, give ’em a go!

The Exponential Function & the General Power Law

Consider the function $\exp(x) := \sum_{n=0}^\infty \frac{x^n}{n!}$, which converges for all real $x$.

As we saw when discussing power series, this function satisfies the initial value problem

\[ \frac{d}{dx} \exp(x) = \exp(x), \qquad \exp(0) = 1 \]

Define $e := \exp(1)$. Certainly $e^x$ makes sense whenever $x \in \mathbb{Q}$. When $x$ is irrational, define

\[ e^x := \sup\{ e^q : q \in \mathbb{Q}, \ q < x \} \]

Our primary goal is to prove that $\exp(x) = e^x$. As a nice bonus we recover Bernoulli’s limit identity $e = \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right)^n$ and obtain a complete proof of the power law.

(a) For all x, y ∈ R, prove that exp(x + y) = exp(x) exp(y)

(Hint: use the binomial theorem and change the order of summation)

(b) Show that exp(x) is always positive, even when x < 0.

(Hint: x ≥ 0 =⇒ exp(x) ≥ 1 + x; take limits then apply part (a))

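(d) Prove that $e^x = \exp(x)$. Do this in three stages:

• If $x \in \mathbb{N}$, use part (a). Now check for $x \in \mathbb{Z}^-$.

• If $x = \frac{m}{n} \in \mathbb{Q}$, first compute $\left( \exp\left( \frac{m}{n} \right) \right)^n$.

• If $x$ is irrational, start with $\exists (q_n) \subseteq \mathbb{Q}$ such that $q_n < x$ and $e^{q_n} \to e^x$ . . .

(e) Let $\ln : (0, \infty) \to \mathbb{R}$ be the inverse function of $\exp$. Prove the logarithm laws:

\[ \ln(xy) = \ln x + \ln y \quad\text{and}\quad \ln x^r = r \ln x \]

(Just do this when $r \in \mathbb{N}$; another argument like part (d) is required in general)

(f) We’ve already seen that $\frac{d}{dy} \ln y = \frac{1}{y}$. Use the fact that

\[ \frac{d}{dy} \ln y = \lim_{h \to 0} \frac{\ln(y + h) - \ln y}{h} \]

to prove that $\exp(x) = \lim_{n \to \infty} \left( 1 + \frac{x}{n} \right)^n$, thus recovering Bernoulli’s definition of $e$.

(g) For any $r \in \mathbb{R}$, define $x^r := \exp(r \ln x)$. Hence obtain the power law for any exponent.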

A Very Strange Function

Here is a classic example of a continuous but nowhere-differentiable function!

Let f be the sawtooth function deﬁned by f (x) =

whenever x ∈ [−1, 1] and extending

periodically to R so that f (x + 2) = f (x). Now deﬁne g : R → R via

g(x) =

∞

∑

n=0





f (4

−2 −1 0 1 2

f (x) and iterations to n = 3 g(x) (really n = 6, but can you tell?!)

(a) Prove that g is well-deﬁned and continuous on R.

(b) Let x ∈ R and m ∈ N be ﬁxed. Deﬁne h

= ±

·4

−m

where the ±-sign is chosen so that

no integers lie strictly between 4

x and 4

(x + h

) = 4

x ±

For each n ∈ N

, deﬁne



(x + h

)



− f ( 4

Prove the following

≤ 4

with equality when n = m.

ii. n > m =⇒ k

= 0.

(Hint:

f (y) − f (z)

≤

y −z

: when is this an equality?)



g(x + h

) − g(x)



≥

+ 1)

Hence conclude that g is nowhere differentiable.

29 The Mean Value Theorem

We now turn to one of the central results in calculus.

Theorem 3.13 (Mean Value Theorem/MVT). Let $f$ be continuous on $[a, b]$ and differentiable on $(a, b)$. Then there exists $\xi \in (a, b)$ such that $f'(\xi) = \frac{f(b) - f(a)}{b - a}$.

This follows easily from two lemmas.

Lemma 3.14. 1. (Critical Points) Suppose $g$ is bounded on $(a, b)$ and attains its maximum or minimum at $\xi \in (a, b)$. If $g$ is differentiable at $\xi$ then $g'(\xi) = 0$.

2. (Rolle’s Theorem) Suppose $g$ is continuous on $[a, b]$, differentiable on $(a, b)$, and that $g(a) = g(b)$. Then there exists $\xi \in (a, b)$ such that $g'(\xi) = 0$.

The main result follows by applying Rolle’s theorem to

\[ g(x) = f(x) - \frac{f(b) - f(a)}{b - a} (x - b) \]

and observing that $g(a) = f(b) = g(b)$ and $g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}$.

(Figure, two panels. Left: Critical Points/Rolle’s Theorem, showing $g(x)$ on $[a, b]$. Right: Mean Value Theorem, showing $f(x)$ on $[a, b]$.)

In the pictures, the orange and green lines are parallel: the average slope over the interval $[a, b]$ equals the gradient/derivative $f'(\xi)$.

Proof of Lemma. 1. Suppose $g$ attains its maximum at $\xi$ and, for a contradiction, that

\[ g'(\xi) = \lim_{x \to \xi} \frac{g(x) - g(\xi)}{x - \xi} > 0 \]

Let $\epsilon = g'(\xi)$ in the definition of limit: $\exists \delta > 0$ such that

\[ 0 < |x - \xi| < \delta \implies \left| \frac{g(x) - g(\xi)}{x - \xi} - g'(\xi) \right| < g'(\xi) \implies 0 < \frac{g(x) - g(\xi)}{x - \xi} < 2g'(\xi) \]

In particular, if $x \in (\xi, \xi + \delta)$, then $g(x) > g(\xi)$, contradicting the maximality at $\xi$.

The argument when $g'(\xi) < 0$ is similar. Finally, apply to $-g$ for the result at a minimum.

2. By the extreme value theorem, $g$ is bounded and attains its bounds. If the maximum and minimum both occur at the endpoints $a, b$, then $g$ is constant: any $\xi \in (a, b)$ satisfies the result. Otherwise, at least one extreme value occurs at some $\xi \in (a, b)$: part 1 says that $g'(\xi) = 0$.

Examples 3.15. 1. Let $f(x) = (x - 1)^2 (4 - x) + x$ on $[a, b] = [1, 4]$: this is roughly the above picture illustrating the mean value theorem. We compute the average slope and the derivative:

\[ \frac{f(b) - f(a)}{b - a} = 1, \qquad f'(x) = 2(x - 1)(4 - x) - (x - 1)^2 + 1 = -3x^2 + 12x - 8 \]

and observe that

\[ f'(\xi) = \frac{f(b) - f(a)}{b - a} \iff 3\xi^2 - 12\xi + 9 = 0 \iff \xi = 1 \text{ or } 3 \]

Since only 3 lies in the interval $(1, 4)$, this is the value $\xi$ satisfying the mean value theorem.
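The computation in this example can be double-checked in a few lines. This is merely a sketch using the explicit formulas for $f$ and $f'$ found above; the variable names are our own.

```python
import math


def f(x):
    return (x - 1) ** 2 * (4 - x) + x


def fprime(x):
    return -3 * x**2 + 12 * x - 8


a, b = 1.0, 4.0
average_slope = (f(b) - f(a)) / (b - a)

# Solve f'(xi) = average_slope, i.e. -3 xi^2 + 12 xi - (8 + average_slope) = 0.
A, B, C = -3.0, 12.0, -8.0 - average_slope
root_disc = math.sqrt(B * B - 4 * A * C)
roots = sorted([(-B + root_disc) / (2 * A), (-B - root_disc) / (2 * A)])

# Keep only solutions in the open interval (a, b), as the theorem requires.
interior = [xi for xi in roots if a < xi < b]
print(average_slope, roots, interior)
```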

2. We find the maximum and minimum values of $g(x) = x^4 - 14x^2 + 24x$ on the interval $[0, 2]$. The function is differentiable, with

\[ g'(x) = 4x^3 - 28x + 24 = 4(x - 2)(x - 1)(x + 3) \]

By the Lemma, the locations of the extrema are either the endpoints $x = 0, 2$ or interior locations with zero derivative ($x = 1$). Since

\[ g(0) = 0, \quad g(1) = 11, \quad g(2) = 8 \]

we conclude that $\max(g) = g(1) = 11$ and $\min(g) = g(0) = 0$.

(Figure: graph of $g(x)$ on $[0, 2]$.)
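The closed-interval method used in this example (compare the values at the endpoints and at interior critical points) is mechanical enough to script. The sketch below hard-codes the roots of $g'$ read off from the factorization $4(x-2)(x-1)(x+3)$; the variable names are ours.

```python
def g(x):
    return x**4 - 14 * x**2 + 24 * x


def gprime(x):
    return 4 * x**3 - 28 * x + 24


a, b = 0, 2
# Roots of g' taken from its factorization; keep those strictly inside (a, b).
critical_points = [r for r in (-3, 1, 2) if gprime(r) == 0 and a < r < b]

candidates = [a, b] + critical_points
values = {x: g(x) for x in candidates}
x_max = max(values, key=values.get)
x_min = min(values, key=values.get)
print(values, "max at", x_max, "min at", x_min)
```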

Consequences of the Mean Value Theorem Several simple corollaries relate to monotonicity.

Deﬁnition 3.16. Suppose f : I → R is deﬁned on an interval I. We say that f is:

Increasing (monotone-up) on I if x < y =⇒ f (x) ≤ f (y)

Decreasing (monotone-down) on I if x < y =⇒ f (x) ≥ f (y)

We say strictly increasing/decreasing if the inequalities are strict.

Examples 3.17. 1. $f : x \mapsto x^2$ is strictly increasing on $[0, \infty)$ and strictly decreasing on $(-\infty, 0]$.

2. The floor function $f : x \mapsto \lfloor x \rfloor$ (the greatest integer less than or equal to $x$) is increasing, but not strictly, on $\mathbb{R}$.

(Figure: graph of the floor function.)

Corollary 3.18. Suppose $f$ is differentiable on an interval $I$, then

1. $f' \ge 0$ on $I$ $\iff$ $f$ is increasing on $I$

2. $f' \le 0$ on $I$ $\iff$ $f$ is decreasing on $I$

3. $f' = 0$ on $I$ $\iff$ $f$ is constant on $I$

Proof. ($\Rightarrow$) Let $x < y$ where $x, y \in I$. By the mean value theorem, $\exists \xi \in (x, y)$ such that

\[ \frac{f(y) - f(x)}{y - x} = f'(\xi) \quad\text{whence}\quad f'(\xi) \ge 0 \implies f(y) \ge f(x) \]

($\Leftarrow$) For the converse, use the definition of derivative: $f'(\xi) = \lim_{x \to \xi} \frac{f(x) - f(\xi)}{x - \xi}$. If $f$ is increasing, then

\[ x > \xi \implies f(x) \ge f(\xi) \implies f'(\xi) \ge 0 \]

Parts 2 and 3 are similar.

Corollary 3.18 yields a couple of ﬂashbacks to elementary calculus.

Corollary 3.19. Let $I$ be an interval.

1. (Anti-derivatives on an interval) If $f'(x) = g'(x)$ on $I$, then $\exists c$ such that $g(x) = f(x) + c$ on $I$.

2. (First derivative test) Suppose $f$ is continuous on $I$ and differentiable except perhaps at $\xi$. If

\[ \begin{cases} f'(x) < 0 \text{ whenever } x < \xi, \text{ and} \\ f'(x) > 0 \text{ whenever } x > \xi \end{cases} \]

then $f$ has its minimum value at $x = \xi$.

The statement for a maximum is similar.

Examples 3.20. 1. We have $\frac{d}{dx} \sin(3x^2 + x) = (6x + 1) \cos(3x^2 + x)$ on (the interval) $\mathbb{R}$, whence all anti-derivatives of $f(x) = (6x + 1) \cos(3x^2 + x)$ are given by

\[ \int f(x) \, dx = \int (6x + 1) \cos(3x^2 + x) \, dx = \sin(3x^2 + x) + c \]

As is typical, we use the indefinite integral notation $\int f(x) \, dx$ for anti-derivatives.

2. If $f(x) = x^{2/3} e^{x/3}$, then $f'(x) = \frac{1}{3} x^{-1/3} (2 + x) e^{x/3}$.

By Lemma 3.14, the only possible critical points are at $x = 0$ or $-2$. The sign of the derivative is also clear: $f'(x) > 0$ when $x < -2$, $f'(x) < 0$ when $-2 < x < 0$, and $f'(x) > 0$ when $x > 0$.

(Figure: graph of $f(x)$ for $-3 \le x \le 1$.)

By the 1st derivative test, $f$ has a local maximum at $x = -2$ and a minimum at $x = 0$.

We ﬁnish this section by tying together the mean and intermediate value theorems.

Theorem 3.21 (IVT for Derivatives). Suppose $f$ is differentiable on an interval $I$ containing $a < b$, and that $L$ lies between $f'(a)$ and $f'(b)$. Then $\exists \xi \in (a, b)$ such that $f'(\xi) = L$.

If $f'(x)$ is continuous, this is just the intermediate value theorem applied to $f'$. A full proof is left to the exercises; surprisingly, continuity is not required. . .

Exercises 29 1. Determine whether the conclusion of the mean value theorem holds for each func-

tion on the given interval. If so, ﬁnd a suitable point ξ. If not, state which hypothesis fails.

(a) x

on [−1, 2] (b) sin x on [0, π] (c)

on [−1, 2]

(d) 1/x on [−1, 1] (e) 1/x on [1, 3]

2. Suppose $f$ and $g$ are differentiable on an open interval $I$, that $a < b$ and $f(a) = f(b) = 0$. By considering $h(x) = f(x) e^{g(x)}$, prove that $f'(\xi) + f(\xi) g'(\xi) = 0$ for some $\xi \in (a, b)$.

3. Use the Mean Value Theorem to prove the following:

(a) x < tan x for all x ∈ (0, π/2).

(b)

sin x

is a strictly increasing function on (0, π/2).

sin x for all x ∈ [0, π/2].

4. Suppose that

f (x) − f (y)

≤ (x −y)

for all x, y ∈ R. Prove that f is a constant function.

5. (a) Prove that $f' > 0$ on an interval $I$ $\implies$ $f$ is strictly increasing on $I$.

(b) Show that the converse of part (a) is false.

6. If $f$ is differentiable on an interval $I$ such that $f'(x) \ne 0$ for all $x \in I$, use the intermediate value theorem for derivatives to prove that $f$ is either strictly increasing or strictly decreasing.

7. We prove the intermediate value theorem for derivatives. Let $f$, $a$, $b$ and $L$ be as in the Theorem, define $g : I \to \mathbb{R}$ by $g(x) = f(x) - Lx$, and let $\xi \in [a, b]$ be such that

\[ g(\xi) = \min\{ g(x) : x \in [a, b] \} \]

(a) Why can we be sure that $\xi$ exists? If $\xi \in (a, b)$, explain why $f'(\xi) = L$.

(b) Now assume WLOG that $f'(a) < f'(b)$. Prove that $g'(a) < 0 < g'(b)$. By considering $\lim_{x \to a} \frac{g(x) - g(a)}{x - a}$, show that $\exists x > a$ for which $g(x) < g(a)$. Hence complete the proof.

8. Suppose $f'$ exists on $(a, b)$, and is continuous except for a discontinuity at $c \in (a, b)$.

(a) Obtain a contradiction if $\lim_{x \to c^+} f'(x) = L < f'(c)$. Hence argue that $f'$ cannot have a removable or a jump discontinuity at $x = c$.

(Hint: let $\epsilon = \frac{f'(c) - L}{2}$ in the definition of limit then apply IVT for derivatives)

(b) Similarly, obtain a contradiction if $\lim_{x \to c^+} f'(x) = \infty$ and conclude that $f'$ cannot have an infinite discontinuity at $x = c$.

(c) However, $f'$ can have an essential discontinuity. Recall (Exercise 28.7) that
\[ f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} x^2 \sin(1/x) & x \ne 0 \\ 0 & x = 0 \end{cases} \]
is differentiable on $\mathbb{R}$, but has discontinuous derivative at $x = 0$.

i. By considering $x_n = \frac{1}{2n\pi}$ and $y_n = \frac{1}{(2n+1)\pi}$, show that $f'$ has an essential discontinuity at $x = 0$.

ii. Prove that if $s_n \to 0$ and $f'(s_n)$ converges to some $M$, then $M \in [-1, 1]$.

iii. Use IVT for derivatives to show that for any $L \in [-1, 1]$, $\exists (t_n) \subseteq \mathbb{R} \setminus \{0\}$ such that $\lim_{n \to \infty} f'(t_n) = L$.

30 L’Hôpital’s Rule

We are often forced to consider limits known as indeterminate forms, which do not yield easily to the standard limit laws. For example, it is tempting to try to write
\[ \lim_{x \to 0} \frac{\sin 2x}{e^{3x} - 1} = \frac{\lim_{x \to 0} \sin 2x}{\lim_{x \to 0} (e^{3x} - 1)} = \frac{0}{0} \tag{$*$} \]
This is an incorrect application of the limit laws since the resulting quotient has no meaning.

Definition 3.22. An indeterminate form is a limit where a naïve application of the limit laws results in a meaningless expression: the primary types are

∞

, ∞ − ∞, 0 · ∞, 0

, 0

∞

, and 1

∞

Examples 3.23. 1. lim

x→7

(x −7)

x−7

is an indeterminate form of type 0

∞

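2. The above indeterminate form (∗) may be evaluated using the definition of the derivative:
\[ \lim_{x \to 0} \frac{\sin 2x}{e^{3x} - 1} = \lim_{x \to 0} \frac{\sin 2x - 0}{x - 0} \left( \frac{e^{3x} - 1}{x - 0} \right)^{-1} = \frac{d}{dx}\Big|_{x=0} \sin 2x \cdot \left( \frac{d}{dx}\Big|_{x=0} e^{3x} \right)^{-1} = \frac{2}{3} \]
By considering $\lim_{x \to 0} \frac{3a \sin 2x}{2(e^{3x} - 1)}$, we see that an indeterminate form of type $\frac{0}{0}$ can take any value $a$!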

This approach generalizes: if $f(a) = 0 = g(a)$, we obtain the simplest version of l’Hôpital’s rule:
\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \cdot \frac{x - a}{g(x) - g(a)} = \frac{f'(a)}{g'(a)} \]

This obviously isn’t rigorous. Our goal is to make it so and to extend to the following situations:

• Limits where a = ±∞.

• When the RHS cannot be cleanly evaluated: for instance $g'(a) = 0$ or if the original limit is $\pm\infty$.

Covering all cases makes the proof an absolute behemoth! Because of this, and because such limits

can often be evaluated more instructively using elementary methods, the rule is often discouraged

in Freshman calculus. To prepare for the upcoming monster, we ﬁrst generalize the MVT.

Lemma 3.24 (Extended Mean Value Theorem). Fix $a < b$, suppose $f, g$ are continuous on $[a, b]$ and differentiable on $(a, b)$. Then there exists $\xi \in (a, b)$ such that
\[ \big( f(b) - f(a) \big) g'(\xi) = \big( g(b) - g(a) \big) f'(\xi) \]

Proof. Simply apply the standard mean value theorem (really Rolle’s Theorem) to
\[ h(t) = \big( f(b) - f(a) \big) g(t) - \big( g(b) - g(a) \big) f(t) \]
which satisfies $h(a) = f(b)g(a) - g(b)f(a) = h(b)$.

Theorem 3.25 (l’Hôpital’s rule). Let $a \in \mathbb{R} \cup \{\pm\infty\}$ and suppose functions $f$ and $g$ satisfy:

1. $\lim_{x \to a} \frac{f'(x)}{g'(x)} = L$ for some $L \in \mathbb{R} \cup \{\pm\infty\}$

2. (a) $\lim_{x \to a} f(x) = \lim_{x \to a} g(x) = 0$, or (b) $\lim_{x \to a} g(x) = \infty$ (no condition on $f$)

Then $\lim_{x \to a} \frac{f(x)}{g(x)} = L$. The same result holds for one-sided limits.

Examples 3.26. 1. If f (x) = e

and g(x) = 21x −17, then

lim

x→∞

′

(x)

′

(x)

= lim

x→∞

= ∞ =⇒ lim

x→∞

21x −17

= ∞

This is an example of type

∞

2. For an example of type $\frac{0}{0}$, consider $f(x) = x^2 - 9$ and $g(x) = \ln(4 - x)$:
\[ \lim_{x \to 3^-} \frac{f'(x)}{g'(x)} = \lim_{x \to 3^-} \frac{2x}{-1/(4 - x)} = \lim_{x \to 3^-} 2x(x - 4) = -6 \implies \lim_{x \to 3^-} \frac{x^2 - 9}{\ln(4 - x)} = -6 \]

3. One can apply the rule repeatedly: for example
\[ \lim_{x \to 0} \frac{e^{4x} - 1 - 4x}{x^2} = \lim_{x \to 0} \frac{4e^{4x} - 4}{2x} = \lim_{x \to 0} \frac{16e^{4x}}{2} = 8 \]
There is an abuse of protocol here, since the existence of the first limit is dependent on the last. The approach is acceptable, though you should understand why it is an abuse. Indeed...

4. It is important that the limit $\lim \frac{f'}{g'}$ be seen to exist before applying l’Hôpital’s rule! Consider $f(x) = x + \cos x$ and $g(x) = x$: certainly $\lim_{x \to \infty} \frac{f(x)}{g(x)}$ has type $\frac{\infty}{\infty}$, however
\[ \lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} \frac{1 - \sin x}{1} \]
does not exist! In this case the rule is unnecessary, since
\[ \frac{f(x)}{g(x)} = 1 + \frac{\cos x}{x} \xrightarrow[x \to \infty]{} 1 \]
by the squeeze theorem.

5. Finally, a short example to explain why l’Hôpital’s rule is often prohibited in Freshman calculus. Consider the calculation:
\[ \lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{\cos x}{1} = 1 \]
This appears to be a legitimate application of the rule. However, recall (Exercise 28.9) that one purpose of this limit is to demonstrate that $\frac{d}{dx} \sin x = \cos x$; to use this fact to calculate the limit on which it depends is the very definition of circular logic!
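As an informal sanity check (not a proof), the limits in Examples 2, 3 and 4 can be probed numerically by evaluating the quotient $f/g$ close to the limit point. The sketch below is an illustration only: the helper name `ratio_near` and the step sizes are assumptions made here, not part of the notes.

```python
import math

def ratio_near(f, g, a, h):
    """Evaluate f/g at the point a + h (use negative h for a left limit)."""
    x = a + h
    return f(x) / g(x)

# Example 2: (x^2 - 9)/ln(4 - x) as x -> 3 from the left; the rule predicts -6
ex2 = ratio_near(lambda x: x**2 - 9, lambda x: math.log(4 - x), 3.0, -1e-6)

# Example 3: (e^{4x} - 1 - 4x)/x^2 as x -> 0; the rule predicts 8
# math.expm1(t) computes e^t - 1 accurately for small t
ex3 = ratio_near(lambda x: math.expm1(4 * x) - 4 * x, lambda x: x**2, 0.0, 1e-4)

# Example 4: (x + cos x)/x for large x; the squeeze theorem gives 1
ex4 = (1e8 + math.cos(1e8)) / 1e8

print(ex2, ex3, ex4)
```

Numerical evidence of this kind can suggest the value of a limit, but it cannot replace checking the hypotheses of the theorem, as Example 4 shows.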

Other Indeterminate Forms

The remaining indeterminate forms listed in Definition 3.22 may be modified so that l’Hôpital’s rule applies. Since you’ve likely seen several such examples in elementary calculus, we give just a couple.

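Examples 3.27. 1. An indeterminate form of type $\infty - \infty$ is transformed to one of type $\frac{0}{0}$ before applying the rule (twice):
\begin{align*}
\lim_{x \to 0} \left( \frac{1}{e^x - 1} - \frac{1}{x} \right) &= \lim_{x \to 0} \frac{x + 1 - e^x}{x(e^x - 1)} && \text{(type $\tfrac{0}{0}$)} \\
&= \lim_{x \to 0} \frac{1 - e^x}{e^x - 1 + xe^x} && \text{(still type $\tfrac{0}{0}$)} \\
&= \lim_{x \to 0} \frac{-e^x}{2e^x + xe^x} = -\frac{1}{2}
\end{align*}

2. For an indeterminate form of type $1^\infty$, we use the log laws & the continuity of the exponential:
\begin{align*}
\lim_{x \to 0} (1 + \sin x)^{1/x} &= \exp\left( \lim_{x \to 0} \frac{\ln(1 + \sin x)}{x} \right) && \text{(type $\tfrac{0}{0}$)} \\
&= \exp\left( \lim_{x \to 0} \frac{\cos x}{1 + \sin x} \right) = e
\end{align*}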

Proving l’Hôpital’s Rule

The complete argument is very long; if you do nothing else, read the following proof of the simplest

case. Everything else is a modiﬁcation.

Proof (type $\frac{0}{0}$ with right limits). We prove first for right-limits $x \to a^+$. First observe that condition 1 forces the existence of an interval $(a, b)$ on which $f, g$ are differentiable and $g'(x) \ne 0$.

Assume we have a form of type $\frac{0}{0}$ (case 2(a)) and assume additionally that $a$ and $L$ are finite. Everything follows from the definition of limit (condition 1) and Lemma 3.24:
\[ \text{Given } \epsilon > 0,\ \exists \delta \in (0, b - a) \text{ such that } a < \xi < a + \delta \implies \left| \frac{f'(\xi)}{g'(\xi)} - L \right| < \frac{\epsilon}{2} \tag{$*$} \]
\[ a < y < x < a + \delta \implies \exists \xi \in (y, x) \text{ such that } \frac{f(x) - f(y)}{g(x) - g(y)} = \frac{f'(\xi)}{g'(\xi)} \tag{$\dagger$} \]

Since $g' \ne 0$, the usual mean value theorem says we never divide by zero in (†):
\[ \exists c \in (y, x) \text{ such that } g(x) - g(y) = g'(c)(x - y) \ne 0 \]

Observe that $\left| \frac{f(x) - f(y)}{g(x) - g(y)} - L \right| = \left| \frac{f'(\xi)}{g'(\xi)} - L \right| < \frac{\epsilon}{2}$, let $y \to a^+$ and use 2(a) to see that
\[ \forall x \in (a, a + \delta),\quad \left| \frac{f(x)}{g(x)} - L \right| \le \frac{\epsilon}{2} < \epsilon \]
which is the required result.

We now describe some modiﬁcations.

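(The colors of the typeset notes are lost in this text: in (∗), the blue part is the clause "$\exists \delta \in (0, b - a)$ such that $a < \xi < a + \delta$", while the green parts are "Given $\epsilon > 0$" and the final inequality.)

If $a = -\infty$: Replace the blue part of (∗) as follows:
\[ \text{Given } \epsilon > 0,\ \exists m \le b \text{ such that } \xi < m \implies \left| \frac{f'(\xi)}{g'(\xi)} - L \right| < \frac{\epsilon}{2} \]
The rest of the proof goes through after replacing $a$ with $-\infty$ and $a + \delta$ with $m$.

If $L = \infty$: Replace the green parts of (∗) with "Given $M > 0$" and "$\frac{f'(\xi)}{g'(\xi)} > 2M$". Fixing the rest of the proof is again straightforward.

If $L = -\infty$: Replace the green parts of (∗) with "Given $M > 0$" and "$\frac{f'(\xi)}{g'(\xi)} < -2M$".

Left-limits: If $f, g$ are differentiable on $(c, a)$, then the blue part may be replaced with either:

• ($a$ finite) $\exists \delta \in (0, a - c)$ such that $a - \delta < \xi < a$

• ($a = \infty$) $\exists m \ge c$ such that $\xi > m$

The blue and green parts of (∗) can be replaced independently. This completes the proof for all indeterminate forms of type $\frac{0}{0}$.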

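Proof (case 2(b) when $\lim g(x) = \infty$). This requires a little more modification.

Since $g' \ne 0$ and $\lim_{x \to a^+} g(x) = \infty$, Exercise 29.6 says that $g$ is strictly decreasing on $(a, b)$. By replacing $b$ by some $\tilde{b} \in (a, b)$, if necessary, we may assume that
\[ a < y < x < b \implies 0 < g(x) < g(y) \tag{$\ddagger$} \]

Assume $a$ and $L$ are finite, and obtain (∗) and (†) as before. Let $x \in (a, a + \delta)$ be fixed and multiply (†) by $\frac{g(y) - g(x)}{g(y)}$ (this is positive by (‡)): a little algebra and the triangle inequality tell us that
\begin{align*}
a < y < x &\implies \frac{f(y)}{g(y)} = \frac{f'(\xi)}{g'(\xi)} + \frac{f(x)}{g(y)} - \frac{g(x)}{g(y)} \cdot \frac{f'(\xi)}{g'(\xi)} \\
&\implies \left| \frac{f(y)}{g(y)} - L \right| \le \left| \frac{f'(\xi)}{g'(\xi)} - L \right| + \frac{1}{g(y)} \left( |f(x)| + g(x) \left( |L| + \frac{\epsilon}{2} \right) \right)
\end{align*}

Since $\lim_{y \to a^+} g(y) = \infty$ and $x$ is fixed, we see that there exists $\eta \le x - a < \delta$ such that
\[ y \in (a, a + \eta) \implies \frac{1}{g(y)} \left( |f(x)| + g(x) \left( |L| + \frac{\epsilon}{2} \right) \right) < \frac{\epsilon}{2} \]

Finally combine with (∗): given $\epsilon > 0$, $\exists \eta > 0$ such that $y \in (a, a + \eta) \implies \left| \frac{f(y)}{g(y)} - L \right| < \epsilon$.

The same modifications listed previously complete the proof.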

Forms of type $\frac{\infty}{\infty}$? Instead of assumption 2(b), why not simply assume $\lim f = \lim g = \infty$ and write $\frac{f}{g} = \frac{1/g}{1/f}$ to obtain a form of type $\frac{0}{0}$? The problem is that the derivative of the ‘new’ denominator, $\frac{d}{dx} \frac{1}{f} = -\frac{f'}{f^2}$, need not be non-zero on any interval $(a, b)$ and so condition 1 need not hold. We could modify this, but it would make for a weaker theorem. Example 3.26.4 illustrates this: $f'(x) = 1 - \sin x$ has zeros on any unbounded interval.

After the 2(b) case is proved and we know that $\lim \frac{f}{g} = L$, it is then clear that $\lim f$ must also be infinite (unless $L = 0$ in which case $\lim f$ could be anything and need not exist). This situation therefore really does deal with forms of type $\frac{\infty}{\infty}$.

Exercises 30 1. Evaluate the following limits, if they exist:

(a) lim

x→0

sin x −x

(b) lim

x→

−

tan x −

π −2x

x→0

(cos x)

1/x

(d) lim

x→0

(1 + 2x)

1/x

(e) lim

x→∞

( e

+ x)

1/x

2. Let $f$ be differentiable on $(c, \infty)$ and suppose that $\lim_{x \to \infty} [f(x) + f'(x)] = L$ is finite.

(a) Prove that $\lim_{x \to \infty} f(x) = L$ and that $\lim_{x \to \infty} f'(x) = 0$.

(Hint: write $f(x) = \frac{f(x)e^x}{e^x}$)

(b) Does anything change if $L$ exists and is infinite?

3. If $p_n(x)$ is a polynomial of degree $n$, use induction to prove that $\lim_{x \to \infty} p_n(x)e^{-x} = 0$.

4. Let $f(x) = x + \sin x \cos x$, $g(x) = e^{\sin x} f(x)$ and $h(x) = \frac{2\cos x}{e^{\sin x}(f(x) + 2\cos x)}$.

(a) Prove that $\lim_{x \to \infty} f(x) = \infty = \lim_{x \to \infty} g(x)$ but that $\lim_{x \to \infty} \frac{f(x)}{g(x)}$ does not exist.

(b) If $\cos x \ne 0$, and $x$ is large, show that $\frac{f'(x)}{g'(x)} = h(x)$.

(c) Show that $\lim_{x \to \infty} h(x) = 0$. Explain why this does not contradict part (a)!

31 Taylor’s Theorem

A primary goal of power series is the approximation of functions. As such, there are two natural

questions to ask of a given function f :

1. Given $c \in \operatorname{dom}(f)$, is there a series $\sum a_n (x - c)^n$ which equals $f(x)$ on an interval containing $c$?

2. If we take the ﬁrst n terms of such a series, how accurate is this polynomial approximation?

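Example 3.28. Recall the geometric series
\[ f(x) = \frac{1}{1 - x} = \sum_{n=0}^\infty x^n \quad \text{whenever } -1 < x < 1 \]
The polynomial approximation
\[ p_n(x) = \sum_{k=0}^n x^k = 1 + x + \cdots + x^n = \frac{1 - x^{n+1}}{1 - x} \]
has error
\[ R_n(x) = f(x) - p_n(x) = \frac{x^{n+1}}{1 - x} \]

[Figure: graph of $f$ on $(-1, 1)$ together with the approximation $p_3(x) = 1 + x + x^2 + x^3$]

If $x$ is close to 0, this is likely very small; for instance if $x \in \left[ -\frac{1}{2}, \frac{1}{2} \right]$, then
\[ |R_n(x)| \le \frac{1}{1 - \frac{1}{2}} \left( \frac{1}{2} \right)^{n+1} = 2^{-n} \]
However, when $x$ is close to 1, the error is unbounded!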
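The error estimate in the Example can be checked numerically. The following is a minimal sketch, assuming the choices $n = 10$ and a sample grid of 101 points on $[-\frac{1}{2}, \frac{1}{2}]$; the function names `p` and `remainder` are illustrative only.

```python
def p(n, x):
    """Partial sum 1 + x + ... + x^n of the geometric series."""
    return sum(x**k for k in range(n + 1))

def remainder(n, x):
    """R_n(x) = 1/(1 - x) - p_n(x)."""
    return 1 / (1 - x) - p(n, x)

n = 10
grid = [-0.5 + k / 100 for k in range(101)]      # sample points in [-1/2, 1/2]
worst = max(abs(remainder(n, x)) for x in grid)  # largest sampled error
bound = 2.0 ** (-n)                              # the bound 2^{-n} from the Example
near_one = abs(remainder(n, 0.999))              # error at a point close to x = 1

print(worst, bound, near_one)
```

The largest sampled error occurs at $x = \frac{1}{2}$, where the bound $2^{-n}$ is attained, while the error near $x = 1$ is several orders of magnitude larger.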

The behavior in the Example occurs in general: the truncated polynomial approximations are better

near the center of the series. To see this, we ﬁrst need to consider higher-order derivatives.

Definition 3.29. We write $f''$ for the second derivative of $f$, namely the derivative of its derivative
\[ f''(a) = \lim_{x \to a} \frac{f'(x) - f'(a)}{x - a} \]
The existence of $f''(a)$ presupposes that $f'$ exists on an (open) interval containing $a$. We can similarly consider third, fourth, and higher-order derivatives. As a function, the $n$th derivative is written
\[ f^{(n)}(x) = \frac{d^n f}{dx^n} \]
By convention, the zeroth derivative is the function itself $f^{(0)}(x) = f(x)$. We say that $f$ is $n$ times differentiable at $a$ if $f^{(n)}(a)$ exists, and infinitely differentiable (or smooth) if derivatives of all orders exist.

Example 3.30. $f(x) = x^2 |x|$ is twice differentiable, with $f''(x) = 6|x|$. It is smooth everywhere except at $x = 0$, where third (and higher-order) derivatives do not exist.

Definition 3.31. Suppose $f$ is $n$ times differentiable at $x = c$. The $n$th Taylor polynomial $p_n$ of $f$ centered at $c$ is
\[ p_n(x) := \sum_{k=0}^n \frac{f^{(k)}(c)}{k!} (x - c)^k = f(c) + f'(c)(x - c) + \frac{f''(c)}{2} (x - c)^2 + \cdots + \frac{f^{(n)}(c)}{n!} (x - c)^n \]
The remainder $R_n(x)$ is the error in the polynomial approximation
\[ R_n(x) = f(x) - p_n(x) = f(x) - \sum_{k=0}^n \frac{f^{(k)}(c)}{k!} (x - c)^k \]
If $f$ is infinitely differentiable at $x = c$, then its Taylor series centered at $x = c$ is the power series
\[ Tf(x) = \sum_{n=0}^\infty \frac{f^{(n)}(c)}{n!} (x - c)^n \]
When $c = 0$ this is known as a Maclaurin series.

For simplicity we’ll most often work with Maclaurin series, with general cases hopefully being clear.

Examples 3.32. 1. If $f(x) = e^{3x}$, then $f^{(n)}(x) = 3^n e^{3x}$, from which the Maclaurin series is
\[ Tf(x) = \sum_{n=0}^\infty \frac{3^n}{n!} x^n \]

2. If $g(x) = \sin 7x$, then the sequence of derivatives is
\[ 7\cos 7x,\ -7^2 \sin 7x,\ -7^3 \cos 7x,\ 7^4 \sin 7x,\ 7^5 \cos 7x,\ -7^6 \sin 7x,\ \ldots \]
At $x = 0$, every even derivative is zero, while the odd derivatives alternate in sign; the Maclaurin series is easily seen to be
\[ Tg(x) = \sum_{n=0}^\infty \frac{(-1)^n 7^{2n+1}}{(2n+1)!} x^{2n+1} \]

3. If $h(x) = \sqrt{x}$, then $h'(x) = \frac{1}{2} x^{-1/2}$, $h''(x) = \frac{-1}{4} x^{-3/2}$, and $h'''(x) = \frac{3}{8} x^{-5/2}$, from which the third Taylor polynomial centered at $c = 1$ is
\[ p_3(x) = h(1) + h'(1)(x - 1) + \frac{h''(1)}{2} (x - 1)^2 + \frac{h'''(1)}{3!} (x - 1)^3 = 1 + \frac{1}{2}(x - 1) - \frac{1}{8}(x - 1)^2 + \frac{1}{16}(x - 1)^3 \]

Rather than compute more examples, we develop a little theory that makes verifying Taylor series

much easier.

Named for Englishman Brook Taylor (1685–1731) and Scotsman Colin Maclaurin (1698–1746). Taylor’s general method expanded on examples discovered by James Gregory and Isaac Newton in the mid-to-late 1600s.

Differentiation of Taylor Polynomials and Series

Suppose $P(x) = \sum a_j x^j$ is a power series with radius of convergence $R > 0$. As we discovered previously, this is differentiable term-by-term on $(-R, R)$. Indeed
\begin{align*}
P'(x) &= \sum_{j=1}^\infty a_j j x^{j-1} && \implies P'(0) = a_1 \\
P''(x) &= \sum_{j=2}^\infty a_j j(j - 1) x^{j-2} && \implies P''(0) = 2a_2 \\
P'''(x) &= \sum_{j=3}^\infty a_j j(j - 1)(j - 2) x^{j-3} && \implies P'''(0) = 3!\,a_3 \\
P^{(k)}(x) &= \sum_{j=k}^\infty a_j j(j - 1) \cdots (j - k + 1) x^{j-k} = \sum_{j=k}^\infty \frac{j!\,a_j}{(j - k)!} x^{j-k} && \implies P^{(k)}(0) = k!\,a_k
\end{align*}
Otherwise said, $P$ is its own Maclaurin series! The same discussion holds for polynomials: indeed if $P(x) = a_0 + a_1 x + \cdots + a_n x^n$ is a polynomial, then for each $k \le n$,
\[ P^{(k)}(0) = f^{(k)}(0) \iff a_k = \frac{f^{(k)}(0)}{k!} \]
If this holds for all $k \le n$, then $P$ must be the Taylor polynomial of $f$! With a little modification, we’ve proved the following:

Theorem 3.33. 1. If $f(x) = \sum_{n=0}^\infty a_n (x - c)^n$ on a neighborhood of $c$, then $\sum_{n=0}^\infty a_n (x - c)^n$ is the Taylor series of $f$.

2. The $n$th Taylor polynomial of $f$ centered at $x = c$ is the unique polynomial $p_n$ of degree $\le n$ whose value and first $n$ derivatives agree with those of $f$ at $x = c$: that is
\[ \forall k \le n,\quad p_n^{(k)}(c) = f^{(k)}(c) \]

This answers our ﬁrst motivating question: a function can equal at most one power series with a

given center. The second question requires a careful study of the remainder: we’ll do this shortly.

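Examples 3.34 (Common Maclaurin Series). These should be familiar from elementary calculus. Each of these functions equals the given series by our previous discussion of power series: by the Theorem, each series is therefore the Maclaurin series of the given function with no requirement to calculate directly!
\begin{align*}
e^x &= \sum_{n=0}^\infty \frac{x^n}{n!}, & x &\in \mathbb{R} & \frac{1}{1 - x} &= \sum_{n=0}^\infty x^n, & x &\in (-1, 1) \\
\sin x &= \sum_{n=0}^\infty \frac{(-1)^n}{(2n+1)!} x^{2n+1}, & x &\in \mathbb{R} & \ln(1 + x) &= \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} x^n, & x &\in (-1, 1] \\
\cos x &= \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n}, & x &\in \mathbb{R} & \tan^{-1} x &= \sum_{n=0}^\infty \frac{(-1)^n}{2n+1} x^{2n+1}, & x &\in [-1, 1]
\end{align*}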

Examples 3.35 (Modifying Maclaurin Series). By substituting for x in a common series, we

quickly obtain new ones.

1. Substitute $x \mapsto 7x$ in the Maclaurin series for $\sin x$, to recover our earlier example
\[ \sin 7x = \sum_{n=0}^\infty \frac{(-1)^n 7^{2n+1}}{(2n+1)!} x^{2n+1}, \quad x \in \mathbb{R} \]
Note how this requires almost no calculation: since the function equals a series, the Theorem says we have the Maclaurin series for $\sin 7x$!

2. Substitute x 7→ x

in the Maclaurin series for e

to obtain

= exp(x

) =

∞

∑

n=0

, x ∈ R

This would be disgusting to verify directly, given the difﬁculty of repeatedly differentiating e

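3. We find the Taylor series for $f(x) = \frac{1}{5 - x}$ centered at $x = 2$:
\[ f(x) = \frac{1}{3 - (x - 2)} = \frac{1}{3\left( 1 - \frac{x - 2}{3} \right)} = \frac{1}{3} \sum_{n=0}^\infty \left( \frac{x - 2}{3} \right)^n = \sum_{n=0}^\infty \frac{(x - 2)^n}{3^{n+1}} \]
which is valid whenever $-1 < \frac{x - 2}{3} < 1 \iff -1 < x < 5$.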

4. Fix $c \in \mathbb{R}$ and observe that, for all $x \in \mathbb{R}$,
\[ e^x = e^{c + x - c} = e^c e^{x - c} = \sum_{n=0}^\infty \frac{e^c}{n!} (x - c)^n \]
We conclude that the series is the Taylor series of $e^x$ centered at $x = c$. Of course this is easily verified using the definition, since $\frac{d^n}{dx^n}\Big|_{x=c} e^x = e^c$.

5. Combining the Theorem with the angle-addition formula, we obtain the Taylor series for $\sin x$ centered at $x = c$:
\begin{align*}
\sin x &= \sin(c + x - c) = \sin c \cos(x - c) + \cos c \sin(x - c) \\
&= \sum_{n=0}^\infty \frac{(-1)^n \sin c}{(2n)!} (x - c)^{2n} + \sum_{n=0}^\infty \frac{(-1)^n \cos c}{(2n+1)!} (x - c)^{2n+1}
\end{align*}
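Two of the series above are easy to test numerically: the expansion $\frac{1}{5 - x} = \sum \frac{(x - 2)^n}{3^{n+1}}$ centered at 2, and the Taylor series of $\sin x$ centered at $c$. The sketch below is only an illustration; the function names and the numbers of terms are assumptions chosen so that the truncation error is negligible at the sampled points.

```python
import math

def inv_taylor(x, terms=60):
    """Partial sum of sum_n (x - 2)^n / 3^(n+1), the series for 1/(5 - x) centered at 2."""
    return sum((x - 2) ** n / 3 ** (n + 1) for n in range(terms))

def sin_taylor(x, c, terms=12):
    """Partial sum of the Taylor series of sin centered at c (angle-addition form)."""
    t = x - c
    even = sum((-1) ** n * t ** (2 * n) / math.factorial(2 * n) for n in range(terms))
    odd = sum((-1) ** n * t ** (2 * n + 1) / math.factorial(2 * n + 1) for n in range(terms))
    return math.sin(c) * even + math.cos(c) * odd

inv_at_4 = inv_taylor(4.0)          # compare with 1/(5 - 4) = 1
sin_at_2 = sin_taylor(2.0, c=1.0)   # compare with math.sin(2.0)

print(inv_at_4, sin_at_2, math.sin(2.0))
```

Note that $x = 4$ lies inside the interval of validity $(-1, 5)$; outside that interval the partial sums of the first series diverge.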

Deﬁnition 3.36. A function is analytic on a domain if for each c there exists a neighborhood of c on

which the function equals its Taylor series centered at c.

All the examples we’ve so far seen are analytic on their domains; indeed the last two of Examples

3.35 prove this for the exponential and sine functions. Every analytic function is automatically smooth

(inﬁnitely differentiable), however the converse is false in that not every smooth function is analytic

(see Exercise 10). Analyticity is of greater importance in complex analysis where it is seen to be

equivalent to complex-differentiability.

Accuracy of Taylor Approximations

Our final goal is to estimate the accuracy of a Taylor polynomial as an approximation to its generating function. Otherwise said, we want to estimate the size of the remainder $R_n(x) = f(x) - p_n(x)$.

Theorem 3.37 (Taylor’s Theorem: Lagrange’s form). Suppose $f$ is $n + 1$ times differentiable on an open interval $I$ containing $c$ and let $x \in I \setminus \{c\}$. Then there exists some $\xi$ between $c$ and $x$ for which the remainder centered at $c$ satisfies
\[ R_n(x) = \frac{f^{(n+1)}(\xi)}{(n + 1)!} (x - c)^{n+1} \]

Proof. For simplicity let $c = 0$. Fix $x \ne 0$, define a constant $M$ and a function $g : I \to \mathbb{R}$ by
\[ R_n(x) = \frac{M}{(n + 1)!} x^{n+1} \quad \text{and} \quad g(t) = \frac{M}{(n + 1)!} t^{n+1} + p_n(t) - f(t) = \frac{M}{(n + 1)!} t^{n+1} - R_n(t) \]
Observe that
\begin{align*}
k \le n + 1 &\implies g^{(k)}(t) = \frac{M}{(n + 1 - k)!} t^{n+1-k} + p_n^{(k)}(t) - f^{(k)}(t) \tag{$*$} \\
&\implies g^{(k)}(0) = p_n^{(k)}(0) - f^{(k)}(0) = 0 \text{ if } k \le n
\end{align*}
where we invoked Theorem 3.33. Note also that $g(x) = 0$ by the choice of $M$.

Apply Rolle’s Theorem repeatedly (WLOG assume $x > 0$):

• Since $g(0) = 0 = g(x)$, $\exists \xi_1$ between 0 and $x$ such that $g'(\xi_1) = 0$.

• Since $g'(0) = 0 = g'(\xi_1)$, $\exists \xi_2$ between 0 and $\xi_1$ such that $g''(\xi_2) = 0$, etc.

• Iterate to obtain a sequence $(\xi_k)$ such that
\[ 0 < \xi_{n+1} < \xi_n < \cdots < \xi_1 < x \quad \text{and} \quad g^{(k)}(\xi_k) = 0 \]

Take $\xi = \xi_{n+1}$ and consider (∗): since $\deg p_n \le n$, we see that
\[ 0 = g^{(n+1)}(\xi) = M - f^{(n+1)}(\xi) \implies R_n(x) = f(x) - p_n(x) = \frac{f^{(n+1)}(\xi)}{(n + 1)!} x^{n+1} \]

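Corollary 3.38. Suppose $f$ is smooth on an open interval $I$ containing $c$ and that the derivatives $f^{(n)}$ of all orders are bounded on $I$ by a common constant. Then $f$ equals its Taylor series (centered at $c$) on $I$.

Proof. For simplicity, let $c = 0$. Suppose $\left| f^{(n+1)}(\xi) \right| \le K$ for all $n$ and all $\xi \in I$. Choose any $N > |x|$ and observe that
\[ n > N \implies |R_n(x)| \le \frac{K |x|^{n+1}}{(n + 1)!} = \frac{K |x|^{n+1}}{N!\,(N + 1) \cdots (n + 1)} \le \frac{K |x|^N}{N!} \left( \frac{|x|}{N} \right)^{n+1-N} \xrightarrow[n \to \infty]{} 0 \]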

Examples 3.39. 1. The functions sine and cosine have derivatives bounded by 1 on $\mathbb{R}$, and thus both functions equal their Maclaurin series on $\mathbb{R}$. This removes the need to have previously justified these facts using the theory of differential equations.

2. The exponential function does not have bounded derivatives, however we can still apply Taylor’s Theorem. For any fixed $x$, $\exists \xi$ between 0 and $x$ such that
\[ |R_n(x)| = \left| \frac{e^\xi}{(n + 1)!} x^{n+1} \right| \le \frac{e^{|x|} |x|^{n+1}}{(n + 1)!} \xrightarrow[n \to \infty]{} 0 \]
by the same argument in the Corollary. Thus $e^x$ equals its Maclaurin series on the real line.

3. Extending Example 3.32.3, we see that the function $h(x) = \sqrt{x}$ has the following linear approximation (1st Taylor polynomial) centered at $c = 9$
\[ p_1(x) = h(9) + h'(9)(x - 9) = 3 + \frac{1}{6}(x - 9) \]
This yields the simple approximation $\sqrt{10} \approx p_1(10) = 3 + \frac{1}{6} = \frac{19}{6}$. Taylor’s Theorem can be used to estimate its accuracy (remember to shift the center to 9!):
\[ R_1(10) = \frac{h''(\xi)}{2!} (10 - 9)^2 = -\frac{1}{4 \cdot 2!} \xi^{-3/2} = -\frac{1}{8\xi^{3/2}} \quad \text{for some } \xi \in (9, 10) \]
Certainly $\xi^{-3/2} < 9^{-3/2} = \frac{1}{27}$, whence
\[ -\frac{1}{216} < R_1(10) < 0 \implies \frac{19}{6} - \frac{1}{216} = \frac{683}{216} < \sqrt{10} < \frac{684}{216} = \frac{19}{6} \]
The approximation $\frac{19}{6}$ is therefore an overestimate for $\sqrt{10}$, but is accurate to within $\frac{1}{216} < 0.005$.
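The bracketing $\frac{683}{216} < \sqrt{10} < \frac{19}{6}$ can be confirmed with exact rational arithmetic by comparing squares, which avoids any floating-point rounding. This is a small sketch using Python's standard `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

p1 = 3 + Fraction(1, 6) * (10 - 9)   # linear Taylor approximation of sqrt at c = 9
lower = p1 - Fraction(1, 216)        # p1 plus the most negative possible remainder

# All quantities are positive, so comparing squares compares the numbers themselves
is_bracketed = lower**2 < 10 < p1**2

print(p1, lower, is_bracketed)
```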

Alternative Versions of Taylor’s Theorem

The two other common expressions for the remainder are typically less easy to use than Lagrange’s

form, but can sometimes provide sharper estimates for the remainder, particularly when x is far from

the center of the series.

Corollary 3.40. Suppose $f^{(n+1)}$ is continuous on an open interval $I$ containing $c$, let $x \in I \setminus \{c\}$, and let $R_n(x) = f(x) - p_n(x)$ be the remainder for the Taylor polynomial centered at $c$. Then:

1. (Integral Remainder) $R_n(x) = \int_c^x \frac{(x - t)^n}{n!} f^{(n+1)}(t)\,dt$

2. (Cauchy’s Form) $\exists \xi$ between $c$ and $x$ such that $R_n(x) = \frac{(x - \xi)^n}{n!} (x - c) f^{(n+1)}(\xi)$

Using these expressions it is possible to explicitly prove Newton’s binomial series formula:

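Theorem 3.41. If $\alpha \in \mathbb{R}$ and $|x| < 1$, then
\[ (1 + x)^\alpha = 1 + \sum_{n=1}^\infty \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!} x^n = 1 + \alpha x + \frac{\alpha(\alpha - 1)}{2!} x^2 + \frac{\alpha(\alpha - 1)(\alpha - 2)}{3!} x^3 + \frac{\alpha(\alpha - 1)(\alpha - 2)(\alpha - 3)}{4!} x^4 + \cdots \]

If $\alpha \in \mathbb{N}_0$, this is the usual binomial theorem. Otherwise it is more interesting, for instance,
\begin{align*}
\sqrt{1 + x} = (1 + x)^{1/2} &= 1 + \frac{1}{2} x - \frac{1}{8} x^2 + \frac{1}{16} x^3 - \frac{5}{128} x^4 + \cdots \\
(1 + x)^{-3} &= 1 - 3x + 6x^2 - 10x^3 + 15x^4 - \cdots
\end{align*}
Of course this last could easily be obtained from $\frac{1}{1 + x} = \sum (-1)^n x^n$ by differentiating twice!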
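The coefficients in the two expansions above can be generated mechanically from the recurrence $a_n = a_{n-1} \cdot \frac{\alpha - n + 1}{n}$, which follows directly from the formula in the Theorem. The sketch below uses exact rationals; the function name `binomial_coeffs` is an assumption made for this illustration.

```python
from fractions import Fraction

def binomial_coeffs(alpha, count):
    """First `count` coefficients a_n = alpha(alpha-1)...(alpha-n+1)/n!, with a_0 = 1."""
    coeffs = [Fraction(1)]
    for n in range(1, count):
        # a_n = a_{n-1} * (alpha - n + 1) / n
        coeffs.append(coeffs[-1] * (alpha - n + 1) / n)
    return coeffs

sqrt_coeffs = binomial_coeffs(Fraction(1, 2), 5)   # coefficients for (1 + x)^(1/2)
cube_coeffs = binomial_coeffs(Fraction(-3), 5)     # coefficients for (1 + x)^(-3)

print(sqrt_coeffs)
print(cube_coeffs)
```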

Exercises 31 1. Compute the Maclaurin series for cos x directly from the deﬁnition and use Taylor’s

Theorem to indicate why it converges to cos x for all x ∈ R.

2. Repeat the previous exercise for $\sinh x = \frac{1}{2}(e^x - e^{-x})$ and $\cosh x = \frac{1}{2}(e^x + e^{-x})$.

3. Find the Maclaurin series for the function sin(3x

). How do you know you are correct?

4. Find the Taylor series of f (x) = x

−3x

+ 2x −5 centered at x = 2 and show that T f (x) = f (x).

5. Find a rational approximation to

√

9 using the ﬁrst Taylor polynomial for f (x) =

√

x. Now use

Taylor’s Theorem to estimate its accuracy.

6. If $c \ne 1$, use the fact that $1 - x = (1 - c)\left( 1 - \frac{x - c}{1 - c} \right)$ to obtain the Taylor series of $\frac{1}{1 - x}$ centered at $c$. Hence conclude that $\frac{1}{1 - x}$ is analytic on its domain $\mathbb{R} \setminus \{1\}$.

7. We use Taylor’s Theorem to prove that the Maclaurin series $\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} x^n$ converges to $\ln(1 + x)$ whenever $0 < x \le 1$.

(a) Explicitly compute $\frac{d^{n+1}}{dx^{n+1}} \ln(1 + x)$.

(b) Suppose $0 < x \le 1$. Using Taylor’s Theorem, prove that $\lim_{n \to \infty} R_n(x) = 0$.

(If $-1 < x < 0$, the argument is tougher, being similar to Exercise 11)

8. Why can’t we use Taylor’s Theorem to approximate the error in $\frac{1}{1 - x} = 1 + x + R_1(x)$ when $x \ge 1$? Try it when $x = 2$, what happens? What about when $x = -2$?

9. Prove Taylor’s Theorem with integral remainder when $c = 0$ by using the following as an induction step: for each $n \in \mathbb{N}$, define
\[ A_n(x) = \int_0^x \frac{(x - t)^n}{n!} f^{(n+1)}(t)\,dt \]
and use integration by parts to prove that $A_{n+1} = A_n - \frac{x^{n+1}}{(n+1)!} f^{(n+1)}(0)$.

(The Cauchy form follows from the intermediate value theorem for integrals which we’ll see later)

10. Consider the function
\[ f(x) = \begin{cases} e^{-1/x} & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases} \]

(a) Prove by induction that there exists a degree $2n$ polynomial $q_n$ for which
\[ f^{(n)}(x) = q_n\!\left( \frac{1}{x} \right) e^{-1/x} \quad \text{whenever } x > 0 \]

(b) Prove that $f$ is infinitely differentiable at $x = 0$ with $f^{(n)}(0) = 0$ (use Exercise 30.3).

The Maclaurin series of $f$ is identically zero! Moreover, $f$ is smooth (infinitely differentiable) on $\mathbb{R}$ but non-analytic at zero since it does not equal its Taylor series on any open interval containing zero.

A modification allows us to create bump functions, which find wide use in analysis. If $a < b$, define
\[ g_{a,b} : x \mapsto f(x - a) f(b - x) \]
This is smooth on $\mathbb{R}$ but non-zero only on the interval $(a, b)$. A further modification involving two such functions $g_{a,b}$ creates a smooth function on $\mathbb{R}$ which satisfies
\[ h_{a,b,\epsilon}(x) = \begin{cases} 0 & \text{if } x \le a - \epsilon \text{ or } x \ge b + \epsilon \\ 1 & \text{if } a \le x \le b \end{cases} \]
This ‘switches on’ rapidly from 0 to 1 near $a$ and switches off similarly near $b$. By letting $\epsilon$ be small, we smoothly (but not uniformly) approximate the indicator function on $[a, b]$.

[Figure: graph of $h_{a,b,\epsilon}(x)$ with $a - \epsilon$, $a$, $b$ and $b + \epsilon$ marked on the axis]

11. (Hard) We prove the binomial series formula. Let $f(x) = (1 + x)^\alpha$ and $g(x) = 1 + \sum_{n=1}^\infty a_n x^n$ where $a_n = \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!}$. Our goal is to prove that $f = g$ on the interval $(-1, 1)$.

(a) Check that $f^{(n)}(0) = n!\,a_n$ so that $g$ really is the Maclaurin series of $f$.

(b) i. Prove that the radius of convergence of $g$ is 1.

ii. Prove that $\lim_{n \to \infty} n a_n x^n = 0$ whenever $|x| < 1$.

iii. If $|x| < 1$ and $\xi$ lies between 0 and $x$, prove that $\left| \frac{x - \xi}{1 + \xi} \right| \le |x|$. (Hint: $\xi = tx$ for some $t \in (0, 1)$...)

(c) Use Cauchy’s form of the remainder to show that
\[ |R_n(x)| < (n + 1) |a_{n+1}| |x|^{n+1} (1 + \xi)^{\alpha - 1} \]
Hence conclude that $g = f$ whenever $|x| < 1$.

(d) Here is an alternative argument:

i. Show that $(n + 1) a_{n+1} + n a_n = \alpha a_n$.

ii. Differentiate term-by-term to prove directly that $g$ satisfies the differential equation $(1 + x) g'(x) = \alpha g(x)$. Solve this to show that $g = f$ whenever $|x| < 1$.

4 Integration

The theory of inﬁnite series addresses the problem of summing inﬁnitely many ﬁnite quantities. By

contrast, integration is the business of summing inﬁnitely many inﬁnitesimal quantities. Mathemati-

cians have attempted to do both for well over 2000 years, and the philosophical objections are just as

old.

The development and increased application of calculus from the late 1600s spurred mathemati-

cians to try to put the theory on a ﬁrmer footing, though from Newton and Leibniz it took another

150 years before Bernhard Riemann (1856) provided a thorough development of the integral.

32 The Riemann Integral

The basic idea behind Riemann integration is to approximate area using a sequence of rectangles

whose width tends to zero. The following discussion is hopefully familiar.

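Example 4.1. Consider $f(x) = x^2$ defined on $[0, 1]$. For each $n \in \mathbb{N}$, let $\Delta x = \frac{1}{n}$ and define $x_i = i\Delta x$. Above each subinterval $[x_{i-1}, x_i]$, raise a rectangle of height $f(x_i) = x_i^2$. The sum of the areas of these rectangles is the Riemann sum with right-endpoints
\[ R_n = \sum_{i=1}^n f(x_i)\,\Delta x = \frac{1}{n^3} \sum_{i=1}^n i^2 = \frac{n(n + 1)(2n + 1)}{6n^3} = \frac{1}{3} + \frac{3n + 1}{6n^2} \]
The Riemann sum with left-endpoints is defined similarly:
\[ L_n = \sum_{i=1}^n f(x_{i-1})\,\Delta x = \frac{1}{n^3} \sum_{i=1}^n (i - 1)^2 = \frac{1}{3} - \frac{3n - 1}{6n^2} \]
Since $f$ is increasing, the area $A$ under the curve satisfies $L_n \le A \le R_n$ and the squeeze theorem allows us to conclude that $A = \frac{1}{3}$.

[Figures: right-endpoint rectangles with $R_n \approx 0.365234$ and left-endpoint rectangles with $L_n \approx 0.302734$; these values correspond to $n = 16$]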
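The two decimal values quoted with the figures match the Riemann sums for $n = 16$, which can be confirmed with exact arithmetic. The sketch below is illustrative; the function name `riemann_sums` is an assumption.

```python
from fractions import Fraction

def riemann_sums(n):
    """Left and right Riemann sums of f(x) = x^2 on [0, 1] with n equal subintervals."""
    dx = Fraction(1, n)
    right = sum((i * dx) ** 2 * dx for i in range(1, n + 1))
    left = sum(((i - 1) * dx) ** 2 * dx for i in range(1, n + 1))
    return left, right

left16, right16 = riemann_sums(16)
print(float(left16), float(right16))   # compare with 0.302734 and 0.365234
```

Increasing `n` squeezes both sums towards $\frac{1}{3}$, in line with the closed forms $\frac{1}{3} \mp \frac{3n \mp 1}{6n^2}$.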

The example contains the essential idea, but more ﬂexibility is needed. To get further, we must

properly deﬁne the concepts of partition and Riemann sum.

Two of Zeno’s ancient paradoxes are relevant here: Achilles and the Tortoise concerns a convergent inﬁnite series,

while the Arrow Paradox discusses a difﬁculty with integration by questioning whether time can be considered as a sum

of instants. Perhaps the most famous contemporary criticism comes from Bishop George Berkeley, who gave his name

to the Californian city and thus the ﬁrst UC campus: in The Analyst (1734), Berkeley savaged the foundations of calcu-

lus, describing the inﬁnitesimal increments required in Newton’s theory of ﬂuxions (derivatives) as merely the “ghosts of

departed quantities.”

Now is a good time to review some identities: $\sum_{i=1}^n i = \frac{1}{2} n(n + 1)$, $\sum_{i=1}^n i^2 = \frac{1}{6} n(n + 1)(2n + 1)$, $\sum_{i=1}^n i^3 = \frac{1}{4} n^2 (n + 1)^2$.

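Definition 4.2. A partition $P = \{x_0, \ldots, x_n\}$ of an interval $[a, b]$ is a finite sequence such that
\[ a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b \]
For each $1 \le i \le n$, define $\Delta x_i = x_i - x_{i-1}$. The mesh of the partition is $\operatorname{mesh}(P) := \max \Delta x_i$. Choose a sample point $x_i^*$ in each subinterval $[x_{i-1}, x_i]$. If $f : [a, b] \to \mathbb{R}$, the Riemann sum $\sum_{i=1}^n f(x_i^*)\,\Delta x_i$ computes the area of a family of $n$ rectangles.

[Figure: graph of $f(x)$ over $[a, b]$ with the rectangles of a Riemann sum, the sample points $x_i^*$, and the endpoints $a = x_0$, $b = x_n$ marked]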

In elementary calculus, one typically computes Riemann sums for equally-spaced partitions with left,

right or middle sample points. The double freedom of partition & sample points makes applying the

deﬁnition a challenge, so instead we consider two special families of rectangles.

Definition 4.3. Given a partition $P$ of $[a, b]$ and a bounded function $f$ on $[a, b]$, define
\begin{align*}
M_i &= \sup_{x \in [x_{i-1}, x_i]} f(x) & U(f, P) &= \sum_{i=1}^n M_i\,\Delta x_i \\
m_i &= \inf_{x \in [x_{i-1}, x_i]} f(x) & L(f, P) &= \sum_{i=1}^n m_i\,\Delta x_i
\end{align*}
$U(f, P)$ and $L(f, P)$ are the upper and lower Darboux sums for $f$ with respect to $P$. The upper and lower Darboux integrals are
\[ U(f) = \inf U(f, P) \qquad L(f) = \sup L(f, P) \]
where the supremum and infimum are over all partitions.

We say that $f$ is (Riemann) integrable on $[a, b]$ if $U(f) = L(f)$ and denote this value by
\[ \int_a^b f \quad \text{or} \quad \int_a^b f(x)\,dx \]
If the interval is understood or irrelevant, it is common just to say that $f$ is integrable and write $\int f$.

[Figures: the rectangles of the upper Darboux sum $U(f, P)$ and of the lower Darboux sum $L(f, P)$]

Intuitively, L( f , P) is the sum of the areas of rectangles built on P which just ﬁt under the graph of

f . It is also the inﬁmum of all Riemann sums on P. If f is discontinuous, then L( f , P) need not be a

Riemann sum; there might not be suitable sample points!

Examples 4.4. 1. We revisit Example 4.1 in this language.

Given a partition $Q = \{x_0, \ldots, x_n\}$ of $[0, 1]$ and sample points $x_i^* \in [x_{i-1}, x_i]$, we compute the Riemann sum for $f(x) = x^2$:
\[ \sum_{i=1}^n f(x_i^*)\,\Delta x_i = \sum_{i=1}^n (x_i^*)^2 (x_i - x_{i-1}) \]
Since $f$ is increasing, we have $x_{i-1}^2 \le (x_i^*)^2 \le x_i^2$ on each interval, whence
\[ L(f, Q) = \sum_{i=1}^n x_{i-1}^2 (x_i - x_{i-1}) \le \sum_{i=1}^n (x_i^*)^2 (x_i - x_{i-1}) \le \sum_{i=1}^n x_i^2 (x_i - x_{i-1}) = U(f, Q) \]
The lower and upper Darboux sums are therefore the Riemann sums for left and right endpoints respectively.

If we take $Q_n$ to be the partition with subintervals of equal width $\Delta x = \frac{1}{n}$, then
\[ U(f) = \inf_P U(f, P) \le U(f, Q_n) = \sum_{i=1}^n \left( \frac{i}{n} \right)^2 \Delta x = R_n \]
is the right Riemann sum discussed originally. Similarly $L(f) \ge L_n$. Since $L_n$ and $R_n$ both converge to $\frac{1}{3}$ as $n \to \infty$, the squeeze theorem forces
\[ L_n \le L(f) \le U(f) \le R_n \implies L(f) = U(f) = \frac{1}{3} \]
whence $f$ is Riemann integrable on $[0, 1]$ with $\int_0^1 x^2\,dx = \frac{1}{3}$.

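2. Suppose $f(x) = kx + c$ on the interval $[a, b]$ where $k > 0$. Take the evenly spaced partition $P_n$ where $x_i = a + \frac{b - a}{n} i$ with $\Delta x_i = \frac{b - a}{n}$. Since $f$ is increasing, the upper Darboux sum is again the Riemann sum with right-endpoints:
\begin{align*}
U(f, P_n) = R_n &= \sum_{i=1}^n f(x_i)\,\Delta x_i = \frac{b - a}{n} \sum_{i=1}^n \left( \frac{k(b - a)}{n} i + ak + c \right) \\
&= \frac{b - a}{n} \left( \frac{k(b - a)}{2n} n(n + 1) + (ak + c)n \right) \\
&\xrightarrow[n \to \infty]{} \frac{k(b - a)^2}{2} + (b - a)(ak + c) = \frac{k}{2}(b^2 - a^2) + c(b - a)
\end{align*}

[Figure: the line $y = kx + c$ over $[a, b]$, rising from height $ak + c$ to $bk + c$, with the rectangles of $U(f, P_n)$]

We similarly see that the lower Darboux sum is given by the Riemann sum with left endpoints, and that
\[ L(f, P_n) = L_n = \frac{b - a}{n} \left( \frac{k(b - a)}{2n} n(n - 1) + (ak + c)n \right) \xrightarrow[n \to \infty]{} \frac{k}{2}(b^2 - a^2) + c(b - a) \]
By the same argument as above, $L_n \le L(f) \le U(f) \le R_n$ and the squeeze theorem show that $f$ is integrable with $\int_a^b f = \frac{k}{2}(b^2 - a^2) + c(b - a)$.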
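The computation for the straight line can be replayed with exact rationals. Subtracting the two closed forms shows that the gap between the sums is $U(f, P_n) - L(f, P_n) = \frac{k(b - a)^2}{n}$, which the sketch below checks. The parameter values $k = 2$, $c = 1$, $[a, b] = [0, 3]$, $n = 1000$ are arbitrary choices made for illustration, as is the function name.

```python
from fractions import Fraction

def darboux_sums(k, c, a, b, n):
    """Lower and upper Darboux sums of f(x) = k*x + c (k > 0) on [a, b], n equal pieces."""
    dx = Fraction(b - a, n)
    f = lambda x: k * x + c
    # f is increasing, so the inf/sup on each piece sit at the left/right endpoints
    lower = sum(f(a + (i - 1) * dx) * dx for i in range(1, n + 1))
    upper = sum(f(a + i * dx) * dx for i in range(1, n + 1))
    return lower, upper

k, c, a, b, n = 2, 1, 0, 3, 1000
lower, upper = darboux_sums(k, c, a, b, n)
exact = Fraction(k, 2) * (b**2 - a**2) + c * (b - a)   # value of the integral

print(float(lower), float(exact), float(upper))
```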

Following the examples, a few remarks are in order.

Riemann versus Darboux Definition 4.3 is really that of the Darboux integral. Riemann’s definition is as follows: for $f : [a, b] \to \mathbb{R}$ to be integrable with integral $\int_a^b f$ means
\[ \forall \epsilon > 0,\ \exists \delta \text{ such that } \forall P, x_i^*,\quad \operatorname{mesh}(P) < \delta \implies \left| \sum_{i=1}^n f(x_i^*)\,\Delta x_i - \int_a^b f \right| < \epsilon \]
It can be shown that this is equivalent to the Darboux integral. We won’t pursue Riemann’s formulation further, except to observe that if a function is integrable and $\operatorname{mesh}(P_n) \to 0$, then $\int_a^b f = \lim_{n \to \infty} \sum_{i=1}^n f(x_i^*)\,\Delta x_i$: this allows us to approximate integrals using any sample points we choose, which is why right endpoints ($x_i^* = x_i$) are so common in Freshman calculus.

Monotone Functions Darboux sums are particularly easy to compute for monotone functions. As in the examples, if $f$ is increasing, then each $M_i = f(x_i)$, from which $U(f, P)$ is the Riemann sum with right-endpoints. Similarly, $L(f, P)$ is the Riemann sum with left-endpoints. The roles reverse if $f$ is decreasing.

Area If $f$ is positive and continuous, the Riemann integral $\int_a^b f$ serves as a definition for the area under the curve $y = f(x)$. This should make intuitive sense:

1. In the second example where we have a straight line, we obtain the same value for the area by computing directly as the sum of a rectangle and a triangle!

2. If the area under the curve is to make sense, then, for any partition $P$, it plainly satisfies the inequalities
\[ L(f, P) \le \text{Area} \le U(f, P) \]
But these are exactly the same as those satisfied by the integral itself:
\[ L(f, P) \le L(f) = \int_a^b f = U(f) \le U(f, P) \]

In the examples we exhibited a sequence of partitions $(P_n)$ where $U(f, P_n)$ and $L(f, P_n)$ both converged to the same limit. The next results develop some basic properties of partitions and make this process rigorous.

Lemma 4.5. Suppose f : [a, b] → R is bounded and suppose P, Q are partitions of [a, b].

1. If Q is a reﬁnement of P, that is P ⊆ Q, then

L( f , P) ≤ L( f , Q) ≤ U( f , Q) ≤ U( f , P)

2. For any partitions P, Q, we have L( f , P) ≤ U( f , Q)

3. L( f ) ≤ U( f )

We’ll see later (Theorem 4.16) that every continuous function is integrable.

Proof. 1. We prove inductively. First suppose that Q = P ∪{t} contains exactly one additional point

$t \in (x_{k-1}, x_k)$. Write

\[ m_1 = \inf\{f(x) : x \in [x_{k-1}, t]\}, \qquad m_2 = \inf\{f(x) : x \in [t, x_k]\} \]

\[ m = \inf\{f(x) : x \in [x_{k-1}, x_k]\} = \min\{m_1, m_2\} \]

The Darboux sums L( f , P) and L( f , Q) are identical except for the terms involving t; indeed

\[ L(f, Q) - L(f, P) = m_1(t - x_{k-1}) + m_2(x_k - t) - m(x_k - x_{k-1}) = (m_1 - m)(t - x_{k-1}) + (m_2 - m)(x_k - t) \ge 0 \]

[Figure: inserting t into the subinterval $[x_{k-1}, x_k]$ adds extra area to the lower sum.]

Since partitions are ﬁnite sets, by induction we see that P ⊆ Q =⇒ L( f , P) ≤ L( f , Q).

The argument for U( f , Q) ≤ U( f , P) is similar, and the middle inequality is trivial.

2. If P and Q are partitions, then P ∪ Q is a reﬁnement of both P and Q. By part 1,

L( f , P) ≤ L( f , P ∪Q) ≤ U( f , P ∪ Q) ≤ U( f , Q) (∗)

3. We leave this as an exercise.
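Part 1 of the lemma is easy to watch in action. The sketch below is only an illustration under stated assumptions: the function name `darboux_sums_increasing`, the integrand x³ and the two partitions are our own choices, and the shortcut of evaluating at endpoints is valid only because the integrand is increasing.

```python
def darboux_sums_increasing(f, partition):
    """Lower and upper Darboux sums of an increasing function f.

    For increasing f the infimum on [x_{i-1}, x_i] is f(x_{i-1}) and the
    supremum is f(x_i), so both sums reduce to endpoint Riemann sums.
    """
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    return lower, upper

f = lambda x: x ** 3                    # increasing on [0, 1]
P = [0.0, 0.5, 1.0]
Q = sorted(set(P) | {0.25, 0.8})        # a refinement: P is a subset of Q

LP, UP = darboux_sums_increasing(f, P)
LQ, UQ = darboux_sums_increasing(f, Q)
print(LP, LQ, UQ, UP)                   # refinement squeezes the sums together
```

Adding points to the partition raises the lower sum and lowers the upper sum, exactly as in the chain of inequalities in part 1.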

Theorem 4.6 (Cauchy criterion for integrability). Suppose f : [a, b] → R is bounded.

1. f is integrable ⇐⇒ ∀ϵ > 0, ∃P such that U( f , P) − L( f , P) < ϵ

2. f is integrable ⇐⇒ $\exists (P_n)_{n\in\mathbb{N}}$ such that $U(f, P_n) - L(f, P_n) \to 0$. Moreover, in such a case both sequences $U(f, P_n)$ and $L(f, P_n)$ converge to $\int_a^b f$.

We call this a Cauchy criterion since integrability is demonstrated without mention of the integral!

Proof. 1. (⇒) Suppose f is integrable and let ϵ > 0 be given. Since $\inf_Q U(f, Q) = \int_a^b f = \sup_R L(f, R)$, ∃Q, R such that

\[ U(f, Q) - \int_a^b f < \frac{\epsilon}{2} \quad\text{and}\quad \int_a^b f - L(f, R) < \frac{\epsilon}{2} \]

Let P = Q ∪ R and apply (∗): L( f , R) ≤ L( f , P) ≤ U( f , P) ≤ U( f , Q). But then

\[ U(f, P) - L(f, P) \le U(f, Q) - L(f, R) = U(f, Q) - \int_a^b f + \int_a^b f - L(f, R) < \epsilon \]

(⇐) Let ϵ > 0 be given and choose P as in the hypothesis. For every partition, L( f , P) ≤ L( f ) ≤ U( f ) ≤ U( f , P). Thus

\[ 0 \le U(f) - L(f) \le U(f, P) - L(f, P) < \epsilon \]

Since this holds for all ϵ > 0, we see that U( f ) = L( f ).

2. This is an exercise.

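Examples 4.7. 1. The freedom to choose a partition can be very useful. Consider $f(x) = \sqrt{x}$ on the interval [0, b]. We choose a partition that evaluates nicely when fed to this function:

\[ P_n = \{x_0, \dots, x_n\} \text{ where } x_i = \left(\frac{i}{n}\right)^2 b \implies \Delta x_i = x_i - x_{i-1} = \frac{b}{n^2}\bigl(i^2 - (i-1)^2\bigr) = \frac{(2i-1)b}{n^2} \]

Since f is increasing on [0, b], we see that

\[ U(f, P_n) = \sum_{i=1}^n f(x_i)\,\Delta x_i = \sum_{i=1}^n \frac{i}{n}\sqrt{b}\cdot\frac{(2i-1)b}{n^2} = \frac{b^{3/2}}{n^3}\sum_{i=1}^n (2i^2 - i) = \frac{b^{3/2}}{n^3}\left[\frac{1}{3}n(n+1)(2n+1) - \frac{1}{2}n(n+1)\right] \xrightarrow[n\to\infty]{} \frac{2}{3}b^{3/2} \]

Similarly

\[ L(f, P_n) = \sum_{i=1}^n f(x_{i-1})\,\Delta x_i = \sum_{i=1}^n \frac{i-1}{n}\sqrt{b}\cdot\frac{(2i-1)b}{n^2} = \frac{b^{3/2}}{n^3}\sum_{i=1}^n (2i^2 - 3i + 1) = \frac{b^{3/2}}{n^3}\left[\frac{1}{3}n(n+1)(2n+1) - \frac{3}{2}n(n+1) + n\right] \xrightarrow[n\to\infty]{} \frac{2}{3}b^{3/2} \]

Since these limits are equal, we conclude that f is integrable and that $\int_0^b \sqrt{x}\,dx = \frac{2}{3}b^{3/2}$.

[Figures: the upper sum $U(f, P_n)$ and the lower sum $L(f, P_n)$ for $y = \sqrt{x}$ on [0, b].]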

2. We finish this section with the classic example of a non-integrable function. Let f : [a, b] → R be the indicator function of the irrational numbers,

\[ f(x) = \begin{cases} 1 & \text{if } x \notin \mathbb{Q} \\ 0 & \text{if } x \in \mathbb{Q} \end{cases} \]

Since any interval of positive length contains both rational and irrational numbers, we see that

\[ \sup\{f(x) : x \in [x_{i-1}, x_i]\} = 1 \quad\text{and}\quad \inf\{f(x) : x \in [x_{i-1}, x_i]\} = 0 \]

for any partition $P = \{x_0, \dots, x_n\}$. We conclude that

\[ U(f, P) = \sum_{i=1}^n (x_i - x_{i-1}) = b - a \implies U(f) = b - a \quad\text{and}\quad L(f, P) = 0 \implies L(f) = 0 \]

Since the upper and lower integrals are unequal, f is not Riemann integrable.

As any freshman calculus student can attest, if you can ﬁnd an anti-derivative, then the fundamental

theorem of calculus (Section 34) makes evaluating integrals far easier. For instance, you are probably

desperate to write

\[ \frac{d}{dx}\,\frac{2}{3}x^{3/2} = x^{1/2} \implies \int_0^b \sqrt{x}\,dx = \frac{2}{3}x^{3/2}\,\Big|_0^b = \frac{2}{3}b^{3/2} \]

rather than computing Riemann/Darboux sums as in the previous example! In most practical cases,

however, no easy-to-compute anti-derivative exists, so the best we can do is approximate integrals

by evaluating Riemann sums for progressively ﬁner partitions. Thankfully computers excel at such

tedious work!
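As an illustration of letting a computer do the tedious work, here is a minimal sketch that evaluates the Darboux sums of the square-root example on the partition $x_i = b(i/n)^2$. The function name `sqrt_darboux_sums` and the choice b = 4 are assumptions made for the demonstration only.

```python
import math

def sqrt_darboux_sums(b, n):
    """Lower and upper Darboux sums of sqrt on [0, b] for x_i = b * (i/n)**2.

    sqrt is increasing, so the infimum and supremum on each subinterval are
    attained at the left and right endpoints respectively.
    """
    xs = [b * (i / n) ** 2 for i in range(n + 1)]
    pairs = list(zip(xs, xs[1:]))
    lower = sum(math.sqrt(x0) * (x1 - x0) for x0, x1 in pairs)
    upper = sum(math.sqrt(x1) * (x1 - x0) for x0, x1 in pairs)
    return lower, upper

b = 4.0
exact = 2 / 3 * b ** 1.5          # the value (2/3) b^(3/2) found in the example
for n in (10, 100, 1000):
    lower, upper = sqrt_darboux_sums(b, n)
    print(n, lower, upper, exact)
```

The gap between the two sums is $b^{3/2}/n$ for this partition, so both columns close in on the exact value as n grows.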

Exercises 32 1. For each function on the given interval, use partitions to ﬁnd the upper and lower

Darboux integrals. Hence prove that the function is integrable and compute its integral.

(a) f (x) = x

on [0, b] for any b > 0.

(b) g(x) =

√

x on [0, b].

(Hint: mimic Example 4.7.1)

2. Repeat question 1 for the following two functions. You cannot simply compute Riemann sums

for left and right endpoints and take limits: why not?

(a) h(x) = x(2 − x) on [0, 2]

(Hint: choose a partition with 2n points such that $x_n = 1$ and observe that h(2 − x) = h(x))

(b) k(x) =

(

2x if x ≤ 1

5 − x if x > 1

on [0, 3].

(Hint: this time try a partition with 3n points. . . )

3. Let f (x) = x for rational x and f (x) = 0 for irrational x.

(a) Calculate the upper and lower Darboux integrals for f on the interval [0, b].

(b) Is f integrable on [0, b]?

4. Prove part 3 of Lemma 4.5: L( f ) ≤ U( f ).

5. Prove part 2 of Theorem 4.6.

f is integrable ⇐⇒ $\exists (P_n)_{n\in\mathbb{N}}$ such that $U(f, P_n) - L(f, P_n) \to 0$.

Moreover, both $U(f, P_n)$ and $L(f, P_n)$ converge to $\int_a^b f$.

6. (a) Reread Deﬁnition 4.3. What happens if we allow f : [a, b] → R to be unbounded?

(b) Read “Riemann versus Darboux” on page 73. Explain why being Riemann integrable also

forces f to be bounded.

7. (If you like coding) Write a short program to estimate

f (x) dx using Riemann sums. This

can be very simple (equal partitions with right endpoints), or more complex (random partition

and sample points given a mesh). Apply your program to estimate

sin(x

−

√

) dx.

33 Properties of the Riemann Integral

The rough take-away of this long section is that everything you think is integrable probably is! There

will not be many examples since we have not established many explicit values for integrals.

Theorem 4.8 (Linearity). If f , g are integrable and k, l are constant, then k f + lg is integrable and

\[ \int_a^b (kf + lg) = k\int_a^b f + l\int_a^b g \]

Example 4.9. Thanks to examples in the previous section, we can now calculate, for instance

\[ \int_0^2 5x^3 - 3\sqrt{x}\,dx = 5\cdot\frac{1}{4}\cdot 2^4 - 3\cdot\frac{2}{3}\cdot 2^{3/2} = 20 - 4\sqrt{2} \]

Proof. Suppose ϵ > 0 is given. By Theorem 4.6 part 1, there exist partitions R, S such that

\[ U(f, R) - L(f, R) < \frac{\epsilon}{2} \quad\text{and}\quad U(g, S) - L(g, S) < \frac{\epsilon}{2} \]

By Lemma 4.5 part 1, if P := R ∪ S, then both inequalities are satisfied by P. On each subinterval,

inf f (x) + inf g(x) ≤ inf( f (x) + g(x)) and sup( f (x) + g(x) ) ≤ sup f (x) + sup g(x)

since the individual suprema/inﬁma could be ‘evaluated’ at different places. Thus

L( f , P) + L(g, P) ≤ L( f + g, P) ≤ U( f + g, P) ≤ U( f , P) + U(g, P)

whence U( f + g, P) − L( f + g, P) < ϵ and f + g is integrable. Moreover,

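\[ \int_a^b (f + g) - \int_a^b f - \int_a^b g \le \left[U(f, P) - \int_a^b f\right] + \left[U(g, P) - \int_a^b g\right] < \epsilon \]

Using lower Darboux sums similarly, we see that

\[ -\epsilon < \int_a^b (f + g) - \int_a^b f - \int_a^b g < \epsilon \]

Since this holds for all ϵ > 0, we conclude that $\int_a^b (f + g) = \int_a^b f + \int_a^b g$.

That k f is integrable with $\int_a^b kf = k\int_a^b f$ is an exercise. Put these together for the result.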

Corollary 4.10 (Changing endvalues). Suppose g is integrable on [a, b] and that f : [a, b] → R

satisfies f (x) = g(x) on (a, b). Then f is integrable on [a, b] and

\[ \int_a^b f = \int_a^b g \]

Deﬁnition 4.11 (Integration on an open interval). A bounded function f : (a, b) → R is integrable if

it has an integrable extension g : [a, b] → R where f (x) = g(x) on (a, b). In such a case, we deﬁne

\[ \int_a^b f := \int_a^b g \]

The Corollary (its proof is an exercise) shows that the choice of extension is irrelevant.

Theorem 4.12 (Basic Comparisons). Suppose f and g are integrable on [a, b].

1. If f (x) ≤ g(x), then $\int_a^b f \le \int_a^b g$

2. If m ≤ f (x) ≤ M then $m(b - a) \le \int_a^b f \le M(b - a)$.

3. f g is integrable.

4. $|f|$ is integrable and $\left|\int_a^b f\right| \le \int_a^b |f|$

5. max( f , g) and min( f , g) are integrable.

Part 3 is not integration by parts and does not tell us how $\int fg$ relates to $\int f$ and $\int g$.

Proof. 1. Since g(x) − f (x) ≥ 0 is integrable, L(g − f , P) ≥ 0 for all partitions P, and so

\[ 0 \le L(g - f) = \int_a^b (g - f) = \int_a^b g - \int_a^b f \]

2. Apply part 1 twice.

3. This is an exercise.

4. The integrability is an exercise. For the comparison, apply part 1 to $-|f| \le f \le |f|$.

5. $\max(f, g) = \frac{1}{2}(f + g) + \frac{1}{2}|f - g|$, etc.

Theorem 4.13 (Domain splitting). Suppose that f : [a, b] → R

and let c ∈ (a, b). If f is integrable on both [a, c] and [c, b], then it

is integrable on [a, b] and

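\[ \int_a^b f = \int_a^c f + \int_c^b f \]

[Figure: graph of f (x) on [a, b], split at c.]

In light of this result, it is conventional to allow integral limits to be reversed: $\int_b^a f := -\int_a^b f$ is consistent with $\int_a^a f = 0$.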

Proof. Let ϵ > 0 be given, then ∃R, S partitions of [a, c], [c, b] such that

\[ U(f, R) - L(f, R) < \frac{\epsilon}{2}, \qquad U(f, S) - L(f, S) < \frac{\epsilon}{2} \]

Choose P = R ∪ S to partition [a, b], then

\[ U(f, P) - L(f, P) = U(f, R) + U(f, S) - L(f, R) - L(f, S) < \epsilon \]

Moreover

\[ \int_a^b f - \int_a^c f - \int_c^b f \le U(f, P) - L(f, R) - L(f, S) = U(f, P) - L(f, P) < \epsilon \]

The other side is similar.

Example 4.14. If f (x) =

√

x on [0, 1] and f (x) = 1 on [1, 2], then

\[ \int_0^2 f = \int_0^1 \sqrt{x}\,dx + \int_1^2 1\,dx = \frac{2}{3} + 1 = \frac{5}{3} \]

Monotonic & Continuous Functions

We are now in a position to establish the integrability of two large classes of functions.

Deﬁnition 4.15. A function f : [a, b] → R is:

Monotonic if it is either increasing (x < y =⇒ f (x) ≤ f (y)) or decreasing.

Piecewise monotonic if there is a partition $P = \{x_0, \dots, x_n\}$ of [a, b] such that f is monotonic on each open subinterval $(x_{k-1}, x_k)$.

Piecewise continuous if there is a partition such that f is uniformly continuous on each $(x_{k-1}, x_k)$.

Theorem 4.16. If f is monotonic or continuous on [a, b] , then it is integrable.

Examples 4.17. 1. Since sine is continuous, we can approximate via a sequence of Riemann sums

sin x dx =

lim

n→∞

∑

i=1

sin

πi

Evaluating this limit is another matter entirely, one best handled in the next section...

2. Similarly, e

√

can be integrated and therefore approximated via Riemann sums:

√

dx =

lim

n→∞

∑

i=1

exp

= lim

n→∞

∑

i=1

2j −1

exp

Both sums use right endpoints; the ﬁrst has equal subintervals and the second is analogous to

Example 4.7.1. These limits would typically be estimated using a computer.

Proof. Suppose f : [a, b] → R is continuous. Since [a, b] is closed and bounded, f is uniformly contin-

uous. Let ϵ > 0 be given:

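\[ \exists \delta > 0 \text{ such that } \forall x, y \in [a, b], \quad |x - y| < \delta \implies |f(x) - f(y)| < \frac{\epsilon}{b - a} \]

Let P be a partition with mesh P < δ. Since f attains its bounds on each $[x_{i-1}, x_i]$ (extreme value theorem),

\[ \exists x_i^*, y_i^* \in [x_{i-1}, x_i] \text{ such that } M_i - m_i = f(x_i^*) - f(y_i^*) < \frac{\epsilon}{b - a} \]

from which

\[ U(f, P) - L(f, P) < \sum_{i=1}^n \frac{\epsilon}{b - a}(x_i - x_{i-1}) = \epsilon \]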

The monotonicity argument is an exercise.

Corollary 4.18. Piecewise continuous and bounded piecewise monotonic functions are integrable.

Proof. If f is piecewise continuous, then the restriction of f to $(x_{k-1}, x_k)$ has a continuous extension $g_k : [x_{k-1}, x_k] \to \mathbb{R}$; integrable by Theorem 4.16. By Corollary 4.10, f is integrable on $[x_{k-1}, x_k]$ with $\int_{x_{k-1}}^{x_k} f = \int_{x_{k-1}}^{x_k} g_k$. Several applications of Theorem 4.13 finish things off:

\[ \int_a^b f = \sum_{k=1}^n \int_{x_{k-1}}^{x_k} g_k \]

The argument for piecewise monotonicity is similar.

Example 4.19. The ‘fractional part’ function f (x) = x − ⌊x⌋

is both piecewise continuous and piecewise monotone on any

bounded interval. It is therefore integrable on any [a, b].

[Figure: graph of the fractional part function on [0, 5].]
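Integrability of the fractional part means its Riemann sums converge, which is easy to check numerically. The sketch below is an illustration only: the helper `midpoint_sum` and the interval [0, 3] are our own choices. Since the function is linear between consecutive integers, one expects the value 3 · 1/2 = 3/2.

```python
import math

def frac(x):
    """Fractional part x - floor(x)."""
    return x - math.floor(x)

def midpoint_sum(f, a, b, n):
    """Riemann sum on n equal subintervals of [a, b] with midpoint samples."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Each subinterval has length 1/1000, so none of them straddles a jump.
approx = midpoint_sum(frac, 0.0, 3.0, 3000)
print(approx)
```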

We ﬁnish with the ﬁnal incarnation of the intermediate value theorem.

Corollary 4.20 (IVT for integrals). If f is continuous on [a, b], then ∃ξ ∈ (a, b) for which

\[ f(\xi) = \frac{1}{b - a}\int_a^b f \]

Proof. Since f is continuous, it is integrable on [a, b]. By the extreme value theorem it is also bounded

and attains its bounds: ∃p, q ∈ [a, b] such that

\[ f(p) = \inf_{x \in [a, b]} f(x), \qquad f(q) = \sup_{x \in [a, b]} f(x) \]

Applying Theorem 4.12, part 2, with m = f (p) and M = f (q), we see that

\[ (b - a) f(p) \le \int_a^b f \le (b - a) f(q) \]

[Figure: graph of f on [a, b] with the points p, q and ξ marked.]

Now divide by b −a and apply the usual intermediate value theorem for f to see that the required ξ

exists between p and q.

In the picture, when f is positive and continuous, the grey area equals that under the curve; imagine levelling off the blue hill with a bulldozer. . . The quantity $\frac{1}{b-a}\int_a^b f$ is the average value of f on [a, b]: to see why this interpretation is sensible, approach $\int_a^b f$ via a sequence of Riemann sums on equally-spaced partitions $P_n$, then

\[ \frac{1}{b - a}\int_a^b f = \lim_{n\to\infty} \frac{1}{b - a}\sum_{i=1}^n f(x_i^*)\,\Delta x = \lim_{n\to\infty} \frac{f(x_1^*) + \cdots + f(x_n^*)}{n} \]

is the limit of a sequence of averages of equally-spaced samples $f(x_i^*)$.
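The average-value interpretation can be tested directly: average many equally-spaced samples and compare with the integral divided by the length of the interval. This is a minimal sketch under our own assumptions (the helper name `sample_average`, and the test function sin on [0, π], whose average value 2/π is the standard calculus result).

```python
import math

def sample_average(f, a, b, n):
    """Mean of f at the right endpoints of n equal subintervals of [a, b]."""
    h = (b - a) / n
    return sum(f(a + i * h) for i in range(1, n + 1)) / n

avg = sample_average(math.sin, 0.0, math.pi, 100_000)
print(avg, 2 / math.pi)     # sample mean versus (1/(b-a)) * integral
```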

What can/can’t be integrated? (non-examinable)

We now know a great many examples of integrable functions: essentially

• Piecewise continuous & monotonic functions are integrable.

• Linear combinations, products, absolute values, maximums and minimums of (already) inte-

grable functions.

After so many positive integrability conditions, it is reasonable to ask precisely which functions are

Riemann integrable. There is a precise answer, though it is quite tricky to understand.

Theorem 4.21 (Lebesgue). Suppose f : [a, b] → R is bounded. Then

f is Riemann integrable ⇐⇒ it is continuous except on a set of measure zero

Naïvely, the measure of a set is the sum of the lengths of its maximal subintervals, though unfortunately this doesn't make for a very useful definition.

Any countable subset has measure zero.

Lebesgue’s result is almost as if we can extend Corollary 4.18 to allow for inﬁnite sums. Indeed

you might have encountered a function which is continuous only on the irrationals; such a function

is Riemann integrable. There are also some uncountable sets with measure zero such as Cantor’s

middle-third set: if f is the indicator function of Cantor’s set

f (x) =

(

1 if x ∈ C

0 otherwise

then f is continuous except on C, and is Riemann integrable with $\int_0^1 f(x)\,dx = 0$.

Exercises 33 1. Explain why

2π

sin

( e

) dx ≤

2. If f is integrable on [a, b] prove that it is integrable on any interval [c, d] ⊆ [a, b].

3. We complete the proof of Theorem 4.8 (linearity of integration).

(a) Suppose k > 0, let A ⊆ R and deﬁne kA := {kx : x ∈ A}. Prove that sup kA = k sup A

and inf kA = k inf A.

(b) If k > 0 prove that k f is integrable on any interval and that

k f = k

f .

4. Give an example of an integrable but discontinuous function on a closed bounded interval [a, b]

for which the conclusion of the Intermediate Value Theorem for Integrals is false.

Formally, the length of an open interval (a, b) is b − a and a set A ⊆ R has measure zero if

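\[ \forall \epsilon > 0,\ \exists \text{ open intervals } I_n \text{ such that } A \subseteq \bigcup_{n=1}^{\infty} I_n \text{ and } \sum_{n=1}^{\infty} \operatorname{length}(I_n) < \epsilon \]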

More generally, the measure of a set (subject to a technical condition) is the inﬁmum of the sum of the lengths of any

countable collection of open covering intervals. A rigorous discussion of measure theory is properly a matter for graduate

analysis. Somewhat surprisingly, there exist sets with positive measure that contain no subintervals, and even sets which

are non-measurable!

5. Explicitly compute the value of the integral $\int_{1/2}^{15/2} x - \lfloor x \rfloor\,dx$ (recall Example 4.19).

6. We prove and extend Corollary 4.10. Suppose f is integrable on [a, b].

(a) If g : [a, b] → R satisﬁes f (x) = g(x) for all x ∈ (a, b), prove that g is integrable and

g =

f .

(Hint: consider h = f − g and show that

h = 0)

(b) Now suppose g : [a, b] → R satisﬁes f (x) = g(x) for all x ∈ [a, b] except at ﬁnitely many

points. Prove that g is integrable and

g =

f .

7. Show that an increasing function on [a, b] is integrable and thus complete Theorem 4.16.

(Hint: Choose a partition P with $\operatorname{mesh} P < \frac{\epsilon}{f(b) - f(a)}$)

8. Suppose f and g are integrable on [a, b].

(a) Define $h(x) = (f(x))^2$. We know:

• f is bounded: ∃K such that $|f(x)| \le K$ on [a, b].

• Given ϵ > 0, ∃P such that $U(f, P) - L(f, P) < \frac{\epsilon}{2K}$.

For each subinterval $[x_{i-1}, x_i]$, let

\[ M_i = \sup f(x), \quad m_i = \inf f(x), \quad \tilde{M}_i = \sup h(x), \quad \tilde{m}_i = \inf h(x) \]

Prove that $\tilde{M}_i - \tilde{m}_i \le 2(M_i - m_i)K$ and use this to conclude that h is integrable.

(b) Prove that f g is integrable.

(Hint: $fg = \frac{1}{4}(f + g)^2 - \frac{1}{4}(f - g)^2$)

(c) Prove that $U(|f|, P) - L(|f|, P) \le U(f, P) - L(f, P)$ for any partition P. Hence conclude that $|f|$ is integrable.

(One can extend these arguments, though it's a bit harder, to show that if j is continuous, then j ◦ f is integrable. Parts (a) and (c) correspond, respectively, to $j(x) = x^2$ and $j(x) = |x|$.)

9. (Hard) Let

\[ f(x) = \begin{cases} x & \text{if } x \ne 0 \text{ and } \sin\frac{1}{x} > 0 \\ -x & \text{if } x \ne 0 \text{ and } \sin\frac{1}{x} < 0 \\ 0 & \text{if } x = 0 \end{cases} \]

(a) Show that f is not piecewise continuous on [0, 1].

(b) Show that f is not piecewise monotonic on [0, 1].

(c) Show that f is integrable on [0, 1].

(Hint: given ϵ, hunt for a suitable partition to make U( f , P) − L( f , P) < ϵ by considering $[0, x_1]$ differently to the other subintervals)

(d) Make a similar argument to show that $g = \sin\frac{1}{x}$ is integrable on (0, 1], where

\[ g(x) = \begin{cases} \sin\frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases} \]

Note that neither argument evaluates the integrals!

34 The Fundamental Theorem of Calculus

The key result linking integration and differentiation is usually presented in two parts.

While there

are signiﬁcant subtleties, the rough statements are as follows:

Part I Differentiation reverses integration: $\frac{d}{dx}\int_a^x f(t)\,dt = f(x)$

Part II Integration reverses differentiation: $\int_a^b F'(x)\,dx = F(b) - F(a)$

These facts seemed intuitively obvious to early practitioners of calcu-

lus: indeed, given a continuous positive function f :

• Let F(x) denote the area under the curve between 0 and x.

• A small increase ∆x results in the area increasing by ∆F.

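• ∆F ≈ f (x)∆x is approximately the area of a rectangle, whence $\frac{\Delta F}{\Delta x} \approx f(x)$. This is part I.

• $F(b) - F(a) \approx \sum \Delta F_i \approx \sum f(x_i)\,\Delta x_i$. Since F′ = f , this is part II.

[Figure: the extra area ∆F under y = f (x) produced by an increase ∆x.]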

In fact when Leibniz introduced the symbols ∫ and d in the late 1600s, it was partly to reflect the fundamental theorem.

If you’re happy with non-rigorous notions of limit, rate of change, area, and

(inﬁnite) sums, the above is all you need!

Of course, we are very much concerned with the details: What must we assume regarding f and F,

and how are these properties used in the proof?

Theorem 4.22 (FTC, part I). Suppose f is integrable on [a, b]. For any x ∈ [a, b], deﬁne

\[ F(x) := \int_a^x f(t)\,dt \]

Then:

1. F is uniformly continuous on [a, b];

2. If f is continuous at c ∈ [a, b], then F is differentiable at c with F′(c) = f (c).

As ever, the condition at c = a should be right-continuous and the conclusion right-differentiable, etc.

Compare this with the naïve version above where we assumed f was continuous. We now require

only the integrability of f , and its continuity at one point for the full result.

We follow the traditional numbering; some authors reverse these.

∫ is a stylized S for sum, while d stands for difference. Given a sequence $F = (F_0, F_1, \dots, F_n)$, construct a new sequence of differences

\[ dF = (F_1 - F_0,\ F_2 - F_1,\ \dots,\ F_n - F_{n-1}) \]

which can then be summed:

\[ \int dF = (F_1 - F_0) + (F_2 - F_1) + \cdots + (F_n - F_{n-1}) = F_n - F_0 \qquad (∗) \]

Viewing a function as an 'infinite sequence' of values spaced along an interval, dF becomes a sequence of infinitesimals and (∗) is essentially the fundamental theorem: $\int_a^b dF = F(b) - F(a)$. It is the conception of a function that is suspect here, not the essential relationship between sums and differences.

Examples 4.23. You should have seen many examples in an elementary calculus course.

1. Since f (x) = sin

−7) is continuous on any bounded interval, we conclude that

sin

( t

−7) dt = sin

−7)

If one follows Theorem 4.13 and its resulting conventions, then this is valid for all x ∈ R.

2. The chain rule permits more complicated examples. For instance, since f (t) = sin

√

t is contin-

uous on its domain [0, ∞) and y(x) = x

+ 3 has range [3, ∞) ⊆ dom( f ), we have

sin

√

t dt =

sin

√

t dt = 2x sin

+ 3

3. For a ﬁnal positive example, observe that

sin x

tan(t

) dt = e

tan(e

) −cos x tan(sin

To evaluate this, one ﬁrst chooses any constant a and writes

sin x

−

sin x

before differentiating. This is valid provided sin x, e

and a all lie in the same subinterval of

dom tan(t

) = R \ {±

, ±

3π

, ±

5π

, . . .}

Since

sin x

≤ 1 <

, this requires



⇐⇒ x <

Choosing a = 1 would certainly sufﬁce.

4. Now consider why the theorem requires continuity. The piecewise

continuous function

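\[ f : [0, 2] \to \mathbb{R} : x \mapsto \begin{cases} 2x & \text{if } x \le 1 \\ \frac{1}{2} & \text{if } x > 1 \end{cases} \]

has a jump discontinuity at x = 1. We can still compute

\[ F(x) = \int_0^x f(t)\,dt = \begin{cases} \int_0^x 2t\,dt = x^2 & \text{if } x \le 1 \\ \int_0^1 2t\,dt + \int_1^x \frac{1}{2}\,dt = \frac{1}{2}(x + 1) & \text{if } x > 1 \end{cases} \]

This is continuous, indeed uniformly so. However the discontinuity of f results in F having a corner and thus being non-differentiable at x = 1. Indeed, F′(x) = f (x) for all x ≠ 1; that is, at all values of x where f is continuous.

[Figures: graphs of f (x) and F(x) on [0, 2].]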
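The corner of F at x = 1 in the last example shows up clearly in one-sided difference quotients. The sketch below is illustrative only: it takes f(t) = 2t for t ≤ 1 and f(t) = 1/2 for t > 1, so that F(x) = x² for x ≤ 1 and F(x) = (x + 1)/2 for x > 1, and the helper name `difference_quotient` is our own.

```python
def F(x):
    """F(x) = integral of f over [0, x], with f(t) = 2t (t <= 1) and 1/2 (t > 1)."""
    return x * x if x <= 1 else (x + 1) / 2

def difference_quotient(F, c, h):
    """(F(c + h) - F(c)) / h; h may be negative for a left-hand quotient."""
    return (F(c + h) - F(c)) / h

left = difference_quotient(F, 1.0, -1e-6)    # close to 2, the left limit of f at 1
right = difference_quotient(F, 1.0, 1e-6)    # close to 1/2, the right limit of f at 1
print(left, right)
```

The two one-sided slopes disagree, so F has no derivative at 1, while away from 1 the same quotients recover f.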

Proving FTC I Neither half of the theorem is particularly difﬁcult once you write down what you

know and what you need to prove. Here are the key ingredients:

1. F uniformly continuous means controlling the size of

\[ |F(y) - F(x)| = \left|\int_a^y f(t)\,dt - \int_a^x f(t)\,dt\right| = \left|\int_x^y f(t)\,dt\right| \le \int_x^y |f(t)|\,dt \]

But the boundedness of f allows us to bound this last integral. . .

2. F′(c) = f (c) means showing that $\lim_{x\to c}\frac{F(x) - F(c)}{x - c} = f(c)$, which means controlling the size of

\[ \left|\frac{F(x) - F(c)}{x - c} - f(c)\right| = \left|\frac{1}{x - c}\int_c^x f(t)\,dt - f(c)\right| \]

The trick here will be to bring the constant f (c) inside the integral as $\frac{1}{x-c}\int_c^x f(c)\,dt$ so that what we really have to control is the size of $\frac{1}{x-c}\int_c^x |f(t) - f(c)|\,dt$. This is where the continuity of f comes in. . .

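Proof. 1. Since f is integrable, it is bounded: ∃M > 0 such that $|f(x)| \le M$ for all x.

Let ϵ > 0 be given and define $\delta = \frac{\epsilon}{M}$. Then, for any x, y ∈ [a, b],

\begin{align*}
0 < y - x < \delta \implies |F(y) - F(x)| &= \left|\int_x^y f(t)\,dt\right| \le \int_x^y |f(t)|\,dt && \text{(Theorem 4.12, part 4)} \\
&\le M(y - x) && \text{(Theorem 4.12, part 2)} \\
&< M\delta = \epsilon
\end{align*}

We conclude that F is uniformly continuous on [a, b].

2. Let ϵ > 0 be given. Since f is continuous at c, ∃δ > 0 such that, for all t ∈ [a, b],

\[ |t - c| < \delta \implies |f(t) - f(c)| < \frac{\epsilon}{2} \]

Now for all x ∈ [a, b] (except c),

\begin{align*}
0 < |x - c| < \delta \implies \left|\frac{F(x) - F(c)}{x - c} - f(c)\right| &= \left|\frac{1}{x - c}\int_c^x f(t) - f(c)\,dt\right| && \text{(Theorem 4.8)} \\
&\le \frac{1}{|x - c|}\left|\int_c^x |f(t) - f(c)|\,dt\right| && \text{(Theorem 4.12)} \\
&\le \frac{1}{|x - c|}\cdot\frac{\epsilon}{2}\,|x - c| < \epsilon
\end{align*}

Clearly $\lim_{x\to c}\frac{F(x) - F(c)}{x - c} = f(c)$, and so F is differentiable at c with F′(c) = f (c).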

The Fundamental Theorem, part II As with part I, the formulaic part of the result should be familiar,

though we are more interested in the assumptions and where they are needed.

Theorem 4.24 (FTC, part II). Suppose g is continuous on [a, b], differentiable on (a, b), and that g′ is integrable on (a, b). Then,

\[ \int_a^b g' = g(b) - g(a) \]

Part II is often expressed in terms of anti-derivatives: F being an anti-derivative of f if F′ = f . Combined with FTC I, we recover the familiar '+c' result and a simpler version of the fundamental theorem often seen in elementary calculus.

Corollary 4.25. Let f be continuous on [a, b].

• If F is an anti-derivative of f , then $\int_a^b f = F(b) - F(a)$.

• Every anti-derivative has the form $F(x) = \int_a^x f(t)\,dt + c$ for some constant c.

Examples 4.26. Again, basic examples should be familiar.

1. Plainly $g(x) = x^2 + 2x^{3/2}$ is continuous on [1, 4] and differentiable on (1, 4) with derivative $g'(x) = 2x + 3\sqrt{x}$; this last is continuous (and thus integrable) on (1, 4). We conclude that

\[ \int_1^4 2x + 3\sqrt{x}\,dx = x^2 + 2x^{3/2}\,\Big|_1^4 = (16 + 16) - (1 + 2) = 29 \]

2. If $g(x) = \sin(3x^2)$, then $g'(x) = 6x\cos(3x^2)$. Certainly g satisfies the hypotheses of the theorem on any bounded interval [a, b]. We conclude

\[ \int_a^b 6x\cos(3x^2)\,dx = \sin(3b^2) - \sin(3a^2) \]

Moreover, every anti-derivative of $f(x) = 6x\cos(3x^2)$ has the form $F(x) = \sin(3x^2) + c$.

3. Recall Example 4.23.4 where we saw that the discontinuity of f led to the non-differentiability of $F(x) = \int_0^x f(t)\,dt$ at x = 1. The function F therefore fails the hypotheses of FTC II on the interval [0, 2].

However, except at x = 1, F is an anti-derivative of f and moreover

\[ \int_0^2 f(x)\,dx = F(2) - F(0), \]

so we appear to have the formulaic conclusion of FTC II, though this is tautological given the definition of F!

The way out of this conundrum is to note that other anti-derivatives $\tilde{F}$ of f exist (except at x = 1), and which fail to satisfy the conclusion. For instance

\[ \tilde{F}(x) = \begin{cases} x^2 & \text{if } x < 1 \\ \frac{1}{2}x & \text{if } x > 1 \end{cases} \implies \tilde{F}(2) - \tilde{F}(0) = 1 \ne \frac{3}{2} = \int_0^2 f(x)\,dx \]

See Definition 4.11 if you're unsure what it means for g′ to be integrable on a bounded open interval.

Proving FTC II See Exercise 10 for a relatively easy proof when g′ = f is continuous. For the real McCoy, we can only rely on the integrability of g′: the trick is to use the mean value theorem to write g(b) − g(a) as a Riemann sum over a suitable partition.

Proof. Let ϵ > 0 be given and choose a partition P such that U(g′, P) − L(g′, P) < ϵ. Since g satisfies the mean value theorem on each subinterval of the partition P, we see that

\[ \exists \xi_i \in (x_{i-1}, x_i) \text{ such that } g'(\xi_i) = \frac{g(x_i) - g(x_{i-1})}{x_i - x_{i-1}} \]

from which

\[ g(b) - g(a) = \sum_{i=1}^n g(x_i) - g(x_{i-1}) = \sum_{i=1}^n g'(\xi_i)(x_i - x_{i-1}) \]

This is a Riemann sum for g′ associated to the partition P, hence,

\[ L(g', P) \le g(b) - g(a) \le U(g', P) \]

However we also have $L(g', P) \le \int_a^b g' \le U(g', P)$, so that $\left|\int_a^b g' - \bigl(g(b) - g(a)\bigr)\right| < \epsilon$. Since these hold for all ϵ, the proof is complete.

While we certainly used the integrability of g′ in the proof, it might seem strange that we assumed it

at all: shouldn’t every derivative be integrable? Perhaps surprisingly, the answer is no! If you want

a challenge, look up the Volterra function, which is differentiable everywhere, but whose derivative is

non-integrable (on, for instance, [0, 1])!
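Part II can be sanity-checked numerically by comparing a Riemann sum for g′ with g(b) − g(a). The following minimal sketch uses $g(x) = x^2 + 2x^{3/2}$ on [1, 4] from the examples above; the helper `midpoint_sum` and the number of subintervals are our own assumptions.

```python
import math

def midpoint_sum(f, a, b, n):
    """Riemann sum on n equal subintervals of [a, b] with midpoint samples."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

g = lambda x: x * x + 2 * x ** 1.5
g_prime = lambda x: 2 * x + 3 * math.sqrt(x)

approx = midpoint_sum(g_prime, 1.0, 4.0, 10_000)   # Riemann sum for g'
exact = g(4.0) - g(1.0)                            # g(b) - g(a)
print(approx, exact)
```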

The Rules of Integration

If one wants to evaluate an integral, rather than merely show it exists, there are really only two options:

1. Evaluate Riemann sums and take limits: often difﬁcult if not impossible to do explicitly.

2. Use FTC II. The problem now becomes the ﬁnding of anti-derivatives, for which the core method

is essentially guess and differentiate. To obtain general rules, we attempt to reverse the rules of

differentiation.

Integration by Parts First consider the product rule: the product g = uv of two differentiable functions is differentiable with g′ = u′v + uv′. Now apply Theorems 4.8, 4.12 and FTC II.

Corollary 4.27 (Integration by Parts). Suppose u, v are continuous on [a, b], differentiable on (a, b), and that u′, v′ are integrable on (a, b). Then

\[ \int_a^b u'(x)v(x)\,dx = u(b)v(b) - u(a)v(a) - \int_a^b u(x)v'(x)\,dx \]

This is signiﬁcantly less useful than the product rule since it is only capable of transforming the

integral of a product into another such integral.

Examples 4.28. You should have seen myriad examples in a previous course. With practice, there

is no need to explicitly state u and v.

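1. Let u(x) = x and v′(x) = cos x. Then u′(x) = 1 and v(x) = sin x, whence

\[ \int_0^{\pi/2} x\cos x\,dx = \bigl[x\sin x\bigr]_0^{\pi/2} - \int_0^{\pi/2} \sin x\,dx = \frac{\pi}{2}\sin\frac{\pi}{2} - 0 - \bigl[-\cos x\bigr]_0^{\pi/2} = \frac{\pi}{2} + \cos\frac{\pi}{2} - \cos 0 = \frac{\pi}{2} - 1 \]

2. Let u(x) = ln x and v′(x) = 1. Then $u'(x) = \frac{1}{x}$ and v(x) = x, whence

\[ \int_e^{e^2} \ln x\,dx = \bigl[x\ln x\bigr]_e^{e^2} - \int_e^{e^2} dx = e^2\ln e^2 - e\ln e - \bigl[x\bigr]_e^{e^2} = 2e^2 - e - e^2 + e = e^2 \]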
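Both values can be checked against Riemann sums. This is a minimal sketch, assuming the two integrals are $\int_0^{\pi/2} x\cos x\,dx$ and $\int_e^{e^2} \ln x\,dx$; the helper `midpoint_sum` and the number of subintervals are illustrative choices.

```python
import math

def midpoint_sum(f, a, b, n):
    """Riemann sum on n equal subintervals of [a, b] with midpoint samples."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

ex1 = midpoint_sum(lambda x: x * math.cos(x), 0.0, math.pi / 2, 10_000)
ex2 = midpoint_sum(math.log, math.e, math.e ** 2, 10_000)

print(ex1, math.pi / 2 - 1)   # first example: pi/2 - 1
print(ex2, math.e ** 2)       # second example: e^2
```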

Change of Variables/Substitution We now turn our attention to the chain rule. If $g(x) = F\bigl(u(x)\bigr)$ where F and u are differentiable, then g is differentiable with

\[ g'(x) = \frac{d}{dx} F\bigl(u(x)\bigr) = F'\bigl(u(x)\bigr)\,u'(x) \]

Now integrate both sides; the only issue is what assumptions are needed to invoke FTC II.

Theorem 4.29 (Substitution Rule). Suppose we have two continuous functions: u : [a, b] → R and

f : range(u) → R. Suppose also that u is differentiable on (a, b) with integrable derivative u′. Then

\[ \int_a^b f\bigl(u(x)\bigr)\,u'(x)\,dx = \int_{u(a)}^{u(b)} f(u)\,du \]

This is the famous ‘u-sub’ formula from elementary calculus.

Proof. We leave as an exercise the verification that both integrals exist. We may also assume that range(u) is an interval of positive length for otherwise both integrals are trivially zero.

Choose any c ∈ range(u) and define

\[ F : \operatorname{range}(u) \to \mathbb{R} \text{ by } F(v) := \int_c^v f(t)\,dt \]

Since f is continuous, by FTC I we see that F is differentiable with F′(u) = f (u). But now

\begin{align*}
\int_a^b f\bigl(u(x)\bigr)\,u'(x)\,dx &= \int_a^b \frac{d}{dx} F\bigl(u(x)\bigr)\,dx && \text{(chain rule)} \\
&= F\bigl(u(b)\bigr) - F\bigl(u(a)\bigr) && \text{(FTC II)} \\
&= \int_{u(a)}^{u(b)} f(u)\,du
\end{align*}

By the intermediate and extreme value theorems, range(u) is already a closed bounded interval.

Examples 4.30. Reading the theorem is bad enough; its application often requires signiﬁcant creativ-

ity in order to recognize a suitable substitution.

1. To evaluate the integral $\int_0^{\sqrt{\pi}} 2x\sin x^2\,dx$, consider the substitution $u(x) = x^2$ defined on $[0, \sqrt{\pi}]$. Certainly u is continuous, and its derivative u′(x) = 2x is integrable on $(0, \sqrt{\pi})$. Finally f (u) = sin u is continuous on range(u) = [0, π]. The hypotheses are satisfied, whence

\[ \int_0^{\sqrt{\pi}} 2x\sin x^2\,dx = \int_0^{\sqrt{\pi}} f\bigl(u(x)\bigr)\,u'(x)\,dx = \int_0^{\pi} \sin u\,du = -\cos u\,\Big|_0^{\pi} = 2 \]

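2. For the following integral with $f(u) = \frac{1}{u^2 + 1}$, we make the substitution $u(x) = x^2 - 2$. Note that $u : [\sqrt{2}, \sqrt{3}] \to [0, 1]$ and that u′(x) = 2x is integrable; moreover, f (u) is continuous on range(u) = [0, 1]. We conclude that

\[ \int_{\sqrt{2}}^{\sqrt{3}} \frac{2x}{x^4 - 4x^2 + 5}\,dx = \int_{\sqrt{2}}^{\sqrt{3}} \frac{2x}{(x^2 - 2)^2 + 1}\,dx = \int_0^1 \frac{1}{u^2 + 1}\,du = \arctan u\,\Big|_0^1 = \frac{\pi}{4} \]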

3. The hypotheses on u really are all that is necessary. In particular, u doesn't need to be left-/right-differentiable at the endpoints of [a, b]! For instance, with $f(u) = u^2$ and $u(x) = \sqrt{x}$ on [0, 4], we easily verify

\[ \int_0^4 \frac{1}{2}\sqrt{x}\,dx = \int_0^4 \frac{x}{2\sqrt{x}}\,dx = \int_0^4 f\bigl(u(x)\bigr)\,u'(x)\,dx = \int_0^2 f(u)\,du = \int_0^2 u^2\,du = \frac{8}{3} \]

4. Sloppy use of the substitution rule might lead to utter nonsense. For instance, consider the 'substitution' $u = x^2$ in the following:

\[ \int_{-1}^2 \frac{1}{x}\,dx = \int_{-1}^2 \frac{1}{2x^2}\,2x\,dx = \int_1^4 \frac{1}{2u}\,du = \frac{1}{2}(\ln 4 - \ln 1) = \ln 2 \]

Of course the left hand integral does not exist since $\frac{1}{x}$ is undefined at 0 ∈ (−1, 2), so the conclusion is false. In the language of the substitution rule, $f(u) = \frac{1}{2u}$ is not continuous on range(u) = [0, 4]: it is not even defined at u = 0! You are very unlikely to make precisely this mistake since the first integral is so clearly undefined, but for more complicated functions. . .
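When the hypotheses do hold, the two sides of the substitution rule can be compared numerically. The sketch below checks the first example, $\int_0^{\sqrt{\pi}} 2x\sin x^2\,dx = \int_0^{\pi} \sin u\,du = 2$; the helper `midpoint_sum` and the number of subintervals are our own assumptions.

```python
import math

def midpoint_sum(f, a, b, n):
    """Riemann sum on n equal subintervals of [a, b] with midpoint samples."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Left side: integrate f(u(x)) u'(x) in the variable x, with u(x) = x**2.
lhs = midpoint_sum(lambda x: 2 * x * math.sin(x * x), 0.0, math.sqrt(math.pi), 20_000)
# Right side: integrate f(u) = sin u in the variable u over [u(0), u(sqrt(pi))].
rhs = midpoint_sum(math.sin, 0.0, math.pi, 20_000)

print(lhs, rhs)
```

Running the same comparison on the 'sloppy' example would not work: the integrand 1/x has no Riemann sums that settle down on an interval containing 0.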

Hence the old adage, “Differentiation is a science; integration an art.” To illustrate via an example, consider the function

f (x) = tan(e

cos(3x

) + 4x

). The product and chain rules allow one to explicitly compute the derivative

d f

1 + (e

cos(3x

) + 4x

)



cos(3x

) −6xe

sin(3x

) + 12x



By contrast, the integration analogues (integration by parts/substitution) are essentially useless in attempting to ﬁnd an

explicit anti-derivative facilitating the integration of the same function via FTC II; for instance, the integral

tan(e

cos(3x

) + 4x

) dx

is likely impossible to evaluate explicitly and can only be approximated (e.g. via Riemann sums).

Exercises 34 1. Calculate the following limits:

(a) lim

x→0

dt (b) lim

h→0

3+h

2. Let f (t) =











0 if t < 0

t if 0 ≤ t ≤ 1

4 if t > 1

(a) Determine the function F(x) =

f (t) dt.

(b) Sketch F. Where is F continuous?

′

at the points of differentiability.

3. Let f be a continuous function on R.

(a) Deﬁne F(x) =

x+1

x−1

f (t) dt. Carefully show that F is differentiable on R and compute F

′

(b) Repeat for the function G(x) =

sin x

f (t) dt.

4. Recall Examples 4.23.4 and 4.26.3. Find all anti-derivatives F of f on [0, 1) ∪ (1, 2]. How many

satisfy

f (x) dx = F(2) − F(0)?

5. Consider integration by parts. Plainly

′

( t)v(t) dt is an anti-derivative of u

′

(x)v(x) by FTC I:

what does integration by parts say is another?

6. Use change of variables to integrate

√

1 − x

7. Use integration by parts and the substitution rule to evaluate

arcsin x dx for any b < 1.

8. Use integration by parts to evaluate

x arctan x dxfor any b > 0

9. Check that the assumptions of int by subs guarantee that both integrals are well-deﬁned (i.e.

that ( f ◦u)u

′

and f are integrable on the required intervals.

10. We prove a simpler version of the fundamental theorem of calculus.

(a) Suppose f is continuous on [a, b] and define $F(x) = \int_a^x f(t)\,dt$. For any c, x ∈ [a, b] where c ≠ x, prove that

\[ m \le \frac{F(x) - F(c)}{x - c} \le M \]

where m, M are the minimum and maximum values of f (t) on the closed interval bounded by c, x. Make sure to explain why m, M exist, and use this to deduce that F′(c) = f (c).

(b) Suppose f is continuous on [a, b] and that F is any anti-derivative of f on [a, b] (that is, F′ = f ). Use part (a) and the mean value theorem to prove that $\int_a^b f(t)\,dt = F(b) - F(a)$.

36 Improper Integrals

The Riemann integral has several limitations. Even allowing for functions to be integrable on open intervals (Exercise 32.6), the definition of $\int_a^b f(x)\,dx$ requires the following:

• That (a, b) be a bounded interval.

• That f be bounded on (a, b).

There is a natural way to extend the Riemann integral to unbounded intervals and functions: limits.

Definition 4.31. Suppose f : [a, b) → R satisfies the following properties:

• f is integrable on every closed bounded subinterval [a, t] ⊆ [a, b).

• Either b = ∞, or b is finite and f is unbounded at b.

The improper integral of f on [a, b) is defined to be

\[ \int_a^b f(x)\,dx := \lim_{t\to b^-} \int_a^t f(x)\,dx \]

This is convergent or divergent in the same manner as the limit.

If an integral is improper at its lower limit then

\[ \int_a^b f(x)\,dx := \lim_{s\to a^+} \int_s^b f(x)\,dx. \]

If an integral is improper at both ends, choose any c ∈ (a, b) and define

\[ \int_a^b f(x)\,dx = \lim_{s\to a^+} \int_s^c f(x)\,dx + \lim_{t\to b^-} \int_c^t f(x)\,dx \]

provided both one-sided improper integrals exist and the sum of the limits makes sense (that is, it is not of the form ∞ − ∞).

Theorem 4.13 says that the choice of c for a doubly-improper integral is irrelevant.

Many properties of the Riemann integral transfer to improper integrals, though not all. For example, part 1 of Theorem 4.12 extends:

Theorem 4.32. If 0 ≤ f(x) ≤ g(x) on [a, b), then $\int_a^b f \le \int_a^b g$, whenever the integrals exist (standard or improper). In particular:

• $\int_a^b f = \infty \implies \int_a^b g = \infty$

• $\int_a^b g$ converges $\implies \int_a^b f$ converges to a value $\le \int_a^b g$

We leave some of the detail to Exercise 36.7.

Examples 4.33. 1.

dx =

for any t > 0. Clearly

∞

dx = lim

t→∞

= ∞

More formally, the improper integral

∞

dx diverges to inﬁnity.

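2. With $f(x) = x^{-4/3}$ defined on [1, ∞),

\[ \int_1^\infty x^{-4/3}\,dx = \lim_{t\to\infty} \int_1^t x^{-4/3}\,dx = \lim_{t\to\infty} \Bigl[-3x^{-1/3}\Bigr]_1^t = \lim_{t\to\infty} \bigl(3 - 3t^{-1/3}\bigr) = 3 \]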

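3. Consider $f(x) = |x|e^{-x^2/2}$ on (−∞, ∞). On a bounded interval [0, t), we have

\[ \int_0^t f(x)\,dx = \int_0^t xe^{-x^2/2}\,dx = \Bigl[-e^{-x^2/2}\Bigr]_0^t = 1 - e^{-t^2/2} \xrightarrow[t\to\infty]{} 1 \]

By symmetry, we conclude that

\[ \int_{-\infty}^{\infty} |x|e^{-x^2/2}\,dx = 1 + 1 = 2 \]

This example is important in probability: multiplying by $\frac{1}{\sqrt{2\pi}}$, we have computed the expectation of |X| when X is a standard normally distributed random variable

\[ E(|X|) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,|x|e^{-x^2/2}\,dx = \frac{2}{\sqrt{2\pi}} = \sqrt{\frac{2}{\pi}} \]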

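4. If t ∈ [0, 1), we can use our knowledge of derivatives $\frac{d}{dx}\sin^{-1}x = \frac{1}{\sqrt{1-x^2}}$ to evaluate

\[ \int_0^1 \frac{1}{\sqrt{1 - x^2}}\,dx = \lim_{t\to 1^-} \int_0^t \frac{1}{\sqrt{1 - x^2}}\,dx = \lim_{t\to 1^-} \sin^{-1}t = \frac{\pi}{2} \]

and that, moreover, $\int_{-1}^1 \frac{1}{\sqrt{1-x^2}}\,dx = \pi$. By comparison, we see that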

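\[ \frac{1}{\sqrt{1 - x^4}} \le \frac{1}{\sqrt{1 - x^2}} \implies \int_{-1}^1 \frac{1}{\sqrt{1 - x^4}}\,dx \le \int_{-1}^1 \frac{1}{\sqrt{1 - x^2}}\,dx = \pi \]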

5. Improper integrals need not exist. For instance,

\[ \lim_{t\to\infty} \int_0^t \sin x\,dx = \lim_{t\to\infty} (1 - \cos t) \]

diverges by oscillation.
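The limit in Definition 4.31 can be watched numerically. The following is a minimal sketch for Examples 2 and 5, not part of the notes: the helper names `midpoint_integral` and `truncated`, the midpoint rule, and the chosen cut-offs t are all illustrative assumptions.

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    """Approximate the proper integral of f over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def truncated(f, a, ts, n=200_000):
    """Proper integrals of f over [a, t] for each cut-off t in ts (hypothetical helper)."""
    return [midpoint_integral(f, a, t, n) for t in ts]

# Example 2: the truncations follow 3 - 3 t^(-1/3), which tends to 3.
power_cutoffs = [10.0, 1000.0]
power_vals = truncated(lambda x: x ** (-4 / 3), 1.0, power_cutoffs)

# Example 5: the truncations equal 1 - cos t, which keeps moving between 0 and 2.
sine_cutoffs = [math.pi, 2 * math.pi, 3 * math.pi]
sine_vals = truncated(math.sin, 0.0, sine_cutoffs, n=20_000)

for t, v in zip(power_cutoffs, power_vals):
    print(f"t = {t:>7}: numeric {v:.4f}, closed form {3 - 3 * t ** (-1 / 3):.4f}")
print("sin truncations:", [round(v, 4) for v in sine_vals])
```

The first family of values settles down as t grows, while the second alternates between values near 2 and near 0, which is the oscillation described in Example 5. A finite table of truncations only suggests the behaviour of the limit; it does not prove convergence or divergence.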

Exercises 36 1. Use your answers from the previous section to decide whether the improper integrals $\int_0^1 \arcsin x\,dx$ and $\int_0^\infty x\arctan x\,dx$ exist. If so, what are their values?

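2. Let p be a positive constant. Prove the following:

\[ \int_0^1 \frac{1}{x^p}\,dx = \begin{cases} \frac{1}{1-p} & \text{if } p < 1 \\ \infty & \text{if } p \ge 1 \end{cases} \qquad \int_1^\infty \frac{1}{x^p}\,dx = \begin{cases} \frac{1}{p-1} & \text{if } p > 1 \\ \infty & \text{if } p \le 1 \end{cases} \]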

3. Explain why $\int_a^b f(x)\,dx = \lim_{t\to b^-} \int_a^t f(x)\,dx$ holds, even when f is integrable on [a, b].

4. State a version of integration by parts modified for when $\int_a^b u'(x)v(x)\,dx$ is an improper integral. Now evaluate $\int_0^\infty xe^{-4x}\,dx$.

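5. What is wrong with the following calculation?

\[ \int_{-\infty}^{\infty} x\,dx = \lim_{t\to\infty} \Bigl[\tfrac{1}{2}x^2\Bigr]_{-t}^{t} = \lim_{t\to\infty} \tfrac{1}{2}(t^2 - t^2) = \lim_{t\to\infty} 0 = 0 \]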

6. Prove or disprove: if $\int_a^b f$ and $\int_a^b g$ are convergent improper integrals, so is $\int_a^b fg$.

7. Prove part of Theorem 4.32. Suppose 0 ≤ f(x) ≤ g(x) for all x ∈ [a, b), and that $\int_a^b g$ is a convergent improper integral. Prove that $\int_a^b f$ converges and that $\int_a^b f \le \int_a^b g$.

Generalizing the Riemann Integral (non-examinable)

In the 1890s, Thomas Stieltjes offered a generalization of the Riemann integral.

Definition 4.34. Let α be a monotonically increasing function on an interval [a, b]. Given a partition $P = \{x_0, \ldots, x_n\}$ of [a, b] and a function f, define the differences

\[ \Delta\alpha_i = \alpha(x_i) - \alpha(x_{i-1}) \]

The upper/lower Darboux–Stieltjes sums/integrals are defined analogously to the pure Riemann case: writing $M_i$ and $m_i$ for the supremum and infimum of f on $[x_{i-1}, x_i]$,

\[ U(f, P, \alpha) = \sum_{i=1}^n M_i\,\Delta\alpha_i \qquad L(f, P, \alpha) = \sum_{i=1}^n m_i\,\Delta\alpha_i \]

\[ U(f, \alpha) = \inf_P U(f, P, \alpha) \qquad L(f, \alpha) = \sup_P L(f, P, \alpha) \]

f is Riemann–Stieltjes integrable of class R(α) if U(f, α) = L(f, α): we denote this value $\int_a^b f(x)\,d\alpha$.

The standard Riemann integral corresponds to α(x) = x. It is the ability to choose other functions α

that makes the Riemann–Stieltjes integral both powerful and applicable.
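As a rough illustration of Definition 4.34, the sketch below computes upper and lower Darboux–Stieltjes sums on a uniform partition. It is not taken from the notes: the names `stieltjes_sums` and `uniform_partition` are hypothetical, and the code assumes f is increasing so that the supremum and infimum on each subinterval sit at its right and left endpoints.

```python
def stieltjes_sums(f, alpha, partition):
    """Return (U(f, P, alpha), L(f, P, alpha)) for an increasing f.

    Assumption: f is increasing, so on [x_{i-1}, x_i] its supremum is
    f(x_i) and its infimum is f(x_{i-1}). alpha must also be increasing.
    """
    upper = lower = 0.0
    for left, right in zip(partition, partition[1:]):
        d_alpha = alpha(right) - alpha(left)  # the difference Delta alpha_i
        upper += f(right) * d_alpha           # M_i * Delta alpha_i
        lower += f(left) * d_alpha            # m_i * Delta alpha_i
    return upper, lower

def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * i / n for i in range(n + 1)]

f = lambda x: x
P = uniform_partition(0.0, 1.0, 1000)

# alpha(x) = x recovers the ordinary Darboux sums of f(x) = x on [0, 1].
U_riemann, L_riemann = stieltjes_sums(f, lambda x: x, P)
# alpha(x) = x^2 gives more weight to values of f near x = 1.
U_square, L_square = stieltjes_sums(f, lambda x: x * x, P)

print(L_riemann, U_riemann)  # both close to 1/2
print(L_square, U_square)    # both close to 2/3
```

With α(x) = x the two sums squeeze the Riemann integral 1/2, while α(x) = x² squeezes 2/3. In both cases the gap between upper and lower sums shrinks as the partition is refined.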

Standard Properties Most results in sections 32 and 33 hold with suitable modifications, as does the discussion of improper integrals. For instance,

\[ f \in R(\alpha) \iff \forall \epsilon > 0,\ \exists P \text{ such that } U(f, P, \alpha) - L(f, P, \alpha) < \epsilon \]

The result regarding piecewise continuity of f is a notable exception: if f and α are simultaneously piecewise continuous then f might not lie in R(α).

Weighted integrals If α is differentiable, then we obtain a standard Riemann integral

\[ \int_a^b f(x)\,d\alpha = \int_a^b f(x)\alpha'(x)\,dx \]

weighted so that f(x) contributes more when α is increasing rapidly.
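The weighted-integral formula can be checked numerically in a simple case. This is a minimal sketch under assumptions that are not in the notes: it takes f(x) = e^x and α(x) = x² on [0, 1], uses midpoint-tagged sums (which lie between the lower and upper Darboux–Stieltjes sums), and the function names are hypothetical.

```python
import math

def stieltjes_midpoint(f, alpha, a, b, n=10_000):
    """Sum of f(midpoint) * (alpha(right) - alpha(left)) over a uniform partition."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f((l + r) / 2) * (alpha(r) - alpha(l)) for l, r in zip(xs, xs[1:]))

def riemann_midpoint(g, a, b, n=10_000):
    """Ordinary midpoint rule for the Riemann integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

alpha = lambda x: x * x        # assumed integrator
alpha_prime = lambda x: 2 * x  # its derivative

# Left side: integral of e^x with respect to alpha.
lhs = stieltjes_midpoint(math.exp, alpha, 0.0, 1.0)
# Right side: integral of e^x * alpha'(x) dx, whose exact value is 2.
rhs = riemann_midpoint(lambda x: math.exp(x) * alpha_prime(x), 0.0, 1.0)

print(lhs, rhs)  # both close to 2
```

The exact value follows from integration by parts: an anti-derivative of 2xe^x is 2(x − 1)e^x, which changes by 2 over [0, 1].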

Probability If α(a) = 0 and α(b) = 1, then α may be viewed as a probability distribution function. Its derivative α′ is the corresponding probability density function. For example:

1. The uniform distribution on [a, b] has $\alpha(x) = \frac{1}{b-a}(x - a)$ so that

\[ \int_a^b f(x)\,d\alpha = \frac{1}{b - a} \int_a^b f(x)\,dx \]

Since α′ is constant, the integral weighs all values of x uniformly.

2. The standard normal distribution has $\alpha(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$. The fact that $\alpha'(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is maximal when x = 0 reflects the fact that a normally distributed variable is clustered near its mean.

In all cases, $\int_a^b f(x)\,d\alpha = E(f(X))$ computes an expectation (see, for instance, Example 4.33.3).

Stieltjes was Dutch; for the pronunciation try ‘steelchez.’

Non-differentiable α A major flexibility comes when we allow α to be non-differentiable, or even discontinuous! For example, given a partition $Q = \{s_0, \ldots, s_n\}$ of [a, b], and a positive sequence $(c_k)_{k=1}^n$, define

\[ \alpha(x) = \begin{cases} 0 & \text{if } x = a \\ \sum_{i=1}^{k} c_i & \text{if } x \in (s_{k-1}, s_k] \end{cases} \]

This is an increasing step function on [a, b]. The Riemann–Stieltjes integral becomes a weighted sum

\[ \int_a^b f(x)\,d\alpha = \sum_{i=1}^{n} c_i f(s_i) \]

Taking instead an infinite sequence $(s_n) \subseteq [a, b]$ results in an infinite series, which helps explain why so many results for series and integrals look similar!

This also touches on probability. For example, let p ∈ [0, 1], n ∈ N, and $s_k = k$ on the interval [0, n]. If $c_k = \binom{n}{k} p^k (1 - p)^{n-k}$, then

\[ \int_0^n f(x)\,d\alpha = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k} f(k) = E(f(X)) \]

is the expectation of f(X) when X ∼ B(n, p) is a binomially distributed random variable.
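The binomial weighted sum is easy to evaluate directly. The sketch below is an illustration only: the function name `binomial_expectation` and the sample values n = 10, p = 0.3 are assumptions, and the comparison values np and np(1 − p) are the standard mean and variance of B(n, p).

```python
from math import comb

def binomial_expectation(f, n, p):
    """Weighted sum  sum_{k=0}^{n} C(n, k) p^k (1 - p)^(n - k) f(k)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) * f(k) for k in range(n + 1))

n, p = 10, 0.3  # sample parameters, chosen only for illustration

total_weight = binomial_expectation(lambda k: 1, n, p)       # weights sum to 1
mean = binomial_expectation(lambda k: k, n, p)               # E(X) = n p
second_moment = binomial_expectation(lambda k: k * k, n, p)  # E(X^2)
variance = second_moment - mean**2                           # n p (1 - p)

print(total_weight, mean, variance)
```

Taking f(k) = 1 confirms that the weights c_k sum to 1, so α rises from 0 to 1 as a distribution function should. Other choices of f give the corresponding expectations.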

Integrals and Convergence

The Lebesgue integral is another common generalization. Its main purpose is to permit the transfer of integrability to the limit of a sequence of integrable functions. To see the problem, consider the sequence

\[ f_n : [0, 1] \to \mathbb{R} : x \mapsto \begin{cases} 1 & \text{if } x = \frac{p}{q} \in \mathbb{Q} \text{ with } q \le n \\ 0 & \text{otherwise} \end{cases} \]

Each $f_n$ is piecewise continuous and thus Riemann integrable with $\int_0^1 f_n(x)\,dx = 0$. However, the pointwise limit of $(f_n)$ is the function

\[ f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} \]

which is not Riemann integrable. In the Lebesgue theory, the limit f turns out to be integrable with integral 0, so that

\[ \lim_{n\to\infty} \int_0^1 f_n(x)\,dx = \int_0^1 \lim_{n\to\infty} f_n(x)\,dx \]

Recall that the interchange of limits and integrals would be automatic if the convergence $f_n \to f$ were uniform: of course the convergence isn’t uniform here.
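The failure of uniform convergence can be seen with exact rational arithmetic. This sketch is illustrative rather than part of the notes: the names `f_n`, `support` and `dirichlet` are hypothetical, fractions are taken in lowest terms, and only rational inputs are modelled since floating-point numbers cannot represent irrationals.

```python
from fractions import Fraction

def f_n(x, n):
    """f_n(x) = 1 if x is a rational in [0, 1] whose reduced denominator is <= n."""
    return 1 if 0 <= x <= 1 and x.denominator <= n else 0

def support(n):
    """The finitely many points of [0, 1] at which f_n equals 1."""
    return sorted({Fraction(p, q) for q in range(1, n + 1) for p in range(q + 1)})

def dirichlet(x):
    """The pointwise limit f, restricted to rational inputs (every Fraction is rational)."""
    return 1

for n in (1, 2, 5, 10):
    x = Fraction(1, n + 1)  # rational, but its denominator exceeds n
    print(n, len(support(n)), dirichlet(x) - f_n(x, n))
```

Each f_n is non-zero at only finitely many points, which is why its integral is 0. The last column is always 1: for every n there is a point where f and f_n differ by 1, so the supremum of |f_n − f| does not tend to 0.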

Recall how uniform convergence does this for continuity.