8 Analytic Geometry and Calculus
8.1 Axes and Co-ordinates
Modern mathematics is almost entirely algebraic: we trust equations and the rules of algebra more
than pictures. For example, modern mathematics considers the expression $(x + y)^2 = x^2 + 2xy + y^2$
to follow from the laws (axioms) of algebra:
\begin{align*}
(x + y)^2 &= (x + y)(x + y) && \text{(definition of ‘square’)} \\
&= x(x + y) + y(x + y) && \text{(distributive law)} \\
&= x^2 + xy + yx + y^2 && \text{(distributive law twice more)} \\
&= x^2 + 2xy + y^2 && \text{(commutativity)}
\end{align*}
For most of mathematical history, this result would have been viewed
geometrically as in Euclid’s Elements (Thm II. 4):
The square on two parts equals the squares on each part plus
twice the rectangle on the parts.
The proof was a simple picture.
We’ve seen how algebra and algebraic notation were slowly adopted in renaissance Europe. While its
utility for efficient calculation was noted, algebra was not initially considered acceptable for proof and
calculations would be justified geometrically. From our modern viewpoint this seems backwards: if a
student today were asked to prove Euclid’s result, they’d likely label the ‘parts’ x and y, before using
the algebraic formula at the top of the page to justify the result! Of course each line in the algebraic
proof has a geometric basis:
Distributivity says that the rectangle on a side and two parts equals the sum of the rectangles
on the side and each of the parts respectively.
Commutativity says that a rectangle has the same area if rotated 90°.
Modern mathematics has converted geometric rules into algebra and largely forgotten their geomet-
ric origin! This slow movement from geometry to algebra is one of the great revolutions of mathe-
matical history, completely changing the way mathematicians think. More practically, the conversion
to algebra allows easier generalization: how would one geometrically justify an expression such as
\[ (x + y)^4 = x^4 + 4x^3 y + 6x^2 y^2 + 4x y^3 + y^4 \ ? \]
Euclidean Geometry is synthetic: based on purely geometric axioms without formulæ or co-ordinates.
The revolution of analytic geometry was to marry algebra and geometry using axes and co-ordinates.
Modern geometry is primarily analytic and it is now rare to find a mathematician working solely in
synthetic geometry—algebra’s domination of Euclidean geometry is total! The critical step in this
revolution was made almost simultaneously by two Frenchmen. . .
Pierre de Fermat (1601–1665) Mathematics was Fermat’s pastime rather than his profession, though
this didn’t prevent him making great strides in several areas such as probability, analytic geometry,
early calculus, number theory and optics.[52] Some of Fermat’s fame comes from his enigma, with most
of what we know of his work coming in letters to friends in which he rarely offers proofs. He would
regularly challenge friends to prove results, and it is often unknown whether he had proofs himself
or merely suspected a general statement. Being outside the mainstream, his ideas were often ignored
or downplayed. When he died, his notes and letters contained many unproven claims. Leonhard
Euler (1707–1783) in particular expended much effort proving several of these.
Fermat’s approach to analytic geometry was not dissimilar to that of Descartes which we shall de-
scribe below: he introduced a single axis which allowed the conversion of curves into algebraic
equations. We’ll return to Fermat when we discuss the beginnings of calculus in the next section.
René Descartes (1596–1650) In his approach to mathematics, Descartes is the chalk to Fermat’s
cheese, rigorously recording everything. His defining work is 1637’s Discours de la méthode. . .[53] While
enormously influential in philosophy, Discours was intended to lay the groundwork for investigation
within mathematics and the sciences: Descartes finishes Discours by commenting on the necessity
of experimentation in science and on his reluctance to publish due to the environment of hostility
surrounding Galileo’s prosecution.[54] The copious appendices to Discours contain Descartes’ scientific
work. It is in one of these, La Géométrie, that Descartes introduces axes and co-ordinates.
We now think of Cartesian axes and co-ordinates as plural, but
both Fermat and Descartes used only one axis. Here is a sketch
of their approach.
Draw a straight line (the axis) containing two fixed points la-
belled 0 (the origin) and 1. All points on the axis are identified
with numbers x (originally only positive).
A curve is described as an algebraic relationship between x and
the distance y from the axis to the curve measured using a family
of parallel lines intersecting both.
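[Figure: a curve drawn against a single x-axis (marked from −2 to 3), with the distance y measured along a family of parallel lines.]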
Neither Descartes nor Fermat had a second axis, though their approach implicitly imagines one,
the measuring line through the origin. It therefore makes sense for us to speak of the co-ordinates
(x, y); the modern terms abscissa (x) and ordinate (y) date from shortly after the time of Descartes. It
wasn’t long before a second axis orthogonal to the first was instituted (Frans van Schooten, 1649), an
approach that quickly became standard.
[52] Fermat was wealthy but not aristocratic, attending the University of Orléans for three years where he trained as a
lawyer. You’ve likely encountered his name in relation to two famous results in number theory:
Fermat’s Little Theorem: $p$ prime $\implies x^p \equiv x \pmod{p}$ for all integers $x$.
Fermat’s Last Theorem: If $n \in \mathbb{N}_{\ge 3}$, then $x^n + y^n = z^n$ has no integer solutions with $xyz \neq 0$. Fermat is not believed to have
proved this beyond a special case ($n = 4$), with a complete proof not appearing until the 1990s.
[53] . . . of rightly conducting one’s reason and of seeking truth in the sciences. The primary part of this work is philosophical and
contains his famous phrase cogito ergo sum (I think therefore I am).
[54] At this time, France was still Catholic. Descartes had moved thence to Holland in part to pursue his work more freely.
In 1649 Descartes moved to Sweden where he died the following year.
Example 1 The previous picture shows some of the flexibility in Descartes’ approach. The curve
$y = x^2 + 1$ is drawn, where the ‘y-axis’ is inclined 60° to the horizontal. To recognize the curve in a
more standard fashion, we can perform a change of co-ordinates. Suppose P = (x, y) with respect to
the slanted axes and (X, Y) with respect to the usual orthogonal axes.
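[Figure: the curve drawn against the shared x, X axes, with the slanted y axis (inclined 60°) and the orthogonal Y axis through the origin; a point P is shown with its co-ordinates x, y and X, Y.]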
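The second picture shows that
\[
\begin{cases}
X = x + y\cos 60^\circ = x + \frac{1}{2}y \\
Y = y\sin 60^\circ = \frac{\sqrt{3}}{2}y
\end{cases}
\]
For any point on the curve,
\begin{align*}
\sqrt{3}X - Y = \sqrt{3}x &\implies (\sqrt{3}X - Y)^2 = 3x^2 = 3(y - 1) = 3\left(\frac{2}{\sqrt{3}}Y - 1\right) \\
&\implies 3X^2 - 2\sqrt{3}XY + Y^2 - 2\sqrt{3}Y + 3 = 0
\end{align*}
which is an implicit equation for a parabola.[55]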
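The change of co-ordinates in Example 1 can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names are hypothetical, and it simply maps sample points of y = x^2 + 1 from the slanted axes to the orthogonal ones and evaluates the implicit equation and the discriminant of its quadratic part.

```python
import math

def slanted_to_orthogonal(x, y, angle_deg=60.0):
    """Convert slanted-axis co-ordinates (x, y) to orthogonal (X, Y)."""
    t = math.radians(angle_deg)
    return x + y * math.cos(t), y * math.sin(t)

def implicit_parabola(X, Y):
    """Left side of 3X^2 - 2*sqrt(3)*X*Y + Y^2 - 2*sqrt(3)*Y + 3 = 0."""
    s = math.sqrt(3.0)
    return 3 * X**2 - 2 * s * X * Y + Y**2 - 2 * s * Y + 3

def max_residual(xs):
    """Largest |residual| of the implicit equation over points of y = x^2 + 1."""
    worst = 0.0
    for x in xs:
        X, Y = slanted_to_orthogonal(x, x**2 + 1)
        worst = max(worst, abs(implicit_parabola(X, Y)))
    return worst

def discriminant():
    """b^2 - 4ac for the quadratic part 3X^2 - 2*sqrt(3)*XY + Y^2."""
    a, b, c = 3.0, -2 * math.sqrt(3.0), 1.0
    return b * b - 4 * a * c
```

Up to floating-point rounding, the residual vanishes at every sampled point and the discriminant is zero, as expected for a parabola.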
Example 2 Analytic geometry affords easy proofs of many results that are significantly harder in
Euclidean geometry. For instance, here is the famous centroid theorem.
Choose axes pointing along two sides of a triangle with the origin at
one vertex.
If the side lengths along the two axes are $a$ and $b$, then the third side has equation
$bx + ay = ab$, or $y = b - \frac{b}{a}x$, and the midpoints of the sides have
co-ordinates as in the picture. Now compute the point 1/3 of the
way along each median, measured from the midpoint of the side: for instance
\[ \frac{2}{3}\left(\frac{a}{2}, 0\right) + \frac{1}{3}(0, b) = \frac{1}{3}(a, b) \]
One obtains the same result with the other medians, whence all three
meet at a common point G, the centroid of the triangle.
[Figure: triangle with vertices (0, 0), (a, 0), (0, b), side midpoints (a/2, 0), (0, b/2), (a/2, b/2), and centroid G.]
This ability to choose axes to fit the problem is a critical advantage of analytic geometry, largely dispens-
ing with the tedious consideration of congruence in synthetic geometry.
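The centroid computation of Example 2 can be repeated for all three medians at once. This is a small Python sketch using exact fractions; the helper names are hypothetical and the triangle is assumed to have vertices (0, 0), (a, 0) and (0, b) as in the example.

```python
from fractions import Fraction

def point_on_median(vertex, midpoint):
    """Point two thirds of the way from a vertex to the opposite midpoint."""
    return tuple(Fraction(1, 3) * v + Fraction(2, 3) * m
                 for v, m in zip(vertex, midpoint))

def median_points(a, b):
    """The candidate centroid computed separately along each of the three medians."""
    a, b = Fraction(a), Fraction(b)
    O, A, B = (Fraction(0), Fraction(0)), (a, Fraction(0)), (Fraction(0), b)

    def mid(p, q):
        return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

    return [point_on_median(O, mid(A, B)),
            point_on_median(A, mid(O, B)),
            point_on_median(B, mid(O, A))]
```

For instance, `median_points(6, 9)` returns the same point (2, 3) three times, which is (a/3, b/3) for a = 6 and b = 9.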
[55] This really is a parabola, just rotated! This may be confirmed by computing its discriminant: a non-degenerate quadratic
curve $aX^2 + bXY + cY^2 + {}$linear terms is a parabola if and only if $b^2 - 4ac = 0$.
Descartes used his method to solve problems that had proved more difficult synthetically, such as
finding complicated intersections of curves. As we’ll see in the next section, such arguments would
often rely on his Factor Theorem (pg. 75). Given the novelty of his approach, Descartes typically
gave geometric proofs of all assertions to back up his algebraic work. However he also saw the
future, stating that once several examples were done it was no longer necessary to draw physical
lines and provide a geometric argument, the algebra was the proof.
Exercises 8.1. 1. Assume that xy = c represents a hyperbola with asymptotes the x- and y-axes.
Show that xy + c = rx + sy also represents a hyperbola, and find its asymptotes.
2. Determine the locus of the equation $b^2 - 2x^2 = 2xy + y^2$.
(Hint: add $x^2$ to both sides and remember that ‘axes’ do not have to be orthogonal. . . )
3. We describe a method whereby Descartes constructed the product of two lengths.
Let $\overrightarrow{BC}$ and $\overrightarrow{BD}$ be rays forming an acute angle at B.
Our goal is to multiply $|BD|$ by $|BC|$.
Suppose $|AB| = 1$, where A lies on $\overrightarrow{BD}$.
Join AC and draw DE parallel to AC so that $E \in \overrightarrow{BC}$.
Prove that $|BE|$ is the product of $|BD|$ and $|BC|$.
[Figure: the two rays from B, with A and D on one ray, C and E on the other.]
Similarly, given lengths $|BE|$ and $|BD|$, construct a segment whose length is the quotient $\frac{|BE|}{|BD|}$.
4. Here is a geometric justification of Descartes for the solution of the equation $x^2 = ax + b^2$,
where a, b > 0.
Let OQ and PQ be perpendicular with lengths $\frac{a}{2}$ and $b$ respectively.
Draw the circle centered at O with radius $\frac{a}{2}$.
Draw and extend the line OP.
The solution is $x = |NP|$.
[Figure: the circle centered at O with radius a/2, the segment PQ of length b, and the labelled points M, N, P, Q, O.]
(a) Prove that Descartes’ construction indeed recovers a solution.
(b) Show that the other solution to the equation $x^2 = ax + b^2$ is negative (false to Descartes).
How is it visible in the picture?
(c) Explain how the same picture could be used to solve the equation $x^2 + ax = b^2$.
5. Following the work of Islamic mathematicians such as Omar Khayyam, Fermat could describe
the solutions to certain cubic equations as the intersection of two conics. For example, to solve
$x^3 + bx^2 = bc$ (b, c > 0), he would introduce a new variable y by setting both sides equal to bxy.
The (positive) solution is then the x-solution to the system of equations
\[
\begin{cases}
x^2 + bx = by \\
xy = c
\end{cases}
\]
Sketch these curves explicitly for the example $x^3 + 4x^2 = 24$.
8.2 The Beginnings of Calculus
At the heart of calculus is the relationship between velocity, displacement, rate of change and area.
The instantaneous velocity of a particle is the rate of change of its displacement.
The displacement of a particle is the net area under its velocity-time graph.
To state these principles requires graphs. Analytic geometry makes computation much easier (rate
of change means slope). Once it appeared in the early 1600s, the rapid development of calculus was
arguably inevitable. However, many of the basic ideas long predate Descartes and Fermat.
In the context of the above principles, the Fundamental Theorem of Calculus is intuitive: complete
knowledge of displacement (from some starting point) is equivalent to complete knowledge of ve-
locity. The modern statement is more daunting, though familiar:
Theorem. 1. If $f$ is continuous on $[a, b]$, then $F(x) := \int_a^x f(u)\,du$ is continuous on $[a, b]$, differentiable
on $(a, b)$, and $F'(x) = f(x)$.
2. If $F$ is continuous on $[a, b]$ with integrable derivative on $(a, b)$, then $\int_a^b F'(x)\,dx = F(b) - F(a)$.
The triumph of the modern version is its abstraction and wide applicability: we’ve gone way beyond
considerations of velocity ( f ) and displacement (F). The challenge of teaching[56] and proving the Fundamental
Theorem lies in developing and understanding what is meant by continuous, differentiable
and integrable. The quest for good definitions of these concepts is the story of analysis in the 1700s and 1800s.
We begin with some older considerations of the velocity and area problems.
The Velocity Problem pre-1600
The modern ideas of uniform/average velocity are straightforward:
Measure how far an object travels in a given time interval and divide one by the other.
While several ancient Greek mathematicians considered this (and uniform acceleration), the chal-
lenge of considering a ratio of two unlike quantities (distance : time) proved difficult to surmount.
Around 1200, Gerard of Brussels resurrected this approach as a definition of velocity, though it was
not considered a numerical quantity in its own right.
Gerard was credited in the 1330s by the Oxford/Merton Thinkers as influencing their investigations
of instantaneous velocity, a much more difficult issue. They offered the following definition and made
the first statement of the ‘mean speed theorem,’ though both are vague and logically dubious.
Definition. The velocity of a particle at an instant will be measured as the uniform velocity along
the path that would have been taken by the particle if it continued with that velocity.
This is really a convoluted idea of inertial motion.
Theorem. If a particle is uniformly accelerated from rest to some velocity, it will travel half the
distance it would have traveled over the same interval with the final velocity.
[56] Calculus students can easily be taught the mechanics of calculus (the power law, chain rule, etc.) without having any
idea of its meaning; witness the power and curse of analytic geometry and algebra!
For centuries it was thought that Galileo was the first to state such ideas (compare his falling body
discussion, pg. 81), but the Oxford group beat him by 250 years. They had no algebra with which to
prove their assertions and essentially only asserted examples.
In the 1350s, Nicolas Oresme (Paris) considered velocity geometrically by (essentially) drawing
velocity-time graphs. As we saw previously, this is much the approach taken by Galileo; it
is also an early version of axes. A major difference is that Galileo married mathematics to observation;
uniform acceleration for Galileo was precisely the motion of a falling body.
A proper definition of instantaneous velocity is difficult because it requires limits, measuring average
velocity over smaller and smaller intervals. You are in good company if you find this challenging:
Zeno’s arrow paradox is essentially an objection to the very idea of instantaneous velocity! Even if
one accepts the concept, its direct measurement, even in modern times, is essentially impossible.[57]
The Area Problem pre-1600
We’ve previously seen two situations in which calculus-like methods were used to describe areas.
Archimedes (sec. 3.4) computed/approximated the area of a circle and inside a parabola using
infinitely many triangles. His ‘cross-section’ approach to finding area and volume also seems
modern, though this work remained unknown until 1899.
Kepler (pg. 79) argued for his second law (equal areas in equal times) using infinitesimally small
triangles to approximate segments of an ellipse. He also applied this method to several other
problems, crediting Archimedes with the approach.
The modern notion of Riemann sums is just a special case of approximating an area using small
rectangles: the philosophical challenge is again the notion of limits and infinitesimals.
In an early antecedent, Oresme describes how to compute
the distance travelled by a particle whose speed is constant
on a sequence of intervals. For example:
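Over the time interval $\left[\frac{1}{2^{n+1}}, \frac{1}{2^n}\right]$ a particle travels at speed
$1 + 3n$. How far does it travel in 1 second?
Oresme drew boxes to compute areas and obtain
\[ d = \sum_{n=0}^{\infty} (1 + 3n)\left(\frac{1}{2^n} - \frac{1}{2^{n+1}}\right) = \sum_{n=0}^{\infty} \frac{1 + 3n}{2^{n+1}} = 4 \]
[Figure: Oresme’s boxes, plotting speed y (from 0 to 30) against time x (from 0 to 1).]
Similar to Archimedes, the infinite sum was evaluated by
spotting two patterns:
\begin{align*}
\frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots + \frac{1}{2^{n+1}} &= 1 - \frac{1}{2^{n+1}} \\
\frac{0}{2} + \frac{1}{2^2} + \frac{2}{2^3} + \frac{3}{2^4} + \cdots + \frac{n}{2^{n+1}} &= 1 - \frac{n + 2}{2^{n+1}}
\end{align*}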
Oresme had neither our notation nor our (limit-dependent) concept of an infinite series! He also
worked with similar problems for uniform accelerations over intervals. These are not true Riemann
sums, nor are they physical, for a particle cannot suddenly change speed!
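Oresme's total can be checked with exact arithmetic. The sketch below is a modern illustration in Python, not anything Oresme could have written: it adds up the box areas over the first N + 1 intervals and compares them with the partial sum 4 − (3N + 7)/2^(N+1), a closed form obtained here by combining the two patterns above (the function names are hypothetical).

```python
from fractions import Fraction

def oresme_partial_distance(N):
    """Distance covered on the intervals n = 0..N, as an exact fraction."""
    total = Fraction(0)
    for n in range(N + 1):
        width = Fraction(1, 2**n) - Fraction(1, 2**(n + 1))
        total += (1 + 3 * n) * width
    return total

def closed_form(N):
    """Partial sum predicted by the two patterns: 4 - (3N + 7) / 2^(N+1)."""
    return 4 - Fraction(3 * N + 7, 2**(N + 1))
```

The partial sums increase towards 4 and the shortfall (3N + 7)/2^(N+1) shrinks to zero, which is the limit statement hidden inside the infinite sum.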
[57] For instance, radar Doppler-shift (as used to catch speeding motorists) requires measuring the wavelength of a radar
beam, which essentially computes the average velocity over a very small time interval.
Calculus à la Fermat & Descartes
The advent of analytic geometry allowed Fermat and Descartes to turn the computation of instanta-
neous velocity and related differentiation problems into algebraic processes. The velocity of an object
is now identified with the slope of the displacement-time graph, which can be computed using vari-
ations on the modern method of secant lines. We discuss their competing methods.
Fermat’s method of adequation Consider the function $p(x) = x^3 - 12x + 19$; the goal is to find the minimum, which we know
to be located at the x-value $m = 2$.
Fermat argues that if $x_1, x_2$ are located near $m$ such that $p(x_1) = p(x_2)$, then the polynomial
$p(x_2) - p(x_1)$ (which equals zero!) is divisible by $x_2 - x_1$. Indeed
\begin{align*}
0 = \frac{p(x_2) - p(x_1)}{x_2 - x_1} &= \frac{x_2^3 - 12x_2 + 19 - x_1^3 + 12x_1 - 19}{x_2 - x_1} \\
&= \frac{(x_2 - x_1)(x_2^2 + x_1 x_2 + x_1^2 - 12)}{x_2 - x_1} = x_2^2 + x_1 x_2 + x_1^2 - 12
\end{align*}
[Figure: graph of p(x) for x between 0 and 3, with the values x_1 and x_2 marked on the x-axis.]
Since this holds for any $x_1, x_2$ with $p(x_1) = p(x_2)$, Fermat claims it also holds when $x_1 = x_2 = m$
(note the assumption of continuity!), and he concludes
\[ 3m^2 - 12 = 0 \implies m = 2 \]
By considering values of x near m, it is clear that Fermat really has found a local minimum. We
recognize the idea that the slope of the tangent line is zero at local extrema.
This approach dates from the 1620s and is similar to earlier work of Viète. Fermat later alters his
method by considering values $p(x)$ and $p(x + e)$ for small $e$ ($x$ is ‘adequated by $e$’). The difference
$p(x + e) - p(x)$ is more easily divided by $e$ without factorizing. Compared with the above, we obtain
\begin{align*}
0 = \frac{p(x + e) - p(x)}{e} &= \frac{x^3 + 3x^2 e + 3x e^2 + e^3 - 12x - 12e + 19 - x^3 + 12x - 19}{e} \\
&= \frac{3x^2 e + 3x e^2 + e^3 - 12e}{e} = 3x^2 - 12 + 3xe + e^2
\end{align*}
Fermat finished by setting $e$ to zero and solving for $x$ as before. Observe the derivative $p'(x) = 3x^2 - 12$
and how $e$ plays the role of $h$ in the modern expression
\[ p'(x) = \lim_{h \to 0} \frac{p(x + h) - p(x)}{h} \]
Fermat’s method works for any polynomial, where the limit definition of derivative requires no more
than simple evaluation at h = 0. Fermat also extended his method to cover implicit curves and their
tangents.
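Fermat's second method is mechanical enough to sketch in code. The Python below is a modern illustration under stated assumptions, not Fermat's own procedure in his notation: a polynomial is given by its list of coefficients, the hypothetical helper `adequation_quotient` expands p(x + e) − p(x) with the binomial theorem and divides by e, and `set_e_to_zero` then discards every term that still contains e.

```python
from math import comb

def adequation_quotient(coeffs):
    """Terms of (p(x+e) - p(x)) / e as {(i, j): c}, meaning c * x^i * e^j.

    coeffs[k] is the coefficient of x^k in p.
    """
    terms = {}
    for k, c in enumerate(coeffs):
        # (x + e)^k = sum_j C(k, j) x^(k-j) e^j; the j = 0 term cancels against p(x).
        for j in range(1, k + 1):
            key = (k - j, j - 1)  # one factor of e has been divided out
            terms[key] = terms.get(key, 0) + c * comb(k, j)
    return {key: c for key, c in terms.items() if c != 0}

def set_e_to_zero(terms):
    """Drop every term still containing e; returns coefficients of x^i as a list."""
    kept = {i: c for (i, j), c in terms.items() if j == 0}
    degree = max(kept, default=0)
    return [kept.get(i, 0) for i in range(degree + 1)]
```

For p(x) = x^3 − 12x + 19, entered as `[19, -12, 0, 1]`, the quotient has the four terms 3x^2, −12, 3xe and e^2, and setting e to zero leaves `[-12, 0, 3]`, that is 3x^2 − 12, which vanishes at x = 2.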
Descartes’ method of normals Descartes and Fermat are known to have corresponded regarding
their methods. Descartes indeed seems to have felt somewhat challenged by Fermat, and engaged
in some criticism of his approach. Descartes’ method (in La Géométrie) relies on circles and repeated
roots of polynomials in order to compute tangents.
In this example, we compute the slope of the curve $y = \frac{1}{4}(10x - x^2)$ at the point $P = (4, 6)$.
[Figure: the curve near P, with the circle centered at N on the x-axis meeting the curve at Q and R; labelled are the tangent t, the normal n, the radius r, the ordinate y and the subnormal ν.]
Let $N = (4 + \nu, 0)$ be where the normal to the curve intersects the x-axis.[58] Draw a circular arc radius
$r$ centered at N. If $r$ is close to $n$, the circle intersects the curve in two points Q, R near to P. The line
QR plainly approximates the tangent at P.
The co-ordinates of Q, R may be found by solving algebraic equations: substituting $y = \frac{1}{4}(10x - x^2)$
into the equation for the circle results in an equation with two known roots, namely the x-values of
Q and R. By the factor theorem,
\[
\begin{cases}
\big(x - (4 + \nu)\big)^2 + y^2 = r^2 \\
y = \frac{1}{4}(10x - x^2)
\end{cases}
\implies (x - Q_x)(x - R_x) f(x) = 0
\]
where $f(x)$ is some polynomial (in this case quadratic). Rather than doing this explicitly, Descartes
observes that if the radius $r$ is adjusted until it equals $n$, then Q and R coincide with P and the above
equation has a double-root:
\[
\begin{cases}
\big(x - (4 + \nu)\big)^2 + y^2 = n^2 \\
y = \frac{1}{4}(10x - x^2)
\end{cases}
\implies (x - P_x)^2 f(x) = (x - 4)^2 f(x) = 0
\]
Factorization can be done by hand using long-division (note that ν and n are currently unknown!):
substituting as above, we obtain
\begin{align*}
0 &= x^4 - 20x^3 + 116x^2 - 32(4 + \nu)x + 16(4 + \nu)^2 - 16n^2 \\
&= (x - 4)^2(x^2 - 12x + 4) + 32(3 - \nu)x + 16(12 + 8\nu + \nu^2 - n^2)
\end{align*}
Since the remainder $32(3 - \nu)x + 16(12 + 8\nu + \nu^2 - n^2)$ must be the zero polynomial, we conclude
that $\nu = 3$. By similar triangles, the slope of the curve at P is therefore
\[ \frac{y}{\sqrt{t^2 - y^2}} = \frac{\nu}{y} = \frac{1}{2} \]
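The long division can be replayed numerically for chosen values of ν and n². The following Python sketch (hypothetical helper names, exact arithmetic via fractions) divides the quartic by (x − 4)² and returns the quotient and the two remainder coefficients; it assumes the quartic stated above and does not attempt the symbolic division in ν and n.

```python
from fractions import Fraction

def quartic_coeffs(nu, n_squared):
    """Coefficients (highest degree first) of
    x^4 - 20x^3 + 116x^2 - 32(4 + nu)x + 16(4 + nu)^2 - 16 n^2."""
    return [1, -20, 116, -32 * (4 + nu), 16 * (4 + nu) ** 2 - 16 * n_squared]

def divide_by_double_root(coeffs, root):
    """Long division by (x - root)^2; returns (quotient, remainder) coefficient lists."""
    divisor = [1, -2 * root, root * root]
    work = [Fraction(c) for c in coeffs]
    quotient = []
    for i in range(len(work) - 2):
        q = work[i]              # leading coefficient still to be cancelled
        quotient.append(q)
        for j, d in enumerate(divisor):
            work[i + j] -= q * d
    return quotient, work[-2:]   # remainder is (coefficient of x, constant term)
```

With ν = 3 and n² = ν² + y² = 45 the remainder is zero and the quotient is x² − 12x + 4; any other ν leaves the nonzero x-coefficient 32(3 − ν).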
[58] At the time, ν was known as the subnormal and t the tangent.
Fermat and Area The previous methods permit differentiation, albeit inefficiently. Fermat also ap-
proached the area problem, in a manner similar to Oresme. Here is an example where we find the
area under the curve $y = x^3$ between $x = 0$ and $x = a$.
Let $0 < r < 1$ be constant. The rectangle on the interval
$[ar^{n+1}, ar^n]$ touching the curve at its upper right-corner has
area
\[ A_n = (ar^n - ar^{n+1}) \cdot (ar^n)^3 = a^4 (1 - r) r^{4n} \]
The sum of the areas is therefore
\[ \sum_{n=0}^{\infty} A_n = a^4 (1 - r) \sum_{n=0}^{\infty} r^{4n} = \frac{a^4 (1 - r)}{1 - r^4} = \frac{a^4}{1 + r + r^2 + r^3} \]
Setting $r = 1$ recovers the area under the curve: $\frac{1}{4}a^4$.
[Figure: the curve y = x^3 with rectangles A_0, A_1, A_2, ... standing on the intervals between a, ar, ar^2, ar^3, ... and reaching heights a^3, (ar)^3, (ar^2)^3.]
This is non-rigorous by modern standards and again implicitly invokes limits[59] by setting $r = 1$.
Regardless, Fermat is able to establish the power law $\int_0^a x^n\,dx = \frac{1}{n+1}a^{n+1}$ for any positive integer $n$.
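Fermat's rectangles are easy to add up numerically, which makes the role of the limit r → 1 visible. This Python sketch is a modern check under stated assumptions (a > 0, 0 < r < 1, floating-point arithmetic, hypothetical function names): it sums the areas A_n until they are negligible and compares the result with the closed form a^4/(1 + r + r^2 + r^3).

```python
def fermat_rectangle_sum(a, r, tol=1e-15):
    """Sum the areas A_n = (a r^n - a r^(n+1)) * (a r^n)^3 until they are negligible."""
    total, n = 0.0, 0
    while True:
        right = a * r**n                      # right-hand end of the n-th interval
        area = (right - right * r) * right**3
        total += area
        if area < tol * total:
            return total
        n += 1

def closed_form(a, r):
    """Value of the geometric series: a^4 / (1 + r + r^2 + r^3)."""
    return a**4 / (1 + r + r**2 + r**3)
```

For a = 2 the closed form approaches 2^4/4 = 4 as r is taken closer to 1; the rectangles always overestimate the area because each touches the curve at its upper right corner.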
Italian Calculus in the 17th Century: the Area and Volume Problems
In contrast to the work of Fermat and Descartes, contemporary Italian scholars were more focused on
integration problems. Here is Galileo’s classic ‘soup bowl’ problem, where he compares the volume
between a hemisphere and a cylinder to that of a cone.
Galileo observed[60] that the cross-sectional areas on both sides are equal (to $\pi y^2$). Since all cross-sections
are equal, so must be the volumes. Unfortunately for Galileo, he couldn’t sufficiently address
two philosophical objections:
The zero-measure problem If cross-sections are ‘equal,’ then the top cross-sections state that a circle
‘equals’ a point.
Infinitesimals sum to the whole? Can we claim that equal cross-sections imply equal volumes?
It was Galileo’s advocacy on these points that first gained him notoriety with the Church. His later
evangelism for the Copernican theory merely rekindled old animosities.
[59] For Fermat, $r = \frac{n}{m}$ would have been rational, and he’d have set $m = n$ at the end as in his adequation method.
[60] As did Archimedes 1900 years earlier (pg. 32), though Galileo was unaware of it.
Bonaventura Cavalieri (1598–1647) Cavalieri, a student of Galileo and a Jesuat scholar, gave a more
thorough discussion of indivisibles in 1635. He is particularly remembered for Cavalieri’s principle:
If geometric figures have proportional cross-sectional measure at every point relative to a
line, then the figures have measure in the same proportion.
Galileo’s soup bowl is an example of this reasoning, where the
‘line’ is any vertical.
Another classic example involves sliding a stack of coins or a
deck of cards; the volume of the slanted coin stack equals that
of the cylinder.
Extending his principle, Cavalieri inferred the power law $\int_0^a x^n\,dx = \frac{1}{n+1}a^{n+1}$, giving reasonable
arguments for n = 1 and 2. Here is a sketch of his approach for n = 2.
Draw a cube of side x inside a cube of side a. The larger cube consists of three
congruent pyramids with base $a^2$ and height a.
Consider the pyramid with apex O whose base is the square face nearest
the viewer. The cross-section of this pyramid at position x has
area $x^2$. In Cavalieri’s language, the pyramid is ‘all the squares;’ in
modern language
\[ \int_0^a x^2\,dx = \frac{1}{3}a^3 \]
Cavalieri also used his method (Book IV, Prop 19 of Geometria Indivisibilis) to calculate the area enclosed
in an Archimedean spiral. Suppose B moves at constant speed along a line OA rotating at
constant speed about O. In polar co-ordinates B traces a curve $r = k\theta$ for $0 \le \theta \le 2\pi$ (we take k = 1).
If $|OB| = r$, then the arc inside the spiral has length
\[ \ell(r) = 2\pi r \cdot \frac{2\pi - \theta}{2\pi} = r(2\pi - r) \]
Imagine the arc as a noodle which, when cut at B and allowed to fall,
forms the line CD. The area in the spiral therefore equals that within the
parabola which (thanks to Archimedes and Cavalieri himself) is $\frac{4}{3}$ that of
the largest triangle that can fit inside, namely
\[ \frac{4}{3} \cdot \frac{1}{2}(2\pi)(\pi^2) = \frac{4}{3}\pi^3 \]
Cavalieri did this slightly differently, his parabola being drawn in a rectangle and the difference
subtracted from a triangle, but the above is easier to visualize.
Cavalieri did not court controversy like Galileo, being well-aware of the contentious nature of indi-
visibles and taking great pains to distinguish ‘all the lines/squares’ of a figure from the figure itself.
Geometria Indivisibilis was dense and difficult, so it wasn’t easy to challenge. Cavalieri therefore re-
mained relatively safe, even as his political rivals (the Jesuit order) worked hard to stamp out the
‘dangerous’ study of indivisibles.
Evangelista Torricelli (1608–1647) A contemporary of Galileo and Cavalieri, Torricelli made several
applications of Cavalieri’s principle and advocated for its careful use.
If the sides of a rectangle are in the ratio 2 : 1, so also are the indicated
red and blue segments. In Cavalieri’s language, ‘all the lines’ of the
red triangle are twice ‘all the lines’ of the blue triangle! Torricelli ob-
serves that the red triangle cannot have twice the area of the blue, since
they are congruent and points out that Cavalieri’s principle has been
misapplied: the cross-sections were not measured with reference to a
common line.
In modern language, the integrals $\int_0^2 \frac{1}{2}x\,dx = \int_0^1 2y\,dy$ are equal via the substitution $x = 2y$. The point is that the
infinitesimals are also in the same ratio $dx : dy = 2 : 1$.
Another of Torricelli’s examples offers a seeming paradox.
The hyperbola with equation $z = \frac{1}{x}$ is rotated around the z-axis.
A cylinder centered on the z-axis with radius x lying
under the surface has surface area
A = circumference · height = 2πxz = 2π
Underneath the graph at x, Torricelli draws a circular disk
with area 2π. Since the area of this disk is independent of
x, he argues that the volume under the original surface out
to radius a equals the volume of the solid cylinder:
V = 2πa
Torricelli argues that this is a correct use of Cavalieri’s
principle since the cylindrical ‘cross-sections’ and the cir-
cular cross-sections are both measured with respect to the
same line (the x-axis).
This is precisely the method of volume by cylindrical
shells that we learn in modern calculus:
\[ V = \int_0^a 2\pi x \cdot \frac{1}{x}\,dx = 2\pi a \]
The conundrum is that the surface is infinitely tall! How
can we justify the idea that it lies above a finite volume?
Galileo, Cavalieri and Torricelli mark the end of 400 years of Italian dominance in science and
mathematics dating back to Fibonacci. Their ideas were too controversial to thrive so close to the
center of Church power. The center of European science and philosophy therefore moved northward.
The English and French (protestant) reformations of the 1500s together with developing ideas of
reformed government[61] meant that Northern Europe provided more fertile ground for new ideas.
[61] For instance Hobbes’ Leviathan written during the English Civil War (1642–1651) was a plea for the constraint of absolute
monarchical power. The War itself indeed proved decapitatingly effective at reining in a King. . .
Exercises 8.2. 1. Find the maximum of $p(x) = 5 + x - 2x^2$ using Fermat’s first method.
2. Use Fermat’s second method (“+e”) to find the maximum of $bx - x^3$. How might Fermat decide
which of the two solutions to choose as his maximum?
3. Justify Fermat’s first method of determining maxima and minima by showing that if M is a
maximum of $p(x)$, then the polynomial $p(x) - M$ always has a factor $(x - a)^2$, where a is the
value of x giving the maximum.
4. Use Descartes’ method of normals to compute the slope of the curve $y = x^2$ at $(a, a^2)$.
5. Suppose the surface of a sphere of radius r is subdivided into infinitesimal regions of equal
area. Following Kepler, use the formula for the volume of a cone ($\frac{1}{3}$ base·height) to find the
relationship between the volume V of the sphere and its surface area A.
6. (A problem of Kepler) Show that the largest circular cylinder that can be inscribed in a sphere
is one in which the ratio of diameter to altitude is $\sqrt{2} : 1$.
(Hint: Relate the problem to finding the maximum of the function $f(x) = x - \frac{1}{4}x^3$, for which you can
use modern calculus)
7. We consider a version of Gregory St. Vincent’s 1647 approach to the area under the hyperbola
xy = 1.
(a) If 1 < a < b and r > 0 as in the picture, explain why
the areas A and B under the curve satisfy the same
inequalities
\[ \frac{b - a}{b} < A, B < \frac{b - a}{a} \]
(Since [a, b] may be subdivided into arbitrarily many subintervals,
the areas A and B are therefore equal)
[Figure: the hyperbola xy = 1, with area A over the interval from a to b and area B over the interval from ar to br.]
(b) If A(x) is the area under the hyperbola between 1 and x, explain why A satisfies the logarithmic
identity
\[ A(x_1 x_2) = A(x_1) + A(x_2) \]
Why are you not surprised by this?
(For simplicity, assume $1 < x_1 < x_1 x_2$)
8. Consider two copies of a triangle with sides a, b arranged
into a rectangle. Argue that ‘all the lines’ of the rectangle
are twice ‘all the lines’ of the triangle.
(In modern language, $\int_0^a b\,dy = 2\int_0^a \frac{b}{a}y\,dy$)
[Figure: a rectangle with sides a and b divided into two triangles.]
9. Repeat Cavalieri’s analysis of the spiral (pg. 92) to find the area inside one revolution of the
curve r = kθ for any k > 0.
8.3 Calculus in the late 1600s
By the second half of the 17th century, the mathematical center of Europe had moved northwards, to
France, Germany, Holland and Britain. In this section we present some of this work, culminating in
the efforts of Newton and Leibniz.
Hendrick van Heuraet (1634–1660) Working in Holland, van
Heuraet studied Descartes’ work and argued that the arc-length of a
curve described by a function y equals the area under the curve
$z = \frac{n}{y}$ where n is the ‘normal’ curve. His method appeared in
Frans van Schooten’s 1659 version of Descartes’ La Géométrie. To
see why this should make sense, recall Descartes’ method of normals
and observe that the ratio n : y equals that of ds : dx in a
differential triangle.
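His most famous example involved calculating the arc-length of
the curve $y^2 = x^3$. By Descartes’ method,
\begin{align*}
\begin{cases}
(x - a - \nu)^2 + y^2 = n^2 \\
y^2 = x^3
\end{cases}
\implies 0 &= x^3 + x^2 - 2(a + \nu)x + (a + \nu)^2 - n^2 \\
&= (x - a)^2(x + 2a + 1) + (3a^2 - 2\nu)x + \nu^2 + 2a\nu - 2a^3 - n^2
\end{align*}
[Figure: left, the curve y against x from 0 to a, showing the subnormal ν, the normal n and the differential triangle dx, dy, ds; right, the curve z against x from 0 to a. Arc-length equals area.]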
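Since the remainder must be zero, we conclude that $\nu = \frac{3}{2}a^2$, from which
\[ n = \sqrt{\nu^2 + y^2} = \sqrt{\frac{9}{4}a^4 + a^3} \implies z = \frac{n}{y} = \sqrt{\frac{9}{4}a + 1} \]
The arc-length from x = 0 to a is therefore the area under the parabola $z^2 = \frac{9}{4}x + 1$ between the same
limits. By the usual Archimedean $\frac{4}{3}$-triangle approach, we see that
\[ \text{Arc-length} = \frac{4}{3}\left[\frac{1}{2}\left(\frac{4}{9} + a\right) z(a) - \frac{1}{2} \cdot \frac{4}{9} \cdot 1\right] = \left(a + \frac{4}{9}\right)^{3/2} - \frac{8}{27} \]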
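Van Heuraet's closed form can be compared against a direct measurement of the curve. The sketch below is a modern numerical check in Python (hypothetical function names, floating-point arithmetic): it approximates the upper branch y = x^(3/2) of the curve y² = x³ by a polygon with many short chords and compares the total length with (a + 4/9)^(3/2) − 8/27.

```python
import math

def arc_length_polygon(a, steps=100_000):
    """Length of a polygonal approximation to y = x^(3/2) on [0, a]."""
    h = a / steps
    total = 0.0
    prev_y = 0.0
    for i in range(1, steps + 1):
        y = (i * h) ** 1.5
        total += math.hypot(h, y - prev_y)   # length of one short chord
        prev_y = y
    return total

def arc_length_closed(a):
    """Van Heuraet's value: (a + 4/9)^(3/2) - 8/27."""
    return (a + 4 / 9) ** 1.5 - 8 / 27
```

The polygon is always slightly shorter than the curve, and the two values agree to many decimal places once the chords are short enough.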
James Gregory (1638–1675) Hailing from Aberdeen, Scotland, Gregory studied in Italy with Stefano
Angeli, a pupil of Torricelli, before returning to Scotland where he became chair of mathematics at
St. Andrews, and then Edinburgh.
Gregory repeats van Heuraet’s work relating the length of a curve to the area under another, before
considering whether the process can be reversed: given a curve z(x), can we find a curve y(x) such
that the arc-length of y is given by the area under z? In modern language, given z, find y such that
\[ \int_0^a \sqrt{1 + (y')^2}\,dx = \int_0^a z\,dx \]
which we’d view as solving the ODE $\frac{dy}{dx} = \sqrt{z^2 - 1}$. Gregory’s solution was to define y to be the area
under the curve $\sqrt{z^2 - 1}$ from x = 0 to a. This is essentially part 1 of the fundamental theorem: if
you want something whose slope (derivative) is given, define it to be the area under the curve!
Isaac Barrow (1630–1677) Like Gregory, Barrow also studied mathematics in Italy (and France),
before returning to England to become the inaugural Lucasian Professor of Mathematics[62] at Trinity
College Cambridge. Barrow’s work remained predominantly geometric; he stated and proved geometric
versions of both parts of the fundamental theorem, though credited Gregory with part of the argument.
In a precursor of Newton’s work, he also modified Fermat’s algorithm for differentiation. For
example, here is how Barrow would have found the slope of the curve $x^2 + 2xy^2 = c$ at a point (x, y):
Replace x and y with x + e and y + a respectively and expand:
\[ x^2 + 2ex + e^2 + 2xy^2 + 4axy + 2a^2 x + 2ey^2 + 4eay + 2ea^2 = c \]
Delete everything from the original equation $x^2 + 2xy^2 = c$ and every expression containing
two or more of the terms e, a:
\[ 2ex + 4axy + 2ey^2 = 0 \]
Rearrange: the slope is the ratio $a : e = -(x + y^2) : 2xy$
This is implicit differentiation, where dx = e and dy = a. Note again the essential difficulty with
these algorithmic approaches to calculus: the infinitesimal quantities e, a are necessary for the calcu-
lation, but most of them are discarded when no longer useful! Can we really calculate with objects
that must simultaneously exist and be zero?
63
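Barrow's rule is mechanical enough to sketch in code. The following is a hypothetical illustration in our own notation: a polynomial is stored as a dictionary mapping exponent pairs $(i, j)$ to the coefficient of $x^i y^j$, each term $(x + e)^i (y + a)^j$ is expanded with binomial coefficients, and only the terms containing exactly one factor of $e$ or $a$ are kept.

```python
from math import comb

def barrow_linear_terms(poly):
    """poly maps (i, j) -> coefficient of x^i y^j.
    Returns (E, A): the polynomials multiplying e and a after substituting
    x -> x + e, y -> y + a and discarding products of two or more of e, a."""
    E, A = {}, {}
    for (i, j), c in poly.items():
        for p in range(i + 1):          # power of e taken from (x + e)^i
            for q in range(j + 1):      # power of a taken from (y + a)^j
                if p + q != 1:
                    continue            # p + q = 0 is the original term; >= 2 is discarded
                coeff = c * comb(i, p) * comb(j, q)
                target = E if p == 1 else A
                key = (i - p, j - q)
                target[key] = target.get(key, 0) + coeff
    return E, A

def evaluate(poly, x, y):
    return sum(c * x**i * y**j for (i, j), c in poly.items())

curve = {(2, 0): 1, (1, 2): 2}          # x^2 + 2xy^2 (the constant c drops out)
E, A = barrow_linear_terms(curve)
print(E)                                # {(1, 0): 2, (0, 2): 2}, i.e. 2x + 2y^2
print(A)                                # {(1, 1): 4}, i.e. 4xy

def slope(x, y):
    """The ratio a : e determined by E*e + A*a = 0."""
    return -evaluate(E, x, y) / evaluate(A, x, y)

print(slope(1.0, 2.0))                  # -1.25
```

The surviving terms match $2ex + 4axy + 2ey^2 = 0$, and the slope agrees with $-(x + y^2) : 2xy$ at the sample point $(1, 2)$.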
Isaac Newton (1642–1727)
The caricature of Newton is of an obsessive genius—difficult to get along with, but with a phenom-
enal ability to concentrate on problems. One possibly apocryphal story describes how he continued
lecturing to an empty room even after no-one had turned up!
We are mostly interested in Newton's mathematics, though his wider fame comes from its wide application, particularly to gravitation. In Philosophiæ Naturalis Principia Mathematica (1687), Newton applied his three laws of motion[64] and the machinery of calculus to prove the relationship between Kepler's laws of planetary motion (pg. 78) and an inverse-square gravitational force. The Principia is Newton's first published work involving calculus, though many of its results and his method appear to have been worked out 20 years previously, just after completing his undergraduate studies and while Cambridge was closed due to the 1665–6 plague epidemic.
[Figure: Kepler's 1st law $\implies$ inverse-square force]
[62] One of the most prestigious academic positions in theoretical physics worldwide, in large part due to the fame of its second incumbent: Isaac Newton. Later chairs include Paul Dirac and Stephen Hawking.
[63] This is at the heart of Bishop George Berkeley's (after whom the Californian city and university are named) famous 1734 objection to calculus: that infinitesimals are merely the “ghosts of departed quantities.”
[64] I. Inertial motion; II. $F = ma$; III. Equal-and-opposite forces. Consider these as the axioms of Newtonian mechanics.
Newton’s geometric presentation was typical for the time. He comments on how calculations are
more efficient using indivisible methods, but that the ‘hypothesis of indivisibles seems somewhat
harsh’ (he likely wants to counter the impression that his method is philosophically shaky). Newton’s
approach makes it hard for modern readers to extract calculus algorithms; indeed the Principia is not
a calculus textbook and it is not really possible to learn calculus directly from it. Nevertheless, it
contains notions of many standard concepts, for instance:
Limits/continuity Book I, Lemma I: “Quantities, . . . which in any finite time converge continually
to equality, and before the end of that time approach nearer to each other than by any given
difference, become ultimately equal.”
One can see modern ideas appearing (e.g., $\forall \epsilon > 0$, $|a - b| < \epsilon \implies a = b$) though there is a long way to go! What, for instance, does approach mean?
Product Rule In the following pages, Newton argues for the product rule by augmenting the sides of a rectangle.
[Figure: a rectangle with sides $A$, $B$ augmented by $a$, $b$]
If $a$, $b$ are infinitesimal changes in the sides of a rectangle with sides $A$, $B$, then the moment of mutation (infinitesimal change) of the generated rectangle $AB$ is the quantity $aB + bA$. In more familiar language,
\[ (A + a)(B + b) - AB \approx aB + bA \]
where Newton ignores the double-infinitesimal quantity $ab$, exactly as did Barrow and others before him.
Power Law In the same pages, Newton asserts the general power law for rational exponents
. . . the moment of any power $A^{\frac{n}{m}}$ will be $\frac{n}{m}\,aA^{\frac{n-m}{m}}$.
in what seems like a non-rigorous appeal to induction based on the product rule. In reality,
Newton established this using infinitesimal arguments; we’ll see one of his methods for this
shortly.
In contrast to the mostly synthetic presentation in the Principia, Newton made great use of algebra
in his private calculations and correspondence with friends. Some of these private works were pub-
lished many years later. We discuss some of his methods in what follows.
Fluxions and Fluents Newton's main language for calculus (he had several!) referred to time-dependent quantities $x$, $y$ as fluents and their derivatives as fluxions, denoted using dots $\dot{x}$, $\dot{y}$; the modern notation $y'$ comes from this.[65] Anti-derivatives were denoted by placing an accent directly over a quantity: $\acute{x}$ is a fluent of which $x$ is the fluxion. Newton had several algorithms for computing fluxions, often variants of those of previous mathematicians: for instance, here he finds the relationship between the fluxions of fluents $x$, $y$ satisfying $x^2 + 3xy^3 + y = 5$.
Rearrange as a polynomial in $x$: thus $x^2 + (3y^3)x + (y - 5) = 0$.
Multiply terms by a decreasing arithmetic sequence (e.g. 2, 1, 0) and the entire expression by $\frac{\dot{x}}{x}$:
\[ (2x + 3y^3)\,\dot{x} \]
Repeat for $y$, using the same arithmetic sequence; in this example, 2 corresponds to $x^2$, so we start with 3 for $y^3$:
\[ (3x)y^3 + 0y^2 + y + (x^2 - 5) = 0 \quad\longrightarrow\quad (9xy^2 + 1)\,\dot{y} \]
Sum these expressions, set equal to zero and rearrange for the required ratio:
\[ (2x + 3y^3)\,\dot{x} + (9xy^2 + 1)\,\dot{y} = 0 \implies \frac{\dot{y}}{\dot{x}} = -\frac{2x + 3y^3}{9xy^2 + 1} \]
The arithmetic sequence encodes the power law for derivatives, and the result is exactly what you'd expect from modern implicit differentiation.
The Binomial Series Also discovered during the plague years, but first appearing in a private letter
of 1676, is Newton’s discovery of the binomial series
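\[ (1 + x)^{\alpha} = 1 + \sum_{k=1}^{\infty} \frac{\alpha(\alpha - 1)(\alpha - 2)\cdots(\alpha - k + 1)}{k!}\,x^k \]
which allowed Newton to expand expressions such as
\[ (1 + x)^{1/2} = 1 + \frac{1}{2}x - \frac{1}{8}x^2 + \frac{1}{16}x^3 - \frac{5}{128}x^4 + \cdots \]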
Newton’s version was only for fractional exponents and is more difficult to read:
\[ P + PQ\,\tfrac{m}{n} = P\,\tfrac{m}{n} + \frac{m}{n}AQ + \frac{m - n}{2n}BQ + \frac{m - 2n}{3n}CQ + \frac{m - 3n}{4n}DQ + \cdots \qquad (\ast) \]
Newton wrote exponents using juxtaposition, and $A, B, C, D, \ldots$ meant ‘the previous term’: thus $P\,\tfrac{m}{n} = P^{m/n}$, $A = P^{m/n}$ and $B = \frac{m}{n}AQ = \frac{m}{n}P^{m/n}Q$. In more modern language, $(\ast)$ reads
\[ (P + PQ)^{m/n} = P^{m/n} + \frac{m}{n}P^{m/n}Q + \frac{m(m - n)}{2n^2}P^{m/n}Q^2 + \frac{m(m - n)(m - 2n)}{6n^3}P^{m/n}Q^3 + \cdots \]
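Newton's ‘previous term’ recipe is really a recursion, which makes it easy to sketch. The snippet below (our own naming, exact rational arithmetic via the standard library) builds each coefficient of $(1 + x)^{\alpha}$ from the one before it, just as $B$ is built from $A$ and $C$ from $B$.

```python
from fractions import Fraction

def binomial_series_coefficients(alpha, terms):
    """Coefficients of 1, x, x^2, ... in (1 + x)^alpha.
    The k-th coefficient is the previous one times (alpha - (k - 1)) / k,
    which for alpha = m/n is Newton's factor (m - (k - 1)n) / (kn)."""
    alpha = Fraction(alpha)
    coeffs = [Fraction(1)]
    for k in range(1, terms):
        coeffs.append(coeffs[-1] * (alpha - (k - 1)) / k)
    return coeffs

half = binomial_series_coefficients(Fraction(1, 2), 5)
print([str(c) for c in half])   # ['1', '1/2', '-1/8', '1/16', '-5/128']
```

For a positive integer exponent the recursion terminates by itself: with $\alpha = 3$ the coefficients are $1, 3, 3, 1, 0, 0, \ldots$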
[65] A fluent is a ‘flowing’ quantity: to us, a smooth function, though this was not formally defined. To be in flux is to be changing, hence ‘rate of change.’ Newton's dot-notation (‘pricked letters’) persists in the modern field of dynamics: for instance $\ddot{\mathbf{r}} = -\frac{GM}{r^3}\mathbf{r}$ is the differential equation arising from Newton's second law together with the inverse-square law for gravitation (note the double dot for the second derivative).
Newton's ‘proof’ wouldn't pass modern muster. His discoveries were largely the result of some inspired pattern-spotting! Several examples were explicitly verified by multiplying out or using long-division. For instance the series for $\sqrt{1 + x}$ may be obtained
\[ \begin{aligned} 1 + x &= (1 + ax + bx^2 + cx^3 + \cdots)^2 = 1 + 2ax + (a^2 + 2b)x^2 + (2c + 2ab)x^3 + \cdots \\ &\implies a = \tfrac{1}{2}, \quad b = -\tfrac{1}{2}a^2 = -\tfrac{1}{8}, \quad c = -ab = \tfrac{1}{16}, \quad \text{etc.} \\ &\implies \sqrt{1 + x} = 1 + \tfrac{1}{2}x - \tfrac{1}{8}x^2 + \tfrac{1}{16}x^3 + \cdots \end{aligned} \]
Newton did not work through the full theory of infinite series as you would encounter in a typical
undergraduate analysis course. For instance:
$\sqrt{1 + x}$ is well-defined for $x \ge -1$, yet the resulting series only converges when $-1 \le x \le 1$. What happens in general: must a series converge to the original/generating function?
Is it legitimate to differentiate and integrate power series term-by-term as if they are polynomials? Answer: (mostly) yes, though it was 150 years before Cauchy, Weierstraß and others could rigorously confirm this.
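The convergence question is easy to probe numerically. In this small sketch (names and sample points are our own choices), partial sums of the series for $\sqrt{1 + x}$ are compared with the true value at $x = 0.5$, inside the interval of convergence, and at $x = 3$, where $\sqrt{1 + x} = 2$ is perfectly well-defined but the series is useless.

```python
import math

def sqrt_series_partial_sum(x, terms):
    """Sum of the first `terms` terms of the binomial series for sqrt(1 + x)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (0.5 - k) / (k + 1) * x   # ratio of consecutive terms
    return total

for x in (0.5, 3.0):
    sums = [sqrt_series_partial_sum(x, n) for n in (5, 10, 20, 40)]
    print(x, math.sqrt(1 + x), sums)
```

At $x = 0.5$ the partial sums settle down quickly; at $x = 3$ they oscillate with rapidly growing magnitude.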
The Power Law & the Fundamental Theorem Newton’s ability to expand expressions as infinite
series was essential to his arguments. Here is one of his arguments to prove the power law.
1. Assume that the area under a curve $y$ is given by a function
\[ z = \frac{x^{\frac{m}{n} + 1}}{\frac{m}{n} + 1} = \frac{n}{m + n}\,x^{\frac{m+n}{n}} \]
2. If $x$ is increased by an infinitesimal amount $o$, then the new area under the curve is found using the binomial series
\[ \begin{aligned} z + oy &= \frac{n}{m + n}(x + o)^{\frac{m+n}{n}} = \frac{n}{m + n}\,x^{\frac{m+n}{n}}\left(1 + \frac{o}{x}\right)^{\frac{m+n}{n}} \\ &= \frac{n}{m + n}\,x^{\frac{m+n}{n}}\left(1 + \frac{m + n}{n}\cdot\frac{o}{x} + \frac{(m + n)m}{2n^2}\cdot\frac{o^2}{x^2} + \cdots\right) \end{aligned} \]
[Figure: the area $z$ under the curve $y$ up to $x$, together with the added strip $oy$]
3. Following Barrow, Newton cancels the terms in the original equation, divides by $o$, and throws out all remaining $o$-terms. The result is
\[ y = \frac{n}{m + n}\,x^{\frac{m+n}{n}}\cdot\frac{m + n}{n}\cdot\frac{1}{x} = x^{m/n} \]
Note the link-up with the fundamental theorem, which is intuitively obvious when $y$ is a ‘flowing’ (continuous) quantity: the additional area is approximately an infinitesimal rectangle $oy = dz$ with base $o = dx$ and height $y$, whence
\[ dz = y\,dx \iff \frac{dz}{dx} = \frac{d}{dx}\int^{x} y(t)\,dt = y(x) \]
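The argument can be imitated with a small but finite $o$. The sketch below is only a numerical illustration (the values $m = 3$, $n = 2$, $x = 2$ and the function names are assumptions of ours): the extra area divided by $o$ approaches the height $x^{m/n}$ as $o$ shrinks.

```python
def area(x, m, n):
    """Newton's assumed area function z = n/(m+n) * x^((m+n)/n)."""
    return n / (m + n) * x ** ((m + n) / n)

def increment_ratio(x, o, m, n):
    """(new area - old area) / o: the average height of the added strip."""
    return (area(x + o, m, n) - area(x, m, n)) / o

x, m, n = 2.0, 3, 2
for o in (1e-1, 1e-3, 1e-5):
    print(o, increment_ratio(x, o, m, n), x ** (m / n))
```

The discrepancy shrinks roughly in proportion to $o$, reflecting the discarded $o^2$ term of the binomial expansion.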
Using similar approaches, Newton produced one of the first tables of integrals, listing much of what
you’d find inside the covers of an undergraduate calculus textbook! He also obtained power series
for trigonometric and logarithmic functions, partly following the work of Gregory and others. By
combining his approach with the power law, he was able to efficiently integrate and differentiate an
enormous variety of functions.
Gottfried Wilhelm Leibniz (1646–1716)
Leibniz hailed from Leipzig, southwest of Berlin, in what was then part of the Holy Roman Empire.
His initial studies were in philosophy, following his professor father. Though he eventually became
a diplomat and then counsellor to the Duke of Hanover, his taste for advanced mathematics was
fueled during his 1672–76 sojourn in Paris, where he was introduced to van Schooten's expansion of
Descartes’ geometric ideas, in particular the concept of the differential triangle (page 95) which was
already in use by others such as Pascal and Barrow.
The familiar notations for derivatives $\frac{dy}{dx}$ and integrals $\int y\,dx$ come from Leibniz. Very loosely, here is their origin and how they relate to the fundamental theorem. Suppose that
\[ (x_0, z_0),\ (x_1, z_1),\ \ldots,\ (x_n, z_n), \qquad x_0 < x_1 < \cdots < x_n \]
describe a sequence of points along a curve $z(x)$ defined on an interval $[x_0, x_n]$. One may then form the sequences of differences and sums of the ordinates $z_i$:
\[ \delta z_i = \big(z_1 - z_0,\ z_2 - z_1,\ \ldots,\ z_n - z_{n-1}\big), \qquad \sum z_i = \big(z_0,\ z_0 + z_1,\ \ldots,\ z_0 + \cdots + z_n\big) \]
Two relationships between sums and differences are immediate:
1. The difference sequence of the sums returns the original sequence:
\[ \delta \sum z_i = \big(z_0,\ z_1,\ \ldots,\ z_n\big) \]
2. The sum of the difference sequence is the net change in the ordinate:
\[ \sum \delta z_i = (z_1 - z_0) + (z_2 - z_1) + \cdots + (z_n - z_{n-1}) = z_n - z_0 \]
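Both observations are easy to check on a finite list. In this sketch the sample ordinates are an arbitrary choice of ours, and the helper names are hypothetical.

```python
from itertools import accumulate

def differences(seq):
    """Consecutive differences: seq[1] - seq[0], seq[2] - seq[1], ..."""
    return [b - a for a, b in zip(seq, seq[1:])]

def partial_sums(seq):
    """Running totals: seq[0], seq[0] + seq[1], ..."""
    return list(accumulate(seq))

z = [1, 4, 9, 16, 25]                      # sample ordinates (an assumption)

# Observation 1: differencing the partial sums recovers the sequence.
# A leading 0 (the empty sum) is prepended so that z[0] is recovered too.
print(differences([0] + partial_sums(z)))  # [1, 4, 9, 16, 25]

# Observation 2: the differences sum to the net change z_n - z_0.
print(sum(differences(z)), z[-1] - z[0])   # 24 24
```

The second observation is just a telescoping sum, which is why every intermediate ordinate cancels.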
Leibniz's notation arises from viewing a curve as an infinite sequence of points: he writes $dz$ for the infinitesimal differences and $\int$ for the sum of infinitely many infinitesimal objects. Observation 2 becomes $\int dz = z(x_n) - z(x_0)$: the sum of the infinitesimal changes in $z$ is its net change. Indeed, if we assume that $z(x)$ describes the area[66] under a curve $y(x)$, the fact that $dz = y\,dx$ recovers part 2 of the fundamental theorem of calculus:
2. $\int y\,dx = z$: the area under the curve is the sum of its infinitesimal increments.
Observation 1 makes sense once we apply it to a ‘sequence’ of infinitesimals $z_i \to y\,dx$, whence it becomes part 1 of the fundamental theorem:
1. $d\left(\int y\,dx\right) = y\,dx$, or alternatively, $\frac{d}{dx}\left(\int y\,dx\right) = y$: the rate of change of the area function is the ordinate.
[66] Notation is chosen to match Newton's discussion of the power law (page 99).
While we are happy to refer to infinitesimals and infinite sums, Leibniz was more cagey in his publications out of fear of criticism. He referred to each $dx$ as an arbitrary (if small) finite line segment, and therefore, like every other contemporary practitioner of calculus, failed to get to grips with the essential paradoxes involved. Regardless, he and his followers became adept at manipulating differential expressions. For instance, if $y = z^3 + 2z$, Leibniz might compute
\[ \begin{aligned} dy &= (z + dz)^3 + 2(z + dz) - z^3 - 2z = 3z^2\,dz + 3z(dz)^2 + (dz)^3 + 2\,dz \\ &= (3z^2 + 2)\,dz \end{aligned} \]
where the $(dz)^2$ and $(dz)^3$ terms are discarded due to their (relatively) infinitesimal size. Using such approaches Leibniz justified general formulas such as the linearity of derivatives, the product, quotient and power rules. Even the chain rule is easy in Leibniz's notation: given $y(x) = \sqrt{1 + x^3}$, Leibniz might perform a substitution $u = 1 + x^3$ and observe that
\[ dy = d\sqrt{u} = \sqrt{u + du} - \sqrt{u} = \frac{u + du - u}{\sqrt{u + du} + \sqrt{u}} = \frac{du}{2\sqrt{u}} = \frac{3x^2\,dx}{2\sqrt{1 + x^3}} \]
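The rule ‘discard $(dz)^2$ and higher’ can be mimicked with what are now called dual numbers. To be clear, this is a modern device and not Leibniz's own formalism; the class name and the sample point $z = 5$ are our assumptions. Each quantity carries a value and the coefficient of $dz$, and multiplication simply drops the product of two differentials.

```python
class Differential:
    """A quantity value + diff*dz in which (dz)^2 and higher powers are discarded."""

    def __init__(self, value, diff=0.0):
        self.value, self.diff = value, diff

    def __add__(self, other):
        other = other if isinstance(other, Differential) else Differential(other)
        return Differential(self.value + other.value, self.diff + other.diff)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Differential) else Differential(other)
        # (u + du)(v + dv) = uv + (u dv + v du), dropping du*dv
        return Differential(self.value * other.value,
                            self.value * other.diff + self.diff * other.value)
    __rmul__ = __mul__

z = Differential(5.0, 1.0)        # z + dz at z = 5
y = z * z * z + 2 * z             # y = z^3 + 2z
print(y.value, y.diff)            # 135.0 77.0
```

The coefficient of $dz$ is $3 \cdot 5^2 + 2 = 77$, in agreement with $dy = (3z^2 + 2)\,dz$.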
Like Newton, Leibniz worked extensively with power series. As a final example here is his computation of that for sine. In contrast to Newton's approach,[67] Leibniz derived the well-known second-order ODE satisfied by sine.
In a circle of radius 1, let $s$ be the polar angle and $y = \sin s$ the ordinate. By considering the differential triangle, we see that
\[ \frac{dy}{ds} = \sqrt{1 - y^2} \implies (1 - y^2)(ds)^2 = (dy)^2 \qquad (\ast) \]
Leibniz supposes that infinitesimals $ds$ describing the arc are constant and applies $d$ again (with the product rule):[68]
\[ -2y\,(dy)(ds)^2 = 2(dy)\,d(dy) \implies \frac{d(dy)}{(ds)^2} = -y \quad \left(= \frac{d^2y}{ds^2}\right) \]
[Figure: unit circle with polar angle $s$, ordinate $y$, abscissa $\sqrt{1 - y^2}$, and the differential triangle with sides $dx$, $dy$, $ds$]
He then writes $y(s) = \sin s = c_0 + c_1 s + c_2 s^2 + c_3 s^3 + \cdots$ as a power series. Since at $s = 0$, $y = \sin s = 0$ and $ds = dy$ (by $\ast$), Leibniz sees that $c_0 = 0$ and $c_1 = 1$. He now differentiates twice and equates coefficients:
\[ \begin{aligned} \frac{d(dy)}{(ds)^2} &= 2c_2 + 6c_3 s + 12c_4 s^2 + 20c_5 s^3 + 30c_6 s^4 + \cdots = -y = -s - c_2 s^2 - c_3 s^3 - c_4 s^4 - \cdots \\ &\implies 0 = c_2 = c_4 = c_6 = \cdots, \quad c_3 = -\tfrac{1}{6}, \quad c_5 = -\tfrac{1}{20}c_3 = \tfrac{1}{120}, \ \ldots \end{aligned} \]
to obtain the familiar series
\[ y = \sin s = s - \frac{1}{6}s^3 + \frac{1}{120}s^5 + \cdots = s - \frac{1}{3!}s^3 + \frac{1}{5!}s^5 - \frac{1}{7!}s^7 + \cdots \]
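Equating coefficients amounts to the recurrence $(k + 2)(k + 1)\,c_{k+2} = -c_k$ with $c_0 = 0$, $c_1 = 1$. A minimal sketch in exact arithmetic (function name and the test value $s = 0.5$ are ours):

```python
from fractions import Fraction
import math

def sine_coefficients(terms):
    """Power-series coefficients of the solution of y'' = -y with
    y(0) = 0, y'(0) = 1, from (k + 2)(k + 1) c_{k+2} = -c_k."""
    c = [Fraction(0), Fraction(1)]
    for k in range(terms - 2):
        c.append(-c[k] / ((k + 2) * (k + 1)))
    return c

c = sine_coefficients(8)
print([str(ck) for ck in c])   # ['0', '1', '0', '-1/6', '0', '1/120', '0', '-1/5040']

s = 0.5
partial = sum(float(ck) * s**k for k, ck in enumerate(c))
print(partial, math.sin(s))
```

Even the degree-7 truncation agrees with $\sin 0.5$ to about eight decimal places.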
While not precisely the same as modern calculus—in particular, the differentials dx, dy, ds are sep-
arate quantities—this should all seem very familiar! The computational efficiency of such notation
super-charged mathematics and its applicability to real-world problems.
[67] Newton first found a series for arcsine before computing its inverse.
[68] We've bracketed everything. When these are removed, we obtain the standard Leibniz notation for second derivatives!
Exercises 8.3. 1. Show that to find the length of the arc of the parabola $y = x^2$ one needs to determine the area under the hyperbola $y^2 - 4x^2 = 1$.
2. Use Barrow's $a$, $e$ method to determine the slope of the tangent line to the curve $x^3 + y^3 = c^3$.
3. Calculate a power series for $\frac{1}{1 - x^2}$ by using long-division.
4. Use the binomial series to obtain a power series expression for $\ln(1 + x)$, which Newton knew to describe the area under the curve $\frac{1}{1 + x}$.
5. Using his calculus, Newton was able to extend older methods of approximation.
Suppose $f(c) = 0$ and that $a_0$ is an initial approximation to $c$.
The tangent line at $\big(a_0, f(a_0)\big)$ has equation
\[ y = f(a_0) + f'(a_0)(x - a_0) \]
which intersects the $x$-axis at $a_1 := a_0 - \frac{f(a_0)}{f'(a_0)}$.
Iterate to obtain a sequence $(a_n)$ that (typically) converges to $c$:
\[ a_{n+1} = a_n - \frac{f(a_n)}{f'(a_n)}, \qquad \lim_{n \to \infty} a_n = c \]
[Figure: graph of $f(x)$ with successive approximations $a_0, a_1, a_2$ approaching the root $c$]
(a) If $f(x) = x^2 - c$, show that Newton's method is the Babylonian method of the mean.
(b) Use Newton's method to solve the equation $x^2 - 2 = 0$ to a result accurate to eight decimal places. How many steps does this take?
6. Calculate the relationship of the fluxions in the equation $x^3 - ax^2 + axy - y^3 = 0$ using multiplication by the progression 4, 3, 2, 1. Compare to what happens if you use the progression 3, 2, 1, 0. What do you notice?
7. Use Leibniz's differential triangle to argue that
\[ x = \cos s \quad\text{and}\quad dx = -y\,ds \]
Where does the negative sign come from? Hence find the standard power series representation for cosine and conclude that the rate of change (derivative) of sine is cosine.
8. Given a curve $y(x)$ with $y(0) = 0$, Leibniz's transmutation theorem relates the area under $y$ and the curve $z(x) = y - x\frac{dy}{dx}$ obtained by considering the $y$-intercepts of the tangent lines to the original:
\[ \int_0^a y\,dx = \frac{1}{2}\left(a\,y(a) + \int_0^a z\,dx\right) \]
If $x^p = y^q$, find $z$ and use the transmutation theorem to find the area $\int_0^a y\,dx$.
How does the transmutation theorem relate to integration by parts?