
Multivariable Calculus

This document is a textbook on multivariable calculus by Don Shimamoto. It covers topics such as vectors, vector-valued functions, real-valued functions of several variables, and integration in multiple dimensions. The book is divided into preliminary chapters covering concepts like vectors, matrices and linear transformations. Subsequent chapters cover vector-valued functions of one variable, real-valued functions, differentiation and integration of functions with multiple variables. The textbook contains examples, exercises and is intended for use in a multivariable calculus course.

MULTIVARIABLE

CALCULUS

Don Shimamoto

www.dbooks.org
Multivariable Calculus
Don Shimamoto
Swarthmore College


© 2019 Don Shimamoto
ISBN: 978-1-7082-4699-0

This work is licensed under the Creative Commons Attribution 4.0 International
License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Unless noted otherwise, the graphics in this work were created using the software packages:
Cinderella, Mathematica, and, in a couple of instances, Adobe Illustrator and Canvas.
Contents

Preface

I Preliminaries

1 $\mathbb{R}^n$
1.1 Vector arithmetic
1.2 Linear transformations
1.3 The matrix of a linear transformation
1.4 Matrix multiplication
1.5 The geometry of the dot product
1.6 Determinants
1.7 Exercises for Chapter 1

II Vector-valued functions of one variable

2 Paths and curves
2.1 Parametrizations
2.2 Velocity, acceleration, speed, arclength
2.3 Integrals with respect to arclength
2.4 The geometry of curves: tangent and normal vectors
2.5 The cross product
2.6 The geometry of space curves: Frenet vectors
2.7 Curvature and torsion
2.8 The Frenet-Serret formulas
2.9 The classification of space curves
2.10 Exercises for Chapter 2

III Real-valued functions

3 Real-valued functions: preliminaries
3.1 Graphs and level sets
3.2 More surfaces in $\mathbb{R}^3$
3.3 The equation of a plane in $\mathbb{R}^3$
3.4 Open sets
3.5 Continuity
3.6 Some properties of continuous functions
3.7 The Cauchy-Schwarz and triangle inequalities
3.8 Limits
3.9 Exercises for Chapter 3

4 Real-valued functions: differentiation
4.1 The first-order approximation
4.2 Conditions for differentiability
4.3 The mean value theorem
4.4 The $C^1$ test
4.5 The Little Chain Rule
4.6 Directional derivatives
4.7 $\nabla f$ as normal vector
4.8 Higher-order partial derivatives
4.9 Smooth functions
4.10 Max/min: critical points
4.11 Classifying nondegenerate critical points
4.11.1 The second-order approximation
4.11.2 Sums and differences of squares
4.12 Max/min: Lagrange multipliers
4.13 Exercises for Chapter 4

5 Real-valued functions: integration
5.1 Volume and iterated integrals
5.2 The double integral
5.3 Interpretations of the double integral
5.4 Parametrizations of surfaces
5.4.1 Polar coordinates $(r, \theta)$ in $\mathbb{R}^2$
5.4.2 Cylindrical coordinates $(r, \theta, z)$ in $\mathbb{R}^3$
5.4.3 Spherical coordinates $(\rho, \phi, \theta)$ in $\mathbb{R}^3$
5.5 Integrals with respect to surface area
5.6 Triple integrals and beyond
5.7 Exercises for Chapter 5

IV Vector-valued functions

6 Differentiability and the chain rule
6.1 Continuity revisited
6.2 Differentiability revisited
6.3 The chain rule: a conceptual approach
6.4 The chain rule: a computational approach
6.5 Exercises for Chapter 6

7 Change of variables
7.1 Change of variables for double integrals
7.2 A word about substitution
7.3 Examples: linear changes of variables, symmetry
7.4 Change of variables for $n$-fold integrals
7.5 Exercises for Chapter 7

V Integrals of vector fields

8 Vector fields
8.1 Examples of vector fields
8.2 Exercises for Chapter 8

9 Line integrals
9.1 Definitions and examples
9.2 Another word about substitution
9.3 Conservative fields
9.4 Green’s theorem
9.5 The vector field W
9.6 The converse of the mixed partials theorem
9.7 Exercises for Chapter 9

10 Surface integrals
10.1 What the surface integral measures
10.2 The definition of the surface integral
10.3 Stokes’s theorem
10.4 Curl fields
10.5 Gauss’s theorem
10.6 The inverse square field
10.7 A more substitution-friendly notation for surface integrals
10.8 Independence of parametrization
10.9 Exercises for Chapter 10

11 Working with differential forms
11.1 Integrals of differential forms
11.2 Derivatives of differential forms
11.3 A look back at the theorems of multivariable calculus
11.4 Exercises for Chapter 11

Answers to selected exercises

Index

Preface

This book is based on a course that I taught several times at Swarthmore College. The material is
standard, though this particular course is geared towards students who enjoy learning mathematics
for its own sake. As a result, there is a priority placed on understanding why things are true and
a recognition that, when details are sketched or omitted, that should be acknowledged. Otherwise,
the level of rigor is fairly normal. The course has a prerequisite of a semester of linear algebra.
That is not necessary for this book, but it helps. Many of the students go on to more advanced
courses in mathematics, and the course is calibrated to be part of the path on the way to the more
theoretical, proof-oriented nature of upper-level material. Indeed, the hope is that the course will
help inspire the students to continue on.
Roughly speaking, the book is organized into three main parts depending on the type of function
being studied: vector-valued functions of one variable, real-valued functions of many variables, and,
finally, the general case of vector-valued functions of many variables. The table of contents gives a
pretty good idea of the topics that are covered, but here are a few notes:
• Chapter 1 contains the basics of working with vectors in $\mathbb{R}^n$. For students who have studied
linear algebra, this is review.
• Chapter 2 is concerned with curves in $\mathbb{R}^n$. This belongs to the study of functions of one
variable, and many students will have studied parametric equations in their first-year calculus
courses, at least for curves in the plane. One of the appealing aspects of the topic is that it is
possible to give a reasonably complete proof of a substantial result—the classification of space
curves—in a way that illustrates the basic vector concepts that have just been introduced.
Many of the proofs along the way and in the related exercises feel like calculations. This
allows the students to ease into the mindset of proving things before encountering the more
argument-based proofs to come.
• Chapters 3, 4, and 5 study real-valued functions of many variables. This includes the topics
most closely associated with multivariable calculus: partial derivatives and multiple integrals.
The discussion of differentiation emphasizes first-order approximations and the notion of
differentiability. For integration, the focus is almost entirely on functions of two variables
and, after the change of variables theorem is introduced later on, functions of three variables.
• Chapters 6 and 7 begin the study of vector-valued functions of many variables. The main
results here are the chain rule and the change of variables theorem. Their importance to the
subsequent theory is reiterated throughout the rest of the book.
• Chapters 8, 9, and 10 introduce vector fields and their integrals over curves in $\mathbb{R}^n$ and surfaces
in $\mathbb{R}^3$, in other words, line and surface integrals. This leads to the theorems of Green, Stokes,
and Gauss, which are often taken as the target destinations of a multivariable calculus course.
• The book closes in Chapter 11 with a brief discussion of differential forms and their relation
to what has come before. The treatment is superficial but hopefully still illuminating. The
chapter is a success if the students come away believing that something interesting is going
on and curious enough to want to learn more.

As is always the case, the most useful way for students to learn the material is by doing problems,
and this book is written to get to the exercises as quickly as possible. In some cases, proofs are
sketched in the text and the details are left for the exercises. I have tried to make sure that there is
enough information in the text for the students to be able to do all the problems, but some students
and instructors may find it helpful to see more worked examples. I believe that there are enough
exercises in the book that instructors can choose to include some of them—ranging from routine
calculations to proofs—as part of their own presentation of the material. Or students can try some
of the exercises on their own and check their answers against those in the back of the book. The
exercises are written with these possibilities in mind. I hope that this also gives instructors the
flexibility to integrate the approach taken in the book more easily with their personal perspectives
on the subject.
In writing the book, it has become clear that my own view of the material is heavily influenced
by the multivariable calculus course I took as a student. It was taught by Greg Brumfiel, and I
thank him for getting me excited about the subject in such a lasting way. I also thank my colleague
Ralph Gomez for reading an entire draft of the manuscript and making numerous suggestions
that improved the book considerably. I wish I had thought of them myself. Lastly, I thank the
students who took my courses over the years for their responsiveness and feedback which allowed
my approach to the material to evolve over the years to its current state, such as it is. They did
not know that the course they were taking might someday become the basis for a textbook, but
then I wasn’t aware of it either.
Part I

Preliminaries

Chapter 1

$\mathbb{R}^n$

Let $\mathbb{R}$ denote the set of real numbers. Its elements are also called scalars. If $n$ is a positive integer, then $\mathbb{R}^n$ is defined to be the set of all sequences $\mathbf{x}$ of $n$ real numbers:

$$\mathbf{x} = (x_1, x_2, \dots, x_n). \tag{1.1}$$

The elements of $\mathbb{R}^n$ are called points, vectors, or $n$-tuples. We follow the convention of indicating vectors in boldface and scalars in plainface. For a vector $\mathbf{x}$, the individual scalar entries $x_i$ for $i = 1, 2, \dots, n$ are called coordinates or components.

Multivariable calculus studies functions between these sets, that is, functions of the form $f \colon \mathbb{R}^n \to \mathbb{R}^m$, or, more accurately, of the form $f \colon A \to \mathbb{R}^m$, where $A$ is a subset of $\mathbb{R}^n$. In this context, if $\mathbf{x}$ represents a typical point of $\mathbb{R}^n$, the coordinates $x_1, x_2, \dots, x_n$ are referred to as variables. For example, first-year calculus studies real-valued functions of one variable, functions of the form $f \colon \mathbb{R} \to \mathbb{R}$.

This chapter collects some of the background information about $\mathbb{R}^n$ that we use throughout the book. The presentation is meant to be self-contained, though readers who have studied linear algebra are likely to have a greater perspective on how the pieces fit together as part of a bigger picture.

1.1 Vector arithmetic


There are two basic algebraic operations in $\mathbb{R}^n$. Let $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and $\mathbf{y} = (y_1, y_2, \dots, y_n)$, and let $c$ be a real number. The two operations are:

• Addition: $\mathbf{x} + \mathbf{y} = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n)$.
• Scalar multiplication: $c\mathbf{x} = (cx_1, cx_2, \dots, cx_n)$.
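
For readers who like to experiment, the two operations can be sketched in a few lines of Python. This is only an illustration: vectors are modeled as tuples, and the helper names `add` and `scale` are our own choices, not notation from the text.

```python
def add(x, y):
    """Return the componentwise sum x + y of two vectors in R^n."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of components")
    return tuple(xi + yi for xi, yi in zip(x, y))


def scale(c, x):
    """Return the scalar multiple c*x, multiplying each component by c."""
    return tuple(c * xi for xi in x)


x = (1, 2)
y = (3, 4)
print(add(x, y))    # (4, 6)
print(scale(3, x))  # (3, 6)
```

Both functions work in any dimension, since they only loop over the components.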

Because of our familiarity with $\mathbb{R}^2$, we illustrate these concepts there in some detail. An element of $\mathbb{R}^2$ is an ordered pair $\mathbf{x} = (x_1, x_2)$. Geometrically, $\mathbf{x}$ is a point in the plane plotted in the usual way. The origin $(0, 0)$ is called the zero vector and is denoted by $\mathbf{0}$. Alternatively, we may visualize $\mathbf{x}$ by drawing the arrow starting at $(0, 0)$ and ending at $(x_1, x_2)$. We’ll go back and forth freely between the point/arrow viewpoints.

Given two vectors $\mathbf{x} = (x_1, x_2)$ and $\mathbf{y} = (y_1, y_2)$ in $\mathbb{R}^2$, the sum $\mathbf{x} + \mathbf{y}$ as defined above is the point that results from adding the displacements in each of the horizontal and vertical directions, respectively. For instance, if $\mathbf{x} = (1, 2)$ and $\mathbf{y} = (3, 4)$, then $\mathbf{x} + \mathbf{y} = (4, 6)$. Thinking of $\mathbf{x}$ and $\mathbf{y}$ as arrows emanating from $\mathbf{0}$, this places $\mathbf{x} + \mathbf{y}$ at the vertex opposite the origin in the parallelogram determined by $\mathbf{x}$ and $\mathbf{y}$, as on the left of Figure 1.1. If we think of $\mathbf{x} + \mathbf{y}$ as an arrow as well, it is one of the diagonals of the parallelogram, as shown on the right.

Figure 1.1: Vector addition

Another way to reach $\mathbf{x} + \mathbf{y}$ is to move the arrow representing $\mathbf{y}$ so that it retains the same length and direction but begins at the endpoint of $\mathbf{x}$. This is called a “translation” of the original vector $\mathbf{y}$. Then $\mathbf{x} + \mathbf{y}$ is the destination if you go along $\mathbf{x}$ followed by the translated version of $\mathbf{y}$ as illustrated in Figure 1.2, like following two displacements $\mathbf{x}$ and $\mathbf{y}$ in succession. It is often convenient to translate vectors, especially when they represent quantities for which length and direction are the most relevant characteristics. Nevertheless, it’s important to remember that the translations are only copies: the real vector starts at the origin.

Figure 1.2: Using the translation of a vector

Similarly, if $c$ is a real number, then $c\mathbf{x}$ results from multiplying each of the coordinate displacements by a factor of $c$. For instance, if $\mathbf{x} = (1, 2)$, then $3\mathbf{x} = (3, 6)$. In general, $c\mathbf{x}$ is an arrow $|c|$ times as long as $\mathbf{x}$ and in the same direction as $\mathbf{x}$ if $c > 0$, the opposite direction if $c < 0$. See Figure 1.3. In particular, $(-1)\mathbf{x}$ is a copy of $\mathbf{x}$ rotated by $180^\circ$ to reverse the direction. It is usually denoted $-\mathbf{x}$ since it satisfies $\mathbf{x} + (-1)\mathbf{x} = \mathbf{0}$.

Figure 1.3: Scalar multiplication

Going back to the parallelogram used to visualize $\mathbf{x} + \mathbf{y}$, we could also look at the other diagonal, say drawn as an arrow from $\mathbf{y}$ to $\mathbf{x}$, as indicated in Figure 1.4. We sometimes denote this arrow by $\overrightarrow{\mathbf{y}\mathbf{x}}$. It is what you would add to $\mathbf{y}$ to get to $\mathbf{x}$. In other words, it is the difference $\mathbf{x} - \mathbf{y}$:

$$\overrightarrow{\mathbf{y}\mathbf{x}} = \mathbf{x} - \mathbf{y}.$$

Again, the real $\mathbf{x} - \mathbf{y}$ should be returned so that it starts at $\mathbf{0}$.

Figure 1.4: The difference of two vectors

In the case of $\mathbb{R}^2$, the coordinates are usually denoted $x, y$, rather than $x_1, x_2$. Then $\mathbb{R}^2$ is the usual $xy$-plane. The notation is potentially confusing since $\mathbf{x}$ is also often used to denote the generic vector in $\mathbb{R}^n$, as in equation (1.1) above. Hopefully, the context and the use of boldface will clarify whether a coordinate or vector is intended. Similarly, $\mathbb{R}^3$ represents 3-dimensional space. Its coordinates are often denoted $x, y, z$.

Returning to the general case, every vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$ in $\mathbb{R}^n$ can be decomposed as a sum along the coordinate directions:

$$\begin{aligned}
\mathbf{x} &= (x_1, 0, \dots, 0) + (0, x_2, \dots, 0) + \cdots + (0, 0, \dots, x_n) \\
&= x_1 (1, 0, \dots, 0) + x_2 (0, 1, \dots, 0) + \cdots + x_n (0, 0, \dots, 1).
\end{aligned}$$

The vectors $\mathbf{e}_1 = (1, 0, \dots, 0)$, $\mathbf{e}_2 = (0, 1, \dots, 0)$, $\dots$, $\mathbf{e}_n = (0, 0, \dots, 1)$, with a 1 in a single component and zeros everywhere else, are called the standard basis vectors. Thus:

$$\mathbf{x} = x_1 \mathbf{e}_1 + x_2 \mathbf{e}_2 + \cdots + x_n \mathbf{e}_n. \tag{1.2}$$

The scalar coefficients $x_i$ are the coordinates of $\mathbf{x}$.
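
As a quick check of the decomposition (1.2), here is a minimal Python sketch. The function names `standard_basis` and `combine` are hypothetical helpers introduced only for this illustration.

```python
def standard_basis(n):
    """Return [e_1, ..., e_n]: e_j has a 1 in position j and zeros elsewhere."""
    return [tuple(1 if i == j else 0 for i in range(n)) for j in range(n)]


def combine(coeffs, vectors):
    """Return the combination c_1*v_1 + c_2*v_2 + ... of the given vectors."""
    total = [0] * len(vectors[0])
    for c, v in zip(coeffs, vectors):
        for i, vi in enumerate(v):
            total[i] += c * vi
    return tuple(total)


x = (5, -2, 7)
e = standard_basis(3)
print(e)                   # [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(combine(x, e) == x)  # True
```

Using the coordinates of $\mathbf{x}$ as the coefficients rebuilds $\mathbf{x}$ itself, which is exactly what equation (1.2) says.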


In general, a set of $n$ vectors $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ in $\mathbb{R}^n$ is called a basis if every $\mathbf{x}$ in $\mathbb{R}^n$ can be written in a unique way as a combination $\mathbf{x} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_n \mathbf{v}_n$ for some scalars $c_1, c_2, \dots, c_n$. Any basis can be used to define its own coordinate system in $\mathbb{R}^n$, and $\mathbb{R}^n$ has many different bases. Apart from a few occasions, however, we’ll stick with the standard basis.

In $\mathbb{R}^2$, the standard basis vectors are usually denoted $\mathbf{i} = (1, 0)$ and $\mathbf{j} = (0, 1)$. In $\mathbb{R}^3$, the corresponding names are $\mathbf{i} = (1, 0, 0)$, $\mathbf{j} = (0, 1, 0)$, and $\mathbf{k} = (0, 0, 1)$.


1.2 Linear transformations


Linear transformations are functions that respect vector addition and scalar multiplication. More precisely:

Definition. A function $T \colon \mathbb{R}^n \to \mathbb{R}^m$ is called a linear transformation if:
• $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + T(\mathbf{y})$ and
• $T(c\,\mathbf{x}) = c\,T(\mathbf{x})$.
The conditions must be satisfied for all $\mathbf{x}, \mathbf{y}$ in $\mathbb{R}^n$ and all $c$ in $\mathbb{R}$.

This definition may seem austere, but many familiar functions are linear transformations. We just may not be used to thinking of them that way.

Example 1.1. Let $T \colon \mathbb{R}^2 \to \mathbb{R}^2$ be counterclockwise rotation by $\frac{\pi}{3}$ about the origin. That is, if $\mathbf{x} \in \mathbb{R}^2$, then $T(\mathbf{x})$ is the point that is reached after $\mathbf{x}$ is rotated counterclockwise by $\frac{\pi}{3}$ about $\mathbf{0}$. We give a geometric argument that $T$ satisfies the two requirements for being a linear transformation.

First, consider the parallelogram determined by vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^2$. Then $T$ rotates this parallelogram to another parallelogram, the one determined by $T(\mathbf{x})$ and $T(\mathbf{y})$. In particular, the vertex $\mathbf{x} + \mathbf{y}$ is rotated to the vertex $T(\mathbf{x}) + T(\mathbf{y})$. This says exactly that $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + T(\mathbf{y})$. This is illustrated at the left of Figure 1.5.

Figure 1.5: Rotations are linear transformations.

This geometric argument is so simple that it may not be clear that there is any actual reasoning behind it. The reader is encouraged to go through it carefully to pin down how it proves what we want. Also, the case where $\mathbf{x}$ and $\mathbf{y}$ are collinear, so that they don’t determine an honest parallelogram, requires a separate argument. We won’t give it, though see the reasoning in the next paragraph.

Similarly, $T$ rotates the line through the origin and $\mathbf{x}$ to the line through the origin and $T(\mathbf{x})$. The point on this line $c$ times as far from the origin as $\mathbf{x}$ is rotated to the point $c$ times as far from the origin as $T(\mathbf{x})$. In other words, $T(c\mathbf{x}) = c\,T(\mathbf{x})$. This is shown on the right in the figure.

Example 1.2. Likewise, the function $T \colon \mathbb{R}^2 \to \mathbb{R}^2$ that reflects a point $\mathbf{x} = (x_1, x_2)$ in the $x_1$-axis is a linear transformation. The appropriate supporting diagrams are shown in Figure 1.6.

Figure 1.6: So are reflections.

Example 1.3. Let $T \colon \mathbb{R}^3 \to \mathbb{R}^2$ be the function that projects a point $\mathbf{x} = (x_1, x_2, x_3)$ in $\mathbb{R}^3$ onto the $x_1x_2$-plane, that is, $T(x_1, x_2, x_3) = (x_1, x_2)$. Rather than use pictures, this time we show that $T$ is linear by calculating.

For instance:

$$T(\mathbf{x} + \mathbf{y}) = T(x_1 + y_1, x_2 + y_2, x_3 + y_3) = (x_1 + y_1, x_2 + y_2).$$

On the other hand:

$$T(\mathbf{x}) + T(\mathbf{y}) = (x_1, x_2) + (y_1, y_2) = (x_1 + y_1, x_2 + y_2).$$

Both expressions equal the same thing, so $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + T(\mathbf{y})$.

Similarly, $T(c\mathbf{x}) = T(cx_1, cx_2, cx_3) = (cx_1, cx_2)$ while $c\,T(\mathbf{x}) = c(x_1, x_2) = (cx_1, cx_2)$. Thus $T(c\mathbf{x}) = c\,T(\mathbf{x})$ as well.
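
The calculation in Example 1.3 can also be spot-checked numerically. The sketch below tests the two linearity conditions for the projection at one sample choice of vectors and scalar. A check at sample points is only a sanity test, not a substitute for the general argument above, and the helper names are our own.

```python
def add(x, y):
    """Componentwise sum of two vectors of the same length."""
    return tuple(xi + yi for xi, yi in zip(x, y))


def scale(c, x):
    """Scalar multiple c*x."""
    return tuple(c * xi for xi in x)


def project(x):
    """Projection of R^3 onto the x1x2-plane: (x1, x2, x3) -> (x1, x2)."""
    return (x[0], x[1])


x = (1, 2, 3)
y = (4, 5, 6)
c = -2

# First condition: T(x + y) = T(x) + T(y).
print(project(add(x, y)) == add(project(x), project(y)))  # True
# Second condition: T(cx) = c T(x).
print(project(scale(c, x)) == scale(c, project(x)))       # True
```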

1.3 The matrix of a linear transformation


Let $T \colon \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. One of the things that make such a function easy to work with is that, although $T(\mathbf{x})$ is defined for all $\mathbf{x}$ in $\mathbb{R}^n$, the transformation is actually determined by a finite amount of input. To make sense of this, recall from equation (1.2) that every $\mathbf{x}$ in $\mathbb{R}^n$ can be expressed in terms of the standard basis vectors: $\mathbf{x} = x_1 \mathbf{e}_1 + x_2 \mathbf{e}_2 + \cdots + x_n \mathbf{e}_n$. Thus by the defining properties of a linear transformation:

$$\begin{aligned}
T(\mathbf{x}) &= T(x_1 \mathbf{e}_1 + x_2 \mathbf{e}_2 + \cdots + x_n \mathbf{e}_n) \\
&= T(x_1 \mathbf{e}_1) + T(x_2 \mathbf{e}_2) + \cdots + T(x_n \mathbf{e}_n) \\
&= x_1 T(\mathbf{e}_1) + x_2 T(\mathbf{e}_2) + \cdots + x_n T(\mathbf{e}_n) \\
&= x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \cdots + x_n \mathbf{a}_n,
\end{aligned} \tag{1.3}$$

where $\mathbf{a}_j = T(\mathbf{e}_j)$ for $j = 1, 2, \dots, n$. Conversely, given any $n$ vectors $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ in $\mathbb{R}^m$, it’s straightforward to check by calculation that the formula $T(\mathbf{x}) = x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \cdots + x_n \mathbf{a}_n$ satisfies the two requirements for being a linear transformation. This is Exercise 2.3 at the end of the chapter. Moreover, $T(\mathbf{e}_j) = T(0, 0, \dots, 1, \dots, 0) = 0 \cdot \mathbf{a}_1 + 0 \cdot \mathbf{a}_2 + \cdots + 1 \cdot \mathbf{a}_j + \cdots + 0 \cdot \mathbf{a}_n = \mathbf{a}_j$ for each $j$. In other words, a linear transformation $T \colon \mathbb{R}^n \to \mathbb{R}^m$ is completely determined, as in (1.3), once you know the values $T(\mathbf{e}_1) = \mathbf{a}_1$, $T(\mathbf{e}_2) = \mathbf{a}_2$, $\dots$, $T(\mathbf{e}_n) = \mathbf{a}_n$, and there are no restrictions on what these values can be.

To understand how this might be used, we consider first the case of a linear transformation $T \colon \mathbb{R}^n \to \mathbb{R}$ whose values are real numbers. Then every $T(\mathbf{e}_j)$ is a scalar, say $T(\mathbf{e}_j) = a_j$, and equation (1.3) becomes:

$$T(\mathbf{x}) = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n. \tag{1.4}$$

Sums of this type are an indispensable part of vector algebra.

Definition. Let $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and $\mathbf{y} = (y_1, y_2, \dots, y_n)$ be vectors in $\mathbb{R}^n$. Their dot product, denoted $\mathbf{x} \cdot \mathbf{y}$, is defined by:

$$\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$$

For example, in $\mathbb{R}^3$, if $\mathbf{x} = (1, 2, 3)$ and $\mathbf{y} = (4, 5, 6)$, then $\mathbf{x} \cdot \mathbf{y} = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 4 + 10 + 18 = 32$.

The dot product satisfies a variety of elementary properties, such as $\mathbf{x} \cdot \mathbf{y} = \mathbf{y} \cdot \mathbf{x}$. The ones we shall use are pretty obvious, so we won’t bother listing them out, though please see Exercises 5.5–5.10 if you’d like to see some of them collected together.
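
The definition of the dot product translates directly into code. The following is a small Python sketch, with `dot` as our own name for the helper, that reproduces the worked example and spot-checks the symmetry property on it.

```python
def dot(x, y):
    """Return x . y = x1*y1 + x2*y2 + ... + xn*yn for vectors in R^n."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of components")
    return sum(xi * yi for xi, yi in zip(x, y))


x = (1, 2, 3)
y = (4, 5, 6)
print(dot(x, y))               # 32
print(dot(x, y) == dot(y, x))  # True
```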
Returning to the main point, we have shown in equation (1.4) that every real-valued linear transformation $T \colon \mathbb{R}^n \to \mathbb{R}$ has the form:

$$T(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x}$$

for some vector $\mathbf{a} = (a_1, a_2, \dots, a_n)$ in $\mathbb{R}^n$.

The analysis for the general case of a linear transformation $T \colon \mathbb{R}^n \to \mathbb{R}^m$ follows the same pattern except that now the values $\mathbf{a}_j = T(\mathbf{e}_j)$ are vectors in $\mathbb{R}^m$. We record these values by putting them in the columns of a rectangular table. That is, say $\mathbf{a}_j$ is the vector $(a_{1j}, a_{2j}, \dots, a_{mj})$ in $\mathbb{R}^m$, and let $A$ be the table:

$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn}
\end{bmatrix},$$

where $\mathbf{a}_j$ appears as the $j$th column. Such a table is called a matrix. In fact, $A$ is called an $m$ by $n$ matrix, which means that it has $m$ rows and $n$ columns. The subscripting has been chosen so that $a_{ij}$ is the entry in row $i$, column $j$, where the rows are numbered starting from the top and the columns starting from the left.

The matrix obtained in this way is called the matrix of $T$ with respect to the standard bases. Since, by equation (1.3), the transformation $T$ is completely determined by the columns $\mathbf{a}_j$, the matrix contains all the data we need to find $T(\mathbf{x})$ for all $\mathbf{x}$ in $\mathbb{R}^n$.
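We illustrate this with the three examples of linear transformations considered earlier.

Example 1.4. Let $T \colon \mathbb{R}^2 \to \mathbb{R}^2$ be the counterclockwise rotation by $\frac{\pi}{3}$ about the origin. Then $T$ rotates the vector $\mathbf{e}_1 = (1, 0)$ to the vector on the unit circle that makes an angle of $\frac{\pi}{3}$ with the $x_1$-axis. That is, $T(\mathbf{e}_1) = (\cos \frac{\pi}{3}, \sin \frac{\pi}{3}) = (\frac{1}{2}, \frac{\sqrt{3}}{2})$. Similarly, $T(\mathbf{e}_2) = (\cos \frac{5\pi}{6}, \sin \frac{5\pi}{6}) = (-\frac{\sqrt{3}}{2}, \frac{1}{2})$. Hence the matrix of $T$ with respect to the standard bases is:

$$A = \begin{bmatrix} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}. \tag{1.5}$$

Example 1.5. If $T \colon \mathbb{R}^2 \to \mathbb{R}^2$ is the reflection in the $x_1$-axis, then $T(\mathbf{e}_1) = T(1, 0) = (1, 0)$ and $T(\mathbf{e}_2) = T(0, 1) = (0, -1)$. Hence:

$$A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. \tag{1.6}$$

Example 1.6. Lastly, if $T \colon \mathbb{R}^3 \to \mathbb{R}^2$ is the projection of $x_1x_2x_3$-space onto the $x_1x_2$-plane, then $T(\mathbf{e}_1) = T(1, 0, 0) = (1, 0)$, $T(\mathbf{e}_2) = T(0, 1, 0) = (0, 1)$, and $T(\mathbf{e}_3) = T(0, 0, 1) = (0, 0)$, so:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}. \tag{1.7}$$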
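
The recipe "column $j$ is $T(\mathbf{e}_j)$" can be automated. The Python sketch below builds the matrix of a transformation by applying it to the standard basis vectors. Everything here is an illustrative assumption rather than part of the text: matrices are stored as lists of rows, the helper names are ours, and the rotation is implemented through polar angles so that the matrix is computed from the geometric description and not from a formula we are trying to derive.

```python
import math


def standard_basis(n):
    """Return [e_1, ..., e_n] as tuples."""
    return [tuple(1 if i == j else 0 for i in range(n)) for j in range(n)]


def matrix_of(T, n):
    """Matrix of T: R^n -> R^m as a list of rows; column j is T(e_j)."""
    columns = [T(e) for e in standard_basis(n)]
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def rotate(x):
    """Counterclockwise rotation by pi/3: keep the length, add pi/3 to the angle."""
    r = math.hypot(x[0], x[1])
    angle = math.atan2(x[1], x[0])
    return (r * math.cos(angle + math.pi / 3), r * math.sin(angle + math.pi / 3))


def reflect(x):
    """Reflection in the x1-axis."""
    return (x[0], -x[1])


def project(x):
    """Projection of R^3 onto the x1x2-plane."""
    return (x[0], x[1])


print(matrix_of(reflect, 2))  # [[1, 0], [0, -1]]
print(matrix_of(project, 3))  # [[1, 0, 0], [0, 1, 0]]

# Floating-point entries, approximately [[1/2, -sqrt(3)/2], [sqrt(3)/2, 1/2]].
print([[round(v, 4) for v in row] for row in matrix_of(rotate, 2)])
```

The first two results agree with (1.6) and (1.7) exactly, and the third agrees with (1.5) up to rounding.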
To use the matrix $A$ to compute $T(\mathbf{x})$ in a systematic way, we observe the convention that vectors are identified with matrices having a single column. Thus:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \text{and} \quad \mathbf{a}_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}$$

represent vectors in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Sometimes they’re referred to as column vectors.

With this notation, equation (1.3) becomes:

$$\begin{aligned}
T(\mathbf{x}) &= x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \\
&= \begin{bmatrix} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \\ \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \end{bmatrix}.
\end{aligned} \tag{1.8}$$

This is similar to the real-valued case (1.4) in that each component is a dot product. Specifically, the $i$th component is the dot product of the $i$th row of $A$ and $\mathbf{x}$. This expression is given a name.

Definition (Preliminary version of matrix multiplication). Let $A$ be an $m$ by $n$ matrix, and let $\mathbf{x}$ be a column vector in $\mathbb{R}^n$. Then the product $A\mathbf{x}$ is defined to be the column vector $\mathbf{y}$ in $\mathbb{R}^m$ whose $i$th component is the dot product of the $i$th row of $A$ and $\mathbf{x}$:

$$y_i = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n.$$

In other words, $A\mathbf{x}$ is the column vector given by equation (1.8) above.
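
Equation (1.8) describes $A\mathbf{x}$ in two equivalent ways: as a combination of the columns of $A$ and as a stack of row dot products. The Python sketch below computes both and compares them on one sample. The matrix and vector are arbitrary illustrative choices, and the function names are ours.

```python
def dot(x, y):
    """Dot product of two vectors of the same length."""
    return sum(xi * yi for xi, yi in zip(x, y))


def mat_vec(A, x):
    """Row view: the ith component of Ax is (row i of A) . x."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many entries as x")
    return tuple(dot(row, x) for row in A)


def column_combination(A, x):
    """Column view: x1*a1 + x2*a2 + ... + xn*an, where aj is column j of A."""
    m = len(A)
    result = [0] * m
    for j, xj in enumerate(x):
        column = [A[i][j] for i in range(m)]
        result = [r + xj * a for r, a in zip(result, column)]
    return tuple(result)


A = [[1, 2, 3],
     [4, 5, 6]]
x = (1, 0, -1)
print(mat_vec(A, x))             # (-2, -2)
print(column_combination(A, x))  # (-2, -2)
```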


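We’ve labeled this as a “preliminary” version, because we define a more general matrix multiplication shortly that includes this as a special case.

Using this definition, equation (1.8) can be written in the simple form $T(\mathbf{x}) = A\mathbf{x}$. The preceding discussion is summarized in the following result.

Proposition 1.7.
1. Given any linear transformation $T \colon \mathbb{R}^n \to \mathbb{R}^m$, there is an $m$ by $n$ matrix $A$ such that:
• the $j$th column of $A$ is $T(\mathbf{e}_j)$ for $j = 1, 2, \dots, n$, and
• $T(\mathbf{x}) = A\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$.
2. Conversely, given any $m$ by $n$ matrix $A$, the formula $T(\mathbf{x}) = A\mathbf{x}$ defines a linear transformation $T \colon \mathbb{R}^n \to \mathbb{R}^m$.

We apply this to the examples previously considered. For instance, if $T$ is the counterclockwise rotation of $\mathbb{R}^2$ by $\frac{\pi}{3}$ about the origin, then by (1.5):

$$T(\mathbf{x}) = \begin{bmatrix} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} x_1 - \frac{\sqrt{3}}{2} x_2 \\ \frac{\sqrt{3}}{2} x_1 + \frac{1}{2} x_2 \end{bmatrix} = \left( \tfrac{1}{2} x_1 - \tfrac{\sqrt{3}}{2} x_2,\ \tfrac{\sqrt{3}}{2} x_1 + \tfrac{1}{2} x_2 \right).$$

Once you get the hang of it, this may be the simplest way to find a formula for a rotation.

For the reflection in the $x_1$-axis, (1.6) gives:

$$T(\mathbf{x}) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \cdot x_1 + 0 \cdot x_2 \\ 0 \cdot x_1 - 1 \cdot x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ -x_2 \end{bmatrix} = (x_1, -x_2).$$

We didn’t need matrix methods to come up with this formula, but at least it’s correct.

Lastly, for the projection of $\mathbb{R}^3$ onto the $x_1x_2$-plane, we find from (1.7) that:

$$T(\mathbf{x}) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 + 0 + 0 \\ 0 + x_2 + 0 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = (x_1, x_2),$$

as expected.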

1.4 Matrix multiplication


Let T : Rn → Rm and S : Rm → Rp be functions, where, as indicated, the codomain of T equals
the domain of S. The composition S ◦ T : Rn → Rp is defined to be the function given by
(S ◦ T )(x) = S(T (x)) for all x in Rn . If S and T are linear transformations, it’s easy to check that
S ◦ T is a linear transformation, too:
• S(T (x + y)) = S(T (x) + T (y)) = S(T (x)) + S(T (y)),
• S(T (cx)) = S(cT (x)) = cS(T (x)).
In each case, the first equation follows from the definition of linear transformation applied to T
and the second applied to S. As a result, S ◦ T is represented by a matrix C with respect to the
standard bases.
Let A and B be the matrices of S and T , respectively, with respect to the standard bases. We
shall describe C in terms of A and B. By Proposition 1.7, the jth column cj of C is S(T (ej )). For
the same reason T (ej ) is the jth column of B. Let’s call it bj , so cj = S(bj ). On the other hand,
S is represented by the matrix A, so cj is the matrix product cj = Abj . Its ith component is the
ith row of A dotted with bj , that is, the dot product of the ith row of A and the jth column of B.
Doing this for all j = 1, 2, . . . , n, fills out the matrix C. The matrix obtained in this way is called
the product of A and B.
Definition (Final version of matrix multiplication). Let A be a p by m matrix and B an m by
n matrix. Their product, denoted AB, is the p by n matrix C whose (i, j)th entry is the dot
product of the ith row of A and the jth column of B:
\[
c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{im} b_{mj} = \sum_{k=1}^{m} a_{ik} b_{kj}.
\]

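Visually, the action is something like this:
\[
\begin{bmatrix} & \vdots & \\ \cdots & c_{ij} & \cdots \\ & \vdots & \end{bmatrix}
=
\begin{bmatrix} \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{im} \\ \vdots & \vdots & & \vdots \end{bmatrix}
\begin{bmatrix} \cdots & b_{1j} & \cdots \\ \cdots & b_{2j} & \cdots \\ & \vdots & \\ \cdots & b_{mj} & \cdots \end{bmatrix}.
\]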
This only makes sense when the number of columns of A equals the number of rows of B.
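The row-times-column rule, together with the shape requirement, can be sketched as follows. This is an illustrative implementation only: the name `mat_mul` and the sample matrices are our own, and matrices are assumed to be stored as lists of rows.

```python
def mat_mul(A, B):
    """Product of a p by m matrix A and an m by n matrix B (lists of rows)."""
    p, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("columns of A must equal rows of B")
    n = len(B[0])
    # Entry (i, j) is the dot product of row i of A and column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(n)]
            for i in range(p)]


A = [[1, 2, 0],
     [0, 1, -1]]   # 2 by 3
B = [[1, 0],
     [2, 1],
     [0, 3]]       # 3 by 2

print(mat_mul(A, B))  # [[5, 2], [2, -2]], a 2 by 2 matrix
```

With these shapes, `mat_mul(B, A)` is also defined, but it is a 3 by 3 matrix, so the two products cannot even be compared.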
The definitions have been rigged so that the following statement is an immediate consequence.

Proposition 1.8. Composition of linear transformations corresponds to matrix multiplication.


That is, let S : Rm → Rp and T : Rn → Rm be linear transformations with matrices A and B,
respectively, with respect to the standard bases. Then S ◦ T is a linear transformation, and its
matrix with respect to the standard bases is the product AB.
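Example 1.9. Let T : R2 → R2 be the counterclockwise rotation by π/3 about the origin, S : R2 →
R2 the reflection in the x1-axis, and consider the composition S ◦ T (first rotate, then reflect). Using
the matrices from (1.5) and (1.6), the matrix of S ◦ T with respect to the standard bases is the
product:
\[
\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
\begin{bmatrix} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}
= \begin{bmatrix}
1 \cdot \frac{1}{2} + 0 \cdot \frac{\sqrt{3}}{2} & 1 \cdot \left(-\frac{\sqrt{3}}{2}\right) + 0 \cdot \frac{1}{2} \\
0 \cdot \frac{1}{2} + (-1) \cdot \frac{\sqrt{3}}{2} & 0 \cdot \left(-\frac{\sqrt{3}}{2}\right) + (-1) \cdot \frac{1}{2}
\end{bmatrix}
= \begin{bmatrix} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ -\frac{\sqrt{3}}{2} & -\frac{1}{2} \end{bmatrix}.
\]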

Working backwards and looking at the columns of this matrix, this tells us that
(S ◦ T)(e1) = (1/2, −√3/2) = (cos(−π/3), sin(−π/3)) and (S ◦ T)(e2) = (−√3/2, −1/2) =
(cos(7π/6), sin(7π/6)). These points are plotted in Figure 1.7. A linear transformation that has
the same effect on e1 and e2 is the reflection in the line ℓ that makes an angle of −π/6 with the
positive x1-axis. But linear transformations are completely determined by what they do to the
standard basis: two transformations that do the same thing must be the same transformation.
Thus we conclude that S ◦ T is the reflection in ℓ. This can be verified with geometric reasoning
as well.

Figure 1.7: The composition of a rotation and a reflection
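By comparison, the composition T ◦ S (first reflect, then rotate) is represented by the same
matrices multiplied in the opposite order:
\[
\begin{bmatrix} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
= \begin{bmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{bmatrix}. \tag{1.9}
\]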

Note that this is different from the matrix of S ◦ T . Matrix multiplication need not obey the
commutative law AB = BA. (Can you describe geometrically the linear transformation that (1.9)
represents?)
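The two products in this example can be checked numerically. The sketch below is our own illustration in plain Python, using floating-point approximations of the rotation matrix: it confirms that applying the rotation and then the reflection to a vector agrees with multiplying by the product matrix, and that the two orders of multiplication give different matrices. The helper names and the test vector are assumptions made for the example.

```python
import math


def mat_mul(A, B):
    """Product AB: entry (i, j) is row i of A dotted with column j of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_vec(A, x):
    """Return Ax, with A stored as a list of rows."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


c, s = math.cos(math.pi / 3), math.sin(math.pi / 3)
B = [[c, -s], [s, c]]            # matrix of T: rotation by pi/3
A = [[1.0, 0.0], [0.0, -1.0]]    # matrix of S: reflection in the x1-axis

AB = mat_mul(A, B)   # matrix of S o T (first rotate, then reflect)
BA = mat_mul(B, A)   # matrix of T o S (first reflect, then rotate)

x = [2.0, 1.0]                             # an arbitrary test vector
step_by_step = mat_vec(A, mat_vec(B, x))   # S(T(x))
all_at_once = mat_vec(AB, x)               # (AB)x
print(step_by_step, all_at_once)           # equal up to rounding

print(AB)   # approximately [[0.5, -0.866], [-0.866, -0.5]]
print(BA)   # approximately [[0.5, 0.866], [0.866, -0.5]]
```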

1.5 The geometry of the dot product


We now discuss some geometric properties of the dot product, or, perhaps more accurately, how
the dot product can be used to develop geometric intuition about Rn when n is large enough to be
outside direct experience.


Definition. The norm, or magnitude, of a vector x = (x1, x2, . . . , xn) in Rn, denoted ‖x‖, is
defined to be:
\[
\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.
\]
For instance, in R3, if x = (1, 2, 3), then ‖x‖ = √(1 + 4 + 9) = √14.
The following simple property gets used a lot.

Proposition 1.10. If x ∈ Rn, then x · x = ‖x‖².

Proof. Both sides equal $x_1^2 + x_2^2 + \cdots + x_n^2$.

We return to the familiar setting of the plane and examine these notions there. For instance,
if x = (x1, x2), then $\|x\| = \sqrt{x_1^2 + x_2^2}$. By the Pythagorean theorem, this is the length of the
hypotenuse of a right triangle with legs |x1| and |x2|. If we think of x as an arrow emanating from
the origin, then ‖x‖ is the length of the arrow. If we think of x as a point, then ‖x‖ is the distance
from x to the origin.
Given two points x and y in R2, the distance between them is the length of the arrow that
connects them, $\overrightarrow{yx} = x - y$. Hence:

Distance between x and y = ‖x − y‖.

Next, let x = (x1, x2) and y = (y1, y2) be nonzero elements of R2, regarded as arrows emanating
from the origin. Suppose that the arrows are perpendicular. If x and y do not form a horizontal/vertical
pair, then the slopes of the lines through the origin that contain them are defined and
are negative reciprocals. The slope is the ratio of vertical displacement to horizontal displacement,
so this gives x2/x1 = −y1/y2. See Figure 1.8. This is easily rearranged to become x1 y1 + x2 y2 = 0, or
x · y = 0. If x and y do form a horizontal/vertical pair, say x = (x1, 0) and y = (0, y2), then
x · y = 0 once again. Conversely, if x · y = 0, the preceding reasoning can be reversed to conclude
that x and y are perpendicular. Thus:

In R2, x and y are perpendicular if and only if x · y = 0.

Figure 1.8: Perpendicular vectors in the plane: for instance, note that x has slope x2/x1.

For vectors in the plane in general, let θ denote the angle between two given vectors x and y,
where 0 ≤ θ ≤ π. To study the relationship between the dot product and θ, assume for the moment
that neither x nor y is a scalar multiple of the other, so θ ≠ 0, π, and consider the triangle whose
vertices are 0, x, and y. Two of the sides of this triangle have lengths ‖x‖ and ‖y‖, and the length
of the third side is the length of the arrow $\overrightarrow{yx} = x - y$. See Figure 1.9. Thus by the law of cosines,
‖x − y‖² = ‖x‖² + ‖y‖² − 2‖x‖ ‖y‖ cos θ. By Proposition 1.10, this is the same as:

(x − y) · (x − y) = x · x + y · y − 2 ‖x‖ ‖y‖ cos θ. (1.10)

Figure 1.9: Vectors x and y in R2 and the angle between them

Multiplying out the left-hand side gives (x−y)·(x−y) = x·x−x·y−y·x+y·y = x·x−2x·y+y·y.


This expansion uses some of the elementary, but obvious, algebraic properties of the dot product
that we declined to list. After substitution into (1.10), we obtain:

x · x − 2 x · y + y · y = x · x + y · y − 2 ‖x‖ ‖y‖ cos θ.

Hence after some cancellation:

x · y = ‖x‖ ‖y‖ cos θ. (1.11)

In the case that x or y is a scalar multiple of the other, then θ = 0 or π. Say, for instance, that
y = cx. Then x · y = x · (cx) = c‖x‖². Meanwhile, ‖y‖ = |c| ‖x‖, so ‖x‖ ‖y‖ cos θ = |c| ‖x‖² cos θ =
±c ‖x‖² cos θ. The plus sign occurs when |c| = c, i.e., the scalar multiple c is nonnegative, in which
case θ = 0 and cos θ = 1. The minus sign occurs when c < 0, whence cos θ = cos π = −1. Either
way, ‖x‖ ‖y‖ cos θ = c‖x‖², the same as x · y. Thus (1.11) is valid for all x, y in R2.
We use our knowledge of the plane to try to create some intuition about Rn when n ≥ 3. This
is especially important when n > 3, since visualization in those spaces is almost completely an
act of imagination. For instance, if v is a nonzero vector in Rn , we think of the set of all scalar
multiples cv as a “line.” If w is a second vector, not a scalar multiple of v, and c and d are scalars,
the parallelogram law of addition suggests that the combination cv + dw should lie in a “plane”
determined by v and w and that, as c and d range over all possible scalars, cv + dw should sweep
out this plane. As a result, the set of all vectors cv + dw, where c, d ∈ R, is called the plane
spanned by v and w. We denote it by P. The only reason that it’s a plane is because that’s what
we have chosen to call it.
We would like to make P into a replica of the familiar plane R2 . We sketch one approach for
doing this, without filling in many technical details. We begin by choosing two vectors u1 and u2
in P such that u1 · u1 = u2 · u2 = 1 and u1 · u2 = 0. These vectors play the role of the standard
basis vectors e1 and e2, which of course satisfy the same relations. (It’s not hard to show that such
vectors u1 and u2 exist. In fact, in terms of the spanning vectors v and w, one can check that
$u_1 = \frac{1}{\|v\|}\, v$ and $u_2 = \frac{1}{\|z\|}\, z$, where $z = w - \frac{v \cdot w}{v \cdot v}\, v$, is one possibility, though there are many others.)
We use u1 and u2 to establish a two-dimensional coordinate system internal to P. Every element
of P can be written in the form x = a1 u1 + a2 u2 for some scalars a1 and a2. In terms of our budding
intuition, we think of $\sqrt{a_1^2 + a_2^2}$ as representing the distance from the origin, or the length, of x.


If y = b1 u1 + b2 u2 is another element of P, then:

x · y = (a1 u1 + a2 u2 ) · (b1 u1 + b2 u2 )
= a1 b1 u1 · u1 + (a1 b2 + a2 b1 )u1 · u2 + a2 b2 u2 · u2
= a1 b1 + a2 b2 .

Thus the dot product in Rn agrees with the result we would expect in terms of the newly created
internal coordinates in P. In particular, $\|x\|^2 = x \cdot x = a_1^2 + a_2^2$, so the norm in Rn represents our
intuitive notion of length in P. This is true even though the coordinates of x = (x1 , x2 , . . . , xn ) in
Rn don’t really have anything to do with P.
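The construction of u1 and u2 sketched above, with z = w minus its component along v, is easy to try numerically. The following plain-Python sketch is only an illustration under our own assumptions: the helper names, the two spanning vectors in R4, and the internal coordinates (3, −4) are all made up for the example, and w is assumed not to be a scalar multiple of v.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(x):
    return math.sqrt(dot(x, x))


def scale(c, x):
    return [c * xi for xi in x]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def orthonormal_pair(v, w):
    """Return u1, u2 in the plane spanned by v and w with
    u1.u1 = u2.u2 = 1 and u1.u2 = 0 (w must not be a multiple of v)."""
    u1 = scale(1 / norm(v), v)
    z = add(w, scale(-dot(v, w) / dot(v, v), v))  # remove the part of w along v
    u2 = scale(1 / norm(z), z)
    return u1, u2


v = [1.0, 2.0, 2.0, 0.0]
w = [0.0, 1.0, 0.0, 1.0]
u1, u2 = orthonormal_pair(v, w)
print(dot(u1, u1), dot(u2, u2), dot(u1, u2))  # approximately 1, 1, 0

# A point of the plane with internal coordinates (a1, a2) = (3, -4).
a1, a2 = 3.0, -4.0
x = add(scale(a1, u1), scale(a2, u2))
print(norm(x))  # approximately 5, i.e. sqrt(a1^2 + a2^2)
```

The last line illustrates the point of the discussion: the norm computed from the four coordinates of x in R4 agrees with the length computed from its two internal coordinates in P.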
Continuing in this way, we can build up P as a copy of R2 and transfer over the familiar concepts
of plane geometry, such as distance and angle. The following terminology reflects this intuition.

Definition. A vector x in Rn is called a unit vector if ‖x‖ = 1.

For example, the standard basis vectors ei = (0, 0, . . . , 0, 1, 0, . . . , 0) are unit vectors.

Corollary 1.11. If x is a nonzero vector in Rn, then $u = \frac{1}{\|x\|}\, x$ is a unit vector. It is called the
unit vector in the direction of x.

Proof. We calculate that $u \cdot u = \left(\frac{1}{\|x\|}\, x\right) \cdot \left(\frac{1}{\|x\|}\, x\right) = \frac{1}{\|x\|^2}\, x \cdot x = \frac{\|x\|^2}{\|x\|^2} = 1$. Thus by Proposition
1.10, ‖u‖ = 1.

Our main result in this section is that the relation between the dot product and angles that we
derived for the plane in equation (1.11) is true in Rn for all n.

Theorem 1.12. If x and y are vectors in Rn, then:

x · y = ‖x‖ ‖y‖ cos θ,

where θ is the angle between x and y in the plane that they span.

Proof. The case that x or y is a scalar multiple of the other is proved in the same way as in R2.
Otherwise, x and y span a plane P, and the points 0, x, and y are the vertices of a “triangle” in
P. Since we can make P into a geometric replica of R2, the law of cosines remains true, and, since
the norm in Rn represents length in P, this takes the form:

‖x − y‖² = ‖x‖² + ‖y‖² − 2 ‖x‖ ‖y‖ cos θ.

From here, the argument is identical to the one for R2.
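In practice, the theorem is often read in reverse: given nonzero x and y, it determines θ through cos θ = (x · y)/(‖x‖ ‖y‖). A minimal sketch of that computation follows. The function names and sample vectors are our own, and the clamping step is a floating-point precaution rather than part of the mathematics.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(x):
    return math.sqrt(dot(x, x))


def angle(x, y):
    """Angle in [0, pi] between nonzero vectors x and y in R^n."""
    cos_theta = dot(x, y) / (norm(x) * norm(y))
    # Guard against values such as 1.0000000000000002 caused by rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


print(angle([1, 0, 0], [1, 1, 0]))    # approximately pi/4
print(angle([1, 2, 3], [-2, 1, 0]))   # pi/2, since the dot product is 0
print(angle([1, 2, 3, 4], [-1, -2, -3, -4]))  # approximately pi
```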

Lastly, we introduce the standard terminology for the case of perpendicular vectors, that is,
when θ = π/2, so cos θ = 0.

Definition. Two vectors x and y in Rn are called orthogonal if x · y = 0.

For instance, the standard basis vectors in Rn are orthogonal: ei · ej = 0 whenever i ≠ j.



1.6 Determinants
The determinant is a function that assigns a real number to an n by n matrix. There’s a separate
function for each n. We shall focus almost exclusively on the cases n = 2 and n = 3, since those are
the cases we really need later. The determinant is not defined for matrices in which the numbers
of rows and columns are unequal.
The determinant of a 2 by 2 matrix is defined to be:
\[
\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11} a_{22} - a_{12} a_{21}.
\]
It’s the product of the diagonal entries minus the product of the off-diagonal entries. For instance,
$\det \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = 1 \cdot 4 - 2 \cdot 3 = 4 - 6 = -2$.
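One often thinks of the determinant as a function of the rows of the matrix. If x = (x1, x2) and
y = (y1, y2), let $\begin{bmatrix} x & y \end{bmatrix}$ denote the matrix whose rows are x and y:
\[
\det \begin{bmatrix} x & y \end{bmatrix} = \det \begin{bmatrix} x_1 & x_2 \\ y_1 & y_2 \end{bmatrix} = x_1 y_2 - x_2 y_1.
\]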

The main geometric property of 2 by 2 determinants is the following.

Proposition 1.13. Let x and y be vectors in R2, neither a scalar multiple of the other. Then:
\[
\left| \det \begin{bmatrix} x & y \end{bmatrix} \right| = \text{Area of the parallelogram determined by } x \text{ and } y.
\]

Proof. We show the equivalent result that $\left( \det \begin{bmatrix} x & y \end{bmatrix} \right)^2 = (\text{Area})^2$. The area of the parallelogram
is (base)(height). As base, we use the arrow x, which has length ‖x‖. For the height h, we drop a
perpendicular from the point y to the base. See Figure 1.10. Let θ be the angle between x and y,
where as usual 0 ≤ θ ≤ π. Then the height is given by h = ‖y‖ sin θ, so Area = ‖x‖ ‖y‖ sin θ and:

(Area)² = ‖x‖² ‖y‖² sin² θ. (1.12)

Figure 1.10: Area of a parallelogram

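Meanwhile:
\[
\begin{aligned}
\left( \det \begin{bmatrix} x & y \end{bmatrix} \right)^2 &= (x_1 y_2 - x_2 y_1)^2 \\
&= x_1^2 y_2^2 - 2 x_1 x_2 y_1 y_2 + x_2^2 y_1^2 \\
&= (x_1^2 + x_2^2)(y_1^2 + y_2^2) - x_1^2 y_1^2 - x_2^2 y_2^2 - 2 x_1 x_2 y_1 y_2 \\
&= (x_1^2 + x_2^2)(y_1^2 + y_2^2) - (x_1 y_1 + x_2 y_2)^2 \\
&= \|x\|^2 \|y\|^2 - (x \cdot y)^2 \\
&= \|x\|^2 \|y\|^2 - \|x\|^2 \|y\|^2 \cos^2 \theta \\
&= \|x\|^2 \|y\|^2 \sin^2 \theta.
\end{aligned} \tag{1.13}
\]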

Comparing equations (1.12) and (1.13) gives the result.

If x or y is a scalar multiple of the other, for instance, if y = cx for some scalar c, then the
“parallelogram” they determine degenerates into a line segment, which has area 0. At the same
time, $\det \begin{bmatrix} x & y \end{bmatrix} = \det \begin{bmatrix} x_1 & x_2 \\ c x_1 & c x_2 \end{bmatrix} = c x_1 x_2 - c x_1 x_2 = 0$, so the proposition remains valid when things
degenerate, too.
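As a numerical sanity check of the area interpretation, the sketch below compares the 2 by 2 determinant with the base-times-height computation used in the proof. It is only an illustration: the function names and the sample vectors are our own choices.

```python
import math


def det2(x, y):
    """Determinant of the 2 by 2 matrix whose rows are x and y."""
    return x[0] * y[1] - x[1] * y[0]


def area_base_height(x, y):
    """Area of the parallelogram determined by x and y, as (base)(height)."""
    nx, ny = math.hypot(*x), math.hypot(*y)
    cos_theta = (x[0] * y[0] + x[1] * y[1]) / (nx * ny)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # rounding guard
    theta = math.acos(cos_theta)
    return nx * ny * math.sin(theta)            # base ||x||, height ||y|| sin(theta)


x = (3.0, 1.0)
y = (1.0, 2.0)
print(det2(x, y))               # 5.0
print(area_base_height(x, y))   # approximately 5.0
print(det2(y, x))               # -5.0: swapping the rows changes the sign

# Degenerate case: y is a scalar multiple of x, so the area is 0.
print(det2(x, (6.0, 2.0)))      # 0.0
```

The sign change when the rows are swapped is why the area corresponds to the size of the determinant rather than to its signed value.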
Determinants satisfy a great many algebraic properties. We list only the ones that we shall use,
which barely begins to scratch the surface. In part (c) of the following proposition, the transpose
of a matrix A, denoted $A^t$, refers to the matrix obtained by turning the rows into columns and the
columns into rows. So the (i, j)th entry of $A^t$ is the (j, i)th entry of A. For instance:
\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^t = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^t = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}, \quad
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}^t = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}, \quad \text{and so on.}
\]

If A is an m by n matrix, then $A^t$ is n by m.
Proposition 1.14.
(a) If two of the rows are equal, the determinant is 0.
(b) Interchanging two rows flips the sign of the determinant.
(c) $\det(A^t) = \det A$.
(d) $\det(AB) = (\det A)(\det B)$.
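Proof. For 2 by 2 determinants, the proofs are easy.

(a) $\det \begin{bmatrix} x & x \end{bmatrix} = \det \begin{bmatrix} x_1 & x_2 \\ x_1 & x_2 \end{bmatrix} = x_1 x_2 - x_1 x_2 = 0$.

(b) $\det \begin{bmatrix} y & x \end{bmatrix} = \det \begin{bmatrix} y_1 & y_2 \\ x_1 & x_2 \end{bmatrix} = y_1 x_2 - y_2 x_1 = -(x_1 y_2 - x_2 y_1) = -\det \begin{bmatrix} x & y \end{bmatrix}$.

(c) $\det(A^t) = \det \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} = a_{11} a_{22} - a_{21} a_{12} = \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \det(A)$.

(d) We have:
\[
\begin{aligned}
\det(AB) &= \det \left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \right)
= \det \begin{bmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\ a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} \end{bmatrix} \\
&= (a_{11} b_{11} + a_{12} b_{21})(a_{21} b_{12} + a_{22} b_{22}) - (a_{11} b_{12} + a_{12} b_{22})(a_{21} b_{11} + a_{22} b_{21}).
\end{aligned}
\]
After multiplying out, the terms involving $a_{11} a_{21} b_{11} b_{12}$ and $a_{12} a_{22} b_{21} b_{22}$ cancel in pairs, leaving:
\[
\det(AB) = a_{11} b_{11} a_{22} b_{22} + a_{12} b_{21} a_{21} b_{12} - a_{11} b_{12} a_{22} b_{21} - a_{12} b_{22} a_{21} b_{11}.
\]
One can check that $(\det A)(\det B) = (a_{11} a_{22} - a_{12} a_{21})(b_{11} b_{22} - b_{12} b_{21})$ expands to the same thing.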
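A numerical spot check is no substitute for the algebra, but it can catch sign slips. The following sketch tests the four properties on one pair of integer matrices. The helper names and the particular matrices are our own choices for illustration.

```python
def det2(M):
    """Determinant of a 2 by 2 matrix stored as a list of two rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def transpose(M):
    """Turn rows into columns and columns into rows."""
    return [list(col) for col in zip(*M)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

print(det2([A[0], A[0]]))            # (a) equal rows: 0
print(det2([A[1], A[0]]), det2(A))   # (b) swapped rows: 2 versus -2
print(det2(transpose(A)), det2(A))   # (c) transpose: -2 and -2
print(det2(mat_mul(A, B)), det2(A) * det2(B))  # (d) product: 10 and 10
```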

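For 3 by 3 matrices, the determinant is defined in terms of the 2 by 2 case:
\[
\det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
= a_{11} \cdot \det \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}
- a_{12} \cdot \det \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}
+ a_{13} \cdot \det \begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}. \tag{1.14}
\]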

The signs in the sum alternate, and the pattern is that the terms run along the entries of the first
row, a1j , each multiplied by the 2 by 2 determinant obtained by deleting the first row and jth
column of the original matrix. The process is known as expansion along the first row. For
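instance:
\[
\begin{aligned}
\det \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
&= 1 \cdot \det \begin{bmatrix} 5 & 6 \\ 8 & 9 \end{bmatrix}
- 2 \cdot \det \begin{bmatrix} 4 & 6 \\ 7 & 9 \end{bmatrix}
+ 3 \cdot \det \begin{bmatrix} 4 & 5 \\ 7 & 8 \end{bmatrix} \\
&= 1 \cdot (45 - 48) - 2 \cdot (36 - 42) + 3 \cdot (32 - 35) = -3 + 12 - 9 = 0.
\end{aligned}
\]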

One can show that 3 by 3 determinants satisfy all four of the algebraic properties of Proposition
1.14. The calculations are longer than in the 2 by 2 case but still straightforward, except for the
product formula det(AB) = (det A)(det B) which calls for a new approach.
One consequence of the properties is that there’s a formula for expanding the determinant along
any row. To expand along the ith row, interchange row i with the row above it repeatedly until it
reaches the top, then expand using equation (1.14). The result has the same form as (1.14), except
that the leading scalar factors come from the ith row and sometimes the signs might alternate
beginning with a minus sign depending on how many sign flips were introduced in getting the ith
row to the top.
In addition, since det(At ) = det(A), any general statement about rows applies to columns as
well, so there are formulas for expanding along any column, too. We won’t write down those
formulas precisely.
There is also a geometric interpretation of 3 by 3 determinants in terms of 3-dimensional volume.
This is discussed in the next chapter.
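For larger matrices, the same pattern continues, that is, n by n determinants can be defined in
terms of (n − 1) by (n − 1) determinants using expansion along the first row. For the 4 by 4 case,
the formula is:
\[
\begin{aligned}
\det \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}
&= a_{11} \cdot \det \begin{bmatrix} a_{22} & a_{23} & a_{24} \\ a_{32} & a_{33} & a_{34} \\ a_{42} & a_{43} & a_{44} \end{bmatrix}
- a_{12} \cdot \det \begin{bmatrix} a_{21} & a_{23} & a_{24} \\ a_{31} & a_{33} & a_{34} \\ a_{41} & a_{43} & a_{44} \end{bmatrix} \\
&\quad + a_{13} \cdot \det \begin{bmatrix} a_{21} & a_{22} & a_{24} \\ a_{31} & a_{32} & a_{34} \\ a_{41} & a_{42} & a_{44} \end{bmatrix}
- a_{14} \cdot \det \begin{bmatrix} a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix}.
\end{aligned}
\]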

The algebraic properties of Proposition 1.14 remain true for n by n determinants whatever the
value of n, though, to prove this, one should really develop the algebraic structure of determinants
in a systematic way rather than hope for success with brute force calculation. We leave this level
of generality for a course in linear algebra. In what follows, we work for the most part with the
cases n = 2 and n = 3.
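Expansion along the first row is naturally recursive, so it can be written as a short function that calls itself on the smaller matrices. The sketch below is one such implementation in plain Python. The name `det` and the sample 4 by 4 matrix are our own, and the method is meant to mirror the definition rather than to be efficient: the number of operations grows very quickly with n.

```python
def det(M):
    """Determinant of an n by n matrix (list of rows) by first-row expansion."""
    n = len(M)
    if any(len(row) != n for row in M):
        raise ValueError("the determinant is only defined for square matrices")
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Delete the first row and the jth column.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        # Signs alternate +, -, +, ... along the first row.
        total += (-1) ** j * M[0][j] * det(minor)
    return total


print(det([[1, 2], [3, 4]]))                   # -2
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0, as computed in the text

M4 = [[2, 0, 0, 1],
      [0, 1, 0, 0],
      [0, 0, 3, 0],
      [1, 0, 0, 1]]
print(det(M4))                                 # 3
```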


1.7 Exercises for Chapter 1


Section 1 Vector arithmetic
For Exercises 1.1–1.4, let x and y be the vectors x = (1, 2, 3) and y = (4, −5, 6) in R3 . Also, 0
denotes the zero vector, 0 = (0, 0, 0).

1.1. Find x + y, 2x, and 2x − 3y.

1.2. Find $\overrightarrow{yx}$ and $y + \overrightarrow{yx}$.

1.3. If x + y + z = 0, find z.

1.4. If x − 2y + 3z = 0, find z.

1.5. Let x and y be points in Rn .

(a) Show that the midpoint of line segment xy is given by $m = \frac{1}{2} x + \frac{1}{2} y$. (Hint: What do
you add to x to get to the midpoint?)
(b) Find an analogous expression in terms of x and y for the point p that is 2/3 of the way
from x to y.
(c) Let x = (1, 1, 0) and y = (0, 1, 1). Find the point z in R3 such that y is the midpoint of
line segment xz.

Section 2 Linear transformations

2.1. Let T : R3 → R2 be the projection of x1 x2 x3 -space onto the x1 x2 -plane, T (x1 , x2 , x3 ) =


(x1 , x2 ). Draw pictures that show that T satisfies the requirements for being a linear trans-
formation.

2.2. Let T : R2 → R3 be the function defined by T (x1 , x2 ) = (x1 , x2 , 0). Show that T is a linear
transformation.

2.3. Let a1 , a2 , . . . , an be vectors in Rm . Verify that the function T : Rn → Rm given by:

T (x1 , x2 , . . . , xn ) = x1 a1 + x2 a2 + · · · + xn an

is a linear transformation.

Section 3 The matrix of a linear transformation

3.1. Let T : R2 → R2 be the linear transformation such that T (e1 ) = (1, 2) and T (e2 ) = (3, 4).

(a) Find the matrix of T with respect to the standard bases.


(b) Find T (5, 6).
(c) Find a general formula for T (x1 , x2 ).

3.2. Let T : R3 → R3 be the linear transformation such that T (e1 ) = (1, 0, −1), T (e2 ) = (−1, 1, 0),
and T (e3 ) = (0, −1, 1).

(a) Find the matrix of T with respect to the standard bases.


(b) Find T (1, −1, 1).

(c) Find the set of all points x = (x1 , x2 , x3 ) in R3 such that T (x) = 0.

In Exercises 3.3–3.6, find the matrix of the given linear transformation T : R2 → R2 with respect
to the standard bases.

3.3. T is the counterclockwise rotation by π/2 about the origin.


3.4. T is the reflection in the x2 -axis.
3.5. T is the identity function, T (x) = x.
3.6. T is the dilation about the origin by a factor of 2, T (x) = 2x.

3.7. Let ρθ : R2 → R2 be the rotation about the origin counterclockwise by an angle θ. Show that
the matrix of ρθ with respect to the standard bases is:
\[
R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \tag{1.15}
\]
The matrix Rθ is called a rotation matrix.
3.8. Let T : R2 → R2 be the linear transformation whose matrix with respect to the standard
bases is $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. Describe T geometrically.

3.9. Let T : R2 → R2 be the linear transformation whose matrix with respect to the standard
bases is $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$.
(a) Find T (e1 ) and T (e2 ).
(b) Describe the image of T , that is, the set of all y in R2 such that y = T (x) for some x in
R2 .
3.10. Let v1 = (1, 1) and v2 = (−1, 1).
(a) Find scalars c1 , c2 such that c1 v1 + c2 v2 = e1 .
(b) Find scalars c1 , c2 such that c1 v1 + c2 v2 = e2 .
(c) Let T : R2 → R2 be the linear transformation such that T (v1 ) = (−1, −1) and T (v2 ) =
(−2, 2). Find the matrix of T with respect to the standard bases.
(d) Let S : R2 → R2 be the linear transformation such that S(v1 ) = e1 and S(v2 ) = e2 .
Find the matrix of S with respect to the standard bases.
3.11. Let T : R2 → R3 be the linear transformation given by T (x1 , x2 ) = (x1 , x2 , 0). Find the
matrix of T with respect to the standard bases.
3.12. Let T : R3 → R3 be the rotation about the x3 -axis by π/2 counterclockwise as viewed looking
down from the positive x3 -axis. Find the matrix of T with respect to the standard bases.
3.13. Let T : R3 → R3 be the rotation by π about the line x1 = x2 , x3 = 0, in the x1 x2 -plane. Find
the matrix of T with respect to the standard bases.
3.14. Let A and B be m by n matrices. If Ax = Bx for all column vectors x in Rn , show that
A = B.


Section 4 Matrix multiplication


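In Exercises 4.1–4.6, find the indicated matrix products.

4.1. AB and BA, where $A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 4 \\ -1 & 3 \end{bmatrix}$

4.2. AB and BA, where $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

4.3. AB and BA, where $A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$

4.4. AB and BA, where $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$

4.5. AB and BA, where $A = \begin{bmatrix} 2 & -1 & -4 \\ 3 & 2 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 3 \\ -1 & 2 \\ -4 & 1 \end{bmatrix}$

4.6. AB and BA, where $A = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$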

4.7. We have seen that the commutative law AB = BA does not hold in general for matrix
multiplication. In fact, the situation is worse than that: it’s rare for AB and BA even to
both be defined. In that sense, the preceding half dozen exercises are misleading.

(a) Find an example of matrices A and B such that neither AB nor BA is defined.
(b) Find an example where AB is defined but BA is not.

4.8. (a) Let T : Rn → Rm, S : Rm → Rp and R : Rp → Rq be functions. Show that:

((R ◦ S) ◦ T)(x) = (R ◦ (S ◦ T))(x) for all x in Rn.

(b) Show that matrix multiplication is associative, that is, if A is a q by p matrix, B a p by
m matrix, and C an m by n matrix, show that:

(AB)C = A(BC).
Since it does not matter how the terms are grouped, we shall write the product henceforth
simply as ABC, without parentheses. (Hint: See Exercise 3.14.)

In Exercises 4.9–4.10:

• let $R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ be the matrix (1.15) that represents the rotation ρθ : R2 → R2 about
the origin counterclockwise by angle θ, and

• let $S = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ be the matrix (1.6) that represents the reflection r : R2 → R2 in the x1-axis.

4.9. Use matrices to prove that r ◦ ρθ ◦ r = ρ−θ .

4.10. (a) Compute the product A = Rθ SR−θ .


(b) By thinking about the corresponding composition of linear transformations, give a ge-
ometric description of the linear transformation T : R2 → R2 that is represented with
respect to the standard bases by A.

Section 5 The geometry of the dot product

5.1. Let x = (−1, 1, −2) and y = (4, −1, −1).

(a) Find x · y.
(b) Find ‖x‖ and ‖y‖.
(c) Find the angle between x and y.

5.2. Find the unit vector in the direction of x = (2, −1, −2).

5.3. Find all unit vectors in R2 that are orthogonal to x = (1, 2).

5.4. What does the sign of x · y tell you about the angle between x and y?

In Exercises 5.5–5.10, show that the dot product satisfies the given property. The properties
are true for vectors in Rn , though you may assume in your arguments that the vectors are in R2 ,
i.e., x = (x1 , x2 ), y = (y1 , y2 ), and so on. The proofs for Rn in general are similar.

5.5. x · y = y · x

5.6. (cx) · y = c(x · y) for any scalar c

5.7. ‖cx‖ = |c| ‖x‖ for any scalar c

5.8. w · (x + y) = w · x + w · y

5.9. (v + w) · (x + y) = v · x + v · y + w · x + w · y (Hint: Make use of the preceding exercises.)

5.10. (x + y) · (x − y) = ‖x‖² − ‖y‖²

5.11. Recall that a rhombus is a planar quadrilateral whose sides all have the same length. Use the
dot product to show that the diagonals of a rhombus are perpendicular to each other.

5.12. The Pythagorean theorem states that, for a right triangle in the plane, a² + b² = c², where
a and b are the lengths of the legs and c is the length of the hypotenuse. Use vector algebra
and the dot product to show that the theorem remains true for right triangles in Rn . (Hint:
The hypotenuse is a diagonal of a rectangle.)

Section 6 Determinants
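In Exercises 6.1–6.4, find the given determinant.

6.1. $\det \begin{bmatrix} 2 & 5 \\ -3 & 4 \end{bmatrix}$

6.2. $\det \begin{bmatrix} 0 & 5 \\ -3 & 4 \end{bmatrix}$

6.3. $\det \begin{bmatrix} 1 & -2 & 3 \\ -4 & 5 & -6 \\ 7 & -8 & 9 \end{bmatrix}$

6.4. $\det \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix}$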

6.5. Find the area of the parallelogram in R2 determined by x = (4, 0) and y = (1, 3).

6.6. Find the area of the parallelogram in R2 determined by x = (−2, −3) and y = (−3, 2).

6.7. Let $A = \begin{bmatrix} 1 & -2 \\ 2 & -4 \end{bmatrix}$. Use the product rule for determinants to show that there is no 2 by 2
matrix B such that $AB = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.

6.8. Let A be an n by n matrix, and let $A^t$ be its transpose. Show that $\det(A^t A) = (\det A)^2$.

6.9. If (x1, x2, x3) ∈ R3, let:
\[
T(x_1, x_2, x_3) = \det \begin{bmatrix} x_1 & x_2 & x_3 \\ 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}. \tag{1.16}
\]

(a) Expand the determinant to find a formula for T (x1 , x2 , x3 ).


(b) Show that equation (1.16) defines a linear transformation T : R3 → R.
(c) Find the matrix of T with respect to the standard bases.
(d) The matrix you found in part (c) should be a 1 by 3 matrix. Thinking of it as a vector
in R3 , show that it is orthogonal to both (1, 2, 3) and (4, 5, 6), the second and third rows
of the matrix used to define T (x1 , x2 , x3 ).
Part II

Vector-valued functions of one variable

Chapter 2

Paths and curves

This chapter is concerned with curves in Rn . While we may have an intuitive sense of what a
curve is, at least in R2 or R3 , the formal description here is somewhat indirect in that, rather than
requiring a curve to have a defining equation, we describe it by how it is swept out, like the trace
of a skywriter. Thus in addition to studying geometric features of the curve, such as its length,
we also look at quantities related to the skywriter’s motion, such as its velocity and acceleration.
The goal of the chapter is a remarkable result about curves in R3 that describes measurements
that characterize the geometry of a space curve completely. Along the way, we shall gain valuable
experience applying vector methods.
The functions in this chapter take their values in Rn , but they are functions of one variable. As
a result, we treat the material as a continuation of first-year calculus without going back to redefine
or redevelop concepts that are introduced there. At times, this assumed familiarity may lead to a
rather loose treatment of certain basic topics, such as continuity, derivatives, and integrals. When
we study functions of more than one variable, we shall go back and define these concepts carefully,
and what we say then applies retroactively to what we cover here. We hope that any reader who
becomes anxious about a possible lack of rigor will be willing to wait.

2.1 Parametrizations
Let I be an interval of real numbers, typically, I = [a, b], (a, b), or R.
Definition. A continuous function α : I → Rn is called a path. As t varies over I, α(t) traces out
a curve C. More precisely:

C = {x ∈ Rn : x = α(t) for some t in I}.

This is also known as the image of α. We say that α is a parametrization of the curve. We often
refer to the input variable t as time and think of α(t) as the position of a moving object at time
t. See Figure 2.1.
A path is a vector-valued function of one variable. To follow our notational convention, we
should write the value in boldface as α(t), but for the most part we continue to use plainface,
usually reserving boldface for a particular type of vector-valued function that we begin studying in
Chapter 8. For each t in I, α(t) is a point of Rn , so we may write α(t) = (x1 (t), x2 (t), . . . , xn (t)),
where each of the n coordinates is a real number that depends on t, i.e., a real-valued function of
one variable.
Here are some of the standard examples of parametrized curves that we shall refer to frequently.


Figure 2.1: A parametrization α : [a, b] → Rn of a curve C

Example 2.1. Circles in R2: x² + y² = a², where a is the radius of the circle.

The equation of the circle can be rewritten as (x/a)² + (y/a)² = 1. To parametrize the circle, we
use the identity cos² t + sin² t = 1 and let x/a = cos t and y/a = sin t, or x = a cos t and y = a sin t.
These expressions show that t has a geometric interpretation as the angle that (x, y) makes with
the positive x-axis, as shown in Figure 2.2. The substitutions x = a cos t and y = a sin t satisfy
the defining equation x² + y² = a² of the circle, so α : R → R2,

α(t) = (a cos t, a sin t), a constant, a > 0,

parametrizes the circle, traversing it in the counterclockwise direction.

Figure 2.2: A circle of radius a
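To see the parametrization at work, the following sketch samples α(t) = (a cos t, a sin t) at a few times and checks that each sampled point satisfies x² + y² = a². The radius a = 2 and the sample times are arbitrary choices made for this illustration, and the function name `circle` is our own.

```python
import math


def circle(a, t):
    """Point alpha(t) = (a cos t, a sin t) on the circle of radius a."""
    return (a * math.cos(t), a * math.sin(t))


a = 2.0
sample_times = [k * math.pi / 4 for k in range(8)]  # eight equally spaced times
points = [circle(a, t) for t in sample_times]

for t, (x, y) in zip(sample_times, points):
    # Each point should satisfy the defining equation up to rounding error.
    print(round(t, 3), round(x, 3), round(y, 3), math.isclose(x * x + y * y, a * a))

# At t = 0 the point is (a, 0); at t = pi/2 it is approximately (0, a),
# which is consistent with counterclockwise motion.
```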

Example 2.2. Graphs y = f (x) in R2 .


As a concrete example, consider the curve described by y = sin x in R2 , where 0 ≤ x ≤ π. See
Figure 2.3. It consists of points of the form (x, sin x), and it is traced out as x goes from x = 0 to
x = π. Hence x serves as a natural parameter, and α : [0, π] → R2 ,

α(x) = (x, sin x),

is a parametrization of the curve.

Figure 2.3: The graph y = sin x, 0 ≤ x ≤ π



Example 2.3. Helices in R3 .


A helix winds around a circular cylinder, say of radius a. If the axis of the cylinder is the z-axis,
then the x and y coordinates along the helix satisfy the equation of the circle as in Example 2.1,
while the z-coordinate changes at a constant rate. Thus we set x = a cos t, y = a sin t, and z = bt
for some constant b. That is, the helix is parametrized by α : R → R3 , where:

α(t) = (a cos t, a sin t, bt), a, b constant, a > 0.

If b > 0, the helix is said to be “right-handed” and if b < 0 “left-handed.” Each type is shown in
Figure 2.4.

Figure 2.4: Helices: right-handed (at left) and left-handed (at right)

Example 2.4. Lines in Rn .


Given a point a and a nonzero vector v in Rn , the line through a and parallel to v is
parametrized by α : R → Rn , where:
α(t) = a + tv.

See Figure 2.5. If a = (a1, a2, . . . , an) and v = (v1, v2, . . . , vn), then α(t) = (a1, a2, . . . , an) +
t (v1, v2, . . . , vn) = (a1 + tv1, a2 + tv2, . . . , an + tvn), so in terms of coordinates:
\[
\begin{cases}
x_1 = a_1 + t v_1 \\
x_2 = a_2 + t v_2 \\
\quad \vdots \\
x_n = a_n + t v_n.
\end{cases}
\]

If the domain of α is restricted to an interval of finite length, then α parametrizes a finite segment
of the line.

Figure 2.5: As t varies, α(t) = a + tv traces out a line.


Example 2.5. Find a parametrization of the line in R3 that passes through the points a = (1, 2, 3)
and b = (4, 5, 6).
The line passes through the point a = (1, 2, 3) and is parallel to the vector v from a to b, that is, v = b − a = (4, 5, 6) −
(1, 2, 3) = (3, 3, 3). The setup is indicated in Figure 2.6. Therefore one parametrization is:

α(t) = (1, 2, 3) + t (3, 3, 3) = (1 + 3t, 2 + 3t, 3 + 3t).

Figure 2.6: Parametrizing the line through two given points a and b

Any nonzero scalar multiple of v = (3, 3, 3) is also parallel to the line and could be used as
part of a parametrization as well. For instance, w = (1, 1, 1) is such a multiple, and β(t) =
(1, 2, 3) + t (1, 1, 1) = (1 + t, 2 + t, 3 + t) is another parametrization of the same line.
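
As a quick sanity check on Example 2.5, the following Python sketch builds both parametrizations and confirms that they trace the same line at different rates. The helper name `line` is hypothetical, introduced here only for illustration.

```python
def line(a, v):
    """Return the path t -> a + t*v for a point a and a nonzero direction v."""
    def path(t):
        return tuple(ai + t * vi for ai, vi in zip(a, v))
    return path

a, b = (1, 2, 3), (4, 5, 6)
v = tuple(bi - ai for ai, bi in zip(a, b))  # v = b - a = (3, 3, 3)

alpha = line(a, v)          # alpha(t) = (1 + 3t, 2 + 3t, 3 + 3t)
beta = line(a, (1, 1, 1))   # beta(t)  = (1 + t, 2 + t, 3 + t)

print(alpha(0), alpha(1))   # the points a and b
print(beta(3) == alpha(1))  # beta reaches b at t = 3 rather than t = 1
```

In general β(3t) = α(t), so the two paths cover the same points, with α moving three times as fast.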

2.2 Velocity, acceleration, speed, arclength


We now introduce some basic aspects of calculus for paths and curves. We discuss derivatives of
paths in this section and integrals of real-valued functions over paths in the next.

Definition. Given a path α : I → Rⁿ, the derivative of α is defined by:

\[
\begin{aligned}
\alpha'(t) &= \lim_{h \to 0} \frac{1}{h}\bigl(\alpha(t+h) - \alpha(t)\bigr) \\
&= \lim_{h \to 0} \frac{1}{h}\Bigl(\bigl(x_1(t+h), x_2(t+h), \dots, x_n(t+h)\bigr) - \bigl(x_1(t), x_2(t), \dots, x_n(t)\bigr)\Bigr) \\
&= \lim_{h \to 0} \Bigl(\frac{x_1(t+h) - x_1(t)}{h}, \frac{x_2(t+h) - x_2(t)}{h}, \dots, \frac{x_n(t+h) - x_n(t)}{h}\Bigr) \\
&= \bigl(x_1'(t), x_2'(t), \dots, x_n'(t)\bigr),
\end{aligned}
\]

provided the limit exists. The derivative is also called the velocity 𝐯(t) of α.

Likewise, α″(t) = 𝐯′(t) is called the acceleration, denoted 𝐚(t).

Example 2.6. For the helix parametrized by α(t) = (cos t, sin t, t):

• velocity: 𝐯(t) = α′(t) = (− sin t, cos t, 1),

• acceleration: 𝐚(t) = 𝐯′(t) = (− cos t, − sin t, 0).

Now, choose some time t₀, and, for any time t, let s(t) be the distance traced out along the
path from α(t₀) to α(t). Then s(t) is called the arclength function, and we think of the rate of
change of distance ds/dt as the speed. Unfortunately, these terms should be defined more carefully,
and, to get everything in the right logical order, it seems best to define the speed first.
To define what we expect ds/dt to be, we make the intuitive approximation that the distance Δs
along the curve between two nearby points α(t) and α(t + Δt) is approximately the straight line
distance between them, as indicated in Figure 2.7. We assume that Δt is positive. If Δt is small:

“distance along curve” ≈ straight line distance, or

Δs ≈ ‖α(t + Δt) − α(t)‖.

Thus Δs/Δt ≈ ‖(α(t + Δt) − α(t))/Δt‖. In the limit as Δt goes to 0, this approaches ‖α′(t)‖, the magnitude of
the velocity. We take this as the definition of speed.

Figure 2.7: Δs ≈ ‖α(t + Δt) − α(t)‖


Definition. If α : I → Rⁿ is a differentiable path, then its speed, denoted v(t), is defined to be:

v(t) = ‖𝐯(t)‖.

Note that the speed v(t) is a scalar quantity, whereas the velocity 𝐯(t) is a vector.
Now, to define the length of the path from t = a to t = b, we integrate the speed.

Definition. The arclength from t = a to t = b is defined to be:

\[
\int_a^b v(t)\,dt = \int_a^b \|\mathbf{v}(t)\|\,dt.
\]

The arclength function s(t) we considered above is then given by integrating the speed from t₀ to t,
so, by the fundamental theorem of calculus, ds/dt = v(t). Thus the definitions realize the intuitive
relations with which we began.
Example 2.7. For the helix parametrized by α(t) = (cos t, sin t, t), find:

(a) the speed and


(b) the arclength from t = 0 to t = 4π, i.e., two full twists around the helix.

First, as just shown in Example 2.6, the velocity is 𝐯(t) = (− sin t, cos t, 1). Thus:

(a) speed:
\[
v(t) = \|\mathbf{v}(t)\| = \sqrt{\sin^2 t + \cos^2 t + 1} = \sqrt{2}.
\]

(b) arclength:
\[
\int_0^{4\pi} v(t)\,dt = \int_0^{4\pi} \sqrt{2}\,dt = \sqrt{2}\,t\,\Big|_0^{4\pi} = 4\pi\sqrt{2}.
\]
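
The approximation Δs ≈ ‖α(t + Δt) − α(t)‖ that motivated the definition can be tried out directly. The sketch below, in which the function name `polygonal_length` and the number of pieces are illustrative choices and not part of the text, adds up straight-line distances between nearby points on the helix and compares the total with 4π√2.

```python
import math

def helix(t):
    return (math.cos(t), math.sin(t), t)

def polygonal_length(path, a, b, k):
    """Sum of straight-line distances between k + 1 equally spaced points on the path."""
    total = 0.0
    prev = path(a)
    for j in range(1, k + 1):
        cur = path(a + (b - a) * j / k)
        total += math.dist(prev, cur)   # ||alpha(t + dt) - alpha(t)||
        prev = cur
    return total

approx = polygonal_length(helix, 0.0, 4 * math.pi, 10_000)
exact = 4 * math.pi * math.sqrt(2)
print(approx, exact)   # the two values should agree to several decimal places
```

Each chord is slightly shorter than the arc it spans, so the polygonal sum approaches the arclength from below as the number of pieces grows.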

Summary of definitions
velocity        𝐯(t) = α′(t)
acceleration    𝐚(t) = α″(t)
speed           v(t) = ‖𝐯(t)‖ = ds/dt
arclength       ∫_a^b v(t) dt


2.3 Integrals with respect to arclength


Let C be a curve in Rn , and let f : C → R be a real-valued function defined on C. To formulate
the definition of the integral of f over C, we adapt the usual approach of first-year calculus for
integrating a function over an interval, namely:
• Chop up the domain C into a sequence of short pieces of lengths Δs₁, Δs₂, . . . , Δsₖ.
• Choose a point x₁, x₂, . . . , xₖ in each piece.
• Consider sums of the form Σⱼ f(xⱼ) Δsⱼ.
• Take the limit of the sums as Δsⱼ goes to 0.
See Figure 2.8.

Figure 2.8: Defining the integral with respect to arclength

We can rewrite the sums in a way that leads to a normal first-year calculus integral over an
interval by using a parametrization α : [a, b] → Rⁿ of C. We assume that [a, b] can be subdivided
into a sequence of consecutive subintervals of lengths Δt₁, Δt₂, . . . , Δtₖ such that the subinterval
of length Δtⱼ gets mapped to the curve segment of length Δsⱼ for each j = 1, 2, . . . , k. Then the
sum that models the integral can be written as:

\[
\sum_j f(x_j)\,\Delta s_j = \sum_j f(\alpha(c_j))\,\frac{\Delta s_j}{\Delta t_j}\,\Delta t_j,
\]

where cⱼ is a value of the parameter for which α(cⱼ) = xⱼ. In the limit as Δtⱼ goes to 0, these
sums approach the integral ∫_a^b f(α(t)) (ds/dt) dt. In this last expression, ds/dt is the speed, also denoted
v(t). This intuition is formalized in the following definition.
Definition. Let α : [a, b] → Rⁿ be a differentiable parametrization of a curve C, and let v(t) denote
its speed. If f : C → R is a continuous real-valued function, the integral of f with respect to
arclength, denoted ∫_α f ds, is defined by:

\[
\int_\alpha f\,ds = \int_a^b f(\alpha(t))\,v(t)\,dt.
\]

We also denote this integral by ∫_C f ds, though this raises some issues that are addressed later when
we have more practice with parametrizations and their effect on calculations. (See page 208.)
R
Example 2.8. If f = 1 (constant function), then the definition of the integral says ∫_C 1 ds =
∫_a^b v(t) dt. On the other hand, this is also precisely the definition of arclength. Hence:

\[
\int_C 1\,ds = \text{arclength of } C.
\]

Example 2.9. Consider again the portion of the helix parametrized by α(t) = (cos t, sin t, t),
0 ≤ t ≤ 4π. Find ∫_C (x + y + z) ds.
Here, f(x, y, z) = x + y + z, so f(α(t)) = f(cos t, sin t, t) = cos t + sin t + t. In other words, we
read off the components of the parametrization to substitute x = cos t, y = sin t, and z = t into
the formula for f. From Example 2.7, v(t) = ‖𝐯(t)‖ = √2. Thus:

\[
\begin{aligned}
\int_C (x + y + z)\,ds &= \int_0^{4\pi} (\cos t + \sin t + t)\,\sqrt{2}\,dt \\
&= \sqrt{2}\,\Bigl(\sin t - \cos t + \tfrac{1}{2}t^2\Bigr)\Big|_0^{4\pi} \\
&= \sqrt{2}\,\bigl((0 - 1 + 8\pi^2) - (0 - 1 + 0)\bigr) \\
&= 8\sqrt{2}\,\pi^2.
\end{aligned}
\]
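
The sums Σⱼ f(α(cⱼ)) v(cⱼ) Δtⱼ behind the definition can also be evaluated numerically. The following is a minimal sketch, assuming a midpoint choice of cⱼ and equal subintervals; the name `integrate_ds` and the default number of subintervals are illustrative and not from the text.

```python
import math

def integrate_ds(f, path, speed, a, b, k=20_000):
    """Approximate the integral of f(alpha(t)) v(t) dt over [a, b] by a midpoint sum."""
    dt = (b - a) / k
    total = 0.0
    for j in range(k):
        c = a + (j + 0.5) * dt                  # sample parameter c_j in the j-th subinterval
        total += f(*path(c)) * speed(c) * dt
    return total

helix = lambda t: (math.cos(t), math.sin(t), t)
speed = lambda t: math.sqrt(2.0)                # from Example 2.7

approx = integrate_ds(lambda x, y, z: x + y + z, helix, speed, 0.0, 4 * math.pi)
length = integrate_ds(lambda x, y, z: 1.0, helix, speed, 0.0, 4 * math.pi)
print(approx, 8 * math.sqrt(2) * math.pi ** 2)  # should agree closely (Example 2.9)
print(length, 4 * math.pi * math.sqrt(2))       # f = 1 recovers the arclength (Example 2.8)
```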

2.4 The geometry of curves: tangent and normal vectors


With the basic notions involving paths and curves in Rn now in place, we direct most of the
remainder of the chapter towards the special case of curves in R3 . It turns out that we are already
in a position to derive a fairly substantial result about the geometry of such curves: we shall
describe scalar measurements that determine whether two curves in R3 are congruent in the same
spirit that one learns in high school to test whether triangles in the plane are congruent. In a sense,
this gives a classification of curves in three-dimensional space.
The main idea is to attach a three-dimensional coordinate system that moves along with the
curve. By measuring how fast the coordinate directions are changing, one recovers the geometry of
the curve. In order for this to work, the coordinate directions need to have some intrinsic connection
with the geometry of the curve.
We begin with the most natural direction related to the motion along the curve, namely, the
tangent direction. Let α : I → R3 be a differentiable parametrization of a curve, and consider the
difference α(t + h) − α(t) between two nearby points on the curve. As h goes to 0, this vector
approaches the tangent direction at α(t), and hence so does the scalar multiple (1/h)(α(t + h) − α(t)).
See Figure 2.9. In other words, the derivative α′(t) = lim_{h→0} (1/h)(α(t + h) − α(t)) points in the
direction tangent to the curve.

Figure 2.9: As h → 0, α(t + h) − α(t) approaches the tangent direction.

Definition. If α′(t) ≠ 0, define:

\[
\mathbf{T}(t) = \frac{1}{\|\alpha'(t)\|}\,\alpha'(t).
\]

This is a unit vector in the tangent direction at α(t), called the unit tangent vector (Figure
2.10).


Figure 2.10: The unit tangent vector T(t), translated to start at α(t)

The remaining two curve-related coordinate directions are orthogonal to this first one. In R3 ,
there is a whole plane of orthogonal possibilities, but one of the possibilities turns out to be naturally
distinguished. Identifying it requires some preliminary work.

Proposition 2.10 (Product rule for the dot product). If α and β are differentiable paths in Rⁿ,
then:
(α · β)′ = α′ · β + α · β′.

Proof. Note that α · β is real-valued, so by definition of the derivative in first-year calculus:

\[
(\alpha \cdot \beta)'(t) = \lim_{h \to 0} \frac{1}{h}\bigl(\alpha(t+h) \cdot \beta(t+h) - \alpha(t) \cdot \beta(t)\bigr).
\]

The expression within the limit can be put into more manageable form by using the trick of “adding
zero,” suitably disguised, in the middle:

\[
\begin{aligned}
\frac{1}{h}\bigl(\alpha(t+h) \cdot \beta(t+h) - \alpha(t) \cdot \beta(t)\bigr)
&= \frac{1}{h}\bigl(\alpha(t+h) \cdot \beta(t+h) - \alpha(t) \cdot \beta(t+h) + \alpha(t) \cdot \beta(t+h) - \alpha(t) \cdot \beta(t)\bigr) \\
&= \frac{\alpha(t+h) - \alpha(t)}{h} \cdot \beta(t+h) + \alpha(t) \cdot \frac{\beta(t+h) - \beta(t)}{h}.
\end{aligned}
\]

Now, taking the limit as h goes to 0, and using the fact that β(t + h) approaches β(t) because a
differentiable path is continuous, gives the desired result: (α · β)′(t) = α′(t) · β(t) + α(t) · β′(t).

Here are two immediate consequences.

Corollary 2.11. (‖α‖²)′ = 2α · α′

Proof. By Proposition 1.10 of Chapter 1, ‖α‖² = α · α, so by the product rule (‖α‖²)′ = (α · α)′ =
α′ · α + α · α′ = 2α · α′.

Corollary 2.12. ‖α‖ is constant if and only if α · α′ = 0.

In words, a vector-valued function of one variable has constant magnitude if and only if it is
orthogonal to its derivative at all times.

Proof. ‖α‖ is constant if and only if ‖α‖² is constant. This in turn is true if and only if (‖α‖²)′ = 0.
By the preceding corollary, this is equivalent to saying that α · α′ = 0.

Returning to the geometry of curves, the unit tangent vector T(t) satisfies ‖T(t)‖ = 1, which
is constant. Therefore T and T′ are orthogonal. The unit vector in the direction of T′ will be
orthogonal to T, too. This gives us our second coordinate direction.

Definition. If T′(t) ≠ 0, then:

\[
\mathbf{N}(t) = \frac{1}{\|\mathbf{T}'(t)\|}\,\mathbf{T}'(t)
\]

is called the principal normal vector. See Figure 2.11.

Figure 2.11: The unit tangent and principal normal vectors, T(t) and N(t)

Example 2.13. For the helix parametrized by α(t) = (cos t, sin t, t), find the unit tangent T(t)
and the principal normal N(t).
As calculated in Examples 2.6 and 2.7, α′(t) = (− sin t, cos t, 1) and ‖α′(t)‖ = v(t) = √2. Thus:

\[
\mathbf{T}(t) = \frac{1}{\|\alpha'(t)\|}\,\alpha'(t) = \frac{1}{\sqrt{2}}\,(-\sin t, \cos t, 1).
\]

Continuing with this, T′(t) = (1/√2)(− cos t, − sin t, 0) and ‖T′(t)‖ = (1/√2) √(cos² t + sin² t + 0²) = 1/√2.
Hence:

\[
\begin{aligned}
\mathbf{N}(t) &= \frac{1}{\|\mathbf{T}'(t)\|}\,\mathbf{T}'(t) \\
&= \frac{1}{1/\sqrt{2}} \cdot \frac{1}{\sqrt{2}}\,(-\cos t, -\sin t, 0) \\
&= (-\cos t, -\sin t, 0).
\end{aligned}
\]

As a check, we verify that the unit tangent and principal normal are orthogonal, as predicted
by the theory:

\[
\begin{aligned}
\mathbf{T}(t) \cdot \mathbf{N}(t) &= \frac{1}{\sqrt{2}}\bigl((-\sin t)(-\cos t) + (\cos t)(-\sin t) + 1 \cdot 0\bigr) \\
&= \frac{1}{\sqrt{2}}\bigl(\sin t \cos t - \cos t \sin t + 0\bigr) = 0.
\end{aligned}
\]
The remaining step is to find a third and final coordinate direction, that is, a unit vector in
R3 orthogonal to both T(t) and N(t). There are two choices for such a vector, and, in the next
section, we describe a systematic way to single out one of them, denoted B(t). Once this is done,
we will have a basis of orthogonal unit vectors (T(t), N(t), B(t)) at each point of the curve that is
naturally adapted to the motion along the curve.

2.5 The cross product


We remain focused on R3 . Recall that the standard basis vectors are denoted i = (1, 0, 0), j =
(0, 1, 0), k = (0, 0, 1) and that every vector x = (x1 , x2 , x3 ) in R3 can be written in terms of them:
x = (x1 , x2 , x3 ) = (x1 , 0, 0) + (0, x2 , 0) + (0, 0, x3 ) = x1 i + x2 j + x3 k.


Our goal for the moment is this: given vectors v and w in R3 , find a vector, written v × w,
that is orthogonal to both v and w. We concentrate initially not so much on what v × w actually
is but rather on requiring it to have a certain critical property. Namely, whatever v × w is, we
insist that it satisfy the following.
 
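Key Property. For all x in R³,

\[
\mathbf{x} \cdot (\mathbf{v} \times \mathbf{w}) = \det \begin{bmatrix} x_1 & x_2 & x_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix},
\]

that is:

\[
\mathbf{x} \cdot (\mathbf{v} \times \mathbf{w}) = \det \begin{bmatrix} \mathbf{x} & \mathbf{v} & \mathbf{w} \end{bmatrix} \tag{P}
\]

where [x v w] denotes the 3 by 3 matrix whose rows are x, v, w.
For instance, i × j must satisfy:

\[
\begin{aligned}
\mathbf{x} \cdot (\mathbf{i} \times \mathbf{j}) &= \det \begin{bmatrix} \mathbf{x} & \mathbf{i} & \mathbf{j} \end{bmatrix} = \det \begin{bmatrix} x_1 & x_2 & x_3 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \\
&= x_1 \cdot (0 - 0) - x_2 \cdot (0 - 0) + x_3 \cdot (1 - 0) \\
&= x_3 \quad \text{for all } \mathbf{x} = (x_1, x_2, x_3) \text{ in } \mathbb{R}^3.
\end{aligned}
\]

But x · k = (x₁, x₂, x₃) · (0, 0, 1) = x₃ for all x as well, so k satisfies the key property (P) required
of i × j. We would expect that i × j = k.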
Assuming for the moment that v × w can be defined in general to satisfy (P), our main goal
follows immediately.
Proposition 2.14. v × w is orthogonal to v and w (Figure 2.12).
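Proof. We take dot products and use the key property:

\[
\mathbf{v} \cdot (\mathbf{v} \times \mathbf{w}) = \det \begin{bmatrix} \mathbf{v} & \mathbf{v} & \mathbf{w} \end{bmatrix} = 0 \quad (\text{two equal rows} \Rightarrow \det = 0).
\]

Thus v × w is orthogonal to v. For the same reason, w · (v × w) = det [w v w] = 0.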

Figure 2.12: The cross product v × w is orthogonal to v and w.


It remains to define v × w in a way that satisfies (P). It turns out that there is no choice about
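how to do this. Set v × w = (c₁, c₂, c₃). The components c₁, c₂, c₃ are determined by (P). For
instance:

\[
\begin{aligned}
c_1 &= (1, 0, 0) \cdot (c_1, c_2, c_3) = \mathbf{i} \cdot (\mathbf{v} \times \mathbf{w}) = \det \begin{bmatrix} \mathbf{i} & \mathbf{v} & \mathbf{w} \end{bmatrix} \\
&= \det \begin{bmatrix} 1 & 0 & 0 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix} = \det \begin{bmatrix} v_2 & v_3 \\ w_2 & w_3 \end{bmatrix}.
\end{aligned}
\]

Similarly, c₂ and c₃ are determined by looking at j · (v × w) and k · (v × w), respectively.

Definition. Given vectors v and w in R³, their cross product is defined by:

\[
\mathbf{v} \times \mathbf{w} = \left( \det \begin{bmatrix} v_2 & v_3 \\ w_2 & w_3 \end{bmatrix},\; -\det \begin{bmatrix} v_1 & v_3 \\ w_1 & w_3 \end{bmatrix},\; \det \begin{bmatrix} v_1 & v_2 \\ w_1 & w_2 \end{bmatrix} \right).
\]

One can check that this is the same as:

\[
\mathbf{v} \times \mathbf{w} = \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix}. \tag{$\times$}
\]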

Working backwards, it is not hard to verify that the formula given by (×) does indeed satisfy
(P) (Exercise 5.8), so that we have actually accomplished something.
The odd looking determinant in (×) is somewhat illegitimate in that the entries in the top row
are vectors, not scalars. It is meant to be calculated in the usual way by expansion along the top
row. To understand how it works, perhaps it is best to do some examples.
Example 2.15. First, we reproduce our result for i × j, putting i = (1, 0, 0) and j = (0, 1, 0) in the
second and third rows of (×) and expanding along the first row:
 
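\[
\mathbf{i} \times \mathbf{j} = \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \mathbf{i} \cdot (0 - 0) - \mathbf{j} \cdot (0 - 0) + \mathbf{k} \cdot (1 - 0) = \mathbf{k},
\]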
as expected.
Similarly, j × k = i and k × i = j.
Example 2.16. If v = (1, 2, 3) and w = (4, 5, 6), then:
 
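\[
\begin{aligned}
\mathbf{v} \times \mathbf{w} &= \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \\
&= \mathbf{i} \cdot (12 - 15) - \mathbf{j} \cdot (6 - 12) + \mathbf{k} \cdot (5 - 8) \\
&= -3\mathbf{i} + 6\mathbf{j} - 3\mathbf{k} \\
&= (-3, 6, -3).
\end{aligned}
\]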
As a check against possible computational error, one can go back and confirm that the result is
orthogonal to both v and w.
Example 2.17. With the same v and w as in the previous example, let u = (1, −2, 3). Then:
u · (v × w) = (1, −2, 3) · (−3, 6, −3) = −3 − 12 − 9 = −24. (2.1)
Suppose that we use the same three factors but in a permuted order, say w · (v × u). Taking
advantage of the result of (2.1), what can you say about the value of this product without calculating
the individual dot and cross products involved?
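We use (P) and properties of the determinant:

\[
\begin{aligned}
\mathbf{w} \cdot (\mathbf{v} \times \mathbf{u}) &= \det \begin{bmatrix} \mathbf{w} & \mathbf{v} & \mathbf{u} \end{bmatrix} && \text{(property (P))} \\
&= -\det \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{w} \end{bmatrix} && \text{(row switch)} \\
&= -\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) && \text{(property (P))} \\
&= -(-24) && \text{(by (2.1))} \\
&= 24.
\end{aligned}
\]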
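
The component formula in the definition translates directly into code. The sketch below (the function names `dot`, `cross`, and `det3` are illustrative) recomputes Examples 2.15 through 2.17 and checks the key property (P) on those particular vectors.

```python
def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def cross(v, w):
    """Cross product in R^3, using the three 2 by 2 determinants from the definition."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (v2 * w3 - v3 * w2, -(v1 * w3 - v3 * w1), v1 * w2 - v2 * w1)

def det3(x, v, w):
    """Determinant of the 3 by 3 matrix with rows x, v, w, expanded along the first row."""
    x1, x2, x3 = x
    v1, v2, v3 = v
    w1, w2, w3 = w
    return x1 * (v2 * w3 - v3 * w2) - x2 * (v1 * w3 - v3 * w1) + x3 * (v1 * w2 - v2 * w1)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
u, v, w = (1, -2, 3), (1, 2, 3), (4, 5, 6)

print(cross(i, j) == k)                     # True (Example 2.15)
print(cross(v, w))                          # (-3, 6, -3) (Example 2.16)
print(dot(u, cross(v, w)), det3(u, v, w))   # -24 -24 (equation (2.1) and property (P))
print(dot(w, cross(v, u)), det3(w, v, u))   # 24 24 (Example 2.17)
```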


The basic properties of the cross product are summarized below.


1. w × v = −v × w
(Justification. Interchanging rows in (×) changes the sign of the determinant.)
2. v × v = 0
(Justification. Equal rows in (×) implies the determinant is zero.)
3. (The length of the cross product.)

‖v × w‖ = ‖v‖ ‖w‖ sin θ, where θ is the angle between v and w     (2.2)

(Justification. This appears as Exercise 5.9. It uses v · w = ‖v‖ ‖w‖ cos θ.)
The length formula (2.2) for v × w has some useful consequences:

(a) v × w = 0 if and only if v = 0, w = 0, or θ = 0 or π. This follows immediately from (2.2).
In other words, v × w = 0 if and only if one of v or w is a scalar multiple of the other.

(b) Consider the parallelogram determined by v and w. We can find its area by thinking of v as
base and dropping a perpendicular from w to get the height, as in the case of Figure 1.10 in
Chapter 1. See Figure 2.13.

Area of parallelogram = (base)(height)
= ‖v‖ (‖w‖ sin θ)
= ‖v × w‖.

Figure 2.13: The parallelogram determined by v and w

In other words:

‖v × w‖ is the area of the parallelogram determined by v and w.

(c) We can now describe the geometric interpretation of 3 by 3 determinants as volume that
was mentioned in Chapter 1. Let u, v, and w be points in R³ such that 0, u, v, and w
are not coplanar. This determines a “parallelepiped,” which is the 3-dimensional analogue of
a parallelogram in the plane, as shown in Figure 2.14. It consists of all points of the form
au + bv + cw, where a, b, c are scalars such that 0 ≤ a ≤ 1, 0 ≤ b ≤ 1, and 0 ≤ c ≤ 1. Then:

|det [u v w]| is the volume of the parallelepiped determined by u, v, and w.


Figure 2.14: The parallelepiped determined by u, v, w in R3

This follows from property (P), which we can use to reverse course and take what we’ve
learned about dot and cross products to tell us about determinants. We leave the details
for the exercises (Exercise 5.13 to be precise), though the relevant ingredients are labeled in
Figure 2.14. In the figure, ϕ denotes the angle between u and v × w.

4. (The direction of the cross product.)


We have seen that the cross product v × w is orthogonal to v and w, but this does not pin down
its direction completely. There are two possible orthogonal directions, each opposite the other. We
present a criterion that distinguishes the possibilities in terms of v and w.

Definition. Let (v₁, v₂, . . . , vₙ) be a basis of Rⁿ, where the vectors have been listed in a particular
order. We say that (v₁, v₂, . . . , vₙ) is right-handed if det [v₁ v₂ . . . vₙ] > 0 and left-handed
if det [v₁ v₂ . . . vₙ] < 0. (As usual, [v₁ v₂ . . . vₙ] is the n by n matrix whose rows are
v₁, v₂, . . . , vₙ.)

Example 2.18. Determine the orientation of the standard basis (e₁, e₂, . . . , eₙ).

\[
\det \begin{bmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \cdots & \mathbf{e}_n \end{bmatrix} = \det \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = 1.
\]

This is positive, so (e₁, e₂, . . . , eₙ) is right-handed. By comparison, (e₂, e₁, . . . , eₙ) is left-handed
(switching two rows flips the sign of the determinant). The orientation of a basis depends on the
order in which the basis vectors are written.

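In R³, given v and w, what is the orientation of (v, w, v × w)? Note that:

\[
\begin{aligned}
\det \begin{bmatrix} \mathbf{v} & \mathbf{w} & \mathbf{v} \times \mathbf{w} \end{bmatrix} &= +\det \begin{bmatrix} \mathbf{v} \times \mathbf{w} & \mathbf{v} & \mathbf{w} \end{bmatrix} && \text{(two row switches)} \\
&= (\mathbf{v} \times \mathbf{w}) \cdot (\mathbf{v} \times \mathbf{w}) && \text{(property (P) with } \mathbf{x} = \mathbf{v} \times \mathbf{w}) \\
&= \|\mathbf{v} \times \mathbf{w}\|^2.
\end{aligned} \tag{2.3}
\]

As shown earlier, v × w ≠ 0 provided that v and w are not scalar multiples of one another. If
this is the case, then equation (2.3) implies that det [v w v × w] is positive. Thus: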

Proposition 2.19. If v and w are vectors in R3 , neither a scalar multiple of the other, then the
triple (v, w, v × w) is right-handed.
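
Equation (2.3) is easy to spot-check on a specific pair of vectors. The sketch below reuses the vectors of Example 2.16; the helper names `cross` and `det3` are illustrative, and a single numerical instance is of course not a proof.

```python
def cross(v, w):
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (v2 * w3 - v3 * w2, -(v1 * w3 - v3 * w1), v1 * w2 - v2 * w1)

def det3(x, v, w):
    """Determinant of the matrix with rows x, v, w."""
    x1, x2, x3 = x
    v1, v2, v3 = v
    w1, w2, w3 = w
    return x1 * (v2 * w3 - v3 * w2) - x2 * (v1 * w3 - v3 * w1) + x3 * (v1 * w2 - v2 * w1)

v, w = (1, 2, 3), (4, 5, 6)
n = cross(v, w)                  # (-3, 6, -3)

print(det3(v, w, n))             # 54, positive, so (v, w, v x w) is right-handed
print(sum(c * c for c in n))     # 54, the squared length of v x w, matching (2.3)
print(det3(w, v, n))             # -54: listing v and w in the other order is left-handed
```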

In R³, the right-handed orientation gets its name from a rule of thumb known as the “right-hand
rule.” This is based on a convention regarding the standard basis (i, j, k), namely, if you rotate the
fingers of your right hand from i to j, your thumb points in the direction of k. This is not so
much a fact as it is an agreement: whenever we draw R3 , we agree to orient the positive x, y, and
z-axes so that the right-hand rule for (i, j, k) is satisfied, as in Figure 2.15. As we continue with
further material, figures in R3 become increasingly prominent, so it is probably good to state this
convention explicitly.
More generally, given a basis (v, w, z) of R3 , the plane P spanned by v and w separates R3
into two half-spaces, or “sides.” Once we adopt the right-hand rule for (i, j, k), it turns out that
(v, w, z) is right-handed in the sense of the definition if and only if, when you rotate the fingers
of your right hand from v to w, your thumb lies on the same side of P as z. In particular, by
Proposition 2.19, your thumb lies on the same side as v × w. In this way, your right thumb gives
the direction of v × w.

Figure 2.15: The circle of life: i × j = k, j × k = i, k × i = j.

Proving the connection between the sign of a determinant and the right-hand rule would be
too great a digression. One approach uses the continuity of the determinant and the geometry of
rotations to show that, if the rule works for (i, j, k), then it works in general. In any case, the
right-hand rule is sometimes a convenient way to find cross products geometrically. For instance,
in addition to i × j = k, it allows us to see that j × k = i, not −i, and that k × i = j, as pictured
in Figure 2.15.

2.6 The geometry of space curves: Frenet vectors


We take up the following question: How can one tell when two curves in R3 are congruent, that is,
either curve can be translated, rotated and/or reflected so that it fits exactly on top of the other?
As mentioned earlier, the main idea is to construct a basis of orthogonal unit vectors that moves
along the curve and then to study how the basis vectors change during the motion.
So let C be a curve in R3 , parametrized by a differentiable path α : I → R3 , where I is an
interval in R. In Section 2.4, we defined two of the vectors in our moving basis:

• the unit tangent vector T(t) = (1/‖α′(t)‖) α′(t) = (1/v(t)) 𝐯(t) and
• the principal normal vector N(t) = (1/‖T′(t)‖) T′(t).

These definitions make sense only if α′(t) ≠ 0 and T′(t) ≠ 0, so we assume that this is the case
from now on. Both T(t) and N(t) are constructed to be unit vectors, and we showed before that
T(t) · N(t) = 0. At this point, the third basis vector is easily defined.

Definition. The cross product B(t) = T(t) × N(t) is called the binormal vector.

As a cross product, B(t) is orthogonal to T(t) and N(t). Moreover, its length is ‖B(t)‖ =
‖T(t)‖ ‖N(t)‖ sin θ = 1 · 1 · sin(π/2) = 1. Thus (T(t), N(t), B(t)) is a collection of orthogonal unit
vectors. The vectors are called the Frenet vectors of α and are illustrated in Figure 2.16.

Figure 2.16: The Frenet vectors T(t), N(t), B(t)

Example 2.20 (The Frenet vectors of a helix). We shall use the helix with parametrization

α(t) = (a cos t, a sin t, bt)

as a running example to illustrate new concepts as they arise. Here, a and b are constant, and
a > 0. Recall that the helix is called right-handed if b > 0 and left-handed if b < 0. If b = 0, the
curve collapses to a circle of radius a in the xy-plane.
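To find the Frenet vectors T(t), N(t), and B(t) of the helix, we simply and carefully follow the
definitions.
Unit tangent: α′(t) = (−a sin t, a cos t, b), so ‖α′(t)‖ = √(a² sin² t + a² cos² t + b²) = √(a² + b²). Thus:

\[
\mathbf{T}(t) = \frac{1}{\|\alpha'(t)\|}\,\alpha'(t) = \frac{1}{\sqrt{a^2 + b^2}}\,(-a \sin t, a \cos t, b).
\]

Principal normal: From the last calculation, T′(t) = (1/√(a² + b²)) (−a cos t, −a sin t, 0), so ‖T′(t)‖ =
(1/√(a² + b²)) √(a² cos² t + a² sin² t + 0²) = a/√(a² + b²). Hence:

\[
\begin{aligned}
\mathbf{N}(t) = \frac{1}{\|\mathbf{T}'(t)\|}\,\mathbf{T}'(t) &= \frac{1}{a/\sqrt{a^2 + b^2}} \cdot \frac{1}{\sqrt{a^2 + b^2}}\,(-a \cos t, -a \sin t, 0) \\
&= \frac{1}{a}\,(-a \cos t, -a \sin t, 0) \\
&= (-\cos t, -\sin t, 0).
\end{aligned}
\]

Binormal: Lastly:

\[
\begin{aligned}
\mathbf{B}(t) = \mathbf{T}(t) \times \mathbf{N}(t) &= \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -\frac{a \sin t}{\sqrt{a^2 + b^2}} & \frac{a \cos t}{\sqrt{a^2 + b^2}} & \frac{b}{\sqrt{a^2 + b^2}} \\ -\cos t & -\sin t & 0 \end{bmatrix} \\
&= \mathbf{i} \cdot \Bigl(\frac{b \sin t}{\sqrt{a^2 + b^2}}\Bigr) - \mathbf{j} \cdot \Bigl(\frac{b \cos t}{\sqrt{a^2 + b^2}}\Bigr) + \mathbf{k} \cdot \Bigl(\frac{a \sin^2 t}{\sqrt{a^2 + b^2}} + \frac{a \cos^2 t}{\sqrt{a^2 + b^2}}\Bigr) \\
&= \frac{1}{\sqrt{a^2 + b^2}}\,(b \sin t, -b \cos t, a).
\end{aligned}
\]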


2.7 Curvature and torsion


We use the Frenet vectors (T, N, B) to make two geometric measurements:
• curvature, the rate of turning, and
• torsion, the rate of wobbling.
First, curvature measures how fast the tangent direction is changing, which is represented
by ‖T′(t)‖. A given curve can be traced out in many different ways, however, so, if we want
curvature to reflect purely on the geometry of the curve, we must take into account the effect of
the parametrization. This is an important point, and it is an issue that we address more carefully
later when we discuss the effect of parametrization on vector integrals over curves. (See Exercise
1.13 in Chapter 9.) For the time being, let’s agree informally that we can normalize for the effect
of the parametrization by dividing by the speed. For instance, if you traverse the same curve a
second time, twice as fast as before, the unit tangent will change twice as fast, too. Dividing ‖T′(t)‖
by the speed maintains a constant ratio.
Definition. The curvature is defined by:

\[
\kappa(t) = \frac{\|\mathbf{T}'(t)\|}{v(t)},
\]

where v(t) is the speed.¹ Note that, by definition, curvature is a nonnegative quantity.
Example 2.21 (The curvature of a helix). For the helix α(t) = (a cos t, a sin t, bt), the ingredients
that go into the definition of curvature were obtained as part of the work of Example 2.20, namely,
‖T′(t)‖ = a/√(a² + b²) and v(t) = ‖α′(t)‖ = √(a² + b²). Thus:

\[
\text{Curvature of a helix:} \quad \kappa(t) = \frac{a/\sqrt{a^2 + b^2}}{\sqrt{a^2 + b^2}} = \frac{a}{a^2 + b^2}.
\]

In particular, for a circle of radius a, in which case b = 0, the curvature is κ = a/(a² + 0²) = 1/a.
Next, to measure wobbling, we look at the binormal B, which is orthogonal to the plane spanned
by T and N. This plane is the “plane of motion” in the sense that it contains the velocity and
acceleration vectors (see Exercise 9.4), so B′ can be thought of as representing the rate of wobble
of that plane.
We prove in Lemma 2.24 below that B′(t) is always a scalar multiple of N(t), that is:

B′(t) = c(t)N(t),

where c(t) is a scalar that depends on t. For the moment, let’s accept this as true. Then c(t)
represents the rate of wobble. We again normalize by dividing by the speed. Moreover, the
convention is to throw in a minus sign.
Definition. Given that B′(t) = c(t)N(t) as above, then the torsion is defined by:

\[
\tau(t) = -\frac{c(t)}{v(t)},
\]

where v(t) is the speed.
¹ What Exercise 1.13 in Chapter 9 shows is that the curvature is essentially a function of the points on the curve
C in the sense that, if x ∈ C and if there is only one value of the parameter t such that α(t) = x, then the quantity
‖T′(t)‖/v(t) is the same for all such parametrizations of C. Thus we could denote the curvature by κ(x) and calculate it
without worrying about which parametrization we use.

Example 2.22 (The torsion of a helix). To find the torsion of the helix α(t) = (a cos t, a sin t, bt),
we again piggyback on the calculations of Example 2.20. For instance, using the result that B(t) =
(1/√(a² + b²)) (b sin t, −b cos t, a), we obtain B′(t) = (1/√(a² + b²)) (b cos t, b sin t, 0). To find c(t), we want to
manipulate this to be a scalar multiple of N(t) = (− cos t, − sin t, 0):

\[
\mathbf{B}'(t) = \frac{1}{\sqrt{a^2 + b^2}}\,(b \cos t, b \sin t, 0) = -\frac{b}{\sqrt{a^2 + b^2}}\,(-\cos t, -\sin t, 0) = -\frac{b}{\sqrt{a^2 + b^2}}\,\mathbf{N}(t).
\]

Therefore c(t) = −b/√(a² + b²), so:

\[
\text{Torsion of a helix:} \quad \tau(t) = -\frac{c(t)}{v(t)} = -\frac{-b/\sqrt{a^2 + b^2}}{\sqrt{a^2 + b^2}} = \frac{b}{a^2 + b^2}.
\]

Note that the torsion is positive if the helix is right-handed (b > 0) and negative if left-handed
(b < 0). In the case of a circle (b = 0), the torsion is zero. This reflects the fact that planar curves
don’t wobble in space at all.
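
The definitions κ = ‖T′‖/v and τ = −c/v can be applied numerically as a cross-check on Examples 2.21 and 2.22. The sketch below takes T, N, and B from Example 2.20 and estimates T′ and B′ by central differences; the step size and the function names are illustrative assumptions. Since N is a unit vector and B′ = cN, the scalar c can be recovered as B′ · N.

```python
import math

def norm(u):
    return math.sqrt(sum(c * c for c in u))

def dot(u, w):
    return sum(p * q for p, q in zip(u, w))

def derivative(path, t, h=1e-5):
    """Central-difference estimate of path'(t), component by component."""
    return tuple((p - m) / (2 * h) for p, m in zip(path(t + h), path(t - h)))

def helix_curvature_torsion(a, b, t):
    v = math.sqrt(a * a + b * b)                 # speed of the helix
    T = lambda s: (-a * math.sin(s) / v, a * math.cos(s) / v, b / v)
    B = lambda s: (b * math.sin(s) / v, -b * math.cos(s) / v, a / v)
    N = (-math.cos(t), -math.sin(t), 0.0)
    kappa = norm(derivative(T, t)) / v           # kappa = ||T'|| / v
    tau = -dot(derivative(B, t), N) / v          # c = B' . N, then tau = -c / v
    return kappa, tau

kappa, tau = helix_curvature_torsion(2.0, 1.0, 0.7)
print(kappa, tau)   # close to a/(a^2 + b^2) = 0.4 and b/(a^2 + b^2) = 0.2
```

Passing a negative b (a left-handed helix) should produce a negative torsion, in line with the remark above.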

To complete the discussion of torsion, it remains to justify the claim that B0 is a scalar multiple
of N. The argument uses a product rule for the cross product.

Proposition 2.23 (Product rules). Let I be an interval of real numbers, and let α, β : I → Rn be
differentiable paths and f : I → R a differentiable real-valued function. Then:

1. (Dot product) (α · β)′ = α′ · β + α · β′.
2. (Cross product, when n = 3) (α × β)′ = α′ × β + α × β′.
3. (Scalar multiplication) (f α)′ = f′α + f α′.

Proof. The first of these was proven earlier (Proposition 2.10), and the others are the same, swapping
in the appropriate type of product. We leave them as exercises (Exercise 7.3).

Lemma 2.24. The derivative B′ of the binormal is always a scalar multiple of the principal normal
N.

Proof. We show that B′ is orthogonal (a) to B and (b) to T. The only vectors orthogonal to both
are precisely the scalar multiples of N, so the lemma follows.
(a) B is a unit vector, so ‖B‖ is constant. Hence B and its derivative B′ are always orthogonal
(Corollary 2.12).
(b) By definition, B = T × N, so using the product rule:

\[
\begin{aligned}
\mathbf{B}' &= \mathbf{T}' \times \mathbf{N} + \mathbf{T} \times \mathbf{N}' \\
&= (\|\mathbf{T}'\|\,\mathbf{N}) \times \mathbf{N} + \mathbf{T} \times \mathbf{N}' && \text{(definition of } \mathbf{N}) \\
&= \|\mathbf{T}'\|\,\mathbf{0} + \mathbf{T} \times \mathbf{N}' && (\mathbf{v} \times \mathbf{v} = \mathbf{0} \text{ always}) \\
&= \mathbf{T} \times \mathbf{N}'.
\end{aligned}
\]

As a cross product, B′ is orthogonal to T (and to N′ for that matter).


2.8 The Frenet-Serret formulas


We saw in Examples 2.21 and 2.22 that, for the helix α(t) = (a cos t, a sin t, bt), a > 0, the curvature
and torsion are given by:
\[
\kappa = \frac{a}{a^2 + b^2} \quad \text{and} \quad \tau = \frac{b}{a^2 + b^2},
\]

respectively. For example, if α(t) = (2 cos t, 2 sin t, t), i.e., a = 2 and b = 1, then κ = 2/5 and τ = 1/5.
In particular, helices have constant curvature and torsion.
We are about to see that the geometry of a curve in R3 is determined by its curvature and
torsion. For instance, it turns out that any curve with constant curvature and torsion is necessarily
congruent to a helix. The proof is based on the following formulas, which describe precisely how
the Frenet vectors change within the coordinate systems that they determine.
Proposition 2.25 (Frenet-Serret formulas). Let α : I → R³ be a differentiable path in R³ with
speed v, Frenet vectors (T, N, B), curvature κ, and torsion τ. Assume that v ≠ 0 and T′ ≠ 0 so
that the Frenet vectors are defined for all t. Then:
(1) T′ = κvN,
(2) N′ = −κvT + τvB,
(3) B′ = −τvN.
Proof. The first and third of these are basically the definitions of curvature and torsion, so we prove
them first. To prove (1), we know that, by definition, N = (1/‖T′‖) T′ and κ = ‖T′‖/v. Thus:

T′ = ‖T′‖ N = κvN.

For (3), we know that B′ = cN and τ = −c/v for some scalar-valued function c. Hence c = −τv,
and B′ = −τvN.
Lastly, for (2), the right-hand rule gives N = B × T (see Figure 2.16). Therefore, according to
the product rule and the two Frenet-Serret formulas that were just proven:

\[
\begin{aligned}
\mathbf{N}' &= \mathbf{B}' \times \mathbf{T} + \mathbf{B} \times \mathbf{T}' \\
&= -\tau v\,\mathbf{N} \times \mathbf{T} + \mathbf{B} \times (\kappa v\,\mathbf{N}).
\end{aligned}
\]

But, again by the right-hand rule, N × T = −B and B × N = −T. Substituting into the last
expression gives N′ = −κvT + τvB, completing the proof.
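
For the helix, formula (2) can be checked against the closed forms already obtained: N(t) = (− cos t, − sin t, 0) has derivative (sin t, − cos t, 0), and the right-hand side uses T, B, κ, τ, and v from Examples 2.20 through 2.22. The sketch below (the function name `frenet_serret_gap` is hypothetical) reports the largest discrepancy between the two sides.

```python
import math

def frenet_serret_gap(a, b, t):
    """Largest component of N' - (-kappa v T + tau v B) for the helix (a cos t, a sin t, b t)."""
    v = math.sqrt(a * a + b * b)                       # speed
    kappa, tau = a / (v * v), b / (v * v)              # Examples 2.21 and 2.22
    T = (-a * math.sin(t) / v, a * math.cos(t) / v, b / v)
    B = (b * math.sin(t) / v, -b * math.cos(t) / v, a / v)
    dN = (math.sin(t), -math.cos(t), 0.0)              # derivative of N = (-cos t, -sin t, 0)
    rhs = tuple(-kappa * v * Ti + tau * v * Bi for Ti, Bi in zip(T, B))
    return max(abs(p - q) for p, q in zip(dN, rhs))

print(frenet_serret_gap(2.0, 1.0, 0.7))    # close to 0
print(frenet_serret_gap(1.5, -0.5, 2.0))   # close to 0, left-handed case
```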

2.9 The classification of space curves


Our discussion of curves in R3 culminates with a result on the connection between curvature and
torsion—that is, turning and wobbling—and the intrinsic geometry of a curve. On the one hand,
moving a curve rigidly in space does not change geometric features of the curve such as its length.
It is plausible that this extends to not changing the curvature or torsion as well, and we begin by
elaborating on this point.
For example, suppose that a path α(t) is translated by a constant amount c to obtain a new
path β(t) = α(t) + c. Then β′(t) = α′(t), so α and β have the same velocity and therefore the
same speed. It follows from the definition that they also have the same unit tangent vector T(t).
Following along with further definitions, they then have the same principal normal N(t), hence the
same binormal B(t), and finally the same curvature κ(t) and torsion τ (t). In other words, speed,
curvature, and torsion are preserved under translations.

The same conclusion applies to rotations, though we don’t really have the tools to give a rigorous
proof at this point. So we try an intuitive explanation. Suppose that α is rotated in R3 to obtain
a new path β. The velocities α′(t) and β′(t) are related by the same rotation, and then so are
the respective Frenet vectors (T(t), N(t), B(t)). So the Frenet vectors of the two paths are not the
same, but, since they are rotated versions of one another, the corresponding vectors change at the
same rates. In particular, the speed, curvature, and torsion are the same.
Our main theorem is a converse to these considerations. That is, we show that, if two paths α
and β have the same speed, curvature, and torsion, then they differ by a translation and/or rotation
in the sense that there is a composition of translations and rotations F : R3 → R3 that transforms
α into β, i.e., F (α(t)) = β(t) for all t. Hence measuring the three scalar quantities v, κ, τ suffices to
determine the geometry of a space curve. It is striking and satisfying that such a complete answer
is possible. In the spirit of the theorems in plane geometry about congruent triangles (SAS, ASA,
SSS, etc.), we call the theorem the “vκτ theorem.”

Theorem 2.26 (vκτ theorem). Let I be an interval of real numbers, and let α, β : I → R³ be
differentiable paths in R³ such that v and T′ are nonzero so that the Frenet vectors are defined for
all t. Then either path can be translated and rotated so that it fits exactly on top of the other if and
only if they have:

• the same speed v(t),


• the same curvature κ(t), and
• the same torsion τ (t).

In particular, two paths with the same speed, curvature, and torsion are congruent to each other.

Proof. That translations and rotations preserve speed, curvature, and torsion was discussed above.
For the converse, assume that α and β have the same speed, curvature, and torsion. We proceed
in three stages to move α on top of β.
Step 1. A translation.
Choose now and for the rest of the proof a point a in the interval I. We first translate α so
that α(a) moves to β(a). In other words, we shift α by the constant vector d = β(a) − α(a) to get
a new path γ(t) = α(t) + d = α(t) + β(a) − α(a). Then:

γ(a) = α(a) + β(a) − α(a) = β(a).

Moreover, as a translate, γ has the same speed, curvature, and torsion as α and hence as β as well.
Step 2. Two rotations.
Let x₀ denote the common point γ(a) = β(a). The unit tangents to γ and β at x₀ may not
be equal, but we can rotate γ about x₀ until they are. Then, rotate γ again using this tangent
direction as axis of rotation until the principal normals at x₀ coincide, too. The binormals are then
automatically the same, since they are the cross products of T and N. Moreover, neither rotation
changes the speed, curvature, or torsion.
Call this final path α̃. In this way, we have translated and rotated α into a path α̃ satisfying
the following three conditions:

• α̃(a) = β(a) = x₀,
• at the point x₀, α̃ and β have the same Frenet vectors, and
• α̃ and β have the same speed, curvature, and torsion for all t.


Step 3. A calculation.
We show that α̃ = β. Since α̃ has been obtained from α by translation and rotation, the
theorem follows.
The argument uses the Frenet-Serret formulas, which we repeat for convenience:

T′ = κvN,   N′ = −κvT + τvB,   B′ = −τvN.

We denote the Frenet vectors of α̃ by (T̃, Ñ, B̃) and those of β by (T, N, B). Using similar
notation for the speed, curvature, and torsion, we have ṽ = v, κ̃ = κ, and τ̃ = τ by construction.
Let θ be the angle between T̃(t) and T(t). Then T̃(t) · T(t) = ‖T̃(t)‖ ‖T(t)‖ cos θ = 1 · 1 · cos θ =
cos θ. Therefore T̃(t) · T(t) is less than or equal to 1 always and is equal to 1 if and only if θ = 0,
that is, if and only if T̃(t) = T(t). The same comments apply to Ñ(t) · N(t) and B̃(t) · B(t).
Thus if we define f : I → R by

f(t) = T̃(t) · T(t) + Ñ(t) · N(t) + B̃(t) · B(t),

then the previous remarks imply:

f(t) ≤ 3 for all t, and
f(t) = 3 if and only if T̃(t) = T(t), Ñ(t) = N(t), and B̃(t) = B(t).

For instance, f (a) = 3 by Step 2.


Now, by the product rule, f′ = (T̃′ · T + T̃ · T′) + (Ñ′ · N + Ñ · N′) + (B̃′ · B + B̃ · B′). We use
the Frenet-Serret formulas to substitute for every derivative in sight, keeping in mind that v = ṽ,
κ = κ̃, and τ = τ̃:

f′ = (κv Ñ · T + T̃ · κvN) + ((−κv T̃ + τv B̃) · N + Ñ · (−κvT + τvB))
       + ((−τv Ñ) · B + B̃ · (−τvN))
   = (κv Ñ · T + κv T̃ · N) + (−κv T̃ · N + τv B̃ · N − κv Ñ · T + τv Ñ · B)
       − (τv Ñ · B + τv B̃ · N).

Miraculously, the terms cancel in pairs so that f′ = 0 for all t.


Hence f is a constant function. Since f(a) = 3, it follows that f(t) = 3 for all t. As noted
above, this in turn implies that T̃(t) = T(t) for all t (and Ñ(t) = N(t), B̃(t) = B(t) as well, but
we don’t need them).
By definition of the unit tangent, this gives (1/v(t)) α̃′(t) = (1/v(t)) β′(t), so α̃′(t) = β′(t) for all t.
Consequently α̃ and β differ by a constant vector: α̃(t) = β(t) + c. Since α̃(a) = x₀ = β(a), the
constant is c = 0. Thus α̃(t) = β(t) for all t. We have succeeded in translating and rotating α to
reach a path α̃ that lies on top of β.
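
Steps 1 and 2 of the proof can be imitated numerically. The sketch below assumes NumPy; the helix, the rotation Q, the shift c, and the helper names are all illustrative and do not come from the text. It folds the two rotations of Step 2 into a single matrix R that carries the Frenet frame of α at a onto that of β, using B = (v × a)/‖v × a‖ from Exercise 9.8 and N = B × T. It then builds the rigid motion F(x) = β(a) + R(x − α(a)).

```python
import numpy as np

def frenet_frame(d1, d2):
    """3x3 matrix with columns T, N, B, from velocity d1 and acceleration d2."""
    T = d1 / np.linalg.norm(d1)
    c = np.cross(d1, d2)
    B = c / np.linalg.norm(c)      # B = (v x a)/||v x a||, as in Exercise 9.8
    N = np.cross(B, T)             # right-handed frame, so N = B x T
    return np.column_stack([T, N, B])

# alpha: a helix, together with its velocity and acceleration.
def alpha(t):
    return np.array([2 * np.cos(t), 2 * np.sin(t), t])

def alpha_d1(t):
    return np.array([-2 * np.sin(t), 2 * np.cos(t), 1.0])

def alpha_d2(t):
    return np.array([-2 * np.cos(t), -2 * np.sin(t), 0.0])

# beta: the same helix after a rotation Q about the y-axis and a shift c.
angle = 0.9
Q = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
c = np.array([3.0, -1.0, 2.0])

def beta(t):
    return Q @ alpha(t) + c

def beta_d1(t):
    return Q @ alpha_d1(t)

def beta_d2(t):
    return Q @ alpha_d2(t)

def aligning_motion(a):
    """Rigid motion sending alpha(a) to beta(a) and alpha's frame at a to beta's."""
    R = frenet_frame(beta_d1(a), beta_d2(a)) @ frenet_frame(alpha_d1(a), alpha_d2(a)).T
    return lambda x: beta(a) + R @ (x - alpha(a))

F = aligning_motion(a=0.3)
```

Note that `aligning_motion` never reads Q or c directly. It uses only the two paths at the single parameter value a, yet F(α(t)) should agree with β(t) for every t, which is what Step 3 of the proof asserts.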

Note that the theorem does not say what happens when a curve is reflected in a plane, that is,
how the curvature and torsion of a curve are related to those of its mirror image. For a clue as to
what happens in this case, see Exercise 9.14.

2.10 Exercises for Chapter 2


Section 1 Parametrizations
In Exercises 1.1–1.8, sketch the curve parametrized by the given path. Indicate on the curve
the direction in which it is being traversed.

1.1. α : [0, 2] → R², α(t) = (t, t²)

1.2. α : [0, 6π] → R², α(t) = (t cos t, t sin t)

1.3. α : R → R², α(t) = (e^t, e^(−t)) (Hint: How are the x- and y-coordinates related?)

1.4. α : R → R³, α(t) = (cos t, sin t, t)

1.5. α : R → R³, α(t) = (cos t, sin t, −t)

1.6. α : R → R³, α(t) = (sin t, cos t, t)

1.7. α : R → R³, α(t) = (sin t, cos t, −t)

1.8. α : [0, 2π] → R³, α(t) = (cos t, sin t, cos 2t)

In Exercises 1.9–1.12, find a parametrization of the given curve.

1.9. The portion of the parabola y = x² in R² with −1 ≤ x ≤ 1

1.10. The portion of the curve xy = 1 in R² with 1 ≤ x ≤ 2

1.11. The circle x² + y² = a² traversed once in the clockwise direction

1.12. A right-handed helix in R³ that:

• lies on the circular cylinder of radius 2 whose axis is the z-axis,


• completes one complete twist as it rises from z = 0 to z = 1.

1.13. Find a parametrization of the line in R³ that passes through the point a = (1, 2, 3) and is
parallel to v = (4, 5, 6).

1.14. Let ℓ be the line in R³ parametrized by α(t) = (1 + 2t, −3t, 4 + 5t). Find a point that lies on
ℓ and a vector that is parallel to ℓ.

1.15. Find a parametrization of the line in R³ that passes through the points a = (1, 0, 0) and
b = (0, 0, 1).

1.16. Find a parametrization of the line in R³ that passes through the points a = (1, −2, 3) and
b = (−4, 5, −6).

1.17. Let ℓ be the line in Rⁿ parametrized by α(t) = a + tv, and let p be a point in Rⁿ. Suppose
that q is the foot of the perpendicular dropped from p to ℓ (Figure 2.17).

(a) Since q lies on ℓ, there is a value of t such that q = a + tv. Find a formula for t in terms
of a, v, and p. (Hint: ℓ lies in a direction perpendicular to p − q.)
(b) Find a formula for the point q.


Figure 2.17: The foot of the perpendicular from p to ℓ

1.18. Let ℓ be the line in R³ parametrized by α(t) = (1 + t, 1 + 2t, 1 + 3t), and let p = (0, 0, 4).
Find the foot of the perpendicular dropped from p to ℓ.

1.19. Let ℓ and m be lines in R³ parametrized by:

α(t) = a + tv and β(t) = b + tw,

respectively, where a = (a₁, a₂, a₃), v = (v₁, v₂, v₃), b = (b₁, b₂, b₃), and w = (w₁, w₂, w₃).

(a) Explain why the problem of determining whether ℓ and m intersect in R³ amounts to
solving a system of three equations in two unknowns t₁ and t₂.
(b) Determine whether the lines parametrized by α(t) = (−1, 0, 1) + t (1, −3, 2) and β(t) =
(1, 5, 4) + t (−1, 1, 6) intersect by writing down the corresponding system of equations,
as described in part (a), and solving the system.
(c) Repeat for the lines parametrized by α(t) = (2, 1, 0) + t (0, 0, 1) and β(t) = (0, 1, 3) +
t (1, 0, 0).

Section 2 Velocity, acceleration, speed, arclength

2.1. Let α(t) = (t³, 3t², 6t).

(a) Find the velocity, speed, and acceleration.


(b) Find the arclength from t = 0 to t = 4.

2.2. Let α(t) = ((1/2)t², 2t, (4/3)t^(3/2)).

(a) Find the velocity, speed, and acceleration.


(b) Find the arclength from t = 2 to t = 4.

2.3. Let α(t) = (cos 2t, sin 2t, 4t^(3/2)).

(a) Find the velocity, speed, and acceleration.


(b) Find the arclength from t = 1 to t = 2.

2.4. Let a, b be real numbers, a > 0, and let α(t) = (a cos t, a sin t, bt).

(a) Find the velocity, speed, and acceleration.


(b) Find the arclength from t = 0 to t = 2π.

2.5. Find a parametrization of the line through the points a = (1, 1, 1) and b = (2, 3, 4) that
traverses the line with constant speed 1.

Section 3 Integrals with respect to arclength


In Exercises 3.1–3.2, evaluate the integral with respect to arclength for the curve C with the
given parametrization α.

3.1. ∫_C xyz ds, where α(t) = (t³, 3t², 6t), 0 ≤ t ≤ 1

3.2. ∫_C z ds, where α(t) = (cos t + t sin t, sin t − t cos t, t), 0 ≤ t ≤ 4

3.3. Let C be the line segment in R² from (1, 0) to (0, 1). With as little calculation as possible,
find the values of the following integrals.

(a) ∫_C 5 ds
(b) ∫_C (x + y) ds

3.4. Let C be the line segment in R² parametrized by α(t) = (t, t), 0 ≤ t ≤ 1.

(a) Evaluate ∫_C (x + y) ds.
(b) Find a triangle in R³ whose base is C and whose area is represented by the integral in
part (a). (Hint: Area under a curve.)

Section 4 The geometry of curves: tangent and normal vectors


Note: Further exercises on this material appear in Section 9.
In Exercises 4.1–4.2, find the unit tangent T(t) and the principal normal N(t) for the given
path α. Then, verify that T(t) and N(t) are orthogonal.

4.1. α(t) = (sin 4t, cos 4t, 3t)

4.2. α(t) = (t, t², 0)

4.3. Let α(t) be a differentiable path in Rⁿ.

(a) If the speed is constant, show that the velocity and acceleration vectors are always
orthogonal. (Hint: Consider ‖v(t)‖².)
(b) Conversely, if the velocity and acceleration are always orthogonal, show that the speed
is constant.

4.4. Let C be a curve in Rⁿ traced out by a differentiable path α : (a, b) → Rⁿ defined on an open
interval (a, b). Assume that there is a point α(t₀) on C that is closer to the origin than any
other point of C. Prove that the tangent vector α′(t₀) is orthogonal to α(t₀). Draw a sketch
that illustrates the conclusion.

Section 5 The cross product


In Exercises 5.1–5.4, (a) find the cross product v × w and (b) verify that it is orthogonal to v
and w.

5.1. v = (1, 1, 1), w = (1, 3, 5)

5.2. v = (1, 2, 3), w = (2, 4, 6)

5.3. v = (1, −2, 4) , w = (−2, 1, 3)


5.4. v = (2, 0, −1), w = (0, −2, 1)

5.5. Find a nonzero vector in R³ that points in a direction perpendicular to the plane that contains
the origin and the points p = (1, 1, 1) and q = (2, 1, −3).

5.6. Find a nonzero vector in R³ that points in a direction perpendicular to the plane that contains
the points p = (1, 0, 0), q = (0, 1, 0), and r = (0, 0, 1).

5.7. Let v = (−1, 1, −2), w = (4, −1, −1), and u = (−3, 2, 1).

(a) Find v × w.
(b) What is the area of the parallelogram determined by v and w?
(c) Find u · (v × w). Then, use your answer to find v · (u × w) and (u × v) · w.
(d) What is the volume of the parallelepiped determined by u, v, and w?

5.8. Verify that the cross product v × w as defined in equation (×) satisfies the key property that
x · (v × w) = det [x v w] for all x in R³.

5.9. Let v = (v₁, v₂, v₃) and w = (w₁, w₂, w₃) be vectors in R³.

(a) Use the definitions of the dot and cross products in terms of coordinates to prove that:

‖v × w‖² = ‖v‖² ‖w‖² − (v · w)².

(b) Use part (a) to give a proof that ‖v × w‖ = ‖v‖ ‖w‖ sin θ, where θ is the angle between
v and w.

5.10. Do there exist nonzero vectors v and w in R³ such that v · w = 0 and v × w = 0? Explain.

5.11. In R³, let ℓ be the line parametrized by α(t) = a + tv, and let p be a point.

Figure 2.18: The distance from a point to a line

(a) Show that the distance from p to ℓ is given by the formula:

d = ‖(p − a) × v‖ / ‖v‖.

See Figure 2.18.

(b) Find the distance from the point p = (1, 1, −1) to the line parametrized by α(t) =
(2, 1, 2) + t(1, −2, 2).

5.12. True or false: u × (v × w) = (u × v) × w for all u, v, w in R³. Either give a proof or find a
counterexample.

5.13. Let u, v, and w be points in R³ such that 0, u, v, and w are not coplanar. Prove that the
volume of the parallelepiped determined by u, v, and w is equal to |det [u v w]|. (Hint:
See Figure 2.14, where ϕ is the angle between u and v × w. Keep in mind that u and v × w
may lie on opposite sides of the plane spanned by v and w.)

5.14. Let (v, w, z) be a basis of R³, and let θ be the angle between v × w and z, where 0 ≤ θ ≤ π,
as usual. Use the definition of handedness to show that θ < π/2 if (v, w, z) is right-handed and
θ > π/2 if left-handed. Draw pictures to illustrate both situations. (Hint: Consider z · (v × w).)

5.15. Strictly speaking, the cross product is defined only in R³, but there is an analogous operation
in R⁴ if three factors are allowed. That is, given vectors u, v, w in R⁴, the product u × v × w
is a vector in R⁴ that satisfies the property:

x · (u × v × w) = det [x u v w]

for all x in R⁴, where the determinant is of the 4 by 4 matrix whose rows are x, u, v, w.

(a) Assuming for the moment that such a product u × v × w exists, show that it is orthogonal
to each of u, v, and w.
(b) If u × v × w ≠ 0, is the quadruple (u, v, w, u × v × w) right or left-handed?
(c) Describe how you might find a formula for u × v × w, and use it to compute the product
if u = (1, −1, 1, −1), v = (1, 1, −1, −1), and w = (−1, 1, 1, −1).

5.16. In general, the “cross product” in Rⁿ is a function of n − 1 factors. In R², this means that
there is only one factor.

(a) Given a vector v = (a, b) in R², find a reasonable formula for “v ×”, and explain how
you came up with it.
(b) Draw a sketch that illustrates v and your candidate for v ×.

Section 6 The geometry of space curves: Frenet vectors

Note: Further exercises on this material appear in Section 9.
In Exercises 6.1–6.2, find the binormal B(t) for the given path α. These problems are
continuations of Exercises 4.1–4.2.

6.1. α(t) = (sin 4t, cos 4t, 3t)

6.2. α(t) = (t, t², 0)

6.3. Consider the line in R³ parametrized by α(t) = (1 + t, 1 + 2t, 1 + 3t). What happens when
you try to find its Frenet vectors (T(t), N(t), B(t))?

Section 7 Curvature and torsion

Note: Further exercises on this material appear in Section 9.

7.1. Find a parametrization of a curve that has constant curvature 1/2 and constant torsion 0.

7.2. Find a parametrization of a curve that has constant curvature 1/2 and constant torsion 1/4.

7.3. Prove the product rule for cross products: If α and β are differentiable paths in R³, then:

(α × β)′ = α′ × β + α × β′.


Section 9 The classification of space curves

9.1. Consider the curve parametrized by:

α(t) = (cos t, − sin t, t).

(a) Find the speed v(t).


(b) Find the unit tangent T(t).
(c) Find the principal normal N(t).
(d) Find the binormal B(t).
(e) Find the curvature κ(t).
(f ) Find the torsion τ (t).
(g) The curve traced out by α is congruent to a curve with a parametrization of the form
β(t) = (a cos t, a sin t, bt), where a and b are constant and a > 0. Find the values of a
and b. Then, sketch the curves traced out by α and β, and describe a motion of R³ that
takes the path α and puts it right on top of β.

9.2. Let α be the path defined by:

α(t) = (6 cos t, 8 cos t, 10 sin t).

(a) Find the speed v(t).


(b) Find the unit tangent T(t).
(c) Find the principal normal N(t).
(d) Find the binormal B(t).
(e) Find the curvature κ(t).
(f ) Find the torsion τ (t).
(g) Based on your answers to parts (e) and (f), what can you say about the type of curve
traced out by α?

9.3. Let α be the path defined by:


α(t) = (t + √3 sin t, √3 t − sin t, 2 cos t).

The following calculations are kind of messy, but the conclusion may be unexpected.

(a) Find the speed v(t).


(b) Find the unit tangent T(t).
(c) Find the principal normal N(t).
(d) Find the binormal B(t).
(e) Find the curvature κ(t).
(f ) Find the torsion τ (t).
(g) Based on your answers to parts (e) and (f), what can you say about the type of curve
traced out by α?

In Exercises 9.4–9.8, α is a path in R³ with velocity v(t), speed v(t), acceleration a(t), and
Frenet vectors T(t), N(t), and B(t). You may assume that v(t) ≠ 0 and T′(t) ≠ 0 for all t so that
the Frenet vectors are defined.

9.4. Prove the following statements.

(a) v = vT
(b) a = v′T + κv²N (Hence v′ is the tangential component of acceleration and κv² the
normal component.)

9.5. Prove that v · a = vv′.

9.6. Show that the curvature is given by κ = ‖v × a‖ / v³.

9.7. Let C be the curve in R³ parametrized by α(t) = (t, t², t³). Use the formula from the previous
exercise to find the curvature at the point (1, 1, 1).

9.8. According to Exercise 9.4, the velocity and acceleration vectors lie in the plane spanned by
T and N. Thus, like the binormal vector B, the cross product v × a is orthogonal to that
plane. It follows that the unit vector in the direction of v × a must be ±B. Show that in fact
it is B. In other words, show that B = (v × a) / ‖v × a‖.

9.9. Let α(t) = (2 cos t, 2 sin t, 2 cos t).

(a) Find the curvature κ(t). (You may use the result of Exercise 9.6 if you wish.)
(b) For this curve, what does v × a tell you about the torsion τ (t)?

9.10. Let α be a path in R³ such that v ≠ 0 and T′ ≠ 0, and let:

w = τ v T + κv B.

Prove the Darboux formulas:


w × T = T′,
w × N = N′,
w × B = B′.

9.11. Suppose that the Darboux vector w = τ v T + κv B from the previous exercise is a constant
vector.

(a) Prove that τv and κv are both constant. (Hint: Calculate w′, and use the Frenet-Serret
formulas. Note that T and B need not be constant even if w is.)
(b) If α has constant speed, prove that α is congruent to a helix.

9.12. Let α be a path in R³ that lies on the sphere of radius a centered at the origin, that is:

‖α(t)‖ = a for all t.

Let T(t) be the unit tangent vector, N(t) the principal normal, v(t) the speed, and κ(t) the
curvature. Assume that v(t) ≠ 0 and T′(t) ≠ 0 for all t so that the Frenet vectors are defined.

(a) Show that α(t) · T(t) = 0 for all t.


(b) Show that κ(t)α(t) · N(t) = −1 for all t. (Hint: Differentiate part (a).)

(c) Show that κ(t) ≥ 1/a for all t.

9.13. Let α be a path in R³ that has:

• constant speed v = 1,
• positive torsion τ(t), and
• binormal vector B(t) = (−√2/2, (√2/2) sin 2t, (√2/2) cos 2t).

(a) Find, in whatever order you wish, the unit tangent vector T(t), the principal normal
N(t), the curvature κ(t), and the torsion τ (t).
(b) In addition, if α(0) = (0, 0, 0), find a formula for α(t) as a function of t. (Hint: Use your
formula for T(t).)

9.14. In this problem, we determine the effect of a reflection, specifically the reflection in the xy-
plane, on the Frenet vectors, the curvature, and the torsion of a path. To fix some notation,
if p = (x, y, z) is a point in R³, let p∗ = (x, y, −z) denote its reflection in the xy-plane.
Now, let α(t) = (x(t), y(t), z(t)) be a path in R³ whose Frenet vectors are defined for all t,
and let β(t) be its reflection:

β(t) = α(t)∗ = (x(t), y(t), −z(t)).

(a) Show that α and β have the same speed v.


(b) Show the unit tangent vectors of α and β are related by T_β = (T_α)∗, i.e., they are also
reflections of one another.
(c) Similarly, show that the principal normals are reflections: N_β = (N_α)∗. (Hint: To organize
the calculation, start by writing T_α = (t₁, t₂, t₃). Then, what does T_β look like?)
(d) Show that α and β have the same curvature: κ_α(t) = κ_β(t).
(e) Show that the binormals are related in the following way: if B_α = (b₁, b₂, b₃), then
B_β = (−b₁, −b₂, b₃) = −(B_α)∗.
(f) Describe how the torsions τ_α(t) and τ_β(t) of the paths are related. (Hint: Use parts (c)
and (e).)
Part III

Real-valued functions

Chapter 3

Real-valued functions: preliminaries

Thus far, we have studied functions α for which the input is a real number t and the output is a
vector α(t) = (x₁(t), x₂(t), . . . , xₙ(t)). The techniques from calculus that we used were basically
familiar from first-year calculus. Now, we reverse the roles and consider functions where the input is
a vector x = (x₁, x₂, . . . , xₙ) and the output is a real number y = f(x) = f(x₁, x₂, . . . , xₙ). Thinking
of the coordinates x₁, x₂, . . . , xₙ as variables, these are real-valued functions of n variables. More
formally, they are functions f : A → R where the domain A is a subset of Rⁿ. When more than
one variable is involved, we shall need methods that go beyond first-year calculus.
Example 3.1. We have seen that a helix can be parametrized by α(t) = (a cos t, a sin t, bt). The
values of a and b give the radius of the cylinder around which the helix winds and the rate at which
it rises, respectively. If a and b vary, the size and shape of the helix change.
The curvature of the helix is given by κ = a/(a² + b²). We can think of this as a real-valued function
that describes how the curvature depends on the geometric parameters a and b. In a way, it’s like
a function on the set of helices, but, strictly speaking, it is a function κ : A → R whose domain
as far as helices go is A = {(a, b) ∈ R² : a > 0}, a subset of R². Likewise, the torsion of a helix
τ = b/(a² + b²) is a real-valued function of the same two variables with the same domain.
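
To make the point that κ and τ are ordinary functions of the pair (a, b), here is a minimal Python sketch. The function names are hypothetical, and the check on a simply enforces the domain A = {(a, b) : a > 0} described above.

```python
def helix_curvature(a, b):
    """Curvature a / (a^2 + b^2) of the helix (a cos t, a sin t, b t), for a > 0."""
    if a <= 0:
        raise ValueError("the domain requires a > 0")
    return a / (a**2 + b**2)

def helix_torsion(a, b):
    """Torsion b / (a^2 + b^2) of the same helix, on the same domain."""
    if a <= 0:
        raise ValueError("the domain requires a > 0")
    return b / (a**2 + b**2)
```

For instance, `helix_curvature(3, 4)` evaluates 3/25 and `helix_torsion(3, 4)` evaluates 4/25. Replacing b by −b leaves the curvature unchanged and flips the sign of the torsion.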

3.1 Graphs and level sets


In order to analyze a function, it seems like a good idea to try to visualize its behavior somehow.
It is easy enough to come up with ways of doing this in principle, but it takes a lot of practice to
do it effectively when there are two or more variables.
For instance, to graph a real-valued function of n variables, we replace y = f(x) in the
one-variable case with xₙ₊₁ = f(x₁, x₂, . . . , xₙ).
Definition. Let A be a subset of Rⁿ, and let f : A → R be a function. The graph of f is the set:
{(x₁, x₂, . . . , xₙ₊₁) ∈ Rⁿ⁺¹ : (x₁, x₂, . . . , xₙ) ∈ A and xₙ₊₁ = f(x₁, x₂, . . . , xₙ)}.
Note that the graph is a subset of Rⁿ⁺¹, so we can hope to draw it only if n equals 1 or 2, i.e.,
when there are one or two variables. In the case of two variables, which is our focus, the domain
A is a subset of R², and the graph is the surface in R³ that satisfies the equation z = f(x, y) and
whose projection on the xy-plane is A.
Example 3.2. Let f : R² → R be given by f(x, y) = √(x² + y²). To sketch the graph z = √(x² + y²),
we slice with various well-chosen planes and then try to reconstruct an image of the surface from
the cross-sections.


For instance, consider the cross-sections with the three coordinate planes. The yz-plane is where
x = 0, so the equation of the intersection of the graph with this plane is z = √(0² + y²) = |y| and
x = 0, a V-shaped curve. Similarly, the cross-section with the xz-plane is z = |x|, y = 0, another
V-shaped curve at right angles to the first. These cross-sections are shown in Figure 3.1. Lastly,
the cross-section with the xy-plane, where z = 0, is 0 = √(x² + y²), z = 0. This consists of the single
point (0, 0, 0).

Figure 3.1: Two cross-sections with coordinate planes: z = |y|, x = 0 (left) and z = |x|, y = 0 (right)

In general, cross-sections with horizontal planes z = c are also useful. Here, they are described
by c = √(x² + y²), or x² + y² = c², where c ≥ 0. This is a circle of radius c in the plane z = c. As c
increases, the circles get bigger. Some of them are shown in Figure 3.2.

Figure 3.2: Some horizontal cross-sections

By putting all this information together, we can assemble a pretty good picture of the whole
graph z = √(x² + y²). It is the circular cone with vertex at the origin pictured in Figure 3.3.

Figure 3.3: A circular cone: the graph z = √(x² + y²)

The information about horizontal cross-sections in Figure 3.2 can also be presented by projecting
the cross-sections onto the xy-plane and presenting them like a topographical map. As we saw
above, the cross-section with z = c is described by the points (x, y) such that x² + y² = c², a
circle of radius c. When projected onto the xy-plane, this is called a level curve. Sketching a
representative sample of level curves in one picture and labeling them with the corresponding value
of c is called a contour map. In this case, it consists of concentric circles about the origin. See
Figure 3.4. The function f is constant on each circle and increases uniformly as you move out.

Figure 3.4: Level curves of f(x, y) = √(x² + y²)
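
The description of the level curves can be checked directly. This is a small sketch using only the Python standard library; `points_on_circle` and `on_level_curve` are hypothetical helper names, and the tolerance is an arbitrary choice to absorb rounding.

```python
import math

def f(x, y):
    """The cone function f(x, y) = sqrt(x^2 + y^2)."""
    return math.hypot(x, y)

def points_on_circle(c, n=12):
    """n evenly spaced points on the circle of radius c about the origin."""
    return [(c * math.cos(2 * math.pi * k / n), c * math.sin(2 * math.pi * k / n))
            for k in range(n)]

def on_level_curve(x, y, c, tol=1e-9):
    """True if (x, y) lies on the level curve f = c, up to rounding."""
    return abs(f(x, y) - c) <= tol
```

Every point returned by `points_on_circle(c)` should satisfy `on_level_curve(x, y, c)`, while a point such as (1, 1), where f equals √2, does not lie on the level curve c = 1.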

Definition. Let A be a subset of Rⁿ, and let f : A → R be a real-valued function. Given a real


number c, the level set corresponding to c is {x ∈ A : f (x) = c}. It is the set of all points in the
domain at which f has a value of c. For functions of two variables, level sets are also called level
curves and, for functions of three variables, level surfaces.

A level set is a subset of the domain A, which is contained in Rⁿ. So we can hope to draw level
sets only for functions of one, two, or three variables.

Example 3.3. Let f : R² → R be given by f(x, y) = x² − y². Sketch some level sets and the graph
of f.
Level sets: We set f(x, y) = x² − y² = c and choose a few values of c. For example:
  c          Level set
  c = 1      x² − y² = 1, a hyperbola
  c = 0      x² − y² = 0, or y = ±x, two lines
  c = −1     x² − y² = −1, or y² − x² = 1, a hyperbola
These curves and a couple of others are sketched in Figure 3.5.

Figure 3.5: Level curves of f(x, y) = x² − y² corresponding to c = −2, −1, 0, 1, 2


The graph: For the graph, the level sets are taken out of the xy-plane and raised to height z = c.
Furthermore, identifying the cross-sections with the coordinate planes provides some additional
framework.
  Coordinate plane     Equation of cross-section in that plane
  yz-plane: x = 0      z = −y², a downward parabola
  xz-plane: y = 0      z = x², an upward parabola
These are shown in Figure 3.6.

Figure 3.6: Two cross-sections with coordinate planes: z = −y², x = 0 (left) and z = x², y = 0 (right)

Reconstructing the graph from these pieces gives a saddle-like surface, also known as a
hyperbolic paraboloid. See Figure 3.7. It has the distinctive feature that, in the cross-section with
the yz-plane, the origin looks like a local maximum, while, in the xz-plane, it looks like a local
minimum. (This last information can be deduced from the contour map, too. See Figure 3.5.)

Figure 3.7: A saddle surface: the graph z = x² − y²
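
The saddle behavior at the origin can be seen by sampling the two cross-sections and the level set c = 0. This is only an illustrative sketch: the sample points are arbitrary, and a finite sample suggests the max/min behavior without proving it.

```python
def f(x, y):
    """The saddle function f(x, y) = x^2 - y^2."""
    return x**2 - y**2

samples = [-1.0, -0.5, 0.5, 1.0]

# Cross-section with the xz-plane (y = 0): z = x^2, so values sit above f(0, 0) = 0.
along_x_axis = [f(x, 0.0) for x in samples]

# Cross-section with the yz-plane (x = 0): z = -y^2, so values sit below f(0, 0) = 0.
along_y_axis = [f(0.0, y) for y in samples]

# Points on the lines y = x and y = -x belong to the level set c = 0.
on_diagonals = [f(x, x) for x in samples] + [f(x, -x) for x in samples]
```

The first list is strictly positive, the second strictly negative, and the third identically zero, matching the upward parabola, the downward parabola, and the two lines y = ±x found above.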

3.2 More surfaces in R³


Not all surfaces in R³ are graphs z = f(x, y). We look at a few examples of some other surfaces
and the equations that define them.
Example 3.4. A sphere of radius a centered at the origin.
A point (x, y, z) lies on the sphere if and only if ‖(x, y, z)‖ = a, in other words, if and only if
√(x² + y² + z²) = a. Hence:

Equation of a sphere: x² + y² + z² = a²
The sphere of radius 1 is shown in Figure 3.8, left.

Figure 3.8: The unit sphere x² + y² + z² = 1 (left) and the cylinder x² + y² = 1 (right)

Example 3.5. A circular cylinder of radius a whose axis is the z-axis.


Here, as long as (x, y) satisfies the condition for being on the circle of radius a, z can be anything.
Thus:

Equation of a circular cylinder: x² + y² = a²

An example is shown on the right of Figure 3.8.

Example 3.6. Sketch the surface x² + y² − z² = 1.

This is not the graph z = f(x, y) of a function: given an input (x, y), there can be two choices of z
that satisfy the equation. In fact, the surface is the union of two graphs, namely, z = √(x² + y² − 1)
and z = −√(x² + y² − 1). Nevertheless, as was the case with graphs, we can try to construct a
sketch by looking at cross-sections with various well-chosen planes. For example:

  Cross-sectional plane       Equation of cross-section in that plane
  yz-plane: x = 0             y² − z² = 1, a hyperbola
  xz-plane: y = 0             x² − z² = 1, a hyperbola
  horizontal plane: z = c     x² + y² = 1 + c², a circle of radius √(1 + c²)
See Figure 3.9.

Figure 3.9: Cross-sections with the yz-plane (left), xz-plane (middle), and horizontal planes (right)

These cross-sections fit together to form a surface that looks like a nuclear cooling tower, as
shown in Figure 3.10.

Figure 3.10: The cooling tower x² + y² − z² = 1

In fact, each of these last three surfaces is a level set of a function of three variables. The
sphere is the level set of f(x, y, z) = x² + y² + z² corresponding to c = a²; the cylinder the
level set of f(x, y, z) = x² + y² corresponding to c = a²; and the cooling tower the level set of
f(x, y, z) = x² + y² − z² corresponding to c = 1.
This suggests a way to visualize the behavior of functions of three variables, which would require
four dimensions to graph. Namely, determine the level sets given by f (x, y, z) = c. Then the way
these surfaces morph into one another as c varies reflects which points get mapped to which values.
For example, the function f(x, y, z) = x² + y² + z² is constant on its level sets, which are spheres,
and the larger the sphere, the greater the value of f .

Example 3.7. A graph of a function f (x, y) of two variables can be regarded as a level set, too, but
of a function of three variables. For let F (x, y, z) = f (x, y) − z. Then the graph z = f (x, y) is the
level set of F corresponding to c = 0. Both are defined by the condition F (x, y, z) = f (x, y) − z =
0.

3.3 The equation of a plane in R³


Suppose that a real-valued function of three variables T : R³ → R happens to be a linear
transformation. As with any linear transformation, T is represented by a matrix, in this case, a 1 by 3
matrix:

                          [ x ]
T(x, y, z) = [ A  B  C ]  [ y ]  = Ax + By + Cz.
                          [ z ]

The level set of T corresponding to a value D is the set of all (x, y, z) in R³ such that:

Ax + By + Cz = D. (3.1)

To understand what this level set is, we rewrite (3.1) as a dot product (A, B, C) · (x, y, z) = D, or,
writing the typical point (x, y, z) of R³ as x:

n · x = D,

where n = (A, B, C). Suppose that we happen to know one particular point p on the level set.
Then n · p = D, so all points on the level set satisfy:

n · x = D = n · p. (3.2)

In other words, n · (x − p) = 0. This says that x lies on the level set if and only if x − p is orthogonal
to n. Geometrically, this is true if and only if x lies in the plane that is perpendicular to n and
that passes through p. This is illustrated in Figure 3.11. We say that n is a normal vector to the
plane. Thus the level sets of the linear transformation T are planes, all having as normal vector
the vector n whose components are the entries of the matrix that represents T.

Figure 3.11: The plane through p with normal vector n

For future reference, we record the results of (3.1) and (3.2).
Proposition 3.8.
1. In R³, the equation of the plane through p with normal vector n is:
n · x = n · p, where x = (x, y, z).

2. Alternatively, the equation Ax + By + Cz = D is the equation of a plane with normal vector
n = (A, B, C).
Example 3.9. Find an equation of the plane containing the point p = (1, 2, 3) and the line
parametrized by α(t) = (4, 5, 6) + t (7, 8, 9). See Figure 3.12.

Figure 3.12: The plane determined by a given point and line

We use the form n · x = n · p. As a point in the desired plane, we can take p = (1, 2, 3). From
its parametrization, the line α passes through the point a = (4, 5, 6) and is parallel to v = (7, 8, 9).
Hence as normal vector to the plane, we can use the cross product:

→ap × v = (p − a) × v = ((1, 2, 3) − (4, 5, 6)) × (7, 8, 9)
        = (−3, −3, −3) × (7, 8, 9)
              [  i   j   k ]
        = det [ −3  −3  −3 ]
              [  7   8   9 ]
        = (−3, 6, −3)
        = −3(1, −2, 1).


The scalar multiple n = (1, −2, 1) is also a normal vector, and, since it’s a little simpler, it’s the
one we use. Substituting into n · x = n · p gives:

(1, −2, 1) · (x, y, z) = (1, −2, 1) · (1, 2, 3)


x − 2y + z = 1 − 4 + 3
or x − 2y + z = 0.
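
The computation in Example 3.9 can be checked numerically. The following is a minimal sketch in plain Python; the helper names cross, dot, and plane_through_point_and_line are illustrative and not part of the text. It builds the normal vector as (p − a) × v and confirms that the given point and several points of the line satisfy n · x = D.

```python
def cross(u, v):
    """Cross product u x v of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(ui * vi for ui, vi in zip(u, v))


def plane_through_point_and_line(p, a, v):
    """Return (n, D) so that n . x = D is the plane containing the point p
    and the line a + t v. Assumes p does not lie on the line."""
    ap = tuple(pi - ai for pi, ai in zip(p, a))  # vector from a to p
    n = cross(ap, v)                             # normal vector to the plane
    return n, dot(n, p)


n, D = plane_through_point_and_line((1, 2, 3), (4, 5, 6), (7, 8, 9))
print(n, D)

# Sample points on the line alpha(t) = (4, 5, 6) + t (7, 8, 9).
line_points = [(4 + 7 * t, 5 + 8 * t, 6 + 9 * t) for t in (-2, 0, 1, 5)]
print(all(dot(n, x) == D for x in line_points))
```

The normal vector produced here is (−3, 6, −3), a scalar multiple of the simpler normal (1, −2, 1) used in the example, so both describe the same plane.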

Example 3.10. Are the planes x + y − z = 5 and x − 2y + 3z = 6 perpendicular?


The angle between two planes is the same as the angle between their normal vectors (Figure
3.13), so it suffices to check if the normals are orthogonal. Reading off the coefficients from the
equations that define the planes, the normals are n1 = (1, 1, −1) and n2 = (1, −2, 3). Then
n1 · n2 = 1 − 2 − 3 = −4, which is nonzero. The planes are not perpendicular.

Figure 3.13: The angle between two planes
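
The same test is easy to automate. Here is a short sketch, assuming the planes are given by their normal vectors; the function angle_between_normals is a hypothetical helper introduced only for illustration. It recomputes the dot product from Example 3.10 and, as a bonus, the angle between the normals.

```python
import math


def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))


def angle_between_normals(n1, n2):
    """Angle in degrees between two nonzero normal vectors."""
    cos_theta = dot(n1, n2) / (norm(n1) * norm(n2))
    return math.degrees(math.acos(cos_theta))


n1 = (1, 1, -1)   # normal to x + y - z = 5
n2 = (1, -2, 3)   # normal to x - 2y + 3z = 6

print(dot(n1, n2))                  # nonzero, so the planes are not perpendicular
print(angle_between_normals(n1, n2))
```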

3.4 Open sets


We prepare for the important concept of continuity. In first-year calculus, a real-valued function f
of one variable is said to be continuous if its graph has no holes or breaks. Intuitively, whenever x
approaches a point a, the value f (x) approaches the value f (a). This can be written succinctly as
lim_{x→a} f(x) = f(a).
Unfortunately, this idea is difficult to write down in a rigorous way. The notions of continuity
and limit are closely related. In many respects, limits are more fundamental, but we shall work
more directly with continuity. Hence we base our discussion on the definition of a continuous
function and modify it later to incorporate limits.
We begin by discussing the most natural type of subset of Rn for taking limits. These are sets
in which there is a cushion around every point so that one can approach the point from all possible
directions.

Definition. Let a be a point in R^n, and assume that r > 0. The open ball with center a and
radius r, denoted B(a, r), is defined to be:

B(a, r) = {x ∈ R^n : ‖x − a‖ < r}.

In other words, it is the set of all points within r units of a.

Example 3.11. In R, an open ball is an open interval (a − r, a + r). In R2 , it is a disk, the region
in the interior of a circle. These are shown in Figure 3.14. In R3 , an open ball is a solid ball, the
region in the interior of a sphere.

Figure 3.14: Open balls in R and R2

Definition. A subset U of Rn is called open if, given any point a in U , there exists a positive real
number r such that B(a, r) ⊂ U .
So to show that a set U is open, one starts with an arbitrary point a of U and finds a value of r
so that the open ball about a of radius r stays entirely within U . The value of r depends typically
on the conditions that define the set U as well as on the point a. For the time being, we shall be
content to find a concrete expression for r and accept geometric intuition to justify that it works.
If, for some point a in U , there is no value of r that works, then U is not open.
Example 3.12. In R^2, let U be the open ball B((0, 0), 1) of radius 1 centered at the origin. This
is the set of all x in R^2 such that ‖x‖ < 1. We verify that this open ball is in fact an open set.
Let a be a point of U, and let r = 1 − ‖a‖. Note that r > 0 since ‖a‖ < 1. Then B(a, r) ⊂ U,
so, given any a, we’ve found an r that works. See Figure 3.15. Hence U is an open set.

Figure 3.15: Open balls are open sets.

For a more detailed justification that B(a, r) ⊂ U, see Exercise
7.2. Note that the expression r = 1 − ‖a‖ gets smaller as a approaches the boundary of the disk.
That this is necessary makes sense from the geometry of the situation. By modifying the argument
slightly, one can show that every open ball in R^n is an open set in R^n (Exercise 7.3).
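
The choice r = 1 − ‖a‖ in Example 3.12 can be explored experimentally. The sketch below is only a sanity check, not a proof: it samples random points of B(a, r) and tests whether they stay inside the unit disk. The sample point a = (0.6, −0.3), the number of trials, and the helper names are assumptions made for illustration.

```python
import math
import random


def norm(x):
    """Euclidean length of a point of R^2."""
    return math.hypot(x[0], x[1])


def sample_open_ball(center, radius, rng):
    """Random point of the open ball B(center, radius) in R^2."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dist = radius * math.sqrt(rng.random())  # rng.random() < 1, so dist < radius
    return (center[0] + dist * math.cos(angle),
            center[1] + dist * math.sin(angle))


def ball_stays_in_unit_disk(a, r, trials=1000, seed=0):
    """True if every sampled point of B(a, r) has length less than 1."""
    rng = random.Random(seed)
    return all(norm(sample_open_ball(a, r, rng)) < 1.0 for _ in range(trials))


a = (0.6, -0.3)
r = 1.0 - norm(a)                            # the radius chosen in the text
print(ball_stays_in_unit_disk(a, r))         # every sample stays inside U
print(ball_stays_in_unit_disk(a, 2.0 * r))   # a larger radius leaks outside U
```

Doubling the radius produces sample points outside U, which matches the remark that r must shrink as a approaches the boundary.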
Example 3.13. Prove that the set U = {(x, y) ∈ R2 : x > 0, y < 0} is open in R2 .
This is the fourth quadrant of the plane, not including the coordinate axes. Let a be a point
of U . Write a = (c, d), so c > 0 and d < 0. The closest point on the boundary of U lies on one of
the axes, either c or |d| units away (Figure 3.16). We choose r to be whichever of these is smaller,
that is, r = min{c, |d|}. Then B(a, r) ⊂ U . Thus U is open.

Figure 3.16: An open quadrant

Example 3.14. Is U = {(x, y) ∈ R^2 : x > 0, y ≤ 0} open in R^2?
This is the same set as in the previous example with the positive x-axis added in. The change is
enough to keep the set from being open. For instance, the point a = (1, 0) is in U, but every open
ball about a includes points outside of U, namely, points in the upper half-plane. This is shown in
Figure 3.17. Thus no value of r works for this choice of a.

Figure 3.17: A nonopen quadrant

Example 3.15. If U and V are open sets in Rn , prove that their intersection U ∩ V is open, too.
Let a be a point of U ∩ V. Then a ∈ U, and, since U is open, there exists a positive number
r1 such that B(a, r1) ⊂ U. Likewise, a ∈ V, and there exists a positive number r2 such that
B(a, r2) ⊂ V. Let r = min{r1, r2} (Figure 3.18). Then B(a, r) ⊂ B(a, r1) ⊂ U and
B(a, r) ⊂ B(a, r2) ⊂ V, so B(a, r) ⊂ U ∩ V. In other words, given any a in U ∩ V, there’s an r that works.

Figure 3.18: The intersection of two open sets


By similar reasoning, the intersection U1 ∩ U2 ∩ · · · ∩ Uk of a finite number of open sets in R^n
is open in R^n. On the other hand, an infinite intersection U1 ∩ U2 ∩ · · · need not be open. For
instance, in R^2, consider the sequence of concentric open balls U1 = B((0, 0), 1), U2 = B((0, 0), 1/2),
U3 = B((0, 0), 1/3), etc. Each Un is open, but the only point common to all of them is the origin.
Thus U1 ∩ U2 ∩ · · · = {(0, 0)}. This is not an open set, since every open ball about the origin
contains points other than the origin.
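
To see concretely why only the origin survives, one can look for the first ball Un that excludes a given point. The sketch below assumes the balls Un = B((0, 0), 1/n) from the text; the function first_ball_missing and the cutoff max_n are illustrative choices, so for the origin the code only confirms membership up to that cutoff.

```python
import math


def in_U(n, x):
    """True if x lies in U_n = B((0, 0), 1/n)."""
    return math.hypot(x[0], x[1]) < 1.0 / n


def first_ball_missing(x, max_n=10000):
    """Smallest n <= max_n with x not in U_n, or None if there is none."""
    for n in range(1, max_n + 1):
        if not in_U(n, x):
            return n
    return None


print(first_ball_missing((0.25, 0.0)))  # a nonzero point is eventually excluded
print(first_ball_missing((0.0, 0.0)))   # the origin is in every U_n checked
```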
There is also a notion of closed set, though the definition may not be what one would guess.

Definition. A subset K of R^n is called closed if R^n − K = {x ∈ R^n : x ∉ K} is open. The set
R^n − K is called the complement of K in R^n.

Example 3.16. Let K = {x ∈ R^2 : ‖x‖ ≤ 1}. This is the open ball centered at the origin of radius
1 together with its boundary, the unit circle, as shown in Figure 3.19. Its complement R^2 − K is
the set of all points x such that ‖x‖ > 1, which is an open set. (Briefly, given a in R^2 − K, then
r = ‖a‖ − 1 works in the definition of open set.) Hence K is closed.

Figure 3.19: Closed balls are closed sets.

More generally, if a ∈ R^n and r ≥ 0, then K(a, r) = {x ∈ R^n : ‖x − a‖ ≤ r} is called a closed
ball. These are closed subsets of R^n as well by a similar argument.

Example 3.17. Let K = {x ∈ R^2 : either ‖x‖ < 1 or x^2 + y^2 = 1 where y ≥ 0}. This is the same
as the closed ball of the preceding example except that the lower semicircle has been deleted. See
Figure 3.20.

Figure 3.20: A set that is neither open nor closed

K is no longer closed: for instance, the point a = (0, −1) belongs to R2 − K, but no open ball
about a is contained entirely within R2 − K. At the same time, K is not open either: the point
a = (0, 1) belongs to K, but no open ball about a stays within K.


3.5 Continuity
We are ready for continuity. Intuitively, the idea is that a function f is continuous at a point a if
lim_{x→a} f(x) = f(a). That is, as x gets close to a, f(x) gets close to f(a). We make this precise by
expressing the requirement in terms of open balls.
Definition. Let U be an open set in R^n, and let f : U → R be a real-valued function. We say that
f is continuous at a point a of U if, given any open ball B(f(a), ε) about f(a), there exists an
open ball B(a, δ) about a such that:
f(B(a, δ)) ⊂ B(f(a), ε).
In other words, if x ∈ B(a, δ), then f(x) ∈ B(f(a), ε) (Figure 3.21). Equivalently, writing out the
definition of open ball, f is continuous at a if, given any ε > 0, there exists a δ > 0 such that:
if ‖x − a‖ < δ, then |f(x) − f(a)| < ε.

Figure 3.21: f maps the open ball B(a, δ) about a into the open ball (f(a) − ε, f(a) + ε) about
f(a).

The definition says that f is continuous at a if you can guarantee that f(x) will be as close
to f(a) as you want (“within ε”) by making sure that x is close enough to a (“within δ”). The
strategy for proving that a function is continuous is similar formally to proving that a set is open.
There, one starts with a point a and tries to find a radius r. Here, one starts with an ε and tries
to find a δ.
It may seem that using the open ball B(f(a), ε) in the definition is an unnecessarily confusing
way to write the interval (f(a) − ε, f(a) + ε), but we have in mind the generalization of continuity
to vector-valued functions in Chapter 6. The definition phrased in terms of open balls extends
naturally to the more general case, as we shall see.
Definition. A function f : U → R is simply called continuous if it is continuous at every point
of its domain U .
Example 3.18. Let f : R2 → R be defined by f (x, y) = x. In other words, f is the projection of
the xy-plane onto the x-axis. Is f a continuous function?
Let a be a point of R^2. To get a sense of what answer to expect, we consider lim_{x→a} f(x).
(Of course, we haven’t defined rigorously what a limit is yet, so this is meant to be completely
informal.) Write x = (x, y) and a = (c, d). Then:
\[
\lim_{x \to a} f(x) = \lim_{(x,y) \to (c,d)} f(x, y) = \lim_{(x,y) \to (c,d)} x = c.
\]

At the same time, f (a) = f (c, d) = c, which agrees. Thus we expect intuitively that f is continuous
at a.
To prove this rigorously, let ε > 0 be given. We want to find a radius δ so that the open ball
B((c, d), δ) is projected inside the open interval B(f(c, d), ε) = B(c, ε) = (c − ε, c + ε). From the
geometry of the projection, taking δ = ε works. See Figure 3.22. That is, if (x, y) ∈ B((c, d), ε),
then |x − c| ≤ ‖(x, y) − (c, d)‖ < ε, so its x-coordinate satisfies c − ε < x < c + ε. In other words,
c − ε < f(x, y) < c + ε, or f(x, y) ∈ (c − ε, c + ε). Therefore δ = ε works in the definition of continuity.
Actually, δ could be anything less than ε, too, but the definition requires only that we find one
δ that works, not all of them.

Figure 3.22: For the projection, choosing δ = ε works.

Since f is continuous at every point a of R2 , it is a continuous function.
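
The ε and δ bookkeeping of Example 3.18 can be mimicked in code. The sketch below is an experiment rather than a proof: the function check_delta (a name introduced here, not in the text) samples points of B(a, δ) and reports whether all of them are sent within ε of f(a). The point a = (2, −1) and ε = 0.1 are arbitrary choices.

```python
import math
import random


def f(x, y):
    """Projection of the xy-plane onto the x-axis."""
    return x


def check_delta(func, a, eps, delta, trials=2000, seed=0):
    """Sample points of B(a, delta); True if all satisfy |func - func(a)| < eps."""
    rng = random.Random(seed)
    fa = func(a[0], a[1])
    for _ in range(trials):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        dist = delta * math.sqrt(rng.random())   # dist < delta
        x = a[0] + dist * math.cos(angle)
        y = a[1] + dist * math.sin(angle)
        if not abs(func(x, y) - fa) < eps:
            return False
    return True


a = (2.0, -1.0)
eps = 0.1
print(check_delta(f, a, eps, delta=eps))        # delta = eps passes every sample
print(check_delta(f, a, eps, delta=2.0 * eps))  # delta = 2 eps is too generous
```

A passing check only says that no counterexample was found among the samples; the argument in the text is what guarantees that δ = ε works.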

Example 3.19. Let f(x, y) = xy/(x^2 + y^2) for all (x, y) ≠ (0, 0). Is it possible to assign a value to f(0, 0)
so that f is continuous at (0, 0)? See Figure 3.23.

Figure 3.23: The graph z = xy/(x^2 + y^2): can it be made continuous at (0, 0)?

Say we set f(0, 0) = c. First, the intuitive approach: we want to know if it’s possible to choose
c so that:
\[
\lim_{(x,y) \to (0,0)} f(x, y) = c.
\]
To get a feel for this, we try approaching the origin in a couple of different ways. For instance,
suppose that we approach along the x-axis. Then f(x, 0) = (x · 0)/(x^2 + 0^2) = 0, so:
\[
\lim_{(x,0) \to (0,0)} f(x, 0) = \lim_{(x,0) \to (0,0)} 0 = 0.
\]
On the other hand, if we approach along the line y = x, then f(x, x) = (x · x)/(x^2 + x^2) = 1/2, so:
\[
\lim_{(x,x) \to (0,0)} f(x, x) = \lim_{(x,x) \to (0,0)} \frac{1}{2} = \frac{1}{2}.
\]

Thus f (x, y) can approach different values depending on how you approach the origin. As a result,
intuitively, the limit does not exist, so f cannot be continuous regardless of what c is.
To prove this rigorously, we need to articulate what it means for the criterion for continuity to
fail. The definition says that a function is continuous if, for every ε, there exists a δ. The negation
of this is that there is some ε for which there is no δ. We try to find such a bad ε.
Our intuitive calculations above showed that there are points arbitrarily close to the origin
where f = 0 and other points arbitrarily close where f = 1/2. We choose ε so that these values of f
cannot both be within ε of f(0, 0) = c.
For this, let ε = 1/8. Then B(f(0, 0), ε) = B(c, 1/8) = (c − 1/8, c + 1/8), an interval of length 1/4. As just
noted, in any open ball B((0, 0), δ) about the origin, there will be points (x, 0) where f(x, 0) = 0
and points (x, x) where f(x, x) = 1/2. It’s impossible to fit both these values inside an interval of
length 1/4. So for ε = 1/8, there is no δ > 0 such that f(B((0, 0), δ)) ⊂ (c − 1/8, c + 1/8). Thus f is not
continuous at (0, 0) no matter what the value of f(0, 0) is.
The εδ-definition of continuity is awkward and difficult to work with. One of its virtues, however,
is that, not only can it be used to verify that a function is continuous, but it also gives an
unambiguous condition for proving that a function is not. That is the moral of the preceding
example.
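
The path dependence in Example 3.19 shows up immediately in a numerical experiment. The following sketch evaluates f(x, y) = xy/(x^2 + y^2) along the x-axis and along two lines y = mx; the helper names and the sample values of x, m, and c are illustrative assumptions. It also checks, for a grid of candidate values c, that 0 and 1/2 never both lie within ε = 1/8 of c.

```python
def f(x, y):
    """f(x, y) = xy / (x^2 + y^2), defined away from the origin."""
    return (x * y) / (x * x + y * y)


def value_along_line(m, x):
    """Value of f at the point (x, mx) of the line y = mx, for x != 0."""
    return f(x, m * x)


# The value along each path does not change as x shrinks toward 0.
for x in (0.1, 0.001, 0.00001):
    print(x, f(x, 0.0), value_along_line(1.0, x), value_along_line(2.0, x))


def both_within(c, eps):
    """True if the values 0 and 1/2 both lie within eps of c."""
    return abs(0.0 - c) < eps and abs(0.5 - c) < eps


# Try candidate values c = -1.00, -0.99, ..., 2.00 for f(0, 0).
found = any(both_within(c / 100.0, 0.125) for c in range(-100, 201))
print(found)
```

Along y = mx the value is m/(1 + m^2), so each slope gives its own constant, which is why no single choice of c can work.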
Example 3.20. In the same spirit, let f(x, y) = (x^3 + y^3)/(x^2 + y^2) for all (x, y) ≠ (0, 0). The graph of f is
shown in Figure 3.24. Is it possible to assign a value to f(0, 0) so that f is continuous at (0, 0)?

Figure 3.24: The graph z = (x^3 + y^3)/(x^2 + y^2)

As before, intuitively, in order to be continuous, f(0, 0) should equal lim_{(x,y)→(0,0)} (x^3 + y^3)/(x^2 + y^2), if the
limit exists. We try approaching the origin along various lines:

Along the x-axis: f(x, 0) = (x^3 + 0^3)/(x^2 + 0^2) = x → 0 as x → 0
Along the y-axis: f(0, y) = (0^3 + y^3)/(0^2 + y^2) = y → 0 as y → 0
Along the line y = mx: f(x, mx) = (x^3 + m^3 x^3)/(x^2 + m^2 x^2) = ((1 + m^3)/(1 + m^2)) x → 0 as x → 0

We get a consistent answer of 0, but this does not prove that the limit is 0. We have not
exhausted all possible ways of approaching the origin, and being close to the origin is different
from being close to it along any individual curve. Nevertheless, the evidence suggests that choosing
f (0, 0) = 0 might make f continuous, and this gives us something concrete to shoot for.
So set f(0, 0) = 0, and let ε > 0 be given. We want to find a δ > 0 such that, if ‖(x, y) − (0, 0)‖ < δ,
then |f(x, y) − 0| < ε. This is satisfied automatically when (x, y) = (0, 0) for any value of δ since
f(0, 0) = 0, so we assume (x, y) ≠ (0, 0) and look for a δ such that, if ‖(x, y)‖ < δ, then
|(x^3 + y^3)/(x^2 + y^2)| < ε.
To hunt for a connection between the quantities ‖(x, y)‖ and (x^3 + y^3)/(x^2 + y^2), we introduce polar
coordinates: x = r cos θ and y = r sin θ, where r = √(x^2 + y^2) = ‖(x, y)‖. See Figure 3.25.

Figure 3.25: Polar coordinates r and θ

Then, if (x, y) ≠ (0, 0):
\[
\frac{x^3 + y^3}{x^2 + y^2} = \frac{r^3(\cos^3\theta + \sin^3\theta)}{r^2} = r(\cos^3\theta + \sin^3\theta).
\]
Since |cos^3 θ + sin^3 θ| ≤ 2, this implies that:
\[
\left|\frac{x^3 + y^3}{x^2 + y^2}\right| = |r(\cos^3\theta + \sin^3\theta)| \le 2r = 2\sqrt{x^2 + y^2} = 2\|(x, y)\|.
\]
Thus if ‖(x, y)‖ < a, then |(x^3 + y^3)/(x^2 + y^2)| ≤ 2‖(x, y)‖ < 2a.
So let δ = ε/2. With this choice, if ‖(x, y)‖ < δ, then |f(x, y)| < 2δ = ε, i.e., f maps B((0, 0), δ)
inside (−ε, ε). In other words, given any ε > 0, we’ve found a δ > 0 that satisfies the definition of
continuity, namely, δ = ε/2. Therefore setting f(0, 0) = 0 makes f continuous at (0, 0).
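
The two ingredients of this argument, the bound |f(x, y)| ≤ 2‖(x, y)‖ and the choice δ = ε/2, can be spot-checked numerically. The sketch below assumes f(0, 0) = 0 as in the example; the function names, sample counts, and seeds are illustrative. Random sampling cannot replace the proof, but a failed check would signal an error in the algebra.

```python
import math
import random


def f(x, y):
    """f(x, y) = (x^3 + y^3) / (x^2 + y^2), extended by f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return (x ** 3 + y ** 3) / (x ** 2 + y ** 2)


def bound_holds(trials=2000, seed=0):
    """Check |f(x, y)| <= 2 ||(x, y)|| at random points of the square [-1, 1]^2."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if abs(f(x, y)) > 2.0 * math.hypot(x, y):
            return False
    return True


def delta_works(eps, trials=2000, seed=0):
    """Check |f(x, y)| < eps at random points of B((0, 0), eps / 2)."""
    delta = eps / 2.0
    rng = random.Random(seed)
    for _ in range(trials):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        r = delta * math.sqrt(rng.random())      # r < delta
        if not abs(f(r * math.cos(angle), r * math.sin(angle))) < eps:
            return False
    return True


print(bound_holds())
print(all(delta_works(eps) for eps in (1.0, 0.1, 0.001)))
```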

3.6 Some properties of continuous functions


The definition of continuity makes for unimpeachable arguments, but, as we learn more about
continuous functions, we might prefer to build on what we learn and not have to start from scratch
with the definition every time. We present a couple of general properties that are especially useful.
These results are behind the working principle that most functions that look continuous really are
continuous. One need not bring out the ε’s and δ’s.
Proposition 3.21. Let f, g : U → R be real-valued functions defined on an open subset U of Rn .
If f and g are continuous at a point a of U, then so are:

• f + g,
• cf for any scalar c,
• f g, the product of f and g,
• f/g, the quotient, assuming that g(a) ≠ 0.

Before proving this result, let’s apply it to some examples.

Example 3.22. We showed earlier in Example 3.18 that the projection onto the x-axis f : R^2 → R,
f(x, y) = x, is continuous. Likewise, the projection onto the y-axis g : R^2 → R, g(x, y) = y, is
continuous. It then follows immediately from the proposition that functions like x + y, 3x, xy,
x^2 = x · x, x^3 − 4x^2 y^2 + 5y, and, at points other than (0, 0), xy/(x^2 + y^2)
are all continuous.

Proof. We prove the statement about the sum f + g and leave the rest for the exercises. (See
Exercises 7.4–7.7.) For this, we want to argue that, as x gets close to a, f (x) + g(x) gets close
to f (a) + g(a). Since f (x) and g(x) get close to f (a) and g(a) individually, this conclusion seems
clear, and it’s just a matter of feeding the intuition into the formal definition. The key ingredient
in the argument below is the “triangle inequality.”
Let ε > 0 be given. We need to come up with a δ > 0 such that f(x) + g(x) ∈ B(f(a) + g(a), ε)
whenever x ∈ B(a, δ), in other words, a δ such that:

|(f(x) + g(x)) − (f(a) + g(a))| < ε whenever x ∈ B(a, δ).

Note that:
|(f (x) + g(x)) − (f (a) + g(a))| = |(f (x) − f (a)) + (g(x) − g(a))|. (3.3)

The triangle inequality says that, for any real numbers r and s, |r + s| ≤ |r| + |s|. This seems
reasonable, but it has a generalization to Rn that is important enough that we discuss it separately
in the next section. Accepting that it is true for the time being, we find from equation (3.3) that:

|(f (x) + g(x)) − (f (a) + g(a))| ≤ |f (x) − f (a)| + |g(x) − g(a)|. (3.4)

This separates the f contribution and the g contribution into terms that can be manipulated
independently.
Since f is continuous at a, taking the positive quantity ε/2 as the given input in the definition
of continuity, there is a δ1 > 0 such that |f(x) − f(a)| < ε/2 whenever x ∈ B(a, δ1). Similarly, there
is a δ2 > 0 such that |g(x) − g(a)| < ε/2 whenever x ∈ B(a, δ2). Let δ = min{δ1, δ2}. Then the last
two inequalities remain true for all x in B(a, δ), and substituting them into (3.4) gives:

|(f(x) + g(x)) − (f(a) + g(a))| < ε/2 + ε/2 = ε whenever x ∈ B(a, δ).

This shows that f + g is continuous at a.

Proposition 3.23. Compositions of continuous functions are continuous. That is, let U be an
open set in Rn and V an open set in R, and let f : U → R and g : V → R be functions such
that f (x) ∈ V for all x in U . (This assumption guarantees that the composition g ◦ f : U → R is
defined.) Then, if f and g are continuous, so is g ◦ f .

Again, we look at some examples first.

Example 3.24. We accept without comment that the familiar functions of one variable from first-year
calculus that were said to be continuous there really are continuous. This includes |x|, ⁿ√x,
sin x, cos x, e^x, and ln x. The proposition then implies that functions like √(x^2 + y^2), sin(x + y),
and e^(xy) are continuous. For example, the first is the composition (x, y) ↦ x^2 + y^2 ↦ √(x^2 + y^2),
which is a composition of continuous steps.

Proof. Let a be a point of U, and let ε > 0 be given. We want to find an open ball B(a, δ) about a that
is mapped by g ◦ f inside the interval B(g(f(a)), ε) = (g(f(a)) − ε, g(f(a)) + ε). We work our
way backwards from g(f(a)) to a using the definition of continuity twice to find such a ball. The
ingredients are depicted in Figure 3.26.
First, since g is continuous at the point f(a), there exists δ′ > 0 such that:

g(B(f(a), δ′)) ⊂ B(g(f(a)), ε).

But then since f is continuous at a, treating δ′ as the given input, there exists δ > 0 such that:

f(B(a, δ)) ⊂ B(f(a), δ′).

Linking these two steps:

(g ◦ f)(B(a, δ)) = g(f(B(a, δ))) ⊂ g(B(f(a), δ′)) ⊂ B(g(f(a)), ε).

Thus given an ε, we’ve found a δ that works.

Figure 3.26: The composition of continuous functions: given an ε, a δ′ exists (g is continuous), then
a δ exists (f is continuous).

3.7 The Cauchy-Schwarz and triangle inequalities


We now discuss two fundamental inequalities about vectors in Rn , one involving the dot product
and the other involving vector addition.
Theorem 3.25 (Cauchy-Schwarz inequality).

|v · w| ≤ ‖v‖ ‖w‖

for all v, w in R^n.
Proof. Recall that v · w = ‖v‖ ‖w‖ cos θ, where θ is the angle between v and w. Thus since
|cos θ| ≤ 1, we have |v · w| = ‖v‖ ‖w‖ |cos θ| ≤ ‖v‖ ‖w‖.

From this follows a generalization of the property of R that was alluded to in the proof of
Proposition 3.21.
Theorem 3.26 (Triangle inequality).

‖v + w‖ ≤ ‖v‖ + ‖w‖

for all v, w in Rn .


Proof. Since both sides are nonnegative, it’s equivalent to square both sides and prove that
‖v + w‖^2 ≤ (‖v‖ + ‖w‖)^2. But:

‖v + w‖^2 = (v + w) · (v + w)
= v · v + v · w + w · v + w · w
= ‖v‖^2 + 2 v · w + ‖w‖^2. (3.5)

Also, v · w ≤ |v · w| ≤ ‖v‖ ‖w‖, where the second inequality is Cauchy-Schwarz. Substituting this
into (3.5) gives:
‖v + w‖^2 ≤ ‖v‖^2 + 2‖v‖ ‖w‖ + ‖w‖^2 = (‖v‖ + ‖w‖)^2.

The connection with triangles is illustrated in Figure 3.27.

Figure 3.27: The geometry of the triangle inequality: the length ‖v + w‖ of one side of a triangle
is less than or equal to the sum ‖v‖ + ‖w‖ of the lengths of the other two sides.
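
Both inequalities are easy to test on random vectors. The sketch below is a numerical illustration only; the function check_inequalities and the small tolerance tol, which absorbs floating-point rounding, are assumptions introduced here. The final line uses the parallel vectors v = (1, 2, 2) and w = (2, 4, 4), for which both inequalities hold with equality.

```python
import math
import random


def dot(v, w):
    """Dot product of two vectors of the same length."""
    return sum(vi * wi for vi, wi in zip(v, w))


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(dot(v, v))


def check_inequalities(v, w, tol=1e-9):
    """Return (cauchy_schwarz_ok, triangle_ok) for the vectors v and w."""
    total = [vi + wi for vi, wi in zip(v, w)]
    cauchy_schwarz_ok = abs(dot(v, w)) <= norm(v) * norm(w) + tol
    triangle_ok = norm(total) <= norm(v) + norm(w) + tol
    return cauchy_schwarz_ok, triangle_ok


rng = random.Random(0)
results = []
for _ in range(1000):
    n = rng.randint(1, 6)                          # dimension of R^n
    v = [rng.uniform(-10.0, 10.0) for _ in range(n)]
    w = [rng.uniform(-10.0, 10.0) for _ in range(n)]
    results.append(check_inequalities(v, w))

print(all(cs and tri for cs, tri in results))
print(check_inequalities((1, 2, 2), (2, 4, 4)))    # equality case: w = 2v
```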

3.8 Limits
We used our intuition about limits as a way to think about continuity. Now that continuity has
been defined rigorously, we turn the tables and use continuity to give a formal definition of limits.
One technical, though important, point is that, in determining how f(x) behaves as x approaches
a, what is happening right at a does not matter. The value of f(a), or even whether f is
defined there, is irrelevant.
Definition. Let U be an open set in R^n, and let a be a point of U. If f is a real-valued function
that is defined on U, except possibly at the point a, then we say that lim_{x→a} f(x) exists if there is
a number L such that the function f̃ : U → R defined by:
\[
\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \ne a, \\ L & \text{if } x = a \end{cases}
\]
is continuous at a. When this happens, we write lim_{x→a} f(x) = L.

By carefully applying the definition of continuity to the function f̃, the definition of limit can
be restated using ε’s and δ’s, which is its more customary form. Namely, lim_{x→a} f(x) = L means:
Given any ε > 0, there exists a δ > 0 such that |f(x) − L| < ε whenever ‖x − a‖ < δ,
except possibly when x = a.
We shall never use this formulation, however.
If f is defined at a, then it’s basically a tautology from these definitions that lim_{x→a} f(x) = f(a)
if and only if f is continuous at a, thus completing the intuitive connection between the two concepts
with which we began.
Example 3.27. In Example 3.20, we showed that the function given by f(x, y) = (x^3 + y^3)/(x^2 + y^2), which
is defined for (x, y) ≠ (0, 0), could be made continuous at (0, 0) by setting f(0, 0) = 0. (Strictly
speaking, this extended function is the one called f̃ in the definition.) Hence by definition:
\[
\lim_{(x,y) \to (0,0)} \frac{x^3 + y^3}{x^2 + y^2} = 0.
\]
Similarly, in Example 3.19, for the function f(x, y) = xy/(x^2 + y^2), there is no value that can be
assigned to f(0, 0) that makes f continuous at (0, 0). In this case:
\[
\lim_{(x,y) \to (0,0)} \frac{xy}{x^2 + y^2} \quad \text{does not exist.}
\]

Because of the close connection between the two concepts, properties of continuous functions
have analogues for limits. Here is the version of Proposition 3.21 for limits.
Proposition 3.28. Assume that lim_{x→a} f(x) and lim_{x→a} g(x) both exist. Then:
1. lim_{x→a} (f(x) + g(x)) = lim_{x→a} f(x) + lim_{x→a} g(x).
2. lim_{x→a} (c f(x)) = c (lim_{x→a} f(x)) for any scalar c.
3. lim_{x→a} (f(x) g(x)) = (lim_{x→a} f(x)) (lim_{x→a} g(x)).
4. lim_{x→a} f(x)/g(x) = (lim_{x→a} f(x)) / (lim_{x→a} g(x)), provided that lim_{x→a} g(x) ≠ 0.

We leave the proofs for the exercises (Exercise 8.1).

3.9 Exercises for Chapter 3


Section 1 Graphs and level sets
For the functions f (x, y) in Exercises 1.1–1.7: (a) sketch the cross-sections of the graph z =
f (x, y) with the coordinate planes, (b) sketch several level curves of f , labeling each with the
corresponding value of c, and (c) sketch the graph of f .

1.1. f(x, y) = x^2 + y^2

1.2. f(x, y) = x^2 + y^2 + 1

1.3. f(x, y) = y − x^2

1.4. f (x, y) = x + y

1.5. f (x, y) = |x| + |y|

1.6. f (x, y) = xy
1.7. f(x, y) = y/(x^2 + 1)

1.8. The graph of the function f(x, y) = x^3 − 3xy^2 is called a monkey saddle. You will need to
bring one along whenever you invite a monkey to go riding with you. The level sets of f are
not easy to sketch directly, but it is still possible to get a reasonable idea of what the graph
looks like.

(a) Sketch the level set corresponding to c = 0. (Hint: x^3 − 3xy^2 = x(x + √3 y)(x − √3 y).)
(b) Draw the region of the xy-plane in which f(x, y) > 0 and the region in which f(x, y) < 0.
(Hint: Part (a) might help.)
(c) Use the information from parts (a) and (b) to make a rough sketch of the monkey saddle.

1.9. It is possible to construct an origami approximation of the saddle surface z = x^2 − y^2 of
Example 3.3 by folding a piece of paper. The resulting model is called a hypar. (The
saddle surface itself is known as a hyperbolic paraboloid.) See Figure 3.28. Construct your
own hypar by starting with a square piece of paper and following the instructions appended
to the end of the chapter (page 80).²

Figure 3.28: A hypar

Section 2 More surfaces in R3


In Exercises 2.1–2.4, sketch the given surface in R^3.

2.1. x^2 + y^2 + z^2 = 4

2.2. x^2 + y^2 = 4

2.3. x^2 + z^2 = 4

2.4. x^2 + y^2/9 + z^2/4 = 1

In Exercises 2.5–2.9, sketch the level sets corresponding to the indicated values of c for the given
function f(x, y, z) of three variables. Make a separate sketch for each individual level set.

2.5. f(x, y, z) = x^2 + y^2 + z^2, c = 0, 1, 2

2.6. f(x, y, z) = x^2 + y^2, c = 0, 1, 2

2.7. f(x, y, z) = x^2 + y^2 − z, c = −1, 0, 1

2.8. f(x, y, z) = x^2 + y^2 − z^2, c = −1, 0, 1

2.9. f(x, y, z) = x^2 − y^2 − z^2, c = −1, 0, 1

2.10. Can the saddle surface z = x^2 − y^2 in R^3 be described as a level set? If so, how? What about
the parabola y = x^2 in R^2?

² For more information about hypars, see http://erikdemaine.org/hypar/.

Section 3 The equation of a plane

3.1. Find an equation of the plane through the point p = (1, 2, 3) with normal vector n = (4, 5, 6).

3.2. Find an equation of the plane through the point p = (2, −1, 3) with normal vector n =
(1, −4, 5).

3.3. Find a point that lies on the plane x + 3y + 5z = 9 and a normal vector to the plane.

3.4. Find a point that lies on the plane x + z = 1 and a normal vector to the plane.

3.5. Are the planes x − y + z = 8 and 2x + y − z = −1 perpendicular? Are they parallel?

3.6. Are the planes x − y + z = 8 and 2x − 2y + 2z = −1 perpendicular? Are they parallel?

3.7. Find an equation of the plane that contains the points p = (1, 0, 0), q = (0, 2, 0), and
r = (0, 0, 3).

3.8. Find an equation of the plane that contains the points p = (1, 1, 1), q = (2, −1, 2), and
r = (−1, 2, 0).

3.9. Find an equation of the plane that contains the point p = (1, 1, 1) and the line parametrized
by α(t) = (1, 0, 1) + t (2, 3, −4).

3.10. Find a parametrization of the line of intersection of the planes x + y + z = 3 and x − y + z = 1.

3.11. What point on the plane x − y + 2z = 3 is closest to the point q = (2, 1, −1)? (Hint: Find a
parametrization of an appropriate line through q.)

3.12. Two planes in R3 are perpendicular to each other. Their line of intersection is described by
the parametric equations:

x = 2 − t, y = −1 + t, z = −4 + 2t.

If one of the planes has equation 2x + 4y − z = 4, find an equation for the other plane.

3.13. Consider the plane x + y + z = 1.

(a) Sketch the cross-sections with the three coordinate planes.


(b) Sketch and describe in words the portion of the plane that lies in the “first octant” of
R3 , that is, the part where x ≥ 0, y ≥ 0, and z ≥ 0.

3.14. Sketch and describe some of the level sets of the function f (x, y, z) = x + y + z.

3.15. Let π1 and π2 be parallel planes in R3 given by the equations:

π1 : Ax + By + Cz = D1 and π2 : Ax + By + Cz = D2 .

(a) If p1 and p2 are points on π1 and π2 , respectively, show that:

n · (p2 − p1 ) = D2 − D1 ,

where n = (A, B, C) is a normal vector to the two planes.


(b) Show that the perpendicular distance d between π1 and π2 is given by:

\[
d = \frac{|D_2 - D_1|}{\sqrt{A^2 + B^2 + C^2}}.
\]

(c) Find the perpendicular distance between the planes x + y + z = 1 and x + y + z = 5.

3.16. Let α be a path in R^3 that has constant torsion τ = 0. You may assume that α′(t) ≠ 0 and
T′(t) ≠ 0 for all t so that the Frenet vectors of α are always defined.

(a) Prove that the binormal vector B is constant.


(b) Fix a time t = t0 , and let x0 = α(t0 ). Show that the function

f (t) = B · (α(t) − x0 )

is a constant function. What is the value of the constant?


(c) Prove that α(t) lies in a single plane for all t. In other words, curves that have constant
torsion 0 must be planar. (Hint: Use part (b) to identify a point and a normal vector
for the plane in which the path lies.)

Section 4 Open sets


In Exercises 4.1–4.7, let U be the set of points (x, y) in R2 that satisfy the given conditions.
Sketch U , and determine whether it is an open set. Your arguments should be at a level of rigor
comparable to those given in the text.

4.1. 0 < x < 1 and 0 < y < 1

4.2. |x| < 1 and |y| < 1

4.3. |x| ≤ 1 and |y| < 1

4.4. x^2 + y^2 = 1

4.5. 1 < x^2 + y^2 < 4

4.6. y > x

4.7. y ≥ x

4.8. Is R2 − {(0, 0)}, the plane with the origin removed, an open set in R2 ? Justify your answer.

4.9. Prove that the union of any collection of open sets in R^n is open. That is, if {U_α} is a
collection of open sets in R^n, then ⋃_α U_α is open.

Section 5 Continuity

5.1. Let f : R^2 → R be the function:
\[
f(x, y) = \begin{cases} \dfrac{x^3 y}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases}
\]
(a) Let a be a positive real number. Show that, if ‖(x, y)‖ < a, then |f(x, y)| < a^2.

(b) Use the εδ-definition of continuity to determine whether f is continuous at (0, 0).

5.2. Consider the function f(x, y) defined by:
\[
f(x, y) = \frac{x^2 y^4}{(x^2 + y^4)^2} \quad \text{if } (x, y) \ne (0, 0).
\]
Determine what happens to the value of f(x, y) as (x, y) approaches the origin along:

(a) the x-axis,
(b) the y-axis,
(c) the line y = mx,
(d) the parabola x = y^2.
(e) Is it possible to assign a value to f(0, 0) so that f is continuous at (0, 0)? Justify your
answer using the εδ-definition of continuity.
5.3. Let f(x, y) = (x^2 − y^2)/(x^2 + y^2) if (x, y) ≠ (0, 0). Is it possible to assign a value to f(0, 0) so that f
is continuous at (0, 0)? Justify your answer using the εδ-definition.

5.4. Let f(x, y) = (x^3 + 2y^3)/(x^2 + y^2) if (x, y) ≠ (0, 0). Is it possible to assign a value to f(0, 0) so that f
is continuous at (0, 0)? Justify your answer using the εδ-definition.

Section 6 Some properties of continuous functions

6.1. Let a = (1, 2). Show that the function f : R2 → R given by f (x) = a · x is continuous.

In Exercises 5.1–5.2, you determined the continuity of the following two functions at (0, 0).
Now, consider points (x, y) other than (0, 0). Determine all such points at which f is continuous.
6.2. f(x, y) = x^3 y/(x^2 + y^2) if (x, y) ≠ (0, 0), f(0, 0) = 0

6.3. f(x, y) = x^2 y^4/(x^2 + y^4)^2 if (x, y) ≠ (0, 0)

6.4. Let U be an open set in Rn , and let f : U → R be a function that is continuous at a point a
of U .

(a) If f(a) > 0, show that there exists an open ball B = B(a, r) centered at a such that
f(x) > f(a)/2 for all x in B. (Hint: Let ε = f(a)/2.)
(b) Similarly, if f(a) < 0, show that there exists an open ball B = B(a, r) such that
f(x) < f(a)/2 for all x in B.

In particular, if f(a) ≠ 0, there is an open ball B centered at a throughout which f(x) has
the same sign as f(a).

6.5. Let U be the set of all points (x, y) in R2 such that y > sin x, that is:

U = {(x, y) ∈ R2 : y > sin x}.

Prove that U is an open set in R2 . (Hint: Apply the previous exercise to an appropriate
continuous function.)


Section 7 The Cauchy-Schwarz and triangle inequalities

7.1. Let v and w be vectors in R^n.

(a) Show that ‖v‖ − ‖w‖ ≤ ‖v − w‖. (Hint: v = (v − w) + w.)
(b) Show that |‖v‖ − ‖w‖| ≤ ‖v − w‖.

7.2. (a) Let a be a point of R^n. If r > 0, show that B(a, r) ⊂ B(0, ‖a‖ + r). Draw a picture that
illustrates the result in R^2. (Hint: To prove a set inclusion of this type, you must show
that, if x ∈ B(a, r), then x ∈ B(0, ‖a‖ + r), that is, if ‖x − a‖ < r, then ‖x − 0‖ < ‖a‖ + r.)
(b) In R^2, if a ∈ B((0, 0), 1) and r = 1 − ‖a‖, show that B(a, r) ⊂ B((0, 0), 1). (Recall that
this inclusion was used in Example 3.12 to prove that B((0, 0), 1) is an open set.)

7.3. Let a be a point in Rn , and let r be a positive real number. Prove that the open ball B(a, r)
is an open set in Rn .

Exercises 7.4–7.7 provide the missing proofs of parts of Proposition 3.21 about properties of
continuous functions. Here, $U$ is an open set in $\mathbb{R}^n$, $f, g \colon U \to \mathbb{R}$ are real-valued functions defined
on $U$, and $\mathbf{a}$ is a point of $U$.

7.4. If $f$ is continuous at $\mathbf{a}$ and $c$ is a scalar, prove that $cf$ is continuous at $\mathbf{a}$ as well. (Hint: One
approach is to consider the two cases $c = 0$ and $c \neq 0$ separately. Another is to start by
showing that $|cf(\mathbf{x}) - cf(\mathbf{a})| \leq (|c| + 1)\, |f(\mathbf{x}) - f(\mathbf{a})|$.)

7.5. If $f$ is continuous at $\mathbf{a}$, prove that there exists a $\delta > 0$ such that, if $\mathbf{x} \in B(\mathbf{a}, \delta)$, then
$|f(\mathbf{x})| < |f(\mathbf{a})| + 1$. (Hint: Use Exercise 7.1.)

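7.6. If $f$ and $g$ are continuous at $\mathbf{a}$, prove that their product $fg$ is continuous at $\mathbf{a}$. (Hint: Use
an “add zero” trick and the previous exercise to show that there is a $\delta_1 > 0$ such that, if
$\mathbf{x} \in B(\mathbf{a}, \delta_1)$, then $|f(\mathbf{x})g(\mathbf{x}) - f(\mathbf{a})g(\mathbf{a})| \leq \bigl(|f(\mathbf{a})| + 1\bigr)\, |g(\mathbf{x}) - g(\mathbf{a})| + \bigl(|g(\mathbf{a})| + 1\bigr)\, |f(\mathbf{x}) - f(\mathbf{a})|$.)

7.7. Assume that $f$ and $g$ are continuous at $\mathbf{a}$ and that $g(\mathbf{a}) \neq 0$.

(a) Prove that $\frac{1}{g}$ is continuous at $\mathbf{a}$. (Hint: Use Exercise 6.4 to show that there exists a
$\delta_1 > 0$ such that, if $\mathbf{x} \in B(\mathbf{a}, \delta_1)$, then $\left| \frac{1}{g(\mathbf{x})} - \frac{1}{g(\mathbf{a})} \right| \leq \frac{2}{|g(\mathbf{a})|^2}\, |g(\mathbf{x}) - g(\mathbf{a})|$.)

(b) Prove that the quotient $\frac{f}{g}$ is continuous at $\mathbf{a}$. (Hint: $\frac{f}{g} = f \cdot \frac{1}{g}$.)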

7.8. Let $f \colon U \to \mathbb{R}$ be a real-valued function defined on an open set $U$ in $\mathbb{R}^n$, and let $g \colon U \to \mathbb{R}$
be given by $g(\mathbf{x}) = |f(\mathbf{x})|$.

(a) If $f$ is continuous at a point $\mathbf{a}$ of $U$, is $g$ necessarily continuous at $\mathbf{a}$ as well? Either give
a proof or find a counterexample.
(b) Conversely, if $g$ is continuous at $\mathbf{a}$, is $f$ as well? Again, give a proof or find a counterexample.

Section 8 Limits

8.1. Prove parts 1 and 3 of Proposition 3.28 about limits of sums and products. (Hint: By using
the corresponding properties of continuous functions in Proposition 3.21, you should be able
to avoid $\epsilon$’s and $\delta$’s entirely.)

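8.2. Let $U$ be an open set in $\mathbb{R}^n$, $\mathbf{x}_0$ a point of $U$, and $f$ a real-valued function defined on $U$,
except possibly at $\mathbf{x}_0$, such that $\lim_{\mathbf{x} \to \mathbf{x}_0} f(\mathbf{x}) = L$. Let $I$ be an open interval in $\mathbb{R}$, and let
$\alpha \colon I \to U$ be a continuous path in $U$ such that $\alpha(t_0) = \mathbf{x}_0$ for some $t_0$ in $I$. Consider the
function $g \colon I \to \mathbb{R}$ given by:
\[
g(t) = \begin{cases} f(\alpha(t)) & \text{if } \alpha(t) \neq \mathbf{x}_0, \\ L & \text{if } \alpha(t) = \mathbf{x}_0. \end{cases}
\]

Show that $\lim_{t \to t_0} g(t) = L$.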

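8.3. (Sandwich principles.) Let $U$ be an open set in $\mathbb{R}^n$, and let $\mathbf{a}$ be a point of $U$.

(a) Let $f, g, h \colon U \to \mathbb{R}$ be real-valued functions such that $f(\mathbf{x}) \leq g(\mathbf{x}) \leq h(\mathbf{x})$ for all $\mathbf{x}$ in
$U$. If $f(\mathbf{a}) = h(\mathbf{a})$ (let’s call the common value $c$) and if $f$ and $h$ are continuous at $\mathbf{a}$,
prove that $g(\mathbf{a}) = c$ and that $g$ is continuous at $\mathbf{a}$ as well.
(b) Let $f, g, h$ be real-valued functions defined on $U$, except possibly at the point $\mathbf{a}$, such
that $f(\mathbf{x}) \leq g(\mathbf{x}) \leq h(\mathbf{x})$ for all $\mathbf{x}$ in $U$, except possibly when $\mathbf{x} = \mathbf{a}$. If $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) =
\lim_{\mathbf{x} \to \mathbf{a}} h(\mathbf{x}) = L$, prove that $\lim_{\mathbf{x} \to \mathbf{a}} g(\mathbf{x}) = L$, too.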


Addendum: Folding a hypar (Exercise 1.9)


1. Take your square piece of paper, and fold and unfold along each of the diagonals.

[Figure: Step 1]

2. Turn the paper over.


3. Fold the bottom edge of the paper to the center point, but only crease the part between the
diagonals. Unfold back to the square.

[Figures: Step 3 and Step 4]

4. Repeat for the other three sides.


5. Now, fold the bottom edge up to the upper crease line that you just made, 3/4 of the way
up, again only creasing the part between the diagonals. This is a fairly short crease. Unfold
back to the square.

[Figures: Step 5 and Step 6]

6. Fold the bottom edge to the lower crease line (the one you made in step 3), once again only
creasing between the diagonals. Unfold back to the square.
7. Repeat for all four sides. At this point, the square should be divided into four concentric
square rings. See the figure for Step 7.
8. Turn the square over.
9. The goal now is to create additional creases halfway between the ones constructed so far, for
a total of eight concentric square rings. You can do this by folding the bottom edge up to
previously constructed folds, but use only every other fold. That is, fold the bottom edge
up to the existing creases 7/8, 5/8, 3/8, and 1/8 of the way to the top, again only creasing
between the diagonals. Do this for all four sides.

[Figures: Step 7 and Step 9]

10. Turn the paper over. The creases should alternate as you move from one concentric square
to the next: mountain, valley, mountain, valley, . . . .

11. Now, try to fold all the creases. It’s easiest to start from the outer ring and work your way in,
pinching in at the corners and trying to compress adjacent faces of the ring flat as you move
towards the center. Half of the short diagonal creases between the rings need to be reversed
so that the rings nest inside each other, though the paper can be coaxed to do this without
too much trouble. In the end, you should get something shaped like an X, or a four-armed
starfish.

12. Open up the arms of the starfish. The paper will naturally assume the hypar shape.

Chapter 4

Real-valued functions: differentiation

We now extend the notion of derivative from real-valued functions of one variable, as in first-year
calculus, to real-valued functions of $n$ variables $f \colon U \to \mathbb{R}$, where $U$ is an open set in $\mathbb{R}^n$. The
most natural choice would be to copy the one-variable definition verbatim and let the derivative
at a point $\mathbf{a}$ be $\lim_{\mathbf{x} \to \mathbf{a}} \frac{f(\mathbf{x}) - f(\mathbf{a})}{\mathbf{x} - \mathbf{a}}$. Unfortunately, this involves dividing by a vector, which does not
make sense.
One potential remedy would be to consider $\lim_{\mathbf{x} \to \mathbf{a}} \frac{f(\mathbf{x}) - f(\mathbf{a})}{\lVert \mathbf{x} - \mathbf{a} \rVert}$ instead. Here, it turns out that
the limit fails to exist even for very simple functions. For example, for the function $f(x) = x$ of one
variable, this would be $\lim_{x \to a} \frac{x - a}{|x - a|}$, which approaches $+1$ from the right and $-1$ from the left. So
we must look harder to find a formulation of the one-variable derivative that can be generalized.

4.1 The first-order approximation


In first-year calculus, we think of a function $f$ as being differentiable at a point $a$ if it has a good
tangent line approximation:
\[ f(x) \approx \ell(x) \quad \text{when $x$ is near $a$}, \]
where $y = \ell(x) = mx + b$ is the tangent line to the graph of $f$ at the point $(a, f(a))$. See Figure
4.1.

Figure 4.1: Approximating a differentiable function $f(x)$ by its tangent line $\ell(x)$

In fact, the tangent line is given by:

\[ \ell(x) = f(a) + f'(a)(x - a), \tag{4.1} \]

so $m = f'(a)$ and $b = f(a) - f'(a) \cdot a$.


By definition, $f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$, assuming the limit exists. This can be rewritten as
$0 = \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} - f'(a) \right) = \lim_{x \to a} \frac{f(x) - f(a) - f'(a)(x - a)}{x - a}$, or, using equation (4.1), as:

\[ \lim_{x \to a} \frac{f(x) - \ell(x)}{x - a} = 0. \tag{4.2} \]

In particular, when $x$ is near $a$, $f(x) - \ell(x)$ must be much, much smaller than $x - a$ in order for
$\frac{f(x) - \ell(x)}{x - a}$ to be near $0$. Note that $f(x) - \ell(x)$ is the error in using the tangent line to approximate
$f$, so, in order for $f$ to be differentiable at $a$, not only must this error go to $0$ as $x$ approaches $a$, it
must go to $0$ much faster than $x - a$. This is the principle we generalize in moving to functions of
more than one variable.

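Definition. A function $\ell \colon \mathbb{R}^n \to \mathbb{R}^m$ is called an affine function if it has the form $\ell(\mathbf{x}) = T(\mathbf{x}) + \mathbf{b}$,
where:

• $T \colon \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation and

• $\mathbf{b}$ is a vector in $\mathbb{R}^m$.

Let $U$ be an open set in $\mathbb{R}^n$ and $f \colon U \to \mathbb{R}$ a real-valued function. Provisionally, we’ll say
that $f$ is differentiable at a point $\mathbf{a}$ of $U$ if there exists a real-valued affine function $\ell \colon \mathbb{R}^n \to \mathbb{R}$,
$\ell(\mathbf{x}) = T(\mathbf{x}) + b$, such that:
\[ \lim_{\mathbf{x} \to \mathbf{a}} \frac{f(\mathbf{x}) - \ell(\mathbf{x})}{\lVert \mathbf{x} - \mathbf{a} \rVert} = 0. \tag{4.3} \]

This is the generalization of equation (4.2).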


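We can pin down the best choice of $\ell$ much more precisely. For instance, as $\ell$ is meant to
approximate $f$ near $\mathbf{a}$, it should be true that $f(\mathbf{a}) = \ell(\mathbf{a})$, that is, $f(\mathbf{a}) = T(\mathbf{a}) + b$. Thus
$b = f(\mathbf{a}) - T(\mathbf{a})$, and $\ell(\mathbf{x}) = T(\mathbf{x}) + f(\mathbf{a}) - T(\mathbf{a})$. Using the fact that $T$ is linear, this becomes:

\[ \ell(\mathbf{x}) = f(\mathbf{a}) + T(\mathbf{x} - \mathbf{a}). \]

This happens to have the same form as the tangent line (4.1), which seems like a good sign.
Substituting into equation (4.3), we say, still provisionally, that $f$ is differentiable at $\mathbf{a}$ if there
exists a linear transformation $T \colon \mathbb{R}^n \to \mathbb{R}$ such that:
\[ \lim_{\mathbf{x} \to \mathbf{a}} \frac{f(\mathbf{x}) - f(\mathbf{a}) - T(\mathbf{x} - \mathbf{a})}{\lVert \mathbf{x} - \mathbf{a} \rVert} = 0. \tag{4.4} \]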

There is really only one choice for $T$ as well. For, as a linear transformation, $T$ is represented
by a matrix, in this case, a 1 by $n$ matrix $A = \begin{bmatrix} m_1 & m_2 & \dots & m_n \end{bmatrix}$, where $m_j = T(\mathbf{e}_j)$ for
$j = 1, 2, \dots, n$. (See Proposition 1.7.) We shall determine the values of the entries $m_j$.
Suppose that $\mathbf{x}$ approaches $\mathbf{a}$ in the $x_1$-direction, i.e., let $\mathbf{x} = \mathbf{a} + h\mathbf{e}_1$, and let $h$ go to $0$. Then
$\mathbf{x} - \mathbf{a} = h\mathbf{e}_1$, and $\lVert \mathbf{x} - \mathbf{a} \rVert = |h|$. Assume for a moment that $h > 0$. In order for equation (4.4) to
hold, it must be true that:

\[ \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{e}_1) - f(\mathbf{a}) - T(h\mathbf{e}_1)}{h} = 0; \]
or, using the linearity of $T$:

\[ \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{e}_1) - f(\mathbf{a}) - hT(\mathbf{e}_1)}{h} = 0. \tag{4.5} \]

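If $h < 0$, then the denominator in equation (4.4) is $|h| = -h$. The minus sign can be factored out
and canceled, so equation (4.5) still holds. This can be solved for $T(\mathbf{e}_1)$:

\begin{align*}
m_1 = T(\mathbf{e}_1) &= \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{e}_1) - f(\mathbf{a})}{h} \\
&= \lim_{h \to 0} \frac{f(a_1 + h, a_2, \dots, a_n) - f(a_1, a_2, \dots, a_n)}{h}.
\end{align*}

This limit of a difference quotient is the definition of the ordinary one-variable derivative where
$x_2, \dots, x_n$ are held fixed at $x_2 = a_2, \dots, x_n = a_n$, and only $x_1$ varies. It is called the partial
derivative $\frac{\partial f}{\partial x_1}(\mathbf{a})$.
Similarly, by approaching $\mathbf{a}$ in the $x_2, \dots, x_n$ directions, we find that $m_2 = \frac{\partial f}{\partial x_2}(\mathbf{a}), \dots, m_n = \frac{\partial f}{\partial x_n}(\mathbf{a})$
and hence that $A = \begin{bmatrix} \frac{\partial f}{\partial x_1}(\mathbf{a}) & \frac{\partial f}{\partial x_2}(\mathbf{a}) & \dots & \frac{\partial f}{\partial x_n}(\mathbf{a}) \end{bmatrix}$.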

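Definition. For a real-valued function $f \colon U \to \mathbb{R}$ defined on an open set $U$ in $\mathbb{R}^n$ and a point $\mathbf{a}$ in
$U$:

1. If $j = 1, 2, \dots, n$, the partial derivative of $f$ at $\mathbf{a}$ with respect to $x_j$ is defined by:

\[ \frac{\partial f}{\partial x_j}(\mathbf{a}) = \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{e}_j) - f(\mathbf{a})}{h}. \]

Note that $\mathbf{a} + h\mathbf{e}_j = (a_1, \dots, a_j + h, \dots, a_n)$, so $\mathbf{a} + h\mathbf{e}_j$ and $\mathbf{a}$ differ only in the $j$th coordinate.
Thus the partial derivative is defined by the one-variable difference quotient for the derivative
with variable $x_j$.
Other common notations for the partial derivative are $f_{x_j}(\mathbf{a})$ and $(D_j f)(\mathbf{a})$.

2. $Df(\mathbf{a})$ is defined to be the 1 by $n$ matrix $Df(\mathbf{a}) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(\mathbf{a}) & \frac{\partial f}{\partial x_2}(\mathbf{a}) & \dots & \frac{\partial f}{\partial x_n}(\mathbf{a}) \end{bmatrix}$. The
corresponding vector $\nabla f(\mathbf{a}) = \left( \frac{\partial f}{\partial x_1}(\mathbf{a}), \frac{\partial f}{\partial x_2}(\mathbf{a}), \dots, \frac{\partial f}{\partial x_n}(\mathbf{a}) \right)$ in $\mathbb{R}^n$ is called the gradient of
$f$. (The intent of the notation might be clearer if these objects were denoted by $(Df)(\mathbf{a})$ and
$(\nabla f)(\mathbf{a})$, respectively, but the extra parentheses are usually omitted.)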

Example 4.1. Let $f(x, y) = x^3 + 2x^2 y - 3y^2$, and let $\mathbf{a} = (2, 1)$. To find $\frac{\partial f}{\partial x}(\mathbf{a})$, we fix $y = 1$ and
differentiate with respect to $x$. Since $f(x, 1) = x^3 + 2x^2 - 3$, this gives $\frac{\partial f}{\partial x}(x, 1) = 3x^2 + 4x$ and
then $\frac{\partial f}{\partial x}(2, 1) = 12 + 8 = 20$.
Similarly, for the partial derivative at $\mathbf{a}$ with respect to $y$, $f(2, y) = 8 + 8y - 3y^2$, so $\frac{\partial f}{\partial y}(2, y) =
8 - 6y$ and $\frac{\partial f}{\partial y}(2, 1) = 8 - 6 = 2$.
In practice, this is not how partial derivatives are usually calculated. Instead, one works directly
with the general formula for $f$. For instance, to find the partial derivative with respect to $x$,
differentiate with respect to $x$ thinking of all other variables (in this case, only $y$) as constant:
$\frac{\partial f}{\partial x} = 3x^2 + 4xy - 0 = 3x^2 + 4xy$, so $\frac{\partial f}{\partial x}(2, 1) = 12 + 8 = 20$ as before. Likewise, $\frac{\partial f}{\partial y} = 2x^2 - 6y$, and
$\frac{\partial f}{\partial y}(2, 1) = 8 - 6 = 2$.
In any case, $Df(\mathbf{a}) = \begin{bmatrix} 20 & 2 \end{bmatrix}$, and $\nabla f(\mathbf{a}) = (20, 2)$.
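The values in Example 4.1 can be sanity-checked numerically. The sketch below is only an illustration: the helper name `partial` and the step size `h` are assumptions made here, and a central difference quotient stands in for the limit in the definition of the partial derivative.

```python
def f(x, y):
    # The function from Example 4.1.
    return x**3 + 2 * x**2 * y - 3 * y**2


def partial(func, point, j, h=1e-6):
    """Central-difference estimate of the j-th partial derivative at point.

    Only the j-th coordinate is changed; the others are held fixed,
    which mirrors the definition of the partial derivative.
    """
    forward = list(point)
    backward = list(point)
    forward[j] += h
    backward[j] -= h
    return (func(*forward) - func(*backward)) / (2 * h)


a = (2.0, 1.0)
gradient_estimate = [partial(f, a, j) for j in range(2)]
print(gradient_estimate)  # approximately [20, 2]
```

The estimates should agree with $\nabla f(\mathbf{a}) = (20, 2)$ up to a small numerical error.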

We now have all the ingredients needed to state the formal definition of differentiability, as
motivated by equation (4.4).


Definition. Let $U$ be an open set in $\mathbb{R}^n$, $f \colon U \to \mathbb{R}$ a real-valued function, and $\mathbf{a}$ a point of $U$.
Then $f$ is said to be differentiable at $\mathbf{a}$ if:
\[ \lim_{\mathbf{x} \to \mathbf{a}} \frac{f(\mathbf{x}) - f(\mathbf{a}) - Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})}{\lVert \mathbf{x} - \mathbf{a} \rVert} = 0. \]
When this happens, the matrix $Df(\mathbf{a})$ is called the derivative of $f$ at $\mathbf{a}$. It is also known as the
Jacobian matrix. The affine function $\ell(\mathbf{x}) = f(\mathbf{a}) + Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})$ is called the first-order affine
approximation of $f$ at $\mathbf{a}$. (Note the resemblance to the one-variable tangent line approximation
(4.1).)
The product in the numerator of the definition is the matrix product:
\[
Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(\mathbf{a}) & \frac{\partial f}{\partial x_2}(\mathbf{a}) & \dots & \frac{\partial f}{\partial x_n}(\mathbf{a}) \end{bmatrix}
\begin{bmatrix} x_1 - a_1 \\ x_2 - a_2 \\ \vdots \\ x_n - a_n \end{bmatrix}.
\]
It could also be written as a dot product: $\nabla f(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})$.
The definition generalizes equation (4.2) for one variable in that it says that $f$ is differentiable
at $\mathbf{a}$ if the error in using the first-order approximation $\ell(\mathbf{x})$ goes to $0$ faster than $\lVert \mathbf{x} - \mathbf{a} \rVert$ as $\mathbf{x}$
approaches $\mathbf{a}$. In particular, the first-order approximation is a good approximation. This is worth
emphasizing: the existence of a good first-order approximation is often the most important way to
think of what it means for a function to be differentiable. Note that the derivative $Df(\mathbf{a})$ itself is
no longer just a number the way it is in first-year calculus, but rather a matrix or, even better, the
linear part of a good affine approximation of $f$ near $\mathbf{a}$.
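Example 4.2. Let $f \colon \mathbb{R}^2 \to \mathbb{R}$ be $f(x, y) = x^2 + y^2$, and let $\mathbf{a} = (1, 2)$. Is $f$ differentiable at $\mathbf{a}$?
We calculate the various elements that go into the definition. First, $\frac{\partial f}{\partial x} = 2x$ and $\frac{\partial f}{\partial y} = 2y$, so
$Df(x, y) = \begin{bmatrix} 2x & 2y \end{bmatrix}$ and $Df(\mathbf{a}) = Df(1, 2) = \begin{bmatrix} 2 & 4 \end{bmatrix}$. Also, $f(\mathbf{a}) = 5$, and $\mathbf{x} - \mathbf{a} = \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix}$. Thus
we need to check if:
\[
\lim_{(x,y) \to (1,2)} \frac{x^2 + y^2 - 5 - \begin{bmatrix} 2 & 4 \end{bmatrix} \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix}}{\sqrt{(x - 1)^2 + (y - 2)^2}} = 0.
\]
Let’s simplify the numerator, completing the square at an appropriate stage:
\begin{align*}
x^2 + y^2 - 5 - \begin{bmatrix} 2 & 4 \end{bmatrix} \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix}
&= x^2 + y^2 - 5 - \bigl( 2(x - 1) + 4(y - 2) \bigr) \\
&= x^2 + y^2 - 5 - (2x - 2 + 4y - 8) \\
&= x^2 - 2x + y^2 - 4y + 5 \\
&= (x^2 - 2x + 1) - 1 + (y^2 - 4y + 4) - 4 + 5 \\
&= (x - 1)^2 + (y - 2)^2.
\end{align*}
Therefore:
\begin{align*}
\lim_{(x,y) \to (1,2)} \frac{x^2 + y^2 - 5 - \begin{bmatrix} 2 & 4 \end{bmatrix} \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix}}{\sqrt{(x - 1)^2 + (y - 2)^2}}
&= \lim_{(x,y) \to (1,2)} \frac{(x - 1)^2 + (y - 2)^2}{\sqrt{(x - 1)^2 + (y - 2)^2}} \\
&= \lim_{(x,y) \to (1,2)} \sqrt{(x - 1)^2 + (y - 2)^2}.
\end{align*}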

The function in this last expression is continuous, as it is a composition of sums and products of
continuous pieces. Thus the value of the limit is simply the value of the function at the point $(1, 2)$.
In other words:
\[
\lim_{(x,y) \to (1,2)} \frac{x^2 + y^2 - 5 - \begin{bmatrix} 2 & 4 \end{bmatrix} \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix}}{\sqrt{(x - 1)^2 + (y - 2)^2}} = \sqrt{(1 - 1)^2 + (2 - 2)^2} = 0.
\]

This shows that the answer is: yes, $f$ is differentiable at $\mathbf{a} = (1, 2)$.

In addition, the preceding calculations show that the first-order approximation of $f(x, y) =
x^2 + y^2$ at $\mathbf{a} = (1, 2)$ is:

\begin{align*}
\ell(x, y) &= f(\mathbf{a}) + Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) \\
&= 5 + \begin{bmatrix} 2 & 4 \end{bmatrix} \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix} \\
&= 2x + 4y - 5. \tag{4.6}
\end{align*}

For instance, $(x, y) = (1.05, 1.95)$ is near $\mathbf{a}$, and $f(1.05, 1.95) = 1.05^2 + 1.95^2 = 4.905$ while
$\ell(1.05, 1.95) = 2(1.05) + 4(1.95) - 5 = 4.9$. The values are pretty close.
The same sort of reasoning can be used to show that, more generally, the function $f(x, y) =
x^2 + y^2$ is differentiable at every point of $\mathbb{R}^2$. This is Exercise 1.16.
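As a rough numerical illustration of Example 4.2, the following sketch evaluates the quotient from the definition of differentiability at points approaching $(1, 2)$. The function names `ell` and `error_ratio` and the choice of approach direction are assumptions made for this illustration; checking a few sample points suggests, but does not prove, that the limit is $0$.

```python
import math


def f(x, y):
    return x**2 + y**2


def ell(x, y):
    # First-order approximation (4.6) of f at a = (1, 2).
    return 2 * x + 4 * y - 5


def error_ratio(x, y):
    # |f(x, y) - ell(x, y)| divided by the distance from (x, y) to (1, 2),
    # i.e. the absolute value of the quotient in the definition.
    return abs(f(x, y) - ell(x, y)) / math.hypot(x - 1, y - 2)


# Approach (1, 2) along the hypothetical direction (1, -1).
for t in (0.1, 0.01, 0.001):
    print(t, error_ratio(1 + t, 2 - t))

# The comparison made in the text.
print(f(1.05, 1.95), ell(1.05, 1.95))
```

Since the numerator simplifies to $(x - 1)^2 + (y - 2)^2$, the printed ratios equal the distance to $(1, 2)$ and shrink in proportion to `t`.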

Example 4.3. Let
\[
f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases}
\]
Its graph is shown in Figure 4.2. Is $f$ differentiable at $\mathbf{a} = (0, 0)$?

Figure 4.2: The graph $z = \frac{xy}{x^2 + y^2}$

Using the definition requires knowing the partial derivatives at $(0, 0)$. We could apply the
quotient rule to the formula that defines $f$, but the result would be valid only when $(x, y) \neq (0, 0)$,
which is precisely what we don’t need. So instead we go back to the basic idea behind partial
derivatives, namely, that they are one-variable derivatives where all variables but one are held
constant. For example, to find $\frac{\partial f}{\partial x}(0, 0)$, fix $y = 0$ in $f(x, y)$ and look at the resulting function of
$x$. By the formula for $f$, $f(x, 0) = \frac{x \cdot 0}{x^2 + 0^2} = 0$ if $x \neq 0$. This expression also holds when $x = 0$:
by definition, $f(0, 0) = 0$. Thus $f(x, 0) = 0$ for all $x$. Differentiating this with respect to $x$ gives
$\frac{\partial f}{\partial x}(x, 0) = \frac{d}{dx}\, 0 = 0$, whence $\frac{\partial f}{\partial x}(0, 0) = 0$. A symmetric calculation results in $\frac{\partial f}{\partial y}(0, 0) = 0$. Hence
$Df(0, 0) = \begin{bmatrix} 0 & 0 \end{bmatrix}$.

   
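Moreover, $\mathbf{x} - \mathbf{a} = \begin{bmatrix} x - 0 \\ y - 0 \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}$, and $\lVert \mathbf{x} - \mathbf{a} \rVert = \sqrt{x^2 + y^2}$. Thus, according to the definition
of differentiability, we need to check if:
\[
\lim_{(x,y) \to (0,0)} \frac{\frac{xy}{x^2 + y^2} - 0 - \begin{bmatrix} 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}}{\sqrt{x^2 + y^2}} = 0,
\]

i.e., if $\lim_{(x,y) \to (0,0)} \frac{xy}{(x^2 + y^2)^{3/2}} = 0$.

For this, suppose we approach the origin along the line $y = x$. Then we are looking at the
behavior of $\frac{x \cdot x}{(x^2 + x^2)^{3/2}} = \frac{x^2}{2^{3/2} |x|^3} = \frac{1}{2^{3/2} |x|}$. This blows up as $x$ goes to $0$. It certainly doesn’t
approach $0$. Hence there’s no way that $\lim_{(x,y) \to (0,0)} \frac{xy}{(x^2 + y^2)^{3/2}}$ can equal $0$ (in fact, the limit does
not exist), so $f$ is not differentiable at $(0, 0)$.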
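The blow-up along the line $y = x$ is easy to see numerically. The sketch below is only an illustration; the name `quotient` is an assumption made here for the expression $\frac{xy}{(x^2 + y^2)^{3/2}}$ that appears in the definition of differentiability at the origin.

```python
def quotient(x, y):
    # The quotient from the definition of differentiability at (0, 0):
    # (f(x, y) - f(0, 0) - Df(0, 0) . (x, y)) / ||(x, y)||
    # which simplifies to x*y / (x^2 + y^2)^(3/2) away from the origin.
    return x * y / (x**2 + y**2) ** 1.5


for x in (0.1, 0.01, 0.001):
    # Second column: approach along y = x. Third column: approach along the x-axis.
    print(x, quotient(x, x), quotient(x, 0.0))
```

Along $y = x$ the values grow like $\frac{1}{2^{3/2}|x|}$, while along the $x$-axis they are identically $0$, so the quotient has no limit at the origin.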

This example shows that a function of more than one variable can fail to be differentiable at a
point even though all of its partial derivatives exist there. This differs from the one-variable case.

4.2 Conditions for differentiability


We now look for ways to tell if a function is differentiable without having to do all the work of
going through the definition. As we saw in the last example, calculating the partial derivatives is
not enough. That’s bad news, because calculating partial derivatives is pretty easy. The good news
is that we are about to obtain results that show that the examples in the last section could have
been solved much more quickly using means other than the definition.
For the remainder of this section:
$U$ denotes an open set in $\mathbb{R}^n$, $f \colon U \to \mathbb{R}$ is a real-valued function defined on $U$, and
$\mathbf{a}$ is a point of $U$.
We won’t keep repeating this.
For instance, in Example 4.3 above, we asked if the function
\[
f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\ 0 & \text{if } (x, y) = (0, 0) \end{cases} \tag{4.7}
\]

were differentiable at $(0, 0)$. In fact, we had shown before that $f$ is not continuous at $(0, 0)$ (see
Example 3.19 in Chapter 3). In first-year calculus, discontinuous functions cannot be differentiable.
If the same were true for functions of more than one variable, we could have saved ourselves a lot
of work. We state the result in contrapositive form.

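Proposition 4.4. If $f$ is differentiable at $\mathbf{a}$, then $f$ is continuous at $\mathbf{a}$.

Proof. We just outline the idea and leave the details for the exercises (Exercise 2.3). If $f$ is
differentiable at $\mathbf{a}$, then $\lim_{\mathbf{x} \to \mathbf{a}} (f(\mathbf{x}) - \ell(\mathbf{x})) = 0$, where $\ell(\mathbf{x})$ is the first-order approximation at $\mathbf{a}$.
In other words:
\[ \lim_{\mathbf{x} \to \mathbf{a}} \bigl( f(\mathbf{x}) - f(\mathbf{a}) - Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) \bigr) = 0. \]

The matrix product term $Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})$ goes to $0$ as $\mathbf{x}$ approaches $\mathbf{a}$ (it’s a continuous function of
$\mathbf{x}$), so the last line becomes $\lim_{\mathbf{x} \to \mathbf{a}} (f(\mathbf{x}) - f(\mathbf{a}) - 0) = 0$. Hence $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = f(\mathbf{a})$. As a result,
$f$ is continuous at $\mathbf{a}$.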

Example 4.5. Returning to function (4.7) above, we could have simply said from the start that f
is not continuous at (0, 0) and therefore is not differentiable there either, avoiding the definition of
differentiability entirely.
Conversely, we know from first-year calculus that a function can be continuous without being
differentiable. The absolute value function f (x) = |x| is the standard example. It is continuous,
but, because its graph has a corner, it is not differentiable at the origin. Sometimes, we can use
the failure of a one-variable derivative to tell us about the differentiability of a function of more
than one variable.
Proposition 4.6. If some partial derivative $\frac{\partial f}{\partial x_j}$ does not exist at $\mathbf{a}$, then $f$ is not differentiable at
$\mathbf{a}$.
Proof. This follows simply because if $\frac{\partial f}{\partial x_j}(\mathbf{a})$ doesn’t exist, then neither does $Df(\mathbf{a})$, so the condition
for differentiability cannot be satisfied.
Example 4.7. One analogue of the absolute value function is the function $f \colon \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) =
\lVert (x, y) \rVert = \sqrt{x^2 + y^2}$. We have seen that its graph is a cone with vertex at the origin (Example
3.2).
To find $\frac{\partial f}{\partial x}$ at the origin, we fix $y = 0$ to get $f(x, 0) = \sqrt{x^2} = |x|$. The derivative with respect to
$x$ does not exist at $x = 0$, so, by definition, $\frac{\partial f}{\partial x}(0, 0)$ does not exist either. Thus $f$ is not differentiable
at $(0, 0)$ (though it is continuous there).
We have saved the most useful criterion for differentiability for last. To understand why it is
true, however, we take a slight detour.

4.3 The mean value theorem


We devote this section to reviewing an important, and possibly underappreciated, result from
first-year calculus.
Theorem 4.8 (Mean Value Theorem). Let $f \colon [a, b] \to \mathbb{R}$ be a continuous function that is differentiable
on the open interval $(a, b)$. Then there exists a point $c$ in $(a, b)$ such that:
\[ f(b) - f(a) = f'(c) \cdot (b - a). \]
The theorem relates the difference in function values $f(b) - f(a)$ to the difference in domain
values $b - a$. The relation can be useful even if one cannot completely pin down the location of the
point $c$ (in fact, one rarely tries). For instance, if $f(x) = \sin x$, then $|f'(c)| = |\cos c| \leq 1$ regardless
of what $c$ is. Hence the mean value theorem gives $|\sin b - \sin a| \leq |b - a|$ for all $a, b$. By the same
reasoning, $|\cos b - \cos a| \leq |b - a|$ as well.
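The inequality $|\sin b - \sin a| \leq |b - a|$ can be spot-checked on random inputs. This is only an illustrative sketch: the helper name `lipschitz_gap`, the sampling range, and the seed are assumptions made here, and a finite sample is evidence rather than a proof.

```python
import math
import random


def lipschitz_gap(a, b):
    # |b - a| - |sin b - sin a|; the mean value theorem predicts this is >= 0.
    return abs(b - a) - abs(math.sin(b) - math.sin(a))


random.seed(0)  # fixed seed so the run is reproducible
samples = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(1000)]
smallest_gap = min(lipschitz_gap(a, b) for a, b in samples)
print(smallest_gap)  # expected to be nonnegative, up to rounding error
```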
There is an n-variable version of the mean value theorem that follows from results we prove
later (see Exercise 5.5), but for now we show how the one-variable theorem can be used to study a
difference in function values for a function of two variables by allowing only one variable to change
at a time and using partial derivatives.
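Example 4.9. Let $f(x, y)$ be a continuous real-valued function of two variables that is defined
on an open ball $B$ in $\mathbb{R}^2$ and whose partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist at every point of $B$. Let
$\mathbf{a} = (a_1, a_2)$ and $\mathbf{b} = (b_1, b_2)$ be points of $B$. We show that there exist points $\mathbf{p}$ and $\mathbf{q}$ in $B$ such
that $\lVert \mathbf{p} - \mathbf{a} \rVert \leq \lVert \mathbf{b} - \mathbf{a} \rVert$, $\lVert \mathbf{q} - \mathbf{a} \rVert \leq \lVert \mathbf{b} - \mathbf{a} \rVert$, and:
\[ f(\mathbf{b}) - f(\mathbf{a}) = \frac{\partial f}{\partial x}(\mathbf{p}) \cdot (b_1 - a_1) + \frac{\partial f}{\partial y}(\mathbf{q}) \cdot (b_2 - a_2). \tag{4.8} \]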


Figure 4.3: Going from a to b by way of c

See Figure 4.3.


To prove this, we introduce the intermediate point $\mathbf{c} = (b_1, a_2)$ and think of going from $\mathbf{a}$ to $\mathbf{b}$
by first going from $\mathbf{a}$ to $\mathbf{c}$ and then from $\mathbf{c}$ to $\mathbf{b}$, as in Figure 4.3. Only one variable varies along
each of the segments, so by writing

\[ f(\mathbf{b}) - f(\mathbf{a}) = \bigl( f(\mathbf{b}) - f(\mathbf{c}) \bigr) + \bigl( f(\mathbf{c}) - f(\mathbf{a}) \bigr), \tag{4.9} \]

the mean value theorem applies to each of the differences on the right.
That is, in the second difference, using $x$ as variable and $y = a_2$ fixed, the mean value theorem
implies that there exists a point $\mathbf{p} = (p_1, a_2)$ on the line segment between $\mathbf{a}$ and $\mathbf{c}$ such that:

\[ f(\mathbf{c}) - f(\mathbf{a}) = \frac{\partial f}{\partial x}(\mathbf{p}) \cdot (b_1 - a_1). \tag{4.10} \]
Similarly, for the first difference, using $y$ as variable and $x = b_1$ fixed, there exists a point $\mathbf{q} = (b_1, q_2)$
on the line segment between $\mathbf{c}$ and $\mathbf{b}$ such that:

\[ f(\mathbf{b}) - f(\mathbf{c}) = \frac{\partial f}{\partial y}(\mathbf{q}) \cdot (b_2 - a_2). \tag{4.11} \]
Substituting (4.10) and (4.11) into equation (4.9) results in $f(\mathbf{b}) - f(\mathbf{a}) = \frac{\partial f}{\partial y}(\mathbf{q}) \cdot (b_2 - a_2) + \frac{\partial f}{\partial x}(\mathbf{p}) \cdot
(b_1 - a_1)$, which is the desired conclusion (4.8).

4.4 The $C^1$ test


We present the result that often gives the simplest way to show that a function is differentiable. It
uses partial derivatives.

Definition. Let $U$ be an open set in $\mathbb{R}^n$. A real-valued function $f \colon U \to \mathbb{R}$ all of whose partial
derivatives $\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}$ are continuous on $U$ is said to be of class $C^1$, or continuously
differentiable, on $U$.

Theorem 4.10 (The $C^1$ test). If $f$ is of class $C^1$ on $U$, then $f$ is differentiable at every point of
$U$.

Proof. We give the proof in the case of two variables, that is, $U \subset \mathbb{R}^2$. This contains the main new
idea, and we leave for the reader the generalization to $n$ variables, which is a relatively straightforward
extension.

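We use the definition of differentiability. Let $\mathbf{a}$ be a point of $U$. We first analyze the error
$f(\mathbf{x}) - f(\mathbf{a}) - Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})$ in the first-order approximation of $f$ at $\mathbf{a}$. Since $U$ is an open set and
we are interested only in what happens when $\mathbf{x}$ is near $\mathbf{a}$, we may assume that $\mathbf{x}$ lies in an open
ball about $\mathbf{a}$ that is contained in $U$. Say $\mathbf{a} = (c, d)$ and $\mathbf{x} = (x, y)$. By the result we just obtained
in Example 4.9, there are points $\mathbf{p}$ and $\mathbf{q}$ such that $\lVert \mathbf{p} - \mathbf{a} \rVert \leq \lVert \mathbf{x} - \mathbf{a} \rVert$, $\lVert \mathbf{q} - \mathbf{a} \rVert \leq \lVert \mathbf{x} - \mathbf{a} \rVert$, and:
\[ f(\mathbf{x}) - f(\mathbf{a}) = \frac{\partial f}{\partial x}(\mathbf{p}) \cdot (x - c) + \frac{\partial f}{\partial y}(\mathbf{q}) \cdot (y - d). \]
Also:
\begin{align*}
Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) &= \begin{bmatrix} \frac{\partial f}{\partial x}(\mathbf{a}) & \frac{\partial f}{\partial y}(\mathbf{a}) \end{bmatrix} \begin{bmatrix} x - c \\ y - d \end{bmatrix} \\
&= \frac{\partial f}{\partial x}(\mathbf{a}) \cdot (x - c) + \frac{\partial f}{\partial y}(\mathbf{a}) \cdot (y - d).
\end{align*}
Hence:
\[
f(\mathbf{x}) - f(\mathbf{a}) - Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) = \left( \frac{\partial f}{\partial x}(\mathbf{p}) - \frac{\partial f}{\partial x}(\mathbf{a}) \right) \cdot (x - c) + \left( \frac{\partial f}{\partial y}(\mathbf{q}) - \frac{\partial f}{\partial y}(\mathbf{a}) \right) \cdot (y - d),
\]
so:
\begin{align*}
\frac{|f(\mathbf{x}) - f(\mathbf{a}) - Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})|}{\lVert \mathbf{x} - \mathbf{a} \rVert}
&= \frac{\left| \left( \frac{\partial f}{\partial x}(\mathbf{p}) - \frac{\partial f}{\partial x}(\mathbf{a}) \right) \cdot (x - c) + \left( \frac{\partial f}{\partial y}(\mathbf{q}) - \frac{\partial f}{\partial y}(\mathbf{a}) \right) \cdot (y - d) \right|}{\lVert \mathbf{x} - \mathbf{a} \rVert} \\
&\leq \left| \frac{\partial f}{\partial x}(\mathbf{p}) - \frac{\partial f}{\partial x}(\mathbf{a}) \right| \cdot \frac{|x - c|}{\lVert \mathbf{x} - \mathbf{a} \rVert} + \left| \frac{\partial f}{\partial y}(\mathbf{q}) - \frac{\partial f}{\partial y}(\mathbf{a}) \right| \cdot \frac{|y - d|}{\lVert \mathbf{x} - \mathbf{a} \rVert},
\end{align*}

where the last step uses the triangle inequality. Note that $\frac{|x - c|}{\lVert \mathbf{x} - \mathbf{a} \rVert} = \frac{\sqrt{(x - c)^2}}{\sqrt{(x - c)^2 + (y - d)^2}} \leq 1$. Similarly,
$\frac{|y - d|}{\lVert \mathbf{x} - \mathbf{a} \rVert} \leq 1$. Therefore:

\[
\frac{|f(\mathbf{x}) - f(\mathbf{a}) - Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})|}{\lVert \mathbf{x} - \mathbf{a} \rVert} \leq \left| \frac{\partial f}{\partial x}(\mathbf{p}) - \frac{\partial f}{\partial x}(\mathbf{a}) \right| + \left| \frac{\partial f}{\partial y}(\mathbf{q}) - \frac{\partial f}{\partial y}(\mathbf{a}) \right|. \tag{4.12}
\]
At last, we bring in the assumption that $f$ has continuous partial derivatives. Since $\lVert \mathbf{p} - \mathbf{a} \rVert \leq
\lVert \mathbf{x} - \mathbf{a} \rVert$ and $\lVert \mathbf{q} - \mathbf{a} \rVert \leq \lVert \mathbf{x} - \mathbf{a} \rVert$, it follows that, as $\mathbf{x}$ approaches $\mathbf{a}$, so do $\mathbf{p}$ and $\mathbf{q}$. By the continuity
of the partial derivatives, this means that $\frac{\partial f}{\partial x}(\mathbf{p})$ approaches $\frac{\partial f}{\partial x}(\mathbf{a})$ and $\frac{\partial f}{\partial y}(\mathbf{q})$ approaches $\frac{\partial f}{\partial y}(\mathbf{a})$.
Hence, in the limit as $\mathbf{x}$ goes to $\mathbf{a}$, the right side of (4.12) approaches $0$, and therefore so must the
left. By definition, $f$ is differentiable at $\mathbf{a}$.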

Example 4.11. In Example 4.2, we showed that the function $f \colon \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = x^2 + y^2$, is
differentiable at the point $\mathbf{a} = (1, 2)$ by using the definition of differentiability. It is much simpler to
note that $\frac{\partial f}{\partial x} = 2x$ and $\frac{\partial f}{\partial y} = 2y$ are both continuous on $\mathbb{R}^2$. Hence, by the $C^1$ test, $f$ is differentiable
at every point of $\mathbb{R}^2$.
Example 4.12. The other example we looked at carefully is the function
\[
f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\ 0 & \text{if } (x, y) = (0, 0) \end{cases}
\]

of (4.7). We saw that $f$ is not differentiable at the origin. On the other hand, at points other than
the origin, the partial derivatives can be found using the quotient rule. This gives:
\begin{align*}
\frac{\partial f}{\partial x} &= \frac{(x^2 + y^2) \cdot y - xy \cdot 2x}{(x^2 + y^2)^2} = \frac{y^3 - x^2 y}{(x^2 + y^2)^2} = \frac{y(y^2 - x^2)}{(x^2 + y^2)^2} \\
\text{and} \quad \frac{\partial f}{\partial y} &= \frac{(x^2 + y^2) \cdot x - xy \cdot 2y}{(x^2 + y^2)^2} = \frac{x(x^2 - y^2)}{(x^2 + y^2)^2}.
\end{align*}
As algebraic combinations of $x$ and $y$, both of these partial derivatives are continuous on the open
set $\mathbb{R}^2 - \{(0, 0)\}$. As a result, the $C^1$ test says that $f$ is differentiable at all points other than the
origin.

4.5 The Little Chain Rule


In one-variable calculus, the derivative measures the rate of change of a function. When there is
more than one variable, the derivative is no longer just a number, but it still has something to say
about rates of change.
Let $U$ be an open set in $\mathbb{R}^n$, and let $f \colon U \to \mathbb{R}$ be a real-valued function defined on $U$. Suppose
that $\alpha \colon I \to U$ is a path in $U$ defined on an interval $I$ in $\mathbb{R}$. The composition $f \circ \alpha \colon I \to \mathbb{R}$ is a
real-valued function of one variable that describes how $f$ behaves along the path. We examine the
rate of change of this composition.
Let $t_0$ be a point of $I$, and let $\mathbf{a} = \alpha(t_0)$. If $f$ is differentiable at $\mathbf{a}$, then its first-order
approximation applies: $f(\mathbf{x}) \approx f(\mathbf{a}) + \nabla f(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})$ or:

\[ f(\mathbf{x}) - f(\mathbf{a}) \approx \nabla f(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) \quad \text{when $\mathbf{x}$ is near $\mathbf{a}$.} \]

Here, we use the gradient form of the derivative.

Assume in addition that $\alpha$ is differentiable at $t_0$. Then $\alpha$ is continuous at $t_0$, so, when $t$ is near
$t_0$, $\alpha(t)$ is near $\alpha(t_0)$. Substituting $\mathbf{x} = \alpha(t)$ and $\mathbf{a} = \alpha(t_0)$ into the previous approximation gives:

\[ f(\alpha(t)) - f(\alpha(t_0)) \approx \nabla f(\alpha(t_0)) \cdot (\alpha(t) - \alpha(t_0)) \quad \text{when $t$ is near $t_0$.} \]

Thus $\frac{f(\alpha(t)) - f(\alpha(t_0))}{t - t_0} \approx \nabla f(\alpha(t_0)) \cdot \frac{\alpha(t) - \alpha(t_0)}{t - t_0}$. In the limit as $t$ approaches $t_0$, the left side approaches
the one-variable derivative $(f \circ \alpha)'(t_0)$ and the right approaches $\nabla f(\alpha(t_0)) \cdot \alpha'(t_0)$. Writing $t$ in
place of $t_0$, this gives the following result.

Theorem 4.13 (Little Chain Rule). Let $U$ be an open set in $\mathbb{R}^n$, $\alpha \colon I \to U$ a path in $U$ defined
on an open interval $I$ in $\mathbb{R}$, and $f \colon U \to \mathbb{R}$ a real-valued function. If $\alpha$ is differentiable at $t$ and $f$
is differentiable at $\alpha(t)$, then $f \circ \alpha$ is differentiable at $t$ and:

\[ (f \circ \alpha)'(t) = \nabla f(\alpha(t)) \cdot \alpha'(t). \]

In words, the derivative of the composition is “gradient dot velocity.” While we outlined the
idea why this is true, a complete proof of the result as stated entails a more careful treatment of
the approximations involved. The details appear in the exercises (see Exercise 5.6).
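The "gradient dot velocity" formula can be compared against a direct difference quotient. In the sketch below, the function $f(x, y) = x^2 + y^2$ is taken from Example 4.2, while the path $\alpha(t) = (t^2, 3t)$, the sample time, and the helper names are hypothetical choices made only for this illustration.

```python
def f(x, y):
    return x**2 + y**2


def grad_f(x, y):
    # Gradient of f, computed by hand: (2x, 2y).
    return (2 * x, 2 * y)


def alpha(t):
    # Hypothetical path chosen for illustration.
    return (t**2, 3 * t)


def alpha_prime(t):
    # Velocity of the path, computed by hand.
    return (2 * t, 3.0)


def chain_rule_side(t):
    # Right side of the Little Chain Rule: gradient dot velocity.
    g = grad_f(*alpha(t))
    v = alpha_prime(t)
    return g[0] * v[0] + g[1] * v[1]


def difference_quotient(t, h=1e-6):
    # Central-difference estimate of the one-variable derivative (f o alpha)'(t).
    return (f(*alpha(t + h)) - f(*alpha(t - h))) / (2 * h)


t = 1.5
print(chain_rule_side(t), difference_quotient(t))
```

Here $(f \circ \alpha)(t) = t^4 + 9t^2$, so both numbers should be close to $4t^3 + 18t$ evaluated at the chosen $t$.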

4.6 Directional derivatives


We present two interpretations of the gradient that follow from the Little Chain Rule. The first
concerns the rate of change of a function of n variables in a given direction. Let a be a point
of an open set $U$ in $\mathbb{R}^n$. To represent a direction going away from $\mathbf{a}$, we choose a unit vector $\mathbf{u}$
pointing in that direction. We travel in that direction on the line through $\mathbf{a}$ parallel to $\mathbf{u}$, which
is parametrized by $\alpha(t) = \mathbf{a} + t\mathbf{u}$. The parametrization has velocity $\alpha'(t) = \mathbf{u}$ and hence constant
speed $\lVert \mathbf{u} \rVert = 1$.
If $f \colon U \to \mathbb{R}$ is a real-valued function, then the derivative $(f \circ \alpha)'(t)$ is the rate of change of $f$
along this line. In particular, since $\alpha(0) = \mathbf{a}$, we think of $(f \circ \alpha)'(0)$ as the rate of change at $\mathbf{a}$ in
the direction of $\mathbf{u}$.
Definition. Given the input described above, $(f \circ \alpha)'(0)$ is called the directional derivative of
$f$ at $\mathbf{a}$ in the direction of the unit vector $\mathbf{u}$, denoted $(D_{\mathbf{u}} f)(\mathbf{a})$. If $\mathbf{v}$ is any nonzero vector in
$\mathbb{R}^n$, we define the directional derivative in the direction of $\mathbf{v}$ to be $(D_{\mathbf{u}} f)(\mathbf{a})$, where $\mathbf{u} = \frac{\mathbf{v}}{\lVert \mathbf{v} \rVert}$ is the
unit vector in the direction of $\mathbf{v}$.
Directional derivatives are easy to compute, for by the Little Chain Rule:

\[ (f \circ \alpha)'(0) = \nabla f(\alpha(0)) \cdot \alpha'(0) = \nabla f(\mathbf{a}) \cdot \mathbf{u}. \]

In other words:
Proposition 4.14. If $f$ is differentiable at $\mathbf{a}$ and $\mathbf{u}$ is a unit vector, then the directional derivative
is given by:
\[ (D_{\mathbf{u}} f)(\mathbf{a}) = \nabla f(\mathbf{a}) \cdot \mathbf{u}. \]
As a result, $(D_{\mathbf{u}} f)(\mathbf{a}) = \lVert \nabla f(\mathbf{a}) \rVert\, \lVert \mathbf{u} \rVert \cos \theta = \lVert \nabla f(\mathbf{a}) \rVert \cos \theta$, where $\theta$ is the angle between
$\nabla f(\mathbf{a})$ and the direction vector $\mathbf{u}$. Keeping $\mathbf{a}$ fixed but varying the direction, we see that $(D_{\mathbf{u}} f)(\mathbf{a})$
is maximized when $\theta = 0$, i.e., when $\mathbf{u}$ points in the same direction as $\nabla f(\mathbf{a})$, and that, in this
direction, $(D_{\mathbf{u}} f)(\mathbf{a}) = \lVert \nabla f(\mathbf{a}) \rVert$.
Corollary 4.15. The maximum directional derivative of $f$ at $\mathbf{a}$:
• occurs in the direction of $\nabla f(\mathbf{a})$ and
• has a value of $\lVert \nabla f(\mathbf{a}) \rVert$.
Sometimes, weather reports mention the “temperature gradient.” Usually, this does not refer to
the gradient vector that we have been studying but rather to one of the aspects of the gradient in
the corollary, that is, the magnitude and/or direction of the maximum rate of temperature increase.
Example 4.16. Let $f \colon \mathbb{R}^3 \to \mathbb{R}$ be $f(x, y, z) = x^2 + y^2 + z^2$, and let $\mathbf{a} = (1, 2, 3)$.
(a) Find the directional derivative of $f$ at $\mathbf{a}$ in the direction of $\mathbf{v} = (2, -1, 1)$.
(b) Find the direction and rate of maximum increase at $\mathbf{a}$.
(a) First, $\nabla f = (2x, 2y, 2z)$, so $\nabla f(\mathbf{a}) = (2, 4, 6)$. Next, the unit vector in the direction of $\mathbf{v}$ is
$\mathbf{u} = \frac{1}{\lVert \mathbf{v} \rVert} \mathbf{v} = \frac{1}{\sqrt{6}} (2, -1, 1)$. Thus:

\begin{align*}
(D_{\mathbf{u}} f)(\mathbf{a}) &= \nabla f(\mathbf{a}) \cdot \mathbf{u} \\
&= (2, 4, 6) \cdot \frac{1}{\sqrt{6}} (2, -1, 1) = \frac{1}{\sqrt{6}} (4 - 4 + 6) = \frac{6}{\sqrt{6}} = \sqrt{6}.
\end{align*}
(b) The direction of maximum increase is the direction of $\nabla f(\mathbf{a}) = (2, 4, 6)$, that is, the direction of
the unit vector $\mathbf{u} = \frac{1}{\sqrt{4 + 16 + 36}} (2, 4, 6) = \frac{1}{\sqrt{56}} (2, 4, 6) = \frac{1}{\sqrt{14}} (1, 2, 3)$. The rate of maximum increase
is $\lVert \nabla f(\mathbf{a}) \rVert = \sqrt{56} = 2\sqrt{14}$.
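The arithmetic in Example 4.16 can be reproduced with a few lines of code. This is a minimal sketch in which the helper names `dot`, `norm`, and `unit` are assumptions introduced here; the gradient $(2, 4, 6)$ is entered by hand from the example rather than computed.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def unit(v):
    # Unit vector in the direction of a nonzero vector v.
    n = norm(v)
    return tuple(vi / n for vi in v)


grad_at_a = (2.0, 4.0, 6.0)  # gradient of x^2 + y^2 + z^2 at a = (1, 2, 3)
v = (2.0, -1.0, 1.0)

directional = dot(grad_at_a, unit(v))  # part (a): gradient dot unit vector
max_rate = norm(grad_at_a)             # part (b): rate of maximum increase
max_direction = unit(grad_at_a)        # part (b): direction of maximum increase
print(directional, max_rate, max_direction)
```

The printed values should match $\sqrt{6}$, $2\sqrt{14}$, and $\frac{1}{\sqrt{14}}(1, 2, 3)$ up to rounding.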


4.7 ∇f as normal vector


Let f : U → R be a differentiable real-valued function of n variables defined on an open set U in
Rn , and let S be the level set of f corresponding to f = c. Let a be a point of S, and let v be a
vector tangent to S at a. We take this to mean that v can be realized as the velocity vector of a
path in S, that is, there is a differentiable path α : I → Rn that:

• lies in S, i.e., α(t) ∈ S for all t,


• passes through a, say a = α(t0 ) for some t0 in I, and
• has velocity v at a, i.e., α0 (t0 ) = v.

This is illustrated in Figure 4.4. Then f (α(t)) = c for all t, so (f ◦ α)0 (t) = c0 = 0. By the

Figure 4.4: A level set S of f and a vector v tangent to it

Little Chain Rule, this can be written ∇f (α(t)) · α0 (t) = 0. In particular, when t = t0 , we obtain
∇f (a) · v = 0. This is true for all vectors v tangent to S at a, which has the following consequence.

Proposition 4.17. If a is a point on a level set S of f , then ∇f (a) is a normal vector to S at a.

Example 4.18. Find an equation for the plane tangent to the graph $z = x^2 + y^2$ at the point
$(1, 2, 5)$.
As with the equation of any plane, it suffices to find a point $p$ on the plane and a normal vector
$n$. As a point, we can use the point of tangency, $p = (1, 2, 5)$. To find a normal vector, we rewrite
the graph as a level set. That is, rewrite $z = x^2 + y^2$ as $x^2 + y^2 - z = 0$. Then the graph is the level
set of the function of three variables $F(x, y, z) = x^2 + y^2 - z$ corresponding to $c = 0$. In particular,
$\nabla F(p)$ is a normal vector to the level set, that is, to the graph, at $p$, so it is normal to the tangent
plane as well. See Figure 4.5.
The rest is calculation: $\nabla F = (2x, 2y, -1)$, so $n = \nabla F(p) = (2, 4, -1)$ is our normal vector.
Substituting into the equation $n \cdot x = n \cdot p$ for a plane gives:

\[
(2, 4, -1) \cdot (x, y, z) = (2, 4, -1) \cdot (1, 2, 5) = 2 + 8 - 5 = 5,
\]

or $2x + 4y - z = 5$.

Note that we just studied the graph of the function $f(x, y) = x^2 + y^2$ of two variables, and
we considered the point $p = (1, 2, 5) = (a, f(a))$ on the graph, where $a = (1, 2)$. In the remarks
following Example 4.2, we saw that the first-order approximation of $f$ at $a$ is given by $\ell(x, y) =
2x + 4y - 5$ (see equation (4.6)). The graph of the approximation $\ell$ is given by $z = 2x + 4y - 5$,
which is exactly the tangent plane we just found.

Figure 4.5: The tangent plane to $z = x^2 + y^2$ at $p = (1, 2, 5)$

In other words, geometrically, the approximation we get by using the tangent plane at (a, f (a))
to approximate the graph coincides with the first-order approximation at a. This extends a familiar
idea from first-year calculus, where tangent lines are used to obtain the first-order approximation
of a differentiable function. In fact, it is the idea we used to motivate our multivariable definition
of differentiability in the first place.
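To see how well the tangent plane $z = 2x + 4y - 5$ tracks the graph of $f(x, y) = x^2 + y^2$ near $a = (1, 2)$, one can compare values at nearby points. This is a small illustrative sketch; the sample step sizes are arbitrary choices, not taken from the text.

```python
def f(x, y):
    return x**2 + y**2

def ell(x, y):
    # First-order approximation at a = (1, 2); its graph is the tangent
    # plane 2x + 4y - z = 5 found in Example 4.18.
    return 2 * x + 4 * y - 5

steps = (0.1, 0.01, 0.001)
errors = []
for step in steps:
    x, y = 1 + step, 2 + step
    errors.append(f(x, y) - ell(x, y))
    print(step, errors[-1])
```

For this particular $f$, the error at $(1 + s, 2 + s)$ works out to exactly $2s^2$, so it shrinks much faster than the displacement $s$ itself, which is what a first-order approximation is supposed to do.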

Example 4.19. Suppose that T (x, y) represents the temperature at the point (x, y) in an open
set U of the plane. There are two natural families of curves associated with this function.
The first is the curves along which the temperature is constant. They are called isotherms
and are the level sets of $T$: all points on a single isotherm have the same temperature. See Figure
4.6 (footnote 3).

Figure 4.6: Isotherms are level curves of temperature. Heat travels along curves that are always
orthogonal to the isotherms.

The second is the curves along which heat flows. The principle here is that, at every point, heat
travels in the direction in which the temperature is decreasing as rapidly as possible, from hot to
cold.
From our work on directional derivatives, we know that the maximum rate of decrease occurs
in the direction of −∇T . And we’ve just seen that ∇T is always normal to the level sets, the
isotherms. In other words, the isotherms and the curves of heat flow are mutually orthogonal
families of curves. If you have a map of the isotherms, you can visualize the flow of heat: always
move at right angles to the isotherms.
3. Figure taken from https://www.weather.gov/images/jetstream/synoptic/ll analyze temp soln3.gif by National
Weather Service, public domain.


4.8 Higher-order partial derivatives


Once they start partial differentiating, most people can’t stop.

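Example 4.20. Let $f(x, y) = x^3 - 3x^2y^4 + e^{xy}$. Then:

\[
\frac{\partial f}{\partial x} = 3x^2 - 6xy^4 + ye^{xy}
\quad\text{and}\quad
\frac{\partial f}{\partial y} = -12x^2y^3 + xe^{xy}.
\]

Both of these are again real-valued functions of $x$ and $y$, so they have partial derivatives of their
own. In the case of $\frac{\partial f}{\partial x}$:

\begin{align*}
\frac{\partial^2 f}{\partial x^2} &= \frac{\partial}{\partial x}\Bigl(\frac{\partial f}{\partial x}\Bigr) = 6x - 6y^4 + y^2e^{xy} \\
\text{and}\quad \frac{\partial^2 f}{\partial y\,\partial x} &= \frac{\partial}{\partial y}\Bigl(\frac{\partial f}{\partial x}\Bigr) = -24xy^3 + e^{xy} + xye^{xy}.
\end{align*}

Similarly, for $\frac{\partial f}{\partial y}$:

\begin{align*}
\frac{\partial^2 f}{\partial x\,\partial y} &= \frac{\partial}{\partial x}\Bigl(\frac{\partial f}{\partial y}\Bigr) = -24xy^3 + e^{xy} + xye^{xy} \\
\text{and}\quad \frac{\partial^2 f}{\partial y^2} &= \frac{\partial}{\partial y}\Bigl(\frac{\partial f}{\partial y}\Bigr) = -36x^2y^2 + x^2e^{xy}.
\end{align*}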

These are the second-order partial derivatives. They too have partial derivatives, and the
process could continue indefinitely to obtain partial derivatives of higher and higher order.

Perhaps the most striking part of the preceding calculation is that the two partial derivatives
$\frac{\partial^2 f}{\partial y\,\partial x}$ and $\frac{\partial^2 f}{\partial x\,\partial y}$ turned out to be equal. They are called the mixed partials. To see whether
they might be equal in general, we return to the definition of a partial derivative as a limit of
difference quotients and try to describe the second-order partials in terms of the original function
$f$. In fact, what we are about to discuss is a method one might use to numerically approximate
the second-order partials.
Let $a = (c, d)$. First, consider $\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\bigl(\frac{\partial f}{\partial y}\bigr)$. The function being differentiated is $\frac{\partial f}{\partial y}$. It is
being differentiated with respect to $x$, so we let $x$ vary and hold $y$ constant:

\[
\frac{\partial^2 f}{\partial x\,\partial y}(c, d) = \frac{\partial}{\partial x}\Bigl(\frac{\partial f}{\partial y}\Bigr)(c, d) \approx \frac{\frac{\partial f}{\partial y}(c + h, d) - \frac{\partial f}{\partial y}(c, d)}{h}.
\]

The approximation is meant to hold when $h$ is small. The terms in the numerator are both partials
of $f$ with respect to $y$, so in each case we differentiate $f$, letting $y$ vary and holding $x$ constant:

\begin{align*}
\frac{\partial^2 f}{\partial x\,\partial y}(c, d) &\approx \frac{\frac{f(c+h,\,d+k) - f(c+h,\,d)}{k} - \frac{f(c,\,d+k) - f(c,\,d)}{k}}{h} \\
&\approx \frac{f(c + h, d + k) - f(c + h, d) - f(c, d + k) + f(c, d)}{hk} \tag{4.13}
\end{align*}

when $h$ and $k$ are small.
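As a rough illustration of how (4.13) might be used numerically, the sketch below applies the quotient to the function of Example 4.20 and compares it with the mixed partial computed by hand there. The sample point $(c, d) = (1, 0.5)$ and the step sizes are arbitrary choices made for this illustration.

```python
import math

def f(x, y):
    # The function from Example 4.20.
    return x**3 - 3 * x**2 * y**4 + math.exp(x * y)

def mixed_partial_quotient(f, c, d, h, k):
    # The quotient on the right-hand side of (4.13).
    return (f(c + h, d + k) - f(c + h, d) - f(c, d + k) + f(c, d)) / (h * k)

def mixed_partial_exact(x, y):
    # The mixed partial computed by hand in Example 4.20.
    return -24 * x * y**3 + math.exp(x * y) + x * y * math.exp(x * y)

c, d = 1.0, 0.5
approx = mixed_partial_quotient(f, c, d, h=1e-4, k=1e-4)
exact = mixed_partial_exact(c, d)
print(approx, exact)
```

The two numbers should agree to roughly three decimal places. Making $h$ and $k$ much smaller does not help indefinitely, since floating-point roundoff in the numerator is divided by the tiny product $hk$.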
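This expression is symmetric enough in $h$ and $k$ that perhaps you are willing to believe that
you would get the same thing when the order of differentiation is reversed. But if you are nervous
about it, here is the parallel calculation for $\frac{\partial^2 f}{\partial y\,\partial x}$:

\begin{align*}
\frac{\partial^2 f}{\partial y\,\partial x}(c, d) &= \frac{\partial}{\partial y}\Bigl(\frac{\partial f}{\partial x}\Bigr)(c, d) \\
&\approx \frac{\frac{\partial f}{\partial x}(c, d + k) - \frac{\partial f}{\partial x}(c, d)}{k} \\
&\approx \frac{\frac{f(c+h,\,d+k) - f(c,\,d+k)}{h} - \frac{f(c+h,\,d) - f(c,\,d)}{h}}{k} \\
&\approx \frac{f(c + h, d + k) - f(c, d + k) - f(c + h, d) + f(c, d)}{hk}. \tag{4.14}
\end{align*}

Indeed, comparing the approximations shows that they are the same, so:

\[
\frac{\partial^2 f}{\partial x\,\partial y}(c, d) \approx \frac{\partial^2 f}{\partial y\,\partial x}(c, d).
\]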

A proper proof that the two sides are actually equal requires tightening up the approximations that
are involved. We leave the details for the exercises (Exercise 8.7), but the main idea is to use the
mean value theorem to show that the quotient
$Q = \frac{f(c+h,\,d+k) - f(c+h,\,d) - f(c,\,d+k) + f(c,\,d)}{hk}$ common to
approximations (4.13) and (4.14) is equal on the nose to the values of the mixed partials at nearby
points, that is:

\[
\frac{\partial^2 f}{\partial x\,\partial y}(p) = Q = \frac{\partial^2 f}{\partial y\,\partial x}(q), \tag{4.15}
\]

where $p$ and $q$ are points that are close to $a$ but depend on $h$ and $k$. As $h$ and $k$ go to 0, $p$ and $q$
approach $a$. If the mixed partials are known to be continuous at $a$, then, in the limit, equation
(4.15) becomes:

\[
\frac{\partial^2 f}{\partial x\,\partial y}(a) = \frac{\partial^2 f}{\partial y\,\partial x}(a).
\]

Thus the mixed partials are equal provided one invokes a requirement of continuity. In fact, there
are functions for which the mixed partials are not equal, so some sort of restriction of this type is
unavoidable.
For functions of more than two variables, the second-order mixed partials treat all but two vari-
ables as fixed, so we remain essentially in the two-variable case. In other words, if the mixed partials
are equal for functions of two variables, they are also equal for more than two. We summarize the
discussion with the following result.

Theorem 4.21 (Equality of mixed partials). Let $f\colon U \to \mathbb{R}$ be a real-valued function defined on
an open set $U$ in $\mathbb{R}^n$, all of whose first and second-order partial derivatives are continuous on $U$.
(Such a function is said to be of class $C^2$.) Then:

\[
\frac{\partial^2 f}{\partial x_i\,\partial x_j} = \frac{\partial^2 f}{\partial x_j\,\partial x_i}
\]

at all points of $U$ and for all $i, j$.

Example 4.22. Find $\frac{\partial^4 f}{\partial x\,\partial y^2\,\partial z}$ if $f(x, y, z) = (x^2 + y^3 + \sin(xz))(\cos x - xz^5)$.

The order of differentiation doesn’t matter, so we may as well choose one that simplifies the
bookkeeping. Differentiating with respect to $x$ or $z$ would involve the product rule, and who wants
that, so let’s differentiate first with respect to $y$: $\frac{\partial f}{\partial y} = 3y^2(\cos x - xz^5)$. The next choices don’t
make much difference, but let’s try differentiating with respect to $z$: $\frac{\partial^2 f}{\partial z\,\partial y} = -15xy^2z^4$. Then, with
respect to $x$: $\frac{\partial^3 f}{\partial x\,\partial z\,\partial y} = -15y^2z^4$. And lastly with respect to $y$ again:

\[
\frac{\partial^4 f}{\partial y\,\partial x\,\partial z\,\partial y} = -30yz^4.
\]

By the equality of mixed partials, this is the same as the originally requested $\frac{\partial^4 f}{\partial x\,\partial y^2\,\partial z}$.

4.9 Smooth functions


The C 1 test states that functions with continuous partial derivatives are differentiable. The converse
is false, however. There are plenty of functions that are differentiable but don’t have continuous
partials, though actually writing one down takes some determination. In the one-variable case,
$f(x) = x^2 \sin(1/x)$ is an example. If $f(0)$ is defined to be 0, then $f$ is differentiable at $a = 0$, but $f'$
is not continuous there. Similarly, there are functions whose first-order partials are continuous but
whose second-order partials are not. There is an example in the exercises (Exercise 8.6), though
again it is somewhat contrived to make the point.
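The one-variable example $f(x) = x^2 \sin(1/x)$ can be probed numerically. The sketch below is only an illustration: the formula for $f'(x)$ when $x \neq 0$ comes from the product and chain rules, and the sample points $x_n = 1/(n\pi)$ are chosen for convenience because $\cos(1/x_n) = \pm 1$ there.

```python
import math

def f(x):
    # f(0) is defined to be 0, as in the text.
    return x**2 * math.sin(1 / x) if x != 0 else 0.0

def f_prime(x):
    # Valid for x != 0, by the product and chain rules.
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# Difference quotients at 0 shrink toward 0, consistent with f'(0) = 0.
steps = (1e-2, 1e-4, 1e-6)
quotients = [(f(h) - f(0)) / h for h in steps]

# Along x_n = 1/(n*pi), which tends to 0, f'(x_n) alternates near -1 and +1,
# so f' has no limit at 0.
samples = [f_prime(1 / (n * math.pi)) for n in range(1000, 1004)]

print(quotients)
print(samples)
```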
Most of the differentiable functions that we work with have continuous partial derivatives of
all possible orders. Such functions are said to be of class $C^\infty$. We refer to them simply as
smooth. We shall usually take them as the default and state our results involving derivatives for
smooth functions. We may not actually need all those higher-order partial derivatives to draw the
conclusion we are trying to reach, but keeping track of the degree of differentiability that is really
necessary in every case is not something we want to dwell on.
In addition, so far we have defined continuity and differentiability only for functions defined on
open sets. This is so that every point in the domain can be approached within the domain from
all possible directions, making the corresponding limiting processes easier to handle. Sometimes,
however, we are interested in domains that are not open. If $D$ is a subset of $\mathbb{R}^n$, not necessarily
open, we say that a function $f\colon D \to \mathbb{R}$ is differentiable at a point $a$ in $D$ if it agrees with
a differentiable function near $a$, that is, there is an open ball $B$ containing $a$ and a function
$g\colon B \to \mathbb{R}$ such that $g(x) = f(x)$ for all $x$ in $B \cap D$ and $g$ is differentiable at $a$ in the sense
previously defined. We make the analogous modification for smoothness. For continuity, we keep
the definition as before but restrict our attention to points where $f$ is defined: $f$ is continuous at $a$
if, given any open ball $B(f(a), \epsilon)$ about $f(a)$, there exists an open ball $B(a, \delta)$ about $a$ such that
$f(B(a, \delta) \cap D) \subset B(f(a), \epsilon)$. We won’t dwell on these points either. In most concrete examples, $f$
is defined by a well-behaved formula that applies to an open set larger than the domain in which
we happen to be interested, so there isn’t a serious problem.

4.10 Max/min: critical points


In first-year calculus, one of the featured applications of the derivative is finding the largest and/or
smallest values that a function attains and where it attains them. Whenever one wants to do
something in an optimal way—for instance, the most productive or the least costly—and the
objective can be modeled by a smooth function, calculus is likely to be of great use. We discuss
this for functions of more than one variable. This is a big subject that overflows with applications.
Our treatment is not comprehensive. Instead, we have selected a couple of topics that illustrate
some of the similarities and differences that pertain to moving to the multivariable case.
Definition. Let $f\colon U \to \mathbb{R}$ be a function defined on an open set $U$ in $\mathbb{R}^n$. The function $f$ is said to have a local
maximum at a point $a$ of $U$ if there is an open ball $B(a, r)$ centered at $a$ such that $f(a) \geq f(x)$
for all $x$ in $B(a, r)$. It is said to have a local minimum at $a$ if instead $f(a) \leq f(x)$ for all $x$ in
$B(a, r)$. See Figure 4.7. It has a global maximum or global minimum at $a$ if the associated
inequalities are true for all $x$ in $U$, not just in $B(a, r)$.

Figure 4.7: Local maximum (left) and local minimum (right) at a

If $f$ is smooth and has a local maximum or local minimum at $a$, then it has a local maximum or
local minimum in every direction, in particular in each of the coordinate directions. The derivatives
in these directions are the partial derivatives. They are essentially one-variable derivatives, so from
first-year calculus, at a local maximum or minimum, all of them must be 0, that is, $\frac{\partial f}{\partial x_j}(a) = 0$ for
$j = 1, 2, \ldots, n$.
Definition. A point $a$ is called a critical point of $f$ if $\frac{\partial f}{\partial x_j}(a) = 0$ for $j = 1, 2, \ldots, n$. Equivalently,
$Df(a) = \begin{bmatrix} 0 & 0 & \cdots & 0 \end{bmatrix}$, or $\nabla f(a) = 0$.
The preceding discussion proves the following.
Proposition 4.23. Let $U$ be an open set in $\mathbb{R}^n$, and let $f\colon U \to \mathbb{R}$ be a smooth function. If $f$ has
a local maximum or a local minimum at $a$, then $a$ is a critical point of $f$.
Hence if one is looking for local maxima or minima of a smooth function on an open set, the
critical points are the only possible candidates. This is just like the one-variable case. Conversely,
as in the one-variable case, not every critical point is necessarily a local max or min, though there
are new ways in which this failure can occur. Good examples to keep in mind are functions whose
graphs are saddles, such as $f(x, y) = x^2 - y^2$. At a saddle point, the function has a local maximum
in one direction and a local minimum in another, so there is no open ball about the point in which
it is exclusively one or the other.
Even so, in first-year calculus, a given critical point can often be classified as a local maximum
or minimum using the second derivative. There is an analogue of this for more than one variable
that is similar in spirit while reflecting the greater number of possibilities. We discuss the situation
for two variables.
Let $U$ be an open set in $\mathbb{R}^2$, and let $f\colon U \to \mathbb{R}$ be a smooth function. If $a \in U$, consider the
matrix of second-order partials:

\[
H(a) = \begin{bmatrix}
\frac{\partial^2 f}{\partial x^2}(a) & \frac{\partial^2 f}{\partial y\,\partial x}(a) \\[4pt]
\frac{\partial^2 f}{\partial x\,\partial y}(a) & \frac{\partial^2 f}{\partial y^2}(a)
\end{bmatrix}.
\]

It is called the Hessian matrix of $f$ at $a$. By the equality of mixed partials, the two off-diagonal
terms are actually equal.
Theorem 4.24 (Second derivative test for functions of two variables). Let $a$ be a critical point of
$f$ such that $\det H(a) \neq 0$. Under this assumption, $a$ is called a nondegenerate critical point.
1. If $\det H(a) > 0$ and:
(a) if $\frac{\partial^2 f}{\partial x^2}(a) > 0$, then $f$ has a local minimum at $a$,
(b) if $\frac{\partial^2 f}{\partial x^2}(a) < 0$, then $f$ has a local maximum at $a$.
2. If $\det H(a) < 0$, then $f$ has a saddle point at $a$.

If $\det H(a) = 0$, the test is inconclusive.
We discuss the ideas behind the test in the next section, but, before that, we present some
examples. The first involves three simple functions for which we already know the answers, but it
is meant to illustrate how the test works and confirm that it gives the right results. In a way, these
three functions are prototypes of the general situation.
Example 4.25. Find all critical points of the following functions and, if possible, classify each as
a local maximum, local minimum, or saddle point.
(a) $f(x, y) = x^2 + y^2$
(b) $f(x, y) = -x^2 - y^2$
(c) $f(x, y) = x^2 - y^2$
Intuitively, it’s clear that the first two of these attain their minimum and maximum values,
respectively, at the origin, while, as noted earlier, the third has a saddle point there. The graphs
of the functions are shown in Figure 4.8.

Figure 4.8: Prototypical nondegenerate critical points at $(0, 0)$: $f(x, y) = x^2 + y^2$ (left), $f(x, y) =
-x^2 - y^2$ (middle), $f(x, y) = x^2 - y^2$ (right)

The second derivative test reaches these conclusions as follows.


(a) To find the critical points, we set $\frac{\partial f}{\partial x} = 2x = 0$ and $\frac{\partial f}{\partial y} = 2y = 0$. The only solution is $x = 0$
and $y = 0$. Hence the only critical point is $(0, 0)$. To classify its type, we look at the second-order
partials: $\frac{\partial^2 f}{\partial x^2} = 2$, $\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x} = 0$, and $\frac{\partial^2 f}{\partial y^2} = 2$. Therefore:

\[
H(0, 0) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}.
\]

Since $\det H(0, 0) = 4 > 0$ and $\frac{\partial^2 f}{\partial x^2}(0, 0) = 2 > 0$, the second derivative test implies that $f$ has a
local minimum at $(0, 0)$, as expected.
(b) By the same calculation, $(0, 0)$ is the only critical point, only this time:

\[
H(0, 0) = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}.
\]

Hence $\det H(0, 0) = 4 > 0$ and $\frac{\partial^2 f}{\partial x^2}(0, 0) = -2 < 0$, so $f$ has a local maximum at $(0, 0)$.
(c) Again, $(0, 0)$ is the only critical point, but:

\[
H(0, 0) = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}.
\]

Since $\det H(0, 0) = -4 < 0$, $f$ has a saddle point at $(0, 0)$.
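The decision procedure in Theorem 4.24 is mechanical enough to write down as a short function. The sketch below is one possible transcription; the function name `classify` and the string labels are choices made for this illustration. It takes the three distinct Hessian entries at a critical point and is applied to the three prototypes of Example 4.25.

```python
def classify(fxx, fxy, fyy):
    """Apply the second derivative test to the Hessian entries at a critical point."""
    det = fxx * fyy - fxy**2          # det H(a), using equality of mixed partials
    if det > 0:
        # When det > 0, fxx cannot be 0, so one of the two cases applies.
        return "local minimum" if fxx > 0 else "local maximum"
    if det < 0:
        return "saddle point"
    return "inconclusive"

# Hessian entries (fxx, fxy, fyy) at (0, 0) for the three prototypes.
prototypes = {
    "x^2 + y^2": (2, 0, 2),
    "-x^2 - y^2": (-2, 0, -2),
    "x^2 - y^2": (2, 0, -2),
}
results = {name: classify(*entries) for name, entries in prototypes.items()}
for name, verdict in results.items():
    print(name, "->", verdict)
```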


Example 4.26. We repeat for the function $f(x, y) = x^3 + 8y^3 - 3xy$. A portion of the graph of $f$
is shown in Figure 4.9.

Figure 4.9: The graph of $f(x, y) = x^3 + 8y^3 - 3xy$ for $-1 \leq x \leq 1$, $-1 \leq y \leq 1$

First, to find the critical points, we set:

\[
\begin{cases}
\frac{\partial f}{\partial x} = 3x^2 - 3y = 0 \\[4pt]
\frac{\partial f}{\partial y} = 24y^2 - 3x = 0.
\end{cases}
\]

The first equation implies that $y = x^2$, and substituting this into the second gives $24x^4 - 3x = 0$,
or $3x(8x^3 - 1) = 0$. Hence $x = 0$, in which case $y = 0^2 = 0$, or $x^3 = \frac{1}{8}$, i.e., $x = \frac{1}{2}$, in which case
$y = \bigl(\frac{1}{2}\bigr)^2 = \frac{1}{4}$. As a result, $f$ has two critical points, $(0, 0)$ and $(\frac{1}{2}, \frac{1}{4})$.
To classify their types, we consider the Hessian:

\[
H(x, y) = \begin{bmatrix} 6x & -3 \\ -3 & 48y \end{bmatrix}.
\]

For instance, at the critical point $(0, 0)$, $\det H(0, 0) = \det \begin{bmatrix} 0 & -3 \\ -3 & 0 \end{bmatrix} = -9$. Since this is negative,
$(0, 0)$ is a saddle point.
Similarly, $\det H(\frac{1}{2}, \frac{1}{4}) = \det \begin{bmatrix} 3 & -3 \\ -3 & 12 \end{bmatrix} = 36 - 9 = 27$. This is positive and $\frac{\partial^2 f}{\partial x^2}(\frac{1}{2}, \frac{1}{4}) = 3$ is
positive (it’s the upper left entry of the Hessian), so, by the second derivative test, $(\frac{1}{2}, \frac{1}{4})$ is a local
minimum.
As a check on the plausibility of these conclusions, see Figure 4.10 for what the graph of $f$ looks
like near each of the critical points.
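In the same spirit as the plausibility check by graph, the conclusions of Example 4.26 can be spot-checked numerically. The sketch below is illustrative only: the offset `t` and the grid of neighboring points are arbitrary choices, and sampling a few neighbors is evidence, not a proof.

```python
def f(x, y):
    return x**3 + 8 * y**3 - 3 * x * y

def grad(x, y):
    return (3 * x**2 - 3 * y, 24 * y**2 - 3 * x)

def hessian_det(x, y):
    # Determinant of [[6x, -3], [-3, 48y]].
    return 6 * x * 48 * y - 9

# Both points are critical: the gradient vanishes at each.
print(grad(0.0, 0.0), grad(0.5, 0.25))
print(hessian_det(0.0, 0.0), hessian_det(0.5, 0.25))

t = 0.01
offsets = [(dx, dy) for dx in (-t, 0.0, t) for dy in (-t, 0.0, t)
           if (dx, dy) != (0.0, 0.0)]

# Near (1/2, 1/4): every sampled neighbor has a larger value of f.
is_min = all(f(0.5 + dx, 0.25 + dy) > f(0.5, 0.25) for dx, dy in offsets)

# Near (0, 0): f is negative along (t, t) and positive along (t, -t),
# which is the behavior of a saddle.
saddle_values = (f(t, t), f(t, -t))
print(is_min, saddle_values)
```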

Figure 4.10: The graph of $f(x, y) = x^3 + 8y^3 - 3xy$ near $(0, 0)$ (left) and near $(\frac{1}{2}, \frac{1}{4})$ (right)

4.11 Classifying nondegenerate critical points


We now say a few words about why the second derivative test for functions of two variables works
(Theorem 4.24 of the preceding section). We won’t give a complete proof, but, since the test may
appear to come out of nowhere, a word or two of explanation is warranted.
We begin by reviewing the corresponding situation for functions of one variable. Let $f\colon I \to \mathbb{R}$
be a smooth function defined on an open interval $I$ in $\mathbb{R}$. The critical points of $f$ are the points $a$
where $f'(a) = 0$. At such points, the usual first-order approximation $f(x) \approx f(a) + f'(a) \cdot (x - a)$
degenerates to $f(x) \approx f(a)$, which gives us little information about the behavior of $f(x)$ when $x$ is
near $a$. As a result, we add an additional term by considering the second-order Taylor approximation
at $a$:

\[
f(x) \approx f(a) + f'(a) \cdot (x - a) + \frac{f''(a)}{2} \cdot (x - a)^2. \tag{4.16}
\]

If $a$ is a critical point, this reduces to $f(x) \approx f(a) + \frac{f''(a)}{2} \cdot (x - a)^2$, a parabola whose general
shape is determined by the sign of the coefficient $\frac{f''(a)}{2}$. If $f''(a) > 0$, then the approximating
parabola is concave up and $f$ has a local minimum at $a$. If $f''(a) < 0$, the parabola is concave
down and $f$ has a local maximum. If $f''(a) = 0$, the approximation degenerates again. Further
information is required to determine the type of critical point in this case.

4.11.1 The second-order approximation


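Let $f\colon U \to \mathbb{R}$ be a smooth function of two variables defined on an open set $U$ in $\mathbb{R}^2$, and let $\mathbf{a}$ be
a point of $U$. The situation here is that the second-order approximation of $f(\mathbf{x})$ for $\mathbf{x}$ near $\mathbf{a}$ is:

\[
f(\mathbf{x}) \approx f(\mathbf{a}) + Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) + \frac{1}{2} (\mathbf{x} - \mathbf{a})^t \cdot H(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}). \tag{4.17}
\]

The various dots are matrix products, where, writing $\mathbf{x} = (x, y)$ and $\mathbf{a} = (c, d)$:

• $H(\mathbf{a}) = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2}(\mathbf{a}) & \frac{\partial^2 f}{\partial x\,\partial y}(\mathbf{a}) \\[4pt] \frac{\partial^2 f}{\partial x\,\partial y}(\mathbf{a}) & \frac{\partial^2 f}{\partial y^2}(\mathbf{a}) \end{bmatrix}$ is the Hessian matrix at $\mathbf{a}$,
• $\mathbf{x} - \mathbf{a} = \begin{bmatrix} x - c \\ y - d \end{bmatrix}$, and
• $(\mathbf{x} - \mathbf{a})^t = \begin{bmatrix} x - c & y - d \end{bmatrix}$, the transpose of $\mathbf{x} - \mathbf{a}$.
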
To justify the approximation, let $\mathbf{x}$ be a point near $\mathbf{a}$, and let $\mathbf{v} = \mathbf{x} - \mathbf{a}$. Consider the real-valued
function $g$ defined by $g(t) = f(\mathbf{a} + t\mathbf{v})$. Then $g(0) = f(\mathbf{a})$ and $g(1) = f(\mathbf{a} + \mathbf{v}) = f(\mathbf{x})$, so
finding an approximation for $f(\mathbf{x})$ is the same as approximating $g(1)$. As a real-valued function of
one variable, $g(t)$ has a second-order Taylor approximation (4.16) at 0, which for $g(1)$ gives:

\begin{align*}
f(\mathbf{x}) = g(1) &\approx g(0) + g'(0) \cdot (1 - 0) + \frac{g''(0)}{2} \cdot (1 - 0)^2 \\
&\approx f(\mathbf{a}) + g'(0) + \frac{g''(0)}{2}. \tag{4.18}
\end{align*}

To find the derivatives of $g$, note that, by construction, $g$ is the composition $g(t) = f(\alpha(t))$,
where $\alpha(t) = \mathbf{a} + t\mathbf{v}$. Hence by the Little Chain Rule:

\[
g'(t) = \nabla f(\alpha(t)) \cdot \alpha'(t) = \nabla f(\mathbf{a} + t\mathbf{v}) \cdot \mathbf{v}. \tag{4.19}
\]

In particular, $g'(0) = \nabla f(\mathbf{a}) \cdot \mathbf{v}$, or in matrix form, $g'(0) = Df(\mathbf{a}) \cdot \mathbf{v}$. Substituting this into (4.18)
and recalling that $\mathbf{v} = \mathbf{x} - \mathbf{a}$ gives:

\[
f(\mathbf{x}) \approx f(\mathbf{a}) + Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) + \frac{g''(0)}{2}. \tag{4.20}
\]
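So far, we have succeeded only in reconstructing the first-order approximation of $f$ at $\mathbf{a}$. The
really new information comes from determining $g''(0)$. From equation (4.19), note that $g'$ is also
a composition, namely, $g'(t) = u(\alpha(t))$, where $u(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v}$ and $\alpha(t) = \mathbf{a} + t\mathbf{v}$ as before.
Therefore by the same Little Chain Rule calculation:

\[
g''(0) = (u \circ \alpha)'(0) = \nabla u(\mathbf{a}) \cdot \mathbf{v}. \tag{4.21}
\]

To find $\nabla u$, we let $\mathbf{v} = (h, k)$ and write out $u(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v} = \bigl(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\bigr) \cdot (h, k) = h \frac{\partial f}{\partial x} + k \frac{\partial f}{\partial y}$. Then
$\nabla u = \bigl(h \frac{\partial^2 f}{\partial x^2} + k \frac{\partial^2 f}{\partial x\,\partial y},\; h \frac{\partial^2 f}{\partial y\,\partial x} + k \frac{\partial^2 f}{\partial y^2}\bigr)$. From equation (4.21) and the equality of mixed partials, we
obtain:

\begin{align*}
g''(0) &= \Bigl(h \frac{\partial^2 f}{\partial x^2}(\mathbf{a}) + k \frac{\partial^2 f}{\partial x\,\partial y}(\mathbf{a}),\; h \frac{\partial^2 f}{\partial y\,\partial x}(\mathbf{a}) + k \frac{\partial^2 f}{\partial y^2}(\mathbf{a})\Bigr) \cdot (h, k) \\
&= \frac{\partial^2 f}{\partial x^2}(\mathbf{a})\, h^2 + 2 \frac{\partial^2 f}{\partial x\,\partial y}(\mathbf{a})\, hk + \frac{\partial^2 f}{\partial y^2}(\mathbf{a})\, k^2. \tag{4.22}
\end{align*}

To simplify the notation somewhat, let $A = \frac{\partial^2 f}{\partial x^2}(\mathbf{a})$, $B = \frac{\partial^2 f}{\partial x\,\partial y}(\mathbf{a})$, and $C = \frac{\partial^2 f}{\partial y^2}(\mathbf{a})$. We do some
algebraic rearrangement in order to get the quadratic term of the approximation into the desired
form (4.17):

\begin{align*}
g''(0) &= Ah^2 + 2Bhk + Ck^2 \tag{4.23} \\
&= (Ah^2 + Bhk) + (Bhk + Ck^2) \\
&= \begin{bmatrix} h & k \end{bmatrix} \begin{bmatrix} Ah + Bk \\ Bh + Ck \end{bmatrix} \\
&= \begin{bmatrix} h & k \end{bmatrix} \begin{bmatrix} A & B \\ B & C \end{bmatrix} \begin{bmatrix} h \\ k \end{bmatrix} \\
&= (\mathbf{x} - \mathbf{a})^t \cdot H(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}).
\end{align*}

Substituting this into (4.20) gives the second-order approximation $f(\mathbf{x}) \approx f(\mathbf{a}) + Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) +
\frac{1}{2} (\mathbf{x} - \mathbf{a})^t \cdot H(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})$ as stated in (4.17). This accomplishes what we set out to do.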
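To get a feel for how good the second-order approximation (4.17) is, the sketch below evaluates it for the function $f(x, y) = x^3 + 8y^3 - 3xy$ of Example 4.26 at the base point $\mathbf{a} = (1, 1)$. The base point and the step sizes are arbitrary choices for this illustration. The quadratic term is written in the expanded form $Ah^2 + 2Bhk + Ck^2$ of (4.23).

```python
def f(x, y):
    return x**3 + 8 * y**3 - 3 * x * y

def quadratic_approx(x, y, c=1.0, d=1.0):
    h, k = x - c, y - d
    fa = f(c, d)
    fx, fy = 3 * c**2 - 3 * d, 24 * d**2 - 3 * c    # entries of Df(a)
    A, B, C = 6 * c, -3.0, 48 * d                   # entries of H(a)
    return fa + fx * h + fy * k + 0.5 * (A * h**2 + 2 * B * h * k + C * k**2)

steps = (0.1, 0.01, 0.001)
errors = []
for step in steps:
    x, y = 1 + step, 1 + step
    errors.append(abs(f(x, y) - quadratic_approx(x, y)))
    print(step, errors[-1])
```

Because this $f$ is a cubic polynomial, the error is exactly the leftover cubic term $h^3 + 8k^3$, which is $9s^3$ at the sample points $(1 + s, 1 + s)$. Shrinking the step by a factor of 10 therefore shrinks the error by a factor of 1000.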


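We remark that one can continue in this way to obtain higher-order approximations of $f$ by
using higher-order Taylor approximations of $g$. The coefficients involve the higher-order derivatives
$g^{(n)}(0)$ of $g$. The idea is that, by the Little Chain Rule, each additional derivative of $g$ corresponds
to applying the “operator” $\nabla \cdot \mathbf{v} = \bigl(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\bigr) \cdot (h, k) = h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y}$. Thus $g^{(n)}(0) = \bigl(h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y}\bigr)^n f(\mathbf{a})$,
where $\bigl(h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y}\bigr)^n$ denotes the $n$-fold composition $\bigl(h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y}\bigr) \circ \bigl(h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y}\bigr) \circ \cdots \circ \bigl(h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y}\bigr)$.
We leave for the curious and ambitious reader the work of checking that this formula is consistent
with our results for the first two derivatives and of working out a general formula for the $n$th-order
approximation of $f$. For example, see Exercise 11.3 for the third-order approximation.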

4.11.2 Sums and differences of squares


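Still assuming that $f$ is a smooth function of two variables, if $\mathbf{a}$ is a critical point, then $Df(\mathbf{a}) =
\begin{bmatrix} 0 & 0 \end{bmatrix}$, so the second-order approximation (4.17) becomes:

\begin{align*}
f(\mathbf{x}) = f(\mathbf{a} + \mathbf{v}) &\approx f(\mathbf{a}) + \frac{1}{2} (\mathbf{x} - \mathbf{a})^t \cdot H(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) \\
&\approx f(\mathbf{a}) + \frac{1}{2} \mathbf{v}^t \cdot H(\mathbf{a}) \cdot \mathbf{v},
\end{align*}

where $\mathbf{v} = \mathbf{x} - \mathbf{a}$. The approximation holds when $\mathbf{v}$ is near $\mathbf{0}$.
We assume that $\det H(\mathbf{a}) \neq 0$ and determine the nature of the critical point $\mathbf{a}$. To do this,
it is simpler to write the quadratic term of the approximation using the notation of (4.23) with
$\mathbf{v} = (h, k)$ so that:

\[
f(\mathbf{a} + \mathbf{v}) \approx f(\mathbf{a}) + \frac{1}{2} (Ah^2 + 2Bhk + Ck^2) \tag{4.24}
\]

and $H(\mathbf{a}) = \begin{bmatrix} A & B \\ B & C \end{bmatrix}$.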
We consider the case $A \neq 0$. Similar arguments apply if $A = 0$ (see Exercise 11.5). Completing
the square on the expression in parentheses in (4.24) gives:

\begin{align*}
Ah^2 + 2Bhk + Ck^2 &= A\Bigl(h^2 + \frac{2B}{A} hk\Bigr) + Ck^2 \\
&= A\Bigl(h^2 + \frac{2B}{A} hk + \frac{B^2}{A^2} k^2\Bigr) - \frac{B^2}{A} k^2 + Ck^2 \\
&= A\Bigl(h + \frac{B}{A} k\Bigr)^2 + \Bigl(C - \frac{B^2}{A}\Bigr) k^2 \\
&= A\Bigl(h + \frac{B}{A} k\Bigr)^2 + \frac{AC - B^2}{A} k^2 \\
&= A\tilde{h}^2 + \frac{AC - B^2}{A} k^2,
\end{align*}

where $\tilde{h} = h + \frac{B}{A} k$. Then (4.24) becomes:

\[
f(\mathbf{a} + \mathbf{v}) \approx f(\mathbf{a}) + \frac{1}{2} \Bigl(A\tilde{h}^2 + \frac{AC - B^2}{A} k^2\Bigr).
\]

The important point is that the expression in parentheses is now a combination of squares,
and we know from Example 4.25 what to expect of expressions of this type. For instance, if both
coefficients $A$ and $\frac{AC - B^2}{A}$ are positive, then $f$ has a local minimum at $\mathbf{v} = \mathbf{0}$, like $x^2 + y^2$. If
both coefficients are negative, then $f$ has a local maximum, like $-x^2 - y^2$. If the coefficients have
opposite signs, then $f$ has a saddle point, like $x^2 - y^2$. Note that this last case arises precisely when
the product of the coefficients is negative, that is, when $A \cdot \frac{AC - B^2}{A} = AC - B^2 < 0$.
Recall that $A = \frac{\partial^2 f}{\partial x^2}(\mathbf{a})$, and observe that the expression $AC - B^2$ is the determinant of the
Hessian: $\det H(\mathbf{a}) = \det \begin{bmatrix} A & B \\ B & C \end{bmatrix} = AC - B^2$. Collecting the information in the previous paragraph
about the signs of the coefficients $A$ and $\frac{AC - B^2}{A}$ then gives the second-derivative test as stated in
Theorem 4.24.
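The completing-the-square identity is easy to get wrong by a sign, so a quick numerical check may be reassuring. The sketch below compares $Ah^2 + 2Bhk + Ck^2$ with $A\tilde{h}^2 + \frac{AC - B^2}{A}k^2$ for a few made-up coefficient triples $(A, B, C)$ with $A \neq 0$; the particular numbers are hypothetical and chosen only to cover the three sign patterns.

```python
def quadratic_form(A, B, C, h, k):
    return A * h**2 + 2 * B * h * k + C * k**2

def completed_square(A, B, C, h, k):
    # Requires A != 0, as in the text.
    h_tilde = h + (B / A) * k
    return A * h_tilde**2 + ((A * C - B**2) / A) * k**2

# (A, B, C): both coefficients positive, both negative, opposite signs.
coefficient_triples = [(2.0, 1.0, 3.0), (-1.0, 0.5, -2.0), (1.0, 2.0, 1.0)]
points = [(0.3, -0.7), (1.0, 0.0), (0.0, 1.0), (-1.5, 2.5)]

max_gap = 0.0
for A, B, C in coefficient_triples:
    for h, k in points:
        gap = abs(quadratic_form(A, B, C, h, k) - completed_square(A, B, C, h, k))
        max_gap = max(max_gap, gap)
    print((A, B, C), "coefficients:", A, (A * C - B**2) / A)

print("largest discrepancy:", max_gap)
```

The printed coefficients show the three patterns from the text: both positive (minimum), both negative (maximum), and opposite signs (saddle).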
The key idea of the argument was to complete the square to turn f into one of the prototypes
of being a sum and/or difference of squares, at least up to second order. This is not as ad hoc as it
might seem. A theorem in linear algebra states that, if $M$ is a symmetric matrix in the sense that
$M^t = M$, then it is always possible to complete the square and convert the expression $\mathbf{v}^t M \mathbf{v}$ into
a sum and/or difference of squares. This is one interpretation of a powerful result known as the
spectral theorem. Thanks to the equality of mixed partials, the Hessian matrix H(a) is symmetric,
so the spectral theorem applies. In fact, the spectral theorem is true for n by n symmetric matrices
for all n. Using similar ideas, this leads to a corresponding classification of nondegenerate critical
points for real-valued functions of n variables when n > 2.

4.12 Max/min: Lagrange multipliers


We now consider the problem of maximizing or minimizing a function subject to a “constraint.”
The constraint is often what gives an optimization problem substance. For instance, trying to find
the maximum value of a function like f (x, y, z) = x + 2y + 3z is silly: the function can be made
arbitrarily large by going far away from the origin where x, y, and z are all large. But if (x, y, z)
is confined to a limited region of space, then there’s something to think about. We illustrate with
a specific example.
Example 4.27. Let $S$ be the surface described by $\frac{x^2}{4} + \frac{y^2}{9} + z^2 = 1$ in $\mathbb{R}^3$. It is an ellipsoid, as
sketched in Figure 4.11. If $f(x, y, z) = x + 2y + 3z$, at what points of $S$ does $f$ attain its maximum
and minimum values?

Figure 4.11: The ellipsoid $\frac{x^2}{4} + \frac{y^2}{9} + z^2 = 1$

Here, we want to maximize and minimize $f = x + 2y + 3z$ subject to the constraint
$\frac{x^2}{4} + \frac{y^2}{9} + z^2 = 1$. One approach is to convert this to a two-variable problem, for instance, writing
$z = \pm\sqrt{1 - \frac{x^2}{4} - \frac{y^2}{9}}$ on $S$, so $f = x + 2y + 3\sqrt{1 - \frac{x^2}{4} - \frac{y^2}{9}}$ or $f = x + 2y - 3\sqrt{1 - \frac{x^2}{4} - \frac{y^2}{9}}$. These
are functions of two variables, so we could find the critical points, i.e., set the partials with respect
to $x$ and $y$ equal to 0, and proceed as in the previous two sections.
Instead, we try a new approach based on organizing the values of $f$ on $S$ according to the level
sets of $f$. Each level set $f = x + 2y + 3z = c$ is a plane that intersects $S$ typically in a curve, if at
all. As $c$ varies, we get a family of level curves on $S$ with $f$ constant on each. See Figure 4.12.

Figure 4.12: Intersecting S with a level set f = c (left) and a family of intersections (right)

Let’s say we want to maximize $f$ on $S$. Then, starting at some point, we move on $S$ in the
direction of increasing $c$. We continue doing this until we reach a point $a$ where we can’t go any
farther. Let $c_{\max}$ denote the value of $f$ at this point $a$. Observe that at $a$:

• The level set $f = c_{\max}$ must be tangent to $S$. Otherwise we would be able to cross the level
curve at $a$ and make the value of $f$ larger while remaining on $S$.
• $\nabla f(a)$ is a normal vector to the level set $f = c_{\max}$. This is because gradients are always
normal to level sets (Proposition 4.17).
• Similarly, the surface $S$ is itself a level set, namely, the set where:

\[
g(x, y, z) = \frac{x^2}{4} + \frac{y^2}{9} + z^2 = 1.
\]

Hence $\nabla g(a)$ is a normal vector to $S$.

The situation is shown in Figure 4.13.

Figure 4.13: The level set $f = c_{\max}$ and the ellipsoid $S$ are tangent at $a$.

Since the level set $f = c_{\max}$ and the surface $S$ are tangent at $a$, their normal vectors are scalar
multiples of each other. In other words, there is a scalar $\lambda$ such that $\nabla f(a) = \lambda \nabla g(a)$. This is the
idea behind the following general principle.

Proposition 4.28 (Method of Lagrange multipliers). Let $U$ be an open set in $\mathbb{R}^n$, and suppose that
you want to maximize or minimize a smooth function $f\colon U \to \mathbb{R}$ subject to the constraint $g(x) = c$
for some smooth function $g\colon U \to \mathbb{R}$ and constant $c$. If $f$, subject to this constraint, has a local maximum or local minimum
at a point $a$ and if $\nabla g(a) \neq 0$, then there is a scalar $\lambda$ such that:

\[
\nabla f(a) = \lambda \nabla g(a).
\]

The scalar $\lambda$ is called a Lagrange multiplier.


We return to Example 4.27 and find the maximum and minimum of $f = x + 2y + 3z$ subject
to the constraint $g(x, y, z) = \frac{x^2}{4} + \frac{y^2}{9} + z^2 = 1$. According to the method of Lagrange multipliers,
the solutions occur at points where $\nabla f = \lambda \nabla g$, that is, where $(1, 2, 3) = \lambda(\frac{1}{2}x, \frac{2}{9}y, 2z)$. This gives
a system of equations:

\[
\begin{cases}
1 = \frac{1}{2} \lambda x \\[2pt]
2 = \frac{2}{9} \lambda y \\[2pt]
3 = 2 \lambda z.
\end{cases}
\]

There are four unknowns and only three equations, which may seem like not enough information,
but we must remember that the constraint provides a fourth equation that can be included as part
of the system. In this case, the equations in the system imply that $\lambda \neq 0$, so we can solve for each
of $x$, $y$, and $z$ in terms of $\lambda$, then substitute into the constraint: $x = \frac{2}{\lambda}$, $y = \frac{9}{\lambda}$, and $z = \frac{3}{2\lambda}$, and:

\[
\frac{\bigl(\frac{2}{\lambda}\bigr)^2}{4} + \frac{\bigl(\frac{9}{\lambda}\bigr)^2}{9} + \Bigl(\frac{3}{2\lambda}\Bigr)^2 = 1.
\]

This simplifies to $\frac{1}{\lambda^2} + \frac{9}{\lambda^2} + \frac{9}{4\lambda^2} = 1$, or $\frac{4 + 36 + 9}{4\lambda^2} = 1$. Hence $4\lambda^2 = 49$, so $\lambda = \pm\frac{7}{2}$.
If $\lambda = \frac{7}{2}$, then:

\[
x = \frac{2}{\lambda} = \frac{2}{7/2} = \frac{4}{7}, \quad y = \frac{9}{7/2} = \frac{18}{7}, \quad z = \frac{3}{2 \cdot \frac{7}{2}} = \frac{3}{7},
\]

and $f = x + 2y + 3z = \frac{4}{7} + \frac{36}{7} + \frac{9}{7} = 7$. If $\lambda = -\frac{7}{2}$, it’s the same thing with minus signs. Thus, on
$S$, $f$ attains a maximum value of 7 at $(\frac{4}{7}, \frac{18}{7}, \frac{3}{7})$ and a minimum value of $-7$ at $(-\frac{4}{7}, -\frac{18}{7}, -\frac{3}{7})$.
Example 4.29. Suppose that, when a manufacturer spends $x$ dollars on labor, $y$ dollars on equipment,
and $z$ dollars on research, the total number of units that it produces is:

\[
f(x, y, z) = 60 x^{1/6} y^{1/3} z^{1/2}.
\]

This is an example of what is called a Cobb-Douglas production function. If the manufacturer has
a total budget of $B$ dollars, how should it distribute its spending among the three categories in
order to maximize production?
Here, we want to maximize the function $f$ above subject to the budget constraint $g = x + y + z =
B$. Following the method of Lagrange multipliers, we set $\nabla f = \lambda \nabla g$:

\[
(10 x^{-5/6} y^{1/3} z^{1/2},\; 20 x^{1/6} y^{-2/3} z^{1/2},\; 30 x^{1/6} y^{1/3} z^{-1/2}) = \lambda (1, 1, 1).
\]

In other words:

\begin{align*}
10 x^{-5/6} y^{1/3} z^{1/2} &= \lambda \tag{4.25} \\
20 x^{1/6} y^{-2/3} z^{1/2} &= \lambda \tag{4.26} \\
30 x^{1/6} y^{1/3} z^{-1/2} &= \lambda. \tag{4.27}
\end{align*}

Both (4.25) and (4.26) equal $\lambda$, and setting them equal gives:

\[
10 x^{-5/6} y^{1/3} z^{1/2} = 20 x^{1/6} y^{-2/3} z^{1/2}. \tag{4.28}
\]

If $z = 0$, then $f = 0$, which is clearly not the maximum value. Thus we may assume $z \neq 0$ so that
the factors of $z^{1/2}$ in equation (4.28) can be canceled, leaving $10 x^{-5/6} y^{1/3} = 20 x^{1/6} y^{-2/3}$, or $y = 2x$.
Similarly, setting (4.25) and (4.27) equal and assuming $y \neq 0$ gives $10 x^{-5/6} z^{1/2} = 30 x^{1/6} z^{-1/2}$, or
$z = 3x$.
Thus the budget constraint becomes $x + y + z = x + 2x + 3x = 6x = B$, so $x = \frac{B}{6}$. We conclude
that the optimal level of production is achieved by spending $x = \frac{B}{6}$ dollars on labor, $y = 2x = \frac{B}{3}$
dollars on equipment, and $z = 3x = \frac{B}{2}$ dollars on research.


It’s worth stepping back to review this calculation, as there is more here than meets the eye.
A partial derivative can be viewed as a one-variable rate of change. For instance, in the previous
example, $\frac{\partial f}{\partial x}$ is the rate at which the output changes per dollar spent on labor. This is called the
marginal productivity with respect to labor. There are similar interpretations of $\frac{\partial f}{\partial y}$ and $\frac{\partial f}{\partial z}$
with respect to equipment and research, respectively.


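The Lagrange multiplier condition $\nabla f = \lambda \nabla g$ with $g = x + y + z$ says that
$(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}) = \lambda(1, 1, 1) = (\lambda, \lambda, \lambda)$, or:
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = \frac{\partial f}{\partial z} = \lambda. \qquad (4.29)$$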
In other words, the marginal productivities with respect to labor, equipment, and research are
equal. This part of the analysis applies to any production function f , not just the Cobb-Douglas
model of the example. Thus at the optimal level of production, an extra dollar spent on labor,
equipment, or research increases production by the same amount in each case. This makes sense since, if
there were an advantage to increasing the amount spent on one of them, then the manufacturer
could increase production by spending more on that one and less on the others. At the optimal
level, no such advantage can exist.
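Moreover, according to (4.29), the common value of these marginal productivities is the Lagrange
multiplier $\lambda$. In the example, by (4.25):
$$\begin{aligned}
\lambda &= \frac{10\, y^{1/3} z^{1/2}}{x^{5/6}} \\
&= \frac{10 \left(\frac{B}{3}\right)^{1/3} \left(\frac{B}{2}\right)^{1/2}}{\left(\frac{B}{6}\right)^{5/6}} \\
&= 10 \left(\frac{1}{3}\right)^{1/3} \left(\frac{1}{2}\right)^{1/2} B^{5/6} \cdot \frac{6^{5/6}}{B^{5/6}} \\
&= 60 \left(\frac{1}{3}\right)^{1/3} \left(\frac{1}{2}\right)^{1/2} \left(\frac{1}{6}\right)^{1/6} \\
&\approx 21.82.
\end{aligned}$$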

So, at the optimal level, an additional dollar spent on any combination of labor, equipment, or
research increases production by approximately 21.82 units. The seemingly extraneous Lagrange
multiplier has told us something interesting.
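
The claim that all three marginal productivities equal the multiplier can be checked numerically. The sketch below estimates the partial derivatives of the production function at the optimal allocation by central differences and compares them with the closed-form value of the multiplier. The budget B = 600 and the step size h are assumptions made for illustration only; since the multiplier does not depend on B, any positive budget should behave the same way.

```python
def production(x, y, z):
    """Cobb-Douglas output from Example 4.29."""
    return 60 * x ** (1 / 6) * y ** (1 / 3) * z ** (1 / 2)


def marginal_productivities(x, y, z, h=1e-4):
    """Central-difference estimates of the three partial derivatives."""
    fx = (production(x + h, y, z) - production(x - h, y, z)) / (2 * h)
    fy = (production(x, y + h, z) - production(x, y - h, z)) / (2 * h)
    fz = (production(x, y, z + h) - production(x, y, z - h)) / (2 * h)
    return fx, fy, fz


B = 600.0  # hypothetical budget
fx, fy, fz = marginal_productivities(B / 6, B / 3, B / 2)

# Closed-form multiplier from the last line of the calculation above.
lam = 60 * (1 / 3) ** (1 / 3) * (1 / 2) ** (1 / 2) * (1 / 6) ** (1 / 6)
print(fx, fy, fz, lam)
```

All four printed numbers should agree to several decimal places, at roughly 21.82.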

4.13 Exercises for Chapter 4


Section 1 The first-order approximation

1.1. Let $f(x, y) = x^2 - y^2$, and let $a = (2, 1)$.

(a) Find $\frac{\partial f}{\partial x}(a)$ and $\frac{\partial f}{\partial y}(a)$.
(b) Find $Df(a)$ and $\nabla f(a)$.
(c) Find the first-order approximation $\ell(x, y)$ of $f$ at $a$. (You may assume that $f$ is
differentiable at $a$.)
(d) Compare the values of $f(2.01, 1.01)$ and $\ell(2.01, 1.01)$.

1.2. Let $f(x, y, z) = x^2 y + y^2 z + z^2 x$, and let $a = (1, -1, 1)$.


(a) Find $\frac{\partial f}{\partial x}(a)$, $\frac{\partial f}{\partial y}(a)$, and $\frac{\partial f}{\partial z}(a)$.
(b) Find $Df(a)$ and $\nabla f(a)$.
(c) Find the first-order approximation $\ell(x, y, z)$ of $f$ at $a$. (You may assume that $f$ is
differentiable at $a$.)
(d) Compare the values of $f(1.05, -1.1, 0.95)$ and $\ell(1.05, -1.1, 0.95)$.

In Exercises 1.3–1.8, find (a) the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ and (b) the matrix $Df(x, y)$.

1.3. $f(x, y) = x^3 - 2x^2 y + 3xy^2 - 4y^3$

1.4. $f(x, y) = \sin x \sin 2y$

1.5. $f(x, y) = xe^{xy}$

1.6. $f(x, y) = \frac{x^2 - y^2}{x^2 + y^2}$

1.7. $f(x, y) = \frac{1}{x^2 + y^2}$

1.8. f (x, y) = xy
In Exercises 1.9–1.13, find (a) the partial derivatives $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, and $\frac{\partial f}{\partial z}$ and (b) the gradient
$\nabla f(x, y, z)$.

1.9. f (x, y, z) = x + 2y + 3z + 4

1.10. f (x, y, z) = xy + xz + yz − xyz


1.11. $f(x, y, z) = e^{-x^2 - y^2 - z^2} \sin(x + 2y)$

1.12. $f(x, y, z) = \frac{x + y}{y + z}$

1.13. $f(x, y, z) = \ln(x^2 + y^2)$

1.14. Let: y
yx x3 y + x4
 
yx ln x
f (x, y) = x sin(xy) + 2 arctan 5 .
y + xy + 1 x + y6
Evaluate $\frac{\partial f}{\partial y}(1, \pi)$. (Hint: Do not try to find a general formula for $\frac{\partial f}{\partial y}$. Ever.)

1.15. Let $f(x, y) = x + 2y$. Use the definition of differentiability to prove that $f$ is differentiable at
every point $a = (c, d)$ of $\mathbb{R}^2$.

1.16. Use the definition of differentiability to show that the function $f(x, y) = x^2 + y^2$ is
differentiable at every point $a = (c, d)$ of $\mathbb{R}^2$.

1.17. Let $f \colon \mathbb{R}^2 \to \mathbb{R}$ be the function:
$$f(x, y) = \sqrt{x^4 + y^4}.$$
(a) Find formulas for the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ at all points $(x, y)$ other than the
origin.
(b) Find the values of the partial derivatives $\frac{\partial f}{\partial x}(0, 0)$ and $\frac{\partial f}{\partial y}(0, 0)$. (Note that the formulas
you found in part (a) probably do not apply at $(0, 0)$.)


(c) Use the definition of differentiability to determine whether f is differentiable at (0, 0).
(Hint: Polar coordinates might be useful.)

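1.18. Consider the function $f \colon \mathbb{R}^2 \to \mathbb{R}$ defined by:
$$f(x, y) = \begin{cases} \dfrac{x^3 + 2y^3}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases}$$

(a) Find the values of the partial derivatives $\frac{\partial f}{\partial x}(0, 0)$ and $\frac{\partial f}{\partial y}(0, 0)$. (This shouldn’t require
much calculation.)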
(b) Use the definition of differentiability to determine whether f is differentiable at (0, 0).

1.19. Let $f$ and $g$ be differentiable real-valued functions defined on an open set $U$ in $\mathbb{R}^n$. Prove
that:
$$\nabla(fg) = f\,\nabla g + g\,\nabla f.$$
(Hint: Partial derivatives are one-variable derivatives.)

1.20. Let $f, g \colon U \to \mathbb{R}$ be real-valued functions defined on an open set $U$ of $\mathbb{R}^n$, and let $a$ be a point
of $U$. Use the definition of differentiability to prove the following statements.

(a) If f and g are differentiable at a, so is f + g. (It’s reasonably clear that D(f + g)(a) =
Df (a) + Dg(a), so the problem is to show that the limit in the definition of differentia-
bility for f + g is 0.)
(b) If f is differentiable at a, so is cf for any scalar c.

Section 2 Conditions for differentiability

2.1. Let $f \colon \mathbb{R}^2 \to \mathbb{R}$ be the function defined by:
$$f(x, y) = \begin{cases} x^2 - y^2 & \text{if } (x, y) \neq (0, 0), \\ \pi & \text{if } (x, y) = (0, 0). \end{cases}$$

Is $f$ differentiable at $(0, 0)$? If so, find $Df(0, 0)$.

2.2. Is the function $f(x, y) = x^{2/3} + y^{2/3}$ differentiable at $(0, 0)$? If so, find $Df(0, 0)$.

2.3. This exercise gives the proof of Proposition 4.4, that differentiability implies continuity. Let
$U$ be an open set in $\mathbb{R}^n$, $f \colon U \to \mathbb{R}$ a real-valued function, and $a$ a point of $U$. Define a
function $Q \colon U \to \mathbb{R}$ by:
$$Q(x) = \begin{cases} \dfrac{f(x) - f(a) - \nabla f(a) \cdot (x - a)}{\|x - a\|} & \text{if } x \neq a, \\ 0 & \text{if } x = a. \end{cases}$$

Note that the quotient is the one that appears in the definition of differentiability using the
gradient form of the derivative.

(a) If f is differentiable at a, show that Q is continuous at a.


(b) If f is differentiable at a, show that there exists a δ > 0 such that, if x ∈ B(a, δ), then
|Q(x)| < 1.

(c) If $f$ is differentiable at $a$, prove that $f$ is continuous at $a$. (Hint: Use the triangle
inequality and the Cauchy-Schwarz inequality to show that:
$$|f(x) - f(a)| \le |Q(x)|\,\|x - a\| + \|\nabla f(a)\|\,\|x - a\|.)$$

Section 3 The mean value theorem

3.1. Let $B = B(a, r)$ be an open ball in $\mathbb{R}^2$ centered at the point $a = (c, d)$, and let $f \colon B \to \mathbb{R}$ be
a differentiable function such that:
$$Df(x) = \begin{bmatrix} 0 & 0 \end{bmatrix} \quad \text{for all } x \text{ in } B.$$

Prove that f (x) = f (a) for all x in B. In other words, if the derivative of f is zero everywhere,
then f is a constant function.

Section 4 The $C^1$ test

4.1. We showed in Example 4.7 that the function $f(x, y) = \sqrt{x^2 + y^2}$ is not differentiable at $(0, 0)$.
Is $f$ differentiable at points other than the origin? If so, which points?

In Exercises 4.2–4.6, use the various criteria for differentiability discussed so far to determine
the points at which the function is differentiable and the points at which it is not. Your reasons
may be brief as long as they are clear and precise. It should not be necessary to use the definition
of differentiability.

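4.2. $f \colon \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = \sin(x + y)$

4.3. $f \colon \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = \begin{cases} \dfrac{x^2 - y^2}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\ 1 & \text{if } (x, y) = (0, 0) \end{cases}$

4.4. $f \colon \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = |x| + |y|$

4.5. $f \colon \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = x^4 + 2x^2 yz - y^3 z^3$

4.6. $f \colon \mathbb{R}^4 \to \mathbb{R}$, $f(w, x, y, z) = \det \begin{bmatrix} w & x \\ y & z \end{bmatrix}$

4.7. Let $f(x, y) = \sqrt[3]{x^3 + 8y^3}$.

(a) Find formulas for the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ at all points $(x, y)$ where $x^3 + 8y^3 \neq 0$.
(b) Do $\frac{\partial f}{\partial x}(0, 0)$ and $\frac{\partial f}{\partial y}(0, 0)$ exist? If so, what are their values? If not, why not?
(c) Is $f$ differentiable at $(0, 0)$?
(d) More generally, at which points of $\mathbb{R}^2$ is $f$ differentiable? (Hint: If $a = (c, d)$ is a
point other than the origin where $c^3 + 8d^3 = 0$, note that
$\sqrt[3]{x^3 + 8d^3} = \sqrt[3]{x^3 - c^3} = \sqrt[3]{(x - c)(x^2 + cx + c^2)} = \sqrt[3]{x - c} \cdot \sqrt[3]{x^2 + cx + c^2}$,
where $x^2 + cx + c^2 \neq 0$ when $x = c$.)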

Section 5 The Little Chain Rule

5.1. Let $f(x, y) = x^2 + y^2$, and let $\alpha(t) = (t^2, t^3)$. Calculate $(f \circ \alpha)'(1)$ in two different ways:


(a) by using the Little Chain Rule,


(b) by substituting for x and y in terms of t in the formula for f directly and differentiating
the result.

5.2. Let $f(x, y, z) = xyz$ and $\alpha(t) = (\cos t, \sin t, t)$. Calculate $(f \circ \alpha)'(\frac{\pi}{6})$ in two different ways:

(a) by using the Little Chain Rule,


(b) by substituting for x, y, and z in terms of t in the formula for f directly and differentiating
the result.

5.3. Let $f(x, y, z)$ be a differentiable real-valued function of three variables, and let $\alpha(t) =
(x(t), y(t), z(t))$ be a differentiable path in $\mathbb{R}^3$. If $w = f(\alpha(t))$, use the Little Chain Rule
to find a formula for $\frac{dw}{dt}$ in terms of the partial derivatives of $f$ and the derivatives with
respect to $t$ of $x$, $y$, and $z$.

5.4. The Little Chain Rule often lurks in the background of a type of first-year calculus problem
known as “related rates.” For instance, as an (admittedly artificial) example, suppose that
the length $\ell$ and width $w$ of a rectangular region in the plane are changing in time, so that
the area $A = \ell w$ also changes.

(a) Find a formula for $\frac{dA}{dt}$ in terms of $\ell$, $w$, and their derivatives by differentiating the formula
$A = \ell w$ directly with respect to $t$.
(b) Obtain the same result using the Little Chain Rule. (Hint: Let $\alpha(t) = (\ell(t), w(t))$.)
(c) Suppose that, at the instant that the length of the region is 100 inches and its width is 40
inches, the length is increasing at a rate of 2 inches/second and the width is increasing
at a rate of 3 inches/second. How fast is the area changing?

5.5. Let $a$ be a point of $\mathbb{R}^n$, and let $B = B(a, r)$ be an open ball centered at $a$. Let $f \colon B \to \mathbb{R}$ be
a differentiable function. If b is any point of B, show that there exists a point c on the line
segment connecting a and b such that:

f (b) − f (a) = ∇f (c) · (b − a).

Note that this is a generalization of the mean value theorem to real-valued functions of more
than one variable. (Hint: Explain why the line segment can be parametrized by α(t) =
a + t(b − a), 0 ≤ t ≤ 1. Then, note that the composition f ◦ α is a real-valued function of one
variable, so the one-variable mean value theorem applies.)

5.6. In this exercise, we prove the Little Chain Rule (Theorem 4.13). Let $U$ be an open set in $\mathbb{R}^n$,
$\alpha \colon I \to U$ a path in $U$ defined on an open interval $I$, and $f \colon U \to \mathbb{R}$ a real-valued function.
Let $t_0$ be a point of $I$. Assume that $\alpha$ is differentiable at $t_0$ and $f$ is differentiable at $\alpha(t_0)$.
Let $a = \alpha(t_0)$, and let $Q \colon U \to \mathbb{R}$ be the function defined in Exercise 2.3:
$$Q(x) = \begin{cases} \dfrac{f(x) - f(a) - \nabla f(a) \cdot (x - a)}{\|x - a\|} & \text{if } x \neq a, \\ 0 & \text{if } x = a. \end{cases}$$

(a) Prove that $\lim_{t \to t_0} \dfrac{f(\alpha(t)) - f(\alpha(t_0)) - \nabla f(\alpha(t_0)) \cdot (\alpha(t) - \alpha(t_0))}{t - t_0} = 0$.
(Hint: $f(x) - f(a) - \nabla f(a) \cdot (x - a) = \|x - a\|\, Q(x)$. Or see Exercise 8.2 in Chapter 3.)
(b) Prove the Little Chain Rule: $(f \circ \alpha)'(t_0) = \nabla f(\alpha(t_0)) \cdot \alpha'(t_0)$.

5.7. Let $f \colon \mathbb{R}^3 \to \mathbb{R}$ be a differentiable function with the property that $\nabla f(\mathbf{x})$ points in the same
direction as $\mathbf{x}$ for all nonzero $\mathbf{x}$ in $\mathbb{R}^3$. If $a > 0$, prove that $f$ is constant on the sphere
$x^2 + y^2 + z^2 = a^2$. (Hint: If $p$ and $q$ are any two points on the sphere, there is a differentiable
path $\alpha$ on the sphere from $p$ to $q$.)

Section 6 Directional derivatives


In Exercises 6.1–6.4, find the directional derivative of f at the point a in the direction of the
vector v.

6.1. $f(x, y) = e^{-2x} y^3$, $a = (0, 1)$, $v = (4, 1)$

6.2. $f(x, y) = x^2 + y^2$, $a = (1, 2)$, $v = (-2, 1)$

6.3. $f(x, y, z) = x^3 - 2x^2 yz + xz - 3$, $a = (1, 0, -1)$, $v = (1, -1, 2)$

6.4. $f(x, y, z) = \sin x \sin y \cos z$, $a = (0, \frac{\pi}{2}, \pi)$, $v = (\pi, 2\pi, 2\pi)$

6.5. For a certain differentiable real-valued function $f \colon \mathbb{R}^2 \to \mathbb{R}$, the maximum directional
derivative at the point $a = (0, 0)$ has a value of 6 and occurs in the direction from $a$ to $b = (4, 1)$.
Find $\nabla f(a)$.

6.6. Let $f(x, y) = x^2 - y^2$.

(a) At the point $a = (\sqrt{2}, 1)$, find a unit vector $u$ that points in the direction in which $f$ is
increasing most rapidly. What is the rate of increase in this direction?
(b) Describe the set of all points $a = (x, y)$ in $\mathbb{R}^2$ such that $f$ increases most rapidly at $a$ in
the direction that points directly towards the origin.

6.7. A certain function has the form:
$$f(x, y) = rx^2 + sy^2,$$
where $r$ and $s$ are constant. Find values of $r$ and $s$ so that the maximum directional derivative
of $f$ at the point $a = (1, 2)$ has a value of 10 and occurs in the direction from $a$ to the point
$b = (2, 3)$.

6.8. Let f (x, y) = xy, and let a = (5, −5).

(a) Find all directions at a in which the directional derivative of f at a is equal to 1.


(b) Is there a direction in which the directional derivative at a is equal to 10?

6.9. The temperature in a certain region of space, in degrees Celsius, is modeled by the function
$T(x, y, z) = 20e^{-x^2 - 2y^2 - 4z^2}$, where $x, y, z$ are measured in meters. At the point $a = (2, -1, 3)$:

(a) In what direction is the temperature increasing most rapidly?


(b) In what direction is it decreasing most rapidly?
(c) If you travel in the direction described in part (a) at a speed of 10 meters/second, how
fast is the observed temperature changing at a in degrees Celsius per second?

6.10. Let $c = (c_1, c_2, \ldots, c_n)$ be a vector in $\mathbb{R}^n$, and define $f \colon \mathbb{R}^n \to \mathbb{R}$ by $f(x) = c \cdot x$.

(a) Find the derivative Df (x).


(b) If $u$ is a unit vector in $\mathbb{R}^n$, find a formula for the directional derivative $(D_u f)(x)$.

6.11. You discover that your happiness is a function of your location in the plane. At the point
(x, y), your happiness is given by the formula:

$$H(x, y) = x^3 e^{2y} \ \text{happs},$$

where “happ” is a unit of happiness. For instance, at the point (1, 0), you are H(1, 0) = 1
happ happy.

(a) At the point (1, 0), in which direction is your happiness increasing most rapidly? What
is the value of the directional derivative in this direction?
(b) Still at the point (1, 0), in which direction is your happiness decreasing most rapidly?
What is the value of the directional derivative in this direction?
(c) Suppose that, starting at (1, 0), you follow a curve so that you are always traveling in
the direction in which your happiness increases most rapidly. Find an equation for the
curve, and sketch the curve. An equation describing the relationship between x and y
along the curve is fine. No further parametrization is necessary. (Hint: What can you
say about the slope at each point of the curve?)
(d) Suppose instead that, starting at (1, 0), you travel along a curve such that the directional
derivative in the tangent direction is always equal to zero. Find an equation describing
this curve, and sketch the curve.

6.12. Does there exist a differentiable function $f \colon U \to \mathbb{R}$ defined on an open subset $U$ of $\mathbb{R}^n$ with
the property that, for some point $a$ in $U$, the directional derivatives at $a$ satisfy $(D_u f)(a) > 0$
for all unit vectors $u$ in $\mathbb{R}^n$, i.e., the directional derivative at $a$ is positive in every direction?
Either find such a function and point $a$, or explain why none exists.

Section 7 ∇f as normal vector

7.1. Find an equation of the tangent plane to the surface $z = x^2 - y^2$ at the point $a = (1, 2, -3)$.

7.2. Find an equation of the tangent plane to the surface $x^2 + y^2 - z^2 = 1$ at the point $a = (1, -1, 1)$.

7.3. The surface in $\mathbb{R}^3$ given by:
$$(9x^2 + y^2 + z^2 - 1)^3 - y^2 z^3 - \frac{2}{5} x^2 z^3 = 0$$
was featured in an article in the February 14, 2019 issue of The New York Times.$^4$ See Figure
4.14. Nowhere in the article was the tangent plane at the point $(0, 1, 1)$ mentioned. Scoop
the Times by finding an equation of the tangent plane.

7.4. Let $S$ be the ellipsoid $x^2 + \frac{1}{2} y^2 + \frac{1}{4} z^2 = 4$. Let $a = (1, 2, 2)$, and let $n$ be the unit normal
vector to $S$ at $a$ that points outward from $S$. If $f(x, y, z) = xyz$, find the directional derivative
$(D_n f)(a)$.

7.5. The saddle surface $y^2 - x^2 - z = 0$ and the sphere $x^2 + y^2 + z^2 = 14$ in $\mathbb{R}^3$ intersect in a curve
$C$ that passes through the point $a = (1, 2, 3)$. Find a parametrization of the line tangent to
$C$ at $a$.
4
https://www.nytimes.com/2019/02/14/science/math-algorithm-valentine.html. Actually, the article con-
cerns a family of such surfaces, parametrized by two real numbers. The surface in the exercise is a specific example.

Figure 4.14: The surface $(9x^2 + y^2 + z^2 - 1)^3 - y^2 z^3 - \frac{2}{5} x^2 z^3 = 0$

7.6. Find all values of $c$ such that, at every point of intersection of the spheres
$$(x - c)^2 + y^2 + z^2 = 3 \quad \text{and} \quad x^2 + (y - 1)^2 + z^2 = 1,$$

the respective tangent planes are perpendicular to one another.

Section 8 Higher-order partial derivatives


In Exercises 8.1–8.4, find all four second-order partial derivatives of the given function f .

8.1. $f(x, y) = x^4 - 2x^3 y + 3x^2 y^2 - 4xy^3 + 5y^4$

8.2. $f(x, y) = \sin x \cos y$

8.3. $f(x, y) = e^{-x^2 - y^2}$

8.4. $f(x, y) = \frac{y}{x + y}$

8.5. Let $f(x, y) = x^4 y^3$. Find $i$ and $j$ so that the $(i + j)$th-order partial derivative
$\frac{\partial^{i+j} f}{\partial x^i\, \partial y^j}(0, 0)$ is nonzero. What is the value of this partial derivative?

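8.6. Consider the function $f \colon \mathbb{R}^2 \to \mathbb{R}$ defined by:
$$f(x, y) = \begin{cases} \dfrac{x^3 y}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases}$$

(a) Find formulas for the partial derivatives $\frac{\partial f}{\partial x}(x, y)$ and $\frac{\partial f}{\partial y}(x, y)$ if $(x, y) \neq (0, 0)$.
(b) Find the values of $\frac{\partial f}{\partial x}(0, 0)$ and $\frac{\partial f}{\partial y}(0, 0)$.
(c) Use your answers to parts (a) and (b) to evaluate the second-order partials
$$\frac{\partial^2 f}{\partial y\, \partial x}(0, 0) = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)(0, 0) \quad \text{and} \quad \frac{\partial^2 f}{\partial x\, \partial y}(0, 0) = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)(0, 0)$$
at the origin (and only at the origin). (Warning: Do not assume that the second-order
partial derivatives are continuous.)
(d) Is $f$ of class $C^2$ on $\mathbb{R}^2$? Is it of class $C^1$?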


Figure 4.15: The rectangle with vertices a, u, v, w

8.7. This exercise gives the details of a full proof of the equality of mixed partials (Theorem 4.21).
Let $a = (c, d)$ be a point of $\mathbb{R}^2$, and let $f \colon B \to \mathbb{R}$ be a real-valued function defined on an open
ball $B = B(a, r)$ centered at $a$. Assume that the first and second-order partial derivatives of
$f$ are continuous on $B$. Let $R$ be the rectangle whose vertices are $a = (c, d)$, $u = (c + h, d)$,
$v = (c + h, d + k)$, and $w = (c, d + k)$, where $h$ and $k$ are nonzero real numbers small enough
that $R \subset B$. See Figure 4.15.
Let:
$$\begin{aligned}
\Delta &= f(v) - f(u) - f(w) + f(a) \\
&= f(c + h, d + k) - f(c + h, d) - f(c, d + k) + f(c, d).
\end{aligned}$$

(a) Show that $\Delta = \left(\frac{\partial f}{\partial y}(c + h, y_1) - \frac{\partial f}{\partial y}(c, y_1)\right) k$ for some number $y_1$ between $d$ and $d + k$.
(Hint: Note that $\Delta = g(d + k) - g(d)$, where $g$ is the function defined by $g(y) =
f(c + h, y) - f(c, y)$. Apply the mean value theorem.)
(b) Show that there is a point $p = (x_1, y_1)$ in $R$ such that $\Delta = \frac{\partial^2 f}{\partial x\, \partial y}(p)\, hk$. (Hint: Apply
the mean value theorem again, this time to the difference in the result of part (a).)
(c) Similarly, show that there is a point $q = (x_2, y_2)$ in $R$ such that $\Delta = \frac{\partial^2 f}{\partial y\, \partial x}(q)\, hk$. (Hint:
Consider $j(x) = f(x, d + k) - f(x, d)$.)
(d) Deduce that the mixed partials at $a$ are equal: $\frac{\partial^2 f}{\partial x\, \partial y}(a) = \frac{\partial^2 f}{\partial y\, \partial x}(a)$. (Hint: What
happens to $R$ as $h, k \to 0$?)

Section 10 Max/min: critical points


In Exercises 10.1–10.6, find all critical points of f , and, if possible, classify their type.

10.1. $f(x, y) = 2 - 10x - 3x^2 - 20y + 8xy + 3y^2$

10.2. $f(x, y) = 5 - 2x - 2x^2 - 2xy + 2y - y^2$

10.3. $f(x, y) = x^4 + 2y^4 + 3x^2 y^2 + 4x^2 + 5y^2$

10.4. $f(x, y) = x^3 + x^2 + 4xy - 2y^2 + 9$

10.5. $f(x, y) = 2x^3 + y^2 - 6xy$

10.6. $f(x, y) = 4x^3 + y^3 + (1 - x - y)^3$

10.7. Let $k$ be a real number, and let $f(x, y) = x^3 + y^3 - kxy$. Find all critical points of $f$, and
classify each one as a local maximum, local minimum, or neither. (The answer may depend
on the value of $k$.)

10.8. Let $f(x, y) = 3x^4 - 4x^2 y + y^2$.

(a) Make a sketch in $\mathbb{R}^2$ indicating the set of points $(x, y)$ where $f(x, y) > 0$ and the set
where $f(x, y) < 0$. (Hint: Factor $f$.)
(b) Explain why $f$ does not have a local maximum or minimum at $(0, 0)$.
(c) On the other hand, show that $f$ does have a local minimum at $(0, 0)$ when restricted to
any line $y = mx$.

10.9. (a) Given a point $a$ in $\mathbb{R}^n$, define $f \colon \mathbb{R}^n \to \mathbb{R}$ by $f(x) = \|x - a\|^2$. Show that $\nabla f(x) =
2(x - a)$.
(b) Let $p_1, p_2, \ldots, p_m$ be $m$ distinct points in $\mathbb{R}^n$, and let $F(x) = \sum_{k=1}^m \|x - p_k\|^2$. Assuming
that $F$ has a global minimum, show that it must occur when:
$$x = \frac{1}{m} \sum_{k=1}^m p_k \quad \text{(the “centroid”)}.$$

10.10. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a collection of $n$ distinct points in $\mathbb{R}^2$, where $n \ge 2$. We
think of the points as “data.” The line $y = mx + b$ that best fits the data in the sense of
least squares is defined to be the one that minimizes the quantity:
$$E = \sum_{i=1}^n (mx_i + b - y_i)^2.$$

Note that using a line $y = mx + b$ to predict the data value $y = y_i$ when $x = x_i$ gives an error
of magnitude $|mx_i + b - y_i|$. Consequently $E$ is called the total squared error. It depends on
the line, which we identify here by its slope $m$ and y-intercept $b$. Thus $E$ is a function of the
two variables $m$ and $b$, and we can apply the methods we have been studying to find where
it assumes its minimum.
Calculate the partial derivatives $\frac{\partial E}{\partial m}$ and $\frac{\partial E}{\partial b}$, and show that the critical points $(m, b)$ of $E$ are
the solutions of the system of equations:
$$\left\{\begin{aligned}
\Big(\sum_{i=1}^n x_i^2\Big)\, m + \Big(\sum_{i=1}^n x_i\Big)\, b &= \sum_{i=1}^n x_i y_i \\
\Big(\sum_{i=1}^n x_i\Big)\, m + nb &= \sum_{i=1}^n y_i.
\end{aligned}\right. \qquad (4.30)$$
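
For readers who want to see the system (4.30) in action before working the exercises, here is a minimal sketch that forms the four sums and solves the two equations by Cramer's rule. The function name `least_squares_line` and the sample data are hypothetical (the data are deliberately not those of the exercises below), and the sketch simply assumes the system has been derived as stated.

```python
def least_squares_line(points):
    """Solve system (4.30) for the slope m and intercept b by Cramer's rule."""
    n = len(points)
    sx = sum(x for x, _ in points)        # sum of x_i
    sy = sum(y for _, y in points)        # sum of y_i
    sxx = sum(x * x for x, _ in points)   # sum of x_i^2
    sxy = sum(x * y for x, y in points)   # sum of x_i * y_i
    det = sxx * n - sx * sx               # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("all x-coordinates are equal; the system is singular")
    m = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return m, b


# Hypothetical data lying exactly on the line y = 2x + 1.
m, b = least_squares_line([(0, 1), (1, 3), (2, 5), (3, 7)])
print(m, b)
```

Because the sample points are collinear, the fitted line should reproduce the slope 2 and intercept 1.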

In Exercises 10.11–10.12, find the best fitting line in the sense of least squares for the given
data points by solving the system (4.30). Then, plot the points and sketch the line.

10.11. (1, 1), (2, 3), (3, 3)


10.12. (1, 2), (2, 1), (3, 4), (4, 3)

10.13. In this exercise, we use the second derivative test to verify that, for the best fitting line in the
sense of least squares, the critical point $(m, b)$ given by the system (4.30) is a local minimum
of the total squared error $E$.

(a) If $x_1, x_2, \ldots, x_n$ are real numbers, show that $\big(\sum_{i=1}^n x_i\big)^2 \le n \big(\sum_{i=1}^n x_i^2\big)$ and that equality
holds if and only if $x_1 = x_2 = \cdots = x_n$. (Hint: Consider the dot product $x \cdot v$, where
$x = (x_1, x_2, \ldots, x_n)$ and $v = (1, 1, \ldots, 1)$.)


(b) Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be given data points. Show that the Hessian of the
total squared error $E$ is given by:
$$H(m, b) = \begin{bmatrix} 2 \sum_{i=1}^n x_i^2 & 2 \sum_{i=1}^n x_i \\ 2 \sum_{i=1}^n x_i & 2n \end{bmatrix}.$$

(c) Assuming that the data points don’t all have the same x-coordinate, show that the
critical point of E given by (4.30) is a local minimum. (In fact, it is a global minimum,
as follows once one factors in that E is a quadratic polynomial in m and b, though we
won’t go through the details to justify this.)

Section 11 Classifying nondegenerate critical points

11.1. Find the second-order approximation of f (x, y) = cos(x + y) at a = (0, 0).

11.2. Find the second-order approximation of $f(x, y) = x^2 + y^2$ at $a = (1, 2)$.

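11.3. Let $U$ be an open set in $\mathbb{R}^2$, and let $f \colon U \to \mathbb{R}$ be a smooth real-valued function defined on
$U$. Let $a$ be a point of $U$, and let $x = a + v$ be a nearby point, where $v = (h, k)$.
Due to the equality of mixed partials, the powers of the operator $\nabla \cdot v = h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y}$ under
composition expand in the same way as ordinary binomial expressions. For instance:
$$\begin{aligned}
\left(h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y}\right)^2 f &= \left(\left(h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y}\right) \circ \left(h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y}\right)\right) f \\
&= \left(h^2 \frac{\partial}{\partial x} \circ \frac{\partial}{\partial x} + hk \frac{\partial}{\partial x} \circ \frac{\partial}{\partial y} + hk \frac{\partial}{\partial y} \circ \frac{\partial}{\partial x} + k^2 \frac{\partial}{\partial y} \circ \frac{\partial}{\partial y}\right) f \\
&= \left(h^2 \frac{\partial^2}{\partial x^2} + 2hk \frac{\partial^2}{\partial x\, \partial y} + k^2 \frac{\partial^2}{\partial y^2}\right) f \\
&= h^2 \frac{\partial^2 f}{\partial x^2} + 2hk \frac{\partial^2 f}{\partial x\, \partial y} + k^2 \frac{\partial^2 f}{\partial y^2}.
\end{aligned}$$

Note that this agrees with the expression derived in (4.22) as part of the quadratic term in
the second-order approximation of $f$.

(a) Find the corresponding expansion of $\left(h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y}\right)^3 f$.
(b) Find a formula for the third-order approximation of $f$ at $a$. (Recall from first-year
calculus that the third-order Taylor approximation of a function of one variable is
$f(x) = f(a + h) \approx f(a) + f'(a)\, h + \frac{f''(a)}{2} h^2 + \frac{f'''(a)}{3!} h^3$.)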

11.4. Let $f(x, y) = ax^2 + by^2$, where $a$ and $b$ are nonzero constants. For which values of $a$ and $b$
is the origin a local maximum? For which is it a local minimum? For which is it a saddle
point?

11.5. Complete the description of the behavior of the second-order approximation near a critical
point $a$ by considering the case that $\det H(a) \neq 0$ and $A = 0$, where $H(a) = \begin{bmatrix} A & B \\ B & C \end{bmatrix}$.

(a) Show that $\det H(a) < 0$.



(b) Show that the quadratic term $Ah^2 + 2Bhk + Ck^2 = 2Bhk + Ck^2$ of the second-order
approximation can be written as a difference of two perfect squares. (Hence, based on
our prototypical models, we predict $a$ to be a saddle point. Hint: In the case that $C = 0$,
too, consider the expansions of $(a \pm b)^2$.)

Section 12 Max/min: Lagrange multipliers

12.1. Consider the problem of finding the maximum and minimum values of the function $f(x, y) =
xy$ subject to the constraint $x + 2y = 1$.
(a) Explain why a minimum value does not exist.
(b) On the other hand, you may assume that a maximum does exist. Use Lagrange
multipliers to find the maximum value and the point at which it is attained.
12.2. (a) Find the points on the curve $x^2 - xy + y^2 = 4$ at which the function $f(x, y) = x^2 + y^2$
attains its maximum and minimum values. You may assume that these maximum and
minimum values exist.
(b) Use your answer to part (a) to sketch the curve $x^2 - xy + y^2 = 4$. (Hint: Note that
$f(x, y) = x^2 + y^2$ represents the square of the distance from the origin.)
12.3. At what points on the unit sphere $x^2 + y^2 + z^2 = 1$ does the function $f(x, y, z) = 2x - 3y + 4z$
attain its maximum and minimum values?
12.4. Find the minimum value of the function $f(x, y, z) = x^2 + y^2 + z^2$ subject to the constraint
$2x - 3y + 4z = 1$. At what point is the minimum attained?
12.5. Find the positive values of $x$, $y$, and $z$ for which the function $f(x, y, z) = xy^2 z^3$ attains its
maximum value subject to the constraint $3x + 4y + 5z = 12$. You may assume that a maximum
exists.
12.6. (a) Use the method of Lagrange multipliers to find the positive values of $x$, $y$, and $z$ such
that $xyz = 1$ and $x + y + z$ is as small as possible. What is the minimum value? (You
may assume that a minimum exists.)
(b) Use part (a) to show that, for all positive real numbers $a, b, c$,
$$\frac{a + b + c}{3} \ge \sqrt[3]{abc}.$$
This is called the inequality of the arithmetic and geometric means. (Hint: Let
$k = \sqrt[3]{abc}$, and consider $a/k$, $b/k$, and $c/k$.)
 
12.7. Let $A$ be a 3 by 3 symmetric matrix, $A = \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}$, and consider the quadratic polynomial:
$$Q(\mathbf{x}) = \mathbf{x}^t A \mathbf{x} = \begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix},$$
where $\mathbf{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$. Use Lagrange multipliers to show that, when $Q$ is restricted to the unit
sphere $x^2 + y^2 + z^2 = 1$, any point $\mathbf{x}$ at which it attains its maximum or minimum value satisfies
$A\mathbf{x} = \lambda \mathbf{x}$ for some scalar $\lambda$ and that the maximum or minimum value is the corresponding
value of $\lambda$. (In the language of linear algebra, $\mathbf{x}$ is called an eigenvector of $A$ and $\lambda$ is called
the corresponding eigenvalue.) In fact, maxima and minima do exist, so the conclusion is
not an empty one.

12.8. You discover that your happiness is a function of your daily routine. After a great deal of
soul-searching, you decide to focus on three activities to which you will devote your entire
day: if x, y, and z are the number of hours a day that you spend eating, sleeping, and
studying multivariable calculus, respectively, then, based on data provided by the federal
government, your happiness is given by the function:

$$h(x, y, z) = 1000 - 23x^2 - 23y^2 - 46y - z^2 \ \text{happs}.^{5}$$

There is a mix of eating, sleeping, and studying that leads to the greatest possible daily
happiness (you may assume this). What is it? Answer in two ways, as follows.

(a) Substitute for z in terms of x and y in the formula for h, and find the critical points of
the resulting function of x and y.
(b) Use Lagrange multipliers.

12.9. After conducting extensive market research, a manufacturer of monkey saddles discovers that,
if it produces a saddle consisting of x ounces of leather, y ounces of copper, and z ounces of
premium bananas, the monkey rider experiences a total of:

S = 2xy + 3xz + 4yz units of satisfaction.

Leather costs $1 per ounce, copper $2 per ounce, and premium bananas $3 per ounce, and the
manufacturer is willing to spend at most $1000 per saddle. What combination of ingredients
yields the most satisfied monkey? You may assume that a maximum exists.

5
See Exercise 6.11, page 114.
Chapter 5

Real-valued functions: integration

5.1 Volume and iterated integrals


We now study the integral of real-valued functions of more than one variable. Most of our time
is devoted to functions of two variables. In this section, we introduce the integral informally by
thinking about how to visualize it. This uses an idea likely to be familiar from first-year calculus.
It won’t be until the next section that we define what the integral actually is.
Let $D$ be a subset of $\mathbb{R}^2$. At this point, we won’t be too careful about putting any conditions on
$D$, but it’s best to think of it as some sort of two-dimensional blob, as opposed to a set of discrete
points or a curve. Let $f \colon A \to \mathbb{R}$ be a real-valued function whose domain $A$ is a subset of $\mathbb{R}^2$ that
contains $D$. For the moment, assume that $f(x) \ge 0$ for all $x$ in $D$.
Let $W$ be the region in $\mathbb{R}^3$ that lies below the graph $z = f(x, y)$ and above $D$, as shown in
Figure 5.1. That is:
$$W = \{(x, y, z) \in \mathbb{R}^3 : (x, y) \in D \text{ and } 0 \le z \le f(x, y)\}.$$

Figure 5.1: The region W under the graph z = f (x, y) and above D
Consider the volume of $W$, which we denote variously by $\iint_D f(x, y)\, dA$, $\iint_D f(x, y)\, dx\, dy$, or
just $\iint_D f\, dA$. To calculate it, we apply the following principle from first-year calculus:
Volume = Integral of cross-sectional area.
For instance, in first-year calculus, the cross-sections may turn out to be disks, washers, or, in a
slight variant, cylinders.
Example 5.1. Find the volume under the plane z = 4 − x + 2y and above the square 1 ≤ x ≤ 3,
2 ≤ y ≤ 4, in the xy-plane. See Figure 5.2.


Figure 5.2: The region under z = 4 − x + 2y and above the square 1 ≤ x ≤ 3, 2 ≤ y ≤ 4


Here, $f(x, y) = 4 - x + 2y$, and, if $D$ denotes the specified square, we are looking for
$\iint_D (4 - x + 2y)\, dA$. As above, let $W$ denote the region below the plane and above the square. We consider
first cross-sections perpendicular to the x-axis, that is, intersections of $W$ with planes of the form
$x = \text{constant}$. There is such a cross-section for each value of $x$ from $x = 1$ to $x = 3$, so:
$$\text{Volume} = \int_1^3 A(x)\, dx, \qquad (5.1)$$
where $A(x)$ is the area of the cross-section at $x$. A typical cross-section is shown in Figure 5.3. The
base is a line segment from $y = 2$ to $y = 4$, and the top is a curve lying in the graph $z = 4 - x + 2y$.
(In fact, the cross-sections in this example are trapezoids, but let’s not worry about that.) In other
words, $A(x)$ is an area under a curve, so it too is an integral: $A(x) = \int_2^4 (4 - x + 2y)\, dy$, where,
for the given cross-section, $x$ is fixed. Integrating the cross-sectional area as in equation (5.1) and
evaluating gives:
$$\begin{aligned}
\text{Volume} &= \int_1^3 \left(\int_2^4 (4 - x + 2y)\, dy\right) dx \\
&= \int_1^3 \left(4y - xy + y^2\right)\Big|_{y=2}^{y=4}\, dx \\
&= \int_1^3 \big((16 - 4x + 16) - (8 - 2x + 4)\big)\, dx \\
&= \int_1^3 (20 - 2x)\, dx \\
&= \left(20x - x^2\right)\Big|_1^3 \\
&= (60 - 9) - (20 - 1) \\
&= 32.
\end{aligned}$$

Figure 5.3: A cross-section perpendicular to the x-axis

Alternatively, we could have used cross-sections perpendicular to the y-axis, in which case the
cross-sections occur from $y = 2$ to $y = 4$ (Figure 5.4):
$$\text{Volume} = \int_2^4 A(y)\, dy. \qquad (5.2)$$

Figure 5.4: A cross-section perpendicular to the y-axis

Each cross-section lies below the graph and above a line segment that goes from $x = 1$ to $x = 3$,
so $A(y) = \int_1^3 (4 - x + 2y)\, dx$. This time the integral of the cross-sectional area (5.2) gives:
$$\begin{aligned}
\text{Volume} &= \int_2^4 \left(\int_1^3 (4 - x + 2y)\, dx\right) dy \\
&= \int_2^4 \left(4x - \frac{1}{2} x^2 + 2xy\right)\Big|_{x=1}^{x=3}\, dy \\
&= \int_2^4 \left(\Big(12 - \frac{9}{2} + 6y\Big) - \Big(4 - \frac{1}{2} + 2y\Big)\right) dy \\
&= \int_2^4 (4 + 4y)\, dy \\
&= \left(4y + 2y^2\right)\Big|_2^4 \\
&= 48 - 16 \\
&= 32.
\end{aligned}$$

According to this approach, the volume is the integral of an integral, or an iterated integral.
Evaluating the integral consists of a sequence of “partial antidifferentiations” with respect to one
variable at a time, all other variables being held constant. Once you integrate with respect to a
particular variable, that variable disappears from the remainder of the calculation.
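
The idea of an iterated integral translates directly into nested one-variable integrations. The sketch below approximates the volume of Example 5.1 in both orders using a midpoint rule; the helper name `midpoint_integral` and the number of subintervals are assumptions made for illustration, not part of the text. Because the integrand is linear in each variable, the midpoint rule happens to be exact here up to rounding.

```python
def midpoint_integral(g, lo, hi, n=200):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / n
    return width * sum(g(lo + (i + 0.5) * width) for i in range(n))


def f(x, y):
    """The integrand of Example 5.1."""
    return 4 - x + 2 * y


# Integrate in y first (x held fixed), then in x, as in equation (5.1).
volume_dy_dx = midpoint_integral(
    lambda x: midpoint_integral(lambda y: f(x, y), 2, 4), 1, 3
)

# Integrate in x first (y held fixed), then in y, as in equation (5.2).
volume_dx_dy = midpoint_integral(
    lambda y: midpoint_integral(lambda x: f(x, y), 1, 3), 2, 4
)

print(volume_dy_dx, volume_dx_dy)
```

Both orders of integration should print values that agree with the volume of 32 computed by hand.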

Example 5.2. Find the volume of the solid that lies inside the cylinders $x^2 + z^2 = 1$ and $y^2 + z^2 = 1$
and above the xy-plane.
The cylinders have axes along the y and x-axis, respectively, and intersect one another at right
angles, as in Figure 5.5. Perhaps it’s simplest to focus on the quarter of the solid that lies in the first
“octant,” i.e., the region of $\mathbb{R}^3$ in which $x$, $y$, and $z$ are all nonnegative. The base of this portion
is the unit square $0 \le x \le 1$, $0 \le y \le 1$ (Figure 5.6, left). Half of this portion lies under $x^2 + z^2 = 1$
(Figure 5.6, right), running in the y-direction above the triangle in the xy-plane bounded by the
x-axis, the line $x = 1$, and the line $y = x$. Call this triangle $D_1$. The other half runs in the
x-direction, under $y^2 + z^2 = 1$ and above a complementary triangle $D_2$ within the unit square.


Figure 5.5: The cylinders $x^2 + z^2 = 1$ (left), $y^2 + z^2 = 1$ (middle), and the region contained in both
(right)

Figure 5.6: The portion in the first octant (left) and the portion under only $x^2 + z^2 = 1$ (right)

The volume of the original solid is 8 times the volume of the portion over $D_1$. This lies below
$x^2 + z^2 = 1$, i.e., below the graph $z = \sqrt{1 - x^2}$. Hence:
\[
\text{Volume} = 8 \iint_{D_1} \sqrt{1 - x^2}\,dA.
\]

To calculate the volume over $D_1$, let’s try cross-sections perpendicular to the x-axis. These
exist from x = 0 to x = 1:
\[
\text{Volume} = 8 \int_0^1 A(x)\,dx.
\]

The base of each cross-section is a line segment perpendicular to the x-axis within $D_1$. The lower
endpoint is always at y = 0, but the upper endpoint depends on x. Indeed, the upper endpoint lies
on the line y = x. See Figure 5.7.

Figure 5.7: The base of the cross-section at x


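Hence $A(x) = \int_0^x \sqrt{1 - x^2}\,dy$, so:
\[
\begin{aligned}
\text{Volume} &= 8 \int_0^1 \left( \int_0^x \sqrt{1 - x^2}\,dy \right) dx \\
&= 8 \int_0^1 \Big[ \sqrt{1 - x^2} \cdot y \Big]_{y=0}^{y=x}\,dx \\
&= 8 \int_0^1 x \sqrt{1 - x^2}\,dx \qquad (\text{let } u = 1 - x^2,\ du = -2x\,dx) \\
&= 8 \Big[ -\tfrac{1}{2} \cdot \tfrac{2}{3} (1 - x^2)^{3/2} \Big]_0^1 \\
&= -\tfrac{8}{3} \big( 0^{3/2} - 1^{3/2} \big) \\
&= \tfrac{8}{3}.
\end{aligned}
\]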
Alternatively, suppose we had used cross-sections perpendicular to the y-axis. The cross-sections
exist from y = 0 to y = 1, and, for each such y, the base of the cross-section is a line segment in
the x-direction that goes from x = y to x = 1, as illustrated in Figure 5.8.

Figure 5.8: The base of the cross-section at y

Consequently:
\[
\text{Volume} = 8 \int_0^1 A(y)\,dy = 8 \int_0^1 \left( \int_y^1 \sqrt{1 - x^2}\,dx \right) dy.
\]
While not impossible, it is a little less obvious how to do the first partial antidifferentiation with
respect to x by hand. In other words, our original order of antidifferentiation might be preferable.
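
Although the second order is awkward by hand, both orders are easy to approximate numerically. The sketch below is an illustration under stated assumptions (a hypothetical `midpoint_integral` helper, 400 midpoint samples per direction); it only checks that the two descriptions of $D_1$ lead to roughly the same volume, near 8/3.

```python
import math


def midpoint_integral(g, lo, hi, n=400):
    """Approximate the integral of g over [lo, hi] using n midpoint samples."""
    width = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * width) for k in range(n)) * width


def volume_x_sections():
    # For each x in [0, 1], y runs from 0 up to the line y = x.
    def area(x):
        return midpoint_integral(lambda y: math.sqrt(1 - x * x), 0.0, x)
    return 8 * midpoint_integral(area, 0.0, 1.0)


def volume_y_sections():
    # For each y in [0, 1], x runs from the line x = y out to x = 1.
    def area(y):
        return midpoint_integral(lambda x: math.sqrt(1 - x * x), y, 1.0)
    return 8 * midpoint_integral(area, 0.0, 1.0)


print(volume_x_sections(), volume_y_sections(), 8 / 3)
```

The only difference between the two functions is which variable has constant limits and which has limits that depend on the other variable.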
Example 5.3. Consider the iterated integral $\int_0^2 \left( \int_{x^2}^4 x^3 e^{y^3}\,dy \right) dx$.

(a) Sketch the domain of integration D in the xy-plane.


(b) Evaluate the integral.
(a) The domain of integration can be reconstructed from the endpoints of the integrals. The
outermost integral says that x goes from x = 0 to x = 2. Geometrically, we are integrating areas of
cross-sections perpendicular to the x-axis. Then the inner integral says that, for each x, y goes from
$y = x^2$ to y = 4. Figure 5.9 exhibits the relevant input. Hence D is described by the conditions:
\[
D = \{(x, y) \in \mathbb{R}^2 : 0 \le x \le 2,\ x^2 \le y \le 4\}.
\]
This is the region in the first quadrant bounded by the parabola $y = x^2$, the line y = 4, and the
y-axis.


Figure 5.9: The domain of integration: cross-sections perpendicular to the x-axis

(b) To evaluate the integral as presented, we would antidifferentiate first with respect to y, treating
x as constant:
\[
\int_0^2 \left( \int_{x^2}^4 x^3 e^{y^3}\,dy \right) dx = \int_0^2 x^3 \left( \int_{x^2}^4 e^{y^3}\,dy \right) dx.
\]

The innermost antiderivative looks hard. So, having nothing better to do, we try switching the
order of antidifferentiation. Changing to cross-sections perpendicular to the y-axis, we see from the
description of D that y goes from y = 0 to y = 4, and, for each y, x goes from x = 0 to $x = \sqrt{y}$.
The thinking behind the switched order is illustrated in Figure 5.10.

Figure 5.10: Reversing the order of antidifferentiation

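Therefore:
\[
\begin{aligned}
\int_0^2 \left( \int_{x^2}^4 x^3 e^{y^3}\,dy \right) dx &= \int_0^4 \left( \int_0^{\sqrt{y}} x^3 e^{y^3}\,dx \right) dy \\
&= \int_0^4 \Big[ \tfrac{1}{4} x^4 e^{y^3} \Big]_{x=0}^{x=\sqrt{y}}\,dy \\
&= \int_0^4 \big( \tfrac{1}{4} y^2 e^{y^3} - 0 \big)\,dy \qquad (\text{let } u = y^3,\ du = 3y^2\,dy) \\
&= \tfrac{1}{4} \cdot \tfrac{1}{3} \Big[ e^{y^3} \Big]_0^4 \\
&= \tfrac{1}{12} (e^{64} - 1).
\end{aligned}
\]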
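
The change of limits can be sanity-checked numerically. Because $e^{64}$ is far too large for a simple numerical sum to resolve well, the sketch below swaps in a tame stand-in integrand, $x^3 y$, over the same domain D; that substitution, the helper name `midpoint_integral`, and the sample count are all assumptions made for illustration. By hand, either order gives 16 for the stand-in.

```python
import math


def midpoint_integral(g, lo, hi, n=300):
    """Approximate the integral of g over [lo, hi] using n midpoint samples."""
    width = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * width) for k in range(n)) * width


def stand_in(x, y):
    """A tame replacement for x^3 e^(y^3); only the domain D matters here."""
    return x ** 3 * y


def order_dy_dx():
    # D as in part (a): 0 <= x <= 2 and, for each x, x^2 <= y <= 4.
    return midpoint_integral(
        lambda x: midpoint_integral(lambda y: stand_in(x, y), x * x, 4.0),
        0.0, 2.0)


def order_dx_dy():
    # D after switching: 0 <= y <= 4 and, for each y, 0 <= x <= sqrt(y).
    return midpoint_integral(
        lambda y: midpoint_integral(lambda x: stand_in(x, y), 0.0, math.sqrt(y)),
        0.0, 4.0)


print(order_dy_dx(), order_dx_dy())  # both should be close to 16
```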

5.2 The double integral


We now define the integral $\iint_D f(x, y)\,dA$ of a real-valued function f of two variables whose domain
is a subset D of $\mathbb{R}^2$. We assume the following:

• D is a bounded subset of $\mathbb{R}^2$: this means that there is a rectangle R in $\mathbb{R}^2$ such that D ⊂ R.


• f is a bounded function: this means that there is a scalar M such that |f (x)| ≤ M for all
x in D.

To define the integral, we take as a guiding principle that it is a limit of weighted sums. This is
often what gives the integral its power.
We begin by reviewing the one-variable case. Let $f \colon [a, b] \to \mathbb{R}$ be defined on a closed interval
[a, b]. We subdivide the interval into n subintervals of widths $\Delta x_1, \Delta x_2, \ldots, \Delta x_n$ and choose a point
$p_i$ in the ith subinterval for i = 1, 2, . . . , n. See Figure 5.11. We then form the sum $\sum_i f(p_i)\,\Delta x_i$.
This is called a Riemann sum, and we refer to the $p_i$ as sample points.

Figure 5.11: Subdividing [a, b] into n subintervals of widths $\Delta x_1, \Delta x_2, \ldots, \Delta x_n$ with sample points
$p_1, p_2, \ldots, p_n$

We think of the Riemann sum as the sum of the values $f(p_i)$ at the sample points weighted by the width of the
subintervals (though it may be equally appropriate on occasion to think of it as the widths of the
subintervals weighted by the function values). The integral is defined as a limit of the Riemann
sums:
\[
\int_a^b f(x)\,dx = \lim_{\Delta x_i \to 0} \sum_i f(p_i)\,\Delta x_i.
\]
The limit is a different type from the ones we encountered before. Here, it means that, given any
$\epsilon > 0$, there exists a $\delta > 0$ such that, for all Riemann sums $\sum_i f(p_i)\,\Delta x_i$ based on subdivisions of
[a, b] with $\Delta x_i < \delta$ for all i, we have $\left| \sum_i f(p_i)\,\Delta x_i - \int_a^b f(x)\,dx \right| < \epsilon$. This is extremely cumbersome
to work with.
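
To make the one-variable definition concrete, here is a small sketch that forms Riemann sums with one randomly chosen sample point per subinterval. The function name `riemann_sum`, the use of equal-width subintervals, and the test function $f(x) = x^2$ on [0, 1] are assumptions chosen for illustration; the definition itself allows arbitrary subdivisions.

```python
import random


def riemann_sum(f, a, b, n, rng):
    """Riemann sum for f on [a, b]: n equal subintervals, one random sample point in each."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        left = a + i * width
        p = left + rng.random() * width  # sample point p_i in the ith subinterval
        total += f(p) * width            # function value weighted by the width
    return total


rng = random.Random(0)
for n in (10, 100, 1000, 10000):
    # As the widths shrink, the sums should settle near 1/3 no matter
    # how the sample points happen to be chosen.
    print(n, riemann_sum(lambda x: x * x, 0.0, 1.0, n, rng))
```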
Moving on to functions of two variables $f \colon D \to \mathbb{R}$, where $D \subset \mathbb{R}^2$, we do our best to mimic
the one-variable case and proceed in two steps.
Step 1. Integrals over rectangles.
We first introduce some notation that makes it easier to specify the kind of rectangles we have
in mind.

Definition. If X and Y are sets, then the Cartesian product X × Y is defined to be the set of
all ordered pairs:
X × Y = {(x, y) : x ∈ X, y ∈ Y }.

For instance, the product of two closed intervals R = [a, b] × [c, d] = {(x, y) : x ∈ [a, b], y ∈
[c, d]} = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d} is a rectangle in $\mathbb{R}^2$ whose sides are parallel to the
coordinate axes (Figure 5.12). We define how to integrate a function f(x, y) over such a rectangle
R.
We chop up R into a grid of subrectangles with sides parallel to the axes and of dimensions
$\Delta x_i$ by $\Delta y_j$. A simple example is shown in Figure 5.13. We choose a sample point $p_{ij}$ in each
subrectangle, and consider the Riemann sum $\sum_{i,j} f(p_{ij})\,\Delta x_i\,\Delta y_j$. This is a weighted sum in which
the function values at the sample points are weighted by the areas of the subrectangles.
Now, let $\Delta x_i$ and $\Delta y_j$ go to 0.


Figure 5.12: The rectangle R = [a, b] × [c, d] in $\mathbb{R}^2$

Figure 5.13: A subdivision of a rectangle into 4 subrectangles

Definition. Let R = [a, b] × [c, d] be a rectangle in $\mathbb{R}^2$. The integral of a bounded function
$f \colon R \to \mathbb{R}$ is defined by:
\[
\iint_R f(x, y)\,dA = \lim_{\substack{\Delta x_i \to 0 \\ \Delta y_j \to 0}} \sum_{i,j} f(p_{ij})\,\Delta x_i\,\Delta y_j,
\]
assuming that the limit exists. When it does, we say that f is integrable on R.
There are other versions of integrals, too, and the one defined here is known technically as the
Riemann integral. Integrals of functions of two variables are called double integrals.
We begin with a standard, somewhat artificial example that is constructed to make a point.
Example 5.4. Let R = [0, 1] × [0, 1], and define $f \colon R \to \mathbb{R}$ by:
\[
f(x, y) =
\begin{cases}
0 & \text{if $x$ and $y$ are both rational,} \\
1 & \text{otherwise.}
\end{cases}
\]
For instance, $f(\tfrac{1}{2}, \tfrac{3}{5}) = 0$, while $f(\tfrac{1}{2}, \tfrac{\sqrt{2}}{2}) = 1$ and $f(\tfrac{\pi}{6}, \tfrac{\sqrt{2}}{2}) = 1$. The graph z = f(x, y) consists of
two 1 by 1 squares, one at height z = 0 and one at height z = 1, both with lots of missing points.
To evaluate the integral $\iint_R f(x, y)\,dA$ using the definition, consider a subdivision of R into
subrectangles. Regardless of the subdivision, each subrectangle contains some points where f = 0
and others where f = 1. Thus some Riemann sums have the form $\sum_{i,j} 0 \cdot \Delta x_i\,\Delta y_j = 0$, while others
have the form $\sum_{i,j} 1 \cdot \Delta x_i\,\Delta y_j = \text{Area of } R = 1$. This is true no matter what $\Delta x_i$ and $\Delta y_j$ are, so
$\sum_{i,j} f(p_{ij})\,\Delta x_i\,\Delta y_j$ does not approach a limit as $\Delta x_i$ and $\Delta y_j$ go to 0. Thus f is not integrable
on R.
This raises the question of whether it’s possible to tell when a function is integrable. For the
preceding function, in some sense the problem is that it is too discontinuous.

Theorem 5.5. Let $f \colon R \to \mathbb{R}$ be a bounded function defined on a rectangle R. If f is continuous
on R, except possibly at a finite number of points or on a finite number of smooth curves, then f
is integrable on R.
This is actually a special case of a more general theorem that characterizes precisely when a
function is Riemann integrable. Unfortunately, the proof uses concepts that are best left for a
course in real analysis, so we do not go into it further. One consequence, however, is the useful
fact that continuous functions are integrable. This is worth remembering. It covers most of the
functions that we work with.
A second question is whether, once a function is known to be integrable, there is a practical way to
evaluate the integral. Here again, we only indicate why the answer is plausible without attempting
anything like a correct proof. The intuition is that, after you subdivide a rectangle and choose
sample points, you can organize the Riemann sum $\sum_{i,j} f(p_{ij})\,\Delta x_i\,\Delta y_j$ either:
• column-by-column: $\sum_i \big( \sum_j f(p_{ij})\,\Delta y_j \big)\,\Delta x_i$, or
• row-by-row: $\sum_j \big( \sum_i f(p_{ij})\,\Delta x_i \big)\,\Delta y_j$.
These are Riemann sums of Riemann sums. The first sum is an approximation of the iterated
integral $\int_a^b \big( \int_c^d f(x, y)\,dy \big)\,dx$, while the second sum approximates $\int_c^d \big( \int_a^b f(x, y)\,dx \big)\,dy$. Refer to
Figure 5.14. As $\Delta x_i$ and $\Delta y_j$ go to 0, these approximations become exact, and, in the limit, the
iterated integrals coincide with the double integral.
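
The two ways of organizing a Riemann sum can be written out directly. The sketch below is only an illustration: the function name `riemann_sum_two_ways`, the use of midpoints as sample points, and the test integrand $x y^2$ on [0, 1] × [0, 2] (whose integral is 4/3) are assumptions, not choices made by the text.

```python
def riemann_sum_two_ways(f, a, b, c, d, nx, ny):
    """Return the same Riemann sum organized column-by-column and row-by-row.

    The rectangle [a, b] x [c, d] is cut into nx * ny equal subrectangles,
    and the midpoint of each subrectangle serves as its sample point.
    """
    dx = (b - a) / nx
    dy = (d - c) / ny
    xs = [a + (i + 0.5) * dx for i in range(nx)]
    ys = [c + (j + 0.5) * dy for j in range(ny)]
    # Column-by-column: for each fixed i, sum over j first.
    by_columns = sum(sum(f(x, y) * dy for y in ys) * dx for x in xs)
    # Row-by-row: for each fixed j, sum over i first.
    by_rows = sum(sum(f(x, y) * dx for x in xs) * dy for y in ys)
    return by_columns, by_rows


cols, rows = riemann_sum_two_ways(lambda x, y: x * y * y, 0.0, 1.0, 0.0, 2.0, 100, 100)
print(cols, rows)  # the two agree up to rounding, and both are close to 4/3
```

Since both expressions add up the same finite list of terms, they can differ only by floating-point rounding; the content of the plausibility argument is what happens to each inner sum in the limit.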

Figure 5.14: Organizing Riemann sums row-by-row or column-by-column. For instance, a typical
column sum is $\sum_j f(p_{ij})\,\Delta y_j$, where i is fixed.

Theorem 5.6 (Fubini’s theorem). If f is integrable on R = [a, b] × [c, d], then $\iint_R f(x, y)\,dA$ is
equal to both of the iterated integrals:
\[
\iint_R f(x, y)\,dA = \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx = \int_c^d \left( \int_a^b f(x, y)\,dx \right) dy.
\]

In particular, the iterated integrals that we computed in Section 5.1 were actually examples of
double integrals.
Step 2. Integrals over bounded sets in general.
Let $f \colon D \to \mathbb{R}$ be a bounded function defined on a bounded subset D of $\mathbb{R}^2$. By definition, D is
contained in a rectangle R. By taking an even larger rectangle, if necessary, we may assume that
R has the form [a, b] × [c, d]. Define a function $\tilde{f} \colon R \to \mathbb{R}$ by:
\[
\tilde{f}(x, y) =
\begin{cases}
f(x, y) & \text{if } (x, y) \in D, \\
0 & \text{otherwise.}
\end{cases}
\]


An example of a graph of $\tilde{f}$ is shown in Figure 5.15.

Figure 5.15: The general case: the graph of $z = \tilde{f}(x, y)$

The integral of $\tilde{f}$, as a function defined on a rectangle, has been covered in Step 1.

Definition. We say that $f \colon D \to \mathbb{R}$ is integrable on D if $\tilde{f}$ is integrable on R. If so, we define
$\iint_D f(x, y)\,dA = \iint_R \tilde{f}(x, y)\,dA$.

Thus the integral of f over D is defined as a limit of Riemann sums for $\tilde{f}$, $\sum_{i,j} \tilde{f}(p_{ij})\,\Delta x_i\,\Delta y_j$,
based on subdivisions of the larger rectangle R into subrectangles. Since $\tilde{f} = 0$ away from D,
however, only those subrectangles that intersect D contribute to the Riemann sum. Hence we
might write:
\[
\iint_D f(x, y)\,dA = \lim_{\substack{\Delta x_i \to 0 \\ \Delta y_j \to 0}} \ \sum_{\substack{\text{subrectangles} \\ \text{that intersect } D}} f(p_{ij})\,\Delta x_i\,\Delta y_j.
\]

The subdivision is indicated in Figure 5.16.

Figure 5.16: The general case: subdividing a rectangle containing D into subrectangles. Only those
subrectangles that intersect D may contribute to a Riemann sum for $\tilde{f}$.

As Figure 5.15 indicates, even if f is continuous on D, the function $\tilde{f}$ might well be discontinuous
on the boundary of D. Nevertheless, by Theorem 5.5, $\tilde{f}$ will still be integrable on R as long as the
boundary of D consists of finitely many smooth curves. In other words, the integral can tolerate
a certain amount of bad behavior on the boundary. Having this latitude is something we take
advantage of later on.
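
The extension-by-zero construction translates almost word for word into a computation. In the sketch below, the names `integrate_over_region` and `in_unit_disk`, the choice of the unit disk for D inside R = [−1, 1] × [−1, 1], and the 400 × 400 grid with midpoint sample points are all assumptions made for illustration.

```python
import math


def integrate_over_region(f, inside, a, b, c, d, n=400):
    """Riemann sum over R = [a, b] x [c, d] for f extended by zero outside D.

    `inside(x, y)` should return True exactly when (x, y) belongs to D.
    """
    dx = (b - a) / n
    dy = (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        for j in range(n):
            y = c + (j + 0.5) * dy
            if inside(x, y):
                # The extended function equals f at sample points in D ...
                total += f(x, y) * dx * dy
            # ... and equals 0 elsewhere, so other sample points add nothing.
    return total


def in_unit_disk(x, y):
    return x * x + y * y <= 1.0


disk_area = integrate_over_region(lambda x, y: 1.0, in_unit_disk, -1.0, 1.0, -1.0, 1.0)
disk_moment = integrate_over_region(lambda x, y: x * x + y * y, in_unit_disk, -1.0, 1.0, -1.0, 1.0)
print(disk_area, math.pi)        # area of the unit disk
print(disk_moment, math.pi / 2)  # integral of x^2 + y^2 over the unit disk
```

The jump of the extended function across the circle affects only the thin band of subrectangles that straddle the boundary, which is one way to see why the integral tolerates that discontinuity.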
Here are some basic properties of the integral. They are quite reasonable and can be proven
using the definition of the integral as a limit of sums.

Proposition 5.7. Let f and g be integrable functions on a bounded set D in $\mathbb{R}^2$. Then:

1. $\iint_D (f + g)\,dA = \iint_D f\,dA + \iint_D g\,dA$.
2. $\iint_D cf\,dA = c \iint_D f\,dA$ for any scalar c.
3. If f(x, y) ≤ g(x, y) for all (x, y) in D, then $\iint_D f\,dA \le \iint_D g\,dA$.

5.3 Interpretations of the double integral


We have defined the integral of a bounded real-valued function f(x, y) of two variables over a
bounded subset D as:
\[
\iint_D f(x, y)\,dA = \lim_{\substack{\Delta x_i \to 0 \\ \Delta y_j \to 0}} \ \sum_{\substack{\text{subrectangles} \\ \text{that intersect } D}} f(p_{ij})\,\Delta x_i\,\Delta y_j,
\]
where the subrectangles come from subdividing a rectangle R that contains D. Now, we go about
trying to understand why this might be useful.
First, the integral is a limit of sums. Therefore, if a typical summand $f(p_{ij})\,\Delta x_i\,\Delta y_j$ represents
“something,” then $\iint_D f(x, y)\,dA$ represents the total “something.” Also, we have noted before that
the Riemann sum is the sum of the sample values $f(p_{ij})$ weighted by the area of the subrectangles.
We consider a few situations where such a weighted sum might arise.

Example 5.8. The first is when f(x, y) represents some sort of density, or quantity per unit
area, at the point (x, y), for instance, mass density or population density. Then $f(p_{ij})\,\Delta x_i\,\Delta y_j$ is
approximately the quantity on a typical subrectangle, and
\[
\iint_D (\text{density})\,dA = \text{total quantity in } D,
\]
for instance, the total mass or total population.

Example 5.9. Suppose that f is the constant function f(x, y) = 1 for all (x, y) in D. Then
$f(p_{ij})\,\Delta x_i\,\Delta y_j = 1 \cdot \Delta x_i\,\Delta y_j$ is the area of a subrectangle, so:
\[
\iint_D 1\,dA = \text{Area}(D).
\]

Example 5.10. Let’s return to the graph of a function z = f(x, y), where f(x, y) ≥ 0 for all (x, y)
in D. Then $f(p_{ij})$ is the height of the graph above the sample point $p_{ij}$, and $f(p_{ij})\,\Delta x_i\,\Delta y_j$ is
approximately the volume below the graph and above a typical subrectangle (Figure 5.17). Thus:
\[
\iint_D f(x, y)\,dA = \text{volume below the graph of } f \text{ and above } D.
\]
As a result, volume can be regarded either as an iterated integral, as in Section 5.1, or as a
double integral, i.e., a limit of Riemann sums. That these two approaches give the same thing is
another way of understanding Fubini’s theorem.

Example 5.11. Average value of a function.
To average a list of numbers, say 2, 4, 7, 4, 4, we find the sum and divide by the number of
entries:
\[
\text{average} = \frac{2 + 4 + 7 + 4 + 4}{5} = \frac{21}{5} = 4.2.
\]


Figure 5.17: Visualizing a Riemann sum as approximating the volume under a graph

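This could also be written as:
\[
2 \cdot \tfrac{1}{5} + 4 \cdot \tfrac{1}{5} + 7 \cdot \tfrac{1}{5} + 4 \cdot \tfrac{1}{5} + 4 \cdot \tfrac{1}{5}
\quad \text{or} \quad
2 \cdot \tfrac{1}{5} + 4 \cdot \tfrac{3}{5} + 7 \cdot \tfrac{1}{5}.
\]
In other words, it’s a sum in which each value is weighted by its relative frequency.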
Similarly, to find the average value of a function f(x, y) over a region D, we take a sampling
of points $p_{ij}$ in D and consider the sum in which each function value $f(p_{ij})$ is weighted by the
fraction of D that it represents. If the sampling comes from a subdivision into subrectangles and
we think of each sample point as representing its subrectangle, this gives:
\[
\text{Average value} \approx \sum f(p_{ij})\,\frac{\Delta x_i\,\Delta y_j}{\text{Area}(D)} = \frac{1}{\text{Area}(D)} \sum f(p_{ij})\,\Delta x_i\,\Delta y_j.
\]
Taking the limit as $\Delta x_i$ and $\Delta y_j$ go to 0 leads to the following definition.
Definition.
1. The average value of a bounded function f(x, y) on a bounded subset D of $\mathbb{R}^2$, denoted $\bar{f}$,
is defined to be:
\[
\bar{f} = \frac{1}{\text{Area}(D)} \iint_D f\,dA.
\]
2. For example, the average x and y-coordinates on D are:
\[
\bar{x} = \frac{1}{\text{Area}(D)} \iint_D x\,dA \quad \text{and} \quad \bar{y} = \frac{1}{\text{Area}(D)} \iint_D y\,dA.
\]
The point $(\bar{x}, \bar{y})$ is called the centroid of D.

Example 5.12. Let D be the triangular region in $\mathbb{R}^2$ with vertices (0, 0), (2, 2), and (−2, 2).
(a) Find the average value of $f(x, y) = y^2 - x^2$ on D. The graph of f is in Figure 5.18.
(b) Find the centroid of D.
(a) Here, $\bar{f} = \frac{1}{\text{Area}(D)} \iint_D (y^2 - x^2)\,dA$, where the region D is shown in Figure 5.19. We could find
the area of D using $\text{Area}(D) = \iint_D 1\,dA$ as in Example 5.9, but, since D is a triangle, we can use
geometry instead:
\[
\text{Area}(D) = \tfrac{1}{2}(\text{base})(\text{height}) = \tfrac{1}{2} \cdot 4 \cdot 2 = 4.
\]

Figure 5.18: The graph of $f(x, y) = y^2 - x^2$ over the triangle D

Figure 5.19: The triangle D

To set up the double integral $\iint_D (y^2 - x^2)\,dA$ as an iterated integral, note that D is bounded on
the left by y = −x, on the right by y = x, and on top by y = 2. We use cross-sections perpendicular
to the y-axis. These cross-sections exist from y = 0 to y = 2, and, for each y, x goes from x = −y
to x = y. Hence:
\[
\begin{aligned}
\iint_D (y^2 - x^2)\,dA &= \int_0^2 \left( \int_{-y}^{y} (y^2 - x^2)\,dx \right) dy \\
&= \int_0^2 \Big[ y^2 x - \tfrac{1}{3} x^3 \Big]_{x=-y}^{x=y}\,dy \\
&= \int_0^2 \Big( \big(y^3 - \tfrac{1}{3} y^3\big) - \big(-y^3 + \tfrac{1}{3} y^3\big) \Big)\,dy \\
&= \int_0^2 \tfrac{4}{3} y^3\,dy \\
&= \Big[ \tfrac{1}{3} y^4 \Big]_0^2 \\
&= \tfrac{16}{3}.
\end{aligned}
\]
Therefore $\bar{f} = \tfrac{1}{4} \cdot \tfrac{16}{3} = \tfrac{4}{3}$.
As a side note, suppose we had tried to calculate the double integral using cross-sections per-
pendicular to the x-axis instead. These cross-sections exist from x = −2 to x = 2, but the y
endpoints of a given cross-section are different for the left and right halves of the triangle. When x
is between −2 and 0, the cross-section at x goes from y = −x to y = 2, whereas when x is between
0 and 2, it goes from y = x to y = 2 (Figure 5.20). Thus to describe D this way, the integral must
be split into two pieces:
\[
\iint_D (y^2 - x^2)\,dA = \int_{-2}^{0} \left( \int_{-x}^{2} (y^2 - x^2)\,dy \right) dx + \int_0^2 \left( \int_x^2 (y^2 - x^2)\,dy \right) dx.
\]


Figure 5.20: Reversing the order of integration to integrate over D

Actually, we could try to take advantage of symmetry properties of D and f to avoid having
to evaluate both halves. We take up this idea more fully when we discuss the change of variables
theorem in Chapter 7.
(b) Because of the symmetry of D in the y-axis, it’s intuitively clear that the average x-coordinate
is 0: $\bar{x} = 0$. Again, we justify this more carefully later (see Example 7.7 in Chapter 7).
For the average y-coordinate, by definition, $\bar{y} = \frac{1}{\text{Area}(D)} \iint_D y\,dA = \frac{1}{4} \iint_D y\,dA$. We describe D
using the same limits of integration as in part (a) with cross-sections perpendicular to the y-axis:
\[
\begin{aligned}
\iint_D y\,dA &= \int_0^2 \left( \int_{-y}^{y} y\,dx \right) dy \\
&= \int_0^2 \Big[ xy \Big]_{x=-y}^{x=y}\,dy \\
&= \int_0^2 \big( y^2 - (-y^2) \big)\,dy \\
&= \int_0^2 2y^2\,dy \\
&= \Big[ \tfrac{2}{3} y^3 \Big]_0^2 \\
&= \tfrac{16}{3}.
\end{aligned}
\]
Thus, $\bar{y} = \tfrac{1}{4} \cdot \tfrac{16}{3} = \tfrac{4}{3}$, and the centroid is the point $(\bar{x}, \bar{y}) = (0, \tfrac{4}{3})$.
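
The numbers in Example 5.12 can be checked with a short numerical sketch. The helper names `midpoint_integral` and `integrate_over_triangle` and the sample count are assumptions for illustration; the limits of integration are the ones used above, with cross-sections perpendicular to the y-axis.

```python
def midpoint_integral(g, lo, hi, n=200):
    """Approximate the integral of g over [lo, hi] using n midpoint samples."""
    width = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * width) for k in range(n)) * width


def integrate_over_triangle(f, n=200):
    """Iterated integral over D: y runs from 0 to 2 and, for each y, x runs from -y to y."""
    def slice_integral(y):
        return midpoint_integral(lambda x: f(x, y), -y, y, n)
    return midpoint_integral(slice_integral, 0.0, 2.0, n)


area = integrate_over_triangle(lambda x, y: 1.0)
average_f = integrate_over_triangle(lambda x, y: y * y - x * x) / area
x_bar = integrate_over_triangle(lambda x, y: x) / area
y_bar = integrate_over_triangle(lambda x, y: y) / area

print(area)          # close to 4
print(average_f)     # close to 4/3
print(x_bar, y_bar)  # close to (0, 4/3)
```

The value of `x_bar` comes out essentially zero because the midpoint samples on each slice are symmetric about x = 0, which is the numerical shadow of the symmetry argument in part (b).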

5.4 Parametrizations of surfaces


Thus far, we have studied surfaces that arise as:

• graphs z = f (x, y) (or, analogously, y = g(x, z) and x = h(y, z)), or


• level sets f (x, y, z) = c.

We next describe surfaces by how they are traced out. This is like the way we described curves using
parametrizations, except that now, since surfaces are two-dimensional, we need two parameters. In
other words, surfaces will be described by functions $\sigma \colon D \to \mathbb{R}^n$, where $D \subset \mathbb{R}^2$, as illustrated in
Figure 5.21. We want the domain of the parametrization to be two-dimensional, so we assume that
D is an open set U in $\mathbb{R}^2$ together with all of its boundary points. The technical term is that D
is the closure of U. Actually, the theory carries over if some of the boundary points are missing,
but, in our examples, D will often be a rectangle or a closed disk in $\mathbb{R}^2$, in which case all boundary
points are included.
The coordinates of D are the parameters. There are often natural choices for what to call them
depending on the surface, though generically we call them s and t. For every (s, t) in D, σ(s, t)
is a point in $\mathbb{R}^n$, so we may write $\sigma(s, t) = (x_1(s, t), x_2(s, t), \ldots, x_n(s, t))$. We shall focus almost
exclusively on surfaces in $\mathbb{R}^3$, in which case, σ(s, t) = (x(s, t), y(s, t), z(s, t)).

Figure 5.21: A parametrization of a surface in $\mathbb{R}^3$

The reason for introducing the topic now is that this is a good time to talk about how to integrate
real-valued functions over surfaces. Because surfaces are two-dimensional objects, in principle, these
integrals should be extensions of the double integrals we are in the midst of studying. For example,
the interpretations of limits of weighted sums that apply to integrals over regions in the plane
should then carry over to integrals over surfaces as well.
In the remainder of this section, we look at several examples of parametrizations of surfaces
and, in particular, some common geometric parameters. Integrals we leave for the next section.
Example 5.13. Consider the portion of the cone $z = \sqrt{x^2 + y^2}$ in $\mathbb{R}^3$ where 0 ≤ z ≤ 2. See Figure
5.22. The surface is already described in terms of x and y, so we use (x, y) as parameters with
$z = \sqrt{x^2 + y^2}$, that is, let:
\[
\sigma(x, y) = \big(x, y, \sqrt{x^2 + y^2}\big).
\]
The domain D of σ is the set of all (x, y) such that $0 \le \sqrt{x^2 + y^2} \le 2$, or $x^2 + y^2 \le 4$. This is a
disk of radius 2 in the xy-plane. As (x, y) ranges over D, σ(x, y) sweeps out the given portion of
the cone.

Figure 5.22: A parametrization of a cone using x and y as parameters


In this example, we used the x and y coordinates of the xyz-coordinate system as parameters
for the surface. This is always an option for surfaces that are graphs z = f (x, y) of functions of
two variables. We next introduce some other coordinate systems that can be used to locate points
in the plane or in three-dimensional space. These coordinate systems provide the most convenient
way to approach a variety of situations, though, for the time being, we just show how two of the
coordinates are natural parameters for certain common surfaces.

5.4.1 Polar coordinates (r, θ) in $\mathbb{R}^2$


We referred to polar coordinates in Chapter 3, but perhaps it is best to introduce them formally.
To plot a point in $\mathbb{R}^2$ with polar coordinates (r, θ), go out r units along the positive x-axis, then
rotate counterclockwise about the origin by angle θ (Figure 5.23). This brings you to the point
(r cos θ, r sin θ) in rectangular coordinates.

Figure 5.23: Polar coordinates (r, θ)

Thus the conversions from polar to rectangular coordinates are:
\[
\begin{cases}
x = r \cos\theta \\
y = r \sin\theta.
\end{cases}
\]
Also, by the Pythagorean theorem, $r = \sqrt{x^2 + y^2}$. All of $\mathbb{R}^2$ can be described by choosing r and θ
in the ranges r ≥ 0 and 0 ≤ θ ≤ 2π. Actually, θ = 0 and θ = 2π represent the same points, along
the positive x-axis, but it is convenient to allow this degree of redundancy.
For example, the disk D of radius 2 about the origin in Example 5.13 can be described in polar
coordinates by the conditions 0 ≤ r ≤ 2, 0 ≤ θ ≤ 2π, that is, (r, θ) ∈ [0, 2] × [0, 2π]. We think of
imposing polar coordinates on D as a transformation $T \colon [0, 2] \times [0, 2\pi] \to \mathbb{R}^2$ from the rθ-plane to
the xy-plane, where T(r, θ) = (r cos θ, r sin θ). Horizontal segments 0 ≤ r ≤ 2, θ = constant, in the
rθ-plane get mapped to radial segments emanating from the origin in the xy-plane, and vertical
segments r = constant, 0 ≤ θ ≤ 2π, get mapped to circles centered at the origin. This is illustrated
in Figure 5.24.
Example 5.14. We return to the cone $z = \sqrt{x^2 + y^2}$, 0 ≤ z ≤ 2, from Example 5.13. In polar
coordinates, the equation of the cone is z = r, and combining this with the polar description of the
disk D gives a parametrization $\tilde{\sigma}$ of the cone with r and θ as parameters:
\[
\tilde{\sigma} \colon [0, 2] \times [0, 2\pi] \to \mathbb{R}^3, \qquad \tilde{\sigma}(r, \theta) = (r \cos\theta, r \sin\theta, r).
\]
See Figure 5.25. The actual cone is the same one as before, but one advantage of this parametrization
is that, for the purposes of integration, rectangles are nicer domains than disks.
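
One way to see that the two parametrizations of the cone trace out the same surface is to feed the (x, y) output of the polar version into the rectangular version and compare heights. The function names `sigma_xy`, `sigma_polar`, and `max_mismatch` below, and the 51 × 51 sampling grid, are assumptions made for this sketch.

```python
import math


def sigma_xy(x, y):
    """Example 5.13: the cone with (x, y) as parameters."""
    return (x, y, math.hypot(x, y))


def sigma_polar(r, theta):
    """Example 5.14: the same cone with (r, theta) as parameters."""
    return (r * math.cos(theta), r * math.sin(theta), r)


def max_mismatch(n=50):
    """Largest difference in height between the two parametrizations on a grid of (r, theta)."""
    worst = 0.0
    for i in range(n + 1):
        r = 2.0 * i / n
        for j in range(n + 1):
            theta = 2.0 * math.pi * j / n
            x, y, z = sigma_polar(r, theta)
            worst = max(worst, abs(z - sigma_xy(x, y)[2]))
    return worst


print(max_mismatch())  # should be at the level of floating-point rounding
```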

Figure 5.24: The polar coordinate transformation from the rθ-plane to the xy-plane

Figure 5.25: A parametrization of a cone using polar coordinates as parameters

5.4.2 Cylindrical coordinates (r, θ, z) in $\mathbb{R}^3$


Cylindrical coordinates are polar coordinates in the xy-plane together with the usual z-coordinate.
To plot a point with cylindrical coordinates (r, θ, z), go out r units along the positive x-axis, rotate
counterclockwise about the positive z-axis by an angle θ, and then go vertically z units. These
coordinates are shown in Figure 5.26. This brings you to the point (r cos θ, r sin θ, z) in xyz-space.

Figure 5.26: Cylindrical coordinates (r, θ, z)

Thus the conversions from cylindrical to rectangular coordinates are:
\[
\begin{cases}
x = r \cos\theta \\
y = r \sin\theta \\
z = z.
\end{cases}
\]

Again $r = \sqrt{x^2 + y^2}$. All of $\mathbb{R}^3$ is covered by choosing r ≥ 0, 0 ≤ θ ≤ 2π, −∞ < z < ∞.

Example 5.15 (Circular cylinders). We parametrize the circular cylinder of radius a and height
h given by the conditions:
\[
x^2 + y^2 = a^2, \qquad 0 \le z \le h.
\]
The axis of the cylinder is the z-axis.
In cylindrical coordinates, along the cylinder, r is fixed at the radius a, but θ and z can vary
independently. The surface is traced out by setting r = a and letting θ and z range over 0 ≤ θ ≤ 2π,
0 ≤ z ≤ h. In other words, we can parametrize the cylinder using θ and z as parameters. With
r = a fixed, the conversions from cylindrical to rectangular coordinates give the parametrization:
\[
\sigma \colon [0, 2\pi] \times [0, h] \to \mathbb{R}^3, \qquad \sigma(\theta, z) = (a \cos\theta, a \sin\theta, z).
\]

See Figure 5.27.

Figure 5.27: A parametrization of a cylinder using cylindrical coordinates θ and z as parameters

It is instructive to think about how σ transforms the rectangle D = [0, 2π] × [0, h] to become
the cylinder S. In polar/cylindrical coordinates, θ = 0 and θ = 2π represent the same thing in
xyz-space. Thus σ maps the points (0, z) and (2π, z) in the θz-parameter plane to the same point
in $\mathbb{R}^3$. Apart from this, distinct points in D get mapped to distinct points in xyz-space. One way
to construct a cylinder is to take a rectangle like D and glue together, or identify, points (0, z) on
the left edge with points (2π, z) on the right. It is clear geometrically that the result is a cylinder,
and σ gives a formula that carries it out.

5.4.3 Spherical coordinates (ρ, φ, θ) in $\mathbb{R}^3$


The three spherical coordinates of a point in $\mathbb{R}^3$ are described geometrically as follows:

• ρ is the distance from the origin to the point.


• φ is the angle from the positive z-axis to the point. It is like an angle of latitude, except
measured down from the positive z-axis rather than up or down from the equatorial plane.
• θ is the usual polar/cylindrical angle.

To plot a point in $\mathbb{R}^3$ with spherical coordinates (ρ, φ, θ), first go out ρ units along the positive
z-axis. This brings you to the point (0, 0, ρ) in rectangular coordinates. Then, rotate counterclockwise
about the positive y-axis by an angle φ, bringing you to the point (ρ sin φ, 0, ρ cos φ) in the xz-plane,
a distance ρ from the origin and r = ρ sin φ from the z-axis. Finally, rotate counterclockwise about
the positive z-axis by an angle θ. This lands at the point (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ). The
coordinates are labeled in Figure 5.28.

Figure 5.28: Spherical coordinates (ρ, φ, θ)
Thus the conversions from spherical to rectangular coordinates are:
\[
\begin{cases}
x = \rho \sin\varphi \cos\theta \\
y = \rho \sin\varphi \sin\theta \\
z = \rho \cos\varphi.
\end{cases}
\]
From the Pythagorean theorem, the distance from the origin is $\rho = \sqrt{x^2 + y^2 + z^2}$. All of $\mathbb{R}^3$ is
covered by spherical coordinates in the intervals ρ ≥ 0, 0 ≤ φ ≤ π, 0 ≤ θ ≤ 2π. (Note that the
angular interval 0 ≤ φ ≤ π ranges from the direction of the positive z-axis to the direction of
the negative z-axis, and the interval 0 ≤ θ ≤ 2π sweeps all the way around the z-axis. This is
why values of φ in the interval π < φ ≤ 2π are not needed: they would duplicate points already
covered.)
Example 5.16 (Spheres). We parametrize the sphere of radius a centered at the origin:
\[
x^2 + y^2 + z^2 = a^2.
\]
In spherical coordinates, the sphere is traced out by keeping the value of ρ fixed at the radius a,
while φ and θ vary independently. More precisely, set ρ = a, and let φ and θ range over 0 ≤ φ ≤ π,
0 ≤ θ ≤ 2π. Hence we use φ and θ as parameters, and, from the spherical to rectangular conversions
with ρ = a fixed, obtain the parametrization:
\[
\sigma \colon [0, \pi] \times [0, 2\pi] \to \mathbb{R}^3, \qquad \sigma(\varphi, \theta) = (a \sin\varphi \cos\theta, a \sin\varphi \sin\theta, a \cos\varphi).
\]
This is illustrated in Figure 5.29.
Again the parametrization gives precise instructions for how to transform the rectangle D =
[0, π] × [0, 2π] into a sphere. Here, all points where φ = 0 get mapped to the north pole. These are
the points along the left edge of D. Similarly, the right edge of D, where φ = π, gets mapped to
the south pole. Finally, points where θ = 0 and θ = 2π, that is, along the top and bottom edges,
are identified in pairs along a meridian on the sphere in the xz-plane. Thus to construct a sphere
from a rectangle, collapse each of the left and right sides to points and zip together the remaining
two sides to close up the surface.
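
The collapsing and zipping described here can be observed numerically. In the sketch below, the function names `sphere` and `distance`, the radius a = 3, and the particular sample angles are assumptions for illustration; the formula for `sphere` is the parametrization σ(φ, θ) from Example 5.16.

```python
import math


def sphere(phi, theta, a=1.0):
    """The parametrization sigma(phi, theta) of the sphere of radius a."""
    return (a * math.sin(phi) * math.cos(theta),
            a * math.sin(phi) * math.sin(theta),
            a * math.cos(phi))


def distance(p, q):
    return math.sqrt(sum((pc - qc) ** 2 for pc, qc in zip(p, q)))


a = 3.0
angles = [k * math.pi / 8 for k in range(9)]  # sample values of phi in [0, pi]

# Every image point lies at distance a from the origin.
radius_error = max(abs(distance(sphere(phi, 2 * t, a), (0.0, 0.0, 0.0)) - a)
                   for phi in angles for t in angles)

# The whole edge phi = 0 collapses to the north pole (0, 0, a).
pole_error = max(distance(sphere(0.0, 2 * t, a), (0.0, 0.0, a)) for t in angles)

# The edges theta = 0 and theta = 2*pi are identified in pairs.
seam_error = max(distance(sphere(phi, 0.0, a), sphere(phi, 2 * math.pi, a))
                 for phi in angles)

print(radius_error, pole_error, seam_error)  # all at the level of rounding error
```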
Example 5.17. We return one last time to the cone $z = \sqrt{x^2 + y^2}$, 0 ≤ z ≤ 2, which we
have parametrized twice already, once using rectangular coordinates (x, y) and once using
polar/cylindrical coordinates (r, θ) as parameters. It can be parametrized using spherical coordinates
as well.


Figure 5.29: A parametrization of a sphere using spherical coordinates φ and θ as parameters

Here, along the cone, the angle φ down from the positive z-axis is fixed at $\tfrac{\pi}{4}$. To see this, note,
for instance, that the cross-section with the yz-plane is z = |y|. On the other hand, ρ and θ can
vary independently. For example, ρ ranges from 0 to $2\sqrt{2}$. (To determine the upper endpoint, one
can either use trigonometry or solve for ρ using z = ρ cos φ with z = 2 and $\varphi = \tfrac{\pi}{4}$.) Also, θ rotates
all the way around from 0 to 2π. From the spherical conversions to rectangular coordinates, we
obtain a parametrization:
\[
\hat{\sigma} \colon [0, 2\sqrt{2}] \times [0, 2\pi] \to \mathbb{R}^3,
\]
\[
\hat{\sigma}(\rho, \theta) = \Big( \rho \sin\tfrac{\pi}{4} \cos\theta,\ \rho \sin\tfrac{\pi}{4} \sin\theta,\ \rho \cos\tfrac{\pi}{4} \Big)
= \Big( \tfrac{\sqrt{2}}{2} \rho \cos\theta,\ \tfrac{\sqrt{2}}{2} \rho \sin\theta,\ \tfrac{\sqrt{2}}{2} \rho \Big).
\]
We leave as an exercise the instructions that $\hat{\sigma}$ gives for turning a rectangle into a cone. See Figure
5.30.

Figure 5.30: A parametrization of a cone using spherical coordinates (ρ, θ) as parameters

As the cone illustrates, it may be possible to parametrize a given surface in several different
ways.
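
A quick numerical check confirms that the spherical parametrization lands on the same cone and reaches height z = 2 at the upper endpoint of ρ. The function names `sigma_hat` and `cone_error` and the sampling grid are assumptions made for this sketch.

```python
import math

RHO_MAX = 2.0 * math.sqrt(2.0)


def sigma_hat(rho, theta):
    """Example 5.17: the cone with spherical coordinates (rho, theta), phi fixed at pi/4."""
    phi = math.pi / 4
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))


def cone_error(n=40):
    """Largest value of |z - sqrt(x^2 + y^2)| over a grid of parameter values."""
    worst = 0.0
    for i in range(n + 1):
        rho = RHO_MAX * i / n
        for j in range(n + 1):
            theta = 2.0 * math.pi * j / n
            x, y, z = sigma_hat(rho, theta)
            worst = max(worst, abs(z - math.hypot(x, y)))
    return worst


print(cone_error())               # at the level of rounding error
print(sigma_hat(RHO_MAX, 0.0)[2])  # the top rim sits at height close to 2
```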

5.5 Integrals with respect to surface area


We now discuss how to integrate a real-valued function $f \colon S \to \mathbb{R}$ defined on a surface S in $\mathbb{R}^3$. To
formulate a reasonable definition, we follow the usual procedure for integrating: chop up S into a
two-dimensional partition of small pieces of surface area $\Delta S_{ij}$, choose a sample point $p_{ij}$ in each
piece, form a sum $\sum_{i,j} f(p_{ij})\,\Delta S_{ij}$, and take the limit as the size of the pieces goes to zero.
We use a parametrization of S to convert the calculation into a double integral over a region in
the plane. The approach is similar to what we did when we defined integrals over curves with respect
to arclength, though that was a long time ago (Section 2.3 to be precise). Thus let $\sigma \colon D \to \mathbb{R}^3$ be
a parametrization of S, where $D \subset \mathbb{R}^2$. We write σ(s, t) = (x(s, t), y(s, t), z(s, t)). We subdivide
D (or, really, a rectangle that contains D) into small subrectangles of dimensions $\Delta s_i$ by $\Delta t_j$, and
choose a sample point $a_{ij}$ in each subrectangle. Intuitively, the expectation is that σ transforms
each subrectangle into a small “curvy quadrilateral” in S. This is one of the small pieces of S
alluded to earlier. We denote its area by $\Delta S_{ij}$ and take $p_{ij} = \sigma(a_{ij})$ as the corresponding sample
point. This is depicted in Figure 5.31.

Figure 5.31: The integral with respect to surface area


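Let $\frac{\partial\sigma}{\partial s}$ and $\frac{\partial\sigma}{\partial t}$ be the vectors given by:
$$\frac{\partial\sigma}{\partial s} = \left(\frac{\partial x}{\partial s}, \frac{\partial y}{\partial s}, \frac{\partial z}{\partial s}\right) \quad\text{and}\quad \frac{\partial\sigma}{\partial t} = \left(\frac{\partial x}{\partial t}, \frac{\partial y}{\partial t}, \frac{\partial z}{\partial t}\right).$$
We think of these vectors as velocities with respect to s and t, respectively. Then σ transforms
the $\Delta s_i$ side of a typical subrectangle of D approximately to the vector $\frac{\partial\sigma}{\partial s}\,\Delta s_i$ and the $\Delta t_j$ side
approximately to $\frac{\partial\sigma}{\partial t}\,\Delta t_j$. This gives an approximation of the area of the curvy quadrilaterals:
$$\begin{aligned}
\Delta S_{ij} &\approx \text{area of the parallelogram determined by } \frac{\partial\sigma}{\partial s}\,\Delta s_i \text{ and } \frac{\partial\sigma}{\partial t}\,\Delta t_j \\
&= \left\| \frac{\partial\sigma}{\partial s}\,\Delta s_i \times \frac{\partial\sigma}{\partial t}\,\Delta t_j \right\| \quad \text{(by an old property about the cross product and areas, page 36)} \\
&= \left\| \frac{\partial\sigma}{\partial s} \times \frac{\partial\sigma}{\partial t} \right\| \Delta s_i\,\Delta t_j.
\end{aligned}$$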

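See Figure 5.32.

Hence $\sum_{i,j} f(p_{ij})\,\Delta S_{ij} \approx \sum_{i,j} f(\sigma(a_{ij}))\,\left\|\frac{\partial\sigma}{\partial s} \times \frac{\partial\sigma}{\partial t}\right\| \Delta s_i\,\Delta t_j$. In the limit as $\Delta s_i$ and $\Delta t_j$ go to
zero, this becomes the double integral $\iint_D f(\sigma(s, t))\,\left\|\frac{\partial\sigma}{\partial s} \times \frac{\partial\sigma}{\partial t}\right\| ds\,dt$. With this as motivation, we
adopt the following definition.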
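Definition. Let S be a surface in $\mathbb{R}^3$, and let $f : S \to \mathbb{R}$ be a continuous real-valued function.
Then the integral of f with respect to surface area, denoted $\iint_S f\,dS$, is defined to be:
$$\iint_S f\,dS = \iint_D f(\sigma(s, t))\,\left\|\frac{\partial\sigma}{\partial s} \times \frac{\partial\sigma}{\partial t}\right\| ds\,dt,$$
where $\sigma : D \to \mathbb{R}^3$ is a parametrization of S.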

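Figure 5.32: A blow-up of the previous figure: σ sends a small $\Delta s_i$ by $\Delta t_j$ subrectangle to a curvy
quadrilateral whose area $\Delta S_{ij}$ is approximately $\left\|\frac{\partial\sigma}{\partial s} \times \frac{\partial\sigma}{\partial t}\right\| \Delta s_i\,\Delta t_j$.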

We should acknowledge a point that was also glossed over in connection with integrals with respect
to arc length. Namely, we motivated the surface integral using Riemann sums $\sum_{i,j} f(p_{ij})\,\Delta S_{ij}$
based on the surface, but the actual definition relies on a parametrization σ. Perhaps a better notation
for the integral as defined would be $\iint_\sigma f\,dS$. The issue is whether the choice of parametrization
affects the value of the integral. Once we have a few more tools available, we shall see that there is
no need for concern: using the definition of the integral with two different parametrizations of the
same surface gives the same value. The effect of parametrizations on surface integrals is discussed
at the end of Chapter 10.
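
To make the claim plausible before it is proved, here is a minimal numerical sketch that estimates the area of one surface from two different parametrizations. We assume the surface is the cone z = sqrt(x^2 + y^2) with 0 ≤ z ≤ 2, parametrized once with cylindrical coordinates (r, θ) and once with spherical coordinates (ρ, θ). The helper names, the grid size, and the use of central differences for the partial derivatives are our own choices, not part of the text.

```python
import math

def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(sum(c * c for c in u))

def surface_area(sigma, s_range, t_range, n=100, h=1e-6):
    """Midpoint-rule estimate of the double integral of ||sigma_s x sigma_t|| over D."""
    (s0, s1), (t0, t1) = s_range, t_range
    ds, dt = (s1 - s0) / n, (t1 - t0) / n
    total = 0.0
    for i in range(n):
        s = s0 + (i + 0.5) * ds
        for j in range(n):
            t = t0 + (j + 0.5) * dt
            # Central-difference approximations of the two velocity vectors.
            sig_s = tuple((p - m) / (2 * h)
                          for p, m in zip(sigma(s + h, t), sigma(s - h, t)))
            sig_t = tuple((p - m) / (2 * h)
                          for p, m in zip(sigma(s, t + h), sigma(s, t - h)))
            total += norm(cross(sig_s, sig_t)) * ds * dt
    return total

def cone_cylindrical(r, theta):
    return (r * math.cos(theta), r * math.sin(theta), r)

def cone_spherical(rho, theta):
    c = math.sqrt(2) / 2  # sin(pi/4) = cos(pi/4)
    return (c * rho * math.cos(theta), c * rho * math.sin(theta), c * rho)

area_cyl = surface_area(cone_cylindrical, (0.0, 2.0), (0.0, 2 * math.pi))
area_sph = surface_area(cone_spherical, (0.0, 2 * math.sqrt(2)), (0.0, 2 * math.pi))
print(area_cyl, area_sph, 4 * math.sqrt(2) * math.pi)
```

Both estimates agree with each other and with the lateral area π · (radius) · (slant height) = π · 2 · 2√2 of the cone.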
Example 5.18. If $f(x, y, z) = 1$ for all (x, y, z) in S, then $\sum_{i,j} f(p_{ij})\,\Delta S_{ij} = \sum_{i,j} 1 \cdot \Delta S_{ij}$, the
sum of the small pieces of surface area. This is the total area of S, that is:
$$\text{Surface area of } S = \iint_S 1\,dS. \qquad (5.3)$$

Example 5.19. Let S be the sphere $x^2 + y^2 + z^2 = a^2$ of radius a.

(a) Find the surface area of S.
(b) Find $\iint_S z^2\,dS$.
(c) Find $\iint_S (x^2 + y^2 + z^2)\,dS$.

We use the parametrization of a sphere derived in Example 5.16 with spherical coordinates
(φ, θ) as parameters and ρ = a fixed:
σ : [0, π] × [0, 2π] → R3 , σ(φ, θ) = (a sin φ cos θ, a sin φ sin θ, a cos φ).
Let D denote the domain of the parametrization, D = [0, π] × [0, 2π].
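(a) Applying the previous example, we integrate the function f = 1. Then, using the definition of
the integral:
$$\text{Area}(S) = \iint_S 1\,dS = \iint_D 1 \cdot \left\|\frac{\partial\sigma}{\partial\phi} \times \frac{\partial\sigma}{\partial\theta}\right\| d\phi\,d\theta.$$
We calculate $\frac{\partial\sigma}{\partial\phi}$ and $\frac{\partial\sigma}{\partial\theta}$ and place them in the rows of the determinant for a cross product:
$$\begin{aligned}
\frac{\partial\sigma}{\partial\phi} \times \frac{\partial\sigma}{\partial\theta} &= \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a\cos\phi\cos\theta & a\cos\phi\sin\theta & -a\sin\phi \\ -a\sin\phi\sin\theta & a\sin\phi\cos\theta & 0 \end{bmatrix} \\
&= \left(a^2\sin^2\phi\cos\theta,\ a^2\sin^2\phi\sin\theta,\ a^2\cos\phi\sin\phi\,(\cos^2\theta + \sin^2\theta)\right) \\
&= a^2\sin\phi\,\left(\sin\phi\cos\theta,\ \sin\phi\sin\theta,\ \cos\phi\right),
\end{aligned}$$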



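so:
$$\begin{aligned}
\left\|\frac{\partial\sigma}{\partial\phi} \times \frac{\partial\sigma}{\partial\theta}\right\| &= a^2\sin\phi\,\sqrt{\sin^2\phi\cos^2\theta + \sin^2\phi\sin^2\theta + \cos^2\phi} \\
&= a^2\sin\phi\,\sqrt{\sin^2\phi + \cos^2\phi} \\
&= a^2\sin\phi. \qquad (5.4)
\end{aligned}$$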
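Hence:
$$\begin{aligned}
\text{Area}(S) &= \iint_D a^2\sin\phi\,d\phi\,d\theta \\
&= \int_0^{2\pi}\left(\int_0^{\pi} a^2\sin\phi\,d\phi\right)d\theta \\
&= \int_0^{2\pi}\left(-a^2\cos\phi\,\Big|_{\phi=0}^{\phi=\pi}\right)d\theta \\
&= \int_0^{2\pi}\left(a^2 - (-a^2)\right)d\theta \\
&= \int_0^{2\pi} 2a^2\,d\theta \\
&= 2a^2\theta\,\Big|_0^{2\pi} \\
&= 4\pi a^2.
\end{aligned}$$
This formula gets used enough that we put it in a box:
The surface area of a sphere of radius a is $4\pi a^2$.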
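(b) Here, $f(x, y, z) = z^2$, and substituting for z in terms of the parameters gives $f(\sigma(\phi, \theta)) =
a^2\cos^2\phi$. Using the result of (5.4) for $\left\|\frac{\partial\sigma}{\partial\phi} \times \frac{\partial\sigma}{\partial\theta}\right\|$, we obtain:
$$\begin{aligned}
\iint_S z^2\,dS &= \iint_D f(\sigma(\phi, \theta))\,\left\|\frac{\partial\sigma}{\partial\phi} \times \frac{\partial\sigma}{\partial\theta}\right\| d\phi\,d\theta \\
&= \iint_D (a^2\cos^2\phi)\cdot(a^2\sin\phi)\,d\phi\,d\theta \\
&= a^4 \iint_D \cos^2\phi\,\sin\phi\,d\phi\,d\theta \\
&= a^4 \int_0^{2\pi}\left(\int_0^{\pi} \cos^2\phi\,\sin\phi\,d\phi\right)d\theta \quad (\text{let } u = \cos\phi,\ du = -\sin\phi\,d\phi) \\
&= a^4 \int_0^{2\pi}\left(-\tfrac{1}{3}\cos^3\phi\,\Big|_{\phi=0}^{\phi=\pi}\right)d\theta \\
&= a^4 \int_0^{2\pi}\left(\tfrac{1}{3} - \left(-\tfrac{1}{3}\right)\right)d\theta \\
&= a^4 \int_0^{2\pi} \tfrac{2}{3}\,d\theta \\
&= \tfrac{2}{3}\,a^4\,\theta\,\Big|_0^{2\pi} \\
&= \tfrac{4}{3}\pi a^4.
\end{aligned}$$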

(c) This time $f(x, y, z) = x^2 + y^2 + z^2$. We use a trick! On the sphere S, $x^2 + y^2 + z^2 = a^2$, so
$\iint_S (x^2 + y^2 + z^2)\,dS = \iint_S a^2\,dS = a^2 \iint_S 1\,dS = a^2\,\text{Area}(S)$. Using the formula for the area from
part (a) then gives:
$$\iint_S (x^2 + y^2 + z^2)\,dS = a^2 \cdot 4\pi a^2 = 4\pi a^4.$$

The trick enabled us to evaluate the integral with almost no calculation. This is a highly
desirable situation. In retrospect, we can get even more mileage out of it by going back and using
the result of part (c) to redo $\iint_S z^2\,dS$ in part (b). For the sphere S is symmetric in x, y, and z,
whence $\iint_S x^2\,dS = \iint_S y^2\,dS = \iint_S z^2\,dS$. Therefore $\iint_S (x^2 + y^2 + z^2)\,dS = 3\iint_S z^2\,dS$, or:
$$\iint_S z^2\,dS = \frac{1}{3}\iint_S (x^2 + y^2 + z^2)\,dS = \frac{1}{3}\cdot 4\pi a^4 = \frac{4}{3}\pi a^4,$$

where the next-to-last step used part (c). Happily, this is the same as the answer we obtained in
part (b) originally by grinding it out with a parametrization.
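
The three answers of Example 5.19 can also be checked numerically. The sketch below applies a midpoint Riemann sum to the double integral in the definition, using the parametrization σ(φ, θ) and the factor a² sin φ from (5.4). The radius a = 2, the grid size, and the function name `sphere_surface_integral` are arbitrary choices made for illustration.

```python
import math

def sphere_surface_integral(f, a, n=200):
    """Midpoint-rule estimate of the integral of f over the sphere of radius a.

    Uses sigma(phi, theta) = (a sin(phi) cos(theta), a sin(phi) sin(theta), a cos(phi))
    on D = [0, pi] x [0, 2 pi], with ||sigma_phi x sigma_theta|| = a^2 sin(phi).
    """
    dphi, dtheta = math.pi / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        for j in range(n):
            theta = (j + 0.5) * dtheta
            x = a * math.sin(phi) * math.cos(theta)
            y = a * math.sin(phi) * math.sin(theta)
            z = a * math.cos(phi)
            total += f(x, y, z) * a * a * math.sin(phi) * dphi * dtheta
    return total

a = 2.0
area = sphere_surface_integral(lambda x, y, z: 1.0, a)                      # part (a)
z_squared = sphere_surface_integral(lambda x, y, z: z * z, a)               # part (b)
r_squared = sphere_surface_integral(lambda x, y, z: x*x + y*y + z*z, a)     # part (c)

print(area, 4 * math.pi * a**2)
print(z_squared, 4 * math.pi * a**4 / 3)
print(r_squared, 4 * math.pi * a**4)
```

Each printed pair should agree to several digits, and the second estimate is one third of the third, as the symmetry argument predicts.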

5.6 Triple integrals and beyond


Integrals of real-valued functions of three or more variables are conceptually the same as double
integrals, though the notation is more encumbered. We merely outline the definition. Given:

• a bounded subset W of $\mathbb{R}^n$ (that is, there is an n-dimensional box $R = [a_1, b_1] \times [a_2, b_2] \times
\cdots \times [a_n, b_n]$ such that $W \subset R$) and
• a bounded real-valued function f : W → R (that is, there is a scalar M such that |f (x)| ≤ M
for all x in W ),

one subdivides W into small n-dimensional subboxes of dimensions $\Delta x_1$ by $\Delta x_2$ by · · · by $\Delta x_n$,
chooses a sample point p in each subbox, and considers sums of the form $\sum f(p)\,\Delta x_1\,\Delta x_2 \cdots \Delta x_n$,
where the summation is over all the subboxes. The integral is the limit of these sums as the
dimensions of the subboxes go to zero. It is denoted:
$$\iint\!\cdots\!\int_W f\,dV \quad\text{or}\quad \iint\!\cdots\!\int_W f\,dx_1\,dx_2 \cdots dx_n.$$

For instance, if $W \subset \mathbb{R}^3$ and f = f (x, y, z) is a function of three variables, then $\iiint_W f\,dV$ is called
a triple integral.
The higher-dimensional integral has the same sort of weighted sum interpretations as the double
integral, such as:
• n-dimensional volume: $\text{Vol}(W) = \iint\!\cdots\!\int_W 1\,dV$,
• average value: $\bar{f} = \frac{1}{\text{Vol}(W)} \iint\!\cdots\!\int_W f\,dV$.

For example, if $W \subset \mathbb{R}^3$, the average x-coordinate on W is $\bar{x} = \frac{1}{\text{Vol}(W)} \iiint_W x\,dV$, similarly for $\bar{y}$
and $\bar{z}$, and the point $(\bar{x}, \bar{y}, \bar{z})$ is called the centroid of W.
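
The Riemann-sum description translates directly into a few lines of code. As a minimal sketch, assume W is itself a box, here [0, 2] × [0, 1] × [0, 3], chosen only for illustration. We subdivide it into n³ subboxes, sample at their midpoints, and form the sums for the volume and the centroid. The function name `riemann_sum_box` is hypothetical.

```python
def riemann_sum_box(f, bounds, n=20):
    """Midpoint Riemann sum of f over the box [a1,b1] x [a2,b2] x [a3,b3]."""
    (a1, b1), (a2, b2), (a3, b3) = bounds
    dx, dy, dz = (b1 - a1) / n, (b2 - a2) / n, (b3 - a3) / n
    total = 0.0
    for i in range(n):
        x = a1 + (i + 0.5) * dx
        for j in range(n):
            y = a2 + (j + 0.5) * dy
            for k in range(n):
                z = a3 + (k + 0.5) * dz
                total += f(x, y, z) * dx * dy * dz  # f(p) * (volume of subbox)
    return total

W = ((0.0, 2.0), (0.0, 1.0), (0.0, 3.0))

# Vol(W) is the integral of the constant function 1.
volume = riemann_sum_box(lambda x, y, z: 1.0, W)

# Each centroid coordinate is the average value of a coordinate function.
centroid = tuple(
    riemann_sum_box(coordinate, W) / volume
    for coordinate in (lambda x, y, z: x, lambda x, y, z: y, lambda x, y, z: z)
)
print(volume, centroid)
```

For a box the centroid is its geometric center, so the output should be close to volume 6 and centroid (1, 0.5, 1.5). For a general bounded W one could extend f by zero outside W, at the cost of slower convergence near the boundary.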
To calculate the integral, one usually evaluates a sequence of n one-dimensional partial integrals.
This can be complicated, because it may be difficult to determine the nested limits of integration
that describe the domain W . This is true even in three dimensions, where, in principle, it is still
possible to visualize what W looks like. We illustrate this by examining in detail a single example
of a triple integral.

Figure 5.33: Three of the planes that bound W : z = y (left), y = x (middle), and x = 1 (right).
The fourth is z = 0, the xy-plane.

Example 5.20. Find $\iiint_W (y + z)\,dV$ if W is the solid region in $\mathbb{R}^3$ bounded by the planes z = y,
y = x, x = 1, and z = 0. See Figure 5.33.
The planes in question slice through one another in such a way that the solid W that they
bound is a tetrahedron. Each of its four triangular faces is contained in one of the planes. Looking
at W towards the origin from the first octant, the z = y face is the top, z = 0 is the bottom, x = 1
is the front, and y = x is the back. This is shown on the left of Figure 5.34.

Figure 5.34: The region of integration W (left) and its projection on the xy-plane (right)

To find the endpoints that describe W in an iterated integral, our approach is to focus on
choosing which variable to integrate with respect to first. Once that variable is integrated out of
the picture, what remains is a double integral, which by now is a much more familiar situation.
For instance, say we decide to integrate first with respect to z, treating x and y as constant.
We need to know the values of x and y for which this process makes sense, that is, the pairs (x, y)
for which there is at least one z such that (x, y, z) ∈ W . These pairs are precisely the projection of
W on the xy-plane. In this example, this is the triangle D at the base of the tetrahedron, which is
shown on the right of Figure 5.34. In the xy-plane, it is bounded by the x-axis and the lines x = 1
and y = x.
For each (x, y) in D, the values of z for which (x, y, z) ∈ W go from z = 0 to z = y. Thus:
$$\iiint_W (y + z)\,dV = \iint_D \left(\int_0^y (y + z)\,dz\right) dx\,dy.$$

The remaining limits of integration come from describing D as a double integral in the usual way,
for instance, using the order of integration dy dx. After that, the integral can be evaluated one
antidifferentiation at a time. Thus:
$$\begin{aligned}
\iiint_W (y + z)\,dV &= \int_0^1 \left(\int_0^x \left(\int_0^y (y + z)\,dz\right) dy\right) dx \\
&= \int_0^1 \left(\int_0^x \left(yz + \tfrac{1}{2}z^2\right)\Big|_{z=0}^{z=y}\,dy\right) dx \\
&= \int_0^1 \left(\int_0^x \tfrac{3}{2}y^2\,dy\right) dx \\
&= \int_0^1 \tfrac{1}{2}y^3\,\Big|_{y=0}^{y=x}\,dx \\
&= \int_0^1 \tfrac{1}{2}x^3\,dx \\
&= \tfrac{1}{8}x^4\,\Big|_0^1 \\
&= \tfrac{1}{8}.
\end{aligned}$$
Example 5.21. Express the same integral $\iiint_W (y + z)\,dV$ from the previous example as an iterated
integral in which:
(a) the first integration is with respect to x,
(b) the first integration is with respect to y.
(a) To integrate first with respect to x, we treat y and z as constant. As in the previous example,
the relevant values of (y, z) are the points in the projection D of W onto the yz-plane. As before,
this is a triangle, though no longer one of the faces of the tetrahedron. The triangle is bounded in
the yz-plane by the y-axis and the lines y = 1 and z = y. See Figure 5.35 on the left.
For each (y, z) in D, the values of x for which (x, y, z) ∈ W go from x = y to x = 1. Hence:
$$\iiint_W (y + z)\,dV = \iint_D \left(\int_y^1 (y + z)\,dx\right) dy\,dz.$$
Describing the remaining double integral over D using the order of integration, say dz dy, gives:
$$\iiint_W (y + z)\,dV = \int_0^1 \left(\int_0^y \left(\int_y^1 (y + z)\,dx\right) dz\right) dy.$$
We won’t carry out the iterated antidifferentiation, though we are optimistic that the final answer
is again $\frac{1}{8}$.
(b) This time, the integral will be a double integral of an integral with respect to y over the
projection of W onto the xz-plane. Again, the projection D is a triangle. Two sides of the triangle
are contained in the x-axis and the line x = 1 in the xz-plane. The third side is the projection of
the line where the planes z = y and y = x intersect. This line of intersection in R3 is described
by z = y = x, and its projection onto the xz-plane is z = x. This is the remaining side of D. See
Figure 5.35 on the right.
For each (x, z) in D, y goes from y = z to y = x. Therefore:
$$\iiint_W (y + z)\,dV = \iint_D \left(\int_z^x (y + z)\,dy\right) dx\,dz.$$
Finally, integrating over D using the order dz dx results in:
$$\iiint_W (y + z)\,dV = \int_0^1 \left(\int_0^x \left(\int_z^x (y + z)\,dy\right) dz\right) dx.$$
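
The optimism expressed in part (a) can be tested numerically. The following sketch nests a one-dimensional midpoint rule three times, once for each of the three orders of integration found in Examples 5.20 and 5.21. The helper `midpoint`, the number of subintervals, and the variable names are our own; this is an approximate check, not a replacement for the exact antidifferentiation.

```python
def midpoint(g, lo, hi, n=40):
    """Midpoint-rule estimate of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def f(x, y, z):
    return y + z

# Example 5.20: z from 0 to y, then y from 0 to x, then x from 0 to 1.
dz_dy_dx = midpoint(
    lambda x: midpoint(
        lambda y: midpoint(lambda z: f(x, y, z), 0.0, y), 0.0, x), 0.0, 1.0)

# Example 5.21(a): x from y to 1, then z from 0 to y, then y from 0 to 1.
dx_dz_dy = midpoint(
    lambda y: midpoint(
        lambda z: midpoint(lambda x: f(x, y, z), y, 1.0), 0.0, y), 0.0, 1.0)

# Example 5.21(b): y from z to x, then z from 0 to x, then x from 0 to 1.
dy_dz_dx = midpoint(
    lambda x: midpoint(
        lambda z: midpoint(lambda y: f(x, y, z), z, x), 0.0, x), 0.0, 1.0)

print(dz_dy_dx, dx_dz_dy, dy_dz_dx)
```

All three estimates should be close to 1/8 = 0.125, consistent with the fact that the three iterated integrals describe the same tetrahedron W.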

Figure 5.35: Integrating first with respect to x (left) or y (right)

5.7 Exercises for Chapter 5


Section 1 Volume and iterated integrals
In Exercises 1.1–1.4, evaluate the given iterated integral.
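1.1. $\int_1^2 \left(\int_3^4 x^2 y\,dy\right) dx$

1.2. $\int_0^1 \left(\int_{x^2}^{x} (x + y)\,dy\right) dx$

1.3. $\int_0^{\pi/2} \left(\int_0^{\sin y} (x\cos y + 1)\,dx\right) dy$

1.4. $\int_1^2 \left(\int_{-y}^{y} \sin(3x + 2y)\,dx\right) dy$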

1.5. Evaluate $\iint_D \frac{x^2}{y}\,dA$ if D is the rectangle described by 1 ≤ x ≤ 3, 2 ≤ y ≤ 4.

1.6. Evaluate $\iint_D (x + y)\,dA$ if D is the quarter-disk $x^2 + y^2 \le 1$, x ≥ 0, y ≥ 0.

1.7. Evaluate $\iint_D e^{x+y}\,dA$ if D is the triangular region with vertices (0, 0), (0, 2), (1, 0).

1.8. Evaluate $\iint_D xy\,dA$ if D is the triangular region with vertices (0, 0), (1, 1), (2, 0).

In Exercises 1.9–1.14, (a) sketch the domain of integration D in the xy-plane and (b) write an
equivalent expression with the order of integration reversed.
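1.9. $\int_1^2 \left(\int_3^4 f(x, y)\,dy\right) dx$

1.10. $\int_0^1 \left(\int_{e^x}^{e} f(x, y)\,dy\right) dx$

1.11. $\int_0^1 \left(\int_{y^2}^{y} f(x, y)\,dx\right) dy$

1.12. $\int_0^{\pi/2} \left(\int_0^{\sin y} f(x, y)\,dx\right) dy$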

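1.13. $\int_{-1}^{1} \left(\int_0^{1-x^2} f(x, y)\,dy\right) dx$

1.14. $\int_{-3}^{6} \left(\int_{\frac{1}{3}y}^{-1+\sqrt{3+y}} f(x, y)\,dx\right) dy$

1.15. Consider the iterated integral $\int_{1/4}^{1} \left(\int_{1/x}^{4} y e^{xy}\,dy\right) dx$.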

(a) Sketch the domain of integration D in the xy-plane.
(b) Write an equivalent expression with the order of integration reversed.
(c) Evaluate the integral using whatever method seems best.

1.16. Evaluate the integral $\int_0^6 \left(\int_{y/2}^{3} \sin(x^2)\,dx\right) dy$.

1.17. Evaluate the integral $\iint_D \sqrt{1 - x^2}\,dA$, where D is the quarter-disk in the first quadrant
described by $x^2 + y^2 \le 1$, x ≥ 0, and y ≥ 0.

1.18. Find the volume of the wedge-shaped solid that lies above the xy-plane, below the plane
z = y, and inside the cylinder $x^2 + y^2 = 1$.

1.19. Find the volume of the pyramid-shaped solid in the first octant bounded by the three coor-
dinate planes and the planes x + z = 1 and y + 2z = 2.

1.20. (a) If c is a positive constant, find the volume of the tetrahedron in the first octant bounded
by the plane x + y + z = c and the three coordinate planes.
(b) Consider the tetrahedron W bounded by the plane x+y +z = 1 and the three coordinate
planes. Suppose that you want to divide W into three pieces of equal volume by slicing
it with two planes parallel to x + y + z = 1, i.e., with planes of the form x + y + z = c.
How should the slices be made?

1.21. Let D be the unit square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and let f : D → R be the function given by
f (x, y) = min{x, y}. Find $\iint_D f(x, y)\,dx\,dy$.

Section 2 The double integral

2.1. Let R = [0, 1] × [0, 1], and let f : R → R be the function given by:
$$f(x, y) = \begin{cases} 1 & \text{if } (x, y) = (\tfrac{1}{2}, \tfrac{1}{2}), \\ 0 & \text{otherwise.} \end{cases}$$

(a) Let R be subdivided into a 3 by 3 grid of subrectangles of equal size (Figure 5.36). What
are the possible values of Riemann sums based on this subdivision? (Different choices
of sample points may give different values.)
(b) Repeat for a 4 by 4 grid of subrectangles of equal size.
(c) How about for an n by n grid of subrectangles of equal size?

Figure 5.36: A 3 by 3 subdivision of R = [0, 1] × [0, 1]

(d) Let δ be a positive real number. If R is subdivided into a grid of subrectangles of
dimensions $\Delta x_i$ by $\Delta y_j$, where $\Delta x_i < \delta$ and $\Delta y_j < \delta$ for all i, j, show that any Riemann
sum based on the subdivision lies in the range $0 \le \sum_{i,j} f(p_{ij})\,\Delta x_i\,\Delta y_j < 4\delta^2$.
(e) Show that $\iint_R f(x, y)\,dA = 0$, i.e., that $\lim_{\substack{\Delta x_i \to 0 \\ \Delta y_j \to 0}} \sum_{i,j} f(p_{ij})\,\Delta x_i\,\Delta y_j = 0$. Technically,
this means you need to show that, given any $\epsilon > 0$, there exists a δ > 0 such that, for
any subdivision of R into subrectangles with $\Delta x_i < \delta$ and $\Delta y_j < \delta$ for all i, j, every
Riemann sum based on the subdivision satisfies $\left|\sum_{i,j} f(p_{ij})\,\Delta x_i\,\Delta y_j - 0\right| < \epsilon$.

2.2. Let R = [0, 1] × [0, 1], and let f : R → R be the function given by:
$$f(x, y) = \begin{cases} 1 & \text{if } x = 0 \text{ or } y = 0, \\ 0 & \text{otherwise.} \end{cases}$$

(a) Let R be subdivided into a 3 by 3 grid of subrectangles of equal size (Figure 5.36). What
are the possible values of Riemann sums based on this subdivision?
(b) Repeat for a 4 by 4 grid of subrectangles of equal size.
(c) How about for an n by n grid of subrectangles of equal size?
(d) Is f integrable on R? If so, what is the value of $\iint_R f(x, y)\,dA$?

2.3. Here is a result that will come in handy increasingly as we calculate more integrals. Let
R = [a, b] × [c, d] be a rectangle, and let F : R → R be a real-valued function such that the
variables x and y separate into two continuous factors, that is, F (x, y) = f (x)g(y), where
f : [a, b] → R and g : [c, d] → R are continuous. Show that:
$$\iint_R f(x)g(y)\,dx\,dy = \left(\int_a^b f(x)\,dx\right)\left(\int_c^d g(y)\,dy\right).$$

2.4. Let f : U → R be a continuous function defined on an open set U in R2 , and let a be a point
of U .

(a) If f (a) > 0, prove that there exists a rectangle R containing a such that:
$$\iint_R f\,dA > 0.$$

(Hint: Use Exercise 6.4 of Chapter 3.)

(b) Let g : U → R be another continuous function on U. If f (a) > g(a), prove that there
exists a rectangle R containing a such that:
$$\iint_R f\,dA > \iint_R g\,dA.$$

(Hint: Consider f − g.)

2.5. In this exercise, we use the double integral to give another proof of the equality of mixed
partial derivatives. Let f (x, y) be a real-valued function defined on an open set U in R2
whose first and second-order partial derivatives are continuous on U .

(a) Let R = [a, b] × [c, d] be a rectangle that is contained in U. Show that:
$$\iint_R \frac{\partial^2 f}{\partial x\,\partial y}(x, y)\,dA = \iint_R \frac{\partial^2 f}{\partial y\,\partial x}(x, y)\,dA.$$
(Hint: Use Fubini’s theorem, with appropriate orders of integration, and the fundamental
theorem of calculus to show that both sides are equal to f (b, d) − f (b, c) − f (a, d) + f (a, c).)
(b) Show that $\frac{\partial^2 f}{\partial x\,\partial y}(a) = \frac{\partial^2 f}{\partial y\,\partial x}(a)$ for all a in U. (Hint: Use Exercise 2.4 to show that
something goes wrong if they are not equal.)

Section 3 Interpretations of the double integral

3.1. The population density of birds in a wildlife refuge decreases at a uniform rate with the
distance from a river. If the river is modeled as the x-axis in the plane, then the density at
the point (x, y) is given by f (x, y) = 5 − |y| hundred birds per square mile, where x and y are
measured in miles. Find the total number of birds in the rectangle R = [−2, 2] × [0, 1].

3.2. A small cookie has the shape of the region in the first quadrant bounded by the curves $y = x^2$
and $x = y^2$, where x and y are measured in inches. Chocolate is poured unevenly on top
of the cookie in such a way that the density of chocolate at the point (x, y) is given by
f (x, y) = 100(x + y) grams per square inch. Find the total mass of chocolate on the cookie.

3.3. The eye of a tornado is positioned directly over the origin in the plane. Suppose that the
wind speed on the ground at the point (x, y) is given by $v(x, y) = 30(x^2 + y^2)$ miles per hour.

(a) Find the average wind speed on the square R = [0, 2] × [0, 2].
(b) Find all points (x, y) of R at which the wind speed equals the average.

3.4. Find the centroid of the half-disk of radius a in $\mathbb{R}^2$ described by $x^2 + y^2 \le a^2$, y ≥ 0.

3.5. Find the centroid of the triangular region in R2 with vertices (0, 0), (1, 2), and (1, 3).

3.6. Find the centroid of the triangular region in R2 with vertices (0, 0), (2, 2), and (1, 3).

3.7. Let D be the fudgsicle-shaped region of R2 that consists of the rectangle [−1, 1] × [0, h] of
height h topped by a half-disk of radius 1, as shown in Figure 5.37. Find the value of h such
that the point (0, h) is the centroid of D.

Figure 5.37: The fudgsicle

3.8. Let R be a rectangle in $\mathbb{R}^2$, and let f : R → R be an integrable function. If f is continuous,
then it is true that there is a point of R at which f assumes its average value, that is, there
exists a point c in R such that $f(c) = \bar{f}$. The proof is not hard, but it depends on subtle
properties of the real numbers that are beyond what we cover here. The same conclusion
need not hold, however, for discontinuous functions. Find an example of a rectangle R and
an integrable function f : R → R such that there is no point c in R at which $f(c) = \bar{f}$.

Section 4 Parametrization of surfaces

4.1. Find a parametrization of the upper hemisphere of radius a described by $x^2 + y^2 + z^2 = a^2$,
z ≥ 0, using the following coordinates as parameters. Be sure to state what the domain of
your parametrization is.

(a) the rectangular coordinates x, y
(b) the polar/cylindrical coordinates r, θ
(c) the spherical coordinates ϕ, θ

4.2. Let S be the surface described in cylindrical coordinates by z = θ, 0 ≤ r ≤ 1, 0 ≤ θ ≤ 4π. It
is called a helicoid. One parametrization for it is:

$$\sigma : [0, 1] \times [0, 4\pi] \to \mathbb{R}^3, \quad \sigma(r, \theta) = (r\cos\theta,\ r\sin\theta,\ \theta).$$

(a) Describe the curve that is traced out by σ if r = 1/2 is held fixed and θ varies, that is,
the curve parametrized by the path $\alpha(\theta) = \sigma(\tfrac{1}{2}, \theta)$, 0 ≤ θ ≤ 4π.
(b) Similarly, describe the curve that is traced out by σ if θ = π/4 is held fixed and r varies,
that is, the curve parametrized by $\beta(r) = \sigma(r, \tfrac{\pi}{4})$, 0 ≤ r ≤ 1.
(c) Describe S in a few words, and draw a sketch.

Section 5 Integrals with respect to surface area

5.1. Let S be the triangular surface in R3 whose vertices are (1, 0, 0), (0, 1, 0), and (0, 0, 1).

(a) Find a parametrization of S using x and y as parameters. What is the domain of your
parametrization? (Hint: The triangle is contained in a plane.)
(b) Use your parametrization and formula (5.3) to find the area of S. Check that your
answer agrees with what you would get using the formula: Area = $\frac{1}{2}$(base)(height).
(c) Find $\iint_S 7\,dS$. (Hint: Use your answer to part (b).)

5.2. Let S be the cylindrical surface $x^2 + y^2 = 4$, 0 ≤ z ≤ 3. Find the integral $\iint_S z^2\,dS$.

5.3. Consider the graph z = f (x, y) of a smooth real-valued function f defined on a bounded
subset D of $\mathbb{R}^2$. Show that the surface area of the graph is given by the formula:
$$\text{Surface area} = \iint_D \sqrt{1 + \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}\ dx\,dy.$$

5.4. Let f : [a, b] → R be a smooth real-valued function of one variable, where a ≥ 0, and let S be
the surface of revolution that is swept out when the curve z = f (x) in the xz-plane is rotated
all the way around the z-axis. Show that the surface area of S is given by the formula:
$$\text{Surface area} = 2\pi \int_a^b x\,\sqrt{1 + f'(x)^2}\ dx.$$

(Hint: Parametrize S using cylindrical coordinates.)

5.5. Let W be the solid bounded on top by the plane z = y + 5, on the sides by the cylinder
$x^2 + y^2 = 4$, and on the bottom by the plane z = 0.

(a) Sketch W.
(b) Let S be the surface that bounds W (top, sides, and bottom). Find the area of S.
(c) Find $\iint_S z\,dS$.

5.6. Let S be the helicoid from Exercise 4.2:

$$\sigma : [0, 1] \times [0, 4\pi] \to \mathbb{R}^3, \quad \sigma(r, \theta) = (r\cos\theta,\ r\sin\theta,\ \theta).$$

Find $\iint_S z\,\sqrt{x^2 + y^2}\ dS$.

5.7. Let W be the solid region of Example 5.2 that lies inside the cylinders $x^2 + z^2 = 1$ and
$y^2 + z^2 = 1$ and above the xy-plane, and let S be the exposed part of the surface that bounds
W, that is, the cylindrical surfaces but not the base.

(a) Find the surface area of S.
(b) Find $\iint_S z\,dS$.

5.8. The sphere $x^2 + y^2 + z^2 = a^2$ of radius a sits inscribed in the circular cylinder $x^2 + y^2 = a^2$.
Given real numbers c and d, where −a ≤ c < d ≤ a, show that the portions of the sphere
and the cylinder lying between the planes z = c and z = d have equal surface area. (Hint:
Parametrize both surfaces using the cylindrical coordinates θ and z as parameters.)

Section 6 Triple integrals and beyond


In Exercises 6.1–6.2, evaluate the given iterated integral.
6.1. $\int_0^1 \left(\int_0^x \left(\int_y^x (x + y - z)\,dz\right) dy\right) dx$

6.2. $\int_0^1 \left(\int_x^{2x} \left(\int_0^{xy} x^3 y^2 z\,dz\right) dy\right) dx$

6.3. Find $\iiint_W xyz\,dV$ if W is the solid region in $\mathbb{R}^3$ below the surface $z = x^2 + y^2$ and above the
square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.

6.4. Find $\iiint_W (xz + y)\,dV$ if W is the solid region in $\mathbb{R}^3$ above the xy-plane bounded by the
surface $y = x^2$ and the planes z = y and y = 1.

6.5. Find $\iiint_W x\,dV$ if W is the region in the first octant bounded by the saddle surface $z = x^2 - y^2$
and the planes x = 2, y = 0, and z = 0.

6.6. Find $\iiint_W (1 + x + y + z)^2\,dV$ if W is the solid tetrahedron in the first octant bounded by
the plane x + y + z = 1 and the three coordinate planes.

6.7. Let a be a positive number, and let W = [0, a] × [0, a] × [0, a] in $\mathbb{R}^3$. Find the average value
of $f(x, y, z) = x^2 + y^2 + z^2$ on W.

For each of the triple integrals in Exercises 6.8–6.11, (a) sketch the domain of integration in
R3 , (b) write an equivalent expression in which the first integration is with respect to x, and (c)
write an equivalent expression in which the first integration is with respect to y.
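6.8. $\int_{-2}^{2} \left(\int_0^{\sqrt{2 - \frac{x^2}{2}}} \left(\int_0^{\sqrt{1 - \frac{x^2}{4} - \frac{y^2}{2}}} f(x, y, z)\,dz\right) dy\right) dx$

6.9. $\int_{-1}^{1} \left(\int_0^{1-x^2} \left(\int_0^{1} f(x, y, z)\,dz\right) dy\right) dx$

6.10. $\int_0^1 \left(\int_x^1 \left(\int_0^x f(x, y, z)\,dz\right) dy\right) dx$

6.11. $\int_0^1 \left(\int_x^1 \left(\int_0^y f(x, y, z)\,dz\right) dy\right) dx$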

Part IV

Vector-valued functions

Chapter 6

Differentiability and the chain rule

So far, we have studied functions of which at least one of the domain or codomain is a subset of R,
in other words:
• vector-valued functions of one variable, also known as paths, α : I → Rn , where I ⊂ R, or
• real-valued functions of n variables f : U → R, where U ⊂ Rn .
Now, we consider the general case of vector-valued functions of n variables, that is, functions of the
form f : U → Rm , where U ⊂ Rn and both n and m are allowed to be greater than 1.
We can leverage what we know about real-valued functions because, if x ∈ U, then, as an
element of $\mathbb{R}^m$, f (x) has m coordinates:
$$f(x) = (f_1(x), f_2(x), \ldots, f_m(x)),$$
where each $f_i(x)$ is a real number. In other words, each component is a real-valued function
$f_i : U \to \mathbb{R}$. Thus a vector-valued function is also a sequence of real-valued functions. We use
the notation above for the components consistently from now on. For example, if $f : \mathbb{R}^2 \to \mathbb{R}^2$ is
the polar coordinate transformation f (r, θ) = (r cos θ, r sin θ), then $f_1(r, \theta) = r\cos\theta$ and $f_2(r, \theta) =
r\sin\theta$. Also, as with paths, we denote f in plainface type, reserving boldface for a special type of
vector-valued function that we study beginning in Chapter 8.
We treated the case of real-valued functions fairly rigorously, and we shall see that much of the
theory carries over to vector-valued functions without incident. On the other hand, we discussed
paths back in Chapter 2 more informally, so what we are about to do here in the vector-valued case
can be taken as establishing the theory behind what we did then.

6.1 Continuity revisited


The motivation and definition for continuity of vector-valued functions carry over almost verbatim
from the real-valued case.
Definition. Let U be an open set in $\mathbb{R}^n$, and let $f : U \to \mathbb{R}^m$ be a function. We say that f is
continuous at a point a of U if, given any open ball $B(f(a), \epsilon)$ about f (a), there exists an open
ball $B(a, \delta)$ about a such that:
$$f(B(a, \delta)) \subset B(f(a), \epsilon).$$
In other words, if $x \in B(a, \delta)$, then $f(x) \in B(f(a), \epsilon)$. See Figure 6.1. Equivalently, given any
$\epsilon > 0$, there exists a δ > 0 such that:
$$\text{if } \|x - a\| < \delta, \text{ then } \|f(x) - f(a)\| < \epsilon.$$


Figure 6.1: Continuity of a vector-valued function: given any ball $B(f(a), \epsilon)$ about f (a), there is a
ball $B(a, \delta)$ about a such that $f(B(a, \delta)) \subset B(f(a), \epsilon)$.

We say that f is a continuous function if it is continuous at every point of its domain.


Proposition 6.1. If f (x) = (f1 (x), f2 (x), . . . , fm (x)), then f is continuous at a if and only if
f1 , f2 , . . . , fm are all continuous at a.
Proof. Intuitively, this is because f (x) − f (a) can be made small if and only if all of its components
fi (x)−fi (a) can be made small, but this needs to be made more precise. There are two implications
that need to be proven here, that each condition implies the other. We give the details for one of
the implications and leave the other for the exercises.
Namely, we show that, if $f_1, f_2, \ldots, f_m$ are continuous at a, then so is f. Let $\epsilon > 0$ be given.
We want to ensure that $\|f(x) - f(a)\| < \epsilon$. Note that:
$$\|f(x) - f(a)\| = \sqrt{\left(f_1(x) - f_1(a)\right)^2 + \left(f_2(x) - f_2(a)\right)^2 + \cdots + \left(f_m(x) - f_m(a)\right)^2}.$$
If we could arrange that $|f_i(x) - f_i(a)| < \frac{\epsilon}{\sqrt{m}}$ for i = 1, 2, . . . , m, then
$$\|f(x) - f(a)\| < \sqrt{\frac{\epsilon^2}{m} + \frac{\epsilon^2}{m} + \cdots + \frac{\epsilon^2}{m}} = \sqrt{\epsilon^2} = \epsilon$$
would follow.
This we can do. For taking $\frac{\epsilon}{\sqrt{m}}$ as the given input in the definition of continuity, the fact that $f_i$ is
continuous at a for i = 1, 2, . . . , m means that there exists a $\delta_i > 0$ such that $|f_i(x) - f_i(a)| < \frac{\epsilon}{\sqrt{m}}$
whenever $\|x - a\| < \delta_i$. Let $\delta = \min\{\delta_1, \delta_2, \ldots, \delta_m\}$. If $\|x - a\| < \delta$, it remains true that
$|f_i(x) - f_i(a)| < \frac{\epsilon}{\sqrt{m}}$ for all i, and hence, as noted above, $\|f(x) - f(a)\| < \epsilon$. We have found a δ
that satisfies the definition of continuity for f.
The proof of the converse is Exercise 1.1.

The proposition allows us to determine if f is continuous by looking at its real-valued component


functions, which is something we have studied before. For instance, for the function f (r, θ) =
(r cos θ, r sin θ), both component functions f1 (r, θ) = r cos θ and f2 (r, θ) = r sin θ are continuous as
real-valued algebraic combinations and compositions of continuous pieces. Therefore f is continuous
as well.
Similarly, the definition of limit for vector-valued functions carries over from the real-valued
case essentially unchanged.
Definition. Let U be an open set in $\mathbb{R}^n$, and let a be a point of U. If $f : U - \{a\} \to \mathbb{R}^m$ is a
vector-valued function defined on U, except possibly at the point a, then we say that $\lim_{x\to a} f(x)$
exists if there is a vector L in $\mathbb{R}^m$ such that the function $\tilde{f} : U \to \mathbb{R}^m$ defined by:
$$\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \ne a, \\ L & \text{if } x = a \end{cases}$$
is continuous at a. When this happens, we write $\lim_{x\to a} f(x) = L$.


As before, it is immediate that f : U → Rm is continuous at a if and only if limx→a f (x) = f (a).
One consequence of this is that Proposition 6.1 then implies that a vector-valued limit can also be
viewed as a sequence of real-valued limits.
Proposition 6.2. If $f(x) = (f_1(x), f_2(x), \ldots, f_m(x))$ and $L = (L_1, L_2, \ldots, L_m)$, then $\lim_{x\to a} f(x)
= L$ if and only if $\lim_{x\to a} f_i(x) = L_i$ for all i = 1, 2, . . . , m.

6.2 Differentiability revisited


Recall that, in the real-valued case, f : U → R is differentiable at a if it has a first-order approximation
$f(x) \approx \ell(x) = f(a) + \nabla f(a) \cdot (x - a)$ such that $\lim_{x\to a} \frac{f(x) - \ell(x)}{\|x - a\|} = 0$. For a vector-valued
function $f : U \to \mathbb{R}^m$, we find an approximation, again denoted $\ell(x)$, by compiling the approximations
of the component functions $f_1, f_2, \ldots, f_m$ into one vector:
$$\begin{aligned}
\ell(x) = \begin{bmatrix} \ell_1(x) \\ \ell_2(x) \\ \vdots \\ \ell_m(x) \end{bmatrix}
&= \begin{bmatrix} f_1(a) + \nabla f_1(a) \cdot (x - a) \\ f_2(a) + \nabla f_2(a) \cdot (x - a) \\ \vdots \\ f_m(a) + \nabla f_m(a) \cdot (x - a) \end{bmatrix} \\
&= \begin{bmatrix} f_1(a) \\ f_2(a) \\ \vdots \\ f_m(a) \end{bmatrix} + \begin{bmatrix} \nabla f_1(a) \cdot (x - a) \\ \nabla f_2(a) \cdot (x - a) \\ \vdots \\ \nabla f_m(a) \cdot (x - a) \end{bmatrix} \\
&= f(a) + \begin{bmatrix} \nabla f_1(a) \\ \nabla f_2(a) \\ \vdots \\ \nabla f_m(a) \end{bmatrix} \cdot (x - a) \\
&= f(a) + Df(a) \cdot (x - a),
\end{aligned}$$
where $Df(a) = \begin{bmatrix} \nabla f_1(a) \\ \nabla f_2(a) \\ \vdots \\ \nabla f_m(a) \end{bmatrix}$ is the matrix whose rows are the gradients of the component functions.
We consider the error $f(x) - \ell(x) = f(x) - f(a) - Df(a) \cdot (x - a)$ in using the approximation
and proceed as in the real-valued case.
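Definition. Let U be an open set in $\mathbb{R}^n$, and let $f : U \to \mathbb{R}^m$ be a function. If a ∈ U, consider the
matrix:
$$Df(a) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{bmatrix}.$$
We say that f is differentiable at a if:
$$\lim_{x\to a} \frac{f(x) - f(a) - Df(a) \cdot (x - a)}{\|x - a\|} = 0. \qquad (6.1)$$
When this happens, Df (a) is called the derivative, or Jacobian matrix, of f at a.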


To repeat a point made in the real-valued case, the derivative is not a single number, but rather
a matrix or, even better, the linear part of a good affine approximation of f near a.
We begin by calculating a few simple examples of Df (a). In general, it is an m by n matrix
whose entries are various partial derivatives. The ith row is the gradient $\nabla f_i$, and the jth column
is the “velocity” $\frac{\partial f}{\partial x_j}$ of f with respect to $x_j$.
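Example 6.3. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be f (r, θ) = (r cos θ, r sin θ). Then:
$$Df(r, \theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}.$$
The formula above gives what the matrix looks like at a general point a = (r, θ). At a specific
point, such as $a = (2, \tfrac{\pi}{4})$, we have:
$$Df(a) = Df(2, \tfrac{\pi}{4}) = \begin{bmatrix} \frac{\sqrt{2}}{2} & -2\cdot\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & 2\cdot\frac{\sqrt{2}}{2} \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{2}}{2} & -\sqrt{2} \\ \frac{\sqrt{2}}{2} & \sqrt{2} \end{bmatrix}.$$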
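
To see condition (6.1) in action for this example, the following sketch evaluates the error ratio ‖f(x) − f(a) − Df(a)·(x − a)‖ / ‖x − a‖ at points x = a + tv that approach a = (2, π/4). The direction v = (0.6, 0.8) and the step sizes are arbitrary choices of ours; a numerical experiment like this illustrates the limit but does not prove it.

```python
import math

def f(r, theta):
    """Polar coordinate transformation."""
    return (r * math.cos(theta), r * math.sin(theta))

def Df(r, theta):
    """Derivative matrix from Example 6.3, stored as a tuple of rows."""
    return ((math.cos(theta), -r * math.sin(theta)),
            (math.sin(theta),  r * math.cos(theta)))

def error_ratio(a, v, t):
    """||f(x) - f(a) - Df(a)(x - a)|| / ||x - a|| for x = a + t v."""
    x = (a[0] + t * v[0], a[1] + t * v[1])
    dx = (x[0] - a[0], x[1] - a[1])
    J = Df(*a)
    linear = (J[0][0] * dx[0] + J[0][1] * dx[1],
              J[1][0] * dx[0] + J[1][1] * dx[1])
    fx, fa = f(*x), f(*a)
    error = math.hypot(fx[0] - fa[0] - linear[0], fx[1] - fa[1] - linear[1])
    return error / math.hypot(dx[0], dx[1])

a = (2.0, math.pi / 4)
v = (0.6, 0.8)  # a unit vector giving the direction of approach
ratios = [error_ratio(a, v, 10.0 ** (-k)) for k in range(1, 5)]
print(Df(*a))
print(ratios)
```

The ratios shrink roughly in proportion to t, which is the behavior expected of a first-order approximation.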
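Example 6.4. Let $g : \mathbb{R}^3 \to \mathbb{R}^2$ be the projection of xyz-space onto the xy-plane, g(x, y, z) = (x, y).
Then:
$$Dg(x, y, z) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$
In other words, $Dg(a) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ for all a in $\mathbb{R}^3$.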
Example 6.5. Let $h : \mathbb{R}^2 \to \mathbb{R}^3$ be $h(x, y) = (x^2 + y^3,\ x^4 y^5,\ e^{6x+7y})$. Then, if a = (x, y):
$$Dh(a) = Dh(x, y) = \begin{bmatrix} 2x & 3y^2 \\ 4x^3 y^5 & 5x^4 y^4 \\ 6e^{6x+7y} & 7e^{6x+7y} \end{bmatrix}.$$
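
A hand-computed derivative matrix like this one can be checked against difference quotients. The sketch below builds a central-difference estimate of the matrix column by column and compares it with the formula for Dh at a sample point. The point (0.3, −0.2), the step size, and the helper name `numerical_jacobian` are our own choices for illustration.

```python
import math

def h(x, y):
    return (x**2 + y**3, x**4 * y**5, math.exp(6 * x + 7 * y))

def Dh(x, y):
    """Derivative matrix from Example 6.5, stored as a list of rows."""
    e = math.exp(6 * x + 7 * y)
    return [[2 * x, 3 * y**2],
            [4 * x**3 * y**5, 5 * x**4 * y**4],
            [6 * e, 7 * e]]

def numerical_jacobian(func, point, step=1e-6):
    """Central-difference estimate; column j approximates the partial with respect to x_j."""
    m = len(func(*point))
    n = len(point)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(point), list(point)
        plus[j] += step
        minus[j] -= step
        f_plus, f_minus = func(*plus), func(*minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * step)
    return J

a = (0.3, -0.2)
exact = Dh(*a)
approx = numerical_jacobian(h, a)
max_diff = max(abs(exact[i][j] - approx[i][j])
               for i in range(3) for j in range(2))
print(max_diff)
```

The largest discrepancy should be tiny, at the level of the truncation and rounding error of the difference quotients.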
To determine whether a function is differentiable without going through the definition, we may
use criteria similar to those in the real-valued case. For instance, the same reasoning as before
shows that a differentiable function must be continuous and that all the partial derivatives $\frac{\partial f_i}{\partial x_j}$
must exist in order for f to be differentiable.
In addition, the condition for differentiability in the definition (6.1) requires that a certain limit
be the zero vector. This can happen if and only if the limit of each of the components is zero, too.
This follows from Proposition 6.2. In other words, we have the following componentwise criterion
for differentiability.
Proposition 6.6. If f (x) = (f1 (x), f2 (x), . . . , fm (x)), then f is differentiable at a if and only if
f1 , f2 , . . . , fm are all differentiable at a.
Example 6.7. Let $I$ be an open interval in $\mathbb{R}$, and let $\alpha\colon I \to \mathbb{R}^n$ be a path in $\mathbb{R}^n$, where
$\alpha(t) = (x_1(t), x_2(t), \ldots, x_n(t))$. By the last proposition, $\alpha$ is differentiable if and only if each of
its components $x_1, x_2, \ldots, x_n$ is differentiable. The latter condition is essentially how we defined
differentiability of paths in Chapter 2, so what we did there is supported by the theory we now
have in place. Moreover, by definition of the derivative matrix:
\[
D\alpha(t) = \begin{bmatrix} x_1'(t) \\ x_2'(t) \\ \vdots \\ x_n'(t) \end{bmatrix}.
\]
This agrees with $\alpha'(t)$, the velocity, under the usual identification of a vector in $\mathbb{R}^n$ with a column
matrix.

According to Proposition 6.6, we may apply the $C^1$ test for real-valued functions to each of the
components of a vector-valued function to obtain the following criterion.
Theorem 6.8 (The $C^1$ test). If all the partial derivatives $\frac{\partial f_i}{\partial x_j}$ (i.e., the entries of $Df(x)$) exist
and are continuous on $U$, then $f$ is differentiable at every point of $U$.
To illustrate this, in each of Examples 6.3–6.5 above, all entries of the D matrices are continuous,
so those three functions f, g, h are differentiable at every point of their domains.
A function f is called smooth if each of its component functions f1 , f2 , . . . , fm is smooth in the
sense we have discussed previously (see Section 4.9), that is, they have continuous partial derivatives
of all orders. As before, we generally state our results about differentiability for smooth functions
without worrying about whether this is a stronger assumption than necessary.

6.3 The chain rule: a conceptual approach


We now discuss the all-important rule for the derivative of a composition. We first take a fairly
high-level approach to understand where the rule comes from.
Let $U$ and $V$ be open sets in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, and let $f\colon U \to V$ and $g\colon V \to \mathbb{R}^p$
be functions. The composition $g \circ f\colon U \to \mathbb{R}^p$ is then defined, and by definition, if $a \in U$, the
composition is differentiable at $a$ if there is a good first-order approximation of the form:
\[
g(f(x)) \approx g(f(a)) + \boxed{D(g \circ f)(a)} \cdot (x - a) \tag{6.2}
\]
when $x$ is near $a$. The challenge is to figure out the expression that goes into the box for $D(g \circ f)(a)$.
The idea is that the first-order approximation of the composition $g \circ f$ should be the composition
of the first-order approximations of $g$ and $f$ individually. Let $b = f(a)$, and assume that $f$ is
differentiable at $a$ and $g$ is differentiable at $b$. Then there are good first-order approximations:
• for $f$:
\[
f(x) \approx \ell_f(x) = f(a) + Df(a)\cdot(x-a) \quad \text{when $x$ is near $a$,} \tag{6.3}
\]
• for $g$:
\[
\begin{aligned}
g(y) \approx \ell_g(y) &= g(b) + Dg(b)\cdot(y-b) \quad \text{when $y$ is near $b$} \\
&= g(f(a)) + Dg(f(a))\cdot(y - f(a)).
\end{aligned} \tag{6.4}
\]
Note that, by continuity, when x is near a, f (x) is near f (a) = b, so substituting y = f (x) in
equation (6.4) gives:
g(f (x)) ≈ g(f (a)) + Dg(f (a)) · (f (x) − f (a)).
Then, using approximation (6.3) to substitute f (x) − f (a) ≈ Df (a) · (x − a) yields:

g(f (x)) ≈ g(f (a)) + Dg(f (a)) · Df (a) · (x − a).


If you like, you can check that the right side of this last approximation is also the composition
$\ell_g(\ell_f(x))$ of the first-order approximations of $g$ and $f$. In any case, the upshot is that we have
identified something that fits naturally in the box in equation (6.2).
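For readers who want to see that check written out, here is one way to do it. It assumes only that the first-order approximations are given by $\ell_f(x) = f(a) + Df(a)\cdot(x-a)$ and $\ell_g(y) = g(b) + Dg(b)\cdot(y-b)$ with $b = f(a)$, as in (6.3) and (6.4):

```latex
\begin{align*}
\ell_g(\ell_f(x))
  &= g(b) + Dg(b)\cdot\bigl(\ell_f(x) - b\bigr) \\
  &= g(f(a)) + Dg(f(a))\cdot\bigl(f(a) + Df(a)\cdot(x-a) - f(a)\bigr) \\
  &= g(f(a)) + Dg(f(a))\cdot Df(a)\cdot(x-a).
\end{align*}
```

The $f(a)$ terms cancel inside the parentheses, which is why only the product of the two derivative matrices survives.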
Theorem 6.9 (Chain rule). Let U and V be open sets in Rn and Rm , respectively, and let f : U → V
and g : V → Rp be functions. Let a be a point of U . If f is differentiable at a and g is differentiable
at f (a), then g ◦ f is differentiable at a and:
D(g ◦ f )(a) = Dg(f (a)) · Df (a),
where the product on the right is matrix multiplication.


In terms of the sizes of the various parties involved, the left side of the chain rule is a p by n
matrix, while the right side is a product (p by m) × (m by n). This need not be memorized. In
practice, it usually takes care of itself.
The chain rule says that the derivative of a composition is the product of the derivatives, or,
to be even more sophisticated, the composition of the linear parts of the associated first-order
approximations. Though the objects involved are more complicated, this is actually the same as
what happens back in first-year calculus. There, if u is a function of x, say u = f (x), and y is a
function of u, say y = g(u), then y becomes a function of x via composition: y = g(f (x)). The
one-variable chain rule says:
\[
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}.
\]
Again, the derivative of a composition is the product of the derivatives of the composed steps.
The line of reasoning we used to obtain the chain rule is natural and reasonably straightforward,
but a proper proof requires greater attention to the various approximations involved. The letter ε
is likely to appear. The arguments are a little like those needed for a correct proof of the Little
Chain Rule back in Exercise 5.6 of Chapter 4, though the details here are more complicated. We
shall say more about the proof of the chain rule in the next section.
The following example is meant to illustrate how the pieces of the chain rule fit together.

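Example 6.10. Let $f\colon \mathbb{R}^2 \to \mathbb{R}^2$ be given by $f(r,\theta) = (r\cos\theta, r\sin\theta)$ and $g\colon \mathbb{R}^2 \to \mathbb{R}^3$ by
$g(x,y) = (x^2 + y^2,\; 2x + 3y,\; xy^2)$. Find $D(g \circ f)(2, \frac{\pi}{2})$.
By the chain rule:
\[
D(g \circ f)(2, \tfrac{\pi}{2}) = Dg\bigl(f(2, \tfrac{\pi}{2})\bigr) \cdot Df(2, \tfrac{\pi}{2}) = Dg(0,2) \cdot Df(2, \tfrac{\pi}{2}),
\]
since $f(2, \frac{\pi}{2}) = (2\cos\frac{\pi}{2}, 2\sin\frac{\pi}{2}) = (0,2)$. Calculating the various derivatives involved is straightforward:
\[
Dg(x,y) = \begin{bmatrix} 2x & 2y \\ 2 & 3 \\ y^2 & 2xy \end{bmatrix} \;\Rightarrow\; Dg(0,2) = \begin{bmatrix} 0 & 4 \\ 2 & 3 \\ 4 & 0 \end{bmatrix},
\]
\[
Df(r,\theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix} \;\Rightarrow\; Df(2, \tfrac{\pi}{2}) = \begin{bmatrix} 0 & -2 \\ 1 & 0 \end{bmatrix}.
\]
Therefore:
\[
D(g \circ f)(2, \tfrac{\pi}{2}) = \begin{bmatrix} 0 & 4 \\ 2 & 3 \\ 4 & 0 \end{bmatrix} \begin{bmatrix} 0 & -2 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 3 & -4 \\ 0 & -8 \end{bmatrix}.
\]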
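The result of Example 6.10 can be checked numerically. The sketch below estimates $D(g \circ f)(2, \frac{\pi}{2})$ directly by central differences and compares it with the matrix product $Dg(0,2) \cdot Df(2, \frac{\pi}{2})$. The helpers `jacobian` and `matmul` are hypothetical names introduced for this illustration, and the comparison holds only up to finite-difference error.

```python
import math

def f(r, theta):
    """Inner function of Example 6.10."""
    return (r * math.cos(theta), r * math.sin(theta))

def g(x, y):
    """Outer function of Example 6.10."""
    return (x**2 + y**2, 2 * x + 3 * y, x * y**2)

def jacobian(func, point, h=1e-6):
    """Central-difference estimate of the Jacobian (illustrative helper)."""
    n = len(point)
    m = len(func(*point))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(point)
        backward = list(point)
        forward[j] += h
        backward[j] -= h
        f_plus = func(*forward)
        f_minus = func(*backward)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac

def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def composite(r, theta):
    """The composition g(f(r, theta))."""
    return g(*f(r, theta))

a = (2.0, math.pi / 2)
b = f(*a)                                        # approximately (0, 2)
lhs = jacobian(composite, a)                     # D(g o f)(a), estimated directly
rhs = matmul(jacobian(g, b), jacobian(f, a))     # Dg(f(a)) . Df(a)
expected = [[4, 0], [3, -4], [0, -8]]            # matrix found in Example 6.10

for row_lhs, row_rhs in zip(lhs, rhs):
    print([round(v, 5) for v in row_lhs], [round(v, 5) for v in row_rhs])
```

Both printed matrices should be close to the 3 by 2 matrix obtained in the example.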

6.4 The chain rule: a computational approach


The chain rule states that the derivative of a composition is the product of the derivatives:

D(g ◦ f )(a) = Dg(f (a)) · Df (a).

This is remarkably concise and elegant. Nevertheless, under the hood, there is a lot going on. Each
entry of a matrix product involves considerable information: the $(i,j)$th entry of the product is the
dot product of the $i$th row and $j$th column:
\[
\begin{bmatrix} & \vdots & \\ \cdots & (i,j) & \cdots \\ & \vdots & \end{bmatrix}
=
\begin{bmatrix} & \vdots & \\ * & * \;\cdots & * \\ & \vdots & \end{bmatrix}
\begin{bmatrix} & * & \\ & * & \\ \cdots & \vdots & \cdots \\ & * & \end{bmatrix}.
\]
There is one such dot product for each entry of D(g ◦ f )(a). We shall write out these entries in
a few cases in a moment and discover that we have seen this type of product before. You can
think of the discovery either as an alternative way to derive the chain rule or as confirmation of
the consistency of the theory.
We take a simple situation that we have studied previously and then tweak it a couple of times.

Example 6.11. Let α : R → R3 be a smooth path in R3 , α(t) = (x(t), y(t), z(t)), and let g : R3 → R
be a smooth real-valued function of three variables, w = g(x, y, z). Then w becomes a function of
t by composition: w = g(α(t)) = g(x(t), y(t), z(t)). As a matter of notation, from here on, we use
w to denote the value of the composition, reserving g to stand for the original function of x, y, z.
The composition is a real-valued function of one variable, so the Little Chain Rule applies. It
says that the derivative of the composition is “gradient dot velocity”:
\[
\begin{aligned}
\frac{dw}{dt} &= \nabla g(\alpha(t)) \cdot \alpha'(t) = \Bigl(\frac{\partial g}{\partial x}, \frac{\partial g}{\partial y}, \frac{\partial g}{\partial z}\Bigr) \cdot \Bigl(\frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\Bigr) \\
&= \frac{\partial g}{\partial x}\frac{dx}{dt} + \frac{\partial g}{\partial y}\frac{dy}{dt} + \frac{\partial g}{\partial z}\frac{dz}{dt},
\end{aligned} \tag{6.5}
\]
where the partial derivatives $\frac{\partial g}{\partial x}$, $\frac{\partial g}{\partial y}$, $\frac{\partial g}{\partial z}$ are evaluated at the point $\alpha(t) = (x(t), y(t), z(t))$. This
calculation can be visualized using a “dependence diagram,” as shown in Figure 6.2. It illustrates
the dependence of the various variables involved: $g$ is a function of $x, y, z$, and in turn $x, y, z$ are
functions of $t$. The Little Chain Rule (6.5) says that, to find the derivative of the top with respect
to the bottom, multiply the derivatives along each possible path from top to bottom and add these
products.

Figure 6.2: A dependence diagram for the Little Chain Rule (levels, top to bottom: $w = g$; $x, y, z$; $t$)

Tweak 1. Suppose that x, y, z are functions of two variables s and t. In other words, replace
α : R → R3 with a smooth function f : R2 → R3 , where f (s, t) = (x(s, t), y(s, t), z(s, t)). Then the
composition g ◦f : R2 → R3 → R is a function of s and t: w = (g ◦f )(s, t) = g(x(s, t), y(s, t), z(s, t)).
The corresponding dependence diagram is shown in Figure 6.3.
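To compute $\frac{\partial w}{\partial s}$ (or $\frac{\partial w}{\partial t}$), only one variable actually varies. Because the composition is again
real-valued, the Little Chain Rule applies with respect to that variable, just as in the previous case.
Notationally, because $w$, $x$, $y$, and $z$ are no longer functions of just one variable, we need to replace
the ordinary derivative terms $\frac{d}{ds}$ (or $\frac{d}{dt}$) in equation (6.5) with the partial derivatives $\frac{\partial}{\partial s}$ (or $\frac{\partial}{\partial t}$).

Figure 6.3: Dependence diagram for tweak 1 (levels, top to bottom: $w = g$; $x, y, z$; $s, t$)

For instance, for $\frac{\partial w}{\partial s}$, the Little Chain Rule says:
\[
\frac{\partial w}{\partial s} = \frac{\partial g}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial g}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial s}. \tag{6.6}
\]
The relevant paths in the dependence diagram are shown in bold in the figure. Similarly:
\[
\frac{\partial w}{\partial t} = \frac{\partial g}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial g}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial t}. \tag{6.7}
\]
In fact, equations (6.6) and (6.7) can be rewritten as a single matrix equation:
\[
\begin{bmatrix} \frac{\partial w}{\partial s} & \frac{\partial w}{\partial t} \end{bmatrix}
=
\begin{bmatrix} \frac{\partial g}{\partial x} & \frac{\partial g}{\partial y} & \frac{\partial g}{\partial z} \end{bmatrix}
\begin{bmatrix}
\frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\
\frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\
\frac{\partial z}{\partial s} & \frac{\partial z}{\partial t}
\end{bmatrix}. \tag{6.8}
\]
Equation (6.6) corresponds to the first entry on the left: it is the product of the row matrix with the
first column of the 3 by 2 matrix. The point is that the matrix on the
left of equation (6.8) is the derivative $D(g \circ f)$, while those on the right are $Dg$ and $Df$, respectively.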

Tweak 2. We alter $g$ so that it is vector-valued, say $g\colon \mathbb{R}^3 \to \mathbb{R}^2$, where $g(x,y,z) = (g_1(x,y,z), g_2(x,y,z))$,
and leave $f\colon \mathbb{R}^2 \to \mathbb{R}^3$ as in tweak 1. The composition $g \circ f\colon \mathbb{R}^2 \to \mathbb{R}^2$ is vector-valued,
$w = g\bigl(f(s,t)\bigr) = \bigl(g_1(f(s,t)), g_2(f(s,t))\bigr)$. Each of the components is a real-valued function of $s$
and $t$: $w_1 = g_1(f(s,t)) = g_1(x(s,t), y(s,t), z(s,t))$ and $w_2 = g_2(f(s,t)) = g_2(x(s,t), y(s,t), z(s,t))$.
The dependence diagram is shown in Figure 6.4.

Figure 6.4: Dependence diagram for tweak 2 (levels, top to bottom: $w_1 = g_1$, $w_2 = g_2$; $x, y, z$; $s, t$)

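Both $w_1$ and $w_2$ have partial derivatives with respect to $s$ and $t$, and each of them fits exactly
into the context of tweak 1. For example:
\[
\frac{\partial w_2}{\partial t} = \frac{\partial g_2}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial g_2}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial g_2}{\partial z}\frac{\partial z}{\partial t},
\]
as highlighted in bold in the figure. Again, the collection of the four individual partial derivatives
of the composition computed in this way can be organized into a single matrix equation:
\[
\begin{bmatrix}
\frac{\partial w_1}{\partial s} & \frac{\partial w_1}{\partial t} \\
\frac{\partial w_2}{\partial s} & \frac{\partial w_2}{\partial t}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial g_1}{\partial x} & \frac{\partial g_1}{\partial y} & \frac{\partial g_1}{\partial z} \\
\frac{\partial g_2}{\partial x} & \frac{\partial g_2}{\partial y} & \frac{\partial g_2}{\partial z}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\
\frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\
\frac{\partial z}{\partial s} & \frac{\partial z}{\partial t}
\end{bmatrix}.
\]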

Once again, D(g ◦ f ) appears on the left, while the product of Dg and Df is on the right.

The same pattern persists in general for any composition. The individual partial derivatives
of a composition can be calculated using the Little Chain Rule and then combined as the matrix
equation D(g ◦ f )(a) = Dg(f (a)) · Df (a). In a way, the matrix version of the chain rule is a
bookkeeping device that keeps track of many, many Little Chain Rule calculations!
Indeed, since we presented a rigorous proof of the Little Chain Rule in Exercise 5.6 of Chapter
4, we can think of this approach as supplying a proof of the general chain rule as well.

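Example 6.12. The substitutions $x = r\cos\theta$, $y = r\sin\theta$ convert a smooth real-valued function
$f(x,y)$ of $x$ and $y$ into a function of $r$ and $\theta$: $w = f(r\cos\theta, r\sin\theta)$. For instance, if $f(x,y) = x^2 y^3$,
then $w = (r\cos\theta)^2 (r\sin\theta)^3 = r^5 \cos^2\theta \sin^3\theta$.
We find general formulas for $\frac{\partial w}{\partial r}$ and $\frac{\partial w}{\partial \theta}$ in terms of $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$. As just discussed, we can
compute these individual partials using the Little Chain Rule and a dependence diagram (Figure
6.5).

Figure 6.5: Finding partial derivatives with respect to polar coordinates (levels, top to bottom: $w = f$; $x, y$; $r, \theta$)

Hence:
\[
\begin{aligned}
\frac{\partial w}{\partial r} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial r} = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta, \\
\frac{\partial w}{\partial \theta} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial \theta} = -\frac{\partial f}{\partial x}\, r\sin\theta + \frac{\partial f}{\partial y}\, r\cos\theta.
\end{aligned}
\]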
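To see the formulas of Example 6.12 in action, the sketch below applies them to the sample function $f(x,y) = x^2 y^3$ mentioned in the example and compares the results with the partial derivatives obtained by differentiating $w = r^5\cos^2\theta\sin^3\theta$ directly. The test point $(r,\theta) = (1.5, 0.7)$ is an arbitrary choice made for illustration.

```python
import math

def f_x(x, y):
    """Partial derivative of f(x, y) = x^2 y^3 with respect to x."""
    return 2 * x * y**3

def f_y(x, y):
    """Partial derivative of f(x, y) = x^2 y^3 with respect to y."""
    return 3 * x**2 * y**2

def chain_rule_partials(r, theta):
    """dw/dr and dw/dtheta from the formulas of Example 6.12."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    dw_dr = f_x(x, y) * math.cos(theta) + f_y(x, y) * math.sin(theta)
    dw_dtheta = -f_x(x, y) * r * math.sin(theta) + f_y(x, y) * r * math.cos(theta)
    return dw_dr, dw_dtheta

def direct_partials(r, theta):
    """Differentiate w = r^5 cos^2(theta) sin^3(theta) by hand."""
    c, s = math.cos(theta), math.sin(theta)
    dw_dr = 5 * r**4 * c**2 * s**3
    dw_dtheta = r**5 * (-2 * c * s**4 + 3 * c**3 * s**2)
    return dw_dr, dw_dtheta

r0, theta0 = 1.5, 0.7                      # arbitrary sample point
via_chain = chain_rule_partials(r0, theta0)
via_direct = direct_partials(r0, theta0)
print("chain rule:", via_chain)
print("direct:    ", via_direct)
```

The two pairs of values should agree up to rounding, since both are exact formulas for the same derivatives.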

Example 6.13 (Implicit differentiation). Let f : R2 → R be a smooth real-valued function of two


variables, and let S be the level set corresponding to f = c, that is, S = {(x, y) ∈ R2 : f (x, y) = c}.
Assume that the condition $f(x,y) = c$ defines $y$ implicitly as a smooth function of $x$, $y = y(x)$, for
some interval of $x$ values. See Figure 6.6. Find the derivative $\frac{dy}{dx}$.
We start with the condition $f(x, y(x)) = c$ that defines $y(x)$. The expression on the left is the
composition $x \mapsto (x, y(x)) \mapsto f(x, y(x))$. Call this last value $w$. The dependence diagram is in
Figure 6.7.
The derivative $\frac{dw}{dx}$ can be computed using the chain rule:
\[
\frac{dw}{dx} = \frac{\partial f}{\partial x}\frac{dx}{dx} + \frac{\partial f}{\partial y}\frac{dy}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}.
\]

Figure 6.6: The level set $f(x,y) = c$ defines $y$ implicitly as a function of $x$ on an interval of $x$ values.

Figure 6.7: A dependence diagram for implicit differentiation (levels, top to bottom: $w = f$; $x, y$)

On the other hand, $w$ has a constant value of $c$, so $\frac{dw}{dx} = 0$. Hence:
\[
\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} = 0. \tag{6.9}
\]
Solving for $\frac{dy}{dx}$ gives $\frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y}$.
For example, consider the ellipse $\frac{x^2}{9} + \frac{y^2}{4} = 1$. It’s the level set of $f(x,y) = \frac{x^2}{9} + \frac{y^2}{4}$ corresponding
to $c = 1$. Hence by the previous example, along the ellipse, we have:
\[
\frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y} = -\frac{2x/9}{2y/4} = -\frac{4x}{9y}.
\]
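The ellipse formula can be tested numerically on the upper half of the ellipse, where $y$ can be solved for explicitly as $y = 2\sqrt{1 - x^2/9}$. The sketch below compares the slope $-\frac{4x}{9y}$ given by implicit differentiation with a central-difference estimate of the slope of that explicit function. The sample point $x = 1$ and the step size are arbitrary choices for illustration.

```python
import math

def y_upper(x):
    """Upper half of the ellipse x^2/9 + y^2/4 = 1, solved for y."""
    return 2.0 * math.sqrt(1.0 - x * x / 9.0)

def slope_implicit(x, y):
    """dy/dx = -(df/dx)/(df/dy) for f(x, y) = x^2/9 + y^2/4."""
    f_x = 2.0 * x / 9.0
    f_y = 2.0 * y / 4.0
    return -f_x / f_y

def slope_numeric(x, h=1e-6):
    """Central-difference estimate of the slope of y_upper at x."""
    return (y_upper(x + h) - y_upper(x - h)) / (2.0 * h)

x0 = 1.0                       # arbitrary point with -3 < x0 < 3
y0 = y_upper(x0)
implicit = slope_implicit(x0, y0)
numeric = slope_numeric(x0)
print("implicit differentiation:", implicit)
print("finite difference:       ", numeric)
print("-4x/(9y):                ", -4.0 * x0 / (9.0 * y0))
```

All three printed values should agree to several decimal places.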
Of course, implicit differentiation problems of this type appear in first-year calculus, and the
multivariable chain rule is never mentioned. There, to find the derivative along the ellipse, one
simply differentiates both sides of the equation of the ellipse and uses the one-variable chain rule.
In other words, one computes $\frac{d}{dx}\bigl(\frac{x^2}{9} + \frac{y^2}{4}\bigr) = \frac{d}{dx}(1)$, or:
\[
\frac{2}{9}x + \frac{1}{2}y\frac{dy}{dx} = 0.
\]
This is exactly the same as the result given by equation (6.9), from which the general formula
for $\frac{dy}{dx}$ followed immediately. The multivariable chain rule gives us a way of articulating what the
one-variable calculation is doing.

6.5 Exercises for Chapter 6


Section 1 Continuity revisited

1.1. (a) If $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ are elements of $\mathbb{R}^n$, show that $|x_i - y_i| \le \|x - y\|$
for $i = 1, 2, \ldots, n$.
(b) Let $U$ be an open set in $\mathbb{R}^n$, and let $f\colon U \to \mathbb{R}^m$ be a function, where $f(x) = (f_1(x), f_2(x), \ldots, f_m(x))$.
Show that, if $f$ is continuous at a point $a$ of $U$, then so are $f_1, f_2, \ldots, f_m$.
1.2. Let $U$ be an open set in $\mathbb{R}^n$, $a$ a point of $U$, and $f\colon U - \{a\} \to \mathbb{R}^m$ a function defined
on $U$, except possibly at $a$. Write out in terms of ε’s and δ’s what it means to say that
$\lim_{x \to a} f(x) = L$.

Section 2 Differentiability revisited

2.1. Let $f\colon \mathbb{R}^2 \to \mathbb{R}^2$ be given by $f(x,y) = (x^2 - y^2, 2xy)$.
(a) Find $Df(x,y)$.
(b) Find $Df(1,2)$.
2.2. Let $f\colon \mathbb{R}^3 \to \mathbb{R}^2$ be given by $f(x,y,z) = \bigl(xy^2z^3 + 2,\; x\cos(yz)\bigr)$.
(a) Find $Df(x,y,z)$.
(b) Find $Df(-\frac{\pi}{2}, 1, \frac{\pi}{2})$.
2.3. Find $Df(x,y,z)$ if $f(x,y,z) = x + y^2 + 3z^3$.
2.4. Find $Df(x,y,z)$ if $f(x,y,z) = (x, y^2, 3z^3)$.
2.5. Find $Df(x,y)$ if $f(x,y) = (x, y, 0)$.
2.6. Find $Df(x)$ if $f(x) = (x^2, x\sin x, 2x + 3)$.
2.7. Let $f\colon \mathbb{R}^2 \to \mathbb{R}^3$ be given by $f(x,y) = (e^{x+y}, e^{-x}\cos y, e^{-x}\sin y)$.
(a) Find Df (x, y).
(b) Show that f is differentiable at every point of R2 .
2.8. Let f : R3 → R3 be the spherical coordinate transformation from ρφθ-space to xyz-space:
f (ρ, φ, θ) = (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ).
(a) Find Df (ρ, φ, θ).
(b) Show that f is differentiable at every point of R3 .
(c) Show that $\det Df(\rho, \varphi, \theta) = \rho^2 \sin\varphi$. (This result will be used in the next chapter.)
2.9. Let $T\colon \mathbb{R}^3 \to \mathbb{R}^3$ be a linear transformation, and let $A$ be its matrix with respect to the
standard bases. Hence $T(x) = Ax$, where $A$ is a 3 by 3 matrix, $A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$.
(a) Find DT (x, y, z).
(b) Use the definition of differentiability to show that T is differentiable at every point a in
R3 .
2.10. Given points x = (x1 , x2 , x3 ) and y = (y1 , y2 , y3 ) in R3 , let us think of (x, y) as the point
(x1 , x2 , x3 , y1 , y2 , y3 ) in R6 . Let f : R6 → R and g : R6 → R3 be the dot and cross products in
R3 , respectively, that is:
f (x, y) = x · y and g(x, y) = x × y.


(a) Find Df (x, y) and Dg(x, y).


(b) Show that f and g are differentiable at every point (x, y) in R6 .

Section 3 The chain rule: a conceptual approach

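3.1. Let $f\colon \mathbb{R}^2 \to \mathbb{R}^3$ and $g\colon \mathbb{R}^3 \to \mathbb{R}^2$ be given by:
\[
f(s,t) = \bigl(s - t + 2,\; e^{2s+3t},\; s\cos t\bigr) \quad\text{and}\quad g(x,y,z) = (xyz,\; x^2 + z^3).
\]
(a) Find $Df(s,t)$ and $Dg(x,y,z)$.
(b) Find $D(g \circ f)(0,0)$.
(c) Find $D(f \circ g)(0,0,0)$.

3.2. Let $f\colon \mathbb{R}^2 \to \mathbb{R}^3$ and $g\colon \mathbb{R}^3 \to \mathbb{R}$ be the functions:
\[
f(s,t) = \bigl(2s - t^2,\; st - 1,\; 2s^2 + st - t^2\bigr) \quad\text{and}\quad g(x,y,z) = (x+1)^2 e^{yz}.
\]
If $h(s,t) = g(f(s,t))$, find $Dh(1,2)$.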

3.3. Let U and V be open sets in R2 , and let f : U → V and g : V → U be inverse functions. That
is:

• g(f (x, y)) = (x, y) for all (x, y) in U and


• f (g(x, y)) = (x, y) for all (x, y) in V .

Let a be a point of U , and let b = f (a). If f is differentiable at a and g is differentiable at


b, what can you say about the following matrix products?

(a) Dg(b) · Df (a)


(b) Df (a) · Dg(b)

3.4. Exercise 5.5 of Chapter 4 described a generalization of the mean value theorem for real-valued
functions of n variables. Does the same generalization apply to vector-valued functions? More
precisely, let a be a point of Rn , and let B = B(a, r) be an open ball centered at a. Is it
necessarily true that, if f : B → Rm is a differentiable function on B and b is any point of B,
then there exists a point c on the line segment connecting a and b such that:

f (b) − f (a) = Df (c) · (b − a)?

If true, give a proof. Otherwise, find a counterexample.

Section 4 The chain rule: a computational approach

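4.1. Let $f(x,y,z)$ be a smooth real-valued function of $x$, $y$, and $z$. The substitutions $x = s + 2t$,
$y = 3s + 4t$, and $z = 5s + 6t$ convert $f$ into a function of $s$ and $t$: $w = f(s + 2t, 3s + 4t, 5s + 6t)$.
Find expressions for $\frac{\partial w}{\partial s}$ and $\frac{\partial w}{\partial t}$ in terms of $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, and $\frac{\partial f}{\partial z}$.

4.2. Let $w = f(x,y)$ be a smooth real-valued function, where:
\[
x = s^2 - t^2 + u^2 + 2s - 2t \quad\text{and}\quad y = stu.
\]
Find expressions for $\frac{\partial w}{\partial s}$, $\frac{\partial w}{\partial t}$, and $\frac{\partial w}{\partial u}$ in terms of $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$.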

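4.3. The spherical substitutions $x = \rho\sin\varphi\cos\theta$, $y = \rho\sin\varphi\sin\theta$, and $z = \rho\cos\varphi$ convert a
smooth real-valued function $f(x,y,z)$ into a function of $\rho$, $\varphi$, and $\theta$:
\[
w = f(\rho\sin\varphi\cos\theta,\; \rho\sin\varphi\sin\theta,\; \rho\cos\varphi).
\]
(a) Find formulas for $\frac{\partial w}{\partial \rho}$, $\frac{\partial w}{\partial \varphi}$, $\frac{\partial w}{\partial \theta}$ in terms of $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial z}$.
(b) If $f(x,y,z) = x^2 + y^2 + z^2$, use your answer to part (a) to find $\frac{\partial w}{\partial \varphi}$, and verify that it
is the same as the result obtained if you first write $w$ in terms of $\rho$, $\varphi$, $\theta$ directly, say by
substituting for $x$, $y$, $z$, and then differentiate that expression with respect to $\varphi$.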

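4.4. The substitutions $x = s^2 - t^2$ and $y = 2st$ convert a smooth real-valued function $f(x,y)$ into
a function of $s$ and $t$: $w = f(s^2 - t^2, 2st)$. Find a formula for $\nabla w = \bigl(\frac{\partial w}{\partial s}, \frac{\partial w}{\partial t}\bigr)$ in terms
of $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$.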

4.5. Let $f(x,y)$ be a smooth real-valued function of $x$ and $y$. The substitutions
\[
x = e^{2s+t} - 1 \quad\text{and}\quad y = s^2 - 3st + 2t
\]
convert $f$ into a function $w$ of $s$ and $t$. If $\frac{\partial w}{\partial s} = 5$ and $\frac{\partial w}{\partial t} = 4$ when $s$ and $t$ are both 0, find
the values of $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ when $x$ and $y$ are both 0. (Note that when $s$ and $t$ are both 0, so are
$x$ and $y$.)

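4.6. The substitutions $x = r\cos\theta$ and $y = r\sin\theta$ convert a smooth real-valued function $f(x,y)$
into a function of $r$ and $\theta$: $w = f(r\cos\theta, r\sin\theta)$.
(a) Show that $r\frac{\partial w}{\partial r} = x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y}$.
(b) Find a similar expression for $\frac{\partial w}{\partial \theta}$ in terms of $x$, $y$, $\frac{\partial f}{\partial x}$, and $\frac{\partial f}{\partial y}$.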

4.7. (a) Suppose that $w = f(u,v)$ is a smooth real-valued function, where $u = x/z$ and $v = y/z$.
Show that:
\[
x\frac{\partial w}{\partial x} + y\frac{\partial w}{\partial y} + z\frac{\partial w}{\partial z} = 0. \tag{6.10}
\]
(b) Without calculating any partial derivatives, deduce that $w = \frac{x^2 + xy + y^2}{z^2}$ satisfies equation
(6.10).

4.8. Let f : U → R be a smooth real-valued function of three variables defined on an open set U
in R3 . Assume that the condition f (x, y, z) = c defines z implicitly as a smooth function of
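$x$ and $y$, $z = z(x,y)$, on some open set of points $(x,y)$ in $\mathbb{R}^2$. Show that, on this open set:
\[
\frac{\partial z}{\partial x} = -\frac{\partial f/\partial x}{\partial f/\partial z} \quad\text{and}\quad \frac{\partial z}{\partial y} = -\frac{\partial f/\partial y}{\partial f/\partial z}.
\]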

4.9. Assume that the condition $xy + xz + yz + e^{x+2y+3z} = 1$ defines $z$ implicitly as a smooth
function of $x$ and $y$ on some open set of points $(x,y)$ containing $(0,0)$ in $\mathbb{R}^2$. Find $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$
when $x = 0$ and $y = 0$.

4.10. (a) Let f : R3 → R2 be a smooth function. By definition, the level set S of f corresponding
to the value c = (c1 , c2 ) is the set of all (x, y, z) such that f (x, y, z) = c or, equivalently,
where f1 (x, y, z) = c1 and f2 (x, y, z) = c2 . Typically, the last two equations define a
curve in R3 , the curve of intersection of the surfaces defined by f1 = c1 and f2 = c2 .

Assume that the condition $f(x,y,z) = c$ defines $x$ and $y$ implicitly as smooth functions
of $z$ on some interval of $z$ values in $\mathbb{R}$, that is, $x = x(z)$ and $y = y(z)$ are functions
that satisfy $f(x(z), y(z), z) = c$. Then the corresponding portion of the level set is
parametrized by $\alpha(z) = (x(z), y(z), z)$.
Show that $x'(z)$ and $y'(z)$ satisfy the matrix equation:
\[
\begin{bmatrix}
\frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\
\frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y}
\end{bmatrix}
\begin{bmatrix} x'(z) \\ y'(z) \end{bmatrix}
= -\begin{bmatrix} \frac{\partial f_1}{\partial z} \\ \frac{\partial f_2}{\partial z} \end{bmatrix},
\]
where the partial derivatives are evaluated at the point $(x(z), y(z), z)$. (Hint: Let $w =
(w_1, w_2) = f(x(z), y(z), z)$.)
(b) Let f (x, y, z) = (x − cos z, y − sin z), and let C be the level set of f corresponding to
c = (0, 0). Find the functions x = x(z) and y = y(z) and the parametrization α of C
described in part (a), and describe C geometrically. Then, use part (a) to find Dα(z),
and verify that your answer makes sense.
4.11. Let $f(x,y)$ be a smooth real-valued function of $x$ and $y$ defined on an open set $U$ in $\mathbb{R}^2$. The
partial derivative $\frac{\partial f}{\partial x}$ is also a real-valued function of $x$ and $y$ (as is $\frac{\partial f}{\partial y}$). If $\alpha(t) = (x(t), y(t))$
is a smooth path in $U$, then substituting for $x$ and $y$ in terms of $t$ converts $\frac{\partial f}{\partial x}$ into a function
of $t$, say $w = \frac{\partial f}{\partial x}(x(t), y(t))$.
(a) Show that:
\[
\frac{dw}{dt} = \frac{\partial^2 f}{\partial x^2}\frac{dx}{dt} + \frac{\partial^2 f}{\partial y\,\partial x}\frac{dy}{dt},
\]
where the partial derivatives are evaluated at $\alpha(t)$.
(b) Now, consider the composition $g(t) = (f \circ \alpha)(t)$. Find a formula for the second derivative
$\frac{d^2 g}{dt^2}$ in terms of the partial derivatives of $f$ as a function of $x$ and $y$ and the derivatives
of $x$ and $y$ as functions of $t$. (Hint: Differentiate the Little Chain Rule. Note that this
involves differentiating some products.)
(c) Let $a$ be a point of $U$. If $\alpha\colon I \to U$, $\alpha(t) = a + tv = (a_1 + tv_1, a_2 + tv_2)$ is a parametrization
of a line segment containing $a$ and if $g(t) = (f \circ \alpha)(t)$, use your formula from part
(b) to show that:
\[
\frac{d^2 g}{dt^2} = v_1^2 \frac{\partial^2 f}{\partial x^2} + 2v_1 v_2 \frac{\partial^2 f}{\partial x\,\partial y} + v_2^2 \frac{\partial^2 f}{\partial y^2},
\]
where the partial derivatives are evaluated at $\alpha(t)$. (Compare this with equation (4.22)
in Chapter 4 regarding the second-order approximation of $f$ at $a$.)
4.12. Let f (x, y) be a smooth real-valued function defined on an open set in R2 . The polar co-
ordinate substitutions x = r cos θ and y = r sin θ convert f into a function of r and θ,
w = f (r cos θ, r sin θ).
(a) Show that:
\[
\frac{\partial^2 w}{\partial \theta^2} = -r\cos\theta\,\frac{\partial f}{\partial x} - r\sin\theta\,\frac{\partial f}{\partial y} + r^2\sin^2\theta\,\frac{\partial^2 f}{\partial x^2} - 2r^2\sin\theta\cos\theta\,\frac{\partial^2 f}{\partial x\,\partial y} + r^2\cos^2\theta\,\frac{\partial^2 f}{\partial y^2}.
\]
(Hint: Start with the results of Example 6.12.)
(b) Find an analogous expression for the mixed partial derivative $\frac{\partial^2 w}{\partial r\,\partial \theta}$.
(c) Find an analogous expression for $\frac{\partial^2 w}{\partial r^2}$.
(d) If $f(x,y) = x^2 + y^2 + xy$, use your answer to part (b) to find $\frac{\partial^2 w}{\partial r\,\partial \theta}$. Then, verify that it is
the same as the result obtained by writing $w$ in terms of $r$ and $\theta$ directly by substituting
for $x$ and $y$ and then differentiating that expression with respect to $r$ and $\theta$.

Chapter 7

Change of variables

We now turn to the counterpart for integrals of the chain rule. It is called the change of variables
theorem. For functions of one variable, the corresponding notion is the method of substitution.
The one-variable and multivariable versions are expressed in equations that are similar formally,
but the approaches one might take to understand where they come from are quite different. It’s
exciting when a change in perspective still leads to a result of recognizable form. We make a few
remarks comparing the two versions later, after the theorem has been stated.
In many respects, the change of variables theorem and the chain rule hold together the foun-
dation of the more advanced theory of multivariable calculus. We begin, however, with a concrete
example where changing variables just seems like the right thing to do.
Example 7.1. Let $D$ be the disk $x^2 + y^2 \le 4$ in the $xy$-plane. Find $\iint_D \sqrt{x^2 + y^2}\, dx\, dy$. For
instance, if one wanted to know the average distance to the origin on $D$, one would evaluate this
integral and divide by $\text{Area}(D) = 4\pi$.

Figure 7.1: The disk $x^2 + y^2 \le 4$

Setting up the integral with the order of integration $dy\, dx$ in the usual way (Figure 7.1) gives:
\[
\iint_D \sqrt{x^2 + y^2}\, dx\, dy = \int_{-2}^{2} \biggl( \int_{-\sqrt{4 - x^2}}^{\sqrt{4 - x^2}} \sqrt{x^2 + y^2}\, dy \biggr) dx.
\]
While not impossible, this looks potentially messy. On the other hand, expressing the problem in
terms of polar coordinates seems promising for two reasons.
• The function to be integrated is simpler: $\sqrt{x^2 + y^2} = r$.
• The region of integration is simpler: in polar coordinates, $D$ is described by $0 \le r \le 2$,
$0 \le \theta \le 2\pi$, in other words, by a rectangle in the $r\theta$-plane.


The obstacle is what to do about the dA = dx dy part of the integral. For this, we need to see how
small pieces of area in the xy-plane are related to those in the rθ-plane. We return to this example
later.

7.1 Change of variables for double integrals


Let $\iint_D f(x,y)\, dx\, dy$ be a given double integral. We want to describe how to convert the integral to
an equivalent one with a different domain of integration and a correspondingly different integrand.
The setup is to assume that there is a smooth function $T\colon D^* \to D$, where $D^*$ is a subset of
$\mathbb{R}^2$. We think of $T$ as transforming $D^*$ onto $D$, so we assume that $T(D^*) = D$ and that $T$ is
one-to-one, except possibly along the boundary of $D^*$. This last part means that whenever $a$ and
$b$ are distinct points of $D^*$, not on the boundary, then the values $T(a)$ and $T(b)$ are distinct, too:
$a \ne b \Rightarrow T(a) \ne T(b)$. (For the purposes of integration, anomalous behavior confined to the
boundary can typically be allowed without messing things up. See the remarks on page 130.)
The new integral will be an integral over D∗ . It helps keep things straight if the coordinates
in the domain and codomain have different names, so we think of T as a transformation from the
uv-plane to the xy-plane and write T (u, v) = (x(u, v), y(u, v)).
Thus we want to convert the given integral over D with respect to x and y to an integral over
D∗ with respect to u and v. We outline the main idea. It is similar to what we did in Section 5.5 to
define surface integrals with respect to surface area using a parametrization, though of course there
is a big difference between motivating a definition and justifying a theorem. Another difference is
that, in the interim, we have learned about the derivative of a vector-valued function.

Figure 7.2: Change of variables: x and y as functions of u and v via transformation T

To begin, subdivide $D^*$ into small subrectangles of dimensions $\Delta u_i$ by $\Delta v_j$, as in Figure 7.2.
The transformation $T$ sends a typical subrectangle of area $\Delta u_i\, \Delta v_j$ in the $uv$-plane to a small curvy
quadrilateral in the $xy$-plane. Let $\Delta A_{ij}$ denote the area of the curvy quadrilateral. If we choose
a sample point $p_{ij}$ in each curvy quadrilateral, we can form a Riemann sum-like approximation of
the integral over $D$:
\[
\iint_D f(x,y)\, dx\, dy \approx \sum_{i,j} f(p_{ij})\, \Delta A_{ij}. \tag{7.1}
\]

To turn this into a Riemann sum over D∗ , we need to relate the areas of the subrectangles and the
curvy quadrilaterals.
To do so, we use the first-order approximation of $T$. Choose a point $a_{ij}$ in the $(i,j)$th subrectangle
of $D^*$ such that $T(a_{ij}) = p_{ij}$. Then the first-order approximation says that, for points $u$ in
$D^*$ near $a_{ij}$:
\[
\begin{aligned}
T(u) &\approx T(a_{ij}) + DT(a_{ij}) \cdot (u - a_{ij}) \\
&\approx p_{ij} + DT(a_{ij}) \cdot (u - a_{ij}) \\
&\approx p_{ij} - DT(a_{ij}) \cdot a_{ij} + DT(a_{ij}) \cdot u.
\end{aligned} \tag{7.2}
\]

It may be difficult to get a foothold on the behavior of T itself, but describing how the approximation
transforms the (i, j)th subrectangle is quite tractable.
For this, we bring in some linear algebra. Note that the derivative $DT(a_{ij})$ is a 2 by 2 matrix.
As shown in Chapter 1, any 2 by 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ determines a linear transformation
$L\colon \mathbb{R}^2 \to \mathbb{R}^2$ given by matrix multiplication: $L(x) = Ax$. This transformation sends the unit
square determined by $e_1$ and $e_2$ to the parallelogram determined by $L(e_1) = \begin{bmatrix} a \\ c \end{bmatrix}$ and $L(e_2) = \begin{bmatrix} b \\ d \end{bmatrix}$,
which are the columns of $A$ (Figure 7.3). We saw in Proposition 1.13 of Chapter 1 that the area
of this parallelogram is $\bigl|\det \bigl[\, L(e_1)\;\; L(e_2) \,\bigr]\bigr|$, where $\bigl[\, L(e_1)\;\; L(e_2) \,\bigr]$ is the matrix whose rows are
$L(e_1)$ and $L(e_2)$. This is precisely the matrix transpose $A^t$. Thus the area of the parallelogram is
$|\det(A^t)| = |\det A|$, where we have used the general fact that an $n$ by $n$ matrix and its transpose
have the same determinant (Proposition 1.14, Chapter 1). In particular, $L$ has changed the area
by a factor of $|\det A|$.

Figure 7.3: A linear transformation $L$ sends the square determined by $e_1$ and $e_2$ to the parallelogram
determined by $L(e_1)$ and $L(e_2)$.
This principle extends easily to a couple of slightly more general cases.

• Given nonzero scalars $\ell$ and $w$, let $R$ be the $|\ell|$ by $|w|$ rectangle determined by $\ell e_1$ and $w e_2$.
Then $L$ sends $R$ to the parallelogram $P$ determined by $L(\ell e_1) = \ell L(e_1)$ and $L(w e_2) = w L(e_2)$.
Compared to the original case, the areas of both rectangle and parallelogram have changed
by a common factor of $|\ell w|$, hence they still differ by a factor of $|\det A|$.
• Suppose that we translate the rectangle $R$ in the previous case by a constant vector $a$. This
results in another $|\ell|$ by $|w|$ rectangle $R'$ consisting of all points of the form $a + x$, where $x \in R$.
By linearity $L(a + x) = L(a) + L(x)$, so $L$ transforms $R'$ to the translation of the parallelogram
$L(R) = P$ by $L(a)$. Let’s call the translated parallelogram $P'$. Since translations don’t affect
areas, $R'$ and its image $P'$ again differ by a factor of $|\det A|$.

This shows the following.

Lemma 7.2. A linear transformation L : R2 → R2 represented with respect to the standard bases by
a 2 by 2 matrix A sends rectangles whose sides are parallel to the coordinate axes to parallelograms
and alters the area of the rectangles by a factor of | det A|.
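Lemma 7.2 is easy to test on a concrete case. In the sketch below, the matrix `A` and the rectangle are made-up sample data chosen only for illustration. The code maps the corners of an axis-parallel rectangle by $L(x) = Ax$, computes the area of the image quadrilateral with the shoelace formula (a standard polygon-area formula that is not used in the text), and compares it with $|\det A|$ times the area of the rectangle.

```python
def apply_matrix(A, p):
    """Apply the 2 by 2 matrix A to the point p = (x, y)."""
    return (A[0][0] * p[0] + A[0][1] * p[1],
            A[1][0] * p[0] + A[1][1] * p[1])

def polygon_area(vertices):
    """Area of a simple polygon from its vertices in order (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def det2(A):
    """Determinant of a 2 by 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Hypothetical sample data: any 2 by 2 matrix and axis-parallel rectangle would do.
A = [[2.0, 1.0],
     [0.5, 3.0]]
rectangle = [(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)]  # 3 by 2, translated

image = [apply_matrix(A, p) for p in rectangle]
rect_area = polygon_area(rectangle)
image_area = polygon_area(image)
scale = abs(det2(A))

print("rectangle area:      ", rect_area)
print("image area:          ", image_area)
print("|det A| * rect. area:", scale * rect_area)
```

Because the rectangle here is translated away from the origin, the example also illustrates the second bulleted case above: the translation does not affect the area factor.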


Returning to the first-order approximation (7.2) of $T$, we conclude that multiplication by
$DT(a_{ij})$ transforms the subrectangle of area $\Delta u_i\, \Delta v_j$ that contains $a_{ij}$ to a parallelogram of area
$|\det DT(a_{ij})|\, \Delta u_i\, \Delta v_j$. The determinant of the derivative, $\det DT(a_{ij})$, is sometimes referred to as
the Jacobian determinant at $a_{ij}$. Hence after additional translation, the first-order approximation
(7.2) sends the subrectangle containing $a_{ij}$ to a parallelogram that:
• approximates the curvy quadrilateral of area $\Delta A_{ij}$ that contains $p_{ij}$ and
• has area $|\det DT(a_{ij})|\, \Delta u_i\, \Delta v_j$.
Thus the Riemann sum approximation (7.1) becomes:
\[
\iint_D f(x,y)\, dx\, dy \approx \sum_{i,j} f(p_{ij}) \cdot \Delta A_{ij} \approx \sum_{i,j} f(T(a_{ij})) \cdot |\det DT(a_{ij})|\, \Delta u_i\, \Delta v_j.
\]

Letting 4ui and 4vj go to zero, this becomes an integral over D∗ , giving the following major
result.
Theorem 7.3 (Change of variables theorem for double integrals). Let $D$ and $D^*$ be bounded subsets
of $\mathbb{R}^2$, and let $T\colon D^* \to D$ be a smooth function such that $T(D^*) = D$ and that is one-to-one, except
possibly on the boundary of $D^*$. If $f$ is integrable on $D$, then:
\[
\iint_D f(x,y)\,dx\,dy = \iint_{D^*} f(T(u,v))\,|\det DT(u,v)|\,du\,dv.
\]

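In other words, $x$ and $y$ are replaced in terms of $u$ and $v$ in the function $f$ using $(x,y) = T(u,v)$
and $dx\,dy$ is replaced by
\[
|\det DT(u,v)|\,du\,dv = \left|\det\begin{bmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\[2pt] \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{bmatrix}\right| du\,dv.
\]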
By comparison, in first-year calculus, if $f(x)$ is a real-valued function of one variable and $x$ is
expressed in terms of another variable $u$, say as $x = T(u)$, then the method of substitution says
that:
\[
\int_{T(a)}^{T(b)} f(x)\,dx = \int_a^b f(T(u))\,T'(u)\,du.
\]
In this case, it seems simplest to understand the result in terms of antiderivatives using the chain
rule and the fundamental theorem of calculus rather than through Riemann sums, though we
won't present the details of the argument. In particular, substituting for $x$ in terms of $u$ includes
the substitution $dx = T'(u)\,du = \frac{dx}{du}\,du$. The substitution $dx\,dy = |\det DT(u,v)|\,du\,dv$ in the
previous paragraph is the analogue in the double integral case. We say more about the principle
of substitution in the next section.
Returning to double integrals, in the case of polar coordinates, the substitutions for $x$ and $y$
are given by $(x,y) = T(r,\theta) = (r\cos\theta, r\sin\theta)$. Then
$DT(r,\theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}$, and:
\[
|\det DT(r,\theta)| = |r\cos^2\theta + r\sin^2\theta| = |r| = r.
\]


By the change of variables theorem, this means that “dx dy = r dr dθ” and that polar coordinates
transform integrals as follows.
Corollary 7.4. In polar coordinates:
\[
\iint_D f(x,y)\,dx\,dy = \iint_{D^*} f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta,
\]
where $D^*$ is the region of the $r\theta$-plane that describes $D$ in polar coordinates.


We now complete Example 7.1, which asked to evaluate $\iint_D \sqrt{x^2+y^2}\,dx\,dy$, where $D$ is the
disk $x^2 + y^2 \le 4$ in the $xy$-plane. As noted earlier, $D$ is described in polar coordinates by the
rectangle $D^*$ given by $0 \le r \le 2$, $0 \le \theta \le 2\pi$ in the $r\theta$-plane. The polar coordinate transformation
$T$ is shown in Figure 7.4.

Figure 7.4: A change of variables in polar coordinates

We verify that T satisfies the hypotheses of the theorem, though we shall grow less meticulous
about writing out the details on this point after a couple more examples. All the points on the
left side of D∗ , where r = 0, are mapped to the origin, that is, T (0, θ) = (0, 0) for 0 ≤ θ ≤ 2π.
Also, along the top and bottom of D∗ , θ = 0 and θ = 2π correspond to the same thing in the
xy-plane: T (r, 0) = T (r, 2π) for 0 ≤ r ≤ 2. Apart from this, distinct points in D∗ are mapped to
distinct values in D. In other words, T is one-to-one away from the boundary of D∗ , so the change
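of variables theorem applies.
Since $\sqrt{x^2+y^2} = r$, this gives:
\[
\begin{aligned}
\iint_D \sqrt{x^2+y^2}\,dx\,dy &= \iint_{D^*} r \cdot r\,dr\,d\theta \\
&= \int_0^{2\pi}\left(\int_0^2 r^2\,dr\right)d\theta \\
&= \int_0^{2\pi}\left(\tfrac{1}{3}r^3\Big|_{r=0}^{r=2}\right)d\theta \\
&= \int_0^{2\pi} \tfrac{8}{3}\,d\theta \\
&= \tfrac{8}{3}\,\theta\Big|_0^{2\pi} \\
&= \frac{16\pi}{3}.
\end{aligned}
\]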
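
For readers who like to confirm such values numerically, here is a minimal sketch that approximates the pulled-back integral over the rectangle $D^*$ with a midpoint Riemann sum. The helper `midpoint_double` and the grid size are our own assumptions; the only facts taken from the text are the integrand $r \cdot r$ and the limits $0 \le r \le 2$, $0 \le \theta \le 2\pi$.

```python
import math


def midpoint_double(g, a, b, c, d, n=200):
    """Midpoint-rule approximation of the integral of g(s, t) over [a, b] x [c, d]."""
    hs = (b - a) / n
    ht = (d - c) / n
    total = 0.0
    for i in range(n):
        s = a + (i + 0.5) * hs
        for j in range(n):
            t = c + (j + 0.5) * ht
            total += g(s, t)
    return total * hs * ht


# Integrand sqrt(x^2 + y^2) = r, multiplied by the Jacobian factor r.
polar_value = midpoint_double(lambda r, theta: r * r, 0.0, 2.0, 0.0, 2.0 * math.pi)
exact = 16.0 * math.pi / 3.0
print(polar_value, exact)
```

The two printed numbers should agree to several decimal places; the small discrepancy is the discretization error of the midpoint rule.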

7.2 A word about substitution


The change of variables theorem concerns how to convert an integral with respect to x and y into
an integral with respect to two other variables, say u and v. Another name for this is substitution:
we require expressions to substitute for x, y, and dA = dx dy in terms of u and v. Substituting for
x and y means writing them as functions of u and v: x = x(u, v) and y = y(u, v). This is equivalent
to having a function T (u, v) = (x(u, v), y(u, v)) from the uv-plane to the xy-plane. The logistics
can be confusing and perhaps counterintuitive: the transformation of the integral goes from being
with respect to $x$ and $y$ to being with respect to $u$ and $v$, but, in order to accomplish this, the
geometric transformation goes the other way, from $(u,v)$ to $(x,y)$. This is an inherent aspect of
substitution.
Sometimes, the original integrand is denoted by an expression like $\eta = f(x,y)\,dx\,dy$, and then
the transformed integrand $f(T(u,v))\,|\det DT(u,v)|\,du\,dv$ is denoted by $T^*(\eta)$. Another appropriate
notation for it might be $\eta^*$. With this notation, the change of variables theorem becomes:
\[
\iint_{T(D^*)} \eta = \iint_{D^*} T^*(\eta). \tag{7.3}
\]

Actually, what we just said is not entirely accurate and the situation is a little more complicated (see footnote 6),
but it serves to illustrate the general principle: in a substitution, the geometry is “pushed forward,”
and the integral is “pulled back.”

7.3 Examples: linear changes of variables, symmetry


We illustrate the change of variables theorem with three further examples.
Example 7.5. Let $D$ be the region $\frac{x^2}{9} + \frac{y^2}{4} \le 1$, $y \ge 0$. Evaluate $\iint_D (x^2 + y^2)\,dx\,dy$.

Here, the integrand $f(x,y) = x^2 + y^2$ is fairly simple, but the region of integration poses some
problems. $D$ is the region inside the upper half of an ellipse. The endpoints of the iterated integral
with respect to $x$ and $y$ are somewhat hard to work with, and polar coordinates don't help either,
since the relationship between $r$ and $\theta$ needed to describe $D$ is a little complicated. On the other hand,
we can think of $D$ as a transformed semicircular disk, which is easy to describe in polar coordinates.
Thus we attempt to map the half-disk $D^*$ of radius 1 given in the $uv$-plane by $u^2 + v^2 \le 1$, $v \ge 0$,
over to $D$ and hope that this does not complicate the function to be integrated too much.

Figure 7.5: Changing an integral over a half-ellipse to an integral over a half-disk

The x and y-intercepts of the ellipse are ±3 and ±2, respectively, so, to achieve the transfor-
mation, we stretch horizontally by a factor of 3 and vertically by a factor of 2. In other words,
define:
T (u, v) = (3u, 2v).
The transformation is illustrated in Figure 7.5. In effect, we are making the substitutions $x = 3u$ and
$y = 2v$. Then $T$ transforms $D^*$ to $D$: if $(u,v)$ satisfies $u^2 + v^2 \le 1$ and $v \ge 0$, then $(x,y) = T(u,v)$
satisfies $\frac{x^2}{9} + \frac{y^2}{4} = \frac{(3u)^2}{9} + \frac{(2v)^2}{4} = u^2 + v^2 \le 1$ and $y = 2v \ge 0$.
Moreover, $T$ is one-to-one on all of $\mathbb{R}^2$. This seems reasonable geometrically, or, in terms of
equations, if $(u_1,v_1) \ne (u_2,v_2)$, then $(3u_1,2v_1) \ne (3u_2,2v_2)$. Perhaps it is more readable to phrase
this in the contrapositive: if $(3u_1,2v_1) = (3u_2,2v_2)$, then $(u_1,v_1) = (u_2,v_2)$. This is clear. For
instance, $3u_1 = 3u_2 \Rightarrow u_1 = u_2$.

Footnote 6: The correct expressions are $\eta = f(x,y)\,dx \wedge dy$ and $T^*(\eta) = f(T(u,v))\,\det DT(u,v)\,du \wedge dv$. We shall learn more
about integrands like this, and more importantly how to integrate them, in the last two chapters. In particular, for
the correct version of equation (7.3), see Exercise 4.20 in Chapter 11.

Finally, $DT(u,v) = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$, so $|\det DT(u,v)| = 6$, and $f(T(u,v)) = (3u)^2 + (2v)^2 = 9u^2 + 4v^2$.
Thus by the change of variables theorem:
\[
\iint_D (x^2 + y^2)\,dx\,dy = \iint_{D^*} (9u^2 + 4v^2) \cdot 6\,du\,dv.
\]
We convert this to polar coordinates in the $uv$-plane using $u = r\cos\theta$, $v = r\sin\theta$, and $du\,dv = r\,dr\,d\theta$.
The half-disk $D^*$ is described in polar coordinates by $0 \le r \le 1$, $0 \le \theta \le \pi$, in other words,
by the rectangle $[0,1] \times [0,\pi]$ in the $r\theta$-plane. Thus:
\[
\begin{aligned}
\iint_D (x^2 + y^2)\,dx\,dy &= \iint_{D^* \text{ in polar}} (9r^2\cos^2\theta + 4r^2\sin^2\theta) \cdot 6 \cdot r\,dr\,d\theta \\
&= 6\int_0^{\pi}\left(\int_0^1 r^3(9\cos^2\theta + 4\sin^2\theta)\,dr\right)d\theta.
\end{aligned}
\tag{7.4}
\]

At this point, we pause to put on the record a couple of facts from the exercises that we shall
use freely from now on (see Exercise 2.3 in Chapter 5 and Exercise 1.1 in this chapter):
• For a function $F$ whose variables separate into two independent continuous factors, $F(x,y) = f(x)g(y)$,
the integral over a rectangle is given by:
\[
\int_c^d\left(\int_a^b f(x)g(y)\,dx\right)dy = \left(\int_a^b f(x)\,dx\right)\left(\int_c^d g(y)\,dy\right).
\]
Note that integrals over rectangles are characterized by the property that all the limits of
integration are constant.
• $\int_0^{\pi} \cos^2\theta\,d\theta = \int_0^{\pi} \sin^2\theta\,d\theta = \frac{\pi}{2}$. (An easy way to remember this is to note that
$\int_0^{\pi} \cos^2\theta\,d\theta = \int_0^{\pi} \sin^2\theta\,d\theta$ from the graphs of the sine and cosine functions and that
$\int_0^{\pi} (\cos^2\theta + \sin^2\theta)\,d\theta = \int_0^{\pi} 1\,d\theta = \theta\big|_0^{\pi} = \pi$.)

Applying these facts to the present integral (7.4) gives:


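\[
\begin{aligned}
\iint_D (x^2 + y^2)\,dx\,dy &= 6\left(\int_0^1 r^3\,dr\right)\left(\int_0^{\pi} (9\cos^2\theta + 4\sin^2\theta)\,d\theta\right) \\
&= 6 \cdot \tfrac{1}{4}r^4\Big|_0^1 \cdot \left(9 \cdot \tfrac{\pi}{2} + 4 \cdot \tfrac{\pi}{2}\right) \\
&= 6 \cdot \tfrac{1}{4} \cdot \tfrac{13\pi}{2} \\
&= \frac{39\pi}{4}.
\end{aligned}
\]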
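
The following sketch cross-checks Example 7.5 in two ways: it approximates the pulled-back integral (7.4) over the rectangle $[0,1] \times [0,\pi]$, and it approximates the original integral directly over a bounding box of the half-ellipse, setting the integrand to zero outside $D$. The helper names and grid sizes are assumptions made for illustration, and the direct sum is only a coarse estimate because the grid does not follow the curved boundary.

```python
import math


def midpoint_double(g, a, b, c, d, n=300):
    """Midpoint-rule approximation of the integral of g(s, t) over [a, b] x [c, d]."""
    hs = (b - a) / n
    ht = (d - c) / n
    total = 0.0
    for i in range(n):
        s = a + (i + 0.5) * hs
        for j in range(n):
            t = c + (j + 0.5) * ht
            total += g(s, t)
    return total * hs * ht


def pulled_back(r, theta):
    """Integrand of (7.4): f(T(u, v)) * 6 * r with u = r cos(theta), v = r sin(theta)."""
    return 6.0 * r**3 * (9.0 * math.cos(theta) ** 2 + 4.0 * math.sin(theta) ** 2)


def original(x, y):
    """x^2 + y^2 on the half-ellipse D, extended by zero to the box [-3, 3] x [0, 2]."""
    inside = x * x / 9.0 + y * y / 4.0 <= 1.0
    return x * x + y * y if inside else 0.0


via_substitution = midpoint_double(pulled_back, 0.0, 1.0, 0.0, math.pi)
direct = midpoint_double(original, -3.0, 3.0, 0.0, 2.0, n=400)
exact = 39.0 * math.pi / 4.0
print(via_substitution, direct, exact)
```

Agreement between the direct estimate and the substituted one is a useful check that the factor $|\det DT| = 6$ and the extra factor $r$ were both included.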
Example 7.6. If $D$ is the parallelogram with vertices $(0,0)$, $(3,2)$, $(2,3)$, $(-1,1)$, find $\iint_D xy\,dx\,dy$.
The difficulty here is that setting up an iterated integral in terms of $x$ and $y$ involves splitting
into three integrals in order to describe $D$. To simplify the domain, we use the fact previously
noted that a linear transformation $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ transforms the unit square into the parallelogram
determined by $T(\mathbf{e}_1)$ and $T(\mathbf{e}_2)$. Here, parallelogram $D$ is determined by the vectors $(3,2)$ and
$(-1,1)$, so we want $T(\mathbf{e}_1) = (3,2)$ and $T(\mathbf{e}_2) = (-1,1)$. Using Proposition 1.7 from Chapter 1, we
put these vectors into the columns of a matrix $A = \begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix}$ and define:
\[
T(u,v) = \begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 3u - v \\ 2u + v \end{bmatrix} = (3u - v, 2u + v).
\]


Figure 7.6: A linear change of variables

Then T maps the unit square D∗ = [0, 1] × [0, 1] in the uv-plane onto D. See Figure 7.6. We leave
for the exercises the verification that T is one-to-one on R2 (Exercise 3.3), so we can use the change
of variables theorem to pull back the original integral to an integral over D∗ .
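The substitutions given by $T$ are $x = 3u - v$ and $y = 2u + v$. Also, $DT(u,v) = \begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix}$, and
$|\det DT(u,v)| = |3 + 2| = 5$. Hence:
\[
\begin{aligned}
\iint_D xy\,dx\,dy &= \iint_{D^*} (3u - v)(2u + v) \cdot 5\,du\,dv \\
&= 5\int_0^1\left(\int_0^1 (6u^2 + uv - v^2)\,dv\right)du \\
&= 5\int_0^1\left(6u^2v + \tfrac{1}{2}uv^2 - \tfrac{1}{3}v^3\right)\Big|_{v=0}^{v=1}\,du \\
&= 5\int_0^1\left(6u^2 + \tfrac{1}{2}u - \tfrac{1}{3}\right)du \\
&= 5\left(2u^3 + \tfrac{1}{4}u^2 - \tfrac{1}{3}u\right)\Big|_0^1 \\
&= 5\left(2 + \tfrac{1}{4} - \tfrac{1}{3}\right) \\
&= 5 \cdot \frac{24 + 3 - 4}{12} \\
&= \frac{115}{12}.
\end{aligned}
\]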
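
As a hedged numerical companion to Example 7.6, the sketch below first confirms that $T(u,v) = (3u - v, 2u + v)$ sends the corners of the unit square to the vertices of $D$, and then approximates the pulled-back integral with a midpoint sum. The function names and the grid size are illustrative choices of ours.

```python
def T(u, v):
    """The linear change of variables of Example 7.6."""
    return (3 * u - v, 2 * u + v)


def midpoint_double(g, a, b, c, d, n=200):
    """Midpoint-rule approximation of the integral of g(s, t) over [a, b] x [c, d]."""
    hs = (b - a) / n
    ht = (d - c) / n
    total = 0.0
    for i in range(n):
        s = a + (i + 0.5) * hs
        for j in range(n):
            t = c + (j + 0.5) * ht
            total += g(s, t)
    return total * hs * ht


# Corners of the unit square, listed counterclockwise, and their images.
corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
images = [T(u, v) for (u, v) in corners]


def pulled_back(u, v):
    """x * y evaluated at T(u, v), times |det DT| = 5."""
    x, y = T(u, v)
    return x * y * 5.0


value = midpoint_double(pulled_back, 0.0, 1.0, 0.0, 1.0)
exact = 115.0 / 12.0
print(images, value, exact)
```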
Example 7.7. A subset $D$ of the $xy$-plane is said to be symmetric in the $y$-axis if, whenever
$(x,y)$ is in $D$, so is $(-x,y)$ (Figure 7.7). Let $D$ be such a subset.

(a) Show that $\iint_D x\,dx\,dy = 0$.

(b) Let $L = \{(x,y) \in D : x \le 0\}$ and $R = \{(x,y) \in D : x \ge 0\}$ be the left and right halves of $D$,
respectively. Show that $\iint_L y\,dx\,dy = \iint_R y\,dx\,dy$.

(a) The rough intuition here is that, in the Riemann sums $\sum x\,\Delta x\,\Delta y$ for $\iint_D x\,dx\,dy$, the symmetry
of $D$ means that each contribution for $x > 0$ is balanced by an opposite contribution for $x < 0$. For
a more polished argument, let $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ be the reflection in the $y$-axis, $T(x,y) = (-x,y)$. Since
$D$ is symmetric in the $y$-axis, $T$ transforms $D$ into itself, i.e., $T(D) = D$. Hence in the notation
of the change of variables theorem, $D^* = D$. Since distinct points have distinct reflections, $T$ is
one-to-one. Also $DT(x,y) = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$, which gives $|\det DT(x,y)| = |-1| = 1$.

Figure 7.7: Symmetry in the y-axis

We apply the change of variables theorem with $f(x,y) = x$ so that $f(T(x,y)) = f(-x,y) = -x$.
Then:
\[
\begin{aligned}
\iint_D x\,dx\,dy &= \iint_{D^* = D} f(T(x,y))\,|\det DT(x,y)|\,dx\,dy \\
&= \iint_D -x \cdot 1\,dx\,dy \\
&= -\iint_D x\,dx\,dy.
\end{aligned}
\]
As a result, $2\iint_D x\,dx\,dy = 0$, so $\iint_D x\,dx\,dy = 0$.
(b) Here, the intuition is that the contributions to the Riemann sums $\sum y\,\Delta x\,\Delta y$ for $\iint_D y\,dx\,dy$
are the same for $(x,y)$ and $(-x,y)$, hence the sums converge to the same value on $L$ and on $R$.
More formally, again let $T(x,y) = (-x,y)$. Then $T(L) = R$, i.e., $R^* = L$, and $|\det DT(x,y)| = 1$
as before. Here, $f(x,y) = y$, so $f(T(x,y)) = f(-x,y) = y$. Consequently:
\[
\begin{aligned}
\iint_R y\,dx\,dy &= \iint_{R^* = L} f(T(x,y))\,|\det DT(x,y)|\,dx\,dy \\
&= \iint_L y \cdot 1\,dx\,dy \\
&= \iint_L y\,dx\,dy.
\end{aligned}
\]

This is exactly what we wanted to show.


This example can be generalized. Namely, let D be symmetric in the y-axis, and let f : D → R
be an integrable real-valued function.
• If $f(-x,y) = -f(x,y)$ for all $(x,y)$ in $D$, then:
\[
\iint_D f(x,y)\,dx\,dy = 0.
\]

• If $f(-x,y) = f(x,y)$ for all $(x,y)$ in $D$, then:
\[
\iint_L f(x,y)\,dx\,dy = \iint_R f(x,y)\,dx\,dy.
\]

The justifications use essentially the same reasoning as in the example.
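
To see the two symmetry principles at work numerically, the sketch below uses a made-up region $D = \{(x,y) : 0 \le y \le 1,\ |x| \le 1 + y^2\}$, which is symmetric in the $y$-axis, together with one odd and one even sample integrand. The region, the integrands, and the helper `grid_sum` are all assumptions chosen for illustration. The grid is placed symmetrically about the $y$-axis, so the cancellation in the Riemann sums mirrors the intuition described in Example 7.7.

```python
import math


def in_D(x, y):
    """Hypothetical region, symmetric in the y-axis."""
    return 0.0 <= y <= 1.0 and abs(x) <= 1.0 + y * y


def grid_sum(f, keep, n=256):
    """Midpoint sum of f over the grid points of [-2, 2] x [0, 1] where keep(x, y) holds."""
    hx = 4.0 / n
    hy = 1.0 / n
    total = 0.0
    for i in range(n):
        x = -2.0 + (i + 0.5) * hx
        for j in range(n):
            y = (j + 0.5) * hy
            if keep(x, y):
                total += f(x, y)
    return total * hx * hy


def odd(x, y):
    """Satisfies odd(-x, y) = -odd(x, y)."""
    return x * math.exp(y)


def even(x, y):
    """Satisfies even(-x, y) = even(x, y)."""
    return x * x * y


odd_over_D = grid_sum(odd, in_D)
even_over_L = grid_sum(even, lambda x, y: in_D(x, y) and x <= 0.0)
even_over_R = grid_sum(even, lambda x, y: in_D(x, y) and x >= 0.0)
print(odd_over_D, even_over_L, even_over_R)
```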


7.4 Change of variables for n-fold integrals


The change of variables theorem for double integrals has a natural analogue for functions of n
variables. Let T be a smooth transformation that sends a subset W ∗ of Rn onto a subset W of Rn
and that is one-to-one, except possibly on the boundary of W ∗ . See Figure 7.8. We think of T as
a function from (u1 , u2 , . . . , un )-space to (x1 , x2 , . . . , xn )-space, so:

\[
T(u_1, u_2, \ldots, u_n) = \bigl(x_1(u_1, u_2, \ldots, u_n),\, x_2(u_1, u_2, \ldots, u_n),\, \ldots,\, x_n(u_1, u_2, \ldots, u_n)\bigr).
\]

Then the change of variables formula reads:


\[
\iint\!\cdots\!\int_W f(\mathbf{x})\,dx_1\,dx_2 \cdots dx_n = \iint\!\cdots\!\int_{W^*} f(T(\mathbf{u}))\,|\det DT(\mathbf{u})|\,du_1\,du_2 \cdots du_n.
\]

Figure 7.8: A 3-dimensional change of variables

More informally, one uses $T$ to substitute for $x_1, x_2, \ldots, x_n$ in terms of $u_1, u_2, \ldots, u_n$ and sets
$dx_1\,dx_2 \cdots dx_n = |\det DT(\mathbf{u})|\,du_1\,du_2 \cdots du_n$. This pulls back the integral over $W$ to an integral
over $W^*$.
The intuition behind this generalization is the same as for double integrals: A linear transfor-
mation L : Rn → Rn represented by a matrix A alters n-dimensional volume by a factor of | det A|.
So, to obtain the integral, one chops up W ∗ into small pieces and, on each one, approximates T by
its first-order approximation, which alters volume by a factor of | det DT (u)|.
For triple integrals, there are two standard sets of alternate variables.

• Cylindrical coordinates. For this, the change of variables is $T(r,\theta,z) = (x,y,z)$, where,
as we saw in Section 5.4.2, the substitutions are given by:
\[
\begin{cases}
x = r\cos\theta \\
y = r\sin\theta \\
z = z.
\end{cases}
\]
Therefore $DT(r,\theta,z) = \begin{bmatrix} \cos\theta & -r\sin\theta & 0 \\ \sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$. After expanding along the third column, this
gives $|\det DT(r,\theta,z)| = \left|\det\begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}\right| = r$, i.e.:
\[
dx\,dy\,dz = r\,dr\,d\theta\,dz.
\]

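• Spherical coordinates. This time, $T(\rho,\phi,\theta) = (x,y,z)$ with conversions:
\[
\begin{cases}
x = \rho\sin\phi\cos\theta \\
y = \rho\sin\phi\sin\theta \\
z = \rho\cos\phi.
\end{cases}
\]
See Section 5.4.3. Then $DT(\rho,\phi,\theta) = \begin{bmatrix} \sin\phi\cos\theta & \rho\cos\phi\cos\theta & -\rho\sin\phi\sin\theta \\ \sin\phi\sin\theta & \rho\cos\phi\sin\theta & \rho\sin\phi\cos\theta \\ \cos\phi & -\rho\sin\phi & 0 \end{bmatrix}$. We leave as
an exercise the calculation that $|\det DT(\rho,\phi,\theta)| = \rho^2\sin\phi$ (Exercise 2.8 in Chapter 6), i.e.:
\[
dx\,dy\,dz = \rho^2\sin\phi\,d\rho\,d\phi\,d\theta.
\]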
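
The two Jacobian factors can be spot-checked without any symbolic algebra. The sketch below approximates $DT$ by central differences at one sample point for each coordinate system and compares the determinant with $r$ and with $\rho^2\sin\phi$. The step size, the sample points, and the helper names `jacobian` and `det3` are assumptions for illustration; a single sample point is evidence, not a proof.

```python
import math


def cylindrical(r, theta, z):
    return (r * math.cos(theta), r * math.sin(theta), z)


def spherical(rho, phi, theta):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))


def jacobian(T, p, h=1e-6):
    """Central-difference approximation of the 3 by 3 derivative matrix DT(p)."""
    columns = []
    for k in range(3):
        plus = list(p)
        minus = list(p)
        plus[k] += h
        minus[k] -= h
        fp = T(*plus)
        fm = T(*minus)
        # Column k holds the partial derivatives with respect to the k-th variable.
        columns.append([(fp[i] - fm[i]) / (2.0 * h) for i in range(3)])
    # Transpose so that rows correspond to x, y, z.
    return [[columns[k][i] for k in range(3)] for i in range(3)]


def det3(m):
    """Determinant of a 3 by 3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


p_cyl = (1.3, 0.7, -0.4)   # sample (r, theta, z)
p_sph = (1.7, 0.9, 2.3)    # sample (rho, phi, theta)

det_cyl = det3(jacobian(cylindrical, p_cyl))
det_sph = det3(jacobian(spherical, p_sph))
print(det_cyl, p_cyl[0])
print(det_sph, p_sph[0] ** 2 * math.sin(p_sph[1]))
```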

Example 7.8. Find the volume of a closed ball of radius $a$ in $\mathbb{R}^3$, that is, the volume of:
\[
W = \{(x,y,z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 \le a^2\}.
\]
See Figure 7.9.

Figure 7.9: A ball of radius $a$ in $\mathbb{R}^3$

We begin with $\mathrm{Vol}(W) = \iiint_W 1\,dx\,dy\,dz$. As we are about to see, $W$ is easy to describe in
spherical coordinates, so we convert to an integral in spherical coordinates. The endpoints depend
on the order of integration, and here we choose $d\rho\,d\phi\,d\theta$. Thus the first integration is with respect
to $\rho$. For fixed $\phi$ and $\theta$, $\rho$ varies along a radial line segment from $\rho = 0$ to $\rho = a$. Then, for fixed $\theta$,
$\phi$ rotates all the way down from the positive $z$-axis, $\phi = 0$, to the negative $z$-axis, $\phi = \pi$. Finally, $\theta$
varies from $\theta = 0$ to $\theta = 2\pi$. In other words, $W$ is described in $\rho\phi\theta$-space by the three-dimensional
box $W^* = [0,a] \times [0,\pi] \times [0,2\pi]$. Hence:
\[
\begin{aligned}
\mathrm{Vol}(W) &= \iiint_W 1\,dx\,dy\,dz \\
&= \iiint_{W^*} 1 \cdot \rho^2\sin\phi\,d\rho\,d\phi\,d\theta \\
&= \int_0^{2\pi}\left(\int_0^{\pi}\left(\int_0^a \rho^2\sin\phi\,d\rho\right)d\phi\right)d\theta.
\end{aligned}
\]

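The variables of the integrand separate into independent factors and all endpoints are constant, so,
as in the two-dimensional case, we can separate the integrals as well:
\[
\begin{aligned}
\mathrm{Vol}(W) &= \left(\int_0^a \rho^2\,d\rho\right)\left(\int_0^{\pi} \sin\phi\,d\phi\right)\left(\int_0^{2\pi} 1\,d\theta\right) \\
&= \left(\tfrac{1}{3}\rho^3\Big|_0^a\right)\left(-\cos\phi\Big|_0^{\pi}\right)\left(\theta\Big|_0^{2\pi}\right) \\
&= \tfrac{1}{3}a^3 \cdot \bigl(1 - (-1)\bigr) \cdot 2\pi \\
&= \tfrac{4}{3}\pi a^3.
\end{aligned}
\]
That is:

The volume of a 3-dimensional ball of radius $a$ is $\frac{4}{3}\pi a^3$.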

This is often referred to as the volume of a sphere, though that is a misnomer. A sphere is a
surface, not a solid.
For those who would like more practice setting up these types of integrals, this example could
have been solved using cylindrical coordinates as well, where $dx\,dy\,dz = r\,dr\,d\theta\,dz$. We use the
order of integration $dz\,dr\,d\theta$. For fixed $r$ and $\theta$, $z$ varies from $z = -\sqrt{a^2 - x^2 - y^2} = -\sqrt{a^2 - r^2}$
to $z = +\sqrt{a^2 - r^2}$. Moreover, the possible values of $r$ and $\theta$ come from the two-dimensional disk
of radius $a$ in the $xy$-plane, which in polar coordinates is described by $0 \le r \le a$ and $0 \le \theta \le 2\pi$.
Thus:
\[
\begin{aligned}
\mathrm{Vol}(W) &= \int_0^{2\pi}\left(\int_0^a\left(\int_{-\sqrt{a^2-r^2}}^{\sqrt{a^2-r^2}} 1 \cdot r\,dz\right)dr\right)d\theta \\
&= \int_0^{2\pi}\left(\int_0^a rz\Big|_{z=-\sqrt{a^2-r^2}}^{z=\sqrt{a^2-r^2}}\,dr\right)d\theta \\
&= \int_0^{2\pi}\left(\int_0^a 2r\sqrt{a^2 - r^2}\,dr\right)d\theta.
\end{aligned}
\]

Since we already calculated the answer, we won’t complete the evaluation.
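
For the curious, here is a minimal numerical sketch suggesting that the cylindrical setup does lead to the same value. It approximates $\int_0^a 2r\sqrt{a^2 - r^2}\,dr$ with a one-variable midpoint sum, multiplies by $2\pi$ for the $\theta$-integration, and compares with $\frac{4}{3}\pi a^3$. The radius `a = 1.5` and the helper `midpoint_1d` are arbitrary choices made for illustration.

```python
import math


def midpoint_1d(g, a, b, n=4000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


a = 1.5  # hypothetical sample radius

# Inner r-integral of the cylindrical setup; the integrand does not depend on theta.
inner = midpoint_1d(lambda r: 2.0 * r * math.sqrt(a * a - r * r), 0.0, a)
cylindrical_volume = 2.0 * math.pi * inner
spherical_volume = 4.0 / 3.0 * math.pi * a**3
print(cylindrical_volume, spherical_volume)
```

By hand, the substitution $w = a^2 - r^2$ gives $\int_0^a 2r\sqrt{a^2 - r^2}\,dr = \frac{2}{3}a^3$, which is consistent with the numerical comparison.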

Example 7.9. Let $W$ be the region in $\mathbb{R}^3$ inside both the circular cylinder $x^2 + y^2 = 4$ and the
sphere $x^2 + y^2 + z^2 = 9$ and above the $xy$-plane. Find $\iiint_W (x^2 + y^2)z\,dx\,dy\,dz$.
The region of integration $W$ is a silo-shaped solid bounded laterally by a circular cylinder and
capped with a sphere, as drawn in Figure 7.10. Because the base is a disk in the $xy$-plane, perhaps
the simplest way to describe $W$ is with polar coordinates together with the $z$-coordinate, that is,
with cylindrical coordinates. Substituting $x = r\cos\theta$, $y = r\sin\theta$, and $dx\,dy\,dz = r\,dr\,d\theta\,dz$ gives:
\[
\iiint_W (x^2 + y^2)z\,dx\,dy\,dz = \iiint_{W \text{ in cylindrical}} r^2 z \cdot r\,dr\,d\theta\,dz = \iiint_{W \text{ in cylindrical}} r^3 z\,dr\,d\theta\,dz.
\]
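Figure 7.10: The region bounded by a circular cylinder of radius 2, a sphere of radius 3, and the
$xy$-plane

We describe $W$ in cylindrical coordinates using the order of integration $dz\,dr\,d\theta$. For fixed $r$ and
$\theta$, $z$ goes from $z = 0$ to $z = \sqrt{9 - x^2 - y^2} = \sqrt{9 - r^2}$. The domain of values of $(r,\theta)$ comes from
the base of $W$, which is a disk of radius 2. Thus $0 \le r \le 2$ and $0 \le \theta \le 2\pi$. Therefore:
\[
\begin{aligned}
\iiint_W (x^2 + y^2)z\,dx\,dy\,dz &= \int_0^{2\pi}\left(\int_0^2\left(\int_0^{\sqrt{9-r^2}} r^3 z\,dz\right)dr\right)d\theta \\
&= \int_0^{2\pi}\left(\int_0^2 \tfrac{1}{2}r^3z^2\Big|_{z=0}^{z=\sqrt{9-r^2}}\,dr\right)d\theta \\
&= \int_0^{2\pi}\left(\int_0^2 \tfrac{1}{2}r^3(9 - r^2)\,dr\right)d\theta \\
&= \tfrac{1}{2}\left(\int_0^2 r^3(9 - r^2)\,dr\right)\left(\int_0^{2\pi} 1\,d\theta\right) \\
&= \tfrac{1}{2}\left(\left(\tfrac{9}{4}r^4 - \tfrac{1}{6}r^6\right)\Big|_0^2\right)\left(\theta\Big|_0^{2\pi}\right) \\
&= \tfrac{1}{2} \cdot \left(36 - \tfrac{32}{3}\right) \cdot 2\pi \\
&= \tfrac{1}{2} \cdot \tfrac{76}{3} \cdot 2\pi \\
&= \frac{76}{3}\pi.
\end{aligned}
\]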
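
The iterated integral of Example 7.9 has a variable upper limit in $z$, which makes it a good test case for a nested numerical sum. The sketch below is one possible way to organize that; the function name `example_7_9` and the grid sizes are assumptions of ours. The $\theta$-integration contributes a factor of $2\pi$ because the integrand does not depend on $\theta$.

```python
import math


def example_7_9(n_r=400, n_z=50):
    """Midpoint approximation of the iterated integral of r^3 z dz dr dtheta."""
    h_r = 2.0 / n_r
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * h_r
        z_top = math.sqrt(9.0 - r * r)   # upper limit of z depends on r
        h_z = z_top / n_z
        inner = 0.0
        for k in range(n_z):
            z = (k + 0.5) * h_z
            inner += r**3 * z * h_z
        total += inner * h_r
    return 2.0 * math.pi * total


value = example_7_9()
exact = 76.0 * math.pi / 3.0
print(value, exact)
```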

Example 7.10. Find the centroid $(\bar{x}, \bar{y}, \bar{z})$ of the solid $W$ bounded above by the sphere $x^2 + y^2 + z^2 = 8$
and below by the cone $z = \sqrt{x^2 + y^2}$, shown in Figure 7.11.
First, by symmetry, $\bar{x} = 0$ and $\bar{y} = 0$. More precisely, consider:
\[
\bar{x} = \frac{1}{\mathrm{Vol}(W)}\iiint_W x\,dx\,dy\,dz.
\]
Note that $W$ is symmetric in the $yz$-plane in the sense that $(-x,y,z)$ is in $W$ whenever $(x,y,z)$
is in $W$. In addition, the function being integrated, namely, $f(x,y,z) = x$, satisfies $f(-x,y,z) = -x = -f(x,y,z)$.
The same sort of change of variables argument that we applied in Example 7.7
to study symmetry for double integrals shows that $\iiint_W x\,dx\,dy\,dz = 0$, and hence $\bar{x} = 0$. The
argument that $\bar{y} = 0$ is similar using symmetry in the $xz$-plane.

Figure 7.11: An ice cream cone-shaped solid, capped by a sphere of radius $2\sqrt{2}$

To find $\bar{z} = \frac{1}{\mathrm{Vol}(W)}\iiint_W z\,dx\,dy\,dz$, we actually must do some calculation. We begin with
$\mathrm{Vol}(W) = \iiint_W 1\,dx\,dy\,dz$. It is simplest to describe $W$ using spherical coordinates. Using the
order of integration $d\rho\,d\phi\,d\theta$, for fixed $\phi$ and $\theta$, $\rho$ goes along a radial segment from $\rho = 0$ to $\rho = 2\sqrt{2}$,
the radius of the spherical cap. Then, for fixed $\theta$, the angle $\phi$ varies from $\phi = 0$ to $\phi = \frac{\pi}{4}$. Lastly
$\theta$ goes all the way around from $\theta = 0$ to $\theta = 2\pi$. Hence:
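\[
\begin{aligned}
\mathrm{Vol}(W) = \iiint_W 1\,dx\,dy\,dz &= \iiint_{W \text{ in spherical}} 1 \cdot \rho^2\sin\phi\,d\rho\,d\phi\,d\theta \\
&= \int_0^{2\pi}\left(\int_0^{\pi/4}\left(\int_0^{2\sqrt{2}} \rho^2\sin\phi\,d\rho\right)d\phi\right)d\theta \\
&= \left(\int_0^{2\sqrt{2}} \rho^2\,d\rho\right)\left(\int_0^{\pi/4} \sin\phi\,d\phi\right)\left(\int_0^{2\pi} 1\,d\theta\right) \\
&= \left(\tfrac{1}{3}\rho^3\Big|_0^{2\sqrt{2}}\right)\left(-\cos\phi\Big|_0^{\pi/4}\right)\left(\theta\Big|_0^{2\pi}\right) \\
&= \frac{16\sqrt{2}}{3} \cdot \left(-\frac{\sqrt{2}}{2} - (-1)\right) \cdot 2\pi \\
&= \frac{32\pi}{3}\left(\sqrt{2} - 1\right).
\end{aligned}
\]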

Similarly, substituting the conversion z = ρ cos φ:


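\[
\begin{aligned}
\iiint_W z\,dx\,dy\,dz &= \iiint_{W \text{ in spherical}} \rho\cos\phi \cdot \rho^2\sin\phi\,d\rho\,d\phi\,d\theta \\
&= \int_0^{2\pi}\left(\int_0^{\pi/4}\left(\int_0^{2\sqrt{2}} \rho^3\cos\phi\sin\phi\,d\rho\right)d\phi\right)d\theta \\
&= \left(\int_0^{2\sqrt{2}} \rho^3\,d\rho\right)\left(\int_0^{\pi/4} \cos\phi\sin\phi\,d\phi\right)\left(\int_0^{2\pi} 1\,d\theta\right) \\
&= \left(\tfrac{1}{4}\rho^4\Big|_0^{2\sqrt{2}}\right)\left(\tfrac{1}{2}\sin^2\phi\Big|_0^{\pi/4}\right)\left(\theta\Big|_0^{2\pi}\right) \\
&= 16 \cdot \tfrac{1}{4} \cdot 2\pi \\
&= 8\pi.
\end{aligned}
\]
As a result, $\bar{z} = \dfrac{1}{\frac{32\pi}{3}(\sqrt{2} - 1)} \cdot 8\pi = \dfrac{3}{4(\sqrt{2} - 1)}$, and the centroid of $W$ is the point:
\[
(\bar{x}, \bar{y}, \bar{z}) = \left(0,\, 0,\, \frac{3}{4(\sqrt{2} - 1)}\right).
\]
As a reality check, note that this is approximately the point $(0, 0, 1.8)$, whereas the radius of the
sphere is $2\sqrt{2} \approx 2.8$. Hence the centroid is roughly two-thirds of the way to the spherical top of
the region. Given the top-heavy shape of $W$, this is plausible.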
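
The reality check can be pushed a little further with a numerical sketch. Below, the volume and the $z$-moment are approximated by midpoint sums in $\rho$ and $\phi$ over the box $[0, 2\sqrt{2}] \times [0, \frac{\pi}{4}]$, with the $\theta$-integration contributing a factor of $2\pi$. The helper `midpoint_double` and the grid size are illustrative assumptions.

```python
import math


def midpoint_double(g, a, b, c, d, n=200):
    """Midpoint-rule approximation of the integral of g(s, t) over [a, b] x [c, d]."""
    hs = (b - a) / n
    ht = (d - c) / n
    total = 0.0
    for i in range(n):
        s = a + (i + 0.5) * hs
        for j in range(n):
            t = c + (j + 0.5) * ht
            total += g(s, t)
    return total * hs * ht


rho_max = 2.0 * math.sqrt(2.0)
phi_max = math.pi / 4.0

# Integrands already include the spherical factor rho^2 sin(phi).
volume = 2.0 * math.pi * midpoint_double(
    lambda rho, phi: rho**2 * math.sin(phi), 0.0, rho_max, 0.0, phi_max)
z_moment = 2.0 * math.pi * midpoint_double(
    lambda rho, phi: rho**3 * math.cos(phi) * math.sin(phi), 0.0, rho_max, 0.0, phi_max)

z_bar = z_moment / volume
expected = 3.0 / (4.0 * (math.sqrt(2.0) - 1.0))
print(z_bar, expected)
```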

We close with a quadruple integral.

Example 7.11. Find the volume of the 4-dimensional closed unit ball:
\[
W = \{(x_1, x_2, x_3, x_4) \in \mathbb{R}^4 : x_1^2 + x_2^2 + x_3^2 + x_4^2 \le 1\}.
\]
As usual, we begin with $\mathrm{Vol}(W) = \iiiint_W 1\,dx_1\,dx_2\,dx_3\,dx_4$. This can be analyzed with techniques
similar to those we used for triple integrals. For instance, say we integrate first with respect
to $x_4$, treating $x_1$, $x_2$, and $x_3$ as fixed. The projection of $W$ onto the $x_1x_2x_3$-subspace is the
three-dimensional ball $B$ given by $x_1^2 + x_2^2 + x_3^2 \le 1$. For each $(x_1, x_2, x_3)$ in $B$, the values of $x_4$
for which $(x_1, x_2, x_3, x_4) \in W$ range from $x_4 = -\sqrt{1 - x_1^2 - x_2^2 - x_3^2}$ to $x_4 = +\sqrt{1 - x_1^2 - x_2^2 - x_3^2}$.
This is indicated in Figure 7.12, which is supposed to be a drawing in four-dimensional space $\mathbb{R}^4$.
Of course, this should be taken with a grain of salt, but we're doing the best we can.

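Figure 7.12: The volume of a 4-dimensional ball as a triple integral of a single integral,
$\iiint\left(\int \ldots\,dx_4\right)dx_1\,dx_2\,dx_3$

Hence:
\begin{align}
\mathrm{Vol}(W) &= \iiint_B\left(\int_{-\sqrt{1 - x_1^2 - x_2^2 - x_3^2}}^{\sqrt{1 - x_1^2 - x_2^2 - x_3^2}} 1\,dx_4\right)dx_1\,dx_2\,dx_3 \tag{7.5} \\
&= \iiint_B 2\sqrt{1 - x_1^2 - x_2^2 - x_3^2}\,dx_1\,dx_2\,dx_3. \notag
\end{align}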

This triple integral can be evaluated, say using spherical coordinates, though it gets a little messy.
Here is an alternative. One can think of equation (7.5) above as expressing the volume of $W$ as
the triple integral of a single integral. We look at it instead as a double integral of a double integral,
along the lines of $\mathrm{Vol}(W) = \iint_{x_3,x_4}\left(\iint_{x_1,x_2} 1\,dx_1\,dx_2\right)dx_3\,dx_4$. More precisely, the projection of
$W$ onto the $x_3x_4$-plane is the set of all pairs $(x_3, x_4)$ for which there is at least one $(x_1, x_2)$ such
that $(x_1, x_2, x_3, x_4)$ is in $W$. This is satisfied by all points in the unit disk $x_3^2 + x_4^2 \le 1$. In fact,
given such a $(x_3, x_4)$, the corresponding points $(x_1, x_2)$ are those such that $x_1^2 + x_2^2 \le 1 - x_3^2 - x_4^2$.
We write this way of describing $W$ as:
\[
\mathrm{Vol}(W) = \iint_{x_3^2 + x_4^2 \le 1}\left(\iint_{x_1^2 + x_2^2 \le 1 - x_3^2 - x_4^2} 1\,dx_1\,dx_2\right)dx_3\,dx_4.
\]
See Figure 7.13 (more nonsense).

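Figure 7.13: The volume of a 4-dimensional ball as a double integral of a double integral,
$\iint\left(\iint \ldots\,dx_1\,dx_2\right)dx_3\,dx_4$

The inner integral represents the area of a two-dimensional disk of radius $a = \sqrt{1 - x_3^2 - x_4^2}$ in
the $x_1x_2$-plane. Hence $\iint_{x_1^2 + x_2^2 \le 1 - x_3^2 - x_4^2} 1\,dx_1\,dx_2 = \pi a^2 = \pi(1 - x_3^2 - x_4^2)$, whence:
\[
\mathrm{Vol}(W) = \iint_{x_3^2 + x_4^2 \le 1} \pi(1 - x_3^2 - x_4^2)\,dx_3\,dx_4.
\]
This is an integral over the unit disk in the $x_3x_4$-plane, so it can be evaluated using polar coordinates
with $dx_3\,dx_4 = r\,dr\,d\theta$:
\[
\begin{aligned}
\mathrm{Vol}(W) &= \int_0^{2\pi}\left(\int_0^1 \pi(1 - r^2) \cdot r\,dr\right)d\theta \\
&= \pi\left(\int_0^1 r(1 - r^2)\,dr\right)\left(\int_0^{2\pi} 1\,d\theta\right) \\
&= \pi\left(\left(\tfrac{1}{2}r^2 - \tfrac{1}{4}r^4\right)\Big|_0^1\right)\left(\theta\Big|_0^{2\pi}\right) \\
&= \pi \cdot \tfrac{1}{4} \cdot 2\pi \\
&= \frac{\pi^2}{2}.
\end{aligned}
\]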
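
As a hedged cross-check, the sketch below compares the two routes numerically. The first is the polar-coordinate integral just evaluated. The second starts from equation (7.5) and uses spherical coordinates in the ball $B$: since the integrand $2\sqrt{1 - \rho^2}$ depends only on $\rho$, the $\phi$- and $\theta$-integrations contribute the factor $\int_0^{\pi}\sin\phi\,d\phi \cdot \int_0^{2\pi} 1\,d\theta = 4\pi$. The helper `midpoint_1d` and the grid size are our own assumptions.

```python
import math


def midpoint_1d(g, a, b, n=4000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


# Route 1: double integral of a double integral, polar coordinates in the x3x4-plane.
disk_route = 2.0 * math.pi * midpoint_1d(lambda r: math.pi * (1.0 - r * r) * r, 0.0, 1.0)

# Route 2: equation (7.5) in spherical coordinates; the angular integrals give 4*pi.
ball_route = 4.0 * math.pi * midpoint_1d(
    lambda rho: 2.0 * math.sqrt(1.0 - rho * rho) * rho * rho, 0.0, 1.0)

exact = math.pi**2 / 2.0
print(disk_route, ball_route, exact)
```

The second route converges more slowly because the square root is not differentiable at $\rho = 1$, which is one concrete sense in which that approach is "a little messy."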

We leave for the exercises the problems of finding the volume of a 5-dimensional ball and, more
broadly, a strategy for approaching the volume of an n-dimensional ball in general. See Exercises
4.9 and 4.10.

7.5 Exercises for Chapter 7


Section 1 Change of variables for double integrals

1.1. This exercise involves two standard one-variable integrals that appear regularly enough that it
seems like a good idea to get out into the open how they can be evaluated quickly. Namely, we
compute the integrals $\int_0^{n\pi} \cos^2 x\,dx$ and $\int_0^{n\pi} \sin^2 x\,dx$, where $n$ is a positive integer. They can
be found using trigonometric identities, but there is another way that is easier to reproduce
on the spot.

(a) First, sketch the graphs of $y = \cos^2 x$ and $y = \sin^2 x$ for $x$ in the interval $[0, n\pi]$, and use
them to illustrate that $\int_0^{n\pi} \cos^2 x\,dx = \int_0^{n\pi} \sin^2 x\,dx$.
(b) This is an optional exercise for those who are uneasy about drawing the conclusion in
part (a) based only on a picture.
(i) Show that $\int_0^{n\pi} \sin^2 x\,dx = \int_{\pi/2}^{n\pi + \pi/2} \cos^2 x\,dx$. (Hint: Use the substitution $u = x + \frac{\pi}{2}$
and the identity $\sin(\theta - \frac{\pi}{2}) = -\cos\theta$.)
(ii) Show that $\int_0^{\pi/2} \cos^2 x\,dx = \int_{n\pi}^{n\pi + \pi/2} \cos^2 x\,dx$. (Hint: $\cos(\theta - \pi) = -\cos\theta$.)
Deduce that $\int_0^{n\pi} \cos^2 x\,dx = \int_0^{n\pi} \sin^2 x\,dx$.
(c) Integrate the identity $\cos^2 x + \sin^2 x = 1$, and use part (a) (or (b)) to show that:
\[
\int_0^{n\pi} \cos^2 x\,dx = \frac{n\pi}{2} \quad\text{and}\quad \int_0^{n\pi} \sin^2 x\,dx = \frac{n\pi}{2}.
\]

1.2. Find $\iint_D y^2\,dx\,dy$, where $D$ is the upper half-disk described by $x^2 + y^2 \le 1$, $y \ge 0$.

1.3. Find $\iint_D xy\,dx\,dy$, where $D$ is the region in the first quadrant lying inside the circle $x^2 + y^2 = 4$
and below the line $y = x$.

1.4. Find $\iint_D \cos(x^2 + y^2)\,dx\,dy$, where $D$ is the region described by $4 \le x^2 + y^2 \le 16$.

1.5. Let $D$ be the region in the $xy$-plane satisfying $x^2 + y^2 \le 2$, $y \ge x$, and $x \ge 0$. See Figure
7.14.

Figure 7.14: The region $x^2 + y^2 \le 2$, $y \ge x$, and $x \ge 0$

Consider the integral $\iint_D \sqrt{2 - x^2}\,dx\,dy$.

(a) Write an expression for the integral using the order of integration $dy\,dx$.
(b) Write an expression for the integral using the order of integration $dx\,dy$.
(c) Write an expression for the integral in polar coordinates in whatever order of integration
you prefer.
(d) Evaluate the integral using whichever approach seems best.

1.6. Let $W$ be the region in $\mathbb{R}^3$ lying above the $xy$-plane, inside the cylinder $x^2 + y^2 = 1$, and
below the plane $x + y + z = 2$. Find the volume of $W$.

1.7. Let $W$ be the region in $\mathbb{R}^3$ lying above the $xy$-plane, inside the cylinder $x^2 + y^2 = 1$, and
below the plane $x + y + z = 1$. Find the volume of $W$.

1.8. In this exercise, we evaluate the improper one-variable integral $\int_{-\infty}^{\infty} e^{-x^2}\,dx$ by following the
unlikely strategy of replacing it with a more tractable improper double integral. Let $a$ be a
positive real number.

(a) Let $R_a$ be the rectangle $R_a = [-a, a] \times [-a, a]$. Show that:
\[
\iint_{R_a} e^{-x^2 - y^2}\,dx\,dy = \left(\int_{-a}^{a} e^{-x^2}\,dx\right)^2.
\]
(b) Let $D_a$ be the disk $x^2 + y^2 \le a^2$. Use polar coordinates to evaluate $\iint_{D_a} e^{-x^2 - y^2}\,dx\,dy$.
(c) Note that, as $a$ goes to $\infty$, both $R_a$ and $D_a$ fill out all of $\mathbb{R}^2$. It is true that both
$\lim_{a \to \infty} \iint_{R_a} e^{-x^2 - y^2}\,dx\,dy$ and $\lim_{a \to \infty} \iint_{D_a} e^{-x^2 - y^2}\,dx\,dy$ exist and that they are equal. Their
common value is the improper integral $\iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dx\,dy$. Use this information along
with your answers to parts (a) and (b) to show that:
\[
\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.
\]

Section 3 Examples: linear changes of variables, symmetry

3.1. Let $D$ be the set of points $(x,y)$ in $\mathbb{R}^2$ such that $(x - 3)^2 + (y - 2)^2 \le 4$.

(a) Sketch and describe $D$.
(b) Let $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ be the translation $T(u,v) = (u,v) + (3,2) = (u + 3, v + 2)$. Describe
the region $D^*$ in the $uv$-plane such that $T(D^*) = D$.
(c) Use the change of variables in part (b) to convert the integral $\iint_D (x + y)\,dx\,dy$ over $D$
to an integral over $D^*$. Then, evaluate the integral using whatever techniques seem best.

3.2. Let $a$ and $b$ be positive real numbers. Find the area of the region $\frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1$ inside an
ellipse by using the change of variables $T(u,v) = (au, bv)$. (Recall from Chapter 5 that
$\mathrm{Area}(D) = \iint_D 1\,dA$.)

3.3. Let $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ be the linear change of variables $T(u,v) = (3u - v, 2u + v)$ of Example 7.6.

(a) Show that the only point $(u,v)$ that satisfies $T(u,v) = \mathbf{0}$ is $(u,v) = \mathbf{0}$.
(b) Show that $T$ is one-to-one on $\mathbb{R}^2$. (Hint: Start by assuming that $T(\mathbf{a}) = T(\mathbf{b})$, and use
part (a) and the linearity of $T$.)

3.4. Find $\iint_D (2x + 4y)\,dx\,dy$ if:

(a) $D$ is the parallelogram with vertices $(0,0)$, $(3,1)$, $(5,5)$, and $(2,4)$,
(b) $D$ is the triangular region with vertices $(3,1)$, $(5,5)$, and $(2,4)$. (Hint: Take advantage
of the work you've already done in part (a). You should be able to use quite a bit of it.)

3.5. Find $\iint_D (x + y)\,e^{x^2 - y^2}\,dx\,dy$, where $D$ is the rectangle with vertices $(0,0)$, $(1,-1)$, $(3,1)$,
and $(2,2)$.

3.6. Find $\iint_D \sqrt{12 + x^2 + 3y^2}\,dx\,dy$, where $D$ is the region in $\mathbb{R}^2$ described by $x^2 + 3y^2 \le 12$ and
$0 \le y \le \frac{1}{\sqrt{3}}x$.

3.7. This problem concerns the integral
\[
\iint_D \frac{y}{x}\,e^{xy}\,dx\,dy, \tag{7.6}
\]
where $D$ is the region in the $xy$-plane bounded by the curves $y = x$, $y = 4x$, $xy = 1$, and
$xy = 2$. See Figure 7.15, left. The main idea is to simplify the integrand by making the
substitutions $u = xy$ and $v = \frac{y}{x}$.
(a) With these substitutions, solve for $x$ and $y$ as functions of $u$ and $v$: $(x,y) = T(u,v)$.
(b) Describe the region $D^*$ in the $uv$-plane such that $T(D^*) = D$.
(c) Evaluate the integral (7.6).

Figure 7.15: The region bounded by $y = x$, $y = 4x$, $xy = 1$, and $xy = 2$ (left) and the region
$x^2 - 4xy + 8y^2 \le 4$ (right)

3.8. Let $D$ be the region in $\mathbb{R}^2$ described by $x^2 - 4xy + 8y^2 \le 4$. See Figure 7.15, right. Find
$\iint_D y^2\,dx\,dy$. (Hint: $x^2 - 4xy + 8y^2 = (x - 2y)^2 + 4y^2$.)

3.9. Find $\iint_D \frac{(x - y)^2}{(x + y)^2}\,dx\,dy$, where $D$ is the square with vertices $(2,0)$, $(4,2)$, $(2,4)$, and $(0,2)$.
3.10. Let $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ be a one-to-one linear transformation from the $uv$-plane to the $xy$-plane
represented by a matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, i.e., $(x,y) = T(u,v) = A \cdot \begin{bmatrix} u \\ v \end{bmatrix}$. Let $D^*$ and $D$ be bounded
regions of $\mathbb{R}^2$ of positive area such that $T$ maps $D^*$ to $D$, as shown in Figure 7.16.

Figure 7.16: A linear change of variables

(a) Let $k = |\det A|$. Use the change of variables theorem to show that:
\[
\mathrm{Area}(D) = k \cdot \mathrm{Area}(D^*).
\]
(b) Let $(\bar{u}, \bar{v})$ and $(\bar{x}, \bar{y})$ denote the centroids of $D^*$ and $D$, respectively. Show that:
\[
(\bar{x}, \bar{y}) = T(\bar{u}, \bar{v}).
\]
In other words, linear transformations map centroids to centroids.

3.11. A region D in the xy-plane is called symmetric in the line y = x if, whenever a point
(x, y) is in D, so is the point (y, x). If D is such a region, prove that its centroid lies on the
line y = x.

3.12. Let $D$ be a subset of $\mathbb{R}^2$ that is symmetric about the origin, that is, whenever $(x,y)$ is in $D$,
so is $(-x,-y)$.

(a) Let $f\colon D \to \mathbb{R}$ be an integrable function on $D$. Find a condition on $f$ that ensures that
$\iint_D f(x,y)\,dx\,dy = 0$. Justify your answer.
(b) Find an example of such a function other than the zero function.

Section 4 Change of variables for n-fold integrals


4.1. Find $\iiint_W (x^2 + y^2 + z^2)\,dx\,dy\,dz$ if $W$ is the region of $\mathbb{R}^3$ that lies above the cone
$z = \sqrt{x^2 + y^2}$ and below the plane $z = 2$.

4.2. Find $\iiint_W x^2z\,dx\,dy\,dz$ if $W$ is the region of $\mathbb{R}^3$ described by $x^2 + y^2 + z^2 \le 4$, $y \ge 0$, and
$z \ge 0$.

4.3. Find $\iiint_W \frac{z}{(3 + x^2 + y^2)^2}\,dx\,dy\,dz$ if $W$ is the region of $\mathbb{R}^3$ given by $x^2 + y^2 + z^2 \le 1$, $z \ge 0$.
(Hint: The proper coordinate system and order of integration can make a difference.)

4.4. A class of enthusiastic multivariable calculus students celebrates the change of variables theo-
rem by drilling a cylindrical hole of radius 1 straight through the center of the earth. Assuming
that the earth is a 3-dimensional closed ball of radius 2, find the volume of the portion of the
earth that remains.

4.5. Find the centroid of the half-ball of radius $a$ in $\mathbb{R}^3$ that lies inside the sphere $x^2 + y^2 + z^2 = a^2$
and above the $xy$-plane.

4.6. Let $W$ be the wedge-shaped solid in $\mathbb{R}^3$ that lies under the plane $z = y$, inside the cylinder
$x^2 + y^2 = 1$, and above the $xy$-plane. Find the centroid of $W$.

4.7. Let $W$ be the region of $\mathbb{R}^3$ that lies inside the three cylinders $x^2 + z^2 = 1$, $y^2 + z^2 = 1$,
$x^2 + y^2 = 1$, and above the $xy$-plane. Find the volume of $W$. (Hints: The integral
$\int \frac{1}{\cos^2\theta}\,d\theta = \int \sec^2\theta\,d\theta = \tan\theta + C$ might be helpful.
Also, $\sin^3\theta = \sin^2\theta \cdot \sin\theta = (1 - \cos^2\theta)\sin\theta$.)
3

4.8. Within the solid ball $x^2 + y^2 + z^2 \le a^2$ of radius $a$ in $\mathbb{R}^3$, find the average distance to the
origin.

4.9. Find the volume of the 5-dimensional closed unit ball:
\[
W = \{(x_1, x_2, x_3, x_4, x_5) \in \mathbb{R}^5 : x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 \le 1\}.
\]
(Hint: Organize the calculation as a double integral of a triple integral: $\iint\left(\iiint \ldots\right)$.)

4.10. If $a > 0$, let $W_n(a)$ denote the closed ball $x_1^2 + x_2^2 + \cdots + x_n^2 \le a^2$ in $\mathbb{R}^n$ of radius $a$ centered
at the origin, and let $\mathrm{Vol}(W_n(a))$ denote its $n$-dimensional volume. For instance, we showed
in Example 7.11 that $\mathrm{Vol}(W_4(1)) = \frac{\pi^2}{2}$.

(a) Use the change of variables theorem to show that $\mathrm{Vol}(W_n(a)) = a^n\,\mathrm{Vol}(W_n(1))$.
(b) By thinking of $\mathrm{Vol}(W_n(1))$ as a double integral of an $(n - 2)$-fold integral, find a relationship
between $\mathrm{Vol}(W_n(1))$ and $\mathrm{Vol}(W_{n-2}(1))$.
(c) Verify that your answer to part (b) predicts a correct relationship between $\mathrm{Vol}(W_4(1))$
and $\mathrm{Vol}(W_2(1))$.
(d) Find $\mathrm{Vol}(W_6(1))$, the volume of the 6-dimensional closed unit ball.

Part V

Integrals of vector fields

Chapter 8

Vector fields

Now that we know what it means to differentiate vector-valued functions of more than one variable,
we turn to integrating them. The functions that we integrate, however, are of a special type. The
integrals we consider are rather specialized, too. The functions are called vector fields, and this
short chapter is devoted to introducing them. The integration begins in the next chapter.

8.1 Examples of vector fields


The domain of a vector field F is a subset U of Rn . The values of F are in Rn as well, but we don’t
think so much of F as transforming U into its image. Rather, for a point x of U , we visualize the
value F(x) as an arrow translated to start at x.
For example, F might represent a force exerted on an object, and the force might depend on
where the object is. Then F(x) could represent the force exerted if the object is at the point x. Or
some medium might be flowing through U , and F(x) could represent the velocity of the medium
at x.
By taking a sample of points throughout U and drawing the corresponding arrows based at
those points, one can see how some vector quantity acts on U . For instance, with the aid of the
arrows, one might be able to visualize the path an object would follow under the influence of a
force or a flowing medium.
With this prelude, the actual definition is quite simple.

Definition. Let U be a subset of Rn . A function F : U → Rn is called a vector field on U .

A distinctive feature of vector fields is that both the elements x of the domain and their values
F(x) are in Rn , as opposed to a function in general from Rn to Rm .

Example 8.1. Let F : R2 → R2 be given by F(x, y) = (1, 0).


This vector field assigns the vector (1, 0) = i to every point of the plane. To visualize the vector
field, we draw the vector (1, 0), or rather a translated copy of it, at a representative collection of
points. That is, at each such point, we draw an arrow of length 1 pointing to the right. This is a
constant vector field. It is sketched on the left of Figure 8.1.

As the figure shows, the sample vectors often end up running into each other, making the picture
a little muddled. As a result, we usually depict a vector field by drawing a constant positive scalar
multiple of the vectors F(x), where the scalar factor is chosen to improve the readability of the
picture. For the vector field in the last example, this is done on the right of the figure. The lengths
of the vectors are scaled down, but the arrows still point in the correct direction.


Figure 8.1: The constant vector field F(x, y) = (1, 0) = i literally (left) and scaled down (right)

Example 8.2. Let F : R2 → R2 be given by F(x, y) = (−y, x).


For example, F(1, 0) = (−0, 1) = (0, 1) and F(0, 1) = (−1, 0), as in Figure 8.2.

Figure 8.2: The vector field F(x, y) = (−y, x) at the points (1, 0) and (0, 1)

We can get some qualitative information of how F acts by noting, for example, that in the
first quadrant, where both x and y are positive, F(x, y) = (−y, x) has a negative first component
and positive second component, i.e., F(+, +) = (−, +), so the arrow points to the left and up.
Similarly, in the second quadrant F(−, +) = (−, −) (to the left and down), in the third quadrant
F(−, −) = (+, −) (to the right and down), and in the fourth F(+, −) = (+, +) (to the right and
up).

Figure 8.3: The vector field F(x, y) = (−y, x)

In fact, we can identify both the length and direction of F(x, y) more precisely:
• $\|\mathbf{F}(x, y)\| = \sqrt{(-y)^2 + x^2} = \|(x, y)\|$,
• F(x, y) · (x, y) = (−y, x) · (x, y) = −yx + xy = 0.

In other words, the length of F(x, y) equals the distance of (x, y) from the origin, and its direction
is orthogonal to (x, y) in the counterclockwise direction. After drawing in a sample of arrows, one
can imagine F as describing a circular counterclockwise flow, or vortex, around the origin, growing
in magnitude as one moves further out, as in Figure 8.3.
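The two properties of Example 8.2 can also be spot-checked numerically. The following is a minimal sketch, assuming a handful of arbitrarily chosen sample points (one in each quadrant); the helper names `norm` and `dot` are ours and are introduced only for illustration.

```python
import math

def F(x, y):
    """The circulating vector field of Example 8.2."""
    return (-y, x)

def norm(v):
    return math.hypot(v[0], v[1])

def dot(v, w):
    return v[0] * w[0] + v[1] * w[1]

# Sample points chosen arbitrarily, one in each quadrant.
points = [(1.0, 2.0), (-3.0, 0.5), (-1.5, -2.5), (4.0, -1.0)]

checks = []
for p in points:
    v = F(*p)
    same_length = math.isclose(norm(v), norm(p))             # ||F(x, y)|| = ||(x, y)||
    orthogonal = math.isclose(dot(v, p), 0.0, abs_tol=1e-12)  # F(x, y) . (x, y) = 0
    # Counterclockwise turn: the determinant of the pair (p, F(p)) is positive.
    counterclockwise = p[0] * v[1] - p[1] * v[0] > 0
    checks.append((same_length, orthogonal, counterclockwise))

print(checks)
```

Every entry of `checks` should consist of three true values, in agreement with the bullet points above.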
Example 8.3. Let $\mathbf{W} : \mathbb{R}^2 - \{(0, 0)\} \to \mathbb{R}^2$ be given by $\mathbf{W}(x, y) = \bigl(-\frac{y}{x^2 + y^2}, \frac{x}{x^2 + y^2}\bigr)$, where $(x, y) \ne (0, 0)$.

This is a nonconstant positive scalar multiple of the previous example. The scalar factor is $\frac{1}{x^2 + y^2}$. Thus the vectors point in the same direction as before, but now:

\[ \|\mathbf{W}(x, y)\| = \frac{1}{x^2 + y^2}\,\|(x, y)\| = \frac{\|(x, y)\|}{\|(x, y)\|^2} = \frac{1}{\|(x, y)\|}. \]

Hence the vector field also circulates about the origin, but its length is inversely proportional to the
distance from the origin. See Figure 8.4. We shall see later that this vector field has some notable
features.
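As a rough numerical illustration of Example 8.3, the sketch below compares W with the field F(x, y) = (−y, x) at a few sample points. The sample points are our own arbitrary choices, and the code is only a sanity check, not part of the theory.

```python
import math

def F(x, y):
    """The vector field of Example 8.2."""
    return (-y, x)

def W(x, y):
    """The vector field of Example 8.3, undefined at the origin."""
    r2 = x * x + y * y
    if r2 == 0:
        raise ValueError("W is not defined at the origin")
    return (-y / r2, x / r2)

samples = [(1.0, 0.0), (0.0, 2.0), (-3.0, 4.0), (0.5, -0.5)]  # arbitrary choices

results = []
for (x, y) in samples:
    r = math.hypot(x, y)
    w, f = W(x, y), F(x, y)
    # The length of W should be the reciprocal of the distance to the origin.
    length_ok = math.isclose(math.hypot(w[0], w[1]), 1.0 / r)
    # W is a positive multiple of F when the two are parallel (zero determinant)
    # and have a positive dot product.
    det = w[0] * f[1] - w[1] * f[0]
    same_direction = math.isclose(det, 0.0, abs_tol=1e-12) and (w[0] * f[0] + w[1] * f[1] > 0)
    results.append(length_ok and same_direction)

print(results)
```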

Figure 8.4: The vector field $\mathbf{W}(x, y) = \bigl(-\frac{y}{x^2 + y^2}, \frac{x}{x^2 + y^2}\bigr)$

Example 8.4 (The inverse square field). An important vector field G from classical physics is
characterized by the following properties.

• G is defined at every point of R3 except the origin.

• The length of G is inversely proportional to the square of the distance to the origin. In other words, $\|\mathbf{G}(x, y, z)\| = \frac{c}{\|(x, y, z)\|^2}$ for some positive constant $c$.

• The direction of G(x, y, z) is from (x, y, z) to the origin. That is, G(x, y, z) is a negative
scalar multiple of (x, y, z).

See Figure 8.5. For instance, in physics, the gravitational force due to a mass at the origin and the
electrostatic force due to a charged particle at the origin are both modeled by such a field.


Figure 8.5: An inverse square vector field

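To find a formula for G, we write $\mathbf{G}(x, y, z) = \frac{c}{\|(x, y, z)\|^2}\,\mathbf{u}$, where $\mathbf{u}$ is the unit vector in the direction from $(x, y, z)$ to $(0, 0, 0)$, that is, $\mathbf{u} = -\frac{(x, y, z)}{\|(x, y, z)\|}$. Hence $\mathbf{G}(x, y, z) = -\frac{c}{\|(x, y, z)\|^2} \cdot \frac{(x, y, z)}{\|(x, y, z)\|} = -\frac{c}{\|(x, y, z)\|^3}\,(x, y, z)$.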

This is called an inverse square field. To summarize, it is the vector field $\mathbf{G} : \mathbb{R}^3 - \{(0, 0, 0)\} \to \mathbb{R}^3$ given variously by:

\[
\begin{aligned}
\mathbf{G}(x, y, z) &= -\frac{c}{\|(x, y, z)\|^3}\,(x, y, z) \\
&= -\frac{c}{(x^2 + y^2 + z^2)^{3/2}}\,(x, y, z) \\
&= \Bigl(-\frac{cx}{(x^2 + y^2 + z^2)^{3/2}},\ -\frac{cy}{(x^2 + y^2 + z^2)^{3/2}},\ -\frac{cz}{(x^2 + y^2 + z^2)^{3/2}}\Bigr),
\end{aligned}
\]

for some positive constant $c$.
In the future, we take the constant of proportionality to be c = 1 and refer to the resulting
vector field G as “the” inverse square field.
These examples shall often serve as test cases for the concepts we are about to develop.
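For readers who like to experiment, here is a small sketch of the inverse square field as a function, assuming the formula derived above. The default value c = 1 matches our convention; the sample point (1, 2, 2), which lies at distance 3 from the origin, is an arbitrary choice.

```python
import math

def G(x, y, z, c=1.0):
    """Inverse square field: G = -c (x, y, z) / ||(x, y, z)||^3, undefined at the origin."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0:
        raise ValueError("G is not defined at the origin")
    k = -c / r**3
    return (k * x, k * y, k * z)

def length(v):
    return math.sqrt(sum(vi * vi for vi in v))

p = (1.0, 2.0, 2.0)           # distance 3 from the origin
q = (2.0, 4.0, 4.0)           # distance 6 from the origin

len_p = length(G(*p))         # should be c / 3^2
len_q = length(G(*q))         # should be c / 6^2, a quarter of len_p

# G(p) points toward the origin: its dot product with p is negative.
points_inward = sum(gi * pi for gi, pi in zip(G(*p), p)) < 0
print(len_p, len_q, points_inward)
```

Doubling the distance to the origin divides the length by four, which is the inverse square behavior in the first two bullet points.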

8.2 Exercises for Chapter 8


In Exercises 1.1–1.6, sketch the given vector field F. To get started, it may be helpful to get an
idea of the general direction of the arrows F(x, y) at points on the x and y-axes and in each of the
four quadrants.

1.1. F(x, y) = (x, y)

1.2. F(x, y) = (2x, y)

1.3. F(x, y) = (y, −x)

1.4. F(x, y) = (−2y, x)

1.5. F(x, y) = (y, x)

1.6. F(x, y) = (−x, y)



1.7. (a) Find a smooth vector field F on $\mathbb{R}^2$ such that, at each point (x, y), F(x, y) is a unit vector normal to the parabola of the form $y = x^2 + c$ that passes through that point.
(b) Find a smooth vector field F on $\mathbb{R}^2$ such that, at each point (x, y), F(x, y) is a unit vector tangent to the parabola of the form $y = x^2 + c$ that passes through that point.

1.8. (a) Find a smooth vector field F on $\mathbb{R}^2$ such that $\|\mathbf{F}(x, y)\| = \|(x, y)\|$ and, at each point (x, y) other than the origin, F(x, y) is a vector normal to the curve of the form $xy = c$ that passes through that point.
(b) Find a smooth vector field F on $\mathbb{R}^2$ such that $\|\mathbf{F}(x, y)\| = \|(x, y)\|$ and, at each point (x, y) other than the origin, F(x, y) is a vector tangent to the curve of the form $xy = c$ that passes through that point.

Let F be a vector field on an open set U in $\mathbb{R}^n$. If we think of F as the velocity of a medium flowing through U, then an object dropped into U will follow a path along which the velocity at each point is the value of the vector field at the point. More precisely, a smooth path $\alpha : I \to U$ defined on an open interval I is called an integral path of F if $\alpha'(t) = \mathbf{F}(\alpha(t))$ for all t in I. The curve traced out by $\alpha$ is called an integral curve.
For example, let $\mathbf{F}(x, y) = (-y, x)$ be the circulating vector field from Example 8.2. If $a > 0$, let $\alpha : \mathbb{R} \to \mathbb{R}^2$ be the parametrization of a circle of radius $a$ given by:

\[ \alpha(t) = (a\cos t, a\sin t). \tag{8.1} \]

Then $\alpha'(t) = (-a\sin t, a\cos t)$, and $\mathbf{F}(\alpha(t)) = \mathbf{F}(a\cos t, a\sin t) = (-a\sin t, a\cos t)$. Since the two are equal, $\alpha$ is an integral path of F. The integral curves are circles centered at the origin. This reinforces our sense that F describes counterclockwise circular flow around the origin.
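One way to build intuition for integral paths is to follow the vector field numerically. The sketch below is an illustration only: it uses the classical fourth-order Runge-Kutta method, a standard numerical technique that is not developed in this book, to follow F(x, y) = (−y, x) from the point (a, 0) for a time of π/2. The radius a = 2 and the step count are arbitrary choices. By (8.1), the exact integral path should arrive at (0, a).

```python
import math

def F(x, y):
    """The circulating vector field of Example 8.2."""
    return (-y, x)

def rk4_step(field, p, h):
    """One classical Runge-Kutta step for the system p'(t) = field(p(t))."""
    x, y = p
    k1 = field(x, y)
    k2 = field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
    k3 = field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
    k4 = field(x + h * k3[0], y + h * k3[1])
    return (x + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

a = 2.0                      # radius of the circle (arbitrary)
steps = 1000                 # number of time steps (arbitrary)
h = (math.pi / 2) / steps    # total time pi/2, a quarter turn

p = (a, 0.0)                 # start at alpha(0) = (a, 0)
for _ in range(steps):
    p = rk4_step(F, p, h)

radius = math.hypot(p[0], p[1])
print(p, radius)             # p should be close to (0, a), radius close to a
```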

1.9. Let $\mathbf{F}(x, y) = (-2y, x)$. (This vector field also appears in Exercise 1.4.)

(a) If $a$ is a real number, show that $\alpha(t) = \bigl(\sqrt{2}\,a\cos(\sqrt{2}\,t),\ a\sin(\sqrt{2}\,t)\bigr)$ is an integral path of F.
(b) Show that the integral curves satisfy the equation $\frac{x^2}{2} + y^2 = a^2$, and sketch a representative sample of them. Indicate the direction in which the curves are traversed by the integral paths.

1.10. Let $\mathbf{F}(x, y) = (y, x)$. (This vector field also appears in Exercise 1.5.)

(a) If $a$ is a real number, show that $\alpha(t) = \bigl(a(e^t + e^{-t}),\ a(e^t - e^{-t})\bigr)$ and $\alpha(t) = \bigl(a(e^t - e^{-t}),\ a(e^t + e^{-t})\bigr)$ are integral paths of F.
(b) Describe the integral curves, and sketch a representative sample of them. Indicate the direction in which the curves are traversed by the integral paths. (Hint: To find a relationship between x and y on the integral paths, compute $x(t)^2$ and $y(t)^2$.)
1.11. Find integral paths for the vector field $\mathbf{W}(x, y) = \bigl(-\frac{y}{x^2 + y^2}, \frac{x}{x^2 + y^2}\bigr)$, where $(x, y) \ne (0, 0)$, from Example 8.3 by modifying the integral paths for $\mathbf{F}(x, y) = (-y, x)$ given in equation (8.1).

Finding explicit formulas for the integral paths of a vector field may be difficult in practice. To simplify the discussion, we restrict ourselves to two-dimensional vector fields $\mathbf{F}(x, y) = (F_1(x, y), F_2(x, y))$ defined on open sets of $\mathbb{R}^2$. If $\alpha(t) = (x(t), y(t))$ is a path, then the condition $\alpha'(t) = \mathbf{F}(\alpha(t))$ that it must satisfy to be an integral path can be written in terms of coordinates as $\bigl(\frac{dx}{dt}, \frac{dy}{dt}\bigr) = \bigl(F_1(x(t), y(t)),\ F_2(x(t), y(t))\bigr)$. In other words, the components $x(t)$ and $y(t)$ of $\alpha(t)$ must satisfy the differential equations:

\[
\begin{cases}
\dfrac{dx}{dt} = F_1(x, y) \\[1ex]
\dfrac{dy}{dt} = F_2(x, y).
\end{cases}
\tag{8.2}
\]
Let (x0 , y0 ) be a point of R2 . For the vector fields in Exercises 1.12–1.13, (a) solve the differential
equations (8.2) to find the integral path α(t) = (x(t), y(t)) of F that satisfies the condition α(0) =
(x0 , y0 ) and (b) describe the integral curves of F geometrically and sketch a representative sample
of them, including the direction in which they are traversed by the integral paths. (Note that these
vector fields also appear in Exercises 1.1 and 1.6, respectively.)

1.12. F(x, y) = (x, y)

1.13. F(x, y) = (−x, y)


Chapter 9

Line integrals

We begin integrating vector fields in the case that the domain of integration is a curve. The integrals are called line integrals. By comparison, we learned in Section 2.3 about the integral over a curve C of a real-valued function f. By definition, $\int_C f\,ds = \int_a^b f(\alpha(t))\,\|\alpha'(t)\|\,dt$, where $\alpha : [a, b] \to \mathbb{R}^n$ is a smooth parametrization of C. This is called the integral with respect to arclength.


The integral of a vector field is a new concept—for example, we don’t simply integrate each of
the real-valued component functions of the vector field independently using the integral with respect
to arclength. In the grand scheme of things, line integrals are a richer subject than integrals with
respect to arclength. In part, this is because of the ways in which the line integral can be interpreted
and applied, many of which originate in physics (for instance, see Exercise 3.19 at the end of the
chapter). But in addition, on a more abstract level, line integrals provide us with a first peek into
a line of unifying results that relate various types of integrals to one another. By the end of the
book, we shall be in a position to see that these connections run quite deep.
We open the chapter by defining the line integral and discussing what it represents. There can
be more than one option for how to compute a given line integral, and we identify the circumstances
under which the different approaches apply. These computational techniques, however, follow from
theoretical principles that are interesting in their own right. For instance, for vector fields in the
plane, some line integrals can be expressed as double integrals. This result is known as Green’s
theorem. Throughout the chapter, the discussion reveals an active interplay between geometric
features of the curve, the vector field, and, on occasion, the geometry of the domain on which the
vector field is defined as well.

9.1 Definitions and examples


Let F : U → Rn be a vector field on an open set U in Rn , and let C be a curve in U . Roughly
speaking, the integral of F over C is meant to represent the “effectiveness” of F in contributing
to motion along C. By this, we mean the degree of alignment of F with the direction of motion.
For instance, suppose that F is a force field and that α(t) is the position of a moving object at
time t. Then the line integral of F over C is called the work done by F. If F points in the same
direction as the direction of motion, then F helps move the object along. If it points in the opposite
direction, then it impedes the motion. If it is orthogonal, it neither helps nor hinders.
In general, we measure the contribution of F to the motion at each point of C by finding the scalar component $F_{\tan}$ of F in the direction tangent to the curve. See Figure 9.1. Then, we integrate with respect to arclength, obtaining $\int_C F_{\tan}\,ds$.
Figure 9.1: The component $F_{\tan}$ of a vector field in the tangent direction

Note that $F_{\tan} = \|\mathbf{F}\|\cos\theta$, where $\theta$ is the angle between F and the tangent direction. Hence if $\mathbf{T} = \frac{\alpha'}{\|\alpha'\|}$ is the unit tangent vector, then:

\[ F_{\tan} = \|\mathbf{F}\|\cos\theta = \|\mathbf{F}\|\,\|\mathbf{T}\|\cos\theta = \mathbf{F} \cdot \mathbf{T}. \]

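By the definition of the integral with respect to arclength with $f = F_{\tan}$, we obtain:

\[
\begin{aligned}
\int_C F_{\tan}\,ds = \int_C \mathbf{F} \cdot \mathbf{T}\,ds &= \int_a^b \bigl(\mathbf{F}(\alpha(t)) \cdot \mathbf{T}(t)\bigr)\,\|\alpha'(t)\|\,dt \\
&= \int_a^b \Bigl(\mathbf{F}(\alpha(t)) \cdot \frac{\alpha'(t)}{\|\alpha'(t)\|}\Bigr)\,\|\alpha'(t)\|\,dt \\
&= \int_a^b \mathbf{F}(\alpha(t)) \cdot \alpha'(t)\,dt.
\end{aligned}
\]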

This motivates the following definition.

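Definition. Let U be an open set in $\mathbb{R}^n$, and let $\mathbf{F} : U \to \mathbb{R}^n$ be a continuous vector field on U. If $\alpha : [a, b] \to U$ is a smooth path in U, then the line integral, denoted $\int_\alpha \mathbf{F} \cdot d\mathbf{s}$, is defined by:

\[ \int_\alpha \mathbf{F} \cdot d\mathbf{s} = \int_a^b \mathbf{F}(\alpha(t)) \cdot \alpha'(t)\,dt. \]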

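Written out in coordinates, say as $\mathbf{F} = (F_1, F_2, \ldots, F_n)$ and $\alpha = (x_1, x_2, \ldots, x_n)$, the definition becomes:

\[
\begin{aligned}
\int_\alpha \mathbf{F} \cdot d\mathbf{s} &= \int_a^b (F_1 \circ \alpha, F_2 \circ \alpha, \ldots, F_n \circ \alpha) \cdot \Bigl(\frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt}\Bigr)\,dt \\
&= \int_a^b \Bigl((F_1 \circ \alpha)\frac{dx_1}{dt} + (F_2 \circ \alpha)\frac{dx_2}{dt} + \cdots + (F_n \circ \alpha)\frac{dx_n}{dt}\Bigr)\,dt.
\end{aligned}
\tag{9.1}
\]

This suggests an alternative notation for the line integral:

\[ \int_\alpha \mathbf{F} \cdot d\mathbf{s} = \int_\alpha F_1\,dx_1 + F_2\,dx_2 + \cdots + F_n\,dx_n. \tag{9.2} \]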

The integrand $F_1\,dx_1 + F_2\,dx_2 + \cdots + F_n\,dx_n$ is called a differential form, or, more precisely, a differential 1-form. This notation is very convenient, because it is easy to remember how to get from (9.2) to (9.1) in order to evaluate the integral. One simply uses the parametrization to substitute for $x_1, x_2, \ldots, x_n$ in terms of the parameter t wherever they appear in the integrand. This includes the substitution $dx_i = \frac{dx_i}{dt}\,dt$ for $i = 1, 2, \ldots, n$. We use differential form notation for line integrals whenever possible.
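The recipe in (9.1) translates directly into a numerical procedure: evaluate F(α(t)) · α′(t) at sample values of t and add up. The following is a minimal sketch, assuming that the path and its derivative are supplied as separate functions and that the composite midpoint rule is accurate enough for our purposes. The function name `line_integral` and the test path are our own, not taken from the text.

```python
def line_integral(F, alpha, dalpha, a, b, n=2000):
    """Approximate the integral over [a, b] of F(alpha(t)) . alpha'(t) dt
    using the composite midpoint rule with n subintervals."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h              # midpoint of the k-th subinterval
        field_value = F(*alpha(t))         # F(alpha(t))
        velocity = dalpha(t)               # alpha'(t)
        total += sum(Fi * vi for Fi, vi in zip(field_value, velocity))
    return total * h

# Hypothetical test case (not from the text): F(x, y) = (-y, x) along
# alpha(t) = (t, t^2), 0 <= t <= 1. The integrand is
# -t^2 * 1 + t * 2t = t^2, so the exact value is 1/3.
F = lambda x, y: (-y, x)
alpha = lambda t: (t, t * t)
dalpha = lambda t: (1.0, 2.0 * t)

approx = line_integral(F, alpha, dalpha, 0.0, 1.0)
print(approx)
```

Because the components are combined with `zip`, the same function works for vector fields in any dimension.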



One final word about notation: we have chosen to write the line integral as an integral over
α rather than over the underlying curve C to acknowledge that the definition makes use of the
specific parametrization α. This is an important point, and we elaborate on it momentarily.
Just as in first-year calculus a function can be integrated over many intervals, so can a vector
field be integrated over many paths. The relation between the vector field and the path is reflected
in the value of the integral.

Example 9.1. Let F be the vector field on R2 given by F(x, y) = (−y, x). This is the vector field
from Chapter 8 that circulates around the origin, shown again in Figure 9.2.

Figure 9.2: The circulating vector field F(x, y) = (−y, x)

Integrate F over the following paths.

(a) $\alpha_1(t) = (\cos t, \sin t)$, $0 \le t \le 2\pi$
(b) $\alpha_2(t) = (t, t)$, $0 \le t \le 3$
(c) $\alpha_3(t) = (\cos t, \sin t)$, $0 \le t \le \pi/2$
(d) $\alpha_4(t) = (1 - t, t)$, $0 \le t \le 1$
(e) $\alpha_5(t) = (\cos^2 t, \sin^2 t)$, $0 \le t \le \pi/2$

In differential form notation, the integrals $\int_\alpha \mathbf{F} \cdot d\mathbf{s}$ we seek are written $\int_\alpha -y\,dx + x\,dy$.

Figure 9.3: The vector field F(x, y) = (−y, x) along the unit circle (left), the line y = x (middle),
and the line x + y = 1 (right)

(a) The path $\alpha_1$ traces out the unit circle counterclockwise. The direction of the vector field and the direction of motion are aligned at every point. See Figure 9.3 at left. As a result, $\mathbf{F} \cdot \mathbf{T}$ is always positive, and we expect the integral to be positive as well. In fact, for this parametrization, $x = \cos t$ and $y = \sin t$, so substituting in the definition gives:

\[
\begin{aligned}
\int_{\alpha_1} -y\,dx + x\,dy &= \int_0^{2\pi} \Bigl(-y(t)\frac{dx}{dt} + x(t)\frac{dy}{dt}\Bigr)\,dt \\
&= \int_0^{2\pi} \bigl(-\sin t \cdot (-\sin t) + \cos t \cdot \cos t\bigr)\,dt \\
&= \int_0^{2\pi} 1\,dt \\
&= t\,\Big|_0^{2\pi} \\
&= 2\pi.
\end{aligned}
\]

(b) The path $\alpha_2$ is a line segment radiating out from (0, 0) to (3, 3) along the line y = x. At every point of this path, the vector field is orthogonal to the direction of motion (Figure 9.3, middle). In other words, $\mathbf{F} \cdot \mathbf{T} = 0$ at every point, so we would expect the integral to be zero, too. To confirm this:

\[ \int_{\alpha_2} -y\,dx + x\,dy = \int_0^3 (-t \cdot 1 + t \cdot 1)\,dt = \int_0^3 0\,dt = 0. \]

(c) This again moves along the unit circle, but only along the arc from (1, 0) to (0, 1). Using the work from part (a):

\[ \int_{\alpha_3} -y\,dx + x\,dy = \int_0^{\pi/2} 1\,dt = t\,\Big|_0^{\pi/2} = \frac{\pi}{2}. \]
(d) The path $\alpha_4$ also goes from (1, 0) to (0, 1), but here the parametrization satisfies $x + y = (1 - t) + t = 1$ so the curve is a segment contained in the line $x + y = 1$. The vector field is neither perfectly aligned with the direction of motion nor perpendicular to it, but it does have a positive component in the direction of motion at every point (Figure 9.3, right). Thus we expect the line integral to be positive. Indeed:

\[ \int_{\alpha_4} -y\,dx + x\,dy = \int_0^1 \bigl(-t \cdot (-1) + (1 - t) \cdot 1\bigr)\,dt = \int_0^1 1\,dt = 1. \]

(e) $\alpha_5$ is another path from (1, 0) to (0, 1), and, again, $x + y = \cos^2 t + \sin^2 t = 1$. Thus the path traces out the same line segment as in part (d), though in a different way. The value of the line integral is:

\[
\begin{aligned}
\int_{\alpha_5} -y\,dx + x\,dy &= \int_0^{\pi/2} \bigl(-\sin^2 t \cdot 2\cos t \cdot (-\sin t) + \cos^2 t \cdot 2\sin t\cos t\bigr)\,dt \\
&= \int_0^{\pi/2} \bigl(2\sin^3 t\cos t + 2\cos^3 t\sin t\bigr)\,dt \\
&= \int_0^{\pi/2} 2\sin t\cos t\,(\sin^2 t + \cos^2 t)\,dt \\
&= \int_0^{\pi/2} 2\sin t\cos t\,dt \\
&= \sin^2 t\,\Big|_0^{\pi/2} \\
&= 1.
\end{aligned}
\]
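The five values found in Example 9.1 can be confirmed numerically. The sketch below assumes a simple midpoint-rule helper, `line_integral`, which is our own illustrative function and not something defined in the text; the derivatives of the paths are entered by hand.

```python
import math

def line_integral(F, alpha, dalpha, a, b, n=2000):
    """Midpoint-rule approximation of the integral of F(alpha(t)) . alpha'(t) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(Fi * vi for Fi, vi in zip(F(*alpha(t)), dalpha(t)))
    return total * h

def F(x, y):
    return (-y, x)

# Each entry: (path, derivative of path, left endpoint, right endpoint).
paths = {
    "alpha1": (lambda t: (math.cos(t), math.sin(t)),
               lambda t: (-math.sin(t), math.cos(t)), 0.0, 2 * math.pi),
    "alpha2": (lambda t: (t, t),
               lambda t: (1.0, 1.0), 0.0, 3.0),
    "alpha3": (lambda t: (math.cos(t), math.sin(t)),
               lambda t: (-math.sin(t), math.cos(t)), 0.0, math.pi / 2),
    "alpha4": (lambda t: (1 - t, t),
               lambda t: (-1.0, 1.0), 0.0, 1.0),
    "alpha5": (lambda t: (math.cos(t) ** 2, math.sin(t) ** 2),
               lambda t: (-2 * math.cos(t) * math.sin(t),
                          2 * math.sin(t) * math.cos(t)), 0.0, math.pi / 2),
}

values = {name: line_integral(F, *spec) for name, spec in paths.items()}
for name, value in values.items():
    print(name, value)
```

Up to numerical error, the output should match the exact values 2π, 0, π/2, 1, and 1 computed in parts (a) through (e).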

Note that each of the paths α3 , α4 , and α5 above goes from (1, 0) to (0, 1). The values of the
line integral are not all equal, however. Thus changing the path between the endpoints can change
the integral. On the other hand, α4 and α5 not only go between the same two points but also trace
out the same curve, just parametrized in different ways. The integrals over these two paths are
equal. Is this a coincidence?
In general, suppose that $\alpha : [a, b] \to U$ and $\beta : [c, d] \to U$ are smooth parametrizations of the same curve C. To help distinguish between them, we use t to denote the parameter of $\alpha$ and u for the parameter of $\beta$. It simplifies the argument somewhat to assume that $\alpha$ is one-to-one, except possibly at the endpoints, where $\alpha(a) = \alpha(b)$ is allowed, and likewise for $\beta$. This is not strictly necessary, though it has been true of all the examples of parametrizations we have encountered thus far.
We want to compare the integrals $\int_\alpha \mathbf{F} \cdot d\mathbf{s} = \int_a^b \mathbf{F}(\alpha(t)) \cdot \alpha'(t)\,dt$ and $\int_\beta \mathbf{F} \cdot d\mathbf{s} = \int_c^d \mathbf{F}(\beta(u)) \cdot \beta'(u)\,du$ obtained using the respective parametrizations. Consider first the case that $\alpha$ and $\beta$ trace out C in the same direction, that is, both start at a point $\mathbf{p} = \alpha(a) = \beta(c)$ and end at a point $\mathbf{q} = \alpha(b) = \beta(d)$, as in Figure 9.4.
Let $\mathbf{x}$ be a point of C other than the endpoints. Then $\mathbf{x} = \alpha(t)$ for some value of t and $\mathbf{x} = \beta(u)$ for some value of u. We write this as $t = g(u)$, that is, $g(u)$ is the value of t such that $\alpha(t) = \beta(u)$. We also set $g(c) = a$ and $g(d) = b$. Thus:

\[ \beta(u) = \alpha(g(u)) \quad \text{for all } u \text{ in } [c, d]. \tag{9.3} \]

For instance, in the case of the line segments $\alpha_4$ and $\alpha_5$ above, $\alpha_5(u) = (\cos^2 u, \sin^2 u) = (1 - \sin^2 u, \sin^2 u) = \alpha_4(\sin^2 u)$, so $g(u) = \sin^2 u$.

Figure 9.4: Two parametrizations α and β that trace out the same curve in the same direction:
g(c) = a and g(d) = b

By equation (9.3) and the chain rule:⁷

\[ \beta'(u) = \alpha'(g(u))\,g'(u). \tag{9.4} \]

Hence:

\[
\begin{aligned}
\int_\beta \mathbf{F} \cdot d\mathbf{s} &= \int_c^d \mathbf{F}(\beta(u)) \cdot \beta'(u)\,du && \text{(by definition)} \\
&= \int_c^d \mathbf{F}(\alpha(g(u))) \cdot \alpha'(g(u))\,g'(u)\,du && \text{(let } t = g(u),\ dt = g'(u)\,du\text{)} \\
&= \int_{g(c)}^{g(d)} \mathbf{F}(\alpha(t)) \cdot \alpha'(t)\,dt = \int_a^b \mathbf{F}(\alpha(t)) \cdot \alpha'(t)\,dt = \int_\alpha \mathbf{F} \cdot d\mathbf{s}.
\end{aligned}
\]

⁷ We assume here that g is smooth. Indeed, equation (9.3) may be the best way to define what it means in the first place for $\alpha$ and $\beta$ to parametrize the same curve C, namely, that there exists a smooth function $g : [c, d] \to [a, b]$ such that $\beta(u) = \alpha(g(u))$ and $g'(u) \ne 0$ for all u in $[c, d]$. Then, we say that $\alpha$ and $\beta$ traverse C in the same direction if $g' > 0$ and in opposite directions if $g' < 0$.


In other words, both parametrizations give the same value of the line integral.
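If the parametrizations traverse C in opposite directions so that, say, $\mathbf{p} = \alpha(a) = \beta(d)$ and $\mathbf{q} = \alpha(b) = \beta(c)$, then the only difference is that the endpoints are reversed. That is, $g(d) = a$ and $g(c) = b$, so:

\[
\begin{aligned}
\int_\beta \mathbf{F} \cdot d\mathbf{s} &= \int_{g(c)}^{g(d)} \mathbf{F}(\alpha(t)) \cdot \alpha'(t)\,dt && \text{(just as before, but \ldots)} \\
&= \int_b^a \mathbf{F}(\alpha(t)) \cdot \alpha'(t)\,dt \\
&= -\int_a^b \mathbf{F}(\alpha(t)) \cdot \alpha'(t)\,dt \\
&= -\int_\alpha \mathbf{F} \cdot d\mathbf{s}.
\end{aligned}
\tag{9.5}
\]

The integrals differ by a sign.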


We summarize this as follows.
Proposition 9.2. If $\alpha$ and $\beta$ are smooth parametrizations of the same curve C, then:

\[ \int_\beta \mathbf{F} \cdot d\mathbf{s} = \pm \int_\alpha \mathbf{F} \cdot d\mathbf{s}, \]

where the choice of sign depends on whether they traverse C in the same or opposite direction.
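Proposition 9.2 can be illustrated numerically with the segment from Example 9.1. In the sketch below, $\alpha_4$ and $\alpha_5$ traverse the segment in the same direction, while `beta_rev`, a reversed parametrization of our own choosing, goes from (0, 1) back to (1, 0). The helper `line_integral` is an illustrative midpoint-rule approximation, not a construction from the text.

```python
import math

def line_integral(F, alpha, dalpha, a, b, n=2000):
    """Midpoint-rule approximation of the integral of F(alpha(t)) . alpha'(t) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(Fi * vi for Fi, vi in zip(F(*alpha(t)), dalpha(t)))
    return total * h

def F(x, y):
    return (-y, x)

# alpha4 and alpha5 traverse the segment x + y = 1 from (1, 0) to (0, 1).
I4 = line_integral(F, lambda t: (1 - t, t), lambda t: (-1.0, 1.0), 0.0, 1.0)
I5 = line_integral(F,
                   lambda u: (math.cos(u) ** 2, math.sin(u) ** 2),
                   lambda u: (-2 * math.cos(u) * math.sin(u),
                              2 * math.sin(u) * math.cos(u)),
                   0.0, math.pi / 2)

# beta_rev(u) = (u, 1 - u) traverses the same segment from (0, 1) to (1, 0).
I_rev = line_integral(F, lambda u: (u, 1 - u), lambda u: (1.0, -1.0), 0.0, 1.0)

print(I4, I5, I_rev)   # I4 and I5 agree; I_rev has the opposite sign
```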
This allows us to define a line integral over a curve, as opposed to over a parametrization.
Definition. An oriented curve is a curve C with a specified direction of traversal. Given such a C, define:

\[ \int_C \mathbf{F} \cdot d\mathbf{s} = \int_\alpha \mathbf{F} \cdot d\mathbf{s}, \]

where $\alpha$ is any smooth parametrization that traverses C in the given direction. By the preceding proposition, any two such parametrizations give the same answer.
The point is that the line integral depends only on the underlying curve and the direction
in which it is traversed. This is critical in that it allows us to think of the line integral as an
integral over an oriented geometric set, independent of the parametrization that is used to trace
it out. This is an issue that we overlooked previously when we discussed integrals over curves
with respect to arclength and integrals over surfaces with respect to surface area. In both of those
cases, the integrals were defined using parametrizations, but we did not check that the answers
were independent of the particular parametrization used to trace out the curve or surface.
In the case of the integral with respect to arclength, the story is that $\int_\alpha f\,ds = \int_\beta f\,ds$ for any smooth parametrizations $\alpha$ and $\beta$ of C. The new twist is that the integral is not only independent of the parametrization but also of the direction in which it traverses C. The proof proceeds as before, but a difference arises because the definition $\int_\alpha f\,ds = \int_a^b f(\alpha(t))\,\|\alpha'(t)\|\,dt$ involves the norm $\|\alpha'(t)\|$, not $\alpha'(t)$ itself. This introduces a factor of $|g'(u)|$ after applying the chain rule as in equation (9.4) and taking norms. If $\alpha$ and $\beta$ traverse C in opposite directions, then $g'(u) < 0$, so $|g'(u)| = -g'(u)$. This minus sign is counteracted by the minus sign that appears in this case
when reversing the limits of integration in (9.5), so the two minus signs cancel. We leave the details
for the exercises (Exercise 1.12). The role of parametrizations in surface integrals is discussed in
Chapter 10.
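To see the contrast with line integrals concretely, the sketch below integrates a real-valued function with respect to arclength along the segment from (1, 0) to (0, 1), first in one direction and then in the other. The integrand f(x, y) = x and the helper name `arclength_integral` are our own hypothetical choices, made only for illustration.

```python
import math

def arclength_integral(f, alpha, dalpha, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f(alpha(t)) ||alpha'(t)|| over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        speed = math.sqrt(sum(v * v for v in dalpha(t)))   # ||alpha'(t)||
        total += f(*alpha(t)) * speed
    return total * h

f = lambda x, y: x   # hypothetical integrand, not taken from the text

# Forward: from (1, 0) to (0, 1). Backward: from (0, 1) to (1, 0).
forward = arclength_integral(f, lambda t: (1 - t, t), lambda t: (-1.0, 1.0), 0.0, 1.0)
backward = arclength_integral(f, lambda u: (u, 1 - u), lambda u: (1.0, -1.0), 0.0, 1.0)

print(forward, backward)   # both should be close to sqrt(2)/2
```

Reversing the direction leaves the value unchanged, whereas the line integral of a vector field over the same segment would change sign.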
Having cleared the air on that lingering matter, we now return to studying line integrals.
Example 9.3. Evaluate $\int_C x\,dx + 2y\,dy + 3z\,dz$ if C is the oriented curve consisting of three quarter-circular arcs on the unit sphere in $\mathbb{R}^3$ from (1, 0, 0) to (0, 1, 0) to (0, 0, 1) and then back to (1, 0, 0).
See Figure 9.5.

Figure 9.5: A curve consisting of three arcs C1 , C2 , and C3

The vector field being integrated here is given by $\mathbf{F}(x, y, z) = (x, 2y, 3z)$. Since C consists of three arcs $C_1$, $C_2$, $C_3$, we can imagine traversing each of them in succession and breaking up the integral as $\int_C \mathbf{F} \cdot d\mathbf{s} = \int_{C_1} \mathbf{F} \cdot d\mathbf{s} + \int_{C_2} \mathbf{F} \cdot d\mathbf{s} + \int_{C_3} \mathbf{F} \cdot d\mathbf{s}$. Moreover, we may parametrize each arc however we like as long as it is traversed in the proper direction. In this way, we can avoid having to parametrize all of C over a single parameter interval.
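For $C_1$, we use the usual parametrization of the unit circle in the xy-plane from (1, 0, 0) to (0, 1, 0): $\alpha_1(t) = (\cos t, \sin t, 0)$, $0 \le t \le \frac{\pi}{2}$.

\[
\begin{aligned}
\int_{C_1} x\,dx + 2y\,dy + 3z\,dz &= \int_0^{\pi/2} \bigl(\cos t \cdot (-\sin t) + 2\sin t \cdot \cos t + 3 \cdot 0 \cdot 0\bigr)\,dt \\
&= \int_0^{\pi/2} \sin t\cos t\,dt = \frac{1}{2}\sin^2 t\,\Big|_0^{\pi/2} = \frac{1}{2}.
\end{aligned}
\]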

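For $C_2$, we use a similar parametrization, only now in the yz-plane from (0, 1, 0) to (0, 0, 1): $\alpha_2(t) = (0, \cos t, \sin t)$, $0 \le t \le \frac{\pi}{2}$.

\[
\begin{aligned}
\int_{C_2} x\,dx + 2y\,dy + 3z\,dz &= \int_0^{\pi/2} \bigl(0 \cdot 0 + 2\cos t \cdot (-\sin t) + 3\sin t \cdot \cos t\bigr)\,dt \\
&= \int_0^{\pi/2} \sin t\cos t\,dt = \frac{1}{2}\sin^2 t\,\Big|_0^{\pi/2} = \frac{1}{2}.
\end{aligned}
\]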

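Lastly, for $C_3$ we travel in the xz-plane from (0, 0, 1) to (1, 0, 0): $\alpha_3(t) = (\sin t, 0, \cos t)$, $0 \le t \le \frac{\pi}{2}$.

\[
\begin{aligned}
\int_{C_3} x\,dx + 2y\,dy + 3z\,dz &= \int_0^{\pi/2} \bigl(\sin t \cdot \cos t + 2 \cdot 0 \cdot 0 + 3\cos t \cdot (-\sin t)\bigr)\,dt \\
&= \int_0^{\pi/2} -2\sin t\cos t\,dt = -\sin^2 t\,\Big|_0^{\pi/2} = -1.
\end{aligned}
\]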
Adding these three calculations gives $\int_C x\,dx + 2y\,dy + 3z\,dz = \frac{1}{2} + \frac{1}{2} - 1 = 0$.
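The three pieces of Example 9.3 can be checked numerically, arc by arc. This is a sketch under the same assumptions as before: `line_integral` is our own illustrative midpoint-rule helper, and the parametrizations are the ones used in the example.

```python
import math

def line_integral(F, alpha, dalpha, a, b, n=2000):
    """Midpoint-rule approximation of the integral of F(alpha(t)) . alpha'(t) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(Fi * vi for Fi, vi in zip(F(*alpha(t)), dalpha(t)))
    return total * h

def F(x, y, z):
    return (x, 2 * y, 3 * z)

# (parametrization, derivative) for the arcs C1, C2, C3, each over 0 <= t <= pi/2.
arcs = [
    (lambda t: (math.cos(t), math.sin(t), 0.0),
     lambda t: (-math.sin(t), math.cos(t), 0.0)),
    (lambda t: (0.0, math.cos(t), math.sin(t)),
     lambda t: (0.0, -math.sin(t), math.cos(t))),
    (lambda t: (math.sin(t), 0.0, math.cos(t)),
     lambda t: (math.cos(t), 0.0, -math.sin(t))),
]

pieces = [line_integral(F, alpha, dalpha, 0.0, math.pi / 2) for alpha, dalpha in arcs]
total = sum(pieces)
print(pieces, total)   # pieces close to 1/2, 1/2, -1; total close to 0
```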
This example illustrates a general circumstance that we should make explicit. In addition to integrating over smooth oriented curves, we can integrate over curves C that are concatenations of smooth oriented curves, that is, curves of the form $C = C_1 \cup C_2 \cup \cdots \cup C_k$, where $C_1, C_2, \ldots, C_k$ are traversed in succession and each of them has a smooth parametrization. Such a curve C is called piecewise smooth. Under these conditions, we define:

\[ \int_C \mathbf{F} \cdot d\mathbf{s} = \int_{C_1} \mathbf{F} \cdot d\mathbf{s} + \int_{C_2} \mathbf{F} \cdot d\mathbf{s} + \cdots + \int_{C_k} \mathbf{F} \cdot d\mathbf{s}. \]

This allows us to deal with curves that have corners, as in the example, without worrying about
whether the entire curve can be parametrized in a smooth way.

9.2 Another word about substitution


Before moving on, we comment on the approach we took to defining the line integral. A curve is
one-dimensional in the sense that it can be traced out using one parameter. Thus we might think
of an integral over a curve as essentially a one-variable integral. The one-variable integrals that we
are used to are those from first-year calculus, integrals over intervals in the real line.
The integral of a vector field F in Rn over a curve C is a new type of integral, however. F itself
is a function of n variables x1 , x2 , . . . , xn , but, along C, each of these variables can be expressed
in terms of the parameter t. This is exactly the information provided by the parametrization
α(t) = (x1 (t), x2 (t), . . . , xn (t)), a ≤ t ≤ b. So the idea is that substituting for x1 , x2 , . . . , xn in
terms of t converts an integral over C to an integral over the interval I = [a, b], where things are
familiar.
This fits into a typical pattern. Whereas the parametrization α transforms the interval I into
the curve C, substitution has the effect of pulling back an integral over C to an integral over I.
For instance, after substitution, the component function Fi (x1 , x2 , . . . , xn ) of F(x1 , x2 , . . . , xn ) is
converted to the function Fi (x1 (t), x2 (t), . . . , xn (t)) = Fi (α(t)). We say that α pulls back Fi to
Fi ◦ α and write:
α∗ (Fi ) = Fi ◦ α.
Similarly, the expression $dx_i$ is pulled back to $\frac{dx_i}{dt}\,dt$, and we write:

\[ \alpha^*(dx_i) = \frac{dx_i}{dt}\,dt. \]

Actually, the $x_i$'s on each side of this equation mean slightly different things (coordinate label vs. function of t) and it would be better to write $\alpha^*(dx_i) = \frac{d\alpha_i}{dt}\,dt$, but hopefully our cavalier attitude will not lead us astray.


For a line integral, by pulling back the individual terms, the entire integrand $F_1\,dx_1 + F_2\,dx_2 + \cdots + F_n\,dx_n$ in terms of $x_1, x_2, \ldots, x_n$ is pulled back to the expression:

\[ \alpha^*(F_1\,dx_1 + F_2\,dx_2 + \cdots + F_n\,dx_n) = \Bigl((F_1 \circ \alpha)\frac{dx_1}{dt} + (F_2 \circ \alpha)\frac{dx_2}{dt} + \cdots + (F_n \circ \alpha)\frac{dx_n}{dt}\Bigr)\,dt \]

in terms of t. If we denote $F_1\,dx_1 + F_2\,dx_2 + \cdots + F_n\,dx_n$ by $\omega$, then the definition of the line integral reads:

\[ \int_{\alpha(I)} \omega = \int_I \alpha^*(\omega). \tag{9.6} \]

This looks a lot like the change of variables theorem. For instance, compare with equation (7.3) of Chapter 7. We verified that the value of the integral is independent of the parametrization $\alpha$ so that it is well-defined as an integral over the oriented curve $C = \alpha(I)$. The chain rule played an essential, though easily missed, role in the verification. See equation (9.4).
This approach recurs more generally. In order to define a new type of integral, we use substitu-
tion to pull the integral back to a familiar situation. This is done in a way that allows the change
of variables theorem and the chain rule to conspire to make the process legitimate.

9.3 Conservative fields


If C is a smooth oriented curve in an open set U of $\mathbb{R}^n$ from a point $\mathbf{p}$ to a point $\mathbf{q}$ as in Figure 9.6, the value of the line integral $\int_C \mathbf{F} \cdot d\mathbf{s}$ does not depend on how C is parametrized as long as it is traversed in the correct direction. We have seen, however, that it does depend on the actual curve C: choosing a different curve from $\mathbf{p}$ to $\mathbf{q}$ can change the integral. Are there situations where even changing the curve doesn't matter?

Figure 9.6: An oriented curve C from p to q


By definition, $\int_C \mathbf{F} \cdot d\mathbf{s} = \int_a^b \mathbf{F}(\alpha(t)) \cdot \alpha'(t)\,dt$, where $\alpha : [a, b] \to U$ is a smooth parametrization that traces out C with the proper orientation. Now, suppose (and this is a big assumption) that F is the gradient of a smooth real-valued function f, that is, $\mathbf{F} = \nabla f$, or, in terms of components, $(F_1, F_2, \ldots, F_n) = \bigl(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\bigr)$. Then $\mathbf{F}(\alpha(t)) \cdot \alpha'(t) = \nabla f(\alpha(t)) \cdot \alpha'(t) = (f \circ \alpha)'(t)$, where the last equality is the Little Chain Rule. In other words, the integrand in the definition of the line integral is a derivative. So:

\[ \int_C \mathbf{F} \cdot d\mathbf{s} = \int_a^b (f \circ \alpha)'(t)\,dt = (f \circ \alpha)(t)\,\Big|_a^b = f(\alpha(b)) - f(\alpha(a)) = f(\mathbf{q}) - f(\mathbf{p}). \]
C a

This depends only on the endpoints p and q and not at all on the curve that connects them.
In order to apply the Little Chain Rule, this argument requires that C have a smooth parametri-
zation. If C is piecewise smooth instead, say C = C1 ∪ C2 ∪ · · · ∪ Ck , where each Ci is smooth and
C1 goes from p to p1 , C2 from p1 to p2 , and so on until Ck goes from pk−1 to q, then applying the
previous result to the pieces gives a telescoping sum in which the intermediate endpoints appear
twice with opposite signs:
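\[
\begin{aligned}
\int_C \mathbf{F} \cdot d\mathbf{s} &= \int_{C_1} \mathbf{F} \cdot d\mathbf{s} + \int_{C_2} \mathbf{F} \cdot d\mathbf{s} + \cdots + \int_{C_k} \mathbf{F} \cdot d\mathbf{s} \\
&= \bigl(f(\mathbf{p}_1) - f(\mathbf{p})\bigr) + \bigl(f(\mathbf{p}_2) - f(\mathbf{p}_1)\bigr) + \cdots + \bigl(f(\mathbf{q}) - f(\mathbf{p}_{k-1})\bigr) \\
&= -f(\mathbf{p}) + f(\mathbf{q}) \\
&= f(\mathbf{q}) - f(\mathbf{p}).
\end{aligned}
\]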
In other words, in the end, the same result holds, and again only p and q matter.
Definition. Let U be an open set in Rn . A vector field F : U → Rn is called a conservative, or
a gradient, field on U if there exists a smooth real-valued function f : U → R such that F = ∇f .
Such an f is called a potential function for F.


We have proven the following.


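Theorem 9.4 (Conservative vector field theorem). If F is a conservative vector field on U with potential function f, then:

\[ \int_C \mathbf{F} \cdot d\mathbf{s} = f(\mathbf{q}) - f(\mathbf{p}) \]

for any piecewise smooth oriented curve C in U from $\mathbf{p}$ to $\mathbf{q}$.
A couple of other ways to write this are:

\[ \int_C \nabla f \cdot d\mathbf{s} = f(\mathbf{q}) - f(\mathbf{p}) \]

\[ \text{or} \quad \int_C \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n = f(\mathbf{q}) - f(\mathbf{p}). \]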
We shall give some examples to illustrate the theorem shortly, but first we present an immediate
general consequence that also helps explain in part why the word conservative is used. (See Exercise
3.19 for another reason.)
Definition. A curve C is called closed if it starts and ends at the same point. See Figure 9.7 for
some examples.

Figure 9.7: Examples of closed curves: an ellipse (left) and a figure eight (middle) in R2 and a knot
in R3 (right)

Corollary 9.5. If F is conservative on U , then for any piecewise smooth oriented closed curve C
in U :
\[
\int_C \mathbf{F} \cdot d\mathbf{s} = 0.
\]

Example 9.6. Let F be the vector field on R3 given by F(x, y, z) = (x, 2y, 3z). Is F a conservative
field?
We look to see if there is a real-valued function f such that ∇f = F, that is, $\frac{\partial f}{\partial x} = x$, $\frac{\partial f}{\partial y} = 2y$,
and $\frac{\partial f}{\partial z} = 3z$. By inspection, it’s easy to see that $f(x, y, z) = \frac{1}{2}x^2 + y^2 + \frac{3}{2}z^2$ works. Thus F is
conservative with f as potential function.
It follows that $\int_C x\,dx + 2y\,dy + 3z\,dz = 0$ for any piecewise smooth oriented closed curve C in
R3 . This explains the answer of 0 that we obtained in Example 9.3 when we integrated F around
the closed curve consisting of three circular arcs (Figure 9.5).
Similarly, if C is any curve from (1, 0, 0) to (0, 1, 0), such as the arc C1 in the xy-plane of
Example 9.3, then by the conservative vector field theorem:
\[
\int_C x\,dx + 2y\,dy + 3z\,dz = f(0, 1, 0) - f(1, 0, 0) = (0 + 1 + 0) - \bigl(\tfrac{1}{2} + 0 + 0\bigr) = \tfrac{1}{2}.
\]
Again, this agrees with what we got before. More importantly, it is a much simpler way to evaluate
the integral.
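As a numerical sanity check on the path independence in Example 9.6, the following Python sketch approximates the line integral along three different curves from (1, 0, 0) to (0, 1, 0). The helper name `line_integral`, the midpoint rule, the finite-difference velocity, and the two extra paths (a line segment and a "wiggly" path) are illustrative assumptions and are not taken from the text.

```python
import math

def line_integral(F, alpha, a, b, n=20000):
    """Midpoint-rule approximation of the integral of F . ds along alpha(t), a <= t <= b."""
    h = (b - a) / n
    eps = 1e-6  # step for the central-difference approximation of alpha'(t)
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        ahead, behind = alpha(t + eps), alpha(t - eps)
        velocity = [(u - v) / (2 * eps) for u, v in zip(ahead, behind)]
        total += sum(Fi * vi for Fi, vi in zip(F(*alpha(t)), velocity)) * h
    return total

def F(x, y, z):
    # The vector field of Example 9.6.
    return (x, 2 * y, 3 * z)

def f(x, y, z):
    # The potential function found by inspection in Example 9.6.
    return 0.5 * x**2 + y**2 + 1.5 * z**2

# Three different curves from p = (1, 0, 0) to q = (0, 1, 0).
paths = {
    "quarter circle": (lambda t: (math.cos(t), math.sin(t), 0.0), 0.0, math.pi / 2),
    "line segment": (lambda t: (1 - t, t, 0.0), 0.0, 1.0),
    "wiggly path": (lambda t: (1 - t, t, math.sin(math.pi * t)), 0.0, 1.0),
}

expected = f(0, 1, 0) - f(1, 0, 0)
for name, (alpha, a, b) in paths.items():
    print(name, line_integral(F, alpha, a, b), "vs f(q) - f(p) =", expected)
```

All three approximations should agree with f(q) − f(p) up to numerical error, which is what the conservative vector field theorem predicts.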

Example 9.7. Let G be the inverse square field:
\[
\mathbf{G}(x, y, z) = \left( -\frac{x}{(x^2 + y^2 + z^2)^{3/2}},\ -\frac{y}{(x^2 + y^2 + z^2)^{3/2}},\ -\frac{z}{(x^2 + y^2 + z^2)^{3/2}} \right),
\]
where (x, y, z) ≠ (0, 0, 0). Find $\int_C \mathbf{G} \cdot d\mathbf{s}$ if C is the portion of the helix parametrized by α(t) =
(cos t, sin t, t), 0 ≤ t ≤ 3π.
We could calculate the integral explicitly using the definition and the parametrization, but,
inspired by the previous example, we check first to see if G is conservative. After a little
trial-and-error, we find that $g(x, y, z) = (x^2 + y^2 + z^2)^{-1/2} = \frac{1}{\|(x, y, z)\|}$ works as a potential function.
For instance, $\frac{\partial g}{\partial x} = -\frac{1}{2}(x^2 + y^2 + z^2)^{-3/2} \cdot 2x = -\frac{x}{(x^2 + y^2 + z^2)^{3/2}}$, and so on. Since C goes from
p = α(0) = (1, 0, 0) to q = α(3π) = (−1, 0, 3π), we find that:
\[
\int_C \mathbf{G} \cdot d\mathbf{s} = g(-1, 0, 3\pi) - g(1, 0, 0) = \frac{1}{\sqrt{1 + 9\pi^2}} - 1.
\]
The truth is that evaluating this integral directly using the parametrization is actually fairly
straightforward. Try it: it turns out that it’s not as bad as it might look. This is often the case
with conservative fields. Instead, perhaps the most meaningful takeaway from the approach we
took is that we can say that $\int_C \mathbf{G} \cdot d\mathbf{s} = \frac{1}{\sqrt{1 + 9\pi^2}} - 1$ for any piecewise smooth curve in the domain
of G, $\mathbb{R}^3 - \{(0, 0, 0)\}$, from p = (1, 0, 0) to q = (−1, 0, 3π). That the given curve is a helix is
irrelevant. Indeed, more generally, if C is any piecewise smooth curve in $\mathbb{R}^3 - \{(0, 0, 0)\}$ from a
point p to a point q, then using the potential function $g(\mathbf{x}) = \frac{1}{\|\mathbf{x}\|}$ above gives:
\[
\int_C \mathbf{G} \cdot d\mathbf{s} = \frac{1}{\|\mathbf{q}\|} - \frac{1}{\|\mathbf{p}\|}.
\]
In particular, the integral of the inverse square field around any piecewise smooth oriented closed
curve that avoids the origin is 0.
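For comparison, here is a sketch of the direct calculation that Example 9.7 invites the reader to try. It uses only the parametrization α(t) = (cos t, sin t, t), 0 ≤ t ≤ 3π, and the definition of the line integral; the intermediate simplifications are spelled out as a worked check rather than quoted from the text.

```latex
\begin{align*}
\alpha'(t) &= (-\sin t,\ \cos t,\ 1), \qquad
\|\alpha(t)\|^2 = \cos^2 t + \sin^2 t + t^2 = 1 + t^2, \\
\mathbf{G}(\alpha(t)) \cdot \alpha'(t)
  &= -\frac{(\cos t)(-\sin t) + (\sin t)(\cos t) + t \cdot 1}{(1 + t^2)^{3/2}}
   = -\frac{t}{(1 + t^2)^{3/2}}, \\
\int_C \mathbf{G} \cdot d\mathbf{s}
  &= \int_0^{3\pi} -\frac{t}{(1 + t^2)^{3/2}}\,dt
   = (1 + t^2)^{-1/2}\,\Big|_0^{3\pi}
   = \frac{1}{\sqrt{1 + 9\pi^2}} - 1.
\end{align*}
```

This agrees with the value obtained from the potential function g.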
Example 9.8. Let F be the counterclockwise circulation field F(x, y) = (−y, x) on R2 (Figure
9.2). Is F a conservative field?
This time the answer is no, and we give a couple of ways of looking at it. First, if F were
conservative, then its integral around any closed curve would be zero. But just by looking at the
way the field circulates around the origin, one can see that the integral around a circle centered
at the origin and oriented counterclockwise is positive. In fact, we calculated earlier in Example
9.1 that, if C is the unit circle traversed counterclockwise, then $\int_C -y\,dx + x\,dy = 2\pi$. Since C is
closed and the integral is not zero, F is not conservative.
A more direct approach is to try to guess a potential function f . Here, we want ∇f = F =
(−y, x), that is:
\[
\begin{cases}
\dfrac{\partial f}{\partial x} = -y \\[1ex]
\dfrac{\partial f}{\partial y} = x.
\end{cases}
\tag{9.7}
\]

To satisfy the first condition, we might try f (x, y) = −xy, but then $\frac{\partial f}{\partial y} = -x$, not x, and there
doesn’t seem to be any way to fix it.
This shows that our first guess didn’t work, but it’s possible that we are simply bad guessers.
To show that (9.7) is truly inconsistent, we bring in the second-order partials. For if there were a
function f that satisfied (9.7), then:
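\[
\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial y}\Bigl(\frac{\partial f}{\partial x}\Bigr) = \frac{\partial}{\partial y}(-y) = -1
\quad\text{and}\quad
\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\Bigl(\frac{\partial f}{\partial y}\Bigr) = \frac{\partial}{\partial x}(x) = 1.
\]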
This contradicts the equality of mixed partials. Hence no such f exists.


This last argument is easily generalized.

Theorem 9.9 (Mixed partials theorem). Let F be a smooth vector field on an open set U in Rn ,
written in terms of components as F = (F1 , F2 , . . . , Fn ). If F is conservative, then:

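\[
\frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i} \quad\text{for all } i, j = 1, 2, \ldots, n.
\]

Equivalently, if $\frac{\partial F_i}{\partial x_j} \neq \frac{\partial F_j}{\partial x_i}$ for some i, j, then F is not conservative.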

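Proof. Assume that F is conservative, and let f be a potential function. Then F = ∇f , that is,
$(F_1, F_2, \ldots, F_n) = \bigl(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\bigr)$. Therefore $F_i = \frac{\partial f}{\partial x_i}$ and $F_j = \frac{\partial f}{\partial x_j}$, and so
$\frac{\partial F_i}{\partial x_j} = \frac{\partial^2 f}{\partial x_j\,\partial x_i}$ and $\frac{\partial F_j}{\partial x_i} = \frac{\partial^2 f}{\partial x_i\,\partial x_j}$. By the equality of mixed partials, these are equal.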

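We look at what the mixed partials theorem says for $\mathbb{R}^2$ and $\mathbb{R}^3$. For a vector field F = (F1 , F2 )
on a subset of $\mathbb{R}^2$, there is just one mixed partials pair: $\frac{\partial F_1}{\partial y}$ and $\frac{\partial F_2}{\partial x}$. Thus:

In $\mathbb{R}^2$, if F = (F1 , F2 ) is conservative, then $\frac{\partial F_1}{\partial y} = \frac{\partial F_2}{\partial x}$.

Next, for F = (F1 , F2 , F3 ) on a subset of $\mathbb{R}^3$, there are three mixed partials pairs: (i) $\frac{\partial F_1}{\partial y}$ and
$\frac{\partial F_2}{\partial x}$, (ii) $\frac{\partial F_1}{\partial z}$ and $\frac{\partial F_3}{\partial x}$, and (iii) $\frac{\partial F_2}{\partial z}$ and $\frac{\partial F_3}{\partial y}$. We describe a slick way to compare the partials in
each pair.
Let $\nabla = \bigl(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\bigr)$. This is not a legitimate vector, but we think of it as an “operator.” For
instance, it acts on a real-valued function f (x, y, z) by scalar multiplication: $\nabla f = \bigl(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\bigr) f = \bigl(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\bigr)$.
This agrees with the usual notation for the gradient.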
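For a vector field F = (F1 , F2 , F3 ), ∇ acts by the cross product:
\begin{align*}
\nabla \times \mathbf{F} &= \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ F_1 & F_2 & F_3 \end{bmatrix} \\
&= \left( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\ -\Bigl(\frac{\partial F_3}{\partial x} - \frac{\partial F_1}{\partial z}\Bigr),\ \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) \\
&= \left( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\ \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x},\ \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right).
\end{align*}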

This is called the curl of F. Note that its components are precisely the differences of the three
mixed partials pairs. Hence:

In R3 , if F = (F1 , F2 , F3 ) is conservative, then ∇ × F = 0.

Example 9.10. Let $\mathbf{F}(x, y, z) = (2xy^3z^4 + x,\ 3x^2y^2z^4 + 3,\ 4x^2y^3z^3 + y^2)$. Is F conservative?

We begin by trying to guess a potential function. In order to get $\frac{\partial f}{\partial x} = 2xy^3z^4 + x$, we guess
$f = x^2y^3z^4 + \frac{1}{2}x^2$. Then, to get $\frac{\partial f}{\partial y} = 3x^2y^2z^4 + 3$, we modify this to $f = x^2y^3z^4 + \frac{1}{2}x^2 + 3y$. Then,
to get $\frac{\partial f}{\partial z} = 4x^2y^3z^3 + y^2$, we modify again to $f = x^2y^3z^4 + \frac{1}{2}x^2 + 3y + y^2z$. Unfortunately, this
spoils $\frac{\partial f}{\partial y}$. It seems unlikely that a potential function exists.

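Having raised the suspicion, we compute the curl:
\begin{align*}
\nabla \times \mathbf{F} &= \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ 2xy^3z^4 + x & 3x^2y^2z^4 + 3 & 4x^2y^3z^3 + y^2 \end{bmatrix} \\
&= \bigl(12x^2y^2z^3 + 2y - 12x^2y^2z^3,\ 8xy^3z^3 - 8xy^3z^3,\ 6xy^2z^4 - 6xy^2z^4\bigr) \\
&= (2y, 0, 0).
\end{align*}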

This is not identically 0, so F is not conservative.
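The curl in Example 9.10 can also be spot-checked numerically. The sketch below approximates each partial derivative by a central difference; the function name `curl`, the step size `h`, and the sample point (1, 2, 1) are illustrative choices and not part of the text.

```python
def curl(F, point, h=1e-5):
    """Approximate the curl of F = (F1, F2, F3) at a point using central differences."""
    def partial(i, j):
        # Approximates dF_i/dx_j (0-based indices) at the given point.
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        return (F(*plus)[i] - F(*minus)[i]) / (2 * h)
    return (
        partial(2, 1) - partial(1, 2),  # dF3/dy - dF2/dz
        partial(0, 2) - partial(2, 0),  # dF1/dz - dF3/dx
        partial(1, 0) - partial(0, 1),  # dF2/dx - dF1/dy
    )

def F(x, y, z):
    # The vector field of Example 9.10.
    return (2*x*y**3*z**4 + x, 3*x**2*y**2*z**4 + 3, 4*x**2*y**3*z**3 + y**2)

# The exact curl is (2y, 0, 0), so at y = 2 the result should be close to (4, 0, 0).
print(curl(F, (1.0, 2.0, 1.0)))
```

A nonzero first component at any sample point is already enough to conclude that F is not conservative.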


We remark that it is not true in general that, if $\frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i}$ for all i, j, then F is conservative.
In other words, the converse of the mixed partials theorem does not hold. This is a point to which
we return in Sections 9.5 and 9.6.

9.4 Green’s theorem


We now discuss the first of three major theorems of multivariable calculus, each of which is named
after somebody. The one at hand applies to vector fields F = (F1 , F2 ) on subsets of R2 .
Suppose that we integrate such a vector field F around the boundary of a rectangle R =
[a, b] × [c, d] in R2 , oriented counterclockwise. Let’s call this oriented curve C. We break up C into
the four sides C1 , C2 , C3 , C4 of the rectangle but orient the horizontal sides C1 and C3 to the right
and the vertical sides C2 and C4 upward. See Figure 9.8.

Figure 9.8: The boundary of a rectangle R = [a, b] × [c, d]

Taking into account the orientations, we write C = C1 ∪ C2 ∪ (−C3 ) ∪ (−C4 ), where the minus
signs indicate that the segment has been given the orientation opposite the direction in which it is
actually traversed in C. Then the line integral over C can be written as:
\[
\int_C F_1\,dx + F_2\,dy = \left( \int_{C_1} + \int_{C_2} - \int_{C_3} - \int_{C_4} \right) F_1\,dx + F_2\,dy. \tag{9.8}
\]

Hopefully, the notation, where the integral signs are meant to distribute over the integrand, is
self-explanatory.
For the horizontal segment C1 , we use x as parameter, keeping y = c fixed: α1 (x) = (x, c),
a ≤ x ≤ b. Then, along C1 , $\frac{dx}{dx} = 1$ and $\frac{dy}{dx} = \frac{d}{dx}(c) = 0$, so:
\[
\int_{C_1} F_1\,dx + F_2\,dy = \int_a^b \bigl( F_1(x, c) \cdot 1 + F_2(x, c) \cdot 0 \bigr)\,dx = \int_a^b F_1(x, c)\,dx.
\]


Similarly, for C3 :
\[
\int_{C_3} F_1\,dx + F_2\,dy = \int_a^b F_1(x, d)\,dx.
\]
For the vertical segments C2 and C4 , we use y as parameter, c ≤ y ≤ d, keeping x fixed. Hence
$\frac{dx}{dy} = 0$ and $\frac{dy}{dy} = 1$, which gives:
\[
\int_{C_2} F_1\,dx + F_2\,dy = \int_c^d F_2(b, y)\,dy \quad\text{and}\quad \int_{C_4} F_1\,dx + F_2\,dy = \int_c^d F_2(a, y)\,dy.
\]

Substituting these four calculations into equation (9.8) results in:


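\begin{align*}
\int_C F_1\,dx + F_2\,dy &= \int_a^b F_1(x, c)\,dx + \int_c^d F_2(b, y)\,dy - \int_a^b F_1(x, d)\,dx - \int_c^d F_2(a, y)\,dy \\
&= \int_c^d \bigl( F_2(b, y) - F_2(a, y) \bigr)\,dy - \int_a^b \bigl( F_1(x, d) - F_1(x, c) \bigr)\,dx.
\end{align*}
Somewhat remarkably, the fundamental theorem of calculus applies to each of these terms:
\begin{align*}
\int_C F_1\,dx + F_2\,dy &= \int_c^d \Bigl( F_2(x, y)\Big|_{x=a}^{x=b} \Bigr)\,dy - \int_a^b \Bigl( F_1(x, y)\Big|_{y=c}^{y=d} \Bigr)\,dx \\
&= \int_c^d \left( \int_a^b \frac{\partial F_2}{\partial x}(x, y)\,dx \right) dy - \int_a^b \left( \int_c^d \frac{\partial F_1}{\partial y}(x, y)\,dy \right) dx \\
&= \iint_R \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx\,dy. \tag{9.9}
\end{align*}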
This expresses the line integral around the boundary of the rectangle as a certain double integral
over the rectangle itself.
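Equation (9.9) can be checked numerically on a concrete rectangle. The following sketch evaluates both sides with the midpoint rule; the helper names, the sample field F = (x − y², xy), and the rectangle [0, 2] × [0, 1] are assumptions chosen for illustration, not taken from the text.

```python
def midpoint(g, lo, hi, n=400):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * h) for k in range(n)) * h

def boundary_integral(F1, F2, a, b, c, d):
    """Counterclockwise boundary integral, assembled as C1 + C2 - C3 - C4."""
    bottom = midpoint(lambda x: F1(x, c), a, b)  # C1: y = c, left to right
    right = midpoint(lambda y: F2(b, y), c, d)   # C2: x = b, upward
    top = midpoint(lambda x: F1(x, d), a, b)     # C3: y = d, left to right
    left = midpoint(lambda y: F2(a, y), c, d)    # C4: x = a, upward
    return bottom + right - top - left

def double_integral(g, a, b, c, d, n=400):
    """Iterated midpoint rule for the integral of g over [a, b] x [c, d]."""
    return midpoint(lambda y: midpoint(lambda x: g(x, y), a, b, n), c, d, n)

# Illustrative field F = (x - y^2, x y), for which dF2/dx - dF1/dy = y + 2y = 3y.
F1 = lambda x, y: x - y**2
F2 = lambda x, y: x * y
integrand = lambda x, y: 3 * y

a, b, c, d = 0.0, 2.0, 0.0, 1.0
print(boundary_integral(F1, F2, a, b, c, d))   # line integral around the boundary
print(double_integral(integrand, a, b, c, d))  # double integral over the rectangle
```

For this choice, both printed values should be close to 3, the exact value of each side.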
Next, suppose that D is a rectangle with a rectangular hole, as on the left of Figure 9.9. We
examine the relationship (9.9), working from right to left, for this new domain. We write D as the
union of four rectangles R1 , R2 , R3 , R4 that overlap at most along portions of their boundaries, as
on the right of Figure 9.9, so that the double integral may be decomposed:
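\[
\iint_D \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx\,dy = \left( \iint_{R_1} + \iint_{R_2} + \iint_{R_3} + \iint_{R_4} \right) \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx\,dy.
\]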

Figure 9.9: A rectangle with a rectangular hole

Let Cj be the boundary of Rj , j = 1, 2, 3, 4, oriented counterclockwise. Equation (9.9) applies
to each of the integrals over the rectangles R1 , R2 , R3 , and R4 , which yields:
\[
\iint_D \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx\,dy = \left( \int_{C_1} + \int_{C_2} + \int_{C_3} + \int_{C_4} \right) F_1\,dx + F_2\,dy.
\]

Note that, for the boundaries of R1 , R2 , R3 , R4 , the portions in the interior of D appear as part of
two different Cj traversed in opposite directions. Thus these portions of the line integrals cancel
out, leaving only the parts left exposed around the boundary of the original region D. Hence if C
denotes the boundary of D, we have:
\[
\iint_D \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx\,dy = \int_C F_1\,dx + F_2\,dy.
\]

This is the same relation as we obtained for the filled-in rectangle in equation (9.9). As the
argument shows, it is valid provided that C is oriented so that the outer perimeter is traversed
counterclockwise and the inner perimeter is traversed clockwise, as shown in Figure 9.10.

Figure 9.10: The oriented boundary of a rectangle with a rectangular hole

Before generalizing, we need a further definition.

Definition. We have said that a curve C is called closed if it starts and ends at the same point.
In addition, C is said to be simple if, apart from this, it does not intersect itself.

For example, a circle or ellipse is simple, but a figure eight is not.


Now, let D be a subset of R2 whose boundary consists of a finite number of simple closed curves.
We adopt the following notation:

∂D denotes the boundary of D oriented so that, as


you traverse the boundary, D stays on the left.

For instance, this describes the orientation of the boundary of the rectangle-with-rectangular-hole
in Figure 9.10. Note that oriented boundaries are completely different from partial derivatives,
even though the same symbol “∂ ” happens to be used for both.
The relation we found for line integrals around the oriented boundaries of rectangles or rectan-
gles with holes holds in greater generality.

Theorem 9.11 (Green’s theorem). Let U be an open set in $\mathbb{R}^2$, and let D be a bounded subset of
U such that the boundary of D consists of a finite number of piecewise smooth simple closed curves.
If F = (F1 , F2 ) is a smooth vector field on U , then:
\[
\int_{\partial D} F_1\,dx + F_2\,dy = \iint_D \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx\,dy.
\]
If F happens to be a conservative field, then $\int_{\partial D} F_1\,dx + F_2\,dy = 0$, since ∂D consists of closed
curves and the integral of a conservative field around any closed curve is 0. On the other hand, by
the mixed partials theorem, the mixed partials $\frac{\partial F_2}{\partial x}$ and $\frac{\partial F_1}{\partial y}$ are equal, so $\iint_D \bigl( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \bigr)\,dx\,dy = 0$
as well. Thus for conservative fields, Green’s theorem confirms the generally accepted belief that
0 = 0.


We have proven Green’s theorem only in very special cases. The argument given for a rectangle
with a hole, however, can be extended easily to any domain D that is a union of finitely many
rectangles that intersect at most in pairs along portions of their common boundaries. For a general
region D, one strategy for giving a proof is to show that D can be approximated by such a union
of rectangles so that Green’s theorem is approximately true for D. By taking a limit of such
approximations, the theorem can be proven. This requires considerable technical expertise which
we leave for another course.
Example 9.12. Let $\mathbf{F}(x, y) = (x - y^2, xy)$. Find $\int_C \mathbf{F} \cdot d\mathbf{s}$ if C is the simple closed curve consisting
of the semicircular arc in the upper half-plane from (2, 0) to (−2, 0) followed by the line segment
from (−2, 0) to (2, 0). See Figure 9.11.

Figure 9.11: An integral around the boundary of a half-disk

Let D be the filled in half-disk bounded by C so that C = ∂D. By Green’s theorem:


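\begin{align*}
\int_{C = \partial D} (x - y^2)\,dx + xy\,dy &= \iint_D \left( \frac{\partial}{\partial x}(xy) - \frac{\partial}{\partial y}(x - y^2) \right) dx\,dy \\
&= \iint_D \bigl( y - (-2y) \bigr)\,dx\,dy \\
&= \iint_D 3y\,dx\,dy.
\end{align*}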

This last double integral can be evaluated using polar coordinates:


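\begin{align*}
\int_C (x - y^2)\,dx + xy\,dy &= \int_0^{\pi} \left( \int_0^2 3r \sin\theta \cdot r\,dr \right) d\theta \\
&= 3 \left( \int_0^2 r^2\,dr \right) \left( \int_0^{\pi} \sin\theta\,d\theta \right) \\
&= 3 \left( \tfrac{1}{3} r^3 \Big|_0^2 \right) \left( -\cos\theta \Big|_0^{\pi} \right) \\
&= 3 \cdot \frac{8}{3} \cdot 2 \\
&= 16.
\end{align*}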

Example 9.13. Once again, consider the vector field F(x, y) = (−y, x) that circulates counter-
clockwise about the origin. Let C be a piecewise smooth simple closed curve in R2 , oriented
counterclockwise. For example, it could be the one shown in Figure 9.12. Can we say anything
about the integral of F over C?
There are portions of C that go counterclockwise about the origin, for instance, the parts
furthest from the origin. The tangential component of F along these portions is positive. But there

Figure 9.12: Integrating the circulating vector field F(x, y) = (−y, x) around a simple closed curve
C

may also be places where C moves in the clockwise direction around the origin. For instance, if C
does not encircle the origin, this happens at points nearest the origin. The tangential component
is negative along such places. The two types of contributions counteract one another, but the
magnitude of F increases as you move away from the origin so we might expect the positive
contribution to be greater. In fact, using Green’s theorem, we can measure precisely how much
greater it is.
Let D be the filled-in region bounded by C. Then, by Green’s theorem:
\[
\int_C -y\,dx + x\,dy = \iint_D \left( \frac{\partial}{\partial x}(x) - \frac{\partial}{\partial y}(-y) \right) dx\,dy = \iint_D 2\,dx\,dy = 2\,\text{Area}(D).
\]

In general, the same reasoning shows the following fact.

Corollary 9.14. Let D be a subset of R2 that satisfies the assumptions of Green’s theorem. Then:
\[
\text{Area}(D) = \frac{1}{2} \int_{\partial D} -y\,dx + x\,dy.
\]
2 ∂D

This illustrates the striking principle that information about all of D can be recovered from
measurements taken only along its boundary.
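When D is a polygon, the boundary integral in Corollary 9.14 can be evaluated edge by edge: along the segment from (x0, y0) to (x1, y1), the integral of −y dx + x dy works out to x0·y1 − x1·y0. Summing over the edges gives what is commonly called the shoelace formula. The sketch below is a minimal illustration; the function name `polygon_area` and the sample polygons are assumptions made for the example.

```python
def polygon_area(vertices):
    """Signed area of a simple polygon: positive if the vertices run counterclockwise."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        # Integral of -y dx + x dy along the edge from (x0, y0) to (x1, y1).
        total += x0 * y1 - x1 * y0
    return 0.5 * total

print(polygon_area([(0, 0), (4, 0), (0, 3)]))          # right triangle with legs 4 and 3
print(polygon_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # unit square, counterclockwise
print(polygon_area([(0, 0), (0, 1), (1, 1), (1, 0)]))  # same square, clockwise
```

Listing the vertices clockwise reverses the orientation of the boundary, so the result changes sign, consistent with the convention that ∂D keeps D on the left.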

9.5 The vector field W


One of the early examples we looked at when we started studying vector fields was the vector field
$\mathbf{W} \colon \mathbb{R}^2 - \{(0, 0)\} \to \mathbb{R}^2$ given by:
\[
\mathbf{W}(x, y) = \left( -\frac{y}{x^2 + y^2},\ \frac{x}{x^2 + y^2} \right).
\]

This was Example 8.3 in Chapter 8. Like our usual circulating vector field, W circulates
counterclockwise around the origin, but, due to the factor of $\frac{1}{x^2 + y^2}$, the arrows get shorter as you move
away (Figure 9.13). We are about to see that W is an instructive example to keep in mind.
First, we integrate W around the circle Ca of radius a centered at the origin and oriented
counterclockwise. We evaluate the integral in two different ways.
Approach A. We use Green’s theorem. Let Da be the filled-in disk of radius a so that Ca = ∂Da .
See Figure 9.14.


Figure 9.13: The vector field W

Figure 9.14: The circle and disk of radius a: Ca = ∂Da

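By the quotient rule:
\begin{align*}
\frac{\partial W_2}{\partial x} &= \frac{\partial}{\partial x}\left( \frac{x}{x^2 + y^2} \right) = \frac{(x^2 + y^2) \cdot 1 - x \cdot 2x}{(x^2 + y^2)^2} = \frac{y^2 - x^2}{(x^2 + y^2)^2} \\
\text{and}\quad \frac{\partial W_1}{\partial y} &= \frac{\partial}{\partial y}\left( -\frac{y}{x^2 + y^2} \right) = -\frac{(x^2 + y^2) \cdot 1 - y \cdot 2y}{(x^2 + y^2)^2} = \frac{y^2 - x^2}{(x^2 + y^2)^2}.
\end{align*}
Hence $\frac{\partial W_2}{\partial x} = \frac{\partial W_1}{\partial y}$, so Green’s theorem gives:
\[
\int_{C_a} -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = \iint_{D_a} \left( \frac{\partial W_2}{\partial x} - \frac{\partial W_1}{\partial y} \right) dx\,dy = \iint_{D_a} 0\,dx\,dy = 0.
\]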

Approach B. We calculate the line integral directly using a parametrization of Ca , say α(t) =
(a cos t, a sin t), 0 ≤ t ≤ 2π. This gives:
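\begin{align*}
\int_{C_a} -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy
&= \int_0^{2\pi} \left( -\frac{a \sin t}{a^2 \cos^2 t + a^2 \sin^2 t} \cdot (-a \sin t) + \frac{a \cos t}{a^2 \cos^2 t + a^2 \sin^2 t} \cdot (a \cos t) \right) dt \\
&= \int_0^{2\pi} \frac{a^2 \sin^2 t + a^2 \cos^2 t}{a^2}\,dt \\
&= \int_0^{2\pi} \frac{a^2}{a^2}\,dt \\
&= \int_0^{2\pi} 1\,dt \\
&= 2\pi.
\end{align*}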

In other words, 0 = 2π.


Evidently, there’s been a mistake. The truth is that Approach A is flawed. The disk Da contains
the origin, and the origin is not in the domain of W. Thus Green’s theorem does not apply to W
and Da . Instead, the correct answer is B:
\[
\int_{C_a} -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = 2\pi. \tag{9.10}
\]

This calculation has some noteworthy consequences.

• The value of the integral is 2π regardless of the radius of the circle.


• W is not conservative, as integrating it around a closed curve does not always give 0.
• Nevertheless, all possible mixed partials pairs are equal: $\frac{\partial W_2}{\partial x} = \frac{\partial W_1}{\partial y}$. Hence the converse of
the mixed partials theorem can fail.

We broaden the analysis by considering any piecewise smooth simple closed curve C in the
plane, oriented counterclockwise, that does not pass through the origin, meaning that (0, 0) ∉ C.
This doesn’t seem like very much to go on, but it turns out that we can say a lot about the integral
of W over C.
There are two cases, corresponding to the fact that C separates the plane into two regions,
which we refer to as the interior and the exterior.
Case 1. (0, 0) lies in the exterior of C. See Figure 9.15.

Figure 9.15: The origin lies in the exterior of C: ∂D = C.

Then the region D bounded by C does not contain the origin, so Approach A above is valid and
Green’s theorem applies with C = ∂D. Since the mixed partials of W are equal, it follows that:
\[
\int_C -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = \iint_D \left( \frac{\partial W_2}{\partial x} - \frac{\partial W_1}{\partial y} \right) dx\,dy = 0.
\]

Case 2. (0, 0) lies in the interior of C. See Figure 9.16.


As we saw with the flaw in Approach A, we can no longer fill in C and use Green’s theorem.
Instead, let $C_\epsilon$ be a circle of radius $\epsilon$ centered at the origin and oriented counterclockwise, where $\epsilon$
is chosen small enough that $C_\epsilon$ is contained entirely in the interior of C.
Let D be the region inside of C but outside of $C_\epsilon$. The origin does not belong to D, so Green’s
theorem applies. Since the mixed partials of W are equal, this gives:
\[
\int_{\partial D} -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = \iint_D \left( \frac{\partial W_2}{\partial x} - \frac{\partial W_1}{\partial y} \right) dx\,dy = 0.
\]


Figure 9.16: The origin lies in the interior of C: $\partial D = C \cup (-C_\epsilon)$.

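On the other hand, taking orientation into account, $\partial D = C \cup (-C_\epsilon)$, so:
\[
\int_{\partial D} -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = \int_C -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy - \int_{C_\epsilon} -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy.
\]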

Combining these results gives:
\[
\int_C -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = \int_{C_\epsilon} -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy.
\]

But we showed in equation (9.10) that the integral of W around any counterclockwise circle about
the origin, such as $C_\epsilon$, is 2π. Hence the value around C must be the same.
To summarize, we have proven:
\[
\int_C -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy =
\begin{cases}
0 & \text{if } (0, 0) \text{ lies in the exterior of } C, \\
2\pi & \text{if } (0, 0) \text{ lies in the interior of } C.
\end{cases}
\tag{9.11}
\]

If C is oriented clockwise instead, then the change in orientation reverses the sign of the integral, so
the corresponding values in the two cases are 0 and −2π, respectively. That there are exactly three
possible values of the integral over all piecewise smooth oriented simple closed curves in R2 −{(0, 0)}
may be somewhat surprising.
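The three possible values can be observed numerically. The sketch below integrates W around a few circles using the midpoint rule; the function name `integrate_W_over_circle` and the particular centers and radii are illustrative assumptions, and each circle is chosen so that it does not pass through the origin.

```python
import math

def W(x, y):
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def integrate_W_over_circle(cx, cy, radius, clockwise=False, n=20000):
    """Midpoint-rule approximation of the integral of W around a circle missing the origin."""
    sign = -1.0 if clockwise else 1.0
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = sign * (k + 0.5) * h  # the angle runs backward for a clockwise circle
        x, y = cx + radius * math.cos(t), cy + radius * math.sin(t)
        # Velocity of the parametrization, including the direction of travel.
        dx, dy = -sign * radius * math.sin(t), sign * radius * math.cos(t)
        w1, w2 = W(x, y)
        total += (w1 * dx + w2 * dy) * h
    return total

print(integrate_W_over_circle(3.0, 0.0, 1.0))                  # origin in the exterior
print(integrate_W_over_circle(1.0, 0.0, 2.0))                  # origin in the interior
print(integrate_W_over_circle(0.0, 0.0, 1.0, clockwise=True))  # clockwise about the origin
print(2 * math.pi)                                             # for comparison
```

The first value should be close to 0, the second close to 2π, and the third close to −2π, matching equation (9.11) and the remark about clockwise orientation.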

9.6 The converse of the mixed partials theorem


We now discuss briefly and informally conditions regarding when a vector field F = (F1 , F2 , . . . , Fn )
is conservative. On the one hand, we have seen that, if F is conservative, then:

C1. the integral of F over any piecewise smooth oriented curve C depends only on the endpoints
of C,
C2. the integral of F over any piecewise smooth oriented closed curve is 0, and
C3. all mixed partial pairs are equal: $\frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i}$ for all i, j.

So these conditions are necessary for F to be conservative. Are any of them sufficient? It turns
out that the converses corresponding to C1 and C2 are true, but we know from the example of W
in the previous section that the converse of C3 is not. Computing the mixed partials and seeing
if they are equal is the simplest of the three conditions to check, however, so we could ask if there
are special circumstances under which the converse of C3 does hold. We investigate this next and
toss in a few remarks about the other two conditions along the way.

Definition. A subset U of Rn is called simply connected if:

• every pair of points p, q in U can be joined by a piecewise smooth curve in U and


• every closed curve in U can be continuously contracted within U to a point.

For example, Rn itself is simply connected. So are rectangles and balls. A sphere in R3 is simply
connected as well. A closed curve in any of them can be collapsed to a point within the set rather
like a rubber band shrinking in on itself. See Figure 9.17.

Figure 9.17: Examples of simply connected sets: a disk, a rectangle, and a sphere

The punctured plane R2 − {(0, 0)} is not simply connected, however. A closed curve that goes
around the origin cannot be shrunk to a point without passing through the origin, i.e., without
leaving the set. Similarly, an annulus in R2 , that is, the ring between two concentric circles, is not
simply connected. An example of a non-simply connected surface is the surface of a donut, which is
called a torus. A loop that goes through the hole cannot be contracted to a point while remaining
on the surface. These examples are shown in Figure 9.18.

Figure 9.18: Examples of non-simply connected sets and closed curves that cannot be contracted
within the set: the punctured plane, an annulus, and a torus

Theorem 9.15 (Converse of the mixed partials theorem). Let F = (F1 , F2 , . . . , Fn ) be a smooth
vector field on an open set U in Rn . If:
• $\frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i}$ for all i, j = 1, 2, . . . , n and
• U is simply connected,

then F is a conservative vector field on U .

A rigorous proof of this theorem is quite hard, so we postpone even any discussion of it until
after we look at an example.
Example 9.16. Consider again the vector field $\mathbf{W}(x, y) = \bigl( -\frac{y}{x^2 + y^2}, \frac{x}{x^2 + y^2} \bigr)$, (x, y) ≠ (0, 0), of
Section 9.5. We established that W is not conservative on R2 − {(0, 0)}. On the other hand,
the mixed partials of W are equal; therefore Theorem 9.15 implies that W is conservative when
restricted to any simply connected subset of R2 − {(0, 0)}.


For instance, let U be the right half-plane, $U = \{(x, y) \in \mathbb{R}^2 : x > 0\}$. This is a simply connected
set, so a potential function for W must exist on U . Indeed, one can check that $w(x, y) = \arctan\bigl(\frac{y}{x}\bigr)$
works. For instance, $\frac{\partial w}{\partial x} = \frac{1}{1 + (y/x)^2} \cdot \bigl( -\frac{y}{x^2} \bigr) = -\frac{y}{x^2 + y^2}$.
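The text checks only the x-partial of w(x, y) = arctan(y/x). For completeness, here is the companion calculation for the y-partial on the right half-plane x > 0, written out as a worked step rather than quoted from the text:

```latex
\[
\frac{\partial w}{\partial y}
  = \frac{1}{1 + \left(\frac{y}{x}\right)^2} \cdot \frac{1}{x}
  = \frac{x^2}{x^2 + y^2} \cdot \frac{1}{x}
  = \frac{x}{x^2 + y^2}.
\]
```

Together with the x-partial, this matches both components of W, so ∇w = W wherever x > 0.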
To illustrate how this might be used, suppose that C is a piecewise smooth curve in the right
half-plane that goes from a point p, say on the line y = −x in the fourth quadrant, to a point q
on the line y = x in the first quadrant, as in Figure 9.19.

Figure 9.19: A curve in the right half-plane from the line y = −x to the line y = x

Using w as potential function:


\[
\int_C -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = w(\mathbf{q}) - w(\mathbf{p}) = \arctan(1) - \arctan(-1) = \frac{\pi}{4} - \Bigl( -\frac{\pi}{4} \Bigr) = \frac{\pi}{2}.
\]

Note that this difference in the arctangent is the angle about the origin swept out by C.
The same potential function works in the left half-plane x < 0. Similarly, on the upper
half-plane y > 0 and the lower half-plane y < 0, $v(x, y) = -\arctan\bigl(\frac{x}{y}\bigr)$ works as a potential function for
W. Integrating W over a curve in any of these half-planes amounts to finding the counterclockwise
change in angle traced out by the curve.
We can use these observations to go beyond the result of (9.11) and give a more complete
description of the integral of W over curves in R2 − {(0, 0)}. Any such curve C can be divided
into pieces, each of which lies in one of the four half-planes above. By summing the integrals over
each of these pieces, we find that the integral of W over all of C is the total counterclockwise angle
about the origin swept out by C from start to finish.
If C is a closed curve so that it comes back to where it started, the total angle is an integer
multiple of 2π: $\int_C -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = 2\pi n$. This conclusion does not require that C be
simple. The integer n represents the net number of times that C goes around the origin in the
counterclockwise sense and is given a special name.

Definition. Let C be a piecewise smooth oriented closed curve in R2 −{(0, 0)}. Then the winding
number of C with respect to (0, 0) is the integer defined by the equation:
\[
\text{winding \#} = \frac{1}{2\pi} \int_C -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy.
\]

See Figure 9.20 for some examples.

For instance, if $C_a$ is the circle of radius a centered at the origin, traversed once counterclockwise,
then we showed in Section 9.5 that $\int_{C_a} -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = 2\pi$ (see equation (9.10)). Hence
according to the definition, Ca has winding number 1.
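The description of the integral as the total angle swept out suggests a simple way to estimate winding numbers. The sketch below replaces the curve by a closed polygon through sampled points and adds up the signed angle subtended at the origin by each edge. For an edge that misses the origin, that angle equals the integral of W over the edge, since the edge lies in a half-plane on which W has a potential. The function names `winding_number` and `sample`, the number of sample points, and the test curves are illustrative assumptions.

```python
import math

def winding_number(points):
    """Winding number about the origin of the closed polygon through the given points."""
    total_angle = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        # Signed angle at the origin from (x0, y0) to (x1, y1), between -pi and pi.
        total_angle += math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    return round(total_angle / (2 * math.pi))

def sample(alpha, a, b, n=1000):
    """Points alpha(t) at n equally spaced parameter values in [a, b), for a closed curve."""
    return [alpha(a + (b - a) * k / n) for k in range(n)]

once = sample(lambda t: (math.cos(t), math.sin(t)), 0.0, 2 * math.pi)
twice = sample(lambda t: (math.cos(2 * t), math.sin(2 * t)), 0.0, 2 * math.pi)
clockwise = sample(lambda t: (math.cos(t), -math.sin(t)), 0.0, 2 * math.pi)
off_center = sample(lambda t: (3 + math.cos(t), math.sin(t)), 0.0, 2 * math.pi)

print(winding_number(once), winding_number(twice),
      winding_number(clockwise), winding_number(off_center))  # 1 2 -1 0
```

The polygon must be fine enough that no edge passes through the origin and that it goes around the origin the same number of times as the original curve.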

Figure 9.20: Oriented closed curves with winding number 1 (left), 2 (middle), and −1 (right)

One reason that this discussion is interesting is that, in first-year calculus, we tend to focus on
the integral of a function as telling us mainly about the function. The winding number, on the
other hand, is an instance where the integral is telling us about the geometry of the domain, i.e.,
the curve, over which the integral is taken.

We close by going over some of the ideas behind the proof of Theorem 9.15, the converse of
the mixed partials theorem. We do this only in the case of a vector field F = (F1 , F2 ) on an open
set U in $\mathbb{R}^2$. Thus we assume that $\frac{\partial F_2}{\partial x} = \frac{\partial F_1}{\partial y}$ and that U is simply connected, and we want to

prove that F is conservative on U , that is, that it has a potential function f . We think of f as an
“anti-gradient” of F. Then the method of finding f is a lot like what is done in first-year calculus
for the fundamental theorem, where an antiderivative is constructed by integrating.
Choose a point a of U . If x is any point of U , then by assumption there is a piecewise smooth
curve C in U from a to x. The idea is to show that the line integral $\int_C \mathbf{F} \cdot d\mathbf{s} = \int_C F_1\,dx + F_2\,dy$ is
the same regardless of which curve C is chosen. For suppose that C1 is another piecewise smooth
curve in U from a to x. See Figure 9.21. We want to prove that:
\[
\int_C F_1\,dx + F_2\,dy = \int_{C_1} F_1\,dx + F_2\,dy.
\]

Figure 9.21: Two curves C and C1 from a to x and the region D that fills in the closed curve
C2 = C ∪ (−C1 )

By reversing the orientation on C1 and appending it to C, we obtain a closed curve C2 =


C ∪ (−C1 ) in U . Here’s the hard part. Since U is simply connected, a closed curve such as C2
can be shrunk within U to a point. The idea is that, in the process of the contraction, C2 sweeps
out a filled-in region D such that C2 = ±∂D. It is difficult to turn this intuition into a rigorous
proof. Assuming it has been done, we apply Green’s theorem using the assumption of equal mixed
partials:
\[
\int_{C_2} F_1\,dx + F_2\,dy = \pm \iint_D \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx\,dy = \pm \iint_D 0\,dx\,dy = 0. \tag{9.12}
\]


At the same time, since C2 = C ∪ (−C1 ), we have:


\[
\int_{C_2} F_1\,dx + F_2\,dy = \int_C F_1\,dx + F_2\,dy - \int_{C_1} F_1\,dx + F_2\,dy. \tag{9.13}
\]
By combining equations (9.12) and (9.13), we find that $\int_C F_1\,dx + F_2\,dy = \int_{C_1} F_1\,dx + F_2\,dy$, as
desired.
As a result, if x ∈ U , we may define $f(\mathbf{x}) = \int_C F_1\,dx + F_2\,dy$, where C is any piecewise smooth
curve in U from a to x. We write this more suggestively as:
\[
f(\mathbf{x}) = \int_{\mathbf{a}}^{\mathbf{x}} F_1\,dx + F_2\,dy. \tag{9.14}
\]

Note that, whether U is simply connected or not, as long as condition C1 above holds—namely,
the line integral depends only on the endpoints of a curve—then the formula in equation (9.14)
describes a well-defined function f . The reasoning above also contains the argument that, if con-
dition C2 holds—that the integral around any closed curve is 0—that too is enough to show that
the function f in (9.14) is well-defined. This basically follows from equation (9.13).
Thus to show that any one of C1, C2, or, on simply connected domains, C3 suffices to imply that F
is conservative, it remains only to confirm that f in (9.14) is a potential function for F: $\frac{\partial f}{\partial x} = F_1$
and $\frac{\partial f}{\partial y} = F_2$. This part is actually not so bad, and we leave it for the exercises (Exercise 6.4).
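To make formula (9.14) concrete, here is a worked instance under assumptions that are not in the text: the field F(x, y) = (2xy, x²) on U = R² (which is simply connected, and for which ∂F2/∂x = 2x = ∂F1/∂y), the base point a = (0, 0), and the straight-line path α(t) = (tx, ty), 0 ≤ t ≤ 1, from a to (x, y).

```latex
\begin{align*}
f(x, y) &= \int_0^1 \mathbf{F}(\alpha(t)) \cdot \alpha'(t)\,dt
         = \int_0^1 \bigl( 2(tx)(ty),\ (tx)^2 \bigr) \cdot (x, y)\,dt \\
        &= \int_0^1 \bigl( 2t^2x^2y + t^2x^2y \bigr)\,dt
         = 3x^2y \int_0^1 t^2\,dt
         = x^2y, \\
\nabla f(x, y) &= (2xy,\ x^2) = \mathbf{F}(x, y).
\end{align*}
```

In this instance the function produced by integrating from the base point is indeed a potential function, as the general argument predicts.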

9.7 Exercises for Chapter 9


Section 1 Definitions and examples

1.1. Let F be the vector field on R2 given by F(x, y) = (x, y).

(a) Sketch the vector field.


(b) Based on your drawing, describe some paths over which you expect the integral of F to
be positive. Find an example of such a path, and calculate the integral over it.
(c) Describe some paths over which you expect the integral to be 0. Find an example of
such a path, and calculate the integral over it.

1.2. Find $\int_C \mathbf{F} \cdot d\mathbf{s}$ if F(x, y) = (−y, x) and C is the circle $x^2 + y^2 = a^2$ of radius a, traversed
counterclockwise.

1.3. Find $\int_C \mathbf{F} \cdot d\mathbf{s}$ if $\mathbf{F}(x, y) = (\cos^2 x, \sin y \cos y)$ and C is the curve parametrized by $\alpha(t) = (t, t^2)$,
0 ≤ t ≤ π.
1.4. Find $\int_C \mathbf{F} \cdot d\mathbf{s}$ if F(x, y, z) = (x + y, y + z, x + z) and C is the curve parametrized by $\alpha(t) = (t, t^2, t^3)$,
0 ≤ t ≤ 1.

1.5. Find $\int_C \mathbf{F} \cdot d\mathbf{s}$ if F(x, y, z) = (−y, x, z) and C is the curve parametrized by α(t) = (cos t, sin t, t),
0 ≤ t ≤ 2π.

1.6. Consider the line integral $\int_C (x + y)\,dx + (y - x)\,dy$.

(a) What is the vector field that is being integrated?


(b) Evaluate the integral if C is the line segment parametrized by α(t) = (t, t), 0 ≤ t ≤ 1,
from (0, 0) to (1, 1).

(c) Evaluate the integral if C consists of the line segment from (0, 0) to (1, 0) followed by
the line segment from (1, 0) to (1, 1).
(d) Evaluate the integral if C consists of the line segment from (0, 0) to (0, 1) followed by
the line segment from (0, 1) to (1, 1).
1.7. Consider the line integral $\int_C xy\,dx + yz\,dy + xz\,dz$.

(a) What is the vector field that is being integrated?


(b) Evaluate the integral if C is the line segment from (1, 2, 3) to (4, 5, 6).
(c) Evaluate the integral if C is the curve from (1, 2, 3) to (4, 5, 6) consisting of three consec-
utive line segments, each parallel to one of the coordinate axes: from (1, 2, 3) to (4, 2, 3)
to (4, 5, 3) to (4, 5, 6).

1.8. Find $\int_C (x - y^2)\,dx + 3xy\,dy$ if $C$ is the portion of the parabola $y = x^2$ from $(-1, 1)$ to $(2, 4)$.

1.9. Find $\int_C y\,dx + z\,dy + x\,dz$ if $C$ is the curve of intersection in $\mathbb{R}^3$ of the cylinder $x^2 + y^2 = 1$
and the plane $z = x + y$, oriented counterclockwise as viewed from high above the $xy$-plane,
looking down.

For the line integrals in Exercises 1.10–1.11, (a) sketch the curve C over which the integral
is taken and (b) evaluate the integral. (Hint: Look for ways to reduce, or eliminate, messy
calculation.)
1.10. $\int_C 8z\,e^{(x+y)^2}\,dx + (2x^2 + y^2)^2\,dy + 6xy^2 z\,dz$, where $C$ is parametrized by:

\[ \alpha(t) = \bigl( (\sin t)^{t+12}, (\sin t)^{t+12}, (\sin t)^{t+12} \bigr), \quad 0 \le t \le \pi/2 \]




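1.11. $\int_C e^{8xy}\,dx - \ln\bigl( \cos^2(x + y) + \pi^x y^{1{,}000{,}000} \bigr)\,dy$, where $C$ is parametrized by:

\[ \alpha(t) = \begin{cases} (\cos t, \sin t) & \text{if } 0 \le t \le \pi, \\ (\cos t, -\sin t) & \text{if } \pi \le t \le 2\pi \end{cases} \]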

1.12. If $\alpha : [a, b] \to \mathbb{R}^n$ is a smooth parametrization of a curve $C$ and $f$ is a continuous real-valued
function defined on $C$, let us temporarily denote the integral with respect to arclength
by $\int_\alpha f\,ds$ rather than $\int_C f\,ds$. Write out the details that the integral is independent
of parametrization in the sense that, if $\alpha : [a, b] \to \mathbb{R}^n$ and $\beta : [c, d] \to \mathbb{R}^n$ are smooth
parametrizations of the same curve $C$, then:

\[ \int_\alpha f\,ds = \int_\beta f\,ds. \]

Consider both cases that α and β traverse C in the same or opposite directions. You may
assume α and β are one-to-one, except possibly at the endpoints of their respective domains.

1.13. In this exercise, we return to the geometry of curves in $\mathbb{R}^3$ and study the role of the
parametrization in defining curvature. Recall that, if $C$ is a curve in $\mathbb{R}^3$ with a smooth
parametrization $\alpha : I \to \mathbb{R}^3$, then the curvature is defined by the equation:

\[ \kappa_\alpha(t) = \frac{\|\mathbf{T}_\alpha'(t)\|}{v_\alpha(t)}, \]

where $\mathbf{T}_\alpha(t)$ is the unit tangent vector and $v_\alpha(t)$ is the speed. We have embellished our original
notation with the subscript $\alpha$ to emphasize our interest in the effect of the parametrization.
We assume that $v_\alpha(t) \ne 0$ for all $t$ so that $\kappa_\alpha(t)$ is defined.
Let β : J → R3 be another such parametrization of C. As in the text, we assume that α and
β are one-to-one, except possibly at the endpoints of their respective domains. As in equation
(9.3), there is a function g : J → I, assumed to be smooth, such that β(u) = α(g(u)) for all
u in J. In other words, g(u) is the value of t such that β(u) = α(t).

(a) Show that the velocities of $\alpha$ and $\beta$ are related by $\mathbf{v}_\beta(u) = \mathbf{v}_\alpha(g(u))\,g'(u)$.

(b) Show that the unit tangents are related by $\mathbf{T}_\beta(u) = \pm\mathbf{T}_\alpha(g(u))$.

(c) Show that $\kappa_\beta(u) = \kappa_\alpha(g(u))$.
In other words, if x = β(u) = α(t) is any point of C other than an endpoint, then
κβ (u) = κα (t). We denote this common value by κ(x), i.e., κ(x) is defined to be κα (t)
for any smooth parametrization α of C, where t is the value of the parameter such that
α(t) = x. In other words, the curvature at x depends only on the point x and not on
how C is parametrized.

Section 3 Conservative fields

3.1. Let F : R2 → R2 be the vector field F(x, y) = (y, x).

(a) Is F a conservative vector field? If so, find a potential function for it. If not, explain
why not.
(b) Find $\int_C \mathbf{F} \cdot d\mathbf{s}$ if $C$ is the line segment from the point $(4, 1)$ to the point $(2, 3)$.

3.2. Let F : R2 → R2 be the vector field F(x, y) = (y, xy − x).

(a) Show that F is not conservative.


(b) Find an oriented closed curve $C$ in $\mathbb{R}^2$ such that $\int_C \mathbf{F} \cdot d\mathbf{s} \ne 0$.

In Exercises 3.3–3.4, (a) find the curl of F and (b) determine if F is conservative. If it is
conservative, find a potential function for it, and, if not, explain why not.

3.3. $\mathbf{F}(x, y, z) = (3y^4 z^2, 4x^3 z^2, -3x^2 y^2)$

3.4. F(x, y, z) = (y − z, z − x, y − x)

In Exercises 3.5–3.8, determine whether the vector field F is conservative. If it is, find a potential
function for it. If not, explain why not.

3.5. $\mathbf{F}(x, y) = (x^2 - y^2, 2xy)$

3.6. $\mathbf{F}(x, y) = (x^2 + y^2, 2xy)$

3.7. F(x, y, z) = (x + yz, y + xz, z + xy)

3.8. F(x, y, z) = (x − yz, −y + xz, z − xy)



3.9. Let $p$ be a positive real number, and let $\mathbf{F}$ be the vector field on $\mathbb{R}^3 - \{(0, 0, 0)\}$ given by:

\[ \mathbf{F}(x, y, z) = \left( -\frac{x}{(x^2 + y^2 + z^2)^p}, -\frac{y}{(x^2 + y^2 + z^2)^p}, -\frac{z}{(x^2 + y^2 + z^2)^p} \right). \]

For instance, the inverse square field is the case $p = 3/2$.
(a) The norm of $\mathbf{F}$ is given by $\|\mathbf{F}(x, y, z)\| = \|(x, y, z)\|^q$ for some exponent $q$. Find $q$ in
terms of $p$.
(b) For which values of p is F a conservative vector field? The case p = 1 may require special
attention.
3.10. Find $\int_C e^y\,dx + x e^y\,dy$ if $C$ is the curve parametrized by $\alpha(t) = (e^{t^2}, t^3)$, $0 \le t \le 1$.

3.11. Let $a$ and $b$ be real numbers. If $C$ is a piecewise smooth oriented closed curve in the punctured
plane $\mathbb{R}^2 - \{(0, 0)\}$, show that:

\[ \int_C \frac{ax}{x^2 + y^2}\,dx + \frac{by}{x^2 + y^2}\,dy = \int_C \frac{(b - a)y}{x^2 + y^2}\,dy. \]

3.12. Let $\mathbf{G}(x, y, z) = \left( -\frac{x}{(x^2 + y^2 + z^2)^{3/2}}, -\frac{y}{(x^2 + y^2 + z^2)^{3/2}}, -\frac{z}{(x^2 + y^2 + z^2)^{3/2}} \right)$ be the inverse square field.
Find the integral of $\mathbf{G}$ over the curve parametrized by $\alpha(t) = (1 + t + t^2, t + t^2 + t^3, t^2 + t^3 + t^4)$,
$0 \le t \le 1$.
3.13. Find $\int_C (y + z)\,dx + (x + z)\,dy + (x + y)\,dz$ if $C$ is the curve in $\mathbb{R}^3$ from the origin to $(1, 1, 1)$
that consists of the sequence of line segments, each parallel to one of the coordinate axes,
from $(0, 0, 0)$ to $(1, 0, 0)$ to $(1, 1, 0)$ and finally to $(1, 1, 1)$.
3.14. Let $C$ be a piecewise smooth oriented curve in $\mathbb{R}^3$ from a point $\mathbf{p} = (x_1, y_1, z_1)$ to a point
$\mathbf{q} = (x_2, y_2, z_2)$. Show that $\int_C 1\,dx = x_2 - x_1$. (Similar formulas apply to $\int_C 1\,dy$ and $\int_C 1\,dz$.)
3.15. (a) Find a vector field $\mathbf{F} = (F_1, F_2)$ on $\mathbb{R}^2$ with the property that, whenever $C$ is a piecewise
smooth oriented curve in $\mathbb{R}^2$, then:

\[ \int_C F_1\,dx + F_2\,dy = \|\mathbf{q}\|^2 - \|\mathbf{p}\|^2, \]

where $\mathbf{p}$ and $\mathbf{q}$ are the starting and ending points, respectively, of $C$.


(b) Sketch the vector field F that you found in part (a).
3.16. Does there exist a vector field $\mathbf{F}$ on $\mathbb{R}^n$ with the property that, whenever $C$ is a piecewise
smooth oriented curve in $\mathbb{R}^n$, then:

\[ \int_C \mathbf{F} \cdot d\mathbf{s} = \mathbf{p} \cdot \mathbf{q}, \]

where $\mathbf{p}$ and $\mathbf{q}$ are the starting and ending points, respectively, of $C$? Either describe such a
vector field as precisely as you can, or explain why none exists.
3.17. Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be smooth real-valued functions. Prove that, for all piecewise smooth
oriented closed curves $C$ in $\mathbb{R}^n$:

\[ \int_C (f \nabla g) \cdot d\mathbf{s} = -\int_C (g \nabla f) \cdot d\mathbf{s}. \]

(Hint: See Exercise 1.19 in Chapter 4.)


3.18. For a vector field F = (F1 , F2 , F3 ) on an open set in R3 , is the cross product ∇ × F, i.e., the
curl, necessarily orthogonal to F? Either prove that it is, or find an example where it isn’t.

3.19. Newton’s second law of motion, F = ma, relates the force F acting on an object to the
object’s mass m and acceleration a. Let F be a smooth force field on an open set of R3 , and
assume that an object of mass m travels under the influence of F along a curve C having a
smooth parametrization α : [a, b] → R3 with velocity v(t) and acceleration a(t).

(a) The quantity $K(t) = \frac{1}{2} m \|\mathbf{v}(t)\|^2$ is called the kinetic energy of the object. Use Newton’s
second law to show that:

\[ \int_C \mathbf{F} \cdot d\mathbf{s} = K(b) - K(a). \]

In other words, the work done by the force equals the change in the object’s kinetic
energy.
(b) Assume, in addition, that $\mathbf{F}$ is a conservative field with potential function $f$. Show that
the function

\[ E(t) = \frac{1}{2} m \|\mathbf{v}(t)\|^2 - f(\alpha(t)) \]

is constant. (The function −f is called a potential energy for F. The fact that, for a
conservative force field F, the sum of the kinetic and potential energies along an object’s
path is constant is known as the principle of conservation of energy.)

3.20. Let $\mathbf{F}$ be a smooth vector field on an open set $U$ in $\mathbb{R}^n$. A smooth path $\alpha : I \to U$ defined
on an interval $I$ is called an integral path of $\mathbf{F}$ if $\alpha'(t) = \mathbf{F}(\alpha(t))$ for all $t$. In other words, at
time $t$, when the position is the point $\alpha(t)$, the velocity equals the vector field vector at that
point. (See page 201.)

(a) If $\alpha : [a, b] \to U$ is a nonconstant integral path of $\mathbf{F}$ and $C$ is the curve parametrized by
$\alpha$, show that $\int_C \mathbf{F} \cdot d\mathbf{s} > 0$. (Hint: The assumption that $\alpha$ is not constant means that
there is some point $t$ in $[a, b]$ where $\alpha'(t) \ne \mathbf{0}$.)
(b) Let f : U → R be a smooth real-valued function on U , and let F = ∇f . If α is a
nonconstant integral path of F going from a point p to a point q, show that f (q) > f (p).
(c) Show that a conservative vector field F cannot have a nonconstant closed integral path.

3.21. Let U be an open set in Rn with the property that every pair of points p, q in U can be joined
by a piecewise smooth path in U . Let f : U → R be a smooth real-valued function defined on
U . If ∇f (x) = 0 for all x in U , use line integrals to give a simple proof that f is a constant
function.

Section 4 Green’s theorem

4.1. Find $\int_C (x^2 - y^2)\,dx + 2xy\,dy$ if $C$ is the boundary of the square with vertices $(0, 0)$, $(1, 0)$,
$(1, 1)$, and $(0, 1)$, oriented counterclockwise.

4.2. Find $\int_C (2xy^2 - y^3)\,dx + (x^2 y + x^3)\,dy$ if $C$ is the unit circle $x^2 + y^2 = 1$, oriented
counterclockwise.
4.3. Find $\int_C -y \cos x\,dx + xy\,dy$ if $C$ is the boundary of the parallelogram with vertices $(0, 0)$,
$(2\pi, \pi)$, $(3\pi, 3\pi)$, and $(\pi, 2\pi)$, oriented counterclockwise.

Figure 9.22: An oval track

4.4. Find $\int_C (e^{x+2y} - 3y)\,dx + (4x + 2e^{x+2y})\,dy$ if $C$ is the track shown in Figure 9.22 consisting
of two straightaways joined by semicircles at each end, oriented counterclockwise.

4.5. (a) Use the parametrization $\alpha(t) = (a \cos t, a \sin t)$, $0 \le t \le 2\pi$, and Corollary 9.14 to verify
that the area of the disk $x^2 + y^2 \le a^2$ of radius $a$ is $\pi a^2$.
(b) By modifying the parametrization in part (a), use Corollary 9.14 to find the area of the
region $\frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1$ inside an ellipse.

In Exercises 4.6–4.11, evaluate the line integral using whatever methods seem best.
4.6. $\int_C (x + y)\,dx + (x - y)\,dy$, where $C$ is the curve consisting of the line segment from $(0, 1)$ to
$(1, 0)$ followed by the line segment from $(1, 0)$ to $(2, 1)$
x
4.7. C x2e+y2 dx − x2 +y
x 2 2 2
R
2 dy, where C is the unit circle x + y = 1 in R , oriented counterclockwise

4.8. $\int_C (3x^2 y - xy^2 - 3x^2 y^2)\,dx + (x^3 - 2x^3 y + x^2 y)\,dy$, where $C$ is the closed triangular curve in
$\mathbb{R}^2$ with vertices $(0, 0)$, $(1, 1)$, and $(0, 1)$, oriented counterclockwise

4.9. $\int_C yz^2\,dx + x^4 z\,dy + x^2 y^2\,dz$, where $C$ is the curve parametrized by $\alpha(t) = (t, t^2, t^3)$, $0 \le t \le 1$

4.10. $\int_C (6x^2 - 4y + 2xy)\,dx + (2x - 2\sin y + 3x^2)\,dy$, where $C$ is the diamond-shaped curve $|x| + |y| = 1$,
oriented counterclockwise

4.11. $\int_C yz\,dx + xz\,dy + xy\,dz$, where $C$ is the curve of intersection in $\mathbb{R}^3$ of the cylinder $x^2 + y^2 = 1$
and the saddle surface $z = x^2 - y^2$, oriented counterclockwise as viewed from high above the
$xy$-plane, looking down

Let $U$ be an open set in $\mathbb{R}^2$. A smooth real-valued function $h : U \to \mathbb{R}$ is called harmonic on
$U$ if:

\[ \frac{\partial^2 h}{\partial x^2} + \frac{\partial^2 h}{\partial y^2} = 0 \quad \text{at all points of } U. \]

Exercises 4.12–4.15 concern harmonic functions.

4.12. Show that $h(x, y) = e^{-x} \sin y$ is a harmonic function on $\mathbb{R}^2$.

4.13. Let $D$ be a bounded subset of $U$ whose boundary consists of a finite number of piecewise
smooth simple closed curves. If $h$ is harmonic on $U$, prove that:

\[ \int_{\partial D} -h \frac{\partial h}{\partial y}\,dx + h \frac{\partial h}{\partial x}\,dy = \iint_D \|\nabla h(x, y)\|^2\,dx\,dy. \]


4.14. Let D be a region as in the preceding exercise, and suppose that every pair of points p, q in
D can be joined by a piecewise smooth path in D. If h is harmonic on U and if h = 0 at all
points of ∂D, prove that h = 0 on all of D. (Hint: See Exercises 2.4 of Chapter 5 and 3.21
of this chapter.)

4.15. Continuing with the previous exercise, prove that a harmonic function is completely deter-
mined on D by its values on the boundary in the sense that, if h1 and h2 are harmonic on U
and if h1 = h2 at all points of ∂D, then h1 = h2 on all of D.

Section 5 The vector field W

5.1. Let $\mathbf{F} = (F_1, F_2)$ be a smooth vector field that:

• is defined everywhere in $\mathbb{R}^2$ except at three points $\mathbf{p}, \mathbf{q}, \mathbf{r}$ and

• satisfies the mixed partials condition $\frac{\partial F_2}{\partial x} = \frac{\partial F_1}{\partial y}$ everywhere in its domain.

Let $C_1, C_2, C_3$ be small circles centered at $\mathbf{p}, \mathbf{q}, \mathbf{r}$, respectively, oriented counterclockwise as
shown in Figure 9.23. Assume that:

\[ \int_{C_1} F_1\,dx + F_2\,dy = 12, \quad \int_{C_2} F_1\,dx + F_2\,dy = 10, \quad \text{and} \quad \int_{C_3} F_1\,dx + F_2\,dy = 15. \]

Figure 9.23: Three circles C1 , C2 , and C3 and a fish-shaped curve C

(a) Find $\int_C F_1\,dx + F_2\,dy$ if $C$ is the oriented fish-shaped curve shown in the figure.
(b) Draw a piecewise smooth oriented closed curve $K$ such that $\int_K F_1\,dx + F_2\,dy = 1$. Your
curve need not be simple. Include in your drawing the direction in which $K$ is traversed.

Section 6 The converse of the mixed partials theorem

6.1. (a) Let C be the curve in R2 − {(0, 0)} parametrized by α(t) = (cos t, − sin t), 0 ≤ t ≤ 2π.
Describe C geometrically, and calculate its winding number using the definition.
(b) More generally, let n be an integer (positive, negative, or zero allowed), and let Cn be the
curve parametrized by α(t) = (cos nt, sin nt), 0 ≤ t ≤ 2π. Give a geometric description
of Cn , and calculate its winding number using the definition.

6.2. Let C be a piecewise smooth simple closed curve in R2 , oriented counterclockwise, such that
the origin lies in the exterior of C. What are the possible values of the winding number of
C?

6.3. Let $\mathbf{F} = (F_1, F_2)$ be a smooth vector field on the punctured plane $U = \mathbb{R}^2 - \{(0, 0)\}$ such that
$\frac{\partial F_2}{\partial x} = \frac{\partial F_1}{\partial y}$ on $U$, and let $C$ be a piecewise smooth oriented closed curve in $U$.
(a) If $C$ does not intersect the $x$-axis, explain why $\int_C \mathbf{F} \cdot d\mathbf{s} = 0$.
(b) Suppose instead that $C$ may intersect the positive $x$-axis but not the negative $x$-axis. Is
it necessarily still true that $\int_C \mathbf{F} \cdot d\mathbf{s} = 0$? Explain.

6.4. (a) Let c = (c, d) be a point of R2 , and let B = B(c, r) be an open ball centered at c. For
the moment, we think of R2 as the uv-plane to avoid having x and y mean too many
different things later on. Let F(u, v) = (F1 (u, v), F2 (u, v)) be a continuous vector field
on B.
Given any point x = (x, y) in B, let C1 be the curve from (c, d) to (x, y) consisting of
a vertical line segment followed by a horizontal line segment, as indicated on the left of
Figure 9.24. Define a function f1 : B → R by:
\[ f_1(x, y) = \int_{C_1} F_1\,du + F_2\,dv. \]

Show that $\frac{\partial f_1}{\partial x}(x, y) = F_1(x, y)$ for all $(x, y)$ in $B$. (Hint: Parametrize $C_1$ and use the
fundamental theorem of calculus, keeping in mind what a partial derivative is.)

Figure 9.24: The curves of integration C1 and C2 for functions f1 and f2 , respectively

(b) Similarly, let C2 be the curve from (c, d) to (x, y) consisting of a horizontal segment
followed by a vertical segment, as shown on the right of Figure 9.24, and define:
\[ f_2(x, y) = \int_{C_2} F_1\,du + F_2\,dv. \]

Show that $\frac{\partial f_2}{\partial y}(x, y) = F_2(x, y)$ for all $(x, y)$ in $B$.
(c) Now, let U be an open set in R2 such that any two points of U can be joined by a
piecewise smooth path in U . Let a be a point of U . Assume that F is a smooth vector
field on U such that the function f : U → R given in equation (9.14) is well-defined.
That is:

\[ f(\mathbf{x}) = \int_{\mathbf{a}}^{\mathbf{x}} F_1\,du + F_2\,dv, \]

where the notation means $\int_C F_1\,du + F_2\,dv$ for any piecewise smooth curve $C$ in $U$ from
$\mathbf{a}$ to $\mathbf{x}$.
Show that $f$ is a potential function for $\mathbf{F}$ on $U$, i.e., $\frac{\partial f}{\partial x} = F_1$ and $\frac{\partial f}{\partial y} = F_2$. In particular,
F is a conservative vector field. (Hint: Given a point c in U , choose an open ball
B = B(c, r) that is contained in U . Define functions f1 and f2 as in parts (a) and (b).
For x in B, how is f (x) related to f (c) + f1 (x) and f (c) + f2 (x)?)
Chapter 10

Surface integrals

We next study how to integrate vector fields over surfaces. One important caveat: our discussion
applies only to surfaces in R3 . For surfaces in Rn with n > 3, the expressions that one integrates
are more complicated than vector fields.
It is easy to get lost in the weeds with the details of surface integrals, so, in the first section
of the chapter, we just try to build some intuition about what the integral measures and how to
compute that measurement. The formal definition of the integral appears in the second section.
Calculating surface integrals is not necessarily difficult but it can be messy, so having a range of
options is especially welcome. In addition to the definition and the intuitive approach, sometimes
a surface integral can be converted to a different type of integral altogether. For example, one
of the theorems of the chapter, Stokes’s theorem, relates surface integrals to line integrals, and
another, Gauss’s theorem, relates them to triple integrals. It would be a mistake to think of these
results primarily as computational tools, however. Their theoretical consequences are at least as
important, and we try to give a taste of some of the new lines of reasoning about integrals that
they open up. The chapter closes with a couple of sections in which we tie up some loose ends.
This also puts us in position to wrap things up in the next, and final, chapter.

10.1 What the surface integral measures


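If $S$ is a surface in $\mathbb{R}^3$, we learned in Section 5.5 how to integrate a continuous real-valued function
$f : S \to \mathbb{R}$ over $S$. By definition:

\[ \iint_S f\,dS = \iint_D f(\sigma(s, t)) \left\| \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \right\| ds\,dt, \tag{10.1} \]
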

where σ : D → R3 is a smooth parametrization of S defined on a subset D of the st-plane, as in


Figure 10.1. This is called the integral with respect to surface area. As before, we assume now
and in the future that D is the closure of a bounded open set in R2 , that is, a bounded open set
together with all its boundary points, such as a closed disk or rectangle (see page 134). As a matter
of full disclosure, we did not show that the value of the integral is independent of the particular
parametrization σ, a point that we address at the end of the chapter.
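
As a rough numerical illustration of formula (10.1), the sketch below approximates the double integral over $D$ with a midpoint rule and estimates the partial derivatives of $\sigma$ by central differences. The helper names (`cross`, `partials`, `integrate_scalar`), the grid size, and the step size are all choices made for this illustration, not anything taken from the text. With $f = 1$ and the spherical-coordinate parametrization of a sphere of radius $a = 2$, the result should be close to the surface area $4\pi a^2$.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def partials(sigma, s, t, h=1e-6):
    """Central-difference estimates of d(sigma)/ds and d(sigma)/dt at (s, t)."""
    d_s = tuple((p - m) / (2 * h) for p, m in zip(sigma(s + h, t), sigma(s - h, t)))
    d_t = tuple((p - m) / (2 * h) for p, m in zip(sigma(s, t + h), sigma(s, t - h)))
    return d_s, d_t

def integrate_scalar(f, sigma, s_range, t_range, n=200):
    """Midpoint-rule estimate of the right side of (10.1) over a rectangle D."""
    (s0, s1), (t0, t1) = s_range, t_range
    ds, dt = (s1 - s0) / n, (t1 - t0) / n
    total = 0.0
    for i in range(n):
        s = s0 + (i + 0.5) * ds
        for j in range(n):
            t = t0 + (j + 0.5) * dt
            d_s, d_t = partials(sigma, s, t)
            normal = cross(d_s, d_t)
            stretch = math.sqrt(sum(c * c for c in normal))  # norm of the cross product
            total += f(*sigma(s, t)) * stretch * ds * dt
    return total

a = 2.0  # radius chosen for the illustration

def sphere(phi, theta):
    """Spherical-coordinate parametrization of the sphere of radius a."""
    return (a * math.sin(phi) * math.cos(theta),
            a * math.sin(phi) * math.sin(theta),
            a * math.cos(phi))

area = integrate_scalar(lambda x, y, z: 1.0, sphere, (0.0, math.pi), (0.0, 2 * math.pi))
print(area, 4 * math.pi * a ** 2)  # the two values should nearly agree
```

Replacing the constant function by another continuous $f$ gives an estimate of the corresponding integral with respect to surface area, subject to the accuracy of the grid.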
Now, let F : U → R3 be a vector field on an open set U in R3 that contains S. The idea is
that the integral of F over S measures the amount that F flows through S. This quantity is called
the flux. For example, if F points in a direction normal to the surface, then it is flowing through
effectively, whereas, if it points in a tangent direction, it is not flowing through at all. This stands


Figure 10.1: A parametrization σ of a surface

in contrast with the interpretation of the line integral, where we were interested in the degree to
which the vector field flowed along a curve.
To begin, we need to designate a direction of flow through S that is considered to be positive.
Definition. An orientation of a smooth surface S in R3 is a continuous vector field n : S → R3
such that, for every x in S, n(x) is a unit normal vector to S at x. See Figure 10.2. An oriented
surface is a surface S together with a specified orientation n.

Figure 10.2: A surface S with orienting unit normal vector field n

Example 10.1. Suppose that $S$ is a level set defined by $f(x, y, z) = c$ for some smooth real-valued
function $f$ of three variables. If $\mathbf{x} \in S$, then, by Proposition 4.17, $\nabla f(\mathbf{x})$ is a normal vector to $S$
at $\mathbf{x}$, so the formula $\mathbf{n}(\mathbf{x}) = \frac{1}{\|\nabla f(\mathbf{x})\|} \nabla f(\mathbf{x})$ describes an orientation of $S$, provided that $\nabla f(\mathbf{x}) \ne \mathbf{0}$
for all $\mathbf{x}$ in $S$.
For instance, let $S$ be the sphere $x^2 + y^2 + z^2 = a^2$ of radius $a$. Here, $f(x, y, z) = x^2 + y^2 + z^2$,
so $\nabla f = (2x, 2y, 2z) = 2(x, y, z)$. As a result, an orientation of the sphere is given by:

\[ \mathbf{n}(x, y, z) = \frac{1}{2\sqrt{x^2 + y^2 + z^2}}\, 2(x, y, z) = \frac{1}{\sqrt{a^2}} (x, y, z) = \frac{1}{a} (x, y, z) \]

for all points $(x, y, z)$ on the sphere. This vector points away from the origin, so we say that it
orients $S$ with the outward normal. See Figure 10.3, left.
We could equally well orient the sphere with the inward unit normal. This would be given by
$\mathbf{n}(x, y, z) = -\frac{1}{a} (x, y, z)$.
Or let $S$ be the circular cylinder $x^2 + y^2 = a^2$. This is a level set of $f(x, y, z) = x^2 + y^2$. Then
$\nabla f = (2x, 2y, 0) = 2(x, y, 0)$, and an orientation is:

\[ \mathbf{n}(x, y, z) = \frac{1}{2\sqrt{x^2 + y^2}}\, 2(x, y, 0) = \frac{1}{a} (x, y, 0) \]

for all $(x, y, z)$ on the cylinder. Once again, this is the outward normal (Figure 10.3, right).
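
The recipe of Example 10.1, normalizing the gradient of the defining function, can be sketched numerically as follows. This is only an illustration: the function names `gradient` and `unit_normal` are hypothetical, and the gradient is estimated by central differences rather than computed symbolically. For the sphere of radius 3 and the point $(1, 2, 2)$ on it, the output should be close to $\frac{1}{3}(1, 2, 2)$, in agreement with the formula $\mathbf{n} = \frac{1}{a}(x, y, z)$.

```python
import math

def gradient(f, p, h=1e-6):
    """Central-difference estimate of the gradient of f at the point p = (x, y, z)."""
    x, y, z = p
    return ((f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
            (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
            (f(x, y, z + h) - f(x, y, z - h)) / (2 * h))

def unit_normal(f, p):
    """Unit normal to the level set of f through p, in the direction of the gradient."""
    g = gradient(f, p)
    length = math.sqrt(sum(c * c for c in g))
    if length == 0.0:
        raise ValueError("gradient vanishes at p, so this formula gives no orientation there")
    return tuple(c / length for c in g)

def sphere_function(x, y, z):
    return x * x + y * y + z * z

def cylinder_function(x, y, z):
    return x * x + y * y

point_on_sphere = (1.0, 2.0, 2.0)      # lies on the sphere of radius 3
point_on_cylinder = (3.0, 4.0, 7.0)    # lies on the cylinder of radius 5

n_sphere = unit_normal(sphere_function, point_on_sphere)
n_cylinder = unit_normal(cylinder_function, point_on_cylinder)
print(n_sphere)     # approximately (1/3, 2/3, 2/3)
print(n_cylinder)   # approximately (3/5, 4/5, 0)
```

Negating the returned vector gives the opposite (inward) orientation.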

Figure 10.3: A sphere and cylinder oriented by the outward normal

Figure 10.4: A mammal with orientation8

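In general, let $S$ be an oriented surface with orienting normal $\mathbf{n}$. To measure the flow of a vector
field $\mathbf{F} = (F_1, F_2, F_3)$ through $S$, we find the scalar component $F_{\text{norm}}$ of $\mathbf{F}$ in the normal direction
at every point $\mathbf{x}$ of $S$ and integrate it over $S$. This component is given by $F_{\text{norm}} = \|\mathbf{F}\| \cos\theta$,
where $\theta$ is the angle between $\mathbf{F}$ and $\mathbf{n}$, as shown in Figure 10.5.

Figure 10.5: The component of a vector field in the normal direction at a point $\mathbf{x}$

Hence:

\[ F_{\text{norm}} = \|\mathbf{F}\| \cos\theta = \|\mathbf{F}\|\,\|\mathbf{n}\| \cos\theta = \mathbf{F} \cdot \mathbf{n}. \]

The integral of $\mathbf{F}$ over $S$, denoted $\iint_S \mathbf{F} \cdot d\mathbf{S}$, can be expressed as:

\[ \iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_S F_{\text{norm}}\,dS = \iint_S \mathbf{F} \cdot \mathbf{n}\,dS. \tag{10.2} \]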
8 Photo credit: https://pixabay.com/photos/nature-animal-porcupine-mammal-3588682/ by analogicus under
Pixabay license.


As the integral of the real-valued function F · n, the expression on the right is an integral with
respect to surface area of the type considered previously in (10.1). We take equation (10.2) as a
tentative definition of the integral of F and use it to work through some examples.
Example 10.2. Let $S$ be the sphere $x^2 + y^2 + z^2 = a^2$ of radius $a$, oriented by the outward normal.
Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if:
(a) $\mathbf{F}(x, y, z) = -\frac{1}{(x^2 + y^2 + z^2)^{3/2}} (x, y, z)$ (i.e., the inverse square field),

(b) $\mathbf{F}(x, y, z) = (0, 0, z)$,

(c) $\mathbf{F}(x, y, z) = (0, 0, -1)$.

As shown in Example 10.1, the outward unit normal to $S$ is given by $\mathbf{n} = \frac{1}{a} (x, y, z)$.

(a) The inverse square field, illustrated in Figure 10.6, points directly inwards towards the origin,

Figure 10.6: The inverse square field

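so $\mathbf{F} \cdot \mathbf{n}$ is negative at every point of $S$. In other words, if the positive direction is taken to be
outward, then $\mathbf{F}$ is flowing in the negative direction. Thus we expect the integral to be negative.
To calculate it precisely, note that on $S$, $\mathbf{F}(x, y, z) = -\frac{1}{(a^2)^{3/2}} (x, y, z) = -\frac{1}{a^3} (x, y, z)$, so:

\[ \mathbf{F} \cdot \mathbf{n} = -\frac{1}{a^3} (x, y, z) \cdot \frac{1}{a} (x, y, z) = -\frac{1}{a^4} (x^2 + y^2 + z^2) = -\frac{1}{a^4} \cdot a^2 = -\frac{1}{a^2}. \]

Thus:

\[ \iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_S \mathbf{F} \cdot \mathbf{n}\,dS = \iint_S -\frac{1}{a^2}\,dS = -\frac{1}{a^2}\,\text{Area}(S) = -\frac{1}{a^2} \cdot 4\pi a^2 = -4\pi. \]
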
This uses the formula that the surface area is 4πa2 from Example 5.19 in Chapter 5. We shall refer
to this surface integral later, so to repeat for emphasis: the integral of the inverse square field over
a sphere of radius a centered at the origin and with the outward orientation is −4π, regardless of
the radius of the sphere.
(b) The vector field F(x, y, z) = (0, 0, z) points straight upward when z > 0 and straight downward
when z < 0 (Figure 10.7). In either region, the component in the outward direction of S is positive.
Thus we expect the integral to be positive.
In fact, $\mathbf{F} \cdot \mathbf{n} = (0, 0, z) \cdot \frac{1}{a} (x, y, z) = \frac{1}{a} z^2$, so:

\[ \iint_S \mathbf{F} \cdot d\mathbf{S} = \frac{1}{a} \iint_S z^2\,dS. \]

Figure 10.7: The vector field F(x, y, z) = (0, 0, z) flowing through a sphere

This last integral is precisely one of the examples we calculated using a parametrization when we
studied integrals with respect to surface area (Example 5.19 again). The answer is $\iint_S z^2\,dS = \frac{4}{3} \pi a^4$.
(We also gave a symmetry argument that made use of the facts that $\iint_S x^2\,dS = \iint_S y^2\,dS = \iint_S z^2\,dS$
and $\iint_S (x^2 + y^2 + z^2)\,dS = \iint_S a^2\,dS = a^2 \cdot (4\pi a^2) = 4\pi a^4$.) Thus:

\[ \iint_S \mathbf{F} \cdot d\mathbf{S} = \frac{1}{a} \cdot \frac{4}{3} \pi a^4 = \frac{4}{3} \pi a^3. \]

We shall take another look at this example later as well.


(c) Here, F(x, y, z) = (0, 0, −1) is a constant vector field consisting of unit vectors that point straight
down everywhere (Figure 10.8). Roughly speaking, F points inward on the upper hemisphere, i.e.,
the component in the outward direction is negative, and outward on the lower hemisphere. From
the symmetry of the situation, we might expect these two contributions to cancel out exactly, giving
an integral of 0. Informally, the flow in equals the flow out, so the net flow is 0.

Figure 10.8: The vector field F(x, y, z) = (0, 0, −1) flowing through a sphere

To verify this, note that $\mathbf{F} \cdot \mathbf{n} = (0, 0, -1) \cdot \frac{1}{a} (x, y, z) = -\frac{1}{a} z$. Thus:

\[ \iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_S -\frac{1}{a} z\,dS = -\frac{1}{a} \iint_S z\,dS. \]

By symmetry, this last integral is indeed 0. More precisely, $S$ is symmetric in the $xy$-plane, i.e.,
$(x, y, -z)$ is in $S$ whenever $(x, y, z)$ is, and the integrand $f(x, y, z) = z$ satisfies $f(x, y, -z) = -z = -f(x, y, z)$,
so the contributions of the northern and southern hemispheres cancel. Thus
$\iint_S \mathbf{F} \cdot d\mathbf{S} = 0$.
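
The three answers in Example 10.2 can be checked approximately by summing $\mathbf{F} \cdot \mathbf{n}$ against small pieces of surface area. The sketch below is one way to do that, under assumptions that go slightly beyond this section: it uses the spherical-coordinate parametrization of the sphere and the area element $a^2 \sin\phi \, d\phi \, d\theta$ (the norm of the cross product referred to in Example 5.19), together with a midpoint rule. The function name `sphere_flux` and the grid size are illustrative choices.

```python
import math

def sphere_flux(F, a, n=200):
    """Midpoint-rule estimate of the integral of F . n dS over the sphere of
    radius a centered at the origin, with n the outward unit normal."""
    dphi, dtheta = math.pi / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        for j in range(n):
            theta = (j + 0.5) * dtheta
            x = a * math.sin(phi) * math.cos(theta)
            y = a * math.sin(phi) * math.sin(theta)
            z = a * math.cos(phi)
            fx, fy, fz = F(x, y, z)
            f_norm = (fx * x + fy * y + fz * z) / a          # F . n with n = (x, y, z)/a
            total += f_norm * a * a * math.sin(phi) * dphi * dtheta
    return total

def inverse_square(x, y, z):
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-x / r3, -y / r3, -z / r3)

def vertical_stretch(x, y, z):
    return (0.0, 0.0, z)

def constant_down(x, y, z):
    return (0.0, 0.0, -1.0)

flux_a_small = sphere_flux(inverse_square, 1.0)   # part (a), radius 1
flux_a_large = sphere_flux(inverse_square, 5.0)   # part (a), radius 5
flux_b = sphere_flux(vertical_stretch, 2.0)       # part (b), radius 2
flux_c = sphere_flux(constant_down, 2.0)          # part (c), radius 2

print(flux_a_small, flux_a_large, -4 * math.pi)   # both close to -4*pi
print(flux_b, 4 / 3 * math.pi * 2.0 ** 3)         # close to (4/3)*pi*a^3
print(flux_c)                                     # close to 0
```

The first two printed values illustrate the remark in part (a) that the flux of the inverse square field does not depend on the radius.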


In this last example, if we had integrated instead only over the northern hemisphere, the same
line of thought would lead us to expect that the integral is negative. As a test of your intuition
about what the integral represents, you might think about whether this is related to the following
example.

Example 10.3. Let $S$ be the disk of radius $a$ in the $xy$-plane described by:

\[ x^2 + y^2 \le a^2, \quad z = 0, \]

oriented by the upward normal, and let $\mathbf{F}(x, y, z) = (0, 0, -1)$ as in Example 10.2(c) above. See
Figure 10.9. Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$.

Figure 10.9: The vector field F(x, y, z) = (0, 0, −1) flowing through a disk in the xy-plane

The upward unit normal to $S$ is $\mathbf{n} = \mathbf{k} = (0, 0, 1)$, so $\mathbf{F} \cdot \mathbf{n} = (0, 0, -1) \cdot (0, 0, 1) = -1$. Therefore:

\[ \iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_S -1\,dS = -\text{Area}(S) = -\pi a^2. \]

Continuing with the comments right before the example, the result turned out to be negative,
but is the flow of F = (0, 0, −1) through the disk really related to the flow through the northern
hemisphere? We shall be able to give a precise answer based on general principles once we learn a
little more about surface integrals (see Exercises 4.4 and 5.5 at the end of the chapter).

10.2 The definition of the surface integral


We now present the actual definition of the vector field surface integral $\iint_S \mathbf{F} \cdot d\mathbf{S}$. This follows
a strategy we have used before: to define this new type of integral, we parametrize the surface,
substitute for x, y, and z in terms of the two parameters, and then pull back to a double integral
in the parameter plane. Of course, we need to figure out what the pullback should be.
Let S be an oriented surface in R3 with orienting normal n, and let σ : D → R3 be a smooth
parametrization of S with domain D in R2 :

σ(s, t) = (x(s, t), y(s, t), z(s, t)), (s, t) ∈ D.

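We first describe how $\sigma$ can be used to give an independent orientation of $S$. Recall from Section
5.5 that the vectors $\frac{\partial \sigma}{\partial s} = \left( \frac{\partial x}{\partial s}, \frac{\partial y}{\partial s}, \frac{\partial z}{\partial s} \right)$ and $\frac{\partial \sigma}{\partial t} = \left( \frac{\partial x}{\partial t}, \frac{\partial y}{\partial t}, \frac{\partial z}{\partial t} \right)$ are “velocity” vectors with respect to
$s$ and $t$, respectively, and thus are tangent to $S$. As a result, $\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t}$ is normal to $S$, and
$\frac{\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t}}{\left\| \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \right\|}$
is a unit normal, i.e., an orientation of $S$, provided that $\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \ne \mathbf{0}$. Since the orientation $\mathbf{n}$ that
is given for $S$ is also a unit normal, it must be true that:

\[ \mathbf{n} = \pm \frac{\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t}}{\left\| \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \right\|}. \tag{10.3} \]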

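We say that $\sigma$ preserves orientation if the sign is $+$ and that it reverses orientation if $-$.
Now, to formulate the definition of the surface integral, recall that the idea was that $\iint_S \mathbf{F} \cdot d\mathbf{S}$
should equal $\iint_S \mathbf{F} \cdot \mathbf{n}\,dS$. Using (10.3) to substitute for $\mathbf{n}$, this becomes
$\pm \iint_S \mathbf{F} \cdot \frac{\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t}}{\left\| \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \right\|}\,dS$. Then
the definition of the integral with respect to surface area (10.1) with
$f = \mathbf{F} \cdot \frac{\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t}}{\left\| \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \right\|}$ gives:

\[ \begin{aligned}
\iint_S \mathbf{F} \cdot \mathbf{n}\,dS &= \pm \iint_D \mathbf{F}(\sigma(s, t)) \cdot \frac{\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t}}{\left\| \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \right\|} \left\| \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \right\| ds\,dt \\
&= \pm \iint_D \mathbf{F}(\sigma(s, t)) \cdot \left( \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \right) ds\,dt.
\end{aligned} \]
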
This is the thinking behind the following definition.

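Definition. Let $U$ be an open set in $\mathbb{R}^3$, and let $\mathbf{F} : U \to \mathbb{R}^3$ be a continuous vector field. Let $S$
be an oriented surface contained in $U$, and let $\sigma : D \to \mathbb{R}^3$ be a smooth parametrization of $S$ such
that $\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t}$ is never $\mathbf{0}$.⁹ Then the integral of $\mathbf{F}$ over $S$, denoted $\iint_S \mathbf{F} \cdot d\mathbf{S}$, is defined by:

\[ \iint_S \mathbf{F} \cdot d\mathbf{S} = \pm \iint_D \mathbf{F}(\sigma(s, t)) \cdot \left( \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \right) ds\,dt, \tag{10.4} \]

where $+$ is used if $\sigma$ is orientation-preserving and $-$ if orientation-reversing.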
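
To make definition (10.4) concrete, here is a minimal numerical sketch of it. Everything about the implementation is an assumption made for illustration: the function name `flux`, the midpoint rule on a rectangular parameter domain, the central-difference estimates of the partial derivatives, and the `sign` argument, which the caller must set to $+1$ or $-1$ after deciding by hand whether $\sigma$ preserves or reverses orientation. As a usage example, the disk of Example 10.3 is parametrized in polar coordinates (a parametrization not used in the text), for which the cross product points upward.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def flux(F, sigma, s_range, t_range, sign=+1, n=200, h=1e-6):
    """Midpoint-rule estimate of (10.4): sign times the double integral over D of
    F(sigma(s, t)) . (d sigma/ds x d sigma/dt). The caller supplies sign."""
    (s0, s1), (t0, t1) = s_range, t_range
    ds, dt = (s1 - s0) / n, (t1 - t0) / n
    total = 0.0
    for i in range(n):
        s = s0 + (i + 0.5) * ds
        for j in range(n):
            t = t0 + (j + 0.5) * dt
            d_s = tuple((p - m) / (2 * h) for p, m in zip(sigma(s + h, t), sigma(s - h, t)))
            d_t = tuple((p - m) / (2 * h) for p, m in zip(sigma(s, t + h), sigma(s, t - h)))
            total += dot(F(*sigma(s, t)), cross(d_s, d_t)) * ds * dt
    return sign * total

a = 2.0  # radius of the disk, chosen for the illustration

def disk(r, theta):
    """Polar parametrization of the disk x^2 + y^2 <= a^2, z = 0."""
    return (r * math.cos(theta), r * math.sin(theta), 0.0)

def constant_down(x, y, z):
    return (0.0, 0.0, -1.0)

# For this parametrization the cross product is (0, 0, r), which points upward,
# so it is orientation-preserving for the upward normal and sign = +1.
disk_flux = flux(constant_down, disk, (0.0, a), (0.0, 2 * math.pi), sign=+1)
print(disk_flux, -math.pi * a ** 2)  # the two values should nearly agree
```

Swapping the roles of the two parameters would reverse the cross product, in which case the same integral would be recovered with `sign=-1`.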

We defer the discussion that the definition is independent of the parametrization σ until the
end of the chapter after we have gotten used to working with the definition and there are fewer
new things to absorb. Also, the definition extends in the obvious way to surfaces that are piecewise
smooth. That is, if a surface S is a union S = S1 ∪ S2 ∪ · · · ∪ Sk of oriented smooth surfaces that
intersect at most in pairs along portions of their boundaries, we define:
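\[ \iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_{S_1} \mathbf{F} \cdot d\mathbf{S} + \iint_{S_2} \mathbf{F} \cdot d\mathbf{S} + \cdots + \iint_{S_k} \mathbf{F} \cdot d\mathbf{S}. \]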

As a first example, we calculate an integral that we evaluated using the tentative definition in
Example 10.2(b).

Example 10.4. Let $S$ be the sphere $x^2 + y^2 + z^2 = a^2$, oriented by the outward normal. Find
$\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $\mathbf{F}(x, y, z) = (0, 0, z)$.
We apply the definition using the standard parametrization of S with spherical coordinates φ
and θ as parameters and ρ = a fixed:

σ(φ, θ) = (a sin φ cos θ, a sin φ sin θ, a cos φ), 0 ≤ φ ≤ π, 0 ≤ θ ≤ 2π.

The domain of the parametrization is the rectangle D = [0, π] × [0, 2π] in the φθ-plane. See Figure
10.10.
9 Actually, it’s allowed for $\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t}$ to be $\mathbf{0}$ at points $(s, t)$ on the boundary of $D$. As noted earlier, as far as
integrals are concerned, bad behavior can be neglected if confined to the boundary (page 130).


Figure 10.10: A parametrization of the sphere of radius a

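Substituting in terms of the parameters gives $\mathbf{F}(\sigma(\phi, \theta)) = (0, 0, z(\phi, \theta)) = (0, 0, a \cos\phi)$. In
addition:

\[ \begin{aligned}
\frac{\partial \sigma}{\partial \phi} \times \frac{\partial \sigma}{\partial \theta} &= \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a \cos\phi \cos\theta & a \cos\phi \sin\theta & -a \sin\phi \\ -a \sin\phi \sin\theta & a \sin\phi \cos\theta & 0 \end{bmatrix} \\
&= \cdots \quad \text{(we calculated this before in Example 5.19)} \\
&= a^2 \sin\phi\, (\sin\phi \cos\theta, \sin\phi \sin\theta, \cos\phi).
\end{aligned} \tag{10.5} \]

This vector is guaranteed to be normal to $S$, but, to determine whether $\sigma$ is orientation-preserving
or reversing, we must check if it agrees with the given orientation, which is outward. Note that, in
the first octant, where $\phi$ and $\theta$ are both between 0 and $\frac{\pi}{2}$, all components of $\frac{\partial \sigma}{\partial \phi} \times \frac{\partial \sigma}{\partial \theta}$ in equation
(10.5) are positive, hence $\frac{\partial \sigma}{\partial \phi} \times \frac{\partial \sigma}{\partial \theta}$ points outward. Thus $\sigma$ is orientation-preserving.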

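This can also be seen geometrically. At any point $\mathbf{x}$ of $S$, $\frac{\partial \sigma}{\partial \phi}$ is a tangent vector in the direction
of increasing $\phi$. Similarly, $\frac{\partial \sigma}{\partial \theta}$ is tangent in the direction of increasing $\theta$. These vectors are shown
in Figure 10.11. By the right-hand rule, $\frac{\partial \sigma}{\partial \phi} \times \frac{\partial \sigma}{\partial \theta}$ points in the outward direction.

Figure 10.11: The tangent vectors $\frac{\partial \sigma}{\partial \phi}$ and $\frac{\partial \sigma}{\partial \theta}$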

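To use the definition of the integral (10.4), we integrate the function:

\[ \begin{aligned}
\mathbf{F}(\sigma(\phi, \theta)) \cdot \left( \frac{\partial \sigma}{\partial \phi} \times \frac{\partial \sigma}{\partial \theta} \right) &= (0, 0, a \cos\phi) \cdot a^2 \sin\phi\, (\sin\phi \cos\theta, \sin\phi \sin\theta, \cos\phi) \\
&= a^2 \sin\phi\, (0 + 0 + a \cos^2\phi) \\
&= a^3 \sin\phi \cos^2\phi.
\end{aligned} \]

Thus:

\[ \begin{aligned}
\iint_S \mathbf{F} \cdot d\mathbf{S} &= + \iint_D \mathbf{F}(\sigma(\phi, \theta)) \cdot \left( \frac{\partial \sigma}{\partial \phi} \times \frac{\partial \sigma}{\partial \theta} \right) d\phi\,d\theta \\
&= \iint_D a^3 \sin\phi \cos^2\phi\,d\phi\,d\theta \\
&= \int_0^{2\pi} \left( \int_0^{\pi} a^3 \sin\phi \cos^2\phi\,d\phi \right) d\theta \\
&= a^3 \left( \int_0^{\pi} \sin\phi \cos^2\phi\,d\phi \right) \left( \int_0^{2\pi} 1\,d\theta \right) \\
&= a^3 \left( -\frac{1}{3} \cos^3\phi \Big|_0^{\pi} \right) \left( \theta \Big|_0^{2\pi} \right) \\
&= a^3 \cdot \left( -\frac{1}{3} (-1 - 1) \right) \cdot 2\pi \\
&= a^3 \cdot \frac{2}{3} \cdot 2\pi \\
&= \frac{4}{3} \pi a^3.
\end{aligned} \]
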
This agrees with the answer we got earlier in Example 10.2(b), but the calculation here was
at least as messy as the one obtained there by integrating the normal component of F. In other
words, the definition may not always be the most efficient way to go. We’ll return to this example
one more time later and apply a method that’s simplest of all.
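
The two computational steps of Example 10.4 can be spot-checked numerically. The sketch below, whose function names and sample points are illustrative assumptions, first forms the cross product of the two rows of the determinant at a few parameter values and compares it with the closed form in (10.5). It then approximates $\int_0^{\pi} a^3 \sin\phi \cos^2\phi \, d\phi$ with a midpoint rule and multiplies by $2\pi$, mirroring the separation of the iterated integral in the text.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

a = 1.5  # radius chosen for the illustration

def sigma_phi(phi, theta):
    """Second row of the determinant: the partial derivative with respect to phi."""
    return (a * math.cos(phi) * math.cos(theta),
            a * math.cos(phi) * math.sin(theta),
            -a * math.sin(phi))

def sigma_theta(phi, theta):
    """Third row of the determinant: the partial derivative with respect to theta."""
    return (-a * math.sin(phi) * math.sin(theta),
            a * math.sin(phi) * math.cos(theta),
            0.0)

def closed_form(phi, theta):
    """Right side of (10.5)."""
    k = a * a * math.sin(phi)
    return (k * math.sin(phi) * math.cos(theta),
            k * math.sin(phi) * math.sin(theta),
            k * math.cos(phi))

# Largest componentwise gap between the cross product and (10.5) at sample points.
samples = [(0.3, 1.1), (1.2, 4.0), (2.5, 5.9)]
max_gap = max(
    abs(c - e)
    for phi, theta in samples
    for c, e in zip(cross(sigma_phi(phi, theta), sigma_theta(phi, theta)),
                    closed_form(phi, theta))
)

# Midpoint rule for the phi-integral, then multiply by the theta-integral 2*pi.
n = 2000
dphi = math.pi / n
phi_integral = sum(
    a ** 3 * math.sin((i + 0.5) * dphi) * math.cos((i + 0.5) * dphi) ** 2 * dphi
    for i in range(n)
)
flux_value = phi_integral * 2 * math.pi

print(max_gap)                                # essentially 0
print(flux_value, 4 / 3 * math.pi * a ** 3)   # the two values should nearly agree
```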
Example 10.5. Let $\mathbf{F}(x, y, z) = (0, 1, -3)$. This is a constant vector field which we think of as
“rain” falling through space. See Figure 10.12, left. Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if:
(a) $S$ is the cone $z = \sqrt{x^2 + y^2}$, where $x^2 + y^2 \le 4$, oriented by the upward normal (Figure 10.12,
middle),

(b) $S$ is the disk that caps the cone in part (a), i.e., the disk $x^2 + y^2 \le 4$ in the plane $z = 2$,
oriented by the upward normal (Figure 10.12, right).

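(a) As we saw in Chapter 5, there are several ways to parametrize the cone, but it’s the graph of
a function of $x$ and $y$ so one way is to use $x$ and $y$ as parameters:

\[ \sigma(x, y) = \left( x, y, \sqrt{x^2 + y^2} \right), \quad x^2 + y^2 \le 4. \]

The domain $D$ in the $xy$-plane is the disk $x^2 + y^2 \le 4$ of radius 2. Moreover:

\[ \frac{\partial \sigma}{\partial x} \times \frac{\partial \sigma}{\partial y} = \det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 0 & \frac{x}{\sqrt{x^2 + y^2}} \\ 0 & 1 & \frac{y}{\sqrt{x^2 + y^2}} \end{bmatrix} = \left( -\frac{x}{\sqrt{x^2 + y^2}}, -\frac{y}{\sqrt{x^2 + y^2}}, 1 \right). \]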


Figure 10.12: Rain (left), the cone $z = \sqrt{x^2 + y^2}$, $x^2 + y^2 \le 4$ (middle), and the cap of the cone, $x^2 + y^2 \le 4$, $z = 2$ (right)

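The $z$-component is positive, so $\frac{\partial\sigma}{\partial x} \times \frac{\partial\sigma}{\partial y}$ points upward. Thus $\sigma$ is orientation-preserving. In addition:
\[
\begin{aligned}
\mathbf{F}(\sigma(x, y)) \cdot \Bigl(\frac{\partial\sigma}{\partial x} \times \frac{\partial\sigma}{\partial y}\Bigr)
&= (0, 1, -3) \cdot \Bigl(-\frac{x}{\sqrt{x^2+y^2}}, -\frac{y}{\sqrt{x^2+y^2}}, 1\Bigr) \\
&= -\frac{y}{\sqrt{x^2+y^2}} - 3.
\end{aligned}
\]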

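Thus by definition:
\[
\begin{aligned}
\iint_S \mathbf{F}\cdot d\mathbf{S}
&= \iint_D \Bigl(-\frac{y}{\sqrt{x^2+y^2}} - 3\Bigr)\,dx\,dy \\
&= -\iint_D \frac{y}{\sqrt{x^2+y^2}}\,dx\,dy - 3\iint_D 1\,dx\,dy.
\end{aligned}
\]
The first integral is 0 by the symmetry of $D$ in the $x$-axis, and the second integral is the area of $D$. Therefore $\iint_S \mathbf{F}\cdot d\mathbf{S} = -3\,\mathrm{Area}(D) = -3 \cdot (\pi \cdot 2^2) = -12\pi$.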
The negative answer makes sense, since S is oriented so that the upward direction is considered
positive whereas the rain is falling downward.
(b) To integrate over the oriented disk on the right of Figure 10.12, we won’t parametrize at all but
rather go back to integrating the normal component of F over S, which is what the actual definition
formalized. The upward unit normal to S is n = (0, 0, 1), so F·n = (0, 1, −3)·(0, 0, 1) = −3. Hence:
\[
\iint_S \mathbf{F}\cdot d\mathbf{S} = \iint_S \mathbf{F}\cdot\mathbf{n}\,dS = \iint_S -3\,dS = -3\,\mathrm{Area}(S) = -3 \cdot (\pi \cdot 2^2) = -12\pi.
\]
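Both parts of Example 10.5 can be checked numerically. The sketch below is illustrative only: the helper `integrate_over_disk` and the choice to evaluate the double integral over $D$ in polar coordinates (so that $dx\,dy = r\,dr\,d\theta$) are assumptions made here, not steps taken in the text.

```python
import math

F = (0.0, 1.0, -3.0)  # the constant "rain" field

def integrate_over_disk(g, radius, n=200):
    """Midpoint rule for the double integral of g(x, y) over the disk
    x^2 + y^2 <= radius^2, computed in polar coordinates."""
    dr = radius / n
    dtheta = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)
            total += g(x, y) * r * dr * dtheta
    return total

def cone_integrand(x, y):
    # F . (d sigma/dx  x  d sigma/dy) for sigma(x, y) = (x, y, sqrt(x^2 + y^2)).
    rho = math.hypot(x, y)
    normal = (-x / rho, -y / rho, 1.0)
    return sum(f * c for f, c in zip(F, normal))

def disk_integrand(x, y):
    # F . n with n = (0, 0, 1) on the capping disk.
    return F[2]

cone_flux = integrate_over_disk(cone_integrand, 2.0)
disk_flux = integrate_over_disk(disk_integrand, 2.0)
print(cone_flux, disk_flux, -12.0 * math.pi)
```

All three printed values should essentially coincide, which is the agreement between parts (a) and (b).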

We close by picking up the thread with which we ended the previous section, which is to note
that the answers to parts (a) and (b) of the last example are the same. We can interpret this to
say that the flow through the top of the cone equals the flow through the sides. If we could have
been sure that this was true in advance, then we could have calculated the integral over the cone
by replacing it with the much simpler calculation over the capping disk. The next three sections
have something to say about why this would have been valid from a couple of different perspectives.
They also contain the last two big theorems of introductory multivariable calculus.

10.3 Stokes’s theorem


There are two types of points on a surface. At some points, one can approach along the surface
from any direction and continue on through the point while remaining on the surface. These are
sometimes called interior points. Other points lie on the edge of the surface in the sense that, after
approaching the point from certain directions, one falls off the surface after continuing through.
These are called boundary points. The set of all boundary points is called the boundary of the
surface.
For instance, a hemisphere has a boundary consisting of the equatorial circle, and a cylinder has
a boundary consisting of two pieces, namely, the circles at each end. On the other hand, a sphere
and a torus have no boundary points. See Figure 10.13. In general, the boundary of a surface is
either empty or consists of one or more simple closed curves.

Figure 10.13: The boundary of a hemisphere is a circle and that of a cylinder is two circles. The
boundaries of a sphere and torus are empty.

Stokes’s theorem relates a line integral $\int_B \mathbf{F}\cdot d\mathbf{s}$ around the boundary $B$ of a surface in $\mathbb{R}^3$ to a vector field integral over the surface itself. It can be seen as an extension of Green’s theorem from double integrals to surface integrals. The theorem is not obvious, and we shall try to show where it comes from, though in a very special case similar to the one we used to explain Green’s theorem. Namely, we assume that $S$ is an oriented surface with a smooth orientation-preserving parametrization $\sigma(s, t) = \bigl(x(s, t), y(s, t), z(s, t)\bigr)$ such that:
• the domain of σ is a rectangle R = [a, b] × [c, d] and
• σ is one-to-one and, in particular, transforms the boundary of R precisely onto the boundary
of S.
The situation is shown in Figure 10.14.

Figure 10.14: A magic carpet: a surface S with parametrization σ defined on a rectangle R. The
sides of R are oriented so that ∂R = A1 ∪ A2 ∪ (−A3 ) ∪ (−A4 ).

Let $A$ denote the boundary of $R$ and $B$ the boundary of $S$. We orient $A$ counterclockwise and break it into its four sides $A_1, A_2, A_3, A_4$, where the horizontal sides $A_1$ and $A_3$ are oriented to the right and the vertical sides $A_2$ and $A_4$ are oriented upward. Taking orientation into account, we have $A = A_1 \cup A_2 \cup (-A_3) \cup (-A_4)$. Applying $\sigma$ then breaks $B$ up into four curves $B_1, B_2, B_3, B_4$, where $B_j = \sigma(A_j)$ for $j = 1, 2, 3, 4$. $B$ inherits the orientation $B = B_1 \cup B_2 \cup (-B_3) \cup (-B_4)$.
Let $\mathbf{F} = (F_1, F_2, F_3)$ be a smooth vector field defined on an open set in $\mathbb{R}^3$ that contains $S$. We consider the line integral of $\mathbf{F}$ around $B$, the boundary of $S$:
\[
\int_B \mathbf{F}\cdot d\mathbf{s} = \Bigl(\int_{B_1} + \int_{B_2} - \int_{B_3} - \int_{B_4}\Bigr)\,\mathbf{F}\cdot d\mathbf{s}.
\]
We use differential form notation: $\int_B \mathbf{F}\cdot d\mathbf{s} = \int_B F_1\,dx + F_2\,dy + F_3\,dz$. To cut down on the amount of notation in any one place, we find $\int_B F_1\,dx$, $\int_B F_2\,dy$, $\int_B F_3\,dz$ separately and add the results at the end.
We can parametrize each of the four pieces of B by applying σ to the corresponding side of A,
fixing one of s or t and using the other as parameter:
For B1 : α1 (s) = σ(s, c), a ≤ s ≤ b.
For B2 : α2 (t) = σ(b, t), c ≤ t ≤ d.
For B3 : α3 (s) = σ(s, d), a ≤ s ≤ b.
For B4 : α4 (t) = σ(a, t), c ≤ t ≤ d.
So for $\int_B F_1\,dx$, we integrate over each of $B_1$ through $B_4$ and take the sum with appropriate signs. For example, for $B_1$, the parametrization is $\alpha_1(s) = \sigma(s, c) = (x(s, c), y(s, c), z(s, c))$, where $t = c$ is held fixed. Hence the derivatives with respect to the parameter are derivatives with respect to $s$, partial derivatives where appropriate. For instance, $\alpha_1'(s) = \frac{\partial\sigma}{\partial s}(s, c) = \bigl(\frac{\partial x}{\partial s}(s, c), \frac{\partial y}{\partial s}(s, c), \frac{\partial z}{\partial s}(s, c)\bigr)$. As a result, we obtain:
\[
\int_{B_1} F_1\,dx = \int_a^b F_1(\sigma(s, c))\,\frac{\partial x}{\partial s}(s, c)\,ds.
\]

Similarly, over $B_2$ with parameter $t$:
\[
\int_{B_2} F_1\,dx = \int_c^d F_1(\sigma(b, t))\,\frac{\partial x}{\partial t}(b, t)\,dt.
\]

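Continuing in this way with $B_3$ and $B_4$ and combining the results gives:
\[
\begin{aligned}
\int_B F_1\,dx
&= \int_a^b F_1(\sigma(s, c))\,\frac{\partial x}{\partial s}(s, c)\,ds + \int_c^d F_1(\sigma(b, t))\,\frac{\partial x}{\partial t}(b, t)\,dt \\
&\quad - \int_a^b F_1(\sigma(s, d))\,\frac{\partial x}{\partial s}(s, d)\,ds - \int_c^d F_1(\sigma(a, t))\,\frac{\partial x}{\partial t}(a, t)\,dt.
\end{aligned}
\tag{10.6}
\]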
Upon closer examination, each of the terms in this last expression is a line integral over one of the sides of the boundary of $R$ back in the $st$-plane. For instance, the first term is $\int_{A_1} (F_1 \circ \sigma)\,\frac{\partial x}{\partial s}\,ds$ and the second is $\int_{A_2} (F_1 \circ \sigma)\,\frac{\partial x}{\partial t}\,dt$, line integrals over $A_1$ and $A_2$, respectively. In fact, since $t$ is constant on $A_1$ and $s$ is constant on $A_2$, we can give these terms a more uniform appearance by writing them as:
\[
\int_{A_1} (F_1 \circ \sigma)\,\frac{\partial x}{\partial s}\,ds + (F_1 \circ \sigma)\,\frac{\partial x}{\partial t}\,dt
\quad\text{and}\quad
\int_{A_2} (F_1 \circ \sigma)\,\frac{\partial x}{\partial s}\,ds + (F_1 \circ \sigma)\,\frac{\partial x}{\partial t}\,dt,
\]
respectively. The extra summands are superfluous to the integrals, since the derivatives of the extra coordinates are zero on the corresponding segment. After converting the remaining two terms to line integrals over $A_3$ and $A_4$, equation (10.6) becomes a line integral over all of $A$:
\[
\int_B F_1\,dx = \int_A (F_1 \circ \sigma)\,\frac{\partial x}{\partial s}\,ds + (F_1 \circ \sigma)\,\frac{\partial x}{\partial t}\,dt.
\]
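But now that we are back to a line integral in the plane, Green’s theorem applies. In other words, we can replace the line integral over $A$ with a double integral over the rectangle $R$ that it bounds. Thus:
\[
\int_B F_1\,dx = \iint_R \Bigl(\frac{\partial}{\partial s}\Bigl((F_1 \circ \sigma)\,\frac{\partial x}{\partial t}\Bigr) - \frac{\partial}{\partial t}\Bigl((F_1 \circ \sigma)\,\frac{\partial x}{\partial s}\Bigr)\Bigr)\,ds\,dt.
\tag{10.7}
\]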
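We work with the integrand of the double integral. Using the product rule to calculate the partial derivatives with respect to $s$ and $t$, the integrand becomes:
\[
\frac{\partial}{\partial s}(F_1 \circ \sigma) \cdot \frac{\partial x}{\partial t} + (F_1 \circ \sigma) \cdot \frac{\partial^2 x}{\partial s\,\partial t} - \frac{\partial}{\partial t}(F_1 \circ \sigma) \cdot \frac{\partial x}{\partial s} - (F_1 \circ \sigma) \cdot \frac{\partial^2 x}{\partial t\,\partial s}.
\]
Thanks to the equality of the mixed partials $\frac{\partial^2 x}{\partial s\,\partial t} = \frac{\partial^2 x}{\partial t\,\partial s}$, the second and fourth terms cancel, leaving:
\[
\frac{\partial}{\partial s}(F_1 \circ \sigma)\,\frac{\partial x}{\partial t} - \frac{\partial}{\partial t}(F_1 \circ \sigma)\,\frac{\partial x}{\partial s}.
\]
So after substituting into equation (10.7), at this point, we have:
\[
\int_B F_1\,dx = \iint_R \Bigl(\frac{\partial}{\partial s}(F_1 \circ \sigma)\,\frac{\partial x}{\partial t} - \frac{\partial}{\partial t}(F_1 \circ \sigma)\,\frac{\partial x}{\partial s}\Bigr)\,ds\,dt.
\tag{10.8}
\]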
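We compute the partials of $F_1 \circ \sigma$ using the chain rule. Fans of dependence diagrams will welcome Figure 10.15.

Figure 10.15: A dependence diagram for the composition $F_1 \circ \sigma$ ($F_1$ depends on $x$, $y$, $z$, each of which depends on $s$ and $t$)

By the chain rule, the integrand in equation (10.8) becomes:
\[
\begin{aligned}
&\Bigl(\frac{\partial F_1}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial F_1}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial F_1}{\partial z}\frac{\partial z}{\partial s}\Bigr)\frac{\partial x}{\partial t}
- \Bigl(\frac{\partial F_1}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial F_1}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial F_1}{\partial z}\frac{\partial z}{\partial t}\Bigr)\frac{\partial x}{\partial s} \\
&\qquad = \frac{\partial F_1}{\partial z}\Bigl(\frac{\partial z}{\partial s}\frac{\partial x}{\partial t} - \frac{\partial x}{\partial s}\frac{\partial z}{\partial t}\Bigr)
- \frac{\partial F_1}{\partial y}\Bigl(\frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t}\Bigr).
\end{aligned}
\]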
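Plugging this into (10.8) yields:
\[
\int_B F_1\,dx = \iint_R \Bigl(\frac{\partial F_1}{\partial z}\Bigl(\frac{\partial z}{\partial s}\frac{\partial x}{\partial t} - \frac{\partial x}{\partial s}\frac{\partial z}{\partial t}\Bigr) - \frac{\partial F_1}{\partial y}\Bigl(\frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t}\Bigr)\Bigr)\,ds\,dt.
\]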
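The calculations of $\int_B F_2\,dy$ and $\int_B F_3\,dz$ are similar and end up giving:
\[
\int_B F_2\,dy = \iint_R \Bigl(-\frac{\partial F_2}{\partial z}\Bigl(\frac{\partial y}{\partial s}\frac{\partial z}{\partial t} - \frac{\partial z}{\partial s}\frac{\partial y}{\partial t}\Bigr) + \frac{\partial F_2}{\partial x}\Bigl(\frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t}\Bigr)\Bigr)\,ds\,dt
\]
and
\[
\int_B F_3\,dz = \iint_R \Bigl(\frac{\partial F_3}{\partial y}\Bigl(\frac{\partial y}{\partial s}\frac{\partial z}{\partial t} - \frac{\partial z}{\partial s}\frac{\partial y}{\partial t}\Bigr) - \frac{\partial F_3}{\partial x}\Bigl(\frac{\partial z}{\partial s}\frac{\partial x}{\partial t} - \frac{\partial x}{\partial s}\frac{\partial z}{\partial t}\Bigr)\Bigr)\,ds\,dt.
\]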


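Adding the three calculations and suitably regrouping the terms results in the formula:
\[
\begin{aligned}
\int_B F_1\,dx + F_2\,dy + F_3\,dz = \iint_R \Bigl(
&\Bigl(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\Bigr)\Bigl(\frac{\partial y}{\partial s}\frac{\partial z}{\partial t} - \frac{\partial z}{\partial s}\frac{\partial y}{\partial t}\Bigr) \\
&+ \Bigl(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\Bigr)\Bigl(\frac{\partial z}{\partial s}\frac{\partial x}{\partial t} - \frac{\partial x}{\partial s}\frac{\partial z}{\partial t}\Bigr) \\
&+ \Bigl(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\Bigr)\Bigl(\frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t}\Bigr)\Bigr)\,ds\,dt.
\end{aligned}
\tag{10.9}
\]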

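This looks awful, but actually the ungainly double integral can be expressed much more concisely. The integrand is the dot product of the vectors:
\[
\mathbf{v} = \Bigl(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\ \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x},\ \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\Bigr)
\]
and
\[
\mathbf{w} = \Bigl(\frac{\partial y}{\partial s}\frac{\partial z}{\partial t} - \frac{\partial z}{\partial s}\frac{\partial y}{\partial t},\ \frac{\partial z}{\partial s}\frac{\partial x}{\partial t} - \frac{\partial x}{\partial s}\frac{\partial z}{\partial t},\ \frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t}\Bigr).
\]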

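For $\mathbf{v}$, the components are the differences of the mixed partial pairs of $\mathbf{F}$, and in fact $\mathbf{v} = \nabla \times \mathbf{F}$, the curl of $\mathbf{F}$. Likewise, $\mathbf{w}$ is also a cross product:
\[
\mathbf{w} = \det\begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial x}{\partial s} & \frac{\partial y}{\partial s} & \frac{\partial z}{\partial s} \\ \frac{\partial x}{\partial t} & \frac{\partial y}{\partial t} & \frac{\partial z}{\partial t} \end{bmatrix} = \frac{\partial\sigma}{\partial s} \times \frac{\partial\sigma}{\partial t}.
\]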

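Thus, after all this work, equation (10.9) becomes:
\[
\int_B F_1\,dx + F_2\,dy + F_3\,dz = \iint_R (\nabla \times \mathbf{F}) \cdot \Bigl(\frac{\partial\sigma}{\partial s} \times \frac{\partial\sigma}{\partial t}\Bigr)\,ds\,dt.
\]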

This last expression is precisely the definition of the surface integral of the vector field ∇ × F over
S. We have arrived at the final formula:
\[
\int_B \mathbf{F}\cdot d\mathbf{s} = \int_B F_1\,dx + F_2\,dy + F_3\,dz = \iint_S (\nabla \times \mathbf{F}) \cdot d\mathbf{S}.
\tag{10.10}
\]

This was a long calculation. To recap the strategy, to evaluate the line integral over the boundary
B of S, we used a parametrization of S to pull back to a line integral in the plane, applied Green’s
theorem there, and lastly identified the resulting double integral (10.9) as the pullback of an integral
over the surface.
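Formula (10.10) can be tested numerically on a concrete “magic carpet.” The sketch below is only an illustration under assumptions chosen here: the parametrization $\sigma(s, t) = (s, t, st)$ on $R = [0, 1] \times [0, 2]$ and the field $\mathbf{F} = (z, x, y)$, whose curl is $(1, 1, 1)$, are hypothetical test data, and the function names are not from the text. The boundary integral is assembled with the signs $B_1 + B_2 - B_3 - B_4$ exactly as in the derivation.

```python
def sigma(s, t):
    return (s, t, s * t)

def sigma_s(s, t):  # partial derivative of sigma with respect to s
    return (1.0, 0.0, t)

def sigma_t(s, t):  # partial derivative of sigma with respect to t
    return (0.0, 1.0, s)

def F(x, y, z):
    return (z, x, y)

def curl_F(x, y, z):  # curl of (z, x, y), computed by hand
    return (1.0, 1.0, 1.0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def surface_side(a, b, c, d, n=200):
    """Midpoint rule for the double integral of curl F . (sigma_s x sigma_t)."""
    ds, dt = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        s = a + (i + 0.5) * ds
        for j in range(n):
            t = c + (j + 0.5) * dt
            normal = cross(sigma_s(s, t), sigma_t(s, t))
            total += dot(curl_F(*sigma(s, t)), normal) * ds * dt
    return total

def boundary_side(a, b, c, d, n=200):
    """Line integral of F around B = B1 + B2 - B3 - B4."""
    ds, dt = (b - a) / n, (d - c) / n
    total = 0.0
    for k in range(n):
        s = a + (k + 0.5) * ds
        total += dot(F(*sigma(s, c)), sigma_s(s, c)) * ds  # B1 (t = c)
        total -= dot(F(*sigma(s, d)), sigma_s(s, d)) * ds  # -B3 (t = d)
    for k in range(n):
        t = c + (k + 0.5) * dt
        total += dot(F(*sigma(b, t)), sigma_t(b, t)) * dt  # B2 (s = b)
        total -= dot(F(*sigma(a, t)), sigma_t(a, t)) * dt  # -B4 (s = a)
    return total

print(surface_side(0.0, 1.0, 0.0, 2.0), boundary_side(0.0, 1.0, 0.0, 2.0))
```

For this test data both sides can also be found by hand, and each works out to $-1$.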
The argument behind (10.10) assumed that the domain of the parametrization is a rectangle, but
the conclusion remains true for surfaces more broadly. In order to formulate a general statement,
we need to say something about the relation between the orientation of a surface and the orientation
of its boundary. Let S be a smooth oriented surface in R3 with orienting normal n. We think of n
as defining “up” and imagine walking around S with our feet on the surface and our heads in the
direction of n. Then:

∂S denotes the boundary of S oriented so that, as you traverse the boundary, S stays on the left.

If S happens to be contained in the plane R2 , this is consistent with the way we oriented its
boundary in Green’s theorem, assuming that S is oriented by the upward normal n = (0, 0, 1).

Figure 10.16: Three oriented surfaces and their oriented boundaries

For example, let S be a sphere that is oriented by the outward normal. Then the boundary of
the northern hemisphere is the equatorial circle oriented counterclockwise, while the boundary of
the southern hemisphere is the same circle but oriented clockwise. If S is a pair of pants oriented
by the outward normal, then ∂S consists of three closed curves: the waist oriented one way—let’s
call it clockwise—and the bottoms of the two legs oriented counterclockwise. See Figure 10.16.
For a piecewise smooth surface S = S1 ∪ S2 ∪ · · · ∪ Sk , we require that the smooth pieces
S1 , S2 , . . . , Sk are oriented compatibly in the sense that the oriented boundaries of adjacent pieces
are traversed in opposite directions wherever they intersect. These intersections are not considered
to be part of the boundary of the whole surface S. This is illustrated in Figure 10.17.

Figure 10.17: Compatible orienting normals n1 , n2 for adjacent surfaces S1 , S2 , respectively

For the extension of (10.10) beyond surfaces whose parameter domain is a rectangle, one approach
might be to mimic the strategy we outlined informally for Green’s theorem. Namely, first
verify it in the case where the parameter domain is a union of rectangles that overlap at most
along parts of their boundaries and then argue that the domain of any parametrization can be
approximated arbitrarily well by such a union. We won’t give the details.
With these preliminaries out of the way, we can state the theorem.

Theorem 10.6 (Stokes’s theorem). Let S be a piecewise smooth oriented surface in R3 that is
bounded as a subset of R3 , and let F be a smooth vector field defined on an open set containing S.
Then:
\[
\int_{\partial S} \mathbf{F}\cdot d\mathbf{s} = \iint_S (\nabla \times \mathbf{F}) \cdot d\mathbf{S}.
\]

In the form stated, Stokes’s theorem converts a line integral to a surface integral. Since surface
integrals are usually messier than line integrals, this is not the usual way in which the theorem
is applied, at least as far as calculating specific examples goes. (See Exercises 3.3–3.6 for some
exceptions, however.) Sometimes though, the theorem can be used to go the other way, converting
a surface integral to a line integral. This possibility has implications for the general properties of
surface integrals.


Example 10.7. Let $\mathbf{F}$ be the “falling rain” vector field $\mathbf{F}(x, y, z) = (0, 1, -3)$, and let $S$ be the cone $z = \sqrt{x^2 + y^2}$, $x^2 + y^2 \le 4$, oriented by the upward normal. In Example 10.5, we integrated $\mathbf{F}$ over $S$ using a parametrization. Since we evaluated the integral before, there seems no harm in giving away a spoiler: the answer is $\iint_S \mathbf{F}\cdot d\mathbf{S} = -12\pi$.
Could this surface integral have been computed using Stokes’s theorem? The answer is yes provided that there is a vector field $\mathbf{G}$ such that $\mathbf{F} = \nabla \times \mathbf{G}$, for then Stokes’s theorem gives $\iint_S \mathbf{F}\cdot d\mathbf{S} = \iint_S (\nabla \times \mathbf{G}) \cdot d\mathbf{S} = \int_{\partial S} \mathbf{G}\cdot d\mathbf{s}$. In other words, we could find the surface integral of $\mathbf{F}$ over $S$ by computing the line integral of $\mathbf{G}$ around the boundary.
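We try to find such a $\mathbf{G} = (G_1, G_2, G_3)$ by setting $\mathbf{F} = \nabla \times \mathbf{G}$ and guessing:
\[
\mathbf{F} = (0, 1, -3) = \det\begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ G_1 & G_2 & G_3 \end{bmatrix}.
\]
This leads to the system:
\[
\begin{cases}
0 = \dfrac{\partial G_3}{\partial y} - \dfrac{\partial G_2}{\partial z} \\[1ex]
1 = \dfrac{\partial G_1}{\partial z} - \dfrac{\partial G_3}{\partial x} \\[1ex]
-3 = \dfrac{\partial G_2}{\partial x} - \dfrac{\partial G_1}{\partial y}.
\end{cases}
\]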
From the second equation, we guess G1 = z, from the third G2 = −3x, and from the first G3 = 0.
One can check that these guesses actually are consistent with all three equations. Thus G(x, y, z) =
(z, −3x, 0) works! Our virtuous lifestyle has paid off.
This gives:
\[
\iint_S \mathbf{F}\cdot d\mathbf{S} = \int_{\partial S} \mathbf{G}\cdot d\mathbf{s} = \int_{\partial S} z\,dx - 3x\,dy + 0\,dz = \int_{\partial S} z\,dx - 3x\,dy.
\]

We calculate the line integral by parametrizing C = ∂S, which is a circle of radius 2 in the plane
z = 2, oriented counterclockwise (see Figure 10.12, middle, or, if you’re willing to peek ahead,
Figure 10.18, left). The circle can be parametrized by α(t) = (2 cos t, 2 sin t, 2), 0 ≤ t ≤ 2π. Thus:
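\begin{align*}
\iint_S \mathbf{F}\cdot d\mathbf{S} = \int_C z\,dx - 3x\,dy
&= \int_0^{2\pi} \bigl(2 \cdot (-2\sin t) - 6\cos t \cdot 2\cos t\bigr)\,dt \\
&= \int_0^{2\pi} (-4\sin t - 12\cos^2 t)\,dt \\
&= 4\cos t\,\Big|_0^{2\pi} - 12 \cdot \frac{2\pi}{2} \tag{10.11} \\
&= 0 - 12\pi \\
&= -12\pi,
\end{align*}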

where in the step labeled (10.11) we used our trick to integrate cos2 t (Exercise 1.1, Chapter 7).
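The guess $\mathbf{G} = (z, -3x, 0)$ and the value of the line integral can both be checked numerically. This is a rough sketch, not a method from the text: the helper names `curl` and `line_integral_on_circle`, the central-difference step `h`, and the sample point are all assumptions made for illustration.

```python
import math

def G(x, y, z):
    return (z, -3.0 * x, 0.0)

def curl(field, p, h=1e-5):
    """Central-difference approximation of the curl of `field` at the point p."""
    def partial(component, axis):
        plus, minus = list(p), list(p)
        plus[axis] += h
        minus[axis] -= h
        return (field(*plus)[component] - field(*minus)[component]) / (2.0 * h)
    return (partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1))

def line_integral_on_circle(field, radius, height, n=1000):
    """Integral of field . ds around alpha(t) = (r cos t, r sin t, height)."""
    dt = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        point = (radius * math.cos(t), radius * math.sin(t), height)
        velocity = (-radius * math.sin(t), radius * math.cos(t), 0.0)
        total += sum(v * w for v, w in zip(field(*point), velocity)) * dt
    return total

print(curl(G, (0.3, -1.2, 0.7)))  # should be close to (0, 1, -3)
print(line_integral_on_circle(G, 2.0, 2.0), -12.0 * math.pi)
```

Because $\mathbf{G}$ is linear, the difference quotients are exact up to rounding, so the first line of output should match $(0, 1, -3)$ very closely at any sample point.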

10.4 Curl fields


In the last example, we got the same answer using Stokes’s theorem as we did in Example 10.5 using
a parametrization, but it’s not clear that the Stokes’s theorem approach was any simpler. Actually,
it seemed somewhat roundabout. The argument has some important consequences, however.
Continuing with the example, let $C$ denote the oriented boundary of the cone $S$, and suppose that $\widetilde{S}$ is any piecewise smooth oriented surface having $C$ as its oriented boundary. For instance, $\widetilde{S}$ could be the disk in the plane $z = 2$ that caps off the cone, oriented by the upward normal, which was also considered in Example 10.5. See Figure 10.18.

Figure 10.18: An oriented cone $S$ and two other oriented surfaces $\widetilde{S}$, all three having the same oriented boundary

If we integrate the falling rain vector field $\mathbf{F}$ over the new surface $\widetilde{S}$, we obtain:
\[
\begin{aligned}
\iint_{\widetilde{S}} \mathbf{F}\cdot d\mathbf{S}
&= \iint_{\widetilde{S}} (\nabla \times \mathbf{G}) \cdot d\mathbf{S} && \text{(where $\mathbf{G} = (z, -3x, 0)$ as before)} \\
&= \int_{\partial\widetilde{S}} \mathbf{G}\cdot d\mathbf{s} && \text{(by Stokes’s theorem)} \\
&= \int_C \mathbf{G}\cdot d\mathbf{s} \\
&= -12\pi && \text{(by the calculation in the previous example).}
\end{aligned}
\]

In other words, the integral of F is −12π over all oriented surfaces whose boundary is C.
This idea can be formulated more generally.

Definition. Let $U$ be an open set in $\mathbb{R}^3$. A vector field $\mathbf{F} : U \to \mathbb{R}^3$ is called a curl field on $U$ if there exists a smooth vector field $\mathbf{G} : U \to \mathbb{R}^3$ such that $\mathbf{F} = \nabla \times \mathbf{G}$ at all points of $U$.

For instance, we saw in Example 10.7 that F = (0, 1, −3) is a curl field on R3 with “anti-curl”
G = (z, −3x, 0).
One perspective on curl fields is the following general working principle: curl fields are to surface
integrals as conservative fields are to line integrals. We consider a couple of illustrations of this.

Example 10.8. For line integrals: The integral of a conservative field over an oriented curve
depends only on the endpoints of the curve.
For surface integrals: The integral of a curl field over an oriented surface depends only on the
oriented boundary of the surface. More precisely:

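Theorem 10.9. Let $S_1$ and $S_2$ be piecewise smooth oriented surfaces in an open set $U$ in $\mathbb{R}^3$ that have the same oriented boundary, i.e., $\partial S_1 = \partial S_2$. If $\mathbf{F}$ is a curl field on $U$, then:
\[
\iint_{S_1} \mathbf{F}\cdot d\mathbf{S} = \iint_{S_2} \mathbf{F}\cdot d\mathbf{S}.
\]

Proof. Suppose that $\mathbf{F} = \nabla \times \mathbf{G}$. Then by Stokes’s theorem, both sides of the equation are equal to $\int_C \mathbf{G}\cdot d\mathbf{s}$, where $C = \partial S_1 = \partial S_2$.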


Example 10.10. For line integrals: The integral of a conservative field over a piecewise smooth
oriented closed curve is 0.
For surface integrals:

Definition. A piecewise smooth surface S in Rn is called closed if ∂S = ∅ (the empty set).

For example, spheres and tori are closed surfaces, but hemispheres and cylinders are not.

Theorem 10.11. If $\mathbf{F}$ is a curl field on an open set $U$ in $\mathbb{R}^3$, then $\iint_S \mathbf{F}\cdot d\mathbf{S} = 0$ for any piecewise smooth oriented closed surface $S$ in $U$.

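Proof. Let $S$ be a piecewise smooth oriented closed surface in $U$. The simplest argument would be to say that, if $\mathbf{F} = \nabla \times \mathbf{G}$, then, by Stokes’s theorem, $\iint_S \mathbf{F}\cdot d\mathbf{S} = \int_{\partial S} \mathbf{G}\cdot d\mathbf{s} = 0$ since $\partial S = \emptyset$.
If integrating over the empty set makes you nervous, then perhaps a more aboveboard alternative would be to cut $S$ into two nonempty pieces $S_1$ and $S_2$ such that $S = S_1 \cup S_2$ and $S_1$ and $S_2$ intersect only along their common boundary. For instance, $S_1$ could be a small disklike piece of $S$, and $S_2$ could be what’s left. See Figure 10.19. Then:
\[
\begin{aligned}
\iint_S \mathbf{F}\cdot d\mathbf{S}
&= \iint_{S_1} \mathbf{F}\cdot d\mathbf{S} + \iint_{S_2} \mathbf{F}\cdot d\mathbf{S} \\
&= \int_{\partial S_1} \mathbf{G}\cdot d\mathbf{s} + \int_{\partial S_2} \mathbf{G}\cdot d\mathbf{s}.
\end{aligned}
\tag{10.12}
\]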

Figure 10.19: Dividing S into two pieces S = S1 ∪ S2

S1 and S2 inherit the orienting unit normal from S, so, in order to keep them on the left, their
boundaries must be traversed in opposite directions. This is consistent with our requirement for
orientations of piecewise smooth surfaces. In other words, ∂S1 and ∂S2 are the same curve but
with opposite orientations. Hence the line integrals in equation (10.12) cancel each other out.

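Lastly, we look for an analogue for curl fields of the mixed partials theorem for conservative fields. In $\mathbb{R}^3$, the mixed partials theorem takes the form that, if $\mathbf{F}$ is conservative, then $\nabla \times \mathbf{F} = \mathbf{0}$.
Now, let $\mathbf{F}$ be a curl field, say $\mathbf{F} = \nabla \times \mathbf{G}$, so $(F_1, F_2, F_3) = \det\begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ G_1 & G_2 & G_3 \end{bmatrix}$. Hence:
\[
\begin{cases}
F_1 = \dfrac{\partial G_3}{\partial y} - \dfrac{\partial G_2}{\partial z} \\[1ex]
F_2 = \dfrac{\partial G_1}{\partial z} - \dfrac{\partial G_3}{\partial x} \\[1ex]
F_3 = \dfrac{\partial G_2}{\partial x} - \dfrac{\partial G_1}{\partial y}.
\end{cases}
\]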

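Taking certain partial derivatives of $F_1$, $F_2$, $F_3$ introduces the second-order mixed partials of $G_1$, $G_2$, $G_3$:
\[
\begin{cases}
\dfrac{\partial F_1}{\partial x} = \dfrac{\partial^2 G_3}{\partial x\,\partial y} - \dfrac{\partial^2 G_2}{\partial x\,\partial z} \\[1ex]
\dfrac{\partial F_2}{\partial y} = \dfrac{\partial^2 G_1}{\partial y\,\partial z} - \dfrac{\partial^2 G_3}{\partial y\,\partial x} \\[1ex]
\dfrac{\partial F_3}{\partial z} = \dfrac{\partial^2 G_2}{\partial z\,\partial x} - \dfrac{\partial^2 G_1}{\partial z\,\partial y}.
\end{cases}
\]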

The terms on the right side are mixed partial pairs that appear with opposite signs. Thus, by the equality of mixed partials, taking the sum gives:
\[
\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z} = 0.
\tag{10.13}
\]

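The sum can also be written in terms of the $\nabla$ operator as $\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z} = \bigl(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\bigr) \cdot (F_1, F_2, F_3) = \nabla \cdot \mathbf{F}$, where the product is dot product. This quantity has a name.
Definition. If $\mathbf{F} = (F_1, F_2, F_3)$ is a smooth vector field on an open set $U$ of $\mathbb{R}^3$, then:
\[
\nabla \cdot \mathbf{F} = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}
\]
is called the divergence of $\mathbf{F}$.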
Note that, while the curl of a vector field is again a vector field, the divergence is a real-valued
function.
From (10.13), we have proven the following:
Theorem 10.12. Let $U$ be an open set in $\mathbb{R}^3$, and let $\mathbf{F}$ be a smooth vector field on $U$. If $\mathbf{F}$ is a curl field on $U$, then:
\[
\nabla \cdot \mathbf{F} = 0
\]
at all points of $U$. Equivalently, given a smooth vector field $\mathbf{F}$ on $U$, if $\nabla \cdot \mathbf{F} \ne 0$ at some point, then $\mathbf{F}$ is not a curl field.
Example 10.13. The vector field $\mathbf{F}(x, y, z) = (2xy + z^2, 2yz + x^2, 2xz + y^2)$ is conservative on $\mathbb{R}^3$. It has $f(x, y, z) = x^2 y + y^2 z + z^2 x$ as a potential function. Is $\mathbf{F}$ also a curl field on $\mathbb{R}^3$?
We could write down the conditions for being a curl field and try to guess a vector field $\mathbf{G}$ such that $\mathbf{F} = \nabla \times \mathbf{G}$ the way we did before. It is easy, however, to compute the divergence: $\nabla \cdot \mathbf{F} = \frac{\partial}{\partial x}\bigl(2xy + z^2\bigr) + \frac{\partial}{\partial y}\bigl(2yz + x^2\bigr) + \frac{\partial}{\partial z}\bigl(2xz + y^2\bigr) = 2y + 2z + 2x$. Since this is not identically 0, $\mathbf{F}$ is not a curl field.
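The divergence test of Theorem 10.12 is easy to automate. The following sketch approximates $\nabla \cdot \mathbf{F}$ by central differences; the function name `divergence`, the step size `h`, and the sample point are assumptions made for illustration. For comparison it also evaluates the divergence of the constant “rain” field $(0, 1, -3)$, which we know is a curl field.

```python
def F(x, y, z):
    # The conservative field of Example 10.13.
    return (2 * x * y + z ** 2, 2 * y * z + x ** 2, 2 * x * z + y ** 2)

def rain(x, y, z):
    # The constant field (0, 1, -3), a curl field by Example 10.7.
    return (0.0, 1.0, -3.0)

def divergence(field, p, h=1e-4):
    """Central-difference approximation of dF1/dx + dF2/dy + dF3/dz at p."""
    total = 0.0
    for axis in range(3):
        plus, minus = list(p), list(p)
        plus[axis] += h
        minus[axis] -= h
        total += (field(*plus)[axis] - field(*minus)[axis]) / (2.0 * h)
    return total

p = (1.0, -2.0, 0.5)
print(divergence(F, p), 2 * (p[0] + p[1] + p[2]))  # nonzero: not a curl field
print(divergence(rain, p))                          # zero, as required
```

A nonzero value at even one point rules out being a curl field; a zero value everywhere does not by itself prove that a field is a curl field.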

10.5 Gauss’s theorem


Gauss’s theorem relates the integral of a vector field over a closed surface to a triple integral over
the three-dimensional region that the surface encloses. As with Green’s and Stokes’s theorems, we
indicate where the theorem comes from by deriving it in a very special case, then hint at how it
might be extended to more general situations.
Let F = (F1 , F2 , F3 ) be a smooth vector field defined on an open set U in R3 . We consider the
case of a surface S that is the boundary of a rectangular box W = [a, b] × [c, d] × [e, f ] contained
in U . Thus S consists of the six rectangular faces of the box, each parallel to one of the coordinate
planes. We orient each face with the unit normal pointing away from the box, as shown for three
of the faces in Figure 10.20. This makes S into a piecewise smooth oriented closed surface.


Figure 10.20: The rectangular box W = [a, b]×[c, d]×[e, f ] in R3 and the orientation of its boundary
surface S

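We integrate $\mathbf{F}$ over $S$. To do so, we consider each face separately and add the results:
\[
\iint_S \mathbf{F}\cdot d\mathbf{S} = \Bigl(\iint_{\text{top}} + \iint_{\text{bottom}} + \iint_{\text{front}} + \iint_{\text{back}} + \iint_{\text{left}} + \iint_{\text{right}}\Bigr)\,\mathbf{F}\cdot d\mathbf{S}.
\tag{10.14}
\]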

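For instance, the top may be parametrized using $x$ and $y$ as parameters with $z = f$ fixed: $\sigma(x, y) = (x, y, f)$, $a \le x \le b$, $c \le y \le d$. Then:
\[
\frac{\partial\sigma}{\partial x} \times \frac{\partial\sigma}{\partial y} = (1, 0, 0) \times (0, 1, 0) = \mathbf{i} \times \mathbf{j} = \mathbf{k} = (0, 0, 1).
\]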

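This points upward, hence away from the box $W$, so $\sigma$ is orientation-preserving. Therefore:
\[
\begin{aligned}
\iint_{\text{top}} \mathbf{F}\cdot d\mathbf{S}
&= \iint_{[a,b]\times[c,d]} \mathbf{F}(\sigma(x, y)) \cdot \Bigl(\frac{\partial\sigma}{\partial x} \times \frac{\partial\sigma}{\partial y}\Bigr)\,dx\,dy \\
&= \iint_{[a,b]\times[c,d]} (F_1, F_2, F_3) \cdot (0, 0, 1)\,dx\,dy \\
&= \iint_{[a,b]\times[c,d]} F_3(x, y, f)\,dx\,dy.
\end{aligned}
\]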

For the face on the bottom, the similar parametrization $\sigma(x, y) = (x, y, e)$ works, only now $\sigma$ is orientation-reversing since the orientation of $S$ on the bottom points downward in order to point away from $W$. This gives:
\[
\iint_{\text{bottom}} \mathbf{F}\cdot d\mathbf{S} = -\iint_{[a,b]\times[c,d]} F_3(x, y, e)\,dx\,dy.
\]

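We add these results and apply the fundamental theorem of calculus:
\[
\begin{aligned}
\iint_{\text{top}} \mathbf{F}\cdot d\mathbf{S} + \iint_{\text{bottom}} \mathbf{F}\cdot d\mathbf{S}
&= \iint_{[a,b]\times[c,d]} \bigl(F_3(x, y, f) - F_3(x, y, e)\bigr)\,dx\,dy \\
&= \iint_{[a,b]\times[c,d]} \Bigl(F_3(x, y, z)\Big|_{z=e}^{z=f}\Bigr)\,dx\,dy \\
&= \iint_{[a,b]\times[c,d]} \Bigl(\int_e^f \frac{\partial F_3}{\partial z}(x, y, z)\,dz\Bigr)\,dx\,dy \\
&= \iiint_W \frac{\partial F_3}{\partial z}(x, y, z)\,dx\,dy\,dz.
\end{aligned}
\]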

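The corresponding calculations for the remaining four faces yield:
\[
\iint_{\text{front}} \mathbf{F}\cdot d\mathbf{S} + \iint_{\text{back}} \mathbf{F}\cdot d\mathbf{S} = \iiint_W \frac{\partial F_1}{\partial x}(x, y, z)\,dx\,dy\,dz
\]
and
\[
\iint_{\text{left}} \mathbf{F}\cdot d\mathbf{S} + \iint_{\text{right}} \mathbf{F}\cdot d\mathbf{S} = \iiint_W \frac{\partial F_2}{\partial y}(x, y, z)\,dx\,dy\,dz.
\]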

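Adding up the results over all the faces as in equation (10.14) then gives:
\[
\iint_S \mathbf{F}\cdot d\mathbf{S} = \iiint_W \Bigl(\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}\Bigr)\,dx\,dy\,dz = \iiint_W \nabla \cdot \mathbf{F}\,dx\,dy\,dz.
\tag{10.15}
\]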

One can extend this formula to the case that W is a union of rectangular boxes that intersect
at most in pairs along their boundaries by applying equation (10.15) to the individual boxes and
adding. A key observation is that when two neighboring boxes intersect in the interior of W on
a portion of a common face, then the orienting normals of each box point in opposite directions
in order to point away from the respective boxes. Hence the normal components F · n are also
opposite, so that the surface integrals over the overlaps cancel out in the sum. This leaves only the
surface integral over the boundary of W . One then claims that equation (10.15) remains true for
more general regions by some sort of limiting argument.
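Equation (10.15) for a single box lends itself to a numerical check. In the sketch below, the field $\mathbf{F} = (x^2 y, yz, z^3)$, its hand-computed divergence $2xy + z + 3z^2$, the box $[0, 1] \times [0, 2] \times [0, 3]$, and the helper names are all hypothetical choices for illustration. The flux is summed face by face, using $\mathbf{F}\cdot\mathbf{n} = \pm F_i$ on the two faces perpendicular to the $i$th coordinate axis.

```python
def F(x, y, z):
    return (x * x * y, y * z, z ** 3)

def div_F(x, y, z):  # divergence of F, computed by hand
    return 2 * x * y + z + 3 * z * z

def box_flux(field, box, n=40):
    """Flux of `field` out of the box, summed over its six faces."""
    total = 0.0
    for axis in range(3):
        u_axis, v_axis = [k for k in range(3) if k != axis]
        (u0, u1), (v0, v1) = box[u_axis], box[v_axis]
        du, dv = (u1 - u0) / n, (v1 - v0) / n
        for i in range(n):
            for j in range(n):
                p = [0.0, 0.0, 0.0]
                p[u_axis] = u0 + (i + 0.5) * du
                p[v_axis] = v0 + (j + 0.5) * dv
                # Outward normal is +e_axis on the far face, -e_axis on the near face.
                for value, sign in ((box[axis][1], 1.0), (box[axis][0], -1.0)):
                    p[axis] = value
                    total += sign * field(*p)[axis] * du * dv
    return total

def box_integral(g, box, n=40):
    """Midpoint rule for the triple integral of g over the box."""
    (a, b), (c, d), (e, f) = box
    dx, dy, dz = (b - a) / n, (d - c) / n, (f - e) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                total += g(a + (i + 0.5) * dx, c + (j + 0.5) * dy, e + (k + 0.5) * dz)
    return total * dx * dy * dz

box = ((0.0, 1.0), (0.0, 2.0), (0.0, 3.0))
print(box_flux(F, box), box_integral(div_F, box))
```

For this test data both sides equal 69 when computed by hand; the triple integral is reproduced only approximately by the midpoint rule.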
Before stating the theorem, we identify the orientation convention implicit in the argument
given above, namely, if W is a bounded set in R3 whose boundary consists of one or more closed
surfaces, then:

∂W denotes the boundary of W oriented so that the orienting unit normal points away from W.

An example is shown in Figure 10.21.


The general version of equation (10.15) is then the following.

Theorem 10.14 (Gauss’s theorem). Let U be an open set in R3 , and let F be a smooth vector
field on U . If W is a bounded subset of U whose boundary consists of a finite number of piecewise
smooth closed surfaces, then:
\[
\iint_{\partial W} \mathbf{F}\cdot d\mathbf{S} = \iiint_W \nabla \cdot \mathbf{F}\,dx\,dy\,dz.
\]


Figure 10.21: A solid region W in R3 and its oriented boundary ∂W

In the case that F is a curl field, the left side of Gauss’s theorem is 0. This is because ∂W
consists of closed surfaces and the integral of a curl field over a closed surface is 0 (Theorem 10.11).
The right side is 0, too, because curl fields have divergence 0 (Theorem 10.12). Thus for curl fields,
Gauss’s theorem reflects properties already known to us.
To illustrate Gauss’s theorem, we begin by taking a new look at two surface integrals we
evaluated earlier.

Example 10.15. Let $S_a$ be the sphere $x^2 + y^2 + z^2 = a^2$ of radius $a$, oriented by the outward normal. Find $\iint_{S_a} \mathbf{F}\cdot d\mathbf{S}$ if:

(a) F(x, y, z) = (0, 0, −1),

(b) F(x, y, z) = (0, 0, z).

Figure 10.22: The solid ball Wa of radius a and its oriented boundary, the sphere Sa

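(a) This was Example 10.2(c). To answer it using Gauss’s theorem, let $W_a$ be the solid ball $x^2 + y^2 + z^2 \le a^2$ so that $S_a = \partial W_a$, as in Figure 10.22. Since $\nabla \cdot \mathbf{F} = \frac{\partial}{\partial x}(0) + \frac{\partial}{\partial y}(0) + \frac{\partial}{\partial z}(-1) = 0$, Gauss’s theorem says that:
\[
\iint_{S_a} \mathbf{F}\cdot d\mathbf{S} = \iiint_{W_a} \nabla \cdot \mathbf{F}\,dx\,dy\,dz = \iiint_{W_a} 0\,dx\,dy\,dz = 0.
\]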

(b) This is the third time we have evaluated this integral (see Examples 10.2(b) and 10.4). Here, $\nabla \cdot \mathbf{F} = 0 + 0 + 1 = 1$, so with $W_a$ as above:
\[
\iint_{S_a} \mathbf{F}\cdot d\mathbf{S} = \iiint_{W_a} 1\,dx\,dy\,dz = \mathrm{Vol}(W_a) = \frac{4}{3}\pi a^3,
\]
where we have used the formula for the volume of a 3-dimensional ball from Example 7.8 in Chapter 7.
It seems odd to say that turning a problem into a triple integral makes it simpler, but, with the
help of Gauss’s theorem, finding the preceding two surface integrals became essentially trivial.
Example 10.16. Let $\mathbf{F}(x, y, z) = (x, y, z)$. This vector field radiates directly away from the origin at every point, and the arrows get longer the further out you go. Let $W$ be a solid region in $\mathbb{R}^3$ as in Gauss’s theorem. Then $\nabla \cdot \mathbf{F} = 1 + 1 + 1 = 3$, so $\iint_{\partial W} \mathbf{F}\cdot d\mathbf{S} = \iiint_W 3\,dx\,dy\,dz = 3\,\mathrm{Vol}(W)$, that is:
\[
\mathrm{Vol}(W) = \frac{1}{3}\iint_{\partial W} (x, y, z) \cdot d\mathbf{S}.
\]
This is another instance where information about a set can be gleaned from a calculation just involving its boundary. (Question: Thinking of the surface integral as the integral of the normal component and keeping in mind that $\partial W$ is oriented by the outward normal, does it make sense that $\iint_{\partial W} (x, y, z) \cdot d\mathbf{S}$ is always positive? Note that the origin need not lie in $W$, so there may be points where the orienting normal $\mathbf{n}$ on the boundary points towards the origin and the normal component $(x, y, z) \cdot \mathbf{n}$ is negative.)
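The volume formula, and the question about negative normal components, can be explored numerically. The sketch below uses a hypothetical test region chosen here: the ball of radius 1 centered at $(5, 0, 0)$, which does not contain the origin. The function name and grid size are likewise assumptions for illustration.

```python
import math

def volume_from_boundary(center, a, n=200):
    """Approximate (1/3) * integral of (x, y, z) . n dS over the sphere of
    radius a about `center`, and record the smallest normal component seen."""
    d_phi = math.pi / n
    d_theta = 2.0 * math.pi / n
    total = 0.0
    smallest = float("inf")
    for i in range(n):
        phi = (i + 0.5) * d_phi
        for j in range(n):
            theta = (j + 0.5) * d_theta
            # Outward unit normal at the sample point.
            u = (math.sin(phi) * math.cos(theta),
                 math.sin(phi) * math.sin(theta),
                 math.cos(phi))
            point = tuple(c + a * w for c, w in zip(center, u))
            normal_component = sum(p * w for p, w in zip(point, u))
            smallest = min(smallest, normal_component)
            # Surface area element on the sphere: a^2 sin(phi) dphi dtheta.
            total += normal_component * a * a * math.sin(phi) * d_phi * d_theta
    return total / 3.0, smallest

volume, smallest = volume_from_boundary((5.0, 0.0, 0.0), 1.0)
print(volume, 4.0 / 3.0 * math.pi)  # the two should be close
print(smallest)                     # negative on the side facing the origin
```

The normal component is indeed negative on part of the boundary, yet the positive contributions from the far side outweigh it and the total still recovers the volume.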

10.6 The inverse square field


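Throughout this section, $\mathbf{G}$ denotes the inverse square field $\mathbf{G} : \mathbb{R}^3 - \{(0, 0, 0)\} \to \mathbb{R}^3$, given by:
\[
\mathbf{G}(x, y, z) = \Bigl(-\frac{x}{(x^2 + y^2 + z^2)^{3/2}}, -\frac{y}{(x^2 + y^2 + z^2)^{3/2}}, -\frac{z}{(x^2 + y^2 + z^2)^{3/2}}\Bigr).
\tag{10.16}
\]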
We study G using Gauss’s theorem, starting with a couple of preliminary facts.
Fact 1. Let $S_a$ be the sphere $x^2 + y^2 + z^2 = a^2$, oriented by the outward normal. Then:
\[
\iint_{S_a} \mathbf{G}\cdot d\mathbf{S} = -4\pi, \quad\text{independent of the radius } a.
\]

Justification. We calculated and even emphasized this earlier in Example 10.2(a), which was the
first surface integral of a vector field that we found. Note that this implies that G is not a curl
field, since a curl field would integrate to 0 over the closed surface Sa .
Fact 2. ∇ · G = 0.
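Justification. This is a brute force calculation. For instance:
\[
\begin{aligned}
\frac{\partial G_1}{\partial x}
&= \frac{\partial}{\partial x}\Bigl(-\frac{x}{(x^2 + y^2 + z^2)^{3/2}}\Bigr) \\
&= -\frac{(x^2 + y^2 + z^2)^{3/2} \cdot 1 - x \cdot \frac{3}{2}(x^2 + y^2 + z^2)^{1/2} \cdot 2x}{(x^2 + y^2 + z^2)^3} \\
&= -\frac{(x^2 + y^2 + z^2) - 3x^2}{(x^2 + y^2 + z^2)^{5/2}} \\
&= \frac{2x^2 - y^2 - z^2}{(x^2 + y^2 + z^2)^{5/2}}.
\end{aligned}
\]
By symmetric calculations:
\[
\frac{\partial G_2}{\partial y} = \frac{2y^2 - x^2 - z^2}{(x^2 + y^2 + z^2)^{5/2}} \quad\text{and}\quad \frac{\partial G_3}{\partial z} = \frac{2z^2 - x^2 - y^2}{(x^2 + y^2 + z^2)^{5/2}}.
\]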

Taking the sum gives $\nabla \cdot \mathbf{G} = \frac{(2x^2 - y^2 - z^2) + (2y^2 - x^2 - z^2) + (2z^2 - x^2 - y^2)}{(x^2 + y^2 + z^2)^{5/2}} = 0$, as claimed.
This fact has an interesting implication, too, for we know that curl fields have divergence 0.
This shows that the converse need not hold, since ∇ · G = 0 yet G is not a curl field by Fact 1.
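Facts 1 and 2 can both be probed numerically. The sketch below is a rough check under assumptions made here: the helper names, the central-difference step, the sample point, and the two radii are illustrative choices only.

```python
import math

def G(x, y, z):
    # The inverse square field (10.16), defined away from the origin.
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-x / r3, -y / r3, -z / r3)

def divergence(field, p, h=1e-4):
    """Central-difference approximation of the divergence of `field` at p."""
    total = 0.0
    for axis in range(3):
        plus, minus = list(p), list(p)
        plus[axis] += h
        minus[axis] -= h
        total += (field(*plus)[axis] - field(*minus)[axis]) / (2.0 * h)
    return total

def flux_through_sphere(field, a, n=200):
    """Midpoint-rule flux of `field` through the sphere of radius a about the
    origin, oriented by the outward normal."""
    d_phi = math.pi / n
    d_theta = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * d_phi
        for j in range(n):
            theta = (j + 0.5) * d_theta
            u = (math.sin(phi) * math.cos(theta),
                 math.sin(phi) * math.sin(theta),
                 math.cos(phi))
            value = field(a * u[0], a * u[1], a * u[2])
            normal_component = sum(v * w for v, w in zip(value, u))
            total += normal_component * a * a * math.sin(phi) * d_phi * d_theta
    return total

print(divergence(G, (1.0, 2.0, -0.5)))                 # close to 0 (Fact 2)
print(flux_through_sphere(G, 0.5), -4.0 * math.pi)     # close to -4*pi (Fact 1)
print(flux_through_sphere(G, 3.0), -4.0 * math.pi)     # same value for another radius
```

The divergence vanishes at every sample point away from the origin, while the flux through each sphere is the same nonzero number, which is the numerical shadow of the statement that $\mathbf{G}$ is divergence-free but not a curl field.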
Now, let $S$ be a piecewise smooth closed surface in $\mathbb{R}^3 - \{(0, 0, 0)\}$, oriented by the outward normal. Is that enough for us to be able to say something about $\iint_S \mathbf{G}\cdot d\mathbf{S}$? Strikingly, the answer is yes. There are two cases.
Case 1. Suppose that the origin 0 lies in the exterior of S (Figure 10.23).

Figure 10.23: The origin lies exterior to S.

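Let $W$ be the three-dimensional solid inside of $S$ so that $S = \partial W$. By assumption, $\mathbf{0} \notin W$, so $\mathbf{G}$ is defined throughout $W$ and we may apply Gauss’s theorem. This gives:
\[
\begin{aligned}
\iint_S \mathbf{G}\cdot d\mathbf{S}
&= \iiint_W \nabla \cdot \mathbf{G}\,dx\,dy\,dz \\
&= \iiint_W 0\,dx\,dy\,dz && \text{(by Fact 2)} \\
&= 0.
\end{aligned}
\]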
Case 2. Suppose that 0 lies in the interior of S.
The preceding argument is no longer valid, since filling in the interior of $S$ would include the origin, which is excluded from the domain of $\mathbf{G}$. Instead, we choose a very small sphere $S_\epsilon$ of radius $\epsilon$ centered at the origin, oriented by the outward normal, and we let $W$ be the region inside $S$ but outside $S_\epsilon$. See Figure 10.24.

Figure 10.24: The origin lies in the interior of S.

Then Gauss’s theorem applies to $W$, giving:
\[
\iint_{\partial W} \mathbf{G}\cdot d\mathbf{S} = \iiint_W \nabla \cdot \mathbf{G}\,dx\,dy\,dz = 0,
\tag{10.17}
\]
as before. This time, however, the boundary of $W$ consists of $S$ and $S_\epsilon$, oriented by the normal pointing away from $W$. This is the outward normal on $S$ but points inward on $S_\epsilon$. In other words, $\partial W = S \cup (-S_\epsilon)$, and therefore $\iint_{\partial W} \mathbf{G}\cdot d\mathbf{S} = \iint_S \mathbf{G}\cdot d\mathbf{S} - \iint_{S_\epsilon} \mathbf{G}\cdot d\mathbf{S}$. By equation (10.17), the difference is 0, from which it follows that $\iint_S \mathbf{G}\cdot d\mathbf{S} = \iint_{S_\epsilon} \mathbf{G}\cdot d\mathbf{S}$. And, by Fact 1, the integral of $\mathbf{G}$ over any sphere centered at the origin is $-4\pi$. Hence:
\[
\iint_S \mathbf{G}\cdot d\mathbf{S} = -4\pi.
\]

After a slight rearrangement, we can summarize the discussion as follows. For a piecewise smooth closed surface $S$ that doesn’t pass through the origin and is oriented by the outward normal:
\[
-\frac{1}{4\pi}\iint_S \mathbf{G}\cdot d\mathbf{S} =
\begin{cases}
0 & \text{if $\mathbf{0}$ lies in the exterior of $S$,} \\
1 & \text{if $\mathbf{0}$ lies in the interior of $S$.}
\end{cases}
\]

As we noted back in Chapter 8, the inverse square field describes the gravitational, or electro-
static, force due to an object, or charged particle, located at the origin. If several objects of equal
mass, or equal charge, are placed at the points p1 , p2 , . . . , pk in R3 , then the total force at a point
x due to these objects is described by the vector field:

\[
\mathbf{F}(\mathbf{x}) = \sum_{i=1}^{k} \mathbf{G}(\mathbf{x} - \mathbf{p}_i),
\tag{10.18}
\]

where $\mathbf{G}$ is the same as above (10.16). It is not hard to verify that again $\nabla \cdot \mathbf{F} = 0$. A modification of the argument just given shows that, if $S$ is a piecewise smooth closed surface contained in the set $\mathbb{R}^3 - \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_k\}$ and oriented by the outward normal, then:
\[
-\frac{1}{4\pi}\iint_S \mathbf{F}\cdot d\mathbf{S} \ \text{ counts the number of the points } \mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_k \text{ that lie inside } S.
\]

This is an instance of what is known in physics as Gauss’s law. In effect, the total mass, or charge,
contained inside S determines, and is determined by, the integral of F over S. An explicit formula
or parametrization for S is not needed to obtain this information.
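The counting statement can be watched in action numerically. The sketch below is illustrative only: the three source points, the choice of $S$ as the sphere of radius 2 about the origin, and the function names are hypothetical test data. Two of the points lie inside the sphere and one lies outside.

```python
import math

def G(x, y, z):
    # The inverse square field (10.16).
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-x / r3, -y / r3, -z / r3)

def total_field(points):
    """Return F(x) = sum of G(x - p_i) over the given source points, as in (10.18)."""
    def F(x, y, z):
        fx = fy = fz = 0.0
        for (px, py, pz) in points:
            gx, gy, gz = G(x - px, y - py, z - pz)
            fx, fy, fz = fx + gx, fy + gy, fz + gz
        return (fx, fy, fz)
    return F

def flux_through_sphere(field, a, n=200):
    """Midpoint-rule flux through the sphere of radius a about the origin."""
    d_phi = math.pi / n
    d_theta = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * d_phi
        for j in range(n):
            theta = (j + 0.5) * d_theta
            u = (math.sin(phi) * math.cos(theta),
                 math.sin(phi) * math.sin(theta),
                 math.cos(phi))
            value = field(a * u[0], a * u[1], a * u[2])
            normal_component = sum(v * w for v, w in zip(value, u))
            total += normal_component * a * a * math.sin(phi) * d_phi * d_theta
    return total

points = [(0.5, 0.0, 0.0), (0.0, -1.0, 0.3), (3.0, 0.0, 0.0)]
count = -flux_through_sphere(total_field(points), 2.0) / (4.0 * math.pi)
print(count)  # approximately 2: two of the three points lie inside the sphere
```

Moving a point across the sphere changes the computed value by about 1, while moving it around without crossing the surface leaves the value essentially unchanged.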

10.7 A more substitution-friendly notation for surface integrals


We now fill in a few details that we have been postponing. Some of the arguments are technical, but they also unveil themes that unify the theory behind what we have been doing.
In the case of line integrals, we have typically replaced the notation $\int_C \mathbf{F} \cdot d\mathbf{s}$ with the more suggestive $\int_C F_1\, dx_1 + F_2\, dx_2 + \cdots + F_n\, dx_n$. The latter is useful in that it tells us more or less what we need to substitute in order to express the integral in terms of the parameter used to trace out $C$. We now introduce a corresponding notation for surface integrals.
Let F = (F1 , F2 , F3 ) be a continuous vector field on an open set U in R3 , and let S be a smooth
oriented surface in U . To integrate F over S, we want to turn the calculation into a double integral
in the two parameters of a parametrization. So let σ : D → R3 be a smooth parametrization of S,
where σ(s, t) = (x(s, t), y(s, t), z(s, t)). For any function of (x, y, z), such as the components F1 , F2 ,
and F3 of F, the parametrization tells us how to substitute for x, y, and z, resulting in a function
of s and t.


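For simplicity, let’s assume that $\sigma$ is orientation-preserving. By definition:
\[
\begin{aligned}
\iint_S \mathbf{F} \cdot d\mathbf{S}
&= \iint_D \mathbf{F}(\sigma(s,t)) \cdot \Big( \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \Big)\, ds\, dt \\
&= \iint_D \big( F_1(\sigma(s,t)), F_2(\sigma(s,t)), F_3(\sigma(s,t)) \big) \cdot
\det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial x}{\partial s} & \frac{\partial y}{\partial s} & \frac{\partial z}{\partial s} \\ \frac{\partial x}{\partial t} & \frac{\partial y}{\partial t} & \frac{\partial z}{\partial t} \end{bmatrix} ds\, dt \\
&= \iint_D \Big( F_1(\sigma(s,t)) \cdot \det \begin{bmatrix} \frac{\partial y}{\partial s} & \frac{\partial z}{\partial s} \\ \frac{\partial y}{\partial t} & \frac{\partial z}{\partial t} \end{bmatrix}
+ F_2(\sigma(s,t)) \cdot \det \begin{bmatrix} \frac{\partial z}{\partial s} & \frac{\partial x}{\partial s} \\ \frac{\partial z}{\partial t} & \frac{\partial x}{\partial t} \end{bmatrix} \\
&\qquad\qquad + F_3(\sigma(s,t)) \cdot \det \begin{bmatrix} \frac{\partial x}{\partial s} & \frac{\partial y}{\partial s} \\ \frac{\partial x}{\partial t} & \frac{\partial y}{\partial t} \end{bmatrix} \Big)\, ds\, dt.
\end{aligned} \tag{10.19}
\]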
We adopt the notation:
\[
\iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_S F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy. \tag{10.20}
\]
The “∧” symbol stands for a kind of multiplication known as the wedge product. We develop some of its properties along the way as we get more practice working with it.
For instance, if $\mathbf{F}(x, y, z) = (xyz,\ x^2 + y^2 + z^2,\ xy + yz)$, then $\iint_S \mathbf{F} \cdot d\mathbf{S}$ would be written $\iint_S xyz\, dy \wedge dz + (x^2 + y^2 + z^2)\, dz \wedge dx + (xy + yz)\, dx \wedge dy$. The order of the factors in the individual terms of the notation is important. The pattern is that the subscripts/variables appear in cyclic order, as indicated in Figure 10.25.

Figure 10.25: The cyclic ordering of coordinates in $\mathbb{R}^3$ (diagram pairing $F_1$ with $dx$, $F_2$ with $dy$, and $F_3$ with $dz$)

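In comparison with the expanded definition (10.19), the notation in (10.20) says that, as part of the substitutions for $x, y, z$ in terms of $s, t$, one should make the substitution, for example:
\[
dy \wedge dz = \det \begin{bmatrix} \frac{\partial y}{\partial s} & \frac{\partial z}{\partial s} \\ \frac{\partial y}{\partial t} & \frac{\partial z}{\partial t} \end{bmatrix} ds\, dt.
\]
Actually, this may look more familiar if we take the transpose of the matrix, which does not affect the value of the determinant:
\[
dy \wedge dz = \det \begin{bmatrix} \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{bmatrix} ds\, dt. \tag{10.21}
\]
This has the form of a change of variables, i.e., a substitution, for a transformation from the $st$-plane to the $yz$-plane. Similarly, to evaluate an integral presented in the notation of (10.20), one substitutes:
\[
dz \wedge dx = \det \begin{bmatrix} \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \\ \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \end{bmatrix} ds\, dt
\quad \text{and} \quad
dx \wedge dy = \det \begin{bmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{bmatrix} ds\, dt, \tag{10.22}
\]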

which also have the form of certain changes of variables.
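
To see the substitutions (10.21) and (10.22) in action, here is a short worked example. The surface, the graph of $z = x^2 + y^2$ parametrized by $\sigma(s,t) = (s, t, s^2 + t^2)$, is an illustrative choice and not one taken from the text.

```latex
% Assumed example: sigma(s,t) = (s, t, s^2 + t^2), so that
% x = s, y = t, z = s^2 + t^2.
\begin{align*}
dy \wedge dz &= \det \begin{bmatrix} 0 & 1 \\ 2s & 2t \end{bmatrix} ds\, dt = -2s \, ds\, dt, \\
dz \wedge dx &= \det \begin{bmatrix} 2s & 2t \\ 1 & 0 \end{bmatrix} ds\, dt = -2t \, ds\, dt, \\
dx \wedge dy &= \det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} ds\, dt = ds\, dt.
\end{align*}
% Substituting into (10.20):
\[
\iint_S F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy
= \iint_D \big( -2s\, F_1 - 2t\, F_2 + F_3 \big)\, ds\, dt,
\]
% where F_1, F_2, F_3 are evaluated at sigma(s,t).
```

The three coefficients $(-2s, -2t, 1)$ are the components of $\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t}$, the upward normal to the graph, so the result agrees with the definition (10.19).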


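The integrand $\eta = F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy$ of (10.20) is called a differential 2-form. By comparison, the expressions $\omega = F_1\, dx + F_2\, dy + F_3\, dz$ that we integrated over curves in $\mathbb{R}^3$ are called differential 1-forms. To integrate the 2-form $\eta$ over $S$, we substitute for $x, y, z$ in terms of $s, t$ in $F_1, F_2, F_3$ and replace the wedge products like $dy \wedge dz$ with expressions that come formally from the change of variables theorem. This converts the integral over $S$ into a double integral over the parameter domain. The integrand of this double integral is called the pullback of $\eta$, denoted $\sigma^*(\eta)$. From (10.19), it can be written:
\[
\sigma^*(\eta) = \Big( (F_1 \circ \sigma) \cdot \det \begin{bmatrix} \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{bmatrix}
+ (F_2 \circ \sigma) \cdot \det \begin{bmatrix} \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \\ \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \end{bmatrix}
+ (F_3 \circ \sigma) \cdot \det \begin{bmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{bmatrix} \Big)\, ds \wedge dt.
\]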

With this notation, the definition of the surface integral takes the form:
\[
\iint_{\sigma(D)} \eta = \iint_D \sigma^*(\eta), \tag{10.23}
\]
where, on the right, an integral of the form $\iint_D f(s,t)\, ds \wedge dt$ is defined to be the usual double integral $\iint_D f(s,t)\, ds\, dt$. Equation (10.23) highlights that the definition has the same form as the change of variables theorem, continuing an emerging pattern (see (7.3) and (9.6)).
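
The recipe behind (10.23) can be sketched in Python: compute the coefficient of $ds \wedge dt$ in the pullback from the three 2 by 2 determinants, then integrate it over $D$. The example surface (the upper unit hemisphere in spherical coordinates), the field $\mathbf{F}(x,y,z) = (x,y,z)$, and all function names are assumptions chosen for illustration.

```python
import math

def sigma(s, t):
    """Assumed parametrization of the upper unit hemisphere."""
    return (math.sin(s) * math.cos(t), math.sin(s) * math.sin(t), math.cos(s))

def partials(s, t):
    """Partial derivatives of sigma with respect to s and t."""
    d_s = (math.cos(s) * math.cos(t), math.cos(s) * math.sin(t), -math.sin(s))
    d_t = (-math.sin(s) * math.sin(t), math.sin(s) * math.cos(t), 0.0)
    return d_s, d_t

def det2(a, b, c, d):
    """Determinant of the 2 by 2 matrix with rows (a, b) and (c, d)."""
    return a * d - b * c

def pullback_coefficient(F, s, t):
    """Coefficient of ds ^ dt in the pullback of F1 dy^dz + F2 dz^dx + F3 dx^dy."""
    (xs, ys, zs), (xt, yt, zt) = partials(s, t)
    f1, f2, f3 = F(*sigma(s, t))
    dy_dz = det2(ys, yt, zs, zt)  # substitution (10.21)
    dz_dx = det2(zs, zt, xs, xt)  # substitutions (10.22)
    dx_dy = det2(xs, xt, ys, yt)
    return f1 * dy_dz + f2 * dz_dx + f3 * dx_dy

def double_integral(f, s_range, t_range, n=200):
    """Midpoint-rule approximation of a double integral over a rectangle."""
    (s0, s1), (t0, t1) = s_range, t_range
    ds, dt = (s1 - s0) / n, (t1 - t0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += f(s0 + (i + 0.5) * ds, t0 + (j + 0.5) * dt)
    return total * ds * dt

def F(x, y, z):
    return (x, y, z)

value = double_integral(
    lambda s, t: pullback_coefficient(F, s, t),
    (0.0, math.pi / 2), (0.0, 2.0 * math.pi),
)
print(value, 2.0 * math.pi)
```

For this choice the pullback coefficient simplifies to $\sin s$, so the integral is $2\pi$, and the numerical value should agree to several decimal places.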
We say more in the next chapter about properties of differential forms, but our discussion so far
can be used to motivate some elementary manipulations. The substitution formulas (10.21) and
(10.22) for dy ∧ dz and the like suggest a connection between wedge products and determinants.
To reinforce this, we adopt the following rules for working with differential forms:

• dz ∧ dy = −dy ∧ dz,
dx ∧ dz = −dz ∧ dx,
dy ∧ dx = −dx ∧ dy,

• dx ∧ dx = 0,
dy ∧ dy = 0,
dz ∧ dz = 0.

The first collection mirrors the property that interchanging two rows changes the sign of a deter-
minant, and the second mirrors the property that two equal rows imply a determinant of 0. We
postpone until the next chapter the discussion of how these properties may be used.
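
As a small preview, the two rules can be encoded in Python. The sketch below makes an assumption that this section has not stated: that the wedge product distributes over sums and that scalar coefficients factor out. Under that assumption, the wedge of two 1-forms in $\mathbb{R}^3$ has coefficients given by 2 by 2 determinants, the same ones that appear in a cross product. The function name `wedge` and the tuple encoding are illustrative choices.

```python
def wedge(a, b):
    """Wedge product of the 1-forms a1 dx + a2 dy + a3 dz and b1 dx + b2 dy + b3 dz.

    Returns the coefficients of (dy ^ dz, dz ^ dx, dx ^ dy), assuming the
    wedge product is bilinear and obeys the two rules listed above.
    """
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

# The basic 1-forms dx, dy, dz as coefficient triples.
dx, dy, dz = (1, 0, 0), (0, 1, 0), (0, 0, 1)

print(wedge(dy, dz))  # coefficient 1 on dy ^ dz
print(wedge(dz, dy))  # coefficient -1 on dy ^ dz, the first rule
print(wedge(dx, dx))  # all coefficients 0, the second rule
```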

10.8 Independence of parametrization


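Finally, we address the long-standing question of the extent to which the official definition of the surface integral, $\iint_S \mathbf{F} \cdot d\mathbf{S} = \pm \iint_D \mathbf{F}(\sigma(s,t)) \cdot \big( \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \big)\, ds\, dt$, depends on the smooth parametrization $\sigma : D \to \mathbb{R}^3$ of $S$, where $D$ is a subset of the $st$-plane and, by assumption, $\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \neq \mathbf{0}$, except possibly on the boundary of $D$. To highlight the role of the parametrization $\sigma$, we write $\iint_\sigma \mathbf{F} \cdot d\mathbf{S}$ to denote the quantity $\iint_D \mathbf{F}(\sigma(s,t)) \cdot \big( \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \big)\, ds\, dt$ in the definition.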
We continue to assume that $\sigma$ is orientation-preserving. Let $\tau : D^* \to \mathbb{R}^3$ be another smooth parametrization of $S$. Call its parameters $u$ and $v$ so that $\tau(u,v) = (x(u,v), y(u,v), z(u,v))$. For simplicity, we assume that $\sigma$ and $\tau$ are one-to-one, except possibly on the boundaries of their domains. The goal is to compare $\iint_\sigma \mathbf{F} \cdot d\mathbf{S}$ and $\iint_\tau \mathbf{F} \cdot d\mathbf{S}$. We use the formula for the surface


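integral after it has been expanded out in terms of coordinates as suggested by the differential form notation, that is:
\[
\begin{aligned}
\iint_\sigma \mathbf{F} \cdot d\mathbf{S} = \iint_D \Big( & F_1(\sigma(s,t)) \cdot \det \begin{bmatrix} \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{bmatrix}
+ F_2(\sigma(s,t)) \cdot \det \begin{bmatrix} \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \\ \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \end{bmatrix} \\
& + F_3(\sigma(s,t)) \cdot \det \begin{bmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{bmatrix} \Big)\, ds\, dt,
\end{aligned} \tag{10.24}
\]
as in equation (10.19), except that the 2 by 2 matrices have been transposed. The expression for $\iint_\tau \mathbf{F} \cdot d\mathbf{S}$ is similar, integrating over $D^*$ and using $u$’s and $v$’s instead of $s$’s and $t$’s.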
If x ∈ S, then x = σ(s, t) for some (s, t) in D and x = τ (u, v) for some (u, v) in D∗ . We write
this as (s, t) = T (u, v). This defines a function T : D∗ → D, where, given (u, v) in D∗ , T (u, v) is
the point (s, t) in D such that σ(s, t) = τ (u, v). Then, by construction:
τ (u, v) = σ(s, t) = σ(T (u, v)). (10.25)
See Figure 10.26. (If $\sigma$ or $\tau$ is not one-to-one, which is permitted only on the boundaries of their respective domains, there is some ambiguity on how to define $T$ there, but, as we have observed before, the integral can tolerate a certain amount of misbehavior on the boundary.)

Figure 10.26: Two parametrizations σ and τ of the same surface S

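By equation (10.25) and the chain rule, $D\tau(u,v) = D\sigma(T(u,v)) \cdot DT(u,v) = D\sigma(s,t) \cdot DT(u,v)$, or:
\[
\begin{bmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \\ \frac{\partial z}{\partial u} & \frac{\partial z}{\partial v} \end{bmatrix}
=
\begin{bmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{bmatrix}
\begin{bmatrix} \frac{\partial s}{\partial u} & \frac{\partial s}{\partial v} \\ \frac{\partial t}{\partial u} & \frac{\partial t}{\partial v} \end{bmatrix}. \tag{10.26}
\]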
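We focus on the 2 by 2 blocks within the first two matrices of this equation. For instance, isolating the portions of the product involving the second and third rows, that is, the rows for $y$ and $z$, gives:
\[
\begin{bmatrix} \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \\ \frac{\partial z}{\partial u} & \frac{\partial z}{\partial v} \end{bmatrix}
=
\begin{bmatrix} \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{bmatrix}
\begin{bmatrix} \frac{\partial s}{\partial u} & \frac{\partial s}{\partial v} \\ \frac{\partial t}{\partial u} & \frac{\partial t}{\partial v} \end{bmatrix}
=
\begin{bmatrix} \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{bmatrix} \cdot DT(u,v).
\]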
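Since the determinant of a product is the product of the determinants, that is, $\det(AB) = (\det A)(\det B)$, we obtain:
\[
\det \begin{bmatrix} \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \\ \frac{\partial z}{\partial u} & \frac{\partial z}{\partial v} \end{bmatrix}
= \det \begin{bmatrix} \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{bmatrix} \cdot \det DT(u,v). \tag{10.27}
\]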
Referring to equation (10.24), the first determinant appears in the definition of $\iint_\tau \mathbf{F} \cdot d\mathbf{S}$, while the second appears in the definition of $\iint_\sigma \mathbf{F} \cdot d\mathbf{S}$. By choosing different pairs of rows in the chain rule (10.26), the same reasoning applies to the other two determinants that appear in (10.24). In each case, the determinants for $\tau$ and $\sigma$ differ by a factor of $\det DT(u,v)$.
In fact, these 2 by 2 determinants are precisely the components of the cross products $\frac{\partial \tau}{\partial u} \times \frac{\partial \tau}{\partial v}$ and $\frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t}$, so this calculation shows that:
\[
\frac{\partial \tau}{\partial u} \times \frac{\partial \tau}{\partial v} = \Big( \frac{\partial \sigma}{\partial s} \times \frac{\partial \sigma}{\partial t} \Big) \cdot \det DT(u,v),
\]
where, contrary to standard practice, we have written the scalar factor $\det DT(u,v)$ on the right. The two cross products are normal vectors to $S$. Since we assumed that $\sigma$ is orientation-preserving, we conclude that $\tau$ is orientation-preserving if $\det DT(u,v)$ is positive and orientation-reversing if it is negative.
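We are ready to compare the integrals obtained using the two parametrizations. Beginning with the expansion (10.24) with $\tau$ as parametrization and then using (10.27) and its analogues to pull out a factor of $\det DT(u,v)$, we obtain:
\begin{align}
\iint_\tau \mathbf{F} \cdot d\mathbf{S}
&= \iint_{D^*} \Big( (F_1 \circ \tau) \cdot \det \begin{bmatrix} \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \\ \frac{\partial z}{\partial u} & \frac{\partial z}{\partial v} \end{bmatrix}
+ (F_2 \circ \tau) \cdot \det \begin{bmatrix} \frac{\partial z}{\partial u} & \frac{\partial z}{\partial v} \\ \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \end{bmatrix} \notag \\
&\qquad\qquad + (F_3 \circ \tau) \cdot \det \begin{bmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{bmatrix} \Big)\, du\, dv \tag{10.28} \\
&= \iint_{D^*} \Big( (F_1 \circ \tau) \cdot \det \begin{bmatrix} \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{bmatrix}
+ (F_2 \circ \tau) \cdot \det \begin{bmatrix} \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \\ \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \end{bmatrix} \notag \\
&\qquad\qquad + (F_3 \circ \tau) \cdot \det \begin{bmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{bmatrix} \Big) \cdot \det DT(u,v)\, du\, dv. \tag{10.29}
\end{align}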

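Next, note that $\det DT(u,v) = \pm \lvert \det DT(u,v) \rvert$, where the sign depends on whether $\det DT(u,v)$ is positive or negative or equivalently, as noted above, whether $\tau$ is orientation-preserving or reversing. Keeping in mind that $\tau = \sigma \circ T$, we find:
\[
\begin{aligned}
\iint_\tau \mathbf{F} \cdot d\mathbf{S} = \pm \iint_{D^*} \Big( & (F_1 \circ \sigma \circ T) \cdot \det \begin{bmatrix} \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{bmatrix}
+ (F_2 \circ \sigma \circ T) \cdot \det \begin{bmatrix} \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \\ \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \end{bmatrix} \\
& + (F_3 \circ \sigma \circ T) \cdot \det \begin{bmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{bmatrix} \Big) \cdot \lvert \det DT(u,v) \rvert\, du\, dv.
\end{aligned}
\]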
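This last integral is the result of a change of variables. It is an integral over $D^*$ that is the pullback of an integral over $D$ using $T$ as the change of variables transformation from the $uv$-plane to the $st$-plane. The integral over $D$ to which the change of variables is applied is:
\[
\pm \iint_D \Big( (F_1 \circ \sigma) \cdot \det \begin{bmatrix} \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{bmatrix}
+ (F_2 \circ \sigma) \cdot \det \begin{bmatrix} \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \\ \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \end{bmatrix}
+ (F_3 \circ \sigma) \cdot \det \begin{bmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{bmatrix} \Big)\, ds\, dt.
\]
Going back to equation (10.24), this last integral is exactly what would be used to calculate the surface integral using $\sigma$ as parametrization. That is, it is $\pm \iint_\sigma \mathbf{F} \cdot d\mathbf{S}$. We have attained our objective.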
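Proposition 10.17. Let $\sigma : D \to \mathbb{R}^3$ be a smooth orientation-preserving parametrization of an oriented surface $S$ in $\mathbb{R}^3$. If $\tau : D^* \to \mathbb{R}^3$ is another smooth parametrization of $S$, then:
\[
\iint_\tau \mathbf{F} \cdot d\mathbf{S} = \pm \iint_\sigma \mathbf{F} \cdot d\mathbf{S},
\]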


where the sign depends on whether τ is orientation-preserving or reversing.


In particular, in order to compute $\iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_\sigma \mathbf{F} \cdot d\mathbf{S}$, it does not matter which orientation-preserving parametrization $\sigma$ is used.
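
Proposition 10.17 can be illustrated numerically. The Python sketch below integrates $\mathbf{F}(x,y,z) = (x,y,z)$ over the part of the paraboloid $z = x^2 + y^2$ with $x^2 + y^2 \le 1$, using three parametrizations: $\sigma(r,\theta) = (r\cos\theta, r\sin\theta, r^2)$, $\tau(u,v) = (\sqrt{u}\cos v, \sqrt{u}\sin v, u)$, for which $T(u,v) = (\sqrt{u}, v)$ has positive determinant, and a copy of $\tau$ with its parameters swapped, which reverses orientation. The surface, the field, the function names, and the use of finite differences for the partial derivatives are all illustrative assumptions.

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def flux(F, param, u_range, v_range, n=200, h=1e-6):
    """Midpoint-rule approximation of the integral of F(param) . (param_u x param_v).

    Partial derivatives of the parametrization are approximated by
    central differences with step h.
    """
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            p_u = tuple((a - b) / (2 * h)
                        for a, b in zip(param(u + h, v), param(u - h, v)))
            p_v = tuple((a - b) / (2 * h)
                        for a, b in zip(param(u, v + h), param(u, v - h)))
            normal = cross(p_u, p_v)
            field = F(*param(u, v))
            total += sum(f * m for f, m in zip(field, normal)) * du * dv
    return total

def sigma(r, theta):
    return (r * math.cos(theta), r * math.sin(theta), r * r)

def tau(u, v):
    root = math.sqrt(u)
    return (root * math.cos(v), root * math.sin(v), u)

def tau_swapped(v, u):
    """Same surface as tau, with the two parameters interchanged."""
    return tau(u, v)

def F(x, y, z):
    return (x, y, z)

two_pi = 2.0 * math.pi
I_sigma = flux(F, sigma, (0.0, 1.0), (0.0, two_pi))
I_tau = flux(F, tau, (0.0, 1.0), (0.0, two_pi))
I_swapped = flux(F, tau_swapped, (0.0, two_pi), (0.0, 1.0))
print(I_sigma, I_tau, I_swapped)
```

A hand calculation for this example gives $-\pi/2$ with the upward normal, so the first two values should agree with each other and with $-\pi/2$, while the third should have the opposite sign.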
For integrals of real-valued functions with respect to surface area, the corresponding assertion is that $\iint_\sigma f\, dS = \iint_\tau f\, dS$ for any two smooth parametrizations $\sigma, \tau$ of $S$. The proof combines the
ideas just given above with those given in the analogous situation for integrals over curves with
respect to arclength. We won’t go through the details. This may be one of those rare occasions
where it is all right to say that the authorities have weighed in and to let them have their way.
One last remark: We commented earlier that the definition of the vector field surface integral can be regarded as pulling back the integral of the 2-form $\eta = F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy$ to a double integral in the parameter plane (see equation (10.23)). Using that language, our calculation of the independence of parametrization can be summarized in one line:
\[
\iint_{D^*} \tau^*(\eta) = \iint_{D^*} (\sigma \circ T)^*(\eta) = \iint_{D^*} T^*(\sigma^*(\eta)) = \iint_D \sigma^*(\eta). \tag{10.30}
\]

In other words, using either σ or τ to pull back η gives the same value.
Don’t panic! The first equation follows since T was defined so that τ = σ ◦ T . The second
equation uses the fact that (σ ◦ T )∗ (η) = T ∗ (σ ∗ (η)). This is essentially the chain rule-based
calculation that brought out the factor of det DT (u, v) in going from (10.28) to (10.29), though
we would need to develop the properties of pullbacks more carefully to explain this. (To get a
taste of these properties, see Exercises 4.14–4.18 in Chapter 11.) Finally, the step going from D∗
to D uses the pullback form of the change of variables theorem, which is a refinement that takes
orientation into account and eliminates the absolute value around the determinant (see Exercise
4.20 in Chapter 11).
Though there are gaps that need to be filled in, the intended lesson here is that differential forms,
properly developed, provide a structure in which the arguments can be presented with remarkable
conciseness. Of course, the proper development may involve lengthy calculations similar to the
ones we presented, but their validity is guided by and rooted in basic principles: the chain rule and
change of variables.

10.9 Exercises for Chapter 10


Section 1 What the surface integral measures

1.1. Let $S$ be the hemisphere $x^2 + y^2 + z^2 = a^2$, $z \ge 0$, oriented by the outward normal.
(a) Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $\mathbf{F}(x, y, z) = (x, y, z)$.
(b) Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $\mathbf{F}(x, y, z) = (y - z, z - x, x - y)$.
1.2. Let $\mathbf{F}(x, y, z) = (z, x, y)$, and let $S$ be the triangular surface in $\mathbb{R}^3$ with vertices $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$, oriented by the upward normal. Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$.
1.3. Let $\mathbf{F}$ be a continuous vector field on $\mathbb{R}^3$ such that $\mathbf{F}(x, y, z)$ is always a scalar multiple of $(0, 0, 1)$, i.e., $\mathbf{F}(x, y, z) = \big(0, 0, F_3(x, y, z)\big)$, and let $S$ be the cylinder $x^2 + y^2 = 1$, $0 \le z \le 5$, oriented by the outward normal. Show that $\iint_S \mathbf{F} \cdot d\mathbf{S} = 0$.
1.4. Let $U$ be an open set in $\mathbb{R}^3$, and let $S$ be a smooth oriented surface in $U$ with orienting normal vector field $\mathbf{n}$. Let $\mathbf{F}$ be a continuous vector field on $U$. If $\mathbf{F}(\mathbf{x}) = \mathbf{n}(\mathbf{x})$ for all $\mathbf{x}$ in $S$, prove that $\iint_S \mathbf{F} \cdot d\mathbf{S} = \text{Area}\,(S)$.

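1.5. Let $\mathbf{F} : \mathbb{R}^3 \to \mathbb{R}^3$ be a vector field such that $\mathbf{F}(-\mathbf{x}) = -\mathbf{F}(\mathbf{x})$ for all $\mathbf{x}$ in $\mathbb{R}^3$, and let $S$ be the sphere $x^2 + y^2 + z^2 = a^2$, oriented by the outward normal. Is it necessarily true that $\iint_S \mathbf{F} \cdot d\mathbf{S} = 0$? Explain.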

Section 2 The definition of the surface integral

2.1. Let $S$ be the portion of the plane $x + y + z = 3$ for which $0 \le x \le 1$ and $0 \le y \le 1$. Let $\mathbf{F}(x, y, z) = (2y, x, z)$. If $S$ is oriented by the upward normal, find $\iint_S \mathbf{F} \cdot d\mathbf{S}$.

2.2. Let $\mathbf{F}(x, y, z) = (xz, yz, z^2)$.
(a) Let $S$ be the cylinder $x^2 + y^2 = 4$, $1 \le z \le 3$, oriented by the outward normal. Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$.
(b) Suppose that you “close up” the cylinder $S$ in part (a) by adding the disks $x^2 + y^2 \le 4$, $z = 3$, and $x^2 + y^2 \le 4$, $z = 1$, at the top and bottom, respectively, where the disk at the top is oriented by the upward normal and the one at the bottom is oriented by the downward normal. Let $S_1$ be the resulting surface: cylinder, top, and bottom. Find $\iint_{S_1} \mathbf{F} \cdot d\mathbf{S}$.

2.3. Let $S$ be the helicoid parametrized by:

$\sigma(r, \theta) = (r \cos\theta, r \sin\theta, \theta)$, $0 \le r \le 1$, $0 \le \theta \le 4\pi$,

oriented by the upward normal. Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $\mathbf{F}(x, y, z) = (y, -x, z^2)$.

2.4. Let $\mathbf{F}(x, y, z) = \big(x, -z, 2y(x^2 - y^2)\big)$. Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $S$ is the saddle surface $z = x^2 - y^2$, where $x^2 + y^2 \le 4$, oriented by the upward normal.

2.5. Let $W$ be the solid region in $\mathbb{R}^3$ that lies inside the cylinders $x^2 + z^2 = 1$ and $y^2 + z^2 = 1$ and above the $xy$-plane. This solid appeared in Example 5.2 of Chapter 5, shown again in Figure 10.27. Let $S$ be the piecewise smooth surface that is the exposed part of the boundary of $W$, that is, the cylindrical surfaces but not the base, oriented by the upward normal.
Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $\mathbf{F}$ is the vector field $\mathbf{F}(x, y, z) = (0, 0, z)$.

Figure 10.27: The surface $S$ consisting of portions of the cylinders $x^2 + z^2 = 1$ and $y^2 + z^2 = 1$


Section 3 Stokes’s theorem

3.1. Consider the vector field $\mathbf{F}(x, y, z) = (2yz, y \sin z, 1 + \cos z)$.

(a) Find a vector field $\mathbf{G}$ whose curl is $\mathbf{F}$.
(b) Let $S$ be the half-ellipsoid $4x^2 + 4y^2 + z^2 = 4$, $z \ge 0$, oriented by the upward normal. Use Stokes’s theorem to find $\iint_S \mathbf{F} \cdot d\mathbf{S}$.
(c) Find $\iint_{\widetilde{S}} \mathbf{F} \cdot d\mathbf{S}$ if $\widetilde{S}$ is the portion of the surface $z = 1 - x^2 - y^2$ above the $xy$-plane, oriented by the upward-pointing normal. (Hint: Take advantage of what you’ve already done.)

3.2. Consider the vector field $\mathbf{F}(x, y, z) = (e^{x+y} - xe^{y+z},\ e^{y+z} - e^{x+y} + ye^{z},\ -e^{z})$.

(a) Is $\mathbf{F}$ a conservative vector field? Explain.
(b) Find a vector field $\mathbf{G} = (G_1, G_2, G_3)$ such that $G_2 = 0$ and the curl of $\mathbf{G}$ is $\mathbf{F}$.
(c) Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $S$ is the hemisphere $x^2 + y^2 + z^2 = 4$, $z \ge 0$, oriented by the outward normal.
(d) Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $S$ is the hemisphere $x^2 + y^2 + z^2 = 4$, $z \le 0$, oriented by the outward normal.
(e) Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $S$ is the cylinder $x^2 + y^2 = 4$, $0 \le z \le 4$, oriented by the outward normal.

Exercises 3.3–3.6 illustrate situations where, by bringing in a surface integral, Stokes’s theorem
can be used to obtain information about line integrals that would be hard to find directly.

3.3. Let $\mathbf{F}(x, y, z) = (xz + yz,\ 2xz,\ \tfrac{1}{2} y^2)$.

(a) Find $\nabla \times \mathbf{F}$.
(b) Let $C$ be the curve of intersection in $\mathbb{R}^3$ of the cylinder $x^2 + y^2 = 1$ and the saddle surface $z = x^2 - y^2$, oriented counterclockwise as viewed from high above the $xy$-plane, looking down. Use Stokes’s theorem to find the line integral $\int_C \mathbf{F} \cdot d\mathbf{s}$. (Hint: Find a surface whose boundary is $C$.)

3.4. Let $P$ be a plane in $\mathbb{R}^3$ described by the equation $ax + by + cz = d$. Let $C$ be a piecewise smooth oriented simple closed curve that lies in $P$, and let $S$ be the subset of $P$ that is enclosed by $C$, i.e., the region of $P$ “inside” $C$. Show that the area of $S$ is equal to the absolute value of:
\[
\frac{1}{2\sqrt{a^2 + b^2 + c^2}} \int_C (bz - cy)\, dx + (cx - az)\, dy + (ay - bx)\, dz.
\]

3.5. (a) Find an example of a smooth vector field $\mathbf{F} = (F_1, F_2, F_3)$ defined on an open set $U$ of $\mathbb{R}^3$ such that $\frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i}$ for all $i, j$ but there exists a piecewise smooth oriented simple closed curve $C$ in $U$ such that $\int_C F_1\, dx + F_2\, dy + F_3\, dz \neq 0$.
(b) On the other hand, show that, if $\mathbf{F}$ is any smooth vector field on an open set $U$ in $\mathbb{R}^3$ such that $\frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i}$ for all $i, j$, then $\int_C F_1\, dx + F_2\, dy + F_3\, dz = 0$ for any piecewise smooth oriented simple closed curve $C$ that is the boundary of an oriented surface $S$ contained in $U$.

(c) What does part (b) tell you about the curve C you found in part (a)?
3.6. Let $C_1$ and $C_2$ be piecewise smooth simple closed curves contained in the cylinder $x^2 + y^2 = 1$ that do not intersect one another, both oriented counterclockwise when viewed from high above the $xy$-plane looking down, as illustrated in Figure 10.28. If $\mathbf{F}$ is the vector field $\mathbf{F}(x, y, z) = (z - 2y^3,\ 2yz + x^5,\ y^2 + x)$, show that $\int_{C_1} \mathbf{F} \cdot d\mathbf{s} = \int_{C_2} \mathbf{F} \cdot d\mathbf{s}$.

Figure 10.28: Two nonintersecting counterclockwise curves $C_1$ and $C_2$ in the cylinder $x^2 + y^2 = 1$

Section 4 Curl fields


In Exercises 4.1–4.3, determine whether the given vector field F is a curl field. If it is, find a
vector field G whose curl is F. If not, explain why not.

4.1. F(x, y, z) = (x, y, z)


4.2. F(x, y, z) = (y, z, x)
4.3. $\mathbf{F}(x, y, z) = (x + 2y + 3z,\ x^4 + y^5 + z^6,\ x^7 y^8 z^9)$

4.4. This exercise ties up some loose ends left dangling after Example 10.3 back in Section 10.1. Let $S_1$ be the hemisphere $x^2 + y^2 + z^2 = a^2$, $z \ge 0$, oriented by the outward normal, and let $S_2$ be the disk $x^2 + y^2 \le a^2$, $z = 0$, oriented by the upward normal.
Let $\mathbf{F}$ be the vector field $\mathbf{F}(x, y, z) = (0, 0, -1)$.
(a) Show that $\mathbf{F}$ is a curl field by finding a vector field $\mathbf{G}$ whose curl is $\mathbf{F}$.
(b) In Example 10.3, we showed that it is fairly easy to calculate that $\iint_{S_2} \mathbf{F} \cdot d\mathbf{S} = -\pi a^2$. Without any further calculation, explain why $\iint_{S_1} \mathbf{F} \cdot d\mathbf{S} = -\pi a^2$ as well.

4.5. Let F and G be vector fields on R3 .


(a) True or false: If G = F + C for some constant vector C, then F and G have the same
curl. Either prove that the statement is true, or find an example in which it is false.
(b) True or false: If F and G have the same curl, then G = F + C for some constant vector
C. Either prove that the statement is true, or find an example in which it is false.
4.6. Does there exist a nonzero vector field $\mathbf{F}$ on $\mathbb{R}^3$ with the property that:
• $\int_C \mathbf{F} \cdot d\mathbf{s} = 0$ for every piecewise smooth oriented closed curve $C$ in $\mathbb{R}^3$ and
• $\iint_S \mathbf{F} \cdot d\mathbf{S} = 0$ for every piecewise smooth oriented closed surface $S$ in $\mathbb{R}^3$?

If so, find such a vector field, and justify why it works. If no such vector field exists, explain why not.


Section 5 Gauss’s theorem

5.1. Let $\mathbf{F}(x, y, z) = (x^3, y^3, z^3)$. Use Gauss’s theorem to find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $S$ is the sphere $x^2 + y^2 + z^2 = 4$, oriented by the outward normal.
5.2. Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $\mathbf{F}(x, y, z) = (x^3 y^3 z^4,\ x^2 y^4 z^4,\ x^2 y^3 z^5)$ and $S$ is the boundary of the cube $W = [0, 1] \times [0, 1] \times [0, 1]$, oriented by the normal pointing away from $W$.
5.3. Let $S$ be the silo-shaped closed surface consisting of:
• the cylinder $x^2 + y^2 = 4$, $0 \le z \le 3$,
• a hemispherical cap of radius 2 centered at $(0, 0, 3)$, i.e., $x^2 + y^2 + (z - 3)^2 = 4$, $z \ge 3$, and
• a base consisting of the disk $x^2 + y^2 \le 4$, $z = 0$,
all oriented by the normal pointing away from the silo. Find $\iint_S \mathbf{F} \cdot d\mathbf{S}$ if $\mathbf{F}(x, y, z) = (3xz^2,\ 2y,\ x + y^2 - z^3)$.
5.4. One way to produce a torus is to take the circle $C$ in the $xz$-plane of radius $b$ and center $(a, 0, 0)$ and rotate it about the $z$-axis. We assume that $a > b$. The surface that is swept out is a torus. The circle $C$ can be parametrized by $\alpha(\psi) = (a + b \cos\psi, 0, b \sin\psi)$, $0 \le \psi \le 2\pi$. The distance of $\alpha(\psi)$ to the $z$-axis is $a + b \cos\psi$, so rotating $\alpha(\psi)$ counterclockwise by $\theta$ about the $z$-axis brings it to the point $\big((a + b \cos\psi) \cos\theta, (a + b \cos\psi) \sin\theta, b \sin\psi\big)$. This gives the following parametrization of the torus with $\psi$ and $\theta$ as parameters, illustrated in Figure 10.29:

$\sigma(\psi, \theta) = \big((a + b \cos\psi) \cos\theta, (a + b \cos\psi) \sin\theta, b \sin\psi\big)$, $0 \le \psi \le 2\pi$, $0 \le \theta \le 2\pi$.

Figure 10.29: Parametrizing a torus

Let $S$ denote the resulting torus, and assume that it is oriented by the unit normal vector that points towards the exterior of the torus.
(a) Show that:
\[
\frac{\partial \sigma}{\partial \psi} \times \frac{\partial \sigma}{\partial \theta} = -b(a + b \cos\psi) \big( \cos\psi \cos\theta, \cos\psi \sin\theta, \sin\psi \big).
\]

(b) Is $\sigma$ orientation-preserving or orientation-reversing?
(c) Find the surface area of the torus.
(d) Let $\mathbf{F}(x, y, z) = (x, y, z)$. Use the parametrization of $S$ and the definition of the surface integral to find $\iint_S \mathbf{F} \cdot d\mathbf{S}$. Then, use your answer and the result of Example 10.16 to find the volume of the solid torus, that is, the three-dimensional region enclosed by the torus.
(e) Find the integral of the vector field $\mathbf{F}(x, y, z) = (0, 0, 1)$ over $S$. Use whatever method seems best. What does your answer say about the flow of $\mathbf{F}$ through $S$?
5.5. We describe an alternative approach to Exercise 4.4. Let $\mathbf{F}(x, y, z) = (0, 0, -1)$. The main point of the aforementioned exercise is that:
\[
\iint_{S_1} \mathbf{F} \cdot d\mathbf{S} = \iint_{S_2} \mathbf{F} \cdot d\mathbf{S}, \tag{10.31}
\]
where $S_1$ is the hemisphere $x^2 + y^2 + z^2 = a^2$, $z \ge 0$, oriented by the outward normal, and $S_2$ is the disk $x^2 + y^2 \le a^2$, $z = 0$, oriented by the upward normal. The argument was based on Stokes’s theorem.
Show that the same conclusion (10.31) can be reached using Gauss’s theorem.
5.6. A bounded solid $W$ in $\mathbb{R}^3$ is contained in the region $a \le z \le b$, where $a < b$. Its boundary is the union of three nonempty parts, as shown on the left of Figure 10.30:
• a top surface $T$ that is a subset of the plane $z = b$,
• a bottom surface $B$ that is a subset of the plane $z = a$, and
• a surface $S$ in between that connects the bottom and the top.
Let $\mathbf{F}(x, y, z) = (0, 0, 1)$. If the third piece above, $S$, is oriented by the normal pointing away from $W$, express $\iint_S \mathbf{F} \cdot d\mathbf{S}$ in terms of $\text{Area}\,(T)$ and $\text{Area}\,(B)$.

Figure 10.30: A solid region $W$ lying between the planes $z = a$ and $z = b$ whose boundary consists of a surface $S$, a top $T$, and a bottom $B$ (left) and a lumpy shopping bag of volume 75 (right)

5.7. A shopping bag is modeled as a lumpy surface $S$ having an open circular top. Suppose that the bag is placed in $\mathbb{R}^3$ in such a way that the boundary of the opening is the circle $x^2 + y^2 = 4$ in the plane $z = 8$. See Figure 10.30, right. Assume that, when filled exactly to the brim, the bag has a volume of 75.
Let $\mathbf{F}(x, y, z) = (x, y, z)$. If $S$ is oriented by the outward normal, use Gauss’s theorem to find $\iint_S \mathbf{F} \cdot d\mathbf{S}$. (Note that $S$ is not a closed surface.)


Let $U$ be an open set in $\mathbb{R}^3$. A smooth real-valued function $h : U \to \mathbb{R}$ is called harmonic on $U$ if:
\[
\frac{\partial^2 h}{\partial x^2} + \frac{\partial^2 h}{\partial y^2} + \frac{\partial^2 h}{\partial z^2} = 0 \quad \text{at all points of } U.
\]
Exercises 5.8–5.10 concern harmonic functions. The results are the analogues of Exercises 4.12–4.15 in Chapter 9, which were about harmonic functions of two variables.
5.8. Show that $h(x, y, z) = e^{-5x} \sin(3y + 4z)$ is a harmonic function on $\mathbb{R}^3$.
5.9. Let $W$ be a bounded subset of an open set $U$ in $\mathbb{R}^3$ whose boundary $\partial W$ consists of a finite number of piecewise smooth closed surfaces. Assume that every pair of points of $W$ can be connected by a piecewise smooth path in $W$. If $h$ is harmonic on $U$ and if $h = 0$ at all points of $\partial W$, prove that $h = 0$ on all of $W$. (Hint: Consider the vector field $\mathbf{F} = h\, \nabla h = h \big( \frac{\partial h}{\partial x}, \frac{\partial h}{\partial y}, \frac{\partial h}{\partial z} \big)$.)

5.10. Let $U$ and $W$ be as in the preceding exercise. Show that, if $h_1$ and $h_2$ are harmonic on $U$ and if $h_1 = h_2$ at all points of $\partial W$, then $h_1 = h_2$ on all of $W$. Thus a harmonic function is determined on $W$ by its values on the boundary.
Section 6 The inverse square field

6.1. Let $S_a$ denote the sphere $x^2 + y^2 + z^2 = a^2$ of radius $a$, oriented by the outward normal. If we did not already know the answer, it might be tempting to calculate the integral of the inverse square field $\mathbf{G}(x, y, z) = \Big( -\frac{x}{(x^2 + y^2 + z^2)^{3/2}}, -\frac{y}{(x^2 + y^2 + z^2)^{3/2}}, -\frac{z}{(x^2 + y^2 + z^2)^{3/2}} \Big)$ over $S_a$ by filling in the sphere with the solid ball $W_a$ given by $x^2 + y^2 + z^2 \le a^2$ and using Gauss’s theorem. Unfortunately, this is not valid (and would give an incorrect answer), because $W_a$ contains the origin and the origin is not in the domain of $\mathbf{G}$. On $S_a$, however, $\mathbf{G}$ agrees with the vector field $\mathbf{F}(x, y, z) = -\frac{1}{a^3}(x, y, z)$, and $\mathbf{F}$ is defined on all of $\mathbb{R}^3$.
Find $\iint_{S_a} \mathbf{G} \cdot d\mathbf{S}$ by replacing $\mathbf{G}$ with $\mathbf{F}$ and applying Gauss’s theorem.

6.2. Let $\mathbf{F}$ be a vector field defined on all of $\mathbb{R}^3$, except at the two points $\mathbf{p} = (2, 0, 0)$ and $\mathbf{q} = (-2, 0, 0)$. Let $S_1$, $S_2$, and $S$ be the following spheres, centered at $(2, 0, 0)$, $(-2, 0, 0)$, and $(0, 0, 0)$, respectively, each oriented by the outward normal:
$S_1$: $(x - 2)^2 + y^2 + z^2 = 1$
$S_2$: $(x + 2)^2 + y^2 + z^2 = 1$
$S$: $x^2 + y^2 + z^2 = 25$.
Assume that $\nabla \cdot \mathbf{F} = 0$. If $\iint_{S_1} \mathbf{F} \cdot d\mathbf{S} = 5$ and $\iint_{S_2} \mathbf{F} \cdot d\mathbf{S} = 6$, what is $\iint_S \mathbf{F} \cdot d\mathbf{S}$?
6.3. Give the details of the justification of the instance of Gauss’s law cited in the text. That is, if $S$ is a piecewise smooth closed surface in $\mathbb{R}^3 - \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_k\}$, oriented by the outward normal, and $\mathbf{F}(\mathbf{x}) = \sum_{i=1}^{k} \mathbf{G}(\mathbf{x} - \mathbf{p}_i)$ is the vector field of (10.18), show that:
\[
-\frac{1}{4\pi} \iint_S \mathbf{F} \cdot d\mathbf{S} \ \text{ equals the number of the points } \mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_k \text{ that lie inside } S.
\]

6.4. Suppose that objects of mass $m_1, m_2, \ldots, m_k$, not necessarily equal, are located at the points $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_k$, respectively, in $\mathbb{R}^3$. The gravitational force that they exert on another object whose position is $\mathbf{x}$ is described by the vector field:
\[
\mathbf{F}(\mathbf{x}) = \sum_{i=1}^{k} m_i\, \mathbf{G}(\mathbf{x} - \mathbf{p}_i),
\]

where $\mathbf{G}$ is our usual inverse square field (10.16). Let $S$ be a piecewise smooth closed surface in $\mathbb{R}^3 - \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_k\}$, oriented by the outward normal. What can you say about $-\frac{1}{4\pi} \iint_S \mathbf{F} \cdot d\mathbf{S}$ in this case?

Section 7 A more substitution-friendly notation for surface integrals

7.1. Let $S$ be the unit sphere $x^2 + y^2 + z^2 = 1$, oriented by the outward normal. Consider the surface integral:
\[
\iint_S xy\, dy \wedge dz + yz\, dz \wedge dx + xz\, dx \wedge dy.
\]

(a) What is the vector field that is being integrated?


(b) Evaluate the integral.

7.2. Suppose that a surface $S$ in $\mathbb{R}^3$ is the graph $z = f(x, y)$ of a smooth real-valued function $f : D \to \mathbb{R}$ whose domain $D$ is a bounded subset of $\mathbb{R}^2$. Let $S$ be oriented by the upward normal. Show that:
\[
\iint_S 1\, dx \wedge dy = \text{Area}\,(D).
\]
In other words, the integral is the area of the projection of $S$ on the $xy$-plane. Similar interpretations hold for $\iint_S 1\, dy \wedge dz$ and $\iint_S 1\, dz \wedge dx$ under analogous hypotheses.

Section 8 Independence of parametrization


8.1. Let $S$ be the portion of the cone $z = \sqrt{x^2 + y^2}$ in $\mathbb{R}^3$ where $0 \le z \le 2$. In Section 5.4, we showed that $S$ could be parametrized in more than one way. One possibility is based on using polar/cylindrical coordinates as parameters (Example 5.14):

$\sigma : [0, 2] \times [0, 2\pi] \to \mathbb{R}^3$, $\sigma(s, t) = (s \cos t, s \sin t, s)$.

Another is based on spherical coordinates (Example 5.17):

$\tau : [0, 2\sqrt{2}] \times [0, 2\pi] \to \mathbb{R}^3$, $\tau(u, v) = \big( \tfrac{\sqrt{2}}{2} u \cos v, \tfrac{\sqrt{2}}{2} u \sin v, \tfrac{\sqrt{2}}{2} u \big)$.

Find a function $T : [0, 2\sqrt{2}] \times [0, 2\pi] \to [0, 2] \times [0, 2\pi]$ that satisfies $\tau(u, v) = \sigma\big(T(u, v)\big)$, as in equation (10.25).

Chapter 11

Working with differential forms

This final chapter is a whirlwind survey of how to work with differential forms. We focus particularly
on what differential forms have to say about some of the main theorems we have learned recently.
In what follows, all functions, including vector fields, are assumed to be smooth.

Four theorems, version 1

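1. Conservative vector fields. Let $C$ be a piecewise smooth oriented curve in $\mathbb{R}^n$ from $\mathbf{p}$ to $\mathbf{q}$. If $\mathbf{F} = \nabla f$ is a conservative vector field on an open set containing $C$, then:
\[
\int_C \nabla f \cdot d\mathbf{s} = f(\mathbf{q}) - f(\mathbf{p}). \tag{C}
\]

2. Green’s theorem. Let $D$ be a bounded subset of $\mathbb{R}^2$. If $\mathbf{F} = (F_1, F_2)$ is a vector field on an open set containing $D$, then:
\[
\iint_D \Big( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \Big)\, dx\, dy = \int_{\partial D} \mathbf{F} \cdot d\mathbf{s}. \tag{Gr}
\]

3. Stokes’s theorem. Let $S$ be a piecewise smooth oriented surface in $\mathbb{R}^3$. If $\mathbf{F} = (F_1, F_2, F_3)$ is a vector field on an open set containing $S$, then:
\[
\iint_S (\nabla \times \mathbf{F}) \cdot d\mathbf{S} = \int_{\partial S} \mathbf{F} \cdot d\mathbf{s}. \tag{S}
\]

4. Gauss’s theorem. Let $W$ be a bounded subset of $\mathbb{R}^3$. If $\mathbf{F} = (F_1, F_2, F_3)$ is a vector field on an open set containing $W$, then:
\[
\iiint_W (\nabla \cdot \mathbf{F})\, dx\, dy\, dz = \iint_{\partial W} \mathbf{F} \cdot d\mathbf{S}. \tag{Ga}
\]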

Our goal is to present just enough of the properties of differential forms to indicate how forms
enter the picture as far as these theorems are concerned. Further information is developed in the
exercises.


11.1 Integrals of differential forms


Roughly speaking, a differential k-form is an expression that can be integrated over an oriented
k-dimensional domain. For instance, in Rn , here are the cases at opposite ends of the range of
dimensions.

• n-forms. In $\mathbb{R}^n$ with coordinates $x_1, x_2, \ldots, x_n$, an $n$-form is an expression:

$f\, dx_1 \wedge dx_2 \wedge \cdots \wedge dx_n$,

where $f$ is a real-valued function of $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. It could also be written $f \wedge dx_1 \wedge dx_2 \wedge \cdots \wedge dx_n$, though $f$ is commonly treated as a scalar factor. If $W$ is an $n$-dimensional bounded subset of $\mathbb{R}^n$, which by default we regard as positively oriented, then the integral $\int_W f\, dx_1 \wedge dx_2 \wedge \cdots \wedge dx_n$ is defined to be the usual Riemann integral:
\[
\int_W f\, dx_1 \wedge dx_2 \wedge \cdots \wedge dx_n = \iint \cdots \int_W f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n. \tag{11.1}
\]
As indicated on the left of equation (11.1), we won’t use multiple integral signs with differential forms. The dimension of the integral can be inferred from the type of form that is being integrated.

• 0-forms. A 0-form is simply another name for a real-valued function $f$. 0-forms are integrated over 0-dimensional domains, i.e., individual points, by evaluating the function at the point, where we orient a point by attaching a $+$ or $-$ sign. So $\int_{+\{\mathbf{p}\}} f = f(\mathbf{p})$ and $\int_{-\{\mathbf{p}\}} f = -f(\mathbf{p})$. If $C$ is an oriented curve from $\mathbf{p}$ to $\mathbf{q}$, we define its oriented boundary to be the oriented points $\partial C = (+\{\mathbf{q}\}) \cup (-\{\mathbf{p}\})$.

We restrict our attention here to differential forms on subsets of $\mathbb{R}^3$ (or $\mathbb{R}$ or $\mathbb{R}^2$, depending on the context), real-valued functions $f$ of three variables (or one or two), and vector fields on subsets of $\mathbb{R}^3$ (or $\mathbb{R}$ or $\mathbb{R}^2$), since these are the cases of interest in the four big theorems listed above.
Table 11.1 below gives the notation for 0, 1, 2, and 3-forms and their associated integrals. The
integrals that the notation stands for, shown in the last column, are the kinds of integrals we have
been studying: 1-forms integrate as line integrals, 2-forms as surface integrals, and 3-forms as triple
integrals.

Type of form    Domain of integration    Notation for integral using forms    Meaning of integral

0-form    oriented points      $\int_{+\{p\}} f$ or $\int_{-\{p\}} f$                                               $= f(p)$ or $-f(p)$
1-form    oriented curves      $\int_C F_1\, dx + F_2\, dy + F_3\, dz$                                              $= \int_C \mathbf{F} \cdot d\mathbf{s}$
2-form    oriented surfaces    $\int_S F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy$                $= \iint_S \mathbf{F} \cdot d\mathbf{S}$
3-form    3-dim solids         $\int_W f\, dx \wedge dy \wedge dz$                                                  $= \iiint_W f\, dx\, dy\, dz$

Table 11.1: Integrals of differential forms

In this notation, the four theorems (C), (Gr), (S), and (Ga) translate as integrals of forms as
follows.

Four theorems, version 2

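1. Conservative vector fields.

\[ \int_C \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy + \frac{\partial f}{\partial z}\, dz = f(q) - f(p) = \int_{\partial C} f \tag{C} \]

2. Green’s theorem.

\[ \int_D \Big( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \Big)\, dx \wedge dy = \int_{\partial D} F_1\, dx + F_2\, dy \tag{Gr} \]

3. Stokes’s theorem.

\[ \int_S \Big( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z} \Big)\, dy \wedge dz + \Big( \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x} \Big)\, dz \wedge dx + \Big( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \Big)\, dx \wedge dy = \int_{\partial S} F_1\, dx + F_2\, dy + F_3\, dz \tag{S} \]

4. Gauss’s theorem.

\[ \int_W \Big( \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z} \Big)\, dx \wedge dy \wedge dz = \int_{\partial W} F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy \tag{Ga} \]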

11.2 Derivatives of differential forms


In this brief foray, we focus on how differential forms behave without worrying about exactly what
they are. Here are some simple rules for working with forms.

1. Arithmetic. These are the rules we encountered earlier that reflect a connection between
differential forms and determinants (see the end of Section 10.7):
(a) dy ∧ dx = −dx ∧ dy, dz ∧ dy = −dy ∧ dz, dx ∧ dz = −dz ∧ dx,
(b) dx ∧ dx = dy ∧ dy = dz ∧ dz = 0.

These rules have the effect of simplifying what differential forms can look like. For instance, a
2-form should be a generalized double integrand. In R3 , such an expression would have the form:

f1 dx ∧ dx + f2 dx ∧ dy + f3 dx ∧ dz + f4 dy ∧ dx + · · · + f9 dz ∧ dz,

where f1 , f2 , . . . , f9 are real-valued functions. According to the rules, however, three of the nine
terms—the ones involving dx ∧ dx, dy ∧ dy, and dz ∧ dz—are zero and therefore can be omitted.
Similarly, the remaining six terms can be combined in pairs: for instance, f2 dx ∧ dy + f4 dy ∧ dx =
f2 dx ∧ dy − f4 dx ∧ dy = (f2 − f4 ) dx ∧ dy. Hence we may assume that any 2-form on a subset of
R3 is a sum of three terms having the form η = F1 dy ∧ dz + F2 dz ∧ dx + F3 dx ∧ dy.
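The arithmetic rules are mechanical enough to automate. The following is a minimal sketch, not part of the text's development: it assumes forms with constant coefficients, stores a form as a dictionary from increasing index tuples to coefficients (indices 0, 1, 2 standing for dx, dy, dz), and uses the hypothetical helper names `sort_sign` and `wedge`.

```python
from itertools import product


def sort_sign(indices):
    """Sort a tuple of indices, tracking the sign of the permutation.

    Returns (0, ()) if an index repeats, mirroring dx ^ dx = 0.
    """
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, ()
    sign = 1
    # Bubble sort: each adjacent swap corresponds to one use of dy ^ dx = -dx ^ dy.
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)


def wedge(a, b):
    """Wedge product of two forms given as {index_tuple: coefficient} dicts."""
    out = {}
    for (ia, ca), (ib, cb) in product(a.items(), b.items()):
        sign, key = sort_sign(ia + ib)
        if sign:
            out[key] = out.get(key, 0) + sign * ca * cb
    # Drop terms whose coefficients cancelled.
    return {k: v for k, v in out.items() if v != 0}


# Basis 1-forms: index 0, 1, 2 stands for dx, dy, dz.
dx, dy, dz = {(0,): 1}, {(1,): 1}, {(2,): 1}

# (2 dx + 3 dy) ^ (5 dx + 7 dy) = 14 dx^dy + 15 dy^dx = -dx^dy
example = wedge({(0,): 2, (1,): 3}, {(0,): 5, (1,): 7})
```

In this representation the key `(0, 1)` stands for dx ∧ dy, so `example` holds the single term −dx ∧ dy. The same function handles triple products such as dz ∧ dx ∧ dy by applying it twice.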

2. Differentiation. In general, the derivative of a $k$-form is a $(k + 1)$-form. Specifically, for differential forms on subsets of $\mathbb{R}^3$, we define:

(a) For 0-forms:

\[ df = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy + \frac{\partial f}{\partial z}\, dz. \]
(b) For 1-forms:

d(F1 dx + F2 dy + F3 dz) = dF1 ∧ dx + dF2 ∧ dy + dF3 ∧ dz.

(c) For 2-forms:

d(F1 dy ∧ dz + F2 dz ∧ dx + F3 dx ∧ dy) = dF1 ∧ dy ∧ dz + dF2 ∧ dz ∧ dx + dF3 ∧ dx ∧ dy.

In (b) and (c), F1 , F2 , and F3 are 0-forms, so their derivatives are as defined in (a).

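Example 11.1. Let $x = r \cos\theta$ and $y = r \sin\theta$. These are real-valued functions of $r$ and $\theta$ (in other words, they are 0-forms), so by definition:

\[ dx = \frac{\partial x}{\partial r}\, dr + \frac{\partial x}{\partial \theta}\, d\theta = \cos\theta\, dr - r \sin\theta\, d\theta \]
\[ \text{and} \quad dy = \frac{\partial y}{\partial r}\, dr + \frac{\partial y}{\partial \theta}\, d\theta = \sin\theta\, dr + r \cos\theta\, d\theta. \]

Then, by the arithmetic rules:

\begin{align*}
dx \wedge dy &= (\cos\theta\, dr - r \sin\theta\, d\theta) \wedge (\sin\theta\, dr + r \cos\theta\, d\theta) \\
&= \cos\theta \sin\theta\, dr \wedge dr + r \cos^2\theta\, dr \wedge d\theta - r \sin^2\theta\, d\theta \wedge dr - r^2 \sin\theta \cos\theta\, d\theta \wedge d\theta \\
&= \cos\theta \sin\theta \cdot 0 + r \cos^2\theta\, dr \wedge d\theta - r \sin^2\theta \cdot (-dr \wedge d\theta) - r^2 \sin\theta \cos\theta \cdot 0 \\
&= (r \cos^2\theta + r \sin^2\theta)\, dr \wedge d\theta \\
&= r\, dr \wedge d\theta.
\end{align*}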
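As a sanity check on Example 11.1, the coefficient of dr ∧ dθ can be estimated numerically. This is only a sketch under stated assumptions: the partial derivatives are approximated by central differences with a hypothetical step size `h`, and the function names are illustrative rather than taken from the text.

```python
import math


def polar(r, theta):
    """The substitution x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)


def dr_dtheta_coefficient(r, theta, h=1e-6):
    """Estimate the coefficient of dr ^ dtheta in dx ^ dy.

    Expanding (x_r dr + x_t dtheta) ^ (y_r dr + y_t dtheta) with the
    arithmetic rules leaves (x_r * y_t - x_t * y_r) dr ^ dtheta.
    """
    x_plus, y_plus = polar(r + h, theta)
    x_minus, y_minus = polar(r - h, theta)
    x_r = (x_plus - x_minus) / (2 * h)
    y_r = (y_plus - y_minus) / (2 * h)

    x_plus, y_plus = polar(r, theta + h)
    x_minus, y_minus = polar(r, theta - h)
    x_t = (x_plus - x_minus) / (2 * h)
    y_t = (y_plus - y_minus) / (2 * h)

    return x_r * y_t - x_t * y_r


# Should be close to r = 2.0, independent of theta.
estimate = dr_dtheta_coefficient(2.0, 0.7)
```

The estimate agrees with r up to finite-difference error, which matches the result dx ∧ dy = r dr ∧ dθ.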

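Example 11.2. Let $\sigma(s, t) = (x(s, t), y(s, t), z(s, t))$ be a smooth parametrization of a surface $S$. Then $x$, $y$, and $z$ are each real-valued functions of $s$ and $t$, so, for example:

\[ dy = \frac{\partial y}{\partial s}\, ds + \frac{\partial y}{\partial t}\, dt \quad \text{and} \quad dz = \frac{\partial z}{\partial s}\, ds + \frac{\partial z}{\partial t}\, dt. \]

It follows that:

\begin{align*}
dy \wedge dz &= \Big( \frac{\partial y}{\partial s}\, ds + \frac{\partial y}{\partial t}\, dt \Big) \wedge \Big( \frac{\partial z}{\partial s}\, ds + \frac{\partial z}{\partial t}\, dt \Big) \\
&= 0 + \frac{\partial y}{\partial s} \frac{\partial z}{\partial t}\, ds \wedge dt + \frac{\partial y}{\partial t} \frac{\partial z}{\partial s}\, dt \wedge ds + 0 \\
&= \Big( \frac{\partial y}{\partial s} \frac{\partial z}{\partial t} - \frac{\partial y}{\partial t} \frac{\partial z}{\partial s} \Big)\, ds \wedge dt \\
&= \det \begin{bmatrix} \dfrac{\partial y}{\partial s} & \dfrac{\partial y}{\partial t} \\[1ex] \dfrac{\partial z}{\partial s} & \dfrac{\partial z}{\partial t} \end{bmatrix} ds \wedge dt.
\end{align*}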

This is the same as the expression for dy ∧ dz that we obtained when we discussed the substitutions
used to compute surface integrals, as in equation (10.21) of Chapter 10. Perhaps the calculation
given here is a more organic way of obtaining the substitution. Either way, the formula is consistent
with what the change of variables theorem says to do to pull back an integral with respect to y and
z to one with respect to s and t.

Next, we derive formulas for the derivatives of 1 and 2-forms.

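Example 11.3. For a 1-form $\omega = F_1\, dx + F_2\, dy + F_3\, dz$, we find $d\omega$ by calculating $d(F_1\, dx)$, $d(F_2\, dy)$, and $d(F_3\, dz)$ separately. For instance, by definition:

\begin{align*}
d(F_1\, dx) &= dF_1 \wedge dx \\
&= \Big( \frac{\partial F_1}{\partial x}\, dx + \frac{\partial F_1}{\partial y}\, dy + \frac{\partial F_1}{\partial z}\, dz \Big) \wedge dx \\
&= \frac{\partial F_1}{\partial x}\, dx \wedge dx + \frac{\partial F_1}{\partial y}\, dy \wedge dx + \frac{\partial F_1}{\partial z}\, dz \wedge dx \\
&= 0 - \frac{\partial F_1}{\partial y}\, dx \wedge dy + \frac{\partial F_1}{\partial z}\, dz \wedge dx \\
&= -\frac{\partial F_1}{\partial y}\, dx \wedge dy + \frac{\partial F_1}{\partial z}\, dz \wedge dx.
\end{align*}

Similarly:

\[ d(F_2\, dy) = \frac{\partial F_2}{\partial x}\, dx \wedge dy - \frac{\partial F_2}{\partial z}\, dy \wedge dz \]
\[ \text{and} \quad d(F_3\, dz) = -\frac{\partial F_3}{\partial x}\, dz \wedge dx + \frac{\partial F_3}{\partial y}\, dy \wedge dz. \]

Adding these calculations gives:

\[ d(F_1\, dx + F_2\, dy + F_3\, dz) = \Big( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z} \Big)\, dy \wedge dz + \Big( \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x} \Big)\, dz \wedge dx + \Big( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \Big)\, dx \wedge dy, \]

a formula that is reproduced in row 2(b) of Table 11.2 below.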

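Example 11.4. For the derivative of a 2-form $\eta = F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy$, we compute, for example:

\begin{align*}
d(F_2\, dz \wedge dx) &= dF_2 \wedge dz \wedge dx \\
&= \Big( \frac{\partial F_2}{\partial x}\, dx + \frac{\partial F_2}{\partial y}\, dy + \frac{\partial F_2}{\partial z}\, dz \Big) \wedge dz \wedge dx \\
&= \frac{\partial F_2}{\partial x}\, dx \wedge dz \wedge dx + \frac{\partial F_2}{\partial y}\, dy \wedge dz \wedge dx + \frac{\partial F_2}{\partial z}\, dz \wedge dz \wedge dx \\
&= -\frac{\partial F_2}{\partial x}\, dx \wedge dx \wedge dz - \frac{\partial F_2}{\partial y}\, dy \wedge dx \wedge dz + 0 \\
&= 0 + \frac{\partial F_2}{\partial y}\, dx \wedge dy \wedge dz \\
&= \frac{\partial F_2}{\partial y}\, dx \wedge dy \wedge dz.
\end{align*}

Likewise:

\[ d(F_1\, dy \wedge dz) = \frac{\partial F_1}{\partial x}\, dx \wedge dy \wedge dz \quad \text{and} \quad d(F_3\, dx \wedge dy) = \frac{\partial F_3}{\partial z}\, dx \wedge dy \wedge dz. \]

Summing these calculations gives:

\[ d(F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy) = \Big( \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z} \Big)\, dx \wedge dy \wedge dz, \]

as in row 3 of Table 11.2.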
We collect the results in a table. Below it, for future reference, we reproduce the most recent
version of our four big theorems (version 2).

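1. 0-forms
   (a) $f = f(x)$:        $df = f'(x)\, dx$
   (b) $f = f(x, y)$:     $df = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy$
   (c) $f = f(x, y, z)$:  $df = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy + \frac{\partial f}{\partial z}\, dz$

2. 1-forms
   (a) $\omega = F_1(x, y)\, dx + F_2(x, y)\, dy$:
       $d\omega = \big( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \big)\, dx \wedge dy$
   (b) $\omega = F_1(x, y, z)\, dx + F_2(x, y, z)\, dy + F_3(x, y, z)\, dz$:
       $d\omega = \big( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z} \big)\, dy \wedge dz + \big( \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x} \big)\, dz \wedge dx + \big( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \big)\, dx \wedge dy$

3. 2-forms
   $\eta = F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy$:
       $d\eta = \big( \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z} \big)\, dx \wedge dy \wedge dz$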

Table 11.2: Formulas for derivatives

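Four theorems, version 2 (again)

1. Conservative vector fields.

\[ \int_C \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy + \frac{\partial f}{\partial z}\, dz = f(q) - f(p) = \int_{\partial C} f \tag{C} \]

2. Green’s theorem.

\[ \int_D \Big( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \Big)\, dx \wedge dy = \int_{\partial D} F_1\, dx + F_2\, dy \tag{Gr} \]

3. Stokes’s theorem.

\[ \int_S \Big( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z} \Big)\, dy \wedge dz + \Big( \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x} \Big)\, dz \wedge dx + \Big( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \Big)\, dx \wedge dy = \int_{\partial S} F_1\, dx + F_2\, dy + F_3\, dz \tag{S} \]

4. Gauss’s theorem.

\[ \int_W \Big( \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z} \Big)\, dx \wedge dy \wedge dz = \int_{\partial W} F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy \tag{Ga} \]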

11.3 A look back at the theorems of multivariable calculus


Take a moment to compare the formulas in Table 11.2 with the four theorems. It soon becomes
apparent that, in each theorem, a differential form and its derivative appear. In this new framework,
the theorems can be written in spectacularly streamlined form.

Four theorems, final version

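1. Conservative vector fields. If $f$ is a real-valued function, then:

\[ \int_C df = \int_{\partial C} f. \tag{C} \]

2. Green’s theorem. If $\omega = F_1\, dx + F_2\, dy$, then:

\[ \int_D d\omega = \int_{\partial D} \omega. \tag{Gr} \]

3. Stokes’s theorem. If $\omega = F_1\, dx + F_2\, dy + F_3\, dz$, then:

\[ \int_S d\omega = \int_{\partial S} \omega. \tag{S} \]

4. Gauss’s theorem. If $\eta = F_1\, dy \wedge dz + F_2\, dz \wedge dx + F_3\, dx \wedge dy$, then:

\[ \int_W d\eta = \int_{\partial W} \eta. \tag{Ga} \]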

They’re all the same theorem! But there’s even more to it, for suppose we go even further back, to first-year calculus and a real-valued function $f(x)$ of one variable. The fundamental theorem of calculus says that $\int_a^b f'(x)\, dx = f(b) - f(a)$. Denoting the interval $[a, b]$ by $I$ and converting to differential form notation, this becomes:

\[ \int_I df = \int_{\partial I} f. \]

In other words, the four theorems (C), (Gr), (S), and (Ga) of multivariable calculus are general-
izations of something that we’ve known for a long time.
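To see one instance of the common pattern with numbers attached, here is a small numerical check of Green's theorem in the form ∫_D dω = ∫_∂D ω. It is a sketch under assumptions not taken from the text: the 1-form ω = xy dx + x² dy and the unit square D = [0, 1] × [0, 1] are chosen purely for illustration, and both integrals are approximated with a midpoint rule.

```python
def F1(x, y):
    """Coefficient of dx in the illustrative 1-form omega = xy dx + x^2 dy."""
    return x * y


def F2(x, y):
    """Coefficient of dy in omega."""
    return x * x


def d_omega(x, y):
    """Coefficient of dx ^ dy in d(omega): dF2/dx - dF1/dy = 2x - x."""
    return 2 * x - x


def midpoints(n):
    """Midpoints of n equal subintervals of [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


def integral_over_square(n=200):
    """Midpoint-rule approximation of the integral of d(omega) over D."""
    pts = midpoints(n)
    return sum(d_omega(x, y) for x in pts for y in pts) / n**2


def integral_over_boundary(n=200):
    """Integral of omega around the boundary of D, oriented counterclockwise."""
    pts = midpoints(n)
    bottom = sum(F1(t, 0.0) for t in pts) / n   # y = 0, x from 0 to 1
    right = sum(F2(1.0, t) for t in pts) / n    # x = 1, y from 0 to 1
    top = -sum(F1(t, 1.0) for t in pts) / n     # y = 1, x from 1 to 0
    left = -sum(F2(0.0, t) for t in pts) / n    # x = 0, y from 1 to 0
    return bottom + right + top + left


lhs = integral_over_square()
rhs = integral_over_boundary()
```

Both sides come out to 1/2 up to rounding, as the theorem predicts for this choice of ω.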
The pattern continues in higher dimensions and with more variables. One studies subsets of Rn
that are k-dimensional in the sense that they can be traced out using k independent parameters.
In other words, they are the images of smooth functions of the form ψ : D → Rn , where D ⊂ Rk .
These objects are called k-dimensional manifolds. Differential k-forms can be integrated over
them, or at least over the ones for which there is a notion of orientation.
A typical k-form in Rn is a k-dimensional integrand, that is, a sum of terms of the form
f dxi1 ∧ dxi2 ∧ · · · ∧ dxik , where f is a real-valued function of n variables x1 , x2 , . . . , xn . By the
arithmetic rules for differential forms, we may assume that the subscripts i1 , i2 , . . . , ik are distinct
and in fact that they are arranged, say in increasing order i1 < i2 < · · · < ik . If ζ is a k-form
and $M$ is an oriented $k$-dimensional manifold in $\mathbb{R}^n$ with an orientation-preserving parametrization $\psi : D \to \mathbb{R}^n$ as above, then the integral $\int_M \zeta$ is defined using substitution and the rules of forms to pull the integral back to an ordinary Riemann integral over the $k$-dimensional parameter domain $D$ in $\mathbb{R}^k$. (See the remarks before Exercise 4.14 at the end of the chapter for more about the pullback process.) In the end, the relation between the integral and its pullback is summarized by the formula:

\[ \int_M \zeta = \int_D \psi^*(\zeta). \tag{11.2} \]
That the value of the integral does not depend on the parametrization rests ultimately on the
change of variables theorem and the chain rule. In fact, you have seen the proof, at least in outline
form. It is (10.30).
Perhaps it is worth reinforcing the point that equation (11.2) agrees with the expressions for
line integrals and surface integrals that we obtained in the previous two chapters. See (9.6) and
(10.23). In particular, line and surface integrals fit in as two special cases of a general type of
integral, something that may not have been clear from the original motivations in terms of tangent
and normal components of vector fields.
The general theory has a striking coherence and consistency. The culmination may be a theorem
that describes how information over the entire manifold is related to behavior along the boundary.
The boundary of a k-dimensional manifold M , denoted ∂M , is (k − 1)-dimensional, as was the case
with curves (k = 1) and surfaces (k = 2). If ω is a differential (k − 1)-form, then this theorem,
which is known as Stokes’s theorem, states that:
\[ \int_M d\omega = \int_{\partial M} \omega. \]

It’s the fundamental theorem of calculus!

11.4 Exercises for Chapter 11


In Exercises 4.1–4.5, calculate the indicated product.

4.1. (x dx + y dy) ∧ (−y dx + x dy)

4.2. (x dx + y dy) ∧ (x dx − y dy)

4.3. (x dx + y dy + z dz) ∧ (y dx + z dy + x dz)

4.4. (−y dx + x dy + z dz) ∧ (yz dy ∧ dz + xz dz ∧ dx + xy dx ∧ dy)

4.5. (x2 dy ∧ dz) ∧ (yz dy ∧ dz + xz dz ∧ dx + xy dx ∧ dy)

4.6. (a) If $\omega = F_1\, dx_1 + F_2\, dx_2 + \cdots + F_n\, dx_n$ is a 1-form on $\mathbb{R}^n$, show that $\omega \wedge \omega = 0$.
(b) Find an example of a 2-form $\eta$ on $\mathbb{R}^4$ such that $\eta \wedge \eta \neq 0$.

In Exercises 4.7–4.11, find the derivative of the given differential form.

4.7. $f = x^2 + y^2$

4.8. $f = x \sin y \cos z$

4.9. $\omega = (x^2 - y^2)\, dx + 2xy\, dy$

4.10. $\omega = 2xy^3z^4\, dx + 3x^2y^2z^4\, dy + 4x^2y^3z^3\, dz$

4.11. $\eta = (x + 2y + 3z)\, dy \wedge dz + e^{xyz}\, dz \wedge dx + x^4y^5\, dx \wedge dy$

4.12. Let U be an open set in R3 .


(a) Let f = f (x, y, z) be a 0-form (= real-valued function) on U . Find d(df ).
(b) Let ω = F1 dx + F2 dy + F3 dz be a 1-form on U . Find d(dω).
(c) To what facts about vector fields do the results of parts (a) and (b) correspond?
4.13. A typical 3-form on R4 has the form:

ζ = F1 dx2 ∧ dx3 ∧ dx4 + F2 dx1 ∧ dx3 ∧ dx4 + F3 dx1 ∧ dx2 ∧ dx4 + F4 dx1 ∧ dx2 ∧ dx3 ,

where F1 , F2 , F3 , and F4 are real-valued functions. The notation is arranged so that, in the
term with leading factor Fi , the differential dxi is omitted.
As a 4-form on R4 , dζ has the form dζ = f dx1 ∧ dx2 ∧ dx3 ∧ dx4 for some real-valued function
f . Find a formula for f in terms of F1 , F2 , F3 , and F4 .

We lay out more formally some of the rules that govern substitutions and pullbacks. In Rm
with coordinates x1 , x2 , . . . , xm , a typical differential k-form is a sum of terms of the form:

ζ = f dxi1 ∧ dxi2 ∧ · · · ∧ dxik , (11.3)

where f is a real-valued function of x1 , x2 , . . . , xm and 1 ≤ ij ≤ m for j = 1, 2, . . . , k. Suppose that


each of the coordinates xi can be expressed as a smooth function of n other variables u1 , u2 , . . . , un ,
say xi = γi (u1 , u2 , . . . , un ) for i = 1, 2, . . . , m. This is equivalent to having a smooth function
γ : Rn → Rm from (u1 , u2 , . . . , un )-space to (x1 , x2 , . . . , xm )-space, where, in terms of component
functions, γ = (γ1 , γ2 , . . . , γm ).
Using γ to substitute for x1 , x2 , . . . , xm converts the k-form ζ into a k-form in the variables
u1 , u2 , . . . , un . This k-form is what we have been calling the pullback of ζ, denoted γ ∗ (ζ). For
instance, substituting x = γ(u) converts f (x) into f (γ(u)) = (f ◦ γ)(u). In other words, f pulls
back to the composition f ◦ γ : Rn → Rm → R. We write this as:

γ ∗ (f ) = f ◦ γ.

Similarly, after substitution, dxi becomes dγi , i.e.:

γ ∗ (dxi ) = dγi ,

where the expression on the right is the derivative of the 0-form γi . For instance, if (x1 , x2 ) =
γ(u1 , u2 ) = (u1 cos u2 , u1 sin u2 ), then γ ∗ (dx1 ) = dγ1 = d(u1 cos u2 ) = cos u2 du1 − u1 sin u2 du2 .
Applying these substitutions en masse, the k-form ζ = f dxi1 ∧ dxi2 ∧ · · · ∧ dxik of equation (11.3)
pulls back as:
γ ∗ (ζ) = (f ◦ γ) dγi1 ∧ dγi2 ∧ · · · ∧ dγik .
Perhaps a slicker way of writing this is:

γ ∗ (f dxi1 ∧ dxi2 ∧ · · · ∧ dxik ) = γ ∗ (f ) γ ∗ (dxi1 ) ∧ γ ∗ (dxi2 ) ∧ · · · ∧ γ ∗ (dxik ).

If ζ = ζ1 + ζ2 is a sum of k-forms, then carrying out the substitutions in both summands gives the
rule γ ∗ (ζ1 + ζ2 ) = γ ∗ (ζ1 ) + γ ∗ (ζ2 ).
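The substitution rules above can be traced through in a small numerical sketch. The assumptions here are illustrative and not from the text: the curve γ(t) = (t, t²), the 1-form ω = y dx + x dy (which equals d(xy)), and the helper names are all hypothetical. The pullback γ*(ω) is then a 1-form g(t) dt in one variable, and integrating it over [0, 1] should reproduce f(γ(1)) − f(γ(0)) for f = xy.

```python
def gamma(t):
    """Illustrative curve gamma(t) = (t, t^2)."""
    return t, t * t


def gamma_prime(t):
    """Derivatives (dx/dt, dy/dt), so gamma*(dx) = 1 dt and gamma*(dy) = 2t dt."""
    return 1.0, 2.0 * t


def omega(x, y):
    """Coefficients (F1, F2) of the 1-form omega = y dx + x dy."""
    return y, x


def pullback_coefficient(t):
    """Coefficient g(t) in gamma*(omega) = g(t) dt.

    Applies gamma*(F1 dx + F2 dy) = (F1 o gamma) d(gamma_1) + (F2 o gamma) d(gamma_2).
    """
    x, y = gamma(t)
    f1, f2 = omega(x, y)
    dx_dt, dy_dt = gamma_prime(t)
    return f1 * dx_dt + f2 * dy_dt


def integrate(g, a, b, n=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


# For this curve the coefficient works out to t^2 + t * 2t = 3t^2.
value = integrate(pullback_coefficient, 0.0, 1.0)
```

Here `value` is close to 1, which is f(1, 1) − f(0, 0) for f = xy, consistent with theorem (C).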
Exercises 4.14–4.18 are intended to provide some practice working with pullbacks.


4.14. Let $\alpha : \mathbb{R} \to \mathbb{R}^3$ be given by $\alpha(t) = (\cos t, \sin t, t)$, where we think of $\mathbb{R}^3$ as $xyz$-space.

(a) Find $\alpha^*(x^2 + y^2 + z^2)$.
(b) Show that $\alpha^*(dx) = -\sin t\, dt$.
(c) Find $\alpha^*(dy)$ and $\alpha^*(dz)$.
(d) Let $\omega$ be the 1-form $\omega = -y\, dx + x\, dy + z\, dz$ on $\mathbb{R}^3$. Find $\alpha^*(\omega)$.

4.15. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the function from the $uv$-plane to the $xy$-plane given by $T(u, v) = (u^2 - v^2, 2uv)$.

(a) Show that $T^*(dx) = 2u\, du - 2v\, dv$.
(b) Find $T^*(dy)$.
(c) Let $\omega = -y\, dx + x\, dy$. Find $T^*(\omega)$.
(d) Let $\eta = xy\, dx \wedge dy$. Find $T^*(\eta)$.

4.16. Let $\sigma : \mathbb{R}^2 \to \mathbb{R}^3$ be the function from the $uv$-plane to $xyz$-space given by $\sigma(u, v) = (u \cos v, u \sin v, u)$.

(a) Let $\omega = x\, dx + y\, dy - z\, dz$. Show that $\sigma^*(\omega) = 0$.
(b) Let $\eta = z\, dx \wedge dy$. Find $\sigma^*(\eta)$.

4.17. Let α : R → Rn be a smooth function, where α(t) = (x1 (t), x2 (t), . . . , xn (t)). If f : Rn → R is
a smooth real-valued function, f = f (x1 , x2 , . . . , xn ), then df is a 1-form on Rn . Show that:

d(α∗ (f )) = α∗ (df ).

(Hint: Little Chain Rule.)

4.18. Let T : R2 → R2 be a smooth function, regarded as a map from the uv-plane to the xy-plane,
so (x, y) = T (u, v) = (T1 (u, v), T2 (u, v)).

(a) Find T ∗ (dx) and T ∗ (dy) in terms of the partial derivatives of T1 and T2 .
(b) Let η be a 2-form on the xy-plane: η = f dx ∧ dy, where f is a real-valued function of x
and y. Show that:
T ∗ (η) = (f ◦ T ) · det DT du ∧ dv.

(c) Let S : R2 → R2 be a smooth function, regarded as a map from the st-plane to the
uv-plane. Then the composition T ◦ S : R2 → R2 goes from the st-plane to the xy-plane.
Let η = f dx ∧ dy, as in part (b). Show that:

(T ◦ S)∗ (η) = S ∗ (T ∗ (η)),

i.e., (T ◦ S)∗ = S ∗ ◦ T ∗ for 2-forms on R2 . (Actually, the same relation holds for
compositions and k-forms in general.)

We next formulate a version of the change of variables theorem for double integrals that is
adapted for differential forms, something we have alluded to before. We are not trying to give
another proof of the theorem. Rather, we take the theorem as known and describe how to restate
it in the language of differential forms.

Differential forms are integrated over oriented domains, so, given a bounded subset D of R2 , we
first make a choice whether to assign it the positive orientation or the negative orientation. If this
seems too haphazard, think of it as analogous to choosing whether, as a subset of R3 , D is oriented
by the upward normal n = k or the downward normal n = −k.
A typical 2-form on $D$ has the form $f\, dx \wedge dy$. We define its integral over $D$ by:

\[ \int_D f\, dx \wedge dy = \begin{cases} \iint_D f(x, y)\, dx\, dy & \text{if $D$ has the positive orientation,} \\ -\iint_D f(x, y)\, dx\, dy & \text{if $D$ has the negative orientation,} \end{cases} \]

where, in each case, the integral on the right is the ordinary Riemann double integral. So, for instance, if $D$ has the negative orientation, then we could also say that, as an integral of a differential form, $\int_D f\, dy \wedge dx$ is equal to the Riemann integral $\iint_D f(x, y)\, dx\, dy$.
Exercises 4.19–4.20 concern how to express the change of variables theorem in terms of integrals
of differential forms.

4.19. We say that a basis $(v, w)$ of $\mathbb{R}^2$ has the positive orientation if $\det \begin{bmatrix} v & w \end{bmatrix} > 0$ and the negative orientation if $\det \begin{bmatrix} v & w \end{bmatrix} < 0$. Here, we use $\begin{bmatrix} v & w \end{bmatrix}$ to denote the 2 by 2 matrix whose columns are $v$ and $w$. In Section 2.5 of Chapter 2, we used the terms “right-handed” and “left-handed” to describe the same concepts, though there we put the vectors into the rows of the matrix, i.e., we worked with the transpose, which doesn’t affect the determinant.
Let $L : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation represented by a matrix $A$ with respect to the standard bases. Thus $L(x) = Ax$. In addition, assume that $\det A \neq 0$.

(a) Show that $(L(e_1), L(e_2))$ has the positive orientation if $\det A > 0$ and the negative orientation if $\det A < 0$.
(b) More generally, let $(v, w)$ be any basis of $\mathbb{R}^2$. Show that $(v, w)$ and $(L(v), L(w))$ have the same orientation if $\det A > 0$ and opposite orientations if $\det A < 0$.

4.20. Let D∗ and D be bounded subsets of R2 , and let T : D∗ → D be a smooth function that maps
D∗ onto D and that is one-to-one, except possibly on the boundary of D∗ . As in the change of
variables theorem, we think of T as a function from the uv-plane to the xy-plane. Moreover,
assume that det DT (u, v) has the same sign at all points (u, v) of D∗ , except possibly on the
boundary of D∗ , where det DT (u, v) = 0 is allowed.
Let D∗ be given an orientation, and let D have the orientation imposed on it by T . That is, let
T (D∗ ) denote the set D with the same orientation, positive or negative, as D∗ if det DT > 0
and the opposite orientation from D∗ if det DT < 0. In the first case, we say that T is
orientation-preserving and, in the second, that it is orientation-reversing. In other words, we
categorize how T acts on orientation based on the behavior of its first-order approximation.
Let $\eta = f\, dx \wedge dy$ be a 2-form on $D$. Use the change of variables theorem to prove that:

\[ \int_{T(D^*)} \eta = \int_{D^*} T^*(\eta). \]

This equation has appeared in various guises throughout the book. You may assume in your
proof that D∗ has the positive orientation. The arguments are similar if it has the negative
orientation. (Hint: See Exercise 4.18.)

The remaining exercises call for speculation rather than proofs. You are asked to propose reasonable solutions consistent with patterns that have come before. It is considered a bonus if what you say is true. The correct answers are known and are covered in more advanced courses that study manifolds, for instance, advanced real analysis or perhaps differential geometry or differential topology.

4.21. A surface S in R4 is a two-dimensional subset, something that can be parametrized by a


function σ : D → R4 , where D is a subset of R2 . Let us call the parameters s and t so
that σ(s, t) = (x1 (s, t), x2 (s, t), x3 (s, t), x4 (s, t)). Assume that it is known how to define an
orientation of S and what it means for a parametrization σ to be orientation-preserving.

(a) Write down the general form of a differential form η in the variables x1 , x2 , x3 , x4 that
could be integrated over S. (Hint: Six terms.)
(b) Let $\sigma : D \to \mathbb{R}^4$ be an orientation-preserving parametrization of $S$. The integral $\int_S \eta$ should be defined via the parametrization as a double integral $\iint_D f(s, t)\, ds\, dt$ over the parameter domain $D$. Propose a formula for the function $f$.

4.22. Continuing with the preceding problem, assume that it is also known how to orient the boundary $\partial S$ of an oriented surface $S$ in $\mathbb{R}^4$ in a reasonable way. Propose an expression $\mu$ that makes the following statement true:

\[ \int_{\partial S} F_1\, dx_1 + F_2\, dx_2 + F_3\, dx_3 + F_4\, dx_4 = \int_S \mu. \]

4.23. Let $W$ be a bounded four-dimensional region in $\mathbb{R}^4$ whose boundary $\partial W$ consists of a finite number of piecewise smooth three-dimensional subsets. Assuming that it is possible to define an orientation on $\partial W$ in a reasonable way, propose a 3-form $\zeta$ on $\mathbb{R}^4$ such that:

\[ \operatorname{Vol}(W) = \int_{\partial W} \zeta, \]

where the volume on the left refers to four-dimensional volume. More than one answer for ζ
is possible. Try to find one with as much symmetry in the variables x1 , x2 , x3 , x4 as you can.
Answers to selected exercises

Section 1.1 Vector arithmetic

1.1. x + y = (5, −3, 9), 2x = (2, 4, 6), 2x − 3y = (−10, 19, −12)

1.3. (−5, 3, −9)


1.5. (b) $\frac{1}{3}x + \frac{2}{3}y$
(c) (−1, 1, 2)

Section 1.2 Linear transformations

2.2. $T((x_1, x_2) + (y_1, y_2)) = T(x_1 + y_1, x_2 + y_2) = (x_1 + y_1, x_2 + y_2, 0) = (x_1, x_2, 0) + (y_1, y_2, 0) = T(x_1, x_2) + T(y_1, y_2)$
$T(c(x_1, x_2)) = T(cx_1, cx_2) = (cx_1, cx_2, 0) = c(x_1, x_2, 0) = c\,T(x_1, x_2)$

Section 1.3 The matrix of a linear transformation


 
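3.1. (a) $\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$
(b) (23, 34)
(c) $(x_1 + 3x_2, 2x_1 + 4x_2)$

3.3. $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$

3.5. $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

3.9. (a) $T(e_1) = (1, 2)$, $T(e_2) = (2, 4)$
(b) all scalar multiples of (1, 2)

3.11. $\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$

3.13. $\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}$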


Section 1.4 Matrix multiplication


   
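4.1. $AB = \begin{bmatrix} 3 & 1 \\ 1 & 7 \end{bmatrix}$, $BA = \begin{bmatrix} 6 & 2 \\ 2 & 4 \end{bmatrix}$

4.3. $AB = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $BA = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

4.5. $AB = \begin{bmatrix} 21 & 0 \\ 0 & 14 \end{bmatrix}$, $BA = \begin{bmatrix} 13 & 4 & -5 \\ 4 & 5 & 6 \\ -5 & 6 & 17 \end{bmatrix}$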

Section 1.5 The geometry of the dot product

5.1. (a) −3
(b) $\|x\| = \sqrt{6}$, $\|y\| = 3\sqrt{2}$
(c) $\arccos\big(-\frac{1}{2\sqrt{3}}\big)$

5.3. $\pm\frac{1}{\sqrt{5}}(-2, 1)$

Section 1.6 Determinants

6.1. 23

6.3. 0

6.5. 12

6.9. (a) $T(x_1, x_2, x_3) = -3x_1 + 6x_2 - 3x_3$
(b) $T(x + y) = T(x_1 + y_1, x_2 + y_2, x_3 + y_3) = -3(x_1 + y_1) + 6(x_2 + y_2) - 3(x_3 + y_3) = -3x_1 + 6x_2 - 3x_3 - 3y_1 + 6y_2 - 3y_3 = T(x) + T(y)$
$T(cx) = T(cx_1, cx_2, cx_3) = -3(cx_1) + 6(cx_2) - 3(cx_3) = c(-3x_1 + 6x_2 - 3x_3) = c\,T(x)$
(c) $\begin{bmatrix} -3 & 6 & -3 \end{bmatrix}$
(d) (−3, 6, −3) · (1, 2, 3) = −3 + 12 − 9 = 0
(−3, 6, −3) · (4, 5, 6) = −12 + 30 − 18 = 0

Section 2.1 Parametrizations

1.1. (figure omitted)

1.3. (figure omitted)

1.5. (figure omitted)

1.7. (figure omitted)

1.9. $\alpha(x) = (x, x^2)$, $-1 \le x \le 1$

1.11. α(t) = (a cos t, −a sin t) , 0 ≤ t ≤ 2π

1.13. α(t) = (1 + 4t, 2 + 5t, 3 + 6t)

1.15. α(t) = (1 − t, 0, t)
1.17. (a) $\dfrac{v \cdot (p - a)}{v \cdot v}$
(b) $a + \dfrac{v \cdot (p - a)}{v \cdot v}\, v$

1.19. (b) No intersection


(c) Intersection at (2, 1, 3)

Section 2.2 Velocity, acceleration, speed, arclength

2.1. (a) $\mathbf{v}(t) = (3t^2, 6t, 6)$
$v(t) = 3(t^2 + 2)$
$\mathbf{a}(t) = (6t, 6, 0)$
(b) 88

2.3. (a) $\mathbf{v}(t) = (-2 \sin 2t, 2 \cos 2t, 6t^{1/2})$
$v(t) = 2\sqrt{1 + 9t}$
$\mathbf{a}(t) = (-4 \cos 2t, -4 \sin 2t, 3t^{-1/2})$
(b) $\frac{4}{27}\big(19\sqrt{19} - 10\sqrt{10}\big)$

2.5. $\alpha(t) = \big(1 + \frac{1}{\sqrt{14}}t,\ 1 + \frac{2}{\sqrt{14}}t,\ 1 + \frac{3}{\sqrt{14}}t\big)$

Section 2.3 Integrals with respect to arclength


3.1. $\frac{150}{7}$

3.3. (a) 5 2

(b) 2

Section 2.4 The geometry of curves: tangent and normal vectors


4.1. $\mathbf{T}(t) = \frac{1}{5}(4 \cos 4t, -4 \sin 4t, 3)$
$\mathbf{N}(t) = (-\sin 4t, -\cos 4t, 0)$
$\mathbf{T}(t) \cdot \mathbf{N}(t) = \frac{1}{5}(-4 \cos 4t \sin 4t + 4 \sin 4t \cos 4t + 0) = 0$

Section 2.5 The cross product

5.1. (a) (2, −4, 2)


(b) v · (v × w) = 2 − 4 + 2 = 0, w · (v × w) = 2 − 12 + 10 = 0

5.3. (a) (−10, −11, −3)


(b) v · (v × w) = −10 + 22 − 12 = 0, w · (v × w) = 20 − 11 − 9 = 0

5.5. (−4, 5, −1)

5.7. (a) (−3, −9, −3)
(b) $3\sqrt{11}$
(c) u · (v × w) = −12, v · (u × w) = 12, (u × v) · w = −12
(d) 12

5.11. (b) 13 41
 
5.15. (a) $u \cdot (u \times v \times w) = \det \begin{bmatrix} u & u & v & w \end{bmatrix} = 0$ (two equal rows) and so on
(b) Left-handed
(c) (−4, −4, −4, −4)

Section 2.6 The geometry of space curves: Frenet vectors

6.1. $\big(\frac{3}{5} \cos 4t, -\frac{3}{5} \sin 4t, -\frac{4}{5}\big)$

Section 2.7 Curvature and torsion

7.1. α(t) = (2 cos t, 2 sin t, 0) is one possibility.



Section 2.9 The classification of space curves


9.1. (a) $\sqrt{2}$
(b) $\frac{1}{\sqrt{2}}(-\sin t, -\cos t, 1)$
(c) $(-\cos t, \sin t, 0)$
(d) $-\frac{1}{\sqrt{2}}(\sin t, \cos t, 1)$
(e) $\frac{1}{2}$
(f) $-\frac{1}{2}$
(g) a = 1, b = −1. Both α and β trace out the same helix but in opposite directions. The
rotation T (x, y, z) = (x, −y, −z) about the x-axis by π transforms one parametrization
into the other.


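9.3. (a) $2\sqrt{2}$
(b) $\frac{1}{2\sqrt{2}}\big(1 + \sqrt{3} \cos t,\ \sqrt{3} - \cos t,\ -2 \sin t\big)$
(c) $\frac{1}{2}\big(-\sqrt{3} \sin t,\ \sin t,\ -2 \cos t\big)$
(d) $\frac{1}{2\sqrt{2}}\big(1 - \sqrt{3} \cos t,\ \sqrt{3} + \cos t,\ 2 \sin t\big)$
(e) $\frac{1}{4}$
(f) $\frac{1}{4}$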

(g) It’s congruent to the helix parametrized by β(t) = (2 cos t, 2 sin t, 2t).
9.7. $\frac{1}{7}\sqrt{\frac{19}{14}}$


2
9.9. (a) 2 (1+sin2 t)3/2

(b) τ (t) = 0

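9.13. (a) $\mathbf{T}(t) = -\frac{\sqrt{2}}{2}(1, \sin 2t, \cos 2t)$
$\mathbf{N}(t) = (0, -\cos 2t, \sin 2t)$
$\kappa(t) = \sqrt{2}$
$\tau(t) = \sqrt{2}$
(b) $\alpha(t) = \big(-\frac{\sqrt{2}}{2}t,\ -\frac{\sqrt{2}}{4} + \frac{\sqrt{2}}{4} \cos 2t,\ -\frac{\sqrt{2}}{4} \sin 2t\big)$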


Section 3.1 Graphs and level sets

1.1. (b) (figure omitted: level curves on −2 ≤ x, y ≤ 2)
(c) (figure omitted)

1.3. (b) (figure omitted: level curves on −2 ≤ x, y ≤ 2)
(c) (figure omitted)

1.5. (b) (figure omitted: level curves on −2 ≤ x, y ≤ 2)
(c) (figure omitted)

1.7. (b) (figure omitted: level curves on −2 ≤ x, y ≤ 2)
(c) (figure omitted)

Section 3.2 More surfaces in R3

2.1. (figure omitted)

2.3. (figure omitted)

2.5. The level set corresponding to c = 0 consists of the single point (0, 0, 0). Those for c = 1 and c = 2 are spheres centered at the origin of radius 1 and $\sqrt{2}$, respectively.

2.7. c = −1, 0, 1 (left, center, right)

2.9. c = −1, 0, 1 (left, center, right)

Section 3.3 The equation of a plane

3.1. 4x + 5y + 6z = 32

3.3. For example, p = (9, 0, 0), n = (1, 3, 5)

3.5. Perpendicular

3.7. 6x + 3y + 2z = 6

3.9. 2x + z = 3

3.11. $\big(\frac{8}{3}, \frac{1}{3}, \frac{1}{3}\big)$

3.13. (b) It’s the triangle with vertices (1, 0, 0), (0, 1, 0), and (0, 0, 1).

3.15. (c) $\frac{4}{\sqrt{3}}$

Section 3.4 Open sets

4.1. Let a = (c, d) be a point of U , and let r = min{c, 1 − c, d, 1 − d}. Then B(a, r) ⊂ U , so U is
an open set.

4.3. The point a = (1, 0) is in U , but no open ball centered at a stays within U . Thus U is not
an open set.

Section 3.5 Continuity

5.1. (a) If $(x, y) \neq (0, 0)$, then, in polar coordinates:

\[ f(x, y) = f(r \cos\theta, r \sin\theta) = \frac{(r \cos\theta)^3 (r \sin\theta)}{r^2} = r^2 \cos^3\theta \sin\theta, \]

so $|f(x, y)| = |r^2 \cos^3\theta \sin\theta| \le r^2 = \|(x, y)\|^2$. The inequality also holds if $(x, y) = (0, 0)$ (both sides equal 0). Thus if $\|(x, y)\| < a$, then $|f(x, y)| \le \|(x, y)\|^2 < a^2$.

(b) Let $\epsilon > 0$ be given, and let $\delta = \sqrt{\epsilon}$. By part (a), if $\|(x, y) - (0, 0)\| = \|(x, y)\| < \delta$, then $|f(x, y) - f(0, 0)| = |f(x, y)| < \delta^2 = \epsilon$. Therefore $f$ is continuous at (0, 0).

Section 3.6 Some properties of continuous functions

6.1. f (x, y) = (1, 2) · (x, y) = x + 2y. We know that the projections x and y are continuous, so, as
an algebraic combination of continuous functions, f is continuous as well.

Section 3.7 The Cauchy-Schwarz and triangle inequalities

7.1. (a) By the triangle inequality, $\|v\| = \|(v - w) + w\| \le \|v - w\| + \|w\|$, so $\|v\| - \|w\| \le \|v - w\|$.
(b) Reversing the roles of $v$ and $w$ gives $\|w\| - \|v\| \le \|w - v\| = \|v - w\|$, or $-(\|v\| - \|w\|) \le \|v - w\|$. Hence $\pm(\|v\| - \|w\|) \le \|v - w\|$, so $\big|\,\|v\| - \|w\|\,\big| \le \|v - w\|$.

7.4. First, note that $|cf(x) - cf(a)| = |c|\,|f(x) - f(a)| \le (|c| + 1)\,|f(x) - f(a)|$. Now, let $\epsilon > 0$ be given. Since $f$ is continuous at $a$, there exists a $\delta > 0$ such that, if $\|x - a\| < \delta$, then $|f(x) - f(a)| < \frac{\epsilon}{|c| + 1}$. Then, for this choice of $\delta$, if $\|x - a\| < \delta$, $|cf(x) - cf(a)| \le (|c| + 1)\,|f(x) - f(a)| < (|c| + 1) \cdot \frac{\epsilon}{|c| + 1} = \epsilon$. Hence $cf$ is continuous at $a$.


Section 3.8 Limits

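8.1. For sums: Let $L = \lim_{x \to a} f(x)$ and $M = \lim_{x \to a} g(x)$, and consider the functions:

\[ \tilde{f}(x) = \begin{cases} f(x) & \text{if } x \neq a, \\ L & \text{if } x = a \end{cases} \quad \text{and} \quad \tilde{g}(x) = \begin{cases} g(x) & \text{if } x \neq a, \\ M & \text{if } x = a. \end{cases} \]

By definition of limit, $\tilde{f}$ and $\tilde{g}$ are continuous at $a$, hence so is their sum. In other words, the function given by:

\[ \tilde{f}(x) + \tilde{g}(x) = \begin{cases} f(x) + g(x) & \text{if } x \neq a, \\ L + M & \text{if } x = a \end{cases} \]

is continuous at $a$. By definition, this means that $\lim_{x \to a} (f(x) + g(x)) = L + M$.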

Section 4.1 The first-order approximation


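1.1. (a) $\frac{\partial f}{\partial x}(a) = 4$, $\frac{\partial f}{\partial y}(a) = -2$
(b) $Df(a) = \begin{bmatrix} 4 & -2 \end{bmatrix}$, $\nabla f(a) = (4, -2)$
(c) $\ell(x, y) = -3 + 4x - 2y$
(d) $f(2.01, 1.01) = 3.02$, $\ell(2.01, 1.01) = 3.02$.

1.3. (a) $\frac{\partial f}{\partial x} = 3x^2 - 4xy + 3y^2$
$\frac{\partial f}{\partial y} = -2x^2 + 6xy - 12y^2$
(b) $\begin{bmatrix} 3x^2 - 4xy + 3y^2 & -2x^2 + 6xy - 12y^2 \end{bmatrix}$

1.5. (a) $\frac{\partial f}{\partial x} = (1 + xy)e^{xy}$
$\frac{\partial f}{\partial y} = x^2 e^{xy}$
(b) $\begin{bmatrix} (1 + xy)e^{xy} & x^2 e^{xy} \end{bmatrix}$

1.7. (a) $\frac{\partial f}{\partial x} = -\frac{2x}{(x^2 + y^2)^2}$
$\frac{\partial f}{\partial y} = -\frac{2y}{(x^2 + y^2)^2}$
(b) $\begin{bmatrix} -\frac{2x}{(x^2 + y^2)^2} & -\frac{2y}{(x^2 + y^2)^2} \end{bmatrix}$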

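1.9. (a) $\frac{\partial f}{\partial x} = 1$, $\frac{\partial f}{\partial y} = 2$, $\frac{\partial f}{\partial z} = 3$
(b) (1, 2, 3)

1.11. (a) $\frac{\partial f}{\partial x} = \big(\cos(x + 2y) - 2x \sin(x + 2y)\big) e^{-x^2 - y^2 - z^2}$
$\frac{\partial f}{\partial y} = 2\big(\cos(x + 2y) - y \sin(x + 2y)\big) e^{-x^2 - y^2 - z^2}$
$\frac{\partial f}{\partial z} = -2z \sin(x + 2y)\, e^{-x^2 - y^2 - z^2}$
(b) $\Big( \big(\cos(x + 2y) - 2x \sin(x + 2y)\big) e^{-x^2 - y^2 - z^2},\ 2\big(\cos(x + 2y) - y \sin(x + 2y)\big) e^{-x^2 - y^2 - z^2},\ -2z \sin(x + 2y)\, e^{-x^2 - y^2 - z^2} \Big)$

1.13. (a) $\frac{\partial f}{\partial x} = \frac{2x}{x^2 + y^2}$
$\frac{\partial f}{\partial y} = \frac{2y}{x^2 + y^2}$
$\frac{\partial f}{\partial z} = 0$
(b) $\big( \frac{2x}{x^2 + y^2}, \frac{2y}{x^2 + y^2}, 0 \big)$

1.17. (a) $\frac{\partial f}{\partial x} = \frac{2x^3}{\sqrt{x^4 + y^4}}$
$\frac{\partial f}{\partial y} = \frac{2y^3}{\sqrt{x^4 + y^4}}$
(b) $\frac{\partial f}{\partial x}(0, 0) = 0$, $\frac{\partial f}{\partial y}(0, 0) = 0$.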

Section 4.2 Conditions for differentiability

2.1. $\lim_{(x,y) \to (0,0)} f(x, y) = \lim_{(x,y) \to (0,0)} (x^2 - y^2) = 0$, while $f(0, 0) = \pi$, so $f$ is not continuous at (0, 0), hence not differentiable there either.

Section 4.4 The C 1 test


4.1. For $(x, y) \neq (0, 0)$, the partial derivatives are given by $\frac{\partial f}{\partial x} = \frac{x}{\sqrt{x^2 + y^2}}$ and $\frac{\partial f}{\partial y} = \frac{y}{\sqrt{x^2 + y^2}}$. As algebraic combinations and compositions of continuous functions, both of these are continuous at all points other than the origin. Thus, by the $C^1$ test, $f$ is differentiable at all points other than the origin.

4.2. Differentiable at all (x, y) in R2

4.4. Differentiable at all (x, y) where $x \neq 0$ and $y \neq 0$

4.6. Differentiable at all (w, x, y, z) in R4

Section 4.5 The Little Chain Rule

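5.1. (b) $(f \circ \alpha)(t) = f(t^2, t^3) = t^4 + t^6$, so $(f \circ \alpha)'(t) = 4t^3 + 6t^5$ and $(f \circ \alpha)'(1) = 10$.

5.3. $\frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} + \frac{\partial f}{\partial z} \frac{dz}{dt}$, where $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, and $\frac{\partial f}{\partial z}$ are evaluated at $\alpha(t)$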

Section 4.6 Directional derivatives

6.1. $-\frac{5}{\sqrt{17}}$

6.3. $\frac{2}{\sqrt{6}}$

6.5. $\frac{6}{\sqrt{17}}(4, 1)$

6.7. r = √5 , s= 5

2 2 2

6.9. (a) $\frac{1}{\sqrt{38}}(-1, 1, -6)$
(b) $\frac{1}{\sqrt{38}}(1, -1, 6)$
(c) $800\sqrt{38}\, e^{-42}$ °C/sec

6.11. (a) Direction: $u = \frac{1}{\sqrt{13}}(3, 2)$
Value: $\sqrt{13}$
(b) Direction: $u = \frac{1}{\sqrt{13}}(-3, -2)$
Value: $-\sqrt{13}$


(c) y = 13 x2 − 1
3
3

1 2 3

(d) y = − 32 ln x
5

1 2 3
-1

Section 4.7 ∇f as normal vector

7.1. 2x − 4y − z = −3

7.3. 4y + 3z = 7

7.5. α(t) = (1 + 14t, 2 + 5t, 3 − 8t)

Section 4.8 Higher-order partial derivatives

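8.1. $\frac{\partial^2 f}{\partial x^2} = 12x^2 - 12xy + 6y^2$
$\frac{\partial^2 f}{\partial y\, \partial x} = -6x^2 + 12xy - 12y^2$
$\frac{\partial^2 f}{\partial x\, \partial y} = -6x^2 + 12xy - 12y^2$
$\frac{\partial^2 f}{\partial y^2} = 6x^2 - 24xy + 60y^2$

8.3. $\frac{\partial^2 f}{\partial x^2} = (4x^2 - 2)e^{-x^2 - y^2}$
$\frac{\partial^2 f}{\partial y\, \partial x} = 4xy\, e^{-x^2 - y^2}$
$\frac{\partial^2 f}{\partial x\, \partial y} = 4xy\, e^{-x^2 - y^2}$
$\frac{\partial^2 f}{\partial y^2} = (4y^2 - 2)e^{-x^2 - y^2}$

8.5. $i = 4$, $j = 3$, $\frac{\partial^7 f}{\partial x^4\, \partial y^3}(0, 0) = 144$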

Section 4.10 Max/min: critical points

10.1. (1, 2) saddle point

10.3. (0, 0) local minimum

10.5. (0, 0) saddle point, (3, 9) local minimum

10.7. $k \neq 0$: (0, 0) saddle point; $\big(\frac{k}{3}, \frac{k}{3}\big)$ local minimum if k > 0, local maximum if k < 0
k = 0: (0, 0) neither local maximum nor minimum (degenerate)
1
10.11. y = x + 3
4

-1 1 2 3 4

Section 4.11 Classifying nondegenerate critical points

11.1. $1 - \frac{1}{2}(x + y)^2$

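11.3. (a) $h^3 \frac{\partial^3 f}{\partial x^3} + 3h^2 k \frac{\partial^3 f}{\partial x^2\,\partial y} + 3hk^2 \frac{\partial^3 f}{\partial x\,\partial y^2} + k^3 \frac{\partial^3 f}{\partial y^3}$

(b) $f(a) + \frac{\partial f}{\partial x}(a)\,h + \frac{\partial f}{\partial y}(a)\,k + \frac{1}{2}\left(\frac{\partial^2 f}{\partial x^2}(a)\,h^2 + 2\frac{\partial^2 f}{\partial x\,\partial y}(a)\,hk + \frac{\partial^2 f}{\partial y^2}(a)\,k^2\right) + \frac{1}{6}\left(\frac{\partial^3 f}{\partial x^3}(a)\,h^3 + 3\frac{\partial^3 f}{\partial x^2\,\partial y}(a)\,h^2 k + 3\frac{\partial^3 f}{\partial x\,\partial y^2}(a)\,hk^2 + \frac{\partial^3 f}{\partial y^3}(a)\,k^3\right)$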

Section 4.12 Max/min: Lagrange multipliers

12.1. (a) x can be chosen large and positive and y large and negative so that x + 2y = 1. Thus
f (x, y) = xy can be made as large and negative as you like.
(b) Maximum value $\frac{1}{8}$ at $(\frac{1}{2}, \frac{1}{4})$
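Both parts of 12.1 can be sanity-checked in a few lines. The sketch below uses f(x, y) = xy and the constraint x + 2y = 1 from part (a). It first checks the Lagrange condition ∇f = λ∇g at the stated point, then checks the value by substituting the constraint, which is a separate technique from Lagrange multipliers. The variable and function names are hypothetical.

```python
def f_on_constraint(x):
    """f(x, y) = x*y restricted to the line x + 2y = 1."""
    y = (1 - x) / 2  # solve the constraint for y
    return x * y

# Lagrange condition at the claimed maximum (1/2, 1/4):
x0, y0 = 0.5, 0.25
grad_f = (y0, x0)         # gradient of f(x, y) = x*y
grad_g = (1.0, 2.0)       # gradient of g(x, y) = x + 2y
lam = grad_f[0] / grad_g[0]
lagrange_holds = (lam * grad_g[0] == grad_f[0]) and (lam * grad_g[1] == grad_f[1])

# Coarse grid search along the constraint line, x from -2 to 2.
best_x = max((i / 1000 for i in range(-2000, 2001)), key=f_on_constraint)

print(lagrange_holds, best_x, f_on_constraint(best_x))
print(f_on_constraint(1e6))  # large positive x gives a large negative value
```

The last line illustrates part (a): along the constraint, f has no minimum.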

12.3. Maximum at $\frac{1}{\sqrt{29}}(2, -3, 4)$ and minimum at $\frac{1}{\sqrt{29}}(-2, 3, -4)$

12.5. x = 23 , y = 1, z = 6
5

12.9. x = 400, y = 150, z = 100

Section 5.1 Volume and iterated integrals

1.1. $\frac{49}{6}$

1.3. $\frac{7}{6}$

1.5. $\frac{26}{3}\ln 2$

1.7. $e^2 - 2e + 1$

1.9. (a) [Figure omitted: sketch in the xy-plane]

(b) $\int_3^4 \left( \int_1^2 f(x, y)\,dx \right) dy$


1.11. (a) [Figure omitted: region between the curves $x = y^2$ and $x = y$]

(b) $\int_0^1 \left( \int_x^{\sqrt{x}} f(x, y)\,dy \right) dx$

1.13. (a) [Figure omitted: region under the curve $y = \sqrt{1 - x^2}$ for $-1 \le x \le 1$]

(b) $\int_0^1 \left( \int_{-\sqrt{1 - y^2}}^{\sqrt{1 - y^2}} f(x, y)\,dx \right) dy$

1.15. (a) [Figure omitted: region bounded in part by the curve $y = 1/x$]

(b) $\int_1^4 \left( \int_{1/y}^1 ye^{xy}\,dx \right) dy$

(c) $e^4 - 4e$
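The value in 1.15(c) can be confirmed by evaluating the iterated integral from part (b) numerically. The sketch below uses a hand-written composite Simpson's rule so that it needs only the standard library. The helper names `simpson` and `inner` are hypothetical, and the step count is an arbitrary choice.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def inner(y):
    """Inner integral: x runs from 1/y to 1, integrand y*e^(x*y)."""
    return simpson(lambda x: y * math.exp(x * y), 1 / y, 1)

numeric = simpson(inner, 1, 4)          # outer integral: y runs from 1 to 4
exact = math.exp(4) - 4 * math.e        # the answer to part (c)
print(numeric, exact)
```

The inner integral has the closed form e^y - e, which is why integrating in the order dx dy is tractable here.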
1.17. $\frac{2}{3}$

1.19. $\frac{2}{3}$

1.21. $\frac{1}{3}$

Section 5.2 The double integral

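2.1. (a) $S = 0$ or $\frac{1}{9}$
(b) $S = 0$, $\frac{1}{16}$, $\frac{1}{8}$, $\frac{3}{16}$, or $\frac{1}{4}$
(c) If $n$ is odd, $S = 0$ or $\frac{1}{n^2}$. If $n$ is even, $S = 0$, $\frac{1}{n^2}$, $\frac{2}{n^2}$, $\frac{3}{n^2}$, or $\frac{4}{n^2}$.

(e) Let $\epsilon > 0$ be given, and let $\delta = \frac{\sqrt{\epsilon}}{2}$. If $R$ is subdivided into subrectangles of dimensions
$\Delta x_i$ by $\Delta y_j$, where $\Delta x_i < \delta$ and $\Delta y_j < \delta$ for all $i, j$, then, by part (d), all Riemann
sums based on the subdivision satisfy $\left|\sum_{i,j} f(p_{ij})\,\Delta x_i\,\Delta y_j - 0\right| < 4\delta^2 = \epsilon$. Therefore $f$
is integrable over $R$, and $\iint_R f(x, y)\,dA = 0$.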

Section 5.3 Interpretations of the double integral

3.1. 18 hundred birds

3.3. (a) 80 miles per hour


(b) All points of R that lie on the circle x2 + y 2 = 38 , a quarter-circular arc

3.5. ( 23 , 35 )
3.7. $\sqrt{\frac{2}{3}}$

Section 5.4 Parametrization of surfaces


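4.1. (a) $\sigma : D \to \mathbb{R}^3$, $\sigma(x, y) = (x, y, \sqrt{a^2 - x^2 - y^2})$, where $D$ is the disk $x^2 + y^2 \le a^2$

(b) $\sigma : D \to \mathbb{R}^3$, $\sigma(r, \theta) = (r\cos\theta, r\sin\theta, \sqrt{a^2 - r^2})$, where $D = [0, a] \times [0, 2\pi]$
(c) $\sigma : D \to \mathbb{R}^3$, $\sigma(\phi, \theta) = (a\sin\phi\cos\theta, a\sin\phi\sin\theta, a\cos\phi)$, where $D = [0, \frac{\pi}{2}] \times [0, 2\pi]$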

Section 5.5 Integrals with respect to surface area

5.1. (a) $\sigma : D \to \mathbb{R}^3$, $\sigma(x, y) = (x, y, 1 - x - y)$, where $D$ is the triangular region in the xy-plane
with vertices (0, 0), (1, 0), (0, 1)

(b) $\frac{\sqrt{3}}{2}$

(c) $\frac{7\sqrt{3}}{2}$

5.2. 36π

5.5. (a)


(b) $4\pi(6 + \sqrt{2})$

(c) $2\pi(27 + 10\sqrt{2})$

5.7. (a) 8
(b) 4

Section 5.6 Triple integrals and beyond


6.1. $\frac{1}{12}$

6.3. $\frac{7}{48}$

6.5. $\frac{64}{15}$


6.7. a2

6.8. (a)

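(b) $\int_0^{\sqrt{2}} \left( \int_0^{\sqrt{1 - \frac{y^2}{2}}} \left( \int_{-2\sqrt{1 - \frac{y^2}{2} - z^2}}^{2\sqrt{1 - \frac{y^2}{2} - z^2}} f(x, y, z)\,dx \right) dz \right) dy$

(c) $\int_{-2}^{2} \left( \int_0^{\sqrt{1 - \frac{x^2}{4}}} \left( \int_0^{\sqrt{2 - \frac{x^2}{2} - 2z^2}} f(x, y, z)\,dy \right) dz \right) dx$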

6.10. (a)

(b) $\int_0^1 \left( \int_0^y \left( \int_z^y f(x, y, z)\,dx \right) dz \right) dy$

(c) $\int_0^1 \left( \int_0^x \left( \int_x^1 f(x, y, z)\,dy \right) dz \right) dx$
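The two orders of integration in 6.10(b) and (c) should describe the same region, namely 0 ≤ z ≤ x ≤ y ≤ 1. The sketch below checks this numerically with a nested midpoint rule. The integrand x + 2y + 3z is an arbitrary test function chosen for illustration (it does not come from the exercise), and the helper names are hypothetical.

```python
def midpoint(g, a, b, n=40):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, y, z):
    # Arbitrary test integrand (an assumption, not from the exercise).
    return x + 2 * y + 3 * z

def order_dx_dz_dy():
    """Part (b): y in [0, 1], z in [0, y], x in [z, y]."""
    return midpoint(
        lambda y: midpoint(
            lambda z: midpoint(lambda x: f(x, y, z), z, y), 0, y), 0, 1)

def order_dy_dz_dx():
    """Part (c): x in [0, 1], z in [0, x], y in [x, 1]."""
    return midpoint(
        lambda x: midpoint(
            lambda z: midpoint(lambda y: f(x, y, z), x, 1), 0, x), 0, 1)

print(order_dx_dz_dy(), order_dy_dz_dx())
```

For this test integrand the exact value over the region is 11/24, so both printed numbers should be close to 0.4583.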

Section 6.1 Continuity revisited

1.2. Given any $\epsilon > 0$, there exists a $\delta > 0$ such that $\|f(x) - L\| < \epsilon$ whenever $\|x - a\| < \delta$, except
possibly when $x = a$.

Section 6.2 Differentiability revisited


 
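2.1. (a) $\begin{bmatrix} 2x & -2y \\ 2y & 2x \end{bmatrix}$

(b) $\begin{bmatrix} 2 & -4 \\ 4 & 2 \end{bmatrix}$

2.3. $\begin{bmatrix} 1 & 2y & 9z^2 \end{bmatrix}$

2.5. $\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$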

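2.7. (a) $\begin{bmatrix} e^{x+y} & e^{x+y} \\ -e^{-x}\cos y & -e^{-x}\sin y \\ -e^{-x}\sin y & e^{-x}\cos y \end{bmatrix}$

(b) All entries of $Df(x, y)$ are continuous on $\mathbb{R}^2$, so, by the $C^1$ test, $f$ is differentiable at
every point of $\mathbb{R}^2$.

2.9. (a) $\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$ $(= A)$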

Section 6.3 The chain rule: a conceptual approach


 
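3.1. (a) $Df(s, t) = \begin{bmatrix} 1 & -1 \\ 2e^{2s+3t} & 3e^{2s+3t} \\ \cos t & -s\sin t \end{bmatrix}$, $Dg(x, y, z) = \begin{bmatrix} yz & xz & xy \\ 2x & 0 & 3z^2 \end{bmatrix}$

(b) $\begin{bmatrix} 2 & 0 \\ 4 & -4 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

3.3. (a) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$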

Section 6.4 The chain rule: a computational approach


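4.1. $\frac{\partial w}{\partial s} = \frac{\partial f}{\partial x} + 3\frac{\partial f}{\partial y} + 5\frac{\partial f}{\partial z}$
$\frac{\partial w}{\partial t} = 2\frac{\partial f}{\partial x} + 4\frac{\partial f}{\partial y} + 6\frac{\partial f}{\partial z}$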

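4.3. (a) $\frac{\partial w}{\partial \rho} = \frac{\partial f}{\partial x}\sin\phi\cos\theta + \frac{\partial f}{\partial y}\sin\phi\sin\theta + \frac{\partial f}{\partial z}\cos\phi$
$\frac{\partial w}{\partial \phi} = \frac{\partial f}{\partial x}\rho\cos\phi\cos\theta + \frac{\partial f}{\partial y}\rho\cos\phi\sin\theta - \frac{\partial f}{\partial z}\rho\sin\phi$
$\frac{\partial w}{\partial \theta} = -\frac{\partial f}{\partial x}\rho\sin\phi\sin\theta + \frac{\partial f}{\partial y}\rho\sin\phi\cos\theta$
(b) 0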
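The three formulas in 4.3(a) can be tested against finite differences. The sketch below assumes the usual spherical substitution x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ, which is what the formulas imply, and uses a hypothetical test function f(x, y, z) = xy + z^2 that is not taken from the exercise.

```python
import math

def f(x, y, z):
    # Hypothetical test function (an assumption, not from the exercise).
    return x * y + z**2

def grad_f(x, y, z):
    """Gradient of the test function."""
    return (y, x, 2 * z)

def to_cartesian(rho, phi, theta):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def w(rho, phi, theta):
    """w = f expressed in spherical coordinates."""
    return f(*to_cartesian(rho, phi, theta))

def chain_rule_partials(rho, phi, theta):
    """Partials of w from the formulas in the answer to 4.3(a)."""
    sp, cp = math.sin(phi), math.cos(phi)
    st, ct = math.sin(theta), math.cos(theta)
    fx, fy, fz = grad_f(*to_cartesian(rho, phi, theta))
    dw_drho = fx * sp * ct + fy * sp * st + fz * cp
    dw_dphi = fx * rho * cp * ct + fy * rho * cp * st - fz * rho * sp
    dw_dtheta = -fx * rho * sp * st + fy * rho * sp * ct
    return dw_drho, dw_dphi, dw_dtheta

def numeric_partials(rho, phi, theta, h=1e-6):
    """Central-difference partials of w, for comparison."""
    return ((w(rho + h, phi, theta) - w(rho - h, phi, theta)) / (2 * h),
            (w(rho, phi + h, theta) - w(rho, phi - h, theta)) / (2 * h),
            (w(rho, phi, theta + h) - w(rho, phi, theta - h)) / (2 * h))

print(chain_rule_partials(2.0, 0.7, 1.1))
print(numeric_partials(2.0, 0.7, 1.1))
```

Any differentiable test function could be swapped in for `f`, as long as `grad_f` is updated to match.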
∂f ∂f
4.5. ∂x = 52 , ∂y = 3
4

∂z
4.9. ∂x = − 13 , ∂z
∂y = − 32

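4.11. (b) $\frac{\partial f}{\partial x}\frac{d^2 x}{dt^2} + \frac{\partial f}{\partial y}\frac{d^2 y}{dt^2} + \frac{\partial^2 f}{\partial x^2}\left(\frac{dx}{dt}\right)^2 + 2\frac{\partial^2 f}{\partial x\,\partial y}\frac{dx}{dt}\frac{dy}{dt} + \frac{\partial^2 f}{\partial y^2}\left(\frac{dy}{dt}\right)^2$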

Section 7.1 Change of variables for double integrals


1.2. $\frac{\pi}{8}$

1.4. π(sin 16 − sin 4)


1.6. 2π
1.8. (b) $\pi(1 - e^{-a^2})$

Section 7.3 Examples: linear changes of variables, symmetry

3.1. (a) $D$ is the disk of radius 2 centered at (3, 2).
[Figure omitted: the disk $D$ with center (3, 2)]

(b) $D^*$ is the disk $u^2 + v^2 \le 4$ of radius 2 centered at the origin.

(c) $\iint_{D^*} (u + v + 5)\,du\,dv = 20\pi$
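The value 20π in 3.1(c) follows from symmetry: u and v integrate to zero over a disk centered at the origin, leaving 5 times the area 4π. The sketch below confirms this with a midpoint rule in polar coordinates. The helper name `integrate_over_disk` and the grid size are hypothetical choices.

```python
import math

def integrate_over_disk(g, radius, n=200):
    """Midpoint rule in polar coordinates over the disk u^2 + v^2 <= radius^2."""
    dr = radius / n
    dtheta = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            u, v = r * math.cos(theta), r * math.sin(theta)
            total += g(u, v) * r * dr * dtheta  # r is the polar Jacobian factor
    return total

numeric = integrate_over_disk(lambda u, v: u + v + 5, 2.0)
print(numeric, 20 * math.pi)
```

Dropping the constant 5 from the integrand should give a result near zero, which isolates the symmetry argument.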

3.4. (a) 150


(b) 100
3.7. (a) $T(u, v) = \left(\sqrt{\frac{u}{v}}, \sqrt{uv}\right)$
(b) $D^* = [1, 2] \times [1, 4]$
(c) $\frac{3}{2}e(e - 1)$

3.9. $\frac{8}{9}$

Section 7.4 Change of variables for n-fold integrals


4.1. $\frac{48\pi}{5}$
π
4.3. 6 (1 + 3 ln 34 )

4.5. $(0, 0, \frac{3}{8}a)$

4.7. $4(2 - \sqrt{2})$

4.9. $\frac{8\pi^2}{15}$

Section 8.2 Exercises for Chapter 8 (Vector fields)

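1.1. [Figure omitted: sketch of the vector field, axes from −2 to 2]

1.3. [Figure omitted: sketch of the vector field, axes from −2 to 2]

1.5. [Figure omitted: sketch of the vector field, axes from −2 to 2]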

1.7. (a) $\mathbf{F}(x, y) = \frac{1}{\sqrt{4x^2 + 1}}(-2x, 1)$ or its negative

(b) $\mathbf{F}(x, y) = \frac{1}{\sqrt{4x^2 + 1}}(1, 2x)$ or its negative
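1.9. (a) $\alpha'(t) = \left(-2a\sin(\sqrt{2}t), \sqrt{2}a\cos(\sqrt{2}t)\right)$ and $\mathbf{F}(\alpha(t)) = \mathbf{F}\left(\sqrt{2}a\cos(\sqrt{2}t), a\sin(\sqrt{2}t)\right) = \left(-2a\sin(\sqrt{2}t), \sqrt{2}a\cos(\sqrt{2}t)\right)$, so $\alpha'(t) = \mathbf{F}(\alpha(t))$.

(b) $\frac{x^2}{2} + y^2 = \frac{2a^2}{2}\cos^2(\sqrt{2}t) + a^2\sin^2(\sqrt{2}t) = a^2\cos^2(\sqrt{2}t) + a^2\sin^2(\sqrt{2}t) = a^2$

[Figure omitted: axes from −2 to 2]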
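The integral-curve claim in 1.9 can be checked numerically. The sketch below assumes F(x, y) = (−2y, x), which is inferred from the displayed value of F(α(t)) rather than stated in this answer key, and takes α(t) = (√2 a cos(√2 t), a sin(√2 t)) from part (a). The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2)

def F(x, y):
    # Assumed vector field, inferred from the displayed F(alpha(t)).
    return (-2 * y, x)

def alpha(t, a=1.0):
    """The path from the answer: (sqrt(2)*a*cos(sqrt(2)*t), a*sin(sqrt(2)*t))."""
    return (SQRT2 * a * math.cos(SQRT2 * t), a * math.sin(SQRT2 * t))

def alpha_prime_numeric(t, a=1.0, h=1e-6):
    """Central-difference velocity of alpha."""
    x1, y1 = alpha(t + h, a)
    x0, y0 = alpha(t - h, a)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))

def ellipse_value(t, a=1.0):
    """x^2/2 + y^2 along the path; part (b) says this equals a^2."""
    x, y = alpha(t, a)
    return x * x / 2 + y * y

t, a = 0.8, 1.5
print(alpha_prime_numeric(t, a), F(*alpha(t, a)))
print(ellipse_value(t, a), a**2)
```

The first line compares the velocity with the field along the path, and the second confirms that the path stays on the ellipse from part (b).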

1.12. (a) $\alpha(t) = (x_0 e^t, y_0 e^t) = e^t(x_0, y_0)$


(b) The integral curves are rays emanating out from the origin.


Section 9.1 Definitions and examples

1.1. (a) [Figure omitted: sketch of the vector field, axes from −2 to 2]

(b) For instance, any line segment radiating directly away from the origin. If $\alpha(t) = (t, t)$, $0 \le t \le 1$, then $\int_C x\,dx + y\,dy = 1$.

(c) For instance, any circle centered at the origin. If $\alpha(t) = (\cos t, \sin t)$, $0 \le t \le 2\pi$, then
$\int_C x\,dx + y\,dy = 0$.
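The two line integrals in 1.1(b) and (c) can be reproduced with a short midpoint-rule routine that integrates F(α(t)) · α'(t) over the parameter interval. The field F(x, y) = (x, y) is read off from the integrand x dx + y dy. The helper name `line_integral` and the step count are hypothetical.

```python
import math

def line_integral(F, alpha, alpha_prime, a, b, n=2000):
    """Midpoint-rule estimate of the integral of F(alpha(t)) . alpha'(t) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        fx, fy = F(*alpha(t))
        vx, vy = alpha_prime(t)
        total += (fx * vx + fy * vy) * h
    return total

def F(x, y):
    """Field corresponding to the 1-form x dx + y dy."""
    return (x, y)

# Part (b): the segment alpha(t) = (t, t), 0 <= t <= 1.
segment = line_integral(F, lambda t: (t, t), lambda t: (1.0, 1.0), 0.0, 1.0)

# Part (c): the unit circle alpha(t) = (cos t, sin t), 0 <= t <= 2*pi.
circle = line_integral(
    F,
    lambda t: (math.cos(t), math.sin(t)),
    lambda t: (-math.sin(t), math.cos(t)),
    0.0, 2 * math.pi)

print(segment, circle)
```

On the circle the field is perpendicular to the velocity at every point, so the integrand vanishes identically, which is the geometric reason for the zero in part (c).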
1.3. $\frac{1}{2}(\pi + \sin^2(\pi^2))$

1.4. $\frac{179}{60}$

1.6. (a) F(x, y) = (x + y, y − x)


(b) 1
(c) 0
(d) 2
1.8. $\frac{69}{2}$

1.10. (a)

(b) $e^4 + 2$

Section 9.3 Conservative fields

3.1. (a) It’s conservative with potential function f (x, y) = xy.


(b) 2

3.3. (a) $(-6x^2 y - 8x^3 z, 6y^4 z + 6xy^2, 12x^2 z^2 - 12y^3 z^2)$


(b) Not conservative

3.5. Not conservative

3.7. Conservative with potential function $f(x, y, z) = \frac{1}{2}x^2 + \frac{1}{2}y^2 + \frac{1}{2}z^2 + xyz$

3.9. (a) q = 1 − 2p

(b) All positive p


3.13. 3
3.17. The vector field $\mathbf{F} = \nabla(fg)$ is conservative with potential function $fg$, so $\int_C \nabla(fg) \cdot d\mathbf{s} = 0$
for all piecewise smooth oriented closed curves $C$. By Exercise 1.19 in Chapter 4, $\nabla(fg) = f\nabla g + g\nabla f$, so $\int_C (f\nabla g + g\nabla f) \cdot d\mathbf{s} = 0$. Hence $\int_C (f\nabla g) \cdot d\mathbf{s} = -\int_C (g\nabla f) \cdot d\mathbf{s}$.

Section 9.4 Green’s theorem

4.1. 2
9π 3
4.3. 2

4.6. 4
4.8. $\frac{1}{2}$

4.10. 12

Section 9.5 The vector field W

5.1. (a) 27

Section 9.6 The converse of the mixed partials theorem

6.1. (a) C traverses the unit circle once clockwise. Winding number = −1.
(b) Cn goes around the unit circle |n| times, counterclockwise if n > 0 and clockwise if n < 0.
It stays stuck at the point (1, 0) if n = 0. Winding number = n.
6.2. Winding number = 0

Section 10.1 What the surface integral measures

1.1. (a) $2\pi a^3$


(b) 0
1.2. $\frac{1}{2}$

Section 10.2 The definition of the surface integral


2.1. $\frac{7}{2}$
32π 3
2.3. 2π + 3
2.5. $\frac{8}{3}$

Section 10.3 Stokes’s theorem

3.1. (a) For example, $\mathbf{G}(x, y, z) = (-y\cos z, x, y^2 z)$


(b) 2π
(c) 2π
3.3. (a) (y − 2x, x + y, z)

(b) 2


Section 10.4 Curl fields

4.1. ∇ · F = 3, not 0, so F is not a curl field.

4.2. It’s a curl field with $\mathbf{G}(x, y, z) = (\frac{1}{2}z^2, \frac{1}{2}x^2, \frac{1}{2}y^2)$, for example.

Section 10.5 Gauss’s theorem


5.1. $\frac{384\pi}{5}$

5.3. $\frac{104\pi}{3}$

5.7. 225 − 32π

Section 10.6 The inverse square field

6.2. 11

6.4. It’s the total mass contained in the interior of $S$, i.e., the sum of those $m_i$ for which $p_i$ is in
the interior of $S$.

Section 10.7 A more substitution-friendly notation for surface integrals

7.1. (a) F(x, y, z) = (xy, yz, xz)


(b) 0

Section 11.4 Exercises for Chapter 11 (Working with differential forms)

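4.1. $(x^2 + y^2)\,dx \wedge dy$

4.3. $(xy - z^2)\,dy \wedge dz + (yz - x^2)\,dz \wedge dx + (xz - y^2)\,dx \wedge dy$

4.5. 0

4.7. $2x\,dx + 2y\,dy$

4.9. $4y\,dx \wedge dy$

4.11. $(1 + xze^{xyz})\,dx \wedge dy \wedge dz$

4.13. $\frac{\partial F_1}{\partial x_1} - \frac{\partial F_2}{\partial x_2} + \frac{\partial F_3}{\partial x_3} - \frac{\partial F_4}{\partial x_4}$

4.14. (a) $1 + t^2$
(c) $\alpha^*(dy) = \cos t\,dt$, $\alpha^*(dz) = dt$
(d) $(1 + t)\,dt$

4.16. (b) $u^2\,du \wedge dv$

4.18. (a) $T^*(dx) = \frac{\partial T_1}{\partial u}\,du + \frac{\partial T_1}{\partial v}\,dv$
$T^*(dy) = \frac{\partial T_2}{\partial u}\,du + \frac{\partial T_2}{\partial v}\,dv$

4.21. (a) $F_1\,dx_1 \wedge dx_2 + F_2\,dx_1 \wedge dx_3 + F_3\,dx_1 \wedge dx_4 + F_4\,dx_2 \wedge dx_3 + F_5\,dx_2 \wedge dx_4 + F_6\,dx_3 \wedge dx_4$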
Index

acceleration, 28 closed curve, 212


add zero, 32, 78 closed set, 65
affine function, 84 closed surface, 252
annulus, 223 closure, 134, 235
arclength, 29, 30 Cobb-Douglas, 107
area complement, 65
as double integral, 131 composition of functions, 10, 11
as line integral, 219 conservation of energy, 230
of graph z = f (x, y), 152 conservative field, 211, 212, 221, 222, 234,
of parallelogram, 15, 36, 175 273, 274, 279
of sphere, 143, 238 continuity, 66, 98, 157
of surface, 142 contour map, 57
of surface of revolution, 152 critical point, 99
average value, 132, 144 cross product, 35
curl, 214
basis, 5
curl field, 251
binormal, 38
curvature, 40, 227
boundary
curve, 25
of curve, 274
cylinder, 59, 138, 236
of subset of R2 , 217
cylindrical coordinates, 137, 182
of subset of R3 , 255
of surface, 245, 248
Darboux formulas, 51
bounded
dependence diagram, 163
function, 127, 144
derivative
set, 127, 144
matrix, 85, 86, 159
C1 of differential form, 275
class, 90 of path, 28
test, 90, 98, 161 determinant, 15
C2 , 97 differentiability, 86, 159
C∞ , 98 differential form
Cartesian product, 127 0-form, 274
Cauchy-Schwarz inequality, 71 k-form, 274
centroid, 117, 132, 144 n-form, 274
chain rule, 161, 211, 262, 264, 280 1-form, 204
change of variables, 176, 178, 182, 210, 260, 2-form, 261
261, 263, 264, 276, 282, 283 directional derivative, 93
closed ball, 65 divergence, 253


dot product, 7, 21 law of cosines, 13


least squares, 117
eigenvalue, 120 level set, 57
eigenvector, 120 limit, 72, 158
line integral, 204, 208
first-order approximation, 83, 86, 91, 92, 95,
linear transformation
159, 161, 174, 283
definition, 6
flux, 235
matrix of, 8, 9, 175
Frenet vectors, 39
Little Chain Rule, 92, 93, 94, 103, 112, 163,
Frenet-Serret formulas, 42
165, 211
Fubini’s theorem, 129
local maximum/minimum, 99, 100
fundamental theorem of calculus, 279
manifold, 279
Gauss’s law, 259, 270
marginal productivity, 108
Gauss’s theorem, 255, 273, 274, 279
matrix
global maximum/minimum, 99
associativity of multiplication, 20
gradient, 85, 94
definition, 8
gradient field, 211
multiplication, 9, 10
graph, 55
mean value theorem, 89, 112, 116, 168
Green’s theorem, 217, 273, 274, 279
mixed partials
handedness, 27, 37, 283 of real-valued function, 96, 97, 115, 116,
harmonic function, 231, 270 150, 253
helicoid, 151, 152, 265 of vector field, 214, 221, 223
helix, 27, 39, 40, 41, 55 monkey saddle, 73, 120
Hessian, 100
nondegenerate critical point, 100
hypar, 74, 80
normal vector, 61, 94
hyperbolic paraboloid, 58, 74
octant, 123
implicit differentiation, 165, 169
one-to-one, 174
integral
open ball, 62
definition, 127, 128, 130, 144
open set, 63
iterated, 123, 129
orientation
of density, 131
of basis, 37, 283
of differential form, 274
of curve, 208
Riemann, 128
of surface, 236
integral curve, 201
of surface parametrization, 240
integral path, 201, 230
orthogonal, 14
integral with respect to arclength, 30, 203,
208, 227 parametrization
integral with respect to surface area, 141, 235 effect on line integral, 207, 208
isotherm, 95 effect on surface integral, 261, 263
of circle, 26
Jacobian
of curve, 25
determinant, 176
of graph, 26
matrix, 86, 159
of helix, 27
kinetic energy, 230 of line, 27
of surface, 134
Lagrange multipliers, 106 partial derivative, 85

path, 25 surface integral, 237, 241, 261


piecewise smooth symmetric matrix, 105, 119
curve, 210 symmetry
surface, 241, 249, 252 about the origin, 192
plane in y = x, 192
as span, 13 in y-axis, 180
equation of, 61 in plane, 185, 239
polar coordinates, 69, 136, 176
potential energy, 230 torsion, 40, 76
potential function, 211, 212 torus, 223, 268
principal normal, 33 transpose, 16
product rules, 41 triangle inequality, 70, 71
pullback, 178, 210, 240, 261, 264, 281, 282
unit tangent, 31
related rates, 112 unit vector, 14
Riemann sum, 30, 127, 130, 131, 141, 174,
180 vector
right-hand rule, 37 addition, 3
rotation matrix, 19 column, 9
norm, 12
saddle point, 99, 100 scalar multiplication, 3
saddle surface, 58, 74, 99, 100 unit, 14
sample points, 127 zero, 3
scalar, 3 vector field
second derivative test, 100 circulating, 198, 201, 205, 213, 218
second-order approximation, 102, 170 definition, 197
separate variables, 149, 179 inverse square, 199, 213, 238, 257
simple curve, 217 winding, 199, 219, 223
simply connected, 223 velocity, 28
smooth, 98, 161 volume
solid torus, 269 as n-dimensional integral, 144
span, 13 as double integral, 121, 131
spectral theorem, 105 as surface integral, 257
speed, 29 of ball, 183, 187, 193, 257
sphere, 58, 139, 143, 236, 241 of parallelepiped, 36
spherical coordinates, 138, 183
standard basis, 5 wedge product, 260, 261, 275
Stokes’s theorem, 249, 273, 274, 279, 280 winding number, 224
substitution, 176, 177, 210, 240, 261, 281 work, 203, 230
