UofT MAT137 Lecture Notes
© Tyler Holden, 2014-2015
Contents
1 Logic and Proofs
    1.1.1 Intervals
    1.2 Fundamental Logic
    1.2.1 If, Then
    Quantifiers
    Definitions
    1.4 Induction
2 Limits
    2.1.1 The Definition
    2.1.2 A Measure of Distance
    2.2 Limited Intuition
    Negation of Limits
    2.4 Limits at Infinity
    2.5 Continuity
3 Derivatives
    3.1 First Principles
    Failures of Differentiability
    3.2.2 Higher Derivatives
    3.2.3 Trigonometric Derivatives
    Rates of Change
    3.3.1 Some Physics
    The Derivatives
    The Functions
    3.6.2 Their Derivatives
    3.6.3 Logarithmic Differentiation
4 Applications Of Derivatives
    4.1 Implicit Differentiation
    4.2 Related Rates
    L'Hôpital's Rule
    Little o-notation
5 Integration
    Partitions
    5.4 Anti-Derivatives
6 Integration Techniques
    Volumes
8 Improper Integrals
    9.1 The Basics
10 Power Series
Many of you have already waded through the quagmire that is high-school calculus, endlessly
berated with salvos of mindless computational questions asking you to find numbers with which
you associate no meaning. This is what I call recipe mathematics, wherein the student is provided
with a recipe and the necessary ingredients, and is told to prepare a mathematical meal.
This is far removed from our goal in this class: our goal is to show the students how real
mathematics is done. We are going to be problem solving and proving theorems, and this is a
murky world which is hard to teach, and harder to learn.
Mathematics is often obsessed with rigour: the process of removing ambiguity and attaining the
highest possible level of absolute, infallible deduction. To do this, the student must first understand
the rigid framework in which mathematics is presented.
1.1
Before we can begin to speak complicated sentences, we must first learn the words of a language.
A set is any collection of well-defined and distinct objects. By this we mean that you can put as
many things as you like into a set, so long as they are concrete and all different. We often surround
the elements of a set by curly braces {, }, for example
{1, 2, 3, ...} ,
{^,
_}
,
{, , , } .
We can put anything we want into a set[1] so long as the object is a well-defined thing (for example,
we cannot consider the set of all objects which I think are interesting: what objects are in this
set? It is ambiguous), and all the elements of the set are distinct (so the object {1, 1, 2} is not a
set, because the element 1 appears multiple times).
We use the symbol $\in$ (read as "in") to talk about when an element is in a set; for example,
$1 \in \{1, 2, 3\}$ but _ $\notin$ {dog, cat}. We can also talk about subsets, which are collections of items in
a set and indicated with a $\subseteq$ sign. For example,
$\{2, 4, 6\} \subseteq \{1, 2, 3, 4, 5, 6\}$
since every element on the left-hand side is also present in the right-hand side.
Since sets can have many objects within them, it is often impractical to list them all explicitly.
Instead, we might use set-builder notation, which allows us to say "the set of all things which satisfy
some property." For example,
{x : x > 0}
is read as "the set of all x such that x is greater than 0," while
{month : month ends in "ber"} = {September, October, November, December},
[1] This is actually a complete lie, but the reason why it is a lie is rather subtle. The quintessential example of this is
something known as Russell's paradox. Let S be the set whose elements are the sets which do not contain themselves
as elements. Is S an element of itself? This is a self-referential paradox.
is the set of all months whose names end in "ber". Another way we
could say this is to let M be the set of all months, and consider the set
$\{x \in M : x \text{ ends in ``ber''}\}$.
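For the student who likes to experiment, set-builder notation has a close cousin in Python's set comprehensions. The following is only an illustrative sketch: the names `MONTHS` and `ber_months` are my own and are not part of these notes.

```python
# MONTHS plays the role of the set M of all months (a hypothetical name).
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# {x in M : x ends in "ber"} written as a set comprehension.
ber_months = {x for x in MONTHS if x.endswith("ber")}

print(sorted(ber_months))
```

The condition after `if` is the property that follows the colon in set-builder notation.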
More interesting than months are sets of numbers, since they tend to be quite big. Some sets
that we will be involved with a lot are as follows:
The naturals[2] $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$,
The integers $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$,
The rationals $\mathbb{Q} = \{p/q : p, q \in \mathbb{Z}, q \neq 0\}$,
The reals $\mathbb{R}$ (the set of all infinite decimal expansions).
I have been somewhat sloppy in defining the real numbers here since their construction is actually
rather involved. Nonetheless, I believe the average student is familiar with some of the basic
properties[3] of $\mathbb{R}$.
1.1.1
Intervals
In addition to the universes listed above, we will often find ourselves concerned with subsets of
$\mathbb{R}$, called intervals. Almost certainly the student has seen these in one form or another, but we
re-introduce them here to ensure that everyone is on the same page. Effectively, intervals represent
connected subsets of real numbers, such as
$1 \le x \le 5$,
$2 < x < 4$,
$-10 < x \le 1$,
$x > 10$.
These can be a pain to write down though, so for the sake of compactness we introduce a new
notation: If a, b are real numbers with a < b, the following hold[4]
$(a, b) = \{x \in \mathbb{R} : a < x < b\}$
$[a, b) = \{x \in \mathbb{R} : a \le x < b\}$
$(a, b] = \{x \in \mathbb{R} : a < x \le b\}$
$[a, b] = \{x \in \mathbb{R} : a \le x \le b\}$
Intervals of the form (a, b) are said to be open intervals, while [a, b] are said to be closed intervals.
The other two, [a, b) and (a, b], are half-open intervals. With a slight abuse of notation, we can also
choose to let our sets be unbounded by writing infinities:
$(-\infty, a) = \{x \in \mathbb{R} : x < a\}$
$(a, \infty) = \{x \in \mathbb{R} : x > a\}$
We must always use open brackets when enclosing the infinity, since to use closed brackets [, ] would
imply a point at infinity, and no such point exists in[5] $\mathbb{R}$. Combining these together, we may write
the set of all real numbers as $\mathbb{R} = (-\infty, \infty)$ (noting that we always use open brackets, since there
is no number infinity contained in the interval).
1.2
Fundamental Logic
The logic we will use will be similar to that found in computer science and philosophy classes,
and while we will introduce it in a philosophical context, do not underestimate its importance
throughout mathematics.
1.2.1
If, Then
We say that P(x) is a proposition if it has a truth value associated to it; for example, we might
have P(x) = "x is a dog," in which case
P(Lassie) = true,
P(Shitzu) = true,
This allows us to string together logical sentences, possibly the most important of which is the
"if, then" statement: Let P(x), Q(x) be propositions, and consider
If P(x), then Q(x).
This is a sentence, and we can always write it down regardless of what P(x) and Q(x) might mean.
However, what we would really like to do is decide whether the sentence is true. For example, if
P(x) = "x is a dog" and Q(x) = "x is an animal" then the statement
If P(x), then Q(x)
translates to
If x is a dog, then x is an animal.
This sentence is certainly true, since every dog is an animal. More generally (and confusingly), the
statement "If P(x), then Q(x)" is true if whenever P(x) is true then Q(x) is true.[6] In the interest
of removing all words, we sometimes write $P(x) \Rightarrow Q(x)$ in lieu of the "If, then" sentence.
The converse of $P(x) \Rightarrow Q(x)$ is the statement $Q(x) \Rightarrow P(x)$. Note that these are not logically
equivalent! For example, taking P(x) and Q(x) as before, the converse of "If x is a dog, then x is
an animal" becomes "If x is an animal, then x is a dog," and this is certainly false, since not all
animals are dogs.
The interesting thing happens when $P(x) \Rightarrow Q(x)$ and the converse $Q(x) \Rightarrow P(x)$ are
both true. In this case we write $P(x) \Leftrightarrow Q(x)$ (read as "P(x) if and only if Q(x)"), and this means
that P(x) and Q(x) are logically equivalent; namely, the truth of P(x) is exactly the same as the
truth of Q(x). For example,
[5] When we formally add the points $\pm\infty$ to $\mathbb{R}$, we get what are called the extended reals. Weirder yet, if we make
$+\infty = -\infty$, then what we get in the end is a circle! I may elaborate more on this later in these notes.
[6] $P(x) \Rightarrow Q(x)$ also evaluates to true whenever P(x) is false, and this is called a vacuous truth. However, I do not
want to confuse the student with why this is true right now, so you should ignore this fact if you find it confusing.
If a and b are two odd integers, then their product ab is also an odd integer.
Solution. Recall that even integers can be written as 2n for some n (since they are divisible by 2)
and so odd integers are written as 2n + 1. Since we have been told that a and b are odd, we know
we can find m and n such that a = 2n + 1 and b = 2m + 1. Multiplying them together we get
ab = (2n + 1)(2m + 1) = 4nm + 2n + 2m + 1 = 2(2nm + n + m) + 1.
This number is again of the form 2(integer) + 1 and hence is odd, as we wanted to show.
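A quick computer experiment can build confidence in a statement like this before (never instead of) proving it. The sketch below checks every pair of odd integers in a small window; the window size and the names `is_odd`, `odd_numbers`, and `counterexamples` are my own choices for illustration.

```python
def is_odd(n: int) -> bool:
    # n is odd exactly when it leaves remainder 1 on division by 2.
    # Python's % returns a non-negative remainder here, even for negative n.
    return n % 2 == 1

# All odd integers in a small window around 0.
odd_numbers = [n for n in range(-25, 26) if is_odd(n)]

# Collect any pair (a, b) of odd integers whose product fails to be odd.
counterexamples = [
    (a, b) for a in odd_numbers for b in odd_numbers if not is_odd(a * b)
]

print(counterexamples)  # an empty list means no counterexample was found
```

Finding no counterexample in a finite window is only evidence; the algebra above is what shows the claim for every pair of odd integers.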
1.2.2
If P(x) is a proposition, we can negate its truth by creating the proposition "not P(x)," often
written as $\neg P(x)$. This means that whenever P(x) is true, $\neg P(x)$ is not true; namely, $\neg P(x)$ is false.
If P(x) is as before
$\neg P$(Lassie) = false,
$\neg P$(Shitzu) = false,
We mentioned in the last section that $P(x) \Rightarrow Q(x)$ is not equivalent to the converse, but what
it is equivalent to is the contrapositive. The contrapositive of $P(x) \Rightarrow Q(x)$ is $\neg Q(x) \Rightarrow \neg P(x)$; in
our usual example, this becomes
If x is not an animal, then x is not a dog.
This is definitely still true. The idea is that if you're not even an animal, then you had no chance
of being a dog.
The following is a (contrived) example:
Example 1.2
$x - x^2 = 0$
$x = 0$ or $x = 1$
$x(1 - x) = 0$
Quantifiers
Now we add some spice to our logical lives by introducing the notion of quantifiers. Quantifiers
do exactly as the word implies: they denote some sort of quantity. We have two quantifiers in
mathematics, the universal quantifier ($\forall$) and the existential quantifier ($\exists$). Remembering these as
universal and existential would be silly, and so the student should instead read these as $\forall$ = "for
all" and $\exists$ = "there exists." The easy way to remember this is that $\forall$ looks like an upside-down A
corresponding to "All," while $\exists$ looks like a backwards E, corresponding to "Exists."[7] As an example,
consider the sentence
$\forall x \in \mathbb{R}, \exists y \in \mathbb{R}, y > x.$
Right now this is just a bunch of symbols, and our job is to translate it into English. Using our
newfound knowledge of quantifiers, we can directly translate this as
For all x in the real numbers, there exists y in the real numbers, y is greater than x.
This is not particularly enlightening, so we take some linguistic liberties and re-arrange to say
For every real number x, there is a real number y which is bigger than x.
That's much easier to read and to understand. In fact, we can even go one step further along the
translation and write
There is no largest real number,
though that might have been a big conceptual jump from the last line (be sure to think about why
those two sentences are the same!). This is a true statement: there is no largest real number.
Order of Quantifiers: It is very important to realize that the order in which the quantifiers
appear matters, as they do not commute with each other. More precisely, we always have
that $\forall x \forall y = \forall y \forall x$ and $\exists x \exists y = \exists y \exists x$, but it may not be the case that $\forall x \exists y = \exists y \forall x$. The student
is likely screaming about the equality sign here. What does it mean for these things to be equal!?
[7] Why is the A upside down and the E backwards? This is probably related to the mnemonic above. Unfortunately,
the A and E exhibit symmetry relations: A is symmetric through a vertical axis while E is symmetric through a
horizontal axis. Thus we must flip the A and reverse the E, necessitating the orientation we see today.
By equality, I simply mean that they evaluate to the same level of truth. An example should shed
some light on this discussion.
Example 1.3
(1.1)
$\exists x \in \mathbb{Z}, \forall y \in \mathbb{Z}, x + y = 0.$
(1.2)
(1.3)
We can do our same translation trick, to determine that the corresponding English sentence is
There is an element which is the negative of every integer.
Let us think about what this means. This says there is a number to which we can add any
other number and always get zero. Certainly this is not true! If it were, then there would be a
number n such that n + a = 0 and n + b = 0 for any integers a and b. Equating these expressions,
we would find that n + a = n + b which in turn implies that a = b. This would force all integers to
be equal, and this is clearly nonsense.
By changing the order of the quantifiers, we have changed whether the statement is true or not,
and hence we must be careful about the order in which quantifiers are presented.
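The effect of swapping quantifiers can be seen concretely on a computer, where Python's `all` plays the role of "for all" and `any` plays the role of "there exists." Since a computer cannot loop over all of the integers, the sketch below assumes a small symmetric window `DOMAIN` as a stand-in; that window, and the variable names, are my own and not part of the notes.

```python
# A finite, symmetric stand-in for the integers (an assumption for illustration).
DOMAIN = range(-5, 6)

# "For all x, there exists y, x + y = 0": every x has a negative.
forall_exists = all(any(x + y == 0 for y in DOMAIN) for x in DOMAIN)

# "There exists x, for all y, x + y = 0": one x is the negative of everything.
exists_forall = any(all(x + y == 0 for y in DOMAIN) for x in DOMAIN)

print(forall_exists, exists_forall)
```

The two nested expressions differ only in which quantifier is on the outside, yet they evaluate to different truth values.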
Negating Quantifiers: Negating quantifiers turns out to be rather easy. Let P(x) be a proposition; then negation of quantifiers works as follows:
Statement              Negation
$\forall x, P(x)$      $\exists x, \neg P(x)$
$\exists x, P(x)$      $\forall x, \neg P(x)$
As we can see, negating the universal quantifier gives the existential quantifier and vice versa.
Whenever we go through a sentence, we just negate all of its parts.
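The negation rules can be sanity-checked on a finite set, again reading `all` as "for all" and `any` as "there exists." This is a sketch under assumptions: the window `DOMAIN` and the sample proposition `P` are hypothetical choices of mine.

```python
# A finite stand-in domain and a sample proposition (both hypothetical).
DOMAIN = range(-5, 6)

def P(x: int) -> bool:
    return x * x > 3

# Negating "for all x, P(x)" should agree with "there exists x, not P(x)".
not_forall = not all(P(x) for x in DOMAIN)
exists_not = any(not P(x) for x in DOMAIN)

# Negating "there exists x, P(x)" should agree with "for all x, not P(x)".
not_exists = not any(P(x) for x in DOMAIN)
forall_not = all(not P(x) for x in DOMAIN)

print(not_forall == exists_not, not_exists == forall_not)
```

Swapping in a different proposition for `P` should never break either agreement.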
Example 1.4
Now intuitively, this should say the opposite of what we determined above; namely "There is some
integer that does not have a corresponding negative number." Let's see if we get this using the
same techniques we employed in Example ??. Translating verbatim into English we get
There exists an x in the integers, for all y in the integers, $x + y \neq 0$.
This is a good start. Refining it further we get
There is an integer such that no matter what we add to it, we can't get 0.
And this of course finally becomes the sentence we were expecting.
For (1.2) we move a bit quicker. The statement is
$\exists x \in \mathbb{Z}, \forall y \in \mathbb{Z}, x + y = 0$
and its negation is therefore
$\forall x \in \mathbb{Z}, \exists y \in \mathbb{Z}, x + y \neq 0.$
Exercise:
(1.4)
1.3
1.3.1
1.3
Definitions are just what their name implies: they are a way of defining/creating a new notion by
using older notions. For example, a random person off the street might walk up to you and ask
"What is a prime number?"
While we often have fuzzy, intuitive ideas about many notions, it does not do well to build a
structure upon shaky ground. Since mathematics continuously builds upon itself, it is essential
that we ensure our foundation is rock-solid. An answer to the above question might be something
along the lines of
"A prime number is a number whose only factors are 1 and itself."
This is an acceptable definition in most cases, but what if I were to now ask you "Is 1 a prime
number?" This is a somewhat tricky question: certainly the only factor of 1 is itself, but something
feels dubious about this definition. It turns out that 1 is not prime, and the reason we were unable
to deduce this is because our definition above is not sufficient.[8]
The additional course notes (located on the course website) have a great treatment of how to
turn the intuitive idea of an increasing function into a proper definition. I will leave this to the
student to read on his/her own. Instead, I'm going to introduce a different notion which we will use
later in the course. I will first give you the intuitive idea, and then we will work towards building
up a proper definition.
Injective Functions: Here's the idea that we want to define: we know that some functions are
"many-to-one" in the sense that different inputs can yield the same output. A clear example of
this is the square function $f(x) = x^2$. Notice that $f(-2) = 4 = f(2)$, and in general for any $a \neq 0$
we will have $f(-a) = a^2 = f(a)$. This function is thus "two-to-one": there are two inputs which map
to a single output. Similarly, the function $g(x) = \sin(x)$ is "infinite-to-one," since for any $x \in \mathbb{R}$ we have
$\sin(x) = \sin(x + 2\pi) = \sin(x + 4\pi) = \cdots$.
We want to determine when a function is "one-to-one," also known as an injective function.
There is a geometric way of determining whether a function is one-to-one, called the horizontal
line test. Exactly akin to the vertical line test for determining whether a map is a function, the
horizontal line test is roughly stated as
The function f (x) satisfies the horizontal line test if any horizontal line drawn in the plane meets
the graph of f (x) at most once.
[8] The real definition of a prime is somewhat complicated. The student should take it as "If $p \in \mathbb{Z}$ is a number
not equal to 1, then p is prime if its only factors are 1 and itself." More generally, mathematicians define a prime as
follows: "If $p \in \mathbb{Z}$ does not have a multiplicative inverse, then p is a prime if whenever p divides a product ab, then p
divides a or p divides b." This is a better definition because it extends nicely to sets other than just $\mathbb{Z}$.
Okay, so we are starting to get an idea for how these functions work, but we need to write down
a formal definition. How do we start? There are two ways we can go, but I'm going to choose one
and then I'll show you the other afterwards:
A function f is injective on (a, b) if whenever we plug in two different numbers, the value of f
at those two numbers is different.
This roughly captures what we are trying to say: different inputs lead to different outputs. Clearly
$f(x) = x^2$ does not satisfy this, since we can put in different numbers ($-2$ and $2$) and get the same
output. Still, it's not a very rigorous definition: what are the inputs? What are the outputs? Let's
spruce it up a bit more.
A function f is injective on (a, b) if whenever $x, y \in (a, b)$ and $x \neq y$ then $f(x) \neq f(y)$.
This is actually a pretty good definition so far, but we could make it even better by throwing
in a universal quantifier and mathematizing everything:
Definition 1.5
A function $f : (a, b) \to \mathbb{R}$ is injective if
$\forall x, y \in (a, b), \quad x \neq y \Rightarrow f(x) \neq f(y).$
This is a nice definition in terms of understanding the concept, but it turns out to not be
terribly practical to show that a function is injective using this definition. Far more useful is the
contrapositive of this statement. Take a moment to see if you can determine the answer before
reading on.
A function $f : (a, b) \to \mathbb{R}$ is injective if for all $x, y \in (a, b)$, $f(x) = f(y)$ implies that $x = y$.
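The contrapositive form is also the one that is easiest to test by machine. The sketch below looks for two different sampled inputs with the same output; the helper name `is_injective_on` and the integer sample are hypothetical choices of mine. A sample can expose a failure of injectivity, but passing the test on finitely many points does not prove injectivity.

```python
def is_injective_on(f, points) -> bool:
    """Return False if two different sampled inputs share an output."""
    seen = {}  # maps each output value to the input that produced it
    for x in points:
        value = f(x)
        if value in seen and seen[value] != x:
            # f(x) == f(x') with x != x', so "f(x) = f(y) implies x = y" fails.
            return False
        seen[value] = x
    return True

sample = range(-10, 11)  # integer sample points, to avoid rounding issues

print(is_injective_on(lambda x: x ** 2, sample))  # squares collide at x and -x
print(is_injective_on(lambda x: x ** 3, sample))  # no collision in this sample
```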
Example 1.6
Exercise:
1.3.2
Theorems (or to a lesser degree, lemmas and propositions) are the major results of mathematics.
Up to poetic licence, these are phrased in the form
Theorem: If HYPOTHESES, then RESULTS
When proving a theorem, one typically starts by assuming the hypotheses, then showing that
the results are true. When applying a theorem, one must first check that the hypotheses are
satisfied. Let's jump into examples to elucidate what we mean.
Theorem 1.7
is equivalent to
f (m) = f (n).
Using the fact that f is injective implies that m = n, or equivalently, g(x) = g(y).
Now we get stuck again since we can't seem to reduce any further. Looking back at our
hypotheses, we have yet to use the fact that g is injective. But this is precisely what we need to
finish our proof; namely, since g is injective, g(x) = g(y) implies that x = y.
So let's quickly summarize what we found. We showed that if f(g(x)) = f(g(y)) then g(x) =
g(y), and this implied that x = y. This is the definition of what it means for $f \circ g$ to be injective,
so we are done with the proof!
Example 1.8
Assume that $g : \mathbb{R} \to \mathbb{R}$ is an injective function. Show that $g(x)^3$ is also an injective function.
Solution. If we think about this for a moment, we see that by defining $f(x) = x^3$ we get $g(x)^3 =
f(g(x))$. Now we would like to apply Theorem 1.7, but first we must check that the hypotheses
apply; that is, both f and g are injective.
We were told that g(x) was injective to start with, so there's nothing to check there. The
diligent student will have shown in Exercise 1.3.1 that $f(x) = x^3$ is injective. Hence both f and g
are injective, and Theorem 1.7 applies. We conclude that the composition $f \circ g$ is also injective,
hence $g(x)^3$ is injective as required.
Theorem 1.9
Proof. This is a result that often confuses students, since it seems to go the wrong way compared
to our last result. We are given two functions, and all we know is that when we combine them to
form $f \circ g$ then this new function is injective. From that, we would like to deduce that g is injective.
Hypotheses: $f \circ g$ is injective; that is, f(g(x)) = f(g(y)) implies that x = y.
Want to show: g is injective; that is, g(x) = g(y) implies that x = y.
Okay, so to show that g is injective, we should start with the expression we always start with
when we want to show injectivity; namely, assume that $x, y \in \mathbb{R}$ satisfy g(x) = g(y). At this point
we are stuck and do not know what to do, so we go back to our hypotheses. By applying f to both
sides of our equation, we get f(g(x)) = f(g(y)). Now our hypothesis tells us that x = y, which
is what we wanted to show.
Example 1.10
Find an example of functions f and g such that $f \circ g$ is injective but f is not injective.
Conclude that the converse of Theorem 1.7 does not hold.
1. Let $f(x) = x^2$ and $g(x) = \sqrt{x}$. The student can easily check that f is not injective but g is
injective. Now $f(g(x)) = (\sqrt{x})^2 = x$ is an injective function, as required.
2. Consider the functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ given by $f(x) = \tan(x)$ and
$g(x) = \arctan(x)$. One can show that g(x) is injective, and it is easy to see that f(x) is not
injective. However, their composition is
$f(g(x)) = \tan(\arctan(x)) = x$
1.4
Induction
Proof by mathematical induction is a proof technique used to show that a result holds for every
natural number $n \in \mathbb{N}$. For example, let's try and show that for every $n \in \mathbb{N}$, $2n \le 2^n$.
Let's try out a few values of n and see what happens:
n    2n    2^n    2n ≤ 2^n
0    0     1      true
1    2     2      true
2    4     4      true
3    6     8      true
4    8     16     true
5    10    32     true
The student can likely see the pattern, and it is apparent that $2^n$ is growing very quickly, so $2n$
will never catch up and the result will be true. Nonetheless, this is not a mathematical proof! It's
not enough to wave your arms and say "it looks like it should be correct, so it is!"[10]
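The table of values above is easy to regenerate, and to extend, with a few lines of Python. This is only a sketch for experimentation; the name `rows` is my own, and checking finitely many rows is exactly the arm-waving that induction is meant to replace.

```python
# Each row holds n, 2n, 2^n, and whether 2n <= 2^n.
rows = [(n, 2 * n, 2 ** n, 2 * n <= 2 ** n) for n in range(6)]

print("n   2n   2^n   2n <= 2^n")
for n, linear, exponential, holds in rows:
    print(f"{n:<3} {linear:<4} {exponential:<5} {holds}")
```

Changing `range(6)` to a larger range extends the table as far as the student has patience for.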
What mathematical induction brings to the table is a very succinct way of showing that something holds for all n, and it relies on the following principle:
1. Base Case: Show that P (1) is true (or P (k), where k is the smallest number for which the
result holds).
2. Induction Step: Show that, under the assumption that P (k) is true, P (k + 1) is also true.
Step (1) just confirms that P(1) is true according to our induction principle, while step (2) is
the $P(k) \Rightarrow P(k + 1)$ step.
Example 1.11
(2k + 2) + 2
= 2k + 4 = 2(k + 1) + 2.
Show that $2n \le 2^n$ for all $n \in \mathbb{N}$. [Hint: You will need to use Example ??.]
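Example 1.12
\[ \sum_{n=1}^{k} \frac{1}{n(n+1)} = \frac{k}{k+1}. \]
1. Base Case: When $k = 1$,
\[ \sum_{n=1}^{1} \frac{1}{n(n+1)} = \frac{1}{1(1+1)} = \frac{1}{2} = \frac{1}{1+1}. \]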
2. Induction Step: Let k be some fixed but arbitrary number and assume that
\[ \sum_{n=1}^{k} \frac{1}{n(n+1)} = \frac{k}{k+1}. \]
We would like to show that the result holds for k + 1. It makes most sense to start by working
with the left-hand side, since it will give us the most flexibility. Notice that
\[ \sum_{n=1}^{k+1} \frac{1}{n(n+1)} = \sum_{n=1}^{k} \frac{1}{n(n+1)} + \sum_{n=k+1}^{k+1} \frac{1}{n(n+1)} \]
\[ = \frac{k}{k+1} + \frac{1}{(k+1)(k+2)} \]
\[ = \frac{k(k+2) + 1}{(k+1)(k+2)} = \frac{k^2 + 2k + 1}{(k+1)(k+2)} \]
\[ = \frac{(k+1)^2}{(k+1)(k+2)} = \frac{k+1}{k+2} \]
\[ = \frac{(k+1)}{(k+1)+1}. \]
The point I would like to make is that while the above may have looked somewhat messy, there
was really only one thing we could have done. By intelligently setting up the problem, the rest of
the work simply fell out.
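For the skeptical student, the closed form can be spot-checked with exact rational arithmetic using Python's standard `fractions` module. The function name `partial_sum` and the range of k tested are my own choices; as always, the check supports the result but the induction argument is what proves it.

```python
from fractions import Fraction

def partial_sum(k: int) -> Fraction:
    # Exact value of 1/(1*2) + 1/(2*3) + ... + 1/(k*(k+1)).
    return sum(Fraction(1, n * (n + 1)) for n in range(1, k + 1))

# Compare against the closed form k/(k+1) for the first fifty values of k.
checks = [partial_sum(k) == Fraction(k, k + 1) for k in range(1, 51)]

print(partial_sum(4))  # exact fraction, no rounding
print(all(checks))
```

Using `Fraction` rather than floating-point numbers avoids rounding, so the equality test is exact.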
Definition 1.13
Let m, n be integers. We say that $m \mid n$ (read as "m divides n") if there exists some integer k
such that mk = n.
To see what this means intuitively, notice that in general when we divide two integers, we get
a rational number. For example, $\frac{3}{2}$ is rational and not an integer. However, we say that $m \mid n$ if $\frac{n}{m}$
is actually an integer. Thus $2 \mid 4$ since $\frac{4}{2} = 2$.
Example 1.14
\[ 6^{k+1} - 1 = (1 + 5)(6^k) - 1 = 5(6^k) + (6^k - 1). \]
We claim that 5 divides this number. To see that this is the case, let us divide by 5 and see
what we get.
\[ \frac{6^{k+1} - 1}{5} = \frac{5(6^k) + (6^k - 1)}{5} = \frac{5 \cdot 6^k}{5} + \frac{6^k - 1}{5} = 6^k + d \quad \text{by induction hypothesis.} \]
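The computation above can be mirrored numerically. In the sketch below, I take d to be the integer with $6^k - 1 = 5d$ (my reading of the induction hypothesis), and the helper `divides` and the range of exponents are hypothetical choices for illustration.

```python
def divides(m: int, n: int) -> bool:
    # m | n means n = m * k for some integer k; for m != 0 this is n % m == 0.
    return n % m == 0

# Check that 5 divides 6^n - 1 for the first thirty exponents.
results = [divides(5, 6 ** n - 1) for n in range(1, 31)]

# Mirror the induction step: if 6^k - 1 = 5d, then 6^(k+1) - 1 = 5(6^k + d).
k = 4
d = (6 ** k - 1) // 5
step_holds = 6 ** (k + 1) - 1 == 5 * (6 ** k + d)

print(all(results), step_holds)
```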
Limits
2.1
Absolute values often give students trouble, but will be absolutely crucial for the upcoming section.
Our goal is thus twofold: the first is to ensure that the student understands the intuition for absolute
values and why we use them; the second objective is to show students how to deal with problems
that involve absolute values.
2.1.1
The Definition
The absolute value of a number n is denoted |n|, and many students often interpret the absolute
value as being the thing which makes n positive. This is correct to a degree, as evidenced by the
proper definition of the absolute value.
Definition 2.1
\[ |x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0. \end{cases} \]
(2.1)
This definition is a little weird: we know that the absolute value of a number should always
be positive, so why does this negative sign appear in the definition? Let's try computing a few
absolute values using this definition, and hopefully that will make things clear:
Example 2.2
Solution. We start by computing $|3|$. By looking at (2.1) we see that since $3 > 0$ the absolute value
is given by just writing down the argument; namely, $|3| = 3$.
To compute $|-4|$ we must be a bit more careful. Clearly $-4 < 0$ which means we fall into
the bottom half of the piecewise definition, so the absolute value is the negative of the argument;
namely $|-4| = -(-4) = 4$.
The absolute value performed as expected: when we put a negative number into it, we got its
positive counterpart out. The fact is that when $x < 0$ we know that x is a negative number. The
definition of the absolute value then tells us to set $|x| = -x$, which in turn guarantees that $-x > 0$.
More than just a technical definition, (2.1) is actually the key tool to apply when dealing with
problems that involve absolute values: We break the problem into a piecewise problem using the
definition.
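The piecewise nature of the absolute value translates directly into a two-branch function. The sketch below is my own illustration (the name `absolute_value` is hypothetical); Python already provides a built-in `abs`, which is used here only for comparison.

```python
def absolute_value(x):
    # Top branch of the piecewise definition: a non-negative argument is returned unchanged.
    if x >= 0:
        return x
    # Bottom branch: a negative argument is negated, which makes it positive.
    return -x

print(absolute_value(3), absolute_value(-4))
print(all(absolute_value(x) == abs(x) for x in range(-10, 11)))
```

The minus sign in the second branch is the same one that looks so suspicious in the definition: it is only ever applied to negative numbers.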
Example 2.3
\[ |x + 3| = \begin{cases} x + 3 & x \ge -3 \\ -x - 3 & x < -3. \end{cases} \]
This allows us to break the problem into two smaller problems, based on whether $x \ge -3$ or
$x < -3$.
Case $x \ge -3$: When $x \ge -3$ we have $|x + 3| = x + 3$, so our problem becomes $x + 3 > 2x + 4$.
This is now easy to solve and we get that $x < -1$. Since we must have both $x \ge -3$ and
$x < -1$, this will only be satisfied for $x \in [-3, -1)$.
Case $x < -3$: When $x < -3$ we have $|x + 3| = -x - 3$, so our problem becomes $-x - 3 > 2x + 4$
which we can solve to get $x < -7/3$. Hence we must have both $x < -3$ and $x < -7/3$, which
means that our inequality is satisfied when $x \in (-\infty, -3)$.
Combining both solutions, we get that $|x + 3| > 2x + 4$ when $x \in (-\infty, -3) \cup [-3, -1) =
(-\infty, -1)$.
To re-iterate: when solving a problem which involves an absolute value, break the problem into
cases to get rid of the absolute value, then solve as normal.
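A solution set obtained by cases can be spot-checked by comparing it against the original inequality on a grid of points. The sketch below does this for $|x + 3| > 2x + 4$ with claimed solution set $(-\infty, -1)$; the grid, the exact-arithmetic `Fraction` type, and the function names are my own choices.

```python
from fractions import Fraction

def satisfies(x) -> bool:
    # The original inequality, evaluated directly.
    return abs(x + 3) > 2 * x + 4

def in_solution_set(x) -> bool:
    # The answer found by splitting into cases: x lies in (-infinity, -1).
    return x < -1

# Grid of exact rationals from -10 to 10 in steps of 1/10.
grid = [Fraction(n, 10) for n in range(-100, 101)]

# Points where the direct test and the claimed answer disagree.
mismatches = [x for x in grid if satisfies(x) != in_solution_set(x)]

print(mismatches)  # an empty list means the two agree on the whole grid
```

Note that the grid includes the boundary points $-3$ and $-1$, which is where mistakes in case analysis tend to hide.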
Very quickly, here are the important properties of the absolute value:
Proposition 2.4
2.1.2
A Measure of Distance
This tells us how to deal with absolute values when they arise and we will see more examples shortly,
but let's stop for a moment and think about what an absolute value means. An absolute value is
a measure of distance or length, in particular, from the number 0. For example, |3| is the distance
from the number 0 to the number 3: unsurprisingly, this is 3 units. On the other hand, $|-4|$ is the
distance from 0 to the number $-4$, and that is 4 units (since distances are never negative).
Of far greater interest is that we can use the absolute value to measure distances between two
non-zero numbers. For any $x, y \in \mathbb{R}$, the distance between x and y is $|x - y|$. For example, what
do we intuitively expect the distance to be between the numbers $-1$ and $5$? I think it's going to
be 6 units, and indeed $|(-1) - 5| = |-6| = 6$ as we expected. Of course, distance does not care
about which element came first and we could have just as easily written $|5 - (-1)| = |6| = 6$ and
gotten the same answer.
An incredibly useful thing that we should check is the following: Assume that $r > 0$ is some
real number. What is the set $\{x \in \mathbb{R} : |x| < r\}$? If we read this properly, this should be the set of
all elements that are within r of 0; namely, $(-r, r)$. Let's see if this is true:
First, we break the absolute value down into cases. We know that if $x \ge 0$ then $|x| = x$, and so
if $x \ge 0$ and $x < r$ then $x \in [0, r)$. Similarly, if $x < 0$ then $|x| = -x$ and we have $-x < r$ which is
the same as $x > -r$. In this case, we get $x \in (-r, 0)$. Combining the two solutions, we get
$x \in (-r, 0) \cup [0, r) = (-r, r)$
as required. We don't always want to write down intervals though, and we know that $x \in (-r, r)$
is the same thing as $-r < x < r$. Hence we can often make the following transformation:
$|x| < r \iff -r < x < r.$
(2.2)
Example 2.5
Solution. Equation (2.2) shows us that writing $|x| < r$ is the same thing as $-r < x < r$, and this is
very symmetric. To write (1, 3) in the same way, we notice that the centre of (1, 3) is the number
2, and that all points are within a distance of 1 from the centre. We hypothesize that
$(1, 3) = \{x \in \mathbb{R} : |x - 2| < 1\}$.
Let's check to see if this is correct. Using a modified version of Equation (2.2) we get
$|x - 2| < 1 \iff -1 < x - 2 < 1$
$\iff -1 + 2 < x - 2 + 2 < 1 + 2$
$\iff 1 < x < 3$
$\iff x \in (1, 3)$.
Example 2.6
Solution. There are many ways to solve this, and since we were only asked to find an upper bound,
there are infinitely many different solutions. One could probably guess that if we chose a very large
number, it would probably bound $|2x + 4|$, but we 1) want to prove it mathematically, and 2) often
want as good a bound as possible.
Now certainly the function $|2x + 4|$ is not bounded in general, but the fact that $|x + 1| < 2$ means
that we are only looking at a very small segment of the real line. The first method for solving this
is to unravel the definition of $|x + 1| < 2$ and see if we can transform it into something that looks
like $|2x + 4|$. Indeed, we know that
$|x + 1| < 2 \iff -3 < x < 1$
$\iff -6 < 2x < 2$ (multiply by 2)
$\iff -2 < 2x + 4 < 6$ (add 4)
So we now have upper and lower bounds on the function $2x + 4$, but we want to turn this into
bounds on the function $|2x + 4|$. If we think about this, we can see that since $2x + 4$ can only take
numbers in the interval $(-2, 6)$, then the largest it can be in absolute value is 6, hence $|2x + 4| < 6$.
Alternatively, if we are clever we can use the triangle inequality. Since $|x + 1| < 2$, we have that
$|2x + 4| = |2(x + 1) + 2| \le 2|x + 1| + 2 < 2(2) + 2 = 6$
which is the same result we had before.
Exercise: Let f (x) be a function such that a < f (x) < b. Convince yourself that |f (x)| <
max {|a|, |b|}.
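As a quick sanity check (not a proof), the bound can be tested numerically. The sketch below samples points with $|x + 1| < 2$ and compares the largest observed value of $|2x + 4|$ with $\max\{|a|, |b|\}$ for $a = -2$, $b = 6$. The helper name `max_abs_on_interval` and the sample count are my own choices for illustration.

```python
def max_abs_on_interval(f, left, right, samples=10_001):
    """Largest |f(x)| over evenly spaced points strictly inside (left, right)."""
    step = (right - left) / (samples + 1)
    return max(abs(f(left + k * step)) for k in range(1, samples + 1))


def f(x):
    return 2 * x + 4


# |x + 1| < 2 describes the open interval (-3, 1).
observed = max_abs_on_interval(f, -3.0, 1.0)

# The exercise predicts |f(x)| < max{|a|, |b|} with a = -2 and b = 6.
bound = max(abs(-2), abs(6))

print(observed, "<", bound)
```

The observed maximum creeps up towards 6 as more sample points are used, but never reaches it, which is consistent with the strict inequality.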
2.2
Limited Intuition
Limits are the method by which we, as manifestly finite beings, deal with concepts of infinities and
infinitesimals. The key goal towards which we are looking is the description of an instantaneous
rate of change, so let us ponder what this means.
The majority of us have been in a car at some point or another, and have afforded a casual
glance at the speedometer. Let us say that at the instant we look down, the speedometer reads 90
km/hr. Have you ever thought about what it means, at that single instant in time, to be travelling
at that speed? As suggested by its units, speed is an object which requires both distance and time
to measure, but at a single moment, neither any time nor any distance has passed, so what does
this mysterious quantity mean?
Despite my claims that the previous example should get you thinking about how the word
'instantaneous' really affects a quantity, many of you will simply shrug aside my suggestions. In
anticipation of this reaction, what if we change the associated quantities around and, instead of the
instantaneous speed of a car, we discuss shopping? At any given point of time, somebody on this
planet is making a purchase. Let us assume that we were able to measure the rate at which people
were spending money, and I told you that at this moment in time the human species was globally
spending 140 million dollars an hour. What does this mean?
Now on the other hand, what if you were asked to determine the instantaneous speed of a
race car at the instant its front bumper passes a finish line? Being clever students, you decide to
measure how far the car has travelled in the minute before it hits the finish line, and get a result
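of 1500 metres. Hence the car was travelling
$$\frac{1500 \text{ metres}}{1 \text{ minute}} \times \frac{1 \text{ kilometre}}{1000 \text{ metres}} \times \frac{60 \text{ minutes}}{1 \text{ hour}} = \frac{90 \text{ kilometres}}{1 \text{ hour}}.$$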
But what if the car's speed was not constant during that minute? What if the driver accelerated at
the end? You decide that you can get a better estimate of the speed at the finish line by instead
just looking at how far the car travelled in the single second before the car hit the finish line. This
time the car travelled 30 metres, so you calculate
$$\frac{30 \text{ metres}}{1 \text{ second}} = \frac{108 \text{ kilometres}}{1 \text{ hour}}.$$
But still, this does not account for any change in acceleration which occurred in the last second.
Your guess of 108km/hr is probably close, but close is not good enough in mathematics! So you try
again by measuring the distance after 0.1 seconds, then 0.01 seconds, and so on (does this remind
you of anything?), but no matter how hard you try you cannot get the exact speed because there is
always the chance that the car was not travelling at a constant speed during your measurements.
Nonetheless, we know there must be an answer: the car was travelling at some speed, so what is
it? Limits provide the solution.
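The race car thought experiment is easy to play with on a computer. The sketch below uses a made-up position function (the formula in `position` and the finish time are assumptions of mine, chosen only so that the numbers resemble those above) and computes the average speed over shorter and shorter windows before the finish line.

```python
def position(t):
    """Hypothetical distance (in metres) travelled after t seconds.

    This formula is an illustrative assumption, not data from the notes.
    """
    return 0.05 * t * t + 20.0 * t


FINISH_TIME = 100.0  # assumed instant (in seconds) at which the car crosses the line


def average_speed(h):
    """Average speed in m/s over the final h seconds before the finish."""
    return (position(FINISH_TIME) - position(FINISH_TIME - h)) / h


for h in (60.0, 1.0, 0.1, 0.01, 0.001):
    metres_per_second = average_speed(h)
    print(h, metres_per_second, metres_per_second * 3.6)  # 3.6 converts m/s to km/hr
```

For this particular position function the averages work out to $30 - 0.05h$ metres per second, so they settle towards 30 m/s (that is, 108 km/hr) as the window $h$ shrinks, even though no single window gives the exact answer.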
2.2.1
Limits are the mathematical device which allow us to infer information about a (possibly misbehaved) point by analyzing information about well-behaved points nearby. The way this is done in a
formal environment can be a little bit messy, so let us take a moment to get some simple examples
under our belts before diving headfirst into the chaos.
Let $f(x)$ be an arbitrary function and $c \in \mathbb{R}$. We say that the limit of $f(x)$ as $x$ approaches $c$
is equal to $L$ if, whenever we let $x$ get arbitrarily close to $c$ then $f(x)$ gets arbitrarily close to $L$.
This is written as
$$\lim_{x \to c} f(x) = L.$$
The best way to gain an intuitive understanding of limits is to see a few examples. We warn the
student that this first example is rather nicely behaved and fails to capture why we use limits.
Nonetheless, simple examples are often the best for getting a grasp as to how something works.
Example 2.7
x0
lim f (x),
x4
lim f (x).
x5
Solution. We implore the student to keep in mind that this solution is purely heuristic and is only
presented in a way to show the student how to think about these problems. A limit should never
be computed in the following manner.
The first example asks us to consider what happens as $x \to 0$, so we would like to see what
happens for values of $x$ which are close (but not equal) to zero. Hopefully the student can guess
that as $x$ gets close to zero, $4x + 2$ gets close to $4 \cdot 0 + 2 = 2$. Similarly, as $x \to -4$ then $4x + 2$
approaches $4 \cdot (-4) + 2 = -14$. The following tables corroborate this idea:
                  x -> 0
        x < 0                 x > 0
      x        f(x)         x        f(x)
   -0.1       1.6         0.1       2.4
   -0.05      1.8         0.05      2.2
   -0.01      1.96        0.01      2.04
   -0.005     1.98        0.005     2.02
   -0.001     1.996       0.001     2.004
   -0.0005    1.998       0.0005    2.002

                  x -> -4
        x < -4                x > -4
      x        f(x)         x        f(x)
   -4.1      -14.4       -3.9      -13.6
   -4.05     -14.2       -3.95     -13.8
   -4.01     -14.04      -3.99     -13.96
   -4.005    -14.02      -3.995    -13.98
   -4.001    -14.004     -3.999    -13.996
   -4.0005   -14.002     -3.9995   -13.998

This suggests that $\lim_{x \to 0} f(x) = 2$ and $\lim_{x \to -4} f(x) = -14$.
In Example 2.7 we guessed that the limit as $x \to c$ could be determined by evaluating $f(c)$,
and it turns out that in this example that is correct. However, one must be very careful about
freely plugging numbers into equations, as the function might not always be defined at that
point.
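Tables like the ones in Example 2.7 are tedious to fill in by hand, but trivial to generate. The following is a minimal sketch for $f(x) = 4x + 2$; the helper name `approach_table` and the particular offsets are my own choices. As the solution warns, this is purely heuristic and is not how a limit should be computed.

```python
def f(x):
    return 4 * x + 2


def approach_table(func, c, offsets=(0.1, 0.05, 0.01, 0.005, 0.001, 0.0005)):
    """Rows (x_left, f(x_left), x_right, f(x_right)) with x approaching c from both sides."""
    return [(c - h, func(c - h), c + h, func(c + h)) for h in offsets]


for c in (0.0, -4.0):
    print("x ->", c)
    for row in approach_table(f, c):
        print("{:>10.4f} {:>10.4f} {:>10.4f} {:>10.4f}".format(*row))
```

Swapping in a different function or a different point $c$ only requires changing `f` and the list of points, provided the function is defined at every sampled $x$.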
Example 2.8
Let $f(x) = \dfrac{x^2 + x - 6}{x - 2}$. Determine
$$\lim_{x \to 2} f(x).$$
Solution. Unlike the previous example, attempting to substitute $x = 2$ into $f(x)$ will result in
division-by-zero, which we know is never permitted. However, we can evaluate $f(x)$ at any number
other than 2 and the hope is that this will tell us what the function looks like at $x = 2$. Indeed,
creating the following table we find that

      x         f(x)
    2.1       5.1
    2.05      5.05
    2.01      5.01
    2.005     5.005
    2.001     5.001
    2.0005    5.0005

so it certainly appears as though $f(x)$ is approaching 5. Indeed, if $x \neq 2$ then we may factor $f(x)$
as
$$\frac{x^2 + x - 6}{x - 2} = \frac{(x + 3)(x - 2)}{x - 2} = x + 3$$
and the behaviour of this function as $x \to 2$ agrees with our observations.
The previous example demonstrates that a function does not need to be defined at a point for
the limit at that point to exist. In fact, this is an excellent opportunity to point out that the
functions $f(x) = \frac{x^2 + x - 6}{x - 2}$ and $g(x) = x + 3$ are similar but are not equal: the distinction being
that the domain of $f(x)$ is $\mathbb{R} \setminus \{2\}$ while the domain of $g(x)$ is $\mathbb{R}$. If two functions have different
domains, they certainly cannot be equal! Of course, $x = 2$ is the only point where the functions do
not agree, and their graphs are even identical with the exception that the graph of $f(x)$ will have
a hole at $x = 2$. Luckily, this does not matter when we are taking limits, and we have the equality
$$\lim_{x \to 2} \frac{x^2 + x - 6}{x - 2} = \lim_{x \to 2} (x + 3).$$
The reason is that while the functions differ at the point $x = 2$, the limit only looks at what the
functions do at points close to but not equal to 2. Thus the limits see them as the same function
(cf. Figure 1).
Figure 1: Left: The function $\frac{x^2 + x - 6}{x - 2}$ is identical to the function $x + 3$ except for the presence of
a hole at $x = 2$. This does not affect the limit though, as the limit is only concerned with the
behaviour of the function near $x = 2$. Right: A graph of a piecewise function whose limit at zero
is dependent upon the direction of approach. Notice that in either case, the limit disagrees with
the value of the function at zero.
Example 2.9
Determine
$$\lim_{x \to 2} \frac{x - 2}{\sqrt{x + 7} - 3}.$$
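Solution. This is actually identical to the previous example, but it may not be obvious why. Instead,
the usual thing to do in such situations where one summand is a square root is to multiply by the
conjugate; in this case, $\sqrt{x + 7} + 3$. We then have
$$\begin{aligned}
\lim_{x \to 2} \frac{x - 2}{\sqrt{x + 7} - 3} \cdot \frac{\sqrt{x + 7} + 3}{\sqrt{x + 7} + 3}
&= \lim_{x \to 2} \frac{(x - 2)(\sqrt{x + 7} + 3)}{(x + 7) - 9} \\
&= \lim_{x \to 2} \frac{x - 2}{x - 2} \left( \sqrt{x + 7} + 3 \right) \\
&= \lim_{x \to 2} \left( \sqrt{x + 7} + 3 \right) = 6.
\end{aligned}$$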
and so the steps we just iterated may be replaced by a much simpler cancellation argument.
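To see the conjugate trick at work numerically, the sketch below evaluates both the original quotient $\frac{x - 2}{\sqrt{x + 7} - 3}$ and the simplified expression $\sqrt{x + 7} + 3$ at points near 2. The function names `g` and `g_conjugate` are mine, chosen for illustration.

```python
import math


def g(x):
    """The original quotient; it is undefined at x = 2."""
    return (x - 2) / (math.sqrt(x + 7) - 3)


def g_conjugate(x):
    """The expression left over after multiplying by the conjugate."""
    return math.sqrt(x + 7) + 3


for h in (0.1, 0.01, 0.001):
    for x in (2 - h, 2 + h):
        print(x, g(x), g_conjugate(x))
```

The two columns agree away from $x = 2$ (up to rounding), and only the simplified form can actually be evaluated at $x = 2$, where it equals 6.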
One Sided Limits: In each of the above examples, our tables were typically biased by choosing
to approach our limiting number in one direction. For example, when looking at the limit as
$x \to 0$ we approached 0 by considering successively smaller positive numbers, but we could have
equivalently looked at successively smaller negative numbers. We say that the limit at 0 exists
if and only if the limit from the positive numbers is equal to the limit given by the negative
numbers.
More generally, we will say that the limit of $f(x)$ as $x$ approaches $c$ from the right is $L$ if
whenever $x > c$ gets arbitrarily close to $c$, $f(x)$ gets arbitrarily close to $L$. This is written in
symbols as
$$\lim_{x \to c^+} f(x) = L.$$
Similarly, we say that the limit of $f(x)$ as $x$ approaches $c$ from the left is $L$ if whenever $x < c$
gets arbitrarily close to $c$, $f(x)$ gets arbitrarily close to $L$, and in this case we write
$$\lim_{x \to c^-} f(x) = L.$$
We then define the two-sided limit to be $L$ if both one-sided limits exist and are equal to $L$. There are plenty of
examples where the two-sided limit does not exist, as our following examples demonstrate.
Example 2.10
$$f(x) = \begin{cases} x + 2 & x < 0 \\ 1 & x = 0 \\ x^2 & x > 0 \end{cases}.$$
Compute the limit of $f(x)$ as $x \to 0^-$ and as $x \to 0^+$. Does the two-sided limit exist?
Solution. We first look at the limit as $x \to 0^-$. In this case, we know that $x$ is always less than 0,
and so in this regime $f(x)$ effectively looks like the function $x + 2$. As $x \to 0^-$ we see that $x + 2 \to 2$
and so we conclude that
$$\lim_{x \to 0^-} f(x) = 2.$$
On the other hand, the limit $x \to 0^+$ guarantees that $x$ is always positive. Here, $f(x)$ looks like
the function $x^2$ and as $x$ approaches 0, $x^2$ approaches 0 as well, so
$$\lim_{x \to 0^+} f(x) = 0.$$
Each one-sided limit exists, but they are not equal. Hence the two-sided limit does not exist. The
graph of $f(x)$ is given in Figure 1.
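The one-sided behaviour in Example 2.10 can also be tabulated directly. Here is a small sketch of the piecewise function, evaluated along points approaching 0 from each side; the list of offsets is an arbitrary choice of mine.

```python
def f(x):
    """The piecewise function of Example 2.10."""
    if x < 0:
        return x + 2
    if x == 0:
        return 1
    return x ** 2


offsets = (0.1, 0.01, 0.001, 0.0001)

from_left = [f(-h) for h in offsets]   # x < 0, so these use the branch x + 2
from_right = [f(h) for h in offsets]   # x > 0, so these use the branch x^2

print("from the left: ", from_left)
print("from the right:", from_right)
print("value at zero: ", f(0))
```

The values from the left head towards 2, those from the right head towards 0, and neither agrees with $f(0) = 1$.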
The other way that a two-sided limit can fail to exist is if the one-sided limits do not exist
either. This can happen in one of two ways: The first is that it is impossible to find a number L to
which the function gets close. This can happen, for example, if our function oscillates infinitely in
any small interval around a point. Alternatively, the limits can diverge, meaning that the function
goes to either positive or negative infinity. This next example demonstrates both phenomena.
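Example 2.11
$$f(x) = \begin{cases} \sin\left(\frac{1}{x}\right) & x < 0 \\ \frac{1}{1 - x} & 0 \leq x < 1 \\ \frac{x^2}{x - 1} & x > 1 \end{cases}$$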
Solution. Take a look at Figure 2 which plots $f(x)$. In the limit as $x \to 0^-$ we want to examine how
the function $\sin(1/x)$ behaves when $x$ is a small negative number. With any luck, the student will
recognize that the function begins to oscillate infinitely often as $x$ gets small, making it impossible
for the function to ever converge to a single point. This tells us that
$$\lim_{x \to 0^-} f(x) \text{ does not exist.}$$
In the case $x \to 0^+$, we are only interested in what happens in a small positive neighbourhood of
0 and hence we can assume that $0 < x < 1$ so that $f(x)$ looks like $\frac{1}{1 - x}$. We encounter no problem
by substituting small positive numbers into this function, and one can quickly deduce that
$$\lim_{x \to 0^+} f(x) = 1.$$
Now $\frac{1}{1 - x}$ is not defined at $x = 1$, so we must be careful in looking at the limit as $x \to 1^-$. As $x < 1$
gets arbitrarily close to 1, the number $1 - x$ is a very small positive number. This means that $\frac{1}{1 - x}$
is a very large positive number. As we can make $1 - x$ arbitrarily small, $\frac{1}{1 - x}$ can be arbitrarily
large so
$$\lim_{x \to 1^-} f(x) = \infty.$$
Finally, as $x \to 1^+$ we again run into troubles with the function $\frac{x^2}{x - 1}$. But we can see that the
numerator is always positive and tends to the number 1, while the denominator $x - 1$ gets arbitrarily
small, but positive since $x > 1$. Making $x$ arbitrarily close to 1 implies that $\frac{x^2}{x - 1}$ can get arbitrarily
large, so that
$$\lim_{x \to 1^+} f(x) = \infty.$$
Figure 2: A pathological function for which no two-sided limit exists at the points $x = 0$ and $x = 1$.
Note that as $x \to 0^-$ the limit fails to converge, while as $x \to 1$ the function diverges to infinity.
2.3
The $\varepsilon$-$\delta$ definition of the limit is the bane of every calculus student and will not cease haunting you
until your time in this course has finished. However, it is so crucially important in developing a
rigorous theory of limits that the sooner the student grasps this concept the better.
The unfortunate reality of the situation is that the definition of a limit is awkward and clumsy,
though it is absolutely necessary for reasons I will make clear shortly. There are multiple ways to
understand $\varepsilon$-$\delta$ concepts and I will try to convey the most important to you.
Definition 2.12
$$|f(x) - L| < \varepsilon. \tag{2.3}$$
With our study of logical quantifiers in hand, let us see if we can translate this into a plain
sentence. Reading verbatim, (2.3) says
For every positive $\varepsilon$, there is a positive $\delta$ such that if $|x - c| < \delta$ then $|f(x) - L| < \varepsilon$.
Well, this has failed to clear anything up, so we must make a very small detour to clarify things.
Recall that aside from the technical definition of the absolute value, the quantity $|a - b|$ is just the
distance from $a$ to $b$. The absolute value is necessary to ensure that the distance is never negative
since it makes no sense to talk about negative distances.
Hence $|a - b|$ can be read as the distance of $a$ from $b$ (or of $b$ from $a$), so $|x - c| < \delta$ means
the distance from $x$ to $c$ is less than $\delta$, and $|f(x) - L| < \varepsilon$ means the distance from $f(x)$ to $L$ is
less than $\varepsilon$. Our translation of (2.3) now becomes
No matter how close I want f to get to L, I can find an x that is sufficiently close to c such that
f (x) is closer.
The textbook uses a diagram similar to Figure 3 to explain this situation. The idea is that given
$L$ and a prescribed $\varepsilon$, we may draw the lines corresponding to $L \pm \varepsilon$. Where these lines meet the
function $f(x)$, we then draw vertical lines that meet the $x$-axis. We then take the largest symmetric
interval about $c$ and define $\delta$ to be the distance from $c$ to these intersecting lines.
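[Figure 3: the graph of $f(x)$ with horizontal lines at $L - \varepsilon$, $L$, and $L + \varepsilon$, and the points $c - \delta$ and $c + \delta$ marked on the $x$-axis.]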
Show that
$$\lim_{x \to 3} (2x + 1) = 7. \tag{2.4}$$
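Solution. Let $\varepsilon > 0$ be given and choose $\delta = \frac{\varepsilon}{2}$. We would like to show that if $|x - 3| < \delta$ then
$|(2x + 1) - 7| < \varepsilon$. Indeed this is the case since
$$|(2x + 1) - 7| = |2x - 6| = 2|x - 3| < 2\delta = 2 \cdot \frac{\varepsilon}{2} = \varepsilon \tag{2.5}$$
as required.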
The most important question you should all have is 'how did he know to choose $\delta = \varepsilon/2$?' I
know because I did a bunch of rough work ahead of time, then wrote up the proper solution already
knowing that this choice of $\delta$ would work. So now let me explain how you should do your rough
work.
It may help to think of this as some sort of curious game which, speaking from experience, is
a great way to break the ice at parties. In particular, let us assume that some dodgy mysterious
fellow approaches you on the street, and gives you an $\varepsilon > 0$. He tells you that you can do whatever
you want with the quantity $|x - c|$, and that you win the game if you can find a $\delta > 0$ such that
$|f(x) - L| < \varepsilon$ whenever $|x - c| < \delta$. What should our strategy be? Our only tool is that we can
make $|x - c|$ do whatever we want, so we should try to manipulate $|f(x) - L|$ to look like $K|x - c|$ for
some constant $K$. If we can do this, then no matter what $\varepsilon$ we are given, we can automatically just
choose $\delta = \varepsilon/K$ and win the game. Let me illustrate this with a simple but more general example
than that given above:
Example 2.14
Show that
$$\lim_{x \to c} (ax + b) = ac + b. \tag{2.6}$$
Solution. We begin by doing the rough work. Let $\varepsilon > 0$ be given. Per our strategy discussed above,
we should try to make $|(ax + b) - (ac + b)|$ look like $K|x - c|$ for some constant $K$. Indeed,
$$|(ax + b) - (ac + b)| = |ax + b - ac - b| = |ax - ac| = |a||x - c|. \tag{2.7}$$
Now that was not that hard, was it? For equation (2.7) to be less than $\varepsilon$, we want $|a||x - c| < \varepsilon$, or
rather $|x - c| < \varepsilon/|a|$ (assuming $a \neq 0$). Since we can let $|x - c|$ be as small as we want, we choose $\delta = \varepsilon/|a|$.
Having done the rough work, we may now give the proper proof, which we would write as
follows:
Let $\varepsilon > 0$ be given and choose $\delta = \varepsilon/|a|$. If $|x - c| < \delta$ then we have
$$\begin{aligned}
|f(x) - L| &= |(ax + b) - (ac + b)| = |ax + b - ac - b| \\
&= |ax - ac| = |a||x - c| \\
&< |a|\delta = |a| \frac{\varepsilon}{|a|} = \varepsilon.
\end{aligned}$$
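The game described above can be played against a computer. The following sketch takes a candidate $\delta$ and tests the inequality $|f(x) - L| < \varepsilon$ at many sample points with $0 < |x - c| < \delta$. The helper `holds_on_samples` and the particular values of $a$, $b$, $c$ are illustrative assumptions of mine; passing such a test is evidence, not a proof, since only finitely many points are checked.

```python
def holds_on_samples(f, c, L, eps, delta, samples=1000):
    """Check |f(x) - L| < eps at sample points with 0 < |x - c| < delta."""
    for k in range(1, samples + 1):
        h = delta * k / (samples + 1)  # guarantees 0 < h < delta
        for x in (c - h, c + h):
            if not abs(f(x) - L) < eps:
                return False
    return True


# Arbitrary illustrative values with a != 0.
a, b, c = -3.0, 2.0, 1.5


def f(x):
    return a * x + b


L = a * c + b

for eps in (1.0, 0.1, 0.001):
    delta = eps / abs(a)  # the choice made in the proof of Example 2.14
    print(eps, delta, holds_on_samples(f, c, L, eps, delta))
```

Trying a $\delta$ that is too generous, such as $\delta = 1$ for $\varepsilon = 0.1$, makes the check fail, which is a handy way to catch mistakes in rough work.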
Show that
$$\lim_{x \to 5} (x^2 - 9) = 16. \tag{2.8}$$
Solution. As always, let $\varepsilon > 0$ be given and consider $|f(x) - L|$ which we can manipulate as
$$|(x^2 - 9) - 16| = |x^2 - 25| = |(x - 5)(x + 5)| = |x - 5||x + 5|. \tag{2.9}$$
We recall that our goal is to get this to look like something of the form $K|x - 5|$, but we have this
pesky $|x + 5|$ term hanging around. How do we deal with this? Since we only want $|f(x) - L|$ to
be less than $\varepsilon$, we are allowed to make some approximations.
Let us assume for the moment that $|x - 5| < 1$ (which roughly corresponds to choosing $\delta = 1$),
though we can choose any number that we like. Under this assumption we would like to determine
an upper bound for $|x + 5|$. But notice that
$$|x + 5| = |(x - 5) + 10| \leq |x - 5| + 10 < 11. \tag{2.10}$$
Thus if $\delta = 1$ we have
$$|f(x) - L| \leq 11|x - 5| \tag{2.11}$$
and we should choose $\delta = \varepsilon/11$. We have almost completed our rough work, but notice that in
order to get this bound, we needed to assume that $|x - 5| < 1$. In order to take care of this, we
simply set $\delta = \min(1, \varepsilon/11)$, and then our proof will work. Indeed, the proof is as follows:
Let $\varepsilon > 0$ be given and set $\delta = \min(1, \varepsilon/11)$. Notice that
$$|(x^2 - 9) - 16| = |x^2 - 25| = |x + 5||x - 5|. \tag{2.12}$$
Suppose that $|x - 5| < \delta$. Since we chose $\delta = \min(1, \varepsilon/11)$ it is certainly the case that $|x - 5| < 1$. Hence (2.10) holds
and so $|x + 5| < 11$. Thus
$$|(x^2 - 9) - 16| = |x + 5||x - 5| \leq 11|x - 5| < 11\delta \leq 11 \cdot \frac{\varepsilon}{11} = \varepsilon \tag{2.13}$$
as required.
As I noted above, in approximating $|x + 5|$ we could have assumed that $|x - 5|$ was less than
any positive number. For example, if we had used 2 then we would have found that $|x + 5| < 12$.
The only change we then need to make in the proof is to set $\delta = \min(2, \varepsilon/12)$.
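It is worth convincing yourself that the choice $\delta = \min(1, \varepsilon/11)$ really does the job. The sketch below measures the worst error $|(x^2 - 9) - 16|$ over sampled points with $0 < |x - 5| < \delta$; the function names and the sample count are my own, and a finite sample is of course only a sanity check.

```python
def delta_for(eps):
    """The delta chosen in the proof: min(1, eps / 11)."""
    return min(1.0, eps / 11.0)


def worst_error(eps, samples=2000):
    """Largest |(x^2 - 9) - 16| over sampled x with 0 < |x - 5| < delta_for(eps)."""
    d = delta_for(eps)
    worst = 0.0
    for k in range(1, samples + 1):
        h = d * k / (samples + 1)  # guarantees 0 < h < d
        for x in (5 - h, 5 + h):
            worst = max(worst, abs((x * x - 9) - 16))
    return worst


for eps in (50.0, 1.0, 0.01):
    print(eps, delta_for(eps), worst_error(eps))
```

Notice that for large $\varepsilon$ the cap $\delta = 1$ is what matters, while for small $\varepsilon$ the term $\varepsilon/11$ takes over; this is exactly why the minimum is needed.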
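Exercise: Show that
$$\lim_{x \to d} (ax^2 + bx + c) = ad^2 + bd + c. \tag{2.14}$$
In particular, if we make the approximation that $|x - d| < n$ for some $n > 0$, show that
choosing
$$\delta = \min\left\{ n, \frac{\varepsilon}{|a|n + 2|ad| + |b|} \right\}$$
is sufficient for the proof.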
In the following examples, I have only done the rough work required for each
question. I have left the proof part as a simple exercise for the student.
This being said, there are occasions in which it can be a little tricky to start from $|f(x) - L|$
and get $|x - c|$. In fact, sometimes it is actually easier to go the other direction!
Example 2.16
Show that
$$\lim_{x \to 9} \sqrt{x} = 3. \tag{2.15}$$
Solution. Let $\varepsilon > 0$ be given. In this case, we realize that $|x - 9| = |\sqrt{x} - 3||\sqrt{x} + 3|$ so that we
can write
$$|f(x) - L| = |\sqrt{x} - 3| = \frac{|x - 9|}{|\sqrt{x} + 3|}. \tag{2.16}$$
Following the procedure outlined above, our goal should then be to find a number $K$ such that
$|\sqrt{x} + 3| \geq K$. We must use the greater-than sign here, since when we take the reciprocal the
greater-than sign will switch to the less-than sign that we desire. There are two ways to do this:
the easy way and the hard but more general way.
The first way is to realize that $\sqrt{x} + 3 \geq 3$ for all $x \geq 0$ and so $|\sqrt{x} + 3| = \sqrt{x} + 3 \geq 3$. In this case,
we now have
$$|\sqrt{x} - 3| = \frac{|x - 9|}{|\sqrt{x} + 3|} \leq \frac{1}{3}|x - 9| \tag{2.17}$$
so choosing $\delta = 3\varepsilon$ will do the trick.
On the other hand, we may also proceed with the same trick as before and hypothesize that
$|x - 9| < 1$. In this case, we then have
$$\begin{aligned}
-1 &< x - 9 < 1 \\
8 &< x < 10 \\
\sqrt{8} &< \sqrt{x} < \sqrt{10} \\
\sqrt{8} + 3 &< \sqrt{x} + 3 < \sqrt{10} + 3.
\end{aligned}$$
Now when we take absolute values, we can only guarantee that the function $|\sqrt{x} + 3|$
will be greater than $\sqrt{8} + 3$, so that
$$|\sqrt{x} - 3| = \frac{|x - 9|}{|\sqrt{x} + 3|} \leq \frac{1}{\sqrt{8} + 3}|x - 9| \tag{2.18}$$
so choosing $\delta = \min\left\{1, (\sqrt{8} + 3)\varepsilon\right\}$ will do the trick.
Show that
$$\lim_{x \to 5} \frac{x + 5}{x - 7} = -5. \tag{2.19}$$
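We want to try to bound $|x - 7|$ from below so that its reciprocal is bounded from above. Assume
that $|x - 5| < 1$ and notice that
$$\begin{aligned}
-1 &< x - 5 < 1 \\
-1 - 2 &< x - 5 - 2 < 1 - 2 \\
-3 &< x - 7 < -1
\end{aligned}$$
and so $|x - 7| \geq \min\{|-3|, |-1|\} = 1$. We thus have
$$\left| \frac{x + 5}{x - 7} + 5 \right| = \frac{|6x - 30|}{|x - 7|} = \frac{6|x - 5|}{|x - 7|} \leq 6|x - 5|. \tag{2.20}$$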
Negation of Limits
In the previous section we determined how to use the $\varepsilon$-$\delta$ definition of a limit to show that a limit
exists. In this section, we demonstrate the (much harder) task of showing that a limit does not
exist. Having spent a great deal of time learning how to negate logical sentences, we can just apply
our know-how to directly negate the limit:
$$\lim_{x \to c} f(x) \neq L \text{ precisely if } \exists \varepsilon > 0,\ \forall \delta > 0,\ \exists x,\ 0 < |x - c| < \delta \text{ and } |f(x) - L| \geq \varepsilon.$$
Show that
$$\lim_{x \to 0} \frac{x}{|x|} \neq 0.$$
Solution. A quick look at the graph of $x/|x|$ indicates that indeed, the limit should not be zero,
but we can now rigorously show this using our negation above. We have to find an $\varepsilon > 0$ which
works. If $\varepsilon > 1$ then our function will always live within the given $\varepsilon$-band, so we should choose
some $0 < \varepsilon < 1$. Let's take $\varepsilon = 1/2$.
Let $\delta > 0$ be given and arbitrary, so that we are looking at the interval $(-\delta, 0) \cup (0, \delta)$. We have
to choose a point in here to make $|f(x) - 0| = |f(x)| \geq \varepsilon$. Let's try taking $x_0 = \delta/2$. Notice that
$|x_0 - 0| = |x_0| = \delta/2 < \delta$ so this $x_0$ lies within the desired interval. Furthermore, $|f(x_0)| = |1| = 1$,
and so
$$|f(x_0)| = 1 > \frac{1}{2} = \varepsilon,$$
as required.
This shows that the limit of the function is not one particular value, but it doesn't say that it
can't be some other value. To show that a limit does not exist at all, we have to first realize that
implicit in the $\varepsilon$-$\delta$ definition of a limit is another existential quantifier; namely, there exists a limit
$L$. Hence to show that a limit does not exist for any $L$, our negation should be
$$\forall L \in \mathbb{R},\ \exists \varepsilon > 0,\ \forall \delta > 0,\ \exists x,\ 0 < |x - c| < \delta \text{ and } |f(x) - L| \geq \varepsilon.$$
Take careful note of how $L$, $\varepsilon$, and $\delta$ are permitted to depend on one another.
1. The $\varepsilon$ is allowed to change depending upon the choice of $L$.
2. The $\varepsilon$ must work for all choices of $\delta$.
Example 2.19
Show that
$$\lim_{x \to 0} \frac{x}{|x|} \text{ does not exist (for any } L\text{)}.$$
Solution. If $|L| \neq 1$ then the above proof can be adapted to work, so we break our proof into two
cases.
Case 1 ($|L| \neq 1$) Let's assume that $L > 0$ since precisely the same argument holds if $L < 0$. What's
the idea here? Try choosing a random limit $L \neq 1$. How do we choose $\varepsilon$ to ensure that the
function will not lie in the given $\varepsilon$-band? We make $\varepsilon$ smaller than the distance from $L$ to 1.
Hence set $d = |L - 1|$ and take $\varepsilon = \frac{d}{2}$. Let $\delta > 0$ be given, and choose $x_0$ to be any positive
number $0 < x_0 < \delta$, say $x_0 = \delta/2$. In this case we have
$$|f(x_0) - L| = |1 - L| = d > \frac{d}{2} = \varepsilon.$$
Case 2 ($|L| = 1$) Once again, take $L = 1$ and use a similar argument for $L = -1$. If $L = 1$ then our
previous argument will not work (try it!). The trick is now to take $x$ to be negative. Indeed,
set $\varepsilon = 1$ and take $x_0 = -\delta/2$, so that $f(x_0) = -1$ and
$$|f(x_0) - L| = |(-1) - 1| = |-2| = 2 > 1 = \varepsilon.$$
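The structure of this argument is really a recipe: given a proposed limit $L$, produce an $\varepsilon$ (depending only on $L$), and then for every $\delta$ produce a point $x_0$ that breaks the definition. The sketch below encodes one such recipe for $f(x) = x/|x|$. The function `witness` is my own illustration, and its case split is slightly simpler than the one in the proof: it only separates $L = 1$ from $L \neq 1$.

```python
def f(x):
    return x / abs(x)  # defined for x != 0


def witness(L, delta):
    """Return (eps, x0) with 0 < |x0| < delta and |f(x0) - L| >= eps.

    eps depends only on L, never on delta, as the negated definition demands.
    """
    if L != 1:
        eps = abs(L - 1) / 2  # half the distance from L to 1
        x0 = delta / 2        # a positive point, so f(x0) = 1
    else:
        eps = 1.0
        x0 = -delta / 2       # a negative point, so f(x0) = -1
    return eps, x0


for L in (-1.0, 0.0, 0.5, 1.0, 3.0):
    for delta in (1.0, 1e-3, 1e-9):
        eps, x0 = witness(L, delta)
        print(L, delta, eps, x0, abs(f(x0) - L) >= eps)
```

However small $\delta$ is made, the recipe still finds a point within $\delta$ of 0 at which $f$ stays at least $\varepsilon$ away from the proposed $L$.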
Rather than do everything with the $\varepsilon$-$\delta$ definition, we can use the powerful but unsophisticated
tools to build an infrastructure for problem solving. Of course, such infrastructures are created
through bootstrapping, meaning that we are going to be using the $\varepsilon$-$\delta$ definition a lot in this section.
Our first theorem tells us that once we have found a limit, we can be confident that it is the
only one:
Theorem 2.20
If $f$ is a function defined in a neighbourhood of $c$ and $\lim_{x \to c} f(x)$ exists, then the limit is unique.
Proof. We will proceed by contradiction. What would go wrong if we had two different limits
$L_1 \neq L_2$? If we draw a picture and think about $\varepsilon$-bands, we see that it should be impossible for
two different limits to be contained in the same band. In particular, if we let $\varepsilon$ be less than the
distance between $L_1$ and $L_2$, something should go terribly wrong.
Let $d = |L_1 - L_2|$ which we know is not zero by assumption. Since we assume that
$$L_1 = \lim_{x \to c} f(x) = L_2,$$
taking $\varepsilon = d/4$ in the definition of each limit gives a point $x$ close to $c$ at which both
$|f(x) - L_1| < d/4$ and $|f(x) - L_2| < d/4$. The triangle inequality then yields
$$d = |L_1 - L_2| \leq |f(x) - L_1| + |f(x) - L_2| < \frac{d}{4} + \frac{d}{4} = \frac{d}{2}.$$
Of course, since $d > 0$, this is impossible and we have arrived at a contradiction. Hence we have
shown that $L_1 = L_2$ and we conclude that limits are unique.
We will not put this theorem to much use in this course, but it is a load off our shoulders
knowing that nothing terribly weird can happen with limits.
On to our next subject: Mathematicians love to be lazy, in the sense that if we have already
performed a calculation, why should we repeat it ever again? Similarly, we like to build complicated
examples from simple examples. To this end, we formulate the following collection of limit laws,
which are intended to dramatically simplify our life:
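Theorem 2.21: Limit Laws
Let $f$ and $g$ be functions such that $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} g(x) = M$ both exist. Then
2. $\lim_{x \to c} [f(x) \pm g(x)] = \left[ \lim_{x \to c} f(x) \right] \pm \left[ \lim_{x \to c} g(x) \right] = L \pm M$,
3. $\lim_{x \to c} [f(x) g(x)] = \left[ \lim_{x \to c} f(x) \right] \left[ \lim_{x \to c} g(x) \right] = LM$,
4. $\lim_{x \to c} \dfrac{f(x)}{g(x)} = \dfrac{\lim_{x \to c} f(x)}{\lim_{x \to c} g(x)} = \dfrac{L}{M}$ if $M \neq 0$.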
It is crucial that both limits must exist. It is a common mistake for students to gleefully attempt
to apply the above limit laws in instances in which it is not permitted. For example, the following
IS NOT CORRECT:
$$\lim_{x \to 0} x^2 \sin\left(\frac{1}{x}\right) = \left[ \lim_{x \to 0} x^2 \right] \left[ \lim_{x \to 0} \sin\left(\frac{1}{x}\right) \right] = 0 \cdot \lim_{x \to 0} \sin\left(\frac{1}{x}\right) = 0.$$
Interestingly, this is the correct answer, but for all the wrong reasons. Hence this solution is
absolutely incorrect.
Proof. We will only do the proof of number 2, though we will do it in great detail. The remaining
proofs are left as exercises for the student.
We are given a new function which is the sum $(f + g)$ of our given functions, and we would like
to show that its limit is the corresponding sum $L + M$. This means that we have to show that,
given any $\varepsilon > 0$, there exists some $\delta > 0$ such that $0 < |x - c| < \delta$ implies that
$$\big| [f(x) + g(x)] - [L + M] \big| < \varepsilon. \tag{2.21}$$
Let's take a moment to think about what hypotheses we have been given. First, we are told that
both limits exist; hence for any $\varepsilon > 0$ we know there exist $\delta_f, \delta_g > 0$ such that by taking $|x - c|$
smaller than $\delta_f$ (respectively, $\delta_g$) we can make $|f(x) - L|$ (respectively $|g(x) - M|$) smaller
than $\varepsilon$. Since this is the only information we have been given, we should then try to reduce (2.21)
to an expression which involves $|f(x) - L|$ and $|g(x) - M|$. By staring at (2.21) we realize we can
re-arrange it as follows:
$$\begin{aligned}
\big| [f(x) + g(x)] - [L + M] \big| &= \big| [f(x) - L] + [g(x) - M] \big| \\
&\leq |f(x) - L| + |g(x) - M| && \text{triangle inequality.}
\end{aligned}$$
This is perfect! We can control how small $|f(x) - L|$ and $|g(x) - M|$ become, so in particular, let's
make them less than $\varepsilon/2$. Let $\delta_f, \delta_g > 0$ be chosen such that
$$\begin{aligned}
0 < |x - c| < \delta_f &\Rightarrow |f(x) - L| < \frac{\varepsilon}{2} \\
0 < |x - c| < \delta_g &\Rightarrow |g(x) - M| < \frac{\varepsilon}{2}.
\end{aligned} \tag{2.22}$$
The problem now is that we have two deltas, and the definition of the limit requires that we give
a single delta. The way to solve this is to recall that if we have found a delta which works, any
smaller delta will also work. Hence by setting $\delta = \min\{\delta_f, \delta_g\}$ we can guarantee that $\delta \leq \delta_f$ and
$\delta \leq \delta_g$ so that both implications in (2.22) hold. Hence if $0 < |x - c| < \delta$ then
$$\big| [f(x) + g(x)] - [L + M] \big| \leq |f(x) - L| + |g(x) - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
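The 'two deltas' trick in this proof translates almost word for word into code. In the sketch below I assume we already know a working $\delta$ for each summand, using $f(x) = 2x + 1$ and $g(x) = x^2 - 9$ at $c = 5$ (so $L = 11$ and $M = 16$); the functions `delta_f`, `delta_g`, and `delta_sum` are names of my own choosing, and the check at the end is a finite sample, not a proof.

```python
def delta_f(eps):
    """A delta for f(x) = 2x + 1 at c = 5, where the limit is L = 11."""
    return eps / 2


def delta_g(eps):
    """A delta for g(x) = x^2 - 9 at c = 5, where the limit is M = 16."""
    return min(1.0, eps / 11)


def delta_sum(eps):
    """Make each summand smaller than eps/2, then take the smaller delta."""
    return min(delta_f(eps / 2), delta_g(eps / 2))


def check(eps, samples=2000):
    """Sample 0 < |x - 5| < delta_sum(eps) and test |(f + g)(x) - (L + M)| < eps."""
    c, L, M = 5.0, 11.0, 16.0
    d = delta_sum(eps)
    for k in range(1, samples + 1):
        h = d * k / (samples + 1)  # guarantees 0 < h < d
        for x in (c - h, c + h):
            if not abs((2 * x + 1) + (x * x - 9) - (L + M)) < eps:
                return False
    return True


for eps in (100.0, 1.0, 0.01):
    print(eps, delta_sum(eps), check(eps))
```

The important design point is that `delta_sum` never looks at the functions themselves: it only needs the two delta recipes, which is precisely what makes the limit law general.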
Exercise:
We can immediately use this to prove some very powerful and useful results.
Corollary 2.22
If $f(x) = \frac{p(x)}{q(x)}$ is any rational function (so that $p(x)$ and $q(x)$ are polynomials), and $c \in \mathbb{R}$
is such that $q(c) \neq 0$ then
$$\lim_{x \to c} \frac{p(x)}{q(x)} = \frac{p(c)}{q(c)}.$$
Proof. The key is to first show this for polynomials and apply the limit laws. It is easy to see (the
student should check!) that
$$\lim_{x \to c} x = c \quad \text{and} \quad \lim_{x \to c} a = a \text{ for any constant } a.$$
Let $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ be an arbitrary polynomial. Our discussion tells us that
we know how to deal with every occurrence of the function $x \mapsto x$, and so using Theorem 2.21
we have
$$\begin{aligned}
\lim_{x \to c} p(x) &= \lim_{x \to c} \left[ a_n x^n + \cdots + a_1 x + a_0 \right] \\
&= a_n \left[ \lim_{x \to c} x \right]^n + a_{n-1} \left[ \lim_{x \to c} x \right]^{n-1} + \cdots + a_1 \left[ \lim_{x \to c} x \right] + a_0 \\
&= a_n c^n + a_{n-1} c^{n-1} + \cdots + a_1 c + a_0 \\
&= p(c).
\end{aligned}$$
Thus the result holds for any polynomial. Now if $p(x)$ and $q(x)$ are two polynomials and $q(c) \neq 0$,
then the limit law for quotients implies
$$\lim_{x \to c} \frac{p(x)}{q(x)} = \frac{\lim_{x \to c} p(x)}{\lim_{x \to c} q(x)} = \frac{p(c)}{q(c)},$$
as required.
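The corollary says that, away from zeros of the denominator, the limit of a rational function can be found by direct substitution. A quick numerical illustration is sketched below; the polynomials, the point $c = 1$, and the helper `horner` are all assumptions of mine chosen for the example.

```python
def horner(coeffs, x):
    """Evaluate a_n x^n + ... + a_1 x + a_0, given coeffs = [a_n, ..., a_1, a_0]."""
    result = 0.0
    for a in coeffs:
        result = result * x + a
    return result


p = [1.0, 1.0, -6.0]  # p(x) = x^2 + x - 6
q = [1.0, 0.0, 1.0]   # q(x) = x^2 + 1, a hypothetical denominator that is never zero
c = 1.0

target = horner(p, c) / horner(q, c)  # the value p(c) / q(c) predicted by the corollary

gaps = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    value = horner(p, c + h) / horner(q, c + h)
    gaps.append(abs(value - target))
    print(c + h, value, gaps[-1])
```

The gaps between the nearby values and $p(c)/q(c)$ shrink steadily as $h$ does, in line with the corollary.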
2.4
Limits at Infinity
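There are two notions of infinity that we will be interested in tackling, both with definitions similar
to the $\varepsilon$-$\delta$ definition we saw before.
Definition 2.23
We say that $\lim_{x \to \infty} f(x) = L$ if for every $\varepsilon > 0$ there exists an $M \in \mathbb{R}$ such that whenever
$x > M$ we have $|f(x) - L| < \varepsilon$.
We say that $\lim_{x \to c} f(x) = \infty$ if for every $M \in \mathbb{R}$ there exists a $\delta > 0$ such that if
$0 < |x - c| < \delta$ then $f(x) > M$.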
In both cases, to consider $-\infty$ we replace $x > M$, $f(x) > M$ with $x < M$, $f(x) < M$.
Exercise:
Example 2.24
Show that
$$\lim_{x \to \infty} \frac{x}{x + 1} = 1.$$
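Solution. Let $\varepsilon > 0$ be given and take $M = \max\left\{1, \frac{1}{\varepsilon} - 1\right\}$. If $x > M$ then since $x > M \geq 1$
we know that $|x + 1| = x + 1$. Furthermore, since $x > \frac{1}{\varepsilon} - 1$ we know that $x + 1 > \frac{1}{\varepsilon}$, thus
$$\left| \frac{x}{x + 1} - 1 \right| = \left| \frac{-1}{x + 1} \right| = \frac{1}{|x + 1|} < \varepsilon$$
as required.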
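For limits at infinity the role of $\delta$ is played by the threshold $M$. The sketch below tests the choice $M = \max\{1, \frac{1}{\varepsilon} - 1\}$ from Example 2.24 at a handful of points beyond the threshold; the function names and the particular test points are my own, and as always a finite check is only a sanity test.

```python
def threshold(eps):
    """The M chosen in Example 2.24: max{1, 1/eps - 1}."""
    return max(1.0, 1.0 / eps - 1.0)


def f(x):
    return x / (x + 1.0)


def beyond_threshold_ok(eps, steps=(1e-3, 1.0, 10.0, 1e3, 1e6)):
    """Check |f(x) - 1| < eps at a few sample points x > threshold(eps)."""
    M = threshold(eps)
    return all(abs(f(M + t) - 1.0) < eps for t in steps)


for eps in (2.0, 0.5, 0.01, 1e-4):
    print(eps, threshold(eps), beyond_threshold_ok(eps))
```

Notice how the threshold grows like $1/\varepsilon$: the closer we demand $f(x)$ be to 1, the further out we must go.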
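Example 2.25
Show that
$$\lim_{x \to 0} \frac{1}{x^2} = \infty.$$
Solution. Let $M \in \mathbb{R}$ be given (we may assume $M \neq 0$) and choose $\delta = \frac{1}{\sqrt{|M|}}$. If $0 < |x| < \delta$ then
$x^2 < \frac{1}{|M|}$ and so
$$f(x) = \frac{1}{x^2} > |M| \geq M.$$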
Just as it was possible to use the $\varepsilon$-$\delta$ definition to show that finite limits do not exist, we can
use our rigorous definitions to show that limits at infinity do not exist. Indeed, we say that
$$\lim_{x \to \infty} f(x) \text{ does not exist}$$
if for every $L \in \mathbb{R}$ there exists an $\varepsilon > 0$ such that for every $M \in \mathbb{R}$ there exists an $x$ such that
$x > M$ and $|f(x) - L| \geq \varepsilon$.
Example 2.26
Solution. We will consider the case where $L > 0$, noting that the $L < 0$ case follows precisely the
same argument. Assume that $L > 1$ and let $\varepsilon = \frac{L - 1}{2}$. Let $x$ be any number such that $x > M$ and
notice that, since $\sin(x) \leq 1 < L$,
$$|\sin(x) - L| = L - \sin(x) \geq L - 1 > \frac{L - 1}{2} = \varepsilon.$$
L
2.
Limits
L
= .
2
2.5
Continuity
In many of the cases that we have hitherto examined our functions have been nicely behaved, in
the sense that the limits were often obvious (if not so obvious to prove). We give such functions
a very nice name:
Definition 2.27
(2.23)
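Solution. Here it is quite natural to use the $\varepsilon$-$\delta$ definition of continuity. Indeed, let $\varepsilon > 0$ be given
and set $\delta = \varepsilon/C$. If $p \in \mathbb{R}$ is arbitrary but fixed and $|x - p| < \delta$, then
$$|f(x) - f(p)| \leq C|x - p| < C\delta = C \cdot \frac{\varepsilon}{C} = \varepsilon \tag{2.24}$$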
so f is continuous at p. Since p was arbitrary, this must hold for every point, and hence f is
everywhere continuous.
We have already seen an entire family of continuous functions; namely, recall that from
Corollary 2.22 we know that if $p(x)$ is any polynomial then
$$\lim_{x \to c} p(x) = p(c)$$
showing that all polynomials are continuous. Similarly, if $p(x)$ and $q(x)$ are polynomials and $c$ is in
the domain of $p(x)/q(x)$ (which implies that $q(c) \neq 0$) then
$$\lim_{x \to c} \frac{p(x)}{q(x)} = \frac{p(c)}{q(c)}$$
Solution. The majority of these proofs often reduce to doing as we did in the previous section, but
instead of fixing a limit point we allow that limit point to vary in general.
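1. We shall do the example for $\sqrt{x}$. We want to show that
$$\lim_{x \to c} \sqrt{x} = \sqrt{c}, \qquad c \in (0, \infty).$$
Let $\varepsilon > 0$ be given and set $\delta = \sqrt{c}\,\varepsilon$, which is positive since $c \neq 0$. Now if $|x - c| < \delta$ then
$$|f(x) - f(c)| = |\sqrt{x} - \sqrt{c}| = \frac{|x - c|}{|\sqrt{x} + \sqrt{c}|} \leq \frac{|x - c|}{\sqrt{c}} < \frac{\sqrt{c}\,\varepsilon}{\sqrt{c}} = \varepsilon \tag{2.25}$$
since $|\sqrt{x} + \sqrt{c}| \geq \sqrt{c}$.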
In fact, we can use the Limit Laws to say very powerful things about continuous functions:
Theorem 2.30
xc
xc
Since the limits exist, we know that the limit of the product is the product of the limits, and hence
$$\lim_{x \to c} [fg](x) = \lim_{x \to c} [f(x) g(x)] = \left[ \lim_{x \to c} f(x) \right] \left[ \lim_{x \to c} g(x) \right] = f(c) g(c) = [fg](c),$$
xa
This is a little contrived at the moment, but really turns out to be (an alternative to) the defining
characteristic of continuous functions:
Theorem 2.31
If $f, g$ are functions such that $\lim_{x \to c} g(x) = L$ exists and $f$ is continuous, then
$$\lim_{x \to c} f(g(x)) = f\left( \lim_{x \to c} g(x) \right) = f(L). \tag{2.27}$$
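Proof. Let $\varepsilon > 0$ be given. Since we know $f$ is continuous, there exists some $\eta > 0$ such that
if $|y - L| < \eta$ then $|f(y) - f(L)| < \varepsilon$. Furthermore, since $g(x) \to L$ as $x \to c$ we know there exists some
$\delta > 0$ such that if $0 < |x - c| < \delta$ then $|g(x) - L| < \eta$. By setting $y = g(x)$, we thus have that
$$0 < |x - c| < \delta \ \Rightarrow\ |g(x) - L| = |y - L| < \eta \ \Rightarrow\ |f(g(x)) - f(L)| < \varepsilon.$$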
Corollary 2.32
xc
xc
as required.
Definition 2.33
L = lim f (x).
xc
xc+
[Figure: examples of removable, jump, and essential discontinuities.]
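$$\begin{cases} 0 & x \in \mathbb{R} \setminus \mathbb{Q} \\ \frac{1}{q} & x \in \mathbb{Q},\ x = \frac{p}{q},\ \gcd(p, q) = 1 \end{cases} \tag{2.28}$$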
2.6
Some limits can be rather tricky to determine, simply because they behave in curious ways around
the limit point. In this section we will introduce the Squeeze Theorem, sometimes also referred
to as the Pinching Theorem, Sandwich Theorem, or sometimes the Police Theorem. The Squeeze
Theorem (as I will likely call it, from here on out) allows us to determine troublesome limits by
bounding one function in terms of two other functions which converge to the same point.
Theorem 2.34: The Squeeze Theorem
If $f(x)$, $g(x)$, and $h(x)$ are all defined in an interval around the point $c$, satisfying $f(x) \leq
g(x) \leq h(x)$ on that interval, and
$$\lim_{x \to c} f(x) = L = \lim_{x \to c} h(x)$$
then the limit of $g(x)$ as $x \to c$ also exists and is equal to $L$; that is,
$$\lim_{x \to c} g(x) = L.$$
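Proof. Let $\varepsilon > 0$ be given. Since the limits of $f(x)$ and $h(x)$ both exist at the point $c$, we know
there exist $\delta_f, \delta_h > 0$ such that
$$\begin{aligned}
0 < |x - c| < \delta_f &\Rightarrow |f(x) - L| < \varepsilon \\
0 < |x - c| < \delta_h &\Rightarrow |h(x) - L| < \varepsilon.
\end{aligned} \tag{2.29}$$
Set $\delta = \min\{\delta_f, \delta_h\}$, so that if $0 < |x - c| < \delta$ then our conditions in (2.29) become
$$\begin{aligned}
L - \varepsilon &< f(x) < L + \varepsilon \\
L - \varepsilon &< h(x) < L + \varepsilon.
\end{aligned} \tag{2.30}$$
Combining this information together with the fact that $f(x)$ and $h(x)$ bound $g(x)$, we get
$$L - \varepsilon < f(x) \leq g(x) \leq h(x) < L + \varepsilon$$
showing that $|g(x) - L| < \varepsilon$ as required.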
As is demonstrated by Figure 6, the idea is that the functions f (x) and h(x) squeeze g(x) into
having the same limit.
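The squeezing behaviour can also be observed numerically. The sketch below is only an illustration and not part of the proof: the functions $f(x) = -x^2$, $g(x) = x^2\sin(1/x)$ and $h(x) = x^2$ are an assumed example chosen for this snippet, and `squeeze_values` is a hypothetical helper name.

```python
import math

def squeeze_values(x):
    """Return (f(x), g(x), h(x)) for the assumed example
    f(x) = -x^2, g(x) = x^2 * sin(1/x), h(x) = x^2, valid for x != 0."""
    lower = -x**2
    middle = x**2 * math.sin(1.0 / x)
    upper = x**2
    return lower, middle, upper

# As x approaches 0, the outer two columns shrink to 0 and trap the middle one.
for k in range(1, 6):
    x = 10.0 ** (-k)
    lower, middle, upper = squeeze_values(x)
    print(f"x={x:.0e}  f={lower:+.3e}  g={middle:+.3e}  h={upper:+.3e}")
```

Every printed row should satisfy $f(x) \le g(x) \le h(x)$, with all three columns tending to 0.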
Example 2.35
Show that
\[ \lim_{x\to 0} x^2 \sin\left(\frac{1}{x}\right) = 0. \]
Solution. Regardless of its argument, the sine function is always bounded above by $1$ and below by $-1$; that is,
\[ -1 \le \sin\left(\frac{1}{x}\right) \le 1. \]
[Figure 6: The functions $f(x)$ and $h(x)$ squeeze $g(x)$ near $x = c$.]
as required.
Exercise:
Using the Squeeze Theorem, show the following two results:
1. If $\lim_{x\to 0} |f(x)| = 0$ then $\lim_{x\to 0} f(x) = 0$. [Hint: Convince yourself that $-|f(x)| \le f(x) \le |f(x)|$.]
2. If $f(x)$ is bounded (so that there is some $M > 0$ with $|f(x)| < M$ for all $x$) and $\lim_{x\to 0} g(x) = 0$ then $\lim_{x\to 0} f(x)g(x) = 0$.
Two useful limits that appear in this section are the following:
\[ \lim_{x\to 0} \frac{\sin(x)}{x} = 1, \qquad \lim_{x\to 0} \frac{\cos(x) - 1}{x} = 0. \tag{2.31} \]
We will not present the proofs of these facts, but will instead defer to the textbook for the boring
and technical details as to how these are shown. Instead, I would like to emphasize several points
here that may be pertinent in other courses. Looking at (2.31), we see that these heuristically say
that $\sin(x) \approx x$ and $\cos(x) \approx 1$ as $x$ gets very close to 0. In physics, these are often referred
to as the small angle approximations, and are responsible for a plethora of dodgy (yet somehow
accurate) physical results. The results of (2.31) may be generalized even further by realizing the
following result, which we leave as an exercise for the students:
Proposition 2.36
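For any nonzero real number $a$,
\[ \lim_{x\to 0} \frac{\sin(ax)}{ax} = 1, \qquad \lim_{x\to 0} \frac{\cos(ax) - 1}{ax} = 0. \]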
Example 2.37
\[ \lim_{x\to 0} \frac{x^2 \sin(4x)}{\sin^3(7x)}. \]
Solution. The first important point to realize is that we can evaluate a limit as follows:
\[ \lim_{x\to 0} \frac{\sin(ax)}{x} = \lim_{x\to 0} \frac{\sin(ax)}{x}\cdot\frac{a}{a} = a \lim_{x\to 0} \frac{\sin(ax)}{ax} = a. \]
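The next key point is to try to match up our $x$ components with our sine components. However, we notice that we have four sines and only two $x$ components. Luckily, we can artificially introduce additional $x$'s as follows:
\[ \begin{aligned}
\lim_{x\to 0} \frac{x^2\sin(4x)}{\sin^3(7x)} &= \lim_{x\to 0} \frac{x}{\sin(7x)}\cdot\frac{x}{\sin(7x)}\cdot\frac{\sin(4x)}{\sin(7x)} \\
&= \lim_{x\to 0} \frac{x}{\sin(7x)}\cdot\frac{x}{\sin(7x)}\cdot\frac{\sin(4x)}{\sin(7x)}\cdot\frac{x}{x} \\
&= \lim_{x\to 0} \frac{x}{\sin(7x)}\cdot\frac{x}{\sin(7x)}\cdot\frac{\sin(4x)}{x}\cdot\frac{x}{\sin(7x)} \\
&= \left[\lim_{x\to 0} \frac{x}{\sin(7x)}\right]\left[\lim_{x\to 0} \frac{x}{\sin(7x)}\right]\left[\lim_{x\to 0} \frac{\sin(4x)}{x}\right]\left[\lim_{x\to 0} \frac{x}{\sin(7x)}\right] \\
&= \frac{1}{7}\cdot\frac{1}{7}\cdot 4\cdot\frac{1}{7} = \frac{4}{7^3}.
\end{aligned} \]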
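As a sanity check on the algebra, one can evaluate the quotient at a few small values of $x$ and compare against $4/7^3$. This is a rough numerical sketch only (floating-point evaluation is not a proof), and `quotient` is a hypothetical helper name.

```python
import math

def quotient(x):
    """Evaluate x^2 * sin(4x) / sin^3(7x) for x != 0."""
    return x**2 * math.sin(4 * x) / math.sin(7 * x) ** 3

target = 4 / 7**3  # the value obtained in the worked solution

# The printed quotients should approach the target as x shrinks.
for x in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"x={x:.0e}  quotient={quotient(x):.10f}  target={target:.10f}")
```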
2.7
With just continuity under our belts, we can take a look at two exceptionally powerful theorems:
the Intermediate Value Theorem (IVT) and the Extreme Value Theorem (EVT).
2.7.1
Theorem 2.38: The Intermediate Value Theorem
Let $f$ be a continuous function on $[a, b]$ and assume that $f(a) < f(b)$. For every $v \in (f(a), f(b))$ there exists an $x \in (a, b)$ such that $f(x) = v$.
This theorem also holds if $f(b) < f(a)$ so long as we take $v \in (f(b), f(a))$. Once we wipe away the mysterious mathematical jargon, the idea of this theorem is actually rather obvious: It says that if a function is continuous on an interval, it cannot jump over any point. Additionally, notice that the theorem implies the existence of an $x$ such that $f(x) = v$, but in no way indicates how to find that $x$. This is an example of a non-constructive theorem. The proof of this theorem is rather hard.¹¹
Remark: There is a version of the Intermediate Value Theorem that uses open intervals,
though it is more complicated to write down. The most important parts of the hypotheses are that
the function be continuous, and the interval be connected.
Example 2.39
Solution. I have deliberately underlined the trigger words for this question: notice that I have not asked the student to find the solution. Indeed, one would have a very difficult time solving for $x$ without the use of a computer. The trick here is to use the Intermediate Value Theorem (which is perhaps unsurprising given that this is the topic of our discussion). The key to doing most questions like this is to define a new function, say
\[ f(x) = e^x - \sin(\pi x) - 2 \]
and notice that if we can find a point $p \in [0, 1]$ such that $f(p) = 0$ then we will be done, since
\[ f(p) = 0 = e^p - \sin(\pi p) - 2 \quad\Leftrightarrow\quad e^p = \sin(\pi p) + 2. \]
To invoke the Intermediate Value Theorem, we first must show that $f(x)$ is a continuous function. We know that this is true though since $f(x)$ is a sum of continuous functions and hence is continuous. Next, we must show that there are values in $[0, 1]$ such that $f(x) < 0$ and $f(x) > 0$. This problem has been rigged so that these are very easy to find. Indeed
\[ f(0) = e^0 - \sin(0) - 2 = -1, \qquad f(1) = e^1 - \sin(\pi) - 2 = e - 2, \]
where in calculating $f(1)$ we have noted that $e \approx 2.7$ so that $e - 2 > 0$. Thus $f(0) < 0$ and $f(1) > 0$ implies there is some $p \in [0, 1]$ such that $f(p) = 0$, and this is precisely what we wanted to show.
¹¹ Using first principles of the real numbers, it requires a forcing argument. There is a much simpler proof using the notion of connectedness, but this is too advanced for this course. In fact, the IVT is a special case of a far more powerful theorem: The continuous image of a connected set is always connected.
Now that we have seen an example, let us talk about the general methodology for solving problems with the Intermediate Value Theorem. The questions will almost always say something along the lines of "Here are two continuous functions $f(x)$ and $g(x)$. Show there is a point $c$ such that $f(c) = g(c)$." It might not be this clear in the problem statement, but you can always re-arrange it so that this is the problem you are asked to solve.
In such instances, your strategy should be as follows: Define a new function $h(x) = f(x) - g(x)$, which is continuous as it is the difference of continuous functions. Find a point $a$ such that $h(a) < 0$ and a point $b$ such that $h(b) > 0$. By the Intermediate Value Theorem, there is a point $c$ between $a$ and $b$ such that $h(c) = 0$. This is precisely the point we want to find, since
\[ 0 = h(c) = f(c) - g(c) \]
so re-arranging we have $f(c) = g(c)$. Let's see some more examples, this time a little more theoretical.
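The Intermediate Value Theorem only guarantees that the point $c$ exists. If an approximate value is wanted, one common numerical technique is bisection (not covered in these notes), which repeatedly halves an interval on which $h$ changes sign. The following is a minimal sketch under the assumption that $h$ is continuous and changes sign on $[a, b]$; the name `bisect_root` and the choice $f(x) = \cos(x)$, $g(x) = x$ on $[0, 1]$ are illustrative only.

```python
import math

def bisect_root(h, a, b, tol=1e-10):
    """Approximate a point c in [a, b] with h(c) = 0, assuming h is
    continuous and h(a), h(b) have opposite signs (the IVT hypothesis)."""
    ha, hb = h(a), h(b)
    if ha == 0:
        return a
    if hb == 0:
        return b
    if ha * hb > 0:
        raise ValueError("h(a) and h(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        hm = h(m)
        if hm == 0:
            return m
        if ha * hm < 0:
            b = m            # sign change lies in [a, m]
        else:
            a, ha = m, hm    # sign change lies in [m, b]
    return (a + b) / 2

# Illustrative choice: find c with cos(c) = c by taking h = f - g.
f = math.cos
g = lambda x: x
c = bisect_root(lambda x: f(x) - g(x), 0.0, 1.0)
print(c, f(c) - g(c))
```

Each pass keeps the half-interval on which the sign change survives, which is exactly the situation the theorem requires.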
Example 2.40
If $f(x)$ is a continuous function such that for all $x \in [0, 1]$ we have $f(x) \in (0, 1)$, show that $f(x)$ has a fixed point; that is, there is a point $p$ such that $f(p) = p$.
[Figure 7: graph of $f(x)$ on $[0, 1]$.]
Solution. We refer to Figure 7 for geometric intuition. The question is essentially asking us to show that any function which lies strictly in the box $[0, 1] \times [0, 1]$ must cross the diagonal $y = x$ at some point.
This question is asking us to find a solution to the equation $f(x) = x$. If we proceed in the same fashion as Exercise ?? and our discussion above, we should define a new function
\[ g(x) = f(x) - x \]
and show that there is a point $p$ such that $g(p) = 0$. To apply the Intermediate Value Theorem, we must first confirm that $g(x)$ is continuous. Indeed, since $f(x)$ is continuous and $x$ is continuous,
$g(x)$ is a difference of continuous functions and hence is continuous itself. Next, we want to show that $g$ goes below zero and above zero. Indeed, this is the case since
\[ g(0) = f(0) - 0 = f(0) \]
and since $f(0) \in (0, 1)$ we know that $g(0) = f(0) > 0$. On the other hand,
\[ g(1) = f(1) - 1 \]
and since $f(1) \in (0, 1)$ it must be that $g(1) < 0$. Thus by the Intermediate Value Theorem, there exists $p \in [0, 1]$ such that $g(p) = 0$ and so $f(p) = p$ as required.
I personally hate the above stated version of the Intermediate Value Theorem.
Here is an equivalent (and much better) statement:
Exercise:
If f is continuous on the interval [a, b] and f (a)f (b) < 0, then there exists some
x such that f (x) = 0.
Convince yourself that this statement is equivalent to Theorem 2.38.
2.7.2
Theorem 2.41: The Extreme Value Theorem
If $f$ is continuous on $[a, b]$ then $f$ attains its maximum and minimum values on $[a, b]$.
Remark: The important parts of this theorem are the opposite of the IVT. Indeed, it is
crucial for this theorem that we look at the closed interval [a, b], and we can actually put multiple
closed intervals into this theorem and still have it be true.
Being a named theorem, the EVT finds a tonne of applications in mathematics.¹² Unfortunately
(or perhaps fortunately for the student), the most interesting examples are beyond the scope of this
course. For this reason, the variety of questions regarding the EVT is quite sparse. Nonetheless,
let us look at some examples:
Example 2.42
Determine on which of the following intervals the function $\frac{1}{x}$ attains a global maximum and minimum.
\[ [-1, 1], \qquad (0, 1), \qquad [1, 2]. \]
Solution. According to the Extreme Value Theorem, we can be guaranteed that $\frac{1}{x}$ attains its max and min if it is continuous on a closed interval. The first interval $[-1, 1]$ is closed but $\frac{1}{x}$ is not continuous at $0$, which is a point in $[-1, 1]$. Hence we cannot guarantee that $\frac{1}{x}$ attains a max and min on $[-1, 1]$ (though note that it does attain local extrema at $x = \pm 1$). Similarly, the interval $(0, 1)$ is not closed so we cannot guarantee that the maximum or minimum is attained. In fact, there is no max or min of $\frac{1}{x}$ in $(0, 1)$. Finally, $\frac{1}{x}$ is continuous on $[1, 2]$ and so by the Extreme Value Theorem the max and min are attained.
¹² Just as in the case of the Intermediate Value Theorem, the EVT is a special case of a far more powerful theorem: The continuous image of a compact set is compact.
3 Derivatives
Derivatives are the tools which allow us to dynamically analyze the properties of functions, and are intimately connected with limits (insomuch as we require the limit's ability to capture the infinitesimal behaviour of a function).
3.1
First Principles
The speed of the car is measured as the distance travelled in a given amount of time, but limiting
ourselves to a single instant means that neither any time nor distance has passed. How then can
we ascribe a sensible meaning to this instantaneous velocity? One approach was to approximate
the speed by measuring the distance travelled by the car in the minute before (or after) crossing
the finish line, and taking the average speed in that duration. Of course, we can get a better
approximation by measuring the distance in a single second before/after crossing the finish line,
or any time smaller than a second. If $f(t)$ represents the position of the car at time $t$ and the car passes the finish line at time $t_0$, then the average speed of the car in the second after passing the finish line is given by
\[ \text{average speed} = \frac{\text{distance}}{\text{time}} = \frac{f(t_0 + 1\text{ sec}) - f(t_0)}{1\text{ sec}}. \]
We can do this more generally by letting $h$ denote an arbitrary quantity of time, so that
\[ \text{average speed} = \frac{\text{distance}}{\text{time}} = \frac{f(t_0 + h) - f(t_0)}{h} \]
where if $h > 0$ then we are measuring after crossing the finish line, and if $h < 0$ we are measuring prior to crossing the finish line. The instantaneous speed of the car may then be determined by taking the limit as our time interval becomes arbitrarily small; that is, by taking $h \to 0$.
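To see the average speeds settle down as $h$ shrinks, here is a small numerical sketch. The position function $f(t) = 5t^2$ and the crossing time $t_0 = 2$ are made-up values for illustration, and `average_speed` is a hypothetical helper name.

```python
def average_speed(position, t0, h):
    """Average speed over the time interval between t0 and t0 + h (h != 0)."""
    return (position(t0 + h) - position(t0)) / h

# Assumed position function for illustration only: f(t) = 5 t^2.
position = lambda t: 5.0 * t**2
t0 = 2.0

# Positive h measures after the finish line, negative h measures before it.
for h in (1.0, 0.1, 0.01, 0.001, -0.001):
    print(f"h={h:+.3f}  average speed={average_speed(position, t0, h):.6f}")
```

For this particular choice the quotient simplifies to $20 + 5h$, so the printed values approach 20 from either side.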
While this works well for cars, there is no reason that we cannot generalize this discussion to a
more abstract setting or to other examples. The key point is that having precise knowledge about
an object at any given point allows us to determine how quickly some attribute is changing. This
can be realized geometrically as follows:
Definition 3.1
Consider the graph of a function f (x) and let a < b be distinct real numbers. The secant
line from a to b is the unique straight line which passes through the points (a, f (a)) and
(b, f (b)).
Of interest to us is the slope of this secant line, given by
\[ m_{ab} = \frac{f(b) - f(a)}{b - a}. \]
In fact, this slope describes the average rate of change of the function $f(x)$ between $a$ and $b$. We can get successively better approximations to the instantaneous rate of change of a function by taking the distance between $a$ and $b$ to be successively smaller, and the instantaneous rate of change will be given by taking a limit as $a \to b$. Geometrically, in the limit we get the tangent line.
Definition 3.2
If $f(x)$ is a function and $a$ is in the domain of $f(x)$, we define the derivative of $f(x)$ at $a$ to be the value
\[ f'(a) = \lim_{x\to a} \frac{f(x) - f(a)}{x - a} \]
if this limit exists. In such an instance we say that $f(x)$ is differentiable at $a$, and when $f(x)$ is differentiable everywhere we say it is simply differentiable.
Example 3.3
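\[ f'(1) = \lim_{x\to 1} \frac{x^2 - 1}{x - 1} = \lim_{x\to 1} (x + 1) = 2, \]
\[ f'(4) = \lim_{x\to 4} \frac{x^2 - 16}{x - 4} = \lim_{x\to 4} (x + 4) = 8. \]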
One of the things that the student may have noticed in Example 3.3 was that the computation of the derivative at each point is rather redundant. Indeed, rather than computing the derivative of $f(x) = x^2$ separately at $x = 1$ and $x = 4$, what if we used the number $x = a$ in general? We would find that
\[ \begin{aligned}
f'(a) &= \lim_{x\to a} \frac{f(x) - f(a)}{x - a} = \lim_{x\to a} \frac{x^2 - a^2}{x - a} \\
&= \lim_{x\to a} \frac{(x + a)(x - a)}{x - a} = \lim_{x\to a} (x + a) \\
&= 2a.
\end{aligned} \]
This agrees with what we found when $a = 1$ and $a = 4$, but gives us the derivative at any point $a$. Hence one can view the derivative of a function as a function itself, with
\[ f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} = \lim_{y\to x} \frac{f(y) - f(x)}{y - x}. \]
In fact, this is where the word derivative comes from: the value of the function $f'(x)$ is derived from that of $f(x)$.
Remark 3.4 If $f$ is a function and we compute its derivative $f'$, what is the domain of $f'$ and how is it related to that of $f$? The way this is usually handled often depends on the type of mathematics being done. For example, some people insist that the domain of $f$ and $f'$ must be the same. This is typically done for convenience, but can sometimes be important to the study of your space. On the other hand, some mathematicians define the domain of $f'$ to be the largest subset of the domain of $f$ on which the derivative exists. In this case, the domain of $f'$ is a subset of the domain of $f$. For the most part, unless otherwise specified, this class will always use the latter definition.
A different parameterization: An alternative way of writing the derivative comes from changing how we parameterize the limit. Instead of taking two distinct points $a$ and $b$, let us set $b = a + h$, so that the limit $b \to a$ is equivalent to the limit $h \to 0$. Thus an equivalent definition of the derivative is given by
\[ f'(a) = \lim_{h\to 0} \frac{f(a + h) - f(a)}{(a + h) - a} = \lim_{h\to 0} \frac{f(a + h) - f(a)}{h}. \tag{3.1} \]
Exercise:
Convince yourself of the equality of the two definitions for the derivative given in
(3.1).
Example 3.5
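Compute the derivative of the function $f(x) = 9/x$. Compare the domain of $f$ and $f'$.
Solution. Let us apply our alternative definition of the derivative to this example (though I encourage the student to also compute the derivative using the former definition and see that they are the same).
\[ \begin{aligned}
f'(x) &= \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h\to 0} \frac{9/(x + h) - 9/x}{h} \\
&= \lim_{h\to 0} \frac{\frac{9x - 9(x + h)}{x(x + h)}}{h} = \lim_{h\to 0} \frac{-9h}{xh(x + h)} \\
&= \lim_{h\to 0} \frac{-9}{x(x + h)} = -\frac{9}{x^2}.
\end{aligned} \]
Both $f$ and $f'$ are defined precisely when $x \neq 0$, so in this example the two domains coincide.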
Tangent Lines: We know that $f'(a)$ represents the slope of the tangent line to the graph of $f(x)$ at the point $a$. Since we have a slope and a point in the plane $(a, f(a))$, we can form the equation of the tangent line to the graph of $f(x)$ at $a$ using the point-slope formula:
\[ y - f(a) = f'(a)(x - a). \]
Example 3.6
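Determine the equation of the tangent line to the graph of $f(x) = \sqrt{x}$ at the point $x = 1$.
Solution. While we are at it, we might as well determine the derivative of the function $f(x) = \sqrt{x}$.
\[ \begin{aligned}
f'(x) &= \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h\to 0} \frac{\sqrt{x + h} - \sqrt{x}}{h} \\
&= \lim_{h\to 0} \frac{\sqrt{x + h} - \sqrt{x}}{h}\cdot\frac{\sqrt{x + h} + \sqrt{x}}{\sqrt{x + h} + \sqrt{x}} && \text{(multiply by conjugate)} \\
&= \lim_{h\to 0} \frac{x + h - x}{h(\sqrt{x + h} + \sqrt{x})} \\
&= \lim_{h\to 0} \frac{1}{\sqrt{x + h} + \sqrt{x}} = \frac{1}{2\sqrt{x}}.
\end{aligned} \]
At $x = 1$ we have $f(1) = \sqrt{1} = 1$, so we know our line passes through the point $(1, 1)$. Our derivative formula tells us that $f'(1) = \frac{1}{2}$, so using our point-slope formula, we thus deduce that the equation of the tangent line is $y - 1 = \frac{1}{2}(x - 1)$ or $y = \frac{1}{2}x + \frac{1}{2}$.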
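The tangent line can be cross-checked numerically. The sketch below estimates the slope with a symmetric difference quotient, which is a numerical stand-in for the limit and not the definition used in these notes; `tangent_line` and the step size `h` are assumptions made for this snippet.

```python
import math

def tangent_line(f, a, h=1e-6):
    """Return (slope, intercept) of the approximate tangent line y = slope*x + intercept
    to the graph of f at x = a, using a symmetric difference quotient for the slope."""
    slope = (f(a + h) - f(a - h)) / (2 * h)
    intercept = f(a) - slope * a  # rearranged point-slope form
    return slope, intercept

slope, intercept = tangent_line(math.sqrt, 1.0)
print(slope, intercept)  # both should be close to 1/2
```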
Example 3.7
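\[ \begin{aligned}
f'(x) &= \lim_{h\to 0} \frac{\frac{1}{x + h} - \frac{1}{x}}{h} = \lim_{h\to 0} \frac{\frac{x - (x + h)}{x(x + h)}}{h} \\
&= \lim_{h\to 0} \frac{-h}{hx(x + h)} = \lim_{h\to 0} \frac{-1}{x(x + h)} \\
&= -\frac{1}{x^2}.
\end{aligned} \]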
There are occasions when, in lieu of using a function to find a derivative, we would instead prefer to define a relationship amongst variables, with one variable relying explicitly upon the other variable. Such a relationship might be written as $y = x^2$. In the event that the relationship is given by a function $f(x)$, we often write $y = f(x)$. This alternative method of thinking about the relationship between the input and output of an assignment has a correspondingly alternative method for writing the derivative: If $y = f(x)$ then
\[ f'(x) = \frac{dy}{dx}. \]
One should understand the notation $\frac{dy}{dx}$ as meaning a quantity which describes how $y$ is changing with respect to $x$. The motivation for this definition comes from the definition of the derivative as
the limit of secant lines. If $a < b$ are two distinct real numbers, the change in the $y$-value of the function $f(x)$ between $a$ and $b$ may be written as $\Delta y = f(b) - f(a)$ (this is still often used amongst physicists), with the change in the $x$-value being written as $\Delta x = b - a$. The slope of the secant line between $a$ and $b$ is then
\[ m^f_{ab} = \frac{\Delta y}{\Delta x} \xrightarrow{\ \Delta x \to 0\ } \frac{dy}{dx} \]
and in the limit as $\Delta x \to 0$ these deltas transform into $d$'s.
Example 3.8
Determine $\frac{dy}{dx}$ if $y = (x^2 + 1)/(2x)$.
Solution. As in all previous examples, this amounts to an exercise in algebraic manipulation of the definition of the derivative. Indeed, one finds that
\[ \begin{aligned}
\frac{dy}{dx} &= \lim_{h\to 0} \frac{\frac{(x + h)^2 + 1}{2(x + h)} - \frac{x^2 + 1}{2x}}{h} \\
&= \lim_{h\to 0} \frac{(2x^3 + 4x^2 h + 2xh^2 + 2x) - (2x^3 + 2x^2 h + 2x + 2h)}{h(2x)(2x + 2h)} \\
&= \lim_{h\to 0} \frac{2x^2 h + 2xh^2 - 2h}{h(2x)(2x + 2h)} = \frac{2x^2 - 2}{4x^2} \\
&= \frac{x^2 - 1}{2x^2}.
\end{aligned} \]
3.1.1
Failures of Differentiability
A posteriori, there are three ways in which a function may fail to be differentiable.
1. The function may have a cusp.
2. The function may fail to be continuous.
3. The function may have a vertical tangent line.
With any luck, the student may be able to convince themselves as to why a function fails to be differentiable at a cusp. If one were to visualize the tangent lines to the graph in a neighbourhood around a cusp, they would see that the slopes of the tangents approach distinctly different values on either side of the cusp point. This implies that the two-sided limit
\[ \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} \tag{3.2} \]
does not exist, despite the fact that the one-sided limits might. A similar argument holds for discontinuity. If $f(x)$ is not continuous at the point $x_0$ then either $f(x_0)$ does not exist (making Equation 3.2 read as nonsense in the case of removable or essential discontinuities) or $f(x_0 + h) - f(x_0)$ fails to converge to $0$ as $h \to 0$ while the denominator $h$ does, so that the limit in Equation 3.2 is not defined. Finally, a vertical tangent line has infinite slope, meaning that the difference quotient diverges at that point.
Example 3.9
Consider the functions $f(x) = |x|$, $g(x) = x/|x|$ and $h(x) = \sqrt[3]{x}$. Determine the points where
each function is not differentiable, and classify the manner in which the function fails to be
differentiable.
Solution. We begin by analyzing the function f (x) = |x|. A glance at the graph of f (x) shows that
f (x) has a cusp at the point x = 0, so let us check whether f (x) is differentiable at 0. Each one
sided limit may be computed as
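\[ \lim_{h\to 0^+} \frac{f(0 + h) - f(0)}{h} = \lim_{h\to 0^+} \frac{|h|}{h} = 1, \]
\[ \lim_{h\to 0^-} \frac{f(0 + h) - f(0)}{h} = \lim_{h\to 0^-} \frac{|h|}{h} = -1. \]
Since the two one-sided limits disagree, the two-sided limit does not exist and $f(x)$ is not differentiable at $0$.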
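The disagreement between the two one-sided limits for $|x|$ at $0$ is easy to observe numerically. This is an illustrative sketch only; `one_sided_quotients` is a hypothetical helper and the step `h` is an arbitrary small number.

```python
def one_sided_quotients(f, a, h=1e-6):
    """Return (left, right) difference quotients of f at a, using a step of size h > 0
    taken to the left and to the right of a respectively."""
    right = (f(a + h) - f(a)) / h
    left = (f(a - h) - f(a)) / (-h)
    return left, right

left, right = one_sided_quotients(abs, 0.0)
print(left, right)  # the two slopes on either side of the cusp
```

For a function that is differentiable at `a`, the two returned values would instead be close to each other for small `h`.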
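Finally, let us look at the function $h(x) = \sqrt[3]{x}$. If we recall that we may factor a difference of cubes as
\[ a^3 - b^3 = (a - b)(a^2 + ab + b^2) \]
we may compute the derivative of $h(x)$ in general to be
\[ \begin{aligned}
h'(x) &= \lim_{y\to x} \frac{\sqrt[3]{x} - \sqrt[3]{y}}{x - y} \\
&= \lim_{y\to x} \frac{\sqrt[3]{x} - \sqrt[3]{y}}{(\sqrt[3]{x} - \sqrt[3]{y})\left(\sqrt[3]{x^2} + \sqrt[3]{xy} + \sqrt[3]{y^2}\right)} \\
&= \lim_{y\to x} \frac{1}{\sqrt[3]{x^2} + \sqrt[3]{xy} + \sqrt[3]{y^2}} \\
&= \frac{1}{3\sqrt[3]{x^2}}.
\end{aligned} \]
This formula thus holds for any $x \neq 0$, but at $x = 0$ the slope of the tangent line is infinite.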
I had mentioned previously that continuity measures the "well-behavedness" of a function; in particular, a function $f(x)$ is continuous at $a$ if, in a neighbourhood of $a$, $f(x)$ is bounded, does not oscillate infinitely quickly, and does not jump. The above discussion shows that a function $f(x)$ is differentiable at $a$ if it is well-behaved in many of the same ways as a continuous function, with the added assumption that $f(x)$ has no cusps. We may glean this as an indication that differentiability measures the "smoothness" of a function, and in fact the word "smooth" has a formal mathematical definition which I will allude to shortly. Of more immediate importance is the fact that being differentiable is stronger than being continuous; that is, all differentiable functions are continuous, though the converse is not true (not all continuous functions are differentiable).
Proposition 3.10
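If $f$ is differentiable at $a$ then $f$ is continuous at $a$.
Proof. Continuity of $f$ at $a$ means that
\[ \lim_{x\to a} f(x) = f(a) \tag{3.3} \]
or equivalently
\[ \lim_{x\to a} [f(x) - f(a)] = 0. \]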
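At this point we say to ourselves, "what part of the hypothesis have we failed to use?" Examination will show that we know something about the quantity $(f(x) - f(a))/(x - a)$ and now wish to say something about $f(x) - f(a)$, so we should multiply and divide by the quantity $(x - a)$ to find
\[ \begin{aligned}
\lim_{x\to a} [f(x) - f(a)] &= \lim_{x\to a} \left[\frac{f(x) - f(a)}{x - a}\,(x - a)\right] \\
&= \left[\lim_{x\to a} \frac{f(x) - f(a)}{x - a}\right]\left[\lim_{x\to a} (x - a)\right] \\
&= L \cdot 0 = 0
\end{aligned} \]
as desired, where $L$ is the value of the limit $\lim_{x\to a} \frac{f(x) - f(a)}{x - a}$.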
3.2
Introducing the Limit Laws meant that we could avoid having to create annoying value-tables each
time we wished to evaluate a limit. Similarly, it is rather cumbersome lugging around the awkward
definition of the derivative. In this section, we give a few formulas to simplify some calculations,
and examine how to differentiate polynomials and exponential functions.
Proposition 3.11
1. $\frac{d}{dx}[\alpha f(x)] = \alpha f'(x)$ for any real number $\alpha$,
2. $\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x)$.
Proof.
re-arranging terms
The previous theorem tells us that $\frac{d}{dx}$ is what is called a linear operator, in that it preserves scalar multiplication and addition. As an immediate corollary, we see that
\[ \frac{d}{dx}[f(x) - g(x)] = \frac{d}{dx}[f(x) + (-1)g(x)] = f'(x) + (-1)g'(x) = f'(x) - g'(x) \]
so that the derivative of the difference is also the difference of the derivatives.
These facts are ultimately important, but are often very easy to remember and are not particularly enlightening. However, they are extremely useful in that they tell us exactly how to differentiate polynomials, once we have the following result in hand.
Proposition 3.12
\[ \frac{d}{dx} x^n = n x^{n-1}. \]
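Proof. Recall that one may always factor the difference of two $n$th powers as
\[ a^n - b^n = (a - b)(a^{n-1} + a^{n-2}b + \cdots + ab^{n-2} + b^{n-1}). \]
Applying this to the definition of the derivative, we see that
\[ \begin{aligned}
\frac{d}{dx} x^n &= \lim_{y\to x} \frac{y^n - x^n}{y - x} = \lim_{y\to x} \frac{(y - x)(y^{n-1} + y^{n-2}x + \cdots + yx^{n-2} + x^{n-1})}{y - x} \\
&= \lim_{y\to x} \underbrace{\left(y^{n-1} + y^{n-2}x + \cdots + yx^{n-2} + x^{n-1}\right)}_{n\text{ terms}} \\
&= n x^{n-1}.
\end{aligned} \]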
Now every polynomial function is built from scalar multiplication and addition of monomials of the form $a_n x^n$, hence we may now differentiate any polynomial quite easily as follows:
\[ \begin{aligned}
\frac{d}{dx}\left[a_n x^n + a_{n-1} x^{n-1} + \cdots + a_2 x^2 + a_1 x + a_0\right] &= a_n \frac{d}{dx} x^n + a_{n-1} \frac{d}{dx} x^{n-1} + \cdots + a_1 \frac{d}{dx} x + 0 \\
&= n a_n x^{n-1} + (n - 1) a_{n-1} x^{n-2} + \cdots + 2 a_2 x + a_1.
\end{aligned} \]
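The formula above is mechanical enough to express as a few lines of code. In this sketch a polynomial is stored as a list of coefficients in increasing degree; both that representation and the name `differentiate_polynomial` are choices made for this illustration.

```python
def differentiate_polynomial(coeffs):
    """Given coeffs[k] = a_k, the coefficient of x^k, return the coefficient
    list of the derivative: each a_k x^k becomes k * a_k x^(k-1)."""
    return [k * a for k, a in enumerate(coeffs)][1:]

# 3x^3 - 2x + 7 is stored as [7, -2, 0, 3]; its derivative is 9x^2 - 2.
print(differentiate_polynomial([7, -2, 0, 3]))
```

Dropping the first entry reflects the fact that the constant term $a_0$ differentiates to $0$.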
Example 3.13
While the above proposition only showed that $\frac{d}{dx} x^n = n x^{n-1}$ for any positive integer $n$, it turns out that the formula works for all real numbers $n$. There is some work involved in showing that this is the case, so for the moment the student should take this fact for granted.
Exercise:
Show that
d n
dx
Exercise:
\[ f(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \cdots. \]
Compute the derivative of f (x). If you are given that derivatives are unique up to additive
constants, what function must f (x) be?
3.2.1
Computing the derivative of sums of functions was ultimately rather simple. However, it turns out
that computing the derivative of a product is a far more complicated affair.
Theorem 3.14
Let $f(x)$ be a differentiable function such that $f'(x) = 1/f(x)$. Compute $\frac{d}{dx}[f(x)]^2$.
Exercise:
Example 3.16
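Let $f(x)$ be differentiable and satisfy $f(1) = 1$ and $f'(1) = 2$. Compute $g'(1)$ where $g(x) = f(x)/x$.
Solution. We have already seen that $\frac{d}{dx}\frac{1}{x} = -\frac{1}{x^2}$, so the product rule tells us that
\[ \begin{aligned}
\frac{d}{dx} g(x) &= \frac{d}{dx}\left[f(x)\cdot\frac{1}{x}\right] = f'(x)\cdot\frac{1}{x} + f(x)\cdot\frac{d}{dx}\frac{1}{x} \\
&= \frac{f'(x)}{x} + f(x)\left(-\frac{1}{x^2}\right) \\
&= \frac{f'(x)}{x} - \frac{f(x)}{x^2}.
\end{aligned} \]
Evaluating at $x = 1$ gives
\[ g'(1) = \frac{f'(1)}{1} - \frac{f(1)}{1^2} = \frac{2}{1} - \frac{1}{1} = 2 - 1 = 1. \]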
In fact, there is no reason to limit ourselves to considering the product of only two functions. If $f(x)$, $g(x)$, and $h(x)$ are all differentiable then
\[ \frac{d}{dx} f(x)g(x)h(x) = f'(x)g(x)h(x) + f(x)g'(x)h(x) + f(x)g(x)h'(x). \]
The way to see this is simple: define a new function $n(x) = g(x)h(x)$ so that $n'(x) = g'(x)h(x) + g(x)h'(x)$ and $f(x)g(x)h(x) = f(x)n(x)$. Since the right-hand side is a product of two functions, the product rule again gives us
\[ \begin{aligned}
\frac{d}{dx} f(x)n(x) &= f'(x)n(x) + f(x)n'(x) = f'(x)g(x)h(x) + f(x)\left[g'(x)h(x) + g(x)h'(x)\right] \\
&= f'(x)g(x)h(x) + f(x)g'(x)h(x) + f(x)g(x)h'(x)
\end{aligned} \]
and this process is easily generalized to any number of functions.
Theorem 3.17: The Quotient Rule
If $f(x)$ and $g(x)$ are differentiable at $c$ and $g(c) \neq 0$ then $f(x)/g(x)$ is differentiable at $c$ and
\[ \frac{d}{dx}\frac{f(x)}{g(x)} = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}. \]
Exercise:
d
f (x)n =
dx
2. Use part (a) to show that the Product Rule and the Quotient Rule are equivalent.
Example 3.18
3.2.2
Higher Derivatives
Differentiating a function $f(x)$ resulted in another function $f'(x)$. If $f'(x)$ is nice enough, we should be able to differentiate it again to get $f''(x)$, the second derivative of $f(x)$. Once again, $f''(x)$ is a function and we may differentiate it to get a third derivative $f'''(x)$. After three primes, one usually uses the notation $f^{(n)}(x)$ to denote the $n$th derivative of $f(x)$. These have important interpretations in both mathematics and science which we shall explore later. In Leibniz notation, we use the operator $\frac{d}{dx}$ to take subsequent derivatives, hence if $y = f(x)$ then the first derivative is $\frac{dy}{dx}$, while the second, third, and fourth derivatives are
\[ \frac{d}{dx}\frac{dy}{dx} = \frac{d^2 y}{dx^2}, \qquad \frac{d}{dx}\frac{d^2 y}{dx^2} = \frac{d^3 y}{dx^3}, \qquad \frac{d}{dx}\frac{d^3 y}{dx^3} = \frac{d^4 y}{dx^4} \]
respectively, with the pattern continuing ad infinitum.
Example 3.19
Compute the second derivative of the function $f(x) = \frac{1}{x}$. Try to determine a formula for the $n$th derivative $f^{(n)}(x)$, and prove that this is the case using induction.
Solution. In Example 3.7 we showed that $f'(x) = -1/x^2$. To compute $f''(x)$ we take the derivative of $f'(x)$ and find that
\[ \begin{aligned}
f''(x) &= \frac{d}{dx} f'(x) = \lim_{h\to 0} \frac{f'(x + h) - f'(x)}{h} \\
&= \lim_{h\to 0} \frac{-\frac{1}{(x + h)^2} + \frac{1}{x^2}}{h} = \lim_{h\to 0} \frac{\frac{-x^2 + (x + h)^2}{x^2 (x + h)^2}}{h} \\
&= \lim_{h\to 0} \frac{2xh + h^2}{h x^2 (x + h)^2} = \lim_{h\to 0} \frac{2x + h}{x^2 (x + h)^2} \\
&= \frac{2x}{x^4} = \frac{2}{x^3}.
\end{aligned} \]
Of course, having completed exercise 3.2 we can also confirm this using the power rule.
If we were to continue on in this fashion, the third derivative would be computed to be $-\frac{6}{x^4}$, and so on. We hypothesize that in general:
\[ f^{(n)}(x) = \frac{(-1)^n n!}{x^{n+1}}. \]
We have done the base case already. In the induction step, we thus have
\[ \begin{aligned}
f^{(n+1)}(x) &= \frac{d}{dx} f^{(n)}(x) = \frac{d}{dx} \frac{(-1)^n n!}{x^{n+1}} \\
&= (-1)^n n! \, \frac{d}{dx} x^{-(n+1)} \\
&= (-1)^n n! \left[-(n + 1) x^{-(n+2)}\right] \\
&= \frac{(-1)^{n+1} (n + 1)!}{x^{n+2}},
\end{aligned} \]
as required.
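The induction can be spot-checked for the first few values of $n$ by applying the power rule repeatedly to the single term $x^{-1}$. This sketch assumes the power rule for negative exponents (which the notes ask the student to take for granted); the pair representation `(c, k)` for $c\,x^k$ and the function names are hypothetical.

```python
from math import factorial

def power_rule(term):
    """term = (c, k) represents c * x^k; return its derivative (c*k, k-1)."""
    c, k = term
    return (c * k, k - 1)

def nth_derivative_of_reciprocal(n):
    """Differentiate 1/x = x^(-1) a total of n times via the power rule."""
    term = (1, -1)
    for _ in range(n):
        term = power_rule(term)
    return term

# Compare against the conjectured formula (-1)^n n! / x^(n+1).
for n in range(1, 7):
    c, k = nth_derivative_of_reciprocal(n)
    print(n, (c, k), ((-1) ** n * factorial(n), -(n + 1)))
```

A finite check like this does not replace the induction argument, but it is a quick way to catch a sign or factorial error in a conjectured formula.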
3.2.3
Trigonometric Derivatives
We would now like to explore the derivatives of the trigonometric functions sin(x) and cos(x). Since
all other functions (such as tan(x) and sec(x)) are built from quotients of these two functions, our
knowledge of the product and quotient rules will allow us to compute the derivatives of all the
remaining fundamental trigonometric functions. For the sake of clarity, recall the following two
limits from Section 2.6.
\[ \lim_{\theta\to 0} \frac{\sin(\theta)}{\theta} = 1, \qquad \lim_{\theta\to 0} \frac{\cos(\theta) - 1}{\theta} = 0. \tag{3.4} \]
Theorem 3.20
The functions $\sin(x)$ and $\cos(x)$ are everywhere differentiable, and in particular
\[ \frac{d}{dx}\sin(x) = \cos(x), \qquad \frac{d}{dx}\cos(x) = -\sin(x). \]
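Proof. I will demonstrate the proof for the derivative of $\sin(x)$ and leave the proof for $\cos(x)$ as an exercise for the student. Applying the definition of the derivative we have
\[ \begin{aligned}
\frac{d}{dx}\sin(x) &= \lim_{h\to 0} \frac{\sin(x + h) - \sin(x)}{h} \\
&= \lim_{h\to 0} \frac{\sin(x)\cos(h) + \cos(x)\sin(h) - \sin(x)}{h} \\
&= \lim_{h\to 0} \left[\sin(x)\,\frac{\cos(h) - 1}{h} + \cos(x)\,\frac{\sin(h)}{h}\right] \\
&= \sin(x)\underbrace{\lim_{h\to 0} \frac{\cos(h) - 1}{h}}_{=0} + \cos(x)\underbrace{\lim_{h\to 0} \frac{\sin(h)}{h}}_{=1} \\
&= \cos(x).
\end{aligned} \]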
Example 3.21
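Solution. Since $\tan(x) = \sin(x)/\cos(x)$ we can apply the quotient rule to find
\[ \begin{aligned}
\frac{d}{dx}\tan(x) &= \frac{\left[\frac{d}{dx}\sin(x)\right]\cos(x) - \sin(x)\left[\frac{d}{dx}\cos(x)\right]}{\cos^2(x)} \\
&= \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)} \\
&= \sec^2(x).
\end{aligned} \]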
Computing the remaining trigonometric derivatives is an excellent exercise in the use of the quotient rule, but I shall provide here a table of the trigonometric derivatives:
\[ \begin{aligned}
\frac{d}{dx}\sin(x) &= \cos(x) & \frac{d}{dx}\csc(x) &= -\csc(x)\cot(x) \\
\frac{d}{dx}\cos(x) &= -\sin(x) & \frac{d}{dx}\sec(x) &= \sec(x)\tan(x) \\
\frac{d}{dx}\tan(x) &= \sec^2(x) & \frac{d}{dx}\cot(x) &= -\csc^2(x)
\end{aligned} \]
Periodicity in the differentiation process: It is not too hard to see that by iteratively differentiating sine/cosine, we eventually cycle back on ourselves:
\[ \frac{d}{dx}\sin(x) = \cos(x), \qquad \frac{d^2}{dx^2}\sin(x) = -\sin(x), \qquad \frac{d^3}{dx^3}\sin(x) = -\cos(x), \qquad \frac{d^4}{dx^4}\sin(x) = \sin(x), \]
so differentiating the sine function four times gives us the function back (or alternatively, differentiating the function twice gives us back the negative of the function). The student should check that precisely the same argument works with the cosine function.
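Because the derivatives repeat with period four, the $n$th derivative of sine can be read off from $n \bmod 4$. A small sketch follows; the table `SINE_CYCLE` and the function name `nth_derivative_of_sin` are hypothetical names introduced here.

```python
import math

# The four functions visited when differentiating sin repeatedly, in order.
SINE_CYCLE = (
    ("sin(x)", math.sin),
    ("cos(x)", math.cos),
    ("-sin(x)", lambda x: -math.sin(x)),
    ("-cos(x)", lambda x: -math.cos(x)),
)

def nth_derivative_of_sin(n):
    """Return (label, function) for the n-th derivative of sin, n >= 0."""
    return SINE_CYCLE[n % 4]

for n in range(9):
    print(n, nth_derivative_of_sin(n)[0])
```

Starting the lookup at index 1 instead of 0 gives the corresponding cycle for cosine.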
Degrees vs Radians: It was essential that we used radians in lieu of degrees when doing all of our trigonometric limits. The reason is precisely because the identity $\lim_{x\to 0} \frac{\sin(x)}{x} = 1$ would no longer be true, and would instead give a value of $\frac{\pi}{180}$. One can then see how this would affect our calculations of the derivative by either imitating the proof of Theorem 3.20 or using the fact that if $r$ is radians and $\theta$ is degrees, then $\theta = \frac{180}{\pi} r$. Indeed,
\[ \frac{d}{d\theta}\sin(r) = \frac{d}{d\theta}\sin\left(\frac{\pi\theta}{180}\right) = \frac{\pi}{180}\cos\left(\frac{\pi\theta}{180}\right). \]
This destroys the nice symmetry between the derivatives of sine and cosine, and introduces a pesky
scaling factor every time we differentiate.
Even more importantly, since we learn calculus from the point of view of radians, if the student
is ever asked to do a computation, using degrees will actually give the wrong answer. The moral
of the story: Always use radians.
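The scaling factor is easy to observe numerically. In the sketch below, `sin_degrees` is a hypothetical helper that interprets its argument in degrees; the quotient $\sin(\theta^\circ)/\theta$ should approach $\pi/180 \approx 0.01745$ and not $1$.

```python
import math

def sin_degrees(theta):
    """Sine of an angle given in degrees (converted to radians internally)."""
    return math.sin(math.radians(theta))

# With degrees, sin(theta)/theta tends to pi/180 as theta tends to 0.
for theta in (10.0, 1.0, 0.1, 0.01):
    print(f"theta={theta:<6} ratio={sin_degrees(theta) / theta:.10f}  pi/180={math.pi / 180:.10f}")
```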
3.3
Rates of Change
We already saw at the beginning of this section how derivatives can be used to deduce the instantaneous rate of change of one quantity with respect to another. The power of this is that it allows
us to take boring, static relationships, and differentiate them to discover the dynamic interplay
between variables.
Example 3.22
Solution. All we need to do is differentiate the given expression with respect to $r$ to find that
\[ \frac{dV}{dr} = 4\pi r^2. \]
This says that the rate of change of volume $V$ with respect to the radius $r$ is $4\pi r^2$. As an example, this means that doubling the radius of a sphere will quadruple the rate at which its volume changes with respect to the radius, while tripling the radius will increase this rate 9-fold!
One of the more utilized relationships is that of position, velocity, acceleration, jerk, etc. If $p(t)$ describes the position of an object with respect to time, then $p'(t)$ is its velocity and $p''(t)$ is its acceleration with respect to time. This can be used to model physical situations which can then be solved by mathematical methods:
3.3.1
Some Physics
Gravity: We have learned at this point that if the trajectory of an object is described as $x(t)$, we may think of $\frac{dx}{dt}$ as the instantaneous velocity and $\frac{d^2 x}{dt^2}$ as the instantaneous acceleration of that object.
Example 3.23
Consider a child standing on the surface of the earth. If the child has a ball of mass m and
throws the ball from an initial height y0 with an initial velocity v0 , describe the trajectory
of the ball as a function of time.
Solution. The common theme in these types of questions is to exploit Newton's second law, which says that the net force $F_{\text{net}}$ exerted on a body is equal to the object's inertial mass times its acceleration; that is, $F_{\text{net}} = ma$. In our case, we shall ignore air-friction and assume that the acceleration due to gravity is a constant, denoted by $g$. A quick look at Figure 8 shows us that the net force acting on the ball is $F = -mg$, where the negative sign is included to indicate that the force of gravity is acting in the opposite direction of initial motion. Newton's second law then implies that
\[ F = ma = -mg. \]
Now since the mass is the same in either case (see the note at the bottom) we may cancel to deduce that $a = -g$. But we mentioned earlier that if $y$ is the ball's vertical displacement, then $a = \frac{d^2 y}{dt^2}$, yielding the ordinary differential equation
\[ \frac{d^2 y}{dt^2} = -g. \]
We are not yet at a stage wherein we will properly be able to treat differential equations. Nonetheless, this is not too difficult to handle since $g$ is a constant. In particular, we ask ourselves "What kind of function may we differentiate twice to get a constant?" The immediate answer is that quadratic functions will suffice. To see this, recall that
\[ \frac{d^2}{dt^2}\left(at^2 + bt + c\right) = 2a \tag{3.5} \]
which is a constant. We thus make the ansatz (educated guess) that we may describe the equation of motion as $y(t) = at^2 + bt + c$, for which we then need to determine the coefficients $a, b, c \in \mathbb{R}$. Equation (3.5) implies that $y''(t) = 2a = -g$ so that $a = -g/2$. The other coefficients are determined by the two pieces of information that we have yet to use: the initial height and initial velocity. The initial velocity (at time $t = 0$) is $v_0$, which means that
\[ v_0 = \frac{dy}{dt}(0) = \left[2at + b\right]_{t=0} = b \]
and the initial height is
\[ y_0 = y(0) = \left[at^2 + bt + c\right]_{t=0} = c. \]
Putting this all together we conclude that the equation of motion is given by
\[ y(t) = -\frac{g}{2}t^2 + v_0 t + y_0. \tag{3.6} \]
Note that this makes sense intuitively (Figure 8): indeed, the coefficient of the leading term $t^2$ should be negative since the ball is being projected upwards and then returns to the ground.
The derivative yields $y'(t) = -gt + v_0$, which has a zero at $t = v_0/g$. This is clearly a maximum since the second derivative satisfies $y''(t) = -g < 0$. The height at this time is given by
\[ y\left(\frac{v_0}{g}\right) = -\frac{g}{2}\left(\frac{v_0}{g}\right)^2 + v_0\left(\frac{v_0}{g}\right) + y_0 = \frac{v_0^2}{2g} + y_0. \tag{3.7} \]
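For the student who would like to see (3.6) and (3.7) in action, here is a minimal numerical sketch. The values of `g`, `v0` and `y0` are illustrative assumptions and do not come from the example; the point is only to confirm that the velocity vanishes at $t = v_0/g$ and that the height there agrees with (3.7).

```python
# Numerical sanity check of (3.6) and (3.7); g, v0, y0 are illustrative assumptions.
g = 9.81    # m/s^2, assumed constant gravitational acceleration
v0 = 12.0   # m/s, hypothetical initial upward velocity
y0 = 1.5    # m, hypothetical initial height


def height(t):
    """Equation (3.6): y(t) = -(g/2) t^2 + v0 t + y0."""
    return -0.5 * g * t**2 + v0 * t + y0


def velocity(t):
    """Derivative of (3.6): y'(t) = -g t + v0."""
    return -g * t + v0


t_peak = v0 / g                  # zero of y'(t)
y_peak = v0**2 / (2 * g) + y0    # equation (3.7)

print(f"peak time   = {t_peak:.4f} s")
print(f"peak height = {y_peak:.4f} m")
```

Evaluating `height` slightly before and after `t_peak` gives smaller values, as the second derivative test predicts.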
Remark 3.24 In the above derivation of our equation of motion, we commented that we could simply cancel the masses on each side of the equation. While this is something that we often do without thinking, I would ask the student to be very thoughtful about this procedure. Philosophically, these are in fact different masses: one is the object's inertial mass and the other is the object's gravitational mass. The fact that they are essentially the same is something called the Equivalence Principle and is actually one of the foundational postulates of Einstein's theory of general relativity.
Figure 8: Left: A child throws a ball of mass m into the air from an initial height of y0 . It is
pulled back to Earth via the force of gravity. Right: A plot of the height of the ball as a function
of time.
A good way of checking the feasibility of your answers in physics is to ensure that things are dimensionally correct. For example, the dimension of $y_0$, denoted $[y_0]$, is length. We write this as $[y_0] = L$. Since $v_0$ is a velocity, its units are length per time, written as $[v_0] = LT^{-1}$. Acceleration is length per time per time, so $[g] = LT^{-2}$. Finally, time is clearly just $[t] = T$. We can multiply any collection of dimensions, but we may only add dimensions when they are the same. Looking at equation (3.6) it is thus easy to see that
\[ \left[\frac{g}{2}t^2\right] = [g][t]^2 = (LT^{-2})T^2 = L, \qquad [v_0 t] = [v_0][t] = (LT^{-1})T = L, \qquad [y_0] = L \]
so that each component actually has the correct dimensions. Similarly, in (3.7) we have
\[ \left[\frac{v_0^2}{g}\right] = \frac{L^2T^{-2}}{LT^{-2}} = L \]
so again our solution makes sense. Notice then that all of this implies that $\left[\frac{d}{dt}\right] = T^{-1}$. In general, differentiating by a quantity with units $U$ results in an object whose units are affected by $U^{-1}$.
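The bookkeeping above can be mechanized. The following is only a sketch: the names `Dim`, `mul` and `power` are hypothetical helpers introduced here for illustration. A dimension is stored as a pair of exponents (length, time), so that multiplying quantities adds exponents.

```python
# Dimensions as exponent pairs (length, time); hypothetical helpers, not from the notes.
from collections import namedtuple

Dim = namedtuple("Dim", ["L", "T"])


def mul(a, b):
    """Multiplying quantities adds the exponents of their dimensions."""
    return Dim(a.L + b.L, a.T + b.T)


def power(a, n):
    """Raising a quantity to the n-th power scales the exponents by n."""
    return Dim(a.L * n, a.T * n)


LENGTH = Dim(1, 0)
TIME = Dim(0, 1)
VELOCITY = Dim(1, -1)       # [v0] = L T^-1
ACCELERATION = Dim(1, -2)   # [g]  = L T^-2

# The three terms of (3.6): (g/2) t^2, v0 t, and y0.
terms = [mul(ACCELERATION, power(TIME, 2)), mul(VELOCITY, TIME), LENGTH]
print(all(term == LENGTH for term in terms))

# The term v0^2 / g from (3.7): division is multiplication by the inverse power.
peak_term = mul(power(VELOCITY, 2), power(ACCELERATION, -1))
print(peak_term == LENGTH)
```

Since the three terms of (3.6) all carry the dimension of length, adding them is legitimate, which is exactly the check performed by hand above.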
The Simple Harmonic Oscillator The simple harmonic oscillator is easily one of the most
important physical systems that can be described. It appears constantly throughout all of physics
and much of mathematics, and so warrants some of our attention.
Example 3.25
Consider an object of mass m attached to a spring with spring constant k. If the object is
initially a distance of x0 from its equilibrium and is given an initial velocity of v0 , what is
the equation of motion of the spring?
Solution. Let the equilibrium point of the spring be given by $x = 0$, in which case Hooke's law tells us that the net force experienced by the object is given by $F = -kx$. Again a simple application of Newton's second law gives
\[ -kx = m\frac{d^2x}{dt^2}, \tag{3.8} \]
or, setting $\omega = \sqrt{k/m}$,
\[ \frac{d^2x}{dt^2} = -\omega^2 x. \tag{3.9} \]
Figure 9: A spring-mass system, wherein the spring coefficient is given by the quantity $k$.
Again, a proper treatment of this question warrants the use of some ODE theory, but we can formulate another ansatz as to the solutions of this equation. Essentially, this equation is asking us "What function, when you differentiate it twice, gives you back a negative multiple of the function?" The only such functions (of which you are currently aware) are the sine and cosine functions. In fact, the student should convince themselves that equation (3.9) is satisfied by
\[ x(t) = A\sin(\omega t) + B\cos(\omega t) \]
for as-of-yet undetermined coefficients $A, B \in \mathbb{R}$. Just as we did before, we may calculate $A, B$ using the unused information given to us in the problem. The initial displacement is $x_0$, implying that
\[ x_0 = x(0) = A\sin(0) + B\cos(0) = B \]
and the initial velocity is
\[ v_0 = x'(0) = A\omega\cos(0) - B\omega\sin(0) = A\omega. \]
Putting this all together, the equation of motion for the spring is
\[ x(t) = \frac{v_0}{\omega}\sin(\omega t) + x_0\cos(\omega t). \]
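The student who wishes to be convinced numerically can try the sketch below. The values of `k`, `m`, `x0` and `v0` are assumptions chosen only for the check, and the second derivative is approximated by a central difference rather than computed exactly.

```python
import math

# Illustrative values only; k, m, x0, v0 are assumptions chosen for the check.
k, m = 4.0, 1.0           # spring constant and mass
x0, v0 = 0.5, 1.0         # initial displacement and initial velocity
omega = math.sqrt(k / m)  # natural frequency


def x(t):
    """Proposed solution x(t) = (v0/omega) sin(omega t) + x0 cos(omega t)."""
    return (v0 / omega) * math.sin(omega * t) + x0 * math.cos(omega * t)


def second_derivative(f, t, h=1e-4):
    """Central-difference approximation of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2


for t in (0.0, 0.7, 2.0):
    lhs = second_derivative(x, t)
    rhs = -omega**2 * x(t)
    print(f"t={t:.1f}  x''(t)~{lhs:.5f}  -omega^2 x(t)={rhs:.5f}")
```

The two columns should agree up to the error of the finite-difference approximation, and evaluating `x(0.0)` returns the initial displacement.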
Remark 3.26 The $\omega$ introduced above is known as the resonance frequency and is the natural frequency of the system. Many systems have natural frequencies and much technology has been developed by exploiting this frequency (example: Magnetic Resonance Imaging). On the other hand, resonance is also responsible for some catastrophes. Look for videos of the original Tacoma Narrows Bridge. When engineers designed this bridge, they failed to consider what the bridge's natural frequency would be. There would be some days wherein high winds would match this frequency, causing the bridge to oscillate. The bridge eventually collapsed.
The Pendulum: So far I have given you some very simple problems whose solutions could easily
be found. However, the reality is that while we can often easily formulate differential equations to
describe a system, it is very rare that these equations can be properly solved (we will see more on
this when we get to integration theory). Nonetheless, there is an entire field of mathematical study
dedicated to solving differential equations which do not have closed form solutions. The moral of
the story: we can use calculus to describe the motion of a great many things.
Here is an example which does not have a closed form solution, but whose equation of motion
is simple to derive.
Example 3.27
Consider a bob of mass m hanging from a rigid rod of length L. If the rod is not in an
equilibrium state, describe its equation of motion.
\[ \frac{d^2\theta}{dt^2} + \frac{g}{L}\sin\theta = 0. \]
It may not be obvious at this point that this equation cannot be solved analytically, but let me assure you that it cannot. However, there is one extenuating circumstance in which we may find approximate solutions. If $\theta$ is a very small angle then $\sin\theta \approx \theta$, in which case this becomes
\[ \frac{d^2\theta}{dt^2} + \frac{g}{L}\theta = 0 \]
which we recognize as the equation for the simple harmonic oscillator.
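To get a feeling for how good the small-angle approximation is, one can integrate the full pendulum equation numerically and compare. The sketch below is not a method from these notes: it uses a simple semi-implicit Euler scheme, and the values of `g`, `L`, the step size and the release angles are all illustrative assumptions.

```python
import math

# A minimal sketch: g, L, the step size and the initial angles are illustrative assumptions.
g, L = 9.81, 1.0
omega = math.sqrt(g / L)


def simulate(theta0, t_end, dt=1e-4):
    """Integrate theta'' = -(g/L) sin(theta) from rest using semi-implicit Euler."""
    theta, vel = theta0, 0.0
    for _ in range(round(t_end / dt)):
        vel -= (g / L) * math.sin(theta) * dt   # update angular velocity first
        theta += vel * dt                       # then the angle
    return theta


def small_angle(theta0, t):
    """Solution of the linearized equation theta'' = -(g/L) theta, released from rest."""
    return theta0 * math.cos(omega * t)


t = 1.0
for theta0 in (0.05, 1.5):
    gap = abs(simulate(theta0, t) - small_angle(theta0, t))
    print(f"theta0={theta0:.2f} rad  |nonlinear - linearized| = {gap:.4f}")
```

For the small release angle the two answers are nearly indistinguishable, while for the large one they visibly disagree, which is why the harmonic oscillator is only an approximation here.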
3.4
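We have seen how to take the derivatives of sums, products, and quotients of functions. The only major operation left is to look at function composition.
The reason that this rule is called the chain rule is that composition links functions together like a chain: given functions $f$ and $g$, the composition $f \circ g$ applies $g$ first and $f$ afterwards.
\[ \mathbb{R} \xrightarrow{\ g\ } \mathbb{R} \xrightarrow{\ f\ } \mathbb{R}, \qquad f \circ g : \mathbb{R} \to \mathbb{R} \]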
So how do we differentiate the composition? The answer is perhaps more insightful if we look
at Leibniz notation. Let y = g(x) and w = f (y) so that w = f (y) = f (g(x)). We should first
remark that some students find the above equation disconcerting. What does it mean that f is a
function of y, but that f (g(x)) is a function of x?
The idea behind this is easier to comprehend if we think of a physical situation. Let's say that you're a weatherman, and you want to determine the outside temperature over the course of the day. You realize that the temperature $H$ depends on the amount of sunshine $s$, and they are related through a function $H = f(s)$. This makes perfect sense, and you can even use calculus to determine $\frac{dH}{ds}$, the instantaneous rate of change of temperature with respect to sunshine.
But maybe you find it difficult to measure the amount of sunshine directly. Instead, you realize that sunshine itself is a (periodic) function of time $t$; that is, you can write $s = g(t)$ for some function $g$. This allows you to determine $\frac{ds}{dt}$, the instantaneous rate of change of sunshine with respect to time.
But what if you now want to determine how the temperature changes with respect to time? It seems like you should be able to do this: after all, you know how temperature changes with sunshine, and how sunshine changes with time:
\[ \text{time} \to \text{sunshine} \to \text{temperature}. \]
Here, sunshine just acts as an intermediary for getting from time to temperature, and we can write
\[ H = f(s) = f(g(t)). \]
Now back to the question of differentiation. To differentiate temperature with respect to time in this way, we are definitely going to have to go through sunshine. We have already seen how the differentials can act like fractions, and keeping this in mind it is tempting to say that
\[ \frac{dH}{dt} = \frac{dH}{ds}\,\frac{ds}{dt}. \]
In fact, it turns out that this is correct:
Theorem 3.28: Chain Rule
Let $f, g$ be functions such that $g$ is differentiable at $c$ and $f$ is differentiable at $g(c)$. Then the composition $f \circ g$ is differentiable at $c$ and $(f \circ g)'(c) = f'(g(c))\,g'(c)$. In Leibniz notation, if $w = f(y)$ and $y = g(x)$ then
\[ \frac{dw}{dx} = \frac{dw}{dy}\,\frac{dy}{dx}. \tag{3.10} \]
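A quick numerical experiment can make the theorem feel less mysterious. In the sketch below the functions `f` and `g` and the point `c` are hypothetical choices made only for illustration; each derivative is approximated by a central difference, and the product $f'(g(c))\,g'(c)$ is compared against the derivative of the composition.

```python
import math


def derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Hypothetical choices of f and g, picked only to illustrate the theorem.
def g(x):
    return x**2 + 1.0


def f(y):
    return math.sin(y)


def composite(x):
    return f(g(x))


c = 0.8
direct = derivative(composite, c)                    # (f o g)'(c) computed directly
via_chain = derivative(f, g(c)) * derivative(g, c)   # f'(g(c)) * g'(c)
exact = math.cos(g(c)) * 2 * c                       # known closed form for this choice

print(direct, via_chain, exact)
```

All three numbers should agree up to the accuracy of the finite-difference approximation.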
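Example 3.29
Determine the derivative of $\sqrt{x + \sqrt{x}}$.
Solution. The key to differentiating this function is to realize it as the composition of two functions. If $f(x) = \sqrt{x}$ and $g(x) = x + \sqrt{x}$ then
\[ f(g(x)) = \sqrt{g(x)} = \sqrt{x + \sqrt{x}}. \]
Now we know that $f'(x) = \frac{1}{2\sqrt{x}}$ and $g'(x) = 1 + \frac{1}{2\sqrt{x}}$, so that
\begin{align*}
\frac{d}{dx}\sqrt{x + \sqrt{x}} &= \frac{d}{dx} f(g(x)) = f'(g(x))\,g'(x) \\
&= \frac{1}{2\sqrt{g(x)}}\left(1 + \frac{1}{2\sqrt{x}}\right) \\
&= \frac{2\sqrt{x} + 1}{4\sqrt{x^2 + x\sqrt{x}}}.
\end{align*}
Example 3.30
Show that $\frac{d}{dx} f(x)^n = n f(x)^{n-1} f'(x)$.
Solution. By setting $g(x) = x^n$ we have that $f(x)^n = g(f(x))$. Furthermore, $g'(x) = nx^{n-1}$, so applying the chain rule we have
\[ \frac{d}{dx} f(x)^n = g'(f(x))\,f'(x) = n f(x)^{n-1} f'(x) \]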
as required.
Example 3.31
Denote by $f^n(x)$ the $n$-fold composition of $f$. For example, $f^3(x) = f(f(f(x)))$. Assuming that $f(1) = 1$ and $f'(1) = 2$, find the derivative of $f^n$ evaluated at $x = 1$.
Solution. This is just a repeated exercise of the chain rule requiring a small bit of trickery. First, we notice that since $f(1) = 1$, no matter how many times we compose by $f$ we will always get 1. More explicitly, notice that $f(f(1)) = f(1) = 1$, and in general $f^n(1) = 1$. For the sake of intuition, let us try this when $n = 3$. Notice in this case that
\[ (f^3)'(x) = f'(f(f(x)))\,f'(f(x))\,f'(x) \]
so that
\[ (f^3)'(1) = f'(f(f(1)))\,f'(f(1))\,f'(1) = f'(1)\,f'(1)\,f'(1) = f'(1)^3 = 8. \]
In fact, precisely the same procedure will work for general $n$, and we get
\[ (f^n)'(x) = \underbrace{f'(f^{n-1}(x))\,f'(f^{n-2}(x)) \cdots f'(f(x))\,f'(x)}_{n \text{ times}} = \prod_{i=0}^{n-1} f'(f^i(x)), \]
where $f^0(x) = x$. Evaluating at $x = 1$, every factor equals $f'(1) = 2$, so that $(f^n)'(1) = 2^n$.
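As a sanity check on the product formula, we can pick one concrete function satisfying the hypotheses and differentiate its iterates numerically. The choice $f(x) = x^2$ below is an assumption made purely for illustration (it satisfies $f(1) = 1$ and $f'(1) = 2$), and `compose` is a hypothetical helper.

```python
# A hypothetical f with f(1) = 1 and f'(1) = 2, chosen only to test the formula.
def f(x):
    return x * x


def compose(func, n):
    """Return the n-fold composition func^n, with func^0 the identity."""
    def composed(x):
        for _ in range(n):
            x = func(x)
        return x
    return composed


def derivative(func, x, h=1e-7):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


for n in range(1, 6):
    approx = derivative(compose(f, n), 1.0)
    print(f"n={n}  numerical (f^n)'(1) ~ {approx:.4f}  predicted 2^n = {2**n}")
```

For this particular $f$ the iterate is $f^n(x) = x^{2^n}$, whose derivative at 1 is $2^n$, in agreement with the general argument.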
Just as we were able to use the product rule to extend the power rule from $n \in \mathbb{N}$ to $n \in \mathbb{Z}$, we can use the chain rule to tell us something about inverse functions (we will discuss this more in a subsequent section).
Theorem 3.32: Inverse Function Theorem
Let $f$ be continuously differentiable at the point $c$, and assume further that $f'(c) \neq 0$. Then there is a neighbourhood of $c$ on which $f$ is injective with continuously differentiable inverse, and the derivative of the inverse is given as
\[ \left(f^{-1}\right)'(x) = \frac{1}{f'(f^{-1}(x))}. \tag{3.11} \]
The proof of this theorem is not terribly difficult (until you generalize it to higher dimensions). The key is to show that since $f'(c) \neq 0$, one can find an inverse function $f^{-1}$ which is actually continuous in a neighbourhood of $c$. The fact that the function is differentiable is not too much more work, so we leave it as an exercise for the student$^{14}$. Instead, let's look at how (3.11) is derived.
14
This question looks daunting and requires some concentrated thinking, but is not impossibly hard. Try drawing a prototypical picture with $\epsilon$ and $\delta$ bands and think about what is happening for a long time, tilting your head to the side as necessary.
Let us begin by assuming that we have been given $f^{-1}$ and told that it is differentiable. By definition of how the inverse function behaves, we know that $f(f^{-1}(x)) = x$ for all $x$ in the range of $f$. Differentiating both sides (applying the chain rule to the composition) we then get
\[ 1 = \frac{d}{dx} f(f^{-1}(x)) = f'(f^{-1}(x))\,(f^{-1})'(x). \]
We can solve for $(f^{-1})'(x)$ by re-arranging, to get $(f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}$ as required. Again, this is an instance in which the derivation of the formula is so easy that it would be wasteful to memorize (3.11). Instead, we emphasize that the student should focus on the derivation itself.
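Formula (3.11) is useful even when we cannot write the inverse down explicitly. The following sketch assumes the hypothetical function $f(x) = x^3 + x$, which is increasing with $f'(x) \neq 0$ everywhere; its inverse is computed by bisection (an implementation choice made here, not something from the notes) and then differentiated numerically for comparison with (3.11).

```python
def f(x):
    """A hypothetical increasing function, chosen so that f'(x) = 3x^2 + 1 is never zero."""
    return x**3 + x


def f_prime(x):
    return 3 * x**2 + 1


def f_inverse(y, lo=-10.0, hi=10.0):
    """Invert f by bisection; valid because f is strictly increasing on [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def derivative(func, x, h=1e-5):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


y = 2.0                                 # f(1) = 2, so f_inverse(2) = 1
numerical = derivative(f_inverse, y)    # (f^{-1})'(y) estimated directly
formula = 1 / f_prime(f_inverse(y))     # right-hand side of (3.11)
print(numerical, formula)
```

Here $f'(1) = 4$, so (3.11) predicts that the inverse has slope $1/4$ at $y = 2$.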
Exercise:
3.5
3.5.1
The word inverse has many different meanings depending on the context in which it is used. For example, what if we were to ask the student to find the inverse of the number 2? What does this mean? With respect to what are we taking the inverse? To properly understand this, we need to understand the following: Given a binary operator (an operator which takes in two things and produces a single thing in return, such as addition and multiplication), we say that a number $\mathrm{id}$ is the identity of that operator if operating against it does nothing to the input. For example, in the case of addition, the identity $\mathrm{id}_+$ will satisfy $x + \mathrm{id}_+ = x$ for all possible $x$; for example,
\[ 2 + \mathrm{id}_+ = 2, \qquad 5 + \mathrm{id}_+ = 5. \]
Our experience tells us that $\mathrm{id}_+ = 0$. Similarly, for multiplication the identity $\mathrm{id}_\times$ will satisfy $x \cdot \mathrm{id}_\times = x$ for all $x$; for example,
\[ 3 \cdot \mathrm{id}_\times = 3, \qquad \pi \cdot \mathrm{id}_\times = \pi. \]
Again our experience tells us that $\mathrm{id}_\times = 1$. We thus say that 0 is the additive identity and 1 is the multiplicative identity. We say that the inverse of $x$ is an element which, when paired against $x$, gives the identity. Hence the additive inverse of 2 is the number $y$ such that $2 + y = \mathrm{id}_+ = 0$, or rather $-2$. In general, the additive inverse of $n$ is $-n$, and this always exists! For multiplication, it is not too hard to convince ourselves that the multiplicative inverse of $x$ is $x^{-1}$; for example, $2 \cdot 2^{-1} = 1 = \mathrm{id}_\times$. Notice that there is no multiplicative inverse for the number 0, so in this case the inverse does not always exist.
Function composition $f \circ g$ is another example of a binary operator: we put two functions $f, g$ in, and we get one function $f \circ g$ out. What is the identity for this operation? Well, we would like a function $\mathrm{id}_\circ$ such that
\[ f(\mathrm{id}_\circ(x)) = f(x) = \mathrm{id}_\circ(f(x)). \]
If we think about this for a moment, the identity function is the function $\mathrm{id}_\circ(x) = x$, the function which does nothing to the argument! Now what is the inverse of a function? The inverse of a function $f(x)$ is a function $f^{-1}(x)$ such that $f \circ f^{-1} = f^{-1} \circ f = \mathrm{id}_\circ$.
The word "function" in the definition above is important: We need our inverse map to also be a function so that we can apply our usual ideas to it. In particular, let's think about this in the context of the vertical line test. Recall from Section 1 that a function $f$ is said to be injective if
\[ f(x) = f(y) \quad \Rightarrow \quad x = y; \]
that is, the function is one-to-one, or satisfies the horizontal line test. Since geometrically the inverse of $f$ arises by reflecting the graph of $f$ about the line $y = x$, the horizontal line test will afterwards become a vertical line test. We conclude that for the inverse map $f^{-1}$ to be a function, $f$ must be injective.
3.5.2
As suggested by this section name, our goal is to differentiate the inverse trigonometric functions. But wait a second! We just said that for the inverse of a function to even exist, our original function had to be injective. When we think about the trigonometric functions like $\sin(x)$ and $\cos(x)$, they are certainly not injective, so what does it even mean to talk about their inverse?
The solution is that we technically cannot define their inverse. What we do instead is create new functions which are restrictions of the old functions to domains where they are injective. Moreover, we need to ensure that we capture as much information as possible (namely, that our range is $[-1, 1]$), suggesting that we take an interval of length $\pi$. In the case of $\sin(x)$, there are many possible intervals that we could choose. For the sake of simplicity, we choose the maximal injective interval at the origin, namely $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$. In the case of $\cos(x)$ we take $[0, \pi]$. On these intervals, $\sin(x)$ has an inverse $\arcsin(x)$ and $\cos(x)$ has an inverse $\arccos(x)$. We can perform a similar analysis of all the remaining trigonometric functions and deduce the following table:
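Name            Notation                               Definition            Maps
arcsine         $\theta = \arcsin(x)$                  $\sin(\theta) = x$    $[-1, 1] \to \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$
arccosine       $\theta = \arccos(x)$                  $\cos(\theta) = x$    $[-1, 1] \to [0, \pi]$
arctangent      $\theta = \arctan(x)$                  $\tan(\theta) = x$    $\mathbb{R} \to \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$
arccosecant     $\theta = \operatorname{arccsc}(x)$    $\csc(\theta) = x$    $\mathbb{R} \setminus (-1, 1) \to \left[-\frac{\pi}{2}, \frac{\pi}{2}\right] \setminus \{0\}$
arcsecant       $\theta = \operatorname{arcsec}(x)$    $\sec(\theta) = x$    $\mathbb{R} \setminus (-1, 1) \to [0, \pi] \setminus \left\{\frac{\pi}{2}\right\}$
arccotangent    $\theta = \operatorname{arccot}(x)$    $\cot(\theta) = x$    $\mathbb{R} \to (0, \pi)$
3.5.3
The Derivatives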
We can use the Inverse Function Theorem (Theorem 3.32) to compute the derivatives of the inverse
trigonometric functions quite easily. Let us proceed with an example on the sine-function.
Theorem 3.33
The function $\arcsin : [-1, 1] \to \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$ is differentiable on $(-1, 1)$ and its derivative is given by
\[ \frac{d}{dx}\arcsin(x) = \frac{1}{\sqrt{1 - x^2}}. \tag{3.12} \]
Proof. Set $f(x) = \sin(x)$ so that $f^{-1}(x) = \arcsin(x)$. Note that $\frac{d}{dx}\sin(x) = \cos(x)$ is never zero on the interval $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and hence by the Inverse Function Theorem, we know that $f^{-1}(x) = \arcsin(x)$ is differentiable on $(-1, 1)$. Now we know that on $(-1, 1)$ we have $\sin(\arcsin(x)) = x$ and hence differentiating both sides we have
\[ 1 = \cos(\arcsin(x))\,\frac{d}{dx}\arcsin(x), \qquad \text{so that} \qquad \frac{d}{dx}\arcsin(x) = \frac{1}{\cos(\arcsin(x))}. \]
This is the derivative, but it does not agree with the statement given in (3.12). To deduce $\cos(\arcsin(x))$, set $y = \arcsin(x)$ so that $\sin(y) = x$ (see Figure 11). From the figure, we immediately read off that
\[ \cos(y) = \sqrt{1 - x^2}. \]
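The two forms of the derivative obtained in the proof can be compared numerically. This is only a sketch: the sample points are arbitrary choices inside $(-1, 1)$, and the derivative of arcsine is approximated by a central difference using Python's standard `math.asin`.

```python
import math


def derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Sample points inside (-1, 1); the endpoints are excluded because the derivative blows up there.
for x in (-0.9, -0.3, 0.0, 0.5, 0.8):
    numerical = derivative(math.asin, x)
    via_cos = 1 / math.cos(math.asin(x))      # intermediate form from the proof
    closed_form = 1 / math.sqrt(1 - x * x)    # right-hand side of (3.12)
    print(f"x={x:+.1f}  numerical~{numerical:.6f}  1/cos(arcsin x)={via_cos:.6f}  (3.12)={closed_form:.6f}")
```

The last two columns agree to rounding error, reflecting the identity $\cos(\arcsin(x)) = \sqrt{1 - x^2}$ on $(-1, 1)$.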
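\begin{align*}
\frac{d}{dx}\arcsin(x) &= \frac{1}{\sqrt{1 - x^2}}, & \frac{d}{dx}\operatorname{arccsc}(x) &= -\frac{1}{|x|\sqrt{x^2 - 1}}, \\
\frac{d}{dx}\arccos(x) &= -\frac{1}{\sqrt{1 - x^2}}, & \frac{d}{dx}\operatorname{arcsec}(x) &= \frac{1}{|x|\sqrt{x^2 - 1}}, \\
\frac{d}{dx}\arctan(x) &= \frac{1}{1 + x^2}, & \frac{d}{dx}\operatorname{arccot}(x) &= -\frac{1}{1 + x^2}.
\end{align*}
3.6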
Exponentials and logarithms are two exceptionally important types of functions that, along with
inverse trigonometric functions, are referred to as transcendental functions. Unfortunately, they are
tricky to define, despite the fact that the average student has probably seen them before. Indeed,
while there are several possible ways to define them, to be completely rigorous we must either show
the student the Monotone Convergence Theorem, start with Taylor Series, or develop the theory of
integration. As we have not yet covered any of these topics, we are certainly in a sticky situation!!
However, the ubiquity of these functions throughout much of the natural sciences necessitates that
we introduce them earlier rather than later.
3.6.1
The Functions
The Exponential: For now, let us just naively assume that if $a > 0$, the object $a^x$ makes sense for all real numbers $x$. We then have the following properties:
1. $a^0 = 1$
2. $a^1 = a$
3. $a^x a^y = a^{x+y}$
4. $(a^x)^y = a^{xy}$
5. $(ab)^x = a^x b^x$.
From this list of properties we can derive other properties. For example, using the fact that $a^x a^y = a^{x+y}$ we must then have $a^x a^{-x} = a^0 = 1$, which implies that $a^{-x} = \frac{1}{a^x}$. Furthermore, $1^x = 1$ for all $x$ and $a^0 = 1$ for all $a > 0$. Notice that for every $a \neq 1$, $a^x$ has domain $\mathbb{R}$ and range $(0, \infty)$.
Logarithms: One can show that the non-constant exponential functions are monotonic; that is, they are either increasing or decreasing. Since such functions are automatically injective, we can be guaranteed that the exponential function has an inverse, if we choose the inverse function to have appropriate domain. Hence if $a > 0$ and $a \neq 1$, we define the map $\log_a : (0, \infty) \to \mathbb{R}$ as $\log_a(x) = y$ if $a^y = x$; that is, the logarithm is the inverse function to $a^x$:
\[ a^{\log_a(x)} = x, \qquad \log_a(a^x) = x. \]
A few easy properties of $\log_a$ that come from the properties of the exponential are as follows:
Figure 12: A graph of some of the exponential functions. Functions in blue are $a^x$ for $a > 1$, while those in red are for $0 < a < 1$. The green line corresponds to $f(x) = 1^x \equiv 1$.
Proposition 3.34
$\dfrac{\log_a(y)}{\log_a(x)}$
3.6.2
Their Derivatives
Using the properties of the exponential function as well as the definition of the limit, we can start to get an idea about the derivative of the exponential. Indeed,
\begin{align*}
\frac{d}{dx} a^x &= \lim_{h \to 0} \frac{a^{x+h} - a^x}{h} \\
&= \lim_{h \to 0} \frac{a^x a^h - a^x}{h} \\
&= a^x \lim_{h \to 0} \frac{a^h - 1}{h} \\
&= C_a a^x,
\end{align*}
where $C_a = \lim_{h \to 0} \frac{a^h - 1}{h}$. We shall blindly assume that this constant exists, and we shall define Euler's number $e$ to be the (blindly assumed) unique number such that $C_e = 1$. Hence by definition,
\[ \frac{d}{dx} e^x = e^x. \]
For this number $e$, we shall denote the inverse of $e^x$ as simply$^{17}$ $\log(x)$.
To differentiate $\log(x)$, we recall that $e^{\log(x)} = x$, so differentiating both sides we get
\[ 1 = e^{\log(x)}\,\frac{d}{dx}\log(x) = x\,\frac{d}{dx}\log(x), \qquad \text{so that} \qquad \frac{d}{dx}\log(x) = \frac{1}{x}. \]
Figure 13: A graph of some of the logarithmic functions. Functions in blue are $\log_a x$ for $a > 1$, while those in red are for $0 < a < 1$.
17
Some people in the sciences use the notation ln(x) for the inverse of ex , but this is not common throughout
mathematics and does not generalize well. It should be taken that if no base to the logarithm is supplied, it is always
base e.
For a general base $a > 0$, we can write
\[ a^x = e^{\log(a^x)} = e^{x\log(a)}. \]
Differentiating we get
\[ \frac{d}{dx} a^x = \frac{d}{dx} e^{x\log(a)} = \log(a)\,e^{x\log(a)} = \log(a)\,a^x. \]
To determine the remaining derivatives for the other bases of the logarithm, convince yourself that we can always write $\log_a(x) = \log(x)/\log(a)$, in which case we simply get
\[ \frac{d}{dx}\log_a(x) = \frac{d}{dx}\frac{\log(x)}{\log(a)} = \frac{1}{x\log(a)}. \]
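Comparing $\frac{d}{dx}a^x = \log(a)\,a^x$ with the earlier expression $C_a a^x$ suggests that $C_a = \log(a)$, and this is easy to probe numerically. The sketch below makes two assumptions of its own: it estimates the limit with the symmetric quotient $(a^h - a^{-h})/(2h)$, which has the same limit as $(a^h - 1)/h$ but smaller error, and it locates $e$ by bisection on the interval $[2, 3]$. The helper names `C` and `solve_for_e` are hypothetical.

```python
import math


def C(a, h=1e-6):
    """Estimate C_a = lim_{h->0} (a^h - 1)/h using a small symmetric step."""
    return (a**h - a**(-h)) / (2 * h)


for a in (0.5, 2.0, 3.0, 10.0):
    print(f"a={a:5.1f}  C_a~{C(a):.6f}  log(a)={math.log(a):.6f}")


def solve_for_e(lo=2.0, hi=3.0):
    """Bisect for the base a with C_a = 1, using that C_a increases with a."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if C(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(solve_for_e(), math.e)
```

The base found by bisection agrees with `math.e` to within the accuracy of the limit estimate, consistent with the definition of $e$ as the number with $C_e = 1$.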
Corollary 3.35: Generalized Power Rule
\[ \frac{d}{dx} x^n = nx^{n-1} \]
Logarithmic Differentiation
One of the great things about logarithms is that there is a sense in which they decrease the complexity of an operation. For example, we often think of addition as being easier than multiplication, and multiplication being easier than exponentiation:
\[ \log(xy) = \log(x) + \log(y), \qquad \log(x^y) = y\log x. \]
At the cost of introducing a logarithm, we are thus able to convert products to sums, and powers to products! Since the logarithm is not very hard to differentiate, this does not seem like such a terrible cost.
This idea in general is known as logarithmic differentiation. Where it can be particularly useful is when we have a product/quotient of many objects which are individually simple to differentiate, but which will become complicated when nested with the product rule. For example, given a collection of functions $f_1(x), \ldots, f_n(x)$ and $g_1(x), \ldots, g_m(x)$, notice that we can write
\[ \log\left(\frac{f_1(x) \cdots f_n(x)}{g_1(x) \cdots g_m(x)}\right) = \log f_1(x) + \cdots + \log f_n(x) - \log g_1(x) - \cdots - \log g_m(x). \]
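Differentiating both sides and solving for the derivative of the quotient then gives
\[ \frac{d}{dx}\left[\frac{f_1(x) \cdots f_n(x)}{g_1(x) \cdots g_m(x)}\right] = \frac{f_1(x) \cdots f_n(x)}{g_1(x) \cdots g_m(x)}\left[\frac{f_1'(x)}{f_1(x)} + \cdots + \frac{f_n'(x)}{f_n(x)} - \frac{g_1'(x)}{g_1(x)} - \cdots - \frac{g_m'(x)}{g_m(x)}\right]. \]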
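Example 3.36
Compute the derivative of $f(x) = \dfrac{(x - 1)^2 (x^2 + 2)\sqrt{x}}{x^4 + 5}$.
Solution. This would be an absolute nightmare to compute using the quotient rule, so instead we use logarithmic differentiation. Taking the logarithm of both sides yields:
\[ \log f(x) = 2\log(x - 1) + \log(x^2 + 2) + \frac{1}{2}\log(x) - \log(x^4 + 5). \]
Differentiating both sides and multiplying through by $f(x)$ then gives
\[ f'(x) = \frac{(x - 1)^2 (x^2 + 2)\sqrt{x}}{x^4 + 5}\left[\frac{2}{x - 1} + \frac{2x}{x^2 + 2} + \frac{1}{2x} - \frac{4x^3}{x^4 + 5}\right]. \]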
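Because formulas of this size are easy to get wrong, it is worth checking the result of Example 3.36 against a numerical derivative. In the sketch below the sample points are arbitrary choices with $x > 1$, where every logarithm in the expansion is defined, and the helper names are hypothetical.

```python
import math


def f(x):
    """The function from Example 3.36."""
    return (x - 1) ** 2 * (x**2 + 2) * math.sqrt(x) / (x**4 + 5)


def f_prime_log(x):
    """f'(x) = f(x) * d/dx[log f(x)], with the logarithm expanded into a sum."""
    log_derivative = 2 / (x - 1) + 2 * x / (x**2 + 2) + 1 / (2 * x) - 4 * x**3 / (x**4 + 5)
    return f(x) * log_derivative


def derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Sample points with x > 1, where every logarithm in the expansion is defined.
for x in (1.5, 2.0, 4.0):
    print(f"x={x}  formula={f_prime_log(x):.6f}  numerical~{derivative(f, x):.6f}")
```

Agreement between the two columns is good evidence that no sign or factor was dropped along the way.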
This takes care of converting products to sums, but now what about powers to products? Given two functions $f(x)$ and $g(x)$, let's try to differentiate $f(x)^{g(x)}$. The problem here is that neither the power rule nor the rule for differentiating exponentials can apply (in both of those cases, the variable should only occur in the exponent or the base, but not both). To deal with this, we set $y = f(x)^{g(x)}$ so that $\log y = g(x)\log f(x)$. We can now differentiate implicitly:
\[ \frac{1}{y}\frac{dy}{dx} = g'(x)\log f(x) + g(x)\frac{f'(x)}{f(x)} \]
which we may then solve for $\frac{dy}{dx}$ to get
\[ \frac{dy}{dx} = f(x)^{g(x)}\left[g'(x)\log f(x) + g(x)\frac{f'(x)}{f(x)}\right]. \tag{3.13} \]
Like many of the formulae that we've derived, Equation (3.13) is not worth remembering on its own. Rather, what is important is remembering how this derivation was performed so that it can be repeated when necessary.
Example 3.37
Solution. Setting $y = x^{\sin(x)}$ we have $\log y = \sin(x)\log x$. Differentiating implicitly we get
\[ \frac{1}{y}\frac{dy}{dx} = \cos(x)\log(x) + \frac{\sin(x)}{x} \]
which we may solve for $\frac{dy}{dx}$ to get
\[ \frac{dy}{dx} = x^{\sin(x)}\left[\cos(x)\log(x) + \frac{\sin(x)}{x}\right]. \]
Applications Of Derivatives
We now have an extensive collection of tools at our disposal. We did not create this toolbox simply to admire the lustre of the implements, but rather to use them to solve problems. Here we will begin to see how we can use the tools of calculus.
4.1
Implicit Differentiation
4.1.1
The Idea of Implicit Functions
The idea of implicit differentiation is that we may be given variables in which it is implied that those variables depend on other variables, though we may not be able to explicitly write that relationship down. For example, to this point we have typically seen examples where we might write $y = f(x)$, in which case it is clear that changes in $x$ affect changes in $y$, as prescribed by the function $f$. We were then able to determine the rate of change of $y$ with respect to $x$ by computing $\frac{dy}{dx}$.
However, we can sometimes write relationships down without being able to solve for one variable as a function of the other; for example
\[ x^2 + y^2 = 25, \qquad e^x + x\cos(y) + y = 1, \qquad f(x, y) = k. \]
We can convince ourselves that the variables $x$ and $y$ above depend on one another. For example, consider the equation of the circle $x^2 + y^2 = 25$. If we set $y = 5$ then $x$ is forced to be 0, while if we were to set $x = 3$ then $y$ would have to be one of $y = \pm 4$. However, there is no function which makes the relationship between $x$ and $y$ explicit, since as our above example indicates, a single $x$-value may correspond to two possible $y$-values, and hence the relationship is not one given by a function.$^{18}$
As an alternative example, consider the volume $V$ of a cylinder as a function of its radius $r$ and height $h$:
\[ V = \pi r^2 h. \]
This equation defines an explicit relationship between the three entities $V$, $r$, and $h$. However, what if our cylinder were made of metal, and we were told that as temperature $T$ increases, both the radius and the height increase, while when temperatures decrease, the radius and the height decrease$^{19}$? In that case, we are implicitly assuming that $r$ and $h$ are functions of temperature $T$. This means that $V$ is also implicitly a function of temperature, and we have
\[ V(T) = \pi r(T)^2 h(T). \tag{4.1} \]
18
Some students might be distressed at the fact that I have written $x^2 + y^2 = 25$ as an implicit equation, since certainly we could solve to find
\[ y = \sqrt{25 - x^2}, \]
but I claim that this is actually not an explicit representation of this relationship. The reason is that, for example, both $(0, 5)$ and $(0, -5)$ are solutions to this equation, but we are unable to recover $(0, -5)$ from the expression $y = \sqrt{25 - x^2}$.
19
Metals typically expand with increased heat.
This implicit understanding that all the variables now depend on temperature allows us to determine
how the volume of our cylinder is changing with temperature. Certainly, one could imagine a
scenario in which this could be important!
In both of the aforementioned cases, it seems as though we should still be able to discuss the
rate of change of one variable with respect to another, even if we are unable to explicitly describe
the relationship between the variables using a function. This leads us to a process known as implicit
differentiation.
4.1.2
Let us say that we know a variable y implicitly depends upon another variable x. The idea of implicit
differentiation is to differentiate as though the exact nature of the relationship were known. The
best way to understand this is to see an example.
Example 4.1
Consider the equation of the circle centered at the origin with radius 1, $x^2 + y^2 = 1$. Determine the rate of change of $y$ with respect to $x$.
Solution. Our goal is to compute $\frac{dy}{dx}$. We saw earlier that the equation of a circle is an implicit relationship as there is no function which describes how $y$ changes with respect to $x$ or vice versa. Nonetheless, we are going to differentiate the equation $x^2 + y^2 = 1$, but we keep in mind always that we are assuming that $y$ is a function of $x$. To make this more clear, let's actually write $y = f(x)$, so that our equation of the circle is
\[ 1 = x^2 + y^2 = x^2 + f(x)^2. \]
Now differentiating both sides, we have
\[ 0 = \frac{d}{dx}\left[x^2 + f(x)^2\right] = 2x + 2f(x)f'(x). \]
Remember that we are trying to solve for $\frac{dy}{dx}$, which under our choice of $y = f(x)$ is just $\frac{dy}{dx} = f'(x)$, hence
\[ \frac{dy}{dx} = f'(x) = -\frac{x}{y}. \tag{4.2} \]
Note: Many people do not like to use this $y = f(x)$ notation, and instead will just write down
\[ \frac{d}{dx}\left(x^2 + y^2\right) = 2x + 2y\frac{dy}{dx}. \]
This is acceptable and if you are comfortable using it, then you should feel free to do that. However, writing this down often hides what is really happening, so we have used the $y = f(x)$ notation just to be clear.
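On the other hand, let me write $\tilde{y} = \sqrt{1 - x^2}$, where the fact that I have used the tilde is to indicate that $\tilde{y}$ is not actually the same thing as $y$. When we differentiate we get
\[ \frac{d\tilde{y}}{dx} = -\frac{x}{\sqrt{1 - x^2}} = -\frac{x}{\tilde{y}}. \tag{4.3} \]
The inquisitive student may realize that this looks very similar to (4.2), but I claim it is not quite the same. Indeed, let us try to find the slope of the tangent line to the circle at $x = \frac{1}{\sqrt{2}}$. The circle contains two points with this $x$-value, namely $y = \pm\frac{1}{\sqrt{2}}$, and (4.2) gives
\[ \left.\frac{dy}{dx}\right|_{\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)} = -\frac{1/\sqrt{2}}{1/\sqrt{2}} = -1, \qquad \left.\frac{dy}{dx}\right|_{\left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)} = -\frac{1/\sqrt{2}}{-1/\sqrt{2}} = 1 \]
which is in fact what we would expect. On the other hand, using (4.3) we find that at $x = \frac{1}{\sqrt{2}}$ there is only one possible $\tilde{y}$ value, corresponding to $\tilde{y} = \frac{1}{\sqrt{2}}$, in which case the slope of the tangent line is the same as the first of those found above, namely
\[ \left.\frac{d\tilde{y}}{dx}\right|_{\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)} = -\frac{1/\sqrt{2}}{1/\sqrt{2}} = -1. \]
We have in fact lost the other tangent line! This is because when we took the square root of $y^2 = 1 - x^2$ we needed to make a choice as to whether to take the positive or negative root. In doing so, we actually lost information.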
Example 4.2
Solution. We already know that $V = \pi r^2 h$. Although (4.1) has the temperature dependence written in, we will ignore it for this exercise to show the student how the notation is typically conveyed. Differentiating, we get
\begin{align*}
\frac{dV}{dT} &= \frac{d}{dT}\left(\pi r^2 h\right) \\
&= 2\pi r\frac{dr}{dT}h + \pi r^2\frac{dh}{dT}.
\end{align*}
The power of implicit differentiation can be even greater though if one is given a transcendental equation such as $e^x + x\cos(y) + y = 1$, in which $y$ has no chance of being isolated. In such instances, one is indeed forced to use implicit differentiation.
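Example 4.3
Determine $\frac{dy}{dx}$ if $e^x + x\cos(y) + y = 1$.
Solution. Differentiating both sides implicitly with respect to $x$, we get
\begin{align*}
\frac{d}{dx}\left(e^x + x\cos(y) + y\right) &= \frac{d}{dx} 1 \\
e^x + \cos(y) - x\sin(y)\frac{dy}{dx} + \frac{dy}{dx} &= 0 \\
\frac{dy}{dx} &= \frac{e^x + \cos(y)}{x\sin(y) - 1}.
\end{align*}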
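Even though $y$ cannot be isolated by hand, a computer can solve for it numerically at any given $x$, which lets us test the implicit derivative. The sketch below is an illustration under assumptions of its own: the curve is followed with Newton's method starting from $y = 0$, the sample point `x0 = 0.3` is arbitrary, and the helper names are hypothetical.

```python
import math


def F(x, y):
    """Left-hand side minus right-hand side of e^x + x cos(y) + y = 1."""
    return math.exp(x) + x * math.cos(y) + y - 1


def solve_y(x, y_guess=0.0):
    """Solve F(x, y) = 0 for y by Newton's method, starting from y_guess."""
    y = y_guess
    for _ in range(50):
        dF_dy = 1 - x * math.sin(y)   # partial derivative of F in y
        y -= F(x, y) / dF_dy
    return y


def dy_dx(x, y):
    """The implicit derivative derived in Example 4.3."""
    return (math.exp(x) + math.cos(y)) / (x * math.sin(y) - 1)


x0 = 0.3          # an arbitrary sample point, chosen for illustration
h = 1e-5
y0 = solve_y(x0)
slope_numerical = (solve_y(x0 + h) - solve_y(x0 - h)) / (2 * h)
print(y0, slope_numerical, dy_dx(x0, y0))
```

Note that $(0, 0)$ lies on the curve, and there the formula gives a slope of $-2$.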
4.2
Related Rates
Much as we saw in our example relating the volume of a cylinder to its radius and height, related rates are a way of using the relationship between objects to infer how they change with respect to an implicit variable. The trick when doing related rates questions is to find an equation that involves the quantities you are interested in, assume everything varies with time, then differentiate implicitly. The best way to get the hang of related rates questions is to do examples, so let's get started.
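Example 4.4
Assume that a custodian is standing on a ladder 5 metres tall and a pesky student pulls that ladder out from the wall at a rate of 1 metre/second. Determine how quickly the custodian is falling when he is 3 metres from the ground.
[Figure: a 5 m ladder leaning against a wall, with its base a distance $x$ from the wall, its top at height $y$, and its base moving away from the wall at 1 m/s.]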
Solution. If one begins by drawing a simple diagram of the situation, we see that we get a triangle
whose side lengths are $(x, y, 5)$, where $x$ is the distance of the base of the ladder from the wall,
$y$ is the height of the top of the ladder above the ground, and 5 is the length of the ladder (the
hypotenuse). The obvious relationship between the variables is the Pythagorean theorem,
$x^2 + y^2 = 25$. Since $x$ and $y$ are both changing as functions of time, we differentiate both
sides implicitly to find
\[ 2x\frac{dx}{dt} + 2y\frac{dy}{dt} = 0. \]
Since we want to know how fast the custodian is falling we would like to evaluate $\frac{dy}{dt}$,
which is solved as $\frac{dy}{dt} = -\frac{x}{y}\frac{dx}{dt}$. We already know that $\frac{dx}{dt} = 1$
(as this is the rate at which the ladder is being pulled from the wall), and we are told to evaluate
when $y = 3$. Hence we need only find the $x$-value which corresponds to a $y$-value of 3. The
Pythagorean theorem again implies that if $y = 3$ then $x = \sqrt{5^2 - 3^2} = 4$, so that
\[ \frac{dy}{dt} = -\frac{4}{3}(1) = -\frac{4}{3}. \]
Thus the custodian is falling at 4/3 metres per second.
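As a quick check of the ladder computation, here is a small sketch that compares the related rates formula with a finite difference of the height. The function names and the step size `dt` are illustrative choices, not part of the notes.

```python
import math

LADDER = 5.0  # ladder length in metres
DXDT = 1.0    # speed at which the base is pulled away, in m/s

def height(x):
    """Height of the top of the ladder when the base is x metres from the wall."""
    return math.sqrt(LADDER**2 - x**2)

def dydt_formula(y):
    """dy/dt = -(x/y) dx/dt, with x recovered from the Pythagorean theorem."""
    x = math.sqrt(LADDER**2 - y**2)
    return -(x / y) * DXDT

def dydt_numeric(y, dt=1e-6):
    """Central difference of the height as the base moves for a short time dt."""
    x = math.sqrt(LADDER**2 - y**2)
    return (height(x + DXDT * dt) - height(x - DXDT * dt)) / (2 * dt)
```

Calling `dydt_formula(3.0)` reproduces the value $-4/3$ found above, and `dydt_numeric(3.0)` should agree with it to several decimal places.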
Example 4.5
A sphere is shrinking in such a way that its surface area changes at a constant rate of 1 unit$^2$
per unit time. Find the rate at which the diameter of the sphere is decreasing when the diameter
is 5 units.
Solution. First, we look at the quantities involved. We are given information about the surface
area $A$ of the sphere, and its diameter $\delta$. We know that the formula for the surface area of a
sphere is given by $A = 4\pi r^2$ where $r$ is the radius of the sphere. This is not quite what we want
though, since we are interested in the diameter. Luckily, we also know that the radius is half the
diameter, so that $r = \frac{1}{2}\delta$. Substituting this we find that
\[ A = 4\pi\left(\frac{\delta}{2}\right)^2 = \pi\delta^2. \]
Now it is clear that both the area and the diameter are changing with time, so when we differentiate
we will have to do so implicitly. Indeed, one gets that
\[ \frac{dA}{dt} = 2\pi\delta\frac{d\delta}{dt}. \]
We want to know $\frac{d\delta}{dt}$ as this represents the change in the diameter with respect to
time, so we solve to find that
\[ \frac{d\delta}{dt} = \frac{1}{2\pi\delta}\frac{dA}{dt}. \]
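To finish the example numerically, the sketch below evaluates the final formula at a diameter of 5 units. It assumes that "shrinking at a constant rate of 1" is read as $\frac{dA}{dt} = -1$; that sign convention and the function name `diameter_rate` are assumptions made for illustration.

```python
import math

def diameter_rate(diameter, dA_dt):
    """Rate of change of the diameter: dA/dt divided by 2*pi*diameter."""
    return dA_dt / (2 * math.pi * diameter)

# Assumption: a shrinking sphere means dA/dt = -1 (area units per unit time).
rate = diameter_rate(5.0, -1.0)  # rate of change of the diameter at diameter 5
```

Under this reading the diameter decreases at $\frac{1}{10\pi}$ units per unit time.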
Example 4.6
Consider a square pyramid which is being filled with some mysterious physically pertinent
quantity. The base square has length and width both equal to $L > 0$ and the height of the
apex of the pyramid is also a distance $L$. At what rate must we pour the mysterious quantity
into the pyramid such that the rate of change of the height of the material does not depend
on the size of the base square? Is this rate dependent on the height of the pyramid?
Solution. Let $V$ be the volume of the liquid contained in the pyramid, which we know is an
increasing function of time. We have the capability of adjusting the flow rate, which corresponds to
the quantity $\frac{dV}{dt}$. We are interested in determining how the height of the water $h$ is
changing as a function of time, so that we might choose an appropriate $\frac{dV}{dt}$ to ensure that
$\frac{dh}{dt}$ has no dependence upon $L$.
If we project the pyramid into two dimensions, we have the following figure:
[Figure: cross-section of the pyramid with base $L$; the lower region is the filled volume $V$ and the
upper region is $V_c$.]
Here $V_c$ is the volume not occupied by water. Notice that the height of the pyramid bounding
$V_c$ is $L - h$, so if its base has dimension $\ell$, its volume is $V_c = \frac{1}{3}\ell^2(L - h)$.
In order to relate this to the volume of the water, notice that the total volume of the pyramid is
\[ V_{tot} = \frac{1}{3}L^2 \cdot L = \frac{1}{3}L^3 = V + V_c \]
so we can write $V = \frac{1}{3}L^3 - V_c$ (which implies that $\frac{dV}{dt} = -\frac{dV_c}{dt}$). Now
the triangle which bounds the projected pyramid is similar to the triangle which bounds $V_c$, and
this implies that the ratios of their side lengths will be equal. In particular, if we let $\ell$ be the
length of the base of the triangle bounding $V_c$, then we must have
\[ \frac{\ell}{L - h} = \frac{L}{L} = 1. \]
Hence $\ell = L - h$ and $V_c = \frac{1}{3}(L - h)^3$. Differentiating with respect to time gives
$\frac{dV_c}{dt} = -(L - h)^2\frac{dh}{dt}$, so that $\frac{dV}{dt} = (L - h)^2\frac{dh}{dt}$, or equivalently
\[ \frac{dh}{dt} = \frac{1}{(L - h)^2}\frac{dV}{dt}. \]
Thus by setting $\frac{dV}{dt} = C(L - h)^2$ for some constant $C$, we can guarantee that
$\frac{dh}{dt}$ does not depend on $L$, and changes at a constant rate $C$.
Example 4.7
A cat is sitting on a ledge a distance $d$ from a door when suddenly a frantic math professor
bursts through, chased by a mob of angry students. The cat, having watched many Olympic
races, figures that the teacher is running at a rate of $s$ metres per second. Assuming that
the angry mob of students is unable to catch the spry professor, at what rate must the cat's
head move in order to watch the chase once the professor has run $r$ metres? Your answer
should be expressed purely in terms of $r$, $s$ and $d$.
Figure 14: The professor running, while the cat patiently observes.
Solution. Let us begin by drawing an appropriate diagram, labeling the angle the cat's head makes
with the door by $\theta$ and the distance from the professor to the door by $y$. Notice we will be
interested in the case when $y = r$ (signifying that the professor has run a distance $r$) and when
$\frac{dy}{dt} = s$ (since this is the speed of the professor). This is illustrated in Figure 14.
The triangle tells us that $\tan(\theta) = \frac{y}{d}$, and differentiating implicitly with respect to
time gives
\[ \sec^2(\theta)\,\theta' = \frac{y'}{d}. \]
Since all units are consistent and the professor is running at $s$ metres per second, we have that
$y' = s$ giving
\[ \theta' = \frac{s}{d\sec^2(\theta)} \]
and we need only remove the dependency on $\theta$ to be finished. The triangle illustrated in Figure 14
tells us that the hypotenuse is given by $\sqrt{r^2 + d^2}$ so that
$\cos(\theta) = \frac{d}{\sqrt{r^2 + d^2}}$. Since
$\sec^2(\theta) = \frac{1}{\cos^2(\theta)} = \frac{r^2 + d^2}{d^2}$, we conclude that
\[ \theta' = \frac{sd}{r^2 + d^2}. \]
4.3
The Mean Value Theorem, being another of these named theorems, is an exceptionally important
result in analysis. However, first year students often find it difficult to use. Unlike the other named
theorems in this class, we will actually prove the Mean Value Theorem, so we present it below now
(despite the fact that it will take us a while to prove it).
If $f$ is a function and $[a, b]$ is an interval such that $f$ is continuous on $[a, b]$ and
differentiable on $(a, b)$, then there exists $c \in (a, b)$ such that
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \tag{4.4} \]
Remarks:
1. There are two hypotheses; namely the continuity and differentiability of f . Many students
like to ignore these hypotheses and instead focus on Equation (4.7). While the equation is
critical, it is simply false if we ignore the hypotheses, so be mindful of them.
2. Like the Intermediate and Extreme Value Theorems, this theorem is not constructive; that
is, we are guaranteed that such a c exists, but have no way of finding it.
3. The quantity $\frac{f(b) - f(a)}{b - a}$ is the slope of the secant line between the points
$(a, f(a))$ and $(b, f(b))$ (see Figure 15).
Figure 15: The Mean Value Theorem says that there is a point on this graph such that the tangent
line has the same slope as the secant between (a, f (a)) and (b, f (b)).
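Although the theorem itself gives no recipe for $c$, in concrete cases one can often locate it numerically. The following is a rough sketch under an extra assumption the theorem does not make, namely that $f'(x)$ minus the secant slope changes sign on $[a, b]$ so that bisection applies. The function `mvt_point` and the test function $x^3$ on $[0, 2]$ are hypothetical illustrations.

```python
def mvt_point(f, df, a, b, tol=1e-12):
    """Bisection for c in (a, b) with df(c) equal to the secant slope.

    Assumes df(x) - slope changes sign between a and b, which the theorem
    itself does not promise; this is a convenience for nice examples only.
    """
    slope = (f(b) - f(a)) / (b - a)
    g = lambda x: df(x) - slope
    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("no sign change; bisection does not apply")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical example: f(x) = x**3 on [0, 2], where the secant slope is 4.
c = mvt_point(lambda x: x**3, lambda x: 3 * x**2, 0.0, 2.0)
```

For this example the tangent slope $3c^2$ matches the secant slope 4 at $c = 2/\sqrt{3}$, which is the value the bisection approaches.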
4.3.1
To prove the Mean Value Theorem, we start with a seemingly simpler (yet actually equivalent)
result, known as Rolle's theorem. However, in order to prove Rolle's theorem, we will need to discuss
something of maxima and minima.
Definition 4.9
The point, which we shall expound on in more detail later, is that local minima and maxima
which occur on the interior necessarily correspond to points where the derivative is zero. Indeed, if
we look at Figure 15 we see that the function has a local minimum and local maximum which occur
on the interior of the domain of f . At these points, it certainly appears as though the tangent line
is horizontal, and this is indeed the case.
Proposition 4.10
Proof. We shall do the proof for the case when $c$ corresponds to a local maximum and leave the proof
of the other case to the student. Since $c$ is a local maximum, we know there is some neighbourhood
$I \subseteq D$ of $c$ such that for all $x \in I$, $f(x) \leq f(c)$.
Since $c$ corresponds to a maximum of $f$, for all $h > 0$ sufficiently small so that $c + h \in I$, we
have that $f(c + h) \leq f(c)$. Hence $f(c + h) - f(c) \leq 0$, and since $h$ is positive, the difference
quotient satisfies $\frac{f(c+h) - f(c)}{h} \leq 0$. In the limit as $h \to 0^+$ we thus have
\[ \lim_{h \to 0^+} \frac{f(c + h) - f(c)}{h} \leq 0. \tag{4.5} \]
Similarly, if $h < 0$ we still have $f(c + h) - f(c) \leq 0$ but now with a negative denominator our
difference quotient is non-negative and
\[ \lim_{h \to 0^-} \frac{f(c + h) - f(c)}{h} \geq 0. \tag{4.6} \]
Combining (4.5) and (4.6) and using the fact that $f$ is differentiable at $c$, we have
\[ 0 \leq \lim_{h \to 0^-} \frac{f(c + h) - f(c)}{h} = f'(c) = \lim_{h \to 0^+} \frac{f(c + h) - f(c)}{h} \leq 0, \]
so that $f'(c) = 0$.
Let $f$ be continuous on $[a, b]$ and differentiable on $(a, b)$. If $f(a) = f(b) = 0$ then there exists
a point $c \in (a, b)$ such that $f'(c) = 0$.
The intuition for Rolle's theorem is quite clear. If our function is continuous (and differentiable),
then in order for the function to start at 0 and end at 0, at some point it must have been
increasing and then decreasing (or vice-versa). This corresponds to a derivative which is positive and
then negative, which we suspect will imply that at some point the derivative was in fact zero.
Proof. If $f$ is constant (and hence the zero function) then we are done, so assume that $f$ is a
non-constant function. Since $f$ is continuous on $[a, b]$, by the Extreme Value Theorem we know
that it achieves its maximum and minimum on $[a, b]$. Since the function is non-constant, at least
one of the maximum or minimum is non-zero. Without loss of generality, assume that $f$ takes on
a maximum at $c \in [a, b]$ and that $f(c) > 0$. Since $f(c) > 0$, $c$ cannot be an endpoint (since
$f(a) = f(b) = 0$) and hence must be an interior point. By Proposition 4.10, it then follows that
$f'(c) = 0$ as required.
Rolle's Theorem on its own can be applied to give some fairly useful results. One example is
that we can use it to say something about the number of roots of a function.
Example 4.12
Let $f(x)$ be a differentiable function, and assume that $f'(x) \neq 0$ for all $x$. Show that $f$ has
at most one root.
Solution. As we have discussed before, the fact that $f(x)$ is continuously differentiable means that
$f'(x)$ is continuous. Since $f'(x) \neq 0$ for all $x \in (a, b)$ it must follow that $f'(x)$ is either
strictly positive or strictly negative (otherwise there are points $c_1, c_2$ such that $f'(c_1) > 0$ and
$f'(c_2) < 0$ and the Intermediate Value Theorem would then imply a point $c$ between $c_1$ and
$c_2$ such that $f'(c) = 0$).
Figure 16: Plot of a decreasing (left) and increasing (right) function. No matter how we choose to
draw such a function, it may only have at most one root.
Think about any nice (read: differentiable) graph with two roots. Why can such a graph not occur
among the functions of Figure 16? The reason is that the derivative must switch signs at some
point. In particular, there is a point $p \in (a, b)$ where $f'(p) = 0$. This is the contradiction at
which we would like to arrive, and Rolle's Theorem will give it to us.
For the sake of contradiction, assume that the function has more than one root and choose any
two roots, say $r, s \in (a, b)$, so that $f(r) = f(s) = 0$. Since our function is continuously
differentiable, it certainly satisfies the hypotheses of Rolle's theorem, so there exists a point $c$
between $r$ and $s$ such that
\[ f'(c) = 0. \]
This is a contradiction to the fact that $f'(x) \neq 0$ for all $x \in (a, b)$. Hence the function cannot
have two roots, so we conclude it has at most one root.
Example 4.13
Show that the function $f(x) = x^3 - 6x^2 + 12x + e^x - 4$ has exactly one root.
Solution. We will first use the Intermediate Value Theorem to show that there is a root, then we
will argue that this can be the only root.
Indeed, our function is certainly continuous, and we notice that $\lim_{x \to -\infty} f(x) = -\infty$ and
$\lim_{x \to \infty} f(x) = \infty$. This certainly suggests there is a root. If we want to be more precise,
one can show that $f(-10) < 0$ and $f(10) > 0$ without too much trouble. Hence by the Intermediate
Value Theorem, there is a root in $[-10, 10]$.
Now we claim that this is the only root. Since $f$ is differentiable everywhere, by Example 4.12
it suffices to show that $f'(x) \neq 0$ for all $x$. Differentiating we get
\[ f'(x) = 3x^2 - 12x + 12 + e^x = 3(x - 2)^2 + e^x. \]
The summands here satisfy $3(x - 2)^2 \geq 0$ and $e^x > 0$, which means that between them we have
$f'(x) > 0$. We conclude that the function has at most one root. Combining our data, we find that
$f$ has exactly one root, as required.
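The Intermediate Value Theorem argument also suggests a way to approximate the root. The sketch below runs bisection on $[-10, 10]$, taking $f(x) = x^3 - 6x^2 + 12x + e^x - 4$, which is the reading of the formula consistent with $f'(x) = 3(x - 2)^2 + e^x$. The helper name `bisect_root` is hypothetical.

```python
import math

def f(x):
    """The function from the example, with derivative 3(x - 2)^2 + e^x."""
    return x**3 - 6 * x**2 + 12 * x + math.exp(x) - 4

def bisect_root(func, lo, hi, tol=1e-12):
    """Bisection, valid because func(lo) < 0 < func(hi) and func is continuous."""
    assert func(lo) < 0 < func(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

root = bisect_root(f, -10.0, 10.0)
```

Because $f$ is strictly increasing, the bracket shrinks onto the unique root, which lies between 0.25 and 0.3.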
4.3.2
We implied earlier that the Mean Value Theorem and Rolle's Theorem are in fact equivalent. It's
not too hard to convince ourselves that Rolle's Theorem is actually just a special case of the Mean
Value Theorem, but the fact that we use Rolle's theorem to prove the Mean Value Theorem is why
they are equivalent.
The reason Rolle's theorem is so useful is that the Mean Value Theorem effectively reduces
to turning your head to the side and applying Rolle's Theorem.
Figure 17: We can turn the statement of the Mean Value Theorem (left, the graph of $f(x)$) into an
equivalent problem which can be solved using Rolle's Theorem (right, the graph of
$F(x) = f(x) - \frac{f(b) - f(a)}{b - a}(x - a)$).
Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$. Then there
exists a $c \in (a, b)$ such that
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \tag{4.7} \]
Proof. As mentioned prior to the theorem statement, the trick is to define a new function by
pivoting $f(x)$ about the point $f(a)$ until $f(a) = f(b)$. At this point we will be able to use Rolle's
theorem, which will complete the result.
Define a function $F : [a, b] \to \mathbb{R}$ by
\[ F(x) = f(x) - \frac{f(b) - f(a)}{b - a}(x - a) \]
and notice that $F$ is continuous on $[a, b]$ and differentiable on $(a, b)$. Furthermore,
\[ F(a) = f(a) - \frac{f(b) - f(a)}{b - a}(a - a) = f(a), \qquad
   F(b) = f(b) - \frac{f(b) - f(a)}{b - a}(b - a) = f(a) \]
so that $F(a) = F(b)$. Applying Rolle's Theorem to the function $F$, we know there exists some
$c \in (a, b)$ such that $F'(c) = 0$. This in turn implies that
\[ 0 = F'(c) = f'(c) - \frac{f(b) - f(a)}{b - a}, \qquad\text{so that}\qquad f'(c) = \frac{f(b) - f(a)}{b - a}. \]
If $f : [a, b] \to \mathbb{R}$ is differentiable and $f'(x) = 0$ for all $x \in [a, b]$, then $f$ is a
constant function on $[a, b]$.
Proof. Since we want to show that $f$ is constant, it is sufficient to show that $f(x) = f(a)$ for
all $x$ (since $f(a)$ is a constant). Let $p \in [a, b]$ be some arbitrary point and consider the interval
$[a, p] \subseteq [a, b]$. Since $f$ is differentiable on the larger of these two, its restriction is
continuous on $[a, p]$ and differentiable on $(a, p)$.
Recalling that $f'(x) = 0$ for all $x$, we can now apply the Mean Value Theorem to find some
$c \in (a, p)$ such that
\[ 0 = f'(c) = \frac{f(p) - f(a)}{p - a}. \]
This in turn implies that $f(p) - f(a) = 0$ so that $f(p) = f(a)$. Since $p$ was arbitrary, we must have
that $f(x) = f(a)$ for all $x \in [a, b]$, and so $f$ is the constant function.
4.4
We introduced the idea of maxima and minima in Section 4.3 as it was necessary to prove the Mean
Value Theorem. Here now we develop tools from the Mean Value Theorem for determining where
maxima and minima occur. Intuitively, we saw that a function has a maximum at a point if it is
increasing to the left of that point and decreasing to the right (with an analogous notion for
minima). In order to make this idea more formal, we introduce the notion of increasing/decreasing
functions.
Definition 4.16
\[ f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1}. \]
But $x_2 - x_1 > 0$ and $f'(c) < 0$ implies that $f(x_2) - f(x_1) < 0$, or rather, $f(x_1) > f(x_2)$.
Since $x_1$ and $x_2$ were chosen arbitrarily, it then follows that $f(x_1) > f(x_2)$ whenever
$x_1 < x_2$, which is precisely the definition of decreasing.
For the case when $f'(x) > 0$, define a new function $g(x) = -f(x)$ and notice that
$g'(x) = -f'(x) < 0$. By our previous case, we know that $g$ is decreasing on $[a, b]$; that is, if
$x_1 < x_2$ then $g(x_1) > g(x_2)$. This in turn implies that $f(x_1) < f(x_2)$, so $f$ is increasing.
Example 4.18
Solution. In light of the previous corollary, our goal is to find intervals on which the derivative is
negative. Computing the derivative we get
\[ f'(x) = 2x\sin(x^2). \]
If $x > 0$ then the sign of $f'(x)$ is entirely determined by $\sin(x^2)$. Now we know that if $y > 0$
then $\sin(y) < 0$ for $y \in ((2k+1)\pi, (2k+2)\pi)$, $k \in \mathbb{N}$. Hence $\sin(x^2) < 0$ whenever
$x \in [\sqrt{(2k+1)\pi}, \sqrt{(2k+2)\pi}]$ or $x \in [-\sqrt{(2k+2)\pi}, -\sqrt{(2k+1)\pi}]$, up to
endpoints. But since we wanted $x > 0$, we can throw away the negative solutions.
For $x < 0$ we want $\sin(x^2) > 0$, which happens when $x \in [-\sqrt{(2k+1)\pi}, -\sqrt{2k\pi}]$,
and so our final solution is
\[ \bigcup_{k \in \mathbb{N}} \left[-\sqrt{(2k+1)\pi}, -\sqrt{2k\pi}\right] \cup
   \bigcup_{k \in \mathbb{N}} \left[\sqrt{(2k+1)\pi}, \sqrt{(2k+2)\pi}\right]. \]
Via Proposition 4.10 we saw that extreme points which occurred on the interior of the domain
of a function necessarily had zero derivative. These values are in fact so special that they have a
name:
Definition 4.19
Critical points and values are exceptionally interesting in mathematics, though our treatment of them in
this course will be very limited. One example is Sard's theorem, which says that if $f$ is differentiable, then
there are very few critical values. Alternatively, there is an entire study called Morse Theory, which is
interested in reconstructing spaces by using information about critical values of functions on that space.
Proposition 4.10 told us that if $c$ is a max/min on the interior of the domain of a function, then
$c$ is a critical point. Does the converse need to be true? Unfortunately the answer is no: A simple
example is the function $f(x) = x^3$. This function has a critical point at $x = 0$ since
$f'(x) = 3x^2$. However, $f(x)$ is actually an increasing function, so the point 0 certainly cannot be a
maximum or minimum!
Since it is only necessary for extrema to be critical points, we need to develop a strategy for
determining whether a critical point actually corresponds to a max/min. Our first such test is
primitive yet intuitive:
Figure 18: The function f (x) = x3 has a critical point at x = 0 despite not having a max or min
at x = 0.
Proof. Effectively, the theorem says that if our function is increasing to the left of the critical point
and decreasing to the right, then we have a maximum, and vice versa for a minimum. Indeed, this
is what we suspect.
We will do the proof for the maximum only, and leave the other as an exercise for the student.
Hence assume that such a $\delta$ exists. By Proposition 4.17, since $f'(x) > 0$ for all
$x \in (c - \delta, c)$ we know that $f$ is an increasing function on this interval, and since $f$ is
continuous it then follows that $f(x) \leq f(c)$ for all $x \in (c - \delta, c)$. Similarly, since
$f'(x) < 0$ for all $x \in (c, c + \delta)$ we have that $f(x) \leq f(c)$ for all $x \in (c, c + \delta)$,
hence $c$ is a maximum.
Example 4.21
Find the (local) maxima and minima of the function $f(x) = 6x^4 - 3x^2 + 2$.
Solution. We begin by finding the critical points, so we differentiate to get
$f'(x) = 24x^3 - 6x = 6x(4x^2 - 1) = 6x(2x - 1)(2x + 1)$. Setting $f'(x) = 0$ and solving for $x$, we
get $x = 0, \pm\frac{1}{2}$. Now since $f'(x)$ splits into linear factors, and each linear factor can only
switch sign once, we can determine whether $f$ is increasing or decreasing on each interval by
creating the following chart:

            x < -1/2    -1/2 < x < 0    0 < x < 1/2    x > 1/2
6x              -             -               +            +
2x + 1          -             +               +            +
2x - 1          -             -               -            +
f'(x)           -             +               -            +

Hence $f$ has local minima at $x = \pm\frac{1}{2}$ and a local maximum at $x = 0$.
Figure 19: The function $f(x) = 6x^4 - 3x^2 + 2$ from Example 4.21.
The chart from Example 4.21 was non-trivial to produce. It turns out there is a much simpler
test:
Theorem 4.22: Second Derivative Test
Figure 20: The function $f(x) = x^2(1 - x)^3$ from Example 4.23.
Example 4.23
Let $f(x) = x^2(1 - x)^3$. Find the critical points of $f$ and classify them.
Solution. To find the critical points, we begin by differentiating:
\[ f'(x) = 2x(1 - x)^3 - 3x^2(1 - x)^2. \]
Setting $f'(x) = 0$ and solving for $x$, we immediately see that $x = 0$ and $x = 1$ are solutions.
Furthermore, when $x \neq 0, 1$,
\[ 2x(1 - x)^3 = 3x^2(1 - x)^2 \implies 2(1 - x) = 3x \implies x = \frac{2}{5}. \]
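To classify the three critical points, one can apply the second derivative test numerically. The sketch below approximates $f''$ by a central difference of the formula for $f'$ above; the function name `classify`, the step `h` and the tolerance `tol` are illustrative choices rather than anything prescribed by the notes.

```python
def f_prime(x):
    """f'(x) for f(x) = x^2 (1 - x)^3, as computed in the solution."""
    return 2 * x * (1 - x)**3 - 3 * x**2 * (1 - x)**2

def classify(c, h=1e-5, tol=1e-6):
    """Second derivative test at a critical point c, via a central difference of f'."""
    second = (f_prime(c + h) - f_prime(c - h)) / (2 * h)
    if second > tol:
        return "local min"
    if second < -tol:
        return "local max"
    return "inconclusive"

results = {c: classify(c) for c in (0.0, 0.4, 1.0)}
```

The test reports a local minimum at $x = 0$ and a local maximum at $x = 2/5$. At $x = 1$ the second derivative vanishes, so the test is inconclusive there and the sign of $f'$ on either side has to be examined instead.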
We have seen how to determine local minima/maxima which occur on the interior of the domain
of a function, but if our function is defined on a closed and bounded domain (like $[a, b]$) then it
could have endpoints which are maxima and minima. The definition for an endpoint to be a local
max or min is the same as that given in Definition 4.9, except that we only look at one-sided
neighbourhoods.
Rather than being interested in local minima and maxima, we are more often interested in
finding global extrema. The global maximum of a function $f$ occurs at a point $c$ such that for all
$x$ in the domain of $f$, $f(x) \leq f(c)$; similarly for the global minimum. Of course, global extrema
can occur at interior points, endpoints, or might not exist at all! However, since we know that
extrema can only occur at critical points or at endpoints, all we need to do is check all possible
(hopefully finite) points and determine which amongst them is the largest/smallest.
Example 4.24
Compute the global maximum and minimum of the function $f(x) = x - \arctan(x)$ on the
interval $[-1, 1]$.
Solution. We begin by evaluating $f$ at the endpoints:
\[ f(-1) = -1 + \frac{\pi}{4}, \qquad f(1) = 1 - \frac{\pi}{4}. \]
Next we compute the critical points. The derivative of $f(x)$ is
\[ f'(x) = 1 - \frac{1}{1 + x^2} = \frac{(1 + x^2) - 1}{1 + x^2} = \frac{x^2}{1 + x^2} \]
which has a single zero at $x = 0$. Plugging this back into our function we get
$f(0) = 0 - \arctan(0) = 0$. A quick comparison reveals that
\[ f(-1) < f(0) < f(1) \]
so that the global min is at $-1$ and the global max is at $+1$.
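The "check every candidate" strategy is easy to mechanize. Here is a short sketch for $f(x) = x - \arctan(x)$ on $[-1, 1]$, where the candidate list (the two endpoints and the critical point $x = 0$) is taken from the solution above and the variable names are illustrative.

```python
import math

def f(x):
    return x - math.atan(x)

candidates = [-1.0, 0.0, 1.0]  # endpoints plus the critical point x = 0
values = {x: f(x) for x in candidates}
global_min = min(candidates, key=f)
global_max = max(candidates, key=f)
```

Comparing the three values confirms that the global minimum sits at $x = -1$ and the global maximum at $x = 1$.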
Example 4.25
Find the area of the largest rectangle which can be inscribed in a circle of radius $R$.
Solution. The trick to doing this question is to choose an intelligent parameterization of the
rectangles. Assume that the circle is centred at the origin and notice that every inscribed rectangle
can be uniquely prescribed by the angle $\theta$ one of its vertices makes with the positive $x$-axis
(see Figure 21). In this way one may construct the area function
\[ A(\theta) = 4R^2\sin\theta\cos\theta = 2R^2\sin(2\theta) \]
for $\theta \in [0, \pi/2]$ [20]. We differentiate to get $A'(\theta) = 4R^2\cos(2\theta)$, which has a
zero at $\theta = \pi/4$. It is not too hard to see that this is a max, since
\[ A''(\pi/4) = -8R^2\sin\left(\frac{\pi}{2}\right) < 0. \]
Furthermore, the two endpoints actually have zero area, so this is the only such maximum, giving a
largest area of $A(\pi/4) = 2R^2$.
[20] Strictly speaking, we need only take $\theta \in [0, \pi/4]$ but this requires arguing about the
symmetry of the problem which will just serve to complicate the solution.
Figure 21: The construction of the parameterization of inscribed rectangles of a circle of radius $R$,
with one vertex at $(R\cos\theta, R\sin\theta)$.
4.5 L'Hôpital's Rule
Just when we thought we had left limits behind forever, they find a way to creep back up on us.
This time however, we are wielding a powerful weapon: the Mean Value Theorem. Recall that
when evaluating limits of functions which were not continuous at the limit point, we often had to
perform some algebraic trickery to manipulate the function into a form more amenable to
substitution. It was a task which often required an intimate knowledge of the behaviour of the
function. L'Hôpital's Rule shall make this much easier.
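Definition 4.26
If $\lim_{x \to c} f(x) = 0$ and $\lim_{x \to c} g(x) = 0$, we say that $\lim_{x \to c} \frac{f(x)}{g(x)}$
is indeterminate of type $0/0$. If $\lim_{x \to c} f(x) = \pm\infty$ and $\lim_{x \to c} g(x) = \pm\infty$,
we say that $\lim_{x \to c} \frac{f(x)}{g(x)}$ is indeterminate of type $\infty/\infty$.
Note: If we take the limit as $x \to \infty$ or $x \to -\infty$, or either of the one sided limits
$x \to c^{\pm}$, we will still say that the limits are of the above indeterminate types.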
Indeterminate forms are precisely the aforementioned pesky functions. Here is how we attack
them.
Theorem 4.27: L'Hôpital's Rule
Suppose that $f$ and $g$ are differentiable in a neighbourhood of $c$, and that the limit
$\lim_{x \to c} \frac{f(x)}{g(x)}$ is indeterminate of either type $0/0$ or type $\infty/\infty$. If one of
the following conditions holds:
1. $\lim_{x \to c} \frac{f'(x)}{g'(x)}$ exists,
2. $\lim_{x \to c} \frac{f'(x)}{g'(x)} = \pm\infty$,
then $\lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}$.
Proof. We shall do the proof for the $0/0$-indeterminate type and with condition (1). The others
are left as an exercise for the student. Let $[c - \delta, c + \delta]$ be the neighbourhood on which
$f, g$ are differentiable. Define if necessary the auxiliary functions
\[ F(x) = \begin{cases} f(x) & x \neq c \\ 0 & x = c \end{cases}, \qquad
   G(x) = \begin{cases} g(x) & x \neq c \\ 0 & x = c \end{cases} \]
so that both $F$ and $G$ are continuous at $c$. Let $h > 0$ be sufficiently small so that
$[c, c + h] \subseteq (c - \delta, c + \delta)$, and notice that the function
$H(x) = F(x)G(c + h) - G(x)F(c + h)$ is continuous on $[c, c + h]$ and differentiable on $(c, c + h)$.
Furthermore, $H(c) = 0$ and $H(c + h) = 0$, so by Rolle's Theorem, there exists some
$\xi \in (c, c + h)$ such that
\[ 0 = H'(\xi) = F'(\xi)G(c + h) - G'(\xi)F(c + h). \]
In particular, this means that
\[ \frac{F'(\xi)}{G'(\xi)} = \frac{F(c + h)}{G(c + h)}. \]
Since $\xi$ lies between $c$ and $c + h$, we have $\xi \to c^+$ as $h \to 0^+$, and therefore
\[ \lim_{x \to c^+} \frac{f(x)}{g(x)} = \lim_{h \to 0^+} \frac{F(c + h)}{G(c + h)} = \lim_{\xi \to c^+} \frac{F'(\xi)}{G'(\xi)}. \]
Hence the result holds for the one sided limit. The two sided limit can be done by repeating the
argument for $h < 0$.
Example 4.28
Determine $\lim_{x \to 0} \frac{xe^{nx} - x}{1 - \cos(nx)}$ for $n \neq 0$.
Solution. It is easy to check that this is of type $0/0$, so we can apply L'Hôpital to get
\[ \begin{aligned}
\lim_{x \to 0} \frac{xe^{nx} - x}{1 - \cos(nx)}
  &\overset{H}{=} \lim_{x \to 0} \frac{e^{nx} + nxe^{nx} - 1}{n\sin(nx)} \\
  &\overset{H}{=} \lim_{x \to 0} \frac{ne^{nx} + ne^{nx} + n^2xe^{nx}}{n^2\cos(nx)} \\
  &= \frac{2n}{n^2} = \frac{2}{n}.
\end{aligned} \]
This is an example where we actually need to apply L'Hôpital's rule more than once. There are
also lots of interesting results we can get from this theorem.
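A limit obtained through repeated applications of the rule is worth checking numerically. The sketch below evaluates the quotient $\frac{xe^{nx} - x}{1 - \cos(nx)}$ at a few small values of $x$ for the arbitrary choice $n = 3$; the use of `math.expm1` to reduce cancellation in the numerator is an implementation choice, not something from the notes.

```python
import math

def quotient(x, n):
    """(x*e^{nx} - x) / (1 - cos(nx)), using expm1 to limit cancellation."""
    return x * math.expm1(n * x) / (1.0 - math.cos(n * x))

n = 3
approx = [quotient(10.0**(-k), n) for k in (1, 2, 3, 4)]  # x = 0.1 down to 0.0001
```

The entries of `approx` settle towards $2/n$ as $x$ shrinks, in line with the computation above.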
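Example 4.29
Determine the limits
\[ \lim_{h \to 0} \frac{f(x + h) - f(x - h)}{2h} \qquad\text{and}\qquad
   \lim_{h \to 0} \frac{f(x + h) - 2f(x) + f(x - h)}{h^2}. \tag{4.8} \]
Solution. Contextually, it should be obvious that we are going to proceed via L'Hôpital's rule,
though the student might recall that I pointed out the first limit earlier in the course.
We recognize that the limit in question is indeterminate of type $0/0$ and so we have
\[ \lim_{h \to 0} \frac{f(x + h) - f(x - h)}{2h} \overset{H}{=}
   \lim_{h \to 0} \frac{f'(x + h) - (-1)f'(x - h)}{2} = f'(x). \tag{4.9} \]
For the second limit,
\[ \begin{aligned}
\lim_{h \to 0} \frac{f(x + h) - 2f(x) + f(x - h)}{h^2}
  &= \lim_{h \to 0} \frac{f'(x + h) - f'(x - h)}{2h} && \text{by L'Hôpital with type } 0/0 \\
  &= (f')'(x) && \text{by (4.9)} \\
  &= f''(x).
\end{aligned} \]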
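The second limit is the basis of a common numerical approximation to the second derivative. As a small illustration, the sketch below applies the quotient to $\sin$ at $x = 1$ with $h = 10^{-4}$; both choices, and the function name `second_difference`, are arbitrary.

```python
import math

def second_difference(f, x, h):
    """(f(x+h) - 2 f(x) + f(x-h)) / h^2, the quotient from the example."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

estimate = second_difference(math.sin, 1.0, 1e-4)  # compare with -sin(1)
```

Since the second derivative of $\sin$ is $-\sin$, the estimate should be close to $-\sin(1)$. Taking $h$ much smaller than this tends to make the result worse, because rounding error in the numerator is divided by $h^2$.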
Corollary 4.30
Solution. Define the function $F(x) = f(x) - f(p)$ and let $G(x) = x - p$. We claim that we can
apply L'Hôpital's rule to the quotient $F(x)/G(x)$ at the point $p$. Since $f$ is continuous, we clearly
have that
\[ \lim_{x \to p} F(x) = \lim_{x \to p} [f(x) - f(p)] = f(p) - f(p) = 0 \]
p0
(4.10)
f 0 (p) = lim
LHopitals Rule
Equation (4.10)
xp
There are several other indeterminate types, which we approach by converting them into the type
0/0 case.
1. $0 \cdot \infty$: Possibly the easiest case to deal with. Assume that we are given a limit of the
form
\[ \lim_{x \to a} f(x)g(x), \qquad\text{where}\qquad \lim_{x \to a} f(x) = 0 \quad\text{and}\quad \lim_{x \to a} g(x) = \infty. \]
We can rewrite the product as either of the quotients
\[ \lim_{x \to a} \frac{f(x)}{1/g(x)} \quad (\text{type } 0/0), \qquad
   \lim_{x \to a} \frac{g(x)}{1/f(x)} \quad (\text{type } \infty/\infty). \]
2. $\infty - \infty$: In contrast to the previous case, these are possibly the hardest to transform into
one of $0/0$ or $\infty/\infty$. This case arises when we have a limit of the form:
\[ \lim_{x \to a} [f(x) - g(x)], \qquad\text{where}\qquad \lim_{x \to a} f(x) = \infty \quad\text{and}\quad \lim_{x \to a} g(x) = \infty. \]
There is no obvious way to deal with these problems, and they often need to be done on a
case-by-case basis.
3. $1^\infty$, $0^0$, $\infty^0$: These cases are all handled identically, so we shall treat them together.
For the sake of concreteness, assume we are given a limit of the form
\[ \lim_{x \to a} f(x)^{g(x)}, \qquad\text{where}\qquad \lim_{x \to a} f(x) = \infty \quad\text{and}\quad \lim_{x \to a} g(x) = 0. \]
This case is handled by calling upon our old friend from implicit differentiation, the logarithm. Let
$y = f(x)^{g(x)}$ so that $\log(y) = g(x)\log f(x)$. As we have seen previously, continuity of $e^x$
means that it is sufficient to compute
\[ \lim_{x \to a} \log y, \qquad\text{since}\qquad \lim_{x \to a} y = \lim_{x \to a} e^{\log(y)} = e^{\lim_{x \to a} \log y}. \]
Now in the limit, $\log(y) = g(x)\log f(x)$ is of type $0 \cdot \infty$, so we resort to case (1) which
tells us that we compute this limit by either
\[ \lim_{x \to a} \frac{g(x)}{1/\log f(x)} \qquad\text{or}\qquad \lim_{x \to a} \frac{\log f(x)}{1/g(x)}. \]
In the case where $f(x) \to 1$ and $g(x) \to \infty$, or $f(x) \to 0$ and $g(x) \to 0$, we also get
$0 \cdot \infty$ and so these cases are treated the same as above.
Note that there is no type $0^\infty$ despite the fact that this seems like it could potentially be
indeterminate. The intuitive reason for this is that as soon as the base becomes less than the number
1, taking powers will actually drive the limit closer to 0. Hence $0^\infty = 0$ and is not an
indeterminate form.
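The logarithm trick in case (3) can be illustrated numerically. The sketch below takes $f(x) = 1 + a/x$ and $g(x) = x$ as $x \to \infty$ (a $1^\infty$ form) with the arbitrary choice $a = 2$, and computes the power as $e^{\log y}$; the function name `power_via_log` is hypothetical.

```python
import math

def power_via_log(a, x):
    """(1 + a/x)^x computed as exp(x * log(1 + a/x)), mirroring y = e^{log y}."""
    return math.exp(x * math.log1p(a / x))

a = 2.0
approx = [power_via_log(a, 10.0**k) for k in (2, 4, 6)]
```

The values in `approx` approach $e^a$ as $x$ grows, which is what evaluating $\lim \log y = a$ and exponentiating predicts.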
Example 4.31
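Solution. In the limit as $x \to 0^+$ the function $\log(x) \to -\infty$, so this limit is indeterminate
of type $0 \cdot \infty$. We now have to make a choice as to which component to invert. Let us choose
to first invert the $x$, giving
\[ \lim_{x \to 0^+} x\log(x) = \lim_{x \to 0^+} \frac{\log(x)}{1/x} \overset{H}{=}
   \lim_{x \to 0^+} \frac{1/x}{-1/x^2} = \lim_{x \to 0^+} -x = 0. \]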
Example 4.32
Solution. It is easy to see that this limit is indeterminate of type $\infty - \infty$, but as everything
is trigonometric, there is likely some simplification that can be performed. Writing everything in
terms of $\sin(x)$ and $\cos(x)$ we get
\[ \lim_{x \to 0} [\csc(x) - \cot(x)] = \lim_{x \to 0} \left[\frac{1}{\sin(x)} - \frac{\cos(x)}{\sin(x)}\right]
   = \lim_{x \to 0} \frac{1 - \cos(x)}{\sin(x)} \overset{H}{=} \lim_{x \to 0} \frac{\sin(x)}{\cos(x)} = 0. \]
Note that alternatively in this last step, we could have avoided using L'Hôpital's Rule by writing
\[ \lim_{x \to 0} \frac{1 - \cos(x)}{\sin(x)} = \lim_{x \to 0} \frac{\;\frac{1 - \cos(x)}{x}\;}{\frac{\sin(x)}{x}} = \frac{0}{1} = 0. \]
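Example 4.33
Determine $\lim_{x \to \infty} \left(1 + \frac{a}{x}\right)^x$.
\[ \lim_{x \to \infty} \left(1 + \frac{a}{x}\right)^x = e^{\lim_{x \to \infty} \log y} = e^a. \]
4.5.2 Little o-notation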
In many areas of computer science and number theory, one is interested in the growth rate of
functions relative to other functions. There are many ways of characterizing these growth rates,
but one way is with little o notation.
Definition 4.34
Example 4.35
\[ \lim_{x \to \infty} \frac{x}{e^x} \overset{H}{=} \lim_{x \to \infty} \frac{1}{e^x} = 0. \]
Now assume that the result holds for some $k \geq 1$, and notice that
\[ \lim_{x \to \infty} \frac{x^{k+1}}{e^x} \overset{H}{=} (k + 1)\lim_{x \to \infty} \frac{x^k}{e^x} = 0 \]
where the last equality is via the induction hypothesis. We thus conclude that $x^n = o(e^x)$ for all
$n \in \mathbb{N}$.
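The claim $x^n = o(e^x)$ can be seen in a quick numerical experiment. The sketch below tabulates the ratio $x^n / e^x$ for the arbitrary choice $n = 5$ at a few increasing values of $x$; the function name `ratio` is illustrative.

```python
import math

def ratio(n, x):
    """x^n / e^x, which should tend to 0 as x grows if x^n = o(e^x)."""
    return x**n / math.exp(x)

table = [(x, ratio(5, x)) for x in (10.0, 50.0, 100.0)]
```

The ratio is still larger than 1 at $x = 10$, but it collapses rapidly afterwards, which is the behaviour the induction argument establishes for every $n$.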
Exercise:
4.6
Curve Sketching
Without the assistance of computers, it can be difficult to assess the qualitative properties of a
function. Luckily, we can use our differentiation tools to begin to analyze how functions behave.
The first notion we would like to examine is concavity. The true definition of concavity is rather
subtle and not terribly enlightening. Since we often deal with very nice (read: differentiable)
functions, we shall instead define concavity as follows:
Definition 4.36
Let $f$ be a twice differentiable function on the interval $(a, b)$. We say that the graph of $f$
is concave up at $c \in (a, b)$ if $f'$ is increasing in a neighbourhood of $c$, and concave down if
$f'$ is decreasing in a neighbourhood of $c$. We say that $c \in (a, b)$ is an inflection point if the
concavity of the function changes in any neighbourhood of $c$.
If our function is twice differentiable, concavity may be ascertained by examining the sign of
$f''(c)$ (since this describes whether $f'$ is increasing or decreasing), and it will be necessary that
points of inflection satisfy $f''(c) = 0$. On the other hand, if $f$ is not twice differentiable, points
of inflection could correspond to places where $f''(x)$ does not exist. The notion of concavity is
intimately related to that of maxima and minima. Recall that from the second derivative test, if
$f'(c) = 0$ and $f''(c) \neq 0$ then $c$ is a min if $f''(c) > 0$ and a max if $f''(c) < 0$.
Exercise: Show that if $f$ is concave up on an interval $(a, b)$ then for any $x_1, x_2 \in (a, b)$ and
any $t \in [x_1, x_2]$, we have
4.6.1
Our goal is thus to combine all of our information into a system which allows us to analyze the
qualitative behaviour of a function without knowing the nitty-gritty details of its exact value at
every point. There are approximately seven pieces of information that we need to compute to
Let $f(x)$ and $g(x)$ be continuous functions. We say that $f(x)$ behaves like $g(x)$ asymptotically
if
\[ \lim_{x \to \infty} [f(x) - g(x)] = 0. \]
We say that $f(x)$ has an oblique asymptote if $f(x)$ behaves asymptotically like $g(x) = mx + b$
for some $m \neq 0$.
Example 4.38
Determine the oblique asymptotes of $f(x) = \frac{2x^3 - x^2 + 2x}{x^2 + 1}$.
Solution. The easiest way to proceed is to try to write $f(x)$ as a polynomial plus a proper rational
function. Performing long division, we see that
\[ f(x) = (2x - 1) + \frac{1}{x^2 + 1}. \]
The idea is that in the limit as $x \to \infty$, the $1/(x^2 + 1)$ term will die off and contribute very
little to the behaviour of $f(x)$, so that $f(x)$ looks like the function $2x - 1$. To see that this
satisfies the definition above, set $g(x) = 2x - 1$ so that
\[ \lim_{x \to \infty} [f(x) - g(x)] = \lim_{x \to \infty} \left[2x - 1 + \frac{1}{x^2 + 1} - (2x - 1)\right]
   = \lim_{x \to \infty} \frac{1}{x^2 + 1} = 0, \]
which is precisely what we wanted to show. Similarly, the limit as $x \to -\infty$ shows that $f(x)$ is
also asymptotically like $g(x)$ in that limit as well. Thus $2x - 1$ is an oblique asymptote for $f(x)$ at
both $\pm\infty$.
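The long division step is mechanical enough to sketch in code. The following is a minimal illustration, assuming polynomials are stored as coefficient lists from the highest power down; the function name `poly_divide` and this representation are choices made here, not part of the notes.

```python
def poly_divide(num, den):
    """Polynomial long division on coefficient lists (highest power first).

    Returns (quotient, remainder); the quotient of a cubic by a quadratic
    is the candidate oblique asymptote m*x + b.
    """
    num = [float(c) for c in num]
    n, d = len(num), len(den)
    if n < d:
        return [0.0], num
    quotient = []
    for i in range(n - d + 1):
        coeff = num[i] / den[0]
        quotient.append(coeff)
        for j in range(d):
            num[i + j] -= coeff * den[j]  # subtract coeff * den, shifted by i
    return quotient, num[n - d + 1:]

# (2x^3 - x^2 + 2x) divided by (x^2 + 1)
q, r = poly_divide([2, -1, 2, 0], [1, 0, 1])
```

Here `q` encodes $2x - 1$ and `r` encodes the constant remainder 1, matching $f(x) = (2x - 1) + \frac{1}{x^2 + 1}$.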
Now that we know how to compute all terms involved in this computation, we shall proceed
with some examples.
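Example 4.39
Sketch the curve $f(x) = \frac{x^3}{(x + 1)^2}$.
Solution. Domain: The only point which could possibly give us trouble is $x = -1$. Hence our
domain is simply $\mathbb{R} \setminus \{-1\}$.
Intercepts: The $y$-intercept occurs when $x = 0$, so namely $f(0) = 0$. Similarly the $x$-intercept
comes when $y = 0$, for which we see that
\[ \frac{x^3}{(x + 1)^2} = 0 \iff x = 0. \]
Asymptotes:
\[ \lim_{x \to \pm\infty} \frac{x^3}{(x + 1)^2} = \pm\infty, \qquad
   \lim_{x \to -1} \frac{x^3}{(x + 1)^2} = -\infty. \]
Finally, we want to check for oblique asymptotes. Using long polynomial division we may easily
find that
\[ \frac{x^3}{(x + 1)^2} = (x - 2) + \frac{3x + 2}{x^2 + 2x + 1} \]
and, if we multiply and divide by $1/x^2$,
\[ \lim_{x \to \pm\infty} \frac{3x + 2}{x^2 + 2x + 1} = \lim_{x \to \pm\infty} \frac{3/x + 2/x^2}{1 + 2/x + 1/x^2} = 0, \]
so that $y = x - 2$ is an oblique asymptote.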
First Derivative: Computing the first derivative can be a chore, but we find that
\[ \begin{aligned}
f'(x) &= \frac{3x^2(x + 1)^2 - 2(x + 1)x^3}{(x + 1)^4} \\
      &= \frac{3x^4 + 6x^3 + 3x^2 - 2x^4 - 2x^3}{(x + 1)^4} \\
      &= \frac{x^2(x + 3)(x + 1)}{(x + 1)^4} \\
      &= \frac{x^2(x + 3)}{(x + 1)^3}
\end{aligned} \]
so that the critical points correspond to $x = 0$ and $x = -3$. The $y$-values for these points will be
useful when we plot, so we substitute to find that $f(0) = 0$ and $f(-3) = -27/4$. To determine where
the function is increasing and decreasing, we consider the following table:

             x < -3    -3 < x < -1    -1 < x < 0    x > 0
x + 3           -           +              +           +
(x + 1)^3       -           -              +           +
x^2             +           +              +           +
f'(x)           +           -              +           +
Second Derivative: The second derivative is a little messy, but simplifies if done correctly.
\[ \frac{d}{dx} \frac{x^2 (x + 3)}{(x + 1)^3} = \frac{(3x^2 + 6x)(x + 1)^3 - 3(x + 1)^2 (x^3 + 3x^2)}{(x + 1)^6} = \frac{3x^3 + 3x^2 + 6x^2 + 6x - 3x^3 - 9x^2}{(x + 1)^4} = \frac{6x}{(x + 1)^4} \]
so there is an inflection point at $(0, 0)$ (telling us that one of the critical points is an inflection point). Furthermore, $f''(-3) = -9/8 < 0$ so the point $(-3, -27/4)$ is a local maximum. Since the denominator is a quartic it is always positive (away from $x = -1$), and we can see that concavity is entirely determined by the numerator $6x$. Hence $f(x)$ is concave up when $f''(x) > 0$, corresponding to $x > 0$; and $f(x)$ is concave down when $f''(x) < 0$, corresponding to $x < 0$.
Plotting: Putting all of this information together, the student should get the following plot:
[Figure: graph of $f(x) = \frac{x^3}{(x+1)^2}$.]
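Example 4.40
Sketch the graph of
\[ f(x) = \frac{x^3}{1 - x^2}. \]
Solution. Intercepts: The only intercept occurs at $x = 0$.
Symmetry: The function is odd, since
\[ f(-x) = \frac{(-x)^3}{1 - (-x)^2} = -\frac{x^3}{1 - x^2} = -f(x). \]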
Asymptotes: The vertical asymptotes will clearly occur at $x = \pm 1$. Typically, one would calculate the limits
\[ \lim_{x \to 1^{\pm}} \frac{x^3}{1 - x^2}, \qquad \lim_{x \to -1^{\pm}} \frac{x^3}{1 - x^2}, \]
but this is laborious and is redundant once we have information on the first derivative. For the interested student who would like to see how to do this all the same, we have the following table of signs:

              x^3    1 - x^2    x^3/(1 - x^2)
x -> 1+        +        -            -
x -> 1-        +        +            +
x -> -1+       -        +            -
x -> -1-       -        -            +

so that
\[ \lim_{x \to -1^-} \frac{x^3}{1 - x^2} = \lim_{x \to 1^-} \frac{x^3}{1 - x^2} = \infty, \qquad \lim_{x \to -1^+} \frac{x^3}{1 - x^2} = \lim_{x \to 1^+} \frac{x^3}{1 - x^2} = -\infty. \]
Because the degree of the numerator is strictly greater than the degree of the denominator, there are no horizontal asymptotes:
\[ \lim_{x \to \pm\infty} \frac{x^3}{1 - x^2} = \mp\infty. \]
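Finally, we want to check for oblique asymptotes. Using long polynomial division we may easily find that
\[ \frac{x^3}{1 - x^2} = -x + \frac{x}{1 - x^2} \]
so we claim that $y = -x$ is an oblique asymptote. Indeed, notice that
\[ \lim_{x \to \pm\infty} [f(x) - (-x)] = \lim_{x \to \pm\infty} \left[ \frac{x^3}{1 - x^2} + x \right] = \lim_{x \to \pm\infty} \frac{x}{1 - x^2} = \lim_{x \to \pm\infty} \frac{1/x}{1/x^2 - 1} = 0. \]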
First Derivative: This step allows us to determine where the function is increasing, decreasing, the critical points, and when combined with the second derivative, maxima and minima. The first derivative is computed to be
\[ \frac{d}{dx} \frac{x^3}{1 - x^2} = \frac{(3x^2)(1 - x^2) - (-2x)(x^3)}{(1 - x^2)^2} = \frac{x^2 (3 - x^2)}{(1 - x^2)^2}. \]
We may simply read off the critical points as $\pm 1, \pm\sqrt{3}, 0$ with potential extrema at $0, \pm\sqrt{3}$. Setting up a quick table for increasing and decreasing we have

          x < -√3    -√3 < x < -1    -1 < x < 0    0 < x < 1    1 < x < √3    x > √3
f'(x)        -             +              +             +             +           -

Since the factor $\frac{x^2}{(1 - x^2)^2}$ is never negative, the sign of $f'(x)$ is entirely determined by the sign of $3 - x^2$, which is negative whenever $|x| > \sqrt{3}$. It will
likely be useful to know the function values corresponding to our critical points. We already know that $f(0) = 0$ and we find that
\[ f(\pm\sqrt{3}) = \frac{\pm 3\sqrt{3}}{-2} = \mp\frac{3\sqrt{3}}{2}. \]
Second Derivative: The second derivative is a little messy, but simplifies if done correctly.
\[ \frac{d}{dx} \frac{x^2 (3 - x^2)}{(1 - x^2)^2} = \frac{(6x - 4x^3)(1 - x^2)^2 - 2(1 - x^2)(-2x)(3x^2 - x^4)}{(1 - x^2)^4} = \frac{2x(x^2 + 3)}{(1 - x^2)^3}. \]
The inflection points will occur when $f''(x) = 0$ or does not exist, which we can again read off as being $\{0, \pm 1\}$. We form a table to check for concavity and find

           x < -1    -1 < x < 0    0 < x < 1    x > 1
f''(x)        +           -             +          -

Evaluating the second derivative at the critical points $\pm\sqrt{3}$ gives
\[ f''(\pm\sqrt{3}) = \frac{2(\pm\sqrt{3})(3 + 3)}{(1 - 3)^3} = \mp\frac{12\sqrt{3}}{8} = \mp\frac{3\sqrt{3}}{2}. \]
Thus $\left(\sqrt{3}, -\frac{3\sqrt{3}}{2}\right)$ is a local maximum and $\left(-\sqrt{3}, \frac{3\sqrt{3}}{2}\right)$ is a local minimum. Since $f''(0) = 0$ we cannot infer any information about this critical point. If we continue to take derivatives, we will find that $f^{(3)}(0) = 6$ and so by the generalized second derivative test, 0 is an inflection point.
Plotting: Putting all of this information together, the student should get the following plot:
[Figure: graph of $f(x) = \frac{x^3}{1 - x^2}$.]
4.6.2
An interesting question comes when we ask ourselves "How much can we change the above examples and still end up with something which is qualitatively similar?" For example, what if we consider the function
\[ f_c(x) = \frac{x^3}{c - x^2} \]
for some $c \in \mathbb{R}$? What values of $c$ change the qualitative behaviour?
Our function still passes through the origin, is odd, and has an oblique asymptote of $-x$ (since polynomial long division yields $f_c(x) = -x + \frac{cx}{c - x^2}$), but we notice that whether or not $f_c(x)$ has vertical asymptotes depends on the sign of $c$. In particular, if $c > 0$ then there are vertical asymptotes at $x = \pm\sqrt{c}$, while if $c = 0$ our function becomes $f_c(x) = -x$ for all $x \neq 0$, so has no vertical asymptotes but is still not defined at a point. Finally, if $c < 0$ then our function has no vertical asymptotes, and has domain $\mathbb{R}$.
We can compute our derivative to be
\[ f_c'(x) = \frac{x^2 (3c - x^2)}{(c - x^2)^2}. \]
Again here, the sign of $c$ is important. The critical point at 0 is unchanged: if $c > 0$ then there are also critical points at $\pm\sqrt{3c}$ and $\pm\sqrt{c}$, while if $c < 0$ then there are no additional critical points.
For $c > 0$:

            x < -√(3c)    -√(3c) < x < -√c    -√c < x < 0    0 < x < √c    √c < x < √(3c)    x > √(3c)
f_c'(x)         -                +                 +              +               +               -

For $c = 0$:

            x < 0    x > 0
f_c'(x)       -        -

For $c < 0$:

            x ∈ R
f_c'(x)       - (zero only at x = 0)

This tells us that if $c \leq 0$ then this function has no maxima/minima, and is always decreasing.
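The second derivative can be computed to be
\[ f_c''(x) = \frac{2cx(x^2 + 3c)}{(c - x^2)^3}. \]
If $c > 0$ then potential inflection points are $x = 0$, $x = \pm\sqrt{c}$. If $c = 0$ then $f_c''(x) = 0$. If $c < 0$ then the potential inflection points are $x = 0$ and $x = \pm\sqrt{-3c}$.

For $c > 0$:

             x < -√c    -√c < x < 0    0 < x < √c    x > √c
f_c''(x)        +            -              +           -

For $c = 0$:

             x ∈ R
f_c''(x)       0

For $c < 0$:

             x < -√(-3c)    -√(-3c) < x < 0    0 < x < √(-3c)    x > √(-3c)
f_c''(x)          -                +                 -               +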
[Figure: two plots of $f_c(x) = \frac{x^3}{c - x^2}$.]
5 Integration
The theory of integration will consume much of the remainder of this course, and is in many ways
dual to differentiation. Consequently, the theory of integration is usually more difficult to establish,
despite the fact that it is much more natural mathematically.
The over-arching goal of integration is to add things together in a continuous fashion. This
manifests itself in applications as finding the area under a curve, the volume of an object, or even
as calculating physical quantities such as work, flux, or voltage potentials. The fact that this is even
remotely related to the process of differentiation is not at all obvious, though we will see shortly
that there is in fact an intimate relationship.
There are many different techniques for defining the integral. Any student who has seen the
integral in his/her secondary school days likely saw the method of Riemann sums, done in the typical
hand-waving fashion. We are going to digress from this approach and introduce an alternative
technique which generalizes much better to higher dimensions. This will allow for a more natural
segue for students who have to take the follow up course, MAT237.
5.1
We have to take a small digression before introducing the integral. In this section, we will introduce
the notions of infima and suprema. While these could have been introduced at the beginning of
the course, it is a difficult notion which can add complexity to an already chaotic first few weeks.
Now that we are more comfortable with abstract thinking, we shall begin to probe the subject.
As motivation to the subject, recall the following problem. One would often be given a function $f$ and told that its range is the interval $[0, 1)$. When asked "What is the maximum of this function?", the student would often be tempted to say "The maximum is 1", despite the fact that the correct answer is "There is no maximum." The reason there is no maximum is that the function never actually attains the value 1, despite getting arbitrarily close to it (see Figure 22).
Here we feel cheated! Morally, the number 1 does everything that we want a maximum to do, we just get bogged down in the pedantry of whether or not 1 actually belongs to the set. However, the precision of mathematics means that pedantry is essential, so we have to introduce a new way of discussing what this "moral maximum" should be.
Figure 22: A function whose range is [0, 1). This function does not have a maximum, despite the
fact that it gets arbitrarily close to 1.
5.1.1
Definition 5.1
Let $S \subseteq \mathbb{R}$ be a set. We say that $M \in \mathbb{R}$ is an upper bound for $S$ if for every $x \in S$ we have $x \leq M$. In this case we say that $S$ is bounded from above. We say that an upper bound $M$ is the supremum or least upper bound of $S$ if whenever $M'$ is an upper bound, then $M \leq M'$. We will write $M = \sup S$ to indicate that $M$ is the supremum of $S$. If $M \in S$ then we say that $M$ is the maximum of $S$.
Just as the name suggests, the least upper bound of a set is the smallest of the upper bounds,
though the word supremum is more commonly used by mathematicians. Naturally, there is a dual
notion for lower bounds:
Definition 5.2
Let $S \subseteq \mathbb{R}$ be a set. We say that $m \in \mathbb{R}$ is a lower bound for $S$ if for every $x \in S$ we have $m \leq x$. In this case we say that $S$ is bounded from below. We say that a lower bound $m$ is the infimum or greatest lower bound of $S$ if whenever $m'$ is a lower bound, then $m \geq m'$. We will write $m = \inf S$ to indicate that $m$ is the infimum of $S$. If $m \in S$ then we say that $m$ is the minimum of $S$.
Hence we have that the infimum and supremum play the role of the moral minimum and
maximum discussed earlier, but are only bona-fide minima and maxima when the infimum and
supremum actually belong to the set.
Figure 23: The blue represents the set S, green the set of lower bounds, and red the set of upper
bounds for S. The number m is the largest of the green while M is the smallest of the red.
Example 5.3
1. The set $S = (-3, 1)$ is bounded both above and below. The infimum is $-3$ and supremum is 1. Since neither $-3$ nor 1 belong to $S$, the set $S$ does not have a minimum or a maximum.
2. Let $S = \left\{1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \ldots\right\}$. This set is bounded above and below. Its supremum is 1, and since 1 is in the set this is actually a maximum. On the other hand, its infimum is 0, which is not in the set.
3. The empty set is bounded above and below. Every real number is both an upper bound and a lower bound. Since there is no smallest/largest real number, this set does not have an infimum or supremum, and so certainly does not have a maximum or a minimum.
4. Every non-empty finite set has a maximum and a minimum.
5. Let $S = \{x \in \mathbb{Q} : x^2 > 2, x > 0\}$. This set is bounded below but not above. The infimum is $\sqrt{2}$, which is not in the set, and there is no supremum.
In this course we have glossed over the construction/definition of the real numbers, and asked you to
believe us that such a thing exists. One way of constructing
the real numbers
from the rational numbers is
via a method called Dedekind cuts. The set S = x Q : x2 2, x > 0 is such a cut. One then identifies S
5.1.2
An absolutely crucial property of the real numbers that we will take for granted is the following:
The Completeness Axiom: Every non-empty set of real numbers which is bounded above
has a supremum. Equivalently, every non-empty set of real numbers which is bounded below has
an infimum.
The necessity that the set be non-empty is exemplified by Example 5.3 (3), while the fact that
the set must be bounded is demonstrated by Example 5.3 (5). There will be many instances in the
sections which follow in which we would like to work with the supremum of a set without needing
to know its value explicitly. The completeness axiom gives us the ability to guarantee that the
supremum exists, allowing us to work with it.
Proposition 5.4
If $S \subseteq \mathbb{R}$ and $M = \sup S$ then for any $\epsilon > 0$ there exists an $x \in S$ such that $M - \epsilon < x \leq M$.
Proof. The proof of this fact is actually fairly obvious if we think about it, and I would encourage the student to give it a try before reading onwards.
Since $M$ is an upper bound for $S$ we have that $x \leq M$ for all $x \in S$, so this part is always true. Thus we need only show that there exists an $x \in S$ with $M - \epsilon < x$. For the geometric insight, we refer the reader back to Figure 23. The idea is that if there is no such $x$, then $M - \epsilon$ is also an upper bound and this is not possible.
More rigorously, for the sake of contradiction assume that no such $x$ exists. Consequently, we have that $x \leq M - \epsilon$ for all $x \in S$. This is precisely what it means for $M - \epsilon$ to be an upper bound for $S$. However, $M - \epsilon < M$ which contradicts the fact that $M$ is the least upper bound. Hence our original assumption must have been false, and we conclude that there must be an $x$ satisfying $M - \epsilon < x$.
Exercise: Determine the equivalent statement to Proposition 5.4 for the infimum of a set. Prove your statement.
The real reason we have introduced suprema and infima is that we wish to determine these properties on the image of a function restricted to a subset of the reals. To elaborate further, assume that $f$ is a function and $S \subseteq \mathbb{R}$ is some set. Recall that the image of $S$ under $f$ is the set
\[ f(S) = \{f(x) : x \in S\} = \{y : \exists x \in S, f(x) = y\}. \]
One can think of $f(S)$ as the range of $f$ when we restrict $f$ to the set $S$. We are interested in the supremum and infimum of these sets. This is commonly written in one of the two following ways:
\[ M = \sup f(S) = \sup_{x \in S} f(x). \]
Example 5.5
If $f(x) = \frac{x \sin(x)}{x + 1}$ and $S = (0, \infty)$ determine $\sup f(S)$ and $\inf f(S)$.
Solution. We know that $-1 \leq \sin(x) \leq 1$ for all $x \in \mathbb{R}$. Furthermore, since $x \in S$ we have that $x > 0$ so that $\frac{x}{x+1} > 0$, yielding
\[ -\frac{x}{x + 1} \leq \frac{x \sin(x)}{x + 1} \leq \frac{x}{x + 1}. \]
Now $x/(x + 1)$ can never be greater than 1 (check this) but we have
\[ \lim_{x \to \infty} \frac{x}{x + 1} = 1. \]
Hence $f(S) = (-1, 1)$ and so $\sup f(S) = 1$ while $\inf f(S) = -1$.
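To see the conclusion of Example 5.5 numerically, the sketch below samples $f$ on a grid in $(0, 2000]$. The grid spacing and the cut-off at 2000 are arbitrary choices made for illustration; sampling can only suggest the supremum and infimum, it cannot prove them.

```python
import math


def f(x: float) -> float:
    """The function from Example 5.5."""
    return x * math.sin(x) / (x + 1)


# Sample f at x = 0.01, 0.02, ..., 2000.00 (an arbitrary finite grid).
samples = [f(k * 0.01) for k in range(1, 200_001)]

largest = max(samples)
smallest = min(samples)
print(f"largest sampled value:  {largest:.6f}")
print(f"smallest sampled value: {smallest:.6f}")
```

Every sampled value lies strictly between -1 and 1, while the extremes creep towards those bounds as the sampling window grows, which matches $\sup f(S) = 1$ and $\inf f(S) = -1$ with neither value attained.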
[Figure: graph of $\frac{x \sin(x)}{x + 1}$.]
5.2
Sigma Notation
Sigma notation is used to make complicated sums much easier to write down. In particular, we use a summation index to iterate through elements of a list and then sum them together. Let us take a moment to dissect the elements of the notation itself. Consider the expression
\[ \sum_{i=n}^{m} r_i \tag{5.1} \]
which is read as "the sum from $i = n$ to $m$ of $r_i$." The element $i$ is known as the dummy or summation index, $n$ and $m$ are known as the summation bounds, and $r_i$ is the summand. In order to decipher this rather cryptic notation, we adhere to the following algorithm:
1. Set $i = n$ and write down $r_i$;
2. Add 1 to the index $i$ and add $r_i$ to the current sum;
3. If $i$ is equal to $m$ then stop, otherwise go to step 2 and repeat.
For those computer savvy students out there, this is nothing more than a for-loop. Interpreting (5.1) we thus have
\[ \sum_{i=n}^{m} r_i = r_n + r_{n+1} + r_{n+2} + \cdots + r_m. \]
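The for-loop remark can be made concrete. The following is a minimal sketch in Python; the helper name `sigma` and the convention of passing the summand as a function of the index are assumptions made here for illustration.

```python
from typing import Callable


def sigma(r: Callable[[int], float], n: int, m: int) -> float:
    """Return r(n) + r(n+1) + ... + r(m), mirroring the sigma notation (5.1)."""
    total = 0
    # range(n, m + 1) visits i = n, n+1, ..., m; it is empty when n > m.
    for i in range(n, m + 1):
        total += r(i)
    return total


# Example: 1^2 + 2^2 + 3^2
print(sigma(lambda i: i * i, 1, 3))
```

Unlike the three-step description above, this loop also handles the case $n = m$ (a single term) and returns 0 for the empty sum when $n > m$.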
Example 5.6
Set $r_1 = 5$, $r_2 = -8$, $r_3 = 4$. Compute $\sum_{i=1}^{3} r_i$.
Solution. Via our discussion above, we may write the summation out explicitly as
\[ \sum_{i=1}^{3} r_i = r_1 + r_2 + r_3 = 5 + (-8) + 4 = 1. \]
Now the $r_i$ could be a collection of unrelated numbers as in Example 5.6, but they could also be a function of the index variable as follows:
Example 5.7
Compute $\sum_{i=1}^{4} (2i + 1)$.
Solution. Following our algorithm, we start by setting $i = 1$ and then evaluating the summand. I will write out the steps in slightly more detail than usual to illustrate the process:
\[ \sum_{i=1}^{4} (2i + 1) = (2i + 1)\big|_{i=1} + (2i + 1)\big|_{i=2} + (2i + 1)\big|_{i=3} + (2i + 1)\big|_{i=4} \]
\[ = (2 \cdot 1 + 1) + (2 \cdot 2 + 1) + (2 \cdot 3 + 1) + (2 \cdot 4 + 1) = 3 + 5 + 7 + 9 = 24. \]
Sometimes we get lucky and can find closed form expressions for certain abstract summations.
The following two examples are incredibly useful identities that are used throughout many fields of
mathematics.
Example 5.8
For $n \geq 1$ compute $\sum_{j=1}^{n} 1$.
Solution. Notice that our summand (the number 1) does not depend on our index variable $j$. Nonetheless, we must proceed with our algorithm to find that
\[ \sum_{j=1}^{n} 1 = \underbrace{1 + 1 + \cdots + 1}_{n \text{ times}} = n. \]
Since the summand did not depend on the index, we just repeated the summand $n$ times with absolutely no changes between the iterations. Of course, we can be hyper-rigorous and prove this by induction if we so choose.
Example 5.9
For $n \geq 1$ compute $\sum_{j=1}^{n} j$.
Solution. This is another common example that we saw when we were doing induction, and indeed we know that the solution is
\[ \sum_{j=1}^{n} j = \frac{n(n + 1)}{2}. \]
Notice now that we can combine Examples 5.8 and 5.9 in order to derive a general closed form solution for the sum considered in Example 5.7. Indeed,
\[ \sum_{i=1}^{n} (2i + 1) = 2 \sum_{i=1}^{n} i + \sum_{i=1}^{n} 1 = 2 \cdot \frac{n(n + 1)}{2} + n = n(n + 2). \]
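The closed form $n(n+2)$ can be checked against a term-by-term sum. This is only a spot check over finitely many $n$, not a substitute for the derivation; the function names `direct_sum` and `closed_form` are illustrative.

```python
def direct_sum(n: int) -> int:
    """Add up (2i + 1) for i = 1, ..., n one term at a time."""
    return sum(2 * i + 1 for i in range(1, n + 1))


def closed_form(n: int) -> int:
    """The closed form n(n + 2) obtained from the two known sums."""
    return n * (n + 2)


# Compare the two for the first few values of n; n = 4 recovers 24.
for n in range(1, 6):
    print(n, direct_sum(n), closed_form(n))
```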
\[ \sum_{i=1}^{n} i^p, \]
but these expressions become successively more difficult to compute from a naive standpoint
as done above. For example, can you think of a clever way to even compute the case when
p = 2? Luckily, there is a standard way of deriving the closed form for any p using the
Bernoulli polynomials, which are popular objects in the study of number theory but are
tricky to define.
5.3
As mentioned in the beginning of this section, the goal of integrals is to sum up a continuous
collection of objects. One great motivational example for this is computing the area under a curve,
which can be thought of as summing the values f (x) as x ranges in some interval. The way this
is done is by first breaking our curve into rectangles whose area is easily computed, and effectively
analyze what happens as we allow the width of the rectangles to get arbitrarily small.
5.3.1
Partitions
Definition 5.11
Consider a bounded function $f$ and an interval $[a, b]$. A partition of $[a, b]$ is a finite set of points, $P = \{x_1, \ldots, x_n\}$ such that
\[ a = x_1 < x_2 < \cdots < x_{n-1} < x_n = b. \]
We will denote by $|P| = n$ the number of points in the partition.
As defined, a partition is nothing more than a finite collection of ordered points. However, that is not how you should think about a partition. Instead, you should have in mind that a partition $P$ defines the endpoints of a collection of subintervals. So for example, consider the partition $P = \left\{1, \frac{3}{2}, 2, 3\right\}$ of the interval $[1, 3]$. Again, the idea is that each element of $P$ represents a point at which to cut the interval $[1, 3]$. After doing this, we get the result:
\[ [1, 3/2], \quad [3/2, 2], \quad [2, 3]. \]
If $P$ and $Q$ are partitions such that $P \subseteq Q$, we say that $Q$ is a refinement of $P$. In our example above, if $Q = \{1, 3/2, 2, 5/2, 3\}$ then $P \subseteq Q$. The reason why this is said to be a refinement is because the addition of the point $\frac{5}{2}$ means that we split the interval $[2, 3]$ into two more intervals $[2, 5/2]$, $[5/2, 3]$. In a very real sense, the partition has been refined.
5.3.2
Now that we have partitions in hand, we want to start computing areas under curves. We are going
to use partitions to break our intervals into smaller intervals, and use rectangles to make coarse
approximations to our curve.
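Definition 5.12
Let $f$ be a bounded function on $[a, b]$ and let $P$ be a partition of $[a, b]$. The upper and lower Riemann sums of $f$ with respect to $P$ are
\[ U_f(P) = \sum_{i=1}^{|P|-1} \left[ \sup_{x \in [x_i, x_{i+1}]} f(x) \right] (x_{i+1} - x_i), \qquad L_f(P) = \sum_{i=1}^{|P|-1} \left[ \inf_{x \in [x_i, x_{i+1}]} f(x) \right] (x_{i+1} - x_i). \]
Note that since $f$ is bounded on $[a, b]$, it is certainly bounded on subintervals of $[a, b]$, guaranteeing that the supremum and infimum actually exist.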
Figure 24: The upper (light blue) and lower (dark blue) Riemann sums for the function $f(x) = 1 - x^2$ on the interval $[0, 1]$.
Example 5.13
Solution. Conveniently, our function $f(x) = 1 - x^2$ is decreasing on $[0, 1]$ and is actually continuous. This means that not only do the supremum and infimum exist, but they are in fact maxima and minima. Furthermore, a decreasing function will attain its maximum at the left endpoint, while attaining its minimum at the right endpoint. This will make computing the upper and lower Riemann sums much simpler.
We have three subintervals: $[0, 1/4]$, $[1/4, 1/2]$, and $[1/2, 1]$. To facilitate our computations, I have made the following table:

Subinterval     x_{i+1} - x_i    sup f(x)    inf f(x)
[0, 1/4]             1/4             1         15/16
[1/4, 1/2]           1/4           15/16        3/4
[1/2, 1]             1/2            3/4          0
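We can use this information to compute our upper and lower sums as follows:
\[ U_f(P) = \sum_{i=1}^{|P|-1} \left[ \sup_{x \in [x_i, x_{i+1}]} f(x) \right] (x_{i+1} - x_i) = 1 \cdot \frac{1}{4} + \frac{15}{16} \cdot \frac{1}{4} + \frac{3}{4} \cdot \frac{1}{2} = \frac{55}{64}, \]
\[ L_f(P) = \sum_{i=1}^{|P|-1} \left[ \inf_{x \in [x_i, x_{i+1}]} f(x) \right] (x_{i+1} - x_i) = \frac{15}{16} \cdot \frac{1}{4} + \frac{3}{4} \cdot \frac{1}{4} + 0 \cdot \frac{1}{2} = \frac{27}{64}. \]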
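The arithmetic in Example 5.13 can be reproduced with exact fractions. The sketch below relies on the same shortcut as the solution: because $f$ is decreasing, the supremum on each subinterval is the value at the left endpoint and the infimum is the value at the right endpoint. The helper name `sums_for_decreasing` is an illustrative choice, and the function would give wrong answers for a function that is not decreasing.

```python
from fractions import Fraction


def f(x: Fraction) -> Fraction:
    """The function from Example 5.13."""
    return 1 - x * x


def sums_for_decreasing(func, partition):
    """Upper and lower Riemann sums, valid only when func is decreasing."""
    upper = Fraction(0)
    lower = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        width = right - left
        upper += func(left) * width   # sup is attained at the left endpoint
        lower += func(right) * width  # inf is attained at the right endpoint
    return upper, lower


P = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]
upper, lower = sums_for_decreasing(f, P)
print(upper, lower)  # 55/64 27/64
```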
Lemma 5.14
1. Refining a partition will cause the lower sum to get larger and the upper sum to get smaller: if $P \subseteq Q$ then $L_f(P) \leq L_f(Q)$ and $U_f(Q) \leq U_f(P)$.
2. The lower sum is never larger than the upper sum.
The proof of this is a simple exercise that we leave for the reader.
What we are interested in is whether we can make the upper and lower sums of a function get really close to each other by choosing appropriate partitions. This seems like a limit question! Can we make the limit of $L_f(P)$ approach that of $U_f(P)$ as we change partitions? The problem is that the notion of defining limits in this case turns out to be extraordinarily tricky. As a result, we will put the idea of limits temporarily out of our minds and proceed as follows:
Definition 5.15
Let $f$ be a bounded function on the interval $[a, b]$. We define the lower integral $\underline{I}_a^b(f)$ and upper integral $\overline{I}_a^b(f)$ as
\[ \underline{I}_a^b(f) = \sup_P L_f(P), \qquad \overline{I}_a^b(f) = \inf_P U_f(P). \]
We say that $f$ is integrable on $[a, b]$ if the lower and upper integrals agree, in which case we write
\[ \int_a^b f = \underline{I}_a^b(f) = \overline{I}_a^b(f). \]
Note that by Lemma 5.14 we always have that $\underline{I}_a^b(f) \leq \overline{I}_a^b(f)$, so if the upper and lower integrals are equal, their value is unique. Furthermore, when it is important to emphasize the particular variable, we might sometimes write
\[ \int_a^b f = \int_a^b f(x)\,dx. \]
Of course, any other choice of variable other than $x$ would also have served.
Theorem 5.16
4. Inherited Integrability: If $f$ is integrable on $[a, b]$ then $f$ is integrable on any subinterval $[c, d] \subseteq [a, b]$.
5. Monotonicity of Integral: If $f, g$ are integrable on $[a, b]$ and $f(x) \leq g(x)$ for all $x \in [a, b]$ then
\[ \int_a^b f(x)\,dx \leq \int_a^b g(x)\,dx. \]
One approach often taken in defining the integral is to take a limit as the length of partitions goes to zero, or alternatively as the number of partition points goes to infinity. We have avoided this technique because properly understanding what such a limit does turns out to be exceptionally complicated. There is a special case where one might only consider partitions of uniform length, which might be something like the following.[21]
Assume a priori that $f$ is integrable on the interval $[a, b]$. Since $f$ is integrable, we know that the infimum of its upper sums is precisely the value $\int_a^b f$. For each $n \in \mathbb{N}$, define a partition $P_n$ with $2^n + 1$ elements (hence containing $2^n$ subintervals)[22] by
\[ P_n = \left\{ a, a + \frac{b - a}{2^n}, a + 2\frac{b - a}{2^n}, \ldots, a + (2^n - 1)\frac{b - a}{2^n}, b \right\} = \left\{ a + i\frac{b - a}{2^n} : i = 0, \ldots, 2^n \right\}. \]
Writing $M_{i,n}$ for the supremum of $f$ on the subinterval $\left[a + i\frac{b-a}{2^n}, a + (i+1)\frac{b-a}{2^n}\right]$, the upper sum is
\[ U_f(P_n) = \sum_{i=0}^{2^n - 1} M_{i,n} \frac{b - a}{2^n} = \frac{b - a}{2^n} \sum_{i=0}^{2^n - 1} M_{i,n}. \]
Recall from Lemma 5.14 that since $P_{n-1} \subseteq P_n$ then $U_f(P_{n-1}) \geq U_f(P_n)$. Since the infimum of $U_f(P)$ is the integral and the $U_f(P_n)$ are decreasing, then[23]
\[ \int_a^b f = \lim_{n \to \infty} \frac{b - a}{2^n} \sum_{i=0}^{2^n - 1} M_{i,n}. \]
[21] If you are interested in the buzzwords, the limit can be computed because the set of partitions on an interval $[a, b]$ form a directed set. The fact that uniform partitions can be used follows from the fact that uniform partitions are cofinal amongst all partitions. Alternatively, if one assigns heights to each partition (corresponding to the length of a maximal subinterval), one can always choose a uniform representative from the collection, which turns out to be the coarsest such partition amongst those partitions of the same height.
[22] The reason why we broke $[a, b]$ into $2^n$ subintervals is so that $P_{n-1} \subseteq P_n$. Notice that if we had only broken $[a, b]$ into $n$ subintervals, there would be no relationship between partitions for subsequent $n$.
[23] We are cheating a bit here. We need to argue that the $U_f(P_n)$ do not get "stuck" before reaching the infimum, but it's not too hard to convince ourselves that this is true.
Example 5.18
Let $f(x) = x$ on $[a, b]$. Compute $\int_a^b f$.
Proof. Since $f$ is continuous, we can use a Riemann sum to compute the definite integral. Furthermore, since $f(x) = x$ is monotonically increasing, its maximum on an interval will always occur at the right endpoint.
\[ \int_a^b x\,dx = \lim_{n \to \infty} \left[ ab - a^2 + \frac{(b - a)^2}{2} \cdot \frac{2^n + 1}{2^n} \right] = ab - a^2 + \frac{1}{2}(b - a)^2 = ab - a^2 + \frac{1}{2}(b^2 - 2ab + a^2) = \frac{1}{2}(b^2 - a^2). \]
5.4
Anti-Derivatives
Anti-differentiation is the reverse process of differentiation; that is, if I give you a function $f(x)$ then our goal is to find a function $F(x)$ such that $F'(x) = f(x)$. To this end, we have the formal definition:
Definition 5.19
Given a function $f$ on $[a, b]$, we say that a function $F$ is an anti-derivative of $f$ if $F'(x) = f(x)$ for all $x \in [a, b]$.
A decent part of this course will be dedicated to determining anti-derivatives of functions (though this is not yet obvious). In fact, just using the properties of differentiation, we can immediately infer a few results about anti-derivatives. Since the derivative is linear, we have
\[ \frac{d}{dx} cf(x) = cf'(x), \qquad \frac{d}{dx} [f(x) + g(x)] = f'(x) + g'(x) \]
and this tells us that anti-differentiation will also be linear. To see this, let $F(x)$ and $G(x)$ be anti-derivatives of $f(x)$ and $g(x)$, so that
\[ \frac{d}{dx} [cF(x)] = cF'(x) = cf(x) = c\frac{d}{dx} F(x), \]
\[ \frac{d}{dx} [F(x) + G(x)] = F'(x) + G'(x) = f(x) + g(x) = \frac{d}{dx} F(x) + \frac{d}{dx} G(x). \]
Example 5.20
Solution. We know that polynomials differentiate to give polynomials, so let's assume that $F(x) = x^n$ for some $n$. For $F(x)$ to be an anti-derivative of $f(x)$ it must be that $F'(x) = nx^{n-1} = f(x) = 5x^4$. It is not too hard to see that $n = 5$ works, so that an anti-derivative of $f(x) = 5x^4$ is $F(x) = x^5$.
The previous example was exceptionally easy to solve because of the coefficient 5 in the monomial term. If that term had not been there, then we would just artificially add it. For example, the anti-derivative of $3x^4$ may be computed by realizing that
\[ 3x^4 = 3 \cdot \frac{5}{5} x^4 = \frac{3}{5} \cdot 5x^4. \]
Since scalar multiples pass through derivatives, we hypothesize that the anti-derivative of $3x^4$ is $\frac{3}{5} x^5$ and a quick computation confirms this.
Note that the anti-derivative of a function is not unique, since we may add any constant to a function to find a new anti-derivative. For example, assume that $F(x)$ is an anti-derivative for $f(x)$ so that $F'(x) = f(x)$. Define a new function $F_c(x) = F(x) + c$ for any constant $c \in \mathbb{R}$. We then have that
\[ \frac{d}{dx} F_c(x) = \frac{d}{dx} [F(x) + c] = F'(x) = f(x), \]
so that $F_c(x)$ is also an anti-derivative. This implies that there is an entire real line's worth of functions which are anti-derivatives of a given function. More concretely, Example 5.20 shows that $x^5$ is an anti-derivative of $5x^4$, but a quick computation easily shows that $x^5 + c$ also differentiates to $5x^4$ for any constant $c$.
Of course, our knowledge of the Mean Value Theorem immediately tells us the following:
Corollary 5.21
\[ f''(x) = \sqrt{x} + \sin(x) + e^x. \]
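Solution. Notice that the second derivative is given, so we will have to compute the anti-derivative twice. Here in particular it is essential to recall that anti-derivatives are only defined up to additive constants. According to our table above, we have the following derivative - anti-derivative pairs:
\[ \sqrt{x} = \frac{d}{dx} \frac{2}{3} x^{3/2}, \qquad \sin(x) = \frac{d}{dx} (-\cos(x)), \qquad e^x = \frac{d}{dx} e^x, \]
so that $f'(x) = \frac{2}{3} x^{3/2} - \cos(x) + e^x + c$ for some constant $c$. Anti-differentiating once more, we use the pairs
\[ x^{3/2} = \frac{d}{dx} \frac{2}{5} x^{5/2}, \qquad \cos(x) = \frac{d}{dx} \sin(x), \qquad e^x = \frac{d}{dx} e^x, \qquad c = \frac{d}{dx} (cx), \]
so that $f(x)$ is
\[ f(x) = \frac{4}{15} x^{5/2} - \sin(x) + e^x + cx + d. \]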
If additional criteria are supplied, such as the value of f (x) (or its derivatives) at particular
points, then a truly unique solution may be identified.
Example 5.23
Using your solution to Example 5.22, compute the unique anti-derivative which satisfies $f(0) = 10$ and $f'(0) = 0$.
Solution. Our above example showed that $f'(x) = \frac{2}{3} x^{3/2} - \cos(x) + e^x + c$. By substituting $x = 0$ into this we get
\[ 0 = f'(0) = \frac{2}{3} 0^{3/2} - \cos(0) + e^0 + c = c \]
so that $c = 0$. Thus $f(x) = \frac{4}{15} x^{5/2} - \sin(x) + e^x + d$, and substituting $x = 0$ gives
\[ 10 = f(0) = \frac{4}{15} 0^{5/2} - \sin(0) + e^0 + d = 1 + d, \]
so that $d = 9$ and
\[ f(x) = \frac{4}{15} x^{5/2} - \sin(x) + e^x + 9. \]
Notice that Example 5.23 required two conditions to determine the two constants. In general, if one is given the $n$th derivative of a function, one needs to specify $n$ conditions to uniquely determine the function.
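The answer to Example 5.23 can be sanity-checked numerically. The sketch below assumes the given second derivative is $\sqrt{x} + \sin(x) + e^x$ (which is what differentiating the stated solution twice yields) and approximates $f''$ with a central second difference; the step size `h` and the test point are arbitrary illustrative choices.

```python
import math


def f(x: float) -> float:
    """Candidate solution from Example 5.23."""
    return 4 / 15 * x**2.5 - math.sin(x) + math.exp(x) + 9


def f_prime(x: float) -> float:
    """First derivative of the candidate, with c = 0."""
    return 2 / 3 * x**1.5 - math.cos(x) + math.exp(x)


def target_second_derivative(x: float) -> float:
    """Assumed prescribed second derivative: sqrt(x) + sin(x) + e^x."""
    return math.sqrt(x) + math.sin(x) + math.exp(x)


def second_difference(func, x: float, h: float = 1e-4) -> float:
    """Central finite-difference approximation of func''(x)."""
    return (func(x + h) - 2 * func(x) + func(x - h)) / h**2


print(f(0.0), f_prime(0.0))  # the two initial conditions
print(second_difference(f, 1.0), target_second_derivative(1.0))
```

The finite-difference value only approximates $f''(1)$, so agreement to a few decimal places is all one should expect.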
5.5
In this section, we will make the connection between the theory of integration and the theory of differentiation, by means of the Fundamental Theorem of Calculus. Let $f$ be an integrable function on $[a, b]$. By Theorem 5.16 we know that for any subinterval $[c, d] \subseteq [a, b]$, $f$ is also integrable on $[c, d]$. In particular, let's fix the left endpoint at $a$. Now for each $x \in [a, b]$, we have an integrable function on $[a, x]$ and hence the definite integral exists and produces a number. Thus we have a function
\[ F(x) = \int_a^x f(s)\,ds \]
which assigns to each point $x$ the value of the definite integral on $[a, x]$. Analogous to differentiation, wherein we had a function $f$ on $[a, b]$ and created a function $f'$ on $[a, b]$ with interesting properties, we now have the function $F$ on $[a, b]$, and we are interested in its properties.
Theorem 5.24: Fundamental Theorem of Calculus
1. If $f$ is integrable on $[a, b]$ then $F(x) = \int_a^x f(s)\,ds$ is continuous on $[a, b]$. Moreover, $F$ is differentiable at any point where $f$ is continuous, and at such points $F'(x) = f(x)$.
2. If $f$ is integrable on $[a, b]$, and $F$ is a continuous anti-derivative of $f$ which is differentiable at all but finitely many points, then
\[ \int_a^b f(s)\,ds = F(b) - F(a). \]
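Proof.
1. Let $x, y \in [a, b]$ with $x < y$. By additivity of the integral over subintervals,
\[ \int_a^x f(s)\,ds + \int_x^y f(s)\,ds = \int_a^y f(s)\,ds = F(y) \]
which we can re-arrange to get $F(y) - F(x) = \int_x^y f(s)\,ds$. Since $f$ is necessarily bounded, there exists $M > 0$ such that $|f(s)| \leq M$ for all $s \in [a, b]$. Applying Theorem 5.16 we see that
\[ |F(y) - F(x)| = \left| \int_x^y f(s)\,ds \right| \leq \int_x^y |f(s)|\,ds \leq \int_x^y M\,ds = M|y - x|. \]
Hence, given $\epsilon > 0$, we have $|F(y) - F(x)| < \epsilon$ whenever $|y - x| < \epsilon / M$, so $F$ is continuous.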
We will proceed by using the $\epsilon$-$\delta$ definition of the limit. Let $\epsilon > 0$ be given. Since $f$ is continuous at $x$, there exists a $\delta > 0$ such that for all $|x - y| < \delta$ we have $|f(x) - f(y)| < \epsilon$. If $|h| < \delta$ then
\[ \frac{F(x + h) - F(x)}{h} - f(x) = \frac{1}{h} \int_x^{x+h} f(s)\,ds - f(x) = \frac{1}{h} \int_x^{x+h} [f(s) - f(x)]\,ds. \]
In the last equality, we have used the fact that we are integrating with respect to $s$, so $f(x)$ is a constant, and $\int_x^{x+h} f(x)\,ds = f(x)h$. Applying Theorem 5.16 we have
\[ \left| \frac{F(x + h) - F(x)}{h} - f(x) \right| = \frac{1}{|h|} \left| \int_x^{x+h} [f(s) - f(x)]\,ds \right| \leq \frac{1}{|h|} \left| \int_x^{x+h} |f(s) - f(x)|\,ds \right|. \]
Now since $s \in [x, x + h]$ (or $[x + h, x]$ if $h < 0$) and $|h| < \delta$, then by continuity we know $|f(s) - f(x)| < \epsilon$, and so
\[ \left| \frac{F(x + h) - F(x)}{h} - f(x) \right| \leq \frac{1}{|h|} \left| \int_x^{x+h} |f(s) - f(x)|\,ds \right| \leq \frac{\epsilon}{|h|} \left| \int_x^{x+h} ds \right| = \epsilon. \]
We thus conclude that
\[ \lim_{h \to 0} \left| \frac{F(x + h) - F(x)}{h} - f(x) \right| = 0. \]
By the Squeeze theorem, this limit is zero if and only if the object within the absolute values tends to zero, and so differentiability follows.
2. Let $\{y_1, \ldots, y_k\}$ be the points where $F$ fails to be differentiable. Let $P'$ be an arbitrary partition, and consider the partition
\[ P = P' \cup \{y_1, \ldots, y_k\} = \{x_1, \ldots, x_n\}. \]
Notice that $F$ is continuous on each subinterval $[x_i, x_{i+1}]$ and differentiable on $(x_i, x_{i+1})$ and so by the Mean Value Theorem, there exists $\xi_i \in [x_i, x_{i+1}]$ such that
\[ F(x_{i+1}) - F(x_i) = F'(\xi_i)(x_{i+1} - x_i) = f(\xi_i)(x_{i+1} - x_i) \]
where in the last equality, we have used the fact that $F'(\xi_i) = f(\xi_i)$ because $F$ is an anti-derivative of $f$. Now
\[ F(b) - F(a) = [F(x_n) - F(x_{n-1})] + [F(x_{n-1}) - F(x_{n-2})] + \cdots + [F(x_3) - F(x_2)] + [F(x_2) - F(x_1)] = \sum_{i=1}^{n-1} f(\xi_i)(x_{i+1} - x_i). \]
Since
\[ \inf_{x \in [x_i, x_{i+1}]} f(x) \leq f(\xi_i) \leq \sup_{x \in [x_i, x_{i+1}]} f(x), \]
summing over $i$ gives
\[ L_f(P) \leq F(b) - F(a) \leq U_f(P). \]
This inequality is true regardless of the partition, and since the function is integrable, the resulting number $F(b) - F(a)$ must be the value of the integral, as required.
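The sandwich $L_f(P) \leq F(b) - F(a) \leq U_f(P)$ from the proof can be observed on a concrete case. The sketch below uses $f(x) = x^2$ on $[0, 2]$ with anti-derivative $F(x) = x^3/3$ and uniform partitions; these choices, and the helper name `sandwich`, are assumptions made purely for illustration. Because this $f$ is increasing on $[0, 2]$, the infimum and supremum on each subinterval are the left and right endpoint values.

```python
from fractions import Fraction


def f(x: Fraction) -> Fraction:
    return x * x


def F(x: Fraction) -> Fraction:
    """An anti-derivative of f."""
    return x**3 / 3


def sandwich(a: Fraction, b: Fraction, n: int):
    """Return (lower sum, F(b) - F(a), upper sum) for n equal subintervals.

    Valid only because f is increasing on [a, b].
    """
    width = (b - a) / n
    points = [a + i * width for i in range(n + 1)]
    lower = sum(f(left) * width for left in points[:-1])
    upper = sum(f(right) * width for right in points[1:])
    return lower, F(b) - F(a), upper


for n in (1, 4, 64):
    lower, exact, upper = sandwich(Fraction(0), Fraction(2), n)
    print(n, float(lower), float(exact), float(upper))
```

As the partition is refined the two sums close in on $8/3$ from either side, which is the behaviour the proof exploits.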
Remark: Effectively, the fundamental theorem of calculus indicates that differentiation and
integration are inverses of one another. This is not exactly true, as the following example demonstrates:
Example 5.25
Hence the difference between these two terms comes out to f (0); that is, they differ up to a
constant.
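Example 5.26
Compute $\int_{-\pi}^{\pi} \sin(t)\,dt$, and $\int_{-1}^{0} \frac{1}{1 + x^2}\,dx$.
Similarly, we know that $G(x) = \arctan(x)$ is an anti-derivative of $\frac{1}{1 + x^2}$, and hence
\[ \int_{-1}^{0} \frac{1}{1 + x^2}\,dx = \arctan(0) - \arctan(-1) = \frac{\pi}{4}. \]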
5.6
The whole motivation for introducing integrals was to compute the areas under curves. Of course, Example 5.26 demonstrates a problem: It is possible for our areas to be 0, or even negative! The reason for this is that the integral as defined computes the signed area of the function, and hence assigns negative area to any area that occurs beneath the x-axis. The way to fix this is to use absolute values:
Example 5.27
Determine the unsigned (that is, total) area under the graph of $\sin(x)$ on the interval $[-\pi, \pi]$.
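Solution. Example 5.26 told us that the signed area was exactly zero despite the fact that $\sin(x)$ is not identically the zero function. To compute the total area, we shall compute the integral $\int_{-\pi}^{\pi} |\sin(x)|\,dx$. We deal with the absolute value in precisely the same manner that we always deal with absolute values: we break it into cases:
\[ |\sin(x)| = \begin{cases} \sin(x) & 0 \le x \le \pi \\ -\sin(x) & -\pi \le x < 0 \end{cases}. \]
By additivity of domain, we thus get
\[
\begin{aligned}
\int_{-\pi}^{\pi} |\sin(x)|\,dx &= \int_{-\pi}^{0} |\sin(x)|\,dx + \int_{0}^{\pi} |\sin(x)|\,dx \\
&= -\int_{-\pi}^{0} \sin(x)\,dx + \int_{0}^{\pi} \sin(x)\,dx \\
&= [\cos(x)]_{x=-\pi}^{0} + [-\cos(x)]_{x=0}^{\pi} \\
&= 2 + 2 = 4.
\end{aligned}
\]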
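The difference between signed and total area can be checked numerically. The following is only a sketch: `midpoint_rule` is a hypothetical helper written for this illustration, using a plain midpoint Riemann sum rather than anything from the notes.

```python
import math

def midpoint_rule(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Signed area: the parts below and above the x-axis cancel.
signed_area = midpoint_rule(math.sin, -math.pi, math.pi)
# Total area: integrate |sin(x)| instead.
total_area = midpoint_rule(lambda x: abs(math.sin(x)), -math.pi, math.pi)
print(signed_area, total_area)
```

The first number should be essentially zero and the second close to 4, in agreement with the case-by-case computation.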
We can also use the integral to find the area between two curves. Given two curves $f, g$ on $[a, b]$, the area between $f$ and $g$ can be computed as
\[ \int_a^b [f(x) - g(x)]\,dx. \]
We note though that this is still a signed area. In particular, anywhere where $f > g$ will be assigned a positive area, while any area where $f < g$ will be given a negative area. Of course, the total area can be computed by $\int_a^b |f(x) - g(x)|\,dx$.
Example 5.28
Determine the area enclosed by the curves $f(x) = \sqrt{x}$ and $g(x) = x^2$.
Solution. Notice that we were not explicitly given an interval over which to integrate. The reason is that if one sketches the graphs of $f$ and $g$, there is only one area that can be said to be enclosed by the two functions. Consequently, we need to determine where the appropriate intersections occur. This can be done by solving the equation $f(x) = g(x)$.
Setting $x^2 = \sqrt{x}$, one can easily solve to find that the intersections occur at $x = 0$ and $x = 1$. Furthermore, on $[0, 1]$ we have that $\sqrt{x} \ge x^2$, and hence our area is given as
\[ \int_0^1 \left( \sqrt{x} - x^2 \right) dx = \left[ \frac{2}{3}x^{3/2} - \frac{1}{3}x^3 \right]_0^1 = \frac{2}{3} - \frac{1}{3} = \frac{1}{3}. \]
5.7
Indefinite Integrals
We have seen that integration and differentiation are spiritual inverses of one another, up to an additive constant. In particular, for an integrable $f$ on $[a, b]$ we saw that $F(x) = \int_a^x f(t)\,dt$ was an
f (x)dx.
Anti-derivatives are unique up to constants, so there exists some constant C such that F = G + C,
with F playing a particularly nice representative. However, the constant seems rather artificial:
we know that the anti-derivative of $x^3$ is $\frac{1}{4}x^4 + C$, but the meat-and-bones lies with the $\frac{1}{4}x^4$ term,
not the constant. Hence our goal for this section is to represent the entire class of anti-derivatives,
something called the indefinite integral.
The indefinite integral does not concern itself with upper and lower bounds of integration: our
goal is to encompass an entire class of functions and imposing bounds forces us to look at particular
representatives. Consequently, we denote the indefinite integral as the usual integral, albeit with
the bounds omitted:
\[ \int f(x)\,dx. \]
Example 5.29
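Determine the following indefinite integrals:
1. $\int \frac{x^4 + 2x^2 + 1}{x^3}\,dx$,
2. $\int \sin(2x)\cos(2x)\,dx$,
3. $\int f(x) f'(x)\,dx$.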
Solution. In time, we will learn more systematic ways of determining these integrals, but for now
we will need to use the clever part of our brains to find appropriate classes of anti-derivatives.
1. Notice that we can re-write the integrand as
\[ \frac{x^4 + 2x^2 + 1}{x^3} = x + \frac{2}{x} + \frac{1}{x^3}. \]
We are well acquainted with the functions which yield these as derivatives, and we get
\[ \int \frac{x^4 + 2x^2 + 1}{x^3}\,dx = \int \left( x + \frac{2}{x} + \frac{1}{x^3} \right) dx = \frac{1}{2}x^2 + 2\log(x) - \frac{1}{2x^2} + C. \]
This yields our first curious case: integrating a rational function resulted in a logarithm.
2. The important step here is to realize that $\frac{1}{2}\sin(4x) = \sin(2x)\cos(2x)$, hence our integral becomes
\[ \int \sin(2x)\cos(2x)\,dx = \frac{1}{2}\int \sin(4x)\,dx = -\frac{1}{8}\cos(4x) + C. \]
3. This problem is a little more abstract: we need to find a function which differentiates to $f(x)f'(x)$. If we think hard, we see that $\frac{d}{dx}[f(x)]^2 = 2f(x)f'(x)$, so by dividing by 2 we will get the desired integrand. Applying the Fundamental Theorem of Calculus, we thus get
\[ \int f(x)f'(x)\,dx = \frac{1}{2}\int \frac{d}{dx} f(x)^2\,dx = \frac{1}{2}f(x)^2 + C. \tag{5.2} \]
Remark:
1. Part 3 can lead to rather terrible notation, not seen in mathematics so much as physics. Remember that physicists love to treat $\frac{dy}{dx}$ as a proper fraction, treating $dy$ and $dx$ as indivisible atoms which form the whole of the derivative. If one examines Equation (5.2) we see that treating $\frac{d}{dx}$ as a fraction could cause the $dx$ terms to cancel: consequently, some physicists might write
\[ \int \frac{d}{dx} g(x)\,dx = \int dg(x) = g(x). \]
There is a sense in which this is valid, and it leads to an incredible world of complex and
beautiful mathematics, but not one that is easily explained. For now, let us take this as a
sign of physicist abuse and disregard it as little more than such.
2. While we are on the topic of bad notation, another convention sometimes seen is to write $\int dx\, f(x)$. While I cannot think of anything which is technically wrong with this notation, I find it distasteful.
Indefinite integrals, in one form or another, are essential in solving differential equations. These are equations of the form
\[ \frac{dy}{dx} = f(y, x) \]
and are an exceptionally useful study in their own right. We will not invest much time on them in this course, but their importance is so great that they warrant many courses dedicated explicitly to their study. As a very simple example, consider an equation that one often sees in high-school physics. A particle thrown upwards from the surface of the Earth with initial height $y_0$ and velocity $v_0$ has its height $y(t)$ dictated as a function of time by the equation
\[ y = -\frac{1}{2}gt^2 + v_0 t + y_0, \]
where $g$ is the gravitational acceleration of the Earth. More generally, a particle given a constant acceleration $a$ will have position $y(t) = \frac{1}{2}at^2 + v_0 t + y_0$ (the case above is $a = -g$). How do we arrive at this equation? If we look back at Section 3.3.1, one can show that $\frac{d^2 y}{dt^2} = a$. Using anti-derivatives, we can compute that $\frac{dy}{dt} = at + C_1$ and $y(t) = \frac{1}{2}at^2 + C_1 t + C_2$. This is not quite what we were after, but we can solve for the constants $C_1, C_2$ by using our initial conditions. Knowing that at time 0 our height was $y_0$, we have
\[ y_0 = y(0) = 0 + 0 + C_2 \]
so that $C_2 = y_0$. Similarly, at time 0 our speed was $v_0$, giving
\[ v_0 = y'(0) = 0 + C_1, \]
so that $C_1 = v_0$. Combining these together yields $y(t) = \frac{1}{2}at^2 + v_0 t + y_0$ as required.
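As a rough check of this derivation, one can integrate $y'' = a$ twice numerically and compare with the closed form. This is a sketch only: the function names and the sample values of $a$, $v_0$, $y_0$ are assumptions chosen for illustration, not values from the notes.

```python
def height_closed_form(t, a, v0, y0):
    """y(t) = (1/2) a t^2 + v0 t + y0, obtained by anti-differentiating twice."""
    return 0.5 * a * t**2 + v0 * t + y0

def height_by_integration(t, a, v0, y0, steps=1000):
    """Integrate y'' = a step by step, starting from y(0) = y0 and y'(0) = v0."""
    dt = t / steps
    v, y = v0, y0
    for _ in range(steps):
        v_new = v + a * dt            # first integration: y' = a t + C1
        y += 0.5 * (v + v_new) * dt   # second integration (trapezoid rule on y')
        v = v_new
    return y

a, v0, y0 = -9.8, 12.0, 1.5  # illustrative values only
print(height_by_integration(2.0, a, v0, y0), height_closed_form(2.0, a, v0, y0))
```

The initial conditions enter the loop exactly where the constants $C_1 = v_0$ and $C_2 = y_0$ enter the formula.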
Notes on Notation: At this point, it is probably worth pointing out that we have been dramatically overloading our use of the integral sign. We have thus far seen three different objects: if $f$ is integrable on $[a, b]$ then
\[ \int_a^b f(s)\,ds, \qquad F(x) = \int_a^x f(s)\,ds, \qquad \int f(s)\,ds. \]
The first is simply a number which represents the signed area under the function f ; the second is
a function which assigns to each x the area under the function from a to x; the third is an infinite
family of functions, all representing anti-derivatives of f . They are all intimately related to be
certain, but each has a very different lifestyle. One must be careful not to confuse the relationships.
Integration Techniques
6.1
Integration by Substitution
Having seen that integration and differentiation are essentially inverses, we would like to develop
some techniques and rules to compute integrals. It should be unsurprising that those rules will arise
as the inverse operations of the rules obtained from differential calculus. We recall the chain rule
of differential calculus tells us that
\[ \frac{d}{dx} f(g(x)) = f'(g(x))\,g'(x), \]
hence by applying the fundamental theorem of calculus, we see that
\[ \int f'(g(x))\,g'(x)\,dx = f(g(x)) + C. \tag{6.1} \]
Unfortunately, the majority of times nature will conspire against us and not write our integrand so plainly as $f'(g(x))g'(x)$. Hence we develop some techniques to make life simpler. Our strategy is as follows:
1. Look to see if we can find the occurrence of a function and its derivative. In (6.1) above, we are looking for the function $g(x)$, since it occurs in the argument of $f'$ and its derivative appears as $g'(x)$.
2. Define a new variable $u = g(x)$, so that $\frac{du}{dx} = g'(x)$, which we write as $du = g'(x)\,dx$.
3. Substitute $u$ and $du$ into the integral, so that it becomes $\int f'(u)\,du$.
4. Integrate with respect to $u$ to get $f(u) + C$.
5. We now have our solution, but it is in terms of the variable $u$. This is not a problem since we know that $u = g(x)$, so we just make this substitution to get our final solution
\[ \int f'(g(x))\,g'(x)\,dx = f(g(x)) + C. \]
Let's try a simple example:
Example 6.1
Determine $\int \sin(2x)\cos(2x)\,dx$.
Solution. We can choose either $u = \sin(2x)$ or $u = \cos(2x)$. Indeed, let's see what happens if we choose either. If $u = \sin(2x)$ then $du = 2\cos(2x)\,dx$ and our integral becomes
\[ \int \sin(2x)\cos(2x)\,dx = \frac{1}{2}\int u\,du = \frac{1}{4}u^2 + C = \frac{1}{4}\sin^2(2x) + C. \]
On the other hand, if we choose $u = \cos(2x)$ then $du = -2\sin(2x)\,dx$ and we get
\[ \int \sin(2x)\cos(2x)\,dx = -\frac{1}{2}\int u\,du = -\frac{1}{4}\cos^2(2x) + C. \]
Of course, we know that $\sin^2(2x)$ and $\cos^2(2x)$ are very different functions, and here we see the importance of the constants: since $\sin^2(2x) = 1 - \cos^2(2x)$, the two answers differ only by the constant $\frac{1}{4}$.
To add to this bit of noise, notice that in Example 5.29 we calculated $\int \sin(2x)\cos(2x)\,dx = -\frac{1}{8}\cos(4x) + C$.
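A quick way to convince ourselves that the three answers really describe the same family of anti-derivatives is to tabulate their differences. The sketch below uses the made-up names `F1`, `F2`, `F3` for the three candidates (each taken with $C = 0$).

```python
import math

def F1(x):
    return 0.25 * math.sin(2 * x) ** 2    # from u = sin(2x)

def F2(x):
    return -0.25 * math.cos(2 * x) ** 2   # from u = cos(2x)

def F3(x):
    return -0.125 * math.cos(4 * x)       # from the double-angle identity

# If all three are anti-derivatives of the same function, their pairwise
# differences must not depend on x.
for x in (0.0, 0.3, 1.0, 2.5):
    print(x, F1(x) - F2(x), F1(x) - F3(x))
```

The two difference columns should stay fixed at $\frac{1}{4}$ and $\frac{1}{8}$ respectively, which is exactly the freedom absorbed by the constant $C$.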
Example 6.2
For $a, b \ne 0$, compute
\[ \int \frac{x^{n-1}}{\sqrt{a + bx^n}}\,dx. \]
Solution. Following the above program, our first step should be to identify a function and its derivative. The fact that there is an $x^n$ and an $x^{n-1}$ is a pretty good sign. Note that since constants do not affect the integration, we can make our lives even easier if we define $u = a + bx^n$ so that $du = bn\,x^{n-1}\,dx$. Unfortunately, there is no $bn\,x^{n-1}\,dx$ in the integrand, but there is an $x^{n-1}\,dx$. Since these are related only up to a constant, we can divide both sides to find that $x^{n-1}\,dx = \frac{1}{bn}\,du$. Adding our substitutions we then get
\[ \int \frac{x^{n-1}}{\sqrt{a + bx^n}}\,dx = \int \frac{1}{bn}\frac{du}{\sqrt{u}} = \frac{1}{bn}\int \frac{1}{\sqrt{u}}\,du. \]
This is now a very simple integral to calculate, and indeed we find that
\[ \frac{1}{bn}\int \frac{1}{\sqrt{u}}\,du = \frac{2}{bn}\sqrt{u} + C. \]
We need this to be in terms of $x$ rather than $u$, so we recall that $u = a + bx^n$ to finally find that
\[ \int \frac{x^{n-1}}{\sqrt{a + bx^n}}\,dx = \frac{2}{bn}\sqrt{a + bx^n} + C. \]
Substitution is not just handy for applying the chain rule. It also allows us to change variables.
Example 6.3
Compute $\int x\sqrt{x+1}\,dx$.
Solution. Notice that if we could somehow switch the $x$ and the $x + 1$, this integral would actually be really easy, since then $(x+1)\sqrt{x} = x^{3/2} + x^{1/2}$. Normally in mathematics, if we want to do such a thing, we just define a new variable $u = x + 1$ so that $x = (u - 1)$ and then a similar trick to the one above will work.
Since we are working with an integral though, we must be a bit more careful. We shall still define $u = x + 1$ with $x = (u - 1)$, but we must also track the differentials. Luckily, in this case $du = dx$ and there is nothing to do. We thus get
\[
\begin{aligned}
\int x\sqrt{x+1}\,dx &= \int (u - 1)\sqrt{u}\,du \\
&= \int \left( u^{3/2} - u^{1/2} \right) du \\
&= \frac{2}{5}u^{5/2} - \frac{2}{3}u^{3/2} + C \\
&= \frac{2}{5}(x+1)^{5/2} - \frac{2}{3}(x+1)^{3/2} + C.
\end{aligned}
\]
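Very quickly, let us remark how we must change definite integrals when we do substitution. Consider the following integral
\[ \int_0^1 \cos\left( \frac{\pi}{2}x \right) dx. \]
One can easily compute this as
\[ \int_0^1 \cos\left( \frac{\pi}{2}x \right) dx = \frac{2}{\pi}\sin\left( \frac{\pi}{2}x \right)\Big|_0^1 = \frac{2}{\pi}. \]
Alternatively though, one should recognize that integrating $\cos\left(\frac{\pi}{2}x\right)$ from 0 to 1 is the same (up to the factor $\frac{2}{\pi}$) as integrating $\cos(x)$ from 0 to $\frac{\pi}{2}$. How does this reflect in the integral if we make such a substitution? Let $u = \frac{\pi}{2}x$ so that $du = \frac{\pi}{2}\,dx$. If we were to naively carry through with the integration, without touching the bounds, we would get
\[ \int_0^1 \cos\left( \frac{\pi}{2}x \right) dx = \frac{2}{\pi}\int_0^1 \cos(u)\,du = \frac{2}{\pi}\sin(u)\Big|_0^1 = \frac{2}{\pi}\sin(1). \]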
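A numerical experiment makes the issue with the naive computation visible. This is a sketch under the assumption that a simple midpoint sum is accurate enough here; `midpoint_rule` is a hypothetical helper, not something defined in the notes.

```python
import math

def midpoint_rule(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# The original integral in the variable x.
original = midpoint_rule(lambda x: math.cos(math.pi * x / 2), 0.0, 1.0)

# With u = (pi/2) x, the bounds x = 0 and x = 1 become u = 0 and u = pi/2.
changed_bounds = (2 / math.pi) * midpoint_rule(math.cos, 0.0, math.pi / 2)

# Naive version: substitute in the integrand but keep the old bounds 0 and 1.
naive = (2 / math.pi) * midpoint_rule(math.cos, 0.0, 1.0)

print(original, changed_bounds, naive)
```

Only the version whose bounds were converted to the new variable should reproduce $\frac{2}{\pi}$; the naive one gives $\frac{2}{\pi}\sin(1)$ instead.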
6.2
Integration by Parts
Just as Integration by Substitution was the inverse of the chain rule, Integration by Parts is the analog of the product rule. Namely, we know that if $u(x)$ and $v(x)$ are functions, then
\[ \frac{d}{dx}[u(x)v(x)] = u'(x)v(x) + u(x)v'(x). \]
Integrating and applying the fundamental theorem of calculus, we then find that
\[
\begin{aligned}
u(x)v(x) &= \int \frac{d}{dx}(uv)\,dx \\
&= \int v\frac{du}{dx}\,dx + \int u\frac{dv}{dx}\,dx \\
&= \int v\,du + \int u\,dv,
\end{aligned}
\]
which we may re-arrange to find
\[ \int u\,dv = uv - \int v\,du. \tag{6.2} \]
In the event of the definite integral, the bounds of integration can be carried throughout; that is,
\[ \int_a^b u\,dv = uv\Big|_a^b - \int_a^b v\,du. \]
Our strategy should be as follows: Assume we are integrating a product $\int f(x)g(x)\,dx$. We want to choose a candidate for $dv$ and for $u$. Since we will need to integrate $dv$, it is often best to choose a function which is easy to integrate:
1. First look at the integrand and see if we can apply substitution. If so, do not worry about integration by parts.
2. Choose $dv$ and $u$ (I often choose $dv$ to be whichever function is easiest to integrate), say $dv = f(x)\,dx$ and $u = g(x)$.
3. Compute $v$ by integrating: $v = \int dv = \int f(x)\,dx$. Compute $du$ by differentiating $u$.
4. Substitute all appropriate variables into (6.2).
This is just a general idea of how you should proceed. To give some insight as to what is happening, consider Equation (6.2) by omitting the $uv$-term:
\[ \int u(x)v'(x)\,dx = -\int u'(x)v(x)\,dx. \]
This is the power of integration by parts: it effectively allows us to transfer the derivative from one function to another!$^{24}$
24
$\delta(x)$ is done using integration by parts; that is, $\int \delta'(x)f(x)\,dx = -\int \delta(x)f'(x)\,dx = -f'(0)$.
Example 6.4
Compute $\int x\sin(x)\,dx$.
Solution. Following our program, I personally find that $\sin(x)$ is easier to integrate than just $x$ so we set $dv = \sin(x)\,dx$. Furthermore, my note above suggests that we should set $u = x$ so everything works out. Computing $du$ and $v$ we find that
\[ \begin{array}{ll} u = x & dv = \sin(x)\,dx \\ du = dx & v = -\cos(x). \end{array} \]
Plugging these into (6.2) we find that
\[
\begin{aligned}
\int x\sin(x)\,dx &= -x\cos(x) + \int \cos(x)\,dx \\
&= -x\cos(x) + \sin(x) + C.
\end{aligned}
\]
The best thing about integration is that you can always check your answers by differentiating. Try
it!
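If differentiating by hand feels tedious, the same check can be approximated numerically. The following sketch uses a central difference quotient; the helper names are hypothetical, and the step size `h` is an assumption that works for smooth functions like these.

```python
import math

def central_difference(F, x, h=1e-5):
    """Approximate F'(x) with the symmetric difference quotient."""
    return (F(x + h) - F(x - h)) / (2 * h)

def antiderivative(x):
    return -x * math.cos(x) + math.sin(x)   # candidate answer with C = 0

def integrand(x):
    return x * math.sin(x)

# If the answer is right, the two columns should agree at every sample point.
for x in (0.5, 1.0, 2.0, 3.0):
    print(x, central_difference(antiderivative, x), integrand(x))
```

Agreement at a handful of points is not a proof, but a disagreement would immediately flag a sign error or a lost term.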
Alternatively, there are times when the integrand does not look like a product, but we may still apply integration by parts.
Example 6.5
Compute $\int \log(x)\,dx$.
Solution. Looking at the integrand, there do not immediately appear to be two functions, so how can we apply integration by parts? The solution is to realize that $\log(x) = \log(x) \cdot 1$, so that the constant function $1(x) = 1$ is actually our second function. I think that 1 is really easy to integrate, so let us set $dv = 1\,dx$ and $u = \log(x)$ to find that
\[ \begin{array}{ll} u = \log(x) & dv = 1\,dx \\ du = \frac{dx}{x} & v = x. \end{array} \]
Substituting these values into (6.2) we find
\[
\begin{aligned}
\int \log(x)\,dx &= x\log(x) - \int \frac{x}{x}\,dx \\
&= x\log x - x + C.
\end{aligned}
\]
I believe it is worth noting at this point that while we have never come across a function which could not be differentiated, there are lots of functions which cannot be analytically integrated. Perhaps the most famous example is the function $e^{-x^2}$. It can be shown (using a field called Differential Galois Theory) that this function has no elementary anti-derivative.
Let's conclude with an example that will require all of our skills thus far.
Example 6.6
Compute $\int \cos(\log x)\,dx$.
Solution. The tricky thing to notice here is that it is not clear that either integration by parts or substitution will actually work. Nonetheless, a change of variables is the appropriate thing to do, as it will let us pull the function $\log(x)$ outside of the cosine as follows. Let $y = \log(x)$ so that $dy = \frac{dx}{x}$. Alternatively, we can write $dx = x\,dy$. This is silly though, since if we were to substitute this into our integral we would have variables in both $x$ and $y$ and that would be horrible. Instead, realize that since $y = \log x$ then $x = e^y$ and so $dx = e^y\,dy$. Putting this all together, we have
\[ \int \cos(\log x)\,dx = \int e^y\cos(y)\,dy. \]
Now we can proceed via integration by parts. It seems to me that $e^y$ is easiest to integrate, so let us set $dv = e^y\,dy$ and $u = \cos(y)$ yielding
\[ \begin{array}{ll} u = \cos(y) & dv = e^y\,dy \\ du = -\sin(y)\,dy & v = e^y. \end{array} \]
Plugging these into (6.2) we get
\[ \int e^y\cos(y)\,dy = e^y\cos(y) + \int e^y\sin(y)\,dy. \tag{6.3} \]
Looking at the integral $\int e^y\sin(y)\,dy$, we have failed to simplify the integral. However, let's just see what happens if we integrate by parts once more. Set $u = \sin(y)$ and $dv = e^y\,dy$ so that
\[ \begin{array}{ll} u = \sin(y) & dv = e^y\,dy \\ du = \cos(y)\,dy & v = e^y. \end{array} \]
Equation (6.2) then tells us that
\[ \int e^y\sin(y)\,dy = e^y\sin(y) - \int e^y\cos(y)\,dy. \tag{6.4} \]
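Substituting (6.4) into (6.3) gives
\[
\begin{aligned}
\int e^y\cos(y)\,dy &= e^y\left[ \cos(y) + \sin(y) \right] - \int e^y\cos(y)\,dy, \\
2\int e^y\cos(y)\,dy &= e^y\left[ \cos(y) + \sin(y) \right] && \text{adding } \int e^y\cos(y)\,dy \text{ to both sides,} \\
\int e^y\cos(y)\,dy &= \frac{1}{2}e^y\left[ \cos(y) + \sin(y) \right] && \text{dividing by 2.}
\end{aligned}
\]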
Now we have our solution in terms of $y$, but need to revert to a solution in terms of $x$. Since $y = \log(x)$ we conclude
\[ \int \cos(\log x)\,dx = \frac{1}{2}e^{\log x}\left[ \cos(\log x) + \sin(\log x) \right] + C = \frac{x}{2}\left[ \cos(\log x) + \sin(\log x) \right] + C. \]
6.3
Integrating trigonometric functions often comes down to a combination of using the Pythagorean identities, or angle sum identities. Notice very conveniently that the three Pythagorean identities are
\[ \sin^2(x) + \cos^2(x) = 1, \qquad \tan^2(x) + 1 = \sec^2(x), \qquad 1 + \cot^2(x) = \csc^2(x), \]
and that the corresponding derivatives are
\[
\begin{array}{ll}
\frac{d}{dx}\sin(x) = \cos(x), & \frac{d}{dx}\cos(x) = -\sin(x), \\
\frac{d}{dx}\tan(x) = \sec^2(x), & \frac{d}{dx}\sec(x) = \sec(x)\tan(x), \\
\frac{d}{dx}\cot(x) = -\csc^2(x), & \frac{d}{dx}\csc(x) = -\csc(x)\cot(x).
\end{array}
\]
Hence each identity only contains the components of a function and its derivatives. This will prove to be an exceptionally useful tool shortly.
The other trick to remember is the following two identities:
\[ \sin^2(x) = \frac{1}{2}(1 - \cos(2x)), \qquad \text{and} \qquad \cos^2(x) = \frac{1}{2}(1 + \cos(2x)). \]
Now what happens if our integrand is made up of several different trigonometric functions? For example, how would we deal with something of the form
\[ \int \sin^2(x)\cos^3(x)\,dx? \]
The idea is to use the Pythagorean identities listed above to reduce the integrand to an expression of the form $g(f(x))f'(x)\,dx$, since from here one can easily apply the substitution method: indeed, with $u = f(x)$ and $du = f'(x)\,dx$,
\[ \int g(f(x))f'(x)\,dx = \int g(u)\,du = G(f(x)) + C, \]
where $G$ is an anti-derivative of $g$. In our example we write $\cos^3(x) = \cos^2(x)\cos(x) = [1 - \sin^2(x)]\cos(x)$.
Our integrand now consists entirely of sine functions, and a single cosine function which will serve the role of allowing us to make the appropriate substitution. Hence setting $u = \sin(x)$ so that $du = \cos(x)\,dx$ we have
\[
\begin{aligned}
\int \sin^2(x)\cos^3(x)\,dx &= \int \sin^2(x)[1 - \sin^2(x)]\cos(x)\,dx \\
&= \int u^2(1 - u^2)\,du \\
&= \frac{1}{3}u^3 - \frac{1}{5}u^5 + C \\
&= \frac{1}{3}\sin^3(x) - \frac{1}{5}\sin^5(x) + C.
\end{aligned}
\]
Solution. Once again we conveniently have only secant and tangent functions, which are intimately related to one another through their derivatives. We now have to determine which function we are going to use as a substitution. If we make the substitution $u = \tan(x)$ then $du = \sec^2(x)\,dx$ and our integral becomes
\[ \int \sec^4(x)\tan^5(x)\,dx = \int \sec^2(x)\tan^5(x)\,du = \int \sec^2(x)\,u^5\,du. \]
Since $\sec^2(x) = 1 + \tan^2(x) = 1 + u^2$, this substitution does work, and it gives $\frac{1}{6}\tan^6(x) + \frac{1}{8}\tan^8(x) + C$.
Let's try the other substitution as well. If we make the $u = \sec(x)$ substitution, we will have $du = \sec(x)\tan(x)\,dx$ and so our integral will become
\[ \int \sec^4(x)\tan^5(x)\,dx = \int \sec^3(x)\tan^4(x)\,du = \int u^3\tan^4(x)\,du. \]
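This is perfect, since $\tan^4(x) = [\tan^2(x)]^2 = [\sec^2(x) - 1]^2$. Hence we can compute our integral as
\[
\begin{aligned}
\int \left( u^2 - 1 \right)^2 u^3\,du &= \frac{1}{8}u^8 - \frac{1}{3}u^6 + \frac{1}{4}u^4 + C \\
&= \frac{1}{8}\sec^8(x) - \frac{1}{3}\sec^6(x) + \frac{1}{4}\sec^4(x) + C.
\end{aligned}
\]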
3
4
The previous two examples really seem to indicate the importance that the powers of one of the functions have odd degree. In the event that everything is of even degree, we once again have to use our identities relating $\sin^2(x)$ and $\cos^2(x)$ to $\cos(2x)$.
Example 6.8
Compute $\int \cos^4(x)\,dx$.
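Solution. We know that $\cos^2(x) = \frac{1}{2}(1 + \cos(2x))$ and so our integral becomes
\[
\begin{aligned}
\int \cos^4(x)\,dx &= \frac{1}{4}\int [1 + \cos(2x)][1 + \cos(2x)]\,dx \\
&= \frac{1}{4}\int \left[ 1 + 2\cos(2x) + \cos^2(2x) \right] dx \\
&= \frac{1}{4}[x + \sin(2x)] + \frac{1}{4}\int \cos^2(2x)\,dx \\
&= \frac{1}{4}x + \frac{1}{4}\sin(2x) + \frac{1}{8}\int [1 + \cos(4x)]\,dx \\
&= \frac{3}{8}x + \frac{1}{4}\sin(2x) + \frac{1}{32}\sin(4x) + C.
\end{aligned}
\]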
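With this many factors of $\frac{1}{2}$ floating around, it is worth sanity-checking the final coefficients. One possible check, sketched below with a hypothetical `midpoint_rule` helper and an arbitrarily chosen interval $[0, 2]$, compares $F(b) - F(a)$ against a direct numerical integral of $\cos^4(x)$.

```python
import math

def midpoint_rule(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def F(x):
    """Candidate anti-derivative of cos^4(x), taken with C = 0."""
    return 3 * x / 8 + math.sin(2 * x) / 4 + math.sin(4 * x) / 32

a, b = 0.0, 2.0  # arbitrary test interval
numeric = midpoint_rule(lambda x: math.cos(x) ** 4, a, b)
print(numeric, F(b) - F(a))
```

By the Fundamental Theorem of Calculus the two printed values should agree up to the discretization error of the Riemann sum.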
One could enumerate a rather large list of all the possible cases and the tricks to use, but this is
a rather large waste of time. Instead, the student should just realize that the fundamental idea is
to apply trigonometric identities in a way that isolates a single occurrence of a function's derivative
and then apply substitution.
6.4
Trigonometric Substitution
The observant student will have recognized something very curious when we were computing the
derivatives of the inverse trigonometric functions. In particular, we have
\[ \frac{d}{dx}\arctan(x) = \frac{1}{1 + x^2}, \qquad \frac{d}{dx}\arcsin(x) = \frac{1}{\sqrt{1 - x^2}}, \]
so that the derivatives of ostensibly trigonometric functions yielded quantities which were effectively rational in description. If one recollects how these derivatives were computed, one of the main reasons that rational functions appear is the Pythagorean identities:
\[ \sin^2(x) + \cos^2(x) = 1, \]
\[ \int \frac{1}{1 + x^2}\,dx. \]
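Solution. We make the substitution $x = \tan(\theta)$ so that $dx = \sec^2(\theta)\,d\theta$. Our integral thus becomes
\[
\begin{aligned}
\int \frac{1}{1 + x^2}\,dx &= \int \frac{\sec^2(\theta)}{1 + \tan^2(\theta)}\,d\theta \\
&= \int \frac{\sec^2(\theta)}{\sec^2(\theta)}\,d\theta \\
&= \theta + C = \arctan(x) + C.
\end{aligned}
\]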
Not all things will be so simple, so let us try a slightly harder question:
Example 6.10
Determine
\[ \int \frac{x^2}{\sqrt{16 - x^2}}\,dx. \]
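Solution. This time we are tempted to make the substitution $x = \sin(\theta)$ and morally this is correct. However, if we think about it a bit more we can make our lives a lot easier by setting $x = 4\sin(\theta)$. The reason is that
\[ 16 - x^2 = 16 - (4\sin(\theta))^2 = 16\left( 1 - \sin^2(\theta) \right) = 16\cos^2(\theta). \]
This will make our lives much easier, as removing the addition sign will allow us to apply the square root. Now if $x = 4\sin(\theta)$ then $dx = 4\cos(\theta)\,d\theta$ and we get
\[
\begin{aligned}
\int \frac{x^2}{\sqrt{16 - x^2}}\,dx &= \int \frac{16\sin^2(\theta)\,[4\cos(\theta)\,d\theta]}{\sqrt{16\cos^2(\theta)}} \\
&= \int 16\sin^2(\theta)\,d\theta \\
&= 8\left[ \theta - \frac{1}{2}\sin(2\theta) \right] + C.
\end{aligned}
\]
The tricky part comes in changing this back into a function of $x$. Since $x = 4\sin(\theta)$ we know that $\theta = \arcsin(x/4)$, but the pesky presence of the $2\theta$ means that we cannot just simply plug this back in. Instead, we must make the following substitution:
\[
\begin{aligned}
\sin(2\theta) &= 2\sin(\theta)\cos(\theta) \\
&= 2 \cdot \frac{x}{4} \cdot \frac{\sqrt{16 - x^2}}{4} \\
&= \frac{x\sqrt{16 - x^2}}{8}.
\end{aligned}
\]
We conclude that
\[ \int \frac{x^2}{\sqrt{16 - x^2}}\,dx = 8\arcsin\left( \frac{x}{4} \right) - \frac{x\sqrt{16 - x^2}}{2} + C. \]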
Example 6.11
Compute
\[ \int \frac{x}{(x^2 + 6x + 18)^{3/2}}\,dx. \]
Solution. My hope would be that each student would try substitution first, though it is unlikely
to succeed given that I am trying to demonstrate the technique of trigonometric substitution.
Nonetheless, it should always be your starting point when you recognize a function and its derivative. Unfortunately, we do not have any of the recommended forms above so we must endeavour
to manipulate the integrand until it looks amenable to our techniques. Indeed, we may quickly complete the square of the denominator to find that
\[ x^2 + 6x + 18 = (x + 3)^2 + 9. \]
Referencing our table above, we decide that we should make the substitution $x + 3 = 3\tan(\theta)$ so that $dx = 3\sec^2(\theta)\,d\theta$ and
\[ (x + 3)^2 + 9 = 9(\tan^2(\theta) + 1) = 9\sec^2(\theta). \]
Substituting into our integral we find
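\[
\begin{aligned}
\int \frac{x}{(x^2 + 6x + 18)^{3/2}}\,dx &= \int \frac{(3\tan(\theta) - 3)}{(9\sec^2(\theta))^{3/2}}\,(3\sec^2(\theta))\,d\theta \\
&= \int \frac{9(\tan(\theta) - 1)\sec^2(\theta)}{27\sec^3(\theta)}\,d\theta \\
&= \frac{1}{3}\int \frac{\tan(\theta) - 1}{\sec(\theta)}\,d\theta \\
&= \frac{1}{3}\int (\sin(\theta) - \cos(\theta))\,d\theta \\
&= -\frac{1}{3}[\cos(\theta) + \sin(\theta)] + C.
\end{aligned}
\]
Now $\tan(\theta) = \frac{x + 3}{3}$, so that
\[ \cos(\theta) = \frac{3}{\sqrt{x^2 + 6x + 18}}, \qquad \sin(\theta) = \frac{x + 3}{\sqrt{x^2 + 6x + 18}}, \]
and hence
\[ \int \frac{x}{(x^2 + 6x + 18)^{3/2}}\,dx = -\frac{1}{3}\,\frac{x + 6}{\sqrt{x^2 + 6x + 18}} + C. \]
6.5
Partial Fractions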
The method of partial fractions is a means to turn integrands which are rational functions (that is, the quotient of polynomials) into simpler constituent pieces which may be more simply integrated.
Let us quickly recall partial fractions: Let $\frac{p(x)}{q(x)}$ be a rational function, so that $p$ and $q$ are polynomials. If we can write $q(x)$ as a product of distinct linear factors, say $q(x) = (x - r_1)(x - r_2) \cdots (x - r_n)$, then our goal is to write
\[ \frac{p(x)}{q(x)} = \frac{C_1}{x - r_1} + \frac{C_2}{x - r_2} + \cdots + \frac{C_n}{x - r_n}, \]
for constants $C_i$, $i = 1, \dots, n$. If $q$ contains an irreducible quadratic factor, say $q(x) = (x - r_1)(ax^2 + bx + c)$, then we try to write
\[ \frac{p(x)}{q(x)} = \frac{A}{x - r_1} + \frac{Bx + C}{ax^2 + bx + c}. \]
In general, if $q(x)$ contains an irreducible component of degree $k$ then the numerator of that component can have degree at most $k - 1$. If in the factorization a factor occurs with multiplicity greater than 1, say $q(x) = (x - r_1)(x - r_2)^2$, then we seek an expression of the form
\[ \frac{p(x)}{q(x)} = \frac{A}{x - r_1} + \frac{B}{x - r_2} + \frac{C}{(x - r_2)^2}. \]
More generally, if the factor $(x - r)$ occurs with multiplicity $m$, then we must account for a factor $(x - r)^j$ for all $j = 1, \dots, m$. All possible combinations of the above rules also hold.
Example 6.12
Determine the partial fraction decomposition of $\frac{5x + 1}{x^2 + x - 2}$.
Solution. We can factor the denominator as $x^2 + x - 2 = (x + 2)(x - 1)$ so we look for a decomposition of the form
\[ \frac{5x + 1}{x^2 + x - 2} = \frac{A}{x + 2} + \frac{B}{x - 1}. \]
By cross multiplying, we must thus have $5x + 1 = A(x - 1) + B(x + 2)$. If one so wishes, we could expand this out to find that
\[ 5x + 1 = (A + B)x + (-A + 2B), \qquad \begin{array}{r} A + B = 5 \\ -A + 2B = 1 \end{array} \]
which is a system of equations that can be solved. On the other hand, realizing that our equation must hold for all values of $x$, we could substitute convenient values of $x = 1$, $x = -2$ into our expression to find that
\[ 6 = 3B, \qquad -9 = -3A. \]
This tells us that $A = 3$ and $B = 2$, so that
\[ \frac{5x + 1}{x^2 + x - 2} = \frac{3}{x + 2} + \frac{2}{x - 1}. \]
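The "substitute convenient values" trick translates directly into a few lines of exact arithmetic. The sketch below is an illustration only, with made-up function names; it uses Python's `fractions` module so that no rounding is involved.

```python
from fractions import Fraction

def numerator(x):
    return 5 * x + 1

# 5x + 1 = A(x - 1) + B(x + 2).
# Plugging in x = 1 kills the A term; plugging in x = -2 kills the B term.
B = Fraction(numerator(1), 1 + 2)
A = Fraction(numerator(-2), -2 - 1)

def original(x):
    return Fraction(5 * x + 1, x * x + x - 2)

def decomposed(x):
    return A / (x + 2) + B / (x - 1)

print(A, B)
# Compare both sides at a few integers, avoiding the poles x = 1 and x = -2.
print(all(original(x) == decomposed(x) for x in (0, 2, 3, 5, -5)))
```

Checking the identity at a few extra points guards against arithmetic slips when solving for the constants.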
The process of partial fractions allows us to decompose our integrand into bite size pieces. From
here we can exploit the linearity of the integral.
Example 6.13
Compute
\[ \int \frac{5x + 1}{x^2 + x - 2}\,dx. \]
Solution. One might be tempted to try substitution here, but there is no obvious way to make it
work. Instead, we exploit the partial fraction decomposition that we computed in Example 6.12 to
write
\[
\begin{aligned}
\int \frac{5x + 1}{x^2 + x - 2}\,dx &= \int \left[ \frac{3}{x + 2} + \frac{2}{x - 1} \right] dx \\
&= 3\ln|x + 2| + 2\ln|x - 1| + C.
\end{aligned}
\]
The problem with partial fractions, as written, is that it is impossible for us to consider the case when the degree of the numerator is greater than or equal to that of the denominator. In such cases, we have to exploit the power of polynomial long division:
Example 6.14
Compute
\[ \int \frac{x^3 - 4x - 10}{x^2 - x - 6}\,dx. \]
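Solution. We would like to apply partial fractions, but first need to perform polynomial long division. Some quick calculations show us that
\[ \frac{x^3 - 4x - 10}{x^2 - x - 6} = x + 1 + \frac{3x - 4}{x^2 - x - 6}. \]
We may now perform partial fractions on the latter part of this expression; namely, since $x^2 - x - 6 = (x - 3)(x + 2)$,
\[ \frac{3x - 4}{x^2 - x - 6} = \frac{A}{x - 3} + \frac{B}{x + 2}. \]
One may find that $A = 1$ and $B = 2$. Substituting this into the integral we get
\[
\begin{aligned}
\int \frac{x^3 - 4x - 10}{x^2 - x - 6}\,dx &= \int \left[ x + 1 + \frac{1}{x - 3} + \frac{2}{x + 2} \right] dx \\
&= \frac{1}{2}x^2 + x + \ln|x - 3| + 2\ln|x + 2| + C.
\end{aligned}
\]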
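The "quick calculations" of polynomial long division can also be delegated to a short routine. This is a minimal sketch, assuming polynomials are stored as coefficient lists with the highest degree first; the function name `poly_divmod` is invented for this illustration.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Divide polynomials given as coefficient lists, highest degree first.

    Returns (quotient, remainder) as lists of Fractions.
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    while len(num) >= len(den):
        factor = num[0] / den[0]
        quotient.append(factor)
        # Subtract factor * den, aligned at the leading term, then drop the
        # leading coefficient, which has just become zero.
        for i, d in enumerate(den):
            num[i] -= factor * d
        num.pop(0)
    return quotient, num

# x^3 - 4x - 10 divided by x^2 - x - 6
q, r = poly_divmod([1, 0, -4, -10], [1, -1, -6])
print(q, r)
```

Here the quotient list encodes $x + 1$ and the remainder list encodes $3x - 4$, which is the piece handed on to partial fractions.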
Compute
\[ \int \frac{2x^3 - 4x^2 + 2x - 2}{x^4 - 2x^3 + 2x^2 - 2x + 1}\,dx. \]
Solution. Our first goal is to factor the denominator. By the Rational Roots Theorem$^{25}$ the only possible rational roots are $x = \pm 1$. Substitution into the denominator reveals that only $x = 1$ is a root, meaning that we can remove a factor of $x - 1$ from the denominator. Performing long division, we get that
\[ x^4 - 2x^3 + 2x^2 - 2x + 1 = (x - 1)(x^3 - x^2 + x - 1). \]
25
The Rational Root Theorem says that if $p/q$ is a rational number, written in irreducible form, and is a root of $a_n x^n + \cdots + a_0$, where the coefficients are integers and both $a_n, a_0 \ne 0$, then $p$ is a factor of $a_0$ and $q$ is a factor of $a_n$.
Once again the only possible rational roots for $x^3 - x^2 + x - 1$ are $x = \pm 1$, and in fact we already know that $x = -1$ cannot be a root. Checking $x = 1$ we again see that 1 is a root, so we may remove another factor of $x - 1$ to get
\[ x^4 - 2x^3 + 2x^2 - 2x + 1 = (x - 1)^2(x^2 + 1). \]
As $x^2 + 1$ has no real roots, we cannot factor any further. Our partial fraction decomposition is thus of the form
\[ \frac{2x^3 - 4x^2 + 2x - 2}{x^4 - 2x^3 + 2x^2 - 2x + 1} = \frac{A}{x - 1} + \frac{B}{(x - 1)^2} + \frac{Cx + D}{x^2 + 1}, \]
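which if we cross multiply gives us
\[ 2x^3 - 4x^2 + 2x - 2 = A(x - 1)(x^2 + 1) + B(x^2 + 1) + (Cx + D)(x - 1)^2. \]
Substituting $x = 1$ we get $-2 = 2B$, so that $B = -1$. Substituting $x = 0$ we get $-2 = -A + B + D$, so that
\[ D - A = -1. \]
Substituting $x = 2$ we get
\[ 2 = 5A - 5 + 2C + D \quad \Rightarrow \quad 5A + 2C + D = 7. \]
Substituting $x = -1$ we get
\[ -10 = -4A - 2 + 4(D - C) \quad \Rightarrow \quad -4A + 4D - 4C = -8. \]
Solving this system gives $A = 1$, $C = 1$ and $D = 0$, so that
\[ \frac{2x^3 - 4x^2 + 2x - 2}{x^4 - 2x^3 + 2x^2 - 2x + 1} = \frac{1}{x - 1} - \frac{1}{(x - 1)^2} + \frac{x}{x^2 + 1}. \]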
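Integrating we thus have
\[
\begin{aligned}
\int \frac{2x^3 - 4x^2 + 2x - 2}{x^4 - 2x^3 + 2x^2 - 2x + 1}\,dx &= \int \left[ \frac{1}{x - 1} - \frac{1}{(x - 1)^2} + \frac{x}{x^2 + 1} \right] dx \\
&= \log|x - 1| + \frac{1}{x - 1} + \frac{1}{2}\log|x^2 + 1| + C.
\end{aligned}
\]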
In this section we will list a few of the plethora of applications of the integral. The utility of the
integral cannot be overstated: it would be impossible to provide a full list of applications, simply
because the author cannot begin to know them all. The integral finds its place amongst almost every STEM field, from biology (for example, computing glucose tolerance, or other metabolic curves) to engineering (the "I" in PID controller stands for integral).
7.1
Volumes
Before the invention of calculus, computing volumes was all but impossible. It was a task relegated
to geniuses, and even then the volume of only the most simple objects could be computed, using
what was effectively calculus techniques. With the power of calculus, now almost anyone can
compute the volume of very complicated shapes!
The textbook presents this subject by enumerating three or four different techniques for computing volumes, depending on how the target shape is generated. I reiterate that our goal is not
to teach you algorithms, but to teach you how to problem solve. To this end, I will not present
the algorithms for determining these volumes; instead, I will try to show the student how to set up
these volume problems so that they can be computed in any situation.
The major theme is to realize that volume can be computed by integrating area. The idea is
that integrating adds a dimension. Computing the area under a curve was done by integrating the
height of lines f (x) over an interval. Hence integrating length - a one dimensional thing - yielded
area. To compute three dimensional volumes, we will thus want to integrate areas. Hence setting
up a volume question amounts to determining the cross-sectional area, then integrating:
\[ V = \int A(x)\,dx. \]
For the sake of simplicity, I will continuously use the same two functions: $f(x) = \sqrt{x}$ and $g(x) = x$. In Example 5.28 we saw how to compute the area between two such curves. In what follows, we will construct a series of three dimensional shapes using the area that they bound, and show the student how to compute the generated volume.
Example 7.1
Consider a space which has as its base the area enclosed by the functions $f(x) = \sqrt{x}$ and $g(x) = x$, and whose cross sections are squares. Determine the volume of this object.
Solution. We are told that the cross sectional area is given by squares; however, the dimension of the squares changes depending on the value of $x$. Choosing a typical point, we see that the base of the square has length $\sqrt{x} - x$ and hence the area of the square is $A(x) = (\sqrt{x} - x)^2 = x - 2x^{3/2} + x^2$. The two curves intersect at $x = 0$ and $x = 1$, so the volume is
\[ V = \int_0^1 A(x)\,dx = \int_0^1 \left[ x - 2x^{3/2} + x^2 \right] dx = \left[ \frac{1}{2}x^2 - \frac{4}{5}x^{5/2} + \frac{1}{3}x^3 \right]_0^1 = \frac{1}{30}. \]
That was not too bad, but one will rarely be explicitly told the cross sectional area. Often, one has to analyze the object and discover this on his/her own.
Example 7.2
Consider the area enclosed by the functions $f(x) = \sqrt{x}$ and $g(x) = x$. Determine the volume of the object given by rotating that area about the $x$-axis.
Solution. Once again, we consider a typical cross-section at the point $x$. The object we get by taking out this slice looks like a washer, whose outer radius is $f(x)$ and whose inner radius is $g(x)$. The cross-sectional area of this element is thus
\[ A(x) = \pi f(x)^2 - \pi g(x)^2 = \pi[f(x)^2 - g(x)^2]. \]
Integrating we thus get
\[ V = \int_0^1 A(x)\,dx = \pi\int_0^1 \left[ x - x^2 \right] dx = \pi\left[ \frac{1}{2}x^2 - \frac{1}{3}x^3 \right]_0^1 = \frac{\pi}{6}. \]
Here's an interesting question: What if we had revolved this shape around the $y$-axis instead? First of all, do we get the same volume? Secondly, how do we handle the integral?
Example 7.3
Solution. This time, taking cross sections about a point $x$ yields really odd shapes whose area is difficult to compute. Instead, let's try taking cross-sections in the $y$-direction. Notice that the function $y = \sqrt{x}$ can be written as $y^2 = x$, so that the cross section in the $y$-direction again looks like a washer, but this time with outer radius $(x =)\,y$ and inner radius $(x =)\,y^2$. The cross-sectional area is thus $A(y) = \pi[y^2 - y^4]$. Again, our interval is going from 0 to 1, and hence the volume is
given by
\[ V = \pi\int_0^1 \left( y^2 - y^4 \right) dy = \pi\left[ \frac{1}{3}y^3 - \frac{1}{5}y^5 \right]_0^1 = \frac{2\pi}{15}. \]
Determine the volume of the object given by rotating the area enclosed by $f(x) = \sqrt{x}$ and $g(x) = x$ about the line $x = 1$.
Solution. As always, we take a typical cross-section, again in the $y$-direction this time. We again have a washer whose outer radius is $1 - y^2$ and whose inner radius is $1 - y$. The cross sectional area is thus
\[ A(y) = \pi[(1 - y^2)^2 - (1 - y)^2] = \pi(2y - 3y^2 + y^4), \]
and the volume can be computed as
\[ V = \pi\int_0^1 [2y - 3y^2 + y^4]\,dy = \pi\left[ y^2 - y^3 + \frac{1}{5}y^5 \right]_0^1 = \frac{\pi}{5}. \]
Alternatives to Sections Another technique for computing volumes can be to use other shapes instead of just cross-sections. There are times when this may in fact be more convenient than using cross-sections, so we introduce it as yet another technique for finding volumes.
As in the previous sections, the trick is to think about an area (representing an infinitesimal piece of volume) and sweep out the shape with which we are concerned. For our first example, instead of using cross-sections, we will use cylinders. Consider a function $f(x)$ defined on $[0, a]$, rotated about the $y$-axis. A typical cylinder at radius $x$ will have surface area
\[ S(x) = 2\pi x f(x). \]
If we were to use our differential framework, we could think about fattening up this cylinder slightly, to give us a volume $2\pi x f(x)\,dx$. In that case, the volume of the object can be computed by sweeping these cylinders from 0 to $a$, to give us the volume
\[ V = \int_0^a 2\pi x f(x)\,dx. \]
Example 7.5
\[ V = \int_0^4 2\pi x\sqrt{x}\,dx = 2\pi\left[\tfrac{2}{5}x^{5/2}\right]_0^4 = \frac{128\pi}{5}. \]
Of course, we can easily generalize this process to the case when we have a difference of functions.
For example, consider the area bounded between two functions f(x) and g(x), say on [0, a]. If we
look at the cylinder in this case, the radius of the cylinder is still x, but the height is given
by the difference of the two functions. Hence the surface area is given by
\[ S(x) = 2\pi x\left[f(x) - g(x)\right]. \]
Example 7.6
Determine the volume of the object given by rotating the area enclosed between $\sqrt{x}$ and $x$
about the y-axis.
Solution. We did this exact problem in Example 7.3 where we got a solution of $\frac{2\pi}{15}$. We certainly
expect that we should get the same solution this time. Again, using the method of shells we get
\[ V = \int_0^1 2\pi x\left[\sqrt{x} - x\right]dx = 2\pi\int_0^1\left[x^{3/2} - x^2\right]dx = 2\pi\left[\tfrac{2}{5}x^{5/2} - \tfrac{1}{3}x^3\right]_0^1 = \frac{2\pi}{15}. \]
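The agreement between washers and shells can also be checked numerically. The following is a minimal sketch, assuming the region is the one between $\sqrt{x}$ and $x$ on [0, 1] rotated about the y-axis; the helper `midpoint_integral` and the step count are illustrative choices, not part of the notes.

```python
import math


def midpoint_integral(func, a, b, n=20_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def washer_area(y):
    # Washer in the y-direction: outer radius y, inner radius y**2.
    return math.pi * (y**2 - y**4)


def shell_area(x):
    # Cylindrical shell: radius x, height sqrt(x) - x.
    return 2 * math.pi * x * (math.sqrt(x) - x)


washer_volume = midpoint_integral(washer_area, 0.0, 1.0)
shell_volume = midpoint_integral(shell_area, 0.0, 1.0)
exact_volume = 2 * math.pi / 15

print(washer_volume, shell_volume, exact_volume)
```

Both approximations should agree with $2\pi/15$ to several decimal places, which is a useful sanity check whenever the two methods are set up by hand.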
As a general rule, I find that using cross-sections more often works out to be the better technique,
but the following is a (contrived) example where the shell method proves much simpler:
Example 7.7
Solution. If we were to do this using cross-sections, we would need to find a way of determining
the x-coordinate given $y = e^{x^2}$, and while that is doable, the resulting function is not very easy to
integrate. Instead, if we use the shell method with the substitution $u = x^2$, $du = 2x\,dx$, we have
\[ V = \int_0^2 2\pi x e^{x^2}\,dx = \pi\int_0^4 e^u\,du = \pi e^u\Big|_0^4 = \pi(e^4 - 1). \]
Naturally, were we to revolve around the x-axis, the procedure would be the same, though we
would have to convert our problem to integrating about the y-direction, and make the appropriate
changes to our functions.
What happens if we rotate about a different axis?
Example 7.8
Determine the volume of the object given by rotating the area enclosed by $\sqrt{x}$ and $x$ about
the line $x = 1$.
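Solution. We did this problem in Example 7.4, and found a solution of $\frac{\pi}{5}$. Setting up our cylinder,
we see that for a typical slice our radius is now $1 - x$ while our height is $\sqrt{x} - x$; thus our volume
is given by
\[ V = \int_0^1 2\pi(1 - x)\left[\sqrt{x} - x\right]dx = 2\pi\int_0^1\left[\sqrt{x} - x - x^{3/2} + x^2\right]dx \]
\[ = 2\pi\left[\tfrac{2}{3}x^{3/2} - \tfrac{1}{2}x^2 - \tfrac{2}{5}x^{5/2} + \tfrac{1}{3}x^3\right]_0^1 = \frac{\pi}{5}. \]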
Example 7.9
Solution. Consider the curve $y = \sqrt{r^2 - x^2}$ on $[0, r]$. By rotating this about the y-axis, we will
get the upper hemisphere of the sphere, and hence half the volume. Using the method of shells with the
substitution $u = r^2 - x^2$, $du = -2x\,dx$, we thus get
\[ V = 2\pi\int_0^r x\sqrt{r^2 - x^2}\,dx = -\pi\int_{r^2}^{0}\sqrt{u}\,du = \frac{2\pi}{3}u^{3/2}\Big|_0^{r^2} = \frac{2\pi}{3}r^3. \]
Of course, the total volume of the sphere is then twice this amount, giving $\frac{4}{3}\pi r^3$, which is exactly
what we expected.
8 Improper Integrals
It was essential in defining the integral that we examined bounded functions on bounded intervals
[a, b], so that everything under consideration could be finite. In this section we examine how to
now extend the idea of integrals to cover unbounded functions, and functions defined on unbounded
intervals.
8.1 First Principles
8.1.1 Infinite Intervals
Our goal is to extend the notion of integration from a finite interval $[a, b]$ to an infinite interval
$[a, \infty)$ or $(-\infty, b]$. We will develop the idea for the interval $[a, \infty)$ and leave the details for $(-\infty, b]$
to the student.
Definition 8.1
Let f be a bounded function on the interval $[a, \infty)$ such that f is integrable on $[a, x]$ for
every $x > a$. We define the improper integral of f on $[a, \infty)$ as
\[ \int_a^\infty f(t)\,dt = \lim_{x\to\infty}\int_a^x f(t)\,dt. \]
We say that the improper integral converges if this limit exists and is finite, and diverges otherwise.
Let us take a moment to think about what this is saying: We are defining a new function
\[ F(x) = \int_a^x f(t)\,dt, \]
which (when f is continuous) is an anti-derivative of f. The improper integral then converges if the function
F(x) has a horizontal asymptote; that is, the area under the graph of f asymptotically stabilizes
to a single, finite number.
Naturally, one then defines the improper integral on $(-\infty, b]$ as
\[ \int_{-\infty}^b f(t)\,dt = \lim_{x\to-\infty}\int_x^b f(t)\,dt. \]
Example 8.2
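Determine $\int_0^\infty e^{-t}\,dt$, if it exists.
Solution. By definition,
\[ \int_0^\infty e^{-t}\,dt = \lim_{x\to\infty}\int_0^x e^{-t}\,dt = \lim_{x\to\infty}\left[1 - e^{-x}\right] = 1. \]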
Figure 25: The function $f(x) = e^{-x}$ and an anti-derivative $F(x) = 1 - e^{-x}$. We see that F(x)
tends to the number 1 as $x \to \infty$. This occurs because as $x \to \infty$ the area under the graph of
f(x) becomes very small.
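The convergence in Example 8.2 can be observed numerically by computing the partial integrals $F(x) = \int_0^x e^{-t}\,dt$ for growing x. This is a minimal sketch; the midpoint rule and the helper name `partial_integral` are illustrative assumptions rather than anything prescribed by the notes.

```python
import math


def partial_integral(x, n=20_000):
    """Midpoint-rule approximation of the integral of e^{-t} over [0, x]."""
    h = x / n
    return h * sum(math.exp(-(i + 0.5) * h) for i in range(n))


for x in [1, 5, 10, 20]:
    # Compare the numerical value with the antiderivative 1 - e^{-x}.
    print(x, partial_integral(x), 1 - math.exp(-x))
```

As x grows the printed values level off near 1, which is the horizontal asymptote of F described above.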
Example 8.3
Determine $\int_0^\infty \sin(x)\,dx$, if it exists.
Solution. By definition,
\[ \int_0^\infty \sin(t)\,dt = \lim_{x\to\infty}\int_0^x \sin(t)\,dt = \lim_{x\to\infty}\left[1 - \cos(x)\right], \]
which does not exist. If we think about this for a while, we can convince ourselves why it is true.
The accumulated area under the graph of sin(x) is constantly oscillating between 0 and 2, and this oscillation
never stops. Hence even though there are many points where the area is arbitrarily small, it's not
enough to ensure that the area ever converges.
Intuitively, it seems like functions which tend to zero may have a convergent improper integral.
However, this is not always the case. It is necessary that the function goes to zero fast enough to
ensure that the area is finite.
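Proposition 8.4
If $a > 0$, the improper integral $\int_a^\infty \frac{1}{t^p}\,dt$ converges if and only if $p > 1$.
Proof. If $p = 1$ then
\[ \int_a^\infty \frac{1}{t}\,dt = \lim_{x\to\infty}\log(x/a) = \infty, \]
so the integral diverges. If $p \neq 1$ then
\[ \int_a^\infty \frac{1}{t^p}\,dt = \frac{1}{1-p}\lim_{x\to\infty}\left[\frac{1}{x^{p-1}} - \frac{1}{a^{p-1}}\right]. \]
We know that $1/x^{p-1}$ converges as $x \to \infty$ if and only if the power is non-negative; that is, $p - 1 \geq 0$.
Combining this with $p \neq 1$ tells us that the improper integral converges if and only if $p > 1$.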
As was computed explicitly, this implies that $\int_a^\infty \frac{1}{x}\,dx$ does not converge, which is a result that
f (t)dt converges.
\[ \int_{-\infty}^{\infty} f(t)\,dt = \int_{-\infty}^{c} f(t)\,dt + \int_{c}^{\infty} f(t)\,dt. \]
The student should convince him/herself that the value of the improper integral does not depend on the
value of c.
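The reason is that the value of the integral should not depend on how quickly we take our limits.
For example
\[ \lim_{x\to\infty}\int_{-x}^{x}\sin(t)\,dt = \lim_{x\to\infty}\left[\cos(-x) - \cos(x)\right] = 0, \]
whereas
\[ \lim_{x\to\infty}\int_{-x}^{2x}\sin(t)\,dt = \lim_{x\to\infty}\left[\cos(-x) - \cos(2x)\right] \]
does not exist.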
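Example 8.6
Determine $\int_{-\infty}^{\infty} e^{-|t|}\,dt$.
Solution. On $[0, \infty)$ we have
\[ \int_0^\infty e^{-|t|}\,dt = \int_0^\infty e^{-t}\,dt \quad \text{since } t > 0 \]
\[ = 1 \quad \text{by Example 8.2.} \]
Similarly, $\int_{-\infty}^0 e^{-|t|}\,dt = 1$, thus
\[ \int_{-\infty}^{\infty} e^{-|t|}\,dt = \int_{-\infty}^{0} e^{-|t|}\,dt + \int_0^\infty e^{-|t|}\,dt = 2. \]
8.1.2 Unbounded Functions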
The case of unbounded functions often poses even more difficulty, since it is very tempting to just
blindly apply the Fundamental Theorem of Calculus without paying attention.
Example 8.7
Determine $\int_{-1}^{1} \frac{1}{x^2}\,dx$.
Solution. We know that the anti-derivative of $\frac{1}{x^2}$ is $-\frac{1}{x}$, so if we were to just blindly apply the FTC
we would get
\[ \int_{-1}^{1}\frac{1}{x^2}\,dx = \left[-\frac{1}{x}\right]_{-1}^{1} = -2. \]
Unfortunately, this is completely and totally wrong. Our first hint at a miscalculation is
probably the fact that $\frac{1}{x^2}$ is everywhere positive, yet we somehow ended up with a negative integral.
In fact, the integral is infinite. To see this, note that since $\frac{1}{x^2} > 0$ then for any $0 < \epsilon < 1$
\[ \int_{-1}^{1}\frac{1}{x^2}\,dx \geq \int_{\epsilon}^{1}\frac{1}{x^2}\,dx = \left[-\frac{1}{x}\right]_{\epsilon}^{1} = \frac{1}{\epsilon} - 1, \]
and that by choosing $\epsilon$ to be small enough we can make the integral arbitrarily large. The reason is
that the function $\frac{1}{x^2}$ is not integrable on the interval $[-1, 1]$ (all integrable functions are bounded!),
and hence we could not apply the FTC.
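The blow-up of $\int_\epsilon^1 x^{-2}\,dx$ as $\epsilon$ shrinks can be seen numerically. The sketch below is only illustrative: `midpoint_integral` is a hypothetical helper using the midpoint rule, and the chosen values of $\epsilon$ are arbitrary.

```python
def midpoint_integral(func, a, b, n=100_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def inverse_square(x):
    return 1.0 / x**2


def truncated_integral(eps):
    """Numerical value of the integral of 1/x^2 over [eps, 1]."""
    return midpoint_integral(inverse_square, eps, 1.0)


for eps in [0.5, 0.1, 0.01]:
    # The exact value is 1/eps - 1, which grows without bound as eps -> 0.
    print(eps, truncated_integral(eps), 1.0 / eps - 1.0)
```

Every value is positive and the values grow as $\epsilon \to 0^+$, in contrast with the value $-2$ produced by the naive application of the FTC.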
Exercise:
The way to deal with unbounded functions is precisely the same way that we deal with unbounded intervals: we take a limit.
Definition 8.8
If f is a function on $[a, b]$, unbounded at a, but integrable on $[x, b]$ for every $x > a$ then we
define the improper integral
\[ \int_a^b f(t)\,dt = \lim_{x\to a^+}\int_x^b f(t)\,dt. \]
If this limit is finite we say that the improper integral converges; otherwise, we say that the
improper integral diverges.
Similarly, if f were unbounded at b, we would define the improper integral as
\[ \int_a^b f(t)\,dt = \lim_{x\to b^-}\int_a^x f(t)\,dt. \]
Example 8.9
Evaluate $\int_3^5 \frac{t}{\sqrt{t^2 - 9}}\,dt$, if it exists.
Solution. We immediately recognize that there could be a problem at $t = 3$. Using the substitution
$u = t^2 - 9$ we get
\[ \int_3^5\frac{t}{\sqrt{t^2 - 9}}\,dt = \lim_{x\to 3^+}\left[\sqrt{t^2 - 9}\right]_x^5 = \lim_{x\to 3^+}\left[4 - \sqrt{x^2 - 9}\right] = 4. \]
Just as in the case of integrating from $(-\infty, \infty)$ we must also be careful about integrating on
both sides of an unbounded function.
Definition 8.10
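If f is a function on $[a, b]$ which is unbounded at $c \in [a, b]$ then we say that the improper
integral $\int_a^b f(t)\,dt$ converges if both
\[ \int_a^c f(t)\,dt \quad\text{and}\quad \int_c^b f(t)\,dt \]
converge, in which case
\[ \int_a^b f(t)\,dt = \int_a^c f(t)\,dt + \int_c^b f(t)\,dt. \]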
Proposition 8.11
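If $a > 0$, the improper integral $\int_0^a \frac{1}{t^p}\,dt$ converges, if and only if $p < 1$.
Proof. For $p \neq 1$ we have
\[ \int_0^a\frac{1}{t^p}\,dt = \lim_{x\to 0^+}\int_x^a\frac{1}{t^p}\,dt = \frac{1}{1-p}\lim_{x\to 0^+}\left[\frac{1}{t^{p-1}}\right]_x^a. \]
The limit converges if and only if $p - 1 \leq 0$ so that $p \leq 1$. Combined with the fact that $p \neq 1$ (the case
$p = 1$ gives a logarithm, which diverges as $x \to 0^+$) we get $p < 1$ as required.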
8.2 Comparison Tests
In this section, we develop some techniques to make our lives simpler in terms of dealing with improper
integrals. The idea is something like the following: Say that you were asked to determine whether
the following improper integral converged:
\[ \int_1^\infty\frac{2 + \sin(x)}{x}\,dx, \]
or how about
\[ \int_1^\infty\frac{x^2 + 1}{x^4 + 3x^2 - 4x + 1}\,dx? \]
The Comparison Test is the simplest of the basic tests, and exploits the monotonicity of the
integral. In particular, the following lemma is rather intuitive:
Lemma 8.12
If F, G are two functions with a domain D, such that $F(x) \leq G(x)$ for all $x \in D$, then
\[ \lim_{x\to\infty}F(x) = \infty \quad\Longrightarrow\quad \lim_{x\to\infty}G(x) = \infty. \]
Proof. This is a kind of Squeeze Theorem for infinite limits: If the smaller of the two functions goes
off to infinity, then certainly the bigger of the two must go to infinity as well. The proof is effectively
a one-liner. Since $F(x) \to \infty$ as $x \to \infty$, we know that for every $M \in \mathbb{R}$ there exists an $N \in \mathbb{R}$ such that
if $x > N$ then $F(x) > M$. We claim that this same N works for G. Indeed, let M be arbitrarily
chosen and choose the N guaranteed to exist by above. Then if $x > N$ we have $G(x) \geq F(x) > M$
as required.
Of use will be this spiritual contrapositive:
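Lemma 8.13
If F is a bounded, increasing function on some interval $[a, \infty)$, then $\lim_{x\to\infty}F(x)$ exists.
Theorem 8.14: The Basic Comparison Test
Let f, g be functions on an interval $[a, \infty)$ such that $0 \leq f(x) \leq g(x)$ for all $x \in [a, \infty)$.
1. If $\int_a^\infty g(t)\,dt$ converges then $\int_a^\infty f(t)\,dt$ converges.
2. If $\int_a^\infty f(t)\,dt$ diverges then $\int_a^\infty g(t)\,dt$ diverges.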
The idea is again a type of Squeeze Theorem argument. If the integral of the bigger function g
is finite, the monotonicity of the integral cannot allow f to go off to infinity. Similarly, if the
integral of the smaller function f goes off to infinity, the larger function's integral must also diverge.
A good question at this point is to ask whether any of the integrals could oscillate and hence not
converge. The condition that $0 \leq f(x) \leq g(x)$ guarantees that the integrands are non-negative: this
means that the corresponding integrals are increasing functions.
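Proof of the Basic Comparison Test. In both cases, define the functions
\[ F(x) = \int_a^x f(t)\,dt, \qquad G(x) = \int_a^x g(t)\,dt, \]
and note that F, G are increasing functions (since f, g are non-negative) and $F(x) \leq G(x)$ by
monotonicity of the integral.
1. Assume that $\int_a^\infty g(t)\,dt$ converges, so that $\lim_{x\to\infty}G(x)$
exists and is finite. This implies that F is an increasing bounded function. By Lemma 8.13
it follows that $\lim_{x\to\infty}F(x)$ exists, so that $\int_a^\infty f(t)\,dt$ converges.
2. Assume that $\int_a^\infty f(t)\,dt$ diverges. Since F is increasing, this means that $F(x) \to \infty$ as $x \to \infty$.
By Lemma 8.12 we have $G(x) \to \infty$ as well, so that $\int_a^\infty g(t)\,dt$ diverges.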
Example 8.15
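Show that $\int_1^\infty \frac{2 + \sin(x)}{x}\,dx$ diverges.
Solution. Since $\sin(x) \geq -1$ we have $\frac{2 + \sin(x)}{x} \geq \frac{1}{x}$ for all $x \geq 1$. The integral $\int_1^\infty \frac{1}{x}\,dx$
diverges, so by the Basic Comparison Test $\int_1^\infty \frac{2 + \sin(x)}{x}\,dx$ diverges.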
Example 8.16
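Determine whether $\int_0^\infty \frac{x}{\sqrt{x^6 + 1}}\,dx$ converges or diverges.
Solution. Once again, we just want to look at which terms in the numerator and denominator
dominate in the limit $x \to \infty$. Certainly, we do not expect $x^6 + 1$ to be too much different than $x^6$
for very large x, so we will compare our integrand to the function
\[ \frac{x}{\sqrt{x^6}} = \frac{1}{x^2}. \]
Indeed, since $1 + x^6 \geq x^6$ we have that $\frac{1}{\sqrt{x^6}} \geq \frac{1}{\sqrt{1 + x^6}}$, so that for $x > 0$
\[ \frac{1}{x^2} = \frac{x}{\sqrt{x^6}} \geq \frac{x}{\sqrt{x^6 + 1}}. \]
Now the integral of the left-hand side over $[1, \infty)$ converges by Proposition 8.4, and the integrand is
continuous on $[0, 1]$, so by the Comparison Test we know that
\[ \int_0^\infty\frac{x}{\sqrt{x^6 + 1}}\,dx \text{ converges.} \]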
8.2.2
The Basic Comparison Test is just that, basic. Often times the obvious inequality that you want
actually ends up going in the wrong direction, yet the integrals are so similar that you feel like you
should still be able to compare them. Example 8.18 below will demonstrate precisely this.
The Limit Comparison Test will fix this by asking the question: Do f and g grow at roughly
the same rate?
Theorem 8.17: The Limit Comparison Test
Let f, g be integrable functions on all subintervals of $[a, \infty)$ and satisfy $0 \leq f(x) \leq g(x)$. If
\[ 0 < \lim_{x\to\infty}\frac{f(x)}{g(x)} < \infty \]
then $\int_a^\infty f(x)\,dx$ converges if and only if $\int_a^\infty g(x)\,dx$ converges.
The statement that $\frac{f(x)}{g(x)}$ converges to some finite, non-zero number means that f and g grow
asymptotically at the same speed, up to some multiplicative constant (which is precisely the value
of the limit). Note that if the limit is 0, eventually one must have $g(x) \geq f(x)$ and can use the
Basic Comparison Test appropriately. Similarly, if the limit is $\infty$, then eventually $f(x) \geq g(x)$ and
again the Basic Comparison Test can be used.
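Proof. Let $L = \lim_{x\to\infty}\frac{f(x)}{g(x)}$. By the definition of the limit (with $\epsilon = 1$) there exists an N such that
for all $x > N$
\[ L - 1 \leq \frac{f(x)}{g(x)} \leq L + 1. \]
Assume that $\int_a^\infty g(x)\,dx$ converges, and note that the above equation implies that $f(x) \leq (L+1)g(x)$
for all $x > N$. Hence for $x > N$ we have
\[ \int_a^x f(t)\,dt = \int_a^N f(t)\,dt + \int_N^x f(t)\,dt \leq \int_a^N f(t)\,dt + (L + 1)\int_N^x g(t)\,dt. \]
In the limit as $x \to \infty$ both terms on the right-hand side are finite, so by the Basic Comparison
Test $\int_a^\infty f(x)\,dx$ converges.
Conversely, if $\int_a^\infty f(x)\,dx$ converges, then we repeat the argument above but use the fact
that $f(x) \geq (L - 1)g(x)$ and again apply the Basic Comparison Test. (If $L \leq 1$, take $\epsilon = L/2$ instead,
so that the lower bound $f(x) \geq \frac{L}{2}g(x)$ has a positive constant.)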
Example 8.18
Determine whether $\int_1^\infty \frac{1}{1+x}\,dx$ converges or diverges.
Solution. If one were to try to use the Basic Comparison Test, the obvious inequality is that
$\frac{1}{1+x} \leq \frac{1}{x}$. But this does not tell us anything! The right-hand side diverges, and so does not
impose its will on the left-hand side. Instead, we recognize that
\[ \lim_{x\to\infty}\frac{1/(1+x)}{1/x} = \lim_{x\to\infty}\frac{x}{1+x} = 1. \]
Thus by the Limit Comparison Test, since $\int_1^\infty \frac{1}{x}\,dx$ diverges, we necessarily have that $\int_1^\infty \frac{1}{1+x}\,dx$
diverges as well.
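The two ingredients of this argument, the ratio tending to 1 and both partial integrals growing without bound, can be tabulated directly. This is a small sketch assuming the integrands $f(x) = 1/(1+x)$ and $g(x) = 1/x$; the function names are illustrative, and the partial integrals use the closed-form antiderivatives.

```python
import math


def f(x):
    return 1.0 / (1.0 + x)


def g(x):
    return 1.0 / x


def ratio(x):
    """The quotient f(x)/g(x) = x/(1+x) used in the Limit Comparison Test."""
    return f(x) / g(x)


def partial_integral_f(x):
    """Integral of f over [1, x], via the antiderivative log(1 + t)."""
    return math.log((1.0 + x) / 2.0)


def partial_integral_g(x):
    """Integral of g over [1, x], via the antiderivative log(t)."""
    return math.log(x)


for x in [10.0, 1e3, 1e6, 1e12]:
    print(x, ratio(x), partial_integral_f(x), partial_integral_g(x))
```

The ratio column approaches 1 while both integral columns keep growing, and their difference stays bounded (it tends to $\log 2$), which is what one would expect from integrands that grow at the same rate.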
Example 8.19
Determine whether $\int_1^\infty \frac{x^2 + 1}{x^4 + 3x^2 - 4x + 1}\,dx$ converges or diverges.
Solution. With a clever enough argument, one might be able to handle this example using the Basic
Comparison Test, but the Limit Comparison Test proves much simpler. Again the idea is to look
at how the numerator and denominator grow asymptotically. The numerator grows as $x^2$, while
the denominator grows as $x^4$, meaning that the combined system grows as $\frac{1}{x^2}$. To invoke the Limit
Comparison Test, we must compute the limit of the ratio of these functions:
\[ \lim_{x\to\infty}\frac{x^2 + 1}{x^4 + 3x^2 - 4x + 1}\cdot\frac{1}{1/x^2} = \lim_{x\to\infty}\frac{x^4 + x^2}{x^4 + 3x^2 - 4x + 1} = 1. \]
Since $\int_1^\infty \frac{1}{x^2}\,dx$ converges (by Proposition 8.4), we conclude by the Limit Comparison Test that
\[ \int_1^\infty\frac{x^2 + 1}{x^4 + 3x^2 - 4x + 1}\,dx \]
also converges.
Our goal is to now analyze what happens when we deal with discrete, rather than continuous,
information. Our first subject will be to consider sequences and their limits, which will be analogous
to the study of differential calculus. Shortly after sequences, we will begin studying series, which are
analogous to integral calculus. At a very rough level, we have seen sequences before. We
think of sequences as an ordered collection of elements, such as
\[ 1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots. \tag{9.1} \]
Sets are not typically ordered though, so there is a more fruitful way of thinking about sequences
that will often allow us to capitalize on our knowledge of calculus.
9.1 The Basics
Definition: The idea is as follows: thus far we have been examining maps $f : \mathbb{R} \to \mathbb{R}$, functions
which take a real number as an input and produce a real number as an output. A sequence is a
map $a : \mathbb{N} \to \mathbb{R}$ which takes in natural numbers as input and produces real numbers as output. For
example, the function $a(n) = \frac{1}{n}$ is such a sequence, and we have
\[ a(1) = 1, \quad a(2) = \frac{1}{2}, \quad a(3) = \frac{1}{3}, \quad a(4) = \frac{1}{4}, \ldots. \]
This notation is often clumsy though, so instead we abbreviate $a_1 = a(1)$, $a_2 = a(2)$ and so on,
with a general element of the sequence written as $a_n = a(n)$. The image of a sequence is the set
\[ (a_n)_{n=1}^{\infty} = \{a_n : n \in \mathbb{N}\}. \]
Hence the image of the sequence $a_n = 1/n$ is exactly the set given in (9.1). We will often make
no distinction between the sequence as a function $a_n$, and its image set $(a_n)_{n=1}^{\infty}$. Notice that this
latter notation is particularly useful if we want to change the index at which the sequence starts.
In fact, if the indexing set is implicitly known we may sometimes be lazy and only write $(a_n)$.
One can take infinite subsets of $\mathbb{N}$ and, up to re-ordering, get something which is still labelled
by $\mathbb{N}$. Equivalently, it is possible to take infinite subsets of the set $\{a_1, a_2, a_3, \ldots\}$ and hence get
another sequence. This leads us to the following definition:
Definition 9.1
The choice of notation $a_{n_k}$ comes from the fact that if we think of a as a function $a : \mathbb{N} \to \mathbb{R}$ then we can
think of n as a function $n : \mathbb{N} \to \mathbb{N}$ for which $n_k = n(k)$. Thus $a_{n_k} = a(n(k))$ is a composition of functions.
1
n.
Indeed, notice
Figure 26: The sequence $a_n = \frac{1}{n}$ can be recognized as the restriction of the function $f(x) = \frac{1}{x}$ to
the positive integers. This sequence is decreasing and is bounded.
Relation to Elementary Functions: We know that $\mathbb{N} \subseteq \mathbb{R}$ and so often one can think of
sequences as the restriction of a function $f : \mathbb{R} \to \mathbb{R}$ to just the natural numbers. For example,
1. $a_n = \frac{1}{n}$,
2. $a_n = \sin n$,
3. $a_n = \frac{e^n}{n+1}$,
4. $a_n = \sqrt{n}$,
5. $a_n = 2^n$,
6. $a_n = n^n$,
are all sequences which correspond to functions. Consequently, many notions for functions on $\mathbb{R}$
have a corresponding notion for sequences:
Definition 9.2
Show that the sequence $a_n = n + \sin(n)$ is a non-decreasing sequence, bounded from below
but not from above.
Solution. Evaluating sin(n) for an arbitrary integer is not a simple task; however, since we can
think of $a_n$ as the restriction of the function $f(x) = x + \sin(x)$, we can use differential calculus to
help solve our problem. Notice that $f'(x) = 1 + \cos(x) \geq 0$, which means that f is a non-decreasing
function. In turn, the sequence $a_n$ must then also be non-decreasing (convince yourself of this).
Since the sequence is non-decreasing, it is bounded from below by its first term: $a_1 = 1 + \sin(1)$.
To see that it is not bounded above, we have that for any positive integer M the element
$a_{M+1} = M + 1 + \sin(M + 1) \geq M$, showing that $a_n$ grows unbounded.
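The three claims (non-decreasing, bounded below by the first term, unbounded above) can be spot-checked on an initial stretch of the sequence. The sketch below only inspects the first 10,000 terms, so it is evidence rather than proof; the names `a` and `terms` are illustrative.

```python
import math


def a(n):
    """The n-th term of the sequence a_n = n + sin(n)."""
    return n + math.sin(n)


terms = [a(n) for n in range(1, 10_001)]

# Each term should be no larger than the one that follows it.
is_non_decreasing = all(x <= y for x, y in zip(terms, terms[1:]))

# For a non-decreasing sequence the first term is the smallest.
smallest_is_first = min(terms) == terms[0]

# a_{M+1} >= M for every positive integer M checked.
exceeds_every_bound = all(a(M + 1) >= M for M in range(1, 1_000))

print(is_non_decreasing, smallest_is_first, exceeds_every_bound)
```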
Exercise: One can convince his/herself that if f(x) is an increasing function, then the sequence defined by $a_n = f(n)$ is also increasing. A similar notion holds for decreasing and
bounded. However, the converse is certainly not true. Give an example of a sequence $a_n$
and a real function f such that $a_n = f(n)$, $a_n$ is increasing, but f is not increasing. Give a
similar example in the case when $a_n$ is bounded but f is not.
Sequences without Elementary Relations: There are sequences which have (a priori) no
relationship to functions on $\mathbb{R}$, and hence must really be thought of purely in terms of a function
on $\mathbb{N}$. The prototypical example of a sequence with no $\mathbb{R}$-function equivalent$^{26}$ is the sequence
$a_n = n!$, whose first few terms are given by
\[ a_1 = 1,\ a_2 = 2,\ a_3 = 6,\ a_4 = 24,\ a_5 = 120,\ a_6 = 720,\ a_7 = 5040, \ldots. \]
Another class of examples are those sequences which are defined recursively. For example, defining
$a_1 = 1$ and $a_{n+1} = \sqrt{a_n + 1}$ gives
\[ a_1 = 1,\ a_2 = \sqrt{2},\ a_3 = \sqrt{\sqrt{2} + 1},\ a_4 = \sqrt{\sqrt{\sqrt{2} + 1} + 1}, \ldots \]
or equivalently
\[ a_1 = 1,\ a_2 = \sqrt{1 + 1},\ a_3 = \sqrt{1 + \sqrt{1 + 1}},\ a_4 = \sqrt{1 + \sqrt{1 + \sqrt{1 + 1}}}, \ldots \]
A rather famous example is given by the Fibonacci sequence. Define $a_0 = a_1 = 1$ and set
$a_n = a_{n-1} + a_{n-2}$ for $n \geq 2$. The first few terms of this sequence are given by
\[ 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, \ldots. \]
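Recursively defined sequences are natural to compute term by term. The sketch below generates the nested-radical sequence, assuming the recursion $a_1 = 1$, $a_{n+1} = \sqrt{1 + a_n}$ read off from the listed terms, together with the Fibonacci sequence with $a_0 = a_1 = 1$; the function names are illustrative.

```python
import math


def nested_radical_terms(count):
    """First `count` terms of a_1 = 1, a_{n+1} = sqrt(1 + a_n)."""
    terms = [1.0]
    while len(terms) < count:
        terms.append(math.sqrt(1.0 + terms[-1]))
    return terms


def fibonacci_terms(count):
    """First `count` terms of a_0 = a_1 = 1, a_n = a_{n-1} + a_{n-2}."""
    terms = [1, 1]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


print(fibonacci_terms(11))
print(nested_radical_terms(40)[-1], (1 + math.sqrt(5)) / 2)
```

Numerically the nested-radical terms settle near $(1 + \sqrt{5})/2$, the golden ratio.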
26 There is a function-equivalent of the factorial sequence, though it is not quite amenable to the type of analysis
we would like to apply. The function is given by the improper integral
\[ a_n = \Gamma(n + 1) = \int_0^\infty x^n e^{-x}\,dx. \]
9.2 Limits of Sequences
Treating sequences as functions, and especially realizing the analogy with real functions, allows
us to talk about the notion of convergence of a sequence. Roughly speaking, we will say that a
sequence converges if it behaves as though the function $a : \mathbb{N} \to \mathbb{R}$ has a horizontal asymptote. If
we have this idea in our mind, we can immediately adapt the notion of a horizontal asymptote for
real functions to sequences:
Definition 9.4
or
$(a_n) \to L$.
Example 9.5
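Show that the sequence $a_n = \frac{1}{n}$ converges to 0.
Solution. We proceed in precisely the same fashion as we would do for a function. Let $\epsilon > 0$ be
given, and choose a positive integer M such that $\frac{1}{M} < \epsilon$. Then if $k \geq M$ we have
\[ |a_k - 0| = \frac{1}{k} \leq \frac{1}{M} < \epsilon, \]
so $(a_n) \to 0$ as required.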
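The choice of M in this argument can be made concrete. Below is a minimal sketch in which `threshold` (a hypothetical helper) returns one positive integer M with $1/M < \epsilon$, and the loop confirms that the next thousand terms stay within $\epsilon$ of 0.

```python
import math


def threshold(eps):
    """Return a positive integer M satisfying 1/M < eps."""
    return math.floor(1.0 / eps) + 1


for eps in [0.5, 0.1, 0.003]:
    M = threshold(eps)
    # Every checked term a_k = 1/k with k >= M lies within eps of the limit 0.
    within_band = all(abs(1.0 / k - 0.0) < eps for k in range(M, M + 1000))
    print(eps, M, within_band)
```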
Exercise:
Show that a sequence $(a_n)$ converges if and only if every subsequence $(a_{n_k})$ converges.
Many of the Limit Laws we saw in Theorem 2.21 hold for sequences as well:
Theorem 9.6: Limit Laws for Sequences
Proof. The proofs of these are almost identical to those of Theorem 2.21. We will prove (1) and
leave the remainder as an exercise.
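Assume that $(a_n) \to L$ and $(b_n) \to M$. Let $\epsilon > 0$ be given and choose $N_1, N_2 \in \mathbb{N}$ such that
if $k \geq N_1$ then $|a_k - L| < \frac{\epsilon}{2}$ and if $k \geq N_2$ then $|b_k - M| < \frac{\epsilon}{2}$. Let $N = \max\{N_1, N_2\}$, so that if
$k \geq N$ then
\[ |a_k + b_k - (L + M)| \leq |a_k - L| + |b_k - M| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]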
The Squeeze Theorem also carries over directly:
Theorem 9.7: Squeeze Theorem for Sequences
Assume that for sufficiently large k we have $a_k \leq b_k \leq c_k$. If $(a_n) \to L$ and $(c_n) \to L$, then
$(b_n)$ is also convergent with limit L.
Once again, the proof of this theorem is almost identical to that of Theorem 2.34.
The two main theorems that we want to prove again follow almost identically from those
which we have already proven:
Theorem 9.8
Proof. The idea is that because the sequence converges, all but finitely many terms of the
sequence must lie in any $\epsilon$-band of its limit point. Since there are only finitely many other terms,
the sequence cannot grow infinitely large.
More formally, let L be the limit point of $(a_n)$, and choose $N \in \mathbb{N}$ such that for all $k \geq N$ we
have $|a_k - L| < 1$. In particular, this means that $L - 1 < a_k < L + 1$, so that the sequence is always
bounded after N. Now set
\[ M = \max\{|a_1|, \ldots, |a_N|, |L + 1|, |L - 1|\}. \]
Certainly if $k \leq N$ then $|a_k| \leq M$ by definition of M, and if $k \geq N$ then $|a_k| < M$ by our previous
line.
The following theorem is often referred to as the Monotone Convergence Theorem and was mentioned in Section 8.2.1. It is important enough to be worth proving again explicitly for sequences,
but the diligent student will immediately recognize that the proof is almost identical to that of
Lemma 8.13.
Theorem 9.9: Monotone Convergence Theorem
If $(a_n)$ is bounded from above and non-decreasing, then $(a_n)$ is convergent with its limit
given by $\sup_{n\in\mathbb{N}} a_n$.
Proof. Let $L = \sup_n a_n$ and let $\epsilon > 0$ be given. By Proposition 5.4 we know that there exists some
$M \in \mathbb{N}$ such that
\[ L - \epsilon < a_M \leq L. \]
Since $(a_n)$ is non-decreasing, for every $k \geq M$ we have $L - \epsilon < a_M \leq a_k \leq L$, so that $|a_k - L| < \epsilon$.
Example 9.10
Show that the sequence $a_n = \frac{2^n}{n!}$ is convergent.
Solution. We have
\[ \frac{a_{n+1}}{a_n} = \frac{2^{n+1}}{(n+1)!}\cdot\frac{n!}{2^n} = \frac{2}{n+1}, \]
so that if $n \geq 2$ we have $a_{n+1} < a_n$ and the sequence is decreasing. It is easy to see that $a_n$ is
always positive, and so by the Monotone Convergence Theorem we know that $(a_n)$ converges.
Example 9.11
Determine the limit, if it exists, of the sequence defined recursively by $a_1 = 1$ and $a_{n+1} = \frac{n}{2n+1}a_n$.
Solution. We begin by noticing that this sequence is decreasing, since $\frac{a_{n+1}}{a_n} = \frac{n}{2n+1} < 1$ which
implies that $a_{n+1} < a_n$. Furthermore, the sequence is bounded below by 0; that is, we claim
that every $a_n$ is positive. This follows quite quickly by induction, for clearly $a_1 = 1 > 0$ and
$a_{n+1} = \frac{n}{2n+1}a_n > 0$ by the induction hypothesis. By (Corollary to) the Monotone Convergence
Theorem, $(a_n)$ converges, say to L. Now taking the limit as $n \to \infty$ of the equation $a_{n+1} = \frac{n}{2n+1}a_n$
we get
\[ L = \frac{1}{2}L, \]
so that $L = 0$.
sequence is $L = \frac{1 + \sqrt{5}}{2}$: the golden ratio.
Exercise:
Figure 27: If $(a_n) \to L$, then by going far enough into our sequence (blue) we can guarantee that
we will be in a $\delta$-neighbourhood of L. The images of these points are the $f(a_n)$ (brown), which live
in the desired $\epsilon$-neighbourhood because of the continuity of f.
9.2.1 Continuous Functions
There are a plethora of ways of defining continuous functions, and now we have the tools to introduce
yet another.
Theorem 9.12
Proof. Assume that f is continuous, and let $(a_n) \to L$. We want to show that $(f(a_n)) \to f(L)$. Let
$\epsilon > 0$ be given. Since f is continuous, there exists a $\delta > 0$ such that for each x satisfying $|x - L| < \delta$
we have $|f(x) - f(L)| < \epsilon$. Since $(a_n)$ is convergent, there exists an $N \in \mathbb{N}$ such that for all $n \geq N$
we have $|a_n - L| < \delta$. Combining these, we see that whenever $n \geq N$ we have
\[ |a_n - L| < \delta, \quad\text{and so}\quad |f(a_n) - f(L)| < \epsilon, \]
which is exactly what we want to show.
Conversely, assume that f is not continuous, say at c. Hence there exists an $\epsilon > 0$ such that
for any $\delta > 0$ there is an x such that $|x - c| < \delta$ and $|f(x) - f(c)| \geq \epsilon$. For each $\delta_n = \frac{1}{n}$, choose
an element $x_n$ satisfying $|x_n - c| < \delta_n$ and $|f(x_n) - f(c)| \geq \epsilon$. Then $(x_n) \to c$ but $f(x_n)$ does not
converge to $f(c)$.
9.3 Infinite Series
Section 5.2 introduced sigma notation as a convenient way of writing down finite sums; namely,
\[ \sum_{i=m}^{n} a_i = a_m + a_{m+1} + \cdots + a_{n-1} + a_n. \]
The summand here conveniently looks like a sequence, though with finitely many terms. In fact,
we will be extending the notion of the finite sum above to an infinite sum, called an infinite series.
Definition 9.13
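Let $(a_n)_{n=1}^{\infty}$ be a sequence. Define the $n$th partial sum of $(a_n)$ to be
\[ S_n = \sum_{k=1}^{n} a_k. \]
We say that the series $\sum_{n=1}^{\infty} a_n$ converges if the sequence $(S_n)$ converges, in which case we write
\[ \sum_{n=1}^{\infty} a_n = \lim_{n\to\infty} S_n. \]
For example, the partial sums of the sequence $a_n = \frac{1}{n}$ are
\[ S_1 = 1,\quad S_2 = 1 + \frac{1}{2} = \frac{3}{2},\quad S_3 = S_2 + \frac{1}{3} = \frac{11}{6},\quad S_4 = S_3 + \frac{1}{4} = \frac{25}{12}, \ldots. \]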
It is not at all clear from looking at the partial sums whether or not this sequence converges.
Shortly, we will analyze techniques that will allow us to determine whether series converge (though
we will rarely be able to determine the limit to which the series converges).
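The partial sums above are easy to reproduce exactly with rational arithmetic. This is a small sketch for the terms $a_k = 1/k$; the function name `partial_sums` is illustrative, and it uses the relation $S_n = S_{n-1} + a_n$ directly.

```python
from fractions import Fraction


def partial_sums(count):
    """Exact partial sums S_1, ..., S_count of the terms a_k = 1/k."""
    sums = []
    total = Fraction(0)
    for k in range(1, count + 1):
        total += Fraction(1, k)  # S_k = S_{k-1} + a_k
        sums.append(total)
    return sums


for n, s in enumerate(partial_sums(8), start=1):
    print(n, s, float(s))
```

The printed values grow slowly, which illustrates why inspecting a few partial sums is not enough to decide convergence.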
Here is an important distinction to make, as students often confuse sequences and series:
A sequence is a collection of elements $(a_n)_{n=1}^{\infty}$. Note that $a_n$ need not be in any way related
to $a_{n-1}$. The analogy to keep in mind is that sequences behave like functions.
A series is a sum of elements of a sequence, with the partial sums being related via $S_n =
S_{n-1} + a_n$. The analogy to keep in mind is that a series is like an improper integral of a
function.
This analogy between sequences/series and functions/integrals is an important one to keep in mind.
Since the convergence of a series is defined in terms of the limits of its partial sums (a sequence),
the limit laws for sequences immediately give us the following result:
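Theorem 9.14
Let $\sum_{k=1}^{\infty} a_k$ and $\sum_{k=1}^{\infty} b_k$ be convergent series.
1. $\sum_{k=1}^{\infty}(a_k + b_k) = \sum_{k=1}^{\infty} a_k + \sum_{k=1}^{\infty} b_k$.
2. For any $\alpha \in \mathbb{R}$, $\sum_{k=1}^{\infty}(\alpha a_k) = \alpha\sum_{k=1}^{\infty} a_k$.
Proof. We prove (1). Let $s_n$ and $t_n$ be the $n$th partial sums of $\sum_{k=1}^{\infty} a_k$ and $\sum_{k=1}^{\infty} b_k$ respectively. Then
\[ \sum_{k=1}^{n}(a_k + b_k) = \sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k = s_n + t_n. \]
Taking the limit as $n \to \infty$ and using the limit laws for sequences gives
\[ \sum_{k=1}^{\infty}(a_k + b_k) = \sum_{k=1}^{\infty} a_k + \sum_{k=1}^{\infty} b_k. \]
Theorem 9.15
If $\sum_{k=1}^{\infty} a_k$ converges, then $(a_k) \to 0$.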
Proof. Let $(S_n)$ be the partial sums, so that $(S_n) \to L$ for some limit L. Since $S_n = S_{n-1} + a_n$ we
have $a_n = S_n - S_{n-1}$, so that $(a_n) \to L - L = 0$.
The contrapositive of this is exactly what we mentioned above: If $(a_k) \not\to 0$ then $\sum_{k=1}^{\infty} a_k$
does not converge. However, the converse of Theorem 9.15 is not true. We do not yet have the
techniques to show that this is the case, but the series
\[ \sum_{k=1}^{\infty}\frac{1}{k} \]
does not converge, even though $\frac{1}{k} \to 0$ as $k \to \infty$.
9.3.1
Geometric Series: A geometric series is a series defined by a sequence $(a_n)$ where $a_n = ra_{n-1}$
for some $r \in \mathbb{R}$. For example,
\[ a_0 = 1,\ a_1 = \frac{1}{2},\ a_2 = \frac{1}{4},\ a_3 = \frac{1}{8},\ a_4 = \frac{1}{16}, \ldots \]
satisfies the relation $a_n = \frac{1}{2}a_{n-1}$. We can write such series as
\[ \sum_{k=0}^{\infty} r^k. \]
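Theorem 9.16
If $|r| < 1$ then $\sum_{k=0}^{\infty} r^k = \frac{1}{1-r}$, and the series diverges otherwise.
Proof. Let $s_n = \sum_{k=0}^{n} r^k$ be the $n$th partial sum. Then
\[ (1 - r)s_n = \left(1 + r + r^2 + \cdots + r^n\right) - \left(r + r^2 + r^3 + \cdots + r^n + r^{n+1}\right) = 1 - r^{n+1} \]
so that $s_n = \frac{1 - r^{n+1}}{1 - r}$. Now in the limit as $n \to \infty$ we have that $r^{n+1} \to 0$, so
\[ \sum_{k=0}^{\infty} r^k = \lim_{n\to\infty}\frac{1 - r^{n+1}}{1 - r} = \frac{1}{1 - r}. \]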
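The closed form for the partial sums, and their convergence to $1/(1-r)$ when $|r| < 1$, can be checked by direct summation. This is a minimal sketch; the function names and the sample ratios are illustrative choices.

```python
def geometric_partial_sum(r, n):
    """Directly sum r^0 + r^1 + ... + r^n."""
    return sum(r**k for k in range(n + 1))


def closed_form(r, n):
    """The formula (1 - r^(n+1)) / (1 - r) for the same partial sum (r != 1)."""
    return (1 - r ** (n + 1)) / (1 - r)


for r in [0.5, -0.5, 0.9]:
    for n in [5, 20, 200]:
        # The last column is the claimed limit 1/(1 - r).
        print(r, n, geometric_partial_sum(r, n), closed_form(r, n), 1 / (1 - r))
```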
An interesting if not sophistic argument that proves the same result is to see that
\[ (1 - r)(1 + r + r^2 + r^3 + \cdots) = 1 + r + r^2 + r^3 + \cdots - r - r^2 - r^3 - \cdots = 1. \]
In an entirely formal sense (that is, treating r purely as a symbol without assigning it any value)
we see that $(1 - r)$ is the multiplicative inverse of $(1 + r + r^2 + \cdots)$, giving the desired result as
well. Determining from this which values of r actually make sense is something that we will see in
a later chapter.
Telescoping Series: Telescoping series are ones in which many of the internal summands cancel
one another out. For example, a telescoping sum arises when we consider the Riemann
sum of a constant function. If $f(x) = c$ and $P = \{a = x_0, x_1, \ldots, x_{n-1}, x_n = b\}$ is a partition, then
the Riemann sum is
\[ U_f(P) = \sum_{i=1}^{n} f(x_i)(x_i - x_{i-1}) = c\sum_{i=1}^{n}(x_i - x_{i-1}). \]
Notice what happens if we expand out a few terms of the sum: we get
\[ (x_1 - x_0) + (x_2 - x_1) + \cdots + (x_n - x_{n-1}) = x_n - x_0 = b - a. \]
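Determine the value of $\sum_{k=1}^{\infty}\frac{1}{k(k+1)}$ if it converges.
Solution. The trick is to realize our sum as a telescoping series. Using partial fractions, we get that
\[ \frac{1}{k(k+1)} = \frac{1}{k} - \frac{1}{k+1} \]
so that the $n$th partial sum is given by
\[ S_n = \sum_{k=1}^{n}\frac{1}{k(k+1)} = \sum_{k=1}^{n}\left(\frac{1}{k} - \frac{1}{k+1}\right) \]
\[ = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \cdots + \left(\frac{1}{n-1} - \frac{1}{n}\right) + \left(\frac{1}{n} - \frac{1}{n+1}\right) \]
\[ = 1 - \frac{1}{n+1} = \frac{n}{n+1}. \]
Hence
\[ \sum_{k=1}^{\infty}\frac{1}{k(k+1)} = \lim_{n\to\infty}\sum_{k=1}^{n}\frac{1}{k(k+1)} = \lim_{n\to\infty}\frac{n}{n+1} = 1. \]
9.4 Convergence Tests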
The analogy between infinite series and improper integrals continues in this section with the introduction of the basic and limit comparison test. Later, we will develop several more tests which
allow us to infer convergence of infinite series.
9.4.1 Comparison Tests
In section 8.2 we saw the Basic Comparison Test (Theorem 8.14) and Limit Comparison Test
(Theorem 8.17) for improper integrals. The statement and proof of these tests for series are almost
identical. We will give the statement, but leave the proofs as an exercise:
Theorem 9.18: Basic Comparison Test for Series
Suppose that $(a_n)$ and $(b_n)$ are sequences such that $0 \leq a_k \leq b_k$ for sufficiently large k.
1. If $\sum_k b_k$ converges, then $\sum_k a_k$ converges.
2. If $\sum_k a_k$ diverges, then $\sum_k b_k$ diverges.
The idea here is precisely the same as that of Theorem 8.14: If the larger of the two series
converges, it forces the smaller series to also converge. In the opposite direction, if the smaller
series diverges, then the larger series must also diverge.
Example 9.19
Show that $\sum_{n=1}^{\infty}\frac{7^n}{8^n + 2}$ converges.
Solution. By considering the terms which grow fastest in this expression, we suspect that the 2
in the denominator should not affect the long term behaviour of the summand. We thus have the
comparison
\[ \frac{7^n}{8^n + 2} \leq \frac{7^n}{8^n} = \left(\frac{7}{8}\right)^n. \]
Now the series $\sum_n \left(\frac{7}{8}\right)^n$ is a geometric series with ratio less than 1, and hence converges. We
conclude by the Basic Comparison Test that $\sum_n \frac{7^n}{8^n + 2}$ converges as well.
Theorem 9.20: Limit Comparison Test for Series
If $(a_n)$ and $(b_n)$ are sequences with positive terms, and $\lim_{k\to\infty}\frac{a_k}{b_k} = L$ for some $0 < L < \infty$,
then
\[ \sum_k a_k \text{ converges} \quad\Longleftrightarrow\quad \sum_k b_k \text{ converges.} \]
The fact that $a_k/b_k$ converges to some finite, non-zero number means that $a_k$ and $b_k$ grow at
approximately the same speed. Hence convergence or divergence of one will immediately imply
convergence/divergence of the other.
Example 9.21
Determine whether $\sum_k \frac{k^{10} + 25k^7 + 1}{k^{12} - 20}$ converges or diverges.
Solution. As mentioned above, we only care about the terms which grow the fastest, which means if
we want to guess we just consider the biggest terms in each of the numerator and the denominator.
Those terms are $\frac{k^{10}}{k^{12}} = \frac{1}{k^2}$ and since we know that the series $\sum_k \frac{1}{k^2}$ converges, it seems likely that so
too will our series above. Indeed, this actually gives us the other series we should use in our Limit
Comparison Test. Notice that
\[ \lim_{k\to\infty}\frac{1/k^2}{\frac{k^{10} + 25k^7 + 1}{k^{12} - 20}} = \lim_{k\to\infty}\frac{k^{12} - 20}{k^{12} + 25k^9 + k^2}\cdot\frac{1/k^{12}}{1/k^{12}} = \lim_{k\to\infty}\frac{1 - 20/k^{12}}{1 + 25/k^3 + 1/k^{10}} = 1. \]
Thus the Limit Comparison Test tells us that the series converges.
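The limit of the ratio can be watched numerically. The sketch below uses exact rational arithmetic to avoid round-off for large k; the names `a`, `b` and `ratio` are illustrative, and k starts at 2 so that the denominator $k^{12} - 20$ is positive.

```python
from fractions import Fraction


def a(k):
    """Term of the series in question: (k^10 + 25k^7 + 1) / (k^12 - 20)."""
    return Fraction(k**10 + 25 * k**7 + 1, k**12 - 20)


def b(k):
    """Comparison term 1/k^2."""
    return Fraction(1, k**2)


def ratio(k):
    """The quotient b_k / a_k appearing in the Limit Comparison Test."""
    return b(k) / a(k)


for k in [2, 10, 100, 10_000]:
    print(k, float(ratio(k)))
```

The printed ratios approach 1, consistent with the limit computed by hand.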
Example 9.22
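Suppose that $(a_n)$ is a sequence of positive terms such that $\sum \frac{1}{a_n}$ converges, and let $M > 0$.
Show that $\sum \frac{1}{a_n + M}$ converges.
Solution. Since constants neither shrink nor grow, we do not expect them to affect convergence of
the series, and we can use both the Basic Comparison Test and the Limit Comparison Test.
We realize that $a_n < a_n + M$ so that $\frac{1}{a_n} > \frac{1}{a_n + M}$. Thus $\sum \frac{1}{a_n + M} < \sum \frac{1}{a_n}$ and $\sum \frac{1}{a_n}$ converges,
so $\sum \frac{1}{a_n + M}$ will also converge.
On the other hand, by using the Limit Comparison Test we can actually extend this result to
work for any $M \in \mathbb{R}$. Recall that since $\sum \frac{1}{a_n}$ converges, it must be the case that $\frac{1}{a_n} \to 0$ as $n \to \infty$.
We then compute
\[ \lim_{n\to\infty}\frac{1/a_n}{1/(a_n + M)} = \lim_{n\to\infty}\frac{a_n + M}{a_n} = \lim_{n\to\infty}\left(1 + \frac{M}{a_n}\right) = 1. \]
The Limit Comparison Test then tells us that $\sum \frac{1}{a_n + M}$ converges.
9.4.2 Non-Comparison Tests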
The following tests are, in a sense, more appealing than the above tests, since they do not require
us to finagle a sequence against which to compare.
The Integral Test: This test takes our series/integral analogy to the next level, by allowing
us to directly compare them.
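Theorem 9.23: Integral Test
If $f$ is a continuous, positive, decreasing function on $[1,\infty)$, then $\sum_{k=1}^\infty f(k)$ converges if and only if $\int_1^\infty f(x)dx$ converges.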
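Proof. Since $f$ is continuous, it is integrable on every subinterval $[1, b]$ of $[1, \infty)$; that is, for any partition $P$ of $[1, b]$ we have
$$L_f(P) \le \int_1^b f(x)dx \le U_f(P).$$
Consider the interval $[1, n]$ and the partition $P = \{1, 2, 3, \ldots, n\}$ consisting of $n-1$ equal subintervals, each of length 1. Since our function is decreasing, we know that
$$\min_{x\in[i,i+1]} f(x) = f(i + 1), \qquad \max_{x\in[i,i+1]} f(x) = f(i), \qquad i = 1, \ldots, n-1$$
so that $f(i + 1) \le f(x) \le f(i)$ for all $x \in [i, i + 1]$. In terms of upper and lower Riemann sums, we get
$$\sum_{k=2}^{n} f(k) = L_f(P) \le \int_1^n f(x)dx \le U_f(P) = \sum_{k=1}^{n-1} f(k).$$
Taking the limit as $n \to \infty$, the right-hand inequality gives
$$\int_1^\infty f(x)dx = \lim_{n\to\infty} \int_1^n f(x)dx \le \lim_{n\to\infty} \sum_{k=1}^{n-1} f(k) = \sum_{k=1}^\infty f(k)$$
which tells us that if $\sum_k f(k)$ converges, then so too does $\int_1^\infty f(x)dx$. Conversely, the other inequality shows us that
$$\sum_{k=2}^\infty f(k) = \lim_{n\to\infty} \sum_{k=2}^{n} f(k) \le \lim_{n\to\infty} \int_1^n f(x)dx = \int_1^\infty f(x)dx.$$
Thus if $\int_1^\infty f(x)dx$ converges, then so too does $\sum_k f(k)$.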
Example 9.24
Consider the series whose terms are given by $a_n = \frac{1}{n}$. Determine the convergence/divergence of this series.
Solution. By now, it is my hope that you have seen this example an innumerable number of times. This is what is called the Harmonic Series and is the classic example of how to trick first year students into making mistakes. It is clear that $a_n \to 0$ as $n \to \infty$, but $\sum a_n$ diverges. Indeed, the integral test tells us that $\sum \frac{1}{n}$ converges if and only if $\int_1^\infty \frac{1}{x}dx$ converges, but
$$\int_1^\infty \frac{1}{x}dx = \log(x)\Big|_1^\infty = \infty.$$
However, regardless of how many times I emphasize the point that "Just because the sequence converges to zero, does not mean the series converges" there are countless students who get this wrong! Do not make the same mistake, and learn this classic example immediately.
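To see the divergence numerically, the following sketch computes partial sums of the harmonic series and compares them with the bounds $\log(n+1) \le \sum_{k=1}^n \frac{1}{k} \le 1 + \log(n)$ that come out of the Riemann sum inequalities in the proof of the Integral Test. The function name `harmonic_partial_sum` is our own choice for illustration.

```python
import math

def harmonic_partial_sum(n):
    """Return the n-th partial sum 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 1000, 100000):
    s = harmonic_partial_sum(n)
    # Integral test bounds: log(n + 1) <= H_n <= 1 + log(n)
    print(n, s, math.log(n + 1), 1 + math.log(n))
```

The partial sums keep growing (roughly like $\log(n)$) even though the terms $\frac{1}{n}$ tend to zero.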
Example 9.25
Determine whether the series $\sum_n e^{-n}$ converges.
Solution. Here, the terms of the series are given by the function $f(x) = e^{-x}$. Applying the integral test, we have
$$\int_1^\infty e^{-x}dx = -e^{-x}\Big|_1^\infty = \frac{1}{e}$$
so $\sum e^{-n}$ converges. Note that since the index occurs as a power, we can also apply the root test. Indeed, setting $a_n = e^{-n}$ we have that $\sqrt[n]{a_n} = e^{-1}$. Since $e^{-1} < 1$ it then follows that the series converges.
Ratio Test: One of the most powerful tests, the ratio test is somewhat similar to the Limit Comparison Test at first glance, but only uses sequential terms of the sequence itself.
Theorem 9.26: Ratio Test
Let $(a_n)$ be a sequence such that $\left|\frac{a_{k+1}}{a_k}\right| \to L$ as $k \to \infty$.
1. If $L < 1$ then $\sum_{k=1}^\infty a_k$ converges,
2. If $L > 1$ then $\sum_{k=1}^\infty a_k$ diverges.
Proof. The idea of the proof is as follows: If the ratio converges to $L \neq 1$, then for sufficiently large $N$ we can treat $a_{k+1} = La_k$ for all $k \ge N$. At this point the series effectively behaves like a geometric series $a_N \sum_i L^i$, which will converge if $L < 1$ and diverge if $L > 1$.
Assume that $L < 1$ and choose some $\ell$ such that $L < \ell < 1$. Since the ratio converges to $L$, we know there exists some $N \in \mathbb{N}$ such that if $k \ge N$ then $|a_{k+1}/a_k| < \ell$ (convince yourself of this, using $\epsilon$-$N$ if necessary). In particular, this means that $|a_{k+1}| < \ell|a_k|$. In particular,
$$|a_{N+1}| < \ell|a_N|, \quad |a_{N+2}| < \ell|a_{N+1}| < \ell^2|a_N|, \quad \ldots, \quad |a_{N+k}| < \ell^k|a_N|.$$
Using the Basic Comparison Test, we have $|a_{N+k}| \le \ell^k|a_N|$, with the right-hand-side being a geometric series with common ratio $\ell < 1$ and hence converging. We conclude that $\sum_k |a_k|$ converges as well, so $\sum_k a_k$ converges.
Precisely the same reasoning holds for (2), though this train of argument cannot be used when $L = 1$.
Example 9.27
Determine whether the series $\sum_k \frac{2^k k!}{k^k}$ converges or diverges.
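Solution. We might be tempted to use the root test here but the presence of the factorial term suggests that we should use the ratio test instead. Let $a_k = \frac{2^k k!}{k^k}$ so that
$$\begin{aligned}
\lim_{k\to\infty} \frac{a_{k+1}}{a_k} &= \lim_{k\to\infty} \frac{2^{k+1}(k+1)!}{(k+1)^{k+1}} \cdot \frac{k^k}{2^k k!} \\
&= \lim_{k\to\infty} \frac{2^{k+1}}{2^k} \cdot \frac{(k+1)!}{k!} \cdot \frac{k^k}{(k+1)^{k+1}} \\
&= 2\lim_{k\to\infty} \frac{k^k (k+1)}{(k+1)^{k+1}} \\
&= 2\lim_{k\to\infty} \left(\frac{k}{k+1}\right)^k \\
&= \frac{2}{e}
\end{aligned}$$
where the last line is an exercise in L'Hôpital's rule. Since $\frac{2}{e} < 1$, by the Ratio Test we then know that the series converges.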
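As a numerical sanity check on this limit, the sketch below evaluates the simplified ratio $\frac{a_{k+1}}{a_k} = 2\left(\frac{k}{k+1}\right)^k$ for increasing $k$ and compares it with $\frac{2}{e}$. The helper name `ratio` is hypothetical, and the simplified form is used so that we never need to evaluate the enormous numbers $k!$ and $k^k$ directly.

```python
import math

def ratio(k):
    """a_{k+1} / a_k for a_k = 2^k k! / k^k, simplified to 2 (k / (k + 1))^k."""
    return 2.0 * (k / (k + 1)) ** k

for k in (1, 10, 100, 10000):
    print(k, ratio(k))
print("2/e =", 2 / math.e)
```

The printed ratios decrease toward $\frac{2}{e}$, which is less than 1, in agreement with the conclusion of the Ratio Test.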
Root Test: The root test is similar in both statement and proof to the ratio test, but is far less
useful. I will thus present the statement and an example, but leave the proof as an exercise.
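Theorem 9.28: The Root Test
Let $(a_n)$ be a sequence such that $|a_k|^{1/k} \to L$ as $k \to \infty$.
1. If $L < 1$ then $\sum_k a_k$ converges,
2. If $L > 1$ then $\sum_k a_k$ diverges.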
Example 9.29
Determine whether the series $\sum_k \frac{k^k}{3^{k^2}}$ converges or diverges.
Solution. Since everything involves the index in the power, we should probably apply the root test. Indeed, set $a_k = \frac{k^k}{3^{k^2}}$ so that $\sqrt[k]{a_k} = \frac{k}{3^k}$. It is then clear (using L'Hôpital's rule) that $\sqrt[k]{a_k} \to 0$ as $k \to \infty$. By the root test, we then have that the series converges.
Example 9.30
Determine whether the series $\sum_k \frac{3^k+k^9}{k^k}$ converges or diverges.
Solution. This one is a bit tricky, as it will require us to use two tests. First of all, we notice that the terms which grow the fastest are given by $\frac{3^k}{k^k}$, and so we use the Limit Comparison Test to see that
$$\lim_{k\to\infty} \frac{3^k + k^9}{k^k} \cdot \frac{k^k}{3^k} = \lim_{k\to\infty} \left(1 + \frac{k^9}{3^k}\right) = 1$$
where the last equality is an exercise in L'Hôpital's rule. Thus $\sum \frac{3^k+k^9}{k^k}$ converges if and only if $\sum \frac{3^k}{k^k}$ converges. Setting $a_k = \frac{3^k}{k^k}$ the root test tells us that
$$\lim_{k\to\infty} \sqrt[k]{\frac{3^k}{k^k}} = \lim_{k\to\infty} \frac{3}{k} = 0$$
so $\sum \frac{3^k}{k^k}$ converges, and hence so does our original series.
9.5 Kinds of Convergence
Alternating Series: If $(a_n)$ is a sequence of positive numbers, an alternating series is any series of the form $\sum_k (-1)^k a_k$. Luckily, there is a very simple test to determine whether an alternating series converges:
Theorem 9.31: Alternating Series Test
If $(a_k)$ is a decreasing sequence of positive numbers, then $\sum_k (-1)^k a_k$ converges if and only if $a_k \to 0$.
The proof of this theorem is not terribly enlightening, so we omit it and refer the reader to the
textbook.
Example 9.32
Determine whether $\sum_{k=1}^\infty \frac{(-1)^{k+1}}{k}$ converges.
Solution. By taking a negative sign out of the summation, our series is equivalent to $-\sum_k \frac{(-1)^k}{k}$, and this is an alternating series with $a_k = \frac{1}{k}$. These $a_k$ are decreasing and clearly limiting to zero, and hence by the alternating series test, the series converges.
Notice that the alternating series given in Example 9.32 is closely related to the Harmonic series
which we actually know does not converge, and is often called the alternating harmonic series. In
fact, we will (very luckily) be able to determine the exact value of the alternating harmonic series
after we have seen Taylor series.
Absolute Convergence: If $(a_n)$ is a sequence, the series $\sum a_k$ is said to converge absolutely if $\sum |a_k|$ converges. There is an important relationship between series which converge, and which converge absolutely.
Theorem 9.33: Absolute Convergence
If $\sum_k |a_k|$ converges then $\sum_k a_k$ converges; that is, all absolutely convergent series are convergent.
Proof. The crux of this proof follows from the fact that taking absolute values means that the terms of the series get larger, and hence are less likely to converge. In fact, we shall use the Basic Comparison Test to show exactly this. Notice that since $-|a_k| \le a_k \le |a_k|$ we have
$$0 \le a_k + |a_k| \le 2|a_k|.$$
The series formed by the terms on the right-hand-side is just $2\sum_k |a_k|$ and hence converges by assumption. By the Basic Comparison Test, it follows that the series $\sum_k (a_k + |a_k|)$ converges. Since $a_k = (a_k + |a_k|) - |a_k|$ the linearity of the series implies that
$$\sum_k a_k = \underbrace{\sum_k (a_k + |a_k|)}_{\text{finite}} - \underbrace{\sum_k |a_k|}_{\text{finite}} < \infty.$$
A natural question to ask at this point is whether the converse is true: Are all convergent series also absolutely convergent? The answer in this case is no: we saw in Example 9.32 that the
alternating harmonic series is convergent, while taking the absolute value gives the harmonic series
which is not convergent. Hence the alternating harmonic series is a convergent series which is not
absolutely convergent. We often call such series conditionally convergent.
Example 9.34
cos(k)
k k
converges or diverges.
Solution. Notice that cos(k) = (1)k+1 , so that the terms of our series are actually given by
k+1
10 Power Series
10.1 Taylor Series
We now begin our study of using infinite series to write down functions, a topic of both great interest and great application. The idea is to use an infinite polynomial, for which evaluating at a value $x = a$ will give the value of the function $f(a)$.
10.1.1
$$p_n(x) = \sum_{k=0}^{n} a_k x^k.$$
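Our goal is to choose the $a_k$ such that $p_n^{(k)}(0) = f^{(k)}(0)$. Starting this process, notice that
$$\begin{aligned}
p_n(0) &= a_0 \\
p_n'(0) &= a_1 \\
p_n''(0) &= 2a_2 \\
p_n^{(3)}(0) &= 3!a_3 \\
&\;\;\vdots \\
p_n^{(k)}(0) &= k!a_k.
\end{aligned}$$
Setting $p_n^{(k)}(0) = f^{(k)}(0)$ gives $a_k = \frac{f^{(k)}(0)}{k!}$, and our polynomial is thus
$$p_n(x) = f(0) + f'(0)x + \frac{f''(0)}{2}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}x^k.$$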
Now what happens if we want to perform this expansion around a more general point $a$ instead of just 0? Instead of doing more work, we just translate our $p_n$ above by the term $x - a$. More precisely, consider the polynomial
$$p_{n,a}(x) = a_n(x-a)^n + a_{n-1}(x-a)^{n-1} + \cdots + a_2(x-a)^2 + a_1(x-a) + a_0$$
for which we have $p_{n,a}^{(k)}(a) = k!a_k$. We would like to set this to $f^{(k)}(a)$, meaning that we should take $a_k = \frac{f^{(k)}(a)}{k!}$, so that
$$p_{n,a}(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k.$$
Solution. We are well familiar with the fact that $f^{(k)}(x) = e^x$ and so $f^{(k)}(0) = e^0 = 1$. Thus the Taylor polynomial is
$$p_{n,0}(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}x^k = \sum_{k=0}^{n} \frac{x^k}{k!}.$$
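The formula for $p_{n,a}(x)$ translates directly into code. The sketch below is a minimal illustration, assuming we are handed the list of derivative values $f(a), f'(a), \ldots, f^{(n)}(a)$; the function name `taylor_polynomial` and its argument layout are our own choices rather than anything standard.

```python
import math

def taylor_polynomial(derivatives_at_a, a, x):
    """Evaluate sum_k f^(k)(a) / k! * (x - a)^k for the given derivative values."""
    return sum(d / math.factorial(k) * (x - a) ** k
               for k, d in enumerate(derivatives_at_a))

# For f(x) = e^x about a = 0 every derivative equals e^0 = 1.
for n in (1, 3, 6, 10):
    approx = taylor_polynomial([1.0] * (n + 1), 0.0, 1.0)
    print(n, approx, abs(math.e - approx))
```

The printed error at $x = 1$ shrinks quickly as the order $n$ grows.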
Example 10.2
Solution. This requires that we determine a general form for the derivative $f^{(k)}(x)$. Checking our first few derivatives, we find that
$$\begin{aligned}
f(x) &= \log(1+x) & f(0) &= 0 \\
f'(x) &= \frac{1}{1+x} & f'(0) &= 1 \\
f''(x) &= -\frac{1}{(1+x)^2} & f''(0) &= -1 \\
f^{(3)}(x) &= \frac{2!}{(1+x)^3} & f^{(3)}(0) &= 2! \\
&\;\;\vdots \\
f^{(k)}(x) &= \frac{(-1)^{k-1}(k-1)!}{(1+x)^k}.
\end{aligned}$$
The veracity of this most general form can be checked by induction, and is left as an exercise for the student. Hence the $n$-th order Taylor polynomial is
$$p_{n,0}(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}x^k = \sum_{k=1}^{n} \frac{(-1)^{k-1}(k-1)!}{k!}x^k = \sum_{k=1}^{n} \frac{(-1)^{k-1}}{k}x^k.$$
Sums and Products of Taylor Polynomials: The great thing about Taylor polynomials is
that they behave exactly as we expect they should under addition and multiplication.
Proposition 10.3
Let f, g be functions with nth order Taylor polynomials pn,a (x) and qn,a (x) respectively.
1. The Taylor polynomial of f + g is pn,a (x) + qn,a (x).
2. The Taylor polynomial of f g is pn,a (x)qn,a (x) (and is of order 2n).
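Proof. 1. Since $\frac{d^n}{dx^n}\Big|_{x=a}[f + g] = f^{(n)}(a) + g^{(n)}(a)$, we have
$$\sum_{k=0}^{n} \frac{(f+g)^{(k)}(a)}{k!}(x-a)^k = \sum_{k=0}^{n} \left[\frac{f^{(k)}(a)}{k!}(x-a)^k + \frac{g^{(k)}(a)}{k!}(x-a)^k\right] = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k + \sum_{k=0}^{n} \frac{g^{(k)}(a)}{k!}(x-a)^k.$$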
2. The product is somewhat more difficult to discern. One can check via induction that the $n$-th derivative of $fg$ is
$$(fg)^{(n)}(a) = \sum_{k=0}^{n} \binom{n}{k} f^{(k)}(a)g^{(n-k)}(a), \qquad \binom{n}{k} = \frac{n!}{k!(n-k)!}.$$
(10.1)
k=0 `=0
r=0
n
n
XX
r=0 s=0
s=0
(x a)r+s
2n X
n
X
f (r) (a)g (kr) (a)
k=0 r=0
2n X
n
X
r!(k r)!
(x a)k
k =r+s
k=0 r=0
Comparing this to (10.1) we see that the equation are equal, giving the desired result.
Example 10.4
Solution. It will quickly become laborious for us to differentiate the function $f(x) = x\sin(x)$, and in fact it will be effectively impossible to derive a closed form expression for a generic $n$-th derivative. Instead, the Taylor polynomial of $x$ is just $x$ itself, so we need only compute the Taylor polynomial of $\sin(x)$.
We know that
$$\begin{aligned}
\sin(0) &= 0 \\
\frac{d}{dx}\Big|_{x=0} \sin(x) &= \cos(x)|_{x=0} = 1 \\
\frac{d^2}{dx^2}\Big|_{x=0} \sin(x) &= -\sin(x)|_{x=0} = 0 \\
\frac{d^3}{dx^3}\Big|_{x=0} \sin(x) &= -\cos(x)|_{x=0} = -1.
\end{aligned}$$
Since the derivatives of sine repeat with periodicity 4, this is all we need to compute. In particular though, we notice that only the odd terms will survive, and they will switch sign. By writing our odd numbers as $2k + 1$, the sign of the $(2k + 1)$-st derivative is $(-1)^k$ and we get the Taylor polynomial (of order $2n + 1$)
$$\sin(x) \approx \sum_{k=0}^{n} \frac{(-1)^k}{(2k+1)!}x^{2k+1}.$$
Multiplying by $x$, the Taylor polynomial of $x\sin(x)$ is then
$$x\sum_{k=0}^{n} \frac{(-1)^k}{(2k+1)!}x^{2k+1} = \sum_{k=0}^{n} \frac{(-1)^k}{(2k+1)!}x^{2k+2}.$$
Approximation Error: How well does this polynomial do at actually approximating f (x)? The
following theorem gives us the ability to talk about the Taylor remainder.
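Theorem 10.5: Taylor's Theorem
If $f$ is $(n+1)$-times differentiable on an open interval $U$ containing $a$, then for every $x \in U$ we can write $f(x) = p_{n,a}(x) + r_n(x)$, where the remainder is $r_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1}$ for some $c \in U$, called the Lagrange form of the remainder.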
The Lagrange form of the remainder in particular allows us to determine upper bounds on the error of using Taylor polynomials. Indeed, assume that $f^{(k+1)}$ is continuous. If we wish to approximate $f(x)$ on an interval $[a - d, a + d]$ for some $d > 0$, then by the Extreme Value Theorem we know that $|f^{(k+1)}|$ attains its max $M$ on $[a - d, a + d]$, and hence
$$|r_k(x)| = \frac{|f^{(k+1)}(c)|}{(k+1)!}|x-a|^{k+1} \le \frac{M}{(k+1)!}|x-a|^{k+1}$$
which can be computed explicitly.
Example 10.6
Let $\epsilon = 0.001$ and $f(x) = e^x$. Find the order of Taylor polynomial (about 0) necessary to ensure that one can approximate $e^x$ to within $\epsilon$ on $[-1, 1]$.
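Solution. The $n$-th order Taylor polynomial of $e^x$ about 0 is
$$p_n(x) = \sum_{k=0}^{n} \frac{x^k}{k!}.$$
Every derivative of $e^x$ is again $e^x$, and $\max_{x\in[-1,1]} e^x = e$, while $\max_{x\in[-1,1]} |x|^{n+1} = 1$. The Lagrange form of the remainder therefore gives $|r_n(x)| \le e\frac{|x|^{n+1}}{(n+1)!}$ and $|r_n(x)| \le \frac{e}{(n+1)!}$ on $[-1,1]$, so it suffices to have $\frac{e}{(n+1)!} < \epsilon$. The first $n$ for which this is true is $n = 6$, hence the 6-th order Taylor polynomial is required.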
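The search for the first such $n$ is easy to automate. The sketch below is only an illustration of the bound $\frac{e}{(n+1)!}$ used above; the helper names `remainder_bound`, `smallest_order` and `taylor_exp` are hypothetical, and the "worst error" is only sampled at a handful of points of $[-1,1]$ rather than maximized exactly.

```python
import math

def remainder_bound(n):
    """Upper bound e / (n + 1)! for |r_n(x)| on [-1, 1]."""
    return math.e / math.factorial(n + 1)

def smallest_order(eps):
    """Smallest n whose remainder bound is strictly below eps."""
    n = 0
    while remainder_bound(n) >= eps:
        n += 1
    return n

def taylor_exp(x, n):
    """n-th order Taylor polynomial of e^x about 0."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

n = smallest_order(0.001)
worst = max(abs(math.exp(x) - taylor_exp(x, n))
            for x in (-1.0, -0.5, 0.0, 0.5, 1.0))
print(n, remainder_bound(n), worst)
```

Here `smallest_order(0.001)` returns 6, and the sampled error of the 6-th order polynomial stays below the bound, as the theorem promises.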
10.2
Power Series
The idea of a power series is an exceptionally fundamental thing within many fields of mathematics, such as statistics, combinatorics, algebra, analysis, etc. The idea is to look at infinite series of the form
$$\sum_{k=0}^{\infty} a_k x^k$$
where the $a_k \in \mathbb{R}$ are just real numbers. The role of the $x^k$ is much more subtle. Sometimes the $x^k$ just serve as containers, indicating that we want degree $k$ information to be stored in its coefficients. This is especially useful for things such as generating functions, which are found in the description of moments (via the moment generating function) in statistics, and in the field of combinatorial enumeration. Other times they are designed to be purely abstract containers, such as in algebra.
Where we are going to be interested in power series was alluded to earlier: We know that the Taylor polynomial $p_n(x)$ corresponding to the function $f$ serves as an approximation for $f$. Taylor's Theorem then says that as we let $n$ get larger, the approximation becomes better and better. Does it make sense to take a limit as $n \to \infty$, in which case our infinite series would then just be the function?
We have seen that infinite series can present some tricky situations, in which we must be particularly careful with notions such as convergence. Let us start with some general discussion of power series, before moving on to Taylor series in particular.
If we want to think of $\sum_{k=0}^{\infty} a_k x^k$ as a function in $x$, we need to ask whether the series converges.$^{29}$ This will naturally depend on the value of $x$. For example, the power series
$$\sum_{k=0}^{\infty} x^k \qquad (10.2)$$
represents the geometric series with common ratio $x$. We already know that the geometric series will converge if and only if our ratio $x$ satisfies $|x| < 1$. This means that the function defined by (10.2) only converges if we input a number $x \in (-1, 1)$.
29
If we do not care about convergence of the power series, as is the case with generating functions, we say we are
dealing with formal power series.
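Theorem 10.7
If $f(x) = \sum_{k=0}^{\infty} a_k x^k$ is a power series, then there exists an $r$ with $0 \le r \le \infty$, called the radius of convergence, such that $f(x)$ converges for all $|x| < r$ and diverges for all $|x| > r$.$^a$
a Note that nothing has been said about what happens when $|x| = r$, since this actually differs dramatically depending on the function at hand.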
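Example 10.8
Determine the radius of convergence of the power series
$$\sum_{k=1}^{\infty} \frac{x^k}{k^2 2^k}.$$
Solution. We would like to determine the values of $x$ such that the series $\sum b_k$ defined by setting $b_k = \frac{|x|^k}{k^2 2^k}$ converges. Using the ratio test, we have
$$\lim_{k\to\infty} \frac{b_{k+1}}{b_k} = \lim_{k\to\infty} \frac{|x|^{k+1}}{(k+1)^2 2^{k+1}} \cdot \frac{2^k k^2}{|x|^k} = \frac{|x|}{2} \lim_{k\to\infty} \left(\frac{k}{k+1}\right)^2 = \frac{|x|}{2}.$$
The series therefore converges when $\frac{|x|}{2} < 1$ and diverges when $\frac{|x|}{2} > 1$, so the radius of convergence is 2.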
The footnote in Theorem 10.7 emphasized the fact that we do not know how to stipulate endpoint conditions for convergence on an interval. More precisely, we know that there exists an $r \ge 0$ so that a power series will converge on $(-r, r)$, but what about the points $r$ and $-r$ themselves? A brute force solution is to plug $\pm r$ into the power series and just check convergence directly. The Interval of Convergence is the whole interval (including the endpoints) on which the power series converges.
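Example 10.9
Find the interval of convergence of the power series
$$\sum_{k=1}^{\infty} \frac{3^k}{k}(2x-4)^k.$$
Solution. Set $a_k = \frac{3^k}{k}|2x-4|^k$ and apply the ratio test to get
$$\lim_{k\to\infty} \frac{a_{k+1}}{a_k} = \lim_{k\to\infty} \frac{3^{k+1}|2x-4|^{k+1}}{k+1} \cdot \frac{k}{3^k|2x-4|^k} = 3|2x-4| \lim_{k\to\infty} \frac{k}{k+1} = 3|2x-4|.$$
Setting $3|2x-4| < 1$ we equivalently get $|x-2| < \frac{1}{6}$, so our radius of convergence is $\frac{1}{6}$, and our interval so far is $\left(\frac{11}{6}, \frac{13}{6}\right)$.
To determine endpoint convergence, we plug these numbers directly into our series. At $x = \frac{11}{6}$ we get
$$\sum_{k=1}^{\infty} \frac{3^k}{k}\left(2 \cdot \frac{11}{6} - 4\right)^k = \sum_{k=1}^{\infty} \frac{3^k}{k}\left(-\frac{1}{3}\right)^k = \sum_{k=1}^{\infty} \frac{(-1)^k}{k}.$$
Since $\frac{1}{k}$ decreases to 0, this series converges by the Alternating Series Test. On the other hand, at $x = \frac{13}{6}$ we have
$$\sum_{k=1}^{\infty} \frac{3^k}{k}\left(2 \cdot \frac{13}{6} - 4\right)^k = \sum_{k=1}^{\infty} \frac{3^k}{k}\left(\frac{1}{3}\right)^k = \sum_{k=1}^{\infty} \frac{1}{k}$$
which is the harmonic series and hence diverges. Hence the interval of convergence includes only the left endpoint and is $\left[\frac{11}{6}, \frac{13}{6}\right)$.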
Taylor Series: By extending the idea of Taylor polynomials to power series, we get the definition
of the Taylor series of the function f about the point x = a. In particular, consider the following
result:
Theorem 10.10
Let $f$ be an infinitely differentiable function and consider the power series defined by
$$\sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k.$$
If $I$ is the interval of convergence of this power series, then for every $x \in I$ we have
$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k.$$
That is, on its interval of convergence, a function is equal to the value of its Taylor series.
We say that the function $f$ is analytic at the point $c$ if the Taylor series of $f$ at $c$ has a non-zero radius of convergence and converges to $f$ on some open interval around $c$. If $f$ is analytic at every $c \in \mathbb{R}$, we just say that $f$ is analytic.
One must be careful to realize that infinitely differentiable is not the same thing as analytic.$^{30}$ Define the function
$$f(x) = \begin{cases} e^{-1/x} & x > 0 \\ 0 & x \le 0. \end{cases}$$
This function is infinitely differentiable. However, it is not analytic at $x = 0$. With a bit of work, one can show that every derivative of $f$ at 0 is zero, so the Taylor series of $f$ at 0 is identically zero and does not agree with $f$ on any open interval around 0.
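1. $e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}$, interval of convergence $\mathbb{R}$.
2. $\sin(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!}$ and $\cos(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}$, interval of convergence $\mathbb{R}$.
3. $\frac{1}{1-x} = \sum_{k=0}^{\infty} x^k$, interval of convergence $(-1, 1)$.
4. $\log(1 + x) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1} x^k}{k}$, interval of convergence $(-1, 1]$.
5. $\arctan(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{2k+1}$, interval of convergence $[-1, 1]$.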
An easy way to remember property (2): we know sin(x) is odd as a function, and indeed its
Taylor series only has powers in odd order. Similarly, cos(x) is even as a function and its Taylor
series only has powers of even order. We are not yet in a position to compute (5), but we provide
it here as an interesting and motivating example which we will use (and prove) later.
30 Unless you're doing calculus on $\mathbb{C}$, in which case these are the same thing.
It is worth noting that I am cheating here. Theorem 10.10 only tells us that the function is equal to its Taylor series on the interior of the interval of convergence and not at the endpoints. It is another theorem to show that if convergence holds at the endpoints then the series still converges to the function there, but let's sweep that under the rug for now.
Notice how the Taylor series for $e^x$ looks very similar to the Taylor series for $\sin(x)$ and $\cos(x)$. Indeed, if we ignore the $(-1)^k$ terms we see that $\sin(x)$ contains the odd terms of $e^x$ while $\cos(x)$ contains the even terms. If we can account for these minus signs, maybe we can get a relationship between the a priori unrelated exponential function and the trigonometric functions.
Exercise:
1. Recall that in the complex numbers, we can define a number $i$ such that $i^2 = -1$. Using their Taylor series expansions, show that $e^{ix} = \cos(x) + i\sin(x)$.
2. We know that every point on the unit circle can be written as $(\cos(\theta), \sin(\theta))$ where $\theta$ is the angle between the point and the positive $x$-axis. If we think about $\cos(x) + i\sin(x)$ as $(\cos(x), \sin(x))$ convince yourself that $e^{ix}$ sweeps out the unit circle as $x$ traverses through $[0, 2\pi)$.
10.3
A natural question to ask (in a calculus course at least) is whether one can differentiate and integrate
the terms of the Taylor series, and if so
does the new power series represent the Taylor series of the derivative/integral?
The answer to both of these questions lies in the following theorem, which we are not going to
prove:
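Theorem 10.11
Let $f(x) = \sum_{k=0}^{\infty} a_k x^k$ have radius of convergence $r$.
1. If $f$ is differentiable then $\sum_{k=1}^{\infty} k a_k x^{k-1}$ has radius of convergence $r$, and moreover
$$f'(x) = \sum_{k=1}^{\infty} k a_k x^{k-1}.$$
2. If $f$ is integrable then $F(x) = \sum_{k=0}^{\infty} \frac{a_k}{k+1} x^{k+1}$ has radius of convergence $r$, and moreover
$$\int f(x)dx = F(x) + C.$$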
Effectively, this theorem says that it is fine to integrate and differentiate the terms of a Taylor series. This is the same as saying that for a Taylor series, we can interchange the sum with the derivative/integral. Indeed,
$$f'(x) = \frac{d}{dx}f(x) = \frac{d}{dx}\sum_{k=0}^{\infty} a_k x^k = \sum_{k=0}^{\infty} a_k \frac{d}{dx}x^k = \sum_{k=1}^{\infty} k a_k x^{k-1},$$
and similarly for integration. This is not the same thing as linearity, and we should not assume that this result follows immediately. We encourage the student to accept the theorem as stated, but not to underestimate the subtlety of the situation.
Note however that while the radius of convergence does not change, the interval of convergence might be different. For example, we know that
$$\frac{1}{1-x} = \sum_{k=0}^{\infty} x^k \text{ on } (-1, 1), \qquad \log(1 + x) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1} x^k}{k} \text{ on } (-1, 1].$$
These functions are not immediately related, but they can be via some manipulation. Notice that
$$\frac{1}{1+x} = \frac{1}{1-(-x)} = \sum_{k=0}^{\infty} (-x)^k = \sum_{k=0}^{\infty} (-1)^k x^k,$$
and the interval of convergence is still $(-1, 1)$. Now differentiating $\log(1 + x)$ term by term we get
$$\frac{1}{1+x} = \frac{d}{dx}\log(1 + x) = \sum_{k=1}^{\infty} \frac{d}{dx}\frac{(-1)^{k+1} x^k}{k} = \sum_{k=1}^{\infty} (-1)^{k+1} x^{k-1} = \sum_{k=0}^{\infty} (-1)^k x^k.$$
We knew that this would be true from our theorem, but notice that the interval of convergence of $\log(1 + x)$ is $(-1, 1]$ while its derivative has interval of convergence $(-1, 1)$. So the radius is unchanged, but the interval is not.
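Example 10.12
Show that
$$\arctan(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{2k+1}.$$
Solution. We know that $\arctan(x) = \int_0^x \frac{1}{1+t^2}dt$, and we know the Taylor series for $\frac{1}{1+t^2}$ since it is a geometric series:
$$\frac{1}{1+t^2} = \frac{1}{1-(-t^2)} = \sum_{k=0}^{\infty} (-t^2)^k = \sum_{k=0}^{\infty} (-1)^k t^{2k}.$$
Integrating term by term gives
$$\arctan(x) = \int_0^x \sum_{k=0}^{\infty} (-1)^k t^{2k} dt = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{2k+1}$$
as required.
10.4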
Calculators and calculating: One of the historic applications of power series is the ability to approximate a function to arbitrary precision in terms of polynomials. In the case of transcendental functions especially (such as log, sin, cos, exp), this is how modern day computers and calculators often compute values.
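As a small illustration of computing a transcendental function from polynomials, the sketch below sums the first terms of the arctan series from Example 10.12 and compares the result with the library value. The function name `arctan_series` is our own, and this is not a claim about how any particular calculator is actually implemented.

```python
import math

def arctan_series(x, n):
    """Partial sum of sum_{k=0}^n (-1)^k x^(2k+1) / (2k+1)."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1)
               for k in range(n + 1))

# Converges quickly well inside the interval of convergence ...
print(arctan_series(0.5, 20), math.atan(0.5))
# ... but very slowly at the endpoint x = 1, where arctan(1) = pi/4.
print(4 * arctan_series(1.0, 1000), math.pi)
```

The contrast between the two printed lines shows why the position of $x$ inside the interval of convergence matters in practice.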
Computing Limits and Asymptotics: Mathematicians often cheat when looking at limits, in the sense that we can often guess the value of the limit if we know a little something about the Taylor series of the terms involved.
Before proceeding, let us introduce a little notation. Recall in Section 4.5.2 that we defined little-o notation, and said that $f = o(g)$ if $f$ and $g$ grow at the same rate; that is, $f$ and $g$ shared the same asymptotics in the limit as $f$ and $g$ go to infinity. In a similar vein, we may define big-O notation as describing the asymptotics of $f$ and $g$ when they go to zero.$^{31}$ In effect, we will write $f = O(g)$ if $f$ is at worst $g$ in the limit as $x \to 0$. For example, the Taylor series of $e^x$ is
$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}.$$
If we are playing with numbers very near zero, and are only interested in what is happening up to the quadratic term, we might write this as
$$e^x = 1 + x + \frac{x^2}{2} + O(x^3)$$
where here $O(x^3)$ indicates that the dominant factor in the error is of order $x^3$ (for $x$ near 0, the term $x^4$ will be much smaller than $x^3$, and so on). When taking limits as $x \to 0$, anything with $O(x^k)$, $k \ge 1$, will vanish, meaning that we are only interested in the constant terms.
Now we often used Taylor series to create L'Hôpital problems, such as the following:
Example 10.13
Determine $\lim_{x\to 0} \frac{e^x - 1 - x}{x^2}$.
Solution. When we made this example, we used the fact that we knew the Taylor series for $e^x$. Indeed, the numerator becomes
$$e^x - 1 - x = \left(1 + x + \frac{x^2}{2} + O(x^3)\right) - 1 - x = \frac{x^2}{2!} + O(x^3).$$
Thus
$$\lim_{x\to 0} \frac{e^x - 1 - x}{x^2} = \lim_{x\to 0} \frac{\frac{x^2}{2!} + O(x^3)}{x^2} = \lim_{x\to 0} \left(\frac{1}{2!} + O(x)\right) = \frac{1}{2}.$$
One can check this is the same answer we get by applying L'Hôpital.
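A quick numerical experiment supports this value. The sketch below evaluates the quotient for shrinking $x$; the name `quotient` is hypothetical, and we use `math.expm1` (which computes $e^x - 1$ accurately for small $x$) as a design choice to limit floating-point cancellation in the numerator.

```python
import math

def quotient(x):
    """(e^x - 1 - x) / x^2, using expm1 to reduce cancellation near 0."""
    return (math.expm1(x) - x) / x ** 2

for x in (0.1, 0.01, 0.001):
    print(x, quotient(x))
```

The values approach $\frac{1}{2}$, and the gap shrinks roughly in proportion to $x$, consistent with the $O(x)$ error term above.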
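Example 10.14
Determine $\lim_{x\to 0} \frac{xe^{nx} - x}{1 - \cos(nx)}$ for $n \neq 0$.
Solution. The numerator is
$$xe^{nx} - x = x\left(1 + (nx) + \frac{(nx)^2}{2} + O(x^3)\right) - x = nx^2 + \frac{n^2x^3}{2} + O(x^4).$$
Similarly, the denominator is $1 - \cos(nx) = \frac{n^2x^2}{2} + O(x^4)$, so that
$$\lim_{x\to 0} \frac{xe^{nx} - x}{1 - \cos(nx)} = \lim_{x\to 0} \frac{n + O(x)}{\frac{n^2}{2} + O(x^2)} = \frac{2}{n}.$$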
Difficult Integrals: We have seen a few examples of functions that cannot be integrated, and
with some more advanced theory (called differential Galois theory), one can actually prove that
some of these integrals do not have anti-derivatives that can be expressed with elementary functions.
However, the simplicity of integrating polynomials means that, so long as we are content with a
power series representation, one can still get a closed form expression for these integrals.
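Example 10.15
Determine $\int \frac{\sin(x)}{x}dx$.
Solution. The Taylor series of $\frac{\sin(x)}{x}$ is given by
$$\frac{\sin(x)}{x} = \frac{1}{x}\sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!} = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k+1)!}.$$
Integrating term by term, we get
$$\int \frac{\sin(x)}{x}dx = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)^2(2k)!} + C.$$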
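Example 10.16
Determine $\int e^{-x^2}dx$.
Solution. Substituting $-x^2$ into the Taylor series for $e^x$ gives
$$e^{-x^2} = \sum_{k=0}^{\infty} \frac{(-x^2)^k}{k!} = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{k!}.$$
Integrating term by term, we get
$$\int_0^x e^{-t^2}dt = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!}\int_0^x t^{2k}dt = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)k!}.$$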
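We can test this series against a known special function. The sketch below relies on the standard identity $\int_0^x e^{-t^2}dt = \frac{\sqrt{\pi}}{2}\operatorname{erf}(x)$, which does not appear in these notes and is assumed here only for the purpose of comparison; the function name `gauss_integral_series` is our own.

```python
import math

def gauss_integral_series(x, n):
    """Partial sum of sum_{k=0}^n (-1)^k x^(2k+1) / ((2k+1) k!)."""
    return sum((-1) ** k * x ** (2 * k + 1) / ((2 * k + 1) * math.factorial(k))
               for k in range(n + 1))

for x in (0.5, 1.0):
    # Reference value from the error function: (sqrt(pi) / 2) * erf(x)
    reference = math.sqrt(math.pi) / 2 * math.erf(x)
    print(x, gauss_integral_series(x, 25), reference)
```

Even though the antiderivative has no elementary closed form, a couple dozen terms of the power series already agree with the reference value to many digits for these inputs.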
Differential Equations: In a similar vein to integrals, one can solve differential equations using
Taylor series. One begins by assuming that the solution can be written as a power series, then uses
the differential equation to determine a relationship amongst the coefficients.
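Example 10.17
Solve the differential equation $y'' + y = 0$ using power series.
Solution. We have already seen that the most general solution to this equation is $y(x) = A\cos(x) + B\sin(x)$ for constants $A$ and $B$. We expect that we should arrive at the same solution using power series. Indeed, suppose that we can write $y = \sum_k a_k x^k$, so that
$$y = \sum_{k=0}^{\infty} a_k x^k, \qquad y' = \sum_{k=1}^{\infty} k a_k x^{k-1}, \qquad y'' = \sum_{k=2}^{\infty} k(k-1) a_k x^{k-2}.$$
Substituting into the differential equation gives
$$\begin{aligned}
0 = y'' + y &= \left[\sum_{k=2}^{\infty} k(k-1) a_k x^{k-2}\right] + \left[\sum_{k=0}^{\infty} a_k x^k\right] \\
&= \left[\sum_{k=0}^{\infty} (k+2)(k+1) a_{k+2} x^k\right] + \left[\sum_{k=0}^{\infty} a_k x^k\right] \\
&= \sum_{k=0}^{\infty} \left[(k+2)(k+1) a_{k+2} + a_k\right] x^k.
\end{aligned}$$
Now the power series is zero if and only if each coefficient is identically zero. This means that we can solve for $a_{k+2}$ in terms of $a_k$ as
$$a_{k+2} = -\frac{a_k}{(k+2)(k+1)}.$$
Hence once we have determined $a_0$ and $a_1$, we can find all the remaining coefficients. Set $a_0 = A$ and $a_1 = B$ and notice that
$$\begin{aligned}
a_2 &= -\frac{A}{2 \cdot 1} & a_3 &= -\frac{B}{3 \cdot 2} \\
a_4 &= -\frac{a_2}{4 \cdot 3} = \frac{A}{4!} & a_5 &= -\frac{a_3}{5 \cdot 4} = \frac{B}{5!} \\
&\;\;\vdots & &\;\;\vdots \\
a_{2n} &= (-1)^n \frac{A}{(2n)!} & a_{2n+1} &= (-1)^n \frac{B}{(2n+1)!}.
\end{aligned}$$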
By splitting the even and odd powers of the power series, our solution thus looks like
$$y = A\sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} + B\sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!} = A\cos(x) + B\sin(x),$$
exactly as we expected.
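The recurrence for the coefficients can also be run directly on a computer. The sketch below assumes the recurrence $a_{k+2} = -\frac{a_k}{(k+2)(k+1)}$ with $a_0 = A$ and $a_1 = B$ (the relation that produces the cosine and sine series) and truncates the power series after a fixed number of terms; the name `series_solution` and the default of 30 terms are arbitrary choices for illustration.

```python
import math

def series_solution(A, B, x, terms=30):
    """Sum the first `terms` terms of y = sum a_k x^k, where
    a_0 = A, a_1 = B and a_{k+2} = -a_k / ((k + 2)(k + 1))."""
    a = [0.0] * terms
    a[0], a[1] = A, B
    for k in range(terms - 2):
        a[k + 2] = -a[k] / ((k + 2) * (k + 1))
    return sum(a[k] * x ** k for k in range(terms))

A, B, x = 2.0, -3.0, 1.2
print(series_solution(A, B, x), A * math.cos(x) + B * math.sin(x))
```

The two printed numbers agree to within floating-point error, matching the closed form found by splitting the series into even and odd powers.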