Computer Arithmetic

UNIT 1 COMPUTER ARITHMETIC


Structure
1.0 Introduction
1.1 Objectives
1.2 Floating-Point Arithmetic and Errors
1.2.1 Floating Point Representation of Numbers
1.2.2 Sources of Errors
1.2.3 Non-Associativity of Arithmetic
1.2.4 Propagated Errors
1.3 Some Pitfalls in Computation
1.3.1 Loss of Significant Digits
1.3.2 Instability of Algorithms
1.4 Summary
1.5 Exercises
1.6 Solutions/Answers

1.0 INTRODUCTION
When a calculator or digital computer is used to perform numerical calculations, an
unavoidable error, called round-off error, must be considered. This error arises
because the arithmetic operations performed in a machine involve numbers with only
a finite number of digits, with the result that many calculations are performed with
approximate representations of the actual numbers. The computer has the in-built
capability to perform only the basic arithmetic operations of addition, subtraction,
multiplication and division. While formulating an algorithm, all other mathematical
operations are reduced to these basic operations, even when solving problems involving
the operations of calculus. We will discuss the way arithmetic operations are carried
out in a computer and some of the peculiarities of computer arithmetic. Finally, we
dwell on the propagation of errors.

1.1 OBJECTIVES
After going through this unit, you should be able to:

• learn about floating-point representation of numbers;
• learn about non-associativity of arithmetic in computers;
• learn about sources of errors;
• understand the propagation of errors in subsequent calculations;
• understand the effect of loss of significant digits in computation; and
• know when an algorithm is unstable.

1.2 FLOATING POINT ARITHMETIC AND ERRORS
First of all we discuss representation of numbers in floating point format.

1.2.1 Floating Point Representation of Numbers


There are two types of numbers, which are used in calculations:

1. Integers: …, –3, –2, –1, 0, 1, 2, 3, …

2. Other real numbers, such as numbers with a decimal point.

In computers, all the numbers are represented by a (fixed) finite number of digits.
Thus, not all integers can be represented in a computer. Only a finite set of
integers, depending upon the computer system, can be represented. On the other
hand, the problem for non-integer real numbers is still more serious, particularly for
non-terminating fractions.

Definition 1 (Floating Point Numbers): Scientific calculations are usually carried
out in floating point arithmetic in computers.

An n-digit floating-point number in base β (a given natural number) has the form

x = ± (.d1d2…dn)_β β^e, 0 ≤ di < β, m ≤ e ≤ M; i = 1, 2, …, n, d1 ≠ 0;

where (.d1d2…dn)_β is a β-fraction called the mantissa, and its value is given by

(.d1d2…dn)_β = d1 × (1/β) + d2 × (1/β^2) + … + dn × (1/β^n);

e is an integer called the exponent. The exponent e is limited to a range
m ≤ e ≤ M, where m and M are integers varying from computer to computer.
Usually, m = –M.

In IBM 1130, m = –128 (in binary), –39 (decimal) and M = 127 (in binary), 38 (in
decimal).

For most of the computers β = 2 (binary), on some computers β = 16 (hexadecimal),
and in pocket calculators β = 10 (decimal).

The precision or length n of floating-point numbers on any computer is usually
determined by the word length of the computer.

Representation of real numbers in the computers:

There are two commonly used ways of approximating a given real number x by an
n-digit floating point number, i.e. through rounding and chopping. If a number x
has the representation x = (.d1d2…dndn+1…)_β β^e, then the floating point
number fl(x) with an n-digit mantissa can be obtained in the following two ways:

Definition 2 (Rounding): fl(x) is chosen as the n-digit floating-point number nearest
to x. If the fractional part of x requires more than n digits, then if
dn+1 < (1/2)β, x is represented as (.d1d2…dn)_β β^e; else, it is written as
(.d1d2…(dn + 1))_β β^e.

2
Example 1: fl   = .666667 × 100 in 6 decimal digit floating point representation.
3
Definition 3 (Chopping): fl(x) is chosen as the floating point number obtained by
deleting all the digits of the mantissa except the left-most n digits. Here dn+1, dn+2, …
are neglected and fl(x) = (.d1d2…dn)_β β^e.

Example 2: If the number of digits n = 2,
fl(2/3) = (.67) × 10^0 rounded
        = (.66) × 10^0 chopped
fl(–83.7) = –(.84) × 10^2 rounded
          = –(.83) × 10^2 chopped.

On some computers, this definition of fl(x) is modified in case |x| ≥ β^M (overflow) or
0 < |x| ≤ β^m (underflow), where m and M are the bounds on the exponents. Either
fl(x) is not defined in this case, causing a stop, or else fl(x) is represented by a special
number which is not subject to the usual rules of arithmetic when combined with an
ordinary floating point number.

Definition 4: Let fl(x) be the floating point representation of a real number x. Then
ex = x – fl(x) is called the round-off (absolute) error, and

rx = (x – fl(x))/x

is called the relative error.

Theorem: If fl(x) is the n-digit floating point representation in base β of a real
number x, then rx, the relative error in x, satisfies the following:

(i) |rx| ≤ (1/2) β^(1–n) if rounding is used.
(ii) 0 ≤ |rx| ≤ β^(1–n) if chopping is used.

For proving (i), you may use the following:

Case 1. dn+1 < (1/2)β. Then fl(x) = ± (.d1d2…dn)_β β^e and

|x – fl(x)| = (dn+1 . dn+2 …)_β β^(e–n–1)
            ≤ (1/2) β . β^(e–n–1) = (1/2) β^(e–n)

Case 2. dn+1 ≥ (1/2)β. Then fl(x) = ± {(.d1d2…dn)_β β^e + β^(e–n)} and

|x – fl(x)| = |(dn+1 . dn+2 …)_β – β| β^(e–n–1)
            ≤ (1/2) β . β^(e–n–1) = (1/2) β^(e–n)

In either case, since d1 ≠ 0 gives |x| ≥ β^(e–1), we get
|rx| = |x – fl(x)| / |x| ≤ (1/2) β^(1–n).
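The n-digit rounding and chopping above can be imitated in Python. This is a sketch, not part of the unit: the helper fl below is our own construction, and it breaks ties by rounding half up (the solutions section later uses a round-half-to-even tie rule).

```python
# Imitate n-significant-digit decimal arithmetic with the decimal module,
# then check the theorem's relative-error bounds:
#   |r_x| <= (1/2) 10^(1-n)  for rounding,   |r_x| <= 10^(1-n)  for chopping.
from decimal import Decimal, ROUND_HALF_UP, ROUND_DOWN

def fl(x, n, mode="round"):
    """Keep n significant decimal digits of x by rounding or chopping."""
    d = Decimal(repr(x))
    sign, digits, exp = d.as_tuple()
    e = len(digits) + exp                      # decimal exponent of x
    quantum = Decimal(1).scaleb(e - n)         # place value of the n-th digit
    rule = ROUND_HALF_UP if mode == "round" else ROUND_DOWN
    return float(d.quantize(quantum, rounding=rule))

x = 2.0 / 3.0
assert fl(x, 6, "round") == 0.666667          # Example 1
assert fl(x, 6, "chop") == 0.666666
assert fl(-83.7, 2, "round") == -84.0         # Example 2
assert fl(-83.7, 2, "chop") == -83.0
for mode, bound in (("round", 0.5 * 10.0 ** (1 - 6)), ("chop", 10.0 ** (1 - 6))):
    assert abs(x - fl(x, 6, mode)) / abs(x) <= bound
```

Note that ROUND_DOWN in the decimal module truncates toward zero, which is exactly the chopping of Definition 3.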
2 2

1.2.2 Sources of Errors


We list below the types of errors that are encountered while carrying out numerical
calculations to solve a problem.

1. Round-off errors arise due to the floating point representation of the initial data
in the machine. Subsequent errors in the solution due to this are called propagated
errors.

2. Due to finite digit arithmetic operations, the computer generates errors in the
solution of a problem, known as generated errors or rounding errors.

3. Sensitivity of the algorithm of the numerical process used for computing f(x): if
small changes in the initial data x lead to large errors in the value of f(x), then
the algorithm is called unstable.

4. Error due to finite representation of an inherently infinite process. For example,
consider the use of a finite number of terms in the infinite series expansions of
sin x, cos x or of f(x) by Maclaurin's or Taylor series expansion. Such errors are
called truncation errors.
Generated Error
Error arising due to inexact arithmetic operations is called generated error. Inexact
arithmetic operations result from finite digit arithmetic in the machine. If the
arithmetic operations were done with the (ideal) infinite digit representation, this
error would not appear. During an arithmetic operation on two floating point numbers
of the same length n, we obtain a floating point number of a different length m (usually
m > n). The computer cannot store the resulting number exactly, since it can only
represent numbers of length n. So only n digits are stored. This gives rise to error.

Example 3: Let a = .75632 × 10^2 and b = .235472 × 10^–1

a + b = 75.632 + 0.0235472
      = 75.6555472 in accumulator
a + b = .756555 × 10^2 if 6 decimal digit arithmetic is used.

We denote the corresponding machine operation by a superscript *, i.e.

a +* b = .756555 × 10^2 (.756555E2)

Example 4: Let a = .23 × 10^1 and b = .30 × 10^2

a/b = 2.3/30 = 0.076666… (0.76666E–1)

If two decimal digit arithmetic is used, then a /* b = .76 × 10^–1 (0.76E–1)

In general, let w* be the computer operation corresponding to the arithmetic operation
w on x and y.

The generated error is given by xwy – xw*y. However, computers are designed in such a
way that xw*y = fl(xwy). So the relative generated error

r.g.e. = r_xwy = (xwy – xw*y)/(xwy)

We observe that in n-digit arithmetic

|r.g.e.| ≤ (1/2) β^(1–n), if rounding is used.
0 ≤ |r.g.e.| ≤ β^(1–n), if chopping is used.

Due to generated error, the associative and the distributive laws of arithmetic are not
satisfied in some cases, as shown below:

In a computer, 3 × (1/3) would be represented as 0.999999 (in case of six significant
digits), but by hand computation it is one. This simple illustration suggests that
everything does not go well on computers. More precisely, 0.333333 + 0.333333
+ 0.333333 = 0.999999.

1.2.3 Non-Associativity of Arithmetic
Example 5: Let a = 0.345 × 10^0, b = 0.245 × 10^–3 and c = 0.432 × 10^–3. Using
3-digit decimal arithmetic with rounding, we have

b + c = 0.000245 + 0.000432
      = 0.000677 (in accumulator)
      = 0.677 × 10^–3
a + (b + c) = 0.345 + 0.000677 (in accumulator)
            = 0.346 × 10^0 (in memory) with rounding
a + b = 0.345 × 10^0 + 0.245 × 10^–3
      = 0.345 × 10^0 (in memory)
(a + b) + c = 0.345 + 0.000432 = 0.345432 (in accumulator)
            = 0.345 × 10^0 (in memory)

Hence we see that

(a + b) + c ≠ a + (b + c)
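Example 5 can be replayed in code. The helper fl3 below is an assumption of this sketch: it rounds every intermediate result back to 3 significant decimal digits, standing in for the 3-digit machine's memory.

```python
# Replay Example 5: fl3 rounds a value back to 3 significant decimal
# digits, standing in for the 3-digit machine's memory.
from decimal import Decimal, ROUND_HALF_UP

def fl3(x):
    d = Decimal(repr(x))
    sign, digits, exp = d.as_tuple()
    e = len(digits) + exp
    return float(d.quantize(Decimal(1).scaleb(e - 3), rounding=ROUND_HALF_UP))

a, b, c = 0.345, 0.245e-3, 0.432e-3
left = fl3(fl3(a + b) + c)      # (a + b) + c: the tiny b is lost first
right = fl3(a + fl3(b + c))     # a + (b + c): b and c combine before a
assert left == 0.345
assert right == 0.346
assert left != right            # addition is not associative here
```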

Example 6: Let a = 0.41, b = 0.36 and c = 0.70.

Using two decimal digit arithmetic with rounding, we have

(a – b)/c = .71 × 10^–1

and a/c – b/c = .59 – .51 = .80 × 10^–1

while the true value of (a – b)/c = 0.071428…

i.e. (a – b)/c ≠ a/c – b/c

These examples show that the error is due to finite digit arithmetic.

Definition 5: If x* is an approximation to x, then we say that x* approximates x to n
significant β digits provided the absolute error satisfies

|x – x*| ≤ (1/2) β^(s–n+1),

with s the largest integer such that β^s ≤ |x|.

From the above definition, we derive the following: x* is said to approximate x
correct to n significant β digits if

|x – x*| / |x| ≤ (1/2) β^(1–n)

In numerical problems we will use the following modified definition.

Definition 6: x* is said to approximate x correct to n decimal places (to n places after
the decimal) if

|x – x*| ≤ (1/2) 10^–n

In β-digit arithmetic, x* is said to approximate x correct to n places after the
point if |x – x*| / |x| ≤ β^–n.

Example 7: Let x* = .568 be an approximation to x = .5675. Then

x – x* = –.0005
|x – x*| = 0.0005 = (1/2)(.001) = (1/2) × 10^–3

So x* approximates x correct to 3 decimal places.

Example 8: Let x* = 4.5 be an approximation to x = 4.49998. Then

x – x* = –.00002

|x – x*| / |x| = 0.0000044 ≤ .000005 = (1/2)(.00001) = (1/2) 10^–5 = (1/2) × 10^(1–6)

Hence, x* approximates x correct to 6 significant decimal digits.

1.2.4 Propagated Error


In a numerical problem, the true values of numbers may not be used exactly; in place
of the true values, some approximate values, like floating point numbers, are used
initially. The error arising in the problem due to these inexact/approximate values is
called propagated error.

Let x* and y* be approximations to x and y respectively, and let w denote an arithmetic
operation.

The propagated error = xwy – x*wy*

r.p.e. = relative propagated error = (xwy – x*wy*)/(xwy)

Total Error: Let x* and y* be approximations to x and y respectively, and let w* be
the machine operation corresponding to the arithmetic operation w. The total relative
error is

r_xwy = (xwy – x*w*y*)/(xwy)
      = (xwy – x*wy*)/(xwy) + (x*wy* – x*w*y*)/(xwy)
      ≈ (xwy – x*wy*)/(xwy) + (x*wy* – x*w*y*)/(x*wy*)

to a first approximation. So total relative error = relative propagated error +
relative generated error.

Therefore, |r_xwy| < 10^(1–n) if rounded, and |r_xwy| < 2 × 10^(1–n) if chopped,
where β = 10.

Propagation of error in functional evaluation of a single variable:

Let f(x) be evaluated and let x* be an approximation to x. Then the (absolute) error in
the evaluation of f(x) is f(x) – f(x*) and the relative error is

r_f(x) = (f(x) – f(x*))/f(x)                      (1)

Suppose x = x* + ex. By Taylor's series, we get f(x) = f(x*) + ex f′(x*) + ….
Neglecting higher order terms in ex in the series, we get

r_f(x) ≈ ex f′(x*)/f(x) = (x f′(x*)/f(x)) (ex/x) = (x f′(x*)/f(x)) rx

i.e.

r_f(x) = (x f′(x*)/f(x)) rx

Note: For the evaluation of f(x) in the denominator of the r.h.s., after simplification,
f(x) must be replaced by f(x*) in some cases.

The expression |x f′(x)/f(x)| is called the condition number of f(x) at x. The larger the
condition number, the more ill-conditioned the function is said to be.
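The condition number can also be estimated numerically. This sketch is our own construction, not from the unit: condition approximates f′(x) by a central difference (the step h = 1e-7 is an arbitrary choice) and evaluates |x f′(x)/f(x)|.

```python
# Estimate the condition number |x f'(x)/f(x)| with a central-difference
# derivative (h = 1e-7 is an arbitrary small step).
import math

def condition(f, x, h=1e-7):
    fp = (f(x + h) - f(x - h)) / (2.0 * h)    # f'(x), approximately
    return abs(x * fp / f(x))

# sqrt is well conditioned: its condition number is 1/2 everywhere.
assert abs(condition(math.sqrt, 4.0) - 0.5) < 1e-5

# 10/(1 - x^2) is ill conditioned near x = 1: condition = 2x^2/|1 - x^2|.
f = lambda x: 10.0 / (1.0 - x * x)
x = 0.999
expected = 2.0 * x * x / (1.0 - x * x)        # about 999
assert abs(condition(f, x) - expected) / expected < 1e-2
```

The two functions tested here are the ones analysed in Exercise E9 below.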

Example 9: Let f(x) = x^(1/10) and let x* approximate x correct to n significant
decimal digits. Prove that f(x*) approximates f(x) correct to (n + 1) significant
decimal digits.

r_f(x) = (x f′(x*)/f(x)) rx
       = (x . (1/10)(x*)^(–9/10) / x^(1/10)) rx
       ≈ (1/10) rx

|r_f(x)| = (1/10) |rx| ≤ (1/10) . (1/2) . 10^(1–n) = (1/2) 10^(1–(n+1))

Therefore, f(x*) approximates f(x) correct to (n + 1) significant decimal digits.

Example 10: The function f(x) = e^x is to be evaluated for any x, 0 ≤ x ≤ 50, correct
to at least 6 significant digits. What digit arithmetic should be used to get the required
accuracy?

r_f(x) = (x f′(x*)/f(x)) rx = (x . e^(x*)/e^x) rx ≈ x rx

Let n digit arithmetic be used. Then

|rx| ≤ (1/2) 10^(1–n)

The required accuracy is obtained if |x rx| ≤ (1/2) 10^(1–6), i.e. if

50 . (1/2) 10^(1–n) ≤ (1/2) 10^(1–6)
(1/2) 10^(1–n) ≤ (1/100) 10^(1–6)
10^(1–n) ≤ 2 . 10^(1–8)
10^–n ≤ 2 . 10^–8
–n ≤ –8 + log10 2
n ≥ 8 – log10 2 = 8 – .3 = 7.7

That is, n ≥ 8. Hence, 8 digit arithmetic must be used.

Propagated error in a function of two variables:

Let x* and y* be approximations to x and y respectively. For evaluating f(x, y), we
actually calculate f(x*, y*). Then

e_f(x,y) = f(x, y) – f(x*, y*)

But f(x, y) = f(x* + ex, y* + ey) = f(x*, y*) + (ex fx + ey fy)(x*, y*) + higher order
terms. Therefore, e_f(x,y) ≈ (ex fx + ey fy)(x*, y*). For the relative error, divide this
by f(x, y).

Now we can find the results for the propagated error in addition, multiplication,
subtraction and division by using the above results.

(a) Addition: f(x, y) = x + y

e_{x+y} = ex + ey
r_{x+y} = ex/(x + y) + ey/(x + y)
        = (x/(x + y)) rx + (y/(x + y)) ry

(b) Multiplication: f(x, y) = xy

e_{xy} = ex y + ey x
r_{xy} = ex/x + ey/y = rx + ry

(c) Subtraction: f(x, y) = x – y

e_{x–y} = ex – ey
r_{x–y} = ex/(x – y) – ey/(x – y)
        = (x/(x – y)) rx – (y/(x – y)) ry

(d) Division: f(x, y) = x/y

e_{x/y} = ex (1/y) – ey (x/y^2)
r_{x/y} = ex/x – ey/y = rx – ry
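A quick numerical check of these first-order rules (the values and errors below are made up for illustration, not from the unit): relative errors add under multiplication and subtract under division, up to second-order terms.

```python
# Check the first-order rules numerically. With e = true - approximate
# (as in the text), the approximations are x* = x - e_x and y* = y - e_y.
x, y = 3.0, 7.0
ex, ey = 2e-4, -5e-4                  # assumed absolute errors
xs, ys = x - ex, y - ey               # x*, y*
rx, ry = ex / x, ey / y

rxy = (x * y - xs * ys) / (x * y)     # relative error of the product
assert abs(rxy - (rx + ry)) < 1e-7    # r_xy ~ r_x + r_y

rdiv = (x / y - xs / ys) / (x / y)    # relative error of the quotient
assert abs(rdiv - (rx - ry)) < 1e-7   # r_{x/y} ~ r_x - r_y
```

The leftover discrepancy is of size rx·ry, which is exactly the second-order term neglected in the derivation.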

1.3 SOME PITFALLS IN COMPUTATIONS
As mentioned earlier, computer arithmetic is not completely exact. Computer
arithmetic sometimes leads to undesirable consequences, which we discuss below.

1.3.1 Loss of Significant Digits


One of the most common (and often avoidable) ways of increasing the importance of
an error is known as loss of significant digits.

Loss of significant digits in subtraction of two nearly equal numbers:

The above result for subtraction shows that if x and y are nearly equal, then the
relative error

r_{x–y} = (x/(x – y)) rx – (y/(x – y)) ry

will become very large, and it becomes larger still if rx and ry are of opposite signs.

Suppose we want to calculate the number z = x – y, and x* and y* are approximations
for x and y respectively, good to r digits. If x and y do not agree in the left-most
significant digit, then z* = x* – y* is as good an approximation to x – y as x* and
y* are to x and y.

But if x* and y* agree in the left-most digits (one or more), then these left-most digits
will cancel and there will be loss of significant digits.

The more digits on the left that agree, the greater the loss of significant digits.
A similar loss of significant digits occurs when a number is divided by a small number
(or multiplied by a very large number).

Remark 1

To avoid this loss of significant digits in algebraic expressions, we must rationalize,
and in the case of trigonometric functions, Taylor's series must be used.

If no alternative formulation to avoid the loss of significant digits is possible, then
carry more significant digits in the calculation, using floating-point numbers in double
precision.

Example 11: Let x* = .3454 and y* = .3443 be approximations to x and y respectively,
correct to 3 significant digits. Further, let z* = x* – y* be the approximation to x – y.
Then show that the relative error in z* as an approximation to x – y can be as large as
100 times the relative error in x or y.

Solution:

Given |rx|, |ry| ≤ (1/2) 10^(1–3)

z* = x* – y* = .3454 – .3443 = .0011 = .11 × 10^–2

This is correct to one significant digit, since the last digits 4 in x* and 3 in y* are not
reliable, and the second significant digit of z* is derived from the fourth digits of x*
and y*.

Max. rz = (1/2) 10^(1–1) = 1/2 = 100 . (1/2) . 10^–2 ≥ 100 rx, 100 ry

Example 12: Let x = .657562 × 10^3 and y = .657557 × 10^3. If we round these
numbers to five digits (n = 5), then

x* = .65756 × 10^3 and y* = .65756 × 10^3

x – y = .000005 × 10^3 = .005

while x* – y* = 0; this is due to loss of significant digits. Now, for u = .253 × 10^–2,

u/(x – y) = .00253/.005 = 253/500 = .506

whereas u/(x* – y*) = ∞

Example 13: Solve the quadratic equation x^2 + 9.9x – 1 = 0 using two decimal digit
floating arithmetic with rounding.

Solution:

Solving the quadratic equation by the usual formula, we have

x = (–b + √(b^2 – 4ac))/(2a) = (–9.9 + √((9.9)^2 – 4 . 1 . (–1)))/2
  = (–9.9 + √102)/2 = (–9.9 + 10)/2 = .1/2 = .05

while the true solutions are –10 and 0.1. Now, if we rationalize the expression,

x = (–b + √(b^2 – 4ac))/(2a) = –4ac/(2a(b + √(b^2 – 4ac)))
  = –2c/(b + √(b^2 – 4ac)) = 2/(9.9 + √102)
  = 2/(9.9 + 10) = 2/19.9 ≈ 2/20 = .1

which is one of the true solutions.
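Example 13 can be reproduced in code. The helper fl2, an assumption of this sketch, rounds every intermediate result to 2 significant digits to imitate the 2-digit machine.

```python
# Compare the textbook quadratic formula with the rationalized form for
# x^2 + 9.9x - 1 = 0 (a = 1), rounding every step to 2 significant digits.
from decimal import Decimal, ROUND_HALF_UP
import math

def fl2(x):
    d = Decimal(repr(x))
    sign, digits, exp = d.as_tuple()
    e = len(digits) + exp
    return float(d.quantize(Decimal(1).scaleb(e - 2), rounding=ROUND_HALF_UP))

b, c = 9.9, -1.0
disc = fl2(math.sqrt(fl2(fl2(b * b) - fl2(4 * c))))   # sqrt(98 + 4) -> 10
x_naive = fl2(fl2(-b + disc) / 2)                     # (-9.9 + 10)/2 = .05
x_rational = fl2(fl2(-2 * c) / fl2(b + disc))         # 2/20 = .1
assert x_naive == 0.05        # cancellation ruins the small root
assert x_rational == 0.1      # rationalized form recovers it exactly
```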

1.3.2 Instability of Algorithms


An algorithm is a procedure that describes, in an unambiguous manner, a finite
sequence of steps to be performed in a specified order. The object of the algorithm
generally is to implement a numerical procedure to solve a problem or to find an
approximate solution of the problem.

In a numerical algorithm, errors grow in each step of the calculation. Let ε be an
initial error and Rn(ε) represent the growth of the error after n subsequent operations
due to ε.

If |Rn(ε)| ≈ C n ε, where C is a constant independent of n, then the growth of error is
called linear. Such linear growth of error is unavoidable and is not serious, and the
results are generally acceptable when C and ε are small. An algorithm that exhibits
linear growth of error is stable.

If |Rn(ε)| ≈ C k^n ε, k > 1, C > 0, with k and C independent of n, then the growth of
error is called exponential. Since the term k^n becomes large even for relatively small
values of n, the final result will be completely erroneous in the case of exponential
growth of error. Such an algorithm is called unstable.

Example 14: Let

yn = n! (e – (1 + 1/1! + 1/2! + … + 1/n!))                    (1)

Then

yn = 1/(n + 1) + 1/((n + 1)(n + 2)) + …                       (2)

so that

yn < 1/n + 1/n^2 + 1/n^3 + … = (1/n)/(1 – 1/n) = 1/(n – 1)

and 0 ≤ yn < 1/(n – 1), i.e. yn → 0 as n → ∞.

Thus {yn} is a monotonically decreasing sequence which converges to zero. The value
of y9 using (2) is y9 = .10991, correct to 5 significant figures.

Now if we use (1) by writing

yn+1 = (n + 1)! (e – (1 + 1/1! + 1/2! + … + 1/(n + 1)!))

we get the recurrence

yn+1 = (n + 1) yn – 1                                         (3)

Using (3) and starting with y0 = e – 1 = 1.7183, we get

y1 = .7183
y2 = .4366
y3 = .3098
y4 = .2392
y5 = .1960
y6 = .1760
y7 = .2320
y8 = .8560
y9 = 6.7040

This value is not correct even to a single significant digit, because the algorithm is
unstable. This is shown computationally. Now we show it theoretically.

Let y*n be the value computed by (3). Then we have

yn+1 = (n + 1) yn – 1
y*n+1 = (n + 1) y*n – 1

yn+1 – y*n+1 = (n + 1)(yn – y*n)

i.e. en+1 = (n + 1) en, so en+1 = (n + 1)! e0 and

|en+1| > 2^n |e0| for n > 1, i.e. |en| > (1/2) 2^n |e0|

Here k = 2, hence the growth of error is exponential and the algorithm is unstable.
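The same blow-up can be seen in ordinary double precision, where the initial error in y0 is only about 10^-16; the factor n! still overwhelms it, just at a larger n than in the 5-digit computation above.

```python
# Run the unstable recurrence y_{n+1} = (n+1) y_n - 1 in double precision.
# y0 = e - 1 carries a rounding error of about 1e-16, which is multiplied
# by (n+1) at every step, i.e. by n! overall.
import math

y = math.e - 1.0                 # y0, already slightly in error
forward = []
for n in range(25):              # produces y1, y2, ..., y25
    y = (n + 1) * y - 1.0
    forward.append(y)

# The true y_n lies in (0, 1/(n-1)); the computed y25 is far outside.
assert not (0.0 < forward[-1] < 1.0 / 24.0)
```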

Example 15: The integral En = ∫₀¹ xⁿ e^(x–1) dx is positive for all n ≥ 0. But if we
integrate by parts, we get

En = 1 – n En–1    (= [xⁿ e^(x–1)]₀¹ – ∫₀¹ n x^(n–1) e^(x–1) dx)

Starting from E1 = .36787968 as an approximation to 1/e (the accurate value of E1)
correct to 7 significant digits, we observe that En becomes negative after a finite
number of iterations (in 8 digit arithmetic). Explain.

Solution:

Let E*n be the computed value of En. Then

En – E*n = –n(En–1 – E*n–1)
en = (–1)^(n–1) n! e1

so |en| ≥ (1/2) 2^n |e1|; hence the process is unstable.

Using 4 digit floating point arithmetic and E1 = 0.3678 × 10^0, we have E2 = 0.2650,
E3 = 0.2050, E4 = 0.1800, E5 = 0.1000, E6 = 0.4000. By inspection of the arithmetic,
the error in the result is due to the rounding error committed in approximating E1.

Correct values are E1 = 0.367879, E2 = 0.264242. Such an algorithm is known as an
unstable algorithm. It can be made into a stable one by rewriting the recurrence as

En–1 = (1 – En)/n,  n = …, 4, 3, 2.

This algorithm works backward from large n towards small n. To obtain a starting
value one can use the bound

En = ∫₀¹ xⁿ e^(x–1) dx ≤ ∫₀¹ xⁿ dx = 1/(n + 1).
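The backward recurrence is easy to verify in code (a sketch; the starting index N = 25 is an arbitrary choice).

```python
# Stable direction: start from the bound E_N <= 1/(N+1) at N = 25 and run
# E_{n-1} = (1 - E_n)/n downward. The starting error is divided by n at
# every step, so it is wiped out long before we reach E_1.
import math

N = 25
E = 1.0 / (N + 1)                # crude starting guess for E_25
for n in range(N, 1, -1):        # n = 25, 24, ..., 2
    E = (1.0 - E) / n            # E_{n-1} = (1 - E_n)/n
assert abs(E - math.exp(-1)) < 1e-12   # E_1 = 1/e, essentially exactly
```

Even though the starting guess is only good to one digit or so, E_1 comes out accurate to machine precision: the factorial that amplified errors in the forward direction now suppresses them.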

1.4 SUMMARY

In this unit we have covered the following:

After discussing floating-point representation of numbers, we have discussed the
arithmetic operations with normalized floating-point numbers. This leads to a
discussion on rounding errors. We have also discussed other sources of errors, like
propagated errors, loss of significant digits, etc. A very brief idea about the stability
or instability of a numerical algorithm is also presented.

1.5 EXERCISES
E1) Give the floating point representation of the following numbers as 2 decimal
digit and 4 decimal digit floating point numbers, using (i) rounding and (ii) chopping.
(a) 37.21829
(b) 0.022718
(c) 3000527.11059

E2) Show that a(b – c) ≠ ab – ac


where
a = .5555 × 101
b = .4545 × 101
c = .4535 × 101

E3) How many digits of significance will be lost in the following subtraction?
37.593621 – 37.584216

E4) What is the relative error in the computation of x – y, where x = 0.3721448693
and y = 0.3720214371, with five decimal digits of accuracy?

E5) If x* approximates x correct to 4 significant decimal figures/digits, then


calculate to how many significant decimal figures/digits ex*/100 approximates
ex/100.

E6) Find a way to calculate the following correctly to the number of digits used,
when x is near zero for (i) and (ii), and very much larger than α for (iii):

(i) f(x) = √(x^2 + 1) – 1
(ii) f(x) = x – sin x
(iii) f(x) = x – √(x^2 – α)

E7) Evaluate f(x) = x^3/(x – sin x) when x = .12 × 10^–10, using two digit arithmetic.

E8) Let u = (a – b)/c and v = a/c – b/c, where a = .41, b = .36 and c = .70. Using two
digit arithmetic, show that ev is nearly two times eu.

E9) Find the condition number of

(i) f(x) = √x
(ii) f(x) = 10/(1 – x^2)

and comment on its evaluation.

E10) Consider the solution of quadratic equation

x^2 + 111.11x + 1.2121 = 0

using five-decimal digit floating point chopped arithmetic.

1.6 SOLUTIONS/ANSWERS
E1)     rounding        chopping
(a) .37 × 10^2      .37 × 10^2
    .3722 × 10^2    .3721 × 10^2

(b) .23 × 10^–1     .22 × 10^–1
    .2272 × 10^–1   .2271 × 10^–1

(c) .30 × 10^7      .30 × 10^7
    .3001 × 10^7    .3000 × 10^7

Note: Let x be approximated by

ap…a1a0 . a–1a–2…a–q

In case a–q–1 > 5, x is rounded to

ap…a1a0 . a–1a–2…(a–q + 1)

In case a–q–1 = 5 and it is followed by at least one non-zero digit, x is rounded to

ap…a1a0 . a–1a–2…(a–q + 1)

In case a–q–1 = 5 and it is the last non-zero digit, x is rounded to

ap…a1a0 . a–1a–2…a–q if a–q is even, or to
ap…a1a0 . a–1a–2…(a–q + 1) if a–q is odd.

E2) Let
a = .5555 × 10^1
b = .4545 × 10^1
c = .4535 × 10^1

b – c = .0010 × 10^1 = .1000 × 10^–1

a(b – c) = (.5555 × 10^1) × (.1000 × 10^–1)
         = .05555 × 10^0
         = .5555 × 10^–1

ab = (.5555 × 10^1)(.4545 × 10^1) = .2524 × 10^2
ac = (.5555 × 10^1)(.4535 × 10^1) = .2519 × 10^2

and ab – ac = .2524 × 10^2 – .2519 × 10^2
            = .0005 × 10^2
            = .5000 × 10^–1

Hence a(b – c) ≠ ab – ac

E3) 37.593621 – 37.584216, i.e. (0.37593621)10^2 – (0.37584216)10^2

Here x* = (0.37593621)10^2, y* = (0.37584216)10^2, and assume each to be an
approximation to x and y, respectively, correct to seven significant digits.

Then, in eight-digit floating-point arithmetic,

z* = x* – y* = (0.00009405)10^2 = (0.94050000)10^–2

is the exact difference between x* and y*. But as an approximation to z = x – y,
z* is good only to three digits, since the fourth significant digit of z* is derived
from the eighth digits of x* and y*, both possibly in error. While the error in z* as
an approximation to z = x – y is at most the sum of the errors in x* and y*, the
relative error in z* is possibly 10,000 times the relative error in x* or y*. Loss of
significant digits is, therefore, dangerous only if we wish to keep the relative error
small.

Given |rx|, |ry| < (1/2) 10^(1–7),

z* = (0.9405)10^–2

is correct to three significant digits.

Max rz = (1/2) 10^(1–3) = 10,000 . (1/2) 10^–6 ≥ 10,000 rx, 10,000 ry

E4) With five decimal digit accuracy,

x* = 0.37214 × 10^0, y* = 0.37202 × 10^0

x* – y* = 0.00012 while x – y = 0.0001234322

((x – y) – (x* – y*))/(x – y) = 0.0000034322/0.0001234322 ≈ 3 × 10^–2

The magnitude of this relative error is quite large when compared with the relative
errors of x* and y* (which cannot exceed 5 × 10^–5, and in this case are
approximately 1.3 × 10^–5).

E5) Here f(x) = e^(x/100)

r_f(x) ≈ (x f′(x*)/f(x)) rx = (x . e^(x*/100) . (1/100) / e^(x/100)) rx ≈ (x/100) rx

i.e., for x of moderate size,

|r_f(x)| ≈ (1/100) |rx| ≤ (1/100) . (1/2) 10^(1–4) = (1/2) 10^(1–6)

Therefore, e^(x*/100) approximates e^(x/100) correct to 6 significant decimal digits.

E6) (i) Consider the function f(x) = √(x^2 + 1) – 1, whose value may be required for
x near 0. Since √(x^2 + 1) ≈ 1 when x ≈ 0, we see that there is a potential loss of
significant digits in the subtraction. If we use five-decimal digit arithmetic and if
x = 10^–3, then f(x) will be computed as 0.

Whereas if we rationalize and write

f(x) = (√(x^2 + 1) – 1)(√(x^2 + 1) + 1) / (√(x^2 + 1) + 1) = x^2/(√(x^2 + 1) + 1)

we get the value as (1/2) × 10^–6.

(ii) Consider the function f(x) = x – sin x, whose value is required near x = 0. The
loss of significant digits can be recognised since sin x ≈ x when x ≈ 0. To avoid the
loss of significance we use the Taylor (Maclaurin) series for sin x:

sin x = x – x^3/3! + x^5/5! – x^7/7! + …

Then f(x) = x – sin x = x^3/3! – x^5/5! + x^7/7! – …

The series, starting with x^3/6, is very effective for calculating f(x) when x is small.

(iii) Consider the function f(x) = x – √(x^2 – α). Writing

f(x) = (x – √(x^2 – α))(x + √(x^2 – α)) / (x + √(x^2 – α)) = α/(x + √(x^2 – α))

avoids the subtraction, since when x is very large compared to α there would
otherwise be loss of significant digits.
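The effect in (i) can be seen directly in double precision (a sketch; x = 1e-8 is chosen as an assumption so that x² falls below machine epsilon).

```python
# x = 1e-8 makes x^2 = 1e-16 smaller than half an ulp of 1.0, so the
# naive form returns exactly 0 while the rationalized form keeps full
# accuracy (the true value is x^2/2 = 5e-17).
import math

x = 1e-8
naive = math.sqrt(x * x + 1.0) - 1.0
rational = (x * x) / (math.sqrt(x * x + 1.0) + 1.0)
assert naive == 0.0
assert abs(rational - 5e-17) < 1e-30
```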

E7) sin x = x – x^3/3! + x^5/5! – …

For x = .12 × 10^–10, sin x = .12 × 10^–10 – … ≈ .12 × 10^–10, so in two digit
arithmetic x – sin x = 0 and

f(x) = x^3/(x – sin x) = ∞

But f(x) = x^3/(x – sin x) can be simplified to

x^3/(x^3/3! – x^5/5! + …) = 1/(1/3! – x^2/5! + …)

The value of this for x = .12 × 10^–10 is 1/(1/3!) = 6.

E8) Using two digit arithmetic,

u = (a – b)/c = .71 × 10^–1
v = a/c – b/c = .59 – .51 = .80 × 10^–1

True value = .071428

u – fl(u) = eu = .000428
v – fl(v) = ev = .0008572

Thus ev is nearly two times eu, indicating that u is more accurate than v.

E9) The word "condition" is used to describe the sensitivity of the function value
f(x) to changes in the argument x. The informal formula for the condition of f at x is

max { |(f(x) – f(x*))/f(x)| / |(x – x*)/x| : |x – x*| "small" } ≈ |f′(x) x / f(x)|

The larger the condition, the more ill-conditioned the function is said to be.

(i) If f(x) = √x, the condition of f is approximately

|f′(x) x / f(x)| = |(1/(2√x)) x / √x| = 1/2

This indicates that taking square roots is a well conditioned process.

(ii) But if f(x) = 10/(1 – x^2),

|f′(x) x / f(x)| = |(20x/(1 – x^2)^2) x / (10/(1 – x^2))| = |2x^2/(1 – x^2)|

This number can be very large when x is near 1 or –1, signalling that the function is
quite ill-conditioned.

E10) Let us calculate

x = (–b ± √(b^2 – 4ac))/(2a)

x1 = (–111.11 + 111.09)/2 = –0.01000

while in fact x1 = –0.010910, correct to the number of digits shown. However, if we
calculate x1 as

x1 = –2c/(b + √(b^2 – 4ac))

in five-decimal digit arithmetic, we get

x1 = (–2 × 1.2121)/(111.11 + 111.09) = –2.4242/222.20 = –0.010910

which is accurate to five digits.
