Computer Arithmetic and Error Analysis
1.0 INTRODUCTION
When a calculator or digital computer is used to perform numerical calculations, an
unavoidable error, called round-off error, must be considered. This error arises
because the arithmetic operations performed in a machine involve numbers with only
a finite number of digits, with the result that many calculations are performed with
approximate representations of the actual numbers. The computer has the built-in
capability to perform only the basic arithmetic operations of addition, subtraction,
multiplication and division. When an algorithm is formulated, all other mathematical
operators are reduced to these basic operations, even when solving problems involving
the operations of calculus. We will discuss the way arithmetic operations are carried
out in a computer and some of the peculiarities of computer arithmetic. Finally, we
dwell on the propagation of errors.
1.1 OBJECTIVES
After going through this unit, you should be able to:
Computer Arithmetic and Solution of Non-linear Equations
2. Other real numbers, such as numbers with a decimal point.
In computers, all numbers are represented by a fixed, finite number of digits.
Thus, not all integers can be represented in a computer; only a finite number of
integers, depending upon the computer system, can be represented. The problem is
still more serious for non-integer real numbers, particularly for non-terminating
fractions.
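An n-digit floating-point number in base β (a given natural number) has the form
$$x = \pm(.d_1 d_2 \ldots d_n)_\beta\,\beta^e, \qquad 0 \le d_i < \beta,$$
where
$$(.d_1 d_2 \ldots d_n)_\beta = d_1 \times \frac{1}{\beta} + d_2 \times \frac{1}{\beta^2} + \cdots + d_n \times \frac{1}{\beta^n}$$
and e is an integer called the exponent.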
The exponent e is also limited to a range m < e < M, where m and M are integers
varying from computer to computer. Usually, m = –M.
In the IBM 1130, m = –128 (in binary), –39 (in decimal) and M = 127 (in binary), 38 (in
decimal).
Example 1: $fl\left(\frac{2}{3}\right) = .666667 \times 10^{0}$ in 6 decimal digit floating point representation.
Definition 3 (Chopping): fl(x) is chosen as the floating point number obtained by
deleting all the digits except the left-most n digits. Here $d_{n+1}$, $d_{n+2}$, etc. are neglected and
$fl(x) = \pm(.d_1 d_2 \ldots d_n)_\beta\,\beta^e$.
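Example 2: If the number of digits is n = 2, then
$fl\left(\frac{2}{3}\right) = (.67) \times 10^{0}$ rounded,
$fl\left(\frac{2}{3}\right) = (.66) \times 10^{0}$ chopped;
$fl(-83.7) = -(0.84) \times 10^{2}$ rounded,
$fl(-83.7) = -(0.83) \times 10^{2}$ chopped.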
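The two definitions of fl(x) can be tried out with Python's `decimal` module. The following is a minimal sketch, assuming base β = 10; the helper name `fl` and its `mode` argument are illustrative and do not come from the text.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(x, n, mode="round"):
    """Return x as an n-digit decimal floating point number.

    mode="round" rounds to nearest (ties away from zero);
    mode="chop" deletes every digit after the n-th one.
    """
    rounding = ROUND_HALF_UP if mode == "round" else ROUND_DOWN
    return Context(prec=n, rounding=rounding).create_decimal(x)

two_thirds = Decimal(2) / Decimal(3)
print(fl(two_thirds, 6))                                          # 0.666667
print(fl(two_thirds, 2), fl(two_thirds, 2, "chop"))               # 0.67 0.66
print(fl(Decimal("-83.7"), 2), fl(Decimal("-83.7"), 2, "chop"))   # -84 -83
```

The printed values are plain decimals; for instance -84 is the same number as -(0.84) × 10^2.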
On some computers, this definition of fl(x) is modified in case $|x| \ge \beta^{M}$ (overflow) or
$0 < |x| \le \beta^{m}$ (underflow), where m and M are the bounds on the exponents. Either
fl(x) is not defined in this case, causing a stop, or else fl(x) is represented by a special
number which is not subject to the usual rules of arithmetic when combined with
ordinary floating point numbers.
The relative error $r_x = \dfrac{x - fl(x)}{x}$ in fl(x) satisfies
(i) $|r_x| < \frac{1}{2}\beta^{1-n}$ if rounding is used.
(ii) $0 \le r_x \le \beta^{1-n}$ if chopping is used.
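Case 1. $d_{n+1} < \frac{1}{2}\beta$; then $fl(x) = \pm(.d_1 d_2 \ldots d_n)\beta^{e}$ and
$$|x - fl(x)| = (d_{n+1}.d_{n+2}\ldots)\,\beta^{e-n-1} \le \tfrac{1}{2}\beta \cdot \beta^{e-n-1} = \tfrac{1}{2}\beta^{e-n}$$
Case 2. $d_{n+1} \ge \frac{1}{2}\beta$; then $fl(x) = \pm\{(.d_1 d_2 \ldots d_n)\beta^{e} + \beta^{e-n}\}$ and
$$|x - fl(x)| = \left|(d_{n+1}.d_{n+2}\ldots)\,\beta^{e-n-1} - \beta^{e-n}\right| = \beta^{e-n-1}\left|d_{n+1}.d_{n+2}\ldots - \beta\right| \le \beta^{e-n-1} \times \tfrac{1}{2}\beta = \tfrac{1}{2}\beta^{e-n}$$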
2. Because arithmetic operations are carried out with a finite number of digits, the
computer generates errors in the solution of a problem; these are known as
generated errors or rounding errors.
3. Sensitivity of the algorithm of the numerical process used for computing f(x): if
small changes in the initial data x lead to large errors in the value of f(x), then
the algorithm is called unstable.
sin x, cos x or f(x) by Maclaurin's or Taylor series expansion. Such errors are
called truncation errors.
Generated Error
Error arising from an inexact arithmetic operation is called generated error.
Arithmetic operations are inexact because the machine works with a finite number of
digits; if an operation were done with the (ideal) infinite digit representation, this
error would not appear. An arithmetic operation on two floating point numbers
of the same length n generally yields a floating point number of a different length m
(usually m > n). The computer cannot store the resulting number exactly, since it can
represent only numbers of length n. So only n digits are stored, and this gives rise to error.
$$\frac{a}{b} = \frac{23}{300} = 0.076666\ldots \quad (0.76666\text{E}-1)$$
If two decimal digit arithmetic with chopping is used, then $\left(\frac{a}{b}\right)^{*} = .76 \times 10^{-1}$ (0.76E–1).
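Generated error is given by $x\,\omega\,y - x\,\omega^{*}\,y$, where ω denotes an arithmetic operation and ω* the
corresponding machine operation. However, computers are designed in such a
way that $x\,\omega^{*}\,y = fl(x\,\omega\,y)$, so the relative generated error
$$\text{r.g.e.} = r_{x\omega y} = \frac{x\,\omega\,y - x\,\omega^{*}\,y}{x\,\omega\,y}$$
satisfies $|\text{r.g.e.}| < \frac{1}{2}\beta^{1-n}$ if rounding is used.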
Due to generated error, the associative and the distributive laws of arithmetic are not
satisfied in some cases, as shown below.
In a computer, $3 \times \frac{1}{3}$ would be represented as 0.999999 (in the case of six significant
digits), but by hand computation it is one. This simple illustration suggests that
not everything goes well on computers. More precisely, 0.333333 + 0.333333
+ 0.333333 = 0.999999.
1.2.3 Non-Associativity of Arithmetic
Example 5: Let $a = 0.345 \times 10^{0}$, $b = 0.245 \times 10^{-3}$ and $c = 0.432 \times 10^{-3}$. Using
3-digit decimal arithmetic with rounding, we have
b + c = 0.000245 + 0.000432
= 0.000677 (in accumulator)
= $0.677 \times 10^{-3}$
a + (b + c) = 0.345 + 0.000677 = 0.345677 (in accumulator)
= $0.346 \times 10^{0}$ (in memory) with rounding
a + b = $0.345 \times 10^{0} + 0.245 \times 10^{-3}$ = 0.345245 (in accumulator)
= $0.345 \times 10^{0}$ (in memory)
(a + b) + c = 0.345432 (in accumulator)
= $0.345 \times 10^{0}$ (in memory)
Hence a + (b + c) ≠ (a + b) + c.
The above examples show that the error is due to finite digit arithmetic.
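The failure of associativity in Example 5 can be reproduced with a 3-digit decimal context. This is a minimal sketch, assuming that Python's `decimal` module with `prec=3` and round-half-up is an acceptable stand-in for the 3-digit machine of the example; the names `ctx`, `left` and `right` are illustrative.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

ctx = Context(prec=3, rounding=ROUND_HALF_UP)  # 3-digit decimal arithmetic

a = Decimal("0.345")
b = Decimal("0.245E-3")
c = Decimal("0.432E-3")

left = ctx.add(a, ctx.add(b, c))    # a + (b + c)
right = ctx.add(ctx.add(a, b), c)   # (a + b) + c
print(left, right)                  # 0.346 0.345
```

Each call to `ctx.add` rounds its exact sum to three significant digits, which plays the role of moving the accumulator contents into memory.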
$$|x - x^{*}| = 0.0005 = \tfrac{1}{2}(.001) = \tfrac{1}{2} \times 10^{-3}$$
$$\frac{|x - x^{*}|}{|x|} = 0.0000044 \le .000005 = \tfrac{1}{2}(.00001) = \tfrac{1}{2}\,10^{-5} = \tfrac{1}{2} \times 10^{1-6}$$
Hence, x* approximates x correct to 6 significant decimal digits.
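$$= \frac{x\,\omega\,y - x^{*}\,\omega\,y^{*}}{x\,\omega\,y}$$
$$r_{x\omega y} = \frac{x\,\omega\,y - x^{*}\,\omega^{*}\,y^{*}}{x\,\omega\,y}$$
$$= \frac{x\,\omega\,y - x^{*}\,\omega\,y^{*}}{x\,\omega\,y} + \frac{x^{*}\,\omega\,y^{*} - x^{*}\,\omega^{*}\,y^{*}}{x\,\omega\,y}$$
$$\approx \frac{x\,\omega\,y - x^{*}\,\omega\,y^{*}}{x\,\omega\,y} + \frac{x^{*}\,\omega\,y^{*} - x^{*}\,\omega^{*}\,y^{*}}{x^{*}\,\omega\,y^{*}}$$
to a first approximation. So total relative error = relative propagated error +
relative generated error.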
$$r_{f(x)} = \frac{f(x) - f(x^{*})}{f(x)} \qquad (1)$$
Note: When f(x) in the denominator of the right-hand side is evaluated after simplification, f(x) must be
replaced by f(x*) in some cases. So
$$r_{f(x)} = \frac{x f'(x^{*})}{f(x)}\, r_x$$
The expression $\dfrac{x f'(x^{*})}{f(x)}$ is called the condition number of f(x) at x. The larger the
condition number, the more ill-conditioned the function is said to be.
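The role of the condition number can be checked numerically. The sketch below is illustrative only: the helper names `condition_number` and `relative_change`, the sample points and the perturbation size are all assumptions, and the derivative is evaluated at x rather than at x*.

```python
import math

def condition_number(f, df, x):
    """Return |x f'(x) / f(x)| for a function f with derivative df."""
    return abs(x * df(x) / f(x))

def relative_change(f, x, rx):
    """Relative error in f when x is replaced by x* = x (1 - rx)."""
    x_star = x * (1.0 - rx)
    return (f(x) - f(x_star)) / f(x)

rx = 1e-8  # assumed relative error in the argument
cases = [
    ("x**(1/10)", lambda t: t ** 0.1, lambda t: 0.1 * t ** -0.9, 2.0),
    ("exp(x)", math.exp, math.exp, 50.0),
]
for name, f, df, x in cases:
    k = condition_number(f, df, x)
    # The observed amplification r_f / r_x should be close to k.
    print(name, k, relative_change(f, x, rx) / rx)
```

For the tenth root the error is damped by a factor of about 10, while for the exponential at x = 50 it is amplified by a factor of about 50.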
Example 9:
1. Let $f(x) = x^{1/10}$ and let x* approximate x correct to n significant decimal digits.
Prove that f(x*) approximates f(x) correct to (n+1) significant decimal digits.
$$r_{f(x)} = r_x \cdot \frac{x f'(x^{*})}{f(x)} = r_x \cdot \frac{x \cdot \frac{1}{10}\,(x^{*})^{-9/10}}{x^{1/10}} \approx \frac{1}{10}\, r_x$$
$$|r_{f(x)}| = \frac{1}{10}\,|r_x| \le \frac{1}{10} \cdot \frac{1}{2} \cdot 10^{1-n} = \frac{1}{2}\,10^{1-(n+1)}$$
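Example 10: The function $f(x) = e^{x}$ is to be evaluated for any x, 0 ≤ x ≤ 50, correct
to at least 6 significant digits. What digit arithmetic should be used to get the required
accuracy?
$$r_{f(x)} = r_x\,\frac{x f'(x^{*})}{f(x)} = r_x\,\frac{x\,e^{x^{*}}}{e^{x}} \approx r_x\,x$$
For six correct significant digits we need $|r_{f(x)}| \le \frac{1}{2}\,10^{1-6}$. With n-digit arithmetic
$|r_x| \le \frac{1}{2}\,10^{1-n}$, and x ≤ 50, so it suffices that
$$50 \cdot \tfrac{1}{2}\,10^{1-n} \le \tfrac{1}{2}\,10^{-5} \quad\text{or}\quad 10^{-n} \le 2 \times 10^{-8}$$
$$-n \le -8 + \log_{10} 2$$
$$8 - \log_{10} 2 \le n \quad\text{or}\quad 8 - .3 \le n$$
That is, n ≥ 8.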
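The conclusion of Example 10 can be spot-checked numerically. The sketch below assumes ordinary double precision for the exponential itself and only rounds the argument to n significant digits; the helper names and the sample point x are illustrative, and a single sample does not replace the worst-case bound n ≥ 8.

```python
import math

def round_sig(x, n):
    """Round x to n significant decimal digits."""
    digits = n - 1
    return float(f"{x:.{digits}e}")

def rel_error_exp(x, n):
    """Relative error of exp(x*), where x* is x rounded to n digits."""
    x_star = round_sig(x, n)
    return abs(math.exp(x) - math.exp(x_star)) / math.exp(x)

x = 49.87654321      # arbitrary sample point near the upper end of [0, 50]
target = 0.5e-5      # (1/2) * 10**(1 - 6): six correct significant digits
for n in (6, 8):
    err = rel_error_exp(x, n)
    print(n, err, err <= target)
```

At this sample point a 6-digit argument misses the target while an 8-digit argument meets it, in line with the analysis.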
Now we can find the results for propagated error in addition, multiplication,
subtraction and division by using the above results.
(d) Division: $f(x,y) = \dfrac{x}{y}$
$$e_{x/y} = e_x \cdot \frac{1}{y} - e_y \cdot \frac{x}{y^{2}}$$
$$r_{x/y} = \frac{e_x}{x} - \frac{e_y}{y} = r_x - r_y$$
1.3 SOME PITFALLS IN COMPUTATIONS
As mentioned earlier, the computer arithmetic is not completely exact. Computer
arithmetic sometimes leads to undesirable consequences, which we discuss below:
$$r_{x-y} = \frac{x}{x-y}\,r_x - \frac{y}{x-y}\,r_y$$
will become very large, and larger still if $r_x$ and $r_y$ are of opposite signs.
If x* and y* agree in one or more of the left-most digits, then those digits will
cancel and there will be a loss of significant digits.
The more digits on the left agree, the greater the loss of significant digits.
A similar loss of significant digits occurs when a number is divided by a small number
(or multiplied by a very large number).
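The amplification described by the formula for the relative error of a difference can be seen with a few lines of Python. This is a sketch with assumed sample values: the operands x and y and the error size r are illustrative, and the function name is hypothetical.

```python
def rel_error_of_difference(x, y, rx, ry):
    """Propagated relative error in x - y: x/(x-y)*rx - y/(x-y)*ry."""
    return x / (x - y) * rx - y / (x - y) * ry

x, y = 0.3454, 0.3443   # nearly equal operands (illustrative values)
r = 0.5e-3              # assumed size of the relative errors in x and y

same_sign = rel_error_of_difference(x, y, r, r)
opposite_sign = rel_error_of_difference(x, y, r, -r)
print(same_sign, opposite_sign)
```

With errors of the same sign the result keeps a relative error of about 0.0005, whereas with errors of opposite signs it grows to roughly 0.31, more than six hundred times larger.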
Remark 1
Solution:
Given $r_x, r_y \le \frac{1}{2}\,10^{1-3}$,
$$z^{*} = x^{*} - y^{*} = .3454 - .3443 = .0011 = .11 \times 10^{-2}$$
This is correct to one significant digit, since the last digits 4 in x* and 3 in y* are not
reliable and the second significant digit of z* is derived from the fourth digits of x* and y*.
$$\max r_z = \tfrac{1}{2}\,10^{1-1} = \tfrac{1}{2} = 100 \cdot \tfrac{1}{2} \cdot 10^{-2} \ge 100\,r_x,\ 100\,r_y$$
Example 12: Let $x = .657562 \times 10^{3}$ and $y = .657557 \times 10^{3}$. If we round these
numbers then
Now
$$\frac{u}{x-y} = \frac{.253 \times 10^{-2}}{.005} = \frac{253}{500} \approx \frac{1}{2}$$
whereas $\dfrac{u^{*}}{x^{*} - y^{*}} = \infty$.
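Example 13: Solve the quadratic equation $x^{2} + 9.9x - 1 = 0$ using two decimal digit
floating arithmetic with rounding.
Solution:
$$x = \frac{-b + \sqrt{b^{2} - 4ac}}{2a} = \frac{-9.9 + \sqrt{(9.9)^{2} - 4 \cdot 1 \cdot (-1)}}{2} = \frac{-9.9 + \sqrt{102}}{2} = \frac{-9.9 + 10}{2} = \frac{.1}{2} = .05$$
while the true solutions are $-10$ and 0.1. Now, if we rationalize the expression,
$$x = \frac{-b + \sqrt{b^{2} - 4ac}}{2a} = \frac{-4ac}{2a\left(b + \sqrt{b^{2} - 4ac}\right)} = \frac{-2c}{b + \sqrt{b^{2} - 4ac}} = \frac{2}{9.9 + \sqrt{102}}$$
$$= \frac{2}{9.9 + 10} = \frac{2}{19.9} = \frac{2}{20} \cong .1 \quad (0.1000024)$$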
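Both ways of computing the small root in Example 13 can be replayed in two-digit decimal arithmetic. The following is a minimal sketch, assuming that a `decimal` context with `prec=2` and round-half-up models the two-digit machine, rounding after every operation; the variable names are illustrative.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

ctx = Context(prec=2, rounding=ROUND_HALF_UP)  # two significant decimal digits

a, b, c = Decimal("1"), Decimal("9.9"), Decimal("-1")

b2 = ctx.multiply(b, b)                                  # 98.01 is stored as 98
four_ac = ctx.multiply(ctx.multiply(Decimal(4), a), c)   # -4
disc = ctx.subtract(b2, four_ac)                         # 102 is stored as 1.0E+2
root = ctx.sqrt(disc)                                    # 10

# Textbook formula: the subtraction -b + root cancels the leading digits.
naive = ctx.divide(ctx.add(ctx.minus(b), root), ctx.multiply(Decimal(2), a))

# Rationalized formula: -2c / (b + root) involves no cancellation.
stable = ctx.divide(ctx.multiply(Decimal(-2), c), ctx.add(b, root))

print(naive, stable)
```

The first value comes out as 0.05 and the second as 0.1, matching the hand computation.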
results are generally accepted when C and ε are small. An algorithm that exhibits
linear growth of error is stable.
If $R_n(\varepsilon) \approx C k^{n} \varepsilon$, k > 1, C > 0, with k and C independent of n, then the growth of error is
called exponential. Since the term $k^{n}$ becomes large for even relatively small values
of n, the final result will be completely erroneous in the case of exponential growth of
error. Such an algorithm is called unstable.
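Example 14:
Let
$$y_n = n!\left[e - \left(1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!}\right)\right] \qquad (1)$$
$$y_n = \frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \cdots \qquad (2)$$
$$y_n < \frac{1}{n} + \frac{1}{n^{2}} + \frac{1}{n^{3}} + \cdots$$
$$0 \le y_n < \frac{1/n}{1 - 1/n} = \frac{1}{n-1}$$
$$y_n \to 0 \text{ as } n \to \infty$$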
i.e. $\{y_n\}$ is a monotonically decreasing sequence which converges to zero. The value of
$y_9$ using (2) is $y_9 = .10991$, correct to 5 significant figures.
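$$y_{n+1} = (n+1)!\left[e - \left(1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{(n+1)!}\right)\right] = (n+1)\,y_n - 1$$
Starting from $y_0 = e - 1 = 1.7183$ and using this recurrence, we get
$y_1$ = .7183
$y_2$ = .4366
$y_3$ = .3098
$y_4$ = .2392
$y_5$ = .1960
$y_6$ = .1760
$y_7$ = .2320
$y_8$ = .8560
$y_9$ = 6.7040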
This value is not correct even to a single significant digit, because the algorithm is
unstable. This has been shown computationally; now we show it theoretically.
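The table of computed values can be regenerated in a few lines. This sketch assumes the recurrence y_{n+1} = (n + 1) y_n - 1, which follows from (1) and reproduces the tabulated values when started from the four-decimal value of y_0; the helper `y_series`, which sums the series (2) directly, is an illustrative name.

```python
from decimal import Decimal

def y_series(n, terms=30):
    """y_n from the series (2): 1/(n+1) + 1/((n+1)(n+2)) + ..."""
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term /= n + k
        total += term
    return total

y = Decimal("1.7183")        # y_0 = e - 1, rounded to four decimals
for n in range(9):
    y = (n + 1) * y - 1      # y_{n+1} = (n + 1) y_n - 1
    print(n + 1, y)

print(y_series(9))           # about 0.10991
```

The recurrence ends at 6.7040 for y_9, while the series gives about 0.10991: the initial rounding error of roughly 0.00002 in y_0 has been multiplied by 9!.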
$$y_{n+1} - y^{*}_{n+1} = (n+1)\,(y_n - y^{*}_n)$$
i.e.
$$e_{n+1} = (n+1)\,e_n$$
$$e_{n+1} = (n+1)!\,e_0$$
$$|e_{n+1}| > 2^{n}\,|e_0| \text{ for } n > 1$$
$$|e_n| > \tfrac{1}{2} \cdot 2^{n}\,|e_0|$$
Example 15: The integral $E_n = \int_0^1 x^{n} e^{x-1}\,dx$ is positive for all n ≥ 0. But if we
integrate by parts, we get
$$E_n = 1 - nE_{n-1} \quad \left(= \left. x^{n} e^{x-1} \right|_0^1 - \int_0^1 n\,x^{n-1} e^{x-1}\,dx\right).$$
Starting from $E_1 = .36787968$ as an approximation to $\frac{1}{e}$ (the accurate value of $E_1$) correct
to 7 significant digits, we observe that $E_n$ becomes negative after a finite number of
iterations (in 8 digit arithmetic). Explain.
Solution
$$E_n - E^{*}_n = -n\left(E_{n-1} - E^{*}_{n-1}\right)$$
$$e_n = (-1)^{n}\,n!\,e_0$$
$|e_n| \ge \frac{1}{2} \cdot 2^{n}\,|e_0|$; hence the process is unstable.
Using 4 digit floating point arithmetic and $E_1 = 0.3678 \times 10^{0}$ we have $E_2$ = 0.2650,
$E_3$ = 0.2050, $E_4$ = 0.1800, $E_5$ = 0.1000, $E_6$ = 0.4000. By inspection of the arithmetic,
the error in the result is due to the rounding error committed in approximating $E_2$.
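The sign change in Example 15 is easy to observe by running the recurrence. The sketch below assumes double precision floats in place of 8-digit decimal arithmetic, which does not matter here because the error in the starting value dominates; the variable names are illustrative.

```python
E = 0.36787968     # E_1, the approximation to 1/e used in the example
n = 1
while E > 0:
    n += 1
    E = 1 - n * E  # E_n = 1 - n E_{n-1}
    print(n, E)

first_negative = n  # index of the first negative computed E_n
print("first negative value at n =", first_negative)
```

The starting error of about 2.4e-7 is multiplied by n at every step, so by n = 10 it has grown to about 0.87 and swamps the true value, which is below 0.1.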
1.4 SUMMARY
In this unit, after discussing the floating-point representation of numbers, we
discussed arithmetic operations with normalized floating-point numbers. This led to a
discussion of rounding errors. We also discussed other sources of error, such as
propagated errors and loss of significant digits. A very brief idea of the stability or
instability of a numerical algorithm was also presented.
1.5 EXERCISES
E1) Give the floating point representation of the following numbers in 2 decimal
digit and 4 decimal digit floating point number using (i) rounding and (ii)
chopping.
(a) 37.21829
(b) 0.022718
(c) 3000527.11059
E3) How many bits of significance will be lost in the following subtraction?
37.593621 – 37.584216
correctly to the number of digits used when it is near zero for (i) and (ii),very
much larger than α for (iii)
E7) Evaluate $f(x) = \dfrac{x^{3}}{x - \sin x}$ when $x = .12 \times 10^{-10}$ using two digit arithmetic.
E8) Let $u = \dfrac{a-b}{c}$ and $v = \dfrac{a}{c} - \dfrac{b}{c}$ when a = .41, b = .36 and c = .70. Using two
digit arithmetic show that $e_v$ is nearly two times $e_u$.
$$x^{2} + 111.11x + 1.2121 = 0$$
1.6 SOLUTIONS/ANSWERS
E1)   rounding                   chopping
(a)   $.37 \times 10^{2}$        $.37 \times 10^{2}$
      $.3722 \times 10^{2}$      $.3721 \times 10^{2}$
(b)   $.23 \times 10^{-1}$       $.22 \times 10^{-1}$
      $.2272 \times 10^{-1}$     $.2271 \times 10^{-1}$
(c)   $.30 \times 10^{7}$        $.30 \times 10^{7}$
      $.3001 \times 10^{7}$      $.3000 \times 10^{7}$
E2) Let
$a = .5555 \times 10^{1}$
$b = .4545 \times 10^{1}$
$c = .4535 \times 10^{1}$
Hence a(b – c) ≠ ab – ac
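The inequality in E2 can be verified mechanically. This is a minimal sketch, assuming 4-digit decimal arithmetic with round-half-up (the digit count is inferred from the four-digit mantissas of a, b and c and is not stated explicitly here); `lhs` and `rhs` are illustrative names.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

ctx = Context(prec=4, rounding=ROUND_HALF_UP)  # assumed 4-digit arithmetic

a = Decimal("5.555")   # .5555 x 10^1
b = Decimal("4.545")   # .4545 x 10^1
c = Decimal("4.535")   # .4535 x 10^1

lhs = ctx.multiply(a, ctx.subtract(b, c))                    # a(b - c)
rhs = ctx.subtract(ctx.multiply(a, b), ctx.multiply(a, c))   # ab - ac
print(lhs, rhs)
```

Here a(b - c) keeps four good digits (0.05555), while ab - ac loses them in the subtraction of the two rounded products (25.25 - 25.19 = 0.06).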
and y*, the relative error in z* is possibly 10,000 times the relative error in x* or y*.
Loss of significant digits is, therefore, dangerous only if we wish to keep the relative
error small.
Given $r_x, r_y < \frac{1}{2}\,10^{1-7}$,
$$z^{*} = (0.9405)\,10^{-2}$$
$$\max r_z = \tfrac{1}{2}\,10^{1-3} = 10{,}000 \cdot \tfrac{1}{2}\,10^{-6} \ge 10{,}000\,r_x,\ 10{,}000\,r_y$$
$$\frac{(x - y) - (x^{*} - y^{*})}{x - y} = \frac{0.0000034322}{0.0001234322} \approx 3 \times 10^{-2}$$
The magnitude of this relative error is quite large when compared with the relative
errors of x* and y* (which cannot exceed $5 \times 10^{-5}$, and in this case are approximately
$1.3 \times 10^{-5}$).
E5) Here $f(x) = e^{x/100}$ and
$$r_{f(x)} \approx r_x \cdot \frac{x f'(x^{*})}{f(x)} = r_x \cdot e^{x/100} \cdot \frac{1}{100} \cdot \frac{1}{e^{x/100}}$$
i.e.
$$r_{f(x)} \approx \frac{1}{100}\,r_x \le \frac{1}{100} \cdot \frac{1}{2}\,10^{1-4} = \frac{1}{2}\,10^{1-6}.$$
Therefore, $e^{x^{*}/100}$ approximates $e^{x/100}$ correct to 6 significant decimal digits.
$$\sin x = x - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \frac{x^{7}}{7!} + \cdots$$
Then
$$f(x) = x - \sin x = \frac{x^{3}}{3!} - \frac{x^{5}}{5!} + \frac{x^{7}}{7!} - \cdots$$
The series starting with $\dfrac{x^{3}}{6}$ is very effective for calculating
f(x) when x is small.
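$f(x) = x - \sqrt{x^{2} - \alpha}$
as
$$f(x) = \left(x - \sqrt{x^{2} - \alpha}\right)\frac{x + \sqrt{x^{2} - \alpha}}{x + \sqrt{x^{2} - \alpha}} = \frac{\alpha}{x + \sqrt{x^{2} - \alpha}}$$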
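E7) $$\sin x = x - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \cdots$$
$$\sin\left(.12 \times 10^{-10}\right) = .12 \times 10^{-10} - .29 \times 10^{-33} + \cdots \approx .12 \times 10^{-10}$$
So, in two digit arithmetic, x – sin x = 0 and $f(x) = \dfrac{x^{3}}{x - \sin x} = \infty$.
But $f(x) = \dfrac{x^{3}}{x - \sin x}$ can be simplified to
$$\frac{x^{3}}{\dfrac{x^{3}}{3!} - \dfrac{x^{5}}{5!} + \cdots} = \frac{1}{\dfrac{1}{3!} - \dfrac{x^{2}}{5!} + \cdots}$$
The value of $\dfrac{x^{3}}{x - \sin x}$ for $x = .12 \times 10^{-10}$
is $\dfrac{1}{1/3!} = 6$.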
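The same breakdown occurs in ordinary double precision, not only in two digit arithmetic. The sketch below is illustrative: the function names `f_naive` and `f_series` are hypothetical, and the series is truncated after three terms, which is more than enough for an argument this small.

```python
import math

def f_naive(x):
    """x**3 / (x - sin x), evaluated directly."""
    denom = x - math.sin(x)
    return math.inf if denom == 0.0 else x ** 3 / denom

def f_series(x):
    """Same function via 1 / (1/3! - x**2/5! + x**4/7!)."""
    return 1.0 / (1.0 / 6.0 - x ** 2 / 120.0 + x ** 4 / 5040.0)

x = 0.12e-10
print(f_naive(x), f_series(x))
```

For this x the computed sin x agrees with x to every stored digit, so the direct formula has nothing meaningful left in the denominator, while the rearranged series returns a value very close to 6.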
E9) The word condition is used to describe the sensitivity of the function value
f(x) to changes in the argument x. The informal formula for the condition of
f at x is
$$\max\left\{ \frac{|f(x) - f(x^{*})|}{|f(x)|} \Big/ \frac{|x - x^{*}|}{|x|} \;:\; |x - x^{*}| \text{ "small"} \right\} \approx \left|\frac{f'(x)\,x}{f(x)}\right|$$
The larger the condition, the more ill-conditioned the function is said to be.
$$\frac{f'(x)\,x}{f(x)} = \frac{20x\left(1 - x^{2}\right)^{-2} x}{10\left(1 - x^{2}\right)^{-1}} = \frac{2x^{2}}{1 - x^{2}}$$
This number can be very large when x is near 1 or –1, signalling that the
function is quite ill-conditioned.