Arbitrary-precision arithmetic
In computer science, arbitrary-precision arithmetic is a technique whereby calculations are performed on numbers whose digits of precision are limited only by the available memory of the host system. This contrasts with the faster fixed-precision arithmetic found in most ALU hardware, which typically offers between 6 and 16 decimal digits. It is also called bignum arithmetic, and sometimes even "infinite-precision arithmetic" (which is a misnomer, since the number of digits is both finite and bounded in practice).
Several modern programming languages have built-in support for bignums, and others have libraries available for bignum integer and floating-point math. Rather than store values as a fixed number of binary bits related to the size of the processor register, these implementations typically use variable-length arrays of digits. Since division can introduce infinitely repeating decimals, they often store rational numbers as an arbitrary-precision integer for both the numerator and denominator, with the greatest common divisor divided out.
Arbitrary precision is used in applications where the speed of arithmetic is not a limiting factor, or where precise results with very large numbers are required. It should not be confused with the symbolic computation provided by many computer algebra systems, which represent numbers by expressions such as π·sin(2), and can thus represent any computable number with infinite precision. Yet symbolic computation can only be evaluated to a finite precision, since in the end the expression must be computed with arbitrary-precision arithmetic.
Applications
A common application is public-key cryptography (such as that in every modern Web browser), whose algorithms commonly employ arithmetic with integers of hundreds or thousands of digits. Another is in human-centric applications where artificial limits and overflows would be inappropriate. It is also useful for checking the results of fixed-precision calculations, and for determining the best possible value for coefficients needed in formulae (the √⅓ that appears in Gaussian integration, for example).
Arbitrary-precision arithmetic is also used to compute fundamental mathematical constants such as π to millions or more digits and to analyze the properties of the digit strings (e.g.[1]), or more generally to investigate the precise behaviour of functions such as the Riemann zeta function where certain questions are difficult to explore via analytical methods. Another example is in rendering fractal images with an extremely high magnification, such as those found in the Mandelbrot set.
Arbitrary-precision arithmetic can also be used to avoid overflow, which is an inherent limitation of fixed-precision arithmetic. Similar to a 5-digit odometer's display which changes from 99999 to 00000, a fixed-precision integer can exhibit wraparound if numbers grow too large to represent at the fixed level of precision. Some processors can deal with overflow by saturation, which means that if a result would be unrepresentable, it is replaced with the nearest representable value. (With 16-bit unsigned saturation, adding 1 to 65535 yields 65535 — see saturation arithmetic.) Some processors can generate an exception if an arithmetic result exceeds the available precision. Where necessary, the exception can be caught and the operation can be restarted in software with arbitrary-precision operands.
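Saturation can be demonstrated directly in a language with fixed-width integers. The following is a minimal C sketch (the helper name sat_add_u16 is ours, not from any standard API) of the 16-bit unsigned saturating addition described above:

```c
#include <stdio.h>
#include <stdint.h>

/* 16-bit unsigned saturating addition: a result that would exceed 65535
   is replaced by 65535 rather than wrapping around. */
static uint16_t sat_add_u16(uint16_t a, uint16_t b) {
    uint32_t sum = (uint32_t)a + b;   /* compute in a wider type first */
    return sum > UINT16_MAX ? UINT16_MAX : (uint16_t)sum;
}

int main(void) {
    printf("%u\n", (unsigned)sat_add_u16(65535, 1));  /* prints 65535, not 0 */
    return 0;
}
```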
Since many computers now routinely use 32-bit or even 64-bit integers, it can often be guaranteed that the integer numbers in a specific application will never grow large enough to cause an overflow. However, as time passes the exact nature of the constraint can be forgotten, as in implementations of binary search which often employ the form (L + R)/2: for correct functioning, it is the sum of L and R that must fit within the machine word size (e.g. 32 or 64 bits), not just the individual variables, as the sketch below illustrates.
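The hazard can be made concrete in C. In the following sketch (illustrative values only), both operands fit comfortably in a signed 32-bit integer, yet their sum does not; computing the midpoint as L + (R − L)/2 keeps every intermediate value in range, which is the usual repair for this binary-search bug:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int32_t l = 2000000000, r = 2100000000;   /* each fits in 32 bits */

    /* l + r = 4,100,000,000 exceeds INT32_MAX (2,147,483,647). The sum is
       formed in unsigned arithmetic here because signed overflow is
       undefined behaviour in C; converting it back to int32_t wraps to a
       negative value on typical two's-complement machines. */
    int32_t sum = (int32_t)((uint32_t)l + (uint32_t)r);
    int32_t bad = sum / 2;

    int32_t good = l + (r - l) / 2;   /* every intermediate stays in range */

    printf("naive midpoint: %ld\n", (long)bad);   /* wrong (negative)   */
    printf("safe midpoint:  %ld\n", (long)good);  /* prints 2050000000  */
    return 0;
}
```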
Some programming languages such as Lisp, Rexx, Python, Perl and Ruby use, or have an option to use, arbitrary-precision numbers for all integer arithmetic. Although this reduces performance, it eliminates the possibility of incorrect results (or exceptions) due to simple overflow. It also makes it possible to guarantee that arithmetic results will be the same on all machines, regardless of any particular machine's word size. The exclusive use of arbitrary-precision numbers in a programming language also simplifies the language, because "a number is a number" and there is no need for the panoply of types needed to represent different levels of precision.
Implementation issues
Arbitrary-precision arithmetic is considerably slower than arithmetic using numbers that fit entirely within processor registers, since the latter are usually implemented in hardware arithmetic whereas the former must be implemented in software[citation needed]. There are exceptions: certain "variable word length" machines of the 1950s and 1960s, notably the IBM 1401 and the Honeywell "Liberator" series, could manipulate numbers bound only by available storage, with an extra bit that delimited the value. Even when a computer lacks hardware for certain operations (such as integer division, or all floating-point operations) and software is provided instead, that software uses number sizes closely related to the available hardware registers: one or two words only, and certainly not N words.
Numbers can be stored in a fixed-point format, or in a floating-point format as a significand multiplied by an arbitrary exponent. However, since division almost immediately introduces infinitely repeating sequences of digits (such as 4/7 in decimal), should this possibility arise then either the representation must be truncated at some satisfactory size, or else rational numbers must be used: a large integer for the numerator and another for the denominator, with the greatest common divisor divided out. Unfortunately, arithmetic with rational numbers can become unwieldy very swiftly: 1/99 − 1/100 = 1/9900, and if 1/101 is then added, the result is 10001/999900.
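The rational-number example above can be reproduced with a toy rational type; in a real bignum package both fields would themselves be arbitrary-precision integers. A minimal C sketch (the type and helper names are ours):

```c
#include <stdio.h>
#include <stdint.h>

typedef struct { int64_t num, den; } rational;   /* numerator/denominator */

static int64_t gcd(int64_t a, int64_t b) {       /* Euclid's algorithm */
    while (b != 0) { int64_t t = a % b; a = b; b = t; }
    return a < 0 ? -a : a;
}

/* Add two rationals, then divide out the greatest common divisor. */
static rational rat_add(rational x, rational y) {
    rational r = { x.num * y.den + y.num * x.den, x.den * y.den };
    int64_t g = gcd(r.num, r.den);
    r.num /= g; r.den /= g;
    return r;
}

int main(void) {
    rational a = {1, 99}, b = {-1, 100}, c = {1, 101};
    rational d = rat_add(a, b);   /* 1/99 - 1/100: prints 1/9900 */
    rational e = rat_add(d, c);   /* + 1/101: prints 10001/999900 */
    printf("%lld/%lld\n", (long long)d.num, (long long)d.den);
    printf("%lld/%lld\n", (long long)e.num, (long long)e.den);
    return 0;
}
```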
The size of arbitrary-precision numbers is bounded not only by the total storage available, but also by the variables the software uses to index the digit strings, which are typically themselves limited in size.
Numerous algorithms have been developed to efficiently perform arithmetic operations on numbers stored with arbitrary precision. In particular, supposing that N digits are employed, algorithms have been designed to minimize the asymptotic complexity for large N.
The simplest algorithms are for addition and subtraction, where one simply adds or subtracts the digits in sequence, carrying as necessary, which yields an O(N) algorithm (see big O notation).
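As an illustration, here is a minimal C sketch of that addition algorithm on numbers stored as little-endian digit arrays (digit 0 is the ones place); the function name big_add is ours:

```c
#include <stdio.h>

/* O(N) schoolbook addition: add digits in sequence, carrying as necessary.
   Returns the digit count of the result; out needs room for max(na,nb)+1. */
static int big_add(const int *a, int na, const int *b, int nb,
                   int *out, int base) {
    int n = na > nb ? na : nb, carry = 0;
    for (int i = 0; i < n; i++) {
        int d = (i < na ? a[i] : 0) + (i < nb ? b[i] : 0) + carry;
        out[i] = d % base;
        carry = d / base;
    }
    if (carry) out[n++] = carry;
    return n;
}

int main(void) {
    int a[] = {9, 9, 9}, b[] = {7, 2}, out[4];   /* 999 + 27 */
    int n = big_add(a, 3, b, 2, out, 10);
    for (int i = n - 1; i >= 0; i--) printf("%d", out[i]);  /* prints 1026 */
    printf("\n");
    return 0;
}
```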
For multiplication, the most straightforward algorithm, the one used for multiplying numbers by hand (as taught in primary school), requires O(N²) operations; but multiplication algorithms that achieve O(N log N log log N) complexity have been devised, such as the Schönhage–Strassen algorithm, based on fast Fourier transforms, and there are also algorithms with slightly worse asymptotic complexity (such as Karatsuba multiplication) but with sometimes superior real-world performance for smaller N.
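For comparison with those asymptotically faster methods, here is a minimal C sketch of the schoolbook O(N²) method, again on little-endian digit arrays (the function name big_mul is ours):

```c
#include <stdio.h>

/* Schoolbook O(N^2) multiplication: each digit of a is multiplied by each
   digit of b, with carries propagated as in the hand method.
   out must have room for na+nb digits. */
static void big_mul(const int *a, int na, const int *b, int nb,
                    int *out, int base) {
    for (int i = 0; i < na + nb; i++) out[i] = 0;
    for (int i = 0; i < na; i++) {
        int carry = 0;
        for (int j = 0; j < nb; j++) {
            int d = out[i + j] + a[i] * b[j] + carry;
            out[i + j] = d % base;
            carry = d / base;
        }
        out[i + nb] += carry;
    }
}

int main(void) {
    int a[] = {9, 9}, b[] = {9, 9}, out[4];   /* 99 * 99 */
    big_mul(a, 2, b, 2, out, 10);
    for (int i = 3; i >= 0; i--) printf("%d", out[i]);  /* prints 9801 */
    printf("\n");
    return 0;
}
```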
For a list of algorithms along with complexity estimates, see: Computational complexity of mathematical operations
Example
The calculation of factorials produces very large numbers very swiftly. This is not a problem for their use in many formulae (such as Taylor series) because they appear along with other terms, so that, given careful attention to the order of evaluation, the net calculation value is not troublesome. If approximate values of factorial numbers are desired, Stirling's approximation n! ≈ √(2πn)·(n/e)ⁿ gives good results. If exact values are of interest, then the integer limit is soon exceeded. Even floating-point approximations soon exceed the maximum floating-point value possible, to the degree that such calculations must be recast in terms of the logarithm of the number.
But if exact values for large factorials are desired, then special software is required, somewhat as in the pseudocode that follows, which implements the classic primary-school algorithm to calculate 1, 1×2, 1×2×3, 1×2×3×4, etc., the successive factorial numbers.
Constant Limit = 1000;                     %Sufficient digits.
Constant Base = 10;                        %The base of the simulated arithmetic.
Array digit[1:Limit] of integer;           %The big number.
Integer carry,d;                           %Assistants during multiplication.
Integer last,i;                            %Indices to the big number's digits.
Array text[1:Limit] of character;          %Scratchpad for the output.
Constant tdigit[0:9] of character = ["0","1","2","3","4","5","6","7","8","9"];
BEGIN
  digit:=0;                                %Clear the whole array.
  digit[1]:=1;                             %The big number starts with 1,
  last:=1;                                 %Its highest-order digit is number 1.
  for n:=1 to 365 do                       %Step through the successive factorials.
    carry:=0;                              %Start a multiply.
    for i:=1 to last do                    %Step along every digit.
      d:=digit[i]*n + carry;               %The classic multiply.
      digit[i]:=d mod Base;                %The low-order digit of the result.
      carry:=d div Base;                   %The carry to the next digit.
    next i;
    while carry > 0 do                     %Store the carry in the big number.
      if last >= Limit then croak('Overflow!'); %If possible!
      last:=last + 1;                      %One more digit.
      digit[last]:=carry mod Base;         %Placed.
      carry:=carry div Base;               %The carry reduced.
    Wend                                   %With n > Base, maybe > 1 digit extra.
    text:=" ";                             %Blank the whole text array.
    for i:=1 to last do                    %Translate from binary to text.
      text[Limit - i + 1]:=tdigit[digit[i]];  %Reversing the order.
    next i;                                %Arabic numerals put the low order last.
    Print text," = ",n,"!";                %Print the result.
  next n;                                  %On to the next factorial.
END;
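The pseudocode above translates almost line for line into a runnable language. The following is a direct C rendering, a sketch that preserves the names and structure of the pseudocode:

```c
#include <stdio.h>

#define LIMIT 1000   /* sufficient digits */
#define BASE  10     /* the base of the simulated arithmetic */

int main(void) {
    int digit[LIMIT] = {1};   /* the big number, low-order digit first; starts as 1 */
    int last = 1;             /* count of digits currently in use */

    for (int n = 1; n <= 365; n++) {
        int carry = 0;
        for (int i = 0; i < last; i++) {   /* the classic multiply, digit by digit */
            int d = digit[i] * n + carry;
            digit[i] = d % BASE;
            carry = d / BASE;
        }
        while (carry > 0) {                /* store the carry in the big number */
            if (last >= LIMIT) { fprintf(stderr, "Overflow!\n"); return 1; }
            digit[last++] = carry % BASE;
            carry /= BASE;
        }
        for (int i = last - 1; i >= 0; i--)   /* highest-order digit first */
            putchar('0' + digit[i]);
        printf(" = %d!\n", n);
    }
    return 0;
}
```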
With the example in view, a number of details can be described. The most important is the choice of the representation of the big number. In this case, only integer values are required for factorials, so a fixed-point scheme is adequate. The powers of the base run from zero upwards, so it is convenient to have successive elements of the array represent successively higher powers. The computer language may not allow a convenient choice of array bounds (for example, the lower bound might have to be one, always, or zero, always), and the natural bounds of the calculation might not match a permitted bound anyway, so this example proceeds with an array starting from one, not zero, to demonstrate the simple issues of bookkeeping. That the index into the digit array corresponds to a certain power of the base is not directly utilised as a part of the method.
The second most important decision is the choice of the base of arithmetic, here ten. There are many considerations. The scratchpad variable d must be able to hold the result of a single-digit multiply plus the carry from the previous digit's multiply. In base ten, a sixteen-bit integer is certainly adequate, as it allows values up to 32767. However, this example cheats, in that the value of n is not itself limited to a single base-ten digit. The consequence is that the method will fail for n > 3200 or so, not a pressing limit in this example; in general, n would be a multi-digit big number also. A second consequence of the shortcut is that after the multi-digit multiply has been completed, the last value of carry may itself be a multi-digit number that must be carried into several higher-order digits, not just the single extra digit that would otherwise be normal.
Flowing from the choice of the base for the bignumber comes the issue of presenting its value. Because the base is ten, the result could be shown simply by printing the successive digits of the array digit, but they would appear with the highest-order digit last (so that a hundred and twenty-three would appear as "321") because of the first choice for the representation of the bignumber. The convention for Arabic numerals is the other way around, so the digits could instead be printed in reverse order. But that would present the number with leading zeroes ("00000...000123"), which may not be appreciated, so the final decision is to build the representation in a text variable and then print that. The first few results (with many leading spaces removed) are:
Reach of computer numbers:
1 = 1!
2 = 2!
6 = 3!
24 = 4!
120 = 5!   (largest factorial that fits in 8-bit unsigned)
720 = 6!
5040 = 7!
40320 = 8!   (largest factorial that fits in 16-bit unsigned)
362880 = 9!
3628800 = 10!
39916800 = 11!
479001600 = 12!   (largest factorial that fits in 32-bit unsigned)
6227020800 = 13!
87178291200 = 14!
1307674368000 = 15!
20922789888000 = 16!
355687428096000 = 17!
6402373705728000 = 18!
121645100408832000 = 19!
2432902008176640000 = 20!   (largest factorial that fits in 64-bit unsigned)
51090942171709440000 = 21!
1124000727777607680000 = 22!
25852016738884976640000 = 23!
620448401733239439360000 = 24!
15511210043330985984000000 = 25!
403291461126605635584000000 = 26!
10888869450418352160768000000 = 27!
304888344611713860501504000000 = 28!
8841761993739701954543616000000 = 29!
265252859812191058636308480000000 = 30!
8222838654177922817725562880000000 = 31!
263130836933693530167218012160000000 = 32!
8683317618811886495518194401280000000 = 33!
295232799039604140847618609643520000000 = 34!   (largest factorial that fits in 128-bit unsigned)
10333147966386144929666651337523200000000 = 35!
More serious attempts would try to use the available arithmetic of the computer more efficiently. A simple escalation would be to base 100 (with corresponding changes to the translation process for output); alternatively, bigger computer variables (such as 32-bit integers) could be used to enable larger bases, such as 10,000. Conversion from a non-decimal base to decimal for output is a significant computation, but working in bases closer to the computer's built-in integer operations offers real advantages. Operations on an integer holding a value such as six take just as long as the same operation on an integer holding a larger value, so there are large gains in packing as much of a bignumber into each element of the digit array as possible. The computer may also offer facilities for splitting a product into a digit and carry without requiring the two operations of mod and div as in the example. For instance, the IBM 1130's integer multiply of 16-bit integers (actually of a 32-bit accumulator-and-extension register pair, with a nominated 16-bit word) produced a 32-bit result which could be treated as two separate 16-bit words; thus, if the bignumber base were 65536, the carry would be in the high-order sixteen bits and the digit in the low-order sixteen bits, and no mod or div operations would be required to separate them.
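The same split can be written in portable C with a shift and a mask: with base 65536, the 32-bit product of a 16-bit digit and a small multiplier separates into digit and carry without mod or div. A sketch under those assumptions (the variable names are ours):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t digit = 54321;        /* one base-65536 digit of a bignumber */
    uint32_t n = 1000, carry = 7;  /* multiplier and incoming carry */

    /* 54321 * 1000 + 7 fits comfortably in 32 bits for this example. */
    uint32_t product = (uint32_t)digit * n + carry;
    uint16_t new_digit = (uint16_t)(product & 0xFFFFu);  /* low 16 bits  */
    uint32_t new_carry = product >> 16;                  /* high 16 bits */

    printf("digit = %u, carry = %u\n", (unsigned)new_digit,
           (unsigned)new_carry);
    return 0;
}
```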
This sort of detail is the grist of machine-code programmers, and a suitable bignumber routine in assembler can run orders of magnitude faster than the result of compiling a high-level language, which does not offer direct access to such facilities. Even there it may be possible to juggle 16-bit and 32-bit variables in cunning ways, but the tricks (essentially, arranging that a 32-bit variable overlays the same storage as two 16-bit variables) are frowned upon by computer-language purists; thus the EQUIVALENCE statement of Fortran and the OVERLAY statement of PL/I are deprecated.
For a single-digit multiply, the working variables must be able to hold the value (base − 1)² + carry, where the maximum value of the carry is (base − 1). Notice that the IBM 1130 offered a 32-bit working register for 16-bit arithmetic, so that many calculations whose intermediate results exceeded the 16-bit limit nevertheless worked. In a high-level language, if the bignumber's digit array were of unsigned 16-bit integers and the base 65536, the maximum result of a digit multiply would be 4,294,901,760, which exceeds the capacity of a 32-bit signed integer, 2³¹ − 1 = 2,147,483,647. The high-level language may not offer a 32-bit unsigned integer variable (limit 2³² − 1 = 4,294,967,295) even if the computer's internal arithmetic register allows this or is bigger still; and if 32-bit unsigned integers are available, what then of the 64-bit unsigned integers needed at the next escalation?
Similarly, the variables used to index the digit array are themselves limited to sixteen or thirty-two bits, etc.; even if sixty-four-bit integers were available, there is still an upper limit aside from storage availability. A simple approach would be to deal with the bignumber's digits in blocks of some convenient size, so that addressing would be via (block i, digit j) where i and j would be small integers; or one could escalate to employing bignumber techniques for the indexing variables themselves. Either way, storage limits await: the entire observable universe is estimated to contain no more than roughly 10⁸⁰ nuclear particles.
Choosing instead a base of 256 has the advantage of simplicity, and moreover it becomes quite possible to check the correct functioning of the basic arithmetic operations on all possible digit combinations. Errors in computer hardware are not unknown, and since the prime purpose of slow but exact (or at least high-precision) computation is to obtain definitive results, some sort of assurance is helpful.
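With base 256 there are only 65536 digit pairs, so the digit-and-carry step can be verified exhaustively against arithmetic in a wider type. A C sketch of such a self-test (the test logic is ours, not from any particular package):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* For base 256, try every pair of digits and confirm that the
       digit/carry split reassembles to the true product. */
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            uint16_t product = (uint16_t)(a * b);   /* at most 255*255 */
            unsigned digit = product & 0xFFu;       /* low-order digit */
            unsigned carry = product >> 8;          /* carry out       */
            if (carry * 256 + digit != a * b) {
                printf("mismatch at %u * %u\n", a, b);
                return 1;
            }
        }
    printf("all 65536 single-digit products verified\n");
    return 0;
}
```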
History
An early widespread implementation was available via the IBM 1620 of 1959–1970. The 1620 was a decimal-digit machine which used discrete transistors, yet it had hardware that used lookup tables to perform integer or floating-point arithmetic on digit strings of a length that could be anything from two up to whatever memory was available. (The mantissa of floating-point numbers was restricted to 100 digits or fewer, and the exponent to two digits only.) The largest memory supplied offered sixty thousand digits; however, Fortran compilers for the 1620 settled on fixed sizes such as ten, though the size could be specified on a control card if the default was not satisfactory.
IBM's first business computer, the IBM 702 (a vacuum-tube machine), implemented integer arithmetic entirely in hardware on digit strings of any length from one to 511 digits. The earliest widespread software implementation of arbitrary-precision arithmetic was probably that in Maclisp. Later, around 1980, the VAX/VMS and VM/CMS operating systems offered bignum facilities: as a collection of string functions in the one case, and in the EXEC 2 and REXX languages in the other.
Arbitrary-precision software
Stand-alone application software
Software that supports arbitrary precision computations:
- bc: an arbitrary-precision math program that comes standard on most *nix systems.
- Calc: a C-style arbitrary-precision calculator (LGPL2).
- Maxima: a computer algebra system whose "bignum" integers are directly inherited from its implementation language, Common Lisp. It also supports arbitrary-precision floating-point numbers ("bigfloats").
- Mathematica and several other computer algebra systems include arbitrary-precision arithmetic; Mathematica employs GMP for approximate-number computation.
- PARI/GP: an open-source computer algebra system that supports arbitrary precision.
- TTCalc: an open-source bignum mathematical calculator (TTMath library).
- Virtual Calc 2000: an arbitrary-precision and arbitrary-base calculator for Windows (shareware).
- Wcalc: an open-source calculator (GMP and MPFR libraries).
Online tools
Software that supports arbitrary precision computations available through a web browser:
- Big Online Calculator: a bignum calculator for floating point numbers (TTMath library).
Libraries
Arbitrary-precision arithmetic in most computer software is implemented by calling an external library that provides data types and subroutines to store numbers with the requested precision and to perform computations.
Different libraries have different ways of representing arbitrary-precision numbers: some work only with integers, while others store floating-point numbers in a variety of bases (powers of two or of ten). Rather than representing a number as a single value, some store numbers as a numerator/denominator pair (rationals), and some can fully represent computable real numbers.
| Package / Library Name | Number Type | Language | License |
|---|---|---|---|
| apfloat | Decimal floats, integers, rationals, and complex | Java and C++ | LGPL and Freeware |
| ARPREC and MPFUN | Integers, binary floats, complex binary floats | C++ with C++ and Fortran bindings | BSD |
| Base One Number Class | Decimal floats | C++ | Proprietary |
| bbnum library | Integers and floats | Assembler and C++ | New BSD |
| phpseclib | Decimal floats | PHP | LGPL |
| BigDigits | Naturals | C | Freeware |
| BigFloat | Binary floats | C++ | GPL |
| BigNum | Binary integers, floats (with math functions) | C# / .NET | Freeware |
| CLN, a Class Library for Numbers | Integers, rationals, and complex | C and C++ | GPL |
| Computable Real Numbers | Reals | Common Lisp | |
| IMSL | | C | Commercial |
| decNumber | Decimals | C | ICU licence (MIT licence) |
| FMLIB | Floats | Fortran | |
| GNU Multi-Precision Library (and MPFR) | Integers, rationals and floats | C and C++ with bindings (GMPY, ...) | LGPL |
| GNU Multi-Precision Library for .NET | Integers | C# / .NET | LGPL |
| Eiffel Arbitrary Precision Mathematics Library | Integers | Eiffel | LGPL |
| HugeCalc | Integers | C++ and Assembler | Proprietary |
| IMath | Integers and rationals | C | Freeware |
| IntX | Integers | C# / .NET | New BSD |
| JScience LargeInteger | Integers | Java | |
| libmpdec | Decimals | C and C++ | Simplified BSD |
| LibTomMath | Integers | C and C++ | Public domain |
| LiDIA | Integers, floats, complex floats and rationals | C and C++ | Free for non-commercial use |
| MAPM | Integers and decimal floats | C (bindings for C++ and Lua) | Freeware |
| MIRACL | Integers and rationals | C and C++ | Free for non-commercial use |
| MPI | Integers | C | LGPL |
| MPArith | Integers, floats, and rationals | Pascal / Delphi | zlib |
| mpmath | Floats, complex floats | Python | New BSD |
| NTL | Integers, floats | C and C++ | GPL |
| TTMath library | Integers and binary floats | Assembler and C++ | New BSD |
| vecLib.framework | Integers | C | Proprietary |
| W3b.Sine | Decimal floats | C# / .NET | New BSD |
Languages
Programming languages that support arbitrary-precision computations, either built in or in the standard library of the language:
- Common Lisp: The ANSI Common Lisp standard supports arbitrary precision integer, ratio and complex numbers.
- C# System.Numerics.BigInteger (from .NET 4.0)
- dc: the POSIX desk calculator
- Erlang: the built-in Integer datatype implements arbitrary-precision arithmetic.
- Haskell: the built-in Integer datatype implements arbitrary-precision arithmetic.
- Java: class java.math.BigInteger (integer), class java.math.BigDecimal (decimal)
- OCaml: The Num library supports arbitrary-precision integers and rationals.
- Python: the built-in int (3.x) / long (2.x) integer type is of arbitrary precision. The Decimal class in the standard library module decimal has user-definable precision.
- REXX (including Open Object Rexx and NetRexx)
- Ruby: the built-in Bignum integer type is of arbitrary precision. The BigDecimal class in the standard library module bigdecimal has user-definable precision.
- Scheme: R5RS encourages, and R6RS requires, that exact integers and exact rationals be of arbitrary precision.
- Scala: Class BigInt.
- Smalltalk (including Squeak, Smalltalk/X, GNU Smalltalk, Dolphin Smalltalk, etc.)
References
- ^ R. K. Pathria, "A Statistical Study of the Randomness amongst the first 10,000 Digits of Pi", Mathematics of Computation, vol. 16, no. 77–80 (1962), pp. 188–197; in which appears the remark "Such an extreme pattern is dangerous even if diluted by one of its neighbouring blocks", referring to the occurrence of the sequence 77 twenty-eight times in one block of a thousand digits.
- Knuth, Donald, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Section 4.3.1: The Classical Algorithms. ISBN 0-201-89684-2
- Implementing Multiple-Precision Arithmetic, Part 1