Flashpap
Karl-Dietrich Neubert
Abstract
1 Introduction
Great attention has been paid in the past to sorting algorithms based on the
comparison of elements [1,2,3]. In theory, these algorithms require O(N^2) time if
simple and O(N log N) time if more complex. In contrast, sorting algorithms based
on the classification of elements have received only limited attention [1,2,3]. These
algorithms perform ordering in O(N) time and thus achieve the lowest possible
time complexity for sorting N elements [4]. However, since sorting by classification
is believed to require considerable auxiliary memory space, it has not found wide
acceptance despite its favorable time behaviour.
Sorting may be viewed as the permutation that is the inverse of the
permutation producing the unsorted array of elements from the sorted one. In the
following, "permutation" always means this inverse permutation. It is an inherent
property of permutations that, in general, they consist not of a single cycle
but of many (Fig. 1) [5].
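Fig. 1 (data recovered from the figure) Two examples with number of strings N = 7.
Each row is an array position; the columns give the key value at that position
at the start and after each successive move.

Left example
Position   Start  Move 1  Move 2  Move 3  Move 4  Move 5  Move 6  Move 7
   6         8      8      11      11      11      11      11      11
   5         2      2       2       2      11      11      11      11
   4        11      9       9       9       9       9       9       9
   3        11     11      11       8       8       8       8       8
   2         1      1       1       1       1       1       4       4
   1         4      4       4       4       4       2       2       2
   0         9      9       9       9       9       9       9       1

Cycle Leader Position   Key-Value   Cycle Length
         0                  9            7

Right example
Position   Start  Move 1  Move 2  Move 3  Move 4  Move 5  Move 6  Move 7
   6         0      0       0       0      11      11      11      11
   5         2      2       2       2       2       2      10      10
   4         5      6       6       6       6       6       6       6
   3         4      4       5       5       5       5       5       5
   2        11     11      11       4       4       4       4       4
   1        10     10      10      10      10      10      10       2
   0         6      6       6       6       6       0       0       0

Cycle Leader Position   Key-Value   Cycle Length
         0                  6            5
         1                 10            2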
2 Design of Flash-Sort
In this paper we discuss the essence of the algorithm, Fig. 2, by assuming the
elements to be sorted to be strings of length 1 byte, stored in an array A(i),
i = 0, 1, 2, 3, ..., N-1. We take the view that the array is arranged vertically and
that in the ordered state, small numbers reside in the lower and large numbers in
the upper part of the array, i.e. the large numbers tend to sift up during ordering.
We introduce a vector L of length M. Because of its functional role we call this
vector the class pointer vector. In CLASSIFY the elements of the array A are
counted according to their key for each of the M classes. After completion of
L-VECTOR, each L(k) points to the highest position of class k, i.e. it is equal to
the cumulative number of elements A(i) in all the classes 0 through k, minus 1.
The final component L(M-1) is therefore equal to N-1, independent of
the distribution of the A(i) into the classes.
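
The two steps CLASSIFY and L-VECTOR can be illustrated outside of Forth. The
following is a minimal Python sketch, not the code of Fig. 2: the function name
class_pointers is hypothetical, and the key of an element is assumed to be its
byte value (no collation table).

```python
def class_pointers(a, m=256):
    """Return the class pointer vector L for the keys in a (each in range(m))."""
    # CLASSIFY: count the elements of each class.
    l = [0] * m
    for key in a:
        l[key] += 1
    # L-VECTOR: accumulate the counts, starting from -1, so that l[k]
    # is the highest array position belonging to class k.
    top = -1
    for k in range(m):
        top += l[k]
        l[k] = top
    return l


# Start column of the left example in Fig. 1, positions 0..6.
pointers = class_pointers([9, 4, 1, 11, 11, 2, 8], m=12)
print(pointers[11])  # 6, i.e. N-1 for N = 7
```

Starting the accumulation at -1 rather than 0 is what makes the last component
equal to N-1 instead of N.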
Then the words LEADER and PERMUTE are executed in turn until the sorting is
completed. In order to facilitate the discussion, we call the position A(i) "empty" if
A(i) has not yet been replaced by some other element. At the beginning of the
permutation, all positions A(i) are empty, since no element has been moved. If
during the permutation an element A(i) has been replaced by some other element,
we call its position "occupied."
Each cycle starts with a cycle leader. If, during the permutation cycle as described
by the word PERMUTE, the position of an element A(i) becomes occupied by some
element FLASH, the corresponding class pointer L(KEY(FLASH)) is
decremented and then points to the next empty position of the class
KEY(FLASH). If the last empty position of the class which provides the cycle
leader becomes occupied, the current permutation cycle is complete. The
completion of a cycle is flagged by the fact that the pointer L(KEY(A(j))) of the class
providing the cycle leader points to one position below the lowest position of this
class. Thus, if A(j) is a cycle leader, the completion of the corresponding cycle is
given by the condition

L(KEY(A(j))) < j                                                        (1)
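: KEY-VALUE   ( addr -- key )
   ( COLUMN @ + COLLATION-TABLE ) C@ ;

: CLASSIFY    \ count the elements of each class in L
   0 L M @ WSIZE * 0 FILL
   N @ 0 DO
      1 I A KEY-VALUE L +!
   LOOP ;

: L-VECTOR    \ accumulate the counts; L(k) = highest position of class k
   -1 M @ 0 DO
      I L DUP >R @ + DUP R> !
   LOOP DROP ;

: LEAP        \ optional: advance K to a class just below the leader's class
   JJ @ -1 > IF
      -1 K +!
      BEGIN
         1 K +!
         K @ 1+ L @ DUP A KEY-VALUE L @
         SWAP >=
      UNTIL
      K @ L @ JJ !
   THEN ;

: LEADER      \ find the next cycle leader by increasing JJ
   LEAP       \ optional
   BEGIN
      1 JJ +!
      JJ @ A KEY-VALUE L @ JJ @ >=
   UNTIL
   JJ @ A KEY-VALUE K ! ;

: PERMUTE     \ carry out one permutation cycle
   JJ @ A DUP @
   SWAP KEY-VALUE
   BEGIN
      K @ L @ JJ @ >= WHILE
      K !
      K @ L @ A DUP KEY-VALUE >R
      DUP @ >R
      !
      R>
      R>
      -1 K @ L +!
      -1 NMOVE +!
   REPEAT
   DROP DROP ;

Fig. 2 The words of Flash-Sort. The call of LEAP inside LEADER is optional.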
[Figure: array positions j = 97 ... 105 with A(j) in the classes k = 4 ... 7 and
the class pointers L(7) = 103 (EMPTY), L(6) = 101 (OCCUPIED) and
L(5) = 97 (OCCUPIED AND CLASS COMPLETED).]
Fig. 3 Various typical possibilities of pointer constellation. j = 0 ... 105 ... N.
Since the sorting cannot end within a cycle, this condition needs to be checked only
between cycles.
The word LEAP in the code, Fig. 2, is optional. If this word is omitted, a cycle leader
is found by considering every element above the last cycle leader to be a candidate
for cycle leader. Tempting as it may be, it is not possible to test only those
elements which are designated by the L-vector, because this vector points to the
highest elements of a class and not to the lowest. Fortunately, a hybrid method may
be used. The word LEAP first finds, by increasing not j but k, a class just below the
class which contains the cycle leader. A member of this class may then be used as
starting point for finding a new cycle leader by increasing j. Thus, if n is the
number of cycles, the number of steps to find a new cycle leader is reduced on
average from N/2 to n * N/(2M). With M = 256 and the conservative estimate
n = 8, the reduction factor M/n is equal to about 30.
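
The estimate can be checked with a few lines of arithmetic. The Python sketch
below only evaluates the two expressions N/2 and n * N/(2M) from the text; the
function name leader_search_steps is hypothetical.

```python
def leader_search_steps(n_elements, m_classes, n_cycles):
    """Average search steps without and with LEAP, and their ratio."""
    without_leap = n_elements / 2
    with_leap = n_cycles * n_elements / (2 * m_classes)
    return without_leap, with_leap, without_leap / with_leap


# N = 1 000 000 strings, M = 256 classes, n = 8 cycles.
print(leader_search_steps(1_000_000, 256, 8))  # (500000.0, 15625.0, 32.0)
```

The ratio simplifies to M/n and therefore does not depend on N.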
The reader may have noticed that we do not mention the well-recognized problem
connected with elements already in place before the permutation, which
cannot be distinguished from those put into place during the permutation. Here this
problem does not arise, since every element, independent of its class number and
its final position, is moved exactly once. In the special case that there is exactly one
empty position left in a class and the corresponding element is cycle leader, the
move degenerates into taking the element out of place and putting it back into the
same place. However, since the class pointer and move counter are still updated
correctly, no discrepancies arise from these cycles of length 1.
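
For readers unfamiliar with Forth, the interplay of CLASSIFY, L-VECTOR, LEADER
and PERMUTE may be easier to follow in another notation. The following Python
sketch is a transcription under stated assumptions, not the original
implementation: the key of an element is taken to be its value, the word LEAP is
left out, and the outer loop, which is not part of the listing in Fig. 2, is assumed to
run until a move counter nmove, started at N, reaches zero. The name
flash_sort_bytes is hypothetical.

```python
def flash_sort_bytes(a, m=256):
    """Sort a mutable sequence of integers in range(m) in place.

    Returns the number of permutation cycles that were needed.
    """
    n = len(a)
    # CLASSIFY: count the elements of each class.
    l = [0] * m
    for key in a:
        l[key] += 1
    # L-VECTOR: turn the counts into pointers to the top of each class.
    top = -1
    for k in range(m):
        top += l[k]
        l[k] = top

    nmove = n   # elements still to be moved (assumed driver condition)
    j = -1      # position of the current cycle leader
    cycles = 0
    while nmove > 0:
        # LEADER without LEAP: scan upwards for the first element whose
        # class pointer has not yet dropped below its own position.
        j += 1
        while l[a[j]] < j:
            j += 1
        k = a[j]
        # PERMUTE: follow one cycle until condition (1) holds.
        flash = a[j]
        while l[k] >= j:
            k = flash                       # class of the element in hand
            pos = l[k]                      # next empty position of that class
            a[pos], flash = flash, a[pos]   # place it, pick up the displaced one
            l[k] -= 1
            nmove -= 1
        cycles += 1
    return cycles


left = [9, 4, 1, 11, 11, 2, 8]   # start column of Fig. 1, left example
print(flash_sort_bytes(left, m=12), left)  # 1 [1, 2, 4, 8, 9, 11, 11]
```

The element picked up by the last move of a cycle is the stale copy of the cycle
leader and is simply discarded, which corresponds to the final DROP DROP in
PERMUTE. A cycle of length 1 passes through the inner loop exactly once, so the
class pointer and the move counter are updated as described above.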
3 Runtimes
Fig. 4 shows the runtime for sorting N random strings of 1 byte length as a
function of N, using a PC with a 166 MHz Pentium Processor and the 80386
UR/FORTH Vers. 1.21 of LMI. These runtimes are measured without the word
LEAP. The measured runtimes do not lie on a straight line, but within a cone as
marked by the shaded area. The reason for this spread becomes evident from
Fig. 5, where the runtimes for N = 10^6 strings are shown as a function of the
position of the last cycle leader. Relative to the minimal runtime, the runtimes
exhibit an offset which is proportional to the position of the last cycle leader. This
offset reflects the time needed for finding the last cycle leader. Clearly, the cone in
Fig. 4 is a consequence of this effect. The number of cycles, on the other hand, has
no influence on the runtime. This number is usually small, in this example
typically between 5 and 8. Obviously, the smallest runtime occurs if the last cycle
leader is the element A(0), which implies that there is only one cycle, a
very rare event indeed. In the other extreme, where the last cycle leader is nearly
the last element of the array (which is also extremely rare), about 30% of the total
runtime would be absorbed in finding cycle leaders.
[Fig. 4: runtime t [sec], 0 to 15, versus the number of strings N, 0 to 1 000 000.
Fig. 5: runtime t [sec], 0 to 15, for number of strings N = 1 000 000, versus the
position of the last cycle leader, axis from 0 to 1.]
The average time for finding cycle leaders is roughly half as large as the maximal
one, i.e. only about 15% of the total runtime is required on average. A cycle of
length 1, which occurs rather frequently for the last cycle, refers to an element
already in place, which need not be sorted. Hence, the condition of Eq. (3) could
be relaxed to NMOVE = 1, and the average time for finding cycle leaders would
be reduced accordingly.
With a little more effort, by including the word LEAP, we get a remarkable decrease
in the time for finding cycle leaders. This time is now, in accordance with the
estimate given above, barely measurable, and the total time for sorting 1 000 000
bytes is, with the given hardware, 11.92 sec ± 0.05 sec, i.e. the time for finding
cycle leaders is less than 1% of the total runtime.
4 Generalization
Sorting on one byte is of limited use for large numbers and was treated here in
order to study basic properties. We have also implemented a rather general,
recursive version of Flash-Sort. With that version, within the limits of available
memory, any number of strings of any length and any number of keys with
independently selectable collation sequences for any sort order of columns may be
sorted. The overhead due to the more complicated access to the data costs a
factor of about 3.5 in runtime, compared to the basic version presented here. This
is amply compensated by taking advantage of the Native Code Compiler (NCC)
provided by LMI, which results in a speedup by about a factor of 5. As an example,
sorting 100 000 strings of 50 bytes length with a 50 byte key takes about 4.6 sec;
sorting 10^6 strings of the same length and same number of keys requires 46 sec.
5 Discussion
not sort in place. The array B will then be copied back into array A. The runtime
difference between Flash-Sort and Counting-Sort is evidently the time required to
find the cycle leaders in Flash-Sort and it is just this additional time, which is the
cost in runtime to save the memory space of array B. For a given hardware
configuration, by not needing the memory space of array B, the maximal number of
elements which may be sorted with Flash-Sort, is almost doubled. The runtime cost
for finding cycle leaders for large N, N >> M, is less than 1% of the total runtime,
which is very low, indeed.
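
For comparison, the approach with an auxiliary array B can be sketched as
follows. This is a generic textbook-style Counting-Sort written in Python under
the same assumptions as before (key equal to the element value); the name
counting_sort_bytes is hypothetical and the code is not taken from any of the
cited implementations.

```python
def counting_sort_bytes(a, m=256):
    """Sort a mutable sequence of integers in range(m) via an auxiliary array."""
    n = len(a)
    # Count the elements of each class and accumulate the counts into
    # pointers to the highest position of each class.
    pointer = [0] * m
    for key in a:
        pointer[key] += 1
    top = -1
    for k in range(m):
        top += pointer[k]
        pointer[k] = top
    # Distribute into the auxiliary array B: N extra memory cells.
    b = [0] * n
    for key in reversed(a):
        b[pointer[key]] = key
        pointer[key] -= 1
    # Copy array B back into array A.
    a[:] = b


data = [9, 4, 1, 11, 11, 2, 8]
counting_sort_bytes(data, m=12)
print(data)  # [1, 2, 4, 8, 9, 11, 11]
```

No cycle leaders have to be found here, but the array b doubles the memory
needed for the elements, which is the trade-off discussed above.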
E-mail. karl-dietrich.neubert@usa.net
References