Handbook of Algorithms and Data Structures in Pascal and C PDF
Handbook of Algorithms and Data Structures in Pascal and C PDF
of Algorithms
and
Data Structures
In Pascal and C
Second Edition
INTERNATIONAL COMPUTER SCIENCE SERIES
G.H. Gonnet
ETH, Zurich
25 tes
A
vv
A D D I S O N -WESLEY
PUBLISHING
COMPANY
Wokingham, England Reading, Massachusetts M e n l o Park, California N e w York
D o n hlills, Ontario Amsterdam Bonn Sydney Singapore
Tokyo Madrid San Juan Milan Paris Mexico City Seoul * T a i p e i
01991 Addison-Wesley Publishers Ltd.
01991 Addison-Wesley Publishing Company Inc.
All rights reserved. NO part of this publication may be reproduced, stored in a retrieval system,
or transmitted in any form or by any means, electronic, mechanical, photocopying, recording
or otherwise, without prior written permission of the publisher.
The programs in this book have been included for their instructional value. They have been
tested with care but are not guaranteed for any particular purpose. The publisher does not offer
any warranties or representations, nor does it accept any liabilities with respect to the
programs.
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Addison-Wesley has made every attempt to supply trademark
information about manufacturers and their products mentioned in this book. A list of the
trademark designations and their owners appears on p. xiv.
ISBN 0-201-41607-7
vii
...
viii PREFACE
The first edition of this handbook has been very well received by the com-
munity, and this has given us the necessary momentum for writing a second
edition. In doing so, R. A . Baeza-Yates has joined me as a coauthor. Without
his help this version would have never appeared.
This second edition incorporates many new results and a new chapter on
text searching. The area of text managing, in particular searching, has risen in
importance and matured in recent times. The entire subject of the handbook
has matured too; our citations section has more than doubled in size. Table
searching algorithms account for a significant part of this growth.
Finally we would like to thank the over one hundred readers who notified us
about errors and misprints, they have helped us tremendously in correcting
all sorts of blemishes. We are especially grateful for the meticulous, even
amazing, work of Lynne Balfe, the proofreader. We will continue cheerfully
to pay $4.00 (increased due to inflation) for each first report of an error.
Preface vii
1 Introduction
1.1 Structure of the chapters
1.2 Naming of variables
1.3 Probabilities
1.4 Asymptotic notation
1.5 About the programming languages
1.6 On the code for the algorithms
1.7 Complexity measures and real timings
2 Basic Concepts 9
2.1 Data structure description 9
2.1.1 Grammar for data objects 9
2.1.2 Constraints for data objects 12
2.1.2.1 Sequential order 13
2.1.2.2 Uniqueness 13
2.1.2.3 Hierarchical order 13
2.1.2.4 Ilierarchical balance 13
2.1.2.5 Optimality 14
2.2 Algorithm descriptions 14
2.2.1 Basic (or atoiiic) operations 15
2.2.2 Building procedures 17
2.2.2.1 Composition 17
2.2.2.2 Alternation 21
2.2.2.3 Conformation 22
2.2.2.4 Self-organization 23
2.2.3 Interchangeability 23
is
x CONTENTS
3 Searching Algorithms 25
3.1 Sequential search 25
3.1.1 Basic sequential search 25
3.1.2 Self-organizing sequential search: move-to-front method 28
3.1.3 Self-organizing sequential search: transpose method 31
3.1.4 Optimal sequential search 34
3.1.5 Jump search 35
3.2 Sorted array search 36
3.2.1 Binary search 37
3.2.2 Interpolation search 39
3.2.3 Interpolationsequential search 42
3.3 Hashing 43
3.3.1 Practical hashing functions 47
3.3.2 Uniform probing hashing 48
3.3.3 Random probing hashing 50
3.3.4 Linear probing hashing 51
3.3.5 Double hashing 55
3.3.6 Quadratic hashing 57
3.3.7 Ordered and split-sequence hashing 59
3.3.8 Reorganization schemes 62
3.3.8.1 Brents algorithm 62
3.3.8.2 Binary tree hashing 64
3.3.8.3 Last-come-first-served hashing 67
3.3.8.4 Robin Hood hashing 69
3.3.8.5 Self-adjusting hashing 70
3.3.9 Optimal hashing 70
3.3.10 Direct chaining hashing 71
3.3.11 Separate chaining hashing 74
3.3.12 Coalesced hashing 77
3.3.13 Extendible hashing 80
3.3.14 Linear hashing 82
3.3.15 External hashing using minimal internal storage 85
3.3.16 Perfect hashing 87
3.3.17 Summary 90
3.4 Recursive structures search 91
3.4.1 Binary tree search 91
3.4.1.1 Randomly generated binary trees 94
3.4.1.2 Random binary trees 96
3.4.1.3 IIeight-balanced trees 97
3.4.1.4 IVeigh t- b alaiiced trees 100
3.4.1.5 Balancing by internal path reduction 102
3.4.1.6 Ileuristic organization schemes on binary trees 105
3.4.1.7 Optiinal binary tree search 109
3.4.1.8 Rotations in binary trees 112
3.4.1.9 Deletions in binary trees 114
CONTENTS xi
I1 AsymptoticExpansions 297
11.1 Asymptotic expansions of sums 298
11.2 Gamma-type expansions 300
11.3 Exponential- type expansions 30 1
11.4 Asymptotic expansions of sums and definite integrals contain-
ing e-xa 302
11.5 Doubly exponential forms 303
11.6 Roots of polynomials 304
11.7 Sums containing descending factorials 305
11.8 Summation formulas 307
Iiidex 415
Trademark notice
1
2 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
query
average comparisons add a record i n t o
variance accesses delete a record f r o m
minimum number of assignments when we m o d i f y a record o f
worstcase exchanges reorganize
aver age w. c. f u n c t i o n calls build
read sequentially
when working with external storage, and discuss any significant practical
considerations in using the algorithm externally.
(7) With the description of each algorithm we include a list of relevant
references. General references, surveys, or tutorials are collected at the
end of chapters or sections. The third appendix contains an alphabetical
list of all references with cross-references to the relevant algorithms.
The complexity measures are also named uniformly throughout the hand-
book. Complexity measures are named X: and should be read as the number
of X s performed or needed while doing 2 onto a structure of size n. Typical
values for X are:
null (no superscript): successful search (or default operation, when there
is only one possibility);
unsuccessful search;
C : construction (building) of structure;
D : deletion of an element;
E : extraction of an element (mostly for priority queues);
1 : insertion of a new element;
4 H A N D B O O K OF A L G O R I T H M S A N D D A T A S T R U C T U R E S
M : merging of structures;
Opt : optimal construction or optimal structure (the operation is usually
implicit);
M M : minimax, or minimum number of Xs in the worst case: this is
usually used to give upper and lower bounds on the complexity of a
problem.
Note that X,l means number of operations done to insert an element into a
+
structure of size n or to insert the n 1st element.
Although these measures are random variables (as these depend on the
particular structure on which they are measured), we will make exceptions
for Cn and Ck which most of the literature considers to be expected values.
1.3 Probabilities
The probability of a given event is denoted by Pr{event}. Random vari-
ables follow the convention described in the preceding section. The expected
value of a random variable X is written E [ X ] and its variance is a 2 ( X ) . In
particular, for discrete variables X
fb)= O(s(n)>
implies that there exists k and no such that I f(n) I < k g ( n ) for n > nI1.
f(n) = o(g(n)) 4 lim f (4 =
- 0
n-+oo s(n)
9
10 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
Met aproductions
LEAF :: nil; D.
N :: DIGIT; DIGIT N.
Secondly consider the specification for a hash table to be used with direct
chaining. The production
s - (string,int) : [ (string,int) , s - (string,int)]; nil
and M[1] yield
D -, {s - (string,int)}O
96
Hyperrules
HRPI datastructure : D.
HRPI s-D : [ D , s - D ] ; nil.
~ ~ 3 1b t - D - L E A F : [D,bt-D-LEAF,bt-D-LEAF];LEAF.
HR[4] mt - N - D - LEAF : [ int, {D}T, {mt - N - D - LEAF}!] ; LEAF.
HRN gt-D-LEAF : [ D , s - g t - D - L E A F ] ; LEAF.
HRPI tr-N-D: [ { t r - N - D } y ] ; [D]; nil.
In this multitree, each node contains 10 keys and has 11 descendants. Certain
restrictions on B-trees, however, are not included in this description (that
the number of actual keys is to be stored in the int field in each node, that
this number must be between 5 and 10, that the actual keys will be stored
contiguously in the keys-array starting at position 1, ...); these will instead be
defined as constraints (see below).
The grammar rules that we are using are inherently ambiguous. This is
not inconvenient; as a matter of fact it is even desirable. For example, consider
and
D + DICT 4 KEY)^ + {real)l10
Although both derivation trees produce the same object, the second one de-
scribes an array used as a sequential implementation of a dictionary structure,
while the first may just be a collection of real numbers. In other words, the
derivation tree used to produce the data objects contains important semantic
information and should not be ignored.
2.1.2.2 Uniqueness
Often it is convenient to disallow duplicate values in a structure, for example
in representing sets. At other times the property of uniqueness can be used
to ensure that records are not referenced several times in a structure (for
example, that a linear chain has no cycles or that every node in a tree has
only one parent).
Lexicographical trees
A lexicographical tree is a tree that satisfies the following condition for every
+
node s: if s has n keys ( k e y l , key2, ..., key,) stored in it, s must have n 1
descendant subtrees t o , t l , . . . ,tn. Furthermore, if do is any key in any node
of t o , dl any key in any node of t l , and so on, the inequality do 5 key1 5
d l 5 ... 5 k e y , 5 dn must hold.
Priority queues
A priority queue can be any kind of recursive structure in which an order
relation has been established between each node and its descendants. One
example of such an order relation would be to require that keyp 5 k e y d , where
k e y p is any key in a parent node, and keyd is any key in any descendant of
that node.
Height balance
Let s be any node of a tree (binary or multiway). Define h ( s ) as the height
of the subtree rooted in s, that is, the number of nodes in the tallest branch
starting at s. One structural quality that may be required is that the height of
a tree along any pair of adjacent branches be approximately the same. More
formally, the height balance constraint is I h(s1) - h(s2) I 5 6 where s1 and
s2 are any two subtrees of any node in the tree, and 6 is a constant giving
14 HANDBOOK OF ALGORTTHMS AND DATA STRUCTURES
. the maximum allowable height difference. In B-trees (see Section 3.4.2) for
example, S = 0, while in AVL-trees 6 = 1 (see Section 3.4.1.3).
Weight balance
For any tree, the weight function w(s) is defined as the number of external
nodes (leaves) in the subtree rooted at s. A weight balance condition requires
that for any two nodes s1 and s2,if they are both subtrees of any other node
in the tree, P 5 w(sl)/w(s2) 5 1 / where
~ P is a positive constant less than 1.
2.1.2.5 Optimality
Any condition on a data structure which minimizes a complexity measure
(such as the expected number of accesses or the maximum number of com-
parisons) is an optimality condition. If this minimized measure of complexity
is based on a worst-case value, the value is called the minimax; when the
minimized complexity measure is based on an average value, it is the minave.
In summary, the W-grammars are used to define the general shape or
pattern of the data objects. Once an object is generated, its validity is checked
against the semantic rules or constraints that may apply to it.
References:
[Pooch, U.W. et al., 731, [Aho, A.V. et al., 741, [Rosenberg, A.L., 741, [Rosen-
berg, A.L., 751, [Wirth, N., 761, [Claybrook, B.G., 771, [Hollander, C.R., 771,
[Honig, W.L. et al., 771, [MacVeigh, D.T., 771, [Rosenberg, A.L. et a / . , 771,
[Cremers, A.B. et al., 781, [Gotlieb, C.C. et al., 781, [Rosenberg, A.L., 781, [Bo-
brow, D.G. et d . ,791, [Burton, F.W., 791, [Rosenberg, A.L. et d . ,791, [Rosen-
berg, A.L. e t al., 801, [Vuillemin, J., 801, [Rosenberg, A.L., 811, [ODunlaing,
C. e t al., 821, [Gonnet, G.H. et al., 831, [Wirth, N., 861.
and R are all data structures; S is called the input structure, P contains
parameters (for example, to specify a query), and R is the result. The two
following examples illustrate these concepts:
(1) Quicksort is an algorithm that takes an array and sorts it. Since there
are no parameters,
BASIC CONCEPTS 15
(2) B-tree insertion is an algorithm that inserts a new record P into a B-tree
S, giving a new B-tree as a result. In functional notation,
B-tree-insertion: B-tree x new-record --f B-tree
References:
[Aho, A.V. et al., 741, [Wirth, N . , 761, [Bentley, J.L., 791, [Bentley, J.L., 791,
[Saxe, J.B. et al., 791, [Bentley, J.L. et al., 801, [Bentley, J.L. et al., 801, [Remy,
J.L., 801, [Mehlhorn, K. et al., 811, [Overmars, M.H. et al., 811, [Overmars,
M.H. et al., 811, [Overmars, M.H. et al., 811, [Overmars, M.H. et al., 811,
[Overmars, M.H., 811, [Rosenberg, A.L., 811, [Overmars, M.H. et al., 821,
[Gonnet, G.H. et al., 831, [Chazelle, B. et al., 861, [Wirth, N . , 861, [Tarjan,
R.E., 871, [Jacobs, D. et al., 881, [Manber, U., 881, [Rao, V.N.S. et al., 881,
[Lan, K.K., 891, [Mehlhorn, I(. et al., 901.
binary tree by replacing the nil values in the leaves by (tagged) references back
to appropriate nodes in the tree.
solve-pro bZem( A ) :
if size(A) <= Critical-Size
then End-Action
else begin
Split-problem;
solve-problem(A1);
solve-problem(A2);
...
Assemble- Results
end;
18 HANDBOOK OF ALGORJTHMS AND DATA STRUCTURES
Special cases of divide and conquer, when applied to trees, are tree traver-
sals.
Iterative application
This operates on an algorithm and a sequence of data structures. The algo-
rithm is iteratively applied using successive elements of the sequence in place
of the single element for which it was written. For example, insertion sort
iteratively inserts an element into a sorted sequence.
Iterative application
s o h e- p ro b le m( A ) :
for i:=l to size(A) do
Action on A[z];
End- Action
Tail recursion
This method is a composition involving one algorithm that specifies the crite-
rion for splitting a problem into (usually two) components and selecting one
of them to be solved recursively. A classical example is binary search.
Tail recursion
solve-pro blem(A):
if size(A) <= Critical-Size
then End-Action
else begin
Split and select subproblem i;
solve-pro blem( Aj)
end
BASIC CONCEPTS 19
Tail recursion
solve-problem( A ) :
while size(A) > Critical-Size do begin
Split and select subproblem i;
A := Ai
end;
End- Action
Inverted search
inverted-search(S, A , V):
{*** Search the value V of the attmbute A an
the structure S *** ]
search (search($ A ) , V)
The structure S on which the inverted search operates has to reflect these
two searching steps, For the generation of S, the following metaproductions
should be used:
20 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
Digital decomposition
This is applied to a problem of size n by attacking preferred-size pieces (for
example, pieces of size equal to a power of two). An algorithm is applied to
all these pieces to produce the desired result. One typical example is binary
decomposition.
Digital decomposition
Solve-problem(A, n )
{*** n has a digital decomposition n = n k P k + ... + nlP1 + no *** 1
Partition the problem into subsets
A = UkO U?i 3=1 A i ;
Merge
The merge technique applies an algorithm and a discarding rule t o two or
more sequences of data structures ordered on a common key. The algorithm
is iteratively applied using successive elements of the sequences in place of
the single elements for which it was written. The discarding rule controls the
iteration process. For example, set union, intersection, merge sort, and the
majority of business applications use merging.
Merge
Randomization
This is used to improve a procedure or to transform a procedure into a proba-
BASIC CONCEPTS 21
bilistic algorithm. This is appealing when the underlying procedure may fitil,
may not terminate, or may have a very bad worst case.
Randomization
solve-pro blem ( A )
repeat begin
ran domize(A ) ;
solve( ran do mized(A ) , t( A ) un its-of- time);
end until Solve-Succeeds or Too-Many- Iterations;
if Too-Many- Iterations
then return( No-Solution- Exists)
else return( Solution);
References:
[Bentley, J.L. et al., 761, p a o , A.C-C., 771, [Bentley, J.L. et al., 781, [Dwyer,
B., 811, [Chazelle, B., 831, [Lesuisse, R., 831, [Walah, T.R., 841, [Snir, M., 861,
[Karlsson, R.G. et al., 871, [Veroy, B.S., 881.
2.2.2.2 Alternation
Superimposition
solve-problern(A):
case 1: solve-problernl(A);
case 2: solve-problern2(A);
...
case n: solve-problern,(A)
Interleaving
This operation is a special case of alternation in which one algorithm does not
need to wait for other algorithms to terminate before starting its execution.
For example one algorithm might add records to a file while a second algorithm
makes deletions; interleaving the two would give an algorithm that performs
additions and deletions in a single pass through the file.
Recursion termination
This is an alternation that separates the majority of the structure manipu-
lations from the end actions. For example, checking for end of file on input,
for reaching a leaf in a search tree, or for reduction to a trivial sublist in a
binary search are applications of recursion termination. It is important to
realize that this form of alternation is as applicable to iterative processes as
recursive ones. Several examples of recursion termination were presented in
the previous section on composition (see, for example, divide and conquer).
2.2.2.3 Conformation
If an algorithm builds or changes a data structure, it is sometimes necessary
to perform more work to ensure that semantic rules and constraints on the
data structure are not violated. For example, when nodes are inserted into
or deleted from a tree, the trees height balance may be altered. As a result
it may become necessary to perform some action to restore balance in the
new tree. The process of combining an algorithm with a clean-up operation
on the data structure is called conformation (sometimes organization or
reorganization). In effect, conformation is a composition of two algorithms:
the original modification algorithm and the constraint satisfaction algorithm.
Because this form of composition has an acknowledged meaning to the algo-
rithms users, it is convenient to list it as a separate class of building operation
rather than as a variant of composition. Other examples of conformation in-
clude reordering elements in a modified list to restore lexicographic order,
percolating newly inserted elements to their appropriate locations in a prior-
ity queue, and removing all dangling (formerly incident) edges from a graph
BASIC CONCEPTS 23
2.2.2.4 Self-organization
This is a supplementary heuristic activity that an algorithm may often per-
form in the course of querying a structure. Not only does the algorithm do
its primary work, but it also reaccommodates the data structure in a way
designed to improve the performance of future queries. For example, a search
algorithm may relocate the desired element once it is found so that future
searches through the file will locate the record more quickly. Similarly, a page
management system may mark pages m they are accessed, in order that least
recently used pages may be identified for subsequent replacement.
Once again, this building procedure may be viewed as a special case of
composition (or of interleaving); however, its intent is not to build a func-
tionally different algorithm, but rather to augment an algorithm to include
improved performance characteristics.
2.2.3 Interchangeability
The framework described so far clearly satisfies two of its goals: it offers
sufficient detail to allow effective encoding in any programming language,
and it provides a uniformity of description to simplify teaching. It remains
to be shown that the approach can be used to discover similarities among
implementations as well as to design modifications that result in useful new
algorithms.
The primary vehicle for satisfying these goals is the application of inter-
changeability. Having decomposed algorithms into basic operations used in
simple combinations, one is quickly led to the idea of replacing any component
of an algorithm by something similar.
The simplest form of interchangeability is captured in the static objects
definition. The hyperrules emphasize similarities among the data structure
implementations by indicating the universe of uniform substitutions that can
be applied. For example, in any structure using a sequence of reals, the hyper-
rule for s - D together with that for D indicates that the sequence of reals can
be replaced by a sequence of integers, a sequence of binary trees, and so on.
Algorithms that deal with such modified structures need, at most, superficial
changes for manipulating the new sequences, although more extensive modi-
fications may be necessary in parts that deal directly with the components of
the sequence.
The next level of interchangeability results from the observation that some
data structure implementations can be used to simulate the behaviour of oth-
ers. For example, wherever a bounded sequence is used in an algorithm, it
may be replaced by an array, relying on the sequentiality of the integers to
access the arrays components in order. Sequences of unbounded length may
24 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
P r ( A , = i) -1 (1 5 i 5 n)
n
n+l
E[A,] = -2
n2 - 1
u2(A,) =
12
25
26 HANDBOOK OF A L G O N T H M S AND DATA STRUCTURES
is known as secondary key search. The third algorithm lmerts a new key
into the array without checking if the key already exists (this must be done
for primary keys). The last two algorithms deal with the search for primary
and secondary keys in linked lists.
begin
2 : = 1;
while (i<n) and ( k e y <> r[z].k)do i := i+l;
if r [ z ] . k k e y then search := i {*** found(r[z])***}
else search := -1; {*** notfound(key) ***}
end;
begin
r[n+l].k := key;
2 .- 1 ;
while k e y <> r[zl.kdo i := i+1;
if i <= n then search := 2 {*** found(r[z])***}
else search := -1; {*** notfound(key) ***)
end:
for i:=l t o n do
if k e y = dz1.k then found(r[z]);
SEARCHING A l X ~ 0 R . l T H M S 27
begin
if n>=m then Error {*** Table is full ***)
else begin
n := n+l;
r[n].k := k e y
end
end;
{ datarecord * p ;
for (p=list; p != NULL && k e y != p ->h; p = p ->next);
ret urn ( p ) ;
p := list;
while p <> nil do
begin
if k e y = p1.k then f o u n d ( p f ) ;
p := pT.next
end ;
(3) we are looking for secondary keys and a large number of hits ( O ( n ) )is
expected;
(4) testing extremely complicated conditions.
The sequential search can also look for a given range of keys instead of
one unique key, at no significant extra cost. Another advantage of this search
algorithm is that it imposes no restrictions on the order in which records are
stored in the list or array.
The efficiency of the sequential search improves somewhat when we use it
to examine external storage. Suppose each physical 1/0 operation retrieves b
records; we say that b is the blocking factor of the file, and we refer to each
block of b records as a bucket. Assume that there are a total of n records
in the external file we wish to search and let k = [ n / b J ,If we use E, as a
random variable representing the number of external accesses needed to find
a given record, we have
E[En] = k + 1 -
kb(k + 1) M -
k+1
2n 2
References:
[Knuth, D.E., 731, [Berman, G . et al., 741, [Knuth, D.E., 741, [Clark, D.W.,
761, [Wise, D.S., 761, [Reingold, E.M. et al., 771, [Gotlieb, C.C. e t al., 781,
[Hansen, W.J., 781, [Flajolet, P. et al., 791, [Flajolet, P. et al., 791, [Kronsjo,
L., 791, [Flajolet, P. e t al., SO], [Willard, D.E., 821, [Sedgewick, R., 881.
>
numbered in such a way that p l 2 p2 - ... 2 pn > 0 . With this model
we have
begin
begin
n := n+l;
new(p);
p1.k := key;
p f . n e x t := head;
i n s e r t := p ;
end;
21n 2 1 In
-------- a in3 a
Cn = +0(in5 a)
In a 2 24 2880
Wedge distribution: pi =
F n+l)
where p i is the optimal cost (see Section 3.1.4). The above formula is maxi-
mized for X = 2, and this is the worst-case possible probability distribution.
Table 3.1 gives the relative efficiency of move-to-front compared to the op-
timal arrangement of keys, when the list elements have accessing probabilities
which follow several different folklore distributions.
References:
[McCabe, J., 651, [Knuth, D.E., 731, [Hendricks, W.J., 761, [Rivest, R.L.,
761, [Bitner, J.R., 791, [Gonnet, G.H. et al., 791, [Gonnet, G.H. ei al., 811,
[Tenenbaum, A.M. ei al., 821, [Bentley, J.L. e t al., 851, [Hester, J.H. et al.,
851, [Hester, J.H. ei a!., 871, [Chung, F.R.K. ei al., 881, [Makinen, E., 881.
' If 'UUC
transposed with the record that immediately precedes it in the table (provided
of course that the record being sought was not already in the first position).
As with the move-to-front (see Section 3.1.2) technique, the object of this
rearrangement process is to improve the average access time for future searches
by moving the most frequently accessed records closer to the beginning of the
table. We have
This expected value of the number of the accesses to find an element can be
written in terms of permanents by
In general the transpose method gives better results than the move-to-front
(MTF) technique for stable probabilities. In fact, for all record accessing
probability distributions, we have
Cn
transpose
- n
< cMTF
When we look at the case of the unsuccessful search, however, both methods
have the identical result
A: = n
Below we give a code description of the transpose algorithm as it can be
applied to arrays. The transpose method can also be implemented efficiently
for lists, using an obvious adaptation of the array algorithm.
begin
2 .-
.- 1;
while ( i < n ) and (r[2].k <> key) do i := i + l ;
+
may take Q ( n 2 )accesses to come within a factor of 1 E of the final steady
state.
Because of this slow adaptation ability, the transpose algorithm is not
recommended for applications where accessing probabilities may change with
time.
For sequential searching of an array, the transpose heuristic is preferable
over the move-to-front heuristic.
Table 3.2 gives simulation results of the relative efficiency of the trans-
pose method compared to the optimal arrangement of keys, when the list
elements have accessing probabilities which follow several different 'folklore'
distributions. It appears that for all smooth distributions, the ratio between
transpose and the optimal converges to 1 as n 00. ---f
References:
[Hendricks, W.J., 761, [Rivest, R.L., 761, [Tenenbaum, A.M., 781, [Bitner, J.R.,
791, [Gonnet, G.H. et al., 791, [Gonnet, G.H. et al., 811, [Bentley, J.L. et al.,
851, [Hester, J.H. et al., 851, [Hester, J.H. e2 al., 871, [Makinen, E., 881.
i=l
n
g2(Afl) = i2pi - ( ' P ; ) ~
i=l
SEARCHING ALGORITHMS 35
Naturally, these improved efficiencies can only be achieved when the ac-
cessing probabilities are known in advance and do not change with time. In
practice, this is often not the case. Further, this ordering requires the over-
head of sorting all the keys initially according to access probability. Once the
sorting is done, however, the records do not need reorganization during the
actual search procedure.
rea dJirst-record;
while key > r.k do Jump-records;
while key < r.k do readprevious-record;
if key=r.k then found(r)
else notfound( key);
(2) Sequential files with compressed and/or encoded information when the
cost of decompressing and/or decoding is very high.
References:
[sk H . , 731, [Shneiderman, B., 781, [Janko, W., 811, [Leipala, T., $11,
[Guntzer, U. et al., 871.
General references:
[Shneiderman, B., 731, [Lodi, E. et al., 761, [Shneiderman, B. et al., 761, [Wirth,
N., 761, [Nevalainen, 0. et al., 771, [Allen, B. et al., 781, [Claybrook, B.G. et
al., 781, [McKellar, A.C. e2 al., 781, [Standish, T.A., 801, [Mehlhorn, K . , 841,
[Manolopoulos, Y.P. e2 ul., 861, [Wirth, N., 861, [Papadakis, T. et al., 901,
[Pugh, W., 901.
begin
i .-
.- n;
if n>=m then E r r o r {*** Table full ***)
else begin
n := n + l ;
while i>O do
if r[;J.k > k e y then begin
SEARCHING A L G O R I T H M S 37
rfi+l] := 421;
2 .- i-1
end
else goto 999; {*** break ***}
The above algorithm will not detect the insertion of duplicates, that is,
elements already present in the table. If we have all the elements available
at the same time, it is advantageous to sort them in order, as opposed to
inserting them one by one.
General references:
[Peterson, W.W., 571, [Price, C.E., 711, [Overholt, K.J., 731, [Horowitz, E. et
al., '761, [Guibas, L.J. et al., 771, [Flajolet, P. et al., 791, [Flajolet, P. et al.,
801, [Mehlhorn, K., 841, [Linial, N. et al., 851, [Manolopoulos, Y.P. et al., 861,
p u b a , T. et al., 871, [Pugh, W., 901.
2k+1
C, = C i = k + 2 - - M log, n
n+l
1
a2(AL) < -
- 12
If we use three-way comparisons and stop the search on equality, the number
of comparisons for the successful search changes to:
38 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
E[A,] = C, = k + l -
a2(An) =
3x 2'c+1 - (k + 2)2 - 2
n
k: 2.125 f ,125 + o(1)
c, = (I+ t) c:,- 1
begin
low := 0;
high := n;
while high-low > 1 do begin
j := (high+low) div 2;
if key <= $],k then high := j
else low := j
end;
if rfhigh1.k = key then search := high {*** found(rfhigh1) ***I
else search := -1; {*** notfound(key) ***)
end;
There are more efficient search algorithms than the binary search but
such methods must perform a number of special calculations: for example,
the interpolation search (see Section 3.2.2) calculates a special interpolation
function, while hashing algorithms (see Section 3.3) must compute one or
more hashing functions. The binary search is an optimal search algorithm
when we restrict our operations only to comparisons between keys.
Binary search is a very stable algorithm: the range of search times stays
very close to the average search time, and the variance of the search times
is O ( 1 ) . Another advantage of the binary search is that it is well suited to
searching for keys in a given range as well as searching for one unique key.
One drawback of the binary search is that it requires a sorted array. Thus
additions, deletions, and modifications to the records in the table can be
expensive, requiring work on the scale of O(n).
SEARCHING ALGORITHMS 39
Table 3.3 gives figures showing the performance of the three-way compar-
ison binary search for various array sizes.
n cn a2(An) c:,
5 2.2000 0.5600 2.6667
10 2.9000 0.8900 3.5455
50 4.8600 1.5204 5.7451
100 5.8000 1.7400 6.7327
500 7.9960 1.8600 8.9780
1000 8.9870 1.9228 9.9770
5000 11.3644 2.2004 12.3619
10000 12.3631 2.2131 13.3618
References:
[Arora, S.R. et al., 691, [Flores, I. et al., 711, [Jones, P.R., 721, [Knuth, D.E.,
731, [Overholt, K.J., 731, [Aho, A.V. e t al., 741, [Berman, G. et al., 741, [Bent-
ley, J.L. et al., 761, [Reingold, E.M. et al., 771, [Gotlieb, C.C. e t al., 781,
[Flajolet, P. et al., 791, [Kronsjo, L., 791, [Leipala, T., 791, v a o , A.C-C., 811,
[Erkioe, H. e t al., 831, [Lesuisse, R., 831, [Santoro, N. et al., 851, [Arazi, B.,
861, [Baase, S., 881, [Brassard, C.et al., 881, [Sedgewick, R., 881, [Manber, U.,
891.
k
1
P r { A , > IC} M n(l-,E2-')
begin
low := 1 ;
high := n;
while (dhigh1.L >= key) and ( k e y > (1owI.k) do
begin
j := t7unc((key-(low].k) / (r[high].k-dlow].k) *
(high-low)) low; +
if key > *].k then low := j+l
else if Ley < +].k then high := j-1
else low := j
end;
if T[low].k = key then search := low {*** found(r[low]) ***)
else search := - 1 ; {*** notfound(key) ***)
end:
n E[Anl Ln
5 0.915600f0.000039 1.45301f0.00014 1.28029f0.00009
10 1.25143f0.00010 2.18449 f 0.00024 1.50459f0.00015
50 1.91624f0.00029 3.83115f0.00083 2.02709 f0.00032
, 100 2.15273f0.00040 4.5588f0.0013 2.23968 f0.00042
500 2.60678f0.00075 6.1737f0.0029 2.67133f0.00073
1000 2.771 l f O . O O 1 O 6.8265f0.0040 2.83241f0.00094
5000 3.0962fO .OO 18 8.2185f0.0084 3.155 lf0.0017
10000 3.2 173f0.0023 8.749 f0.012 3.2760f0.0022
50000 3.4638 f 0 .0043 9.937 f0.025 3.5221f0.0043
From the above results we can see that the value for E[An] is close to the
value of log2 log2 n;in particular under the arbitrary assumption that
E[&] = ~rl0g2log2 n +P
for n 2 500, then
= 1.0756f0.0037 ,B = -0.797f0.012
References:
[Kruijer, H.S.M., 741, [Waters, S.J., 751, [Whikt, J.D. e2 al., 751, [Yao, A.C-C.
e2 al., 761, [Gonnet, G.H., 771, [Perl, Y. e2 al., 771, [Gotlieb, C.C. et al., 781,
[Perl, Y. e2 al., 781, [Franklin, IV.R., 791, [van der Nat, M., 791, [Burton, F.W.
e t al., SO], [Gonnet, G.11. . e 2 al., SO], [Ehrlich, G., 811, [Lewis, G.N. e2 al.,
811, [Burkhard, W.A., 831, [Mehlhorn, K. et al., 851, [Santoro, N. e2 al., 851,
[Manolopoulos, Y.P. e t al., 871, [Carlsson, S. et al., 881, [Manber, U., 891.
42 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
= (g)1/2+O(l)
As with the standard interpolation search (see Section 3.2.2), this method
requires an interpolation formula 4 such as $ ( a , n ) = [rial or + ( a , n ) =
[na + 11; for the code below we use the latter.
begin
if n > 1 then
begin
(*** initial probe location ***)
j := tmnc((key-r[I].k) / (~fn].IC-r[l].k) * ( n - I ) ) + 1;
if k e y < rb1.k then
while ( j > l ) and (Ley<+].k) do j := j-1
else while (j< n) and (key> rb].IC) do j := j+l
end
else j := 1 ;
SEARCHING ALGORITHMS 43
E[Enl
In addition to this reduction the accessed buckets are contiguous and hence
the seek time may be reduced.
Table 3.5 lists the expected number of accesses required for both successful
and unsuccessful searches for various table sizes.
n E[A~I E [ A ~
5 1.5939 1.9613
10 1.9207 2.3776
50 3.1873 3.7084
100 4.1138
500 7.9978
1000 10.9024
5000 23.1531
10000 32.3310
References:
[Gonnet, G.H. et al., 771.
3.3 Hashing
Hashing or scatter storage algorithms are distinguished by the use of a hash-
ing function. This is a function which takes a key as input and yields an
integer in a prescribed range (for example, [0, m- 11) as a result. The function
is designed so that the integer values it produces are uniformly distributed
44 HANDBOOK OF ALGORJTHMS AND DATA STRUCTURES
throughout the range. These integer values are then used as indices for an
array of size m called the hashing table. Records are both inserted into
and retrieved from the table by using the hashing function to calculate the
required indices from the record keys.
When the hashing function yields the same index value for two different
keys, we have a collision. Keys which collide are usually called synonyms.
A complete hashing algorithm consists of a hashing function and a method
for handling the problem of collisions. Such a method is called a collision
resolution scheme.
There are two distinct classes of collision resolution schemes. The first
class is called open-addressing. Schemes in this class resolve collisions by
computing new indices based on the value of the key; in other words, they
rehash into the table. In the second class of resolution schemes, all elements
which hash to the same table location are linked together in a chain.
To insert a key using open-addressing we follow a sequence of probes in the
table. This sequence of probe positions is called a path. In open-addressing
a key will be inserted in the first empty location of its path. There are at
most m! different paths through a hashing table and most open-addressing
methods use far less paths than m! Several keys may share a common path or
portions of a path. The portion of a path which is fully occupied with keys
will be called a chain.
The undesirable effect of having chains longer than expected is called clus-
tering. There are two possible definitions of clustering.
(1) Let p = @(mk) be the maximum number of different paths. We say that
+
a collision resolution scheme has k 1 clustering if it allows p different
circular paths. A circular path is the set of all paths that are obtained
from circular permutations of a given path. In other words, all the paths
in a circular path share the same order of table probing except for their
starting position.
(2) If the path depends exclusively on the first k initial probes we say that
we have k-clustering.
It is generally agreed that linear probing suffers from primary clustering,
quadratic and double hashing from secondary clustering, and uniform and
random probing from no clustering.
Assume our hashing table of size m has n records stored in it. The quantity
cy = n / m is called the load factor of the table. We will let A,, be a random
variable which represents the number of times a given algorithm must access
the hashing table to locate any of the n elements stored there. It is expected
that some records will be found on the first try, while for others we may have
to either rehash several times or follow a chain of other records before we
locate the record we want. We will use L,, to denote the length of the longest
probe sequence needed to find any of the n records stored in the table. Thus
our random variable A,, will have the range
SEA RCIiING A LGORITIIRfS 45
Its actual value will depend on which of the n records we are looking for.
In the same way, we will let A; be a random variable which represents the
+
number of accesses required to insert an n l t h element into a table already
containing n records. We have
1 2 A: 5 n+l
The search for a record in the hashing table starts at an initial probe
location calculated by the hashing function, and from there follows some pre-
scribed sequence of accesses determined by the algorithm. If we find an empty
location in the table while following this path, we may conclude that the de-
sired record is not in the file. Thus it is important that an open-addressing
scheme be able to tell the difference between an empty table position (one
that has not yet been allocated) and a table position which has had its record
deleted. The probe sequence may very well continue past a deleted position,
but an empty position marks the end of any search. When we are inserting
a record into the hashing table rather than searching for one, we use the first
empty or deleted location we find.
Let
Cn = E[&]
and
C i = E[Ak].
C,, denotes the expected number of accesses needed to locate any individual
record in the hashing table while CA denotes the expected number of accesses
needed to insert a record. Thus
n-1 n-1
Below we give code for several hash table algorithms. In all cases we will
search in an array of records of size m, named T , with the definition in Pascal
being
and in C being
The key value being searched is stored in the variable k e y . There ex-
ist functions (or default values) that indicate whether an entry is empty
( e m p t y ( r [ i ] ) )or indicate whether a value has been deleted (deletecl(r[i])).
The hashing functions yield values between 0 and m - 1. The increment func-
tions, used for several double-hashing algorithms, yield values between 1 and
m-1.
General references:
[Peterson, W.W., 571, [Schay, G. et al., 631, [Batson, A., 651, [Chapin, N., 691,
[Chapin, N., 691, [Bloom, B.H., 701, [Coffman, E.G. et al., 701, [Collmeyer, A.J.
et a / . , 701, [Knott, G.D., 711, [Nijssen, G.M., 711, [Nijssen, G.M., 711, [Price,
C.E., 711, [Williams, J.G., 711, [Webb, D.A., 721, [Bays, C., 733, [Knuth, D.E.,
731, [Aho, A.V. et al., 741, [Bayer, R., 741, [Montgomery, A.Y., 741, [Roth-
nie, J.B. et al., 741, [Bobrow, D.G., 751, [Deutscher, R.F. et al., 751, [Ghosh,
S.P. et al., 751, [Maurer, W.D. e t al., 751, [Goto, E. e t al., 761, [Guibas, L.J.,
761, [Horowitz, E. e t al., 761, [Sassa, M. e t al., 761, [Severance, D.G. et ai.,
. SEARCHING ALGORITHMS 47
761, [Clapson, P., 771, [Reingold, E.M. et al., 771, [Ilosenberg, A.L. e t al., 771,
[Gotlieb, C.C. e t al., 781, [Guibas, L.J., 781, [Halatsis, C. e t al., 781, [Kollias,
J.G., 781, [Kronsjo, L., 791, [Mendelson, H. et al., 791, [Pippenger, N., 791, [Ro-
mani, F. et d.,791, [Scheurmann, P., 791, [Larson, P., 801, [Lipton, R.J. e t d.,
801, [Standish, T.A., 801, [Tai, K.C. et al., 801, [Bolour, A., 811, [Litwin, W.,
811, [Tsi, K.T. e t al., 811, [Aho, A.V. et al., 831, [Nishihara, S. e t al., 831, [Rein-
gold, E.M. e t al., 831, [Larson, P., 841, [Mehlhorn, K., 841, [Torn, A.A., 841,
[Devroye, L., 851, [Szymanski, T.G., 851, [Badley, J., 861, [Jacobs, M.C.T. e t
al., 861, [van Wyk, C.J. e t al., 861, [Felician, L., 871, [Ramakrishna, M.V., 871,
[Ramakrishna, M.V. et al., 881, [Ramakrishna, M.V., 881, [Christodoulakis, S.
e t al., 891, [Manber, U., 891, [Broder, A.Z. e t al., 901, [Cormen, T.H. e t al.,
901, [Gil, J. e t al., 901,
where w is the number of bits in a computer word, and the mod 2w operation
is done by the hardware. For this function the value B = 131 is recommended,
as Bihas a maximum cycle mod 2k for 8 5 IC 5 64.
48 H A N D B O O K OF A L G O R I T H M S A N D D A T A S T R U C T U R E S
int h a s h f u n c t i o n ( s )
char *s;
{ int i;
for(i=O; *s; s++) i = 131*i + *s;
return(i % m);
1
References:
[Maurer, W.D., 681, [Bjork, H., 711, [Lum, V.Y. et al., 711, [Forbes, K., 721,
[Lum, V.Y. e2 al., 721, [Ullman, J.D., 721, [Gurski, A., 731, [Knuth, D.E., 731,
[Lum, V.Y., 731, [Knott, G.D., 751, [Sorenson, P.G. et al., 781, [Bolour, A.,
791, [Carter, J.L. el al., 791, [Devillers, R. et al., 791, [Wegman, M.N. et al.,
791, [Papadimitriou, C.H. et al., 801, [Sarwate, D.V., 801, [Mehlhorn, K., 821,
[Ajtai, M. el al., 841, [Wirth, N., 861, [Brassard, G. et al., 881, [Fiat, A. et al.,
881, [Ramakrishna, M.V., 881, [Sedgewick, R., 881, [Fiat, A. et al., 891, [Naor,
M. el al., 891, [Schmidt, J.P. et al., 891, [Siegel, A., 891, [Mansour, Y. et al.,
901, [Pearson, P.K., 901, [Schmidt, J.P. et al., 901.
a2(An) =
2(m 1) +
- C&n + 1)
m-n+2
n
SEARCHING ALGORITHMS 49
a2(AL) =
(m + l)n(m - n) CY
(m - n + 1)2(m - n + 2) %
(1 - cu)2
C& = m
I I m = 100 I m=oo I
CY Cn a2(~n) CA Cn a2(An) c:,
50% 1.3705 0.6358 1.9804 1.3863 0.6919 2.0
80% 1.9593 3.3837 4.8095 2.0118 3.9409 5 .O
90% 2.4435 8.4190 9.1818 2.5584 10.8960 10.0
95% 2.9208 17.4053 16.8333 3.1534 26.9027 20.0
99% 3.7720 44.7151 50.0 4.6517 173.7101 100.0
Double hashing (see Section 3.3.5) behaves very similarly to uniform prob-
ing. For all practical purposes they are indistinguishable.
50 HANDBOOK OF ALG0RlTHAf.S AND DATA STRUCTURES
References:
[Furukawa, K., 731, [Knuth, D.E.,731, [Ajtai, M. et al., 781, [Gonnet, G.H., 801,
[Gonnet, G.H., 811, [Greene, D.H. et al., 821, [Larson, P . , 831, p a o , A.C-C.,
851, [Ramakrishna, M.V., 881, [Schmidt, J.P. et a / . , 901.
m
E[An] = Cn = -(Hm - H,,,) = -a"'ln (1 - a ) + 0
n
- -2
- + a - l l n ( 1 - a)- a-2 ln2(1 - a)+ o
1-a
Cm = H m = In m + y + O ( m - ' )
1
E [ A i ] = CA = -
1-a
a
a2(Ak) =
( 1 - a)2
All collision resolution schemes that do not take into account the future probe
sequences of the colliding records have the same expected successful search
time under random probing.
SEARCHING ALGORITHMS 51
I I m = 100 I m=oo I
a cn a2(An) c:, cn a2(An) c:,
50% 1.3763 0.6698 2.0 1.3863 0.6919 2 .o
80% 1.9870 3.7698 5.0 2.0118 3.9409 5 .O
90% 2.5093 10.1308 10.0 2.5584 10.8960 10.0
95% 3.0569 23.6770 20.0 3.1534 26.9027 20.0
99% 4.2297 106.1598 100.0 4.6517 173.7101 100.0
Table 3.7 gives figures for some of the basic complexity measures in the
case of 'm = 100 and m = 00.
Notice that the asymptotic results ( m 00; a fixed) coincide with uniform
---f
probing, while for finite values of m, uniform probing gives better results.
Random probing could be implemented using pseudo-random probe 1~
cations; it does not seem, however, to be a good alternative to the double
hashing algorithm described in Section 3.3.5.
References:
[Morris, R., 681, [Furukawa, K., 731, [Larson, P., 821, [Celis, P. et al., 851,
[Celis, P., 851, [Celis, P., 861, [Poblete, P.V. et al., 891, [Ramakrishna, M.V.,
891.
-
- &(a2- 3a + 6) - 3a+1
+ O(m-2)
12( 1 - a)3 2(1- a)5m
~ ( w o rfile)
~ t - n+l
- -
n
2
1
E[Ak]=Ci = -
2
= ;(1+
(1 - a)2 >- 3a
2(1 - a)4m
+ o(m-2)
C(worst file)
n
= 1 + n(n + 1)
2m
We denote the hashing table as an array T, with each element r[i] having
a key k.
begin
i := hashfunction(key) ;
last := ( i + n - 1 ) mod rn;
while (i<>lasi) and (not ernpty(r[z])) and (r[z].k<>key) do
i := (i+l) mod rn;
if r[z].k=key then search := i {*** found(r[z]) ***}
else search := -1; {*** notfound(key) ***}
end;
SEARCHING ALGORTTHMS 53
begin
i := hashfunction(key) ;
last := ( i + m - 1 ) mod m;
while (i<>last) and (not empty( 421))
and (not deleted(rfa1)) and (rfz].L<>key) do
i := (i+l) mod m;
if empty(r[z]) or deleted(r[z])then
begin
{*** insert here ***}
r(z1.k := key;
n := n + l
end
else E r r o r {*** table full, o r key already i n table ***};
end;
Linear probing hashing uses one of the simplest collision resolution tech-
niques available, requiring a single evaluation of the hashing function. It
suffers, however, from a piling-up phenomenon called primary clustering.
The longer a contiguous sequence of keys grows, the more likely it is that
collisions with this sequence will occur when new keys are added to the table.
Thus the longer sequences grow faster than the shorter ones. Furthermore,
there is a greater probability that longer chains will coalesce with other chains,
causing even more clustering. This problem makes the linear probing scheme
undesirable with a high load factor cy.
It should be noted that the number of accesses in a successful or unsuc-
cessful search has a very large variance. Thus it is possible that there will be a
sizable difference in the number of accesses needed to find different elements.
It should also be noted that given any set of keys, the order in which the
keys are inserted has no effect on the total number of accesses needed to install
the set.
An obvious variation on the linear probing scheme is to move backward
through the table instead of forward, when resolving collisions. Linear probing
can also be used with an increment q > 1 such that q is co-prime with m.
More generally, we could move through a unique permutation of the table
entries, which would be the same for all the table; only the starting point
of the permutation would depend on the key in question. Clearly, all these
variations would exhibit exactly the same behaviour as the standard linear
probing model.
54 HANDBOOK OF ALGORTTHMS AND DATA STRUCTURES
As noted previously, deletions from the table must be marked as such for
the algorithm to work correctly. The presence of deleted records in the table is
called contamination, a condition which clearly interferes with the efficiency
of an unsuccessful search. When new keys are inserted after deletions, the
successful search is also deteriorated.
Up until now, we have been considering the shortcomings of linear prob-
ing when it is used to access internal storage. With external storage, the
performance of the scheme improves significantly, even for fairly small stor-
age buckets. Let b be the blocking factor, that is, the number of records per
storage bucket. We find that the number of external accesses (E,) is
An 1
En = 1 +
b
while the number of accesses required to insert an n + l t h record is
A:, - 1
E ; = l +
Furthermore, for external storage, we may change the form of the algo-
rithm so that we scan each bucket completely before examining the next
bucket. This improves the efficiency somewhat over the simplest form of the
linear probing algorithm.
Table 3.8 gives figures for the efficiency of the linear probing scheme with
m = 100, and m = 00.
m = 100 I m=oo
Cn I a2(~n)
2.3952 1.5 1.5833 2.5
9.1046 3.0 35.3333 13.0
19.6987 5.5 308.25 50.5
32.1068 10.5 2566.58 200.5
50.5 50.5 330833.0 5000.5
References:
[Schay, G . et al., 621, [Buchholz, W., 631, [Tainiter, M., 631, [Konheim, A.G.
e t al., 661, [Morris, R., 681, [Kral, J., 711, [Knuth, D.E., 731, [van der Pool,
J.A., 731, [Bandyopadhyay, S.K., 771, [Blake, I.F. et al., 771, [Lyon, G.E., 781,
[Devillers, R. et al., 791, [Larson, P., 791, [Mendelson, H. et al., 801, [Quittner,
P. e t al., 811, [Samson, W.B., 811, [Larson, P., 821, [Mendelson, H., 831, [Yflug,
G.C. et ai., 871, [Pittel, B., 871, [Poblete, P.V., 871, [Aldous, D., 881, [Knott,
G.D., 881, [Sedgewick, R., 881, [Schmidt, J.P. et al., 901.
SEARCHING ALGORITHMS 55
lim P r { L , = O(logn)} = 1
n+ca
cPb.h a s h . - c,u;I.i
f . prob.
= 0.0009763...
3
E [ L f 3 0 u b . hash. - E[f.f.POb* ] = 0.001371...
begin
i := hashfunction(key) ;
inc := increment(key) ;
last := (i+(n-l)*inc) mod m;
while (i<>last) and (not empty(r[z]))and (r[z].k<>key) do
i := (i+inc) mod m;
if r [ z ] . k k e y then search := i {*** fovnd(r[z])***)
else search := -1; {*** notfound(key) ***}
end:
56 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
begin
i := hashfunction(key) ;
inc := increment(key);
last := ( i + ( m - l ) * i n c ) mod m;
while (i<>last) and (not empty(rfa]))
and (not deleted(r[a]))and (r[z].k<>key) do
i := (i+inc) mod m;
if empty( 4 2))or deleted( r[z]) then
begin
{*** insert here ***}
rfz1.k := key;
n := n+Z
end
else E r r o r {*** table full, or key already i n table ***};
end;
I
I m = 101 I
I I
I n cn g2(An)
r
Ln
r*I
bn 1
51 1.37679f0.00009 0.6557fO .OOO3 4.5823rt0.0012 2.00 159f O . O O O 12
81 1.96907f0.00021 3.4867rtO .0020 11.049f0.004 4.87225rt0.00088
91 2.45611f0.00036 8.6689k0.0062 18.159f0.009 9.2966f0.0028
96 2.93478f0.00058 17.849f0 .O16 27.115f0.017 17.0148f0.0073
100 3.7856fO .UO13 50.292fO .069 48.759fO .045 51.0
n m=
2500 1.38617f0.00010 0.69 14f0.0003 9.340f0.010 1.99997f0.00012
3999 2.01054f0.00022 3.9285fO .0025 25.6 12f0.041 4.9952fO .OO 10
4499 2.55599 f O ,00039 10.845f0.009 48.78k0.10 9.9806kO .0039
4749 3.14830f0.00073 26.650fO ,036 88.59f0.25 19.941f O .O 15
4949 4.6249f0.0032 166.73f 0.75 318.8f2.2 97.93f0.31
References:
[Bell, J.R. e t al., 701, [Bookstein, A.,721, [Luccio, F.,721, [Knuth, D.E., 731,
[Guibas, L.J., 761, [Guibas, L.J. e t al., 781, [Samson, W.B., 811, vao, A.C-C.,
851, [Lueker, G.S. e2 al., 881, [Sedgewick, R., 881, [Schmidt, J.P. e t a/., 901.
a
M I-ln(l-a)--
2
M (l-a)-' -a-In(I-a)
58 H A N D B O O K OF A L G O R I T H M S A N D D A T A STRUCTURES
begin
i := hashfunction(key) ;
inc := 0 ;
while (inc<m) and (not empty(r[z]))and (r[z].k<>key) do
begin
i := (i+inc+l) mod m;
inc := inc +2
end;
if 4 2 J . k k e y then search := i {*** found(r[zJ)
***}
else search := -1; {*** notfound(key) ***}
end;
begin
i := hashfunction(key);
inc := 0 ;
while (inc<m) and (not empty(r[z]))and
(not deleted(r[z])) and (r[z].k<>k.ey)do begin
i := (i+inc+l) mod m;
inc := inc + 2;
end;
if empty(r[z])or deleted(r[z])then
begin
{*** insert here ***}
r[zJ.k := k e y ;
n := n+l
end
else Error {*** table full, or k e y already in table ***};
end;
clustering.
This algorithm may fail to insert a key after the table is half full. This is
due to the fact that the ith probe coincides with the m - ith probe. This can
+
be solved by the use of the probe sequence h ( k ) , h ( k ) 1, h ( k ) - 1, h ( k ) + 4,
h ( k ) - 4, ... whenever m is a prime of the form 4K - 1.
Table 3.10 show some simulation results for quadratic hashing. Fn indi-
cates the average number of times that the algorithm failed during insertion.
These simulation results are not in close agreement with the proposed formu-
las for secondary clustering.
1 I m = 101
n cn Ln cc, Fn
I 51 1.41410f0.00011 4.9875f0.0013 2.11837f0.00008 <
~ 81 2.06278f0.00025 11.5711f0.0043 5.12986f0.00031 <
' 91 2.56693f0.00040 18.5212f0.0090 9.52385f0.00062 < 10-5
96 3.03603f0.00061 26.569f0.015 16.9118f0.0012 < 0.00026
100 3.69406f0.00098 37.217&0.020 38.871287 0.5709f0.0019
References:
[Maurer, W.D., 681, [Bell, J.R., 701, [Day, A.C., 701, [Radke, C.E., 701, [Hop-
good, F.R.A. et al., 721, [Knuth, D.E., 731, [Ackerman, A.F., 741, [Ecker,
A., 741, [Nishihara, S. et a/., 741, [Batagelj, V., 751, [Burkhard, W.A., 751,
[Santoro, N., 761, [Wirth, N., 761, [Samson, W.B. et ai., 781, [Wirth, N., 861,
[Wogulis, J . , 891.
altering the order of insertion. That is, if convenient, keys already in the table
are moved to make room for newly inserted keys.
In this section we present two techniques that assume we can define an
order relation on the keys in the table.
Ordered hashing is a composition of a hashing step, followed by double
hashing collision resolution. Furthermore, ordered hashing reorders keys to
simulate the effect of having inserted all the keys in increasing order. To
achieve this effect, during insertion, smaller value keys will cause relocation
of larger value keys found in their paths.
For the analysis of ordered hashing we assume, as for uniform probing (see
Section 3.3.2), that the hashing function produces probing sequences without
clustering. Let x be the probability that a randomly selected key in the file
is less than the searched key. Then
cy = n/m
n-k
P r ( A ; ( z ) > IC} = -k Xk
m-
n-k 1 a(1-
E[A:,(x)]= T x k
m- = 1- o x
a)x2
(1 - ax)3rn + O(n-2)
k>O
n
n-k
k=O
The values for A, and Cn are the same as those for double hashing (see
Section 3.3.5).
begin
i := hushfunction(key) ;
inc := increment(key) ;
lust := ( z + ( n - l ) + i n c ) mod m ;
while (i<>lust) and (not e m p t y ( r [ z ] ) )and (r[z].k<key) do
i := ( i + i n c ) mod m ;
SEARCHING ALGORITHMS 61
begin
if n > = m then E r r o r {*** table is full ***}
else begin
i := hashfunction(key) ;
while (not empty( r[z]))and (not deleted( r(z]))
and (r[z].k<>key) do begin
if r[zJ.k> key then begin
{*** Exchange key and continue ***}
t e m p := key; key := 4zI.k .[z).k := t e m p
end;
i := (i+increment(key)) mod m
end;
if empty( r[zJ)or deleted( r[zl) then begin
{*** do insertion ***}
r(2J.k:= key;
n := n+l
end
else E r r o r {*** key already i n table ***}
end
end;
This variation of double hashing (see Section 3.3.5) reduces the complexity
of the unsuccessful search to roughly that of the successful search at a small
cost during insertion.
Table 3.11 shows simulation results for ordered hashing. We present the
A since the values for C,, and L, are expected to be the same as
values for C
those for double hashing.
Split-sequence hashing chooses one of two possible collision resolution
sequences depending on the value of the key located at the initial probe posi-
tion. When we search for a key k, we first compare k with the key k stored
in position h ( k ) . If k = k or h ( k ) is empty, the search ends. Otherwise we
62 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
cl,
n m = 101 n m = 4999
51 1.38888f0.00008 2500 1.38639f0.00007
81 2.00449f0.00022 3999 2.01137f0.00022
91 2.53016f0.00039 4499 2.55787f0.00041
96 3.07959f0.00063 4749 3.15161f0.00071
100 4.2530f0.0014 4949 4.64 15f0.0021
follow one of two possible probe sequences depending on k < E' or k > E'.
For example, split linear probing uses an increment q1 if E < E', or 42 if
k > k', where q1 and 42 are both co-prime with m. Similarly, we can define
split quadratic hashing, split double hashing, and so on.
Simulations show that split linear probing hashing can improve the average
search time of linear probing by more than 50% for values of a near 1, for
random keys.
References:
[Amble, 0. e t al., 741, [Lodi, E. e t al., 851.
0 I GY 1
The values for the unsuccessful search are identical to those for double
hashing (see Section 3.3.5).
begin
init := hashfunction( k e y ) ;
inc := increment( key);
for i:=O to n do
for j:=i downto 0 do begin
j j := (init + inc*j) mod m;
+
22 := ( j j increment(+j].k) * (i-j)) mod rn;
if empty( r[iz])or deleted(7fi4) then begin
{*** move record forward ***I
r[iz]:= rkj];
{*** insert new in rbj] ***}
+ j ] . k := key;
n := n+l;
goto 999 {*** return ***I
end
end;
Error {*** table full ***I;
999:
end;
The above algorithm will not detect the insertion of duplicates, that is.
64 HANDBOOK OF ALGORJTHMS AND DATA STRUCTURES
n cn 4w Ln In
References:
[Brent, R.P., 731, [Feldman, J.A. et al., 731, [Knuth, D.E., 731, [Tharp, A.L.,
791.
of the keys in its path or the paths of any keys in the path of the path, and
so on. The name binary tree comes from the fact that the algorithm probes
locations following a binary tree pattern.
Considering uniform probing, and a = n/m (the load factor), then
Cn NN 2 + 4o3
1+a + - -a4
-
15
o5
18
+-+-
2a6
105
83a7 +--
720
613a8
5760
69a9
-+...
1120
C, NN 2.13414 ...
If Mn is the number of keys that are moved forward for an insertion, then
Mn e - a3 2a4
a2 -+-+-+------...
- 15
a5 8a6
9 105
101a7
720
506a8
2835
3 4
M, M 0.38521 ...
Table 3.14 shows exact values for these complexity measures.
a Ca
0.17255
0.24042
0.29200
0.35819
+
i := (init inc*level) mod m;
if empty( dz]) or deleted( r[z])ihen SearchMove := i
else begin
for j:=level-1 downto 0 do begin
+
i := (init inc*j) mod m;
incl := increment( dz].k);
k := SearchMove((i+incl) mod m, incl, level-j-1);
if k>-1 then begin
{*** A hole was found, move forward ***}
d k ] := r[2];
SearchMove := i;
goto 999 {*** return ***}
end
end;
{*** Could not find hole ***}
SearchMove := -1;
end;
999:
end;
begin
init := hashfunction(key);
inc := increment(key);
i .- 0 ; j := -1;
while (i<=n) and (j<O) and (n<m) do begin
j := SearchMove(init, inc, i);
i := i+l
end;
if j>-1 then begin
{*** A hole was found, insert k e y ***)
rlj1.k := key;
n := n+l
end
else Error (*** table as full ***};
end;
var i, i n c l , j , k : integer;
begin
i := (init + inc*level) mod m;
if empty(r[z])or deleted(r(z])then SearchMove := i
else begin
for j:=level-1 downto 0 do begin
i := (init + i n w j ) mod m;
2nd := incremeni(r[z].k);
k := SearchMove(( i+incl) mod m, i n c l , level-j-1);
if k>-1 then begin
{*** A hole was found, move forward ***}
r[k]:= r[Z);
SearchMove := a;
goto 999 {*** return ***I
end
end;
{*** Could not find hole ***)
SearchMove := -1;
end;
999:
end;
The above algorithm will not detect the insertion of duplicates, that is,
elements already present in the table.
This reorganization scheme significantly reduces the number of accesses for
a successful search at the cost of some additional effort during the insertion
of new keys. This algorithm is very suitable for building static tables, which
will be searched often.
Table 3.15 summarizes simulation results for the binary tree hashing reor-
ganization scheme. The column headed by In counts the average number of
elements accessed to insert a new key in the table. In gives an accurate idea
of the cost of the reorganization. Note that the expected length of the longest
probe sequence (Ln)is very short. On the other hand, the cost of inserting
new elements is particularly high for full or nearly full tables. The simulation
results are in excellent agreement with the predicted theoretical results.
References:
[Gonnet, G.H. e i al., 771, [Mallach, E.G., 771, [Rivest, R.L., 781, [Gonnet, G.H.
et a/., 791, [Lyon, G.E., 791, [Madison, J.A.T., 801.
L
n cn Ln In Mn
51 1.27475f.00005 2.9310f.0004 1.48633f.00011 0.061774f.000023
81 1.55882f.00008 4.3938f.0007 2.56876f.00038 0.165760f.000039
91 1.72359&.00010 5.2899f.0010 3.83135f.00085 0.228119f.000049
96 1.84624f.00011 6.0181f.0013 5.6329f.0019 0.273611f.000058
100 1.99963f.00017 7,0822f.0022 12.837f.014 0.327670f.000082
101 2.06 167f. 00023 7.6791f.0034 31.54f.29 0.34760f.00011
L
2499 1.28485f.00005 4.3213f.0026 1.49835f.00012 0.063668f.000024
3999 1.57955f.00008 6.6825f.0051 2.62862f.00040 0.171101f.000041
4499 1.75396f.00010 8.1678f.0071 3.98929f.00092 0.236601f.000052
4749 1.88698f.00013 9.4163f.0094 6.0202f.0021 0.285576f.000063
4949 2.06221f.00019 11.403f.016 15.729f.017 0.347749f.000093
4999 2.14844f.00067 13.344f .069 495f49 0.37645f.00032
ln(1-a) 1-a
M -
a
--
a2
In(1- ci) + O(I/rn)
a2(Am) = In rn +y +-
6
+0
7r2
(e)
where a = n/m. In comparison with random probing, the successful search
time is the same, but the variance is logarithmic instead of linear.
We can take advantage of this small variance by doing a centred search.
That is, instead of searching the probe sequence h l , ha, ..., we search the
probe sequence in decreasing probability order, according to the probability
SEARCHING ALGORITHMS 69
of finding the key in the ith location of the sequence. For LCFS hashing,
the probability distribution is a positive Poisson distribution with parameter
X = - ln(1 - CY). Instead of following the optimal order, it is simpler to use a
mode-centred search. In this case the mode is d = max(1, [XJ). Thus, we
+ +
search the probe sequence in the order d , d 1, d - 1, d 2, d - 2 , ..., 2d -
+
1, 1, 2d, 2d 1, ... . For CY < 1 - e2 M 0.86466, mode-centred search is
equivalent to the standard search. For CY 2 1 - e2, we have
References:
[Cunto, W. et al., 881, [Poblete, P.V. et al., 891.
As for LCFS, we can replace the standard search by a centred search. For
the optimal order we have Cn 5 2.57. Using a niean-centred search we
have Cn 5 2.84.
A disadvantage of Robin Uood hashing is that during an insertion we have
to compute the length of the probe sequence for one of the keys. This can be
done by traversing the probe sequence of that key until the current location is
found. For double hashing, this can also be obtained by performing a division
over a finite field.
70 H A N D B O O K OF A L G O R I T I I M S A N D D A T A S T R U C T U R E S
References:
[Celis, P. e i al., 851, [Celis, P., 851, [Celis, P., 861.
References:
[Pagli, L., 851, [Wogulis, J., 891.
For the minimax arrangement for uniform probing (see Section 3.3.2), we 1i;ivc
the lower bound
For the minimum-average arrangement for random probing (see Section 3.3.3)
and for uniform probing (see Section 3.3.2) we have:
1.688382... 5 Cm = O(1)
These optimal algorithms are mostly of theoretical interest. The algo-
rithms to produce these optimal arrangements may require O ( m ) additional
space during the insertion of new elements.
Tables 3.16 and 3.17 show some simulation results on optimal arrange-
men ts .
n m a Cpt Ln
13
798
101
997
997
997
997
19
41
101
80%
90%
95%
99%
100%
100%
100%
1.4890f0.0041 4.40f0.11
1.6104f0.0043 5.147f0.089
1.6892f0.0059 5.68f0.12
1.7851f0.0058
1.783f0.011
1.79850.011
6.77f0.13
1.729f0.011 4.385f0.071
5.29f0.11
6.30f0.18
499 499 100% 1.824f0.011 7.92k0.36
997 997 100% 1.8279f0.0064 8.98f0.38
Refer en ces :
[Gannet, G.H. e t al., 771, [Gonnet, G.H., 771, [Lyon, G.E., 781, [Gonnet,
G.H. e t al., 791, [Gonnet, G.H., 811, [Krichersky, R.E., 841, [Yao, A.C-C.,
851, [Poblete, P.V. e t al., 891.
n m CY Cn L,OPt
399 499 80% 1.4938f0.0067 3.000f0.030
449 499 90% 1.6483k0.0079 3.050&0.043
474 499 95% 1.6995f0.0070 3.990f0.020
494 499 99% 1.7882f0.0077 5.120f0.089
19 19 100% 1.749f0.011 3.929f0.062
41 41 100% 1.796f0.010 4.665f0.088
101 101 100% 1.807f0.010 5.53f0.14
499 499 100% 1.8300f0.0081 7.38f0.29
hashing table using the record key. This table location does not hold an actual
record, but a pointer to a linked list of all records which hash to that location.
This is a composition of hashing with linked lists. The data structure used
by this algorithm is described by
Pn = An, PA = A ; + 1 .
The pertinent facts about this algorithm are listed below:
(n- l)(n-5)
u2(An) =
12m2
+-n2m
- 1
M -CY2
+
12 2
- CL
u2(AL) =
n(m - 1) = C Y
rn2
SEARCHING A L GOI t I 'l'HMS 73
this algorithm uses less storage than separate chaining hashing (see Sec-
tion 3.3.11).
Descriptions of the search and insert algorithms are given below. For this
algorithm, we will not use r , the array of records, but ptrs an array of heads
of linked lists. The nodes of the linked list are the ones which contain the
keys.
{ int i, last;
datarecord *p;
p = ptrs[hashfunction(key)];
while ( p ! = N U L L && key!=p ->k) p = p ->next;
return(p);
~~~
1 ~ ~~
{ extern int n;
int i;
i = hashfunction( key);
ptrs[z] = NeurNode(key, ptrs[z]);
n++;
1
74 H A N D B O O K OF A L G O R I T H M S A N D D A T A STRUCTURES
'The above algorithm will not detect the insertion of duplicates, that is,
elements already present in the table.
The direct chaining method has several advantages over open-addressing
schemes. It is very efficient in terms of the average number of accesses for
both successful and unsuccessful searches, and in both cases the variance of
the number of accesses is small. Ln grows very slowly with respect to n.
Unlike the case with open-addressing schemes, contamination of the table
because of deletions does not occur; to delete a record all that is required is
an adjustment in the pointers of the linked list involved.
Another important advantage of direct chaining is that the load factor a
can be greater than 1; that is, we can have n > m. This makes the algorithm
a good choice for dealing with files which may grow beyond expectations.
There are two slight drawbacks to the direct chaining method. The first is
+
that it requires additional storage for the (rn n) pointers used in linking the
lists of records. The second is that the method requires some kind of memory
manQgement capability to handle allocation and deallocation of list elements.
This method is very well suited for external search. In this case we will
likely keep the array of pointers in main memory. Let Ef: be the expected
number of buckets accessed when direct chaining hashing is used in external
storage with bucket size 6 . Then
Q = n/m
E: =
n -1 + m(6 + 1 ) m(b2 - 1 )
2b m 12bn
a+b+l
-
26
+-b122 -a1b + O ( P )
where w j = e- is a root of unity.
n-1
E: = - + -3 + -(1-
m
(1 - 2 / m ) n )
4m 4 8n
References:
[Morris, R., 681, [Tai, K.C. ei al., 801, [Gonnet, G.H., 811, [Knott, G.D., 841,
[Vitter, J.S. ei al., 851, [Graham, R.L. ei al., 881, [Knott, G.D. e2 al., 891.
n(m-1) m-2n
02(AL) =
m2
+ m
(1 - l/rn), - (1 - 1/m)2fl
M +
a (1 - 2a)e-a - e-2a
The values for A,, L, and L , coincide with those for direct chaining liashing
(see Section 3.3.10).
Let Sf and Sp be the size of a record and the size of a pointer, then the
expected storage used, E[S,], is
this algorithm uses less storage than direct chaining hashing (see Sec-
tion 3.3.10).
Descriptions of the search and insert, algorithms are given below.
datarecord *search(key, r)
t y p e k e y key; dataarray r;
{ datarecord * p ;
p = &r[hashfunction(key)];
while ( p ! = N U L L && key!=p ->k) p = p ->next;
return (p ) ;
1
void i n s e r t ( k e y , r)
t y p e k e y key; dataarray r;
76 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
{ extern int n;
int i;
i = hashfunction(key);
if (empiy(r[zJ))/*** insert in main array ***/
7fzI.k = key;
else /*** insert in new node ***/
r[zl.nezt = NewNode(key, r [ z ] . n e z i ) ;
n++;
1
The above algorithm will not detect the insertion of duplicates, that is,
elements already present in the table.
This method has several advantages over open-addressing schemes. It is
very efficient in terms of the average number of accesses for both successful
and unsuccessful searches, and in both cases the variance of the number of
accesses is small. The length of the loiigest probe sequence, that is to say, the
actual worst-case, grows very slowly with respect to n.
Unlike open-addressing schemes, contamination of the table because of
deletions does not occur.
The load factor can go beyond 1 which makes the algorithm a good choice
for tables that may grow unexpectedly.
This method requires some extra storage to allocate space for pointers.
It also requires a storage allocation scheme to allocate and return space for
records.
As mentioned in Section 3.3.8.5, it is possible to use self-organizing tech-
niques on every chain. For separate chaining, using the transpose technique,
we have
E [ A n ]= Cn x (1 + i)/ha!
where a! = n / m > 1.
Similarly, the split-sequence technique mentioned in Section 3.3.7 can be
applied to separate chaining. That is, when we search for a key k, we first
compare it with the key K stored in location h ( k ) . If k: = 12 or h ( k ) is empty,
the search terminates. Otherwise, we follow one of two lists, depending on
whether k > k or k < k. For this we have
E[An]=Cn =
n
~u~+4a!-l+e-~
x
3a
E[Ak]=CL = i((l-i)n+t+l)
2
+ 1)
I
1
x -(a!+! -a
2
SEARCHING ALGORITHMS 77
References:
[Johnson, L.R., 611, [Morris, R., 681, [Olson, C.A., 691, [Bookstein, A., 721,
[van der Pool, J.A., 721, [Bays, C., 731, [Gwatking, J.C., 731, [Knuth, D.E.,
731, [van der Pool, J.A., 731, [Behymer, J.A. et al., 741, [Devillers, R. et
al., 791, [Quittner, P. et al., 811, [Larson, P., 821, [Norton, R.M. e t a/., 851,
[Ramakrishna, M.V., 881, [Sedgewick, R., 881.
E[A,] = C, = 1 +2
8n
((1 +2/m)" - 1 -
.
"> +
m
n-1
-
4m
1 a
= 1 + --(e2& - 1 - 2a) + - + O(m-l)
8a 4
16
1
16a
+ L)
32~x2
e2a +
4e3" - e4a
+ O(m-l)
Descriptions of the search and insert algorithms are given below. The
insertion algorithm uses the variable nextfree to avoid a repetitive search of
the table for empty locations. This variable should be initialized to m - 1
before starting to fill the table.
78 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
int search(key, r)
t y p e k e y key; d a t a a r r a y r;
{ int i;
i = hashfunction(key);
while (i!=(-1) && !empty(r[z])&& r[z].k!=key) i = r[z].next;
if (i==(-I) 11 empty(r[z])) return(-1);
else return(i);
1
Coalesced hashing: insertion
void insert(key, r)
t y p e k e y key; dataarray r;
i = hashfunction(key);
if (empty(r[zl)){
4zj.k = key;
r[z].next = (-1);
n++;
1
else { /*** Find end of chain ***/
while (r[z].next!=(-1) && r[z].k!=key) i = r[z].next;
if (r[zJ.k=key) Error /*** key already in table ***/;
else {
/*** Find next free location ***/
while (!empty(.[nextfree]) && neztfree>=O) nextfree--;
if (nextfree<O) Error /*** Table is full ***I;
else {
r[z].next= neztfree;
r[nextfree].k= key;
r[neztfree].next= ( - 1 ) ;
n++;
SEARCIIING A L GORlTIIMS 79
a = n/m
=- 1 + -
1 (e2(a--x)
8a
- 1 - 2(a - A)) (3 - 2 / p + 2X)
+ +CY 2X -X2/a
4
log m
otherwise;
E[A:,J = C
A = a +e- ifa<X
otherwise.
For every value of a we could select an optinial value for /3 which minimizes
either the successful or unsuccessful case. The value ,8 = 0.853 ... minimizes
the successful case for a full table while p = 0.782 ... does siinilarly for the
unsuccessful case. The value /3 = 0.8G appears to be a good coiiiproiiise for
both cases and a wide range of values for CY.
~ 80 H.4NDBOOK OF ALGORITHMS AND DATA STRUCTURES
References:
[Williams, F.A., 591, [Bays, C., 731, [Knuth, D.E., 731, [Banerjee, J. et af., 751,
[Vitter, J.S., 801, [Vitter, J.S., 801, [Vitter, J.S., 811, [Greene, D.H. et af., 821,
[Vitter, J.S., 821, [Vitter, J.S., 821, [Chen, W-C. et af., 831, [Vitter, J.S., 831,
[Chen, W-C. et af., 841, [Knott, G.D., 841, [Chen, W-C. et af., 861, [Murthy,
I D. ei af., 881.
= (In '2((b
('"')
+ 1)!)lIb + Ql(d(n))) n 1 + q1 + o( 1))
and
The functions Qi(z) are complicated periodic functions with period 1 and
average value 0 (that is, Jt
Qi(z) dx = 0).
h 1 ( K ) mod 2d
This method allows graceful growth and shrinkage of an external hashing
table. Assuming that the directory cannot be kept in internal storage, this
method guarantees access to any record in two external accesses. This makes
it a very good choice for organizing external files.
82 HANDBOOK OF ALGOMTHMS AND DATA STRUCTURES
In case the directory can be kept in main memory, we can access records
with a single external access, which is optimal.
The directory is O ( b - l n l + l / b )in size. This means that for very large n
or for relatively small bucket sizes, the directory may become too large. It is
not likely that such a directory can be stored in main memory.
Insertions may be direct or may require the splitting of a leaf-page or may
even require the duplication of the directory. This gives a bad worst-case
complexity for insertion of new records.
Deletions can be done easily by marking or even by folding split buckets.
Shrinking of the directory, on the other hand, is very expensive and may
require O(n) overhead for every deletion in some cases.
Table 3.18 gives numerical values for several measures in extendible hash-
ing with Poisson distributed keys, for two different bucket sizes.
b = 10 b = 50
n Db(n) E[md] E[mb] Db(n) E[md] E[mb]
100 4.60040 25.8177 14.4954 1.71109 3.42221 2.92498
1000 8.45970 374.563 144.022 5.02284 32.7309 31.0519
10000 12.1860 4860.14 1438.01 8.99995 511.988 265.644
100000 16.0418 68281.7 14492.6 12.0072 4125.43 2860.62
References:
[Fagin, R. et a / . , 791, Tyao, A.C-C., 801, [Regnier, M., 811, [Scholl, M., 811,
[ T a m i n e n , M., 811, [Flajolet, P. et al., 821, [Lloyd, J.W. e t al., 821, [Tam-
minen, M., 821, [Burkhard, W.A., 831, [Flajolet, p., 831, [Lomet, D.B., 831,
[Lomet, D.B., 831, [Bechtald, U. et al., 841, [Mullen, J., 841, [Kawagoe, K.,
851, [Mullin, J.K., 851, [Ouksel, M., 851, [Tamminen, M., 851, [Veklerov, E.,
851, [Enbody, R.J. et al., 881, [Salzberg, B., 881, [Sedgewick, R., 881, [Weems,
B.P., 881, [Henrich, A. et al., 891.
bucket : ({KEY}:,OW~~~OW).
where
k>_O j=1
hm+l(I<) = { m or(IC)m - mo
hm iff hm(1C) # m - mo
other wise
A common implementation of this function, assuming that we have avail-
able a basic hash function h 1 ( K ) which transforms the key into an integer in
a sufficiently large interval, is:
i := hl(lcey);
if ( i mod m0) < m-mO then hashfunction := i mod (2*m0);
else hashfinction := i mod mO;
where hashing is done at the directory level, and overflow in buckets produce a
new internal node in the binary trie (see Section 3.4.4) with the corresponding
bucket split.
These methods are supposed to be excellent methods for storing large
tables which require quick access in external storage.
References:
[Larson, P., 781, [Litwin, W., 781, [Litwin, W., 791, [Larson, P., 801, [Litwin,
W., 801, [Mullin, J.K., 811, [Scholl, M., 811, [Larson, P., 821, [Larson, P., 821,
[Lloyd, J.W. et al., 821, [Ramamohanarao, K. e t al., 821, [Ouksel, M. e t al.,
831, [Kjellberg, P. et al., 841, [Mullen, J., 841, [Ramamohanarao, K. et al.,
841, [Kawagoe, K., 851, [Larson, P., 851, [Larson, P., 851, [Ramamohanarao,
K. et al., 851, [Tamminen, M., 851, [Veklerov, E., 851, [Litwin, W. et d.,861,
[Robinson, J.T., 861, [Litwin, W. et al., 871, [Enbody, R.J. et al., 881, [Larson,
P., 881, [Larson, P., 881, [Lomet, D.B., 881, [Ouksel, M. e i al., 881, [Salzberg,
B., 881, [Baeza-Yates, R.A., 891.
(1) guarantee exactly one external access (optimal) for each key while min-
imizing the additional internal storage required; or
(2) given a fixed amount of internal storage, minimize the number of exter-
nal accesses.
86 HANDBOOK OF ALGORITHMS A N D DATA STRUCTURES
Let us call k-prefix the first k bits of the signature of a key. To solve the
first problem we will construct the following table. For each table location
we will code the following information: (1) the location is empty or (2) the
location is occupied and the key stored in this location has a prefix of length
k. The prefix stored is the shortest possible required to distinguish the stored
key from all other keys that probe this location. Note that on average, only
Cn - 1 other keys probe an occupied location. This algorithm requires build-
ing a table of variable length prefixes, hence we will call it variable-length
signatures.
Let mb(n) be the average number of internal bits required by these algo-
rithms; if the main hashing algorithm is uniform probing (see Section 3.3.2)
or random probing (see Section 3.3.3) we have the following lower and upper
bounds (the upper bounds represent the behaviour of the above algorithm):
a = n/m
m*(n) 2
n
(. + (1 - a)ln (1 - a)-
+
mb(n) 5 log, (- In (1 - CY)) O(1)
mx2 310g, m
- -
61n 2 2
.t O(1) 5 m(m) 5 log, log, m + O(1)
A better lower bound is obtained for memoryless hashing algorithms. Let us
call an algorithm memoryless if it does not store any information gained from
previous probes, except the implicit fact that they failed. All the hashing
algorithms in this section are memoryless.
where P = - In (1 - CY)and
For the second problem we now restrict ourselves to using a fixed, small
number, d , of bits per location. The goal is now to reduce the number of
external accesses. If we store in each location the d-prefix of the stored key,
we reduce the unnecessary accesses by a factor of 2 d . For this algorithm
in the bucket (lower prefixes) and those of the records which overflowed to
other buckets (larger prefixes). This algorithm may displace records with high
prefixes ~ 1 records
s with smaller prefixes are inserted.
Finally, by selecting a fixed length separator and by moving forward
records that would force a larger separator, an optimal and very economi-
cal algorithm is obtained. In this last case, there is a limit on the load of the
external file. In other words, an insertion may fail although there is still room
in the table (this happens when all the buckets are full or their separators are
fully utilized).
Although these algorithms require internal tables, the actual sizes for real
situations are affordable by almost any system. The reduction in the number
of external accesses is very attractive. These methods are more economical in
internal storage than extendible hashing (see Section 3.3.13) with an internally
stored directory.
References:
[Larson, P. et al., 841, [Gonnet, G.H. et al., 881, [Larson, P., 881.
Table 3.19 shows different functions that have been proposed for perfect hash-
ing, where k is the key, assumed to be an integer and a , b, c, ... are parameters
to be chosen appropriately.
To construct a minimal perfect hashing function efficiently we will use
an auxiliary integer array ( A ) of size m2 which will store parameters of the
hashing function.
The hashing function is ( A [ k mod m2] k) mod m where m2 x rn and
gcd(rn, m2) = 1. This function uses a particular multiplicative factor for each
88 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
cluster of keys (all keys having the same IC mod 7722 value form a cluster). The
algorithm will use at most m2 clusters (dimension of the array A ) .
int search(key, r, A )
int key; dataarray r; int *A;
\
SEARCHING ALGORITHMS 89
I
{ int i;
extern int m, m2;
i = hashfunction(A[key%m2],key);
if(r[d.k == key) return( i);
else return (-1);
1 -
The insertion algorithm has to insert all keys at once. The building is
done by inserting the largest clusters first and the smaller later. The insertion
algorithm returns true or false depending on whether it could build the table
for the given keys, and the integers m and m2. If it could not, another ma
can be tried. The probability of failure is O ( l / m ) .
int insert(input, n, r, A )
dataarray input, r; int n , ' * A ;
i f ( m < n) return(0);
for(k0; i<m2; a++) A[2] = 0 ;
for(i=O; i<n; i++) A[input[z].k% m2]++;
/* Shellsort input array based on collision counts */
for (d=n; d > l ; ) {
i f (d<5) d = 1;
else d = (5*d-1)/11;
for (i=n-1-4 i>=O; i--) {
tempr = input[z];
ia = tempr.k % m2;
for (j=i+d; j<n && (A[ia]< A[ib=input[jl.k % m2] 11
A[iu] == A[ib]&& ia > ib); j+=d)
input6j-dl = inputb];
input6j-dI = temp?
1
1
for(i=O; i<n; i=iup) {
ia = input[t].k% m2;
iup = i + A[iu];
90 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
References:
[Sprugnoli, R., 771, [Anderson, M.R. et al., 791, [Tarjan, R.E. et al., 791,
[Cichelli, R.J., 801, [Jaeschke, G. et al., 801, [Jaeschke, G., 811, [Yao, A.C-C.,
811, [Mehlhorn, K., 821, [Bell, R.C. et al., 831, [Du, M.W. et al., 831, [Mairson,
H.G., 831, [Chang, C.C., 841, [Fredman, M.L. et al., 841, [Fredman, M.L. et
al., 841, [Yang, W.P. et al., 841, [Cercone, N. et al., 851, [Cormack, G.V. et al.,
851, [Larson, P. et al., 851, [Sager, T.J., 851, [Yang, W.P. et al., 851, [Aho, A.V.
et al., 861, [Berman, F. et al., 861, [Chang, C.C. et a!., 861, [Dietzfelbinger, M.
et al., 881, [Gori, M. et a!., 891, [Ramakrishna, M.V. et al., 891, [Schmidt, J.P.
et al., 891, [Brain, M.D. et al., 901, [Pearson, P.K., 901, [Winters, V.G., 901.
3.3.17 Summary
Table 3.20 shows the relative total times for inserting 10007 random keys
and performing 50035 searches (five times each key). We also include other
searching algorithms, to compare them with hashing.
SEARCHING ALGORITHMS 91
Tree definition
The internal path length, In, of a tree with n nodes is defined as the sum
of the depths of all its nodes. The external path length, En, of a tree is the
sum of the depths of all its external nodes. For any binary tree
En = In + 2n.
We have
and
A ( l ) = B(1) - 1 = n
A(1) = EIIn]
B(1) = E[En]
For a successful search we have
1 L An L h ( n )
and for an unsuccessful search
1 5 A; 5 h ( n )
The ordered binary tree is a structure which allows us to perform many
operations efficiently: inserting takes a time of O ( h ( n ) ) ;deleting a record
also takes O ( h ( n ) ) ;finding the maximum or minimum key requires O ( h ( n ) )
comparisons; and retrieving all the elements in ascending or descending order
can be done in a time of O(n). With small changes, it permits the retrieval
of the lcth ordered record in the tree in O ( h ( n ) ) .
94 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
General references:
[Hibbard, T.N., 621, [Batson, A,, 651, [Bell, C., 651, [Lynch, W.C., 651, [Arora,
S.R. et al., 691, [Coffman, E.G. et al., 701, [Stanfel, L., 701, [Nievergelt, J. e t
al., 711, [Price, C.E., 711, [Knuth, D.E., 731, [Nievergelt, J. e t a / . , 731, [Robson,
J.M., 731, [Aho, A.V. e t al., 741, [Nievergelt, J . , 741, [Burkhard, W.A., 751,
[Burge, W.H., 761, [Horowitz, E. et al., 761, [Wilson, L.B., 761, [Wirth, N., 761,
[Knott, G.D., 771, [Payne, H.J. et al., 771, [Ruskey, F. et al., 771, [Snyder, L.,
771, [Soule, S., 771, [Driscoll, J.R. et al., 781, [Gotlieb, C.C. et al., 781, [Rotem,
D. e t al., 781, [Flajolet, P. e t al., 791, [Flajolet, P. e t al., 791, [Kronsjo, L.,
791, [Rosenberg, A.L., 791, [Strong, H.R. et ai., 791, [Yongjin, Z. et al., 791,
[Dasarathy, B. et al., SO], [Flajolet, P. et al., SO], [Gill, A., SO], [Kleitman, D.J.
e t al., SO], [Lee, K.P., SO], [Proskurowski, A., SO], [Solomon, M. e t al., SO],
[Standish, T.A., SO], [Stephenson, C.J., SO], [Fisher, M.T.R., 811, [Cesarini,
F. et al., 821, [Ottmann, T. et al., 821, [Aho, A.V. et al., 831, [Andersson, A.
e t al., 831, [Kirschenhofer, P., 831, [Lescarne, P. et al., 831, [Munro, J.I. e t al.,
831, [Reingold, E.M. et al., 831, [Sleator, D.D. e t al., 831, [van Leeuwen, J. et
al., 831, [Brown, G.G. et al., 841, [Mehlhorn, K., 841, [Munro, J.I. et al., 841,
[Ottmann, T. et al., 841, [Brinck, K., 851, [Ottmann, T. e t al., 851, [Pittel,
B., 851, [Zerling, D., 851, [Brinck, K., 861, [Culberson, J.C., 861, [Gordon, D.,
861, [Langenliop, C.E. et al., 861, [Lee, C.C. e t al., 861, [Stout, Q.F. et al.,
861, [Wirth, N., 861, [Burgdorff, H.A. e t al., 871, [Levcopoulos, C. e t al., 881,
[Sedgewick, R., 881, [Andersson, A., 891, [Aragon, C. e t al., 891, [Klein, R. e t
al., 891, [Lentfert, P. e2 al., 891, [Makinen, E., 891, [Manber, U., 891, [Slough,
W. e t al., 891, [Andersson, A. e t al., 901, [Cormen, T.H. e t al., 901, [Francon,
J . e t a!., 901, [Fredman, M.L. e t al., 901, [Ottmann, T. e t al., 901, [Papadakis,
T. e t al., 901, [Pugh, W., 901.
n
B ( 4 = i=l
i-lf22
i
B ( z )- 1
= 2Z- 1
n-1
i=O
a,(&) = +
(2 10/n)Hn - 4 ( 1 + l/n)(H:/n + HA2))+ 4
M 1.3863 loga n - 1.4253
E[Ak] = C
A = 2H,+1- 2 M 1.3863 log, n - 0.8456
a2(Ak) = 2H,+1 - 4H,+,
(2) +2 M 1.3863 log, n - 3.4253
At the cost of two extra pointers per element, randomly generated binary
trees display excellent behaviour in searches. Unfortunately, the worst case
can be generated when the elements are sorted before they are put into the
tree. In particular, if any subset of the input records is sorted, it will cause
the tree to degenerate badly. Compared to the random binary trees of the
next section, however, ordered binary trees generated from random input are
exceptionally well behaved.
Table 3.21 gives numerical values for several efficiency measures in trees
of various sizes.
References:
[ICnuth, D.E., 731, [Knuth, D.E., 741, [Palmer, E.M. et al., 741, [Guibas, L.J.,
751, [Wilson, L.B., 761, [Francon, J . , 771, [Reingold, E M . et al., 771, [Meir,
A. et al., 781, [Robson, J.M., 791, [Brinck, K . et al., 811, [Sprugnoli, R., 811,
96 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
[Wright, W.E., 811, [Bagchi, A. et al., 821, [Knott, G.D., 821, [Robson, J.M.,
821, [Ziviani, N., 821, [Eppinger, J.L., 831, [Devroye, L., 841, [Mahmoud, H.M.
et al., 841, [Pittel, B., 841, [Flajolet, P. et al., 851, [Devroye, L., 861, [Mahmoud,
H.M., 861, [Cunto, W. et al., 871, [Devroye, L., 871, [Devroye, L., 881.
E[An] = 4 - *(?) 9
=J;;;;(l+-+-- 17
+ 0(~-3)) -3--
1
n+l (2,) 8n 128n2 n
When all trees of height h are considered equally likely to occur, then
E[nodes] = (0.62896...)2h - 1 +0(6-2h) (6 1)
SEARCHING ALGORITHMS 97
References:
[Knuth, D.E., 731, [Kemp, R., 791, [Flajolet, P. et al., 801, [Kemp, R., SO],
[Flajolet, P. et al., 821, [Flajolet, P. et al., 841, [Kirschenhofer, P. et al., 871,
[Kemp, R., 891.
If we assume that all trees of height h are equally likely to occur, the average
number of nodes in a balanced tree of height h is
E[nodes] = (0.70118...)2h
Below we give the description of the AVL insertion algorithm. The inser-
tion algorithm uses an additional balance counter in each node of the tree, bal.
98 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
The range of this balance field is -2...2 . The procedures r r o t ( ) and [rot()
which perform right and left rotations are common to several algorithms and
are described in Section 3.4.1.8.
more useful information and leads t o simpler algorithms. Note that using six
bits for height information we could store trees with up t o 0.66 x 1013 nodes.
The constraint on the height balance can be strengthened to require that
either both subtrees be of the same height or the right-side one be taller by
one. These trees are called one-sided height balanced (OSHB), trees. In
this case only one bit per node is required to store the balance information. In-
sertions in OSHBs become more complicated though; in particular, insertions
in O(1og n ) time are extremely complicated.
Similarly, the constraint on the balance may be relaxed. One option is
to allow the height of subtrees to differ at most by k. These trees are called
k-height balanced, HB[k], trees.
Table 3.22 shows some simulation results for AVL trees. C, indicates
the average number of comparisons required in a successful search, R, is the
average number of rotations (single or double) required by an insertion, and
E[h(n)] indicates the average height of a tree of size n.
cn E[h(n)l Rn
2.2 3.O 0.21333
2.907143 4 0.318095
4.930346f0.000033 6.94667f0.00017 0.42731fO.00005
5.88861lf0.000042 7.998905f0.000043 0.44439f0.00005
8.192021f 0 .000087 10.92515fO.00073 0.46 103f0.00006
1000 9.20056fO .00012 11.99842=t0.00020 0.46329fO .00006
11.55409f0.00028 14.9213f0.0026 0.46529f0.00007
10000 12.57009f0.00041 15.99885f0.00072 0.46552f0.00007
50000 14.92963f0.00094 18.9165f0.0096 0.46573 f0.00007
The values for C A can be calculated from the above, for example, for all
binary trees C +
A = (C, 1)/(1+ l/n).
From the above results we can see that the value for C, is close to the
value of log, n; in particular, under the arbitrary assumption that
Cn = &log, n +p
for n 2 500, then
a = 1.01228 f 0.00006 ; and 3!, = -0.8850 f 0.0006 .
References:
[Adelson-Velskii, G.M. e t ai., 621, [Foster, C.C., 651, [ I h o t t , G.D., 711,
[Tan, K.C., 721, [Foster, C.C., 731, [ICnuth, D.E., 731, [Aho, A.V. e t a/., 741,
[Hirschberg, D.S., 761, [Karlton, P.L. e t al., 761, [Luccio, F. e t a/., 761, [Baer,
.", +'
1
-.
* L
J.L. et al., 771, [Reingold, E.M. et d.,771, [Brown, M.R., 781, [Guibas, L.J.
et al., 781, [Kosaraju, S.R., 781, [Luccio, F. et al., 781, [Luccio, F. et al., .
781, [Ottmann, T. et al., 781, [Zweben, S.H. et al., 781, [Brown, M.R., 791,
[Ottmann, T. et al., 791, [Ottmann, T. et al., 791, [Pagli, L., 791, [Raiha, K.J. ,
e t al., 791, [Luccio, F. e i al., 801, [Ottmann, T. et al., 801, [Wright, W.E., 811,
[Mehlhorn, K., 821, [Ziviani, N. et al., 821, [Ziviani, N., 821, [Gonnet, G.H. et
al., 831, [Richards, R.C., 831, [Zaki, A.S., 831, [Tsakalidis, A.K., 851, [Chen,
L., 861, [Li, L., 861, [Mehlhorn, K. e i al., 861, [Klein, R. et al., 871, [Wood, D.,
881, [Manber, U . , 891, [Baeza-Yates, R.A. et al., 901, [Klein, R. ei al., 901.
log, n
cn -dog, a + (1 - a)log2 ( 1 - a )
-2.
Rta L C(Q)
Sh'ARCHING ALGORITHMS 101
,I4
'1
Let Rn be the average number of rotations per inserI,ion in a BB[l- a / 2 ]
tree after the random insertion of n keys into the empty tree. Let f(p) be the
Although the insertion algorithm is coded using real arithmetic, this is not
really needed. For example, fi/2 can be approximated by its convergents 2/3,
5/7, 12/17, 29/41, 70/99, .... In case integer arithmetic must be used, the first
test can be rewritten, for example, as
1 102 HANDBOOK OF ALGORJTHMS AND DATA STRUCTURES
R7a 3
5 2.2 3 0.21333
10 2.9 4 0.3252381
50 4.944142fO.000046 7.02363f 0.00027 0.40861f0.00006
100 5.go8038f0.000067 8.20895 f0.00063 0.42 139f0.00007
500 8.230 1550.OOO 17 11.2552f0.0018 0.43204f0.00008
1000 9.24698f0.00025 12.6081f0.0031 0.43343 f0.00009
5000 11.62148f0.00061 15.6991f0.0076 0.43455 fO.OOO 10
10000 12.64656f0.00091 17.0366f0.0089 0.43470fO .OOO 10
50000 15.0300f0.0022 20.110f0.022 0.43476f0.00011
From the above results we can see that the value for C n is close to the
value of log, n; in particular, under the arbitrary assumption that
C, = alog, n +p
for n >_ 500, then
CY = 1.02107f 0.00013 ; and p = -0.9256 f 0.0012 .
References:
[Knuth, D.E., 731, [Nievergelt, J. et al., 731, [Baer, J.L. et a]., 771, [Reingold,
E.M. et al., 771, [Unterauer, K., 791, [Blum, N. et a/., 801, [Bagchi, A. et al.,
821.
Although these trees are in the class B B ( 1 / 3 ) , there are some important
restrictions on the rotations. This makes their performance superior to the
BB( 1 / 3 ) trees.
A natural extension of this algorithm is to perform rotations only when
the difference in weights is k or larger. This extension is called k-balancing.
For these trees the main complexity measures remain of the same order, while
the number of rotations is expected to be reduced by a factor of k.
ck 5 1.05155...log, n + o(1)
Table 3.24 shows simulation results for these trees. Cn indicates the aver-
age number of comparisons required in a successful search, R n is the average
number of rotations (single or double) required by an insertion and E[h(n)]
indicates the average height of a tree of size n.
n Cn >I
E Ch (n Rn
5 2.2 3 0.213333
10 2.9 4 0.33
50 4.904496f0.000027 6.93788fO .00026 0.469722 f O .000078
100 5.857259f0.000038 8.00408f0.00015 0.494494fO .000090
500 8.151860f0.000090 10.9169f0.0012 0.5 1836f O . O O O 11
1000 9.15670f0.00013 12.0191f0.0010 0.52 177f0.OOO 12
5000 11.50285f0.00032 14.9529f O .0039 0.52476f0.00014
10000 12.51640f0.00048 16.0477f0.0052 0.52521 f0.00014
50000 14.8702f0.0011 18.995f0.011 0.52564 f0.0 00 16
From the above results we can see that the value for Cn is close to the
value of log, n; in particular, under the arbitrary assumption that
References:
[Baer, J.L., 753, [Robson, J.M., 801, [Gonnet, G.H., 831, [Gerash, T.E., 881.
BuiZdTree(Set0fKeys): tree;
begin
K := select(Set0fheys);
A1 := Keys in SetOfI<eys<
A2 := Keys in SetOfKeys > &
return( NewNode(K,BuildTree(A l ) , BuildTree(A2)))
end;
(1) Insert in decreasing probability order In this way, the keys most
likely to be sought are closer to the root and have shorter search paths.
This method requires either a reordering of the keys before they are
put into the tree or the selection of the maximum probability at each
step. For this analysis, we will assume that the keys are numbered in
- p2 >
decreasing probabilit,y order, that is, (p1 > - ... 2 pn). Then for a
random tree
n
we choose the one with the highest accessing probability to be the root.
This selection procedure is repeated recursively for the nodes in each
subtree. Experimental results indicate that these trees are within 2% to
3% from optimal.
(4) Another way of combining approaches (1) and (2) produces trees which
are also called median split trees. At every node we store two keys;
the first one, the 'owner' of the node, is the one with higher probability
in the subtree, and the second one is the median of all the values in
the subtree. The searching algorithm is almost identical to the normal
algorithm:
Using this approach we benefit from the advantages of both (1) and (2)
above, at the cost of one extra key per node. The 'median split' may
be interpreted as the statistical median (a key which splits the tree into
two subtrees in such a way that both halves are the closest possible to
equiprobable) or as the counting median (a key which splits the tree in
equal size halves). Known algorithms to construct optimal median split
trees are not very efficient (at least O(n4)).
(5) Greedy trees This is a heuristic which constructs trees bottom-up.
The construction resembles the Huffnian encoding algorithm. At each
step we select the three consecutive external/internal/external nodes
which add to the lowest accessing probability. A node is constructed
with the two external nodes as direct descendants and the triplet is
replaced by a single external node with the sum of the accessing prob-
abilities. Under this heuristic
CzT 5 2 + 1.81335...H(p',q3
108 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
Self-organizing heuristics
When we do not know the accessing probabilities we may try heuristic or-
ganization schemes sinlilar to the transpose and move-to-front techniques in
sequential searching.
Shape heuristics
(10) Fringe reorg niz ti n This type of h uristics guarantees that any
subtree with size k or smaller is of minimal height (or, equivalently,
of minimal internal path). The simplest heuristic is for k = 3 which
reorganizes any subtree with three nodes which is not in perfect balance.
Under random insertions, a tree constructed using k = 3 will have
12 75
Ci = THn+l --
49
= 1.18825 ...log2 n - 0.54109 ... for n 2 6
for n 2 13.
In general, if k = 2t - 1 (t 2 1) then
References:
[Gotlieb, C.C. et al., 721, [Martin, W.A. ed al., 721, [Knuth, D.E., 731, [Fred-
man, M.L., 751, [Mehlhorn, K., 751, [Walker, W.A. et al., 761, [Baer, J.L. et
al., 771, [Mehlhorn, K., 771, [Allen, B. ed al., 781, [Sheil, B.A., 781, [Horibe, Y.
et al., 791, [Mehlhorn, K., 791, [Comer, D., 801, [Eades, P. et al., 811, [Korsh,
J.F., 811, [Allen, B., 821, [Korsh, J.F., 821, [Poblete, P.V., 821, [Greene, D.H.,
831, [Huang, S-H.S. et al., 831, [Chang, H. et al., 841, [Huang, S-H.S. et al.,
841, [Huang, S-H.S. et al., 841, [Huang, S-H.S. et al., 841, [Perl, Y., 841, [Bent,
S.W. ei al., 851, [Hermosilla, L. et al., 851, [Poblete, P.V. et al., 851, [Sleator,
D.D. et al., 851, [Hester, J.H. et al., 861, [Huang, S-H.S., 871, [Levcopoulos, C.
et al., 871, [Makinen, E., 871, [Hester, J.H. et al., 881, [Moffat, A. et al., 891,
[Sherk, M., 891, [Cole, R., 901.
if pi = 0.
The following algorithm constructs an optimal tree given the probabilities
of successful searches ( p i ) and the probabilities of unsuccessful searches ( q i ) .
This algorithm due to Knuth uses a dynamic programming approach, comput-
ing the cost and root of every tree composed of contiguous keys. To store this
information, the algorithm uses two upper triangular matrices dimensioned
n x n. Both its storage and time requirements are O ( n 2 ) .
begin
(*w Initializations ***}
c[O,O] := q[O];
for i:=l to n do begin
c[i,z]:= q[2);
SEARCHING ALGOHTHMS 111
References:
[Bruno, J. et al., 711, [Hu, T.C. et al., 711, [Knuth, D.E.,711, [Hu, T.C. et al.,
721, [Kennedy, S., 721, [Hu,~T.C., 731, [Knuth, D.E., 731, [Garey, M.R., 741,
[Hosken, W.H., 751, [Itai, A., 761, [Wessner, R.L., 761, [Choy, D.M. et al., 771,
[Garsia, A.M. et al., 771, [Horibe, Y., 771, [Reingold, E.M. et al., 771, [Choy,
D.M. e i ai., 781, [Bagchi, A. et al., 791, [Horibe, Y . , 791, [Hu, T.C. et al., 791,
[Wikstrom, A., 791, [Kleitman, D.J. et al., 811, [Allen, B., 821, [Hu, T.C., 821,
[Akdag, H., 831, [Shirg, M . , 831, [Bender, E.A. et al., 871, [Larmore, L.L., 871,
[Levcopoulos, C. et al., 871, [Baase, S., 881, [Brassard, G. et al., 881, [Kingston,
J.H., 881, [Sedgewick, R., 881, [Levcopoulos, C . et al., 891.
112 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
There are two possible such situations, the one shown in Figure 3.1 and
its symmetric which are called l e f t and right single rotations respectively.
The procedures to perform these rotations are
rrot(tT.righ2); lrot(t);
In many cases the nodes carry some information about the balance of
their subtrees. For example, in AVL trees (see Section 3.4.1.3), each node
contains the difference in height of its subtrees; in weight-balanced trees (see
Section 3.4.1.4) each node contains the total number of nodes in its sub-
tree. This information should be reconstructed by the single rotation, and
consequently double rotations or more complicated rotations based on single
rotations do not need to reconstruct any information.
Let bal contain the difference in height between the right subtree and the
left subtree (h.(t 1 .right) - h(t t . l e f t ) ) , as in AVL trees (see Section 3.4.1.3).
For example, after a single left rotation, the new balance of the nodes A
I 114 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
References:
[Tarjan, R.E., 831, [Zerling, D., 851, [Sleator, D.D. et al., 861, [Stout, Q.F. et
al., 861, [Wilber, R., 861, [Bent, S.W., 001, [Cormen, T.H. e t al., 901, [Ottmann,
T. et al., 901.
begin
if t = nil then Error {*** key not found ***}
else begin
{*** search f o r key t o be deleted ***}
if tf.k < key then delete(key, tf.right)
else if 2l.k > key then delete(key, tf.left)
For height balanced (AVL) trees (see Section 3.4.1.3) we simply replace
the function w t ( ) by the height of the subtree.
References:
[Knuth,' D.E., 731, [Knott, G.D., 751, [Knuth, D.E., 771, [Jonassen, A.T. e t
al., 781, [Brinck, K., 861, [Baeza-Yates, R.A., 891, [Culberson, J.C. et al., 891,
[Cormen, T.H. et al., 901, [Culberson, J.C. et al., 901.
116 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
The lexicographical order is given by the fact that, in each internal node, all
the keys stored in the ith descendant are greater than the i- l t h key and less
than the ith key of the node. The relation between the internal path length,
In, and the external path length, E n , on a tree with n internal nodes, is
The average internal path length of an m-ary search tree built from n
random insertions is:
with variance:
For the expected height, we have the following limit (in probability)
h(n) - 1
-
lim
m-+m Inn
- H,-1
The average space utilization of an m-ary search tree is
References:
[Ruskey, F., 781, [Szwarcfiter, J.L. ei al., 781, [Pagli, L., 791, [Vaishnavi, V.K.
et ai., SO], [Culik 11, K. et al., 811, [Arnow, D. et al., 841, [Szwarcfiter, J.L., 841,
[Mahmoud, N.M., 861, [Baeza-Yates, R.A., 871, [Huang, S-H.S., 871, [Cunto,
W. et al., 881, [Mahmoud, H.M. et a!., 891, [Sherk, M., 891.
SEARCHING ALGORJTHMS 117
3.4.2 B-trees
A B-tree is a balanced multiway tree with the following properties:
(1) Every node has at most 2m + 1 descendants.
+
(2) Every internal node except the root has at least m 1 descendants, the
root either being a leaf or having at least two descendants.
(3) The leaves are null nodes which all appear at the same depth.
B-trees are usually named after their allowable branching factors, that is,
+ +
m 1-2m 1 . For example, 2-3 trees are B-trees with m = 1; 6-11 trees are
B-trees with m = 5. B-trees are used mainly as a primary key access method
for large databases which cannot be stored in internal memory. Recall the
definition of multiway trees:
Note that, in C, arrays always start with index 0, consequently the array
containing the keys runs from 0 to 2M - 1. The lexicographical order is given
by the fact that all the keys in the subtree pointed by p[i] are greater than
k [ i - 13 and less than k [ i ] .
Let En and EA represent the number of nodes accessed in successful and
unsuccessful searches respectively. Let h(n) be the height of a B-tree with n
keys. Then
1
E[En] = h ( n ) -
2mln 2
+ O(m-2)
118 HANDBOOK OF ALGORITHMS A N D D A T A STRUCTURES
where
P(z) = Zm+l(2m+l - 1) 1
2 - 1
and
tn 4in
= -&(log n ) ( l + O(n-))
n
where 0 < 4m < 1 and dm is a root of P ( z ) = z and Q ( x ) is a periodic function
in x with average value qhm/ In P(qhm)and period In P(qhm).Table 3.25 shows
some exact values.
where w ( m ) e w ( m )= m, and
B-tree search
search(key, t )
t y p e k e y key;
btree t;
{ int i;
while ( t != NULL) {
for (i=O; i<t ->d OL&key>t ->k 7; i++);
if ( k e y == t ->k[z])
{ found(t, 2 ) ; return; }
t = 2 ->p[z];
1
no t f ou n d( k e y ) ;
1;
B-tree insertion
btree insert(key, 2 )
typekey key;
btree t;
{
typekey ins;
extern btree NewTree;
typekey InternalInsertO;
ins = InternalInsert(key, t ) ;
/*** check for growth at the root ***/
if (ins != NoKey) return(NewNode( ins, t, NewTree));
return(t);
1;
t y p e k e y InternalInsert(key, t )
t y p e k e y key;
120 HANDBOOK OF ALGORITHhfS A N D D A T A S T R U C T U R E S
btree t;
{int a, j ;
typekey ins;
btree tempr;
extern btree NewTree;
if ( t == NULL) { /*** the bottom of the tree has been reached
indicate insertion to be done ***/
NewTree = NULL;
return(key);
1
else {
for (i=O; i<t ->d && key>t ->k[z]; i++);
i f (i<t ->d && key == t ->k[a))
Error; /*** Key already in table ***/
else {
ins = Interna/Inseat(key, t ->p[a));
if (ins != NoKey)
/*** the key in "ins" has to be inserted in present node ***/
if ( t ->d < 2*M) InsInNode(t, ins, NewTree);
else /*** present node has to be split ***/
{I***create new node ***/
i f (i<=M) {
tempr = NewNode(t ->b[2*M-l], NULL, t ->p[2*M);
t ->d--;
InsInNode(t, ins, NewTree);
1
else tempr = NewNode(ins, NULL, NewTree);
/*** move keys and pointers ***/
for (j=M+2; j<=2*M; j++)
InslnNode(tempr, t ->kb-l], t ->p[s1);
t->d=M;
tempr ->p[O] = t ->p[M+1];
NewTree = tempr;
return( t ->k[MJ);
1
1
return (NoKey);
1
1;
The above algorithm is structured as a main function insert and a sub-
ordinate function Internallnsert. The main function handles the growth at
the root, while the internal one handles the recursive insertion in the tree.
The insertion function returns a pointer to the resulting tree. This pointer
may point to a new node when the B-tree grows at the root.
SEARCHING ALGOHTHMS 121
The insertion algorithm uses the global variable NewNode to keep track of
newly allocated nodes in the case of node splitting. The function InsInNode
inserts a key and its associated pointer in lexicographical order in a given
node. The function CreateNode allocates storage for a new node and inserts
one key and its left and right descendant pointers. The value N o K e y is an
impossible value for a key and it is used to signal that there is no propagation
of splittings during an insertion.
Although B-trees can be used for internal memory dictionaries, this struc-
ture is most suitable for external searching. For external dictionaries, each
node can be made large enough to fit exactly into a physical record, thus
yielding, in general, high branching factors. This produces trees with very
small height.
B-trees are well suited to searches which look for a range of keys rather
than a unique key. Furthermore, since the B-tree structure is kept balanced
during insertions and deletions, there is no need for periodic reorganizations.
Several variations have been proposed for general B-trees with the inten-
tion of improving the utilization factor of the internal nodes. Note that a
better storage utilization will result in a higher effective branching factor,
shorter height and less complexity. The variations can be loosely grouped in
three different classes.
Overflow techniques
There are several overflow techniques for B-trees. The most important
are B*-trees and solutions based on multiple bucket sizes. Both cases are
variations which try to prevent the splitting of nodes.
In B*-trees, when an overflow occurs during an insertion, instead of split-
ting the node we can:
(1) scan a right or left brother of the node to see if there is any room, and, if
there is, we can transfer one key-pointer pair (the leftmost or rightmost
respectively) to make room in the overflowed node;
(2) scan both left and right siblings of a node;
(3) scan all the descendants of the parent of the node.
If splitting is still necessary, the new nodes may take some keys from their
siblings to achieve a more even distribution of keys in nodes. In the worst-
case a 67% node storage utilization is achieved, with an average value of
approximately 81%.
When we have multiple bucket sizes, instead of splitting the node, we
expand it. This is called a partial expansion. When the bucket reaches the
maximum size, we split it into two buckets of minimum size. The simplest
case is having two bucket sizes of relative size ratio 2/3. This also gives a
67% worst-case storage utilization and around 80% average storage utilization
(including external fragmentation owing to two bucket sizes). There are also
adaptive overflow techniques that perform well for sorted or non-uniformly
distributed inputs based on multiple bucket sizes.
122 IIANDDOOK OF ALGORTTIIMS AND DATA STRUCTURES
mt - N - D -LEAF -+
h
nit - 2m- KEY - [D1].
The above variations are somewhat orthogonal, in the sense that these
can be applied simultaneously to achieve varying degrees of optimization.
Note that the limits of the range for any gain in efficiency are from about
70% occupation (for randomly generated trees) to 100% occupation (optimal
trees). The coding complexity of some of these implementations may not
justify the gains.
Table 3.26 presents simulation results of 6-11 trees for several sizes, and
Table 3.27 shows simulation results for various branching factors and a con-
stant size. In both cases, E n indicates the number of nodes accessed, h(n)
indicates the height of the tree, Nn is the average number of nodes in the tree,
and Sn is the average number of splits that the n + l t h insertion will require.
The simulation results indicate that the variance on the number of nodes
accessed is very small. Induced by the formula for the upper bound on the
variance, and with the arbitrary assumption that
SEARCHING ALGORITHMS 123
Nnln sn
0.2 0
1 1 0.1 1
1.889599f0.000007 2f0.0000003 0.150401f0.000007 0.12718f0.00009
2.83386f0.00016 2.9623f0.0002 0.1581 09f0.000009 0.13922f0.00013
2.860087f0.000008 3f0.000003 0.1459 13f0.000008 0.13623f0.00012
3.857201f0.000009 4f0.000007 0.146799f0.000009 0.13972f0.00013
5000 3.8792f0.0011 4.0243f0.0011 0.145827f0.0000 11 0.14724f0.00015
10000 4.854505f0.000011 5f0.000089 0.145995f0.0000 11 0.14704f0.00016
50000 5.85293f0.00079 5.9990f0.0008 0.1461 99f0.0000 12 0.14651f0.00016
General references:
, 711, [Bayer, R. et al., 721, [Knutli, D.E., 731, [Wagner, R.E., 731,
[ B a ~ e rR.,
[Wong, C.K. et al., 731, [Bayer, R., 74:],[Bayer, R. et al., 761, [Horowitz, E. et
a/., 761, [Samadi, B., 761, [Shneiderman, B. et a!., 761, [Wirth, N., 761, [Bayer,
R. et al., 771, [Guibas, L.J. et al., 771, [McCreight, E.M., 771, [Reingold, E.M.
et al., 771, [Gotlieb, C.C. et al., 781, [Held, G. et al., 781, [Maly, K., 781,
[Snyder, L., 781, [Comer, D., 791, [Frederickson, G.N., 791, [Strong, H.R. et
al., 791, [Quitzow, K.H. et al., 801, [Standish, T.A., 801, [Wright, W.E., 801,
[Batory, D.S., 811, [Culik 11, K. et al., 811, [Gotlieb, L.R., 811, [Hansen, W.J.,
811, [Huddleston, S. et al., 811, [Ouksel, M. et al., 811, [Robinson, J.T., 811,
124 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
n=O
4"
t , = -Q(ln
n
n)(l + O(n-l))
where 4 = (1 + 6 ) / 2 is the 'golden ratio', and Q(z) is a periodic function
with period In (4 - 4) and mean value ($In (4 - +))-l.
Let Nn be the expected number of nodes in a 2-3 tree built by the insertion
of a random permutation of n keys. Then
E [ k e y s ] = (0.72161...)3h
The algorithm for searching and performing insertions in 2-3 trees is the
same as the general algorithm for B-trees with m = 1.
As opposed to general B-trees, 2-3 trees are intended for use in main
memory.
In Table 3.28, we give figures showing the performance of 2-3 trees con-
structed from random sets of keys.
~ n EPnI
5 1.68 2 0.72 0.40
10 2.528571 3 0.771429 0.522078
50 4.18710f0.00023 4.84606f0.00025 0.755878f0.000032 0.71874f0.00021
100 4.71396f0.00047 5.40699f0.00049 0.747097f0.000035 0.75062f0.00023
500 6.46226f0.00093 7.19371f0.00094 0.74583lf0.000035 0.74726f0.00025
1000 7.27715f0.00042 8.01493f0.00042 0.745800f0.000035 0.74550f0.00025
5000 9.25824f0.00040 10.0023f0.0004 0.746027f0.000038 0.7459lf0.00028
10000 10.25436f0.00032 10.9993f0.0003 0.746064f0.000039 0.74588f0.00029
50000 12.2518f0.0014 12.9977f0.0014 0.746090~0.000043 0.74610f0.00031
1 N
nB 1
- 5 - < - = 0.70710 ...
2 n - 4
5 1.4142 ...
1 5 E[NnB1
n
References:
[Aho, A.V. et al., 741, [Brown, M.R. et al., 781, [Brown, M.R., 781, [Kriegel,
H.P. et al., 781, [Rosenberg, A.L. et al., 781, [Yao, A.C-C., 781, [Brown, M.R.,
791, [Larson, J.A. et al., 791, [Miller, R. et al., 791, [Reingold, E.M., 791,
[Vaishnavi, V.K. et al., 791, [Bent, S.W. et al., 801, [Brown, M.R. et al., 801,
[Olivie, H.J., 801, [Bitner, J.R. et al., 811, [Kosaraju, S.R., 811, [Maier, D. et
al., 811, [Eisenbarth, B. et al., 821, [Gupta, U.I.et al., 821, [Huddleston, S. et
126 HANDBOOK OF ALGORITIIMS AND DATA STRUCTURES
al., 821, [Mehlhorn, K., 821, [Ziviani, N . , 821, [Kriegel, H.P. et al., 831, [Murthy,
Y.D. e t al., 831, [Zaki, AS., 831, [Zaki, A S . , 841, [Baeza-Yates, R.A. et al.,
851, [Bagchi, A. et al., 851, [Klein, R. et al., 871, [Aldous, D. et al., 881, [Wood,
D., 881.
A C E A C E
Table 3.29 shows some simulation results for SBB trees. Cnis the average
number of nodes visited during a successful search and Sn, Vn and h ( n ) have
the meaning described earlier.
n cn Sn Vn In E[h(n)l
5 2.2000f0.0003 0.213f0.023 1.213f0.023 3.000f0.020
10 2.9057f0.0035 0.293f0.015 1.663f0.021 4.023f0.022
50 4.9720=t0.0051 0.3594f0.0050 2.1692~t0.0073 7.009f0.016
100 5.9307fO .0054 0.3733fO .0046 2.2757fO .0072 8.093f0.033
500 8.2419f0.0059 0.3868f0.0027 2.3801f0.0047 11.027f0.026
1000 9.2537f0.0062 0.3872f0.0023 2.3975f0.0042 12.140f0.068
5000 11.6081f0.0073 0.3876f0.0013 2.4088f0.0023 15.014f0.028
10000 12.6287f0.0083 0.3880f0.0011 2.4 109f0.OO 19 16.180fO.108
From the simulation results we can see that the value for C,, is close to
the value of loga n; in particular, under the arbitrary assumption that
then
References:
[Bayer, R., 721, [Olivie, H.J., 801, [Ziviani, N. et al., 821, [Ziviani, N., 821,
[Tarjan, R.E., 831, [Ziviani, N. et al., 851.
128 HANDBOOK OF ALGORITIIAfS AND DATA STRUCTURES
References:
[Maurer, H.A. e t al., 761, [Ottmann, T. e t al., 781, [Ottmann, T. e t al., 791,
[Olivie, H.J., 801, [Ottmann, T. e t al., 801, [Ottmann, T. e t al., 801, [Olivie,
H.J., 811, [Ottmann, T. e t al., 811, [Mehlhorn, K., 821, [Ottmann, T. e t al.,
841, [Klein, R. et al., 871, [Wood, D., 881.
2-3-4 trees are similar to B-trees. We allow nodes having two, three, or four
children. As for B-trees, all the leaves are at the same level, and this property
is maintained through node splitting when we perform an insertion.
It is possible to represent 2-3-4 trees as binary trees. These are called
red-black trees. A red-black tree is a binary search tree where every node
has a colour, which can be either red or black. The correspondence with 2-3-4
trees is as follows:
(1) A black node with two red children is equivalent to a four children node;
(2) a black node with one red child (the other must be black) corresponds
to a three children node;
(3) a black node with no red children is a two-child node (both children are
black).
According to the above, the colouring of the nodes satisfies the following
proper ties:
(3) Every path from a node to a leaf contains the same number of black
nodes.
Maintaining the colouring properties (that is, balancing the tree) of red-
black trees, during an insertion or a deletion, is done through rotations (Sec-
tion 3.4.1.8).
References:
[Guibas, L.J. e t al., 781, [Sedgewick, R., 881, [Cormen, T . H . e t al., 901.
130 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
References:
[Lomet, D.B., 811, [Scheurmann, P. et al., 821, [Lomet, D.B., 831, [Litwin, W.
et al., 861, [Hsiao, Y-S. et al., 881, [Baeza-Yates, R.A., 891, [Christodoulakis,
S. et al., 891, [Lomet, D.B. et al., 891, [Baeza-Yates, R.A., 901, [Lomet, D.B.
et al., 901.
bucket(KEY) : DICT2(KEY);
In the above definition, DICTl stands for the organization of the index file
and DICT2 for the organization of each individual bucket (both mapping to
DICT), while the collection of all the bucket(KEY) forms the main file.
Indexed files can be organized in several levels. By adding an index of the
index we increase the number of levels by one. This is formally described by
mapping the bucket(KEY) to
bucket (KEY) : iiidex( KEY)
SEARCHING ALGORI?llh!IS 131
instead. If the same DICT structures for each level of indexing are chosen,
the file has homogeneous indexing. In practice, the number of levels is
very small and homogeneous (typically one or two levels).
The typical choices for the DICT structure in the index file are arrays and
trees. The typical choice for the bucket is a sequential array. An indexed file
can, however, be implemented using any selection for the DICT structures
in the index file and bucket and the SET representation for the main file.
Normally the following constraints are imposed on the structure:
(1) each index entry contains as key the maximum key appearing in the
pointed bucket(KEY).
(2) the index file structure should perform range searches, or nearest-
neighbour searches efficiently, the type of search of most interest being
search the smallest key 2 X .
(3) the bucket(KEY) should allow some type of dynamic growth (overflow
records, chaining, and so on), which should not be of bounded size.
(4) it should be possible to scan all the components in a bucket sequentially
and all the components of the set sequentially, or, in other words, it
should be possible to scan all the main file sequentially.
(5) the index contains an artificial key (00) which is larger than any other
key in the file.
begin
low := 0;
high := n; {*** highest index entry ***)
while high-low > 1 do begin
j := (high+low) div 2;
if key <= indexb].k then high := j
else low := j
end;
Searchlndex := index[high].BuckAddr
end;
132 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
begin
while p <> nil do begin
ReadBucket(p) into bucket;
i : = B;
while (i>l) and (bucket.r[z].k>key) do i := 2-1;
if bucket.r[z].k = key then goto 999 {*** break ***}
else if i=B then p := bucket.nezt
else p := nil
end;
999:
if p <> nil then found(bucket.r[z])
else n o tfo u n d( key)
end;
The goal of indexed files is to have an index small enough to keep in main
memory, and buckets small enough to read with a single access. In this ideal
situation, only one external access per random request is needed.
B*-trees (see Section 3.4.2) are a generalization of a special implementation
of index files.
SearchBucket(key, SearchIndez(key));
main file.
index(KEY) : (KEY}?
bucket(KEY) : ({KEY, D
}
:
, int);
In the above definition, B is the bucket size, N denotes the number of buckets
in the main file, and W denotes the number of buckets reserved for overflow.
The integer in the bucket(KEY) is the index of the corresponding overflow
bucket.
The buckets are designed to match closely the physical characteristics of
devices, for example, typically a bucket fully occupies a track in a disk. In
some cases the index is organized as an indexed file itself, in which case the
ISAM becomes a two-level index. For two-level indices the same array struc-
tures are used. The top level index is made to match a physical characteristic
of the device, for example, a cylinder in a disk.
General references:
[Chapin, N., 691, [Chapin, N., 691, [Ghosh, S.P. et al., 691, [Senko, M.E. et al.,
691, [Collmeyer, A.J. et al., 701, [Lum, V.Y., 701, [Mullin, J.K., 711, [Nijssen,
G.M., 711, [Mullin, J.K., 721, [Cardenas, A.F., 731, [Casey, R.G., 731, [Wag-
ner, R.E., 731, [Behymer, J.A. et al., 741, [Grimson, J.B. et al., 741, [Keehn,
D.G. et al., 741, [Shneiderman, B., 741, [Schkolnick, M., 751, [Schkolnick, M.,
751, [Whitt, J.D. et al., 751, [Wong, K.F. et al., 751, [Yue, P.C. et al., 751,
[Gairola, B.K. e t al., 761, [Shneiderman, B. et al., 761, [Anderson, H.D. et al.,
771, [Cardenas, A.F. et al., 771, [Maruyama, K. et al., 771, [Schkolnick, M.,
771, [Senko, M.E., 771, [Severance, D.G.et al., 771, [Gotlieb, C.C. et al., 781,
[Kollias, J.G., 781, [Nakamura, T. et al., 781, [Mizoguchi, T., 791, [Strong, H.R.
et al., 791, [Zvegintzov, N., 801, [Batory, D.S., 811, [Larson, P., 811, [Leipala,
T., 811, [Leipala, T., 821, [Willard, D.E., 821, [Burkhard, W.A., 831, [Cooper,
R.B. et al., 841, [Manolopoulos, Y.P., 861, [Willard, D.E., 861, [Ramakrishna,
M.V. et al., 881, [Rao, V.N.S. et al., 881.
any level where the remaining subtrie has only one record, the branching is
suspended. A trie of order M is defined by
tr-M-D : [{tr-M-D}y]; [D]; nil
The basic trie tree, if the underlying alphabet is ordered, is a lexicograph-
ically ordered tree. The character set is usually the alphabet or the decimal
digits or both. Typically the character set has to include a string-terminator
character (blank). If a string terminator character is available, tries can store
variable length keys. In particular, as we use the smallest prefix of the key
which makes the key unique, digital trees are well suited for handling un-
bounded or semi-infinite keys.
Let Cn and C i denote the average number of internal nodes inspected
during a successful search and an unsuccessful search respectively. Let Nn
denote the number of internal nodes in a trie with n keys, and let h(n) denote
its height. The digital cardinality will be denoted by m; this is the size of
the alphabet and coincides with the dimension of the internal-node arrays.
In all the following formulas, P ( z ) denotes complicated periodic (or con-
vergent t o periodic) functions with average value 0 and very small absolute
value. These functions should be ignored for any practical purposes. Although
we use P ( z ) for all such functions, these may be different.
For tries built from random keys, uniformly distributed in U ( 0 , l ) (or keys
composed of random-uniform digits) we have:
n
- -(1
In m
+ P(log, n)) + O(1)
(c;= c; = 0)
Hn-1
- In in
+ -21 + P(log,n) + O(n-l)
E[h(n)] = 2 log,,, n + o(log n )
where Hn = Cyzll / i denote the harmonic numbers. Table 3.30 shows some
exact values.
SEARCHING ALGORITHMS 135
search( key, t )
typekey key;
trie t;
{
int depth;
for( depth=l; t ! = N U L L && !IsData( t ) ; depth++)
t = t ->p[charac( depth,Ley)];
i f ( t != N U L L && k e y == t ->k)
found( 2);
else notfound( k e y ) ;
~ ~ ~~
1 ~~~
{
int j ;
trie t l ;
i f (t==NULL) r e t u r n ( N e w D a t a N o d e ( key));
if ( I s D a t a ( t ) )
if ( t ->k == key)
Error /*** K e y already in table ***/;
else { tl = N e w l n t N o d e ( ) ;
21 ->p[charac(depth,t ->k)] = t;
t = insert(key, 21, depth);
1
else { j = charac(depth,key);
t -->pbJ = insert( key, t ->pb], depth+l);
1
return(t);
insert uses the level indicator depth to facilitate the search. The user should
call this function with depth 1; for example, insert(bey,trie, 1). The function
IsData(t) tests whether a pointer points to an internal node or to a data
node. The functions NewIntNode and NewDataNode create new nodes of
the corresponding types.
In cases where there is no value associated with the key, we can avoid
the data records completely with a special terminator (such as nil*) which
indicates that a string key terminates there. The key, if desired, can be
reconstructed from the path in the tree.
There is a very close correspondence between a trie tree and top-down
radix sort, as the trie structure reflects the execution pattern of the sort, each
node corresponds to one call to the sorting routine.
I m=2
n ECNnl cn c: E[h(n)l
10 13.42660 4.58131 3.28307 6.92605f0.00068
50 7 1.13458 6.96212 5.54827 11.6105f0.0017
100 143.26928 7.96937 6.54110 13.6108f0.0025
500 720.348 10 10.29709 8.85727 18.2517f0.0060
1000 1441.69617 11.29781 9.85655 20.2566f0.0087
5000 72 12.47792 I 13.62031 12.17792 24.877f0.020
10000 14425.95582 14.62039 13.17785 26.769f0.027
50000
~ ~~
72133.67421 ~ 16.94237 15.49970 30.246f0.03 1
m = 10
10 4.1 1539 1.70903 1.26821 2.42065 f0.00022
50 20.92787 2.43643 2.05685 3.84110f0.00059
100 42.60540 2.73549 2.26860 4.43724f0.00082
500 210.60300 3.44059 3.05159 5.8418f0.002 1
1000 427.45740 3.73802 3.26849 6.4373f0.0029
5000 2107.33593 4.44100 4.05106 7.8286f0.0071
10000 4275.97176 4.73827 4.26847 8.3965f0.0091
50000 21074.66351 5.44104 5.05100 9.494f0.020
General references:
[de la Brandais, R., 591, [Fredkin, E., GO], [Sussenguth, E.H., 631, [Patt,
Y.N., 691, [Knuth, D.E., 731, [Burkhard, W.A., 761, [Horowitz, E. et al., 761,
[Maly, K., 761, [Stanfel, L., 761, [Burkhard, W.A., 771, [Comer, D. et al., 771,
[Miyakawa, M. et a/., 771, [Nicklas, B.M. et al., 771, [Reingold, E.M. et al.,
771, [Gotlieb, C.C. et al., 781, [Comer, D.,791, [Mehlhorn, K., 791, [Tarjan,
R.E. et al., 791, [Comer, D.,811, [Litwin, W., 811, [Lomet, D.B., 811, [Reg-
nier, M., 811, [Tamminen, M., 811, [Devroye, L., 821, [Flajolet, P. et al., 821,
[Knott, G.D., 821, [Orenstein, J.A., 821, [Comer, D., 831, [Flajolet, P. et al.,
831, [Flajolet, P., 831, [Devroye, L., 841, [Mehlhorn, K., 841, [Flajolet, P. e t
al., 851, [Flajolet, P. et al., 861, [Jacquet, P. et al., 861, [Kirschenhofer, P. et
al., 861, [Litwin, W. et al., 861, [Pittel, B., 861, [Szpankowski, W., 871, [de la
Torre, P., 871, [Kirschenhofer, P. ei al., 881, [Lomet, D.B., 881, [Sedgewick,
R., 881, [Szpankowski, W., 881, [Szpankowski, W., 881, [Luccio, F. et al., 891,
[Szpankowski, W., 891, [Murphy, O.J., 901.
Hn-1-Hb-1
Cn =
In m
+ -21 + P(log,n) + O(n-1)
Hn-Hb
c:, = In m
+ -21 + P(log,n) + O(n-1)
The exact formulas for the above quantities are the same as the ones for
general tries but with the extended initial condition: NO = N1 = ... = Nb = 0 .
For bucket binary tries, that is, when m = 2 we have
138 IIANDBOOK OF ALGORITIIMS AND DATA STRUCTURES
Bucket binary tries are used as the collision resolution mechanism for dy-
namic hashing (see Section 3.3.14).
A different type of hybrid trie is obtained by implementing the array in
the internal nodes with a structure which takes advantage of its possible spar-
sity: for example, a linked list consisting of links only for non-empty subtries.
Almost any technique of those used for economizing storage in B-tree nodes
can be applied to the internal nodes in the tries (see Section 3.4.2).
Nn = n
Cn = log,n
7-1
+-
In m
+ 3
-
2
- a , + P(log, n) + o (%)
where
= 1.60669...
n cn c:, cn c:,
10 3.04816 3.24647 2.19458 1.64068
50 5.06061 5.41239 2.90096 2.32270
100 6.00381 6.39134 3.19015 2.61841
500 8.26909 8.69616 3.89782 3.3 1913
1000 9.26011 9.69400 4.18865 3.6 1622
5000 11.57373 12.01420 4.89731 4.31876
10000 12.57250 13.01398 5.18840 4.61600
140 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
(CO = c
1 = 0)
= log2n t.- 1
Y - -
In 2 2
+ P(log,n) + O(n-l)
n-1
c:, 2n - 2 i=l
(C;, = c; = 0)
= log2n + y -InI n2n + -21 + P(log2n) + O ( n - l )
SEARCHING rl1,CORITIIMS 141
search( k e y , t )
typekey key;
Patricia 2;
{
i f (t==NULL) notfound(key);
else { while ( ! I s D a t a ( t ) )
t = bit(t ->level,key) ? t ->right : t ->left;
if ( k e y == t ->k) f o u n d ( t ) ;
else notfound( key);
1
1;
Patricia i n s e r f (key, t )
typekey key;
Patricia t;
{Patricia p ;
Patricia I n s B e t w e e n o ;
int i;
i f (t==NUI;L) return(NewDa;taNode(key));
for(p= t; !I s D a t a( p ) ;)
p = bit(p ->level, key) ? p ->right : p ->left ;
typekey key;
Patricia t;
int i;
(Patricia p ;
if ( l s D a t a ( t ) 11 i < t -->level) {
/* create a new internal node */
p = NeurDataNode(key);
return( bit( i,key) ? NewlntNode( i , t , p ) : NewlntNode( i , p , t ) ) ;
1
if ( b i t ( t ->lewel,key)==l)
t ->right = InsBetween(key, t ->right, i ) ;
else t ->left = InsBetween(key, t ->left, i ) ;
return(t);
1;
The function bit(i, k e y ) returns the ith bit of a key. The functions I s D a t a ,
N e w l n t N o d e and N e w D a t a N o d e have the same functionality as the ones for
tries .
Some implementations keep the number of bits skipped between the bit in-
spected by a node and the bit inspected by its parent, instead of the bit index.
This approach may save some space, but complicates the calling sequence and
the algorithms.
Patricia trees are a practical and efficient solution for handling variable
length or very long keys; they are particularly well suited for text search-
ing. Note that the problem generated by very long common prefixes virtually
disappears for Patricia trees.
The structure generated by building a Patricia tree over all the semi-
infinite strings resulting from a base string (or base text) is called a PAT tree
and has several important uses in text searching (see Section 7.2.2).
Given a set of keys, the shape of the tree is determined, so there cannot
be any conformation or reorganization algorithm.
In summary, digital trees provide a convenient implementation for several
database applications. The most important reasons are:
(2) they allow searching on very long or unbounded keys very efficiently;
(4) they allow search of interleaved keys and hence they are amenable to
multidimensional search.
SEARClllNG ALGORITHMS 143
I -
I .. E[Wl
3.58131 3.07425 4.63400 f O .00023
5.962 12 5.33950
6.96937 6.33232
9.29709 8.64847
10.29781 9.64775
5000 12.62031 11.96910
10000 13.62039 12.96903
50000 15.94237 15.29091
References:
[Morrison, D.R., 681, [Knuth, D.E., 731, [Merrett, T.H. e t a/., 851, [Sz-
pankowski, W., 861, [Kirschenhofer, P. et a/., 881, [Sedgewick, R., 881,
[Kirschenhofer, P. et al., 891.
General references:
[Lum, V.Y., 701, [Dobkin, D. et al., 741, [Rothnie, J.B. et al., 741, [Dobkin,
D. et al., 761, [Raghavan, V.V. et al., 771, [Bentley, J.L. et al., 791, [Kosaraju,
S.R., 791, [Ladi, E. et al., 791, [Lipski, Jr., W. et al., 791, [Bentley, J.L., 801,
[Guting, R.H. et al., 801, [Hirschberg, D.S., 801, [Lee, D.T. et al., 801, [Guting,
R.H. et al., 811, [Ouksel, M. et al., 811, [Eastman, C.M. et al., 821, [Orenstein,
J.A., 821, [Scheurmann, P. et al., 821, [Willard, D.E., 821, [Guttman, A., 841,
[Madhavan, C.E.V., 841, [Mehlhorn, K., 841, [Kent, P., 851, [Cole, R., 861,
[Faloutsos, C. ei al., 871, [Karlsson, R.G. et al., 871, [Munro, J.I., 871, [Sacks-
Davis, R. et al., 871, [Sellis, T. et al., 871, [Willard, D.E., 871, [Fiat, A. et al.,
881, [Seeger, B. et al., 881, [Henrich, A. et al., 891, [Lomet, D.B. et al., 891.
Hn 5n 4
Var[CA] = HA2) + -2
+ -
9
--
9n2
--
13
6
(for k = 2)
, n-1
2
Cn = - l n n + ~ k + O
k + log n n-2+2cm Y) (for any k)
where T k is independent of n.
For partial matches, for k = 2 when only one key is specified,
- 1 + o(1)
= 1.595099...n0.561552***
where (I! = *.
search( key, t )
t?/Pekey ;
tree t;
{
int i, indx, noteq;
while(t != N U L L ) {
indx = noteq = 0;
for (i=O; i<k;i++) {
indx = indx << 1;
if (key[zj > t ->k[zj) indx++;
if (key[z] != i ->k[zj) noteq++;
1
if (noteq) t = t ->p[indz];
else { found(i); return; }
1
notfound( key);
1;
146 IIANDBOOIC OF ALGORITIMS AND DATA STRUCTURES
tree insert(key, t )
typekey key[ 3;
tree t ;
{
int i, indx, noteq;
if ( t = = N U L L ) t = NewNode(key);
Table 3.33: Exact and simulation results for quad trees of two and three
dimensions.
k=2 k=3
n Cn E[h(n)l Cn E[h(n)l
5 2.23556 3.28455f0.00014 2.09307 2.97251f0.00013
10 2 34327 4.41439fO .00025 2.53845 3.78007fO .00022
50 4.35920 7.30033f0.00075 3.59019 5.81713f0.00058
100 5.03634 8.6134f0.0011 4.04838 6.72123f0.00086
500 6.63035 11.7547f0.0029 5.11746 8.8586f0.0021
1000 7.32113 13.1337f0.0043 5.57895 9.7953f0 .0031
5000 8.92842 16.382f0.011 6.65135 11.9847f0.0076
10000 9.62 125 17.784f0.015 7.11336 12.942f0.011
50000 11.2304 21.106f0.038 8.18624 15.140f0.027
vergent to periodic) functions with average value 0 and very small absolute
value. These functions should be ignored for any practical purposes. Although
we use P ( x ) for all such functions, these may be different. The behaviour of
quad tries is identical to those of digital tries of order 2':
- -Hn-1
-
kln2
+ -21 + P((log2n)/k) + O(n-')
n
c:, = 1 + 2-'" (7)(2' -1)n-q (C;, = c; = 0)
6=2
148 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
tree insert(key, t )
typekey k [ q ;
tree t;
{ tree Insertlndx();
return(InsertIndx(key,t,1 ) ) ;
1
tree InsertIndx( key, t , lev)
typekey key[lil;
tree t;
int lev;
{ int i, indx;
tree 21;
i f ( t == NULL) return( NewDataNode( k e y ) ) ;
if (IsData(t)) {
for(i=O; i<K && key[z] == t ->k[z]; i++);
if ( i >= A? {
Error /*** Key already an table ***/;
return(t);
1
else { tl = NewIntNode();
SEARCHING ALGORITHMS 149
indx = 0 ;
for (i=O; i<Ei; i++) indx = 2*indx + bit(lev,t ->k[2]);
tl ->p[indx] = t ;
t = tl;
1
1
indx = 0 ;
+
for (i=O; i<E, i++) indx = 2*indx bit(lev,key[a]);
t ->p[indz] = InsertIndz(key, t ->p[indz], lev+l);
return (t );
1;
Quad tries have been successfully used to represent data associated with
planar coordinates such as maps, graphics, and bit-map displays. For
example, in describing a planar surface, if all the surface is homogeneous, then
it can be described by an external node, if not, the surface is divided into four
equal-size quadrants and the description process continues recursively.
References:
[Finkel, R.A. et al., 741, [Bentley, J.L. et al., 751, [Lee, D.T. et al., 771, [Over-
mars, M.H. et al., 821, [Flajolet, P. et al., 831, [Beckley, D.A. et al., 851, [Fla-
jolet, P. et al., 851, [Fabbrini, F. et al., 861, [Nelson, R.C. et al., 871, [Cunto,
W . et al., 891, [Flajolet, P. et al., 911.
+
u2(A,) = ( 2 10/n)H,, - 4 ( 1 + l / n ) ( H : / n + H F ) )+ 4
N 1.3863 log, n - 1.4253
search( key, t )
typekey kedlil;
tree t;
int lev, i;
for (le-0; t != NULL; lev=(lev+l)%h? {
for (i=O; i<K && key[z]==t ->k[zl; i++);
if (i==K) { found(t); return; }
if (key[lev]> t ->k[lev]) t = t ->right;
else t = t ->left;
1
notfound( key);
~
1; ~~ ~
We have
~ = i - P- + e
k
with 0 < 8 < 0.07. Table 3.34 shows some values for A.
The constant which multiplies the nx term depends on which subkeys are
used in the partial-match query. This constant is lowest when the subkeys
used for the search are the first subkeys of the key.
I A I
k p=l p=2 p=3
1 1 2
3
4
0.56155
0.71618
0.78995
1 0.39485
0.56155
1 0.30555 I
K-d trees allow range searches; the following algorithm searches a k-d tree
for values contained between lowk and uppk. The function f o u n d ( ) is called
for each value in the tree within the range.
{int j ;
if (t==NULL) return; .
if (Zowk[lev]<= t ->k[Zev])
rsearch(Zowk, uppk, t ->left, (/ev+l)%IQ;
if (uppk[lev]> t ->k[Zev])
mearch( lowk, uppk, t ->right, (lev+l) %IQ;
1;
There are no efficient or simple methods for performing rotations in k-d
trees. Consequently it is difficult to maintain a k-d tree balanced.
152 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
References:
[Bentley, J.L., 751, [Friedman, J.H. et al., 77],\[Lee, D.T. et al., 771, [Bentley,
J.L., 791, [Silva-Filho, Y.V., 791, [Eastman, C.M., 811, [Robinson, J.T., 811,
[Silva-Filho, Y.V., 811, [Eastman, C.M. et al., 821, [Hoshi, M. et al., 821,
[Overmars, M.H. e t al., 821, [Flajolet, P. et al., 831, [Beckley, D.A. et a/., 851,
[Flajolet, P. et al., 861, [Murphy, O.J. et al., 861, [Lea, D., 881.
A Sorting Algorithms
The typical definition for procedures to sort arrays in place is, in Pascal:
and in C:
where P is the array to be sorted between r[Zo] and r[up].The sorting is done
in place, in other words, the array is modified by permuting its components
into ascending order.
153
154 H A N D B O O K OF A L G O R I T H M S A N D D A T A S T R U C T U R E S
n-1 5 Cn 5 n(n - 1)
2
n(n - 1)
O I I n I
2
n(n - 1)
-
E[In] =
4
Elpasses] = n -- d a + 513 + 0 (3
The simplest form of the bubble sort always makes its passes from the top of
the array to the bottom.
Bubble sort
var i, j : integer;
tempr : ArrayEntry;
begin
while u p > h do begin
j := lo;
for +lo to up-1. do
if .[z).k > 7'[i+l].k then begin
tempr := 4 2 1 ;
r[z) := r[i+l];
r[i+l] := tempr;
j:= i
end;
SORTING ALGOMTHMS 155 ~
up := j
end
end;
A slightly more complicated algorithm passes from the bottom to the top,
then makes a return pass from top to bottom.
{int i, j;
while (up>lo) {
j = lo;
for (i=lo; i<up; i++)
if (dz1.k > r[i+l].k) {
ezrchange(r, i, i+l);
j = i;}
up = j;
for (i=up; i> lo; i--)
if (.[zj.k < 42-1l.k) {
ezchange( r, i, i-1);
j = a;}
lo = j ;
1
1
The bubble sort is a simple sorting algorithm, but it is inefficient. Its
running time is O ( n 2 ) ,unacceptable even for medium-sized files. Perhaps for
very small files its simplicity may justify its use, but the linear insertion sort
(see Section 4.1.2) is just as simple to code and more efficient to run.
For files with very few elements out of place, the double-direction bubble
sort (or cocktail shaker sort) can be very efficient. If only .k of the n elements
are out of order, the running time of the double-direction sort is O(lcn). One
advantage of the bubble sort is that it is stable: records with equal keys remain
in the same relative order after the sort as before.
References:
[Knuth, D.E., 731, [Reingold, E.M. et ai., 771, [Dobosiewicz, W., 801, [Meijer,
H. et al., 801, [Sedgewick, R., 881, [Weiss, M.A. et al., 881.
156 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
a2(Cn) =
(2n - l l ) n ( n + 7) + 2Hn - Hi2)
72
(int i , j;
ArrayEntry temp?
for (i=up-I; i>=Zo; i---) {
tempr = dz];
for (j=i+I; j<=up && (tempr.k>+j.k);j++)
4l-11 = +I;
+-I] = temp?
. 1
If the table can be extended to add one sentinel record at its end (a record
with the largest possible key), linear insertion sort will improve its efficiency
by having a simpler inner loop.
{int i, j ;
ArrayEnt ry tempr;
r[up+l].k = MaximumKey;
for (i=up- 1; i>=lo; i--) {
ternpr = 7'1z'J;
for (j=i+l; tempr.k>+].k; j++)
+-I] = +];
rb-11 = tempr;
1
The running time for sorting a file of size n with the linear insertion sort
is O ( n 2 ) . For this reason, the use of the algorithm is justifiable only for
sorting very small files. For files of this size (say n < lo), however, the linear
insertion sort may be more efficient than algorithms which perform better
asymptotically. The main advantage of the algorithm is the simplicity of its
code.
Like the bubble sort (see Section 4.1.1), the linear insertion sort is stable:
records with equal keys remain in the same relative order after the sort as
before.
A common variation of linear insertion sort is to do the searching of the
final position of each key with binary search. This variation, called binary
insertion sort, uses an almost optimal number of comparisons but does not
reduce the number of interchanges needed to make space for the inserted key.
The total running time still remains O ( n 2 ) .
{int i, j , h, I;
ArrayEnty tempr;
for (i=lo+l; i<=up; i++) {
tempr = rfz'l;
for (l=Zo-1, h=i; h-l> 1 ; ) {
j = (h+1)/2;
if (tempr.k < 4 j . k ) h = j ; else 1 = j ;
1
for (j=i; j>h; j--) $1 = 4-11;
158 HANDBOOK OF ALGORTTHMS AND DATA STRUCTURES
References:
[Knuth, D.E., 731, [Horowitz, E. et al., 761, [Janko, W., 761, [Reingold, E.M.
et al., 771, [Gotlieb, C.C. et a/., 781, [Melville, R. et al., 801, [Dijkstra, E.W.
et al., 821, [Doberkat, E.E., 821, [Panny, W., 861, [Baase, S., 881, [Sedgewick,
R., 881.
4.1.3 Quicksort
Quicksort is a sorting algorithm which uses the divide-and-conquer technique.
To begin each iteration an element is selected from the file. .The file is then
split into two subfiles, those elements with keys smaller than the selected one
and those elements whose keys are larger. In this way, the selected element
is placed in its proper final location between the two resulting subfiles. This
procedure is repeated recursively on the two subfiles and so on.
Let Cn be the number of comparisons needed to sort a random array of
size n , let In be the number of interchanges performed in the process (for the
present algorithm In will be taken as the number of record assignments), and
let k = [log, n J . Then
n-1
Quicksort algorithm
var i, j : integer;
tempr : ArrayEntry;
begin
while up>lo do begin
2 .- lo;
j := U P ;
tempr := .[lo];
{*** Split file in two ***}
while i<j do begin
while rL1.k > tempr.k do
j := j-1;
421 := rb];
while (i<j) and (r[z],k<=tempr.lc)do
i .-
.- i+l;
:= r[z]
end;
r[zl := tempr;
{*** Sort recursively ***}
sort( r,lo,i-l);
lo := i+l
end
end;
The above algorithm uses the same technique even for very small files. As
it turns out, very small subfiles can be sorted more efficiently with other tech-
niques, such as, linear insertion sort or binary insertion sort (see Section 4.1.2).
It is relatively simple to build a hybrid algorithm which uses Quicksort for
large files and switches to a simpler, more efficient, algorithm for small files.
160 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
Composition of Quicksort
....
begin
while up-lo > Af do begin
. . . . body of quicksort; . . ..
end;
if up > lo then begin
. . . . simpler-sort . . . .
end
end;
(4) Arithmetic averages, or any other method which selects a value that is
not part of the array, produce algorithms that may loop on equal keys.
Arithmetic operations on keys significantly restrict the applicability of
sorting algorithms.
References:
[Hoare, C.A.R., 611, [Hoare, C.A.R., 621, [Scowen, R.S., 651, [Singleton, R.C.,
691, [Frazer, W.D. et al., 701, [van Emden, M.H., 701, [van Emden, M.H.,
701, [Knuth, D.E., 731, [Aho, A.V. et al., 741, [Knuth, D.E., 741, [Loeser, R.,
741, [Peters, J.G. et ul., 751, [Sedgewick, R., 751, [Horowitz, E. et ul., 761,
[Reingold, E.M. et ul., 771, [Sedgewick, R., 771, [Sedgewick, R., 771, [Apers,
P.M., 781, [Gotlieb, C.C. et ul., 781, [Sedgewick, R., 781, [Standish, T.A., 801,
[Rohrich, J., 821, [Motzkin, D., 831, [Erkio, H., 841, [Wainwright, R.L., 851,
[Bing-Chao, H. et al., 861, [Wilf, H., 861, [Verkamo, A.I., 871, [Wegner, L.M.,
871, [Baase, S., 881, [Brassard, G. et al., 881, [Sedgewick, R., 881, [Manber, U.,
891, [Cormen, T.H. et al., 901.
4.1.4 Shellsort
Shellsort (or diminishing increment sort) sorts a file by repetitive application
of linear insertion sort (see Section 4.1.2). For these iterations the file is
seen as a collection of d files interlaced, that is, the first file is the one in
locations 1, d + 1, 2 d + 1, ..., the second in locations 2, d + 2, 2 d + 2, ..., and
so on. Linear insertion sort is applied to each of these files for several values
of d . For example d may take the values in the sequence { n / 3 , n/9, ..., l}.
It is crucial that the sequence of increment values ends with 1 (simple linear
insertion) to guarantee that the file is sorted.
Different sequences of increments give different performances for the algo-
rithm.
Let Cn be the number of comparisons and In the number of interchanges
used by Shellsort to sort n numbers.
For d = { h , k, 1)
{int d, i, j ;
ArrayEntry tempr;
for (d=up- lo+l; d>l;) {
if (d<5) d = 1;
else d = (5*d-1)/11;
/*** Do linear insertion sort in steps size d ***/
for (i=up-d; i>=lo; i--) {
tempr = r[z];
for (j=i+d; j<=up sC& (tempr.k>rb].k);j+=d)
rli-4 = +I;
rb-4 = tempr;
}
1
SORTING ALGORITHhfS 163
I I .- , - = 3dxl+ 1
dhsi 1 a = 0.45454
n E[CnJ E[In] E[Cn] E[Inl
5 7.71667 4.0 8.86667 3.6
10 25.5133 14.1333 25.5133 14.1333
50 287.489 f0.006 164.495f0.007 292.768 f 0.006 151.492f0.006
100 731.950f0 .O 17 432.625f0.018 738.589h0.013 365.939f0.013
500 5862.64 f 0.24 3609.33f0.25 5674.38f0.11 2832.92f0.12
1000 13916.92f0.88 8897.19f0.88 13231.61f0.30 6556.54f0.3 1
5000 101080f16 68159f16 89350.7f3.4 46014.1f 3 . 4
10000 235619f56 164720f56 194063.8f6.7 97404.5f6.7
50000 1671130f 1163 1238247f1163 1203224f58 619996f58
100000 3892524f4336 2966745f4336 2579761f113 1313319f113
References:
[Shell, D.L., 591, [Boothroyd, J . , 631, [Espelid, T.O., 731, [Ihuth, D.E., 731,
[Ghoshdastidar, D. et al., 751, [Erkio, H., 801, [Yao, A.C-C., 801, [Incerpi, J . e t
al., 851, [Sedgewick, R., 861, [Incerpi, J. et al., 871, [Baase, S., 881, [Sedgewick,
R., 881, [Weiss, M.A. et al., 881, [Seliner, E.S., 891, [Weiss, M.A. e2 al., 901.
164 HANDBOOK OF ALGORJTHAIS AND DATA STRUCTURES
4.1.5 Heapsort
Heapsort (or Treesort 111) is a sorting algorithm that sorts by building a
priority queue and then repeatedly extracting the maximum of the queue
until it is empty. The priority queue used is a heap (see Section 5.1.3) that
shares the space in the array to be sorted. The heap is constructed using all
the elements in the array and is located in the lower part of the array. The
sorted array is constructed from top to bottom using the locations vacated by
the heap as it shrinks. Consequently we organize the priority queue to extract
the maximum element .
Cn 5 2n[log2 nJ+ 3n
In 5 n[logz n] + 2.5n
The complexity results for the heap-creation phase can be found in Section I
5.1.3.
Heapsor t
var i : integer;
tempr : ArrayEntry;
begin
(*** construct h e a p ***I
for i := ( u p div 2) downto 2 do siftup(r,z,up);
{*** repeatedly extract maximum ***}
for i := up downto 2 do begin
siftup(r,l,i);
tempr := dl];
.[I] := r[z];
r[z]:= tempr
end
end;
The above algorithm uses the function siftup (defined in Section 5.1.3).
A call to siftup(r,i, n) constructs a subheap in the array r at location i not
beyond location n assuming that there are subheaps rooted at 2i and 2i 1. +
Although the above procedure accepts the parameter lo for conformity with
other sorting routines, Heapsort assumes that lo = 1.
Heapsort is not a stable sorting algorithm since equal keys may be trans-
posed.
Heapsort is guaranteed to execute in O(nlog n) time even in the worst case.
Heapsort does not benefit from a sorted array, nor is its efficiency significantly
SORTING ALGORITHMS 165
References:
[Floyd, R.W., 641, [Williams, J.W.J., 641, [Knuth, D.E., 731, [Aho, A.V. et al.,
741, [Horowitz, E. et al., 761, [Reingold, E M . e2 al., 771, [Doberkat, E.E., 801,
[Standish, T.A., 801, [Dijkstra, E.W. et al., 821, [Dijkstra, E.W., 821, [Hertel,
S., 831, [Doberkat, E.E., 841, [Carlsson, S., 871, [Baase, S., 881, [Sedgewick, R.,
881, [Manber, U., 891, [Cormen, T.H. et al., 901, [Xunuang, G. e t al., 901.
166 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
{int i;
i = (key- M i n K e y ) * (up- lo+l .O) / (MaxIiey- MinKey) + lo;
return(i>up ? up : 2x10 ? lo : 2);
1;
Note that if the above multiplication is done with integers, this operation
is likely to cause overflow.
The array iwk is an auxiliary array with the same dimensions as the array
to be sorted and is used to store the indices to the working array.
The array i w k does not need to be as big as the array to be sorted. If
we make it smaller, the total number of comparisons during the final linear
insertion phase will increase. In particular, if iwk has m entries and m 5 n
then
SORTING ALGORITHMS 167
E[Cn]= 2 n - m - 1 + n(n4m- 1 )
Interpolation sort
{Arraylndices iwk;
ArrayToSort out;
ArrayEntry tempr;
int i, j ;
References:
[Isaac, E.J. et al., 561, [Flores, I., 601, [Kronmal, R.A. et al., 651, [Tarter, M.E.
et al., 661, [Gamzon, E. et al., 691, [Jones, B., 701, [Ducoin, F., 791, [Ehrlich,
G., 811, [Gonnet, G.H., 841, [Lang, S.D., 901.
4a3+ 6 a 2 + 6)
-12(1-
a(a4
4 3 + O(7n-l)
Let Wn be the number of keys in the overflow section beyond the location
m in the table. We have
The expected value of the total number of table probes to sort n elements
using linear probing sort is minimized when n / m = 2 - & = 0.5857.... At
this point the expected number of probes is
cn + rn+ wn = ( 2 + JZ)n+0(1)
Below we describe the linear probing sort using the interpolation func-
tion p h i ( k e y , lo, u p ) . This sorting function depends on two additional global
parameters: rn, which is the size of the interpolation area, and U p p B o u n d r ,
which is the upper bound of the input array (UppBoundr 2 m+w). Selecting
m R ,/n x U p p B o u n d r minimizes the probability of failure due to exceeding
the overflow area.
var i, j : integer;
rl : ArruyToSort;
begin
rl := r;
for j:=lo to UppBoundr do +].k := NoKey;
for j:=lo to up do begin
i := phi(rlbj.k,lo,rn);
while (21.k <> NoKey do begin
if r1b'J.k< (4.k then begin
rIb-11 := dil;
(z] := rlij];
T I L ] := rIb-11
end;
2 := i+l;
if i > UppBoundr then Error
end;
r[z]:= rib]
end;
a .-
.- 10-1;
for j:=lo to UppBoundr do
if 7.c31.k <> NoKey then begin
2 := i+l;
170 HANDBOOK OF ALGORJTHMS AND DATA STRUCTURES
:= $1
rlzl
end;
for j:=i+l to UppBovndr do rfj1.k := N o K e y ;
end;
With a good interpolation formula, this algorithm can rank among the
most efficient interpolation sort (see Section 4.1.6) algorithms.
The application of this algorithm to external storage appears to be promis-
ing; its performance, however, cannot be improved by using larger buckets.
Letting E, be the number of external accesses required to sort n records, we
have
Table 4.4: Exact and simulation results for linear probing sort.
I I m. = 100 I m = 5000 I
1
CY E[Cn] E[Wn] E[InI E[Cn] E[Wn] EEInl
50% 72.908 .23173 13.785f0.003 3747.65 .24960 765.29f0.18
80% 200.696 1.27870
90% 310,184 2.47237
95% 399.882 3.62330
99% 499.135 5.10998
100% 528.706 5.60498
References:
[Melville, R. et a/., 801, [Gonnet, G.H. e t al., 811, [Gonnet, G.H. e i al., 841,
[Poblete, P.V., 871.
4.1.8 Summary
Table 4.5 shows an example of real relative total times for sorting an array
with 49998 random elements.
There are algorithms specially adapted to partially sorted inputs. That
is, they run faster if the input is in order or almost in order. Several measures
of presortedness have been defined, as well as optimal algorithms for each
measure.
SORTING I\LC:ORITHMS 171
A lgoriihm C Pasc:d
Bubble sort
=1254
Shaker sort 2370
Linear insertion sort 544 54 1
Linear insertion sort with sentinel 450 366
Binary insertion sort 443
Quicksort 1.o 1.o
Quicksort with bounded stack usage 1.o
Shellsort 1.9 2 .o
Shellsort for fixed increments 1.9
Heapsort 2.4 2.4
Interpolation sort 2.5 2.1
Interpolation sort (in-place, positive n rnb 2.6
I Linear probing sort 1.4 1.2
References:
[Warren, H.S., 731, [Meijer, H. et al., SO], [Gonzalez, T.F. et al., 821, [Mannila,
H., 841, [Skiena, S.S., 881, [Estivill-Castro, V. et al., 891, [Levcopoulos, C. et
al., 891, [Levcopoulos, C. et a / . , 901.
General references:
[Friend, E.H., 561, [Flores, I., 611, [Boothroyd, J., 631, [Hibbard, T.N., 631,
[Flores, I., 691, [Martin, W.A., 711, [Nozaki, A., 731, [Icnuth, D.E., 741, [Lorin,
H., 751, [Pohl, I., 751, [Preparata, F.P., 751, [Fredman, M.L., 761, [Wirth,
N., 761, [Trabb Pardo, L., 771, [Horvath, E.C., 781, [Borodin, A. et al., 791,
[Kronsjo, L., 791, [Manacher, G.K., 791, [Mehlhorn, K . , 791, [Cook, C.R. et al.,
801, [Erkio, H., 811, [Borodin, A. et a / . , 821, [Aho, A.V. et al., 831, [Reingold,
E.M. et al., 831, [Mehlhorn, I<., 841, [Bui, T.D. et al., 851, [Merritt, S.M., 851,
[Wirth, N . , 861, [Beck, I. e t al., 881, [Richards, D. e l al., 881, [Richards, D.,
881, [Huang, B. et al., 891, [Munro, J.I. et al., 891, [Douglas, C.C. et al., 901,
[Fredman, M.L. et al., 901, [Munro, J.I. et al., 901.
type
list = t rec;
rec = record
k : typekey;
next : list
end;
Reordering of arrays
i : = 1;
while root <> 0 do begin
tempr := 7-frootI;
r[root] := 421;
r(z) := tempr;
r(z).next := root;
root := tempr.next;
.-
.- i+l;
while (root<i) and (root>O) do root := r[root].next;
end;
end;
General references:
[Friend, E.H., 561, [Flores, I., 691, [Tarjan, R.E., 721, [Harper, L.H. ed al., 751,
[Munro, J.I. et al., 761, [Wirth, N., 761, [Gotlieb, C.C. et al., 781, [Sedgewick,
R., 781, [Tanner, RM., 781, [Borodin, A. et al., 791, [Nozaki, A., 791, [Bentley,
J.L. et al., 801, [Chin, F.Y. et al., 801, [Colin, A.J.T. et al,, 801, [Power, L.R.,
801, [Borodin, A. et al., 821, [Aho, A.V. et al., 831, [Goodman, J.E. et al., 831,
[Reingold, E.M. et al., 831, [Mehlhorn, K., 841, [Wirth, N . , 861.
SORTING ALGORITHMS 173
i=l
E [ C p ]= ( k - Q ) 2 k + 2 -
4.24
3
+-8 . 47- k
(log, n - a)n + 2 + O(n-') 5 E[G]5 (log, n - P>n+ 2 + ~ ( n - l )
where Q = 1.26449 ... = 2 - xila -. and ,8 = 1.24075... .
Merge sort
begin
if r = nil then sort := nil
else if n > l then
sort := merge(sort(r, n div 2),
sort(r, (n+l) div 2))
else begin
temp := r;
r := rr.next;
tempf.next := nil;
sort := temp
end
end;
~~~~~~~ ~
If the merging routine is stable, that is, in the output of merge(a, b ) equal
keys are not transposed and those from the list a precede those from the list
b , merge sort will be a stable sorting algorithm and equal keys will not be
transposed.
Merge sort uses extra storage: the pointers that are associated with the
list.
Merge sort can take advantage of partially ordered lists (Natural merge)
as described in Appendix IV. For this variation, the algorithm will do a single
pass on totally ordered (or reversely ordered) files and will have a smooth
transition between O ( n )and O(n log n ) complexity for partially ordered files.
Merge sort is guaranteed to execute in O(n log n ) even in the worst case.
In view of the above, merge sort is one of the best alternatives to sorting
lists.
Table 4.6 illustrates some exact counts of the number of comparisons for
merge sort. The average values are computed for random permutations of the
input file.
References:
[Jones, B., 701, [Bron, C., 721, [Knuth, D.E., 731, [Aho, A.V. ei al., 741, [Dewar,
R.B.K., 741, [Horowitz, E. e t al., 761, [Peltola, E. et al., 781, [Todd, S., 781,
[Erkio, II., 801, [Baase, S., 881, [Brassard, G. et al., 881, [Manber, U., 891.
The execution pattern (sizes of subfiles, and so on) of this algorithm is the
same as for Quicksort for arrays. Let I n be the number of times the inner
loop is executed to sort a file with n elements. The inner loop involves one
or two comparisons and a fixed number of pointer manipulations. Let Cn be
the number of comparisons and k = [log2n J , then
n(n - 1)
(n + 1)k - 2"' +2 5 In 5 2
begin
if r = nil then begin Lust := nil; sort := rend
else begin
lowf := nil; midf := nil; highf := nil;
{*** First key becomes splitter ***}
tailins( r, midf, midl);
r := rT.next;
while r<>nil do begin
if rT .k< midfl .k t lien t ailins( r,lo wf,lowl)
else if r 1 . k m i d f l . k then tailins(r,midf,midl)
else tailins( r,highf,high/);
r := rT.next
end;
{*** Assemble resulting list ***}
if lowf<> nil then begin
1owlT.next := nil;
sort := sort( lowf);
Lastf.next := midf
end
else sort := midJ
if highf<> nil then highll.next := nil;
176 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
References:
[Motzkin, D., 811, [Wegner, L.M., 821.
This measure counts the number of times the innermost loop is executed. In
satisfies the recurrence equation:
Bucket sort
i
int i;
typekey diu, max:b[MJ,minb[MJ;
list head[MJ,t; .
i = ( s ->E-man) / diu;
if (i<O) i = 0; else if (i>=M) i = M-I;
t = s;
s = s ->next;
t ->nezt = head[z];
if (head[z]==NULL) minb[z] = maxb[z] = t ->k;
head[z] = t;
if ( t ->k > mazb[z]) mazb[z] = t ->k;
if ( t - > k < mZnb[z]) minb[zl = t ->k;
1
/* sort recursively */
t = &aux;
for (i=O; i<M; a++) if (head[z]!=NULL){
t ->next = sort(head[z],minb[2],masb[z]);
t = Lust;
1
return ( aux. n ext) ;
1
The above algorithm computes the maximum and minimum key for each
bucket. This is necessary and convenient as it allows correct sorting of files
containing repeated keys and reduces the execution time. Bucket sort requires
two additional parameters, the maximum and minimum key. Since these are
recomputed for each pass, any estimates are acceptable; in the worst case, it
will force bucket sort into one additional pass.
The above function sets the global variable Lust to point to the last record
of a sorted list. This allows easy concatenation of the resulting lists.
Bucket sort can be combined with other sorting techniques. If the number
of buckets is significant compared to the number of records, most of the sorting
work is done during the first pass. Consequently we can use a simpler (but
quicker for small files) algorithm to sort the buckets.
Although the worst case for bucket sort is O ( n 2 ) ,this can only happen
for particular sets of keys and only if the spread in values is n!. This is
very unlikely. If we can perform arithmetic operations on keys, bucket sort is
probably the most efficient alternative to sorting lists.
References:
[Isaac, E.J. et al., 561, [Flores, I., 601, [Tarter, M.E. et al., 661, [Knuth, D.E.,
731, [Cooper, D. et al., 801, [Devroye, L. et al., 811, [Akl, S.G. et al., 821,
[Kirkpatrick, D.G. et al., 841, [Suraweera, F. et a!., 881, [Manber, U., 891,
[Cormen, T.H. et a / . , 901.
SORTING ALGORITHMS 179
begin
for i:=D downto 1 do begin
180 HANDBOOK OF ALGOHTHMS AND DATA STRUCTURES 1
I
The above sorting algorithm uses the function charac(i, key) which re-
turns the ith digit from the key key. The top-down radix sorting function is
described in Appendix IV.
If D log m is larger than log n then bottom-up radix sort is not very effi-
cient. On the other hand, if D log m < log n (some keys must be duplicated),
radix sort is an excellent choice.
References:
[Hibbard, T.N., 631, [MacLaren, M.D., 661, [Knuth, D.E., 731, [Aho, A.V. e t
al., 741, [Reingold, E.M. et al., 771, [McCulloch, C.M., 821, [van der Nat, M.,
831, [Devroye, L., 841, [Baase, S., 881, [Sedgewick, R., 881, [Manber, U., 891,
[Cormen, T.H. et al., 901.
Most of the sorting algorithms described so far are basic in the sense that
their building blocks are more primitive operations rather than other sorting
algorithms. In this section we describe algorithms which combine two or more
sorting algorithms. The basic sortings usually have different properties and
advantages and are combined in a way to exploit their most advantageous
proper ties.
SORTING ALGORITHMS 181
This is a general technique which has been described for Quicksort (see Sec-
tion 4.1.3) in particular. Many recursive sorting algorithms have good general
performance, except that they may do an inordinate amount of work for a file
with very few elements (such as Quicksort or bucket sort for two elements).
On the other hand, being efficient for the tail of the recursion is very
important for the total complexity of the algorithm.
The general scheme for hybrid recursive sorts is then
Hybrid termination
function sort(Eeys);
begin
if size(keys) > M then
< ...main sorting algorithm ... >
else simplersort( keys);
end;
The sirnpZersort() part may be just an analysis of one, two, and three elements
by brute force or another sorting algorithm which does well for small files. In
the latter case, linear insertion sort (see Section 4.1.2) is a favourite candidate.
cn = O(n log n )
If the median is too costly to compute we could split the file into two equal-
size parts and apply bucket sort twice. We then sort the buckets recursively
and finally merge the two halves. This has the same effect as computing the
median for the worst case, but it is much inore efficient.
182 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
References:
[Dobosiewicz, W., 781, [Peltola, E. et al., 781, [Dobosiewicz, W., 791, [Huits,
M. et al., 791, [Jackowski, B.L. et al., 791, [Meijer, H. et al., 801, [van der Nat,
M., 801, [Akl, S.G. et al., 821, [Allison, D.C.S. et al., 821, [Noga, M.T. e t al.,
851, [Tamminen, M., 851, [Handley, C., 861.
4.2.6 Treesort
A Treesort sorting algorithm sorts by constructing a lexicographical search
tree with all the keys. Traversing the tree in an infix order, all the nodes
can be output in the desired order. Treesort algorithms are a composition of
search tree insertion with infix tree traversal.
The number of comparisons required to sort n records is related to the
specific type of search tree. Let Cn be the average number of comparisons in
a successful search, then
Almost any of the tree structures described in Section 3.4 can be used for
this purpose. The following algorithm is based on binary trees.
Binary treesort
tree := nil;
for i:=l to n do insert(tree, < i t h - k e y > ) ;
u u t p u t-infid t re e) ;
SORTING ALGORTTIIMS 183
These algorithms require two pointers per record and consequently are sig-
nificantly more expensive than other methods in terms of additional storage.
There is one circumstance when this structure is desirable, and that is when
the set of records may grow or shrink, and we want to be able to maintain it
in order at low cost.
To guarantee an O ( n log n ) performance it is best to select some form of
balanced tree (such as AVL, weight-balanced or B-trees).
References:
[Frazer, W.D. et al., 701, [Woodall, A.D., 711, [Aho, A.V. et al., 741, [Szwarc-
fiter, J.L. e t al., 781.
4.3 Merging
A special case of sorting is to build a single sorted file from several sorted
files. This process is called merging of files and it is treated separately, as it
normally requires simpler algorithms.
Merging a small number of files together is easily achieved by repeated
use of a function which merges t w o files at a time. In most cases, an optimal
strategy is to merge the two smallest files repeatedly until there is only one
file left. For this reason, the merging of two ordered files is the main function
which we will analyze in this section. Algorithms for merging large numbers
of files are studied in conjunction with external sorting. In particular, the
second phases of the merge sort algorithms are good merging strategies for
many files.
A stable merging algorithm is one which preserves the relative orderings
of equal elements from each of the sequences. The concept of stability can
be extended to enforce that equal elements between sequences will maintain
184 H A N D B O O K OF A L G O R I T H M S A N D D A T A S T R U C T U R E S
General references:
[Floyd, R.W. et al., 731, [Schlumberger, M. et al., 731, [Hyafil, L. et al., 741,
[Harper, L.H. et al., 751, [Xao, A.C-C. et al., 761, [Fabri, J., 771, [Reingold,
E.M. et al., 771, [Sedgewick, R., 781, [Tanner, R.M., 781, [Brown, M.R. et al.,
791, [van der Nat, M., 791, [Mehlhorn, K., 841, [Munro, J.I. et al., 871, [Salowe,
J.S. et al., 871, [Huang, B. et al., 881, [Sedgewick, R., 881, [Huang, B. et al.,
891.
List merging
The above function uses the procedure tailins which inserts a node into a
list defined by its first and last pointers. Such a procedure is useful in general
for working with lists and is described in Section 4.2.
The above algorithm is stable but not fully stable.
References:
[Knuth, D.E.,731, [Horowitz, E. et al., 761, [Huang, B. et al., 881, [Huang, B.
et al., 891.
Merging of arrays
(see Section 4.1.4) will do less work for the merging of two sequences than for
sorting a random array, and is thus recommended.
7 Comparisons S t able
No
Ref e r e n ce
[Kronrod, 691
Yes [Horvarth, 741
Yes [Trabb Pardo, 771
Yes [Wong, 811
Yes [Dudzinski & Dydek, 811
No [Huang & Langston, 881
Yes [Huang & Langston, 891
Table 4.7 lists the properties and references for some in-place merging
algorithms, where nu and 126 denote the sizes of the two arrays to be merged,
+
nu na = n, and without loss of generality we assume nu 2 126.
References:
[Kronrod, M.A., 691, [Iinuth, D.E., 731, [Horvath, E.C., 741, [Trabb Pardo, L.,
771, [Horvath, E.C., 781, [Murphy, P.E. et al., 791, [Dudzinski, K. et al., 811,
[Wong, J.K., 811, [Alagar, V.S. et al., 831, [Mannila, H. et al., 841, [Carlsson,
S., 861, [Thanh, M. et al., 861, [Dvorak, S. et al., 871, [Salowe, J.S. et al., 871,
[Dvorak, S. et al., 881, [Dvorak, S. et al., 881, [Huang, B. et al., 881, [Huang,
B. et al., 891, [Sprugnoli, R., 891.
The Hwang and Lin merging algorithm, sometimes called binary merg-
ing, merges two files with an almost optimal number of comparisons. This
algorithm is optimal for merging a single element into a sequence, two equal
sequences and other cases. Compared to the standard algorithm, it reduces
the number of comparisons significantly for files with very different sizes, how-
ever the number of movements will not be reduced, and hence this algorithm
is mostly of theoretical interest.
The basic idea of binary merging is to compare the first element of the
shorter file with the 1st or 2nd or 4th or 8th ... element of the longer file
depending on the ratio of the file sizes. If no 2 nb then we compare the first
element of file b with the 2t element of a , where t = [log, nu/nbJ. If the
key from file b comes first, then a binary search between 2t - 1 elements is
required; otherwise 2t elements of file a are moved ahead. The procedure is
repeated until one of the files is exhausted.
In its worst case, Hwang and Lins algorithm requires
References:
[Hwang, F.K. et al., 711, [Hwang, F.K. et al., 721, [Knuth, D.E., 731, [Christen,
C., 781, [Manacher, G.K., 791, [Hwang, F.K., 801, [Stockmeyer, P.K. ei al., 801,
[Schulte Monting, J., 811, [Thanh, M. et a!., 821, [Manacher, G.K. et a!., 891.
( 2 ) the intermediate files may not support direct (or random) access of
elements, and even if they do support direct accesses, sequential accesses
are more efficient.
188 HANDBOOK OF ALGORITHRfS AND DATA STRUCTURES
Our main measure of complexity is the number of times that the file has
been copied, or read and written. A complete copy of the file is called a pass.
The algorithms we will describe use the following interface with the file
system:
In all cases the argument i refers to a unit number, an integer in the range
l . . . m a x f i Z e s . The function E o f ( i ) returns the value true when the last
ReadFiZe issued failed. The functions OpenWrite and OpenRead set the
corresponding indicator to the letters o (output unit) and i (input unit)
respectively in the global array FilStat. The direct access operations use
an integer to select the record to be read/written. These operations use the
input file only. Without loss of generality we will assume that the input file
is in unit 1 , which can be used later for the sorting process. Furthermore, the
output file will be placed in any file whose index is returned by the sorting
procedure. In the worst case, if this is not desired and cannot be predicted, a
single copy is sufficient.
The external merge sorting algorithms are the most common algo-
rithms and use t w o phases: distribution phase and merging phase. During
the distribution phase or dispersion phase the input file is read and sorted
into sequences, each sequence as long as possible. These sequences, sometimes
called strings or runs, are distributed among the output files. The merging
phase merges the ordered sequences together until the entire file is a single
sequence; at this point the sorting is completed.
The options available for creating the initial sequences (runs), for dis-
tributing them and organizing the merging phase (which files to merge with
which, and so on) give rise to many variations of external merge sorting.
The distribution phases objective is to create as few sequences as possible,
and at the same time distribute these sequences in a convenient way to start
the merging phase. There are three main methods for constructing the ordered
sequences: replacement selection, natural selection and alternating
selection.
SORTING ALGORITHMS 189
General references:
[Friend, E.H., 561, [Gotlieb, C.C., 631, [Flores, I., 691, [Martin, W.A., 711,
[Frazer, W.D. et al., 721, [Barnett, J.K.R., 731, [Schlumberger, M. et al., 731,
[Hyafil, L. et al., 741, [Lorin, H., 751, [Kronsjo, L., 791, [Munro, J.I. et al.,
801, [McCulloch, C.M., 821, [Tan, K.C. et al., 821, [Reingold, E.M. et al., 831,
[Mehlhorn, K., 841, [Six, H. et al., 841, [Aggarwal, A. et al., 881, [Baase, S.,
881, [Sedgewick, R., 881, [Salzberg, B., 891.
The simplest way to manage the buffers is to keep a priority queue with
the elements larger than the last output key, and a pool with the others. The
following code describes the function distribute which uses a heap as a priority
queue.
dist n'bzlt e( )
{int i, hbot, s;
typekey lastout;
while (i>=O) {
for (hboi=O; hboi< 2;) inseri(++hboi, B u n ;
/*** Siart a new sequence ***/
s = nezifile();
while (hboi >= 0) {
lastout = Bufl0J.L;
WriieFile(s, BuflO]);
BuflO] = Buflhboi];
siftup( Bug, 0 , hbot-1);
if ( ! E o A l ) ) Buflhboi] = ReadFile(1);
if (EoJT1)) Buflhbot--] = Bufli--3;
else if (Buflhboi1.k < Zasioui) hbot--;
else insert(hbot, B u n ;
1
1
1;
The function neztfile returns the file number on which the next sequence
or run should be placed. The functions insert and s i f t u p are described in
the priority queue Section 5.1.3.
Reservoir Average
size run length
MI2 2.15553. ..M
M 2.71828...M
3MI2 3.16268...M
2M 3.53487 ...M
5M/2 3.86367 ...M
3M 4.16220 ...M
where L ( r ) is the average run length with reservoir size r. Table 4.9 shows
some values for the optimal reservoir size. The above function is very (flat
around its minimum, so large variations in the reservoir size do not depart
significantly from the optimum.
where direction is a global variable which contains the letter a or the letter
d. The priority queue functions should also use this global indicator.
The alternation between ascending and descending sequences should be
commanded by the function next f ile. As a general rule, longer sequences are
obtained when the direction is not changed, so the function next f ile should
be designed to minimize the changes in direction. If the direction is changed
for every run, the average length of run is
192 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
3M
E[%] = -2 + o(1)
merge( out)
int out;
{
int i, ism4
typekey lastout;
extern struct rec LastRec[ 3;
extern char FilStatr 1;
lastout = Afinimumhey;
LastRec[O].k = MazimumKey;
while (TRUE){
isml = 0 ;
for (i=I; i<=mazfiles; i++)
if (FilStat[zJ==i && !Eof(i) &&
LastRec[z].k>= lastout &&
LastRec[z].k< LastRec[isml].k)
zsml = a;
if (isml==O) {
for ( k l ; i<=maxftles; i++)
if (FilStat[t~==i && !EofTi))return(0);
return( done );
1
Write File( out, Last Re c[ism4) ;
lastout = LastRec[isml].k;
LastRec[isml]=: ReadFile( isml);
1.
1
Merge uses the global record array LastRec. This array contains the last
SORTING ALGORITHMS 193
record read from every input file. When all the input files are exhausted
simultaneously, this function returns the word done.
References:
[Goetz, M.A., 631, [Knuth, D.E., 631, [Dinsmore, R.J.,651, [Gassner, B.J.,
671, [Frazer, W.D. et al., 721, [McKellar, A.C. et al., 721, [Knuth, D.E., 731,
[Espelid, T.O., 761, [Ting, T.C. et al., 771, [Dobosiewicz, W., 851.
P? = r2 1ogLT/2J[T/21nl
int i, runs;
extern int maxfiles, unit;
extern char FilStat[ 1;
extern struct rec LastRec[ 3;
n extfile()
{extern iat rnuxfiles, unii;
extern char FilStui[ 1;
do unit = unii?%muxfiles+ 1;
while (FilStui[unii]!= ' 0 ');
return( unii);
I;
For simplicity, the current output unit number is kept in the global variable
unit.
For some particular values of n and T , the balanced merge may not be
optimal, for example P i = 5, but an unbalanced merge can do it in four
passes. Also it is easy to see that PT = 2 for n 5 T - 1. The difference
between the optimal and normal balanced merge is not significant.
Table 4.10 shows the maximum number of runs that can be sorted in a
given number of passes for the optimal arrangement of balanced merge sort.
Number of pusses
Files 3 4 5 6 7
3 2 4 4 8 8
4 4 9 16 32 64
5 6 18 36 108 216
6 9 32 81 256 729
7 12 50 144 576 1728
8 16 75 256 1125 4096
10 25 147 625 3456 15625
References:
[Knuth, D.E., 731, [Horowitz, E. et ul., 761.
SORTING ALGORITHMS 105
where ak = ( 44 Tk -+2l ) ~
T-1
4 s
St % x cos
2T - 1 'Os 4T - 2 4T-2 4T-2
1 - 2T-1 s
+ o(77-3)
2 sin s/(4T - 2) s + 24(2T - 1)
Let t k be the total number of runs sorted by a T-file cascade merge sort in k
merging passes or the size of the kth perfect distribution. Then
Table 4.11 shows the maximum number of runs sorted by cascade merge sort
for various values of T and k.
196 HANDBOOK OF ALGORITHMS A N D DATA STRUCTURES
I
Files 3 4 5 6
3 7 13 23
6 14 32 97
I
10 30 85 257
15 55 190 677
21 91 371 1547
28 140 658 3164 15150
10 45 285 1695 10137 62349
References:
[Knuth, D.E., 731, [Kritzinger, P.S. et al., 741.
2
aT=2-
2T-T+1
O(T28-T) +
Let t k be the total number of runs sorted by a T-file polyphase merge in
k merging steps, or the size of the kth perfect distribution, then
t(%) =C t k Z k =
+
(zT - T Z T - 1 ) ~
k
(2%- 1 - % T ) ( Z - 1 )
The number of merging steps, M,, for a perfect distribution with n se-
quences is then
(1+2T:n2)
log, n + 1 - log2(T - 2 ) + O ( 2 T T + n-)
for some positive e.
Let r k be the total number of runs passed (read and written) in a k-step
merge with a perfect distribution. Then
(zT - T Z+ T - l ) ~
T(%) = rk%k =
k
(2%- 1 - Z T ) 2
( a-~2 ) T ( T 2 - (aT - l ) ( a T ) k
((T - 2)k +
rk M
2 --2 T 2T + + + +
T ~ 2,T 2 T ) ( 2 - 2 T TCUT),
Let Pn be the total number of passes of the entire file required to sort n
initial runs in k merging steps. Then
sort()
{
int a, j , some;
extern int mazfiles, mazruns[ 1, actruns[ 1;
extern struct rec LastRec[ 3 ;
I
SORTING ALGORJTHMS 199
nextfile()
(extern int maxfiles, maxruns[ 1, actruns[ 3;
int i, j , inc;
actruns[O]++;
if (actruns[O]>maxruns[O]){
/*** Find next perfect distribution ***/
inc = mazruns[mazfiZes];
maxruns[O] +=
(maxfiles-2) * inc;
for (i=maxfiles; i>l; i--)
mazruns[z) = mazruns[i-1] inc; +
1
j = 2;
/*** select file furthest f r o m perfect ***/
for (2=3; i<=maxfiles; i++)
if (maxruns[z)-actruns[zl > m a x r u n s ~ ] - a c t r u n s ~ j] )= i;
++
act runsb] ;
return (1);
1;
Table 4.12 shows the maximum number of runs sorted by polyphase merge
sort for various numbers of files and passes.
- Nu zber o F passe
3
- 4 5 6 7
7 -
3 7 13 26 54
7 17 55 149 355
11 40 118 378 1233
15 57 209 737 2510
7 19 74 291 1066 4109
8 23 90 355 1400 5446
10 31 122 487 1942 7737
References:
[Gilstad, R.L., 601, [Gilstad, R.L., 631, [Malcolm, W.D., 631, [ M a k e r , H.H.,
631, [McAllester, R.L., 641, [Shell, D.L., 711, [Knuth, D.E., 731, [MacCallum,
I.R., 731, [Kritzinger, P.S. et al., 741, [Horowitz, E. et al., 761, [Zave, D.A.,
200 HANDBOOK OF ALLGOIZITHMS AND DATA STRUCTURES
Oscillating sort
begin
if n=O then {*** Mark as dummy entry ***}
FilStat[unit]:= 8 - 8
else if n=l then
Re a d One Ru n( un it, direction)
Table 4.13 shows the maximum number of runs sorted by oscillating sort
or any of its modified versions, for various numbers of files and passes. Note
that since the input unit remains open during most of the sorting process, it
is not possible to sort with less than four units.
I I Number f passes
61 7
256 4096
625
10 64 32768 I 262144 I
References:
[Sobel, S., 621, [Goetz, M.A. et al., 631, [Knuth, D.E., 731, [Lowden, B.G.T.,
771.
External Quicksort
sort(a, b)
int a, b;
while ( b > a ) {
rupp = wupp = b;
rlow = wlow = a;
InBu$= 0 ;
MaxLower = MinimumKey;
M i n Upper = MaximumKey;
i = a-1;
j = b+l;
I***Partition the file ***I
while (rupp >= rlow) {
if (rlow-wlow < wupp-rupp)
LastRead = ReadDirect( rlow++);
else LastRead = ReadDirect( rupp--);
if (InBu$ < M) {
BuflInBuff++] = LastRead;
intsort( Buff, 0 , InBuff- 1 ) ;
1
else {
if (LastRead.k > B u f l M - 13. k) {
if (LastRead.k > MinUpper) j = wupp;
else Min Upper = LastRead. k;
WriteDirect(wupp--, LastRead);
1
else if (Las2Read.k < BuflO1.k) {
if (LastRead.k < MaxLower) i = wlow;
else MaxLower = LastRead.L;
Write Direct( wlow++, LastRead);
1
else if (udow-a < b-wupp) {
WriteDirect(wlowf+, BuflO]);
MaxLower = BuflO] .k.;
BuaO] = LastRead;
intsort( Buff, 0, M- 1);
1
SORTING ALGORITHMS 203
(1) the records kept in the buffer are maintained as close to the centre as
possible, that is, deletions are done on the left or on the right depending
on how many records were already passed to the left or right.
(2) the reading of records is also done as balanced as possible with respect
to the writing positions. This is done to improve the performance when
the file is not random, but slightly out of order.
(3) two key values are carried during the splitting phase: MazLower and
MinUpper. These are used to determine the largest interval which can
be guaranteed to be in order. By this mechanism it is possible to sort a
totally ordered or reversely ordered file in a single pass.
The function intsort is any internal sorting function. Its complexity is not
crucial as this function is called about M In n times per pass of size n. An
internal sorting function which does little work when the file is almost totally
sorted is preferred (for example, the linear insertion sort of Section 4.1.2).
Table 4.14 shows simulation results on external Quicksort. From these
results we find that the empirical formula
E[P,] = log,(n/M) - 0.924
gives an excellent approximation for files with 1000 or more elements.
For very large internal buffers, a double-ended priority queue should be
used, instead of the function intsort.
External Quicksort requires an external device which supports direct ac-
cess. This sorting procedure sorts records in-place, that is, no additional
204 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
n M=5 M = 10 M = 20
100 3.5272f0.0011 2.73194f0.00076 2.09869f0.00090
500 5.7057f0.0015 4.74526f0.00077 3.88463f0.00057
1000 6.6993d~0.0021 5.69297f0.00095 4.77862f0.00059
5000 9.0555f0 .0051 7.9773f0.00 16 6.99252f0.00063
10000 10.0792f0.007 1 8.9793fO.0026 7.979 13f0.00090
files are required. External Quicksort seems to be an ideal sorting routine for
direct access files.
This version of Quicksort will have an improved efficiency when sorting
partially ordered files.
References:
[Monard, M.C., 801, [Cunto, W. et al., to app.].
Selection Algorithms
205
206 HANDBOOK OF ALGOIZITElAfS A N D DATA STRUCTURES
For the C implementation, the procedures which use var parameters are
changed into functions which return the modified priority queue.
For some applications we may superimpose priority queue operations with
the ability to search for any particular element; search for the successor (or
predecessor) of a given element; delete an arbitrary element, and so on.
Searching structures which accept lexicographical ordering may be used
as priority queues. For example, a binary search tree may be used as a pri-
ority queue. To add an element we use the normal insertion algorithm; the
minimum is in the leftmost node of the tree; the maximum is in the rightmost
node.
In all cases C i will denote the number of comparisons required to insert
an element into a priority queue of size n , C," the number of comparisons to
extract the maximum element and reconstruct the priority queue, and C z the
number of comparisons needed to construct a priority queue from n elements.
I, =
n(n 5)+
6
where I , is the average number of records inspected for all sequences of n
operations which start and finish with an empty queue.
{struct rec r;
list p ;
r.next = pq;
p = &r;
while ( p ->next != N U L L && p ->next ->k > new ->k)
p = p ->next;
SELECTION ALGORITHMS 207
n e w ->next = p ->next;
p ->next = n e w ;
return(r.next);
list delete(pq)
list pq;
t y p e k e y inspect(pq)
list pq;
{if ( p q = = N U L L ) Error /* inspect an e m p t y P Q */;
else r e t u r n ( p q ->k);
1;
A sorted list used as a priority queue is inefficient for insertions, because
it requires O ( n )operations. However it may be a good choice when there are
(1) very few elements in the queue;
(2) special distributions which will produce insertions near the head of the
list;
(3) no insertions at all (all elements are available and sorted before any
extraction is done).
An unsorted l i s t , at the other extreme, provides very easy addition of
elements, but a costly extraction or deletion.
c," = n
c; = 0
208 HANDBOOK OF ALGORITIIAfS AND DATA STRUCTURES
list delet e( p q )
list pq;
{struct rec r;
last p , max;
if (pq==NULL) Error /*** Deleting from empty PQ ***I;
else {r.next = yq;
max = &r;
for (p=pq; p ->next != NULL; p=p ->next)
i f (max->next ->k < p ->next ->k) max = p ;
max ->next == max ->next ->next;
return(r.nett);
1
t ypeke y inspect( p q )
list pq;
{list p ;
t y p e k e y max;
if (pq==NULL) Error /*** Empty Queue ***I;
else { max = p q ->k;
for ( p = p q ->next; p!=NULL; p=p ->next)
if (niax < p ->k) max = p ->k;
return( ntar);
I
SELECTION ALGORJTll firs 200
(1) the elements are already placed in a list by some other criteria;
References:
[Nevalainen, 0. et a / . , '791.
5.1.2 P-trees
P-trees or priority trees are binary trees with a particular ordering con-
straint which makes them suitable for priority queue implementations. This
ordering can be best understood if we tilt the binary tree 45" clockwise and
let the left pointers become horizontal pointers and the right pointers become
vertical. For such a rotated tree the ordering is lexicographical.
We also impose the condition that the maximum and minimum elements
of the tree both be on the leftmost branch, and so on recursively. This implies
that any leftmost node does not have right descendants.
The top of the queue, the maximum in our examples, is kept at the leftmost
node of the tree. The minimum is kept at the root. This requires some
additional searching to retrieve the top of the queue. If we keep additional
pointers and introduce pointers to the parent node in each node, the deletion
and retrieval of the top element become direct operations. In any case, a
deletion does not require any comparisons, only pointer manipulations.
Let Ln be the length of the left path in a queue with n elements. For
each node inspected a key comparison is done. Then for a queue built from
n random keys:
n-1
P-tree insertion
{
tree p ;
if ( p q == NULL) return(new);
else if ( p q ->k >=: new ->k) {
/*** Insert above subtree ***/
new ->left = pq;
return( new);
1
else {
P = Pq;
while ( p ->/eft != NULL)
if ( p ->left ->k >= new ->k) {
/*** Insert in right subtree ***/
p ->right = insert(new, p ->right);
return(pq);
I
else p = p ->left;
/*** Insert at bottom left ***/
p ->left = new;
1;
return(pq);
1;
P-tree deletion of maximum
tree delete(pq)
tree pq;
{
if ( p q == NULL) Error /*** deletion on an e m p t y queue ***I;
else if ( p q ->left == NULL) return(NULL);
else if ( p q ->left -->left == NULL) {
p q ->left = p q ->right;
p q ->right = NULL;
Table 5.1 contains exact results (rounded to six digits). Simulation results
are in excellent agreement with the theoretical ones.
n E[C,C'I ELI
5 7.66667 3.56667
~ 10 27.1935 4.85794
50 347.372 7.99841
' 100 939.017 9.37476
500 8207.70 12.5856
1000 20001.3 13.9709
5000 147948.6 17.1890
10000 342569.2 18.5752
References:
[Jonassen, A.T. et al., 751, [Nevalainen, 0. et al., 781.
5.1.3 Heaps
A heap is a perfect binary tree represented implicitly in an array. This binary
tree has priority queue ordering: the key in the parent node is greater than or
equal to any descendant key. The tree is represented in an array without the
use of pointers. The root is located in position 1. The direct descendants of
the node located in position i are those located in 2i and 2i+ 1. The parent of
node i is located at Lila]. The tree is 'perfect' in the sense that a tree with n
212 HANDBOOK OF ALGORITIIMS AND DATA STRUCTURES
n-1
E[M,] = E [ C i ] - -
n
For an insertion into a random heap (all possible heaps being equally likely),
when n is in the range 2 l - 1 5 n < 2k - 1 we have:
begin
n := n + l ;
j := n;
flag := true;
while flag and (j>l)do begin
i := j div 2;
if r[zJ,k>= n e w . k then flag := false
else begin rb] := r[2J;j := i end
end;
rb] := n e w
end;
If all the elements are available at the same time, we can construct a heap
more efficiently using Floyds method. In this case
SELECTION ALGOIZIllll\lS 213
o 5 M: ,< n - v ( n )
+
E[M,C,-,] = (a1 a, - 2 ) 2 k - k:
3k+4
-- + O(l~4-~)
9 2k
E [ M z ] = 0.74403 ...n + O(Wog n )
where v ( n ) is the number of 1s in the binary representation of n and $ ( n ) is
the number of trailing Os of the binary representation of n .
var j : integer;
tempr : ArrayEntry;
begin
while 2*i<=n do begin
j := 2*i;
i f j < n then
if 4 J . k < + + l ] . k then j := j+l;
if r(z].k < rb1.k then begin
tempr := rbJ;
+J := r[zj;
r[2] := tempr;
end
else i := n+l
end
end;
for i := ( n div 2 ) downto 1 do siftup(r,i,n);
/
C i = [log, ([log, nJ + 1)J+ 1
C: = [log, nj + g ( n ) + O(1)
214 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
cf 5 2[log, ( n -- l ) ] - p(n - 1 )
where p(n) is 1 if n is a power of 2 , 0 otherwise.
2 L L 2 b g 2 nJ - p(n)
E[Cf] =
2((n + 1 ) k - [n/2] - 2 k )
n
where k = [log, n] + 1.
( 2 k - 3)2k + 2
E[C,E,J = -
2k - 1
The heap does not require any extra storage besides the elements them-
selves. These queues can be implemented just by using arrays and there are
no requirements for recursion.
The insertion and extraction operations are guaranteed to be O(1og n ) .
Whenever we can allocate vectors to store the records, the heap seems to
be an ideal priority queue.
Merging two disjoint heaps is an O ( n ) operation.
We can generalize the heap to any branch factor b other than two; in this
case the parent of node i is located at [(i - 2)/bJ +
1 and the descendants are
located at [ b ( i - 1)+21, ..., [bi+ 11. This provides a tradeoff between insertion
and extraction times: the larger b, the shorter the insertion time and longer
the extraction time.
Table 5.2 gives figures for the number comparisons, C z , required to build
a heap by repetitive insertions, the number of comparisons required to insert
the n + l t h element, CA and the number of comparisons required to extract
all the elements from a heap constructed in this manner, Cf.
References:
[Floyd, R.W., 641, [Williams, J.W.J., 641, [Knuth, D.E., 731, [Porter, T . e t al.,
751, [Gonnet, G.H., 761, [Kahaner, D.K., 801, [Doberkat, E.E., 811, [Doberkat,
E.E., 821, [Carlsson, S., 841, [Doberkat, E.E., 841, [Bollobas, B. e t al., 851,
[Sack, J.R. e t a!., $51, [Atkinson, M.D. e t a!., 861, [Fredman, M.L. e t a!.,
861, [Gajewska, H. e t al., 861, [Gonnet, G.H. e t al., 861, [Sleator, D.D. e t al.,
861, [Carlsson, S., 871, [Fredman, M.L. e t a!., 871, [Fredman, M.L. e t al., 871,
[Hasham, A. e t al., 871, [Stasko, J.T. e t a / . , 871, [Brassard, G. e t a!., 881,
[Draws, L. e t a!., 881, [Driscoll, J.R. e t al., 881, [Frieze, A.M., 881, [Sedgewick,
R., 881, [Carlsson, S. e t a!., 891, [Manber, U., 891, [McDiarmid, C.J.H. e t al.,
216 HANDBOOK OF ALGORJTHMS AND DATA STRUCTURES
891, [Strothotte, T. ei al., 891, [Weiss, M.A. et al., 891, [Cormen, T.H. et al,,
901, [Frederickson, G.N., 901, [Sack, J.R. et al., 901.
where s(N) = [fll. The top queue is a queue on the indices of the bottom
array. The index of every non-empty queue in the bottom is a key in the top
queue.
i n s e r t ( n e w : integer; var p q ) ;
case p q is nil:
p q := NewSingleNode( n e w ) ;
case p q is boolean array:
t u r n o n corresponding e n t r y ;
case p q is single element:
expand e n t r y to full node;
SELECTION ALGOMTHMS 217
extract(var p q ) : integer;
case p q is nil:
Error;
case q is boolean array:
p
Find last true entry;
if only one entry remains then transform to SingleEntry;
case p q is single element:
return element;
p q := nil;
case p q is full node:
return maximum;
i f bottom queue corresponding t o maximum is single element
then extract from top queue;
max := max o f bottom[max o f top];
else extract from bottom;
max := max o f bottom;
end;
The functions extract minimum, test membership, find successor and find
predecessor can also be implemented in the same time and space.
References:
[van Emde-Boas, P. et al., 771, [van Emde-Boas, P., 771.
218 HANDBOOK OF ALGORITHMS A N D DATA STRUCTURES
5.1.5 Pagodas
The pagoda is an implementation of a priority queue in a binary tree. The
binary tree is constrained to have priority queue ordering (parent larger than
descendants). The structure of the pointers in the pagoda is peculiar; we have
the following organization:
(1) the root pointers point to the leftmost and to the rightmost nodes;
( 2 ) the right link of a right descendant points to its parent and its left link
to its leftmost descendant;
( 3 ) the left link of a left descendant points to its parent and its right link
to its rightmost descendant.
L
E[CL] = 2 - --
n+l
E[C:] = 2n-2Hn
SELECTION ALGORITHMS 219
begin
if a=nil then m e r g e := b
else if b=nil then m e r g e := a
else begin
{*** f i n d b o t t o m of a's r i g h t m o s t edge ***}
bota := ar.right; a1.righ.t := nil;
{*** b o t t o m of b's l e f t m o s t edge ***}
botb := bT.left; br.left := nil;
r := nil;
{*** m e r g i n g loop ***}
while (bota<>nil) and ( b o t b o n i l ) do
if bota1.k: < botb1.k: then begin
t e m p := botaf.right;
if =nil then botaf.m'ght := bota
else begin
botar.right := r1,right;
rt.right := bota
end;
r := bota;
bota := t e m p
end
else begin
t e m p := botbf.left;
if =nil then botbT.left := botb
else begin
bot by. left := rt .left;
rT.left := botb
end ;
r := botb;
botb := t e m p
end;
{*** o n e edge is exhausted, finish m e r g e ***I
if botb=nil then begin
ar.right := rt.right;
rr.right := bota;
m e r g e := a
end
else begin
bl.left := rf.left;
220 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
rf.left := botb;
m e r g e := b
end
end
end;
~ ~~
Insertion in a pagoda
procedure d e l e t e ( v a r p q : tree);
var le, ri : tree;
begin
if pq=nil then Error {*** deletion on e m p t y q u e u e ***}
else begin
{*** f i n d left descendant of root ***)
if pqt.left = p q then le := nil
else begin
le := pqf.left;
while let.left <> p q do le := let.left;
lel.left := pqT.left
end;
{*** f i n d right descendant of root ***}
i f pqf.right = p q then ri := nil
else begin
ri := pq1.right;
while rif.right <> p q do ri := rif.right;
ril .right := pqf .right
end;
{ *** m e r g e d e s c e n d a n t s ***}
p q := nterge(le, ri)
end
end:
SELECTION ALGORTTHMS 221
References:
[Francon, J . et al., 781.
begin
if p q = nil then p q := n e w
else if pq1.k > new1.k then begin
insert( n e w , p q f .right);
J i X W P q)
end
else begin
new1.left := pq;
p q := n e w
end
end;
222 HANDBOOK OF ALGORTTHhfS AND DATA STRUCTURES
function m e r g e ( a , b : t r e e ) : tree;
begin
if a = nil then m e r g e := b
else if b = nil then m e r g e := a
else if af.k > bf.k then begin
af.m'ght := m e r g e ( a f . r i g h t , b ) ;
fizdist( a ) ;
m e r g e := a
end
else begin
bf.right := m e r g e ( a , b f . r i g h t ) ;
fizdist( b ) ;
m e r g e := b
end
end;
function d i s t a n c e ( p q : t r e e ) : integer;
begin
if pq=nil then distance := 0
else d i s t a n c e := p q f . d i s t
end;
The function fixdist recomputes the distance to the closest leaf by inspect-
ing at the right descendant, if any.
All operations on the leftist trees require O(1og n) time even in the worst
case.
Table 5.3 summarizes simulation results on leftist trees. Cz indicates
the number of comparisons required to build a leftist tree, dist indicates the
distance from the root to the closest leaf and Cf the number of comparisons
required to extract all the elements from the tree.
begin
if p q = nil then p q := n e w
else if p q f . k <= n e w f . k then begin
newf.left := pq;
p q := n e w
end
else if p q .left = nil then
pqT.left := n e w
else if p q .leftf.k <= new1.k then
i n s e r t ( n e w , pqf.left)
else insert( n e w , pqf .right)
end;
Table 5.4 summarizes the simulation results for binary priority queues. I n
indicates the number of iterations performed by the insertion procedure, C,C
the number of comparisons to construct the queue and C, the number of
comparisons to extract all the elements from the queue.
References :
[Knuth, D.E., 731, [Aho, A.V. et al., 741, [McCreight, E.M., 851, [Sleator, D.D.
et al., 851, [Atkinson, M.D. et al., 861.
(2) the root has b descendants; one Bo, one B1, ... , one BI,-1 tree.
BI, trees are the natural structure that arises from a tournament between
2k players.
Two BI, trees can be joined into a single B I , +tree
~ with one single com-
parison. Consequently a Bk tree can be constructed using 2'"- 1 comparisons.
This construction is optimal.
A binomial queue of size n is represented as a forest of B k trees where
there is at most one B k tree for each b . This corresponds to the binary
decomposition of n. For example, n = 13 = 11012 is represented by B3, B2, Bo
SELECTION ALGORITHMS 227
References:
[Brown, M.R., 771, [Brown, M.R., 781, [Vuillernin, J., 781, [Carlsson, S. e t al.,
881, [Cormen, T.H. e t al., 901.
5.1.8 Summary
Table 5.5 shows an example of real relative total times for constructing a
priority queue with 10007 elements by repetitive insertions and then extracting
all its elements.
General references:
[Johnson, D.B., 751, [Pohl, I., 751, [Brown, M.R. e t al., 791, [Flajolet, P. e t al.,
791, [Flajolet, P. e t al., 801, [Standish, T.A., 801, [Itai, A. e t al., 811, [Ajtai, M.
e t a/., 841, [Fischer, M.J. e2 al., 841, [Mehlhorn, K., 841, [Mairson, H.G., 851,
[Huang, S-H.S., 861, [Jones, D.W., 861, [Lentfert, P. e t al., 891, [Sundar, R.,
891.
228 HANDBOOK OF ALGORJTIJAIS AND DATA STRUCTURES
A lg orit h m C Pascal
Sorted lists 55.1 52.9
Unsorted lists 240.2 146.7
P-trees 3.4 3.4
Heaps 1 .o 1 .o
Pagodas 1.5 1.6
Leftist trees 4.3 4.2
Binary priority queues 2.1 2.3
B 3 . T as priority queues 1.7
General references:
[Hoare, C.A.R., 611, [Blum, N. e t a!., 731, [Knuth, D.E., 731, [Nozaki, A., 731,
[Pratt, V. e t al., 731, [Aho, A.V. e t ul., 741, [Noshita, Xi.,741, [Floyd, R.W. e t
al., 751, [Fussenegger, F. e t al., 781, [Hyafil, L., 761, [Schonhage, A. e t al., 761,
SELECTION ALGORITHMS 229
CkM,M
L o w e r bounds
k=l n-1
k=2 n - 2 + [log2 n]
for any j
k = 3, n = 2 j +1
k = 3 , 3 x 2j < n 5 4 x 2j
k = 3 , 2 x 2 j + 1 < n 5 3 x 2j
2k - 15 n < 3k
3k 5 n
2k = n
U p p e r bou n ds
k=l n-1
k=2 +
n - 2 [log, nl
k z l n -k + (k - 1)[log2 (n - k + 2)1
25(2r10g3kl + j) < n - k + 2 and
n - K + (k - l)[log,(n - k + 2)1-
n - k + 2 5 2'(2rlog3 kl + j + 1)
and o 1k/2] > j [log2 K 1
l(k - 1)/2J + j [log2 kl
2k=n+l +
3n O((n log
5k 5 n M n(1 + 21-f10g2(n/5k)1)
+ 5k[loga(n/5k)l
[Wirth, N., 761, [Yap, C.K., 761, [Reingold, E.M. et al., 771, [Johnson, D.B. et
al., 781, [Reiser, A., 781, [Eberlein, P.J., 791, [Fussenegger, F. et al., 791, [Galil,
Z. et al., 791, [Kronsjo, L., 791, [Allison, D.C.S. et al., 801, [Frederickson, G.N.
e t al., 801, [Munro, J.I. et al., 801, [Dobkin, D. et al., 811, [Kirkpatrick, D.G.,
811, [Motoki, T., 821, Tyao, A.C-C. e t al., 821, [Cunto, W., 831, [Postmus, J.T.
et al., 831, [Devroye, L., 841, [Rlehlhorn, IC, 841, [Ramanan, P.V. e t al., 841,
[Bent, S.W. e t a!., 851, [Wirth, N . , 861, [Baase, S., 881, [Brassard, G. et al.,
881, [Lai, T.W. e t a / . , 881, [Sedgewick, R., 881, [Cunto, W. et al., 891, [Manber,
TJ., 891, p a o , A.C-C., 891, [Cormen, T.H. et al., 901, [Frederickson, G.N., 901.
230 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
Selection by sorting
var i, j : integer;
tempr : ArrayEnt y;
begin
s := s+lo-1;
if (s<lo) or (s>up) then Error {*** selection out 01bounds ***}
else begin
while (up>=s) and (s>=lo) do begin
i := lo;
j := U P ;
tempr := r[s]; r[s]:= .[lo]; .[lo] := tempr;
(*** split file in two ***}
while i<j do begin
while rlj1.k > tempr.k do
j := j-1;
44 := 41;
while (i<j) and (r[2].k<=tempr.k)do
2 .-
1- i+l;
rIj] := r[z]
end;
r[z] := tempr;
{*** select subfile ***}
if s<i then up := 2-1
else lo := i+l
end;
select := r[s].k
end
end;
The above algorithm uses as a splitting element the one located at the se-
lected position. For a random file, any location would provide an equivalently
good splitter. However, if the procedure is applied more than once, any other
element (for example, the first) may produce an almost worst-case behaviour.
As selections are done, the array is sorted into order. It is expected that
later selections will cost less, although these will always use O ( n )comparisons.
Strategies which select, in place, a smaller sample to improve the splittings,
cause an almost worst-case situation and should be avoided. Sampling, if done,
should not alter the order of elements in the array.
Any of the distributive methods of sorting, for example, such as bucket
sort (see Section 4.2.3) or top-down radix sort (see Section 4.2.4), can be
modified to do selection. In all cases the strategy is the same: the sorting
algorithms split the file into several subfiles and are applied recursively on to
each subfile (divide and conquer). For selection, we do the same first step,
but then we select only the subfile that will contain the desired element (by
counting the sizes of the subfiles) and apply recursion only on one subfile (tail
recursion).
6 Arithmetic Algorithms
235
236 HANDBOOK OF ALGORJTIIMS AND DATA STRUCTURES
ab = plBn + (p3 +
- p2 - P I ) B " / ~ p2
where B is the base of the numbering system, we obtain
M ( n ) = 3 M ( n / 2 ) + O ( n ) = O(n1.58496.**
1
Similarly, by splitting the numbers in k (n/k)-digit components,
M ( k n ) = (2K - l ) M ( n ) + O(n) = O(n'0gk ( - 1 ) )
ab =
(a + b)2 - (a,- b)2
4
and
since
1
22 = +a:
-z-1- -
1 1
z
For the next complexity results we will assume that we use an asymptoti-
cally fast multiplication algorithm, that is, one for which
M ( n ) = O(n(l0g n y )
In such circumstances,
M ( n a k ) = -M ( n ) (1
1-a,
+ O(l/(log 72)))
k>O
xi+l = - E j + 1) Ej = uti -1
then Q l / , ( n ) R 3 M ( n ) also. Consequently divisions can be computed in
Q / ( 4* 4M(n)
To evaluate x = a - l i 2 we can use the third-order iteration:
q = axi2 - 1
Xi+l = xi - X i E i 4 - 3i
8
for which
Consequently
11M(n)
Qfi(4 2
Derivatives can be computed from the formula
then
+
where p = (1 6 ) / 2 is the golden ratio.
For the purpose of describing the algorithms we will use a common rep-
resentation, based on arrays of digits. The digits may take values from 0 to
B A S E - 1 in their normalized form, although a digit may hold a maximum
value M A X D . For example, for eight-bit characters on which we want to
represent decimal numbers, B A S E = 10 and M A X D = 255.- The bound
M A X D may be any value including BASE - 1. For our algorithms we will
assume that M A X D 2 2BASE2. With this assumption we do not have to
use temporary variables for the handling of digits.
The data definition for our C algorithms is
238 HANDBOOK OF ALGORJTIIAITS AND DATA STRUCTURES
nomnali,ze( a )
*P a;
linear(a, k a , b, kb)
mP a , b;
int La, kb;
m u l i n t ( a , b, c)
mP a, b, c;
/*** multiply t w o integers. a*b- ->c ***/
{int i, j , la, lb;
/*** b a n d c m a y coincide ***/
la = length(a); Zb = length(b);
for (i=O; i<la-2; i++) c[lb+z] = 0 ;
for (i=lb-1; i>O; i--) {
for (j=2; j<la; j++)
if ((c[i+j-11 +=
b[z]*a[cll)>
ICfAXD- (BASE- 1 )*(BASE- 1)- M A X D / BASE) {
c[i+j-1] -= (ICfAdYD/BASE)*BASE;
C[i+J] += ICfAXDlBASE;
c[2] = b[~]*a[l];
1
storelength( c, la+lb-2);
storesign( c, sign( a)==sign( b) ? POS : NEG);
normalize( c);
1;
240 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
References:
[Knuth, D.E., 691, [Aho, A.V. e t al., 741, [Borodin, A. e t al., 751, [Floyd, R.W.,
751, [Artzy, E. eZ al., 761, [Brent, R.P., 761, [Brent, R.P., 761, [Collins, G.E.e t
al., 771, [Dhawan, A.K. e2 al., 771, [Knuth, D.E., 781, [Morris, R., 781, [JaJa,
J . , 791, [Alt, I., 801, [Bruss, A.R. et al., 801, [Head, A.K., 801, [Linnainmaa,
S., 811, [Alt, H., 831, [Stockmeyer, L.J., 831, [Regener, E., 841, [Flajolet, P. e t
al., 851, [Flajolet, P., 851, [Kaminski, M., 871, [Alt, H., 881, [Robertazzi, T.G.
e t al., 881.
and
Qopt(n) IQbp(n)
The first inequality is tight, but the latter is not. n = 15 is the smallest
example for which they differ: we can compute x15 by computing x 2 , x3, x6,
x12 and 215 giving QOpt(15)= 5 while Qbp(15) = 6. Similarly, the smallest
exponent for which the difference is 2 is 63, Qopt(63) = 8 while &bp(63) = 10.
(One of the optimal sequences of powers is 2,4,5,9,18,27,45,63.)
The problem of computing the optimal strategy for powering is related
to the addition chain problem, which is how to construct an increasing
sequence ul,u 2 , . . . , uk for which every element is the sum of two previous
elements and a1 = 1 and uk = n for a minimal b.
Using tlie fact that (ax)Y = uxY, if tlie power is a composite number, then
ARITHMETIC ALGORITHMS 241
This inequality is not tight. For example, QOpt(33)= 6 but QOp1(3) = 2 and
Qopt(l1) = 5.
It is always possible to do a squaring as the last step, which gives
Q o p t ( 2 n ) IQ o p t ( n ) +1
but this bound is not tight either since Q o p t ( 1 9 1 ) = 11 and Q o p t ( 3 8 2 ) = 1 1 .
For binary powering we can define an average value of the complexity, as
if the bits of the power were randomly selected. For this definition
as opposed to
n2
Qiter(n) M yM(N)
In the above cases it is assumed that the size of the result of powering an
N-digit number to the nth power is an Nn-digit number. This may be too
pessimistic sometimes.
Binary powering
function p o w e r ( b : n u m b e r ; e : integer) : n u m b e r ;
begin
if e<O then p o w e r : = l / p o w e r ( b , - e )
242 HANDBOOK OF ALGORITHMS A N D D A T A STRUCTURES
ai+1 =
ai + bi
2
where n is the number of digits in the answer. The AG mean is related to the
complete elliptic integrals as
*I2 dB
/d
7r
Fast computation of 7r
function pi : number;
var a , b, t, x, tempu : number;
begin
a := 1;
ARITHMETIC ALGORITHMS 243
b := sqrt(0.5);
2 := 0.25;
x := 1;
while a- b>epsilon do begin
tempa := a;
a := (a+b) / 2;
b := sqrt(tempa* b);
t := t - x*sqr(a-tempa);
x := 2*x
end;
pi := sqr(a+b) / (4*t)
end;
~~ ~~~ ~ ~ -~
Other classical methods for evaluating 7r are based on identities of the type
7r = 16 arctan(l/5) - 4 arctan(1/239)
The function arctan( l/i) for integer i can be evaluated in time proportional to
+
O ( n 2 /log i) using the Maclaurin expansion of arctan(x) = x - x3/3 x5/5 -
... .
ln(x) = 2AG(1,4/x)
71
(1 + o(x-2))
If x is not large enough, we can simply scale it by multiplying by a suitable
power of the BASE (just a shift). For this method
begin
logbase := crude_estimate_of_ln(z)/ln(BASE);
if 2*logbase<Digits then begin
shift := Digits div 2 - logbase 1; +
244 IIANDBOOK OF ALGORITHMS AND DATA STRUCTURES
sin z = -
2i
then
begin
s := sqrteps;
v := 2 / ( 1 + sqri(l+z.*z));
q := 1;
while 1-5 > epsilon do begiii
ARITHMETIC AI,( :ORITHMS 245
q := 2*q / (l+s);
w := 2*s*v / (l+v*v);
+
w := w / (1 sqrt(1-w*w));
w := (v+w) / (l-v*w);
+
v := w / (1 sqrt(l+w*w));
s := 2 * s q d ( s ) / (l+s)
end;
arctan := q * ln((l+v)/(l-v))
end;
References:
[Knuth, D.E., 691, [Horowitz, E., 731, [Kedem, Z.M., 741, [Borodin, A. e l a/.,
751, [Winograd, S., 751, [Brent, R.P., 761, [Brent, R.P., 761, [Yao, A.C-C., 761,
[Pippenger, N., 791, [Pippenger, N., 801, [Downey, P. e i al., 811, [Borwein, J.M.
et al., 841, [Brassard, G . et al., 881, [Tang, P.T.P., 891.
k=l
The classical algorithm for matrix multiplication requires mnp niultiplications
and mn(p- 1) additions. Let 44, ( n )be the number of niult.iplications used to
multiply two n x n matrices. Then hf,(n) = n3 for the classical algorithm.
Classical algorithm
for i:=l to m do
for j:=1 to n do begin
C [ i , J ] := 0;
for k:=l t o p do
C [ i , J ] := C [ i , J ] t.U[i,k]*b[k,J]
end;
246 HANDBOOK OF ALGORITHMS A N D D A T A S T R U C T U R E S
k=l
where
k=l
and the last term (t) is present only if p is odd. Winograds matrix multipli-
cation uses
multiplications and
A , ( m , p , n) = m n ( p + 2) + (mn + m + n)(k / 2 ] - 1)
additions/sub tractions.
( x , +) References
(Kn + 1)/21, 4 [Motzkin,65]
odd, n 2 7 ((n + 3)/2, ...) [Motzkin,65], [Knuth,81], [Revah,75]
(-, n ) [Belaga,58]
((n+ 2)/2, n + 1) [Knut h ,811, [Pan ,791
odd, n 2 11 ( ( n+ 1)/2, n + 2) [Knuth,62], [Revah,75]
odd, n 2 3 ( ( n+ 3)/3, 4 [Belaga,5S], [Revah,75]
741, [Shaw, M. et al., 741, [Strassen, V., 741, [Aho, A.V. e t al., 751, [Borodin,
A. e t al., 751, [Hyafil, L. e t al., 751, [Lipton, R.J. e t al., 751, [Revah, L., 751,
[Borodin, A. e t al., 761, [Chin, F.Y., 761, [Lipton, R.J. et a!., 761, [Schonhage,
A., 771, [Shaw, M. et al., 771, [Lipton, R.J., 781, [Pan, V.Y., 781, [van de Wiele,
J.P., 781, [Kronsjo, L., 791, [Nozaki, A., 791, [Rivest, R.L. e t al., 791, [Brown,
M.R. et al., 801, [Dobkin, D. et al., 801, [Heintz, J . e t a!., 801, [Heintz, J. et a/.,
801, [Mescheder, B., 801, [Schnorr, C.P. e t al., SO], [Pan, V.Y., 811, [Schnorr,
C.P., 811, [Baase, S., 881, [Sedgewick, R., 881, [Hansen, E.R. e t al., 901.
7 U
Text Algorithms
25 1
252 HANDBOOK OF ALGORITHALS AND DATA STRUCTURES
and in Pascal:
The Pascal compiler must support variable length strings to have the pro-
g r a m given here working.
These functions can be composed to search on external text files:
m = strlen(pat);
i f ( m == 0 ) r e t u r n ( 0 ) ;
i f ( m >= BUFSIZ)
return(-2); /*** Bufler is too small ***/
/*** Assume that the file is open and positioned ***/
offs = 0 ; /*** number of characters already read ***/
nb = 0 ; /*** number of characters in bufler ***/
while( T R UE) {
i f ( n b >= m ) {
/*** try t o match ***/
p = search(pat,bum;
i f ( p != NULL)
return(p-buf+ of.); /*** found ***/
for(i=O; i < m; i++) buaz) = bufli+nb-m+l];
offs += nb-m+l;
TEXT ALGORITIIMS 253
~
nb = m-1;
1
/*** read more text ***/
nr = read(Jiledesc,bufl+nb, BUFSIZ-1-nb);
i f ( n r <= 0 ) return(-1); /*** not found ***/
nb += nr;
buflnb] = EOS;
1
1
Any preprocessing of the pattern should be done only once, at the begin-
ning. Especially, if the buffer size is small. Also, the knowledge of the length
of the buffer (text) should be used (for example, see Section 7.1.3).
Similarly, these functions can be adapted or composed to count the total
number of matches. We use two special constants: MAXPATLEN which
is an upper bound on the size of the pattern, and MAXCHAR which is the
size of the alphabet (a power of 2 ) .
Let A, be the number of comparisons performed by an algorithm, then in
the worst case we have the following lower and upper bounds
4 1
n-m+l <A,,<~n--m
3
For infinitely many n's, 1x1 > 2, and odd rn 2 3 we have
General references:
[Karp, R.M. et al., 721, [Slisenko, A., 731, [Fischer, M.J. et al., 741, [Sellers,
P., 741, [Galil, Z., 761, [Rivest, R.L., 771, [Seiferas, J . et al., 771, [Galil, Z. et
al., 781, [Yao, A.C-C., 791, [Aho, A.V., 801, [Galil, Z. et al., 801, [Main, M. e i
al., 801, [Sellers, P., 801, [Slisenko, A., 801, [Crochemore, M., 811, [Galil, Z. et
al., 811, [Galil, Z., 811, [Galil, Z. et a/., 831, [Galil, Z., 851, [Pinter, R., 851, [Li,
M e et a/., 861, [Abrahamson, I<., 871, [Baeza-Yates, R.A., 891, [Baeza-Yates,
R.A., 891, [Vishkin, U., 901.
n 5 A,, 5 m ( n - m + 2 ) - 1
var i, j , m, n: anteger;
f o u n d boolean;
begin
m := length(pat);
if m = 0 then search := 1
else begin
n := length(tezt); search := 0 ;
j := 1; i := 1; ,found := FALSE
while not f o u n d and ( i <= n - m + l ) do begin
if p a t = substr(tez2, i, m) then begin
search := i; f o u n d := TRUE; end;
2 .-.-
i + 1;
end;
end;
end;
References:
[Barth, G., 841, [Wirth, N . , 861, [Baase, S., 881, [Sedgewick, R., 881, [Baeza-
Yates, R.A., 891, [Baeza-Yates, R.A., 891, [Manber, U., 891, [Cormen, T.H. et
al., 901.
n 5 A n 5 2n + O ( m )
{ int next[MAXPATLEiVl, j;
This function may inspect some characters more than once, but will never
backtrack to inspect previous Characters. It is an on-line algorithm, that
256 HANDBOOI\: OF ALGORITHMS AND DATA STRUCTURES
is, characters are inspected (may be more than once) strictly left to right.
References:
[Aho, A.V. et al., 741, [Knuth, D.E. et al., 771, [Barth, G., 811, [Salton, G. et
al., 831, [Barth, G., 841, [Meyer, B., 851, [Takaoka, T., 861, [Wirth, N., 861,
[Baase, S., 881, [Brassard, G . et al., 881, [Sedgewick, R., 881, [Baeza-Yates,
R.A., 891, [Baeza-Yates, R.A., 891, [Manber, U., 891, [Cormen, T.H. et al.,
901.
Boyer-Moore preprocessing
m = strlen(pat);
f o r ( k 0 ; k<MAXCHAR; k++) skip[k] = m;
f o r ( k 1 ; k < = m ; k++) {
d[k-11 = ( m << 1 ) - lq
skip[pa2[k-l]] = m-k;
1
t=m+l;
f o r ( j = m ; j > 0; j--) {
fi-11 = t ;
w h i l e ( t <= m && pailj-l] != pai[t-l])
{
d[t--11 = min(d[t-l], m-j);
t = At-11;
TEXT ALGORITEIhlS 257
1
t--;
1
q=t; t = m + l - q ; ql=l; tl=O;
for(j=l; j<=t; j++) {
fi-11 = t l ;
while(t1 >= 1 && patlj-1] != pat[tl-11)
t1 = Atl-11;
tl++;
1
while(q < m)
There are several versions of this algorithm. The one presented here is the
one given in Knuth-Morris-Pratts paper. The running time is O(n rm) +
where T is the number of occurrences found. For any version of this algorithm
we have
n
An L -
m
Table 7.1 shows the best known upper bound for different variations of the
Boyer-Moore algorithm when there are no occurrences of the pattern in the
text.
I An I References I
3n [Boyer et al., 771, [Knuth et al., 771
14n L
[Galil., 791
. )
2n [Apostolico e2 al., 861
3n/2 [Colussi et a/., 901
4nI 3 [Colussi et a/.. 901
which is optimal. For large patterns, the maximum shift will also depend on
the alphabet size.
258 HANDBOOK OF ALGORITIIMS AND DATA STRUCTURES
{ int j , k, m, skzp[hilAXCHAR], d [ M A X P A T L E N ] ;
m = strlen(pat);
i f ( m == 0) r e t u r n ( t e x t ) ;
preprocpat(put, skip, d ) ;
f o r ( k = m - 1 ; k<n; k +=
maz(skzp[tezt[k]& ( M A X C H A R - l ) ] , d I j ] ) ) {
for(j=m-1; j >= 0 && texttk] == p u t b ] ; j--) k - - ;
i f ( j ==(-1)) return(text+k+l);
1
return ( N U L L );
1
This function may inspect text characters more than once and may back-
track to inspect previous characters. We receive the length of the text as a
paremeter, such that we do not need to compute it. Otherwise, we lose the
good average performance of this algorithm. This function works even if the
text contains a character code that is not in the alphabet. If we can ensure
that the text only has valid characters, the anding with M A X C H A R - 1 can
be eliminated.
In practice, it is enough to use only the heuristic which always matches
the character in the text corresponding to the mth character of the pattern.
This version is called the Boyer-Moore-EIorspool algorithm. For large rn,
T E X T ALGORJTIIRIS 259
m = strlen(pat);
if( m==O) return(text);
for(k=O; k<AfASCHAR; k++) skip[k] = m;
for(L=O; k m - 1 ; k++) skip[pat[k]] = m-k-1;
References:
[Boyer, R. et al., '771, [Galil, Z., 791, [Bailey, T.A. et al., 801, [Guibas, L.J. e t
al., 801, [Ilorspool, R.N.S., SO], [Rytter, W., 801, [Salton, G. et al., 831, [h)loller-
Nielsen, P. et al., 841, [Apostolico, A. et al., 861, [Wirth, N., 861, [Baase, S.,
881, [Brassard, G. et a / . , 881, [Schaback, R., 881, [Sedgewick, R., 881, [Baeza-
Yates, R.A., 891, [Baeza-Yates, R.A., 891, [Baeza-Yates, R.A., 891, [hlanber,
U . , 891, [Baeza-Yates, R.A. c l al., 001, [Cormen, T.11. et al., 901.
260 H A N D B O O K OF A L G O R I T H M S A N D D A T A S T R U C T U R E S
state := 1;
for i := 1 to n do begin
while trans(state, tezt[z])= FAIL do
st at e := fa ilu re( stat e ) ;
staie := trans(state, tezt[zj);
if Guiput(state) <> {} then
{*** a match was found ***I;
end;
The advantage of the PMM Ovei a DFA is that the transition table is
smaller at the cost of sometimes inspecting characters more than once. This
function will never backtrack to inspect previous characters. It is an on-line
algorithm.
The construction and optimizations of the table are beyond the scope of
this handbook. More efficient automata are fully described in Section 7.1.6.
There also exist pattern matching machines based on the Boyer-Moore
algorithm (Section 7.1.3). In this case, the search is done from right to left in
the set of strings. If a mismatch is found, the set of strings is shifted to the
right.
References:
[Aho, A.V. et al., 741, [Aho, A.V. et al., 751, [Comrnentz-Walter, B., 791,
[Bailey, T.A. ei al., 801, [Meyer, B., 851, [Sedgewick, R., 881, [Baeza-Yates,
R.A. et al., 901.
const B = 131;
var hpat, htext, Bm, j,m, n: integer;
found boolean;
begin
found := FALSE; search := 0 ;
m := Zength(pai);
if m=O then begin
search := 1; found := TRUE end;
Bm := 1 ;
hpat := 0 ; htext := 0 ;
n := Zength(text);
if n >= m then {*** preprocessing ***}
for j := 1 to m do begin
Bm := Bm*B;
hpat := hpat*B +
ord(patbJ);
+
htext := htext*B ord(text[jl);
end;
References:
[Harrison, M.C., 711, [Karp, R.M. et al., 871, [Sedgewick, R., $81, [Baeza-Yates,
R.A., 891, [Cormen, T.H,et al., 901, [Gonnet, G.H. et al., 901.
Automata definition
In addition to the above definition, when automata are used for string
matching, we will encode final states in the transition table as the complement
of the state number. This allows a single quick check in a crucial part of the
search loop. For an accepting state, f i n a l will encode the length of the match,
whenever this is possible.
With this definition, the searching function is:
automata stm'ngautom(pat)
char *pat;
a = (autornaia)malloc(sizeof(struct auiornrec));
u ->d = MAXCHAR;
a ->st = strlen(pat)+l;
a ->nextst = (short **)calloc(a ->st, sizeof(short *));
a ->final = (short *)calloc( a ->st, sizeof(s1iort));
-1
The next function produces the union of two automata.
s h o r t m e rg es t at es( ) ;
if(pat[O]==EOS) return(iext);
B = 1;
for(m=O; m<MAXCHAR; m++) mask[m] = -0;
for(m=O; B != O && pat[m] != EOS; m++) {
mask[pat[m]] &= B;
B<<= 1;
.
B = l<<(m-1);
for( biis= -0; *tezt !:= EOS; text++) {
bits = bits<<l I masE[*text & (MAXCHAR-I)];
if((bitsM3) == 0 ) {
for(i=O; pat[m+zl != EOS && pai[m+z]==tezt[i+l];i++);
if(pai[m+z] ==EO$) return ( texi-m+l);
1
1
return(N U L L ) ;
1
T E X T ALGORITHMS 267
This function will inspect each character once, and will never backtrack to
inspect previous characters. This function works even if the text contains a
character code that is not in the alphabet. If we can ensure that the text only
has valid characters, the anding with M A X C H A R - 1 can be eliminated. It
is an on-line algorithm.
This algorithm extends to classes of characters, by modifying the prepro-
cessing of the table mask, such that every position in the pattern can be a
class of characters, a complement of a class or a dont care symbol. Similarly,
we may allow dont care symbols in the text, by defining a special symbol
x such that mask[z] = 0. This is the fastest algorithm to solve this gener-
alization of string searching. There exist algorithms with better asymptotic
complexity to solve this problem, but these are not practical.
References:
[Abrahamson, K., 871, [Baeza-Yates, R.A. e2 al., 891, [Baeza-lates, R.A., 891,
[Kosaraju, S.R., 891.
[Baez a- Yates, 8 91
[Baeza-Yates et al., 891
[Grossi et al., 891
[Tarhio et al., 901
that solve this problem, where w denotes the computer word size and r the
number of occurrences found.
The brute force algorithm for this problem is presented below. We have
(k: + 1)n 5 A, 5 mn
{ int j, m, count;
m = strlen(put);
if(m <= k) return(tezt);
{ int qMAXPATLEN+l];
int i, j, m, tj, tjl;
m = strlen(pat);
if(m <= k) return(text + n);
q o ] = 0; /*** initial values ***/
for(j=l; j<=m; j++) $1 =j;
References:
[Levenshtein, V., 651, [Levenshtein, V., 661, [Sellers, P., 741, [Wagner, R.E. et
al., 743, [Wagner, R.E., 751, [Wong, C.K. et al., 761, [Hall, P.A.V. et al., 801,
[Bradford, J., 831, [Johnson, J.I1., 831, [Sankoff, D. et al., 831, [Ukkonen, E.,
831, [Landau, G.M. et al., 851, [Ukkonen, E., 851, [Ukkonen, E., 851, [Galil,
Z. et al., 861, [Landau, G.M. e-t al., 861, [Landau, G.M. et al., 861, [Landau,
G.M., 861, [Krithivasan, K. et al., 871, [Baase, S., 881, [Ehrenfeucht, A. et al.,
881, [Baeza-Yates, R.A. et al., 891, [Baeza-Yates, R.A., 891, [Galil, Z. et al.,
891, [Grossi, R. et al., 891, [Manber, U., 891, [Eppstein, D. et al., 901, [Tarhio,
J . et al., 901, [Ukkonen, E. et al., 901.
Usually there are some restrictions imposed on the indices and conse-
quently on the later searches. Examples of these restrictions are: a control
dictionary is a collection of words which will be indexed. Words in the text
which are not in the control dictionary will not be indexed, and hence are
not searchable. Stop words are very common words (such as articles or
prepositions) which for reasons of volume or precision of recall will not be
included in the index, and hence are not searchable. An index point is the
beginning of a word or a piece of text which is placed into the index and is
searchable. Usually such points are preceded by space, punctuation marks or
some standard prefixes. In large text databases, not all character sequences
are indexed, just those which are likely to be interesting for searching.
The most important complexity measures for preprocessed text files are:
the extra space used by the index or auxiliary structures S,, the time required
to build such an index T, and the time required to search for a particular
query, A,. As usual, n will indicate the size of the text database, either
characters or number of index points.
General references:
[Gonnet, G.H., 831, [Larson, P., 831, [Faloutsos, C., 851, [Galil, Z., 851.
each work is a record and fields can be title, abstract, authors, and so on.
Every word in any of the fields, is considered an index point.
The result of searching a term in an inverted index is a set of record num-
bers. All these sets are typically stored sequentially together in an external
file. The set can be identified by its first and last position in the external file.
Let n be the total number of words indexed. The complexity of building
the index is that of sorting n records, each one of length rlogznfk] bits
where k is the size of the control dictionary and f is the number of fields in
any record.
ControlDict : { [word]}f.
k f
FieldIndex : (FieldName, {first, last)l}l .
word : string. FieldName : string.
(1) Assume that the control dictionary can be kept, in main memory. Assign
a sequential number to each word, call this the word number (an
integer between 1 and k).
(2) Scan the text database and for each word, if in the control dictionary,
output to a temporary file the record number, field number, and its
word number.
(3) Sort the temporary file by field number, word number, and record num-
ber.
(4) For each field, compact the sorted file to distinct record numbers alone.
During this compaction, build the inverted list from the end points of
each word. This compacted file becomes the main index for that field.
For a single term search, the location of the answer and the size of the
answer are immediately known. Further operations on the answers, inter-
sections, unions, and so on, will require time proportional to the size of the
sets.
The operations of union, intersection and set difference can be made over
the set of pointers directly (all these sets will be in sorted order) without any
need for reading the text.
References:
[Knuth, D.E., 731, [Grimson, J.B. et al., 741, [Stanfel, L., 761, [McDonell,
K.J., 771, [Nicklas, B.M. et al., 771, [Jakobsson, M., 801, [Salton, G. et al., 831,
[Sankoff, D. et al,, 831, [Waterman, M.S., 841, [Blumer, A. et al., 871, [ b o ,
V.N.S. et al., 881, [Coulbourn, C.J. et al., 891.
Prefix searching Every subtree of the PAT tree contains all the sistrings
with a given prefix, by construction. Heiice prefix searching in a PAT tree
274 HANDBOOK OF ALGORITHRfS AND DATA STRUCTURES
consists of searching the prefix in the tree up to the point where we exhaust
the prefix or up to the point where we reach an external node. At this point we
need to verify whether we could have skipped bits. This is done with a single
comparison of any of the sistrings in the subtree (considering an external node
as a subtree of size one). If this comparison is successful then all the sistrings
in the subtree (which share the common prefix) are the answer, otherwise
there are no sistrings in the answer. We have
Range searching Searching for all the strings within a certain range of
values (lexicographical range) can be done equally efficiently. More precisely,
range searching is defined as searching for all strings which lexicographically
compare between two given strings. For example the range abc .. (acc will
contain strings like abracadabra, acacia, aboriginal but not abacus or
acrimonious.
To do range searching on a PAT tree we search each of the defining in-
tervals and then collect all the subtrees between (and including) them. Only
O(height) subtrees will be in the answer even in the worst-case (the worst-
case is 2 height - 1 ) and hence only O(1og n ) time is necessary in total on the
aver age.
two bits per internal node (to indicate equal heights as well) and the search
becomes logarithmic in height and linear in the number of matches.
References:
[Fredkin, E., 601, [Morrison, D.R., 681, [Weiner, P., 731, [Aho, A.V. e t al.,
741, [McDonell, K.J., 771, [Nicklas, B.M. e t al., 771, [Majster, M. e$ al., 801,
[Comer, D. e t al., 821, [Orenstein, J.A., 821, [Gonnet, G I . , 831, [Salton, G.
et al., 831, [Apostolico, A. e t al., 851, [Apostolico, A., 851, [Clien, M.T. et
al., 851, [Merrett, T.H. et al., 851, [Iiemp, M. et al., 871, [Gonnet, G.H., 881,
[Baeza-Yates, R.A., 891.
Automaton Trie
where CY = log, IAI, and A is the largest eigenvalue of the incidence matrix of
the DFA with multiplicity m. For any binary DFA 1x1 < 2 and hence CY < 1.
The expected number of external nodes visited is proportional to N n , and
the expected number of comparisons needed in every external node is O(1).
Therefore, the total searching time is given by O(Nn).
References:
[Gonnet, G.H., 881, [Baeza-Yates, R.A. e t a/., 891, [Baeza-Yates, R.A., 891,
[Baeza-Yates, R.A. et al., 901.
m = strlen(pat);
/* search left end */
if(strncmp(pat, index[O],m) != 1) left = 0 ;
else if(strncmp(pat, index[n-I], m) == I) left = n;
e l s e { /* binary search */
for(low=O, high=n; high-low > 1;) {
i = (high+low)/2;
if(strncmp(pat, indez[z],m) != 1 ) high = i;
else low = i;
1
left = high;
1
/* search right end */
if(strncmp(pa2, index[O],m) == -1) right = -1;
e l s e if(strncmp(pat, indez[n-11, m) != -1) right = n-1;
e l s e { /* binary search */
for( low=O, high=n; high-low > 1;) {
i = (high+low)/2;
if(strncmp(pat, index[z], m) != -1) low = i;
else high = i;
1
right = low;
1
return ( right- left+ I) ;
1
PAT arrays are also called suffix arrays. With additional information
about the longest corninon prefixes of adjacent index points in the array, it is
possible to speed up a prefix search to
References:
[Gannet, G.H., 861, [Manber, U. et ai., 901, [Manber, U. et al., to app.].
7.2.5 DAWG
The Directed Acyclic Word Graph (DAWG) is a deterministic finite automa-
ton that recognizes all possible substrings of a text. All states in the DAWG
are accepting (final) states. Transitions which are not defined are assumed to
go to a non-accepting dead state.
For any text of size n > 2 we have
n + 15 states 5 2n - 1
n 5 transitions 5 3 n - 4
To search a substring in the DAWG we simply run the string through the
DFA as in the search function of Section 7.1.6. If the DAWG is implemented
w DFAs like in Section 7.1.6, the running time is
References:
[Blumer, A. et al., 851, [Crochernore, h l . , 851, [Blumer, A. et al., 871, [Baeza-
Yates, R.A., to app.].
280 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
signature for the record is the superimposition (logical or) ol all the word
signatures. For this method the signatures of the words shouhl have fewer 1
bits. This method is particularly attractive for searching queriefl with an and
condition, that is, all records which have two or more given words. An and
search is done by searching the or of all the word signatures of the query.
In this method we divide each document into sets of words or size w (log-
ical blocks), and we hash every distinct word from each block i l l bit patterns
of length B . The signature of a block is obtained by superimposing those bit
patterns. Finally, the document signature is the concatenation of all block sig-
natures. In this case, the optimal number of bits set to 1 (that is, to minimize
false drops) is
B In 2
W
for single word queries. We have
Bn
s, = W x average word size
bits .
These techniques can be extended to handle subword searches, and other
boolean operations. Other variations include compression techniques.
References:
[Harrison, M.C., 711, [Bookstein, A., 731, [Knuth, D.E., 731, [Rivest, R.L., 741,
[Rivest, R.L., 761, [Burkhard, W.A., 791, [Cowan, R. et al., 791, [Comer, D. et
al., 821, [Tharp, A.L. et al., 821, [Larson, P., 831, [Ramamohanarao, K. et al.,
831, [Sacks-Davis, R. et al., 831, [Salton, G. et al., 831, [Faloutsos, C. et al.,
841, [Faloutsos, C. et al., 871, [Karp, R.M. et al., 871, [Sacks-Davis, R. et al.,
871, [Faloutsos, C., 881.
7.2.7 P-strings
Text is sometimes used to describe highly structured information, such as,
dictionaries, scientific papers, and books. Searching such a text requires
not only string searching, but also consideration of the structure of the text.
Large structured texts are often called text-dominated databases. A text-
dominated database is best described by a schema expressed as a grammar.
Just as numeric data is structured in a business database, string data
must be structured in a text-dominated database. Rather than taking the I
form of tables, hierarchies, or networks, grammar-based data takes the form
of parsed strings, or p-strings.
A p-string is the main data structure of a text-dominated database and it
is formed from a text string and its parse tree (or derivation tree, see [Hopcroft
et al. 79, pages 82-87]). Notice that we do not require to have a parseable
string (with the schema grammar) but instead we keep both the string and
its parsing tree together.
282 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
For the string Doe, John E. we have the p-string shown in Figure 7.3.
r author
traversal.
The above operators allow structured search within the text database.
String searching algorithms can be composed with the above. For example,
References:
[Gonnet, G.H. et al., 871, [Smith, J . et al., 871.
General references:
[Maier, D., 781, [Tzoreff, T. et al., 881, [Myers, E. et al., 891, [Amir, A. e t al.,
901, [Manber, U. et al., to app.].
and l? < n.
284 HANDBOOK OF ALGORITIIhfS AND DATA STRUCTURES
is a subsequence of it.
References:
[Hirschberg, D.S., 751, [Aho, A.V. et al., 761, [Hirschberg, D.S., 771, [Hunt,
J. et al., 771, [Hirschberg, D.S., 781, [Maier, D., 781, [Dromey, R.G., 791,
[Mukhopadhay, A., 801, [Nakatsu, N. et al., 821, [Hsu, W.J. e.t al., 841, [Hsu,
W.J. et al., 841, [Apostolico, A., 861, [Crochemore, M., $61, [Myers, E., 861,
[Apostolico, A. et al., 871, [Apostolico, A., 871, [Kumar, S.K. et al., 871, [Cor-
men, T.H. et al., 901, [Eppstein, D. et al., 901, [Baeza-Yates, R.A., to app.],
[Myers, E., to app.].
Note that now the size of the text is n2 instead of n. For this problem, the
brute force algorithm may require O ( n 2 m 2 )time, to search for a pattern of
size m x m in a text of size n x n.
Table 7.6 shows the time and space required by 2-dimensional pattern
matching algorithms. Some of these algorithms can be extended to allow
scaling of the pattern or approximate matching. However, there are no effi-
cient algorithms that allow arbitrary rotations of the pattern.
References:
[Bird, R., 771, [Baker, T., 781, [Davis, L.S. et al., 801, [Karp, R.M. et al.,
871, [Ihithivasan, K. et al., 871, [Gonnet, G.H., 881, [Zhu, R.F. e t al., 891,
[Baeza-Yates, R.A. et al., 901.
KMP states
3 0
0 2 Pattern machine
r
1 01I output
, 2 Ix
I
2
Next character to read
Text
with f ( m ) < 1.
This algorithm can be improved to avoid repeating comparisons in the
checking phase if we have overlapped occurrences. It can also be extended to
non-rectangular pattern shapes, or higher dimensions.
8 1 4 1 5
9 2 3 1 4
10 1:1 12 13
where the integers indicate the ordinal position of the comparison for the pixel
marked as 1 (the sispiral comparing sequence).
The main data structure for subpicture searching is a PAT tree (see Sec-
tion 7.2.2 for the complexity measures) built on sispirals for each pixel. As
with sistrings, every time that we step outside the picture we should use a
'null' character which is not used inside any of the pictures.
To search a square in the album, we just locate its centre, that is, a pixel
that will develop a spiral which covers the square, and search the sispiral
starting at this pixel in the PAT tree. The searching time is independent of
the number of matches found.
288 HANDBOOK OF ALGORITEIMS AND DATA STRUCTURES
Distributions Derived
from Empirical
Observation
In this appendix we will describe some probability distributions arising from
empirical situations. The distributions described here may be used with other
well-known distributions to test algorithms under various conditions. Some
of these distributions are related directly to data processing.
fl = ifi
where fi denotes the frequency of the ith most frequent word. Zipf observed
that the population of cities in the USA also follows this relation closely. From
this observation we can easily define a Zipfian probability distribution as
I
289
290 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
Zipf found that some word frequencies matched this distribution closely
for values of 8 other than 1. In this case the first moments and variance are
DISTRIBUTIONS DERIVED FROM EAIPIRXCAL OBSERVATION 291
References:
[Zipf, G.K., 491, [Johnson, N.L. e2 a / . , 691, [Knuth, D.E., 731.
and so
R(n) = T
where T is the total expected number of references. To divide the n books
into E divisions satisfying the given ratio, the number of books in each division
must be w ,w,,,-a. ... nmk--l m 1 Since each division receives the
same number of references, this number inust be T/E.Consequently the total
expected number of references to the first division will be
n(m-1)
Now the quantities E and nz are related to one another, since for any valid
E, Bradfords law predicts the existence of a unique m. Examination of R ( z )
for different values of b and m shows that in order for the law to be consistent,
the quantity mk - 1 = b must be constant. This constant b defines the shape
of the distribution. Froin equation 1.1 we can solve for R ( z ) and obtain
202 IIANDBOOK OF ALGORITHMS AND DATA STRUCTURES
Let pi be the probability that a raiidoin reference refers to the ith book.
From the above discussion we have
Pi =
R(i) - R(i - 1) bi+n
T b(i - 1) + n
The variance is
n2 1
02 =
This distribution behaves very much like the generalized harnionic (or
the first generalization of Zipfs distribution). When the parameter b 0
---f
References:
[Pope, A., 751.
DISTRIBUTIONS DERIVED FROM EMPIRICAL OBSERVATION 293
We immediately conclude that this law will be consistent only for 8 > 2, as
has been noted by several other authors; otherwise this first moment will be
unbounded, a situation which does not correspond with reality. Note that
npi denotes the expected number of papers published in a journal which has
n contributors.
For 8 5 3, the variance of the distribution under discussion diverges. For
8 > 3, the variance is given by
The median number of papers by the most prolix author can be approxi-
mated by
References:
[Lotka, A.J., 261, [Murphy, L.J., 731, [Radhakrishnan, T. et a / . , 791.
of the transactions are on the most active 20% of the records, and so on
- p2 >
recursively. Mathematically, let pl > - p3 >- ... >- pn be the independent,
probabilities of performing a transaction on each of the n records. Let R ( j )
be the cumulative distribution of the p i ' s , that is,
e p i = R(j) R(n) = 1
Note that this probability distribution also possesses the required monotone
behaviour, that is, pi 2p i + l .
The parameter 8 gives shape to the distribution. When e = 1 (a! =
5) the distribution coincides with the discrete rectangular distribution. The
moments and variance of the distribution described by equation 1.6 are
p;
n
=p p i =:
On2
- + - 0
+ 2-e
- -6
71 2c(-o - 1) +~ ( - 0 )
i=l
8+2 0+1 ne
+O(n-')
DISTRIBUTIONS DERIVED FROM EMPIRICAL OBSERVATION 295
n
Onk Oknk-l O(k - O)knk-2
p; =Cakpi = -
i=l O+k 2 ( 8 + k - 1 ) -I- 1 2 ( O + k - 2 )
+
+ ~ ( n ~ - O(n-')
~ )
On2
a2 =
(e + 1)"Q + 2 ) + O(nl-')
For large n , the tail of the distribution coincides asymptotically with pi m
i e - l . For the 80%-20% rule, 8 = 0.138646...; consequently the distribution
which arises from this rule behaves very similarly to the second generalization
. of Zipf's distribution.
References:
[Heising, W.P., 631, [Knuth, D.E., 731.
APPENDIX II
Asymptotic Expansions
C(z) = 12-"
n=l
y = n+00
liin II, - In ( 1 1 ) == 0.5772158840 ...
297
I
298 HANDBOOK OF ALGORITHAfS AND DATA STRUCTURES
1 e+2 7e2 + 4 8 e + 2 4
(11.4)
+
en 2e2n2 24e3n3
k=l
zk
= -1n(l-z)+
( z - 1)-12"+1
+ + (zn+'(i - 1)!n! + ... (11.7)
(n + 1) z - l)i(n + i)!
e..
k=l
= -ln(l-z)+
( z - 1)n
1
( z - 1)n + ( z z-+1)2n2
l
+ +
z 2 4%+ 1 ( z 1 ) ( 2 1oz + 1)
( z - 1)3n3
+ + + ( z --1)4n4
(11.8)
(11.9)
- -log, (2. - 1) + E
y
+ 43 + -
51n z
144
-
311n3z
86400
+qin5 (11.10)
k=l
(11.11)
300 HANDBOOK OF ALGORITIIMS AND DATA STRUCTURES
- -
7r2 (1 - z ) ~- (1 - z ) ~- ...
- ln(1-z)lnz - (1-z) -
6 4 9
(11.13)
n 1 1 1
E, = +(n+1)+7 == H, = y + l n n + - -2n
-
12132
(11.14)
k=l
1 1
+---+ -...1
120n4 252n6 240ns
n 1 1 1 1 1 1
L2
=
I?
= - _. - + -
6 n 2n2
-
6n3
+-
30n5
- -...
42n7
(11.15)
k=l
n
In 27r
x 1 n k = In r ( n + 1) = ( n + 1/2)ln n - 12 +7 (11.16)
k=l
+- 1 - -
1 1 - 1 ...
12n 360~13' 1260n5 1680n7
A S Y M P T O T I C EXPANSIONS 301
(11.17)
- 571
2488320n4 +
163879
209018880n5
+...)
= n n d 2 x ( n + 1/6)e"" (1+ - 1 + 0(.-3))
144n2
( ~ + z / n )=
~ e"
( 1- z2
2n
-+ 32+8
-2-
24n2
(z+2)(2+6) z4
48n3
(11.18)
(l+l/n)n = e ( l - - + - 1- - 11 7 2447
(11.19)
2n 24n2 16n3'57GOn4
- 959
2304n5 +
238043 -
580608n6
...)
(11-20)
- 137 -_ 67177
3840n5 2903040n6
-...)
b b2 b-3 b - 12b3
= 1+-+T+-b2+- (11.2 1)
n 2n 6n3 24n4
b2 - 30b 40+b3
b2
+ +
- 60b 330 b4 +
...
+ 120n5 720n6
(11.22)
302 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
(11.23)
y Inn 1 1 1 (- 1)i-1
dx = - - + - + - - - + - -
2 2 2n 8n2 36n3
... + i! 2i ni (11.24)
e-l/n
dx = -- dx ( s > 1) (11.25)
s- P (s- l)n
100
(11.28)
dx =
Inn-y
2
+m-
lnn+l-y
2n
(11.29)
-2 d m
3
+ l n n +4n2
3/2-y
+
4 d v
15
- ...
2[T(s- 2) - T ( s - l)]
dx = T ( s ) = -1
s-1
+ n(1- s)
[s > 11 (11.30)
ASYMPTOTIC EXPANSIONS 303
- 2t)
- ((s 2n2 +... [s-mt # Iform=O,1,2,-..]
-
(-n>-m (g - q(m + 1)+ (ln(n) + q(m+ 1))2)
2t2m!
C(s - t) - C(s - 2t)
-C(s) + n 2n2
.... +
[s - mt = 1 and C(s - mt) interpreted as y1,
1
where y1 = - limC(x)
x=l
+
(x - 1 ) 2 1
k20
(1 - ;)2k = -log2 (log2 n - +) + s1 + Y (11.33)
(11.34)
n
k>O
+ ( p - 1)n +
+P (logp (logg 52))
z Pz2
Z(P2 - l)n2
304 HANDBOOK OF ALGOIEITHhfS AND DATA STRUCTURES
k:>O
I P(x)I 5 0.0001035
(11.36)
(11.37)
e-2n e-4n
+nP(log, n) - - -- ...
4
I P(x)I 5 0.000000173
(11.38)
= log, n + -
y
In a
+ 51 + -
El
n
+ 2 + + P(log, n )
E2
n
* * *
where
k=ca
a+ 1 (Sak - 3)e-,-
and K2 =-
2 24a a4k
k=-m k=-m
axn+bxn-l+f(n) = O (11.39)
A S Y M P T O T I C EXPANSIONS 305
x = l + E + Y ( - +b i ) + $ ( b2 + b(b + a/2)y
n n2 a+b ( a + b)2
where y = In
(--:'a>
(11.40)
where y = w(-nf(n))
y-a-b
x = 1+
n
where y = w(-e"+bf(n))
+n3 (
y b + ac + (c-
c2 + c(c - 1/2)y
1)2
+ 6) + O(y4n-4)
6
where y = In
(c - 1)n - b - a
or alternatively
(11.43)
n4 +- a (3 CY giv(a) + 8 i ( a ) ) (11.44)
i>O
ai m = s(4 - 2m 24m2
(11.45)
(11.46)
+*-u)[3~~(1
24m2 - a ) g i V + $(1- 2a)g - 12g]
a(1- a)
48ni3
+--[-a2(1 - CY)2gi - 8 a ( l - a ) ( l - 2a)g
-12(1 --Ga + 6a 2 )g i v + 48(1- 2a)g - 24gI + O(m-4)
ASYMPTOTIC EXPANSIONS 307
k - 1 + .-1 - b2
C f ( n k > = (n - - - (k - l)'(k + 1) (11.47)
n 2 24n 48n2
where
-
1209600 x=l
308 HANDBOOR OF ALGORJTIIMS AND DATA STRUCTURES
If we write
then, if f(z) =
the reals),
Ciajxi + X i bi In 2 x i + xi ln2(z)xi +
ci (i varying over
bo(1n (2n) - 2)
+s+ -
41as
a4
252
+ .-.+ 2
+ ..,
(11.50)
General references:
[de Bruijn, N.G., 701, [Abramowitz, M. e t al., 721, [ I h u t h , D.E., 731, [ I h u t h ,
D.E., 731, [Bender, E.A., 741, [Gonnet, G.H., 781, [Greene, D.H. et al., 821,
[Graham, R.L. et al., 881.
APPENDIX Ill
References
111.1 Textbooks
The following are fine textbooks recommended for further information on their
topics.
1. Aho, A.V., Hopcroft, J.E. and Ullman, J.D.: The Design and Analysis of
Computer Algorithms; Addison-Wesley, Reading, Mass,(1974). (2.1, 2.2, 3.2.1,
3.3, 3.4.1, 3.4.1.3, 3.4.2.1, 4.1.3, 4.1.5, 4.2.1, 4.2.4, 4.2.6, 5.1.6, 5.2, 5.2.2, 6.1,
6.3, 6.4, 7.1.2, 7.1.4, 7.1.6, 7.2.2).
2. Aho, A.V., Hopcroft, J.E. and Ullrnan, J.D.: Data Structures and Algorithms;
Addison-Wesley, Reading, Mass, (1983). (3.3, 3.4.1, 3.4.2, 4.1, 4.2).
3. Baase, S.: Computer Algorithms: Introduction to Design and Analysis;
Addison-Wesley, Reading, Mass, (1988). (3.2.1, 3.4.1.7, 4.1.2, 4.1.3, 4.1.4,
4.1.5, 4.2.1, 4.2.4, 4.4, 5.2, 6.3, 6.4, 7.1.1, 7.1.2, 7.1.3, 7.1.8).
4. Borodin, A. and Munro, J.I.: The Computational Complexity of Algebraic and
Numeric Problems; American Elsevier, New York, NY, (1975). (6.1, 6.2, 6.3,
6.4).
5. Brassard, G. and Bratley, P.: Algorithmics - Theory and Practice; Prentice-
Hall, Englewood Cliffs, NJ, (1988). (3.2.1, 3.3.1, 3.4.1.7, 4.1.3, 4.2.1, 5.1.3,
5.2, 6.2, 7.1.2, 7.1.3).
6. Cormen, T.H., Leiserson, C.E. and Rivest, R.L.: Introduction to Algorithms;
MIT Press, Cambridge, Mass., (1990). (3.3, 3.4.1, 3.4.1.8, 3.4.1.9, 3.4.2,
3.4.2.4, 4.1.3, 4.1.5, 4.2.3, 4.2.4, 5.1.3, 5.1.7, 5.2, 6.3, 7.1.1, 7.1.2, 7.1.3, 7.1.5,
7.1.6, 7.3.1).
7. de Bruijn, N.G.: Asymptotic Methods in Analysis; North-Holland, Amsterdam,
(1970). (11).
8. Flores, I.: Computer Sorting; Prentice-Hall, Englewood Cliffs, NJ, (1969).
(4.1, 4.2, 4.4).
309
310 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
9. Gotlieb, C.C. and Gotlieb, L.R.: Data Types and Structures; Prentice-HaU,
Englewood Cliffs, NJ, (1978). (2.1, 3.1.1, 3.2.1, 3.2.2, 3.3, 3.4.1, 3.4.2, 3.4.3,
3.4.4, 4.1.2, 4.1.3, 4.2).
10. Greene, D.H. and Knuth, D.E.: Mathematics f o r the Analysis of Algorithms;
Birkhauser, Boston, Mass, (1982). (3.3.2, 3.3.12, 11).
11. Graham, R.L., Knuth, D.E. and Patashnik, 0.: Concrete Mathematics: A
Foundation for Computer Science; Addison-Wesley, Reading, Mass, (1988).
(3.3.10, 11).
12. Hopcroft, J.E. and Ullman, J.D.: Introduction to Automata Theory, Lan-
guages, and Computation; Addison-Wesley, Reading, Mass, (1979). (7.1.6).
13. Horowitz, E. and Sahni, S.: Fundamentals of Data Structures; Computer Sci-
ence Press, Potomac, Maryland, (1976). (3.2, 3.3, 3.4.1, 3.4.2, 3.4.4, 4.1.2,
4.1.3, 4.1.5, 4.2.1, 4.4.2, 4.4.4, 4.3.1).
14. Hu, T.C.: Combinatorial Algorithms; Addison-Wesley, Reading, Mass, (1982).
(3.4.1.7, 6.3).
15. Jensen, I<. and Wirth, N.: Pascal User Manual and Report; Springer-Verlag,
Berlin, (1974). (1).
16. Johnson, N.L. and Kotz, S.: Discrete Distributions; Houghton Mifflin, Boston,
Mass, (1969). (1.1).
17. Kernighan, B.W. and Ritchie, D.M.: The C Programming Language; Prentice-
Hall, Englewood Cliffs NJ, (1978). (1).
18. Knuth, D.E.: The A r t of Computer Programming, vol. I: Fundamental Algo-
rithms; Addison-Wesley, Reading, Mass, (1973). (3.4.1.2, 11).
19. Knuth, D.E.: The A r t of Computer Programming, vol. 11: Seminumerical
Algorithms; Addison-Wesley, Reading, Mass, (1969). (6.1, 6.2, 6.3, 6.4).
20. Knuth, D.E.: The A r t of Computer Programming, vol. III: Sorting and Search-
ing; Addison-Wesley, Reading, Mass, (1973). (3.1.1, 3.1.2, 3.1.4, 3.2.1, 3.3,
3.3.2, 3.3.4, 3.3.5, 3.3.6, 3.3.8.1, 3.3.11, 3.3.12, 3.3.1, 3.4.1, 3.4.1.1, 3.4.1.6,
3.4.1.7, 3.4.1.3, 3.4.1.4, 3.4.1.9, 3.4.2, 3.4.4, 3.4.4.5, 4.1.1, 4.1.2, 4.1.3, 4.1.4,
4.1.5, 4.2.1, 4.2.3, 4.2.4, 4.3.1, 4.3.2, 4.3.3, 4.4.1, 4.4.2, 4.4.3, 4.4.4, 4.4.5, 5.1.3,
5.1.6, 5.2.2, 5.2, 7.2.11, 7.2.6, 1.1, 1.4, 11).
21. Kronsjo, L.: Algorithms: their complexity and efficiency; John Wiley, Chich-
ester, England, (1979). (3.1.1, 3.2.1, 3.3, 3.4.1, 4.1, 4.4, 5.2, 6.3, 6.4).
22. Lorin, H.: Sorting and Sort Systems; Addison-Wesley, Reading, Mass, (1975).
(4.1, 4.4).
23. Manber, U.: Introduction to Algorithms: A Creative Approach; Addison-
Wesley, Reading, Mass, (1989). (3.2.1, 3.2.3, 3.3, 3.4.1, 3.4.1.3, 4.1.3, 4.1.5,
4.2.1, 4.2.3, 4.2.4, 5.1.3, 5.3, 6.3, 7.1.1, 7.1.2, 7.1.3, 7.1.8).
24. Mehlhorn, K.: Data Structures and Algorithms, vol. I: Sorting and Searching;
Springer-Verlag, Berlin, (1984). (3.1, 3.2, 3.3, 3.4.1, 3.4.2, 3.4.4, 4.1, 4.2, 4.3,
4.4, 5.1, 5.2).
25. Mehlhorn, I<.: Data Structures and Algorithms, vol. III: Multidimensional
Searching and Computational Geometry; Springer-Verlag, Berlin, (1984). (3.5,
3.6).
26. Reingold, E.M. and IIansen, W.J.: Data Structures; Little, Brown, Boston,
Mass, (1983). (3.3, 3.4.1, 4.1, 4.2, 4.4).
27. Reingold, E.M., Nievergelt, J. and Deo, N.: Combinatorial Algorithms: Theory
and Practice; Prentice-Hall, Englewood Cliffs NJ, (1977). (3.1.1, 3.2.1, 3.3,
3.4.1.1, 3.4.1.3, 3.4.1.4, 3.4.1.7, 3.4.2, 3.4.4, 4.1.1, 4.1.2, 4.1.3, 4.1.5, 4.2.4, 4.3,
5.2).
REFERENCES 311
111.2 Papers
The following are research papers that contain some in-depth information on
the topics covered in the indicated sections of the handbook. Technical reports
and unpublished manuscripts are not included in this list.
9. Aho, A.V. and Lee, D.T.: Storing a Sparse Dynamic Table; Proceedings FOCS,
Toronto, Canada, 27:55-60, (Oct 1986). (3.3.16).
10. Aho, A.V., Steiglitz, K. and Ullman, J.D.: Evaluating Polynomials at Fixed
Points; SIAM J on Computing, 4(4):533-539, (Dec 1975). (6.4).
11. Aho, A.V.: Pattern Matching in Strings; Formal Language Theory: Perspec-
tives and Open Problems, Academic Press, London, :325-347, (1980). (7.1).
12. Ajtai, M., Fredman, M.L. and I<omlos, J.: Hash Functions for Priority Queries;
Information and Control, 63(3):217-225, (Dec 1984). (3.3.1, 5.1).
13. Ajtai, M., Komlos, J. and Szemeredi, E.: There is no Fast Single Hashing
Algorithm; Inf. Proc. Letters, 7(6):270-273, (Oct 1978). (3.3.2).
14. Akdag, H.: Performance of an Algorithm Constructing a Nearly Optimal Bi-
nary Tree; Acta Informatica, 20(2):121-132, (1983). (3.4.1.7).
15. Akl, S.G. and Meijer, H.: On the Average-Case Complexity of Bucketing Al-
gorithms; J of Algorithms, 3(1):9-13, (Mar 1982). (4.2.3).
16. AM, S.G. and Meijer, H.: Recent Advances in Hybrid Sorting Algorithms;
Utilitas Mathematica, 21C:325-343, (May 1982). (4.2.5).
17. Alagar, V.S., Bui, T.D. and Thanh, M.: Efficient Algorithms for Merging;
BIT, 23(4):410-428, (1983). (4.3.2).
18. Alagar, V.S. and Probst, D.K.: A Fast, Low-Space Algorithm for Multiplying
Dense Multivariate Polynomials; ACM TOMS, 13( 1):35-57, (Mar 1987). (6.3).
19. Aldous, D., Flannery, B. and Palacios, J.L.: Two Applications of Urn Pro-
cesses: The Fringe Analysis of Search Trees and the Simulation of Quasi-
Stationary Distributions of Markov Chains; Probability in the Eng. and Inf.
Sciences, 2:293-307, (1988). (3.4.2, 3.4.2.1).
20. Aldous, D.: Hashing with Linear Probing, Under Non-Uniform Probabilities;
Probability in the Eng. and Inf. Sciences, 2:l-14, (1988). (3.3.4).
21. Alekseyed, V.B.: On the Complexity of Some Algorithms of Matrix Multipli-
cation; J of Algorithms, 6(1):71-85, (Mar 1985). (6.3).
22. Allen, B. and Munro, J.I.: Self-organizing Search Trees; J.ACM, 25(4):526-
535, (Oct 1978). (3.4.1.6, 3.1).
23, Allen, B.: On the Costs of Optimal and Near-Optimal Binary Search Trees;
Acta Informatica, 18(3):255-263, (1982). (3.4.1.6, 3.4.1.7).
24. Allison, D.C.S. and Noga, M.T.: Selection by Distributive Partitioning; Inf.
Proc. Letters, 11(1):7-8, (Aug 1980). (5.2).
25. Allison, D.C.S. and Noga, M.T.: Usort: An Efficient Hybrid of Distributive
Partitioning Sorting; BIT, 23(2):135-139, (1982). (4.2.5).
26. Alt, H., Mehlhorn, K. and Munro, J.I.: Partial Match Retrieval in Implicit
Data Structures; Inf. Proc. Letters, 19(2):61-65, (Aug 1984). (3.6.2).
27. Alt, H.: Comparing the Combinatorial Complexities of Arithmetic Functions;
J.ACM, 35(2):447-460, (Apr 1988). (6.1).
28. Alt, H.: Functions Equivalent to Integer Multiplication; Proceedings ICALP,
Lecture Notes in Computer Science 85, Springer-Verlag, Noordwijkerhovt, Hol-
land, 7:30-37, (1980). (6.1).
29. Alt, H.: Multiplication is the Easiest Nontrivial Arithmetic Function; Pro-
ceedings FOCS, Tucson AZ, 24:320-322, (Nov 1983). (6.1).
30. Amble, 0. and Knuth, D.E.: Ordered Hash Tables; Computer Journal,
17(3):135-142, (May 1974). (3.3.7).
REFERENCES 313
31. Amir, A., Landau, G.M. and Vishkin, U.: Efficient Pattern Matching with
Scaling; Proceedings SODA, San Francisco CA, 1:344-357, (Jan 1990). (7.3).
32. Anderson, H.D. and Berra, P.B.: Minimum Cost Selection of Secondary In-
dexes for Formatted Files; ACM TODS, 2(1):68-90, (1977). (3.4.3).
33. Anderson, M.R. and Anderson, M.G.: Comments on Perfect Hashing Func-
tions: A Single Probe Retrieving Method for Static Sets; C.ACM, 22(2):104-
105, (Feb 1979). (3.3.16).
34. Andersson, A. and Carlsson, S.: Construction of a Tree from Its Traversals in
Optimal Time and Space; Inf. Proc. Letters, 34( 1):21-25, (1983). (3.4.1).
35. Andersson, A. and Lai, T.W.: Fast Updating of Well Balanced Trees; Pro-
ceedings Scandinavian Workshop in Algorithmic Theory, SWAT90, Lecture
Notes in Computer Science 447, Springer-Verlag, Bergen, Norway, 2:111-121,
(July 1990). (3.4.1).
36. Andersson, A.: Improving Partial Rebuilding by Using Simple Balance Crite-
ria; Proceedings Workshop in Algorithms and Data Structures, Lecture Notes
in Computer Science 382, Springer-Verlag, Ottawa, Canada, 1:393-402, (Aug
1989). (3.4.1).
37. Apers, P.M.: Recursive Samplesort; BIT, 18(2):125-132, (1978). (4.1.3).
38. Apostolico, A. and Giancarlo, R.: The Boyer-Moore-Galil String Searching
Strategies Revisited; SIAM J on Computing, 15:98-105, (1986). (7.1.3).
39. Apostolico, A. and Guerra, C.: The Longest Common Subsequence Problem
Revisited; Algorithmica, 2:315-336, (1987). (7.3.1).
40. Apostolico, A. and Preparata, F.P.: Structural Properties of the String Statis-
tics Problem; JCSS, 31:394-411, (1985). (7.2.2).
41. Apostolico, A.: Improving the Worst-case Performance of the Hunt-Szymanski
Strategy for the Longest Common Subsequence of two Strings; Inf. Proc.
Letters, 23:63-69, (1986). (7.3.1).
42. Apostolico, A.: Remark on the Hsu-Du New Algorithm for the Longest Com-
mon Subsequence Problem; Inf. Proc. Letters, 25:235-236, (1987). (7.3.1).
43. Apostolico, A.: The Myriad Virtues of Subword Trees; Combinatorial Al-
gorithms on Words, NATO AS1 Series, Springer-Verlag, l?12:85-96, (1985).
(7.2.2).
44. Aragon, C. and Seidel, R.: Randomized Search Trees; Proceedings FOCS,
Research Triangle Park, NC, 30:540-545, (1989). (3.4.1).
45. Arazi, B.: A Binary Search with a Parallel Recovery of the Bits; SIAM J on
Computing, 15(3):851-855, (Aug 1986). (3.2.1).
46. Arnow, D. and Tenenbaum, A.M.: An Empirical Comparison of B-Trees, Com-
pact B-Trees and Multiway Trees; Proceedings ACM SIGMOD, Boston, Mass,
14:33-46, (June 1984). (3.4.2, 3.4.1.10).
47. Arora, S.R. and Dent, W.T.: Randomized Binary Search Technique; C.ACM,
12(2):77-80, (1969). (3.3.1, 3.4.1).
48. Artzy, E., Hinds, J.A. and Saal, H.J.: A Fast Technique for Constant Divisors;
C.ACM, 19(2):98-101, (Feb 1976). (6.1).
49. Atkinson, M.D., Sack, J.R., Santoro, N. and Strothotte, T.: Min-Max Heaps
and Generalized Priority Queues; C.ACM, 29(10):996-1000, (Oct 1986). (5.1.3,
5.1.6).
50. Atkinson, M.D. and Santoro, N.: A Practical Algorithm for Boolean Matrix
Multiplication; Inf. Proc. Letters, 39( 1):37-38, (Sep 1988). (6.3).
314 HANDBOOK OF ALGOliYTIIhfS AND DATA STRUCTURES
51. Aviad, Z. and Shamir, E.: A Direct Dynamic Solution to Range Search and Re-
lated Problems for Product Regions; Proceedings FOCS, Nashville T N , 22:123-
126, (Oct 1981). (3.6.3).
52. Badley, J.: Use of Mean distance between overflow records to compute average
search lengths in hash files with open addressing; Computer Journal, 29(2):167-
170, (Apr 1986). (3.3).
53. Baer, J.L. and Schwab, B.: A Comparison of Tree-Balancing Algorithms;
C.ACM, 20(5):322-330, (May 1977). (3.4.1.3, 3.4.1.4, 3.4.1.6).
54. Baer, J.L.: Weight-Balanced Trees; Proceedings AFIPS, Anaheim CA, 44:467-
472, (1975). (3.4.1.5).
55. Baeza-Yates, R.A., Gonnet, G.H. and Regnier, M.: Analysis of Boyer-Moore-
type String Searching Algorithms; Proceedings SODA, San Francisco CA,
1:328-343, (Jan 1990). (7.1.3).
56. Baeza-Yates, R.A., Gonnet, G.H. and Ziviani, N.: Expected Behaviour Analy-
sis of AVL Trees; Proceedings Scandinavian Workshop in Algorithmic Theory,
SWAT90, Lecture Notes in Computer Science 447, Springer-Verlag, Bergen,
Norway, 2:143-159, (July 1990). (3.4.1.3).
57. Baeza-Yates, R.A. and Gonnet, G.H.: A New Approach to Text Searching;
Proceedings ACM STGIR, Cambridge, Mass., 12:168-175, (June 1989). (7.1.7,
7.1.8).
58. Baeza-Yates, R.A. and Gonnet, G.H.: Efficient Text Searching of Regular
Expressions; Proceedings ICALP, Lecture Notes in Computer Science 372,
Springer-Verlag, Stresa, Italy, 16:46-62, (July 1989). (7.2.3).
59. Baeza-Yates, R.A. and Gonnet, G.H.: Average Case Analysis of Algorithms
using Matrix Recurrences; Proceedings ICCI, Niagara Falls, Canada, 2:47-51,
(May 1990). (3.4.2, 7.2.3).
60. Baeza-Yates, R.A. and Larson, P.: Performance of B+-trees with Partial Ex-
pansions; IEEE Trans. on Knowledge and Data Engineering, 1(2):248-257,
(June 1989). (3.4.2).
61. Baeza-Yates, R.A. and Poblete, P.V.: Reduction of the Transition Matrix of
a Fringe Analysis and Its Application to the Analysis of 2-3 Trees; Proceed-
ings SCCC Int. Conf. in Computer Science, Santiago, Chile, 5:56-82, (1985).
(3.4.2.1).
62. Baeza-Yates, R.A. and Regnier, M.: Fast Algorithms for Two Dimensional
and Multiple Pattern hiatching; Proceedings Scandinavian Workshop in Algo-
rithmic Theory, SWAT90, Lecture Notes in Computer Science 447, Springer-
Verlag, Bergen, Norway, 2:332-347, (July 1990). (7.1.4, 7.3.2).
63. Baeza-Yates, R.A.: Efficient Text Searching; PhD Dissertation, Department
of Computer Science, University of Waterloo, (May 1989). (7.1, 7.1.1, 7.1.2,
7.1.3, 7.1.5, 7.1.7, 7.1.8, 7.2.2, 7.2.3).
64. Baeza-Yates, R.A.: A Trivial Algorithm Whose Analysis Isnt: A Continua-
tion; BIT, 29:88-113, (1989). (3.4.1.9).
65. Baeza-Yates, R.A.: An Adaptive Overflow Technique for the B-tree; Proceed-
ings Extending Data Base Technology Conference, Lecture Notes in Computer
Science 416, Springer-Verlag, Venice, :16-28, (Mar 1990). (3.4.2).
66. Baeza-Yates, R.A.: Expected Behaviour of B+-trees under Random Insertions;
Acta Informatica, 26(5):439-472, (1989). (3.4.2).
67. Baeza-Yates, R.A.: Improved String Searching; Software - Practice and Expe-
rience, 19(3):257-271, (1989). (7.1.3).
REFERENCES 315
89. Bayer, R. and Unterauer, I<.: Prefix B-trees; ACM TODS, 2(1):11-26, (Mar
1977). (3.4.2).
90. Bayer, R.: Binary B-trees for virtual memory; Proceedings ACM SIGFIDET
Workshop on Data Description, Access and Control, San Diego CA, :219-235,
(Nov 1971). (3.4.2).
91. Bayer, R.: Symmetric Binary B-trees: Data Structure and Maintenance Algo-
rithms; Acta Informatica, 1(4):290-306, (1972). (3.4.2.2).
92. Bayer, R.: Storage Characteristics and Methods for Searching and Addressing;
Proceedings Information Processing 74, North-Holland, Stockholm, Sweden,
:440-444, (1974). (3.3, 3.4.2).
93. Bays, C.: A Note on When to Chain Overflow Items Within a Direct-Access
Table; C.ACM, 16(1):46-47, (Jan 1973). (3.3.11).
94. Bays, C.: Some Techniques for Structuring Chained Hash Tables; Computer
Journal, 16( 2): 126- 131, (May 1973). (3.3.12).
95. Bays, C.: The Reallocation of Hash-Coded Tables; C.ACM, 16(1):11-14, (Jan
1973). (3.3).
96. Bechtald, U. and Kuspert, K.: On the use of extendible Hashing without
hashing; Inf. Proc. Letters, 19(1):21-26, (July 1984). (3.3.13).
97. Beck, I. and Krogdahl, S.: A select and insert sorting algorithm; BIT,
28(4):726-735, (1988). (4.1).
98. Beckley, D.A., Evans, M.W. and Raman, V.K.: Multikey Retrieval from K-d
Trees and Quad Trees; Proceedings ACM SIGMOD, Austin T X , 14:291-303,
(1985). (3.5.1, 3.5.2).
99. Behymer, J.A., Ogilive, R.A. and Merten, A.G.: Analysis of Indexed Sequen-
tial and Direct Access File Organization; Proceedings ACM SIGMOD Work-
shop on Data Description, Access and Control, Ann Arbor MI, :389-417, (May
1974). (3.3.11, 3.4.3).
100. Belaga, E.G.: Some Problems Involved in the Computation of Polynomials;
Dokladi Akademia Nauk SSSR, 123:775-777, (1958). (6.4).
101. Bell, C.: An Investigation into the Principles of the Classification and Analysis
of Data on an Automatic Digital Computer; PhD Dissertation, Leeds Univer-
sity, (1965). (3.4.1).
102. Bell, D.A. and Deen, S.M.: Hash trees vs. B-trees; Computer Journal,
27(3):218-224, (Aug 1984). (3.4.2).
103. Bell, J.R. and Kaman, C.H.: The Linear Quotient Hash Code; C.ACM,
13(11):675-677, (Nov '1970). (3.3.5).
104. Bell, J.R.: The Quadratic Quotient Method: A Hash Code Eliminating Sec-
ondary Clustering; C.ACM, 13(2):107-109, (Feb 1970). (3.3.6).
105. Bell, R.C. and Floyd, B.: A Monte Carlo Study of Cichelli Hash-Function
Solvability; C.ACM, 26( 11):924-925, (Nov 1983). (3.3.16).
106. Bender, E.A., Praeger, C.E. and Wornald, C.N.: Optimal worst case trees;
Acta Informatica, 24(4):475-489, (1987). (3.4.1.7).
107. Bender, E.A.: Asymptotic methods in enumeration; SIAM Review, 16:485-
515, (1974). (11).
108. Bent, S.W. and John, J.W.: Finding the median requires 2n comparisons;
Proceedings STOC SIGACT, Providence, RI, 17:213-216, (May 1985). (5.2).
109. Bent, S.W., Sleator, D.D. and Tarjan, R.E.: Biased 2-3 Trees; Proceedings
FOCS, Syracuse NY, 21:248-354, (Oct 1980). (3.4.2.1).
110. Bent, S.W., Sleator, D.D. and Tarjan, R.E.: Biased Search Trees; SIAM .J on
Computing, 14(3):545-568, (Aug 1985). (3.4.1.6).
111. Bent, S.W.: Ranking Trees Generated by Rotations; Proceedings Scandinavian
Workshop in Algorithmic Theory, SWATSO, Lecture Notes in Computer Sci-
ence 447, Springer-Verlag, Bergen, Norway, 2:132-142, (July 1990). (3.4.1.8).
112. Bentley, J.L. and Brown, D.J.: A General Class of Resource Tradeoffs; Pro-
ceedings FOCS, Syracuse NY, 21:217-228, (Oct 1980). (2.2).
113. Bentley, J.L. and Friedman, J.H.: Data Structures for Range Searching; ACM
C. Surveys, 11(4):397-409, (Dec 1979). (3.6).
114. Bentley, J.L. and Maurer, H.A.: A Note on Euclidean Near Neighbor Searching
in the Plane; Inf. Proc. Letters, 8(3):133-136, (Mar 1979). (3.5).
115. Bentley, J.L. and Maurer, H.A.: Efficient Worst-case Data Structures for
Range Searching; Acta Informatica, 13(2):155-168, (1980). (3.6).
116. Bentley, J.L. and McGeoch, C.C.: Amortized Analyses of Self-organizing Se-
quential Search Heuristics; C.ACM, 28(4):404-411, (Apr 1985). (3.1.2, 3.1.3).
117. Bentley, J.L. and Saxe, J.B.: Decomposable Searching Problems. I. Static-to-
Dynamic Transformation; J of Algorithms, 1(4):301-358, (Dec 1980). (2.2).
118. Bentley, J.L. and Saxe, J.B.: Generating Sorted Lists of Random Numbers;
ACM TOMS, 6(3):359-364, (Sep 1.980). (4.2).
119. Bentley, J.L. and Shamos, M.I.: Divide and Conquer for Linear Expected
Time; Inf. Proc. Letters, 7(2):87-91, (Feb 1978). (2.2.2.1).
120. Bentley, J.L. and Shamos, M.I.: Divide and Conquer in Multidimensional
Space; Proceedings STOC-SIGACT, Hershey PA, 8:220-230, (May 1976).
(2.2.2.1).
121. Bentley, J.L. and Stanat, D.F.: Analysis of Range Searches in Quad Trees;
Inf. Proc. Letters, 3(6):170-173, (July 1975). (3.5.1).
122. Bentley, J.L. and Yao, A.C-C.: An Almost Optimal Algorithm for Unbounded
Searching; Inf. Proc. Letters, 5(3):82-87, (Aug 1976). (3.2.1).
123. Bentley, J.L.: An Introduction t o Algorithm Design; IEEE Computer,
12(2):66-78, (Feb 1979). (2.2).
124. Bentley, J.L.: Decomposable Searching Problems; Inf. Proc. Letters, 8(5):244-
251, (June 1979). (2.2).
125. Bentley, J.L.: Multidimensional Binary Search Trees in Database Applications;
IEEE Trans. Software Engineering, 5(4):333-340, (July 1979). (3.5.2).
126. Bentley, J.L.: Multidimensional Binary Search Trees Used for Associative
Searching; C.A CM, 18( 9):50 9-5 17, (Sep 1975). (3.5.2).
127. Bentley, J .L. : Multidimensional Divide-and-Conquer ; C .ACM, 23( 4):2 14-229,
(Apr 1980). (3.5).
128. Bentley, J.L.: Programming Pearls: Selection; C.ACM, 28(11):1121-1127,
(Nov 1985). (5.2.2).
129. Berman, F., Bock, M.E., Dittert, E., ODonell, M.J. and Plank, P.: Collections
of functions for perfect hashing; SIAM J on Computing, 15(2):604-618, (May
1986). (3.3.16).
130. Berman, G. and Colijn, A.W.: A Modified List Technique Allowing Binary
Search; J.ACM, 21(2):227-232, (Apr 1974). (3.1.1, 3.2.1).
131. Bing-Chao, H. and Knuth, D.E.: A one-way, stackless quicksort algorithm;
BIT, 26(1):127-130, (1986). (4.1.3).
318 HANDBOOK OF ALGORITIMS A N D D A T A STRUCTURES
132. Bini, D., Capovani, M., Romani, F. and Lotti, G.: O(n**2.7799) Complexity
for n x n Approximate Matrix Multiplication; Inf. Proc. Letters, 8(5):234-235,
(June 1979). (6.3).
133. Bird, R.: Two Dimensional Pattern Matching; Inf. Proc. Letters, 6:168-170,
(1977). (7.3.2).
134. Bitner, J.R. and Huang, S-H.S.: Key Comparison Optimal 2-3 Trees with Max-
imum Utilization; SIAM J on Computing, 10(3):558-570, (Aug 1981). (3.4.2.1).
135. Bitner, J.R.: Heuristics that Dynamically Organize Data Structures; SIAM J
on Computing, 8(1):82-110, (Feb 1979). (3.1.2, 3.1.3).
136. Bjork, H.: A Bi-Unique Transformation into Integers of Identifiers and Other
Variable-Length Items; BIT, 11(1):16-20, (1971). (3.3.1).
137. Blake, I.F. and Konheim, A.G.: Big Buckets Are (Are Not) Better!; J.ACM,
24(4):591-606, (Oct 1977). (3.3.4).
138. Bloom, B.H.: Space/Time Trade-offs in Hash Coding with Allowable Errors;
C.ACM, 13(7):422-426, (1970). (3.3).
139. Blum, N., Floyd, R.W., Pratt, V., Rivest, R.L. and Tarjan, R.E.: Time Bounds
for Selection; JCSS, 7(4):448-461, (Aug 1973). (5.2).
140. Blum, N. and Mehlhorn, K.: On the Average Number of Rebalancing Opera-
tions in Weight-Balanced Trees; Theoretical Computer Science, 11(3):303-320,
(July 1980). (3.4.1.4).
141. Blumer, A., Blumer, J., Haussler, D., Ehrenfeucht, A., Chen, M.T. and
Seiferas, J.: The Smallest Automaton Recognizing the Subwords of a Text;
Theoretical Computer Science, 40:31-55, (1985). (7.2.5).
142. Blumer, A., Blumer, J., Haussler, D., McConnell, R. and Ehrenfeucht, A.:
Complete Inverted Files for Efficient Text Retrieval and Analysis; J.ACM,
34(3):578-595, (July 1987). (7.2.1, 7.2.5).
143. Bobrow, D.G. and Clark, D.W.: Compact Encodings of List Structure; ACM
TOPLAS, 1(2):266-286, (Oct 1979). (2.1).
144. Bobrow, D.G.: A Note on Hash Linking; C.ACM, 18(7):413-415, (July 1975).
(3.3).
145. Bollobas, B. and Simon, I.: Repeated Random Insertion in a Priority Queue;
J of Algorithms, 6(4):466-477, (Dec 1985). (5.1.3).
146. Bolour, A.: Optimal Retrieval Algorithms for Small Region Queries; SIAM J
on Computing, 10(4):721-741, (Nov 1981). (3.3).
147. Bolour, A.: Optimality Properties of Multiple-Key Hashing Functions; J.ACM,
26(2):196-210, (Apr 1979). (3.3.1, 3.5.4).
148. Bookstein, A.: Double Hashing; J American Society of Information Science,
23(6):40 2-405, (1972). (3.3.5, 3.3.11 1.
149. Bookstein, A.: On Harrisons Substring Testing Technique; C.ACM, 16:180-
181, (1973). (7.2.6).
150. Boothroyd, J.: Algorithm 201, Shellsort; C.ACM, 6(8):445, (Aug 1963).
(4.1.4).
151. Boothroyd, J.: Algorithm 207, Stringsort; C.ACM, 6( 10):615, (Oct 1963).
(4.1).
152. Borodin, A. and Cook, S.: A Time-Space Tradeoff for Sorting on a General Se-
quential Model of computation; SIAM J on Computing, 11(2):287-297, (May
1982). (4.1, 4.3).
REFERENCES 319
153. Borodin, A. and Cook, S.: On the Number of Additions to Compute Specific
Polynomials; SIAM J on Computing, 5(1):146-157, (Mar 1976). (6.4).
154. Borodin, A., Fischer, M.J., Kirkpatrick, D.G., Lynch, N.A. and Tompa, M.P.:
A Time-Space Tradeoff for Sorting on Non-Oblivious Machines; Proceedings
FOCS, San Juan PR, 20:319-327, (Oct 1979). (4.1, 4.2).
155. Borwein, J.M. and Borwein, P.M.: The Arithmetic-Geometric Mean and Fast
Computation of Elementary Functions; SIAM Review, 26(3):351-366, (1984).
(6.2).
156. Boyer, R. and Moore, S.: A Fast String Searching Algorithm; C.ACM, 20:762-
772, (1977). (7.1.3).
157. Bradford, J.: Sequence Matching with Binary Codes; Inf. Proc. Letters,
34(4):193-196, (July 1983). (7.1.8).
158. Brain, M.D. and Tharp, A.L.: Perfect Hashing Using Sparse Matrix Packing;
Inform. Systems, 15(3):281-290, (1990). (3.3.16).
159. Brent, R.P.: Fast Multiple-Precision Evaluation of Elementary Functions;
J.ACM, 23(2):242-251, (1976). (6.1, 6.2).
160. Brent, R.P.: Multiple-Precision Zero-Finding Methods and the Complexity of
Elementary Function Evaluation; Analytic Computational Complexity, Aca-
demic Press, :151-176, (1976). (6.1, 6.2).
161. Brent, R.P.: Reducing the Retrieval Time of Scatter Storage Techniques;
C.ACM, 16(2):105-109, (Feb 1973). (3.3.8.1).
162. Brinck, K. and Foo, N.Y.: Analysis of Algorithms on Threaded Trees; Com-
puter Journal, 24(2):148-155, (May 1981). (3.4.1.1).
163. Brinck, K.: Computing parent nodes in threaded binary trees; BIT, 26(4):402-
409, (1986). (3.4.1).
164. Brinck, K.: On deletion in threaded binary trees; J of Algorithms, 7(3):395-
411, (Sep 1986). (3.4.1.9).
165. Brinck, K.: T h e expected performance of traversal algorithms in binary trees;
Computer Journal, 28(4):426-432, (Aug 1985). (3.4.1).
166. Brockett, R.W. and Dobkin, D.: On the Number of Multiplications Required
for Matrix Multiplication; SIAM J on Computing, 5(4):624-628, (Dec 1976).
(6.3).
167. Broder, A.Z. and Karlin, A.R.: Multilevel Adaptive Hashing; Proceedings
SODA, San Francisco CA, 1:43-53, (Jan 1990). (3.3).
168. Bron, C.: Algorithm 426: Merge Sort Algorithm ( M l ) ; C.ACM, 15(5):357-358,
(May 1972). (4.2.1).
169. Brown, G.G. and Shubert, B.O.: On random binary trees; Math. Operations
Research, 9:43-65, (1984). (3.4.1).
170. Brown, M.R. and Dobkin, D.: An Improved Lower Bound on Polynomial
Multiplication; IEEE Trans. on Computers, 29(5):337-340, (May 1980). (6.4).
171. Brown, M.R. and Tarjan, R.E.: A Fast Merging Algorithm; J.ACM, 26(2):211-
226, (Apr 1979). (4.3, 5.1).
172. Brown, M.R. and Tarjan, R.E.: A Representation for Linear Lists with Mov-
able Fingers; Proceedings STOC-SIGACT, San Diego CA, 10:19-29, (May
1978). (3.4.2.1).
173. Brown, M.R. and Tarjan, R.E.: Design and Analysis of a Data Structure for
Representing Sorted Lists; SIAM J on Computing, 9(3):594-614, (Aug 1980).
(3.4.2.1).
320 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
195. Cardenas, A.F. and Sagamang, J.P.: Doubly-Chained Tree Data Base Or-
ganization - Analysis and Design Strategies; Computer Journal, 20( 1):15-26,
(1977). (3.4.3).
196. Cardenas, A.F.: Evaluation and Selection of File Organization - A Model and
a System; C.ACM, 16(9):540-548, (Sep 1973). (3.4.3).
197. Carlsson, S., Chen, J. and Strothotte, T.: A note on the construction of the
data structure deap; Inf. Proc. Letters, 31(6):315-317, (June 1989). (5.1.3).
198. Carlsson, S. and Mattsson, C.: An Extrapolation on the Interpolation Search;
Proceedings SWAT 88, Halmstad, Sweden, 1:24-33, (1988). (3.2.2).
199. Carlsson, S., Munro, J.I. and Poblete, P.V.: An Implicit Binomial Queue with
Constant Insertion Time; Proceedings SWAT 88, Halmstad, Sweden, 1:l-13,
(1988). (5.1.7).
200. Carlsson, S.: Average-case results on heapsort; BIT, 27(1):2-16, (1987).
(4.1.5).
201. Carlsson, S.: Improving worst-case behavior of heaps; BIT, 24( 1):14-18,
(1984). (5.1.3).
202. Carlsson, S.: Split Merge-A Fast Stable Merging Algorithm; Inf. Proc. Letters,
22(4):189-192, (Apr 1986). (4.3.2).
203. Carlsson, S.: The Deap - A double-ended heap t o implement double-ended
priority queues; Inf. Proc. Letters, 26(1):33-36, (Sep 1987). (5.1.3).
204. Carter, J.L. and Wegman, M.N.: Universal Classes of Hash Functions; JCSS,
18 (2):143- 154, (A pr 1979). (3.3.1).
205. Casey, R.G.: Design of Tree Structures for Efficient Querying; C.ACM,
16(9):549-556, (Sep 1973). (3.4.3).
206. Celis, P., Larson, P. and Muiiro, J.I.: Robin Hood Hashing; Proceedings FOCS,
Portland OR, 26:281-288, (Oct 1985). (3.3.3, 3.3.8.4).
207. Celis, P.: External Robin Hood Hashing; Proceedings SCCC Int. Conf. in
Computer Science, Santiago, Chile, 6:185-200, (July 1986). (3.3.3, 3.3.8.4).
208. Celis, P.: Robin Hood Hashing; PhD Dissertation, University of Waterloo,
(1985). (3.3.3, 3.3.8.4).
209. Cercone, N., Boates, J. and Krause, M.: An Interactive System for Finding
Perfect Hashing Functions; IEEE Software, 2(6):38-53, (1985). (3.3.16).
210. Cesarini, F. and Sada, G.: An algorithm to construct a compact B-tree in case
of ordered keys; Inf. Proc. Letters, 17(1):13-16, (July 1983). (3.4.2).
211. Cesarini, F. and Soda, G . : Binary Trees Paging; Inform. Systems, 7:337-334,
(1982). (3.4.1).
212. Chang, C.C. and Lee, R.C.T.: A Letter-oriented minimal perfect hashing;
Computer Journal, 29(3):277-281, (June 1986). (3.3.16).
213. Chang, C.C.: The Study of an Ordered Minimal Perfect Hashing Scheme;
C.ACM, 27(4):384-387, (Apr 1984). (3.3.16).
214. Chang, H. and Iyengar, S.S.: Efficient Algorithms to Globally Balance a Binary
Search Tree; C.ACM, 27(7):695-702, (July 1984). (3.4.1.6).
215. Chapin, N.:
A Comparison of File Organization Techniques; Proceedings
ACM-NCC, New York NY, 24:273-283, (Sep 1969). (3.3, 3.4.3).
216. Chapin, N.: Common File Organization Techniques Compared; Proceedings
AFIPS Fall JCC, Las Vegas NE, :413-432, (Nov 1969). (3.3, 3.4.3).
322 HANDBOOK OF A L G O R J T H M S A N D D A T A S T R U C T U R E S
239. Coffman, E.G. and Bruno, J.: On File Structuring for Non-Uniform Access
Frequencies; BIT, 10(4):443-456, (1970). (3.4.1).
240. Coffman, E.G. and Eve, J.: File Structures Using Hashing Functions; C.ACM,
13(7):427-436, (1970). (3.3).
241. Cohen, J. and Roth, M.: On the Implementation of Strassen's Fast Multipli-
cation Algorithm; Acta Informatica, 6:341-355, (1976). (6.3).
242. Cohen, J.: A Note on a Fast Algorithm for Sparse Matrix Multiplication; Inf.
Proc. Letters, 16(5):247-248, (June 1983). (6.3).
243. Cole, R.: On the Dynamic Finger Conjecture for Splay Trees; Proceedings
STOC-SIGACT, Baltimore MD, 22:8-17, (May 1990). (3.4.1.6).
244. Cole, R.: Searching and Storing similar lists; J of Algorithms, 7(2):202-220,
(June 1986). (3.5).
245. Colin, A.J.T., McGettrick, A.D. and Smith, P.D.: Sorting Trains; Computer
Journal, 23(3):270-273, (Aug 1980). (4.2, 4.4.4).
246. Collins, G.E. and Musser, D.R.: Analysis of the Pope-Stein Division Algo-
rithm; Inf. Proc. Letters, 6(5):151-155, (Oct 1977). (6.1).
247. Collmeyer, A.J. and Shemer, J.E.: Analysis of Retrieval Performance for Se-
lected File Organization Techniques; Proceedings AFIPS, Houston TX, 37:201-
210, (1970). (3.3, 3.4.3).
248. Comer, D. and Sethi, R.: The Complexity of Trie Index Construction; J.ACM,
24(3):428-440, (July 1977). (3.4.4).
249. Comer, D. and Shen, V.: Hash-Bucket Search: A Fast Technique for Searching
an English Spelling Dictionary; Software - Practice and Experience, 12:669-
682, (1982). (7.2.2, 7.2.6).
250. Comer, D.: A Note on Median Split Trees; ACM TOPLAS, 2(1):129-133, (Jan
1980). (3.4.1.6).
251. Comer, D.: Analysis of a Heuristic for Full Trie Minimization; ACM TODS,
6(3):513-537, (Sep 1981). (3.4.4).
252. Comer, D.: Effects of Updates on Optimality in Tries; JCSS, 26(1):1-13, (Feb
1983). (3.4.4).
253. Comer, D.: Heuristics for Trie Index Minimization; ACM TODS, 4(3):383-395,
(Sep 1979). (3.4.4).
254. Comer, D.: The Ubiquitous B-tree; ACM C. Surveys, 11(2):121-137, (June
1979). (3.4.2).
255. Commentz-Walter, B.: A String Matching Algorithm Fast on the Average;
Proceedings ICALP, Lecture Notes in Computer Science 71, Springer-Verlag,
Graz, Austria, 6:118-132, (July 1979). (7.1.4).
256. Cook, C.R. and Kim, D.J.: Best Sorting Algorithm for Nearly Sorted Lists;
C.ACM, 23(11):620-624, (Nov 1980). (4.1).
257. Cooper, D., Dicker, M.E. and Lynch, F.: Sorting of Textual Data Bases: A
Variety Generation Approach to Distribution Sorting; Inf. Processing and
Manag., 16:49-56, (1980). (4.2.3).
258. Cooper, R.B. and Solomon, M.K.: The Average Time until Bucket Overflow;
ACM TODS, 9(3):392-408, (1984). (3.4.3).
259. Coppersmith, D. and Winograd, S.: Matrix Multiplication via Arithmetic Pro-
gressions; Proceedings STOC-SIGACT, New York, 19:l-6, (1987). (6.3).
260. Coppersmith, D. and Winograd, S.: On the Asymptotic Complexity of Matrix
Multiplication; SIAM J on Computing, 11(3):472-492, (Aug 1982). (6.3).
324 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
323. Driscoll, J.R., Lang, S.D. and Bratman, S.M.: Achieving Minimum Height
for Block Split Tree Structured Files; Inform. Systems, 12:115-124, (1987).
(3.4.2).
324. Driscoll, J.R. and Lien, Y.E.: A Selective Traversal Algorithm for Binary
Search Trees; C.ACM, 21(6):445-447, (June 1978). (3.4.1).
325. Dromey, R.G.: A Fast Algorithm for Text Comparison; Australian Computer
J, 11~63-67,(1979). (7.3.1).
326. Du, M.W., Hsieh, T.M., Jea, K.F. and Shieh, D.W.: The Study of a New
Perfect Hash Scheme; IEEE Trans. Software Engineering, SE-9( 3):305-313,
(Mar 1983). (3.3.16).
327. Ducoin, F.: Tri par Adressage Direct; RAIRO Informatique, 13(3):225-237,
(1979). (4.1.6).
328. Dudzinski, K. and Dydek, A.: On a Stable Minimum Storage Merging Algo-
rithm; Inf. Proc. Letters, 12( 1):5-8, (Feb 1981). (4.3.2).
329. Dvorak, S. and Durian, B.: Merging by decomposition revisited; Computer
Journal, 31(6):553-556, (Dec 1988). (4.3.2).
. 330. Dvorak, S. and Durian, B.: Stable linear time sublinear space merging; Com-
puter Journal, 30(4):372-374, (Aug 1987). (4.3.2).
331. Dvorak, S. and Durian, B.: Unstable linear time O( 1) space merging; Computer
Journal, 31(3):279-282, (June 1988). (4.3.2).
332. Dwyer, B.: One More Time-How to Update a Master File; C.ACM, 24(1):3-8,
(Jan 1981). (2.2.2.1).
333. Eades, P. and Staples, J.: On Optimal Trees; J of Algorithms, 2(4):369-384,
(Dec 1981). (3.4.1.6).
334. Eastman, C.M. and Weiss, S.F.: Tree Structures for High Dimensionality Near-
est Neighbor Searching; Inform. Systems, 7:115-122, (1982). (3.5).
335. Eastman, C.M. and Zemankova, M.: Partially Specified Nearest Neighbor
Searches Using k-d Trees; Inf. Proc. Letters, 15(2):53-56, (Sep 1982). (3.5.2).
336. Eastman, C.M.: Optimal Bucket Size for Nearest Neighbor Searching in k-d
Trees; Inf. Proc. Letters, 12(4):165-167, (Aug 1981). (3.5.2).
337. Eberlein, P.J.: A Note on Median Selection and Spider Production; Inf. Proc.
Letters, 9(1):19-22, (July 1979). (5.2).
338. Ecker, A.: The Period of Search for the Quadratic and Related Hash Methods;
Computer Journal, 17(4):340-343, (Nov 1974). (3.3.6).
339. Ehrenfeucht, A. and Haussler, D.: A new distance metric on strings com-
putable in linear time; Discr App Math, 20:191-303, (1988). (7.1.8).
340. Ehrlich, G.: Searching and Sorting Real Numbers; J of Algorithms, 2(1):1-12,
(Mar 1981). (3.2.2, 4.1.6).
341. Eisenbarth, B., Ziviani, N., Gonnet, G.H., Mehlhorn, K. and Wood, D.: The
Theory of Fringe Analysis and Its Application to 2-3 Trees and B-Trees; In-
formation and Control, 55( 1):125-174, (Oct 1982). (3.4.2, 3.4.2.1).
342. Enbody, R.J. and Du, H.C.: Dynamic Hashing Schemes; ACM C. Surveys,
20(2):85-114, (June 1988). (3.3.13, 3.3.14).
343. Eppinger, J.L.: An Empirical Study of Insertion and Deletion in Binary Search
Trees; C.ACM, 26(9):663-669, (Sep 1983). (3.4.1.1).
344. Eppstein, D., Galil, Z., Giancarlo, R. and Italiano, G.: Sparse Dynamic
Programming; Proceedings SODA, San Francisco CA, 1:513-522, (Jan 1990).
(7.1.8, 7.3.1).
328 H A N D B O O K OF A L G O R I T H M S A N D D A T A S T R U C T U R E S
345. Er, M.C. and Lowden, B.G.T.: The Theory and Practice of Constructing an
Optimal Polyphase Sort; Computer Journal, 25(1):93-101, (Feb 1982). (4.4.4).
346. Erkio, H.: A Heuristic Approximation of the Worst Case of Shellsort; BIT,
20 (2):130- 136, ( 198 0). (4.1.4).
347. Erkio, H.: Internal Merge Sorting with Delayed Selection; Inf. Proc. Letters,
11(3):137-140, (NOV 1980). (4.2.1).
348. Erkio, H.: Speeding Sort Algorithms by Special Instructions; BIT, 21(1):2-19,
(1981). (4.1).
349. Erkio, H.: The worst case permutation for mediac-of-three quicksort; Com-
puter Journal, 27(3):276-277, (Aug 1984). (4.1.3).
350. Erkioe, H. and Terkki, R.: Binary Search with Variable-Length Keys Within
an Index Page; Inform. Systems, 8:137-140, (1983). (3.2.1).
351. Espelid, T.O.: Analysis of a Shellsort Algorithm; BIT, 13(4):394-400, (1973).
(4.1.4).
352. Espelid, T.O.: On Replacement Selection and Dinsmores Improvement; BIT,
16(2):133-142, (1976). (4.4.1).
353. Estivill-Castro, V. and Wood, D.: A new measure of presortedness; Informa-
tion and Computation, 83(1):111-119, (Oct 1989). (4.1.8).
354. Eve, J.: The Evaluation of Polynomials; Numer Math, 6:17-21, (1974). (6.4).
355. Fabbrini, F. and Montani, C.: Autumnal Quadtrees; Computer Journal,
29( 5):472-474, (Oct 1986). (3.5.1).
356. Fabri, J.: Some Remarks on p-Way Merging; SIAM J on Computing, 6(2):268-
271, (June 1977). (4.3).
357. Fagin, R., Nievergelt, J., Pippenger, N. and Strong, H.R.: Extendible Hashing-
A Fast Access Method for Dynamic Files; ACM TODS, 4(3):315-344, (Sep
1979). (3.3.13).
358. Faloutsos, C. and Christodoulakis, S.: Description and Performance Analysis
of Signature File Methods; ACM TOOIS, 5(3):237-257, (1987). (7.2.6).
359. Faloutsos, C. and Christodoulakis, S.: Signature Files: An Access Method
for Documents and Its Analytical Performance Evaluation; ACM TOOIS,
2(4):267-388, (Oct 1984). (7.2.6).
360. Faloutsos, C., Sellis, T. and Roussopoulos, N.: Analysis of Object Oriented
Spatial Access Methods; Proceedings ACM SIGMOD, San Francisco CA,
16:426-439, (May 1987). (3.5).
361. Faloutsos, C.: Access Methods for Text; ACM C. Surveys, 17:49-74, (1985).
(7.2).
362. Faloutsos, C. and Roseman, S.: Fractals for Secondary Key Retrieval; Pro-
ceedings ACM PODS, Philadelfia PA, 8, (Mar 1989). (3.5.4).
363. Faloutsos, C.: Multiattribute Hashing using Gray Codes; Proceedings ACM
SIGMOD, Washington DC, 15:227-238, (May 1986). (3.5.4).
364. Faloutsos, C.: Signature Files : an integrated access method for text and
attributes suitable for optical disk storage; BIT, 28(4):736-754, (1988). (7.2.6).
365. Feig, E.: Minimal Algorithms for Bilinear Forms May Have Divisions; J of
Algorithms, 4( 1):81-84, (Mar 1983). (6.3).
366. Feig, E.: On Systems of Bilinear Forms Whose Minimal Division-Free Algo-
rithms are all Bilinear; J of Algorithms, 2(3):261-281, (Sep 1981). (6.3).
REFERENCES 329
367. Feldman, J.A. and Low, J.R.: Comment on Brents Scatter Storage Algorithm;
C.ACM, 16(11):703, (Nov 1973). (3.3.8.1).
368. Felician, L.: Linked-hashing: an Improvement of Open Addressing Techniques
for Large Secondary Storage Files; Inform. Systems, 12(4):385-390, (1987).
(3.3).
369. Fiat, A., Naor, M., Schaffer, A., Schmidt, J.P. and Siegel, A.: Storing and
Searching a Multikey Table; Proceedings STOC-SIGACT, Chicago IL, 20:344-
353, (May 1988). (3.5).
370. Fiat, A., Naor, M., Schmidt, J.P. and Siegel, A.: Non-Oblivious Hashing;
Proceedings STOC-SIGACT, Chicago IL, 20:367-376, (May 1988). (3.3.1).
371. Fiat, A. and Naor, M.: Implicit O(1) Probe Search; Proceedings STOC-
SIGACT, Seattle, Washington, 21:336-344, (May 1989). (3.3.1).
372. Finkel, R.A. and Bentley, J.L.: Quad Trees: A Data Structure for Retrieval
on Composite Keys; Acta Informatica, 4( 1):l-9, (1974). (3.5.1).
373. Fischer, M.J. and Paterson, M.S.: Fishpear: A priority queue algorithm; Pro-
ceedings FOCS, Singer Island FL, 25:375-386, (Oct 1984). (5.1).
374. Fischer, M. J. and Paterson, M.S.: String Matching and Other Products; Com-
plexity of Computation (SIAM-AMS Proceedings 7), American Mathematical
Society, Providence, RI, 7:113-125, (1974). (7.1).
375. Fisher, M.T.R.: On universal binary search trees; Fundamenta Informaticae,
4( 1): 173-184, (1 981). (3.4.1).
376. Flajolet, P., Francon, J. and Vuillemin, J.: Computing Integrated Costs of Se-
quences of Operations with Applications to Dictionaries; Proceedings STOC-
SIGACT, Atlanta GA, 11:49-61, (Apr 1979). (3.1.1, 3.2.1, 3.4.1).
377. Flajolet, P., Francon, J. and Vuillemin, J.: Sequence of Operations Analysis for
Dynamic Data Structures; J of Algorithms, l ( 2 ) : l l l - 1 4 1 , (June 1980). (3.1.1,
3.2, 3.4.1, 5.1).
378. Flajolet, P., Francon, J. and Vuillemin, J.: Towards Analysing Sequences of
Operations for Dynamic Data Structures; Proceedings FOCS, San Juan PR,
20:183-195, (Oct 1979). (3.1.1, 3.2, 3.4.1, 5.1).
379. Flajolet, P. and Martin, N.G.: Probabilistic Counting Algorithms for Data
Base Applications; JCSS, 31(2):182-209, (Oct 1985). (6.1).
380. Flajolet, P. and Odlyzko, A.M.: Exploring Binary Trees and Other Simple
Trees; Proceedings FOCS, Syracuse NY, 21:207-216, (Oct 1980). (3.4.1.2).
381. Flajolet, P. and Odlyzko, A.M.: Limit Distributions for Coefficients of Iterates
of Polynomials with Applications t o Combinatorial Enumerations; Math Proc
Camb Phil SOC,96:237-253, (1984). (3.4.1.2).
382. Flajolet, P. and Odlyzko, A.M.: The Average Height of Binary Trees and
Other Simple Trees; JCSS, 25(2):171-213, (Oct 1982). (3.4.1.2).
383. Flajolet, P., Ottmann, T. and Wood, D.: Search Trees and Bubble Memories;
RAIRO Informatique Theorique, 19( 2):137-164, (1985). (3.4.1.1).
384. Flajolet, P. and Puech, C.: Partial Match Retrieval of Multidimensional Data;
J.ACM, 33(2):371-407, (Apr 1986). (3.5.2, 3.6.2).
385. Flajolet, P. and Puech, C.: Tree Structures for Partial Match Retrieval; Pro-
ceedings FOCS, Tucson AZ, 24:282-288, (Nov 1983). (3.5.1, 3.5.2, 3.6.2).
386. Flajolet, P., Gonnet, G.H., Puech, C. and Robson, M.: The Analysis of Mul-
tidimensional Searching in Quad-Trees; Proceedings SODA91, San Francisco
CA, 2, (Jan 1991). (3.5.1).
330 HANDBOOK OF ALGORITllhfS AND DATA STRUCTURES
387. Flajolet, P., Regnier, M. and Sotteau, D.: Algebraic Methods for Trie Statis-
tics; Annals of Discrete Mathematics, 25:145-188, (1985). (3.4.4, 3.5.1).
388. Flajolet, P. and Saheb, N.: Digital Search Trees and the Generation of an
Exponentially Distributed Variate; Proceedings CAAP, LAquila, Italy, 10:221-
235, (1983). (3.4.4).
389. Flajolet, P. and Sedgewick, R.: Digital Search Trees Revisited; SIAM J on
Computing, 15:748-767, (1986). (3,4.4).
390. Flajolet, P. and Steyaert, J.M.: A Branching Process Arising in Dynamic
Hashing, Trie Searching and Polynomial Factorization; Proceedings ICALP,
Aarhus, 9:239-251, (July 1982). (3.3.13, 3.4.4).
391. Flajolet, P.: Approximate Counting: A Detailed Analysis; BIT, 25:113-134,
(1985). (6.1).
393. Flajolet, P.: On the Performance Evaluation of Extendible Hashing and Trie
Search; Acta Informatica, 20(4):345-369, (1983). (3.3.13, 3.4.4).
393. Flores, I. and Madpis, G.: Average Binary Search Length for Dense Ordered
Lists; C.ACM, 14(9):602-603, (Sep 1971). (3.2.1).
394. Flores, I.: Analysis of Internal Computer Sorting; J.ACM, 8(1):41-80, (Jan
1961). (4.1).
395. Flores, I.: Computer Time for Address Calculation Sorting; J.ACM, 7(4):389-
409, (Oct 1960). (4.1.6, 4.2.3).
396. Floyd, R.W. and Rivest, R.L.: Expected Time Bounds for Selection; C.ACM,
18(3):165-172, (Mar 1975). (5.2).
397. Floyd, R.W. and Smith, A.J.: A Linear Time Two Tape Merge; Inf. Proc.
Letters, 2(5):123-125, (Dec 1973). (4.3).
398. Floyd, R.W.: Algorithm 245, Treesort3; C.ACM, 7(12):701, (Dec 1964). (4.1.5,
5.1.3).
399. Floyd, R.W.: The Exact Time Required to Perform Generalized Addition;
Proceedings FOCS, Berkeley CA, 16:3-5, (Oct 1975). (G.1).
400. Forbes, I<.: Random Files and Subroutine for Creating a Random Address;
Australian Computer J, 4(1):35-40, (1972). (3.3.1).
401. Foster, C.C.: A Generalization of AVL Trees; C.ACM, 16(8):513-517, (Aug
1973). (3.4.1.3).
402. Foster, C.C.: Inforrnation Storage and Retrieval Using AVL Trees; Proceedings
ACM-NCC, Cleveland OH, 20:192-205, (1965). (3.4.1.3).
403. Francon, J., Randrianarimanana, B. and Schott, R.: Analysis of dynamic
algorithms in Knuths model; Theoretical Computer Science, 72(2/3): 147-168,
(May 1990). (3.4.1).
404. Francon, J., Viennot, G. and Vuillemin, J.: Description and Analysis of an
Efficient Priority Queue Representation; Proceedings FOCS, Ann Arbor MI,
19:l-7, (Oct 1978). (5.1.5).
405. Francon, J.: On the analysis of algorithms for trees; Theoretical Computer
Science, 4(2):155-169, (1977). (3.4.1.1).
406. Franklin, W.R.: Padded Lists: Set Operations in Expected O(1og log N) Time;
Inf. Proc. Letters, 9(4):161-166, (Nov 1979). (3.2.3).
407. Frazer, W.D. and Bennett, B.T.: Bounds of Optimal Merge Performance, and
a Strategy for Optimality; J.ACM, 19(4):641-648, (Oct 1972). (4.4).
408. Frazer, W.D. and hlcKellar, A.C.: Samplesort: A Sampling Approach to Min-
imal Storage Tree Sorting; J.AChl, 17(3):496-507, (July 1970). (4.1.3, 4.2.6).
REFERENCES 331
409. Frazer, W.D. and Wong, C.K.: Sorting by Natural Selection; C.ACM,
15(10):910-913, (Oct 1972). (4.4.1).
410. Frederickson, G.N. and Johnson, D.B.: Generalized Selection and Ranking;
Proceedings STOC-SIGACT, Los Angeles CA, 12:420-428, (Apr 1980). (5.2).
411. Frederickson, G.N.: Improving Storage Utilization in Balanced Trees; Proceed-
ings AUerton Conference, Monticello, IL, 17:255-264, (1979). (3.4.2).
412. Frederickson, G.N.: T h e Information Theory Bound is Tight for Selection in
a Heap; Proceedings STOC-SIGACT, Baltimore MD, 22:26-33, (May 1990).
(5.1.3, 5.2).
413. Fredkin, E.: Trie Memory; C.ACM, 3(9):490-499, (Sep 1960). (3.4.4, 7.2.2).
414. Fredman, M.L., Komlos, J. and Szemeredi, E.: Storing a Sparse Table with
0 ( 1 ) Worst Case Access Time; J.ACM, 31(3):538-544, (July 1984). (3.3.16).
415. Fredman, M.L. and Komlos, J.: On the Size of Separating Systems and Fam-
ilies of Perfect Hash Functions; SIAM J Alg Disc Methods, 5(1):61-68, (Mar
1984). (3.3.16).
416. Fredman, M.L., Sedgewick, R., Sleator, D.D. and Tarjan, R.E.: The Pairing
Heap: A New Form of Self-Adjusting Heap; Algorithmica, 1( 1 ) : l l l - 1 2 9 , (Mar
1986). (5.1.3).
417. Fredman, M.L. and Spencer, T.H.: Refined complexity analysis for heap op-
erations;' JCSS, 35(3):269-284, (Dec 1987). (5.1.3).
418. Fredman, M.L. and Tarjan, R.E.: Fibonacci Heaps and Their Uses in Im-
proved Network Optimization Algorithms; J.ACM, 34(3):596-615, (July 1987).
(5.1.3).
419. Fredman, M.L. and Willard, D.E.: Blasting Through the Information Theo-
retic Barrier with Fusion Trees; Proceedings STOC-SIGACT, Baltimore MD,
22:l-7, (May 1990). (3.4.1, 3.5.3, 4.1).
420. Fredman, M.L.: A Lower Bound on the Complexity of Orthogonal Range
Queries; J.ACM, 28(4):696-705, (Oct 1981). (3.6.2).
421. Fredman, M.L.: A Near Optimal Data Structure for a Type of Range Query
Problem; Proceedings STOC-SIGACT, Atlanta GA, 11:62-66, (Apr 1979).
(3.6.2).
422. Fredman, M.L.: How good is the information theory bound in sorting?; The-
oretical Computer Science, 1(4):355-361, (1976). (4.1).
423. Fredman, M.L.: The Inherent Complexity of Dynamic Data Structures Which
Accommodate Range Queries; Proceedings FOCS, Syracuse NY, 21:191-199,
(Oct 1980). (3.6.2).
424. Fredman, M.L.: Two Applications of a Probabilistic Search Technique: Sort-
ing X+Y and Building Balanced Search Trees; Proceedings STOC-SIGACT,
Albuquerque NM, 7:240-344, (May 1975). (3.4.1.6).
425. Freeston, M.: Advances in the design of the BANG file; Proceedings Foun-
dations of Data Organisation and Algorithms, Lecture Notes in Computer
Science 367, Springer-Verlag, Paris, France, 3:322-338, (June 1989). (3.5.4).
426. Freeston, M.: T h e Bang file: a new kind of grid file; Proceedings ACM SIG-
MOD, San Francisco CA, 16:260-269, (May 1987). (3.5.4).
427. Friedman, J.H., Bentley, J.L. and Finkel, R.A.: An Algorithm for Finding
Best Matches in Logarithmic Expected Time; ACM TOMS, 3(3):209-226, (Sep
1977). (3.5.2, 3.6).
428. Friend, E.H.: Sorting on Electronic Computer Systems; J.ACM, 3(3):134-168,
(July 1956). (4.1, 4.3, 4.4).
332 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
472. Gonnet, G.H., Rogers, L.D. and George, J.A.: An Algorithmic and Complexity
Analysis of Interpolation Search; Acta Informatica, 13( 1):39-52, (Jan 1980).
(3.2.2).
473. Gonnet, G.H. and Rogers, L.D.: The Interpolation-Sequential Search Algo-
rithm; Inf. Proc. Letters, 6(4):136-139, (Aug 1977). (3.2.3).
474, Gonnet, G.H. and Tompa, F.W.: A Constructive Approach t o the Design of
Algorithms and Their Data Structures; C.ACM, 26(11):912-920, (Nov 1983).
(2.1, 2.2).
475. Gonnet, G.H. and Tompa, F.W.: Mind your Grammar: A New Approach
to Modelling Text; Proceedings VLDB, Brighton, England, 13:339-346; (Aug
1987). (7.2.7).
476. Gonnet, G.H.: Average Lower Bounds for Open Addressing Hash Coding;
Proceedings Theoretical Computer Science, Waterloo, Ont, :159-162, (Aug
1977). (3.3.9).
477. Gonnet, G.H.: Balancing Binary Trees by Internal Path Reduction; C.ACM,
26 (1 2) :1074- 108 1, (Dec 198 3). (3.4.1.5).
478. Gonnet, G.H.: Efficient Searching of Text and Pictures; (Technical Report
OED-88-02)(1988). (7.2.2, 7.2.3, 7.3.2).
479. Gonnet, G.H.: Expected Length of the Longest Probe Sequence in Hash Code
Searching; J.ACM, 28(2):289-304, (Apr 1981). (3.3.2, 3.3.9, 3.3.10).
480. Gonnet, G.H.: Heaps Applied to Event Driven Mechanisms; C.ACM,
19(7):417-418, (July 1976). (5.1.3).
481. Gonnet, G.H.: Interpolation and Interpolation-Hash Searching; PhD Disser-
tation, University of Waterloo, (Feb 1977). (3.2.2).
482. Gonnet, G.H.: Notes on the Derivation of Asymptotic Expressions from Sum-
mations; Inf. Proc. Letters, 7(4):165-169, (June 1978). (11).
483. Gonnet, G.H.: On Direct Addressing Sort; RAIRO TSI, 3(2):123-127, (Mar
1984). (4.1.6).
484. Gonnet, G.H.: Open Addressing Hashing with Unequal Probability Keys;
JCSS, 21(3):354-367, (Dec 1980). (3.3.2).
485. Gonnet, G.H.: PAT Implementation; (1986). (7.2.4).
486. Gonnet, G.H.: Unstructured Data Bases or Very Efficient Text Searching;
Proceedings ACM PODS, Atlanta, GA, 2:117-124, (Mar 1983). (7.2, 7.2.2).
487. Gonzalez, T.F. and Johnson, D.B.: Sorting Numbers in Linear Expected Time
and Optimal Extra Space; Inf. Proc. Letters, 15(3):119-124, (Oct 1982). .
(4.1.8).
488. Goodman, J.E. and Pollack, R.: Multidimensional Sorting; SIAM J on Com-
puting, 12(3):484-507, (Aug 1983). (4.3).
489. Gordon, D.: Eliminating the flag in threaded binary search trees; Inf. Proc.
Letters, 23(4):209-214, (Apr 1986). (3.4.1).
490. Gori, M. and Soda, G.: An algebraic approach to Cichellis perfect hashing;
BIT, 29(1):2-13, (1989). (3.3.16).
491. Gotlieb, C.C. and Walker, W.A.: A Top-Down Algorithm for Constructing
Nearly Optimal Lexicographical Trees; Graph Theory and Computing, Aca-
demic Press, :303-323, (1972). (3.4.1.6).
492. Gotlieb, C.C.: Sorting on Computers; C.ACM, 6(5):194-201, (May 1963).
(4.4).
REFERENCES 335
514. Halatsis, C. and Philokypru, G.: Pseudo Chaining in Hash Tables; C.ACM,
21(7):554-557, (July 1978). (3.3).
515. Hall, P.A.V. and Dowling, G.R.: Approximate String Matching; ACM C. Sur-
veys, 12:381-402, (1980). (7.1.8).
516. Handley, C.: An in-situ distributive sort; Inf. Proc. Letters, 23(5):265-270,
(Apr 1986). (4.2.5).
517. Hansen, E.R., Patrick, M.L. and Wong, R.L.C.: Polynomial evaluation with
scaling; ACM TOMS, 16(1):86-93, (Mar 1990). (6.4).
518. Hansen, W.J.: A Cost Model for the Internal Organization of B+ Tree Nodes;
ACM TOPLAS, 3(4):508-532, (Oct 1981). (3.4.2).
519. Hansen, W.J.: A Predecessor Algorithm for Ordered Lists; Inf. Proc. Letters,
7( 3):137-1 38, (Apr 1978). (3.1.1).
520. Harper, L.H., Payne, T.H., Savage, J.E. and Straus, E.: Sorting X+Y; C.ACM,
18(6):347-349, (June 1975). (4.2, 4.3).
521. Harrison, M.C.: Implementation of the Substring Test by Hashing; C.ACM,
14:777-779, (1971). (7.1.5, 7.2.6).
522. Hasham, A. and Sack, J.R.: Bounds for min-max heaps; BIT, 27(3):315-323,
(1987). (5.1.3).
523. Head, A X . : Multiplication Modulo n; BIT, 20(1):115-116, (1980). (6.1).
524. Heintz, J. and Schnorr, C.P.: Testing Polynomials Which are Easy to Com-
pute; Proceedings STOC-SIGACT, Los Angeles CA, 12:262-272, (Apr 1980).
(6.4).
525. Heintz, J. and Sieveking, M.: Lower Bounds for Polynomials with Algebraic
Coefficients; Theoretical Computer Science, 11:321-330, (1980). (6.4).
526. Heising, W.P.: Note on Random Addressing Techniques; IBM Systems J,
2(2):112-116, (June 1963). (1.4).
527. Held, G. and Stonebraker, M.: B-trees re-examined; C.ACM, 21(2):139-143,
(Feb 1978). (3.4.2).
528. Hendricks, W.J.: An account of self-organizing systems; SIAM J on Comput-
ing, 5(4):715-723, (Dec 1976). (3.1.2, 3.1.3).
529. Henrich, A., Six, H. and Widmayer, P.: The LSD tree: spatial access to mul-
tidimensional point- and non-point objects; Proceedings VLDB, Amsterdam,
Netherlands, 15:45-54, (Aug 1989). (3.3.13, 3.5).
530. Hermosilla, L. and Olivos, J.: A Bijective Approach to Single rotation trees;
Proceedings SCCC Int. Conf. in Computer Science, Santiago, Chile, 5:22-30,
(1 985). (3.4.1.6).
531. Hertel, S.: Smoothsorts Behavior on Presorted Sequences; Inf. Proc. Letters,
16(4):165-170, (May 1983). (4.1.5).
532. Hester, J.H., Hirschberg, D.S., Huang, S-H.S. and Wong, C.K.: Faster con-
struction of optimal binary split trees; J of Algorithms, 7(3):412-424, (Sep
1986). (3.4.1.6).
533. Hester, J.H., Hirschberg, D.S. and Larmore, L.L.: Construction of optimal
Binary Split trees in the presence of bounded access probabilities; J of Algo-
rithms, 9(33):245-253, (June 1988). (3.4.1.6).
534. Hester, J.H. and Hirschberg, D.S.: Self-Organizing Linear Search; ACM C.
Surveys, 17(3):295-311, (Sep 1985). (3.1.2, 3.1.3).
535. Hester, J.H. and Hirschberg, D.S.: Self-Organizing Search Lists Using Proba-
bilistic Back-Pointers; C.ACh4, 30(12):1074-1079, (Dec 1987). (3.1.2, 3.1.3).
REFERENCES 337
560. Hsiao, Y-S. and Tharp, A.L.: Adaptive Hashing; Inform. Systems, 13(1):111-
128, (1988). (3.4.2.5).
561. Hsu, W.J. and Du, M.W.: Computing a Longest Common Subsequence for A
Set of Strings; BIT, 24:45-59, (1984). (7.3.1).
562. Hsu, W.J. and Du, M.W.: New algorithms for the longest common subsequence
problem; JCSS, 29:133-152, (1984). (7.3.1).
563. Hu, T.C., Kleitman, D.J. and Tamaki, J.K.: Binary Trees Optimum Under
Various Criteria; SIAM J Appl Math, 37(2):246-256, (Oct 1979). (3.4.1.7).
564. Hu, T.C. and Shing, M.T.: Computation of Matrix Chain Products. Part I;
SIAM J on Computing, 11(2):362-373, (May 1982). (6.3).
565. Hu, T.C. and Tan, K.C.: Least Upper Bound on the Cost of Optimum Binary
Search Trees; Acta Informatica, 1(4):307-310, (1972). (3.4.1.7).
566. Hu, T.C. and Tucker, A.C.: Optimal Computer Search Trees and Variable-
Length Alphabetical Codes; SIAM J Appl Math, 21(4):514-532, (Dec 1971).
(3.4.1.7).
567. IIu, T.C.: A New Proof of the T-C Algorithm; SIAM J Appl Math, 25(1):83-
94, (July 1973). (3.4.1.7).
568. Huang, B. and Langston, M.A.: Practical In-Place Merging; C.ACM,
31(3):348-352, (Mar 1988). (4.3, 4.3.1, 4.3.2).
569. Huang, B. and Langston, M.A.: Fast Stable Merging and Sorting in Constant
Extra Space; Proceedings ICCI89, 71-80, (1989). (4.3, 4.3.1, 4.3.2).
570. Huang, B. and Langston, M.A.: Stable Duplicate-key Extraction with Optimal
Time and Space bounds; Acta Informatica, 26(5):473-484, (1989). (4.1).
571. Huang, S-H.S. and Viswanathan, V.: On the construction of weighted time-
optimal B-trees; BIT, 30(2):207-215, (1990). (3.4.2).
572. Huang, S-H.S. and Wong, CX.: Binary search trees with limited rotation;
BIT, 23(4):436-455, (1983). (3.4.1.6).
573. Huang, S-H.S. and Wong, C.K.: Generalized Binary Split Trees; Acta Infor-
matica, 21 (1): 113-123, (1984). (3.4.1.6).
574. Huang, S-H.S. and Wong, C.K.: Optimal Binary Split Trees; J of Algorithms,
5(1):65-79, (Mar 1984). (3.4.1.6).
575. Huang, S-H.S. and Wong, C.K.: Average Number of rotation and access cost
in iR-trees; BIT, 24(3):387-390, (1984). (3.4.1.6).
576. Huang, S-H.S.: Height-balanced trees of order (p, 7,6);ACM TODS,
10(2):261-284, (1985). (3.4.2).
577. Huang, S-H.S.: Optimal Multiway split trees; J of Algorithms, 8(1):146-156,
(Mar 1987). (3.4.1.6, 3.4.1.10).
578. Huang, S-H.S.: Ordered priority queues; BIT, 26(4):442-450, (1986). (5.1).
579. Huddleston, S. and Mehlhorn, I<.: A New Data Structure for Representing
Sorted Lists; Acta Informatica, 17(2):157-184, (1982). (3.4.2.1).
580. Huddleston, S. and Mehlhorn, I<.: Robust Balancing in B-Trees; Lecture Notes
in Computer Science 104, Springer-Verlag, :234-244, (1981). (3.4.2).
581. Huits, M. and Kumar, V.: The Practical Significance of Distributive Parti-
tioning Sort; Inf. Proc. Letters, 8(4):168-169, (Apr 1979). (4.2.5).
582. Hunt, J. and Szymanski, T.G.: A fast algorithm for computing longest common
subsequences; C.ACM, 20:350-353, (1977). (7.3.1).
REFERENCES 339
583. Hutflesz, A., Six, H. and Widmayer, P.: Globally Order Preserving Multidi-
mensional Linear Hashing; Proceedings IEEE Conf. on Data Eng., Los Angeles
CA, 4:572-579, (1988). (3.5.4).
584. Hutflesz, A., Six, H. and Widmayer, P.: Twin Grid Files: Space Optimizing
Access Schemes; Proceedings ACM SIGMOD, Chicago IL, 17:183-190, (June
1988). (3.5.4).
585. Hwang, F.K. and Lin, S.: A Simple Algorithm for Merging Two Disjoint Lin-
early Ordered Sets; SIAM J on Computing, 1(1):31-39, (Mar 1972). (4.3.3).
586. Hwang, F.K. and Lin, S.: Optimal Merging of 2 Elements with n Elements;
Acta Informatica, 1(2):145-158, (1971). (4.3.3).
587. Hwang, F.K.: Optimal Merging of 3 Elements with n Elements; SIAM J on
Computing, 9(2):298-320, (May 1980). (4.3.3).
588. Hyafil, L., Prusker, F. and Vuillemin, J.: An Efficient Algorithm for Comput-
ing Optimal Disk Merge Patterns; Proceedings STOC-SIGACT, Seattle WA,
6:216-229, (Apr 1974). (4.3, 4.4).
589. Hyafil, L. and van de Wiele, J.P.: On the Additive Complexity of Specific
Polynomials; Inf. Proc. Letters, 4(2):45-47, (Nov 1975). (6.4).
590. Hyafil, L.: Bounds for Selection; SIAM J on Computing, 5(1):109-114, (Mar
1976). (5.2).
591. Incerpi, J. and Sedgewick, R.: Improved Upper Bounds on Shellsort; JCSS,
31(2):210-224, (Oct 1985). (4.1.4).
592. Incerpi, J. and Sedgewick, R.: Practical Variations of Shellsort; Inf. Proc.
Letters, 26(1):37-43, (Sep 1987). (4.1.4).
593. Isaac, E.J. and Singleton, R.C.: Sorting by Address Calculation; J.ACM,
3(3):169-174, (July 1956). (4.1.6, 4.2.3).
594. It&, A., Konheim, A.G. and Rodeh, M.: A Sparse Table Implementation of
Priority Queues; Proceedings ICALP, Lecture Notes in Computer Science 115,
Springer-Verlag, Acre, 8:417-430, (July 1981). (5.1).
595. Itai, A.: Optimal Alphabetic Trees; SIAM J on Computing, 5(1):9-18, (Mar
1976). (3.4.1.7).
596. JaJa, J. and Takche, J.: Improved Lower Bounds for some matrix multipli-
cation problems; Inf. Proc. Letters, 21(3):123-127, (Sep 1985). (6.3).
597. JaJa, J.: On the Complexity of Bilinear Forms with Commutativity; SIAM
J on Computing, 9(4):713-728, (Nov 1980). (6.3).
598. JaJa, J.: On the Computational Complexity of the Permanent; Proceedings
FOCS, Tucson AZ, 24:312-319, (Nov 1983). (6.3).
599. JaJa, J.: Optimal Evaluation of Pairs of Bilinear Forms; SIAM J on Com-
puting, 8(3):443-462, (Aug 1979). (6.1, 6.3).
600. Jackowski, B.L., Kubiak, R. and Sokolowski, S.: Complexity of Sorting by
Distributive Partitioning; Inf. Proc. Letters, 9(2):100, (Aug 1979). (4.2.5).
601. Jacobs, D. and Feather, M.: Corrections to A synthesis of Several Sorting
algorithms; Acta Informatica, 26( 12):19-24, (1988). (2.2).
602. Jacobs, M.C.T. and van Emde-Boas, P.: T w o results on Tables; Inf. Proc.
Letters, 22(1):43-48, (Jan 1986). (3.3).
603. Jacquet, P. and Regnier, M.: Trie Partitioning Process: Limiting Distribu-
tions; Proceedings CAAP, Nice, 13:196-210, (1986). (3.4.4).
604. Jaeschke, G. and Osterburg, G.: On Cichellis Minimal Perfect Hash Functions
Method; C.ACM, 23(12):728-729, (Dec 1980). (3.3.16).
340 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
647. Klein, R. and Wood, D.: On the Path Length of Binary Trees; J.ACM,
36(2):280-289, (Apr 1989). (3.4.1).
648. Kleitman, D.J., Meyer, A.R., Rivest, R.L., Spencer, J. and Winklmann, K.:
Coping with Errors in Binary Search Procedures; JCSS, 20(3):396-404, (June
1980). (3.4.1).
649. Kleitman, D. J. and Saks, M.E.: Set Orderings Requiring Costliest Alphabetic
Binary Trees; SIAM J Alg Disc Methods, 2(2):142-146, (June 1981). (3.4.1.7).
650. Knott, G.D. and de la Torre, P.: Hash table collision resolution with direct
chaining; J of Algorithms, 10( 1):20-34, (Mar 1989). (3.3.10).
651. Knott, G.D.: A Balanced Tree Storage and Retrieval Algorithm; Proceedings
ACM Symposium of Information Storage and Retrieval, College Park MD,
175-196, (1971). (3.4.1.3).
652. Knott, G.D.: A Numbering System for Binary Trees; C.ACM, 20(2):113-115,
(Feb 1977). (3.4.1).
653. Knott, G.D.: Deletions in Binary Storage Trees; PhD Dissertation, Computer
Science Department, Stanford University, (May 1975). (3.4.1.9).
654. Knott, G.D.: Direct-chaining with coalescing lists; J of Algorithms, 5( 1):7-21,
(Mar 1984). (3.3.10, 3.3.12).
655. Knott, G.D.: Fixed-Bucket Binary Storage Trees; J of Algorithms, 3(3):276-
287, (Sep 1983). (3.4.1.1, 3.4.4).
656. Knott, G.D.: Hashing Functions; Computer Journal, 18(3):265-278, (Aug
1975). (3.3.1).
657. Knott, G.D.: Linear open addressing and Petersons theorem rehashed; BIT,
28(2):364-371, (1988). (3.3.4).
658. Knott, G.D.: Expandable Open Addressing Hash Table Storage and Retrieval;
Proceedings ACM SIGFIDET Workshop on Data Description, Access and
Control, San Diego CA, :186-206, (Nov 1971). (3.3).
659. Knuth, D.E., Morris, J. and Pratt, V.: Fast Pattern Matching in Strings;
SIAM J on Computing, 6:323-350, (1977). (7.1.2).
660. Knuth, D.E.: Deletions that Preserve Randomness; IEEE Trans. Software
Engineering, 3:351-359, (1977). (3.4.1.9).
661. Knuth, D.E.: Evaluating Polynomials by Computers; C.ACM, 5:595-599,
(1962). (6.4).
662. Knuth, D.E.: Length of Strings for a Merge Sort; C.ACM, 6(11):685-688, (Nov
1963). (4.4.1).
663. Knuth, D.E.: Optimum Binary Search Trees; Acta Informatica, 1(1):14-25,
(1971). (3.4.1.7).
664. Knuth, D.E.: Structured Programming with Go To Statements; ACM C. Sur-
veys, 6(4):261-301, (Dec 1974). (3.1.1, 3.4.1.1, 4.1, 4.1.3).
665. Knuth, D.E.: The Average Time for Carry Propagation; P. Kon Ned A,
81(2):238-242, (1978). (6.1).
666. Kollias, J.G.: An Estimate of Seek Time for Batched Searching of Random or
Index Sequential Struc:tured Files; Computer Journal, 21(2):132-133, (1978).
(3.3, 3.4.3).
667. Konheim, A.G. and Weiss, B.: An Occupancy Discipline and Applications;
SIAM J Appl Math, 14:1266-1374, (1966). (3.3.4).
REFERENCES 343
668. Korsh, J.F.: Greedy Binary Search Trees are Nearly Optimal; Inf. Proc.
Letters, 13( 1):16-19, (Oct 198 1). (3.4.1.6).
669. Korsh, J.F.: Growing Nearly Optimal Binary Search Trees; Inf. Proc. Letters,
14(3):139-143, (May 1982). (3.4.1.6).
670. Kosaraju, S.R.: Insertions and Deletions in One-sided Height-Balanced Trees;
C.ACM, 21(3):226-227, (Mar 1978). (3.4.1.3).
671. Kosaraju, S.R.: Localized Search in Sorted Lists; Proceedings STOC-SIGACT,
Milwaukee WI, 13:62-69, (May 1981). (3.4.2.1).
672. Kosaraju, S.R.: On a Multidimensional Search Problem; Proceedings STOC-
SIGACT, Atlanta GA, 11:67-73, (Apr 1979). (3.5).
673. Kosaraju, S.R.: Efficient Tree Pattern Matching; Proceedings FOCS, Research
Triangle Park, NC, 30:178-183, (1989). (7.1.7).
674. Kral, J.: Some Properties of the Scatter Storage Technique with Linear Prob-
ing; Computer Journal, 14(3):145-149, (1971). (3.3.4).
675. Krichersky, R.E.: Optimal Hashing; Information and Control, 62( 1):64-92,
(July 1984). (3.3.9).
676. Kriegel, H.P. and Kwong, Y.S.: Insertion-Safeness in Balanced Trees; Inf.
Proc. Letters, 16(5):259-264, (June 1983). (3.4.2.1).
677. Kriegel, H.P. and Seeger, B.: Multidimensional Order Preserving Linear Hash-
ing with Partial Expansions; Proceedings Int. Conf. on Database Theory, Lec-
ture Notes in Computer Science, Springer-Verlag, Rome, 243:203-220, (1986).
(3.5.4).
678. Kriegel, H.P. and Seeger, B.: PLOP-Hashing: A Grid File without Directory;
Proceedings IEEE Conf. on Data Eng., Los Angeles, CA, 4:369-376, (1988).
(3.5.4).
679. Kriegel, H.P., Vaishnavi, V.K. and Wood, D.: 2-3 Brother Trees; BIT,
18(4):425-435, (1978). (3.4.2.1).
680. Krithivasan, K. and Sitalakshmi, R.: Efficient Two-Dimensional Pattern
Matching in the Presence of Errors; Information Sciences, 43:169-184, (1987).
(7.1.8, 7.3.2).
681. Kritzinger, P.S. and Graham, J.W.: A Theorem in the Theory of Compromise
Merge Methods; J.ACM, 21(1):157-160, (Jan 1974). (4.4.4, 4.4.3).
682. Kronmal, R.A. and Tarter, M.E.: Cumulative Polygon Address Calculation
Sorting; Proceedings ACM-NCC, Cleveland OH, 20:376-384, (1965). (4.1.6).
683. Kronrod, M.A.: An Optimal Ordering Algorithm Without a Field of Opera-
tion; Dokladi Akademia Nauk SSSR, 186:1256-1258, (1969). (4.3.2).
684. Kruijer, H.S.M.: The Interpolated File Search Method; Informatie, 16( 11):612-
615, (Nov 1974). (3.2.2).
685. Kumar, S.K. and Ranzon, C.P.: A linear space algorithm for the LCS problem;
Acta Informatica, 24(3):353-362, (1987). (7.3.1).
686. Kung, H.T.: A New Upper Bound on the Complexity of Derivative Evaluation;
Inf. Proc. Letters, 2(5):146-147, (Dec 1973). (6.4).
687. Kuspert, K.: Storage Utilization in B*-trees with a Generalized Overflow Tech-
nique; Acta Informatica, 29(1):35-56, (1983). (3.4.2).
688. Ladi, E., Luccio, F., Mugnai, C. and Pagli, L.: On two dimensional data
organization I; Fundamenta Informaticae, 3(2):211-226, (1979). (3.5).
689. Lai, T.W. and Wood, D.: Implicit Selection; Proceedings SWAT 88, Halmstad,
Sweden, 1:14-23, (1988). (5.2).
344 HANDBOOK OF ALGORJTHMS AND DATA STRUCTURES
690. Lan, K.K.: A note on synthesis and Classification of Sorting Algorithms; Acta
Informatica, 27( 1):73-80, (1989). (2.2).
691. Landau, G.M. and Vishkin, U.: Efficient String Matching in the Presence of
Errors; Proceedings FOGS, Portland OR, 26:126-136, (Oct 1985). (7.1.8).
692. Landau, G.M. and Vishkin, U.: Efficient String Matching with IC Mismatches;
Theoretical Computer Science, 43239-249, (1986). (7.1.8).
693. Landau, G.M. and Vishkin, U.: Introducing efficient parallelism into approxi-
mate string matching and a new serial algorithm; Proceedings STOC-SIGACT,
Berkeley CA, 18:220-230, (May 1986). (7.1.8).
694. Landau, G.M.: String Matching in Erroneous Input; PhD Dissertation, Tel
Aviv University, Tel Aviv, Israel, (1986). (7.1.8).
695. Lang, S.D.: Analysis of recursive batched interpolation sort; BIT, 30( 1):42-50,
(1990). (4.1.6).
696. Langenhop, C.E. and Wright, W.E.: A model of the Dynamic Behavior of
B-trees; Acta Informatica, 27( 1):41-60, (1989). (3.4.2).
697. Langenhop, C.E. and Wright, W.E.: An Efficient Model for Representing and
Analyzing B-Trees; Proceedings ACM-NCC, Denver CO, 40:35-40, (1985).
(3.42).
698. Langenhop, C.E. and Wright, W.E.: Probabilities related t o Father-Son Dis-
tances in Binary search; SIAM J on Computing, 15(2):520-530, (May 1986).
(3.4.1).
699. Larmore, L.L.: A Subquadratic algorithm for constructing approximately op-
timal binary search trees; J of Algorithms, 8(4):579-591, (Dec 1987). (3.4.1.7).
700. Larson, J.A. and Walden, W.E.: Comparing Insertion Schemes Used to Update
3-2 Trees; Inform. Systems, 4:127-136, (1979). (3.4.2.1).
701. Larson, P. and Kajla, A.: File Organization: Implementation of a Method
Guaranteeing Retrieval in one Access; C.ACM, 27(7):670-677, (July 1984).
(3.3.15).
702. Larson, P. and Ramakrishna, M.V.: External Perfect Hashing; Proceedings
ACM SIGMOD, Austin TX, 14:190-200, (June 1985). (3.3.16).
703. Larson, P.: A Method for Speeding u p Text Retrieval; Proceedings ACM
SIGMOD, San Jose CA, 12:117-123, (May 1983). (7.2, 7.2.6).
704. Larson, P.: A Single-File Version of Linear Hashing with Partial Expansions;
Proceedings VLDB, Mexico City, 8:300-309, (Sep 1982). (3.3.14).
705. Larson, P.: Analysis of Hashing with Chaining in the Prime Area; J of Algo-
rithms, 5(1):36-47, (1984). (3.3).
706. Larson, P.: Analysis of Index-Sequential Files with Overflow Chaining; ACM
TODS, 6(4):671-680, (Dec 1981). (3.4.3).
707. Larson, P.: Analysis of Repeated Hashing; BIT, 20(1):25-32, (1980). (3.3).
708. Larson, P.: Analysis of Uniform Hashing; J.ACM, 30(4):805-819, (Oct 1983).
(3.3.2).
709. Larson, P.: Dynamic Hash Tables; C.ACM, 31(4):446-457, (Apr 1988).
(3.3.14).
710. Larson, P.: Dynamic Hashing; BIT, 18(2):184-201, (1978). (3.3.14).
71 1. Larson, P.: Expected Worst-case Performance of Hash Files; Computer Jour-
nal, 25(3):347-352, (Aug 1982). (3.3.3, 3.3.4, 3.3.11).
REFERENCES 345
712. Larson, P.: Frequency Loading and Linear Probing; BIT, 19(2):223-228,
(1979). (3.3.4).
713. Larson, P.: Linear Hashing with Overflow-Handling by Linear Probing; ACM
TODS, 10(1):75-89, (Mar 1985). (3.3.14).
714. Larson, P.:Linear Hashing with Partial Expansions; Proceedings VLDB, Mon-
treal, 6:224-232,(1980). (3.3.14).
715. Larson, P.: Linear Hashing with Separators - A Dynamic Hashing Scheme
Achieving One-Access Retrieval; ACM TODS, 13(3):366-388, (1988). (3.3.14,
3.3.15).
716. Larson, P.: Performance Analysis of a Single-File Version of Linear Hashing;
Computer Journal, 28(3):319-329, (1985). (3.3.14).
717. Larson, P.: Performance Analysis of Linear Hashing with Partial Expansions;
ACM TODS, 7(4):566-587, (Dec 1982). (3.3.14).
718. Lea, D.:Digital and Hilbert A-D trees; Inf. Proc. Letters, 27(1):35-41, (Feb
1988). (3.5.2).
719. Lee, C.C., Lee, D.T. and Wong, C.K.: Generating Binary Trees of Bounded
Height; Acta Informatica, 23(5):529-544, (1986). (3.4,l).
720. Lee, D.T. and Wong, C.K.: Quintary Trees: A File Structure for Multi-
dimensional Database System; ACM TODS, 5(3):339-353, (Sep 1980). (3.5).
721. Lee, D.T. and Wong, C.K.: Worst-Case Analysis for Region and Partial Region
Searches in Multidimensional Binary Search Trees and Balanced Quad Trees;
Acta Informatica, 9(1):23-29, (1977). (3.5.1,3.5.2, 3.6.2).
722. Lee, K.P.: A Linear Algorithm for Copying Binary Trees Using Bounded
Workspace; C.ACM, 23(3):159-162, (Mar 1980). (3-4.1).
723. Leipala, T.: Qn a Generalization of Binary Search; Inf. Proc. Letters,
8(5):230-233, (June 1979). (3.2.1).
724. Leipala, T.: On Optimal Multilevel Indexed Sequential Files; Inf. Proc. Let-
ters, 15(5):191-195, (Dec 1982). (3.4.3).
725. Leipala, T.:On the Design of Qne-Level Indexed Sequential Files; Int. J of
Comp and Inf Sciences, 10(3):177-1.86,(June 1981). (3.1.5,3.4.3).
726. Lentfert, P. and Overmars, M.H.: Data structures in a real time environment;
Inf. Proc. Letters, 31(3):151-155, (May 1989). (3.4.1,5.1).
727. Lescarne, P. and Steyaert, J.M.: On the Study of Data Structures: Binary
Tournaments with Repeated Keys; Proceedings ICALP, Lecture Notes in Com-
puter Science 154, Springer-Verlag, Barcelona, Spain, 10:466-477,(July 1983).
(3.4.1).
728. Lesuisse, R.: Some Lessons Drawn from the History of the Binary Search
Algorithm; Computer Journal, 26(2): 154-163, (May 1983). (3.2.1,2.2.2.1).
729. Leung, H.C.: Approximate storage utilization of B-trees: A simple derivation
and generalizations; Inf. Proc. Letters, 19(4):199-201, (Nov 1984). (3.4.2).
730. Levcopoulos, C., Lingas, A. and Sack, J.R.: Heuristics for Optimum Binary
Search Trees and Minimum Weight Trangulation problems; Theoretical Com-
pu ter Science, 66(2): 181-204,(1989). (3.4.1.7).
731. Levcopoulos, C., Lingas, A. and Sack, J.R.: Nearly Optimal heuristics for Bi-
nary Search Trees with Geometric Applications; Proceedings ICALP, Lecture
Notes in Computer Science 267, Springer-Verlag, Karslruhe, West Germany,
14:376-385,(1987). (3.4.1.6,3.4.1.7).
346 HANDBOOK OF ALGORITH.hrlS AND DATA STRUCTURES
732. Levcopoulos, C. and Overmars, M.H.: A balanced search tree with 0(1)worst
case update time; Acta Informatica, 26(3):269-278,(1988). (3.4.1).
733. Levcopoulos, C. and Petersson, 0.: Heapsort - adapted for presorted files; Pro-
ceedings Workshop in Algorithms and Data Structures, Lecture Notes in Com-
puter Science 382, Springer-Verlag, Ottawa, Canada, 1:499-509,(Aug 1989).
(4.1.8).
734. Levcopoulos, C. and Petersson, 0.: Sorting shuffled monotone sequences; Pro-
ceedings Scandinavian Workshop in Algorithmic Theory, SWATSO, Lecture
Notes in Computer Science 447, Springer-Verlag, Bergen, Norway, 2:181-191,
(July 1990). (4.1.8).
735. Levenshtein, V.: Binary Codes capable of correcting deletions, insertions and
reversals; Soviet Phys. Dokl, 6:126-136,(1966). (7.1.8).
736. Levenshtein, V.: Binary codes capable of correcting spurious insertions and
deletions of ones; Problems of Information Transmission, 1:8-17, (1965).
(7.1.8).
737. Lewis, G.N., Boynton, N.J. and Burton, F.W.: Expected Complexity of Fast
Search with Uniformly Distributed Data; Inf. Proc. Letters, 13( 1):4-7, (Oct
1981). (3.2.2).
738. Li, L.: Ranking and Unranking AVL-Trees; SIAM J on Computing, 15(4):1025-
1035, (Nov 1986). (3.4.1.3).
739. Li, M. and Yesha, Y.: String matching cannot be done by a two-head one way
deterministic finite automaton; Inf. Proc. Letters, 22:231-235,(1986). (7.1).
740. Li, S. and Loew, M.H.: Adjacency Detection Using Quadcodes; C.ACM,
30(7):627-631, (July 1987). (3.5.1.1).
741. Li, S. and Loew, M.H.: The Quadcode and its Arithmetic; C.ACM, 30(7):621-
626, (July 1987). (3.5.1.1).
742. Linial, N. and Saks, M.E.: Searching ordered structures; J of Algorithms,
6(1):86-103, (Mar 1985). (3.2).
743. Linnainmaa, S.: Software for Doubled-Precision Floating-point Computations;
ACM TOMS, 7(3):272-283, (Sep 1981). (6.1).
744. Lipski, Jr., W., Ladi, E., Luccio, F., Mugnai, C. and Pagli, L.: On two dimen-
sional data organization 11; Fundamenta Informaticae, 3(3):245-260, (1979).
(3.5).
745. Lipton, R.J. and Dobkin, D.: Complexity Measures and Hierarchies for the
Evaluation of Integers, Polynomials and N-Linear Forms; Proceedings STOC-
SIGACT, Albuquerque NM, 7:l-5,(May 1975). (6.4).
746, Lipton, R.J., Rosenberg, A.L. and Yao, A.C-C.: External Hashing Schemes
for Collection of Data Structures; J.ACM, 27(1):81-95,(Jan 1980). (3.3).
747. Lipton, R. J. and Stockmeyer, L. J.: Evaluation of Polynomials with Super-
Preconditioning; Proceedings STOC-SIGACT, Hershey PA, 8:174-180,(May
1976). (6.4).
748. Lipton, R.J.: Polynomials With 0-1 Coefficients That are Hard t o Evaluate;
SIAM 3 on Computing, 7(1):61-69,(Feb 1978). (6.4).
749. Litwin, W. and Lomet, D.B.: A New Method for Fast Data Searches with
Keys; IEEE Software, 4(2):16-24,(Mar 1987). (3.3.14,3.4.2).
750. Litwin, W. and Lomet, D.B.: The Bounded Disorder Access Method; Pro-
ceedings IEEE Conf. on Data Eng., Los Aiigeles CA, 2:38-48,(1986). (3.3.14,
3.4.3.5,3.4.4).
REFERENCES 347
751. Litwin, W.: Linear Hashing: A New Tool for File and Table Addressirig;
Proceedings VLDB, Montreal, 6:212-223, (1980). (3.3.14).
752. Litwin, W.: Linear Virtual Hashing: A New Tool for Files and Tables Imple-
mentation; Proceedings IFIP TC-2 Conference, Venice, Italy, (1979). (3.3.14).
753. Litwin, W.: Trie Hashing; Proceedings ACM SIGMOD, Ann Arbor MI, 11:19-
29, (Apr 1981). (3.4.4, 3.3).
754. Litwin, W.: Virtual Hashing: A Dynamically Changing Hashing; Proceedings
VLDB, Berlin, 4:517-523, (Sep 1978). (3.3.14).
755. Lloyd, J.W. and Ramamohanarao, K.: Partial-Match Retrieval for Dynamic
Files; BIT, 22(2):150-168, (1982). (3.3.13, 3.3.14, 3.6.2).
756. Lloyd, J.W.: Optimal Partial-Match Retrieval; BIT, 20(4):406-413, (1980).
(3.6.2).
757. Lodi, E., Luccio, F., Pagli, L. and Santoro, N.: Random Access in a List
Environment; Inform. Systems, 2:ll-17, (1976). (3.1).
758. Lodi, E. and Luccio, F.: Split sequence hash search; Inf. Proc. Letters,
20(3):131-136, (Apr 1985). (3.3.7).
759. Loeser, R.: Some Performance Tests of Quicksort and Descendants; C.ACM,
17(3):143-152, (Mar 1974). (4.1.3).
760. Lomet, D.B. and Salzberg, B.: Access Methods for Multiversion Data; Pro-
ceedings ACM SIGMOD, Portland OR, 18:315-324, (May 1989). (3.4.2.5).
761. Lomet, D.B. and Salzberg, B.: The hB-tree: A robust multiattribute search
structure; Proceedings IEEE Conf. on Data Eng., Los Angeles CA, 5, (Feb
1989). (3.5).
762. Lomet, D.B. and Salzberg, B.: The Performance of a Multiversion Access
Method; Proceedings ACM SIGMOD, Atlantic City N J , 19:353-363, (May
199 0). (3.4.2.5).
763. Lomet, D.B.: A High Performance, Universal, Key Associative Access Method;
Proceedings ACM SIGMOD, San Jose CA, 13:120-133, (May 1983). (3.3.13,
3.4.2.5).
764. Lomet, D.B.: A Simple Bounded Disorder File Organization with Good Per-
formance; ACM TODS, 13(4):525-551, (1988). (3.3.14, 3.4.4).
765. Lomet, D.B.: Bounded Index Exponential Hashing; ACM TODS, 8(1):136-
165, (Mar 1983). (3.3.13).
766. Lomet, D.B.: Digital B-Trees; Proceedings VLDB, Cannes, 7:333-344, (Sep
1981). (3.4.2.5, 3.4.4).
767. Lomet, D.B.: Partial Expansions for file organizations with an index; ACM
TODS, 12:65-84, (1987). (3.4.2).
768. Lotka, A.J.: The Frequency Distribution of Scientific Production; J of the
Washington Academy of Sciences, 16( 12):317-333, (1926). (1.3).
769. Lotti, G. and Romani, F.: Application of Approximating Algorithms to
Boolean Matrix Multiplication; IEEE Trans. on Computers, C29( 10):927-928,
(Oct 1980). (6.3).
770. Lowden, B.G.T.: A Note on the Oscillating Sort; Computer Journal, 20(1):92,
(Feb 1977). (4.4.5).
771. Luccio, F. and Pagli, L.: Comment on Generalized AVL Trees; C.ACM,
23(7):394-395, (July 1980). (3.4.1.3).
348 HANDBOOK OF ALGORITIIMS AND DATA STRUCTURES
772. Luccio, F. and Pagli, L.: On the Height of Height-Balanced Trees; IEEE Trans.
on Computers, C25(1):87-90, (Jan 1976). (3.4.1.3).
773. Luccio, F. and Pagli, L.: Power Trees; C.ACM, 21(11):941-947, (Nov 1978).
(3.4.1.3).
774. Luccio, F. and Pagli, L.: Rebalancing Height Balanced Trees; IEEE Trans. on
Computers, C27( 5):386-396, (May 1978). (3.4.1.3).
775. Luccio, F., Regnier, h'i. and Schott, R.: Disc and other related data struc-
tures; Proceedings Workshop in Algorithms and Data Structures, Lecture
Notes in Computer Science 382, Springer-Verlag, Ottawa, Canada, 1:192-205,
(Aug 1989). (3.4.4).
776. Luccio, F.: Weighted Increment Linear Search for Scatter Tables; C.ACM,
15(12):1045-1047, (Dec 1972). (3.3.5).
777. Lueker, G.S. and Molodowitch, M.: More Analysis of Double Hashing; Pro-
ceedings STOC-SIGACT, Chicago IL, 20:354-359, (May 1988). (3.3.5).
778. Lueker, G.S. and Willard, D.E.: A Data Structure for Dynamic Range Queries;
Inf. Proc. Letters, 15(5):209-213, (Dec 1982). (3.6.2).
779. Lueker, G.S.: A Data Structure for Orthogonal Range Queries; Proceedings
FOCS, Ann Arbor MI, 19:28-34, (Oct 1978). (3.6.2).
780. Lum, V.Y., Yuen, P.S.T. and Dodd, M.: Key-to-Address Transform Tech-
niques: a Fundamental performance Study on Large Existing Formatted Files;
C.ACM, 14(4):238-239, (1971). (3.3.1).
781. Lum, V.Y. and Yuen, P.S.T.: Additional Results on Key-to-Address Transform
Techniques: A Fundamental Performance Study on Large Existing Formatted
Files; C.ACM, 15(11):996-997, (Nov 1972). (3.3.1).
782. Lum, V.Y.: General Perormance Analysis of Key-to-Address Transformation
Methods Using an Abstract File Concept; C.ACM, 16( 10):603-612, (Oct 1973).
(3.3.1).
783. Lum, V.Y.: Multi-Attribute Retrieval with Combined Indexes; C.ACM,
13(11):660-665, (NOV 1970). (3.4.3, 3.5).
784. Lynch, W.C.: More combinatorial problems on certain trees; Computer Jour-
nal, 7:299-302, (1965). (3.4.1). '
785. Lyon, G.E.: Batch Scheduling From Short Lists; Inf. Proc. Letters, 8(2):57-59,
(Feb 1979). (3.3.8.2).
786. Lyon, G.E.: Hashing with Linear Probing and Frequency Ordering; J Res.
Nat. Bureau of Standards, 83(5):445-447, (Sep 1978). (3.3.4).
787. Lyon, G.E.: Packed Scatter Tables; C.ACM, 21(10):857-865, (Oct 1978).
(3.3.9).
788. MacCallum, I.R.: A Simple Analysis of the nth Order Polyphase Sort; Com-
puter Journal, 16(1):16-18, (Feb 1973). (4.4.4).
789. MacLaren, h1.D.: Internal Sorting by Radix Plus Shifting; J.ACM, 13(3):404-
411, (July 1966). (4.2.4).
790. MacVeigh, D.T.: Effect of Data Representation on Cost of Sparse Matrix
Operations; Acta Informatica, 7:361-394, (1977). (2.1).
791. Madhavan, C.E.V.: Secondary attribute retrieval using tree data structures;
Theoretical Computer Science, 33( 1): 10 7- 116, (1 984). (3.5).
792. Madison, J.A.T.: Fast Lookup in Hash Tables with Direct Rehashing; Com-
puter Journal, 23(2):188-189, (Feb 1980). (3.3.8.2).
REFERENCES 349
793. Mahmoud, H.M. and Pittel, B.: Analysis of the space of search trees under
the random insertion algorithm; J o f Algorithms; 10(1):52-75, (Mar 1989).
(3.4.1.10).
794. Mahmoud, H.M. and Pittel, B.: On the Most Probable Shape of a Search Tree
Grown from a Random Permutation; SIAM J Alg Disc Methods, 5(1):69-81,
(Mar 1984). (3.4.1.1).
795. Mahmoud, H.M.: On the Average Internal Path length of m-ary search trees;
Acta Informatica, 23( 1):lll-117, (1986). (3.4.1.lo).
796. Mahmoud, H.M.: The expected distribution of degrees in random binary
search trees; Computer Journal, 29(1):36-37, (Feb 1986). (3.4.1.1).
797. Maier, D. and Salveter, S.C.: Hysterical B-Trees; Inf. Proc. Letters, 12(4):199-
202, (Aug 1981). (3.4.2.1).
798. Maier, D.: The Complexity of some Problems on Subsequences and Superse-
quences; J.ACM, 25:322-336, (1978). (7.3.1, 7.3).
799. Main, M. and Lorentz, R.: An O ( nlog n) Algorithm for Finding all Repetitions
in a String; J of Algorithms, 1:359-373, (1980). (7.1).
800. Mairson, H.G.: Average Case Lower Bounds on the Construction and Search-
ing of Partial Orders; Proceedings FOCS, Portland OR, 26:303-311, (Oct
1985). (5.1).
801. Mairson, H.G.: The Program Complexity of Searching a Table; Proceedings
FOCS, Tucson AZ, 24:40-47, (Nov 1983). (3.3.16).
802. Majster, M. and Reiser, A.: Efficient On-Line Construction and Correction of
Position Trees; SIAM J on Computing, 9:785-807, (1980). (7.2.2).
803. Makarov, O.M.: Using Duality for the Synthesis of an Optimal Algorithm
Involving Matrix Multiplication; Inf. Proc. Letters, 13(2):48-49, (Nov 1981).
(6.3).
804. Makinen, E.: Constructing a binary tree from its traversals; BIT, 29(3):572-
575, (1989). (3.4.1).
805. Makinen, E.: On Linear Search Heuristics; Inf. Proc. Letters, 29( 1):35-36,
(Sep 1988). (3.1.2, 3.1.3).
806. Makinen, E.: On top-down splaying; BIT, 27(3):330-339, (1987). (3.4.1.6).
807. Malcolm, W.D.: String Distribution for the Polyphase Sort; C.ACM, 6(5):217-
220, (May 1963). (4.4.4).
808. Mallach, E.G.: Scatter Storage Techniques: A Unifying Viewpoint and a
Method for Reducing Retrieval Times; Computer Journal, 20(2):137-140, (May
1977). (3.3.8.2).
809. Maly, K.: A Note on Virtual Memory Indexes; C.ACM, 21(9):786-787, (Sep
1978). (3.4.2).
810. Maly, K.: Compressed Tries; C.ACM, 19(7):409-415, (July 1976). (3.4.4).
811. Manacher, G.K., Bui, T.D. and Mai, T.: Optimum Combinations of Sorting
and Merging; J.ACM, 36(3):290-334, (Apr 1989). (4.3.3).
812. Manacher, G.K.: Significant Improvements to the Hwang-Lin Merging Algo-
rithm; J.ACM, 26(3):434-440, (July 1979). (4.3.3).
813. Manacher, G.K.: The Ford-Johnson Sorting Algorithm is Not Optimal;
J.ACM, 26(3):441-456, (July 1979). (4.1).
814. Manber, U. and Baeza-Yates, R.A.: An Algorithm for String Matching with a
Sequence of Dont Cares; Inf. Proc. Letters, to app.. (7.2.4, 7.3).
350 HANDBOOK OF ALGORITIIMS AND DATA STRUCTURES
815. Manber, U. and Myers, G.: Suffix Arrays: A new method for on-line
string searches; Proceedings SODA, San Francisco CA, 1:319-327, (Jan 1990).
(7.2.4).
816. Manber, U.: Using Induction to Design Algorithms; C.ACM, 31(11):1300-1313,
(1988). (2.2).
817. Manker, H.H.: Multiphase Sorting; C.ACM, 6(5):214-217, (May 1963). (4.4.4).
818. Mannila, H. and Ukkonen, E.: A Simple Linear-time algorithm for in-situ
merging; Inf. Proc. Letters, 18(4):203-208, (May 1984). (4.3.2).
819. Mannila, 1.: Measures of Presortedness and Optimal Sorting Algorithms; Pro-
ceedings ICALP, Lecture Notes in Computer Science 267, Springer-Verlag,
Antwerp, Belgium, 11:324-336, (1984). (4.1.8).
820. Manolopoulos, Y.P., Kollias, J.G. and Burton, F.W.: Batched interpolation
search; Computer Journal, 30(6):565-568, (Dec 1987). (3.2.2).
821. Manolopoulos, Y.P., Kollias, J.G. and Hatzupoulos, M.: Sequential vs. Binary
Batched searching; Computer Journal, 29(4):368-372, (Aug 1986). (3.1, 3.2).
822. Manolopoulos, Y.P.: Batched search of index sequential files; Inf. Proc. Let-
ters, 22(5):267-272, (Apr 1986). (3.4.3).
823. Mansour, Y., Nisan, N. and Tiwari, P.: T h e Computational Complexity of
Universal Hashing; Proceedings STOC-SIGACT, Baltimore MD, 22:235-243,
(May 1990). (3.3.1).
824. Martin, W.A. and Ness, D.N.: Optimizing Binary Trees Grown with a Sorting
Algorithm; C.ACM, 15(2):88-93, (Feb 1972). (3.4.1.6).
825. Martin, W.A.: Sorting; Computing Surveys, 3(4):147-174, (Dec 1971). (4.1,
4.4).
826. Maruyama, I<. and Smith, S.E.: Analysis of Design Alternatives for Virtual
Memory Indexes; C.ACM, 20(4):245-254, (Apr 1977). (3.4.3).
827. Maurer, H.A., Ottmann, T. and Six, H.: Implementing Dictionaries Using
Binary Trees of Very Small Height; Inf. Proc. Letters, S(1):ll-14, (May 1976).
(3.4.2.3).
828. Maurer, W.D. and Lewis, T.G.: Hash table methods; ACM C. Surveys, 7(1):5-
19, (Mar 1975). (3.3).
829. Maurer, W.D.: An Improved Hash Code for Scatter Storage; C.ACM, 11(1):35-
38, (Jan 1968). (3.3.1, 3.3.6).
830. McAllester, R.L.: Polyphase Sorting with Overlapped Rewind; C.ACM,
7(3):158-159, (Mar 1964). (4.4.4).
831. McCabe, J.: On serial files with relocatable records; Operations Research,
13(4):609-618, (1965). (3.1.2).
832. McCreight, E.M.: Pagination of B*-trees with variable-length records;
C.ACM, 20(9):670-674, (Sep 1977). (3.4.2).
833. McCreight, E.M.: Priority search trees; SIAM J on Computing, 14(2):257-276,
(May 1985). (5.1.6).
834. McCulloch, C.M.: Quickshunt - A Distributive Sorting Algorithm; Computer
Journal, 25(1):102-104, (Feb 1982). (4.2.4, 4.4).
835. McDiarmid, C.J.H. and Reed, B.A.: Building Heaps Fast; J of Algorithms,
10(3):352-365, (Sep 1989). (5.1.3).
836. McDonell, I<.J.: An Inverted Index Implementation; Computer Journal,
20(2):116-123, (1977). (7.2.1, 7.2.2).
REFERENCES 351
837. McKellar, A.C. and Wong, C.K.: Bounds on Algorithms for String Generation;
Acta Informatica, 1(4):311-319, (1972). (4.4.1).
838. McKellar, A.C. and Wong, C.K.: Dynamic Placement of Records in Linear
Storage; J.ACM, 25(3):431-434, (July 1978). (3.1).
839. Mehlhorn, K. and Naher, S.: Dynamic Fractional cascading; Algorithmica,
5(2):2 15-141, (199 0). (2 -2).
840. Mehlhorn, K. and Overmars, M.H.: Optimal Dynamization of Decomposable
Searching Problems; Inf. Proc. Letters, 12(2):93-98, (Apr 1981). (2.2).
841. Mehlhorn, K. and Tsakalidis, A.K.: An Amortized Analysis of Insertions into
AVL-Trees; SIAM J on Computing, 15(1):22-33, (Feb 1986). (3.4.1.3).
842. Mehlhorn, K. and Tsakalidis, A.K.: Dynamic Interpolation Search; Proceed-
ings ICALP, Lecture Notes in Computer Science 194, Springer-Verlag, Naf-
plion, Greece, 12:424-434, (1985). (3.2.2).
843. Mehlhorn, K.: A Best Possible Bound for the Weighted Path Length of Binary
Search Trees; SIAM J on Computing, 6(2):235-239, (June 1977). (3.4.1.6).
844. Mehlhorn, K.: A Partial Analysis of Height-Balanced Trees Under Random
Insertions and Deletions; SIAM J on Computing, 11(4):748-760, (Nov 1982).
(3.4.1.3, 3.4.2.1, 3.4.2.3).
845. Mehlhorn, K.: Dynamic Binary Search; SIAM J on Computing, 8(2):175-198,
(May 1979). (3.4.1.6, 3.4.4).
846. Mehlhorn, K.: Nearly Optimal Binary Search Trees; Acta Informatica, 5:287-
295, (1975). (3.4.1.6).
847. Mehlhorn, K.: On the Program Size of Perfect and Universal Hash Functions;
Proceedings FOGS, Chicago IL, 23:170-175, (Oct 1982). (3.3.16, 3.3.1).
848. Mehlhorn, IC.: Sorting Presorted Files; Proceedings GI Conference on Theoret-
ical Computer Science, Lecture Notes in Computer Science 67, Springer-Verlag,
Aachen, Germany, 4:199-212, (1979). (4.1).
849. Meijer, H. and Akl, S.G.: The Design and Analysis of a New Hybrid Sorting
Algorithm; Inf. Proc. Letters, 10(4):313-218, (July 1980). (4.1.1, 4.1.8, 4.2.5).
850. Meir, A. and Moon, J.W.: On the Altitude of Nodes in Random Trees; Canad
J Math, 30(5):997-1015, (1978). (3.4.1.1).
851. Melville, R. and Gries, D.: Controlled Density Sorting; Inf. Proc. Letters,
10(4):169-172, (July 1980). (4.1.2, 4.1.7).
852. Mendelson, H. and Yechiali, U.: A New Approach t o the Analysis of Linear
Probing Schemes; J.ACM, 37(2):474-483, (July 1980). (3.3.4).
853. Mendelson, H. and Yechiali, U.: Performance Measures for Ordered Lists in
Random-Access Files; J.ACM, 26(4):654-667, (Oct 1979). (3.3).
854. Mendelson, H.: Analysis of Linear Probing with Buckets; Inform. Systems,
8:207-216, (1983). (3.3.4).
855. Merrett, T.H. and Fayerman, B.: Dynamic Patricia; Proceedings Int. Conf.
on Foundations of Data Organization, Kyoto, Japan, :13-20, (1985). (3.4.4.5,
7.2.2).
856. Merritt, S.M.: An Inverted Taxonomy of Sorting 'Algorithms; C.ACM,
28(1):96-99, (Jan 1985). (2.2.2, 4.1).
857. Mescheder, B.: On the Number of Active *-Operations Needed to Compute the
Discrete Fourier Transform; Acta Informatica, 13(4):383-408, (1980). (6.4).
858. Mesztenyi, C. and Witzgall, C.: Stable Evaluation of Polynomials; J Res. Nat.
Bureau of Standards, 71B( 1):ll-17, (Jan 1967). (6.4).
352 HANDBOOK OF ALGORITHnlS AND DATA STRUCTURES
859. Meyer, B.: Incremental String Matching; Inf. Proc. Letters, 21:219-227,
(1985). (7.1.2, 7.1.4).
860. Miller, R., Pippenger, N., Rosenberg, A.L. and Snyder, L.: Optimal 2-3 trees;
SIAM J on Computing, 8(1):42-59, (Feb 1979). (3.4.2.1).
861. Miyakawa, M., Yuba, T., Sugito, Y. and Hoshi, M.: Optimum Sequence Trees;
SIAM J on Computing, 6(2):201-234, (June 1977). (3.4.4).
862. Mizoguchi, T.: On Required Space for Random Split Trees; Proceedings Aller-
ton Conference, Monticello, IL, 17:265-273, (1979). (3.4.3).
863. Moenk, R. and Borodin, A.: Fast Modular Transforms Via Division; Proceed-
ings FOCS, College Park Md, 13:90-96, (Oct 1972). (6.4).
864. Moffat, A. and Port, G . : A fast algorithm for melding splay trees; Proceedings
Workshop in Algorithms and Data Structures, Lecture Notes in Computer Sci-
ence 382, Springer-Verlag, Ottawa, Canada, 1:450-459, (Aug 1989). (3.4.1.6).
865. Moller-Nielsen, P. and Staunstrup, J.: Experiments with a Fast String Search-
ing Algorithm; Inf. Proc. Letters, 18:129-135, (1984). (7.1.3).
866. Monard, M.C.: Design and Analysis of External Quicksort Algorithms; PhD
Dissertation, PUC University of Rio de Janeiro, (Feb 1980). (4.4.6).
867. Montgomery, A.Y.: Algorithms and Performance Evaluation of a New Type of
Random Access File Organisation; Australian Computer J , 6(1):3-11, (1974).
(3.3).
868. Moran, S.: On the complexity of designing optimal partial-match retrieval
systems; ACM TODS, 8(4):543-551, (1983). (3.6).
869. Morris, R.: Counting Large Numbers of Events in Small Registers; C.ACM,
21(10):840-842, (Oct 1978). (6.1).
870. Morris, R.: Scatter Storage Techniques; C.ACM, 11(1):38-44, (Jan 1968).
(3.3.3, 3.3.4, 3.3.10, 3.3.11).
871. Morrison, D.R.: PATRICIA - Practical Algorithm t o Retrieve Information
Coded in Alphanumeric; J-ACM, 15(4):514-534, (Oct 1968). (3.4.4.5, 7.2.2).
872. Motoki, T.: A Note on Upper Bounds for the Selection Problem; Inf. Proc.
Letters, 15(5):214-219, (Dec 1982). (5.2).
873. Motzkin, D.: A Stable Quicksort; Software - Practice and Experience, 11:607-
611, (1981). (4.2.2).
874. Motzkin, D.: Meansort; C.ACM, 26(4):250-251, (Apr 1983). (4.1.3).
875. Motzkin, T.S.: Evaluation of Polynomials and Evaluation of Rational Func-
tions; Bull of Amer Math SOC,61:163, (1965). (6.4).
876. Mukhopadhay, A.: A Fast Algorithm for the Longest-Common-Subsequence
Problem; Information Sciences, 20:69-82, (1980). (7.3.1).
877. Mullen, J.: Unified Dynamic Hashing; Proceedings VLDB, Singapore, 10:473-
480, (1984). (3.3.13, 3.3.14).
878. Mullin, J.K.: An Improved Index Seqnential Access Method Using Hashed
Overflow; C.ACM, 15(5):301-307, (May 1972). (3.4.3).
879. Mullin, J.K.: Retrieval-Update Speed Tradeoffs Using Combined Indices;
C.ACM, 14(12):$75-776, (1971). (3.4.3).
880. Mullin, J.K.: Spiral Storage: Efficient Dynamic Hashing with Constant Per-
formance; Computer Journal, 28(3):330-334, (1985). (3.3.13).
881. hlullin, J.K.: Tightly Controlled Linear Hashing Without Separate Overflow
Storage; BIT, 21(4):390-400, (1981). (3.3.14).
REFERENCES 353
882. Munro, J.I. and Paterson, M.S.: Selection and Sorting with Limited Storage;
Theoretical Computer Science, 12(3):315-323, (1980). (4.4, 5.2).
883. Munro, J.I. and Poblete, P.V.: A Discipline for Robustness or Storage Reduc-
tion in Binary Search Trees; Proceedings ACM PODS, Atlanta GA, 2:70-75,
(Mar 1983). (3.4.1).
884. Munro, J.I. and Poblete, P.V.: Fault Tolerance and Storage reduction in
Binary search trees; Information and Control, 62(2-3):210-218, (Aug 1984).
(3.4.1).
885. Munro, J.I. and Poblete, P.V.: Searchability in merging and implicit d a t a
structures; BIT, 27(3):324-329, (1987). (4.3).
886. Munro, J.I., Raman, V. and Salowe, J.S.: Stable in-situ sorting and mini-
mum data movement; BIT, 30(2):220-234, (1990). (4.1).
887. Munro, J.I. and Raman, V.: Sorting with minimum data movement; Pro-
ceedings Workshop in Algorithms and Data Structures, Lecture Notes in Com-
puter Science 382, Springer-Verlag, Ottawa, Canada, 1:552-562, (Aug 1989).
(4.1).
888. Munro, J.I. and Spira, P.M.: Sorting and Searching in Multisets; SIAM J on
Computing, S(1):l-8, (Mar 1976). (4.2).
889. Munro, J.I.: Searching a Two Key Table Under a Single Key; Proceedings
STOC-SIGACT, New York, 19:383-387, (May 1987). (3.5, 3.G.2).
890. Murphy, L.J.: Lotkas Law in the Humanities; J American Society of Informa-
tion Science, 24(6):461-462, (1973). (1.3).
891. Murphy, O.J. and Selkow, S.M.: The efficiency of using k-d trees for finding
rearest neighbours in discrete space; Inf. Proc. Letters, 23(4):215-218, (Apr
1986). (3.5.2).
892. Murphy, O.J.: A Unifying Frame work for Trie Design Heuristics; Inf. Proc.
Letters, 34:243-249, (1990). (3.4.4).
893. Murphy, P.E. and Paul, M.C.: Minimum Comparison Merging of sets of ap-
proximately equal size; Information and Control, 42(1):87-96, (July 1979).
(4.3.2).
894. Murthy, D. and Srimani, P.K.: Split Sequence Coalesced IIashing; Inform.
Systems, 13(2):21 1-2 18, (1 988). (3.3.12).
895. Murthy, Y.D., Bhattacharjee, G.P. and Seetaramanath, M.N.: Time- and
Space-Optimal Height Balanced 2-3 Trees; J. of Combinatorics, Information
and System Sciences, 8(2):127-141, (1983). (3.4.2.1).
896. Myers, E. and Miller, W.: Approximate Matching of Regular Expressions;
Bulletin of Mat!iematical Biology, 51(1):5-37, (1989). (7.1.6, 7.3).
897. Myers, E.: An O ( N D ) Difference Algorithm and Its Variations; Algorithmica,
1:251-266, (1986). (7.3.1).
898. Myers, E.: Incremental Alignment Algorithms and Their Applications; SIAM
J on Computing, toapp.. (7.3.1).
899. Nakamura, T. and Mizoguchi, T.: An Analysis of Storage Utilization Factor in
Block Split Data Structuring Scheme; Proceedings VLDB, Berlin, 4:489-495,
(Sep 1978). (3.4.3).
900. Nakatsu, N., Kambayashi, Y. and Yajima, S.: A Longest Common Subse-
quence Algorithm Suitable for Sinlilar Text Strings; Acta Informatica, 18:171-
179, (1982). (7.3.1).
354 HANDBOOK OF ALGOlUTHhfS AND DATA STRUCTURES
901. Naor, M. and Yung, M.: Universal One-way Hash Functions and their Cryp-
tographic Applications; Proceedings STOC-SIGACT, Seattle WA, 21:33-43,
(May 1989). (3.3.1).
902. Nelson, R.C. and Samet, H.: A Population Analysis for Hierarchical Data
Structures; Proceedings ACM SIGMOD, San Francisco CA, 16:270-277, (May
1987). (3.5.1).
903. Nevalainen, 0. and Teuhola, J.: Priority Queue Administration by Sublist
Index; Computer Journal, 22(3):220-225, (Mar 1979). (5.1.1).
904. Nevalainen, 0. and Teuhola, J.: The Efficiency of Two Indexed Priority Queue
Algorithms; BIT, 18(3):320-333, (1978). (5.1.2).
905. Nevalainen, 0. and Vesterinen, M.: Determining Blocking Factors for Sequen-
tial Files by Heuristic Methods; Computer Journal, 20(3):245-247, (1977).
(3.1).
906. Nicklas, B.M. and Schlageter, G.: Index Structuring in Inverted Data Bases
by Tries; Computer Journal, 20(4):321-324, (Nov 1977). (3.4.4, 7.2.1, 7.2.2).
907. Nievergelt, J., Hinterberger, H. and Sevcik, IC.: The Grid File: An Adapt-
able, Symmetric h4ultikey File Structure; ACM TODS, 9(1):38-71, (Mar 1984).
(3.5.4).
908. Nievergelt, J. and Reingold, E.hl.: Binary Search Trees of Bounded Balance;
SIAM J on Computing, 2(1):33-43, (1973). (3.4.1.4).
909. Nievergelt, J. and Wong, C.K.: On Binary Search Trees; Proceedings Infor-
mation Processing 71, Ljubjana, Yugoslavia, :91-98, (Aug 1971). (3.4.1).
910. Nievergelt, J. and Wong, C.K.: Upper bounds for the total path length of
binary trees; J.ACM, 20(1):1-6, (Jan 1973). (3.4.1).
911. Nievergelt, J.: Binary Search Trees and File Organization; ACM C. Surveys,
6(3):195-207, (Sep 1974). (3.4.1).
912. Nijssen, G.M.: Efficient Batch Updating of a Random File; Proceedings ACM
SIGFIDET Workshop an Data Description, Access and Control, San Diego
CA, :174-186, (Nov 1971). (3.3).
913. Nijssen, G.M.: Indexed Sequential versus Random; IAG Journal, 4:29-37,
(1971). (3.3, 3.4.3).
914. Nishihara, S. and Hagiwara, H.: A Full Table Quadratic Search Method Elim-
inating Secondary Clustering; Int. J of Comp and Inf Sciences, 3(2):123-128,
(1974). (3.3.6).
915. Nishihara, S. and Ikeda, IC.: Reducing the Retrieval Time of Hashing Method
by Using Predictors: C.ACM, 26(12):1082-1088, (Dec 1983). (3.3).
916. Noga, M.T. and Allison, D.C.S.: Sorting in linear expected time; BIT,
25 (3):45 1-465, ( 198 5). (4.3.5).
917. Norton, R.M. and Yeager, D.P.: A Probability Model for Overflow Sufficiency
in Small Hash Tables; C.ACM, 28( 10):1068-1075, (Oct 1985). (3.3.11).
918. Noshita, I<.: Median Selection of 9 Elements in 14 Comparisons; Inf. Proc.
Letters, 3( 1):8-12, (July 1974). (5.2).
919. Nozaki, A.: A Note on the Complexity of Approximative Evaluation of Poly-
nomials; Inf. Proc. Letters, 9(2):73-75, (Aug 1979). (6.4).
920. Nozaki, A.: Sorting Using Networks of Deques; JCSS, 19(3):309-315, (Dec
1979). (4.2).
921. Nozaki, A.: Two Entropies of a Generalized Sorting Problem; JCSS, 7(5):615-
621, (Oct 1973). (4.1, 5.2).
REFERENCES 355
942. Ottmann, T. and Wood, D.: How to update a balanced binary tree with a con-
stant number of rotations; Proceedings Scandinavian Workshop in Algorithmic
Theory, SWATSO, Lecture Notes in Computer Science 447, Springer-Verlag,
Bergen, Norway, 2:122-131, (July 1990). (3.4.1, 3.4.1.8).
943. Ouksel, M. and Scheuermann, P.: Implicit Data Structures for linear Hashing;
Inf. Proc. Letters, 29(5):187-189, (Nov 1988). (3.3.14).
944. Ouksel, M. and Scheuermann, P.: Multidimensional B-Trees: Analysis of Dy-
namic Behavior; BIT, 21(4):401-418, (1981). (3.4.2, 3.5).
945. Ouksel, M. and Scheuermann, P.: Storage Mappings for Multidimensional
Linear Dynamic Hashing; Proceedings ACM PODS, Atlanta GA, 2:90-105,
(Mar 1983). (3.3.14).
946. Ouksel, M.: The interpolation-based grid file; Proceedings ACM PODS, Port-
land OR, 4:20-27, (Mar 1985). (3.3.13).
947. Overholt, I<.J.: Efficiency of the Fibonacci Search Method; BIT, 13(1):92-96,
(1973). (3.2).
948. Overholt, I<.J.: Optimal Binary Search Methods; BIT, 13(1):84-91, (1973).
(3.2.1).
949. Overmars, hl.H., Smid, M., de Berg, M. and van Kreveld, M.: Maintain-
ing Range Trees in Secondary Memory. Part I: Partitions; Acta Informatica,
27:423-452, (1990). (3.6).
950. Overmars, M.H. and van Leeuwen, J.: Dynamic Multidimensional Data Struc-
tures Based on Quad- and I<-D Trees; Acta Informatica, 17(3):267-285, (1982).
(2.2, 3.5.1, 3.5.2).
951. Overmars, M.13. and van Leeuwen, J.: Dynamizations of Decomposable
Searching Problems Yielding Good Worst-case Bounds; Lecture Notes in
Computer Science 104, Syringer-Verlag, :224-233, (1981). (2.2).
952. Overmars, M.H. and van Leeuwen, J.: Some Principles for Dynamizing De-
composable Searching Problems; Inf. Proc. Letters, 12( 1):49-53, (Feb 1981).
(2.2).
953. Overmars, M.H. and van Leeuwen, J.: T w o General Methods for Dynamizing
Decomposable Searching Problems; Computing, 26(2):155-166, (1981). (2.2).
954. Overmars, M.H. and van Leeuwen, J.: Worst-case Optimal Insertion and
Deletion Methods for Decomposable Searching Problems; Inf. Proc. Letters,
12(4):168-173, (Aug 1981). (2.2).
955. Overmars, M.H.: Dynamization of Order Decomposable Set Problems; J of
Algorithms, 2( 3):245-260, (Sep 198 1). (2.2).
956. Overmars, M.H.: Efficient Data Structures for range searching on a grid; J of
Algorithms, 9(2):254-275, (June 1988). (3.6.2).
957. Pagli, L.: Height-balanced hfultiway Trees; Inform. Systems, 4:227-234,
(1979). (3.4.1.3, 3.4.1.10).
958. Pagli, L.: Self Adjusting Hash Tables; Inf. Proc. Letters, 21(1):23-25, (July
1985). (3.3.8.5).
959. Palmer, E.M., Rahimi, M.A. and Robinson, R.W.: Efficiency of a Binary
Comparison Storage Technique; J.ACM, 21(3):376-384, (July 1974). (3.4.1.1).
960. Pan, V.Y.: A Unified Approach to the Analysis of Bilinear Algorithms; J of
Algorithms, 3(3):301-310, (Sep 1981). (6.3).
961. Pan, V.Y.: Computational Complexity of Computing Polynomials Over the
Field of Real and Complex Numbers; Proceedings STOC-SIGACT, San Diego
CA, 10:163-172, (hlay 1978). (6.4).
REFERENCES 357
962. Pan, V.Y.: New Combinations of Methods for the Acceleration of Matrix
Multiplication; Comput Math with Applic, 7:73-125, (1981). (6.3).
963. Pan, V.Y.: New Fast Algorithms for Matrix Operations; SIAM J on Comput-
ing, 9(2):321-342, (May 1980). (6.3).
964. Pan, V.Y.: New Methods for the Acceleration of Matrix Multiplication; Pro-
ceedings FOCS, San Juan PR, 20:38-38, (Oct 1979). (6.3).
965. Pan, V.Y.: Strassens Algorithm is not Optimal: Trilinear Technique of Ag-
gregating, Uniting and Canceling for Constructing Fast Algorithms for Matrix
Operations; Proceedings FOCS, Ann Arbor MI, 19:166-176, (Oct 1978). (6.3).
966. Pan, V.Y.: The Additive and Logical Complexities of Linear and Bilinear
Arithmetic Algorithms; J of Algorithms, 4( 1):l-34, (Mar 1983). (6.3).
967. Fan, V.Y.: The Bit-Complexitmyof Arithmetic Algorithms; J of Algorithms,
2(2):144-163, (June 1981). (6.4).
968. Pan, V.Y.: The Techniques of Trilinear Aggregating and the Recent Progress
in the Asymptotic Acceleration of Matrix Operations; Theoretical Computer
Science, 33 ( 1): 117- 138, ( 1984). (6.3)
969. Panny, W.: A Note on the higher moments of the expected behavior of straight
insertion sort; Inf. Proc. Letters, 22(4):175-177, (Apr 1986). (4.1.2).
970. Papadakis, T., Munro, J.I. and Poblete, P.V.: Analysis of the expected search
cost in skip lists; Proceedings Scandinavian Workshop in Algorithmic Theory,
SWAT90, Lecture Notes in Computer Science 447, Springer-Verlag, Bergen,
Norway, 2:160-172, (July 1990). (3.1, 3.4.1).
971. Papadimitriou, C.H. and BernsteiE, P.A.: On the Performance of Balanced
Hashing Functions When Keys are Not Equiprobable; ACM TOPLAS, 2(1):77-
89, (Jan 1980). (3.3.1).
972. Patt, Y.N.: Variable Length Tree Structures Having Minimum Average Search
Time; C.ACM, 12(2):72-76, (Feb 1969). (3.4.4).
973. Payne, H.J. and Meisel, W.S.: An Algorithm for Constructing Optimal Binary
Decision Trees; IEEE Trans. on Computers, 26(9):905-916, (1977). (3.4.1).
974. Pearson, P.K.: Fast Hashing of Variable-Length Text Strings; C.ACM,
33(6):677-680, (June 1990). (3.3.16, 3.3.1).
975. Peltola, E. and Erkio, 11.: Insertion Merge Sorting; Inf. Proc. Letters, 7(2):92-
99, (Feb 1978). (4.2.1, 4.2.5).
976. Perl, Y., Itai, A. and Avni, H.: Interpolation Search - A Log Log N Search;
C.ACM, 21(7):550-553, (July 1978). (3.2.2).
977. Ped, Y. and Reingold, E.M.: Understanding the Complexity of Interpolation
Search; Inf. Proc. Letters, 6(6):219-321, (Dec 1977). (3.2.2).
978. Perl, Y.: Optimum split trees; J of Algorithms, 5(3):367-374, (Sep 1984).
(3.4.1.6).
979. Peters, J.G. and Kritzinger, P.S.: Implementation of Samplesort: A Minimal
Storage Tree Sort; BIT, 15(1):85-93, (1975). (4.1.3).
980. Peterson, W.W.: Addressing for Random-Access Storage; IBM J Res. Devel-
opment, 1(4):130-146, (Apr 1957). (3.2, 3.3).
981. Pflug, G.C. and Kessler, H.W.: Linear Probing with a Nonuniform Address
Distribution; J.ACM, 34(3):397-410, (Apr 1987). (3.3.4).
982. Pinter, R.: Efficient String Matching with Dont-Care Patterns; Combinatorial
Algorithms on Words, NATO AS1 Series, Springer-Verlag, F12:ll-29, (1985).
(7.1).
358 HANDBOOK OF ALGORITIIAC5' AND DATA STRUCTURES
1025. Ramamohanarao, I<., Lloyd, J.W. and Thom, J.A.: Partial-Match Retrieval
Using Hashing and Descriptors; ACM TODS, 8(4):522-576, (1983). (3.5.4,
7.2.6).
1026. Ramamohanarao, K. and Lloyd, J.W.: Dynamic Hashing Schemes; Computer
Journal, 25(4):478-485, (Nov 1982). (3.3.14).
1027. Ramamohanarao, I<. and Sacks-Davis, R.: Recursive Linear Hashing; ACM
TODS, 9(3):369-391, (1984). (3.3.14).
1028. Ramanan, P.V. and Hyafil, L.: New algorithms for selection; J of Algorithms,
5(4):557-578, (Dec 1984). (5.2).
1029. Rao, V.N.S., Iyengar, S.S. and Kashyap, R.L.: An average case analysis of
MAT and inverted file; Theoretical Computer Science, 62(3):251-266, (Dec
1988). (3.4.3, 7.2.1).
1030. Rao, V.N.S., Vaishnavi, V.K. and Iyengar, S.S.: On the dynamization of data
structures; BIT, 28( 1):37-53, (1988). (2.2).
103 1. Regener, E.: Multiprecision Integer Division Examples using Arbitrary Radix;
ACM TOMS, 10(3):325-328, (1984). (6.1).
1032. Regnier, M.: Analysis of grid file algorithms; BIT, 25(2):335-357, (1985).
(3.5.4).
1033. Regnier, M.: On the Average Height of Trees in Digital Search and Dynamic
Hashing; Inf. Proc. Letters, 13(2):64-66, (Nov 1981). (3.4.4, 3.3.13).
1034. Reingold, E.M.: A Note on 3-2 Trees; Fibonacci Quarterly, 17(2):151-157,
(Apr 1979). (3.4.2.1).
1035. Reiser, A.: A Linear Selection Algorithm for Sets of Elements with Weights;
Inf. Proc. Letters, 7(3):159-162, (Apr 1978). (5.2).
1036. Remy, J.L.: Construction Evaluation et Amelioration Systematiques de Struc-
tures de Donnees; RAI'RO Informatique Theorique, 14( 1):83-118, (1980). (2.2).
1037. Revah, L.: On the Number of Multiplications/Divisions Evaluating a Poly-
nomial with Auxiliary Functions; SIAM J on Computing, 4(3):381-392, (Sep
1975). (6.4).
1038. Richards, D. and Vaidya, P.: On the distribution of comparisons in sorting
algorithms; BIT, 28(4):764-774, (1988). (4.1).
1039. Richards, D.: On the worst possible analysis of weighted comparison-based
algorithms; Computer Journal, 31(3):276-278, (June 1988). (4.1).
1040. Richards, R.C.: Shape distribution of height-balanced trees; Inf. Proc. Let-
ters, 17(1):17-20, (July 1983). (3.4.1.3).
1041. Rivest, R.L. and van de Wiele, J.P.: An Cl((n/Ign)1/2) Lower Bound on the
Number of Additions Necessary to Compute 0-1 Polynomials Over the Ring
of Integer Polynomials; Inf. Proc. Letters, 8(4):178-180, (Apr 1979). (6.4).
1042. Rivest, R.L.: On Hash-Coding Algorithms for Partial-Match Retrieval; Pro-
ceedings FOCS, New Orleans LA, 15:95-103, (Oct 1974). (3.5.4, 7.2.6).
1043. Rivest, R.L.: On Self-Organizing Sequential Search Heuristics; C.ACM,
19(2):63-67, (Feb 1976). (3.1.2, 3.1.3).
1044. Rivest, R.L.: On the Worst-case Behavior of String-Searching Algorithms;
SIAM J on Computing, 6:669-674, (1977). (7.1).
1045. Rivest, R.L.: Optimal Arrangement of Keys in a Hash Table; J.ACM,
25( 3):200-209, (Apr 1978). (3.3.8.a).
REFERENCES 361
1067. Rosenberg, A.L.: On Uniformly Inserting One Data Structure into Another;
C.ACM, 24(2):88-90, (Feb 1981). (2.1, 2.2).
1068. Rotern, D.: Clustered Multiattribute Hash Files; Proceedings ACM PODS,
Philadelfia PA, 8, (Mar 1989). (3.5.4).
1069. Rotem, D. and Varol, Y.L.: Generation of Binary Trees from Ballot Sequences;
J.ACM, 25(3):396-404, (July 1978). (3.4.1).
1070. Rothnie, J.B. and Lozano, T.: Attribute Based File Organization in a Paged
Memory Environment; C.ACM, 17(2):63-69, (Feb 1974). (3.3, 3.5).
1071. Ruskey, F. and Hu, T.C.: Generating Binary Trees Lexicographically; SIAM
J on Computing, 6(4):745-758, (Dec 1977). (3.4.1).
1072. Ruskey, F.: Generating t-Ary Trees Lexicographically; SIAM J on Computing,
7(4):434-439, (Nov 1978). (3.4.1.10).
1073. Rytter, W.: A Correct Preprocessing Algorithm for Boyer-Moore String-
Searching; SIAM J on Computing, 9:509-512, (1980). (7.1.3).
1074. Sack, J.R. and Strothotte, T.: A Characterization of Heaps and Its Applica-
tions; Information and Computation, 86(1):69-86, (May 1990). (5.1.3).
1075. Sack, J.R. and Strothotte, T.: An algorithm for merging heaps; Acta Infor-
matica, 22( 2):171-186, (1985). (5.1.3).
1076. Sacks-Davis, R., Ramamohanarao, I<. and Kent, A.: Multikey access methods
based on superimposed coding techniques; ACM TODS, 12(4):655-696, (1987).
(3.5, 7.2.6).
1077. Sacks-Davis, R. and Ramamohanarao, IC.: A Two Level Superimposed Cod-
ing Scheme for Partial Match Retrieval; Inform. Systems, 8:273-280, (1983).
(3.5.4).
1078. Sacks-Davis, R. and Ramamohanarao, I<.: A Two-Level Superimposed Coding
Scheme for Partial Match Retrieval; Inform. Systems, 8(4):273-280, (1983).
(3.5.4, 7.2.6).
1079. Sager, T.J.: A Polynomial Time Generator for Minimal Perfect Hash Func-
tions; C.ACM, 28(5):523-532, (hlay 1985). (3.3.16).
1080. Salowe, J.S. and Steiger, W.L.: Simplified stable merging tasks; J of Algo-
rithms, 8(4):557-571, (Dec 1987). (4.3.2).
1081. Salowe, J.S. and Steiger, W.L.: Stable unmerging in linear time and Constant
space; Inf. Proc. Letters, 25(5):285-294, (July 1987). (4.3).
1082. Salzberg, B.: Merging sorted runs using large main memory; Acta Informatica,
27(3):195-216, (1989). (4.4).
1083. Samadi, B.: B-trees in a system with multiple views; Inf. Proc. Letters,
5(4):107-112, (Oct 1976). (3.4.3).
1084. Samet, H.: A Quadtree Medial Axis Transform; C.ACM, 26(9):680-693, (Sep
1983). (3.5.1.1).
1085. Samet, H.: Data Structures for Quadtree Approximation and Compression;
C.ACM, 28(9):973-993, (Sep 1985). (3.5.1.1).
1086. Samet, H.: Deletion in Two-Dimensional Quad Trees; C.ACM, 23(12):703-710,
(Dec 1980). (3.5.1.1).
1087. Samet, H.: The Quadtree and Related Hierarchical Data Structures; ACM C.
Surveys, 16(2):187-260, (June 1984). (3.5.1.1).
1088. Samson, W.B. and Davis, R.H.: Search Times Using Hash Tables for Records
with Non-Unique Keys; Computer Journal, 21(3):210-214, (Aug 1978). (3.3.6).
REFERENCES 363
1089. Samson, W.B.: Hash Table Collision Handling on Storage Devices with La-
tency; Computer Journal, 24(2):130-131, (May 1981). (3.3.4, 3.3.5).
1090. Santoro, N. and Sidney, J.B.: Interpolation Binary Search; Inf. Proc. Letters,
20(4):179-182, (May 1985). (3.2.1, 3.2.3).
1091. Santoro, N.: Chain Multiplication of Matrices Approximately or Exactly the
Same Size; C.ACM, 27(2):152-156, (Feb 1984). (6.3).
1092. Santoro, N.: Extending the Four Russians Bound to General Matrix Multi-
plication; Inf. Proc. Letters, 10(2):87-88, (Mar 1980). (6.3).
1093. Santoro, N.: Full Table Search by Polynomial Functions; Inf. Proc. Letters,
5(3):72-74, (Aug 1976). (3.3.6).
1094. Sarwate, D.V.: A Note on Universal Classes of Hash Functions; Inf. Proc.
Letters, 10(1):41-45, (Feb 1980). (3.3.1).
1095. Sassa, M. and Goto, E.: A Hashing Method for Fast Set Operations; Inf. Proc.
Letters, 5(2):31-34, (1976). (3.3).
1096. Savage, J.E.: An Algorithm for the Computation of Linear Forms; SIAM J on
Computing, 3(2):150-158, (June 1974). (6.3, 6.4).
1097. Saxe, J.B. and Bentley, J.L.: Transforming Static Data Structures to Dynamic
Data Structures; Proceedings FOCS, San Juan PR, 20:148-168, (Oct 1979).
(2.2).
1098. Saxe, J.B.: On the Number of Range Queries in k-Space; Discr App Math,
1(3) :2 17-225, (1979). (3.6.2).
1099. Schaback, R.: On the Expected Sublinearity of the Boyer-Moore Algorithm;
SIAM J on Computing, 17(4):648-658, (1988). (7.1.3).
1100. Schachtel, G.: A Noncommutative Algorithm for Multiplying 5 x 5 Matrices
Using 103 Multiplications; Inf. Proc. Letters, 7(4):180-182, (June 1978). (6.3).
1101. Schay, G. and Raver, N.: A Method for Key-to-Address Transformation; IBM
J Res. Development, 7:121-126, (1963). (3.3).
1102. ,Schay, G. and Spruth, W.G.: Analysis of a File Addressing Method; C.ACM,
5(8):459-462, (Aug 1962). (3.3.4).
1103. Scheurmann, P. and Ouksel, M.: Multidimensional B-trees for Associative
Searching in Database Systems; Inform. Systems, 7:123-137, (1982). (3.4.2.5,
3.5).
1104. Scheurmann, P.: Overflow Handling in Hashing Tables: A Hybrid Approach;
Inform. Systems, 4:183-194, (1979). (3.3).
1105. Schkolnick, M.: Secondary Index Optimization; Proceedings ACM SIGMOD,
San Francisco CA, 4:186-193, (May 1975). (3.4.3).
1106. Schkolnick, M.: A Clustering Algorithm for Hierarchical Structures; ACM
TODS, 2(1):27-44, (Mar 1977). (3.4.3).
1107. Schkolnick, M.: The Optimal Selection of Secondary Indices for Files; Inform.
Systems, 1:141-146, (1975). (3.4.3).
1108. Schlumberger, M. and Vuillemin, J.: Optimal Disk Merge Patterns; Acta In-
formatica, 3(1):25-35, (1973). (4.3, 4.4).
1109. Schmidt, J.P. and Siegel, A.: On Aspects of Universality and Performance for
Closed Hashing; Proceedings STOC-SIGACT, Seattle, Washington, 21:355-
366, (1989). (3.3.16, 3.3.1).
1110. Schmidt, J.P. and Siegel, A.: The Analysis of Closed Hashing under Limited
Randomness; Proceedings STOC-SIGACT, Baltimore MD, 22:224-234, (May
1990). (3.3.2, 3.3.4, 3.3.5, 3.3.1).
364 HANDBOOK OF ALGORITIIAfS AND DATA STRUCTURES
1111. Schnorr, C.P. and van de Wiele, J.P.: On the Additive Complexity of Polyno-
mials; Theoretical Computer Science, lO(1):l-18, (1980). (6.4).
1112. Schnorr, C.P.: How Many Polynomials Can be Approximated Faster than they
can be Evaluated?; Inf. Proc. Letters, 12(2):76-78, (Apr 1981). (6.4).
1113. Scholl, M.: New File Organizations Based on Dynamic Hashing; ACM TODS,
6(1):194-211, (Mar 1981). (3.3.13, 3.3.14).
1114. Schonhage, A., Paterson, M.S. and Pippenger, N.: Finding the Median; JCSS,
13(2):184-199, (Oct 1976). (5.2).
1115. Schonhage, A.: Fast Multiplication of Polynomials Over Fields of Character-
istic 2; Acta Informatica, 7:395-398, (1977). (6.4).
1116. Schonhage, A.: Partial and Total Matrix Multiplication; SIAM J on Comput-
ing, 10(3):434-455, (Aug 1981). (6.3).
1117. Schoor, A.: Fast Algorithm for Sparse Matrix Multiplication; Inf. Proc. Let-
ters, 15(2):87-89, (Sep 1982). (6.3).
1118. Schulte Monting, J.: Merging of 4 or 5 Elements with n Elements; Theoretical
Computer Science, 14(1):19-37, (1981). (4.3.3).
1119. Scowen, R.S.: Algorithm 271, Quickersort; C.ACM, 8( 11):669-670, (Nov 1965).
(4.1.3).
1120. Sedgewick, R.: A new upper bound for shellsort; J of Algorithms, 7(2):159-173,
(June 1986). (4.1.4).
1121. Sedgewick, R.: Data Movement in Odd-Even Merging; SIAM J on Computing,
7(3):239-272, (Aug 1978). (4.3, 4.3).
1122. Sedgewick, R.: Implementing Quicksort Programs; C.ACM, 21(10):847-856,
(Oct 1978). (4.1.3).
1123. Sedgewick, R.: Quicksort With Equal Keys; SIAM J on Computing, 6(2):240-
267, (June 1977). (4.1.3).
1134. Sedgewick, R.: Quicksort; PhD Dissertation, Computer Science Department,
Stanford University, (May 1975). (4.1.3).
1125. Sedgewick, R.: The Analysis of Quicksort Programs; Acta Informatica, 7:327-
355, (1977). (4.1.3).
1126. Seeger, B. and Kriegel, H.P.: Techniques for design and implementation of
efficient spatial data structures; Proceedings VLDB, Los Angeles CA, 14:360-
371, (1988). (3.5).
1127. Seiferas, J. and Galil, 2.: Real-time recognition of substring repetition and
reversal; Mathematical Systems Theory, 11:lll-146, (1977). (7.1).
1128. Sellers, P.: An Algorithm for the Distance Between Two Finite Sequences; J
of Combinatorial Theory (A), 16:353-258, (1974). (7.1.8).
1129. Sellers, P.: On the theory and computation of evolutionary distances; SIAM J
Appl Math, 36:787-793, (1974). (7.1).
1130. Sellers, P.: The Theory and Computation of Evolutionary Distances: Pattern
Recognition; J of Algorithms, 1:359-373, (1980). (7.1).
1131. Sellis, T., Roussopoulos, N. and Faloutsos, C.: The R+-tree: A dynamic index
for multidimensional objects; Proceedings VLDB, Brighton, England, 13:507-
518, (1987). (3.5).
1132. Selmer, E.S.: On shellsort and the Frobenius problem; BIT, 29(1):37-40,
(1989). (4.1.4).
REFERENCES 365
1133. Senko, M.E., Lum, V.Y. and Owens, P.J.: A File Organization Model
(FOREM); Proceedings Information Processing 68, Edinburgh, :514-519,
(1969). (3.4.3).
1134. Senko, M.E.: Data Structures and Data Accessing in Data Base Systems: Past,
Present and Future; IBM Systems J , 16(3):208-257, (1977). (3.4.3).
1135. Severance, D.G. and Carlis, J.V.: A Practical Approach to Selecting Record
Access Paths; ACM C. Surveys, 9(4):259-272, (1977). (3.4.3).
1136. Severance, D.G. and Duhne, R.: A Practitioner's Guide to Addressing Algo-
rithms; C.ACM, 19(6):314-326, (June 1976). (3.3).
1137. Shaw, M. and Traub, J.F.: On the Number of Multiplications for the Evalua-
tion of a Polynomial and Some of its Derivatives; J.ACM, 21(1):161-167, (Jan
1974). (6.4).
1138. Shaw, M. and Traub, J.F.: Selection of Good Algorithms from a Family of
Algorithms for Polynomial Derivative Evaluation; Inf. Proc. Letters, 6(5):141-
145, (Oct 1977). (6.4).
1139. Sheil, B.A.: Median Split Trees: A Fast Lookup Technique for Frequently
Occurring Keys; C.ACM, 21(11):947-958, (Nov 1978). (3.4.1.6).
1140. Shell, D.L.: A High-speed Sorting Procedure; C.ACM, 2(7):30-33, (July 1959).
(4.1.4).
1141. Shell, D.L.: Optimizing the Polyphase Sort; C.ACM, 14(11):713-719, (Nov
1971). (4.4.4).
1142. Sherk, M.: Self-adjusting k-ary search trees; Proceedings Workshop in Algo-
rithms and Data Structures, Lecture Notes in Computer Science 382, Springer-
Verlag, Ottawa, Canada, 1:75-96, (Aug 1989). (3.4.1.6, 3.4.1.10).
1143. Shirg, M.: Optimum ordered Bi-weighted binary trees; Inf. Proc. Letters,
17(2) ~67-70,(Aug 1983). (3.4.1.7).
1144. Shneiderman, B. and Goodman, V.: Batched Searching of Sequential and Tree
Structured Files; ACM TODS, 1(3):268-275, (1976). (3.4.2, 3.1, 3.4.3).
1145. Shneiderman, B.: A Model for Optimizing Indexed File Structures; Int. J of
Comp and Inf Sciences, 3(1):93-103, (Mar 1974). (3.4.3).
1146. Shneiderman, B.: Jump Searching: A Fast Sequential Search Technique;
C.ACM, 21(10):831-834, (Oct 1978). (3.1.5).
1147. Shneiderman, B.: Polynomial Search; Software - Practice and Experience,
3(2):5-8, (1973). (3.1).
1148. Siegel, A.: On Universal Classes of Fast High Performance Hash Functions,
Their Time-Space Tradeoff, and their Applications; Proceedings FOCS, Re-
search Triangle Park, NC, 30:20-27, (1989). (3.3.1).
1149. Silva-Filho, Y.V.: Average Case Analysis of Region Search in Balanced k-d
Trees; Inf. Proc. Letters, 8(5):219-223, (June 1979). (3.5.2).
1150. Silva-Filho, Y.V.: Optimal Choice of Discriminators in a Balanced k-d Binary
Search Tree; Inf. Proc. Letters, 13(2):67-70, (Nov 1981). (3.5.2).
1151. Singleton, R.C.: An Efficient Algorithm for Sorting with Minimal Storage;
C.ACM, 12(3):185-187, (Mar 1969). (4.1.3).
1152. Six, H. and Wegner, L.M.: Sorting a random access file in situ; Computer
Journal, 27(3):270-275, (Aug 1984). (4.4).
1153. Six, H.: Improvement of the m-way Search Procedure; Angewandte Informatik,
15(1):79-83, (Feb 1973). (3.1.5).
366 HANDBOOK OF ALGORITHhrlS AND DATA STRUCTURES
1177. Stephenson, C.J.: A Method for Constructing Binary Search Trees by Making
Insertions at the Root; Int. J of Comp and Inf Sciences, 9(1):15-29, (Feb 1980).
(3.4.1).
1178. Stockmeyer, L. J.: The Complexity of Approximate Counting; Proceedings
STOC-SIGACT, Boston Mass, 15:118-126, (Apr 1983). (6.1).
1179. Stockmeyer, P.K. and Yao, F.F.: On the Optimality of Linear Merge; SIAM
J on Computing, 9(1):85-90, (Feb 1980). (4.3.3).
1180. Stout, Q.F. and Warren, B.L.: Tree Rebalancing in Optimal Time and Space;
C.ACM, 29(9):902-908, (Sep 1986). (3.4.1, 3.4.1.8).
1181. Strassen, V.: The Asymptotic Spectrum of Tensors and the Exponent of
Matrix Multiplication; Proceedings FOCS, Toronto, Canada, 27:49-54, (Oct
1986). (6.3).
1182. Strassen, V.: Gaussian Elimination is not Optimal; Numer Math, 13:354-356,
(1969). (6.3).
1183. Strassen, V.: Polynomials with Rational Coefficients Which are Hard to Com-
pute; SIAM J on Computing, 3(2):128-149, (June 1974). (6.4).
1184. Strong, H.R., Markowsky, G. and Chandra, A.K.: Search Within a Page;
J.ACM, 26(3):457-482, (July 1979). (3.4.1, 3.4.2, 3.4.3).
1185. Strothotte, T., Eriksson, P. and Vallner, S.: A note on constructing min-max
heaps; BIT, 29(2):251-256, (1989). (5.1.3).
1186. Sundar, R.: Worst-case data structures for the priority queue with Attrition;
Inf. Proc. Letters, 31(2):69-75, (Apr 1989). (5.1).
1187. Suraweera, F. and Al-anzy, J.M.: Analysis of a modified Address calculations
sorting algorithm; Computer Journal, 31(6):561-563, (Dec 1988). (4.2.3).
1188. Sussenguth, E.H.: Use of Tree Structures for Processing Files; C.ACM,
6(5):272-279, (1963). (3.4.4).
1189. Szpankowski, W.: Average Complexity of Additive Properties for Multiway
Tries: A Unified Approach; Proceedings CAAP, Lecture Notes in Computer
Science 249, Pisa, Italy, 14:13-25, (1987). (3.4.4).
1190. Szpankowski, W.: Digital data structures and order statistics; Proceedings
Workshop in Algorithms and Data Structures, Lecture Notes in Computer
Science 382, Springer-Verlag, Ottawa, Canada, 1:206-217, (Aug 1989). (3.4.4).
1191. Szpankowski, W.: How much on the average is the Patricia trie better?; Pro-
ceedings Allerton Conference, Monticello, IL, 24:314-323, (1986). (3.4.4.5).
1192. Szpankowski, W.: On an Alternative Sum Useful in the Analysis of Some
Data Structures; Proceedings SWAT 88, Halmstad, Sweden, 1:120-128, (1988).
(3.4.4).
1193. Szpankowski, W.: Some results on V-ary asymmetric tries; J of Algorithms,
9(2):224-244, (June 1988). (3.4.4).
1194. Szwarcfiter, J.L. and Wilson, L.B.: Some Properties of Ternary Trees; Com-
puter Journal, 21(1):66-72, (Feb 1978). (3.4.1.10, 4.2.G).
1195. Szwarcfiter, J.L.: Optimal multiway search trees for variable size keys; Acta
Informatica, 2 1( 1):47-60, (1 984). (3.4.1.10).
1196. Szymanski, T.G.: Hash table reorganization; J of Algorithms, 6(3):322-355,
(Sep 1985). (3.3).
1197. Tai, K.C. and Tharp, A.L.: Computed Chaining A Hybrid of Direct and
Open Addressing; Proceedings AFIPS, Anaheim CA, 49:275-282, (1980). (3.3,
3.3.10).
368 HANDBOOK OF ALGORITHhfS AND DATA STRUCTURES
1198. Tainiter, M.: Addressing for Random-Access Storage with Multiple Bucket
Capacities; J.ACM, 10:307-315, (1963). (3.3.4).
1199. Takaoka, T.: An On-line Pattern Matching Algorithm; Inf. Proc. Letters,
22:329-330, (1986). (7.1.2).
1200. Tamminen, M.: Analysis of N-Trees; Inf. Proc. Letters, 16(3):131-137, (Apr
1983). (3.4.2).
1201. Tamminen, M.: Comment on Quad- and Octtrees; C.ACM, 27(3):248-249,
(Mar 1984). (3.5.1.1).
1202. Tamminen, M.: Extendible Hashing with Overflow; Inf. Proc. Letters,
15(5):22 7-233, (Dec 1982). (3.3.13).
1203. Tamminen, M.: Order Preserving Extendible Hashing and Bucket Tries; BIT,
21(4):419-435, (1981). (3.3.13, 3.4.4).
1204. Tamminen, M.: On search by address computation; BIT, 25( 1):135-147,
(1985). (3.3.13, 3.3.14).
1205. Tamminen, M.: Two levels are as good as any; J of Algorithms, 6(1):138-144,
(Mar 1985). (4.2.5).
1206. Tan, K.C. and Hsu, L.S.: Block Sorting of a Large File in External Storage by
a 2-Component Key; Computer Journal, 25(3):327-330, (Aug 1982). (4.4).
1207. Tan, K.C.: On Foster's Information Storage and Retrieval Using AVL Trees;
C.ACM, 15(9):843, (Sep 1972). (3.4.1.3).
1208. Tang, P.T.P.: Table-Driven Implementation of the Exponential Function in
IEEE Floating Point Arithmetic; ACM TOMS, 15(2):144-157, (1989). (6.2).
1209. Tanner, R.M.: Minimean Merging and Sorting: An Algorithm; SIAM J on
Computing, 7(1):18-38, (Feb 1978). (4.3, 4.2).
1210. Tarhio, J. and Ukkonen, E.: Boyer-Moore approach to approximate string
matching; Proceedings Scandinavian Workshop in Algorithmic Theory,
SWAT'90, Lecture Notes in Computer Science 447, Springer-Verlag, Bergen,
Norway, 2:348-359, (July 1990). (7.1.8).
1211. Tarjan, R.E. and Yao, A.C-C.: Storing a Sparse Table; C.ACM, 22(11):606-
611, (Nov 1979). (3.3.16, 3.4.4).
1212. Tarjan, R.E.: Algorithm Design; C.ACM, 30(3):204-213, (Mar 1987). (2.2).
1213. Tarjan, R.E.: Sorting Using Networks of Queues and Stacks; J.ACM,
18(2):341-346, (Apr 1972). (4.2).
1214. Tarjan, R.E.: Updating a Balanced Search Tree in 0(1)Rotations; Inf. Proc.
Letters, 16(5):253-257, (June 1983). (3.4.2.2, 3.4.1.8).
1215. Tarter, M.E. and Kronmal, R.A.: Non-Uniform Key Distribution and Ad-
dress Calculation Sorting; Proceedings ACM-NCC, Washington DC, 21:331-
337, (Aug 1966). (4.1.6, 4.2.3).
1216. Tenenbaum, A.M. and Nemes, R.M.: T w o Spectra of Self-organizing Sequen-
tial Algorithms; SIAM J on Computing, 11(3):557-566, (Aug 1982). (3.1.2).
1217. Tenenbaum, A.M.: Simulations of Dynamic Sequential Search Algorithms;
C.AChf , 2 1(9):790-79 1, (Sep 1978). (3.1.3).
1218. Thanh, M., Alagar, V.S. and Bui, T.D.: Optimal Expected-Time algorithms
for merging; J of Algorithms, 7(3):341-357, (Sep 1986). (4.3.2).
1219. Thanh, M. and Bui, T.D.: An Improvement of the Binary Merge Algorithm;
BIT, 22(4):454-462, (1982). (4.3.3).
REFERENCES 369
1220. Tharp, A.L. and Tai, K.C.: The Practicality of Text Signatures for Accelerat-
ing String Searching Software; Software - Practice and Experience, 12:35-44,
(1982). (7.2.6).
1221. Tharp, A.L.: Further Refinement of the Linear Quotient Hashing Method;
Inform. Systems, 4:55-56, (1979). (3.3.8.1).
1222. Thompson, K.: Regular Expression Search Algorithm; C.ACM, 11:419-422,
(1968). (7.1.6).
1223. Ting, T.C. and Wang, Y.W.: Multiway Replacement Selection Sort with Dy-
namic Reservoir; Computer Journal, 20(4):298-301, (Nov 1977). (4.4.1).
1224. Todd, S.: Algorithm and Hardware for a Merge Sort Using Multiple Processors;
IBM J Res. Development, 22(5):509-517, (Sep 1978). (4.2.1).
1225. Torn, A.A.: Hashing with overflow index; BIT, 24(3):317-332, (1984). (3.3).
1226. Trabb Pardo, L.: Stable Sorting and Merging with Optimal Space and Time
Bounds; SIAM J on Computing, 6(2):351-372, (June 1977). (4.3.2, 4.1).
1227. Tropf, H. and Herzog, H.: Multidimensional Range Search in Dynamically
Balanced Trees; Angewandte Informatik, 2:71-77, (1981). (3.6.2).
1228. Tsakalidis, A.K.: AVL-trees for localized search; Information and Control,
67( 1-3): 173- 194, (Oct 1985). (3.4.1.3).
1229. Tsi, K.T. and Tharp, A.L.: Computed chaining: A hybrid of Direct Chaining
and Open Addressing; Inform. Systems, 6:111-116, (1981). (3.3).
1230. Tzoreff, T. and Vishkin, U.: Matching Patterns in Strings Subject to Multi-
linear Transformations; Theoretical Computer Science, 60:231-254, (1988).
(7.3).
1231. Ukkonen, E. and Wood, D.: A simple on-line algorithm to approximate string
matching; (Report A-1990-4)Helsinki, Finland, (1990). (7.1.8).
1232. Ukkonen, E.: Algorithms for Approximate String hlatching; Information and
Control, 64:lOO-118, (1985). (7.1.8).
1233. Ukkonen, E.: Finding Approximate Patterns in Strings; J of Algorithms, 6:132-
137, (1985). (7.1.8).
1234. Ukkonen, E.: On Approximate String Matching; Proceedings Int. Conf. on
Foundations of Computation Theory, Lecture Notes in Computer Science 158,
Springer-Verlag, Borgholm, Sweden, :487-495, (1983). (7.1.8).
1235. Ullman, J.D.: A Note on the Efficiency of Hashing Functions; J.ACM,
19(3):569-575, (July 1972). (3.3.1).
1236. Unterauer, K.: Dynamic Weighted Binary Search Trees; Acta Informatica,
11 (4):341-362, (1 979). (3.4.1.4).
1237. Vaishnavi, V.K., Kriegel, H.P. and Wood, D.: Height Balanced 2-3 Trees;
Computing, 21:195-211, (1979). (3.4.2.1).
1238. Vaishnavi, V.K., Kriegel, H.P. and Wood, D.: Optimum Multiway Search
Trees; Acta Informatica, 14(2):119-133, (1980). (3.4.1.10).
1239. van de Wiele, J.P.: An Optimal Lower Bound on the Number of Total Op-
erations to Compute 0-1 Polynomials Over the Field of Complex Numbers;
Proceedings FOCS, Ann Arbor MI, 19:159-165, (Oct 1978). (6.4).
1240. van der Nat, M.: A Fast Sorting Algorithm, a Hybrid of Distributive and
Merge Sorting; Inf. Proc. Letters, 10(3):163-167, (Apr 1980). (4.2.5).
1241. van der Nat, M.: Binary Merging by Partitioning; Inf. Proc. Letters, 8(2):72-
75, (Feb 1979). (4.3).
370 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
1242. van der N a t , M.: Can Integers be Sorted in Linear Worst Case Time?; Ange-
wandte Informatik, 25(11):499-501, (Nov 1983). (4.2.4).
1243. van der Nat, M.: On Interpolation Search; C.ACM, 22(12):681, (Dec 1979).
(3.2.2).
1244. van der Pool, J.A.: Optimum Storage Allocation for a File in Steady State;
IBM Systems J , 17(1):27-38, (1973). (3.3.11).
1245. van der Pool, J.A.: Optimum Storage Allocation for a File with Open Ad-
dressing; IBM Systems J, 17(2):106-114, (1973). (3.3.4).
1246. van der Pool, J.A.: Optimum Storage Allocation for Initial Loading of a File;
IBM Systems J , 16(6):579-586, (1972). (3.3.11).
1247. van Emde-Boas, P., Kaas, R. and Zijlstra, E.: Design and Implementation of
an Efficient Priority Queue; Mathematical Systems Theory, 10:99-127, (1977).
(5.1.4).
1248. van Emde-Boas, P.: Preserving Order in a Forest in Less than Logarithmic
Time and Linear Space; Inf. Proc. Letters, 6(3):80-82, (June 1977). (5.1.4).
1249. van Emden, M.H.: Algorithm 402, qsort; C.ACM, 13(11):693-694, (Nov 1970).
(4.1.3).
1250. van Emden, M.H.: Increasing the Efficiency of Quicksort; C.ACM, 13(9):563-
567, (Sep 1970). (4,1.3).
1251. van Leeuwen, J. and Overmars, M.H.: Stratified Balanced Search Trees; Acta
Informatica, 18(4):345-359, (1983). (3.4.1, 3.4.2).
1252. van Leeuwen, J. and Wood, D.: Dynamization of Decomposable Searching
Problems; Inf. Proc. Letters, 10(2):51-56, (Mar 1980). (2.2.2).
1253. van Wyk, C.J. and Vitter, J.S.: The Complexity of Hashing with Lazy Dele-
tion; Algorithmica, 1(1):17-29, (1986). (3.3).
1254. Veklerov, E.: Analysis of Dynamic Hashing with Deferred Splitting; ACM
TODS, 10(1):90-96, (Mar 1985). (3.3.13, 3.3.14).
1255. Verkamo, A.I.: Performance of Quicksort Adapted for virtual Memory use;
Computer Journal, 30( 4):362-371, (Aug 1987). (4.1.3).
1256. Veroy, B.S.: Average Complexity of Divide-and-Conquer algorithms; Inf. Proc.
Letters, 29(6):319-326, (Dec 1988). (3.4.2).
1257. Veroy, B.S.: Expected Combinatorial Complexity of Divide-and-Conquer Al-
gorithms; Proceedings SCCC Int. Conf. in Computer Science, Santiago, Chile,
8:305-314, (July 1988). (2.2.2.1).
1258. Vishkin, U.: Deterministic Sampling: A New Technique for Fast Pattern
Matching; Proceedings STOC-SIGACT, Baltimore MD, 22:170-180, (May
1990). (7.1).
1259. Vitter, J.S. and Chen, W-C.: Optimal algorithms for a model of direct chain-
ing; SIAM J on Computing, 14(2):490-499, (hlay 1985). (3.3.10).
1260. Vitter, J.S.: A Shared-Memory Scheme for Coalesced Hashing; Inf. Proc.
Letters, 13(2):77-79, (Nov 1981). (3.3.12).
1261. Vitter, J.S.: Analysis of Coalesced Hashing; PhD Dissertation, Stanford Uni-
versity, (Aug 1980). (3.3.12).
1262. Vitter, J.S.: Analysis of the Search Performance of Coalesced Hashing; J.ACM,
30( 2):23 1-258, (Apr 1983). (3.3.13).
1263. Vitter, J.S.: Deletion Algorithms for Hashing that Preserve Randomness; J of
Algorithms, 3(3):261-275, (Sep 1982). (3.3.12).
REFERENCES 371
1286. Weiss, M.A. and Sedgewick, R.: Tight Lower Bounds for Shellsort; Proceedings
SWAT 88, Halmstad, Sweden, 1:255-262, (1988). (4.1.4).
1287. Wessner, R.L.: Optimal Alphabetic Search Trees with Restricted Maximal
Height; Inf. Proc. Letters, 4(4):90-94, (Jan 1976). (3.4.1.7).
1288. Whitt, J.D. and Sullenberger, A.G.: The Algorithm Sequential Access Method:
an Alternative to Index Sequential; C.ACM, 18(3):174-176, (Mar 1975). (3.2.2,
3.4.3).
1289. Wikstrom, A.: Optimal Search Trees and Length Restricted Codes; BIT,
19( 4):518-524, (1 97 9). (3.4.1.7).
1290. Wilber, R.: Lower Bounds for Accessing Binary Search Trees with Rotations;
Proceedings FOCS, Toronto, Canada, 27:61-70, (Oct 1986). (3.4.1.8).
1291. Willard, D.E. and Lueker, G.S.: Adding Range Restriction Capability to Dy-
namic Data Structures; J.ACM, 32(3):597-617, (July 1985). (3.6).
1292. Willard, D.E.: Good Worst-case Algorithms for Inserting and Deleting
Records in Dense Sequential Files; Proceedings ACM SIGMOD, Washington
DC, 15:251-260, (May 1986). (3.4.3).
1293. Willard, D.E.: Log-logarithmic worst-case range queries are possible in space
O ( N ) ;Inf. Proc. Letters, 17(2):81-84, (Aug 1983). (3.6.2).
1294. Willard, D.E.: Maintaining Dense Sequential Files in a Dynamic Environment;
Proceedings STOC-SIGACT, San Francisco CA, 14:114-121, (May 1982).
(3.1.1, 3.4.3).
1295. Willard, D.E.: Multidimensional Search Trees that Provide New Types of
Memory Reductions; J.ACM, 34(4):846-858, (Oct 1987). (3.5).
1296. Willard, D.E.: New Data Structures for Orthogonal Range Queries; SIAM J
on Computing, 14(1):232-253, (Feb 1985). (3.5.3).
1297. Willard, D.E.: New Trie Data Structures Which Support Very fast Search
operations; JCSS, 28(3):379-394, (June 1984). (3.5.3).
1298. Willard, D.E.: Polygon Retrieval; SIAM J on Computing, 11(1):149-165, (Feb
1982). (3.5).
1299. Williams, F.A.: Handling Identifiers as Internal Symbols in Language Proces-
sors; C.ACM, 2(6):21-24, (June 1959). (3.3.12).
1300. Williams, J.G.: Storage Utilization in a Memory Hierarchy when Storage As-
signment is Performed by a Hashing Algorithm; C.ACM, 14(3):172-175, (Mar
1971). (3.3).
1301. Williams, J.W. J.: Algorithm 232; C.ACM, 7(6):347-348, (June 1964). (4.1.5,
5.1.3).
1302. Williams, R.: The Goblin Quadtree; Computer Journal, 31(4):358-363, (Aug
1988). (3.5.1.1).
1303. Wilson, L.B.: Sequence Search Trees: Their Analysis Using Recurrence Rela-
tions; BIT, 16(3):332-337, (1976). (3.4.1.1, 3.4.1).
1304. Winograd, S.: A New Algorithm for Inner Product; IEEE Trans. on Comput-
ers, C17(7):693-694, (July 1968). (6.3).
1305. Winograd, S.: The Effect of the Field of Constants on the Number of Multi-
plications; Proceedings FOCS, Berkeley CA, 16:l-3, (Oct 1975). (6.2).
1306. Winters, V.G.: Minimal perfect hashing in polynomial time; BIT, 30(2):235-
244, (1990). (3.3.16).
REFERENCES 373
1307. Wise, D.S.: Referencing Lists by an Edge; C.ACM, 19(6):338-342, (June 1976).
(3.1.1).
1308. Wogulis, J.: Self-Adjusting and split sequence Hash Tables; Inf. Proc. Letters,
30(4):185-188, (Feb 1989). (3.3.6, 3.3.8.5).
1309. Wong, C.K. and Chandra, A.K.: Bounds for the string editing problem;
J.ACM, 23(1):13-16, (Jan 1976). (7.1.8).
1310. Wong, C.K. and Yue, P.C.: Free Space Utilization of a Disc File Organization
Method; Proceedings Princeton Conf. on Information Sciences, Princeton,
7:s-9, (1973). (3.4.2).
1311. Wong, J.K.: Some Simple In-Place Merging Algorithms; BIT, 21(2):157-166,
(1981). (4.3.2).
1312. Wong, K.F. and Straws, J.C.: An Analysis of ISAM Performance Improve-
ment Options; Manag. Datamatics, 4(3):95-107, (1975). (3.4.3).
1313. Wood, D.: Extrema1 Cost Tree Data Structures; Proceedings SWAT 88, Halm-
stad, Sweden, 1:51-63, (1988). (3.4.1.3, 3.4.2.1, 3.4.2.3).
1314. Woodall, A.D.: A Recursive Tree Sort; Computer Journal, 14(1):103-104,
(1971). (4.2.6).
1315. Wright, W.E.: Average Performance of the B-Tree; Proceedings Allerton Con-
ference, Monticello, IL, 18:233-241, (1980). (3.4.2).
1316. Wright, W.E.: Binary Search Trees in Secondary Memory; Acta Informatica,
15(1):3-17, (1981). (3.4.1.1, 3.4.1.3).
1317, Wright, W.E.: Some Average Performance Measures for the B-tree; Acta In-
formatica, 21(6):541-558, (1985). (3.4.2).
1318. Xunuang, G. and Yuzhang, Z.: A New Heapsort Algorithm and the Analysis
of its Complexity; Computer Journal, 33(3):281, (June 1990). (4.1.5).
1319. Yang, W.P. and Du, M.W.: A backtracking method for constructing perfect
hash functions from a set of mapping functions; BIT, 25(1):148-164, (1985).
(3.3.16).
1320. Yang, W.P. and Du, M.W.: A Dynamic Perfect Hash Function defined by an
Extended Hash Indicator Table; Proceedings VLDB, Singapore, 10:245-254,
(1984). (3.3.16).
1321. Yao, A.C-C. and Yao, F.F.: Lower Bounds on Merging Networks; J.ACM,
23(3):566-571, (July 1976). (4.3).
1322. Yao, A.C-C. and Yao, F.F.: On the Average-Case Complexity of Selecting k-th
Best; SIAM J on Computing, 11(3):428-447, (Aug 1982). (5.2).
1323. Yao, A.C-C. and Yao, F.F.: The Complexity of Searching an Ordered Random
Table; Proceedings FOCS, IIouston TX, 17:173-177, (Oct 1976). (3.2.2).
1324. Yao, A.C-C.: A Note on the Analysis of Extendible Hashing; Inf. Proc. Let-
ters, 11(2):84-86, (Oct 1980). (3.3.13).
1325. Yao, A.C-C.: An Analysis of (h,k,l)-Shellsort; J of Algorithms, 1(1):14-50,
(1980). (4.1.4).
1326. Yao, A.C-C.: On optimal arrangements of keys with double hashing; J of
Algorithms, 6(2):253-264, (June 1985). (3.3.5, 3.3.9).
1327. Yao, A.C-C.: On Random 2-3 Trees; Acta Informatica, 9(2):159-170, (1978).
(3.4.2.1).
1328. Yao, A.C-C.: On Selecting the K largest with Median tests; Algorithmica,
4(2):293-300, (1989). (5.2).
374 HANDBOOK OF ALGORITHhIS AND DATA STRUCTURES
Algorithms Coded in
Pascal and C
The following entries are selected algorithms which are coded in a language
different from that used in the main entries.
void insert(key, r)
t y p e k e y key; dataarray r;
{ extern int n;
if (n>=m) E r r o r /*** Table is full ***/;
else r [ n + + ] . k = key;
1
375
376 HANDBOOK OF ALGOItlTEIRfS AND DATA STRUCTURES
{ extern int n;
n++;
return(NewNode(new, list));
1
int search(key, r)
typekey key; dataarray r;
{ extern int n;
int i;
datarecord tempr;
int search(key, r)
typekey key; dataarray r;
void insert(new, r)
typekey new; d a t a a r r a y r;
{ extern int n;
int i;
if (n>=m) Error /*** table is full ***/;
else { for (i=n++; i>=O && r[z].k>new; i--) r[i+l] = r[zJ;
4i+l].k = new;
1
1
int search(key, r )
typekey key; dataarray r;
{ int i, last;
i = hashfunction(key) ;
last = (i+n-l) % m;
while (i!=last && !empty(r[z])&& r[z].K!=key)
i = (i+l) % m;
if (r[z].k==key) return(i);
else return (- 1);
1
378 HANDBOOK OF ALGORITIIMS A N D DATA STRUCTURES
void insert( k e y , r)
t y p e k e y key; dataarray r;
{ extern int n;
int i, last;
i = hashfunction(key) ;
last = (i+m- 1 ) % m;
while (i!=last && !e,mpty(r[z)) && !deleted(r[z])&& r[z].k!=key)
i = (i+l) % m ;
if ( e m p t y ( r[z])I I deleted( r [ z ] ) )
{
/*** i n s e r t here ***/
r[2].k = key;
n++;
1
else Error /*** table f u l l , or k e y already in table ***I;
1
iiit search(key, r )
t y p e k e y key; dataarray r;
{ int i, i n c , last;
i = hash.funciion(key) ;
inc = increment(key);
last = (i+(n-I)*inc) % m;
while (i!=last && ! e m p t y ( r [ z ] )&& r[z].k!=key)
i = (i+inc) % m;
if (r[z].k==key) returii(i);
else return (-1);
ALGORITHMS CODED IN PASCAL AND c 379
void insert(key, r)
typekey key; dataarray r;
{ extern int n;
int i, inc, last;
i = hashfunction(key) ;
i n c = increment(key);
last = ( i + ( r n - l ) * i n c ) % rn;
while (i!=last && !ernpty(r[z])&& !deleted(r[zj)&& r[z].k!=key)
i = ( i + i n c ) % rn;
if ( e m p t y ( dz]) I I deleted( r[z]))
{
/*** insert here ***/
d z ] . k = key;
n++;
1
else Error /*** table full, or key already in table ***/;
void insert(key, r)
typekey Key; dataarray r;
{ extern int n;
int i, inc, ii, init, j , j;
init = hashfunction(key);
inc = increment(key);
for (i=O; i<=n; a++)
for (j=i; j>=O; j--)
{
+
j j = (init j*inc) % rn;
..
zz = ( j j + (i-j)*increment(rbjJ.k)) % rn;
if ( e m p t y ( r [ i z ) ) 11 deleted(r[iz]))
search( key, t )
typekey key;
tree t ;
1
w h i l e ( t != NULL)
if ( 2 ->k == k e y )
{ found(t); return; }
else if ( t ->k < key) t = t ->righi;
else t = t ->left;
notfound( key);
1
tree insert(hey, t )
t y p e k e y Key;
tree t;
ALGORITHMS C O D E D IN PASCAL A N D C 381
tree Zrot(t)
tree t;
{ tree temp;
int a;
temp = t;
t = t ->right;
temp ->right = i ->lefl;
t ->left = temp;
/*** adjust balance ***/
a = temp ->bal;
temp ->bal = a - 1 - max(t ->bal, 0 ) ;
t -> bal = man( a-2, man( a+t -> bal-2, t -> bal-1));
ret urn (t );
1
tree insert(key, t )
typekey key;
tree t;
{ i f ( t == NULL) {
t = NewNode(Eey, NULL, NULL);
t ->weight = 2;
1
382 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
else i f ( t ->k == k e y )
Error; /*** K e y already in table ***/
else { i f ( t ->k < key) t ->right = insert(key, t ->right);
else t ->le8 = insert(key, t ->left);
t ->weight = wt(t ->left) + wt(t ->right);
t = checkrots(t);
1
return(t ) ;
1
tree delete(key, t )
typekey key;
tree t;
tree lrot(t)
tree t;
{ tree temp;
temp = t;
t = t ->right;
temp ->right = t ->left;
t ->left = temp;
/*** adjust weight ***/
t ->weight = temp ->weight;
temp ->weight = wt(temp ->left) + wt(temp ->right);
return( t);
1
The Pascal data structure used to define B-trees is
btree = Inode;
node = record
d : O..2*M;
k. : array [1..2*4 of typekey;
p : array [O..2*M]of btree
end;
Note that the lexicographical order is given by the fact that all the keys
in the subtree pointed by p [ i ] are greater than k [ i ] and less than k [ i + 11.
var i : integer;
begin
if t=nil then {*** Not Found ***}
notfound( k e y )
else with tt do begin
384 HANDBOOK OF ALGOItTTHRlS AND DATA STRUCTURES
2 .-.- 1;
while (zed)and (key>k[z]) do i := i+l;
if k e y = k[2] then {*** F o u n d ***}
f o u n d ( t 1 , i)
else if k e y < k[z] then search(key, p[i-11)
else search(key, p[z])
end
end:
var t : btree;
begin
new(t);
tt.p[O] := p o ;
tT.p[l] := p l ;
t t . k [ l ] := k l ;
2T.d := 1;
N e w N o d e := t
end;
if k e y = k[t) then
Error (*** hey already in table ***}
else begin
if k e y > k[23 then i := i+l;
ins := InternaZInsert(p[i-l]);
if ins <> NoKey then
(*** the key in ins has t o be inserted in present node ***}
if d<2*M then InsInNode(t, ins, NewTree)
else {*** Present node has to be split ***}
begin
{*** Create new node ***}
if i<=M+l then begin
tempr := NewNode(k[2*M],nil, p[2*M]);
d := d-1;
InsInNode(2, ins, NewTree)
end
else tempr := NewNode(ins, nil, NewTree);
(*w move keys and pointers ***}
for j:=M+2 to 2*M do
InsInNode(tempr, kb], pb]);
d := M;
t e mprf .p[O] := p [M+ 13;
Internallnsert := k[M+l];
NewTree := tempr
end
end
end
end;
begin
ins := InternalInsert( t ) ;
{*** check for growth at the root ***I
if ins <> NoKey then t := NewNode(ins, t, NewTree)
end:
label 999;
var j : integer;
386 HANDBOOK OF ALGORITHMS A N D DATA STRUCTURES
begin
with tT do begin
j : = d;
while j >= 1 do
i f k e y < k[31 then begin
kb+1] := kb];
Pb+ll := Pbk
j := j-1
end
else goto 999; {*** break ***)
999:
kb+l] := key;
plj+l] := p t r ;
d := d+l
end
end;
btree N e w N o d e ( k 1 , PO, p l )
typekey k l ;
btree PO, p l ;
{ btree t e m p r ;
t e m p r = (btree)maZZoc(sizeof(node));
tempr->p[O] = PO;
t e m p r ->p[l] = p l ;
tempr->k[O] = k l ;
tempr->d = 1;
return( t e m p r ) ;
1
InsInNode(t, key, ptr)
btree t, p t r ;
t y p e k e y key;
{int j ;
f o r ( j = t ->d; j>O && key<t ->k[j-l]; j--) {
t ->kb] = t ->kL-l];
t->pb+l] = t->pb];
1
t ->d++;
i
I
ALGORJTHhlS CODED IN PASCAL AND C 387
t ->kljl = key;
t ->p[j+l] = ptr;
var i, j : integer;
t e m p r : ArrayEntry;
flag : boolean;
begin
for i:=up-1 downto lo do begin
t e m p r := r[z];
j := i + l ;
flag := true;
while ( j < = u p ) and flag do
if t e m p r . k > +].k then begin
rb-l] := rb];
j := j + l
end
else flag := false;
+ - I ] := t e m p r
end
end;
var i, j : integer;
1
i
I
I
388 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
tempr : ArrayEntry;
begin
dup+l].k := MaximumKey;
for i:=up-1 downto lo do begin
tempr := dz];
j : = i+l;
while tempr.k > +].A: do begin
4-11 := 41;
j := j+l
end;
4-11 := tempr
end
end;
var i, j : integer;
tempr : ArrayEntry;
begin
while up>lo do begin
2 .-.- lo;
j := U P ;
tempr := dlo];
(*** Split file in two ***I
while i<j do begin
while +].k > ternpr.k do
j := j-1;
dzl := 41;
while (i<j) and (dd.k<=tempr.k) do
2 .- i+l;
$1 := r[z]
end;
r [ z ] := tempr;
(*** Sort recursively, the smallest first ***I
if 2-10 < up-i then begin
sort( r,lo,z-l);
lo := i+l
end
else begin
sort( r,i+ 1,up);
ALGOWTHMS CODED IN PASCAL AND C 389
u p := 2-1
end
end
end;
s o r t ( r , lo, u p )
A r r a y T o S o r t r;
int lo, up;
{int i, j;
A r r a y E n t y tempr;
while ( u p > l o ) {
i = lo;
j = up;
t e m p r = r[lo];
/*** Split file i n t w o ***/
while ( i < j ) {
for (; + ] . k > t e m p r . k ; j - - ) ;
for (r[iI=$J; i<j &sC r[zJ.k<=tempr.k; i++);
r[jl = d21;
1
rft] = tempr;
/*** Sort recursively, the smallest first ***/
if (2-10 < u p - i ) { s o r t ( r , l o , i - l ) ; lo = i+l; }
else { s o r t ( r , i + l , u p ) ; u p = 2-1; }
1
1
The above version of Quicksort is designed to prevent the growth of the re-
cursion stack in the worst case (which could be O(n)). This is achieved by
changing the second recursive call into a while loop, and selecting the smallest
array to be sorted recursively.
label 999;
390 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
var d , i, j : integer;
tempr : ArrayEnty;
begin
d := up-lo+l;
while d>l do begin
if d<5 then d := 1
else d := trunc(O.45454*d);
I*** Do linear insertion sort in t ps size d ***I
for i:=up-d downto lo do begin
tempr := 421;
j : = i+d;
while j <= up do
if tempr.k > rb1.k then begin
+-d] := +];
j := j+d
elid
else goto 999; {*** break ***I
999:
+-dJ := tempr
end
end
end;
{int d , i, i d , j ;
ArrayEnt y tempr;
for (id=O; (d=Increments[id])> 0 ; id++) {
/*** Do linear insertion sort in steps size d ***/
for ( k u p - d ; i>=lo; i--) {
tempr = +I;
for (j=i+d; j<=up && (ternpr.k>rb].k);j+=d)
I
s o r t ( r , lo, up)
A r r a y T o S o r t r;
int lo, up;
{int a;
/*** construct heap ***/
for ( i = u p / 2 ; i > l ; i--) siftup(r,i,up);
/*** repeatedly extract m a x i m u m ***I
for (i=up; i > l ; i--) {
siftup( r,l,2);
exchange(r, 1, 2);
1
1;
var iwk : A r r a y l n d i c e s ;
out : A r r a y T o S o r t ;
t e m p r : ArrayEntry;
i, j : integer;
flag : boolean;
begin
iwk[lo] := 10-1;
for i:=lo+l to up do iwk[z] := 0;
for i:=lo to up do begin
j := phi(r[z].k, lo, up);
iwk[jl := i w k ~ J + l
end;
392 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
{ Array ToSorl rl ;
int i, j , uppq
uppr = up + (UppBoundr-up)*3/4;
for (j=Zo; j<=up; j++) rib] = +];
for (j=Zo; j<= UppBoundr; j++) rllJ.k = NoICey;
for (j=Zo; j<=up; j++) {
for (i=phi(rlb].k,Zo,uppr); r[z).k != NoKey; i++) {
if (r1bl.k < r[z].k){
rlIj-11 = dz];
423 = rib];
rib] = rlIj-11;
394 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
1;
if ( i > UppBoundr) Error;
1
r[4 = rlbl;
1;
for (j=a=lo; j<= UppBoundr; j++)
if (rbJ.k!= N o K e y )
+++I = rb];
while ( i <= UppBoundr)
r[i++].k = NoKey;
1;
end
end;
The above algorithm is similar to the one in the main entry, except that
at the bottom level of recursion, it tries to construct the longest possible list
of ordered elements. To achieve this, it compares the next element in the list
against the head and the tail of the list being constructed. Consequently, this
algorithm will improve significantly when used to sort partially ordered (or
rever se-or dered) files.
list sort(n)
int n;
{
list fi, la, temp;
extern list r;
i f ( r == NULL) r e t u r n ( N U L L ) ;
else i f ( n > l )
return(merge(sort(n / 2 ) , sort(( n+1)/2)));
else {
fi = r; la = r;
/*** Build list as long as possible ***/
for ( T - r ->next; r!=NUL;L;)
i f ( r ->k >= la ->k) {
la ->next = r;
la = r;
r = r ->next;
1
else if ( r ->k <= fi ->IC) {
temp = r;
r = r ->next;
temp ->next = ji;
ji = temp;
1
else break;
la ->next = NULL;
return(fi);
1
1;
Owing to the absence of var variables in C, the list to be sorted is stored
396 HANDBOOK OF ALGOR.lTHI\.IS AND DATA STRUCTURES
list s o r t ( r )
last r;
{
list h e a d [ M , t a i l [ M J ;
int i, j , h;
for (i=D; i>O; i--) {
for (j=O; j<.hfi j++) headljl = N U L L ;
while ( r != N U L L ) {
h = charac(i, r -->k);
if ( h e a d [ h ] = = N U L L ) head[h] = r;
else taiqh] ->next = r;
tail[h] = r;
r = r ->next;
1;
/*** Concatenate lists ***I
r = NULL;
for (j=M-1; j>=O; j--)
if (lieu@] != N U L L ) {
t a i Q ] ->next = r;
r = headb];
1
1;
return(r);
1;
The above algorithm uses the function c h a r m which returns the ith char-
acter of the given key. The global constant M gives the range of the alphabet
(or characters). The constant or variable D gives the number of characters
used by the key.
last s o r t ( s , j )
list s;
int j ;
ALGORITHMS CODED IN PASCAL Alp11 I 397
{
int i;
list head[W, t;
struct rec aux;
extern list Last;
if (s==NULL) return(s);
if ( s ->next == NULL) {Last = s; return(s);}
if ( j > D ) {
for (Last=s; Last ->next!= NULL; Last = Last ->next);
.
return(s);
1
for (z=O; i<M; a++) head[z] = NULL;
/*** place records in buckets ***/
while ( s != NULL) { .
i = charac(j, s ->k);
t = s;
s = s ->next;
t ->next = head[zj;
head[d = t ;
1
J
list rnerge(a, b)
list a, b;
{
list temp;
struct rec aux;
temp = &aut;
while ( b != NULL;)
if ( a == NULL) { a = b; break; }
else if ( b ->k > a ->k)
398 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
begin
if pq=nil then p q := n e w
else if pqT.k < n e w f . k then begin
n e w t . n e x t := pq;
p q := n e w
end
else begin
P := Pq;
while p f . n e x t <> nil do begin
if pT.nextT.k < n e w t . k then begin
n e w t . n e x t := pT.next;
p f . n e x t := n e w ;
got0 9999
end;
p := p f . n e x t
end;
pT.next := n e w
end;
9999
end;
400 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
begin
if pq=nil then Error {*** E x t r a c t i o n f r o m a n e m p t y list ***}
else if p q f . n e x t = nil then begin
extract := pq1.k; p q := nil end
else begin
m a x := pq; p := pq;
while p f . n e x t <> nil do begin
if maxf.7aextt.k < pf.next1.k then m a x := p ;
p := p f . n e x t
end;
if max:f.nextt.k < pq1.k then begin
extract := pqf.k; p q := p q f . n e x t end
else begin
extract := maxf.nextf.K;
r
m a x t .n ex2 := m a x f .next .next
end
end
end;
begin
if p q = nil then p q := n e w
else if p q f . k >= new1.k then begin
{*** I n s e r t above subtree ***}
newf.left := pq;
p q := n e w
end
else begin
P := Pq;
while pt.left <> nil do
if pT.1eflT.k >= n e w f . k then begin
{*** I n s e r t in right subtree ***}
insert( n e w , pf .right);
got0 9999
end
else p := pf.lefl;
{*** I n s e r t at b o t t o m lefl ***}
pl.left := n e w
end;
9999
end;
insert( new, r)
RecordArray r;
ArrayEntry new;
{int a', j ;
extern int n;
n++;
for (j=n; j>l; j = i ) {
i = j/2;
if (qz1.k >= new.k) break;
{ ArrayEntry tempr;
int j ;
while ((j=2*i) <= n) {
if ( j < n St& rb1.k < ++1 4 j++
if (r[z).k< +].k) {
tempr = rb];
+I = r[zl;
r[z] = tempr;
z = 3;
1
else break;
-. 1
ALGORITHMS CODED IN I'i\,YCAL AND C 403
delete( r)
RecordArray r;
extern int n;
i f ( n < l ) Error /*** extracting from an empty Heap ***I;
else {
411 = 474;
siftup(r, 1 , --n);
1
1;
tree rnerge(a, b)
tree a, b;
r = botb;
botb = t e m p ;
1;
/*** one edge is exhausted, finish merge ***/
if (botb==NULL) {
a ->right = r ->right;
r ->right = bota;
return(a ) ;
.1
else { b ->left = r ->left;
r ->left = botb;
tree insert(new, p q )
tree n e w , pq;
{
n e w ->left = new; n e w ->right = new;
return( merge(pq, n e w ) ) ;
1;
tree delete(pq)
tree pq;
{
tree le, ri;
if (pq==NULL) Error /*** Deletion o n e m p t y queue ***I;
else {
I***Find left descendant of root ***/
if ( p q - > l e f t == pq) le = NULL;
ALGORITHAfS CODED IN PASCAL AND C 405
else {
le = p q ->left;
while ( l e ->left != pq) le = le ->left;
le ->lefl = p q ->left;
1;
/*** F i n d right descendant of root ***/
if (pq->right == pq) ri = NULL;
else {
~i = p q ->right;
while (ri ->right != pq) ri = ri ->right;
ri ->right = p q ->right;
1;
/*** m e r g e t h e m ***/
return(merge( le, rz));
1
1; ~ ~~
tree merge(a, b)
tree a, b;
if ( a == NULL) return(b);
else if ( b == NULL) return(a);
else if ( a ->k > b ->k) {
a ->right = merge(u ->right, b);
fizdist( a ) ;
return(u ) ;
1
else {
b ->right = merge( a, b ->righi);
fizdist( b);
return(b);
1
h
tree delete(pq)
tree pq;
{
if ( p q == NULL) Error /*** delete on a n e m p t y queue ***/;
else return(merge(pq ->lefl, p q ->right));
1;
406 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
i f (pq==NULL) return(new);
else i f ( p q ->k > new ->k) {
p q ->right = insert( new, p q ->right);
fixdist(Pq) ;
return ( p q ) ;
1
else {
new ->le8 = pq;
return( new);
1
1;
fixdist(p q )
tree pq;
{
tree temp;
i f (distance(pq ->lefl) < distance(pq ->right)) {
temp = p q ->right;
p q ->right = p q ->left;
p q ->left = temp;
1;
p q ->did = distance(yq ->right) + 1;
1;
ALGORITHMS CODED IN PASCAL AND C 407
tree delete(pq)
tree pq;
{tree temp;
if ( p q == NULL) Error /*** deletion on an empty queue ***I;
else if (pq->right == NULL)
return(pq ->left);
else {
I*** promote left descendant up *+*/
p q ->k = p q ->left ->k.;
p q ->left = delete(pq ->left);
/*** rearrange according to constraints
if(pq->left == NULL) {
p q ->left = p q ->right; p q ->right = NULL; 1;
if (pq->right != NULL)
if ( p q ->left ->k < p q ->right ->k) {
/*** descendants in wrong order
temp = p q ->right;
p q ->right = p q ->left;
pq->left = temp;
1
ret urn(pq) ;
1
1;
{
if ( p q == NULL) r e t u r n ( n e w ) ;
else if ( p q ->k <= new ->k) {
new ->left = pq;
return( new);
1
else if ( p q ->left == NULL)
pq->left = new;
else if ( p q ->lefl ->k <= new -->k)
408 HANDBOOK OF ALGORITHMS AND DATA STRUCTURES
rea d( b u f i ]);
j := j+l;
end;
fillbuff := j-nb-1;
for i:=j to BUFSIZ do bufizl := chr(0);
end;
begin
found := FALSE;
m := Zength(pat);
if m = 0 then begin
extsearch := 1;
found := TRUE;
end;
if m >= BUFSIZ then begin {*** Buffer is too small ***}
extsearch := -1;
found := TRUE;
end;
{*** Assume that the file is open and positioned ](I**}
08s := 0 ; {*** number of characters already read ***}
nb := 0; {*** number of characters in buffer ***)
while not found do begin
if nb >= m then begin
{*** try to match ***}
i := search(pat,b u n ;
i f i <> 0 then begin
extsearch := i+o#s; (*** found ***}
found := TRUE;
end;
for i:=l to m-1 do buflz] := bufii+nb-m+2];
0 8 s := offs + nb-m+l;
nb := m-1;
end;
{*** read more text ***}
if not found then begin
nr := fillbufi
if nr <= 0 then begin
extsearch := 0; {*** not found ***I
found := TRUE;
end;
nb := nb + nr;
end;
end;
end;
410 HANDBOOK OF ALGORITEIAfS AND DATA STRUCTURES
{ int m;
procedure preprocpat;
var k, I: integer;
begin
m := length(pat);
1 := 1;
k := 0; next[l] := 0;
repeat begin
if (Xr-0) or (pat[4=pat[k]) then begin
1 := l+l; k := k+1;
if pat[k]=pat[lJ then nex:i[ll := next[k]
else next[lJ := &
end
else k := nezt[k];
end
until (1 > m);
end;
begin
found := FALSE; search := 0;
ALGORITHMS CODED IN PASCAL AND C 411
m := length(pat);
if m=O then begin
search := 1; found := T R U E end;
prep rocp a 2;
n := length(text);
j := 1 ; i := 1;
while not found and ( i <= n) d o begin
if (j=O) or ( p a t b ] = tez2[2])then begin
i := i+l; j := j + l ;
if j > m then begin
search := i-j+l;
found := TRUE;
end;
end
else j := nextbJ;
end;
end;
var i, j , k, m, n: integer;
skip: array [O..MAXCHAR] of integer;
found boolean;
begin
found := FALSE search := 0 ;
m := length(pat);
if m=O then begin
search := 1; found := T R U E end;
for k:=O to MAXCHAR d o skip[k] := m; {*** Preprocessing ***}
for k:=l to m-1 do := m-k;
skip[ord(pat[k])]
# d e f i n e B 131
==EO9 r e t u r n ( text);
if(put[O]
B m = 1;
hpat = htext = 0 ;
for(j=m; TRUE; j + + ) {
if(hpat==htext && s~mcnzp(text+j-m,put,m)==O)
r e t u r n (t ext+j- m);
if(textIj]==EOS) r e t u r n ( N U L L ) ;
h2ext.t = Atext*B - textlj-m]*Bm + teztb];
found boolean;
begin
found := FALSE; search := 0 ;
m := Zength(pat);
if m=O then begin
search := 1; found := T R U E end;
n := length(text);
j := 1; i := 1;
while (i<=n-m+l) and not found do begin
count := 0 ; j := 1 ;
while ( j <= m) and (count <= k) do begin
if texi[i+j-l] <> pat[jl then count := count + 1;
+
j := j 1;
end;
if count <= k then begin
search := 2; found := T R U E end;
2 .- +
.- 2 1;
end
end;
Index
1-2 brother trees, 128 array of digits, 237
1-2 neighbour trees, 128 array search, 25
1-2 son trees, 128 array sorting, 230
1-2 trees, 128 ASCII, 138, 235
2-3 brother trees, 125 asymptotic expansions, 296
2-3 trees, 124 asymptotic expansions of sums, 298
2-3-4 trees, 129 containing e-xa , 302
80%-20% rule, 70, 293 asymptotic expansions of definite
integrals containing e-"a,
accesses, 91 302
accessing books, 291 asymptotic matrix multiplication,
addition, 235, 247 247
addition chain, 240 asymptotic notation, 5
address region, 79 atomic operations, 15
address-calculation sort, 176 automaton simulation, 275
addressing methods, 24 average minimum accesses, 70
album, 287 AVL trees, 97, 127, 128, 183
algorithm definition, 14
algorithm descriptions, 14 B*-trees, 121, 122, 132
algorithm format, 1, 2 B+-trees, 122
algorithms, code, 6 B k tree, 226
alignment problem, 283 B B ( a ) trees, 100
alphabet size, 251 B-Tree insertion, 15
alternating selection, 188, 191 B-tree variations, 130
alternation, 21 B-trees, 11, 117, 183
amortized worst case, 103 balance of a node, 100
approximat e matrix mu1t iplic at ion, balanced binary trees, 226
247 balanced merge sort, 193
approximate string matching, 267 balanced multiway trees, 117
arbitrary precision approximating, balanced nodes, 97
247 balanced Quicksort, 181
arctan(x), 244 balanced trees, 183
arithmetic algorithms , 235 balancing by internal path reduc-
arit hmetic-geometric mean, 242 tion, 102
array indices, 131 balancing rules, 24
array merging, 185 basic algorithm, 24
415
416 INDEX