Data Structures Sample

Massachusetts Institute of Technology Handout 3
6.854J/18.415J: Advanced Algorithms Wednesday, September 14, 2005
David Karger
Problem Set 2
Due: Wednesday, September 21, 2005.
Notice that one problem is marked noncollaborative. As you might expect, this problem
should be done without any collaboration.
Problem 1. Describe a data structure that represents an ordered list of elements under
the following three types of operations:
access(k): Return the kth element of the list (in its current order).
insert(k, x): Insert x (a new element) after the kth element in the current version of the list.
reverse(i, j): Reverse the order of the ith through jth elements.
For example, if the initial list is [a, b, c, d, e], then access(2) returns b. After reverse(2,4), the
represented list becomes [a, d, c, b, e], and then access(2) returns d.
Each operation should run in O(log n) amortized time, where n is the (current) number of
elements in the list. The list starts out empty.
Hint: First consider how to implement access and insert using splay trees. Then think about
a special case of reverse in which the [i, j] range is represented by a whole subtree. Use these
ideas to solve the real problem. Remember, if you store extra information in the tree, you
must state how this information can be maintained under various restructuring operations.
This data structure is useful in efficiently implementing the Lin-Kernighan heuristic for the
travelling salesman problem.
Problem 2. Given the theorem about access time in splay trees, it is tempting to conjecture
that splaying does not create trees in which it would take a long time to find an
item. Show that this conjecture is false by showing that for large enough n, it is possible to
restructure any binary tree on n nodes into any other binary tree on n nodes by a sequence
of splay operations (implying that there is some access sequence that turns a tree into a
path).
Problem 3. Let S be a search data structure that performs insert, delete and search in
O(log n) time, where n is the number of elements stored. An empty data structure S can be
created in O(1) time.
We will construct a static data structure with n elements that is worst-case optimal in total
access time, given the number of times each element is accessed in an access sequence.
The data structure is constructed as follows. Search data structure \(S_k\) holds the \(2^{2^k}\) most
frequently occurring items in the access sequence. A search on v is done on \(S_0, S_1, \ldots\) until
an \(S_i\) holding v is encountered. Notice that all elements in \(S_i\) are held in \(S_{i+1}\).
(a) Show that the above data structure is asymptotically comparable to the optimal
static tree in terms of the total time to process the access sequence. Recall
from class that the statically optimal data structure achieves average access time
\(O(-\sum_i p_i \log p_i)\), where \(p_i\) is the fraction of accesses to item i.
(b) Make the data structure capable of insert operations. Assume that the number of
searches to be done on v is provided when v is inserted. The cost of insert should
be O(log n) amortized time, and total cost of searches should still be worst case
optimal (nonamortized).
(c) Improve your solution to work even if the frequency of access is not given during
the insert. Your data structure now satisfies the same static optimality theorem
as splay trees.
(d) Optional. Make your data structure satisfy the working set theorem on splay
trees. Ignore the static optimality condition.
Problem 4. Worked example.

(a) Build an uncompressed suffix trie for “banana$”. Show the structure and node
traversal path for each suffix insertion. Mark the suffix links that are actually
used as shortcuts in the efficient construction algorithm.
(b) Draw the compressed suffix tree for “banana$”.
NONCOLLABORATIVE Problem 5. In this problem, we will see how to construct
a suffix tree on multiple texts, and what some useful properties of such a suffix tree are.
Suppose you are given n texts \(T_1, T_2, \ldots, T_n\).
(a) Suppose you build a common suffix tree of all the texts \(T_1, T_2, \ldots, T_n\), i.e., a trie
that contains all the suffixes of all the n texts. Argue that you can do this in time
\(O(|T_1| + |T_2| + \cdots + |T_n|)\). Be careful not to produce suffixes that cross from one
text to another (which would happen if you simply concatenated all the texts).
(b) Suppose that we add a different unique terminating symbol \(\$_i\) to each of the
texts \(T_i\). Consider a node N in the common suffix tree, and let s be the string
corresponding to this node (i.e., the string on the path from the root to the node).
How can you determine whether the string s is a substring of all the n texts by
looking at the subtree rooted at N?
(c) Using the above approach, explain how you can find the longest common substring
of the two texts \(T_1, T_2\) in time \(O(|T_1| + |T_2|)\) (there’s a simple generalization to
more texts).
Massachusetts Institute of Technology Handout 6
6.854J/18.415J: Advanced Algorithms Wednesday, September 21, 2005
David Karger
Problem Set 2 Solutions
Problem 1. We augment every node x in the splay tree with the number x.desc of descendants
(including itself) and a reverse bit x.reverse. No key needs to be maintained.
Each node x has a minor child x.minor and a major child x.major. The left child x.left is
the minor child and the right child x.right is the major child if an even number of ancestors
(including itself) have their reverse bit set. Otherwise x.right is the minor child and x.left
is the major child.
An in-order traversal Trav(x) on node x is defined as Trav(x.minor) + x + Trav(x.major).
We ensure the invariant that Trav(t), where t is the root, is the list of elements in order.
When splay tree operations are performed, the notion of left and right children is replaced
with that of minor and major children. The minor and major children of a node x can be
identified by looking at the reverse bits of its ancestors. This computation can be done when
a search for x is performed.
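[Figure Problem 1: a zig-zig step. Before: z on top, y its child, x its grandchild. After: x on top, y its child, z its grandchild.]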
It is evident that all splaying operations preserve Trav(t) if we update the reverse bits
appropriately. For example, in Figure Problem 1, the reverse bit of z is changed to z.reverse ⊕
x.reverse ⊕ y.reverse, where ⊕ denotes the exclusive-or operation. Similarly, the
descendant counts can be updated on rotations. For example, in Figure Problem 1, the
value of z.desc is updated to 1 + y.major.desc + z.major.desc, where y.major and z.major
refer to the children before the rotation.
The potential function argument works for the data structure as it does for splay trees except
when a reverse bit is flipped. When a reverse bit x.reverse is flipped, the major and minor
children are flipped for all the descendants of x. However, this does not change the potential
\(\sum_x r(x)\), since flipping a bit changes no subtree size.
Therefore we can perform splay operations correctly in O(log n) amortized time. Split and
join operations can be defined on our structure. The removal or addition of a root only
causes changes to the new root.
We can perform access(k) by a search based on the desc field. Operation insert(k, x) is done
like a splay tree insert, using split and join. The operation reverse(i, j) involves flipping x.reverse,
where x is the root of a subtree whose descendants are exactly the range [i, j]. To obtain an x of this
form, we split at i and then at j. We now have x as the root of a splay tree. After flipping
x.reverse, the three trees can be joined.
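
The operations above can be sketched in Python as follows. This is a minimal illustration rather than the handout's exact formulation: instead of reading a node's orientation off the parity of its ancestors' reverse bits, it pushes a set bit down lazily (swapping the two children and toggling their bits), which is a common equivalent variant. The names Node, ReversibleList, kth, split and join are illustrative assumptions.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = self.right = self.parent = None
        self.desc = 1        # number of descendants, including the node itself
        self.rev = False     # pending reversal of this node's whole subtree

def size(x):
    return x.desc if x else 0

def push(x):
    # Resolve a pending reversal: swap children, pass the bit down.
    if x.rev:
        x.left, x.right = x.right, x.left
        for c in (x.left, x.right):
            if c:
                c.rev = not c.rev
        x.rev = False

def pull(x):
    x.desc = 1 + size(x.left) + size(x.right)

def rotate(x):
    # Rotate x above its parent; both must already be pushed.
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left = x.right
        if x.right:
            x.right.parent = p
        x.right = p
    else:
        p.right = x.left
        if x.left:
            x.left.parent = p
        x.left = p
    p.parent, x.parent = x, g
    if g:
        if g.left is p:
            g.left = x
        else:
            g.right = x
    pull(p)
    pull(x)

def splay(x):
    while x.parent:
        p, g = x.parent, x.parent.parent
        if g:  # zig-zig rotates the parent first, zig-zag rotates x twice
            rotate(p if (g.left is p) == (p.left is x) else x)
        rotate(x)
    return x

def kth(root, k):
    # Find the kth element (1-indexed) via desc counts and splay it to the root.
    x = root
    while True:
        push(x)
        ls = size(x.left)
        if k <= ls:
            x = x.left
        elif k == ls + 1:
            return splay(x)
        else:
            k -= ls + 1
            x = x.right

def split(root, k):
    # Return (tree of the first k elements, tree of the rest).
    if k == 0:
        return None, root
    root = kth(root, k)
    right, root.right = root.right, None
    if right:
        right.parent = None
    pull(root)
    return root, right

def join(a, b):
    if a is None or b is None:
        return a or b
    a = kth(a, size(a))      # last element of a at the root: no right child
    a.right, b.parent = b, a
    pull(a)
    return a

class ReversibleList:
    def __init__(self):
        self.root = None

    def access(self, k):
        self.root = kth(self.root, k)
        return self.root.val

    def insert(self, k, x):
        a, b = split(self.root, k)
        self.root = join(join(a, Node(x)), b)

    def reverse(self, i, j):
        a, rest = split(self.root, i - 1)
        mid, c = split(rest, j - i + 1)
        mid.rev = not mid.rev          # flip one bit at the root of [i, j]
        self.root = join(join(a, mid), c)
```

Inserting a, b, c, d, e in order and then calling reverse(2, 4) reproduces the example from the problem statement: access(2) returns d.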
Problem 2. Observe first that the claim in the question is not true for n = 3; it is not
possible to turn a zig-zig into a zig-zag by splaying (try it).
Claim: For n ≥ 4, it is possible to turn any n node binary search tree into any other by a
sequence of splay operations.
Proof:
We will prove this claim by induction on n.
Base case: n = 4. We can turn the tree into a left path by splaying on the items in order.
(It is easy to show this for all n by induction. The key observation is that the last step of
each successive splay must be a zig or zig-zag, which pushes the root onto the left path.)
It remains to check that we can turn a left path into anything:
[Figure: the binary search trees on the four items a, b, c, d, with splay sequences leading from the left path to each of them.]
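
The base case, and the failure for n = 3, can also be checked by brute force. The following is a hedged sketch, not part of the original solution: trees are nested tuples (left, key, right) with None for an empty tree, keys are 1..n, and the helper names all_bsts, splay, reachable and fully_connected are illustrative assumptions. The splay function implements the usual bottom-up rule: double steps are paired from the bottom, with a single zig at the root when the depth is odd.

```python
from collections import deque

def all_bsts(lo, hi):
    # Every binary search tree on keys lo..hi.
    if lo > hi:
        return [None]
    return [(l, k, r)
            for k in range(lo, hi + 1)
            for l in all_bsts(lo, k - 1)
            for r in all_bsts(k + 1, hi)]

def depth(t, k):
    l, key, r = t
    return 0 if k == key else 1 + depth(l if k < key else r, k)

def splay(t, k):
    # Bottom-up splay of key k, returning the new tree with k at the root.
    l, key, r = t
    d = depth(t, k)
    if d == 0:
        return t
    if d % 2 == 1:                     # odd depth: last step is a zig
        if k < key:
            a, _, b = splay(l, k)
            return (a, k, (b, key, r))
        a, _, b = splay(r, k)
        return ((l, key, a), k, b)
    if k < key:                        # even depth: last step is a double step
        ll, lk, lr = l
        if k < lk:                     # zig-zig
            a, _, b = splay(ll, k)
            return (a, k, (b, lk, (lr, key, r)))
        a, _, b = splay(lr, k)         # zig-zag
        return ((ll, lk, a), k, (b, key, r))
    rl, rk, rr = r
    if k > rk:                         # zig-zig
        a, _, b = splay(rr, k)
        return (((l, key, rl), rk, a), k, b)
    a, _, b = splay(rl, k)             # zig-zag
    return ((l, key, a), k, (b, rk, rr))

def reachable(start, n):
    # All trees obtainable from start by some sequence of splays.
    seen, queue = {start}, deque([start])
    while queue:
        t = queue.popleft()
        for k in range(1, n + 1):
            s = splay(t, k)
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return seen

def fully_connected(n):
    trees = all_bsts(1, n)
    return all(len(reachable(t, n)) == len(trees) for t in trees)
```

Under these assumptions, fully_connected(4) holds over all 14 trees, while fully_connected(3) fails because the two zig-zag shapes cannot be reached from the left path.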
Inductive step: We need to show that if it is possible to restructure any n − 1 node binary
search tree into any other by a sequence of splay operations then the same is true for any n
node binary search tree.
We will accomplish this goal via the following four lemmas:
Lemma 1 Any node in a binary search tree with ≥ 4 nodes can be moved to a leaf position
by an appropriate sequence of splay operations.
Lemma 2 A leaf node will remain a leaf node under a sequence of splay operations if it is
not splayed.
Lemma 3 The structure of the tree containing the descendants of a node that is splayed has
no effect on the structure of the tree that results.
Lemma 4 No two binary search trees on n nodes differ only in the position of one leaf node.
By Lemma 1 we can pick a node that is to become a leaf in the final tree and make it a leaf.
Now Lemmas 2 and 3 say that this leaf will stay a leaf if we splay the other nodes, and will
not affect the results of splaying on the other nodes. Thus by the inductive hypothesis we
know that we can restructure the other n − 1 nodes to match the desired tree. Finally, by
Lemma 4 we know that we have gotten the desired tree.
Proof of 1. Let i denote the item we wish to turn into a leaf. If i is the minimum item we
can turn it into a leaf by splaying on i and its successor. If i is the maximal element we can
handle it symmetrically. If i is not the second element, splay i’s predecessor’s predecessor,
i’s predecessor, i, and i’s successor, giving the following situation:
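[Figure: with g < h < i < j, the four items first form a left path with j on top, then i, h, g. After a splay at g, g is on top with j below it, h below j, and i a leaf below h.]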
If i is the second element we can handle it symmetrically. (Splay succ(succ(i)), succ(i),
i, pred(i), and then succ(succ(i)) again.)
Proof of 2. It is clear from the definition of splaying that no leaf node is ever given a
descendant unless it is splayed.
Proof of 3. It is clear from the definition of splaying that descendants of a splayed node
have no effect on the result of the operation.
Proof of 4. Suppose two binary search trees differed only in the position of one leaf node.
Then the path from the root to the leaf differs in these two trees. Look at the place where it
first differs. In order for the path to go left at this point the leaf must be less than this node;
in order for the path to go right the leaf must be greater than this node. It is impossible for
both of these to happen. Contradiction.
Several people misinterpreted this question by assuming that they could just apply the zig,
zig-zig, and zig-zag cases at will. A splay operation applies the three cases as appropriate
until the item is at the root. So splay(x) always brings x all the way to the root. Thus you
cannot just splay in subtrees, and inversion of splays is difficult. (This theorem implies that
you can invert splays, but you can’t use this theorem to prove itself.)
Problem 3. Let m be the number of accesses made, and let p(x) · m be the number
of accesses made to item x. The access time has an information-theoretic lower bound of
\(\Omega(m \sum_x -p(x) \log p(x))\). It takes Ω(m) to process the sequence. Therefore the optimal
access time is \(\Omega(m + m \sum_x -p(x) \log p(x))\).
1. Search data structure \(S_k\) holds the \(2^{2^k}\) most frequently accessed items.
Lemma 5 The search data structure is statically optimal.
Proof. There are at most 1/p(x) items with more access frequency than x. Therefore
x must belong to an \(S_k\) such that
\[ 2^{2^{k-1}} < 1/p(x), \]
i.e., \(2^k < 2(1 - \log p(x))\). Therefore the search time in \(S_k\) is \(O(2^k) = O(1 - \log p(x))\).
The search time in the smaller \(S_i\)’s is \(O(2^0 + 2^1 + \cdots + 2^{k-1})\), which is \(O(2^k)\). So the total
access time is \(O(m + m \sum_x -p(x) \log p(x))\), which matches the lower bound.
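
As a rough illustration of the static structure, the sketch below builds the nested levels from an access sequence and searches them in order. It assumes a sorted Python list with binary search as a stand-in for each search structure S, and the names build_levels and search are hypothetical.

```python
import bisect
from collections import Counter

def build_levels(accesses):
    # Level k holds the 2^(2^k) most frequently accessed items, kept sorted.
    by_freq = [x for x, _ in Counter(accesses).most_common()]
    levels, k = [], 0
    while True:
        cap = 2 ** (2 ** k)
        levels.append(sorted(by_freq[:cap]))
        if cap >= len(by_freq):
            return levels
        k += 1

def search(levels, v):
    # Probe S_0, S_1, ... and return the index of the first level holding v.
    for k, level in enumerate(levels):
        i = bisect.bisect_left(level, v)
        if i < len(level) and level[i] == v:
            return k
    return None
```

A binary search in level k costs O(2^k) comparisons, so a frequent item found at a low level pays only for the small levels probed before it.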
2. We make the data structure dynamic. \(S_k\) now holds the \(2^{2^k}\) most frequently accessed
items that have been accessed at least once previously. The search data structure is
still optimal in search time since \(S_k\) still holds at least the \(2^{2^k}\) most frequently accessed
items that can be accessed by the subsequent search.
The items in \(S_k\) are also organized in a search tree in increasing order of access
frequencies. It can be seen that every insert or delete operation in \(S_k\) will still take
\(O(2^k)\) time.
Item x is inserted in \(S_i\) if p(x) is more than the minimum access frequency in
\(S_i\). If the bucket \(S_i\) is full, the item with minimum access frequency is deleted. Notice
that the deleted item will be present in a higher \(S_j\) data structure.
A new \(S_{l+1}\) needs to be created if \(S_l\) cannot hold all elements after an insert. The
creation of this level costs O(n log n) time. We will now show that the cost of insert is
O(log n) amortized.
Lemma 6 The amortized cost of insert operation is O(log n).
Proof. The cost of insertions in each level is
\[ O(2^0 + 2^1 + \cdots + 2^l) = O(2^l) = O(\log n) \]
since \(2^{2^l} \ge n\). The cost of creating a new level is O(n log n). But we have to create
a new level only if \(n = 2^{2^l}\). We define the potential function
\[ \phi = 2^{l+1} \cdot \left( \#\,\text{elements in } S_l - 2^{2^{l-1}} \right) \]
where \(S_l\) is the last search data structure. The change in potential if a new level is
not created is only \(2^{l+1}\). The change in potential if a new level is created is
\[ 2^{l+1} \left( 2^{2^l} - 2^{2^{l-1}} \right) \ge 2^l \cdot 2^{2^l} = n \lg n \]
which pays for the cost of creating a new level.
3. Recall that in (b), the access frequencies were organized in a search tree for each \(S_k\).
The data structure now updates values in the search tree on accesses and maintains the
current access frequency of every element in \(S_k\).
Lemma 7 The dynamic online data structure is statically optimal.
Proof. The cost of the jth search is O(log(j/f(x, j))), where f(x, j) is the current access
frequency of the item x searched. Therefore the total time to process the access sequence
is
\[ T(m) = \sum_{j=1}^{m} O(\log(j / f(x, j))) = O\left( \log \frac{m!}{\prod_x (m\,p(x))!} \right). \]
Let us denote \(m\,p(x)\) by \(m_x\). Note that \(\sum_x m_x = m\). By plugging in the Stirling
approximation of factorials, we get
\[
\begin{aligned}
T(m) &= O\left( \log \frac{m^{m-1/2} e^{-m}}{\prod_x m_x^{m_x - 1/2} e^{-m_x}} \right) \\
     &= O\left( \log \frac{m^m}{\prod_x m_x^{m_x}} + \sum_x \log m_x \right) \\
     &= O\left( \log \frac{m^m}{\prod_x m_x^{m_x}} + m \right)
\end{aligned}
\]
since \(\sum_x \log m_x = O(m)\). Because \(\log (m^m / \prod_x m_x^{m_x}) = \sum_x m_x \log(m/m_x) = m \sum_x -p(x) \log p(x)\),
this matches the lower bound.
4. Instead of holding the most frequently accessed items, we hold the most recently
accessed items. We can replace the search tree on access frequencies by a doubly linked
list holding the items in LRU order. The proof that the working set theorem is satisfied is
similar to Lemma 5.
Problem 4. See the diagrams on the webpage.
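
Since the diagrams are not reproduced here, the uncompressed trie for part (a) can at least be generated and inspected with a short sketch. It inserts each suffix naively, character by character, so it takes quadratic time and does not use the suffix links of the efficient construction. The nested-dictionary representation and the names suffix_trie, count_nodes and count_leaves are assumptions made for illustration.

```python
def suffix_trie(text):
    # Uncompressed suffix trie as nested dicts: one dict per node,
    # one key per outgoing edge character.
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root

def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.values())

def count_leaves(node):
    if not node:
        return 1
    return sum(count_leaves(child) for child in node.values())

trie = suffix_trie("banana$")
print(sorted(trie))                              # ['$', 'a', 'b', 'n']
print(count_nodes(trie), count_leaves(trie))     # 23 7
```

The 23 nodes are the root plus one node per distinct nonempty substring of “banana$”, and the 7 leaves correspond to its 7 suffixes.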
Problem 5. (a) We append a unique terminator to the end of each text; in particular,
the text \(T_i\) becomes \(T_i \$_i\). Now we construct the combined suffix tree in
the following manner. First, construct the suffix tree of the text \(T_1\) using
McCreight’s algorithm as described in the lecture. After that, we will add all the
suffixes of \(T_2\), then all the suffixes of \(T_3\), and so forth.
To add all the suffixes of \(T_2\), we first slowfind the text \(T_2\) (i.e., the string consisting
of the entire text). We will eventually fall off the existing tree. After that, we
can just run McCreight’s algorithm to insert the rest of the suffixes of \(T_2\). Note
that we will still have the invariant that after each step, except for the newly
created leaf and possibly its parent, every explicit node of the tree has a suffix
link to some other explicit node of the tree. The rest of the proof of correctness
and runtime of McCreight’s algorithm is still the same. Specifically, the time
to insert all the suffixes of \(T_2\) in the common suffix tree will be \(O(|T_2|)\).
In a similar way, we add all the suffixes of \(T_3\), then all the suffixes of \(T_4\), and so
forth.
Total running time will be \(O(|T_1| + |T_2| + \cdots + |T_n|)\).
(b) Let s be the string corresponding to the node N (i.e., s is the string on the path
from the root to N). Then, if s is a substring of a text \(T_i\), then there is a suffix
\(sw\$_i\) in \(T_i\). This means that \(\$_i\) is present on some edge in the subtree rooted
at N (more specifically, \(\$_i\) is the last symbol of an edge leading to a leaf in N’s
subtree).
Thus, s is a common substring of all n texts iff all \(\$_i\), i = 1, ..., n, are present in
the subtree rooted at N.
(c) Construct a common suffix tree of texts \(T_1\) and \(T_2\). From this suffix tree we
compute the longest common substring as follows.
1. Mark every suffix tree node that contains in its subtree a suffix ending in
\(\$_1\). This can be done in linear time by performing a postorder traversal of
the tree: when we examine a node, we have already checked all its children;
mark the node if any of its children is marked. Similarly, mark every suffix
tree node that contains in its subtree a suffix ending in \(\$_2\).
2. With one more tree traversal, find the deepest node marked with both features.
Just maintain a “current depth” counter; increment it by the length
of any edge traversed downward and decrement by the length of any edge
traversed upward. (Note that by the “depth” of a node N, we mean the
length of the string corresponding to N.) The string corresponding to that
deepest node is the longest common substring.
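
The marking idea from parts (b) and (c) can be sketched on a small scale as follows. To stay short, the sketch builds a naive uncompressed trie of all suffixes of both texts (quadratic time and space) instead of a linear-time suffix tree built with McCreight’s algorithm, and it folds the marking and the search for the deepest doubly marked node into one postorder traversal. The edge labels "$1" and "$2" stand in for the two terminators, and the function name is an assumption.

```python
def longest_common_substring(t1, t2):
    # Naive generalized suffix trie: each text's suffixes are inserted
    # separately and closed with that text's own terminator edge.
    root = {}
    for text, end in ((t1, "$1"), (t2, "$2")):
        for i in range(len(text)):
            node = root
            for ch in text[i:]:
                node = node.setdefault(ch, {})
            node[end] = {}

    best = ""

    def visit(node, path):
        # Postorder: collect which terminators occur below this node.
        nonlocal best
        marks = set()
        for label, child in node.items():
            if label in ("$1", "$2"):
                marks.add(label)
            else:
                marks |= visit(child, path + label)
        # A node marked by both terminators spells a common substring.
        if marks == {"$1", "$2"} and len(path) > len(best):
            best = path
        return marks

    visit(root, "")
    return best

print(longest_common_substring("banana", "ananas"))   # anana
```

Because the terminator labels are two characters long, they cannot collide with any single text character in the trie.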
