Advanced Search On Linear Data Structures: Li Yin February 8, 2020
www.liyinscience.com
As the name suggests, the two pointers technique involves two pointers that
start and move in one of the following two patterns:
sub-spaces exist: from start index to i ([0, i]), from i to j ([i, j]), and from
j to the end index ([j, n]).
Even though the slow-faster pointers technique is rarely given a formal
introduction in books, it is widely used in algorithms. In sorting, Lomuto’s
partition in QuickSort uses the slow-faster pointers to divide the whole region
into three parts according to the comparison result against the pivot: the
Smaller Items region, the Larger Items region, and the unrestricted region.
In string pattern matching, it appears as the fixed sliding window and as the
one we will introduce in this chapter.
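To make the three regions of Lomuto's partition concrete, here is a minimal sketch. The function name lomuto_partition and the choice of the last item as the pivot are assumptions made for illustration; they are not taken from the text.

```python
def lomuto_partition(a, lo, hi):
    """Partition a[lo..hi] in place around the pivot a[hi] (assumed pivot choice).

    Returns the final index of the pivot.
    """
    pivot = a[hi]
    i = lo - 1  # slow pointer: last index of the smaller-items region
    for j in range(lo, hi):  # fast pointer: scans the unrestricted region
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]  # grow the smaller-items region by one
    # Place the pivot between the smaller-items and larger-items regions
    a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1
```

During the scan, a[lo..i] holds the smaller items, a[i+1..j-1] holds the larger items, and a[j..hi-1] is still unrestricted.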
In this section, we explain how two pointers work on two types of linear
data structures: Array and Linked List.
0.1.1 Array
Remove Duplicates from Sorted Array(L26)
Given a sorted array a = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4], remove the duplicates
in-place such that each element appears only once and return the new length.
Do not allocate extra space for another array; you must do this by modifying
the input array in-place with O(1) extra memory. In the given example,
there are 5 unique items in total and 5 is returned.
Analysis We set both the slower pointer i and the faster pointer j at the first
item in the array. Recall that the slow-fast pointers cut the space of the sorted
array into three parts. We can define them as:
In the process, we compare the items pointed to by the two pointers. Once these
two items are not equal, we have found a new unique item. We copy this unique
item at the faster pointer to the position right next to the slower pointer.
Afterwards, we move the slow pointer forward by one position so that later
duplicates of the copied value are skipped.
With our example, at first, i = j = 0, region one has one item, which is
trivially unique, and region two has zero items. Part of the process is
illustrated as:
i  j  [0, i]   [i+1, j]    process
0  0  [0]      []          item 0 == 0, j + 1 = 1
0  1  [0]      [0]         item 0 == 0, j + 1 = 2
0  2  [0]      [0, 1]      item 0 != 1, i + 1 = 1, copy 1 to index 1, j + 1 = 3
1  3  [0, 1]   [1, 1]      item 1 == 1, j + 1 = 4
1  4  [0, 1]   [1, 1, 1]   item 1 == 1, j + 1 = 5
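The process traced above can be sketched in Python as follows. This is a minimal sketch rather than the book's own listing; the function name remove_duplicates is an assumption.

```python
def remove_duplicates(a):
    """Remove duplicates from the sorted list a in place; return the new length."""
    if not a:
        return 0
    i = 0  # slow pointer: a[0..i] holds the unique items found so far
    for j in range(len(a)):  # fast pointer: scans every item once
        if a[j] != a[i]:
            # New unique item: advance the slow pointer and copy the item there
            i += 1
            a[i] = a[j]
    return i + 1
```

Only the first i + 1 items are meaningful after the call; the items behind them are leftovers from the original array.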
After calling the above function on our given example, array a becomes
[0, 1, 2, 3, 4, 2, 2, 3, 3, 4]. Check the source code for the whole visualized
process.
Input: s = 7, nums = [1, 4, 1, 2, 4, 3]
Output: 2
Explanation: the subarray [4, 3] has the minimal length under the
problem constraint.
However, we can use two pointers i and j (i ≤ j), both pointing at the
first item at the start. These two pointers define a subarray a[i : j + 1], and
we care about the region [i, j]. As we increase pointer j, we keep adding
positive items into the sum of the subarray, making the subarray sum
monotonically increasing. Conversely, if we increase pointer i, we remove
positive items from the subarray, making the sum of the subarray monotonically
decreasing. The detailed steps of the two pointers technique in this case are:
2. Get the optimal subarray for all subproblems (subarrays) that end with
the current j, which is e_0 at the moment. We do this by forwarding pointer
i this time to shrink the window size until sum ≥ s no longer holds.
Let’s assume pointer i stops at index s_0. Now, we have found the optimal
solution for the subproblems a[0 : i, 0 : j] (denoting subarrays with the
start point in range [0, i] and the end point in range [0, j]).
Because pointers i and j each move at most n steps, the total number of
operations is at most 2n, making the time complexity O(n). The above
question would be trivial if the maximum subarray length were asked.
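The expand-then-shrink procedure described above can be sketched as follows. This is a minimal sketch under two assumptions that are not stated in the text: the function name min_subarray_len, and the convention of returning 0 when no subarray reaches the sum s.

```python
def min_subarray_len(s, nums):
    """Length of the shortest contiguous subarray of positive numbers with sum >= s."""
    i = 0  # slow pointer: start of the window
    win_sum = 0
    best = float('inf')
    for j, x in enumerate(nums):  # fast pointer: end of the window
        win_sum += x  # expanding the window increases the sum
        while win_sum >= s:
            # Window [i, j] satisfies the condition: record it, then shrink
            best = min(best, j - i + 1)
            win_sum -= nums[i]
            i += 1
    return 0 if best == float('inf') else best  # 0 is an assumed convention
```

Each item is added to the window once and removed at most once, which matches the 2n bound on the total number of pointer moves.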
Analysis Applying two pointers, we take the region between pointers i and j
as our testing substring. For this problem, the condition on the window
[i, j] is that it contains all characters from T. The intuition is that we keep
expanding the window by moving j forward until all characters in T are
found. Afterwards, we contract the window so that we can find the minimum
window with the condition satisfied. Instead of using another data structure
to track the state of the current window, we can depict the pattern T as a
dictionary whose keys are the unique characters of T and whose values are
the number of occurrences of each character. We use another variable count
to track the number of unique characters. In all, they are used to track the
state of the moving window [i, j]: the value in the dictionary indicates how
many occurrences of a character the window is still short of, and count
represents how many unique characters are not yet fully found. We depict
the state in Fig. 2.
Along with the expanding and shrinking of the window that comes with the
movement of pointers i and j, we track the state with:
Part of this process with our example is shown in Fig. 3. And the Python
code is given as:
    from collections import Counter

    def minWindow(s, t):
        dict_t = Counter(t)   # occurrences of each character still missing
        count = len(dict_t)   # unique characters not yet fully found
        i, j = 0, 0
        ans = []              # all minimum windows found so far
        minLen = float('inf')
        while j < len(s):
            c = s[j]
            if c in dict_t:
                dict_t[c] -= 1
                if dict_t[c] == 0:
                    count -= 1
            # Shrink the window while it still covers t
            # (i <= j so that a window of a single character is handled too)
            while count == 0 and i <= j:
                curLen = j - i + 1
                if curLen < minLen:
                    minLen = curLen
                    ans = [s[i:j + 1]]
                elif curLen == minLen:
                    ans.append(s[i:j + 1])

                c = s[i]
                if c in dict_t:
                    dict_t[c] += 1
                    if dict_t[c] == 1:
                        count += 1
                i += 1

            j += 1
        return ans

Figure 3: The partial process of applying two pointers. The grey shaded
arrow indicates the pointer that is on the move.
Input: [1, 2, 3, 4, 5]
Output: Node 3 from this list (Serialization: [3, 4, 5])
Example 2 (even length):
Input: [1, 2, 3, 4, 5, 6]
Output: Node 4 from this list (Serialization: [4, 5, 6])
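The two examples above are consistent with moving a slow pointer one node at a time and a fast pointer two nodes at a time until the fast pointer runs off the list. The following is a minimal sketch of that idea; the ListNode class and the names middle_node and build_list are hypothetical helpers introduced for illustration.

```python
class ListNode:
    """Minimal singly linked list node (hypothetical helper)."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def build_list(values):
    """Build a linked list from a Python list and return its head."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def middle_node(head):
    """Return the middle node; for an even length, the second of the two middles."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next       # one step
        fast = fast.next.next  # two steps
    return slow
```

When the fast pointer reaches the end, the slow pointer has covered half the distance, so it sits on the middle node.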
When a linked list has a cycle, as shown in Fig. 5, iterating over the items
of the list will make the program stuck in an infinite loop. The pointer
starts from the head, traverses to the start of the loop, and then comes back
to the start of the loop again and continues this process endlessly. To avoid
being stuck in such a “trap”, we may have to solve the following three
problems:
1. Check if there exists a cycle.
2. Check where the cycle starts.
3. Remove the cycle once it is detected.
The solution follows exactly the way the slow and faster pointers traverse
the linked list in our last example. With the slow pointer iterating
one item at a time and the faster pointer moving at double pace, these two
pointers will definitely meet at one item in the loop. In our example, they
will meet at node 6. So, is it possible that they meet in the non-loop region
that starts from the head and ends at the start node of the loop? The answer
is no, because the faster pointer traverses the non-loop region only
once and it is always ahead of the slow pointer there, making it impossible to
meet in this region. This method is called Floyd’s Cycle Detection, aka
Floyd’s Tortoise and Hare Cycle Detection. Let’s see in more detail how
to solve the three problems mentioned above with this method.
Check Linked List Cycle(L141) Compared with the code in the last
example, we only need to check whether the slow and fast pointers are pointing
at the same node after each move: if they are, we are certain that there must
be a loop in the list and return True; if the fast pointer reaches the end of
the list, we return False.
    def hasCycle(head):
        slow = fast = head
        while fast and fast.next:
            slow = slow.next
            fast = fast.next.next
            if slow == fast:
                return True
        return False
For a given linked list, assume the slow and fast pointers meet at a node
somewhere in the cycle. As shown in Fig. 6, we denote three nodes: the head
(h), the start node of the cycle (s), and the meeting node in the cycle (m). We
denote the distance between h and s as x, the distance between s and m as y,
and the distance from m back to s as z. Because the faster pointer traverses
the list at double speed, when it meets up with the slow pointer,
the distance it has traveled (x + y + z + y) is two times the distance
traveled by the slow pointer (x + y):
x + y + z + y = 2(x + y)
From the above equation, we obtain x = z. Here x is the distance from the
head to the start node of the cycle, y is the distance from the start node
to the node where the slow and fast pointers meet, and z is the remaining
distance from the meeting point back to the start node. Therefore, after we
have detected the cycle as in the last example, we can reset the slow pointer
to the head of the linked list. Then we make the slow and the fast pointers
both traverse at the same pace, one node at a time, until they meet, at which
point we stop the traversal. The node where they stop is the start node
of the cycle. The code is given as:
    def detectCycle(head):
        slow = fast = head

        def getStartNode(slow, fast, head):
            # Reset slow pointer
            slow = head
            while fast and slow != fast:
                slow = slow.next
                fast = fast.next
            return slow

        while fast and fast.next:
            slow = slow.next
            fast = fast.next.next
            # A cycle is detected
            if slow == fast:
                return getStartNode(slow, fast, head)

        return None
Remove Linked List Cycle We can remove the cycle by redirecting the
last node in the cycle, which in the example in Fig. 5 is node 6, to an empty
node. Therefore, we have to modify the above code to make the slow and
fast pointers stop at the last node instead of the start node of the loop. This
subroutine is implemented as:
    def resetLastNode(slow, fast, head):
        slow = head
        if slow == fast:
            # The cycle starts at head: walk fast around to the last node
            while fast.next != head:
                fast = fast.next
        else:
            while fast and slow.next != fast.next:
                slow = slow.next
                fast = fast.next
        fast.next = None
The complete code to remove the cycle is provided in Google Colab together
with running examples.
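For readers without access to the notebook, the following is a self-contained sketch of cycle removal. It is not the author's complete code: the ListNode class and the name remove_cycle are hypothetical, and instead of stopping the two pointers one node early, this variant first finds the start node and then walks once around the cycle to reach the last node.

```python
class ListNode:
    """Minimal singly linked list node (hypothetical helper)."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def remove_cycle(head):
    """Break the cycle in place, if any; return True when a cycle was removed."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:  # the two pointers met inside the cycle
            break
    else:
        return False  # fast reached the end of the list: no cycle

    # Find the start node: reset slow to head, move both one node at a time
    slow = head
    while slow is not fast:
        slow = slow.next
        fast = fast.next
    start = slow

    # Walk once around the cycle to the last node, whose next is the start node
    last = start
    while last.next is not start:
        last = last.next
    last.next = None
    return True
```

Walking around the cycle costs at most one extra pass over the cycle, and it also covers the case where the cycle starts at the head.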
2. If t > a[i] + a[j], we have to increase the sum, and we can only do this by
moving pointer i forward.
3. If t < a[i] + a[j], we have to decrease the sum, and we can only do this by
moving pointer j backward.
    def twoSum(a, target):
        n = len(a)
        i, j = 0, n - 1
        while i < j:
            temp = a[i] + a[j]
            if temp == target:
                return [i, j]
            elif temp < target:
                i += 1
            else:
                j -= 1
        return []
However, the above code only returns 3, instead of 4 as shown in the example.
By printing out pointers i and j, we can see that the above code misses the
case (2, 4). Why? Because we only restrict the subarray sum in range [i, j] to
be smaller than or equal to S, while 0s might appear at the front or at the
rear of the subarray:
The solution is to add another pointer i_h to handle the missed case: when
the sum equals S, count the total occurrences of 0 at the front. Compared with
the above solution, the code differs only slightly, with the additional pointer
and one extra while loop to deal with this case. We also need to make sure
that i_h ≤ j; otherwise, the while loop would fail on an example with only
zeros and a target sum of 0.
    def numSubarraysWithSum(a, S):
        i, i_h, j = 0, 0, 0
        win_sum = 0
        ans = 0
        while j < len(a):
            win_sum += a[j]
            while i < j and win_sum > S:
                win_sum -= a[i]
                i += 1
            # Move i_h to count all zeros in the front
            i_h = i
            while i_h < j and win_sum == S and a[i_h] == 0:
                ans += 1
                i_h += 1

            if win_sum == S:
                ans += 1
            j += 1
        return ans
0.4 Summary
Two pointers is a powerful tool for solving problems on linear data structures,
such as “certain” subarray and substring problems as we have shown in the
examples. The “window” enclosed between the two pointers can be viewed
as a sliding window: it can slide forward as the slower pointer is
forwarded. Two important properties are generally required for this
technique to work:
monotonicity: moving the faster pointer and the slower pointer forward
results in opposite changes to the state. The same goes for the
substring problems: in the minimum window substring example, the
changes of the state, count and the values of the dictionary, are
monotonic, and each either increases or decreases with the movement
of the two pointers.
0.5 Exercises
1. 3. Longest Substring Without Repeating Characters