Stack: Last-In-First-Out (LIFO) Data Structure

Sungdeok (Steve) Cha

VinUni, CECS
Learning Objectives

• Understand the Last-In-First-Out (LIFO) data structure
• Understand when a stack is an effective choice
• Become familiar with built-in support for stacks (java.util.Stack class)
• Understand how a stack can be internally implemented (e.g., array vs linked list)
Stack

• LIFO (Last In First Out)
• Special case of an ordered (linear) list
  – insertion (push)
  – deletion (pop)
• Applications
  – Maze problem
  – Evaluation of expressions
Push, Pop, Empty, …

• Push 1, 2, 3, 4, 5 in order, then pop one

import java.util.Stack;

public class StackExample {
    public static void main(String[] args) {
        // Create a new stack
        Stack<Integer> s = new Stack<>();

        // Push elements onto the stack
        s.push(1);
        s.push(2);
        s.push(3);
        s.push(4);
        s.push(5);

        s.pop();

        // Pop (and print) the remaining elements from the stack
        while (!s.empty()) {   // s.isEmpty() ?
            System.out.println(s.pop());
        }
    }
}

! is logical not (negation)

[Figure: stack contents after Push 2, Push 3, Push 4, Push 5, and Pop — the top moves up with each push and back down after the pop]
Can you …?

• When using java.util.Stack
  – How can I find out the current size of the stack?
  – Can I sort elements in the stack in ascending or descending order?
  – Can I reverse the order of elements in the stack?
  – …
• Why?
  – To enhance your understanding of the data structure!
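A minimal sketch (not from the slides) of how these questions can be answered with java.util.Stack; since Stack extends Vector (a List), the Collections utilities apply directly:

import java.util.Collections;
import java.util.Stack;

public class StackQuestions {
    public static void main(String[] args) {
        Stack<Integer> s = new Stack<>();
        s.push(3); s.push(1); s.push(2);

        // Current size of the stack
        System.out.println("size = " + s.size());    // 3

        // Stack extends Vector, so Collections can sort or reverse it in place
        Collections.sort(s);                          // ascending from bottom to top
        Collections.reverse(s);                       // descending from bottom to top

        // Reversing with a second stack (works for any stack ADT)
        Stack<Integer> reversed = new Stack<>();
        while (!s.empty()) reversed.push(s.pop());
        System.out.println(reversed);
    }
}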
Stack Size?

Array-based Stack Implementation

• Why? Some programming languages (e.g., C) do not provide built-in stack support
  – Add elements from left to right
  – A variable, top or t, keeps track of the index of the top element

Algorithm size():
    return t + 1

Algorithm pop():
    if empty()            // t = -1?
        throw StackEmpty
    else
        t ← t − 1
        return S[t + 1]

S: [0] [1] [2] ... [t] ... [N-1]
Array-based Stack Implementation

• The array storing the stack elements may become full
  – Throw a StackFull exception?
    • No need to worry in java.util.Stack. Why?
  – Alternatively, the stack could be resized (e.g., reallocated) and the data copied

Algorithm push(o):
    if t = S.size() − 1
        throw StackFull
    else
        t ← t + 1
        S[t] ← o

S: [0] [1] [2] ... [t]
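A hedged Java sketch of the array-based stack above (names are my own); instead of throwing StackFull, push() doubles the array when it becomes full:

import java.util.Arrays;

public class ArrayStack {
    private int[] S = new int[4];
    private int t = -1;                       // index of the top element; -1 means empty

    public boolean empty() { return t == -1; }
    public int size()      { return t + 1; }

    public void push(int o) {
        if (t == S.length - 1)                // full: resize (reallocate and copy)
            S = Arrays.copyOf(S, 2 * S.length);
        S[++t] = o;
    }

    public int pop() {
        if (empty()) throw new RuntimeException("StackEmpty");
        return S[t--];
    }
}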
In a “static” environment (e.g., C) …

#include <stdio.h>
#include <stdlib.h>

int main() {
    int *arr;
    int size = 5;

    // Allocate initial memory
    arr = (int *)malloc(size * sizeof(int));
    if (arr == NULL) {
        printf("Memory allocation failed\n");
        return 1;
    }

    // Initialize the array
    for (int i = 0; i < size; i++) {
        arr[i] = i + 1;
    }

    // Print original array
    printf("Original array: ");
    for (int i = 0; i < size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");

    // Resize the array
    int new_size = 10;
    arr = (int *)realloc(arr, new_size * sizeof(int));
    if (arr == NULL) {
        printf("Memory reallocation failed\n");
        return 1;
    }

    // Initialize new elements
    for (int i = size; i < new_size; i++) {
        arr[i] = (i + 1) * 2;   // Just an example initialization
    }

    // Print resized array
    printf("Resized array: ");
    for (int i = 0; i < new_size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");

    // Free allocated memory
    free(arr);

    return 0;
}
Details, Details, …

• See the difference between the two versions of the resize step?

Version 1 (assigns the realloc result directly to arr):

    // Resize the array
    int new_size = 10;
    arr = (int *)realloc(arr, new_size * sizeof(int));
    if (arr == NULL) {
        printf("Memory reallocation failed\n");
        return 1;               // the original block is leaked: its address was just overwritten
    }

    // Initialize new elements
    for (int i = size; i < new_size; i++) {
        arr[i] = (i + 1) * 2;
    }

Version 2 (keeps the old pointer until realloc succeeds):

    // Resize the array
    int new_size = 10;
    int *newarr = (int *)realloc(arr, new_size * sizeof(int));
    if (newarr == NULL) {
        printf("Memory reallocation failed\n");
        return 1;               // arr is still valid here and could be freed or reused
    }
    arr = newarr;               // note: realloc itself releases or reuses the old block,
                                // so the old pointer must NOT be freed separately

    // Initialize new elements
    for (int i = size; i < new_size; i++) {
        newarr[i] = (i + 1) * 2;
    }

    // Print resized array
    printf("Resized array: ");
    for (int i = 0; i < new_size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");

    // Free allocated memory
    free(arr);

    return 0;
}
Array-based Stack Implementation

• Performance
  – Let N be the array size
    • Space usage: O(N) (an array of size N must be pre-allocated)
    • Each operation (e.g., push, pop, …) runs in O(1)
• Limitation
  – The maximum size of the stack must be defined a priori and cannot be changed
  – Trying to push a new element onto a full stack can cause a run-time error
Stack Implementation Using a Linked List

• Use a singly linked list
• The top element is stored at the first node of the list
  – Space: O(n) for n elements
    • Unlike an array, a pointer field is also required per node
  – Push and pop operations: O(1)

[Figure: t points to the first node; the nodes hold the elements]
Stack Implementation Using a Linked List

• How do you define the "top of stack" in the linked list?
  – What was the last item pushed onto the stack? Tiger? Horse?
• Which is the "better" (?) design? Why?

[Figure: a linked list of elements with t marking one end as the top]
Application: System Stack

• Locals, temporaries, parameters, return address, frame pointer
  – Procedure call stack
Call Stack and Return Address

Call Stack, Return Address, and Malware (e.g., worms)

Buffer Overflow Attacks

• Many others
  – Slammer Worm (2003)
  – Blaster Worm (2003)
  – Heartbleed (2014)
  – EternalBlue (2017)
  – …
• Don't be surprised if there is another buffer overflow attack tomorrow
Stack Application Example: Maze Problem

• 2D array to model the map
  – 0: open (can move), 1: blocked (cannot move)
  – int maze[i][j]
Enter 0 1 0 0 0 1 1 0 0 0 1 1 1 1 1
1 0 0 0 1 1 0 1 1 1 0 0 1 1 1
0 1 1 0 0 0 0 1 1 1 1 0 0 1 1
1 1 0 1 1 1 1 0 1 1 0 1 1 0 0
1 1 0 1 0 0 1 0 1 1 1 1 1 1 1
0 0 1 1 0 1 1 1 0 1 0 0 1 0 1
0 0 1 1 0 1 1 1 0 1 0 0 1 0 1
0 1 1 1 1 0 0 1 1 1 1 1 1 1 1
1 1 0 0 0 1 1 0 1 1 0 0 0 0 0
0 0 1 1 1 1 1 0 0 0 1 1 1 1 0
0 1 0 0 1 1 1 1 1 0 1 1 1 1 0 Exit
Maze Problem

Maze Problem (COMP1020 Scope)

Maze Problem

• Items
  – (x, y, dir): at (x, y), try to move in direction dir
• Strategy
  – Try each direction that has not yet been visited
  – When moving forward, use a stack to store the current location and the next search direction
  – If all directions are blocked, pop the previous location and next search direction and continue from there
Maze Problem

• Allowable moves?
  – If diagonal moves are allowed, cell [i][j] has eight neighbors:
      [i-1][j-1] (NW)  [i-1][j] (N)  [i-1][j+1] (NE)
      [i][j-1]   (W)                 [i][j+1]   (E)
      [i+1][j-1] (SW)  [i+1][j] (S)  [i+1][j+1] (SE)

struct offsets {
    int a, b;
};
enum directions {N, NE, E, SE, S, SW, W, NW};
offsets move[8];

Table of moves:
    q    move[q].a   move[q].b
    N       -1           0
    NE      -1           1
    E        0           1
    SE       1           1
    S        1           0
    SW       1          -1
    W        0          -1
    NW      -1          -1

Move from maze[i][j] to the SW neighbor maze[g][h]:
    g = i + move[SW].a;
    h = j + move[SW].b;
chatGPT’s
Maze Code in Java

push

pop
Different Behavior?

• How about code readability? Maintainability?

Arrays.fill (flags, false);


Do It Yourself
Strongly Recommended!!
private static boolean dfs(int[][] maze, int x, int y, int endX, int endY,
                           List<int[]> path, boolean[][] visited) {

    // Check if out of bounds, a wall, or already visited
    if (x < 0 || y < 0 || x >= maze.length || y >= maze[0].length
            || maze[x][y] == 1 || visited[x][y])
        return false;

    visited[x][y] = true;            // Mark the current cell as visited
    path.add(new int[]{x, y});       // add? DFS? BFS?

    if (x == endX && y == endY)      // Check if we've reached the destination
        return true;

    // Explore all possible directions
    for (int[] dir : DIRECTIONS) {
        int newX = x + dir[0];
        int newY = y + dir[1];
        if (dfs(maze, newX, newY, endX, endY, path, visited))
            return true;
    }

    // If no valid path, backtrack
    path.remove(path.size() - 1);    // add? DFS? BFS?
    return false;
}

Trace the program execution, draw your own stack, and see how the stack is used to find a path if there is one.
DIY

private static boolean dfs(int[][] maze, int x, int y, int endX, int endY,
                           List<int[]> path, boolean[][] visited) {

    // Check if out of bounds, a wall, or already visited
    if (x < 0 || y < 0 || x >= maze.length || y >= maze[0].length
            || maze[x][y] == 1 || visited[x][y])
        return false;

    visited[x][y] = true;            // Mark the current cell as visited
    path.add(0, new int[]{x, y});    // ?

    if (x == endX && y == endY)      // Check if we've reached the destination
        return true;

    // Explore all possible directions
    for (int[] dir : DIRECTIONS) {
        int newX = x + dir[0];
        int newY = y + dir[1];
        if (dfs(maze, newX, newY, endX, endY, path, visited))
            return true;
    }

    // If no valid path, backtrack
    path.remove(0);                  // ?
    return false;
}

• Trace the program execution, draw your own stack, and see how the stack is used to find a path if there is one
• Trace the program on an input where there exists no path
Logical errors can sometimes
be really tricky to find
Another Stack Application: Evaluation of Expressions

• Often discussed together with the "expression tree"
  – Push operands onto the stack until an operator is reached
  – Pop two operands from the stack (e.g., for binary operations)
  – Push the result back onto the stack

https://www.calcont.in/Calculator/Postfix_calculator/
Postfix Notation

• Benefits
– No parenthesis
– No operator priority
– Simple to evaluate (left to right scan)

• Example
– Infix: A/B-C+D*E-A*C
– Postfix: AB/C-DE*+AC*-
• Another Example
– Infix: “A*B/C”
– Postfix: “AB*C/”
– Prefix: “/*ABC”
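As an illustration (a sketch, not from the slides), a left-to-right postfix evaluation using java.util.Stack, assuming single-digit operands and the four basic binary operators:

import java.util.Stack;

public class PostfixEval {
    // Evaluates a postfix expression such as "23*4+" (single-digit operands assumed).
    static int eval(String postfix) {
        Stack<Integer> s = new Stack<>();
        for (char c : postfix.toCharArray()) {
            if (Character.isDigit(c)) {
                s.push(c - '0');                 // operand: push its value
            } else {
                int right = s.pop();             // pop two operands (order matters for - and /)
                int left  = s.pop();
                switch (c) {
                    case '+': s.push(left + right); break;
                    case '-': s.push(left - right); break;
                    case '*': s.push(left * right); break;
                    case '/': s.push(left / right); break;
                }
            }
        }
        return s.pop();                          // the single remaining value is the result
    }

    public static void main(String[] args) {
        System.out.println(eval("23*4+"));       // (2*3)+4 = 10
    }
}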
Stack Applications
Valid/Balanced
Parenthesis Problem
• The Balanced Parentheses problem is a classic problem in computer
science and involves checking whether a given string containing
different types of parentheses (such as round brackets (), curly
braces {}, and square brackets []) is balanced or not.
– A string is considered "balanced" if every opening bracket has a
corresponding closing bracket in the correct order.
– String test1 = "{ [ ( ) ( ) ] }"; // Balanced (space added for clarity)
– String test2 = "{ [ ( ] ) }"; // Not Balanced
– String test3 = "( ( ( ) ) )"; // Balanced
– String test4 = "( ( ( ) )"; // Not Balanced
Potential Stack Questions

• Use Search engine, chatGPT, and other sources


• Some candidates are…
– Check if an expression has balanced parentheses using a stack
• Input : "((a+b)*(c-d))" → Output: True
– Write a Java method to reverse a stack
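A hedged sketch for the first candidate question (balanced parentheses using a stack); the method and class names are my own:

import java.util.Stack;

public class BalancedParens {
    // True if every opening bracket has a matching closing bracket in the correct order.
    static boolean isBalanced(String s) {
        Stack<Character> stack = new Stack<>();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);                               // remember the opener
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.empty()) return false;             // closer with no opener
                char open = stack.pop();
                if ((c == ')' && open != '(') ||
                    (c == ']' && open != '[') ||
                    (c == '}' && open != '{')) return false; // mismatched pair
            }
        }
        return stack.empty();                                // any unmatched openers left?
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("{[()()]}"));          // true
        System.out.println(isBalanced("{[(])}"));            // false
        System.out.println(isBalanced("((a+b)*(c-d))"));     // true
    }
}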
Queue and Priority Queue

Sungdeok (Steve) Cha

VinUni, CECS

Learning Objectives

• Understand how a queue works as a First-In-First-Out (FIFO) data structure, and how a priority queue differs
• Become familiar with built-in Java support
  – java.util.Queue, java.util.PriorityQueue
• Understand when a queue is a good choice in problem solving
• Understand how PriorityQueue differs from Queue in behavior
• Understand how a Queue can be implemented
  – Array vs LinkedList, advantages and disadvantages
Queue

• Another common data structure used in programs to hold objects
• First-in-first-out (FIFO)
  – New elements are inserted at the end (rear/tail) of the list and are deleted from the beginning (front/head) of the list

[Figure: a queue with front/head on one end and rear/tail on the other]
Queue Methods

• Among numerous examples

Priority Queue

• Operates like a queue except that elements are internally stored ordered by priority
  – An element's priority determines the order in which elements are removed
  – ONLY supports comparable data (i.e., elements must be orderable using well-defined criteria)
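A small sketch (not from the slides) contrasting the removal order of java.util.Queue (via LinkedList) and java.util.PriorityQueue:

import java.util.LinkedList;
import java.util.PriorityQueue;
import java.util.Queue;

public class QueueVsPriorityQueue {
    public static void main(String[] args) {
        // FIFO queue: elements come out in the order they went in
        Queue<Integer> q = new LinkedList<>();
        q.offer(30); q.offer(10); q.offer(20);
        while (!q.isEmpty()) System.out.print(q.poll() + " ");   // 30 10 20
        System.out.println();

        // Priority queue: elements come out by priority (natural order = smallest first)
        Queue<Integer> pq = new PriorityQueue<>();
        pq.offer(30); pq.offer(10); pq.offer(20);
        while (!pq.isEmpty()) System.out.print(pq.poll() + " "); // 10 20 30
        System.out.println();
    }
}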
AI-generated Tutorial

• Impressive!
chatGPT is
Equally Impressive
DIY:
Try to do the same
with MaxHeap, too
More Subtle Example

• What’s “strange (?)” about this program?


• Will it compile? Will it run? Will it crash?

26
More Subtle Example

This code is perhaps not


highly recommended, but it
is important to understand
why the code behaved the
way it did

27
Naïve Implementation of
Queue
Risks of Naïve
Implementation of Queue
1. Fixed Size Limitation
•Problem: Arrays have a fixed size, which means the queue can only hold a
predetermined number of elements.
•Solution: …
2. Wasted Space
•Problem: When elements are dequeued, the space they occupied in the array is not
reused.
•Solution: ...
3. Resizing Overhead
•Problem: If you choose to resize the array when it becomes full, this operation can be
costly.
•Solution: …
4. Complexity of Circular Queue
•Problem: Implementing a circular queue can be more complex than a simple array-based
queue.
•Solution: …
5. Index Management
•Problem: Keeping track of the front and rear indices can be tricky, especially when the
queue wraps around.
•Solution: …
Naïve Implementation of
Queue using Array
Array Implementation of
Circular Queue
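The slide figures are not reproduced here; as a hedged illustration (names are my own), a minimal circular queue shows how the front and rear indices wrap around with the modulo operator:

// Fixed-capacity circular queue sketch (illustrative, not the slide's code).
public class CircularQueue {
    private final int[] data;
    private int front = 0;   // index of the element to dequeue next
    private int count = 0;   // number of stored elements

    public CircularQueue(int capacity) { data = new int[capacity]; }

    public boolean isEmpty() { return count == 0; }
    public boolean isFull()  { return count == data.length; }

    public void enqueue(int x) {
        if (isFull()) throw new IllegalStateException("queue is full");
        int rear = (front + count) % data.length;   // wrap around the end of the array
        data[rear] = x;
        count++;
    }

    public int dequeue() {
        if (isEmpty()) throw new IllegalStateException("queue is empty");
        int x = data[front];
        front = (front + 1) % data.length;          // wrap around the end of the array
        count--;
        return x;
    }
}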
Why Should I bother?
I am using Java!
• You probably don’t have to worry about it when
using methods defined in java.util.LinkedList
class
– It is a convenient and dynamic data structure!!
– You still want to make sure that “corner cases” are properly
handled
– Corner cases (or rare/extreme inputs) are notoriously difficult
to test properly
• Some programming languages (e.g., C) lack
built-in support for Abstract Data Types
Potential Questions
Potential Questions
Sorting Algorithms

Sungdeok (Steve) Cha

VinUni, CECS

1
Learning Objectives

• Understand Big O notation


• Understand differences in approaches among
various techniques
– Evaluate pros and cons of different sorting techniques
– Which technique works well when? Why?
– Which technique does NOT work well when? Why?
• Understand which sorting techniques are used
in practice (e.g., built-in Java classes)
• Understand how “creative” an idea can be even
to the well-understood problem such as sorting
Topics

• Big O notations
• Linear Search vs Binary Search
• Bubble Sort
– Brute-force (straightforward) implementation
– Possibilities to improve brute-force bubble sort algorithms
• Selection Sort
• Insertion Sort
• Merge Sort
• Shell Sort
• Quick Sort
Comparison: Linear Search vs Binary Search

• Time complexity: best case, average case, worst case
• O(N) vs O(log N)
• "Management" of an ordered array, however, is more complex

"Big O", Omega, Theta Notations

• Big O is enough for now ;)
Criteria for Time Complexity Analysis

There are diverse criteria; examples:
- # of for/while loop iterations
- # of visits to a particular line
- # of calls to a particular function
- …

sample(n):
    sum ← 0
    for i ← 1 to n-1
        for j ← i+1 to n
            sum ← sum + A[i]*A[j]
    return sum

factorial(n):
    if n = 0 or n = 1 return 1
    else return n * factorial(n-1)
Examples of Running Time

sample1(A[], n):
    k ← n / 2
    return A[k]
→ constant time, independent of n

sample2(A[], n):
    sum ← 0
    for i ← 1 to n
        sum ← sum + A[i]
    return sum
→ proportional to n

Examples of Running Time

sample3(A[], n):
    sum ← 0
    for i ← 1 to n
        for j ← 1 to n
            sum ← sum + A[i]*A[j]
    return sum
→ proportional to n²

sample4(A[], n):
    sum ← 0
    for i ← 1 to n
        for j ← 1 to n
            k ← maximum among n/2 randomly chosen elements
            sum ← sum + k
    return sum
→ why proportional to n³?
Examples of Running Time

sample5(A[], n):
    sum ← 0
    for i ← 1 to n
        for j ← i to n
            sum ← sum + A[i]*A[j]
    return sum
→ proportional to n²

factorial(n):                      T(1) = c
    if (n = 1) return 1            T(n) = c + T(n-1)
    return n * factorial(n-1)
→ proportional to n

Formal mathematical analysis is outside the scope of COMP 1020
Search Algorithms and "Factors"

• Applicable algorithms vary depending on how the data is organized
  – Ordered array (e.g., ascending or descending) vs unordered
Linear Search on an Unordered Array

public static int linearSearch(int[] array, int target) {
    for (int i = 0; i < array.length; i++) {
        if (array[i] == target) {
            return i;
        }
    }
    return -1; // Return -1 if not found
}

public static void main(String[] args) {
    // Unordered array
    int[] array = {12, 5, 7, 9, 3, 8, 14, 6};

    // Target value to search for
    int target = 8;

    int result = linearSearch(array, target);

    if (result != -1)
        System.out.println("Element " + target + " found at index: " + result);
    else
        System.out.println("Element " + target + " not found in the array.");
}
Binary Search

public class BinarySearchExample {

    public static void main(String[] args) {
        // Sorted array (ordered)
        int[] array = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19};
        int target = 7; // Target value to search for

        int result = binarySearch(array, target);
        if (result != -1) {
            System.out.println("Element " + target + " found at index: " + result);
        } else {
            System.out.println("Element " + target + " not found in the array.");
        }
    }

    public static int binarySearch(int[] array, int target) {
        int low = 0;
        int high = array.length - 1;

        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (array[mid] == target)
                return mid;
            else if (array[mid] > target)
                high = mid - 1;
            else
                low = mid + 1;
        }
        return -1;
    }
}
Recursive Binary Search
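The slide's recursive version is shown as an image; a hedged sketch along the same lines (assuming the array is sorted in ascending order) might look like this:

// Recursive binary search sketch.
public static int binarySearch(int[] array, int target, int low, int high) {
    if (low > high)
        return -1;                                           // empty range: not found
    int mid = low + (high - low) / 2;
    if (array[mid] == target)
        return mid;
    else if (array[mid] > target)
        return binarySearch(array, target, low, mid - 1);    // search the left half
    else
        return binarySearch(array, target, mid + 1, high);   // search the right half
}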
However, …

• Be sure to understand exactly what the "contract" states
  – It promises a negative number when the key is not found, but not always -1
• Don't blindly assume it will always be -1

No surprise?

• Java's built-in binarySearch assumes, but does not verify, that the array is sorted in ascending order
Lots of Sorting Algorithms

• What’s your choice? Why?


Bubble Sort in Java

public class BubbleSort {

    // Method to perform Bubble Sort
    public static void bubbleSort(int[] arr) {
        int n = arr.length;

        // Traverse through all array elements
        for (int i = 0; i < n - 1; i++) {
            // Last i elements are already sorted
            for (int j = 0; j < n - i - 1; j++) {
                // Swap if elements are out of order
                if (arr[j] > arr[j + 1]) {
                    // Swap arr[j] and arr[j + 1]
                    int temp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = temp;
                }
            }
        }
    }

    // Method to print the array
    public static void printArray(int[] arr) {
        for (int num : arr) {
            System.out.print(num + " ");
        }
        System.out.println();
    }

    public static void main(String[] args) {
        // Example array to be sorted
        int[] arr = {64, 34, 25, 12, 22, 11, 90};
        System.out.println("Unsorted Array:");
        printArray(arr);

        // Sorting the array using Bubble Sort
        bubbleSort(arr);

        System.out.println("Sorted Array:");
        printArray(arr);
    }
}
Which one is correct?

• The two versions below are identical except for the initialization of the inner loop.

Version A:

    for (int i = 0; i < n - 1; i++) {
        // Last i elements are already sorted
        for (int j = 0; j < n - i - 1; j++) {
            // Swap if elements are out of order
            if (arr[j] > arr[j + 1]) {
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }

Version B:

    for (int i = 0; i < n - 1; i++) {
        // Last i elements are already sorted
        for (int j = i; j < n - i - 1; j++) {
            // Swap if elements are out of order
            if (arr[j] > arr[j + 1]) {
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
Complexity Analysis of Bubble Sort

• Can you do it yourself?
  – Is the best case really O(N)?

public class BubbleSort {

    // Method to perform Bubble Sort
    public static void bubbleSort(int[] arr) {
        int n = arr.length;

        // Traverse through all array elements
        for (int i = 0; i < n - 1; i++) {
            // Last i elements are already sorted
            for (int j = 0; j < n - i - 1; j++) {
                // Swap when necessary
                if (arr[j] > arr[j + 1]) {
                    // Swap arr[j] and arr[j + 1]
                    int temp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = temp;
                }
            }
        }
    }
Compare

Improvement Possible?

• The improved version below keeps a swapped flag and stops early when a whole pass makes no swap (compare against the basic version shown on the previous slides):

public class BubbleSort {
    public static void bubbleSort(int[] arr) {
        int n = arr.length;
        boolean swapped;

        for (int i = 0; i < n - 1; i++) {
            swapped = false;
            for (int j = 0; j < n - i - 1; j++) {
                if (arr[j] > arr[j + 1]) {
                    // Swap elements
                    int temp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = temp;
                    swapped = true;
                }
            }
            // If no swap occurred, the array is sorted
            if (!swapped)
                break;
        }
    }
}
Is it even possible to do it more efficiently?

• Impressively enough, chatGPT suggested …

chatGPT says …

• That's why chatGPT is a software developer's best friend but can be the worst enemy at the same time
• Can you explain the rationale behind this recommendation?
Selection Sort

Compare

Insertion Sort

• A simple sorting algorithm
• Partition the array into two parts: sorted and unsorted
• Values from the unsorted part are picked and placed at the correct position in the sorted part
• Drawback: when an element has to be moved far ahead, many movements are involved

Insertion Sort

Selection vs Insertion Sort

DIY

• Can you modify the code to sort in descending order? Where do swaps (or shifts) occur?
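The slides' code is shown as images; as a hedged sketch, insertion sort in ascending order looks like this (flipping the comparison sorts in descending order, which answers the DIY question):

// Insertion sort sketch (ascending order).
public static void insertionSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int key = arr[i];          // next value from the unsorted part
        int j = i - 1;
        // Shift larger elements of the sorted part one position to the right
        while (j >= 0 && arr[j] > key) {   // use arr[j] < key for descending order
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;          // place key at its correct position
    }
}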
Merge Sort

Merge Operations

1. Split the array into two halves.
2. Sort the left half. How? Recursively!
3. Sort the right half.
4. Merge the two.

• Merge Sort: just don't memorize; convince yourself

Recursive MergeSort
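A hedged sketch of the recursive merge sort described above (the slide's own code is not reproduced here; names are my own):

import java.util.Arrays;

public class MergeSortSketch {
    // 1. Split, 2-3. sort each half recursively, 4. merge.
    public static void mergeSort(int[] arr) {
        if (arr.length <= 1) return;                      // already sorted
        int mid = arr.length / 2;
        int[] left  = Arrays.copyOfRange(arr, 0, mid);    // split into two halves
        int[] right = Arrays.copyOfRange(arr, mid, arr.length);
        mergeSort(left);                                  // sort the left half
        mergeSort(right);                                 // sort the right half
        merge(arr, left, right);                          // merge the two sorted halves
    }

    private static void merge(int[] dest, int[] left, int[] right) {
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)       // take the smaller front element
            dest[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length)  dest[k++] = left[i++];   // copy any leftovers
        while (j < right.length) dest[k++] = right[j++];
    }
}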
Beyond the
Scope of COMP 1020
• Good example where curiosity and creativity
result in an effective algorithm
FYI
Shell Sort

• Don’t be surprised if chatGPT appears


unsure
Shell Sort

Outside the scope of


COMP 1020
Why Study So Many Sorting Algorithms?

• Isn't Bubblesort (or Quicksort, Mergesort) – whatever your favorite – enough?
• Understand how a creative idea can lead to "smart (?)" algorithms
  – More efficient algorithms come only from curious minds
• Develop the skill to critically evaluate various alternatives
Shell Sort

• Visualization tool may


help you better
understand the
concept
• What’s the key idea
behind shell sort
algorithm?
– When is shell sort effective
– Compare it against
insertion sort (or
bubble/selection sort)
DIY. Using a larger array

How about [5, 2, 9, 1, 7, 3, 4, 5, 2] ?


[5, 2, 9, 1, 7, 3, 4, 5, 2]

• Shell sort, step-by-step


Shell Sort and Java
Shell Sort

• What’s the key idea behind shell sort


algorithm?
– When is shell sort effective?
– Compare it against insertion sort (or bubble/selection sort)
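The key idea of shell sort is to run insertion sort over elements that are far apart first, shrinking the gap, so that items move long distances cheaply; a minimal sketch with the classic n/2, n/4, …, 1 gap sequence (the slides may use a different sequence) is shown below:

// Shell sort sketch using the simple gap sequence n/2, n/4, ..., 1.
public static void shellSort(int[] arr) {
    int n = arr.length;
    for (int gap = n / 2; gap > 0; gap /= 2) {
        // Gapped insertion sort: elements gap positions apart form a subsequence
        for (int i = gap; i < n; i++) {
            int key = arr[i];
            int j = i;
            while (j >= gap && arr[j - gap] > key) {
                arr[j] = arr[j - gap];   // shift within the gapped subsequence
                j -= gap;
            }
            arr[j] = key;
        }
    }
}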
Quick sort algorithm

• Perhaps the best known sorting algorithm

54
Sir C.A.R. Hoare
Naïve Quick Sort

• O(N2) time complexity in the worst case


– If we were to always pick the first element as pivot value
[1, 2, 3, 4, 5, 6, 7, 8]
– Or, if we choose the last element as pivot
[8, 7, 6, 5, 4, 3, 2, 1]
• Ideally, we want left and right partition to be
of the similar size
– In order to achieve O(N log N) complexity
How to pick pivot?

• A quick sort needs to select a value known as


pivot to assist with splitting the list.
• Some of the pivot selection methods are:
1. Always pick first element as pivot.
2. Always pick last element as pivot.
3. Pick a random element as pivot.
4. Pick median as pivot.
– What’s the cost of choosing the median value?
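A hedged quicksort sketch using the last element as the pivot (option 2 in the list above); other pivot choices only change the pivot-selection step:

// Quicksort sketch (Lomuto partition, last element as pivot).
public static void quickSort(int[] arr, int low, int high) {
    if (low >= high) return;                 // 0 or 1 element: nothing to do
    int pivot = arr[high];                   // pivot choice: last element
    int i = low - 1;                         // boundary of the "smaller than pivot" region
    for (int j = low; j < high; j++) {
        if (arr[j] < pivot) {
            i++;
            int t = arr[i]; arr[i] = arr[j]; arr[j] = t;    // move small element left
        }
    }
    int t = arr[i + 1]; arr[i + 1] = arr[high]; arr[high] = t;   // place the pivot
    int p = i + 1;
    quickSort(arr, low, p - 1);              // sort the left partition
    quickSort(arr, p + 1, high);             // sort the right partition
}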
Quicksort and
Pivot Selection
Quicksort and
Pivot Selection
Quicksort and
Pivot Selection
Timsort?
Beyond the Scope of COMP 1020

• Just for your information. Stay curious, always


willing to learn
Your Choice?
Oracle’s choice?
• Be sure to
accurately
understand
the definition
of “stability”
– Understand why
“unstable” result
may be obtained
Potential Questions
Potential Questions
Tree and Binary Search Tree

Sungdeok (Steve) Cha

VinUni, CECS

1
Learning Objectives

• Tree structure is very frequently used


– May provide O(log n) performance in search, insertion, and
deletion
• Understand binary search tree algorithms
– Other types of trees (e.g., B-trees) are sometimes used.
Especially in DBMS implementations
• Understand how tree, abstract data structure,
is implemented
– Doubly linked list is usually used. Understand why.
– Array may be used in certain types of trees. It is called “Heap”
• Understand the need for height-balanced tree
– AVL tree
– Understand why Red-Black tree is used in practice
Tree

• Non-linear data structure organized based on a hierarchical relationship
  – A node has one predecessor (i.e., parent), but can have multiple successors (i.e., children)
    • For a binary tree, up to two
  – The root of the tree is at the top!
Tree Terminology

• Root, Parent, Child


• Siblings: nodes with same parent
• Leaf vs internal nodes
– Leaf node has no children
• Internal nodes: non-leaf nodes

4
Level, Depth, Height, …

5
Level, Depth, Height, …
DIY
Don’t Get Confused !!!

“Appearance”
(Not about data contents)

Essential Topic (On how


to store and organize
data)
(Not now…)
Full Binary Tree

• Has nothing to
do with
“contents”
• We will learn
later that it is
NOT a binary
search tree
– OK if it does not
make sense now
Complete Binary Tree

• Useful as it can be easily stored in an array
  – The locations of child nodes can be computed
  – Root at [0]
• That's why some tree-based data structures (e.g., MaxHeap, MinHeap) enforce the complete-binary-tree requirement
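For a 0-based array layout (root at index 0), the standard index arithmetic is:

// Complete binary tree stored in an array, root at index 0.
int left   = 2 * i + 1;       // left child of node i
int right  = 2 * i + 2;       // right child of node i
int parent = (i - 1) / 2;     // parent of node i (for i > 0)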
Heap

• Do you see why an array is used to represent a heap structure?

[Figure: one example tree that is a valid MaxHeap, and a second tree that is not a MaxHeap — why not?]
Full/Complete Tree and Heap

[Figure: four example trees, each classified as follows]
  Tree 1: Full binary tree? Y   Complete binary tree? Y   MaxHeap? Y   MinHeap? N
  Tree 2: Full binary tree? Y   Complete binary tree? N   MaxHeap? N   MinHeap? N
  Tree 3: Full binary tree? Y   Complete binary tree? Y   MaxHeap? N   MinHeap? Y
  Tree 4: Full binary tree? N   Complete binary tree? Y   MaxHeap? N   MinHeap? N
0-based? 1-based?

• Nothing terribly complex or important


Stay
alert!!!
Perfect Binary Tree
Hmm…

• Is full binary tree always a complete binary


tree?
• Is complete binary tree always a full binary
tree?
• Is perfect binary tree a full binary tree?
• …
Skewed Binary Tree

• Becomes essentially
equivalent to linked
list
• Applications do not
have control over
input orders
– Need smart algorithms to
prevent tree from being
skewed
– AVL tree or Red-Black tree
Binary Search Tree (BST)

• Intuitive algorithm
  – Could result in a skewed (i.e., inefficient) BST

LinkedList and Tree Structure

Advantages:
• Dynamic size allocation
• Efficient insertions and deletions
• Flexible structure
• Memory utilization
• Easy implementation of parent and child links

Disadvantages:
• Extra memory overhead
• Slower access time
• Complex implementation
• Cache inefficiency
• Garbage collection issues (in some languages)
  – Not in Java
  – Manual deallocation required in C or C++
BST and Java Code
BST and Java Code
Java and Binary Tree

We will
study
“simpler”
AVL tree
instead
Binary Tree Traversal

• Preorder
• Inorder
• Postorder
One of KU’s Data Structure
Exam Question
• Credit to Prof. Won-Ki Jeong
• Draw a binary tree whose inorder and
postorder traversal sequences are as follows:
– Inorder : “A D E F C I H B G”
– Postorder : “A D F I H C G B E”
Binary Search Tree (BST)

• Left child contains smaller values than the


parent
• Right child contains larger values than the
parent
• Duplication is NOT allowed
• Intuitive
algorithm ;)
Insertion in Binary Search Trees

insert(t, x):                      ◀ t: root node, x: key to insert
    if (t = NIL)
        r.key ← x                  ◀ r: new node
        return r
    if (x < t.key)
        t.left ← insert(t.left, x)
        return t
    else
        t.right ← insert(t.right, x)
        return t

[Figure: example BST with root 30, children 20 and 40, and 25 as the right child of 20]
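A hedged Java rendering of the pseudocode above (class and field names are my own):

class Node {
    int key;
    Node left, right;
    Node(int key) { this.key = key; }
}

class BST {
    Node root;

    // Recursive insertion: go left for smaller keys, right otherwise.
    Node insert(Node t, int x) {
        if (t == null) return new Node(x);   // empty subtree: create the new node here
        if (x < t.key)
            t.left = insert(t.left, x);
        else
            t.right = insert(t.right, x);    // duplicates would also go right in this sketch
        return t;
    }

    void insert(int x) { root = insert(root, x); }
}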
Examples of Insertion

[Figure: six snapshots (a)–(f) of a BST rooted at 30 as keys such as 20, 25, 40, 10, and 35 are inserted]
Deletion in Binary Search Trees

There are three cases
  – Case 1: r is a leaf node
  – Case 2: r has only one child
  – Case 3: r has two children

(t: root node, r: node to be deleted)

Deletion in Binary Search Trees

sketch_delete(t, r):               ◀ t: root node, r: node to delete
    if (r is a leaf node)                    ◀ Case 1
        Just throw away r
    else if (r has only one child)           ◀ Case 2
        Let r's parent link to the (only) child of r
    else                                     ◀ Case 3
        Remove the minimum node s of r's right subtree,
        and copy the key of s to node r
Example: Case 1

[Figure: (a) r (= 18) has no child; (b) simply throw away r]

Example: Case 2

[Figure: (a) r (= 30) has only one child; (b) remove r; (c) put r's (only) child at r's location]

Example: Case 3

[Figure: (a) find r's (= 28) inorder successor s (= 30); (b) remove r (imaginary removal);
 (c) move s to r's location (copy s to r); (d) move s's (only) child to s's location]
DIY

• Insert 2 ?
• Insert 55 ?
– Tree might become “less
balanced”, but it is still a
BST
• Delete 30 ?
• Delete 52 ?
• Delete 50 ?
– Choose inorder
predecessor?
– Choose inorder
successor?
Another KU’s Data
Structure Exam Question
• Credit to Prof. Won-Ki Jeong
• Assume that you insert a new element x to a binary search tree
that does not contain x. After inserting x, you delete x
immediately from the tree. If you repeat this insertion-deletion
operation several times (each time with different value x), would
the tree be same as the initial one? If yes, then explain why. If no,
then give a counter example.
Another KU’s Data
Structure Exam Question
• Credit to Prof. Won-Ki Jeong
• Assume that you delete a leaf node x from a binary search tree
and insert x back to the tree immediately. If you repeat this
deletion-insertion operation several times (each time with different
x value), would the tree be same as the initial one? If yes, then
explain why. If no, then give a counter example.

• Assume that you delete a non-leaf node x from a binary search


tree and insert x back to the tree immediately. If you repeat this
deletion-insertion operation several times (each time with different
x value), would the tree be same as the initial one? If yes, then
explain why. If no, then give a counter example.
Potential Questions
Potential Questions
Potential Questions
Height-Balanced Trees: AVL Tree and Red-Black Tree*

Sungdeok (Steve) Cha

VinUni, CECS

*The Red-Black tree is introduced only briefly, and it is outside the scope of the COMP1020 final exam
Learning Objectives

• Understand why height-balanced tree structure


is needed
• Understand the mechanism to keep tree’s
height balanced at all times (e.g., AVL tree)
– Understand the cost (or overhead) of maintaining the tree
structure balanced
• Understand details of insertion and deletion
algorithm in AVL tree
• Understand the difference in approaches
between AVL tree and Red-Black Tree
– Understand why Red-Black tree, not AVL tree, is used in
practice
– Understand how Java supports Red-Black tree operations
AVL Tree

• An AVL tree is a form of self-balancing BST
  – Additional rules (e.g., rotation) are needed to keep the tree structure balanced at all times
Search on AVL Tree

• Essentially the same as performing binary


search on sorted array
– Examine root node first
– If match is not found, continue search recursively on either left
or right subtree
• As AVL tree, by definition, is height-balanced,
search operation is efficient
– O(log N)
Important Topic !!!

• Why “balance” the tree? To achieve O(log N)


performance !!!
• AVL Tree Search
– Same as binary tree
search
– AVL tree is still a binary
search tree !
– With height balance
constraints enforced
Insertion on AVL Tree

• Not as straightforward or intuitive as BST


algorithm
– Why? Requirement on maintaining height balance may require
“adjustment” (e.g., rotation) of tree structure
• The following steps are required
1. Perform a standard (“intuitive”) BST insertion:
2. Update the height of each ancestor node:
3. Calculate the balance factor:
4. Perform rotations, if necessary, to balance the tree
• Left-Left (LL) Case : Right rotation
• Right-Right (RR) Case: Left rotation
• Left-Right (LR) Case: Left rotation + Right rotation
• Right-Left (RL) Case: Right rotation + Left rotation
Insertion on AVL Tree

• Remember that “rotation” (e.g., structural


adjustment) occurs as soon as imbalance
occurs
• First 3 steps are quite intuitive
AVL Tree and Balance Factor

AVL Tree and Balance Factor

• By definition, node 30 is balanced and node 40 is NOT!
• Heavy on the right side, therefore rotate left!
Fixing the Unbalance

AVL Tree Rotation

• Heavy on the left? Rotate to the right (heavy on the left side, therefore rotate right!)

[Figure: right rotation — P with left child Q (children S, T) and right child R becomes Q with children S and P, where P keeps T and R]
LR? RL? Can Be Confusing!

[Figure: two insertion examples (#1 and #2) illustrating the LR and RL double-rotation cases]
Summary of
AVL Tree Rotation
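The summary slide is an image; as a hedged illustration (class and field names are my own), the two primitive rotations can be coded like this:

// Rotation sketch for an AVL node with left/right links and a cached height.
class AvlNode {
    int key, height = 1;
    AvlNode left, right;
}

class Rotations {
    static int height(AvlNode n) { return n == null ? 0 : n.height; }
    static void update(AvlNode n) { n.height = 1 + Math.max(height(n.left), height(n.right)); }

    // Left-Left case (heavy on the left): rotate right around p; p's left child q becomes the root.
    static AvlNode rotateRight(AvlNode p) {
        AvlNode q = p.left;
        p.left = q.right;        // q's right subtree moves under p
        q.right = p;
        update(p);
        update(q);
        return q;
    }

    // Right-Right case (heavy on the right): rotate left around p (mirror image).
    static AvlNode rotateLeft(AvlNode p) {
        AvlNode q = p.right;
        p.right = q.left;
        q.left = p;
        update(p);
        update(q);
        return q;
    }
}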
AVL Tree Insertion
Recommended Exercises
DIY. Strongly
Recommended
Same Values,
Different Orders
• “Internal” structure of AVL tree may vary
depending on the order of insertion
– [10, 25, 50, 20, 40, 30]
– [10, 20, 25, 30, 40, 50]
• But, same performance (e.g., time complexity)
when insertion/deletion operations are applied
• Still height-balanced
[10, 25, 50, 20, 40, 30]
[10, 20, 25, 30, 40, 50]
AVL Tree and
Node Deletion
Unbalanced AVL Tree on
Deletion
• What happen if we
delete 20, 30, and
40 in that order?
Red-Black Tree

• Red-Black Tree is another form of self-


balancing BST
– “Higher intellectual challenge” compared to AVL tree
– To be skipped primarily due to time limit
• Will be covered later, if time permits.
• Java’s TreeSet class implements Red-Black
Tree algorithm
– Sorted Order: Elements are stored in ascending order by default.
– No Duplicates: It does not allow duplicate elements.
– Self-Balancing: Internally implemented using a Red-Black Tree,
ensuring operations like insert, delete, and search run in O(log N)
time
Red-Black Tree

• A newer (!) and more frequently used form of


self-balancing tree
Red-Black Tree

• Practical improvement of AVL tree ideas


AVL tree vs Red-Black tree

• Both are useful in that height is guaranteed to


be “balanced” (e.g., rotation)
• In AVL tree, rotation is applied whenever
height imbalance of greater than -1 or 1 is
detected
– Sometimes an excessive constraint and/or overhead
• Red-Black tree aims to relax constraints so that
height difference is kept “reasonably balanced”
(e.g., ratio less than 2) so that “adjustment”
operation can be avoided when not critical
Lots of YouTube Tutorials

• Rob Edwards (San Diego State University) is


the one I watched. Pretty good.
– Illustrative and interactive lectures on Red-Black Tree
• Lecture 1 & 2 is especially useful
– Other Data Structure lectures are also available
Red-Black Tree Properties

• The root and all external nodes are black
  – Only 1-bit storage overhead per node
• No root-to-external-node path has two consecutive red nodes
  – (=) A red node must have two black children
• All root-to-external-node paths have the same number of black nodes

Red-Black Tree Properties (Pointer)

• Pointers from an internal node to an external node are black
• No root-to-external-node path has two consecutive red pointers
• All root-to-external-node paths have the same number of black pointers
Example Red-Black Tree

[Figure: an example red-black tree]
Properties

• If P and Q are two root-to-external-node paths in a red-black tree, then
  length(P) <= 2 * length(Q)
• (=) The longest path length is bounded by 2 × the shortest path length
  – Shortest path: B-B-B-...-B
  – Longest path: B-R-B-R-...-B
  – Why? The number of black nodes must be the same on all paths by definition, and a red-red sequence is NOT allowed
java.util.TreeMap
java.util.TreeMap

• Elements are stored in automatically sorted


order
java.util.TreeMap
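The TreeMap slides are images; a tiny hedged usage example (my own data) showing the automatically sorted iteration order:

import java.util.Map;
import java.util.TreeMap;

public class TreeMapExample {
    public static void main(String[] args) {
        // TreeMap keeps keys sorted (internally a Red-Black tree), so iteration is in key order.
        TreeMap<String, Integer> ages = new TreeMap<>();
        ages.put("Minh", 21);
        ages.put("An", 19);
        ages.put("Linh", 20);

        for (Map.Entry<String, Integer> e : ages.entrySet())
            System.out.println(e.getKey() + " -> " + e.getValue());  // An, Linh, Minh

        System.out.println(ages.firstKey());   // "An" (smallest key)
    }
}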
Java vs C++
Potential Questions
KU’s Data Structure
Exam Question
• Assume that you have 7 numbers (1, 2, 3, 4, 5, 6, 7). If
your goal is to achieve the shortest running time of AVL
tree (e.g., with minimal tree rotations), what sequence
would you choose? Explain why. (Credit to Prof. Won-Ki
Jeong)

• Assume that you have 7 numbers (1, 2, 3, 4, 5, 6, 7). If


you were to insert them in the exact sequence into AVL
tree, how does tree structure evolve? Illustrate the
process with AVL tree at each step. How many and which
rotations are applied during the process?
– My “customized” question
MaxHeap, MinHeap, Heap Sort

Sungdeok (Steve) Cha

VinUni, CECS

1
Learning Objectives

• Understand when heap is a preferred choice


– Understand why array is usually used for heap representation
• Learn various algorithms on Heaps
– Insertion, Deletion, conversion of MaxHeap into MinHeap, …
• Understand how heap sort works and its
complexity
No rules on the ordering of values between sibling nodes!!
No rules on not allowing duplicates!!
MaxHeap
BST? Heap?

• Read “definition” of binary heap again !!


Array Implementation of
Heap
• Complete binary tree !!!
Array Implementation of
Heap
Heap or Not?

• If it is a heap, is it a MinHeap or a MaxHeap?
  – Can you convert a MaxHeap to a MinHeap or vice versa?

[Figure: two example trees with root 8 to classify]

MinHeap

[Figure: example MinHeaps]

Heap

[Figure: an example MaxHeap and an example MinHeap]
Search on MaxHeap (or MinHeap)

[Figure: three different valid MaxHeaps containing the same values 14, 12, 10, 8, 7, 6]

Never Stop Asking Yourself
Insertion, MaxHeap

MaxHeap Insertion Example

MinHeap Insertion

• Essentially the same, with a different heapify (bubble-up) requirement
Example: Insert 21

Step 0 (initial MaxHeap): 20 at the root, 15 and 17 as its children, then 8, 10, 6, 9, with 2 as the last leaf

Step 1: create the new node 21 at the end of the heap (as a child of 8)

Step 2: bubbling up — swap 8 <-> 21

Step 3: bubbling up — swap 15 <-> 21

Step 4: bubbling up — swap 20 <-> 21, done!
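A hedged sketch of the bubbling-up insertion just traced, on an array-backed MaxHeap (names are my own):

import java.util.Arrays;

public class MaxHeap {
    private int[] heap = new int[16];
    private int size = 0;

    // Insert: place the new key at the end, then bubble it up while it beats its parent.
    public void insert(int key) {
        if (size == heap.length) heap = Arrays.copyOf(heap, 2 * size);
        heap[size] = key;
        int i = size++;
        while (i > 0 && heap[(i - 1) / 2] < heap[i]) {   // parent smaller: swap up
            int p = (i - 1) / 2;
            int t = heap[p]; heap[p] = heap[i]; heap[i] = t;
            i = p;
        }
    }
}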
DIY

• Show step-by-step procedure of MinHeap(or


MaxHeap) construction when the numbers are
inserted in the following order
– 12
– 70
– 30
– 20
– 55
– 25
– 40
– 50
MaxHeap to MinHeap?

• Is it possible to transform a MaxHeap into a MinHeap (or vice versa)? DIY

[Figure: the MaxHeap with root 21, children 20 and 17, then 15, 10, 6, 9, and leaves 2, 8]

Impressive indeed!!
Deletion, MaxHeap

• The node to be deleted is most likely the root


– But, it is not an absolute requirement
Never Stop Asking
Questions
• Even if it may not make full sense to you now…
• Or explore other (potentially more efficient)
solutions
– That’s essentially how you do research !!!
MaxHeap, Deletion

MaxHeap Deletion

• Delete the root (21)

[Figure: MaxHeap with root 21, children 20 and 17, then 15, 10, 6, 9, and leaves 2, 8]

Step 1: copy the data of the last element (8) to the root and delete the last node

Step 2: trickle down — swap 8 <-> 20 (not 8 <-> 17)

Step 3: trickle down — swap 8 <-> 15 (not 8 <-> 10), done!
Note

• Insertion / deletion complexity : O(log n)


• Siblings are not necessarily ordered
– Only parent > children is guaranteed in MaxHeap
• No guarantee that “left < right”
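A hedged sketch of the delete-max (trickle-down) operation just traced, matching the array-backed MaxHeap sketch above:

// Remove and return the maximum (root) of an array-backed MaxHeap of 'size' elements.
public int deleteMax() {
    if (size == 0) throw new RuntimeException("heap is empty");
    int max = heap[0];
    heap[0] = heap[--size];                   // move the last element to the root
    int i = 0;
    while (true) {                            // trickle down: swap with the larger child
        int left = 2 * i + 1, right = 2 * i + 2, largest = i;
        if (left < size && heap[left] > heap[largest])   largest = left;
        if (right < size && heap[right] > heap[largest]) largest = right;
        if (largest == i) break;              // heap property restored
        int t = heap[i]; heap[i] = heap[largest]; heap[largest] = t;
        i = largest;
    }
    return max;
}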
HeapSort
HeapSort
Some Thoughts on
HeapSort
• MaxHeap (or MinHeap) is easily implemented
using array: Efficient computation of
parent/child index
• In case of MaxHeap, root (index 0) is
guaranteed to be the maximum value of the
array values
– Can be easily swapped with the last element of the array
– Such swap may “destroy” MaxHeap property
– But, “heapify” operation will find the largest element among
the remaining array
– It can then be swapped. Mission accomplished!!!
DIY. Highly Recommended
PriorityQueue and
MinHeap/MaxHeap
DIY

[Figure: a tree with root 55, second level 15 and 60, and further nodes 8, 28, 90, 3, 18, 30, and 48 below, arranged as in the figure]

1) Is it a binary search tree? If not, why not? Explain.
2) Is it a complete tree? If not, why not? Explain.
3) Is it a heap? If not, why not? Explain. If it is a heap, is it a MinHeap or a MaxHeap?
4) Is it an AVL tree? If not, why not? Explain. For example, identify the node where the AVL tree property is violated and why it is violated.
5) Convert the above tree to an AVL tree. Explain where and which rotations are to be performed and why. Draw the resulting AVL tree.
6) How can this tree be represented in an array? Illustrate it using 0-based (i.e., index starting from 0) array format.
7) We want to delete 28 from the above tree. Show what the resulting tree looks like.
Potential Questions
Graph Representation
Depth-First vs Breadth-First Search
Minimum Spanning Tree

Sungdeok (Steve) Cha

VinUni, CECS

1
Learning Objectives

• Understand why graphs are frequently used in algorithm development
• Understand graph representations
  – Adjacency matrix, adjacency list
  – Which representation is more effective when (e.g., sparse vs dense graph)?
• Understand depth-first and breadth-first search on a graph
• Understand what a minimum spanning tree is and how it is used
  – Kruskal's algorithm, Prim's algorithm
Why Graph Algorithms?

The Seven Bridges of Königsberg: the problem was to devise a walk through the city that would cross each of those bridges once and only once.

How to maximize your day at Disney?
• The distance between attractions
• Preferred rides (e.g., weighted graph)
• Waiting time
• …

The "Traveling Salesman Problem (TSP)" in CS is a well-known problem for which no efficient solution is known.
TSP
Beyond the scope of
COMP 1020
• Domain of “theoretical computer science”
What to Learn

• Graph terminology
• Graph representation
  – Adjacency matrix
  – Adjacency list
• Graph traversal
  – Breadth-first search (using a FIFO queue)
  – Depth-first search (using a LIFO stack)
• Shortest path algorithm
  – Dijkstra's algorithm
  – There are many others, but beyond the scope of COMP 1020
FYI Only.
Not in the exam^^
Graph Terminology:
Vertex (Node), Edge
A B A graph with four vertices and five edges

D C

• Graph can be directed (aka digraph) or


undirected

10
Directed Graph

Identify all the vertices and edges:
• Each edge is an ordered pair of vertices!
• An ordered pair (u, v) means that v (the destination) is adjacent to u (the source)

Undirected Graph

Identify all the vertices and edges:
• An unordered pair (u, v) means that v is adjacent to u, and u is adjacent to v.
Degree, Neighbor

Handshaking theorem: for a graph with m edges, the sum of the degrees of all the vertices is always equal to twice the number of edges.

deg(a) = 2, deg(b) = deg(c) = deg(f) = 4, deg(d) = 1, deg(e) = 3, deg(g) = 0.
N(a) = {b, f}, N(b) = {a, c, e, f}, N(c) = {b, d, e, f}, N(d) = {c}, N(e) = {b, c, f},
N(f) = {a, b, c, e}, N(g) = ∅.

Discrete Mathematics and Its Applications 7th, Rosen, Kenneth

Degree, Neighbor

A loop adds two to the degree.

deg(a) = 4, deg(b) = deg(e) = 6, deg(c) = 1, deg(d) = 5.
N(a) = {b, d, e}, N(b) = {a, b, c, d, e}, N(c) = {b}, N(d) = {a, b, e}, N(e) = {a, b, d}.

Discrete Mathematics and Its Applications 7th, Rosen, Kenneth
In-Degree, Out-Degree

• The in-degree of a vertex v, denoted deg−(v), is the number of edges which terminate at v.
• The out-degree of v, denoted deg+(v), is the number of edges with v as their initial vertex.
• Note that a loop at a vertex contributes 1 to both the in-degree and the out-degree of the vertex.

deg−(3) = 2
deg+(3) = 2

In-Degree, Out-Degree

deg−(a) = 2, deg−(b) = 2, deg−(c) = 3, deg−(d) = 2, deg−(e) = 3, deg−(f) = 0.
deg+(a) = 4, deg+(b) = 1, deg+(c) = 2, deg+(d) = 2, deg+(e) = 3, deg+(f) = 0.

Discrete Mathematics and Its Applications 7th, Rosen, Kenneth
Complete Graph

Discrete Mathematics and Its Applications 7th, Rosen, Kenneth

17
Cycle

(The starting and end vertex are the same)

Discrete Mathematics and Its Applications 7th, Rosen, Kenneth

18
Path and Cycle

• A vertex is adjacent if there exists an edge to it (from another vertex).
• Path: a sequence of vertices in which each successive vertex is adjacent to its predecessor.

Connected Graph

A graph is said to be connected if every pair of vertices in the graph is connected. This means that there is a path between every pair of vertices. An undirected graph that is not connected is called disconnected.

[Figure: three example graphs labeled Connected, Connected, and Disconnected; a connected component is a connected subset]

Graph vs Tree

• A tree is a special case of a graph.
• Any graph that is connected and contains no cycles can be a tree (by picking one of its vertices as the root).

[Figure: a connected graph and the corresponding tree with a chosen root]

DAG (Directed Acyclic Graph)

• A directed acyclic graph (DAG) is a graph that has NO directed cycle.

A directed graph is a DAG if and only if it can be topologically ordered, by arranging the vertices as a linear ordering that is consistent with all edge directions.

DAG or not?
Why Study Graph Theory?
Graph-based Algorithms
Graph-based Algorithms
Graph Representation

• Two most used representations of a graph:


– Adjacency Matrix

– Adjacency List

27
Adjacency Matrix and Undirected Graph

• A square matrix represents a finite graph.
• Symmetric if the graph is undirected.

[Figure: undirected graph on vertices A–E]

        A  B  C  D  E
    A   0  1  1  1  0
    B   1  0  0  0  0
    C   1  0  0  1  0
    D   1  0  1  0  1
    E   0  0  0  1  0

(A 1 means the two nodes are connected.)

Adjacency Matrix and Directed Graph

• Not necessarily symmetric.

[Figure: directed graph on vertices A–E]

        A  B  C  D  E
    A   0  1  1  0  0
    B   0  0  0  0  0
    C   0  0  0  0  0
    D   1  0  1  0  1
    E   0  0  0  0  0
Adjacency List

• An adjacency list is a representation of a graph as an array of linked lists.
• An edge is denoted by a pointer from the source vertex to the destination vertex.
  – An adjacency list is maintained for each vertex in the graph.
  – A NULL pointer indicates the end of the adjacency list.
• "Duplication" (?) occurs for an undirected graph representation.
  – It is equivalent to a bi-directional edge.

Adjacency List: Another Example

[Figure: the undirected graph above with its adjacency lists (A → B → C → D, B → A, C → A → D, D → A → C → E, E → D) and the directed version with its lists]
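A hedged Java sketch of an adjacency-list representation (vertex labels mapped to integer indices 0..n-1; class names are my own):

import java.util.ArrayList;
import java.util.List;

public class Graph {
    private final List<List<Integer>> adj;     // adj.get(u) = list of vertices adjacent to u

    public Graph(int n) {
        adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
    }

    // For an undirected graph, store the edge in both lists ("duplication").
    public void addUndirectedEdge(int u, int v) {
        adj.get(u).add(v);
        adj.get(v).add(u);
    }

    public void addDirectedEdge(int u, int v) {
        adj.get(u).add(v);
    }

    public List<Integer> neighbors(int u) { return adj.get(u); }
}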
Dense vs Sparse Graph

Breadth-First Graph Search

An Example BFS (Animation)

Starting vertex: a
Queue (order in which vertices are visited): a b f i c e g d h — Done!

Depth-First Graph Search

An Example DFS (Animation)

Starting vertex: a
[Figure: the stack contents as DFS proceeds from a] — Done!!
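Building on the adjacency-list sketch above (my own Graph class), BFS with a FIFO queue and DFS with an explicit LIFO stack might look like this:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class Traversals {
    // Breadth-first search: visit vertices level by level using a FIFO queue.
    static List<Integer> bfs(Graph g, int start, int n) {
        List<Integer> order = new ArrayList<>();
        boolean[] visited = new boolean[n];
        Deque<Integer> queue = new ArrayDeque<>();
        visited[start] = true;
        queue.offer(start);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            order.add(u);
            for (int v : g.neighbors(u))
                if (!visited[v]) { visited[v] = true; queue.offer(v); }
        }
        return order;
    }

    // Depth-first search: follow one branch as deep as possible using a LIFO stack.
    static List<Integer> dfs(Graph g, int start, int n) {
        List<Integer> order = new ArrayList<>();
        boolean[] visited = new boolean[n];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int u = stack.pop();
            if (visited[u]) continue;
            visited[u] = true;
            order.add(u);
            for (int v : g.neighbors(u))
                if (!visited[v]) stack.push(v);
        }
        return order;
    }
}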
BFS vs DFS
Breadth-First Graph Search
BFS and Shortest Path
More Complex
BFS Example
• Perform BFS and find the shortest path from A
to J

B C

D E F G

H I

J
Minimum Spanning Trees

• Definition
– A spanning tree of least cost
– Cost: edge weights

• Greedy algorithms
– Find optimal solution in each stage
– Kruskal algorithm, Prim algorithm
– Constraints for minimum-cost spanning tree
• Use edges within the graph
• Use exactly n-1 edges
• We may not use edges that produce a cycle
Kruskal's Algorithm

[Figure: steps (a)–(h) of Kruskal's algorithm on a 7-vertex weighted graph — edges are considered in increasing order of weight and added (10, 12, 14, 16, 22, 25) as long as they do not create a cycle]

[Figure: a second example on vertices A–F showing the edges selected for the minimum spanning tree]
Prim Algorithm

[Figure: steps (a)–(f) of Prim's algorithm on the same 7-vertex weighted graph — starting from vertex 0, the tree grows by repeatedly adding the cheapest edge that connects a tree vertex to a non-tree vertex (10, 25, 22, 12, 16, 14)]
Prim vs Kruskal Algorithms
Potential Questions
• Potential
Questions
– Princeton
• Potential
Questions
– Princeton
Potential Questions
Dijkstra’s Shortest-Path
Algorithm

Sungdeok (Steve) Cha

VinUni, CECS

1
Learning Objectives

• Understand the shortest path problem


– How such algorithm is used in real-world applications
– Learn to apply the algorithm through step-by-step
demonstration
• Understand the assumptions (or constraints)
associated with Dijkstra’s algorithm
– Understand why it won’t work on path with negative weight
– Bellman-Ford algorithm is outside the scope of COMP 1020
• Understand how “greedy algorithm” works
Edsger W. Dijkstra

1930~2002
Edsger W. Dijkstra
Real-World Applications of
Dijkstra’s Algorithm
• Navigation and Route Planning (e.g., Google
Maps, GPS systems, Transportation apps like
Uber, Waze, Grab, …)
– Robotics and Autonomous Vehicles, too
– Airline and railway scheduling as well
• Network Routing (Internet &
Telecommunications)
• AI and Game Development (e.g., A* algorithm)
• Social networks and recommendation systems
(e.g., people you may know)
• Electrical grid and circuit design
• …
Dijkstra's Shortest Path Algorithm

Weighted Graph

• Edge weights could be distance, airfare, degree of traffic congestion, …

[Figure: a small weighted graph on vertices A, B, C, D with edge weights 4, 1, 3, 2, 5]
Dijkstra Algorithm in Action

[Figure: a weighted graph with source vertex 0; the trace below summarizes each animation frame]

Step 1 (initially): d(0) = 0, all other distances = inf; S = {0}
Step 2: relax the edges out of 0: d(1) = min(inf, d(0)+5) = 5, d(2) = 10, d(3) = min(inf, d(0)+7) = 7; S = {0}
Step 3: pick the closest unvisited vertex, 1 (distance 5); S = {0, 1}
Step 4: relax the edges out of 1: d(4) = min(inf, d(1)+9) = 14; S = {0, 1}
Step 5: pick vertex 3 (distance 7); S = {0, 1, 3}
Step 6: relax the edges out of 3: d(5) = min(inf, d(3)+3) = 10; S = {0, 1, 3}
Step 7: pick vertex 2 (distance 10); S = {0, 1, 3, 2}
Step 8: relax the edges out of 2: d(4) = min(14, d(2)+3) = 13, d(5) = min(10, d(2)+2) = 10; S = {0, 1, 3, 2}
Step 9: pick vertex 5 (distance 10); S = {0, 1, 3, 2, 5}
Step 10: relax the edges out of 5: d(6) = 12; S = {0, 1, 3, 2, 5}
Step 11: pick vertex 6 (distance 12); S = {0, 1, 3, 2, 5, 6}
Step 12: pick vertex 4 (distance 13); S = {0, 1, 3, 2, 5, 6, 4}
Dijkstra's Shortest Path Algorithm

• Though possibly intuitive, it can be difficult to understand
• There are lots of visualization "helps" available
  – YouTube, chatGPT, search engines, …

DIY

[Figure: a weighted graph on vertices A–F with edge weights 4, 2, 3, 6, 5, 2, 7, 1, 5 — run Dijkstra's algorithm on it step by step]

Dijkstra's Algorithm: Step-by-Step

[Figure: six animation frames applying Dijkstra's algorithm to the A–F graph above]
Understand Dijkstra’s
Algorithm in Java
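The slide's Java code is an image; a hedged sketch using a priority queue over an adjacency list (class and field names are my own) is shown below:

import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class Dijkstra {
    // Directed edge with a non-negative weight (Dijkstra's assumption).
    static class Edge {
        final int to, weight;
        Edge(int to, int weight) { this.to = to; this.weight = weight; }
    }

    // Returns the shortest distance from src to every vertex; Integer.MAX_VALUE = unreachable.
    static int[] shortestPaths(List<List<Edge>> adj, int src) {
        int n = adj.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;

        // Priority queue of {vertex, tentative distance}, smallest distance first (greedy choice).
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        pq.offer(new int[]{src, 0});

        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            int u = cur[0], d = cur[1];
            if (d > dist[u]) continue;                   // stale entry: a shorter path was found earlier
            for (Edge e : adj.get(u)) {
                if (dist[u] + e.weight < dist[e.to]) {   // relax edge (u, e.to)
                    dist[e.to] = dist[u] + e.weight;
                    pq.offer(new int[]{e.to, dist[e.to]});
                }
            }
        }
        return dist;
    }
}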
Constraints on
Dijkstra’s Algorithm
DIY

• The Bellman-Ford algorithm is outside the scope of COMP 1020

[Figure: two small example graphs on vertices A, B, C containing a negative edge weight (-10 and -1) — why can Dijkstra's algorithm give a wrong answer here?]

Dijkstra vs Bellman-Ford

Dijkstra vs Bellman-Ford

• Bellman-Ford algorithm is beyond the scope of


COMP 1020
– Should be taught in algorithm analysis course
Graph Coloring Problem

Chromatic number = 3

[source]

36
Greedy Algorithm
to Graph Coloring

37
Example: Graph Coloring
1. Start with vertex 1 and use the first color.
2. Vertex 2 doesn’t have any adjacent colors. Thus, we use the first one as well.
3. Vertex 3 has a blue vertex adjacent to it. Thus, we use the second one.
4. Similarly, vertex 4 has a blue vertex adjacent to it twice, so we also use the second one.
5. Vertex 5 has blue and green adjacent to it. Therefore, we have to use the third color for it.
6. Vertex 6 now has blue, green, and red vertices adjacent to it. As a result, we have to use the new
fourth color.
7. Finally, vertex 7 has blue, green, and orange adjacents. Since the third color (red) is not adjacent, we
can use it.

As a result, we used 4 colors to solve the problem.

38
Greedy Coloring Algorithm

• Fast but does not guarantee optimal solution


– Depending on the order of visiting nodes, “sub-optimal”
solutions can be given
– Understand pros and cons
• What if nodes are visited in the following
order?
– A–B–C–D A B
– A – D – B – C (or A – D – C – B)

C D
Where Greedy Algorithm
Failed
Potential Questions
Hash Table and Collision
Handling Algorithms

Sungdeok (Steve) Cha

VinUni, CECS

1
Learning Objectives

• Understand what hash table is and why it is a


preferred choice over other alternatives
• Understand why hash function plays critical
role in performance
• Collision is inevitable when using hash table.
Understand how collision can be handled
– Separate chaining, Linear probing, Quadratic probing, …
– Understand advantages and disadvantages of each choice
– Understand how to “properly” handle not only insertion but
also deletion
• Understand Java’s support for hashing
– java.util.HashMap, java.util.HashSet
What is Hashing?
Performance Comparison

• Array or LinkedList
– Overall O(n) time
• Binary search trees
– Expected O(log n) time search, insertion, deletion
– But, O(n) in the worst case
• Balanced binary search trees
– Guarantees O(log n) time search, insertion, and deletion
– AVL tree
• Hash table
– Expected O(1) time search, insertion, and deletion
Hashing

• Hash table or hash map is a data structure that


associates keys with values
• Example
– Phone book

Wikipedia

Hashing: key to index is computed using some rules (i.e., hash function)!
Hash Table

• Key-value pairs are stored in a fixed-size table called a hash table
  – A hash table is partitioned into many buckets (b)
  – Each bucket has many slots (s)
  – Each slot holds one record

[Figure: a table of b buckets (rows), each with s slots (columns)]

Hash Table

• A hash function h(k) transforms the identifier k (key) into an address in the hash table
• Key density: n/T
  – Fraction of keys in the table compared to the total number of possible keys
  – Usually very low because not all keys are used
  – n: current number of key-value pairs in the table
  – T: all possible keys
• Loading density (factor): α = n/(s·b)
  – How much the hash table is used
Hash Function

• The hash function h computes a hash table address based only on the key value k
• A hash function may generate an identical address for different keys ("collision")
  – Key density is usually very low, so there is a small chance for two keys to map to the same location
  – A "perfect hash function" is ideal but difficult to achieve in reality
Simple Hashing Example

• b = 26, s = 2, n = 10
• Keys : GA, D, A, G, L ,A2, A1, A3, A4, E
• Hash function : A~Z to 0~25, first character
– A -> 0
– A2 -> 0
– D -> 3
– G -> 6
– GA -> 6
Simple Hashing Example

• How about A1 and A3?


– Synonyms : h(A1) = h(A3) = h(A) = 0, collision!
– No slot left : overflow!

A2, A1, A3 : collisions


A1, A3 : overflows
Hash Functions

• Modulo arithmetic
  – h(x) = x mod tableSize
  – tableSize is recommended to be prime. Why?
• Multiplication method
  – h(x) = floor((x·A mod 1) * tableSize)
  – A: constant in (0, 1)
    • Determines the "distribution pattern" of the hash function
    • Usually not a random choice (e.g., 0.6180339887 suggested by Knuth)
  – tableSize is not critical, usually 2^p for an integer p
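A hedged Java sketch of the two hash functions above (assuming non-negative integer keys):

// Division (modulo) method: table size is ideally prime.
static int hashMod(int key, int tableSize) {
    return Math.floorMod(key, tableSize);          // floorMod keeps the result non-negative
}

// Multiplication method with Knuth's suggested constant A ≈ (sqrt(5)-1)/2.
static int hashMul(int key, int tableSize) {
    double A = 0.6180339887;
    double frac = (key * A) % 1.0;                 // fractional part of key*A
    return (int) (tableSize * frac);
}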
Hash Table Design Issues
• Choice of hash function
– Easy to compute
– Avoid collision as much as possible
• Overflow handling method
• Size of hash table
– If too small, the collision occurs often
– If too large, “resource” waste
• Application of Hash table in real-world applic
ations
– Emergency calls and location identification
– …
Collision Resolution

• The most important design decision in hash tables
• Resolves a collision by a sequence of hash values
  – h0(x) (= h(x)), h1(x), h2(x), h3(x), …

Collision: a key maps to an occupied location in the hash table.
An example: h(x) = x mod 101; h(224) = 224 mod 101 = 22, but table[22] is already occupied (by 123).

Collision-Resolution Methods

• Separate chaining
  – Each table[i] is maintained as a linked list
• Open addressing (resolves within the table)
  – Linear probing
    • Simple version: hi(x) = (h0(x) + i) % tableSize
    • Full version: hi(x) = (h0(x) + a·i + b) % tableSize
  – Quadratic probing
    • Simple version: hi(x) = (h0(x) + i²) % tableSize
    • Full version: hi(x) = (h0(x) + a·i² + b·i + c) % tableSize
  – Double hashing
    • hi(x) = (h0(x) + i·β(x)) % tableSize
    • β(x): another hash function
Separate Chaining

• table[] is a header array of linked lists
• No interference between keys that did not collide
  – Open addressing may interfere…

[Figure: table[22] heads a chain containing 123 and 224; other buckets are empty]

Separate Chaining

• h(k) = k % 17
  – Insert 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
  – Chains can be kept sorted if desired
    • What's the trade-off?
      – Search? Insertion? Deletion?
    • Overhead?

[Figure: the 17-bucket table after the insertions, e.g. bucket [0] holds 0 and 34, bucket [6] holds 6 and 23, bucket [11] holds 11, 28, and 45, bucket [12] holds 12 and 29, …]
Other Approaches: Open Addressing

• Rehashing
  – Use a series of different hash functions h1, h2, ..., hm
  – Examine A[hj(k)] for j = 1, 2, ..., m
  – Minimizes clustering
• Random probing
  – Examine A[(h(k)+s(i)) % b] for i = 1, 2, ..., b-1
  – s(i): pseudo-random number between 1 and b-1; each number is generated only once

Open Addressing: Linear Probing

• Insert 123, 24, 224, 22, 729, … in this order
  – The insertion algorithm is intuitive, if non-trivial

Linear probing with hi(x) = (h0(x) + i) mod 101 (can be bad due to primary clustering)

h0(123) = h0(224) = h0(22) = h0(729) = 22, h0(24) = 24

[Figure: table[22] = 123, table[23] = 224 (h0+1), table[24] = 24, table[25] = 22 (h0+3), table[26] = 729 (h0+4)]
Linear Probing and Deletion

• The item might not be at the first place we hash to (e.g., delete 22)
• Other items (e.g., with a different hash value) might exist in between

[Figure: the same table as above after removing 22 — a naive search would no longer find 729]

Linear Probing: Deletion Impact on Search

Hash function: hi(x) = (h0(x) + i) mod 13

[Figure: a 13-slot table holding 13, 1, 15, 16, 28, 31, 38, 7, 20, 25]
(a) Delete element 1
(b) Searching for 38 now gives the wrong result!
(c) Okay: marking the slot as DELETED instead
Linear Probing Summary

• Find available bucket by examining hash table


A[(h(k)+j)%b] for j=0, 1, 2, ..., b-1
• Insert
– Find empty bucket
• Search
– Find match key
– If empty, key is not in the table
• Delete
– May need to reorganize keys
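A hedged sketch of open addressing with linear probing (insert and search only; deletion would need shifting or DELETED markers as discussed in the surrounding slides):

// Linear-probing hash table sketch for int keys (insert and search only).
public class LinearProbingTable {
    private final Integer[] table;   // null = empty slot

    public LinearProbingTable(int buckets) { table = new Integer[buckets]; }

    private int h(int k) { return Math.floorMod(k, table.length); }

    public boolean insert(int k) {
        for (int j = 0; j < table.length; j++) {
            int i = (h(k) + j) % table.length;       // probe sequence h(k), h(k)+1, ...
            if (table[i] == null) { table[i] = k; return true; }
        }
        return false;                                // table is full
    }

    public boolean contains(int k) {
        for (int j = 0; j < table.length; j++) {
            int i = (h(k) + j) % table.length;
            if (table[i] == null) return false;      // empty slot ends the cluster: not present
            if (table[i] == k)   return true;
        }
        return false;
    }
}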
Linear Probing

• Divisor = b (# of buckets) = 10
• h(k) = k % 10, A(k) = (h(k) + j) % b

Insertions (into an initially empty 10-bucket table):
• Insert 3: h(3) = 3, placed at A[3]
• Insert 7: h(7) = 7, placed at A[7]
• Insert 13: h(13) = 3 — collision & overflow! (h(13)+1) % 10 = 4, placed at A[4]
• Insert 23: h(23) = 3 — collision & overflow! (h(23)+1) % 10 = 4 is still occupied; (h(23)+2) % 10 = 5, placed at A[5]
• Insert 26: h(26) = 6, placed at A[6]
• Insert 36: h(36) = 6 — collision & overflow! Next available bucket: 8

Resulting table (same color in the slides = same hash value group):
    index:  0   1   2   3    4    5    6    7    8   9
    key:                3   13   23   26    7   36

Deletions:
• Delete 23
  – Search the cluster to its right for a key that should shift left
  – A(26) = 6, A(7) = 7, A(36) = 8 (due to the collision at 6): no shifting required
• Delete 7
  – Search the cluster to its right (which contains 36)
  – A(36) = 7 (since 6 collides and 7 is now empty) — what happens if you don't shift or mark the slot as deleted?
  – Shift 36 to the left (into slot 7)
• Delete without shifting
  – Mark the slot as DELETED; a new key can be inserted into that location later (this retains the cluster)
Performance of
Linear Probing
• Worst-case find/insert/delete time
– O(n), when?
• Pros
– Simple to compute

• Cons
– Clustering
– Worst case O(n)
– Delete can be expensive
Quadratic Probing

• Examine A[(h(k)+j²) % b] for j = 0, 1, 2, ..., b-1
• Search for the next available bucket from the original address at distances 1, 4, 9, 16, ...
• Pros
  – Simple calculation, reduces clustering
• Cons
  – Not all buckets can be examined
  – Can be minimized if b is a prime number

Quadratic Probing

• Insert 123, 24, 224, 22, 729, … in this order

Quadratic probing with hi(x) = (h0(x) + i²) mod 101

[Figure: table[22] = 123, table[23] = 224 (h0+1²), table[24] = 24, table[26] = 22 (h0+2²), table[31] = 729 (h0+3²)]
Why Quadratic Probing?
Behavioral Difference?
DIY

• Easy-to-medium vs Medium difficulty


Remember…

• A hash table performs badly when the load factor (α) is too high
  – A load factor of 0.7 is often considered "ideal"
• Generally, set a threshold, and if the load factor surpasses it:
  – Double the size of the hash table and rehash all the elements in the table

α = (# of occupied slots) / (hash table size)
HashTable Usage

• Is one array a subset of another?


– Why put larger array into HashTable and perform lookup on small
array? What’s trade-off?
– Alternative solutions?
Useful Training on
“Computational Thinking”
java.util.HashMap
java.util.HashMap
HashMap and
Collision Handling
java.util.HashSet
java.util.HashSet
java.util.HashSet
java.util.HashSet
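Those slides are images; a tiny hedged usage sketch of HashMap and HashSet (expected O(1) insertion and lookup), tying back to the "is one array a subset of another?" question:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HashingExample {
    public static void main(String[] args) {
        // HashMap: key -> value lookup (e.g., a phone book)
        Map<String, String> phoneBook = new HashMap<>();
        phoneBook.put("Alice", "555-1234");
        phoneBook.put("Bob", "555-5678");
        System.out.println(phoneBook.get("Alice"));          // 555-1234
        System.out.println(phoneBook.containsKey("Carol"));  // false

        // HashSet: membership test — put the larger array into the set, look up the smaller one
        int[] big = {12, 5, 7, 9, 3, 8, 14, 6};
        int[] small = {7, 3, 8};
        Set<Integer> seen = new HashSet<>();
        for (int x : big) seen.add(x);
        boolean subset = true;
        for (int x : small) if (!seen.contains(x)) subset = false;
        System.out.println(subset);                          // true
    }
}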
Summary of Java
& Built-In Data Structure
Map Interface
• Potential
Questions
– U of Waterloo
Potential Questions