UNIT 1: Data Structures and Algorithms
Structure:
1.0 Objectives
1.1 Introduction
1.2 Overview of Data structures
1.3 Abstract Data types and their Implementation using arrays and Linked lists
1.7 Summary
1.8 Keywords
1.9 Questions
1.10 References
1.0 OBJECTIVES
1.1 INTRODUCTION
Data structures are a crucial aspect of computer science and programming, providing a way
to organize and store data efficiently for various computational tasks. They are essential for
designing algorithms and solving problems. Here's an overview of some common data
structures:
1. Arrays:
A collection of elements, each identified by an index or a key.
Elements are stored in contiguous memory locations.
Access time is constant, O(1), but insertion and deletion can be inefficient,
especially in the middle of the array.
2. Linked Lists:
A linear data structure consisting of nodes, where each node contains data and
a reference (or link) to the next node in the sequence.
Dynamic size and efficient insertion and deletion at any position.
Access time is O(n) in the worst case.
3. Stacks:
A Last In, First Out (LIFO) data structure where elements are added and
removed from the same end, known as the top.
Common operations include push (addition) and pop (removal).
4. Queues:
A First In, First Out (FIFO) data structure where elements are added at the rear
and removed from the front.
Common operations include enqueue (addition) and dequeue (removal).
5. Trees:
A hierarchical data structure with a root node and branches of nodes, forming
a tree-like structure.
Binary trees are particularly common, with each node having at most two
children.
Special types include binary search trees (BST), AVL trees, and red-black
trees.
6. Graphs:
A collection of nodes (vertices) and edges connecting these nodes.
Can be directed or undirected, and may have weights associated with edges.
Common algorithms include depth-first search (DFS) and breadth-first search
(BFS).
7. Hash Tables:
A data structure that maps keys to values using a hash function.
Provides constant-time average case complexity for basic operations like
insertion, deletion, and lookup.
8. Heaps:
A specialized tree-based data structure used for heap sorting and priority
queue implementations.
Common types include min-heap and max-heap.
9. Tries:
An ordered tree data structure used to store a dynamic set or associative array
where keys are strings.
Particularly efficient for string-related operations.
10. Sets and Maps:
Sets store unique elements, and maps associate keys with values.
Implementations include hash sets, linked sets, hash maps, and tree maps.
Understanding the strengths and weaknesses of different data structures is crucial for
choosing the right one for a specific problem and optimizing algorithm performance.
Different data structures excel in different scenarios, and the choice often depends on the
requirements of the task at hand.
Data Structure is a systematic way to organize data in order to use it efficiently. The following terms are the foundation terms of a data structure.
Interface − Each data structure has an interface. Interface represents the set of
operations that a data structure supports. An interface only provides the list of
supported operations, type of parameters they can accept and return type of these
operations.
Implementation − Implementation provides the internal representation of a data
structure. Implementation also provides the definition of the algorithms used in the
operations of the data structure.
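For instance, the separation between interface and implementation can be made concrete in C by declaring the supported operations apart from the internal representation. The sketch below is illustrative; the type and function names are hypothetical:

/* Interface: the operations a stack supports, with no commitment
   to how they are implemented. (Illustrative names.) */
typedef struct Stack Stack;              /* opaque type */

Stack *stack_create(void);
void   stack_push(Stack *s, int value);
int    stack_pop(Stack *s);
int    stack_is_empty(const Stack *s);

/* Implementation: one possible internal representation,
   hidden behind the interface above. */
struct Stack {
    int data[100];
    int top;
};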
Characteristics of a Data Structure
Correctness − Data structure implementation should implement its interface
correctly.
Time Complexity − Running time or the execution time of operations of data
structure must be as small as possible.
Space Complexity − Memory usage of a data structure operation should be as little as
possible.
Need for Data Structure
As applications are getting complex and data rich, there are three common problems that applications face nowadays.
Data Search − Consider an inventory of 1 million (10^6) items in a store. If the application has to search for an item, it has to search among 1 million (10^6) items every time, slowing down the search. As data grows, search becomes slower.
Processor Speed − Processor speed, although very high, becomes a limitation if the data grows to billions of records.
Multiple Requests − As thousands of users can search data simultaneously on a web server, even a fast server can fail while searching the data.
To solve the above-mentioned problems, data structures come to the rescue. Data can be organized in a data structure in such a way that not all items need to be searched, and the required data can be found almost instantly.
There are three cases which are usually used to compare the execution times of various data structures in a relative manner.
Worst Case − This is the scenario where a particular data structure operation takes the maximum time it can take. If an operation's worst-case time is ƒ(n), then this operation will not take more than ƒ(n) time, where ƒ(n) represents a function of the input size n.
Average Case − This is the scenario depicting the average execution time of an operation of a data structure. If an operation takes ƒ(n) time on average, then m operations will take mƒ(n) time.
Best Case − This is the scenario depicting the least possible execution time of an operation of a data structure. If an operation's best-case time is ƒ(n), then the actual operation will take at least ƒ(n) time: ƒ(n) is a lower bound.
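For example, linear search over an unsorted array illustrates all three cases: the key may be found at the first position (best case, O(1)), at the last position or not at all (worst case, O(n)), or about halfway through on average. A minimal C sketch:

#include <stdio.h>

/* Linear search: best case O(1), worst case O(n), average ~n/2 comparisons. */
int linearSearch(const int a[], int n, int key) {
    for (int i = 0; i < n; i++) {
        if (a[i] == key) {
            return i;               /* found at index i */
        }
    }
    return -1;                      /* not found: the worst case */
}

int main(void) {
    int a[] = {7, 3, 9, 1, 5};
    printf("%d\n", linearSearch(a, 5, 9));   /* prints 2 */
    return 0;
}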
Basic Terminology
Data − Data are values or sets of values.
Data Item − A data item refers to a single unit of values.
Group Items − Data items that are divided into sub-items are called Group Items.
Elementary Items − Data items that cannot be divided are called Elementary Items.
Attribute and Entity − An entity is that which contains certain attributes or
properties, which may be assigned values.
Entity Set − Entities of similar attributes form an entity set.
Field − Field is a single elementary unit of information representing an attribute of an
entity.
Record − Record is a collection of field values of a given entity.
File − File is a collection of records of the entities in a given entity set.
1.3 Abstract Data types and their Implementation using arrays and Linked lists
Abstract Data Types (ADTs) are high-level descriptions of data structures that define a set
of operations and the behavior of those operations without specifying how the data structure
is implemented. ADTs provide a way to understand and interact with data structures at a
conceptual level, abstracting away the implementation details. The primary goal is to
encapsulate data and operations into a unified interface, allowing users to work with the data
structure without needing to know the underlying implementation.
1. List:
An ordered collection of elements with dynamic size.
Operations: Insert, Delete, Find, Traverse, Get Size, etc.
2. Stack:
A Last In, First Out (LIFO) structure.
Operations: Push, Pop, Peek, Is Empty, etc.
3. Queue:
A First In, First Out (FIFO) structure.
Operations: Enqueue, Dequeue, Peek, Is Empty, etc.
4. Set:
An unordered collection of unique elements.
Operations: Add, Remove, Contains, Union, Intersection, etc.
5. Map (Dictionary):
A collection of key-value pairs.
Operations: Insert, Delete, Find, Get Value by Key, etc.
6. Tree:
A hierarchical structure with nodes and edges.
Operations: Traverse (Inorder, Preorder, Postorder), Search, Insert, Delete, etc.
7. Graph:
A collection of nodes and edges.
Operations: Traverse (Depth-First Search, Breadth-First Search), Shortest
Path, Connect, Disconnect, etc.
8. Heap:
A specialized tree-based structure often used for priority queues.
Operations: Insert, Extract Min/Max, Heapify, etc.
9. Priority Queue:
A data structure where each element has an associated priority.
Operations: Insert, Extract Min/Max, Peek, etc.
10. Hash Table:
A data structure that maps keys to values using a hash function.
Operations: Insert, Delete, Search, etc.
These abstract definitions allow programmers to choose and implement specific data
structures based on the requirements of their applications. The same abstract data type can be
implemented using different underlying data structures (e.g., lists with arrays or linked lists).
ADTs provide a powerful way to reason about and design algorithms, promoting modular and
maintainable code.
Abstract Data Types (ADTs) provide a high-level description of the behavior and properties
of a data structure, independent of its implementation details. Two common ADTs are Lists
and Stacks. Let's explore their implementation using arrays and linked lists:
1. List ADT:
Represents an ordered collection of elements with dynamic size.
Implementation using Arrays:
#include <stdio.h>
#include <stdlib.h>

#define INITIAL_CAPACITY 10

typedef struct {
    int *array;
    int size;
    int capacity;
} List;

void initialize(List *list) {
    list->array = malloc(INITIAL_CAPACITY * sizeof(int));
    if (list->array == NULL) {
        exit(EXIT_FAILURE);
    }
    list->size = 0;
    list->capacity = INITIAL_CAPACITY;
}

void append(List *list, int value) {
    if (list->size == list->capacity) {
        list->capacity *= 2;
        list->array = realloc(list->array, list->capacity * sizeof(int));
        if (list->array == NULL) {
            exit(EXIT_FAILURE);
        }
    }
    list->array[list->size++] = value;
}

int get(List *list, int index) {
    if (index < 0 || index >= list->size) {
        exit(EXIT_FAILURE);
    }
    return list->array[index];
}

void display(List *list) {
    for (int i = 0; i < list->size; i++) {
        printf("%d ", list->array[i]);
    }
    printf("\n");
}

void freeList(List *list) {
    free(list->array);
}

int main() {
    List myList;
    initialize(&myList);
    append(&myList, 10);
    append(&myList, 20);
    append(&myList, 30);
    append(&myList, 40);
    display(&myList);
    freeList(&myList);
    return 0;
}
In this implementation:
The List struct contains an array to store the elements, along with variables to track
the size and capacity of the array.
The initialize function initializes the list with an initial capacity and allocates memory
for the array.
The append function adds an element to the end of the list. If the array is full, it
dynamically reallocates memory to double its capacity.
The get function retrieves an element from the list at a specified index.
The display function prints the elements of the list.
The freeList function frees the memory allocated for the list array when it is no
longer needed.
This implementation provides a basic dynamic array-based list, allowing for the addition of
elements and retrieval of elements by index. It dynamically resizes the array to accommodate
more elements as needed.
Here's an implementation of a List Abstract Data Type (ADT) using singly linked lists in C:
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    int data;
    struct Node* next;
} Node;

typedef struct {
    Node* head;
    int size;
} List;

void initialize(List *list) {
    list->head = NULL;
    list->size = 0;
}

void append(List *list, int value) {
    Node* newNode = malloc(sizeof(Node));
    if (newNode == NULL) {
        exit(EXIT_FAILURE);
    }
    newNode->data = value;
    newNode->next = NULL;
    if (list->head == NULL) {
        list->head = newNode;
    } else {
        Node* current = list->head;
        while (current->next != NULL) {
            current = current->next;
        }
        current->next = newNode;
    }
    list->size++;
}

int main() {
    List myList;
    initialize(&myList);
    append(&myList, 10);
    append(&myList, 20);
    return 0;
}
In this implementation:
The List struct contains a pointer to the head of the linked list and a variable to track
the size of the list.
The initialize function initializes the list with a null head pointer and size zero.
The append function adds a new node with the given value to the end of the linked
list.
The get function retrieves the value at the specified index in the linked list.
The display function prints all the elements in the linked list.
The freeList function frees the memory allocated for all nodes in the linked list when
it is no longer needed.
This implementation provides a basic singly linked list-based list, allowing for the addition of
elements and retrieval of elements by index. It dynamically allocates memory for new nodes
as elements are added and frees the memory when the list is no longer needed.
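The get, display, and freeList operations described above are missing from the fragment; a minimal sketch, assuming the includes and the Node and List types from the listing above, might look like this:

int get(List *list, int index) {
    Node* current = list->head;
    for (int i = 0; i < index && current != NULL; i++) {
        current = current->next;        /* walk to the requested position */
    }
    if (current == NULL) {
        exit(EXIT_FAILURE);             /* index out of range */
    }
    return current->data;
}

void display(List *list) {
    for (Node* current = list->head; current != NULL; current = current->next) {
        printf("%d ", current->data);
    }
    printf("\n");
}

void freeList(List *list) {
    Node* current = list->head;
    while (current != NULL) {
        Node* next = current->next;     /* save the link before freeing */
        free(current);
        current = next;
    }
    list->head = NULL;
    list->size = 0;
}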
1.7 SUMMARY
In this unit we have learnt about the overview of data structures, and about abstract data types and their implementation using arrays and linked lists.
An abstract data type is an abstraction of a data structure that provides only the interface to which the data structure must adhere. The interface does not give any specific details about how something should be implemented or in what programming language.
In other words, we can say that abstract data types are the entities that are definitions of data
and operations but do not have implementation details. In this case, we know the data that we
are storing and the operations that can be performed on the data, but we don't know about the
implementation details. The reason for not having implementation details is that every programming language has a different implementation strategy; for example, a C data structure is implemented using structures, while a C++ data structure is implemented using objects and classes.
For example, a List is an abstract data type that can be implemented using a dynamic array or a linked list. A Queue can be implemented as a linked list-based queue, an array-based queue, or a stack-based queue. A Map can be implemented using a tree map, a hash map, or a hash table.
Before knowing about the abstract data type model, we should know about abstraction and
encapsulation.
Abstraction: It is a technique of hiding the internal details from the user and showing only the necessary details to the user.
Encapsulation: It is a technique of combining the data and the member functions into a single unit.
The ADT model contains two types of functions, i.e., public functions and private functions, together with the data structures used in a program. In this model, encapsulation is performed first, i.e., all the data is wrapped in a single unit, the ADT. Then abstraction is performed, i.e., the operations that can be performed on the data structure are shown to the user, while the data structures used internally in the program remain hidden.
1.8 KEYWORDS
Data structures, Linked lists, Arrays.
1.9 QUESTIONS
1.10 REFERENCES
UNIT 2: Algorithm Analysis
Structure:
2.0 Objectives
2.1 Introduction
2.2 Measuring the efficiency of algorithms, time and space complexity.
2.3 Big-O notation and Performance Analysis Techniques.
2.4 Summary
2.5 Keywords
2.6 Questions
2.7 Reference
2.0 OBJECTIVES
2.1 INTRODUCTION
1. Time Complexity:
Time complexity measures the amount of time an algorithm takes to complete
as a function of the input size.
It is expressed using Big O notation, which provides an upper bound on the
growth rate of an algorithm.
Common time complexities include O(1) (constant time), O(log n)
(logarithmic time), O(n) (linear time), O(n log n) (linearithmic time), O(n^2)
(quadratic time), etc.
2. Space Complexity:
Space complexity measures the amount of memory an algorithm uses as a
function of the input size.
Like time complexity, space complexity is also expressed using Big O
notation.
It considers both the auxiliary space (extra space required for algorithm
execution) and input space.
3. Big O Notation:
Big O notation describes the upper bound or worst-case scenario of an
algorithm's growth rate.
It provides a simplified representation of the algorithm's efficiency, ignoring
constant factors and lower-order terms.
Common Big O notations include O(1), O(log n), O(n), O(n log n), O(n^2),
O(2^n), etc.
4. Best, Worst, and Average Case Analysis:
Algorithms may have different time complexities in different scenarios.
Best-case time complexity represents the minimum time an algorithm takes for
any input.
Worst-case time complexity represents the maximum time an algorithm takes
for any input.
Average-case time complexity considers the expected time over all possible
inputs.
5. Amortized Analysis:
Amortized analysis provides the average performance of an algorithm over a
sequence of operations.
It considers the total cost of a sequence of operations divided by the number of
operations, providing a more accurate picture of average performance.
6. Space-Time Tradeoff:
Some algorithms optimize for time complexity at the cost of increased space
complexity and vice versa.
Analyzing the tradeoff between time and space complexity is crucial for
choosing the most suitable algorithm for a particular application.
7. Empirical Analysis:
In addition to theoretical analysis, empirical analysis involves measuring the
actual performance of an algorithm on real-world data.
This can help validate theoretical predictions and identify practical
considerations that may impact performance.
Efficient algorithms strike a balance between time and space complexity, and algorithm
analysis provides insights into the scalability and practicality of different solutions. It helps in
making informed decisions about algorithm selection based on the requirements and
constraints of a given problem.
Measuring the efficiency of algorithms involves analyzing their time complexity and space
complexity. Time complexity refers to the amount of time an algorithm takes to complete as a
function of the input size, while space complexity measures the amount of memory the
algorithm uses. Here's how you can analyze and measure these complexities:
Time Complexity:
1. Counting Basic Operations:
Identify the basic operations that contribute to the running time of the
algorithm.
Count the number of these operations as a function of the input size.
2. Asymptotic Analysis:
Express the count of basic operations as a mathematical function, ignoring
constant factors and lower-order terms.
Use Big O notation to represent the upper bound of the growth rate.
3. Best, Worst, and Average Case Analysis:
Determine the best-case, worst-case, and average-case time complexity.
Consider scenarios that lead to the minimum, maximum, and average running
times.
4. Recurrence Relations:
For recursive algorithms, formulate recurrence relations to express the time
complexity.
Solve the recurrence relation to obtain a closed-form expression.
5. Time Complexity Classes:
Classify the algorithm into a time complexity class (e.g., O(1), O(log n), O(n),
O(n log n), O(n^2), etc.).
Compare the growth rates to assess scalability.
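As a small illustration of steps 1 and 2 above, consider counting the basic operations of a summation loop; the sketch below is illustrative:

/* The loop body executes n times, so the total operation count is
   c*n + c' for constants c and c' -- that is, O(n). */
long sum(const int a[], int n) {
    long total = 0;          /* 1 operation */
    for (int i = 0; i < n; i++) {
        total += a[i];       /* executed n times */
    }
    return total;            /* 1 operation */
}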
Space Complexity:
1. Counting Memory Usage:
Identify the memory requirements of the algorithm, including variables, data
structures, and recursion stack.
Count the amount of memory used as a function of the input size.
2. Asymptotic Analysis for Space:
Express the count of memory usage as a mathematical function using Big O
notation.
Consider both auxiliary space (extra space required for algorithm execution)
and input space.
3. Space Complexity Classes:
Classify the algorithm into a space complexity class (e.g., O(1), O(log n),
O(n), O(n^2), etc.).
Evaluate the tradeoff between time and space complexity.
Practical Considerations:
1. Empirical Testing:
Implement the algorithm and test it on real-world data.
Measure the actual running time and memory usage using profiling tools.
2. Benchmarking:
Compare the algorithm's performance with other algorithms solving the same
problem.
Consider the constants hidden by Big O notation in practical scenarios.
3. Optimizations:
Explore opportunities for algorithmic optimizations or improvements to
reduce time or space complexity.
Evaluate the impact of these optimizations on overall efficiency.
4. Scalability Analysis:
Analyze how the algorithm performs as the input size increases.
Identify any points of diminishing returns or potential bottlenecks.
Big-O Notation:
Big-O notation is a mathematical notation used to describe the upper bound or worst-case
performance of an algorithm in terms of its input size. It provides an asymptotic upper bound
on the growth rate of the algorithm's time or space complexity. Here are some common complexities expressed in Big-O notation:
O(1) – constant; O(log n) – logarithmic; O(n) – linear; O(n log n) – linearithmic; O(n^2) – quadratic; O(2^n) – exponential; O(n!) – factorial.
Performance Analysis Techniques:
1. Counting Operations:
Identify basic operations and count the number of times each operation is
executed as a function of the input size.
2. Asymptotic Analysis:
Express the count of operations using Big-O notation to provide an upper
bound on the growth rate.
3. Best, Worst, and Average Case Analysis:
Consider scenarios that lead to the minimum, maximum, and average running
times to understand the algorithm's behavior.
4. Recurrence Relations:
For recursive algorithms, formulate recurrence relations to express the time
complexity and solve them to obtain a closed-form expression.
5. Amortized Analysis:
Analyze the average performance of a sequence of operations to provide a
more accurate picture of average efficiency.
6. Benchmarking:
Compare the algorithm's performance with other algorithms solving the same
problem to understand its relative efficiency.
7. Empirical Testing:
Implement the algorithm and test it on real-world data to measure the actual
running time and memory usage using profiling tools.
8. Space Complexity Analysis:
Identify the memory requirements of the algorithm and express them using
Big-O notation.
9. Optimizations:
Explore opportunities for algorithmic optimizations to reduce time or space
complexity while preserving correctness.
10. Scalability Analysis:
Analyze how the algorithm performs as the input size increases to ensure it scales
well with growing data.
Big-O notation and performance analysis techniques provide a framework for understanding,
comparing, and optimizing algorithms. They are crucial for making informed decisions about
algorithm selection based on the specific requirements and constraints of a given problem.
Big O Notation in Data Structure is used to express algorithmic complexity using algebraic
terms. It describes the upper bound of an algorithm's runtime and calculates the time and
amount of memory needed to execute the algorithm for an input value.
Mathematical Definition
Consider the functions f(n) and g(n), where functions f and g are defined on an unbounded set
of positive real numbers. g(n) is strictly positive for every large value of n.
The function f is said to be O(g) (read as big-oh of g), written f(n) = O(g(n)), if there exist a constant c > 0 and a natural number n0 such that f(n) ≤ c × g(n) for all n ≥ n0. For example, f(n) = 3n^2 + 2n + 1 is O(n^2), since f(n) ≤ 6n^2 for all n ≥ 1 (take c = 6, n0 = 1).
Properties of Big O Notation
Constant Multiplication: if f(n) = c * g(n) for a constant c > 0, then O(f(n)) = O(g(n)).
Summation Function: if f(n) = f1(n) + f2(n), then O(f(n)) = O(max(f1(n), f2(n))).
Logarithmic Function: if f(n) = log_a(n) and g(n) = log_b(n) for constants a, b > 1, then O(f(n)) = O(g(n)); logarithms of all bases grow at the same asymptotic rate.
Polynomial Function: if f(n) = a0 + a1*n + ... + am*n^m with am ≠ 0, then O(f(n)) = O(n^m).
In order to analyze and calculate an algorithm's performance, we must calculate and compare its worst-case runtime complexity. The order O(1), known as Constant Running Time, is the fastest running time for an algorithm: the time taken is the same for every input size. Although Constant Running Time is the ideal runtime, it is rarely achieved, because an algorithm's runtime generally depends on the input size n. For example, for n = 20:
n = 20
n^2 = 400
2^n = 1,048,576
Worst-case runtime complexity for Bubble Sort, Insertion Sort, Selection Sort, and Bucket Sort: O(n^2).
Runtime complexity for Heap Sort and Merge Sort: O(n log n).
How Does Big O Notation Analyze Space Complexity?
It is also essential to determine the Space Complexity of an algorithm. This is because space
complexity indicates how much memory space the algorithm occupies. We compare the
worst-case space complexities of the algorithm.
Before Big O notation can be used to analyze space complexity, the following need to be known:
1. The implementation of the program for the particular algorithm.
2. The size of input n, to calculate the memory each item will hold.
For Linear Search, Binary Search, Bubble Sort, Selection Sort, Heap Sort, and Insertion Sort, the space complexity is O(1).
For example, consider selection sort, implemented in C:

void selectionSort(int array[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i; j < n; j++)
            if (array[j] < array[min])
                min = j;
        int temp = array[i];
        array[i] = array[min];
        array[min] = temp;
    }
}
Explanation:
The outer for loop runs while i < n, so its order is O(n).
The inner for loop runs while j < n, so its order is again O(n). (On average the inner loop runs about n/2 times, but constant factors are ignored, so the order remains O(n).)
Multiplying the orders of the inner and outer loops gives the runtime complexity: O(n^2).
In this way, you can implement other algorithms in C, and analyze and determine the
complexities.
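As another example, binary search on a sorted array runs in O(log n) time, since each comparison halves the remaining search range; a minimal sketch:

/* Binary search on a sorted array: at most ~log2(n) iterations,
   so O(log n) time and O(1) auxiliary space. */
int binarySearch(const int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* avoids overflow of (low+high)/2 */
        if (a[mid] == key) return mid;
        if (a[mid] < key)  low = mid + 1;   /* discard the lower half */
        else               high = mid - 1;  /* discard the upper half */
    }
    return -1;  /* not found */
}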
2.4 SUMMARY
Data structure analysis assesses how algorithms interact with and manipulate different data
structures. This entails determining the best-case, worst-case, and average-case situations for
algorithms, assessing their time and space complexity, and comprehending how they behave
with different input quantities.
Developers can choose algorithms with greater knowledge and efficiency, resulting in
scalable and more effective software solutions, by thoroughly analyzing algorithms in the
context of data structures.
2.5 KEYWORDS
2.6 QUESTIONS
2.7 REFERENCES
UNIT 3: Stacks and Queues
Structure:
3.0 Objectives
3.1 Introduction
3.2 Implementing Stack and Queue data structures, their applications, and usage in algorithm design
3.3 Summary
3.4 Keywords
3.5 Questions
3.6 Reference
3.0 OBJECTIVES
3.1 INTRODUCTION
A stack is a last-in, first-out (LIFO) data structure, where the last element added is the first
one to be removed. It follows the principle of adding elements to the "top" and removing
them from the same "top" position.
Operations:
Push: Add an element to the top of the stack.
Pop: Remove the element from the top of the stack.
Peek (or Top): View the element at the top without removing it.
isEmpty: Check if the stack is empty.
Applications:
Function call management (call stack in programming).
Expression evaluation (postfix, prefix, and infix notations).
Undo mechanisms in applications.
Backtracking algorithms.
Implementation:
Can be implemented using arrays or linked lists.
Queues:
A queue is a first-in, first-out (FIFO) data structure, where the first element added is the first
one to be removed. It follows the principle of adding elements to the "rear" and removing
them from the "front."
Operations:
Enqueue: Add an element to the rear of the queue.
Dequeue: Remove the element from the front of the queue.
Front: View the element at the front without removing it.
isEmpty: Check if the queue is empty.
Applications:
Task scheduling in operating systems.
Print job management.
Breadth-first search in graph algorithms.
Handling requests in networking.
Implementation:
Can be implemented using arrays, linked lists, or specialized data structures
like a circular queue.
Differences:
Order of Removal:
Stack: Last In, First Out (LIFO).
Queue: First In, First Out (FIFO).
Operations:
Stack: Push, Pop, Peek.
Queue: Enqueue, Dequeue, Front.
Implementation:
Stacks can be easily implemented using arrays or linked lists.
Queues can be implemented using arrays, linked lists, or circular buffers.
Applications:
Stacks are used in scenarios where the last operation needs to be undone or
revisited.
Queues are used when tasks are processed in the order they arrive.
Both stacks and queues are fundamental data structures, and their simplicity makes them
useful in various applications. Choosing between them depends on the specific requirements
of a problem.
3.2 Implementing Stack and Queue data structures, their applications, and usage in algorithm design
Let's implement basic versions of a Stack and a Queue in C and discuss their applications and usage in algorithm design.
Stack Implementation:
#include <stdio.h>
#include <stdlib.h>

#define MAX_SIZE 100

typedef struct {
    int data[MAX_SIZE];
    int top;
} Stack;

void initialize(Stack *stack) {
    stack->top = -1;
}
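The push and pop operations can be completed along the following lines; this is a minimal sketch assuming the Stack type and MAX_SIZE defined above:

int isEmpty(Stack *stack) { return stack->top == -1; }
int isFull(Stack *stack)  { return stack->top == MAX_SIZE - 1; }

void push(Stack *stack, int value) {
    if (isFull(stack)) { printf("Stack overflow!\n"); return; }
    stack->data[++stack->top] = value;   /* add at the top */
}

int pop(Stack *stack) {
    if (isEmpty(stack)) { printf("Stack underflow!\n"); return -1; }
    return stack->data[stack->top--];    /* remove from the top */
}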
Queue Implementation:
#include <stdio.h>
#include <stdlib.h>

#define MAX_SIZE 100

typedef struct {
    int data[MAX_SIZE];
    int front;
    int rear;
} Queue;
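Initialization and the enqueue and dequeue operations might be sketched as follows, assuming the Queue type and MAX_SIZE above (a simple non-circular version following the front/rear convention used later in this unit):

void initialize(Queue *queue) {
    queue->front = -1;   /* -1 marks an empty queue */
    queue->rear = -1;
}

void enqueue(Queue *queue, int value) {
    if (queue->rear == MAX_SIZE - 1) { printf("Queue overflow!\n"); return; }
    if (queue->front == -1) queue->front = 0;   /* first element */
    queue->data[++queue->rear] = value;         /* insert at the rear */
}

int dequeue(Queue *queue) {
    if (queue->front == -1 || queue->front > queue->rear) {
        printf("Queue underflow!\n");
        return -1;
    }
    return queue->data[queue->front++];         /* remove from the front */
}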
Stack:
Applications:
Function call stack in programming languages for managing function calls
and local variables.
Undo/Redo functionality in text editors and other applications.
Expression evaluation and conversion (infix to postfix, postfix to infix, etc.).
Backtracking algorithms (e.g., depth-first search in graphs).
Usage in Algorithm Design:
Used to implement depth-first search (DFS) algorithm in graph traversal.
Helps in solving problems requiring last-in-first-out (LIFO) behavior, such as
backtracking algorithms.
Can be used to reverse a sequence of elements efficiently.
Queue:
Applications:
Job scheduling in operating systems.
Print spooling in printers.
Breadth-first search (BFS) in graph traversal.
Buffer management in networking and I/O systems.
Usage in Algorithm Design:
Used to implement breadth-first search (BFS) algorithm in graph traversal.
Helps in solving problems requiring first-in-first-out (FIFO) behavior, such as
scheduling problems.
Can be used to simulate processes where entities arrive and leave in a
sequential manner.
In algorithm design, stacks and queues are fundamental data structures used in
various problem-solving scenarios. Understanding their properties and operations is
crucial for efficiently solving problems in computer science and related fields.
You can implement stacks in data structures using two underlying data structures: an array and a linked list.
Array: In array implementation, the stack is formed using an array. All the
operations are performed using arrays. You will see how all operations can be
implemented on the stack in data structures using an array data structure.
Linked List: In the linked list implementation of stacks, every new element is inserted as the top element. That is, the top pointer always points to the newly inserted node. Whenever you want to remove an element from the stack, remove the node indicated by top and move top to the next node in the list (the element that was inserted before it).
Application of Stack in Data Structures
Expression Evaluation and Conversion
Backtracking
Function Call
Parentheses Checking
String Reversal
Syntax Parsing
Memory Management
1. Expression Evaluation and Conversion
There are three types of expression that you use in programming, they are:
Infix Expression: An infix expression is a single letter, or an operator preceded by one infix string and followed by another infix string.
X
X+Y
(X + Y ) + (A - B)
Prefix Expression: A prefix expression is a single letter or an operator followed by two prefix
strings.
X
+XY
++XY-AB
Postfix Expression: A postfix expression (also called Reverse Polish Notation) is a single
letter or an operator preceded by two postfix strings.
X
XY+
XY+AB-+
Similarly, the stack is used to evaluate these expressions and to convert between them, for example from infix to prefix or from infix to postfix.
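For instance, a postfix expression can be evaluated with a stack by pushing operands and applying each operator to the top two entries. Below is a minimal sketch for single-digit operands; the function name is illustrative:

#include <stdio.h>
#include <ctype.h>

/* Evaluate a postfix expression of single-digit operands, e.g. "23*4+" -> 10. */
int evalPostfix(const char *expr) {
    int stack[100], top = -1;
    for (const char *p = expr; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p)) {
            stack[++top] = *p - '0';            /* push operand */
        } else {
            int b = stack[top--];               /* right operand */
            int a = stack[top--];               /* left operand */
            switch (*p) {
                case '+': stack[++top] = a + b; break;
                case '-': stack[++top] = a - b; break;
                case '*': stack[++top] = a * b; break;
                case '/': stack[++top] = a / b; break;
            }
        }
    }
    return stack[top];                          /* final result */
}

int main(void) {
    printf("%d\n", evalPostfix("23*4+"));       /* prints 10 */
    return 0;
}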
2. Backtracking
To solve an optimization problem with backtracking, you explore multiple candidate solutions; not every candidate turns out to be correct. While finding all the possible solutions, you store previously computed states on the stack and use them to backtrack and solve the subsequent subproblems.
The N-queens problem is an example of backtracking: it is solved by a recursive algorithm in which a stack is used.
3. Function Call
Whenever you call one function from another function in a program, the return address of the calling function is stored on the stack. When the called function terminates, program control moves back to the calling function with the help of the reference stored on the stack.
So the stack plays an important role whenever you call a function from another function.
4. Parentheses Checking
The stack in data structures is used to check whether parentheses like ( ), { } are valid in a program, by verifying that opening and closing brackets are balanced.
The opening brackets are stored on the stack, which controls the flow of the matching.
For example, ((a + b) * (c + d)) is valid, but {{a+b})) *(b+d}] is not valid.
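A balance check of this kind might be sketched as follows; the function name is illustrative:

#include <stdio.h>

/* Check whether brackets (), {}, [] in an expression are balanced. */
int isBalanced(const char *expr) {
    char stack[100];
    int top = -1;
    for (const char *p = expr; *p != '\0'; p++) {
        char c = *p;
        if (c == '(' || c == '{' || c == '[') {
            stack[++top] = c;                    /* push opener */
        } else if (c == ')' || c == '}' || c == ']') {
            if (top < 0) return 0;               /* closer with no opener */
            char open = stack[top--];
            if ((c == ')' && open != '(') ||
                (c == '}' && open != '{') ||
                (c == ']' && open != '[')) return 0;
        }
    }
    return top == -1;                            /* all openers matched */
}

int main(void) {
    printf("%d\n", isBalanced("((a + b) * (c + d))"));  /* 1: balanced */
    printf("%d\n", isBalanced("{{a+b}))"));             /* 0: unbalanced */
    return 0;
}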
5. String Reversal
Another exciting application of stack is string reversal. Each character of a string gets stored
in the stack.
The string's first character is held at the bottom of the stack, and the last character of the
string is held at the top of the stack, resulting in a reversed string after performing the pop
operation.
6. Syntax Parsing
Since many programming languages are context-free languages, the stack is used for syntax
parsing by many compilers.
7. Memory Management
Memory management is an essential feature of the operating system, so the stack is heavily
used to manage memory.
Array representation of Queue
We can easily represent a queue using a linear array. Two variables, front and rear, are maintained for every queue; they point to the positions from where deletions and insertions are performed. Initially, the value of front and rear is -1, which represents an empty queue. The array representation of a queue containing 5 elements, along with the respective values of front and rear, is shown in the following figure.
The above figure shows a queue of characters forming the English word "HELLO". Since no deletion has been performed in the queue so far, the value of front remains -1. However, the value of rear increases by one every time an insertion is performed. After inserting an element into the queue shown in the above figure, the queue will look as follows: the value of rear becomes 5, while the value of front remains the same.
After deleting an element, the value of front will increase from -1 to 0, and the queue will look as follows.
Check if the queue is already full by comparing rear to max - 1. If so, return an overflow error.
If the item is to be inserted as the first element in the list, set the value of front and rear to 0 and insert the element at the rear end.
Otherwise, keep increasing the value of rear and insert each element one by one, with rear as the index.
Algorithm
o Step 1: IF REAR = MAX - 1
Write OVERFLOW
Go to step 4
[END OF IF]
o Step 2: IF FRONT = -1 and REAR = -1
SET FRONT = REAR = 0
ELSE
SET REAR = REAR + 1
[END OF IF]
o Step 3: Set QUEUE[REAR] = NUM
o Step 4: EXIT
C Function
void insert (int queue[], int max, int *front, int *rear, int item)
{
    /* front and rear are passed by pointer so the updates persist in the caller */
    if (*rear + 1 == max)
    {
        printf("overflow");
    }
    else
    {
        if (*front == -1 && *rear == -1)
        {
            *front = 0;
            *rear = 0;
        }
        else
        {
            *rear = *rear + 1;
        }
        queue[*rear] = item;
    }
}
If the value of front is -1, or the value of front is greater than rear, write an underflow message and exit.
Otherwise, return the item stored at the front end of the queue and increase the value of front each time.
Algorithm
o Step 1: IF FRONT = -1 or FRONT > REAR
Write UNDERFLOW
ELSE
SET VAL = QUEUE[FRONT]
SET FRONT = FRONT + 1
[END OF IF]
o Step 2: EXIT
C Function
int delete (int queue[], int max, int *front, int *rear)
{
    int y = -1;
    if (*front == -1 || *front > *rear)
    {
        printf("underflow");
    }
    else
    {
        y = queue[*front];
        if (*front == *rear)
        {
            *front = *rear = -1;   /* queue becomes empty */
        }
        else
        {
            *front = *front + 1;
        }
    }
    return y;
}
3.3 SUMMARY
In this unit we have discussed in detail Stacks and Queues: the implementation of Stack and Queue data structures, their applications, and their usage in algorithm design.
3.4 KEYWORDS
Stacks,
Queues,
Algorithm design.
3.5 QUESTIONS
3.6 REFERENCES
4. "Data Structures and Algorithms in C++" by Adam Drozdek
5. "Data Structures and Algorithms Made Easy" by Narasimha Karumanchi
6. "Data Structures and Algorithm Analysis in C++" by Mark A. Weiss
7. "Data Structures and Algorithms with Object-Oriented Design Patterns in C++" by
Bruno R. Preiss
UNIT 4: Linked Lists
Structure:
4.0 Objectives
4.1 Introduction
4.2 Singly Linked Lists, Doubly Linked Lists and Circular Linked Lists, their
Implementation and applications.
4.3 Summary
4.4 Keywords
4.5 Questions
4.6 Reference
4.0 OBJECTIVES
Singly Linked Lists, Doubly Linked Lists and Circular Linked Lists, their
Implementation and applications.
4.1 INTRODUCTION
A singly linked list is a data structure in which each element (node) contains a data part and a
link to the next node in the sequence.
1. Data Part: This part stores the actual data or payload associated with the
node. It could be of any data type depending on the application.
2. Link (or Pointer) Part: This part contains a reference (or pointer) to the next
node in the sequence. It establishes the logical connection between nodes
and allows traversal through the list.
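In C, such a node might be declared as follows (a minimal sketch with an integer payload):

/* One node of a singly linked list: a payload plus a link to its successor. */
typedef struct Node {
    int data;              /* data part: the payload */
    struct Node *next;     /* link part: address of the next node, or NULL */
} Node;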
Here's a simple visual representation of a singly linked list:
Head -> [Data | Next] -> [Data | Next] -> [Data | NULL]
In this example:
Dynamic Size: Singly linked lists can dynamically grow or shrink in size as elements
are added or removed.
Sequential Access: Elements are accessed sequentially starting from the head node
and traversing through subsequent nodes.
Efficient Insertion and Deletion: Insertion and deletion operations are efficient at
the beginning or end of the list, but may require traversal for operations in the middle.
No Random Access: Unlike arrays, singly linked lists do not support direct access to
elements by index; traversal is required to reach a specific node.
Singly linked lists are widely used in various applications, such as implementing stacks and queues, maintaining dynamic collections whose size changes frequently, and chaining collided keys in hash tables.
Understanding singly linked lists is fundamental in computer science and forms the basis for
more complex data structures and algorithms.
#include <stdio.h>
#include <stdlib.h>

#define MAX_SIZE 100

typedef struct {
    int data[MAX_SIZE];
    int top;
} Stack;

void initialize(Stack *stack) {
    stack->top = -1;
}

int isFull(Stack *stack) {
    return stack->top == MAX_SIZE - 1;
}

int isEmpty(Stack *stack) {
    return stack->top == -1;
}

void push(Stack *stack, int value) {
    if (!isFull(stack)) {
        stack->data[++stack->top] = value;
    } else {
        printf("Stack overflow!\n");
    }
}

int pop(Stack *stack) {
    if (!isEmpty(stack)) {
        return stack->data[stack->top--];
    } else {
        printf("Stack underflow!\n");
        return -1;
    }
}

int peek(Stack *stack) {
    if (!isEmpty(stack)) {
        return stack->data[stack->top];
    } else {
        printf("Stack is empty!\n");
        return -1;
    }
}

int main() {
    Stack myStack;
    initialize(&myStack);
    push(&myStack, 10);
    push(&myStack, 20);
    push(&myStack, 30);
    push(&myStack, 40);
    // Popping elements from the stack
    printf("Popped: %d\n", pop(&myStack));
    return 0;
}
A doubly linked list is a data structure similar to a singly linked list, with the addition of each
node having a pointer/reference to the previous node as well as the next node. This
bidirectional linkage allows traversal of the list in both forward and backward directions.
NULL <-> [Node 1] <-> [Node 2] <-> [Node 3] <-> ... <-> [Node N] <-> NULL
Each node contains two pointers: prev (pointer to the previous node) and next
(pointer to the next node).
The first node's prev pointer and the last node's next pointer are NULL, indicating
the start and end of the list, respectively.
Nodes can be traversed in both forward and backward directions.
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    int data;
    struct Node* prev;
    struct Node* next;
} Node;

typedef struct {
    Node* head;
    Node* tail;
} DoublyLinkedList;

void initialize(DoublyLinkedList *list) {
    list->head = NULL;
    list->tail = NULL;
}

void insertAtBeginning(DoublyLinkedList *list, int value) {
    Node* newNode = malloc(sizeof(Node));
    newNode->data = value;
    newNode->prev = NULL;
    newNode->next = list->head;
    if (list->head != NULL) {
        list->head->prev = newNode;
    } else {
        list->tail = newNode; // if list was empty, newNode is now both head and tail
    }
    list->head = newNode;
}

void insertAtEnd(DoublyLinkedList *list, int value) {
    Node* newNode = malloc(sizeof(Node));
    newNode->data = value;
    newNode->next = NULL;
    newNode->prev = list->tail;
    if (list->tail != NULL) {
        list->tail->next = newNode;
    } else {
        list->head = newNode; // if list was empty, newNode is now both head and tail
    }
    list->tail = newNode;
}

void displayForward(DoublyLinkedList *list) {
    for (Node* current = list->head; current != NULL; current = current->next) {
        printf("%d ", current->data);
    }
    printf("\n");
}

void displayBackward(DoublyLinkedList *list) {
    for (Node* current = list->tail; current != NULL; current = current->prev) {
        printf("%d ", current->data);
    }
    printf("\n");
}

int main() {
    DoublyLinkedList myList;
    initialize(&myList);
    insertAtBeginning(&myList, 10);
    insertAtBeginning(&myList, 20);
    insertAtBeginning(&myList, 30);
    insertAtEnd(&myList, 40);
    insertAtEnd(&myList, 50);
    insertAtEnd(&myList, 60);
    printf("Forward: ");
    displayForward(&myList);
    printf("Backward: ");
    displayBackward(&myList);
    return 0;
}
In this implementation: initialize sets up an empty list; insertAtBeginning and insertAtEnd add nodes at the two ends while maintaining both the prev and next links; displayForward and displayBackward print the elements of the list in the forward and backward directions.
Doubly linked lists are used in scenarios where bidirectional traversal is required, such as
implementing text editors (for undo/redo functionality), implementing cache algorithms, and
representing sparse matrices.
Applications:
Doubly linked lists find application in various scenarios where bidirectional traversal and
efficient insertion and deletion operations at both ends of the list are required. Some common
applications include:
1. Text Editors:
Doubly linked lists can be used to implement the data structure behind text
editors for features like undo and redo operations.
Each node in the list can represent a text buffer, and the bidirectional links
allow for efficient navigation and manipulation of the text.
2. Cache Implementation:
Doubly linked lists are used in implementing LRU (Least Recently Used) and
MRU (Most Recently Used) cache eviction policies.
Each node in the list represents a cached item, and when an item is accessed, it
is moved to the front or end of the list depending on the eviction policy.
3. Browser History:
Doubly linked lists can be used to implement browser history functionalities.
Each visited page can be stored as a node in the list, and bidirectional links
allow for efficient navigation through the browsing history.
4. Music Playlist:
Doubly linked lists are suitable for implementing music playlist functionalities
in media players.
Each song can be represented as a node in the list, and bidirectional links
allow for moving forward and backward through the playlist.
5. Deque (Double-ended Queue) Implementation:
Doubly linked lists can be used to implement deque data structures, where
insertion and deletion operations are allowed at both ends of the list.
This is particularly useful in scenarios where both FIFO (First-In-First-Out)
and LIFO (Last-In-First-Out) operations are required.
6. Sparse Matrix Representation:
Doubly linked lists are used to represent sparse matrices efficiently.
Each node in the list represents a non-zero element of the matrix, and the
bidirectional links allow for easy traversal and manipulation of the matrix
elements.
7. Undo/Redo Functionality in Software:
Doubly linked lists are commonly used to implement undo/redo functionality
in various software applications.
Each action performed by the user can be stored as a node in the list, and
bidirectional links allow for efficient navigation through the action history.
Overall, doubly linked lists are versatile data structures with applications in various domains
where bidirectional traversal and efficient insertion and deletion operations are required.
A circular linked list is similar to a singly linked list, but the last node points back to the first
node.
A circular linked list is a variation of the linked list data structure where the last node's next
pointer points back to the first node, forming a circular loop. This means that the list has no
ending node; it loops back to the beginning.
Here's a visual representation:
[Node 1] -> [Node 2] -> ... -> [Node N] -> (back to [Node 1])
Each node contains two parts: data and a pointer to the next node.
The last node's next pointer points back to the first node, forming a loop.
Circular linked lists can be singly or doubly linked.
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    int data;
    struct Node* next;
} Node;

typedef struct {
    Node* head;
} CircularLinkedList;

void initialize(CircularLinkedList *list) {
    list->head = NULL;
}

void insertAtEnd(CircularLinkedList *list, int value) {
    Node* newNode = malloc(sizeof(Node));
    newNode->data = value;
    if (list->head == NULL) {
        list->head = newNode;
        newNode->next = list->head; // a single node points back to itself
    } else {
        Node* temp = list->head;
        while (temp->next != list->head) {
            temp = temp->next;
        }
        temp->next = newNode;
        newNode->next = list->head; // close the circle
    }
}

void display(CircularLinkedList *list) {
    if (list->head == NULL) {
        return;
    }
    Node* current = list->head;
    do {
        printf("%d ", current->data);
        current = current->next;
    } while (current != list->head);
    printf("\n");
}

int main() {
    CircularLinkedList myList;
    initialize(&myList);
    insertAtEnd(&myList, 10);
    insertAtEnd(&myList, 20);
    insertAtEnd(&myList, 30);
    display(&myList);
    return 0;
}
1. Round-robin Scheduling: Used in scheduling algorithms where tasks are executed in
a circular manner.
2. Circular Buffers: Used in data streaming applications to manage continuous data
flow.
3. Implementation of Circular Queue: Used in data structures where elements are
inserted and removed in a circular manner, such as in the BFS algorithm's queue
implementation.
These types of linked lists have their own advantages and applications, and the choice
depends on the specific requirements of the problem. Singly linked lists are simple and
memory-efficient, doubly linked lists provide bidirectional traversal, and circular linked lists
have applications where nodes need to be traversed in a loop.
4.2 Singly Linked Lists, Doubly Linked Lists and Circular Linked Lists, their
Implementation and applications.
A singly linked list is a collection of nodes which can be used to effectively implement linear data structures such as stacks and queues. In general, each node in a singly linked list consists of several fields, out of which one field is used to hold the address of the next node in the list and all other fields are used to hold data items of primitive type. Figure 4.1 shows an example of a singly linked list with three fields: the first two fields hold the data and the third field contains the address of the next node in the list.
(b) Structure of the node
Every node is a chunk of memory having an address. When a set of data elements to be used by an application is represented using a linked list, each data element is represented by a node. Depending on the information content of the data element, one or more data fields are used in the node. However, in a singly linked list only a single link field is used, pointing to the node which represents its neighbouring element in the list. The last node in the linked list has its link empty. The empty link is normally denoted using a cross mark.
Figure 4.2 shows the physical representation of a linked list whose logical representation is shown in Figure 4.1. Note that the nodes are distributed all over the memory and are not physically contiguous. Also observe that the link field of each node contains the address of the node of its logical neighbour. The link field of the last node is NULL, indicated by a cross symbol. In the above example, the node at address 4026, containing the data elements E and 50, is the last node of the linked list. This can be easily understood with Figure 4.1.
OPERATIONS ON SINGLY LINKED LISTS:
We have briefly described the various operations that can be performed on linked lists, in general, in Unit 1. In fact, we can realize all those operations on singly linked lists. The following sections describe the logical representation of all these operations along with their associated algorithms.
Algorithm: Insert-First-SLL
Input: F, address of the first node.
       e, element to be inserted.
Output: F, updated.
Method:
1. n = getnode()
2. n.data = e
3. n.link = F
4. F = n
Algorithm ends
In the above algorithm, n indicates the node to be added to the list. The getnode() function allocates the required memory locations to a node n and assigns the address to it. The statement n.data = e copies the element into the data field of the node n. The statement n.link = F copies the address of the first node into the link field of the node n. Finally, the address of the node n is placed in the pointer variable F, which always points to the first node of the list. Thus an element is inserted into a singly linked list at the front. Figure 4.3 below shows the logical representation of the above operation. In Figure 4.3(a) the pointer F contains the address of the first node, i.e. 1000. Figure 4.3(b) shows the node n to be inserted, which has its address 5000, holding the data A. Figure 4.3(c) shows the linked list after inserting the node n at the front end of the list. Now F contains the address of the new node, i.e. 5000.
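A C rendering of Insert-First-SLL might look like the following sketch, assuming a Node type whose fields mirror the data and link fields used in the algorithm:

#include <stdlib.h>

typedef struct Node {
    int data;
    struct Node *link;
} Node;

/* Mirrors Insert-First-SLL: returns the updated first-node pointer F. */
Node *insertFirst(Node *F, int e) {
    Node *n = malloc(sizeof(Node));  /* 1. n = getnode() */
    n->data = e;                     /* 2. n.data = e */
    n->link = F;                     /* 3. n.link = F */
    return n;                        /* 4. F = n */
}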
Algorithm: Insert-Last-SLL
Input: F, address of the first node.
       e, element to be inserted.
Output: F, updated.
Method:
1. n = getnode()
2. n.data = e
3. n.link = Null
4. if (F = Null) then F = n
   else
       T = F
       While (T.link ≠ Null) DO
           T = T.Link
       End-of-While
       T.link = n
   Endif
Algorithm ends
The algorithm above illustrates the insertion of a node at the end of a singly linked list. Observe
that the link field of the node to be inserted is Null. This is because the inserted node
becomes the end of the list, whose link field always contains the Null value. If the list is
empty, the inserted node becomes the first node and hence the pointer F will hold the address
of this inserted node. Otherwise, the list is traversed using a temporary variable till it reaches
the end of the list. This is accomplished using a statement T = T.Link. Once the last
node is found, the address of the new node is copied to the link field. Figure 4.4 below shows
the logical representation of inserting a node at the end of the linked list.
Figure 4.4 Logical representation of the insertion operation
Algorithm: Insert-Sorted-SLL
Input: F, address of the first node.
       e, element to be inserted.
Output: F, updated.
Method:
1. n = getnode()
2. n.data = e
3. if (F = Null) then
       n.link = Null
       F = n
   else
       if (e < F.data) then
           n.link = F
           F = n
       else
           T = F
           While ((T.link ≠ Null) and ((T.link).data < e)) DO
               T = T.Link
           End-of-While
           n.link = T.link
           T.link = n
       Endif
   Endif
Algorithm ends
The above algorithm illustrates the operation of inserting an element into a sorted linked list.
This operation requires the information about the immediate neighbour of the current
element. This is done using the statement T.link.data. This is because, we need to insert an
element between two nodes. The rest of the statements are self explanatory and can be traced
with an example for better understanding of the algorithm. Figure 4.5 below shows the
logical representation of this operation.
Algorithm: Delete-First-SLL
Input: F, address of the first node.
Output: F, updated.
Method:
1. If (F = Null) then Display “List is empty”
   else
       T = F
       F = F.link
       Dispose(T)
   Endif
Algorithm ends
The above Algorithm is self explanatory. The function Dispose(T) is a logical deletion. It
releases the memory occupied by the node and freed memory is added to available memory
list. Figure 4.6 below shows the logical representation of this operation.
Algorithm: Delete-Last-SLL
Input: F, address of the first node.
Output: F, updated.
Method:
1. If (F = Null) then Display “List is empty”
   else
       if (F.link = Null) then
           Dispose(F)
           F = Null
       else
           T = F
           While ((T.link).link ≠ Null) DO
               T = T.Link
           End-of-While
           Dispose(T.link)
           T.link = Null
       Endif
   Endif
Algorithm ends
From the algorithm above, we understand that we may come across three cases while deleting an element from a list. When the list is empty, we just display an appropriate message. When a list contains only one element, we remove it and the pointer holding the address of the first node is assigned a Null value. When a list contains more than one element, we traverse the linked list to reach its end. Since we need to update the link field of the node that precedes the node to be deleted, we must examine the link field of the node after the current node. This is accomplished using the statement (T.link).link. Figure 4.7 below shows the logical representation of this operation.
(a) Linked list before deletion
Algorithm: Delete-Element-SLL
Input: F, address of the first node.
       e, element to be deleted.
Output: F, updated.
Method:
1. If (F = Null) then Display “List is empty”
   else
       if (F.data = e) then
           T = F
           F = F.link
           Dispose(T)
       else
           T = F
           While ((T.link ≠ Null) and ((T.link).data ≠ e)) DO
               T = T.Link
           End-of-While
           if (T.link = Null) then Display “Element not found”
           else
               V = T.link
               T.link = (T.link).link
               Dispose(V)
           Endif
       Endif
   Endif
Algorithm ends
The above algorithm is self explanatory as described in the earlier algorithm. Figure 4.8
below shows the logical representation of this operation.
(a) Linked list after deleting a node containing element 40
Figure 4.8
Insertion and deletion at the beginning of a linked list are very fast. They involve changing
only one or two pointers, which takes O(1) time.
Finding or deleting a specified item requires searching through, on the average, half the items
in the list. This requires O(N) comparisons. An array is also O(N) for these operations, but
the linked list is nevertheless faster because nothing needs to be moved when an item is
inserted or removed. The increased efficiency can be significant, especially if a copy takes
much longer than a comparison.
Of course, another important advantage of linked lists over arrays is that the linked list uses
exactly as much memory as it needs, and can expand to fill all available memory.
The size of an array is fixed when it is created; this usually leads to inefficiency because the
array is too large, or to running out of room because the array is too small.
Definition: A linear linked list organized in such a way that link field of the last node
contains the address of the first node is called as a circularly linked list. Figure 4.9 illustrates
the representation of a circularly linked list.
Figure 4.9. Logical representation of a circularly linked list
From the above representation of a circularly linked list, it is understood that during processing one has to make sure one does not get into an infinite loop owing to the circular nature of the pointers in the list. A solution to this problem is to designate a special node to act as the head of the list. This node is usually referred to as the header node. The header node has advantages beyond marking the beginning of the list: the list can never be empty, and so it never has to be represented by a hanging pointer (F = Null), as was the case with empty singly linked lists. A circular linked list becomes empty when the header's link points back to the header node itself, i.e. (Head.link = Head). A circular linked list with a header node is called a headed circularly linked list. Figure 4.10 shows an empty headed circularly linked list. Figure 4.11 shows the logical representation of a headed circularly linked list.
Observe that the header node has the same structure as the other nodes in the list. The data
field of the header node is unused and is indicated as a shaded field in the pictorial
representation. However, in practical applications these fields may be utilized to represent
any useful information about the list relevant to the application.
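In C, the headed representation makes the empty test a single pointer comparison; a minimal sketch, assuming a Node type with a link field:

/* A headed circular list is empty when the header's link points to itself. */
int isEmptyCLL(const Node *head) {
    return head->link == head;
}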
OPERATIONS ON CIRCULARLY LINKED LISTS
Let us understand the various primitive operations that can be performed on circularly linked
lists.
To insert an element at the end of the circularly linked list, we need to know the address of
the header node and an element to be inserted. The algorithm below is used to implement this
operation. Figure 4.13 shows the pictorial representation of this operation.
Algorithm: Insert-Last-CLL
Input: H, address of the header node.
e, element to be inserted.
Output: H, updated.
Method:
1. n = Getnode()
2. n.data = e
3. n.link = H
4. T = H
5. While (T.link ≠ H) DO
T = T.link
End-of-while
6. T.link = n
Algorithm ends
We can observe from the above figure that inserting a node (element) at the end of the list is very simple. First, we have to traverse the list till we reach the end of the list. The last node of the list is the one whose link field contains the address of the header node, as shown in Figure 4.13(a). Once we find this node, its link field is replaced with the address of the new node to
be inserted and the address of the header node is copied to the link field of the inserted node.
Algorithm: Delete-First-CLL
Input: H, address of the header node.
Output: H, updated.
Method:
1. If (H.link = H) then Display “List is empty”
   else
       T = H.link
       H.link = (H.link).link
       Dispose(T)
   endif
Algorithm ends
To delete an element at the end of the circularly linked list, first, we have to traverse the list
till we reach the node previous to the last node of the list. Deletion is done by just replacing
the link field of the last but one node with the address of the header node. The algorithm
below is used to implement this operation and is self explanatory. Figure 4.15 shows the
pictorial representation of this operation.
Algorithm: Delete-Last-CLL
Input: H, address of the header node.
Output: H, updated.
Method:
1. If (H.link = H) then Display “List is empty”
   else
       T = H
       While ((T.link).link ≠ H) DO
           T = T.link
       End-of-while
       Dispose(T.link)
       T.link = H
   endif
Algorithm ends
We can split the given circularly linked list into two lists such that a node containing the
element e becomes the first node of the second list. Let H1 be the address of the header node
of the list. To accomplish this task, we have to traverse the list to find the node containing the
element e. Once found, the link field of the node previous to this node is replaced with the
address of the header node H1. The address of the node containing the element e is copied to
another header node say H2 and this list is traversed till we reach the end i.e. the node whose
link field contains the address of the header node H1. Now, the link field of this node is
replaced with the address of the header node H2. Thus we get the two lists with header nodes
H1 and H2. The following algorithm is used to implement this operation and the Figure 4.16
shows the pictorial representation of this operation.
Algorithm: Split-CLL
Input: H, address of the header node and element e.
Output: H1, H2, headers.
Method:
1. H2 = Get-Header()
2. T = H
3. While ((T.link ≠ H) and ((T.link).data ≠ e)) DO
       T = T.link
   End-of-while
4. if (T.link ≠ H) then
       H2.link = T.link
       T.link = H
       T = H2
       While (T.link ≠ H) DO
           T = T.link
       End-of-while
       T.link = H2 ; H1 = H
   Else
       H2.link = H2
   endif
Algorithm ends
(a) Before splitting
Algorithm: Combine-CLL
Input: H1 and H2, address of the header nodes.
Output: H1, Header.
Method: 1. T = H1
2. While (T.link ≠ H1) DO
T = T.link
End-of-while
3. T.link = H2.link
4. While (T.link ≠ H2) DO
T = T.link
End-of-while
5. T.link = H1
6. Dispose(H2)
Algorithm ends
Definition: A linear linked list organized in such a way that every node consists of one or
more data fields and two link fields that contain references to the previous and to the next
node in the sequence. It can be viewed as two singly-linked lists formed from the same data
items, in two opposite orders.
The two links allow walking along the list in either direction with equal ease. Compared to a singly linked list, modifying a doubly linked list usually requires changing more pointers, but it is sometimes simpler because there is no need to keep track of the address of the previous node. The link fields are often called forward and backward, or next and previous. A pointer to any node of a doubly linked list gives access to the whole list. Figure 3.10 shows the pictorial representation of a doubly linked list and Figure 4.17 shows the pictorial representation of the node structure in a doubly linked list.
Let us understand the various primitive operations that can be performed on doubly linked
lists.
1. Inserting an element at the beginning of the doubly linked list.
To insert an element at the beginning of the doubly linked list, we need to know the address
of the first and last node of the list and an element to be inserted. The algorithm below is used
to implement this operation. Figure 4.18 shows the pictorial representation of this operation.
Algorithm: Insert-First-DLL
Input: F, R, address of first and last nodes.
e, element to be inserted.
Output: F and R, updated.
Method:
1. n = Getnode()
2. n.data = e
3. n.blink = null
4. n.flink = F
5. If (F = null) then R = n
Else F.blink = n
endif
6. F = n
Algorithm ends
From the above pictorial representation, we can understand that inserting an element at the beginning of the doubly linked list requires updating the blink and flink pointers of the node to be inserted as well as of the node pointed to by F. After insertion, the pointer F is also updated to point to the inserted node. If the inserted node is the first node of the list, then we have to update the pointer R as well.
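A corresponding C sketch of Insert-First-DLL is given below; the DNode type and the double-pointer interface for F and R are illustrative assumptions.

#include <stdlib.h>

typedef struct DNode {
    int data;
    struct DNode *flink;      /* forward link (next node) */
    struct DNode *blink;      /* backward link (previous node) */
} DNode;

/* Insert e at the front of the list; F and R are passed by address
   so that both ends of the list can be updated. */
void insert_first_dll(DNode **F, DNode **R, int e) {
    DNode *n = malloc(sizeof(DNode));
    n->data = e;
    n->blink = NULL;
    n->flink = *F;
    if (*F == NULL)           /* list was empty: n is also the last node */
        *R = n;
    else
        (*F)->blink = n;
    *F = n;
}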
2. Inserting an element at the end of the doubly linked list.
To insert an element at the end of the doubly linked list, we need to know the address of the
first and last node of the list and an element to be inserted. The algorithm below is used to
implement this operation. Figure 4.19 shows the pictorial representation of this operation.
Algorithm: Insert-Last-DLL
Input: F, R, address of first and last nodes
e, element to be inserted.
Output: F and R, updated.
Method:
1. n = Getnode()
2. n.data = e
3. n.flink = null
4. n.blink = R
5. If (R = null) then F = n
Else R.flink = n
endif
6. R = n
Algorithm ends
From the above pictorial representation, we can understand that inserting an element at the end of the doubly linked list requires updating the blink and flink pointers of the node to be inserted as well as of the node pointed to by R. After insertion, the pointer R is also updated to point to the inserted node. If the inserted node is the first node of the list, then we have to update the pointer F as well.
Algorithm: Delete-First-DLL
Input: F, R, address of first and last nodes
Output: F and R, updated.
Method:
1. If (F = null) then Display “List is empty”
Else
T=F
F = F.flink
If (F = null) then R = null
Else
F.blink = null
Endif
Dispose(T)
Endif
Algorithm ends
Deleting the first node from a doubly linked list is a simple task. We need the address of the first (F) and the last node (R) of the list. If F = null, the list is empty. Otherwise, F is made to point to the next node using the statement F = F.flink, thus deleting the first node. If F = null after deletion, then R is also set to null: if there was only one element, the list becomes empty after deletion. Otherwise, the node next to the deleted one becomes the first node of the list, and hence its blink is set to null, that is, F.blink = null. Finally, the memory occupied by the deleted node is released. Figure 4.20 shows the pictorial representation of this operation.
Algorithm: Delete-Last-DLL
Input: F, R, address of first and last nodes
Output: F and R, updated.
Method:
1. If (R = null) then Display “List is empty”
Else
T=R
R = R.blink
If (R = null) then F = null
Else
R.flink = null
Endif
Dispose(T)
Endif
Algorithm ends
Deleting the last node from a doubly linked list is likewise simple. We need the address of the first (F) and the last node (R) of the list. If R = null, the list is empty. Otherwise, R is made to point to the previous node using the statement R = R.blink, thus deleting the last node. If R = null after deletion, then F is also set to null: if there was only one element, the list becomes empty after deletion. Otherwise, the node previous to the deleted one becomes the last node of the list, and hence its flink is set to null, that is, R.flink = null. Finally, the memory occupied by the deleted node is released. Figure 4.21 shows the pictorial representation of this operation.
Figure 4.21 The pictorial representation of deletion operation.
Algorithm: Insert-Sort-DLL
Input: F, R, address of first and last nodes
e, element to be inserted.
Output: F and R, updated.
Method: 1. n = Getnode()
2. n.data = e
3. if (F=null) then
n.blink = null
n.flink = null
F=n
R=n
else
if (e ≤ F.data) then
n.flink = F
n.blink = null
F.blink = n
F=n
else
T=F
While ((T.flink ≠ null) and (T.flink).data < e) DO
T = T.flink
End-of-while
n.flink = T.flink
n.blink = T
T.flink = n
If (T = R) then R = n
else (n.flink).blink = n
endif
endif
endif
Algorithm ends
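The same operation can be sketched in C as follows, reusing the DNode type from the earlier sketch; the function name is again an illustrative assumption.

/* Insert e into a doubly linked list kept in ascending order,
   following the three cases of Insert-Sort-DLL. */
void insert_sorted_dll(DNode **F, DNode **R, int e) {
    DNode *n = malloc(sizeof(DNode));
    n->data = e;
    if (*F == NULL) {                      /* case 1: empty list */
        n->flink = n->blink = NULL;
        *F = *R = n;
    } else if (e <= (*F)->data) {          /* case 2: new first node */
        n->flink = *F;
        n->blink = NULL;
        (*F)->blink = n;
        *F = n;
    } else {                               /* case 3: insert after some node t */
        DNode *t = *F;
        while (t->flink != NULL && t->flink->data < e)
            t = t->flink;
        n->flink = t->flink;
        n->blink = t;
        t->flink = n;
        if (t == *R)                       /* inserted at the very end */
            *R = n;
        else
            n->flink->blink = n;
    }
}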
Algorithm: Delete-Element-DLL
Input: F, R, address of first and last nodes
e, element to be deleted.
Output: F and R, updated.
Method: 1. if (F = null) then Display “List is empty”
else
if (e = F.data) then
F = F.flink
if (F = null) then R = null
else
F.blink = null
endif
else
T=F
While ((T.flink ≠ null) and (T.flink).data ≠ e) DO
T = T.flink
End-of-while
if (T = R) then Display “Element e is not found”
else
T.flink = (T.flink).flink
if (T.flink = null) then R = T
else
(T.flink).blink = T
endif
endif
endif
endif
Algorithm ends
4.3 Summary
4.4 Keywords
Linear data structure, Singly Linked lists, circularly linked lists, doubly linked lists, header node.
4.5 Questions
1. What are singly linked lists? Explain with an illustrative example.
2. Mention the various operations performed on singly linked lists.
3. Briefly explain the implementation of stack using singly linked list.
4. Briefly explain the implementation of queue using singly linked list.
5. Design an algorithm to delete all the nodes with a given element from a singly linked
list.
6. Design an algorithm to delete alternate occurrences of an element from a singly linked list.
7. Design an algorithm to delete an element e from a sorted singly linked list.
8. What are circular linked lists? Explain with an illustrative example.
9. Mention the various operations performed on circularly linked lists.
10. What is a header node? Why is it required? Explain with an example.
11. What are the advantages of circularly linked lists?
12. Design an algorithm to delete all the nodes with a given element from a circularly linked list.
13. Write a note on Singly Linked Lists, Doubly Linked Lists and Circular Linked Lists,
their Implementation and applications.
4.6 Reference
Unit-5 TREES
Structure
5.0 Objectives
5.1 Introduction
5.2 Terminology and Definition of Tree
5.3 Binary Tree
5.4 Binary Search Tree
5.5 AVL Trees
5.6 B-Trees and their Implementation
5.7 Traversal Algorithms and Applications
5.8 Summary
5.9 Keywords
5.10 Questions
5.11 References
5.0 OBJECTIVES
5.1 INTRODUCTION
Trees are also non-linear data structures, and they are very useful in representing hierarchical relationships among data items. For example, in real life, if we want to express the relationships that exist among the members of a family, we use non-linear structures like trees. Organizing data in a hierarchical structure plays a very important role in most applications that involve searching. Trees are among the most useful and widely used data structures in Computer Science, in the areas of data storage, parsing, evaluation of expressions, and compiler design.
5.2 DEFINITION AND BASIC TERMINOLOGIES OF TREES
Definition: A tree is defined as a finite set of one or more nodes such that
(i) there is a specially designated node called the root, and
(ii) the rest of the nodes can be partitioned into t disjoint sets (t ≥ 0), each set representing a tree Ti, i = 1, 2, 3, …, t, known as a subtree of the tree.
A node in the definition of the tree represents an item of information, and the links between the nodes, termed branches, represent an association between the items of information. Figure 5.2.1 shows a tree.
In the above figure, node 1 represents the root of the tree, nodes 2, 3, 4 and 9 are intermediate nodes, and nodes 5, 6, 7, 8, 10, 11 and 12 are the leaf nodes of the tree. The definition of the tree emphasizes (i) connectedness and (ii) the absence of loops or cycles. Beginning from the root node, the structure of the tree permits connectivity of the root to every other node in the tree; in general, any node is reachable from anywhere in the tree. Also, with branches providing links between the nodes, the structure ensures that no set of nodes links together to form a closed loop or cycle.
There are several basic terminologies associated with trees. There is a specially designated node called the root node. The number of subtrees of a node is known as the degree of the node. Nodes that have degree zero are called leaf nodes or terminal nodes; the rest are called intermediate nodes. The nodes that hang from branches emanating from a node are called children, and the node from which the branches emanate is known as the parent node. Children of the same parent node are referred to as siblings. The ancestors of a given node are those nodes that occur on the path from the root to the given node. The degree of a tree is the maximum degree of a node in the tree. The level of a node is defined by letting the root node occupy level 0; the rest of the nodes occupy various levels depending on their association. Thus, if a parent node occupies level i, its children occupy level i + 1. This gives the tree a hierarchical structure, with the root occupying the topmost level, 0. The height or depth of a tree is defined to be the maximum level of any node in the tree. A forest is a set of zero or more disjoint trees; the removal of the root node from a tree results in a forest.
5.3 BINARY TREE
A binary tree has the characteristic of all nodes having at most two branches, that is, all nodes
have a degree of at most 2. Therefore, a binary tree can be empty or consist of a root node
and two disjointed binary trees termed left subtree and right subtree. Figure 5.3.1 shows an
example binary tree.
The maximum number of vertices at each level in a binary tree can be found as follows:
At level 0: 2^0 vertices
At level 1: 2^1 vertices
At level 2: 2^2 vertices
…
At level i: 2^i vertices
Therefore, the maximum number of vertices in a binary tree of depth l is
2^0 + 2^1 + 2^2 + … + 2^l = 2^(l+1) − 1.
For example, a binary tree of depth l = 2 can have at most 2^3 − 1 = 7 vertices.
Figure 5.4 An example binary tree
The number of levels in the tree is called the “depth” of the tree. A “complete” binary tree is one that allows sequencing of the nodes, where all the previous levels are maximally accommodated before the next level is occupied, i.e., the siblings are accommodated before the children of any one of them. A binary tree that is maximally accommodated, with all leaves at the same level, is called a “full” binary tree. A full binary tree is always complete, but a complete binary tree need not be full. Figure 5.4 is an example of a full binary tree and Figure 5.4.1 illustrates a complete binary tree.
A binary tree can also be represented using an n × n matrix, where n is the number of vertices, with entries ‘L’ and ‘R’ marking left and right children.
Here, the row indices correspond to the parent nodes and the column indices to the child nodes: a row corresponding to the vertex vi having the entries ‘L’ and ‘R’ indicates that vi has as its left child the vertex whose column carries the entry ‘L’, and as its right child the vertex whose column carries the entry ‘R’. The column corresponding to a vertex vi with no entry indicates that vi is the root node; every other column has exactly one entry. Each row may have 0, 1 or 2 entries: zero entries in a row indicate that the corresponding vertex vi is a leaf node, one entry indicates that the node has only one child, and two entries indicate that the node has both the left and right children.
From the above representation, we can see that the storage space utilization is not efficient. Let n be the number of vertices. The space allocated is an n × n matrix, i.e., n^2 locations, but there are only n − 1 entries in the matrix. Therefore, the percentage of space utilization is ((n − 1) / n^2) × 100. This percentage decreases as n increases, and for large n it becomes negligible. Therefore, this way of representing a binary tree is not efficient in terms of memory utilization.
If l is the depth of the binary tree, then the number of possible nodes in the binary tree is 2^(l+1) − 1. Hence it is necessary to allocate 2^(l+1) − 1 locations to represent the binary tree in a one-dimensional array. If n is the number of nodes actually present, then the percentage of utilization is (n / (2^(l+1) − 1)) × 100. Figure 5.4.3 shows a binary tree and Figure 5.4.4 shows its one-dimensional array representation.
For a complete and full binary tree there is 100% utilization, and there is maximum wastage if the binary tree is right-skewed or left-skewed, where only l + 1 locations are utilized out of the 2^(l+1) − 1 allocated.
An important observation to be made here is that the organization of the data in the binary
tree decides the space utilization of the representation used.
The first type of self-balancing binary search tree to be invented was the AVL tree, named after its inventors, Adelson-Velsky and Landis. In AVL trees, the difference between the heights of the left and right subtrees, known as the Balance Factor, must be at most one. Once the difference exceeds one, the tree executes a balancing algorithm until the difference is at most one again.
There are usually four cases of rotation in the balancing algorithm of AVL trees: LL, RR, LR, RL.
LL Rotations
LL rotation is performed when the node is inserted into the right subtree leading to an
unbalanced tree. This is a single left rotation to make the tree balanced again −
LR Rotations
LR rotation is the extended version of the previous single rotations, also called a double
rotation. It is performed when a node is inserted into the right subtree of the left subtree. The
LR rotation is a combination of the left rotation followed by the right rotation. There are
multiple steps to be followed to carry this out.
Consider an example with “A” as the root node, “B” as the left child of “A” and “C”
as the right child of “B”.
Since the unbalance occurs at A, a left rotation is applied on the child nodes of A, i.e.
B and C.
After the rotation, the C node becomes the left child of A and B becomes the left child
of C.
The unbalance still persists, therefore a right rotation is applied at the root node A and
the left child C.
After the final right rotation, C becomes the root node, A becomes the right child and
B is the left child.
Fig :5.5.3 LR Rotation
RL Rotations
RL rotation is also the extended version of the previous single rotations, hence it is called a
double rotation and it is performed if a node is inserted into the left subtree of the right
subtree. The RL rotation is a combination of the right rotation followed by the left rotation.
There are multiple steps to be followed to carry this out.
Consider an example with “A” as the root node, “B” as the right child of “A” and “C”
as the left child of “B”.
Since the unbalance occurs at A, a right rotation is applied on the child nodes of A,
i.e. B and C.
After the rotation, the C node becomes the right child of A and B becomes the right
child of C.
The unbalance still persists, therefore a left rotation is applied at the root node A and
the right child C.
After the final left rotation, C becomes the root node, A becomes the left child and B
is the right child.
Fig :5.5.4 RL Rotation
Insertion
The data is inserted into the AVL Tree by following the Binary Search Tree property of
insertion, i.e. the left subtree must contain elements less than the root value and right subtree
must contain all the greater elements. However, in AVL Trees, after the insertion of each
element, the balance factor of the tree is checked; if it does not exceed 1, the tree is left as it
is. But if the balance factor exceeds 1, a balancing algorithm is applied to readjust the tree
such that balance factor becomes less than or equal to 1 again.
Algorithm
The following steps are involved in performing the insertion operation of an AVL Tree −
Step 1 − Create a node
Step 2 − Check if the tree is empty
Step 3 − If the tree is empty, the new node created will become the root node of the AVL
Tree.
Step 4 − If the tree is not empty, we perform the Binary Search Tree insertion operation and
check the balancing factor of the node in the tree.
Step 5 − If the balance factor exceeds ±1, apply suitable rotations on the node concerned and resume the insertion from Step 4.
START
if node == null then:
return new node
if key < node.key then:
node.left = insert (node.left, key)
else if (key > node.key) then:
node.right = insert (node.right, key)
else
return node
node.height = 1 + max (height (node.left), height (node.right))
balance = getBalance (node)
if balance > 1 and key < node.left.key then:
return rightRotate (node)
if balance < -1 and key > node.right.key then:
return leftRotate (node)
if balance > 1 and key > node.left.key then:
node.left = leftRotate (node.left)
return rightRotate (node)
if balance < -1 and key < node.right.key then:
node.right = rightRotate (node.right)
return leftRotate (node)
return node
END
Insertion Example
Let us understand the insertion operation by constructing an example AVL tree from the integers 1 to 7.
Starting with the first element 1, we create a node and measure the balance, i.e., 0.
Since both the binary search property and the balance factor are satisfied, we insert another
element into the tree.
The balance factor for the two nodes is calculated and found to be −1 (the height of the left subtree is 0 and the height of the right subtree is 1). Since it does not exceed 1, we add another element to the tree.
Now, after adding the third element, the balance factor exceeds 1 and becomes 2. Therefore,
rotations are applied. In this case, the RR rotation is applied since the imbalance occurs at
two right nodes.
The tree is rearranged as
Similarly, the next elements are inserted and rearranged using these rotations. After
rearrangement, we achieve the tree as −
The balance of the tree still remains 1, therefore we leave the tree as it is without performing
any rotations.
Example
#include <stdio.h>
#include <stdlib.h>
struct Node {
int data;
struct Node *leftChild;
struct Node *rightChild;
int height;
};
int max(int a, int b);
int height(struct Node *N){
if (N == NULL)
return 0;
return N->height;
}
int max(int a, int b){
return (a > b) ? a : b;
}
struct Node *newNode(int data){
struct Node *node = (struct Node *) malloc(sizeof(struct Node));
node->data = data;
node->leftChild = NULL;
node->rightChild = NULL;
node->height = 1;
return (node);
}
/* rotate the subtree rooted at y to the right; returns the new root x */
struct Node *rightRotate(struct Node *y){
struct Node *x = y->leftChild;
struct Node *T2 = x->rightChild;
x->rightChild = y;
y->leftChild = T2;
y->height = max(height(y->leftChild), height(y->rightChild)) + 1;
x->height = max(height(x->leftChild), height(x->rightChild)) + 1;
return x;
}
/* rotate the subtree rooted at x to the left; returns the new root y */
struct Node *leftRotate(struct Node *x){
struct Node *y = x->rightChild;
struct Node *T2 = y->leftChild;
y->leftChild = x;
x->rightChild = T2;
x->height = max(height(x->leftChild), height(x->rightChild)) + 1;
y->height = max(height(y->leftChild), height(y->rightChild)) + 1;
return y;
}
/* balance factor = height of left subtree minus height of right subtree */
int getBalance(struct Node *N){
if (N == NULL)
return 0;
return height(N->leftChild) - height(N->rightChild);
}
/* BST insertion followed by rebalancing rotations */
struct Node *insertNode(struct Node *node, int data){
if (node == NULL)
return (newNode(data));
if (data < node->data)
node->leftChild = insertNode(node->leftChild, data);
else if (data > node->data)
node->rightChild = insertNode(node->rightChild, data);
else
return node;
node->height = 1 + max(height(node->leftChild),
height(node->rightChild));
int balance = getBalance(node);
if (balance > 1 && data < node->leftChild->data)
return rightRotate(node);
if (balance < -1 && data > node->rightChild->data)
return leftRotate(node);
if (balance > 1 && data > node->leftChild->data) {
node->leftChild = leftRotate(node->leftChild);
return rightRotate(node);
}
if (balance < -1 && data < node->rightChild->data) {
node->rightChild = rightRotate(node->rightChild);
return leftRotate(node);
}
return node;
}
struct Node *minValueNode(struct Node *node){
struct Node *current = node;
while (current->leftChild != NULL)
current = current->leftChild;
return current;
}
/* inorder traversal prints the keys in sorted order */
void printTree(struct Node *root){
if (root == NULL)
return;
if (root != NULL) {
printTree(root->leftChild);
printf("%d ", root->data);
printTree(root->rightChild);
}
}
int main(){
struct Node *root = NULL;
root = insertNode(root, 22);
root = insertNode(root, 14);
root = insertNode(root, 72);
root = insertNode(root, 44);
root = insertNode(root, 25);
root = insertNode(root, 63);
root = insertNode(root, 98);
printf("AVL Tree: ");
printTree(root);
return 0;
}
Output
AVL Tree: 14 22 25 44 63 72 98
Here we will see what B-Trees are. A B-Tree is a specialized m-way search tree, widely used for disk access. A B-Tree of order m can have at most m − 1 keys and m children per node. Because a single node can store a large number of elements, the height of the tree stays relatively small; this is one great advantage of B-Trees.
A B-Tree has all of the properties of an m-way tree, along with some additional ones:
• Every node except the root and the leaves must have at least m/2 children; the root must have at least two children.
• All leaf nodes must be at the same level.
Example of B-Tree
A B-Tree supports the basic operations of searching, insertion and deletion. Within each node, the items are kept sorted. The element at position i has a child before and after it: the children stored before it hold smaller values, and the children to its right hold bigger values. Here we will see how to perform insertion into a B-Tree. Suppose we have a B-Tree like the one below.
Fig 5.6.2 Example of a B-Tree
To insert an element, the idea is very similar to the BST, but we have to follow some rules. Each node has at most m children and m − 1 elements. If we insert an element into a node, there are two situations: if the node has fewer than m − 1 elements, the new element is inserted directly into the node; if it already has m − 1 elements, then we take all of its elements together with the element to be inserted, find their median, and send the median up to the parent node by the same criteria, creating two separate nodes from the left half and the right half of the node.
Suppose we want to insert 79 into the tree. First it is checked against the root: it is greater than 56, so we move to the rightmost subtree. Now it is less than 81, so we move to the left subtree. After that it is inserted into that node, which then holds three elements [66, 78, 79]. The median value is 78, so 78 goes up and the parent node becomes [78, 81], while the elements of the node are split into two nodes: one holds 66 and the other holds 79.
Input − the root of the tree, and the key to insert. We assume that the key is not already present in the tree.
x := Read root
If x is full then
z := new node
Locate the middle object oi stored in x; move the objects to the left of oi into node y
Move the objects to the right of oi into node z
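As a concrete illustration of the node-splitting step described above, here is a minimal C sketch in the style of CLRS; the minimum degree T, the BTreeNode layout and the function name are assumptions of this sketch, not notation from the text.

#include <stdlib.h>
#include <stdbool.h>

#define T 2                               /* assumed minimum degree (order m = 2*T) */

typedef struct BTreeNode {
    int nkeys;                            /* number of keys currently stored */
    int keys[2 * T - 1];
    struct BTreeNode *child[2 * T];
    bool leaf;
} BTreeNode;

/* Split the full child y = x->child[i] around its median key,
   promoting the median into x; assumes x itself is not full. */
void splitChild(BTreeNode *x, int i) {
    BTreeNode *y = x->child[i];
    BTreeNode *z = malloc(sizeof(BTreeNode));
    z->leaf = y->leaf;
    z->nkeys = T - 1;
    for (int j = 0; j < T - 1; j++)       /* right half of y's keys move to z */
        z->keys[j] = y->keys[j + T];
    if (!y->leaf)
        for (int j = 0; j < T; j++)       /* right half of y's children move to z */
            z->child[j] = y->child[j + T];
    y->nkeys = T - 1;                     /* y keeps the left half */
    for (int j = x->nkeys; j > i; j--)    /* make room in x for the new child */
        x->child[j + 1] = x->child[j];
    x->child[i + 1] = z;
    for (int j = x->nkeys - 1; j >= i; j--)
        x->keys[j + 1] = x->keys[j];
    x->keys[i] = y->keys[T - 1];          /* the median key moves up into x */
    x->nkeys++;
}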
Here we will see how to perform the deletion of a key from a B-Tree. Suppose we have a B-Tree like the one below.
Fig 5.6.4 Example of a B-Tree
Deletion has two parts. First we have to find the element; that strategy is like searching. For the deletion itself, we have to take care of some rules: a node must have at least m/2 elements. So if we delete one element and the node is left with fewer elements than allowed, it will adjust itself. If an entire node is deleted, its children will be merged, and if the merged node's size is the same as m, it is split into two parts, and again the median value goes up.
Suppose we want to delete 46. Its two children are [45] and [47, 49]; they are merged into [45, 47, 49], and then 47 goes up.
Input − the root of the tree, and the key to delete. We assume that the key is present in the tree.
if x is a leaf, then
delete the object with key ‘key’ from x
else if x does not contain the object with key ‘key’, then
locate the child x->child[i] whose key range holds ‘key’
y := x->child[i]
if y has m/2 elements, then
if a sibling node z immediately to the left or right of y has at least one more object than m/2, add one more object to y by moving x->key[i] from x to y, and move the last or first object from z to x; if y is a non-leaf node, then the last or first child pointer in z is also moved to y
else (every immediate sibling of y has m/2 elements) merge y with an immediate sibling
end if
BTree-Delete(y, key)
else
if the child y that precedes ‘key’ in x has at least m/2 + 1 objects, then find the predecessor k of ‘key’ in the subtree rooted at y, recursively delete k from that subtree, and replace ‘key’ with k in x
else if y has m/2 elements, then check the child z that immediately follows ‘key’ in x: if z has at least m/2 + 1 objects, then find the successor k of ‘key’ in the subtree rooted at z, recursively delete k from that subtree, and replace ‘key’ with k in x
else (both y and z have m/2 elements) merge y and z into one node and push ‘key’ down into the new node as well; recursively delete ‘key’ from this new node
end if
end if
A traversal of a binary tree is a process in which its nodes are visited in a particular but repetitive order, rendering a linear order of the nodes or of the information represented by them. There are three simple ways to traverse a tree, called preorder, inorder, and postorder. In each technique, the left subtree is traversed recursively, the right subtree is traversed recursively, and the root is visited; what distinguishes the techniques from one another is the order of those three tasks. The following sections discuss these three different ways of traversing a binary tree.
Preorder Traversal
In this traversal, the nodes are visited in the order of root, left child and then right child.
o Process the root node first.
o Traverse the left sub-tree.
o Traverse the right sub-tree.
Inorder Traversal
In this traversal, the nodes are visited in the order of left child, root and then right child, i.e., the left sub-tree is traversed first, then the root is visited and then the right sub-tree is traversed. The function must perform only three tasks:
o Traverse the left sub-tree.
o Process the root node.
o Traverse the right sub-tree.
Postorder Traversal
In this traversal, the nodes are visited in the order of left child, right child and then the root.
i.e., the left sub-tree is traversed first, then the right sub-tree is traversed and finally the root
is visited. The function must perform the following tasks.
o Traverse the left subtree.
o Traverse the right subtree.
o Process the root node.
The postorder traversal sequence for the binary tree shown in Figure 5.8.1 is: D H I E B F G
C A.
We have already studied that every node of a binary tree in linked representation has a
structure which has links to the left and right children. The algorithms for traversing the
binary tree in linked representation are given below.
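The three traversals can be sketched in C as follows; the TNode structure with info, llink and rlink fields follows the node structure described above, while the function names are illustrative.

#include <stdio.h>

typedef struct TNode {
    int info;
    struct TNode *llink, *rlink;          /* left and right children */
} TNode;

void preorder(TNode *root) {
    if (root == NULL) return;
    printf("%d ", root->info);            /* process the root first */
    preorder(root->llink);
    preorder(root->rlink);
}

void inorder(TNode *root) {
    if (root == NULL) return;
    inorder(root->llink);
    printf("%d ", root->info);            /* root between the two subtrees */
    inorder(root->rlink);
}

void postorder(TNode *root) {
    if (root == NULL) return;
    postorder(root->llink);
    postorder(root->rlink);
    printf("%d ", root->info);            /* root last */
}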
If D contains ‘LRLR’, from the root node, first move towards left (L), then right (R), then
left (L) and finally move towards right (R). If the pointer points to null at that position, node
temp can be inserted otherwise, it cannot be inserted. To achieve this, one has to start from
the root node. Let us use two pointers prev and cur where prev always points to parent node
and cur points to child node. Initially cur points to root node and prev points to null. To start
with one can write the following statements.
prev = null
cur = root
Now, keep updating the node pointed to by cur towards left if the direction is ‘L’ otherwise,
update towards right. Once all directions are over, if current points to null, insert the node
temp towards left or right based on the last direction. Otherwise, display error message. This
procedure can be algorithmically expressed as follows.
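A minimal C sketch of this procedure is given below, reusing the TNode type from the traversal sketch; the direction string D is assumed to be a NUL-terminated string of 'L' and 'R' characters, and temp is the prepared node.

#include <stdio.h>

/* Insert node temp at the position described by the direction string D
   (e.g., "LRLR"), starting from root; prev trails cur as in the text. */
void insert_by_directions(TNode *root, const char *D, TNode *temp) {
    TNode *prev = NULL, *cur = root;
    const char *d = D;
    while (*d != '\0' && cur != NULL) {   /* follow the directions while possible */
        prev = cur;
        cur = (*d == 'L') ? cur->llink : cur->rlink;
        d++;
    }
    if (*d == '\0' && cur == NULL && prev != NULL) {
        if (*(d - 1) == 'L')              /* the last direction decides the side */
            prev->llink = temp;
        else
            prev->rlink = temp;
    } else {
        printf("Node cannot be inserted\n");
    }
}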
Searching
To search for an item in a tree, we can traverse a tree in any of the (inorder, preorder,
postorder) order to visit the node. As we visit the node, we can compare the item to be
searched with the data item stored in information field of the node. If found then the search is
successful otherwise, search is unsuccessful. A recursive inorder traversal technique used for
searching an item in binary tree is presented below.
Algorithm: Search(item, root, flag)
Method:
1. if (root = null) then
flag = false
exit
ifend
2. Search (item, root.llink, flag)
3. if (item = root.info) then
flag = true
exit
ifend
4. Search (item, root.rlink, flag)
Deletion
Deletion of a node from a binary tree involves searching for a node which contains the data
item. If such a node is found then that node is deleted; otherwise, appropriate message is
displayed. If the node to be deleted is a leaf node then the deletion operation is a simple task.
Otherwise, appropriate modifications need to be done to update the binary tree after deletion.
This operation is explained in detail considering another form of a binary tree called binary
search tree.
Binary Search Tree
A Binary Search Tree (BST) is an ordered binary tree: either it is empty, or the value of the root node is greater than all the values in its Left Sub Tree (LST) and less than all the values in its Right Sub Tree (RST), and the right and left subtrees are again binary search trees by themselves. Figure 5.7.3(a) shows an example binary search tree, whereas Figure 5.7.3(b) is a binary tree but not a binary search tree.
Figure 5.7.3(a) A binary search tree Figure 5.7.3 (b) Not a binary search tree
We will be using BST structure to demonstrate features of Binary Trees. The operations
possible on a binary tree are
Create a Binary Tree
Insert a node in a Binary tree
Delete a node in a Binary Tree
Search for a node in Binary search Tree
Figure 5.7.4 (a) Before insertion Figure 5.7.4 (b) After insertion of item 5
(b) Deleting a node with one child only, either a left child or a right child.
For example, delete node 9, which has only a right child, as shown in Figure 5.7.6. The right pointer of node 7 is made to point to node 11. The new tree after deletion is shown in Figure 5.7.7.
Figure 5.7.6 Deletion of node with only one child Figure 5.7.7 New tree after deletion
5.8 SUMMARY
In this unit we studied trees and binary trees, which are non-linear data structures, inherently two-dimensional in structure. While trees are non-empty and may have nodes of any degree, a binary tree may be empty or hold nodes of degree at most two. The terminologies of root node, height, level, parent, children, sibling, ancestors, leaf or terminal nodes and non-terminal nodes are applicable to both trees and binary trees. We also discussed traversal algorithms and their applications.
5.9 KEYWORDS
Trees
binary trees
non-linear data structures
root node
5.10 QUESTIONS FOR SELF STUDY
5.11 REFERENCES
Sartaj Sahni, 2000, Data structures, algorithms and applications in C++, McGraw Hill
international edition.
Horowitz and Sahni, 1983, Fundamentals of Data structure, Galgotia publications
Horowitz and Sahni, 1998, Fundamentals of Computer algorithm, Galgotia
publications.
Narsingh Deo, 1990, Graph theory with applications to engineering and computer science, Prentice Hall publications.
Tremblay and Sorenson, 1991, An introduction to data structures with applications,
McGraw Hill edition.
UNIT-6.0 GRAPHS
Structure
6.0 Objectives
6.1 Introduction
6.2 Basic Definitions
6.3 Graph Data Structure
6.4 Representation of Graphs
6.5 adjacency matrix
6.6 Adjacency list and graph traversal algorithms.
6.7 Summary
6.8 Keywords
6.9 Questions for self-study
6.10 References
6.0 OBJECTIVES
After studying this unit, we will be able to explain the following:
6.1 INTRODUCTION
We have previously defined non-linear data structures and mentioned that trees and graphs are examples of non-linear data structures. To recall, in non-linear data structures, unlike linear data structures, an element is permitted to have any number of adjacent elements. A graph is an important mathematical representation of a physical problem, for example finding the optimum shortest path from one city to another for a travelling salesman so as to minimize the cost. A graph can have unconnected nodes, and there can be more than one path between two nodes. Graphs and directed graphs are important to computer science for many real-world applications, from building compilers to modeling physical communication networks. A graph is an abstract notion of a set of nodes (vertices or points) and connection relations (edges or arcs) between them.
6.2 BASIC DEFINITIONS
Definition 3: The order of a graph (digraph) G = (V, E) is |V|, sometimes denoted by |G|, and the size of this graph is |E|.
Sometimes we view a graph as a digraph where every unordered edge (u, v) is replaced by
two directed arcs (u, v) and (v, u). In this case, the size of a graph is half the size of the
corresponding digraph.
Definition 4: A walk in a graph (digraph) G is a sequence of vertices v0,v1…vn such that for
all 0 ≤ i < n, (vi,vi+1) is an edge (arc) in G. The length of the walk v0,v1…vn is the number n. A
path is a walk in which no vertex is repeated. A cycle is a walk (of length at least three for
graphs) in which v0 = vn and no other vertex is repeated; sometimes, it is understood, we omit
vn from the sequence.
In the next example, we display a graph G1 and a digraph G2, both of order 5. The size of the graph G1 is 6, where E(G1) = {(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)}, while the size of the digraph G2 is 7, where E(G2) = {(0, 2), (1, 0), (1, 2), (1, 3), (3, 1), (3, 4), (4, 2)}.
A pictorial example of a graph G1 and a digraph G2 is given in figure 6.1
Figure 6.1 A Graph G1 and a digraph G2
Example 1: For the graph G1 of Figure 6.1, the following sequences of vertices are classified
as being walks, paths, or cycles.
Definition 5: A graph G is connected if there is a path between all pairs of vertices u and v of V(G). A digraph G is strongly connected if there is a path from vertex u to vertex v for all pairs u and v in V(G).
In Figure 6.1, the graph G1 is connected, but the digraph G2 is not strongly connected because there are no arcs leaving vertex 2. However, the underlying graph of G2 is connected.
Definition 6: In a graph, the degree of a vertex v, denoted by deg(v), is the number of edges incident to v. For digraphs, the out-degree of a vertex v is the number of arcs {(v, x) Є E | x Є V} incident from v (leaving v), and the in-degree of vertex v is the number of arcs {(x, v) Є E | x Є V} incident to v (entering v).
For a graph, the in-degree and out-degree are the same as the degree. For our graph G1, we have deg(0) = 2, deg(1) = 2, deg(2) = 4, deg(3) = 2 and deg(4) = 2. We may concisely write this as the degree sequence (2, 2, 4, 2, 2) if there is a natural ordering (e.g., 0, 1, 2, 3, 4) of the vertices. The in-degree sequence and out-degree sequence of the digraph G2 are (1, 1, 3, 1, 1) and (1, 3, 0, 2, 1), respectively. The degree of a vertex of a digraph is sometimes defined as the sum of its in-degree and out-degree; using this definition, a degree sequence of G2 would be (2, 4, 3, 3, 2).
Definition 7: A weighted graph is a graph whose edges have weights. These weights can be
thought as cost involved in traversing the path along the edge. Figure 6.2 shows a weighted
graph.
Definition 8: If removal of an edge makes a graph disconnected then that edge is called
cutedge or bridge.
Definition 9: If removal of a vertex makes a graph disconnected then that vertex is called
cutvertex.
Definition 10: A connected graph without a cycle in it is called a tree. The pendent vertices
of a tree are called leaves.
Definition 11: A graph without self loop and parallel edges is called a simple graph.
Definition 12: A graph that can be traced without repeating any edge is called an Eulerian graph. If all vertices of a graph happen to be of even degree, then the graph is Eulerian.
Definition 13: If exactly two vertices of a graph are of odd degree and all the other vertices are of even degree, it is called an open Eulerian graph. In an open Eulerian graph the starting and ending points of the trace must be the odd-degree vertices.
Definition 14: A graph in which all vertices can be traversed exactly once along a single cycle, without repeating any vertex, is called a Hamiltonian graph.
Definition 15: The total degree of a graph is twice the number of edges; that is, total degree = 2 * |E|. Consequently, the sum of the degrees of all even-degree vertices plus the sum of the degrees of all odd-degree vertices is even.
We can formally define graph as an abstract data type with data objects and operations on it
as follows:
Data objects: A graph G of vertices and edges. Vertices represent data objects.
Operations:
Check-Graph-Empty(G): Check if graph G is empty - Boolean function
Insert-Vertex(G, V): Insert an isolated vertex V into a graph G. Ensure that vertex V
does not exist in G before insertion.
Insert-Edge(G, u, v): Insert an edge connecting vertices u, v into a graph G. Ensure
that an edge does not exist in G before insertion.
Delete-Vertex(G, V): Delete vertex V and all the edges incident on it from the graph
G. Ensure that such a vertex exists in the graph G before deletion.
Delete-Edge(G, u, v): Delete an edge from the graph G connecting the vertices u, v.
Ensure that such an edge exists before deletion.
Store-Data(G, V, Item): Store Item into a vertex V of graph G.
Retrieve-Data(G, V, Item): Retrieve data of a vertex V in the graph G and return it
in Item.
BFT(G): Perform Breadth First Traversal of a graph.
DFT(G): Perform Depth First Traversal of a graph.
Note that the number of 1s in a row represents the out-degree of a node. In the case of an undirected graph, the number of 1s in a row represents the degree of the node, and the total number of 1s in the matrix represents the number of edges. Figure 6.4(a) shows a graph and Figure 6.4(b) shows its adjacency matrix.
Figure 6.4(a) Graph Figure 6.4(b) Adjacency matrix
M = [aij], where aij = 1 if edge ej is incident upon vertex vi, and aij = 0 otherwise.
Matrix M is known as the incidence matrix representation of the graph G. Figure 6.4.1 (a)
shows a graph and Figure 6.4.1 (b) shows its incidence matrix.
e1 e2 e3 e4 e5 e6 e7
v1 1 0 0 0 1 0 0
v2 1 1 0 0 0 1 1
v3 0 1 1 0 0 0 0
v4 0 0 1 1 0 0 1
v5 0 0 0 1 1 1 0
Figure 6.4.1(a) Undirected graph Figure 6.4.1 (b) Incidence matrix
The incidence matrix contains only two elements, 0 and 1. Such a matrix is called a binary
matrix or a (0, 1)-matrix.
The following observations about the incidence matrix can readily be made:
1. Since every edge is incident on exactly two vertices, each column of an incidence matrix has exactly two 1's.
2. The number of 1’s in each row equals the degree of the corresponding vertex.
3. A row with all 0’s, therefore, represents an isolated vertex.
Figure 6.4.1 (a) Undirected graph Figure 6.4.1 (b) Linked representation of a graph
Figure 6.4.2 (a) Digraph Figure 6.4.2 (b) Linked representation of a graph
Graphs can also be defined in the form of matrices. Matrix representation is used to perform calculations of paths and cycles in graphs, using matrix operations. The two most common representations of graphs are:
o Adjacency Matrix
o Adjacency List
We will discuss the adjacency matrix here: its formation and its properties.
The adjacency matrix, also called the connection matrix, is a matrix of rows and columns used to represent a simple labelled graph, with 0 or 1 in position (Vi, Vj) according to whether Vi and Vj are adjacent or not. It is a compact way to represent a finite graph of n vertices using an n × n matrix M. The adjacency matrix is sometimes also called the vertex matrix.
If the simple graph has no self-loops, then the vertex matrix has 0s on the diagonal, and it is symmetric for an undirected graph. The connection matrix can be viewed as a square array in which each row represents the out-nodes of a graph and each column represents the in-nodes; an entry of 1 indicates that there is an edge between two nodes.
The adjacency matrix for an undirected graph is symmetric: the value in the ith row and jth column is identical to the value in the jth row and ith column. Additionally, there is a fascinating fact involving matrix multiplication: if the adjacency matrix is multiplied by itself and there is a nonzero value in the ith row and jth column of the product, then there is a route from Vi to Vj of length two. The nonzero value does not specify the path, but it indicates the number of distinct such paths.
The adjacency matrix A = [aij] is defined so that the value aij equals the number of edges from vertex i to vertex j. For an undirected graph, aij = aji for all i, j, so that the adjacency matrix becomes a symmetric matrix. Mathematically, this can be explained as follows: let G be a graph with vertex set {v1, v2, v3, . . . , vn}; then the adjacency matrix of G is the n × n matrix that has a 1 in the (i, j)-position if there is an edge from vi to vj in G, and a 0 in the (i, j)-position otherwise.
Properties
The vertex matrix is an array of numbers which is used to represent the information about the
graph. Some of the properties of the graph correspond to the properties of the adjacency
matrix, and vice versa. The properties are given as follows:
Matrix Powers
The most well-known approach to getting information about a graph from operations on this matrix is through its powers: the entries of the powers of the matrix give information about walks in the graph.
Theorem: Let A be the connection matrix of a given graph. Then the (i, j) entry of A^n counts the walks of length n from vertex i to vertex j.
Spectrum
The study of the eigenvalues of the connection matrix of a graph is the subject of spectral graph theory. Assume A is the connection matrix of a k-regular graph and v is the all-ones column vector in R^n. Then the ith entry of Av is equal to the sum of the entries in the ith row of A. This is the number of edges incident to vertex i, which is exactly k.
Isomorphisms
Two graphs are said to be isomorphic if one can be obtained from the other by relabeling the vertices. Note that isomorphic graphs need not have the same adjacency matrix, because the matrix depends on the labelling of the vertices; however, the adjacency matrices of isomorphic graphs are closely related.
Theorem: Let G and H be graphs with n vertices and adjacency matrices A and B. Then G and H are isomorphic if and only if there exists a permutation matrix P such that B = PAP^(-1).
For an undirected graph, the convention followed depends on the lines and loops: each edge (i.e., line) adds 1 to the appropriate cell in the matrix, and each loop adds 2. Thus, using this practice, we can find the degree of a vertex easily, just by taking the sum of the values in its respective row or column of the adjacency matrix.
6.6 ADJACENCY LIST AND GRAPH TRAVERSAL ALGORITHMS
The graph is a non-linear data structure that represents data using nodes and their relations using edges. A graph G has two components, vertices and edges: the vertices are represented by a set V and the edges by a set E, so the graph notation is G(V, E). Let us see one example to get the idea.
In this graph, there are five vertices and five edges, and the edges are directed. As an example, if we choose the edge connecting vertices B and D, the source vertex is B and the destination is D, so we can move from B to D but not from D to B.
Graphs are non-linear and have no regular structure. To represent a graph in memory, there are a few different styles, the most common being the adjacency matrix and the adjacency list described above.
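As an illustration of the adjacency-list style, here is a minimal C sketch; the Edge structure, the fixed MAXV bound and the function name are assumptions of the example.

#include <stdlib.h>

#define MAXV 10                           /* assumed upper bound on vertices */

typedef struct Edge {
    int to;                               /* destination vertex */
    struct Edge *next;                    /* next edge out of the same vertex */
} Edge;

Edge *adj[MAXV];                          /* adj[v] heads the list of v's out-edges */

/* Add a directed edge from 'from' to 'to' by pushing it onto from's list. */
void add_edge(int from, int to) {
    Edge *e = malloc(sizeof(Edge));
    e->to = to;
    e->next = adj[from];
    adj[from] = e;
}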
In the above diagram, the full way of traversing is shown using arrows.
Step 1: Create a Queue with the same size as the total number of vertices in the graph.
Step 2: Choose 12 as your beginning point for the traversal. Visit 12 and add it to the
Queue.
Step 3: Insert all the unvisited vertices adjacent to the vertex at the front of the Queue. So far, we have 5, 23, and 3.
Step 4: Delete the vertex in front of the Queue when there are no new vertices to visit
from that vertex. We now remove 12 from the list.
Step 5: Continue steps 3 and 4 until the queue is empty.
Step 6: When the queue is empty, generate the final spanning tree by eliminating
unnecessary graph edges.
from collections import deque

def bfs(graph, start):
    # visit vertices level by level, marking them before enqueueing
    visited = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        print(vertex, end=' ')
        for neighbour in graph[vertex] - visited:
            visited.add(neighbour)
            queue.append(neighbour)

graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': {'B'},
    'E': {'B', 'F'},
    'F': {'C', 'E'}
}
bfs(graph, 'A')
Output (the order of vertices within a level may vary, since sets are unordered): A B C E D F
The entire path of traversal is depicted in the diagram above with arrows.
o Step 1: Create a Stack with the total number of vertices in the graph as its size.
o Step 2: Choose 12 as your beginning point for the traversal. Go to that vertex and
place it on the Stack.
o Step 3: Push any of the adjacent vertices of the vertex at the top of the stack that has
not been visited onto the stack. As a result, we push 5
o Step 4: Repeat step 3 until there are no new vertices to visit from the stack’s top
vertex.
o Step 5: Use backtracking to pop one vertex from the stack when there is no new
vertex to visit.
o Step 6: Repeat steps 3, 4, and 5.
o Step 7: When the stack is empty, generate the final spanning tree by eliminating
unnecessary graph edges.
Code Implementation
def dfs(graph, start, visited=None):
    # depth-first search: recurse into each unvisited neighbour
    if visited is None:
        visited = set()
    visited.add(start)
    print(start, end=' ')
    for next_vertex in graph[start] - visited:
        dfs(graph, next_vertex, visited)
    return visited

graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': {'B'},
    'E': {'B', 'F'},
    'F': {'C', 'E'}
}
dfs(graph, 'A')
Output (one possible order; each vertex is printed once and neighbour order may vary): A B D E F C
6.7 SUMMARY
In this unit we discussed graphs in detail. Graphs are non-linear data structures, and a graph is an important mathematical representation of a physical problem. Graphs and directed graphs are important to computer science for many real-world applications, from building compilers to modeling physical communication networks. A graph is an abstract notion of a set of nodes (vertices or points) and connection relations (edges or arcs) between them.
6.8 KEYWORDS
6.10 REFERENCES
Sartaj Sahni, 2000, Data structures, algorithms and applications in C++, McGraw Hill
international edition.
Horowitz and Sahni, 1983, Fundamentals of Data structure, Galgotia publications
Narsingh Deo, 1990, Graph theory with applications to engineering and computer
science, Prentice Hall publications.
Tremblay and Sorenson, 1991, An introduction to data structures with applications,
McGraw Hill edition.
C and Data Structures by Practice - Ramesh, Anand and Gautham.
Data Structures and Algorithms: Concepts, Techniques and Applications by GAV Pai.
Tata McGraw Hill, New Delhi.
UNIT 7.0 SORTING ALGORITHMS
Structure
7.0 Objectives
7.1 Introduction to sorting techniques
7.2 Conventional sort
7.3 Selection sort
7.4 Insertion sort
7.5 Bubble sort
7.6 Quicksort
7.7 Merge sort
7.8 Heap sort algorithms
7.9 Applications of sorting
7.10 Summary
7.11 Keywords
7.12 Questions
7.13 References
7.0 OBJECTIVES
7.1 INTRODUCTION TO SORTING TECHNIQUES
Sorting is the process of arranging a set of unordered elements in some predefined order. The simplest way of ordering an unordered set of elements is to consider one element at a time and arrange the elements, continuing the procedure for all elements in the set: if the desired ordered set is obtained, stop; otherwise continue with other arrangements until the desired order is obtained. The important thing to note in any sorting is the choice of the property on which the sorting is performed; that property is called the sort key.
For example, consider an unordered set of records arranged sequentially, containing details of students. If we want to arrange the details of the students in alphabetical order of the students' names, then we need to sort the records on the student name, and the student name is treated as the sort key. Consider Table 7.1, where we have the details of nine students stored in an unordered way. If we want to store those details in an ordered way, then we need to sort the entire table using one of the attribute values as the sort key.
Table 7.2 shows the ordered table containing the details of the students. In this case the table is sorted in alphabetical order of the students' names, and hence the sort key is the student name.
Table 7.2: Details of students ordered by their name in alphabetical order
Note:
The sort key can be created from two or more keys. The first is termed the primary sort key, the second the secondary sort key, and so on.
In the literature we can find a good number of sorting techniques, which are classified as (i) internal techniques and (ii) external techniques. Internal techniques are those that can be used when the records are small enough in number to be sorted within the main memory; external techniques are those that are used when the records are too many and sorting the entire set requires both main memory and secondary memory. In this unit we shall consider only internal techniques.
In the following sections we present three simple sorting techniques, viz., the conventional sorting technique, the selection sorting technique and the insertion sorting technique. In all the techniques discussed in the following sections we consider an unordered set of integer numbers. While sorting, we can sort in ascending order (i.e., smallest to largest) or in descending order (i.e., largest to smallest). For simplicity, in all the sections we consider sorting integer numbers in ascending order; sorting in descending order is left as an assignment for the students to practice.
7.2 CONVENTIONAL SORT
The basic step in this sorting technique is to bring the smallest element of the unordered list to the current position. We consider two pointers, i and j. Initially, pointer i points to the first data item and pointer j to the next one. Compare the values pointed to by i and j: if the value pointed to by j is smaller than the value pointed to by i, swap the two values; otherwise do not swap them. Increment the pointer j so that it points to the next position and repeat the comparison, incrementing j one position at a time, until j reaches the last position. At the end of this process the smallest element of the list is at the location pointed to by i.
Now increment the pointer i by one and initialize the pointer j to the location next to i. Continue the process discussed above until the pointer i reaches the last but one position of the list. The procedure can be expressed as follows.
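A minimal C sketch of this procedure is given below; the function name conventional_sort and the in-place array interface are illustrative assumptions.

void conventional_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        for (int j = i + 1; j < n; j++) {
            if (a[j] < a[i]) {            /* bring the smaller value to position i */
                int tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
            }
        }
    }
}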
In order to illustrate the conventional sorting technique let us consider an example where a
list A={10, 6, 8, 2, 4, 11} contains unordered set of integer numbers.
Initially, i points to the first position and j to the second:
10 6 8 2 4 11
Swap (10, 6) as 10 is greater than 6, and increment the pointer j:
6 10 8 2 4 11
As the value 6 is less than 8, do not swap the values; only increment the pointer j:
6 10 8 2 4 11
Swap (6, 2) as 6 is greater than 2, and increment the pointer j:
2 10 8 6 4 11
As the value 2 is less than 4, do not swap the values; only increment the pointer j:
2 10 8 6 4 11
As the value 2 is less than 11, do not swap the values; only increment the pointer j:
2 10 8 6 4 11
This completes one iteration, and you can observe that the smallest element is now in the first position. For the second iteration, increment the pointer i so that it points to the next location and initialize the pointer j to the position next to i. Carry out the same procedure as explained above; it can be observed that at the end of the 2nd iteration the list will be as follows.
2 4 10 8 6 11
Continue the same process as explained above and at the end of the process it can be
observed that the list will be sorted in ascending order and it looks as follows
2 4 6 8 10 11
If we assume that each step takes 1 unit of time, then the total time taken to sort the 6 elements considered is 5 + 4 + 3 + 2 + 1 = 15 units of time. In general, if there are n elements in the unordered list, the time taken to sort is (n − 1) + (n − 2) + (n − 3) + … + 3 + 2 + 1 = n(n − 1)/2 = (n^2 − n)/2.
7.3 SELECTION SORT
In the conventional sort we saw that in each step the smallest element is brought to the respective position. In the conventional algorithm it should be noted that a swap is possible at every step, which is the time-consuming part. To reduce this swapping time we have another sorting technique, called selection sort, in which we first select the smallest element in each iteration and then swap that smallest element with the first element of the unsorted part of the list.
Consider the same example that was used to demonstrate the conventional sorting technique. As in the conventional technique, we consider the same two pointers i and j: i points to the first position and j to the next position of i. Find the minimum over all values of j, i.e., i + 1 ≤ j ≤ n. Let the smallest element be at the kth position. If the element pointed to by i is larger than the element at the kth position, swap those values; then increment i and set j = i + 1.
It should be observed that in selection sort there is only one swap per iteration, whereas in the conventional sorting technique there is the possibility of more than one swap in each iteration.
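Under the same assumptions as the previous sketch, a C version of selection sort follows; note that at most one swap is performed per pass of i.

void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int k = i;                        /* index of the smallest element so far */
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[k])
                k = j;
        if (k != i) {                     /* at most one swap per pass */
            int tmp = a[i];
            a[i] = a[k];
            a[k] = tmp;
        }
    }
}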
10 6 8 2 4 11
Swap (10, 2) as 2 is the smallest among the elements covered by the pointer j.
2 6 8 10 4 11
Similarly process the remaining part of the list; the resultant steps are as follows.
2 4 8 10 6 11
2 4 6 10 8 11
2 4 6 8 10 11
2 4 6 8 10 11
7.4 INSERTION SORT
The basic step in this method is to insert an element e into a sequence of ordered elements e1, e2, e3, …, ei in such a way that the resulting sequence of size i + 1 is also ordered. We start with a sub-array of size 1, which is trivially sorted. By inserting the second element into its appropriate position we get an ordered list of two elements. Similarly, each subsequent element is inserted into its respective position, maintaining a partially sorted array.
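A corresponding C sketch of insertion sort, again with illustrative names, is as follows.

void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int e = a[i];                     /* element to insert into the sorted prefix a[0..i-1] */
        int j = i - 1;
        while (j >= 0 && a[j] > e) {      /* shift larger elements one place right */
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = e;                     /* e lands in its ordered position */
    }
}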
To illustrate the working principle of the insertion sort let us consider the same set of
elements considered in the above sections.
10 6 8 2 4 11
Consider the first element, as there is only one element it is already sorted.
10 6 8 2 4 11
Now consider the second element, 6, and insert it at its respective position in the sorted list. As 6 is less than 10, insert 6 before the value 10.
6 10 8 2 4 11
The third element is 8 and the value of 8 is greater than 6 and less than 10. Hence insert the
element 8 in between 6 and 10.
6 8 10 2 4 11
The fourth element is 2, and it is the smallest among all the elements in the partially sorted list, so insert the value 2 at the beginning of the partially sorted list.
2 6 8 10 4 11
The fifth element is 4; the value 4 is greater than 2 and less than the remaining elements of the partially sorted list {6, 8, 10}, so insert the element 4 between 2 and {6, 8, 10}.
2 4 6 8 10 11
The remaining element is 11, and it is the largest of all the elements of the partially sorted list {2, 4, 6, 8, 10}, so leave the element in its place. The final sorted list is as follows.
2 4 6 8 10 11
The insertion sort works faster than the conventional sort and the selection sort. The computation of the time taken to sort an unordered set of elements using insertion sort is left as an assignment to the students (refer to question number 3).
7.5 BUBBLE SORT
We take an unsorted array for our example. Bubble sort takes O(n²) time, so we keep the example short and precise.
Bubble sort starts with the very first two elements, comparing them to check which one is greater.
In this case, value 33 is greater than 14, so it is already in the sorted location. Next, we compare 33 with 27.
We find that 27 is smaller than 33, so these two values must be swapped.
Next we compare 33 and 35 and find that both are already in sorted positions. We then compare 35 with the last element; since the last element is smaller, the two are out of order, so we swap these values. We find that we have reached the end of the array. After one iteration, the array should look like this.
To be precise, we are now showing how the array should look after each iteration. After the second iteration, it should look like this.
Notice that after each iteration, at least one value moves to the end.
And when no swap is required, bubble sort learns that the array is completely sorted.
Algorithm
We assume list is an array of n elements. We further assume that swap function swaps the
values of the given array elements.
begin BubbleSort(list)
   for all elements of list
      if list[i] > list[i+1]
         swap(list[i], list[i+1])
      end if
   end for
   return list
end BubbleSort
Pseudocode
We observe in the algorithm that bubble sort compares each pair of array elements until the whole array is completely sorted in ascending order. This may cause a few efficiency issues, for example when the array needs no more swapping because all the elements are already in ascending order.
To ease out the issue, we use a flag variable swapped which helps us see whether any swap has happened. If no swap has occurred, i.e. the array requires no more processing to be sorted, we come out of the loop.

procedure bubbleSort(list : array of items)
   loop = list.count;
   for i = 0 to loop-1 do
      swapped = false
      for j = 0 to loop-1 do
         /* compare the adjacent elements */
         if list[j] > list[j+1] then
            /* swap them */
            swap(list[j], list[j+1])
            swapped = true
         end if
      end for
      /* if no number was swapped, the array is sorted; break the loop */
      if not swapped then
         break
      end if
   end for
end procedure
One more issue we did not address in our original algorithm and its improved pseudocode is that, after every iteration, the highest value settles down at the end of the array. Hence, the next iteration need not include already-sorted elements. For this purpose, in our implementation, we restrict the inner loop to avoid already-sorted values.
#include <stdio.h>
#include <stdbool.h>

#define MAX 10

int list[MAX] = {1,8,4,6,0,3,5,2,7,9};

// display the contents of the array
void display() {
   int i;
   printf("[");
   for(i = 0; i < MAX; i++) {
      printf("%d ", list[i]);
   }
   printf("]\n");
}
Output
Input Array:[1 8 4 6 0 3 5 2 7 9 ]
Items compared: [ 1, 8 ] => not swapped
Items compared: [ 8, 4 ] => swapped [4, 8]
Items compared: [ 8, 6 ] => swapped [6, 8]
Items compared: [ 8, 0 ] => swapped [0, 8]
Items compared: [ 8, 3 ] => swapped [3, 8]
Items compared: [ 8, 5 ] => swapped [5, 8]
Items compared: [ 8, 2 ] => swapped [2, 8]
Items compared: [ 8, 7 ] => swapped [7, 8]
Items compared: [ 8, 9 ] => not swapped
Iteration 1#: [1 4 6 0 3 5 2 7 8 9 ]
Items compared: [ 1, 4 ] => not swapped
Items compared: [ 4, 6 ] => not swapped
Items compared: [ 6, 0 ] => swapped [0, 6]
Items compared: [ 6, 3 ] => swapped [3, 6]
Items compared: [ 6, 5 ] => swapped [5, 6]
Items compared: [ 6, 2 ] => swapped [2, 6]
Items compared: [ 6, 7 ] => not swapped
Items compared: [ 7, 8 ] => not swapped
Iteration 2#: [1 4 0 3 5 2 6 7 8 9 ]
Items compared: [ 1, 4 ] => not swapped
Items compared: [ 4, 0 ] => swapped [0, 4]
Items compared: [ 4, 3 ] => swapped [3, 4]
Items compared: [ 4, 5 ] => not swapped
Items compared: [ 5, 2 ] => swapped [2, 5]
Items compared: [ 5, 6 ] => not swapped
Items compared: [ 6, 7 ] => not swapped
Iteration 3#: [1 0 3 4 2 5 6 7 8 9 ]
Items compared: [ 1, 0 ] => swapped [0, 1]
Items compared: [ 1, 3 ] => not swapped
Items compared: [ 3, 4 ] => not swapped
Items compared: [ 4, 2 ] => swapped [2, 4]
Items compared: [ 4, 5 ] => not swapped
Items compared: [ 5, 6 ] => not swapped
Iteration 4#: [0 1 3 2 4 5 6 7 8 9 ]
Items compared: [ 0, 1 ] => not swapped
Items compared: [ 1, 3 ] => not swapped
Items compared: [ 3, 2 ] => swapped [2, 3]
Items compared: [ 3, 4 ] => not swapped
Items compared: [ 4, 5 ] => not swapped
Iteration 5#: [0 1 2 3 4 5 6 7 8 9 ]
Items compared: [ 0, 1 ] => not swapped
Items compared: [ 1, 2 ] => not swapped
Items compared: [ 2, 3 ] => not swapped
Items compared: [ 3, 4 ] => not swapped
Output Array: [0 1 2 3 4 5 6 7 8 9 ]
7.6 QUICKSORT
Quick sort is a highly efficient sorting algorithm based on partitioning an array of data into smaller arrays: a large array is partitioned into two arrays, one of which holds values smaller than a specified value, say the pivot, on which the partition is made, and the other of which holds values greater than the pivot value.
Quicksort partitions an array and then calls itself recursively twice to sort the two resulting subarrays. The algorithm is quite efficient for large data sets, as its average-case and worst-case complexities are O(n log n) and O(n²), respectively.
Partition in Quick Sort
The following explains how to find the pivot value in an array. The pivot value divides the list into two parts; recursively, we find a pivot for each sub-list until all sub-lists contain only one element.
Quick Sort Pivot Algorithm
Based on our understanding of partitioning in quick sort, we will now try to write an
algorithm for it, which is as follows.
Step 1 − Choose the highest index value as pivot
Step 2 − Take two variables to point left and right of the list excluding pivot
Step 3 − left points to the low index
Step 4 − right points to the high index
Step 5 − while value at left is less than pivot move right
Step 6 − while value at right is greater than pivot move left
Step 7 − if both step 5 and step 6 do not match, swap left and right
Step 8 − if left ≥ right, the point where they meet is the new pivot
Quick Sort Pivot Pseudocode
The pseudocode for the above algorithm can be derived as −
function partitionFunc(left, right, pivot)
   leftPointer = left
   rightPointer = right - 1

   while True do
      while A[++leftPointer] < pivot do
         //do-nothing
      end while

      while rightPointer > 0 && A[--rightPointer] > pivot do
         //do-nothing
      end while

      if leftPointer >= rightPointer
         break
      else
         swap leftPointer, rightPointer
      end if
   end while

   swap leftPointer, right
   return leftPointer
end function
Quick Sort Algorithm
Using the pivot algorithm recursively, we end up with smaller and smaller partitions. Each partition is then processed for quick sort. We define the recursive algorithm for quicksort as follows −
Step 1 − Make the right-most index value pivot
Step 2 − partition the array using pivot value
Step 3 − quicksort left partition recursively
Step 4 − quicksort right partition recursively
Quick Sort Pseudocode
To get more into it, let us see the pseudocode for the quick sort algorithm −
procedure quickSort(left, right)
if right-left <= 0
return
else
pivot = A[right]
partition = partitionFunc(left, right, pivot)
quickSort(left,partition-1)
quickSort(partition+1,right)
end if
end procedure
#include <stdio.h>
#include <stdbool.h>
#define MAX 7
int intArray[MAX] = {4,6,3,2,1,9,7};
void printline(int count) {
int i;
for(i = 0;i < count-1;i++) {
printf("=");
}
printf("=\n");
}
void display() {
   int i;
   printf("[");
   for(i = 0; i < MAX; i++) {
      printf("%d ", intArray[i]);
   }
   printf("]\n");
}

void swap(int num1, int num2) {
   int temp = intArray[num1];
   intArray[num1] = intArray[num2];
   intArray[num2] = temp;
}

int partition(int left, int right, int pivot) {
   int leftPointer = left - 1;
   int rightPointer = right;
while(true) {
while(intArray[++leftPointer] < pivot) {
//do nothing
}
while(rightPointer > 0 && intArray[--rightPointer] > pivot) {
//do nothing
}
if(leftPointer >= rightPointer) {
break;
} else {
printf(" item swapped :%d,%d\n", intArray[leftPointer],intArray[rightPointer]);
swap(leftPointer,rightPointer);
}
}
printf(" pivot swapped :%d,%d\n", intArray[leftPointer],intArray[right]);
swap(leftPointer,right);
printf("Updated Array: ");
display();
return leftPointer;
}
void quickSort(int left, int right) {
if(right-left <= 0) {
return;
} else {
int pivot = intArray[right];
int partitionPoint = partition(left, right, pivot);
quickSort(left,partitionPoint-1);
quickSort(partitionPoint+1,right);
}
}
int main() {
printf("Input Array: ");
display();
printline(50);
quickSort(0,MAX-1);
printf("Output Array: ");
display();
printline(50);
}
Output
Input Array: [4 6 3 2 1 9 7 ]
==================================================
pivot swapped: 9, 7
Updated Array: [4 6 3 2 1 7 9 ]
pivot swapped: 4, 1
Updated Array: [1 6 3 2 4 7 9 ]
item swapped: 6, 2
pivot swapped: 6, 4
Updated Array: [1 2 3 4 6 7 9 ]
pivot swapped: 3, 3
Updated Array: [1 2 3 4 6 7 9 ]
Output Array: [1 2 3 4 6 7 9 ]
7.7 MERGESORT
The basic concept of merge sort is as follows. Consider a series of n numbers, say A(1), A(2), ..., A(n/2) and A(n/2 + 1), A(n/2 + 2), ..., A(n). Suppose we individually sort the first set and also the second set. To get the final sorted list, we merge the two sets into one common set.
We first look into the concept of arranging two individually sorted series of numbers into a
common series using an example:
Let the first set be A = {3, 5, 8, 14, 27, 32}. Let the second set be B = {2, 6, 9, 15, 18, 30}.
The two lists need not be equal in length; for example, the first list can have 8 elements and the second 5. Now we want to merge these two lists to form a common list C. Look at the elements A(1) and B(1): A(1) is 3, B(1) is 2. Since B(1) < A(1), B(1) will be the first element of C, i.e., C(1) = 2. Now compare A(1) = 3 with B(2) = 6. Since A(1) is smaller than B(2), A(1) will become the second element of C: C[] = {2, 3}.
Similarly, compare A(2) with B(2); since A(2) is smaller, it will be the third element, and so on. Finally, C is built up as C[] = {2, 3, 5, 6, 8, 9, 14, 15, 18, 27, 30, 32}.
However, the main problem remains. In the above example, we presume that both A and B are already sorted; only then can they be merged. But how do we sort them in the first place? To do this, and to show the consequent merging process, we look at the following example. Consider the series A = (7, 5, 15, 6, 4). Now divide A into 2 parts, (7, 5, 15) and (6, 4). Divide (7, 5, 15) again as ((7, 5) and (15)) and (6, 4) as ((6) and (4)). Again (7, 5) is divided, and hence ((7, 5) and (15)) becomes (((7) and (5)) and (15)).
Now, since every list has only one number, we cannot divide again. We start merging the lists, taking two at a time. When we merge 7 and 5 as per the example above, we get (5, 7); merge this with 15 to get (5, 7, 15). Merge this with 6 to get (5, 6, 7, 15). Merging this with 4, we finally get (4, 5, 6, 7, 15). This is the sorted list.
You are now expected to take different sets of examples and see that the method always
works.
We design two algorithms in the following. The main algorithm is a recursive algorithm
(somewhat similar to the binary search algorithm that we saw earlier) which calls at times the
other algorithm called MERGE. The algorithm MERGE does the merging operation as
discussed earlier.
Algorithm: MERGESORT
Input: low, high, the lower and upper limits of the list to be sorted
A, the list of elements
Output: A, Sorted list
Method:
If (low<high)
Mid = (low + high)/2
MERGESORT(low, mid)
MERGESORT(mid+1, high)
MERGE(A, low, mid, high)
If end
Algorithm ends
Algorithm: Merge
Input: low, mid, high, limits of two lists to be merged i.e., A(low, mid) and A(mid+1, high)
A, the list of elements
Output: B, the merged and sorted list
Method:
h = low; i = low; j = mid + 1;
While ((h ≤ mid) and (j ≤ high)) do
    If (A(h) ≤ A(j))
        B(i) = A(h);
        h = h + 1;
    Else
        B(i) = A(j);
        j = j + 1;
    If end
    i = i + 1;
While end
If (h > mid)
    For k = j to high
        B(i) = A(k);
        i = i + 1;
    For end
Else
    For k = h to mid
        B(i) = A(k);
        i = i + 1;
    For end
If end
Algorithm ends
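The two algorithms translate into C as follows. This is a minimal sketch under our own naming (mergeSort, merge, the buffer B are ours), using 0-based indices and copying the merged run back into A:

#include <stdio.h>

#define N 5
int A[N] = {7, 5, 15, 6, 4};
int B[N];                       /* temporary buffer for merging */

/* Merge the sorted runs A[low..mid] and A[mid+1..high] into B,
   then copy the merged run back into A. */
void merge(int low, int mid, int high) {
   int h = low, j = mid + 1, i = low, k;
   while (h <= mid && j <= high) {
      if (A[h] <= A[j]) B[i++] = A[h++];
      else              B[i++] = A[j++];
   }
   while (h <= mid)  B[i++] = A[h++];   /* copy any leftover elements */
   while (j <= high) B[i++] = A[j++];
   for (k = low; k <= high; k++) A[k] = B[k];
}

void mergeSort(int low, int high) {
   if (low < high) {
      int mid = (low + high) / 2;
      mergeSort(low, mid);
      mergeSort(mid + 1, high);
      merge(low, mid, high);
   }
}

int main(void) {
   int i;
   mergeSort(0, N - 1);
   for (i = 0; i < N; i++) printf("%d ", A[i]);   /* 4 5 6 7 15 */
   printf("\n");
   return 0;
}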
7.8 HEAP SORT
We now look at the concept of heap sort, a method of arranging numbers in ascending order. A heap is defined as a collection of numbers, normally arranged in the form of a tree, a binary tree to be more precise (students can look at a later block for the concept of binary trees). The condition is that a parent node is always larger than its child nodes. For example, in the following figures the first two represent heaps, while the others do not.
What we have shown are only two levels, but the same can be rearranged for any number of
levels and any number of elements.
To insert an element into the heap, one adds it “at the bottom” of the heap and then compares it
with its parent, grandparent, great grandparent and so on, until it is less than or equal to one of
these values.
Let us consider the incoming numbers in the sequence (75, 103, 65, 105, 88, 96, 100). Now we want to construct a heap for the same.
Note that though we have ended up with a balanced tree in this case, it may not always be so. Now take out the largest element, 105, put it at the end of the sorted list and rearrange the heap; 103 comes to the top, remove it, rearrange the heap again, and so on. We now write algorithms to do the same. The first one is the most important procedure that actually creates the heap. The other two call on it to produce the required sorted array.
Algorithm Insert
Input: A, An unordered set
N, size of set A
Output:
Insert A[n] into the heap which is stored in A[1: n-1]
Method:
i = n
item = A[n];
While ((i > 1) and (A[i/2] < item)) do
    A[i] = A[i/2];
    i = i/2;
While end
A[i] = item;
Algorithm end
Algorithm Adjust
Input: A, i, n
Output: Updated A
Method:
j = 2i
item = A[i]
While (j <= n) do
    If ((j < n) and (A[j] < A[j+1])) then
        j = j + 1
    // compare the left and right children and let j point to the larger one
    If (item >= A[j]) then break
    // a position for item is found
    A[j/2] = A[j]
    j = 2j
While end
A[j/2] = item;
Algorithm end
Algorithm: Delmax
Input: A, n, x
// Delete the maximum from the heap A[1..n] and store it in x
If (n = 0) then
    Write ("heap is empty");
    Return (false);
If end
x = A[1];
A[1] = A[n];
Adjust (A, 1, n-1);
Return (true);
Algorithm end
To delete the maximum key from the max heap, we use an algorithm called Adjust. Adjust takes as input the array A[] and integers i and n. It regards A[1..n] as a complete binary tree. If the subtrees rooted at 2i and 2i+1 are max heaps, then Adjust rearranges the elements of A[] such that the tree rooted at i is also a max heap. The maximum element of the max heap A[1..n] can be deleted by deleting the root of the corresponding complete binary tree. The last element of the array, i.e. A[n], is copied to the root, and finally we call Adjust(A, 1, n-1).
Algorithm: sort
Input: Unsorted list - A, size of the list - n
Output: Sorted list A
Method:
For i=1 to n do
Insert (A, i)
For end
For i= n to 1 step –1 do
Delmax (A, i, x);
A[i] =x;
For end
Algorithm end
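Putting the ideas of Insert, Adjust and Delmax together, a compact C sketch of heap sort looks like this. It is 0-indexed, uses only the sift-down Adjust to build the heap, and the function names are ours:

#include <stdio.h>

/* Sift the element at index i down until A[0..n-1] rooted at i
   is a max heap again. */
void adjust(int A[], int n, int i) {
   int item = A[i];
   int j = 2 * i + 1;                        /* left child */
   while (j < n) {
      if (j + 1 < n && A[j] < A[j + 1]) j++; /* pick the larger child */
      if (item >= A[j]) break;
      A[(j - 1) / 2] = A[j];                 /* move the child up */
      j = 2 * j + 1;
   }
   A[(j - 1) / 2] = item;
}

void heapSort(int A[], int n) {
   int i, x;
   /* build the max heap bottom-up */
   for (i = n / 2 - 1; i >= 0; i--) adjust(A, n, i);
   /* repeatedly move the maximum to the end and shrink the heap */
   for (i = n - 1; i > 0; i--) {
      x = A[0]; A[0] = A[i]; A[i] = x;
      adjust(A, i, 0);
   }
}

int main(void) {
   int A[] = {75, 103, 65, 105, 88, 96, 100};
   int n = 7, i;
   heapSort(A, n);
   for (i = 0; i < n; i++) printf("%d ", A[i]);   /* ascending order */
   printf("\n");
   return 0;
}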
7.9 APPLICATIONS OF SORTING
Applications of sorting can be found in many areas. Some applications of sorting are:
(1). Speeding up the searching process is one of the most important applications of the sorting technique. For example, to search for an element in any list, sorting the list prior to searching makes the process work faster.
(2). Many statistical procedures require sorting. For example, to find the median of any set of elements, sorting is a must.
(3). To find duplicate elements present in a list.
7.10 SUMMARY
In this unit we have presented the basics of sorting techniques. We have presented several sorting techniques: conventional sort, selection sort, insertion sort, bubble sort, quick sort, merge sort and heap sort. The algorithm for each sorting technique is given in this unit and demonstrated with a suitable example.
7.11 KEYWORDS
Sorting, Conventional sort, Selection sort, Insertion sort, Bubble sort, Quick sort, Merge sort, Heap sort.
7.12 QUESTIONS
(1). Define sorting. Mention the applications of sorting with respect to the field of computer science.
(2). Design and develop an algorithm to sort an unordered set of elements in descending order using (i) conventional sort, (ii) selection sort, and (iii) insertion sort.
(3). Calculate the time taken to sort n elements present in an unordered list using insertion
sort.
(4). Sort the given unordered set {12,2,16,30,8,28,4,10,20,6,18} using conventional sort and count the number of swaps that take place during the sorting process.
(5). For the same unordered set given in question 4, sort using selection sort and insertion sort and count the number of swaps that take place during the sorting process.
(6). Design an algorithm to sort a given set of n numbers using the merge sorting technique.
(7). Design an algorithm to sort n numbers using heap sort. Consider the set A = {12,2,16,30,8,28,4,10,20,6,18} and illustrate the designed algorithm on set A.
7.13 REFERENCES
• Ellis Horowitz, Sartaj Sahni, and Dinesh Mehta. Fundamentals of Data Structures in C++.
• Alfred V. Aho, Jeffrey D. Ullman, John E. Hopcroft. Data Structures and Algorithms. Addison Wesley (January 11, 1983).
UNIT 8: SEARCHING ALGORITHMS
8.0 Objectives
8.1 Introduction to Searching Techniques
8.2 Linear Search
8.3 Binary Search
8.4 Depth First Search
8.5 Breadth First Search
8.6 Summary
8.7 Keywords
8.8 Questions
8.9 Reference
8.0 OBJECTIVES
8.2 LINEAR SEARCH
Linear search is a method of searching for an element in a list in which the given search element 'e' is compared against all elements sequentially, until there is at least one match (i.e., success) or the search reaches the end of the list without finding any match (i.e., failure). Let A = [10 15 6 23 11 96 55 44 66 8 2 30 69 96] and the search element 'e' = 11. Consider a pointer 'i'; to begin the process, initialize the pointer 'i' = 1. Compare the value pointed to by the pointer with the search element 'e' = 11. As A(1) = 10 and it is not equal to element 'e', increment the pointer 'i' by 1. Compare the value pointed to by the pointer, i.e., A(2) = 15; it is also not equal to element 'e'. Continue the process until the search element is found or the pointer 'i' reaches the end of the list.
Step 1 10 ≠11
A= 10 15 6 23 11 96 55 44 66 8 2 30 69 96
15 ≠11
Step 2
A= 10 15 6 23 11 96 55 44 66 8 2 30 69 96
Step 3 6 ≠11
A= 10 15 6 23 11 96 55 44 66 8 2 30 69 96
Step 4 23 ≠11
A= 10 15 6 23 11 96 55 44 66 8 2 30 69 96
Step 5 11 = 11
A= 10 15 6 23 11 96 55 44 66 8 2 30 69 96
The time taken to search for an element in the list is n + 1 in the worst case, when the element is not present in the list. If the element is present, the time taken is the time needed to reach the position of the element: if the element is at the end of a list of size n the time taken is n, and if the element is in the first position the time required is 1 unit. Linear search, or sequential search, suits applications where the size of the list is small. If the size of the list is large, linear search may perform very poorly.
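A minimal C sketch of linear search (the names linearSearch, A, n and e are ours); it returns the 1-based position of e, or 0 on failure:

#include <stdio.h>

int linearSearch(const int A[], int n, int e) {
   int i;
   for (i = 0; i < n; i++) {
      if (A[i] == e) return i + 1;   /* found: report the position */
   }
   return 0;                         /* reached the end: failure */
}

int main(void) {
   int A[] = {10, 15, 6, 23, 11, 96, 55, 44, 66, 8, 2, 30, 69, 96};
   printf("11 found at position %d\n", linearSearch(A, 14, 11));   /* 5 */
   return 0;
}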
8.3 BINARY SEARCH
1 2 3 4 5 6 7 8 9 10 11 12 13 14
A= 10 15 6 23 11 96 55 44 66 8 2 30 69 96
Let Low = 1 and High = 14 be the initial values, where Low is the starting position and High is the last position of the list. Binary search requires a sorted list, so first sort the given list using any of the sorting techniques. In this illustration we assume that we have the sorted version of A, and let the search element again be e = 11.
1 2 3 4 5 6 7 8 9 10 11 12 13 14
A= 2 6 8 10 11 15 23 30 44 55 66 69 96 96
The first comparison is at Mid = (Low + High)/2 = (1 + 14)/2 = 7. Since A(Mid) = 23 is greater than the search element 11, the element must lie in the first half, so set High = Mid − 1 = 6. The new list under consideration is:
1 2 3 4 5 6
A= 2 6 8 10 11 15
Next, Mid = (Low + High)/2 = (1 + 6)/2 = 3. Since the search element is not equal to A(Mid), and A(Mid) = 8 is less than the search element, the search element should be present in the second half of the list. Consider the second half as the new list and continue the same procedure, neglecting the first half of the list. For further processing the initializations are as follows:
Low = Mid + 1, i.e., Low = 3 + 1 = 4; Mid = (Low + High)/2, i.e., Mid = (4 + 6)/2 = 10/2 = 5. In this case the value of High remains the same, i.e., High = 6.
4 5 6
A= 10 11 15
A(Mid) = 11
As A(Mid) is equal to the search element, it can be announced that the search element is found at position 5. The recursive algorithm for binary search is as follows.
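The algorithm itself did not survive in the source; the following is a minimal recursive C sketch (0-indexed; the names are ours):

#include <stdio.h>

/* Recursively search for e in the sorted array A[low..high].
   Returns the index of e, or -1 if the range becomes empty. */
int binarySearch(const int A[], int low, int high, int e) {
   int mid;
   if (low > high) return -1;          /* empty range: not found */
   mid = (low + high) / 2;
   if (A[mid] == e) return mid;
   if (A[mid] < e)                     /* e must be in the right half */
      return binarySearch(A, mid + 1, high, e);
   else                                /* e must be in the left half */
      return binarySearch(A, low, mid - 1, e);
}

int main(void) {
   int A[] = {2, 6, 8, 10, 11, 15, 23, 30, 44, 55, 66, 69, 96, 96};
   printf("index of 11 = %d\n", binarySearch(A, 0, 13, 11));   /* 4 */
   return 0;
}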
The techniques discussed in the above two sections are only for linear types of structures (specifically for arrays). In the next two sections we present two different approaches, called depth first search and breadth first search, which work on graph data structures. Normally these two approaches are used to search for a node in a graph or to traverse an undirected graph. Traversing a graph without revisiting the same node is a difficult task, as a graph may involve loops and circuits. In this unit we consider the problem of searching for a node in a graph.
8.4 DEPTH FIRST SEARCH
The depth first search algorithm starts by visiting an arbitrary node of the graph and marking it as visited. Soon after visiting any node (the current node), we consider one of its adjacent nodes as the next node for traversal; the address of the current node is stored in a stack data structure, and we move to the next adjacent node. The same process continues until no node can be processed further. If there are any nodes which have not been visited, backtracking is used until all the nodes are visited. In depth first search, a stack is used as the storage structure to hold information about the nodes, which is used during backtracking.
Before learning how to search for a node in a graph using depth first search, we need to understand how depth first search can be used to traverse a graph. Consider a graph G as shown in Figure 1(a). The traversal starts with node 1 (Figure 1(b)); mark the node as traversed (grey shading is used to indicate that a node has been traversed) and push node number 1 onto the stack. As it has only one adjacent node, 4, we move to node number 4. Mark node number 4 (Figure 1(c)) and push 4 onto the stack. For node number 4 there are 2 adjacent nodes, i.e., 2 and 5.
Figure 1 (panels (a)–(k)): Traversal of a graph using the depth first search algorithm.
Select one node arbitrarily (for implementation purposes we can select the node with the smallest number) and move to that node; in this case we move to node 2 and push the node number 2 onto the stack. Similarly we move to node 5 from node 2, pushing 5 onto the stack, and then move to node 3 from node 5 and push node 3 onto the stack (Figure 1(d)–1(j)). Figure 1(k) shows the elements present in the stack at the end. From node 3 there is no possibility of traversing further. From this point onwards we backtrack to check whether there are any other nodes which have not been traversed. Pop the top node 3 from the stack. Now check whether there is any possibility of traversing from the element present at the top of the stack. The top element is 5, and there is an edge from node 5 which has not been traversed (see Figure 2(b); the line marked in red is the untraversed edge). This edge leads to node 4, which has already been visited, and there is no other possibility of traversing from node 5, so pop node 5 from the stack. Repeat the same process; at the end there will be no elements in the stack, indicating that all the vertices of the graph have been traversed.
Figure 2 (panels (a)–(g)): Backtracking operations for the depth first search algorithm.
Figures 1 and 2 demonstrated depth first search for traversal purposes. The same technique can be used to search for an element in the graph. Given a graph with n nodes, we can check whether a given node is present in the graph or not. Each time we visit a node we check whether that node is the same as the search node; if it is, we stop the procedure, declaring that the node is present; otherwise we push that node onto the stack and traverse until the stack becomes empty.
Let us consider a tree example and illustrate the working principle of depth first search. Let the search element be F.
Figure 4(a)–4(d): Various steps in the depth first search algorithm.
Note: Depth first search method uses stack as a data structure.
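A minimal C sketch of this idea, using an adjacency matrix and an explicit stack (the graph below is our own small example, not the graph of Figure 1):

#include <stdio.h>

#define V 5

int adj[V][V] = {               /* a small undirected example graph */
   {0, 0, 0, 1, 0},
   {0, 0, 0, 1, 1},
   {0, 0, 0, 0, 1},
   {1, 1, 0, 0, 1},
   {0, 1, 1, 1, 0}
};

/* Depth first search from 'start' for node 'target'.
   Returns 1 if found, 0 otherwise. */
int dfsSearch(int start, int target) {
   int stack[V], top = -1;
   int visited[V] = {0};
   int v, w;

   stack[++top] = start;            /* push the starting node */
   visited[start] = 1;
   while (top >= 0) {
      v = stack[top--];             /* pop the current node */
      if (v == target) return 1;
      for (w = 0; w < V; w++) {     /* push unvisited neighbours */
         if (adj[v][w] && !visited[w]) {
            visited[w] = 1;
            stack[++top] = w;
         }
      }
   }
   return 0;
}

int main(void) {
   printf("node 2 %s\n", dfsSearch(0, 2) ? "found" : "not found");
   return 0;
}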
8.5 BREADTH FIRST SEARCH
Analogous to depth first search, which searches nodes in a top-to-bottom fashion, postponing the traversal of adjacent nodes, the breadth first search algorithm first traverses the adjacent nodes of a starting node; all unvisited nodes of a connected graph are then traversed in the same manner. It is convenient to use a queue to trace the operation of breadth first search. The queue is initialized with the traversal's starting node, which is marked as visited. On each iteration, the algorithm identifies all unvisited nodes that are adjacent to the front node, marks them as visited, and adds them to the queue; after that, the front node is removed from the queue.
Let us consider the same tree traversal example of Figure 3. The starting node is A: insert A into the queue and mark A as traversed. Move to its successor elements {B, C}, push them into the queue and mark them as traversed. Since there is no other node adjacent to A, remove A, which is the first element in the queue. The next element in the queue is B; check for its successor nodes. Since B has no successors, remove B from the queue. The next element in the queue is C; find its successor elements, i.e., {D, F}. Insert them into the queue and correspondingly mark them as traversed. Since C has no other successors, remove C from the queue. The next element in the queue is D; its successor is E, so insert E into the queue and mark it as traversed. Now D has no further successors, so remove D from the queue. The next element in the queue is F; find its successors, i.e., {H, I}. Insert them into the queue and mark them as visited. Once again, the element F has no further successors, so remove it from the queue and check the next element in the queue. The next element is E; E has no successors, so remove it. The next elements are H and I; traverse them in the same way.
For searching an element using breadth first search, similarly to depth first search, we traverse the graph using breadth first traversal; while traversing, if a node equal to the search element occurs, we declare that the search element is present in the graph.
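A minimal C sketch of breadth first search for a target node, using the same kind of small example adjacency matrix as in the depth first sketch above (again, the graph and names are ours):

#include <stdio.h>

#define V 5

int adj[V][V] = {               /* a small undirected example graph */
   {0, 0, 0, 1, 0},
   {0, 0, 0, 1, 1},
   {0, 0, 0, 0, 1},
   {1, 1, 0, 0, 1},
   {0, 1, 1, 1, 0}
};

/* Breadth first search from 'start' for node 'target'.
   Returns 1 if found, 0 otherwise. */
int bfsSearch(int start, int target) {
   int queue[V], front = 0, rear = 0;
   int visited[V] = {0};
   int v, w;

   queue[rear++] = start;           /* enqueue the starting node */
   visited[start] = 1;
   while (front < rear) {
      v = queue[front++];           /* dequeue the front node */
      if (v == target) return 1;
      for (w = 0; w < V; w++) {     /* enqueue unvisited neighbours */
         if (adj[v][w] && !visited[w]) {
            visited[w] = 1;
            queue[rear++] = w;
         }
      }
   }
   return 0;
}

int main(void) {
   printf("node 2 %s\n", bfsSearch(0, 2) ? "found" : "not found");
   return 0;
}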
8.6 SUMMARY
In this unit we have presented the basics of searching techniques. We have presented four different searching techniques: linear search, binary search, depth first search and breadth first search. The first two are for conventional types of data, whereas the last two are for searching elements stored in the form of a graph.
8.7 KEYWORDS
Searching technique
Linear search
Binary search
Depth first search
Breadth first search
8.8 QUESTIONS FOR SELF STUDY
1. Design an algorithm to search an element ‘e’ from a given list using linear search
technique.
2. Design and develop an algorithm to find a given element 'e' using the binary search method. Discuss how it is more efficient than the linear search method.
3. Mention the differences between the depth first search and breadth first search algorithms.
4. Mention the applications of searching algorithms.
5. Consider the list A = {12,2,16,30,8,28,4,10,20,6,18}. Check whether the element 6 is present in the list using the binary search technique. Illustrate the searching technique.
8.9 REFERENCES
• Ellis Horowitz, Sartaj Sahni, and Dinesh Mehta. Fundamentals of Data Structures in C++.
• Alfred V. Aho, Jeffrey D. Ullman, John E. Hopcroft. Data Structures and Algorithms. Addison Wesley (January 11, 1983).
UNIT 9: HASHING
Structure
9.0 Objectives
9.1 Introduction
9.2 Hashing
9.3 Collision resolution techniques
9.4 Implementation of hashing
9.5 Summary
9.6 Key words
9.7 Questions
9.8 References
9.0 OBJECTIVES
9.1 INTRODUCTION
Hashing refers to the process of generating a fixed-size output from an input of variable size using mathematical formulas known as hash functions. This technique determines an index or location for the storage of an item in a data structure.
Need for Hash data structure
Every day, the data on the internet is increasing multifold and it is always a struggle to
store this data efficiently. In day-to-day programming, this amount of data might not be that
big, but still, it needs to be stored, accessed, and processed easily and efficiently. A very
common data structure that is used for such a purpose is the Array data structure.
Now the question arises: if the array was already there, what was the need for a new data structure? The answer lies in the word "efficiency". Though storing in an array takes O(1) time, searching in it takes at least O(log n) time even when the array is sorted, and O(n) time when it is not. This time appears small, but for a large data set it can cause a lot of problems, and this in turn makes the array data structure inefficient.
So now we are looking for a data structure that can store the data and search in it in
constant time, i.e. in O(1) time. This is how hashing data structure came into play. With the
introduction of the Hash data structure, it is now possible to easily store data in constant
time and retrieve them in constant time as well.
Components of Hashing
There are majorly three components of hashing:
1. Key: A key can be anything, a string or an integer, which is fed as input to the hash function, the technique that determines an index or location for storage of an item in a data structure.
2. Hash Function: The hash function receives the input key and returns the index of
an element in an array called a hash table. The index is known as the hash index.
3. Hash Table: Hash table is a data structure that maps keys to values using a
special function called a hash function. Hash stores the data in an associative
manner in an array where each data value has its own unique index.
Suppose we want to store the strings "ab", "cd" and "efg", and assign each letter its position in the alphabet (a = 1, b = 2, and so on), so that sum("ab") = 3, sum("cd") = 7 and sum("efg") = 18. Now assume that we have a table of size 7 to store these strings. The hash function used here is the sum of the character values of the key mod the table size: we can compute the location of a string in the array by taking sum(string) mod 7.
So we will then store:
“ab” in 3 mod 7 = 3,
“cd” in 7 mod 7 = 0, and
“efg” in 18 mod 7 = 4.
The above technique enables us to calculate the location of a given string by using a simple
hash function and rapidly find the value that is stored in that location. Therefore the idea of
hashing seems like a great way to store (key, value) pairs of the data in a table.
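The toy hash function used in the steps above can be written directly in C; a minimal sketch, assuming character values a = 1, b = 2, and so on:

#include <stdio.h>

#define TABLE_SIZE 7

/* Sum of character values (a = 1, b = 2, ...) mod the table size,
   as in the "ab", "cd", "efg" example above. */
int hash(const char *key) {
   int sum = 0;
   while (*key) {
      sum += *key - 'a' + 1;
      key++;
   }
   return sum % TABLE_SIZE;
}

int main(void) {
   printf("ab  -> %d\n", hash("ab"));    /* 3 % 7 = 3 */
   printf("cd  -> %d\n", hash("cd"));    /* 7 % 7 = 0 */
   printf("efg -> %d\n", hash("efg"));   /* 18 % 7 = 4 */
   return 0;
}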
9.2 HASHING
Hashing is an important data structure designed to solve the problem of efficiently finding and storing data in an array. For example, if you have a list of 20000 numbers and you are given a number to search for in that list, you would scan each number in the list until you find a match.
A hash function in the data structure can also verify a file which has been imported from another source. A hash key for an item can be used to accelerate the process: it increases the efficiency of retrieval and optimises the search. This, simply put, is the definition of hashing in a data structure.
Searching the entire list to locate a specific number requires a significant amount of time. This manual scanning process is not only time-consuming but also inefficient. With hashing in the data structure, you can narrow down the search and find the number within seconds.
Hashing uses hash tables to store the data in an array format, where each value in the array is assigned a unique index number. Hash tables use a technique called the hash technique to generate these unique index numbers for each value stored.
You only need to find the index of the desired item, rather than finding the data. With
indexing, you can quickly scan the entire list and retrieve the item you wish. Indexing also
helps in inserting operations when you need to insert data at a specific location. No matter
how big or small the table is, you can update and retrieve data within seconds.
The hash table is basically an array of elements, and the hash search techniques are performed on a part of the item, i.e., the key. Each key is mapped to a number in the range 0 to table size − 1.
Hashing in a data structure is a two-step process:
1. The hash function converts the item into a small integer or hash value. This integer is
used as an index to store the original data.
2. It stores the data in a hash table. You can use a hash key to locate data quickly.
Examples of Hashing in Data Structure
The following are real-life examples of hashing in the data structure –
In schools, the teacher assigns a unique roll number to each student. Later, the teacher
uses that roll number to retrieve information about that student.
A library has a very large number of books. The librarian assigns a unique number to each book. This unique number helps in identifying the position of the books on the bookshelf.
Hash Function
The hash function in a data structure maps data of arbitrary size to fixed-size data. It returns a small integer value (also known as the hash value), a hash code, or a hash sum. A typical hashing scheme computes:
hash = hashfunc(key)
index = hash % array_size
The hash function must satisfy the following requirements:
A good hash function is easy to compute.
A good hash function never gets stuck in clustering and distributes keys evenly across
the hash table.
A good hash function avoids collisions, which occur when two elements or items get assigned the same hash value.
Hash functions are also used for data integrity: with a good hash function, a single change in a message will create a different hash.
The three characteristics of the hash function in the data structure are:
1. Collision free
2. Property to be hidden
3. Puzzle friendly
Hash Table
Hashing in data structure uses hash tables to store the key-value pairs. The hash table
then uses the hash function to generate an index. Hashing uses this unique index to
perform insert, update, and search operations.
It can be defined as a bucket where the data are stored in an array format. These data have
their own index value. If the index values are known then the process of accessing the data
is quicker.
How Does Hashing in a Data Structure Work?
In hashing, the hashing function maps strings or numbers to a small integer value. Hash
tables retrieve the item from the list using a hashing function. The objective of hashing
technique is to distribute the data evenly across an array. Hashing assigns all the elements
a unique key. The hash table uses this key to access the data in the list.
Hash table stores the data in a key-value pair. The key acts as an input to the hashing
function. Hashing function then generates a unique index number for each value stored.
The index number keeps the value that corresponds to that key. The hash function returns
a small integer value as an output. The output of the hashing function is called the hash
value.
Let us understand hashing in a data structure with an example. Imagine you need to
store some items (arranged in a key-value pair) inside a hash table with 30 cells.
The values are: (3,21) (1,72) (40,36) (5,30) (11,44) (15,33) (18,12) (16,80) (38,99)
The hash table will look like the following:

Serial Number | Key | Hash         | Array Index
1             | 3   | 3 % 30 = 3   | 3
2             | 1   | 1 % 30 = 1   | 1
3             | 40  | 40 % 30 = 10 | 10
4             | 5   | 5 % 30 = 5   | 5
5             | 11  | 11 % 30 = 11 | 11
6             | 15  | 15 % 30 = 15 | 15
7             | 18  | 18 % 30 = 18 | 18
8             | 16  | 16 % 30 = 16 | 16
9             | 38  | 38 % 30 = 8  | 8
Taking data of any size and converting it into a smaller data value, named the hash value, which can then be used as an index in an accessible hash table: this process defines hashing in a data structure.
9.3 COLLISION RESOLUTION TECHNIQUES
Hashing in data structure falls into a collision if two keys are assigned the same index
number in the hash table. The collision creates a problem because each index in a hash
table is supposed to store only one value. Hashing in data structure uses several collision
resolution techniques to manage the performance of a hash table.
In this section, we explore the idea of collision in hashing and different collision resolution techniques such as:
Open hashing (separate chaining)
Linear probing
Quadratic probing
Double hashing
Hash table: a data structure where the data is stored based upon its hashed key which
is obtained using a hashing function.
Hash function: a function which for a given data, outputs a value mapped to a fixed
range. A hash table leverages the hash function to efficiently map data such that it can
be retrieved and updated quickly. Simply put, assume S = {s1, s2, s3, ...., sn} to be a
set of objects that we wish to store into a map of size N, so we use a hash function H,
such that for all s belonging to S; H(s) -> x, where x is guaranteed to lie in the range
[1,N]
Perfect Hash function: a hash function that maps each item into a unique slot (no
collisions).
Hash Collisions: As per the Pigeonhole principle if the set of objects we intend to store
within our hash table is larger than the size of our hash table we are bound to have two or
more different objects having the same hash value; a hash collision. Even if the size of the
hash table is large enough to accommodate all the objects finding a hash function which
generates a unique hash for each object in the hash table is a difficult task. Collisions are
bound to occur (unless we find a perfect hash function, which in most of the cases is hard to
find) but can be significantly reduced with the help of various collision resolution techniques.
Following are the collision resolution techniques used:
Open hashing (separate chaining)
Linear probing
Quadratic probing
Double hashing
In separate chaining, we simply append the colliding values to a list pointed to by their hash keys. Obviously, in practice the table size can be significantly larger and the hash function can be even more complex; the data being hashed would also be more complex and non-primitive, but the idea remains the same.
This is an easy way to implement hashing but it has its own demerits.
The lookups/inserts/updates can become linear [O(N)] instead of constant time [O(1)]
if the hash function has too many collisions.
It doesn't account for any empty slots which can be leveraged for more efficient
storage and lookups.
Ideally, we require a good hash function to guarantee even distribution of the values.
Say, for a load factor λ = (number of objects stored in the table) / (size of the table), which can be greater than 1, a good hash function would guarantee that the maximum length of the list associated with each key is close to the load factor.
Note that the order in which the data is stored in the lists (or any other data structures) is
based upon the implementation requirements. Some general ways include insertion order,
frequency of access etc.
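A minimal C sketch of separate chaining (the names table, ChainNode, insert and search are ours): each slot of a deliberately tiny table heads a linked list of the keys that hash to it.

#include <stdio.h>
#include <stdlib.h>

#define N 5                      /* a deliberately tiny table */

/* Each slot heads a linked list of colliding keys. */
struct ChainNode {
   int key;
   struct ChainNode *next;
};

struct ChainNode *table[N];      /* all slots start out NULL */

int hash(int key) { return key % N; }

void insert(int key) {
   int h = hash(key);
   struct ChainNode *node = malloc(sizeof *node);
   node->key = key;
   node->next = table[h];        /* prepend to the slot's chain */
   table[h] = node;
}

int search(int key) {
   struct ChainNode *p = table[hash(key)];
   while (p) {                   /* walk the chain at this slot */
      if (p->key == key) return 1;
      p = p->next;
   }
   return 0;
}

int main(void) {
   insert(0); insert(5); insert(1);   /* 0 and 5 collide in slot 0 */
   printf("5 %s\n", search(5) ? "found" : "not found");
   return 0;
}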
a. Linear Probing

Serial Number | Key | Hash         | Array Index | Array Index after Linear Probing
1             | 3   | 3 % 30 = 3   | 3           | 3
2             | 1   | 1 % 30 = 1   | 1           | 1
3             | 63  | 63 % 30 = 3  | 3           | 4
4             | 5   | 5 % 30 = 5   | 5           | 5
5             | 11  | 11 % 30 = 11 | 11          | 11
6             | 15  | 15 % 30 = 15 | 15          | 15
7             | 18  | 18 % 30 = 18 | 18          | 18
8             | 16  | 16 % 30 = 16 | 16          | 16
9             | 46  | 46 % 30 = 16 | 16          | 17
The idea of linear probing is simple: we take a fixed-size hash table and, every time we face a hash collision, we linearly traverse the table in a cyclic manner to find the next empty slot.
Assume a scenario where we intend to store the set of numbers {0, 1, 2, 4, 5} in a hash table of size 5 with the help of the hash function H, such that H(x) = x % 5.
So, if we were to map the given data with the given hash function we'll get the
corresponding values
H(0)-> 0%5 = 0
H(1)-> 1%5 = 1
H(2)-> 2%5 = 2
H(4)-> 4%5 = 4
H(5)-> 5%5 = 0
In this case we see a collision between two terms (0 and 5). In this situation we move linearly down the table to find the first empty slot. Note that this linear traversal is cyclic in nature, i.e., if we exhaust the last element during the search, we start again from the beginning until the initial key is reached.
b. Quadratic Probing
This method lies in the middle of great cache performance and the problem of clustering. The
general idea remains the same, the only difference is that we look at the Q(i) increment at
each iteration when looking for an empty bucket, where Q(i) is some quadratic expression of
i. A simple expression of Q would be Q(i) = i^2, in which case the hash function looks
something like this:
H(x, i) = (H(x) + i^2)%N
In general, H(x, i) = (H(x) + (c1*i^2 + c2*i + c3)) % N, for some choice of constants c1, c2, and c3.
Despite resolving the problem of clustering significantly it may be the case that in
some situations this technique does not find any available bucket, unlike linear
probing which always finds an empty bucket.
Luckily, we can get good results from quadratic probing with the right combination of
probing function and hash table size which will guarantee that we will visit as many
slots in the table as possible. In particular, if the hash table's size is a prime number
and the probing function is H(x, i) = i^2, then at least 50% of the slots in the table will
be visited. Thus, if the table is less than half full, we can be certain that a free slot will
eventually be found.
Alternatively, if the hash table size is a power of two and the probing function is H(x, i) = (H(x) + (i^2 + i)/2) % N, then every slot in the table will be visited by the probing function.
Assume a scenario where we intend to store the following set of numbers = {0,1,2,5}
into a hash table of size 5 with the help of the following hash function H, such
that H(x, i) = (x%5 + i^2)%5.
Clearly 5 and 0 will face a collision, in which case we'll do the following:
- we look at 5%5 = 0 (collision)
- we look at (5%5 + 1^2)%5 = 1 (collision)
- we look at (5%5 + 2^2)%5 = 4 (empty -> place element here)
c. Double Hashing
The double hashing technique uses two hash functions. The second hash function comes
into use when the first function causes a collision. It provides an offset index to store the
value.
The formula for the double hashing technique is as follows:
(firstHash(key) + i * secondHash(key)) % sizeOfTable
Where i is the offset value. The offset value keeps incrementing until an empty slot is found.
For example, you have two hash functions: h1 and h2. You must perform the following
steps to find an empty slot:
1. Verify whether the slot at hash1(key) is empty. If yes, store the value in this slot.
2. If the slot at hash1(key) is not empty, find another slot using hash2(key).
3. Verify whether the slot at (hash1(key) + hash2(key)) % sizeOfTable is empty. If yes, store the value in this slot.
4. Keep incrementing the counter and repeat with hash1(key) + 2*hash2(key), hash1(key) + 3*hash2(key), and so on, until an empty slot is found.
This method is based on the idea that, in the event of a collision, we use another hash function, with the key value as input, to find where in the open addressing scheme the data should actually be placed.
In this case we use two hashing functions, such that the final hashing function looks
like:
H(x, i) = (H1(x) + i*H2(x))%N
Typically for H1(x) = x%N a good H2 is H2(x) = P - (x%P), where P is a prime
number smaller than N.
A good H2 is a function which never evaluates to zero and ensures that all the cells of
a table are effectively traversed.
Assume a scenario where we intend to store the following set of numbers = {0,1,2,5}
into a hash table of size 5 with the help of the following hash function H, such that
H(x, i) = (H1(x) + i*H2(x))%5
H1(x) = x%5 and H2(x) = P - (x%P), where P = 3
(3 is a prime smaller than 5)
Clearly 5 and 0 will face a collision, in which case we'll do the following:
- we look at 5%5 = 0 (collision)
- we look at (5%5 + 1*(3 - (5%3)))%5 = 1 (collision)
- we look at (5%5 + 2*(3 - (5%3)))%5 = 2 (collision)
- we look at (5%5 + 3*(3 - (5%3)))%5 = 3 (empty -> place element here)
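A minimal C sketch of the probe sequence above (the names h1, h2 and probe are ours); it reproduces the example, in which key 5 lands in slot 3:

#include <stdio.h>

#define N 5
#define P 3        /* a prime smaller than N, as in the example above */

int h1(int x) { return x % N; }
int h2(int x) { return P - (x % P); }    /* never evaluates to zero */

/* Return the slot where x lands, given a table of occupied flags. */
int probe(const int occupied[], int x) {
   int i, slot;
   for (i = 0; i < N; i++) {
      slot = (h1(x) + i * h2(x)) % N;
      if (!occupied[slot]) return slot;
   }
   return -1;                            /* the table is full */
}

int main(void) {
   int occupied[N] = {1, 1, 1, 0, 0};    /* slots 0, 1, 2 hold 0, 1, 2 */
   printf("5 goes to slot %d\n", probe(occupied, 5));   /* 3 */
   return 0;
}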
9.4 IMPLEMENTATION OF HASHING
Define a data item having some data and a key, based on which the search is to be conducted in a hash table.
struct DataItem {
int data;
int key;
};
Hash Method
Define a hashing method to compute the hash code of the key of the data item:
int hashCode(int key) {
   return key % SIZE;
}
Search Operation
Whenever an element is to be searched, compute the hash code of the key passed and locate
the element using that hash code as index in the array. Use linear probing to get the element
ahead if the element is not found at the computed hash code.
Example
struct DataItem *search(int key) {
//get the hash
int hashIndex = hashCode(key);
//move in array until an empty
while(hashArray[hashIndex] != NULL) {
if(hashArray[hashIndex]->key == key)
return hashArray[hashIndex];
//go to next cell
++hashIndex;
//wrap around the table
hashIndex %= SIZE;
}
return NULL;
}
Insert Operation
Whenever an element is to be inserted, compute the hash code of the passed key and locate the index using that hash code as an index in the array. Use linear probing to find an empty location if another element is already present at the computed index.
Example
void insert(int key, int data) {
   struct DataItem *item = (struct DataItem*) malloc(sizeof(struct DataItem));
   item->data = data;
   item->key = key;

   //get the hash
   int hashIndex = hashCode(key);

   //move in array until an empty or deleted cell
   while(hashArray[hashIndex] != NULL && hashArray[hashIndex]->key != -1) {
      //go to next cell
      ++hashIndex;
      //wrap around the table
      hashIndex %= SIZE;
   }

   hashArray[hashIndex] = item;
}
Delete Operation
Whenever an element is to be deleted, compute the hash code of the key passed and locate the
index using that hash code as an index in the array. Use linear probing to get the element
ahead if an element is not found at the computed hash code. When found, store a dummy
item there to keep the performance of the hash table intact.
Example
struct DataItem* delete(struct DataItem* item) {
int key = item->key;
//get the hash
int hashIndex = hashCode(key);
//move in array until an empty
while(hashArray[hashIndex] !=NULL) {
if(hashArray[hashIndex]->key == key) {
struct DataItem* temp = hashArray[hashIndex];
//assign a dummy item at deleted position
hashArray[hashIndex] = dummyItem;
return temp;
} //go to next cell
++hashIndex;
//wrap around the table
hashIndex %= SIZE;
}
return NULL;
}
9.5 SUMMARY
In this unit we have studied hashing in detail. We also learnt collision resolution techniques. At the end of the unit we dealt with the implementation of hashing.
9.6 KEYWORDS
Key, Hashing, Hash function, Hash table, Search, Insert and Delete.
9.7 QUESTIONS
1. Define hashing.
2. Discuss components of hashing.
3. Explain hash function.
4. Describe open hashing.
5. Briefly explain linear probing.
9.8 REFERENCES
UNIT 10: PRIORITY QUEUES AND HEAPS
Structure
10.0 Objectives
10.1 Introduction
10.2 Priority queue
10.3 Implementation of priority queue
10.4 Heap data structure
10.5 Implementation of heap data structure
10.6 Summary
10.7 Key words
10.8 Questions
10.9 References
10.0 OBJECTIVES
10.1 INTRODUCTION
The priority queue in the data structure is an extension of the "normal" queue. It is an abstract data type that contains a group of items. It is like the "normal" queue except that the dequeuing of elements follows a priority order: the items with the highest priority are dequeued first.
It is an abstract data type that provides a way to maintain a dataset. The "normal" queue follows a first-in-first-out pattern: it dequeues elements in the same order in which they were inserted. In a priority queue, however, the order of elements depends on each element's priority in that queue. The priority queue moves the highest priority elements to the beginning of the queue and the lowest priority elements to the back of the queue.
It supports only those elements that are comparable. Hence, a priority queue in the data
structure arranges the elements in either ascending or descending order.
You can think of a priority queue as several patients waiting in line at a hospital. Here, the
situation of the patient defines the priority order. The patient with the most severe injury
would be the first in the queue.
You can implement the priority queues in one of the following ways:
Linked list
Binary heap
Arrays
Binary search tree
The binary heap is the most efficient method for implementing the priority queue in
the data structure.
The table below summarizes the complexity of the basic operations for the common priority queue implementations (the original table did not survive; this is the standard summary):

Implementation     | Insert   | Delete highest priority | Peek
Unsorted array     | O(1)     | O(n)                    | O(n)
Sorted linked list | O(n)     | O(1)                    | O(1)
Binary heap        | O(log n) | O(log n)                | O(1)
The heap data structure, also known as a binary heap, is in the form of a tree and follows the property of a complete binary tree: all levels of the tree are filled with nodes except the last level, which can be partially filled.
Although it is represented as a tree, it is stored in memory as an array, unlike a tree built by referring to child nodes. As the elements are stored contiguously in an array, it is more cache-friendly, while the complete binary tree property ensures the least possible number of tree levels for the total number of elements.
The following formulas help us find the left child, the right child, or the parent of a node stored in the array: for a node at index i (0-indexed), the left child is at index 2*i + 1, the right child is at index 2*i + 2, and the parent is at index (i − 1)/2 (integer division). Here i means the index; if we want any such relationship, we can substitute the value of the index and easily find out what we need.
Now that we have an idea of what a heap is, let us look at the use cases of the heap data structure. The heap is used to implement priority queues in problem solving, and it is used extensively in heap sort, which is covered in the sorting unit alongside the heap sort algorithm.
1. Min Heap :- The smallest element is present at the root of the tree in the min heap such
that it is easier to extract the smallest element when heap pop is performed.
2. Max Heap :- The greatest element is present at the root of the tree in the max heap such
that it is easier to extract the largest element when heap pop is performed.
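1) Implement Priority Queue Using Arrays:
In an array implementation, items are simply appended, so insertion is O(1), while getting or deleting the highest priority element requires an O(n) scan (compare the note after the linked-list version below). The original listing did not survive in the source; the following is a minimal reconstruction sketch (the names insert and extractMax are ours), consistent with the output that follows.

#include <stdio.h>

#define CAP 100

int pq[CAP];   /* items kept in arbitrary order */
int n = 0;     /* number of items currently stored */

/* O(1): append the new item at the end */
void insert(int item) {
   pq[n++] = item;
}

/* O(n): scan for the maximum, return it, and fill the hole */
int extractMax(void) {
   int i, maxIdx = 0, max;
   for (i = 1; i < n; i++) {
      if (pq[i] > pq[maxIdx]) maxIdx = i;
   }
   max = pq[maxIdx];
   pq[maxIdx] = pq[--n];   /* move the last item into the hole */
   return max;
}

int main(void) {
   insert(12);
   insert(16);
   insert(14);
   while (n > 0) printf("%d\n", extractMax());
   return 0;
}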
Output
16
14
12
2) Implement Priority Queue Using Linked List:
In a linked list implementation, the entries are sorted in descending order based on their priority. The highest priority element is always at the front of the priority queue, which is formed using a linked list. The functions push(), pop() and peek() are used to implement a priority queue using a linked list and are explained as follows:
push(): This function is used to insert new data into the queue.
pop(): This function removes the element with the highest priority from the
queue.
peek() / top(): This function is used to get the highest priority element in the
queue without removing it from the queue.
#include <stdio.h>
#include <stdlib.h>

// Node of the priority queue
typedef struct node {
   int data;
   int priority;            // a higher value indicates a higher priority
   struct node* next;
} Node;

// Create a new node with the given data and priority
Node* newNode(int d, int p)
{
   Node* temp = (Node*)malloc(sizeof(Node));
   temp->data = d;
   temp->priority = p;
   temp->next = NULL;
   return temp;
}
// Return the value at head
int peek(Node** head) { return (*head)->data; }
// Removes the element with the
// highest priority form the list
void pop(Node** head)
{
Node* temp = *head;
(*head) = (*head)->next;
free(temp);
}
// Function to push according to priority
void push(Node** head, int d, int p)
{
Node* start = (*head);
// Create new Node
Node* temp = newNode(d, p);
// Special Case: The head of list has
// lesser priority than new node
if ((*head)->priority < p) {
// Insert New Node before head
temp->next = *head;
(*head) = temp;
}
else {
   // Traverse the list and find a position to insert the new node
   while (start->next != NULL && start->next->priority > p) {
      start = start->next;
   }
   // Either at the end of the list or at the required position
   temp->next = start->next;
   start->next = temp;
}
}

// Check whether the queue is empty
int isEmpty(Node** head) { return (*head) == NULL; }
// Driver code
int main()
{
// Create a priority queue; after the pushes below
// the list becomes 6->5->4->7
Node* pq = newNode(4, 1);
push(&pq, 5, 2);
push(&pq, 6, 3);
push(&pq, 7, 0);
while (!isEmpty(&pq)) {
printf(" %d", peek(&pq));
pop(&pq);
}
return 0;
}
Output
6 5 4 7
Note: We can also use a linked list; the time complexity of all operations with a linked list remains the same as with an array. The advantage of a linked list is that deleteHighestPriority() can be more efficient, as we do not have to move items.
3) Implement Priority Queue Using Heaps:
A binary heap is generally preferred for priority queue implementation because heaps provide better performance than arrays or linked lists. Considering the properties of a heap, the entry with the largest key is on the top and can be removed immediately. It will, however, take O(log n) time to restore the heap property for the remaining keys. However, if another entry is to be inserted immediately, some of this time may be combined with the O(log n) time needed to insert the new entry. Thus the representation of a priority queue as a heap proves advantageous for large n, since it is represented efficiently in contiguous storage and is guaranteed to require only logarithmic time for both insertions and deletions. Operations on a binary heap are as follows:
insert(p): Inserts a new element with priority p.
extractMax(): Extracts an element with maximum priority.
remove(i): Removes an element pointed by an iterator i.
getMax(): Returns an element with maximum priority.
changePriority(i, p): Changes the priority of an element pointed by i to p.
A heap is a complete binary tree structure where each element satisfies a heap property. In a
complete binary tree, all levels are full except the last level, i.e., nodes in all levels except the
last level will have two children. The last level will be filled from the left. Here, each heap
node stores a value key, which defines the relative position of that node inside the heap.
There are two types of heap data structures: max heap and min heap.
Max Heap:
All elements in this heap satisfy the property that the key of the parent node is greater than or
equal to the keys of its child nodes i.e. key of a node >= key of its children. So, moving up
from any node, we get a nondecreasing sequence of keys, and moving down from any node,
we get a nonincreasing sequence of keys. In particular: The largest key in a max-heap is
found at the root.
Min Heap:
All elements in this heap satisfy the property that the key of the parent node is less than or
equal to the keys of its child nodes i.e. key of a node <= key of its children. So, moving up
from any node, we get a nonincreasing sequence of keys, and moving down from any node,
we get a nondecreasing sequence of keys. In particular: The smallest key in a min-heap is
found at the root.
A binary heap is a binary tree that satisfies two properties: (1) Shape property: all
levels, except the last level, are fully filled and the last level is filled from left to right
(2) Heap property
Level order traversal of the heap will give the order in which elements are filled in the
array.
Heap is a complete tree structure, so we define the height of a node in a heap as the
number of edges on the longest path from the node to a leaf.
We define the height of the heap to be the height of its root. Since a heap of n
elements is based on a complete binary tree, its height is O(logn).
In the worst case, we shall see that the basic operations on heaps run in time
proportional to the tree's height and thus take O(logn) time.
A binary heap can be represented using an array where the indices of the array capture the parent-child relationship. Suppose A[] is a heap array of size n; then, for a node at index i, the left child is at index 2*i + 1, the right child is at index 2*i + 2, and the parent is at index (i − 1)/2 (integer division).
Note: In most programming languages, these operations can be implemented efficiently using
bitwise operators. Therefore, an array representation is a space-efficient approach as we don’t
need to store extra 3 pointers per node in the heap.
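In C, the index arithmetic can be captured in a few macros; a minimal sketch for a 0-indexed heap array (the macro names and the example heap are ours):

#include <stdio.h>

/* Index arithmetic for a 0-indexed binary heap stored in an array. */
#define PARENT(i) (((i) - 1) / 2)
#define LEFT(i)   (2 * (i) + 1)
#define RIGHT(i)  (2 * (i) + 2)

int main(void) {
   int A[] = {105, 103, 100, 75, 88, 96, 65};   /* an example max heap */
   printf("root = %d\n", A[0]);
   printf("children of root: %d, %d\n", A[LEFT(0)], A[RIGHT(0)]);
   printf("parent of A[4] = %d\n", A[PARENT(4)]);
   return 0;
}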
Thus, for a max-heap, we can say that A[i] ≥ A[2*i + 1] and A[i] ≥ A[2*i + 2], where (2*i + 1) and (2*i + 2) are < n. As we know, the key of each node in a max-heap is greater than the keys of its children; hence the maximum key in the heap is stored at the root, that is, at A[0].
Similarly, a min-heap satisfies the property that, for any index i, A[i] ≤ A[2*i + 1] and A[i] ≤ A[2*i + 2], where (2*i + 1) and (2*i + 2) are < n. Thus, for a min-heap, the minimum element is at the root of the heap and thus at A[0].
Various operations supported by a max heap are described below at a high level and are covered in more detail in subsequent sections:
maxHeapify(A[], i): It is a method to rearrange the elements of the heap in order to
maintain the heap property. This process is required when a certain node at index i
causes an imbalance in the heap due to some operation on that node.
buildMaxHeap(A[]): We can use this procedure to convert an input array into a max-heap.
findMax(heap[]): This operation returns the maximum value in the heap and its time
complexity is O(1) as it just needs to return A[0].
extractMax(A[]): This operation removes the maximum element from the heap and
returns it. The time complexity of this operation is O(logn) as we replace A[0] with
A[n-1] — the last element of the heap, and then do some operations to maintain the
max-heap property.
increaseKey(A[], i, v): This operation increases the value at index i in the array to
value v. This operation is only valid if A[i] ≤ v, that is the new value is greater than
the existing value at index i. This ensures that the subtree rooted at index i is still a
max-heap. The complexity of this operation is O(logn) as after increasing the key at
index i, the max-heap property of Parent(i) might be violated, and we might need to
perform some operations to restore it.
insertKey(A[], v): This operation inserts the element v into the heap, and its complexity is O(log n). To implement this operation, we add the element at the end of the heap (at A[n-1]) and then perform some operations to restore the heap property.
deleteKey(A[], i): This operation is used to delete an element at index i, and the
complexity of this operation is O(logn). To delete any element, we can replace it with
the last element of the heap, and then again perform operations to restore the heap
property in case it is violated.
The corresponding operations for a min-heap are:
minHeapify(A[], i)
buildMinHeap(A[])
findMin(A[])
extractMin(A[])
decreaseKey(A[], i, v)
insertKey(A[], v)
deleteKey(A[], i).
Applications of Heap Data Structure
Heaps are used to efficiently implement a priority queue, an important data structure
in computer science. One of the applications of priority queues is in process
scheduling in operating systems.
Heaps are used by the Heapsort algorithm, one of the best comparison-based sorting
algorithms known, with a worst-case complexity of O(n log n).
Heaps are also used in efficient implementations of algorithms like Dijkstra's
shortest-path algorithm, where we repeatedly need to pick the unprocessed node closest
to the source. If the tentative distances to all the nodes are stored in a min-heap, the
closest node can be extracted efficiently.
Heaps provide an efficient way to get the kth smallest or largest element in an array.
A heap is a complete binary tree that satisfies the heap property, where any given node is
either:
always greater than or equal to its child node(s), with the key of the root node being the
largest among all nodes; this is called the max-heap property, or
always smaller than or equal to its child node(s), with the key of the root node being the
smallest among all nodes; this is called the min-heap property.
(Figures: an example min-heap and an example max-heap.)
Some of the important operations performed on a heap are described below along with their
algorithms.
Heapify
Heapify is the process of creating a heap data structure from a binary tree. It is used to create
a Min-Heap or a Max-Heap.
1. Start from the first non-leaf node, whose index is given by n/2 - 1.
Algorithm
Heapify(array, size, i)
  set i as largest
  leftChild = 2i + 1
  rightChild = 2i + 2
  if leftChild < size and array[leftChild] > array[largest]
    set leftChild as largest
  if rightChild < size and array[rightChild] > array[largest]
    set rightChild as largest
  if largest is not i
    swap array[i] and array[largest]
    Heapify(array, size, largest)
MaxHeap(array, size)
  loop from the first non-leaf node (index size/2 - 1) down to zero
    call Heapify for each index
For a Min-Heap, the comparisons are reversed: both leftChild and rightChild must be
greater than or equal to the parent for all nodes.
Insert Element into Heap
Algorithm
If there is no node,
  create a newNode.
else (a node is already present)
  insert the newNode at the end (the first free position from left to right)
  heapify the array
That is, we insert the new element at the end of the tree and then restore the heap property.
For a Min Heap, the above algorithm is modified so that both childNodes are greater than
or equal to currentNode.
Peek (Find max/min)
The peek operation returns the maximum element from a Max Heap or the minimum element
from a Min Heap without deleting the node.
Algorithm
return rootNode
Extract-Max/Min
Extract-Max returns the node with the maximum value after removing it from a Max Heap,
whereas Extract-Min returns the node with the minimum value after removing it from a Min
Heap. The driver fragment below exercises insert and delete on an array-backed max-heap;
a complete sketch of the assumed helper functions follows it.
insert(array, 3);
insert(array, 4);
insert(array, 9);
insert(array, 5);
insert(array, 2);
deleteRoot(array, 4);
printArray(array, size);
}
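The insert, deleteRoot, and printArray helpers called above are not reproduced in the unit.
The following is a minimal self-contained C sketch that makes the fragment runnable,
assuming a fixed-capacity array and a global size counter:

#include <stdio.h>

int size = 0; /* current number of elements in the heap */

void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Restore the max-heap property for the subtree rooted at index i. */
void heapify(int array[], int n, int i)
{
    int largest = i;
    int left = 2 * i + 1;
    int right = 2 * i + 2;
    if (left < n && array[left] > array[largest]) largest = left;
    if (right < n && array[right] > array[largest]) largest = right;
    if (largest != i) {
        swap(&array[i], &array[largest]);
        heapify(array, n, largest);
    }
}

/* Insert newNum at the end, then re-heapify from the last non-leaf down. */
void insert(int array[], int newNum)
{
    array[size++] = newNum;
    for (int i = size / 2 - 1; i >= 0; i--)
        heapify(array, size, i);
}

/* Delete the element equal to num: swap it with the last element,
   shrink the heap, and re-heapify. */
void deleteRoot(int array[], int num)
{
    int i;
    for (i = 0; i < size; i++)
        if (array[i] == num)
            break;
    if (i == size)
        return; /* value not present */
    swap(&array[i], &array[size - 1]);
    size--;
    for (int j = size / 2 - 1; j >= 0; j--)
        heapify(array, size, j);
}

void printArray(int array[], int n)
{
    for (int i = 0; i < n; i++)
        printf("%d ", array[i]);
    printf("\n");
}

int main(void)
{
    int array[10];
    insert(array, 3);
    insert(array, 4);
    insert(array, 9);
    insert(array, 5);
    insert(array, 2);
    deleteRoot(array, 4);
    printArray(array, size); /* prints the heap after deleting 4 */
    return 0;
}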
10.6 SUMMARY
In this unit, we discussed the priority queue in detail, along with its implementation. At the
end of this unit, we studied the heap data structure and its implementation.
10.7 KEYWORDS
Priority queue, Heapify, Peek(), Insert(), Remove(), Push() and Pop().
10.8 QUESTIONS
10.9 REFERENCES
UNIT 11: RED-BLACK TREES AND SPLAY TREES
11.1 INTRODUCTION
When it comes to searching and sorting data, one of the most fundamental data structures is
the binary search tree. However, the performance of a binary search tree is highly
dependent on its shape, and in the worst case, it can degenerate into a linear structure with a
time complexity of O(n). This is where Red Black Trees come in, they are a type of
balanced binary search tree that use a specific set of rules to ensure that the tree is always
balanced. This balance guarantees that the time complexity for operations such as insertion,
deletion, and searching is always O(log n), regardless of the initial shape of the tree.
Red Black Trees are self-balancing, meaning that the tree adjusts itself automatically after
each insertion or deletion operation. It uses a simple but powerful mechanism to maintain
balance, by coloring each node in the tree either red or black.
Splay tree is a self-adjusting binary search tree data structure, which means that the tree
structure is adjusted dynamically based on the accessed or inserted elements. In other
words, the tree automatically reorganizes itself so that frequently accessed or inserted
elements become closer to the root node.
1. The splay tree was first introduced by Daniel Dominic Sleator and Robert Endre
Tarjan in 1985. It has a simple and efficient implementation that allows it to
perform search, insertion, and deletion operations in O(log n) amortized time
complexity, where n is the number of elements in the tree.
2. The basic idea behind splay trees is to bring the most recently accessed or
inserted element to the root of the tree by performing a sequence of tree
rotations, called splaying. Splaying is a process of restructuring the tree by
making the most recently accessed or inserted element the new root and
gradually moving the remaining nodes closer to the root.
3. Splay trees are highly efficient in practice due to their self-adjusting nature,
which reduces the overall access time for frequently accessed elements. This
makes them a good choice for applications that require fast and dynamic data
structures, such as caching systems, data compression, and network routing
algorithms.
4. However, the main disadvantage of splay trees is that they do not guarantee a
balanced tree structure, which may lead to performance degradation in worst-
case scenarios. Also, splay trees are not suitable for applications that require
guaranteed worst-case performance, such as real-time systems or safety-critical
systems.
Overall, splay trees are a powerful and versatile data structure that offers fast and efficient
access to frequently accessed or inserted elements. They are widely used in various
applications and provide an excellent tradeoff between performance and simplicity.
Each node in a Red-Black tree contains an extra bit that represents its color, which is used
to keep the tree balanced during any operations performed on the tree, like insertion and
deletion. In an ordinary binary search tree, searching, insertion, and deletion take O(log n)
time in the average case, O(1) in the best case, and O(n) in the worst case.
In the above tree, if we want to search for 80, we first compare 80 with the root node. 80
is greater than the root node, i.e., 10, so searching is performed on the right subtree.
Again, 80 is compared with 15; 80 is greater than 15, so we move to the right of 15, i.e.,
20. Now, we reach the leaf node 20, and 20 is not equal to 80. Therefore, the
element is not found in the tree. After each comparison, the remaining search space is
halved, so the above balanced BST takes O(log n) time to search for an element.
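The search described above is the ordinary iterative BST search. A minimal C sketch (the
struct layout and the name searchBST are illustrative):

/* Each comparison discards one subtree, so the loop runs O(log n)
   times on a balanced tree and O(n) times on a skewed one. */
struct node {
    int key;
    struct node *left, *right;
};

struct node *searchBST(struct node *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root; /* NULL when the key is absent */
}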
The above tree shows a right-skewed BST. If we want to search for 80 in this tree, we must
compare 80 with all the nodes until we find the element or reach a leaf node. So, the
right-skewed BST takes O(n) time to search for an element.
In the above BST, the first one is the balanced BST, whereas the second one is the
unbalanced BST. We conclude from the above two binary search trees that a balanced tree
takes less time than an unbalanced tree for performing any operation on the tree.
Therefore, we need a balanced tree, and the Red-Black tree is a self-balanced binary search
tree. Now, the question arises: why do we require a Red-Black tree if the AVL tree is also
height-balanced? The Red-Black tree is used because the AVL tree requires many
rotations when the tree is large, whereas the Red-Black tree requires a maximum of two
rotations to balance the tree. The main difference between the AVL tree and the Red-Black
tree is that the AVL tree is strictly balanced, while the Red-Black tree is not completely
height-balanced. So, the AVL tree is more balanced than the Red-Black tree, but the Red-
Black tree guarantees O(log n) time for all operations like insertion, deletion, and searching.
Searching is easier in the AVL tree as it is strictly balanced, whereas insertion and
deletion are easier in the Red-Black tree as the Red-Black tree requires fewer rotations.
As the name suggests, each node in the tree is colored either Red or Black. Sometimes no
rotation is required, and only recoloring is needed to balance the tree.
Every AVL tree can be colored so that it becomes a Red-Black tree, but not every
Red-Black tree is an AVL tree, because the AVL tree is strictly height-balanced while the
Red-Black tree is not completely height-balanced.
The following are the rules used to create a Red-Black tree:
1. If the tree is empty, then we create a new node as the root node with the color Black.
2. If the tree is not empty, then we create the new node as a leaf node with the color Red.
3. If the parent of the new node is Black, then exit.
4. If the parent of the new node is Red, then we check the color of the parent's sibling
of the new node:
4a) If the color is Black (or the sibling is null), then we perform suitable rotations and
recoloring.
4b) If the color is Red, then we recolor the node. We also check whether the parent's
parent of the new node is the root node or not; if it is not the root node, we recolor it and
recheck the node.
Step 1: Suppose the first element is 10. The tree is empty, so node 10 is created as the root
node with the color Black (rule 1).
Step 2: The next node is 18. As 18 is greater than 10, it will come at the right of 10, as
shown below.
We know the second rule of the Red Black tree that if the tree is not empty then the newly
created node will have the Red color. Therefore, node 18 has a Red color, as shown in the
below figure:
Now we verify the third rule of the Red-Black tree, i.e., the parent of the new node is black or
not. In the above figure, the parent of the node is black in color; therefore, it is a Red-Black
tree.
Step 3: Now, we create the new node having value 7 with Red color. As 7 is less than 10,
so it will come at the left of 10 as shown below.
Now we verify the third rule of the Red-Black tree, i.e., the parent of the new node is black or
not. As we can observe, the parent of the node 7 is black in color, and it obeys the Red-Black
tree's properties.
Step 4: The next element is 15, and 15 is greater than 10, but less than 18, so the new node
will be created at the left of node 18. The node 15 would be Red in color as the tree is not
empty.
The above tree violates the property of the Red-Black tree as it has Red-red parent-child
relationship. Now we have to apply some rule to make a Red-Black tree. The rule 4 says
that if the new node's parent is Red, then we have to check the color of the parent's sibling
of a new node. The new node is node 15; the parent of the new node is node 18 and the
sibling of the parent node is node 7. As the color of the parent's sibling is Red, we apply
rule 4b. Rule 4b says that we have to recolor both the parent and the parent's sibling node.
So, both the nodes, i.e., 7 and 18, would be recolored as shown in the below figure.
We also have to check whether the parent's parent of the new node is the root node or not. As
we can observe in the above figure, the parent's parent of a new node is the root node, so we
do not need to recolor it.
Step 5: The next element is 16. As 16 is greater than 10 but less than 18 and greater than 15,
so node 16 will come at the right of node 15. The tree is not empty; node 16 would be Red in
color, as shown in the below figure:
In the above figure, we can observe that the red-red parent-child relationship violates the
Red-Black tree property, so we have to apply some rules to restore it. Since the new node's
parent is Red and the parent of the new node has no sibling, rule 4a applies. Rule 4a says
that some rotations and recoloring must be performed on the tree.
Node 16 is to the right of node 15, and node 15 is to the left of its parent, node 18. Here we
have an LR relationship, so we need to perform two rotations: first a left rotation, and then
a right rotation. The left rotation is
performed on nodes 15 and 16, where node 16 will move upward, and node 15 will move
downward. Once the left rotation is performed, the tree looks like as shown in the below
figure:
In the above figure, we can observe that there is an LL relationship. The above tree has a
Red-red conflict, so we perform the right rotation. When we perform the right rotation, the
median element would be the root node. Once the right rotation is performed, node 16 would
become the root node, and nodes 15 and 18 would be the left child and right child,
respectively, as shown in the below figure.
After rotation, node 16 and node 18 would be recolored; the color of node 16 is red, so it will
change to black, and the color of node 18 is black, so it will change to a red color as shown in
the below figure:
Step 6: The next element is 30. Node 30 is inserted at the right of node 18. As the tree is not
empty, so the color of node 30 would be red.
The color of the parent and parent's sibling of a new node is Red, so rule 4b is applied. In rule
4b, we have to do only recoloring, i.e., no rotations are required. The color of both the parent
(node 18) and parent's sibling (node 15) would become black, as shown in the below image.
We also have to check whether the parent's parent of the new node is the root node or not.
The parent's parent of the new node 30 is node 16, and node 16 is not the root node, so we
recolor node 16 to Red. The parent of node 16 is node 10, and it is not Red, so there is no
Red-red conflict.
Step 7: The next element is 25, which we have to insert in a tree. Since 25 is greater than 10,
16, 18 but less than 30; so, it will come at the left of node 30. As the tree is not empty, node
25 would be in Red color. Here a Red-red conflict occurs, as the parent of the newly created
node is Red.
Since there is no parent's sibling, rule 4a is applied, in which rotations as well as recoloring
are performed. First, we perform the rotations. As the newly created node is at the left of its
parent and the parent node is at the right of its parent, an RL relationship is formed.
Firstly, the right rotation is performed in which node 25 goes upwards, whereas node 30 goes
downwards, as shown in the below figure.
After the first rotation, there is an RR relationship, so a left rotation is performed. After the
left rotation, the median element, i.e., 25, becomes the root node; node 30 is at the right of
25 and node 18 is at the left of node 25.
Now recoloring would be performed on nodes 25 and 18; node 25 becomes black in color,
and node 18 becomes red in color.
Step 8: The next element is 40. Since 40 is greater than 10, 16, 18, 25, and 30, so node 40
will come at the right of node 30. As the tree is not empty, node 40 would be Red in color.
There is a Red-red conflict between nodes 40 and 30, so rule 4b will be applied.
As the colors of the parent and the parent's sibling of the new node are Red, recoloring is
performed. The color of both the nodes becomes black, as shown in the below image.
After recoloring, we also have to check the parent's parent of a new node, i.e., 25, which is
not a root node, so recoloring would be performed, and the color of node 25 changes to Red.
After recoloring, red-red conflict occurs between nodes 25 and 16. Now node 25 would be
considered as the new node. Since the parent of node 25 is red in color, and the parent's
sibling is black in color, rule 4a would be applied. Since 25 is at the right of the node 16 and
16 is at the right of its parent, so there is an RR relationship. In the RR relationship, left
rotation is performed. After left rotation, the median element 16 would be the root node, as
shown in the below figure.
After rotation, recoloring is performed on nodes 16 and 10. The color of node 10 and node 16
changes to Red and Black, respectively as shown in the below figure.
Step 9: The next element is 60. Since 60 is greater than 16, 25, 30, and 40, node 60 will
come at the right of node 40. As the tree is not empty, the color of node 60 would be Red.
In the above tree, we can observe that a Red-red conflict occurs. The parent node is Red,
and no parent's sibling exists in the tree, so rule 4a is applied, meaning rotation and
recoloring are required. Since an RR relationship exists between the nodes, a left rotation
is performed.
When left rotation is performed, node 40 will come upwards, and node 30 will come
downwards, as shown in the below figure:
After rotation, the recoloring is performed on nodes 30 and 40. The color of node 30 would
become Red, while the color of node 40 would become black.
The above tree is a Red-Black tree as it follows all the Red-Black tree properties.
Let's understand how we can delete the particular node from the Red-Black tree. The
following are the rules used to delete the particular node from the tree:
Step 1: First, we perform the standard BST deletion.
Step 2: Case 1: If the node to be deleted is Red, we simply delete it, as the following
example shows.
Suppose we want to delete node 30 from the tree, which is given below.
Initially, we are having the address of the root node. First, we will apply BST to search the
node. Since 30 is greater than 10 and 20, which means that 30 is the right child of node 20.
Node 30 is a leaf node and Red in color, so it is simply deleted from the tree.
If we want to delete an internal node that has one child, we first replace the value of the
internal node with the value of the child node and then simply delete the child node.
Let's take another example in which we want to delete the internal node, i.e., node 20.
We cannot delete the internal node; we can only replace the value of that node with another
value. Node 20 is at the right of the root node, and it is having only one child, node 30. So,
node 20 is replaced with a value 30, but the color of the node would remain the same, i.e.,
Black. In the end, node 20 (leaf node) is deleted from the tree.
If we want to delete an internal node that has two child nodes, we have to decide from
which subtree (left or right) to take the replacement value for the internal node. We have
two ways:
o Inorder predecessor: We will replace with the largest value that exists in the left
subtree.
o Inorder successor: We will replace with the smallest value that exists in the right
subtree.
Suppose we want to delete node 30 from the tree, which is shown below:
Node 30 is at the right of the root node. In this case, we will use the inorder successor. The
value 38 is the smallest value in the right subtree, so we replace the value 30 with 38, but
the color of the node remains the same. After the replacement, the leaf node that held 38
(now holding 30) is deleted from the tree. Since that leaf node is Red, we simply delete it
(we do not have to perform any rotations or recoloring).
Case 2: If the root node is also double black, then simply remove the double black and make
it a single black.
Case 3: If the double black's sibling is black and both its children are black.
We cannot simply delete node 15 from the tree as node 15 is Black in color. Node 15 has two
children, which are nil. So, we replace the 15 value with a nil value. As node 15 and nil node
are black in color, the node becomes double black after replacement, as shown in the below
figure.
In the above tree, we can observe that the double black's sibling is Black and its children
are nil, which are also Black. Since the double black's sibling and its children are all Black,
the sibling cannot give up its Black color to either of them. The double black's parent node
is Red, so the double black node adds its extra Black to its parent. The color of node 20
changes to Black, while the nil node becomes a single black, as shown in the below figure.
After adding the color to its parent node, the color of the double black's sibling, i.e., node 30
changes to red as shown in the below figure.
In the above tree, we can observe that the double black problem no longer exists, and the
tree is a valid Red-Black tree.
Case 4: If double black's sibling is Red.
o Swap the color of its parent and its sibling.
o Rotate the parent node in the double black's direction.
o Reapply cases.
Initially, the 15 is replaced with a nil value. After the replacement, the node becomes double
black. Since the double black's sibling is Red, the color of node 20 changes to Red and the
color of node 30 changes to Black.
Once the swapping of the color is completed, the rotation towards the double black would be
performed. The node 30 will move upwards and the node 20 will move downwards as shown
in the below figure.
In the above tree, we can observe that double black situation still exists in the tree. It satisfies
the case 3 in which double black's sibling is black as well as both its children are black. First,
we remove the double black from the node and add the black color to its parent node. At the
end, the color of the double black's sibling, i.e., node 25 changes to Red as shown in the
below figure.
In the above tree, we can observe that the double black situation has been resolved. It also
satisfies the properties of the Red Black tree.
Case 5: If double black's sibling is black, sibling's child who is far from the double black is
black, but near child to double black is red.
o Swap the color of double black's sibling and the sibling child which is nearer to the
double black node.
o Rotate the sibling in the opposite direction of the double black.
o Apply case 6
First, we replace the value 1 with a nil value. The node becomes double black, as both the
nodes, i.e., 1 and nil, are black. This satisfies case 3, which applies when the double black's
sibling is black and both of the sibling's children are black. First, we remove the double
black from the nil node. Since the parent of the double black is Black, adding the black
color to the parent makes the parent double black. After adding the color, the double
black's sibling's color changes to Red, as shown below.
We can observe in the above figure that the double black problem still exists in the tree.
So, we will reapply the cases. We will apply case 5 because the sibling of node 5 is node 30,
which is black in color, the child of node 30, which is far from node 5 is black, and the child
of the node 30 which is near to node 5 is Red. In this case, first we will swap the color of
node 30 and node 25 so the color of node 30 changes to Red and the color of node 25 changes
to Black as shown below.
Once the swapping of the color between the nodes is completed, we need to rotate the sibling
in the opposite direction of the double black node. In this rotation, the node 30 moves
downwards while the node 25 moves upwards as shown below.
As we can observe in the above tree, the double black situation still exists. So, we need to
apply case 6, which is stated below.
Case 6: If the double black's sibling is Black and the sibling's far child is Red.
o Swap the color of the parent and the sibling.
o Rotate the parent in the double black's direction.
o Remove the double black.
o Change the color of the Red far child to Black.
Now we will apply case 6 in the above example to solve the double black's situation.
In the above example, the double black is node 5, and the sibling of node 5 is node 25, which
is black in color. The far child of the double black node is node 30, which is Red in color as
shown in the below figure:
First, we will swap the colors of Parent and its sibling. The parent of node 5 is node 10, and
the sibling node is node 25. The colors of both the nodes are black, so no swapping
occurs.
In the second step, we need to rotate the parent in the double black's direction. After the
rotation, node 25 moves upwards, whereas node 10 moves downwards. Once the rotation is
performed, the tree would look as shown in the below figure:
In the next step, we will remove double black from node 5 and node 5 will give its black
color to the far child, i.e., node 30. Therefore, the color of node 30 changes to black as shown
in the below figure.
/*
 * A sample Java Program to Implement the Red-Black Tree Data Structure
 */
// The Scanner class from java.util is imported to take input from the user
import java.util.Scanner;

// A class named Node_Red_Black_Tree is created; each of its objects works as a
// node of the Red-Black Tree
class Node_Red_Black_Tree
{
    // Each Red-Black tree node has four members. Two of them are of type
    // Node_Red_Black_Tree, named left_node_addr and right_node_addr, storing
    // the left and right children of the node
    Node_Red_Black_Tree left_node_addr, right_node_addr;
    // The node_data integer variable stores the data present in this node
    int node_data;
    // The colour_of_node integer variable stores the color of this node
    int colour_of_node;
}

// A class named Red_Black_Tree is created; each of its objects works as the
// Red-Black Tree
class Red_Black_Tree
{
    private Node_Red_Black_Tree current_node;
    private Node_Red_Black_Tree parent_node;
    private Node_Red_Black_Tree grand_node;
    private Node_Red_Black_Tree great_node;
    private Node_Red_Black_Tree header_node;
    private static Node_Red_Black_Tree node_null;

    // color coding
    /* BLACK - 1, RED - 0 */
    static final int BLACK = 1;
    static final int RED = 0;

    // ... (the constructor and the insertion routine are omitted in the source;
    // the fragment below is the tail of the recoloring step performed during
    // insertion, where item is the key being inserted)
        if (parent_node.colour_of_node == RED)
        {
            // Have to rotate
            grand_node.colour_of_node = RED;
            if (item < grand_node.node_data != item < parent_node.node_data)
                parent_node = rotate(item, grand_node); // start a double rotation
            current_node = rotate(item, great_node);
            current_node.colour_of_node = BLACK;
        }
        // Make the root black
        header_node.right_node_addr.colour_of_node = BLACK;
    }

    private Node_Red_Black_Tree rotate(int item, Node_Red_Black_Tree parent_node)
    {
        if (item < parent_node.node_data)
            return parent_node.left_node_addr =
                item < parent_node.left_node_addr.node_data
                ? rotateWithleft_node_addrChild(parent_node.left_node_addr)
                : rotateWithright_node_addrChild(parent_node.left_node_addr);
        else
            return parent_node.right_node_addr =
                item < parent_node.right_node_addr.node_data
                ? rotateWithleft_node_addrChild(parent_node.right_node_addr)
                : rotateWithright_node_addrChild(parent_node.right_node_addr);
    }

    /* Rotate binary tree node with left child */
    private Node_Red_Black_Tree rotateWithleft_node_addrChild(Node_Red_Black_Tree k2)
    {
        Node_Red_Black_Tree k1 = k2.left_node_addr;
        k2.left_node_addr = k1.right_node_addr;
        k1.right_node_addr = k2;
        return k1;
    }

    /* Rotate binary tree node with right child */
    private Node_Red_Black_Tree rotateWithright_node_addrChild(Node_Red_Black_Tree k1)
    {
        Node_Red_Black_Tree k2 = k1.right_node_addr;
        k1.right_node_addr = k2.left_node_addr;
        k2.left_node_addr = k1;
        return k2;
    }
}

/* Class Red_Black_Tree_Run */
class Red_Black_Tree_Run
{
    public static void main(String[] args)
    {
        Scanner scanner_object = new Scanner(System.in);
        /* Creating an object of the Red-Black Tree */
        Red_Black_Tree red_black_tree_object = new Red_Black_Tree(Integer.MIN_VALUE);
        System.out.println("Red Black Tree Test\n");
        char ch;
        /* Perform tree operations */
        do
        {
            System.out.println("\nThe options list for Red Black Tree::\n");
            System.out.println("1. To add a new node in the Red-Black Tree");
            System.out.println("2. To search the Red-Black Tree for a node");
            System.out.println("3. To get node count of nodes in Red Black Tree");
            System.out.println("4. To check if the Red_Black_Tree is Empty or not?");
            System.out.println("5. To Clear the Red_Black_Tree.");
            // ... (the remainder of the menu-driven driver is omitted in the source)
Have you ever thought that AVL and Red-Black trees are also self-adjusted trees? Then, what
makes the Splay Tree different from the AVL and Red-Black trees? Yes, there is one operation
called splaying, which makes it different from both AVL and the Red-Black Tree.
A splay tree contains all the operations of a binary search tree, like insertion, deletion, and
searching. But it also contains one more operation, which is called splaying. In a splay tree,
every operation is performed at the root of the tree. All operations in the splay tree involve
one common operation called splaying.
You may have questioned what splaying is and why it differentiates splay trees from AVL
and Red-Black trees. So, let me tell you about splaying. Splaying is the process of bringing
an element to the root by performing suitable rotations.
By splaying elements in the tree, we can bring more frequently used elements closer to the
root of the tree so that any operations like insertion, searching, and deletion can be performed
quickly. This means that, after applying the splaying operation, more frequently used
elements come closer to the root.
Suppose we have been given a binary search tree with different nodes and we know that in
the binary search tree, elements to the left are smaller and those to the right are greater than
the root node.
For searching, we will perform the binary search method. Let’s say we want to search for
element 9. As 9 is less than 11, we move to the left of the root node. After performing the
search operation, we need to do one more thing, called splaying. This means that, after
splaying, the element on which we are operating should come to the root. The element comes
to the root after performing some rearrangements of elements, i.e., rotations, in the tree.
To rearrange the tree, we need to perform some rotations. The rotations given below are the
rotations that we are going to perform in the splay tree.
Zig rotation:
This rotation is similar to the right rotation in the AVL tree. In zig rotation, every node moves
one position to the right of its current position. We use Zig rotation when the item which is to
be searched is either a root node or a left child of the root node.
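Since a zig is a single right rotation, it can be sketched in a few lines of C; the struct layout
is illustrative, and the mirror-image leftRotate gives the zag case:

/* Zig: a single right rotation at x. The left child y becomes
   the new subtree root, and x becomes y's right child. */
struct node {
    int key;
    struct node *left, *right;
};

struct node *rightRotate(struct node *x)
{
    struct node *y = x->left;
    x->left = y->right; /* y's right subtree becomes x's left subtree */
    y->right = x;
    return y;           /* y is the new root of this subtree */
}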
Let’s take a scenario where the search item is the left child of the root node.
In the above example, we have to search for node 9 in the tree. To search for the given node
in the binary search tree, we need to perform the following steps:
Step 1: First, we compare 9 with the root node. As 9 is less than 11, it is a left child of the
root node.
Step 2: We have already seen above that once the element is found, we will perform
splaying. Here the right rotation is performed so that 9 becomes the root node of the tree.
Have a look at the diagram below.
In the above diagram, we can see that node 9 has become the root node of the tree, this shows
that the searching is completed.
It’s a kind of double zig rotation. Here we perform zig rotation two times. Every node
moves two positions to the right of its current position. But why are we doing this?
We are doing this because sometimes situations arise where we need to search for the item
that has both parent and grandparent. In such cases, we have to perform four rotations for
splaying.
Step 1: First, we perform the BST search operation for node 3. Node 3 has both a parent
and a grandparent, with both links on the left.
Step 2: We have to perform splaying, which means we have to make node 3 the root node.
So here we will perform two right rotations, because the element to be searched has both a
parent and a grandparent.
In the above diagram, we can see that node 3 has become the root node of the tree, this shows
that the searching is completed.
Zag rotation:
This rotation is similar to the left rotation in the AVL tree. In zag rotation, every node moves
one position to the left of its current position. We use Zag rotation when the item which is to
be searched is either a root node or a right child of the root node.
Let’s see the case where the element to be searched for is present on the right of the root node
of the tree. Now let’s say we have to search for 13, which is present on the right of the root
node of the tree.
The steps involved in searching are given below:
Step 1: First, we compare 13 with the root node. As 13 is greater than the root node of the
tree, it is the right child of the root node.
Step 2: Once the element is found, we will perform splaying as we did in the previous
examples. Here we will perform a left rotation so that 13 becomes the root node of the tree.
In the above diagram, we can see that node 13 has become the root node of the tree, this
shows that the searching is completed.
It’s a kind of double zag rotation. Here we perform zag rotation two times. Every node moves
two positions to the left of its current position. But why are we doing this?
We are doing this because sometimes situations arise where we need to search for the item
that has both parent and grandparent. In such cases, we have to perform four rotations for
splaying.
Step 1: First, we have to perform a search operation in the tree as we did previously, which
means a BST operation. As 7 is greater than 4 and 6, it will be at the right of node 6. So we
can say that element 7 has a parent of 6 and a grandparent of 4.
Step 2: We have to perform splaying, which means we have to make node 7 the root node.
So here we will perform two left rotations because the elements to be searched have both
parent and grandparent.
In the above diagram, we can see that node 7 has become the root node of the tree, this shows
that the searching is completed.
Zig-zag rotation:
This type of rotation is a zig rotation followed by a zag rotation. So far, we have seen cases
where the parent and the grandparent are in an RR or LL relationship. Here we consider
the RL and LR kinds of relationships between the parent and the grandparent. Every node
moves one position to the right, followed by one position to the left of its current position.
Suppose we want to search for element 5. First, we perform the BST search operation:
5 is greater than the root node 4 and smaller than node 6, so 5 is the left child of node 6.
So an RL relationship exists since node 5 is to the left of node 6 and node 6 is to the right of
node 4. So first we will perform the right rotation on node 6, and then node 6 will move
downwards, and node 5 will come upwards, as you can see in the example given below. After
that, we will perform a zag rotation (left rotation) at node 4, and we will see that 5 becomes
the root node of the tree.
As we can observe in the above tree, node 5 has become the root node; therefore, the search
is completed. In this case, first, we have performed the zig rotation and then the zag rotation.
So it is known as zig-zag rotation.
Zag-zig rotation:
This rotation is similar to the zig-zag rotation; the only difference is that here every node
moves one position to the left, followed by one position to the right of its current position.
Suppose we want to search for element 5. First, we perform the BST search operation:
5 is smaller than the root node 6 and greater than node 4, so 5 is the right child of node 4,
and 4 is the left child of the root node 6.
So here LR relationship exists since node 5 is to the right of node 4 and node 4 is to the left of
node 6. So first we will perform the left rotation on node 4, and then node 4 will move
downwards, and node 5 will come upwards, as you can see in the example given below. And
after that, we will perform one zig rotation (right rotation) at 6, so finally, we will get 5 as the
root node of the tree.
As we can observe in the above tree, node 5 has become the root node; therefore, the
searching is completed. In this case, first, we have performed the zag rotation and then the zig
rotation. So it is known as zag-zig rotation.
Advantages of splay trees:
In AVL and Red-Black trees, we need to store extra bookkeeping information: in AVL
trees, the balance factor of each node, and in Red-Black trees, one extra bit that denotes
the color of each node. A splay tree does not need to store any extra information.
Splay trees are among the fastest binary search trees in practice for workloads with
locality of reference, and are used in practical applications such as GCC compilers.
They improve searching by moving frequently accessed nodes closer to the root node.
One of the practical uses is cache implementation, in which recently used data is kept in
the cache so that we can access the data more quickly without going to main memory.
Disadvantages of splay trees:
The main disadvantage of the splay tree is that the tree is not strictly balanced, but rather
roughly balanced. When a splay tree becomes linear, the time complexity of a single
operation is O(n).
C
#include <stdio.h>
#include <stdlib.h>

// Tree node
struct node
{
    int key;
    struct node *left, *right;
};

// Allocates a new node with the given key and NULL left and right pointers.
struct node *TreeNode(int key)
{
    struct node *node = (struct node *)malloc(sizeof(struct node));
    node->key = key;
    node->left = node->right = NULL;
    return (node);
}

// Zig: right rotation at x.
struct node *rightRotate(struct node *x)
{
    struct node *y = x->left;
    x->left = y->right;
    y->right = x;
    return y;
}

// Zag: left rotation at x.
struct node *leftRotate(struct node *x)
{
    struct node *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}

// If the key is present in the tree, this function moves it to the root.
// If the key is not present, it brings the last accessed node to the root.
// This function modifies the tree and returns the modified root.
struct node *splay(struct node *root, int key)
{
    // Root is NULL or the key is present at the root.
    if (root == NULL || root->key == key)
        return root;
    if (root->key > key) // key lies in the left subtree
    {
        if (root->left == NULL)
            return root;
        if (root->left->key > key) // Zig-Zig (Left Left)
        {
            // First, recursively bring the key as the root of left-left,
            // then do the first rotation for the root.
            root->left->left = splay(root->left->left, key);
            root = rightRotate(root);
        }
        else if (root->left->key < key) // Zig-Zag (Left Right)
        {
            // First, recursively bring the key as the root of left-right.
            root->left->right = splay(root->left->right, key);
            if (root->left->right != NULL)
                root->left = leftRotate(root->left);
        }
        // Do the second rotation for the root.
        return (root->left == NULL) ? root : rightRotate(root);
    }
    else // key lies in the right subtree
    {
        if (root->right == NULL)
            return root;
        if (root->right->key > key) // Zag-Zig (Right Left)
        {
            root->right->left = splay(root->right->left, key);
            if (root->right->left != NULL)
                root->right = rightRotate(root->right);
        }
        else if (root->right->key < key) // Zag-Zag (Right Right)
        {
            root->right->right = splay(root->right->right, key);
            root = leftRotate(root);
        }
        return (root->right == NULL) ? root : leftRotate(root);
    }
}

// The search function for the splay tree; it returns the new root.
// If a key is present in the tree, then it is moved to the root.
struct node *bstSearch(struct node *root, int key)
{
    return splay(root, key);
}

// Prints the keys of the tree in preorder.
void preOrder(struct node *root)
{
    if (root != NULL)
    {
        printf("%d ", root->key);
        preOrder(root->left);
        preOrder(root->right);
    }
}

// main function
int main()
{
    struct node *root = TreeNode(100);
    root->left = TreeNode(50);
    root->right = TreeNode(200);
    root->left->left = TreeNode(40);
    root->left->left->left = TreeNode(30);
    root->left->left->left->left = TreeNode(20);
    root = bstSearch(root, 20);
    preOrder(root);
    return 0;
}
11.6 SUMMARY
In this unit, we learnt about the red-black tree data structure and its implementation. At the
end of this unit, we discussed the splay tree and its implementation.
11.7 KEYWORDS
Splay tree, Red-black tree, Zig rotation, Zig-zig, Zag-zag and Zag-zig.
11.8 QUESTIONS
1. Define red-black tree.
2. Write the properties of red-black tree.
3. Explain insertion in red-black tree with example.
4. Write the advantages and disadvantages of red-black tree.
5. Explain rotations in splay tree
11.10 REFERENCES
1. "Data Structures and Algorithms" by Michael T. Goodrich, Roberto Tamassia, and
Michael H. Goldwasser
2. "Introduction to Algorithms" by Thomas H. Cormen, Charles E. Leiserson, Ronald L.
Rivest, and Clifford Stein
3. "Algorithms" by Robert Sedgewick and Kevin Wayne
4. "Data Structures and Algorithms in C++" by Adam Drozdek
5. "Data Structures and Algorithms Made Easy" by Narasimha Karumanchi
6. "Data Structures and Algorithm Analysis in C++" by Mark A. Weiss
7. "Data Structures and Algorithms with Object-Oriented Design Patterns in C++" by
Bruno R. Preiss
8. "Data Structures and Algorithm Analysis in Java" by Mark A. Weiss
UNIT 12: GRAPH ALGORITHMS
Structure
12.0 Objectives
12.1 Introduction
12.2 Shortest path algorithms
12.3 Minimum spanning tree algorithms
12.5 Keywords
12.6 Questions
12.7 References
12.0 OBJECTIVES
At the end of this unit, you will be able to
- Discuss Shortest path algorithms.
- Understand the Minimum spanning tree algorithms.
12.1 INTRODUCTION
Graph algorithms are a subset of tools for graph analytics. Graph analytics is something we
do—it’s the use of any graph-based approach to analyze connected data. There are various
methods we could use: we might query the graph data, use basic statistics, visually explore
the graphs, or incorporate graphs into our machine learning tasks. Graph pattern–based
querying is often used for local data analysis, whereas graph computational algorithms
usually refer to more global and iterative analysis. Although there is overlap in how these
types of analysis can be employed, we use the term graph algorithms to refer to the latter,
more computational analytics and data science uses.
Graph algorithms provide one of the most potent approaches to analyzing connected data
because their mathematical calculations are specifically built to operate on relationships.
They describe steps to be taken to process a graph to discover its general qualities or specific
quantities. Based on the mathematics of graph theory, graph algorithms use the relationships
between nodes to infer the organization and dynamics of complex systems. Network
scientists use these algorithms to uncover hidden information, test hypotheses, and make
predictions about behavior.
Graph algorithms have widespread potential, from preventing fraud and optimizing call
routing to predicting the spread of the flu. For instance, we might want to score particular
nodes that could correspond to overload conditions in a power system. Or we might like to
discover groupings in the graph which correspond to congestion in a transport system.
12.2 SHORTEST PATH ALGORITHMS
1. Bellman-Ford Algorithm
This algorithm depends on the relaxation principle, where the shortest distances of all
vertices are gradually replaced by more accurate values until they eventually reach the
optimum solution. In the beginning, all vertices have a distance of "infinity" except the
source vertex, whose distance is 0. We then update all the vertices connected to the source
with new distances (source vertex distance + edge weight), then apply the same process to
the newly updated vertices, and so on.
Implementation:
// v[j] stores the j-th edge as the triple (from, next, weight);
// e, n, from, next, and weight are assumed to be declared elsewhere.
vector<int> v[SIZE];
int dis[SIZE];

for (int i = 0; i < SIZE; i++) {
    v[i].clear();
    dis[i] = 2e9; // all distances start at "infinity"
}
for (int j = 0; j < e; j++) { // read the e edges of the graph
    cin >> from >> next >> weight;
    v[j].push_back(from);
    v[j].push_back(next);
    v[j].push_back(weight);
}
dis[0] = 0; // the source vertex has distance 0
for (int i = 0; i < n - 1; i++) { // relax every edge n - 1 times
    int j = 0;
    while (v[j].size() != 0) {
        if (dis[v[j][0]] + v[j][2] < dis[v[j][1]])
            dis[v[j][1]] = dis[v[j][0]] + v[j][2];
        j++;
    }
}
A very important application of Bellman-Ford is to check whether there is a negative cycle
in the graph: after the n - 1 relaxation rounds, perform one more pass over all the edges; if
any distance can still be improved, the graph contains a negative cycle reachable from the
source. The time complexity of the Bellman-Ford algorithm is relatively high, O(V * E); in
the case E ≈ V^2, this becomes O(V^3).
2. Dijkstra's Algorithm
Dijkstra's algorithm has many variants, but the most common one finds the shortest paths
from the source vertex to all other vertices in the graph.
Algorithm Steps:
Set all vertices distances = infinity except for the source vertex, set the source
distance = 0.
Push the source vertex in a min-priority queue in the form (distance, vertex), as the
comparison in the min-priority queue will be according to vertices distances.
Pop the vertex with the minimum distance from the priority queue (at first the popped
vertex = source).
Update the distances of the connected vertices to the popped vertex in case of "current
vertex distance + edge weight < next vertex distance", then push the vertex
with the new distance to the priority queue.
If the popped vertex is visited before, just continue without using it.
Apply the same algorithm again until the priority queue is empty.
Implementation:
// each vertex stores its connected vertices together with the edge weights
vector<pair<int, int>> v[SIZE];
int dist[SIZE];
bool vis[SIZE];

void dijkstra() {
    for (int i = 0; i < SIZE; i++)
        dist[i] = 2e9;              // set the vertices' distances as infinity
    memset(vis, false, sizeof vis); // set all vertices as unvisited
    dist[1] = 0;                    // vertex 1 is the source
    multiset<pair<int, int>> s;     // the multiset does the job of a min-priority queue
    s.insert({0, 1});
    while (!s.empty()) {
        pair<int, int> p = *s.begin(); // pop the vertex with the minimum distance
        s.erase(s.begin());
        int x = p.second;
        if (vis[x]) continue;          // if popped before, just continue
        vis[x] = true;
        for (auto &edge : v[x]) {      // edge = (neighbor, weight)
            int y = edge.first, w = edge.second;
            if (dist[x] + w < dist[y]) {
                dist[y] = dist[x] + w; // relax and push the new distance
                s.insert({dist[y], y});
            }
        }
    }
}
Without the priority queue the time complexity is O(V^2); with the min-priority queue it
comes down to O(E log V).
However, if we have to find the shortest paths between all pairs of vertices, both of the above
methods would be expensive in terms of time. Discussed below is another algorithm designed
for this case.
3. Floyd-Warshall's Algorithm
Floyd-Warshall's algorithm is used to find the shortest paths between all pairs of vertices in a
graph, where each edge in the graph has a weight which can be positive or negative. The
biggest advantage of using this algorithm is that all the shortest distances between any two
vertices can be computed in O(V^3), where V is the number of vertices in the graph.
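As a concrete illustration, here is a minimal C sketch of Floyd-Warshall's algorithm on a
small 4-vertex graph; the matrix values are illustrative, and INF stands in for "no direct
edge":

#include <stdio.h>

#define V 4
#define INF 99999 /* large enough to mean "no direct edge" */

/* dist[i][j] starts as the weight of edge (i, j) and is relaxed
   through every intermediate vertex k. */
void floydWarshall(int dist[V][V])
{
    for (int k = 0; k < V; k++)
        for (int i = 0; i < V; i++)
            for (int j = 0; j < V; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}

int main(void)
{
    int dist[V][V] = {
        { 0,   5,   INF, 10  },
        { INF, 0,   3,   INF },
        { INF, INF, 0,   1   },
        { INF, INF, INF, 0   }
    };
    floydWarshall(dist);
    for (int i = 0; i < V; i++) {
        for (int j = 0; j < V; j++)
            printf("%7d", dist[i][j]);
        printf("\n");
    }
    return 0;
}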
12.3 MINIMUM SPANNING TREE ALGORITHMS
Now, let's look at the sum of edge weight costs for all these spanning trees, represented in
the table below:
Spanning Tree     Sum of Edge Costs
ST-1              22
ST-2              35
ST-3              36
1. Prim’s Algorithm
Prim's algorithm begins with a single node and adds up adjacent nodes one by one by
discovering all of the connected edges along the way. Edges with the lowest weights that
don't generate cycles are chosen for inclusion in the MST structure. As a result, we can claim
that Prim's algorithm finds the globally best answer by making locally optimal decisions.
The steps involved in Prim's algorithm are mentioned below:
1. Choose any arbitrary vertex as the starting vertex of the MST.
2. Among the edges that connect a vertex inside the MST to a vertex outside it, pick the
minimum-weight edge that does not form a cycle.
3. Add the chosen edge and vertex to the MST.
4. Repeat steps 2 and 3 until the MST includes all the vertices of the graph.
A minimal C sketch of these steps is given below.
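The sketch below uses an adjacency-matrix representation; the graph values are illustrative
and are not the unit's graph G(V, E):

#include <stdio.h>
#include <limits.h>

#define V 5

/* Pick the unvisited vertex with the smallest key value. */
int minKey(int key[], int inMST[])
{
    int min = INT_MAX, minIndex = -1;
    for (int v = 0; v < V; v++)
        if (!inMST[v] && key[v] < min) {
            min = key[v];
            minIndex = v;
        }
    return minIndex;
}

void primMST(int graph[V][V])
{
    int parent[V]; /* parent[i] = the other endpoint of i's MST edge */
    int key[V];    /* cheapest edge weight connecting i to the tree  */
    int inMST[V];
    for (int i = 0; i < V; i++) {
        key[i] = INT_MAX;
        inMST[i] = 0;
    }
    key[0] = 0; /* start from vertex 0 */
    parent[0] = -1;
    for (int count = 0; count < V - 1; count++) {
        int u = minKey(key, inMST);
        inMST[u] = 1;
        /* Update the keys of the vertices adjacent to u. */
        for (int w = 0; w < V; w++)
            if (graph[u][w] && !inMST[w] && graph[u][w] < key[w]) {
                parent[w] = u;
                key[w] = graph[u][w];
            }
    }
    for (int i = 1; i < V; i++)
        printf("%d -- %d == %d\n", parent[i], i, graph[i][parent[i]]);
}

int main(void)
{
    /* 0 means "no edge" in this adjacency matrix. */
    int graph[V][V] = {
        { 0, 2, 0, 6, 0 },
        { 2, 0, 3, 8, 5 },
        { 0, 3, 0, 0, 7 },
        { 6, 8, 0, 0, 9 },
        { 0, 5, 7, 0, 9 }
    };
    primMST(graph);
    return 0;
}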
Output:
The program prints each chosen MST edge along with its weight. You can verify such
output by comparing it with ST-1 when the graph passed to Prim's algorithm is the same
graph G(V, E) represented previously.
2. Kruskal’s Algorithm
Kruskal's approach sorts all the edges in ascending order of edge weights and only adds
nodes to the tree if the chosen edge does not form a cycle. It also selects the edge with the
lowest cost first and the edge with the highest cost last. As a result, we can say that
the Kruskal algorithm makes a locally optimum decision in the hopes of finding the global
optimal solution. Hence, this algorithm can also be considered as a Greedy Algorithm.
The steps involved in Kruskal's algorithm to generate a minimum spanning tree are:
1. Sort all the edges in non-decreasing order of their weights.
2. Pick the smallest remaining edge and check whether it forms a cycle with the spanning
tree formed so far; if it does not form a cycle, include it, otherwise discard it.
3. Repeat step 2 until the tree contains (V - 1) edges.
The C implementation below follows these steps, using a union-find structure for the cycle
check:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
//structure for denoting an edge
struct Edge {
int source, destination, weight;
};
//structure for representing a weighted, connected and undirected graph
struct Graph {
int Node, E;
struct Edge* edge;
};
//memory allocation for storing graph with V vertices and E edges
struct Graph* create_Graph(int Node, int E)
{
struct Graph* gph = (struct Graph*)(malloc(sizeof(struct Graph)));
gph->Node = Node;
gph->E = E;
gph->edge = (struct Edge*)malloc(E * sizeof(struct Edge)); //room for all E edges
return gph;
}
//Union-Find Subset
struct tree_subset {
int parent;
int rank;
};
//finding the set of selected nodes
int DisjointSet_find(struct tree_subset subsets[], int i)
{
//find root and make root as parent of i
if (subsets[i].parent != i)
subsets[i].parent
= DisjointSet_find(subsets, subsets[i].parent);
return subsets[i].parent;
}
void DisjointSet_Union(struct tree_subset subsets[], int x, int y)
{
int xroot = DisjointSet_find(subsets, x);
int yroot = DisjointSet_find(subsets, y);
if (subsets[xroot].rank < subsets[yroot].rank)
subsets[xroot].parent = yroot;
else if (subsets[xroot].rank > subsets[yroot].rank)
subsets[yroot].parent = xroot;
else
{
subsets[yroot].parent = xroot;
subsets[xroot].rank++;
}
}
//Comparing edges with qsort() in C
int myComp(const void* a, const void* b)
{
struct Edge* a1 = (struct Edge*)a;
struct Edge* b1 = (struct Edge*)b;
//return negative, zero, or positive so qsort() orders edges by weight
return (a1->weight > b1->weight) - (a1->weight < b1->weight);
}
//function for creating MST using Kruskal’s Approach
void MST_Kruskal(struct Graph* gph)
{
int Node = gph->Node;
struct Edge result[Node];
int e = 0;
int i = 0;
//edge sorting
qsort(gph->edge, gph->E, sizeof(gph->edge[0]), myComp);
//allocating memory for Node subsets
struct tree_subset* subsets = (struct tree_subset*)malloc(Node * sizeof(struct tree_subset));
for (int v = 0; v < Node; ++v) {
subsets[v].parent = v;
subsets[v].rank = 0;
}
//V-1 : Path traversal limit
while (e < Node - 1 && i < gph->E) {
struct Edge next_edge = gph->edge[i++];
int x = DisjointSet_find(subsets, next_edge.source);
int y = DisjointSet_find(subsets, next_edge.destination);
if (x != y) {
result[e++] = next_edge;
DisjointSet_Union(subsets, x, y);
}
}
//prompting state of MST
printf("Edges created in MST are as below: \n");
int minimumCost = 0;
//calculating minimum cost using for loop
for (i = 0; i < e; ++i)
{
printf("%d -- %d == %d\n", result[i].source,
result[i].destination, result[i].weight);
minimumCost += result[i].weight;
}
printf("The Cost for created MST is : %d",minimumCost);
return;
}
//driver function
int main()
{
int Node = 4;
int E = 6;
struct Graph* gph = create_Graph(Node, E);
//graph creation
gph->edge[0].source = 0;
gph->edge[0].destination = 1;
gph->edge[0].weight = 2;
gph->edge[1].source = 0;
gph->edge[1].destination = 2;
gph->edge[1].weight = 4;
gph->edge[2].source = 0;
gph->edge[2].destination = 3;
gph->edge[2].weight = 4;
gph->edge[3].source = 1;
gph->edge[3].destination = 3;
gph->edge[3].weight = 3;
gph->edge[4].source = 2;
gph->edge[4].destination = 3;
gph->edge[4].weight = 1;
gph->edge[5].source = 1;
gph->edge[5].destination = 2;
gph->edge[5].weight = 2;
MST_Kruskal(gph);
return 0;
}
Output:
You can verify this output’s accuracy by generating an MST for the graph given above.
12.5 KEYWORDS
Spanning tree, Bellman-Ford algorithm, Dijkstra's algorithm, Prim's algorithm and Kruskal's algorithm.
12.6 QUESTIONS
1. Explain the Bellman-Ford algorithm.
2. Discuss Dijkstra's algorithm.
3. Write a short note on minimum spanning tree algorithms.
4. Describe Prim's algorithm.
5. Briefly explain Kruskal's algorithm.
12.7 REFERENCES