Data Structures
Introduction to Data Structures, abstract data types, Linear list – singly linked list
implementation, insertion, deletion and searching operations on linear list, Stacks-
Operations, array and linked representations of stacks, stack applications, Queues-
operations, array and linked representations.
Data Structure
Introduction
A data structure can be defined as a group of data elements that provides an efficient way of
storing and organizing data in the computer so that it can be used efficiently. Some examples of
data structures are arrays, linked lists, stacks and queues. Data structures are widely used in
almost every area of computer science, e.g. operating systems, compiler design, artificial
intelligence, graphics and many more.
Data structures are a central part of many computer science algorithms because they enable
programmers to handle data efficiently. They play a vital role in enhancing the
performance of a program, since a main function of software is to store and
retrieve the user's data as fast as possible.
Basic Terminology
Data structures are the building blocks of any program or software. Choosing the appropriate
data structure for a program is one of the most difficult tasks for a programmer. The following
terminology is used when discussing data structures.
Data: Data can be defined as an elementary value or a collection of values; for example, a
student's name and ID are data about the student.
Group Items: Data items that have subordinate data items are called group items; for example,
the name of a student can have a first name and a last name.
Record: Record can be defined as the collection of various data items, for example, if we talk
about the student entity, then its name, address, course and marks can be grouped together to
form the record for the student.
File: A file is a collection of records of one type of entity; for example, if there are 60
employees in an organization, then there will be 60 records in the related file, where each record
contains the data about one employee.
Field: Field is a single elementary unit of information representing the attribute of an entity.
As applications become more complex and the amount of data increases day by day, the
following problems may arise:
Processor speed: Handling a very large amount of data requires high-speed processing, but as
the data grows day by day to billions of files per entity, the processor may fail to deal with
that much data.
Data Search: Consider an inventory of 10^6 items in a store. If our application needs to
search for a particular item, it has to traverse 10^6 items every time, which slows down the
search process.
Multiple requests: If thousands of users search the data simultaneously on a web server,
then even a very large server may fail during that process.
Data structures are used to solve the above problems. Data is organized into a data
structure in such a way that not all items need to be examined and the required data can be
found quickly.
Efficiency: The efficiency of a program depends on the choice of data structures. For example,
suppose we have some data and need to search for a particular record. If we organize the data
in an unsorted array, we have to search sequentially, element by element, so an array may not
be very efficient here. Other data structures, such as an ordered array, a binary search tree or
a hash table, make the search more efficient.
Reusability: Data structures are reusable, i.e. once we have implemented a particular data
structure, we can use it at any other place. Implementation of data structures can be compiled
into libraries which can be used by different clients.
Abstraction: Data structure is specified by the ADT which provides a level of abstraction. The
client program uses the data structure through interface only, without getting into the
implementation details.
Linear Data Structures: A data structure is called linear if all of its elements are arranged in
linear order. In linear data structures, the elements are stored in a non-hierarchical way where each
element has a successor and a predecessor, except the first element (no predecessor) and the
last element (no successor).
Arrays: An array is a collection of similar type of data items and each data item is called an
element of the array. The data type of the element may be any valid data type like char, int, float
or double.
The elements of array share the same variable name but each one carries a different index
number known as subscript. The array can be one dimensional, two dimensional or
multidimensional.
For example, the individual elements of an array age with 100 elements are:
age[0], age[1], age[2], age[3], ..., age[98], age[99].
Linked List: Linked list is a linear data structure which is used to maintain a list in the memory.
It can be seen as the collection of nodes stored at non-contiguous memory locations. Each node
of the list contains a pointer to its adjacent node.
Queue: A queue is a linear list in which elements can be inserted only at one end, called the
rear, and deleted only at the other end, called the front.
It is an abstract data structure, similar to a stack. A queue is open at both ends, so it follows the
First-In-First-Out (FIFO) methodology for storing the data items.
If a data structure does not organize its data in sequential order, that data structure is called a
Non-Linear Data Structure.
Example
1. Tree
2. Graph
3. Dictionaries
4. Heaps
5. Tries, Etc.,
Types of Non Linear Data Structures are given below:
Trees: Trees are multilevel data structures with a hierarchical relationship among their elements,
known as nodes. The bottommost nodes in the hierarchy are called leaf nodes, while the topmost
node is called the root node. Each node contains pointers to adjacent nodes.
The tree data structure is based on the parent-child relationship among the nodes. Each node in the
tree can have more than one child, except the leaf nodes, whereas each node can have at most
one parent, except the root node, which has none. Trees can be classified into many categories,
which will be discussed later in this tutorial.
Graphs: Graphs can be defined as the pictorial representation of a set of elements (represented
by vertices) connected by links known as edges. A graph differs from a tree in that
a graph can have cycles while a tree cannot.
1) Traversing: Every data structure contains the set of data elements. Traversing the data
structure means visiting each element of the data structure in order to perform some specific
operation like searching or sorting.
Example: To calculate the average of the marks obtained by a student in 6 different
subjects, we need to traverse the complete array of marks and calculate the total sum, then
divide that sum by the number of subjects, i.e. 6, to find the average.
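The traversal described in this example can be sketched in C as follows. This is a minimal illustration, not code from these notes; the function name average_marks and its parameters are assumptions made for the sketch.

```c
#include <stddef.h>

/* Hypothetical helper: visits every element once, accumulating the sum,
   then divides by the number of elements. */
double average_marks(const int marks[], size_t count)
{
    long total = 0;

    if (count == 0)
        return 0.0;                 /* avoid dividing by zero */
    for (size_t i = 0; i < count; i++)
        total += marks[i];          /* traversal: one visit per element */
    return (double)total / (double)count;
}
```

Calling average_marks on an array holding the six subject marks with count 6 returns their mean.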
2) Insertion: Insertion can be defined as the process of adding an element to the data structure
at any location.
If a data structure of fixed size n is already full, an attempt to insert another element causes
overflow.
3) Deletion: The process of removing an element from the data structure is called deletion. We
can delete an element from the data structure at any location.
If we try to delete an element from an empty data structure, underflow occurs.
4) Searching: The process of finding the location of an element within the data structure is
called searching. Two common algorithms for searching are Linear Search and Binary
Search. We will discuss each of them later in this tutorial.
5) Sorting: The process of arranging the data structure in a specific order is known as sorting.
There are many algorithms that can be used to perform sorting, for example, insertion sort,
selection sort, bubble sort, etc.
6) Merging: When two lists, List A and List B, of sizes M and N respectively and holding elements
of a similar type, are joined to produce a third list, List C, of size (M+N), the process is
called merging.
An abstract data type, sometimes abbreviated ADT, is a logical description of how we view the
data and the operations that are allowed without regard to how they will be implemented. This
means that we are concerned only with what data is representing and not with how it will
eventually be constructed. By providing this level of abstraction, we are creating an
encapsulation around the data. The idea is that by encapsulating the details of the
implementation, we are hiding them from the user’s view. This is called information hiding. The
implementation of an abstract data type, often referred to as a data structure, will require that we
provide a physical view of the data using some collection of programming constructs and
primitive data types.
In a stack, items can be added or removed only at the top, i.e. the last item added to a stack is
the first item to be removed.
Operations on stack:
While performing push and pop operations the following test must be conducted on the stack.
a) Stack is empty or not b) stack is full or not
1. Push: Push operation is used to add new elements in to the stack. At the time of addition first check
the stack is full or not. If the stack is full it generates an error message "stack overflow".
2. Pop: Pop operation is used to delete elements from the stack. At the time of deletion first check the
stack is empty or not. If the stack is empty it generates an error message "stack underflow".
All insertions and deletions take place at the same end, so the last element added to the stack will be the
first element removed from the stack. When a stack is created, the stack base remains fixed while the stack
top changes as elements are added and removed. The most accessible element is the top and the least
accessible element is the bottom of the stack.
1. push(): When an element is added to a stack, the operation is performed by push(). The figure below
shows the creation of a stack and the addition of elements using push().
Initially top = -1. Before inserting an element, first check whether the stack is full, i.e. whether
top >= size-1. If it is not full, increment the top value, i.e. top = top+1, and store the element
at that position in the stack.
Step 1: START
Step 2: if top>=size-1 then
Write “ Stack is Overflow”
Step 3: Otherwise
3.1 : read data value ‘x’
3.2 : top=top+1;
3.3 : stack[top]=x;
Step 4: END
2. pop(): The figure shows a stack initially with three elements and the deletion of elements using pop().
Before deleting an element from the stack, first check whether the stack is empty, i.e. whether
top == -1. If it is not empty, remove the top element and decrement the top value, i.e. top = top-1.
Algorithm: procedure pop():
Step 1: START
Step 2: if top==-1 then
Write “Stack is Underflow”
Step 3: otherwise
3.1 : print “deleted element”
3.2 : top=top-1;
Step 4: END
3. display(): This operation displays the elements in the stack. Before displaying, first check
whether the stack is empty, i.e. whether top == -1. If it is not empty, display the list of elements in
the stack.
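The push, pop and display operations described above can be sketched together in C. This is an illustrative sketch under stated assumptions: the capacity SIZE of 5 and the return-code convention (1 for success, 0 for overflow or underflow) are choices made here, not details given in these notes.

```c
#include <stdio.h>

#define SIZE 5            /* assumed capacity for this sketch */

static int stack[SIZE];
static int top = -1;      /* -1 means the stack is empty */

/* Adds x on top of the stack. Returns 1 on success, 0 on overflow. */
int push(int x)
{
    if (top >= SIZE - 1) {
        printf("Stack is Overflow\n");
        return 0;
    }
    top = top + 1;
    stack[top] = x;
    return 1;
}

/* Removes the top element into *out. Returns 1 on success, 0 on underflow. */
int pop(int *out)
{
    if (top == -1) {
        printf("Stack is Underflow\n");
        return 0;
    }
    *out = stack[top];
    top = top - 1;
    return 1;
}

/* Prints the elements from the top of the stack down to the bottom. */
void display(void)
{
    if (top == -1) {
        printf("Stack is empty\n");
        return;
    }
    for (int i = top; i >= 0; i--)
        printf("%d\n", stack[i]);
}
```

Returning a status code instead of only printing a message lets the caller react to overflow and underflow; the printed messages mirror the algorithm steps above.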
Applications of STACK:
Recursive Function.
Expression Evaluation.
Expression Conversion.
Infix to postfix
Infix to prefix
Postfix to infix
Postfix to prefix
Prefix to infix
Prefix to postfix
Reverse a Data
Processing Function Calls
Expressions:
• Operator is a symbol which performs a particular task like arithmetic operation or logical
operation or conditional operation etc.,
• Operands are the values on which the operators can perform the task. Here operand can
be a direct value or variable or address of memory location
Expression types:
Based on the operator position, expressions are divided into THREE types. They are as follows.
• Infix Expression
• In infix expression, the operator is used between the operands.
• Example: a + b
• Postfix Expression
• In postfix expression, the operator is used after the operands. We can say that "Operator
follows the Operands".
• Example: a b +
• Prefix Expression
• In prefix expression, the operator is used before the operands. We can say that "Operands
follow the Operator".
• Example: + a b
To convert an infix expression to postfix, scan it from left to right and apply these rules:
• If the scanned symbol is a left parenthesis, push it onto the stack.
• If the scanned symbol is an operand, place it directly in the postfix expression
(output).
• If the scanned symbol is a right parenthesis, pop items from the
stack and place them in the postfix expression until the matching left parenthesis is reached;
discard both parentheses.
• If the scanned symbol is an operator, pop operators from the stack
and place them in the postfix expression for as long as the precedence of the operator
on the top of the stack is greater than (or, for left-associative operators, equal to) the precedence
of the scanned operator. Then push the scanned operator onto the stack.
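The conversion rules above can be sketched in C as follows. This is a minimal sketch under several assumptions that are not stated in these notes: operands are single letters or digits, '^' stands in for the ↑ operator and is treated as right-associative, the operator stack is a fixed 256-entry array, and the function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Precedence of an operator; a higher value binds tighter. */
static int prec(char op)
{
    switch (op) {
    case '^': return 3;
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:  return 0;               /* '(' or anything else */
    }
}

/* Converts infix to postfix. out needs room for strlen(infix) + 1 chars. */
void infix_to_postfix(const char *infix, char *out)
{
    char stack[256];                  /* operator stack (assumed large enough) */
    int top = -1;
    size_t k = 0;

    for (const char *p = infix; *p != '\0'; p++) {
        char c = *p;
        if (isspace((unsigned char)c)) {
            continue;
        } else if (isalnum((unsigned char)c)) {
            out[k++] = c;             /* operand goes straight to the output */
        } else if (c == '(') {
            stack[++top] = c;
        } else if (c == ')') {
            while (top >= 0 && stack[top] != '(')
                out[k++] = stack[top--];
            if (top >= 0)
                top--;                /* discard the matching '(' */
        } else {
            /* Pop operators of higher precedence, or of equal precedence
               when the scanned operator is left-associative (not '^'). */
            while (top >= 0 && stack[top] != '(' &&
                   (prec(stack[top]) > prec(c) ||
                    (prec(stack[top]) == prec(c) && c != '^')))
                out[k++] = stack[top--];
            stack[++top] = c;
        }
    }
    while (top >= 0) {                /* input exhausted: flush the stack */
        if (stack[top] != '(')
            out[k++] = stack[top];
        top--;
    }
    out[k] = '\0';
}
```

Calling infix_to_postfix("a+b*c+(d*e+f)*g", buf) leaves "abc*+de*f+g*+" in buf.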
Example2:
Convert ((A – (B + C)) * D) ↑ (E + F) infix expression to postfix form:
Example 3:
Convert the infix expression a + b * c + (d * e + f) * g into postfix form.
SYMBOL         POSTFIX STRING   STACK   REMARKS
a              a
+              a                +
b              ab               +
*              ab               +*
c              abc              +*
+              abc*+            +
(              abc*+            +(
d              abc*+d           +(
*              abc*+d           +(*
e              abc*+de          +(*
+              abc*+de*         +(+
f              abc*+de*f        +(+
)              abc*+de*f+       +
*              abc*+de*f+       +*
g              abc*+de*f+g      +*
End of string  abc*+de*f+g*+            The input is now empty. Pop the output symbols
                                        from the stack until it is empty.
Example 4:
Convert the following infix expression A+(B *C–(D/E↑F)*G)*H into its equivalent postfix expression.
SYMBOL         POSTFIX STRING    STACK
A              A
+              A                 +
(              A                 +(
B              AB                +(
*              AB                +(*
C              ABC               +(*
-              ABC*              +(-
(              ABC*              +(-(
D              ABC*D             +(-(
/              ABC*D             +(-(/
E              ABC*DE            +(-(/
↑              ABC*DE            +(-(/↑
F              ABC*DEF           +(-(/↑
)              ABC*DEF↑/         +(-
*              ABC*DEF↑/         +(-*
G              ABC*DEF↑/G        +(-*
)              ABC*DEF↑/G*-      +
*              ABC*DEF↑/G*-      +*
H              ABC*DEF↑/G*-H     +*
End of string  ABC*DEF↑/G*-H*+
SYMBOL  OPERAND 1  OPERAND 2  VALUE  STACK        REMARKS
6                                    6
5                                    6, 5
2                                    6, 5, 2
8       2          3          5      6, 5, 5, 8   Next 8 is pushed
Example 3:
SYMBOL  OPERAND 1  OPERAND 2  VALUE  STACK
6                                    6
2                                    6, 2
3                                    6, 2, 3
+       2          3          5      6, 5
-       6          5          1      1
3       6          5          1      1, 3
8       6          5          1      1, 3, 8
2       6          5          1      1, 3, 8, 2
/       8          2          4      1, 3, 4
+       3          4          7      1, 7
*       1          7          7      7
2       1          7          7      7, 2
↑       7          2          49     49
3       7          2          49     49, 3
+       49         3          52     52
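The evaluation traced in the table above can be sketched in C. This is an illustrative sketch, not code from these notes: it assumes the tokens are separated by spaces, the operands are non-negative integers, '^' stands in for ↑, and the names eval_postfix and ipow are hypothetical.

```c
#include <ctype.h>

/* Integer power by repeated multiplication; exp is assumed non-negative. */
static long ipow(long base, long exp)
{
    long result = 1;
    while (exp-- > 0)
        result *= base;
    return result;
}

/* Evaluates a postfix expression. Returns 1 and stores the result in
   *value on success, or 0 if the expression is malformed. */
int eval_postfix(const char *expr, long *value)
{
    long stack[128];                  /* operand stack (assumed large enough) */
    int top = -1;
    const char *p = expr;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        if (isdigit((unsigned char)*p)) {
            long n = 0;
            while (isdigit((unsigned char)*p))
                n = n * 10 + (*p++ - '0');
            if (top >= 127)
                return 0;
            stack[++top] = n;         /* operand: push it */
            continue;
        }
        if (top < 1)
            return 0;                 /* an operator needs two operands */
        long b = stack[top--];        /* operand 2 (popped first) */
        long a = stack[top--];        /* operand 1 */
        switch (*p++) {
        case '+': stack[++top] = a + b; break;
        case '-': stack[++top] = a - b; break;
        case '*': stack[++top] = a * b; break;
        case '/':
            if (b == 0)
                return 0;
            stack[++top] = a / b;
            break;
        case '^': stack[++top] = ipow(a, b); break;
        default:  return 0;           /* unknown symbol */
        }
    }
    if (top != 0)
        return 0;                     /* exactly one value must remain */
    *value = stack[0];
    return 1;
}
```

For the expression in the table, eval_postfix("6 2 3 + - 3 8 2 / + * 2 ^ 3 +", &v) stores 52 in v.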
Reverse a Data:
To reverse a given set of data, we need to reorder the data so that the first and last elements are
exchanged, the second and second last element are exchanged, and so on for all other elements.
o Reversing a string
Reverse a String
A stack can be used to reverse the characters of a string. This is achieved by pushing
each character onto the stack one by one and then popping the characters from the stack one by one.
Because of the last in first out property of the stack, the first character of the string is at the
bottom of the stack and the last character of the string is on the top, so
popping the stack until it is empty returns the string in reverse order.
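A minimal C sketch of this idea is shown below. The function name reverse_string and the use of a heap-allocated character array as the stack are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Reverses s in place using an explicit character stack.
   Returns 1 on success, 0 if the temporary stack cannot be allocated. */
int reverse_string(char *s)
{
    size_t n = strlen(s);
    size_t top = 0;
    char *stack;

    if (n == 0)
        return 1;                     /* nothing to reverse */
    stack = malloc(n);
    if (stack == NULL)
        return 0;
    for (size_t i = 0; i < n; i++)
        stack[top++] = s[i];          /* push each character */
    for (size_t i = 0; i < n; i++)
        s[i] = stack[--top];          /* pop: last pushed comes out first */
    free(stack);
    return 1;
}
```

Applied to a writable buffer holding "stack", the function leaves "kcats" in it.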
Stack plays an important role in programs that call several functions in succession. Suppose we
have a program containing three functions: A, B, and C. function A invokes function B, which
invokes the function C.
When we invoke function A, which contains a call to function B, then its processing will not be
completed until function B has completed its execution and returned. Similarly for function B
and C. So we observe that function A will only be completed after function B is completed and
function B will only be completed after function C is completed. Therefore, function A is first to
be started and last to be completed. To conclude, the above function activity matches the last in
first out behavior and can easily be handled using Stack.
Consider addrA, addrB, addrC be the addresses of the statements to which control is returned
after completing the function A, B, and C, respectively.
The above figure shows that return addresses appear in the Stack in the reverse order in which
the functions were called. After each function is completed, the pop operation is performed, and
execution continues at the address removed from the Stack. Thus the program that calls several
functions in succession can be handled optimally by the stack data structure. Control returns to
each function at a correct place, which is the reverse order of the calling sequence.
A queue is a linear data structure and a collection of elements. It is a special kind of list in which
items are inserted at one end, called the rear, and deleted at the other end, called the front. The principle
of a queue is "FIFO", or "First-in-first-out".
A queue is an abstract data structure that is useful in programming. It is similar to the
ticket queue outside a cinema hall, where the first person entering the queue is the first person who gets the
ticket.
A real-world example of a queue is a single-lane one-way road, where the vehicle that enters first exits first.
More real-world examples are the queues at ticket windows, at bus stops and in our college
library.
The operations for a queue are analogous to those for a stack; the difference is that insertions go at the
end of the list, rather than the beginning.
Operations on QUEUE:
A queue is an object or, more specifically, an abstract data type (ADT) that allows the following
operations:
Enqueue or insertion: inserts an element at the end of the queue.
Dequeue or deletion: deletes an element from the start of the queue.
Again insert another element 33 to the queue. The status of the queue is:
Now, delete an element. The element deleted is the element at the front of the queue. So the status of
the queue is:
Again, delete an element. The element to be deleted is always pointed to by the FRONT pointer. So,
22 is deleted. The queue status is as follows:
Now, insert new elements 44 and 55 into the queue. The queue status is:
Now it is not possible to insert an element 66 even though there are two vacant positions in
the linear queue. To overcome this problem the elements of the queue are to be shifted
towards the beginning of the queue so that it creates vacant position at the rear end. Then
the FRONT and REAR are to be adjusted properly. The element 66 can be inserted at the
rear end. After this operation, the queue status is as follows:
This difficulty can be overcome if we treat the queue position with index 0 as the position that comes
after the position with index 4, i.e., we treat the queue as a circular queue.
To insert, check whether the queue is already full by comparing rear to max - 1. If so, return an
overflow error.
If the item is the first element to be inserted in the list, set the values of front and
rear to 0 and insert the element at the rear end.
Otherwise, increment rear and insert the element at index rear.
To delete, check whether the queue is empty, i.e. whether front is -1 or front is greater than rear. If so,
return an underflow error.
Otherwise, return the item stored at the front end of the queue and increment front.
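#include <stdio.h>
#include <stdlib.h>
#define maxsize 5   /* capacity; this value is assumed, the original definition is not shown */
void insert();
void delete();
void display();
int front = -1, rear = -1;
int queue[maxsize];
int main()
{
    int choice = 0;
    while(choice != 4)
    {
        printf("\n=================================================================\n");
        printf("\n1.insert an element\n2.Delete an element\n3.Display the queue\n4.Exit\n");
        printf("\nEnter your choice ?");
        scanf("%d",&choice);
        switch(choice)
        {
            case 1:
                insert();
                break;
            case 2:
                delete();
                break;
            case 3:
                display();
                break;
            case 4:
                exit(0);
                break;
            default:
                printf("\nEnter valid choice??\n");
        }
    }
    return 0;
}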
void insert()
{
    int item;
    printf("\nEnter the element\n");
    scanf("%d",&item);
    if(rear == maxsize-1)
    {
        printf("\nOVERFLOW\n");
        return;
    }
    if(front == -1 && rear == -1)
    {
        front = 0;
        rear = 0;
    }
    else
    {
        rear = rear+1;
    }
    queue[rear] = item;   /* store the new element at the rear */
}
void delete()
{
    int item;
    if (front == -1 || front > rear)
    {
        printf("\nUNDERFLOW\n");
        return;
    }
    else
    {
        item = queue[front];
        if(front == rear)
        {
            /* the last element was removed: reset to the empty state */
            front = -1;
            rear = -1;
        }
        else
        {
            front = front + 1;
        }
        printf("\nvalue deleted %d", item);
    }
}
void display()
{
    int i;
    if(rear == -1)
    {
        printf("\nEmpty queue\n");
    }
    else
    {
        printf("\nprinting values......\n");
        for(i=front;i<=rear;i++)
        {
            printf("\n%d\n",queue[i]);
        }
    }
}
Although this technique of creating a queue is easy, it has some drawbacks.
o Memory wastage: The array space that held deleted elements can
never be reused to store elements of that queue, because elements can only be
inserted at the rear end, and the value of front might be so high that all the space before
it can never be filled.
The above figure shows how the memory space is wasted in the array representation of queue. In
the above figure, a queue of size 10 having 3 elements, is shown. The value of the front variable
is 5, therefore, we can not reinsert the values in the place of already deleted element before the
position of front. That much space of the array is wasted and can not be used in the future (for
this queue).
o Deciding the array size
One of the most common problems with the array implementation is that the size of the array
has to be declared in advance. A queue may need to grow at runtime
depending on the problem, but extending the array is a time-consuming process and
impractical at runtime, since it requires a lot of reallocation and copying. For this reason,
we can declare the array large enough to store as many queue elements as possible,
but the main problem with this declaration is that most of the array slots (nearly half) may never
be used. This again leads to memory wastage.
Types of Queues
The above figure shows that the elements are inserted from the rear end, and if we insert more
elements in a Queue, then the rear value gets incremented on every insertion. If we want to show
the deletion, then it can be represented as:
In the above figure, we can observe that the front pointer points to the next element, and the element
which was previously pointed by the front pointer was deleted.
The major drawback of using a linear Queue is that insertion is done only from the rear end. If the
first three elements are deleted from the Queue, we cannot insert more elements even though the
space is available in a Linear Queue. In this case, the linear Queue shows the
overflow condition as the rear is pointing to the last element of the Queue.
In a Circular Queue, all the nodes are arranged in a circle. It is similar to the linear Queue
except that the last element of the queue is connected to the first element. It is also known
as a Ring Buffer, as the ends are connected to each other. The circular queue can be
represented as:
The drawback that occurs in a linear queue is overcome by using the circular queue. If empty
space is available in a circular queue, a new element can be added in that space by
incrementing the value of rear, wrapping around to index 0 after the last position.
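The wrap-around behaviour of a circular queue can be sketched in C as follows. This is a minimal illustration under stated assumptions: the capacity MAXSIZE of 5, the names cq_insert and cq_delete, and the return-code convention are all choices made for this sketch rather than details from these notes.

```c
#define MAXSIZE 5                     /* assumed capacity */

static int cq[MAXSIZE];
static int cq_front = -1, cq_rear = -1;   /* -1 means the queue is empty */

/* Inserts item at the rear. Returns 1 on success, 0 if the queue is full. */
int cq_insert(int item)
{
    if ((cq_rear + 1) % MAXSIZE == cq_front)
        return 0;                     /* next slot is the front: full */
    if (cq_front == -1)
        cq_front = 0;                 /* first element being inserted */
    cq_rear = (cq_rear + 1) % MAXSIZE;    /* wrap to 0 after the last index */
    cq[cq_rear] = item;
    return 1;
}

/* Removes the front element into *item. Returns 1 on success, 0 if empty. */
int cq_delete(int *item)
{
    if (cq_front == -1)
        return 0;                     /* empty */
    *item = cq[cq_front];
    if (cq_front == cq_rear)
        cq_front = cq_rear = -1;      /* last element removed: reset */
    else
        cq_front = (cq_front + 1) % MAXSIZE;
    return 1;
}
```

With this arithmetic, after five insertions and two deletions a sixth element can still be inserted: rear wraps around to index 0 instead of reporting overflow.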
3. Priority Queue
A priority queue is another special type of Queue data structure in which each element has some
priority associated with it. Based on the priority of the element, the elements are arranged in a
priority queue. If the elements occur with the same priority, then they are served according to the
FIFO principle.
In priority Queue, the insertion takes place based on the arrival while the deletion occurs based
on the priority. The priority Queue can be shown as:
The above figure shows that the highest priority element comes first and the elements of the
same priority are arranged based on FIFO structure.
4. Deque
The linear queue and the deque (double-ended queue) differ in that the linear queue follows the
FIFO principle, whereas the deque need not. In a deque, insertion and deletion can occur
at both ends.
1. It allocates the memory dynamically. All the nodes of linked list are non-contiguously
stored in the memory and linked together with the help of pointers.
2. Sizing is no longer a problem since we do not need to define its size at the time of
declaration. List grows as per the program's demand and limited to the available memory
space.
The singly linked list is the most commonly used linked list in programs. Unless stated otherwise,
"linked list" means a singly linked list. Each node of a singly linked list contains two parts: one is
the data part, and the other is the address part, which contains the address of the next or
successor node. The address part of a node is also known as a pointer.
Suppose we have three nodes, and the addresses of these three nodes are 100, 200 and 300
respectively. The representation of three nodes as a linked list is shown in the below figure:
We can observe in the above figure that there are three different nodes having address 100, 200
and 300 respectively. The first node contains the address of the next node, i.e., 200, the second
node contains the address of the last node, i.e., 300, and the third node contains the NULL value
in its address part as it does not point to any node. The pointer that holds the address of the initial
node is known as a head pointer.
The linked list, which is shown in the above diagram, is known as a singly linked list as it
contains only a single link. In this list, only forward traversal is possible; we cannot traverse in
the backward direction as it has only one link in the list.
Representation of the node in a singly linked list
struct node
{
int data;
struct node *next;
};
As the name suggests, each node of a doubly linked list contains two pointers. We can define the
doubly linked list as a linear data structure whose nodes have three parts: one data
part, a pointer to the previous node, and a pointer to the next node.
Suppose we have three nodes, and the address of these nodes are 100, 200 and 300, respectively.
The representation of these nodes in a doubly-linked list is shown below
As we can observe in the above figure, the node in a doubly-linked list has two address parts;
one part stores the address of the next while the other part of the node stores the previous node's
address. The initial node in the doubly linked list has the NULL value in the address part, which
provides the address of the previous node.
struct node
{
int data;
struct node *next;
struct node *prev;
};
In the above representation, we have defined a user-defined structure named a node with three
members, one is data of integer type, and the other two are the pointers, i.e., next and prev of
the node type. The next pointer variable holds the address of the next node, and the prev
pointer holds the address of the previous node. The type of both the pointers, i.e., next and
prev is struct node as both the pointers are storing the address of the node of the struct
node type.
A circular linked list is a variation of a singly linked list. The only difference between the singly
linked list and a circular linked list is that the last node does not point to any node in a singly
linked list, so its link part contains a NULL value. On the other hand, the circular linked list is a
list in which the last node connects to the first node, so the link part of the last node holds the
first node's address. The circular linked list has no starting and ending node. Traversal can start
from any node and, by following the next links, reach every other node. The diagrammatic
representation of the circular linked list is shown below:
struct node
{
int data;
struct node *next;
};
A circular linked list is a sequence of elements in which each node has a link to the next node,
and the last node is having a link to the first node. The representation of the circular linked list
will be similar to the singly linked list, as shown below:
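Because the last node links back to the first, a traversal of a circular linked list has to stop when it returns to its starting node rather than when it meets NULL. The sketch below illustrates this; the type name cnode and the function circular_sum are hypothetical names chosen for the example.

```c
#include <stddef.h>

struct cnode {
    int data;
    struct cnode *next;
};

/* Adds up the data of every node, visiting each node exactly once.
   start may be any node of the circular list, or NULL for an empty list. */
int circular_sum(const struct cnode *start)
{
    int sum = 0;
    const struct cnode *p = start;

    if (start == NULL)
        return 0;
    do {
        sum += p->data;
        p = p->next;
    } while (p != start);             /* back at the start: one full lap */
    return sum;
}
```

The do-while form is used so that the starting node itself is processed before the stop condition is checked.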
One way chain or singly linked list can be traversed only in one direction. In other words, we can
say that each node contains only next pointer, therefore we can not traverse the list in the reverse
direction.
Consider an example where the marks obtained by the student in three subjects are stored in a
linked list as shown in the figure.
There are various operations which can be performed on singly linked list. A list of all such
operations is given below.
Node Creation
struct node
{
int data;
struct node *next;
};
struct node *head, *ptr;
ptr = (struct node *)malloc(sizeof(struct node));
Insertion
The insertion into a singly linked list can be performed at different positions. Based on the
position of the new node being inserted, the insertion is categorized into the following categories.
1. Inserting at Beginning
2. Inserting at the End of the List
3. Inserting after specified node
Inserting a new element into a singly linked list at the beginning is quite simple. We just need to
make a few adjustments in the node links. The following steps need to be
followed in order to insert a new node at the beginning of the list.
1. Allocate the space for the new node and store data into the data part of the node. This will
be done by the following statements.
ptr = (struct node *) malloc(sizeof(struct node));
ptr->data = item;
2. Make the link part of the new node point to the existing first node of the list, then make
the new node the first node of the list. This will be done by using the following statements.
ptr->next = head;
head = ptr;
Algorithm
o Step 1: IF PTR = NULL
Write OVERFLOW
Go to Step 7
[END OF IF]
o Step 2: SET NEW_NODE = PTR
o Step 3: SET PTR = PTR → NEXT
o Step 4: SET NEW_NODE → DATA = VAL
o Step 5: SET NEW_NODE → NEXT = HEAD
o Step 6: SET HEAD = NEW_NODE
o Step 7: EXIT
Function for inserting element at beginning of the list
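void beginsert()
{
    struct node *ptr;
    int item;
    /* allocate a whole node, not just the size of a pointer */
    ptr = (struct node *) malloc(sizeof(struct node));
    if(ptr == NULL)
    {
        printf("\n memory insufficient to allocate");
    }
    else
    {
        printf("\nEnter value\n");
        scanf("%d",&item);
        ptr->data = item;
        ptr->next = head;   /* new node points to the old first node */
        head = ptr;         /* new node becomes the first node */
    }
}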
In order to insert a node at the last, there are two following scenarios which need to be
mentioned.
1. The node is being added to an empty list(CASE 1)
2. The node is being added to the end of the linked list(CASE2)
in the first case,(CASE1)
o The condition (head == NULL) gets satisfied. Hence, we just need to allocate the space
for the node by using malloc statement in C. Data and the link part of the node are set
up by using the following statements.
ptr->data = item;
ptr -> next = NULL;
o Since ptr is the only node in the list, we need to make the head pointer of the list
point to this node. This will be done by using the following
statement.
head = ptr;
In the second case (CASE 2):
o The condition head == NULL fails, since head is not null. Now, we need to declare
a temporary pointer temp in order to traverse the list. temp is made to point to the
first node of the list.
temp = head;
o Then, traverse through the entire linked list using the statements:
while (temp→ next != NULL)
temp = temp → next;
o At the end of the loop, temp will be pointing to the last node of the list. Now,
allocate the space for the new node, and assign the item to its data part. Since the new
node is going to be the last node of the list, the next part of this node needs to
point to NULL. We need to make the next part of the temp node (which is currently the
last node of the list) point to the new node (ptr).
temp = head;
while (temp -> next != NULL)
{
temp = temp -> next;
}
temp->next = ptr;
ptr->next = NULL;
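void lastinsert()
{
    struct node *ptr,*temp;
    int item;
    ptr = (struct node*)malloc(sizeof(struct node));
    if(ptr == NULL)
    {
        printf("\nOVERFLOW");
    }
    else
    {
        printf("\nEnter value?\n");
        scanf("%d",&item);
        ptr->data = item;
        if(head == NULL)
        {
            /* CASE 1: empty list, the new node becomes the only node */
            ptr->next = NULL;
            head = ptr;
        }
        else
        {
            /* CASE 2: walk to the last node and link the new node after it */
            temp = head;
            while (temp->next != NULL)
            {
                temp = temp->next;
            }
            temp->next = ptr;
            ptr->next = NULL;
        }
    }
}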
Algorithm
o STEP 1: IF PTR = NULL
WRITE OVERFLOW
GOTO STEP 12
END OF IF
o STEP 2: SET NEW_NODE = PTR
o STEP 3: NEW_NODE → DATA = VAL
o STEP 4: SET TEMP = HEAD
o STEP 5: SET I = 0
o STEP 6: REPEAT STEPS 7 AND 8 WHILE I < LOC, INCREMENTING I AFTER EACH PASS
o STEP 7: TEMP = TEMP → NEXT
o STEP 8: IF TEMP = NULL
WRITE "DESIRED NODE NOT PRESENT"
GOTO STEP 12
END OF IF
END OF LOOP
o STEP 9: PTR → NEXT = TEMP → NEXT
o STEP 10: TEMP → NEXT = PTR
o STEP 11: SET PTR = NEW_NODE
o STEP 12: EXIT
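The steps above can be sketched in C as follows. This is an illustrative sketch rather than the original function: the name insert_after, the zero-based meaning of loc, and passing the list head and value as parameters (instead of reading them from input) are assumptions made here.

```c
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

/* Inserts val after the node at zero-based index loc.
   Returns 1 on success, 0 if loc is invalid, the list has no such
   node, or memory cannot be allocated. */
int insert_after(struct node *head, int loc, int val)
{
    struct node *temp = head;
    struct node *ptr;

    if (loc < 0)
        return 0;
    for (int i = 0; i < loc && temp != NULL; i++)
        temp = temp->next;            /* move to the specified node */
    if (temp == NULL)
        return 0;                     /* desired node not present */

    ptr = malloc(sizeof(struct node));
    if (ptr == NULL)
        return 0;                     /* overflow: no memory */
    ptr->data = val;
    ptr->next = temp->next;           /* new node points to the successor */
    temp->next = ptr;                 /* specified node points to new node */
    return 1;
}
```

The order of the last two assignments matters: setting temp->next first would lose the address of the rest of the list.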
C Function
void randominsert()
{
int i,loc,item;
Deleting a node from the beginning of the list is the simplest operation of all. It needs only a few
adjustments in the node pointers. Since the first node of the list is to be deleted, we just
need to make head point to the node that follows the current head. This will be done by using the
following statements.
ptr = head;
head = ptr->next;
Now, free the pointer ptr which was pointing to the head node of the list. This will be done by
using the following statement.
free(ptr)
Algorithm
o Step 1: IF HEAD = NULL
Write UNDERFLOW
Go to Step 5
[END OF IF]
o Step 2: SET PTR = HEAD
o Step 3: SET HEAD = HEAD -> NEXT
o Step 4: FREE PTR
o Step 5: EXIT
C function
void begdelete()
{
struct node *ptr;
if(head == NULL)
{
printf("\nList is empty");
}
else
{
ptr = head;
head = ptr->next;
free(ptr);
printf("\n Node deleted from the beginning ...");
}
}
C Function
void end_delete()
{
struct node *ptr,*ptr1;
if(head == NULL)
{
printf("\nlist is empty");
}
else if(head -> next == NULL)
{
free(head);
head = NULL;
printf("\nOnly node of the list deleted ...");
}
else
{
ptr = head;
while(ptr->next != NULL)
{
ptr1 = ptr;
ptr = ptr ->next;
}
ptr1->next = NULL;
free(ptr);
printf("\n Deleted Node from the last ...");
}
}
if(ptr == NULL)
{
printf("\nThere are less than %d elements in the list..",loc);
return;
}
}
Now our task is almost done; we just need to make a few pointer adjustments. Make the next of
ptr1 (which points to the specified node) point to the next of ptr (the node which is to be
deleted), then free ptr. This is done with the following statements.
ptr1->next = ptr->next;
free(ptr);
Algorithm
o STEP 1: IF HEAD = NULL
WRITE UNDERFLOW
GOTO STEP 10
END OF IF
Searching is performed in order to find the location of a particular element in the list. Searching
for an element requires traversing the list and comparing every element of the list with the
specified element. If the specified element matches a list element, then the location of that
element is returned from the function.
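Algorithm
o Step 1: SET PTR = HEAD
o Step 2: SET I = 0
o STEP 3: IF PTR = NULL
WRITE "EMPTY LIST"
GOTO STEP 8
END OF IF
o STEP 4: REPEAT STEP 5 TO 7 WHILE PTR != NULL
o STEP 5: IF PTR → DATA = ITEM
WRITE I+1
END OF IF
o STEP 6: I = I + 1
o STEP 7: PTR = PTR → NEXT
[END OF LOOP]
o STEP 8: EXIT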
C Function
void search()
{
    struct node *ptr;
    int item, i = 0, flag = 1;   /* flag stays 1 until the item is found */
    ptr = head;
    if(ptr == NULL)
    {
        printf("\nEmpty List\n");
    }
    else
    {
        printf("\nEnter item which you want to search?\n");
        scanf("%d",&item);
        while (ptr != NULL)
        {
            if(ptr->data == item)
            {
                printf("item found at location %d ",i+1);
                flag = 0;
                break;           /* stop at the first match */
            }
            i++;
            ptr = ptr -> next;
        }
        if(flag == 1)
        {
            printf("Item not found\n");
        }
    }
}
Algorithm
o STEP 1: SET PTR = HEAD
o STEP 2: IF PTR = NULL
WRITE "EMPTY LIST"
GOTO STEP 7
END OF IF
o STEP 4: REPEAT STEP 5 AND 6 WHILE PTR != NULL
o STEP 5: PRINT PTR→ DATA
o STEP 6: PTR = PTR → NEXT
[END OF LOOP]
o STEP 7: EXIT
DISADVANTAGE
Doubly linked list is a complex type of linked list in which a node contains a pointer to the
previous as well as the next node in the sequence. Therefore, in a doubly linked list, a node
consists of three parts: node data, pointer to the next node in sequence (next pointer) , pointer to
the previous node (previous pointer). A sample node in a doubly linked list is shown in the
figure.
A doubly linked list containing three nodes having numbers from 1 to 3 in their data part, is
shown in the following image.
struct node
{
struct node *prev;
int data;
struct node *next;
};
The prev part of the first node and the next part of the last node will always contain null
indicating end in each direction.
In a singly linked list, we could traverse only in one direction, because each node contains the
address of the next node and has no record of its previous nodes. However, a doubly
linked list overcomes this limitation, because each node also contains the address of its previous node.
Memory Representation of a doubly linked list is shown in the following image. Generally, a
doubly linked list consumes more space for every node and therefore makes basic operations
such as insertion and deletion more expensive. However, we can easily manipulate the elements
of the list since the list maintains pointers in both directions (forward and backward).
In the following image, the first element of the list, i.e. 13, is stored at address 1. The head
pointer points to the starting address 1. Since this is the first element added to the list,
the prev of this node contains null. The next node of the list resides at address 4,
therefore the first node contains 4 in its next pointer.
We can traverse the list in this way until we find a node containing null or -1 in its next part.
Node Creation
struct node
{
struct node *prev;
int data;
struct node *next;
};
struct node *head;
INSERTION
Insertion in doubly linked list at beginning
In a doubly linked list, each node contains two pointers, therefore we have to maintain
more pointers in a doubly linked list than in a singly linked list.
There are two scenarios when inserting an element into a doubly linked list: either the list is empty or it
contains at least one element. Perform the following steps to insert a node at the beginning of a
doubly linked list.
o Allocate the space for the new node in the memory. This will be done by using the
following statement.
ptr = (struct node *)malloc(sizeof(struct node));
o Check whether the list is empty or not. The list is empty if the condition head == NULL
holds. In that case, the node will be inserted as the only node of the list and therefore the
prev and the next pointer of the node will point to NULL and the head pointer will point
to this node.
ptr->next = NULL;
ptr->prev=NULL;
ptr->data=item;
head=ptr;
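o In the second scenario, the condition head == NULL becomes false and the node will be
inserted at the beginning. The next pointer of the node will point to the existing head node,
the prev pointer of the existing head node will point to the new node, and head will then
point to the new node.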
C Function
void insertbeginning( )
{
struct node *ptr = (struct node *)malloc(sizeof(struct node));
int item;
printf("enter the value");
Algorithm
o Step 1: IF PTR = NULL
Write OVERFLOW
Go to Step 11
[END OF IF]
o Step 2: SET NEW_NODE = PTR
o Step 3: SET PTR = PTR -> NEXT
o Step 4: SET NEW_NODE -> DATA = VAL
o Step 5: SET NEW_NODE -> NEXT = NULL
o Step 6: SET TEMP = START
o Step 7: Repeat Step 8 while TEMP -> NEXT != NULL
o Step 8: SET TEMP = TEMP -> NEXT
[END OF LOOP]
o Step 9: SET TEMP -> NEXT = NEW_NODE
o Step 10: SET NEW_NODE -> PREV = TEMP
o Step 11: EXIT
In order to insert a node after the specified node in the list, we need to skip the required number
of nodes in order to reach the mentioned node and then make the pointer adjustments as required.
Use the following steps for this purpose.
o Allocate the memory for the new node. Use the following statements for this.
ptr = (struct node *)malloc(sizeof(struct node));
o Traverse the list by using the pointer temp to skip the required number of nodes in order
to reach the specified node.
temp=head;
for(i=0;i<loc;i++)
{
temp = temp->next;
if(temp == NULL) // temp will be NULL if the list has fewer nodes than the mentioned location
{
return;
}
}
o temp will point to the specified node at the end of the for loop. The new node
needs to be inserted after this node, therefore we need to make a few pointer adjustments
here. Make the next pointer of ptr point to the next node of temp.
ptr → next = temp → next;
Make the prev of the new node ptr point to temp.
ptr → prev = temp;
Make the previous pointer of the next node of temp (if there is one) point to the new node.
This must be done before temp → next is changed.
if(temp → next != NULL) temp → next → prev = ptr;
Make the next pointer of temp point to the new node ptr.
temp → next = ptr;
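The pointer adjustments described above can be sketched as one small C function. This is only an illustration: the names dnode and dll_insert_after are hypothetical, and the function takes the already located node temp as a parameter rather than a location number. Note the order of the assignments: the prev of the old successor is updated while temp->next still points to it.

```c
#include <stdlib.h>

struct dnode {
    struct dnode *prev;
    int data;
    struct dnode *next;
};

/* Links a new node holding item right after temp.
   Returns the new node, or NULL if temp is NULL or malloc fails. */
struct dnode *dll_insert_after(struct dnode *temp, int item)
{
    struct dnode *ptr;

    if (temp == NULL)
        return NULL;
    ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return NULL;             /* overflow */

    ptr->data = item;
    ptr->next = temp->next;      /* new node points forward to the old successor */
    ptr->prev = temp;            /* and backward to temp */
    if (temp->next != NULL)
        temp->next->prev = ptr;  /* old successor points back to the new node */
    temp->next = ptr;            /* finally temp points forward to the new node */
    return ptr;
}
```

When temp is the last node of the list, the check on temp->next skips the backward link, so the same function also covers insertion at the end.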
Algorithm
o Step 1: IF PTR = NULL
Write OVERFLOW
Go to Step 15
[END OF IF]
C Function
void insert_specified(int item)
{
struct node *ptr = (struct node *)malloc(sizeof(struct node));
struct node *temp;
int i, loc;
if(ptr == NULL)
{
printf("\n OVERFLOW");
DELETION OPERATION
Deletion at beginning
Deletion at the beginning of a doubly linked list is the simplest operation. We just need to copy the
head pointer to the pointer ptr and shift the head pointer to its next.
ptr = head;
head = head → next;
Now make the prev of this new head node point to NULL. This is done with the
following statement.
head → prev = NULL;
Now free the pointer ptr by using the free function.
free(ptr);
Algorithm
o STEP 1: IF HEAD = NULL
WRITE UNDERFLOW
GOTO STEP 6
C FUNCTION
void beginning_delete()
{
struct node *ptr;
if(head == NULL)
{
printf("\n UNDERFLOW\n");
}
else if(head->next == NULL)
{
free(head);
head = NULL;
printf("\nNode Deleted\n");
}
else
{
ptr = head;
head = head -> next;
head -> prev = NULL;
free(ptr);
printf("\nNode Deleted\n");
}
}
In order to delete the last node of the list, we need to follow the following steps.
o If the list is already empty then the condition head == NULL will become true and
therefore the operation can not be carried on.
o If there is only one node in the list, then the condition head → next == NULL becomes
true. In this case, we just need to free head and then assign NULL to the head of the
list in order to completely delete the list.
o Otherwise, just traverse the list to reach the last node of the list. This is done with
the following statements.
ptr = head;
while(ptr->next != NULL)
{
ptr = ptr -> next;
}
o ptr will point to the last node of the list at the end of the while loop. Just make the
next pointer of the previous node of ptr NULL and free ptr.
ptr → prev → next = NULL;
free(ptr);
ALGORITHM
o Step 1: IF HEAD = NULL
Write UNDERFLOW
Go to Step 7
[END OF IF]
[END OF LOOP]
C PROGRAM
void last_delete()
{
struct node *ptr;
if(head == NULL)
{
printf("\n UNDERFLOW\n");
}
else if(head->next == NULL)
{
free(head);
head = NULL;
printf("\nNode Deleted\n");
}
else
{
ptr = head;
while(ptr->next != NULL)
{
ptr = ptr -> next;
}
ptr -> prev -> next = NULL;
free(ptr);
printf("\nNode Deleted\n");
}
}
C FUNCTION
void delete_specified( )
{
struct node *ptr, *temp;
int val;
printf("Enter the value");
scanf("%d",&val);
temp = head;
/* stop at the node holding val, or at NULL if val is not in the list */
while(temp != NULL && temp -> data != val)
temp = temp -> next;
if(temp == NULL || temp -> next == NULL)
{
printf("\nCan't delete\n");
}
else if(temp -> next -> next == NULL)
{
ptr = temp -> next;
temp -> next = NULL;
free(ptr);
printf("\nNode Deleted\n");
}
else
{
ptr = temp -> next;
temp -> next = ptr -> next;
ptr -> next -> prev = temp;
free(ptr);
printf("\nNode Deleted\n");
}
}
Algorithm
o Step 1: IF HEAD == NULL
WRITE "UNDERFLOW"
GOTO STEP 8
[END OF IF]
o Step 2: Set PTR = HEAD
o Step 3: Set i = 0
o Step 4: Repeat step 5 to 7 while PTR != NULL
o Step 5: IF PTR → data = item
return i
[END OF IF]
o Step 6: i = i + 1
o Step 7: PTR = PTR → next
o Step 8: Exit
C FUNCTION
void search()
{
struct node *ptr;
int item,i=0,flag;
ptr = head;
if(ptr == NULL)
{
printf("\nEmpty List\n");
}
else
{
printf("\nEnter item which you want to search?\n");
scanf("%d",&item);
while (ptr!=NULL)
{
if(ptr->data == item)
{
printf("\nitem found at location %d ",i+1);
flag=0;
break;
}
else
{
flag=1;
}
i++;
ptr = ptr -> next;
}
if(flag==1)
{
printf("\nItem not found\n");
}
}
}
Traversing is the most common operation for every data structure. For this purpose, copy
the head pointer into a temporary pointer ptr.
ptr = head;
Then traverse the list by using a while loop. Keep shifting the value of the pointer
variable ptr until we move past the last node. The last node contains null in its next part.
while(ptr != NULL)
{
printf("%d\n",ptr->data);
ptr=ptr->next;
}
Traversing means visiting each node of the list once to perform some specific
operation. Here, we are printing the data associated with each node of the list.
Algorithm
WRITE "UNDERFLOW"
GOTO STEP 6
[END OF IF]
C Function
void traverse()
{
struct node *ptr;
if(head == NULL)
{
printf("\nEmpty List\n");
}
else
{
ptr = head;
while(ptr != NULL)
{
printf("%d\n",ptr->data);
ptr=ptr->next;
}
}
}
Singly linked list (SLL) versus doubly linked list (DLL):
o SLL nodes contain 2 fields: a data field and a next link field. DLL nodes contain 3
fields: a data field, a previous link field and a next link field.
o In SLL, the traversal can be done using the next node link only. Thus traversal is
possible in one direction only. In DLL, the traversal can be done using the previous
node link or the next node link. Thus traversal is possible in both directions (forward
and backward).
o The SLL occupies less memory than DLL as it has only 2 fields. The DLL occupies more
memory than SLL as it has 3 fields.
In a circular singly linked list, the last node of the list contains a pointer to the first node of the list.
We traverse a circular singly linked list until we reach the same node where we started. The circular
singly linked list has no beginning and no ending. There is no null value present in the next part of any of
the nodes.
Circular linked lists are mostly used for task maintenance in operating systems. There are many examples
where circular linked lists are used in computer science, including browser surfing, where a record of the
pages visited in the past by the user is maintained in the form of a circular linked list and can be accessed
again on clicking the previous button.
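The two properties described above (the last node points back to the first, and a traversal stops on returning to its starting node) can be illustrated with a short C sketch. The names cnode, circ_append and circ_length are hypothetical and chosen only for this illustration; the functions take the head as a parameter instead of using a global variable.

```c
#include <stdlib.h>

struct cnode {
    int data;
    struct cnode *next;
};

/* Appends item at the end of the circular list and returns the head.
   If malloc fails, the list is returned unchanged. */
struct cnode *circ_append(struct cnode *head, int item)
{
    struct cnode *ptr = malloc(sizeof *ptr);
    struct cnode *temp;

    if (ptr == NULL)
        return head;             /* overflow */
    ptr->data = item;
    if (head == NULL) {
        ptr->next = ptr;         /* a single node points to itself */
        return ptr;
    }
    temp = head;
    while (temp->next != head)   /* the last node is the one pointing to head */
        temp = temp->next;
    temp->next = ptr;
    ptr->next = head;            /* close the circle again */
    return head;
}

/* Counts the nodes by walking until the starting node is reached again. */
int circ_length(const struct cnode *head)
{
    const struct cnode *ptr = head;
    int count = 0;

    if (head == NULL)
        return 0;
    do {
        count++;
        ptr = ptr->next;
    } while (ptr != head);
    return count;
}
```

A do/while loop is used for the traversal because the starting node itself must be visited before the "back at head" test can end the loop.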
i)Creation
#include<stdio.h>
#include<stdlib.h>
void create(int);
struct node
{
int data;
struct node *next;
};
struct node *head;
void main ()
{
int choice,item;
do
{
printf("\n1.Append List\n2.Exit\nEnter your choice?");
scanf("%d",&choice);
switch(choice)
{
case 1:
printf("\nEnter the item\n");
scanf("%d",&item);
create(item);
break;
case 2:
exit(0);
break;
default:
printf("\nPlease enter valid choice\n");
}
}while(choice != 3);
}
void create(int item)
{
ii) Insertion
Insertion into circular singly linked list at beginning
#include<stdio.h>
#include<stdlib.h>
struct node
{
int data;
struct node *next;
};
struct node *head;
void beginsert ();
void lastinsert ();
void display();
void main ()
{
int choice =0;
while(choice != 4)
{
printf("\n*********Main Menu*********\n");
printf("\nChoose one option from the following list ...\n");
printf("\n===============================================\n");
printf("\n1.Insert in begining\n2.Insert at last\n3.display\n4.Exit\n");
printf("\nEnter your choice?\n");
scanf("\n%d",&choice);
switch(choice)
{
case 1:
beginsert();
break;
case 2:
lastinsert();
break;
case 3:
display();
case 4:
}
void lastinsert()
{
struct node *ptr,*temp;
int item;
ptr = (struct node *)malloc(sizeof(struct node));
if(ptr == NULL)
{
printf("\nOVERFLOW\n");
}
else
{
printf("\nEnter Data?");
printf("\nnode inserted\n");
}
void display()
{
struct node *ptr;
ptr=head;
if(head == NULL)
{
printf("\nnothing to print");
}
else
{
printf("\n printing values ... \n");
iii) Deletion
#include<stdio.h>
#include<stdlib.h>
struct node
{
int data;
struct node *next;
};
struct node *head;
void create();
void begin_delete();
void last_delete();
void display();
void main ()
{
int choice =0;
while(choice != 5)
case 1:
create();
break;
case 2:
begin_delete();
break;
case 3:
last_delete();
break;
case 4:
display();
break;
case 5:
exit(0);
break;
default:
printf("Please enter valid choice..");
}
}
}
void create()
{
struct node *ptr,*temp;
int item;
ptr = (struct node *)malloc(sizeof(struct node));
if(ptr == NULL)
{
printf("\nOVERFLOW\n");
}
else
{
printf("\nEnter Data?");
scanf("%d",&item);
ptr->data = item;
if(head == NULL)
{
head = ptr;
ptr -> next = head;
}
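else
{
/* list is not empty: find the last node and link the new node in front of head */
temp = head;
while(temp->next != head)
temp = temp->next;
ptr->next = head;
temp->next = ptr;
head = ptr;
}
printf("\nnode inserted\n");
}
}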
void begin_delete()
{
struct node *ptr;
if(head == NULL)
{
printf("\nUNDERFLOW");
}
else if(head->next == head)
{
free(head);
head = NULL;
printf("\nnode deleted\n");
}
else
{
ptr = head;
while(ptr -> next != head)
ptr = ptr -> next;
ptr->next = head->next;
free(head);
head = ptr->next;
printf("\nnode deleted\n");
}
}
void last_delete()
{
struct node *ptr, *preptr;
if(head==NULL)
{
printf("\nUNDERFLOW");
}
else if (head ->next == head)
{
free(head);
head = NULL;
}
else
{
ptr = head;
while(ptr ->next != head)
{
preptr=ptr;
ptr = ptr->next;
}
preptr->next = ptr -> next;
free(ptr);
printf("\nnode deleted\n");
}
}
void display()
{
struct node *ptr;
ptr=head;
if(head == NULL)
{
printf("\nnothing to print");
}
else
{
printf("\n printing values ... \n");
#include<stdio.h>
#include<stdlib.h>
void create(int);
void traverse();
struct node
{
int data;
struct node *next;
};
struct node *head;
void main ()
{
int choice,item;
do
{
printf("\n1.Append List\n2.Traverse\n3.Exit\nEnter your choice?");
scanf("%d",&choice);
switch(choice)
{
case 1:
printf("\nEnter the item\n");
scanf("%d",&item);
create(item);
break;
case 2:
traverse();
break;
case 3:
exit(0);
break;
default:
printf("\nPlease enter valid choice\n");
}
}while(choice != 3);
}
void create(int item)
{
}
void traverse()
{
struct node *ptr;
ptr=head;
if(head == NULL)
{
printf("\nnothing to print");
}
else
{
printf("\n printing values ... \n");
DICTIONARIES:
Dictionary is a collection of pairs of key and value where every value is associated with the
corresponding key.
Basic operations that can be performed on dictionary are:
1. Insertion of value in the dictionary
2. Deletion of particular value from dictionary
3. Searching of a specific value with the help of key
class dictionary
{
private:
int k,data;
struct node
{
public: int key;
int value;
struct node *next;
} *head;
public:
dictionary();
void insert_d( );
void delete_d( );
void display_d( );
void length();
};
Now, as head is NULL, this new node becomes head. Hence the dictionary contains only one
record. This node will be ‘curr’ and ‘prev’ as well. The ‘curr’ node always points to the node
currently being visited and ‘prev’ always points to the node previous to the ‘curr’ node. As there is
only one node in the list now, mark the ‘curr’ node as the ‘prev’ node too.
[Figure: a single node <1,10> with next = NULL, labelled New/head/curr/prev, and a second node New = <4,20> with next = NULL]
Compare the key value of the ‘curr’ and ‘New’ nodes. If New->key > curr->key, then attach the New node
to the ‘curr’ node.
If we insert <3,15>, then we have to search for its proper position by comparing key
values. (curr->key < New->key) is false. Hence the else part will get executed.
[Figure: sorted chain <1,10> -> <4,20> -> <7,80> -> NULL with the new node <3,15> to be placed in it]
void dictionary::insert_d( )
{
    node *p,*curr,*prev;
    cout<<"Enter a key and value to be inserted:";
    cin>>k;
    cin>>data;
    p=new node;
    p->key=k;
    p->value=data;
    p->next=NULL;
    if(head==NULL)
        head=p;
    else
    {
        prev=NULL;
        curr=head;
        // walk the sorted chain while the keys are smaller than the new key
        while((curr->key<p->key)&&(curr->next!=NULL))
        {
            prev=curr;
            curr=curr->next;
        }
        if(curr->key<p->key)
        {
            // reached the last node and it is still smaller: append
            curr->next=p;
        }
        else if(prev==NULL)
        {
            // the new key is the smallest: insert before head
            p->next=head;
            head=p;
        }
        else
        {
            // insert between prev and curr
            p->next=prev->next;
            prev->next=p;
        }
    }
    cout<<"\nInserted into dictionary Successfully.....\n";
}
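Case 1: Initially assign the ‘head’ node as the ‘curr’ node. Then ask for the key value of the node which is
to be deleted. Then, starting from the head node, the key value of each node is checked and compared with the
desired node’s key value. We will get the node which is to be deleted in the variable ‘curr’. The node
given by the variable ‘prev’ keeps track of the node previous to the ‘curr’ node. For eg, delete the node
with key value 4:
[Figure: chain <1,10> -> <3,15> -> <4,20> -> <7,80> -> NULL with ‘curr’ marking the node to be deleted]
Case 2: If the node to be deleted is the ‘head’ node, then simply make the next node the ‘head’ node and
delete ‘curr’.
[Figure: chain <1,10> -> <3,15> -> <4,20> -> <7,80> -> NULL with ‘curr’ at the first node and ‘head’ moved to the next node; after the deletion the chain is <3,15> -> <4,20> -> <7,80> -> NULL]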
void dictionary::delete_d( )
{
    node *curr,*prev;
    cout<<"Enter key value that you want to delete...";
    cin>>k;
    if(head==NULL)
    {
        cout<<"\ndictionary is Underflow";
        return;
    }
    prev=NULL;
    curr=head;
    // search the chain for the node whose key matches k
    while(curr!=NULL)
    {
        if(curr->key==k)
            break;
        prev=curr;
        curr=curr->next;
    }
    if(curr==NULL)
        cout<<"Node not found...";
    else
    {
        if(curr==head)
            head=curr->next;
        else
            prev->next=curr->next;
        delete curr;
        cout<<"Item deleted from dictionary...";
    }
}
[Figure: sorted chain of the nodes 1 to 7 between a head node and a tail node]
The skip list is an efficient implementation of dictionary using sorted chain. This is because in
skip list each node consists of forward references of more than one node at a time.
Eg:
[Figure: a sorted chain ending in NULL]
Now, to search for any node in the sorted chain given above, we have to scan the chain from the
head node, visiting each node. This searching time can be reduced if we add one level to
every alternate node. This extra level contains the forward pointer to some node. That means that in
the sorted chain some nodes can hold pointers to more than one node.
[Figure: the sorted chain with an extra level of forward pointers on every alternate node, ending in NULL]
If we want to search for node 40 in the above chain, we will require comparatively less time. This
search can be made even more efficient if we add a few more forward references.
[Figure: skip list with further levels of forward pointers, ending in NULL]
The individual node looks like this:
Element *next
Searching:
The desired node is searched with the help of a key value.
Searching for a key within a skip list begins at the header, at the overall (highest) list level, and
moves forward in the list comparing node keys to the key_val. If the node key is less than the
key_val, the search continues moving forward at the same level. If, on the other hand, the node key
is equal to or greater than the key_val, the search drops one level and continues forward. This
process continues until the desired key_val has been found, if it is present in the skip list. If it is
not, the search continues until either the end of the list is reached or the first key with a value
greater than the search key is found.
Insertion:
There are two tasks that should be done before insertion operation:
1. Before insertion of any node, the place for this new node in the skip list is searched. Hence
the search routine executes before any insertion takes place. The last[] array in the search
routine is used to keep track of the references to the nodes where the search drops down
one level.
2. The level for the new node is retrieved by the routine randomelevel().
skipNode<K,E>* temp = search(New_pair.key);
if(temp->element.key == New_pair.key)
{
temp->element.value=New_pair.value;
return;
}
for(int i=0;i<=New_Level;i++)
{
newNode->next[i] = last[i]->next[i];
last[i]->next[i] = newNode;
}
len++;
return;
}
Deletion:
First of all, the deletion makes use of search algorithm and searches the node that is to be deleted.
If the key to be deleted is found, the node containing the key is removed.
for(int i=0;i<=levels;i++)
{
if(last[i]->next[i] == temp)
last[i]->next[i] = temp->next[i];
}
For example: Consider that we want to place some employee records in the hash table. The record of
an employee is placed with the help of a key: the employee ID. The employee ID is a 7 digit number.
To place the record, the 7 digit number is converted into 3 digits
by taking only the last three digits of the key.
If the key is 4967000, it can be stored at the 0th position. For the second key, 8421002, the record
is placed at the 2nd position in the array.
Hence the hash function will be H(key) = key%1000
where key%1000 is the hash function and the key obtained by the hash function is called the hash key.
Bucket and Home bucket: The hash function H(key) is used to map several dictionary
entries into the hash table. Each position of the hash table is called a bucket.
H(key) is the home bucket for the dictionary pair whose key is key.
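As a small illustration of the division-style hash function H(key) = key%1000 described above, the following C sketch computes the home bucket of a key. The function name hash_division and the table_size parameter are assumptions made for this example; the text itself fixes the divisor at 1000.

```c
/* Returns the home bucket of key for a table with table_size buckets.
   table_size must not be zero. With table_size = 1000 this keeps the
   last three decimal digits of the key. */
unsigned int hash_division(unsigned long key, unsigned int table_size)
{
    return (unsigned int)(key % table_size);
}
```

With table_size = 1000, the key 8421002 from the example maps to bucket 2, since only its last three digits (002) survive the remainder operation.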
h(key) = record % table size
54 % 10 = 4
72 % 10 = 2
89 % 10 = 9
37 % 10 = 7
Resulting hash table (index: record):
0:
1:
2: 72
3:
4: 54
5:
6:
7: 37
8:
9: 89
2. Mid Square:
In the mid square method, the key is squared and the middle or mid part of the result is used as the
index. If the key is a string, it has to be preprocessed to produce a number.
Consider that if we want to place a record 3111 then
3111² = 9678321
for the hash table of size 1000
H(3111) = 783 (the middle 3 digits)
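The mid square calculation above can be reproduced with a short C sketch. The function name mid_square and its digits parameter are hypothetical. When the number of digits dropped on the two sides cannot be equal, this sketch drops one digit more on the left; that is a choice made here, not something the text specifies.

```c
/* Squares key and returns the middle `digits` decimal digits of the
   square. If the square has no more than `digits` digits, the whole
   square is returned. Assumes key is small enough that key * key
   fits in an unsigned long long. */
unsigned long long mid_square(unsigned long key, int digits)
{
    unsigned long long sq = (unsigned long long)key * key;
    unsigned long long t = sq;
    unsigned long long div = 1, mod = 1;
    int len = 0, skip, i;

    do {                         /* count the decimal digits of the square */
        len++;
        t /= 10;
    } while (t != 0);
    if (digits >= len)
        return sq;

    skip = (len - digits) / 2;   /* digits to drop on the right */
    for (i = 0; i < skip; i++)
        div *= 10;
    for (i = 0; i < digits; i++)
        mod *= 10;
    return (sq / div) % mod;     /* drop the right part, keep `digits` digits */
}
```

For the record 3111 and a table of size 1000, mid_square(3111, 3) squares the key to 9678321, drops the two rightmost digits and keeps 783.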
H(key) = floor(p * (fractional part of key * A)), where p is an integer constant and A is a constant real
number.
For example, with key = 107, p = 50 and A = 0.61803398987:
key * A = 107 * 0.61803398987 = 66.12963691609
H(107) = floor(50 * 0.12963691609)
= floor(6.4818458045)
= 6
The record 107 will be placed at location 6 in the hash table.
4. Digit Folding:
The key is divided into separate parts and using some simple operation these parts are
combined to produce the hash key.
For eg; consider a record 12365412 then it is divided into separate parts as 123 654 12 and these
are added together
H(key) = 123+654+12
= 789
The record will be placed at location 789
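A minimal C sketch of the digit folding calculation above. The name fold_hash and the group parameter are assumptions for this illustration; the key is split from left to right into groups of `group` decimal digits, so that 12365412 yields 123, 654 and 12 as in the example. The sum may exceed the table size, in which case it would still have to be reduced (for instance with a remainder operation), which the text does not cover.

```c
#include <stdio.h>

/* Splits the decimal digits of key, left to right, into parts of
   `group` digits (the last part may be shorter) and adds the parts. */
unsigned long fold_hash(unsigned long key, int group)
{
    char digits[32];
    unsigned long sum = 0, part = 0;
    int i, n, in_part = 0;

    if (group < 1)
        group = 1;
    n = snprintf(digits, sizeof digits, "%lu", key);
    for (i = 0; i < n; i++) {
        part = part * 10 + (unsigned long)(digits[i] - '0');
        in_part++;
        if (in_part == group || i == n - 1) {
            sum += part;         /* a part is complete: add it to the sum */
            part = 0;
            in_part = 0;
        }
    }
    return sum;
}
```

Calling fold_hash(12365412, 3) adds 123 + 654 + 12 and returns 789.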
5. Digit Analysis:
The digit analysis is used in a situation when all the identifiers are known in advance. We
first transform the identifiers into numbers using some radix, r. Then examine the digits of each
identifier. Some digits having most skewed distributions are deleted. This deleting of digits is
continued until the number of remaining digits is small enough to give an address in the range of
the hash table. Then these digits are used to calculate the hash address.
COLLISION
The hash function is a function that returns the key value using which the record can be placed in
the hash table. Thus this function helps us in placing the record in the hash table at an appropriate
position, and due to this we can retrieve the record directly from that location. This function needs
to be designed very carefully, and ideally it should not return the same hash key address for two different
records. This is an undesirable situation in hashing.
Definition: The situation in which the hash function returns the same hash key (home bucket) for
more than one record is called collision, and two different records that receive the same hash key are
called synonyms.
Similarly, when there is no room for a new pair in the hash table, such a situation is
called overflow. Sometimes when we handle collision it may lead to overflow conditions.
Frequent collisions and overflows indicate a poor hash function.
For example, consider a hash function H(key) = recordkey%10 with a hash table of size 10.
The record keys to be placed are 131, 44, 43, 78, 19, 36, 57 and 77.
131%10=1
44%10=4
43%10=3
78%10=8
19%10=9
36%10=6
57%10=7
77%10=7
Hash table after placing every key except 77 (index: key):
0:
1: 131
2:
3: 43
4: 44
5:
6: 36
7: 57
8: 78
9: 19
Now if we try to place 77 in the hash table, then we get the hash key 7, and at index 7 the
record key 57 is already placed. This situation is called collision. If, from index 7, we look for the next
vacant position at the subsequent indices 8 and 9, then we find that there is no room to place 77 in the hash
table. This situation is called overflow.
CHAINING
In the chaining method of collision handling, an additional field, the chain, is introduced with the data.
A separate chain table is maintained for colliding data. When a collision occurs, a linked
list (chain) is maintained at the home bucket.
For eg;
Here D = 10
[Figure: hash table with indices 0 to 9; index 1 holds the chain 131 -> 21 -> 61 -> NULL and index 7 holds 97 -> NULL]
A chain is maintained for colliding elements. For instance, 131 has home bucket (key) 1. Similarly,
keys 21 and 61 demand home bucket 1. Hence a chain is maintained at index 1.
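The chaining scheme described above can be sketched in C as an array of D list heads, one per home bucket. The names chain_node, chain_insert and chain_search are hypothetical, and the sketch assumes non-negative integer keys with the division hash key % D; new keys are appended at the end of their chain so that the order matches the figure.

```c
#include <stdlib.h>

#define D 10                     /* number of buckets, as in the example */

struct chain_node {
    int key;
    struct chain_node *next;
};

/* Appends key to the chain of its home bucket key % D.
   Returns 0 on success, -1 if memory cannot be allocated. */
int chain_insert(struct chain_node **table, int key)
{
    struct chain_node *n = malloc(sizeof *n);
    struct chain_node **link;

    if (n == NULL)
        return -1;
    n->key = key;
    n->next = NULL;
    link = &table[key % D];      /* start at the head of the home bucket */
    while (*link != NULL)        /* move to the end of the chain */
        link = &(*link)->next;
    *link = n;
    return 0;
}

/* Returns 1 if key is stored in the table, otherwise 0. */
int chain_search(struct chain_node **table, int key)
{
    const struct chain_node *n = table[key % D];

    while (n != NULL) {
        if (n->key == key)
            return 1;
        n = n->next;
    }
    return 0;
}
```

Starting from a table whose D entries are all NULL, inserting 131, 21 and 61 builds the chain 131 -> 21 -> 61 at index 1, and a search only has to walk that one chain.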
For example:
Consider that following keys are to be inserted in the hash table 131,
Initially, we will put the following keys in the hash table.
We will use Division hash function. That means the keys are placed using the formula
H(key) = 131 % 10
=1
Index 1 will be the home bucket for 131. Continuing in this fashion we will place 4, 8, 7.
Now the next key to be inserted is 21. According to the hash function
H(key)=21%10
H(key) = 1
But the index 1 location is already occupied by 131, i.e. a collision occurs. To resolve this collision
we will move linearly down the table and place the element at the next empty location. Therefore
21 will be placed at index 2. If the next element is 5, then we get the home bucket for 5 as
index 5, and this bucket is empty, so we will put the element 5 at index 5.
The next record key is 9. According to the division hash function it demands home bucket 9.
Hence we will place 9 at index 9. Now the final record key is 29 and it hashes to key 9. But
home bucket 9 is already occupied, and there is no next empty bucket as the table ends
at index 9. Overflow occurs. To handle it, we move back to bucket 0, and if that location
is empty, 29 will be placed at the 0th index.
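The probing walk described above, including the wrap around to bucket 0, can be sketched in C as follows. The names lp_init and lp_insert and the marker LP_EMPTY are assumptions made for this illustration, and the sketch assumes non-negative keys so that -1 can mark an empty bucket.

```c
#define LP_SIZE 10               /* table size used in the example */
#define LP_EMPTY (-1)            /* marks an unused bucket */

/* Marks every bucket of the table as empty. */
void lp_init(int *table)
{
    int i;
    for (i = 0; i < LP_SIZE; i++)
        table[i] = LP_EMPTY;
}

/* Places key in its home bucket key % LP_SIZE or, if that bucket is
   occupied, in the next free bucket, wrapping around to index 0.
   Returns the index used, or -1 if the table is full. */
int lp_insert(int *table, int key)
{
    int home = key % LP_SIZE;
    int i;

    for (i = 0; i < LP_SIZE; i++) {
        int idx = (home + i) % LP_SIZE;   /* wrap around after the last index */
        if (table[idx] == LP_EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;                   /* every bucket was probed: overflow */
}
```

Inserting the keys mentioned in the walk-through (131, 4, 8, 7, 21, 5, 9 and 29, in that order) places 21 at index 2 and 29 at index 0, as described in the text.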
Problem with linear probing:
One major problem with linear probing is primary clustering. Primary clustering is a process in which a block
of data is formed in the hash table when collisions are resolved.
19%10 = 9
18%10 = 8
39%10 = 9
29%10 = 9
8%10 = 8
Hash table after inserting these keys in order (index: key); a cluster is formed at indices 8, 9, 0, 1 and 2:
0: 39
1: 29
2: 8
3:
4:
5:
6:
7:
8: 18
9: 19
QUADRATIC PROBING:
Quadratic probing operates by taking the original hash value and adding successive values of an arbitrary
quadratic polynomial to the starting value. This method uses the following formula.
Hi(key) = (key + i²) % table size, for i = 0, 1, 2, ...
for eg; If we have to insert following elements in the hash table with table size 10:
Consider i = 0 then
(17 + 0²) % 10 = 7
(17 + 1²) % 10 = 8, when i = 1
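The two probe calculations above (i = 0 gives index 7, i = 1 gives index 8) follow the pattern (key + i*i) % table size. A minimal C sketch of an insert routine built on that pattern is given below; the names qp_insert and QP_EMPTY are hypothetical, keys are assumed to be non-negative, and the sketch simply gives up after as many probes as there are buckets, since quadratic probing is not guaranteed to visit every bucket.

```c
#define QP_EMPTY (-1)            /* marks an unused bucket */

/* Tries the buckets (key + i*i) % size for i = 0, 1, 2, ... and stores
   key in the first empty one. Returns the index used, or -1 if no
   empty bucket was found within `size` probes. */
int qp_insert(int *table, int size, int key)
{
    int i;

    for (i = 0; i < size; i++) {
        int idx = (key + i * i) % size;
        if (table[idx] == QP_EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

With a table of size 10 in which index 7 is already occupied, inserting 17 fails at i = 0 (index 7) and succeeds at i = 1 (index 8), matching the calculation above.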
H1(37) = 37 % 10 = 7
H1(90) = 90 % 10 = 0
H1(45) = 45 % 10 = 5
H1(22) = 22 % 10 = 2
H1(49) = 49 % 10 = 9
Hash table after placing these keys (index: key):
0: 90
1:
2: 22
3:
4:
5: 45
6:
7: 37
8:
9: 49
Now if 17 is to be inserted, then
H1(17) = 17 % 10 = 7
H2(key) = M - (key % M)
Here M is a prime number smaller than the size of the table. The prime number
smaller than the table size 10 is 7.
Hence M = 7
H2(17) = 7 - (17 % 7)
= 7 - 3 = 4
That means we have to insert the element 17 at 4 places from 37. In short, we have to take 4
jumps. Therefore 17 will be placed at index 1.
For 55, the home bucket H1(55) = 55 % 10 = 5 is already occupied by 45.
H2(55) = 7 - (55 % 7)
= 7 - 6 = 1
That means we have to take one jump from index 5 to place 55, so 55 is placed at index 6.
Finally the hash table will be (index: key):
0: 90
1: 17
2: 22
3:
4:
5: 45
6: 55
7: 37
8:
9: 49
Double hashing requires a second hash function, whose probing efficiency should match that of
a hash function that resolves collisions randomly.
Double hashing is more complex to implement than quadratic probing. Quadratic
probing is a faster technique than double hashing.
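The jump calculation with H1(key) = key % 10 and H2(key) = M - (key % M), M = 7, can be sketched in C as follows. The names dh_insert, DH_SIZE, DH_PRIME and DH_EMPTY are assumptions made for this illustration, and keys are assumed to be non-negative. Each further probe moves H2(key) buckets ahead of the previous one, wrapping around the end of the table.

```c
#define DH_SIZE 10               /* table size used in the example */
#define DH_PRIME 7               /* M: a prime smaller than the table size */
#define DH_EMPTY (-1)            /* marks an unused bucket */

/* Stores key at H1(key), or H2(key) buckets further on for every
   collision. Returns the index used, or -1 if no empty bucket was
   found within DH_SIZE probes. */
int dh_insert(int *table, int key)
{
    int h1 = key % DH_SIZE;
    int h2 = DH_PRIME - (key % DH_PRIME);   /* jump length, never 0 */
    int i;

    for (i = 0; i < DH_SIZE; i++) {
        int idx = (h1 + i * h2) % DH_SIZE;
        if (table[idx] == DH_EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

After 37, 90, 45, 22 and 49 have been placed in their home buckets, inserting 17 jumps 4 places from index 7 to index 1, and inserting 55 jumps 1 place from index 5 to index 6.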
REHASHING
Rehashing is a technique in which the table is resized, i.e., the size of the table is doubled by creating
a new table. It is preferable if the total size of the table is a prime number. There are situations in
which rehashing is required.
In such situations, we have to transfer the entries from the old table to the new table by recomputing their
positions using the hash function.
Consider that we have to insert the elements 37, 90, 55, 22, 17, 49 and 87. The table size is 10 and we
will use the hash function H(key) = key % 10.
37 % 10 = 7
90 % 10 = 0
55 % 10 = 5
22 % 10 = 2
17 % 10 = 7 (collision, solved by linear probing)
49 % 10 = 9
Now this table is almost full, and if we try to insert more elements, collisions will occur and eventually
further insertions will fail. Hence we will rehash by doubling the table size. The old table size is
10, so doubling gives a new table size of 20. But 20 is not a prime number, so
we will prefer to make the table size 23. The new hash function will be
H(key) = key % 23
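A minimal C sketch of the transfer step, assuming integer keys, -1 as the empty marker and linear probing for any collision in the new table. The names rehash, rh_place and RH_EMPTY are hypothetical. The caller chooses the new size (23 in the example above) and is responsible for freeing the returned table.

```c
#include <stdlib.h>

#define RH_EMPTY (-1)            /* marks an unused bucket */

/* Places key in a table of `size` buckets using key % size and
   linear probing. Returns the index used, or -1 if the table is full. */
static int rh_place(int *table, int size, int key)
{
    int i;

    for (i = 0; i < size; i++) {
        int idx = (key % size + i) % size;
        if (table[idx] == RH_EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}

/* Builds a new table of new_size buckets and re-inserts every key of
   the old table at its recomputed position. Returns the new table,
   or NULL if memory cannot be allocated. */
int *rehash(const int *old, int old_size, int new_size)
{
    int *fresh = malloc((size_t)new_size * sizeof *fresh);
    int i;

    if (fresh == NULL)
        return NULL;
    for (i = 0; i < new_size; i++)
        fresh[i] = RH_EMPTY;
    for (i = 0; i < old_size; i++)
        if (old[i] != RH_EMPTY)
            rh_place(fresh, new_size, old[i]);
    return fresh;
}
```

For example, a key 37 that sat at index 7 in the table of size 10 moves to index 37 % 23 = 14 in the new table, and 17, which needed linear probing before, now gets its own home bucket 17.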
Advantages:
1. This technique provides the programmer a flexibility to enlarge the table size if required.
2. Only the space gets doubled with simple hash function which avoids occurrence of
collisions.
EXTENSIBLE HASHING
Extensible hashing is a technique which handles a large amount of data. The data is
placed in the hash table by extracting a certain number of bits.
Extensible hashing grows and shrinks in a way similar to B-trees.
In extensible hashing, the elements are placed in buckets by referring to the directory.
The levels are indicated in parentheses.
[Figure: directory with entries 0 and 1, levels (0) and (1), and the buckets in which the data is placed]
A bucket can hold as many data entries as its depth. If the data in a bucket is more than
the depth, then split the bucket and double the directory.
Consider that we have to insert 1, 4, 5, 7, 8, 10. Assume each page can hold 2 data entries (2 is the
depth).
Step 1: Insert 1, 4
1 = 001
4 = 100
We will examine the last bit of the data and insert the data in the bucket.
[Figure: directory and buckets after inserting 1 and 4]
1 = 001
4 = 100
5 = 101
Based on the last bit the data is inserted.
[Figure: directory with entries 0 and 1 and levels (0) and (1) after inserting 5]
Step 2: Insert 7
7 = 111
But as the depth is full, we cannot insert 7 here. So double the directory and split the bucket.
After the insertion of 7, consider the last two bits.
[Figure: directory with entries 00, 01, 10 and 11 and levels (2) and (1); the values 001, 111, 100, 010 and 1000 appear in the buckets]
Step 4: Insert 10
Thus the data is inserted using extensible hashing.
Deletion Operation:
Delete 7.
[Figure: directory with entries 00, 01, 10 and 11 before and after deleting 7; the remaining buckets hold 100, 001 and 101]
Applications of hashing:
1. In compilers to keep track of declared variables.
2. For online spelling checking the hashing functions are used.
3. Hashing helps in Game playing programs to store the moves made.
4. For browser program while caching the web pages, hashing is used.
5. Construct a message authentication code (MAC)
6. Digital signature.
7. Time stamping
8. Key updating: key is hashed at specific intervals resulting in new key
UNIT IV
Graphs: Graph Implementation Methods. Graph Traversal Methods. (DFS,BFS)
Sorting: Heap Sort, External Sorting- Model for external sorting, Merge Sort
Introduction to Graphs
A graph is a non-linear data structure. It contains a set of points known as nodes (or vertices)
and a set of links known as edges (or arcs). The edges connect the vertices. A graph is
defined as follows:
A graph is a collection of vertices (nodes) and arcs (edges) in which the vertices are connected by the arcs.
Generally, a graph G is represented as G = (V, E), where V is the set of vertices and E is the set of
edges.
Example
The following is a graph with 5 vertices and 7 edges.
This graph G can be defined as G = (V, E),
where V = {A, B, C, D, E} and E = {(A,B), (A,C), (A,D), (B,D), (C,D), (B,E), (E,D)}.
Graph Terminology
We use the following terms in graph data structure...
Vertex
Individual data element of a graph is called as Vertex. Vertex is also known as node. In above
example graph, A, B, C, D & E are known as vertices.
Edge
An edge is a connecting link between two vertices. Edge is also known as Arc. An edge is
represented as (startingVertex, endingVertex). For example, in above graph the link between
vertices A and B is represented as (A,B). In above example graph, there are 7 edges (i.e., (A,B),
(A,C), (A,D), (B,D), (B,E), (C,D), (D,E)).
Undirected Graph
A graph with only undirected edges is said to be undirected graph.
Directed Graph
A graph with only directed edges is said to be directed graph.
Mixed Graph
A graph with both undirected and directed edges is said to be mixed graph.
Origin
If an edge is directed, its first endpoint is said to be the origin of the edge.
Destination
If an edge is directed, its second endpoint is said to be the destination of the edge.
Adjacent
If there is an edge between vertices A and B then both A and B are said to be adjacent. In other
words, vertices A and B are said to be adjacent if there is an edge between them.
Incident
Edge is said to be incident on a vertex if the vertex is one of the endpoints of that edge.
Outgoing Edge
A directed edge is said to be outgoing edge on its origin vertex.
Incoming Edge
A directed edge is said to be incoming edge on its destination vertex.
Degree
Total number of edges connected to a vertex is said to be degree of that vertex.
Indegree
Total number of incoming edges connected to a vertex is said to be indegree of that vertex.
Outdegree
Total number of outgoing edges connected to a vertex is said to be outdegree of that vertex.
Self-loop
An edge is said to be a self-loop if its two endpoints are the same vertex.
Simple Graph
A graph is said to be simple if there are no parallel and self-loop edges.
Path
A path is a sequence of alternate vertices and edges that starts at a vertex and ends at other
vertex such that each edge is incident to its predecessor and successor vertex.
Graph Representations
Incidence Matrix
In this representation, the graph is represented using a matrix of size (total number of vertices)
by (total number of edges). That means a graph with 4 vertices and 6 edges is represented using a
matrix of size 4x6. In this matrix, rows represent vertices and columns represent edges. The
matrix is filled with 0, 1 or -1. Here, 0 means that the column edge is not connected to the
row vertex, 1 means that the column edge is an outgoing edge of the row vertex, and -1 means
that the column edge is an incoming edge of the row vertex.
Adjacency List
In this representation, every vertex of a graph contains list of its adjacent vertices.
For example, consider the following directed graph representation implemented using linked
list...
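A minimal sketch of an adjacency list for a directed graph, assuming vertices are numbered from 0 and that at most MAX_VERTICES vertices are needed. The type and function names (adj_node, graph, graph_add_edge and so on) are chosen for this illustration only.

```c
#include <stdlib.h>

#define MAX_VERTICES 10   /* assumed upper bound for this sketch */

struct adj_node {
    int vertex;               /* index of the adjacent vertex */
    struct adj_node *next;    /* next entry in the same list */
};

struct graph {
    int n;                                /* number of vertices */
    struct adj_node *head[MAX_VERTICES];  /* one linked list per vertex */
};

/* Start with n vertices and no edges. */
void graph_init(struct graph *g, int n)
{
    g->n = n;
    for (int i = 0; i < n; i++)
        g->head[i] = NULL;
}

/* Add the directed edge (from, to) at the front of from's list.
   Returns 0 on success, -1 if memory cannot be allocated. */
int graph_add_edge(struct graph *g, int from, int to)
{
    struct adj_node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->vertex = to;
    node->next = g->head[from];
    g->head[from] = node;
    return 0;
}

/* Return 1 if the directed edge (from, to) exists, else 0. */
int graph_has_edge(const struct graph *g, int from, int to)
{
    for (const struct adj_node *p = g->head[from]; p != NULL; p = p->next)
        if (p->vertex == to)
            return 1;
    return 0;
}

/* Outdegree of v: the length of its adjacency list. */
int graph_outdegree(const struct graph *g, int v)
{
    int count = 0;
    for (const struct adj_node *p = g->head[v]; p != NULL; p = p->next)
        count++;
    return count;
}

/* Release every list node. */
void graph_free(struct graph *g)
{
    for (int i = 0; i < g->n; i++) {
        struct adj_node *p = g->head[i];
        while (p != NULL) {
            struct adj_node *next = p->next;
            free(p);
            p = next;
        }
        g->head[i] = NULL;
    }
}
```

Storing only the edges that exist makes this representation compact for sparse graphs, at the cost of a list scan to test whether a particular edge is present.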
Graph Traversal
Graph traversal is a technique used to search for a vertex in a graph. It also decides the order
in which vertices are visited during the search. A graph traversal chooses the edges to follow
without creating loops, so that every vertex of the graph is visited without getting stuck in a
looping path.
There are two graph traversal techniques and they are as follows...
1. DFS (Depth First Search)
2. BFS (Breadth First Search)
DFS (Depth First Search)
DFS traversal of a graph produces a spanning tree as final result. Spanning Tree is a graph
without loops. We use Stack data structure with maximum size of total number of vertices in the
graph to implement DFS traversal.
Example
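The stack-based DFS described above can be sketched as follows. The sketch assumes the graph is given as an adjacency matrix over N = 5 vertices (A to E numbered 0 to 4) and that the lowest-numbered unvisited neighbour is explored first; both choices are assumptions of this illustration, not part of the notes.

```c
#define N 5   /* vertices A..E of the example graph, numbered 0..4 */

/* Iterative depth-first search from `start` over an adjacency matrix.
   The visit order is written to order[]; the return value is the
   number of vertices reached. */
int dfs(int adj[N][N], int start, int order[N])
{
    int visited[N] = {0};
    int stack[N];          /* at most N entries: each vertex is pushed once */
    int top = -1, count = 0;

    visited[start] = 1;
    order[count++] = start;
    stack[++top] = start;

    while (top >= 0) {
        int v = stack[top];
        int next = -1;
        /* look for an unvisited neighbour of the vertex on top of the stack */
        for (int w = 0; w < N; w++) {
            if (adj[v][w] && !visited[w]) {
                next = w;
                break;
            }
        }
        if (next == -1) {
            top--;                      /* dead end: backtrack */
        } else {
            visited[next] = 1;          /* go one level deeper */
            order[count++] = next;
            stack[++top] = next;
        }
    }
    return count;
}
```

For the example graph with edges (A,B), (A,C), (A,D), (B,D), (C,D), (B,E), (E,D), starting at A, this sketch visits the vertices in the order A, B, D, C, E. The edges along which new vertices are discovered form the spanning tree.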
SORTING INTRODUCTION
The term sorting came into picture, as humans realized the importance of searching quickly.
There are so many things in our real life that we need to search for, like a particular record in
database, roll numbers in merit list, a particular telephone number in telephone directory, a
particular page in a book etc. All this would have been a mess if the data was kept unordered
and unsorted, but fortunately the concept of sorting came into existence, making it easier for
everyone to arrange data in an order, hence making it easier to search.
Sorting Efficiency
The two main criteria to judge which algorithm is better than the other have been:
1. Time taken to sort the given data.
2. Memory Space required to do so.
Different Sorting Algorithms
There are many different techniques available for sorting, differentiated by their efficiency and
space requirements. Following are some sorting techniques which we will be covering here.
1. Bubble Sort
2. Insertion Sort
3. Selection Sort
4. Merge Sort
5. Heap Sort
Sorting Terminology
Stability is mainly important when we have key value pairs with duplicate keys possible (like
people names as keys and their details as values). And we wish to sort these objects by keys.
A sorting algorithm is said to be stable if two objects with equal keys appear in the same
order in sorted output as they appear in the input array to be sorted.
Informally, stability means that equivalent elements retain their relative positions, after sorting.
When equal elements are indistinguishable, such as with integers or more generally, any data
where the entire element is the key, stability is not an issue. Stability is also not an issue if all
keys are different.
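To make the idea of stability concrete, here is a small sketch that sorts student records by section with insertion sort, which is stable. The struct layout and the function name sort_by_section are assumptions of this illustration.

```c
struct student {
    const char *name;
    char section;
};

/* Stable insertion sort by section: a record is shifted only past
   records with a strictly greater section, so records with equal
   sections keep their original relative order. */
void sort_by_section(struct student s[], int n)
{
    for (int i = 1; i < n; i++) {
        struct student key = s[i];
        int j = i - 1;
        while (j >= 0 && s[j].section > key.section) {
            s[j + 1] = s[j];
            j--;
        }
        s[j + 1] = key;
    }
}
```

If the input is already ordered by name, the output is grouped by section and, inside each section, still ordered by name. Changing the comparison to >= would make the sort unstable.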
Consider the following dataset of Student Names and their respective class sections.
If we sort this data according to name only, then it is highly unlikely that the resulting dataset
will be grouped according to sections as well.
The dataset is now sorted according to sections, but not according to names.
In the name-sorted dataset, the tuple (alice, B) was before (ERIC, B), but since the sorting
algorithm is not stable, the relative order is lost.
If, on the other hand, we used a stable sorting algorithm, the relative order of the tuples
within each section would be preserved.
What is a Heap?
A heap is a special tree-based data structure that satisfies the following properties:
1. Shape Property: A heap is always a complete binary tree, which means all
levels of the tree are fully filled except possibly the last, which is filled from left to right.
2. Heap Property: Every node is either greater than or equal to each of its children, or less than
or equal to each of its children. If the parent nodes are greater than their child nodes, the heap is
called a Max-Heap, and if the parent nodes are smaller than their child nodes, the heap is called a Min-Heap.
Algorithm
Step 1 − Create a new node at the end of heap.
Step 2 − Assign new value to the node.
Step 3 − Compare the value of this child node with its parent.
Step 4 − If value of parent is less than child, then swap them.
Step 5 − Repeat step 3 & 4 until Heap property holds.
Note − In Min Heap construction algorithm, we expect the value of
the parent node to be less than that of the child node.
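The five steps above can be sketched in C for a max-heap stored in an array, where the parent of index i is at index (i - 1) / 2. The function name heap_insert and the explicit capacity parameter are assumptions of this sketch.

```c
/* Insert `value` into a max-heap stored in a[0..*n-1].
   `capacity` is the size of the array a.
   Returns 0 on success, -1 if the heap is already full. */
int heap_insert(int a[], int *n, int capacity, int value)
{
    if (*n >= capacity)
        return -1;

    int i = (*n)++;            /* Step 1: new node at the end of the heap */
    a[i] = value;              /* Step 2: assign the new value */

    while (i > 0) {
        int parent = (i - 1) / 2;
        if (a[parent] >= a[i])     /* Step 3: compare with the parent */
            break;                 /* heap property holds, stop */
        int tmp = a[parent];       /* Step 4: parent is smaller, swap */
        a[parent] = a[i];
        a[i] = tmp;
        i = parent;                /* Step 5: repeat one level up */
    }
    return 0;
}
```

For a min-heap, only the comparison changes: the loop stops when the parent is less than or equal to the child.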
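Example: building a max-heap from the complete binary tree below (level order: 1, 3, 5, 4, 6, 13,
10, 9, 8, 15, 17). Nodes are heapified from the last internal node up to the root.
          1
        /   \
       3     5
      / \   / \
     4   6 13  10
    / \ / \
   9  8 15 17
Heapify 6 (swap with 17): 1, 3, 5, 4, 17, 13, 10, 9, 8, 15, 6
Heapify 4 (swap with 9): 1, 3, 5, 9, 17, 13, 10, 4, 8, 15, 6
Heapify 5 (swap with 13): 1, 3, 13, 9, 17, 5, 10, 4, 8, 15, 6
Heapify 3 (swap with 17, then with 15): 1, 17, 13, 9, 15, 5, 10, 4, 8, 3, 6
Heapify 1 (swap with 17, then 15, then 6): 17, 15, 13, 9, 6, 5, 10, 4, 8, 3, 1
The resulting max-heap:
          17
        /    \
      15      13
     /  \    /  \
    9    6  5    10
   / \  / \
  4  8 3   1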
PROGRAM
#include <stdio.h>
/* Function to heapify a subtree. Here 'i' is the
   index of root node in array a[], and 'n' is the size of heap. */
void heapify(int a[], int n, int i)
{
    int largest = i;       // Initialize largest as root
    int left = 2 * i + 1;  // left child
    int right = 2 * i + 2; // right child
    // If left child is larger than root
    if (left < n && a[left] > a[largest])
        largest = left;
    // If right child is larger than the largest so far
    if (right < n && a[right] > a[largest])
        largest = right;
    // If root is not largest
    if (largest != i) {
        // swap a[i] with a[largest]
        int temp = a[i];
        a[i] = a[largest];
        a[largest] = temp;
        // Recursively heapify the affected subtree
        heapify(a, n, largest);
    }
}
/* Function to implement the heap sort */
void heapSort(int a[], int n)
{
    // Build a max heap from the unsorted array
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(a, n, i);
    // One by one extract an element from heap
    for (int i = n - 1; i >= 0; i--) {
        /* Move current root element to end */
        // swap a[0] with a[i]
        int temp = a[0];
        a[0] = a[i];
        a[i] = temp;
        // Restore the heap property on the reduced heap a[0..i-1]
        heapify(a, i, 0);
    }
}
int main()
{
    int a[] = {1, 3, 5, 4, 6, 13, 10, 9, 8, 15, 17};
    int n = sizeof(a) / sizeof(a[0]);
    heapSort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}
MERGE SORT
Merge Sort follows the rule of Divide and Conquer to sort a given set of numbers/elements
recursively, which lets it sort n elements in O(n log n) time.
Before looking at how merge sort works and how it is implemented, let us first understand
what the rule of Divide and Conquer is.
When the British came to India, they saw a country with different religions living in harmony,
hard working but naive citizens, unity in diversity, and found it difficult to establish their
empire. So, they adopted the policy of Divide and Rule. Where the population of India was
collectively a one big problem for them, they divided the problem into smaller problems, by
instigating rivalries between local kings, making them stand against each other, and this
worked very well for them.
Well that was history, and a socio-political policy (Divide and Rule), but the idea here is, if we
can somehow divide a problem into smaller sub-problems, it becomes easier to eventually
solve the whole problem.
In Merge Sort, the given unsorted array with n elements is divided into n sub arrays, each
having one element, because a single element is always sorted in itself. Then, it repeatedly
merges these sub arrays, to produce new sorted sub arrays, and in the end, one complete
sorted array is produced.
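The Divide and Conquer approach has three steps:
1. Divide the problem into multiple smaller sub-problems.
2. Solve the sub-problems, dividing them further if needed.
3. Combine the solutions of the sub problems to find the solution of the actual problem.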
As we have already discussed that merge sort utilizes divide-and-conquer rule to break the
problem into sub-problems, the problem in this case being, sorting a given array.
In merge sort, we break the given array midway, for example if the original array had 6
elements, then merge sort will break it down into two sub arrays with 3 elements each.
But breaking the original array into 2 smaller sub arrays is not helping us in sorting the array.
So we will break these sub arrays into even smaller sub arrays, until we have multiple sub arrays
with single element in them. Now, the idea here is that an array with a single element is
already sorted, so once we break the original array into sub arrays which has only a single
element, we have successfully broken down our problem into base problems.
And then we have to merge all these sorted sub arrays, step by step to form one single sorted
array.
Below, we have a pictorial representation of how merge sort will sort the given array.
1. We take a variable p and store the starting index of our array in this. And we take
another variable r and store the last index of array in it.
2. Then we find the middle of the array using the formula (p + r)/2 and mark the middle
index as q, and break the array into two sub arrays, from p to q and from q + 1 to r.
3. Then we divide these 2 sub arrays again, just like we divided our main array and this
continues.
4. Once we have divided the main array into sub arrays with single elements, then we
start merging the sub arrays.
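The four steps above can be sketched in C using the same names p, q and r. The helper names (merge, merge_sort_range, merge_sort) and the use of a single scratch array allocated once are choices made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs a[p..q] and a[q+1..r] using the scratch array tmp. */
static void merge(int a[], int tmp[], int p, int q, int r)
{
    int i = p, j = q + 1, k = p;
    while (i <= q && j <= r)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];  /* <= keeps equal keys in order */
    while (i <= q)
        tmp[k++] = a[i++];
    while (j <= r)
        tmp[k++] = a[j++];
    memcpy(a + p, tmp + p, (size_t)(r - p + 1) * sizeof a[0]);
}

/* Sort a[p..r]: split at the middle index q, sort both halves, merge them. */
static void merge_sort_range(int a[], int tmp[], int p, int r)
{
    if (p >= r)
        return;                 /* a single element is already sorted */
    int q = (p + r) / 2;
    merge_sort_range(a, tmp, p, q);
    merge_sort_range(a, tmp, q + 1, r);
    merge(a, tmp, p, q, r);
}

/* Sort a[0..n-1]. Returns 0 on success, -1 if the scratch array
   cannot be allocated. */
int merge_sort(int a[], int n)
{
    if (n < 2)
        return 0;
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    merge_sort_range(a, tmp, 0, n - 1);
    free(tmp);
    return 0;
}
```

The scratch array is the O(n) extra space that merge sort needs; it is the reason the algorithm is not in-place.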
Example
Merge sort first divides the whole array repeatedly into equal halves until atomic values
are reached. Here, an array of 8 items is divided into two arrays of
size 4.
This does not change the order in which the items appear in the original. Now we divide
these two arrays into halves.
We further divide these arrays until we reach atomic values, which cannot be divided any further.
We first compare the elements of each list and then combine them into another list in
sorted order. We see that 14 and 33 are in sorted positions. We compare 27 and 10 and, in
the target list of 2 values, we put 10 first, followed by 27. We change the order of 19 and 35,
whereas 42 and 44 are placed sequentially.
In the next iteration of the combining phase, we compare lists of two data values and merge
them into lists of four data values, placing all in sorted order.
After the final merging, the list is: 10, 14, 19, 27, 33, 35, 42, 44
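PROGRAM
#include <stdio.h>
void mergeSort(int [], int, int, int);
void partition(int [], int, int);
int main()
{
    int list[50];
    int i, size;
    printf("Enter total number of elements:");
    scanf("%d", &size);
    printf("Enter the elements:\n");
    for (i = 0; i < size; i++)
    {
        scanf("%d", &list[i]);
    }
    /* partition() splits the list recursively; mergeSort() merges the sorted halves.
       The definitions of both functions are not included in this listing. */
    partition(list, 0, size - 1);
    printf("After merge sort:\n");
    for (i = 0; i < size; i++)
    {
        printf("%d ", list[i]);
    }
    return 0;
}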
Insertion Sort
Properties:
INSERTION-SORT can take different amounts of time to sort two input sequences
of the same size depending on how nearly sorted they already are.
In INSERTION-SORT, the best case occurs if the array is already sorted.
T [Best Case]= O(n)
Average Case: When half the elements are sorted while half not
The running time of insertion sort therefore belongs to both Ω(n) and O(n²)
Pros:
For nearly-sorted data, it’s incredibly efficient (very near O(n) complexity)
It works in-place, which means no auxiliary storage is necessary i.e. requires only a
constant amount O(1) of additional memory space
Efficient for (quite) small data sets.
Stable, i.e. does not change the relative order of elements with equal keys
Cons:
It is less efficient on lists containing a large number of elements
Insertion sort needs a large number of element shifts
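The best and worst cases listed above can be checked with a short sketch that counts key comparisons while sorting. Returning the comparison count from insertion_sort is an addition made for this illustration.

```c
/* Sort a[0..n-1] in place and return the number of key comparisons made. */
long insertion_sort(int a[], int n)
{
    long comparisons = 0;
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0) {
            comparisons++;
            if (a[j] <= key)
                break;          /* key is in place; equal keys are not moved */
            a[j + 1] = a[j];    /* shift the larger element one slot right */
            j--;
        }
        a[j + 1] = key;
    }
    return comparisons;
}
```

On an already sorted array of n elements the function makes n - 1 comparisons and no shifts, which is the O(n) best case. On a reverse-sorted array it makes n(n - 1)/2 comparisons, which is the O(n²) worst case.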
Merge Sort:
Properties
Merge Sort’s running time is O(n log n) in the best, worst and average case
The space complexity of Merge Sort is O(n). This means that the algorithm takes a lot
of space and may slow down operations for large data sets.
Merge sort is suitable for external sorting.
Pros:
It is quicker for larger lists because, unlike insertion sort, it doesn't go through the whole
list several times.
The merge sort is slightly faster than the heap sort for larger sets
O(n log n) worst case asymptotic complexity.
Stable sorting algorithm
Cons
Not an in-place sorting technique
Slower compared to the other sort algorithms for smaller data sets
Marginally slower than quick sort in practice
Goes through the whole process even if the list is sorted
It uses more memory space to store the sub elements of the initial split list.
It requires twice the memory of the heap sort because of the second array.
Heap Sort:
Properties
Heap data structure is always a Complete Binary Tree, which means all levels of the
tree are fully filled except possibly the last, which is filled from left to right
A.heap_size of an array is initially the size of the array. At the first iteration, the root of
the max-heap tree (A[1]) is exchanged with A[i] = A[A.length] (the last element of array A),
after which A.heap_size is reduced by one and the heap property is restored
Initially create a heap. Then repeatedly call extract_max() and put the extracted element in the
array until we have the complete sorted list in our array.
The Heap Sort sorting algorithm has a worst case complexity of O(n log n)
Heap sort is an in-place sorting technique.
Pros:
Heap sort and merge sort are asymptotically optimal comparison sorts
Cons: N/A
The time required to merge in a merge sort is counterbalanced by the time required to
build the heap in heap sort
Heap Sort is better:
Heap Sort uses O(1) space for the sorting operation, while Merge
Sort takes O(n) space
Similarity
Heap sort and insertion sort are both comparison-based sorting techniques
Differences
Heap Sort is not stable whereas Insertion Sort is.
When the input is already sorted, Insertion Sort will not sort every element again, whereas Heap
Sort will use extract max and heapify again and again.
When the input is already sorted, Insertion Sort takes O(n) time, whereas Heap Sort takes
O(n log n) time.
Insertion Sort is not efficient for large input data, whereas Heap Sort is.
TREES INTRODUCTION
The tree is a nonlinear hierarchical data structure that comprises a collection of entities known as
nodes. The nodes of a tree are connected using "edges", which may be directed or
undirected.
The image below represents the tree data structure. The blue-colored circles depict the nodes of the
tree and the black lines connecting each node with another are called edges.
You will understand the parts of trees better in the terminologies section.
Other data structures like arrays, linked lists, stacks, and queues are linear data structures, and all
these data structures store data in sequential order. The time taken by operations such as insertion
and deletion on these linear data structures grows with the size of the data, which is often
not acceptable in today's world of computation.
The non-linear structure of trees improves data storage, access, and manipulation
by allowing the data to be traversed hierarchically. You will learn about tree
traversal in the upcoming section.
Tree Terminologies
Root Node
Edge
Parent node
Child node
Root
In a tree data structure, the root is the first node of the tree. The root node is the initial
node of the tree in data structures.
In the tree data structure, there must be only one root node.
Edge
In a tree in data structures, the connecting link of any two nodes is called the edge of the
tree data structure.
In the tree data structure, N nodes are connected by exactly N - 1 edges.
Parent
In a tree, the node that is the predecessor of any node is known as a parent
node; that is, a node with a branch from itself to any other successive node is called a parent node.
Child
A node that is a direct descendant of another node is known as a child node.
In a tree, any parent node can have any number of child nodes.
Siblings
In trees in the data structure, nodes that belong to the same parent are called siblings.
Leaf
In a tree, a node with no child is known as a leaf node.
In trees, leaf nodes are also called external nodes or terminal nodes.
Internal Node
In a tree, a node with at least one child is known as an internal node.
The root node is also an internal node if the tree has more than one node.
Degree
In the tree data structure, the total number of children of a node is called the degree of
the node.
The highest degree of the node among all the nodes in a tree is called the Degree of Tree.
Level
In tree data structures, the root node is said to be at level 0, the root node's children are at
level 1, the children of the nodes at level 1 are at level 2, and so on.
Height
In a tree data structure, the number of edges from the leaf node to the particular node in
the longest path is known as the height of that node.
In the tree, the height of the root node is called "Height of Tree".
Depth
In the tree, the total number of edges from the root node to the leaf node in the longest
path is known as "Depth of Tree".
Path
In the tree in data structures, the sequence of nodes and edges from one node to another
node is called the path between those two nodes.
Subtree
In the tree in data structures, each child from a node shapes a sub-tree recursively and every child
in the tree will form a sub-tree on its parent node.
General Tree
Properties
The general tree follows all properties of the tree data structure.
BINARY TREES
A binary tree is a tree in which each node can have at most two children. The name "binary" itself
suggests "two"; therefore, each node can have 0, 1 or 2 children.
The above tree is a binary tree because each node contains at most two children. The logical
representation of the above tree is given below:
o The height of the tree is defined as the length of the longest path from the root node to a leaf node.
The tree which is shown above has a height equal to 3. Therefore, the maximum number
of nodes in a tree of height 3 is (1 + 2 + 4 + 8) = 15. In general, the maximum number of
nodes possible in a tree of height h is $2^0 + 2^1 + 2^2 + \dots + 2^h = 2^{h+1} - 1$.
o For a given number of nodes, the height of the tree is maximum when each level holds the
minimum number of nodes. Conversely, the height is minimum when each level holds the
maximum number of nodes.
If there are n nodes in the binary tree, the minimum height can be computed as:
$n = 2^{h+1} - 1$
$n + 1 = 2^{h+1}$
$\log_2(n+1) = \log_2(2^{h+1})$
$\log_2(n+1) = h + 1$
$h = \log_2(n+1) - 1$
If there are n nodes in the binary tree, the maximum height can be computed as:
$n = h + 1$
$h = n - 1$
The full binary tree is also known as a strict binary tree. A tree is a
full binary tree only if each node contains either 0 or 2 children. The full binary tree can also be
defined as a tree in which each node must contain 2 children except the leaf nodes.
In the above tree, we can observe that each node contains either zero or two children;
therefore, it is a full binary tree.
o The number of leaf nodes is equal to the number of internal nodes plus 1. In the above
example, the number of internal nodes is 5; therefore, the number of leaf nodes is equal to
6.
o The maximum number of nodes is the same as the maximum number of nodes in a binary tree,
i.e., $2^{h+1} - 1$.
o A full binary tree of height h has at least 2h + 1 nodes, so the maximum height of a full
binary tree with n nodes can be computed as:
$n = 2h + 1$
$h = (n - 1)/2$
The complete binary tree is a tree in which all the levels are completely filled except possibly the last
level. In the last level, all the nodes must be as far left as possible. In a complete binary tree, the
nodes should be added from the left.
The above tree is a complete binary tree because all the nodes are completely filled, and all the
nodes in the last level are added at the left first.
A tree is a perfect binary tree if all the internal nodes have 2 children, and all the leaf nodes are at
the same level.
The below tree is not a perfect binary tree because all the leaf nodes are not at the same level.
The degenerate binary tree is a tree in which all the internal nodes have only one child.
The above tree is a degenerate binary tree because all the internal nodes have only one child. It is also
known as a right-skewed tree as all the nodes have a right child only.
The balanced binary tree is a tree in which the heights of the left and right subtrees differ by at most 1.
For example, AVL and Red-Black trees are balanced binary trees.
The above tree is a balanced binary tree because the difference between the height of left subtree
and right subtree is zero.
The above tree is not a balanced binary tree because the difference between the height of left
subtree and the right subtree is greater than 1.
Inorder Traversal
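Algorithm Inorder(tree)
1. Traverse the left subtree, i.e., call Inorder(left-subtree).
2. Visit the root.
3. Traverse the right subtree, i.e., call Inorder(right-subtree).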
Uses of Inorder
In the case of binary search trees (BST), Inorder traversal gives nodes in non-decreasing order.
To get nodes of BST in non-increasing order, a variation of Inorder traversal in which the
traversal is reversed can be used.
Example:
Preorder Traversal
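Algorithm Preorder(tree)
1. Visit the root.
2. Traverse the left subtree, i.e., call Preorder(left-subtree).
3. Traverse the right subtree, i.e., call Preorder(right-subtree).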
Postorder Traversal
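Algorithm Postorder(tree)
1. Traverse the left subtree, i.e., call Postorder(left-subtree).
2. Traverse the right subtree, i.e., call Postorder(right-subtree).
3. Visit the root.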
EXAMPLE
Uses of Postorder
Postorder traversal is also useful to get the postfix expression of an expression tree.
Level order traversal of a tree is breadth first traversal for the tree.
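The inorder, preorder and postorder traversals named above differ only in when the root is visited relative to its subtrees. A minimal recursive sketch follows; the node layout and the convention of writing visited keys into an output array are assumptions of this illustration.

```c
#include <stddef.h>

struct node {
    int data;
    struct node *left, *right;
};

/* Each function appends the visited keys to out[] and advances *len. */

void inorder(const struct node *t, int out[], int *len)
{
    if (t == NULL)
        return;
    inorder(t->left, out, len);     /* left subtree */
    out[(*len)++] = t->data;        /* root */
    inorder(t->right, out, len);    /* right subtree */
}

void preorder(const struct node *t, int out[], int *len)
{
    if (t == NULL)
        return;
    out[(*len)++] = t->data;        /* root */
    preorder(t->left, out, len);    /* left subtree */
    preorder(t->right, out, len);   /* right subtree */
}

void postorder(const struct node *t, int out[], int *len)
{
    if (t == NULL)
        return;
    postorder(t->left, out, len);   /* left subtree */
    postorder(t->right, out, len);  /* right subtree */
    out[(*len)++] = t->data;        /* root */
}
```

For a root 1 with left child 2 and right child 3, the three functions produce 2 1 3, 1 2 3 and 2 3 1 respectively.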
The idea is to start with the root node, which would be the last item in the postorder sequence,
and find the boundary of its left and right subtree in the inorder sequence. To find the boundary,
search for the index of the root node in the inorder sequence. All keys before the root node in the
inorder sequence become part of the left subtree, and all keys after the root node become part of
the right subtree. Repeat this recursively for all nodes in the tree and construct the tree in the
process.
Inorder : { 4, 2, 1, 7, 5, 8, 3, 6 }
Postorder : { 4, 2, 7, 8, 5, 6, 3, 1 }
Root would be the last element in the postorder sequence, i.e., 1. Next, locate the index of the
root node in the inorder sequence. Now since 1 is the root node, all nodes before 1 in the inorder
sequence must be included in the left subtree of the root node, i.e., {4, 2} and all the nodes
after 1 must be included in the right subtree, i.e., {7, 5, 8, 3, 6}. Now the problem is reduced to
building the left and right subtrees and linking them to the root node.
Left subtree:
Inorder : {4, 2}
Postorder : {4, 2}
Right subtree:
Inorder : {7, 5, 8, 3, 6}
Postorder : {7, 8, 5, 6, 3}
The idea is to recursively follow the above approach until the complete tree is constructed.
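The recursive construction described above can be sketched as follows. The sketch assumes that all keys are distinct and that the two sequences really describe the same tree; the function names build_tree, preorder and free_tree are chosen for this illustration.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Build the subtree whose inorder keys are in[lo..hi]. *post_idx is the
   position of the subtree's root in post[]; it moves leftwards as roots
   are consumed. Assumes distinct keys and consistent inputs. */
static struct node *build(const int in[], const int post[],
                          int lo, int hi, int *post_idx)
{
    if (lo > hi)
        return NULL;
    struct node *root = malloc(sizeof *root);
    if (root == NULL)
        return NULL;
    root->key = post[(*post_idx)--];
    int pos = lo;
    while (in[pos] != root->key)    /* locate the root in the inorder sequence */
        pos++;
    /* Build the right subtree first: in postorder its keys sit
       immediately before the root. */
    root->right = build(in, post, pos + 1, hi, post_idx);
    root->left = build(in, post, lo, pos - 1, post_idx);
    return root;
}

struct node *build_tree(const int in[], const int post[], int n)
{
    int post_idx = n - 1;           /* the root is the last postorder element */
    return build(in, post, 0, n - 1, &post_idx);
}

/* Write the preorder sequence of the tree into out[] (used to check the result). */
void preorder(const struct node *t, int out[], int *len)
{
    if (t == NULL)
        return;
    out[(*len)++] = t->key;
    preorder(t->left, out, len);
    preorder(t->right, out, len);
}

void free_tree(struct node *t)
{
    if (t == NULL)
        return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}
```

For the inorder and postorder sequences given above, the constructed tree has the preorder sequence 1, 2, 4, 3, 5, 7, 8, 6. The linear search for the root makes this sketch O(n²) in the worst case; a lookup table from key to inorder index would reduce it to O(n).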
The value of the key of the left sub-tree is less than the value of its parent (root) node's
key.
The value of the key of the right sub-tree is greater than or equal to the value of its parent
(root) node's key.
In the above figure, we can observe that the root node is 40, and all the nodes of the left subtree
are smaller than the root node, and all the nodes of the right subtree are greater than the root
node.
Similarly, we can see the left child of root node is greater than its left child and smaller than its
right child. So, it also satisfies the property of binary search tree. Therefore, we can say that the
tree in the above image is a binary search tree.
Suppose if we change the value of node 35 to 55 in the above tree, check whether the tree will be
binary search tree or not.
In the above tree, the value of root node is 40, which is greater than its left child 30 but smaller
than right child of 30, i.e., 55. So, the above tree does not satisfy the property of Binary search
tree. Therefore, the above tree is not a binary search tree.
o Searching for an element in a binary search tree is easy, as the comparison at each node tells us
which subtree may contain the desired element.
o Compared to arrays and linked lists, insertion and deletion operations are faster in a BST.
Now, let's see the creation of binary search tree using an example.
Suppose the data elements are - 45, 15, 79, 90, 10, 55, 12, 20, 50
o First, we have to insert 45 into the tree as the root of the tree.
o Then, read the next element; if it is smaller than the root node, insert it as the root of the
left subtree, and move to the next element.
o Otherwise, if the element is larger than the root node, then insert it as the root of the right
subtree.
Now, let's see the process of creating the Binary search tree using the given data element. The
process of creating the BST is shown below -
As 15 is smaller than 45, so insert it as the root node of the left subtree.
As 79 is greater than 45, so insert it as the root node of the right subtree.
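90 is greater than 45 and 79, so it will be inserted as the right subtree of 79.
10 is smaller than 45 and 15, so it will be inserted as the left subtree of 15.
55 is larger than 45 and smaller than 79, so it will be inserted as the left subtree of 79.
12 is smaller than 45 and 15 but greater than 10, so it will be inserted as the right subtree of 10.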
20 is smaller than 45 but greater than 15, so it will be inserted as the right subtree of 15.
50 is greater than 45 but smaller than 79 and 55. So, it will be inserted as a left subtree of 55.
Now, the creation of binary search tree is completed. After that, let's move towards the
operations that can be performed on Binary search tree.
We can perform insert, delete and search operations on the binary search tree.
Searching means to find or locate a specific element or node in a data structure. In Binary search
tree, searching a node is easy because elements in BST are stored in a specific order. The steps of
searching a node in Binary Search tree are listed as follows -
1. First, compare the element to be searched with the root element of the tree.
2. If root is matched with the target element, then return the node's location.
3. If it is not matched, then check whether the item is less than the root element; if it is
smaller than the root element, then move to the left subtree.
4. If it is larger than the root element, then move to the right subtree.
5. Repeat the above steps on the chosen subtree until the element is found or the subtree is empty.
6. If the element is not found or not present in the tree, then return NULL.
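The search steps above translate directly into a loop. The sketch below assumes a node layout with key, left and right fields; the names node and bst_search are chosen for this illustration.

```c
#include <stddef.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Return the node holding `key`, or NULL if the key is not in the tree. */
struct node *bst_search(struct node *root, int key)
{
    while (root != NULL && root->key != key) {
        /* smaller keys are in the left subtree, larger keys in the right */
        root = (key < root->key) ? root->left : root->right;
    }
    return root;
}
```

In the tree built above from 45, 15, 79, 90, 10, 55, 12, 20, 50, a search for 20 compares with 45 (go left), then 15 (go right), and then finds 20.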
Now, let's understand the searching in binary tree using an example. We are taking the binary
search tree formed above. Suppose we have to find node 20 from the below tree.
Step1:
Step2:
Now, let's see the algorithm to search an element in the Binary search tree.
Now let's understand how the deletion is performed on a binary search tree. We will also see an
example to delete an element from the given tree.
In a binary search tree, we must delete a node in such a way that the property
of the BST is not violated. To delete a node from a BST, there are three possible situations -
o The node to be deleted is a leaf node,
o The node to be deleted has only one child, or
o The node to be deleted has two children.
When the node to be deleted is a leaf node
This is the simplest case of deleting a node from a BST. Here, we have to replace the leaf node with NULL
and simply free the allocated space.
When the node to be deleted has only one child
In this case, we have to replace the target node (the node to be deleted) with its child, and then delete the
child node. It means that after replacing the target node with its child node, the child node will
now contain the value to be deleted. So, we simply have to replace the child node with NULL
and free up the allocated space.
We can see the process of deleting a node with one child from BST in the below image.
In the below image, suppose we have to delete the node 79, as the node to be deleted has only
one child, so it will be replaced with its child 55.
So, the replaced node 79 will now be a leaf node that can be easily deleted.
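When the node to be deleted has two children
This case of deleting a node in a BST is a bit more complex than the other two cases. In such a case, the
steps to be followed are listed as follows -
o First, find the inorder successor of the node to be deleted.
o Then, replace the node with its inorder successor until the target node is placed at a leaf of the tree.
o And at last, replace the node with NULL and free up the allocated space.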
The inorder successor is required when the right child of the node is not empty. We can obtain the
inorder successor by finding the minimum element in the right child of the node.
We can see the process of deleting a node with two children from BST in the below image. In the
below image, suppose we have to delete node 45 that is the root node, as the node to be deleted
has two children, so it will be replaced with its inorder successor. Now, node 45 will be at the
leaf of the tree so that it can be deleted easily.
A new key in BST is always inserted at the leaf. To insert an element in BST, we have to start
searching from the root node; if the node to be inserted is less than the root node, then search for
an empty location in the left subtree. Else, search for the empty location in the right subtree and
insert the data. Insert in BST is similar to searching, as we always have to maintain the rule that
the left subtree is smaller than the root, and right subtree is larger than the root.
Now, let's see the process of inserting a node into BST using an example.
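A compact sketch of insertion at a leaf, together with the three deletion cases discussed earlier, is given below. It assumes distinct keys; the function names bst_insert, bst_delete and bst_free are chosen for this illustration, and the one-child case is handled by splicing the node out, which has the same effect as the swap-and-delete description.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Insert `key` at an empty (leaf) position and return the subtree root.
   Smaller keys go left; other keys go right. */
struct node *bst_insert(struct node *root, int key)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;            /* allocation failed: nothing inserted */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else
        root->right = bst_insert(root->right, key);
    return root;
}

/* Delete `key` (if present) and return the new subtree root. */
struct node *bst_delete(struct node *root, int key)
{
    if (root == NULL)
        return NULL;
    if (key < root->key) {
        root->left = bst_delete(root->left, key);
    } else if (key > root->key) {
        root->right = bst_delete(root->right, key);
    } else if (root->left == NULL || root->right == NULL) {
        /* leaf or one child: the child (possibly NULL) takes the node's place */
        struct node *child = root->left ? root->left : root->right;
        free(root);
        return child;
    } else {
        /* two children: copy the inorder successor (minimum of the right
           subtree) into this node, then delete the successor */
        struct node *succ = root->right;
        while (succ->left != NULL)
            succ = succ->left;
        root->key = succ->key;
        root->right = bst_delete(root->right, succ->key);
    }
    return root;
}

void bst_free(struct node *root)
{
    if (root == NULL)
        return;
    bst_free(root->left);
    bst_free(root->right);
    free(root);
}
```

Inserting 45, 15, 79, 90, 10, 55, 12, 20, 50 in that order and then deleting the root 45 leaves 50, the inorder successor of 45, at the root.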
Let's see the time and space complexity of the Binary search tree. We will see the time
complexity for insertion, deletion, and searching operations in best case, average case, and worst
case.
1. Time Complexity
Insertion, deletion and search each take O(h) time, where h is the height of the tree. The worst
case occurs when the BST is degenerate (skewed), so that h = n - 1 and all three operations
take O(n) time.
2. Space Complexity
Insertion O(n)
Deletion O(n)
Search O(n)
Program:
#include <stdio.h>
#include <stdlib.h>
struct btnode
{
int value;
struct btnode *l;
struct btnode *r;
}*root = NULL, *temp = NULL, *t2, *t1;
void insert();
void create();
void search( struct btnode *root);
void inorder(struct btnode *t);
void preorder(struct btnode *t);
void postorder(struct btnode *t);
int main()
{
int ch;
printf("\nOPERATIONS ---");
printf("\n1 - Insert an element into tree\n");
printf("2 - Inorder Traversal\n");
printf("3 - Preorder Traversal\n");
printf("4 - Postorder Traversal\n");
printf("5 - Exit\n");
while(1)
{
printf("\nEnter your choice : ");
scanf("%d", &ch);
switch (ch)
{
case 1:
insert();
break;
case 2:
inorder(root);
break;
case 3:
return;
}
if (t->l != NULL)
postorder(t->l);
if (t->r != NULL)
postorder(t->r);
printf("%d -> ", t->value);
}
OUTPUT:
AVL Tree can be defined as height balanced binary search tree in which each node is associated
with a balance factor which is calculated by subtracting the height of its right sub-tree from that
of its left sub-tree.
A tree is said to be balanced if the balance factor of every node is between -1 and 1; otherwise
the tree is unbalanced and needs to be rebalanced.
If the balance factor of a node is 1, the left sub-tree is one level higher than the right
sub-tree.
If the balance factor of a node is 0, the left sub-tree and right sub-tree have equal height.
If the balance factor of a node is -1, the left sub-tree is one level lower than the right
sub-tree.
An AVL tree is given in the following figure. The balance factor associated with each node is
between -1 and +1; therefore, it is an example of an AVL tree.
An AVL tree controls the height of the binary search tree by not letting it become skewed. The
time taken for all operations in a binary search tree of height h is O(h). However, this can grow
to O(n) if the BST becomes skewed (i.e. the worst case). By limiting the height to O(log n), the
AVL tree imposes an upper bound of O(log n) on each operation, where n is the number of nodes.
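The definition of the balance factor can be sketched in C as follows. This is only an illustration: the node type avl_node and the function names are assumptions made for this example, and the height is recomputed recursively for clarity rather than stored in the node.

```c
#include <stddef.h>

/* Hypothetical node type for illustration; the field names are assumptions. */
struct avl_node {
    int key;
    struct avl_node *left;
    struct avl_node *right;
};

/* Height counted in nodes: an empty subtree has height 0, a leaf has height 1. */
int avl_height(const struct avl_node *n)
{
    if (n == NULL)
        return 0;
    int hl = avl_height(n->left);
    int hr = avl_height(n->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balance factor = height(left sub-tree) - height(right sub-tree). */
int avl_balance_factor(const struct avl_node *n)
{
    if (n == NULL)
        return 0;
    return avl_height(n->left) - avl_height(n->right);
}

/* Returns 1 if every node has a balance factor of -1, 0 or 1, else 0. */
int avl_is_balanced(const struct avl_node *n)
{
    if (n == NULL)
        return 1;
    int bf = avl_balance_factor(n);
    if (bf < -1 || bf > 1)
        return 0;
    return avl_is_balanced(n->left) && avl_is_balanced(n->right);
}
```

For a right-skewed chain 10 -> 20 -> 30, the root has balance factor -2, so the tree is not an AVL tree. A practical implementation would usually cache the height in each node instead of recomputing it.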
AVL Rotations
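A rotation is performed in an AVL tree only when the balance factor of some node is other than
-1, 0, or 1. There are four types of rotations:
1. LL rotation: the inserted node is in the left subtree of the left subtree of A.
2. RR rotation: the inserted node is in the right subtree of the right subtree of A.
3. LR rotation: the inserted node is in the right subtree of the left subtree of A.
4. RL rotation: the inserted node is in the left subtree of the right subtree of A.
Here node A is the node whose balance factor is other than -1, 0, or 1.
The first two rotations, LL and RR, are single rotations; the other two, LR and RL, are double
rotations. A tree must have height at least 2 to be unbalanced. Let us look at each rotation.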
1. RR Rotation
When the BST becomes unbalanced because a node was inserted into the right subtree of the right
subtree of A, we perform an RR rotation. RR rotation is an anticlockwise rotation, applied on the
edge below a node having balance factor -2.
In the above example, node A has balance factor -2 because a node C is inserted in the right
subtree of A's right subtree. We perform the RR rotation on the edge below A.
2. LL Rotation
When the BST becomes unbalanced because a node was inserted into the left subtree of the left
subtree of C, we perform an LL rotation. LL rotation is a clockwise rotation, applied on the edge
below a node having balance factor 2.
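The two single rotations can be sketched in C as below. The type rot_node and the function names rotate_left and rotate_right are assumptions made for this illustration; each function returns the new root of the rotated subtree, which the caller must link back into the tree.

```c
#include <stddef.h>

/* Hypothetical node type for illustration. */
struct rot_node {
    int key;
    struct rot_node *left;
    struct rot_node *right;
};

/* RR case: anticlockwise (left) rotation around a.
 * The right child b of a moves up and a becomes b's left child. */
struct rot_node *rotate_left(struct rot_node *a)
{
    struct rot_node *b = a->right;
    a->right = b->left;   /* b's left subtree is re-attached under a */
    b->left = a;
    return b;             /* b is the new root of this subtree */
}

/* LL case: clockwise (right) rotation around c.
 * The left child b of c moves up and c becomes b's right child. */
struct rot_node *rotate_right(struct rot_node *c)
{
    struct rot_node *b = c->left;
    c->left = b->right;   /* b's right subtree is re-attached under c */
    b->right = c;
    return b;
}
```

In both cases the in-order sequence of keys is unchanged, which is why a rotation preserves the binary search tree property.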
3. LR Rotation
Double rotations are a bit tougher than the single rotations explained above. LR rotation = RR
rotation + LL rotation, i.e., first an RR rotation is performed on the subtree and then an LL
rotation is performed on the full tree. By full tree we mean the first node on the path from the
inserted node whose balance factor is other than -1, 0, or 1.
Let us understand each and every step very clearly:
4. RL Rotation
RL rotation = LL rotation + RR rotation, i.e., first an LL rotation is performed on the subtree
and then an RR rotation is performed on the full tree. By full tree we mean the first node on the
path from the inserted node whose balance factor is other than -1, 0, or 1.
On inserting the above elements, especially in the case of H, the BST becomes unbalanced as the
Balance Factor of H is -2. Since the BST is right-skewed, we will perform RR Rotation on node
H.
2. Insert B, A
On inserting the above elements, especially in the case of A, the BST becomes unbalanced as the
balance factor of H and I is 2. We consider the first such node on the path from the last
inserted node, i.e. H. Since the BST from H is left-skewed, we perform an LL rotation on node H.
The resultant balanced tree is:
3. Insert E
4. Insert C, F, D
5. Insert G
6. Insert K
On inserting K, the BST becomes unbalanced as the balance factor of I is -2. Since the BST is
right-skewed from I to K, we perform an RR rotation on node I.
The resultant balanced tree after RR rotation is:
Insertion in AVL tree is performed in the same way as it is performed in a binary search tree.
The new node is added into AVL tree as the leaf node. However, it may lead to violation in the
AVL tree property and therefore the tree may need balancing.
The tree can be balanced by applying rotations. Rotation is required only if, the balance factor of
any node is disturbed upon inserting the new node, otherwise the rotation is not required.
Construct an AVL tree by inserting the following elements in the given order.
63, 9, 19, 27, 18, 108, 99, 81
The process of constructing an AVL tree from the given set of elements is shown in the
following figure.
At each step, we must calculate the balance factor of every node. If it is found to be more than
1 or less than -1, we need a rotation to rebalance the tree. The type of rotation is determined
by the location of the inserted element with respect to the critical node.
All the elements are inserted so as to maintain the ordering of the binary search tree.
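The insertion procedure described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the only way to implement it: the type avl and all function names are hypothetical, each node caches its height, and duplicate keys are simply ignored.

```c
#include <stdlib.h>

/* Hypothetical AVL node that caches its own height (leaf = 1). */
struct avl {
    int key;
    int height;
    struct avl *left, *right;
};

static int h(const struct avl *n) { return n ? n->height : 0; }

static void update(struct avl *n)
{
    int hl = h(n->left), hr = h(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

/* Clockwise rotation, used for the LL case. */
static struct avl *rot_right(struct avl *y)
{
    struct avl *x = y->left;
    y->left = x->right;
    x->right = y;
    update(y);
    update(x);
    return x;
}

/* Anticlockwise rotation, used for the RR case. */
static struct avl *rot_left(struct avl *x)
{
    struct avl *y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

/* Inserts key below n and returns the (possibly new) subtree root. */
struct avl *avl_insert(struct avl *n, int key)
{
    if (n == NULL) {
        struct avl *leaf = malloc(sizeof *leaf);
        if (leaf == NULL)
            abort();
        leaf->key = key;
        leaf->height = 1;
        leaf->left = leaf->right = NULL;
        return leaf;
    }
    if (key < n->key)
        n->left = avl_insert(n->left, key);
    else if (key > n->key)
        n->right = avl_insert(n->right, key);
    else
        return n;                      /* assumption: duplicates are ignored */

    update(n);
    int bf = h(n->left) - h(n->right);

    if (bf > 1 && key < n->left->key)        /* LL case */
        return rot_right(n);
    if (bf < -1 && key > n->right->key)      /* RR case */
        return rot_left(n);
    if (bf > 1 && key > n->left->key) {      /* LR case */
        n->left = rot_left(n->left);
        return rot_right(n);
    }
    if (bf < -1 && key < n->right->key) {    /* RL case */
        n->right = rot_right(n->right);
        return rot_left(n);
    }
    return n;
}
```

Inserting 63, 9, 19, 27, 18, 108, 99, 81 in this order with this sketch triggers an LR rotation when 19 is inserted and an LL rotation when 81 is inserted, and ends with 19 at the root.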
Deleting a node from an AVL tree is similar to that in a binary search tree. Deletion may
disturb the balance factor of an AVL tree and therefore the tree needs to be rebalanced in order to
maintain the AVLness. For this purpose, we need to perform rotations.
Example
Delete the node 60 from the AVL tree shown in the following image.
Solution:
In this case, node B has balance factor -1. Deleting the node 60 disturbs the balance factor of
the node 50; therefore, it needs to be R-1 rotated. The node C, i.e. 45, becomes the root of the
tree with the nodes B (40) and A (50) as its left and right children.
RED-BLACK TREE
Introduction:
A red-black tree is a kind of self-balancing binary search tree in which each node stores an
extra bit, usually interpreted as the color (red or black). These colors are used to ensure that
the tree remains balanced during insertions and deletions. Although the balance of the tree is
not perfect, it is good enough to keep the search time around O(log n), where n is the number of
elements in the tree. This tree was invented in 1972 by Rudolf Bayer.
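Rules That Every Red-Black Tree Follows:
1. Every node has a color, either red or black.
2. The root of the tree is always black.
3. There are no two adjacent red nodes (a red node cannot have a red parent or a red child).
4. Every path from a node (including the root) to any of its descendant NULL nodes has the same
number of black nodes.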
The above tree is a Red-Black tree where every node is satisfying all the properties of
Red-Black Tree.
Most of the BST operations (e.g., search, max, min, insert, delete, etc.) take O(h) time, where h
is the height of the BST. The cost of these operations may become O(n) for a skewed binary tree.
If we make sure that the height of the tree remains O(log n) after every insertion and deletion,
then we can guarantee an upper bound of O(log n) for all these operations. The height of a
Red-Black tree is always O(log n), where n is the number of nodes in the tree.
The AVL trees are more balanced compared to Red-Black Trees, but they may cause more
rotations during insertion and deletion. So if your application involves frequent insertions and
deletions, then Red-Black trees should be preferred. And if the insertions and deletions are less
frequent and search is a more frequent operation, then AVL tree should be preferred over Red-
Black Tree.
Interesting points about Red-Black Tree:
1. The black height of a red-black tree is the number of black nodes on a path from the root
node to a leaf node. Leaf nodes are also counted as black nodes. From properties 3 and 4 of a
red-black tree it follows that a red-black tree of height h has black height >= h/2.
4. The black depth of a node is the number of black nodes on the path from the root to that
node, i.e. the number of black ancestors.
NOTE: Every red-black tree with n nodes has height <= 2 log2(n + 1).
1. Search
2. Insertion
3. Deletion
Every red-black tree is a special case of a binary search tree, so the searching algorithm of a
red-black tree is the same as that of a binary search tree.
Algorithm:
1. Start from the root.
2. Compare the search key with the current node: if it is less, recurse into the left subtree;
if it is greater, recurse into the right subtree.
3. If the key is found anywhere, return true; otherwise return false.
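The search steps above can be sketched in C as follows. The node layout rb_node and the name rb_search are assumptions made for this illustration; the loop is an iterative form of the recursion described in the steps, and the color field plays no role in searching.

```c
#include <stdbool.h>
#include <stddef.h>

enum rb_color { RB_RED, RB_BLACK };

/* Hypothetical red-black node layout for illustration. */
struct rb_node {
    int key;
    enum rb_color color;
    struct rb_node *left, *right;
};

/* Returns true if key is present in the tree rooted at root. */
bool rb_search(const struct rb_node *root, int key)
{
    const struct rb_node *cur = root;           /* step 1: start from the root */
    while (cur != NULL) {
        if (key == cur->key)
            return true;                        /* step 3: found */
        /* step 2: go left for smaller keys, right for larger keys */
        cur = (key < cur->key) ? cur->left : cur->right;
    }
    return false;                               /* step 3: not found */
}
```

Because the height of a red-black tree is O(log n), the loop runs O(log n) times.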
1. Recoloring
2. Rotation
Recoloring is a change in the color of a node, i.e. if it is red it becomes black and vice
versa. Note that the color of a NULL node is always black. We always try recoloring first; if
recoloring does not work, we go for rotation.
Following is the detailed algorithm. It has two main cases, depending on the color of the uncle
(the sibling of the new node's parent). If the uncle is red, we recolor. If the uncle is black,
we do rotations and/or recoloring.
Logic:
First, insert the node as in a binary search tree and color it red. If the node is the root,
change its color to black. Otherwise check the color of the parent node. If the parent is black,
nothing needs to change. If the parent is red, check the color of the node's uncle. If the uncle
is red, change the color of the parent and the uncle to black and that of the grandparent to
red, then repeat the same process for the grandparent.
Algorithm
1. Perform standard BST insertion and color the newly inserted node RED.
2. If the new node x is the root, change the color of x to BLACK.
3. Do the following if the color of x's parent is not BLACK and x is not the root.
a) If x's uncle (the sibling of x's parent) is RED (the grandparent must have been black, by
the Red-Black tree property):
Change the color of the parent and the uncle to BLACK.
Change the color of the grandparent to RED.
Set x = x's grandparent and repeat steps 2 and 3 for the new x.
b) If x's uncle is BLACK, there are four possible configurations for x, x's parent (p) and x's
grandparent (g):
Left Left Case (p is the left child of g and x is the left child of p)
Left Right Case (p is the left child of g and x is the right child of p)
Right Right Case (p is the right child of g and x is the right child of p)
Right Left Case (p is the right child of g and x is the left child of p)
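The insertion algorithm above can be sketched in C with parent pointers as below. This is one possible arrangement under stated assumptions: the type rbn and all function names are hypothetical, NULL children are treated as black, and equal keys are placed in the right subtree.

```c
#include <stdlib.h>

enum color { RED, BLACK };

/* Hypothetical node with a parent pointer, for illustration only. */
struct rbn {
    int key;
    enum color color;
    struct rbn *left, *right, *parent;
};

static void rb_rotate_left(struct rbn **root, struct rbn *x)
{
    struct rbn *y = x->right;
    x->right = y->left;
    if (y->left) y->left->parent = x;
    y->parent = x->parent;
    if (!x->parent) *root = y;
    else if (x == x->parent->left) x->parent->left = y;
    else x->parent->right = y;
    y->left = x;
    x->parent = y;
}

static void rb_rotate_right(struct rbn **root, struct rbn *x)
{
    struct rbn *y = x->left;
    x->left = y->right;
    if (y->right) y->right->parent = x;
    y->parent = x->parent;
    if (!x->parent) *root = y;
    else if (x == x->parent->left) x->parent->left = y;
    else x->parent->right = y;
    y->right = x;
    x->parent = y;
}

void rb_insert(struct rbn **root, int key)
{
    /* Step 1: standard BST insertion, new node colored RED. */
    struct rbn *x = malloc(sizeof *x);
    if (!x) abort();
    x->key = key;
    x->color = RED;
    x->left = x->right = x->parent = NULL;
    struct rbn *p = NULL, *cur = *root;
    while (cur) {
        p = cur;
        cur = (key < cur->key) ? cur->left : cur->right;
    }
    x->parent = p;
    if (!p) *root = x;
    else if (key < p->key) p->left = x;
    else p->right = x;

    /* Step 3: repair while the parent is RED (so a grandparent exists). */
    while (x->parent && x->parent->color == RED) {
        struct rbn *par = x->parent, *g = par->parent;
        struct rbn *uncle = (par == g->left) ? g->right : g->left;
        if (uncle && uncle->color == RED) {          /* case a: recolor */
            par->color = BLACK;
            uncle->color = BLACK;
            g->color = RED;
            x = g;
        } else if (par == g->left) {                 /* case b: LL or LR */
            if (x == par->right) {                   /* LR: reduce to LL */
                rb_rotate_left(root, par);
                x = par;
                par = x->parent;
            }
            par->color = BLACK;
            g->color = RED;
            rb_rotate_right(root, g);
        } else {                                     /* case b: RR or RL */
            if (x == par->left) {                    /* RL: reduce to RR */
                rb_rotate_right(root, par);
                x = par;
                par = x->parent;
            }
            par->color = BLACK;
            g->color = RED;
            rb_rotate_left(root, g);
        }
    }
    (*root)->color = BLACK;                          /* step 2 */
}
```

Inserting 8, 18, 5, 15, 17, 25, 40 and 80 in this order with this sketch leaves 17 as the black root, with red children 8 and 25.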
EXAMPLE:
Create a Red-Black Tree with the following sequence of numbers 8,18,5,15,17,25,40 and 80
Initial RB Tree
You first have to search for 30; once found, perform BST deletion. For the node with value ‘30’,
find either the maximum of the left subtree or the minimum of the right subtree and replace 30
with that value. This is BST deletion.
The resulting RB tree will be like one in fig. 4. Element 30 is deleted and the value is successfully
replaced by 38. But now the task is to delete duplicate element 38.
Go to the table above and you’ll observe case 1 is satisfied by this tree.
Since node with element 38 is a red leaf node, remove it and the tree looks like the one in fig. 5.
Observe that if you perform correct actions, the tree will still hold all the properties of the RB tree.
Initial RB Tree
15 can be removed easily from the tree (BST deletion). In the case of RB trees, if a leaf node is deleted
you replace it with a double black (DB) nil node . It is represented by a double circle.
The entire problem is now drilled down to get rid of this bad boy, DB, via some
actions. Go back to our rule book (table) and case 3 fits perfectly.
In short, remove DB and then swap the color of its sibling with its parent
Delete node with value 15 and, as a rule, replace it with DB nil node as shown. Now, DB’s sibling is black
and sibling’s both children are also black (don’t forget the hidden NIL nodes!), it satisfies all the
conditions of case 3. Here,
1. DB’s parent is 20
3. DB’s sibling is 30
With these points in mind perform the actions and you get an RB tree as in fig. 10.
20 becomes DB and hence the problem is not resolved yet. Reapply case 3
The resulting tree looks like the one in the above fig.
The root resolved DB and becomes a black node. And you’re done deleting 15 successfully.
First, Search 15 as per BST rules and then delete it. Second, replace deleted node with DB NIL node as
shown in fig. 13 (B).
(a) Swap DB’s parent’s color with DB’s sibling’s color. I know this is confusing, but take it easy and
keep following. The tree looks like fig. 14.
(b) Perform rotation at parent node in direction of DB. The tree becomes like the one in fig. 15. DB is
still there (what’s its problem!).
(c) Check which case can be applied in the current tree. And got it, case 3.
(d) Apply case 3 as explained and the RB tree is free from the DB node as shown in fig. 16.
I know it’s tiresome, but I swear if you practice these examples 2–3 times, you will have a good grasp of
the concept of deletion in RB trees.
Perform the basic preliminary steps- delete the node with value 1 and replace it with DB NIL node as
shown in fig. 17(B). Check for the cases which fit the current tree and it’s case 3(DB’s sibling is black).
Node 5 has now become a double black node. We need to get rid of it.
Search for cases that can be applied and case 5 seems to fit here (not case
3).
(A) Tree after swapping colors of 30 & 25 (B) Tree after rotation
(b) Rotate at sibling node in the direction opposite to the DB node. Hence, perform right rotation
at node 30 and the tree becomes like fig. 19 (B).
The double black node still haunts the tree! Re-check the case that can be applied to this tree and we
find that case 6 (don’t fall for case 3) seems to fit.
(b) Perform rotation at DB’s parent node in the direction of DB (fig, 20(B)).
(c) Change DB node to black node. Also, change the color of DB’s sibling’s far-red child to black and
the final RB tree will look fig. 21.
And, voilà! The RB tree is free of element 1 as well as of any double node. Life is good now.
Real-world uses of red-black trees include TreeSet and TreeMap in the Java Collections Library,
and HashMap, which since Java 8 converts heavily collided buckets into red-black trees.
Splay trees are self-balancing (self-adjusting) binary search trees; in other words, they are a
variant of the binary search tree. Knowledge of binary search trees is therefore a prerequisite
for splay trees.
Recall the time complexity of a binary search tree: O(log n) in the average case and O(n) in the
worst case. In a binary search tree, the values in the left subtree are smaller than the root
node and the values in the right subtree are greater than the root node; when such a tree is
reasonably balanced, operations take O(log n) time. If the tree is left-skewed or right-skewed,
the time complexity becomes O(n). To limit the skewness, the AVL and Red-Black trees came into
the picture, with O(log n) time complexity for all operations in all cases. The splay tree was
designed to do better in practice when some elements are accessed more often than others.
Note: The splay tree can be defined as the self-adjusted tree in which any operation performed
on the element would rearrange the tree so that the element on which operation has been
performed becomes the root node of the tree.
In a splay tree, every operation is performed at the root of the tree. All the operations in splay
tree are involved with a common operation called "Splaying".
Splaying an element, is the process of bringing it to the root position by performing
suitable rotation operations.
In a splay tree, splaying an element rearranges all the elements in the tree so that splayed element
is placed at the root of the tree.
By splaying elements we bring more frequently used elements closer to the root of the tree so
that any operation on those elements is performed quickly. That means the splaying operation
automatically brings more frequently used elements closer to the root of the tree.
Every operation on splay tree performs the splaying operation. For example, the insertion
operation first inserts the new element using the binary search tree insertion process, then the
newly inserted element is splayed so that it is placed at the root of the tree. The search operation
in a splay tree is nothing but searching the element using binary search process and then splaying
that searched element so that it is placed at the root of the tree.
In splay tree, to splay any element we use the following rotation operations...
1. Zig Rotation
2. Zag Rotation
3. Zig - Zig Rotation
4. Zag - Zag Rotation
5. Zig - Zag Rotation
6. Zag - Zig Rotation
Example
Zig Rotation
Zag Rotation
The Zag Rotation in splay tree is similar to the single left rotation in AVL Tree rotations. In zag
rotation, every node moves one position to the left from its current position. Consider the
following example...
Zig-Zig Rotation
The Zig-Zig Rotation in splay tree is a double zig rotation. In zig-zig rotation, every node
moves two positions to the right from its current position. Consider the following example...
Zag-Zag Rotation
Zig-Zag Rotation
The Zig-Zag Rotation in splay tree is a sequence of zig rotation followed by zag rotation. In zig-
zag rotation, every node moves one position to the right followed by one position to the left from
its current position. Consider the following example...
Zag-Zig Rotation
The Zag-Zig Rotation in splay tree is a sequence of zag rotation followed by zig rotation. In
zag-zig rotation, every node moves one position to the left followed by one position to the right
from its current position. Consider the following example...
The following are the factors used for selecting a type of rotation:
Case 1: If the node does not have a grand-parent, and if it is the right child of the parent, then we
carry out the left rotation; otherwise, the right rotation is performed.
Case 2: If the node has a grandparent, then based on the following scenarios; the rotation would
be performed:
Scenario 1: If the node is the right of the parent and the parent is also right of its parent, then zag
zag left left rotation is performed.
Scenario 2: If the node is left of a parent, but the parent is right of its parent, then zig zag right
left rotation is performed.
Scenario 3: If the node is left of the parent and the parent is left of its parent, then zig zig right
right rotation is performed.
Scenario 4: If the node is right of a parent, but the parent is left of its parent, then zag zig left-
right rotation is performed.
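The case analysis above can be sketched in C as a recursive splay function. This is a commonly used recursive formulation, shown under stated assumptions: the type snode and the names zig, zag and splay are chosen for this illustration, with zig meaning a right rotation and zag a left rotation as in the text. If the key is absent, the last node reached on the search path is brought to the root instead.

```c
#include <stddef.h>

/* Hypothetical splay tree node for illustration. */
struct snode {
    int key;
    struct snode *left, *right;
};

/* Zig: right rotation around x. */
static struct snode *zig(struct snode *x)
{
    struct snode *y = x->left;
    x->left = y->right;
    y->right = x;
    return y;
}

/* Zag: left rotation around x. */
static struct snode *zag(struct snode *x)
{
    struct snode *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}

/* Brings the node holding key to the root and returns the new root. */
struct snode *splay(struct snode *root, int key)
{
    if (root == NULL || root->key == key)
        return root;

    if (key < root->key) {
        if (root->left == NULL)
            return root;
        if (key < root->left->key) {             /* zig zig (left, left) */
            root->left->left = splay(root->left->left, key);
            root = zig(root);
        } else if (key > root->left->key) {      /* zag zig (left, right) */
            root->left->right = splay(root->left->right, key);
            if (root->left->right != NULL)
                root->left = zag(root->left);
        }
        return (root->left == NULL) ? root : zig(root);
    } else {
        if (root->right == NULL)
            return root;
        if (key > root->right->key) {            /* zag zag (right, right) */
            root->right->right = splay(root->right->right, key);
            root = zag(root);
        } else if (key < root->right->key) {     /* zig zag (right, left) */
            root->right->left = splay(root->right->left, key);
            if (root->right->left != NULL)
                root->right = zig(root->right);
        }
        return (root->right == NULL) ? root : zag(root);
    }
}
```

For the left chain 10 -> 7 -> 1, splay(root, 1) performs two right rotations and returns node 1 as the root, with 7 as its right child and 10 as the right child of 7.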
The zig & zag rotations are used when the item to be searched is either a root node or the child of a root
node (i.e., left or the right child).
The following are the cases that can exist in the splay tree while searching:
Case 2: If the search item is a child of the root node, then the two scenarios will be there:
1. If the child is a left child, the right rotation would be performed, known as a zig right
rotation.
2. If the child is a right child, the left rotation would be performed, known as a zag left
rotation.
In the above example, we have to search for element 20 in the tree. We follow the steps below:
Step 1: First, we compare 20 with the root node. As 20 is greater than the root node, it is the
right child of the root node.
Step 2: Once the element is found, we perform splaying. A left rotation is performed so that
element 20 becomes the root node of the tree.
Sometimes the item to be searched has a parent as well as a grandparent. In this case, one of
four double rotations is used for splaying: zig zig, zag zag, zig zag or zag zig.
Step 1: First, we have to perform a standard BST searching operation in order to search the 1
element. As 1 is less than 10 and 7, so it will be at the left of the node 7. Therefore, element 1 is
having a parent, i.e., 7 as well as a grandparent, i.e., 10.
Step 2: In this step, we have to perform splaying. We need to make node 1 as a root node with
the help of some rotations. In this case, we cannot simply perform a zig or zag rotation; we have
to implement zig zig rotation.
In order to make node 1 as a root node, we need to perform two right rotations known as zig zig
rotations. When we perform the right rotation then 10 will move downwards, and node 7 will
come upwards as shown in the below figure:
As we observe in the above figure that node 1 has become the root node of the tree; therefore, the
searching is completed.
Step 2: The second step is to perform splaying. In this case, two left rotations would be
performed. In the first rotation, node 10 will move downwards, and node 15 would move
upwards as shown below:
As we have observed that two left rotations are performed; so it is known as a zag zag left
rotation.
Till now, we have read that both parent and grandparent are either in RR or LL relationship.
Now, we will see the RL or LR relationship between the parent and the grandparent.
Step 2: Since node 13 is at the left of 15 and node 15 is at the right of node 10, so RL
relationship exists. First, we perform the right rotation on node 15, and 15 will move downwards,
and node 13 will come upwards, as shown below:
Still, node 13 is not the root node, and 13 is at the right of the root node, so we will perform left
rotation known as a zag rotation. The node 10 will move downwards, and 13 becomes the root
node as shown below:
Step 2: Since node 9 is at the right of node 7, and node 7 is at the left of node 10, so LR
relationship exists. First, we perform the left rotation on node 7. The node 7 will move
downwards, and node 9 moves upwards as shown below:
Still the node 9 is not a root node, and 9 is at the left of the root node, so we will perform the
right rotation known as zig rotation. After performing the right rotation, node 9 becomes the root
node, as shown below:
The major drawback of the splay tree is that it is not strictly balanced, only roughly
balanced. A splay tree can sometimes become linear, in which case an operation takes O(n) time.
In the insertion operation, we first insert the element in the tree and then perform the splaying
operation on the inserted element.
Step 1: First, we insert node 15 in the tree. After insertion, we need to perform splaying. As 15 is a
root node, so we do not need to perform splaying.
Step 2: The next element is 10. As 10 is less than 15, so node 10 will be the left child of node 15, as
shown below:
Now, we perform splaying. To make 10 as a root node, we will perform the right rotation, as
shown below:
Step 4: The next element is 7. As 7 is less than 17, 15, and 10, so node 7 will be left child of 10.
Now, we have to splay the tree. As 7 is having a parent as well as a grandparent so we will
perform two right rotations as shown below:
The deletion is still not completed. We need to splay the parent of the deleted node, i.e., 10. We
have to perform Splay(10) on the tree. As we can observe in the above tree that 10 is at the right
of node 7, and node 7 is at the left of node 13. So, first, we perform the left rotation on node 7
and then we perform the right rotation on node 13, as shown below:
Now, we have to delete the element 14 from the tree, which is shown below:
As we know that we cannot simply delete the internal node. We will replace the value of the
node either using inorder predecessor or inorder successor. Suppose we use inorder successor in
which we replace the value with the lowest value that exist in the right subtree. The lowest value
in the right subtree of node 14 is 15, so we replace the value 14 with 15. Since node 14 becomes
the leaf node, so we can simply delete it as shown below:
Top-down splaying
In top-down splaying, we first splay the node to be deleted and then delete it from the tree.
Once the element is deleted, we perform the join operation.
Let's understand the top-down splaying through an example.
Suppose we want to delete 16 from the tree which is shown below:
The node 16 is still not a root node, and it is a right child of the root node, so we need to perform
left rotation on the node 12 to make node 16 as a root node.
Once the node 16 becomes a root node, we will delete the node 16 and we will get two different
trees, i.e., left subtree and right subtree as shown below:
After performing two rotations on the tree, node 15 becomes the root node. As we can see, the
right child of the 15 is NULL, so we attach node 17 at the right part of the 15 as shown below,
and this operation is known as a join operation.
Pattern Matching and Tries: Pattern matching algorithms-Brute force, the Boyer –Moore
algorithm, the Knuth-Morris-Pratt algorithm, Standard Tries, Compressed Tries, Suffix tries
Pattern Matching
Pattern searching is an important problem in computer science. When we do search for a
string in notepad/word file or browser or database, pattern searching algorithms are used to
show the search results.
A typical problem statement would be-
Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char
txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m.
Examples:
Input: txt[] = "THIS IS A TEST TEXT"
pat[] = "TEST"
Output: Pattern found at index 10
Input: txt[] = "AABAACAADAABAABA"
pat[] = "AABA"
Output: Pattern found at index 0
Pattern found at index 9
Pattern found at index 12
Different Types of Pattern Matching Algorithms
1. Naive Algorithm or Brute Force Algorithm
2. Boyer Moore Algorithm
3. Knuth-Morris Pratt (KMP) Algorithm
Naive Algorithm or Brute Force Algorithm
When we talk about string matching, there is a simple technique anyone can come up with. Start
from the first letter of the text and the first letter of the pattern and check whether these
two letters are equal. If they are, check the second letters of the text and the pattern. If
they are not equal, move the first letter of the pattern to the second letter of the text and
check again.
The Brute Force string matching algorithm works exactly like that, which is why it is also
called the naive string matching algorithm.
do
if (text letter == pattern letter)
compare next letter of pattern to next letter of text
else
move pattern down text by one letter
while (entire pattern not found and not end of text)
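The pseudocode above can be turned into a small C function. This is a sketch under stated assumptions: the name naive_search and the output-array interface are made up for this illustration, and unlike the pseudocode it keeps going after a match so that all occurrences are reported.

```c
#include <string.h>

/* Stores up to max_out match positions in out[] and returns the total
 * number of occurrences of pat in txt. */
size_t naive_search(const char *txt, const char *pat,
                    size_t *out, size_t max_out)
{
    size_t n = strlen(txt), m = strlen(pat), count = 0;
    if (m == 0 || m > n)
        return 0;
    for (size_t i = 0; i + m <= n; i++) {       /* slide pattern by one */
        size_t j = 0;
        while (j < m && txt[i + j] == pat[j])   /* compare letter by letter */
            j++;
        if (j == m) {                           /* entire pattern matched */
            if (count < max_out)
                out[count] = i;
            count++;
        }
    }
    return count;
}
```

Called with txt = "AABAACAADAABAABA" and pat = "AABA", this returns 3 and stores the positions 0, 9 and 12, matching the example given earlier.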
In the above figure, red boxes mark letters of the pattern that mismatch the text and green
boxes mark letters that match.
In the first row we check whether the first letter of the pattern matches the first letter of
the text. It is a mismatch, because "S" is the first letter of the pattern and "T" is the first
letter of the text. Then we move the pattern by one position, as shown in the second row.
Example 2
Worst Case
Best case
Given a pattern M characters in length, and a text N characters in length...
• Best case if pattern found: Finds pattern in first M positions of
text. For example, M=5.
AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAA 5 comparisons made
• Total number of comparisons: M
• Best case time complexity: O(M)
Best case if pattern not found:
Always mismatch on first character. For example, M=5.
Disadvantages
1. Very inefficient, because the pattern is moved by only one position at a time.
If a character is compared that does not occur in the pattern, no match can be found by
comparing any further characters at this position, so the pattern can be shifted completely
past the mismatching character.
To determine the possible shifts, the B-M algorithm uses two preprocessing strategies
simultaneously. Whenever a mismatch occurs, the algorithm computes a shift using both strategies
and selects the longer one. Thus it makes use of the most efficient strategy for each individual
case.
NOTE: The Boyer Moore algorithm starts matching from the last character of the pattern.
The two strategies are called the heuristics of B-M, as they are used to reduce the search. They
are the bad character heuristic and the good suffix heuristic.
case 1
Explanation: In the above example, we got a mismatch at position 3. Here our mismatching
character is “A”. Now we will search for last occurrence of “A” in pattern. We got “A” at
position 1 in pattern (displayed in Blue) and this is the last occurrence of it. Now we will shift
pattern 2 times so that “A” in pattern get aligned with “A” in text.
case2
This means we need some extra information to produce a shift on encountering a bad character:
the last position of every character in the pattern, and the set of characters used in the
pattern.
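The bad character rule on its own can be sketched in C as follows. This is a simplified illustration, not the full Boyer Moore algorithm: it uses only the bad character heuristic (no good suffix rule), and the function name bm_bad_char_search, the 256-entry table and the "first match or -1" interface are assumptions made for this example.

```c
#include <string.h>

#define BM_ALPHABET 256   /* assumption: one table entry per byte value */

/* Returns the index of the first occurrence of pat in txt, or -1. */
long bm_bad_char_search(const char *txt, const char *pat)
{
    long n = (long)strlen(txt), m = (long)strlen(pat);
    long last[BM_ALPHABET];
    if (m == 0 || m > n)
        return -1;

    /* Preprocessing: last position of every character in the pattern,
     * -1 for characters that do not occur in it. */
    for (int c = 0; c < BM_ALPHABET; c++)
        last[c] = -1;
    for (long j = 0; j < m; j++)
        last[(unsigned char)pat[j]] = j;

    long s = 0;                                   /* current shift of pat */
    while (s <= n - m) {
        long j = m - 1;
        while (j >= 0 && pat[j] == txt[s + j])    /* match from the right */
            j--;
        if (j < 0)
            return s;                             /* full match at shift s */
        /* Align the bad character txt[s + j] with its last occurrence in
         * the pattern; always move forward by at least one position. */
        long shift = j - last[(unsigned char)txt[s + j]];
        s += (shift > 1) ? shift : 1;
    }
    return -1;
}
```

When the mismatching text character does not occur in the pattern at all, last[] is -1 and the pattern jumps completely past it, which is where the speed-up over the naive method comes from.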
Explanation: In the above example, a substring t of the text T matched with the pattern P (in
green) before the mismatch at index 2. Now we search for another occurrence of t (“AB”) in P. We
find an occurrence starting at position 1 (yellow background), so we shift the pattern right 2
times to align t in P with t in T. This is the weak rule of the original Boyer Moore algorithm.
This algorithm takes O(mn) time in the worst case and O(n log(m)/m) on average, which is
sublinear in the sense that not all characters are inspected.
Applications
This algorithm is highly useful in tasks like recursively searching files for virus patterns,
searching databases for keys or data, text and word processing, and any other task that requires
handling large amounts of data at very high speed.
txt[] = "AAAAAAAAAAAAAAAAAB"
pat[] = "AAAAB"
txt[] = "ABABABCABABABCABABABC"
pat[] = "ABABAC" (not a worst case, but a bad case for Naive)
The KMP algorithm is one of the most popular pattern matching algorithms. KMP stands for Knuth
Morris Pratt. The algorithm was invented by Donald Knuth and Vaughan Pratt together, and
independently by James H Morris, in the year 1970. In the year 1977, all three jointly published
the KMP algorithm.
The KMP algorithm is used to find a "Pattern" in a "Text". It compares character by character
from left to right, but whenever a mismatch occurs it uses a preprocessed table called the
"Prefix Table" to skip character comparisons while matching. The prefix table is also known as
the LPS table, where LPS stands for "Longest proper Prefix which is also Suffix".
We use the LPS table to decide how many characters can be skipped when a mismatch occurs.
When a mismatch occurs, check the LPS value of the character preceding the mismatched character
in the pattern.
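The LPS table and its use during matching can be sketched in C as follows. The function names compute_lps and kmp_search and the "first match or -1" interface are assumptions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Fills lps[i] with the length of the longest proper prefix of
 * pat[0..i] that is also a suffix of pat[0..i]. */
void compute_lps(const char *pat, size_t m, size_t *lps)
{
    size_t len = 0;            /* length of the current matched prefix */
    lps[0] = 0;
    for (size_t i = 1; i < m; ) {
        if (pat[i] == pat[len]) {
            lps[i++] = ++len;
        } else if (len != 0) {
            len = lps[len - 1];    /* fall back to a shorter prefix */
        } else {
            lps[i++] = 0;
        }
    }
}

/* Returns the index of the first occurrence of pat in txt, or -1. */
long kmp_search(const char *txt, const char *pat)
{
    size_t n = strlen(txt), m = strlen(pat);
    if (m == 0 || m > n)
        return -1;
    size_t *lps = malloc(m * sizeof *lps);
    if (lps == NULL)
        return -1;
    compute_lps(pat, m, lps);

    long found = -1;
    size_t i = 0, j = 0;       /* i indexes txt, j indexes pat */
    while (i < n) {
        if (txt[i] == pat[j]) {
            i++;
            j++;
            if (j == m) {
                found = (long)(i - m);
                break;
            }
        } else if (j != 0) {
            j = lps[j - 1];    /* LPS value of the previous pattern character */
        } else {
            i++;
        }
    }
    free(lps);
    return found;
}
```

For the pattern "ABABAC" the LPS table is 0, 0, 1, 2, 3, 0. Note that the text index i never moves backwards, which is what gives KMP its O(n + m) running time.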
EXAMPLE 1
Definition of a Trie
A trie is a data structure for representing a collection of strings.
In computer science, a trie is also called a digital tree, radix tree or prefix tree.
Tries support fast string matching.
Properties of Tries
EXAMPLE
Trie is an efficient information retrieval data structure. Using Trie, search complexities can be
brought to an optimal limit (key length).
Given multiple strings, the task is to insert the strings into a Trie.
Examples:
root
/ \
c t
| |
a h
|\ |
l t e
| | \
l i r
|\ | |
e i r e
| |
r n
/ |\
l n t
| |
l d
|\ |
e iy
| |
r n
Approach: An efficient approach is to treat every character of the input key as an individual
trie node and insert it into the trie. Note that the children are an array of pointers (or
references) to next level trie nodes. The key character acts as an index into the array of
children. If the input key is new or an extension of the existing key, we need to construct non-
existing nodes of the key, and mark end of the word for the last node. If the input key is a
prefix of the existing key in Trie, we simply mark the last node of the key as the end of a word.
The key length determines Trie depth.
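The approach above can be sketched in C as follows. This is a minimal illustration under stated assumptions: keys consist only of the lowercase letters 'a' to 'z', and the type trie_node and the function names are made up for this example.

```c
#include <stdbool.h>
#include <stdlib.h>

#define TRIE_ALPHABET 26   /* assumption: lowercase keys 'a'..'z' only */

struct trie_node {
    struct trie_node *children[TRIE_ALPHABET];
    bool end_of_word;      /* true if a key ends at this node */
};

/* Allocates a node with no children; calloc zero-fills the struct,
 * which is assumed here to yield NULL pointers and false. */
struct trie_node *trie_new_node(void)
{
    struct trie_node *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();
    return n;
}

/* Inserts key; returns false if it contains an unsupported character. */
bool trie_insert(struct trie_node *root, const char *key)
{
    struct trie_node *cur = root;
    for (; *key != '\0'; key++) {
        int idx = *key - 'a';          /* key character acts as the index */
        if (idx < 0 || idx >= TRIE_ALPHABET)
            return false;
        if (cur->children[idx] == NULL)
            cur->children[idx] = trie_new_node();   /* construct missing node */
        cur = cur->children[idx];
    }
    cur->end_of_word = true;           /* mark end of the word */
    return true;
}

/* Returns true only if key was inserted as a complete word. */
bool trie_search(const struct trie_node *root, const char *key)
{
    const struct trie_node *cur = root;
    for (; *key != '\0'; key++) {
        int idx = *key - 'a';
        if (idx < 0 || idx >= TRIE_ALPHABET)
            return false;
        cur = cur->children[idx];
        if (cur == NULL)
            return false;
    }
    return cur->end_of_word;
}
```

Both functions visit one node per character, so they run in O(L) time for a key of length L. A prefix of a stored key is reported as present only if it was itself inserted, because of the end-of-word mark.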
Trie deletion
1. Key may not be there in trie. Delete operation should not modify trie.
2. Key present as unique key (no part of key contains another key (prefix), nor the
key itself is prefix of another key in trie). Delete all the nodes.
3. Key is prefix key of another long key in trie. Unmark the leaf node.
4. Key present in trie, having at least one other key as prefix key. Delete nodes from end
of key until first leaf node of longest prefix key.
Time Complexity: The time complexity of the deletion operation is O(n) where n is the key
length
A trie is a tree that stores strings. The maximum number of children of a node is equal to
the size of the alphabet. A trie supports search, insert and delete operations in O(L)
time, where L is the length of the key.
Hashing: In hashing, we convert the key to a small value and the value is used to index
data. Hashing supports search, insert and delete operations in O(L) time on average.
Self Balancing BST: The time complexity of the search, insert and delete operations in a
self-balancing Binary Search Tree (BST) (like Red-Black Tree, AVL Tree, Splay Tree, etc.) is
O(L * log n), where n is the total number of words and L is the length of the word. The
advantage of self-balancing BSTs is that they maintain order, which makes operations like
minimum, maximum, closest (floor or ceiling) and kth largest faster.
Why Trie? :-
2. Another advantage of Trie is, we can easily print all words in alphabetical order which is
not easily possible with hashing.
APPLICATIONS OF TRIES
String handling and processing is one of the most important topics for programmers.
Many real-time applications are based on string processing.
A data structure that is very important for string handling is the Trie, which is based on
the prefixes of strings.
1. Standard Tries
2. Compressed Tries
3. Suffix Tries
STANDARD TRIES
Strings={ a,an,and,any}
Handling Keys(strings)
COMPRESSED TRIE
A compressed trie has the following properties:
A compressed trie can be stored in O(s) space, where s = |S|, by using O(1)-space index ranges
at the nodes.
1. A suffix trie is a compressed trie for all the suffixes of the text.
2. Suffix tries are a space-efficient data structure for storing a string that allows many kinds
of queries to be answered quickly.
Example