DATA-ENGINEERING 101

Data Structures and Algorithms
Shwetank Singh
GritSetGrow - GSGLearn.com
Arrays
Arrays are a collection of elements identified by index or key. They provide a way to store multiple items of the
same type together. Arrays have a fixed size, meaning that once they are created, their size cannot be
changed. This makes them efficient for accessing and modifying elements, but less flexible for dynamic data.
Arrays are commonly used in scenarios where the size of the dataset is known in advance and random access
to elements is required.

Example
An array can be visualized as a row of boxes where each box contains a value and has an index. For example,
in an array of integers [1, 2, 3, 4, 5], the value 3 is at index 2.
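A minimal Python sketch of that row-of-boxes picture. Python lists are dynamic rather than fixed-size, but the zero-based indexing behaviour matches an array:

```python
arr = [1, 2, 3, 4, 5]

value_at_2 = arr[2]   # O(1) read by index: the value 3 is at index 2
arr[2] = 30           # O(1) write by index
```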

AVL Trees
AVL trees are self-balancing binary search trees in which the heights of the left and right subtrees of every node differ by at most one. This property ensures that the tree remains balanced, providing O(log
n) time complexity for insertions, deletions, and lookups. AVL trees are used in scenarios where frequent
insertions and deletions occur, and maintaining balanced trees is critical for performance. The balancing is
achieved through rotations, which are tree restructuring operations that restore the balance factor after an
insertion or deletion. AVL trees are named after their inventors, Adelson-Velsky and Landis. The ability to maintain balance and guarantee efficient operations makes AVL trees well suited to applications requiring dynamic set operations and ordered data management.
Example
For example, a database index implemented with AVL trees ensures that query operations remain efficient
even with frequent updates to the database.
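The AVL property can be checked with a short sketch. This is not a full AVL implementation (no rotations); it only verifies that every node's balance factor stays within {-1, 0, 1}, with a hypothetical Node class for illustration:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    """Height of a subtree; an empty subtree has height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def is_avl(node):
    """True if every node's balance factor is -1, 0, or 1."""
    if node is None:
        return True
    balance = height(node.left) - height(node.right)
    return abs(balance) <= 1 and is_avl(node.left) and is_avl(node.right)
```

A tree that rotations would need to fix, such as a left-leaning chain 3-2-1, fails this check.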

Backtracking
Backtracking is a refinement of the brute force method that systematically searches for a solution to a
problem among all available options. It incrementally builds candidates to the solutions and abandons a
candidate (backtracks) as soon as it determines that the candidate cannot lead to a valid solution.
Backtracking is used in problems like the N-Queens problem, Sudoku solving, and generating permutations of
a set.

Example
For example, solving the N-Queens problem by placing queens one by one in different columns and
backtracking when a conflict is detected.
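A compact backtracking sketch for N-Queens. The column and diagonal sets are bookkeeping choices of this sketch, not part of any canonical formulation; the key moves are the recursive call (go deeper) and the undo steps (backtrack):

```python
def solve_n_queens(n):
    """Return all N-Queens placements as lists of column indices per row."""
    solutions = []

    def place(row, cols, diag1, diag2, current):
        if row == n:
            solutions.append(current[:])
            return
        for col in range(n):
            # Abandon this candidate if it conflicts with an earlier queen.
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue
            cols.add(col); diag1.add(row - col); diag2.add(row + col)
            current.append(col)
            place(row + 1, cols, diag1, diag2, current)  # go deeper
            current.pop()                                 # backtrack
            cols.remove(col); diag1.remove(row - col); diag2.remove(row + col)

    place(0, set(), set(), set(), [])
    return solutions
```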

Bellman-Ford Algorithm
The Bellman-Ford algorithm computes shortest paths from a single source vertex to all of the
other vertices in a weighted digraph. It is capable of handling graphs with negative edge weights and can
detect negative weight cycles. The algorithm works by iteratively relaxing all edges, updating the shortest
known distance to each vertex. If any distance is updated after V-1 iterations (where V is the number of
vertices), a negative weight cycle exists. Bellman-Ford is used in applications like routing in computer networks
and finding the shortest path in graphs with negative weights. The ability to handle negative weights and detect negative cycles makes Bellman-Ford suitable for complex graph scenarios where algorithms such as Dijkstra's are not applicable.

Example
For example, finding the shortest path from vertex A to all other vertices in a graph with edges that may have
negative weights, by repeatedly relaxing all edges and updating distances.
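The relax-all-edges loop described above can be sketched directly; the edge-list representation is an assumption of this sketch:

```python
def bellman_ford(vertices, edges, source):
    """Shortest distances from source; edges is a list of (u, v, weight).
    Raises ValueError if a negative weight cycle is reachable."""
    dist = {v: float('inf') for v in vertices}
    dist[source] = 0
    # Relax every edge V-1 times.
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # If any edge still relaxes, a negative weight cycle exists.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative weight cycle detected")
    return dist
```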

Binary Heaps
Binary heaps are complete binary trees which satisfy the heap property: in a max heap, for any given node,
the value of the node is greater than or equal to the values of its children; in a min heap, the value of the node
is less than or equal to the values of its children. Binary heaps are used to implement priority queues, allowing
efficient extraction of the maximum (or minimum) element. They provide logarithmic time complexity for
insertions and deletions. Binary heaps are also used in heapsort, an efficient sorting algorithm. The ability to manage priority elements efficiently while maintaining the heap property makes binary heaps well suited to priority queue implementations and sorting.

Example
For example, a priority queue implemented with a binary min heap allows quick access to the smallest
element, with efficient insertions and deletions.
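Python's standard heapq module maintains exactly this structure, a binary min heap stored in a list:

```python
import heapq

pq = []
for value in [7, 2, 9, 4]:
    heapq.heappush(pq, value)   # O(log n) insertion

smallest = heapq.heappop(pq)    # O(log n) extraction of the minimum
```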

Binary Search
Binary search is a search algorithm that finds the position of a target value within a sorted array. It works by
repeatedly dividing the search interval in half, comparing the target value to the middle element of the
interval, and narrowing the interval based on the comparison. Binary search has a time complexity of O(log n)
and is more efficient than linear search for large datasets. It is used in scenarios where the dataset is sorted,
and fast search times are required, such as in searching for elements in a sorted array, databases, and
dictionaries.

Example
For example, finding the position of the number 5 in a sorted array [1, 2, 3, 4, 5, 6, 7, 8, 9] by repeatedly
dividing the search interval and comparing the target value to the middle element.
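The halving loop can be written in a few lines (Python's standard bisect module offers the same behaviour ready-made):

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1      # target can only be in the right half
        else:
            hi = mid - 1      # target can only be in the left half
    return -1
```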

Binary Search Trees
Binary search trees (BSTs) are a node-based binary tree data structure that maintains sorted order. Each node
has at most two children, and for each node, all elements in the left subtree are less than the node, and all
elements in the right subtree are greater. BSTs are used in applications that require dynamic set operations
like insertions, deletions, and lookups in a sorted order. They provide average-case O(log n) time complexity
for these operations.

Example
For example, a BST with the root node 10, left child 5, and right child 15 ensures that any node in the left
subtree of 10 is less than 10, and any node in the right subtree is greater.
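A minimal BST sketch (no balancing, so worst-case operations degrade to O(n) on skewed input):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert key into the BST rooted at root; returns the (new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def contains(root, key):
    """Walk left or right at each node according to the BST ordering."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False
```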

Bitonic Sort
Bitonic sort is a parallel comparison-based sorting algorithm that builds a bitonic sequence (a sequence that is
monotonically increasing and then monotonically decreasing) and then merges it to produce the sorted
sequence. Bitonic sort is particularly useful for parallel processing; it performs O(n log^2 n) comparisons overall, giving O(log^2 n) parallel depth on O(n) processors. It is
used in applications where parallel processing is advantageous, such as in GPU computing and parallel
architectures. The ability to leverage parallel processing makes bitonic sort suitable for high-performance
computing applications requiring efficient sorting in parallel environments.

Example
For example, sorting an array [3, 7, 4, 8, 6, 2, 1, 5] using bitonic sort by first creating a bitonic sequence and
then merging it to get [1, 2, 3, 4, 5, 6, 7, 8].
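A sequential sketch of that build-then-merge structure (in practice the compare-and-swap steps would run in parallel). Note the assumption that the input length is a power of two:

```python
def bitonic_sort(arr, ascending=True):
    """Recursive bitonic sort; len(arr) must be a power of two."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    # Build a bitonic sequence: ascending first half, descending second half.
    first = bitonic_sort(arr[:mid], True)
    second = bitonic_sort(arr[mid:], False)
    return _bitonic_merge(first + second, ascending)

def _bitonic_merge(arr, ascending):
    """Merge a bitonic sequence into a sorted one."""
    if len(arr) <= 1:
        return list(arr)
    arr = list(arr)
    mid = len(arr) // 2
    for i in range(mid):               # compare-and-swap across the halves
        if (arr[i] > arr[i + mid]) == ascending:
            arr[i], arr[i + mid] = arr[i + mid], arr[i]
    return (_bitonic_merge(arr[:mid], ascending) +
            _bitonic_merge(arr[mid:], ascending))
```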

Bloom Filters
Bloom filters are space-efficient probabilistic data structures used to test whether an element is a member of
a set. They use multiple hash functions to map elements to a bit array, and a bit is set to 1 if any of the hash
functions map to that bit. Bloom filters allow false positives (indicating an element is in the set when it is not)
but no false negatives. They are used in applications like cache mechanisms, network security, and database
querying to quickly test for element membership with a small probability of error. Bloom filters are particularly
useful in scenarios where space is limited and approximate answers are acceptable. The ability to efficiently
test for set membership with a small error rate makes bloom filters suitable for applications where fast and
space-efficient membership testing is needed.

Example
For example, a Bloom filter can be used to check if an IP address is part of a known list of malicious addresses,
with a small chance of false positives but no false negatives.
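A toy Bloom filter sketch. The choice of SHA-256 with a salt prefix as the family of hash functions, and the sizes m and k, are illustrative assumptions; production filters use faster non-cryptographic hashes and tune m and k to the expected load:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k salted hash functions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _indices(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx] = True

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[idx] for idx in self._indices(item))
```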

Breadth-First Search
Breadth-first search (BFS) is an algorithm for traversing or searching tree or graph data structures. It starts at
the root (or an arbitrary node) and explores all neighbor nodes at the present depth before moving on to
nodes at the next depth level. BFS is used in applications like finding the shortest path in an unweighted
graph, level-order traversal of trees, and network broadcast algorithms. It uses a queue to keep track of the
nodes to be explored next. BFS is particularly useful for finding the shortest path in unweighted graphs and for
exploring nodes level by level. The ability to systematically explore nodes level by level makes BFS suitable for
applications requiring uniform exploration and shortest path finding in unweighted graphs.

Example
For example, finding the shortest path from a starting node to all other nodes in an unweighted graph by
exploring all neighbors at the current depth before moving to the next level.
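The queue-driven level-by-level exploration can be sketched with a deque and an adjacency dict (the graph representation here is an assumption of the sketch):

```python
from collections import deque

def bfs_distances(graph, start):
    """Shortest hop counts from start in an unweighted graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:          # first visit = shortest path
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist
```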

B-Trees
B-Trees are self-balancing search trees in which nodes can have multiple children. Unlike binary search trees,
B-Trees can store more than one key in a single node and have multiple children nodes. They are optimized for
systems that read and write large blocks of data, such as databases and filesystems. B-Trees maintain balance
through a set of properties, such as all leaf nodes being at the same depth and internal nodes having a range
of children between a minimum and maximum degree. B-Trees provide logarithmic time complexity for
insertions, deletions, and lookups, and they minimize disk access by keeping data as close to the root as
possible. The ability to manage large datasets efficiently while maintaining balance makes B-Trees well suited to database indexing and filesystem management.

Example
For example, B-Trees are used in database indexing to allow efficient querying and updating of records.

Bubble Sort
Bubble sort is a simple comparison-based sorting algorithm that repeatedly steps through the list, compares
adjacent elements, and swaps them if they are in the wrong order. This process is repeated until the list is
sorted. Bubble sort has a time complexity of O(n^2) in the worst and average case, making it inefficient for
large datasets. However, it is simple to understand and implement, and it can be optimized to stop early if the
list becomes sorted before all passes are complete. Bubble sort is used in educational settings and for small
datasets where simplicity and ease of implementation are more important than efficiency. The simplicity and
ease of understanding make bubble sort suitable for educational purposes and simple sorting tasks with small
datasets.

Example
For example, sorting an array [5, 3, 8, 4, 2] using bubble sort by repeatedly swapping adjacent elements until
the list is sorted to get [2, 3, 4, 5, 8].
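The passes, including the early-exit optimization mentioned above, can be sketched as:

```python
def bubble_sort(arr):
    """Repeatedly swap adjacent out-of-order pairs; stop early on a clean pass."""
    arr = list(arr)
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):     # the tail is already sorted
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:                # no swaps: the list is sorted
            break
    return arr
```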

Bucket Sort
Bucket sort is a sorting algorithm that works by distributing the elements of an array into a number of buckets.
Each bucket is then sorted individually, either using a different sorting algorithm or by recursively applying the
bucket sort. Finally, the sorted buckets are concatenated to form the sorted array. Bucket sort is useful for
sorting uniformly distributed data and has a time complexity of O(n + k), where n is the number of elements
and k is the number of buckets. It is often used in scenarios like histogram sort and when sorting floating-point
numbers in a fixed range. Bucket sort is particularly efficient when the input is uniformly distributed across a range, since the elements then spread evenly among the buckets, making it well suited to applications where uniform distribution over a fixed range is common.
Example
For example, sorting an array of floating-point numbers between 0 and 1 by distributing them into buckets
based on their value ranges, then sorting each bucket and concatenating the results.
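A sketch for floats in [0, 1), assuming that range so the bucket index is simply int(value * num_buckets):

```python
def bucket_sort(values, num_buckets=10):
    """Sort floats in [0, 1) by distributing them into buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[int(v * num_buckets)].append(v)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))   # sort each bucket individually
    return result                       # concatenation of sorted buckets
```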

Cocktail Shaker Sort
Cocktail shaker sort is a variation of bubble sort that sorts in both directions on each pass through the list. It
traverses the list in both directions, alternately performing bubble sort passes from left to right and then from right
to left. This bidirectional approach can help to move elements into place faster than bubble sort. Cocktail
shaker sort has a time complexity of O(n^2) in the worst and average case. It is used in scenarios where a
bidirectional approach can lead to better performance compared to bubble sort, such as in educational
settings and simple sorting tasks. The ability to sort in both directions makes cocktail shaker sort suitable for
simple and straightforward sorting tasks where performance improvements over bubble sort are desired.
Example
For example, sorting an array [5, 3, 8, 4, 2] using cocktail shaker sort by performing bubble sort from left to
right and then from right to left on each pass to get [2, 3, 4, 5, 8].
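The alternating passes, with the sorted region shrinking from both ends, can be sketched as:

```python
def cocktail_shaker_sort(arr):
    """Bubble sort that alternates left-to-right and right-to-left passes."""
    arr = list(arr)
    lo, hi = 0, len(arr) - 1
    swapped = True
    while swapped:
        swapped = False
        for i in range(lo, hi):            # forward pass bubbles the max right
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                swapped = True
        hi -= 1
        for i in range(hi, lo, -1):        # backward pass bubbles the min left
            if arr[i - 1] > arr[i]:
                arr[i - 1], arr[i] = arr[i], arr[i - 1]
                swapped = True
        lo += 1
    return arr
```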

Comb Sort
Comb sort improves on bubble sort by using a gap sequence that initially allows the comparison and exchange
of elements that are far apart. The gap is progressively decreased until it becomes 1, at which point the
algorithm effectively becomes bubble sort. Comb sort is more efficient than bubble sort and performs better
in practice for moderately sized lists. The time complexity of Comb sort is O(n^2) in the worst case but can be
much better in practice with a good gap sequence. Comb sort is used in scenarios where bubble sort would be
used, but better performance is desired, such as in educational settings and simple sorting tasks. The ability to
improve performance over bubble sort while maintaining simplicity makes comb sort suitable for
straightforward sorting tasks with moderate performance requirements.
Example
For example, sorting an array [23, 12, 1, 8, 34, 54, 2, 3] using Comb sort by initially comparing elements far
apart and gradually reducing the gap to achieve a sorted array [1, 2, 3, 8, 12, 23, 34, 54].
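A sketch using the commonly cited shrink factor of 1.3 (the shrink factor is a tuning assumption, not fixed by the algorithm):

```python
def comb_sort(arr):
    """Bubble sort with a shrinking comparison gap."""
    arr = list(arr)
    gap = len(arr)
    swapped = True
    while gap > 1 or swapped:
        gap = max(1, int(gap / 1.3))   # shrink the gap each pass
        swapped = False
        for i in range(len(arr) - gap):
            if arr[i] > arr[i + gap]:
                arr[i], arr[i + gap] = arr[i + gap], arr[i]
                swapped = True
    return arr                          # gap 1 behaves like bubble sort
```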

Counting Sort
Counting sort is an algorithm for sorting a collection of objects according to keys that are small integers. It
counts the number of objects that have each distinct key value, and uses arithmetic to determine the positions
of each key in the output sequence. Counting sort is stable and has a time complexity of O(n + k), where n is
the number of elements and k is the range of the input. It is particularly useful for sorting integers and for
applications like radix sort. Counting sort is used in scenarios where the range of input values is known and
relatively small, making it efficient in both time and space. The ability to sort integers efficiently and stably makes counting sort well suited to applications with known, limited input ranges.
Example
For example, sorting an array of integers between 0 and 9 by counting the occurrences of each integer and
then placing them in the correct positions in the output array based on their counts.
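The count-then-place steps can be sketched for keys in [0, max_key]; the left-to-right placement pass is what keeps the sort stable:

```python
def counting_sort(arr, max_key=9):
    """Stable counting sort for integer keys in [0, max_key]."""
    counts = [0] * (max_key + 1)
    for x in arr:
        counts[x] += 1
    # Prefix sums give the first output position for each key.
    positions = [0] * (max_key + 1)
    for k in range(1, max_key + 1):
        positions[k] = positions[k - 1] + counts[k - 1]
    output = [0] * len(arr)
    for x in arr:                      # left-to-right keeps equal keys in order
        output[positions[x]] = x
        positions[x] += 1
    return output
```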

Cycle Sort
Cycle sort is an in-place, unstable sorting algorithm that is optimal in terms of the number of writes to the
array. It works by finding cycles in the permutation of the array and then rotating the elements within each
cycle to their correct positions. Cycle sort has a time complexity of O(n^2) in the worst case, but it minimizes
the number of writes, making it useful for scenarios where writing to memory is expensive. Cycle sort is used
in applications where minimizing write operations is crucial, such as in write-limited storage devices. The
ability to minimize write operations makes cycle sort suitable for applications involving write-limited memory
or storage devices.
Example
For example, sorting an array [1, 8, 3, 9, 10, 10, 2, 4] using cycle sort by rotating elements within each cycle to
their correct positions, minimizing the number of writes.
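A sketch of the cycle-rotation idea; each element's final slot is found by counting how many elements to its right are smaller, and the write counter shows how few writes occur:

```python
def cycle_sort(arr):
    """Sort arr in place with a minimal number of writes; returns the count."""
    writes = 0
    for start in range(len(arr) - 1):
        item = arr[start]
        # Destination index = start + number of smaller elements to the right.
        pos = start + sum(1 for x in arr[start + 1:] if x < item)
        if pos == start:
            continue                   # already in its correct position
        while arr[pos] == item:        # skip past duplicates
            pos += 1
        arr[pos], item = item, arr[pos]
        writes += 1
        while pos != start:            # rotate the rest of the cycle
            pos = start + sum(1 for x in arr[start + 1:] if x < item)
            while arr[pos] == item:
                pos += 1
            arr[pos], item = item, arr[pos]
            writes += 1
    return writes
```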

Depth-First Search
Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. It starts at the
root (or an arbitrary node) and explores as far as possible along each branch before backtracking. DFS is used
in applications like finding connected components in a graph, solving puzzles with only one solution, and
topological sorting. It uses a stack (explicit or implicit via recursion) to keep track of the path being explored.
DFS is particularly useful for exploring all possible paths in a graph and for solving problems where the
solution requires exploring all possibilities. The ability to systematically explore all paths in a graph makes DFS
suitable for applications requiring exhaustive exploration and solution finding in complex graphs and puzzles.
Example
For example, exploring all possible paths in a maze by going as deep as possible along each path before
backtracking to explore other paths.
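The go-deep-then-backtrack behaviour can be sketched as a recursive generator over a maze represented as an adjacency dict (a representation assumed for the sketch):

```python
def dfs_paths(maze, start, goal, path=None):
    """Yield every path from start to goal by depth-first exploration."""
    if path is None:
        path = [start]
    if start == goal:
        yield list(path)
        return
    for neighbor in maze[start]:
        if neighbor not in path:       # avoid revisiting on this path
            path.append(neighbor)
            yield from dfs_paths(maze, neighbor, goal, path)  # go deeper
            path.pop()                 # backtrack to try other branches
```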

Dijkstra’s Algorithm
Dijkstra’s algorithm is a shortest path algorithm for finding the shortest paths between nodes in a graph,
which may represent, for example, road networks. It works by iteratively selecting the vertex with the
minimum distance from the source and updating the distances to its neighbors. The algorithm uses a priority
queue to efficiently select the vertex with the minimum distance. Dijkstra's algorithm is widely used in network
routing protocols and geographical mapping services. However, it only works with graphs that have non-
negative edge weights. Dijkstra's algorithm provides a simple and efficient way to find the shortest path in
graphs with non-negative weights, making it suitable for many practical applications. The ability to find the
shortest path in graphs with non-negative weights makes Dijkstra's algorithm essential for efficient routing
and navigation in various networked systems.
Example
For example, finding the shortest path from vertex A to all other vertices in a graph with weighted edges by
continuously updating the shortest known distance to each vertex until all vertices have been processed.
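The priority-queue-driven loop can be sketched with heapq; stale queue entries are simply skipped rather than removed, a common implementation choice:

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> {neighbor: weight}.
    Assumes non-negative edge weights."""
    dist = {source: 0}
    pq = [(0, source)]                 # priority queue of (distance, node)
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float('inf')):
            continue                   # stale entry, already improved
        for neighbor, w in graph[node].items():
            nd = d + w
            if nd < dist.get(neighbor, float('inf')):
                dist[neighbor] = nd
                heapq.heappush(pq, (nd, neighbor))
    return dist
```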

Divide and Conquer
Divide and conquer is an algorithm design paradigm based on multi-branched recursion. The problem is
divided into smaller subproblems that are solved independently and then combined to produce the final
solution. This approach is used in algorithms like mergesort, quicksort, and binary search. The main advantage
is that it simplifies complex problems and can significantly reduce time complexity, especially for problems
that can be broken down into smaller, independent subproblems.

Example
For example, mergesort divides the array into two halves, sorts them, and then merges the sorted halves.
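That divide-sort-merge pattern can be sketched as:

```python
def merge_sort(arr):
    """Divide the array, recursively sort both halves, then merge them."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left, right = merge_sort(arr[:mid]), merge_sort(arr[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge the two sorted halves
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]      # append whichever half remains
```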

Dynamic Arrays
Dynamic arrays are arrays that can grow and shrink in size at runtime. Unlike static arrays, dynamic arrays
allocate additional space when the array reaches its capacity, usually doubling the capacity to amortize the
cost of resizing. This allows for efficient random access and dynamic resizing. Dynamic arrays are used in
scenarios where the size of the dataset can change dynamically, such as in implementing data structures like
stacks, queues, and hash tables. Dynamic arrays provide average-case O(1) time complexity for append
operations (amortized) and O(n) for an individual resizing operation. The ability to resize dynamically while maintaining efficient access and update times makes dynamic arrays well suited to applications with unpredictable data growth.
Example
For example, Python's list implementation is a dynamic array that resizes itself when it runs out of space,
allowing efficient appends and random access.
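The over-allocation can be observed directly in CPython (the exact growth pattern is an implementation detail and varies by version):

```python
import sys

# Track the byte size of a growing list: it stays flat between capacity
# jumps, which is why appends are O(1) amortized.
sizes = []
lst = []
for _ in range(20):
    lst.append(0)
    sizes.append(sys.getsizeof(lst))
```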

Dynamic Programming
Dynamic programming is a method for solving complex problems by breaking them down into simpler
subproblems and storing the results of these subproblems to avoid redundant computations. It is used for
optimization problems where the solution can be constructed from solutions to subproblems. Dynamic
programming can be applied using memoization (top-down approach) or tabulation (bottom-up approach). It
is commonly used in problems like the shortest path, knapsack problem, and sequence alignment in
bioinformatics.

Example
For example, solving the Fibonacci sequence with dynamic programming: store the results of Fibonacci(n-1)
and Fibonacci(n-2) to compute Fibonacci(n) efficiently.
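Both styles can be sketched side by side: memoization via the standard lru_cache decorator, and tabulation reduced to two rolling values:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    """Top-down: recurse, caching each subproblem result."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

def fib_tab(n):
    """Bottom-up: build from the base cases, keeping only two values."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```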

Floyd-Warshall Algorithm
The Floyd-Warshall algorithm finds shortest paths in a weighted graph with positive or
negative edge weights (but with no negative cycles). It works by considering all pairs of vertices and iteratively
improving the shortest path between them using an intermediate vertex. The algorithm uses a dynamic
programming approach and is particularly useful for dense graphs or when the shortest paths between all
pairs of vertices are required. It has a time complexity of O(V^3), where V is the number of vertices. Floyd-
Warshall is used in applications like network routing, where it is important to compute the shortest paths
between all pairs of nodes. The ability to compute shortest paths between every pair of vertices makes Floyd-Warshall well suited to scenarios where comprehensive distance information is required.
Example
For example, finding the shortest paths between all pairs of vertices in a graph by iteratively updating the
shortest path between each pair using every other vertex as an intermediate point.
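The triple loop over intermediate vertices can be sketched as follows; representing edges as a dict keyed by (u, v) pairs is an assumption of this sketch:

```python
def floyd_warshall(vertices, weights):
    """All-pairs shortest paths. weights: dict {(u, v): w} for each edge."""
    INF = float('inf')
    dist = {(u, v): (0 if u == v else weights.get((u, v), INF))
            for u in vertices for v in vertices}
    for k in vertices:                 # allow k as an intermediate vertex
        for i in vertices:
            for j in vertices:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist
```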

Graph Traversal
Graph traversal is the process of visiting all the nodes in a graph in a systematic way. The main types of graph
traversal are: (1) Breadth-First Search (BFS) which explores all neighbor nodes at the present depth before
moving on to nodes at the next depth level, and (2) Depth-First Search (DFS) which explores as far as possible
along each branch before backtracking. Graph traversal is used in various applications like finding the shortest
path in unweighted graphs, detecting cycles, and searching for connected components. BFS uses a queue to
keep track of the nodes to be explored next, while DFS uses a stack (explicit or implicit via recursion). Each
traversal type has its own use cases and can be applied to both directed and undirected graphs. The ability to
systematically explore all nodes and edges makes graph traversal essential for comprehensive graph analysis
and manipulation.
Example
For example, finding all reachable nodes from a given node in a graph using BFS or DFS. BFS would explore
nodes level by level, while DFS would explore nodes by going as deep as possible along each path before
backtracking.

Graphs
Graphs are collections of nodes (vertices) and edges connecting pairs of nodes. Graphs can be directed (edges
have a direction) or undirected (edges have no direction). They are used to represent networks such as social
networks, transportation networks, and dependency graphs. Graph traversal algorithms, such as depth-first
search (DFS) and breadth-first search (BFS), are used to explore nodes and edges in a systematic way.

Example
A graph can be represented as an adjacency list or adjacency matrix. For example, a graph with vertices A, B, C
and edges A-B, A-C can be represented as A: [B, C], B: [A], C: [A].
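Both representations from the example, written out in Python:

```python
# Undirected graph with vertices A, B, C and edges A-B, A-C as an adjacency list.
graph = {'A': ['B', 'C'], 'B': ['A'], 'C': ['A']}

# The same graph as an adjacency matrix (rows and columns in order A, B, C).
matrix = [
    [0, 1, 1],   # A connects to B and C
    [1, 0, 0],   # B connects to A
    [1, 0, 0],   # C connects to A
]
```

The list form is compact for sparse graphs; the matrix form gives O(1) edge lookups at O(V^2) space.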

Greedy Algorithms
Greedy algorithms build up a solution piece by piece, always choosing the next piece that offers the most
immediate benefit. These algorithms make locally optimal choices at each step with the hope of finding a
global optimum. Greedy algorithms are used in optimization problems like finding the minimum spanning
tree, the shortest path in a weighted graph, and in tasks like scheduling and resource allocation. The main
drawback is that they do not always produce the optimal solution for all problems.

Example
For example, in the coin change problem, always choosing the largest denomination coin first to make up the
total amount.
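A sketch of that greedy strategy. It happens to be optimal for canonical coin systems such as US denominations, but, as noted above, greedy choices do not guarantee optimality in general (e.g. denominations (4, 3, 1) for amount 6):

```python
def greedy_change(amount, denominations=(25, 10, 5, 1)):
    """Make change by always taking the largest denomination that fits."""
    coins = []
    for d in sorted(denominations, reverse=True):
        while amount >= d:
            coins.append(d)
            amount -= d
    return coins
```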

Hash Functions
Hash functions are functions that can be used to map data of arbitrary size to fixed-size values, typically for
use in hash tables. A good hash function should distribute keys uniformly across the hash table to minimize
collisions. Hash functions are used in various applications like data retrieval, cryptography, and data integrity
verification. In hash tables, hash functions are used to compute an index into an array of buckets or slots, from
which the desired value can be found. A good hash function should be fast to compute and minimize the
number of collisions, ensuring efficient insertion, deletion, and lookup operations. The ability to efficiently
map keys to unique indices makes hash functions essential for fast data retrieval and integrity checking in
numerous applications.
Example
For example, using a hash function to map a string key to an index in a hash table for efficient retrieval of
values. A simple hash function for strings could sum the ASCII values of the characters and take the modulo
with the size of the table.
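The character-sum hash described above, as a sketch (it is deliberately naive: anagrams like "abc" and "cab" collide, which is why real hash tables use better-distributed functions):

```python
def string_hash(key, table_size):
    """Toy hash: sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in key) % table_size
```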

Hash Tables
Hash tables are data structures that map keys to values for highly efficient lookup. They use a hash function to
compute an index into an array of buckets or slots, from which the desired value can be found. Hash tables
are commonly used for implementing associative arrays, database indexing, and caches. They provide
average-case O(1) time complexity for insertions, deletions, and lookups, making them extremely efficient for
these operations.

Example
In Python, dictionaries are implemented as hash tables. For example, hash_table = {'a': 1, 'b': 2} allows quick
access to values 1 and 2 using keys 'a' and 'b'.

Heap Sort
Heap sort is a comparison-based sorting technique based on a binary heap data structure. It first builds a max
heap (or min heap) from the input data, then repeatedly extracts the maximum (or minimum) element from
the heap and rebuilds the heap until all elements are sorted. Heap sort has a time complexity of O(n log n) for
all cases and is not a stable sort. It is particularly useful for applications that require a reliable O(n log n) sort
and where space complexity is a concern, as it sorts in place with O(1) auxiliary space. Heap sort is used in
scenarios where efficient, in-place sorting is required, such as in embedded systems and resource-constrained
environments. The ability to efficiently sort data in place with a reliable time complexity makes heap sort
suitable for applications where memory efficiency and sorting performance are important.
Example
For example, sorting an array [4, 10, 3, 5, 1] using heap sort involves building a max heap and then extracting
the largest element repeatedly to get [1, 3, 4, 5, 10].
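A short sketch using Python's heapq (a min heap), so the minimum is extracted repeatedly; this trades the in-place O(1)-space max-heap formulation described above for brevity:

```python
import heapq

def heap_sort(arr):
    """Heapify, then pop the minimum until the heap is empty."""
    heap = list(arr)
    heapq.heapify(heap)                # O(n) heap construction
    return [heapq.heappop(heap) for _ in range(len(heap))]
```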

Heaps
A heap is a special tree-based data structure that satisfies the heap property: in a max heap, for any given
node I, the value of I is greater than or equal to the values of its children. In a min heap, the value of I is less
than or equal to its children. Heaps are used to implement priority queues, which allow efficient access to the
highest (or lowest) priority element. They are also used in algorithms like heapsort and for finding the k largest
(or smallest) elements in a dataset.

Example
A min heap ensures the smallest element is always at the root. For example, in a min heap with elements 1, 3,
6, 1 is the root, and its children 3 and 6 are larger.

Insertion Sort
Insertion sort is a simple sorting algorithm that builds the final sorted array one item at a time. It is much less
efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. However,
insertion sort has the advantage of being simple to implement and efficient for small datasets or nearly sorted
data. It works by iterating through the input array, growing the sorted portion by inserting each new element
into its correct position. Insertion sort has a time complexity of O(n^2) in the worst case but performs well on
small or nearly sorted datasets. It is used in scenarios like sorting small lists or as a part of more complex
algorithms like introsort. The ability to efficiently sort small or nearly sorted datasets makes insertion sort
suitable for quick and simple sorting tasks.
Example
For example, sorting a small list of integers by repeatedly inserting each element into its correct position in the
already sorted portion of the list.
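The insertion step described above can be sketched in a few lines of Python:

```python
def insertion_sort(a):
    """Grow a sorted prefix by inserting each new element into position."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:   # shift larger elements one slot right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                 # drop the element into its gap
    return a
```

On a nearly sorted input the inner while loop rarely runs, which is why the algorithm approaches O(n) in that case.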

Kruskal’s Algorithm
Kruskal’s algorithm is a minimum spanning tree algorithm which finds an edge of the least possible weight that
connects any two trees in the forest. It starts with a forest (a set of trees) where each vertex in the graph is a separate
tree. The algorithm repeatedly adds the shortest edge that does not form a cycle, until there is one tree that spans all
vertices. Kruskal's algorithm is used in network design, such as laying out cables or pipelines, where it is important to
minimize the total length or cost. It is particularly effective for sparse graphs, where it performs better than Prim's
algorithm. Unlike Prim's algorithm, Kruskal's algorithm grows a forest whose trees gradually merge, so the
partial result need not be connected until the final edge is added. The ability to find the minimum spanning
tree by considering edges in increasing weight order makes Kruskal's algorithm suitable for applications in
network design and infrastructure planning where minimal cost and full connectivity are essential.
Example
For example, finding the minimum spanning tree in a graph with vertices {A, B, C, D} and edges with weights
{(A, B, 1), (B, C, 2), (C, D, 3), (A, D, 4)} by adding the smallest edge that does not form a cycle until all vertices
are connected.
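A minimal sketch of Kruskal's algorithm for the graph in the example, using a small union-find structure for cycle detection (the `(weight, u, v)` edge encoding is an illustrative choice):

```python
def kruskal(vertices, edges):
    """edges: iterable of (weight, u, v) tuples; returns the MST edge list."""
    parent = {v: v for v in vertices}

    def find(x):
        # Find the set representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):      # consider edges in increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # skip edges that would form a cycle
            parent[ru] = rv
            mst.append((w, u, v))
    return mst
```

For the example graph, the edge (A, D, 4) is rejected because A and D are already connected, leaving a tree of total weight 6.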

Linear Search
Linear search is a search algorithm that checks each element of a list sequentially until the desired element is
found. It has a time complexity of O(n), making it less efficient for large datasets compared to binary search.
However, linear search is simple to implement and does not require the data to be sorted. It is used in
scenarios where the dataset is small or unsorted, and quick implementation is required, such as in searching
for an element in an unsorted array or list. The simplicity and applicability to unsorted data make linear search
suitable for straightforward search tasks in small or unordered datasets.

Example
For example, finding the number 5 in an unsorted array [4, 2, 5, 1, 3] by checking each element sequentially
until the target value is found.
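The sequential scan from the example is one loop:

```python
def linear_search(items, target):
    """Return the index of target, or -1 if it is absent."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1
```

`linear_search([4, 2, 5, 1, 3], 5)` returns 2 after checking three elements; no sorting is required.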

Linked List Variants
Linked lists have several variants, including singly linked lists, doubly linked lists, and circular linked lists. Singly
linked lists have nodes with data and a reference to the next node. Doubly linked lists have nodes with
references to both the next and previous nodes, allowing traversal in both directions. Circular linked lists have
the last node pointing back to the first node, forming a circle. Each variant has its own advantages and use
cases, such as simpler implementation and less memory overhead for singly linked lists, and efficient
bidirectional traversal for doubly linked lists. Linked lists are used in scenarios where dynamic size and
frequent insertions/deletions are required. The flexibility and variety of linked list variants make them suitable
for different use cases, such as implementing queues, stacks, and more complex data structures requiring
dynamic and efficient data management.
Example
For example, a doubly linked list allows traversal in both directions, which is useful for implementing a deque
(double-ended queue) where elements can be added or removed from both ends.

Linked Lists
A linked list is a linear data structure where each element, known as a node, contains a data part and a
reference (or link) to the next node in the sequence. Linked lists are dynamic in size, meaning elements can be
added or removed easily. They are particularly useful for applications where frequent insertions and deletions
occur. However, linked lists do not allow random access and have higher memory overhead due to the storage
of pointers.

Example
A singly linked list consists of nodes where each node contains data and a reference to the next node. For
instance, a list containing 1 -> 2 -> 3 means the first node holds 1 and points to the second node holding 2,
which points to the third node holding 3.

Merge Sort
Merge sort is an efficient, stable, comparison-based, divide-and-conquer sorting algorithm. It works by
recursively splitting the input array into two halves, sorting each half, and then merging the sorted halves to
produce the sorted array. Merge sort has a time complexity of O(n log n) for all cases, making it suitable for
large datasets. It is stable, meaning that it maintains the relative order of equal elements, and it can be easily
adapted for use with linked lists and external sorting (sorting data on disk). Merge sort is used in scenarios
where stability and guaranteed time complexity are important, such as in database and large-scale data
processing. The ability to efficiently sort large datasets while maintaining stability makes merge sort suitable
for applications requiring reliable and consistent sorting performance.

Example
For example, sorting an array [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5] by recursively dividing it into smaller arrays and
merging the sorted arrays to get [1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 9].
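The split-and-merge recursion can be sketched as follows; the `<=` in the merge step is what keeps the sort stable:

```python
def merge_sort(a):
    """Split, sort each half recursively, then merge the sorted halves."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # <= preserves the order of equal keys
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]   # append whichever half remains
```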

Pigeonhole Sort
Pigeonhole sort is a non-comparison sorting algorithm that is suitable for sorting lists of elements where the
number of elements and the range of possible key values are approximately the same. It works by first
creating an array of pigeonholes, each representing a possible key value, and then placing each element into
its corresponding pigeonhole. Finally, the elements are collected from the pigeonholes in order. Pigeonhole
sort has a time complexity of O(n + k), where n is the number of elements and k is the range of the key values.
It is particularly useful for sorting integers in a small range. The ability to efficiently sort integers with a small
range makes pigeonhole sort suitable for specialized sorting tasks with known and limited key ranges.
Example
For example, sorting an array of integers [8, 3, 2, 7, 4, 6, 8] with values in the range 0 to 9 using pigeonhole
sort by placing each element in its corresponding position and then collecting them in order to get [2, 3, 4, 6, 7,
8, 8].
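A minimal sketch for integer keys, where each "pigeonhole" is a counter for one possible value:

```python
def pigeonhole_sort(a):
    """Place each value in its pigeonhole, then read the holes back in order."""
    lo, hi = min(a), max(a)
    holes = [0] * (hi - lo + 1)        # one hole per possible key value
    for x in a:
        holes[x - lo] += 1
    out = []
    for i, count in enumerate(holes):
        out.extend([i + lo] * count)   # emit each value as many times as seen
    return out
```

The `holes` array has size k (the key range), which is why the algorithm only pays off when k is comparable to n.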

Prim’s Algorithm
Prim’s algorithm is a minimum spanning tree algorithm which starts with a single vertex and adds the lowest-
weight edges one at a time. The algorithm grows the spanning tree from an arbitrary starting vertex by adding
the cheapest edge from the tree to a vertex not yet in the tree. Prim's algorithm is used in network design,
where the goal is to minimize the total length or cost. It is particularly effective for dense graphs, where it
performs better than Kruskal's algorithm. Prim's algorithm ensures that the growing spanning tree is always
connected, making it suitable for scenarios where connectivity must be maintained. The ability to find the
minimum spanning tree by continuously adding the cheapest edge makes Prim's algorithm suitable for
applications requiring efficient and connected network designs with minimal costs.

Example
For example, starting from vertex A in a graph, Prim’s algorithm adds the smallest edge connecting A to
another vertex, then adds the smallest edge connecting any vertex in the tree to another vertex, and so on
until all vertices are included.
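A minimal sketch of Prim's algorithm using a heap as the edge frontier; the adjacency-list encoding `{vertex: [(weight, neighbor), ...]}` is an illustrative choice:

```python
import heapq

def prim(graph, start):
    """graph: {vertex: [(weight, neighbor), ...]}; returns the total MST weight."""
    visited = {start}
    frontier = list(graph[start])          # edges leaving the growing tree
    heapq.heapify(frontier)
    total = 0
    while frontier:
        w, v = heapq.heappop(frontier)     # cheapest edge out of the tree
        if v in visited:
            continue                       # stale edge: endpoint already added
        visited.add(v)
        total += w
        for edge in graph[v]:
            heapq.heappush(frontier, edge)
    return total
```

On the four-vertex graph from the Kruskal example (A-B 1, B-C 2, C-D 3, A-D 4), starting at A picks the edges of weight 1, 2, and 3 for a total of 6.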

Priority Queues
Priority queues are abstract data types in which each element has a "priority" associated with it. Elements with
higher priority are served before elements with lower priority. Priority queues can be implemented using
binary heaps, which provide efficient O(log n) time complexity for insertion and deletion of the highest priority
element. Priority queues are used in various applications like task scheduling, Dijkstra's algorithm for shortest
paths, and Huffman coding. The main operations are insert (add an element with a priority) and extract-max
or extract-min (remove the element with the highest or lowest priority). The ability to efficiently manage and
retrieve elements by priority makes priority queues essential for applications requiring prioritized task and
resource management.

Example
For example, a task scheduler that always executes the highest priority task next can be implemented using a
priority queue.
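A minimal task-scheduler sketch backed by `heapq` (a min heap, so smaller numbers mean higher priority here; the task names are illustrative):

```python
import heapq

tasks = []
heapq.heappush(tasks, (2, "write report"))   # (priority, payload) tuples
heapq.heappush(tasks, (1, "fix outage"))     # smallest number = served first
heapq.heappush(tasks, (3, "clean inbox"))
next_task = heapq.heappop(tasks)[1]          # extract-min in O(log n)
```

For a max-priority queue with the same module, a common trick is to push negated priorities.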

Queues
A queue is a collection of elements that supports first-in, first-out (FIFO) semantics. This means that the first
element added is the first one removed. Queues are useful for scenarios where order must be preserved, such
as task scheduling, breadth-first search in graphs, and buffering data streams. Operations typically include
enqueue (add an element to the end) and dequeue (remove the front element).

Example
A queue can be visualized as a line of people. The person who gets in line first is served first. For example,
enqueuing 1, 2, 3 results in 1 being at the front. Dequeuing will remove 1, leaving 2 at the front.
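In Python, `collections.deque` gives O(1) enqueue and dequeue (a plain list would pay O(n) for `pop(0)`):

```python
from collections import deque

q = deque()
q.append(1)          # enqueue at the back
q.append(2)
q.append(3)
front = q.popleft()  # dequeue from the front: the first element added
```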

Quicksort
Quicksort is an efficient sorting algorithm that, on average, makes O(n log n) comparisons to sort n items. It is a
divide-and-conquer algorithm that works by selecting a 'pivot' element from the array and partitioning the other
elements into two sub-arrays according to whether they are less than or greater than the pivot. The sub-arrays are
then sorted recursively. Quicksort is generally faster in practice compared to other O(n log n) algorithms like merge
sort and heapsort, but its performance degrades to O(n^2) in the worst case. However, this can be mitigated with
good pivot selection strategies. Quicksort is used in many applications like databases and search engines where
performance is critical. The ability to efficiently sort large datasets with good average performance makes quicksort
suitable for applications where speed and efficiency are paramount.

Example
For example, sorting an array [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5] by selecting a pivot, partitioning the array into
elements less than and greater than the pivot, and recursively sorting the partitions to get [1, 1, 2, 3, 3, 4, 5, 5,
5, 6, 9].
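A minimal sketch of the partition-and-recurse idea (this version trades the usual in-place partitioning for clarity, building new lists on each call):

```python
def quicksort(a):
    """Partition around a pivot, then sort each side recursively."""
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]                    # middle element as the pivot
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```

Picking the middle element (or a random one) as the pivot is one of the simple strategies that avoids the O(n^2) behavior on already-sorted input that a first-element pivot would cause.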

Radix Sort
Radix sort is a non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by
the individual digits which share the same significant position and value. The sorting is done by processing
each digit, from least significant to most significant, using a stable sorting algorithm like counting sort. Radix
sort has a time complexity of O(kn), where k is the number of digits in the largest number, and n is the number
of elements. It is particularly efficient for large datasets of fixed-length integers or strings. Radix sort is used in
scenarios where comparison-based sorting algorithms are less efficient, such as in sorting large datasets of
phone numbers or IP addresses. The ability to efficiently sort large datasets with fixed-length keys makes radix
sort suitable for specialized sorting tasks where traditional algorithms may be less effective.
Example
For example, sorting an array of phone numbers by first sorting based on the least significant digit, then the
next digit, and so on until all digits are processed.
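A minimal least-significant-digit radix sort for non-negative integers, with one stable bucketing pass per decimal digit:

```python
def radix_sort(a):
    """LSD radix sort for non-negative integers, one decimal digit per pass."""
    digit = 1
    while any(x // digit > 0 for x in a):
        buckets = [[] for _ in range(10)]          # one bucket per digit 0-9
        for x in a:
            buckets[(x // digit) % 10].append(x)   # stable: bucket order kept
        a = [x for bucket in buckets for x in bucket]
        digit *= 10
    return a
```

Stability of each pass is essential: it preserves the ordering established by the earlier, less significant digits.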

Recursion
Recursion is a method of solving a problem where the solution depends on solutions to smaller instances of
the same problem. This technique involves a function calling itself to solve subproblems. Recursion is used in
various applications like traversing trees, solving mathematical problems such as factorial and Fibonacci
series, and many divide-and-conquer algorithms. Proper use of base cases is essential to prevent infinite
recursion and stack overflow errors.

Example
For example, calculating the factorial of a number n recursively: factorial(n) = n * factorial(n-1) with the base
case factorial(0) = 1.
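The factorial example in code, with the base case that stops the self-calls:

```python
def factorial(n):
    """n! computed recursively."""
    if n == 0:
        return 1                    # base case: prevents infinite recursion
    return n * factorial(n - 1)     # recursive case: a smaller subproblem
```

`factorial(5)` unwinds as 5 * 4 * 3 * 2 * 1 * 1 = 120; without the `n == 0` check the function would recurse until the interpreter's stack limit is hit.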

Red-Black Trees
Red-Black trees are balanced binary search trees where each node contains an extra bit for denoting the color
of the node, either red or black. These colors ensure that the tree remains balanced through a set of
properties: (1) Each node is either red or black. (2) The root is always black. (3) Red nodes cannot have red
children (no two reds in a row). (4) Every path from a node to its descendant NULL nodes has the same
number of black nodes. Red-Black trees provide O(log n) time complexity for insertions, deletions, and
lookups. They are used in scenarios where balanced trees are required, such as in implementing associative
arrays, priority queues, and sets. The ability to maintain balance while ensuring efficient operations makes
Red-Black trees essential for applications involving complex data management and dynamic set operations.
Example
For example, the TreeMap and TreeSet classes in Java are implemented using red-black trees to maintain
balanced order and efficient performance.

Searching
Searching algorithms are designed to retrieve information stored within some data structure. They are used to
find the presence or location of a specific element in a collection of data. Common searching algorithms
include linear search and binary search. Linear search checks each element sequentially, while binary search
divides and conquers, requiring the data to be sorted.

Example
An example of searching is looking for the number 4 in a sorted list [1, 2, 3, 4, 5] using binary search, which
would find 4 quickly by repeatedly dividing the search interval in half.
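The binary search from the example as a short sketch; the invariant is that the target, if present, always lies within `[lo, hi]`:

```python
def binary_search(sorted_items, target):
    """Halve the search interval until target is found; return index or -1."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1               # target can only be in the right half
        else:
            hi = mid - 1               # target can only be in the left half
    return -1
```

Searching for 4 in [1, 2, 3, 4, 5] probes index 2 (value 3), then index 3, finding the target in two comparisons. Python's `bisect` module provides the same primitive for sorted lists.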

Segment Trees
Segment trees are a data structure used for storing information about intervals, or segments. They allow
efficient querying of aggregate information like the sum, minimum, or maximum over a range of values in an
array. Segment trees are used in scenarios where multiple range queries and updates are required, such as in
computational geometry, data analysis, and game development. They provide logarithmic time complexity for
both queries and updates. Segment trees can be extended to lazy propagation to handle range updates
efficiently. The ability to handle range queries and updates in logarithmic time makes segment trees essential
for applications requiring dynamic interval management and aggregate data processing.
Example
For example, a segment tree can be used to find the sum of elements in a subarray efficiently, as well as
update individual elements while maintaining the ability to query sums quickly.
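A compact iterative range-sum segment tree sketch (leaves stored at indices n..2n-1 of a flat array; queries use half-open ranges):

```python
class SegmentTree:
    """Range-sum segment tree over a fixed-size array."""

    def __init__(self, data):
        self.n = len(data)
        self.tree = [0] * (2 * self.n)
        self.tree[self.n:] = data                  # leaves at indices n..2n-1
        for i in range(self.n - 1, 0, -1):         # parents hold child sums
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, i, value):
        i += self.n
        self.tree[i] = value
        while i > 1:                               # fix ancestors on the way up
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, lo, hi):
        """Sum of data[lo:hi] (half-open range), O(log n)."""
        total, lo, hi = 0, lo + self.n, hi + self.n
        while lo < hi:
            if lo % 2 == 1:
                total += self.tree[lo]; lo += 1
            if hi % 2 == 1:
                hi -= 1; total += self.tree[hi]
            lo //= 2; hi //= 2
        return total
```

Both `update` and `query` walk at most one root-to-leaf path in each direction, giving the logarithmic bounds mentioned above.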

Selection Sort
Selection sort is a simple comparison-based sorting algorithm that divides the input list into two parts: the
sublist of items already sorted, and the sublist of items remaining to be sorted that occupy the rest of the list.
The algorithm repeatedly selects the smallest (or largest) element from the unsorted sublist, swaps it with the
first unsorted element, and moves the sublist boundary one element to the right. Selection sort has a time
complexity of O(n^2) in all cases but typically performs fewer swaps than bubble sort. It is used in scenarios
where simplicity and small memory usage are more important than speed, such as in teaching or for small
datasets. The ability to sort by repeatedly selecting the smallest or largest element makes selection sort
suitable for simple and straightforward sorting tasks.
Example
For example, sorting an array of integers by repeatedly finding the minimum element from the unsorted
portion and swapping it with the first unsorted element.
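A minimal in-place sketch of the select-and-swap loop:

```python
def selection_sort(a):
    """Repeatedly swap the minimum of the unsorted tail into place."""
    for i in range(len(a)):
        m = min(range(i, len(a)), key=a.__getitem__)  # index of smallest remaining
        a[i], a[m] = a[m], a[i]                       # at most one swap per pass
    return a
```

Note the algorithm makes at most n - 1 swaps regardless of input order, which is its main advantage over bubble sort.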

Shell Sort
Shell sort is a generalization of insertion sort that allows the exchange of items that are far apart. The
algorithm starts by sorting elements far apart from each other and progressively reduces the gap between
elements to be compared. Shell sort is efficient for medium-sized lists and provides a significant improvement
over insertion sort for larger lists. The time complexity of Shell sort depends on the gap sequence used, but it
generally performs better than O(n^2) algorithms like insertion and bubble sort. Shell sort is used in scenarios
where the input list is moderately sized and partially sorted, making it more efficient than simpler quadratic
sorting algorithms. The ability to efficiently sort medium-sized lists with a variable gap sequence makes shell
sort suitable for applications where a balance between simplicity and performance is needed.
Example
For example, sorting an array [23, 12, 1, 8, 34, 54, 2, 3] using Shell sort by initially sorting elements far apart
and gradually reducing the gap to achieve a sorted array [1, 2, 3, 8, 12, 23, 34, 54].
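A minimal sketch using Shell's original gap sequence (halving the gap each pass); the inner loop is an insertion sort over elements `gap` apart:

```python
def shell_sort(a):
    """Gapped insertion sort; halve the gap each pass (Shell's sequence)."""
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            key, j = a[i], i
            while j >= gap and a[j - gap] > key:   # insertion sort with stride gap
                a[j] = a[j - gap]
                j -= gap
            a[j] = key
        gap //= 2                                  # final pass (gap 1) = insertion sort
    return a
```

Other gap sequences (such as Knuth's or Ciura's) give better worst-case behavior than the simple halving shown here.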

Skip Lists
Skip lists are a data structure that allows fast search within an ordered sequence of elements. They consist of
multiple levels of linked lists, with each higher level list skipping over a larger number of elements, allowing
fast search, insertion, and deletion operations. Skip lists provide an alternative to balanced trees and have
average-case time complexity of O(log n) for these operations. They are particularly useful in concurrent
settings where operations can be performed in a lock-free manner. Skip lists are used in applications like
databases, caching, and network routing, where efficient search and update operations are required. The
ability to efficiently manage ordered sequences with multiple levels of skipping makes skip lists suitable for
high-performance applications requiring fast search and update operations.
Example
For example, a skip list with levels for a list of integers [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] allows efficient search by
skipping over elements at higher levels and descending to lower levels as needed to find the target element.

Sorting
Sorting is the process of arranging data in a specific order, typically in ascending or descending order. Sorting
algorithms are fundamental in computer science and are used in various applications, including searching,
data analysis, and organizing data. Common sorting algorithms include quicksort, mergesort, and bubblesort,
each with different time and space complexities and use cases.

Example
An example of sorting is organizing a list of numbers [3, 1, 4, 1, 5, 9] in ascending order to get [1, 1, 3, 4, 5, 9].

Stacks
A stack is a collection of elements that supports last-in, first-out (LIFO) semantics. This means that the most
recently added element is the one that is removed first. Stacks are useful for problems that require reverse
order processing, such as undo mechanisms in text editors, parsing expressions, and depth-first search in
graphs. Operations typically include push (add an element to the top), pop (remove the top element), and
peek (view the top element without removing it).

Example
A stack can be visualized as a stack of plates. You can only add or remove the top plate. For example, pushing
1, 2, 3 onto the stack results in 3 being on top. Popping the stack will remove 3, leaving 2 on top.
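In Python a plain list serves as a stack, since `append` and `pop` both operate on the end in O(1):

```python
stack = []
stack.append(1)        # push
stack.append(2)
stack.append(3)
top = stack[-1]        # peek at the top without removing it
popped = stack.pop()   # pop the most recently pushed element
```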

Stooge Sort
Stooge sort is a recursive sorting algorithm with a time complexity of O(n^(log 3 / log 1.5)) ≈ O(n^2.71). It is
defined recursively as follows: If the value at the start is greater than the value at the end, swap them. Then,
recursively sort the first two-thirds, the last two-thirds, and the first two-thirds again. Stooge sort is mainly of
theoretical interest due to its high time complexity and is used in educational settings to illustrate the concepts
of recursion and algorithm analysis. The ability to illustrate recursion and theoretical algorithm analysis makes
stooge sort suitable for educational purposes and theoretical discussions on sorting algorithms.

Example
For example, sorting an array [5, 3, 8, 4, 2] using stooge sort by recursively sorting segments of the array and
swapping elements to achieve the sorted array [2, 3, 4, 5, 8].
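The recursive definition above translates almost directly into code:

```python
def stooge_sort(a, lo=0, hi=None):
    """Sort a[lo..hi] by recursively sorting overlapping two-thirds segments."""
    if hi is None:
        hi = len(a) - 1
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]     # ensure the endpoints are ordered
    if hi - lo + 1 > 2:
        t = (hi - lo + 1) // 3
        stooge_sort(a, lo, hi - t)      # first two-thirds
        stooge_sort(a, lo + t, hi)      # last two-thirds
        stooge_sort(a, lo, hi - t)      # first two-thirds again
    return a
```

The three overlapping recursive calls on segments of size 2n/3 are what produce the recurrence T(n) = 3T(2n/3) + O(1) and hence the ≈ O(n^2.71) bound.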

Suffix Trees
Suffix trees are compressed tries of all the suffixes of a given text. They are used to preprocess a string to
allow efficient searching for patterns, substrings, and other operations like finding the longest repeated
substring or the longest common substring of multiple strings. Suffix trees provide linear time complexity for
construction and efficient query operations. They are used in applications like text processing, bioinformatics
(for DNA sequence analysis), and data compression algorithms. A suffix array augmented with extra
information (such as an LCP array) offers a more space-efficient alternative to a suffix tree for many of the
same queries. The ability to efficiently handle complex string operations and queries makes suffix trees
essential for advanced text processing and analysis tasks.

Example
For example, a suffix tree for the string "banana" helps efficiently find all occurrences of the substring "ana".

Tim Sort
Tim Sort is a hybrid stable sorting algorithm, derived from merge sort and insertion sort, designed to perform
well on many kinds of real-world data. It was designed to have better performance on partially ordered
datasets. Tim Sort uses a combination of merge sort and insertion sort to handle different portions of the data,
making it efficient and versatile. Tim Sort is used in the Python programming language's built-in sort function
and in Java's Arrays.sort() for non-primitive data types. The ability to efficiently handle various types of real-
world data makes Tim Sort suitable for general-purpose sorting tasks in modern programming languages.

Example
For example, Python's built-in sort() method uses Tim Sort to efficiently sort a list of numbers [3, 1, 4, 1, 5, 9, 2,
6, 5, 3, 5] to get [1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 9].

Topological Sort
Topological sorting of a directed graph is a linear ordering of its vertices such that for every directed edge uv
from vertex u to vertex v, u comes before v in the ordering. This is only possible if the graph has no directed
cycles. Topological sorting is used in scenarios where there is a dependency between tasks, such as scheduling
tasks, course prerequisites, and resolving symbol dependencies in linkers. The sorting can be performed using
DFS or Kahn's algorithm. Topological sorting is particularly useful for scheduling and planning applications
where tasks have dependencies that must be respected. The ability to provide a linear ordering of tasks based
on dependencies makes topological sort essential for project management and planning applications where
task order is critical.
Example
For example, determining the order in which to take courses given their prerequisites, ensuring that each
course is taken only after all its prerequisites have been completed.
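A minimal sketch of Kahn's algorithm, mentioned above, applied to the course-prerequisite scenario (the course names and dictionary encoding are illustrative):

```python
from collections import deque

def topological_sort(graph):
    """Kahn's algorithm; graph: {node: [successors]}. Raises on a cycle."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] = indegree.get(v, 0) + 1
    ready = deque(u for u, d in indegree.items() if d == 0)  # no prerequisites
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in graph.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:        # all prerequisites of v are done
                ready.append(v)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle; no topological order exists")
    return order
```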

Tree Traversal
Tree traversal is the process of visiting all the nodes in a tree data structure in a specific order. The main types
of tree traversal are: (1) Pre-order (visit the root, traverse the left subtree, then traverse the right subtree), (2)
In-order (traverse the left subtree, visit the root, then traverse the right subtree), and (3) Post-order (traverse
the left subtree, traverse the right subtree, then visit the root). Tree traversal is used in various applications
like expression evaluation, tree-based search algorithms, and hierarchical data processing. Each traversal type
has its own use cases and can be implemented using recursive or iterative approaches. The ability to
systematically visit all nodes in a tree makes tree traversal essential for applications involving hierarchical data
processing and manipulation.
Example
For example, performing an in-order traversal on a binary tree with root 10, left child 5, and right child 15
would visit the nodes in the order 5, 10, 15.
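The in-order traversal from the example as a recursive sketch; reordering the three parts of the return expression gives pre-order and post-order:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def in_order(node):
    """Left subtree, then root, then right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)

root = Node(10, Node(5), Node(15))   # the tree from the example
```

On a binary search tree, in-order traversal visits the values in sorted order, which is why it appears so often in tree-based search code.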

Trees
A tree is a hierarchical data structure with a root value and subtrees of children, represented as a set of linked
nodes. Each node contains a value and references to its children. Trees are used to represent hierarchical
relationships, such as file systems, organizational structures, and XML documents. Binary trees, where each
node has at most two children, are a common variant used in searching and sorting algorithms like binary
search trees and heaps.

Example
A binary tree has nodes with a maximum of two children. For example, in a binary tree with root 10, and
children 5 and 15, 10 is the root, 5 is the left child, and 15 is the right child.

Tries
Tries are a type of search tree used to store a dynamic set or associative array where the keys are usually
strings. Each node in a trie represents a single character of a string, and the path from the root to a node
represents a prefix of the stored strings. Tries are used in applications like autocomplete, spell checking, and
IP routing. They provide efficient search, insert, and delete operations, especially for strings with common
prefixes. Tries can be extended to support suffix trees and compressed tries for more advanced string
processing tasks. The ability to efficiently manage and search for string data makes tries essential for
applications requiring fast and flexible string manipulation and retrieval.
Example
For example, an autocomplete feature in a search engine uses a trie to store and quickly retrieve possible
completions for a given prefix.
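A minimal trie sketch using nested dictionaries, with a prefix query of the kind an autocomplete feature needs (the `"$"` end-of-word marker is an illustrative convention):

```python
class Trie:
    def __init__(self):
        self.root = {}                 # each node is a dict of child characters

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True               # marker: a complete word ends here

    def words_with_prefix(self, prefix):
        node = self.root
        for ch in prefix:              # walk down to the node for the prefix
            if ch not in node:
                return []
            node = node[ch]
        out = []
        def collect(n, path):
            if "$" in n:
                out.append(prefix + path)
            for ch, child in n.items():
                if ch != "$":
                    collect(child, path + ch)
        collect(node, "")
        return out
```

Lookups and inserts cost O(m) in the length m of the key, independent of how many words the trie holds.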

Union-Find
Union-Find is a data structure that keeps track of elements which are split into one or more disjoint sets. It
supports two primary operations: find (determines which set a particular element is in) and union (merges two
sets into one). Union-Find is used in network connectivity, image processing, and Kruskal's algorithm for
finding the minimum spanning tree. It provides near-constant-time operations due to path compression and
union by rank optimizations. Union-Find is particularly useful for dynamic connectivity problems, where the
connectivity of elements changes over time. The ability to efficiently manage and merge disjoint sets makes
Union-Find essential for applications such as dynamic graph connectivity and clustering.
Example
For example, keeping track of connected components in a network where find determines the component a
node belongs to, and union merges two components when an edge is added between nodes in different
components.
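A minimal sketch with both optimizations mentioned above, path compression and union by rank:

```python
class UnionFind:
    def __init__(self, elements):
        self.parent = {x: x for x in elements}
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        """Return the set representative, flattening the path as we go."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, a, b):
        """Merge the sets containing a and b; False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra                # attach the shorter tree to the taller
        self.parent[rb] = ra               # union by rank
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

With both optimizations, a sequence of m operations runs in near-linear time (amortized inverse-Ackermann per operation).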

