Parallel Algorithms Complete Notes

Parallel Algorithm - Introduction

An algorithm is a sequence of steps that takes inputs from the user and, after some computation, produces an output. A parallel algorithm is an algorithm that can execute several instructions simultaneously on different processing devices and then combine all the individual outputs to produce the final result.

Concurrent Processing
The easy availability of computers, along with the growth of the Internet, has changed the way we store and process data. We live in an age where data is available in abundance. Every day we deal with huge volumes of data that require complex computation, and in a short time. Sometimes, we need to fetch data from similar or interrelated events that occur simultaneously. This is where we require concurrent processing, which can divide a complex task and process it on multiple systems to produce the output quickly.
Concurrent processing is essential where the task involves processing a huge bulk of complex data. Examples include − accessing large databases, aircraft testing, astronomical calculations, atomic and nuclear physics, biomedical analysis, economic planning, image processing, robotics, weather forecasting, web-based services, etc.

What is Parallelism?
Parallelism is the process of executing several sets of instructions simultaneously. It reduces the total computation time. Parallelism can be implemented by using parallel computers, i.e., computers with many processors. Parallel computers require parallel algorithms, programming languages, compilers, and operating systems that support multitasking.

What is an Algorithm?
An algorithm is a sequence of instructions followed to solve a problem. While designing an algorithm, we should consider the architecture of the computer on which the algorithm will be executed. As per the architecture, there are two types of computers −

● Sequential Computer
● Parallel Computer
Depending on the architecture of computers, we have two types of algorithms −
● Sequential Algorithm − An algorithm in which consecutive steps of instructions are executed in chronological order to solve a problem.
● Parallel Algorithm − The problem is divided into sub-problems that are executed in parallel to get individual outputs. Later on, these individual outputs are combined to get the final desired output.
It is not easy to divide a large problem into sub-problems. Sub-problems may have data dependencies among them; therefore, the processors have to communicate with each other to solve the problem.
The time the processors spend communicating with each other can exceed the actual processing time. So, while designing a parallel algorithm, proper CPU utilization should be considered to get an efficient algorithm.
To design an algorithm properly, we must have a clear idea of the basic model of computation in a parallel computer.

Parallel Algorithm - Analysis


Analysis of an algorithm helps us determine whether the algorithm is useful or not. Generally, an
algorithm is analyzed based on its execution time (Time Complexity) and the amount of
space (Space Complexity) it requires.
Since sophisticated memory devices are now available at a reasonable cost, storage space is no longer an issue; hence, space complexity is given less importance.
Parallel algorithms are designed to improve the computation speed of a computer. For analyzing a
Parallel Algorithm, we normally consider the following parameters −

● Time complexity (Execution Time),
● Total number of processors used, and
● Total cost.

Time Complexity
The main reason behind developing parallel algorithms was to reduce the computation time of an
algorithm. Thus, evaluating the execution time of an algorithm is extremely important in analyzing its
efficiency.
Execution time is measured as the time taken by the algorithm to solve a problem. The total execution time is calculated from the moment the algorithm starts executing to the moment it stops. If all the processors do not start or end execution at the same time, then the total execution time of the algorithm is measured from the moment the first processor starts its execution to the moment the last processor stops its execution.
Time complexity of an algorithm can be classified into three categories −
● Worst-case complexity − When the amount of time required by an algorithm for a given
input is maximum.
● Average-case complexity − When the amount of time required by an algorithm for a given
input is average.
● Best-case complexity − When the amount of time required by an algorithm for a given input
is minimum.

Asymptotic Analysis
The complexity or efficiency of an algorithm is the number of steps executed by the algorithm to get the desired output. Asymptotic analysis is used in the theoretical analysis of an algorithm: the complexity function is evaluated for large input sizes.
Note − An asymptote is a line that a curve approaches ever more closely without ever intersecting it.
Asymptotic notation is the easiest way to describe the fastest and slowest possible execution times of an algorithm, using upper and lower bounds on speed. For this, we use the following notations −

● Big O notation
● Omega notation
● Theta notation
Big O notation
In mathematics, Big O notation is used to represent the asymptotic behavior of functions. It describes the behavior of a function for large inputs in a simple and accurate way. It represents the upper bound of an algorithm’s execution time, i.e., the longest time the algorithm could take to complete its execution. We write
f(n) = O(g(n))
if and only if there exist positive constants c and n0 such that f(n) ≤ c * g(n) for all n ≥ n0.
Omega notation
Omega notation represents the lower bound of an algorithm’s execution time. We write
f(n) = Ω(g(n))
if and only if there exist positive constants c and n0 such that f(n) ≥ c * g(n) for all n ≥ n0.
Theta Notation
Theta notation represents both the lower bound and the upper bound of an algorithm’s execution time. We write
f(n) = θ(g(n))
if and only if there exist positive constants c1, c2, and n0 such that c1 * g(n) ≤ f(n) ≤ c2 * g(n) for all n ≥ n0.
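For example, if f(n) = 3n² + 2n, then f(n) = θ(n²): taking c1 = 3, c2 = 5, and n0 = 1 gives 3n² ≤ f(n) ≤ 5n² for all n ≥ 1, so the same function is both O(n²) and Ω(n²).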

Speedup of an Algorithm
The performance of a parallel algorithm is determined by calculating its speedup. Speedup is
defined as the ratio of the worst-case execution time of the fastest known sequential algorithm for a
particular problem to the worst-case execution time of the parallel algorithm.
Speedup = (worst-case execution time of the fastest known sequential algorithm for the problem) / (worst-case execution time of the parallel algorithm)

Number of Processors Used


The number of processors used is an important factor in analyzing the efficiency of a parallel algorithm, since the cost of buying, maintaining, and running the computers is taken into account. The larger the number of processors an algorithm uses to solve a problem, the more costly the obtained result becomes.

Total Cost
Total cost of a parallel algorithm is the product of time complexity and the number of processors
used in that particular algorithm.
Total Cost = Time complexity × Number of processors used
Therefore, the efficiency of a parallel algorithm is its speedup divided by the number of processors used, i.e.,
Efficiency = (worst-case execution time of the fastest known sequential algorithm) / (total cost of the parallel algorithm)
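As an illustration (the helper names and timing values below are hypothetical, not from the original notes), a minimal Python sketch that computes speedup and efficiency from measured timings:

def speedup(t_sequential, t_parallel):
    # Ratio of the fastest known sequential time to the parallel time.
    return t_sequential / t_parallel

def efficiency(t_sequential, t_parallel, num_processors):
    # Speedup per processor; equivalently t_sequential / total cost.
    return speedup(t_sequential, t_parallel) / num_processors

# Hypothetical timings: 100 s sequentially, 30 s on 4 processors.
print(speedup(100.0, 30.0))           # 3.33... (speedup)
print(efficiency(100.0, 30.0, 4))     # 0.83... (efficiency)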

Parallel Algorithm - Models


The model of a parallel algorithm is developed by choosing a strategy for dividing the data and the processing method, and by applying a suitable strategy to reduce interactions. In this chapter, we will discuss the following parallel algorithm models −

● Data parallel model
● Task graph model
● Work pool model
● Master slave model
● Producer consumer or pipeline model
● Hybrid model

Data Parallel
● The data-parallel model is one of the simplest models.
● Tasks are statically assigned to processes, and each task performs similar types of operations on different data.
● Data parallelism is the result of identical operations being applied concurrently to different data items, e.g., SIMD.
● Work may be done in phases.
● The data-parallel model can be applied to both shared-address-space and message-passing paradigms.
● Interaction overheads can be reduced by selecting a locality-preserving decomposition, by using optimized collective interaction routines, or by overlapping computation with interaction.
● The primary characteristic of data-parallel problems is that the intensity of data parallelism increases with the size of the problem, which in turn makes it possible to use more processes to solve larger problems.
● The model is therefore effective for solving large problems.
● Example − Dense matrix multiplication.
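A minimal data-parallel sketch in Python, assuming only the standard library's multiprocessing module (the data and worker function are hypothetical); the same operation is applied concurrently to different slices of the data:

from multiprocessing import Pool

def square(x):
    # The same operation is applied to every data item.
    return x * x

if __name__ == "__main__":
    data = list(range(1000))
    with Pool(processes=4) as pool:
        # The pool statically partitions the data into chunks,
        # one per process, mirroring the data-parallel model.
        result = pool.map(square, data, chunksize=250)
    print(result[:5])                 # [0, 1, 4, 9, 16]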
Task Graph Model
In the task graph model, parallelism is expressed by a task graph, which can be either trivial or nontrivial. In this model, the correlation among the tasks is exploited to promote locality or to minimize interaction costs. This model is used to solve problems in which the quantity of data associated with the tasks is large compared to the amount of computation associated with them. The tasks are assigned so as to reduce the cost of data movement among them.
Examples − Parallel quicksort, sparse matrix factorization, and parallel algorithms derived via the divide-and-conquer approach.

Here, the problem is divided into atomic tasks and implemented as a graph. Each task is an independent unit of work that may depend on one or more antecedent tasks. After a task completes, its output is passed to the dependent tasks. A task starts execution only when all of its antecedent tasks are completed. The final output of the graph is produced when the last dependent task completes.
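A minimal sketch of executing a hypothetical six-task graph with Python's concurrent.futures (the graph and task function are illustrative assumptions, and the graph is assumed to be acyclic); a task is submitted only once all of its antecedent tasks have completed:

from concurrent.futures import ThreadPoolExecutor

# Hypothetical task graph (a DAG): each task lists its antecedent tasks.
graph = {1: [], 2: [], 3: [1], 4: [1, 2], 5: [2], 6: [4, 5]}

def run_task(name, inputs):
    # Stand-in for real work: combine the outputs of the antecedents.
    return "t%d(%s)" % (name, ", ".join(inputs))

def execute(graph):
    done = {}
    remaining = dict(graph)
    with ThreadPoolExecutor() as pool:
        while remaining:
            # A task is ready once all of its antecedent tasks are done.
            ready = [t for t, deps in remaining.items()
                     if all(d in done for d in deps)]
            futures = {t: pool.submit(run_task, t,
                                      [done[d] for d in remaining[t]])
                       for t in ready}
            for t, f in futures.items():
                done[t] = f.result()
                del remaining[t]
    return done

print(execute(graph)[6])              # t6(t4(t1(), t2()), t5(t2()))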

Work Pool Model


In the work pool model, tasks are dynamically assigned to processes so as to balance the load; therefore, any process may potentially execute any task. This model is used when the quantity of data associated with tasks is comparatively small relative to the computation associated with them.
There is no pre-assignment of tasks onto processes. The assignment of tasks may be centralized or decentralized. Pointers to the tasks may be stored in a physically shared list, priority queue, hash table, or tree, or they may be stored in a physically distributed data structure.
The tasks may all be available at the beginning, or they may be generated dynamically. If tasks are generated dynamically and assignment is decentralized, then a termination detection algorithm is required so that all the processes can actually detect the completion of the entire program and stop looking for more tasks.
Example − Parallel tree search
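A minimal centralized work pool sketch using Python's multiprocessing queues (the task values are hypothetical); the sentinel values stand in for a real termination-detection algorithm:

import multiprocessing as mp

def worker(task_queue, result_queue):
    # Each process repeatedly pulls the next available task, so tasks
    # are assigned dynamically and any process may execute any task.
    while True:
        task = task_queue.get()
        if task is None:              # sentinel: no more tasks remain
            break
        result_queue.put(task * task)

if __name__ == "__main__":
    tasks, nprocs = list(range(20)), 4
    task_q, result_q = mp.Queue(), mp.Queue()
    for t in tasks:                   # the shared pool of tasks
        task_q.put(t)
    for _ in range(nprocs):           # one termination sentinel per worker
        task_q.put(None)
    procs = [mp.Process(target=worker, args=(task_q, result_q))
             for _ in range(nprocs)]
    for p in procs:
        p.start()
    results = [result_q.get() for _ in tasks]
    for p in procs:
        p.join()
    print(sorted(results)[:5])        # [0, 1, 4, 9, 16]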
Master-Slave Model
In the master-slave model, one or more master processes generate tasks and allocate them to slave processes. The tasks may be allocated beforehand if −
● the master can estimate the volume of the tasks, or
● a random assignment can do a satisfactory job of balancing the load, or
● slaves are assigned smaller pieces of work at different times.
This model is generally equally suitable to shared-address-space and message-passing paradigms, since the interaction is naturally two-way.
In some cases, a task may need to be completed in phases, and the tasks in each phase must be completed before the tasks in the next phase can be generated. The master-slave model can be generalized to a hierarchical or multi-level master-slave model, in which the top-level master feeds large portions of the work to second-level masters, each of which further subdivides the tasks among its own slaves and may perform part of the work itself.

Precautions in using the master-slave model


Care should be taken to ensure that the master does not become a bottleneck, which may happen if the tasks are too small or the workers are comparatively fast.
The tasks should be chosen so that the cost of performing a task dominates the cost of communication and the cost of synchronization.
Asynchronous interaction may help overlap interaction with the computation associated with work generation by the master.
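A minimal master-slave sketch in Python, assuming the standard library's multiprocessing module (the data and the work performed are hypothetical); the master pre-allocates nearly equal pieces of work and combines the partial results:

from multiprocessing import Pool

def slave(chunk):
    # Slave: perform the piece of work handed out by the master.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1000))
    nslaves = 4
    # Master: the volume of work is known, so pre-allocate
    # nearly equal pieces, one per slave.
    chunks = [data[i::nslaves] for i in range(nslaves)]
    with Pool(nslaves) as pool:
        partials = pool.map(slave, chunks)
    # Master: combine the partial results into the final answer.
    print(sum(partials))              # 332833500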

Pipeline Model
It is also known as the producer-consumer model. Here, a stream of data is passed through a succession of processes, each of which performs some task on it. The arrival of new data triggers the execution of a new task by a process in the queue. The processes could form a queue in the shape of a linear or multidimensional array, a tree, or a general graph with or without cycles.
This model is a chain of producers and consumers. Each process in the queue can be considered a consumer of the sequence of data items produced by the process preceding it, and a producer of data for the process following it. The queue does not need to be a linear chain; it can be a directed graph. The most common interaction-minimization technique applicable to this model is overlapping interaction with computation.
Example − Parallel LU factorization algorithm.
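A minimal two-stage pipeline sketch using Python threads and queues (illustrative; the stage functions are hypothetical); each stage consumes items from the stage before it and produces items for the stage after it:

import threading
import queue

def stage(fn, inbox, outbox):
    # Each stage consumes from its predecessor and produces for its
    # successor; the arrival of new data triggers new work.
    while True:
        item = inbox.get()
        if item is None:              # propagate shutdown down the chain
            outbox.put(None)
            break
        outbox.put(fn(item))

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(lambda x: x + 1, q0, q1)).start()
threading.Thread(target=stage, args=(lambda x: x * 2, q1, q2)).start()

for x in [1, 2, 3]:                   # the producer feeds the pipeline
    q0.put(x)
q0.put(None)

while True:                           # the final consumer drains it
    y = q2.get()
    if y is None:
        break
    print(y)                          # 4, 6, 8, i.e. (x + 1) * 2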

Hybrid Models
A hybrid algorithm model is required when more than one model may be needed to solve a problem.
A hybrid model may be composed of either multiple models applied hierarchically or multiple models
applied sequentially to different phases of a parallel algorithm.
Example − Parallel quick sort

Parallel Algorithm - Structure


To apply any algorithm properly, it is very important to select a proper data structure, because a particular operation performed on one data structure may take more time than the same operation performed on another.
Example − Accessing the ith element of a set takes constant time with an array, but may take linear time with a linked list.
Therefore, a data structure must be selected considering the architecture and the type of operations to be performed.
The following data structures are commonly used in parallel programming −

● Linked List
● Arrays
● Hypercube Network

Linked List
A linked list is a data structure having zero or more nodes connected by pointers. Nodes may or may not occupy consecutive memory locations. Each node has two or three parts − one data part that stores the data, and one or two link fields that store the address of the next and/or the previous node. The address of the first node is stored in an external pointer called the head. The last node, known as the tail, generally does not store the address of any node.
There are three types of linked lists −
● Singly Linked List
● Doubly Linked List
● Circular Linked List
Singly Linked List
A node of a singly linked list contains data and the address of the next node. An external pointer
called head stores the address of the first node.

Doubly Linked List


A node of a doubly linked list contains data and the address of both the previous and the next node.
An external pointer called head stores the address of the first node and the external pointer
called tail stores the address of the last node.

Circular Linked List


A circular linked list is very similar to a singly linked list, except that the last node stores the address of the first node.
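A minimal singly-linked-list sketch in Python (illustrative, not from the original notes); a doubly linked list would add a link to the previous node and an external tail pointer, and a circular list would link the last node back to the head:

class Node:
    # One node of a singly linked list: a data part and one link field.
    def __init__(self, data):
        self.data = data
        self.next = None              # address of the next node

class SinglyLinkedList:
    def __init__(self):
        self.head = None              # external pointer to the first node

    def push_front(self, data):
        node = Node(data)
        node.next = self.head
        self.head = node

    def __iter__(self):
        node = self.head
        while node is not None:       # the tail's link field is None
            yield node.data
            node = node.next

lst = SinglyLinkedList()
for x in [3, 2, 1]:
    lst.push_front(x)
print(list(lst))                      # [1, 2, 3]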

Arrays
An array is a data structure where we can store similar types of data. It can be one-dimensional or
multi-dimensional. Arrays can be created statically or dynamically.
● In statically declared arrays, dimension and size of the arrays are known at the time of
compilation.
● In dynamically declared arrays, dimension and size of the array are known at runtime.
For shared-memory programming, arrays can serve as common memory, and for data-parallel programming they can be used by partitioning them into sub-arrays.
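A minimal Python sketch of statically partitioning an array into contiguous sub-arrays, one per process (illustrative; it assumes nearly equal pieces are acceptable):

def partition(arr, nprocs):
    # Split an array into nearly equal contiguous sub-arrays,
    # one per process, for data-parallel processing.
    step = (len(arr) + nprocs - 1) // nprocs   # ceiling division
    return [arr[i:i + step] for i in range(0, len(arr), step)]

print(partition(list(range(10)), 4))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]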

Hypercube Network
Hypercube architecture is helpful for parallel algorithms in which each task has to communicate with other tasks. The hypercube topology can easily embed other topologies such as rings and meshes. It is also known as the n-cube, where n is the number of dimensions. A hypercube can be constructed recursively.
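A minimal Python sketch of the n-cube's defining property (illustrative, not from the original notes): two nodes are neighbors exactly when their binary labels differ in one bit, so a node's neighbors are obtained by flipping each bit in turn:

def hypercube_neighbors(node, n):
    # In an n-cube, two nodes are connected if and only if their binary
    # labels differ in exactly one bit, so a node's neighbors are
    # obtained by flipping each of its n bits in turn.
    return [node ^ (1 << d) for d in range(n)]

# 3-cube: node 101 (5) is adjacent to 100 (4), 111 (7) and 001 (1).
print(hypercube_neighbors(0b101, 3))  # [4, 7, 1]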
Parallel Algorithm - Design Techniques
Selecting a proper design technique for a parallel algorithm is the most difficult and important task. Most parallel programming problems have more than one solution. In this chapter, we will discuss the following design techniques for parallel algorithms −

● Divide and conquer
● Greedy Method
● Dynamic Programming
● Backtracking
● Branch & Bound
● Linear Programming

Divide and Conquer Method


In the divide and conquer approach, the problem is divided into several small sub-problems. Then
the sub-problems are solved recursively and combined to get the solution of the original problem.
The divide and conquer approach involves the following steps at each level −
● Divide − The original problem is divided into sub-problems.
● Conquer − The sub-problems are solved recursively.
● Combine − The solutions of the sub-problems are combined together to get the solution of
the original problem.
The divide and conquer approach is applied in the following algorithms −

● Binary search
● Quick sort
● Merge sort
● Integer multiplication
● Matrix inversion
● Matrix multiplication
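Merge sort illustrates all three steps; a minimal sequential Python sketch (illustrative; in a parallel setting the two recursive calls are independent and could run on different processors):

def merge_sort(a):
    if len(a) <= 1:
        return a
    # Divide − split the problem into two sub-problems.
    mid = len(a) // 2
    # Conquer − solve each sub-problem recursively; the two calls are
    # independent, so in parallel they could run on different processors.
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Combine − merge the two sorted halves into one sorted result.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 7]))    # [1, 2, 5, 7, 9]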

Greedy Method
A greedy algorithm builds an optimized solution by choosing the best available option at each moment. It is very easy to apply to complex problems: at each step, it decides which choice appears most promising for the next step.
The algorithm is called greedy because, once the optimal solution to a smaller instance is chosen, it does not consider the program as a whole; once a choice is made, a greedy algorithm never reconsiders it.
A greedy algorithm works by recursively building a solution from the smallest possible component parts. Recursion is a procedure in which the solution to a problem depends on the solutions to smaller instances of the same problem.
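A minimal greedy sketch in Python, using coin change as a hypothetical example (not from the original notes); note that the greedy choice is optimal here only because this particular coin system is canonical:

def greedy_change(amount, coins=(25, 10, 5, 1)):
    # At every step take the largest coin that still fits: the locally
    # best choice, never reconsidered once it has been made.
    result = []
    for coin in coins:                # coins sorted in decreasing order
        while amount >= coin:
            result.append(coin)
            amount -= coin
    return result

print(greedy_change(63))              # [25, 25, 10, 1, 1, 1]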
Dynamic Programming
Dynamic programming is an optimization technique that divides the problem into smaller sub-problems and, after solving each sub-problem, combines their solutions to obtain the ultimate solution. Unlike the divide and conquer method, dynamic programming reuses the solutions to sub-problems many times.
A memoized recursive algorithm for the Fibonacci series is a classic example of dynamic programming.
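A minimal memoized Fibonacci sketch in Python (illustrative); the cache is what turns plain recursion into dynamic programming:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each sub-problem fib(k) is solved once and its solution is then
    # reused, unlike plain recursion, which would recompute it.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(40))                        # 102334155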

Backtracking Algorithm
Backtracking is a technique for solving combinatorial problems. It is applied to both programmatic and real-life problems. The eight queens problem, Sudoku puzzles, and finding a way through a maze are popular examples where backtracking is used.
In backtracking, we start with a partial solution that satisfies all the required conditions so far. Then we move to the next level; if that level does not produce a satisfactory solution, we return one level back and try a new option.
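A minimal backtracking sketch in Python for the eight queens problem (illustrative, not from the original notes); partial solutions are extended row by row and abandoned as soon as a constraint fails:

def n_queens(n, cols=()):
    # Place one queen per row; extend a partial solution while it
    # satisfies all conditions, and back up one level when it cannot.
    row = len(cols)
    if row == n:
        return 1                      # a complete, valid placement
    count = 0
    for c in range(n):
        if all(c != pc and abs(c - pc) != row - pr
               for pr, pc in enumerate(cols)):
            count += n_queens(n, cols + (c,))   # go one level deeper
        # otherwise this option fails: backtrack and try the next one
    return count

print(n_queens(8))                    # 92 solutions for eight queens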

Branch and Bound


A branch and bound algorithm is an optimization technique for obtaining an optimal solution to a problem. It looks for the best solution in the entire space of solutions for the given problem. Bounds on the function to be optimized are compared with the value of the latest best solution, which allows the algorithm to discard parts of the solution space entirely.
The purpose of a branch and bound search is to maintain the lowest-cost path to a target; once a solution is found, the search can keep improving on it. Branch and bound is commonly implemented as a depth-bounded or depth-first search.
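A minimal depth-first branch and bound sketch in Python, using a hypothetical 0/1 knapsack instance (not from the original notes); the optimistic bound lets the search discard branches that cannot beat the best solution found so far:

def knapsack_bb(values, weights, capacity):
    # Depth-first branch and bound for the 0/1 knapsack problem.
    best = 0

    def bound(i, value):
        # Optimistic bound: current value plus every remaining value.
        return value + sum(values[i:])

    def search(i, value, room):
        nonlocal best
        if i == len(values):
            best = max(best, value)
            return
        if bound(i, value) <= best:
            return                    # prune: cannot beat the best yet
        if weights[i] <= room:        # branch 1: take item i
            search(i + 1, value + values[i], room - weights[i])
        search(i + 1, value, room)    # branch 2: skip item i

    search(0, 0, capacity)
    return best

print(knapsack_bb([60, 100, 120], [10, 20, 30], 50))   # 220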

Linear Programming
Linear programming describes a wide class of optimization problems in which both the optimization criterion and the constraints are linear functions. It is a technique for obtaining the best outcome, such as maximum profit, shortest path, or lowest cost.
In linear programming, we have a set of variables, and we have to assign values to them so as to satisfy a set of linear constraints and to maximize or minimize a given linear objective function.
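A minimal sketch, assuming SciPy is available (the problem data are hypothetical); since scipy.optimize.linprog minimizes, the objective is negated to maximize:

from scipy.optimize import linprog

# Hypothetical problem: maximize 3x + 5y subject to
#   x + 2y <= 14,   3x - y >= 0,   x - y <= 2,   x, y >= 0.
# linprog minimizes, so the objective is negated to maximize;
# the ">=" constraint is rewritten as "<=" by negating it.
c = [-3, -5]
A_ub = [[1, 2], [-3, 1], [1, -1]]
b_ub = [14, 0, 2]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)                # optimum at (6, 4), profit 38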
