
Parallel and Distributed Computing
Lecture 01 – Introduction
Background – Serial Computing
• Computer software was conventionally written for serial computing.
• Standard computing is also known as "serial computing".
• To solve a problem, an algorithm divides the problem into a series of smaller instructions.
• These discrete instructions are then executed on the Central Processing Unit (CPU) of a computer one by one.
• Only after one instruction finishes does the next one start (a minimal serial sketch follows below).
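As an illustration (not from the slides), a minimal serial sketch in C of this one-instruction-at-a-time model: each loop iteration must finish before the next begins, and the array contents are purely illustrative.

    #include <stdio.h>

    int main(void) {
        int data[5] = {3, 1, 4, 1, 5};   /* illustrative input */
        int sum = 0;

        /* Serial execution: each iteration completes before the next starts. */
        for (int i = 0; i < 5; i++) {
            sum += data[i];
        }

        printf("sum = %d\n", sum);
        return 0;
    }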
Background – Serial Computing
• A real-life example of this is people standing in a queue waiting for a movie ticket when there is only one cashier. The cashier issues tickets to the people one by one. The situation becomes even more complex when there are 2 queues and still only one cashier.
• So, in short, serial computing works as follows:
• A problem statement is broken into a discrete series of instructions.
• The instructions are executed one by one.
• Only one instruction is executed at any moment in time.
Background – Serial Computing
• Look at the last point: only one instruction is executed at any moment in time. This was a huge problem for the computing industry and a huge waste of hardware resources, because only one part of the hardware is active for a particular instruction at any given time.
• As problem statements grew heavier and bulkier, so did the time needed to execute them. Examples of such processors are the Pentium 3 and Pentium 4.
• Now let's come back to our real-life problem. The complexity clearly decreases when there are 2 queues and 2 cashiers issuing tickets to 2 people simultaneously. This is an example of parallel computing.
Parallel Computer
• Virtually all stand-alone computers today are parallel from a hardware
perspective:
- Multiple functional units (L1 cache, L2 cache, branch, prefetch, decode,
floating-point, graphics processing (GPU), integer, etc.)
- Multiple execution units/cores
- Multiple hardware threads
Parallel Computer
• Networks connect multiple stand-alone computers (nodes) to make larger
parallel computer clusters.
Parallel Computing
• A kind of computing architecture in which a large problem is broken into independent, smaller, usually similar parts that can be processed simultaneously.
• The parts are processed by multiple CPUs communicating via shared memory, and their results are combined upon completion.
• It helps in performing large computations by dividing the large problem between more than one processor.
Parallel Computing
• Parallel computing also helps with faster application processing and task resolution by increasing the available computational power of a system.
• Most supercomputers operate on parallel computing principles.
• Parallel processing is commonly used in operational scenarios that require massive processing power or computation.
Parallel Computing
• Typically, the infrastructure is housed in a server rack where multiple processors are installed; an application server distributes a computational request into small chunks, and the chunks are processed simultaneously on each server.
• The earliest computer software was written for serial computation, executing a single instruction at a time; parallel computing is different in that several processors execute an application or computation at the same time.
Parallel Computing – Flynn’s Taxonomy
• Best known classification scheme for parallel computers.
• Depends on parallelism it exhibits with its
- Instruction stream
- Data stream
• A sequence of instructions (the instruction stream) manipulates a sequence
of operands (the data stream)
• The instruction stream (I) and the data stream (D) can be either single (S) or
multiple (M)
Parallel Computing – Flynn’s Taxonomy
• Four combinations: SISD, SIMD, MISD, MIMD
Parallel Computing – Flynn’s Taxonomy

• Single Instruction, Single Data (SISD): the serial computer
• Single-CPU systems
- i.e., uniprocessors
- Note: co-processors don't count as additional processors
• Examples: older-generation mainframes, workstations, PCs
Parallel Computing – Flynn’s Taxonomy

• Single Instruction, Multiple Data (SIMD): a type of parallel computer
• All processing units execute the same instruction at any given clock cycle
• Each processing unit can operate on a different data element
• One instruction stream is broadcast to all processors
• Each processor (also called a processing element, or PE) is very simple and is essentially an ALU;
- PEs do not store a copy of the program, nor do they have a program control unit
Parallel Computing – Flynn’s Taxonomy

• All active processors execute the same instruction synchronously, but on different data
• On a memory access, all active processors must access the same location in their local memory
• The data items form an array, and an instruction can act on the complete array in one cycle
Parallel Computing – Flynn’s Taxonomy

• How to view an SIMD machine:
- Think of soldiers all in a unit.
- A commander selects certain soldiers as active – for example, every even-numbered row.
- The commander barks out an order that all the active soldiers should carry out, and they execute the order synchronously.
Parallel Computing – Flynn’s Taxonomy

• Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units (see the sketch below).
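As an illustration (not from the slides), a minimal C sketch using x86 SSE intrinsics: a single _mm_add_ps instruction adds four floats at once, i.e., one instruction stream acting on multiple data elements. It assumes an x86 processor with SSE support and a compiler that provides <immintrin.h>.

    #include <stdio.h>
    #include <immintrin.h>   /* x86 SIMD intrinsics (assumes SSE support) */

    int main(void) {
        float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float c[4];

        /* One SIMD instruction operates on four data elements at once. */
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);
        _mm_storeu_ps(c, vc);

        for (int i = 0; i < 4; i++)
            printf("%.1f ", c[i]);
        printf("\n");
        return 0;
    }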
Parallel Computing – Flynn’s Taxonomy

• Multiple Instruction, Single Data (MISD): a single data stream is fed into multiple processing units.
• Each processing unit operates on the data independently via independent instruction streams.
• Few actual examples exist, e.g. the Carnegie-Mellon C.mmp computer (1971).


Parallel Computing – Flynn’s Taxonomy

• Multiple Instruction, Multiple Data (MIMD): processors are asynchronous, since they can independently execute different programs on different data sets.
• Communication is handled either
- through shared memory (multiprocessors), or
- by message passing (multicomputers).
• MIMD machines have been considered by most researchers to include the most powerful, least restricted computers.
Parallel Computing – Flynn’s Taxonomy

• Currently the most common type of parallel computer
• Every processor may be executing a different instruction stream
• Every processor may be working with a different data stream
• Execution can be synchronous or asynchronous, deterministic or non-deterministic
Parallel Computing – Parallel Computer Architectures
• Shared memory: all processors can access the same memory
• Uniform memory access (UMA):
- identical processors
- equal access and access times to memory
Parallel Computing – Non-uniform memory access (NUMA)
• Not all processors have equal access to all memories
• Memory access across a link is slower
• Advantages:
- user-friendly programming perspective to memory
- fast and uniform data sharing due to the proximity of memory to CPUs
• Disadvantages:
- lack of scalability between memory and CPUs
- programmer is responsible for ensuring "correct" access to global memory
- expense
Parallel Computing - Distributed Memory
• Distributed memory systems require a communication network to connect inter-processor memory.
• Advantages:
- memory is scalable with the number of processors
- no memory interference or overhead for maintaining cache coherency
- cost effective
• Disadvantages:
- programmer is responsible for data communication between processors
- difficult to map existing data structures to this memory organization
Parallel Computing - Hybrid Distributed-Shared Memory
• Generally used for the currently largest and fastest computers
• Has a mixture of previously mentioned advantages and disadvantages
Parallel Computing - Advantages
• It saves time and money, as many resources working together reduce the time and cut potential costs.
• It can be impractical to solve large problems with serial computing.
• It can take advantage of non-local resources when local resources are finite.
• It reduces complexity.
• Serial computing 'wastes' potential computing power; parallel computing makes better use of the hardware.
Parallel Computing – Parallel Programming
• Parallel programming (also, unfortunately, sometimes called concurrent
programming), is a computer programming technique that provides for the
execution of operations concurrently, either
- within a single parallel computer OR
- across a number of systems.
• In the latter case, the term distributed computing is used.
Parallel Computing – Parallel Programming Model
• Shared Memory Model
- Tasks share a common address space, which they read and write asynchronously.
- Mechanisms such as locks and semaphores may be used to control access to the shared memory.
- Advantage:
  - no need to explicitly communicate data between tasks -> simplified programming
- Disadvantages:
  - care is needed when managing memory to avoid synchronization conflicts
  - harder to control data locality
(A minimal lock-based sketch of this model follows below.)
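A minimal shared-memory sketch in C using POSIX threads (pthreads are named on the next slide; this particular example is illustrative, not from the lecture): two threads update a shared counter, and a mutex (lock) controls access to it. Compile with a -pthread flag.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;                 /* data in the shared address space */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);       /* control access to shared data */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);  /* 200000 with correct locking */
        return 0;
    }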
Parallel Computing – Parallel Programming Model
• Threads Model
- A thread can be considered a subroutine within the main program.
- Threads communicate with each other through global memory.
- Commonly associated with shared-memory architectures and operating systems.
- Implementations: POSIX Threads (pthreads), OpenMP (see the OpenMP sketch below).
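A minimal OpenMP sketch in C (illustrative; assumes a compiler with OpenMP support, e.g. gcc -fopenmp): a single directive spawns a team of threads that all share the program's global memory.

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        /* The parallel region is executed by a team of threads that
           share the process's global memory. */
        #pragma omp parallel
        {
            int id = omp_get_thread_num();
            int n  = omp_get_num_threads();
            printf("Hello from thread %d of %d\n", id, n);
        }
        return 0;
    }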
Parallel Computing – Parallel Programming Model
• Message Passing Model
- A set of tasks use their own local memory during computation.
- Data is exchanged by sending and receiving messages.
- Data transfer usually requires cooperative operations to be performed by each process; for example, a send operation must have a matching receive operation.
- MPI (released in 1994)
- MPI-2 (released in 1997)
(A minimal MPI sketch follows below.)
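A minimal MPI sketch in C (illustrative; assumes an MPI installation, compiled with mpicc and run with at least two processes, e.g. mpirun -np 2): rank 1 sends a value, and rank 0 posts the matching receive.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {
            /* A send must be matched by a receive on the other side. */
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0) {
            MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 0 received %d from rank 1\n", value);
        }

        MPI_Finalize();
        return 0;
    }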
Parallel Computing – Parallel Programming Model
• Data Parallel Model
- The data parallel model demonstrates the following characteristics:
  - Most of the parallel work performs operations on a data set organized into a common structure, such as an array.
  - A set of tasks works collectively on the same data structure, with each task working on a different partition.
  - Tasks perform the same operation on their partition.
- On shared-memory architectures, all tasks may access the data structure through global memory. On distributed-memory architectures, the data structure is split up and resides as "chunks" in the local memory of each task.
(A minimal data-parallel sketch follows below.)
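A minimal data-parallel sketch in C using an OpenMP work-sharing loop (an illustrative choice, assuming OpenMP support): the array is the common structure, and every thread applies the same operation to its own partition of the iteration space.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N];

        for (int i = 0; i < N; i++)
            b[i] = (double)i;

        /* The iteration space is partitioned among the threads; each
           thread performs the same operation on its own partition. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i] + 1.0;

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }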
Parallel Computing – Parallel Programming Model
• Hybrid
- Combines various models, e.g. MPI + OpenMP (a sketch of this combination follows at the end of this slide).
• Single Program Multiple Data (SPMD)
- A single program is executed by all tasks simultaneously.
• Multiple Program Multiple Data (MPMD)
- An MPMD application has multiple executables; each task can execute the same or a different program as the other tasks.
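A minimal hybrid MPI + OpenMP sketch in C (illustrative; assumes both an MPI library and OpenMP support): MPI provides the process-level parallelism (ranks), while OpenMP threads provide shared-memory parallelism inside each process. No actual message passing is shown here.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each MPI process spawns its own team of OpenMP threads. */
        #pragma omp parallel
        {
            printf("MPI rank %d, OpenMP thread %d of %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }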
Parallel Computing – Types
• Bit-level parallelism
• Instruction-level parallelism
• Task Parallelism
Parallel Computing – Types
• Bit-level parallelism
- Every task depends on the processor word size. When performing a task on large-sized data, a larger word size reduces the number of instructions the processor must execute.
- Otherwise the operation must be split into a series of instructions. For example, with an 8-bit processor, an operation on 16-bit numbers must first handle the 8 lower-order bits and then the 8 higher-order bits, so two instructions are needed; a 16-bit processor can perform the operation with one instruction (see the sketch below).
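An illustrative C sketch (not from the slides) of the 8-bit vs. 16-bit example: the 16-bit addition is emulated with two 8-bit additions plus carry handling, whereas a 16-bit ALU performs it in a single add.

    #include <stdint.h>
    #include <stdio.h>

    /* Emulate a 16-bit add on an 8-bit machine: low byte first, then
       the high byte plus the carry from the low byte -> two operations. */
    static uint16_t add16_on_8bit(uint16_t x, uint16_t y) {
        uint8_t lo    = (uint8_t)(x & 0xFF) + (uint8_t)(y & 0xFF);
        uint8_t carry = lo < (uint8_t)(x & 0xFF);   /* low byte overflowed */
        uint8_t hi    = (uint8_t)(x >> 8) + (uint8_t)(y >> 8) + carry;
        return (uint16_t)((hi << 8) | lo);
    }

    int main(void) {
        uint16_t x = 0x12F0, y = 0x0135;
        uint16_t wide = x + y;                      /* one add on a 16-bit ALU */
        printf("%#x vs %#x\n", add16_on_8bit(x, y), wide);
        return 0;
    }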
Parallel Computing – Types
• Instruction-level parallelism
- Without it, a processor can only issue less than one instruction for each clock-cycle phase. Instructions that are independent of one another can be re-ordered and grouped and then executed concurrently without affecting the result of the program; this is called instruction-level parallelism (see the sketch below).
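An illustrative C fragment (assumed, not from the slides): the first three statements are mutually independent, so the hardware or compiler may reorder them or execute them concurrently, while the fourth depends on their results and must wait.

    #include <stdio.h>

    int main(void) {
        int a = 2, b = 3, c = 5, d = 7;

        /* Independent instructions: none reads another's result,
           so they can be reordered or executed concurrently. */
        int x = a * b;
        int y = c + d;
        int z = a - c;

        /* Dependent instruction: must wait for x, y and z. */
        int r = x + y + z;

        printf("r = %d\n", r);
        return 0;
    }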
Parallel Computing – Why?
• The Real World is Massively Complex
- In the natural world, many complex, interrelated events happen at the same time, yet within a temporal sequence.
Parallel Computing – Why?
• The Real World is Massively Complex
- Compared to serial computing, parallel computing is much better suited to modeling, simulating and understanding complex, real-world phenomena.
- For example, imagine modeling these serially:
Parallel Computing – Why?
• Save Time/Money
- In theory, throwing more resources at a task will shorten its time to completion, with potential cost savings.
- Parallel computers can be built from cheap, commodity components.
Parallel Computing – Why?
• Solve Larger/Complex Problems
- Many problems are so large and/or complex that it is
impractical or impossible to solve them using a serial
program, especially given limited computer memory.
Parallel Computing – Why?
• Solve Larger/Complex Problems
- Example: "Grand Challenge Problems"
(en.wikipedia.org/wiki/Grand_Challenge) requiring petaflops and
petabytes of computing resources.
- Example: Web search engines/databases processing millions of
transactions every second

Petaflops = 10^15 floating-point operations per second
Petabytes = 10^15 bytes of storage
Parallel Computing – Why?
• Provide Concurrency
- A single compute resource can only do one thing at a time, whereas multiple compute resources can do many things simultaneously.
- Example: Collaborative Networks provide a global venue where people from around the world can meet and conduct work "virtually".
Parallel Computing – Why?
• Take Advantage of non-local Resources
- Using compute resources on a wide area network, or even the
Internet when local compute resources are scarce or insufficient.
- Example: SETI@home (setiathome.berkeley.edu) has over 1.7
million users in nearly every country in the world. (May, 2018).
Parallel Computing – Why?
• BETTER USE OF UNDERLYING PARALLEL HARDWARE
- Modern computers, even laptops, are parallel in architecture with multiple
processors/cores.
- Parallel software is specifically intended for parallel hardware with
multiple cores, threads, etc.
- In most cases, serial programs run on modern computers "waste"
potential computing power.
Parallel Computing – Who is using?
• Science and Engineering
- Historically, parallel computing has been considered to be "the high end of
computing", and has been used to model difficult problems in many areas
of science and engineering:
- Atmosphere, Earth, Environment
- Physics - applied, nuclear, particle, condensed matter, high pressure,
fusion, photonics
- Bioscience, Biotechnology, Genetics
- Chemistry, Molecular Sciences
Parallel Computing – Who is using?
• Science and Engineering
- Geology, Seismology
- Mechanical Engineering - from prosthetics to spacecraft
- Electrical Engineering, Circuit Design, Microelectronics
- Computer Science, Mathematics
- Defense, Weapons
Parallel Computing – Who is using?
• Industrial and Commercial
- Today, commercial applications provide an equal or greater driving force in the development of faster computers. These applications require the processing of large amounts of data in sophisticated ways. For example:
- "Big Data", data mining
- Artificial Intelligence (AI)
- Oil exploration
Parallel Computing – Who is using?
• Industrial and Commercial
- Web search engines, web based business services
- Medical imaging and diagnosis
- Pharmaceutical design
- Financial and economic modeling
- Management of national and multi-national corporations
- Advanced graphics and virtual reality, particularly in the entertainment
industry
- Networked video and multi-media technologies
- Collaborative work environments
Parallel Computing – Who is using?
• Global Applications
- Parallel computing is now being used extensively around the world, in a
wide variety of applications
Parallel Computing – Limitations
• It introduces challenges, such as communication and synchronization between multiple sub-tasks and processes, which are difficult to achieve.
• Algorithms must be structured so that they can be handled by a parallel mechanism.
• The algorithms or programs must have low coupling and high cohesion, but it is difficult to create such programs.
• Writing a good parallelism-based program requires more technically skilled and expert programmers.
von Neumann Computer
• Model for designing and building computers, based on the following
three characteristics:
1) The computer consists of four main sub-systems:
• Memory
• ALU (Arithmetic/Logic Unit)
• Control Unit
• Input/Output System (I/O)

2) Program is stored in memory during execution.

3) Program instructions are executed sequentially.


von Neumann Computer
[Diagram] A bus connects the processor (CPU), the memory, and the input/output system:
• Memory – stores data and the program.
• Processor (CPU = Control Unit + ALU) – executes the program; the ALU performs the arithmetic/logic operations requested by the program.
• Input/Output – communicates with the "outside world", e.g. screen, keyboard, storage devices.
von Neumann Computer
• All computers are more or less based on the same basic design, the von Neumann architecture!
Personal Computer
Motivations for Parallel Computing
• Fundamental limits on single processor speed
• Heat dissipation from CPU chips
• Disparity between CPU & memory speeds
• Distributed data communications
• Need for very large scale computing platforms
Motivations for Parallel Computing
• Fundamental limits – Cycle Speed
- Intel 8080 2MHz 1974
- ARM 2 8MHz 1986
- Intel Pentium Pro 200MHz 1996
- AMD Athlon 1.2GHz 2000
- Intel QX6700 2.66GHz 2006
- Intel Core i7-3770K 3.9GHz 2012
- Speed of light: 30cm in 1ns
Moore’s Law
• Moore’s observation in 1965: the number of transistors per square
inch on integrated circuits had doubled every year since the
integrated circuit was invented
• Moore's revised observation in 1975: the pace had slowed a bit, but data density was still doubling approximately every 18 months
• What about the future? (Does the price of computing power fall by half every 18 months?)
Moore’s Law – Held for Now
Power Wall Effect in Computer Architecture
• Too many transistors in a given chip die area
• Tremendous increase in power density
• Increased chip temperature
• High temperature slows down the transistor switching rate and the overall speed of the computer
• The chip may melt down if not cooled properly, and efficient cooling systems are expensive
Cooling Computer Chips
• Some people even suggest putting computer chips in liquid nitrogen to cool them
Solutions
• Use multiple inexpensive processors
• Use a processor with multiple cores
A Multi-Core Processor
CPU and Memory Speed
• In 20 years, CPU speed (clock rate) has increased by a factor of 1000
• DRAM speed has increased only by a factor of less than 4
• How can data be fed fast enough to keep the CPU busy?
• CPU cycle time: 1–2 ns
• DRAM access time: 50–60 ns
• Cache access time: ~10 ns
Memory Access and CPU Speed
CPU, Memory and Disk Speed
Possible Solutions
• A hierarchy of successively faster memory devices (multilevel caches)
• Locality of data reference in the code
• Efficient programming can be an issue
• Parallel systems may provide
1) a larger aggregate cache
2) higher aggregate bandwidth to the memory system
(see the locality sketch below)
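An illustrative C sketch of the locality point (assumed example, not from the slides): both loops compute the same sum over a matrix, but the row-major traversal walks memory contiguously and is far more cache-friendly than the column-major one.

    #include <stdio.h>

    #define N 1024
    static double m[N][N];   /* C stores this row by row (row-major) */

    int main(void) {
        double sum = 0.0;

        /* Cache-friendly: consecutive iterations touch adjacent memory. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += m[i][j];

        /* Cache-unfriendly: each access jumps N*sizeof(double) bytes. */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += m[i][j];

        printf("sum = %f\n", sum);
        return 0;
    }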
Parallel Computing – Useful Terms
• Concurrent - Events or processes which seem to occur or progress at the
same time.
• Parallel – Events or processes which occur or progress at the same time.
• Parallel programming (also, unfortunately, sometimes called concurrent
programming), is a computer programming technique that provides for the
execution of operations concurrently, either
- within a single parallel computer OR
- across a number of systems.
• In the latter case, the term distributed computing is used.
Parallel Computing – Useful Terms
• All processors have access to all memory locations.
• Uniform memory access (UMA)
• Similar to a uniprocessor, except additional, identical CPUs are added to the bus.
• Each processor has equal access to memory and can do anything that any other processor can do.
• Also called a symmetric multiprocessor, or SMP.
• We will discuss this in greater detail later (e.g., text pg 43).
• SMPs and clusters of SMPs are currently very popular.
Parallel Computing – Useful Terms
• Coupling and Cohesion
- In software engineering, coupling is the degree of interdependence between software
modules; a measure of how closely connected two routines or modules are; the
strength of the relationships between modules
- Coupling is usually contrasted with cohesion. Low coupling often correlates with high cohesion, and vice versa. Low coupling is often thought to be a sign of a well-structured computer system and a good design, and when combined with high cohesion, supports the general goals of high readability and maintainability.
Parallel Computing – Useful Terms
• FLOPS
- Floating-point operations per second, or FLOPS, is the unit of
measurement that calculates the performance capability of a
supercomputer.
- Floating-point operations can only be executed on computers with
integrated floating-point registers.
- The average computer's processor performance is measured in megahertz (MHz), i.e., by its clock speed. Since supercomputers are far more capable in terms of performance, their performance must be measured on a considerably larger scale.
Parallel Computing – Useful Terms
• FLOPS
One petaFLOPS is equal to 1,000,000,000,000,000 (one quadrillion) FLOPS,
or one thousand teraFLOPS

One teraFLOPS is equal to 1,000,000,000,000 (one trillion) FLOPS.
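As a worked illustration (not from the slides), a machine's theoretical peak performance can be estimated as number of cores × clock rate × floating-point operations per cycle. For example, assuming a hypothetical chip with 8 cores at 3 GHz and 16 FLOPs per cycle per core: 8 × (3 × 10^9) × 16 = 3.84 × 10^11 FLOPS ≈ 384 gigaFLOPS, or about 0.000384 petaFLOPS.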


Parallel Computing – Useful Terms
• SETI@home

SETI@home ("SETI at home") is a project of the Berkeley SETI Research


Center to analyze radio signals, searching for signs of extraterrestrial
intelligence. Until March 2020, it was run as an Internet-based public
volunteer computing project that employed the BOINC software platform. It
is hosted by the Space Sciences Laboratory at the University of California,
Berkeley, and is one of many activities undertaken as part of the worldwide
SETI effort.
Parallel Computing – Useful Terms
• Extraterrestrial intelligence
Extraterrestrial intelligence (often abbreviated ETI) refers to hypothetical intelligent extraterrestrial life, i.e., hypothetical life that may occur outside Earth and did not originate on Earth.