UE21CS251B: Microprocessor & Computer Architecture (μpCA)
Unit 4: Parallel Computing Concepts and Terminology, Design Issues & Constraints
Session 4.3
Parallel Computing Concepts and Terminology
Node: A standalone "computer in a box", usually comprising multiple CPUs/processors/cores, memory, network interfaces, etc. Nodes are networked together to form a supercomputer.
Task: A logically discrete section of computational work. A task is typically a program or program-like set of instructions that is executed by a processor. A parallel program consists of multiple tasks running on multiple processors.
Pipelining: Breaking a task into steps performed by different processor units, with inputs streaming through, much like an assembly line; a type of parallel computing.
Shared Memory: From a strictly hardware point of view, describes a computer architecture where all processors have
direct (usually bus based) access to common physical memory. In a programming sense, it describes a model where parallel
tasks all have the same "picture" of memory and can directly address and access the same logical memory locations
regardless of where the physical memory actually exists.
Symmetric Multi-Processor (SMP): Shared memory hardware architecture where multiple processors share a single
address space and have equal access to all resources.
Distributed Memory: In hardware, refers to network based memory access for physical memory that is not common. As a
programming model, tasks can only logically "see" local machine memory and must use communications to access memory
on other machines where other tasks are executing.
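A minimal sketch of the shared-memory programming model (assuming POSIX threads are available; the variable and function names here are illustrative only): both threads directly address the same logical variable, and a mutex coordinates access to it.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                          /* memory shared by all threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);                /* coordinate access to shared data */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);           /* both threads saw the same logical memory */
    return 0;
}

(Compile with, e.g., gcc -pthread.) Under the distributed-memory model, by contrast, the two tasks could not share counter directly and would have to exchange its value explicitly.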
Communications: Parallel tasks typically need to exchange data. There are several ways this can be accomplished, such as through a shared memory bus or over a network; however, the actual event of data exchange is commonly referred to as communications, regardless of the method employed.
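A minimal sketch of explicit communications between two tasks (assuming an MPI implementation such as MPICH or Open MPI is available; the value exchanged is arbitrary): task 0 sends an integer and task 1 receives it, whatever transport the library uses underneath.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);      /* data exchange: send to task 1 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("task 1 received %d from task 0\n", value);
    }

    MPI_Finalize();
    return 0;
}

(Run with at least two tasks, e.g. mpirun -np 2 ./a.out.)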
Synchronization: The coordination of parallel tasks in real time, very often associated with communications. Often implemented by establishing a synchronization point within an application where a task may not proceed further until another task(s) reaches the same or logically equivalent point. Synchronization usually involves waiting by at least one task, and can therefore cause a parallel application's wall clock execution time to increase.
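A minimal sketch of a synchronization point (assuming OpenMP is available; the two "phases" are placeholders): no thread proceeds past the barrier until every thread has reached it, so at least one thread ends up waiting.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        printf("thread %d: finished phase 1\n", id);

        #pragma omp barrier               /* synchronization point: all threads wait here */

        printf("thread %d: starting phase 2\n", id);
    }
    return 0;
}

(Compile with, e.g., gcc -fopenmp.)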
Granularity: In parallel computing, granularity is a qualitative measure of the ratio of computation to communication.
•Coarse: relatively large amounts of computational work are done between communication events
•Fine: relatively small amounts of computational work are done between communication events
Observed Speedup: Observed speedup of a code which has been parallelized, defined as:
speedup = (wall-clock time of serial execution) / (wall-clock time of parallel execution)
One of the simplest and most widely used indicators for a parallel program's performance.
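For example (with hypothetical timings), a program that takes 120 seconds of wall-clock time serially and 30 seconds when run on 8 processors has an observed speedup of 120 / 30 = 4, even though 8 processors were used.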
Parallel Overhead: The amount of time required to coordinate parallel tasks, as opposed to doing useful work. Parallel overhead can include factors such as:
•Task start-up time
•Synchronizations
•Data communications
•Software overhead imposed by parallel languages, libraries, operating system, etc.
•Task termination time
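A minimal sketch that makes parallel overhead visible (assuming OpenMP; the loop and its size are illustrative): for a very small amount of work, thread start-up and synchronization can outweigh the useful computation, so the parallel version may take as long as, or longer than, the serial one.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    double sum = 0.0;

    double t0 = omp_get_wtime();
    for (int i = 0; i < 1000; i++)                /* serial version */
        sum += i;
    double t_serial = omp_get_wtime() - t0;

    sum = 0.0;
    t0 = omp_get_wtime();
    #pragma omp parallel for reduction(+:sum)     /* parallel version: pays start-up + reduction costs */
    for (int i = 0; i < 1000; i++)
        sum += i;
    double t_parallel = omp_get_wtime() - t0;

    printf("serial: %g s, parallel: %g s\n", t_serial, t_parallel);
    return 0;
}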
Massively Parallel: Refers to the hardware that makes up a given parallel system - having many processing elements. The meaning of "many" keeps increasing, but currently the largest parallel computers are made up of processing elements numbering in the hundreds of thousands to millions.
Embarrassingly Parallel: Solving many similar, but independent tasks simultaneously; little to no need for coordination
between the tasks.
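A minimal sketch of an embarrassingly parallel loop (assuming OpenMP; the array and the per-element computation are illustrative): every iteration is independent, so the only coordination needed is dividing the iterations among threads.

#include <omp.h>
#include <stdio.h>

#define N 1000000

static double y[N];

int main(void)
{
    #pragma omp parallel for                      /* iterations split among threads, no communication */
    for (int i = 0; i < N; i++)
        y[i] = (double)i * (double)i;             /* each element computed independently */

    printf("y[10] = %f\n", y[10]);
    return 0;
}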
Scalability: Refers to a parallel system's (hardware and/or software) ability to demonstrate a proportionate increase in parallel speedup with the addition of more resources. Factors that contribute to scalability include:
•Hardware - particularly memory-CPU bandwidths and network communication properties
•Application algorithm
•Parallel overhead
•Characteristics of your specific application
Parallel Computing Models: Single Program Multiple Data (SPMD):
• SPMD is actually a "high level" programming model that can be built upon any combination of the previously
mentioned parallel programming models.
• SINGLE PROGRAM: All tasks execute their copy of the same program simultaneously. This program can be
threads, message passing, data parallel or hybrid.
• SPMD programs usually have the necessary logic programmed into them to allow different tasks to branch or
conditionally execute only those parts of the program they are designed to execute. That is, tasks do not
necessarily have to execute the entire program - perhaps only a portion of it.
• The SPMD model, using message passing or hybrid programming, is probably the most commonly used
parallel programming model for multi-node clusters.
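A minimal SPMD sketch using message passing (assuming an MPI implementation is available; the branch bodies are placeholders): every task runs this same program and uses its rank to execute only the portion it is responsible for.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* only the coordinating task executes this branch */
        printf("running with %d tasks\n", size);
    } else {
        /* every other task executes only its portion of the program */
        printf("task %d doing its share of the work\n", rank);
    }

    MPI_Finalize();
    return 0;
}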
Parallel Computing Models: Multiple Program Multiple Data (MPMD):
• Like SPMD, MPMD is actually a "high level" programming model that can be built upon any combination of the
previously mentioned parallel programming models.
• MULTIPLE PROGRAM: Tasks may execute different programs simultaneously. The programs can be threads, message
passing, data parallel or hybrid.
• MPMD applications are not as common as SPMD applications, but may be better suited for certain types of problems,
particularly those that lend themselves better to functional decomposition than domain decomposition (discussed later
under Partitioning).
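As a concrete, implementation-specific illustration: MPI launchers such as mpirun in Open MPI and MPICH accept a colon-separated MPMD syntax, e.g. mpirun -np 1 ./manager : -np 4 ./worker (the executable names here are hypothetical), which starts two different programs as tasks of a single parallel job.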
Parallel Computing - Design Issues:
Thread Level
• Multi-threading vs Hyper-threading or Simultaneous Multi-threading.
Instruction Level
• Pipelining
• Super Pipelining
• Super Scalar
• Vector & Array Processing
• VLIW
• EPIC
• Parallel Computing vs Multicore Computing
Single thread can run at any given time.
[Figure: processor resource diagrams (L1 D-Cache/D-TLB, schedulers, uop queues, rename/alloc, decoder, bus) with Thread 2 shown performing an integer operation]
Multi-Threading: Hyper-Threading or Simultaneous Multithreading?
[Figure: processor resource diagrams (L1 D-Cache/D-TLB, schedulers, uop queues, duplicated rename/alloc, decoder, bus) shared by Thread 3 and Thread 4]
SMT Dual-core: all four threads can run concurrently.
[Figure: two SMT cores on a shared bus, each with duplicated rename/alloc resources]
Pipelining
Example loop whose successive iterations are overlapped in the pipeline:
for (i = 1; i <= 6; i++)
    Out = i + i;
[Figure: pipeline timing diagram showing the six loop iterations overlapped across clock cycles 1-10]
VLIW
[Figure: a VLIW instruction word containing the operation slots ADD, MUL, ADDF, MULF, LDR, MOV]
VLIW allows multiple independent instructions to be packed together by the compiler, enabling parallel execution.
[Figure: array of processing elements (PEs)]
Team MPCA
Department of Computer Science and Engineering