MULTIPLE PROCESSOR
ORGANIZATION
• Single instruction, single data stream - SISD
• Single instruction, multiple data stream - SIMD
• Multiple instruction, single data stream - MISD
• Multiple instruction, multiple data stream - MIMD
SINGLE INSTRUCTION,
SINGLE DATA STREAM -
SISD
• Single processor
• Single instruction stream
• Data stored in single memory
• Uni-processor
SINGLE INSTRUCTION,
MULTIPLE DATA STREAM -
SIMD
• Single machine instruction
• Controls simultaneous execution
• Number of processing elements
• Lockstep basis
• Each processing element has associated data memory
• Each instruction executed on different set of data by
different processors
• Vector and array processors
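A concrete illustration using x86 SSE intrinsics (an assumption for illustration; the slides name no particular instruction set): one machine instruction performs four additions in lockstep on four data elements.

/* Illustrative only: one SSE instruction (addps) adds four float
   elements at once -- single instruction, multiple data streams. */
#include <xmmintrin.h>
#include <stdio.h>

int main(void) {
    float a[4] = {1, 2, 3, 4};
    float b[4] = {10, 20, 30, 40};
    float c[4];

    __m128 va = _mm_loadu_ps(a);       /* load 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);    /* ONE instruction, 4 additions in lockstep */
    _mm_storeu_ps(c, vc);

    for (int i = 0; i < 4; i++)
        printf("%.0f ", c[i]);         /* prints: 11 22 33 44 */
    return 0;
}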
MULTIPLE INSTRUCTION,
SINGLE DATA STREAM -
MISD
• Sequence of data
• Transmitted to set of processors
• Each processor executes different instruction
sequence
• Never been commercially implemented
TAXONOMY OF PARALLEL
PROCESSOR
ARCHITECTURES
MIMD - OVERVIEW
• General purpose processors
• Each can process all instructions necessary
• Further classified by method of processor
communication
TIGHTLY COUPLED - SMP
• Processors share memory
• Communicate via that shared memory
• Symmetric Multiprocessor (SMP)
• Share single memory or pool
• Shared bus to access memory
• Memory access time to given area of memory is
approximately the same for each processor
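A minimal sketch of communication via shared memory, using POSIX threads to stand in for processors on an SMP (illustrative; the variable and function names are assumptions, not from the slides):

/* Two threads on an SMP communicate through an ordinary shared
   variable; no messages are exchanged. Illustrative sketch only. */
#include <pthread.h>
#include <stdio.h>

int shared_value = 0;                       /* lives in the shared memory pool */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *producer(void *arg) {
    pthread_mutex_lock(&lock);
    shared_value = 42;                      /* communicate by writing shared memory */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(t, NULL);
    printf("read from shared memory: %d\n", shared_value);
    return 0;
}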
TIGHTLY COUPLED - NUMA
• Non-uniform memory access
• Access times to different regions of memory may
differ.
LOOSELY COUPLED -
CLUSTERS
• Collection of independent uniprocessors or SMPs
• Interconnected to form a cluster
• Communication via fixed path or network connections
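A minimal sketch of message-based communication between cluster nodes, using MPI (illustrative; assumes an MPI installation and exactly two processes):

/* Message passing between independent nodes of a cluster (MPI).
   Compile with mpicc, run with mpirun -np 2. Illustrative sketch. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 99;
        /* no shared memory: send an explicit message over the network */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node 1 received %d over the network\n", value);
    }
    MPI_Finalize();
    return 0;
}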
PARALLEL ORGANIZATIONS
[Diagrams in the original slides: SISD, SIMD, MIMD with shared memory, MIMD with distributed memory]
SYMMETRIC
MULTIPROCESSORS
• A stand-alone computer with the following characteristics:
• Two or more similar processors of comparable capacity
• Processors share same memory and I/O
• Processors are connected by a bus or other internal connection
• Memory access time is approximately the same for each processor
• All processors share access to I/O
• Either through same channels or different channels giving paths to same
devices
• All processors can perform the same functions (hence symmetric)
• System controlled by integrated operating system
• providing interaction between processors
• Interaction at job, task, file and data element levels
MULTIPROGRAMMING AND
MULTIPROCESSING
SMP ADVANTAGES
• Performance
• If some work can be done in parallel
• Availability
• Since all processors can perform the same functions, failure of a single
processor does not halt the system
• Incremental growth
• User can enhance performance by adding additional processors
• Scaling
• Vendors can offer range of products based on number of processors
TIGHTLY COUPLED
MULTIPROCESSOR
MULTITHREADING AND
CHIP MULTIPROCESSORS
• Instruction stream divided into smaller streams (threads)
• Executed in parallel
• Wide variety of multithreading designs
DEFINITIONS OF THREADS
AND PROCESSES
• A thread in a multithreaded processor may or may not be the
same as a software thread
• Process:
• An instance of program running on computer
• Resource ownership
• Virtual address space to hold process image
• Scheduling/execution
• Process switch
Cont…
• Thread: dispatchable unit of work within a process
• Includes processor context (which includes the program
counter and stack pointer) and data area for stack
• Thread executes sequentially
• Interruptible: processor can turn to another thread
• Thread switch
• Switching processor between threads within same process
• Typically less costly than process switch
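A minimal POSIX threads sketch of these definitions (illustrative): one process dispatches two threads that share its address space, each executing sequentially with its own stack and program counter.

/* Two threads inside one process: each has its own stack and program
   counter, but both share the process's address space. */
#include <pthread.h>
#include <stdio.h>

void *worker(void *arg) {
    /* Each thread executes sequentially within the shared process image. */
    printf("thread %ld running in the same address space\n", (long)arg);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);   /* dispatchable units of work */
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);                          /* switching between these is */
    pthread_join(t2, NULL);                          /* cheaper than a process switch */
    return 0;
}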
IMPLICIT AND EXPLICIT
MULTITHREADING
• All commercial processors and most experimental ones use explicit
multithreading
• Concurrently execute instructions from different explicit threads
• Interleave instructions from different threads on shared pipelines, or
execute them in parallel on parallel pipelines
• Implicit multithreading is concurrent execution of multiple threads
extracted from single sequential program
• Implicit threads defined statically by compiler or dynamically by hardware
APPROACHES TO EXPLICIT
MULTITHREADING
• Interleaved
• Fine-grained
• Processor deals with two or more thread contexts at a time
• Switching thread at each clock cycle
• If a thread is blocked, it is skipped (see the toy model after this list)
• Blocked
• Coarse-grained
• Thread executed until event causes delay
• E.g. Cache miss
• Effective on an in-order processor
• Avoids pipeline stalls
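A toy software model of the interleaved (fine-grained) policy (illustrative simulation, not real hardware; all names are invented): the "processor" switches thread context every clock cycle and skips any thread that is blocked.

/* Toy model of fine-grained (interleaved) multithreading: switch
   thread every cycle; a blocked thread is simply skipped. */
#include <stdio.h>
#include <stdbool.h>

#define NTHREADS 3

struct context { int pc; bool blocked; };   /* hypothetical thread context */

int main(void) {
    /* thread 1 is blocked, e.g. on a cache miss */
    struct context ctx[NTHREADS] = {{0, false}, {0, true}, {0, false}};

    for (int cycle = 0; cycle < 6; cycle++) {
        int t = cycle % NTHREADS;           /* switch thread at each clock cycle */
        if (ctx[t].blocked) {
            printf("cycle %d: thread %d blocked, skipped\n", cycle, t);
            continue;
        }
        ctx[t].pc++;                        /* issue one instruction from thread t */
        printf("cycle %d: thread %d issues instruction %d\n", cycle, t, ctx[t].pc);
    }
    return 0;
}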
MULTIPROCESSOR SYSTEM
• A multiprocessor system is a single computer that includes multiple
processors (computer modules).
• Processors may communicate and cooperate at different levels in
solving a given problem.
• The communication may occur by sending messages from one
processor to the other or by sharing a common memory.
• A multiprocessor system is controlled by one operating system which
provides interaction between processors and their programs at the
process, data set and data element levels.
MULTICOMPUTERS
• There is a group of processors, each of which has a sufficient
amount of local memory.
• The communication between the processors is through messages.
• There is neither a common memory nor a common clock.
• This is also called distributed processing.
GRID COMPUTING
• Grid Computing enables geographically dispersed computers or
computing clusters to dynamically and virtually share applications,
data, and computational resources.
• It uses standard TCP/IP networks to provide transparent access to
technical computing services wherever capacity is available,
transforming technical computing into an information utility that is
available across a department or organization.
Challenges resulting from multi-core
• Relies on effective exploitation of multiple-thread parallelism
  • Need for a parallel computing model and a parallel programming model
• Aggravates the memory wall
  • Memory bandwidth
    ▪ Way to get data out of memory banks
    ▪ Way to get data into the multi-core processor array
  • Memory latency
  • Fragments the L3 cache
• Pins become a strangle point
  ▪ Rate of pin growth projected to slow and flatten
  ▪ Rate of bandwidth per pin (pair) projected to grow slowly
• Requires mechanisms for efficient inter-processor coordination (see the mutual-exclusion sketch below)
  • Synchronization
  • Mutual exclusion
  • Context switching
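A minimal sketch of one such coordination mechanism, mutual exclusion, using POSIX threads (illustrative; the thread count and names are assumptions, not from the slides):

/* Inter-processor coordination on a multi-core: mutual exclusion
   around a shared counter using a mutex. Illustrative sketch. */
#include <pthread.h>
#include <stdio.h>

long counter = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *add(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&m);     /* only one core at a time in the critical section */
        counter++;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, add, NULL);
    pthread_create(&b, NULL, add, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (always 200000 with the mutex)\n", counter);
    return 0;
}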
Advantages of Multi-core
• Cache coherency circuitry can operate at a much higher clock rate
than is possible if the signals have to travel off-chip.
• Signals between different CPUs travel shorter distances, so they degrade less.
• These higher-quality signals allow more data to be sent in a given
time period, since individual signals can be shorter and do not need to
be repeated as often.
• A dual-core processor uses slightly less power than two coupled
single-core processors.
Performance
Introduction
Performance measurement is important:
• Helps us determine whether one processor or computer works faster than another
• Helps us know how much performance improvement has taken place after incorporating some performance-enhancement feature
• Helps us see through the marketing hype!
It provides answers to questions such as:
• Why is some hardware better than others for different programs?
• What factors affect system performance? Hardware, OS, or compiler?
• How does the machine's instruction set affect performance?
Defining Performance in terms of time
Time is the final measure of computer performance:
a computer exhibits higher performance if it executes a program faster.
• Response time (elapsed time, latency): the individual user's concern
  • How long does it take for my job to run?
  • How long does it take to execute my job (start to finish)?
  • How long must I wait for the database query?
• Throughput: the system manager's concern
  • How many jobs can the machine run at once?
  • What is the average execution rate?
  • How much work is getting done?
Execution Time
• Elapsed time
  • Counts everything (disk and memory accesses, waiting for I/O, running other programs, etc.) from start to finish
  • A useful number, but often not good for comparison purposes
  • Elapsed time = CPU time + wait time (I/O, other programs, etc.)
• CPU time
  • Doesn't count waiting for I/O or time spent running other programs
  • Can be divided into user CPU time and system CPU time (OS calls)
  • CPU time = user CPU time + system CPU time
  • Elapsed time = user CPU time + system CPU time + wait time
• Our focus: user CPU time
  • CPU execution time, or simply execution time: the time spent executing the lines of code that are in our program (see the timing sketch below)
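A minimal C sketch of the distinction (illustrative; clock() approximates the CPU time this process has used, while time() measures elapsed wall-clock time):

/* Distinguishing elapsed (wall-clock) time from CPU time.
   clock() counts processor time used by this program; time()
   counts everything, including waiting. Illustrative sketch. */
#include <stdio.h>
#include <time.h>

int main(void) {
    time_t  wall0 = time(NULL);
    clock_t cpu0  = clock();

    volatile double x = 0;
    for (long i = 0; i < 100000000L; i++)   /* CPU-bound work */
        x += i * 0.5;

    double cpu_s  = (double)(clock() - cpu0) / CLOCKS_PER_SEC;  /* ~CPU time */
    double wall_s = difftime(time(NULL), wall0);                /* elapsed time */
    printf("CPU time: %.2f s, elapsed time: %.0f s\n", cpu_s, wall_s);
    return 0;
}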
Measuring performance
For some program running on machine X:

  Performance_X = 1 / Execution time_X

"X is n times faster than Y" means:

  Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
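A worked example with assumed numbers: if a program takes 10 s on X and 15 s on Y, then Performance_X / Performance_Y = 15 / 10 = 1.5, so X is 1.5 times faster than Y.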
The IRON law of processor performance

  Processor performance = Time / Program
    = (Instructions / Program) x (Cycles / Instruction) x (Time / Cycle)
    = (Code size) x (CPI) x (Cycle time)

  Code size  is determined by the architecture    (compiler designer)
  CPI        is determined by the implementation  (processor designer)
  Cycle time is determined by the realization     (chip designer)
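A worked example with assumed numbers: a program executing 2 x 10^9 instructions with an average CPI of 1.5 on a 1 GHz clock (cycle time 1 ns) takes

  Time = (2 x 10^9) x 1.5 x (1 x 10^-9 s) = 3 s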
MIPS and MFLOPS
Problems with MIPS
Problem
Find the number of instructions for each code sequence, the faster
code sequence, and the CPI for each code sequence.
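Since the slide's data table is not reproduced here, a worked example with assumed numbers: suppose instruction classes A, B, C have CPI 1, 2, 3; code sequence 1 executes 2 A, 1 B, and 2 C instructions, while sequence 2 executes 4 A, 1 B, and 1 C.

  Instruction count: sequence 1 = 5, sequence 2 = 6
  Cycle count: sequence 1 = 2(1) + 1(2) + 2(3) = 10; sequence 2 = 4(1) + 1(2) + 1(3) = 9
  Sequence 2 executes more instructions but is faster (9 cycles vs. 10)
  CPI: sequence 1 = 10/5 = 2.0; sequence 2 = 9/6 = 1.5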
Benchmark sample with problem