Chapter 6: Parallel Processors
5th
Edition
The Hardware/Software Interface
Chapter 6: Parallel Processors from Client to Cloud
§6.1 Introduction
Introduction
Multiprocessor
Goal: connecting multiple computers
to get higher performance
Scalability, availability, power efficiency
Multicore processors
Chips with multiple processors (cores)
Task-level (process-level) parallelism
High throughput for independent jobs
Parallel program (parallel software)
A single program run on multiple processors
Challenges: partitioning, coordination,
communication overhead
Chapter 6 — Parallel Processors from Client to Cloud — 2
Amdahl’s Law
Sequential part can limit speedup
Example: 100 processors, 90× speedup?
Tnew = Tparallelizable/100 + Tsequential
Speedup = 1/((1 − Fparallelizable) + Fparallelizable/100) = 90
Solving: Fparallelizable = 0.999
Need sequential part to be 0.1% of original time
Scaling Example
Workload: sum of 10 scalars, and 10 × 10 matrix sum
Speed up from 10 to 100 processors
Single processor: Time = (10 + 100) × tadd
10 processors
Time = 10 × tadd + 100/10 × tadd = 20 × tadd
Speedup = 110/20 = 5.5 (55% of potential)
100 processors
Time = 10 × tadd + 100/100 × tadd = 11 × tadd
Speedup = 110/11 = 10 (10% of potential)
Assumes load can be balanced across processors
Strong vs Weak Scaling
Strong scaling: problem size fixed (as in the example above)
Weak scaling: problem size proportional to number of processors
10 processors, 10 × 10 matrix: Time = 20 × tadd
100 processors, 32 × 32 matrix: Time = 10 × tadd + 1000/100 × tadd = 20 × tadd
Constant performance in this case
[Figure: shared-memory multiprocessor organizations — processors and memories joined by an interconnection network; UMA (uniform memory access) vs. NUMA (nonuniform memory access)]
Example: Sum Reduction
Sum 100,000 numbers on a 100-processor UMA machine
Each processor has ID: 0 ≤ Pn ≤ 99
Partition 1000 numbers per processor
Initial summation on each processor
sum[Pn] = 0;
for (i = 1000*Pn; i < 1000*(Pn+1); i = i + 1)
    sum[Pn] = sum[Pn] + A[i];
Now need to add these partial sums
Reduction: divide and conquer
Half the processors add pairs, then quarter, …
Need to synchronize between reduction steps
half = 100; /* 100 processors */
repeat
    synch(); /* wait for all partial sums before each step */
    if (half%2 != 0 && Pn == 0)
        sum[0] = sum[0] + sum[half-1];
        /* when half is odd, processor 0 adds the unpaired element */
    half = half/2; /* dividing line on who sums */
    if (Pn < half) sum[Pn] = sum[Pn] + sum[Pn+half];
until (half == 1);
[Figure: interconnection network topologies — bus, ring, 2D mesh, N-cube (N = 3), fully connected]