Introduction to Parallel Programming – Part 6
Analyzing Parallel Performance
Intel Software College
Objectives
At the end of this module, you should be able to
Define speedup and efficiency
Use Amdahl’s Law to predict maximum speedup
Use the Karp-Flatt metric to
analyze parallel program performance
predict speedup with additional processors
Copyright © 2006, Intel Corporation. All rights reserved.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
Speedup
Speedup is the ratio between sequential execution time and
parallel execution time
For example, if the sequential program executes in 6
seconds and the parallel program executes in 2 seconds, the
speedup is 3
[Figure: typical speedup curves plotted against number of processors, with the line y = x for reference]
Efficiency
Efficiency
A measure of processor utilization
Speedup divided by the number of processors
Example
Program achieves speedup of 3 on 4 CPUs
Efficiency is 3 / 4 = 75%
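A minimal sketch of the two definitions. The helper names `speedup` and `efficiency` are illustrative, and the efficiency call reuses the 6-second and 2-second timings from the Speedup slide under the assumption that the parallel run used 4 CPUs.

```python
def speedup(sequential_time: float, parallel_time: float) -> float:
    """Ratio of sequential execution time to parallel execution time."""
    return sequential_time / parallel_time


def efficiency(sequential_time: float, parallel_time: float, processors: int) -> float:
    """Speedup divided by the number of processors."""
    return speedup(sequential_time, parallel_time) / processors


# 6 s sequential, 2 s parallel gives a speedup of 3.
print(speedup(6.0, 2.0))        # 3.0
# Assumed: the same timings measured on 4 CPUs give 3 / 4 = 75% efficiency.
print(efficiency(6.0, 2.0, 4))  # 0.75
```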
[Figure: typical efficiency curves plotted against number of processors, with the line y = 1.0 for reference]
Idea Behind Amdahl’s Law
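f: portion of computation that will be performed sequentially
1-f: portion of computation that will be executed in parallel
[Figure: execution time versus number of processors. The sequential portion f is the same for every processor count; the parallel portion is 1-f on 1 processor and (1-f)/2, (1-f)/3, (1-f)/4, (1-f)/5 on 2 through 5 processors]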
Derivation of Amdahl’s Law
Speedup is ratio of execution time on 1 processor to
execution time on p processors
Execution time on 1 processor is f + (1-f)
Execution time on p processors is at least f + (1-f)/p
$$\psi \le \frac{f + (1-f)}{f + (1-f)/p} = \frac{1}{f + (1-f)/p}$$
where ψ denotes speedup
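One consequence of this bound, worked out here as a short step rather than taken from the slides: letting the number of processors grow without limit shows that the sequential fraction f alone caps the speedup.

```latex
\lim_{p \to \infty} \frac{1}{f + (1-f)/p} = \frac{1}{f}
```

For instance, a program with f = 0.1 cannot run more than 10 times faster, however many processors are added.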
Amdahl’s Law Is Too Optimistic
Amdahl’s Law ignores parallel processing overhead
Examples of this overhead include time spent
creating and terminating threads
Parallel processing overhead is usually an increasing
function of the number of processors
Graph with Parallel Overhead Added
Parallel overhead increases with number of processors
[Figure: execution time versus number of processors, with parallel overhead added]
Other Optimistic Assumptions
Amdahl’s Law assumes that the computation divides
evenly among the processors
In reality, the amount of work often does not divide evenly
among the processors
Processor waiting time is another form of overhead
[Figure: processor timeline from task started to task completed, divided into working time and waiting time]
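A small sketch of how waiting time arises from an uneven split. The function name `imbalance_report` and the working times are hypothetical; the only assumption is that a parallel phase ends when its slowest processor finishes.

```python
def imbalance_report(work_times):
    """Return (parallel_time, waiting_times, ideal_time) for one parallel phase.

    work_times[i] is the working time of processor i. Every processor has to
    wait for the slowest one, so the phase lasts max(work_times).
    """
    parallel_time = max(work_times)
    waiting_times = [parallel_time - t for t in work_times]
    ideal_time = sum(work_times) / len(work_times)  # perfectly even split
    return parallel_time, waiting_times, ideal_time


# Hypothetical working times (seconds) on 4 processors.
parallel_time, waiting, ideal = imbalance_report([4.0, 3.0, 5.0, 4.0])
print(parallel_time)  # 5.0
print(waiting)        # [1.0, 2.0, 0.0, 1.0]
print(ideal)          # 4.0
```

The gap between the 5.0 seconds actually needed and the 4.0 seconds of an even split is time that Amdahl's Law does not account for.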
Graph with Workload Imbalance Added
[Figure: execution time versus number of processors, showing time lost due to workload imbalance]
More General Speedup Formula
ψ(n,p): speedup for problem of size n on p CPUs
σ(n): time spent in sequential portion of code for problem of size n
φ(n): time spent in parallelizable portion of code for problem of size n
κ(n,p): parallel overhead
$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$
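The bound can be evaluated directly. In the sketch below the timings (1 s sequential, 9 s parallelizable) and the overhead model (0.1 s for each processor beyond the first) are hypothetical values chosen only to show the shape of the curve.

```python
def speedup_bound(sigma: float, phi: float, kappa: float, p: int) -> float:
    """Upper bound on speedup: (sigma + phi) / (sigma + phi / p + kappa)."""
    return (sigma + phi) / (sigma + phi / p + kappa)


# Assumed model: overhead grows by 0.1 s per additional processor.
for p in (1, 2, 4, 8, 16, 32):
    kappa = 0.1 * (p - 1)
    print(p, round(speedup_bound(1.0, 9.0, kappa, p), 2))
```

Under this assumed overhead model the bound peaks at p = 8 among the processor counts sampled and then falls, because the growing overhead outweighs the shrinking parallel term.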
Amdahl’s Law: Maximum Speedup
$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$
The φ(n)/p term assumes parallel work divides perfectly among available CPUs
The κ(n,p) term is set to 0
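The link to the fraction-based form of Amdahl's Law can be written out as a short step. It assumes f is defined as the sequential share of single-processor time, f = σ(n) / (σ(n) + φ(n)); dividing numerator and denominator by σ(n) + φ(n) then gives:

```latex
\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p}
          = \frac{1}{f + (1-f)/p},
\qquad f = \frac{\sigma(n)}{\sigma(n) + \varphi(n)}
```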
The Amdahl Effect
$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$
As n → ∞, the φ(n) and φ(n)/p terms dominate
Speedup is an increasing function of problem size
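A sketch of the effect under assumed cost models. The functions `sigma`, `phi`, and `kappa` below are hypothetical (linear sequential time, quadratic parallelizable time, overhead growing with n log p) and are not taken from the slides; they only need the parallelizable work to grow faster than the other terms.

```python
import math


def sigma(n: int) -> float:
    """Hypothetical sequential time."""
    return 1000 + n


def phi(n: int) -> float:
    """Hypothetical parallelizable time; grows faster than sigma and kappa."""
    return n * n / 100


def kappa(n: int, p: int) -> float:
    """Hypothetical parallel overhead."""
    return n * math.log2(p)


def speedup_bound(n: int, p: int) -> float:
    """General speedup bound evaluated with the models above."""
    return (sigma(n) + phi(n)) / (sigma(n) + phi(n) / p + kappa(n, p))


# Fixed processor count, growing problem size.
for n in (1_000, 10_000, 100_000):
    print(n, round(speedup_bound(n, 16), 2))
```

With 16 processors held fixed, the bound rises toward 16 as n grows, which is the behavior the slide describes.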
Illustration of the Amdahl Effect
[Figure: speedup versus number of processors for n = 1,000, n = 10,000, and n = 100,000, with linear speedup for reference]
Using Amdahl’s Law
Program executes in 5 seconds
Profile reveals 80% of time spent in function alpha,
which we can execute in parallel
What would be maximum speedup on 2 processors?
$$\psi \le \frac{1}{0.2 + (1 - 0.2)/2} = \frac{1}{0.6} \approx 1.67$$
New execution time ≥ 5 sec / 1.67 = 3 seconds
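The same calculation as a short sketch, extended to a few more processor counts. The function name `amdahl_max_speedup` is illustrative; f = 0.2 is the sequential fraction implied by 80% of the time being parallelizable.

```python
def amdahl_max_speedup(f: float, p: int) -> float:
    """Amdahl's Law: maximum speedup with sequential fraction f on p processors."""
    return 1.0 / (f + (1.0 - f) / p)


f = 0.2  # 80% of the time is spent in the parallelizable function
for p in (2, 4, 8, 16):
    bound = amdahl_max_speedup(f, p)
    # 5.0 is the original execution time in seconds.
    print(p, round(bound, 2), round(5.0 / bound, 2))

# No processor count can push the speedup past 1 / f.
print("limit:", 1.0 / f)
```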
The Karp-Flatt Metric
Suppose we benchmark a parallel program and get
these speedup figures
Processors  Speedup  Efficiency
2           1.5      75%
3           1.8      60%
4           2.0      50%
Why is efficiency dropping?
How much speedup could we expect on 8 processors?
Deriving the Karp-Flatt Metric
$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$
The denominator represents parallel execution time
One processor does sequential code; others idle
All processors incur overhead time
“Wasted time” = (p-1)σ(n) + pκ(n,p)
Experimentally determined serial fraction e = “wasted
time” divided by (p-1) times the sequential execution time σ(n) + φ(n)
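The step from this definition to a formula in terms of measured speedup can be filled in as follows. The shorthand T_1 = σ(n) + φ(n) for sequential time and T_p = σ(n) + φ(n)/p + κ(n,p) for parallel time is introduced here for illustration and does not appear in the slides; the measured speedup is taken to be ψ = T_1 / T_p.

```latex
\begin{aligned}
\text{wasted time} &= p\,T_p - T_1 = (p-1)\,\sigma(n) + p\,\kappa(n,p) \\
e &= \frac{p\,T_p - T_1}{(p-1)\,T_1}
   = \frac{T_p/T_1 - 1/p}{1 - 1/p}
   = \frac{1/\psi - 1/p}{1 - 1/p}
\end{aligned}
```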
Karp-Flatt Metric
$$e = \frac{1/\psi - 1/p}{1 - 1/p}$$
The experimentally determined serial fraction is a
function of speedup and the number of processors
We can use e to determine whether efficiency
decreases are due to
Sequential component of computation
Increases in overhead
How to Interpret “e”
If “e” is constant as the number of processors
increases, then speedup is constrained by the
sequential component of the computation
If “e” is increasing as the number of processors
increases, then speedup is constrained by
parallel overhead, such as
Thread creation/termination time
Contention for shared data structures
Cache-related inefficiencies
Often a combination of the two factors
Going Back to Our Example
Processors  Speedup  Efficiency  e
2           1.5      75%         0.33
3           1.8      60%         0.33
4           2.0      50%         0.33
In this case, speedup is constrained by the relatively
large amount of time spent in sequential code
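The e column can be reproduced from the speedup figures with a few lines. This is a minimal sketch; the function name `karp_flatt` is illustrative.

```python
def karp_flatt(speedup: float, p: int) -> float:
    """Experimentally determined serial fraction e for a measured speedup on p processors."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


# (processors, measured speedup) pairs from the benchmark table.
benchmark = [(2, 1.5), (3, 1.8), (4, 2.0)]
for p, s in benchmark:
    print(p, s, f"{s / p:.0%}", round(karp_flatt(s, p), 2))
```

Every row yields e of about 0.33, so the falling efficiency is explained by a fixed sequential share rather than by growing overhead.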
Example: Rectangle Rule Program
Processors  Speedup  Efficiency  e
2           1.87     93%         0.070
3           2.60     87%         0.078
4           3.16     79%         0.089
Benchmark data from an OpenMP program computing π using
the rectangle rule
We can predict speedup on 6 processors
Extrapolate e to be 0.11
Speedup would be 3.87
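The slides do not say how e was extrapolated. One plausible reading, sketched here as an assumption, is a least-squares straight line through the three measured values; the helper `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


processors = [2, 3, 4]
e_values = [0.070, 0.078, 0.089]  # e column of the benchmark table

slope, intercept = fit_line(processors, e_values)
e_at_6 = slope * 6 + intercept
print(f"{e_at_6:.4f}")
```

Under this assumption the fitted value at p = 6 is about 0.1075, consistent with the slide's rounded figure of 0.11.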
Speedup Prediction Formula
$$e = \frac{1/\psi - 1/p}{1 - 1/p} \quad\Longrightarrow\quad \psi = \frac{p}{e(p-1) + 1}$$
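A minimal sketch of the prediction formula, applied to two cases from earlier slides: the rectangle rule program (e extrapolated to 0.11 on 6 processors) and the first benchmark, where e stayed near 1/3 and the open question was the speedup on 8 processors. The function name `predict_speedup` is illustrative, and the second prediction assumes e remains constant at 8 processors.

```python
def predict_speedup(e: float, p: int) -> float:
    """Speedup implied by serial fraction e on p processors: p / (e * (p - 1) + 1)."""
    return p / (e * (p - 1) + 1)


# Rectangle rule program: e extrapolated to 0.11 on 6 processors.
print(round(predict_speedup(0.11, 6), 2))   # 3.87

# First benchmark: assuming e stays at 1/3, prediction for 8 processors.
print(round(predict_speedup(1 / 3, 8), 2))  # 2.4
```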
Case Study
We benchmark a sequential program and find it
spends 85% of its time in functions we believe we
can make parallel
We make these functions multithreaded and execute
the program on a dual-core system
The parallel program achieves a speedup of 1.67 on
2 processors
If we can get access to a quad-core system, what
kind of speedup should we expect?
Prediction Based on Amdahl’s Law
$$\psi \le \frac{1}{0.15 + (1 - 0.15)/4} \approx 2.76$$
Prediction Based on Karp-Flatt Metric
When p = 2, e = 0.20
We know 0.15 of e is sequential component
Rest of e (0.05) is parallel overhead
If parallel overhead increases linearly with number of
processors (0.05 for each processor added), then it will be 0.15 when p = 4
We predict when p = 4, e = 0.30
Hence when p = 4, we predict speedup of 2.11
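The steps of this prediction can be sketched in code. The function names are illustrative, and the linear overhead model (overhead proportional to the number of processors added beyond the first) is the assumption stated in the case study, not a measured fact.

```python
def karp_flatt(speedup: float, p: int) -> float:
    """Experimentally determined serial fraction e."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


def predict_speedup(e: float, p: int) -> float:
    """Speedup implied by serial fraction e on p processors."""
    return p / (e * (p - 1) + 1)


sequential_fraction = 0.15               # 85% of the time is parallelizable
e2 = karp_flatt(1.67, 2)                 # measured on the dual-core system
overhead2 = e2 - sequential_fraction     # share of e attributed to overhead

# Assumed model: overhead scales with the number of added processors (p - 1).
overhead4 = overhead2 * (4 - 1) / (2 - 1)
e4 = sequential_fraction + overhead4

print(round(e2, 2), round(overhead2, 2), round(e4, 2))
print(round(predict_speedup(e4, 4), 2))
```

Without intermediate rounding the prediction comes out near 2.13; the figure of 2.11 follows from rounding e to 0.20 before extrapolating. Either way the estimate is well below the 2.76 given by Amdahl's Law.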
Superlinear Speedup
According to our general speedup formula, the
maximum speedup a program can achieve on p
processors is p
Superlinear speedup is the situation where
speedup is greater than the number of processors
used
It means the computational rate of the processors is
faster when the parallel program is executing
Superlinear speedup usually occurs because the
cache hit rate of the parallel program is higher
References
Michael J. Quinn, Parallel Programming in C with MPI
and OpenMP, McGraw-Hill (2004).