COMPUTER ORGANIZATION AND DESIGN
5th Edition
The Hardware/Software Interface
Chapter 1
Computer Abstractions and Technology
Fall 2020
Soontae Kim
School of Computing
KAIST
§1.6 Performance
Defining Performance
◼ Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers x mph]
Chapter 1 — Computer Abstractions and Technology — 2
Response Time and Throughput
◼ Response time
◼ How long it takes to do a task
◼ Throughput
◼ Total work done per unit time
◼ e.g., tasks/transactions/… per hour
◼ How are response time and throughput affected by
◼ Replacing the processor with a faster version?
◼ Adding more processors?
◼ We’ll focus on response time for now…
Relative Performance
◼ Define Performance = 1/Execution Time
◼ “X is n times faster than Y”
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
◼ Example: time taken to run a program
◼ 10s on A, 15s on B
◼ $\text{Execution Time}_B / \text{Execution Time}_A = 15\,\text{s} / 10\,\text{s} = 1.5$
◼ So A is 1.5 times faster than B
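The relative-performance definition above can be sketched in a few lines of Python. The function name `relative_performance` is an illustrative choice, not something from the slides; the inputs are the 10 s and 15 s execution times from the example.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is defined as 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Slide example: the program takes 10 s on A and 15 s on B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")  # A is 1.5 times faster than B
```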
Measuring Execution Time
◼ Elapsed time
◼ Total response time, including all aspects
◼ Processing, I/O, OS overhead, idle time
◼ Determines system performance
◼ CPU time
◼ Time spent processing a given job
◼ Discounts I/O time, other jobs’ shares
◼ Comprises user CPU time and system CPU time
◼ Different programs are affected differently by CPU and system performance
CPU Clocking
◼ Operation of digital hardware governed by a constant-rate clock
[Figure: clock timing diagram; within each clock period, data transfer and computation are followed by a state update]
◼ Clock period: duration of a clock cycle
◼ e.g., 250 ps = 0.25 ns = 250×10⁻¹² s
◼ Clock frequency (rate): cycles per second
◼ e.g., 4.0 GHz = 4000 MHz = 4.0×10⁹ Hz
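Clock period and clock rate are reciprocals, so the two example values on this slide describe the same clock. A minimal sketch of the conversion (the function names are illustrative assumptions):

```python
import math


def clock_rate_hz(period_s: float) -> float:
    """Clock rate in Hz for a given clock period in seconds."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period in seconds for a given clock rate in Hz."""
    return 1.0 / rate_hz


period = 250e-12  # 250 ps, the slide's example period
rate = clock_rate_hz(period)
print(f"{rate / 1e9:.1f} GHz")                    # 4.0 GHz
print(f"{clock_period_s(4.0e9) * 1e12:.0f} ps")   # 250 ps
assert math.isclose(rate, 4.0e9)
```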
CPU Time
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
◼ Performance improved by
◼ Reducing number of clock cycles
◼ Increasing clock rate
◼ Hardware designer must often trade off clock rate against cycle count
◼ Example next slide
CPU Time Example
◼ Computer A: 2GHz clock, 10s CPU time
◼ Designing Computer B
◼ Aim for 6s CPU time
◼ Can do faster clock, but causes 1.2 × clock cycles
◼ How fast must Computer B clock be?
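$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$
$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$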
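The same calculation can be replayed step by step in Python. This is a sketch using only the numbers from the example; the function and variable names are illustrative.

```python
import math


def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


# Computer A: 2 GHz clock, 10 s CPU time.
cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)  # 20 x 10^9

# Computer B needs 1.2x as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a                                    # 24 x 10^9
rate_b = cycles_b / 6.0                                      # cycles per second

print(f"Clock rate B = {rate_b / 1e9:.1f} GHz")  # Clock rate B = 4.0 GHz
assert math.isclose(rate_b, 4e9)
```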
Instruction Count and CPI
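$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$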
◼ Instruction Count for a program
◼ Determined by program, ISA and compiler
◼ Average cycles per instruction
◼ Determined by CPU hardware
◼ If different instructions have different CPI
◼ Average CPI affected by instruction mix
CPI Example
◼ Computer A: Cycle Time = 250ps, CPI = 2.0
◼ Computer B: Cycle Time = 500ps, CPI = 1.2
◼ Same ISA
◼ Which is faster, and by how much?
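$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
A is faster…
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
…by this much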
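As a quick numerical check of this comparison, the sketch below plugs the two machines into CPU Time = Instruction Count × CPI × Cycle Time. The instruction count `I` is a hypothetical placeholder value (both machines share the same ISA, so it cancels in the ratio); the function name is also an assumption.

```python
import math


def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


I = 1_000_000  # hypothetical instruction count; it cancels in the ratio

time_a = cpu_time(I, cpi=2.0, cycle_time_s=250e-12)  # I x 500 ps
time_b = cpu_time(I, cpi=1.2, cycle_time_s=500e-12)  # I x 600 ps

ratio = time_b / time_a
print(f"A is {ratio:.2f} times faster than B")  # A is 1.20 times faster than B
assert math.isclose(ratio, 1.2)
```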
CPI in More Detail
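◼ If different instruction classes take different numbers of cycles
$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$
◼ Weighted average CPI
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$
◼ $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$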
CPI Example
◼ Alternative compiled code sequences using instructions in classes A, B, C

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

◼ Sequence 1: IC = 5
◼ Clock Cycles = 2×1 + 1×2 + 2×3 = 10
◼ Avg. CPI = 10/5 = 2.0
◼ Sequence 2: IC = 6
◼ Clock Cycles = 4×1 + 1×2 + 1×3 = 9
◼ Avg. CPI = 9/6 = 1.5
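The two sequences can be compared with a short sketch of the weighted-CPI calculation. The per-class CPI values and instruction counts come from the table; the dictionary layout and function names are illustrative assumptions.

```python
# CPI for each instruction class, from the table.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict[str, int]) -> int:
    """Clock cycles = sum over classes of CPI_i x instruction count_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict[str, int]) -> float:
    """Weighted average CPI = clock cycles / total instruction count."""
    return total_cycles(counts) / sum(counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, "IC =", sum(seq.values()),
          "cycles =", total_cycles(seq),
          "avg CPI =", average_cpi(seq))
# Sequence 1 IC = 5 cycles = 10 avg CPI = 2.0
# Sequence 2 IC = 6 cycles = 9 avg CPI = 1.5
```

Sequence 2 executes more instructions but fewer cycles, which is why instruction count alone is not a reliable predictor of execution time.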
Performance Summary
The BIG Picture
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
◼ Performance depends on
◼ Algorithm: affects IC, possibly CPI
◼ Programming language: affects IC, CPI
◼ Compiler: affects IC, CPI
◼ Instruction set architecture: affects IC, CPI, and clock cycle time (Tc)
§1.7 The Power Wall
Power Trends
◼ In CMOS IC technology
$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$
◼ Power: ×30; Voltage: 5V → 1V; Frequency: ×1000
Reducing Power
◼ Suppose a new CPU has
◼ 85% of capacitive load of old CPU
◼ 15% voltage and 15% frequency reduction
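$$\frac{P_\text{new}}{P_\text{old}} = \frac{C_\text{old} \times 0.85 \times (V_\text{old} \times 0.85)^2 \times F_\text{old} \times 0.85}{C_\text{old} \times V_\text{old}^2 \times F_\text{old}} = 0.85^4 = 0.52$$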
◼ The power wall
◼ We can’t reduce voltage further
◼ We can’t remove more heat
◼ How else can we improve performance?
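The scaling example on this slide can be checked numerically. The sketch below assumes the CMOS relation Power = Capacitive load × Voltage² × Frequency and expresses the new CPU as scale factors relative to the old one; the function name and parameters are illustrative.

```python
import math


def power_ratio(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """P_new / P_old when load, voltage and frequency are each scaled.

    The old values cancel, leaving cap_scale * volt_scale**2 * freq_scale.
    """
    return cap_scale * volt_scale**2 * freq_scale


# 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
ratio = power_ratio(cap_scale=0.85, volt_scale=0.85, freq_scale=0.85)
print(f"P_new / P_old = {ratio:.2f}")  # P_new / P_old = 0.52
assert math.isclose(ratio, 0.85**4)
```

Voltage enters squared, which is why lowering it was historically the most effective lever, and why hitting a voltage floor matters so much.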
§1.8 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance
◼ Constrained by power, instruction-level parallelism, memory latency
Multiprocessors
◼ Multicore microprocessors
◼ More than one processor per chip
◼ Requires explicitly parallel programming
◼ Compare with instruction level parallelism
◼ Hardware executes multiple instructions at once
◼ Hidden from the programmer
◼ Explicitly parallel programming is hard to do
◼ Programming for performance
◼ Load balancing
◼ Optimizing communication and synchronization
SPEC CPU Benchmark
◼ Programs used to measure performance
◼ Supposedly typical of actual workload
◼ Standard Performance Evaluation Corp (SPEC)
◼ Develops benchmarks for CPU, I/O, Web, …
◼ SPEC CPU2006
◼ Elapsed time to execute a selection of programs
◼ Negligible I/O, so focuses on CPU performance
◼ Normalize relative to reference machine
◼ Summarize as geometric mean of performance ratios
◼ CINT2006 (integer) and CFP2006 (floating-point)
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
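The geometric-mean summary can be sketched as follows. The ratios used here are hypothetical values chosen to make the arithmetic easy to follow; they are not SPEC results, and the function name is an assumption.

```python
import math


def geometric_mean(ratios: list[float]) -> float:
    """n-th root of the product of n execution time ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical ratios (reference time / measured time) for two programs.
example_ratios = [2.0, 8.0]
print(geometric_mean(example_ratios))  # 4.0
```

One reason to summarize with a geometric rather than arithmetic mean is that the ranking of two machines then does not depend on which machine is used as the reference.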
CINT2006 for Intel Core i7 920
§1.10 Fallacies and Pitfalls
Pitfall: Amdahl’s Law
◼ Improving an aspect of a computer and expecting a proportional improvement in overall performance
$$T_\text{improved} = \frac{T_\text{affected}}{\text{improvement factor}} + T_\text{unaffected}$$
◼ Example: multiply accounts for 80s/100s
◼ How much improvement in multiply performance to get 5× overall?
$$20 = \frac{80}{n} + 20$$
◼ Can’t be done!
◼ Corollary: make the common case fast
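A small sketch of Amdahl's Law makes the limit in the multiply example visible. It uses the 80 s affected and 20 s unaffected times from the example; the function names and the chosen improvement factors are illustrative.

```python
def improved_time(t_affected: float, t_unaffected: float,
                  improvement_factor: float) -> float:
    """T_improved = T_affected / improvement factor + T_unaffected."""
    if improvement_factor <= 0:
        raise ValueError("improvement factor must be positive")
    return t_affected / improvement_factor + t_unaffected


def overall_speedup(t_affected: float, t_unaffected: float,
                    improvement_factor: float) -> float:
    """Original total time divided by improved total time."""
    original = t_affected + t_unaffected
    return original / improved_time(t_affected, t_unaffected, improvement_factor)


# Multiply accounts for 80 s of a 100 s run.
for n in (2, 10, 100, 1_000_000):
    speedup = overall_speedup(80.0, 20.0, n)
    print(f"multiply {n:>9,}x faster -> overall {speedup:.3f}x")
```

However large the improvement factor, the overall speedup stays below 100/20 = 5, because the 20 s unaffected portion never shrinks.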
Fallacy: Low Power at Idle
◼ Look back at i7 power benchmark
◼ At 100% load: 258W
◼ At 50% load: 170W (66%)
◼ At 10% load: 121W (47%)
◼ Google data center
◼ Mostly operates at 10% – 50% load
◼ At 100% load less than 1% of the time
◼ Consider designing processors to make power proportional to load
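The percentages on this slide are each measurement divided by the 258 W peak. The sketch below reproduces them and, for contrast, shows what a hypothetical perfectly load-proportional machine would draw; that ideal is an assumption added for illustration, not a measurement.

```python
PEAK_WATTS = 258.0  # power at 100% load, from the slide

# (load fraction, measured watts) pairs quoted on the slide.
MEASUREMENTS = [(1.0, 258.0), (0.5, 170.0), (0.1, 121.0)]


def fraction_of_peak(watts: float, peak_watts: float = PEAK_WATTS) -> float:
    """Measured power as a fraction of peak power."""
    return watts / peak_watts


def proportional_watts(load: float, peak_watts: float = PEAK_WATTS) -> float:
    """Power a hypothetical, perfectly load-proportional machine would draw."""
    return load * peak_watts


for load, watts in MEASUREMENTS:
    print(
        f"load {load:4.0%}: measured {watts:5.0f} W "
        f"({fraction_of_peak(watts):.0%} of peak), "
        f"proportional ideal {proportional_watts(load):5.1f} W"
    )
```

At 10% load the measured 121 W is nearly five times the 25.8 W a proportional design would use, which is the gap the last bullet asks designers to close.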
Pitfall: MIPS as a Performance Metric
◼ MIPS: Millions of Instructions Per Second
◼ Doesn’t account for
◼ Differences in ISAs between computers
◼ Differences in complexity between instructions
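$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$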
◼ CPI varies between programs on a given CPU
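The sketch below evaluates both forms of the MIPS formula and shows why the metric is program-dependent. The 4 GHz clock, the instruction count, and the two CPI values are hypothetical numbers chosen for illustration.

```python
def mips_from_rate(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def mips_from_counts(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


CLOCK_RATE_HZ = 4e9  # hypothetical 4 GHz CPU

# Two hypothetical programs on the same CPU with different average CPI.
print(mips_from_rate(CLOCK_RATE_HZ, cpi=1.0))  # 4000.0
print(mips_from_rate(CLOCK_RATE_HZ, cpi=2.0))  # 2000.0

# Consistency check: 10^9 instructions at CPI 2.0 take 0.5 s at 4 GHz.
instruction_count = 1e9
execution_time_s = instruction_count * 2.0 / CLOCK_RATE_HZ
print(mips_from_counts(instruction_count, execution_time_s))  # 2000.0
```

The same processor reports 4000 or 2000 MIPS depending only on the program's instruction mix, so a MIPS figure says little without knowing the workload.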
§1.9 Concluding Remarks
Concluding Remarks
◼ Cost/performance is improving
◼ Due to underlying technology development
◼ Hierarchical layers of abstraction
◼ In both hardware and software
◼ Instruction set architecture
◼ The hardware/software interface
◼ Execution time: the best performance measure
◼ Power is a limiting factor
◼ Use parallelism to improve performance