CHAPTER 1
INTRODUCTION
• COMPUTER ARCHITECTURE: DEFINITION
• SYSTEM COMPONENTS
• TECHNOLOGICAL FACTORS AND TRENDS
• PERFORMANCE METRICS AND EVALUATION
• QUANTITATIVE PRINCIPLES OF COMPUTER DESIGN
© Michel Dubois, Murali Annavaram, Per Stenström All rights reserved
WHAT IS COMPUTER ARCHITECTURE?
• OLD DEFINITION: INSTRUCTION SET ARCHITECTURE (ISA)
• TODAY’S DEFINITION IS MUCH BROADER: THE HARDWARE
ORGANIZATION OF COMPUTERS (HOW TO BUILD A
COMPUTER), WHICH INCLUDES THE ISA
• LAYERED VIEW OF COMPUTER SYSTEMS
• ROLE OF THE COMPUTER ARCHITECT:
• TO MAKE DESIGN TRADE-OFFS ACROSS THE HW/SW INTERFACE TO
MEET FUNCTIONAL, PERFORMANCE AND COST REQUIREMENTS
COMPUTER ORGANIZATION
• MODERN PC ARCHITECTURE
COMPUTER ORGANIZATION
• GENERIC HIGH-END PARALLEL SYSTEM:
• MAIN COMPONENTS: PROCESSORS, MEMORY SYSTEMS, I/O,
AND NETWORKS
PROCESSOR ARCHITECTURE
• HISTORICALLY THE CLOCK RATES OF MICROPROCESSORS HAVE
INCREASED EXPONENTIALLY
• [Figure: highest clock rate of Intel processors in each year from 1990 to 2008]
• DUE TO PROCESS IMPROVEMENTS
• DEEPER PIPELINE
• CIRCUIT DESIGN TECHNIQUES
THIS HISTORICAL TREND HAS SUBSIDED OVER THE PAST 10 YEARS
IF IT HAD KEPT UP, TODAY’S CLOCK RATES WOULD BE MORE THAN
30GHz!!!!!
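A rough sanity check of that claim (a sketch in Python, using assumed endpoints of roughly 33 MHz in 1990 and 3.8 GHz in 2008 for the fastest Intel parts; the exact figures are illustrative assumptions):

    # Hypothetical extrapolation of the 1990-2008 clock-rate trend.
    f_1990, f_2008 = 33e6, 3.8e9          # assumed endpoints (Hz)
    cgr = (f_2008 / f_1990) ** (1 / 18)   # compound growth rate: ~1.30x/year
    f_2018 = f_2008 * cgr ** 10           # 10 more years at the same rate
    print(f"CGR = {cgr:.2f}x/year, 2018 clock = {f_2018 / 1e9:.0f} GHz")
    # -> roughly 52 GHz, i.e., well above 30 GHz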
PROCESSOR ARCHITECTURE
• PIPELINING (I.E., ARCHITECTURE) AND CIRCUIT TECHNIQUES HAVE
GREATLY CONTRIBUTED TO THE DRAMATIC RISE OF THE CLOCK RATE
• THE 1.19 CGR CURVE CORRESPONDS TO PROCESS IMPROVEMENTS ALONE
• REST IS DUE TO ARCHITECTURE AND CIRCUITS
• ADDITIONALLY COMPUTER ARCHITECTS TAKE ADVANTAGE OF THE
GROWING NUMBER OF CIRCUITS
[Figure: (a) feature size and (b) transistor count per chip over time. A new process generation arrives every 2 years; each generation reduces the feature size by 30% (halving it every 5 years); the number of transistors per chip doubles every 2 years, reaching 1B in 2008 and 100B in 2021.]
• A SANDBOX TO PLAY IN SO TO SPEAK
• HOW DO WE USE 100B TRANSISTORS????
CAN THIS TREND CONTINUE?
MEMORY SYSTEMS
• MAIN MEMORY SPEED IS NOT GROWING AS FAST AS
PROCESSORS’ SPEED.
• GROWING GAP BETWEEN PROCESSOR AND MEMORY SPEED (THE SO-
CALLED “MEMORY WALL”)
• ONE WANTS TO DESIGN A MEMORY SYSTEM THAT’S BIG,
FAST AND CHEAP
• THE APPROACH IS TO USE A MULTI-LEVEL HIERARCHY OF MEMORIES
• MEMORY HIERARCHIES RELY ON PRINCIPLE OF LOCALITY
• EFFICIENT MANAGEMENT OF THE MEMORY HIERARCHY IS KEY
• COST AND SIZE OF MEMORIES IN A BASIC PC (2008)
Memory               Size     Marginal Cost   Cost per MB   Access Time
L2 Cache (on chip)   1 MB     $20/MB          $20           5 ns
Main Memory          1 GB     $50/GB          5 c           200 ns
Disk                 500 GB   $100/500GB      0.02 c        5 ms
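Why the hierarchy delivers "big, fast and cheap": a minimal sketch computing the average access time of the hierarchy above, assuming illustrative hit rates (the hit rates are not in the table):

    # Average access time of the PC hierarchy above, with assumed hit rates.
    t_cache, t_mem, t_disk = 5e-9, 200e-9, 5e-3   # access times from the table
    h_cache, h_mem = 0.95, 0.99999                # assumed hit rates (illustrative)
    amat = (h_cache * t_cache
            + (1 - h_cache) * (h_mem * t_mem + (1 - h_mem) * t_disk))
    print(f"average access time = {amat * 1e9:.1f} ns")  # ~17 ns, near cache speed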
MEMORY WALL?? WHICH MEMORY WALL??
HISTORICALLY, MICROPROCESSOR SPEED HAS INCREASED BY 50% A YEAR
• WHILE DRAM PERFORMANCE IMPROVED BY 7% A YEAR
• ALTHOUGH DRAM DENSITY KEEPS INCREASING BY 4X EVERY 3 YEARS
• THIS CREATED THE PERCEPTION THAT THIS PROBLEM WOULD LAST FOREVER
• HOWEVER TRENDS HAVE CHANGED DRAMATICALLY IN THE PAST 6 YEARS
• THE “MEMORY WALL” (RELATIVE PERFORMANCE OF PROCESSORS VS DRAM)
• DRAM SPEED: 1.07 CGR (COMPOUND GROWTH RATE)
• MEMORY WALL = memory_cycle / processor_cycle
• IN 1990 THE WALL WAS ABOUT 4 (150 ns DRAM VS. A 25 MHz CLOCK)
• IT GREW EXPONENTIALLY TO ABOUT 200 BY 2002
• IT HAS TAPERED OFF SINCE THEN
• ALTHOUGH STILL A BIG PROBLEM, THE MEMORY WALL STOPPED GROWING
AROUND 2002.
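These numbers can be checked from the growth rates quoted on this slide (a sketch in Python; 1.50 and 1.07 are the CGRs quoted above):

    # Memory wall = memory_cycle / processor_cycle, grown at the quoted rates.
    wall_1990 = 150 / 40                  # 150 ns DRAM vs a 40 ns (25 MHz) cycle: ~4
    ratio = 1.50 / 1.07                   # the wall widens ~1.40x per year
    wall_2002 = wall_1990 * ratio ** 12   # 12 years of divergence, 1990 -> 2002
    print(f"1990: {wall_1990:.1f}, 2002: {wall_2002:.0f}")  # ~3.8 and ~220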
WITH THE ADVENT OF MULTICORE MICROARCHITECTURES THE MEMORY PROBLEM HAS
SHIFTED FROM LATENCY TO BANDWIDTH
DISK
• HISTORICALLY, DISK PERFORMANCE & DENSITY IMPROVED BY
40% PER YEAR
DISK TIME = ACCESS TIME + TRANSFER TIME
• HISTORICALLY TRANSFER TIMES HAVE DOMINATED
• BUT TODAY TRANSFER AND ACCESS TIMES ARE OF THE SAME
ORDER
• IN FUTURE ACCESS TIME WILL DOMINATE (MUCH SLOWER
CURVE)
NOTE: ALL THESE TIMES ARE STILL ON THE ORDER OF MILLISECONDS, SO THE PROCESSOR MUST SWITCH CONTEXT RATHER THAN WAIT FOR THE DISK
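A sketch of the disk-time equation, using the 5 ms access time from the earlier table and an assumed 100 MB/s transfer rate (the rate is an assumption for illustration):

    # Disk time = access time + transfer time, for several request sizes.
    access = 5e-3                    # seconds, from the memory-systems table
    bandwidth = 100e6                # bytes/second -- assumed transfer rate
    for size in (4e3, 1e6, 100e6):   # 4 KB, 1 MB, 100 MB requests
        t = access + size / bandwidth
        print(f"{size / 1e3:>9.0f} KB: {t * 1e3:8.1f} ms")
    # small requests are dominated by access time, large ones by transfer time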
NETWORKS
• NETWORKS ARE PRESENT AT MANY LEVELS
• ON-CHIP INTERCONNECTS forward values between pipeline
stages and among execution units, and connect cores to
shared cache banks.
• SYSTEM INTERCONNECTS connect processors (CMPs) to memory
& I/O
• I/O INTERCONNECTS (usually a bus, e.g., PCI) connect
various I/O devices to the system bus.
• INTER-SYSTEM INTERCONNECTS connect separate systems
(separate chassis or boxes) and include
• SANs (System-Area networks --connecting systems at very short distances),
• LAN (Local Area Networks --connecting systems within an organization or a
building),
• WAN (Wide Area Networks --connecting multiple LANs at long distances).
• INTERNET. Most computing systems are connected to the
Internet, which is a global, worldwide interconnect.
PARALLELISM IN ARCHITECTURES
• THE MOST SUCCESSFUL MICROARCHITECTURE HAS BEEN
THE SCALAR PROCESSOR
• A TYPICAL SCALAR INSTRUCTION OPERATES ON SCALAR OPERANDS
ADD O1,O2,O3 /O2+O3=>O1
• EXECUTE MULTIPLE SCALAR INSTRUCTIONS AT A TIME
• PIPELINING
• SUPERSCALAR
• SUPERPIPELINING
• TAKES ADVANTAGE OF ILP, I.E., INSTRUCTION-LEVEL PARALLELISM, THE
PARALLELISM EXPOSED IN SINGLE THREAD OR SINGLE PROCESS EXECUTION
• CMPs (CHIP MULTIPROCESSORS) EXPLOIT PARALLELISM
EXPOSED BY DIFFERENT THREADS RUNNING IN PARALLEL
• THREAD LEVEL PARALLELISM OR TLP
• CAN BE SEEN AS MULTIPLE SCALAR PROCESSORS RUNNING IN
PARALLEL
PARALLELISM IN ARCHITECTURES
• VECTOR AND ARRAY PROCESSORS
• A TYPICAL VECTOR INSTRUCTION EXECUTES DIRECTLY ON VECTOR
OPERANDS
VADD VO1,VO2,VO3 /VO2+VO3=>VO1
• VOk IS A VECTOR OF SCALAR COMPONENTS
• EQUIVALENT TO COMPUTING
– VO2[i]+VO3[i]=>VO1[i], i=0,..,N
• VECTOR INSTRUCTIONS ARE EXECUTED BY PIPELINES OR
PARALLEL ARRAYS
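A sketch of the two execution models in Python (NumPy's elementwise add stands in for VADD; this is an analogy, not the hardware):

    import numpy as np

    # Scalar model: one ADD instruction per element pair.
    def scalar_add(o2, o3):
        o1 = [0] * len(o2)
        for i in range(len(o2)):   # VO2[i] + VO3[i] => VO1[i]
            o1[i] = o2[i] + o3[i]
        return o1

    # Vector model: one VADD over whole vector operands.
    vo2, vo3 = np.arange(8), np.ones(8)
    vo1 = vo2 + vo3                # VADD VO1,VO2,VO3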
POWER
• TOTAL POWER: DYNAMIC + STATIC(LEAKAGE)
Pdynamic = α·C·V²·f
Pstatic = V·Isub ≈ V·e^(−K·Vt/T)
• DYNAMIC POWER FAVORS PARALLEL PROCESSING OVER HIGHER
CLOCK RATE
• DYNAMIC POWER IS ROUGHLY PROPORTIONAL TO f³, SINCE THE SUPPLY VOLTAGE MUST SCALE ROUGHLY WITH THE CLOCK FREQUENCY
• TAKE A UNIPROCESSOR AND REPLICATE IT 4 TIMES: 4X SPEEDUP & 4X POWER
• TAKE A UNIPROCESSOR AND CLOCK IT 4 TIMES FASTER: 4X SPEEDUP BUT 64X DYNAMIC POWER! (SEE THE SKETCH AT THE END OF THIS SLIDE)
• STATIC POWER MATTERS AT ANY CLOCK RATE
• BECAUSE CIRCUITS LEAK WHATEVER THE FREQUENCY IS
• POWER/ENERGY ARE CRITICAL PROBLEMS
• POWER (THE IMMEDIATE RATE OF ENERGY DISSIPATION) MUST BE REMOVED AS HEAT
• OTHERWISE TEMPERATURE GOES UP, WHICH AFFECTS PERFORMANCE AND CORRECTNESS AND MAY EVEN DESTROY THE CIRCUIT, SHORT TERM OR LONG TERM
• EFFECT ON THE SUPPLY OF POWER TO THE CHIP
• ENERGY (DEPENDS ON POWER AND SPEED)
• COSTLY; GLOBAL PROBLEM
• BATTERY OPERATED DEVICES
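The 4X-vs-64X comparison above can be checked against the dynamic-power equation; a sketch assuming, as the slide does, that supply voltage scales with frequency (the constants are arbitrary):

    # P_dynamic = alpha * C * V^2 * f, with V assumed proportional to f.
    def p_dynamic(alpha, c, v, f):
        return alpha * c * v ** 2 * f

    base = p_dynamic(alpha=0.5, c=1e-9, v=1.0, f=1e9)   # arbitrary baseline
    replicated = 4 * base                               # 4 cores, same clock
    faster = p_dynamic(0.5, 1e-9, v=4.0, f=4e9)         # one core, 4x V and 4x f
    print(replicated / base, faster / base)             # -> 4.0 and 64.0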
RELIABILITY
• TRANSIENT FAILURES (OR SOFT ERRORS)
• CHARGE Q = C X V
• IF C AND V DECREASE THEN IT IS EASIER TO FLIP A BIT
• SOURCES ARE COSMIC RAYS AND ALPHA PARTICLES RADIATING FROM
THE PACKAGING MATERIAL
• DEVICE IS STILL OPERATIONAL BUT VALUE HAS BEEN CORRUPTED
• SHOULD DETECT/CORRECT AND CONTINUE EXECUTION
• ALSO: ELECTRICAL NOISE CAUSES SIMILAR FAILURES
• INTERMITTENT/TEMPORARY FAILURES
• LAST LONGER
• DUE TO
• TEMPORARY: ENVIRONMENTAL VARIATIONS (EG, TEMPERATURE)
• INTERMITTENT: AGING
• SHOULD TRY TO CONTINUE EXECUTION
• PERMANENT FAILURES
• MEANS THAT THE DEVICE WILL NEVER FUNCTION AGAIN
• MUST BE ISOLATED AND REPLACED BY SPARE
PROCESS VARIATIONS INCREASE THE PROBABILITY OF FAILURES
WIRE DELAYS
• WIRE DELAYS DON’T SCALE LIKE LOGIC DELAYS
• PROCESSOR STRUCTURES MUST EXPAND TO SUPPORT MORE
INSTRUCTIONS
• THUS WIRE DELAYS DOMINATE THE CYCLE TIME; WIRES MUST BE
KEPT SHORT AND COMMUNICATION LOCAL
DESIGN COMPLEXITY
• PROCESSORS ARE BECOMING SO COMPLEX THAT A LARGE FRACTION
OF THE DEVELOPMENT OF A PROCESSOR OR SYSTEM IS DEDICATED
TO VERIFICATION
• CHIP DENSITY IS INCREASING MUCH FASTER THAN THE
PRODUCTIVITY OF VERIFICATION ENGINEERS, DESPITE NEW TOOLS
AND FASTER SYSTEMS
CMOS ENDPOINT
• CMOS IS RAPIDLY REACHING THE LIMITS OF MINIATURIZATION
• FEATURE SIZES WILL REACH ATOMIC DIMENSIONS IN LESS THAN 15 YEARS
• OPTIONS????
• QUANTUM COMPUTING
• NANOTECHNOLOGY
• ANALOG COMPUTING
PERFORMANCE REMAINS A CRITICAL DESIGN FACTOR
PERFORMANCE METRICS (MEASURE)
• METRIC #1: TIME TO COMPLETE A TASK (Texe): EXECUTION
TIME, RESPONSE TIME, LATENCY
• “X IS N TIMES FASTER THAN Y” MEANS Texe(Y)/Texe(X) = N
• THE MAJOR METRIC USED IN THIS COURSE
• METRIC #2: NUMBER OF TASKS PER DAY, HOUR, SEC, NS
• THE THROUGHPUT FOR X IS N TIMES HIGHER THAN Y IF
THROUGHPUT(X)/THROUGHPUT(Y) = N
• NOT THE SAME AS LATENCY (E.G., MULTIPROCESSORS RAISE THROUGHPUT WITHOUT REDUCING THE LATENCY OF A SINGLE TASK)
• EXAMPLES OF UNRELIABLE METRICS:
• MIPS: MILLIONS OF INSTRUCTIONS PER SECOND
• MFLOPS: MILLIONS OF FLOATING POINT OPERATIONS PER SECOND
EXECUTION TIME OF A PROGRAM IS THE ULTIMATE MEASURE OF PERFORMANCE
BENCHMARKING
WHICH PROGRAM TO CHOOSE?
• REAL PROGRAMS:
• PORTING PROBLEM; COMPLEXITY; NOT EASY TO UNDERSTAND THE CAUSE OF
RESULTS
• KERNELS
• A COMPUTATIONALLY INTENSE PIECE OF A REAL PROGRAM
• TOY BENCHMARKS (E.G. QUICKSORT, MATRIX MULTIPLY)
• SYNTHETIC BENCHMARKS (NOT REAL)
• BENCHMARK SUITES
• SPEC: STANDARD PERFORMANCE EVALUATION CORPORATION
• SCIENTIFIC/ENGINEERING/GENERAL PURPOSE
• INTEGER AND FLOATING POINT
• NEW SET EVERY FEW YEARS (1995, 1998, 2000, 2006)
• TPC BENCHMARKS:
• FOR COMMERCIAL SYSTEMS
• TPC-B, TPC-C, TPC-H, AND TPC-W
• EMBEDDED BENCHMARKS
• MEDIA BENCHMARKS
REPORTING PERFORMANCE FOR A SET OF PROGRAMS
LET Ti BE THE EXECUTION TIME OF PROGRAM i:
1. (WEIGHTED) ARITHMETIC MEAN OF EXECUTION TIMES:
AM = (1/N)·Σᵢ Tᵢ   OR   WAM = Σᵢ wᵢ·Tᵢ
THE PROBLEM HERE IS THAT THE PROGRAMS WITH LONGEST EXECUTION TIMES
DOMINATE THE RESULT
2. DEALING WITH SPEEDUPS
• SPEEDUP MEASURES THE ADVANTAGE OF A MACHINE OVER A
REFERENCE MACHINE FOR A PROGRAM i
Sᵢ = T_Ri / Tᵢ   (T_Ri IS THE EXECUTION TIME OF PROGRAM i ON THE REFERENCE MACHINE)
• ARITHMETIC MEAN OF SPEEDUPS
• HARMONIC MEAN
H = N / Σᵢ (1/Sᵢ) = N / Σᵢ (Tᵢ / T_Ri)
REPORTING PERFORMANCE FOR A SET OF PROGRAMS
• GEOMETRIC MEAN OF SPEEDUPS

G = (Πᵢ Sᵢ)^(1/N), i = 1,…,N
• MEAN SPEEDUP COMPARISONS BETWEEN TWO MACHINES ARE INDEPENDENT
OF THE REFERENCE MACHINE
• EASILY COMPOSABLE
• USED TO REPORT SPEC NUMBERS FOR INTEGER AND FLOATING POINT
             Program A   Program B   Arithmetic Mean   Speedup (ref 1)   Speedup (ref 2)
Machine 1    10 sec      100 sec     55 sec            91.8              10
Machine 2    1 sec       200 sec     100.5 sec         50.2              5.5
Reference 1  100 sec     10000 sec   5050 sec
Reference 2  100 sec     1000 sec    550 sec

Speedups:          Program A   Program B   Arithmetic   Harmonic   Geometric
Wrt Reference 1
  Machine 1        10          100         55           18.2       31.6
  Machine 2        100         50          75           66.7       70.7
Wrt Reference 2
  Machine 1        10          10          10           10         10
  Machine 2        100         5           52.5         9.5        22.4
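The table entries can be reproduced directly; a sketch computing the three means of Machine 1's speedups over Reference 1:

    from statistics import geometric_mean, harmonic_mean

    # Speedups of Machine 1 over Reference 1: S_i = T_Ri / T_i.
    t_ref1 = [100, 10000]   # Programs A and B on Reference 1 (sec)
    t_m1 = [10, 100]        # Programs A and B on Machine 1 (sec)
    s = [tr / t for tr, t in zip(t_ref1, t_m1)]   # [10.0, 100.0]

    print(sum(s) / len(s))      # arithmetic mean: 55.0
    print(harmonic_mean(s))     # harmonic mean:  ~18.2
    print(geometric_mean(s))    # geometric mean: ~31.6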
FUNDAMENTAL PERFORMANCE EQUATIONS
FOR CPUs:
Texe = IC X CPI X Tc
• IC: DEPENDS ON PROGRAM, COMPILER AND ISA
• CPI: DEPENDS ON INSTRUCTION MIX, ISA, AND IMPLEMENTATION
• Tc: DEPENDS ON IMPLEMENTATION COMPLEXITY AND TECHNOLOGY
CPI (CLOCKS PER INSTRUCTION) IS OFTEN USED INSTEAD OF EXECUTION TIME
• WHEN PROCESSOR EXECUTES MORE THAN ONE
INSTRUCTION PER CLOCK USE IPC (INSTRUCTIONS PER
CLOCK)
Texe = (IC X Tc)/IPC
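A sketch of both forms of the equation with assumed values (the IC, CPI, and clock figures are illustrative):

    # Texe = IC x CPI x Tc, equivalently (IC x Tc) / IPC with IPC = 1/CPI.
    ic = 1e9        # assumed: 1 billion dynamic instructions
    cpi = 1.5       # assumed: average clocks per instruction
    tc = 0.5e-9     # assumed: 2 GHz clock -> 0.5 ns cycle
    print(ic * cpi * tc)          # 0.75 s
    print(ic * tc / (1 / cpi))    # same 0.75 s via IPC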
AMDAHL’S LAW
[Diagram: the task's execution time split into fractions 1−F and F without E; applying the enhancement shrinks the F portion to F/S.]
• ENHANCEMENT E ACCELERATES A FRACTION F OF THE TASK
BY A FACTOR S
Texe(with E) = Texe(without E) × ((1 − F) + F/S)

Speedup(E) = Texe(without E) / Texe(with E) = 1 / ((1 − F) + F/S)
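A sketch of the law, which also previews the 1/(1 − F) bound of the next slide:

    # Amdahl's law: speedup when a fraction F is accelerated by a factor S.
    def amdahl_speedup(f, s):
        return 1 / ((1 - f) + f / s)

    for s in (2, 10, 100, 1e9):
        print(f"F=0.5, S={s:g}: speedup = {amdahl_speedup(0.5, s):.3f}")
    # the speedup approaches 1/(1-F) = 2 no matter how large S becomes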
LESSONS FROM AMDAHL’S LAW
• IMPROVEMENT IS LIMITED BY THE FRACTION OF THE
EXECUTION TIME THAT CANNOT BE ENHANCED
Speedup(E) < 1 / (1 − F)
• LAW OF DIMINISHING RETURNS: WITH F = 0.5, THE SPEEDUP CAN NEVER EXCEED 2, NO MATTER HOW LARGE S IS
• OPTIMIZE THE COMMON CASE
• EXECUTE THE RARE CASE IN SOFTWARE (E.G. EXCEPTIONS)
PARALLEL SPEEDUP
SP = T1 / TP = 1 / ((1 − F) + F/P) = P / (F + P·(1 − F)) < 1 / (1 − F)

(F IS THE FRACTION OF THE TASK THAT CAN BE PARALLELIZED ACROSS P PROCESSORS)
• EXAMPLE: WITH F = 0.95, SP < 1/(1 − F) = 20, NO MATTER HOW MANY PROCESSORS ARE USED
• NOTE: SPEEDUP CAN BE SUPERLINEAR. HOW CAN THAT
BE??
OVERALL NOT VERY HOPEFUL
GUSTAFSON’S LAW
• REDEFINE SPEEDUP
• THE RATIONALE IS THAT, AS MORE AND MORE CORES ARE INTEGRATED
ON CHIP OVER TIME, THE WORKLOADS ARE ALSO GROWING
• STARTS WITH THE EXECUTION TIME ON THE PARALLEL MACHINE WITH P
PROCESSORS:
TP = s + p
• s IS THE TIME TAKEN BY THE SERIAL CODE AND p IS THE TIME TAKEN BY
THE PARALLEL CODE
• THE EXECUTION TIME ON ONE PROCESSOR IS T1 = s + p·P
Let F = p/(s + p). Then SP = T1/TP = (s + p·P)/(s + p) = (1 − F) + F·P
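A sketch contrasting the two laws for F = 0.95:

    # Amdahl assumes a fixed workload; Gustafson lets the workload grow with P.
    def amdahl_speedup(f, p):
        return 1 / ((1 - f) + f / p)

    def gustafson_speedup(f, p):
        return (1 - f) + f * p

    for p in (4, 16, 64, 256):
        print(p, round(amdahl_speedup(0.95, p), 1),
              round(gustafson_speedup(0.95, p), 1))
    # Amdahl saturates below 1/(1-F) = 20; Gustafson keeps scaling with P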