Advanced Computer Architecture Overview

This document provides an overview of CS 203A Advanced Computer Architecture, an undergraduate course taught by Professor Laxmi Narayan Bhuyan. It introduces the instructor, syllabus, course details, and provides a brief introduction to key concepts in computer architecture including instruction set architecture, implementation, hardware trends, performance metrics, and benchmarking. The document aims to familiarize students with the foundational topics that will be covered throughout the course.

Uploaded by

Ankita Sharma

CS 203A Advanced Computer Architecture

Lecture 1

Instructor: L. N. Bhuyan

9/23/2004

Lec 1-2

Instructor Information
Laxmi Narayan Bhuyan
Office: Engg. II Room 351
E-mail: [email protected]
Tel: (951) 787-2244
Office Times: W, 3-4.30 pm

Course Syllabus
Introduction, Performance, Instruction Set, Pipelining: Appendix A
Instruction-level parallelism, Dynamic scheduling, Branch Prediction and Speculation: Ch 2 Text
Limits on ILP and Software Approaches: Ch 3
Multiprocessors, Thread-Level Parallelism: Ch 4
Memory Hierarchy: Ch 5
I/O Architectures: Papers
Text: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Fourth Edition
Prerequisite: CS 161

Course Details
Grading: Based on Curve
Test 1: 35 points
Test 2: 35 points
Project: 30 points

What is Computer Architecture

Computer Architecture =
Instruction Set Architecture + Organization + Hardware +

The Instruction Set: a Critical Interface

The actual programmer-visible instruction set

[Figure: the instruction set drawn as the interface layer between software (above) and hardware (below)]

Instruction-Set Processor Design

Architecture (ISA): programmer/compiler view
Functional appearance to its immediate user/system programmer
Opcodes, addressing modes, architected registers, IEEE floating point

Implementation (microarchitecture): processor designer view
Logical structure or organization that performs the architecture
Pipelining, functional units, caches, physical registers

Realization (chip): chip/system designer view
Physical structure that embodies the implementation
Gates, cells, transistors, wires

Hardware
Trends in Technology (Section 1.4 and Figure 1.9):
Feature size (10 microns in 1971 to 0.18 microns in 2001, to 0.045 microns in 2010!!!)
Minimum size of a transistor or a wire in either the x or y dimension
Logic designs
Packaging technology
Clock rate
Supply voltage
Moore's Law: number of transistors doubles every 1.5 years (due to smaller feature size and larger die size)
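
As a rough illustration of the doubling rule stated above, the growth factor after t years is 2^(t / 1.5). The function name and the sample time spans below are assumptions chosen for illustration; they are not from the lecture.

```python
def moores_law_growth(years: float, doubling_period_years: float = 1.5) -> float:
    """Growth factor in transistor count after `years`,
    assuming one doubling every `doubling_period_years` (1.5 per the slide)."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Sample spans are hypothetical; any non-negative number of years works.
    for years in (1.5, 3, 15, 30):
        print(f"{years:>4} years -> growth factor {moores_law_growth(years):,.0f}")
```

Under this rule, ten doubling periods (15 years) give a factor of 2^10 = 1024.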

Relationship Between the Three Aspects

Processors with an identical ISA may be very different in organization.
Processors with an identical ISA and nearly identical organization can still differ in hardware.
e.g. Pentium II and Celeron are nearly identical but differ in clock rates and memory systems

Architecture covers all three aspects.

Applications and Requirements

Scientific/numerical: weather prediction, molecular modeling
Need: large memory, floating-point arithmetic

Commercial: inventory, payroll, web serving, e-commerce

Need: integer arithmetic, high I/O

Embedded: automobile engines, microwave, PDAs

Need: low power, low cost, interrupt driven

Network computing: Web, Security, multimedia, games, entertainment

Need: high data bandwidth, application processing, graphics

Network bandwidth outpaces Moore's Law

TCP requirements rule of thumb: 1 GHz for 1 Gbps

[Figure: network bandwidth (Gbps) and Moore's Law processor speed (GHz) plotted against time, 1990 to 2010, on a logarithmic axis running from 0.01 to 1000]

Classes of Computers
High performance (supercomputers)
Supercomputers: Cray T-90, SGI Altix
Massively parallel computers: Cray T3E

Balanced cost/performance
Workstations: SPARCstations
Servers: SGI Origin, UltraSPARC
High-end PCs: Pentium quads

Low cost/power
Low-end PCs, laptops, PDAs: mobile Pentiums

Why Study Computer Architecture

Aren't they fast enough already?
Are they? Fast enough to do everything we will EVER want?
AI, protein sequencing, graphics

Is speed the only goal?
Power: heat dissipation + battery life
Cost
Reliability
Etc.

Answer #1: requirements are always changing

Why Study Computer Architecture

Answer #2: technology playing field is always changing

Annual technology improvements (approx.)

Logic: density +25%, speed +20%
DRAM (memory): density +60%, speed +4%
Disk: density +25%, speed +4%

Designs change even if requirements are fixed. But the requirements are not fixed.
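
To see why uneven annual rates reshape designs, the approximate rates listed on this slide can be compounded over several years. This is a minimal sketch: the dictionary layout, the function name, and the 10-year horizon are assumptions for illustration only.

```python
# Approximate annual improvement rates as given on the slide.
ANNUAL_RATES = {
    "logic density": 0.25,
    "logic speed": 0.20,
    "DRAM density": 0.60,
    "DRAM speed": 0.04,
    "disk density": 0.25,
    "disk speed": 0.04,
}


def compound(rate: float, years: int) -> float:
    """Cumulative improvement factor after `years` at a fixed annual `rate`."""
    return (1.0 + rate) ** years


if __name__ == "__main__":
    years = 10  # hypothetical horizon
    for name, rate in ANNUAL_RATES.items():
        print(f"{name:>14}: x{compound(rate, years):.2f} after {years} years")
```

Over a decade, DRAM density grows by roughly two orders of magnitude while DRAM speed grows by less than 2x, which is the kind of gap that makes caches attractive.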

Example of Changing Designs

Having, or not having caches
1970: 10K transistors on a single chip, DRAM faster than logic -> having a cache is bad
1990: 1M transistors, logic is faster than DRAM -> having a cache is good
2000: 600M transistors -> multiple-level caches and multiple CPUs -> multicore CPUs
Will software ever catch up?

Performance Growth in Perspective

Same absolute increase in computing power:
Big Bang to 2001
2001 to 2003

1971 to 2001: performance improved 35,000X!!!

What if cars or planes improved at this rate?

Measuring Performance
Latency (response time, execution time)
Minimize time to wait for a computation

Energy/Power consumption

Throughput (tasks completed per unit time, bandwidth)
Maximize work done in a given interval
= 1/latency when there is no overlap among tasks
> 1/latency when there is
In real processors there is always overlap (pipelining)

All three are important (Architecture: latency is important; Embedded systems: power consumption is important; Network: throughput is important)
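
The relation between throughput and latency can be sketched with an idealized pipeline. The model below is an assumption made for illustration, not the lecture's: each task's latency is split into equal stages, and a new task starts every stage time with no stalls.

```python
def throughput_no_overlap(latency_s: float) -> float:
    """Tasks per second when each task must finish before the next starts."""
    return 1.0 / latency_s


def throughput_pipelined(latency_s: float, stages: int, n_tasks: int) -> float:
    """Tasks per second for an idealized pipeline (assumed model):
    equal-length stages, one new task entering per stage time, no stalls."""
    stage_time = latency_s / stages
    total_time = (stages + n_tasks - 1) * stage_time
    return n_tasks / total_time


if __name__ == "__main__":
    latency = 5.0  # hypothetical per-task latency in seconds
    print("no overlap:", throughput_no_overlap(latency))
    print("5-stage pipeline, 100 tasks:", throughput_pipelined(latency, 5, 100))
```

With a single stage the pipelined formula reduces to 1/latency; with overlap, throughput exceeds 1/latency even though each task's latency is unchanged.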

Performance Terminology
"X is n times faster than Y" means:

$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

"X is m% faster than Y" means:

$$\frac{\text{Execution time}_Y - \text{Execution time}_X}{\text{Execution time}_X} \times 100\% = m$$
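
The two definitions above translate directly into code. This is a small sketch; the function names and the sample times are assumptions for illustration.

```python
def times_faster(time_x: float, time_y: float) -> float:
    """n such that 'X is n times faster than Y' (execution times in the same unit)."""
    return time_y / time_x


def percent_faster(time_x: float, time_y: float) -> float:
    """m such that 'X is m% faster than Y'."""
    return (time_y - time_x) / time_x * 100.0


if __name__ == "__main__":
    # Hypothetical measurements: X takes 10 s, Y takes 15 s.
    print(times_faster(10.0, 15.0))    # 1.5
    print(percent_faster(10.0, 15.0))  # 50.0
```

Note that both are normalized by the execution time of X, so m = (n - 1) x 100.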

Compute Speedup: Amdahl's Law

Speedup is due to enhancement (E):

$$\text{Speedup}(E) = \frac{\text{Execution time w/o E (Before)}}{\text{Execution time w/ E (After)}} = \frac{\text{Time}_{\text{Before}}}{\text{Time}_{\text{After}}}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. What are the Execution time after and Speedup(E)?

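Amdahl's Law

$$\text{ExTime}_{\text{after}} = \text{ExTime}_{\text{before}} \times \left[(1 - F) + \frac{F}{S}\right]$$

$$\text{Speedup}(E) = \frac{\text{ExTime}_{\text{before}}}{\text{ExTime}_{\text{after}}} = \frac{1}{(1 - F) + \dfrac{F}{S}}$$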

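Amdahl's Law: An Example

Q: Floating point instructions are improved to run 2X faster, but only 10% of execution time is FP ops. What is the execution time and speedup after the improvement?

Ans: F = 0.1, S = 2

$$\text{ExTime}_{\text{after}} = \text{ExTime}_{\text{before}} \times \left[(1 - 0.1) + \frac{0.1}{2}\right] = 0.95 \times \text{ExTime}_{\text{before}}$$

$$\text{Speedup} = \frac{\text{ExTime}_{\text{before}}}{\text{ExTime}_{\text{after}}} = \frac{1}{0.95} = 1.053$$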
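
The example above can be checked numerically. This is a minimal sketch of Amdahl's Law with F and S as defined on the slides; the function name and the input validation are assumptions added for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction F of execution time is accelerated by factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced (F) must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced (S) must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


if __name__ == "__main__":
    # The slide's example: F = 0.1, S = 2.
    print(round(amdahl_speedup(0.1, 2.0), 3))  # 1.053
    # Even an enormous S cannot push the speedup past 1 / (1 - F).
    print(round(amdahl_speedup(0.1, 1e12), 3))
```

The second call shows the upper bound: with only 10% of the time affected, the overall speedup stays below 1 / 0.9, about 1.11.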

Read examples in the book!

CPU Performance
The Fundamental Law

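$$\text{CPU time} = \frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}$$

Three components of CPU performance:
Instruction count
CPI
Clock cycle time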

                          Inst. Count   CPI   Clock
Program                        X
Compiler                       X         X
Inst. Set Architecture         X         X      X
µArch                                    X      X
Physical Design                                 X
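
The fundamental law is a product of three factors, which a few lines of code make concrete. The function name and the sample numbers (one billion instructions, a 1 GHz clock) are assumptions for illustration only.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x cycles per instruction x seconds per cycle."""
    return instruction_count * cpi * cycle_time_s


if __name__ == "__main__":
    # Hypothetical program: 1e9 instructions, CPI of 1.57, 1 GHz clock (1 ns cycle).
    print(cpu_time_seconds(1e9, 1.57, 1e-9))
```

Because the factors multiply, halving any one of them halves CPU time only if the other two stay fixed, which is rarely the case in practice.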

CPI - Cycles per Instruction

Let F_i be the frequency of type i instructions in a program. Then, average CPI:

$$\text{CPI} = \frac{\text{Total Cycles}}{\text{Total Instruction Count}} = \sum_{i=1}^{n} \text{CPI}_i \times F_i, \quad \text{where } F_i = \frac{IC_i}{\text{Instruction Count}}$$

$$\text{CPU time} = \text{Cycle time} \times \sum_{i=1}^{n} \left(\text{CPI}_i \times IC_i\right)$$

Example:

Instruction type    ALU    Load   Store   Branch
Frequency           43%    21%    12%     24%
Clock cycles        1      2      2       2

average CPI = 0.43 x 1 + 0.21 x 2 + 0.12 x 2 + 0.24 x 2 = 0.43 + 0.42 + 0.24 + 0.48 = 1.57 cycles/instruction
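
The weighted sum for average CPI can be reproduced with the instruction mix from the example. This is a sketch; the dictionary layout and function name are assumptions chosen for illustration.

```python
# (frequency, clock cycles) per instruction type, taken from the example table.
INSTRUCTION_MIX = {
    "ALU": (0.43, 1),
    "Load": (0.21, 2),
    "Store": (0.12, 2),
    "Branch": (0.24, 2),
}


def average_cpi(mix: dict) -> float:
    """Average CPI as the sum over instruction types of CPI_i x F_i."""
    total_frequency = sum(freq for freq, _ in mix.values())
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cycles for freq, cycles in mix.values())


if __name__ == "__main__":
    print(round(average_cpi(INSTRUCTION_MIX), 2))  # 1.57
```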

Example (RISC Vs. CISC)

Instruction mix of a RISC architecture:

Inst.     Freq.   C.C.
ALU       50%     1
Load      20%     2
Store     10%     2
Branch    20%     2

Add a register-memory ALU instruction format?
One op. in register, one op. in memory
The new instruction will take 2 cc but will also increase the Branches to 3 cc.

Q: What fraction of loads must be eliminated for this to pay off?

Solution
            Old design                    New design
Instr.      Fi     CPIi   CPIi x Fi       Ii      CPIi   CPIi x Ii
ALU         .5     1      .5              .5-X    1      .5-X
Load        .2     2      .4              .2-X    2      .4-2X
Store       .1     2      .2              .1      2      .2
Branch      .2     2      .4              .2      3      .6
Reg/Mem                                   X       2      2X
Total       1.0    CPI = 1.5              1-X     CPI = (1.7-X)/(1-X)

Exec Time = Instr. Cnt. x CPI x Cycle time

Instr. Cnt_old x CPI_old x Cycle time_old >= Instr. Cnt_new x CPI_new x Cycle time_new
1.0 x 1.5 >= (1-X) x (1.7-X)/(1-X)
X >= 0.2

ALL loads must be eliminated for this to be a win!
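
The break-even point can be verified by computing total cycles per original instruction before and after the change, with X as the fraction of instructions converted to the register-memory form. This is a sketch: the function names are assumptions, and cycle time is taken as unchanged, as in the slide's inequality.

```python
def old_cycles() -> float:
    """Cycles per original instruction for the RISC mix (equals the old CPI of 1.5)."""
    return 0.5 * 1 + 0.2 * 2 + 0.1 * 2 + 0.2 * 2


def new_cycles(x: float) -> float:
    """Cycles per original instruction after replacing a fraction x of
    ALU + Load pairs with one 2-cycle Reg/Mem instruction; branches cost 3 cycles."""
    return (0.5 - x) * 1 + (0.2 - x) * 2 + 0.1 * 2 + 0.2 * 3 + x * 2


if __name__ == "__main__":
    # new_cycles(x) simplifies to 1.7 - x, so the break-even x is new_cycles(0) - old_cycles().
    break_even = new_cycles(0.0) - old_cycles()
    print(f"break-even X = {break_even:.2f}")
    for x in (0.0, 0.1, 0.2):
        print(f"X = {x:.1f}: new = {new_cycles(x):.2f}, old = {old_cycles():.2f}")
```

Since loads are only 20% of the original mix, X = 0.2 means every load has to be absorbed just to break even.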

Improve Memory System

All instructions require an instruction fetch, only a fraction require a data fetch/store.
Optimize instruction access over data access

Programs exhibit locality

Spatial Locality
Temporal Locality

Access to small memories is faster

Provide a storage hierarchy such that the most frequent accesses are to the smallest (closest) memories.

Registers -> Cache -> Memory -> Disk/Tape

Benchmarks
Program as unit of work
There are millions of programs
Not all are the same, most are very different
Which ones to use?

Benchmarks
Standard programs for measuring or comparing performance
Representative of programs people care about
Repeatable!!

Choosing Programs to Evaluate Perf.

Toy benchmarks
e.g., quicksort, puzzle
No one really runs them. Scary fact: used to prove the value of RISC in the early 80s

Synthetic benchmarks
Attempt to match average frequencies of operations and operands in real workloads
e.g., Whetstone, Dhrystone
Often slightly more complex than kernels, but do not represent real programs

Kernels
Most frequently executed pieces of real programs
e.g., Livermore loops
Good for focusing on individual features, not the big picture
Tend to over-emphasize target feature

Real programs
e.g., gcc, spice, SPEC89, 92, 95, SPEC2000 (Standard Performance Evaluation Corporation), TPC-C, TPC-D

Networking Benchmarks: Netbench, Commbench
Applications: IP Forwarding, TCP/IP, SSL, Apache, SpecWeb
Commbench: www.ecs.umass.edu/ece/wolf/nsl/software/cb/index.html

Execution Driven Simulators:
Simplescalar: http://www.simplescalar.com/
NepSim: http://www.cs.ucr.edu/~yluo/nepsim/

MIPS and MFLOPS

MIPS: millions of instructions per second
MIPS = Inst. count / (CPU time x 10^6) = Clock rate / (CPI x 10^6)
Easy to understand and to market
Inst. set dependent, cannot be used across machines
Program dependent
Can vary inversely to performance! (why? read the book)

MFLOPS: millions of FP ops per second
Less compiler dependent than MIPS
Not all FP ops are implemented in h/w on all machines
Not all FP ops have the same latencies
Normalized MFLOPS: uses an equivalence table to even out the various latencies of FP ops
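
The two forms of the MIPS formula agree because CPU time = Inst. count x CPI / Clock rate. A short sketch with hypothetical numbers (function names and sample values are assumptions, not from the lecture):

```python
def mips_from_counts(instruction_count: float, cpu_time_s: float) -> float:
    """MIPS = instruction count / (CPU time x 10^6)."""
    return instruction_count / (cpu_time_s * 1e6)


def mips_from_clock(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


if __name__ == "__main__":
    # Hypothetical machine: 1 GHz clock, CPI of 1.57, program of 1e9 instructions.
    clock_rate_hz, cpi, instruction_count = 1e9, 1.57, 1e9
    cpu_time_s = instruction_count * cpi / clock_rate_hz
    print(mips_from_counts(instruction_count, cpu_time_s))
    print(mips_from_clock(clock_rate_hz, cpi))
```

Both calls print the same rating. The rating depends only on clock rate and CPI, so a program recompiled to execute fewer but slower instructions can finish sooner while reporting a lower MIPS figure.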

Performance Contd.
SPEC CINT2000, SPEC CFP2000, and TPC-C figures are plotted in Fig. 1.19, 1.20 and 1.22 for various machines.
EEMBC performance of 5 different embedded processors (Table 1.24) is plotted in Fig. 1.25. Performance/watt is also plotted, in Fig. 1.27.
Fig. 1.30 lists the programs and changes in the SPEC89, SPEC92, SPEC95 and SPEC2000 benchmarks.
