Chapter 8 - Pipelining

This document discusses pipelining in computer processors. It begins with an overview of pipelining and its benefits for improving processor throughput. It then provides examples to illustrate the basic concepts of pipelining using an analogy of an assembly line for laundry. The document discusses how pipelining can be applied in computer instruction execution by dividing instructions into multiple stages. It also covers pipeline performance issues such as hazards that can cause stalls, and techniques for addressing hazards like forwarding, branch prediction, and instruction queues.

Uploaded by

Anita Sofia Keyser

Chapter 8. Pipelining

Overview
Pipelining is widely used in modern processors.
Pipelining improves system performance in terms of throughput.
Pipelined organization requires sophisticated compilation techniques.

Basic Concepts

Making the Execution of Programs Faster
Use faster circuit technology to build the processor and the main memory.
Arrange the hardware so that more than one operation can be performed at the same time.
In the latter way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.

Traditional Pipeline Concept

Laundry Example
Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 40 minutes
"Folder" takes 20 minutes

Traditional Pipeline Concept

[Figure: timeline from 6 PM to midnight; loads A, B, C, D are washed, dried, and folded one after another.]
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?

Traditional Pipeline Concept

[Figure: timeline starting at 6 PM; loads A, B, C, D in task order, overlapped in time.]
Pipelined laundry takes 3.5 hours for 4 loads

Traditional Pipeline Concept

[Figure: pipelined laundry timeline starting at 6 PM; loads A, B, C, D in task order.]
Pipelining doesn't help latency of a single task, it helps throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously using different resources
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to fill the pipeline and time to drain it reduce speedup
Stall for dependences
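
The laundry numbers can be checked with a short Python sketch. It assumes each machine handles one load at a time and that a load may wait between stages for as long as needed. The function names are made up for this illustration.

```python
def pipelined_finish_time(stage_minutes, num_loads):
    """Finish time when each stage handles one load at a time.

    Assumption: a load may wait between stages for as long as needed.
    """
    # stage_free[s] is the time at which stage s finishes its latest load.
    stage_free = [0] * len(stage_minutes)
    for _ in range(num_loads):
        ready = 0  # time at which this load leaves the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free[s])
            ready = start + minutes
            stage_free[s] = ready
    return stage_free[-1]


def sequential_finish_time(stage_minutes, num_loads):
    """Finish time when a load starts only after the previous one is folded."""
    return num_loads * sum(stage_minutes)


stages = [30, 40, 20]  # washer, dryer, folder (minutes)
sequential = sequential_finish_time(stages, 4)
pipelined = pipelined_finish_time(stages, 4)
print(sequential / 60, pipelined / 60)  # 6.0 3.5
```

The speedup is 360 / 210, about 1.7 rather than 3, because the 40-minute dryer sets the pace and the pipeline still has to fill and drain.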

Use the Idea of Pipelining in a Computer
Fetch + Execution
[Figure 8.1. Basic idea of instruction pipelining. (a) Sequential execution. (b) Hardware organization: instruction fetch unit, interstage buffer B1, execution unit. (c) Pipelined execution.]

Use the Idea of Pipelining in a Computer
Fetch + Decode + Execution + Write
[Figure 8.2. A 4-stage pipeline. (a) Instruction execution divided into four steps. (b) Hardware organization: F: Fetch instruction, D: Decode instruction and fetch operands, E: Execute operation, W: Write results, separated by interstage buffers B1, B2, B3.]
Textbook page: 457
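
As a rough illustration of the four-step division, the Python sketch below lists which steps are active in each clock cycle for four instructions. It assumes an ideal pipeline in which every stage takes exactly one cycle and nothing stalls. `ideal_schedule` is a name made up for this example.

```python
STAGES = ["F", "D", "E", "W"]  # stage names from Figure 8.2


def ideal_schedule(num_instructions, stages=STAGES):
    """Map each clock cycle to the stage steps active in it (no stalls)."""
    schedule = {}
    for i in range(1, num_instructions + 1):
        for s, name in enumerate(stages):
            cycle = i + s  # instruction i is in stage s during cycle i + s
            schedule.setdefault(cycle, []).append(f"{name}{i}")
    return schedule


schedule = ideal_schedule(4)
for cycle in sorted(schedule):
    print(cycle, " ".join(schedule[cycle]))
```

Under these assumptions the pipeline is full only in cycle 4, when W1, E2, D3, and F4 overlap, and the four instructions complete in 7 cycles.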

Role of Cache Memory

Each pipeline stage is expected to complete in one clock cycle.
The clock period should be long enough to let the slowest pipeline stage complete.
Faster stages can only wait for the slowest one to complete.
Since main memory is very slow compared to the execution, if each instruction needs to be fetched from main memory, the pipeline is almost useless.
Fortunately, we have cache.

Pipeline Performance
The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
However, this increase would be achieved only if all pipeline stages require the same time to complete, and there is no interruption throughout program execution.
Unfortunately, these conditions rarely hold in practice.
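
To put numbers on the ideal case, the sketch below compares n instructions on a k-stage pipeline with the same work done one stage at a time. It assumes one clock cycle per stage and no stalls. `ideal_speedup` is a helper name invented for this example.

```python
def ideal_speedup(num_stages, num_instructions):
    """Speedup of an ideal k-stage pipeline over unpipelined execution.

    Assumptions: every stage takes one clock cycle and nothing stalls, so
    n instructions need k + n - 1 cycles instead of n * k.
    """
    unpipelined_cycles = num_instructions * num_stages
    pipelined_cycles = num_stages + num_instructions - 1
    return unpipelined_cycles / pipelined_cycles


for n in (4, 100, 10_000):
    print(n, round(ideal_speedup(4, n), 3))
```

The speedup approaches the number of stages only for long instruction sequences, since the cycles spent filling the pipeline are then amortized.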

Pipeline Performance
[Figure 8.3. Effect of an execution operation taking more than one clock cycle.]

Pipeline Performance

The previous pipeline is said to have been stalled for two clock cycles.
Any condition that causes a pipeline to stall is called a hazard.
Data hazard: any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. So some operation has to be delayed, and the pipeline stalls.
Instruction (control) hazard: a delay in the availability of an instruction causes the pipeline to stall.
Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time.
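
The two-cycle stall mentioned above can be reproduced with a small simulation. This is a simplified model and not taken from the slides: instructions move through the four stages in order, each stage holds one instruction at a time, and only the Execute step may take more than one cycle. It also assumes that the stalled instruction is I2 and that its Execute step takes three cycles.

```python
def completion_cycles(execute_cycles, num_stages=4, execute_stage=2):
    """Return the clock cycle in which each instruction finishes its last stage.

    execute_cycles[i] is the number of cycles instruction i spends in the
    Execute stage; every other stage is assumed to take one cycle.
    """
    finished = []
    prev_start = None  # stage start cycles of the previous instruction
    prev_end = 0       # last cycle the previous instruction spent in the final stage
    for exec_len in execute_cycles:
        start = []
        ready = 1  # earliest cycle in which this instruction could enter stage 0
        for s in range(num_stages):
            if prev_start is not None:
                # Wait until the previous instruction has left stage s.
                if s + 1 < num_stages:
                    ready = max(ready, prev_start[s + 1])
                else:
                    ready = max(ready, prev_end + 1)
            start.append(ready)
            duration = exec_len if s == execute_stage else 1
            ready += duration  # first cycle in which the next stage could begin
        prev_start = start
        prev_end = ready - 1
        finished.append(prev_end)
    return finished


ideal = completion_cycles([1, 1, 1, 1])
stalled = completion_cycles([1, 3, 1, 1])
print(ideal, stalled, stalled[-1] - ideal[-1])
```

Every instruction behind the slow one finishes two cycles late. The function can be called with other Execute lengths to explore different stall patterns.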

Pipeline Performance
Instruction hazard
[Figure 8.4. Pipeline stall caused by a cache miss in F2. (a) Instruction execution steps in successive clock cycles. (b) Function performed by each processor stage (F: Fetch, D: Decode, E: Execute, W: Write) in successive clock cycles; the idle periods are called stalls (bubbles).]

Pipeline Performance
Structural hazard
Load X(R1), R2
[Figure 8.5. Effect of a Load instruction on pipeline timing; I2 is the Load.]

Pipeline Performance

Again, pipelining does not result in individual instructions being executed faster; rather, it is the throughput that increases.
Throughput is measured by the rate at which instruction execution is completed.
Pipeline stalls degrade pipeline performance.
We need to identify all hazards that may cause the pipeline to stall and find ways to minimize their impact.

Quiz
Given four instructions, where I2 takes two clock cycles for execution, draw the figure for a 4-stage pipeline and work out the total number of cycles needed for the four instructions to complete.

Data Hazards

We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.
Hazard occurs:
A ← 3 + A
B ← 4 × A
No hazard:
A ← 5 × C
B ← 20 + C
When two operations depend on each other, they must be executed sequentially in the correct order.
Another example:
Mul R2, R3, R4
Add R5, R4, R6
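
A dependence like the one between Mul and Add can be found mechanically. The Python sketch below assumes the destination register is the last operand, so Mul writes R4 and Add reads it. `Instr`, `parse`, and `raw_dependences` are names invented for this illustration.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    sources: tuple
    dest: str


def parse(text):
    """Parse 'Op Rs1, Rs2, Rd', assuming the destination is the last operand."""
    op, operands = text.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return Instr(op, tuple(regs[:-1]), regs[-1])


def raw_dependences(program):
    """Return (producer_index, consumer_index, register) triples."""
    found = []
    for j, consumer in enumerate(program):
        for src in consumer.sources:
            # Look backwards for the most recent earlier writer of src.
            for i in range(j - 1, -1, -1):
                if program[i].dest == src:
                    found.append((i, j, src))
                    break
    return found


program = [parse("Mul R2, R3, R4"), parse("Add R5, R4, R6")]
print(raw_dependences(program))  # [(0, 1, 'R4')]
```

Whether a reported dependence actually stalls the pipeline depends on how far apart the two instructions are and on the hardware. The sketch only reports the dependence.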

Data Hazards
[Figure 8.6. Pipeline stalled by data dependency between D2 and W1; I1 is the Mul and I2 is the Add.]

Operand Forwarding
Instead of reading from the register file, the second instruction can get data directly from the output of the ALU after the previous instruction is completed.
A special arrangement needs to be made to forward the output of the ALU to the input of the ALU.

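[Figure 8.7. Operand forwarding in a pipelined processor. (a) Datapath: Source 1 and Source 2 feed registers SRC1 and SRC2, the ALU output goes to register RSLT and then to the destination. (b) Position of the source and result registers in the processor pipeline: SRC1, SRC2, and RSLT around the E: Execute (ALU) stage, followed by W: Write (register file), with a forwarding path.]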

Handling Data Hazards in Software
Let the compiler detect and handle the hazard:
I1: Mul R2, R3, R4
NOP
NOP
I2: Add R5, R4, R6
The compiler can reorder the instructions to perform some useful work during the NOP slots.
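
The NOP padding above can be sketched as a small compiler-style pass in Python. Instructions are written as hypothetical (op, sources, dest) tuples. `min_distance=3` is an assumption chosen to reproduce the two NOPs shown; the right value depends on the pipeline.

```python
NOP = ("NOP", (), None)


def insert_nops(program, min_distance=3):
    """Pad a program so each consumer is at least min_distance slots after its producer.

    Instructions are (op, sources, dest) tuples. min_distance=3 leaves two
    slots between a producer and its consumer, which is an assumption here.
    """
    padded = []
    last_write = {}  # register -> index in `padded` of its latest writer
    for op, sources, dest in program:
        earliest = len(padded)
        for src in sources:
            if src in last_write:
                earliest = max(earliest, last_write[src] + min_distance)
        padded.extend([NOP] * (earliest - len(padded)))
        padded.append((op, sources, dest))
        if dest is not None:
            last_write[dest] = len(padded) - 1
    return padded


program = [
    ("Mul", ("R2", "R3"), "R4"),
    ("Add", ("R5", "R4"), "R6"),
]
print([instr[0] for instr in insert_nops(program)])  # ['Mul', 'NOP', 'NOP', 'Add']
```

A real compiler would first try to move independent instructions into those slots and fall back to NOPs only when none are available. This sketch only does the padding.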

Side Effects

The previous example is explicit and easily detected.
Sometimes an instruction changes the contents of a register other than the one named as the destination.
When a location other than one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect.
Example: condition code flags:
Add R1, R3
AddWithCarry R2, R4
Instructions designed for execution on pipelined hardware should have few side effects.
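
The hidden dependence in this example can be made visible by listing implicit flag accesses next to the named operands. In the Python sketch below, the `IMPLICIT` table and the flag name "C" are hypothetical. It also assumes the two-operand form `Op Rsrc, Rdst` computes Rdst ← Rdst op Rsrc.

```python
# Hypothetical effect table: flags each opcode touches implicitly.
IMPLICIT = {
    "Add": {"reads": set(), "writes": {"C"}},           # sets the carry flag
    "AddWithCarry": {"reads": {"C"}, "writes": {"C"}},  # consumes and sets it
}


def effects(op, operands):
    """Return (reads, writes) including implicit flag accesses.

    Assumes the two-operand form 'Op Rsrc, Rdst' computes Rdst <- Rdst op Rsrc.
    """
    src, dst = operands
    extra = IMPLICIT.get(op, {"reads": set(), "writes": set()})
    return {src, dst} | extra["reads"], {dst} | extra["writes"]


def depends_on(earlier, later):
    """Locations written by `earlier` that `later` reads."""
    _, writes = effects(*earlier)
    reads, _ = effects(*later)
    return writes & reads


first = ("Add", ("R1", "R3"))
second = ("AddWithCarry", ("R2", "R4"))
print(depends_on(first, second))  # {'C'}
```

The two instructions share no named register, so a check based only on the operands would miss the dependence through the carry flag.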

Instruction Hazards

Overview
Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the pipeline stalls.
Cache miss
Branch

Unconditional Branches
[Figure 8.8. An idle cycle caused by a branch instruction: I2 is the branch, Ik is the branch target, and the execution unit is idle for one cycle.]

Branch Timing
- Branch penalty
- Reducing the penalty
[Figure 8.9. Branch timing. (a) Branch address computed in Execute stage. (b) Branch address computed in Decode stage.]

Instruction Queue and Prefetching
[Figure 8.10. Use of an instruction queue in the hardware organization of Figure 8.2b: the instruction fetch unit (F: Fetch instruction) fills an instruction queue, followed by D: Dispatch/Decode unit, E: Execute instruction, W: Write results.]

Conditional Branches
A conditional branch instruction introduces the added hazard caused by the dependency of the branch condition on the result of a preceding instruction.
The decision to branch cannot be made until the execution of that instruction has been completed.
Branch instructions represent about 20% of the dynamic instruction count of most programs.
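
A back-of-the-envelope estimate shows why a 20% branch frequency matters. The Python sketch below pessimistically assumes every branch pays the full penalty and nothing else stalls. The penalties of 2 and 1 cycles are assumed values for computing the branch address in the Execute and Decode stages respectively.

```python
def effective_cpi(branch_fraction, branch_penalty, base_cpi=1.0):
    """Average cycles per instruction when every branch pays the full penalty.

    This is a pessimistic simplification: it ignores cache misses, data
    hazards, and any technique that hides part of the branch penalty.
    """
    return base_cpi + branch_fraction * branch_penalty


# 20% branches; the 2- and 1-cycle penalties are assumptions for this sketch.
for penalty in (2, 1):
    cpi = effective_cpi(0.20, penalty)
    print(penalty, round(cpi, 2), round(1.0 / cpi, 3))
```

Under these assumptions, halving the branch penalty raises throughput from about 0.71 to about 0.83 instructions per cycle.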
