Week 14

This document describes out-of-order execution in processors. It discusses:
• In-order issue with out-of-order completion: instructions are still issued in program order but may complete out of order, so that, for example, instruction I2 can complete before I1.
• Out-of-order issue: the processor looks ahead past dependencies and conflicts to issue independent instructions earlier. This requires decoupling the decode and execute stages with a buffer called the instruction window.
• Register renaming: hardware registers are allocated dynamically to avoid false dependencies (write-after-write and write-after-read) that could otherwise stall the pipeline under out-of-order execution.
• Examples showing how out-of-order execution and multiple functional units can improve performance, especially with a larger instruction window in which to find independent instructions.

Uploaded by

nausicaatetoo


In-Order Issue
Out-Of-Order Completion
• Issue instructions in the exact order that would be achieved by sequential
execution but allow instructions to run to completion out of order.

• Any number of instructions may be in the execution stage at any one time, up
to the maximum degree of machine parallelism across all functional units.

• Instruction issuing is stalled by a resource conflict, a data dependency, or a
procedural dependency.

• Instruction I2 is allowed to run to completion prior to I1.

• This allows I3 to be completed earlier, with the net result of a savings of one
cycle.
Out-Of-Order Issue
Out-Of-Order Completion
• With in-order issue, the processor will only decode
instructions up to the point of a dependency or conflict.

• No additional instructions are decoded until the conflict is
resolved.

• As a result, the processor cannot look ahead of the point of
conflict to subsequent instructions that may be
independent of those already in the pipeline and that may
be usefully introduced into the pipeline.

• To allow out-of-order issue, it is necessary to decouple the
decode and execute stages of the pipeline.

• This is done with a buffer referred to as an instruction
window.
Out-Of-Order Issue
Out-Of-Order Completion
• With this organization, after a processor has finished
decoding an instruction, it is placed in the instruction
window.

• As long as this buffer is not full, the processor can continue
to fetch and decode new instructions.

• When a functional unit becomes available in the execute
stage, an instruction from the instruction window may be
issued to the execute stage.

• Any instruction may be issued, provided that:

— it needs the particular functional unit that is available
— no conflicts or dependencies block this instruction
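
The issue rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of any particular processor: the `Instr` record, the unit names (`"alu"`, `"load_store"`), the register names, and the `select_for_issue` helper are all hypothetical, and the sketch assumes no register renaming, so WAR and WAW conflicts also block issue.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str     # label such as "I5"
    unit: str     # functional unit the instruction needs (hypothetical names)
    dest: str     # destination register
    srcs: tuple   # source registers

def select_for_issue(window, free_unit, in_flight_dests):
    """Return the oldest instruction in the window that may use free_unit.

    window          -- decoded, not yet issued instructions in program order
    free_unit       -- the functional unit that just became available
    in_flight_dests -- destination registers of instructions still executing
    """
    unwritten = set(in_flight_dests)  # results that have not been written yet
    unread = set()                    # sources of older instructions still waiting
    for instr in window:
        raw = any(src in unwritten for src in instr.srcs)  # true dependency
        waw = instr.dest in unwritten                      # output dependency
        war = instr.dest in unread                         # antidependency
        if instr.unit == free_unit and not (raw or waw or war):
            return instr
        # The instruction stays in the window, so younger ones must respect it.
        unwritten.add(instr.dest)
        unread.update(instr.srcs)
    return None

# Assumed situation: I4 is executing and will write r4; I5 reads r4, I6 does not.
window = [
    Instr("I5", "alu", "r5", ("r4", "r1")),
    Instr("I6", "alu", "r6", ("r2", "r3")),
]
chosen = select_for_issue(window, "alu", in_flight_dests={"r4"})
print(chosen.name)  # I6
```

Scanning the window from oldest to youngest gives priority to older instructions, which is one plausible policy among several.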
Out-Of-Order Issue
Out-Of-Order Completion
• The result of this organization is that the processor has a
lookahead capability, allowing it to identify independent
instructions that can be brought into the execute stage.

• Instructions are issued from the instruction window with
little regard for their original program order.

• As before, the only constraint is that the program execution
behaves correctly.
Out-Of-Order Issue
Out-Of-Order Completion
• During each of the first three cycles, two instructions are fetched
into the decode stage.

• During each cycle, subject to the constraint of the buffer size, two
instructions move from the decode stage to the instruction
window.

• In this example, it is possible to issue instruction I6 ahead of I5
(recall that I5 depends on I4, but I6 does not).

• Thus, one cycle is saved in both the execute and write-back
stages, which shortens the end-to-end execution time.
Register Renaming
• When out-of-order instruction issuing and/or out-of-order
instruction completion are allowed, this may give rise to the
possibility of WAW dependencies and WAR dependencies.

• These dependencies differ from RAW data dependencies and
resource conflicts, which reflect the flow of data through a
program and the sequence of execution.

• WAW dependencies and WAR dependencies, on the other
hand, arise because the values in registers may no longer
reflect the sequence of values dictated by the program flow.

• These false dependencies may result in a pipeline stall.

• With register renaming, registers are allocated dynamically by the
processor hardware

— i.e. registers are not specifically named
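
The three dependency types named above can be told apart by comparing the registers that two instructions read and write. The sketch below is an illustration only: the `classify` helper and the three sample instructions are assumptions, and the check is purely pairwise (it ignores any instruction that writes the same register in between).

```python
def classify(earlier, later):
    """Return the dependency types of `later` on `earlier` as a sorted list.

    Each instruction is a (dest, srcs) pair of register names,
    and `earlier` precedes `later` in program order.
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    kinds = []
    if e_dest in l_srcs:
        kinds.append("RAW")   # later reads what earlier writes (true dependency)
    if l_dest in e_srcs:
        kinds.append("WAR")   # later overwrites what earlier reads (antidependency)
    if l_dest == e_dest:
        kinds.append("WAW")   # both write the same register (output dependency)
    return sorted(kinds)

# Hypothetical three-instruction sequence
i1 = ("r3", ("r3", "r5"))   # r3 <- r3 op r5
i2 = ("r4", ("r3",))        # r4 <- r3 + 1
i3 = ("r3", ("r5",))        # r3 <- r5 + 1

print(classify(i1, i2))  # ['RAW']
print(classify(i2, i3))  # ['WAR']
print(classify(i1, i3))  # ['WAR', 'WAW']
```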
Register Renaming
• The register reference without the subscript refers to the
logical register reference found in the instruction.

• The register reference with the subscript refers to a hardware
register allocated to hold a new value.

• When a new allocation is made for a particular logical register,
subsequent instruction references to that logical register as a
source operand are made to refer to the most recently
allocated hardware register (recent in terms of the program
sequence of instructions).
Register Renaming
• In this example, the creation of register R3c in instruction I3
avoids the WAR dependency on the second instruction and the
WAW on the first instruction, and it does not interfere with the
correct value being accessed by I4.

• The result is that I3 can be issued immediately.

• Without renaming, I3 cannot be issued until the first
instruction is complete and the second instruction is issued.
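
The renaming rule can be sketched as follows. The four-instruction program is an assumption, chosen only to be consistent with the description above (I3 receives R3c, and I4 then reads R3c); the `rename` helper and the letter-suffix naming are likewise illustrative, and immediate operands are left out.

```python
import string

def rename(program):
    """Map logical registers to freshly allocated hardware registers.

    program -- list of (dest, srcs) pairs using logical names such as "R3".
    Returns the same list with suffixed hardware names such as "R3b".
    """
    version = {}  # logical register -> number of allocations made so far

    def current(reg):
        # Most recent hardware register for `reg` ("a" = initial value)
        return reg + string.ascii_lowercase[version.get(reg, 0)]

    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(current(s) for s in srcs)  # sources use the latest allocation
        version[dest] = version.get(dest, 0) + 1    # new allocation for the result
        renamed.append((current(dest), new_srcs))
    return renamed

# Assumed program (logical registers only)
program = [
    ("R3", ("R3", "R5")),   # I1
    ("R4", ("R3",)),        # I2
    ("R3", ("R5",)),        # I3
    ("R7", ("R3", "R4")),   # I4
]
for label, (dest, srcs) in zip(("I1", "I2", "I3", "I4"), rename(program)):
    print(label, dest, "<-", ", ".join(srcs))
# I1 R3b <- R3a, R5a
# I2 R4b <- R3b
# I3 R3c <- R5a
# I4 R7b <- R3c, R4b
```

After renaming, I3 writes R3c rather than the register that I1 writes (R3b) and I2 reads, so only true dependencies remain.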
Register Renaming - Speedup
• The vertical axis corresponds to the mean speedup of the
superscalar machine over the scalar machine.
• The horizontal axis shows the results for four alternative
processor organizations.
• The base machine does not duplicate any of the functional units,
but it can issue instructions out of order.
Register Renaming - Speedup
• The second configuration duplicates the load/store functional unit
that accesses a data cache.
• The third configuration duplicates the ALU.
• The fourth configuration duplicates both load/store and ALU.
• In each graph, results are shown for instruction window sizes of
8, 16, and 32 instructions, which dictates the amount of
lookahead the processor can do.
Register Renaming - Speedup
• The difference between the two graphs is that, in the second,
register renaming is allowed.
• The first graph reflects a machine that is limited by all dependencies.
• The second graph corresponds to a machine that is limited only by
true dependencies.
Machine Parallelism
• The two graphs, combined, yield some important conclusions.
• The first is that it is probably not worthwhile to add functional
units without register renaming.
• There is some slight improvement in performance, but at the cost
of increased hardware complexity.
• With register renaming, which eliminates antidependencies and
output dependencies, noticeable gains are achieved by adding
more functional units.
• Note, however, that there is a significant difference in the amount
of gain achievable between using an instruction window of 8
versus a larger instruction window.
• This indicates that if the instruction window is too small, data
dependencies will prevent effective utilization of the extra
functional units; the processor must be able to look quite far
ahead to find independent instructions to utilize the hardware
more fully.
Key Elements of a
Superscalar Processor Organization
• Instruction fetch strategies that simultaneously fetch multiple
instructions, often by predicting the outcomes of, and fetching
beyond, conditional branch instructions.

• These functions require the use of multiple pipeline fetch and decode
stages, and branch prediction logic.

• Logic for determining true dependencies involving register values,
and mechanisms for communicating these values to where they are
needed during execution.

• Mechanisms for initiating, or issuing, multiple instructions in parallel.

• Resources for parallel execution of multiple instructions, including
multiple pipelined functional units and memory hierarchies capable of
simultaneously servicing multiple memory references.

• Mechanisms for committing the process state in correct order.
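
Committing state in the correct order can be pictured as a queue that releases results only from its head. The sketch below is a simplified illustration, not a description of any specific commit mechanism: the `commit_in_order` helper and the instruction labels are assumptions.

```python
from collections import deque

def commit_in_order(program_order, completion_order):
    """Record which instructions commit after each completion event.

    Instructions may finish executing in any order, but an instruction
    commits only once every earlier instruction has committed.
    """
    pending = deque(program_order)  # not yet committed, oldest first
    done = set()                    # finished executing, possibly not committed
    log = []
    for finished in completion_order:
        done.add(finished)
        committed = []
        while pending and pending[0] in done:
            committed.append(pending.popleft())
        log.append((finished, committed))
    return log

# I2 finishes before I1, but its result is held until I1 has committed.
log = commit_in_order(["I1", "I2", "I3"], ["I2", "I1", "I3"])
for finished, committed in log:
    print(finished, "finished; committed:", committed)
# I2 finished; committed: []
# I1 finished; committed: ['I1', 'I2']
# I3 finished; committed: ['I3']
```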

Example - 1
• Consider the following sequence of instructions, where the syntax
consists of an opcode followed by the destination register followed by
one or two source registers:

• Assume the use of a four-stage pipeline: fetch, decode/issue,
execute, write back.
• Assume that all pipeline stages take one clock cycle except for the
execute stage.
• For simple integer arithmetic and logical instructions, the execute
stage takes one cycle, but for a LOAD from memory, five cycles are
consumed in the execute stage.
Example - 1
• If we have a simple scalar pipeline but allow out-of-order execution,
we can construct the following table for the execution of the first
seven instructions:
Example - 1
• The entries under the four pipeline stages indicate the clock cycle at
which each instruction begins each phase.

• In this program, the second ADD instruction (instruction 3) depends
on the LOAD instruction (instruction 1) for one of its operands, r6.

• Because the LOAD instruction takes five clock cycles, and the issue
logic encounters the dependent ADD instruction after two clocks, the
issue logic must delay the ADD instruction for three clock cycles.

• With an out-of-order capability, the processor can stall instruction 3
at clock cycle 4, and then move on to issue the following three
independent instructions, which enter execution at clocks 6, 8, and 9.

• The LOAD finishes execution at clock 9, and so the dependent ADD
can be launched into execution on clock 10.
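
The effect described above can be reproduced with a small timing model. This is a sketch under simplifying assumptions and is not meant to reproduce the table from the exercise: the four-instruction program is hypothetical, fetch and decode never stall, at most one instruction is issued per cycle, a dependent instruction starts executing the cycle after its producer writes back, and WAR and WAW conflicts are ignored (as if registers were renamed).

```python
LATENCY = {"LOAD": 5}  # execute-stage cycles; every other opcode takes 1

def schedule(program, out_of_order):
    """Return (fetch, decode, execute, write_back) start cycles per instruction.

    program -- list of (opcode, dest, srcs) in program order
    """
    ready = {}          # register -> first cycle its new value can be used
    used_slots = set()  # cycles in which an instruction was already issued
    last_issue = -1     # issue cycle of the previous instruction
    rows = []
    for i, (op, dest, srcs) in enumerate(program):
        fetch, decode = i, i + 1
        # Earliest start: after decode and after all source operands are ready
        start = max([decode + 1] + [ready.get(s, 0) for s in srcs])
        if not out_of_order:
            start = max(start, last_issue + 1)  # may not pass older instructions
        while start in used_slots:              # scalar: one issue per cycle
            start += 1
        used_slots.add(start)
        last_issue = start
        write_back = start + LATENCY.get(op, 1)
        ready[dest] = write_back + 1            # usable the cycle after write back
        rows.append((fetch, decode, start, write_back))
    return rows

# Hypothetical program: an ADD that needs the LOAD result, then independent work
program = [
    ("LOAD", "r6", ("r3",)),
    ("ADD", "r1", ("r6", "r0")),
    ("SUB", "r5", ("r3", "r4")),
    ("OR", "r2", ("r4", "r7")),
]
in_order = schedule(program, out_of_order=False)
ooo = schedule(program, out_of_order=True)
print(max(row[3] for row in in_order))  # 11
print(max(row[3] for row in ooo))       # 9
```

In this model the in-order machine holds SUB and OR behind the stalled ADD, while the out-of-order machine executes them during the LOAD, so the last write back moves from cycle 11 to cycle 9.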
Example - 1
a) Complete the preceding table.

b) Redo the table assuming no out-of-order capability. What is
the savings using the capability?

c) Redo the table assuming a superscalar implementation that
can handle two instructions at a time at each stage.
Example - 1
a) Complete the preceding table.

[Timing chart: instructions I0 to I10 (rows) against clock cycles 0 to 20 (columns); each row marks the cycles of its F, D, E, and WB stages.]
Example - 1
b) Redo the table assuming no out-of-order capability. What is the
savings using the capability?

[Timing chart: instructions I0 to I10 (rows) against clock cycles 0 to 27 (columns); each row marks the cycles of its F, D, E, and WB stages.]
Example - 1
c) Redo the table assuming a superscalar implementation that can
handle two instructions at a time at each stage.

[Timing chart: instructions I0 to I10 (rows) against clock cycles 0 to 18 (columns); each row marks the cycles of its F, D, E, and WB stages.]
Example - 2
• Identify the write-read, write-write, and read-write dependencies in
the following instruction sequence:
Example - 2
• True Data Dependency (Read After Write, RAW)
❖ I1-I4
❖ I1-I5
❖ I2-I4
❖ I2-I5
• Antidependency (Write After Read, WAR)
❖ I2-I3
❖ I2-I4
❖ I3-I4
❖ I4-I5
• Output Dependency (Write After Write, WAW)
❖ I1-I2
❖ I1-I5
❖ I2-I5
Review Questions
1. What is the difference between the superscalar and
superpipelined approaches?
2. Explain the following terms with an example:
i. True data dependency
ii. Procedural dependency
iii. Resource conflicts
iv. Output dependency
v. Antidependency
3. Explain three types of superscalar instruction issue policies
with an example.
4. What are the key elements of a superscalar processor
organization?
