VLIW Processor

Uploaded by

diya32755
VLIW PROCESSORS

Department of E&TC, MITCOE, Pu


Introduction
o Very long instruction word (VLIW) refers to a
processor architecture designed to take advantage of
instruction-level parallelism.
o Each VLIW instruction consists of multiple independent
operations grouped together.
o A VLIW processor contains multiple independent
functional units.
o Each operation in the instruction is assigned to a
specific functional unit.
o All functional units share a common large
register file.
o This type of processor architecture is intended to allow
higher performance without the inherent complexity of
some other approaches.
Different Approaches
Other approaches to improving performance in processor architectures:
o Pipelining
Breaking up instructions into sub-steps so that instructions can be
executed partially at the same time

o Superscalar architectures
Dispatching individual instructions to be executed completely
independently in different parts of the processor

o Out-of-order execution
Executing instructions in an order different from the program order
Instruction Level Parallelism (ILP)
o Instruction-level parallelism (ILP) is a measure of how
many of the operations in a computer program can be
performed simultaneously.

o The overlap among instructions is called instruction-level
parallelism.

o Ordinary programs are typically written under a sequential
execution model, where instructions execute one after the
other in the order specified by the programmer.

o The goal of compiler and processor designers implementing ILP
is to identify and exploit as much ILP as possible.
What is ILP? (Example)
Consider the following program:
op1 e = a + b
op2 f = c + d
op3 m = e * f

o Operation 3 depends on the results of operations 1 and 2, so
it cannot be calculated until both of them are completed.
o However, operations 1 and 2 do not depend on any other
operation, so they can be calculated simultaneously.
o If we assume that each operation can be completed in one
unit of time, then these three operations can be completed
in a total of two units of time,
o giving an ILP of 3/2.
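The same count can be sketched in Python. This is a minimal illustration, assuming unit-time operations and considering only true (read-after-write) dependences; the function names `asap_levels` and `ilp` are hypothetical. Each operation is placed at the earliest time step after all of its inputs are produced, and ILP is the number of operations divided by the number of steps.

```python
# Sketch: ILP of straight-line code via earliest-possible (ASAP) scheduling.
# Assumes every operation takes one unit of time and only read-after-write
# dependences matter.

def asap_levels(ops):
    """ops: list of (dest, src1, src2) tuples in program order.
    Returns the 1-based time step at which each operation can run."""
    ready = {}      # variable name -> time step in which it is produced
    levels = []
    for dest, *srcs in ops:
        # Inputs never written by the program (a, b, c, d) count as step 0.
        step = 1 + max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = step
        levels.append(step)
    return levels

def ilp(ops):
    """Operations per time step when every op runs as early as possible."""
    return len(ops) / max(asap_levels(ops))

program = [
    ("e", "a", "b"),   # op1: e = a + b
    ("f", "c", "d"),   # op2: f = c + d
    ("m", "e", "f"),   # op3: m = e * f
]

print(asap_levels(program))  # [1, 1, 2]
print(ilp(program))          # 1.5
```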
VLIW Compiler
o The compiler is responsible for static scheduling of instructions in
a VLIW processor.

o The compiler finds out which operations in the program can be
executed in parallel.

o It groups these operations into a single instruction, which is
the very long instruction word.

o The compiler ensures that an operation is not issued before its
operands are ready.
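A minimal sketch of the last rule, assuming a one-cycle latency for every operation: a bundle is legal only if every operand it reads was produced in an earlier bundle. The helper `is_legal` and the tuple layout are hypothetical, and the three operations are op1 to op3 from the ILP example.

```python
# Sketch: check that a static schedule never issues an operation before its
# operands are ready (one-cycle latency assumed for every operation).

def is_legal(bundles, inputs):
    """bundles: list of bundles; each bundle is a list of (dest, sources).
    inputs: names available before the first bundle executes.
    Returns True if every operand is produced in an earlier bundle."""
    available = set(inputs)
    for bundle in bundles:
        for dest, srcs in bundle:
            if not all(s in available for s in srcs):
                return False          # operand not ready yet
        # Results become visible only after the whole bundle has executed.
        available.update(dest for dest, _ in bundle)
    return True

op1 = ("e", ["a", "b"])   # e = a + b
op2 = ("f", ["c", "d"])   # f = c + d
op3 = ("m", ["e", "f"])   # m = e * f

print(is_legal([[op1, op2], [op3]], {"a", "b", "c", "d"}))  # True
print(is_legal([[op1, op2, op3]], {"a", "b", "c", "d"}))    # False
```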
VLIW Instruction
o One VLIW instruction word encodes multiple operations, which
allows them to be initiated in a single clock cycle.

o The operands and the operation to be performed by each
functional unit are specified in the instruction itself.

o One instruction contains an operation field for each execution
unit of the device.

o So the length of the instruction increases with the number of execution
units.

o To accommodate these operation fields, VLIW instructions are
usually at least 64 bits wide, and on some architectures much
wider, up to 1024 bits.
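The relation between unit count and word length can be sketched as follows. The four-unit layout, the 32-bit slot width, the placeholder field values and the `pack`/`unpack` helpers are hypothetical choices for illustration, not the format of any particular processor; with 32-bit slots, 2 units give the 64-bit lower bound and 32 units give 1024 bits.

```python
# Sketch: a VLIW instruction word as a fixed sequence of slots, one per
# functional unit. Slot width and unit layout are hypothetical.

SLOT_BITS = 32
UNITS = ["alu0", "alu1", "ldst0", "ldst1"]   # slot i always drives UNITS[i]
NOP = 0                                      # encoding of an empty slot

def word_width(num_units, slot_bits=SLOT_BITS):
    """Instruction length grows linearly with the number of units."""
    return num_units * slot_bits

def pack(slots, slot_bits=SLOT_BITS):
    """Concatenate slot fields into one integer; slot 0 is most significant."""
    mask = (1 << slot_bits) - 1
    word = 0
    for field in slots:
        word = (word << slot_bits) | (field & mask)
    return word

def unpack(word, num_units, slot_bits=SLOT_BITS):
    """Recover each unit's field purely from its position in the word."""
    mask = (1 << slot_bits) - 1
    return [(word >> (slot_bits * (num_units - 1 - i))) & mask
            for i in range(num_units)]

print(word_width(2), word_width(len(UNITS)), word_width(32))  # 64 128 1024
w = pack([0x11, 0x22, NOP, 0x44])   # placeholder encodings; slot 2 is a NOP
print(unpack(w, len(UNITS)))        # [17, 34, 0, 68]
```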
VLIW Instruction
[Figure: the instruction word "Add r1,r2,r3; Sub r4,r5,r6; Ld r7,data; St r8,data", whose four operations go to two ALUs and two load/store units that share the register files]
ILP in VLIW
o Consider the computation of y = a1x1 + a2x2 + a3x3, assuming
every operation completes in one cycle.

On a sequential processor:
cycle 1: load a1
cycle 2: load x1
cycle 3: load a2
cycle 4: load x2
cycle 5: multiply z1 a1 x1
cycle 6: multiply z2 a2 x2
cycle 7: add y z1 z2
cycle 8: load a3
cycle 9: load x3
cycle 10: multiply z1 a3 x3
cycle 11: add y y z1
This requires 11 cycles.

On the VLIW processor with 2 load/store units, 1 multiply unit
and 1 add unit:
cycle 1: load a1; load x1
cycle 2: load a2; load x2; multiply z1 a1 x1
cycle 3: load a3; load x3; multiply z2 a2 x2
cycle 4: multiply z3 a3 x3; add y z1 z2
cycle 5: add y y z3
This requires 5 cycles.
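The VLIW schedule can be reproduced with a simple greedy list scheduler, sketched below under stated assumptions: one-cycle operations, the slot counts above (2 load/store, 1 multiply, 1 add), and single-assignment values (the partial sum gets its own hypothetical name `t`). This is one illustrative scheduling technique, not necessarily the one a production VLIW compiler uses; `list_schedule` and `SLOTS` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical resource model from the example: operations of each unit
# type that fit in one long instruction word.
SLOTS = {"mem": 2, "mul": 1, "add": 1}

def list_schedule(ops, slots):
    """Greedy list scheduling into VLIW bundles, taking ops in program order.

    ops: (unit, dest, sources); every value is assumed to be written once
    and every operation is assumed to take one cycle.
    Returns bundles[c] = destinations of the ops issued in cycle c + 1.
    """
    ready = {}                   # value -> first cycle in which it can be read
    used = defaultdict(int)      # (cycle, unit) -> slots already taken
    placed = defaultdict(list)   # cycle -> ops issued in that cycle
    for unit, dest, srcs in ops:
        # Never issue an op before its operands are ready...
        cycle = max((ready.get(s, 1) for s in srcs), default=1)
        # ...and delay it further if its unit type has no free slot.
        while used[(cycle, unit)] >= slots[unit]:
            cycle += 1
        used[(cycle, unit)] += 1
        placed[cycle].append(dest)
        ready[dest] = cycle + 1  # result usable from the next cycle on
    return [placed[c] for c in range(1, max(placed) + 1)]

program = [
    ("mem", "a1", []), ("mem", "x1", []),
    ("mem", "a2", []), ("mem", "x2", []),
    ("mem", "a3", []), ("mem", "x3", []),
    ("mul", "z1", ["a1", "x1"]),
    ("mul", "z2", ["a2", "x2"]),
    ("mul", "z3", ["a3", "x3"]),
    ("add", "t", ["z1", "z2"]),   # partial sum (the slide reuses y here)
    ("add", "y", ["t", "z3"]),
]

bundles = list_schedule(program, SLOTS)
for c, bundle in enumerate(bundles, start=1):
    print(f"cycle {c}: {bundle}")
print(f"{len(program)} ops in {len(bundles)} cycles")
```

The five bundles it prints match the VLIW listing. Because candidates are taken in program order, the result depends on that order; a scheduler that reorders candidates first could behave differently.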


Block Diagram
[Figure: block diagram of a VLIW processor]

Diagram (Conceptual Instruction Execution)
[Figure: conceptual instruction execution]
Working
o Long instruction words are fetched from memory.
o A common multi-ported register file supplies the operands
and stores the results.
o Parallel random access to the register file is possible through
the read/write crossbar.
o Execution in the functional units is carried out concurrently
with load/store operations that move data between RAM and the
register file.
o There may be one register file, or separate register files for
fixed-point (FX) and floating-point (FP) data.
o The processor relies on the compiler to find parallelism and
schedule dependency-free program code.
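One cycle of execution can be sketched as below, assuming that all units read the register file (and memory) at the start of the cycle and that results are written back together at the end. The opcode set, register values and the `execute_bundle` helper are hypothetical; the bundle is the instruction from the VLIW Instruction figure (Add r1,r2,r3; Sub r4,r5,r6; Ld r7,data; St r8,data), read with the destination register first.

```python
import operator

# Hypothetical ALU opcodes for this sketch.
ALU_OPS = {"add": operator.add, "sub": operator.sub}

def execute_bundle(bundle, regs, mem):
    """Execute one VLIW instruction (one tuple per functional-unit slot):
    ("add"|"sub", rd, rs, rt), ("ld", rd, addr) or ("st", rs, addr)."""
    reg_writes, mem_writes = {}, {}
    # Phase 1: every unit reads its operands concurrently (old values).
    for op, *args in bundle:
        if op in ALU_OPS:
            rd, rs, rt = args
            reg_writes[rd] = ALU_OPS[op](regs[rs], regs[rt])
        elif op == "ld":
            rd, addr = args
            reg_writes[rd] = mem[addr]
        elif op == "st":
            rs, addr = args
            mem_writes[addr] = regs[rs]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    # Phase 2: all results are committed together at the end of the cycle.
    regs.update(reg_writes)
    mem.update(mem_writes)

regs = {f"r{i}": 10 * i for i in range(9)}   # r0=0, r1=10, ..., r8=80
mem = {"data": 5}
execute_bundle([("add", "r1", "r2", "r3"), ("sub", "r4", "r5", "r6"),
                ("ld", "r7", "data"), ("st", "r8", "data")], regs, mem)
print(regs["r1"], regs["r4"], regs["r7"], mem["data"])  # 50 -10 5 80
```

Under this read-then-write assumption, the Ld sees the old value of data even though the St in the same instruction overwrites it.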
Difference Between VLIW & Superscalar Architecture
VLIW vs. Superscalar Architecture
o Instruction formulation
• Superscalar:
- Receives conventional instructions designed for sequential
processors.

• VLIW:
- Receives long instruction words, each comprising a field (or
opcode) for each execution unit.
- Instruction word length depends on the number of execution units and
the code length needed to control each unit (such as opcode length and
register fields).
- Typical word length is 64–1024 bits, much longer than the
conventional machine word length.
VLIW vs. Superscalar Architecture
o Instruction scheduling

• Superscalar:
- Done dynamically at run time by the hardware.
- Data dependences are checked and resolved in
hardware.
- Needs a look-ahead hardware window for instruction
fetch.

• VLIW:
- Done statically at compile time by the compiler.
- Data dependences are checked by the compiler.
- If some opcode fields in a VLIW instruction cannot be filled, memory
space and instruction bandwidth are wasted.
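The dependence check itself can be sketched as a small function. Instructions are hypothetical (destination, sources) tuples, and the examples reuse the sequential code for y = a1x1 + a2x2 + a3x3, where z1 is written twice. Whether the check runs in hardware (superscalar) or in the compiler (VLIW), an empty result means neither instruction constrains the order of the other with respect to these registers.

```python
# Sketch: classify register dependences between an earlier instruction i
# and a later instruction j, each given as (dest, sources).

def hazards(i, j):
    di, si = i
    dj, sj = j
    found = set()
    if di in sj:
        found.add("RAW")   # j reads what i writes (true dependence)
    if dj in si:
        found.add("WAR")   # j overwrites a value i still has to read
    if di == dj:
        found.add("WAW")   # both write the same register
    return found

mul1 = ("z1", ["a1", "x1"])   # multiply z1 a1 x1
add1 = ("y", ["z1", "z2"])    # add y z1 z2
mul3 = ("z1", ["a3", "x3"])   # multiply z1 a3 x3 (reuses z1)

print(hazards(mul1, add1))    # {'RAW'}
print(hazards(add1, mul3))    # {'WAR'}
print(hazards(mul1, mul3))    # {'WAW'}
```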
Comparison: CISC, RISC, VLIW
Architecture characteristic | CISC | RISC | VLIW
Instruction size | Varies | One size, usually 32 bits | One size
Instruction semantics | Varies from simple to complex; possibly many dependent operations per instruction | Almost always one simple operation | Many simple, independent operations
Registers | Few, sometimes special | Many, general-purpose | Many, general-purpose
Hardware design | Exploit microcode implementations | Exploit implementations with one pipeline and no microcode | Exploit implementations with multiple pipelines, no microcode and no complex dispatch logic
Advantages of VLIW
o Dependences are determined by the compiler and used to
schedule operations according to functional unit latencies.
o Functional units are assigned by the compiler and correspond
to the position of an operation within the instruction packet.
o Reduced hardware complexity:
• Tasks such as decoding, data dependency detection and
instruction issue become simple.
• Allows a potentially higher clock rate.
• Allows potentially lower power consumption.
Disadvantages of VLIW
o Higher complexity of the compiler.
o Compatibility across implementations: compiler optimization
needs to consider technology-dependent parameters such as
latencies and the load-use time of the cache.
o Unscheduled events (e.g. a cache miss) stall the entire processor.
o Code density: if some opcode fields in a VLIW instruction cannot be
filled, memory space and instruction bandwidth are wasted, i.e. slot
utilization is low.
o Code expansion: larger code increases power consumption.
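The cost of low slot utilization can be estimated for the 5-cycle VLIW schedule of y = a1x1 + a2x2 + a3x3, which issues 2, 3, 3, 2 and 1 operations per cycle into a 4-slot instruction (2 load/store, 1 multiply, 1 add). The 32-bit slot width is a hypothetical value, and the sketch assumes every empty slot is encoded as a full-width NOP.

```python
# Sketch: slot utilization and NOP overhead of a fixed-width VLIW encoding.
# Per-cycle operation counts come from the 5-cycle schedule; the slot width
# is hypothetical.

SLOTS_PER_WORD = 4
SLOT_BITS = 32
ops_per_cycle = [2, 3, 3, 2, 1]

def utilization(ops_per_cycle, slots_per_word):
    """Fraction of instruction slots that hold a real operation."""
    return sum(ops_per_cycle) / (len(ops_per_cycle) * slots_per_word)

def wasted_bits(ops_per_cycle, slots_per_word, slot_bits):
    """Bits spent on NOP padding across the whole program."""
    empty = sum(slots_per_word - n for n in ops_per_cycle)
    return empty * slot_bits

print(utilization(ops_per_cycle, SLOTS_PER_WORD))             # 0.55
print(wasted_bits(ops_per_cycle, SLOTS_PER_WORD, SLOT_BITS))  # 288
```

Here 11 operations occupy 20 slots, so 9 of the 20 slots (288 of 640 bits under this assumption) carry only NOPs.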
Applications
o VLIW architecture is suitable for digital signal processing
applications.
o Processing of media data, such as compression/decompression
of image and speech data.
Examples of VLIW processor
o VLIW mini-supercomputers:
Multiflow TRACE 7/300, 14/300, 28/300
Multiflow TRACE /500
Cydrome Cydra 5
IBM Yorktown VLIW Computer
o Single-chip VLIW processors:
Intel iWarp, Philips' LIFE chips
o Single-chip VLIW media (throughput) processors:
TriMedia, Chromatic, MicroUnity
o DSP processors (TI TMS320C6x)
