
W1M3 HLS ProblemFormulations

The document discusses C-Based VLSI Design and High-level Synthesis, outlining its automated design process that converts high-level specifications into optimized RTL descriptions for hardware implementation. It covers key steps in high-level synthesis, including preprocessing, scheduling, allocation, and binding, as well as the importance of data path and controller design. Additionally, it addresses scheduling problems, resource allocation, and optimizations such as loop unrolling and pipelining to enhance performance and efficiency in VLSI design.

C-Based VLSI Design: Problem Formulation

Dr. Chandan Karfa
Department of Computer Science and Engineering
IIT Guwahati

C-Based VLSI Design == High-level Synthesis
What
• An automated design process that transforms a high-level functional specification into an optimized register-transfer level (RTL) description for efficient hardware implementation
Why
• Productivity: lower design complexity and faster simulation speed
• Portability: single source -> multiple implementations
• Permutability: rapid design space exploration -> higher quality of result (QoR)

Design Space Exploration with HLS

High-level Synthesis Steps
• Preprocessing: construction of the intermediate representation (CDFG), data-dependency analysis, live-variable analysis, and compiler optimizations.
• Scheduling: assigns a control step to each operation of the input behaviour.
• Allocation: computes the minimum number of functional units and registers.
• Binding: maps variables to registers, operations to functional units, and data transfers to interconnection units.
• Data path & controller design: the controller is designed from the interconnections among the data path elements and the data transfers required in each control step.

High-level Synthesis Steps
[Figure: HLS flow. Input behaviour -> preprocessing -> scheduling (operations assigned to control steps 1 to 7) -> allocation & binding (variables bound to registers R1 to R4, operations bound to FU1 to FU3) -> data-path generation and controller generation (linked by status and control signals) -> RTL behaviour.]

Intermediate representation
• Purposes of creating and operating on an IR
• Encode the behavior of the program
• Facilitate analysis
• Facilitate optimization
• Facilitate retargeting

Program Flow Analysis
• Control flow analysis: determines the control structure of a program and builds control flow graphs (CFGs)
• Data flow analysis: determines the flow of data values and builds data flow graphs (DFGs)

Basic Block
• Basic block: a sequence of consecutive intermediate-language statements in which flow of control can only enter at the beginning and leave at the end.
• Identify basic blocks
• Identify control flow
• C compilers such as GCC or LLVM are usually integrated into the HLS flow as the front end
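As a minimal sketch of the two steps above (identify basic blocks, then identify control flow), the Python below applies the usual "leader" rule to a tiny three-address program. The tuple encoding of instructions and the names `find_basic_blocks` and `control_flow_edges` are assumptions made for this illustration; they do not come from the slides or from any HLS tool.

```python
def find_basic_blocks(instrs):
    """Split a list of instructions into basic blocks (lists of indices).

    Hypothetical encoding used only in this sketch:
      ("op", text)               ordinary statement
      ("goto", target)           unconditional jump to an instruction index
      ("if_goto", text, target)  conditional jump to an instruction index
    """
    leaders = {0} if instrs else set()
    for i, ins in enumerate(instrs):
        if ins[0] in ("goto", "if_goto"):
            leaders.add(ins[-1])            # a jump target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)          # so does the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


def control_flow_edges(instrs, blocks):
    """Return CFG edges as (from_block, to_block) pairs."""
    block_of = {idx: b for b, blk in enumerate(blocks) for idx in blk}
    edges = []
    for b, blk in enumerate(blocks):
        last = instrs[blk[-1]]
        if last[0] in ("goto", "if_goto"):
            edges.append((b, block_of[last[-1]]))   # taken branch
        if last[0] != "goto" and b + 1 < len(blocks):
            edges.append((b, b + 1))                # fall-through
    return edges


# s = sum of a[0..n-1], written as three-address style statements.
program = [
    ("op", "i = 0"),             # 0
    ("op", "s = 0"),             # 1
    ("if_goto", "i >= n", 6),    # 2
    ("op", "s = s + a[i]"),      # 3
    ("op", "i = i + 1"),         # 4
    ("goto", 2),                 # 5
    ("op", "return s"),          # 6
]
blocks = find_basic_blocks(program)
edges = control_flow_edges(program, blocks)
```

Here the loop test, the loop body, and the exit each end up in their own block, and the edge from the body back to the test is the loop's back edge.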

Scheduling Problem Formulation
Input:
• Sequence graph $G = (V, E)$ with nodes $v_0, v_1, \dots, v_n$, where $v_0$ is the source and $v_n$ is the sink (so $|V| = n + 1$)
• Delay of each node: $D = \{d_i : i = 0, 1, \dots, n\}$
• Resource or timing constraints (optional)
Output:
• The start time of each node: $T = \{t_i : i = 0, 1, \dots, n\}$
• Latency: the number of cycles needed to execute the entire schedule, i.e. the difference between the start times of the sink and source nodes: $\text{latency} = t_n - t_0$
Constraint:
• The start time of an operation is at least the start time of each of its direct predecessors plus that predecessor's execution delay: $t_i \ge t_j + d_j$ for every edge $(v_j, v_i) \in E$
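The precedence rule above can be turned directly into an as-soon-as-possible (ASAP) schedule: visit the nodes in topological order and give each one the earliest start time its predecessors allow. ASAP is not named on the slide; it is used here as one simple way to satisfy the formulation when there are no resource constraints. The graph, the delays, and the function name `asap_schedule` are hypothetical.

```python
from collections import deque


def asap_schedule(delays, edges):
    """Earliest start times t[i] such that t[i] >= t[j] + d[j] for every edge (j, i)."""
    n = len(delays)
    succs = [[] for _ in range(n)]
    indeg = [0] * n
    for j, i in edges:
        succs[j].append(i)
        indeg[i] += 1
    start = [0] * n
    ready = deque(v for v in range(n) if indeg[v] == 0)
    while ready:                      # Kahn-style topological traversal
        j = ready.popleft()
        for i in succs[j]:
            start[i] = max(start[i], start[j] + delays[j])
            indeg[i] -= 1
            if indeg[i] == 0:
                ready.append(i)
    return start


# Hypothetical sequence graph: node 0 is the source and node 5 is the sink,
# both with zero delay; nodes 1 and 4 take two cycles, nodes 2 and 3 take one.
delays = [0, 2, 1, 1, 2, 0]
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 5)]
t = asap_schedule(delays, edges)
latency = t[5] - t[0]   # start time of sink minus start time of source
```

For this graph the start times come out as `[0, 0, 0, 2, 1, 3]`, so the latency is 3 cycles.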

Scheduling Problems
• Unconstrained minimum-latency scheduling
• Minimum latency under resource constraints (MLRC)
• Minimum resource under latency constraints (MRLC)
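For the MLRC variant, a common heuristic is list scheduling: in every cycle, start as many ready operations as the free functional units allow. The slide does not name a specific algorithm, so the sketch below is only one possible approach, with a deliberately naive priority (alphabetical order). The operation names, delays, and the function `list_schedule` are hypothetical, and the loop assumes an acyclic graph with at least one unit of every resource type.

```python
def list_schedule(ops, edges, limits):
    """Resource-constrained list scheduling (heuristic sketch).

    ops:    {name: (resource_type, delay_in_cycles)}
    edges:  list of (pred, succ) dependencies
    limits: {resource_type: number of available units}
    Returns {name: start_cycle}.
    """
    preds = {op: [] for op in ops}
    for p, s in edges:
        preds[s].append(p)
    start, finish = {}, {}
    cycle = 0
    while len(start) < len(ops):
        # Units still occupied by operations started in earlier cycles.
        busy = {r: 0 for r in limits}
        for op in start:
            if finish[op] > cycle:
                busy[ops[op][0]] += 1
        for op in sorted(ops):        # naive priority: alphabetical order
            if op in start:
                continue
            rtype, delay = ops[op]
            ready = all(p in finish and finish[p] <= cycle for p in preds[op])
            if ready and busy[rtype] < limits[rtype]:
                start[op] = cycle
                finish[op] = cycle + delay
                busy[rtype] += 1
        cycle += 1
    return start


# Three 2-cycle multiplications feeding two 1-cycle ALU operations.
ops = {
    "m1": ("mul", 2), "m2": ("mul", 2), "m3": ("mul", 2),
    "a1": ("alu", 1), "a2": ("alu", 1),
}
edges = [("m1", "a1"), ("m2", "a1"), ("m3", "a2"), ("a1", "a2")]
one_mul = list_schedule(ops, edges, {"mul": 1, "alu": 1})
two_mul = list_schedule(ops, edges, {"mul": 2, "alu": 1})
```

With one multiplier the last operation `a2` starts in cycle 6; with two multipliers it starts in cycle 4, which illustrates the latency/resource trade-off behind MLRC and MRLC.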

Allocation and Binding
• Objective: maximize resource sharing and hence minimize resource usage
• Operations -> functional units
• Variables -> storage
Subtasks:
1. FU allocation & binding
2. Register allocation & binding
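To make register sharing concrete, here is a small sketch of lifetime-based register binding in the spirit of the left-edge algorithm: variables are sorted by the control step in which they are born, and each one reuses the first register whose previous value is already dead. The slides do not say which algorithm is used, so treat this as an illustrative assumption; the lifetimes and the name `bind_registers` are hypothetical.

```python
def bind_registers(lifetimes):
    """Greedy register binding from variable lifetimes.

    lifetimes: {var: (birth, death)}; the variable needs a register
    during control steps birth <= t < death.
    Returns {var: register_index}.
    """
    binding = {}
    free_at = []   # free_at[r] = control step at which register r becomes free
    for var, (birth, death) in sorted(lifetimes.items(), key=lambda kv: kv[1]):
        for r, step in enumerate(free_at):
            if step <= birth:             # previous value is dead: share the register
                free_at[r] = death
                binding[var] = r
                break
        else:
            free_at.append(death)         # no compatible register: allocate a new one
            binding[var] = len(free_at) - 1
    return binding


# Hypothetical lifetimes taken from some schedule.
lifetimes = {"v0": (0, 2), "v1": (0, 1), "v2": (1, 3), "v3": (2, 4), "v4": (3, 4)}
binding = bind_registers(lifetimes)
num_registers = len(set(binding.values()))
```

Five variables fit into two registers here, which matches the largest number of variables alive in any single control step.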

Datapath and Controller FSM Generation
• Data path design and control synthesis are conceptually simple but still important steps in synthesis
• Bus-based or mux-based architecture
• Generated data path is an interconnection of blocks
• Controller is a finite-state machine
• Optimization is used to reduce mux sizes
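One way to see where mux sizes come from: in a mux-based data path, each input port of a functional unit needs a multiplexer with one input per distinct register that ever feeds that port. The sketch below counts those inputs and then applies one simple optimization, swapping the operands of commutative operations. This particular optimization is an example chosen for illustration, not something the slide specifies, and the register names and functions are hypothetical.

```python
def mux_sizes(operands):
    """Return (left_mux_inputs, right_mux_inputs) for one functional unit.

    operands: list of (left_register, right_register), one pair per
    operation bound to the unit.
    """
    left = {l for l, _ in operands}
    right = {r for _, r in operands}
    return len(left), len(right)


def swap_commutative(operands):
    """Greedily swap operand pairs to reuse existing mux inputs.

    Only valid when every operation on the unit is commutative (e.g. +, *).
    """
    left, right, result = set(), set(), []
    for a, b in operands:
        keep_cost = (a not in left) + (b not in right)   # new mux inputs if kept
        swap_cost = (b not in left) + (a not in right)   # new mux inputs if swapped
        if swap_cost < keep_cost:
            a, b = b, a
        left.add(a)
        right.add(b)
        result.append((a, b))
    return result


# Four additions bound to the same adder.
adder_ops = [("R1", "R2"), ("R2", "R1"), ("R1", "R3"), ("R3", "R1")]
before = mux_sizes(adder_ops)
after = mux_sizes(swap_commutative(adder_ops))
```

In this example the two 3-input muxes shrink to a single wire on the left port and a 2-input mux on the right port.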

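Datapath Synthesis
[Figure: one functional unit (FU) with two input register sequences (R1, R2, R1, R5, R4 and R6, R1, R5, R6, R2) and one output register sequence (R1, R1, R2, R7, R4).]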

Data path Generation

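Controller Synthesis
[Figure: data path and control unit. The data path holds registers (a, 3, dx, x, y, u, r1, r2) feeding functional units labelled ALU and *; the control unit drives the register enable, mux control, and ALU control (+, -, <) signals.]
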
Typical C/C++ Constructs to RTL Mapping

Function Hierarchy
• Each function is usually translated into an RTL module
• Functions may be inlined to dissolve their hierarchy

Function Arguments
• Function arguments become ports on the RTL modules
• Input/output (I/O) protocols allow RTL blocks to synchronize data exchange automatically

Expressions
• HLS generates datapath circuits mostly from expressions
• Timing constraints influence the degree of registering

Arrays
• By default, an array in C code is typically implemented by a memory block in the RTL
• Read & write array -> RAM; constant array -> ROM
• An array can be partitioned and mapped to multiple RAMs
• Multiple arrays can be merged and mapped to one RAM
• An array can be partitioned into individual elements and mapped to registers
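Partitioning an array across several RAMs comes down to an address mapping from the original index to a (bank, offset) pair. The sketch below shows two commonly used mappings, here called "cyclic" and "block"; the function name `partition_index` and its parameters are hypothetical and do not correspond to any specific tool's interface.

```python
def partition_index(i, size, factor, scheme):
    """Map array index i to (bank, offset) for a `factor`-way partition.

    "cyclic": consecutive elements go to consecutive banks.
    "block":  each bank holds one contiguous chunk of the array.
    """
    if scheme == "cyclic":
        return i % factor, i // factor
    if scheme == "block":
        chunk = -(-size // factor)        # ceiling division
        return i // chunk, i % chunk
    raise ValueError(f"unknown scheme: {scheme}")


# Which bank does each element of an 8-element array land in (2 banks)?
cyclic_banks = [partition_index(i, 8, 2, "cyclic")[0] for i in range(8)]
block_banks = [partition_index(i, 8, 2, "block")[0] for i in range(8)]
```

With the cyclic mapping, neighbouring elements such as a[i] and a[i+1] sit in different banks, so both can be read in the same cycle; with the block mapping they usually share a bank.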

Loops
• By default, loops are rolled
• Each loop iteration corresponds to a “sequence” of states
• This state sequence is repeated once per iteration, as many times as the loop trip count

Loop Unrolling
• Loop unrolling exposes higher parallelism and can achieve shorter latency
• Pros
• Decreases loop overhead
• Increases parallelism for scheduling
• Cons
• Increases operation count, which may negatively impact area, power, and timing
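The transformation itself is easy to show in software. Below, a dot-product loop is unrolled by a factor of 4 by hand: the loop body is replicated four times, and a short clean-up loop handles trip counts that are not a multiple of 4. An HLS tool would apply the equivalent rewrite to the C source; the Python is only a sketch of the shape of the result. The use of four separate accumulators is an extra assumption, added to make the four multiply-adds independent of each other.

```python
def dot_rolled(a, b):
    acc = 0
    for i in range(len(a)):            # one multiply-add per iteration
        acc += a[i] * b[i]
    return acc


def dot_unrolled4(a, b):
    n = len(a)
    acc0 = acc1 = acc2 = acc3 = 0
    main = n - n % 4                   # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):        # four independent multiply-adds per iteration
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
    acc = acc0 + acc1 + acc2 + acc3
    for i in range(main, n):           # leftover iterations
        acc += a[i] * b[i]
    return acc


a = list(range(10))
b = list(range(10, 20))
rolled = dot_rolled(a, b)
unrolled = dot_unrolled4(a, b)
```

Both versions compute the same result, but the unrolled body contains four multiplications that a scheduler can place in the same control step if four multipliers are available, which is exactly the area-for-latency trade listed under the cons.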

Loop Pipelining
• Loop pipelining is one of the most important optimizations for high-level synthesis
• Allows a new iteration to begin processing before the previous iteration is complete
• Key metric: initiation interval (II), the number of clock cycles between the starts of consecutive iterations
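A back-of-the-envelope model shows why II matters. If one iteration takes `iter_latency` cycles, a rolled loop needs `trip_count * iter_latency` cycles, while a pipelined loop needs `(trip_count - 1) * II + iter_latency` cycles because iterations overlap. The sketch below encodes these two formulas; the function names are hypothetical, and the model ignores loop entry/exit overhead and any stalls.

```python
def rolled_latency(trip_count, iter_latency):
    """Cycles when each iteration starts only after the previous one finishes."""
    return trip_count * iter_latency


def pipelined_latency(trip_count, iter_latency, ii):
    """Cycles when a new iteration starts every `ii` cycles."""
    if trip_count == 0:
        return 0
    return (trip_count - 1) * ii + iter_latency


def iteration_starts(trip_count, ii):
    """Start cycle of every iteration in the pipelined loop."""
    return [k * ii for k in range(trip_count)]


# 100 iterations, each taking 4 cycles.
no_pipeline = rolled_latency(100, 4)
ii_1 = pipelined_latency(100, 4, ii=1)
ii_2 = pipelined_latency(100, 4, ii=2)
```

For 100 iterations of 4 cycles each, the rolled loop takes 400 cycles, while pipelining takes 103 cycles at II = 1 and 202 cycles at II = 2. Setting II equal to the iteration latency gives back the rolled figure.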

Thank You
