C-Based VLSI Design: Problem Formulation
Dr. Chandan Karfa
Department of Computer Science and Engineering
IIT Guwahati
C-Based VLSI Design == High-level Synthesis
What
• Automated design process that transforms a high-level functional specification into
optimized register-transfer level (RTL) descriptions for efficient hardware
implementation
Why
– Productivity
• lower design complexity and faster simulation speed
– Portability
• single source -> multiple implementations
– Permutability
• rapid design space exploration -> higher quality of result (QoR)
Design Space Exploration with HLS
High-level Synthesis Steps
• Preprocessing: construction of the intermediate representation (CDFG),
data-dependency analysis, live-variable analysis, and compiler optimizations.
• Scheduling: assigns a control step to each operation of the input
behaviour.
• Allocation: computes the minimum number of functional units
and registers.
• Binding: variables are mapped to registers, operations to
functional units, and data transfers to interconnection units.
• Data path & controller design: the controller is designed based
on the interconnections among the data path elements and the data
transfers required in the different control steps.
High-level Synthesis Steps
[Figure: HLS flow. The input behaviour passes through preprocessing, scheduling (operations assigned to control steps 1 to 7), and allocation & binding (e.g. R1: 3, v1; R3: v0, v6; R4: v3; FU2: op2, op5, …), followed by data-path generation and controller generation. The controller sends control signals to the data-path and receives status signals from it; the result is the RTL behaviour.]
Intermediate representation
• Purposes of creating and operating on an IR
• Encode the behavior of the program
• Facilitate analysis
• Facilitate optimization
• Facilitate retargeting
Program Flow Analysis
• Control flow analysis: determine control structure of a program and
build control flow graphs (CFGs)
• Data flow analysis: determine the flow of data values and build data
flow graphs (DFGs)
Basic Block
• Basic block: a sequence of consecutive
intermediate language statements in
which flow of control can only enter at
the beginning and leave at the end.
• Identify Basic blocks
• Identify Control flow
• C compiler front ends such as GCC or LLVM
are usually integrated into the HLS flow
for this step
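The two identification steps above can be sketched with the classic "leaders" rule: the first statement, every jump target, and every statement following a jump each start a new basic block. This is a minimal illustration, not how GCC or LLVM implement it; the three-field instruction tuples, the opcode names (`jmp`, `br`, `ret`), and the sample program are assumptions made up for this example.

```python
# Hypothetical IR: each instruction is (label or None, opcode, jump target or None).
PROGRAM = [
    (None, "mov", None),   # 0: i = 0
    ("L1", "cmp", None),   # 1: t = i < n
    (None, "br", "L2"),    # 2: if not t goto L2 (falls through otherwise)
    (None, "add", None),   # 3: s = s + i
    (None, "add", None),   # 4: i = i + 1
    (None, "jmp", "L1"),   # 5: goto L1
    ("L2", "ret", None),   # 6: return s
]


def find_basic_blocks(instrs):
    """Return basic blocks as (first_index, last_index) pairs."""
    if not instrs:
        return []
    label_to_index = {lab: i for i, (lab, _, _) in enumerate(instrs) if lab}
    leaders = {0}  # the first instruction always starts a block
    for i, (_, opcode, target) in enumerate(instrs):
        if opcode in ("jmp", "br"):
            leaders.add(label_to_index[target])  # a jump target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)  # so does the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [(s, e - 1) for s, e in zip(starts, ends)]


def build_cfg(instrs, blocks):
    """Return sorted control-flow edges between block numbers."""
    label_to_index = {lab: i for i, (lab, _, _) in enumerate(instrs) if lab}
    block_of_start = {s: b for b, (s, _) in enumerate(blocks)}
    edges = set()
    for b, (_, last) in enumerate(blocks):
        _, opcode, target = instrs[last]
        if opcode in ("jmp", "br"):
            edges.add((b, block_of_start[label_to_index[target]]))
        # Everything except an unconditional jump or return can fall through.
        if opcode not in ("jmp", "ret") and last + 1 < len(instrs):
            edges.add((b, b + 1))
    return sorted(edges)


blocks = find_basic_blocks(PROGRAM)
cfg_edges = build_cfg(PROGRAM, blocks)
```

For the sample loop this yields four blocks (initialization, loop test, loop body, exit) and a back edge from the body to the test.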
Scheduling Problem Formulation
Input:
• Sequence graph G = (V, E), |V| = n + 1, with nodes v_0, v_1, …, v_n (v_0 is the source node and v_n is the sink node)
• Delay of each node: D = {d_i, i = 0, 1, …, n}
• Resource or timing constraints (optional)
Output:
• The start time of each node: T = {t_i, i = 0, 1, …, n}
• Latency: the number of cycles needed to execute the entire schedule, i.e. the difference
between the start times of the sink node and the source node; latency = t_n - t_0
Constraint: the start time of an operation is at least as large as the start time of
each of its direct predecessors plus that predecessor's execution delay, i.e.
t_i >= t_j + d_j for every edge (v_j, v_i) in E
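As an illustration of this formulation, the sketch below computes the earliest start times that satisfy the precedence constraint (an ASAP schedule) and the resulting latency. The node names, delays, and edges are invented for the example, and no resource constraint is modelled.

```python
def asap_schedule(delays, edges):
    """Earliest start time of every node of an acyclic sequence graph.

    delays: dict node -> execution delay d_i (in cycles)
    edges:  list of (predecessor, successor) pairs
    """
    preds = {v: [] for v in delays}
    succs = {v: [] for v in delays}
    for p, s in edges:
        preds[s].append(p)
        succs[p].append(s)

    remaining = {v: len(preds[v]) for v in delays}
    ready = [v for v in delays if remaining[v] == 0]
    start = {}
    while ready:  # visit nodes in topological order
        v = ready.pop()
        # Precedence constraint: t_v >= t_p + d_p for every predecessor p.
        start[v] = max((start[p] + delays[p] for p in preds[v]), default=0)
        for s in succs[v]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    if len(start) != len(delays):
        raise ValueError("the sequence graph contains a cycle")
    return start


# Hypothetical graph: v0 is the source, v5 the sink, v1 a 2-cycle operation.
delays = {"v0": 0, "v1": 2, "v2": 1, "v3": 1, "v4": 1, "v5": 0}
edges = [("v0", "v1"), ("v0", "v2"), ("v1", "v3"), ("v2", "v3"),
         ("v2", "v4"), ("v3", "v5"), ("v4", "v5")]

start = asap_schedule(delays, edges)
latency = start["v5"] - start["v0"]  # latency = t_n - t_0
```

Here v3 has to wait for the slower predecessor v1, so it starts at cycle 2 and the schedule has a latency of 3 cycles.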
Scheduling Problems
• Unconstrained minimum-latency scheduling
• Minimum latency under resource constraints (MLRC)
• Minimum resource under latency constraints (MRLC)
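To make the difference between the unconstrained problem and MLRC concrete, here is a minimal list-scheduling sketch. List scheduling is a common heuristic for MLRC and is not claimed to be the method these slides go on to use. It assumes unit-delay operations, and the operation names, resource types, and limits are made up for the example.

```python
def list_schedule(op_type, edges, limits):
    """Heuristic resource-constrained schedule for unit-delay operations.

    op_type: dict op -> resource type (e.g. "mul", "add")
    edges:   list of (predecessor, successor) pairs
    limits:  dict resource type -> number of available units (>= 1 each)
    Returns dict op -> control step (1-based).
    """
    preds = {v: [] for v in op_type}
    succs = {v: [] for v in op_type}
    for p, s in edges:
        preds[s].append(p)
        succs[p].append(s)

    # Priority: length of the longest path from the op to the end of the graph.
    priority = {}

    def depth(v):
        if v not in priority:
            priority[v] = 1 + max((depth(s) for s in succs[v]), default=0)
        return priority[v]

    for v in op_type:
        depth(v)

    start, step = {}, 1
    while len(start) < len(op_type):
        ready = [v for v in op_type
                 if v not in start and all(p in start for p in preds[v])]
        if not ready:
            raise ValueError("the sequence graph contains a cycle")
        ready.sort(key=lambda v: (-priority[v], v))  # most critical first
        used = {t: 0 for t in limits}
        for v in ready:
            t = op_type[v]
            if used[t] < limits[t]:  # a unit of this type is still free
                start[v] = step
                used[t] += 1
        step += 1
    return start


# Hypothetical behaviour: (m1 + m2) + (m3 + m4), each m being a multiplication.
op_type = {"m1": "mul", "m2": "mul", "m3": "mul", "m4": "mul",
           "a1": "add", "a2": "add", "a3": "add"}
edges = [("m1", "a1"), ("m2", "a1"), ("m3", "a2"), ("m4", "a2"),
         ("a1", "a3"), ("a2", "a3")]

constrained = list_schedule(op_type, edges, {"mul": 2, "add": 1})
unconstrained = list_schedule(op_type, edges, {"mul": 4, "add": 2})
```

With two multipliers and one adder the example needs 4 control steps; with enough units for every operation it needs 3, which is the unconstrained minimum for this graph.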
Allocation and Binding
• Objective: maximize resource sharing and hence minimize
resource usage
• Operations -> Functional units
• Variables -> Storage
Subtasks:
1. FU allocation & Binding
2. Register Allocation & Binding
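Register sharing can be illustrated with a greedy, left-edge-style sketch: variables are sorted by the start of their lifetime and each is placed in the first register that is already free. This is one textbook approach, shown under simplifying assumptions (straight-line code, each lifetime a single half-open interval of control steps); the variable names and lifetimes are invented.

```python
def bind_registers(lifetimes):
    """Map variables to registers so that overlapping lifetimes never share.

    lifetimes: dict variable -> (start, end), the half-open interval
               [start, end) of control steps during which the value is live.
    Returns dict variable -> register index.
    """
    order = sorted(lifetimes, key=lambda v: (lifetimes[v][0], v))
    free_at = []  # free_at[r]: step from which register r is free again
    binding = {}
    for v in order:
        start, end = lifetimes[v]
        for r, free in enumerate(free_at):
            if free <= start:  # register r no longer holds a live value
                free_at[r] = end
                binding[v] = r
                break
        else:  # every existing register is busy: allocate a new one
            free_at.append(end)
            binding[v] = len(free_at) - 1
    return binding


# Hypothetical lifetimes taken from some schedule.
lifetimes = {"v1": (1, 3), "v2": (1, 2), "v3": (2, 4), "v4": (3, 5), "v5": (4, 6)}
binding = bind_registers(lifetimes)
num_registers = len(set(binding.values()))
```

Five variables fit into two registers here, which matches the largest number of values that are live at the same time.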
Datapath and Controller FSM Generation
Data path design and control synthesis are conceptually simple but still
important steps in synthesis
Bus-based or mux-based architecture
Generated data path is an interconnection of blocks
Controller is a finite-state machine
Optimization is used to reduce mux sizes.
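A rough sketch of how a controller FSM follows from a schedule and a binding: one state per control step, and in each state the controller activates the functional units and register load enables used in that step. The table layout and all names below are illustrative assumptions; mux select signals and status inputs are left out.

```python
def control_table(schedule, fu_binding, dest_reg):
    """Build one control word per state (control step).

    schedule:   dict op -> control step (1-based)
    fu_binding: dict op -> functional unit executing the op
    dest_reg:   dict op -> register that stores the op's result
    Returns dict step -> {"fu_ops": {fu: op}, "reg_enable": set of registers}.
    """
    table = {step: {"fu_ops": {}, "reg_enable": set()}
             for step in range(1, max(schedule.values()) + 1)}
    for op, step in schedule.items():
        fu = fu_binding[op]
        if fu in table[step]["fu_ops"]:
            raise ValueError(f"{fu} is used twice in step {step}")
        table[step]["fu_ops"][fu] = op              # FU control for this state
        table[step]["reg_enable"].add(dest_reg[op])  # register load enable
    return table


# Hypothetical schedule and binding for r = m1 + m2.
schedule = {"m1": 1, "m2": 1, "a1": 2}
fu_binding = {"m1": "MUL1", "m2": "MUL2", "a1": "ALU1"}
dest_reg = {"m1": "R1", "m2": "R2", "a1": "R1"}

table = control_table(schedule, fu_binding, dest_reg)
```

State 1 drives both multipliers and enables R1 and R2; state 2 drives the ALU and enables R1 again, so R1 is shared between m1 and the final sum.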
Datapath Synthesis
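[Figure: a functional unit (FU) annotated with three register lists: (R1, R2, R1, R5, R4) and (R6, R1, R5, R6, R2) above the FU, and (R1, R1, R2, R7, R4) below it.]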
Data path Generation
Controller Synthesis
[Figure: a DATA-PATH (registers a, 3, dx, x, y, u, r1, r2; an ALU; a multiplier) driven by a CONTROL-UNIT through register enable, mux control, and ALU control (+, -, <) signals.]
Typical C/C++ Constructs to RTL Mapping
Function Hierarchy
• Each function is usually translated into an RTL module
• Functions may be inlined to dissolve their hierarchy
Function Arguments
• Function arguments become ports on the RTL modules
• Input/output (I/O) protocols allow RTL blocks to automatically synchronize data exchange
Expressions
• HLS generates datapath circuits mostly from expressions
• Timing constraints (e.g. the target clock period) influence how many registers are inserted in the datapath
Arrays
• By default, an array in C code is typically implemented by a memory block
in the RTL
• Read & write array -> RAM; constant array -> ROM
• An array can be partitioned and mapped to multiple RAMs
• Multiple arrays can be merged and mapped to one RAM
• An array can be partitioned into individual elements and mapped to registers
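The effect of partitioning can be modelled in a few lines. The sketch below uses a cyclic scheme (element i goes to bank i mod F), which is one common choice and an assumption here, since the slides do not name a scheme; the function names are hypothetical.

```python
def cyclic_partition(array, factor):
    """Distribute elements over `factor` banks: element i goes to bank i % factor."""
    banks = [[] for _ in range(factor)]
    for i, value in enumerate(array):
        banks[i % factor].append(value)
    return banks


def read_partitioned(banks, i):
    """Read logical element i: bank select is i % F, address within the bank is i // F."""
    factor = len(banks)
    return banks[i % factor][i // factor]


data = [10, 11, 12, 13, 14, 15, 16, 17]
banks = cyclic_partition(data, 2)
```

With this mapping, elements i and i + 1 always live in different banks, so a loop body that reads both can fetch them in the same cycle even if each RAM has a single read port.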
Loops
• By default, loops are rolled
• Each loop iteration corresponds to a “sequence” of states
• This state sequence will be repeated multiple times based on the loop trip
count
Loop Unrolling
• Loop unrolling exposes higher parallelism
and achieves shorter latency
• Pros
• Decreases loop overhead
• Increases parallelism for scheduling
• Cons
• Increases operation count, which may negatively impact area, power, and timing
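The transformation itself can be shown in software. The sketch below unrolls a summation loop by a factor of 4 by hand; in an HLS flow the tool would normally do this on request, and the four additions in the unrolled body become candidates for parallel scheduling. The function names and the factor are arbitrary choices for this example.

```python
def sum_rolled(a):
    """Original loop: one addition per iteration, len(a) iterations."""
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total


def sum_unrolled_by_4(a):
    """Same computation with the body replicated 4 times per iteration."""
    n = len(a)
    total = 0
    i = 0
    while i + 4 <= n:  # main loop: roughly n / 4 iterations
        total += a[i] + a[i + 1] + a[i + 2] + a[i + 3]
        i += 4
    while i < n:  # remainder loop when n is not a multiple of 4
        total += a[i]
        i += 1
    return total


values = list(range(1, 11))  # 1..10
rolled = sum_rolled(values)
unrolled = sum_unrolled_by_4(values)
```

Both versions return the same result; the unrolled one evaluates the loop condition fewer times but contains four times as many additions in its body, which is the area cost listed above.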
Loop Pipelining
• Loop pipelining is one of the most important optimizations for high-level synthesis
• Allows a new iteration to begin processing before the previous iteration is complete
• Key metric: Initiation Interval (II), the number of cycles between the starts of consecutive iterations
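A back-of-the-envelope model shows why II matters. Under idealized assumptions (a fixed per-iteration latency, no stalls, loop entry and exit overhead ignored), a rolled loop takes N x L cycles, while a pipelined loop takes L + (N - 1) x II cycles. The function names and numbers below are illustrative only.

```python
def rolled_latency(trip_count, iter_latency):
    """Cycles for a non-pipelined loop: iterations run strictly back to back."""
    return trip_count * iter_latency


def pipelined_latency(trip_count, iter_latency, ii):
    """Cycles for a pipelined loop: a new iteration starts every `ii` cycles."""
    if trip_count == 0:
        return 0
    # The first iteration takes iter_latency cycles; each later one adds ii.
    return iter_latency + (trip_count - 1) * ii


# Hypothetical loop: 100 iterations, each taking 3 cycles.
no_pipeline = rolled_latency(100, 3)
pipe_ii1 = pipelined_latency(100, 3, ii=1)
pipe_ii2 = pipelined_latency(100, 3, ii=2)
```

In this model the loop drops from 300 cycles to 102 cycles at II = 1 and to 201 cycles at II = 2, so for long loops the total latency is dominated by II rather than by the iteration latency.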
Thank You