ASIC Design Flow: How To Design Your Own Chip
Frank K. Gürkaynak
Integrated Systems Laboratory 10. October 2006
Who am I?
Frank K. Gürkaynak
Originally from Turkey
Working on IC design since 1994
Studied and gave IC design courses at:
Istanbul Teknik Üniversitesi (İTÜ)
École Polytechnique Fédérale de Lausanne (EPFL)
Worcester Polytechnic Institute (WPI)
Eidgenössische Technische Hochschule Zürich (ETHZ)
Worked on IC design at Motorola and IBM, worked for Philips
Made enough errors designing chips to last 3 careers.
Trade-offs in Design
No free lunch
Depending on the design, some parameters are more important than others. You can generally sacrifice one parameter to improve another:
Speed vs. Area: It is possible to speed up a circuit by using larger transistors or parallel computation blocks.
Design time vs. Performance: Given enough time, a circuit can be optimized for higher performance.
Design Specification
A list of requirements
Function: What is expected from the ASIC?
Performance: What speed, power, area?
I/O requirements: How will the ASIC fit together with the system?
Feasibility?
For most projects: We have never done it, so we don't know exactly!
We may:
Have experience from earlier projects
Make small experiments to estimate performance
Choose an appropriate technology
Iterative Process
Identify blocks: What do we need to perform the functionality?
Visualize structure: How are blocks connected?
Find critical paths: Which block is most critical (speed, area, power)?
Divide and conquer: Draw sub-block diagrams.
[Figure: block diagram with the blocks Controller, Key Expansion, 64 x 32 SRAM, Reg-128 (twice), ShiftRows, InvShiftRows and AddRoundKey, connected by 32-bit and 128-bit buses]
Architectural Transformations
Performance parameters:
Area (mm²)
Clock rate (MHz)
Throughput (data/sec)
Latency (number of clock cycles)
[Figure: diagram of architectural transformations; labels: Area, Smaller, Faster, Efficiency, Pipelining]
Parallelization
More computation
If we use 2 parallel blocks:
Area doubles
Clock stays the same
Throughput doubles
Latency stays the same
Pipelining
Faster computation
If we introduce one pipeline stage:
Area increases a little
Clock rate doubles
Throughput doubles
Latency (in clock cycles) doubles
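The rules of thumb for parallelization and pipelining can be written down as a small model. The following is a minimal sketch, not part of the original slides: the `Design` record, the function names and the 10% register overhead for a pipeline stage are assumptions chosen for illustration, and the doubling of the clock rate is the idealized case where the new register splits the critical path exactly in half.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Design:
    area: float             # relative area (1.0 = reference block)
    clock_mhz: float        # clock rate in MHz
    items_per_cycle: float  # data items accepted per clock cycle
    latency_cycles: int     # latency in clock cycles

    @property
    def throughput(self) -> float:
        """Throughput in data items per second."""
        return self.items_per_cycle * self.clock_mhz * 1e6


def parallelize(d: Design, copies: int = 2) -> Design:
    """Use several copies of the block side by side."""
    return replace(
        d,
        area=d.area * copies,
        items_per_cycle=d.items_per_cycle * copies,
    )


def pipeline(d: Design, register_overhead: float = 0.1) -> Design:
    """Add one pipeline stage (idealized: critical path is cut in half).

    register_overhead is an assumed relative area cost of the extra registers.
    """
    return replace(
        d,
        area=d.area * (1 + register_overhead),
        clock_mhz=d.clock_mhz * 2,
        latency_cycles=d.latency_cycles * 2,
    )


base = Design(area=1.0, clock_mhz=100.0, items_per_cycle=1.0, latency_cycles=1)
for name, design in [("base", base),
                     ("parallel", parallelize(base)),
                     ("pipelined", pipeline(base))]:
    print(name, design, f"throughput={design.throughput:.3g} items/s")
```

Both transformations double the throughput in this model; they differ in what they cost (area for parallelization, latency in cycles for pipelining).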
Iterative Decomposition
Building a Model
Model Types
Key words in modelling
bit-true: The model mimics the hardware at bit level. Numbers are computed with the same accuracy as in the hardware.
cycle-true: The model accurately replicates how the hardware works in every clock cycle.
transaction-based: A high-level model that works on blocks of data. It calculates the end result of the computation; intermediate steps are not available.
The model is an important part of the simulation environment.
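To make the difference between these model types concrete, here is a minimal sketch for a hypothetical 8-bit accumulator (the register width, the function names and the one-sample-per-cycle behaviour are assumptions, not taken from the slides). Both functions are bit-true because they wrap at 8 bits like a hardware register would. The first one also exposes the register value after every step, as a cycle-true model would, while the second one only returns the end result, like a transaction-based model.

```python
WIDTH = 8                 # assumed register width in bits
MASK = (1 << WIDTH) - 1   # 0xFF, keeps results within WIDTH bits


def accumulate_stepwise(samples):
    """Return the accumulator value after every sample.

    Assumes the hardware consumes one sample per clock cycle, so each
    list entry corresponds to the register content after one cycle.
    """
    acc = 0
    trace = []
    for s in samples:
        acc = (acc + s) & MASK  # wrap around like an 8-bit register
        trace.append(acc)
    return trace


def accumulate_transaction(samples):
    """Return only the final result for the whole block of data."""
    return sum(samples) & MASK


samples = [200, 100, 7]
print(accumulate_stepwise(samples))     # intermediate values are visible
print(accumulate_transaction(samples))  # only the end result
```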
Describing Hardware
Next Lecture
We will discuss this topic in a second lecture:
How to turn an idea into an architecture
How to come up with a block diagram
How to convert a block diagram into VHDL code
Testbenches
[Figure: testbench block diagram with Stimuli Generator, Clock Generator, DUT, Golden Model and Comparator; signals: stimuli, response, expected response, simulation report]
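The testbench structure can be mimicked in software. The sketch below is an illustration only: the "DUT" is a stand-in Python function (a bitwise ripple-carry adder) instead of a simulated hardware description, and all names as well as the 8-bit adder example are assumptions. It shows the roles of the stimuli generator, the golden model and the comparator.

```python
import random


def golden_model(a, b):
    """Reference behaviour: 8-bit addition with wraparound."""
    return (a + b) % 256


def dut(a, b):
    """Stand-in for the design under test: bitwise ripple-carry adder."""
    result, carry = 0, 0
    for i in range(8):
        x, y = (a >> i) & 1, (b >> i) & 1
        s = x ^ y ^ carry                      # sum bit
        carry = (x & y) | (carry & (x ^ y))    # carry into the next bit
        result |= s << i
    return result                              # final carry is dropped


def stimuli_generator(count, seed=0):
    """Yield reproducible pseudo-random input pairs."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.randrange(256), rng.randrange(256)


def run_testbench(count=1000):
    """Compare DUT response and expected response; return all mismatches."""
    mismatches = []
    for a, b in stimuli_generator(count):
        response = dut(a, b)
        expected = golden_model(a, b)
        if response != expected:
            mismatches.append((a, b, response, expected))
    return mismatches


report = run_testbench()
print("simulation report:", "PASS" if not report else f"{len(report)} mismatches")
```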
Verification
Bug hunting
Time consuming: The majority of the design effort is verification.
Exhaustive tests are not feasible: A 32-bit adder has 2^64 possible input combinations. If we check 1,000,000,000 inputs per second, it will take more than 200,000 days (about 585 years)!
Every line we write has a potential for error: People talk about 1 bug every 20 lines of code.
Golden models can be wrong: Sometimes your hardware description is correct, but your model is wrong.
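The cost of an exhaustive test is easy to check with a few lines of arithmetic. This is a small sketch; the function name is made up for illustration, and the rate of one billion input pairs per second is the figure used on the slide.

```python
def exhaustive_test_time(bits_per_operand=32, checks_per_second=10**9):
    """Return (combinations, days, years) for testing a two-operand adder."""
    combinations = 2 ** (2 * bits_per_operand)  # every pair of operands
    seconds = combinations / checks_per_second
    days = seconds / 86_400                     # 24 * 60 * 60 seconds per day
    years = days / 365.25
    return combinations, days, years


combinations, days, years = exhaustive_test_time()
print(f"{combinations} combinations, {days:,.0f} days, {years:,.0f} years")
```

Under these assumptions the exhaustive test of a 32-bit adder takes several hundred years, which is why test vectors have to be chosen selectively.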
Synthesis
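[Figure: charging current I_charge and discharging current I_discharge of a load capacitance C_load]
$I_{D,lin} = \mu C_{ox} \frac{W}{L} \left[ (V_{GS} - V_T) V_{DS} - \frac{V_{DS}^2}{2} \right]$
$I_{D,sat} = \frac{1}{2} \mu C_{ox} \frac{W}{L} (V_{GS} - V_T)^2$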
Driving Current
The geometric parameters directly determine the amount of current that can flow through the transistor.
If the length (L) is constant, wider transistors (larger W) carry more current.
Larger transistors are faster.
Capacitive Loads
Input Capacitance
Input Capacitance
MOS capacitance
The dominant capacitance is the gate capacitance.
The capacitance is proportional to the gate area.
The wider the transistor, the more capacitance it has.
[Figure: cross-section of a MOS transistor showing gate, source and drain (n+), p- bulk, and the gate length and width]
Timing Basics
Summary
If the load stays constant, making a transistor wider will make it faster.
A larger transistor will have a higher input capacitance.
It will be harder to drive this transistor.
Find a balance between driving strength and input capacitance.
The fanout (number of driven gates) of a transistor and the length of the interconnections determine the switching speed.
The exact contribution of the interconnect is not known at early stages and has to be estimated.
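The balance between driving strength and input capacitance can be illustrated with a very simple RC model. The sketch below is an assumption-laden toy example, not a method from the slides: a unit-size driver with resistance `r_unit` drives a gate of relative width `width`, and that gate drives a fixed load `c_load`. The gate's input capacitance grows with its width, while its output resistance shrinks with its width. All values are in arbitrary units.

```python
import math


def two_stage_delay(width, r_unit=1.0, c_gate_unit=1.0, c_load=16.0):
    """Delay of: unit driver -> gate of given width -> fixed load."""
    # The unit driver has to charge the input capacitance of the wider gate.
    driver_delay = r_unit * (c_gate_unit * width)
    # The wider gate has a lower resistance when charging the fixed load.
    gate_delay = (r_unit / width) * c_load
    return driver_delay + gate_delay


def best_width(c_gate_unit=1.0, c_load=16.0):
    """Width that minimizes two_stage_delay (derivative set to zero)."""
    return math.sqrt(c_load / c_gate_unit)


for w in (1, 2, 4, 8, 16):
    print(f"width={w:2d}  delay={two_stage_delay(w):5.1f}")
print("best width:", best_width())
```

In this model a gate that is too small is slow because it cannot drive the load, and a gate that is too large is slow because the stage before it cannot drive the gate.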
Synthesis Constraints
max delay: combinational delay
max area: total circuit area
Setting a constraint does not guarantee the result.
The synthesizer is lazy: if you don't set proper constraints, it will select constraints that make it work less. Always set proper constraints.
Sequential Timing
Timing paths
In a sequential circuit there are 4 different timing paths:
Register to register (delays: t_pd,FF, t_pd,reg2reg, t_setup,FF)
Input to register (delays: t_input, t_pd,in2reg, t_setup,FF)
Register to output (delays: t_pd,FF, t_pd,reg2out, t_output)
Input to output (delays: t_input, t_pd,in2out, t_output)
One of these paths will limit the performance of the system.
[Figure: registers connected through combinational circuits (comb. circuit), annotated with the delays of each timing path]
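Finding the limiting path amounts to adding up the delays along each of the four paths and taking the maximum. The following sketch assumes that the delays along a path simply add up; the dictionary keys and all delay values (in picoseconds) are hypothetical numbers chosen for illustration.

```python
def path_delays(t):
    """Total delay in ps of each of the four timing paths."""
    return {
        "reg2reg": t["pd_ff"] + t["pd_reg2reg"] + t["setup_ff"],
        "in2reg": t["input"] + t["pd_in2reg"] + t["setup_ff"],
        "reg2out": t["pd_ff"] + t["pd_reg2out"] + t["output"],
        "in2out": t["input"] + t["pd_in2out"] + t["output"],
    }


def limiting_path(t):
    """Return (path name, delay in ps, maximum clock rate in MHz)."""
    delays = path_delays(t)
    name = max(delays, key=delays.get)
    max_clock_mhz = 1e6 / delays[name]  # a period of 1e6 ps equals 1 MHz
    return name, delays[name], max_clock_mhz


# Hypothetical delays in picoseconds
timing = {
    "pd_ff": 200, "setup_ff": 100,   # flip-flop propagation and setup
    "input": 500, "output": 500,     # external input and output delays
    "pd_reg2reg": 3000, "pd_in2reg": 2000,
    "pd_reg2out": 1500, "pd_in2out": 1000,
}
name, delay_ps, f_mhz = limiting_path(timing)
print(f"limiting path: {name}, {delay_ps} ps, max clock about {f_mhz:.0f} MHz")
```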
Back-end Design
Floorplanning
How will the chip look?
Determine the total area/geometry of the chip
Place the I/O cells
Place pre-designed macro blocks
Leave room for routing, optimizations, power connections
This is an iterative process: the perfect floorplan cannot be determined from the beginning.
Standard Cells - 2
Power Planning
[Figure: chip layout for power planning; labels: VDD, GND, Power Stripe, Standard Cells, I/O and Corner Pads Placed on the Padframe, Power Pad Connections, Block Halo]
NP-hard problem
What is the best way of placing the cells within a given area so that:
The critical path is minimum: Long interconnections on the critical path add capacitance.
The design is routable: Not all placements can be routed.
The area is minimum: The routing overhead increases area.
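Because the problem is NP-hard, placement tools compare candidate placements with a cost estimate instead of searching for a proven optimum. One commonly used estimate is the half-perimeter wirelength (HPWL): for every net, the half perimeter of the bounding box around its cells. The sketch below is an illustration under that assumption; the slides do not say which cost function a particular tool uses, and the cell names, coordinates and nets are made up.

```python
def hpwl(placement, nets):
    """Half-perimeter wirelength of all nets for a given placement.

    placement: dict mapping cell name -> (x, y) position
    nets: iterable of tuples with the names of the connected cells
    """
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        # Bounding box width plus height estimates the wire needed for the net.
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


nets = [("a", "b"), ("b", "c"), ("a", "c", "d")]
compact = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
spread = {"a": (0, 0), "b": (3, 3), "c": (0, 3), "d": (3, 0)}
print("compact placement:", hpwl(compact, nets))
print("spread placement: ", hpwl(spread, nets))
```

A lower estimate suggests shorter interconnections and therefore less capacitance on the paths, but it says nothing about whether the placement is actually routable.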
Clock Distribution
Clock is the most critical signal
Standard digital systems rely on the clock signal being present everywhere on the chip at the same time (skew).
The clock signal has to be connected to all flip-flops (high fan-out).
Specialized tools insert multi-level buffers (to drive the load) and balance the timing by ensuring the same wire length for all connections.
The following example is a 200 MHz 3D image renderer with roughly 3 million transistors. The clock distribution has:
10,928 flip-flops
9-level clock tree
478 buffers in the clock tree
34 cm total clock wiring
Routing
Determine interconnection
Multiple (3-9) metal routing layers.
Signals on different layers do not intersect.
Vias interconnect metals on adjacent layers.
The longer the interconnection:
The higher the capacitance
The slower the connection
The higher the power consumption
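The effect of wire length can be estimated with a first-order model. This is a sketch under stated assumptions: wire resistance and capacitance grow linearly with length, the delay is approximated by the Elmore delay of a distributed RC line (0.5 * R * C), and the dynamic power is activity * C * Vdd^2 * f. The per-micrometer values, supply voltage, clock rate and activity factor are hypothetical placeholders, not data of a real technology.

```python
def wire_metrics(length_um, r_per_um=0.1, c_per_um=0.2e-15,
                 vdd=1.8, freq_hz=200e6, activity=0.5):
    """Return (capacitance in F, delay in s, dynamic power in W) of a wire."""
    resistance = r_per_um * length_um    # ohm, grows linearly with length
    capacitance = c_per_um * length_um   # farad, grows linearly with length
    # Elmore delay of a distributed RC line: grows with the length squared.
    delay_s = 0.5 * resistance * capacitance
    # Power needed to charge and discharge the wire capacitance.
    power_w = activity * capacitance * vdd ** 2 * freq_hz
    return capacitance, delay_s, power_w


for length in (100, 1000, 2000):
    c, d, p = wire_metrics(length)
    print(f"{length:5d} um: C={c * 1e15:6.1f} fF  "
          f"delay={d * 1e12:6.2f} ps  power={p * 1e6:6.2f} uW")
```

In this model, doubling the wire length doubles the capacitance and the power, and quadruples the wire delay.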
Extraction
Optimization
Production rules
Every physical layer has limits that are determined by the production flow. These include:
Minimum spacing
Minimum width
[Figure: layout of a PMOS transistor, an NMOS transistor and a P-substrate contact (gnd!), annotated with rule dimensions 0.25, 0.4, 0.5 and 0.6]
Tape-out
Testing
[Figure: design flow with the tools used. Steps: Synthesis, Test Insertion, Fault Grading, Gate-level Sim., Placement, ClockTree, Routing, Timing, DRC, LVS. Tools: Synopsys, Modelsim, Tetramax, Silicon Encounter, Pearl, Calibre]