Notes7 Applying Synthesis Constraints
8.1 Introduction
All synthesis tools must have a method of constraining the output netlist they generate. There are
numerous synthesis constraints that need to be applied to ensure the output netlist will work in the
final application. It is important to note that most design constraints govern speed and area;
usually one is at the expense of the other. We'll discuss the important constraints using the world's
most widely used 3rd party synthesis tool, Synopsys Design Compiler. All examples I discuss
will synthesize using Design Compiler.
REFERENCE: much of this material was created using the Synopsys Chip Synthesis Workshop
Student Guide training notes as a guide.
Synthesis tools give the best results when synthesizing registers, not large pools of
combinational logic. For this reason, it is best to synthesize to a register boundary and, in
fact, this is a logical place to break a circuit anyway. Synopsys Design Compiler will use
registers as synthesis boundaries when it synthesizes your VHDL code.
It then makes sense to partition our design into synthesizable regions, that is,
to choose regions targeted for synthesis that terminate on a register boundary. If we
have two separate VHDL entities that communicate with each other without registers at the
boundaries, then it likely wouldn't make sense to synthesize these entities separately.
Combine the entities into a single synthesizable region. Also, very large architectural
bodies don't synthesize well, so partition the VHDL code into manageable synthesizable
blocks. My preference is to use separate entity/architecture pairs, not the BLOCK
statement; this is cleaner.
To keep the VHDL code technology independent, we avoid hand instantiating technology
components from the target library as a means of obtaining the desired circuit. Instead, we
apply the appropriate synthesis constraints (and tinker with the VHDL code) until we get
the desired circuit.
Recall the lectures on "Coding for Synthesis". They tie in with this section.
Looking at these seven types of design objects from a VHDL perspective:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY top_level IS
  PORT ( clock   : IN  std_logic;
         input1  : IN  std_logic;
         input2  : IN  std_logic_vector (7 DOWNTO 0);
         output1 : OUT std_logic);
END top_level;
-hierarchy (optional): use this option if all objects within a hierarchical design are
to be returned. Only works with these object types: design, net, cell or pin.
Design Compiler can be run using a menu-driven GUI (design_analyzer) or from a text-only
command prompt (dc_shell). With design_analyzer, the user selects menu commands
that are, in turn, sent to Design Compiler. With dc_shell, the user inputs commands directly
to Design Compiler (text only). Because there are numerous commands that can only be
invoked from the command prompt, this is what we will use in this course. Walk through
design_analyzer in a lab slot sometime.
After invoking dc_shell, the first thing you need to do is read, analyze and elaborate the
VHDL file that we are targeting for synthesis. The analyze command automatically reads
the VHDL file (so we tend not to invoke separate read commands) and performs basic
syntax checks and such to ensure that the VHDL code is synthesizable. The elaborate
command builds the design from the analyzed VHDL file: it "translates" the VHDL code
and maps the logic to "generic" boolean logic cells. Note that analyzing and elaborating the
VHDL code WILL pick up some problems with the code that are missed by compilation
using NC-VHDL. For example, the elaborate command picks up things like missing
signals from process sensitivity lists. The syntax of the analyze and elaborate commands
is:
analyze -format VHDL -library library_name ../pathname/filename.vhd
elaborate entity_name -library library_name -architecture architecture_name
At this point, we should probably check the design for problems like connectivity, shorts,
opens, multiple instantiations, etc. To do this, use the command check_design.
We will go into the many constraints that need to be applied to our design to ensure the
resulting netlist will work in our intended application in the lectures following. For now,
we're still getting familiar with basic synthesis commands. Once these synthesis constraints
are applied, we need to "optimize" and "map" the design to the target technology, i.e. take
the analyzed and elaborated (and now constrained) VHDL the last mile and convert it into
a netlist. We do this using the compile command, not to be confused with the compile
command used by NC-VHDL. A better Synopsys command name might have been "synthesize"
or "optimize-and-map"; nonetheless, we have to deal with "compile". It means optimize
and map to the target technology as per the applied constraints. To "optimize-and-map",
simply invoke compile at the dc_shell prompt.
Here is a list of the synthesis constraints we'll discuss in the following lectures, in no
particular order:
MAX AREA, CLOCK FREQUENCY, SETUP TIME, HOLD TIME, PROPAGATION
DELAY, OUTPUT LOADING, INPUT DRIVE STRENGTH, MAX TRANSITION
TIMES, MAX CAPACITANCE, CLOCK SKEW AND UNCERTAINTY, WIRE LOAD
MODELS, MAX FANOUT, OPERATING CONDITIONS.
And we'll also discuss generation of reports and how to write out the netlist once we've
compiled the design. So let's begin.
As you will see later in the course (Physical Design module), physical design (layout) tools
today are usually of two flavors: congestion driven layout (CDL) and timing driven layout
(TDL). The preference in industry is to use CDL where possible, only using TDL when
absolutely necessary to meet timing. The reason, quite simply, is that CDL usually results in a
smaller die, and this, in turn, results in a lower cost ASIC for 2 reasons: less silicon area
and higher yield. We get higher yield because the probability of a die being bad is less
when the die is smaller.
For synchronous designs, we need to (a) specify a clock, and (b) specify the I/O timing
relative to that clock. First we specify a clock.
When we specify a clock, we need to provide 3 pieces of information: the clock source
(port or pin), the clock period and the duty cycle. We specify the clock using the following
dc_shell command:
create_clock -name clock-name -period period-in-ns -waveform {first-edge second-edge} find(port, portname)
The clock-name is a label for the clock. For example, you may decide to give a label of
"write_clock" to a wclk input port. The -waveform parameter is optional, and if excluded, a
50/50 duty cycle will be assumed. This is usually what we want, so we exclude this
parameter. The "find(port, portname)" could alternately be replaced with
"find(pin, pinname)".
In practice, designers over-constrain the clock period by about 20% during synthesis. There
are two reasons for this: one, and this is the primary reason, to provide some timing
margin in the pre-layout netlist, and two, to avoid having to perform duty cycle checks for
both ends of the duty cycle range, which is usually specified as 60/40 and 40/60. So a 100
MHz clock with a 10 ns period would be specified as an 8 ns clock period during synthesis.
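The over-constraint arithmetic can be checked with a minimal Python sketch. This is not part of the original notes: the function names are illustrative assumptions, and the 20% margin and 60/40 duty cycle are the figures quoted above. It also shows one way to read the second reason: an 8 ns period at 50/50 duty has the same 4 ns phase as the shortest phase of a 10 ns clock at 60/40.

```python
def overconstrained_period_ns(freq_mhz: float, margin: float = 0.20) -> float:
    """Return the clock period (ns) to use in synthesis after tightening by `margin`."""
    nominal_period_ns = 1000.0 / freq_mhz  # e.g. 100 MHz gives 10 ns
    return nominal_period_ns * (1.0 - margin)


def shortest_phase_ns(period_ns: float, duty: float) -> float:
    """Return the shorter of the two clock phases (ns) for a duty cycle in 0..1."""
    return period_ns * min(duty, 1.0 - duty)


synth_period = overconstrained_period_ns(100.0)
real_worst_phase = shortest_phase_ns(10.0, 0.60)           # real clock, 60/40 duty
synth_phase = shortest_phase_ns(synth_period, 0.50)        # synthesis clock, 50/50 duty

print(f"synthesis clock period : {synth_period:.1f} ns")
print(f"shortest real phase    : {real_worst_phase:.1f} ns")
print(f"synthesis clock phase  : {synth_phase:.1f} ns")
```

Because the two phase values coincide, logic that meets timing against the over-constrained 50/50 clock is also covered at either end of the 60/40 to 40/60 range, under the assumptions stated here.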
For synchronous designs, once the clock is specified, then we specify the I/O timing
relative to that clock. Let's consider inputs first, then outputs.
Draw picture of clock/data showing required setup and hold time. Let's assume a
clock period of 10 ns, and let's assume that all logic is clocked on the rising edge of
the clock.
Show sourcing FF propagation delay, upstream combinational logic delay, and the
desired setup time of the capture logic/FF.
Let's assume a clock period of 10 ns, and let's assume that all logic is clocked on the
rising edge of the clock; this is the capture edge. We wish to specify a maximum
output propagation delay of 3 ns and a minimum output propagation delay of 1 ns.
Specifying a maximum output propagation delay of 3 ns is equivalent to saying "the
sink of that output will have its input valid 7 ns (10 - 3) before the capture clock
edge".
Show sourcing FF, external logic feeding downstream capture FF, and associated
delays.
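The output timing budget in this example can be written out as a small Python sketch. The function names are illustrative assumptions, not tool commands. The hold-side reading (the previous value stays on the output for at least the minimum propagation delay after the clock edge) is my interpretation of the 1 ns minimum delay, not something stated explicitly in these notes.

```python
def setup_window_at_sink_ns(period_ns: float, max_output_delay_ns: float) -> float:
    """Time (ns) the output is valid before the next capture clock edge."""
    return period_ns - max_output_delay_ns


def hold_window_at_sink_ns(min_output_delay_ns: float) -> float:
    """Time (ns) the previous output value remains after the capture clock edge."""
    return min_output_delay_ns


period = 10.0          # ns, from the example above
max_prop_delay = 3.0   # ns, maximum output propagation delay
min_prop_delay = 1.0   # ns, minimum output propagation delay

print(f"valid before capture edge: {setup_window_at_sink_ns(period, max_prop_delay):.1f} ns")
print(f"held after capture edge  : {hold_window_at_sink_ns(min_prop_delay):.1f} ns")
```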
The set_drive command is used to specify the drive resistance of input or bi-directional
ports. The drive resistance is specified as the ratio of time/load. For example, a drive
resistance of 5 means that the rise and fall ramp times on that input or bi-directional port are
5 ns per pF. It follows that a drive resistance of 0 denotes infinite drive strength; this is
typically what's used on clock ports to prevent the synthesis tool from creating its own
buffer tree (which is usually unbalanced, i.e. unbalanced rise/fall times) on the clock net.
If no drive resistance is specified, the default is 0 (infinite drive strength). The syntax of the
set_drive command is as below.
set_drive resistance port-list
As an example, to set a drive resistance of 2 on the input port IN1, we would specify the
synthesis constraint as set_drive 2 find(port, IN1). To set a drive resistance of 1 on all
inputs, we would specify the synthesis constraint as set_drive 1 all_inputs().
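Since drive resistance is a time/load ratio, the ramp time it implies on a port is simply the resistance multiplied by the capacitive load. A minimal Python sketch of that relationship follows. The function name and the 0.2 pF load are hypothetical values chosen for illustration; only the 5 ns/pF and 0 figures come from the text.

```python
def ramp_time_ns(drive_resistance_ns_per_pf: float, load_pf: float) -> float:
    """Ramp time (ns) implied by a drive resistance (ns/pF) driving a load (pF)."""
    return drive_resistance_ns_per_pf * load_pf


# A drive resistance of 5 ns/pF into a hypothetical 0.2 pF load.
print(f"drive 5, 0.2 pF load: {ramp_time_ns(5.0, 0.2):.2f} ns")

# A drive resistance of 0 models infinite drive strength: no ramp time at any load.
print(f"drive 0, 0.2 pF load: {ramp_time_ns(0.0, 0.2):.2f} ns")
```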
The set_load command is used to specify the load value on output or bi-directional ports
(and sometimes nets). The load is specified in unit loads, usually in units of pF.
As an example, to set a load of 2 pF on the output port OUT1, we would specify the
synthesis constraint as set_load 2 find(port, OUT1). To set a load of 5 pF on all outputs, we
would specify the synthesis constraint as set_load 5 all_outputs().
Process is scaled based on the limitations of the foundry that will be manufacturing your
silicon circuit. Note that two of the world's leading foundries are Taiwan Semiconductor
Manufacturing Company (TSMC) and Chartered Semiconductor Manufacturing Company (CSMC), and
some organizations, like Intel, have their own foundry. Voltage is scaled somewhere
between 5% and 10%, depending on the technology. Temperature is scaled somewhere
in the -40C to +125C range. Typical operating conditions are usually represented as
nominal process, nominal voltage and +25C. For 0.18 micron, nominal voltage is 1.8V.
For 0.13 micron, nominal voltage is 1.2V.
Worst case conditions result in maximum cell and wire delays. Worst case operating
conditions exist under worst case process, lowest voltage and highest temperature. Best
case operating conditions exist under best case process, highest voltage and lowest
temperature.
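The three operating corners described above can be tabulated with a short Python sketch. The `Corner` class and `operating_corners` function are hypothetical names, and the 10% voltage tolerance is an assumption picked from the 5% to 10% range quoted earlier. Real corner values come from the technology library, not from a calculation like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corner:
    process: str
    voltage_v: float
    temperature_c: float


def operating_corners(nominal_v: float, v_tolerance: float = 0.10) -> dict:
    """Build worst/typical/best corners from a nominal voltage and tolerance."""
    return {
        # Worst case: worst process, lowest voltage, highest temperature.
        "worst": Corner("worst", nominal_v * (1.0 - v_tolerance), 125.0),
        # Typical: nominal process, nominal voltage, +25C.
        "typical": Corner("nominal", nominal_v, 25.0),
        # Best case: best process, highest voltage, lowest temperature.
        "best": Corner("best", nominal_v * (1.0 + v_tolerance), -40.0),
    }


for name, corner in operating_corners(1.8).items():  # 0.18 micron, 1.8 V nominal
    print(f"{name:8s} process={corner.process:8s} "
          f"V={corner.voltage_v:.2f} T={corner.temperature_c:+.0f}C")
```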
In industry practice, it is typical to synthesize your VHDL code using worst case operating
conditions. Both worst case and best case operating conditions are used for verifying
whether or not the circuit meets timing. And occasionally, best case operating conditions
We specify the operating conditions we want to use with the following Design Compiler
command:
set_operating_conditions -library library_name condition
You may determine which libraries and operating conditions are available by using the
following Design Compiler commands.
list -libraries and report_lib library_name
A wire load model is specified in Design Compiler using the following command.
set_wire_load wire_load_name -mode mode
The wire_load_name is the name of the wire load model (unique to the technology and library
used). If the wire_load_name is numeric rather than text, use double quotes to surround the
name. The mode option can specify which wire load model to use for nets that cross
hierarchical boundaries.
If no wire load model is specified, and if the technology library permits, Design Compiler
will select its own according to the area of the synthesized circuit. This is not
recommended, as larger designs will likely result in significantly longer synthesis run times.
You should explicitly specify your wire load model.
9. Transition times
Loosely defined, transition time is the time it takes for a voltage level to transition from
one state to another. For example, for a buffer to drive a net from logic 0 to logic 1, the
transition time would be measured as the time required to drive the net from 10% to 90% of
the voltage rail corresponding to logic 1. Draw a picture of a signal transitioning from logic
0 to logic 1 and show the 10% and 90% points on the waveform.
We need to specify a maximum acceptable transition time to help the synthesis tool
appropriately size the standard logic cells that drive nets internal to the design. The
maximum acceptable transition time is technology dependent. Most technology libraries
typically specify a default transition time. However, we should not depend on this and we
should explicitly specify the maximum transition time. For 0.18 micron, we specify the
maximum acceptable transition time as 1.5 ns on the entire design being synthesized. This
means that the time it takes to drive the logic level on any given net or port from 0 to 1, or
vice-versa, cannot exceed 1.5 ns.
Design rule constraints, such as transition time, cannot be violated at any cost, even if it
means violating optimization constraints like timing and area. During synthesis, design rule
constraints are always given higher priority than timing constraints.
The maximum transition time is specified in Design Compiler using the following command.
set_max_transition transition_time object_list
Because we want to apply the maximum transition time design rule constraint to the entire
design, we specify set_max_transition 1.5 find(design).
Note: the only exception is nets targeted for clock tree synthesis (CTS), such as clock nets,
global reset lines, etc. Nets targeted for CTS will be treated as ideal nets (more on this later).
The maximum capacitance is specified in Design Compiler using the following command.
set_max_capacitance capacitance_value object-list
Because we want to apply the maximum capacitance design rule constraint to the entire
design, we specify set_max_capacitance 1.5 find(design).
11. Fanout
The fanout of a net is the physical number of wires that a cell fans out to. To prevent
routing congestion, as well as to help the synthesis tool meet maximum transition and
capacitance constraints, we need to specify the maximum fanout that nets will have in the
design. The maximum fanout we specify is technology dependent. For 0.18 micron, we
specify the maximum fanout for all designs being synthesized to be 15.
As with maximum transition and capacitance constraints, maximum fanout is also a design
rule constraint that cannot be violated no matter what the cost. This means that during
synthesis, it has higher priority than timing constraints and will not be violated, even if it
means violating optimization constraints like timing and area.
Because we want to apply a maximum fanout design rule constraint of 15 to the entire
design, we specify set_max_fanout 15 find (design).
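To illustrate how the design rule limits apply to signal nets but not to ideal nets, here is a minimal Python sketch. It is a conceptual model only and not how Design Compiler checks design rules internally. The net names and their transition/fanout figures are hypothetical; the 1.5 ns and 15 limits are the 0.18 micron values given in these notes.

```python
MAX_TRANSITION_NS = 1.5  # maximum transition time for 0.18 micron (from the notes)
MAX_FANOUT = 15          # maximum fanout for 0.18 micron (from the notes)


def design_rule_violations(nets: dict, ideal_nets: frozenset = frozenset()) -> list:
    """Return (net, rule) pairs for nets exceeding the limits, skipping ideal nets."""
    violations = []
    for name, (transition_ns, fanout) in nets.items():
        if name in ideal_nets:
            continue  # nets targeted for CTS are treated as ideal and not checked
        if transition_ns > MAX_TRANSITION_NS:
            violations.append((name, "max_transition"))
        if fanout > MAX_FANOUT:
            violations.append((name, "max_fanout"))
    return violations


# Hypothetical nets: name -> (transition time in ns, fanout)
example_nets = {
    "data_bus": (0.9, 8),
    "enable": (1.8, 22),
    "wclk": (3.0, 500),
}

for net, rule in design_rule_violations(example_nets, ideal_nets=frozenset({"wclk"})):
    print(f"{net}: violates {rule}")
```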
12. Clock skew and other ideal nets targeted for clock tree synthesis (CTS)
When ASICs were relatively small, the clock latency and clock skew on the clock network
were relatively insignificant when compared with the clock frequencies used and the
CLK-to-Q delay through a FF. Because of this, clock latency and clock skew were mostly
ignored, and the clock network was simply driven by a single clock buffer pad at the device
level, large enough to drive the current required to clock all the sequential elements in the
ASIC.
Clock frequencies in the single to low double-digit MHz range with just a few ns of latency
on the clock network resulted in very little performance penalty because of latency. For
example, for a 10 MHz clock (100 ns period) with 2 ns of latency on the clock network /
spine, the performance penalty as a result of latency was a mere 2% (2/100).
And FF CLK-to-Q delays were in the 2-3+ ns range, so clock skew was of little concern. For
example, for a FF CLK-to-Q delay of 3 ns and a clock skew of 2 ns, we would be guaranteed that
the Q output of a FF never changes on ANY FF until ALL FFs have been clocked.
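The two numerical claims above can be checked with a short Python sketch. The function names are illustrative assumptions. The second call uses a hypothetical 0.3 ns CLK-to-Q delay with 0.35 ns of skew to show how the same check fails once FF delays shrink to hundreds of picoseconds.

```python
def latency_penalty(latency_ns: float, period_ns: float) -> float:
    """Fraction of the clock period consumed by clock network latency."""
    return latency_ns / period_ns


def q_stable_until_all_clocked(clk_to_q_ns: float, skew_ns: float) -> bool:
    """True if no Q output can change before every FF has seen the clock edge."""
    return clk_to_q_ns > skew_ns


# Figures from the text: 10 MHz clock (100 ns period), 2 ns of latency.
print(f"latency penalty: {latency_penalty(2.0, 100.0):.0%}")

# Figures from the text: 3 ns CLK-to-Q delay, 2 ns of skew.
print("older ASIC safe :", q_stable_until_all_clocked(3.0, 2.0))

# Hypothetical faster technology: 0.3 ns CLK-to-Q delay, 0.35 ns of skew.
print("faster FF safe  :", q_stable_until_all_clocked(0.3, 0.35))
```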
Note also that because ASICs were small, the current drive strength of the clock buffer was
relatively low (double digit to low triple digit mA) and quite manageable without any
electro-migration or wire reliability problems.
However, as ASICs grew in size to modern-day multi-million gate ASICs, all 3 of these
issues created problems with the conventional so-called super-clock-buffer approach to
driving a clock network: (a) latency, (b) skew, and (c) current drive and
electro-migration/reliability problems.
The power consumed by a single clock buffer driving this kind of load at 200 MHz is:
$$P_{\text{spine}} = 200\ \text{MHz} \times 2\ \text{cm} \times 5\ \text{pF/cm} \times 200\ \text{lines} \times (1.8\ \text{V})^2 = 1.296\ \text{W}$$
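The spine power figure can be reproduced with a minimal Python sketch using the f x C x V^2 form of the expression above. The function and variable names are illustrative assumptions; the numbers are those given in the equation.

```python
def switching_power_w(freq_hz: float, capacitance_f: float, vdd_v: float) -> float:
    """Switching power (W) computed as frequency x capacitance x voltage squared."""
    return freq_hz * capacitance_f * vdd_v ** 2


line_length_cm = 2.0
cap_per_cm_f = 5e-12      # 5 pF/cm
num_lines = 200

spine_capacitance_f = line_length_cm * cap_per_cm_f * num_lines  # 2000 pF in total
power_w = switching_power_w(200e6, spine_capacitance_f, 1.8)

print(f"spine capacitance: {spine_capacitance_f * 1e12:.0f} pF")
print(f"spine power      : {power_w:.3f} W")
```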
The result of growing ASICs forced us into another approach for distributing the clock
network across the chip. And that approach was to create a buffer tree on the clock network,
simply because it was impossible to drive increasingly large currents without burning up
the wires that constitute the clock spine on the chip.
Such a clock tree network, by its very nature, introduced latency. And as technologies
increased in density and speed, clock skew started becoming a very significant issue as the
CLK-to-Q delay of FFs started approaching hundreds of ps.
It must be noted that clock trees are not the most optimal solution from an area and power
perspective, and often consume more power than the clock spine approach. Nonetheless,
ASIC designers were forced to use clock trees out of physical and electrical necessity.
This means that to minimize latency through the clock tree, we choose the clock tree to
satisfy this ratio requirement. Note that this is not the lowest power method, but we are
trading off power for latency. Perhaps in a power sensitive ASIC where latency is not a
problem, an alternate clock tree may be chosen to minimize power at the expense of
latency.
The diagram below illustrates the ratio between input and load capacitance for CMOS
gates to minimize latency.
We want Cload / Cin to be equal to "e". Roughly speaking, this translates to a fanout of
approximately 3 at each stage of the clock tree. The number of stages in the clock tree then
becomes:
$$\#\ \text{clock tree stages} = \ln\left[\frac{2000\ \text{pF}}{0.01\ \text{pF}}\right] \approx 12.2$$
where 2000 pF is the total capacitance of the clock network spine and 0.01 pF is the input
capacitance of the CLK input of each FF.
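The stage count can be evaluated numerically with a short Python sketch. The function name is an illustrative assumption, and how to round the result to a whole number of stages is left open here because the notes do not specify it.

```python
import math


def clock_tree_stages(total_load_pf: float, ff_clk_input_pf: float) -> float:
    """Number of stages when each stage drives e times its own input capacitance."""
    return math.log(total_load_pf / ff_clk_input_pf)


stages = clock_tree_stages(2000.0, 0.01)  # natural log of 200,000
print(f"clock tree stages: {stages:.1f}")
print(f"per-stage ratio  : {math.e:.2f} (roughly a fanout of 3)")
```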
We have to live with the latency created by the clock tree and design around it to satisfy
any device level setup/hold/propagation delay constraints. Note that we will discuss delay
lock loops (DLLs) as one method to achieve this later in the course.
Modern day CAD tools allow us to perform an automated clock tree synthesis (CTS) on the
ASIC design. Such a CTS process takes into account the size, power, technology, etc. of
the ASIC and attempts to:
a. minimize skew
b. minimize latency
c. minimize power
probably in that order.
In addition, CTS allows us to balance/manage transition times on long nets. The transition
time is technology specific (for 0.18 micron, we will constrain the maximum transition
time to 1.5 ns on signal nets during synthesis). Note, however, that we won't constrain the
transition time on clock nets. Rather, we treat them as ideal nets. This is because we do not
want the synthesis tool to insert its own buffers on the clock network; we want a separate
CTS tool to perform the buffer insertion on the clock network. This ensures that we get a
balanced clock tree with minimal skew, minimal latency, lowest power and a transition time
that does not violate that specified for the technology of the ASIC.
CTS is often performed by a separate department, the physical design (or layout) group,
and not by the ASIC designers themselves.
Final note: when ASICs get really large (and they are today), we also perform CTS on non-
clock nets that have high fanouts. This is not to manage the skew, but rather to deal with
what would otherwise be high transition time violations. Such nets often include
asynchronous chip level resets.
We specify the following design constraints with respect to all clock networks, ports and
nets.
set_clock_uncertainty uncertainty find( clock, clock-name )
set_dont_touch_network find(port, clock-port)
The set_clock_uncertainty design constraint specifies the clock skew on a clock network.
Note that the clock uncertainty design constraint depends on what the physical design
(layout) tools are capable of achieving for a particular ASIC size and technology.
The set_dont_touch_network design constraint explicitly tells the synthesis tool not to
modify or replace any objects in the clock network during optimization.
The set_drive design constraint ensures that the synthesis tool doesn't insert buffers on the
clock network.
For the design project, we wish to perform CTS on both the write clock, WCLK, and the
read clock, RCLK, so we provide the following design constraints (shown only for WCLK,
but will also require identical design constraints for RCLK). Note that we have specified
350 ps of clock uncertainty (skew).
set_clock_uncertainty 0.35 find(clock, wclk)
set_dont_touch_network find(port, wclk)
set_drive 0 find(port, wclk)
set_resistance 0 find(net, wclk)
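Since RCLK needs the same four constraints as WCLK, a small Python sketch can generate both sets from one template. This is only a convenience illustration: the function name is hypothetical, the command strings are copied from the WCLK example above, and it assumes the clock label, port and net all share the same name.

```python
def cts_clock_constraints(clock: str, uncertainty_ns: float = 0.35) -> list:
    """Return the four clock constraints from the notes for one clock name."""
    return [
        f"set_clock_uncertainty {uncertainty_ns} find(clock, {clock})",
        f"set_dont_touch_network find(port, {clock})",
        f"set_drive 0 find(port, {clock})",
        f"set_resistance 0 find(net, {clock})",
    ]


# Emit the constraints for both the write clock and the read clock.
for clock_name in ("wclk", "rclk"):
    print("\n".join(cts_clock_constraints(clock_name)))
```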
At the top level of an ASIC, if we wish to model the clock tree latency pre-layout and CTS,
then we may additionally use the design compiler command below.
If we are performing CTS on non-clock nets (e.g. rstb), then we do not provide a clock
uncertainty constraint, and we additionally provide the design constraint below.
set_ideal_net find(net, net_name)
The set_ideal_net design constraint avoids the reporting of maximum capacitance and
maximum fanout violations resulting from the large capacitive values and large fanouts that
For the design project, we wish to perform CTS on the RSTB network, so we provide the
following design constraint.
set_ideal_net find(net, rstb )