Verilog FAQ TIDBITS
VCD - Value Change Dump format - is an ASCII file that contains the changes in values of signals. This is a standard format and is compatible across different waveform viewers. Most simulators, both VHDL and Verilog, can write out VCD files, though in Verilog you can do it more easily (in VHDL you typically have to go through your simulator's C API) using system tasks like $dumpvars.
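For example, a minimal Verilog fragment that writes a VCD file might look like this (the file name and testbench name are arbitrary choices):

module tb;
  reg clk;

  // ... instantiate the design under test here ...

  initial begin
    $dumpfile("waves.vcd"); // name of the VCD file to create
    $dumpvars(0, tb);       // dump every signal below the scope 'tb'
  end

  initial clk = 1'b0;
  always #5 clk = ~clk;     // free-running clock just for the example

  initial #100 $finish;
endmodule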
There are two broad classes of logic simulation:
• Event Driven
• Cycle Based
Event-based Simulator:
This Digital Logic Simulation method sacrifices performance for rich functionality:
every active signal is calculated for every device it propagates through during a
clock cycle. Full Event-based simulators support 4-28 states; simulation of
Behavioral HDL, RTL HDL, gate, and transistor representations; full timing
calculations for all devices; and the full HDL standard. Event-based simulators
are like a Swiss Army knife with many different features but none are particularly
fast.
Compiled-Code Simulators:
This technique takes the input definition (HDL) of the design and spends time compiling it into a new data structure in order to enable much faster calculations at run time. You sacrifice compile time to be able to run large numbers of tests faster. It is used in some high-end, event-based simulators.
e.g. Synopsys Inc.'s VCS simulator converts Verilog files into C code, which is then compiled and run just like any other executable file. It can be 10 to 50 times faster than an interpretive simulator.
see http://www.synopsys.com/products/simulation/vcs_ds.html
Cadence's Native Compiled Verilog generates machine-language instructions directly from Verilog files.
see http://www.cadence.com/datasheets/affirma_nc_verilog_sim.html
Interpreted-Code Simulators:
This method of simulation allows for rapid change of the source HDL of the design and restart of the simulation, since there is little or no compilation involved after every design change. This is good for interaction but leads to poor run times on large tests compared to compiled-code techniques. Cadence's Verilog-XL family is an example of this approach.
see http://www.cadence.com/technology/pcb/products/prev_ds/verilog-xl-family.html
Cycle-based Simulators:
This simulation method gains speed by making two simplifications:
1.) Results are only examined at the end of every clock cycle; and
2.) The digital logic is the only part of the design simulated (no timing calculations).
By limiting the calculations, cycle-based simulators can provide huge increases in performance over conventional event-based simulators. Cycle-based simulators are more like a high-speed electric carving knife in comparison, because they focus on a subset of the biggest problem: logic verification.
Cycle-based simulators are almost invariably used along with a static timing verifier to compensate for the lost timing coverage.
The following table summarizes the differences between event-based and cycle-based simulation.
Consider the circuit below: if a cycle-based simulator runs a simulation on this circuit, it will evaluate B, C, D and E only once per clock cycle. An event-based simulator evaluates B, C, D and E not only at the clock edge, but also whenever an event occurs at the inputs of the gates and flip-flops.
• Compiled Simulator : This kind of simulator converts the whole Verilog code into machine-dependent code and then runs the simulation. Example: VCS generates a binary file, which can be run from the command prompt. Compiled simulators are very fast.
• Interpreted Simulator : This kind of simulator executes the code line by line and is thus very slow compared to a compiled simulator. Verilog-XL is one such simulator.
TIDBITS
Well I had this doubt when I was learning Verilog: what is the difference between reg and wire? I won't tell stories to explain this; instead I will give you some examples to show the difference.
From our college days we know that a wire is something which connects two points and does not have any driving strength of its own. In the figure below, in_wire is a wire which connects the AND gate input to the driving source, clk_wire connects the clock to the flip-flop clock input, and d_wire connects the AND gate output to the flip-flop D input.
There is something else about wire which sometimes confuses people: the wire data type can be used for connecting an output port to its actual driver. Below is the code which, when synthesized, gives an AND gate as output; as we know, an AND gate can drive a load.
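A minimal sketch of such code (module and port names are my own choices, not taken from the original figure):

module and_gate (a, b, y);
  input  a;
  input  b;
  output y;

  wire y;            // the output port is a wire...

  assign y = a & b;  // ...driven by a continuous assignment (the AND gate)

endmodule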
SYNTHESIS OUTPUT
What this implies is that wire is used for designing combinational logic, since, as we all know, this kind of logic cannot store a value. As you can see from the example above, a wire can be assigned a value with an assign statement. The default data type is wire: if you declare a signal without specifying reg or wire, it will be a 1-bit wide wire.
Now, coming to the reg data type: a reg can store a value and hold it until the next assignment. Something we need to know about reg is that it can be used for modeling both combinational and sequential logic. A reg can only be assigned from within initial and always blocks.
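A sketch of the reg-based version of the same AND gate (only the output name y is taken from the text below; the rest of the names are mine):

module and_gate_reg (a, b, y);
  input  a;
  input  b;
  output y;

  reg y;                   // y is declared as reg, assigned inside an always block

  always @ (a or b) begin  // combinational: sensitive to every input
    y = a & b;             // blocking assignment for combinational logic
  end

endmodule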
SYNTHESIS OUTPUT
This gives the same output as the assign statement, the only difference being that y is declared as reg. There are distinct advantages to having a reg model a combinational element; the reg type is needed when a "case" statement is used (refer to the Verilog section for more on this).
To model a sequential element using reg, we need an edge-sensitive event (posedge or negedge) in the sensitivity list of the always block.
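A minimal sketch of a sequential element coded this way (names are my own choices):

module dff (clk, d, q);
  input  clk;
  input  d;
  output q;

  reg q;                        // q holds its value between clock edges

  always @ (posedge clk) begin  // edge-sensitive event in the sensitivity list
    q <= d;                     // nonblocking assignment for sequential logic
  end

endmodule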
SYNTHESIS OUTPUT
There is a difference in the way we assign to a reg depending on what is being modeled: when modeling combinational logic we use blocking assignments, while when modeling sequential logic we use nonblocking ones.
Blocking Statements: A blocking statement must complete before the statements that follow it in a sequential block are executed. In the example below, the first statement to execute is a = #10 1'b1; only after it finishes at time 10 does b = #20 1'b0 start, and so on.
module block_nonblock();
reg a, b, c, d, e, f;

// Blocking assignments
initial begin
  a = #10 1'b1; // The simulator assigns 1 to a at time 10
  b = #20 1'b0; // The simulator assigns 0 to b at time 30
  c = #40 1'b1; // The simulator assigns 1 to c at time 70
end

// Nonblocking assignments
initial begin
  d <= #10 1'b1; // The simulator assigns 1 to d at time 10
  e <= #20 1'b0; // The simulator assigns 0 to e at time 20
  f <= #40 1'b1; // The simulator assigns 1 to f at time 40
end

endmodule
Example - Blocking
module blocking (clk, a, c);
input  clk;
input  a;
output c;

wire clk;
wire a;
reg  c;
reg  b;

always @ (posedge clk)
begin
  b = a;
  c = b;
end

endmodule
Synthesis Output
Example - Nonblocking
module nonblocking (clk, a, c);
input  clk;
input  a;
output c;

wire clk;
wire a;
reg  c;
reg  b;

always @ (posedge clk)
begin
  b <= a;
  c <= b;
end

endmodule
Synthesis Output
Introduction
There are many ways to code these state machines, but before we get into the coding
styles, let's first understand the basics a bit. There are two types of state machines:
• Mealy State Machine : Its output depends on the current state and the current inputs. In the above picture, the blue dotted line makes the circuit a Mealy state machine.
• Moore State Machine : Its output depends on the current state only. In the above picture, when the blue dotted line is removed the circuit becomes a Moore state machine.
Depending on the need, we can choose the type of state machine. In practice we end up using a Mealy FSM most of the time.
Encoding Style
Since we need to represent the state machine in a digital circuit, we need to represent
each state in one of the following ways:
• Binary encoding : each state is represented in binary code (i.e. 000, 001, 010, ...)
• Gray encoding : each state is represented in gray code (i.e. 000, 001, 011,...)
• One Hot : only one bit is high and the rest are low (i.e. 0001, 0010, 0100, 1000)
• One Cold : only one bit is low, the rest are high (i.e. 1110,1101,1011,0111)
Example
To help you follow the tutorial, I have taken a simple arbiter as the example; it has two request inputs and two grant outputs, as shown in the signal diagram below. We can translate this symbolically into the FSM diagram shown in the figure below; the FSM has the following states:
• IDLE : In this state the FSM waits for the assertion of req_0 or req_1 and drives both gnt_0 and gnt_1 to their inactive state (low). This is the default state of the FSM; it is entered after reset and also during fault recovery.
• GNT0 : FSM enters this state when req_0 is asserted, and remains here as long as
req_0 is asserted. When req_0 is de-asserted, FSM returns to the IDLE state.
• GNT1 : FSM enters this state when req_1 is asserted, and remains there as long
as req_1 is asserted. When req_1 is de-asserted, FSM returns to the IDLE state.
Coding Methods
Now that we have described our state machine clearly, let's look at various methods of
coding a FSM.
We use one-hot encoding, and all the FSMs will have the following code in common, so
it will not be repeated again and again.
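As a sketch of what that common part typically contains (the module name, SIZE and the exact one-hot state values below are my assumptions, not taken from the original listing):

module fsm_arbiter (clock, reset, req_0, req_1, gnt_0, gnt_1);
  input  clock;
  input  reset;
  input  req_0;
  input  req_1;
  output gnt_0;
  output gnt_1;
  reg    gnt_0;
  reg    gnt_1;

  // One-hot state encoding
  parameter SIZE = 3;
  parameter IDLE = 3'b001, GNT0 = 3'b010, GNT1 = 3'b100;

  reg [SIZE-1:0] state;      // current state register
  reg [SIZE-1:0] next_state; // next-state value computed by the FSM logic

  // State register: this sequential part stays the same in every coding style
  always @ (posedge clock or posedge reset) begin
    if (reset)
      state <= IDLE;
    else
      state <= next_state;
  end

  // The next-state and output logic is what differs between coding methods.

endmodule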
Metastability
Whenever there are setup or hold time violations in a flip-flop, it enters a state where its output is unpredictable: this is known as the metastable state (quasi-stable state); at the end of the metastable state, the flip-flop settles down to either '1' or '0'. This whole process is known as metastability. In the figure below, Tsu is the setup time and Th is the hold time. Whenever the input signal D does not meet the Tsu and Th requirements of the given D flip-flop, metastability occurs.
When a flip-flop is in the metastable state, its output oscillates between '0' and '1' as shown in the figure below (here the flip-flop output settles down to '0'). How long it takes to settle down depends on the technology of the flip-flop.
If we look deep inside of the flip-flop we see that the quasi-stable state is reached when
the flip-flop setup and hold times are violated. Assuming the use of a positive edge
triggered "D" type flip-flop, when the rising edge of the flip-flop clock occurs at a point
in time when the D input to the flip-flop is causing its master latch to transition, the flip-
flop is highly likely to end up in a quasi-stable state. This rising clock causes the master
latch to try to capture its current value while the slave latch is opened allowing the Q
output to follow the "latched" value of the master. The most perfectly "caught" quasi-
stable state (on the very top of the hill) results in the longest time required for the flip-
flop to resolve itself to one of the stable states.
The relative stability of states shown in the figure above shows that the logic 0 and logic
1 states (being at the base of the hill) are much more stable than the somewhat stable state
at the top of the hill. In theory, a flip-flop in this quasi-stable hilltop state could remain
there indefinitely but in reality it won't. Just as the slightest air current would eventually
cause a ball on the illustrated hill to roll down one side or the other, thermal and induced
noise will jostle the state of the flip-flop causing it to move from the quasi-stable state
into either the logic 0 or logic 1 state.
As we have seen, metastability occurs whenever a setup or hold time violation occurs, so we have to look at when signals can violate these timing requirements, typically when an input is asynchronous to the sampling clock or crosses between clock domains.
What is MTBF?
MTBF stands for Mean Time Between Failures. What does that mean? MTBF tells us how often a particular element will fail; in other words, it gives the average time interval between two successive failures. The figure below shows a typical MTBF curve for a flip-flop and also gives the MTBF equation. I am not going to derive the MTBF equation here :-)
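For reference, the commonly quoted form of that equation is shown below; the symbol names are my own and the constants depend on the flip-flop technology:

MTBF = e^(t_met / tau) / (T0 * f_clk * f_data)

where t_met is the time allowed for a metastable event to resolve (the slack before the next flip-flop samples), tau and T0 are constants characterizing the flip-flop, f_clk is the clock frequency, and f_data is the rate of asynchronous data transitions.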
In the simplest case, designers can tolerate metastability by making sure the clock period
is long enough to allow for the resolution of quasi-stable states and for the delay of
whatever logic may be in the path to the next flip-flop. This approach, while simple, is
rarely practical given the performance requirements of most modern designs.
The most common way to tolerate metastability is to add one or more successive
synchronizing flip-flops to the synchronizer. This approach allows for an entire clock
period (except for the setup time of the second flip-flop) for metastable events in the first
synchronizing flip-flop to resolve themselves. This does, however, increase the latency in
the synchronous logic's observation of input changes.
Neither of these approaches can guarantee that metastability cannot pass through the
synchronizer; they simply reduce the probability to practical levels.
In quantitative terms, if the Mean Time Between Failure (MTBF) of a particular flip-flop
in the context of a given clock rate and input transition rate is 33.33 seconds then the
MTBF of two such flip-flops used to synchronize the input would be (33.33* 33.33) =
18.514 minutes. (I have taken the worst flip-flop ever designed in the history of mankind :-).) The figure below shows how to connect two flip-flops in series to achieve this, and also the resultant MTBF.
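In Verilog, such a two-stage synchronizer is simply two flip-flops in series; a minimal sketch (signal names are my own):

module two_flop_sync (clk, async_in, sync_out);
  input  clk;
  input  async_in;  // signal coming from another (asynchronous) clock domain
  output sync_out;

  reg meta;         // first stage: the flip-flop that may go metastable
  reg sync_out;     // second stage: gets almost a full clock period to resolve

  always @ (posedge clk) begin
    meta     <= async_in;
    sync_out <= meta;
  end

endmodule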
Normally,
Synchronous Reset
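With a synchronous reset, the reset signal is sampled only on the active clock edge; a minimal sketch in the same style as the asynchronous example below:

module syn_reset (clk, reset, a, c);
  input  clk;
  input  reset;
  input  a;
  output c;

  wire clk;
  wire reset;
  wire a;
  reg  c;

  always @ (posedge clk)        // reset is only looked at on the clock edge
    if (reset == 1'b1) begin
      c <= 0;
    end else begin
      c <= a;
    end

endmodule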
Asynchronous Reset
module asyn_reset (clk, reset, a, c);
input  clk;
input  reset;
input  a;
output c;

wire clk;
wire reset;
wire a;
reg  c;

always @ (posedge clk or posedge reset)
  if (reset == 1'b1) begin
    c <= 0;
  end else begin
    c <= a;
  end

endmodule
Synthesis Output
Synchronize the asynchronous external reset signal and use this synchronized reset as the reset input to all the asynchronously reset flip-flops inside the design, as shown in the figure below. We do this because an asynchronous-reset flip-flop takes less logic to implement, is faster, and consumes less power.
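A minimal sketch of such a reset synchronizer (asynchronous assertion, synchronous de-assertion; names are my own):

module reset_sync (clk, async_reset, sync_reset);
  input  clk;
  input  async_reset;  // external asynchronous reset, active high
  output sync_reset;   // reset that is distributed to the rest of the design

  reg stage1;
  reg sync_reset;

  // Reset asserts asynchronously but de-asserts synchronously, so every
  // flip-flop in the design leaves reset on a clean clock edge.
  always @ (posedge clk or posedge async_reset) begin
    if (async_reset) begin
      stage1     <= 1'b1;
      sync_reset <= 1'b1;
    end else begin
      stage1     <= 1'b0;
      sync_reset <= stage1;
    end
  end

endmodule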
Introduction
There are times when a designer needs to interface two systems working on two different clocks. This interfacing is difficult in the sense that the design becomes asynchronous at the boundary of the interface, which results in setup and hold violations, metastability, and unreliable data transfer. So we need special design and interfacing techniques.
Here we have two systems which are asynchronous in nature to each other. If we need to transfer data between them, there are only a few methods to achieve this:
Handshake Signaling
In this method system (module) A sends data to system/module B based on the handshake signals req and ack. The protocol uses the same classic method found with the 8155 chip used with the 8085.
Protocol
• Transmitter asserts the req (request) signal, asking the receiver to accept the data
on the data bus.
• Receiver asserts the ack (acknowledge) signal, asserting that it has accepted the
data.
This method is straightforward, but it has loopholes: when system B samples system A's req line and system A samples system B's ack line, each does so with respect to its own internal clock, so setup and hold time violations can occur. To avoid this we go for double- or triple-stage synchronizers, which increase the MTBF and are thus immune to metastability to a good extent. The figure below shows how this is done for the above example.
If we do the double- or triple-stage synchronizing, the transfer rate comes down, because many clock cycles are spent just on handshaking.
Sometimes it is good to synchronize the data as well, to be doubly sure, but normally we don't, as it takes a lot of logic for very little gain. The figure below shows one such case (there is no difference between the circuits shown for req and data; they are one and the same).
Asynchronous FIFO
An Asynchronous FIFO has got two interfaces, one for writing the data into the FIFO and
the other for reading the data out. It has got two clocks, one for writing and the other for
reading. System A writes the data in the FIFO and System B reads out the data from it. To
facilitate error free operations, we have FIFO full and FIFO empty signals. These signals
are generated with respect to the corresponding clock. FIFO full signal is used by system
A (as when FIFO is full, we don't want system A to write data into FIFO, this data will be
lost), so it will be driven by the write clock. Similarly, FIFO empty will be driven by the
read clock. Here read clock means system B clock and write clock means system A clock.
Asynchronous FIFOs are used where performance matters, where one does not want to waste clock cycles on handshake signals, and where enough system resources are available.
How to design an Asynchronous FIFO is not in the scope of this document, but what I
would like to point out is that one should be careful with the generation of FIFO full and
FIFO empty signals, as it may, in certain cases, cause metastability.
Introduction
One of the most common questions in interviews is how to calculate the depth of a FIFO. A FIFO is used as a buffering or queueing element in a system, and common sense says it is needed only when reading is slower than writing. The size of the FIFO is essentially the amount of data that must be buffered, which depends on the rate at which data is written and the rate at which data is read. Statistically, the data rates in a system vary, mainly depending on the load. So, to obtain a safe FIFO size, we need to consider the worst-case scenario for the data transfer across the FIFO under consideration.
In the worst case, the difference between the write and read data rates is at its maximum. Hence, for the write operation the maximum data rate should be considered, and for the read operation the minimum data rate should be considered.
In the question itself, the read data rate is specified by the number of idle cycles, while for the write operation the maximum data rate, with no idle cycles, should be considered.
For the write operation, data rate = number of data words * clock rate. The writing side is the source and the reading side is the sink; the effective read data rate is Frd / Idle_cycle_rd, where Idle_cycle_rd is the number of read-clock cycles taken per read.
To know how much data must be buffered, we also need the number of data words in a burst, which we assume to be B.
Following this reasoning: FIFO size = amount to be buffered = B - B * Frd / (Fwr * Idle_cycle_rd).
Here we have not considered the synchronizing latency if the write and read clocks are asynchronous. The greater the synchronizing latency, the larger the FIFO must be to buffer the additional data written in the meantime.
Assume that we have to design a FIFO with the following requirements and we want to calculate the minimum FIFO depth:
• A synchronized FIFO
• Writing clock 30 MHz - F1
• Reading clock 40 MHz - F2
• Write burst size - B
• Case 1 : There is 1 idle clock cycle on the reading side - I
• Case 2 : There are 10 idle clock cycles on the reading side - I
Case 1: with alternate read cycles (an idle cycle between every two reads), one word is read every 2 read-clock cycles, so the effective read rate is F2/2 = 20 MHz:
FIFO depth = B - B * 20/30 = B * (1 - 2/3) = B/3
Case 2: with 10 idle cycles on the reading side, the effective read rate is taken as F2/10 = 4 MHz:
FIFO depth = B - B * 4/30 = B * (1 - 4/30) = B * 26/30
Verification flow with specman is the same as with any other HVL. The figure below
shows the verification flow with specman.
Verification flow starts with understanding the specification of the chip/block under
verification. Once the specification is understood, a test cases document is prepared,
which documents all possible test cases. Once this document covers 70-80 percent of the functionality, a testbench architecture document is prepared. In the past, the architecture document was prepared first and the test cases document next. There is a drawback to that style: the test cases document may call for a particular piece of functionality to be verified which the testbench does not support, because the architecture document was prepared before the test cases one. If we have a test cases document to refer to, then writing the architecture document becomes much easier, as we know for sure what is expected from the testbench.
Note: This section was written in a hurry, so it is very far from what I really want it to
be!!!
Test Cases
Identify the test cases from the design specification: a simple task for simple designs. Normally each requirement in the specification becomes a test case. Anything the specification states with "can do" or "will have" becomes a test case. Corner cases normally take a lot of thought to identify.
Testbench Architecture
A typical testbench architecture looks as shown below. The main blocks in a testbench are the base object, transaction generator, driver, monitor, and checker/scoreboard.
The block in red is the DUT, and the boxes in orange are the testbench components. Coverage is a separate block which gets events from the input and output monitors; it is similar to the scoreboard, but does something more.
Base Object
The base object is the data structure that will be used across the testbench. Let's assume you are verifying a memory; then the base object would contain:
<'
struct mem_object {
  addr  : uint (bits:8);
  data  : uint (bits:8);
  rd_wt : uint [0..100];
  wr_wt : uint [0..100];
  rd_wr : bool;
  keep soft rd_wt == 50;
  keep soft wr_wt == 50;

  keep gen (wr_wt) before (rd_wr);
  keep gen (rd_wt) before (rd_wr);
  // Default operation is Write
  keep soft rd_wr == FALSE;

  keep soft rd_wr == select {
    rd_wt : TRUE;
    wr_wt : FALSE;
  };
};
'>
Here mem_object is the name of the base object, in the same way as we have a module name for each module in Verilog or an entity name in VHDL. Address, data, and the read/write fields are the fields of the base object. Normally we have some default constraints and some methods (functions) which manipulate the fields of the base object.
Transaction Generator
The transaction generator generates transactions based on the test constraints. Normally the transaction generator applies the test case constraints to the base object and generates a base object that satisfies those constraints. Once generated, the transaction generator passes it to the driver.
<'
struct mem_txgen {
  !mem_base : mem_object;
  //driver : mem_driver;
  !num_cmds : uint;
  // This method generates the commands and
  // calls the driver
  generate_cmds()@sys.any is {
    for {var i:uint = 0; i < num_cmds; i += 1} do {
      // Generate a write access
      gen mem_base keeping {
        it.addr  == 0x10;
        it.data  == 0x22;
        it.rd_wr == FALSE;
      };
      // call the driver
      //driver.drive_object(mem_base);
    };
  };
};
'>
Driver
Driver drives the base object generated by the transaction generator to the DUT. To do
this, it implements the DUT input protocol. Something like this:
<'
unit mem_driver {
  event clk is rise('top.mem_clk') @sim;
  // This method drives the DUT
  drive_mem(mem_base : mem_object)@clk is {
    wait cycle;
    // Drive ce, addr, rd_wr command
    'top.mem_ce'    = 1;
    'top.mem_addr'  = mem_base.addr;
    'top.mem_rd_wr' = mem_base.rd_wr;
    if (mem_base.rd_wr == FALSE) {
      'top.mem_wr_data' = mem_base.data;
    };
    // Deassert all the driven signals
    wait cycle;
    'top.mem_ce'      = 0;
    'top.mem_addr'    = 0;
    'top.mem_rd_wr'   = 0;
    'top.mem_wr_data' = 0;
  };
};
'>
Input Monitor
The input monitor monitors the input signals to the DUT. Example: in an Ethernet switch, each incoming packet is picked up by the input monitor and passed to the checker.
Output Monitor
The output monitor monitors the output signals from the DUT. Example: in an Ethernet switch, each outgoing packet from the switch is picked up by the output monitor and passed to the checker.
Checker/Scoreboard
The checker or scoreboard checks whether the output coming out of the DUT is correct or wrong. Scoreboards in the e language are basically implemented using keyed lists.
TestBench Coding
Testbench coding starts after the testbench architecture document is complete. Typically we start with:
• base object
• transaction generator
• driver
• input monitor
• output monitor
• scoreboard
If the project is big, all the tasks can start at the same time, as many engineers will be
working on them.
Test Execution
In this phase, the test execution team executes the test cases based on priority. Typically, once the focused test cases pass and some level of random test cases pass, we move to regression. In regression, all the test cases are run with different seeds every time there is a change in the RTL.
Post Processing
In post processing, code and functional coverage are checked to see whether all the possible DUT functionality has been covered.
Code Coverage
Code coverage shows which parts of the RTL have been exercised, and is thus used as a measure of how well the DUT is verified. Code coverage also shows how good the functional coverage matrix is. Commonly used code coverage metrics are:
• Line Coverage
• Branch Coverage
• Expression Coverage
• Toggle Coverage
• FSM Coverage
Line Coverage
Line coverage (also called block coverage or segment coverage) shows how many times each line is executed.
Branch Coverage
Branch coverage shows whether all possible branches of if..else and case statements are reached.
Expression Coverage
Often considered the most valuable of the coverage types, expression coverage shows whether all possible legal boolean values of an expression have been reached. Generally, expression coverage of 95% and above for a large design is considered good.
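For example, for a line such as the one below (purely illustrative, not from any listing above), expression coverage checks whether every combination of a, b and c that can determine y has actually occurred during simulation:

assign y = (a & b) | c;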
Toggle Coverage
Toggle coverage shows which bits in the RTL have toggled. Toggle coverage is mainly used for power analysis.
FSM Coverage
FSM coverage shows whether all states are reached and whether all possible state transitions have occurred.