—Sunburst Design—
Expert Verilog, SystemVerilog & Synthesis Training
Simulation and Synthesis Techniques for Asynchronous
FIFO Design
Clifford E. Cummings, Sunburst Design, Inc.
cliffe@[Link]
ABSTRACT
FIFOs are often used to safely pass data from one clock domain to another asynchronous clock domain. Using a
FIFO to pass data from one clock domain to another clock domain requires multi-asynchronous clock design
techniques. There are many ways to design a FIFO wrong. There are many ways to design a FIFO right but still
make it difficult to properly synthesize and analyze the design.
This paper will detail one method that is used to design, synthesize and analyze a safe FIFO between different clock
domains using Gray code pointers that are synchronized into a different clock domain before testing for "FIFO full"
or "FIFO empty" conditions. The fully coded, synthesized and analyzed RTL Verilog model (FIFO Style #1) is
included.
Post-SNUG Editorial Comment
A second FIFO paper by the same author was voted “Best Paper - 1st Place” by SNUG attendees, is listed as
reference [3] and is also available for download.
1.0 Introduction
An asynchronous FIFO refers to a FIFO design where data values are written to a FIFO buffer from one clock
domain and the data values are read from the same FIFO buffer from another clock domain, where the two clock
domains are asynchronous to each other.
Asynchronous FIFOs are used to safely pass data from one clock domain to another clock domain.
There are many ways to do asynchronous FIFO design, including many wrong ways. Most incorrectly implemented
FIFO designs still function properly 90% of the time. Most almost-correct FIFO designs function properly 99%+ of
the time. Unfortunately, FIFOs that work properly 99% of the time have design flaws that are usually the most
difficult to detect and debug (if you are lucky enough to notice the bug before shipping the product), or the most
costly to diagnose and recall (if the bug is not discovered until the product is in the hands of a dissatisfied
customer).
This paper discusses one FIFO design style and important details that must be considered when doing asynchronous
FIFO design.
The rest of the paper simply refers to an “asynchronous FIFO” as just “FIFO.”
2.0 Passing multiple asynchronous signals
Attempting to synchronize multiple changing signals from one clock domain into a new clock domain and ensuring
that all changing signals are synchronized to the same clock cycle in the new clock domain has been shown to be
problematic [1]. FIFOs are used in designs to safely pass multi-bit data words from one clock domain to another.
Data words are placed into a FIFO buffer memory array by control signals in one clock domain, and the data words
are removed from another port of the same FIFO buffer memory array by control signals from a second clock
domain. Conceptually, the task of designing a FIFO with these assumptions seems to be easy.
The difficulty associated with doing FIFO design is related to generating the FIFO pointers and finding a reliable
way to determine full and empty status on the FIFO.
2.1 Synchronous FIFO pointers
For synchronous FIFO design (a FIFO where writes to, and reads from, the FIFO buffer are conducted in the same
clock domain), one implementation counts the number of writes to, and reads from, the FIFO buffer to increment (on
FIFO write but no read), decrement (on FIFO read but no write) or hold (no writes and reads, or simultaneous write
and read operation) the current fill value of the FIFO buffer. The FIFO is full when the FIFO counter reaches a
predetermined full value and the FIFO is empty when the FIFO counter is zero.
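For illustration only, the increment/decrement/hold rule can be captured in a short Python behavioral model (not synthesizable RTL). The class name `SyncFifoCounter`, the `depth` parameter, and the policy of ignoring writes while full and reads while empty are assumptions made for this sketch.

```python
# Behavioral sketch with hypothetical names; not synthesizable RTL.
class SyncFifoCounter:
    """Fill counter of a single-clock FIFO, updated once per clock edge."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.count = 0                      # reset: the FIFO is empty

    @property
    def full(self) -> bool:
        return self.count == self.depth

    @property
    def empty(self) -> bool:
        return self.count == 0

    def clock(self, write: bool, read: bool) -> None:
        """Apply one rising clock edge."""
        do_write = write and not self.full  # assumed: writes ignored when full
        do_read = read and not self.empty   # assumed: reads ignored when empty
        if do_write and not do_read:
            self.count += 1                 # increment: write, no read
        elif do_read and not do_write:
            self.count -= 1                 # decrement: read, no write
        # otherwise hold: no operation, or simultaneous write and read


f = SyncFifoCounter(depth=4)
for _ in range(4):
    f.clock(write=True, read=False)
print(f.count, f.full)                      # 4 True
```

Because one clock updates `count`, the model relies on writes and reads sharing a clock domain; that is exactly the property an asynchronous FIFO lacks.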
Unfortunately, for asynchronous FIFO design, the increment-decrement FIFO fill counter cannot be used, because
two different and asynchronous clocks would be required to control the counter. To determine full and empty status
for an asynchronous FIFO design, the write and read pointers will have to be compared.
2.2 Asynchronous FIFO pointers
In order to understand FIFO design, one needs to understand how the FIFO pointers work. The write pointer always
points to the next word to be written; therefore, on reset, both pointers are set to zero, which also happens to be the
next FIFO word location to be written. On a FIFO-write operation, the memory location that is pointed to by the
write pointer is written, and then the write pointer is incremented to point to the next location to be written.
Similarly, the read pointer always points to the current FIFO word to be read. Again, on reset, both pointers are reset
to zero, the FIFO is empty and the read pointer is pointing to invalid data (because the FIFO is empty and the empty
flag is asserted). As soon as the first data word is written to the FIFO, the write pointer increments, the empty flag is
cleared, and the read pointer, which is still addressing the contents of the first FIFO memory word, immediately drives
that first valid word onto the FIFO data output port, to be read by the receiver logic. The fact that the read pointer is
always pointing to the next FIFO word to be read means that the receiver logic does not have to use two clock
periods to read the data word. If the receiver first had to increment the read pointer before reading a FIFO data
word, the receiver would clock once to output the data word from the FIFO, and clock a second time to capture the
data word into the receiver. That would be needlessly inefficient.
SNUG San Jose 2002 - Rev 1.2 - Simulation and Synthesis Techniques for Asynchronous FIFO Design
The FIFO is empty when the read and write pointers are both equal. This condition happens when both pointers are
reset to zero during a reset operation, or when the read pointer catches up to the write pointer, having read the last
word from the FIFO.
A FIFO is full when the pointers are again equal, that is, when the write pointer has wrapped around and caught up
to the read pointer. This is a problem. The FIFO is either empty or full when the pointers are equal, but which?
One design technique used to distinguish between full and empty is to add an extra bit to each pointer. When the
write pointer increments past the final FIFO address, the write pointer will increment the unused MSB while setting
the rest of the bits back to zero as shown in Figure 1 (the FIFO has wrapped and toggled the pointer MSB). The
same is done with the read pointer. If the MSBs of the two pointers are different, it means that the write pointer has
wrapped one more time than the read pointer. If the MSBs of the two pointers are the same, it means that both
pointers have wrapped the same number of times.
Figure 1 - FIFO full and empty conditions (when waddr[3:0] == raddr[3:0], the FIFO is either FULL or EMPTY; on reset, waddr <= 0 and raddr <= 0; EMPTY if (waddr == raddr); FULL if ({~waddr[4], waddr[3:0]} == raddr))
Using n-bit pointers, where (n-1) is the number of address bits required to access the entire FIFO memory buffer, the
FIFO is empty when both pointers, including the MSBs, are equal. And the FIFO is full when both pointers, except
the MSBs, are equal.
The FIFO design in this paper uses n-bit pointers for a FIFO with 2^(n-1) write-able locations to help handle full and
empty conditions. More design details related to the full and empty logic are included in section 5.0.
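The extra-MSB rule can be sketched in Python with plain binary pointers, assuming an 8-word FIFO (three address bits, so n = 4). The helper names `bump`, `is_empty` and `is_full` are hypothetical; the Gray-coded version of the full test is a different comparison, discussed in section 5.2.

```python
# Behavioral sketch with hypothetical names; binary (not Gray) pointers.
ADDR_BITS = 3                     # assumed 8-word FIFO: 2**3 locations
PTR_MASK = (1 << (ADDR_BITS + 1)) - 1
MSB = 1 << ADDR_BITS              # the extra wrap-around bit


def bump(ptr: int) -> int:
    """Increment an n-bit pointer; past the last address the MSB toggles
    and the address bits return to zero."""
    return (ptr + 1) & PTR_MASK


def is_empty(wptr: int, rptr: int) -> bool:
    # All bits equal, MSBs included: both pointers wrapped equally often.
    return wptr == rptr


def is_full(wptr: int, rptr: int) -> bool:
    # MSBs differ (writer wrapped once more) and address bits are equal.
    return (wptr ^ MSB) == rptr


wptr = rptr = 0                   # reset
for _ in range(8):                # write eight words
    wptr = bump(wptr)
print(format(wptr, "04b"), is_full(wptr, rptr), is_empty(wptr, rptr))
# 1000 True False
for _ in range(8):                # read them all back
    rptr = bump(rptr)
print(format(rptr, "04b"), is_full(wptr, rptr), is_empty(wptr, rptr))
# 1000 False True
```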
2.3 Binary FIFO pointer considerations
Trying to synchronize a binary count value from one clock domain to another is problematic because every bit of an
n-bit counter can change simultaneously (example 7->8 in binary numbers is 0111->1000, all bits changed). One
approach to the problem is to sample and hold periodic binary count values in a holding register and pass a
synchronized ready signal to the new clock domain. When the ready signal is recognized, the receiving clock
domain sends back a synchronized acknowledge signal to the sending clock domain. A sampled pointer must not
change until an acknowledge signal is received from the receiving clock domain. A count-value with multiple
changing bits can be safely transferred to a new clock domain using this technique. Upon receipt of an acknowledge
signal, the sending clock domain has permission to clear the ready signal and re-sample the binary count value.
Using this technique, the binary counter values are sampled periodically and not all of the binary counter values can
be passed to a new clock domain. The question is, do we need to be concerned about the case where a binary counter
might continue to increment and overflow or underflow the FIFO between sampled counter values? The answer is
no [8].
FIFO full occurs when the write pointer catches up to the synchronized and sampled read pointer. The synchronized
and sampled read pointer might not reflect the current value of the actual read pointer, but the write pointer will not
try to count beyond the synchronized read pointer value. Overflow will not occur [8].
FIFO empty occurs when the read pointer catches up to the synchronized and sampled write pointer. The
synchronized and sampled write pointer might not reflect the current value of the actual write pointer, but the read
pointer will not try to count beyond the synchronized write pointer value. Underflow will not occur [8]. More
observations about this technique of sampling binary pointers with a synchronized ready-acknowledge pair of
handshaking signals are detailed in section 7.0, after the discussion of synchronized Gray [6] code pointers.
A common approach to FIFO counter-pointers is to use Gray code counters. Gray codes only allow one bit to
change for each clock transition, eliminating the problem associated with trying to synchronize multiple changing
signals on the same clock edge.
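A quick numeric check shows the difference, using the standard reflected binary-to-Gray conversion g = b XOR (b >> 1). The helper names are illustrative assumptions, not part of the paper's RTL.

```python
# Behavioral sketch with hypothetical helper names.
def bin2gray(b: int) -> int:
    """Standard reflected binary-to-Gray conversion."""
    return b ^ (b >> 1)


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Binary 7 -> 8 (0111 -> 1000) flips every bit of a 4-bit counter...
print(bits_changed(7, 8))                      # 4
# ...while the Gray-coded values (0100 -> 1100) flip exactly one bit.
print(bits_changed(bin2gray(7), bin2gray(8)))  # 1

# Every step of a 4-bit Gray counter, including the wrap 15 -> 0,
# changes exactly one bit.
assert all(bits_changed(bin2gray(i), bin2gray((i + 1) % 16)) == 1
           for i in range(16))
```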
2.4 FIFO testing troubles
Testing a FIFO design for subtle design problems is nearly impossible to do. The problem is rooted in the fact that
FIFO pointers in an RTL simulation behave ideally, even though, if incorrectly implemented, they can cause
catastrophic failures if used in a real design.
In an RTL simulation, if binary-count FIFO pointers are included in the design, all of the FIFO pointer bits will
change simultaneously; there is no chance to observe synchronization and comparison problems. In a gate-level
simulation with no backannotated delays, there is only a slight chance of observing a problem if the gate transitions
are different for rising and falling edge signals, and even then, one would have to get lucky and have the correct
sequence of bits changing just prior to and just after a rising clock edge. For higher speed designs, the delay
differences between rising and falling edge signals diminish and the probability of detecting problems also
diminishes. The chance of finding actual FIFO design problems is greatest for gate-level designs with backannotated
delays, but even doing this type of simulation, finding problems will be difficult to do and again the odds of observing
the design problems decrease as signal propagation delays diminish.
Clearly the answer is to recognize that there are potential FIFO design problems and to do the design correctly from
the start.
The behavioral model that I sometimes use for testing a FIFO design is a FIFO model that is simple to code, is
accurate for behavioral testing purposes and would be difficult to debug if it were used as an RTL synthesis model.
This FIFO model is only recommended for use in a FIFO testbench. The model accurately determines when FIFO
full and empty status bits should be set and can be used to determine the data values that should have been stored
into a working FIFO. THIS FIFO MODEL IS NOT SAFE FOR SYNTHESIS!
module beh_fifo (rdata, wfull, rempty, wdata,
                 winc, wclk, wrst_n, rinc, rclk, rrst_n);
  parameter DSIZE = 8;
  parameter ASIZE = 4;
  output [DSIZE-1:0] rdata;
  output             wfull;
  output             rempty;
  input  [DSIZE-1:0] wdata;
  input              winc, wclk, wrst_n;
  input              rinc, rclk, rrst_n;
  reg [ASIZE:0] wptr, wrptr1, wrptr2, wrptr3;
  reg [ASIZE:0] rptr, rwptr1, rwptr2, rwptr3;
  parameter MEMDEPTH = 1<<ASIZE;
Figure 2 - n-bit Gray code converted to an (n-1)-bit Gray code (the 2nd half of the 4-bit Gray code sequence, MSB=1, is a mirror image of the 1st half about the counter mid-point; two bits change from 15 to 0)
To better understand the problem of converting an n-bit Gray code to an (n-1)-bit Gray code, consider the example
of creating a dual 4-bit and 3-bit Gray code counter as shown in Figure 2.
The most common Gray code, as shown in Figure 2, is a reflected code where the bits in any column except the
MSB are symmetrical about the sequence mid-point [6]. This means that the second half of the 4-bit Gray code is a
mirror image of the first half with the MSB inverted.
To convert a 4-bit to a 3-bit Gray code, we do not want the LSBs of the second half of the 4-bit sequence to be a
mirror image of the LSBs of the first half; instead, we want the LSBs of the second half to repeat the 4-bit LSB
sequence of the first half.
Upon closer examination, it is obvious that inverting the second MSB of the second half of the 4-bit Gray code will
produce the desired 3-bit Gray code sequence in the three LSBs of the 4-bit sequence. The only other problem is
that the 3-bit Gray code with extra MSB is no longer a true Gray code because when the sequence changes from 7
(Gray 0100) to 8 (~Gray 1000) and again from 15 (~Gray 1100) to 0 (Gray 0000), two bits are changing instead of
just one bit. A true Gray code only changes one bit between counts.
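The construction can be checked in Python for the 4-bit case. This sketch assumes the standard reflected Gray code; `modified` is a hypothetical helper that inverts the second MSB whenever the MSB is 1, which is the transformation described above.

```python
# Behavioral sketch with hypothetical names.
def bin2gray(b: int) -> int:
    return b ^ (b >> 1)


N = 4                                   # n-bit Gray code (4 bits here)
MSB, MSB2 = 1 << (N - 1), 1 << (N - 2)  # the MSB and the second MSB


def modified(i: int) -> int:
    """4-bit reflected Gray code with the 2nd MSB inverted in the 2nd half."""
    g = bin2gray(i)
    return g ^ MSB2 if g & MSB else g


seq = [modified(i) for i in range(1 << N)]

# The three LSBs now repeat the 3-bit Gray sequence in both halves.
low = [s & (MSB - 1) for s in seq]
assert low == [bin2gray(i % (1 << (N - 1))) for i in range(1 << N)]

# The price: two bits change at 7 -> 8 and at 15 -> 0.
multi = [(i, (i + 1) % 16) for i in range(16)
         if bin(seq[i] ^ seq[(i + 1) % 16]).count("1") > 1]
print(multi)   # [(7, 8), (15, 0)]
```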
3.2 Gray code counter basics
The first fact to remember about a Gray code is that the code distance between any two adjacent words is just 1
(only one bit can change from one Gray count to the next). The second fact to remember about a Gray code counter
is that most useful Gray code counters must have power-of-2 counts in the sequence. It is possible to make a Gray
code counter that counts an even number of sequences but conversions to and from these sequences are generally
not as simple to do as the standard Gray code. Also note that there are no odd-count-length Gray code sequences, so
one cannot make a 23-deep Gray code. This means that the technique described in this paper is used to make a FIFO
that is 2^n deep.
Figure 3 is a block diagram for a style #1 dual n-bit Gray code counter. The style #1 Gray code counter assumes that
the outputs of the register bits are the Gray code value itself (ptr, either wptr or rptr). The Gray code outputs
are then passed to a Gray-to-binary converter (bin), which is passed to a conditional binary-value incrementer to
generate the next-binary-count-value (next), which is passed to a binary-to-Gray converter that generates the
next-Gray-count-value (gnext), which is passed to the register inputs. The top half of the Figure 3 block diagram
shows the described logic flow while the bottom half shows logic related to the second Gray code counter as
described in the next section.
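The style #1 data flow (Gray-to-binary, conditional increment, binary-to-Gray) can be sketched as one Python function evaluated per clock edge. This is a behavioral approximation of the Figure 3 upper path, not its netlist; `style1_next` and its `blocked` argument, which stands in for wfull or rempty, are hypothetical names.

```python
# Behavioral sketch with hypothetical names; not the Figure 3 netlist.
def bin2gray(b: int) -> int:
    return b ^ (b >> 1)


def gray2bin(g: int) -> int:
    """Gray-to-binary: each binary bit is the XOR of all higher Gray bits."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def style1_next(ptr: int, inc: bool, blocked: bool, n: int) -> int:
    """One clock edge of a style #1 n-bit Gray counter.

    ptr     -- registered Gray value (wptr or rptr)
    inc     -- winc or rinc request
    blocked -- wfull (write side) or rempty (read side)
    """
    bnext = (gray2bin(ptr) + (1 if inc and not blocked else 0)) % (1 << n)
    return bin2gray(bnext)      # gnext, loaded into the ptr register


ptr, trace = 0, []
for _ in range(5):
    ptr = style1_next(ptr, inc=True, blocked=False, n=4)
    trace.append(format(ptr, "04b"))
print(trace)   # ['0001', '0011', '0010', '0110', '0111']
```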
Figure 3 - Dual n-bit Gray code counter block diagram - style #1 (Gray-to-binary logic, an incrementer gated by wfull or rempty, and binary-to-Gray logic feed the ptr register; one extra flip-flop registers the MSB of the (n-1)-bit Gray code, and the memory address is addr = {addrmsb, ptr[n-3:0]})
3.3 Dual n-bit Gray code counter
A dual n-bit Gray code counter is a Gray code counter that generates both an n-bit Gray code sequence (described in
section 3.2) and an (n-1)-bit Gray code sequence.
The (n-1)-bit Gray code is simply generated by doing an exclusive-or operation on the two MSBs of the n-bit Gray
code to generate the MSB for the (n-1)-bit Gray code. This is combined with the (n-2) LSBs of the n-bit Gray code
counter to form the (n-1)-bit Gray code counter [5].
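A Python check of the XOR rule, assuming the standard reflected Gray code and n = 4. `gray_n_minus_1` is a hypothetical helper: it builds the (n-1)-bit MSB from the XOR of the two n-bit MSBs and keeps the (n-2) LSBs unchanged.

```python
# Behavioral sketch with hypothetical names.
def bin2gray(b: int) -> int:
    return b ^ (b >> 1)


def gray_n_minus_1(g: int, n: int) -> int:
    """(n-1)-bit Gray code derived from an n-bit Gray code value g."""
    top = ((g >> (n - 1)) ^ (g >> (n - 2))) & 1   # XOR of the two MSBs
    low = g & ((1 << (n - 2)) - 1)                # (n-2) LSBs pass through
    return (top << (n - 2)) | low


n = 4
# As the n-bit counter runs through all 2**n counts, the derived value
# runs through the (n-1)-bit Gray sequence twice.
for i in range(1 << n):
    assert gray_n_minus_1(bin2gray(i), n) == bin2gray(i % (1 << (n - 1)))
print("XOR rule holds for n =", n)
```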
3.4 Additional Gray code counter considerations
The binary-value incrementer is conditioned with either an “if not full” or “if not empty” test as shown in Figure 3,
to ensure that the appropriate FIFO pointer will not increment during FIFO-full or FIFO-empty conditions that could
lead to overflow or underflow of the FIFO buffer.
If the logic block that sends data to the FIFO reliably stops sending data when a FIFO full condition is asserted, the
FIFO design might be streamlined by removing the full-testing logic from the FIFO write pointer.
The FIFO pointer itself does not protect the FIFO buffer from being overwritten, but additional conditioning logic
could be added to the FIFO memory buffer to ensure that a write_enable signal could not be activated during a FIFO
full condition.
An additional “sticky” status bit, either ovf (overflow) or unf (underflow), could be added to the pointer design to
indicate that an additional FIFO write operation occurred during full or an additional FIFO read operation occurred
during empty. These bits would flag error conditions that could only be cleared during reset.
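The safeguards above can be sketched behaviorally for the write side. In this Python model, which is an illustration under stated assumptions rather than the paper's RTL, the hypothetical class `GuardedWritePointer` gates both the pointer increment and the memory write with the full flag, and latches a sticky `ovf` bit that only `reset()` clears; the full flag itself is supplied by the caller.

```python
# Behavioral sketch with hypothetical names; full is computed elsewhere.
class GuardedWritePointer:
    """An "if not full" incrementer, a gated memory write enable, and a
    sticky ovf bit that only reset clears."""

    def __init__(self, addr_bits: int) -> None:
        self.mod = 1 << (addr_bits + 1)   # n-bit pointer wraps at 2**n
        self.reset()

    def reset(self) -> None:
        self.wbin = 0                     # binary write pointer
        self.ovf = False                  # sticky overflow flag

    def clock(self, winc: bool, full: bool) -> bool:
        """Apply one wclk edge; return True if memory is written."""
        if winc and full:
            self.ovf = True               # record the illegal write...
            return False                  # ...but never overwrite the buffer
        if winc:
            self.wbin = (self.wbin + 1) % self.mod
            return True
        return False


p = GuardedWritePointer(addr_bits=2)
print(p.clock(winc=True, full=False), p.wbin, p.ovf)   # True 1 False
print(p.clock(winc=True, full=True), p.wbin, p.ovf)    # False 1 True
print(p.clock(winc=False, full=False), p.ovf)          # False True
```

A read-side twin with an `unf` bit gated by rempty would follow the same pattern.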
A safe, general purpose FIFO design will include the above safeguards at the expense of a slightly larger and
perhaps slower implementation. This is a good idea since a future co-worker might try to copy and reuse the code in
another design without understanding all of the important details that were considered for the current design.
4.0 Gray code counter - Style #2
Starting with version 1.2 of this paper, the FIFO implementation uses the Gray code counter style #2, which actually
employs two sets of registers to eliminate the need to translate Gray pointer values to binary values. The second set
of registers (the binary registers) can also be used to address the FIFO memory directly without the need to translate
memory addresses into Gray codes. The n-bit Gray-code pointer is still required to synchronize the pointers into the
opposite clock domains, but the (n-1)-bit binary pointers can be used to address memory directly. The binary pointers
also make it easier to run calculations to generate "almost-full" and "almost-empty" bits if desired (not shown in this
paper).
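A style #2 pointer can be sketched as two registers that load on the same edge: a binary register that addresses memory directly and a Gray register for synchronization. The Python class below is a hypothetical behavioral model; `Style2Pointer`, `blocked` (standing in for wfull or rempty) and `addr_bits` (n-1) are names chosen for this example.

```python
# Behavioral sketch with hypothetical names; not synthesizable RTL.
class Style2Pointer:
    """Binary register plus Gray register, both n = addr_bits + 1 bits."""

    def __init__(self, addr_bits: int) -> None:
        self.n = addr_bits + 1
        self.bin = 0   # binary register: addresses memory, no conversion
        self.ptr = 0   # Gray register: synchronized into the other domain

    @property
    def addr(self) -> int:
        return self.bin & ((1 << (self.n - 1)) - 1)   # bin[n-2:0]

    def clock(self, inc: bool, blocked: bool) -> None:
        bnext = (self.bin + (1 if inc and not blocked else 0)) % (1 << self.n)
        gnext = bnext ^ (bnext >> 1)       # binary-to-Gray of the next count
        self.bin, self.ptr = bnext, gnext  # both registers load together


p = Style2Pointer(addr_bits=2)             # 4-word FIFO, 3-bit pointers
for _ in range(5):
    p.clock(inc=True, blocked=False)
    print(p.bin, p.addr, format(p.ptr, "03b"))
# 1 1 001
# 2 2 011
# 3 3 010
# 4 0 110
# 5 1 111
```

No Gray-to-binary conversion appears anywhere in the loop, which is the point of style #2.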
Figure 4 - Dual n-bit Gray code counter, style #2 (the (n-1)-bit binary pointer, addr = bin[n-2:0], addresses the FIFO memory; the registered n-bit Gray code pointer, ptr[n-1:0], is synchronized into the opposite clock domain)
FIFO style #1
The block diagram for FIFO style #1 is shown in Figure 5.
Figure 5 - FIFO partitioning with synchronized pointer comparison (FIFO Memory (Dual Port RAM) between the wptr & full logic in the wclk domain and the rptr & empty logic in the rclk domain, with synchronizers producing wq2_rptr and rq2_wptr)
To facilitate static timing analysis of the style #1 FIFO design, the design has been partitioned into the following six
Verilog modules with the following functionality and clock domains:
• fifo1.v - (see Example 2 in section 6.1) - this is the top-level wrapper-module that includes all clock
domains. The top module is only used as a wrapper to instantiate all of the other FIFO modules used in the
design. If this FIFO is used as part of a larger ASIC or FPGA design, this top-level wrapper would probably be
discarded to permit grouping of the other FIFO modules into their respective clock domains for improved
synthesis and static timing analysis.
• fifomem.v - (see Example 3 in section 6.2) - this is the FIFO memory buffer that is accessed by both the
write and read clock domains. This buffer is most likely an instantiated, synchronous dual-port RAM. Other
memory styles can be adapted to function as the FIFO buffer.
• sync_r2w.v - (see Example 4 in section 6.3) - this is a synchronizer module that is used to synchronize the
read pointer into the write-clock domain. The synchronized read pointer will be used by the wptr_full
module to generate the FIFO full condition. This module only contains flip-flops that are synchronized to the
write clock. No other logic is included in this module.
• sync_w2r.v - (see Example 5 in section 6.4) - this is a synchronizer module that is used to synchronize the
write pointer into the read-clock domain. The synchronized write pointer will be used by the rptr_empty
module to generate the FIFO empty condition. This module only contains flip-flops that are synchronized to the
read clock. No other logic is included in this module.
• rptr_empty.v - (see Example 6 in section 6.5) - this module is completely synchronous to the read-clock
domain and contains the FIFO read pointer and empty-flag logic.
• wptr_full.v - (see Example 7 in section 6.6) - this module is completely synchronous to the write-clock
domain and contains the FIFO write pointer and full-flag logic.
In order to perform FIFO full and FIFO empty tests using this FIFO style, the read and write pointers must be
passed to the opposite clock domain for pointer comparison.
As with other FIFO designs, since the two pointers are generated from two different clock domains, the pointers
need to be “safely” passed to the opposite clock domain. The technique shown in this paper is to synchronize Gray
code pointers to ensure that only one pointer bit can change at a time.
5.0 Handling full & empty conditions
Exactly how FIFO full and FIFO empty are implemented is design-dependent.
The FIFO design in this paper assumes that the empty flag will be generated in the read-clock domain to ensure that
the empty flag is detected immediately when the FIFO buffer is empty, that is, the instant that the read pointer
catches up to the write pointer (including the pointer MSBs).
The FIFO design in this paper assumes that the full flag will be generated in the write-clock domain to ensure that
the full flag is detected immediately when the FIFO buffer is full, that is, the instant that the write pointer catches up
to the read pointer (except for different pointer MSBs).
5.1 Generating empty
As shown in Figure 1, the FIFO is empty when the read pointer and the synchronized write pointer are equal.
The empty comparison is simple to do. Pointers that are one bit larger than needed to address the FIFO memory
buffer are used. If the extra bits of both pointers (the MSBs of the pointers) are equal, the pointers have wrapped the
same number of times and if the rest of the read pointer equals the synchronized write pointer, the FIFO is empty.
The Gray code write pointer must be synchronized into the read-clock domain through a pair of synchronizer
registers found in the sync_w2r module. Since only one bit changes at a time using a Gray code pointer, there is
no problem synchronizing multi-bit transitions between clock domains.
In order to efficiently register the rempty output, the synchronized write pointer is actually compared against the
rgraynext (the next Gray code that will be registered into the rptr). The empty value testing and the
accompanying sequential always block have been extracted from the rptr_empty.v code of Example 6 and are
shown below:
assign rempty_val = (rgraynext == rq2_wptr);
always @(posedge rclk or negedge rrst_n)
  if (!rrst_n) rempty <= 1'b1;
  else         rempty <= rempty_val;
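Comparing rgraynext rather than rptr means that rempty is updated on the same rclk edge as the read that empties the FIFO. The Python cycle model below illustrates this under simplifying assumptions: the synchronized write pointer is held constant, and the class name `ReadSide` is hypothetical. It mirrors the snippet above, not the full Example 6.

```python
# Behavioral sketch with hypothetical names; not Example 6 itself.
def bin2gray(b: int) -> int:
    return b ^ (b >> 1)


class ReadSide:
    """Binary and Gray read pointers plus a registered rempty that is
    computed from the *next* pointer value."""

    def __init__(self, addr_bits: int) -> None:
        self.mod = 1 << (addr_bits + 1)
        self.rbin = 0
        self.rptr = 0
        self.rempty = True                      # rempty <= 1'b1 on reset

    def clock(self, rinc: bool, rq2_wptr: int) -> None:
        """Apply one rclk edge, given the synchronized write pointer."""
        rbinnext = (self.rbin + (rinc and not self.rempty)) % self.mod
        rgraynext = bin2gray(rbinnext)
        rempty_val = rgraynext == rq2_wptr      # compare the next pointer
        self.rbin, self.rptr = rbinnext, rgraynext
        self.rempty = rempty_val                # registered on this same edge


r = ReadSide(addr_bits=3)
w = bin2gray(2)            # two words written and already synchronized
for rinc in (False, True, True, True):
    r.clock(rinc, w)
    print(r.rbin, r.rempty)
# 0 False
# 1 False
# 2 True
# 2 True
```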
5.2 Generating full
Since the full flag is generated in the write-clock domain by running a comparison between the write and read
pointers, one safe technique for doing FIFO design requires that the read pointer be synchronized into the write-clock
domain before doing pointer comparison.
The full comparison is not as simple to do as the empty comparison. Pointers that are one bit larger than needed to
address the FIFO memory buffer are still used for the comparison, but simply using Gray code counters with an
extra bit to do the comparison is not valid to determine the full condition. The problem is that a Gray code is a
symmetric code except for the MSBs.
Figure 6 - Problems associated with extracting a 3-bit Gray code from a 4-bit Gray code (FIFO empty: rptr = 7 (0_100) and wptr = 7 (0_100); after one more write, wptr = 8 (1_100); in the second half of the 4-bit sequence the 3 LSBs form an inverse 3-bit Gray code: this is the problem!)
Consider the example shown in Figure 6 of an 8-deep FIFO. In this example, a 3-bit Gray code pointer is used to
address memory and an extra bit (the MSB of a 4-bit Gray code) is added to test for full and empty conditions. If the
FIFO is allowed to fill the first seven locations (words 0-6) and then if the FIFO is emptied by reading back the
same seven words, both pointers will be equal and will point to address Gray-7 (the FIFO is empty). On the next
write operation, the write pointer will increment the 4-bit Gray code pointer (remember, only the 3 LSBs are being
used to address memory), making the MSBs different on the 4-bit pointers but the rest of the write pointer bits will
match the read pointer bits, so the FIFO full flag would be asserted. This is wrong! Not only is the FIFO not full,
but the 3 LSBs did not change, which means that the addressed memory location will over-write the last FIFO
memory location that was written. This too is wrong!
This is one reason why the dual n-bit Gray code counter of Figure 4 and Section 4.0 is used.
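The Figure 6 failure can be reproduced with a naive Python comparison that treats the 4-bit Gray pointers as if they were binary (MSBs differ, 3 LSBs equal). `naive_full` is a deliberately wrong, hypothetical helper used only to show the problem.

```python
# Behavioral sketch; naive_full is intentionally WRONG for Gray pointers.
def bin2gray(b: int) -> int:
    return b ^ (b >> 1)


def naive_full(wg: int, rg: int) -> bool:
    # Treat a 4-bit Gray pointer like a binary one: MSBs differ, 3 LSBs equal.
    return (wg >> 3) != (rg >> 3) and (wg & 0b111) == (rg & 0b111)


# Figure 6 scenario: 7 words written and 7 read, so both pointers are at 7.
rptr = wptr = bin2gray(7)          # 0100
assert rptr == wptr                # FIFO empty, as expected

wptr = bin2gray(8)                 # one more write: 1100
print(naive_full(wptr, rptr))      # True, although the FIFO holds one word
print(format(bin2gray(7) & 0b111, "03b"), format(bin2gray(8) & 0b111, "03b"))
# 100 100  (the 3 LSBs, used as the memory address, did not change)
```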
The correct method to perform the full comparison is accomplished by synchronizing the rptr into the wclk
domain and then there are three conditions that are all necessary for the FIFO to be full:
(1) The wptr and the synchronized rptr MSBs are not equal (because the wptr must have wrapped
one more time than the rptr).
(2) The wptr and the synchronized rptr 2nd MSBs are not equal (because an inverted 2nd MSB from
one pointer must be tested against the un-inverted 2nd MSB from the other pointer, which is required if the
MSBs are also inverses of each other - see Figure 6 above).
(3) All other wptr and synchronized rptr bits must be equal.
In order to efficiently register the wfull output, the synchronized read pointer is actually compared against the
wgraynext (the next Gray code that will be registered in the wptr). This is shown below in the continuous assignment
and sequential always block that have been extracted from the wptr_full.v code of Example 7:
assign wfull_val = ((wgraynext[ADDRSIZE]     != wq2_rptr[ADDRSIZE])   &&
                    (wgraynext[ADDRSIZE-1]   != wq2_rptr[ADDRSIZE-1]) &&
                    (wgraynext[ADDRSIZE-2:0] == wq2_rptr[ADDRSIZE-2:0]));
always @(posedge wclk or negedge wrst_n)
  if (!wrst_n) wfull <= 1'b0;
  else         wfull <= wfull_val;
In the above code, the three necessary conditions to check for FIFO-full are tested and the result is assigned to the
wfull_val signal, which is then registered in the subsequent sequential always block.
The continuous assignment to wfull_val can be further simplified using concatenations as shown below:
assign wfull_val = (wgraynext=={~wq2_rptr[ADDRSIZE:ADDRSIZE-1],
                                 wq2_rptr[ADDRSIZE-2:0]});
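An exhaustive Python check, assuming ADDRSIZE = 3 (4-bit Gray pointers), compares both Verilog formulations against the binary definition of full, in which the write pointer is exactly one FIFO depth ahead of the read pointer. The function names are hypothetical; the bit manipulations mirror the three-condition and concatenation forms above.

```python
# Behavioral sketch with hypothetical names.
def bin2gray(b: int) -> int:
    return b ^ (b >> 1)


ADDRSIZE = 3                       # assumed: 8-word FIFO, 4-bit pointers
N = ADDRSIZE + 1
MASK = (1 << N) - 1
TOP2 = 0b11 << (ADDRSIZE - 1)      # bits [ADDRSIZE:ADDRSIZE-1]


def full_three_conditions(wg: int, rg: int) -> bool:
    diff = wg ^ rg
    msb_diff = ((diff >> ADDRSIZE) & 1) == 1           # condition (1)
    msb2_diff = ((diff >> (ADDRSIZE - 1)) & 1) == 1    # condition (2)
    rest_equal = (diff & ((1 << (ADDRSIZE - 1)) - 1)) == 0  # condition (3)
    return msb_diff and msb2_diff and rest_equal


def full_concat(wg: int, rg: int) -> bool:
    # wgraynext == {~wq2_rptr[ADDRSIZE:ADDRSIZE-1], wq2_rptr[ADDRSIZE-2:0]}
    return wg == ((rg ^ TOP2) & MASK)


# Check both forms against the binary definition of full.
for wbin in range(1 << N):
    for rbin in range(1 << N):
        truth = (wbin - rbin) % (1 << N) == 1 << ADDRSIZE
        wg, rg = bin2gray(wbin), bin2gray(rbin)
        assert full_three_conditions(wg, rg) == truth
        assert full_concat(wg, rg) == truth
print("both forms match the binary full definition")
```

Both forms reduce to one test: the pointers differ exactly in their two MSBs. Advancing a binary count by the FIFO depth toggles only its MSB, and the binary-to-Gray conversion turns that into a toggle of the top two Gray bits.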
5.3 Different clock speeds
Since asynchronous FIFOs are clocked from two different clock domains, the clocks generally run at
different frequencies. When synchronizing a pointer from a faster clock domain into a slower clock domain, there will be some count values
that are skipped due to the fact that the faster clock will semi-periodically increment twice between slower clock
edges. This raises the following two questions:
First question. Noting that a synchronized Gray code that increments twice but is only sampled once will
show multi-bit changes in the synchronized value, will this cause multi-bit synchronization problems?
The answer is no. Synchronizing multi-bit changes is only a problem when multiple bits are changing near the rising
edge of the synchronizing clock. The fact that a Gray code counter could increment twice (or more) between slower
synchronization clock edges means that the first Gray code change will occur well before the rising edge of the
slower clock and only the second Gray code transition could change near the rising clock edge. There is no multi-bit
synchronization problem with Gray code counters.
Second question. Again, noting that a faster Gray code counter could increment more than once between the
rising edge of a slower clock signal, is it possible that the Gray code counter from the faster clock domain
could increment to a full-state and to a full+1-state before full is detected, causing the FIFO to overflow
without recognizing that the FIFO was ever full? (This question similarly applies to FIFO empty).
Again, the answer is no using the implementation described in this paper. Consider first the generation of FIFO full.
The FIFO goes full when the write pointer catches up to the synchronized read pointer and the FIFO-full state is
detected in the write clock domain. If the wclk-domain is faster than the rclk-domain, the write pointer will
eventually catch up to the synchronized read pointer, the FIFO will be full, the wfull bit will be set and the FIFO
will quit writing until the synchronized read pointer advances again. The write pointer cannot advance past the
synchronized read pointer in the wclk-domain.
A similar examination of the empty flag shows that the FIFO goes empty when the read pointer catches up to the
synchronized write pointer and the FIFO-empty state is detected in the read clock domain. If the rclk-domain is
faster than the wclk-domain, the read pointer will eventually catch up to the synchronized write pointer, the FIFO
will be empty, the rempty bit will be set and the FIFO will quit reading until the synchronized write pointer
advances again. The read pointer cannot advance past the synchronized write pointer in the rclk-domain.
Using this implementation, assertion of “full” or “empty” happens exactly when the FIFO goes full or empty.
Removal of “full” and “empty” status is pessimistic.
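Both answers can be exercised with a small two-clock cycle model. The Python sketch below is a hypothetical behavioral simulation, not the paper's RTL: it assumes integer clock periods on a common time base, ideal (never metastable) 2-flop synchronizers, random 90% write and read requests, and an 8-word FIFO. Each written word carries a sequence number so that overflow, underflow or reordering trips an assertion.

```python
# Behavioral sketch with hypothetical names and idealized synchronizers.
import random


def bin2gray(b: int) -> int:
    return b ^ (b >> 1)


ADDRSIZE = 3                          # assumed 8-word FIFO
DEPTH, MOD = 1 << ADDRSIZE, 1 << (ADDRSIZE + 1)
TOP2 = 0b11 << (ADDRSIZE - 1)         # the two MSBs of an n-bit Gray pointer


def simulate(wperiod: int, rperiod: int, ticks: int, seed: int = 1) -> int:
    """Return the peak occupancy; assert on overflow, underflow or reordering."""
    rng = random.Random(seed)
    mem = [0] * DEPTH
    wbin = wptr = wq1 = wq2 = 0       # write domain; wq2 plays wq2_rptr
    rbin = rptr = rq1 = rq2 = 0       # read domain; rq2 plays rq2_wptr
    wfull, rempty = False, True
    sent = received = peak = 0
    for t in range(1, ticks + 1):
        old_wptr, old_rptr = wptr, rptr          # values before this tick's edges
        if t % wperiod == 0:                     # rising wclk
            wen = rng.random() < 0.9 and not wfull
            if wen:
                mem[wbin % DEPTH] = sent         # store a sequence number
                sent += 1
            wbin = (wbin + wen) % MOD
            wptr = bin2gray(wbin)                # wgraynext, now registered
            wfull = wptr == (wq2 ^ TOP2)         # full test on next pointer
            wq1, wq2 = old_rptr, wq1             # synchronize rptr into wclk
        if t % rperiod == 0:                     # rising rclk
            ren = rng.random() < 0.9 and not rempty
            if ren:
                assert mem[rbin % DEPTH] == received, "data out of order"
                received += 1
            rbin = (rbin + ren) % MOD
            rptr = bin2gray(rbin)                # rgraynext, now registered
            rempty = rptr == rq2                 # empty test on next pointer
            rq1, rq2 = old_wptr, rq1             # synchronize wptr into rclk
        occupancy = (wbin - rbin) % MOD
        assert occupancy <= DEPTH, "overflow or underflow"
        peak = max(peak, occupancy)
    return peak


print(simulate(wperiod=3, rperiod=7, ticks=10_000))   # faster writer
print(simulate(wperiod=7, rperiod=3, ticks=10_000))   # faster reader
```

The occupancy never exceeds the FIFO depth because each side compares its own next pointer against a synchronized copy of the other side's pointer, which may be stale but is always conservative.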
5.4 Pessimistic full & empty
The FIFO described in this paper has implemented full-removal and empty-removal using a “pessimistic” method.
That is, “full” and “empty” are both asserted exactly on time but removed late.
Since the write clock is used to generate the FIFO-full status and since FIFO-full occurs when the write pointer
catches up to the synchronized read pointer, full-detection is “accurate” and immediate. Removal of “full” status is
pessimistic because “full” comparison is being done with a synchronized read pointer. When the read pointer does
increment, the FIFO is no longer full, but the full-generation logic will not detect the change until two rising wclk
edges synchronize the updated rptr into the wclk domain. This is generally not a problem, since it means that the
Figure 3 by placing a second adder after the Gray-to-binary combinational logic to add four to the binary value and
register the result. This registered value would then be used to do subtraction against the synchronized rptr after it
has been converted to a binary value in the wclk domain, and if the difference is less than four, an almost_full
bit could be set. A less-than operation ensures that the almost_full bit is set for the full range when the wptr is
within 0-4 counts of catching up to the synchronized rptr. Similar logic could be used in the rclk-domain to
generate the almost_empty flag.
Almost full and almost empty have not been implemented in the Verilog RTL code shown in this paper.
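A minimal sketch of the almost_full idea described above follows. It is not part of the paper's RTL, and it swaps in a slightly different arrangement than the second-adder scheme: instead of registering the binary write pointer plus four, it converts the synchronized Gray read pointer to binary in the wclk domain, subtracts to get the number of stored words, and registers almost_full when MARGIN or fewer locations remain free. The module name almost_full_sketch, the MARGIN parameter, and the assumption that wptr_full's internal binary pointer wbin is brought out as a port are all hypothetical.

```verilog
// Hypothetical almost_full add-on for the wclk domain (not in the paper's RTL).
// Assumes wptr_full's binary write pointer wbin is brought out as a port.
module almost_full_sketch #(parameter ADDRSIZE = 4,   // FIFO depth = 2**ADDRSIZE
                            parameter MARGIN   = 4)   // assumed threshold (free words)
  (output reg              almost_full,
   input      [ADDRSIZE:0] wbin,       // binary write pointer (n bits)
   input      [ADDRSIZE:0] wq2_rptr,   // synchronized Gray read pointer
   input                   wclk, wrst_n);

  localparam [ADDRSIZE+1:0] DEPTH = 1 << ADDRSIZE;

  // Gray-to-binary: binary bit i is the XOR of Gray bits ADDRSIZE down to i
  function [ADDRSIZE:0] gray2bin(input [ADDRSIZE:0] g);
    integer i;
    begin
      for (i = 0; i <= ADDRSIZE; i = i + 1)
        gray2bin[i] = ^(g >> i);
    end
  endfunction

  wire [ADDRSIZE:0]   wq2_rbin   = gray2bin(wq2_rptr);
  // Difference modulo 2**(ADDRSIZE+1) = words currently stored (0..DEPTH)
  wire [ADDRSIZE:0]   used_words = wbin - wq2_rbin;
  wire [ADDRSIZE+1:0] free_words = DEPTH - used_words;

  // Registered flag: set when MARGIN or fewer locations remain free
  always @(posedge wclk or negedge wrst_n)
    if (!wrst_n) almost_full <= 1'b0;
    else         almost_full <= (free_words <= MARGIN);
endmodule
```

Because the comparison is made against the synchronized read pointer, removal of almost_full is pessimistic in the same way as removal of wfull. A mirror-image module in the rclk domain, comparing rbin against a synchronized and converted write pointer, could produce almost_empty.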
6.0 RTL code for FIFO Style #1
The Verilog RTL code for the FIFO style #1 model is listed in this section.
6.1 fifo1.v - FIFO top-level module
The top-level FIFO module is a parameterized FIFO design with all sub-blocks instantiated using the recommended
practice of doing named port connections. Another common coding practice is to give the top-level module
instantiations the same name as the module name. This is done to facilitate debug, since referencing module names
in hierarchical paths will be straightforward if the instance names match the module names.
module fifo1 #(parameter DSIZE = 8,
               parameter ASIZE = 4)
  (output [DSIZE-1:0] rdata,
   output             wfull,
   output             rempty,
   input  [DSIZE-1:0] wdata,
   input              winc, wclk, wrst_n,
   input              rinc, rclk, rrst_n);

  wire [ASIZE-1:0] waddr, raddr;
  wire [ASIZE:0]   wptr, rptr, wq2_rptr, rq2_wptr;

  // ASIZE is passed to the synchronizers so their widths track the pointers
  sync_r2w #(ASIZE) sync_r2w (.wq2_rptr(wq2_rptr), .rptr(rptr),
                              .wclk(wclk), .wrst_n(wrst_n));

  sync_w2r #(ASIZE) sync_w2r (.rq2_wptr(rq2_wptr), .wptr(wptr),
                              .rclk(rclk), .rrst_n(rrst_n));

  fifomem #(DSIZE, ASIZE) fifomem
    (.rdata(rdata), .wdata(wdata),
     .waddr(waddr), .raddr(raddr),
     .wclken(winc), .wfull(wfull),
     .wclk(wclk));

  rptr_empty #(ASIZE) rptr_empty
    (.rempty(rempty),
     .raddr(raddr),
     .rptr(rptr), .rq2_wptr(rq2_wptr),
     .rinc(rinc), .rclk(rclk),
     .rrst_n(rrst_n));

  wptr_full #(ASIZE) wptr_full
    (.wfull(wfull), .waddr(waddr),
     .wptr(wptr), .wq2_rptr(wq2_rptr),
     .winc(winc), .wclk(wclk),
     .wrst_n(wrst_n));
endmodule
Example 2 - Top-level Verilog code for the FIFO style #1 design
6.2 fifomem.v - FIFO memory buffer
The FIFO memory buffer is typically an instantiated ASIC or FPGA dual-port, synchronous memory device. The
memory buffer could also be synthesized to ASIC or FPGA registers using the RTL code in this module.
Regarding an instantiated vendor RAM versus a Verilog-declared RAM, the Synopsys DesignWare team did internal
analysis and found that for sizes up to 256 bits, there is no lost area or performance using the Verilog-declared
RAM compared to an instantiated vendor RAM.
If a vendor RAM is instantiated, it is highly recommended that the instantiation be done using named port
connections.
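module fifomem #(parameter DATASIZE = 8, // Memory data word width
                 parameter ADDRSIZE = 4) // Number of mem address bits
  (output [DATASIZE-1:0] rdata,
   input  [DATASIZE-1:0] wdata,
   input  [ADDRSIZE-1:0] waddr, raddr,
   input                 wclken, wfull, wclk);

`ifdef VENDORRAM
  // instantiation of a vendor's dual-port RAM
  vendor_ram mem (.dout(rdata), .din(wdata),
                  .waddr(waddr), .raddr(raddr),
                  .wclken(wclken),
                  .wclken_n(wfull), .clk(wclk));
`else
  // RTL Verilog memory model
  localparam DEPTH = 1<<ADDRSIZE;
  reg [DATASIZE-1:0] mem [0:DEPTH-1];

  // Asynchronous read; write only when enabled and the FIFO is not full
  assign rdata = mem[raddr];
  always @(posedge wclk)
    if (wclken && !wfull) mem[waddr] <= wdata;
`endif
endmodule

  assign rgraynext = (rbinnext>>1) ^ rbinnext;
  //---------------------------------------------------------------
  // FIFO empty when the next rptr == synchronized wptr or on reset
  //---------------------------------------------------------------
  assign rempty_val = (rgraynext == rq2_wptr);

  always @(posedge rclk or negedge rrst_n)
    if (!rrst_n) rempty <= 1'b1;
    else         rempty <= rempty_val;
endmodule
Example 6 - Verilog RTL code for the read pointer and empty flag logic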
6.6 wptr_full.v - Write pointer & full generation logic
This module encloses all of the FIFO logic that is generated within the write clock domain (except synchronizers).
The write pointer is a dual n-bit Gray code counter. The n-bit pointer (wptr) is passed to the read clock domain
through the sync_w2r module. The (n-1)-bit pointer (waddr) is used to address the FIFO buffer.
The FIFO full output is registered and is asserted on the next rising wclk edge when the next write pointer value
(wgraynext) equals the synchronized read pointer (wq2_rptr) in every bit except the two MSBs, which must both
differ. All module outputs are registered for simplified synthesis using time budgeting. This module is entirely
synchronous to the wclk for simplified static timing analysis.
module wptr_full #(parameter ADDRSIZE = 4)
  (output reg                wfull,
   output     [ADDRSIZE-1:0] waddr,
   output reg [ADDRSIZE  :0] wptr,
   input      [ADDRSIZE  :0] wq2_rptr,
   input                     winc, wclk, wrst_n);

  reg  [ADDRSIZE:0] wbin;
  wire [ADDRSIZE:0] wgraynext, wbinnext;
  wire              wfull_val;

  // GRAYSTYLE2 pointer
  always @(posedge wclk or negedge wrst_n)
    if (!wrst_n) {wbin, wptr} <= 0;
    else         {wbin, wptr} <= {wbinnext, wgraynext};

  // Memory write-address pointer (okay to use binary to address memory)
  assign waddr     = wbin[ADDRSIZE-1:0];
  assign wbinnext  = wbin + (winc & ~wfull);
  assign wgraynext = (wbinnext>>1) ^ wbinnext;

  //------------------------------------------------------------------
  // Simplified version of the three necessary full-tests:
  // assign wfull_val=((wgraynext[ADDRSIZE]    !=wq2_rptr[ADDRSIZE]  ) &&
  //                   (wgraynext[ADDRSIZE-1]  !=wq2_rptr[ADDRSIZE-1]) &&
  //                   (wgraynext[ADDRSIZE-2:0]==wq2_rptr[ADDRSIZE-2:0]));
  //------------------------------------------------------------------
  assign wfull_val = (wgraynext=={~wq2_rptr[ADDRSIZE:ADDRSIZE-1],
                                   wq2_rptr[ADDRSIZE-2:0]});

  always @(posedge wclk or negedge wrst_n)
    if (!wrst_n) wfull <= 1'b0;
    else         wfull <= wfull_val;
endmodule
Example 7 - Verilog RTL code for the write pointer and full flag logic
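The two full-test forms in Example 7 can be checked against each other with a small combinational module. This is a hypothetical checking aid, not code from the paper: the module name full_tests and the port names full3 and full1 are illustrative. full3 spells out the three commented tests, and full1 is the single compare with the two MSBs of wq2_rptr inverted, as used in wptr_full. Both should be true exactly when the binary write and read pointers differ by the FIFO depth, 2**ADDRSIZE.

```verilog
// Hypothetical checking aid (not part of the paper's RTL): compares the
// three commented full-tests of Example 7 with the single concatenation
// compare actually used there. Requires ADDRSIZE >= 2.
module full_tests #(parameter ADDRSIZE = 4)
  (input  [ADDRSIZE:0] wgraynext,  // next write pointer, Gray code
   input  [ADDRSIZE:0] wq2_rptr,   // synchronized read pointer, Gray code
   output              full3,      // three explicit tests
   output              full1);     // one compare with the two MSBs inverted

  // MSBs differ, second MSBs differ, all remaining bits equal
  assign full3 = (wgraynext[ADDRSIZE]     != wq2_rptr[ADDRSIZE])   &&
                 (wgraynext[ADDRSIZE-1]   != wq2_rptr[ADDRSIZE-1]) &&
                 (wgraynext[ADDRSIZE-2:0] == wq2_rptr[ADDRSIZE-2:0]);

  // Same condition written the way Example 7 writes it
  assign full1 = (wgraynext == {~wq2_rptr[ADDRSIZE:ADDRSIZE-1],
                                 wq2_rptr[ADDRSIZE-2:0]});
endmodule
```

The reason the two MSBs are the ones that differ: adding 2**ADDRSIZE to an (ADDRSIZE+1)-bit binary pointer flips only its MSB, and since each Gray bit below the MSB is the XOR of two adjacent binary bits, that single binary flip changes exactly the top two Gray bits and leaves the rest unchanged.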
7.0 Comparing Gray code pointers to binary pointers
As mentioned in section 2.3, binary pointers can be used to do FIFO design if the pointers are sampled and
handshaking control signals are used between the two clock domains to safely pass the sampled binary count values.
Some advantages of using binary pointers over Gray code pointers:
* The technique of sampling a multi-bit value into a holding register and using synchronized handshaking control
signals to pass the multi-bit value into a new clock domain can be used for passing ANY arbitrary multi-bit
value across clock domains. This technique can be used to pass FIFO pointers or any multi-bit value.
* Each synchronized Gray code pointer requires 2n flip-flops (2 per pointer bit). The sampled multi-bit register
requires 2n+4 flip-flops (1 per holding register bit in each clock domain, 2 flip-flops to synchronize a ready bit
and 2 flip-flops to synchronize an acknowledge bit). There is no appreciable difference in the chance that either
pointer style would experience metastability.
* The sampled multi-bit binary register allows arbitrary pointer changes. Gray code pointers can only increment
and decrement.
* The sampled multi-bit register technique permits arbitrary FIFO depths; whereas, a Gray code pointer requires
power-of-2 FIFO depths. If a design required a FIFO depth of at least 132 words, using a standard Gray code
pointer would employ a FIFO depth of 256 words. Since most instantiated dual-port RAM blocks are power-of-2
words deep, this may not be an issue.
* Using binary pointers makes it easy to calculate “almost-empty” and “almost-full” status bits using simple
binary arithmetic between the pointer values.
One small disadvantage to using binary pointers over Gray code pointers is:
* Sampling and holding a binary FIFO pointer and then handshaking it across a clock boundary can delay the
capture of new samples by at least two clock edges from the receiving clock domain and another two clock
edges from the sending clock domain. This latency is generally not a problem but it will typically add more
pessimism to the assertion of full and empty and might require additional FIFO depth to compensate for the
added pessimism. Since most FIFOs are typically specified with excess depth, it is not likely that extra registers
or a larger dual-port FIFO buffer size would be required.
The above comparison is worthy of consideration when selecting a method to implement a FIFO design.
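The sampled-register technique discussed in this section can be sketched as a toggle request/acknowledge handshake. This is an illustration under stated assumptions, not code from the paper: the module name sampled_xfer, its ports, and the choice of toggle signaling are hypothetical. The source captures the binary value into a holding register and toggles areq; the destination synchronizes areq through two flip-flops, copies the now-stable holding register, and toggles back, which is synchronized back to the source through two more flip-flops. Besides the 2n+4 flip-flops counted above, this version uses one toggle flip-flop in each domain.

```verilog
// Hypothetical sketch (not from the paper): pass a sampled binary value
// from aclk to bclk using a toggle req/ack handshake.
module sampled_xfer #(parameter W = 5)
  (input              aclk, arst_n,
   input      [W-1:0] adata,      // e.g. a binary FIFO pointer in aclk
   input              bclk, brst_n,
   output reg [W-1:0] bdata);     // sampled copy in the bclk domain

  // ---- aclk domain ----
  reg [W-1:0] ahold;              // holding register, stable while busy
  reg         areq;               // request toggle
  reg         aack_s1, aack_s2;   // 2-flop synchronizer for the ack
  // ---- bclk domain ----
  reg         breq_s1, breq_s2;   // 2-flop synchronizer for the request
  reg         back;               // ack toggle (= last request accepted)

  // Source: once the previous transfer is acknowledged, sample and request
  always @(posedge aclk or negedge arst_n)
    if (!arst_n) begin
      ahold   <= {W{1'b0}}; areq    <= 1'b0;
      aack_s1 <= 1'b0;      aack_s2 <= 1'b0;
    end else begin
      {aack_s2, aack_s1} <= {aack_s1, back};
      if (areq == aack_s2) begin  // idle: last request acknowledged
        ahold <= adata;
        areq  <= ~areq;
      end
    end

  // Destination: on a new request, copy the stable holding register and ack
  always @(posedge bclk or negedge brst_n)
    if (!brst_n) begin
      bdata   <= {W{1'b0}}; back    <= 1'b0;
      breq_s1 <= 1'b0;      breq_s2 <= 1'b0;
    end else begin
      {breq_s2, breq_s1} <= {breq_s1, areq};
      if (breq_s2 != back) begin
        bdata <= ahold;           // ahold has not changed since areq toggled
        back  <= breq_s2;
      end
    end
endmodule
```

Each transfer takes roughly two receiving-clock edges plus two sending-clock edges before the next sample can be captured, so a pointer passed this way is always a slightly old value, consistent with the added pessimism described above.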
8.0 Conclusions
Asynchronous FIFO design requires careful attention to details from pointer generation techniques to full and empty
generation. Ignorance of important details will generally result in a design that is easily verified but is also wrong.
Finding FIFO design errors typically requires simulation of a gate-level FIFO design with backannotation of actual
delays and a whole lot of luck!
Synchronization of FIFO pointers into the opposite clock domain is safely accomplished using Gray code pointers.
Generating the FIFO-full status is perhaps the hardest part of a FIFO design. Dual n-bit Gray code counters are
valuable to synchronize an n-bit pointer into the opposite clock domain and to use an (n-1)-bit pointer to do “full”
comparison. Synchronizing binary FIFO pointers using techniques described in section 7.0 is another worthy
technique to use when doing FIFO design.
Generating the FIFO-empty status is easily accomplished by comparing-equal the n-bit read pointer to the
synchronized n-bit write pointer.
The techniques described in this paper should work with asynchronous clocks spanning small to large differences in
speed.
Careful partitioning of the FIFO modules along clock boundaries with all outputs registered can facilitate synthesis
and static timing analysis within the two asynchronous clock domains.
9.0 DesignWare FIFOs
It should be mentioned that DesignWare (DW) has a number of FIFO implementations that can be instantiated into
a design. It should also be noted that the DW FIFOs have not always been bug-free.
For additional documentation, go to SolvNet and search on "FIFO STAR" and you will find STAR 104287 and
STAR 105016 related to the FIFO DW components and the DW_16550 UART. All of these bugs had to do with the
DW FIFOs and FIFO sections of the UART. The DesignWare-110html says that the bugs are fixed in the 1299-3
patch (December 1999).
There are too many ways to do a FIFO design wrong, and I consider it very risky to rely on the DW FIFO components
being absolutely correct without more details on how they were designed. Unless I could verify that IP
designers followed the important FIFO design guidelines outlined in this paper, I would be inclined to code my own
FIFO designs.
10.0 Acknowledgements
I am grateful to Ben Cohen for his willingness to discuss FIFO design issues with me in preparation for writing this
paper. I would also like to thank Peter Alfke of Xilinx for also discussing with me alternate interesting approaches
to FIFO design.
A special thanks to Steve Golson for doing a great review of the paper on short notice and adding the valuable
information, techniques and advantages related to using binary pointers in FIFO design in place of the Gray code
pointers. Also for finding the original patent information on Frank Gray's “Pulse Code Communication.”
11.0 Additional Post-SNUG Editorial Comments
A second FIFO paper, voted “Best Paper - 1st Place” by SNUG attendees, is listed as reference [3] and is also
available for download.
Many of the techniques used in the second FIFO paper [3] can also be used in the FIFO1 design. In particular, the
“dual n-bit counter” of the FIFO design can be replaced with the quadrant detection logic described in the second
FIFO paper. The FIFO1 Gray code counter style #1 can also be replaced with the faster Gray code counter style #2