EECS150: Lab 2, Mapping Circuit Elements to FPGAs
1 Time Table
2 Objectives
In this lab, you will be working with Verilog HDL (hardware description language) to architect a practical
circuit. After designing your circuit, you will debug and verify it using a hardware-based test harness.
Finally, after your circuit is working correctly, you will conduct a resource and timing analysis that will
show you exactly how your design was actually implemented on the FPGA.
Through conducting these tests, you will gain experience working with non-trivial circuits on real
hardware. Additionally, you will learn about design partitioning, the process where primitive gates and
flip-flops in an HDL, like Verilog, are mapped down to primitive elements on the FPGA. Lastly, you
will learn how to use hardware-based test harnesses, along with various tools, to help you debug your
designs.
1. Ease of editing, since files can be written using any text editor
1 Specifically, we are referring to the .ncd file that FPGA Editor works with.
Figure 1 Structural Verilog −→ .bit file tool flow.
[Figure: tool flow diagram; surviving labels are "Optional step" and "FPGA Editor".]
In this class we will default to using basic text editors to write Verilog. Fancier editors are available,
and in fact are included with the CAD tools such as Xilinx ISE and ModelSim; however these tools are
slow and will often hinder you. In this lab, you will only be using a small subset of Verilog
called Structural Verilog. Specifically, you will be designing down to primitive gates (such as and and
or) and flip-flops.
3.3.1 Translate
Translate takes as input a netlist file from the design partitioning tools and outputs a Xilinx database file,
which is the same thing as the netlist, reduced to logic elements expressed in terms that Xilinx-specific
devices can understand.
3.3.2 Map
Map takes as input the database file which was output from Translate and ‘maps’ it to a specific Xilinx
FPGA. This is necessary because different FPGAs have different architectures, resources, and components.
3.3.3 Placement
Placement takes as input the result of the “Map” step and determines exactly where, physically, on the
FPGA each LUT, flip-flop, and logic gate should be placed. For example, a 6LUT implementing the
function of a 6-input NAND gate in a netlist could be placed in any of the 69,120 6LUTs in a Xilinx
Virtex5 xc5vlx110t FPGA chip. Clever choice of placement will make the subsequent routing easier and
result in a circuit with less delay.
3.3.4 Routing
Once the components are placed, the proper connections must be made. This step is called routing,
because the tools must choose, for each signal, one of the millions of paths to get that signal from its
source to its destination.
Because the number of possible paths for a given signal is very large, and there are many signals, this
is typically the most time-consuming part of implementing a design, aside from specification. Planning
your design well and making it compact and efficient will significantly reduce how long this step takes.
Designing your circuit well can cut the time it takes to route from 30 min to 30 sec.
4 Lab Prerequisites
Before reading any datasheets or writing any Verilog, read the entirety of this section. It contains a
detailed description of the Verilog constructs that you will be allowed to use as well as of the circuit you
will be designing. Additionally, it provides a soft introduction to the Xilinx primitives that your design
will need to instantiate. If you have any questions regarding the material in this section, be sure to ask
a TA ahead of time.
(a) and
(b) or
(c) not
(d) xor
Additionally, although they are not strictly Structural Verilog constructs, you will be using Verilog
generate and parameter statements.
If any of the constructs mentioned above are unclear or you feel that you need to brush
up on Verilog in general, please refer to your text or to the code examples provided in the
lab (specifically ALU.v and Mux21.v). With regards to using any source to learn Verilog, remember
that you are only allowed to instantiate primitive gates, modules and wires in this lab. In general, the
only allowed syntax for this lab can be found in one or both of ALU.v and Mux21.v.
[Figure 2: Accumulator block diagram with labels Result, ALU, In, ALUOp and Clock.]
As shown in Figure 2, the Accumulator has a Clock, a standard data input In and an ALUOp (see footnote 2). In
is a new value from outside the Accumulator which is to be processed by the ALU. ALUOp indicates the
operation (and, xor or + for example) that the ALU will perform. The other input to the ALU is the
output of a register which captures the result of the ALU from the cycle before. Thus, when the ALUOp
is set to addition, the Accumulator ‘accumulates,’ hence the name.
We will be implementing our ALU using a bit-slice approach. A bit-sliced ALU is an N-bit ALU
implemented as a chain of N 1-bit ALUs. Bit-sliced ALUs are very efficiently mapped to FPGAs because each 1-bit
ALU maps to a small number of FPGA resources. Additionally, by implementing your ALU as a chain
of 1-bit ALUs, it will be easy to use Verilog generate statements to parameterize your ALU to N-bits
(where you can choose the N). An example of using generate to build an N-bit ALU is shown pictorially
in Figure 3.
2 Accumulators do not typically have an ALUOp as they typically use a plain Adder circuit. Our Accumulator has an
ALUOp because it uses a full-blown ALU.
Figure 3 Extending a bit-slice ALU out to N-bits using generate
[Figure: three ALUBitSlice instances (bits 0 to 2) inside a ‘generate’ block. Each slice has inputs A[i], B[i] and CarryIn and an output CarryOut; the 3-bit ALUOp is shared by the slices, and the slice outputs form Result.]
As you will soon come to appreciate, the ease of changing the ALU’s bit-width is extremely useful
for performing experiments. Specifically, instead of recoding an ALU of a different width, you can just
change a parameter that sets the width and the generate statement will take care of rebuilding the circuit
automatically.
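The idea of resizing a circuit by changing one parameter can be sketched with a simpler circuit than the ALU. The example below is only an illustration under stated assumptions: the module names FullAdderBit and RippleAdderSketch, and their ports, are hypothetical and are not the framework's ALU or ALUBitSlice. It shows a generate loop instantiating one 1-bit slice per bit and chaining the carries.

```verilog
// Hypothetical 1-bit slice: a full adder built only from primitive gates.
module FullAdderBit(A, B, CarryIn, Sum, CarryOut);
    input  A, B, CarryIn;
    output Sum, CarryOut;
    wire   AxorB, AandB, CandAxorB;

    xor x0(AxorB, A, B);
    xor x1(Sum, AxorB, CarryIn);
    and a0(AandB, A, B);
    and a1(CandAxorB, AxorB, CarryIn);
    or  o0(CarryOut, AandB, CandAxorB);
endmodule

// Hypothetical N-bit adder: 'Width' copies of the slice chained by carries.
module RippleAdderSketch(A, B, Sum, CarryOut);
    parameter Width = 4;            // change this to resize the whole circuit
    input  [Width-1:0] A, B;
    output [Width-1:0] Sum;
    output CarryOut;
    wire   [Width:0] Carry;         // Carry[i] is the carry into bit i

    buf b0(Carry[0], 1'b0);         // no carry into the least significant bit
    buf b1(CarryOut, Carry[Width]); // carry out of the most significant bit

    genvar i;
    generate
        for (i = 0; i < Width; i = i + 1) begin : Slice
            FullAdderBit fa(.A(A[i]), .B(B[i]), .CarryIn(Carry[i]),
                            .Sum(Sum[i]), .CarryOut(Carry[i+1]));
        end
    endgenerate
endmodule
```

Changing Width from 4 to, say, 16 rebuilds the chain with 16 slices without touching the loop body.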
You will find the following Verilog written for you (feel free to modify it, but do not modify the port
interface on the ALU):
The N-bit ALU will show you exactly how the ALUBitSlice modules are used (as it instantiates them)
and how to use a Verilog generate statement to parameterize your circuits. You can extend this concept
to the Accumulator, which requires the same support, but is not done for you. Keep in mind, you must
still incorporate the flip-flop FDRSE primitive element in your Accumulator. This has not
been provided and you are expected to become acquainted with the Xilinx documentation
(see PreLab Part 1b) on how to use and instantiate the FDRSE.
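As a starting point for reading that documentation, here is a minimal sketch of instantiating FDRSE once per bit with a generate loop. The port names (Q, C, CE, D, R, S) and the INIT parameter follow the Xilinx libraries guide; the wrapper module RegisterSketch, its ports, and the choice to tie CE high and S low are assumptions made for illustration, not the required Accumulator design. The primitive itself comes from the Xilinx libraries, so this only elaborates inside the Xilinx tool flow or with the Xilinx simulation library.

```verilog
// Hypothetical Width-bit register made from FDRSE primitives.
module RegisterSketch(Clock, Reset, In, Out);
    parameter Width = 4;
    input  Clock, Reset;
    input  [Width-1:0] In;
    output [Width-1:0] Out;

    genvar i;
    generate
        for (i = 0; i < Width; i = i + 1) begin : Bit
            FDRSE #(.INIT(1'b0)) ff(
                .Q(Out[i]),   // registered output
                .C(Clock),    // clock
                .CE(1'b1),    // clock enable, tied high in this sketch
                .D(In[i]),    // data input
                .R(Reset),    // synchronous reset
                .S(1'b0)      // synchronous set, unused in this sketch
            );
        end
    endgenerate
endmodule
```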
Aside from its interface, which is specified by the framework because it is instantiated in ALU,
ALUBitSlice’s implementation is entirely up to you. Take note, however, that part of the ALUBitSlice’s
interface is exactly what ALUOp encoding it should support (see Table 1). If your ALUBitSlice doesn’t
follow this ALUOp encoding, it will produce different results when run alongside the Tester.
As you probably notice in Table 1, our ALU supports many more functions than a mere 2:1 mux can
choose between. You will have to find a way to join 2:1 muxes together in order to create a mux large
enough to support all of the operations shown in Table 1. Feel free to use Mux21 as a starting point. It
has been provided as an example of what Verilog constructs/syntax you are allowed to use
in the rest of your design.
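One common way to join 2:1 muxes into a larger mux is a tree, sketched below for the 4:1 case. The modules Mux21Sketch and Mux41Sketch and their port names are hypothetical; the provided Mux21 may use a different interface, and the number of inputs your ALU needs depends on Table 1.

```verilog
// Hypothetical 2:1 mux from primitive gates: Out = A when Sel is 0, B when Sel is 1.
module Mux21Sketch(A, B, Sel, Out);
    input  A, B, Sel;
    output Out;
    wire   NotSel, PickA, PickB;

    not n0(NotSel, Sel);
    and a0(PickA, A, NotSel);
    and a1(PickB, B, Sel);
    or  o0(Out, PickA, PickB);
endmodule

// Hypothetical 4:1 mux as a two-level tree of 2:1 muxes: Out = In[Sel].
module Mux41Sketch(In, Sel, Out);
    input  [3:0] In;
    input  [1:0] Sel;
    output Out;
    wire   Low, High;

    // First level: Sel[0] picks within each pair of inputs.
    Mux21Sketch m0(.A(In[0]), .B(In[1]), .Sel(Sel[0]), .Out(Low));
    Mux21Sketch m1(.A(In[2]), .B(In[3]), .Sel(Sel[0]), .Out(High));
    // Second level: Sel[1] picks between the two pairs.
    Mux21Sketch m2(.A(Low), .B(High), .Sel(Sel[1]), .Out(Out));
endmodule
```

Adding a third level in the same way gives an 8:1 mux controlled by a 3-bit select.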
5 PreLab
Please make sure to complete the prelab before you attend your lab section. This week’s lab will be very
long and frustrating if you do not do the prelab ahead of time.
1. Reading
(a) Read Sections 3 and 4 and ask questions ahead of time if anything is unclear.
• Pay particular attention to Section 4.3 as it will tell you what you have to implement for
PreLab Part 2a specifically.
(b) Read the “FDRSE” section in the Virtex-5 Libraries Guide for HDL Designs.
(a) Write all the Verilog specified in Section 4.3 ahead of time.
• Since Verilog is nothing more than a bunch of standard text in a file with a *.v extension,
you can complete this part in your favorite text editor (we recommend emacs in Verilog
mode or Notepad++).
• Don’t worry about debugging your Verilog. The first main part of the lab constitutes
debugging your design.
3. Questions: Answer all questions on the Check-off sheet of this lab packet.
6 Lab Procedure
This section, and those beyond, assume that you have coded the Structural Accumulator.
Before we analyze our circuit, we have to make sure it works correctly. To debug it properly, we will
use several tools. First, we will run Synplify Pro (see Figure 1) to check our Verilog for syntax errors.
Next, we will use the Synplify RTL Schematic to produce a gate and register level diagram of our
circuit. This will allow us to inspect our design for obvious errors which are harder to see in Verilog
code. Finally, we will actually test our circuit on actual hardware against a test harness that is also built
in hardware.
6.1 Circuit Debugging
As our first step, we must resolve trivial typos and remaining bugs in our circuit. To accomplish this,
we must first set up a Xilinx ISE Project so that the tools are invoked properly.
Xilinx ISE, as shown in Figure 4, will allow you to manage your files and invoke the various CAD
tools from a central location.
Before continuing, take a moment to orient yourself with the Xilinx ISE IDE. In the upper left is the
“Sources” box, where you can see all the modules that are part of your project, as well as which modules
they depend on (or test) and which files they are in. In the middle left, you can see the “Processes”
box, which will show all of the tools which can be applied to the currently selected source file.
You might have noticed that several Verilog files you had no part in modifying (the Verilog in the
/Framework directory) have appeared in the Sources area. These include FPGA_TOP_ML505 and its children:
several TestHarness modules. FPGA_TOP_ML505 is (by default) the top level Verilog file in your design. It
contains the assignments for various pins on the chip to wires that you can use in your design. Inside
FPGA_TOP_ML505 you will find several TestHarness modules, and only inside them will you find your actual
Accumulator or ALU. Working with the TestHarness modules will be the focus of Section 6.1.3.
Figure 4 A Complete Project
3 The directory doesn’t matter; however, we will use this one for the remainder of this tutorial.
Before you run any tests on hardware, however, you must reconcile any syntax errors in your Verilog
handiwork.
1. Select Implementation from the Sources for: pull down in the Sources box.
2. In Xilinx ISE select FPGA_TOP_ML505 from the Sources box.
• This will cause a long list of implementation steps to appear in the Processes box.
4. Double-click Synthesize - Synplify Pro.
• This will run the Design Partitioning tools on your design.
• If there is an X or a ! next to the Synthesize - Synplify Pro step, this means that there
has been an error or warning.
• To see the errors and warnings from Synplify Pro, double-click the Synthesize -
Synplify Pro → View Synthesis Report step.
Once you find that you have errors in your Synthesis Report (you probably will), you must go
through the Synthesis Report and fix them. This can be very daunting as the Synthesis Report is quite
dense. Program 1 shows an example fragment from a Synthesis Report.
Consider the first line. @N denotes the entry type (there is one entry per line and they may be @W
warnings, @E errors or @N notes). C:\Test.v denotes the path to the module that is responsible for
throwing the entry. 21 (the left-most number) denotes the line number in the parent module. Lastly,
the text at the far right offers a brief summary of the entry.
If running Synplify Pro fails, it is because you have @E or error entries in your design. You must
fix these errors and rerun Synplify Pro, repeating as needed, until it no longer reports any
errors. Don’t worry about @W or @N. You are guaranteed to get them, and they will mostly be benign
in the case of this circuit.
Once Synplify Pro reports no errors in your design, it is
time to look at the schematic of your circuit for visually aided debugging.
1. To view a schematic of the circuit double-click on the Synthesize - Synplify Pro → Launch
Tools → View RTL Schematic step.
(a) This will launch Synplify Pro and automatically open the RTL Schematic.
(b) Navigate through the schematic using the Synplify Navigation Bar (see Figure 5).
• You can look inside modules using the “Push/Pop” Navigation Bar buttons.
[Figure 5: Synplify Navigation Bar, with the Select Tool and the Push/Pop button (click a module to see inside) labeled.]
Does your circuit look correct? Remember that the first step in the design entry process is to come
up with a design that can be scribbled on a piece of paper. The RTL Schematic should have replicated
your vision to the gate and register. Look carefully for wires that you think should be connected
in your design but are disconnected or connected incorrectly. This is often due to misspelling
a wire in your circuit (see footnote 4).
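The misspelled-wire problem can be illustrated with a small sketch. The module TypoSketch and its signal names are hypothetical. Because Verilog implicitly declares an undeclared name used in an instance connection as a 1-bit wire, the typo below produces no error; the or gate is simply fed by a wire that nothing drives, which is exactly the kind of disconnection that stands out in the RTL Schematic.

```verilog
// Hypothetical example of a silent bug caused by a misspelled wire name.
module TypoSketch(A, B, C, Out);
    input  A, B, C;
    output Out;
    wire   AandB;

    and a0(AandB, A, B);
    // Typo: 'AandB' is misspelled as 'AadnB'. An implicit 1-bit wire named
    // 'AadnB' is created with no driver, so the output of gate a0 is never used.
    or  o0(Out, AadnB, C);
endmodule
```

Verilog-2001 also defines the `default_nettype none directive, which turns such implicit wires into errors; whether it works cleanly with the framework files in this lab is not something this handout states, so treat it as an optional experiment.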
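[Figure: TestHarness block diagram. Inputs A and the 3-bit ALUOp feed both the TA Circuit and the Circuit under Test (‘CUT’); a comparator (==) produces Success or Error.]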
By default, the TestHarness circuits offer very little visibility into why your design fails if your design
does indeed fail. In order to make the TestHarness modules more useful, you are allowed to modify them
to fit your debugging needs. Specifically, you may add any Structural Verilog to FPGA_TOP_ML505
or to the TestHarness modules. For example, you can connect LEDs up to various wires in each
TestHarness for added visibility into why your design fails (footnote 6).
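A minimal sketch of this kind of modification, using only structural constructs, is shown below. The module DebugLEDSketch and the names Success, Error and LED are hypothetical; the actual wire and LED pin names must be taken from FPGA_TOP_ML505 and the TestHarness modules.

```verilog
// Hypothetical debug helper: copy two status wires onto two LED outputs.
module DebugLEDSketch(Success, Error, LED);
    input  Success, Error;
    output [1:0] LED;

    buf b0(LED[0], Success);  // LED 0 follows Success
    buf b1(LED[1], Error);    // LED 1 follows Error
endmodule
```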
When you finish making your modifications, it is time to push your Verilog from Synplify Pro to the
FPGA. In Xilinx ISE, this process is fairly straightforward, but is better broken up into two parts. First,
in order to properly synthesize a black box, such as the ALUBehavioralBitSlice.edf file we have given
you, you must take a few extra steps before running the tools that will push your design to hardware:
1. Make sure to add the following shell Verilog file(s) to your project:
(a) ALUBehavioralBitSlice.v
(b) AccumulatorBehavioralBitSlice.v
(a) Make sure FPGA_TOP_ML505 is highlighted in the Sources Box.
(b) Right-Click on Implement Design in the Processes Box.
(c) Go to the Translate Properties tab.
(d) Set the Macro Search Path to the exact directory where your copy(ies) of the black box .edf
files reside.
3. Your project should now be able to build with black box files properly.
4 Synplify Pro will not normally throw an error if your design contains wires that are spelled incorrectly.
It is a part of the Verilog standard to initialize unknown wires as 1-bit wires. This will cause many a bug in your circuits
throughout this semester. Using the RTL Schematic to quickly pinpoint disconnected wires will save you hours
over the course of the project.
5 In order to make this assignment realistic we have given you an EDIF black box for our implementations of
FPGA_TOP_ML505.
Second, and at last: you must invoke all of the tools on your design to bring your Verilog to life:
1. In Xilinx ISE make sure FPGA_TOP_ML505 is still selected in the Sources box.
2. Double-click Synthesize - Synplify Pro and fix any errors that it reports back through the
Synthesis Report.
3. Invoke the Xilinx Place And Route tools by double-clicking on the Implement Design step
in the Processes box.
• This will run three sub tools: Translate, Map and PAR.
• Ignore any warnings from these steps only in this lab. They will often give warnings that can
be safely ignored.
In the future, you don’t have to explicitly click all of the above steps in order to run the tools. If you
click Configure Target Device, all of the tools that it depends on will run automatically.
As you find bugs in your design and have to make changes, keep in mind the time it takes to generate
a schematic versus the time it takes to verify on hardware. It is sometimes, for a design of this size, very
easy to find a bug in an RTL Schematic. Additionally, you do not need to run all of the tools and then
configure the board to see the schematic. Consider these development time tradeoffs when debugging
your circuit: you will save yourself a lot of time!
3. SLICE Registers (used as flip-flops) (see check-off Question 2c).
Based on your thought process developed in the PreLab, you can probably derive a decent guess for
these numbers right now. To verify your guess, we will use the generate block that we used to implement
our Accumulator to change the width of the Accumulator without rewriting any Verilog.
1. In the Sources box, right-click on Accumulator which is nested underneath FPGA_TOP_ML505 and
select Set as Top Module.
• This will tell the tools to only PPR our Accumulator, as opposed to the Tester and any other
baggage in FPGA_TOP_ML505.
• We can now get an accurate resource estimate of only the resources taken up by the Accumulator.
2. In Accumulator.v, modify the value assigned to the parameter called Width and rerun the tools to
find out how many resources your Accumulator takes up.
In order to test your resource consumption theory, you will tweak Width in Step 2, above, until you
can come up with a generalized formula for determining resource consumption.
Of course, the last step of this story is how to actually use the tools to find resource consumption.
This is very simple; specifically:
1. Double-click the Implement Design step in the Processes For Source box.
2. After the Map process completes (i.e. has a green check next to its name), double-click on View
Design Summary in the Processes For Source box.
3. Inspect the Design Summary for resource totals.
Before proceeding to Section 6.2.2, answer the check-off Questions 2a, 2b and 2c based on your
observations and experiments.
Figure 7 The path in the Accumulator whose delay we would like to analyze
[Figure: Accumulator block diagram with labels Result, ALU, In, ALUOp and Clock.]
2. After the Place and Route process completes, open FPGA Editor by double-clicking View/Edit
Routed Design (FPGA Editor), which can be found under Implement Design −→ Place
and Route.
3. Once in FPGA Editor, find the net that you located by name in the Technology Schematic.
4. How was this net routed in your design? In other words, how does it get from the output of the
FDRSE to the input of the ALU? Answer this question on the check-off sheet (Question 3a).
5. Left-click on this net and click on the delay button at the far right of the screen on the Button
bar.
• The Console Output window will show you the delay from the net’s driver to everywhere
else in your design that the net is connected.
• Connections will only tell you which SLICEs are connected to your net. As such, you will
have to look at each SLICE (they will probably have useful names) to determine which one
implements the 1-bit ALU and FDRSE that you are interested in.
6. Find the delay on the net shown in Figure 7 and mark this delay down on the check-off sheet for
Question 3b.
Recall that you built an N-bit Accumulator. We have just seen the delay on a single 1-bit wire that feeds
back from an FDRSE to the input of the ALU. What about the other wires? Will they share similar delay
or differing amounts of delay? Based on your answer for Question 3a and from direct inspection, answer
this question on the check-off sheet (Question 3c).
7 References
8 Lab 2 Checkoff
1. PreLab (30%)
(a) What FPGA resources (be precise) does a single ALUBitSlice instance map to on the Virtex5
xc5vlx110t FPGA?
(b) What happens to the state of the FDRSE when its R and S inputs are both high?
(c) In Question 1b, when do the values on the R and S lines actually matter? Any time? At the
rising edge of the clock? Explain why, based on the FDRSE’s description.
(d) Imagine an Accumulator such as the one in Figure 2 without the flip-flop at the output. In
other words, the output feeds directly into the second input of the ALU. Does this circuit
make sense? Explain its behavior.
(a) How was the net from the output of the FDRSE to the input of the ALU routed?
(b) Delay (in ns) from the output of the FDRSE to the input of a LUT that implements an
ALUBitSlice
(c) Is there a significant difference in the delay between the different wires that make up the bus in
an N-wide implementation? Why?