Section 4: FPGA
M. Balakrishnan
Outline (Section 4)
• Basics
– Why FPGAs
– XILINX design flow
• FPGA architecture
– Evolution of FPGAs
• FPGA technology mapping
– Technology mapping
– FPGA mapping
OUTLINE
• Introduction
• FPGA Architecture
• Xilinx Virtex-E
• Design Flow
• Design reports
OUTLINE (contd.)
• Design Flow
– Specifications of Design
– Design Entry
– Design Synthesis
– Mapping the Design
– Placing the Design
– Routing the Design
– Bit-stream Generation
– Assigning pins
Why FPGA?
Custom logic implemented as an ASIC has the
following merits and demerits.
Pros:
a. Reduced system complexity.
b. Improved performance.
Cons:
a. Very expensive to develop.
b. Delays the product's introduction to market (time to
market) because of increased design time.
Why FPGA?
Need to worry about two kinds of costs:
a. Cost of development, sometimes called non-recurring
engineering (NRE)
b. Cost of manufacture or Recurring cost (RC)
FPGA Architecture
• An FPGA is an array of programmable logic elements
that can be connected to inputs or outputs.
FPGA Architecture (contd.)
• Modern FPGAs also integrate other elements such as
RAM, PLLs, and even microprocessors.
A Simple FPGA Logic Block
[Figure: a 4-input LUT (inputs a, b, c, d) loaded with configuration bits 0 0 0 0 0 0 0 ... 0 1 1, feeding a flip-flop (FF) clocked by clk that drives the output]
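The logic block above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the class name LogicBlock, the bit ordering (input a is the most significant address bit), and the flip-flop reset value of 0 are illustrative choices, not details taken from the slides.

```python
class LogicBlock:
    """Toy model of a 4-input LUT followed by a flip-flop (names are hypothetical)."""

    def __init__(self, config_bits):
        if len(config_bits) != 16:
            raise ValueError("a 4-input LUT needs 2**4 = 16 configuration bits")
        self.config_bits = list(config_bits)
        self.ff = 0  # assumed reset value of the flip-flop

    def lut(self, a, b, c, d):
        # The four inputs form an address into the configuration bits.
        # Assumption: a is the most significant address bit.
        index = (a << 3) | (b << 2) | (c << 1) | d
        return self.config_bits[index]

    def clock(self, a, b, c, d):
        # On a clock edge the flip-flop captures the LUT output.
        self.ff = self.lut(a, b, c, d)
        return self.ff


# Configure the LUT as a 4-input AND: only address 15 (a=b=c=d=1) stores a 1.
and_block = LogicBlock([0] * 15 + [1])
print(and_block.lut(1, 1, 1, 1))    # 1
print(and_block.lut(1, 0, 1, 1))    # 0
print(and_block.clock(1, 1, 1, 1))  # 1, now held in the flip-flop
```

Changing only the 16 configuration bits turns the same block into any other function of four inputs, which is the point of the LUT.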
Simple Circuit
[Figure: four logic blocks, each a 4-input LUT (inputs a, b, c, d) with its own LUT configuration bits followed by a flip-flop (FF), joined through interconnect that is also set by configuration bits]
Example: FPGA from Xilinx
• The basic blocks are logic cells.
• A slice comprises two logic cells.
• A configurable logic block (CLB) may have up to 4 slices:
– A CLB of the XC4000 series has 1 slice.
– A CLB of the Virtex series has 2 or 4 slices.
Source: xilinx.com
Interconnections
• Five types of interconnect, classified by length:
single-length lines, double-length lines, quad lines,
octal lines, and long lines.
Source: xilinx.com
Programmable Interconnects
• Connection box
– Connects input/output of logic block to
interconnect channels.
• Switch box
– Enables the connection of two interconnect
lines.
• A transmission gate (or a pass transistor)
is used for each connection.
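As a rough illustration of the bullets above, a switch box can be viewed as a set of programmable switches, each turned on by one configuration bit. The sketch below is an assumption-laden toy model: the class SwitchBox and wire names such as north_0 are hypothetical, and electrical effects of pass transistors are ignored.

```python
class SwitchBox:
    """Toy switch box: each programmable switch joins two interconnect lines."""

    def __init__(self):
        # Pairs of wires whose pass transistor is currently turned on.
        self.closed_switches = set()

    def program(self, wire_a, wire_b):
        """Set the configuration bit that closes the switch between two wires."""
        self.closed_switches.add(frozenset((wire_a, wire_b)))

    def connected(self, source, target):
        """Return True if a chain of closed switches joins source to target."""
        seen, frontier = {source}, [source]
        while frontier:
            wire = frontier.pop()
            if wire == target:
                return True
            for switch in self.closed_switches:
                if wire in switch:
                    for other in switch - {wire}:
                        if other not in seen:
                            seen.add(other)
                            frontier.append(other)
        return False


box = SwitchBox()
box.program("north_0", "east_0")
box.program("east_0", "south_1")
print(box.connected("north_0", "south_1"))  # True
print(box.connected("north_0", "west_0"))   # False
```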
Programmable Interconnects
Source: xilinx.com
DESIGN FLOW
• Specifications of Design.
• Converting into HDL.
• Synthesize Design
• Map design
• Placing design inside FPGA
[Figure: flow chart with stages Specifications, HDL, Simulation, Technology Mapping]
[Figure: 64-bit adder block with 64-bit inputs a and b, carry-in cin, and 64-bit output sum]
Specifications to FPGA
[Figure: flow diagram with stages Specifications, HDL (Behavioral), Synthesis, Schematic, Technology Mapping, Simulation, Place & Route, Bit-file, FPGA]
Specs (VHDL) to FPGA
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity add_g is
  generic (x : natural := 63);
  port (a    : in  std_logic_vector (x downto 0);
        b    : in  std_logic_vector (x downto 0);
        cin  : in  std_logic;
        sum  : out std_logic_vector (x downto 0);
        cout : out std_logic);
end entity add_g;

architecture behavior of add_g is
begin  -- behavior
  adder: process(a, b, cin)
    variable carry : std_logic;
    variable isum  : std_logic_vector(x downto 0);
  begin
    carry := cin;
    -- Ripple the carry from bit 0 up to bit x.
    for i in 0 to x loop
      isum(i) := a(i) xor b(i) xor carry;
      carry   := (a(i) and b(i)) or (a(i) and carry) or (b(i) and carry);
    end loop;
    sum  <= isum;
    cout <= carry;
  end process adder;
end architecture behavior;
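A software reference model is a convenient way to check the adder during HDL simulation. The Python sketch below mirrors the loop of the process above bit for bit; the helper names ripple_carry_add, to_bits, and from_bits are hypothetical and not part of the original design.

```python
def to_bits(value, width):
    """Split an integer into a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an integer from a list of bits, least significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, cin):
    """Same recurrence as the VHDL loop: sum = a xor b xor carry, then update carry."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    return sum_bits, carry


# 64-bit example matching the default generic x = 63.
width = 64
s, cout = ripple_carry_add(to_bits(2**64 - 1, width), to_bits(1, width), 0)
print(from_bits(s), cout)  # 0 1  (all ones plus one wraps around, carry out set)
```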
Design Entry
Bit-stream Configuration
Assigning Pins
When an entity is implemented in an FPGA, its
input and output ports are mapped to pins of the
FPGA.
entity add_g is
  generic (x : natural := 63);
  port (a    : in  std_logic_vector (x downto 0);
        b    : in  std_logic_vector (x downto 0);
        cin  : in  std_logic;
        sum  : out std_logic_vector (x downto 0);
        cout : out std_logic);
end entity add_g;
[Figure: FPGA with pins a(63:0), b(63:0), cin, cout, sum(63:0)]
Assigning Pins (cont.)
• A UCF (User Constraints File) defines which pin
is connected to each input or output.
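As an illustration, the short Python sketch below prints pin constraints in the usual UCF form NET "name" LOC = "pin";. The function name ucf_lines and every pin location shown are made up for the example; real locations depend on the device, the package, and the board.

```python
def ucf_lines(port_pins):
    """Turn a {net name: pin location} mapping into UCF pin constraints."""
    return [f'NET "{net}" LOC = "{pin}";' for net, pin in port_pins.items()]


# Hypothetical pin assignment for a few ports of add_g.
# Individual bits of a bus are written as name<index>.
pins = {
    "cin": "P10",
    "cout": "P11",
    "a<0>": "P20",
    "b<0>": "P21",
    "sum<0>": "P30",
}
for line in ucf_lines(pins):
    print(line)
```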
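[Figure: blocks labelled LOGIC and INSTR, each paired with Interconnect + Storage; one panel labelled Traditional]
[Figure: 2-Input LUT. Programming bits P1 to P4 are selected by inputs I1 and I2 to drive Out; each programming bit is a storage cell with Data, Read or Write, and Q signals]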
Where are FPGAs Used
http://www.xilinx.com/bvdocs/publications/4000.pdf Page 9
Platform Computing
The Virtex Architecture
• CLBs
• IOBs
• General Routing
Matrix (GRM)
• BRAMs
• DLL
Virtex II Architecture
Virtex II CLB
V2 CLB Configuration
V2 Slice Configuration
Virtex II CLB (Half Slice)
Adder
Carry Chain
Other Features
The latest entry – Virtex II Pro
• Embedded high-speed serial transceivers enable data
bit rates up to 3.125 Gb/s per channel (RocketIO) or
10.3125 Gb/s (RocketIO X).
• Embedded IBM PowerPC 405 RISC processor blocks
provide performance up to 400 MHz.
• SelectIO-Ultra blocks provide the interface between
package pins and the internal configurable logic. Most
popular and leading-edge I/O standards are supported
by the programmable IOBs.
• Configurable Logic Blocks (CLBs) provide functional
elements for combinatorial and synchronous logic,
including basic storage elements. BUFTs (3-state
buffers) associated with each CLB element drive
dedicated segmentable horizontal routing resources.
• Block SelectRAM+ memory modules provide large
18 Kb storage elements of True Dual-Port RAM.
• Embedded multiplier blocks are 18-bit x 18-bit
dedicated multipliers.
• Digital Clock Manager (DCM) blocks provide
self-calibrating, fully digital solutions for clock distribution
delay compensation, clock multiplication and division,
and coarse- and fine-grained clock phase shifting.
FPGA Technology Mapping
Outline
• Technology mapping
– Definition & Examples
– Algorithms
• FPGA structure & simple mapping
• FPGA technology mapping
– Issues
– Algorithms
Definition
Technology mapping is also referred to as
library binding.
Area = 9, Delay = 4
Example: Second Mapping
Cell library consists of
XC4000 CLB
[Figure: XC4000 CLB containing two 4-input LUTs, one 3-input LUT, and flip-flops (FF)]
Mapping Objectives
• Cost optimal mapping
– Minimizing the number of LUTs
– Minimizing the number of CLBs
• Delay optimal mapping
– Minimizing the number of LUT levels
– Minimizing the delays (including routing
delays)
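The first objective in each group can be measured directly on a mapped netlist. The sketch below is a minimal example, assuming the mapped circuit is given as a dictionary from each LUT output to its input signals and that the circuit is purely combinational (no loops); the function names and the sample netlist are hypothetical.

```python
def lut_count(netlist):
    """Cost measure: number of LUTs in the mapped netlist."""
    return len(netlist)


def lut_levels(netlist, primary_inputs):
    """Delay measure: largest number of LUTs on any input-to-output path."""
    depth = {name: 0 for name in primary_inputs}

    def level(signal):
        # A LUT sits one level above its deepest input.
        if signal not in depth:
            depth[signal] = 1 + max(level(s) for s in netlist[signal])
        return depth[signal]

    return max(level(output) for output in netlist)


# Hypothetical mapping of an 8-input function onto three 4-input LUTs.
mapped = {
    "t0": ["a", "b", "c", "d"],
    "t1": ["e", "f", "g", "h"],
    "out": ["t0", "t1"],
}
print(lut_count(mapped))                 # 3
print(lut_levels(mapped, "abcdefgh"))    # 2
```

Routing delay, the last objective, is not captured by this count; it only becomes known after placement and routing.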
Cost Optimal Mapping
Mapping onto k-input LUTs can be cast as a
bin packing problem: minimize the number
of bins, each with a capacity of k.
Assume the starting point is a gate-level
netlist in which each gate has at most
k inputs.
Each gate can then be packed into one bin.
Example: Simple Mapping
Sum of Products: Bin Packing
• Select the product term with the largest
number of variables and place it in any table
where it fits; if it fits nowhere,
add a new table
• The table with the fewest unused
inputs is declared final
• Associate the output of this table with the first
table that can accept it
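The three steps above can be sketched in Python as follows. This is an illustrative sketch, not a definitive implementation: it assumes the product terms share no variables, uses first-fit placement for the first step, and invents the names pack_sum_of_products, t0, t1, and out for the example.

```python
def pack_sum_of_products(terms, k):
    """Pack disjoint product terms into k-input tables (LUTs).

    Returns a list of (output name, input names) pairs, one per table.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if not terms:
        raise ValueError("at least one product term is required")

    # Step 1: largest term first, placed in the first table where it fits.
    tables = []
    for term in sorted(terms, key=len, reverse=True):
        if len(term) > k:
            raise ValueError("a product term has more than k variables")
        for table in tables:
            if len(table) + len(term) <= k:
                table.extend(term)
                break
        else:
            tables.append(list(term))

    # Steps 2 and 3: close the table with the fewest unused inputs and feed
    # its output into the first table that can still accept one more input.
    final = []
    while len(tables) > 1:
        fullest = max(range(len(tables)), key=lambda i: len(tables[i]))
        closed = tables.pop(fullest)
        output = f"t{len(final)}"
        final.append((output, closed))
        for table in tables:
            if len(table) < k:
                table.append(output)
                break
        else:
            tables.append([output])
    final.append(("out", tables[0]))
    return final


# Hypothetical sum of products abc + de + fg + h mapped onto 4-input LUTs.
for name, inputs in pack_sum_of_products([("a", "b", "c"), ("d", "e"), ("f", "g"), ("h",)], 4):
    print(name, inputs)
```

For this input the sketch uses three tables: two full 4-input tables for the product terms and one table that combines their outputs.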
Example: 4-input LUT
Example: Overlapping Inputs
[Figure: network with inputs a, b, c and a, d, e, f, g, where input a appears twice; K=4]
Example: Decomposition
[Figure: network with inputs a, b, c, h, d, e, f, g; K=4]
Example: 3 input LUT
FPGA Technology Mapping:
Issues
LUT Mapping
Starting from a technology independent
optimized circuit, produce a minimal LUT
cover for the circuit. The complexities are
due to the following reasons.
• Fanout nodes
• Reconvergence
• Node decomposition and packing
Area vs. Delay
Decomposition
Decomposition
Fanout: Replication