Section4 Fpga

The document provides an overview of field programmable gate arrays (FPGAs) including why they are used, their architecture and design flow. FPGAs offer customizable logic that can be programmed in the field and provide advantages over ASICs like lower development costs and faster time to market, though they have higher production costs. The architecture of FPGAs has evolved over time to include additional elements like RAM, PLLs and processors. The design flow to implement a design on an FPGA involves specification, synthesis, technology mapping, placement and routing.

Uploaded by

Diriba Gobena

FPGA: Basics

M. Balakrishnan
Outline (Section 4)
• Basics
– Why FPGAs
– XILINX design flow
• FPGA architecture
– Evolution of FPGAs
• FPGA technology mapping
– Technology mapping
– FPGA mapping
OUTLINE
• Introduction
• FPGA Architecture
• Xilinx Virtex-E
• Design Flow
• Design reports
OUTLINE (contd.)
• Design Flow
– Specifications of Design
– Design Entry
– Design Synthesis
– Mapping the Design
– Placing the Design
– Routing the Design
– Bit-stream Generation
– Assigning pins
Why FPGA?
Custom logic implemented as an ASIC has the
following merits and demerits.
Pros:
a. Reduced system complexity.
b. Improved performance.
Cons:
a. Very expensive to develop.
b. Delays introduction of the product to market (time to
market) because of increased design time.
Why FPGA?
Need to worry about two kinds of costs:
a. Cost of development, sometimes called non-recurring
engineering (NRE)
b. Cost of manufacture or Recurring cost (RC)

A: FPGAs (low NRE, high RC)


B: ASICs (high NRE, low RC)
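The NRE versus RC trade-off can be made concrete with a small calculation. The sketch below assumes total cost is simply NRE plus RC times the number of units; the dollar figures are hypothetical and chosen only for illustration, not taken from any vendor.

```python
def total_cost(nre, rc, units):
    """Total cost = one-time development cost + per-unit cost * volume."""
    return nre + rc * units


def break_even_units(fpga_nre, fpga_rc, asic_nre, asic_rc):
    """Volume at which the FPGA and ASIC total costs are equal.

    Below this volume the FPGA is cheaper; above it the ASIC is cheaper.
    Assumes the FPGA has the higher per-unit cost, as on the slide.
    """
    if fpga_rc <= asic_rc:
        raise ValueError("expected FPGA recurring cost to exceed ASIC recurring cost")
    return (asic_nre - fpga_nre) / (fpga_rc - asic_rc)


# Hypothetical figures, for illustration only.
FPGA_NRE, FPGA_RC = 50_000, 40.0
ASIC_NRE, ASIC_RC = 1_000_000, 2.0

crossover = break_even_units(FPGA_NRE, FPGA_RC, ASIC_NRE, ASIC_RC)
print(crossover)  # 25000.0
```

With these assumed numbers, a 10,000-unit product is cheaper on the FPGA, while a 100,000-unit product is cheaper as an ASIC.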
FPGA Markets
• FPGA Manufacturers:
– Xilinx (50%)
– Altera (33%)
– Actel, Atmel, Cypress, Lattice, ..

• Market grew rapidly in the 2000s


• Competing not only in their “own field” but also with:
– Microprocessors
– Microcontrollers
– DSPs
– ASICs
FPGA Architecture

• An FPGA is an array of programmable logic elements that can be connected to inputs or outputs.
FPGA Architecture (contd.)
• In modern FPGAs, other elements such as RAM, PLLs, and even microprocessors are built into the device.
A Simple FPGA Logic Block
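[Figure: a 4-input LUT with inputs a, b, c, d; its configuration bits (0 0 0 0 0 0 0 ... 0 1 1) set the function, and the LUT output drives a flip-flop (FF) clocked by clk to produce the output]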
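The logic block above can be modeled in a few lines. This is a minimal sketch, assuming a 4-input LUT whose 16 configuration bits are indexed with input a as the most significant bit (the bit ordering is an assumption, since the slide does not state it) and a flip-flop that captures the LUT output on each clock call. The names LogicBlock and config_bits_for are hypothetical.

```python
class LogicBlock:
    """4-input LUT followed by a flip-flop, as in the simple logic block."""

    def __init__(self, config_bits):
        if len(config_bits) != 16:
            raise ValueError("a 4-input LUT needs 2**4 = 16 configuration bits")
        self.config_bits = list(config_bits)
        self.ff = 0  # flip-flop state, assumed to reset to 0

    def lut(self, a, b, c, d):
        # The inputs select one stored bit, like a memory address.
        index = (a << 3) | (b << 2) | (c << 1) | d
        return self.config_bits[index]

    def clock(self, a, b, c, d):
        # On a clock edge the flip-flop captures the LUT output.
        self.ff = self.lut(a, b, c, d)
        return self.ff


def config_bits_for(func):
    """Build the 16 configuration bits for any 4-input Boolean function."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)
            for i in range(16)]


and4_bits = config_bits_for(lambda a, b, c, d: a & b & c & d)
block = LogicBlock(and4_bits)
```

Changing the function means changing only the 16 stored bits; the hardware structure stays the same, which is what makes the block programmable.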
Simple Circuit
[Figure: four logic blocks, each a 4-input LUT (inputs a, b, c, d) feeding a flip-flop (FF), shown together with the LUT configuration bits that program them]
Example: FPGA from Xilinx
• Basic blocks are logic cells.
• A slice comprises two logic cells.
• A configurable logic block (CLB) may have up to 4 slices:
– A CLB of the XC4000 series has 1 slice.
– A CLB of the Virtex series has 2 or 4 slices.

• A slice from a Xilinx Virtex series FPGA contains:
– Two 4-input look-up tables (LUTs)
– Two flip-flops (registers)
– Carry and control logic, including multiplexers.

Source: xilinx.com
Interconnections
• Five types of interconnection, based on length:
single-length lines, double-length lines, quad lines, octal lines, and long lines.

Source: xilinx.com
Programmable Interconnects
• Connection box
– Connects input/output of logic block to
interconnect channels.
• Switch box
– Enables the connection of two interconnect
lines.
• Transmission gate (or a pass transistor)
is used for each connection.
Programmable Interconnects

Snapshot from FPGA Editor


Programmable Switching Matrix
Methods of Interconnection
Direct Interconnect
- Connects adjacent CLBs through direct interconnects
- Time-critical signals
- Without going through global switch boxes → very fast
General Purpose Interconnection
- Connects general interconnections and passes through one or more switch boxes → slow
Long line Interconnect
- Horizontal and vertical long lines
- Global long lines for clocks and resets

Source: xilinx.com
DESIGN FLOW

• Specifications of Design.
• Converting into HDL.
• To implement an HDL design in an FPGA, several steps are required:
– Synthesize design
– Map design
– Place design inside FPGA
– Route design inside FPGA
– Convert final design into a bit-stream for programming the FPGA
• These steps are usually performed by automated tools, but some parts can also be done interactively.

[Flow diagram: Specifications → HDL → Synthesis → Technology Mapping → Place&Route → Bit-file → FPGA, with Simulation at the HDL and Place&Route stages]
Specifications
• To add two 64-bit binary numbers with an additional carry-in and generate a 64-bit output and a 1-bit carry-out.

[Figure: 64-BIT ADDER block with 64-bit inputs a and b, 1-bit input cin, 64-bit output sum, and 1-bit output cout]
Specifications to FPGA
[Flow diagram: Specifications → HDL (Behavioral) → Synthesis → Technology Mapping (Schematic) → Place&Route (Simulation) → Bit-file → FPGA]
Specs (VHDL) to FPGA
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity add_g is
  generic (x : natural := 63);
  port (a    : in  std_logic_vector (x downto 0);
        b    : in  std_logic_vector (x downto 0);
        cin  : in  std_logic;
        sum  : out std_logic_vector (x downto 0);
        cout : out std_logic);
end entity add_g;

architecture behavior of add_g is
begin  -- behavior
  adder: process(a,b,cin)
    variable carry : std_logic;
    variable isum  : std_logic_vector(x downto 0);
  begin
    carry := cin;
    for i in 0 to x loop
      isum(i) := a(i) xor b(i) xor carry;
      carry   := (a(i) and b(i)) or (a(i) and carry) or (b(i) and carry);
    end loop;
    sum  <= isum;
    cout <= carry;
  end process adder;
end architecture behavior;
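Before simulating the HDL, it can help to have a software reference model to compare against. The sketch below mirrors the loop in the add_g process bit by bit (sum bit = a xor b xor carry, carry = majority of the three). The function name ripple_add and the use of Python integers for the vectors are assumptions made for illustration.

```python
def ripple_add(a, b, cin, width=64):
    """Bit-serial model of the add_g loop; returns (sum, cout)."""
    carry = cin
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i                  # isum(i)
        carry = (ai & bi) | (ai & carry) | (bi & carry)  # next carry
    return total, carry


print(ripple_add(5, 7, 0))          # (12, 0)
print(ripple_add(2**64 - 1, 0, 1))  # (0, 1)
```

The second call shows the carry rippling through all 64 positions: the sum wraps to zero and the carry-out is set.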
Design Entry

Snapshot from Xilinx ISE


Synthesizing the Design

• Synthesis: the optimization process of adapting a logic design to the logic resources available on the chip, such as lookup tables, long lines, and dedicated carry logic.
• It means analyzing the whole design and selecting which of the logic resources available in the FPGA will be used to perform the task.
• The output file is a gate-level netlist.
Synthesis - Example
[Figure: the add_g VHDL source from "Specs (VHDL) to FPGA" passed through Synthesis]
Mapping the Design
• Mapping : Process of assigning portions of
the logic design to the physical chip
resources (CLBs).
• Function similar to synthesizing.
• Synthesis is not necessarily specific to a
particular FPGA, but mapping usually is.
Mapping the Design (contd.)

Snapshot from FPGA Editor


Placing the Design
• Placing : In FPGAs, the process of assigning
specific parts of the design to specific
locations (CLBs) on the chip.
• Usually done automatically.
• Once the design has been converted into the
logic resources of the FPGA, we find a
location for these within the FPGA.
(Generally a 2-dimensional grid structure)
Placing the Design (contd.)

Snapshot from FPGA Editor


Routing the Design
• Routing : The process of creating the desired
interconnection of logic cells to make them
perform the desired function.
• Routing follows after placement.
• Once logic resources have been assigned a
location within the FPGA, we need to
interconnect the logic resources using
internal buses inside the FPGA.
Routing the Design (contd.)

Snapshot from FPGA Editor


Routing the design (contd.)

Snapshot from FPGA Editor


Convert Design into Bit-stream

• This is always done using automated tools.
• The placed and routed design is converted into a bit-stream that is downloaded into the FPGA to configure it.
Design to Bit-stream (contd.)

Bit-stream Configuration
Assigning Pins
When implementing an entity in an FPGA, the
input and output ports are mapped to pins of the
FPGA.

entity add_g is
  generic (x : natural := 63);
  port (a    : in  std_logic_vector (x downto 0);
        b    : in  std_logic_vector (x downto 0);
        cin  : in  std_logic;
        sum  : out std_logic_vector (x downto 0);
        cout : out std_logic);
end entity add_g;

[Figure: FPGA block with pins a(63:0), b(63:0), cin, sum(63:0), and cout]
Assigning Pins (cont.)
• A file called a UCF (User Constraints File) is used to define which pin will be connected to a particular input or output.
• Within the Xilinx Project Manager, the “assign package pin” function can be used to easily define input and output pin locations.
Example UC File
entity add_g is
  generic (x : natural := 63);
  port (a    : in  std_logic_vector (x downto 0);
        b    : in  std_logic_vector (x downto 0);
        cin  : in  std_logic;
        sum  : out std_logic_vector (x downto 0);
        cout : out std_logic);
end entity add_g;

NET "cin" LOC = "T9";
NET "cout" LOC = "M13";
Design Summary

Snapshot from Xilinx ISE


Synthesis Report

Snapshot from Xilinx ISE


Synthesis Report – Device Utilization Summary

Snapshot from Xilinx ISE


Synthesis Report – Timing Report

Snapshot from Xilinx ISE


FPGA Architecture
Abstract Architecture

[Figure: block diagram with LOGIC, INSTR, and Interconnect + Storage]

• Three components of all computing elements
– Control
– Compute elements
– Communication
Custom Hardware
• No Control Store
• Not General Purpose
• All “computing” is through spatial connections

[Figure: block diagram with LOGIC and Interconnect + Storage]
Traditional P

•Control store only


controls logic
•Communication is in LOGIC
time INSTR
•Registers, memory
etc
Interconnect
+ Storage
Programmable Devices
• Prefabricated Silicon
• Logic implemented by programming the
basic cells and the interconnect
• Very fast turnaround time
• Limited design flexibility
• Low development time and cost
FPGA
• Combines PLDs and MPGAs
• Densities: 2K to 1000K+ gates
• Array of logic blocks and programmable interconnect
• Logic Block
– Universal gates, multiplexers, RAMs, etc.
• Programmable element
– SRAM, EEPROM or antifuse

[Figure: an SRAM programming bit (Read or Write, Data, Q) and a 2-input LUT with programming bits P1 to P4, inputs I1 and I2, and output Out]
Where are FPGAs Used

Use             Time to market  Price     Volume
Emulation       Very high       X         Low
Prototyping     Very high       X         Low
Pre-production  Very high       Critical  Moderate
Production      Very high       Critical  High

Changing Market
FPGA
Generic 2D FPGA
SRAM Based FPGA - XILINX
Xilinx 4000 CLB
The Basic Building Block
• Logic Block
– Lookup table Based
• Xilinx
– Multiplexor Based
• Actel
– Transistor Based
– Universal Gate Based
LUT Mapping
• An N-LUT is a direct implementation of a truth table: it can realize any function of N inputs.
• An N-LUT requires 2^N storage elements (latches).
• The N inputs select one latch location (like a memory).
Implementing Combinational Logic

Two 4-input functions with register outputs and one 2-input function
5 input Function
Single Port RAM

http://www.xilinx.com/bvdocs/publications/4000.pdf Page 9
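One common way to build a 5-input function from two 4-input LUTs is Shannon expansion on the fifth input: one LUT holds the function with e = 0, the other holds it with e = 1, and a third stage selects between them. The sketch below assumes this is the arrangement the "5 input Function" slide depicts; the helper names are hypothetical.

```python
from itertools import product


def make_lut(func, n):
    """Truth table of an n-input function, first input as the MSB of the index."""
    return [func(*bits) for bits in product((0, 1), repeat=n)]


def read_lut(table, *inputs):
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]


def split_5_input(func5):
    """Shannon expansion on e: f = e'.f(e=0) + e.f(e=1)."""
    f_lut = make_lut(lambda a, b, c, d: func5(a, b, c, d, 0), 4)
    g_lut = make_lut(lambda a, b, c, d: func5(a, b, c, d, 1), 4)
    return f_lut, g_lut


def eval_5_input(f_lut, g_lut, a, b, c, d, e):
    f = read_lut(f_lut, a, b, c, d)
    g = read_lut(g_lut, a, b, c, d)
    return g if e else f  # third stage acts as a 2:1 multiplexer on e


def parity5(a, b, c, d, e):
    return a ^ b ^ c ^ d ^ e


F_LUT, G_LUT = split_5_input(parity5)
```

Because any 5-input function can be split this way, two 16-bit tables plus a selector are enough to cover all 32 input combinations.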
Platform Computing
The Virtex Architecture
• CLBs
• IOBs
• General Routing
Matrix (GRM)
• BRAMs
• DLL
Virtex II Architecture
Virtex II CLB

V2 CLB Configuration

V2 Slice Configuration
Virtex II CLB (Half Slice)
Adder
Carry Chain
Other Features
The latest entry – Virtex II Pro
• Embedded high-speed serial transceivers enable data bit rates up to 3.125 Gb/s per channel (RocketIO) or 10.3125 Gb/s (RocketIO X).
• Embedded IBM PowerPC 405 RISC processor blocks provide performance up to 400 MHz.
• SelectIO-Ultra blocks provide the interface between package pins and the internal configurable logic. Most popular and leading-edge I/O standards are supported by the programmable IOBs.
• Configurable Logic Blocks (CLBs) provide functional elements for combinatorial and synchronous logic, including basic storage elements. BUFTs (3-state buffers) associated with each CLB element drive dedicated segmentable horizontal routing resources.
• Block SelectRAM+ memory modules provide large 18 Kb storage elements of True Dual-Port RAM.
• Embedded multiplier blocks are 18-bit x 18-bit dedicated multipliers.
• Digital Clock Manager (DCM) blocks provide self-calibrating, fully digital solutions for clock distribution delay compensation, clock multiplication and division, and coarse- and fine-grained clock phase shifting.
FPGA Technology Mapping
Outline
• Technology mapping
– Definition & Examples
– Algorithms
• FPGA structure & simple mapping
• FPGA technology mapping
– Issues
– Algorithms
Definition
Technology mapping is also referred to as library binding.

Given a Boolean network and a characterized cell library, generate a mapping of the network components onto cell library components with the objective of cost optimization or delay optimization.
Input & Library
• Input: Boolean network - Technology
independent optimized network; typically a
multi-level network
• Library:
– Characterization in terms of area, delay and
power
– Enumerated or implicit library cells
Typical Library
A typical simple library cell :-
• a single output combinational logic function
• cost in terms of area
• delay in terms of propagation delays for each
input/output pair and as a function of load
and/or fanout. Sometimes only the worst case
values are stored.
• power in terms of average current
Network Covering
Network covering means replacing the
sub-networks of the original network with
cell library instances. Covering entails
recognizing that a library cell is equivalent
to an identified sub-network, and selecting
an adequate number of such cells to cover the
whole network.
Example 1

Cell library consists of two and three input gates


Example: First Mapping

Cell library consists of two and three input gates


Example: Second Mapping

Cell library consists of two and three input gates


Example 2
Cell library consists of

Component  Area  Delay
AND2       3     2
OR2        3     2
OA21       5     3
Example: First Mapping
Cell library consists of

Component  Area  Delay
AND2       3     2
OR2        3     2
OA21       5     3

Area = 9, Delay = 4
Example: Second Mapping
Cell library consists of

Component  Area  Delay
AND2       3     2
OR2        3     2
OA21       5     3

Area = 10, Delay = 3
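The area and delay figures of the two mappings can be reproduced with a short calculation. The network figure did not survive extraction, so the sketch below assumes a hypothetical two-output network, x = (a + b)·c and y = (a + b)·d, which happens to give the same totals as the slides. Area is the sum of cell areas and delay is the longest path through the mapped cells.

```python
# Library from the slide: cell -> (area, delay)
LIBRARY = {"AND2": (3, 2), "OR2": (3, 2), "OA21": (5, 3)}

# Hypothetical network: x = (a + b).c, y = (a + b).d
# Each mapping: output net -> (cell, fan-in nets)
FIRST_MAPPING = {
    "t": ("OR2", ["a", "b"]),   # shared OR gate
    "x": ("AND2", ["t", "c"]),
    "y": ("AND2", ["t", "d"]),
}
SECOND_MAPPING = {
    "x": ("OA21", ["a", "b", "c"]),
    "y": ("OA21", ["a", "b", "d"]),
}


def area(mapping):
    return sum(LIBRARY[cell][0] for cell, _ in mapping.values())


def delay(mapping):
    """Longest arrival time over all nets; primary inputs arrive at time 0."""
    memo = {}

    def arrival(net):
        if net not in mapping:
            return 0
        if net not in memo:
            cell, fanins = mapping[net]
            memo[net] = LIBRARY[cell][1] + max(arrival(f) for f in fanins)
        return memo[net]

    return max(arrival(net) for net in mapping)


print(area(FIRST_MAPPING), delay(FIRST_MAPPING))    # 9 4
print(area(SECOND_MAPPING), delay(SECOND_MAPPING))  # 10 3
```

The first mapping shares the OR gate and wins on area; the second duplicates that logic inside each OA21 cell and wins on delay.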


Example
Cell library consists of

Component  Area  Delay
AND2       3     2
OR2        3     2
OA21       5     3

[Figure: the example network with candidate matches m1 to m5 marked]

(m1 + m4 + m5)(m2 + m4)(m3 + m5)(m2’ + m1)(m3’ + m1) = 1
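The covering condition above can be solved by brute force for a network this small. In the sketch below each variable m1 to m5 means "this match is selected". The assignment of matches to cells (m1 as OR2, m2 and m3 as AND2, m4 and m5 as OA21) is an assumption inferred from the clause structure, since the figure is not recoverable. Real mappers use branch-and-bound or dynamic programming rather than exhaustive search.

```python
from itertools import product

# Assumed cell area for each match (see the lead-in).
AREA = {"m1": 3, "m2": 3, "m3": 3, "m4": 5, "m5": 5}
NAMES = sorted(AREA)


def satisfies(sel):
    """(m1 + m4 + m5)(m2 + m4)(m3 + m5)(m2' + m1)(m3' + m1) = 1"""
    m1, m2, m3, m4, m5 = (sel[n] for n in NAMES)
    return ((m1 or m4 or m5) and (m2 or m4) and (m3 or m5)
            and ((not m2) or m1) and ((not m3) or m1))


def min_area_cover():
    best = None
    for values in product((False, True), repeat=len(NAMES)):
        sel = dict(zip(NAMES, values))
        if not satisfies(sel):
            continue
        chosen = [n for n in NAMES if sel[n]]
        cost = sum(AREA[n] for n in chosen)
        if best is None or cost < best[0]:
            best = (cost, chosen)
    return best


print(min_area_cover())  # (9, ['m1', 'm2', 'm3'])
```

The clauses (m2’ + m1) and (m3’ + m1) are what make the problem binate: selecting m2 or m3 forces m1 to be selected as well, because their inputs must be produced by some chosen match.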


FPGA Structures & Mapping
FPGA Structures
• Multiplexer based (ACTEL)
– Mapping techniques similar to library based
– Library is created by enumerating all possible
“patterns”
• LUT based (XILINX)
– Significantly different mapping techniques
LUT Based FPGAs
In LUT-based FPGAs (for example, XILINX
FPGAs) the building blocks are LUTs and
flip-flops. An n-input LUT can implement
any function of n variables.
The FPGA itself is composed of CLBs,
with each CLB containing multiple LUTs
and flip-flops, which makes the technology
mapping problem more complex.
XC3000 CLB
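[Figure: XC3000 CLB with a LUT configurable as two 4-input functions or one 5-input function (2X4 or 1X5), and two flip-flops (FF)]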
XC4000 CLB
[Figure: XC4000 CLB with two 4-input LUTs, one 3-input LUT, and two flip-flops (FF)]
Mapping Objectives
• Cost optimal mapping
– Minimizing the number of LUTs
– Minimizing the number of CLBs
• Delay optimal mapping
– Minimizing the number of LUT levels
– Minimizing the delays (including routing
delays)
Cost Optimal Mapping
The problem of mapping onto k-input LUTs can be
cast as a bin packing problem. We
have to minimize the number of bins, each
with a capacity of k.
Assume the starting point is a gate-level
netlist in which each gate has at most
k inputs.
Each gate can then be packed into one bin.
Example: Simple Mapping
Sum of Products: Bin Packing
• Select the product term with the largest
number of variables and place it in any table
where it fits; if it does not fit anywhere,
add a new table
• The table with the fewest unused
inputs is declared final
• Associate the output of this table with the first table that
can accept it
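The three steps above can be sketched as a first-fit-decreasing bin packing. This simplified version tracks only how many inputs each table uses, and it assumes that product terms share no variables and that each term has at most k variables. The function name pack_sum_of_products is hypothetical.

```python
def pack_sum_of_products(term_sizes, k):
    """Return the number of k-input LUTs needed for a sum of products.

    term_sizes: number of variables in each product term.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if any(size > k for size in term_sizes):
        raise ValueError("each product term must fit in one LUT")

    # Step 1: largest term first, into the first table where it fits.
    tables = []  # used input count of each open table
    for size in sorted(term_sizes, reverse=True):
        for i, used in enumerate(tables):
            if used + size <= k:
                tables[i] += size
                break
        else:
            tables.append(size)

    luts = 0
    while len(tables) > 1:
        # Step 2: the table with the fewest unused inputs is final.
        fullest = max(range(len(tables)), key=lambda i: tables[i])
        tables.pop(fullest)
        luts += 1
        # Step 3: its output becomes one input of the first table
        # that can accept it, or of a new table if none can.
        for i, used in enumerate(tables):
            if used + 1 <= k:
                tables[i] += 1
                break
        else:
            tables.append(1)
    return luts + len(tables)


print(pack_sum_of_products([3, 2, 2], 4))  # 2
print(pack_sum_of_products([4, 4], 4))     # 3
```

For abc + de + fg with k = 4, one LUT computes de + fg and a second computes abc plus that intermediate output. For abcd + efgh, both terms fill a LUT each, so a third LUT is needed to combine them.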
Example: 4-input LUT
Example: Overlapping Inputs

[Figure: three fan-in groups with inputs (a, b, c), (a, d), and (e, f, g), where input a is shared; K = 4]
Example: Decomposition

[Figure: three fan-in groups with inputs (a, b, c), (h, d), and (e, f, g); K = 4]
Example: 3 input LUT
FPGA Technology Mapping: Issues
LUT Mapping
Starting from a technology independent
optimized circuit, produce a minimal LUT
cover for the circuit. The complexities are
due to the following reasons.
• Fanout nodes
• Reconvergence
• Node decomposition and packing
Area vs. Delay
Decomposition
Decomposition
Fanout: Replication

DAG, not a tree


Fanout: Replication
Fanout: Reconvergence
Fanout: Reconvergence
CLB Mapping
Direct mapping of a technology-independent
circuit onto CLBs would
involve function decomposition.
Alternatively, one can start from a circuit
mapped onto LUTs and then pack the LUTs
into CLBs.
Thank You
