Introduction To VIRTEX II
FPGA Architecture
Virtex Family
In 2010, Xilinx announced the Virtex-7 family, which is based
on a 28 nm design and is reported to deliver a two-fold
system performance improvement at 50% lower power.
In addition, Virtex-7 doubles the memory bandwidth
compared to previous generation Virtex FPGAs with
1866 Mbit/s memory interfacing performance and over
two million logic cells.
In 2011, Xilinx began shipping sample quantities of the
Virtex-7 2000T FPGA, which combines four smaller
FPGAs into a single package by placing them on a special
silicon interconnection pad (called an interposer) to
deliver 6.8 billion transistors in a single large chip.
In 2012, using the same 3D technology, Xilinx introduced
initial shipments of their Virtex-7 H580T FPGA, a
heterogeneous device, so called because it comprises two
FPGA dies and one 8-channel 28Gbit/s transceiver die in
the same package.
As Xilinx introduced new high capacity 3D FPGAs,
including Virtex-7 2000T and Virtex-7 H580T products,
these devices began to outpace the capacity of Xilinx’s
design software, which led the company to completely
redesign its tool set. The result was the introduction of the
Vivado Design Suite, which reduces the time needed for
programmable logic and I/O design, and speeds systems
integration and implementation compared to the previous
software.
General Description
The Virtex-II family is a platform FPGA
developed for high performance from low-
density to high-density designs that are based on
IP cores and customized modules.
The family delivers complete solutions for
telecommunication, wireless, networking, video,
and DSP applications, including PCI, LVDS, and
DDR interfaces.
Virtex-II Array
Virtex-II devices are user-programmable gate
arrays with various configurable elements.
The Virtex-II architecture is optimized for high-
density and high-performance logic designs.
The programmable device comprises
input/output blocks (IOBs) and internal
configurable logic blocks (CLBs).
Virtex-II Architecture
Elements in Virtex-II
The internal configurable logic includes four
major elements organized in a regular array.
Configurable Logic Blocks (CLBs)
Block Select RAM Memory
Multiplier blocks
DCM (Digital Clock Manager)
Configurable Logic Blocks (CLBs) provide
functional elements for combinational and
synchronous logic, including basic storage elements.
Block SelectRAM memory modules provide
large 18 Kbit storage elements of dual-port RAM.
Multiplier blocks are 18-bit x 18-bit dedicated
multipliers.
DCM (Digital Clock Manager) blocks provide self-
calibrating, fully digital solutions for clock
distribution delay compensation, clock multiplication
and division, coarse- and fine-grained clock phase
shifting.
Features
Up to 2 Million System Gates at 100+ MHz
Distributed and Block RAM available
Low Power
Delay-Locked Loops
2.5V Internal Operation with support of
common power
High-Performance Interfaces to External
Memory
Active Interconnect Technology
Naming Conventions
Block Diagram of VIRTEX-II
[Figure: block diagram of Virtex-II. Labels: SONET/SDH, LVDS,
BLVDS backplane, PCI-X, PCI, DDR SDRAM, DDR SRAM, QDR, DDR, CAM,
FIFO, DCM, distributed RAM, 18 Kb BRAM, shift registers, multiplier.]
CLB Resources
Basic resource unit is the Logic Cell
1 CLB contains 2 - 4 Logic Cells, depending on device family
Logic Cell = 4-input Look-Up Table (LUT) + D Flip-flop
LUT capacity limited by number of inputs, not complexity of
function
LUTs can be used as ROM or synchronous RAM
Flip-flop can be configured as a transparent latch in Virtex and
Spartan-II
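The point that LUT capacity depends on the number of inputs, not on the complexity of the function, can be illustrated with a minimal Python sketch. It models a 4-input LUT as a 16-bit truth table; the class name `Lut4` and its methods are hypothetical names chosen for illustration, not part of any Xilinx tool.

```python
class Lut4:
    """Toy model of a 4-input LUT: 16 configuration bits, one per input combination."""

    def __init__(self, init: int):
        # Bit i of init is the LUT output when the inputs encode the value i.
        self.init = init & 0xFFFF

    @classmethod
    def from_function(cls, fn):
        """Fill the truth table by evaluating fn on all 16 input combinations."""
        init = 0
        for i in range(16):
            a, b, c, d = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
            if fn(a, b, c, d):
                init |= 1 << i
        return cls(init)

    def read(self, a, b, c, d):
        """Look up the stored output bit for one input combination."""
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.init >> index) & 1


# A trivial function and a less trivial one each occupy exactly one LUT.
and4 = Lut4.from_function(lambda a, b, c, d: a & b & c & d)
odd_parity = Lut4.from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Both functions cost the same 16 bits of configuration; only a fifth input would force a second LUT.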
Closer Look at a CLB Structure
[Figure: two slices side by side. Each slice contains two 4-input
look-up tables (inputs G1-G4 and F1-F4), carry and control logic,
and two D flip-flops (D, Q, CK, EC, S, R), with slice signals
CIN, COUT, CLK, CE, SR, BY, F5IN and outputs X, XB, Y, YB.]
Each slice has 2 LUT-FF pairs with associated carry logic
Two 3-state buffers (BUFT) associated with each CLB
Each slice has four outputs:
  Two registered outputs
  Two non-registered outputs
Carry logic for fast addition
Two independent carry chains per CLB
CLB (Configurable Logic Blocks)
Each CLB is connected to one switch matrix, providing access
to general routing resources.
High level of logic integration
Wide-input functions:
  - 16:1 multiplexer in 1 CLB or any function
  - 32:1 multiplexer in 2 CLBs (1 level of LUT)
Fast arithmetic functions
  - 2 look-ahead carry chains per CLB column
Addressable shift registers in LUT
  - 16-b shift register in 1 LUT
  - 128-b shift register in 1 CLB
[Figure: one CLB with four slices, S0 (X0Y0), S1 (X0Y1), S2 (X1Y0),
and S3 (X1Y1), two TBUFs, a switch matrix, fast connects, the SHIFT
chain, and CIN/COUT carry chains.]
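As a rough illustration of how a 16:1 multiplexer can fit in one CLB with a single level of LUTs, the sketch below builds it from eight 2:1 multiplexers (one per LUT) followed by dedicated combining multiplexers. The stage names MUXF5, MUXF6, and MUXF7 in the comments are an assumption taken from general Virtex-II documentation, not from these slides, and the function names are hypothetical.

```python
def mux2(sel, a, b):
    """2:1 multiplexer: returns a when sel is 0, b when sel is 1."""
    return b if sel else a


def mux16(data, sel):
    """16:1 multiplexer built from 8 LUT-sized 2:1 muxes plus combining muxes."""
    s = [(sel >> i) & 1 for i in range(4)]
    # Level 1: eight LUTs (two per slice, four slices), each a 2:1 mux.
    lut_out = [mux2(s[0], data[2 * i], data[2 * i + 1]) for i in range(8)]
    # Assumed MUXF5 stage: one per slice, combining the slice's two LUTs.
    f5 = [mux2(s[1], lut_out[2 * i], lut_out[2 * i + 1]) for i in range(4)]
    # Assumed MUXF6 stage: combines pairs of slices.
    f6 = [mux2(s[2], f5[0], f5[1]), mux2(s[2], f5[2], f5[3])]
    # Assumed MUXF7 stage: combines all four slices of the CLB.
    return mux2(s[3], f6[0], f6[1])


pattern = [1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0]
selected = [mux16(pattern, k) for k in range(16)]
```

Only the first level uses LUTs; the remaining levels are fixed multiplexers, which is why the data path crosses one LUT level regardless of width.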
Interconnect Technology
Offered by VIRTEX-II
Interconnect is an array of switch matrices
All Virtex II features can access routing resources
through the switch matrix
Simplify design and place & route
[Figure: array of switch matrices, each attached to one block:
CLB, IOB, 18Kb BRAM, MULT 18x18, and DCM.]
Shift Register
Each LUT can be configured as shift register
  Serial in, serial out
Dynamically addressable delay up to 16 cycles
  For programmable pipeline
Cascade for greater cycle delays
Use CLB flip-flops to add depth
[Figure: a LUT shown as an equivalent chain of D flip-flops with
IN, CE, and CLK inputs, a DEPTH[3:0] tap select, and an OUT output.]
Shift Register Look-Up Table
High density integration of shift registers
DSP applications use SRL16 for delay matching
CDMA wireless and video applications require shift
registers
Up to 128-b per CLB
Cascadable output
Dynamic addressable output
16-b per LUT
Multiple SRLC16 cascadable to any length
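The dynamically addressable delay can be sketched in Python as a behavioural model. This is only an illustration under assumed conventions: the class name `Srl16` is hypothetical, and address value `n` is taken to select a delay of `n + 1` clock cycles, so addresses 0 to 15 cover delays of 1 to 16 cycles.

```python
class Srl16:
    """Behavioural model of a 16-bit shift register held in one LUT."""

    def __init__(self):
        # bits[0] holds the most recently shifted-in value.
        self.bits = [0] * 16

    def clock(self, d, ce=1):
        """Rising clock edge: shift in d when clock enable is high."""
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def q(self, addr):
        """Addressable output: tap selected by the 4-bit address."""
        return self.bits[addr & 0xF]


def delay_line(samples, addr):
    """Return the output stream seen at tap addr, sampled before each clock edge."""
    srl = Srl16()
    out = []
    for s in samples:
        out.append(srl.q(addr))
        srl.clock(s)
    return out


stream = [1, 0, 1, 1, 0, 0, 1, 0]
delayed_by_3 = delay_line(stream, addr=2)
```

Changing `addr` at run time changes the pipeline depth without reloading the register contents, which is what makes the structure useful for delay matching.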
Digital Clock Manager
High-Speed 420 MHz clock generation:
Clock de-skew on-chip and off-chip
Up to 12 DCM per device
Fully digital circuitry
Flexible Frequency Synthesis
Synthesis outputs: clock 0° & 180° (def.: 4X)
High-Resolution Phase Shifting
DPS fixed and variable modes
Delay-Locked Loop (DLL)
Precise Clock De-Skew
DLL outputs: clock 0°, 90°, 180°, 270°
DLL outputs: clock 2X and clock division
50/50 duty cycle correction
DCM Features
• Clock De-skew: The DCM generates new system
clocks (either internally or externally to the FPGA),
which are phase-aligned to the input clock, thus
eliminating clock distribution delays.
• Frequency Synthesis: The DCM generates a wide
range of output clock frequencies, performing very
flexible clock multiplication and division.
• Phase Shifting: The DCM provides both coarse phase
shifting and fine-grained phase shifting with dynamic
phase shift control.
Digital Clock Manager: DCM
The DCM has the following general control signals:
• RST input pin: resets the entire DCM
• LOCKED output pin: asserted High when all enabled DCM
circuits have locked.
• STATUS output pins (active High)
[Figure: DCM block symbol. Inputs: CLKIN, CLKFB, RST, DSSEN,
PSINCDEC, PSEN, PSCLK. Outputs: CLK0, CLK90, CLK180, CLK270,
CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, LOCKED, STATUS[7:0],
PSDONE. Legend: clock signal, control signal.]
Frequency Synthesis of DCM
The CLK2X and CLK2X180 outputs double the clock
frequency.
The CLKDV output creates divided output clocks
with division options of 1.5, 2, 2.5, 3, ..., 7, 7.5, 8,
9, 10, 11, 12, 13, 14, 15, and 16.
CLK2X180 is phase shifted 180 degrees relative to
CLK2X.
CLKFX180 is phase shifted 180 degrees relative to
CLKFX.
The CLKFX and CLKFX180 outputs can be used
to produce clocks at the following frequency:
Freq CLKFX = (M/D) x Freq CLKIN
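The frequency relations on this slide can be worked through with a short Python sketch. The function names are hypothetical, and the list of CLKDV divisors assumes that the elided entries between 3 and 7 continue in steps of 0.5.

```python
from fractions import Fraction

# Assumption: the elided divisors run in 0.5 steps from 1.5 to 7.5,
# followed by the integers 8 through 16 as listed on the slide.
CLKDV_DIVIDES = [Fraction(n, 2) for n in range(3, 16)] + [Fraction(n) for n in range(8, 17)]


def clkfx_mhz(clkin_mhz, m, d):
    """Freq CLKFX = (M/D) x Freq CLKIN, kept exact with fractions."""
    return Fraction(clkin_mhz) * Fraction(m, d)


def clk2x_mhz(clkin_mhz):
    """CLK2X and CLK2X180 run at twice the input frequency."""
    return Fraction(clkin_mhz) * 2


def clkdv_mhz(clkin_mhz, divide):
    """CLKDV output frequency for one of the supported division options."""
    divide = Fraction(divide)
    if divide not in CLKDV_DIVIDES:
        raise ValueError(f"unsupported CLKDV divisor: {divide}")
    return Fraction(clkin_mhz) / divide


fx = clkfx_mhz(100, m=7, d=2)        # 100 MHz input, M = 7, D = 2
dv = clkdv_mhz(100, Fraction(5, 2))  # 100 MHz input divided by 2.5
```

For a 100 MHz input, M = 7 and D = 2 give a 350 MHz CLKFX, and dividing by 2.5 gives a 40 MHz CLKDV.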
High Resolution Phase Shifting
The DCM provides additional control over clock skew
through either coarse or fine-grained phase shifting.
The CLK0, CLK90, CLK180, and CLK270 outputs are
each phase shifted by ¼ of the input clock period relative
to each other, providing coarse phase control.
Note that CLK90 and CLK270 are not available in high-
frequency mode. Fine-phase adjustment affects all nine
DCM output clocks.
When activated, the phase shift between the rising edges of
CLKIN and CLKFB is a specified fraction of the input
clock period.
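A small Python sketch can make the two kinds of phase shift concrete. The coarse offsets follow directly from the quarter-period spacing described above. The fine shift is modelled as a signed number of 1/256ths of the input period, which is an assumption based on general Virtex-II documentation rather than on these slides; the function names are hypothetical.

```python
def coarse_offsets_ns(clkin_mhz):
    """Offsets of CLK0, CLK90, CLK180, CLK270: successive quarters of the period."""
    period_ns = 1000.0 / clkin_mhz
    names = ("CLK0", "CLK90", "CLK180", "CLK270")
    return {name: period_ns * k / 4 for k, name in enumerate(names)}


def fine_shift_ns(clkin_mhz, phase_shift, steps=256):
    """Fine phase shift as a fraction of the period.

    Assumption: phase_shift is a signed step count and one step is
    1/256 of the input clock period.
    """
    period_ns = 1000.0 / clkin_mhz
    return period_ns * phase_shift / steps


coarse = coarse_offsets_ns(100)       # 100 MHz input, 10 ns period
fine = fine_shift_ns(100, phase_shift=64)
```

With a 100 MHz input the coarse outputs sit 2.5 ns apart, and under the stated assumption a fine setting of 64 steps also corresponds to a quarter period.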
Global Clocks
Up to 16 Dedicated Low Skew Clocks
16 global clock multiplexers & buffers
8 clock nets in each quadrant
Global clock ENABLE
Switch glitch-free from one clock to another
16 clock pads (can be used as user I/O)
Clock Distribution
16 Global Clock Multiplexers
  Eight on the top
  Eight on the bottom
Switch “glitch free” from 1 clock to the other
8 Clocks selectable per quadrant
Unused Branches are Disabled (Power Saving)
[Figure: device divided into NW, NE, SW, and SE quadrants. 8 BUFGMUX
at the top and 8 BUFGMUX at the bottom drive 16 clocks, of which
8 max reach each quadrant.]
Use Global Buffers to Reduce Clock Skew
• Global buffers are connected to dedicated routing.
• This routing network is balanced to minimize skew.
• All Xilinx FPGAs have global buffers.
[Figure: a design containing 2 clock signals, CLK1 and CLK2, each
clocking a flip-flop. One version introduces clock skew between
CLK1 and CLK2; the other uses an extra BUFG to reduce skew on CLK2.]
Memory
On-Chip SelectRAM™ Memory
[Figure: "Terabit Memory Continuum" spanning Distributed RAM (bytes),
Block RAM (kilobytes), and External RAM/CAM (megabytes). Labels:
128x1 distributed RAM, 18 kb blocks, DDR & QDR external memory at
up to 400 Mbps/pin. Example uses: large FIFOs, packet buffers,
video line buffers, cache tag memory, DSP coefficients, CAM, small
FIFOs; deep/wide and shallow/wide organizations.]
Embedded 18 kb Block RAM
Up to 3 Mb on-chip block RAM
High internal buffering bandwidth
Reduced I/O count and more embedded memory
18Kbit block RAM
Parity bit locations (parity in/out busses)
Data width up to 36 bits
3 WRITE modes
Output latches Set/Reset
True Dual-Port RAM
Independent clock (async.) & control
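The relationship between the 18 Kbit block size, the parity bit locations, and the data width can be worked through in a few lines of Python. The split into 16 Kbit of data plus 2 Kbit of parity and the set of supported widths are assumptions drawn from general Virtex-II documentation, not from these slides; the function name is hypothetical.

```python
DATA_BITS = 16 * 1024    # assumed data portion of one 18 Kbit block
PARITY_BITS = 2 * 1024   # assumed parity portion of one 18 Kbit block


def depth_for_width(width):
    """Number of addressable words in one block RAM at a given port width."""
    if width in (1, 2, 4):
        # Narrow ports are assumed to use only the data bits.
        return DATA_BITS // width
    if width in (9, 18, 36):
        # Wider ports are assumed to include one parity bit per 8 data bits.
        return (DATA_BITS + PARITY_BITS) // width
    raise ValueError(f"unsupported port width: {width}")


aspect_ratios = {w: depth_for_width(w) for w in (1, 2, 4, 9, 18, 36)}
```

Under these assumptions the widest port, 36 bits, yields 512 words, while a 1-bit port yields 16384 words from the same block.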
Distributed RAM
CLB LUT configurable as Distributed RAM
  A LUT equals 16x1 RAM
  Implements Single and Dual-Ports
  Cascade LUTs to increase RAM size
Synchronous write
Synchronous/Asynchronous read
  Accompanying flip-flops used for synchronous read
[Figure: LUT-based RAM primitives. RAM16X1S (D, WE, WCLK, A0-A3, O);
RAM32X1S (D, WE, WCLK, A0-A4, O); RAM16X2S (D0, D1, WE, WCLK, A0-A3,
O0, O1); RAM16X1D (D, WE, WCLK, A0-A3, SPO, DPRA0-DPRA3, DPO).]
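The synchronous-write, asynchronous-read behaviour of a dual-port 16x1 distributed RAM can be sketched in Python. This is a behavioural illustration only: the class name `Ram16x1d` is hypothetical, and the method names simply mirror the SPO, DPO, and DPRA pin names shown for RAM16X1D.

```python
class Ram16x1d:
    """Behavioural model of a 16x1 dual-port distributed RAM."""

    def __init__(self):
        self.mem = [0] * 16

    def clock(self, d, we, a):
        """Rising edge of WCLK: write d at address a when WE is high."""
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a):
        """Single-port output: asynchronous read at the write address A."""
        return self.mem[a & 0xF]

    def dpo(self, dpra):
        """Dual-port output: asynchronous read at the independent address DPRA."""
        return self.mem[dpra & 0xF]


ram = Ram16x1d()
ram.clock(d=1, we=1, a=5)   # write a 1 to address 5
ram.clock(d=0, we=0, a=5)   # WE low: contents stay unchanged
```

Reads return data immediately from the addressed location; registering the output in the accompanying flip-flop would turn this into a synchronous read.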
18 x 18 Embedded Multiplier
Fast arithmetic functions
Optimized to implement multiply / accumulate
modules
18 x 18 signed multiplier
Fully combinatorial
Optional registers with CE & RST (pipeline)
Independent from adjacent block RAM
18 x 18 Multiplier
Embedded 18-bit x 18-bit multiplier
2’s complement signed operation
Multipliers are organized in columns
[Figure: 18 x 18 multiplier with 18-bit inputs Data_A and Data_B
and a 36-bit output.]
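The 2's complement operation of the multiplier, with two 18-bit inputs and a 36-bit output, can be illustrated with a short Python sketch. The function names are hypothetical; the code only models the arithmetic, not timing or the optional pipeline registers.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mult18x18(data_a, data_b):
    """Signed 18 x 18 multiply, returned as a raw 36-bit pattern."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)


# 0x3FFFF is -1 in 18-bit 2's complement, so the product below is -2.
raw = mult18x18(0x3FFFF, 2)
as_integer = to_signed(raw, 36)
```

A 36-bit result is wide enough for every input pair, including the largest case of the most negative value multiplied by itself.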
Basic I/O Block Structure
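[Figure: I/O block with three flip-flops, each with D, Q, enable (EC),
and set/reset (SR): a three-state control flip-flop, an output
flip-flop on the output path, and an input flip-flop on the input
path providing direct and registered inputs. Clock and set/reset
are shared.]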
I/O Signal Types
I/O Signal Type
  Single-Ended: LVCMOS, HSTL, SSTL, LVTTL
  Differential: LVDS, Bus LVDS, LVPECL
NOTE: Only the popular I/O types are shown here.
IOB: Double Data Rate Registers
DDR registers can be clocked by
  Clock and not (clock) if the duty cycle is 50/50
  CLK0 and CLK180 DLL outputs
[Figure: timing diagram. DATA_1 carries D1A, D1B, D1C and DATA_2
carries D2A, D2B, D2C; the dual data rate output interleaves them
as D1A, D2A, D1B, D2B, D1C.]
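The interleaving performed by the DDR output registers can be sketched in Python. The model assumes that DATA_1 is sent on the rising clock edge and DATA_2 on the falling edge, which matches the ordering in the timing diagram; the function name is hypothetical.

```python
def ddr_output(data_1, data_2):
    """Interleave two single-rate streams into one double-data-rate stream."""
    out = []
    for rising, falling in zip(data_1, data_2):
        out.append(rising)    # value driven on the rising clock edge
        out.append(falling)   # value driven on the falling clock edge (or CLK180)
    return out


stream = ddr_output(["D1A", "D1B", "D1C"], ["D2A", "D2B", "D2C"])
```

Each clock period carries two values on the pin, so the output data rate is twice the clock frequency.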
THANK YOU