Lecture04 - High-Level Digital Design Automation

The document outlines the course ECE 6775 on High-Level Digital Design Automation, focusing on Field-Programmable Gate Arrays (FPGAs) and their architectures. It includes exercises on operational intensity analysis for 2D convolution, discussions on FPGA components, and the advantages of FPGA-based computing. The agenda also mentions upcoming topics such as algorithm analysis and acknowledges contributions from various professors.

ECE 6775

High-Level Digital Design Automation

Fall 2024

Field-Programmable Gate Arrays (FPGAs)

Announcements

▸TA-led hands-on tutorial on HLS next Tuesday

– Bring your laptop

Exercise: OI Analysis of 2D Convolution

[Figure: input image frame with R rows and C columns]

Estimate the operational intensity (OI) of the 2D convolution kernel both without and with data reuse

OI Analysis of 2D Convolution w/o Line Buffer

[Figure: input image frame with R rows and C columns]

for (r = 1; r < R - 1; r++)            /* skip the one-pixel border */
  for (c = 1; c < C - 1; c++)
    for (i = 0; i < 3; i++)
      for (j = 0; j < 3; j++)
        out[r][c] += img[r+i-1][c+j-1] * f[i][j];

▸ OI without data reuse
– Number of operations = C*R*9*2
  (9 filter taps per output pixel, 1 multiply + 1 add per tap; border pixels ignored)
– External mem accesses (bytes) = C*R*9
  (assuming 1 byte per pixel in grayscale)
– Resulting OI = 2 ops/byte

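The arithmetic on this slide can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, and it keeps the slide's simplifications (1 byte per pixel, border pixels ignored, only input reads counted as external traffic).

```c
#include <assert.h>

/* Operational intensity (OI): arithmetic operations per byte of
 * external-memory traffic. */
static double operational_intensity(double ops, double bytes) {
    assert(bytes > 0.0);
    return ops / bytes;
}

/* 3x3 convolution over an R x C grayscale frame (1 byte per pixel).
 * reads_per_pixel = 9 models no reuse: every tap is fetched again.
 * reads_per_pixel = 1 models full reuse: every pixel is fetched once. */
static double conv3x3_oi(double R, double C, double reads_per_pixel) {
    double ops   = R * C * 9.0 * 2.0;        /* 9 taps x (1 mul + 1 add) */
    double bytes = R * C * reads_per_pixel;  /* input reads only */
    return operational_intensity(ops, bytes);
}
```

For any frame size, `conv3x3_oi(R, C, 9.0)` evaluates to 2 because the R*C factor cancels, which is why the slide can state the OI without fixing R and C.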
OI Analysis of 2D Convolution w/ Line Buffer

[Figure: input image frame with R rows and C columns]

for (r = 1; r < R - 1; r++)            /* skip the one-pixel border */
  for (c = 1; c < C - 1; c++)
    for (i = 0; i < 3; i++)
      for (j = 0; j < 3; j++)
        out[r][c] += img[r+i-1][c+j-1] * f[i][j];

▸ OI with data reuse using line buffer
– Number of operations = C*R*9*2
– External mem accesses (bytes) = C*R
  (each pixel is fetched from external memory once and then reused on chip)
– OI = 18 ops/byte

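One way to get the C*R access count is sketched below. It is not the course's reference implementation: the frame size, the function name `conv3x3_linebuf`, and the read counter are assumptions made for illustration. Two row-sized buffers and a 3x3 window hold everything the kernel needs, so each input pixel is read from `img` exactly once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { R = 6, C = 8 };  /* small hypothetical frame */

/* Streams img once in raster order and returns the number of external
 * reads. Border outputs are not written, as in the loop nest above. */
static size_t conv3x3_linebuf(unsigned char img[R][C],
                              int f[3][3], int out[R][C]) {
    unsigned char line[2][C];  /* line[0] = row r-2, line[1] = row r-1 */
    unsigned char win[3][3];   /* sliding 3x3 window */
    size_t reads = 0;

    memset(line, 0, sizeof line);
    memset(win, 0, sizeof win);

    for (int r = 0; r < R; r++) {
        for (int c = 0; c < C; c++) {
            unsigned char px = img[r][c];  /* the only external read */
            reads++;

            /* Shift the window left by one column. */
            for (int i = 0; i < 3; i++) {
                win[i][0] = win[i][1];
                win[i][1] = win[i][2];
            }
            /* New right column: two buffered rows plus the new pixel. */
            win[0][2] = line[0][c];
            win[1][2] = line[1][c];
            win[2][2] = px;

            /* Age the line buffers at column c. */
            line[0][c] = line[1][c];
            line[1][c] = px;

            /* The window now covers rows r-2..r and columns c-2..c,
             * so it is centered on output pixel (r-1, c-1). */
            if (r >= 2 && c >= 2) {
                int acc = 0;
                for (int i = 0; i < 3; i++)
                    for (int j = 0; j < 3; j++)
                        acc += win[i][j] * f[i][j];
                out[r - 1][c - 1] = acc;
            }
        }
    }
    return reads;
}
```

The returned count equals R*C, compared with roughly 9*R*C reads for the plain loop nest, while the on-chip storage is only two image rows plus nine window registers.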
Recap: Design Space Exploration with Roofline

[Figure: roofline plot of Computational Throughput (y-axis, 0 to 120) versus Operational Intensity (OI) (x-axis, 0 to 60), with a Bandwidth Roof, a Computational Roof, and design points A and B.]

Design points A & B achieve the same throughput
But which one would you prefer?

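As a reminder of how the two roofs combine, the standard roofline bound can be sketched in C. The function names are illustrative and the numbers in the usage note are hypothetical; they are not read off the plot.

```c
#include <assert.h>

/* Attainable throughput under the roofline model:
 * min(computational roof, memory bandwidth * OI). */
static double roofline_attainable(double peak_ops_per_s,
                                  double bytes_per_s, double oi) {
    assert(oi >= 0.0);
    double bw_bound = bytes_per_s * oi;  /* bandwidth roof at this OI */
    return bw_bound < peak_ops_per_s ? bw_bound : peak_ops_per_s;
}

/* OI at which the bandwidth roof meets the computational roof. */
static double ridge_point(double peak_ops_per_s, double bytes_per_s) {
    assert(bytes_per_s > 0.0);
    return peak_ops_per_s / bytes_per_s;
}
```

With a hypothetical peak of 100 ops/s and 5 bytes/s of bandwidth, an OI of 2 is bandwidth-bound at 10 ops/s, an OI of 18 reaches 90 ops/s, and anything past the ridge point at OI = 20 is capped at 100 ops/s.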
Agenda

▸FPGA introduction
– Basic building blocks
– Classical homogeneous FPGA architectures
– Modern heterogeneous FPGA architectures

Tradeoff between Compute Efficiency and Flexibility

[Figure: spectrum from FLEXIBILITY to EFFICIENCY: CPUs, GPUs, FPGAs, ASICs. The CPU is drawn with a Control Unit (CU), an Arithmetic Logic Unit (ALU), and Registers.]

What Are FPGAs

▸ Field-programmable gate array

– Can be configured to act like any circuit after manufacturing
– Can do many things – we focus on computation acceleration

FPGAs Come In Many Forms

▸ PCIe-Attached
▸ In-Storage
▸ CPU Integrated
▸ In-Network

Building Blocks of Modern FPGA Architectures

▸ A programmable array of logic blocks (LUT, FF), interconnects, I/Os, and dedicated blocks (BRAM, DSP)

[Figure: FPGA fabric with look-up table (LUT) and DSP blocks labeled]

Counting Boolean Functions

▸ How many distinct 2-input 1-output Boolean functions exist?
▸ What about K inputs?

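A counting argument answers both questions: a K-input truth table has 2^K rows, and each row's output can be chosen independently, giving 2^(2^K) functions. The helper below is a small sketch of that count; the function name is illustrative, and the limit of K <= 5 only exists so the result fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct K-input, 1-output Boolean functions: 2^(2^K).
 * A truth table has 2^K rows and each row's output is freely 0 or 1. */
static uint64_t count_boolean_functions(unsigned k) {
    assert(k <= 5);                      /* 2^(2^6) overflows uint64_t */
    uint64_t rows = UINT64_C(1) << k;    /* 2^K truth-table rows */
    return UINT64_C(1) << rows;          /* one function per output column */
}
```

For K = 2 this gives 16, and for K = 3 it gives 256.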
Multiplexer as a Universal Gate

▸ Any function of k variables can be implemented with a 2^k:1 multiplexer

Full-adder truth table:

A  B  Cin | S  Cout
0  0  0   | 0  0
0  0  1   | 1  0
0  1  0   | 1  0
0  1  1   | 0  1
1  0  0   | 1  0
1  0  1   | 0  1
1  1  0   | 0  1
1  1  1   | 1  1

[Figure: 8:1 MUX producing Cout. Data inputs 0 to 7 and select inputs S2, S1, S0 are marked "?" for the reader to fill in.]

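One way to fill in the question marks is sketched below, assuming the select lines are wired as S2 = A, S1 = B, S0 = Cin (the slide leaves this open). The data inputs then become the Cout column of the truth table, read top to bottom. The function names are illustrative.

```c
#include <assert.h>

/* 8:1 multiplexer: forwards data[sel] for a 3-bit select value. */
static int mux8(const int data[8], unsigned sel) {
    assert(sel < 8);
    return data[sel];
}

/* Full-adder carry-out through an 8:1 MUX.
 * Assumed wiring: S2 = A, S1 = B, S0 = Cin, so data input i is the
 * Cout entry of truth-table row i. */
static int cout_via_mux(int a, int b, int cin) {
    static const int cout_column[8] = {0, 0, 0, 1, 0, 1, 1, 1};
    unsigned sel = ((unsigned)(a & 1) << 2)
                 | ((unsigned)(b & 1) << 1)
                 |  (unsigned)(cin & 1);
    return mux8(cout_column, sel);
}
```

Swapping `cout_column` for the S column (0, 1, 1, 0, 1, 0, 0, 1) turns the same multiplexer into the sum output, which is the sense in which the MUX is universal.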
Look-Up Table (LUT)

§ A k-input LUT (k-LUT) can be configured to implement any k-input 1-output combinational logic
– 2^k SRAM bits
– Delay is independent of logic function

[Figure: A 3-input LUT. Eight SRAM cells (each 0/1) feed a MUX selected by x2, x1, x0, producing output Y.]

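A software model makes the "2^k SRAM bits plus a MUX" picture concrete. The sketch below is an illustration rather than vendor code: the type and function names are made up, and k is limited to 5 so the configuration fits in a 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* A k-input LUT modeled as 2^k configuration bits.
 * Bit i of sram holds the output for input pattern i. */
typedef struct {
    unsigned k;     /* number of inputs, k <= 5 in this sketch */
    uint32_t sram;  /* truth table, one bit per input pattern */
} lut_t;

/* Evaluating a LUT is a pure table lookup: the inputs act as the MUX
 * select, so the cost does not depend on which function is stored. */
static int lut_eval(const lut_t *lut, unsigned inputs) {
    assert(lut->k <= 5 && inputs < (1u << lut->k));
    return (int)((lut->sram >> inputs) & 1u);
}
```

Loading a 3-LUT with `0x96` yields a 3-input XOR and loading it with `0xE8` yields a 3-input majority function. Only the stored bits change; the lookup path is identical.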
Exercise: Implementing Logic with LUTs

▸ Implement a 2:1 MUX using a network of 2-input LUTs. Use the minimum number of LUTs

[Figure: 2:1 MUX with data inputs I0, I1, select S, and output Y. Building block: 2-input LUT (2-LUT).]

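One candidate answer uses three 2-LUTs: two AND-type LUTs gated by S and one OR. The sketch below checks that network and then brute-forces every feed-forward network of two 2-LUTs to see whether fewer could work. The function names and the 4-bit configuration encoding are assumptions made for this illustration.

```c
#include <assert.h>

/* 2-input LUT: 4 configuration bits, indexed by (a << 1) | b. */
static int lut2(unsigned cfg, int a, int b) {
    assert(cfg < 16);
    return (int)((cfg >> (((a & 1) << 1) | (b & 1))) & 1u);
}

/* Candidate answer: three 2-LUTs computing Y = S ? I1 : I0. */
static int mux2_three_luts(int s, int i0, int i1) {
    int t0 = lut2(0x2, s, i0);  /* ~S & I0 */
    int t1 = lut2(0x8, s, i1);  /*  S & I1 */
    return lut2(0xE, t0, t1);   /* t0 | t1 */
}

/* Exhaustive search over all networks of two 2-LUTs. The first LUT
 * reads two primary inputs; the output LUT reads any two of
 * {S, I0, I1, t}, where t is the first LUT's output.
 * Returns 1 if some configuration matches the MUX on all 8 patterns. */
static int two_luts_suffice(void) {
    for (int p = 0; p < 3; p++)
        for (int q = 0; q < 3; q++)
            for (unsigned c1 = 0; c1 < 16; c1++)
                for (int x = 0; x < 4; x++)
                    for (int y = 0; y < 4; y++)
                        for (unsigned c2 = 0; c2 < 16; c2++) {
                            int ok = 1;
                            for (int v = 0; v < 8 && ok; v++) {
                                /* sig = { S, I0, I1, t } */
                                int sig[4] = { (v >> 2) & 1, (v >> 1) & 1,
                                               v & 1, 0 };
                                sig[3] = lut2(c1, sig[p], sig[q]);
                                int want = sig[0] ? sig[2] : sig[1];
                                ok = (lut2(c2, sig[x], sig[y]) == want);
                            }
                            if (ok)
                                return 1;
                        }
    return 0;
}
```

The search comes back empty, so under this model three LUTs is the minimum. Intuitively, a 2-LUT output stage can see only one extra input besides the intermediate signal, and the MUX cannot be split that way for any choice of the extra input.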
A Logic Element

▸ A k-input LUT is usually followed by a flip-flop (FF) that can be bypassed
▸ The LUT and FF combined form a logic element

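The LUT-plus-bypassable-FF structure can be modeled in a few lines. This is a behavioral sketch with made-up names, not a description of any particular device: `use_ff` stands in for the configuration bit that selects between the registered and the combinational output.

```c
#include <assert.h>
#include <stdint.h>

/* Logic element: a k-input LUT followed by a bypassable flip-flop. */
typedef struct {
    unsigned k;       /* number of LUT inputs, k <= 5 in this sketch */
    uint32_t sram;    /* 2^k LUT configuration bits */
    int      use_ff;  /* 1: registered output, 0: FF bypassed */
    int      ff_q;    /* current flip-flop state */
} logic_element_t;

/* Combinational LUT output for the given input pattern. */
static int le_lut(const logic_element_t *le, unsigned inputs) {
    assert(le->k <= 5 && inputs < (1u << le->k));
    return (int)((le->sram >> inputs) & 1u);
}

/* Value the element drives onto the routing fabric right now. */
static int le_output(const logic_element_t *le, unsigned inputs) {
    return le->use_ff ? le->ff_q : le_lut(le, inputs);
}

/* Rising clock edge: the flip-flop captures the LUT output. */
static void le_clock(logic_element_t *le, unsigned inputs) {
    le->ff_q = le_lut(le, inputs);
}
```

With `use_ff = 1` the output changes only on `le_clock`, which is what lets a chain of logic elements form a pipeline stage; with `use_ff = 0` the element behaves as plain combinational logic.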
A Logic Block

▸ A logic block clusters multiple logic elements

Arithmetic Circuitry in Logic Block

▸ Xilinx (now AMD): LUTs implement carry propagate and generation logic
▸ Intel/Altera: LUTs pass inputs to hardened adders

Routing Architecture

Hierarchical routing architecture vs. Island-style routing architecture

Traditional Homogeneous FPGA Architecture

[Figure: FPGA array with labeled switch block, routing track, and logic block]

Modern Heterogeneous Field-Programmable System-on-Chip (SoC)

▸ Island-style configurable mesh routing
▸ Lots of dedicated components
– Memories/multipliers, I/Os, processors
– Specialization leads to higher performance and lower power

[Figure credit: [Link]]

Dedicated DSP Blocks

▸ Built-in components for fast arithmetic operations, optimized for DSP applications
– Essentially a multiply-accumulate core with many other features
– Fixed logic and connections; functionality may be configured using control signals at run time
– Much faster than a LUT-based implementation (ASIC vs. LUT)

Example: Xilinx DSP48E Slice

§ 25x18 signed multiplier
§ 48-bit add/subtract/accumulate
§ 48-bit logic operations
§ SIMD operations (12/24 bit)
§ Pipeline registers for high speed

[source: AMD Xilinx]
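The multiply-accumulate core can be approximated in software using the operand widths listed above. This is a behavioral sketch only: the function names are illustrative, pipeline registers and the other slice features are ignored, and wrap-around on 48-bit overflow is an assumption of this model rather than a statement about the hardware's configured behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low 48 bits of x and sign-extend from bit 47, mimicking a
 * 48-bit two's-complement accumulator that wraps on overflow. */
static int64_t wrap48(int64_t x) {
    uint64_t u = (uint64_t)x & ((UINT64_C(1) << 48) - 1);
    int64_t  v = (int64_t)u;                        /* 0 .. 2^48 - 1 */
    return (u >> 47) ? v - (INT64_C(1) << 48) : v;  /* apply sign bit */
}

/* One multiply-accumulate step: acc + a * b, with a limited to
 * 25-bit signed and b to 18-bit signed operands. */
static int64_t mac_25x18(int64_t acc, int32_t a, int32_t b) {
    assert(a >= -(1 << 24) && a < (1 << 24));  /* 25-bit signed range */
    assert(b >= -(1 << 17) && b < (1 << 17));  /* 18-bit signed range */
    return wrap48(acc + (int64_t)a * (int64_t)b);
}
```

A dot product is then a loop of `acc = mac_25x18(acc, x[i], w[i])` calls, which is the pattern high-level synthesis tools typically try to map onto DSP slices instead of LUT-based multipliers.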

Dedicated Block RAMs (BRAMs)

▸ Example: Xilinx 18K/36K block RAMs
– 32K x 1 to 512 x 72 in one 36K block
– Simple dual-port and true dual-port configurations
– Built-in FIFO logic
– 64-bit error correction coding per 36K block

[Figure: 18K/36K block RAM. Port A: DIA, DIPA, ADDRA, WEA, ENA, CLKA, DOA, DOPA. Port B: DIB, DIPB, ADDRB, WEB, ENB, CLKB, DOB, DOPB.]

[source: AMD Xilinx]
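The true dual-port configuration can be pictured as one memory array with two independent sets of address, data, enable, and write-enable signals. The sketch below is a simplified behavioral model with several assumptions that do not come from the slide: a hypothetical 1K x 32 shape with parity ignored, a single shared clock, read-before-write behavior on each port, and port B winning if both ports write the same address in the same cycle.

```c
#include <assert.h>
#include <stdint.h>

enum { DEPTH = 1024 };  /* hypothetical 1K x 32 shape, parity ignored */

typedef struct {
    uint32_t mem[DEPTH];
    uint32_t doa, dob;  /* registered read data for ports A and B */
} bram_t;

typedef struct {
    int      en;    /* port enable (ENA / ENB) */
    int      we;    /* write enable (WEA / WEB) */
    unsigned addr;  /* address (ADDRA / ADDRB) */
    uint32_t di;    /* write data (DIA / DIB) */
} bram_port_t;

/* One clock edge on both ports. Each enabled port first latches the
 * old contents at its address, then performs its write if WE is set. */
static void bram_clock(bram_t *b, bram_port_t pa, bram_port_t pb) {
    assert(pa.addr < DEPTH && pb.addr < DEPTH);
    if (pa.en) b->doa = b->mem[pa.addr];
    if (pb.en) b->dob = b->mem[pb.addr];
    if (pa.en && pa.we) b->mem[pa.addr] = pa.di;
    if (pb.en && pb.we) b->mem[pb.addr] = pb.di;  /* B wins on a clash */
}
```

Because both ports operate in the same cycle, a producer can write through port A while a consumer reads through port B, which is the usual way a line buffer or FIFO is built on top of a block RAM.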

An Embedded FPGA SoC

▸ Xilinx Zynq All Programmable System-on-Chip
– Dual ARM Cortex-A9 + NEON SIMD extension @ 600MHz~1GHz
– Up to 350K logic cells, 2MB Block RAM, 900 DSP48s

[Source: AMD Xilinx]

A Cloud FPGA Instance

▸ AWS F1 instance: AMD Xilinx UltraScale+ VU9P
– ~2 Million Logic Blocks
– ~5000 DSP Blocks
– ~300Mb Block RAM

[Figure source: David Pellerin, AWS]

An Even More Heterogeneous (FPGA) Accelerator

AMD Xilinx Versal Architecture Overview

▸ Adaptable Engines
– 2X compute density
▸ Intelligent Engines
– AI Compute
– Diverse DSP workloads
▸ Scalar Engines
– Platform Control
– Edge Compute
▸ Network-on-Chip
– Guaranteed Bandwidth
– Enables SW Programmability
▸ Protocol Engines
– Integrated 600G cores
– 4X encrypted bandwidth
▸ Programmable I/O
– Any sensor, any interface
– Extendable peripheral set
▸ DDR Memory
– 2X bandwidth/pin
– Server-class density
▸ PCIe & CCIX
– 2X PCIe & DMA bandwidth
– Cache-coherent interface to accelerators
▸ Transceivers
– Broad range, 25G → 112G
– 58G in mainstream devices

[source: AMD Xilinx]

Key Advantages of FPGA-Based Computing

▸ Massive amount of fine-grained parallelism
– Highly parallel and/or deeply pipelined architecture
▸ Silicon (re)configurable to fit the application
– Compute at desired numerical accuracy
– Customized memory hierarchy

⇒ low (and predictable) latency
⇒ higher energy efficiency

Next Lecture

▸Analysis of Algorithms

Acknowledgements

▸ These slides contain/adapt materials developed by

– Prof. Jason Cong (UCLA)
– Andrew Boutros and Prof. Vaughn Betz (Univ. of Toronto)
– UCI CS295 by Prof. Sang-Woo Jun
