08 Architecture
Embedded Architecture
Introduction to Microcontrollers
Ø A microcontroller (MCU) is a small computer on a single integrated circuit consisting of a relatively simple central processing unit (CPU) combined with peripheral devices such as memories, I/O devices, and timers.
§ By some accounts, more than half of all CPUs sold worldwide are microcontrollers.
§ Such a claim is hard to substantiate because the difference between microcontrollers and general-purpose processors is indistinct.
Microcontrollers
Ø An Embedded Computer System on a Chip
§ A CPU
§ Memory (Volatile and Non-Volatile)
§ Timers
§ I/O Devices
Ø Typically intended for limited energy usage
§ Low power when operating plus sleep modes
Ø Where might you use a microcontroller?
What is Control?
Ø Sequencing operations
§ Turning switches on and off
Ø Adjusting continuously (or at least finely) variable quantities to influence a process
Microcontroller vs Microprocessor
Ø A microcontroller is a small computer on a single integrated circuit containing a processor core, memory, and programmable input/output peripherals.
Ø A microprocessor incorporates the functions of a computer’s central processing unit (CPU) on a single integrated circuit.
Microcontroller vs Microprocessor
Types of Processors
Ø In general-purpose computing, the variety of instruction set architectures in use today is limited, with the Intel x86 architecture overwhelmingly dominant.
Ø There is no such dominance in embedded computing. On the contrary, the variety of processors can be daunting to a system designer.
Ø Would you want the same microprocessor in your watch, your autonomous vehicle, and an industrial sensor?
How to choose?
Ø How do you choose a microprocessor or microcontroller?
Ø Things that matter
§ Peripherals
§ Concurrency & Timing
§ Clock Rates
§ Memory sizes (SRAM & flash)
§ Package sizes
Types of Microcontrollers
DSP Processors
Ø Processors designed specifically to support numerically intensive signal processing applications are called DSP processors, or DSPs (digital signal processors).
Ø Signal processing applications: interactive games; radar, sonar, and LIDAR (light detection and ranging) imaging systems; video analytics (the extraction of information from video, for example for surveillance); driver-assist systems for cars; medical electronics; and scientific instrumentation.
A Common Signal Processing Algorithm
Ø Finite impulse response (FIR) filtering
y(n) = Σ_{i=0}^{N−1} a_i x(n − i)
Ø N is the length of the filter
Ø a_i are the tap values
FIR Filter Implementation
Ø z⁻¹ is the unit delay
Ø Suppose N = 4 and a0 = a1 = a2 = a3 = 1/4.
Ø Then for all n ∈ N,
y(n) = (x(n) + x(n − 1) + x(n − 2) + x(n − 3))/4 .
Ø Multiply-Accumulate
(Figure: magnitude response vs. frequency for the digital FIR filter compared with an analog filter.)
Digital Filter Critique
Ø The filter pole is at about ¼ of the sampling rate
§ We have only 4 samples of the impulse response
§ This makes the FIR filter simple: only 4 taps
§ This also degrades the filter performance
Ø We may be able to improve the filter performance somewhat by using a different design technique
§ The filter coefficients would differ
Ø A higher sampling rate relative to the filter corner frequency could also help
FIR Filter Delay Implementation
Ø Circular Buffer
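A sketch of the circular-buffer delay line in C, continuing the 4-tap averaging example (names are my own): instead of shifting all N samples each step, the newest sample overwrites the oldest and an index wraps around.

```c
#define N 4  /* filter length */

static const double a[N] = {0.25, 0.25, 0.25, 0.25};
static double delay[N];  /* circular buffer holding the last N samples */
static int head = 0;     /* index of the most recent sample */

/* Accept one input sample and produce one output sample.
 * delay[(head - i + N) % N] holds x(n - i), so no samples are moved:
 * only the head index advances, wrapping around the buffer. */
double fir_step(double x) {
    head = (head + 1) % N;  /* slot currently holding the oldest sample */
    delay[head] = x;        /* overwrite it with the newest sample */
    double y = 0.0;
    for (int i = 0; i < N; i++)
        y += a[i] * delay[(head - i + N) % N];
    return y;
}
```

On DSPs the modulo addressing is often free: address-generation units support circular addressing in hardware.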
Programmable Logic Controller (PLC)
Ø A microcontroller system for industrial automation
§ Continuous operation
§ Hostile environments
§ Originated as replacements for control circuits that used electrical relays to control machinery
GPUs
Ø A graphics processing unit (GPU) is a specialized processor designed especially to perform the calculations required in graphics rendering.
Ø Used mostly for gaming in the earlier days
Ø Common programming language: CUDA
Parallelism vs Concurrency
Ø Embedded computing applications typically do more than one thing “at a time.”
Ø Tasks are said to be “concurrent” if they conceptually execute simultaneously
Ø Tasks are said to be “parallel” if they physically execute simultaneously
§ Typically on multiple processors or servers at the same time
Imperative Language
Ø Non-concurrent programs specify a sequence of instructions to execute.
Ø Imperative language: expresses a computation as a sequence of operations
§ Examples: C, Java
Ø How do you write concurrent programs in an imperative language?
§ With a thread library
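A minimal sketch of the thread-library approach using POSIX threads (pthreads). The task structure and function names are my own, and the code assumes a POSIX system (compile with -pthread):

```c
#include <pthread.h>

/* Two concurrent tasks, each summing half of an array. */
#define LEN 8
static const int data[LEN] = {1, 2, 3, 4, 5, 6, 7, 8};

struct task { int lo, hi, sum; };

static void *sum_range(void *arg) {
    struct task *t = arg;
    t->sum = 0;
    for (int i = t->lo; i < t->hi; i++)
        t->sum += data[i];
    return NULL;
}

int total_with_threads(void) {
    pthread_t tid[2];
    struct task t[2] = {{0, LEN / 2, 0}, {LEN / 2, LEN, 0}};
    for (int i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, sum_range, &t[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);  /* wait for both tasks to finish */
    return t[0].sum + t[1].sum;
}
```

The two tasks are concurrent; whether they are also parallel depends on how the OS scheduler maps them onto processor cores.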
Dependency – Sequential Consistency
Ø No dependency between lines 3 and 4
Ø Line 4 is dependent on Line 3
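The code listing this slide refers to is not in this extract; a hypothetical C fragment with the same structure (the line numbers in the comments are illustrative only):

```c
int demo(void) {
    int a = 1;      /* line 1 */
    int b = 2;      /* line 2 */
    int c = a + b;  /* line 3 */
    int d = a * 2;  /* line 4: reads only a, so it has no dependency on
                       line 3 -- the two could be reordered or executed
                       in parallel without changing the result */
    int e = c + d;  /* reads c, so it IS dependent on line 3 and must
                       observe its result (sequential consistency) */
    return e;
}
```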
Thread Mapping on Processor
Ø OS Dependent Scheduler
§ Static Mapping
§ Basic Lowest Load (fill in Round Robin fashion)
§ Extended Lowest Load
Performance Improvement
Ø Various current architectures seek to improve performance by finding and exploiting potential for parallel execution
§ This frequently improves processing throughput
§ It does not always improve processing latency
§ It frequently makes processing time less predictable
Ø Many embedded applications rely on results being produced at predictable, regular rates
§ Embedded results must be available at the right time
Parallelism
Ø Temporal Parallelism: Pipelining
Ø Spatial Parallelism:
§ Superscalar (instruction- and data-level parallelism)
§ VLIW
§ Multicore
RISC and CISC Architectures
Ø CISC – Complex Instruction Set Computer
§ Multi-clock complex instructions
Ø RISC – Reduced Instruction Set Computer
§ Simple instructions that can be executed within one cycle
5 Cycles of RISC Instruction Set
Ø Instruction fetch cycle (IF)
§ Fetch the instruction from the memory location pointed to by the PC, then increment the PC
Ø Instruction decode/register fetch cycle (ID)
§ Decode the instruction
Ø Execution/effective address cycle (EX)
§ ALU operates on the operands
Ø Memory access (MEM)
§ Load/Store instructions
Ø Write-back cycle (WB)
§ Register-Register ALU instruction
Pipelining in RISC
(Figure: five-stage RISC pipeline datapath with PC, instruction memory, decode, register bank, ALU, data memory, and muxes, annotated with a data hazard at a computed branch and a control hazard at a conditional branch.)
Pipelining Hazard
Ø Data Hazard: RAW (read after write), WAW (write after write), WAR (write after read)
§ Pipeline bubble (no op)
§ Interlock
§ Out-of-order Execution
Ø Control Hazard
§ Out-of-order Execution
§ Speculative Execution
Interlocks
Ø An interlock stalls the pipeline when instruction B reads a register written by instruction A, so that B's register read happens after A's write
(Figure: pipeline reservation tables covering register bank read, ALU, data memory, and register bank write for instructions A–E, without the interlock in 9 cycles and with it in 12.)
Symmetric FIR Filter
Ø The coefficients of an FIR filter are often symmetric
§ a_i = a_{N−1−i}, so only about N/2 distinct coefficients are needed
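A sketch of how symmetry halves the number of multiplications, assuming an even-length filter; the coefficient values here are arbitrary illustrations:

```c
#define N 4
/* Symmetric taps a_i = a_{N-1-i}: only N/2 distinct values are stored
 * (here a0 = a3 = 0.1 and a1 = a2 = 0.4). */
static const double a_half[N / 2] = {0.1, 0.4};

/* x[0..N-1] holds x(n-N+1) .. x(n), newest sample last.
 * Each stored coefficient multiplies the SUM of the two samples that
 * share it, so only N/2 multiplications are needed per output. */
double fir_sym(const double x[N]) {
    double y = 0.0;
    for (int i = 0; i < N / 2; i++)
        y += a_half[i] * (x[N - 1 - i] + x[i]);
    return y;
}
```

Some DSP architectures exploit exactly this: an add before the multiply-accumulate serves the two symmetric taps at once.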
Multicore Architecture
Ø Combination of several processors on a single chip
Ø Real-time and safety-critical tasks can have dedicated processors
Ø Heterogeneous multicore
§ CPUs and GPUs together
FPGAs
Ø Field Programmable Gate Arrays
§ Set of logic gates and RAM blocks
§ Reconfigurable / programmable
§ Precise timing
Fixed Point Representation
f = A_h + A_l × 2^(−n), where the radix point separates the integer bits A_h from the n fraction bits A_l
Example: f = 21 + 5 × 2^(−3) = 21.625
Unsigned Fixed Point Representation
Ø Example: Convert f = 3.141593 to unsigned fixed-point UQ4.12 format.
Signed Fixed Point Representation
Bit layout: sign bit s, then m integer bits, then n fraction bits
f = A / 2^n
where N = m + n + 1 is the total word length
Signed Fixed Point Representation
Ø Example: Convert f = −3.141593 to signed fixed-point Q3.12 format.
Subtraction and Addition
Ø Conversions: I_a = f_a × 2^n and f_a = I_a × 2^(−n), and likewise for b and c
Ø Subtraction: f_c = f_a − f_b, so I_c = I_a − I_b
Ø Addition: f_c = f_a + f_b
= I_a × 2^(−n) + I_b × 2^(−n)
= (I_a + I_b) × 2^(−n)
I_c × 2^(−n) = (I_a + I_b) × 2^(−n), so I_c = I_a + I_b
Multiplication
f_c = f_a × f_b
= (I_a × 2^(−n)) × (I_b × 2^(−n))
= I_a × I_b × 2^(−2n)
I_c = I_a × I_b × 2^(−n)
f_c = I_c × 2^(−n)
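In C (a sketch, with n = 12 fraction bits): the raw integer product carries scale 2^(−2n), so it is formed at full precision and shifted right by n to return to scale 2^(−n). The right shift of a product assumes arithmetic shifting, as on common platforms:

```c
#include <stdint.h>

/* Fixed-point multiply for values with 12 fraction bits.
 * Ia * Ib has scale 2^-24; shifting the full-precision product right
 * by 12 rescales it to 2^-12, i.e. Ic = Ia * Ib * 2^-12. */
int16_t q_mul(int16_t a, int16_t b) {
    int32_t full = (int32_t)a * (int32_t)b;  /* full-precision product */
    return (int16_t)(full >> 12);            /* rescale back to 2^-12 */
}
```

For example, 0.5 × 0.5: 2048 × 2048 = 4194304, and 4194304 >> 12 = 1024, which is 0.25 in Q3.12.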
Law of Conservation of Bits
Ø When multiplying two x-bit numbers with formats n.m and p.q, the result has format (n + p).(m + q)
Ø Processors may support full-precision multiplication in hardware
Ø The wider result must finally be converted back to the data-register width
Overflow Example
Ø Multiply 0.5 × 0.5
Ø With 31 fraction bits, the fixed-point representation of 0.5 is 2^30
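A sketch in C of why this overflows: with 31 fraction bits, 0.5 is the integer 2^30, and the product 2^30 × 2^30 = 2^60 does not fit in a 32-bit register, so the multiplication must go through a 64-bit intermediate before rescaling:

```c
#include <stdint.h>

/* Multiply two fixed-point values that use 31 fraction bits.
 * 0.5 is stored as 2^30; 2^30 * 2^30 = 2^60 cannot fit in 32 bits,
 * so the product is formed in 64 bits and then rescaled by >> 31. */
int32_t q31_mul(int32_t a, int32_t b) {
    int64_t full = (int64_t)a * (int64_t)b;  /* avoids 32-bit overflow */
    return (int32_t)(full >> 31);            /* back to 31 fraction bits */
}
```

Here 0.5 × 0.5 yields 2^60 >> 31 = 2^29, which is 0.25 at 31 fraction bits, as expected.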