Adders
Digital Integrated Circuits
A Design Perspective
Jan M. Rabaey
Anantha Chandrakasan
Borivoje Nikolic
Arithmetic Circuits
EE141 | © Digital Integrated Circuits 2nd | Arithmetic Circuits
A Generic Digital Processor
MEMORY
INPUT-OUTPUT
CONTROL
DATAPATH
Building Blocks for Digital Architectures
Arithmetic unit
- Bit-sliced datapath (adder, multiplier, shifter, comparator, etc.)
Memory
- RAM, ROM, Buffers, Shift registers
Control
- Finite state machine (PLA, random logic)
- Counters
Interconnect
- Switches
- Arbiters
- Bus
Arithmetic building blocks
The speed and power of arithmetic components often dominate overall system performance.
For each module, multiple topologies and design styles exist, each with its own advantages.
A global picture is of crucial importance: a designer should focus attention on the gates or transistors that have the largest impact on the goal function. Non-critical components can be developed routinely.
Typically there are two optimization processes: logic optimization (rearranging Boolean equations to obtain a faster or smaller circuit) and circuit optimization (manipulating circuit topology and transistor sizes to optimize speed).
Bit-Sliced Design
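[Figure: bit-sliced datapath with slices Bit 0 to Bit 3; each slice runs from Data-In through Register, Adder, Shifter, and Multiplexer to Data-Out; Control signals are shared by all slices]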
Full-Adder
[Figure: full-adder block; inputs A and B, output Sum]
Generate: $G = AB$
Propagate: $P = A \oplus B$
Delete: $D = \bar{A}\,\bar{B}$
Sum: $S = A \oplus B \oplus C_i$
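The generate, propagate, and delete signals can be checked against ordinary addition with a small behavioral sketch. The function name `full_adder` is hypothetical, and the carry expression $C_o = G + P C_i$ is the standard identity rather than something stated on this slide.

```python
from itertools import product


def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one-bit inputs a, b and carry-in ci."""
    g = a & b              # generate: a carry is produced regardless of ci
    p = a ^ b              # propagate: the carry-out equals ci
    d = (1 - a) & (1 - b)  # delete: the carry-out is 0 regardless of ci
    assert g + p + d == 1  # exactly one of the three conditions holds
    s = p ^ ci             # S = A xor B xor Ci
    co = g | (p & ci)      # Co = G + P*Ci (assumed standard identity)
    return s, co


# Exhaustive check against integer addition over all 8 input combinations.
for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    assert 2 * co + s == a + b + ci
```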
The Ripple-Carry Adder
[Figure: 4-bit ripple-carry adder; inputs A0 B0 to A3 B3, outputs S0 to S3]
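A behavioral sketch of the ripple-carry structure, assuming the usual arrangement in which the carry-out of each full adder feeds the carry-in of the next more significant bit. The function name and the default width of 4 bits are illustrative choices.

```python
def ripple_carry_add(a: int, b: int, n_bits: int = 4, c_in: int = 0) -> tuple[int, int]:
    """Add two n_bits-wide unsigned integers one bit at a time.

    Returns (sum modulo 2**n_bits, carry out of the most significant bit).
    """
    total, carry = 0, c_in
    for k in range(n_bits):                       # bit 0 first: the carry ripples upward
        ak, bk = (a >> k) & 1, (b >> k) & 1
        s = ak ^ bk ^ carry                       # sum bit of full adder k
        carry = (ak & bk) | ((ak ^ bk) & carry)   # carry-out of full adder k
        total |= s << k
    return total, carry
```

Because every sum bit waits for the carry from the bit below it, the worst-case delay of this structure grows linearly with the word length.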
Complementary Static CMOS Full Adder
[Figure: transistor schematic of the complementary static CMOS full adder; inputs A, B, Ci; internal node X; outputs S and Co]
28 Transistors
Complementary Static CMOS Full Adder
Large PMOS stacks are present in both the carry-generation and sum-generation circuits.
The intrinsic load capacitance of the Co signal is large and consists of eight capacitance components.
The carry and sum outputs each incur one extra inverter delay (worse when the load capacitance is large).
Note that the transistors driven by the critical signal Ci are placed closer to the output node.
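Express Sum and Carry as a Function of P, G, D
Define three new variables that depend ONLY on A and B:
Generate: $G = AB$
Propagate: $P = A \oplus B$
Delete: $D = \bar{A}\,\bar{B}$
Then $C_o = G + P C_i$ and $S = P \oplus C_i$.
[Figure: transmission-gate XOR; transistors M1/M2 and M3/M4, inputs A and B, output F]
When B = 1, M1/M2 act as an inverter and M3/M4 are off, so $F = \bar{A}B$.
When B = 0, M1/M2 are off and M3/M4 act as a transmission gate, so $F = A\bar{B}$.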
Transmission Gate Full Adder
[Figure: transmission-gate full adder in three parts: Setup (generates P), Sum Generation (output S), and Carry Generation (output Co); inputs A, B, Ci]
Propagate: $P = A \oplus B$
Manchester Carry Chain
[Figure: two Manchester carry gate circuits with signals Pi, Gi, Di, Ci, Co, and VDD]
Generate: $G = AB$
Propagate: $P = A \oplus B$
Delete: $D = \bar{A}\,\bar{B}$
Prevent floating Co
Manchester Carry Chain
[Figure: 4-bit Manchester carry chain; propagate signals P0 to P3, generate signals G0 to G3, carry-in Ci,0, carries C0 to C3]
Manchester Carry Chain
Stick Diagram
[Figure: stick diagram with a Propagate/Generate row (Pi, Gi, Pi+1, Gi+1; carries Ci-1, Ci, Ci+1) between VDD and GND, above an Inverter/Sum row]
Manchester Carry Chain
The delay of the Manchester carry chain can be modeled as a linearized RC network, as for a chain of transmission gates.
The propagation delay is therefore quadratic in the number of bits N (this does not imply that the delay is larger than that of the ripple-carry adder).
It may be necessary to insert signal-buffering inverters.
It is still a ripple-carry adder and is typically only good for small word lengths (< 8 to 16 bits).
Faster adders are needed for computer and multimedia applications with word lengths of 32 to 128 bits.
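The quadratic growth can be illustrated with the Elmore delay of a uniform RC chain. This is only a sketch under stated assumptions: every stage has the same series resistance `r` and node capacitance `c` (normalized, hypothetical values), and the 0.69 factor is the usual 50% step-response approximation.

```python
def elmore_delay(n_stages: int, r: float = 1.0, c: float = 1.0) -> float:
    """Approximate delay of a chain of n_stages identical RC sections.

    Node i is charged through the resistance of the i stages in front of it,
    so the total is 0.69 * r * c * (1 + 2 + ... + n) = 0.69 * r * c * n * (n + 1) / 2.
    """
    delay = 0.0
    for i in range(1, n_stages + 1):
        delay += c * (i * r)   # capacitance at node i times upstream resistance
    return 0.69 * delay


# Doubling the chain length from 4 to 8 stages raises the delay by 3.6x, not 2x.
ratio = elmore_delay(8) / elmore_delay(4)
```

Inserting buffers every few stages breaks the chain into short segments, which is the motivation for the signal-buffering inverters mentioned above.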
Carry-Bypass Adder
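Also called Carry-Skip
[Figure: 4-bit adder block of four FA cells with inputs P0 G0 to P3 G3, carry-in Ci,0, and carries Co,0 to Co,3, shown first as a plain ripple chain and then with a bypass multiplexer driving Co,3]
$BP = P_0 P_1 P_2 P_3$
[Plot: propagation delay tp versus word length N for the ripple adder and the bypass adder (M bits per block); the curves cross at N of about 4..8]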
Carry-Select Adder
[Figure: carry-select adder block; Setup stage producing P, G; Carry Vector; Sum Generation]
Carry Select Adder: Critical Path
[Figure: critical path through a 16-bit carry-select adder; blocks for bits 0–3, 4–7, 8–11, and 12–15, each with its own Setup stage]
Linear Carry Select
[Figure: linear carry-select adder; blocks for bits 0-3, 4-7, 8-11, and 12-15; delay annotation (1)]
Adder Delays - Comparison
[Plot: propagation delay tp (in unit delays, 0 to 50) versus word length N (0 to 60) for the ripple adder, bypass adder, linear select adder, and square-root select adder]
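Look-Ahead: Basic Idea
[Figure: N bit slices with inputs $A_0, B_0$ to $A_{N-1}, B_{N-1}$, carries $C_{i,0}$ to $C_{i,N-1}$, propagate signals $P_0$ to $P_{N-1}$, and sums $S_0$ to $S_{N-1}$]
$C_{o,k} = f(A_k, B_k, C_{o,k-1}) = G_k + P_k C_{o,k-1}$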
Look-Ahead: Topology
Expanding the look-ahead equations:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} C_{o,k-2})$
All the way:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\cdots + P_1 (G_0 + P_0 C_{i,0})))$
[Figure: circuit computing $C_{o,3}$ from $G_0$ to $G_3$, $P_0$ to $P_3$, and $C_{i,0}$]
Look-Ahead Adder: Logarithmic adder
[Figure: a function F of eight inputs A0 to A7 built as a linear chain, with $t_p \sim N$, and as a binary tree, with $t_p \sim \log_2(N)$]
Carry Look-Ahead Trees
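Carry recurrences:
$C_0 = G_0 + P_0 C_{in}$
$C_1 = G_1 + P_1 C_0$
$C_2 = G_2 + P_2 C_1$
$C_3 = G_3 + P_3 C_2$
Expanded:
$C_1 = G_1 + P_1 C_0 = G_1 + G_0 P_1 + P_1 P_0 C_{in} = G_{1:0} + P_{1:0} C_{in}$, with $G_{1:0} = G_1 + P_1 G_0$ and $P_{1:0} = P_1 P_0$
$C_2 = G_2 + P_2 C_1 = G_2 + G_1 P_2 + G_0 P_2 P_1 + P_2 P_1 P_0 C_{in} = G_{2:1} + P_{2:1} C_0$, with $G_{2:1} = G_2 + P_2 G_1$ and $P_{2:1} = P_2 P_1$
$C_3 = G_3 + P_3 C_2 = G_3 + G_2 P_3 + G_1 P_3 P_2 + G_0 P_3 P_2 P_1 + P_3 P_2 P_1 P_0 C_{in}$
$= G_{3:2} + P_{3:2} C_1 = G_{3:2} + P_{3:2} (G_{1:0} + P_{1:0} C_{in}) = (G_{3:2} + P_{3:2} G_{1:0}) + P_{3:2} P_{1:0} C_{in}$
The tree can be built up hierarchically in this way.
$G_{3:2} = G_3 + P_3 G_2$ and $P_{3:2} = P_3 P_2$ are called dot products.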
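The hierarchical grouping can be checked exhaustively with a short sketch. The helper names `dot` and `carry_c3` are hypothetical; the code pairs bits (1, 0) and (3, 2) first and then merges the two groups, mirroring the final expression for C3.

```python
from itertools import product


def dot(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Combine the (G, P) pair of a more significant group with a less significant one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def carry_c3(a: int, b: int, c_in: int) -> int:
    """Carry out of bit 3 of a 4-bit addition, using two levels of dot products."""
    gp = [((a >> k) & (b >> k) & 1, ((a >> k) ^ (b >> k)) & 1) for k in range(4)]
    g10_p10 = dot(gp[1], gp[0])          # (G1:0, P1:0)
    g32_p32 = dot(gp[3], gp[2])          # (G3:2, P3:2)
    g30, p30 = dot(g32_p32, g10_p10)     # (G3:2 + P3:2*G1:0, P3:2*P1:0)
    return g30 | (p30 & c_in)


# Exhaustive check against integer addition for all 4-bit operands.
for a, b, c_in in product(range(16), range(16), (0, 1)):
    assert carry_c3(a, b, c_in) == (a + b + c_in) >> 4
```

The two first-level dot products are independent of each other, so they can be evaluated in parallel; this is what gives the tree its logarithmic depth.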
Tree Adders
[Figure: 16-bit Kogge-Stone tree; inputs (A0, B0) to (A15, B15), outputs S0 to S15]
16-bit radix-2 Kogge-Stone tree (radix 2 means that the tree is binary: it combines two dot products or carry words at a time at each level of the hierarchy)
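A behavioral sketch of a radix-2 Kogge-Stone prefix computation, assuming the usual formulation in which level L combines each bit position with the group ending `2**L` positions below it. The function name and the way the carry-in is folded into bit 0 are illustrative choices, not taken from the slide.

```python
def kogge_stone_add(a: int, b: int, n_bits: int = 16, c_in: int = 0) -> tuple[int, int]:
    """Add two n_bits-wide unsigned integers with a radix-2 parallel-prefix tree.

    Returns (sum modulo 2**n_bits, carry out of the most significant bit).
    """
    g = [(a >> k) & (b >> k) & 1 for k in range(n_bits)]
    p = [((a >> k) ^ (b >> k)) & 1 for k in range(n_bits)]
    half_sum = p[:]                 # bitwise propagate, reused for the sum bits
    g[0] |= p[0] & c_in             # fold the carry-in into bit 0

    span = 1
    while span < n_bits:            # log2(n_bits) levels for a power-of-two width
        new_g, new_p = g[:], p[:]
        for k in range(span, n_bits):
            # Dot product of the group ending at k with the group ending at k - span.
            new_g[k] = g[k] | (p[k] & g[k - span])
            new_p[k] = p[k] & p[k - span]
        g, p = new_g, new_p
        span *= 2

    # g[k] now holds the carry out of bit k.
    total = 0
    for k in range(n_bits):
        carry_into_k = c_in if k == 0 else g[k - 1]
        total |= (half_sum[k] ^ carry_into_k) << k
    return total, g[n_bits - 1]
```

For 16 bits the loop runs four times (spans 1, 2, 4, 8), and every position is updated at every level, which is why this tree has short depth but dense wiring.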
Tree Adders
[Figure: 16-bit tree adder; inputs (a0, b0) to (a15, b15), outputs S0 to S15]
Sparse Trees
[Figure: inputs (a0, b0) to (a15, b15), outputs S0 to S15]
16-bit radix-2 sparse tree with sparseness of 2
Tree Adders
Brent-Kung Tree
[Figure: Brent-Kung tree; inputs (A0, B0), (A1, B1), ...; outputs S0, S1, ...]
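[Figure: adder datapath block diagram; operands a and b enter through 9-1 and 5-1 multiplexers; blocks CARRYGEN, SUMGEN + LU (LU: Logical Unit), SUMSEL, 2-1 Mux, and REG; signals g64, node1, sum, sumb, s0, s1, and clock ck1; output to cache; scale marker 1000 um]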
[Figure: layout of a 64-bit bit-sliced datapath (bit slices 0 to 63) with multiplexers, shifter, adder stages 1 to 3, sum select, wiring channels, and loopback buses. Courtesy of Intel.]
The adder is implemented as a radix-4 carry look-ahead adder; the red lines forward the results of the different stages.
Multipliers
The Binary Multiplication
$$Z = X \times Y = \sum_{k=0}^{M+N-1} Z_k 2^k = \left(\sum_{i=0}^{M-1} X_i 2^i\right)\left(\sum_{j=0}^{N-1} Y_j 2^j\right) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} X_i Y_j 2^{i+j}$$
with
$$X = \sum_{i=0}^{M-1} X_i 2^i, \qquad Y = \sum_{j=0}^{N-1} Y_j 2^j$$
The Binary Multiplication
          1 0 1 0 1 0    Multiplicand
  x           1 0 1 1    Multiplier
          1 0 1 0 1 0
        1 0 1 0 1 0      Partial products
      0 0 0 0 0 0
  + 1 0 1 0 1 0
    1 1 1 0 0 1 1 1 0    Result
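The worked example can be reproduced with a small shift-and-add sketch. The function names are hypothetical; each partial product is the multiplicand shifted left by j if multiplier bit j is 1, and zero otherwise.

```python
def partial_products(x: int, y: int, n_bits: int) -> list[int]:
    """Partial products of x times the n_bits-wide multiplier y, least significant row first."""
    return [(x << j) if (y >> j) & 1 else 0 for j in range(n_bits)]


def multiply(x: int, y: int, n_bits: int) -> int:
    """Sum all partial products to form the product."""
    return sum(partial_products(x, y, n_bits))


rows = partial_products(0b101010, 0b1011, 4)   # multiplicand 42, multiplier 11
result = multiply(0b101010, 0b1011, 4)          # 42 * 11 = 462 = 0b111001110
```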
The Array Multiplier (4 by 4)
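[Figure: 4-by-4 array multiplier; each row combines X3 X2 X1 X0 with one multiplier bit (Y0 to Y3); three adder rows (HA FA FA HA, then FA FA FA HA twice) pass carry and sum signals and produce the product bits Z0 to Z7; HA denotes a half adder]
The carry-out of the last adder in the row for Yi is forwarded to the row for Yi+1.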
The MxN Array Multiplier: Critical Path
[Figure: adder array (HA FA FA HA, then FA FA FA HA twice) with two highlighted paths, Critical Path 1 and Critical Path 2]
Carry-Save Multiplier
A more efficient realization can be obtained by noticing that the multiplication result does not change when the output carry bits are passed diagonally downwards instead of to the right.
[Figure: 4-by-4 carry-save multiplier; rows of HA and FA multiplier cells (HA HA HA HA, then HA FA FA FA twice) pass carry (C) and sum (S) bits downwards for Y0 to Y3; a final row of vector-merging cells produces Z3 to Z7, and Z0 to Z2 leave the array at its edge]
X and Y signals are broadcast through the complete array.
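A word-level sketch of the carry-save idea, under the assumption that each array row acts as a 3:2 compressor on the running sum vector, the running carry vector, and the next partial product. The function name is hypothetical, and the final integer addition stands in for the vector-merging adder.

```python
def carry_save_multiply(x: int, y: int, n_bits: int) -> int:
    """Multiply x by the n_bits-wide multiplier y using carry-save accumulation."""
    sum_vec, carry_vec = 0, 0
    for j in range(n_bits):
        pp = (x << j) if (y >> j) & 1 else 0
        # Each bit position is an independent full adder: no carry ripples
        # sideways within a row; carries move one position left for the next row.
        new_sum = sum_vec ^ carry_vec ^ pp
        new_carry = ((sum_vec & carry_vec) | (sum_vec & pp) | (carry_vec & pp)) << 1
        sum_vec, carry_vec = new_sum, new_carry
    # Vector-merging stage: a single carry-propagate addition at the very end.
    return sum_vec + carry_vec
```

Only the final addition propagates carries across the word, so that is the one place where a fast adder pays off.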
Wallace-Tree Multiplier
[Figure: Wallace-tree reduction over bit positions 6 to 0; panel (a) partial products, panel (b) first stage, panels (c) and (d) further steps, using FA and HA cells]
Wallace-Tree Multiplier
[Figure: arrangements of full adders (FA) reducing inputs y0 to y5 to a carry (C) and a sum (S), with carries labeled Ci-1 and Ci exchanged between neighboring bit slices]
Booth encoding
Multiplying by 01111110 gives 8 partial products, of which two are all zero. Adding these zero rows is a waste of time.
Instead, multiply by $1000\,00\bar{1}0$, where $\bar{1}$ stands for $-1$ (since $01111110 = 2^7 - 2^1$).
Then only two nonzero partial products remain (one is added and one is subtracted), which improves speed.
This kind of transformation is called Booth encoding. In its modified (radix-4) form, it reduces the number of partial products to at most half of the original multiplier width.
The encoding logic is easily incorporated into the overall multiplier design.
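The recoding of 01111110 can be reproduced with a sketch of the simple radix-2 Booth rule, in which digit k equals bit k-1 minus bit k of a two's-complement multiplier. The function names are hypothetical, and this is the basic form only, not the radix-4 variant that halves the number of partial products.

```python
def booth_recode(y: int, n_bits: int) -> list[int]:
    """Radix-2 Booth digits (each -1, 0, or +1), least significant digit first.

    y is read as an n_bits-wide two's-complement number.
    """
    digits = []
    prev = 0                       # implicit bit to the right of bit 0
    for k in range(n_bits):
        bit = (y >> k) & 1
        digits.append(prev - bit)  # +1 where a run of ones ends, -1 where it starts
        prev = bit
    return digits


def booth_multiply(x: int, y: int, n_bits: int) -> int:
    """Multiply x by y, adding or subtracting only the nonzero-digit partial products."""
    return sum(d * (x << k) for k, d in enumerate(booth_recode(y, n_bits)) if d)


digits = booth_recode(0b01111110, 8)   # only positions 1 (-1) and 7 (+1) are nonzero
```

With the multiplier 01111110 only two partial products are formed, one subtracted at weight 2 and one added at weight 128.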
Multipliers: Summary
• Optimization goals differ from those of the binary adder
The Binary Shifter
[Figure: bit-slice i of the binary shifter; control signals Right, nop, Left; inputs Ai and Ai-1; outputs Bi and Bi-1]
The Barrel Shifter
[Figure: barrel shifter array; inputs A0 to A3, outputs B0 to B3, shift control lines including Sh1 and Sh3; annotations "Column: maximum shift" and "Word length"]
[Figure: 4-bit shifter with inputs A3 to A0 and outputs Out3 to Out0]
Good for large shift amounts. Note that cascaded pass transistors slow down the gate and generate weak signals, so buffers may be needed.
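A behavioral sketch of a barrel shifter array, assuming one-hot shift control lines (the name `sh` and a line for shift-by-0 are assumptions) and a right shift that fills vacated positions with zeros; an actual circuit may instead replicate the sign bit.

```python
def barrel_shift_right(a_bits: list[int], sh: list[int]) -> list[int]:
    """Shift a word right by the amount selected on one-hot control lines.

    a_bits[i] is input bit A_i (index 0 is the least significant bit);
    sh[k] == 1 selects a shift by k positions. Returns the output bits.
    """
    assert sum(sh) == 1, "exactly one shift line must be active"
    n = len(a_bits)
    out = []
    for i in range(n):
        bit = 0                                  # zero fill (assumption)
        for k, active in enumerate(sh):
            if active and i + k < n:
                bit = a_bits[i + k]              # output i connects to A_(i+k) through one switch
        out.append(bit)
    return out


shifted = barrel_shift_right([1, 0, 1, 1], [0, 1, 0, 0])   # shift by one position
```

In this arrangement each output reaches its selected input through a single switch, whatever the shift amount.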
Building Blocks for Digital Architectures
Arithmetic unit
- Bit-sliced datapath (adder, multiplier, shifter, comparator)