
Adders

Uploaded by

Meril Cyriac
Copyright
© © All Rights Reserved

Digital Integrated Circuits
A Design Perspective
Jan M. Rabaey
Anantha Chandrakasan
Borivoje Nikolic

Arithmetic Circuits

1
© Digital
EE141 Integrated Circuits2nd Arithmetic Circuits
A Generic Digital Processor

Figure: block diagram of a generic digital processor with MEMORY, INPUT-OUTPUT, CONTROL, and DATAPATH blocks.
Building Blocks for Digital Architectures

Arithmetic unit
- Bit-sliced datapath (adder, multiplier, shifter, comparator, etc.)

Memory
- RAM, ROM, Buffers, Shift registers
Control
- Finite state machine (PLA, random logic)
- Counters
Interconnect
- Switches
- Arbiters
- Bus

Arithmetic building blocks
• The speed and power of arithmetic components often dominate overall system performance.
• For each module, multiple topologies and design styles exist, each with its own advantages.
• A global picture is of crucial importance. Designers should focus their attention on the gates or transistors that have the largest impact on their goal function; non-critical components can be developed routinely.
• There are typically two optimization processes: logic optimization (rearranging Boolean equations so that a faster or smaller circuit is obtained) and circuit optimization (manipulating circuit topology and transistor sizes to optimize speed).
Bit-Sliced Design
Figure: bit-sliced datapath. Control lines cross the bit slices (Bit 0 to Bit 3); each slice contains Register, Adder, Shifter, and Multiplexer stages between Data-In and Data-Out.

Tile identical processing elements


Since the same operation has to be performed on each bit of a data word, the datapath can consist of a number of bit slices (equal to the word length), each operating on a single bit; hence the term bit-sliced.
Adders

Full-Adder
Figure: full-adder block with inputs A, B, and Cin and outputs Sum and Cout.

Generate (G) = $AB$
Propagate (P) = $A \oplus B$
Delete (D) = $\bar{A}\,\bar{B}$

G and D ensure that a carry bit is generated or deleted at Co independently of Ci, while P guarantees that Ci propagates to Co.
The Binary Adder
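Figure: full-adder block with inputs A, B, and Cin and outputs Sum and Cout.

$S = A \oplus B \oplus C_i = A\bar{B}\bar{C_i} + \bar{A}B\bar{C_i} + \bar{A}\bar{B}C_i + ABC_i$

$C_o = AB + BC_i + AC_i$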
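The sum and carry equations can be checked against ordinary integer addition. The following is a minimal sketch; the function name `full_adder` is an illustrative choice, not something taken from the slides.

```python
from itertools import product

def full_adder(a, b, ci):
    """Return (sum, carry_out) for one-bit inputs a, b, ci."""
    s = a ^ b ^ ci                        # S = A xor B xor Ci
    co = (a & b) | (b & ci) | (a & ci)    # Co = AB + BCi + ACi
    return s, co

# Exhaustive check over the eight input combinations:
# the pair (Co, S) must encode the arithmetic sum a + b + ci.
for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    assert a + b + ci == s + 2 * co
```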

The Ripple-Carry Adder
Figure: four full adders (FA) in series. Stage k takes Ak, Bk, and Ci,k and produces Sk and Co,k; Co,0 (= Ci,1) feeds the next stage, and so on up to Co,3.

Worst-case delay is linear in the number of bits:

$t_d = O(N)$

$t_{adder} = (N-1)\,t_{carry} + t_{sum}$

Goal: Make the fastest possible carry path circuit
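A behavioral sketch of the ripple-carry structure may help here. It assumes bit lists stored least-significant bit first; the helper names (`to_bits`, `ripple_carry_add`, `ripple_delay`) are hypothetical and chosen only for this illustration.

```python
def to_bits(value, n):
    """Convert an integer to a list of n bits, least-significant bit first."""
    return [(value >> k) & 1 for k in range(n)]

def ripple_carry_add(a_bits, b_bits, ci=0):
    """Chain one full adder per bit; the carry ripples from bit 0 upwards."""
    sum_bits = []
    carry = ci
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (b & carry) | (a & carry)
    return sum_bits, carry

def ripple_delay(n, t_carry, t_sum):
    """Worst-case delay: (N - 1) carry delays followed by one sum delay."""
    return (n - 1) * t_carry + t_sum
```

Because every stage waits for the carry of the previous one, the loop body corresponds to the carry path that the slide singles out for optimization.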

Complementary Static CMOS Full Adder
Figure: transistor-level schematic of the complementary static CMOS full adder, with inputs A, B, and Ci, internal node X, and outputs S and Co.

28 Transistors
Complementary Static CMOS Full Adder
• Large PMOS stacks are present in both the carry and sum generation circuits.
• The intrinsic load capacitance of the Co signal is large and consists of eight capacitance components.
• Carry and sum each incur one additional inverter delay (worse when the load capacitance is large).
• Note that the critical signal Ci is placed closer to the output node.
Express Sum and Carry as a function of
P, G, D
Define three new variables that depend ONLY on A and B:
Generate (G) = $AB$
Propagate (P) = $A \oplus B$
Delete (D) = $\bar{A}\,\bar{B}$

Expressions for S and Co can also be derived from D and P.

Note that an alternate definition, Propagate (P) = $A + B$, is sometimes used.
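The expressions themselves do not survive in this text. The sketch below checks the standard identities $C_o = G + P C_i$ and $S = P \oplus C_i$, plus one possible form in terms of D and P, against the full-adder equations. The function name `check_pgd` and the particular D/P form are assumptions made for illustration.

```python
from itertools import product

def check_pgd():
    """Verify carry and sum expressions in G, P, D for all inputs."""
    for a, b, ci in product((0, 1), repeat=3):
        g = a & b                 # generate
        p = a ^ b                 # propagate (XOR definition)
        d = (1 - a) & (1 - b)     # delete
        p_alt = a | b             # alternate propagate (OR definition)
        s_ref = a ^ b ^ ci
        co_ref = (a & b) | (b & ci) | (a & ci)

        assert g + p + d == 1                         # exactly one is active
        assert co_ref == g | (p & ci)                 # Co = G + P Ci
        assert s_ref == p ^ ci                        # S = P xor Ci
        assert co_ref == g | (p_alt & ci)             # carry also works with P = A + B
        assert co_ref == (1 - d) & ((1 - p) | ci)     # Co from D and P only
    return True
```

The sum identity needs the XOR definition of P; only the carry expression tolerates the alternate definition P = A + B.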
Transmission Gate XOR
$F = A \oplus B = A\bar{B} + \bar{A}B$; a complementary implementation takes 12 transistors.

Figure: transmission-gate XOR with inputs A and B (and the complement of B), transistors M1/M2 and M3/M4, and output F.

When B = 1, M1/M2 act as an inverter and M3/M4 are off, so $F = \bar{A}B$.
When B = 0, M1/M2 are off and M3/M4 act as a transmission gate, so $F = A\bar{B}$.
Transmission Gate Full Adder
Figure: transmission-gate full adder in three parts labeled Setup (produces P from A and B), Sum Generation (output S, from P and Ci), and Carry Generation (output Co, from P, A, and Ci).

Propagate (P) = $A \oplus B$
Manchester Carry Chain
Figure: two Manchester carry gates between Ci and Co, one using Pi, Gi, and Di, and a clocked one using Pi, Gi, and φ.

Generate (G) = $AB$
Propagate (P) = $A \oplus B$
Delete (D) = $\bar{A}\,\bar{B}$

Prevent floating Co
Manchester Carry Chain
Figure: four-bit Manchester carry chain with carry input Ci,0, propagate signals P0 to P3, generate signals G0 to G3, clock φ, and carry nodes C0 to C3.
Manchester Carry Chain
Stick Diagram
Figure: stick diagram of two adjacent bit cells. The Propagate/Generate row lies between VDD and GND and carries Pi, Gi, Pi+1, Gi+1, and φ, with carry nodes Ci-1, Ci, and Ci+1; an Inverter/Sum row accompanies it.
Manchester Carry Chain
• The delay of the Manchester carry chain can be modeled as a linearized RC network, as for a chain of transmission gates.
• The propagation delay is therefore quadratic in the number of bits N (which does not imply that the delay is larger than that of the ripple-carry adder).
• It might be necessary to insert signal-buffering inverters.
• It is still a ripple-carry adder and is typically only good for small word lengths (<8/16 bits).
• Faster adders are needed for computer and multimedia applications with word lengths of 32 to 128 bits.
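The quadratic growth can be seen with an Elmore-style time constant for an RC ladder, where each node capacitance is charged through all the resistance upstream of it. This is a rough sketch under the assumption of identical R and C per stage; the names `elmore_delay` and `uniform_chain_delay` are illustrative only.

```python
def elmore_delay(resistances, capacitances):
    """Elmore time constant at the far end of an RC ladder.

    Sums C_i * (R_1 + ... + R_i) over all nodes i.
    """
    delay = 0.0
    r_upstream = 0.0
    for r, c in zip(resistances, capacitances):
        r_upstream += r            # resistance between the source and node i
        delay += c * r_upstream
    return delay

def uniform_chain_delay(n, r=1.0, c=1.0):
    """Identical stages: the result equals r * c * n * (n + 1) / 2."""
    return elmore_delay([r] * n, [c] * n)
```

Doubling the chain from 4 to 8 stages raises the time constant from 10 RC to 36 RC, which is why buffering inverters become attractive for longer chains.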
Carry-Bypass Adder
Also called Carry-Skip.

Figure (top): four full adders (FA) in a ripple chain with inputs P0, G0 to P3, G3 and carries Ci,0, Co,0, Co,1, Co,2, Co,3.
Figure (bottom): the same chain with a bypass signal BP = P0 P1 P2 P3 that controls a Multiplexer producing Co,3.

Idea: if (P0 and P1 and P2 and P3) = 1, then Co,3 = Ci,0; otherwise the carry is either deleted (killed) or generated inside the block.

This breaks the bit-slice organization.
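The bypass idea can be modeled behaviorally for one block. In this sketch (function names are hypothetical), the non-bypass branch deliberately ignores the incoming carry: when at least one Pi is 0, the block output is decided by a generate or delete inside the block.

```python
def block_carry_ripple(a_bits, b_bits, ci):
    """Carry-out of a block computed by rippling through every bit."""
    carry = ci
    for a, b in zip(a_bits, b_bits):
        carry = (a & b) | ((a ^ b) & carry)   # Co = G + P * Ci
    return carry

def block_carry_bypass(a_bits, b_bits, ci):
    """Carry-out of a block using the bypass multiplexer."""
    bp = 1
    for a, b in zip(a_bits, b_bits):
        bp &= a ^ b                           # BP = P0 P1 ... P(M-1)
    if bp:
        return ci                             # all bits propagate: skip the block
    # Some Pi = 0: the carry-out does not depend on ci, so 0 is passed in.
    return block_carry_ripple(a_bits, b_bits, 0)
```

An exhaustive comparison of the two functions over all 4-bit operands confirms that the bypass never changes the logical result; it only shortens the path taken by an incoming carry.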


Carry-Bypass Adder (cont.)
Figure: 16-bit carry-bypass adder built from four M-bit blocks (Bit 0–3, Bit 4–7, Bit 8–11, Bit 12–15), each with Setup, Carry propagation, and Sum stages. The annotated delays are tsetup, tbypass, and tsum.

$t_{adder} = t_{setup} + M\,t_{carry} + (N/M - 1)\,t_{bypass} + (M-1)\,t_{carry} + t_{sum}$ (worst case)

$t_{setup}$: overhead time to create the G, P, and D signals
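To get a feel for the worst-case formula, it can be evaluated next to the ripple-carry expression $(N-1)\,t_{carry} + t_{sum}$. The unit delays used in the usage note are assumptions for illustration, not measured values, and the function names are hypothetical.

```python
def bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """Worst-case carry-bypass delay for N bits split into M-bit blocks."""
    assert n % m == 0, "this sketch assumes N is a multiple of M"
    blocks = n // m
    return (t_setup
            + m * t_carry               # ripple through the first block
            + (blocks - 1) * t_bypass   # skip the remaining blocks
            + (m - 1) * t_carry         # ripple inside the last block
            + t_sum)

def ripple_delay(n, t_carry, t_sum):
    """Ripple-carry delay for comparison."""
    return (n - 1) * t_carry + t_sum
```

With every delay set to one unit, `bypass_delay(16, 4, 1, 1, 1, 1)` gives 12 against 16 for `ripple_delay(16, 1, 1)`; both expressions still grow linearly with N.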


Carry Ripple versus Carry Bypass
(both still linear)

Figure: propagation delay tp versus number of bits N for the ripple adder and the bypass adder; the N axis carries a mark at 4..8.
Carry-Select Adder
Figure: one block of a carry-select adder. The Setup stage produces P and G, which feed a "0" Carry Propagation path and a "1" Carry Propagation path. A Multiplexer controlled by Co,k-1 selects the Carry Vector and produces Co,k+3; the Sum Generation stage follows.
Carry Select Adder: Critical Path
Figure: 16-bit carry-select adder with four blocks (Bit 0–3, Bit 4–7, Bit 8–11, Bit 12–15). Each block has Setup, 0-Carry, 1-Carry, Multiplexer, and Sum Generation stages; the multiplexers are chained through Ci,0, Co,3, Co,7, Co,11, and Co,15, and the blocks produce S0–3, S4–7, S8–11, and S12–15.
Linear Carry Select
Figure: 16-bit linear carry-select adder with four equal blocks (Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15), each with Setup, "0" Carry, "1" Carry, Multiplexer, and Sum Generation stages. Annotated arrival times in unit delays: setup (1), carry chains (5), multiplexer outputs (6), (7), (8), and (9), sum (10).

$t_{adder} = t_{setup} + M\,t_{carry} + (N/M)\,t_{mux} + t_{sum}$
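A behavioral sketch of carry select: every block computes its result for both possible incoming carries, and the real carry only picks one. The helper names and the LSB-first bit-list convention are assumptions made for this illustration.

```python
def to_bits(value, n):
    """Integer to n bits, least-significant bit first."""
    return [(value >> k) & 1 for k in range(n)]

def from_bits(bits):
    """LSB-first bit list back to an integer."""
    return sum(bit << k for k, bit in enumerate(bits))

def add_block(a_bits, b_bits, ci):
    """Ripple addition inside one block; returns (sum_bits, carry_out)."""
    sums, carry = [], ci
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry

def carry_select_add(a_bits, b_bits, ci=0, m=4):
    """Add using M-bit blocks with precomputed "0" and "1" carry paths."""
    sum_bits, carry = [], ci
    for k in range(0, len(a_bits), m):
        a_blk, b_blk = a_bits[k:k + m], b_bits[k:k + m]
        s0, c0 = add_block(a_blk, b_blk, 0)   # "0" carry path
        s1, c1 = add_block(a_blk, b_blk, 1)   # "1" carry path
        sum_bits += s1 if carry else s0       # multiplexer on the sums
        carry = c1 if carry else c0           # multiplexer on the carry
    return sum_bits, carry

def linear_select_delay(n, m, t_setup, t_carry, t_mux, t_sum):
    """Delay formula from the slide, for N bits in equal M-bit blocks."""
    return t_setup + m * t_carry + (n // m) * t_mux + t_sum
```

With unit delays, `linear_select_delay(16, 4, 1, 1, 1, 1)` evaluates to 10, matching the final annotation in the figure.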


Square Root Carry Select
Figure: 20-bit square-root carry-select adder with blocks of increasing size (Bit 0-1, Bit 2-4, Bit 5-8, Bit 9-13, Bit 14-19), each with Setup, "0" Carry, "1" Carry, Multiplexer, and Sum Generation stages. Annotated arrival times in unit delays run from (1) at setup, through (3) to (7) on the carry chains and (4) to (8) on the multiplexers, to (9) at the sum.
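The benefit of growing block sizes can be sketched with a small arrival-time model: each block's carry chains finish at setup plus one carry delay per bit, and each multiplexer waits for both its own chains and the previous multiplexer. Unit delays and the function names are assumptions for illustration.

```python
def sqrt_select_blocks(n, first=2):
    """Split n bits into blocks of size first, first + 1, ... (last one truncated)."""
    sizes, size, remaining = [], first, n
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
        size += 1
    return sizes

def select_delay(sizes, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Arrival time of the last sum bit for a carry-select adder."""
    carry_ready = 0
    for size in sizes:
        chain_ready = t_setup + size * t_carry            # both carry paths of this block
        carry_ready = max(chain_ready, carry_ready) + t_mux
    return carry_ready + t_sum
```

For 20 bits, `sqrt_select_blocks(20)` returns `[2, 3, 4, 5, 6]`, the partition shown in the figure, and `select_delay` gives 9 unit delays, while five equal 4-bit blocks give 11 under the same assumptions.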

Adder Delays - Comparison
Figure: propagation delay tp (in unit delays, axis 0 to 50) versus N (axis 0 to 60) for the ripple adder, bypass, linear select, and square root select adders.
LookAhead - Basic Idea
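Figure: N bit stages with inputs (A0, B0), (A1, B1), ..., (AN-1, BN-1), carries Ci,0, Ci,1, ..., Ci,N-1, propagate signals P0, P1, ..., PN-1, and sums S0, S1, ..., SN-1.

$C_{o,k} = f(A_k, B_k, C_{o,k-1}) = G_k + P_k C_{o,k-1}$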

Look-Ahead: Topology
Expanding Lookahead equations:

$C_{o,k} = G_k + P_k\,(G_{k-1} + P_{k-1} C_{o,k-2})$

All the way:

$C_{o,k} = G_k + P_k\,(G_{k-1} + P_{k-1}\,(\cdots + P_1\,(G_0 + P_0 C_{i,0})))$

Figure: transistor-level lookahead gate with inputs G0 to G3, P0 to P3, and Ci,0, producing Co,3.
Look-Ahead Adder: Logarithmic adder
Figure: two ways of combining inputs A0 to A7 into an output F: a linear chain with $t_p \sim N$, and a binary tree with $t_p \sim \log_2(N)$.
Carry Look-Ahead Trees
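$C_0 = G_0 + P_0 C_{in}$
$C_1 = G_1 + P_1 C_0$
$C_2 = G_2 + P_2 C_1$
$C_3 = G_3 + P_3 C_2$

Expanding:

$C_0 = G_0 + P_0 C_{in}$
$C_1 = G_1 + P_1 C_0 = G_1 + G_0 P_1 + P_1 P_0 C_{in} = G_{1:0} + P_{1:0} C_{in}$
($G_{1:0} = G_1 + P_1 G_0$, $P_{1:0} = P_1 P_0$)
$C_2 = G_2 + P_2 C_1 = G_2 + G_1 P_2 + G_0 P_2 P_1 + P_2 P_1 P_0 C_{in} = G_{2:1} + P_{2:1} C_0$
($G_{2:1} = G_2 + P_2 G_1$, $P_{2:1} = P_2 P_1$)
$C_3 = G_3 + P_3 C_2 = G_3 + G_2 P_3 + G_1 P_3 P_2 + G_0 P_3 P_2 P_1 + P_3 P_2 P_1 P_0 C_{in}$
$= G_{3:2} + P_{3:2} C_1 = G_{3:2} + P_{3:2}(G_{1:0} + P_{1:0} C_{in}) = (G_{3:2} + P_{3:2} G_{1:0}) + P_{3:2} P_{1:0} C_{in}$

The tree can be built up hierarchically in this way.
$G_{3:2} = G_3 + P_3 G_2$ and $P_{3:2} = P_3 P_2$ are called dot products.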
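The grouping of (G, P) pairs can be written as a single operator and applied in a tree. The sketch below uses a Kogge-Stone style schedule, in which the combining distance doubles at every level; the function names are hypothetical and the lists are LSB first.

```python
def dot(hi, lo):
    """Combine two (G, P) pairs: (G, P) . (G', P') = (G + P G', P P')."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo

def kogge_stone_carries(g, p, cin=0):
    """Compute all carries C_k = G_{k:0} + P_{k:0} Cin in log2(N) levels."""
    n = len(g)
    gp = list(zip(g, p))
    dist = 1
    while dist < n:
        # Every position k >= dist absorbs the group ending dist bits below it.
        # The comprehension reads only the previous level's values.
        gp = [dot(gp[k], gp[k - dist]) if k >= dist else gp[k] for k in range(n)]
        dist *= 2
    # gp[k] now holds (G_{k:0}, P_{k:0}).
    return [gk | (pk & cin) for gk, pk in gp]
```

For a 4-bit input the loop runs twice (distances 1 and 2), reproducing the $G_{3:2}$ and $G_{1:0}$ grouping of the last equation.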
Tree Adders

Figure: 16-bit radix-2 Kogge-Stone tree with inputs (A0, B0) to (A15, B15) and outputs S0 to S15. Radix 2 means that the tree is binary: it combines two dot products or carry words at a time at each level of the hierarchy.
Tree Adders

Figure: 16-bit radix-4 Kogge-Stone tree with inputs (a0, b0) to (a15, b15) and outputs S0 to S15.
Sparse Trees

Figure: 16-bit radix-2 sparse tree with sparseness of 2, with inputs (a0, b0) to (a15, b15) and outputs S0 to S15.
Tree Adders

Figure: Brent-Kung tree with inputs (A0, B0) to (A15, B15) and outputs S0 to S15.
Intel Itanium Microprocessor

Figure: Itanium integer execution unit (dimension label 1000 um) with operands a and b, 9-1, 5-1, and 2-1 multiplexers, CARRYGEN (g64), SUMGEN (s0, s1), LU, SUMSEL (sum, sumb), and an output register (REG, ck1) feeding the cache.
LU: Logical Unit

Itanium has 6 integer execution units like this
Bit-Sliced Datapath
Figure: bit-sliced datapath with bit slices 0 to 63. The stages are: inputs from register files / cache / bypass, Multiplexers, Shifter, Adder stage 1, Wiring, Adder stage 2, Wiring, Adder stage 3, Sum Select, and outputs to register files / cache. Loopback buses run alongside the slices.

The adder is implemented as a radix-4 carry look-ahead adder; the red lines forward the results of different stages.
Itanium Integer Datapath

Courtesy of Intel
Multipliers

The Binary Multiplication
$$Z = X \cdot Y = \sum_{k=0}^{M+N-1} Z_k 2^k = \left(\sum_{i=0}^{M-1} X_i 2^i\right)\left(\sum_{j=0}^{N-1} Y_j 2^j\right) = \sum_{i=0}^{M-1}\left(\sum_{j=0}^{N-1} X_i Y_j 2^{i+j}\right)$$

with

$$X = \sum_{i=0}^{M-1} X_i 2^i, \qquad Y = \sum_{j=0}^{N-1} Y_j 2^j$$
The Binary Multiplication
          1 0 1 0 1 0    Multiplicand
x             1 0 1 1    Multiplier
          1 0 1 0 1 0
        1 0 1 0 1 0
      0 0 0 0 0 0        Partial products
+   1 0 1 0 1 0
    1 1 1 0 0 1 1 1 0    Result
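The worked example can be reproduced with a short shift-and-add sketch: each multiplier bit selects either the multiplicand or zero, shifted by the bit position. The function names are illustrative choices.

```python
def partial_products(x, y, n):
    """Return the n partial products X * Y_j * 2**j for j = 0 .. n-1."""
    return [(x if (y >> j) & 1 else 0) << j for j in range(n)]

def multiply(x, y, n):
    """Sum the partial products of x and an n-bit multiplier y."""
    return sum(partial_products(x, y, n))

# The example from the slide: 101010 x 1011.
rows = partial_products(0b101010, 0b1011, 4)
assert rows == [0b101010, 0b1010100, 0, 0b101010000]
assert multiply(0b101010, 0b1011, 4) == 0b111001110
```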

The Array Multiplier (4 by 4)
Figure: 4 by 4 array multiplier. Each row combines X3 X2 X1 X0 with one multiplier bit (Y0 to Y3); three rows of adders (HA FA FA HA, then FA FA FA HA twice) accumulate the partial products and produce Z0 to Z7. The legend marks half adders and the carry and sum connections.

The carry-out of the last adder in the row for Yi is forwarded to the row for Yi+1.
The MxN Array Multiplier: Critical Path
Figure: adder array (rows HA FA FA HA, FA FA FA HA, FA FA FA HA) with Critical Path 1 and Critical Path 2 marked, including a segment shared by both.
Carry-Save Multiplier
• A more efficient realization can be obtained by noticing that the multiplication result does not change when the output carry bits are passed diagonally downwards instead of to the right.
• This requires an extra adder (the vector-merging adder), which can be a fast carry look-ahead adder since its inputs arrive at the same time.
• The critical path is uniquely defined.

Figure: 4 by 4 carry-save multiplier with adder rows HA HA HA HA, HA FA FA FA, HA FA FA FA, and HA FA FA HA, and a Vector Merging Adder at the bottom.
Multiplier Floorplan
Figure: multiplier floorplan with inputs X3 to X0 and Y0 to Y3. Rows of cells exchange carry (C) and sum (S) signals and produce Z0 to Z7. Legend: HA Multiplier Cell, FA Multiplier Cell, Vector Merging Cell.

X and Y signals are broadcast through the complete array.
Wallace-Tree Multiplier
Figure: Wallace-tree reduction shown over bit positions 6 to 0 in four panels: (a) Partial products, (b) First stage, (c) Second stage, (d) Final adder. FA and HA groupings are marked.

Saves full adders
Increases the complexity of routing
Wallace-Tree Multiplier

Figure: Wallace-tree multiplier array (HA cells labeled).

A carry look-ahead adder can be used for the last stage.
Wallace-Tree Multiplier
Figure: full-adder (FA) arrangements that reduce inputs y0 to y5, with carries Ci-1 and Ci between bit positions, to a carry (C) and a sum (S) output.
Booth encoding
• Multiplying by 01111110 gives 8 partial products, but two are all zero. Adding these zeros is a waste of time.
• Instead, multiply by $100000\bar{1}0$, where $\bar{1}$ stands for -1 (the value is $2^7 - 2^1$). Only two partial products then remain, one added and one subtracted, which improves speed.
• This kind of transformation is called Booth encoding. It reduces the number of partial products to at most half of the original multiplier width.
• The encoding logic is easily incorporated in the overall multiplier design.
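The recoding of 01111110 can be reproduced with the simple radix-2 Booth rule, where digit j equals the bit below position j minus the bit at position j. Note the assumption: this sketch is the radix-2 form, not the radix-4 variant that bounds the partial products at half the multiplier width, and the function names are hypothetical.

```python
def booth_recode(y, n):
    """Radix-2 Booth digits of an n-bit two's-complement multiplier.

    Returns digits in {-1, 0, 1}, least-significant first, with
    d_j = y_(j-1) - y_j and y_(-1) = 0.
    """
    digits, prev = [], 0
    for j in range(n):
        bit = (y >> j) & 1
        digits.append(prev - bit)
        prev = bit
    return digits

def booth_multiply(x, y, n):
    """Multiply x by the n-bit multiplier y, skipping zero digits."""
    total = 0
    for j, d in enumerate(booth_recode(y, n)):
        if d:                      # only nonzero digits need an add or subtract
            total += d * (x << j)
    return total
```

For `booth_recode(0b01111110, 8)` the only nonzero digits are -1 at position 1 and +1 at position 7, so the product needs one subtraction and one addition.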

Multipliers —Summary
• Optimization Goals Different Vs Binary Adder

• Once Again: Identify Critical Path

• Other possible techniques


- Logarithmic versus Linear (Wallace Tree Mult)
- Data encoding (Booth)
- Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION
This is also why algorithmic invention is significant for VLSI design.
Shifters

The Binary Shifter
Figure: one-bit programmable shifter. Control signals Right, nop, and Left determine how inputs Ai and Ai-1 connect to outputs Bi and Bi-1 in bit-slice i.
The Barrel Shifter
Figure: barrel shifter array with inputs A0 to A3, outputs B0 to B3, and shift controls Sh0 to Sh3. One dimension of the array is set by the word length and the other (the columns) by the maximum shift; the legend distinguishes data wires from control wires.

Area Dominated by Wiring
4x4 barrel shifter
Figure: 4x4 barrel shifter with inputs A3 to A0, shift controls Sh0 to Sh3, and output buffers.

• A coder/decoder is required to set the shift bits.
• The signal passes through one gate regardless of the shift amount (parasitic capacitance may change the picture).
Logarithmic Shifter
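Figure: logarithmic shifter with inputs A3 to A0 and outputs B3 to B0, built from three stages controlled by Sh1, Sh2, and Sh4 and their complements.

No separate coder/decoder is required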
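The reason no decoder is needed is that each stage is driven directly by one bit of the binary shift amount. A minimal behavioral sketch, assuming a right shift and an 8-bit word (both are choices made for this illustration, as is the function name):

```python
def log_shift_right(word, shift, width=8):
    """Shift right by 0..7 using three stages that shift by 1, 2, and 4."""
    mask = (1 << width) - 1
    value = word & mask
    for stage, amount in enumerate((1, 2, 4)):
        # Bit `stage` of the shift amount enables the stage directly
        # (the Sh1, Sh2, Sh4 controls in the figure).
        if (shift >> stage) & 1:
            value >>= amount
    return value
```

A shift by 5, for example, enables the shift-by-1 and shift-by-4 stages and passes the word straight through the shift-by-2 stage.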


0-7 bit Logarithmic Shifter

Figure: layout of a 0-7 bit logarithmic shifter with inputs A3 to A0 and outputs Out3 to Out0.

Good for large shift amounts. Note that cascaded pass transistors slow down the gate and generate weak signals, so buffers may be needed.
Building Blocks for Digital Architectures

Arithmetic unit
- Bit-sliced datapath (adder, multiplier, shifter, comparator)

(comparator, divider, sin, cos etc)
