Unified Architecture For Double/Two-Parallel Single Precision Floating Point Adder
Abstract—Floating point (F.P.) addition is a core operation for a wide range of applications. This brief presents an area-efficient, dynamically configurable, multiprecision architecture for F.P. addition. We propose an architecture of a double precision (DP) adder, which also supports a dual (two parallel) single precision (SP) computational feature. Key components involved in the F.P. adder architecture, such as comparator, swap, dynamic shifters, leading-one detector (LOD), mantissa adders/subtractors, and rounding circuit, have been redesigned to efficiently enable resource sharing for both precision operands with minimal multiplexing circuitry. The proposed design supports both normal and sub-normal numbers. The proposed architecture has been synthesized for OSUcells Cell 0.18 μm technology ASIC implementation. Compared to a standalone DP adder with two SP adders, the proposed unified architecture can reduce the hardware resources by ≈ 35%, with a minor delay overhead. Compared to previous works, the proposed dual mode architecture has 40% smaller area × delay, and has smaller area and delay overhead over a DP-only adder.

Index Terms—ASIC, digital arithmetic, floating point (F.P.) addition, multiprecision arithmetic.

I. INTRODUCTION

for double precision with dual single precision support, targeted for normalized operands.

In this brief, we have developed an architecture for addition arithmetic with double precision F.P. numbers which can also support on-the-fly dual (two parallel) single precision F.P. number computations, named the DPdSP adder architecture. We have designed and/or configured the key elements of the F.P. adder so that they are shared among operands of different precision, to support multiprecision computation. The proposed architecture fully supports normal as well as sub-normal computations, with the round-to-nearest rounding method. Other rounding methods can be easily included. We have compared our results with the best optimized implementations available in the literature. The main contributions of this work can be summarized as follows:
• An architecture is proposed for a DPdSP adder, which can perform on-the-fly either a double precision or a dual (two parallel) single precision addition/subtraction.
• Components have been optimized/configured with a tuned data path, to minimize the multiplexing circuitry and so reduce the area and delay metrics. The approach can be easily extended to any dual precision F.P. adder implementation.
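To make the two operating modes concrete, the following minimal sketch shows how a single 64-bit operand word could be interpreted either as one DP number or as two SP numbers. The field widths follow the IEEE 754 binary64 and binary32 formats. The function names and the placement of the first SP operand in the upper 32 bits are assumptions made for illustration, not details taken from the proposed design.

```python
import struct

def pack_dp(x: float) -> int:
    """Return the 64-bit IEEE 754 binary64 encoding of x."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def pack_two_sp(hi: float, lo: float) -> int:
    """Pack two binary32 values into one 64-bit word (hi in the upper half; assumed layout)."""
    return struct.unpack(">Q", struct.pack(">ff", hi, lo))[0]

def unpack_fields(word64: int, dual_sp: bool):
    """Split a 64-bit operand into (sign, exponent, fraction) tuples.

    DP mode yields one tuple with 1/11/52-bit fields.
    Dual SP mode yields two tuples with 1/8/23-bit fields, upper half first.
    """
    if not dual_sp:
        return [((word64 >> 63) & 1,
                 (word64 >> 52) & 0x7FF,
                 word64 & ((1 << 52) - 1))]
    halves = [(word64 >> 32) & 0xFFFFFFFF, word64 & 0xFFFFFFFF]
    return [((h >> 31) & 1, (h >> 23) & 0xFF, h & ((1 << 23) - 1))
            for h in halves]
```

For example, `unpack_fields(pack_dp(1.0), False)` gives one field tuple with biased exponent 1023, while `unpack_fields(pack_two_sp(1.5, -2.0), True)` gives two tuples with biased exponents 127 and 128.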
1549-7747 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
522 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS
OP ← S1 ⊕ S2
if OP == 0 then  (effective signs agree: add magnitudes)
    Add_M ← Large_M + Small_M
else  (effective signs differ: subtract magnitudes)
    Add_M ← Large_M − Small_M
end if
Leading-One-Detection & Dynamic Left SHIFT:
    Left_Shift ← LOD(Add_M)
    Left_Shift ← Adjustment for SUB-NORMAL or Underflow or No-Shift (True Add_M MSBs)
    Add_M ← Add_M << Left_Shift
Normalization & Rounding:
    Mantissa Normalization & Compute Rounding ULP based on Guard, Round & Sticky Bit
    Add_M ← Add_M + ULP
    Large_E ← Large_E + Add_M[MSB] − Left_Shift
Finalizing Output:
    Update Exponent & Mantissa for Exceptional Cases
    Determine Final Output
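The mantissa computation, leading-one detection, left shift and rounding steps above can be sketched in software as follows. This is a simplified model under stated assumptions, not the proposed hardware: it handles normal operands only (no sub-normal, underflow or exceptional-case adjustment), assumes the mantissas are already swapped and aligned, and uses hypothetical names (`add_aligned`, `rounding_ulp`, `leading_zeros`). Each mantissa carries `p` significand bits (hidden bit included) followed by `grs` extra low-order bits.

```python
def leading_zeros(value: int, width: int) -> int:
    """Leading-one detection: count of zero bits above the leading one."""
    return width - value.bit_length()

def rounding_ulp(mant: int, drop: int) -> int:
    """Round-to-nearest-even increment when the low `drop` (>= 2) bits are removed."""
    lsb = (mant >> drop) & 1                 # bit at the rounding position
    guard = (mant >> (drop - 1)) & 1         # first bit to its right
    round_bit = (mant >> (drop - 2)) & 1     # next bit to the right
    sticky = int(mant & ((1 << (drop - 2)) - 1) != 0)  # OR of the rest
    return guard & (round_bit | sticky | lsb)

def add_aligned(s1, s2, large_e, large_m, small_m, p=24, grs=3):
    """Add or subtract aligned mantissas, then normalize and round.

    Returns (biased exponent, p-bit significand). Assumes large_m >= small_m.
    """
    op = s1 ^ s2                             # 1 when the signs differ
    add_m = large_m + small_m if op == 0 else large_m - small_m
    if add_m == 0:
        return 0, 0                          # exact zero result
    width = p + grs + 1                      # one extra MSB for the carry-out
    left_shift = leading_zeros(add_m, width)
    add_m <<= left_shift                     # leading one now at bit width-1
    drop = grs + 1
    rounded = (add_m >> drop) + rounding_ulp(add_m, drop)
    exp = large_e + 1 - left_shift           # +1 accounts for the carry position
    if rounded >> p:                         # rounding overflowed the significand
        rounded >>= 1
        exp += 1
    return exp, rounded
```

With SP-sized fields, `add_aligned(0, 0, 127, 0xC00000 << 3, 0x800000 << 3)` models 1.5 + 1.0 and returns `(128, 0xA00000)`, the encoding of 2.5; flipping the second sign models 1.5 − 1.0 and returns `(126, 0x800000)`, the encoding of 0.5.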
Fig. 3. DPdSP adder: Dynamic left/right shifter, LOD. (a) Dual mode dynamic right shifter; (b) dual mode leading-one-detector; and (c) dual mode dynamic left
shifter.
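As a behavioral illustration of the dual mode units in Fig. 3(b) and (c), the sketch below models a leading-one detector and a left shifter that operate either on one 64-bit word or on two independent 32-bit halves. The way the two 32-bit counts are combined in DP mode is a common construction and is an assumption here; the internal structure of the circuits in the figure is not reproduced, and all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def lzc32(x: int) -> int:
    """Leading-zero count of a 32-bit value (32 when x is zero)."""
    return 32 - x.bit_length()

def dual_mode_lod(word64: int, dual_sp: bool):
    """Return one shift amount in DP mode, or (upper, lower) amounts in dual SP mode."""
    hi, lo = (word64 >> 32) & MASK32, word64 & MASK32
    if dual_sp:
        return lzc32(hi), lzc32(lo)
    # DP mode: reuse the two 32-bit counts to form the 64-bit count
    return (lzc32(hi) if hi else 32 + lzc32(lo),)

def dual_mode_left_shift(word64: int, shifts, dual_sp: bool) -> int:
    """Shift the whole word (DP) or each 32-bit half independently (dual SP)."""
    if dual_sp:
        hi = (((word64 >> 32) & MASK32) << shifts[0]) & MASK32
        lo = ((word64 & MASK32) << shifts[1]) & MASK32
        return (hi << 32) | lo
    return (word64 << shifts[0]) & MASK64
```

In dual SP mode, bits shifted out of the lower half are discarded rather than carried into the upper half, which is the behavior that keeps the two single precision results independent.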
H. Dynamic Left Shifter

The architecture of the dual mode dynamic left shifter is shown in Fig. 3(c). The inputs to this unit are the mantissa sum add_m and the updated left shift amount from the previous stage. The basic idea is similar to that of the dual mode dynamic right shifter. Each stage contains two left shifters, one for each of the 32-bit inputs of that stage. In comparison to the right shifter, an additional multiplexer is used to select either the higher left shift output or its combination with the primary input of the stage. Furthermore, this unit is also parameterized and can be easily extended to any amount of dual mode dynamic left shifting.

I. Exponent Update

In this unit the exponents are updated for mantissa overflow and mantissa underflow. The exponents need to be incremented by one or decremented by the left shift amount. This update is shared for DP and one SP by sharing a subtractor. It needs an extra 5-bit adder for SP processing as an overhead over DP processing.

J. Rounding and Final Processing

The primary operand for this section is the left shifted mantissa from the preceding dual mode dynamic left shifter. The add_m_shifted signal contains either the DP mantissa or both SP mantissas, one in each of its 32-bit parts. Based on the MSBs of add_m_shifted, the rounding position needs to be determined. The bit immediately to the right of the rounding position is the Guard bit, the next bit to the right is the Round bit, and the remaining bits to the right generate the Sticky bit. Based on the rounding position bit, Guard bit, Round bit, Sticky bit and MSB, the rounding ULP (unit in the last place) is computed. This needs to be performed separately for DP and for both SP, and requires a few gates for each; approximately three times the logic of the DP-only computation is needed. After the ULP is generated, it is added to add_m_shifted using two 32-bit adders, which work individually for SP computation and collectively produce the output for DP (similar to the case of mantissa addition). Further, this rounded mantissa sum is normalized. The rounding adder in effect is similar to that required for only DP processing.

Further to rounding of the mantissa, the exponents are updated for mantissa overflow. For this, each exponent may need to be incremented by one. One adder is shared for DP and SP-1, because of shared operands in e_L. Further to this, each exponent is updated for the infinity, sub-normal or underflow cases, and each requires separate units. The computed signs, exponents and mantissas of the double precision and both single precision results are finally multiplexed to produce the final 64-bit output, which contains either a DP output or two SP outputs.

Thus, the complete architecture needs only one multiplexer for multiplexing the operands, and this belongs to the SWAP section. All other processing data path components have been tuned to follow those operands to support dual mode operations without any further multiplexing circuitry, except the last multiplexer, which produces the final 64-bit output. A brief summary of shared resources and extra resources over only DP adder is shown in Table I.

III. IMPLEMENTATION RESULTS AND COMPARISONS

The proposed architecture is synthesized using the open-source “OSUcells Cell [10]” 0.18 μm technology, using Synopsys Design Compiler. We have also synthesized DP-only and SP-only adders using a similar data path computational flow, for comparison purposes. The implementation details are shown in Table II. Each module has been synthesized for the best possible delay. The proposed DPdSP architecture needs roughly 15% more hardware and 7% extra delay than the DP-only adder; however, it saves 37% of the area when compared to the combination of one DP and two SP modules, computed as (Area(DP + 2∗SP) − Area(DPdSP))/Area(DP + 2∗SP).

A comparison with previous works is shown in Table III. Other previously reported DPdSP adder designs [6]–[8] support only normal operands and lack exceptional case handling. Though the inclusion of sub-normal support and exceptional case handling is not difficult, it affects the overall area and critical path delay significantly [11], [12]. Because
JAISWAL et al.: UNIFIED ARCHITECTURE FOR SINGLE PRECISION FLOATING POINT ADDER 525
of different technology implementations, the comparison is based on the percentage area and period/delay overhead over the corresponding DP-only adder in the same technology. Ozbilen et al. [8] give few implementation details, and (approximately) have more than 25% area and 15% delay overhead over their corresponding DP adder. Akkas [6] proposed a DPdSP adder in 0.25 μm technology, which needs 24% more hardware than their DP design for a 3 clock cycle latency. Further, [7] extended the single-path design of [6] and proposed a two-path DPdSP adder design. It needs roughly 26% and 33% extra hardware than their only DP adder. Due to the two-path method, the delay (and its overhead) is reduced relative to the earlier design [6], however, with much larger hardware (and overhead) requirements. The designs in [6], [7] use a large number of multiplexers (to support dual mode) at various levels of the architecture, and have a less tuned data path for dual mode operation. Further, the extra use of resources (such as more adders/subtractors for exponent and mantissa, relatively larger dual shifters, and extra mantissa normalizing shifters for dual mode support) made their overhead larger. In contrast, the proposed architecture has reduced the multiplexing circuitry (mainly two MUXes: one in the SWAP section and one in the Final Output section), with a more shared and tuned data path. Compared to previous works, the proposed DPdSP adder design has smaller area and delay overhead when compared to only DP, and has a 40%–50% smaller area × delay product. The proposed DPdSP architecture provides full support for normal and sub-normal operands, along with relevant exceptional case handling.

TABLE I
RESOURCE SHARING IN DPdSP ADDER SUB-COMPONENTS

TABLE II
ASIC IMPLEMENTATION DETAILS

TABLE III
COMPARISON WITH RELATED WORK

IV. CONCLUSION

In this brief, we have presented an architecture for a floating point adder with on-the-fly dual precision support, with both normal and sub-normal support, and exceptional case handling. It supports double precision with dual single precision (DPdSP) adder computation. The data path has been tuned to require minimal multiplexing circuitry, and the supporting sub-components have been tuned for on-the-fly dual mode computation. The design needs approximately 15% more resources than a DP module and gives more than 37% reduction in area when compared to a combination of a single DP and two SP modules. In comparison to previous works in the literature, our proposed DPdSP design has a 40%–50% smaller area × delay product, has smaller area and delay overhead when compared to only DP, and provides more computational support.

REFERENCES

[1] X. Wang and M. Leeser, “Vfloat: A variable precision fixed- and floating-point library for reconfigurable hardware,” ACM Trans. Reconfigurable Technol. Syst., vol. 3, no. 3, pp. 16:1–16:34, Sep. 2010.
[2] K. S. Hemmert and K. D. Underwood, “Fast, efficient floating-point adders and multipliers for FPGAs,” ACM Trans. Reconfigurable Technol. Syst., vol. 3, no. 3, pp. 11:1–11:30, Sep. 2010.
[3] A. Baluni, F. Merchant, S. K. Nandy, and S. Balakrishnan, “A fully pipelined modular multiple precision floating point multiplier with vector support,” in Proc. ISED, 2011, pp. 45–50.
[4] K. Manolopoulos, D. Reisis, and V. Chouliaras, “An efficient multiple precision floating-point multiplier,” in Proc. 18th IEEE Int. Conf. Electron., Circuits Syst., 2011, pp. 153–156.
[5] A. Isseven and A. Akkas, “A dual-mode quadruple precision floating-point divider,” in Proc. 40th ACSSC, 2006, pp. 1697–1701.
[6] A. Akkas, “Dual-mode quadruple precision floating-point adder,” in Proc. Euromicro Symp. Digit. Syst. Des., 2006, pp. 211–220.
[7] A. Akkas, “Dual-mode floating-point adder architectures,” J. Syst. Architecture, vol. 54, no. 12, pp. 1129–1142, Dec. 2008.
[8] M. Ozbilen and M. Gok, “A multi-precision floating-point adder,” in Proc. PRIME, 2008, pp. 117–120.
[9] IEEE Standard for Floating-Point Arithmetic, IEEE Std. 754-2008, Aug. 2008.
[10] Oklahoma State University, OSUCells. [Online]. Available: http://vlsiarch.ecen.okstate.edu
[11] H.-J. Oh et al., “A fully pipelined single-precision floating-point unit in the synergistic processor element of a cell processor,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 759–771, Apr. 2006.
[12] E. Schwarz, M. Schmookler, and S. Trong, “Hardware implementations of denormalized numbers,” in Proc. 16th IEEE Symp. Comput. Arithmetic, 2003, pp. 70–78.