TMS320VC5502 Library Reference
Programmer’s Reference
Related Documentation
- The MathWorks, Inc. Matlab Signal Processing Toolbox User’s Guide.
Natick, MA: The MathWorks, Inc., 1996.
Trademarks
Contents
1 Introduction
Introduction to the TMS320C55x DSP Library
1.1 DSP Routines
1.2 Features and Benefits
1.3 DSPLIB: Quality Freeware That You Can Build On and Contribute To
2 Installing DSPLIB
Describes how to install the DSPLIB
2.1 DSPLIB Content
2.2 How to Install DSPLIB
2.2.1 De-Archive DSPLIB
2.2.2 Relocate Library File
2.3 How to Rebuild DSPLIB
2.3.1 For Full Rebuild of 55xdsp.lib
2.3.2 For Partial Rebuild of 55xdsp.lib (modification of a specific DSPLIB function, for example fir.asm)
3 Using DSPLIB
Describes how to use the DSPLIB
3.1 DSPLIB Arguments and Data Types
3.1.1 DSPLIB Arguments
3.1.2 DSPLIB Data Types
3.2 Calling a DSPLIB Function from C
3.3 Calling a DSPLIB Function from Assembly Language Source Code
3.4 Where to Find Sample Code
3.5 How DSPLIB is Tested − Allowable Error
3.6 How DSPLIB Deals with Overflow and Scaling Issues
3.7 Where DSPLIB Goes From Here
4 Function Descriptions
Provides descriptions for the TMS320C55x DSPLIB functions
4.1 Arguments and Conventions Used
4.2 DSPLIB Functions
Chapter 1
Introduction
DSP Routines
The routines included within the library are organized into eight different
functional categories:
- FFT
- Filtering and convolution
- Adaptive filtering
- Correlation
- Math
- Trigonometric
- Miscellaneous
- Matrix
1.3 DSPLIB: Quality Freeware That You Can Build On and Contribute To
DSPLIB is a free-of-charge product. You can use, modify, and distribute TI
C55x DSPLIB for usage on TI C55x DSPs with no royalty payments. See
section 3.7, Where DSPLIB Goes From Here, for details.
Chapter 2
Installing DSPLIB
DSPLIB Content
3) One source library to allow function customization by the end user under
the “55x_src” sub-directory
55xdsp.src
4) Example programs and linker command files used under the “55x_test”
Examples sub-directory.
How to Install DSPLIB
Note:
Read the README.TXT file for specific details of release.
c55_dsplib.exe −d
The DSPLIB directory structure and content you will find is:
c55_dsplib(dir)
55xdsp.lib : use for standard short-call mode
blt55x.bat : re-generate 55xdsp.lib based on 55xdsp.src
examples(dir) : contains one subdirectory for each routine included in the
library where you can find complete test cases
include(dir)
dsplib.h : include file with data types and function prototypes
tms320.h : include file with type definitions to increase TMS320
portability
misc.h : include file with useful miscellaneous definitions
doc(dir)
55x_src (dir) : contains assembly source files for functions
3) Replace the object, fir.obj, in the 55xdsp.lib object library with the newly
formed object:
ar55 r 55xdsp.lib fir.obj
Chapter 3
Using DSPLIB
DSPLIB Arguments and Data Types
- Q.15 (DATA) : A Q.15 operand is represented by a short data type (16 bit)
that is predefined as DATA, in the dsplib.h header file.
- Q.31 (LDATA) : A Q.31 operand is represented by a long data type (32 bit)
that is predefined as LDATA, in the dsplib.h header file.
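The two fractional formats can be illustrated in portable C. The sketch below is a minimal example, assuming DATA and LDATA are 16-bit and 32-bit signed integers as described above. The typedefs are stand-ins for the ones in dsplib.h, and the helper names (scale_round_clamp, q15_from_double, q15_to_double, q31_from_double) are hypothetical, not DSPLIB functions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the types predefined in dsplib.h (assumed widths). */
typedef int16_t DATA;   /* Q.15: sign bit plus 15 fractional bits */
typedef int32_t LDATA;  /* Q.31: sign bit plus 31 fractional bits */

/* Hypothetical helper: scale, round half away from zero, clamp to [lo, hi]. */
static double scale_round_clamp(double v, double scale, double lo, double hi)
{
    double s;
    assert(v == v);                           /* reject NaN */
    s = v * scale + (v >= 0.0 ? 0.5 : -0.5);
    if (s > hi) s = hi;
    if (s < lo) s = lo;
    return s;                                 /* caller truncates the fraction */
}

/* Convert a value in [-1, 1) to Q.15, saturating outside that range. */
static DATA q15_from_double(double v)
{
    return (DATA)scale_round_clamp(v, 32768.0, -32768.0, 32767.0);
}

/* Convert a Q.15 value back to floating point. */
static double q15_to_double(DATA q)
{
    return (double)q / 32768.0;
}

/* Convert a value in [-1, 1) to Q.31, saturating outside that range. */
static LDATA q31_from_double(double v)
{
    return (LDATA)scale_round_clamp(v, 2147483648.0, -2147483648.0, 2147483647.0);
}
```

With this model, 0.5 maps to 16384 in Q.15 and to 1073741824 in Q.31, while +1.0 cannot be represented and saturates to the largest positive code.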
Calling a DSPLIB Function from C
- Link your code with the DSPLIB object code library, 55xdsp.lib or
55xdspx.lib.
A project file has been included for each function in the examples folder. You
can reference function_t.c files for calling a DSPLIB function from C.
The examples presented in this document have been tested using the Texas
Instruments C55x Simulator. Customization may be required to use it with a
different simulator or development board.
- araw_t.c: main driver for testing the DSPLIB acorr (raw) function.
- test.h: contains input data (a) and expected output data (yraw) for the acorr
(raw) function. This test.h file is generated by using Matlab scripts.
- test.c: contains function used to compare the output of araw function with
the expected output data.
- ftest.c: contains function used to compare two arrays of float data types.
- ltest.c: contains function used to compare two arrays of long data types.
- ld3.cmd: an example of a linker command you can use for this function.
The methodology used to deal with overflow should depend on the specifics
of your signal, the type of operation in your functions, and the DSP architecture
used. In general, overflow handling methodologies can be classified in five
categories: saturation, input scaling, fixed scaling, dynamic scaling, and
system design considerations.
How DSPLIB Deals with Overflow and Scaling Issues
There are 4 specific ways DSPLIB deals with overflow, as reflected in each
function description:
- Not applicable: In this type of function, due to the nature of the function
operations, there is no overflow.
Chapter 4
Function Descriptions
4.1 Arguments and Conventions Used
The following convention has been followed when describing the arguments
for each individual function:
Argument Description
x,y argument reflecting input data vector
ushort Unsigned short (16 bit). You can use this data type directly,
because it has been defined in dsplib.h
4.2 DSPLIB Functions
The routines included within the library are organized into 8 different functional
categories:
- FFT
- Filtering and convolution
- Adaptive filtering
- Correlation
- Math
- Trigonometric
- Miscellaneous
- Matrix
(a) FFT
Functions Description
void cfft (DATA *x, ushort nx, type) Radix-2 complex forward FFT − MACRO
void cfft32 (LDATA *x, ushort nx, type); 32-bit forward complex FFT
void cifft (DATA *x, ushort nx, type) Radix-2 complex inverse FFT − MACRO
void cifft32 (LDATA *x, ushort nx, type); 32-bit inverse complex FFT
void cbrev (DATA *x, DATA *r, ushort n) Complex bit-reverse function
void cbrev32 (LDATA *a, LDATA *r, ushort) 32-bit complex bit reverse
void rfft (DATA *x, ushort nx, type) Radix-2 real forward FFT − MACRO
void rifft (DATA *x, ushort nx, type) Radix-2 real inverse FFT − MACRO
void rfft32 (LDATA *x, ushort nx, type) Forward 32-bit Real FFT (in-place)
void rifft32 (LDATA *x, ushort nx, type) Inverse 32-bit Real FFT (in-place)
(b) Filtering and Convolution
Functions Description
ushort fir (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nx, ushort nh)    FIR direct form
ushort fir2 (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nx, ushort nh)    FIR direct form (Optimized to use DUAL−MAC)
ushort firs (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nx, ushort nh2)    Symmetric FIR direct form (generic routine)
ushort convol (DATA *x, DATA *h, DATA *r, ushort nr, ushort nh)    Convolution
ushort convol1 (DATA *x, DATA *h, DATA *r, ushort nr, ushort nh)    Convolution (Optimized to use DUAL−MAC)
ushort convol2 (DATA *x, DATA *h, DATA *r, ushort nr, ushort nh)    Convolution (Optimized to use DUAL−MAC)
ushort iircas4 (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nbiq, ushort nx)    IIR cascade direct form II. 4 coefficients per biquad
ushort iircas5 (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nbiq, ushort nx)    IIR cascade direct form II. 5 coefficients per biquad
ushort iircas51 (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nbiq, ushort nx)    IIR cascade direct form I. 5 coefficients per biquad
ushort iirlat (DATA *x, DATA *h, DATA *r, DATA *pbuffer, int nx, int nh)    Lattice inverse IIR filter
ushort firlat (DATA *x, DATA *h, DATA *r, DATA *pbuffer, int nx, int nh)    Lattice forward FIR filter
ushort firdec (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nh, ushort nx, ushort D)    Decimating FIR filter
ushort firinterp (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nh, ushort nx, ushort I)    Interpolating FIR filter
ushort hilb16 (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nx, ushort nh)    FIR Hilbert Transformer
ushort iir32 (DATA *x, LDATA *h, DATA *r, LDATA *dbuffer, ushort nbiq, ushort nr)    Double-precision IIR filter
(c) Adaptive filtering
Functions Description
ushort dlms (DATA *x, DATA *h, DATA *r, DATA *des, DATA *dbuffer, DATA step, ushort nh, ushort nx)    LMS FIR (delayed version)
ushort oflag = dlmsfast (DATA *x, DATA *h, DATA *r, DATA *des, DATA *dbuffer, DATA step, ushort nh, ushort nx)    Adaptive delayed LMS filter (fast implemented)
Table 4−2. Summary Table (Continued)
(d) Correlation
Functions Description
ushort acorr (DATA *x, DATA *r, ushort nx, ushort nr, type) Autocorrelation (positive side only) − MACRO
ushort corr (DATA *x, DATA *y, DATA *r, ushort nx, ushort ny, type)    Correlation (full-length)
(e) Trigonometric
Functions Description
ushort sine (DATA *x, DATA *r, ushort nx) sine of a vector
ushort atan2_16 (DATA *q, DATA *i, DATA *r, ushort nx) Four quadrant inverse tangent of a vector
ushort atan16 (DATA *x, DATA *r, ushort nx) Arctan of a vector
(f) Math
Functions Description
ushort add (DATA *x, DATA *y, DATA *r, ushort nx, ushort scale)    Optimized vector addition
ushort expn (DATA *x, DATA *r, ushort nx)    Exponent of a vector
short bexp (DATA *x, ushort nx)    Exponent of all values in a vector
ushort logn (DATA *x, LDATA *r, ushort nx)    Natural log of a vector
ushort log_2 (DATA *x, LDATA *r, ushort nx)    Log base 2 of a vector
ushort log_10 (DATA *x, LDATA *r, ushort nx)    Log base 10 of a vector
short maxidx (DATA *x, ushort ng, ushort ng_size)    Index for maximum magnitude in a vector
short maxidx34 (DATA *x, ushort nx)    Index of the maximum element of a vector ≤ 34
short maxval (DATA *x, ushort nx)    Maximum magnitude in a vector
void maxvec (DATA *x, ushort nx, DATA *r_val, DATA *r_idx)    Index and value of the maximum element of a vector
short minidx (DATA *x, ushort nx)    Index for minimum magnitude in a vector
short minval (DATA *x, ushort nx)    Minimum element in a vector
void minvec (DATA *x, ushort nx, DATA *r_val, DATA *r_idx)    Index and value of the minimum element of a vector
ushort mul32 (LDATA *x, LDATA *y, LDATA *r, ushort nx)    32-bit vector multiply
short neg (DATA *x, DATA *r, ushort nx)    16-bit vector negate
short neg32 (LDATA *x, LDATA *r, ushort nx)    32-bit vector negate
ushort sqrt_16 (DATA *x, DATA *r, short nx)    Square root of a vector
short sub (DATA *x, DATA *y, DATA *r, ushort nx, ushort scale)    Vector subtraction
(g) Matrix
Functions Description
ushort mmul (DATA *x1, short row1, short col1, DATA *x2, short row2, short col2, DATA *r)    Matrix multiply
ushort mtrans (DATA *x, short row, short col, DATA *r)    Matrix transpose
(h) Miscellaneous
Functions Description
ushort fltoq15 (float *x, DATA *r, ushort nx) Floating-point to Q15 conversion
ushort q15tofl (DATA *x, float *r, ushort nx) Q15 to floating-point conversion
ushort rand16 (DATA *r, ushort nr) Random number generation
void rand16init(void) Random number generation initialization
acorr Autocorrelation
Function ushort oflag = acorr (DATA *x, DATA *r, ushort nx, ushort nr, type)
(defined in araw.asm, abias.asm, aubias.asm)
Arguments
x [nx] Pointer to real input vector of nx real elements. nx ≥ nr
r [nr] Pointer to real output vector containing the first nr elements
of the positive side of the autocorrelation function of vector x.
r must be different than x (in-place computation is not
allowed).
Description Computes the first nr points of the positive side of the autocorrelation of the
real vector x and stores the results in real output vector r. The full-length
autocorrelation of vector x will have 2*nx−1 points with even symmetry around the
lag 0 point (r[0]). This routine provides only the positive half of this for memory
and computational savings.
Algorithm Raw Autocorrelation
\[ r[j] = \sum_{k=0}^{nx-j-1} x[j+k]\, x[k], \qquad 0 \le j \le nr \]
Biased Autocorrelation
\[ r[j] = \frac{1}{nx} \sum_{k=0}^{nx-j-1} x[j+k]\, x[k], \qquad 0 \le j \le nr \]
Unbiased Autocorrelation
\[ r[j] = \frac{1}{nx - |j|} \sum_{k=0}^{nx-j-1} x[j+k]\, x[k], \qquad 0 \le j \le nr \]
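The three variants differ only in the normalization applied to the same sum. The following floating-point sketch is a reference model for checking results on a host machine, not the DSPLIB implementation. The enum and the name acorr_ref are hypothetical, and the loop fills exactly the nr elements of r (lags 0 through nr − 1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical selector for the three normalizations. */
enum acorr_type { ACORR_RAW, ACORR_BIAS, ACORR_UNBIAS };

/* Reference model: r[j] = norm * sum_{k=0}^{nx-j-1} x[j+k] * x[k]. */
static void acorr_ref(const double *x, double *r, size_t nx, size_t nr,
                      enum acorr_type type)
{
    size_t j, k;
    assert(nr > 0 && nx >= nr);           /* nx >= nr, as the routine requires */
    for (j = 0; j < nr; j++) {
        double sum = 0.0;
        for (k = 0; k + j < nx; k++)      /* k runs from 0 to nx-j-1 */
            sum += x[j + k] * x[k];
        if (type == ACORR_BIAS)
            sum /= (double)nx;            /* biased: divide by nx */
        else if (type == ACORR_UNBIAS)
            sum /= (double)(nx - j);      /* unbiased: divide by nx - |j| */
        r[j] = sum;
    }
}
```

For x = {1, 2, 3, 4} and nr = 3, the raw result is {30, 20, 11}; the biased variant divides every lag by 4, and the unbiased variant divides lag j by 4 − j.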
Special Requirements x array in internal memory (coefficient pointer CDP used to address it)
Implementation Notes
- Special debugging consideration: This function is implemented as a macro
that invokes different autocorrelation routines according to the type
selected. As a consequence the acorr symbol is not defined. Instead the
acorr_raw, acorr_bias, acorr_unbias symbols are defined.
Benchmarks (preliminary)
Cycles† Abias:
Core:
nr even: [(4 * nx − nr * (nr + 2) + 20) / 8] * nr
nr odd: [(4 * nx − (nr − 1) * (nr + 1) + 20) / 8] * (nr − 1) + 10
nr = 1: (nx + 2)
Overhead:
nr even: 90
nr odd: 83
nr = 1: 59
Araw:
Core:
nr even: [(4 * nx − nr * (nr + 2) + 28) / 8] * nr
nr odd: [(4 * nx − (nr − 1) * (nr + 1) + 28) / 8] * (nr − 1) + 13
nr = 1: (nx + 1)
Overhead:
nr even: 34
nr odd: 35
nr = 1: 30
Aubias:
Core:
nr even: [(8 * nx − 3 * nr * (nr + 2) + 68) / 8] * nr
nr odd: [(8 * nx − 3 * (nr − 1) * (nr + 1) + 68) / 8] * (nr − 1) + 33
nr = 1: nx + 26
Overhead:
nr even: 64
nr odd: 55
nr = 1: 47
Code size Abias: 226
(in bytes) Araw: 178
Aubias: 308
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = add (DATA *x, DATA *y, DATA *r, ushort nx, ushort scale)
(defined in add.asm)
Arguments
x[nx] Pointer to input data vector 1 of size nx. In-place processing
allowed (r can be = x = y)
y[nx] Pointer to input data vector 2 of size nx
r[nx] Pointer to output data vector of size nx containing
- (x+y) if scale = 0
- (x+y) /2 if scale = 1
Overflow Handling Methodology Scaling implemented for overflow prevention (user selectable)
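The effect of the scale argument can be seen in a small host-side model. This is a sketch only: the name add_ref is hypothetical, and both the saturation of out-of-range results and the meaning of the returned flag (1 if any result was saturated) are assumptions made for illustration, since this section does not define them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of a scaled Q.15 vector add.
   Returns 1 if any result was saturated, 0 otherwise (assumed flag meaning). */
static unsigned add_ref(const int16_t *x, const int16_t *y, int16_t *r,
                        unsigned nx, unsigned scale)
{
    unsigned oflag = 0;
    unsigned i;
    assert(scale <= 1);
    for (i = 0; i < nx; i++) {
        int32_t sum = (int32_t)x[i] + (int32_t)y[i];    /* exact 17-bit sum */
        if (scale)                                      /* (x+y)/2, rounded toward minus infinity */
            sum = (sum >= 0) ? sum / 2 : -((-sum + 1) / 2);
        if (sum > INT16_MAX) { sum = INT16_MAX; oflag = 1; }
        if (sum < INT16_MIN) { sum = INT16_MIN; oflag = 1; }
        r[i] = (int16_t)sum;                            /* safe in place: x[i], y[i] already read */
    }
    return oflag;
}
```

With scale = 1 the halved sum always fits in 16 bits, so the model never saturates; with scale = 0, adding 0.5 and 0.5 already exceeds the largest Q.15 value.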
Benchmarks (preliminary)
Cycles† Core: 3 * nx
Overhead: 23
Code size 60
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = atan2_16 (DATA *q, DATA *i, DATA *r, ushort nx)
(defined in arct2.asm)
Arguments
q[nx] Pointer to quadrature input vector of size nx.
i[nx] Pointer to in-phase input vector of size nx
r[nx] Pointer to output data vector of size nx, in Q15 number
representation. In-place processing is allowed (r can be equal
to x). On output, r contains the arctangent of (i/q)/π.
Description This function calculates the arctangent of the ratio i/q, where −1 <= atan2_16
(i/q) <= 1 representing an actual range of −π < atan2_16 (i/q) < π. The result
is placed in the resultant vector r. Output scale factor correction = π. For
example, if:
y = [0x1999, 0x1999, 0x0, 0xe667, 0x1999] (equivalent to [0.2, 0.2, 0, −0.2,
0.2] float)
x = [0x1999, 0x3dcc, 0x7fff, 0x3dcc, 0xc234] (equivalent to [0.2, 0.4828, 1,
0.4828, −0.4828] float)
atan2_16(y, x, r, 5) should give:
r = [0x2000, 0x1000, 0x0, 0xf000, 0x7000] equivalent to [0.25, 0.125, 0,
−0.125, 0.875]*π
Special Requirements Linker command file: you must allocate .data section (for polynomial
coefficients)
Benchmarks (preliminary)
Cycles† 18 + 62 * nx
Code size 170 program; 10 data; 4 stack
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = atan16 (DATA *x, DATA *r, ushort nx)
(defined in atant.asm)
Arguments
x[nx] Pointer to input data vector of size nx. x contains the tangent
of r, where |x| < 1.
Description This function calculates the arc tangent of each of the elements of vector x. The
result is placed in the resultant vector r and is in the range [−π/2 to π/2] radians.
For example,
if x = [0x7fff, 0x3505, 0x1976, 0x0] (equivalent to tan(π/4), tan(π/8), tan(π/16),
0 in float):
atan16(x,r,4) should give
r = [0x6478, 0x3243, 0x1921, 0x0] equivalent to [π/4, π/8, π/16, 0]
Special Requirements Linker command file: you must allocate .data section (for polynomial
coefficients)
Implementation Notes
- atan(x), with 0 ≤ x ≤ 1, output scaling factor = π.
- Uses a polynomial to compute the arctan (x) for |x| <1. For |x| > 1, you can
express the number x as a ratio of 2 fractional numbers and use the
atan2_16 function.
Benchmarks (preliminary)
Cycles† 14 + 8 * nx
Code size 43 program; 6 data
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
bexp
Arguments
x [nx] Pointer to input vector of size nx
r Return value. Maximum exponent that may be used in
scaling.
Description Computes the exponents (number of extra sign bits) of all values in the input
vector and returns the minimum exponent. This will be useful in determining
the maximum shift value that may be used in scaling a block of data.
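The "number of extra sign bits" is the count of bits below the sign bit that merely repeat it. A host-side sketch of the computation follows; the names extra_sign_bits and bexp_ref are hypothetical, and the code models the description above rather than the DSPLIB assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Number of redundant sign bits in one 16-bit value, from 0 to 15. */
static int extra_sign_bits(int16_t v)
{
    uint16_t u = (uint16_t)v;
    uint16_t mask;
    int n = 0;
    if (v < 0)
        u = (uint16_t)~u;          /* fold negatives so sign copies become zeros */
    /* Count zero bits from bit 14 downward until the first set bit. */
    for (mask = 0x4000; mask != 0 && (u & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Minimum exponent over the block: the largest left shift that is safe
   for every element of x. */
static int bexp_ref(const int16_t *x, unsigned nx)
{
    int min_exp = 15;
    unsigned i;
    assert(nx > 0);
    for (i = 0; i < nx; i++) {
        int e = extra_sign_bits(x[i]);
        if (e < min_exp)
            min_exp = e;
    }
    return min_exp;
}
```

For the block {0x0100, 0x0010, −0x0200} the model returns 6, so every element can be shifted left by 6 bits without changing its sign.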
Benchmarks (preliminary)
Cycles Core: 3 * nx
Overhead: 4
Code size 19
(in bytes)
cbrev
Arguments
x[2*nx] Pointer to complex input vector x.
r[2*nx] Pointer to complex output vector r.
nx Number of complex elements of vectors x and r.
- To bit-reverse the input of a complex FFT, nx should be the
complex FFT size.
- To bit-reverse the input of a real FFT, nx should be half the
real FFT size.
Description This function bit-reverses the position of elements in complex vector x into
output vector r. In-place bit-reversing is allowed. Use this function in conjunction
with FFT routines to provide the correct format for the FFT input or output data.
If you bit-reverse a linear-order array, you obtain a bit-reversed order array. If
you bit-reverse a bit-reversed order array, you obtain a linear-order array.
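The reordering can be modeled on a host machine as follows. This is an illustrative sketch, assuming nx is a power of two and that complex elements are stored as interleaved Re-Im pairs; the names reverse_bits and cbrev_ref are hypothetical, and the code does not use the hardware bit-reversed addressing that the library routine relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the lowest `bits` bits of index i. */
static unsigned reverse_bits(unsigned i, unsigned bits)
{
    unsigned r = 0, b;
    for (b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}

/* Reference model: element n of x lands at bit-reversed index rev(n) in r.
   Works off-place, and in place (r == x) by swapping each pair once. */
static void cbrev_ref(const int16_t *x, int16_t *r, unsigned nx)
{
    unsigned bits = 0, n;
    assert(nx != 0 && (nx & (nx - 1)) == 0);   /* power of two (assumption) */
    while ((1u << bits) < nx)
        bits++;
    for (n = 0; n < nx; n++) {
        unsigned m = reverse_bits(n, bits);
        if (x != r) {                          /* off-place: plain copy */
            r[2 * m] = x[2 * n];
            r[2 * m + 1] = x[2 * n + 1];
        } else if (m > n) {                    /* in place: swap each pair once */
            int16_t re = r[2 * n], im = r[2 * n + 1];
            r[2 * n] = r[2 * m];
            r[2 * n + 1] = r[2 * m + 1];
            r[2 * m] = re;
            r[2 * m + 1] = im;
        }
    }
}
```

Applying the model twice returns the original order, which matches the statement that bit-reversing a bit-reversed array yields a linear-order array.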
Special Requirements
- Input vector x[ ] and output vector r[ ] must be aligned on 32−bit boundary.
(2 LSBs of byte address must be zero)
- Ensure that the entire array fits within a 64K boundary (the largest possible
array addressable by the 16-bit auxiliary register).
Implementation Notes
- In-place bit-reversal has better performance.
Benchmarks (preliminary)
16 128
32 150
64 222
128 310
256 554
512 918
1024 1794
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle table reads and instruction
fetches (provided linker command file reflects those conditions).
cbrev32
Arguments
x[2*nx] Pointer to complex input vector x.
r[2*nx] Pointer to complex output vector r.
nx Number of complex elements in vector x.
- To bit-reverse the output of a complex (i)FFT, nx should be
the complex (i)FFT size.
- To bit-reverse the output of a real (i)FFT, nx should be half
the real (i)FFT size.
Description This function bit-reverses the position of elements in complex vector x into
output vector r. In-place bit-reversing is allowed. Use this function in conjunction
with (i)FFT routines to provide the correct format for the (i)FFT input or output
data. If you bit-reverse a linear-order array, you obtain a bit-reversed order
array. If you bit-reverse a bit-reversed order array, you obtain a linear-order
array.
- Ensure that the entire array fits within a 64K boundary (the largest possible
array addressable by the 16-bit auxiliary register).
Implementation Notes x is read in normal linear addressing and r is written with bit-reversed address-
ing.
Example See example/c(i)fft subdirectory
Benchmarks
Cycles† Core:
5*nx (off-place)
11*nx (in-place)
cfft
Description Computes a complex nx-point FFT on vector x, which is in normal order. The
original content of vector x is destroyed in the process. The nx complex
elements of the result are stored in vector x in bit-reversed order. The twiddle table
is in bit-reversed order.
Algorithm (DFT)
Overflow Handling Methodology If type = SCALE is selected, scaling before each stage is
implemented for overflow prevention
Special Requirements
- Ensure that the entire input array fits within a 64K boundary (the largest
possible array addressable by the 16-bit auxiliary register).
- If the twiddle table and the data buffer are in the same block then the
radix-2 kernel is 7 cycles and the radix-4 kernel is not affected.
Implementation Notes
- The implementations are optimized for MIPS, not for code size. They
implement the decimation-in-time (DIT) FFT algorithm.
- The SCALE version is implemented using only radix-2 stages. This routine
prevents overflow by scaling by 2 before each FFT stage.
Benchmarks
- 5 cycles (radix-2 butterfly − used in both SCALE and NOSCALE versions)
- NOSCALE version
The MATLAB cfft results need to be multiplied by the cfft size, N, in order to
be compared to the C55 DSPLIB cfft results.
- SCALE version
The C55 DSPLIB cfft results can be compared to the unmodified MATLAB cfft
results.
CFFT − SCALE
16 358 493
32 624 493
64 1210 493
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
CFFT − NOSCALE
32 517 359
64 1036 359
cfft32
Arguments
x[2*nx] Pointer to input vector containing nx complex elements (2*nx
real elements) in normal-order. On output, vector x contains
the nx complex elements of the FFT(x) in bit-reversed order.
Complex numbers are stored in the interleaved Re-Im
format.
nx Number of complex elements in vector x. Must be between 8
and 1024.
Description Computes a complex nx-point FFT on vector x, which is in normal order. The
original content of vector x is destroyed in the process. The nx complex
elements of the result are stored in vector x in bit-reversed order.
Algorithm (DFT)
\[ y[k] = \frac{1}{(\text{scale factor})} \sum_{i=0}^{nx-1} x[i] \left( \cos\!\left(\frac{2\pi}{nx}\, i\, k\right) - j \sin\!\left(\frac{2\pi}{nx}\, i\, k\right) \right) \]
Overflow Handling Methodology If scale == 1, scaling before each stage is implemented for
overflow prevention.
Special Requirements
- The twiddle table must be located in the internal memory since it is
accessed by the C55x coefficient bus.
- Ensure that the entire array fits within a 64K boundary (the largest possible
array addressable by the 16-bit auxiliary register).
Implementation Notes
- Radix-2 DIT version of the FFT algorithm is implemented. The
implementation is optimized for MIPS, not for code size.
Benchmarks
- 12 cycles for radix-2 butterfly in non-scaled version; 15 cycles for radix-2
butterfly in scaled version
- 10 cycles for stage 1 loop in scaled version; 10 cycles for group 1 of stage
2 loop in scaled version; 13 cycles for group 2 of stage 2 in scaled version
CFFT32 − SCALE
32 1712 504
64 4038 504
CFFT32 − NOSCALE
32 1461 337
64 3460 337
Function ushort oflag = cfir (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nx,
ushort nh)
Arguments
x[2*nx] Pointer to input vector of nx complex elements.
h[2*nh] - Pointer to complex coefficient vector of size nh in
normal order. For example, if nh=6, then h[nh] =
{h0r, h0i, h1r, h1i, h2r, h2i, h3r, h3i, h4r, h4i, h5r, h5i}
where h0 resides at the lowest memory address in
the array.
- This array must be located in internal memory since
it is accessed by the C55x coefficient bus.
r[2*nx] Pointer to output vector of nx complex elements.
In-place computation (r = x) is allowed.
Description Computes a complex FIR filter (direct-form) using the coefficients stored in
vector h. The complex input data is stored in vector x. The filter output result
is stored in vector r. This function maintains the array dbuffer containing the
previous delayed input values to allow consecutive processing of input data
blocks. This function can be used for both block-by-block (nx ≥ 2) and
sample-by-sample filtering (nx = 1). In-place computation (r = x) is allowed.
Algorithm
\[ r[j] = \sum_{k=0}^{nh-1} h[k]\, x[j-k], \qquad 0 \le j \le nx \]
Special Requirements
- nh must be a minimum value of 3. For smaller filters, zero pad the h[ ] array.
Implementation Notes The first element in the dbuffer array is present only for alignment purposes.
The second element in this array (index=0) is the entry index for the input
history. It is treated as an unsigned 16-bit value by the function even though
it has been declared as signed in C. The value of the entry index is equal to
the index − 1 of the oldest input entry in the array. The remaining elements
make up the input history. Figure 4−1 shows the array in memory with an entry
index of 2. The newest entry in the dbuffer is denoted by x(j−0), which in this
case would occupy index = 3 in the array. The next newest entry is x(j−1), and
so on. It is assumed that all x() entries were placed into the array by the
previous invocation of the function in a multiple-buffering scheme.
Figure 4−1, Figure 4−2, and Figure 4−3 show the dbuffer, x, and r arrays as
they appear in memory.
•
•
•
xr(j−nh−3)
xi(j−nh−3)
xr(j−nh−4)
xi(j−nh−4)
xr(j−nh−3)
xi(j−nh−3) highest memory address
•
•
•
xr(nx−2)
xi(nx−2)
xr(nx−1)
newest x( ) entry xi(nx−1) highest memory address
Benchmarks (preliminary)
Cycles† Core: nx * [8 + 2(nh−2)]
Overhead: 51
Code size 136
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
cifft
Arguments
x [2*nx] Pointer to input vector containing nx complex elements (2*nx
real elements) in normal order. On output, vector contains
the nx complex elements of the IFFT(x) in bit-reversed order.
Complex numbers are stored in interleaved Re-Im format.
Description Computes a complex nx-point IFFT on vector x, which is in normal order. The
original content of vector x is destroyed in the process. The nx complex
elements of the result are stored in vector x in bit-reversed order.
Algorithm (IDFT)
Overflow Handling Methodology If type = SCALE is selected, scaling before each stage is
implemented for overflow prevention
Special Requirements
- Ensure that the entire array fits within a 64K boundary (the largest possible
array addressable by the 16-bit auxiliary register).
- If the twiddle table and the data buffer are in the same block then the
radix-2 kernel is 7 cycles and the radix-4 kernel is not affected.
Implementation Notes
- The implementations are optimized for MIPS, not for code size. They
implement the decimation-in-time (DIT) FFT algorithm.
- The SCALE version is implemented using only radix-2 stages. This routine
prevents overflow by scaling by 2 before each FFT stage.
Benchmarks (preliminary)
- 5 cycles (radix-2 butterfly − used in both SCALE and NOSCALE versions)
CIFFT − SCALE
16 358 494
32 624 494
64 1210 494
CIFFT − NOSCALE
32 512 355
64 1031 355
cifft32
Arguments
x[2*nx] Pointer to input vector containing nx complex elements (2*nx
real elements) in normal-order. On output, vector x contains
the nx complex elements of the iFFT(x) in bit-reversed order.
Complex numbers are stored in the interleaved Re-Im
format.
nx Number of complex elements in vector x. Must be between 8
and 1024.
Algorithm (iDFT)
Overflow Handling Methodology If scale == 1, scaling before each stage is implemented for
overflow prevention.
Special Requirements
- The twiddle table must be located in the internal memory since it is
accessed by the C55x coefficient bus.
- Ensure that the entire array fits within a 64K boundary (the largest possible
array addressable by the 16-bit auxiliary register).
Implementation Notes
- Radix-2 DIT version of the iFFT algorithm is implemented. The
implementation is optimized for MIPS, not for code size.
Benchmarks
- 12 cycles for radix-2 butterfly in non-scaled version; 15 cycles for radix-2
butterfly in scaled version
- 10 cycles for stage 1 loop in scaled version; 10 cycles for group 1 of stage
2 loop in scaled version; 13 cycles for group 2 of stage 2 in scaled version
CIFFT32 − SCALE
32 1712 504
64 4038 504
CIFFT32 − NOSCALE
32 1461 337
64 3460 337
convol Convolution
Function ushort oflag = convol (DATA *x, DATA *h, DATA *r, ushort nr, ushort nh)
Arguments
x[nr+nh−1] Pointer to input vector of nr + nh − 1 real elements.
h[nh] Pointer to input vector of nh real elements.
r[nr] Pointer to output vector of nr real elements.
nr Number of elements in vector r. In-place computation (r = x)
is allowed (see Description section for comment).
nh Number of elements in vector h.
oflag Overflow error flag (returned value)
- If oflag = 1, a 32-bit data overflow occurred in an
intermediate or final result.
- If oflag = 0, a 32-bit overflow has not occurred.
Description Computes the real convolution of two real vectors x and h, and places the
results in vector r. Typically used for block FIR filter computation when there
is no need to retain an input delay buffer. This function can also be used to
implement single-sample FIR filters (nr = 1) provided the input delay history
for the filter is maintained external to this function. In-place computation (r = x)
is allowed, but be aware that the r output vector is shorter in length than the
x input vector; therefore, r will only overwrite the first nr elements of the x.
Algorithm
\[ r[j] = \sum_{k=0}^{nh-1} h[k]\, x[j-k], \qquad 0 \le j \le nr \]
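A host-side model of the sum helps when checking results against the library. The sketch below rests on assumptions that are not stated in this section: the first nh − 1 elements of x are taken to be the samples that precede the block, so output j reads x[j + nh − 1 − k]; products are accumulated as Q.30 values and truncated to Q.15; and the returned flag is set when the running sum leaves the 32-bit range. The names sat16 and convol_ref are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a wide value to the 16-bit range. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical reference model of block convolution on Q.15 data.
   Assumption: x[0..nh-2] holds the nh-1 samples preceding the block. */
static unsigned convol_ref(const int16_t *x, const int16_t *h, int16_t *r,
                           unsigned nr, unsigned nh)
{
    unsigned oflag = 0, j, k;
    assert(nr > 0 && nh > 0);
    for (j = 0; j < nr; j++) {
        int64_t acc = 0;                               /* Q.30 running sum */
        for (k = 0; k < nh; k++) {
            acc += (int64_t)h[k] * x[j + nh - 1 - k];
            if (acc > INT32_MAX || acc < INT32_MIN)
                oflag = 1;                             /* 32-bit overflow seen */
        }
        r[j] = sat16(acc / 32768);                     /* Q.30 to Q.15, truncating */
    }
    return oflag;
}
```

Because output j only reads x[j] and later elements, writing r[j] over x[j] never disturbs a later output, which is consistent with in-place use (r = x) overwriting just the first nr elements of x.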
Implementation Notes Figure 4−4, Figure 4−5, and Figure 4−6 show the x, r, and h arrays as they
appear in memory.
•
•
•
x(nr+nh−2)
x(nr+nh−1) highest memory address
r(nr−2)
r(nr−1) highest memory address
•
•
•
h(nh−2)
h(nh−1) highest memory address
Benchmarks (preliminary)
Cycles† Core: nr * (1 + nh)
Overhead: 44
Code size 88
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = convol1 (DATA *x, DATA *h, DATA *r, ushort nr, ushort nh)
Arguments
x[nr+nh−1] Pointer to input vector of nr+nh−1 real elements.
h[nh] Pointer to input vector of nh real elements.
r[nr] Pointer to output vector of nr real elements. In-place
computation (r = x) is allowed (see Description section for
comment).
nr Number of elements in vector r. Must be an even number.
nh Number of elements in vector h.
oflag Overflow error flag (returned value)
- If oflag = 1, a 32-bit data overflow occurred in an inter-
mediate or final result.
- If oflag = 0, a 32-bit overflow has not occurred.
Description Computes the real convolution of two real vectors x and h, and places the
results in vector r. This function utilizes the dual-MAC capability of the C55x
to process in parallel two output samples for each iteration of the inner function
loop. It is, therefore, roughly twice as fast as CONVOL, which implements only
a single-MAC approach. However, the number of output samples (nr) must be
even. Typically used for block FIR filter computation when there is no need to
retain an input delay buffer. This function can also be used to implement
single-sample FIR filters (nr = 1) provided the input delay history for the filter is
maintained external to this function. In-place computation (r = x) is allowed, but
note that the output vector r is shorter than the input vector x; therefore, r
overwrites only the first nr elements of x.
Algorithm $r[j] = \sum_{k=0}^{nh-1} h[k]\, x[j-k], \qquad 0 \le j \le nr$
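The pairing of output samples that the description attributes to the dual-MAC hardware can be pictured with a portable C sketch. This is a double-precision model for illustration only, not the DSPLIB assembly routine. The name ref_convol_pairs is hypothetical, and the sketch assumes that x[0] is the oldest sample, so the term written x[j−k] in the formula is read from x[j + nh − 1 − k].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model (not the DSPLIB routine): convolution that
 * produces two output samples per pass of the inner loop.
 * Assumption: x has nr + nh - 1 elements and x[0] is the oldest sample. */
void ref_convol_pairs(const double *x, const double *h, double *r,
                      size_t nr, size_t nh)
{
    assert(nr % 2 == 0);                  /* outputs are produced in pairs */
    for (size_t j = 0; j < nr; j += 2) {
        double acc0 = 0.0;                /* accumulates r[j]     */
        double acc1 = 0.0;                /* accumulates r[j + 1] */
        for (size_t k = 0; k < nh; k++) {
            double hk = h[k];             /* one coefficient feeds both sums */
            acc0 += hk * x[j + nh - 1 - k];
            acc1 += hk * x[j + nh - k];
        }
        r[j] = acc0;
        r[j + 1] = acc1;
    }
}
```

Each coefficient is fetched once and used for two accumulations, which is why the number of outputs has to be even in this arrangement.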
[Figures: memory layouts of the x, r, and h arrays; the last elements shown, x(nr+nh−1), r(nr−1), and h(nh−1), are at the highest memory addresses.]
Benchmarks (preliminary)
Cycles† Core: nr/2 * [3+(nh−2)]
Overhead: 58
Code size 101
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
convol2
Function ushort oflag = convol2 (DATA *x, DATA *h, DATA *r, ushort nr, ushort nh)
Arguments
x[nr+nh−1] Pointer to input vector of nr + nh − 1 real elements.
h[nh] Pointer to input vector of nh real elements.
r[nr] Pointer to output vector of nr real elements. In-place
computation (r = x) is allowed (see Description section for
comment). This array must be aligned on a 32-bit boundary
in memory.
Description Computes the real convolution of two real vectors x and h, and places the
results in vector r. This function utilizes the dual-MAC capability of the C55x
to process in parallel two output samples for each iteration of the inner function
loop. It is, therefore, roughly twice as fast as CONVOL, which implements only
a single-MAC approach. However, the number of output samples (nr) must be
even. In addition, this function offers a small performance improvement over
CONVOL1 at the expense of requiring the r array to be 32-bit aligned in memo-
ry. Typically used for block FIR filter computation when there is no need to
retain an input delay buffer. This function can also be used to implement
single-sample FIR filters (nr = 1) provided the input delay history for the filter is
maintained external to this function. In-place computation (r = x) is allowed, but
note that the output vector r is shorter than the input vector x; therefore, r
overwrites only the first nr elements of x.
Algorithm $r[j] = \sum_{k=0}^{nh-1} h[k]\, x[j-k], \qquad 0 \le j \le nr$
Special Requirements
- nr must be an even value.
Implementation Notes Figure 4−10, Figure 4−11, and Figure 4−12 show the x, r, and h arrays as they
appear in memory.
[Figures 4−10, 4−11, and 4−12: memory layouts of the x, r, and h arrays; the last elements shown, x(nr+nh−1), r(nr−1), and h(nh−1), are at the highest memory addresses.]
Benchmarks (preliminary)
Cycles† Core: nr/2 * (1 + nh)
Overhead: 24
Code size 100
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = corr (DATA *x, DATA *y, DATA *r, ushort nx, ushort ny, type)
Arguments
x [nx] Pointer to real input vector of nx real elements.
y [ny] Pointer to real input vector of ny real elements.
r[nx+ny−1] Pointer to real output vector containing the full-length
correlation (nx + ny − 1 elements) of vector x with y. r
must be different than both x and y (in-place
computation is not allowed).
Description Computes the full-length correlation of vectors x and y using time-domain
techniques and stores the result in vector r.
Raw correlation
$r[j] = \sum_{k=0}^{nr-j-1} x[j+k]\, y[k], \qquad 0 \le j \le nr = nx + ny - 1$
Biased correlation
$r[j] = \frac{1}{nr} \sum_{k=0}^{nr-j-1} x[j+k]\, y[k], \qquad 0 \le j \le nr = nx + ny - 1$
Unbiased correlation
$r[j] = \frac{1}{nx - \mathrm{abs}(j)} \sum_{k=0}^{nr-j-1} x[j+k]\, y[k], \qquad 0 \le j \le nr = nx + ny - 1$
Special Requirements
J nx y
J ny nx
Implementation Notes
- Special debugging consideration: This function is implemented as a
macro that invokes different correlation routines according to the type
selected. As a consequence the corr symbol is not defined. Instead the
corr_raw, corr_bias, corr_unbias symbols are defined.
Benchmarks (preliminary)
Cycles Raw: 2 times faster than C54x
Unbias: 2.14 times faster than C54x
Bias: 2.1 times faster than C54x
Code size Raw: 318
(in bytes) Unbias: 417
Bias: 356
dlms
Function ushort oflag = dlms (DATA *x, DATA *h, DATA *r, DATA *des, DATA *dbuffer,
DATA step, ushort nh, ushort nx)
(defined in dlms.asm)
Arguments
x[nx] Pointer to input vector of size nx
h[nh] Pointer to filter coefficient vector of size nh.
- h is stored in reversed order: h(n−1), ... h(0), where h(n−1)
is at the lowest memory address.
- Memory alignment: h is a circular buffer and must start
on a k-bit boundary (that is, the k LSBs of the starting
address must be zeros), where k = log2(nh).
r[nx] Pointer to output data vector of size nx. r can be equal to
x.
Description Adaptive delayed least-mean-square (LMS) FIR filter using coefficients stored
in vector h. Coefficients are updated after each sample based on the LMS
algorithm and using a constant step = 2*μ. The real data input is stored in vector
dbuffer. The filter output result is stored in vector r.
The LMS algorithm uses the previous error and the previous sample (delayed) to
take advantage of the C55x LMS instruction.
The delay buffer used is the same delay buffer used for other functions in the
C55x DSP Library. There is one more data location in the circular delay buffer
than there are coefficients. Other C55x DSP Library functions use this delay
buffer to accommodate use of the dual-MAC architecture. In the DLMS func-
tion, we make use of the additional delay slot to allow coefficient updating as
well as FIR calculation without a need to update the circular buffer in the interim
operations.
The FIR output calculation is based on x(i) through x(i−nh+1). The coefficient
update for a delayed LMS is based on x(i−1) through x(i−nh). Therefore, by
having a delay buffer of nh+1, we can perform all calculations with the given
delay buffer containing delay values of x(i) through x(i−nh). If the delay buffer
were of length nh, the oldest data sample, x(i−nh), would need to be updated
with the newest data sample, x(i), sometime after the calculation of the first
coefficient update term, but before the calculation of the last FIR term.
$r[i] = \sum_{k=0}^{nh-1} h[k]\, x[i-k], \qquad 0 \le i \le nx-1$
Special Requirements Minimum of 2 input and desired data samples. Minimum of 2 coefficients
Implementation Notes
- Delayed version implemented to take advantage of the C55x LMS instruc-
tion.
$r[i] = \sum_{k=0}^{nh-1} h[k]\, x[i-k], \qquad 0 \le i \le nx-1$
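The reason for the extra delay slot can be made concrete with a small model. The following is a minimal double-precision sketch of a delayed LMS filter, not the DSPLIB routine. The name ref_dlms is hypothetical, h is held in natural order rather than the reversed circular layout described above, the delay line is shifted linearly, the previous error is reset to zero on every call, and the order of operations (coefficient update, then FIR term) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model (not the DSPLIB routine): delayed LMS.
 * d[] holds nh + 1 samples: d[0] = x(i), d[1] = x(i-1), ..., d[nh] = x(i-nh).
 * The caller zeroes d[] and h[] before the first block. */
void ref_dlms(const double *x, double *h, double *r, const double *des,
              double *d, double step, size_t nh, size_t nx)
{
    double e_prev = 0.0;                  /* error of the previous sample */
    assert(nh >= 2 && nx >= 2);           /* minimums stated in the text */
    for (size_t i = 0; i < nx; i++) {
        /* Age the history by one sample and insert the newest at d[0]. */
        for (size_t k = nh; k > 0; k--)
            d[k] = d[k - 1];
        d[0] = x[i];

        double y = 0.0;
        for (size_t k = 0; k < nh; k++) {
            h[k] += step * e_prev * d[k + 1];  /* update uses x(i-1-k) */
            y += h[k] * d[k];                  /* FIR term uses x(i-k) */
        }
        r[i] = y;
        e_prev = des[i] - y;              /* used on the next sample */
    }
}
```

The update reads d[1] through d[nh] while the FIR sum reads d[0] through d[nh−1], so both fit in one buffer of nh+1 samples without touching it between the two steps.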
Benchmarks (preliminary)
Cycles† Core: nx * (7 + 2*(nh − 1)) = nx * (5 + 2 * nh)
Overhead: 26
Code size 122
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = dlmsfast (DATA *x, DATA *h, DATA *r, DATA *des, DATA *dbuffer,
DATA step, ushort nh, ushort nx)
This function is implemented for better performance with high filter orders.
(defined in dlmsfast.asm)
Arguments
x[nx] Pointer to input vector of size nx.
h[2*nh] Pointer to filter coefficient array of size 2*nh. This array
contains two coefficient buffers, h_coef and h_scratch.
The updated coefficients for successive time slots are stored
into these two buffers alternately. The final updated
coefficients are stored in h_coef.
- h_coef is stored in reversed order: h_coef(n−1), ...
h_coef(0), where h_coef(n−1) is at the lowest memory
address of the first half of array h.
- h_scratch is stored in reversed order: h_scratch(n−1),
... h_scratch(0), where h_scratch(n−1) is at the lowest
memory address of the second half of array h.
- Memory alignment: h must be aligned on a 32-byte
boundary.
r[nx] Pointer to output data vector of size nx. r can be equal to
x.
Description Adaptive delayed least-mean-square (LMS) FIR filter using coefficients stored
in vector h. Coefficients are updated after each sample based on the LMS al-
gorithm and using a constant step = 2*μ. The real data input is stored in vector
dbuffer. The filter output result is stored in vector r.
Unlike the DLMS function in DSPLIB, which uses C55x LMS instruction to do
partial filtering and addition of delta h to the coefficient, this fast LMS algorithm
is implemented by doing coefficient updating and filtering separately to get bet-
ter cycle count.
In this implementation, two input samples are processed as a pair. The filtering
operation uses the dual-MAC to process two time slots of data, and two sets of
coefficients are updated corresponding to these two time slots.
The delay buffer used is the same delay buffer used for other functions in the
C55x DSP Library. There are two more data locations in the circular delay buffer
than there are coefficients. Other C55x DSP Library functions use this delay
buffer to accommodate use of the dual-MAC architecture. In the DLMS function,
we make use of the additional delay slots to allow coefficient updating as
well as FIR calculation without a need to update the circular buffer in the interim
operations.
The first time slot of the FIR output calculation is based on x(i) through x(i−nh+1),
while the coefficient update for a delayed LMS is based on x(i−1) through
x(i−nh). The second time slot of the FIR output is based on x(i+1) through
x(i−nh+2), while the coefficient update for the delayed LMS is based on x(i)
through x(i−nh+1). Therefore, by having a delay buffer of nh+2, we can perform
all calculations with the given delay buffer containing delay values of x(i)
through x(i−nh+1).
Special Requirements
- Delay buffer array dbuffer[ ] must be located in internal memory.
- The coefficient buffer and dbuffer should be placed in different blocks of memory
for the best performance.
Implementation Notes
- Filtering and coefficient updating are implemented separately.
Figure 4−13, Figure 4−14, and Figure 4−15 show the x buffer, dbuffer, and
h buffers.
[Figures 4−13, 4−14, and 4−15: memory layouts of the x buffer, dbuffer, and h buffers, running from the entry index at the lowest memory address to the highest memory address, with the newest and oldest x( ) entries and the FIR portion marked.]
Benchmarks
Cycles† Core: nx/2 * (26 + 3*nh)
Overhead: 71
Code size 322
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = expn (DATA *x, DATA *r, ushort nx)
(defined in expn.asm)
Arguments
x[nx] Pointer to input vector of size nx. x contains numbers
normalized to the range (−1,1) in Q15 format.
r[nx] Pointer to output data vector (Q3.12 format) of size nx. r can
be equal to x.
Special Requirements Linker command file: you must allocate .data section (for polynomial coeffi-
cients) on a 32−bit boundary (2 LSBs of byte address must be zero).
Implementation Notes Computes the exponential of each element of vector x. It uses the following
Taylor series:
$\exp(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + c_5 x^5$
where
c0 = 1.0000
c1 = 1.0001
c2 = 0.4990
c3 = 0.1705
c4 = 0.0348
c5 = 0.0139
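The polynomial above can be evaluated with Horner's rule, which needs one multiply and one add per coefficient. The sketch below works in double precision to show the arithmetic only; the name ref_expn_poly is hypothetical, and the Q15 input and Q3.12 output scaling performed by the library routine is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Coefficients c0..c5 exactly as listed in the text. */
static const double EXPN_COEF[6] = {
    1.0000, 1.0001, 0.4990, 0.1705, 0.0348, 0.0139
};

/* Hypothetical reference model: evaluates
 * c0 + c1*x + c2*x^2 + c3*x^3 + c4*x^4 + c5*x^5 by Horner's rule. */
double ref_expn_poly(double x)
{
    assert(x > -1.0 && x < 1.0);          /* input range stated in the text */
    double acc = EXPN_COEF[5];
    for (int i = 4; i >= 0; i--)
        acc = acc * x + EXPN_COEF[i];
    return acc;
}
```

For x = 0.5 the polynomial evaluates to about 1.648722, which agrees with exp(0.5) to roughly six decimal places.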
Benchmarks (preliminary)
Cycles† Core: 11 * nx
Overhead: 18
Code size 57
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = fir (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nx,
ushort nh)
Arguments
x[nx] Pointer to input vector of nx real elements.
h[nh] - Pointer to coefficient vector of size nh in normal order.
For example, if nh=6, then h[nh] = {h0, h1, h2, h3, h4,
h5} where h0 resides at the lowest memory address in
the array.
Description Computes a real FIR filter (direct-form) using the coefficients stored in vector
h. The real input data is stored in vector x. The filter output result is stored in
vector r. This function maintains the array dbuffer containing the previous
delayed input values to allow consecutive processing of input data blocks. This
function can be used for both block-by-block (nx ≥ 2) and sample-by-sample
filtering (nx = 1). In place computation (r = x) is allowed.
Algorithm $r[j] = \sum_{k=0}^{nh-1} h[k]\, x[j-k], \qquad 0 \le j \le nx$
Special Requirements nh must be a minimum value of 3. For smaller filters, zero pad the h[ ] array.
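How a delay buffer lets consecutive blocks be filtered as one continuous stream can be sketched in portable C. This is a double-precision model under stated assumptions, not the DSPLIB routine: the name ref_fir is hypothetical, and the history is a plain array of the nh−1 most recent inputs (newest first) rather than the circular dbuffer with an entry index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model (not the DSPLIB routine): block FIR filter.
 * hist[] holds the nh - 1 most recent past inputs, newest first, and is
 * zeroed by the caller before the first block only. */
void ref_fir(const double *x, const double *h, double *r, double *hist,
             size_t nx, size_t nh)
{
    assert(nh >= 3);                      /* minimum stated in the text */
    for (size_t j = 0; j < nx; j++) {
        double xin = x[j];                /* read before r[j] is written */
        double acc = h[0] * xin;
        for (size_t k = 1; k < nh; k++)
            acc += h[k] * hist[k - 1];    /* hist[k-1] holds x(j-k) */
        for (size_t k = nh - 2; k > 0; k--)
            hist[k] = hist[k - 1];        /* age the history by one sample */
        hist[0] = xin;
        r[j] = acc;
    }
}
```

Calling the function on two consecutive blocks with the same hist array gives the same outputs as one call on the concatenated input.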
Implementation Notes The first element in the dbuffer array (index = 0) is the entry index for the input
history. It is treated as an unsigned 16-bit value by the function even though
it has been declared as signed in C. The value of the entry index is equal to
the index − 1 of the oldest input entry in the array. The remaining elements
make up the input history. Figure 4−16 shows the array in memory with an
entry index of 2. The newest entry in the dbuffer is denoted by x(j−0), which
in this case would occupy index = 3 in the array. The next newest entry is
x(j−1), and so on. It is assumed that all x() entries were placed into the array
by the previous invocation of the function in a multiple-buffering scheme.
The dbuffer array actually contains one more history value than is needed to
implement this filter. The value x(j−nh) does not enter into the calculations for
the output r(j). However, this value is required in other DSPLIB filter functions
that utilize the dual-MAC units on the C55x, such as FIR2. Including this
extra location ensures compatibility across all filter functions in the C55x
DSPLIB.
Figure 4−16, Figure 4−17, and Figure 4−18 show the dbuffer, x, and r arrays
as they appear in memory.
[Figures 4−16, 4−17, and 4−18: memory layouts of the dbuffer, x, and r arrays; in the x array the newest entry, x(nx−1), is at the highest memory address.]
fir2
Function ushort oflag = fir2 (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nx,
ushort nh)
Arguments
x[nx] Pointer to input vector of nx real elements.
h[nh] - Pointer to coefficient vector of size nh in normal order.
For example, if nh=6, then h[nh] = {h0, h1, h2, h3, h4,
h5} where h0 resides at the lowest memory address in
the array.
Description Computes a real FIR filter (direct-form) using the coefficients stored in vector
h. The real input data is stored in vector x. The filter output result is stored in
vector r. This function maintains the array dbuffer containing the previous
delayed input values to allow consecutive processing of input data blocks. This
function can be used for both block-by-block (nx ≥ 2) and sample-by-sample
filtering (nx = 1). In place computation (r = x) is allowed.
Algorithm $r[j] = \sum_{k=0}^{nh-1} h[k]\, x[j-k], \qquad 0 \le j \le nx$
Special Requirements
- nh must be a minimum value of 3. For smaller filters, zero pad the h[ ] array.
Implementation Notes The first element in the dbuffer array (index = 0) is the entry index for the input
history. It is treated as an unsigned 16-bit value by the function even though
it has been declared as signed in C. The value of the entry index is equal to
the index − 1 of the oldest input entry in the array. The remaining elements
make up the input history. Figure 4−16 shows the array in memory with an
entry index of 2. The newest entry in the dbuffer is denoted by x(j−0), which
in this case would occupy index = 3 in the array. The next newest entry is
x(j−1), and so on. Every iteration two entries are updated in the dbuffer array.
It is assumed that all x() entries were placed into the array by the previous in-
vocation of the function in a multiple-buffering scheme.
Figure 4−16, Figure 4−17, and Figure 4−18 show the dbuffer, x, and r arrays
as they appear in memory.
[Figures: memory layouts of the dbuffer, x, and r arrays; in the x array the newest entry, x(nx−1), is at the highest memory address.]
Function ushort oflag = firdec (DATA *x, DATA *h, DATA *r, DATA *dbuffer , ushort nh,
ushort nx, ushort D)
(defined in decimate.asm)
Arguments
x [nx] Pointer to real input vector of nx real elements.
h[nh] Pointer to coefficient vector of size nh in normal order:
H = b0 b1 b2 b3 …
r[nx/D] Pointer to real output vector of nx/D real elements.
In-place computation (r = x) is allowed.
dbuffer[nh+1] Delay buffer
- In the case of multiple-buffering schemes, this array
should be initialized to 0 for the first block only. Be-
tween consecutive blocks, the delay buffer preserves
previous delayed input samples. It also preserves a
ptr to the next new entry into the dbuffer. This ptr is
preserved across function calls in dbuffer[0].
Description Computes a decimating real FIR filter (direct-form) using the coefficients stored in
vector h. The real data input is stored in vector x. The filter output result is
stored in vector r. This function retains the address of the delay filter memory
d containing the previous delayed values to allow consecutive processing of
blocks. This function can be used for both block-by-block and sample-by-
sample filtering (nx = 1).
Algorithm $r[j] = \sum_{k=0}^{nh} h[k]\, x[jD - k], \qquad 0 \le j \le nx$
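The decimation formula keeps only every D-th output of an ordinary FIR filter, so the skipped outputs never need to be computed. A minimal double-precision sketch follows. It is not the DSPLIB routine: the name ref_firdec is hypothetical, it sums over the nh taps h[0] through h[nh−1], produces nx/D outputs, and assumes zero input history before x[0] instead of using a delay buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model (not the DSPLIB routine): decimating FIR.
 * Computes r[j] = sum over k of h[k] * x[j*D - k] for j = 0 .. nx/D - 1,
 * treating samples before x[0] as zero. */
void ref_firdec(const double *x, const double *h, double *r,
                size_t nx, size_t nh, size_t D)
{
    assert(D >= 1 && nx % D == 0);
    for (size_t j = 0; j < nx / D; j++) {
        double acc = 0.0;
        /* Stop at k = j*D so that the index j*D - k never goes negative. */
        for (size_t k = 0; k < nh && k <= j * D; k++)
            acc += h[k] * x[j * D - k];
        r[j] = acc;
    }
}
```

With D = 1 the loop reduces to a plain FIR filter; larger values of D shrink the number of inner-loop passes by the same factor.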
Benchmarks (preliminary)
Cycles Core: (nx/D)*(10+nh+(D−1))
Overhead 67
Code size 144
(in bytes)
Function ushort oflag = firinterp (DATA *x, DATA *h, DATA *r, DATA *dbuffer , ushort nh,
ushort nx, ushort I)
(defined in interp.asm)
Arguments
x [nx] Pointer to real input vector of nx real elements.
h[nh] Pointer to coefficient vector of size nh in normal
order:
H = b0 b1 b2 b3 …
r[nx*I] Pointer to real output vector of nx*I real elements.
In-place computation (r = x) is allowed.
dbuffer[(nh/I)+1] Delay buffer of (nh/I)+1 elements
- In the case of multiple-buffering schemes, this
array should be initialized to 0 for the first block
only. Between consecutive blocks, the delay buff-
er preserves delayed input samples in dbuf-
fer[1…(nh/I)+1]. It also preserves a ptr to the next
new entry into the dbuffer. This ptr is preserved
across function calls in dbuffer[0].
- The delay buffer is only nh/I elements and holds
only delayed x inputs. No zero-samples are in-
serted into dbuffer (since only non-zero products
contribute to the filter output)
Description Computes an interpolating real FIR filter (direct-form) using the coefficients stored
in vector h. The real data input is stored in vector x. The filter output result is
stored in vector r. This function retains the address of the delay filter memory
d containing the previous delayed values to allow consecutive processing of
blocks. This function can be used for both block-by-block and sample-by-
sample filtering (nx = 1).
Algorithm $r[t] = \sum_{k=0}^{nh} h[k]\, x\!\left[\frac{t-k}{I}\right], \qquad 0 \le t \le nr$
Benchmarks (preliminary)
Cycles Core:
If I > 1
nx*(2+I*(1+(nh/I)))
If I=1 :
nx*(2+nh)
Overhead 72
Function ushort oflag = firlat (DATA *x, DATA *h, DATA *r, DATA *pbuffer, int nx, int nh)
Arguments
x [nx] Pointer to real input vector of nx real elements in normal
order:
x[0]
x[1]
.
.
x[nx−2]
x[nx−1]
h[nh] Pointer to lattice coefficient vector of size nh in normal
order:
h[0]
h[1]
.
.
h[nh−2]
h[nh−1]
r[nx] Pointer to output vector of nx real elements. In-place
computation (r = x) is allowed.
r[0]
r[1]
.
.
r[nx−2]
r[nx−1]
Description Computes a real lattice FIR filter implementation using the coefficients stored in
vector h. The real data input is stored in vector x. The filter output result is
stored in vector r. This function retains the address of the delay filter memory
d containing the previous delayed values to allow consecutive processing of
blocks. This function can be used for both block-by-block and
sample-by-sample filtering (nx = 1).
Benchmarks (preliminary)
Cycles† Core: nx * [4 + 4 * (nh − 1)]
Overhead: 23
Code size 53
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = firs (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nx,
ushort nh2)
Arguments
x[nx] Pointer to input vector of nx real elements.
r[nx] Pointer to output vector of nx real elements. In-place
computation (r = x) is allowed.
h[nh2] - Pointer to coefficient vector containing the first half
of the symmetric filter coefficients. For example, if
the filter coefficients are {h0, h1, h2, h2, h1, h0},
then h[nh2] = {h0, h1, h2} where h0 resides at the
lowest memory address in the array.
- This array must be located in internal memory
since it is accessed by the C55x coefficient bus.
dbuffer[2*nh2 + 2] Pointer to delay buffer of length 2*nh2 + 2 (nh + 2, where nh = 2*nh2)
- In the case of multiple-buffering schemes, this
array should be initialized to 0 for the first filter
block only. Between consecutive blocks, the delay
buffer preserves the previous r output elements
needed.
- The first element in this array is special in that it
contains the array index of the oldest input entry in
the delay buffer. This is needed for multiple-buffer-
ing schemes, and should be initialized to 0 (like all
the other array entries) for the first block only.
nx Number of input samples
Description Computes a real FIR filter (direct-form) with nh2 symmetric coefficients using
the FIRS instruction approach. The filter is assumed to have a symmetric im-
pulse response, with the first half of the filter coefficients stored in the array h.
The real input data is stored in vector x. The filter output result is stored in vec-
tor r. This function maintains the array dbuffer containing the previous delayed
input values to allow consecutive processing of input data blocks. This function
can be used for both block-by-block (nx ≥ 2) and sample-by-sample filtering
(nx = 1). In-place computation (r = x) is allowed.
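The saving from a symmetric impulse response comes from adding the two input samples that share a coefficient before multiplying. The sketch below illustrates that folding in double precision. It is not the DSPLIB routine: the name ref_firs is hypothetical, samples before x[0] are treated as zero instead of being read from dbuffer, and r must not overlap x in this simplified form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model (not the DSPLIB routine): symmetric FIR.
 * h[] holds the first nh2 coefficients of a filter of length 2*nh2 whose
 * second half mirrors the first. Samples before x[0] are taken as zero. */
void ref_firs(const double *x, const double *h, double *r,
              size_t nx, size_t nh2)
{
    size_t nh = 2 * nh2;                  /* full filter length */
    assert(nh2 >= 1);
    for (size_t j = 0; j < nx; j++) {
        double acc = 0.0;
        for (size_t k = 0; k < nh2; k++) {
            size_t m = nh - 1 - k;        /* mirror tap that shares h[k] */
            double a = (j >= k) ? x[j - k] : 0.0;
            double b = (j >= m) ? x[j - m] : 0.0;
            acc += h[k] * (a + b);        /* one multiply serves two taps */
        }
        r[j] = acc;
    }
}
```

Feeding a unit impulse through the model with h = {1, 2, 3} returns the full symmetric response 1, 2, 3, 3, 2, 1.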
Special Requirements
- nh must be a minimum value of 3. For smaller filters, zero pad the h[] array.
Implementation Notes The first element in the dbuffer array (index = 0) is the entry index for the input
history. It is treated as an unsigned 16-bit value by the function even though
it has been declared as signed in C. The value of the entry index is equal to
the index − 1 of the oldest input entry in the array. The remaining elements
make up the input history. Figure 4−22 shows the array in memory with an
entry index of 2. The newest entry in the dbuffer is denoted by x(j−0), which
in this case would occupy index = 3 in the array. The next newest entry is
x(j−1), and so on. It is assumed that all x() entries were placed into the array
by the previous invocation of the function in a multiple-buffering scheme.
The dbuffer array actually contains one more history value than is needed to
implement this filter. The value x(j−2*nh2) does not enter into the calculations
for the output r(j). However, this value is required in other DSPLIB filter functions
that utilize the dual-MAC units on the C55x, such as FIR2. Including this
extra location ensures compatibility across all filter functions in the C55x
DSPLIB.
Figure 4−22, Figure 4−23, and Figure 4−24 show the dbuffer, x, and r arrays
as they appear in memory.
[Figures 4−22, 4−23, and 4−24: memory layouts of the dbuffer, x, and r arrays; in the x array the newest entry, x(nx−1), is at the highest memory address.]
Benchmarks (preliminary)
Cycles† Core: nx[5 + (nh−2)]
Overhead: 72
Code size 133
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort errorcode = fltoq15 (float *x, DATA *r, ushort nx)
(defined in fltoq15.asm)
Arguments
x[nx] Pointer to floating-point input vector of size nx. x should
contain the numbers normalized between (−1,1). The
errorcode returned value will reflect if that condition is not
met.
Description Convert the IEEE floating-point numbers stored in vector x into Q15 numbers
stored in vector r. The function returns an error code if any element x[i] is not
representable in Q15 format.
All values that exceed the size limit will be saturated to a Q15 1 or −1 depend-
ing on sign (0x7fff if value is positive, 0x8000 if value is negative). All values
too small to be correctly represented will be truncated to 0.
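The saturation and truncation rules just described can be sketched in portable C. This is an illustrative model, not the DSPLIB routine: the name ref_fltoq15 is hypothetical, int16_t stands in for DATA, the error value 1 is an assumption (the text does not list the actual codes), and only out-of-range inputs are flagged, not values that truncate to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model (not the DSPLIB routine): float to Q15.
 * Returns 1 if any input had to be saturated, otherwise 0 (assumed codes). */
unsigned short ref_fltoq15(const float *x, int16_t *r, size_t nx)
{
    unsigned short err = 0;
    for (size_t i = 0; i < nx; i++) {
        float scaled = x[i] * 32768.0f;   /* Q15 has 15 fractional bits */
        if (scaled >= -32768.0f && scaled < 32768.0f) {
            r[i] = (int16_t)scaled;       /* truncates toward zero */
        } else {
            /* Out of range (or NaN): saturate to 0x8000 or 0x7fff. */
            r[i] = (scaled < 0.0f) ? INT16_MIN : INT16_MAX;
            err = 1;
        }
    }
    return err;
}
```

Note that +1.0 itself is out of range in Q15 and saturates to 0x7fff, while −1.0 maps exactly to 0x8000.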
Benchmarks (preliminary)
Cycles† Core: 17 * nx (if x[n] ==0)
23 * nx (if x[n] is too small for Q15
representation)
32 * nx (if x[n] is too large for Q15
representation)
38 * nx (otherwise)
Overhead: 23
Code size 157
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = hilb16 (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nx,
ushort nh)
Arguments
x[nx] Pointer to input vector of nx real elements.
h[nh] - Pointer to coefficient vector of size nh in normal
order, H = {h0, h1, h2, h3, h4, …}. Every odd-indexed
filter coefficient must be 0, that is, h1 = h3 = … = 0, so
H = {h0, 0, h2, 0, h4, 0, …}, where h0 resides at the
lowest memory address in the array.
Description Computes a real FIR filter (direct-form) using the coefficients stored in vector
h. The real input data is stored in vector x. The filter output result is stored in
vector r. This function maintains the array dbuffer containing the previous
delayed input values to allow consecutive processing of input data blocks. This
function can be used for both block-by-block (nx >= 2) and sample-by-sample
filtering (nx = 1). In place computation (r = x) is allowed.
Algorithm $r[j] = \sum_{k=0}^{nh-1} h[k]\, x[j-k], \qquad 0 \le j \le nx$
Special Requirements
- Every odd valued filter coefficient has to be 0. This is a requirement for the
Hilbert transformer. For example, a 6 tap filter may look like this: H = [0.867
0 –0.324 0 –0.002 0]
- Always zero-pad H so that nh is an even number. For example, a 5-tap filter with
a zero pad may look like this: H = [0.867 0 –0.324 0 –0.002 0]
- nh must be a minimum value of 6. For smaller filters, zero pad the H[] array.
Implementation Notes The first element in the dbuffer array (index = 0) is the entry index for the input
history. It is treated as an unsigned 16-bit value by the function even though
it has been declared as signed in C. The value of the entry index is equal to
the index − 1 of the oldest input entry in the array. The remaining elements
make up the input history. Figure 4−25 shows the array in memory with an
entry index of 2. The newest entry in the dbuffer is denoted by x(j−0), which
in this case would occupy index = 3 in the array. The next newest entry is
x(j−1), and so on. It is assumed that all x() entries were placed into the array
by the previous invocation of the function in a multiple-buffering scheme.
The dbuffer array actually contains one more history value than is needed to
implement this filter. The value x(j−nh) does not enter into the calculations for
the output r(j). However, this value is required in other DSPLIB filter functions
that utilize the dual-MAC units on the C55x, such as FIR2. Including this
extra location ensures compatibility across all filter functions in the C55x
DSPLIB.
Figure 4−25, Figure 4−26, and Figure 4−27 show the dbuffer, x, and r arrays
as they appear in memory.
[Figures 4−25, 4−26, and 4−27: memory layouts of the dbuffer, x, and r arrays; in the x array the newest entry, x(nx−1), is at the highest memory address.]
Benchmarks (preliminary)
Cycles Core: nx*(2+nh/2)
Overhead: 28
Code size 108
(in bytes)
Function ushort oflag = iir32 (DATA *x, LDATA *h, DATA *r, LDATA *dbuffer, ushort nbiq,
ushort nr)
(defined in iir32.asm)
Arguments
x [nr] Pointer to input data vector of size nr
h[5*nbiq] Pointer to the 32-bit filter coefficient vector with the
following format. For example for nbiq= 2, h is equal
to:
b21 – high beginning of biquad 1
b21 – low
b11 – high
b11 – low
b01 – high
b01 – low
a21 – high
a21 – low
a11 – high
a11 – low
b22 – high beginning of biquad 2 coefs
b22 – low
b12 – high
b12 – low
b02 – high
b02 – low
a22 – high
a22 – low
a12 – high
a12 – low
r[nr] Pointer to output data vector of size nr. r can be
equal or less than x.
Description Computes a cascaded IIR filter of nbiq biquad sections using 32-bit
coefficients and 32-bit delay buffers. The input data is assumed to be
single-precision (16 bits).
Each biquad section is implemented using Direct-form II. All biquad coeffi-
cients (5 per biquad) are stored in vector h. The real data input is stored in vec-
tor x. The filter output result is stored in vector r .
This function retains the address of the delay filter memory d containing the
previous delayed values to allow consecutive processing of blocks. This func-
tion is more efficient for block-by-block filter implementation due to the C-call-
ing overhead. However, it can be used for sample-by-sample filtering (nx = 1).
Benchmarks (preliminary)
Cycles Core: nx*(7+ 31*nbiq)
Overhead: 77
Code size 203
(in bytes)
Function ushort oflag = iircas4 (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nbiq,
ushort nx)
(defined in iir4cas4.asm)
Arguments
x [nx] Pointer to input data vector of size nx
h[4*nbiq] Pointer to filter coefficient vector with the following
format:
h = a11 b11 a21 b21 ....a1i b1i a2i b2i
where i is the biquad index (a21 is the a2 coefficient
of biquad 1). Pole (recursive) coefficients = a. Zero
(non-recursive) coefficients = b
Description Computes a cascade IIR filter of nbiq biquad sections. Each biquad section is
implemented using Direct-form II. All biquad coefficients (4 per biquad) are
stored in vector h. The real data input is stored in vector x. The filter output
result is stored in vector r.
This function retains the address of the delay filter memory d containing the
previous delayed values to allow consecutive processing of blocks. This func-
tion is more efficient for block-by-block filter implementation due to the C-call-
ing overhead. However, it can be used for sample-by-sample filtering (nx = 1).
Benchmarks (preliminary)
Cycles† Core: nx * (2 + 3 * nbiq)
Overhead: 44
Code size 122
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = iircas5 (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nbiq,
ushort nx)
(defined in iircas5.asm)
Arguments
x [nx] Pointer to input data vector of size nx
h[5*nbiq] Pointer to filter coefficient vector with the following
format:
h = a11 b11 a21 b21 b01 ... a1i b1i a2i b2i b0i
where i is the biquad index (a21 is the a2 coefficient
of biquad 1). Pole (recursive) coefficients = a. Zero
(non-recursive) coefficients = b
Description Computes a cascade IIR filter of nbiq biquad sections. Each biquad section is
implemented using Direct-form II. All biquad coefficients (5 per biquad) are
stored in vector h. The real data input is stored in vector x. The filter output
result is stored in vector r.
This function retains the address of the delay filter memory d containing the
previous delayed values to allow consecutive processing of blocks. This func-
tion is more efficient for block-by-block filter implementation due to the C-call-
ing overhead. However, it can be used for sample-by-sample filtering (nx = 1).
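A cascade of Direct-form II biquads with five coefficients each can be sketched as follows. This is a double-precision illustration, not the DSPLIB routine: the name ref_iircas5 is hypothetical, the delay memory is a plain array of two values per biquad, and the sign convention d(n) = x(n) − a1*d(n−1) − a2*d(n−2) is an assumption, since the text does not state how the pole coefficients are applied. The coefficient order a1 b1 a2 b2 b0 follows the argument description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model (not the DSPLIB routine): cascade of
 * Direct-form II biquads. h[] holds a1 b1 a2 b2 b0 per biquad; d[] holds
 * two delay values per biquad and is zeroed before the first block. */
void ref_iircas5(const double *x, const double *h, double *r, double *d,
                 size_t nbiq, size_t nx)
{
    for (size_t n = 0; n < nx; n++) {
        double s = x[n];                  /* signal passed down the cascade */
        for (size_t i = 0; i < nbiq; i++) {
            const double *c = &h[5 * i];  /* c[0]=a1 c[1]=b1 c[2]=a2 c[3]=b2 c[4]=b0 */
            double *w = &d[2 * i];        /* w[0]=d(n-1), w[1]=d(n-2) */
            /* Assumed sign convention for the recursive part. */
            double d0 = s - c[0] * w[0] - c[2] * w[1];
            s = c[4] * d0 + c[1] * w[0] + c[3] * w[1];
            w[1] = w[0];
            w[0] = d0;
        }
        r[n] = s;
    }
}
```

Because the delay values live in d between calls, a long input can be processed block by block with the same result as a single call.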
Benchmarks (preliminary)
Cycles† Core: nx * (5 + 5 * nbiq)
Overhead: 60
Code size 126
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = iircas51 (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nbiq,
ushort nx)
(defined in iircas51.asm)
Arguments
x [nx] Pointer to input data vector of size nx
h[5*nbiq] Pointer to filter coefficient vector with the following
format:
h = b01 b11 b21 a11 a21 ... b0i b1i b2i a1i a2i
where i is the biquad index (a21 is the a2 coefficient
of biquad 1). Pole (recursive) coefficients = a. Zero
(non-recursive) coefficients = b
Description Computes a cascade IIR filter of nbiq biquad sections. Each biquad section is
implemented using Direct-form I. All biquad coefficients (5 per biquad) are
stored in vector h. The real data input is stored in vector x. The filter output
result is stored in vector r.
This function retains the address of the delay filter memory d containing the
previous delayed values to allow consecutive processing of blocks. Because of
the C-calling overhead, this function is more efficient for block-by-block
filter implementation. However, it can be used for sample-by-sample filtering
(nx = 1).
Benchmarks (preliminary)
Cycles† Core: nx * (5 + 8 * nbiq)
Overhead: 68
Code size 154
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = iirlat (DATA *x, DATA *h, DATA *r, DATA *dbuffer, int nx, int nh)
Arguments
x [nx] Pointer to real input vector of nx real elements in normal
order:
x[0]
x[1]
.
.
x[nx−2]
x[nx−1]
h[nh] Pointer to lattice coefficient vector of size nh in normal
order with the first element zero-padded:
0
h[0]
h[1]
.
.
h[nh−2]
h[nh−1]
nh Number of coefficients
oflag Overflow error flag
- If oflag = 1, a 32-bit data overflow has occurred in an
intermediate or final result.
- If oflag = 0, a 32-bit overflow has not occurred.
Description Computes a real lattice IIR filter using the coefficients stored in vector h.
The real data input is stored in vector x. The filter output result is stored
in vector r. This function retains the address of the delay filter memory d
containing the previous delayed values to allow consecutive processing of blocks.
This function can be used for both block-by-block and sample-by-sample
filtering (nx = 1).
Benchmarks (preliminary)
Cycles† Core: 4 * (nh − 1) * nx
Overhead: 24
Code size 54
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function void ldiv16 (LDATA *x, DATA *y, DATA *r, DATA *rexp, ushort nx)
Arguments
x [nx] Pointer to input data vector 1 of size nx
x[0]
x[1]
.
.
x[nx−2]
x[nx−1]
Description This routine implements a long division function of a Q31 value divided by a
Q15 value. The reciprocal of the Q15 value, y, is calculated then multiplied by
the Q31 value, x. The result is returned as an exponent such that:
Algorithm The reciprocal of the Q15 number is calculated using the following equation:
$$Y_m = 2 \cdot Y_m - Y_m^{2} \cdot X_{norm}$$
The initial estimate can be obtained from a look-up table, from choosing a mid-
point, or simply from linear interpolation. The method chosen for this problem
is linear interpolation and is accomplished by taking the complement of the
least significant bits of the Xnorm value.
Benchmarks (preliminary)
Cycles† Core: 4 * nx
Overhead: 14
Code size 91
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = log_10 (DATA *x, LDATA *r, ushort nx)
(defined in log_10.asm)
Arguments
x[nx] Pointer to input vector of size nx.
r[nx] Pointer to output data vector (Q31 format) of size nx.
nx Length of input and output data vectors
oflag Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.
Description Computes the log base 10 of elements of vector x using Taylor series.
The coefficients Bi used in the calculation are derived from the Ci as follows:
Benchmarks (preliminary)
Cycles† Core: 35 * nx
Overhead: 36
Code size 162
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = log_2 (DATA *x, LDATA *r, ushort nx)
(defined in log_2.asm)
Arguments
x[nx] Pointer to input vector of size nx.
r[nx] Pointer to output data vector (Q31 format) of size nx.
nx Length of input and output data vectors
oflag Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.
Description Computes the log base 2 of elements of vector x using Taylor series.
The coefficients Bi used in the calculation are derived from the Ci as follows:
Benchmarks (preliminary)
Cycles† Core: 36 * nx
Overhead: 37
Code size 166
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = logn (DATA *x, LDATA *r, ushort nx)
(defined in logn.asm)
Arguments
x[nx] Pointer to input vector of size nx.
r[nx] Pointer to output data vector (Q31 format) of size nx.
nx Length of input and output data vectors
oflag Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.
Description Computes the log base e of elements of vector x using Taylor series.
maxidx
Special Requirements
- ng_size is an even number between 2 and 34.
- nx is an even number.
maxidx34
Benchmarks (preliminary)
Cycles† Core: nx/2 + ng16
Overhead: 40
Code size 143
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Arguments
x[nx] Pointer to input vector of size nx.
r Index for vector element with maximum value.
nx Length of input data vector (nx ≤ 34).
Description Returns the index of the maximum element of a vector x. The index is a number
between 0 and nx − 1. In case of multiple maximum elements, r contains the
index of the first maximum element found.
Algorithm Not applicable
Overflow Handling Methodology Not applicable
Special Requirements Size of the vector, nx ≤ 34
nx is an even number.
Input vector has to be 32-bit aligned.
Implementation Notes none
Example See examples/maxidx34 subdirectory
Benchmarks (preliminary)
Cycles† Core: nx/2
Overhead: 42
Code size 26
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function void maxvec (DATA *x, ushort nx, DATA *r_val, DATA *r_idx)
(defined in maxvec.asm)
Arguments
x[nx] Pointer to input vector of size nx.
r_val maximum value
r_idx Index for vector element with maximum value
nx Length of input data vector (nx ≥ 6)
Description This function finds the index for vector element with maximum value. In case
of multiple maximum elements, r_idx contains the index of the first maximum
element found. r_val contains the maximum value.
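The tie-breaking rule (first maximum wins) is easy to get wrong in a reference check. The sketch below shows one way to express it in C; the name maxvec_ref and the use of size_t for the index are illustrative choices, not the library's interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference search for the first maximum of a 16-bit vector.
 * The name maxvec_ref is illustrative; nx must be at least 1. */
void maxvec_ref(const int16_t *x, size_t nx, int16_t *r_val, size_t *r_idx)
{
    *r_val = x[0];
    *r_idx = 0;
    for (size_t i = 1; i < nx; i++) {
        if (x[i] > *r_val) {   /* strict '>' keeps the first maximum on ties */
            *r_val = x[i];
            *r_idx = i;
        }
    }
}
```

Replacing the strict comparison with '>=' would return the last maximum instead, which would not match the behavior described above.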
Benchmarks (preliminary)
Cycles Core: nx*3
Overhead: 8
Code size 26
(in bytes)
Arguments
x[nx] Pointer to input vector of size nx.
r Index for vector element with minimum value
nx Length of input data vector
Description Returns the index of the minimum element of a vector x. In case of multiple
minimum elements, r contains the index of the first minimum element found.
Special Requirements
Function void minvec (DATA *x, ushort nx, DATA *r_val, DATA *r_idx)
(defined in minvec.asm)
Arguments
x[nx] Pointer to input vector of size nx.
r_val Minimum value
r_idx Index for vector element with minimum value
nx Length of input data vector (nx ≥ 6)
Description This function finds the index for vector element with minimum value. In case
of multiple minimum elements, r_idx contains the index of the first minimum
element found. r_val contains the minimum value.
Benchmarks (preliminary)
Cycles Core: nx*3
Overhead: 8
Code size 26
(bytes)
Arguments
x1[row1*col1] Pointer to input matrix of size row1*col1
row1 number of rows in matrix 1
col1 number of columns in matrix 1
x2[row2*col2]: Pointer to input matrix of size row2*col2
row2 number of rows in matrix 2
col2 number of columns in matrix 2
r[row1*col2] Pointer to output matrix of size row1*col2
Special Requirements
- Verify that the dimensions of input matrices are legal, i.e. col1 == row2
Implementation Notes To take advantage of the dual MAC architecture of the C55x, this
implementation checks the size of the matrix x1. For small matrices x1 (row1 < 4 or
col1 < 2), single MAC loops are used. For larger matrices x1 (row1 ≥ 4 and
col1 ≥ 2), dual MAC loops are more efficient and quickly make up for the
additional initialization overhead.
Benchmarks (preliminary)
Cycles† Core:
- if (row1 < 4 || col1 < 2), single MAC is used:
((col1 + 2) * row1 + 4) * col2
- if (row1 is even && row1 ≥ 4 && col1 ≥ 2), dual MAC is used:
((col1 + 4) * 0.5 * row1 + 10) * col2
- if (row1 is odd && row1 ≥ 4 && col1 ≥ 2), dual MAC is used:
((col1 + 4) * 0.5 * (row1 − 1) + col1 + 12) * col2
Overhead: 30
Code size 215
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
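For planning purposes, the three core formulas above can be folded into one helper. This is a sketch only: the name mmul_cycles is hypothetical, and it assumes that the total is the core count plus the fixed 30-cycle overhead.

```c
/* Estimated mmul cycle count from the benchmark formulas above.
 * Assumption: total = core + overhead (30 cycles). */
unsigned long mmul_cycles(unsigned row1, unsigned col1, unsigned col2)
{
    unsigned long core;
    if (row1 < 4 || col1 < 2) {
        /* single MAC loops */
        core = ((unsigned long)(col1 + 2) * row1 + 4) * col2;
    } else if (row1 % 2 == 0) {
        /* dual MAC loops, even number of rows */
        core = ((unsigned long)(col1 + 4) * row1 / 2 + 10) * col2;
    } else {
        /* dual MAC loops, odd number of rows */
        core = ((unsigned long)(col1 + 4) * (row1 - 1) / 2 + col1 + 12) * col2;
    }
    return core + 30;
}
```

For a 4x4 by 4x4 product this gives 104 core cycles plus overhead, compared with 112 core cycles if the single MAC formula were applied to the same sizes.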
Function ushort oflag = mtrans (DATA *x, short row, short col, DATA *r)
(defined in mtrans.asm)
Arguments
x[row*col] Pointer to input matrix. In-place processing is not allowed.
row number of rows in matrix
col number of columns in matrix
r[row*col] Pointer to output data vector
Algorithm for i = 1 to M
              for j = 1 to N
                  C(j,i) = A(i,j)
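The loop above translates directly into C with zero-based indices. The sketch assumes row-major storage and uses the illustrative name mtrans_ref; as with the library routine, input and output must not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Out-of-place transpose of a row-major row x col matrix.
 * Row-major storage is an assumption; r must not overlap x. */
void mtrans_ref(const int16_t *x, size_t row, size_t col, int16_t *r)
{
    for (size_t i = 0; i < row; i++)
        for (size_t j = 0; j < col; j++)
            r[j * row + i] = x[i * col + j];   /* C(j,i) = A(i,j) */
}
```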
Benchmarks (preliminary)
Cycles† Core: (1 + col) * row
Overhead: 23
Code size 65
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = mul32 (LDATA *x, LDATA *y, LDATA *r, ushort nx)
(defined in mul32.asm)
Arguments
x[nx] Pointer to input data vector of size nx. In-place processing
allowed (r can be = x = y)
Description This function multiplies two 32-bit Q31 vectors, element by element, and
produces a 32-bit Q31 vector.
Overflow Handling Methodology Scaling implemented for overflow prevention (user selectable)
Special Requirements
Code size 73
(in bytes)
Function ushort oflag = neg (DATA *x, DATA *r, ushort nx)
(defined in neg.asm)
Arguments
x[nx] Pointer to input data vector 1 of size nx. In-place processing
allowed (r can be = x = y)
r[nx] Pointer to output data vector of size nx. In-place processing
allowed
Special cases:
- if x[i] = −1 (−32768), then r[i] = 1 (32767) with oflag = 1
- if x[i] = 1 (32767), then r[i] = −1 (−32768) with oflag = 1
Description This function negates each of the elements of a vector (fractional values).
Benchmarks (preliminary)
Cycles† Core: 4 * nx
Overhead: 13
Code size 61
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = neg32 (LDATA *x, LDATA *r, ushort nx)
(defined in neg.asm)
Arguments
x[nx] Pointer to input data vector of size nx. In-place processing
allowed (r can be = x = y)
Description This function negates each of the elements of a vector (fractional values).
Special Requirements
Benchmarks (preliminary)
Cycles† Core: 4 * nx
Overhead: 13
Code size 61
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = power (DATA *x, LDATA *r, ushort nx)
(defined in power.asm)
Arguments
x[nx] Pointer to input data vector of size nx. In-place processing
allowed (r can be = x = y)
r[1] Pointer to output data vector element in Q31 format
Special cases:
- if x = −1 (−32768 * 2^16), then r = 1 (32767 * 2^16)
with oflag = 1
- if x = 1 (32767 * 2^16), then r = −1 (−32768 * 2^16)
with oflag = 1
nx Number of elements of input vectors.
nx ≥ 4
oflag Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.
Special Requirements
Benchmarks (preliminary)
Cycles† Core: nx − 1
Overhead: 12
Code size 54
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Arguments
x[nx] Pointer to Q15 input vector of size nx.
r[nx] Pointer to floating-point output data vector of size nx
containing the floating-point equivalent of vector x.
Description Converts the Q15 stored in vector x to IEEE floating-point numbers stored in
vector r.
Special Requirements
Benchmarks (preliminary)
rand16
Arguments
*r Pointer to the array where the 16-bit random numbers are
stored
nr Number of random numbers that are generated
oflag Overflow error flag (returned value)
- If oflag = 1, a 32-bit data overflow occurred in an
intermediate or final result.
- If oflag = 0, a 32-bit overflow has not occurred.
Description This algorithm computes an array of random numbers based on the linear
congruential method introduced by D. Lehmer in 1951. This is one of the fastest
and simplest techniques for generating random numbers. The code shown
here generates 16-bit integers; if a 32-bit value is desired, the code
can be modified to perform 32-bit multiplies using the defined constants
RNDMULT and RNDINC. The disadvantage of this technique is that it is very
sensitive to the choice of RNDMULT and RNDINC.
Implementation Notes rand16() is written so that it can be called from a C program. Prior to calling
rand16(), rand16i() can be called to initialize the random number generator
seed value. The C routine passes two parameters to rand16(): a pointer to the
random number array *r and the count of random numbers (nr) desired. The
random numbers are declared as short (16-bit) values. Two constants,
RNDMULT and RNDINC, are defined in the function. The algorithm is sensitive
to the choice of RNDMULT and RNDINC, so exercise caution when changing
these.
M This value is based on the system on which the routine runs. This
routine returns a random number from 0 to 65535 (64K values) and
is NOT internally bounded. If you need a min/max limit, this
must be coded externally to this routine.
RNDMULT Should be chosen such that the last three digits fall in the
pattern even_digit−2−1 such as xx821, xx421 etc.
RNDMULT = 31821 is used in this routine.
RNDINC In general, this constant can be any prime number related to
M. Research shows that RNDINC (the increment value)
should be chosen by the following formula:
RNDINC = ((1/2 − (1/6 * SQRT(3))) * M). Using M=65536,
RNDINC was picked as 13849.
The random seed initialized in rand16i() is used to generate the first random
number. Each random number generated is used to generate the next number
in the series. The random number is generated in the accumulator (32 bits) by
using the multiply-accumulate (MAC) unit to do the computation. In the course
of the algorithm if there is intermediate overflow, the overflow flag bit in status
register is set. At the end of the algorithm, the overflow flag is tested for any
intermediate overflow conditions.
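The recurrence described above can be modeled in portable C as follows. This is a sketch, not the library source: the names rand16_ref and rand16_ref_init and the default seed of 1 are hypothetical, and the modulus M = 65536 is obtained here simply by truncating to 16 bits. Only RNDMULT = 31821 and RNDINC = 13849 come from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define RNDMULT 31821u   /* multiplier quoted in the text */
#define RNDINC  13849u   /* increment quoted in the text  */

/* Hypothetical seed; the value used by the library is not given here. */
static uint16_t rndseed = 1u;

void rand16_ref_init(uint16_t seed)
{
    rndseed = seed;
}

/* Fills r[0..nr-1]; the modulus M = 65536 comes from 16-bit truncation. */
void rand16_ref(uint16_t *r, size_t nr)
{
    for (size_t i = 0; i < nr; i++) {
        rndseed = (uint16_t)((uint32_t)rndseed * RNDMULT + RNDINC);
        r[i] = rndseed;          /* each value seeds the next one */
    }
}
```

Starting from a seed of 1, the first two values are 45670 and 18119. A caller that needs a bounded range has to map the result externally, as noted for M above.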
rand16init
Benchmarks
Cycles Core: 13 + nr*2
Overhead: 10
Code size 49
(in bytes)
Arguments none
Implementation Notes This function initializes a global variable rndseed in global memory to be used
for the 16 bit random number generation routine (rand16)
Benchmarks
Cycles 6
Code size 9
(in bytes)
Function void recip16 (DATA *x, DATA *r, DATA *rexp, ushort nx)
Arguments
x[nx] Pointer to input data vector 1 of size nx.
x[0]
x[1]
.
.
x[nx−2]
x[nx−1]
Description This routine returns the fractional and exponential portion of the reciprocal of
a Q15 number. Since the reciprocal is always greater than 1, it returns an
exponent such that:
Algorithm $$Y_m = 2 \cdot Y_m - Y_m^{2} \cdot X_{norm}$$
The initial estimate can be obtained from a look-up table, from choosing a mid-
point, or simply from linear interpolation. The method chosen for this problem
is linear interpolation and is accomplished by taking the complement of the
least significant bits of the Xnorm value.
Benchmarks (preliminary)
Cycles† Core: 33 * nx
Overhead: 12
Code size 69
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Arguments
x [nx] Pointer to input vector containing nx real elements. On output,
vector x contains the first half (nx/2 complex elements) of the
FFT output in the following order. Real FFT is a symmetric
function around the Nyquist point, and for this reason only half
of the FFT(x) elements are required.
On output x will contain the FFT(x) = y in the following format:
y(0)Re y(nx/2)im → DC and Nyquist
y(1)Re y(1)Im
y(2)Re y(2)Im
....
y(nx/2)Re y(nx/2)Im
Complex numbers are stored in Re-Im format
nx Number of real elements in vector x. nx can take the following
values:
nx = 16, 32, 64, 128, 256, 512, 2048
Description Computes a Radix-2 real DIT FFT of the nx real elements stored in vector x
in normal order. The original content of vector x is destroyed in the process.
The first nx/2 complex elements of the RFFT(x) are stored in vector x in
normal order.
Algorithm (DFT)
See CFFT
Special Requirements
- Ensure that the entire data buffer fits within a 64K boundary (the
largest possible array addressable by the 16-bit auxiliary register).
- For best performance, the data buffer has to be in a DARAM block.
- If the twiddle table and the data buffer are in the same DARAM block,
then the radix-2 kernel is 7 cycles and the radix-4 kernel is not
affected.
Implementation Notes Implemented as a complex FFT of size nx/2 followed by an unpack stage to
unpack the real FFT results. Therefore, Implementation Notes for the cfft
function apply to this case.
[Figure: rfft( ) is computed as cfft( ) of size N/2 followed by unpack( ).]
- NOSCALE version
[Figure: C55 DSPLIB rfft( ) with NOSCALE compared against MATLAB cfft( ) of size N, multiplied by N/2.]
The unpack routine in the DSPLIB always scales the data by two, regardless of
whether the scaled or non-scaled rfft( ) is used. To compare the results with
MATLAB, the MATLAB results need to be multiplied by a factor of N/2 (N is
the rfft size).
- SCALE version
[Figure: C55 DSPLIB rfft( ) with SCALE compared against MATLAB cfft( ) of size N.]
The C55 DSPLIB scaled rfft results can be compared to the unmodified
MATLAB cfft results.
Arguments
x [nx] Pointer to input vector containing nx 32-bit real elements. On
output, vector x contains the first half (nx/2 complex elements)
of the FFT output in the following order. Real FFT is a
symmetric function around the Nyquist point, and for this
reason only half of the FFT(x) elements are required.
On output x will contain the FFT(x) = y in the following format:
y(0)Re y(nx/2)im → DC and Nyquist
y(1)Re y(1)Im
y(2)Re y(2)Im
....
y(nx/2)Re y(nx/2)Im
Complex numbers are stored in Re-Im format
nx Number of real elements in vector x. nx can take the following
values:
nx = 16, 32, 64, 128, 256, 512, 1024, 2048
type RFFT type selector. Types supported:
- If type = SCALE, scaled version selected
- If type = NOSCALE, non-scaled version selected
Description Computes a Radix-2 real DIT FFT of the nx real elements stored in vector x
in normal order. The original content of vector x is destroyed in the process.
The first nx/2 complex elements of the RFFT(x) are stored in vector x in
normal order.
Algorithm (DFT)
See CFFT
Special Requirements
- Ensure that the entire data buffer fits within a 64K boundary (the
largest possible array addressable by the 16-bit auxiliary register).
- For best performance, the data buffer has to be in a DARAM block.
rifft
Description Computes a Radix-2 real DIT IFFT of the nx real elements stored in vector x
in bit−reversed order. The original content of vector x is destroyed in the
process. The first nx/2 complex elements of the IFFT(x) are stored in vector
x in normal-order.
Algorithm (IDFT)
See CIFFT
Special Requirements
- Ensure that the entire data buffer fits within a 64K boundary (the
largest possible array addressable by the 16-bit auxiliary register).
- For best performance, the data buffer has to be in a DARAM block.
- If the twiddle table and the data buffer are in the same DARAM block,
then the radix-2 kernel is 7 cycles and the radix-4 kernel is not
affected.
Implementation Notes Implemented as a complex IFFT of size nx/2 followed by an unpack stage to
unpack the real IFFT results. Therefore, Implementation Notes for the cfft
function apply to this case.
rifft32
Arguments
x [nx] Pointer to input vector x containing nx 32-bit real elements.
On output, the vector x contains nx complex elements
corresponding to RIFFT(x) or the signal itself.
nx Number of real elements in vector x. nx can take the following
values.
nx =16, 32, 64, 128, 256, 512, 1024, 2048
type RFFT type selector. Types supported:
- If type = SCALE, scaled version selected
- If type = NOSCALE, non-scaled version selected
Description Computes a Radix-2 real DIT IFFT of the nx real elements stored in vector x
in bit−reversed order. The original content of vector x is destroyed in the
process. The first nx/2 complex elements of the IFFT(x) are stored in vector
x in normal-order.
Algorithm (IDFT)
See CIFFT
Special Requirements
- Ensure that the entire data buffer fits within a 64K boundary (the
largest possible array addressable by the 16-bit auxiliary register).
Implementation Notes Implemented as a complex IFFT of size nx/2 followed by an unpack stage to
unpack the real IFFT results. Therefore, Implementation Notes for the cifft32
function apply to this case.
sine Sine
Function ushort oflag = sine (DATA *x, DATA *r, ushort nx)
(defined in sine.asm)
Arguments
x[nx] Pointer to input vector of size nx. x contains the angle in
radians in the range [−π, π], normalized to (−1, 1) in Q15
format:
x = xrad / π
For example:
45° = π/4 is equivalent to x = 1/4 = 0.25 = 0x2000 in Q15
format.
r[nx] Pointer to output vector containing the sine of vector x in q15
format
nx Number of elements of input and output vectors.
nx ≥ 4
oflag Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.
Description Computes the sine of elements of vector x. It uses Taylor series to compute
the sine of angle x.
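Preparing the input for this routine means dividing the angle by π and converting to Q15. The helper below is a sketch of that step; the name rad_to_q15_angle and the round-to-nearest and saturation choices are assumptions for illustration.

```c
#include <stdint.h>

/* Converts an angle in radians (within [-pi, pi]) to the normalized Q15
 * input x = xrad / pi. +1.0 is not representable, so it saturates to 0x7FFF. */
int16_t rad_to_q15_angle(double xrad)
{
    const double pi = 3.14159265358979323846;
    double s = xrad / pi * 32768.0;
    long v = (long)(s >= 0.0 ? s + 0.5 : s - 0.5);   /* round to nearest */
    if (v > 32767) v = 32767;
    if (v < -32768) v = -32768;
    return (int16_t)v;
}
```

With this scaling, π/4 maps to 8192 (0x2000), which is 0.25 in Q15.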
Implementation Notes Computes the sine of elements of vector x. It uses the following Taylor series
to compute the sine of the angle x in quadrant 1 (0 to π/2):
$$\sin(\pi x) \approx c_1 x + c_2 x^{2} + c_3 x^{3} + c_4 x^{4} + c_5 x^{5}$$
c1 = 3.140625
c2 = 0.02026367
c3 = −5.3251
c4 = 0.5446778
c5 = 1.800293
Angles in the other quadrants are handled by using symmetries that map the
angle x into quadrant 1.
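A floating-point model of this approximation makes it easy to check the coefficients. The sketch below assumes that c1 through c5 multiply the first through fifth powers of the normalized angle x (the powers are not shown in the listing) and that the quadrant mapping uses the usual sine symmetries; the name sine_poly is illustrative.

```c
/* Evaluates sin(pi * x) for x in [-1, 1] with the five coefficients listed
 * above, assuming they multiply x^1..x^5 (an assumption of this sketch). */
double sine_poly(double x)
{
    static const double c[5] = {3.140625, 0.02026367, -5.3251,
                                0.5446778, 1.800293};
    double sign = 1.0;
    if (x < 0.0) { x = -x; sign = -1.0; }   /* sin(-a) = -sin(a) */
    if (x > 0.5) x = 1.0 - x;               /* sin(pi - a) = sin(a) */
    /* Horner evaluation of c1*x + c2*x^2 + c3*x^3 + c4*x^4 + c5*x^5 */
    double y = c[4];
    for (int i = 3; i >= 0; i--)
        y = y * x + c[i];
    return sign * y * x;
}
```

Under these assumptions the model returns about 0.70710 for x = 0.25 (45°) and about 1.00004 for x = 0.5 (90°), which is consistent with the coefficients being a fit for sin(πx) on the first quadrant.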
Benchmarks (preliminary)
Cycles† Core: 19 * nx
Overhead: 17
Code size 93 program; 3 data
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function ushort oflag = sqrt_16 (DATA *x, DATA *r, short nx)
(defined in sqrtv.asm)
Arguments
x[nx] Pointer to input vector of size nx.
r[nx] Pointer to output vector of size nx containing the sqrt(x).
nx Number of elements of input and output vectors.
oflag Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.
Description Calculates the square root for each element in input vector x, storing results
in output vector r.
Implementation Notes The square root of a number (x) can be calculated using Newton’s method. An
initial approximation is guessed, and the approximation is then recomputed
using the formula
$$\text{new approximation} = \text{old approximation} - \frac{(\text{old approximation})^{2} - x}{2}$$
The new approximation then becomes the old approximation and the process
is repeated until the desired accuracy is reached.
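The update rule can be tried out in floating point before looking at the fixed-point code. In this sketch the recurrence is new = old − (old² − x)/2 for a fractional input 0 ≤ x < 1; the starting guess of 0.5, the fixed iteration count, and the name sqrt_iter are assumptions, not values taken from the library.

```c
/* Iterates new = old - (old^2 - x) / 2 for a fractional input 0 <= x < 1.
 * The starting guess and iteration count are assumptions, not library values. */
double sqrt_iter(double x, int iterations)
{
    double y = 0.5;                       /* assumed initial approximation */
    for (int i = 0; i < iterations; i++)
        y = y - (y * y - x) / 2.0;
    return y;
}
```

Near the solution, each pass of this recurrence shrinks the error by a factor of roughly (1 − sqrt(x)), so inputs close to zero need more iterations than inputs close to one.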
Benchmarks (preliminary)
Cycles† Core: 35 * nx
Overhead: 14
Code size 84 program; 5 data
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
Function short oflag = sub (DATA *x, DATA *y, DATA *r, ushort nx, ushort scale)
(defined in sub.asm)
Arguments
x[nx] Pointer to input data vector 1 of size nx. In-place processing
allowed (r can be = x = y)
Overflow Handling Methodology Scaling implemented for overflow prevention (user selectable)
Benchmarks (preliminary)
Cycles† Core: 3 * nx
Overhead: 23
Code size 60
(in bytes)
† Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle
table reads and instruction fetches (provided linker command file reflects those conditions).
All functions in the DSPLIB are provided with execution time and code size
benchmarks. While developing the included functions, we tried to compromise
between speed, code size, and ease of use. However, with few exceptions, the
highest priority was given to optimize for speed and ease of use, and last for
code size.
What DSPLIB Benchmarks are Provided
For functions in which it is difficult to determine the number of cycles in the
kernel code as a function of the data size parameters, we have included direct
cycle count for specific data sizes.
- no pipeline hits
A linker command file showing the memory allocation used during testing and
benchmarking in the Code Composer C55x Simulator is included under the
example subdirectory.
Chapter 6
This chapter details the software updates and customer support issues for the
TMS320C55x DSPLIB.
DSPLIB Software Updates
We encourage the use of the software report form (report.txt) contained in the
DSPLIB root directory to report any problem associated with the C55x
DSPLIB.
Appendix A
Unless specifically noted, DSPLIB functions use Q15 format or, to be more
exact, Q0.15. In a Qm.n format, m bits represent the two’s complement
integer portion of the number, and n bits represent the two’s
complement fractional portion. m+n+1 bits are needed to store a general Qm.n
number. The extra bit stores the sign of the number in the most-significant
bit position. The representable integer range is $(-2^{m}, 2^{m})$
and the finest fractional resolution is $2^{-n}$.
For example, the most commonly used format is Q.15. Q.15 means that a
16-bit word is used to express a signed number between positive and negative
1. The most-significant binary digit is interpreted as the sign bit in any Q format
number. Thus in Q.15 format, the decimal point is placed immediately to the
right of the sign bit. The fractional portion to the right of the sign bit is stored
in regular two’s complement format.
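As a concrete illustration of the Q.15 convention, the following sketch converts between real values and 16-bit Q15 words. The function names and the round-to-nearest and saturation behavior are assumptions chosen for the example; they are not a description of the library's fltoq15 or q15tofl routines.

```c
#include <stdint.h>

/* Converts a real value in [-1, 1) to Q15 with rounding and saturation. */
int16_t float_to_q15(double v)
{
    double s = v * 32768.0;               /* scale by 2^15 */
    long q = (long)(s >= 0.0 ? s + 0.5 : s - 0.5);
    if (q > 32767) q = 32767;             /* +1.0 is not representable */
    if (q < -32768) q = -32768;
    return (int16_t)q;
}

/* Converts a Q15 value back to a real number; the resolution is 2^-15. */
double q15_to_float(int16_t q)
{
    return (double)q / 32768.0;
}
```

For example, 0.5 becomes 16384 (0x4000), −1.0 becomes −32768 (0x8000), and an input of +1.0 saturates to 32767 (0x7FFF).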
Q3.12 Format
Appendix B
The most efficient method for calculating the inverse of a fractional number
(Y = 1/X) is to normalize the number first. This limits the range of the number
as follows:
$$0.5 \le X_{norm} < 1$$
$$-1 \le X_{norm} \le -0.5 \qquad (1)$$
The resulting equation becomes
$$Y = \frac{1}{X_{norm} \cdot 2^{-n}}$$
or
$$Y = \frac{2^{n}}{X_{norm}} \qquad (2)$$
where n = 1, 2, 3, …, 14, 15
Letting $Y_e = 2^{n}$:
$$Y_e = 2^{n} \qquad (3)$$
Substituting (3) into equation (2):
$$Y = Y_e \cdot \frac{1}{X_{norm}} \qquad (4)$$
Letting $Y_m = \frac{1}{X_{norm}}$:
$$Y_m = \frac{1}{X_{norm}} \qquad (5)$$
Substituting (5) into equation (4):
$$Y = Y_e \cdot Y_m \qquad (6)$$
For the given range of Xnorm, the range of Ym is:
$$1 \le Y_m < 2$$
$$-2 \le Y_m \le -1 \qquad (7)$$
To calculate the value of Ym, various options are possible:
a) Taylor Series Expansion
b) 2nd, 3rd, 4th, ... Order Polynomial (Line Of Best Fit)
c) Successive Approximation
Calculating the Reciprocal of a Q15 Number
$$Y_m(\text{new}) = \frac{1}{X_{norm}} \qquad (c1)$$
or
$$Y_m(\text{new}) \cdot X_{norm} = 1 \qquad (c2)$$
$$\Delta y = \Delta xy \cdot \frac{1}{X_{norm}} \qquad (c5)$$
Assume that 1/Xnorm is approximately equal to Ym(old):
$$\Delta y = \Delta xy \cdot Y_m(\text{old}) \quad \text{(approx)} \qquad (c6)$$
Substituting (c6) into (c4):
$$Y_m(\text{new}) = Y_m(\text{old}) - \Delta xy \cdot Y_m(\text{old}) \qquad (c7)$$
Substituting for Δxy from (c3) into (c7):
$$Y_m(\text{new}) = Y_m(\text{old}) - (Y_m(\text{old}) \cdot X_{norm} - 1) \cdot Y_m(\text{old})$$
$$Y_m(\text{new}) = Y_m(\text{old}) - Y_m(\text{old})^{2} \cdot X_{norm} + Y_m(\text{old})$$
$$Y_m(\text{new}) = 2 \cdot Y_m(\text{old}) - Y_m(\text{old})^{2} \cdot X_{norm} \qquad (c8)$$
$$Y_m(\text{old}) = Y_m(\text{new}) = Y_m$$
$$Y_m = 2 \cdot Y_m - Y_m^{2} \cdot X_{norm} \qquad (c9)$$
If we start with an initial estimate of Ym, then equation (c9) converges to a
solution very rapidly (typically 3 iterations for 16-bit resolution).
The initial estimate can be obtained from a look-up table, from choosing a
midpoint, or simply from linear interpolation. The method chosen for this
problem is linear interpolation, accomplished by taking the complement of the
least significant bits of the Xnorm value.
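The normalization and the (c9) iteration can be modeled together in floating point. The sketch below is illustrative only: the name recip_newton is hypothetical, and the linear first guess 3 − 2·Xnorm (which matches 1/Xnorm at Xnorm = 0.5 and Xnorm = 1) stands in for the bit-complement estimate used by the library.

```c
/* Reciprocal of a nonzero x by normalization plus the (c9) iteration.
 * The linear first guess 3 - 2*Xnorm is an illustrative choice. */
double recip_newton(double x)
{
    double sign = 1.0, ye = 1.0;
    if (x == 0.0) return 0.0;             /* undefined input; guard only */
    if (x < 0.0) { x = -x; sign = -1.0; }
    /* Normalize so that 0.5 <= Xnorm < 1; then 1/x = Ye * Ym. */
    while (x < 0.5)  { x *= 2.0; ye *= 2.0; }
    while (x >= 1.0) { x *= 0.5; ye *= 0.5; }
    double ym = 3.0 - 2.0 * x;            /* initial estimate of 1/Xnorm */
    for (int i = 0; i < 3; i++)
        ym = 2.0 * ym - ym * ym * x;      /* Ym = 2*Ym - Ym^2 * Xnorm */
    return sign * ye * ym;
}
```

With this first guess the relative error is at most 0.125 before iterating and is squared on every pass, so three iterations bring it below 2^−16, in line with the convergence figure quoted above.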
Index
A conversion
floating-point to Q15 (fltoq15) 4-62
Q15 to floating-point (q15tofl) 4-95
acorr 4-7
adaptive delayed LMS filter 4-39 convol 4-31
fast implemented 4-41 convol1 4-33
add 4-9 convol2 4-35
arctangent 2 implementation 4-10 convolution 4-31
arctangent implementation 4-11 convolution (fast) 4-33
atan16 4-11 convolution (fastest) 4-35
atan2_16 4-10 corr 4-37
autocorrelation 4-7 correlation
auto (acorr) 4-7
full-length (corr) 4-37
B
base 10 logarithm 4-78 D
base 2 logarithm 4-80
base e logarithm 4-81 decimating FIR filter 4-52
bexp 4-13 dlms 4-39
block exponent implementation 4-13 dlmsfast 4-41
double-precision IIR filter 4-67
C DSPLIB
arguments 3-2
cascaded IIR direct form I 4-73 calling a function from assembly language source
code 3-3
cascaded IIR direct form II 4-69, 4-71
calling a function from C 3-3
cbrev 4-14 content 2-2
cbrev32 4-15 data types 3-2
cfft 4-16 dealing with overflow and scaling issues 3-4
cfft32 4-19 how to install 2-3
cfir 4-21 how to rebuild 2-4
cifft 4-26
cifft32 4-28 E
complex bit reverse 4-14
32-bit 4-15 expn 4-45
complex FIR filter 4-21 exponential base e 4-45
F iircas51 4-73
iirlat 4-75
FFT index and value of maximum element of a vec-
forward complex tor 4-85
cfft 4-16 index and value of minimum element of a vec-
cfft32 4-19 tor 4-88
forward real, in-place (rfft) 4-100, 4-102 index of maximum element of a vector 4-82
inverse complex index of maximum element of a vector less than or
cifft 4-26 equal to 34 4-84
cifft32 4-28
index of minimum element of a vector 4-86
inverse real, in-place (rifft) 4-104, 4-105
interpolating FIR filter 4-55
fir 4-46
inverse complex FFT 4-26
FIR filter 4-46
32-bit 4-28
complex (cfir) 4-21
decimating (firdec) 4-52 inverse real FFT , in-place 4-104, 4-105
direct form (fir) 4-46
Hilbert Transformer 4-63
interpolating (firinterp) 4-55
L
lattice forward (firlat) 4-57 lattice forward (FIR) filter 4-57
symmetric (firs) 4-59 lattice inverse (IIR) filter 4-75
FIR Hilbert Transformer 4-63 ldiv16 4-77
fir2 4-49 log_10 4-78
firdec 4-52 log_2 4-80
firinterp 4-55 logarithm
firlat 4-57 base 10 (log_10) 4-78
firs 4-59 base 2 (log_2) 4-80
floating-point to Q15 conversion 4-62 base e (logn) 4-81
fltoq15 4-62 logn 4-81
forward complex FFT 4-16
32-bit 4-19 M
forward real FFT, in-place 4-100, 4-102
matrix multiplication 4-89
N S
natural logarithm (logn) 4-81 sine 4-106
neg 4-92 sqrt_16 4-107
neg32 4-93 square root of a 16-bit number 4-107
sub 4-108
P symmetric FIR filter 4-59
power 4-94
V
Q vector add 4-9
Q15 to floating-point conversion 4-95 vector negate 4-92
q15tofl 4-95 vector negate, double-precision 4-93
vector power 4-94
IMPORTANT NOTICE
Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, enhancements, improvements and other
changes to its semiconductor products and services per JESD46, latest issue, and to discontinue any product or service per JESD48, latest
issue. Buyers should obtain the latest relevant information before placing orders and should verify that such information is current and
complete. All semiconductor products (also referred to herein as “components”) are sold subject to TI’s terms and conditions of sale
supplied at the time of order acknowledgment.
TI warrants performance of its components to the specifications applicable at the time of sale, in accordance with the warranty in TI’s terms
and conditions of sale of semiconductor products. Testing and other quality control techniques are used to the extent TI deems necessary
to support this warranty. Except where mandated by applicable law, testing of all parameters of each component is not necessarily
performed.
TI assumes no liability for applications assistance or the design of Buyers’ products. Buyers are responsible for their products and
applications using TI components. To minimize the risks associated with Buyers’ products and applications, Buyers should provide
adequate design and operating safeguards.
TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or
other intellectual property right relating to any combination, machine, or process in which TI components or services are used. Information
published by TI regarding third-party products or services does not constitute a license to use such products or services or a warranty or
endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property of the
third party, or a license from TI under the patents or other intellectual property of TI.
Reproduction of significant portions of TI information in TI data books or data sheets is permissible only if reproduction is without alteration
and is accompanied by all associated warranties, conditions, limitations, and notices. TI is not responsible or liable for such altered
documentation. Information of third parties may be subject to additional restrictions.
Resale of TI components or services with statements different from or beyond the parameters stated by TI for that component or service
voids all express and any implied warranties for the associated TI component or service and is an unfair and deceptive business practice.
TI is not responsible or liable for any such statements.
Buyer acknowledges and agrees that it is solely responsible for compliance with all legal, regulatory and safety-related requirements
concerning its products, and any use of TI components in its applications, notwithstanding any applications-related information or support
that may be provided by TI. Buyer represents and agrees that it has all the necessary expertise to create and implement safeguards which
anticipate dangerous consequences of failures, monitor failures and their consequences, lessen the likelihood of failures that might cause
harm and take appropriate remedial actions. Buyer will fully indemnify TI and its representatives against any damages arising out of the use
of any TI components in safety-critical applications.
In some cases, TI components may be promoted specifically to facilitate safety-related applications. With such components, TI’s goal is to
help enable customers to design and create their own end-product solutions that meet applicable functional safety standards and
requirements. Nonetheless, such components are subject to these terms.
No TI components are authorized for use in FDA Class III (or similar life-critical medical equipment) unless authorized officers of the parties
have executed a special agreement specifically governing such use.
Only those TI components which TI has specifically designated as military grade or “enhanced plastic” are designed and intended for use in
military/aerospace applications or environments. Buyer acknowledges and agrees that any military or aerospace use of TI components
which have not been so designated is solely at the Buyer's risk, and that Buyer is solely responsible for compliance with all legal and
regulatory requirements in connection with such use.
TI has specifically designated certain components as meeting ISO/TS16949 requirements, mainly for automotive use. In any case of use of
non-designated products, TI will not be responsible for any failure to meet ISO/TS16949.
Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265
Copyright © 2013, Texas Instruments Incorporated