
Numerically Controlled Oscillator

This document describes the implementation of five numerically controlled oscillator (NCO) architectures on an Altera Cyclone II FPGA. It provides an introduction to NCOs and their operation, including the main components of phase accumulator, phase-to-amplitude converter, and CORDIC algorithm. The document then discusses designing and simulating different NCO architectures in Quartus, including large and small ROM-based, multiplier-based, and parallel/serial CORDIC implementations. It concludes with compiling and testing the designs on a Terasic DE2 board containing a Cyclone II FPGA.

Uploaded by

hyrulean64
Copyright
© © All Rights Reserved

Universidad Nacional de Córdoba

Facultad de Matemática, Astronomía y Física

Plataformas Configurables para Instrumentación Científica
Numerically Controlled Oscillator
Teachers:
Eduardo Romero
Gabriela Peretti
Student:

Emanuel Dri

Table of Contents
Introduction
Numerically controlled oscillators
    Main architecture and operation
        Phase accumulator
        Maximum output frequency
        CORDIC
    NCO design
        IP core function generator
        Design features
    Implementation
        Simulation
            Large ROM architecture
            Small ROM architecture
            Multiplier based architecture
            Parallel CORDIC
            Serial CORDIC
        Compilation
        Implementation
            Serial CORDIC implementation
            Large ROM
            Small ROM
            Multiplier based
            Parallel CORDIC
            Serial CORDIC
    Conclusions
    BIBLIOGRAPHY

Introduction
This report describes the implementation of five numerically controlled oscillator (NCO)
architectures on an Altera Cyclone II FPGA. A brief introduction to NCOs and their operation is
presented first.
The NCO circuits were generated and tested using the Quartus Web Edition IDE, which bundles an
NCO generator.
Finally, the designs were implemented on a Terasic DE2 board, which contains, among other
resources, a Cyclone II FPGA.

Numerically controlled oscillators


An NCO (numerically controlled oscillator) is a device that synchronously generates a discrete-time,
discrete-value representation of a periodic waveform. In other words, on each reference clock cycle
it outputs one of a finite number of amplitude samples of the desired output waveform.
An NCO combined with a DAC (digital-to-analog converter) forms a direct digital synthesizer (DDS),
which is the main application of these devices.
The output waveform frequency depends on two inputs: a reference clock signal and a phase
increment (tuning word).

Figure 1: Block diagram of an NCO. The phase increment is the input and [Q1..Qn] are the output bits.

Main architecture and operation


The following figure shows an overview of NCO architecture.

Figure 2: Diagram of the internal components and their interconnections


Initially the phase register is set to 0 and an n-bit tuning word m is loaded into the phase
control register. On each reference clock cycle, the phase accumulator adds m to the value in the
phase register (i.e. m, 2m, 3m, ...). This goes on until the register overflows and wraps around.
Notice that this block behaves like a modulo-$2^n$ counter that advances by m on each clock cycle.
The width n (in bits) of the accumulator is called the phase accumulator precision. Periodic waves
are usually represented within the angular range $0..2\pi$. Phase values can be converted into
angles in degrees by using the relation
$$\theta = \frac{\text{phase value} \cdot 360^\circ}{2^n}$$
Phase values serve as input to the phase-to-amplitude converter, which transforms them into the
amplitude values of the output waveform. In order to reduce the converter complexity, phase values
are truncated to their r MSBs (most significant bits). Thus only $2^r$ phase values need to be
handled by the converter.
Typically the converter consists of a lookup table in which each phase value addresses a memory
location where a sample value is stored. Some strategies can reduce its size; e.g. for sine wave
generation, only the first 90 degrees need to be stored thanks to the symmetry of the waveform.
Another alternative is the CORDIC algorithm, which calculates trigonometric functions using only
add and shift operations. Therefore no memory structure is required, just registers for temporary
data.
The sample width N in bits is called the magnitude resolution; this number need not be the same as
the angular resolution.

Figure 3: The figure shows the samples generation process from the
accumulator to the converter output.

Phase accumulator
Given a reference clock frequency F, a phase increment m and an accumulator precision n, the
phase accumulator overflows after $2^n / m$ clock cycles. Hence the period of the generated
waveform is $\frac{2^n}{F m}$ and its frequency is $\frac{F m}{2^n}$.
In addition, m = 1 determines the frequency resolution, which is $\frac{F}{2^n}$.
The phase wheel illustrates the accumulator operation graphically.

Figure 4: Phase wheel

As the vector rotates around the unit circle, it describes a sine wave. Each point on the wheel
corresponds to a point of the sine wave.

Figure 5: As the vector rotates around the wheel it describes a sine wave

For an n-bit accumulator there are $2^n$ points. The tuning word m indicates how many points the
phase accumulator skips on each clock cycle.
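The accumulator behaviour described above can be sketched in a few lines of Python. This is only an illustrative model under the definitions of this section, not the logic generated by the Quartus plug-in; the function names `phase_accumulator` and `output_frequency` are hypothetical.

```python
def phase_accumulator(m, n_bits, cycles):
    """Return the phase register value seen on each of `cycles` clock cycles."""
    mask = (1 << n_bits) - 1        # the register holds n bits: modulo 2**n
    phase = 0                       # the phase register starts at 0
    values = []
    for _ in range(cycles):
        values.append(phase)
        phase = (phase + m) & mask  # add the tuning word, wrap on overflow
    return values


def output_frequency(f_clk, m, n_bits):
    """Output frequency F*m / 2**n for reference clock F and tuning word m."""
    return f_clk * m / 2 ** n_bits


# Toy example: 4-bit accumulator, tuning word 3, 100 Hz reference clock.
print(phase_accumulator(3, 4, 8))    # [0, 3, 6, 9, 12, 15, 2, 5]
print(output_frequency(100, 3, 4))   # 18.75
```

With m = 1 the same function returns the frequency resolution F / 2^n.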

Maximum output frequency


Given a tuning word m, the phase accumulator resets after $s = \frac{2^n}{m}$ cycles of the
reference clock. After truncation there are s' phase values that enter the converter, thus it will
produce a maximum of s' samples per cycle of the generated waveform. As m increases, the number of
samples per output cycle decreases.
According to the Nyquist criterion, at least two samples per output cycle are needed to reproduce a
sampled waveform. Consequently, the maximum output frequency of an NCO is limited to
$\frac{F_c}{2}$, where $F_c$ is the reference clock frequency.

CORDIC
CORDIC is based on the rotation of a unit vector. Sine and cosine are calculated simultaneously,
since the (X, Y) coordinates of the vector correspond to the cosine and sine of the angle between
the vector and the X axis.
A number n of iterations brings the vector angle closer and closer to the input angle, starting
from 0 degrees, i.e. (X, Y) = (1, 0).

Figure 6: Unit vector rotation

The vector coordinates after a rotation by an angle $\phi$ are:
$$x' = x\cos\phi - y\sin\phi$$
$$y' = y\cos\phi + x\sin\phi$$
This can be rearranged as:
$$x' = \cos\phi\,[x - y\tan\phi]$$
$$y' = \cos\phi\,[y + x\tan\phi]$$
In order to reduce the complexity of the calculations, the rotation angles are restricted so that
$\tan\phi = \pm 2^{-i}$; the multiplication by the tangent then becomes a binary shift.
Thus the coordinate equations are:
$$x_{i+1} = K_i\,[x_i - y_i d_i 2^{-i}]$$
$$y_{i+1} = K_i\,[y_i + x_i d_i 2^{-i}]$$
$$K_i = \cos(\tan^{-1}(2^{-i})) = \frac{1}{\sqrt{1 + 2^{-2i}}}$$
$$d_i = \pm 1$$
where i is the iteration number (from 0 to n-1), $d_i$ is the direction of the rotation and $K_i$
is a scaling factor.
On each iteration the direction of the next rotation is determined by the remaining angle z:
$$z_{i+1} = z_i - d_i \tan^{-1}(2^{-i})$$
If $z_i < 0$ then $d_i = -1$, otherwise $d_i = 1$.
As the number of iterations is fixed, the arctangent values can be stored in a lookup table or
simply hardwired.
If the scaling factor is taken out of the iterations, the CORDIC algorithm reduces to
additions/subtractions and shifts. The scaling is then applied once, after all the iterations have
finished.

Figure 7: Unscaled rotations

The final scaling factor is:
$$A = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}}$$
Finally, the CORDIC equations are:
$$x_{i+1} = x_i - y_i d_i 2^{-i}$$
$$y_{i+1} = y_i + x_i d_i 2^{-i}$$
$$z_{i+1} = z_i - d_i \tan^{-1}(2^{-i})$$
with $d_i = \pm 1$, which provide the following result:
$$x_n = A\,[x_0\cos z_0 - y_0\sin z_0]$$
$$y_n = A\,[y_0\cos z_0 + x_0\sin z_0]$$
$$z_n = 0$$
Therefore, sine and cosine can be calculated by setting $(x_0, y_0) = (1/A, 0)$, which gives
$x_n = \cos z_0$ and $y_n = \sin z_0$.
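As an illustration of the iteration in this section, the following Python sketch computes sine and cosine with the unscaled rotations and the pre-scaled start vector (1/A, 0). It is a floating-point model written for this report's equations, not the fixed-point hardware produced by the IP core: the function name `cordic_sin_cos` and the default of 24 iterations are assumptions, and the multiplications by 2^-i stand in for the binary shifts a hardware implementation would use. The input angle is assumed to lie within about ±π/2, inside the convergence range of the basic algorithm.

```python
import math


def cordic_sin_cos(angle, iterations=24):
    """Return (sin(angle), cos(angle)) using CORDIC rotations.

    `angle` is in radians and is assumed to satisfy |angle| <= pi/2.
    """
    # Arctangent table: these constants would be hardwired or stored in a LUT.
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Overall gain A = product of sqrt(1 + 2^(-2i)).
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start at (1/A, 0) so that the final vector has unit length.
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        d = -1 if z < 0 else 1       # rotate towards the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]
    return y, x


s, c = cordic_sin_cos(math.pi / 8)
print(s, c)   # close to sin(pi/8) and cos(pi/8)
```

Each additional iteration roughly halves the residual angle, so the number of iterations sets the attainable precision.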

NCO design
Implementing an NCO circuit requires interconnecting combinational elements, random access
memories and registers. Building a custom NCO from discrete parts therefore demands many
integrated circuits and considerable board area. FPGA devices simplify this task by making it
possible to generate a complete NCO on a single chip. FPGAs contain programmable logic elements
and a hierarchy of reconfigurable interconnects that allow the logic elements to be physically
connected, as well as additional resources such as random access memories, dedicated arithmetic
circuitry and DSP blocks, whose presence and quantity depend on the device model.
The five NCO architectures were implemented on a Terasic DE2 development board, which contains an
Altera Cyclone II FPGA, using the NCO IP generator plug-in bundled with the Altera Quartus IDE.
NCOs generated by this plug-in produce sinusoidal waveforms and can be configured with a single
output or with dual outputs (sine and cosine). The samples produced are integer values in two's
complement notation.

Figure 8: Terasic DE2 board

IP core function generator


The NCO MegaCore supports the following architectures.
Lookup table based architectures (if dual output is selected, memory use doubles):
    Large ROM
        The Large ROM architecture uses M4K memories to store data for the full 360 degrees.
        Given an angular resolution r, the number of memory entries is $2^r$, as there must be an
        amplitude value in the ROM for each phase value.
        Large ROM consumes few LE resources. For dual output NCOs, it uses twice the ROM space in
        order to store the cosine data.
    Small ROM
        Only data for the first 45 degrees of sine and cosine (both simultaneously) is stored in
        memory. Therefore a dual output implementation does not consume more memory (and a single
        output one does not consume less).
        The rest of the values are derived from this data following these rules:
Position in unit circle   Range for phase x      sin(x)            cos(x)
1                         0 <= x < π/4           sin(x)            cos(x)
2                         π/4 <= x < π/2         cos(π/2 - x)      sin(π/2 - x)
3                         π/2 <= x < 3π/4        cos(x - π/2)      -sin(x - π/2)
4                         3π/4 <= x < π          sin(π - x)        -cos(π - x)
5                         π <= x < 5π/4          -sin(x - π)       -cos(x - π)
6                         5π/4 <= x < 3π/2       -cos(3π/2 - x)    -sin(3π/2 - x)
7                         3π/2 <= x < 7π/4       -cos(x - 3π/2)    sin(x - 3π/2)
8                         7π/4 <= x < 2π         -sin(2π - x)      cos(2π - x)

Figure 9
Given an angular resolution r, the number of samples stored is $\frac{2 \cdot 2^r}{8} = 2^{r-2}$,
as 45° is the 8th part of 360° and data for both waveforms is needed.
More precisely, the memory bits required are $2^{r-2}(P-1)$, where P is the magnitude
precision [1].
CORDIC
    This architecture uses only LEs to calculate the output sine values with the CORDIC
    algorithm. No memory resources are consumed, because the CORDIC operations only require
    additions/subtractions and binary shifts.
    In the CORDIC architecture the angular resolution is the width of the input angle word.
    The NCO plug-in supports the following variants:
    Parallel CORDIC
        This variant is capable of producing a sample on each clock cycle, so it is reasonable
        to assume that it is based on the unrolled parallel CORDIC.
    Iterative bit serial
        The serial CORDIC uses fewer resources than the parallel one at the expense of slower
        output generation (accuracy at low frequencies is also compromised). It produces an
        output sample every N reference clock cycles, where N is the magnitude precision.
Multiplier based
    Uses DSP block multipliers, if available, to reduce the use of other resources. If they are
    not available, or not present at all on the FPGA, the multipliers are built from LEs.
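The Small ROM derivation rules listed above can be checked with a short Python sketch. It is a model of the octant symmetry only, not of the IP core: the helper `first_octant` is a hypothetical stand-in for the ROM (it refuses angles outside the first 45 degrees), and `small_rom_sin_cos` is an assumed name for the function that applies the eight rules.

```python
import math

QUARTER_PI = math.pi / 4


def first_octant(a):
    """Stand-in for the ROM: sine and cosine for 0 <= a <= pi/4 only."""
    assert -1e-9 <= a <= QUARTER_PI + 1e-9, "ROM only covers the first octant"
    return math.sin(a), math.cos(a)


def small_rom_sin_cos(x):
    """Return (sin x, cos x) for 0 <= x < 2*pi from first-octant data."""
    pi = math.pi
    octant = int(x // QUARTER_PI) % 8      # position in the unit circle, 0..7
    if octant == 0:
        s, c = first_octant(x)
        return s, c
    if octant == 1:
        s, c = first_octant(pi / 2 - x)
        return c, s
    if octant == 2:
        s, c = first_octant(x - pi / 2)
        return c, -s
    if octant == 3:
        s, c = first_octant(pi - x)
        return s, -c
    if octant == 4:
        s, c = first_octant(x - pi)
        return -s, -c
    if octant == 5:
        s, c = first_octant(3 * pi / 2 - x)
        return -c, -s
    if octant == 6:
        s, c = first_octant(x - 3 * pi / 2)
        return -c, s
    s, c = first_octant(2 * pi - x)
    return -s, c


# Compare against the library functions over a full turn.
for k in range(64):
    x = 2 * math.pi * k / 64
    s, c = small_rom_sin_cos(x)
    assert abs(s - math.sin(x)) < 1e-12 and abs(c - math.cos(x)) < 1e-12
```

Because both sine and cosine of the reduced angle are needed in every octant, the ROM content is the same for single and dual output configurations.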

Design features

Common to all architectures:
    Precisions (in bits)
        Phase accumulator precision
        Angular resolution
        Magnitude precision
    Outputs (single or dual)
Specific parameters
    CORDIC
        Serial
        Parallel
    Multiplier based
        Multipliers (use LEs) or dedicated multipliers
        Clock cycles per output (1 or 2)
Other features (out of the scope of this report)
    Phase dithering (due to truncation and finite precision, spurs may appear in the spectrum
    of the generated waveform; dither adds random noise to the overall spectrum but tends to
    increase the SFDR by reducing the repetition of values)
    Frequency modulation
    Multichannel NCO
    Frequency hopping (for use in spread spectrum)

[1] The most significant bit contains the sign. As the sine and cosine of the first 45 degrees are
positive, it is not necessary to store it. Therefore P-1 bits are used for each sample.

Implementation
In order to compare the resource use and the operation of the five architectures, an NCO block of
each of them was generated using the following common set of design parameters:
    Dither: none
    Phase accumulator precision: 32 bits
    Angular resolution: 4 bits
    Magnitude precision: 13 bits
    Dual output: no
The generated blocks were simulated under the same conditions, and these conditions were
reproduced on the DE2 board.
A 4-bit angular resolution (the minimum available) was selected to make the simulation and output
values easier to visualize. Therefore, for output frequencies below $\frac{F_c}{16}$ the ROM based
NCOs generate only 16 samples per output waveform cycle. Thus no dither was used.
The following table shows the resource use estimates provided by the IP plug-in.

Resource               Large ROM  Small ROM  CORDIC (serial)  CORDIC (parallel)  Multiplier based
LE elements                   68        191              620               1182               213
Memory bits                  208         48                                                   156
M4K memory elements            1
DSP elements
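The memory figures estimated for the two ROM architectures can be cross-checked against the sizing rules given earlier for the chosen parameters (angular resolution r = 4, magnitude precision P = 13). The sketch below is a plain arithmetic check; the function names are hypothetical, and the Large ROM formula (2^r entries of P bits each) is an assumption derived from the stated number of entries, not a formula quoted from the plug-in documentation.

```python
def large_rom_bits(r, p):
    """Assumed sizing: one P-bit sample for each of the 2**r phase values."""
    return 2 ** r * p


def small_rom_bits(r, p):
    """One octant (2**r / 8 entries) for both sine and cosine, P-1 bits each."""
    return 2 * (2 ** r // 8) * (p - 1)


print(large_rom_bits(4, 13))   # 208
print(small_rom_bits(4, 13))   # 48
```

Both results agree with the memory bits reported in the estimate for the Large ROM and Small ROM columns.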

Simulation
In order to test the Large ROM, Small ROM, parallel CORDIC and multiplier based architectures, the
following design, which contains only an NCO block and its control signal I/Os, was used.

Figure 10: Simulation design

Input/output description
    phi_inc: phase increment value (tuning word)
    clk: reference clock
    reset_n: reset (active low)
    fsin_o: waveform amplitude output value
    out_valid: indicates whether the output value is valid. NCOs have a start-up time (in clock
    cycles) during which they do not generate output values. Also, the serial CORDIC architecture
    produces a valid sample only every N clock cycles. This output is a logical 0 if the value is
    not valid and 1 otherwise.
Notice that in the figure reset_n and clken are tied to VCC (logical 1). This is necessary for the
block to work properly.
All five NCO variants were generated. Each of them was tested with the simulator tool provided
with Quartus, using the following values:

Reference clock frequency:
    100 Hz (except for the serial CORDIC, which requires a reference clock of at least 624 Hz [2];
    a reference of 650 Hz was used).
Tuning words:
    All architectures except serial CORDIC, using $F_o = \frac{F_c M}{2^n}$ and therefore
    $M = \frac{F_o 2^n}{F_c}$:
        42949673 for 1 Hz output: $42949673 \approx \frac{1\,\text{Hz} \cdot 2^{32}}{100\,\text{Hz}}$
        128849019 for 3 Hz output: $128849019 \approx \frac{3\,\text{Hz} \cdot 2^{32}}{100\,\text{Hz}}$
    Serial CORDIC:
        167503725 for 3 Hz output
        55834575 for 1 Hz output
Simulation time: 2.5 seconds
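The tuning words used for the 100 Hz reference clock follow directly from the relation between output frequency, clock frequency and accumulator precision. A minimal Python sketch of that calculation is shown below; the function names are hypothetical and rounding to the nearest integer is an assumption about how the values were obtained.

```python
def tuning_word(f_out, f_clk, n_bits=32):
    """Nearest integer tuning word M = Fo * 2**n / Fc."""
    return round(f_out * 2 ** n_bits / f_clk)


def actual_frequency(m, f_clk, n_bits=32):
    """Output frequency actually produced by tuning word m."""
    return m * f_clk / 2 ** n_bits


for f_out in (1, 3):
    m = tuning_word(f_out, 100)
    print(f_out, m, actual_frequency(m, 100))
# The tuning words printed are 42949673 (1 Hz) and 128849019 (3 Hz).
```

Since M must be an integer, the frequency actually generated differs from the target by less than the frequency resolution Fc / 2^n.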

[2] The serial CORDIC architecture produces a sample every 13 clock cycles. As the angular
resolution is set to 4 bits, a minimum of 16 samples per cycle at 3 Hz was intended, therefore the
clock frequency was calculated as:
$$16\,\frac{\text{samples}}{\text{cycle}} \cdot 13\,\frac{\text{clock cycles}}{\text{sample}} \cdot 3\,\text{Hz} = 624\,\text{Hz}$$

Output frequencies were deliberately selected to be lower than $\frac{F_c}{16}$, so that the ROM
based architectures are able to produce their whole set of values on each output waveform cycle.

All simulations had three common stages:


1. A transient time
2. The NCO is set to generate a 1Hz waveform during 1 second (only one cycle of the output
waveform is produced)
3. The NCO is set to generate a 3Hz waveform

Large ROM architecture

Figure 11
The picture shows an overview of a 2.5 seconds simulation of a Large ROM NCO.
Time line description:
clk: reference clock
out_valid_ncol: valid output state
phase_ncol: tuning word
sine_o_ncol: NCO amplitude output value
During the interval from 0s to 44ms the start up transient takes place and the NCO does not produce
a valid output.

After the start up transient, during the following second the NCO in response to the control word
generates an output of 1Hz composed of 16 discrete values.

Figure 10: 1Hz output generation


The sine_o_ncol line shows a waveform representation of the block output. It can be seen how a
whole cycle fits in 1 second.

Figure 12
The figure shows the signed literal output of the NCO block during the same interval. The
generated signal is composed of 16 samples starting from 0.
In the Large ROM architecture samples are uniformly distributed in time. The duration of each of
them in clock cycles is
$$\text{duration} = \frac{\text{clock frequency}}{\text{output frequency} \cdot \text{number of samples}}$$
which for the 1 Hz output gives
$$\frac{100\,\text{Hz}}{1\,\text{Hz} \cdot 16\,\text{samples}} = 6.25\,\frac{\text{cycles}}{\text{sample}}$$
Notice that this is not an integer. To keep the output synchronous with the clock, some peaks and
zeros last 1 more cycle. In the case of the 1 Hz output this means that zeros and peaks last 7
clock cycles while intermediate values last 6 cycles.

Figure 13 (samples lasting 6 and 7 clock cycles)

For the 3 Hz output the average duration of each sample is
$$\frac{100\,\text{Hz}}{3\,\text{Hz} \cdot 16\,\text{samples}} = 2.0833\,\frac{\text{cycles}}{\text{sample}}$$
Intermediate values last 2 cycles, while some zeros/peaks last 3 cycles.

Figure 14
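The uneven sample durations can be reproduced with an idealized model of a 32-bit accumulator truncated to its 4 most significant bits. The Python sketch below is such a model, under the assumption that the converter output changes exactly when the truncated phase changes; it ignores the start-up transient and any pipelining of the real IP block, and the function name `sample_durations` is hypothetical.

```python
from itertools import groupby


def sample_durations(m, cycles, n_bits=32, r_bits=4):
    """Return (phase index, duration in clock cycles) for each output sample."""
    mask = (1 << n_bits) - 1
    phase = 0
    indices = []
    for _ in range(cycles):
        indices.append(phase >> (n_bits - r_bits))  # keep the r MSBs
        phase = (phase + m) & mask                  # accumulate, wrap at 2**n
    # Consecutive clock cycles with the same index form one output sample.
    return [(idx, len(list(run))) for idx, run in groupby(indices)]


one_hz = sample_durations(42949673, 100)     # 1 Hz output, 100 Hz clock
print([duration for _, duration in one_hz])
# [7, 6, 6, 6, 7, 6, 6, 6, 7, 6, 6, 6, 7, 6, 6, 6]
```

In this model the samples lasting 7 cycles are phase indices 0, 4, 8 and 12, that is, the zeros and peaks of the sine wave, which matches what the simulation shows. Running the same function with the 3 Hz tuning word gives durations of 2 and 3 cycles.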

Small ROM architecture

Figure 15
The picture shows an overview of a 2.5-second simulation.
In this case the transient lasts 74 ms.

Figure 16

After the transient the NCO produces valid samples. According to the 42949673 tuning word, the
output waveform frequency is 1 Hz.

Figure 17: 1 Hz output

Notice that some values last twice as long as others and that, at first sight, only 13 samples
are generated.

Figure 18

Figure 19 (samples 10 to 13)

This is a consequence of the way this architecture calculates amplitudes.


Due to the magnitude resolution p = 13, amplitudes are distributed in the interval -4095..4095.
Recall that the Small ROM architecture only stores data for the first 45 degrees, or π/4 radians.
For the given angular resolution r = 4 bits, there are 16 equidistant samples over 360 degrees. So
within 45 degrees there are
$$16\,\text{samples} \cdot \frac{45^\circ}{360^\circ} = 2\,\text{samples}$$
These correspond to 0° and 22.5° (π/8), whose sine and cosine are:

Angle   sine           cosine
0       0              1
π/8     0.3826834324   0.9238795325

The sine and cosine functions range from -1 to 1; the output values of an NCO range from
$-(2^{p-1} - 1)$ to $2^{p-1} - 1$. In order to convert real values into output values, the
following proportion is applied:
$$\text{output value} = \text{real value} \cdot (2^{p-1} - 1)$$

Angle   sine           cosine         sine output   cosine output
0       0              1              0             4095
π/8     0.3826834324   0.9238795325   1567          3783
Using the value generation rules, the remaining values were calculated.

[Table: for each of the 16 phase values 0 to 15 (angles 0, π/8, π/4, 3π/8, ..., 15π/8), the rule
applied to obtain the sine output and the cosine output from the stored entries, and the
calculated sine output. For example, for the angle π/2 the sine output is
cos(π/2 - π/2) = cos(0) = 4095, and for 5π/8 it is cos(5π/8 - π/2) = cos(π/8) = 3783.]

Output values listed in the table, in order: 0, 1567, 2213, 3217, 4095, 3783, 2895, 1567, 0,
-1568, -2897, -3785, -4095, -3783, -2895, -1567

This table shows how the output values were calculated using the provided rules, and the criteria
applied.
Observe that the repeated samples are not the same ones obtained in the simulation [3].
Nevertheless, the calculation still shows how amplitude values can repeat during output
generation, so that they appear to last twice as long.

[3] During implementation tests the NCO behaves as the simulation indicates. Beware that the IP
plug-ins are not open source, so their real behavior may differ from what their documentation
specifies.

Multiplier based architecture

Figure 20: Overview of the complete simulation

Figure 21: Start up transient time


This transient lasts 104ms

Figure 22: 1Hz wave generation


Notice that samples are uniformly distributed in time, just like the Large ROM architecture. Also in
this case, some zeros and peaks last 1 cycle more than intermediate values.

Figure 23:
1Hz output

3Hz output

Parallel CORDIC
The following figure shows an overview of a 2.5-second simulation of a parallel CORDIC based NCO.

In this case, the start-up transient lasts 305 ms.

Figure 24

Figure 25
1Hz output waveform generation.

Figure 26
3Hz output waveform generation

Figure 27
Given the design values used, on this architecture each cycle of the output waveform is composed
of 30 samples. Also, peak values last twice as long as intermediate values.

This picture represents the generation of a 1 Hz waveform. The low peak -4023 lasts 7 clock
cycles, while other values last 3 cycles. The picture below corresponds to a 3 Hz output.

Figure 28

Serial CORDIC
The serial CORDIC produces a valid output every N clock cycles; in the meantime it outputs
intermediate, non-valid values. In order to test this architecture, the simulation design had to be
altered so that it only outputs valid values. This was achieved by using D latches with an enable
input.

Figure 29: Serial CORDIC simulation design

Figure 30: Connection between latches and the NCO block.


Finally all the latches are connected to the common test output bus.

Figure 31
Serial CORDIC simulation

Figure 32
The figure shows an overview of a 4-second simulation of a serial CORDIC NCO.
The sine out line corresponds to the valid output values, while unfiltered_output represents all
the generated values (valid and invalid). Notice that the waveform views of both outputs are very
similar.

Figure 33
Given the clock frequency of 650 Hz, the block has a start-up transient of 28.465 ms (the shortest
of all the alternatives tried).
The following picture shows the generation of a 1 Hz output.

The next picture shows the generation of a 3 Hz output.

Due to the selected magnitude precision N = 13, the block generates a valid sample every 13
reference clock cycles.

Figure 34: Generation of a sample on a 1 Hz output (clock cycles 1 to 13).

Figure 35: Generation of a sample on a 3 Hz output (clock cycles 1 to 13).

Compilation
In order to compare the resource use of the five architectures, a compilation containing only the
NCO block was done for each of them.
The following table shows the most relevant resource use figures of each architecture for the
configuration values used.
Resource                                    Large ROM  Small ROM  Multiplier based  CORDIC (parallel)  CORDIC (serial)  Total available
Total logic elements                              303        400               442               1178              647            33126
  combinational with no register                   89         89                95                 90              236
  register only                                    58        106               149                141              119
  combinational with a register                   156        205               198                947              292
Logic element usage by number of LUT inputs
  4 input functions                                81         94                85                132              186
  3 input functions                                89        103               116                409              161
  <=2 input functions                              75         97                92                496              181
  register only                                    58        106               149                141              119
Logic elements by mode
  normal mode                                     206        233               217                469              423
  arithmetic mode                                  39         61                76                568              105
Total registers                                   214        311               347               1088              411            34593
  dedicated logic registers                       214        311               347               1088              411            33216
  I/O registers                                     0          0                 0                  0                0             1377
Total LABs partially or completely used            74         85               126                131               67             2076
Global signals                                     10         10                10                 10               10
M4Ks                                                1          1                 2                  0                0              105
Total block memory bits                           208         48               156                  0                0           483840
Total block memory implementation bits           4608       4608              9216                  0                0           483840
Embedded multiplier 9-bit elements                  0          0                 4                  0                0               70
PLLs                                                0          0                 0                  0                0                4
Global clocks                                      10         10                10                 10               10               16

For the given set of configuration values, the ROM based architectures minimize the use of logic
elements, especially the Large ROM, which has the lowest use of this resource.
Notice that the overall resource use of the multiplier based architecture is close to that of the
Small ROM, so it can be assumed that it implements a ROM-like NCO strategy. Recall that during
simulations this architecture produced an output similar to that of the Large ROM.

The next charts compare the use of LEs among these architectures.

Figure 36: Logic elements per architecture (total logic elements, combinational with no register,
register only, combinational with a register, total LABs partially or completely used)
Due to their nature, CORDIC based architectures use the highest amount of this resource: they
calculate trigonometric functions using only shifts and additions/subtractions, all of which are
implemented in LEs.
Notice that the serial CORDIC has the lowest LAB use. It uses resources more efficiently because
the same logic/arithmetic hardware is reused on each iteration.
Logic elements by mode
CORDIC architectures are arithmetic intensive. Therefore they are the ones that use the most
arithmetic mode LEs.

Figure 37: Logic elements by mode (normal mode, arithmetic mode) per architecture
In contrast, ROM based architectures require far fewer arithmetic mode LEs (which are used to
implement adders, counters, accumulators and comparators), as their operation does not need many
calculations for each output sample.
Logic elements by LUT inputs

Figure 38: Logic elements by LUT inputs (4 input functions, 3 input functions, <=2 input
functions, register only) per architecture
Memory use
The Large ROM architecture uses the most memory bits, followed by the multiplier based
architecture. Notice that the Small ROM uses much less of this resource than the other two.

As their description in the plug-in user guide claims, CORDICs do not require M4K memory, which
makes them suitable for cases where memory is at a premium.

Figure 39: Total block memory bits per architecture
Although the Large ROM uses the highest amount of memory bits, it has the most efficient
implementation, with the lowest memory fragmentation. This can be seen in the M4K memory usage,
which ultimately determines the physical resource usage.
Notice that the multiplier based NCO uses the highest number of M4K blocks.

Figure 40: M4K memory usage per architecture
The multiplier based architecture provides an output similar to that of the Large ROM, using about
the same amount of logic resources as the Small ROM. In many applications the embedded multipliers
are not completely used (recall that this implementation requires only 4 of the 70 that the
Cyclone II provides). Therefore this architecture should be considered as an alternative to the
ROM based ones once the resource usage of the rest of the circuits implemented on the FPGA is
known.

Implementation
In order to test the operation of the NCO blocks, it was necessary to convert their output from
plain binary into a value readable on the seven-segment displays of the DE2. This was done with
three hexadecimal to seven-segment decoders (for the least significant bits) and a negated output
connected to one segment LED of a seven-segment display to show the sign.

Figure 41
The figure shows an output sample displayed on the DE2 board.
For the given configuration and operating values, the generated NCO blocks produce a minimum of 16
samples per cycle. Therefore it was necessary to implement a 23-bit frequency divider to reduce the
input frequency.
The tuning words were loaded into a ROM holding two 32-bit words. The ROM address bus was attached
to a DE2 switch, so that the switch selected one of the two tuning words at any moment during the
NCO operation. In order to compile the ROM with the desired content, its design block was set to
take its initial values from a MIF file.
The next example shows how to generate a memory initialization file.

Width=32;
Depth=2;
address_radix=uns;
Data_radix=uns;
CONTENT BEGIN
0: 42949673;
1: 128849019;
END;
The lines between CONTENT BEGIN and END associate a data value with a memory address.
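A memory initialization file like the one above can also be produced by a small script, which is convenient when the tuning words change. The Python sketch below is only an illustration: the function name `make_mif` is hypothetical, and it emits the same fields as the example (unsigned address and data radix), written in upper case.

```python
def make_mif(words, width=32):
    """Return the text of a MIF file holding `words` as unsigned values."""
    lines = [
        f"WIDTH={width};",
        f"DEPTH={len(words)};",
        "ADDRESS_RADIX=UNS;",
        "DATA_RADIX=UNS;",
        "CONTENT BEGIN",
    ]
    for address, word in enumerate(words):
        if not 0 <= word < 2 ** width:
            raise ValueError(f"word {word} does not fit in {width} bits")
        lines.append(f"    {address} : {word};")   # address : data
    lines.append("END;")
    return "\n".join(lines) + "\n"


# The two tuning words used in this report (1 Hz and 3 Hz outputs).
print(make_mif([42949673, 128849019]))
```

The returned text can be written to a file with the .mif extension and selected as the initial content of the ROM block.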
The following figure shows the test design used for all the architectures except the serial CORDIC.

Figure 42: Implementation design


1. Sign output

Figure 43
2. Hexadecimal to 7 segments decoders

Figure 44
3. ROM memory block

Figure 45
4. Frequency divider

Figure 46
Serial CORDIC implementation

Recall that during simulations it was necessary to attach latches to the serial CORDIC NCO in
order to visualize only valid output values; the implementation design must therefore incorporate
them as well.

Figure 47
Once each design was implemented, it was possible to check the sequence of generated samples
against the one obtained by simulation. The values displayed during implementation were in
hexadecimal format, so the simulation outputs had to be converted accordingly.
The implementation output values were recorded on video during 2 output cycles (using both tuning
words) and then transcribed using the VLC video player.
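The conversion of signed simulation outputs to the hexadecimal form shown on the displays can be sketched as follows. This is an assumed reconstruction of the conversion, interpreting each output as a 13-bit two's complement value printed as four hexadecimal digits; the function name `to_display_hex` and the sample list are illustrative only.

```python
def to_display_hex(value, bits=13):
    """Format a signed sample as the hex digits of its two's complement code."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), "04X")


samples = [0, 1567, 2896, 3783, 4095, -1567, -2896, -3783, -4095]
print("-".join(to_display_hex(s) for s in samples))
# 0000-061F-0B50-0EC7-0FFF-19E1-14B0-1139-1001
```

With this encoding the leading digit is 1 for negative samples, since bit 12 is the sign bit.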

Large ROM

Figure 48
Simulation values: 0000-061F-0B50-0EC7-0FFF-0EC7-0B50-601F-0000-19E1-14B0-1139-1001-1139-14B0-19E1-0000
Obtained values: 0000-061F-0B50-0EC7-0FFF-0EC7-0B50-0000-19E1-14B0-1139-1001-1139-14B0-19E1-000

Small ROM

Simulation values: 0000-061F-0EC7-0FFF-0EC7-061F-0000-19E1-1139-1001-1139-19E1-0000


Obtained values: 0000-061F-0EC7-0FFF-061F-0000-19E1-1139-0001-1139-19E1-0000

Multiplier based

Simulation values: 0000-061F-0B50-0EC7-0FFF-0EC7-0B50-061F-0000-19E1-14B0-1139-1001-1139-14B0-19E1-0000

Obtained values: 0000-061F-0B50-0EC7-0FFF-0EC7-0B50-061F-0000-19E1-14B0-1139-1001-1139-14B0-19E1-0000

Parallel CORDIC

Simulation values: 1D4B-0518-0503-0B9E-0AF8-0F28-0F27-0FB7-0F27-0F28-0AF8-0B9E-0503-0518-1D4B-02B5-1AE8-1AFD-1462-1508-10D8-10D9-1049-10D9-10D8-1508-1462-1AFD-1AE8-02B5-1D4B

Obtained values: 1D4B-0518-0503-0B9E-0AF8-0F28-0F27-0FB7-0F27-0F28-0AF8-0B9E-0503-0518-1D4B-02B5-1AE8-1AFD-1462-1508-10D8-10D9-1049-10D9-10D8-1508-1462-1AFD-1AE8-02B5-1D4B

Serial CORDIC

Simulation values: 1D4B-0518-0503-0B9E-0AF8-0F28-0F27-0FB7-0F27-0F28-0AF8-0B9E-0503-0518-1D4B-02B5-1AE8-1AFD-1462-1508-10D8-10D9-1049-10D9-10D8-1508-1462-1AFD-1AE8-02B5-1D4B

Obtained values: 1D4B-0518-0503-0B9E-0AF8-0F28-0F27-0FB7-0F27-0F28-0AF8-0B9E-0503-0518-1D4B-02B5-1AE8-1AFD-1462-1508-10D8-10D9-1049-10D9-10D8-1508-1462-1AFD-1AE8-02B5-1D4B

Conclusions

In all cases the values obtained by simulation were equal to those gathered during the
implementation tests. This supports the reliability of the simulation models, even when IP
modules are used.
Also, the values at 1 Hz output are the same as those obtained at 3 Hz.
The values obtained with the Large ROM architecture are the same as with the multiplier based
one, so these modules can replace each other. The same holds for the serial and parallel CORDIC.
The values produced by the Small ROM architecture agree with those obtained using its equations,
but their durations do not.

BIBLIOGRAPHY

Andraka, Ray (1998). A survey of CORDIC algorithms for FPGAs.
Bria, Oscar N. (2002). Descripción en VHDL de arquitecturas para implementar el algoritmo CORDIC.
Murphy, Eva; Slattery, Colm (2004). All about direct digital synthesis.
Altera Corporation (2007). Cyclone II Device Handbook.
Analog Devices (2009). Fundamentals of DDS.
Altera Corporation (2013). NCO MegaCore User Guide.
