Numerically Controlled Oscillator
Emanuel Dri
Table of Contents
Introduction
Numerically controlled oscillators
Main architecture and operation
Phase accumulator
Maximum output frequency
CORDIC
NCO design
IP core function generator
Design features
Implementation
Simulation
Large ROM architecture
Small ROM architecture
Multiplier based architecture
Parallel CORDIC
Serial CORDIC
Compilation
Implementation
Serial CORDIC implementation
Large ROM
Small ROM
Multiplier based
Parallel CORDIC
Serial CORDIC
Conclusions
BIBLIOGRAPHY
Introduction
This report describes the implementation of five numerically controlled oscillator (NCO) architectures on an Altera Cyclone II FPGA, preceded by a brief introduction to NCOs and their operation.
The NCO circuits were generated and tested using the Quartus Web Edition IDE, which bundles an NCO generator.
Finally, the designs were implemented on a Terasic DE2 board, which contains, among other resources, a Cyclone II FPGA.
Figure 1: Block diagram of an NCO (input: phase increment; output: [Q1..Qn]).
Figure 3: Sample generation process, from the accumulator to the converter output.
Phase accumulator
Given a reference clock frequency $F_c$, a phase increment (tuning word) $m$ and an accumulator precision of $n$ bits, the phase accumulator overflows after $\frac{2^n}{m}$ clock cycles. Hence the period of the generated waveform is $T = \frac{2^n}{m F_c}$ and its frequency is $f_{out} = \frac{m F_c}{2^n}$.
In addition, $m = 1$ determines the frequency resolution, which is $\Delta f = \frac{F_c}{2^n}$.
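The accumulator relations above can be checked with a small behavioral model. This is a sketch that assumes an ideal n-bit register wrapping modulo $2^n$; the function names are hypothetical and do not correspond to any signal of the generated IP core.

```python
def nco_frequency(m, f_clk, n_bits=32):
    """Output frequency f_out = m * F_c / 2**n of an n-bit phase accumulator."""
    return m * f_clk / 2 ** n_bits


def overflow_period(m, n_bits=32):
    """Clock cycles between overflows, 2**n / m (not necessarily an integer)."""
    return 2 ** n_bits / m


def accumulate(m, cycles, n_bits=32):
    """Return successive accumulator values, wrapping modulo 2**n like the register."""
    mask = (1 << n_bits) - 1
    acc, values = 0, []
    for _ in range(cycles):
        values.append(acc)
        acc = (acc + m) & mask  # overflow simply discards the carry
    return values


if __name__ == "__main__":
    # Frequency resolution (m = 1) of a 32-bit accumulator clocked at 100 Hz
    print(nco_frequency(1, 100))
    # Tiny 4-bit accumulator: m = 5 overflows every 16 / 5 = 3.2 cycles
    print(accumulate(5, 6, n_bits=4))  # [0, 5, 10, 15, 4, 9]
```

A 4-bit accumulator is used in the usage example only because its wraparound is easy to follow by hand; the relations are the same for the 32-bit accumulator used in this report.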
The phase wheel shows the accumulator operation graphically.
Given a tuning word $m$, the phase accumulator wraps around after $s = \frac{2^n}{m}$ cycles of the reference clock. After phase truncation there are $s'$ phase values that enter the converter, so it produces at most $s'$ samples per cycle of the generated waveform. As $m$ increases, the number of samples per output cycle decreases.
According to the Nyquist criterion, at least two samples per output cycle are needed to reproduce a sampled waveform. Consequently, the maximum output frequency of an NCO is limited to $\frac{F_c}{2}$.
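The Nyquist limit can be illustrated with an ideal, untruncated NCO model (a sketch; phase truncation and amplitude quantization are deliberately ignored). With a 100 Hz clock, the tuning word for 90 Hz is $2^{32} - m$, where $m$ is the 10 Hz tuning word, and the resulting samples are exactly the 10 Hz samples with inverted sign, so an output above $F_c/2$ cannot be told apart from its alias below it.

```python
import math


def nco_samples(m, count, n_bits=32):
    """Ideal NCO samples sin(2*pi*acc/2**n) with acc = k*m mod 2**n."""
    mod = 1 << n_bits
    return [math.sin(2 * math.pi * ((k * m) % mod) / mod) for k in range(count)]


if __name__ == "__main__":
    f_clk = 100
    m_10hz = round(10 * 2**32 / f_clk)  # 10 Hz, below F_c / 2
    m_90hz = 2**32 - m_10hz             # about 90 Hz, above F_c / 2
    low = nco_samples(m_10hz, 20)
    high = nco_samples(m_90hz, 20)
    # Every 90 Hz sample equals the negated 10 Hz sample: the two are aliases
    print(max(abs(a + b) for a, b in zip(low, high)))
```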
CORDIC
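CORDIC is based on rotating a unit vector. Sine and cosine are calculated simultaneously, since the (X, Y) coordinates of the vector correspond to the cosine and sine of the angle between the vector and the X axis. A number n of iterations brings the vector angle closer to the input angle, starting from 0 degrees, i.e. from a vector lying on the X axis ($Y = 0$).
Each iteration performs:
$x_{i+1} = x_i - y_i d_i 2^{-i}$
$y_{i+1} = y_i + x_i d_i 2^{-i}$
$z_{i+1} = z_i - d_i \tan^{-1}(2^{-i})$
where $d_i = -1$ if $z_i < 0$ and $d_i = +1$ otherwise. This provides the following result:
$x_n = A_n [x_0 \cos z_0 - y_0 \sin z_0]$
$y_n = A_n [y_0 \cos z_0 + x_0 \sin z_0]$
$z_n = 0$
$A_n = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}}$
Therefore, sine and cosine can be calculated by setting $(x_0, y_0) = (\frac{1}{A_n}, 0)$ and $z_0$ equal to the input angle, which yields $x_n = \cos z_0$ and $y_n = \sin z_0$.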
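A minimal fixed-point sketch of rotation-mode CORDIC, using only shifts, additions and subtractions inside the loop, as the equations above describe. The word length (16 fractional bits) and the iteration count are assumptions chosen for illustration, not the configuration of the Altera IP core; the gain $A_n$ and the arctangent table are precomputed in floating point, the way a hardware design would store them as constants.

```python
import math


def cordic_sin_cos(angle, iterations=16, frac_bits=16):
    """Rotation-mode CORDIC using integer shifts and adds.

    angle: radians in [-pi/2, pi/2] (inside the CORDIC convergence range).
    Returns (cos, sin) as floats.
    """
    one = 1 << frac_bits
    # Constant table of atan(2^-i), scaled to fixed point
    atans = [round(math.atan(2.0 ** -i) * one) for i in range(iterations)]
    # Gain A_n = prod sqrt(1 + 2^-2i); starting from x0 = 1/A_n cancels it
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = round(one / gain), 0, round(angle * one)
    for i in range(iterations):
        d = 1 if z >= 0 else -1
        # x_{i+1} = x_i - d_i y_i 2^-i ;  y_{i+1} = y_i + d_i x_i 2^-i
        x, y = x - d * (y >> i), y + d * (x >> i)
        # z_{i+1} = z_i - d_i atan(2^-i)
        z -= d * atans[i]
    return x / one, y / one


if __name__ == "__main__":
    print(cordic_sin_cos(math.pi / 8))  # close to (0.9239, 0.3827)
```

Each extra iteration adds roughly one bit of angular accuracy, which is why the iteration count of a hardware CORDIC is usually tied to the output precision.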
NCO design
Implementing an NCO circuit requires interconnecting combinational elements, random access memories and registers. Building a custom NCO device from discrete parts therefore demands many integrated circuits and considerable board area to mount them. FPGA devices simplify this task by allowing a complete NCO to be built on a single chip. FPGAs contain programmable logic elements and a hierarchy of reconfigurable interconnects that allow the logic elements to be physically connected, as well as additional elements such as random access memories, dedicated arithmetic circuitry and DSP blocks, whose presence and quantity depend on the device model.
The five NCO architectures were implemented on a Terasic DE2 development board, which contains an Altera Cyclone II FPGA, using the IP NCO generator plug-in bundled with the Altera Quartus IDE. The NCOs generated by this plug-in produce sinusoidal waveforms and can be configured with a single output or with dual complementary outputs (sine and cosine). The samples produced are integer values in two's complement notation.
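Octant          sin(x)           cos(x)
0 to π/4        sin(x)           cos(x)
π/4 to π/2      cos(π/2-x)       sin(π/2-x)
π/2 to 3π/4     cos(x-π/2)       -sin(x-π/2)
3π/4 to π       sin(π-x)         -cos(π-x)
π to 5π/4       -sin(x-π)        -cos(x-π)
5π/4 to 3π/2    -cos(3π/2-x)     -sin(3π/2-x)
3π/2 to 7π/4    -cos(x-3π/2)     sin(x-3π/2)
7π/4 to 2π      -sin(2π-x)       cos(2π-x)
Figure 9: Octant rules expressing sin(x) and cos(x) through the sine and cosine of an angle in the first octant (0 to π/4, i.e. 45 degrees).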
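The sin(x) column of the table can be exercised in code. The sketch below evaluates sine over a full turn while calling the math library only with arguments inside the first octant, where a small ROM would hold its samples. Using `math.sin`/`math.cos` in place of a ROM lookup, and the helper name `first_octant`, are assumptions made to keep the example short.

```python
import math

QUARTER_PI = math.pi / 4


def first_octant(fn, a):
    """Stand-in for a first-octant ROM lookup: a must lie in [0, pi/4]."""
    assert -1e-12 <= a <= QUARTER_PI + 1e-12, a
    return fn(a)


def octant_sin(x):
    """sin(x) for x in [0, 2*pi), following the sin(x) column of the octant table."""
    octant = min(int(x // QUARTER_PI), 7)
    if octant == 0:
        return first_octant(math.sin, x)
    if octant == 1:
        return first_octant(math.cos, math.pi / 2 - x)
    if octant == 2:
        return first_octant(math.cos, x - math.pi / 2)
    if octant == 3:
        return first_octant(math.sin, math.pi - x)
    if octant == 4:
        return -first_octant(math.sin, x - math.pi)
    if octant == 5:
        return -first_octant(math.cos, 3 * math.pi / 2 - x)
    if octant == 6:
        return -first_octant(math.cos, x - 3 * math.pi / 2)
    return -first_octant(math.sin, 2 * math.pi - x)


if __name__ == "__main__":
    for k in range(16):
        x = k * math.pi / 8
        print(k, round(octant_sin(x), 6))
```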
Design features
The most significant bit of each sample holds its sign. Since the sine and cosine of angles in the first 45 degrees are non-negative, the sign bit does not need to be stored, so each stored sample uses P-1 bits, where P is the magnitude precision.
Architecture options:
CORDIC: serial or parallel.
Multiplier based: multipliers built from logic elements (LEs) or dedicated embedded multipliers; 1 or 2 clock cycles per output.
Other features (outside the scope of this report):
Phase dithering: due to phase truncation and finite precision, spurs may appear in the spectrum of the generated waveform. Dithering adds random noise that spreads over the whole spectrum, but it tends to increase the SFDR by reducing the periodic repetition of values.
Frequency modulation.
Multichannel NCO.
Frequency hopping (for use in spread spectrum systems).
Implementation
To compare the resource usage and operation of the five architectures, one NCO block per architecture was generated with the following common set of design parameters:
Dither: none
Phase accumulator precision: 32 bits
Angular resolution: 4 bits
Magnitude precision: 13 bits
Dual output: no
The generated blocks were simulated under the same conditions, which were then reproduced on the DE2 board.
To make the simulation and output values easier to visualize, a 4-bit angular resolution (the minimum available) was selected. Therefore, for output frequencies below $\frac{F_c}{16}$, the ROM based NCOs generate only 16 samples per output waveform cycle. For the same reason, no dither was used.
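The effect of the 4-bit angular resolution can be sketched by keeping only the 4 most significant accumulator bits as the ROM address. The model assumes plain phase truncation with no dither (as configured here) and a 100 Hz clock; it counts how many distinct addresses one output cycle visits, showing that below $F_c/16$ all 16 are reached while a faster output (10 Hz, an arbitrary example) skips some.

```python
def rom_addresses(m, clocks, n_bits=32, res_bits=4):
    """Truncated phase (ROM address) per clock: the top res_bits of the accumulator."""
    mask = (1 << n_bits) - 1
    shift = n_bits - res_bits
    acc, addresses = 0, []
    for _ in range(clocks):
        addresses.append(acc >> shift)
        acc = (acc + m) & mask
    return addresses


if __name__ == "__main__":
    f_clk = 100
    for f_out in (1, 3, 10):
        m = round(f_out * 2**32 / f_clk)
        clocks_per_cycle = f_clk // f_out  # whole clocks in one output period
        visited = set(rom_addresses(m, clocks_per_cycle))
        print(f"{f_out} Hz: {len(visited)} distinct addresses in one cycle")
```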
The following table compares the resource usage estimates provided by the IP plug-in.
Resource              Large ROM   Small ROM   CORDIC (serial)   CORDIC (parallel)   Multiplier based
Logic elements        68          191         620               1182                213
Memory bits           208         48          -                 -                   156
M4K memory elements   1           -           -                 -                   -
DSP elements          -           -           -                 -                   -
Simulation
To test the Large ROM, Small ROM, parallel CORDIC and multiplier based architectures, a design containing only an NCO block and the control signal I/Os was used.
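With a 100 Hz reference clock, the tuning words were computed as $m = \frac{f_{out} \cdot 2^{32}}{F_c}$:
$m = \frac{1\,\text{Hz} \cdot 2^{32}}{100\,\text{Hz}} \approx 42949673$ for the 1 Hz output
$m = \frac{3\,\text{Hz} \cdot 2^{32}}{100\,\text{Hz}} \approx 128849019$ for the 3 Hz output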
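The rounding behind these tuning words can be reproduced with two helper functions. The names are hypothetical, and round-to-nearest is an assumption that happens to match both values above; the second helper shows the frequency actually produced once $m$ is an integer.

```python
def tuning_word(f_out, f_clk, n_bits=32):
    """Phase increment m = round(f_out * 2**n / F_c) for an n-bit accumulator."""
    return round(f_out * 2 ** n_bits / f_clk)


def actual_frequency(m, f_clk, n_bits=32):
    """Frequency actually generated by the integer tuning word m."""
    return m * f_clk / 2 ** n_bits


if __name__ == "__main__":
    for f in (1, 3):
        m = tuning_word(f, 100)
        print(f, "Hz ->", m, "->", actual_frequency(m, 100), "Hz")
```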
Serial CORDIC:
Tuning words: 55834575 for the 1 Hz output and 167503725 for the 3 Hz output.
Simulation time: 2.5 seconds.
The serial CORDIC architecture produces one sample every 13 clock cycles. Since the angular resolution is set to 4 bits, a minimum of 16 samples per output cycle at 3 Hz was intended, so the clock frequency was calculated as:
$F_c = 16 \cdot 13 \cdot 3\,\text{Hz} = 624\,\text{Hz}$ (16 samples per output cycle, 13 clock cycles per sample, 3 Hz output)
Output frequencies were deliberately selected to be lower than $\frac{F_c}{16}$, so that the ROM based architectures can produce their whole set of values in each output waveform cycle.
Figure 11
The picture shows an overview of a 2.5 second simulation of a Large ROM NCO.
Timeline description:
clk: reference clock
out_valid_ncol: valid output state
phase_ncol: tuning word
sine_o_ncol: NCO amplitude output value
From 0 s to 44 ms the start-up transient takes place, and the NCO does not produce a valid output.
After the start-up transient, during the following second, the NCO responds to the tuning word by generating a 1 Hz output composed of 16 discrete values.
Figure 12
The figure shows the signed integer output of the NCO block during the same interval. Each output cycle is composed of 16 samples, starting from 0.
In the Large ROM architecture, samples are uniformly distributed in time. The duration of each of them in clock cycles is
$\text{duration} = \frac{F_c}{f_{out} \cdot N_{samples}}$
which for the 1 Hz output gives $\frac{100\,\text{Hz}}{1\,\text{Hz} \cdot 16\,\text{samples}} = 6.25$ cycles/sample.
Since 6.25 is not an integer, the samples cannot all last the same number of cycles. To keep the output synchronous, some peaks and zeros last one more cycle: for the 1 Hz output, zeros and peaks last 7 clock cycles while intermediate values last 6 cycles, so that one output period adds up to $4 \cdot 7 + 12 \cdot 6 = 100$ clock cycles.
Figure 13
For the 3 Hz output, the average duration of each sample is $\frac{100\,\text{Hz}}{3\,\text{Hz} \cdot 16\,\text{samples}} \approx 2.0833$ cycles/sample: intermediate values last 2 cycles, while some zeros/peaks last 3 cycles.
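The uneven sample durations follow from phase truncation alone. The sketch below assumes an ideal 32-bit accumulator starting at zero whose 4 most significant bits select the sample, and it ignores the IP core's pipeline latency (its internals are not documented). Under those assumptions it reproduces the 7/6 pattern at 1 Hz, with the 7-cycle runs on addresses 0, 4, 8 and 12 (the zeros and peaks), and the 3/2 pattern at 3 Hz.

```python
def run_lengths(m, clocks, n_bits=32, res_bits=4):
    """Clock cycles each truncated phase value is held, in order of appearance."""
    mask = (1 << n_bits) - 1
    shift = n_bits - res_bits
    acc, runs = 0, []  # runs holds [address, length] pairs
    for _ in range(clocks):
        address = acc >> shift
        if runs and runs[-1][0] == address:
            runs[-1][1] += 1
        else:
            runs.append([address, 1])
        acc = (acc + m) & mask
    return runs


if __name__ == "__main__":
    # 1 Hz at a 100 Hz clock: [7, 6, 6, 6, 7, 6, 6, 6, 7, 6, 6, 6, 7, 6, 6, 6]
    print([n for _, n in run_lengths(42949673, 100)])
    # 3 Hz at a 100 Hz clock, first output period (34 clocks)
    print([n for _, n in run_lengths(128849019, 34)])
```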
Figure 14
Figure 15
The picture shows an overview of a 2.5 second simulation.
In this case, the start-up transient lasts 74 ms.
Figure 16:
After the transient, the NCO produced valid samples; according to the 42949673 tuning word, the output waveform frequency was 1 Hz.
Figure 18
Figure 19:
Angle    sine            cosine
0        0               1
π/8      0.3826834324    0.9238795325
The range of the sine and cosine functions is [-1, 1], while the output values of an NCO range from $-2^{P-1}+1$ to $2^{P-1}-1$. To convert real values into output values, the following rule of three is applied:
$\text{output value} = \text{real value} \cdot (2^{P-1}-1)$
Angle    sine            cosine          sine output    cosine output
0        0               1               0              4095
π/8      0.3826834324    0.9238795325    1567           3783
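The rule of three can be written as a one-line conversion. Rounding to the nearest integer is an assumption, but it reproduces 1567 and 3783 above; the usage example also builds the ideal 16-sample sine table for P = 13, which can be compared with the values derived below.

```python
import math


def to_output(real, p_bits=13):
    """Scale a value in [-1, 1] to the NCO output range +/-(2**(P-1) - 1)."""
    return round(real * (2 ** (p_bits - 1) - 1))


if __name__ == "__main__":
    print(to_output(math.sin(math.pi / 8)), to_output(math.cos(math.pi / 8)))
    ideal = [to_output(math.sin(2 * math.pi * k / 16)) for k in range(16)]
    print(ideal)
```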
The remaining values were calculated using the value generation rules.
Applying these rules to the 16 angles 0, π/8, 2π/8, ..., 15π/8 reduces each angle to the first octant; for example, sin(π/2) = cos(π/2-π/2) = cos(0) gives 4095, and sin(7π/8) = sin(π-7π/8) = sin(π/8) gives 1567.
Resulting output values, in angle order:
0, 1567, 2213, 3217, 4095, 3783, 2895, 1567, 0, -1568, -2897, -3785, -4095, -3783, -2895, -1567
This table shows how the output values were calculated using the given rules and criteria.
Note that the repeated samples are not the same as those obtained in the simulation. Nevertheless, this shows how amplitude values can repeat during output generation, appearing to last twice as long.
During the implementation tests, the NCO behaved as the simulation indicated. Note, however, that the IP plug-ins are not open source, so their real behavior may differ from what their documentation specifies.
Figure 23: 1 Hz output and 3 Hz output.
Parallel CORDIC
The following figure shows an overview of a 2.5 second simulation of a parallel CORDIC based NCO.
Figure 24
Figure 25
1Hz output waveform generation.
Figure 26
3Hz output waveform generation
Figure 27
With the design values used, each cycle of the output waveform is composed of 30 samples in this architecture, and peak values last about twice as long as intermediate values.
The picture above represents a 1 Hz waveform generation: the low peak -4023 lasts 7 clock cycles, while the other values last 3 cycles. The picture below corresponds to a 3 Hz output.
Figure 28
Serial CORDIC
The serial CORDIC produces a valid output every N clock cycles; in the meantime, it outputs intermediate, non-valid values. To test this architecture, the simulation design had to be modified to output only valid values. This was achieved using D latches with an enable input.
Figure 31
Serial CORDIC simulation
Figure 32
The figure shows an overview of a 4 second simulation of a serial CORDIC NCO.
The sine out line corresponds to the valid output values, while unfiltered_output represents all the generated values (valid and invalid). Notice that the waveforms of both outputs look very similar.
Figure 33
With a 650 Hz clock, the block has a start-up transient of 28.465 ms, the shortest of all the tested alternatives.
The following picture shows the generation of a 1 Hz output.
Due to the selected magnitude precision N = 13, the block generates a valid sample every 13 reference clock cycles.
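The serial behavior can be modeled at the cycle level: one micro-rotation per clock, a valid flag on the last iteration, and an enable-gated register standing in for the D latch that keeps only valid values. This is a hypothetical floating-point sketch (the class and its methods are not part of the IP core), assuming 13 iterations to match N = 13 and an input angle within the CORDIC convergence range.

```python
import math


class SerialCordic:
    """Hypothetical cycle-level model of a serial CORDIC with an output latch."""

    def __init__(self, iterations=13):
        self.n = iterations
        self.atans = [math.atan(2.0 ** -i) for i in range(iterations)]
        gain = 1.0
        for i in range(iterations):
            gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
        self.x0 = 1.0 / gain  # pre-scaled start vector cancels the gain A_n
        self.i = iterations   # idle until start() loads an angle
        self.held = 0.0       # latch output: last valid sine value

    def start(self, angle):
        """Load a new angle (radians, within about +/-1.74 rad)."""
        self.x, self.y, self.z = self.x0, 0.0, angle
        self.i = 0

    def clock(self):
        """Advance one clock cycle; return (valid, latched sine)."""
        if self.i >= self.n:
            return False, self.held
        d = 1.0 if self.z >= 0 else -1.0
        s = 2.0 ** -self.i
        self.x, self.y = self.x - d * self.y * s, self.y + d * self.x * s
        self.z -= d * self.atans[self.i]
        self.i += 1
        valid = self.i == self.n
        if valid:  # enable input: latch only the valid result
            self.held = self.y
        return valid, self.held


if __name__ == "__main__":
    cordic = SerialCordic()
    cordic.start(math.pi / 6)
    for cycle in range(1, 14):
        valid, sine = cordic.clock()
        print(cycle, valid, round(sine, 4))
```

Reusing one adder/shifter pair for all 13 iterations is what makes the serial version small, at the cost of one output every 13 clocks.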
Figure 34:
The picture shows the generation of a sample on a 1Hz output.
Figure 35:
The picture shows the generation of a sample on a 3Hz output.
Compilation
To compare the resource usage of the five architectures, a design consisting only of the NCO blocks was compiled.
The following table shows the most relevant resource usage figures of each architecture for the configuration values used.
Resource                                   Large ROM  Small ROM  Multiplier based  CORDIC (parallel)  CORDIC (serial)  Total available
Total logic elements                       303        400        442               1178               647              33216
  Combinational with no register           89         89         95                90                 236
  Register only                            58         106        149               141                119
  Combinational with a register            156        205        198               947                292
Logic element usage by number of LUT inputs
  4 input functions                        81         94         85                132                186
  3 input functions                        89         103        116               409                161
  <=2 input functions                      75         97         92                496                181
  Register only                            58         106        149               141                119
Logic elements by mode
  Normal mode                              206        233        217               469                423
  Arithmetic mode                          39         61         76                568                105
Total registers                            214        311        347               1088               411              34593
  Dedicated logic registers                214        311        347               1088               411              33216
  I/O registers                            0          0          0                 0                  0                1377
Total LABs partially or completely used    74         85         126               131                67               2076
Global signals                             10         10         10                10                 10
M4Ks                                       1          1          2                 0                  0                105
Total block memory bits                    208        48         156               0                  0                483840
Total block implementation memory bits     4608       4608       9216              0                  0                483840
Embedded multiplier 9-bit elements         0          0          4                 0                  0                70
PLLs                                       0          0          0                 0                  0                4
Global clocks                              10         10         10                10                 10               16
For the given configuration, the ROM based architectures minimize the use of logic elements, especially the Large ROM, which has the lowest usage of this resource.
Notice that the overall resource usage of the multiplier based architecture is close to that of the Small ROM, which suggests that it implements a ROM-like NCO strategy. Recall that during simulations this architecture produced an output similar to that of the Large ROM.
The next charts compare the LE usage of these architectures.
Figure 36: Logic element usage per architecture (total LABs partially or completely used; register only LEs).
CORDIC based architectures use the highest amount of this resource: they calculate trigonometric functions using only shifts and additions/subtractions, so for the same design features they consume the most LEs.
Notice that the serial CORDIC has the lowest LAB usage. It achieves a more efficient resource usage because the same arithmetic hardware is reused in every iteration.
Logic elements by mode
CORDIC architectures are arithmetic intensive, so they use the most arithmetic mode LEs.
Figure 37: Logic elements by mode (normal mode, arithmetic mode) per architecture.
In contrast, ROM based architectures require far fewer arithmetic mode LEs (which are used to implement adders, counters, accumulators and comparators), since their operation requires little computation for each output sample.
Logic elements by LUT inputs
Figure 38: Logic elements by number of LUT inputs (4 input functions, 3 input functions, <=2 input functions, register only) per architecture.
Memory use
The Large ROM architecture uses the most memory bits (208), followed by the multiplier based architecture (156); the Small ROM uses much less of this resource (48).
As the plug-in user guide states, the CORDIC architectures do not require M4K memory, which makes them suitable when memory is at a premium.
Figure 39: Total block memory bits per architecture.
Although the Large ROM uses the most memory bits, it has the most efficient implementation, with the lowest memory fragmentation. This can be seen in the M4K usage, which ultimately determines the physical resource usage.
Notice that the multiplier based NCO uses the most M4K blocks (two).
Figure 40: M4K blocks used per architecture.
The multiplier based architecture provides an output similar to that of the Large ROM while using about the same amount of logic resources as the Small ROM. In many applications the embedded multipliers are not fully used (recall that this implementation requires only 4 of the 70 embedded 9-bit multiplier elements that the Cyclone II provides). Therefore, this architecture should be considered as an alternative to the ROM based ones once the resource usage of the rest of the circuits on the FPGA is known.
Implementation
To test the operation of the NCO blocks, their binary output had to be converted into values that can be shown on the DE2 seven-segment displays. Three hexadecimal-to-seven-segment decoders display the least significant bits, and a negated output drives one of the seven-segment display LEDs to show the sign.
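The split between the sign indicator and the three hexadecimal digits can be sketched as follows, assuming (as the 13-bit magnitude precision implies) that bit 12 is the sign and bits 11 to 0 feed the three decoders. The function names are hypothetical. The simulation listings for the Large ROM appear to use the full 13-bit two's complement pattern printed as four hex digits (for example, -1567 becomes 19E1), which the second helper reproduces.

```python
def display_digits(sample, p_bits=13):
    """Split a P-bit two's complement sample into (sign bit, 3 hex digits)."""
    raw = sample & ((1 << p_bits) - 1)     # two's complement bit pattern
    sign = raw >> (p_bits - 1)             # MSB drives the sign indicator
    low = raw & ((1 << (p_bits - 1)) - 1)  # 12 LSBs feed three hex decoders
    return sign, format(low, "03X")


def as_hex13(sample, p_bits=13):
    """Full P-bit pattern as 4 hex digits, as in the simulation listings."""
    return format(sample & ((1 << p_bits) - 1), "04X")


if __name__ == "__main__":
    for s in (0, 1567, -1567, 4095, -4095):
        print(s, display_digits(s), as_hex13(s))
```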
Figure 41
The figure shows an output sample displayed on the DE2 board.
With the given configuration and operating values, the generated NCO blocks produce a minimum of 16 samples per output cycle. So that the samples could be observed on the displays, a 23-bit frequency divider was implemented to reduce the input clock frequency.
The tuning words were loaded into a ROM of two 32-bit words whose address bus was connected to a DE2 switch, so the switch selects one of the two tuning words at any moment during NCO operation. To compile the ROM with the desired content, its design block was set to load its initial values from a memory initialization file (.mif).
The following example shows a memory initialization file containing the two tuning words.
WIDTH=32;
DEPTH=2;
ADDRESS_RADIX=UNS;
DATA_RADIX=UNS;
CONTENT BEGIN
    0 : 42949673;
    1 : 128849019;
END;
Each line between CONTENT BEGIN and END assigns a data value to a memory address.
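Writing the file by hand is fine for two words; for longer tables a short script avoids typos. The sketch below (the function name is hypothetical) emits the same header keywords as the example above and recomputes the tuning words with round(f_out · 2^32 / F_c), assuming the 100 Hz clock used in the simulations. Its output can be redirected into a .mif file.

```python
def make_mif(words, width=32):
    """Return the text of a Quartus memory initialization file (.mif)."""
    lines = [
        f"WIDTH={width};",
        f"DEPTH={len(words)};",
        "ADDRESS_RADIX=UNS;",
        "DATA_RADIX=UNS;",
        "CONTENT BEGIN",
    ]
    for address, word in enumerate(words):
        if not 0 <= word < 2 ** width:
            raise ValueError(f"{word} does not fit in {width} bits")
        lines.append(f"    {address} : {word};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    f_clk = 100  # assumed NCO clock in Hz
    words = [round(f * 2**32 / f_clk) for f in (1, 3)]
    print(make_mif(words), end="")
```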
The following figure shows the test design used for all the architectures except the serial CORDIC.
Figure 43
2. Hexadecimal to 7 segments decoders
Figure 44
3. ROM memory block
Figure 45
4. Frequency divider
Figure 46
Serial CORDIC implementation
Recall that during simulations it was necessary to attach latches to the serial CORDIC NCO to show only valid output values; therefore, the implementation design had to incorporate them as well.
Figure 47
Once each design was implemented, the generated sample sequence could be checked against the one obtained by simulation. Since the values were displayed in hexadecimal during the implementation tests, the simulation outputs had to be converted to the same format.
The implementation output values were recorded on video during two output cycles (using both tuning words) and then transcribed using the VLC video player.
Large ROM
Figure 48
Simulation values: 0000-061F-0B50-0EC7-0FFF-0EC7-0B50-061F-0000-19E1-14B0-1139-1001-1139-14B0-19E1-0000
Obtained values: 0000-061F-0B50-0EC7-0FFF-0EC7-0B50-0000-19E1-14B0-1139-1001-1139-14B0-19E1-0000
Small ROM
Multiplier based
Parallel CORDIC
Serial CORDIC
Conclusions
In all cases, the values obtained by simulation were equal to those gathered during the implementation tests. This shows that the simulation models are reliable, even when IP modules are used.
Also, the values obtained at the 1 Hz output are the same as those obtained at 3 Hz.
The values obtained with the Large ROM architecture are the same as those of the multiplier based one, so these modules can replace each other. The same holds for the serial and parallel CORDIC.
The values produced by the Small ROM architecture agree with those obtained from its equations, but their durations do not.
BIBLIOGRAPHY