Codec 2: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
 
(48 intermediate revisions by 26 users not shown)
Line 1: Line 1:
{{Short description|Low-bitrate speech encoding format}}
'''Codec 2''' is a low-bitrate speech audio [[codec]] ([[speech coding]]) that is [[patent]] free and [[Open-source software|open source]].<ref>{{cite web|url=http://www.tapr.org/pdf/DCC2011-Codec2-VK5DGR.pdf|title=DCC2011-Codec2-VK5DGR}}</ref> Codec 2 compresses speech using [[sinusoidal]] coding, a method specialized for human [[speech]]. Bit rates of 3200 to 450&nbsp;bit/s have been successfully created. Codec 2 was designed to be used for [[amateur radio]] and other high compression voice applications.
{{Infobox software
| title = Codec 2
| name = Codec 2
| screenshot =
| developer = David Grant Rowe
| released = {{Start date|2010|08|25}}<!-- v0.1 -->
| latest release version = 1.2.0
| latest release date = {{Date and age|2023|06|24}}
| programming language = [[C99]]
| platform = [[Cross-platform]]
| genre = [[Audio codec]]
| license = [[GNU LGPL]], v2.1
| website = {{URL|1=https://www.rowetel.com/?page_id=452 }}
| repo = {{URL|https://github.com/drowe67/codec2}}
}}
'''Codec 2''' is a low-bitrate speech audio [[codec]] ([[speech coding]]) that is [[patent]] free and [[Open-source software|open source]].<ref>{{Cite web|url=http://www.tapr.org/pdf/DCC2011-Codec2-VK5DGR.pdf|title=DCC2011-Codec2-VK5DGR}}</ref> Codec 2 compresses speech using [[sinusoidal]] coding, a method specialized for human [[speech]]. Bit rates of 3200 to 450&nbsp;bit/s have been successfully created. Codec 2 was designed to be used for [[amateur radio]] and other high compression voice applications.


==Overview==
The codec was developed by David Rowe, with support and cooperation of other researchers (e.g., Jean-Marc Valin from [[Opus (audio format)|Opus]]).<ref name=jmspeex>{{cite web|url=http://jmspeex.livejournal.com/10446.html|title=A Pitch-Energy Quantizer for Codec2|deadurl=yes|archiveurl=https://web.archive.org/web/20150619052003/http://jmspeex.livejournal.com/10446.html|archivedate=2015-06-19|df=}}</ref>
The codec was developed by David Grant Rowe, with support and cooperation of other researchers (e.g., Jean-Marc Valin from [[Opus (audio format)|Opus]]).<ref name=jmspeex>{{cite web|url=http://jmspeex.livejournal.com/10446.html|title=A Pitch-Energy Quantizer for Codec2|url-status=dead|archive-url=https://web.archive.org/web/20150619052003/http://jmspeex.livejournal.com/10446.html|archive-date=2015-06-19}}</ref>


Codec 2 consists of 3200, 2400, 1600, 1400, 1300, 1200, 700 and 450&nbsp;bit/s codec modes. It outperforms most other low-bitrate [[speech coding|speech codec]]s. For example, it uses half the bandwidth of [[Advanced Multi-Band Excitation]] to encode speech with similar quality. The speech codec uses 16-bit [[pulse-code modulation|PCM]] sampled audio, and outputs packed digital bytes. When sent packed digital bytes, it outputs PCM sampled audio. The audio sample rate is fixed at 8&nbsp;kHz.
Codec 2 consists of 3200, 2400, 1600, 1400, 1300, 1200, 700 and 450&nbsp;bit/s codec modes. It outperforms most other low-bitrate [[speech coding|speech codec]]s. For example, it uses half the bandwidth of [[Advanced Multi-Band Excitation]] to encode speech with similar quality.{{Citation needed|date=September 2020}} The speech codec uses 16-bit [[pulse-code modulation|PCM]] sampled audio, and outputs packed digital bytes. When sent packed digital bytes, it outputs PCM sampled audio. The audio sample rate is fixed at 8&nbsp;kHz.


The [[reference implementation]] is open source and is freely available in a [[Apache Subversion|subversion]] (SVN) repository.<ref name=svn>{{cite web|url=http://sourceforge.net/p/freetel/code/HEAD/tree/|title=Repository for Codec2 Source}}</ref> The source code is released under the terms of version 2.1 of the [[GNU Lesser General Public License]] (LGPL).<ref>{{cite web|url=https://slashdot.org/story/10/09/21/0428259|title=Codec2 – an Open Source, Low-Bandwidth Voice Codec|website=Slashdot}}</ref> It is programmed in [[C (programming language)|C]] and so far doesn't work without [[floating-point arithmetic]], although the algorithm itself does not require this. The reference software package also includes a frequency-division multiplex digital voice (FDMDV) software modem and a graphical user interface based on [[FLTK]]. The software is developed on [[Linux]] and a port for [[Microsoft Windows]] created with [[Cygwin]] is offered in addition to a Linux version.
The [[reference implementation]] is open source and is freely available in a [[GitHub]] repository.<ref name=git>{{cite web|url=https://github.com/drowe67/codec2|title=Repository for Codec 2 Source|website=[[GitHub]] |date=14 October 2021}}</ref> The source code is released under the terms of version 2.1 of the [[GNU Lesser General Public License]] (LGPL).<ref>{{cite web|url=https://slashdot.org/story/10/09/21/0428259|title=Codec2 – an Open Source, Low-Bandwidth Voice Codec|website=Slashdot|date=21 September 2010 }}</ref> It is programmed in [[C (programming language)|C]] and current source code requires [[floating-point arithmetic]], although the algorithm itself does not require this. The reference software package also includes a [[frequency-division multiplex]] digital voice software modem and a graphical user interface based on [[WxWidgets]]. The software is developed on [[Linux]] and a port for [[Microsoft Windows]] created with [[Cygwin]] is offered in addition to an Apple [[MacOS]] version.


The codec has been presented in various conferences and has received the 2012 [[American Radio Relay League|ARRL]] Technical Innovation Award,<ref>[http://www.arrl.org/news/arrl-board-of-directors-names-award-recipients-at-2012-second-meeting ARRL Technical Innovation Award in 2012]</ref> and the Linux Australia Conference's Best Presentation Award.<ref>{{Cite web|url=http://lca2012.linux.org.au/schedule/59/view_talk?day=tuesday|title=Linux Australia 2012 conference|access-date=2012-08-02|archive-url=https://archive.is/20121129131157/http://lca2012.linux.org.au/schedule/59/view_talk?day=tuesday|archive-date=2012-11-29|dead-url=yes}}</ref>
The codec has been presented in various conferences and has received the 2012 [[American Radio Relay League|ARRL]] Technical Innovation Award,<ref>{{Cite web|url=http://www.arrl.org/news/arrl-board-of-directors-names-award-recipients-at-2012-second-meeting|title=ARRL Board of Directors Names Award Recipients at 2012 Second Meeting|website=www.arrl.org}}</ref> and the Linux Australia Conference's Best Presentation Award.<ref>{{Cite web|url=http://lca2012.linux.org.au/schedule/59/view_talk?day=tuesday|title=Linux Australia 2012 conference|access-date=2012-08-02|archive-url=https://archive.today/20121129131157/http://lca2012.linux.org.au/schedule/59/view_talk?day=tuesday|archive-date=2012-11-29|url-status=dead}}</ref>

===Non-Coherent PSK===
Rowe has also created a [[frequency-division multiplex]] (FDM) modem which carries the digital voice (DV) in only 1.3&nbsp;kHz of radio bandwidth.<ref>{{cite web|title=FDMDV Modem|url=http://www.rowetel.com/blog/?page_id=2458}}</ref> The codec and FDM modem are used every day on amateur radio shortwave bands using both the SM1000 hardware implementation, and the FreeDV application.

This modem operates at 50&nbsp;Baud with a bit rate of 1600&nbsp;bit/s. This is sent using sixteen QPSK FDM carriers (2&nbsp;bits each), or 32&nbsp;bits 50 times a second. 64&nbsp;bits are needed to make a vocoder frame, thus it has a 25&nbsp;Hz effective rate. The 64&nbsp;bits contain 52&nbsp;bits of vocoder data, and 12&nbsp;bits of Forward Error Correction (Golay). Thus an effective 1300&nbsp;bit/s is used for the vocoder. A separate BPSK carrier is sent in the middle of the spectrum (1500&nbsp;Hz) for synchronization.
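
The rate arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch of the calculation using the figures quoted in the text; the variable names are illustrative and are not taken from the FDMDV source code.

```python
# Figures quoted in the paragraph above (names are illustrative only).
SYMBOL_RATE_BAUD = 50        # symbols per second on each carrier
QPSK_CARRIERS = 16           # data carriers (the BPSK sync carrier is excluded)
BITS_PER_QPSK_SYMBOL = 2
FRAME_BITS = 64              # bits needed for one vocoder frame
VOCODER_BITS = 52            # speech payload per frame
FEC_BITS = 12                # Golay forward error correction per frame

# 16 carriers x 2 bits are sent in every symbol period.
bits_per_symbol_period = QPSK_CARRIERS * BITS_PER_QPSK_SYMBOL

# Raw channel bit rate across all data carriers.
raw_bit_rate = bits_per_symbol_period * SYMBOL_RATE_BAUD

# Two symbol periods fill one 64-bit frame, so frames arrive at half the symbol rate.
frames_per_second = raw_bit_rate // FRAME_BITS

# Only the vocoder bits count towards the effective speech rate.
vocoder_bit_rate = frames_per_second * VOCODER_BITS

print(bits_per_symbol_period, raw_bit_rate, frames_per_second, vocoder_bit_rate)
```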

The ITU emission designation is J2E for phone payload, and J2D for data payload.

===Coherent PSK===
A second FDM modem waveform was developed for the 700&nbsp;bit/s vocoder. This modem operates at a symbol rate of 75&nbsp;Baud, using coherent quadrature phase-shift keying (QPSK) with seven subcarriers. A duplicate set of subcarriers is used as a diversity channel to combat the effects of fading in shortwave propagation. The modem still performs well with a tuning error of ±40&nbsp;Hz.

The FDM modem sends and receives a row of subcarriers 75 times a second, and six of these rows make up a modem frame: first two pilot reference-phase rows (28&nbsp;bits), then two rows for the first speech vocoder frame (28&nbsp;bits), and finally two rows for the second speech vocoder frame (28&nbsp;bits). The sequence repeats for as long as the transmitter Push-To-Talk (PTT) is keyed.

Thus, a modem frame is 84&nbsp;bits in total: 56&nbsp;bits are used for speech and 28&nbsp;bits for the reference-phase pilots. These pilots are what make this a coherent modem; they are used to correct the phases of the received data bits. Each row of 14&nbsp;bits is sent as seven QPSK carriers (2&nbsp;bits per carrier), so the raw data rate is 1050&nbsp;bit/s (75&nbsp;Baud × 14&nbsp;bits). The effective speech data rate is 700&nbsp;bit/s (75&nbsp;Baud / 6 = 12.5 modem frames per second, × 56&nbsp;bits).

The modem timings are also relevant, in that each speech vocoder frame outputs 28&nbsp;bits every 40&nbsp;ms. Since the modem has an 80&nbsp;ms modem frame, it can transport two speech vocoder frames.
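
The frame arithmetic for the coherent modem can be reproduced as follows. This is a small sketch using only the numbers stated above; the names are illustrative and do not come from the modem source code.

```python
from fractions import Fraction

# Figures quoted in the text above (names are illustrative only).
ROWS_PER_SECOND = 75         # one row of subcarriers per symbol period
SUBCARRIERS = 7              # QPSK carriers per row (diversity copy not counted)
BITS_PER_CARRIER = 2
ROWS_PER_MODEM_FRAME = 6     # 2 pilot rows + 2 x 2 speech rows
PILOT_ROWS = 2

bits_per_row = SUBCARRIERS * BITS_PER_CARRIER                     # 14
modem_frame_bits = ROWS_PER_MODEM_FRAME * bits_per_row            # 84
pilot_bits = PILOT_ROWS * bits_per_row                            # 28
speech_bits = modem_frame_bits - pilot_bits                       # 56

raw_bit_rate = ROWS_PER_SECOND * bits_per_row                     # 1050 bit/s
frames_per_second = Fraction(ROWS_PER_SECOND, ROWS_PER_MODEM_FRAME)  # 12.5
speech_bit_rate = frames_per_second * speech_bits                 # 700 bit/s
modem_frame_ms = 1000 / frames_per_second                         # 80 ms

# Each 80 ms modem frame carries two 28-bit vocoder frames of 40 ms each.
vocoder_frames_per_modem_frame = speech_bits // 28

print(raw_bit_rate, float(frames_per_second), int(speech_bit_rate), int(modem_frame_ms))
```

Using `Fraction` keeps the 12.5 frames per second exact, so the derived 700&nbsp;bit/s and 80&nbsp;ms figures are integers rather than rounded floats.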

Each row consists of 100 complex IQ (in-phase and quadrature-phase) audio samples at a 7500&nbsp;Hz rate, giving 600 samples per modem frame (100 × 6 × 12.5 modem frames per second = 7500 samples per second). A rate conversion filter provides the application with an 8&nbsp;kHz interface, which is much more compatible with sound cards; at the 8&nbsp;kHz rate a modem frame is 640 complex audio samples. This rate conversion would not be necessary in firmware.

The FDM modem operates with a center frequency of 1500&nbsp;Hz. The initial FDM subcarrier frequencies are set using a spreading function, which widens the spacing slightly with each subcarrier further to the left: from about 105&nbsp;Hz apart on the right to about 109&nbsp;Hz apart on the left. This design, along with spectrum clipping, improves the peak-to-average power ratio (PAPR). The measured crest factor is about 8.3&nbsp;dB with clipping, and about 10.3&nbsp;dB without clipping.

The bandwidth of the FDM modem waveform depends on whether the diversity channel is enabled: each group of seven subcarriers occupies about 750&nbsp;Hz. Diversity is normally used on shortwave and is optional on VHF and above.

The ITU emission designation is J2E for phone payload, and J2D for data payload.

===Orthogonal PSK===
In 2019, a third modem was released, based on [[Orthogonal frequency-division multiplexing]] (OFDM). This modem operates at 50&nbsp;baud, with a default of 17&nbsp;[[QPSK]] carriers. This parameter and many others were made adjustable to accommodate other OFDM waveform designs. With 17&nbsp;carriers it uses a [[Cyclic prefix]] duration of 2&nbsp;ms and a symbol time of 18&nbsp;ms, so each transmitted symbol lasts 20&nbsp;ms (hence 50&nbsp;baud), while the 18&nbsp;ms symbol time alone corresponds to a modulation symbol rate of 55.556&nbsp;baud. At a sampling rate of 8&nbsp;kHz this gives 144&nbsp;symbol samples and 16&nbsp;cyclic prefix samples, for a total of 160&nbsp;samples.

Unlike many other OFDM designs, this modem uses multiple data rows to send all the bits. With 17&nbsp;carriers this results in seven data rows carrying 238&nbsp;bits in total. These bits contain four 700&nbsp;bit/s vocoder words of 28&nbsp;bits each and the same number of [[Low-density parity-check code]] (LDPC) bits, plus a text bit and a unique sync word. Each data packet is preceded by a 19-carrier [[BPSK]] pilot signal. The two extra carriers are used to bracket each QPSK carrier with three pilots, which are averaged to provide coherency.
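
The sample counts and packet size quoted for the OFDM modem follow directly from the stated parameters. The sketch below recomputes them; the names are illustrative and are not taken from the modem source, and the count of bits left over after the vocoder and LDPC bits is obtained here by subtraction rather than stated in the text.

```python
# Figures quoted in the two paragraphs above (names are illustrative only).
SAMPLE_RATE_HZ = 8000
SYMBOL_TIME_MS = 18
CYCLIC_PREFIX_MS = 2
CARRIERS = 17
BITS_PER_QPSK_SYMBOL = 2
DATA_ROWS = 7
VOCODER_WORDS = 4
VOCODER_WORD_BITS = 28

# Samples per symbol at 8 kHz, with and without the cyclic prefix.
symbol_samples = SAMPLE_RATE_HZ * SYMBOL_TIME_MS // 1000      # 144
prefix_samples = SAMPLE_RATE_HZ * CYCLIC_PREFIX_MS // 1000    # 16
total_samples = symbol_samples + prefix_samples               # 160

# Symbol rates: 18 ms useful symbol time vs. 20 ms including the prefix.
modulation_rate_baud = round(1000 / SYMBOL_TIME_MS, 3)
transmitted_rate_baud = 1000 / (SYMBOL_TIME_MS + CYCLIC_PREFIX_MS)

# Bits carried by one packet of seven data rows.
packet_bits = CARRIERS * BITS_PER_QPSK_SYMBOL * DATA_ROWS     # 238
vocoder_bits = VOCODER_WORDS * VOCODER_WORD_BITS              # 112
ldpc_bits = vocoder_bits                                      # "the same number"
# Derived by subtraction: bits remaining for the text and sync-word fields.
remaining_bits = packet_bits - vocoder_bits - ldpc_bits

print(total_samples, modulation_rate_baud, transmitted_rate_baud, packet_bits, remaining_bits)
```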

The ITU emission designation is J2E for phone payload, and J2D for data payload.


==Technology==
Internally, parametric audio coding algorithms operate on 10&nbsp;ms PCM frames using a model of the human voice. Each of these audio segments is declared [[voiced]] (vowel) or unvoiced (consonant).


Codec 2 uses sinusoidal coding to model speech, which is closely related to that of [[multi-band excitation]] codecs. Sinusoidal coding is based on regularities (periodicity) in the pattern of overtone frequencies and layers harmonic sinusoids. Spoken audio is recreated by modelling speech as a sum of harmonically related sine waves with independent amplitudes called [[Line spectral pairs]], or LSP, on top of a determined [[fundamental frequency]] of the speaker's voice (pitch). The (quantised) pitch and the amplitude (energy) of the [[harmonics]] are encoded, and with the LSP's are exchanged across a channel in a digital format. The LSP coefficients represent the [[Linear predictive coding|Linear Predictive Coding]] (LPC) model in the frequency domain, and lend themselves to a robust and efficient quantisation of the LPC parameters.<ref name="thesis">{{cite web|url=http://www.itr.unisa.edu.au/~steven/thesis/dgr.pdf|title=Techniques for Harmonic Sinusoidal Coding}}</ref>
Codec 2 uses [[Sinusoidal model|sinusoidal coding]] to model speech, which is closely related to that of [[multi-band excitation]] codecs. Sinusoidal coding is based on regularities (periodicity) in the pattern of overtone frequencies and layers harmonic sinusoids. Spoken audio is recreated by modelling speech as a sum of harmonically related sine waves with independent amplitudes called [[Line spectral pairs]], or LSP, on top of a determined [[fundamental frequency]] of the speaker's voice (pitch). The (quantised) pitch and the amplitude (energy) of the [[harmonics]] are encoded, and with the LSP's are exchanged across a channel in a digital format. The LSP coefficients represent the [[Linear predictive coding|Linear Predictive Coding]] (LPC) model in the frequency domain, and lend themselves to a robust and efficient quantisation of the LPC parameters.<ref name="thesis">{{cite web|url=http://www.itr.unisa.edu.au/~steven/thesis/dgr.pdf|title=Techniques for Harmonic Sinusoidal Coding|access-date=2013-04-12|archive-date=2013-05-15|archive-url=https://web.archive.org/web/20130515225424/http://www.itr.unisa.edu.au/~steven/thesis/dgr.pdf|url-status=dead}}</ref>


The digital bytes are in a bit-field format that have been packed together into bytes. These bit fields are also optionally [[gray code]]d before being grouped together. The gray coding may be useful if sending raw, but normally an application will just burst the bit fields out. The bit fields make up the various parameters that are stored or exchanged (pitch, energy, voicing booleans, LSP's, etc.).
The digital bytes are in a bit-field format that have been packed together into bytes. These bit fields are also optionally [[gray code]]d before being grouped together. The gray coding may be useful if sending raw, but normally an application will just burst the bit fields out. The bit fields make up the various parameters that are stored or exchanged (pitch, energy, voicing Booleans, LSP's, etc.).


For example, Mode 3200, has 20&nbsp;ms of audio converted to 64&nbsp;Bits. So 64&nbsp;Bits will be output every 20&nbsp;ms (50 times a second), for a minimum data rate of 3200&nbsp;bit/s. These 64&nbsp;bits are sent as 8&nbsp;bytes to the application, which has to unwrap the bit fields, or send the bytes over a data channel.
For example, Mode 3200, has 20&nbsp;ms of audio converted to 64&nbsp;bits. So 64&nbsp;bits will be output every 20&nbsp;ms (50 times a second), for a minimum data rate of 3200&nbsp;bit/s. These 64&nbsp;bits are sent as 8&nbsp;bytes to the application, which has to unwrap the bit fields, or send the bytes over a data channel.


Another example is Mode 1300, which is sent 40&nbsp;ms of audio, and outputs 52&nbsp;Bits every 40&nbsp;ms (25 times a second), for a minimum rate of 1300&nbsp;bit/s. These 52&nbsp;bits are sent as 7&nbsp;bytes to the application or data channel.
Another example is Mode 1300, which is sent 40&nbsp;ms of audio, and outputs 52&nbsp;bits every 40&nbsp;ms (25 times a second), for a minimum rate of 1300&nbsp;bit/s. These 52&nbsp;bits are sent as 7&nbsp;bytes to the application or data channel.


== Adoption ==
Codec 2 is currently used in several radios and Software Defined Radio Systems


* FreeDV<ref>{{cite web|title=FreeDV|url=http://freedv.org/tiki-index.php}}</ref>
* FreeDV<ref>{{Cite web|url=https://freedv.org/|title=FreeDV: Open Source Amateur Digital Voice – Where Amateur Radio Is Driving The State of the Art}}</ref>
* FlexRadio 6000 series<ref>{{cite web |title=FreeDV, CODEC2 and the WaveformAPI |url=http://www.flex-radio.nl/flex-6000-serie/the-flex-insider |access-date=2015-03-06 |archive-url=https://web.archive.org/web/20150402184229/http://www.flex-radio.nl/flex-6000-serie/the-flex-insider/ |archive-date=2015-04-02 |dead-url=yes }}</ref>
* FlexRadio 6000 series<ref>{{cite web |title=FreeDV, CODEC2 and the WaveformAPI |url=http://www.flex-radio.nl/flex-6000-serie/the-flex-insider |access-date=2015-03-06 |archive-url=https://web.archive.org/web/20150402184229/http://www.flex-radio.nl/flex-6000-serie/the-flex-insider/ |archive-date=2015-04-02 |url-status=dead }}</ref>
* SM1000<ref>{{cite web|title=Introducing the SM1000 Smart Mic|url=http://www.rowetel.com/blog/?p=3125}}</ref>
* SM1000<ref>{{Cite web|url=http://www.rowetel.com/?p=3125|title=Introducing the SM1000 Smart Mic – Rowetel|date=21 May 2014 }}</ref>
* Quisk<ref>{{Cite web|url=https://james.ahlstrom.name/quisk/|title=Quisk, A Software Defined Radio (SDR)|website=james.ahlstrom.name}}</ref>
* [[M17 (amateur radio)|M17 Project]]<ref>{{cite web|title=M17 protocol description|website=[[GitHub]] |url=https://github.com/M17-Project/M17_spec}}</ref>


Codec2 has also been integrated into [[FreeSWITCH]] and there's a [[patch (computing)|patch]] available for support in [[Asterisk (PBX)|Asterisk]].


There is a FM-to-Codec2 digital voice repeater in earth orbit on amateur radio [[CubeSat]] ''LilacSat-1'' (call sign ON02CN, [[QB50]] constellation), which was launched and subsequently deployed from the [[International Space Station]] in 2017.<ref name="ARRL2017">{{cite web | title=QB-50 Constellation Satellites Deployed from ISS | website=American Radio Relay League website | date=2017-11-15 | url=http://www.arrl.org/news/qb-50-constellation-satellites-deployed-from-iss | access-date=2019-03-31}}</ref>
There was an FM-to-Codec2 digital voice repeater in earth orbit on amateur radio [[CubeSat]] ''LilacSat-1'' (call sign ON02CN, [[QB50]] constellation), which was launched and subsequently deployed from the [[International Space Station]] in 2017.<ref name="ARRL2017">{{cite web | title=QB-50 Constellation Satellites Deployed from ISS | website=American Radio Relay League website | date=2017-11-15 | url=http://www.arrl.org/news/qb-50-constellation-satellites-deployed-from-iss | access-date=2019-03-31}}</ref>


==History==
The prominent [[free software]] advocate and [[amateur radio operator|radio amateur]] [[Bruce Perens]] lobbied for the creation of a free speech codec for operation at less than 5&nbsp;kBit/s. Since he did not have the background himself, he approached Jean-Marc Valin in 2008, who introduced him to lead developer David Grant Rowe, who has worked with Valin on [[Speex]] on several occasions. Rowe himself is also a radio amateur (amateur radio [[call sign]] VK5DGR) and has experience in creating and using voice codecs and other signal processing algorithms for speech signals. He obtained a PhD in speech coding in the 1990s and was involved in the development of one of the first [[satellite phone|satellite telephony]] systems ([[Mobilesat]]).
The prominent [[free software]] advocate and [[amateur radio operator|radio amateur]] [[Bruce Perens]] lobbied for the creation of a free speech codec for operation at less than 5&nbsp;kbit/s. Since he did not have the background himself, he approached Jean-Marc Valin in 2008, who introduced him to lead developer David Grant Rowe, who has worked with Valin on [[Speex]] on several occasions. Rowe himself was also a radio amateur (amateur radio [[call sign]] VK5DGR) and had experience in creating and using voice codecs and other signal processing algorithms for speech signals. He obtained a PhD in speech coding in the 1990s and was involved in the development of one of the first [[satellite phone|satellite telephony]] systems ([[Mobilesat]]).


He agreed to the task and announced his decision to work on a format on August 21, 2009. He built on the research and findings from his doctoral thesis<ref>http://www.itr.unisa.edu.au/~steven/thesis/dgr.pdf</ref>.<ref>http://www.rowetel.com/blog/?p=128</ref> The underlying sinusoidal modelling goes back to developments by Robert J. McAulay and Thomas F. Quatieri (MIT Lincoln labs) from the mid-1980s.
He agreed to the task and announced his decision to work on a format on August 21, 2009. He built on the research and findings from his doctoral thesis.<ref>{{Cite web |url=http://www.itr.unisa.edu.au/~steven/thesis/dgr.pdf |title=Techniques for Harmonic Sinusoidal Coding |access-date=2013-04-12 |archive-date=2013-05-15 |archive-url=https://web.archive.org/web/20130515225424/http://www.itr.unisa.edu.au/~steven/thesis/dgr.pdf |url-status=dead }}</ref><ref>{{Cite web|url=http://www.rowetel.com/blog/?p=128|title = Open Source Low Rate Speech Codec Part 1 – Rowetel| date=21 August 2009 }}</ref> The underlying sinusoidal modelling goes back to developments by Robert J. McAulay and Thomas F. Quatieri (MIT Lincoln labs) from the mid-1980s.


In August 2010, David Rowe published version 0.1 alpha.<ref>http://www.rowetel.com/blog/?p=839</ref> Version 0.2 was released towards the end of 2011, introducing a mode with 1,400&nbsp;bits/s and significant improvements in quantization.
In August 2010, David Rowe published version 0.1 alpha.<ref>{{Cite web|url=http://www.rowetel.com/blog/?p=839|title = Codec2 V0.1 Alpha Released – Rowetel| date=25 August 2010 }}</ref> Version 0.2 was released towards the end of 2011, introducing a mode with 1,400&nbsp;bits/s and significant improvements in quantization.


In January 2012, at [[linux.conf.au]], Jean-Marc Valin helped improve the quantization of line spectral pairs, which Rowe is less familiar with.<ref>http://jmspeex.livejournal.com/10446.html</ref> After several changes to the available bit rate modes in winter and spring 2011/2012, 2,400, 1,400 and 1,200&nbsp;bit/s modes have been available since May.
In January 2012, at [[linux.conf.au]], Jean-Marc Valin helped improve the quantization of line spectral pairs, which Rowe is less familiar with.<ref>{{Cite web|url=http://jmspeex.livejournal.com/10446.html|title = A Pitch-Energy Quantizer for Codec2}}</ref> After several changes to the available bit rate modes in winter and spring 2011/2012, 2,400, 1,400 and 1,200&nbsp;bit/s modes were available after May of that year.


Codec 2 700C, a new mode with a bit rate of 700&nbsp;bit/s, was finished in early 2017.<ref name="Slashdot170113">{{cite web | title=Open Source Codec Encodes Voice Into Only 700 Bits Per Second | website=Slashdot | url=https://slashdot.org/story/17/01/13/2037254 | access-date=2019-03-31}}</ref>
Codec 2 700C, a new mode with a bit rate of 700&nbsp;bit/s, was finished in early 2017.<ref name="Slashdot170113">{{cite web | title=Open Source Codec Encodes Voice Into Only 700 Bits Per Second | website=Slashdot | date=13 January 2017 | url=https://slashdot.org/story/17/01/13/2037254 | access-date=2019-03-31}}</ref>


In July 2018 an experimental 450&nbsp;bit/s mode was demonstrated, which was developed as part of a master thesis at the University of Erlangen-Nuremberg. By clever training of the vector quantization the data rate could be further reduced based on the principle of the 700C mode.<ref name="Southgate2018">{{cite web | title=Codec2 HF digital voice at 450 bps | website=Southgate Amateur Radio News | date=2018-07-08 | url=http://southgatearc.org/news/2018/july/codec2-hf-digital-voice-at-450-bps.htm | access-date=2019-03-31}}</ref>
In July 2018 an experimental 450&nbsp;bit/s mode was demonstrated, which was developed as part of a master thesis at the University of Erlangen-Nuremberg. By clever training of the vector quantization the data rate could be further reduced based on the principle of the 700C mode.<ref name="Southgate2018">{{cite web | title=Codec2 HF digital voice at 450 bps | website=Southgate Amateur Radio News | date=2018-07-08 | url=https://web.archive.org/web/20190331220834/http://southgatearc.org/news/2018/july/codec2-hf-digital-voice-at-450-bps.htm#.XKE6chfP1qY | access-date=2019-03-31}}</ref>


== References ==
Line 80: Line 67:


== External links ==
* [http://www.rowetel.com/?page_id=452 Official website]
* [https://www.rowetel.com/?page_id=452 Official website]
* [https://freedv.org/ FreeDV]
* [http://www.speech.cs.cmu.edu/comp.speech/Section3/speechlinks.html Various Speech Coding Links]
* [http://www.freedv.org FreeDV]


{{Compression formats}}

Latest revision as of 23:06, 23 July 2024

Codec 2
Developer(s): David Grant Rowe
Initial release: August 25, 2010
Stable release: 1.2.0 / June 24, 2023
Repository: github.com/drowe67/codec2
Written in: C99
Platform: Cross-platform
Type: Audio codec
License: GNU LGPL, v2.1
Website: www.rowetel.com/?page_id=452

Codec 2 is a low-bitrate speech audio codec (speech coding) that is patent-free and open source.[1] Codec 2 compresses speech using sinusoidal coding, a method specialized for human speech. Modes with bit rates from 3200 down to 450 bit/s have been implemented. Codec 2 was designed for amateur radio and other high-compression voice applications.

Overview

The codec was developed by David Grant Rowe, with support and cooperation of other researchers (e.g., Jean-Marc Valin from Opus).[2]

Codec 2 provides codec modes at 3200, 2400, 1600, 1400, 1300, 1200, 700 and 450 bit/s. It outperforms most other low-bitrate speech codecs. For example, it uses half the bandwidth of Advanced Multi-Band Excitation to encode speech with similar quality.[citation needed] The encoder takes 16-bit PCM sampled audio as input and outputs packed bytes; the decoder takes packed bytes and outputs PCM sampled audio. The audio sample rate is fixed at 8 kHz.

The reference implementation is open source and is freely available in a GitHub repository.[3] The source code is released under the terms of version 2.1 of the GNU Lesser General Public License (LGPL).[4] It is programmed in C and current source code requires floating-point arithmetic, although the algorithm itself does not require this. The reference software package also includes a frequency-division multiplex digital voice software modem and a graphical user interface based on WxWidgets. The software is developed on Linux and a port for Microsoft Windows created with Cygwin is offered in addition to an Apple MacOS version.

The codec has been presented in various conferences and has received the 2012 ARRL Technical Innovation Award,[5] and the Linux Australia Conference's Best Presentation Award.[6]

Technology

Internally, parametric audio coding algorithms operate on 10 ms PCM frames using a model of the human voice. Each of these audio segments is declared voiced (vowel) or unvoiced (consonant).

Codec 2 uses sinusoidal coding to model speech, which is closely related to the approach of multi-band excitation codecs. Sinusoidal coding exploits the regularity (periodicity) in the pattern of overtone frequencies by layering harmonic sinusoids. Speech is recreated by modelling it as a sum of harmonically related sine waves with independent amplitudes, on top of a determined fundamental frequency of the speaker's voice (pitch). The (quantised) pitch and the amplitude (energy) of the harmonics are encoded and, together with line spectral pairs (LSPs), exchanged across a channel in a digital format. The LSP coefficients represent the Linear Predictive Coding (LPC) model in the frequency domain, and lend themselves to a robust and efficient quantisation of the LPC parameters.[7]

The encoded output consists of bit fields that have been packed together into bytes. These bit fields are optionally Gray coded before being grouped together. Gray coding may be useful when sending the raw bits over a channel, but normally an application will just burst the bit fields out. The bit fields hold the various parameters that are stored or exchanged (pitch, energy, voicing Booleans, LSPs, etc.).
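
To illustrate what Gray coding a bit field means, here is a minimal sketch of the standard binary-reflected Gray code. It is a generic textbook construction, not code taken from the Codec 2 sources, and the function names are hypothetical.

```python
def gray_encode(value: int) -> int:
    """Convert a non-negative binary index to its binary-reflected Gray code."""
    return value ^ (value >> 1)


def gray_decode(code: int) -> int:
    """Invert gray_encode by XOR-ing together all right shifts of the code."""
    value = 0
    while code:
        value ^= code
        code >>= 1
    return value


# Neighbouring indices differ in exactly one bit after encoding, so a single
# bit error tends to move a quantiser index to an adjacent value.
for index in range(8):
    print(index, format(gray_encode(index), "03b"))
```

The single-bit-difference property is what makes the coding attractive for a raw channel: a one-bit error in a Gray-coded quantiser index is more likely to land on a neighbouring quantisation level than on a distant one.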

For example, Mode 3200 converts 20 ms of audio into 64 bits. These 64 bits are output every 20 ms (50 times a second), for a minimum data rate of 3200 bit/s. They are passed as 8 bytes to the application, which has to unwrap the bit fields or send the bytes over a data channel.
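Packing and unwrapping bit fields can be sketched as below. The field widths in the example are hypothetical and are not the actual Codec 2 frame layout; the most-significant-bit-first ordering and the helper names `pack_fields` and `unpack_fields` are likewise assumptions.

```python
def pack_fields(fields):
    """Pack (value, width) pairs MSB-first into bytes, zero-padding the tail."""
    acc = 0
    total_bits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        acc = (acc << width) | value
        total_bits += width
    padding = (-total_bits) % 8  # bits needed to reach a byte boundary
    acc <<= padding
    return acc.to_bytes((total_bits + padding) // 8, "big")


def unpack_fields(data, widths):
    """Recover the field values from bytes produced by pack_fields."""
    acc = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    position = 0
    values = []
    for width in widths:
        position += width
        values.append((acc >> (total_bits - position)) & ((1 << width) - 1))
    return values


# Hypothetical layout: two 1-bit voicing flags, a 7-bit pitch, a 5-bit energy.
packed = pack_fields([(1, 1), (0, 1), (0x55, 7), (0x1F, 5)])
print(packed.hex(), unpack_fields(packed, [1, 1, 7, 5]))
```

Both ends must agree on the field order and widths, since the packed bytes carry no description of their own layout.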

Another example is Mode 1300, which takes 40 ms of audio and outputs 52 bits every 40 ms (25 times a second), for a minimum rate of 1300 bit/s. These 52 bits are passed as 7 bytes (56 bits, of which 4 are padding) to the application or data channel.
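The arithmetic behind these two examples can be checked with a short sketch. The function name `frame_stats` is hypothetical; the figures passed to it are the ones quoted in the text.

```python
def frame_stats(bits_per_frame, frame_ms):
    """Return (bit rate in bit/s, bytes per frame, padding bits per frame)."""
    bit_rate = bits_per_frame * 1000 // frame_ms
    bytes_per_frame = (bits_per_frame + 7) // 8  # round up to whole bytes
    padding_bits = bytes_per_frame * 8 - bits_per_frame
    return bit_rate, bytes_per_frame, padding_bits


print(frame_stats(64, 20))  # Mode 3200: (3200, 8, 0)
print(frame_stats(52, 40))  # Mode 1300: (1300, 7, 4)
```

The quoted rates count only the codec bits. If the padded bytes are sent as they are, Mode 1300 occupies 7 × 8 × 25 = 1400 bit/s on the channel.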

Adoption


Codec 2 is currently used in several radios and software-defined radio systems.

Codec 2 has also been integrated into FreeSWITCH, and a patch is available for support in Asterisk.

An FM-to-Codec 2 digital voice repeater was flown in Earth orbit on the amateur radio CubeSat LilacSat-1 (call sign ON02CN, QB50 constellation), which was launched and subsequently deployed from the International Space Station in 2017.[13]

History


The prominent free software advocate and radio amateur Bruce Perens lobbied for the creation of a free speech codec operating at less than 5 kbit/s. Lacking the background himself, he approached Jean-Marc Valin in 2008. Valin introduced him to David Grant Rowe, who became the lead developer and had worked with Valin on Speex on several occasions. Rowe was also a radio amateur (call sign VK5DGR) and had experience in creating and using voice codecs and other signal processing algorithms for speech signals. He obtained a PhD in speech coding in the 1990s and was involved in the development of one of the first satellite telephony systems (Mobilesat).

Rowe agreed to the task and announced on August 21, 2009 that he would work on the format. He built on the research and findings from his doctoral thesis.[14][15] The underlying sinusoidal modelling goes back to developments by Robert J. McAulay and Thomas F. Quatieri (MIT Lincoln Laboratory) from the mid-1980s.

In August 2010, David Rowe published version 0.1 alpha.[16] Version 0.2 was released towards the end of 2011, introducing a 1,400 bit/s mode and significant improvements in quantization.

In January 2012, at linux.conf.au, Jean-Marc Valin helped improve the quantization of line spectral pairs, an area with which Rowe was less familiar.[17] After several changes to the available bit rate modes in the winter and spring of 2011/2012, modes at 2,400, 1,400 and 1,200 bit/s were available from May 2012.

Codec 2 700C, a new mode with a bit rate of 700 bit/s, was finished in early 2017.[18]

In July 2018, an experimental 450 bit/s mode was demonstrated. It was developed as part of a master's thesis at the University of Erlangen-Nuremberg. Building on the principle of the 700C mode, careful training of the vector quantization reduced the data rate further.[19]

References

  1. ^ "DCC2011-Codec2-VK5DGR" (PDF).
  2. ^ "A Pitch-Energy Quantizer for Codec2". Archived from the original on 2015-06-19.
  3. ^ "Repository for Codec 2 Source". GitHub. 14 October 2021.
  4. ^ "Codec2 – an Open Source, Low-Bandwidth Voice Codec". Slashdot. 21 September 2010.
  5. ^ "ARRL Board of Directors Names Award Recipients at 2012 Second Meeting". www.arrl.org.
  6. ^ "Linux Australia 2012 conference". Archived from the original on 2012-11-29. Retrieved 2012-08-02.
  7. ^ "Techniques for Harmonic Sinusoidal Coding" (PDF). Archived from the original (PDF) on 2013-05-15. Retrieved 2013-04-12.
  8. ^ "FreeDV: Open Source Amateur Digital Voice – Where Amateur Radio Is Driving The State of the Art".
  9. ^ "FreeDV, CODEC2 and the WaveformAPI". Archived from the original on 2015-04-02. Retrieved 2015-03-06.
  10. ^ "Introducing the SM1000 Smart Mic – Rowetel". 21 May 2014.
  11. ^ "Quisk, A Software Defined Radio (SDR)". james.ahlstrom.name.
  12. ^ "M17 protocol description". GitHub.
  13. ^ "QB-50 Constellation Satellites Deployed from ISS". American Radio Relay League website. 2017-11-15. Retrieved 2019-03-31.
  14. ^ "Techniques for Harmonic Sinusoidal Coding" (PDF). Archived from the original (PDF) on 2013-05-15. Retrieved 2013-04-12.
  15. ^ "Open Source Low Rate Speech Codec Part 1 – Rowetel". 21 August 2009.
  16. ^ "Codec2 V0.1 Alpha Released – Rowetel". 25 August 2010.
  17. ^ "A Pitch-Energy Quantizer for Codec2".
  18. ^ "Open Source Codec Encodes Voice Into Only 700 Bits Per Second". Slashdot. 13 January 2017. Retrieved 2019-03-31.
  19. ^ "Codec2 HF digital voice at 450 bps". Southgate Amateur Radio News. 2018-07-08. Retrieved 2019-03-31.