Polytechnic University, Dept. of Electrical and Computer Engineering
EE3414 Multimedia Communication System I
Spring 2006, Yao Wang
___________________________________________________________________________________
Homework 4 (Speech and Audio Coding) Solution
1. Consider a source with 4 symbols {a,b,c,d}. The probabilities of the 4 symbols are
P(a)=0.4, P(b)=0.1, P(c)=0.2, P(d)=0.3.
a. Design a Huffman codebook for these symbols. Determine the average bit rate and compare it
to the entropy of this source.
b. Code the sequence {aacddacbda} using the codebook you designed. Write the resulting binary
bitstream. Calculate the average bit rate.
Solution: (a) Huffman code design:
Symbol   Prob.   Codeword   Length
"a"      0.4     "1"        1
"d"      0.3     "01"       2
"c"      0.2     "001"      3
"b"      0.1     "000"      3
(Huffman tree: "b" (0.1) and "c" (0.2) are merged into a node of prob. 0.3, which is merged with "d" (0.3)
into a node of prob. 0.6, which is merged with "a" (0.4) at the root; labeling the two branches of each
merge with "1" and "0" gives the codewords above.)
Average codeword length: l = 0.4*1 + 0.3*2 + (0.2+0.1)*3 = 1.9 bits/symbol.
Entropy: H = -sum_k p_k log2 p_k = 1.85 bits/symbol. Note that H < l < H+1.
Note that for this problem, you could have assigned “0” to “a” and “1” to the sum of “d,c,b”
which has a prob. of 0.6. Either solution is correct.
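For illustration, the codebook construction can be sketched in a few lines of Python using heapq. The 0/1 labels it assigns may differ from the table above (as noted, either labeling is correct), but the codeword lengths, average length, and entropy come out the same.

```python
import heapq
from math import log2

def huffman_code(probs):
    """Build a Huffman codebook for a dict {symbol: probability}."""
    # Each heap entry: (probability, tie-breaker index, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # lowest-probability node
        p1, _, c1 = heapq.heappop(heap)   # next-lowest node
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

probs = {"a": 0.4, "b": 0.1, "c": 0.2, "d": 0.3}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
entropy = -sum(p * log2(p) for p in probs.values())
print(code)                          # codeword lengths match the table; the 0/1 labels may be complemented
print(avg_len, round(entropy, 2))    # 1.9 1.85
```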
(b) Using the codebook above, the sequence {aacddacbda} is coded into {1100101011001000011}.
The total number of bits is 19. The total number of symbols is 10. So the average bit rate is
19/10 = 1.9 bits/symbol. (Note that, in general, the bit rate for a particular sequence may not be the same
as that calculated from the probabilities. But in this example, the sequence was chosen so that the
symbol frequencies match the given probabilities exactly, so the two average bit rates are the
same.)
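The encoding in part (b) can be reproduced with the codebook designed above (written out explicitly here so the bitstream matches the solution):

```python
codebook = {"a": "1", "d": "01", "c": "001", "b": "000"}
sequence = "aacddacbda"
bitstream = "".join(codebook[s] for s in sequence)
print(bitstream)                        # 1100101011001000011
print(len(bitstream) / len(sequence))   # 1.9 bits/symbol
```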
2. Consider a predictive coding system using delta modulation. The predictor predicts the current sample
using the previously reconstructed sample. The prediction error is quantized according to
Q(e) = +2, if e >= 0
Q(e) = -2, if e < 0
For the sample sequence {3,4,5,3,1,…}
Show the predicted value, the prediction error, the quantized prediction error, and the reconstructed
value, for each sample, starting from the first sample. Assume that the encoder and decoder will use the
value of 2 as the prediction value for the first sample.
Assuming “1” represents e>=0 and “0” represents e<0, also write the binary representation of the coded stream.
What is the bit rate of the coded stream? (in terms of bit/sample)
Solution:
Sample   Original   Predicted value         Prediction error       Quantized   Reconstructed value
index    value      (= previous             (= original value      error       (= predicted value
                    reconstructed value)    - predicted value)                 + quantized error)
1        3          2                        1                     2           4
2        4          4                        0                     2           6
3        5          6                       -1                    -2           4
4        3          4                       -1                    -2           2
5        1          2                       -1                    -2           0
The coded stream is “11000”. The bit rate is 1 bit/sample.
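The table and bitstream can be reproduced with a short sketch of the delta-modulation loop (step size 2 and initial prediction 2, as given in the problem):

```python
def dm_encode(samples, init_pred=2, step=2):
    """Delta-modulation encoder: 1-bit quantizer Q(e) = +step if e >= 0, else -step.
    Returns the bit string and the reconstructed values (which also serve as the predictions)."""
    bits, recon = [], []
    pred = init_pred                       # both encoder and decoder start from 2
    for x in samples:
        e = x - pred                       # prediction error
        q = step if e >= 0 else -step      # quantized error
        bits.append("1" if e >= 0 else "0")
        pred = pred + q                    # reconstructed value = prediction for the next sample
        recon.append(pred)
    return "".join(bits), recon

bits, recon = dm_encode([3, 4, 5, 3, 1])
print(bits)    # 11000
print(recon)   # [4, 6, 4, 2, 0]
```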
3. Explain why predictive coding (DPCM) can reduce the average bit rate compared to coding
each sample directly (PCM).
If the predictor is well designed, the predicted value is close to the original value most of
the time. Therefore, the prediction error is usually small, with occasional large values. Using
entropy coding, such errors can be coded with fewer bits than the original sample values.
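A small illustration with a made-up, highly correlated signal: the first-order prediction error (previous sample as the predictor) takes far fewer, and smaller, distinct values than the samples themselves, so its empirical entropy is much lower.

```python
import numpy as np

# Illustrative only: a hypothetical, highly correlated signal.
rng = np.random.default_rng(0)
x = np.cumsum(rng.integers(-2, 3, size=10000))   # adjacent samples are close to each other
e = np.diff(x)                                   # error when predicting each sample by the previous one

def empirical_entropy(values):
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(empirical_entropy(x))   # many distinct, widely spread values -> large entropy
print(empirical_entropy(e))   # only the 5 values {-2,...,2}       -> at most log2(5) ~ 2.3 bits
```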
4. Explain the principle of ADPCM and the two different types of adaptation (forward and backward),
and discuss their pros and cons.
With non-adaptive DPCM, the linear predictor is fixed and is determined by minimizing the
mean square error between the original sample and the predicted sample. Essentially, the linear
predictor coefficients are determined from the correlation coefficients of adjacent samples. When
the underlying signal changes its statistics in time (i.e., how correlated adjacent samples are), ideally
the predictor coefficients should also change. This is the idea behind adaptive DPCM, or ADPCM.
The forward adaptation method looks ahead at a group of N samples, computes the correlation
coefficients between adjacent samples, and, based on the resulting correlation statistics, determines
the optimal linear predictor to be used for these N samples. The encoder needs to send not only the
prediction errors, but also the predictor used for every N samples. With backward adaptation, the
predictor for the new samples is determined based on the previously coded samples. Because the
decoder can perform the same computation to derive the predictor for the new samples, the encoder
does not need to send the predictor coefficients. But the predictor determined from the past
samples is not as good as the predictor determined using the current samples, so backward
adaptation yields larger prediction errors in general. Backward adaptation, however, does not
need to buffer the next N samples, so it has lower delay.
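A minimal first-order sketch contrasting the two adaptation modes. Quantization of the errors and of the coefficients is ignored, and, for brevity, backward adaptation here computes its coefficient from the original past samples rather than the reconstructed ones a real coder would use.

```python
import numpy as np

def optimal_coeff(block):
    """MSE-optimal coefficient of the first-order predictor xhat[n] = a*x[n-1]: a = R(1)/R(0)."""
    r0 = float(np.dot(block, block))
    r1 = float(np.dot(block[1:], block[:-1]))
    return r1 / r0 if r0 > 0 else 0.0

def forward_adaptive_errors(x, N):
    """Forward adaptation: the coefficient is computed from the CURRENT block of N samples,
    so it must be sent as side information and the encoder must buffer (delay) N samples."""
    errors, coeffs = [], []
    for start in range(0, len(x) - N + 1, N):
        blk = x[start:start + N]
        a = optimal_coeff(blk)
        coeffs.append(a)                          # side information, sent once per block
        errors.extend(blk[1:] - a * blk[:-1])
    return np.array(errors), coeffs

def backward_adaptive_errors(x, N):
    """Backward adaptation: the coefficient comes from the PREVIOUS block, which the decoder
    also has, so nothing extra is sent and there is no look-ahead delay; the price is a
    stale coefficient and therefore somewhat larger prediction errors."""
    errors, a = [], 0.0
    for start in range(0, len(x) - N + 1, N):
        blk = x[start:start + N]
        errors.extend(blk[1:] - a * blk[:-1])     # coefficient from the previous block
        a = optimal_coeff(blk)                    # updated for the next block
    return np.array(errors)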
5. Explain the main difference between waveform-based coders, vocoders, and hybrid coders for
speech coding, in terms of techniques used and the bit-rate/quality range of each.
Waveform-based coders try to reproduce the speech sample values as closely as possible given the
target bit rate. They achieve compression by making use of the fact that adjacent samples have
similar values, and they employ ADPCM-type techniques. Vocoders are targeted at applications
requiring very low bit rates where the speech does not have to sound very natural, as long as it is
intelligible. Vocoders make use of the fact that the human vocal tract can be modeled by a linear
filter whose coefficients change with the shape of the vocal tract (depending on the sound
being produced). Given a set of speech samples, a vocoder deduces the filter model and the
excitation signal driving the filter and sends these model parameters. The decoder synthesizes the
speech samples from these model parameters. Hybrid coders work in between, targeting
applications requiring intermediate bit rate and quality. This is achieved by allowing a larger range
of excitation signals. They also allow more sophisticated adaptation of the filter. Sometimes they
also allow specifying the errors between the original samples and those produced by the filter model.
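An illustrative sketch of the vocoder's filter model: estimating an all-pole vocal-tract filter by the autocorrelation (Yule-Walker) method and resynthesizing from an excitation. This is only a sketch; real coders add windowing, the Levinson-Durbin recursion, coefficient quantization, and pitch/voicing analysis.

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Estimate an order-p all-pole vocal-tract filter from one frame of speech samples."""
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    # Normal (Yule-Walker) equations R a = r[1:], with R[i][j] = r(|i-j|)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return a          # predictor: xhat[n] = sum_k a[k] * x[n-1-k]

def synthesize(a, excitation):
    """Drive the all-pole filter with an excitation signal
    (a periodic pulse train for voiced sounds, white noise for unvoiced ones)."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = sum(a[k] * out[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        out[n] = excitation[n] + past
    return out
```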