
Tutorial: Basic Information and Coding 2 (with answers)

1) Consider a source that emits a sequence of symbols that are i.i.d. (independently and identically
distributed) according to the probability mass function:
P(symbol = a) = 0.7 P(symbol = b) = 0.15 P(symbol = c) = 0.15
Propose a coding scheme that encodes each symbol as bits, using a) fixed-length coding and b) variable-length coding, with as few bits per symbol as possible in each case. Compare each code with the Entropy and decide which is best.

2) Are the following codes prefix codes? Why or why not?


a)
   Symbols:   A    B    C    D
   Codewords: 0    01   001  0001

b)
   Symbols:   A    B    C    D
   Codewords: 000  101  010  110

c)
   Symbols:   A    B    C
   Codewords: 0    1    10

d)
   Symbols:   A    B    C
   Codewords: 0    10   11

3) Using the Fano technique we have studied, design a prefix code for:

a) Symbol A B C D E
Probabilities 0.385 0.179 0.154 0.154 0.128

b) Symbol A B C D E
Probabilities 0.4 0.25 0.125 0.125 0.1

c) Symbol A B C D E
Probabilities 0.3 0.2 0.2 0.2 0.1

d) Symbol A B C D E
Probabilities 0.6 0.1 0.1 0.1 0.1

In each case, determine the mean number of bits per symbol that your coding scheme requires, and
compare with Entropy.

4) A source generates the following symbols i.i.d.:


Symbol X Y Z A
Probability 0.5 0.2 0.2 0.1
They are to be encoded and transmitted down a communications channel. If the symbol generation rate is
1 symbol per millisecond, what bitrate is needed in the channel a) as an absolute minimum, b) as a practical
minimum if a prefix code is used (using Fano’s method), c) if fixed-length encoding is used?
5) Consider the snapshot of hexadecimal data below:

Use this to determine the probabilities associated with each of the hexadecimal symbols, and then
calculate the Entropy.

If each hexadecimal symbol has probability 1/16, what then is the number of bits that would be needed to
encode each symbol?

6) Webpage

https://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html

provides us with the following letter counts and frequencies for English:

   Letter   Count   Frequency (%)
   E        21912   12.02
   T        16587    9.10
   A        14810    8.12
   O        14003    7.68
   I        13318    7.31
   N        12666    6.95
   S        11450    6.28
   R        10977    6.02
   H        10795    5.92
   D         7874    4.32
   L         7253    3.98
   U         5246    2.88
   C         4943    2.71
   M         4761    2.61
   F         4200    2.30
   Y         3853    2.11
   W         3819    2.09
   G         3693    2.03
   P         3316    1.82
   B         2715    1.49
   V         2019    1.11
   K         1257    0.69
   X          315    0.17
   Q          205    0.11
   J          188    0.10
   Z          128    0.07

Given this information, determine the Shannon Information in occurrences of the letters:

a) A

b) B

c) C

d) Determine the Entropy of “DNA” as represented as A, G, C, T, given the following counts:

   A 10001
   G 9999
   C 10010
   T 9990
ANSWERS
1) a) Fixed length coding requires 2 bits per symbol, so e.g.:
00 is a, 01 is b, 10 is c

b) Variable-length coding, using as few bits per symbol as possible, might be: 0 is a, 10 is b, 11 is c

Entropy = H2 = 0.7*log2(1/0.7) + 0.15*log2(1/0.15) + 0.15*log2(1/0.15) = 1.1813 bits per symbol

Mean codeword length for fixed length coding = 2 bits per symbol

Mean codeword length for variable length coding = 0.7*(1) + 0.15*(2) + 0.15*(2) = 1.3 bits per
symbol.

So the variable-length code is better in this case: it uses fewer bits per symbol (1.3 versus 2) and is closer to the Entropy of 1.1813 bits per symbol.
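As a sanity check, here is a minimal Python sketch (not part of the original tutorial; the symbol names and code tables are simply the ones proposed above) that computes the entropy and the mean codeword length of each code:

```python
import math

# Source probabilities from question 1
probs = {"a": 0.7, "b": 0.15, "c": 0.15}

# Candidate codes: the fixed-length and variable-length codes proposed above
fixed_code = {"a": "00", "b": "01", "c": "10"}
variable_code = {"a": "0", "b": "10", "c": "11"}

def entropy(probs):
    """Entropy in bits per symbol: sum of p * log2(1/p)."""
    return sum(p * math.log2(1 / p) for p in probs.values())

def mean_length(code, probs):
    """Mean codeword length in bits per symbol."""
    return sum(probs[s] * len(cw) for s, cw in code.items())

print(f"Entropy         : {entropy(probs):.4f} bits/symbol")                     # ~1.1813
print(f"Fixed length    : {mean_length(fixed_code, probs):.4f} bits/symbol")     # 2.0000
print(f"Variable length : {mean_length(variable_code, probs):.4f} bits/symbol")  # 1.3000
```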

2) Are the following codes prefix codes? Why or why not?


a)
   Symbols:   A    B    C    D
   Codewords: 0    01   001  0001

No, codeword “0” is a prefix of codewords “01”, “001” and “0001”.

b)
   Symbols:   A    B    C    D
   Codewords: 000  101  010  110

Yes, no codeword is a prefix of any other.

c)
   Symbols:   A    B    C
   Codewords: 0    1    10

No, codeword “1” is a prefix of codeword “10”.

d)
   Symbols:   A    B    C
   Codewords: 0    10   11

Yes, no codeword is a prefix of any other.
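The prefix condition can also be checked mechanically. The short Python sketch below (illustrative only; the helper name is_prefix_code is our own) tests whether any codeword is a prefix of another:

```python
def is_prefix_code(codewords):
    """Return True if no codeword is a prefix of any other codeword."""
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

# The four codes from question 2
codes = {
    "a": ["0", "01", "001", "0001"],
    "b": ["000", "101", "010", "110"],
    "c": ["0", "1", "10"],
    "d": ["0", "10", "11"],
}

for name, codewords in codes.items():
    print(name, is_prefix_code(codewords))
# Expected output: a False, b True, c False, d True
```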

3) Using the Fano technique we have studied, design a prefix code for:

a) Symbol A B C D E
Probabilities 0.385 0.179 0.154 0.154 0.128
H2 = 0.385*log2(1/0.385) + 0.179*log2(1/0.179) + 0.154*log2(1/0.154) + 0.154*log2(1/0.154) + 0.128*log2(1/0.128)
   = 0.530 + 0.4443 + 0.4156 + 0.4156 + 0.3796 = 2.185 bits per symbol

The Fano splits give the code A = 00, B = 01, C = 10, D = 110, E = 111, so:

Mean number of bits per symbol = 0.385(2) + 0.179(2) + 0.154(2) + 0.154(3) + 0.128(3) = 2.282 bits per symbol

b) Symbol A B C D E
Probabilities 0.4 0.25 0.125 0.125 0.1

H2 = 0.4*log2(1/0.4) + 0.25*log2(1/0.25) + 0.125*log2(1/0.125) + 0.125*log2(1/0.125) + 0.1*log2(1/0.1)
   = 0.5288 + 0.5 + 0.375 + 0.375 + 0.3322 = 2.111 bits per symbol

The Fano splits give the code A = 0, B = 10, C = 110, D = 1110, E = 1111, so:

Mean number of bits per symbol = 0.4(1) + 0.25(2) + 0.125(3) + 0.125(4) + 0.1(4) = 2.175 bits per symbol
c) Symbol A B C D E
Probabilities 0.3 0.2 0.2 0.2 0.1

H2 = 0.3*log2(1/0.3) + 0.2*log2(1/0.2) + 0.2*log2(1/0.2) + 0.2*log2(1/0.2) + 0.1*log2(1/0.1)
   = 0.5211 + 0.4644 + 0.4644 + 0.4644 + 0.3322 = 2.246 bits per symbol

The Fano splits give the code A = 00, B = 01, C = 10, D = 110, E = 111, so:

Mean number of bits per symbol = 0.3(2) + 0.2(2) + 0.2(2) + 0.2(3) + 0.1(3) = 2.3 bits per symbol

d) Symbol A B C D E
   Probabilities 0.6 0.1 0.1 0.1 0.1

H2 = 0.6*log2(1/0.6) + 0.1*log2(1/0.1) + 0.1*log2(1/0.1) + 0.1*log2(1/0.1) + 0.1*log2(1/0.1)
   = 0.4422 + 0.3322 + 0.3322 + 0.3322 + 0.3322 = 1.771 bits per symbol

The first Fano split separates A (0.6) from the rest (0.4); the remaining four symbols then split evenly into {B, C} and {D, E}, giving the code A = 0, B = 100, C = 101, D = 110, E = 111, so:

Mean number of bits per symbol = 0.6(1) + 0.1(3) + 0.1(3) + 0.1(3) + 0.1(3) = 1.8 bits per symbol
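The splitting step can be automated. The Python sketch below is one possible implementation of the Fano construction (an illustrative version, assuming the usual rule of splitting the probability-ordered symbol list into two groups whose totals are as nearly equal as possible); running it on the probabilities from part a) reproduces the code and the mean length given above:

```python
import math

def fano_code(probs):
    """Build a Fano (Shannon-Fano) code for a dict of {symbol: probability}."""
    code = {s: "" for s in probs}

    def split(symbols):
        # Recursively split the ordered symbol list into two near-equal halves
        if len(symbols) <= 1:
            return
        total = sum(probs[s] for s in symbols)
        best_i, best_diff, running = 1, float("inf"), 0.0
        for i in range(1, len(symbols)):
            running += probs[symbols[i - 1]]
            diff = abs(2 * running - total)  # |left-half total - right-half total|
            if diff < best_diff:
                best_i, best_diff = i, diff
        left, right = symbols[:best_i], symbols[best_i:]
        for s in left:
            code[s] += "0"
        for s in right:
            code[s] += "1"
        split(left)
        split(right)

    split(sorted(probs, key=probs.get, reverse=True))
    return code

probs = {"A": 0.385, "B": 0.179, "C": 0.154, "D": 0.154, "E": 0.128}  # part a)
code = fano_code(probs)
mean_len = sum(probs[s] * len(code[s]) for s in probs)
entropy = sum(p * math.log2(1 / p) for p in probs.values())
print(code)  # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
print(f"mean = {mean_len:.3f} bits/symbol, entropy = {entropy:.3f} bits/symbol")
```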


4) A source generates the following symbols i.i.d.:

Symbol X Y Z A
Probability 0.5 0.2 0.2 0.1

They are to be encoded and transmitted down a communications channel. If the symbol generation rate is
1 symbol per millisecond, what bitrate is needed in the channel a) as an absolute minimum, b) as a practical
minimum if a prefix code is used (using Fano’s method), c) if fixed-length encoding is used?

a) The absolute minimum can be found using Entropy:

H2 = 0.5*log2(1/0.5) + 0.2*log2(1/0.2) + 0.2*log2(1/0.2) + 0.1*log2(1/0.1)
   = 0.5 + 0.4644 + 0.4644 + 0.3322 = 1.761 bits per symbol

Therefore the absolute minimum bitrate = 1000 * 1.761 = 1761 bps

b) Using prefix encoding:

Fano’s method gives the code X = 0, Y = 10, Z = 110, A = 111, so:

Mean number of bits per symbol = 0.5(1) + 0.2(2) + 0.2(3) + 0.1(3) = 1.8 bits per symbol

Therefore the bitrate using prefix coding = 1000 * 1.8 = 1800 bps

c) Using fixed length coding, we would require 2 bits per codeword:

Therefore the bitrate using fixed length coding = 1000 * 2 = 2000 bps
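For reference, a short Python sketch of the three bitrate calculations (assumptions: 1000 symbols per second, and the Fano codeword lengths derived in part b) above):

```python
import math

probs = {"X": 0.5, "Y": 0.2, "Z": 0.2, "A": 0.1}
symbol_rate = 1000  # symbols per second (1 symbol per millisecond)

# a) Entropy sets the absolute minimum
entropy = sum(p * math.log2(1 / p) for p in probs.values())     # ~1.761 bits/symbol

# b) Fano code X=0, Y=10, Z=110, A=111
fano_lengths = {"X": 1, "Y": 2, "Z": 3, "A": 3}
fano_mean = sum(probs[s] * fano_lengths[s] for s in probs)      # 1.8 bits/symbol

# c) Fixed-length coding of 4 symbols
fixed_bits = math.ceil(math.log2(len(probs)))                   # 2 bits/symbol

print(f"a) absolute minimum : {symbol_rate * entropy:.0f} bps")    # ~1761 bps
print(f"b) Fano prefix code : {symbol_rate * fano_mean:.0f} bps")  # 1800 bps
print(f"c) fixed length     : {symbol_rate * fixed_bits} bps")     # 2000 bps
```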

5) Consider the snapshot of hexadecimal data below:


Use this to determine the probabilities associated with each of the hexadecimal symbols, and then
calculate the Entropy.

There are 6x8x4 = 192 entries (the symbols 5 and b do not appear in the snapshot):

• P(0) = 124/192
• P(1) = 11/192
• P(2) = 7/192
• P(3) = 1/192
• P(4) = 9/192
• P(6) = 4/192
• P(7) = 2/192
• P(8) = 14/192
• P(9) = 6/192
• P(a) = 4/192
• P(c) = 4/192
• P(d) = 1/192
• P(e) = 3/192
• P(f) = 2/192

H2 = (124/192)*log2(192/124) + (11/192)*log2(192/11) + (7/192)*log2(192/7)
   + (1/192)*log2(192/1) + (9/192)*log2(192/9) + (4/192)*log2(192/4)
   + (2/192)*log2(192/2) + (14/192)*log2(192/14) + (6/192)*log2(192/6)
   + (4/192)*log2(192/4) + (4/192)*log2(192/4) + (1/192)*log2(192/1)
   + (3/192)*log2(192/3) + (2/192)*log2(192/2)
   = 0.4074 + 0.2364 + 0.1742 + 0.0395 + 0.2069 + 0.1164 + 0.0686
   + 0.2755 + 0.1562 + 0.1164 + 0.1164 + 0.0395 + 0.0937 + 0.0686
   ≈ 2.12 bits per symbol

If each hexadecimal symbol has probability 1/16, what then is the number of bits that would be needed to
encode each symbol?

H2 = 16 * (1/16) * log2(16) = log2(16) = 4 bits per symbol, so 4 bits would be needed to encode each symbol.
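A small Python sketch (using the symbol counts tallied above; the dictionary layout is our own) that computes the entropy directly from the counts, together with the uniform 1/16 case:

```python
import math

# Symbol counts tallied from the snapshot (192 entries in total)
counts = {"0": 124, "1": 11, "2": 7, "3": 1, "4": 9, "6": 4, "7": 2,
          "8": 14, "9": 6, "a": 4, "c": 4, "d": 1, "e": 3, "f": 2}

total = sum(counts.values())  # 192

# Entropy from the empirical probabilities n/total
entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
print(f"Empirical entropy : {entropy:.3f} bits/symbol")  # ~2.12

# Uniform case: 16 equally likely hexadecimal symbols
uniform = sum((1 / 16) * math.log2(16) for _ in range(16))
print(f"Uniform entropy   : {uniform:.1f} bits/symbol")  # 4.0
```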


6) Webpage

https://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html

provides us with the following letter frequencies for English:

   Letter   Count   Frequency (%)
   E        21912   12.02
   T        16587    9.10
   A        14810    8.12
   O        14003    7.68
   I        13318    7.31
   N        12666    6.95
   S        11450    6.28
   R        10977    6.02
   H        10795    5.92
   D         7874    4.32
   L         7253    3.98
   U         5246    2.88
   C         4943    2.71
   M         4761    2.61
   F         4200    2.30
   Y         3853    2.11
   W         3819    2.09
   G         3693    2.03
   P         3316    1.82
   B         2715    1.49
   V         2019    1.11
   K         1257    0.69
   X          315    0.17
   Q          205    0.11
   J          188    0.10
   Z          128    0.07

a) P(A) = 8.12% = 0.0812, so Shannon Information (A) = I(A) = log2(1/0.0812) = 3.62 bits

b) P(B) = 1.49% = 0.0149, so Shannon Information (B) = I(B) = log2(1/0.0149) = 6.07 bits

c) P(C) = 2.71% = 0.0271, so Shannon Information (C) = I(C) = log2(1/0.0271) = 5.21 bits
d)
The total count is 10001 + 9999 + 10010 + 9990 = 40000, so:

H2 = (10001/40000)*log2(40000/10001) + (9999/40000)*log2(40000/9999)
   + (10010/40000)*log2(40000/10010) + (9990/40000)*log2(40000/9990)
   = 0.5 + 0.5 + 0.5 + 0.5 = 2 bits per symbol
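Finally, a short illustrative Python sketch for parts a) to d), taking each letter’s probability from the frequency column of the table and the DNA probabilities from the four counts:

```python
import math

# Letter frequencies (%) from the Cornell table
freq_percent = {"A": 8.12, "B": 1.49, "C": 2.71}

for letter, f in freq_percent.items():
    p = f / 100
    info = math.log2(1 / p)  # Shannon Information in bits
    print(f"I({letter}) = {info:.2f} bits")  # A: 3.62, B: 6.07, C: 5.21

# d) Entropy of the DNA alphabet from the given counts
dna_counts = {"A": 10001, "G": 9999, "C": 10010, "T": 9990}
total = sum(dna_counts.values())  # 40000
entropy = sum((n / total) * math.log2(total / n) for n in dna_counts.values())
print(f"H(DNA) = {entropy:.4f} bits/symbol")  # ~2.0000
```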
