Lecture 10 - Data Compression

Data Compression
By Fareed Ahmed Jokhio

Data Compression
• Data transmission and storage cost money.
• The more information being dealt with, the more it
costs.
• In spite of this, most digital data are not stored in the
most compact form.
• Rather, they are stored in whatever way makes them
easiest to use, such as: ASCII text from word
processors, binary code that can be executed on a
computer, individual samples from a data acquisition
system, etc.
• Typically, these easy-to-use encoding methods require
data files about twice as large as actually needed to
represent the information.
• Data compression is the general term for the various
algorithms and programs developed to address this
problem.
• A compression program is used to convert data from an
easy-to-use format to one optimized for compactness.
• Likewise, a decompression program returns the
information to its original form.
• We examine five techniques for data
compression in this lecture.
• The first three are simple encoding
techniques: run-length, Huffman, and
delta encoding.
• The last two are elaborate procedures that
have established themselves as industry
standards: LZW and JPEG.
Data Compression Strategies
• The table below shows two different ways that
data compression algorithms can be
categorized.
• In (a), the methods have been classified as
either lossless or lossy.
• A lossless technique means that the restored
data file is identical to the original.
• This is absolutely necessary for many types of
data, for example: executable code, word
processing files, tabulated numbers, etc.
• You cannot afford to misplace even a single bit
of this type of information.
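The lossless round trip described above can be sketched in a few lines. This is only an illustration: it uses Python's standard zlib module (DEFLATE) as a stand-in, which is not one of the five techniques covered in this lecture, and the sample text is a hypothetical, highly repetitive input, so its ratio is much better than typical files achieve.

```python
import zlib

# Hypothetical sample data: repetitive ASCII text, chosen only for illustration.
original = b"Data transmission and storage cost money. " * 50

compressed = zlib.compress(original)      # easy-to-use form -> compact form
restored = zlib.decompress(compressed)    # compact form -> original form

# Lossless means the restored bytes match the original exactly, bit for bit.
assert restored == original

ratio = len(original) / len(compressed)
print(f"{len(original)} bytes -> {len(compressed)} bytes (ratio {ratio:.1f}:1)")
```

The assertion is the important part: for a lossless method it must hold for every possible input.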
• In comparison, data files that represent images
and other acquired signals do not have to be kept
in perfect condition for storage or transmission.
• All real world measurements inherently contain a
certain amount of noise.
• If the changes made to these signals resemble a
small amount of additional noise, no harm is done.
• Compression techniques that allow this type of
degradation are called lossy.
• This distinction is important because lossy
techniques are much more effective at
compression than lossless methods.
• The higher the compression ratio, the more
noise is added to the data.
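The trade-off between compression ratio and added noise can be illustrated with simple bit truncation. This is a minimal sketch under stated assumptions, not how any particular lossy format works: the 12-bit sample width and the helper name quantize are chosen here only for illustration.

```python
def quantize(sample: int, kept_bits: int, total_bits: int = 12) -> int:
    """Zero the low-order bits of a sample, keeping only the top kept_bits."""
    dropped = total_bits - kept_bits
    return (sample >> dropped) << dropped

samples = range(4096)   # every possible 12-bit value (illustrative input)
for kept_bits in (12, 8, 6, 4):
    # Largest difference between an original sample and its quantized version.
    worst = max(s - quantize(s, kept_bits) for s in samples)
    print(f"ratio {12 / kept_bits:.1f}:1  worst-case error = {worst}")
```

Each bit discarded raises the compression ratio and also enlarges the worst-case error, which acts like additional noise on the signal.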
• Images transmitted over the World Wide Web
are an excellent example of why data
compression is important.
• Suppose we need to download a digitized color
photograph over a computer's 33.6 kbps
modem.
• If the image is not compressed (a TIFF file, for
example), it will contain about 600 kbytes of
data.
• If it has been compressed using
a lossless technique (such as used in
the GIF format), it will be about one-half this
size, or 300 kbytes.
• If lossy compression has been used (a JPEG file),
it will be about 50 kbytes.
• The point is, the download times for these
three equivalent files are 142 seconds, 71
seconds, and 12 seconds, respectively.
• That's a big difference! JPEG is the best choice
for digitized photographs, while GIF is used
with drawn images, such as company logos
that have large areas of a single color.
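The download times quoted above follow from dividing file size by line rate. The sketch below reproduces the arithmetic, assuming 1 kbyte = 1000 bytes = 8 kbits and ignoring protocol overhead; the function name download_seconds is hypothetical.

```python
def download_seconds(size_kbytes: float, modem_kbps: float = 33.6) -> float:
    """Transfer time in seconds, assuming 8 kbits per kbyte and no overhead."""
    return size_kbytes * 8 / modem_kbps

# File sizes taken from the example above.
sizes_kbytes = {
    "TIFF (uncompressed)": 600,
    "GIF (lossless)": 300,
    "JPEG (lossy)": 50,
}

for name, size in sizes_kbytes.items():
    print(f"{name}: {download_seconds(size):.1f} s")
```

The results agree with the figures above to within about a second; the small differences come from rounding.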
• Our second way of classifying data
compression methods is shown in the table below.
• Most data compression programs operate by
taking a group of data from the original file,
compressing it in some way, and then writing
the compressed group to the output file.
• For instance, one of the techniques in this
table is CS&Q, short for coarser sampling
and/or quantization.
• Suppose we are compressing a digitized waveform,
such as an audio signal that has been digitized to 12
bits.
• We might read two adjacent samples from the original
file (24 bits), discard one of the samples completely,
discard the least significant 4 bits from the other
sample, and then write the remaining 8 bits to the
output file.
• With 24 bits in and 8 bits out, we have implemented a
3:1 compression ratio using a lossy algorithm.
• While this is rather crude in itself, it is very
effective when used with a technique
called transform compression.
• As we will discuss later, this is the basis of
JPEG.
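The CS&Q example above (two 12-bit samples in, 8 bits out) can be sketched as follows. The function names, the sample values, and the reconstruction step are assumptions made for illustration; the lecture describes only the compression side.

```python
def csq_compress(samples):
    """Two 12-bit samples in, one 8-bit value out (24 bits -> 8 bits)."""
    out = bytearray()
    # Step through the samples in pairs; a trailing unpaired sample is ignored.
    for i in range(0, len(samples) - 1, 2):
        kept = samples[i] & 0xFFF        # keep the first sample of the pair
        out.append(kept >> 4)            # drop its 4 least significant bits
        # samples[i + 1] is discarded completely (coarser sampling)
    return bytes(out)

def csq_restore(data):
    """Approximate reconstruction: rescale to 12 bits and repeat each value."""
    restored = []
    for byte in data:
        value = byte << 4                # the discarded low bits come back as zeros
        restored.extend([value, value])  # stand-in for the discarded neighbor
    return restored

samples = [2048, 2060, 2100, 2130, 4095, 4000]   # hypothetical 12-bit readings
packed = csq_compress(samples)
print(list(packed))
print(csq_restore(packed))
print(f"{len(samples) * 12} bits in, {len(packed) * 8} bits out")
```

The restored values only approximate the originals, which is what makes the scheme lossy.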
• The table below shows CS&Q to be a fixed-input,
fixed-output scheme.
• That is, a fixed number of bits is read from
the input file and a smaller fixed number of
bits is written to the output file.
• Other compression methods allow a variable
number of bits to be read or written.
• As you go through the description of each of
these compression methods, refer back to this
table to understand how it fits into this
classification scheme.
• Why are JPEG and MPEG not listed in this table?
• These are composite algorithms that combine
many of the other techniques.
• They are too sophisticated to be classified into
these simple categories.
