IMAGE COMPRESSION
STANDARDS
IMAGE COMPRESSION:INTRODUCTION
IMAGE COMPRESSION IS MINIMIZING THE IMAGE IN TERMS OF ITS SIZE IN BYTES OF
A GRAPHICS FILE WITHOUT DEGRADING THE QUALITY OF THE IMAGE TO AN
UNACCEPTABLE LEVEL.
THE REDUCTION IN FILE SIZE ALLOWS MORE IMAGES TO BE STORED IN A GIVEN
AMOUNT OF DISK OR MEMORY SPACE.
IT ALSO REDUCES THE TIME REQUIRED FOR IMAGES TO BE SENT OVER THE
INTERNET OR DOWNLOADED FROM WEB PAGES.
IMAGE COMPRESSION:INTRODUCTION
THERE ARE SEVERAL DIFFERENT WAYS IN WHICH IMAGE FILES CAN BE
COMPRESSED. FOR INTERNET USE, THE TWO MOST COMMON COMPRESSED GRAPHIC
IMAGE FORMATS ARE THE JPEG (JOINT PHOTOGRAPHIC EXPERTS GROUP) FORMAT
AND THE GIF (GRAPHICS INTERCHANGE FORMAT) FORMAT.
THE JPEG METHOD IS MORE OFTEN USED FOR PHOTOGRAPHS, WHILE THE GIF
METHOD IS COMMONLY USED FOR LINE ART AND OTHER IMAGES IN WHICH
GEOMETRIC SHAPES ARE RELATIVELY SIMPLE.
A TEXT FILE OR PROGRAM CAN BE COMPRESSED WITHOUT THE INTRODUCTION OF ERRORS, BUT
ONLY UP TO A CERTAIN EXTENT.
THIS IS CALLED LOSSLESS COMPRESSION. BEYOND THIS POINT, ERRORS ARE INTRODUCED. IN TEXT
AND PROGRAM FILES, IT IS CRUCIAL THAT COMPRESSION BE LOSSLESS BECAUSE A SINGLE ERROR
CAN SERIOUSLY DAMAGE THE MEANING OF A TEXT FILE, OR CAUSE A PROGRAM NOT TO RUN.
IN IMAGE COMPRESSION, A SMALL LOSS IN QUALITY IS USUALLY NOT NOTICEABLE. THERE IS NO
"CRITICAL POINT" UP TO WHICH COMPRESSION WORKS PERFECTLY, BUT BEYOND WHICH IT
BECOMES IMPOSSIBLE.
WHEN THERE IS SOME TOLERANCE FOR LOSS, THE COMPRESSION FACTOR CAN BE GREATER THAN
IT CAN WHEN THERE IS NO LOSS TOLERANCE. FOR THIS REASON, GRAPHIC IMAGES CAN BE
COMPRESSED MORE THAN TEXT FILES OR PROGRAMS.
IN SUMMARY, IMAGE COMPRESSION IS THE PROCESS OF CONVERTING AN IMAGE FILE
INTO ANOTHER IMAGE FILE THAT OCCUPIES LESS STORAGE SPACE, WITHOUT
SACRIFICING ITS VISUAL CONTENT.
IT IS USEFUL FOR SAVING STORAGE SPACE AND TRANSMISSION COSTS.
IMAGE COMPRESSION HAS BEEN MOST COMMONLY USED IN THE DATA STORAGE,
PRINTING AND TELECOMMUNICATION INDUSTRIES. DIGITAL IMAGE COMPRESSION
IS ALSO BEING PUT TO WORK IN AREAS SUCH AS FAX TRANSMISSION,
SATELLITE REMOTE SENSING, AND HIGH DEFINITION TELEVISION.
APPLICATIONS:
A GOOD EXAMPLE IS THE HEALTH INDUSTRY, WHERE THE CONSTANT SCANNING AND/OR
STORAGE OF MEDICAL IMAGES AND DOCUMENTS TAKE PLACE.
IMAGE COMPRESSION OFFERS MANY BENEFITS HERE, AS INFORMATION CAN BE STORED
WITHOUT PLACING LARGE LOADS ON SYSTEM SERVERS. DEPENDING ON THE TYPE OF
COMPRESSION APPLIED, IMAGES CAN BE COMPRESSED TO SAVE STORAGE SPACE, OR TO
SEND TO MULTIPLE PHYSICIANS FOR EXAMINATION.
CONVENIENTLY, THESE IMAGES CAN BE DECOMPRESSED WHEN THEY ARE READY TO BE
VIEWED. WITH LOSSLESS COMPRESSION, THEY RETAIN THE ORIGINAL HIGH QUALITY AND
DETAIL THAT MEDICAL IMAGERY DEMANDS.
APPLICATIONS:
IN THE SECURITY INDUSTRY, IMAGE COMPRESSION CAN GREATLY INCREASE THE EFFICIENCY OF
RECORDING, PROCESSING AND STORAGE.
HOWEVER, IN THIS APPLICATION IT IS IMPERATIVE TO DETERMINE WHETHER ONE COMPRESSION
STANDARD WILL BENEFIT ALL AREAS.
FOR EXAMPLE, IN A VIDEO NETWORKING OR CLOSED-CIRCUIT TELEVISION APPLICATION, SEVERAL
IMAGES AT DIFFERENT FRAME RATES MAY BE REQUIRED.
TIME IS ALSO A CONSIDERATION, AS DIFFERENT AREAS MAY NEED TO BE RECORDED FOR VARIOUS
LENGTHS OF TIME.
IMAGE RESOLUTION AND QUALITY ALSO BECOME CONSIDERATIONS, AS DO NETWORK BANDWIDTH
AND THE OVERALL SECURITY OF THE SYSTEM.
TYPES OF COMPRESSION
LOSSLESS: THE COMPRESSED IMAGE CAN BE CONVERTED BACK TO THE ORIGINAL WITH
ZERO ERROR.
LOSSY: THE COMPRESSED IMAGE CANNOT BE CONVERTED BACK TO THE ORIGINAL
WITHOUT ERROR. THE AMOUNT OF ERROR USUALLY GROWS AS THE STORAGE SPACE
SHRINKS, AND THIS TRADE-OFF CAN BE CONTROLLED BY THE USER.
LOSSLESS COMPRESSION - EXAMPLES
LZW METHOD (USED IN THE GIF AND TIFF FORMATS)
HUFFMAN ENCODING (PART OF THE JPEG ALGORITHM, ALTHOUGH OVERALL JPEG IS
LOSSY)
RUN-LENGTH ENCODING (ALSO PART OF THE JPEG ALGORITHM, ALTHOUGH JPEG IS
LOSSY OVERALL)
LOSSY COMPRESSION - EXAMPLES
JPEG (JOINT PHOTOGRAPHIC EXPERTS GROUP, FOR STILL IMAGES)
MPEG (MOVING PICTURE EXPERTS GROUP, FOR VIDEO)
MP3 (AN AUDIO CODING FORMAT FOR DIGITAL AUDIO THAT USES A FORM OF
IRREVERSIBLE DATA COMPRESSION)
MACHINE LEARNING BASED TECHNIQUES FOR COMPRESSION OF IMAGES OR VIDEO
(NOT COVERED IN THIS COURSE).
LOSSY IMAGE COMPRESSION
COMPRESSION OF TEXT FILES OR EXE FILES CANNOT AFFORD TO BE LOSSY,
BUT SOME PORTION OF IMAGE CONTENT IS OFTEN NOT VERY NOTICEABLE TO THE
HUMAN EYE, ESPECIALLY THE HIGHER SPATIAL FREQUENCIES. DISCARDING THIS
LESS VISIBLE INFORMATION LEADS TO COMPRESSION WITHOUT SIGNIFICANT LOSS
OF VISUAL APPEAL.
JPEG COMPRESSION METHOD
JPEG = JOINT PHOTOGRAPHIC EXPERTS GROUP
ONE OF THE MOST POPULAR STANDARDS FOR COMPRESSION OF PHOTOGRAPHIC
IMAGES, WIDELY USED ON THE INTERNET.
WIDELY USED IN DIGITAL CAMERAS.
IMPLEMENTED IN ALL STANDARD IMAGE PROCESSING SOFTWARE (MATLAB, OPENCV,
ETC.)
ESSENTIALLY LOSSY (THOUGH THERE ARE SOME LOSSLESS VARIANTS).
APPLICABLE TO COLOR AS WELL AS GRAYSCALE IMAGES.
JPEG COMPRESSION METHOD
THE USER SPECIFIES A QUALITY FACTOR (Q) BETWEEN 0 AND 100 (HIGHER Q MEANS
BETTER QUALITY).
THE JPEG ALGORITHM COMPRESSES THE IMAGE BASED ON THE USER-PROVIDED Q.
THE HIGHER THE Q, THE LOWER THE COMPRESSION RATIO (BUT THE HIGHER THE IMAGE
QUALITY). A LOWER Q GIVES A HIGHER COMPRESSION RATIO (BUT POORER IMAGE
QUALITY).
JPEG CAN ACHIEVE COMPRESSION RATIOS OF ABOUT 10:1 TO 15:1 WITH LITTLE LOSS OF
QUALITY.
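HOW Q TURNS INTO COARSER OR FINER COMPRESSION IS NOT FIXED BY THESE SLIDES. THE
SKETCH BELOW ASSUMES THE SCALING CONVENTION USED BY THE IJG LIBJPEG SOFTWARE,
WHERE Q RESCALES A BASE QUANTIZATION TABLE. THE FUNCTION NAME scale_quant_table
AND THE 2X2 TABLE EXCERPT ARE HYPOTHETICAL AND ONLY FOR ILLUSTRATION.

```python
def scale_quant_table(base_table, quality):
    """Scale a base quantization table by a quality factor in 1..100.

    Assumption: this mirrors the IJG libjpeg convention; the JPEG
    standard itself does not define a quality factor.
    """
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Integer rounding, then clamp every step size to the range 1..255.
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row]
            for row in base_table]


base = [[16, 11], [12, 12]]            # hypothetical 2x2 excerpt of a table
same = scale_quant_table(base, 50)     # Q = 50 leaves the base table unchanged
coarse = scale_quant_table(base, 25)   # lower Q -> larger steps -> more loss
finest = scale_quant_table(base, 100)  # Q = 100 -> every step size becomes 1
```

LARGER STEP SIZES DISCARD MORE DETAIL, WHICH IS WHY A LOWER Q GIVES A SMALLER
FILE.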
JPEG COMPRESSION METHOD
MAIN STEPS IN JPEG IMAGE COMPRESSION
AS WE KNOW, UNLIKE ONE-DIMENSIONAL AUDIO SIGNALS, A DIGITAL IMAGE f(I, J) IS
NOT DEFINED OVER THE TIME DOMAIN.
INSTEAD, IT IS DEFINED OVER A SPATIAL DOMAIN: AN IMAGE IS A FUNCTION
OF THE TWO DIMENSIONS I AND J (OR, CONVENTIONALLY, X AND Y).
THE 2D DCT IS USED AS ONE STEP IN JPEG, TO YIELD A FREQUENCY RESPONSE THAT IS
A FUNCTION F(U, V) IN THE SPATIAL FREQUENCY DOMAIN, INDEXED BY TWO INTEGERS
U AND V.
JPEG IS A LOSSY IMAGE COMPRESSION METHOD. THE EFFECTIVENESS OF THE DCT
TRANSFORM CODING METHOD IN JPEG RELIES ON THREE MAJOR OBSERVATIONS:
OBSERVATIONS FOR JPEG IMAGE COMPRESSION
OBSERVATION 1: USEFUL IMAGE CONTENTS CHANGE RELATIVELY SLOWLY ACROSS
THE IMAGE, I.E., IT IS UNUSUAL FOR INTENSITY VALUES TO VARY WIDELY SEVERAL
TIMES IN A SMALL AREA, FOR EXAMPLE, WITHIN AN 8×8 IMAGE BLOCK.
MUCH OF THE INFORMATION IN AN IMAGE IS REPEATED, HENCE “SPATIAL
REDUNDANCY”.
OBSERVATIONS FOR JPEG IMAGE COMPRESSION
OBSERVATION 2: PSYCHOPHYSICAL EXPERIMENTS SUGGEST THAT HUMANS ARE
MUCH LESS LIKELY TO NOTICE THE LOSS OF VERY HIGH SPATIAL FREQUENCY
COMPONENTS THAN THE LOSS OF LOWER FREQUENCY COMPONENTS.
THE SPATIAL REDUNDANCY CAN BE REDUCED BY LARGELY REDUCING THE HIGH
SPATIAL FREQUENCY CONTENTS.
OBSERVATIONS FOR JPEG IMAGE COMPRESSION
OBSERVATION 3: VISUAL ACUITY (ACCURACY IN DISTINGUISHING CLOSELY SPACED
LINES) IS MUCH GREATER FOR GRAY (“BLACK AND WHITE”) THAN FOR COLOR.
CHROMA SUBSAMPLING (4:2:0) IS USED IN JPEG.
JPEG IMAGE COMPRESSION
THE GENERAL APPROACH TO JPEG IMAGE COMPRESSION INVOLVES THE FOLLOWING
STAGES:
COLOR TRANSFORM FROM RGB TO YCBCR TOGETHER WITH A LEVEL SHIFT
SUBSAMPLING AND PARTITIONING
TRANSFORM TO A FREQUENCY DOMAIN
REMOVAL OF HIGH-FREQUENCY DETAIL FROM THE IMAGE (QUANTIZATION)
REORDERING FOR BETTER COMPRESSION (ZIGZAG SCAN)
RUN-LENGTH CODING OF SERIES OF ZEROS
LOSSLESS ENTROPY CODING TO REMOVE SOME MORE REDUNDANT DATA
FINAL PACKING
JPEG IMAGE COMPRESSION
JPEG'S APPROACH TO THE USE OF DCT IS BASICALLY TO REDUCE HIGH-FREQUENCY CONTENT
AND THEN EFFICIENTLY CODE THE RESULT INTO A BIT STRING.
THE TERM SPATIAL REDUNDANCY INDICATES THAT MUCH OF THE INFORMATION IN AN IMAGE IS
REPEATED: IF A PIXEL IS RED, THEN ITS NEIGHBOR IS LIKELY RED ALSO.
BECAUSE OF OBSERVATION 2 ABOVE, THE DCT COEFFICIENTS FOR THE LOWEST FREQUENCIES ARE
MOST IMPORTANT.
THEREFORE, AS FREQUENCY GETS HIGHER, IT BECOMES LESS IMPORTANT TO REPRESENT THE DCT
COEFFICIENT ACCURATELY.
IT MAY EVEN BE SAFELY SET TO ZERO WITHOUT LOSING MUCH PERCEIVABLE IMAGE INFORMATION.
JPEG IMAGE COMPRESSION
CLEARLY, A STRING OF ZEROS CAN BE REPRESENTED EFFICIENTLY AS THE LENGTH OF SUCH A RUN OF
ZEROS, SO THE NUMBER OF BITS REQUIRED CAN BE REDUCED. SINCE WE END UP USING FEWER NUMBERS
TO REPRESENT THE PIXELS IN BLOCKS, BY REMOVING SOME LOCATION-DEPENDENT INFORMATION WE
HAVE EFFECTIVELY REMOVED SPATIAL REDUNDANCY.
JPEG WORKS FOR BOTH COLOR AND GRAYSCALE IMAGES. IN THE CASE OF COLOR IMAGES, SUCH AS YIQ
OR YUV, THE ENCODER WORKS ON EACH COMPONENT SEPARATELY, USING THE SAME ROUTINES.
IF THE SOURCE IMAGE IS IN A DIFFERENT COLOR FORMAT, THE ENCODER PERFORMS A COLOR-SPACE
CONVERSION TO YIQ OR YUV. THE CHROMINANCE IMAGES (I, Q OR U, V) ARE SUBSAMPLED: JPEG USES
THE 4:2:0 SCHEME. (WITH 4:2:0, FOR EVERY TWO ROWS OF FOUR PIXELS, COLOR IS SAMPLED FROM TWO
PIXELS IN THE TOP ROW AND ZERO PIXELS IN THE BOTTOM ROW.)
COLOR TRANSFORM FROM RGB TO YCBCR
THIS TRANSFORM IS BASED ON THE PHYSIOLOGY OF HUMAN VISION.
THE HUMAN VISUAL SYSTEM CAN PERCEIVE MINOR CHANGES OF BRIGHTNESS,
BUT IT IS FAR LESS RESPONSIVE TO CHANGES OF COLOR (THE CHROMA COMPONENTS
OF THE IMAGE) IN REGIONS WITH THE SAME BRIGHTNESS.
THAT IS WHY WE CAN APPLY STRONGER COMPRESSION TO CHROMA TO GET A
SMALLER COMPRESSED IMAGE.
WE TAKE AN RGB IMAGE AND CONVERT IT TO LUMA/CHROMA REPRESENTATION IN
ORDER TO SEPARATE LUMA FROM CHROMA AND TO PROCESS THEM SEPARATELY.
COLOR TRANSFORM FROM RGB TO YCBCR
LUMA IS USUALLY CALLED Y (INTENSITY, BRIGHTNESS) AND THE CHROMA COMPONENTS
ARE CALLED CB AND CR (THESE ARE ESSENTIALLY SCALED COLOR DIFFERENCES,
CB ~ B - Y AND CR ~ R - Y).
THE TRANSFORM IS DONE TOGETHER WITH A LEVEL SHIFT OF THE DATA, TO PREPARE IT
FOR THE NEXT PROCESSING STAGE, THE DCT (DISCRETE COSINE TRANSFORM).
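A MINIMAL SKETCH OF THE COLOR TRANSFORM AND LEVEL SHIFT FOR ONE PIXEL. THE
COEFFICIENTS ARE ASSUMED TO BE THOSE OF THE JFIF FILE FORMAT (FULL-RANGE 8-BIT
YCBCR), WHICH THE SLIDES DO NOT STATE. THE FUNCTION NAMES ARE HYPOTHETICAL.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (JFIF-style coefficients assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b   # scaled B - Y, offset 128
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b   # scaled R - Y, offset 128
    return y, cb, cr


def level_shift(sample):
    """Shift an 8-bit sample from 0..255 to -128..127 before the DCT."""
    return sample - 128


gray = rgb_to_ycbcr(128, 128, 128)   # a gray pixel carries no chroma
white = rgb_to_ycbcr(255, 255, 255)
```

FOR ANY GRAY PIXEL BOTH CHROMA VALUES SIT AT THE NEUTRAL LEVEL 128, SO ALL OF
ITS INFORMATION IS IN Y.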
SUBSAMPLING AND PARTITIONING
SINCE WE CAN CONSIDER CHROMA COMPONENTS TO BE LESS IMPORTANT
THAN LUMA, WE CAN DECREASE THE TOTAL NUMBER OF CHROMA PIXELS.
FOR EXAMPLE, WE CAN AVERAGE CHROMA IN THE HORIZONTAL OR VERTICAL
DIRECTION.
IN THE MOST EXTREME CASE, WE CAN AVERAGE 4 NEIGHBORING CHROMA VALUES IN
A 2X2 RECTANGLE TO GET JUST ONE NEW VALUE.
THAT MODE IS CALLED 4:2:0 AND IT IS THE MOST POPULAR CHOICE FOR
SUBSAMPLING.
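THE 2X2 AVERAGING DESCRIBED ABOVE CAN BE SKETCHED AS FOLLOWS. THE FUNCTION NAME
subsample_2x2 IS HYPOTHETICAL, AND THE SKETCH ASSUMES A CHROMA PLANE WITH EVEN
WIDTH AND HEIGHT. REAL ENCODERS MAY PAD THE IMAGE AND USE OTHER FILTERS.

```python
def subsample_2x2(chroma):
    """Replace every 2x2 neighbourhood of a chroma plane by its average."""
    height, width = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1]
              + chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]


cb = [[100, 102, 50, 50],
      [104, 106, 50, 50]]
small = subsample_2x2(cb)   # a quarter of the original chroma samples remain
```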
SUBSAMPLING AND PARTITIONING
SUBSAMPLING: THE REDUCTION OF COLOR RESOLUTION IN DIGITAL COMPONENT VIDEO SIGNALS IN ORDER TO SAVE
STORAGE AND BANDWIDTH.
THE COLOR COMPONENTS ARE COMPRESSED BY SAMPLING THEM AT A LOWER RATE THAN THE BRIGHTNESS (LUMA).
ALTHOUGH COLOR INFORMATION IS DISCARDED, HUMAN EYES ARE LESS SENSITIVE TO COLOR THAN TO
BRIGHTNESS.
YCBCR SAMPLING IS DESIGNATED AS 4:N:N (Y:CB:CR). IN 4:2:0, THE ZERO MEANS THAT CB AND CR ARE SAMPLED AT HALF
THE VERTICAL RESOLUTION OF Y. MPEG-1 AND MPEG-2 USE 4:2:0, BUT THE SAMPLES ARE TAKEN AT DIFFERENT INTERVALS.
BY THE TIME MPEG-2 CAME ALONG, IT WAS KNOWN THAT 4:2:2 CODING WAS OFTEN CONVERTED TO 4:2:0, WHICH IS WHY
MPEG-2 SAMPLING MORE CLOSELY LINES UP WITH THE 4:2:2 PATTERN. H.261/263 ALSO USES 4:2:0.
SUBSAMPLING AND PARTITIONING
FOR FURTHER PROCESSING, WE DIVIDE EACH LUMA AND CHROMA PLANE INTO 8X8
BLOCKS. THIS PARTITIONING SCHEME LETS US PROCESS EACH BLOCK
INDEPENDENTLY, THOUGH WE HAVE TO KEEP TRACK OF THE POSITION OF EACH
BLOCK, WHICH IS ESSENTIAL FOR IMAGE DECODING.
DISCRETE COSINE TRANSFORM
THE IDEAS BEHIND JPEG COMPRESSION COME FROM AN ENGINEERING
BACKGROUND. ELECTRICAL AND SOUND WAVES CAN BE REPRESENTED AS A SERIES
OF AMPLITUDES OVER TIME. DISCRETE COSINE TRANSFORM (DCT) IS ONE OF THE
BASIC BUILDING BLOCKS FOR JPEG.
AN IMPORTANT ASPECT OF THE DCT IS THAT ITS COEFFICIENTS CAN BE QUANTIZED
USING FREQUENCY-WEIGHTED QUANTIZATION VALUES.
DISCRETE COSINE TRANSFORM
THE DCT IS A FOURIER-RELATED TRANSFORM THAT IS SIMILAR TO THE DISCRETE
FOURIER TRANSFORM (DFT) BUT USES ONLY REAL NUMBERS.
WE APPLY THIS 2D TRANSFORM TO EACH 8X8 BLOCK OF OUR IMAGE.
THE MAIN IDEA IS TO GET A DIFFERENT DATA REPRESENTATION BY MOVING FROM THE
SPATIAL DOMAIN TO A FREQUENCY DOMAIN.
THE RESULT OF THE DCT IS A DATA ARRAY IN THE FREQUENCY DOMAIN. FROM HERE ON
WE WORK NOT DIRECTLY WITH LUMA AND CHROMA VALUES, BUT
WITH THE SPATIAL FREQUENCIES OF LUMA AND CHROMA IN OUR IMAGE.
DISCRETE COSINE TRANSFORM
BIG OBJECTS IN THE IMAGE ARE CONSIDERED TO BE LOW-FREQUENCY DATA, WHILE
SMALL/TINY OBJECTS ARE CONSIDERED TO BE HIGH-FREQUENCY ELEMENTS.
IN THE NEW 8X8 BLOCK THE UPPER LEFT ELEMENT IS CALLED DC (IT IS PROPORTIONAL
TO THE AVERAGE VALUE OF ALL PIXELS IN THE ORIGINAL BLOCK), AND ALL OTHER
ELEMENTS ARE CALLED AC.
IF WE COMPOSE A NEW IMAGE FROM THE DC ELEMENTS OF EACH BLOCK, WE GET THE
ORIGINAL IMAGE AT REDUCED RESOLUTION. THE NEW WIDTH AND HEIGHT WILL BE 1/8 OF
THOSE OF THE ORIGINAL IMAGE.
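TO MAKE THE DC/AC DISTINCTION CONCRETE, HERE IS A DIRECT (SLOW) SKETCH OF THE
8X8 2D DCT WITH THE USUAL JPEG NORMALIZATION. IT IS WRITTEN FOR CLARITY, NOT
SPEED, AND THE FUNCTION NAME dct2_8x8 IS HYPOTHETICAL. REAL CODECS USE FAST
FACTORIZATIONS.

```python
import math

N = 8


def dct2_8x8(block):
    """2D DCT-II of an 8x8 block, evaluated directly from the definition."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for i in range(N):
                for j in range(N):
                    total += (block[i][j]
                              * math.cos((2 * i + 1) * u * math.pi / 16)
                              * math.cos((2 * j + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


flat = [[10] * N for _ in range(N)]   # a block with no detail at all
coeffs = dct2_8x8(flat)
dc = coeffs[0][0]                     # 8 times the block average here
```

FOR A PERFECTLY FLAT BLOCK ALL 63 AC COEFFICIENTS VANISH AND ONLY THE DC TERM
REMAINS. WITH THIS NORMALIZATION THE DC TERM EQUALS 8 TIMES THE BLOCK AVERAGE.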
QUANTIZATION
THE DCT COEFFICIENTS ARE REAL-VALUED (FLOATING POINT) NUMBERS, AND STORING THEM
DIRECTLY IN A FILE WOULD PRODUCE NO COMPRESSION, SO THEY NEED TO BE QUANTIZED.
THE HUMAN EYE IS NOT SENSITIVE TO CHANGES IN THE HIGHER FREQUENCY
CONTENT. SO WE CAN HAVE CRUDER QUANTIZATION FOR THE HIGHER FREQUENCY
COEFFICIENTS AND A FINER ONE FOR THE LOWER FREQUENCY COEFFICIENTS.
QUANTIZATION IS PERFORMED BY DIVIDING THE DCT COEFFICIENTS BY A
QUANTIZATION MATRIX AND ROUNDING OFF TO THE NEAREST INTEGER.
QUANTIZATION
THIS IS THE LOSSY PART OF JPEG!
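THE QUANTIZATION STEP IN JPEG IS AIMED AT REDUCING THE TOTAL NUMBER OF
BITS NEEDED FOR A COMPRESSED IMAGE. IT CONSISTS OF SIMPLY DIVIDING EACH
ENTRY IN THE FREQUENCY SPACE BLOCK BY AN INTEGER, THEN ROUNDING:
$$\hat{F}(u,v) = \mathrm{round}\left(\frac{F(u,v)}{Q(u,v)}\right)$$
HERE, F(U,V) REPRESENTS A DCT COEFFICIENT, Q(U,V) IS THE CORRESPONDING ENTRY OF
THE QUANTIZATION MATRIX, AND $\hat{F}(u,v)$ IS THE QUANTIZED COEFFICIENT.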
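A SMALL SKETCH OF QUANTIZATION AND ITS INVERSE. THE QUANTIZATION MATRIX BELOW IS
A HYPOTHETICAL ONE WHOSE STEP SIZE SIMPLY GROWS WITH U + V. IT IS NOT THE TABLE
FROM THE JPEG STANDARD, AND THE COEFFICIENT VALUES ARE MADE UP FOR ILLUSTRATION.

```python
def quantize(coeffs, qtable):
    """Divide each DCT coefficient by its step size and round to an integer."""
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return [[round(f / q) for f, q in zip(frow, qrow)]
            for frow, qrow in zip(coeffs, qtable)]


def dequantize(levels, qtable):
    """Decoder side: multiply back; the rounding error is lost for good."""
    return [[level * q for level, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qtable)]


# Hypothetical table: coarser steps for higher frequencies.
qtable = [[8 + 4 * (u + v) for v in range(8)] for u in range(8)]
# Made-up coefficients that decay with frequency, as in typical image blocks.
coeffs = [[101.0 / (1 + u + v) for v in range(8)] for u in range(8)]

levels = quantize(coeffs, qtable)
restored = dequantize(levels, qtable)
```

THE HIGH-FREQUENCY ENTRIES OF levels COME OUT AS ZERO, AND EACH RESTORED VALUE
DIFFERS FROM THE ORIGINAL BY AT MOST HALF A STEP SIZE.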
PREPARATION FOR ENTROPY CODING
SO FAR WE HAVE SEEN TWO OF THE MAIN STEPS IN JPEG COMPRESSION: DCT AND
QUANTIZATION.
THE REMAINING SMALL STEPS SHOWN IN THE BLOCK DIAGRAM ALL LEAD UP TO
ENTROPY CODING OF THE QUANTIZED DCT COEFFICIENTS.
THESE ADDITIONAL DATA COMPRESSION STEPS ARE LOSSLESS.
INTERESTINGLY, THE DC AND AC COEFFICIENTS ARE TREATED QUITE DIFFERENTLY
BEFORE ENTROPY CODING: RUN-LENGTH ENCODING ON ACS VERSUS DPCM ON
DCS.
RUN - LENGTH CODING (RLC) ON AC COEFFICIENTS
THERE ARE MANY ZEROS IN F(U, V) AFTER QUANTIZATION IS APPLIED. RUN-LENGTH
CODING (RLC) (OR RUN-LENGTH ENCODING, RLE) IS THEREFORE USEFUL IN
TURNING THE F(U, V) VALUES INTO SETS {#-ZEROS-TO-SKIP, NEXT NONZERO
VALUE}.
RLC IS EVEN MORE EFFECTIVE WHEN WE USE AN ADDRESSING SCHEME THAT MAKES IT
MOST LIKELY TO HIT A LONG RUN OF ZEROS: A ZIGZAG SCAN TURNS THE 8 X 8
MATRIX F(U, V) INTO A 64-VECTOR, AS THE FOLLOWING FIGURE ILLUSTRATES.
AFTER ALL, MOST IMAGE BLOCKS TEND TO HAVE SMALL HIGH-SPATIAL-FREQUENCY
COMPONENTS, WHICH ARE ZEROED OUT BY QUANTIZATION.
HENCE THE ZIGZAG SCAN ORDER HAS A GOOD CHANCE OF CONCATENATING LONG
RUNS OF ZEROS.
RUN - LENGTH CODING (RLC) ON AC COEFFICIENTS
FOR EXAMPLE, A QUANTIZED BLOCK F(U, V) MAY BE TURNED INTO
(32, 6, -1, -1, 0, -1, 0, 0, 0, -1, 0, 0, 1, 0, 0, ..., 0)
WITH THREE RUNS OF ZEROS IN THE MIDDLE AND A RUN OF 51 ZEROS AT THE END.
[FIGURE: ZIGZAG SCAN IN JPEG]
THE RLC STEP REPLACES VALUES BY A PAIR (RUNLENGTH, VALUE) FOR EACH
RUN OF ZEROS IN THE AC COEFFICIENTS OF F, WHERE RUNLENGTH IS THE
NUMBER OF ZEROS IN THE RUN AND VALUE IS THE NEXT NONZERO
COEFFICIENT. TO FURTHER SAVE BITS, A SPECIAL PAIR (0,0) INDICATES
THE END-OF-BLOCK AFTER THE LAST NONZERO AC COEFFICIENT IS
REACHED.
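THE ZIGZAG ORDER AND THE RLC STEP CAN BE SKETCHED AS BELOW, USING THE 64-VECTOR
FROM THE EXAMPLE. THE FUNCTION NAMES ARE HYPOTHETICAL. AS A SIMPLIFICATION, THE
SKETCH ALWAYS APPENDS THE (0,0) MARKER AND IGNORES THE SPECIAL HANDLING THAT
REAL JPEG NEEDS FOR RUNS LONGER THAN 15 ZEROS.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    def key(rc):
        diagonal = rc[0] + rc[1]
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        return (diagonal, rc[0] if diagonal % 2 else rc[1])
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def rlc_ac(zz):
    """Turn the 63 AC values of a zigzag vector into (runlength, value) pairs."""
    pairs, run = [], 0
    for value in zz[1:]:        # zz[0] is the DC coefficient, coded separately
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append((0, 0))        # end-of-block: only zeros remain
    return pairs


zz = [32, 6, -1, -1, 0, -1, 0, 0, 0, -1, 0, 0, 1] + [0] * 51
pairs = rlc_ac(zz)
# pairs == [(0, 6), (0, -1), (0, -1), (1, -1), (3, -1), (2, 1), (0, 0)]
```

THE 63 AC VALUES COLLAPSE TO SEVEN PAIRS, AND THE 51 TRAILING ZEROS COST ONLY
THE SINGLE END-OF-BLOCK PAIR.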
DIFFERENTIAL PULSE CODE MODULATION (DPCM) ON DC COEFFICIENTS
THE DC COEFFICIENTS ARE CODED SEPARATELY FROM THE AC ONES.
EACH 8 X 8 IMAGE BLOCK HAS ONLY ONE DC COEFFICIENT.
THE VALUES OF THE DC COEFFICIENTS FOR VARIOUS BLOCKS COULD BE LARGE
AND DIFFERENT, BECAUSE THE DC VALUE REFLECTS THE AVERAGE INTENSITY OF
EACH BLOCK, BUT CONSISTENT WITH OBSERVATION 1 ABOVE,
THE DC COEFFICIENT IS UNLIKELY TO CHANGE DRASTICALLY WITHIN A SHORT
DISTANCE.
THIS MAKES DPCM AN IDEAL SCHEME FOR CODING THE DC COEFFICIENTS.
DIFFERENTIAL PULSE CODE MODULATION (DPCM) ON DC COEFFICIENTS
IF THE DC COEFFICIENTS FOR THE FIRST FIVE IMAGE BLOCKS ARE 150, 155, 149, 152,
144, DPCM WOULD PRODUCE 150, 5, -6, 3, -8,
ASSUMING THE PREDICTOR FOR THE ITH BLOCK IS SIMPLY THE PREVIOUS DC VALUE, SO THAT
$d_i = DC_i - DC_{i-1}$ FOR I >= 1, AND $d_0 = DC_0$.
WE EXPECT DPCM CODES TO GENERALLY HAVE SMALLER MAGNITUDE AND
VARIANCE, WHICH IS BENEFICIAL FOR THE NEXT ENTROPY CODING STEP.
IT IS WORTH NOTING THAT UNLIKE THE RUN-LENGTH CODING OF THE AC
COEFFICIENTS, WHICH IS PERFORMED ON EACH INDIVIDUAL BLOCK, DPCM FOR THE
DC COEFFICIENTS IN JPEG IS CARRIED OUT ON THE ENTIRE IMAGE AT ONCE.
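THE DPCM EXAMPLE ABOVE CAN BE CHECKED WITH A FEW LINES. THE FUNCTION NAMES ARE
HYPOTHETICAL, AND THE PREDICTOR FOR THE FIRST BLOCK IS TAKEN TO BE ZERO.

```python
def dpcm_encode(dc_values):
    """Code each DC coefficient as its difference from the previous one."""
    previous, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dpcm_decode(diffs):
    """Undo DPCM by accumulating the differences."""
    total, dc_values = 0, []
    for d in diffs:
        total += d
        dc_values.append(total)
    return dc_values


dc = [150, 155, 149, 152, 144]
diffs = dpcm_encode(dc)          # [150, 5, -6, 3, -8]
```

DECODING THE DIFFERENCES RETURNS THE ORIGINAL DC VALUES EXACTLY, SO THIS STEP
IS LOSSLESS.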
ENTROPY CODING
THE DC AND AC COEFFICIENTS FINALLY UNDERGO AN ENTROPY CODING STEP.
BELOW, WE WILL DISCUSS ONLY THE BASIC ENTROPY CODING METHOD, WHICH
USES HUFFMAN CODING AND SUPPORTS ONLY 8-BIT PIXELS IN THE ORIGINAL
IMAGES (OR COLOR IMAGE COMPONENTS).
LET'S EXAMINE THE TWO ENTROPY CODING SCHEMES, USING A VARIANT OF
HUFFMAN CODING FOR DCS AND A SLIGHTLY DIFFERENT SCHEME FOR ACS.
HUFFMAN CODING OF DC COEFFICIENTS
EACH DPCM-CODED DC COEFFICIENT IS REPRESENTED BY A PAIR OF SYMBOLS
(SIZE, AMPLITUDE), WHERE SIZE INDICATES HOW MANY BITS ARE NEEDED FOR
REPRESENTING THE COEFFICIENT AND AMPLITUDE CONTAINS THE ACTUAL BITS.
DPCM VALUES COULD REQUIRE MORE THAN 8 BITS AND COULD BE NEGATIVE
VALUES. THE ONE'S-COMPLEMENT SCHEME IS USED FOR NEGATIVE NUMBERS,
THAT IS, BINARY CODE 10 FOR 2, 01 FOR -2; 11 FOR 3, 00 FOR -3; AND SO ON.
IN THE JPEG IMPLEMENTATION, SIZE IS HUFFMAN CODED AND IS HENCE A
VARIABLE-LENGTH CODE. IN OTHER WORDS, SIZE 2 MIGHT BE REPRESENTED AS A SINGLE BIT
(0 OR 1) IF IT APPEARED MOST FREQUENTLY.
IN GENERAL, SMALLER SIZES OCCUR MUCH MORE OFTEN, SO THE ENTROPY OF SIZE
IS LOW AND HUFFMAN CODING IT SAVES BITS.
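THE (SIZE, AMPLITUDE) REPRESENTATION CAN BE SKETCHED AS FOLLOWS. THE FUNCTION
NAME size_and_amplitude IS HYPOTHETICAL. ONLY THE AMPLITUDE BITS ARE PRODUCED
HERE, SINCE THE HUFFMAN CODE FOR SIZE DEPENDS ON THE TABLE IN USE.

```python
def size_and_amplitude(value):
    """Return (SIZE, amplitude bit string) for one DPCM-coded DC value."""
    size = abs(value).bit_length()     # bits needed for the magnitude
    if size == 0:
        return 0, ""                   # zero needs no amplitude bits
    if value < 0:
        # One's complement: flip every bit of the magnitude.
        value += (1 << size) - 1
    return size, format(value, "0{}b".format(size))


examples = {v: size_and_amplitude(v) for v in (2, -2, 3, -3, 150, -6)}
```

FOR EXAMPLE, 150 NEEDS SIZE 8 WITH AMPLITUDE 10010110, WHILE -6 NEEDS SIZE 3
WITH AMPLITUDE 001 (THE BITWISE COMPLEMENT OF 110).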
HUFFMAN ENCODING
INPUT: A SET OF NON-ZERO QUANTIZED DCT COEFFICIENTS FROM ALL THE
DIFFERENT BLOCKS OF THE IMAGE (VALUES LYING BETWEEN -1024 AND +1024).
OUTPUT: A SET OF ENCODED COEFFICIENTS WITH LENGTH (IN TERMS OF NUMBER
OF BITS) LESS THAN THAT OF THE ORIGINAL SET.
PRINCIPLES BEHIND HUFFMAN ENCODING:
(1) ENCODE THE MORE FREQUENTLY OCCURRING COEFFICIENTS WITH FEWER BITS.
ENCODE THE RARELY OCCURRING COEFFICIENTS WITH MORE BITS. THIS WILL REDUCE
THE AVERAGE BIT-LENGTH.
(2) ENSURE THAT THE ENCODING FOR NO COEFFICIENT IS A STRICT PREFIX OF THE
ENCODING OF ANY OTHER COEFFICIENT (TO BE EXPLAINED ON NEXT SLIDE). THIS IS
REQUIRED FOR UNAMBIGUOUS DECODING.
HUFFMAN ENCODING EXAMPLE
CONSIDER AN ALPHABET OF SYMBOLS {A, E, Q}. LET THE FREQUENCY OF A SYMBOL
‘X’ BE DENOTED AS P(X).
ASSUME P(E) > P(A) > P(Q) [ACTUALLY TRUE IN THE ENGLISH LANGUAGE].
CONSIDER THE FOLLOWING CODE-WORD ASSIGNMENT: E – 0, A – 1, Q – 01 (NOTE: WE
ASSIGNED MORE BITS FOR Q). NOW CONSIDER THE ENCODED STREAM: 001. IT CAN
BE INTERPRETED AS ‘EEA’ OR ‘EQ’.
THE REASON FOR THIS AMBIGUITY IS THAT THE CODE FOR ‘E’ IS A STRICT PREFIX OF
THE CODE FOR ‘Q’.
FOR UNAMBIGUOUS DECODING, WE NEED PREFIX-FREE CODES. E – 0, A – 10,
Q – 11 IS ONE EXAMPLE OF A PREFIX-FREE CODE.
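A SHORT CHECK OF THE TWO CODE ASSIGNMENTS ABOVE. THE HELPER NAMES
is_prefix_free AND decode ARE HYPOTHETICAL. THE DECODER SIMPLY EMITS A SYMBOL AS
SOON AS THE BITS READ SO FAR MATCH A CODE WORD, WHICH IS ONLY SAFE FOR
PREFIX-FREE CODES.

```python
def is_prefix_free(codes):
    """True if no code word is a prefix of another code word."""
    words = list(codes.values())
    return not any(i != j and words[j].startswith(words[i])
                   for i in range(len(words)) for j in range(len(words)))


def decode(bits, codes):
    """Decode a bit string symbol by symbol (assumes a prefix-free code)."""
    inverse = {word: symbol for symbol, word in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


ambiguous = {"E": "0", "A": "1", "Q": "01"}
prefix_free = {"E": "0", "A": "10", "Q": "11"}
message = decode("0" + "10" + "11", prefix_free)   # E, then A, then Q
```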
HUFFMAN ENCODING: ALGORITHM
1. SORT THE SYMBOLS IN INCREASING ORDER OF FREQUENCY. CREATE A LEAF NODE
FROM EACH SYMBOL. THESE LEAF NODES WILL BELONG TO A BINARY TREE CALLED
THE HUFFMAN TREE.
2. MERGE THE TWO LOWEST FREQUENCY NODES S1 AND S2 TO CREATE A PARENT
NODE S12. S1 AND S2 WILL BE THE LEFT AND RIGHT CHILDREN OF S12. THE FREQUENCY OF
S12 IS GIVEN BY P(S12) = P(S1) + P(S2).
3. LABEL THE EDGE FROM S12 TO S1 WITH A ‘0’ AND THE EDGE FROM S12 TO S2 WITH A ‘1’.
4. REMOVE S1 AND S2 FROM THE SORTED LIST OF SYMBOLS AND INSERT THE NODE S12,
I.E. THE ROOT NODE OF THE TREE (S12, S1, S2), IN THE CORRECT PLACE DEPENDING ON THE
VALUE OF P(S12).
HUFFMAN ENCODING: ALGORITHM
5. REPEAT STEPS 2 TO 4 UNTIL THERE IS ONLY ONE NODE IN THE LIST. THIS WILL BE THE
ROOT NODE OF THE FINAL HUFFMAN TREE.
6. TRAVERSE THE TREE FROM THE ROOT NODE TO EACH LEAF AND COLLECT ALL
THE BINARY SYMBOLS ALONG EVERY EDGE INTO A STRING. THIS STRING WILL FORM
THE CODE WORD FOR THAT SYMBOL.
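THE STEPS ABOVE CAN BE SKETCHED AS FOLLOWS. INSTEAD OF RE-SORTING A LIST, THE
SKETCH KEEPS THE NODES IN A PRIORITY QUEUE (PYTHON'S heapq), WHICH YIELDS THE
SAME TWO LOWEST-FREQUENCY NODES AT EACH STEP. THE FUNCTION NAME huffman_codes
AND THE INTEGER FREQUENCIES ARE HYPOTHETICAL, AND SYMBOLS ARE ASSUMED NOT TO BE
TUPLES.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a {symbol: code word} table from a {symbol: frequency} mapping."""
    order = count()   # tie-breaker so equal frequencies never compare nodes
    heap = [(p, next(order), symbol) for symbol, p in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # lowest frequency: left child, edge 0
        p2, _, s2 = heapq.heappop(heap)   # next lowest: right child, edge 1
        heapq.heappush(heap, (p1 + p2, next(order), (s1, s2)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: record the collected bits
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


codes = huffman_codes({"E": 60, "A": 30, "Q": 10})
```

THE MOST FREQUENT SYMBOL E RECEIVES A ONE-BIT CODE WORD AND A AND Q RECEIVE
TWO BITS EACH. THE EXACT BIT PATTERNS DEPEND ON THE 0/1 LABELLING CONVENTION, BUT
THE CODE WORD LENGTHS DO NOT.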
FOUR COMMONLY USED JPEG MODES
THE JPEG STANDARD DEFINED FOUR COMPRESSION MODES: HIERARCHICAL,
PROGRESSIVE, SEQUENTIAL AND LOSSLESS. FIGURE SHOWS THE RELATIONSHIP OF
MAJOR JPEG COMPRESSION MODES AND ENCODING PROCESSES.
SEQUENTIAL JPEG MODES
IMAGE COMPONENTS ARE COMPRESSED EITHER INDIVIDUALLY OR IN GROUPS.
EACH IMAGE COMPONENT IS ENCODED IN A SINGLE LEFT-TO-RIGHT, TOP-TO-
BOTTOM SCAN.
IT SUPPORTS ONLY 8-BIT IMAGES (NOT 12-BIT IMAGES)
COLOR COMPONENTS INTERLEAVING IS DONE TO SAVE BUFFER SIZE.
WITHIN SEQUENTIAL MODE, TWO ALTERNATE ENTROPY ENCODING PROCESSES ARE
DEFINED BY THE JPEG STANDARD: ONE USES HUFFMAN ENCODING; THE OTHER
USES ARITHMETIC CODING.
PROGRESSIVE JPEG MODES
A PROGRESSIVE JPEG IS AN IMAGE ENCODED SO THAT IT CAN BE DISPLAYED IN
SUCCESSIVE PASSES OF INCREASING QUALITY WHILE THE FILE IS STILL BEING
DOWNLOADED. THIS MAKES THE IMAGE APPEAR TO LOAD FASTER, BECAUSE A ROUGH
VERSION OF THE WHOLE IMAGE IS VISIBLE EARLY. A NORMAL (SEQUENTIAL) JPEG IS
DISPLAYED FROM TOP TO BOTTOM, LINE BY LINE.
PROGRESSIVE JPEG DELIVERS LOW QUALITY VERSIONS OF THE IMAGE QUICKLY,
FOLLOWED BY HIGHER QUALITY PASSES.
PRINCIPLE BEHIND THE PROGRESSIVE JPEG:
JPEG FIRST CONVERTS RGB PIXELS TO YCBCR PIXELS. INSTEAD OF HAVING RED,
GREEN AND BLUE CHANNELS, JPEG USES A LUMA (Y) CHANNEL AND TWO CHROMA
CHANNELS (CB AND CR).
THOSE CHANNELS ARE TREATED SEPARATELY, BECAUSE THE HUMAN EYE IS MORE
SENSITIVE TO DISTORTION IN LUMA (BRIGHTNESS) THAN IT IS TO DISTORTION IN
CHROMA (COLOR).
THE CHROMA CHANNELS ARE OPTIONALLY DOWNSAMPLED TO HALF THE ORIGINAL
RESOLUTION; THIS IS CALLED CHROMA SUBSAMPLING.
THEN, JPEG DOES A BIT OF MATHEMATICAL MAGIC WITH THE PIXELS. THIS MAGIC IS
CALLED THE DISCRETE COSINE TRANSFORM (DCT).
PRINCIPLE BEHIND THE PROGRESSIVE JPEG:
EVERY BLOCK OF 8X8 PIXELS (64 PIXEL VALUES) IS CONVERTED TO 64 COEFFICIENTS
THAT REPRESENT THE BLOCK’S INFORMATION IN A DIFFERENT WAY.
THE FIRST COEFFICIENT IS CALLED THE DC COEFFICIENT AND IT BOILS DOWN TO
THE AVERAGE PIXEL VALUE OF ALL THE PIXELS IN THE BLOCK.
THE OTHER 63 COEFFICIENTS (THE SO-CALLED AC COEFFICIENTS) REPRESENT
HORIZONTAL AND VERTICAL DETAILS WITHIN THE BLOCK; THEY ARE ORDERED
FROM LOW FREQUENCY (OVERALL GRADIENTS) TO HIGH FREQUENCY (SHARP
DETAILS).
THE GOAL OF THESE TRANSFORMATIONS IS TO DO LOSSY IMAGE COMPRESSION.
FOR OUR PERCEPTION, LUMA AND LOW-FREQUENCY SIGNALS ARE MORE
IMPORTANT THAN CHROMA AND HIGH-FREQUENCY SIGNALS.
THE PROGRESSIVE JPEG:
THERE ARE TWO WAYS OF DOING THIS:
1. SPECTRAL SELECTION: COEFFICIENTS ARE GROUPED INTO SPECTRAL BANDS, AND THE
LOWER-FREQUENCY BANDS ARE SENT FIRST.
2. SUCCESSIVE APPROXIMATION: DATA IS FIRST SENT WITH LOWER PRECISION AND
THEN REFINED.
SPECTRAL SELECTION
TAKES ADVANTAGE OF THE “SPECTRAL” (SPATIAL FREQUENCY SPECTRUM)
CHARACTERISTICS OF THE DCT COEFFICIENTS: HIGHER AC COMPONENTS PROVIDE
DETAIL INFORMATION.
SCAN 1: ENCODE DC AND FIRST FEW AC COMPONENTS,
E.G., AC1, AC2.
SCAN 2: ENCODE A FEW MORE AC COMPONENTS,
E.G., AC3, AC4, AC5. . . .
SCAN K: ENCODE THE LAST FEW ACS,
E.G., AC61, AC62, AC63.
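A MINIMAL SKETCH OF SPECTRAL SELECTION FOR A SINGLE BLOCK. THE BAND BOUNDARIES
AND THE FUNCTION NAME spectral_scans ARE HYPOTHETICAL, AND THE STAND-IN DATA
SIMPLY USES EACH COEFFICIENT'S ZIGZAG INDEX AS ITS VALUE.

```python
def spectral_scans(zz, bands):
    """Split a zigzag-ordered block into scans given inclusive index ranges."""
    return [zz[first:last + 1] for first, last in bands]


bands = [(0, 2), (3, 5), (6, 63)]   # hypothetical: DC+AC1..AC2, AC3..AC5, rest
zz = list(range(64))                 # stand-in block: value == zigzag index
scans = spectral_scans(zz, bands)

# The decoder can rebuild the block by concatenating the scans as they arrive.
rebuilt = [value for scan in scans for value in scan]
```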
SUCCESSIVE APPROXIMATION:
INSTEAD OF GRADUALLY ENCODING SPECTRAL BANDS, ALL DCT COEFFICIENTS ARE
ENCODED SIMULTANEOUSLY BUT WITH THEIR MOST SIGNIFICANT BITS (MSBS)
FIRST.
SCAN 1: ENCODE THE FIRST FEW MSBS, E.G., BITS 7, 6, 5, 4.
SCAN 2: ENCODE A FEW MORE LESS SIGNIFICANT BITS, E.G.,
BIT 3.
SCAN M: ENCODE THE LEAST SIGNIFICANT BIT (LSB),
BIT 0.
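SUCCESSIVE APPROXIMATION CAN BE ILLUSTRATED BY MASKING OFF THE LOW BITS OF EACH
VALUE. THIS SKETCH ASSUMES NON-NEGATIVE 8-BIT MAGNITUDES (SIGNS WOULD BE HANDLED
SEPARATELY) AND USES THE HYPOTHETICAL FUNCTION NAME bitplane_approximations.

```python
def bitplane_approximations(magnitudes, first_low_bit=4, total_bits=8):
    """Return what the decoder knows after each scan, MSBs first."""
    approximations = []
    for low in range(first_low_bit, -1, -1):
        # Keep bits total_bits-1 .. low, zero the bits below `low`.
        mask = ((1 << total_bits) - 1) & ~((1 << low) - 1)
        approximations.append([m & mask for m in magnitudes])
    return approximations


steps = bitplane_approximations([181, 7])
first_value = [approx[0] for approx in steps]   # refinement of the value 181
```

THE FIRST SCAN CARRIES BITS 7 TO 4, AND EACH LATER SCAN ADDS ONE MORE BIT UNTIL
THE EXACT VALUE IS REACHED.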
HIERARCHICAL JPEG MODE
THE HIERARCHICAL ENCODING ENCODES AN IMAGE IN MULTIPLE RESOLUTIONS.
FOR EXAMPLE, ONE COULD PROVIDE 320X240, 640X480 AND 1280X960 VERSIONS OF AN
IMAGE; THE DECODER AT THE RECEIVING END CAN CHOOSE THE OPTIMUM
RESOLUTION DEPENDING ON THE TARGET'S CAPABILITIES.
THUS, HIGH-RESOLUTION IMAGES CAN BE EASILY VIEWED IN LOWER RESOLUTION
DEVICES. THIS IS PARTICULARLY RELEVANT TO SMALL PORTABLE TERMINALS AND
FOR CONFERENCING WHERE MULTIPLE SMALLER IMAGES NEED TO SHARE THE
SCREEN WITH FULL SIZE IMAGES AT DIFFERENT TIMES.
HIERARCHICAL JPEG MODE
THE ENCODED IMAGE AT THE LOWEST RESOLUTION IS BASICALLY A COMPRESSED
LOW-PASS FILTERED IMAGE, WHEREAS THE IMAGES AT SUCCESSIVELY HIGHER
RESOLUTIONS PROVIDE ADDITIONAL DETAILS (DIFFERENCES FROM THE LOWER
RESOLUTION IMAGES).
SIMILAR TO PROGRESSIVE JPEG, THE HIERARCHICAL JPEG IMAGES CAN BE
TRANSMITTED IN MULTIPLE PASSES PROGRESSIVELY IMPROVING QUALITY.
THREE LEVEL HIERARCHICAL JPEG
ENCODER FOR A THREE-LEVEL HIERARCHICAL JPEG
REDUCTION OF IMAGE RESOLUTION:
REDUCE RESOLUTION OF THE INPUT IMAGE f (E.G., 512×512) BY A FACTOR OF 2 IN
EACH DIMENSION TO OBTAIN f2 (E.G., 256×256). REPEAT THIS TO OBTAIN f4 (E.G.,
128×128).
COMPRESS LOW-RESOLUTION IMAGE f4:
ENCODE f4 USING ANY OTHER JPEG METHOD (E.G., SEQUENTIAL, PROGRESSIVE)
TO OBTAIN THE ENCODED IMAGE F4.
ENCODER FOR A THREE-LEVEL HIERARCHICAL JPEG
COMPRESS DIFFERENCE IMAGE D2:
(A) DECODE F4 TO OBTAIN f4'. USE ANY INTERPOLATION METHOD TO EXPAND f4' TO
BE OF THE SAME RESOLUTION AS f2 AND CALL IT E(f4').
(B) ENCODE DIFFERENCE d2 = f2 − E(f4') USING ANY OTHER JPEG METHOD (E.G.,
SEQUENTIAL, PROGRESSIVE) TO GENERATE D2.
COMPRESS DIFFERENCE IMAGE D1:
(A) DECODE D2 TO OBTAIN d2'; ADD IT TO E(f4') TO GET f2' = E(f4') + d2', WHICH IS A
VERSION OF f2 AFTER COMPRESSION AND DECOMPRESSION.
(B) ENCODE DIFFERENCE d1 = f − E(f2') USING ANY OTHER JPEG METHOD (E.G.,
SEQUENTIAL, PROGRESSIVE) TO GENERATE D1.
DECODER FOR A THREE-LEVEL HIERARCHICAL JPEG
1. DECOMPRESS THE ENCODED LOW-RESOLUTION IMAGE F4:
– DECODE F4 USING THE SAME JPEG METHOD AS IN THE ENCODER TO OBTAIN f4'.
2. RESTORE IMAGE f2' AT THE INTERMEDIATE RESOLUTION:
– DECODE D2 TO OBTAIN d2', THEN USE E(f4') + d2' TO OBTAIN f2'.
3. RESTORE IMAGE f' AT THE ORIGINAL RESOLUTION:
– DECODE D1 TO OBTAIN d1', THEN USE E(f2') + d1' TO OBTAIN f'.
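THE THREE-LEVEL SCHEME CAN BE TRIED OUT ON A TINY IMAGE. EVERYTHING IN THIS
SKETCH IS A STAND-IN: codec REPLACES A REAL JPEG ENCODE/DECODE PAIR WITH SIMPLE
UNIFORM QUANTIZATION, reduce2 KEEPS EVERY SECOND PIXEL, AND expand2 (THE
INTERPOLATION E) REPLICATES PIXELS. ALL NAMES ARE HYPOTHETICAL.

```python
def reduce2(img):
    """Halve the resolution by keeping every second pixel in each dimension."""
    return [row[::2] for row in img[::2]]


def expand2(img):
    """E(.): double the resolution by pixel replication."""
    out = []
    for row in img:
        wide = [p for p in row for _ in (0, 1)]
        out += [wide, list(wide)]
    return out


def codec(img, step=4):
    """Stand-in for JPEG encode + decode: quantize values to multiples of step."""
    return [[step * round(p / step) for p in row] for row in img]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


f = [[3 * x + 5 * y for x in range(8)] for y in range(8)]   # 8x8 test image
f2 = reduce2(f)
f4 = reduce2(f2)

# Encoder: each level is coded relative to the decoded level below it.
f4_dec = codec(f4)                                  # f4'
d2_dec = codec(sub(f2, expand2(f4_dec)))            # d2'
f2_dec = add(expand2(f4_dec), d2_dec)               # f2' = E(f4') + d2'
d1_dec = codec(sub(f, expand2(f2_dec)))             # d1'

# Decoder: f' = E(f2') + d1'
f_dec = add(expand2(f2_dec), d1_dec)
max_err = max(abs(x - y) for ra, rb in zip(f, f_dec) for x, y in zip(ra, rb))
```

BECAUSE EACH DIFFERENCE IS TAKEN AGAINST THE DECODED LOWER LEVEL, ERRORS DO NOT
ACCUMULATE: THE FINAL ERROR IS BOUNDED BY THE ERROR OF CODING d1 ALONE (AT MOST
HALF THE STEP SIZE IN THIS SKETCH).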
JPEG BIT STREAM