
Image Compression Standards

The document discusses image compression standards and methods. It provides an overview of why image compression is used, common file formats like JPEG and GIF, and the differences between lossy and lossless compression. It then goes into more detail about JPEG compression, which is one of the most popular standards. The key aspects of JPEG compression include converting images from RGB to YCbCr color space, subsampling chroma information, applying the discrete cosine transform to concentrate information, quantizing high-frequency coefficients, and entropy encoding the data.

Uploaded by

Tejas Subramanya

IMAGE COMPRESSION

STANDARDS
IMAGE COMPRESSION:INTRODUCTION
 IMAGE COMPRESSION MEANS MINIMIZING THE SIZE IN BYTES OF A GRAPHICS FILE
WITHOUT DEGRADING THE QUALITY OF THE IMAGE TO AN
UNACCEPTABLE LEVEL.

 THE REDUCTION IN FILE SIZE ALLOWS MORE IMAGES TO BE STORED IN A GIVEN


AMOUNT OF DISK OR MEMORY SPACE.

 IT ALSO REDUCES THE TIME REQUIRED FOR IMAGES TO BE SENT OVER THE
INTERNET OR DOWNLOADED FROM WEB PAGES.
IMAGE COMPRESSION:INTRODUCTION
 THERE ARE SEVERAL DIFFERENT WAYS IN WHICH IMAGE FILES CAN BE
COMPRESSED. FOR INTERNET USE, THE TWO MOST COMMON COMPRESSED GRAPHIC
IMAGE FORMATS ARE THE JPEG (JOINT PHOTOGRAPHIC EXPERTS GROUP) FORMAT
AND THE GIF (GRAPHICS INTERCHANGE FORMAT) FORMAT.

 THE JPEG METHOD IS MORE OFTEN USED FOR PHOTOGRAPHS, WHILE THE GIF
METHOD IS COMMONLY USED FOR LINE ART AND OTHER IMAGES IN WHICH
GEOMETRIC SHAPES ARE RELATIVELY SIMPLE.
CONTINUED……
 A TEXT FILE OR PROGRAM CAN BE COMPRESSED WITHOUT THE INTRODUCTION OF ERRORS, BUT
ONLY UP TO A CERTAIN EXTENT.

 THIS IS CALLED LOSSLESS COMPRESSION. BEYOND THIS POINT, ERRORS ARE INTRODUCED. IN TEXT
AND PROGRAM FILES, IT IS CRUCIAL THAT COMPRESSION BE LOSSLESS BECAUSE A SINGLE ERROR
CAN SERIOUSLY DAMAGE THE MEANING OF A TEXT FILE, OR CAUSE A PROGRAM NOT TO RUN.

 IN IMAGE COMPRESSION, A SMALL LOSS IN QUALITY IS USUALLY NOT NOTICEABLE. THERE IS NO


"CRITICAL POINT" UP TO WHICH COMPRESSION WORKS PERFECTLY, BUT BEYOND WHICH IT
BECOMES IMPOSSIBLE.

 WHEN THERE IS SOME TOLERANCE FOR LOSS, THE COMPRESSION FACTOR CAN BE GREATER THAN
IT CAN WHEN THERE IS NO LOSS TOLERANCE. FOR THIS REASON, GRAPHIC IMAGES CAN BE
COMPRESSED MORE THAN TEXT FILES OR PROGRAMS.
CONTINUED……
 IN SUMMARY, IMAGE COMPRESSION IS THE PROCESS OF CONVERTING AN IMAGE FILE
INTO ANOTHER IMAGE FILE THAT OCCUPIES LESS STORAGE SPACE, WITHOUT
SACRIFICING ITS ESSENTIAL VISUAL CONTENT.

 USEFUL FOR SAVING STORAGE SPACE, AND TRANSMISSION COSTS.

 IMAGE COMPRESSION HAS BEEN MOST COMMONLY USED IN THE DATA STORAGE,
PRINTING AND TELECOMMUNICATION INDUSTRIES. DIGITAL IMAGE
COMPRESSION IS ALSO BEING PUT TO WORK IN AREAS SUCH AS FAX
TRANSMISSION, SATELLITE REMOTE SENSING, AND HIGH DEFINITION TELEVISION.


APPLICATIONS:
 A GOOD EXAMPLE IS THE HEALTH INDUSTRY, WHERE THE CONSTANT SCANNING AND/OR

STORAGE OF MEDICAL IMAGES AND DOCUMENTS TAKE PLACE.

 IMAGE COMPRESSION OFFERS MANY BENEFITS HERE, AS INFORMATION CAN BE STORED

WITHOUT PLACING LARGE LOADS ON SYSTEM SERVERS. DEPENDING ON THE TYPE OF

COMPRESSION APPLIED, IMAGES CAN BE COMPRESSED TO SAVE STORAGE SPACE, OR TO

SEND TO MULTIPLE PHYSICIANS FOR EXAMINATION.

 CONVENIENTLY, THESE IMAGES CAN BE DECOMPRESSED WHEN THEY ARE READY TO BE
VIEWED, RETAINING THE ORIGINAL HIGH QUALITY AND DETAIL THAT MEDICAL IMAGERY
DEMANDS.
APPLICATIONS:
 IN THE SECURITY INDUSTRY, IMAGE COMPRESSION CAN GREATLY INCREASE THE EFFICIENCY OF

RECORDING, PROCESSING AND STORAGE.

 HOWEVER, IN THIS APPLICATION IT IS IMPERATIVE TO DETERMINE WHETHER ONE COMPRESSION

STANDARD WILL BENEFIT ALL AREAS.

 FOR EXAMPLE, IN A VIDEO NETWORKING OR CLOSED-CIRCUIT TELEVISION APPLICATION, SEVERAL

IMAGES AT DIFFERENT FRAME RATES MAY BE REQUIRED.

 TIME IS ALSO A CONSIDERATION, AS DIFFERENT AREAS MAY NEED TO BE RECORDED FOR VARIOUS

LENGTHS OF TIME.

 IMAGE RESOLUTION AND QUALITY ALSO BECOME CONSIDERATIONS, AS DOES NETWORK BANDWIDTH,

AND THE OVERALL SECURITY OF THE SYSTEM


TYPES OF COMPRESSION

 LOSSLESS: THE COMPRESSED IMAGE CAN BE CONVERTED BACK WITH ZERO ERROR.

 LOSSY: THE COMPRESSED IMAGE CANNOT BE CONVERTED BACK TO THE ORIGINAL
WITHOUT ERROR. THE AMOUNT OF ERROR USUALLY DECREASES AS THE ALLOWED
STORAGE SPACE INCREASES, AND THIS TRADE-OFF CAN BE CONTROLLED BY THE USER.


LOSSLESS COMPRESSION -EXAMPLES

 LZW METHOD (USED IN GIF AND TIFF)

 HUFFMAN ENCODING (PART OF THE JPEG ALGORITHM, ALTHOUGH OVERALL JPEG IS

LOSSY)

 RUN-LENGTH ENCODING (ALSO PART OF THE JPEG ALGORITHM, ALTHOUGH JPEG IS

LOSSY OVERALL)
LOSSY COMPRESSION -EXAMPLES

 JPEG(JOINT PHOTOGRAPHIC EXPERTS GROUP)

 MPEG (MOVING PICTURE EXPERTS GROUP, FOR VIDEO)

 MP3 (AN AUDIO CODING FORMAT FOR DIGITAL AUDIO THAT USES A FORM
OF IRREVERSIBLE, I.E. LOSSY, DATA COMPRESSION)

 MACHINE LEARNING BASED TECHNIQUES FOR COMPRESSION OF IMAGES OR VIDEO

(NOT COVERED IN THIS COURSE).


LOSSY IMAGE COMPRESSION

 COMPRESSION OF TEXT FILES OR EXE FILES CANNOT AFFORD TO BE LOSSY

 BUT SOME PORTION OF IMAGE CONTENT IS OFTEN NOT VERY NOTICEABLE TO THE

HUMAN EYE, ESPECIALLY THE HIGHER FREQUENCIES. DISCARDING THIS

EXTRANEOUS INFORMATION LEADS TO COMPRESSION WITHOUT SIGNIFICANT LOSS

OF VISUAL APPEAL.
JPEG COMPRESSION METHOD
 JPEG = JOINT PHOTOGRAPHIC EXPERTS GROUP

 ONE OF THE MOST POPULAR STANDARDS FOR COMPRESSION OF PHOTOGRAPHIC

IMAGES –WIDELY USED ON THE INTERNET.

 WIDELY USED IN DIGITAL CAMERAS.

 IMPLEMENTED IN ALL STANDARD IMAGE PROCESSING SOFTWARE (MATLAB, openCV,

ETC.)

 ESSENTIALLY LOSSY(THOUGH THERE ARE SOME LOSSLESS VARIANTS)

 APPLICABLE FOR COLOR AS WELL AS GRAYSCALE IMAGES.


JPEG COMPRESSION METHOD

 A USER-SPECIFIED QUALITY FACTOR (Q) BETWEEN 0 AND 100 (HIGHER Q MEANS

BETTER QUALITY)

 JPEG ALGORITHM COMPRESSES THE IMAGE BASED ON THE USER-PROVIDED Q.

 THE HIGHER THE Q, THE LOWER THE COMPRESSION RATIO (BUT THE HIGHER THE IMAGE
QUALITY). A LOWER Q GIVES A HIGHER COMPRESSION RATIO (BUT POORER IMAGE
QUALITY).

 JPEG CAN ACHIEVE COMPRESSION RATIOS OF ABOUT 10:1 TO 15:1 WITH LITTLE LOSS OF QUALITY.
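
The slides do not say how Q is turned into quantization step sizes. As one illustration, the sketch below follows the scaling convention used by the libjpeg (IJG) library, which is an implementation choice rather than part of the JPEG standard. `BASE_LUMA_Q` is the example luminance table from Annex K of the JPEG specification; the function name `scale_quant_table` is hypothetical.

```python
# Example luminance quantization table (JPEG specification, Annex K).
BASE_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_quant_table(base, quality):
    """Scale a base table by a quality factor (libjpeg-style convention, assumed here)."""
    quality = max(1, min(100, quality))  # libjpeg treats Q <= 0 as 1
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Larger entries mean coarser quantization; clamp to the 8-bit range 1..255.
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row] for row in base]
```

Under this convention Q = 50 returns the base table unchanged, and Q = 100 sets every step to 1, so only rounding error remains.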
JPEG COMPRESSION METHOD
MAIN STEPS IN JPEG IMAGE COMPRESSION

AS WE KNOW, UNLIKE ONE - DIMENSIONAL AUDIO SIGNALS, A DIGITAL IMAGE F (I, J) IS


NOT DEFINED OVER THE TIME DOMAIN.

INSTEAD, IT IS DEFINED OVER A SPATIAL DOMAIN — THAT IS, AN IMAGE IS A FUNCTION


OF THE TWO DIMENSIONS I AND J (OR, CONVENTIONALLY, X AND Y).

THE 2D DCT IS USED AS ONE STEP IN JPEG, TO YIELD A FREQUENCY RESPONSE THAT IS
A FUNCTION F (U, V) IN THE SPATIAL FREQUENCY DOMAIN, INDEXED BY TWO INTEGERS
U AND V.

JPEG IS A LOSSY IMAGE COMPRESSION METHOD. THE EFFECTIVENESS OF THE DCT


TRANSFORM CODING METHOD IN JPEG RELIES ON THREE MAJOR OBSERVATIONS:
OBSERVATIONS FOR JPEG IMAGE COMPRESSION

OBSERVATION 1: USEFUL IMAGE CONTENTS CHANGE RELATIVELY SLOWLY ACROSS

THE IMAGE, I.E., IT IS UNUSUAL FOR INTENSITY VALUES TO VARY WIDELY SEVERAL

TIMES IN A SMALL AREA, FOR EXAMPLE, WITHIN AN 8×8 IMAGE BLOCK.

 MUCH OF THE INFORMATION IN AN IMAGE IS REPEATED, HENCE “SPATIAL

REDUNDANCY”.
OBSERVATIONS FOR JPEG IMAGE COMPRESSION

OBSERVATION 2: PSYCHOPHYSICAL EXPERIMENTS SUGGEST THAT HUMANS ARE

MUCH LESS LIKELY TO NOTICE THE LOSS OF VERY HIGH SPATIAL FREQUENCY

COMPONENTS THAN THE LOSS OF LOWER FREQUENCY COMPONENTS.

 THE SPATIAL REDUNDANCY CAN BE REDUCED BY LARGELY REDUCING THE HIGH

SPATIAL FREQUENCY CONTENTS.


OBSERVATIONS FOR JPEG IMAGE COMPRESSION

OBSERVATION 3: VISUAL ACUITY (ACCURACY IN DISTINGUISHING CLOSELY SPACED

LINES) IS MUCH GREATER FOR GRAY (“BLACK AND WHITE”) THAN FOR COLOR.

 CHROMA SUBSAMPLING (4:2:0) IS USED IN JPEG.


JPEG IMAGE COMPRESSION

THE GENERAL APPROACH TO IMAGE COMPRESSION IMPLIES THE FOLLOWING

STAGES:
 COLOR TRANSFORM FROM RGB TO YCBCR TOGETHER WITH A SHIFT
 SUB-SAMPLING AND PARTITIONING
 TRANSFORM TO A FREQUENCY DOMAIN
 REMOVAL OF HIGH-FREQUENCY DETAIL FROM THE IMAGE
 REORDERING FOR BETTER COMPRESSION
 REMOVAL OF ZEROS SERIES
 LOSSLESS ENTROPY CODING TO REMOVE SOME MORE EXTRA DATA
 FINAL PACKING
JPEG IMAGE COMPRESSION
JPEG IMAGE COMPRESSION
 JPEG'S APPROACH TO THE USE OF DCT IS BASICALLY TO REDUCE HIGH - FREQUENCY - CONTENTS

AND THEN EFFICIENTLY CODE THE RESULT INTO A BIT STRING.

 THE TERM SPATIAL REDUNDANCY INDICATES THAT MUCH OF THE INFORMATION IN AN IMAGE IS

REPEATED: IF A PIXEL IS RED, THEN ITS NEIGHBOR IS LIKELY RED ALSO.

 BECAUSE OF OBSERVATION 2 ABOVE, THE DCT COEFFICIENTS FOR THE LOWEST FREQUENCIES ARE

MOST IMPORTANT.

 THEREFORE, AS FREQUENCY GETS HIGHER, IT BECOMES LESS IMPORTANT TO REPRESENT THE DCT

COEFFICIENT ACCURATELY.

 IT MAY EVEN BE SAFELY SET TO ZERO WITHOUT LOSING MUCH PERCEIVABLE IMAGE INFORMATION.
JPEG IMAGE COMPRESSION
 CLEARLY, A STRING OF ZEROS CAN BE REPRESENTED EFFICIENTLY AS THE LENGTH OF THE RUN OF
ZEROS, SO FEWER BITS ARE REQUIRED. BECAUSE WE END UP USING FEWER NUMBERS
TO REPRESENT THE PIXELS IN EACH BLOCK, HAVING REMOVED SOME LOCATION-DEPENDENT INFORMATION, WE
HAVE EFFECTIVELY REMOVED SPATIAL REDUNDANCY.

 JPEG WORKS FOR BOTH COLOR AND GRAYSCALE IMAGES. IN THE CASE OF COLOR IMAGES, SUCH AS YIQ

OR YUV, THE ENCODER WORKS ON EACH COMPONENT SEPARATELY, USING THE SAME ROUTINES.

 IF THE SOURCE IMAGE IS IN A DIFFERENT COLOR FORMAT, THE ENCODER PERFORMS A COLOR-SPACE
CONVERSION TO YIQ OR YUV. THE CHROMINANCE IMAGES (I, Q OR U, V) ARE SUBSAMPLED: JPEG USES THE
4:2:0 SCHEME. (WITH 4:2:0, FOR EVERY TWO ROWS OF FOUR PIXELS, COLOR IS SAMPLED FROM TWO
PIXELS IN THE TOP ROW AND ZERO PIXELS IN THE BOTTOM ROW.)
COLOR TRANSFORM FROM RGB TO YCBCR

 THAT TRANSFORM IS BASED ON THE PHYSIOLOGY OF HUMAN VISION.

 THE HUMAN VISUAL SYSTEM CAN PERCEIVE MINOR CHANGES OF BRIGHTNESS,

THOUGH IT’S FAR LESS RESPONSIVE TO CHANGES OF COLOR (CHROMA COMPONENTS

OF THE IMAGE) FOR THE REGIONS WITH THE SAME BRIGHTNESS.

 THAT IS WHY WE CAN APPLY STRONGER COMPRESSION TO CHROMA TO REDUCE THE
SIZE OF THE COMPRESSED IMAGE.

 WE TAKE AN RGB IMAGE AND CONVERT IT TO LUMA/CHROMA REPRESENTATION IN

ORDER TO SEPARATE LUMA FROM CHROMA AND TO PROCESS THEM SEPARATELY.


COLOR TRANSFORM FROM RGB TO YCBCR

 LUMA IS USUALLY CALLED Y (INTENSITY, BRIGHTNESS) AND THE CHROMA COMPONENTS
ARE CALLED CB AND CR (THESE ARE ESSENTIALLY SCALED DIFFERENCES: CB ∝ B − Y AND CR ∝ R − Y).
THE TRANSFORM IS DONE TOGETHER WITH A DATA SHIFT THAT PREPARES THE DATA
FOR THE NEXT PROCESSING STAGE, THE DCT (DISCRETE COSINE TRANSFORM).
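
The slides give no numeric coefficients for this transform. The sketch below assumes the full-range conversion commonly used in JFIF files (ITU-R BT.601 luma weights), with an offset of 128 on the chroma channels and a shift of 128 applied to each sample before the DCT. The function names are hypothetical and the results are left as unrounded floats.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (JFIF-style coefficients, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b   # scaled (B - Y), offset by 128
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b   # scaled (R - Y), offset by 128
    return y, cb, cr


def level_shift(sample):
    """Shift an 8-bit sample from 0..255 to -128..127 before the DCT."""
    return sample - 128
```

Any gray pixel (R = G = B) maps to Cb = Cr = 128, which shows that the chroma channels carry only the color difference.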


SUBSAMPLING AND PARTITIONING

 SINCE WE CAN CONSIDER CHROMA COMPONENTS TO BE LESS IMPORTANT
THAN LUMA, WE CAN DECREASE THE TOTAL NUMBER OF CHROMA PIXELS.

 FOR EXAMPLE, WE CAN AVERAGE CHROMA IN THE HORIZONTAL OR VERTICAL
DIRECTION.

 IN THE MOST EXTREME CASE, WE CAN AVERAGE 4 NEIGHBORING CHROMA VALUES IN
A 2X2 RECTANGLE TO GET JUST ONE NEW VALUE.

 THAT MODE IS CALLED 4:2:0 AND IT IS THE MOST POPULAR CHOICE FOR
SUBSAMPLING.
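
The 2x2 averaging described above can be sketched as follows. This is a minimal illustration that assumes the chroma plane is a list of rows with even width and height; real encoders also handle odd sizes and may use other filters. The name `subsample_420` is hypothetical.

```python
def subsample_420(plane):
    """Average each 2x2 group of chroma samples into a single value."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]
```

The result has half the width and half the height, so each chroma plane holds one quarter of the original samples.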
SUBSAMPLING AND PARTITIONING
 SUBSAMPLING: THE REDUCTION OF COLOR RESOLUTION IN DIGITAL COMPONENT VIDEO SIGNALS IN ORDER TO SAVE

STORAGE AND BANDWIDTH.

 THE COLOR COMPONENTS ARE COMPRESSED BY SAMPLING THEM AT A LOWER RATE THAN THE BRIGHTNESS (LUMA).

 ALTHOUGH COLOR INFORMATION IS DISCARDED, HUMAN EYES ARE LESS SENSITIVE TO COLOR THAN TO

BRIGHTNESS.

 YCBCR SAMPLING IS DESIGNATED AS 4:N:N (Y:CB:CR). IN 4:2:0, THE ZERO MEANS THAT CB AND CR ARE SAMPLED AT HALF THE VERTICAL
RESOLUTION OF Y. MPEG-1 AND MPEG-2 USE 4:2:0, BUT THE SAMPLES ARE TAKEN AT DIFFERENT INTERVALS. BY THE
TIME MPEG-2 CAME ALONG, IT WAS KNOWN THAT 4:2:0 CODING WAS OFTEN CONVERTED TO 4:2:2, WHICH IS WHY
MPEG-2 SAMPLING MORE CLOSELY LINES UP WITH THE 4:2:2 PATTERN. H.261/263 ALSO USES 4:2:0.
SUBSAMPLING AND PARTITIONING

 FOR FURTHER PROCESSING, WE DIVIDE THE WHOLE IMAGE INTO 8X8 BLOCKS FOR
LUMA AND CHROMA. THAT PARTITIONING SCHEME LETS US PROCESS EACH BLOCK
INDEPENDENTLY, THOUGH WE HAVE TO KEEP TRACK OF THE COORDINATES OF EACH
BLOCK, WHICH ARE ESSENTIAL FOR IMAGE DECODING.
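
A minimal sketch of the partitioning step, assuming the plane is a list of rows whose width and height are multiples of 8 (real encoders pad the edges). The generator name `split_into_blocks` is hypothetical; it yields each block together with the coordinates of its top-left corner.

```python
def split_into_blocks(plane, n=8):
    """Yield (top, left, block) for every n x n block, in raster order."""
    height, width = len(plane), len(plane[0])
    for top in range(0, height, n):
        for left in range(0, width, n):
            block = [row[left:left + n] for row in plane[top:top + n]]
            yield top, left, block
```

Because blocks are produced in a fixed raster order, a decoder can recover the coordinates from the block index alone.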


DISCRETE COSINE TRANSFORM

 THE IDEAS BEHIND JPEG COMPRESSION COME FROM AN ENGINEERING

BACKGROUND. ELECTRICAL AND SOUND WAVES CAN BE REPRESENTED AS A SERIES

OF AMPLITUDES OVER TIME. DISCRETE COSINE TRANSFORM (DCT) IS ONE OF THE

BASIC BUILDING BLOCKS FOR JPEG.

 AN IMPORTANT ASPECT OF THE DCT IS THAT ITS COEFFICIENTS CAN BE QUANTIZED
USING FREQUENCY-WEIGHTED QUANTIZATION VALUES.


DISCRETE COSINE TRANSFORM

 DCT IS A FOURIER-RELATED TRANSFORM WHICH IS SIMILAR TO THE DISCRETE

FOURIER TRANSFORM (DFT) BUT USING ONLY REAL NUMBERS.

 IN PRACTICE, WE APPLY THE 2D TRANSFORM TO EACH 8X8 BLOCK OF OUR IMAGE.

 THE MAIN IDEA IS TO OBTAIN A DIFFERENT DATA REPRESENTATION BY MOVING FROM
THE SPATIAL DOMAIN TO THE FREQUENCY DOMAIN.

 THE RESULT OF THE DCT IS A DATA ARRAY IN THE FREQUENCY DOMAIN. FROM HERE ON
WE WORK NOT DIRECTLY WITH LUMA AND CHROMA VALUES, BUT
WITH THE SPATIAL FREQUENCIES OF LUMA AND CHROMA IN OUR IMAGE.


DISCRETE COSINE TRANSFORM

 BIG OBJECTS IN THE IMAGE ARE CONSIDERED TO BE LOW-FREQUENCY DATA, WHILE
SMALL/TINY OBJECTS ARE CONSIDERED TO BE HIGH-FREQUENCY ELEMENTS.

 IN THE NEW 8X8 BLOCK THE UPPER LEFT ELEMENT IS CALLED DC (IT IS PROPORTIONAL TO THE AVERAGE
VALUE OF ALL PIXELS IN THE ORIGINAL BLOCK), AND ALL OTHER ELEMENTS ARE
CALLED AC.

 IF WE COMPOSE A NEW IMAGE FROM THE DC ELEMENTS OF EACH BLOCK, WE GET THE ORIGINAL
IMAGE AT REDUCED RESOLUTION. THE NEW WIDTH AND HEIGHT WILL BE 1/8 OF THOSE OF THE
ORIGINAL IMAGE.
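
To make the DC/AC distinction concrete, here is a direct (slow) 2D DCT-II sketch with orthonormal scaling, which for 8x8 blocks matches the scaling usually quoted for JPEG. It is written for clarity, not speed; real codecs use fast factorizations. The function name `dct_2d` is hypothetical.

```python
import math


def dct_2d(block):
    """Direct 2D DCT-II of a square block, O(n^4), for illustration only."""
    n = len(block)

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for i in range(n):
                for j in range(n):
                    total += (
                        block[i][j]
                        * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                    )
            out[u][v] = scale(u) * scale(v) * total
    return out
```

With this scaling, a flat 8x8 block of value 10 gives a DC coefficient of 80 (eight times the block average) and AC coefficients that are all zero up to rounding error.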
QUANTIZATION

 THE DCT COEFFICIENTS ARE FLOATING POINT NUMBERS AND STORING THEM IN A

FILE WILL PRODUCE NO COMPRESSION. SO THEY NEED TO BE QUANTIZED.

 THE HUMAN EYE IS NOT SENSITIVE TO CHANGES IN THE HIGHER FREQUENCY

CONTENT. SO WE CAN HAVE CRUDER QUANTIZATION FOR THE HIGHER FREQUENCY

COEFFICIENTS AND A FINER ONE FOR THE LOWER FREQUENCY COEFFICIENTS.

 QUANTIZATION IS PERFORMED BY DIVIDING THE DCT COEFFICIENTS BY A

QUANTIZATION MATRIX AND ROUNDING OFF TO THE NEAREST INTEGER.


QUANTIZATION

 THIS IS THE LOSSY PART OF JPEG!

 THE QUANTIZATION STEP IN JPEG IS AIMED AT REDUCING THE TOTAL NUMBER OF

BITS NEEDED FOR A COMPRESSED IMAGE. IT CONSISTS OF SIMPLY DIVIDING EACH

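ENTRY IN THE FREQUENCY SPACE BLOCK BY AN INTEGER, THEN ROUNDING:

$$\hat{F}(u, v) = \operatorname{round}\left(\frac{F(u, v)}{Q(u, v)}\right)$$

 HERE, F(U,V) REPRESENTS A DCT COEFFICIENT, Q(U,V) IS THE CORRESPONDING ENTRY OF THE QUANTIZATION MATRIX, AND $\hat{F}(u, v)$ IS THE QUANTIZED COEFFICIENT PASSED ON TO ENTROPY CODING.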
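
The divide-and-round step, and the matching multiplication a decoder would perform, can be sketched as below. Rounding half away from zero is an assumption made here; the function names are hypothetical.

```python
import math


def quantize(coeffs, qtable):
    """Divide each DCT coefficient by its table entry and round to an integer."""
    def round_half_away(x):
        return int(math.copysign(math.floor(abs(x) + 0.5), x))

    return [
        [round_half_away(f / q) for f, q in zip(f_row, q_row)]
        for f_row, q_row in zip(coeffs, qtable)
    ]


def dequantize(levels, qtable):
    """Decoder side: multiply back; the rounding error is not recoverable."""
    return [[lv * q for lv, q in zip(l_row, q_row)] for l_row, q_row in zip(levels, qtable)]
```

For example, a coefficient of 515 with step 16 becomes 32 and is restored as 512; the difference of 3 is the information lost.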


PREPARATION FOR ENTROPY CODING

 SO FAR WE HAVE SEEN TWO OF THE MAIN STEPS IN JPEG COMPRESSION: DCT AND
QUANTIZATION.

 THE REMAINING SMALL STEPS SHOWN IN THE BLOCK DIAGRAM ALL LEAD UP TO

ENTROPY CODING OF THE QUANTIZED DCT COEFFICIENTS.

 THESE ADDITIONAL DATA COMPRESSION STEPS ARE LOSSLESS.

 INTERESTINGLY, THE DC AND AC COEFFICIENTS ARE TREATED QUITE DIFFERENTLY

BEFORE ENTROPY CODING: RUN - LENGTH ENCODING ON ACS VERSUS DPCM ON

DCS.
RUN - LENGTH CODING (RLC) ON AC COEFFICIENTS
 THERE ARE MANY ZEROS IN F(U, V) AFTER QUANTIZATION IS APPLIED. RUN-LENGTH
CODING (RLC) (OR RUN-LENGTH ENCODING, RLE) IS THEREFORE USEFUL IN
TURNING THE F(U, V) VALUES INTO SETS {#-ZEROS-TO-SKIP, NEXT NONZERO
VALUE}.

 RLC IS EVEN MORE EFFECTIVE WHEN WE USE AN ADDRESSING SCHEME, MAKING IT


MOST LIKELY TO HIT A LONG RUN OF ZEROS: A ZIGZAG SCAN TURNS THE 8 X 8
MATRIX F(U, V) INTO A 64 - VECTOR, AS THE FOLLOWING FIGURE ILLUSTRATES.

 AFTER ALL, MOST IMAGE BLOCKS TEND TO HAVE SMALL HIGH - SPATIAL -
FREQUENCY COMPONENTS, WHICH ARE ZEROED OUT BY QUANTIZATION.

 HENCE THE ZIGZAG SCAN ORDER HAS A GOOD CHANCE OF CONCATENATING LONG RUNS OF ZEROS.
RUN - LENGTH CODING (RLC) ON AC COEFFICIENTS
FOR EXAMPLE, F (U , V) WILL BE TURNED INTO

(32,6,-1,-1,0,-1,0,0,0,-1,0,0, 1,0,0,... ,0)

WITH THREE RUNS OF ZEROS IN THE MIDDLE AND A RUN OF 51 ZEROS AT THE END.

[FIGURE: ZIGZAG SCAN IN JPEG]

THE RLC STEP REPLACES VALUES BY A PAIR (RUNLENGTH, VALUE) FOR EACH
RUN OF ZEROS IN THE AC COEFFICIENTS OF F, WHERE RUNLENGTH IS THE
NUMBER OF ZEROS IN THE RUN AND VALUE IS THE NEXT NONZERO
COEFFICIENT. TO FURTHER SAVE BITS, A SPECIAL PAIR (0,0) INDICATES
THE END-OF-BLOCK AFTER THE LAST NONZERO AC COEFFICIENT IS
REACHED.
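
The zigzag scan and the (RUNLENGTH, VALUE) pairing can be sketched as follows, using the 64-vector from the example above. This is a simplified illustration: it always appends the (0,0) end-of-block pair and does not split runs longer than 15 zeros, which a real baseline encoder must do. All function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag order."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Walk the anti-diagonals; odd diagonals go downward, even ones go upward.
    return sorted(cells, key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))


def zigzag_scan(block):
    """Flatten a square block into a vector using the zigzag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]


def rlc_ac(vector):
    """Run-length code the AC part (everything after the first element)."""
    pairs, run = [], 0
    for value in vector[1:]:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append((0, 0))  # end-of-block marker; trailing zeros are dropped
    return pairs


example = [32, 6, -1, -1, 0, -1, 0, 0, 0, -1, 0, 0, 1] + [0] * 51
pairs = rlc_ac(example)
# pairs == [(0, 6), (0, -1), (0, -1), (1, -1), (3, -1), (2, 1), (0, 0)]
```

The 63 AC values collapse to seven pairs, and the final run of 51 zeros costs only the end-of-block pair.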
DIFFERENTIAL PULSE CODE MODULATION (DPCM) ON DC COEFFICIENTS

 THE DC COEFFICIENTS ARE CODED SEPARATELY FROM THE AC ONES.

 EACH 8 X 8 IMAGE BLOCK HAS ONLY ONE DC COEFFICIENT.

 THE VALUES OF THE DC COEFFICIENTS FOR VARIOUS BLOCKS COULD BE LARGE
AND DIFFERENT, BECAUSE THE DC VALUE REFLECTS THE AVERAGE INTENSITY OF
EACH BLOCK. BUT, CONSISTENT WITH OBSERVATION 1 ABOVE,
THE DC COEFFICIENT IS UNLIKELY TO CHANGE DRASTICALLY WITHIN A SHORT
DISTANCE.

 THIS MAKES DPCM AN IDEAL SCHEME FOR CODING THE DC COEFFICIENTS.


DIFFERENTIAL PULSE CODE MODULATION (DPCM) ON DC COEFFICIENTS

 IF THE DC COEFFICIENTS FOR THE FIRST FIVE IMAGE BLOCKS ARE 150, 155, 149, 152,
144, DPCM WOULD PRODUCE 150, 5, -6, 3, -8,

 ASSUMING THE PREDICTOR FOR THE I-TH BLOCK IS SIMPLY $d_i = DC_i - DC_{i-1}$ FOR $i \ge 1$, AND
$d_0 = DC_0$.

 WE EXPECT DPCM CODES TO GENERALLY HAVE SMALLER MAGNITUDE AND


VARIANCE, WHICH IS BENEFICIAL FOR THE NEXT ENTROPY CODING STEP.

 IT IS WORTH NOTING THAT UNLIKE THE RUN - LENGTH CODING OF THE AC


COEFFICIENTS, WHICH IS PERFORMED ON EACH INDIVIDUAL BLOCK, DPCM FOR THE
DC COEFFICIENTS IN JPEG IS CARRIED OUT ON THE ENTIRE IMAGE AT ONCE.
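
A minimal sketch of DPCM on the DC coefficients, using the five values from the example above. Each output is the difference from the previous block's DC value, and the first value is sent unchanged. The function names are hypothetical.

```python
def dpcm_encode(dc_values):
    """Replace each DC value (after the first) by its difference from the previous one."""
    if not dc_values:
        return []
    return [dc_values[0]] + [cur - prev for prev, cur in zip(dc_values, dc_values[1:])]


def dpcm_decode(diffs):
    """Rebuild the DC values with a running sum; DPCM itself is lossless."""
    values, running = [], 0
    for d in diffs:
        running += d
        values.append(running)
    return values


codes = dpcm_encode([150, 155, 149, 152, 144])
# codes == [150, 5, -6, 3, -8]
```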
ENTROPY CODING

 THE DC AND AC COEFFICIENTS FINALLY UNDERGO AN ENTROPY CODING STEP.

 BELOW, WE WILL DISCUSS ONLY THE BASIC ENTROPY CODING METHOD, WHICH
USES HUFFMAN CODING AND SUPPORTS ONLY 8 - BIT PIXELS IN THE ORIGINAL
IMAGES (OR COLOR IMAGE COMPONENTS).

 LET'S EXAMINE THE TWO ENTROPY CODING SCHEMES, USING A VARIANT OF


HUFFMAN CODING FOR DCS AND A SLIGHTLY DIFFERENT SCHEME FOR ACS.
HUFFMAN CODING OF DC COEFFICIENTS

 EACH DPCM - CODED DC COEFFICIENT IS REPRESENTED BY A PAIR OF SYMBOLS


(SIZE, AMPLITUDE), WHERE SIZE INDICATES HOW MANY BITS ARE NEEDED FOR
REPRESENTING THE COEFFICIENT AND AMPLITUDE CONTAINS THE ACTUAL BITS.

 DPCM VALUES COULD REQUIRE MORE THAN 8 BITS AND COULD BE NEGATIVE
VALUES. THE ONE'S-COMPLEMENT SCHEME IS USED FOR NEGATIVE NUMBERS,
THAT IS, BINARY CODE 10 FOR 2, 01 FOR -2; 11 FOR 3, 00 FOR -3; AND SO ON.

 IN THE JPEG IMPLEMENTATION, SIZE IS HUFFMAN CODED AND IS HENCE A
VARIABLE-LENGTH CODE. IN OTHER WORDS, SIZE 2 MIGHT BE REPRESENTED AS A SINGLE BIT
(0 OR 1) IF IT APPEARED MOST FREQUENTLY.

 IN GENERAL, SMALLER SIZES OCCUR MUCH MORE OFTEN, SO THE ENTROPY OF SIZE IS LOW AND HUFFMAN CODING IT BRINGS ADDITIONAL COMPRESSION.
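
The (SIZE, AMPLITUDE) representation can be sketched as below. SIZE is the number of bits in the magnitude, and negative values are written as the one's complement of the magnitude within that many bits. The function name `size_amplitude` is hypothetical, and the Huffman coding of SIZE itself is not shown.

```python
def size_amplitude(value):
    """Return (SIZE, AMPLITUDE bit string) for a DPCM-coded DC difference."""
    size = abs(value).bit_length()
    if size == 0:
        return 0, ""  # a zero difference needs no amplitude bits
    if value > 0:
        bits = format(value, "0{}b".format(size))
    else:
        # One's complement of the magnitude within `size` bits.
        bits = format(value + (1 << size) - 1, "0{}b".format(size))
    return size, bits
```

For instance, 2 maps to (2, "10") and -2 to (2, "01"), matching the codes listed above.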
HUFFMAN ENCODING

 INPUT: A SET OF NON-ZERO QUANTIZED DCT COEFFICIENTS FROM ALL THE


DIFFERENT BLOCKS OF THE IMAGE (VALUES LYING BETWEEN -1024 AND +1024).

 OUTPUT: A SET OF ENCODED COEFFICIENTS WITH LENGTH (IN TERMS OF NUMBER


OF BITS) LESS THAN THAT OF THE ORIGINAL SET.

 PRINCIPLES BEHIND HUFFMAN ENCODING:

(1)ENCODE THE MORE FREQUENTLY OCCURRING COEFFICIENTS WITH FEWER BITS.


ENCODE THE RARELY OCCURRING COEFFICIENTS WITH MORE BITS. THIS WILL REDUCE
THE AVERAGE BIT-LENGTH.

(2) ENSURE THAT THE ENCODING OF NO COEFFICIENT IS A STRICT PREFIX OF THE
ENCODING OF ANY OTHER COEFFICIENT (TO BE EXPLAINED ON NEXT SLIDE). THIS IS
NEEDED FOR UNAMBIGUOUS DECODING.
HUFFMAN ENCODING EXAMPLE

 CONSIDER AN ALPHABET OF SYMBOLS {A, E, Q}. LET THE FREQUENCY OF A SYMBOL
'X' BE DENOTED AS P(X).

 ASSUME P(E) > P(A) > P(Q) [ACTUALLY TRUE IN THE ENGLISH LANGUAGE].

 CONSIDER THE FOLLOWING CODE-WORD ASSIGNMENT: E = 0, A = 1, Q = 01 (NOTE: WE
ASSIGNED MORE BITS TO Q). NOW CONSIDER THE ENCODED STREAM: 001. IT CAN
BE INTERPRETED AS 'EEA' OR 'EQ'.

 THE REASON FOR THIS AMBIGUITY IS THAT THE CODE FOR 'E' IS A STRICT PREFIX OF
THE CODE FOR 'Q'.

 FOR UNAMBIGUOUS DECODING, WE NEED PREFIX-FREE CODES. E = 0, A = 10,
Q = 11 IS ONE EXAMPLE OF A PREFIX-FREE CODE.
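
The ambiguity can be checked mechanically. The sketch below enumerates every way a bit string can be split into code words, and tests whether a code is prefix-free. Both function names are hypothetical.

```python
def all_decodings(bits, code):
    """List every symbol sequence whose code words concatenate to `bits`."""
    if not bits:
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            results += [symbol + rest for rest in all_decodings(bits[len(word):], code)]
    return results


def is_prefix_free(code):
    """True if no code word is a strict prefix of another."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


ambiguous = {"E": "0", "A": "1", "Q": "01"}
prefix_free = {"E": "0", "A": "10", "Q": "11"}
# all_decodings("001", ambiguous) gives two readings: 'EEA' and 'EQ'
# all_decodings("01011", prefix_free) gives exactly one: 'EAQ'
```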
HUFFMAN ENCODING: ALGORITHM

1. SORT THE ALPHABETS IN INCREASING ORDER OF FREQUENCY. CREATE A LEAF NODE
FROM EACH ALPHABET. THESE LEAF NODES WILL BELONG TO A BINARY TREE CALLED
THE HUFFMAN TREE.

2. MERGE THE TWO LOWEST FREQUENCY NODES S1 AND S2 TO CREATE A PARENT
NODE S12. S1 AND S2 WILL BE THE LEFT AND RIGHT CHILD OF S12. THE FREQUENCY OF
S12 IS GIVEN BY P(S12) = P(S1) + P(S2).

3. LABEL THE EDGE FROM S12 TO S1 WITH A '0' AND THE EDGE FROM S12 TO S2 WITH A '1'.

4. REMOVE S1 AND S2 FROM THE SORTED LIST OF ALPHABETS AND INSERT THE NODE S12,
I.E. THE ROOT NODE OF THE TREE (S12, S1, S2), IN THE CORRECT PLACE DEPENDING ON THE
VALUE OF P(S12).
HUFFMAN ENCODING: ALGORITHM

5. REPEAT STEPS 2 TO 4 UNTIL THERE IS ONLY ONE NODE IN THE LIST. THIS WILL BE THE
ROOT NODE OF THE FINAL HUFFMAN TREE.

6. TRAVERSE THE TREE FROM THE ROOT NODE TO EACH LEAF AND COLLECT ALL
THE BINARY SYMBOLS ALONG EVERY EDGE INTO A STRING. THIS STRING WILL FORM
THE CODE WORD FOR THAT SYMBOL.
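
The steps above can be sketched with a priority queue in place of the explicitly sorted list, which is a common implementation choice rather than something the slides prescribe. Symbols are assumed to be strings, ties are broken by insertion order, and the name `huffman_codes` is hypothetical.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman code from a {symbol: frequency} mapping."""
    tiebreak = count()  # unique counter so tree nodes are never compared directly
    heap = [(p, next(tiebreak), sym) for sym, p in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)  # lowest frequency: left child, edge '0'
        p2, _, s2 = heapq.heappop(heap)  # next lowest: right child, edge '1'
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (s1, s2)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


codes_eaq = huffman_codes({"E": 12, "A": 8, "Q": 1})
# codes_eaq == {"Q": "00", "A": "01", "E": "1"}
```

The most frequent symbol E receives the one-bit code, and the resulting code is prefix-free by construction because code words correspond only to leaves.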
FOUR COMMONLY USED JPEG MODES

THE JPEG STANDARD DEFINED FOUR COMPRESSION MODES: HIERARCHICAL,


PROGRESSIVE, SEQUENTIAL AND LOSSLESS. FIGURE SHOWS THE RELATIONSHIP OF
MAJOR JPEG COMPRESSION MODES AND ENCODING PROCESSES.
SEQUENTIAL JPEG MODES

 IMAGE COMPONENTS ARE COMPRESSED EITHER INDIVIDUALLY OR IN GROUPS.

 EACH IMAGE COMPONENT IS ENCODED IN A SINGLE LEFT-TO-RIGHT, TOP-TO-


BOTTOM SCAN.
 IN ITS BASELINE FORM, IT SUPPORTS ONLY 8-BIT IMAGES (NOT 12-BIT IMAGES)

 COLOR COMPONENTS INTERLEAVING IS DONE TO SAVE BUFFER SIZE.

 WITHIN SEQUENTIAL MODE, TWO ALTERNATE ENTROPY ENCODING PROCESSES ARE


DEFINED BY THE JPEG STANDARD: ONE USES HUFFMAN ENCODING; THE OTHER
USES ARITHMETIC CODING.
PROGRESSIVE JPEG MODES

 A PROGRESSIVE JPEG IS AN IMAGE CREATED USING COMPRESSION ALGORITHMS


THAT LOAD THE IMAGE IN SUCCESSIVE WAVES UNTIL THE ENTIRE IMAGE IS
DOWNLOADED. THIS MAKES THE IMAGE APPEAR TO LOAD FASTER, AS IT LOADS THE
WHOLE IMAGE IN PROGRESSIVE WAVES. A NORMAL JPEG LOADS THE IMAGE FROM
THE TOP TO BOTTOM LINE BY LINE.

 PROGRESSIVE JPEG DELIVERS LOW QUALITY VERSIONS OF THE IMAGE QUICKLY,


FOLLOWED BY HIGHER QUALITY PASSES.
PRINCIPLE BEHIND THE PROGRESSIVE JPEG:

 JPEG FIRST CONVERTS RGB PIXELS TO YCBCR PIXELS. INSTEAD OF HAVING RED,
GREEN AND BLUE CHANNELS, JPEG USES A LUMA (Y) CHANNEL AND TWO CHROMA
CHANNELS (CB AND CR).

 THOSE CHANNELS ARE TREATED SEPARATELY, BECAUSE THE HUMAN EYE IS MORE
SENSITIVE TO DISTORTION IN LUMA (BRIGHTNESS) THAN IT IS TO DISTORTION IN
CHROMA (COLOR).

 THE CHROMA CHANNELS ARE OPTIONALLY DOWNSAMPLED TO HALF THE ORIGINAL


RESOLUTION; THIS IS CALLED CHROMA SUBSAMPLING.

 THEN, JPEG DOES A BIT OF MATHEMATICAL MAGIC WITH THE PIXELS. THIS MAGIC IS
CALLED THE DISCRETE COSINE TRANSFORM (DCT).
PRINCIPLE BEHIND THE PROGRESSIVE JPEG:

 EVERY BLOCK OF 8X8 PIXELS (64 PIXEL VALUES) IS CONVERTED TO 64 COEFFICIENTS


THAT REPRESENT THE BLOCK’S INFORMATION IN A DIFFERENT WAY.

 THE FIRST COEFFICIENT IS CALLED THE DC COEFFICIENT AND IT BOILS DOWN TO


THE AVERAGE PIXEL VALUE OF ALL THE PIXELS IN THE BLOCK.

 THE OTHER 63 COEFFICIENTS (THE SO-CALLED AC COEFFICIENTS) REPRESENT


HORIZONTAL AND VERTICAL DETAILS WITHIN THE BLOCK; THEY ARE ORDERED
FROM LOW FREQUENCY (OVERALL GRADIENTS) TO HIGH FREQUENCY (SHARP
DETAILS).

 THE GOAL OF THESE TRANSFORMATIONS IS TO DO LOSSY IMAGE COMPRESSION.
FOR OUR PERCEPTION, LUMA AND LOW-FREQUENCY SIGNALS ARE MORE
IMPORTANT THAN CHROMA AND HIGH-FREQUENCY SIGNALS.
THE PROGRESSIVE JPEG:

TWO WAYS OF DOING THIS:

1. SPECTRAL SELECTION: COEFFICIENTS ARE GROUPED INTO SPECTRAL BANDS, AND LOWER-
FREQUENCY BANDS ARE SENT FIRST.

2. SUCCESSIVE APPROXIMATION: DATA IS FIRST SENT WITH LOWER PRECISION AND


THEN REFINED.
SPECTRAL SELECTION

 TAKES ADVANTAGE OF THE “SPECTRAL” (SPATIAL FREQUENCY SPECTRUM)


CHARACTERISTICS OF THE DCT COEFFICIENTS: HIGHER AC COMPONENTS PROVIDE
DETAIL INFORMATION.

 SCAN 1: ENCODE DC AND FIRST FEW AC COMPONENTS,

E.G., AC1, AC2.

 SCAN 2: ENCODE A FEW MORE AC COMPONENTS,

E.G., AC3, AC4, AC5. . . .

 SCAN K: ENCODE THE LAST FEW ACS,

E.G., AC61, AC62, AC63.
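
A minimal sketch of spectral selection from the decoder's point of view: the zigzag-ordered 64-vector of a block arrives band by band, and coefficients not yet received are treated as zero. The band boundaries and the name `spectral_scans` are illustrative assumptions, not values taken from the standard.

```python
def spectral_scans(zigzag_vector, bands):
    """Return the decoder's view of one block after each spectral band arrives."""
    received = [0] * len(zigzag_vector)
    views = []
    for first, last in bands:  # inclusive index ranges in zigzag order
        received[first:last + 1] = zigzag_vector[first:last + 1]
        views.append(list(received))
    return views


coeffs = list(range(1, 65))  # stand-in values for DC, AC1, ..., AC63
views = spectral_scans(coeffs, [(0, 2), (3, 5), (6, 63)])
# views[0] holds only DC, AC1, AC2; views[-1] equals the full vector
```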


SUCCESSIVE APPROXIMATION:

 INSTEAD OF GRADUALLY ENCODING SPECTRAL BANDS, ALL DCT COEFFICIENTS ARE


ENCODED SIMULTANEOUSLY BUT WITH THEIR MOST SIGNIFICANT BITS (MSBS)
FIRST.

 SCAN 1: ENCODE THE FIRST FEW MSBS, E.G., BITS 7, 6, 5, 4.

 SCAN 2: ENCODE A FEW MORE LESS SIGNIFICANT BITS, E.G.,

BIT 3.

 SCAN M: ENCODE THE LEAST SIGNIFICANT BIT (LSB),

BIT 0.
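
Successive approximation can be illustrated on a single nonnegative 8-bit magnitude: the first scan carries bits 7 to 4, and each later scan adds one more bit. This sketch ignores sign handling and entropy coding, and the name `successive_approximations` is hypothetical.

```python
def successive_approximations(value, first_shift=4):
    """Return the decoder's estimate of `value` after each scan.

    Scan 1 keeps the bits above `first_shift`; each later scan refines by one bit.
    Assumes a nonnegative integer magnitude.
    """
    return [(value >> shift) << shift for shift in range(first_shift, -1, -1)]


steps = successive_approximations(0b10110110)  # 182
# steps == [176, 176, 180, 182, 182]
```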
HIERARCHICAL JPEG MODE

 HIERARCHICAL ENCODING ENCODES AN IMAGE AT MULTIPLE RESOLUTIONS.
FOR EXAMPLE, ONE COULD PROVIDE 320X240, 640X480 AND 1280X960 VERSIONS OF AN
IMAGE; THE DECODER AT THE RECEIVING END CAN CHOOSE THE OPTIMUM
RESOLUTION DEPENDING ON THE TARGET'S CAPABILITIES.

 THUS, HIGH-RESOLUTION IMAGES CAN BE EASILY VIEWED IN LOWER RESOLUTION


DEVICES. THIS IS PARTICULARLY RELEVANT TO SMALL PORTABLE TERMINALS AND
FOR CONFERENCING WHERE MULTIPLE SMALLER IMAGES NEED TO SHARE THE
SCREEN WITH FULL SIZE IMAGES AT DIFFERENT TIMES.
HIERARCHICAL JPEG MODE

 THE ENCODED IMAGE AT THE LOWEST RESOLUTION IS BASICALLY A COMPRESSED


LOW-PASS FILTERED IMAGE, WHEREAS THE IMAGES AT SUCCESSIVELY HIGHER
RESOLUTIONS PROVIDE ADDITIONAL DETAILS (DIFFERENCES FROM THE LOWER
RESOLUTION IMAGES).

 SIMILAR TO PROGRESSIVE JPEG, THE HIERARCHICAL JPEG IMAGES CAN BE


TRANSMITTED IN MULTIPLE PASSES PROGRESSIVELY IMPROVING QUALITY.
THREE LEVEL HIERARCHICAL JPEG
ENCODER FOR A THREE-LEVEL HIERARCHICAL JPEG

 REDUCTION OF IMAGE RESOLUTION:

REDUCE RESOLUTION OF THE INPUT IMAGE f (E.G., 512×512) BY A FACTOR OF 2 IN
EACH DIMENSION TO OBTAIN f2 (E.G., 256×256). REPEAT THIS TO OBTAIN f4 (E.G.,
128×128).

 COMPRESS LOW-RESOLUTION IMAGE f4:

ENCODE f4 USING ANY OTHER JPEG METHOD (E.G., SEQUENTIAL, PROGRESSIVE)
TO OBTAIN THE ENCODED DATA F4 (LOWERCASE f DENOTES AN IMAGE, UPPERCASE F ITS ENCODED FORM).
ENCODER FOR A THREE-LEVEL HIERARCHICAL JPEG

 COMPRESS DIFFERENCE IMAGE d2:

(A) DECODE F4 TO OBTAIN f4'. USE ANY INTERPOLATION METHOD TO EXPAND f4' TO
BE OF THE SAME RESOLUTION AS f2 AND CALL IT E(f4').

(B) ENCODE DIFFERENCE d2 = f2 − E(f4') USING ANY OTHER JPEG METHOD (E.G.,
SEQUENTIAL, PROGRESSIVE) TO GENERATE THE ENCODED DATA D2.

 COMPRESS DIFFERENCE IMAGE d1:

(A) DECODE D2 TO OBTAIN d2'; ADD IT TO E(f4') TO GET f2' = E(f4') + d2', WHICH IS A
VERSION OF f2 AFTER COMPRESSION AND DECOMPRESSION.

(B) ENCODE DIFFERENCE d1 = f − E(f2') USING ANY OTHER JPEG METHOD (E.G.,
SEQUENTIAL, PROGRESSIVE) TO GENERATE THE ENCODED DATA D1.
DECODER FOR A THREE-LEVEL HIERARCHICAL JPEG

1. DECOMPRESS THE ENCODED LOW-RESOLUTION IMAGE F4:

– DECODE F4 USING THE SAME JPEG METHOD AS IN THE ENCODER TO OBTAIN f4'.

2. RESTORE IMAGE f2' AT THE INTERMEDIATE RESOLUTION:

– USE E( f4')+ d2' TO OBTAIN f2'.

3. RESTORE IMAGE f ' AT THE ORIGINAL RESOLUTION:

– USE E( f2')+ d1' TO OBTAIN f '.
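
The three-level scheme can be sketched end to end as below. To keep the example self-contained, `lossy_codec` is a hypothetical stand-in for a JPEG encode followed by a decode (it simply rounds values to multiples of a step), `downsample` uses 2x2 averaging, and `expand` plays the role of E(·) with nearest-neighbour interpolation. None of these choices come from the slides, and image sides are assumed to be multiples of 4.

```python
def downsample(img):
    """Halve the resolution by averaging 2x2 groups."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, len(img[0]), 2)] for r in range(0, len(img), 2)]


def expand(img):
    """E(.): double the resolution by repeating every sample 2x2."""
    return [[v for v in row for _ in (0, 1)] for row in img for _ in (0, 1)]


def lossy_codec(img, step=8):
    """Stand-in for JPEG encode + decode: round to multiples of `step`."""
    return [[step * round(v / step) for v in row] for row in img]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def hierarchical_encode(f, step=8):
    f2 = downsample(f)
    f4 = downsample(f2)
    f4_dec = lossy_codec(f4, step)                         # f4'
    d2_dec = lossy_codec(sub(f2, expand(f4_dec)), step)    # d2'
    f2_dec = add(expand(f4_dec), d2_dec)                   # f2' = E(f4') + d2'
    d1_dec = lossy_codec(sub(f, expand(f2_dec)), step)     # d1'
    return f4_dec, d2_dec, d1_dec


def hierarchical_decode(f4_dec, d2_dec, d1_dec):
    f2_dec = add(expand(f4_dec), d2_dec)
    return add(expand(f2_dec), d1_dec)                     # f' = E(f2') + d1'


image = [[r * 8 + c for c in range(8)] for r in range(8)]
restored = hierarchical_decode(*hierarchical_encode(image))
```

Because each difference image is computed against the decoded lower level, errors do not accumulate: in this sketch the final error is bounded by half the step of the last stage.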


JPEG BIT STREAM
