
Wk8 MPEG Part1

MPEG-2 video compression is essential for efficiently transmitting video data in computer vision systems, utilizing techniques like interframe coding and motion prediction to achieve high compression ratios. The standard defines various profiles and levels to accommodate different applications, from standard to high-definition television, while maintaining flexibility in decoder implementations. Key concepts include exploiting temporal coherence between frames and employing motion estimation to reduce bitrate, albeit at the cost of increased computational complexity.


MPEG-2 Video Compression

(c) Patrick Denny 2024


MPEG Video Compression
• MPEG has revolutionized the movement of video in computer vision systems
• The following notes are largely based on the excellent notes on MPEG Video Compression by Dr. Kirill Sidorov and
Prof. David Marshall from the School of Computer Science and Informatics, Cardiff University, UK
• So, to revisit our original problem – moving image data around is hard, so in practice we need to compress images
and video in order to make computer vision systems possible
• Uncompressed video (and audio) data are huge
• In HDTV systems, the bit rate easily exceeds 1 Gbps, which causes big problems for storage and network
communications
• Simple example: for HDTV, even if the pixel data is only 8 bits per colour channel (24 bits per pixel), 1920 x 1080 images at 30 frames
per second -> ~1.5 Gbps (see the sketch below)
• Lossy methods have to be employed as the compression ratio of lossless methods (e.g., Huffman,
Arithmetic, LZW) is not high enough for image and video compression
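
To sanity-check that figure, here is a minimal Python sketch of the arithmetic (the resolution, bit depth, and frame rate are the values quoted above):

```python
# Raw bitrate of uncompressed 1080p30 video at 8 bits per colour channel.
width, height = 1920, 1080   # HDTV frame size in pixels
bits_per_pixel = 3 * 8       # three colour channels at 8 bits each
fps = 30                     # frames per second

bitrate = width * height * bits_per_pixel * fps
print(f"{bitrate / 1e9:.2f} Gbps")  # -> 1.49 Gbps, roughly the 1.5 Gbps above
```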

(c) Patrick Denny 2024 31


From still images to video compression
• The discrete cosine transform (DCT) based compression at the heart of M-JPEG is suitable for video production
• However, dramatically higher compression ratios are needed for moving video around and can be obtained using
methods such as interframe coding and motion prediction.
• A good place to start is to look at MPEG-2.
• MPEG-2 video compression exploits temporal coherence – the statistical likelihood that successive pictures in a
video sequence are very similar.
• MPEG-2’s intended application ranges from below standard definition television (SDTV) to beyond high-definition
television (HDTV).
• The intended bit rate ranges from about 1.5 Mbit/s to 20 Mbit/s and beyond
• However, although the motivation was television, the same techniques underpin how data is moved around within
modern computer vision systems
• MPEG-2 specifies exactly what constitutes a legal audiovisual bitstream
• A legal (“conformant”) encoder generates only legal bitstreams
• A legal decoder correctly decodes any legal bitstream
• MPEG-2 does not standardize how an encoder accomplishes compression!

(c) Patrick Denny 2024 32


From still images to video compression
• The MPEG-2 standard implicitly defines how a decoder reconstructs picture data from a coded bitstream without
dictating the implementation of the decoder.
• MPEG-2 explicitly avoids specifying what it calls the “display process” – how reconstructed pictures are displayed.
• Most MPEG-2 decoder implementations have flexible output formats
• However, MPEG-2 decoder equipment is ordinarily designed to output a specific raster standard.

(c) Patrick Denny 2024 33


MPEG-2 profiles and levels
• An MPEG-2 bitstream can invoke many algorithmic features at a decoder and reflect many possible parameter
values.
• The MPEG-2 standard classifies bitstreams and decoders in a matrix of profiles and levels
• A profile constrains the algorithm features potentially used by an encoder, present in a bitstream, or implemented in
a decoder.
• The higher the profile, the more complexity is required of the decoder

(c) Patrick Denny 2024 34


MPEG-2 profiles and levels
• MPEG-2 defines seven profiles
• Simple (SP)
• Main (MP)
• 4:2:2 (422P)
• SNR
• Spatial (Spt)
• High (HP)
• Multiview (MVP)
• The higher the level, the more memory
or data throughput is required of a
decoder
• MPEG-2 defines 4 levels
• Low (LL)
• Main (ML)
• High-1440 (H14)
• High (HL)

(c) Patrick Denny 2024 35


Video compression - MPEG
• MPEG has to use lots of tricks if it is to compress video well
• We will look at some basic principles of video compression
• Earlier H.261 and MPEG 1 and 2 standards
• We will then touch on some of the ideas used in newer standards such as H.264 (MPEG-4 Advanced Video Coding)
• Image, video and audio compression standards have been specified and released by two main groups since 1985:
• ISO – International Organization for Standardization – JPEG, MPEG
• ITU – International Telecommunication Union – H.261–H.264

(c) Patrick Denny 2024 36


Compression Standards
• While in many cases the two groups have specified separate standards, there is some crossover between them, e.g.,
• JPEG issued by ISO in 1989 (but adopted by ITU as ITU T.81)
• MPEG 1 released by ISO in 1991
• H.261 released by ITU in 1993 (based on CCITT 1990 draft)
• CCITT stands for Comité Consultatif International Téléphonique et Télégraphique whose parent
organisation is ITU
• H.262 (better known as MPEG 2) released in 1994
• H.263 released in 1996 extended as H.263+, H.263++
• MPEG 4 released in 1998
• H.264, released in 2003, lowered bit rates at comparable video quality and supports a wide range of bit rates;
it is now part of MPEG 4 (Part 10, or AVC – Advanced Video Coding)

(c) Patrick Denny 2024 37


How to compress video?
• Basic idea of video compression: exploit the fact that adjacent frames are similar.
• Spatial redundancy removal — intraframe coding (JPEG)
• NOT ENOUGH BY ITSELF!
• Temporal redundancy removal — greater compression by using the temporal coherence over time.
• Essentially, we consider the difference between frames.
• Spatial and temporal redundancy removal — intraframe and interframe coding (H.261, MPEG).
• Things are much more complex in practice of course.

(c) Patrick Denny 2024 38


How to compress video?
• “It has been customary in the past to transmit successive complete images of the transmitted picture.”…
• “In accordance with this invention, this difficulty is avoided by transmitting only the difference between successive images of the object.”

(c) Patrick Denny 2024 39


Simple motion example
• Consider a simple binary image of a moving circle
• Let’s just consider the difference between 2 frames
• It is simple to encode/decode (see the sketch below)
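
A minimal numpy sketch of this idea (the array shapes and the moving square standing in for the circle are illustrative assumptions): the encoder transmits only the difference, and the decoder adds it back to its copy of the previous frame.

```python
import numpy as np

def encode_difference(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """Encoder side: transmit only the frame-to-frame difference."""
    return curr.astype(np.int16) - prev.astype(np.int16)

def decode_difference(prev: np.ndarray, diff: np.ndarray) -> np.ndarray:
    """Decoder side: reconstruct the current frame from the previous one."""
    return (prev.astype(np.int16) + diff).astype(np.uint8)

# Two mostly-identical frames: a small object moves two pixels to the right,
# so the difference is sparse and codes far more cheaply than a full frame.
f1 = np.zeros((64, 64), dtype=np.uint8)
f2 = np.zeros((64, 64), dtype=np.uint8)
f1[20:30, 20:30] = 255
f2[20:30, 22:32] = 255

diff = encode_difference(f1, f2)
assert np.array_equal(decode_difference(f1, diff), f2)
print("non-zero difference pixels:", np.count_nonzero(diff))
```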

(c) Patrick Denny 2024 40


Estimating motion
• We will examine
methods of estimating
motion vectors shortly

(c) Patrick Denny 2024 41


Decoding motion
• Why is this a better
method than just frame
differencing?

(c) Patrick Denny 2024 42


Motion estimation example

(c) Patrick Denny 2024 43


Note on colour subsampling
• When sampling image data that is separated into luma and chroma values, e.g., in a colour space like YCbCr, we can perform a compression of sorts by down-sampling the chroma.
• The subsampling scheme is commonly expressed as a three-part ratio J:a:b (e.g., 4:2:2) describing the number of luminance and chrominance samples in a conceptual region that is J pixels wide and 2 pixels high
• J : horizontal sampling reference (width of the conceptual region, usually 4)
• a : number of chrominance samples (Cr, Cb) in the first row of J pixels
• b : number of additional chrominance samples (Cr, Cb) in the second row of J pixels; b is usually either zero or equal to a
• So, e.g., 4:2:0 requires half the bandwidth of 4:4:4. To calculate bandwidth relative to 4:4:4, just sum the factors and divide by 12 (see the sketch below).
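
That rule of thumb is easy to sanity-check in code; a tiny sketch using the J:a:b factors described above:

```python
def chroma_bandwidth_factor(j: int, a: int, b: int) -> float:
    """Bandwidth relative to 4:4:4: sum the three factors and divide by 12."""
    return (j + a + b) / 12

print(chroma_bandwidth_factor(4, 4, 4))  # 1.0    -> no subsampling
print(chroma_bandwidth_factor(4, 2, 2))  # 0.667  -> two thirds of 4:4:4
print(chroma_bandwidth_factor(4, 2, 0))  # 0.5    -> half the bandwidth of 4:4:4
```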

(c) Patrick Denny 2024 44


How is motion compensation used?
• Block Matching
• MPEG relies on block-matching techniques
• At the core of MPEG is the DCT coding of 8x8 blocks of sample values (as in JPEG) or 8x8 blocks of prediction
errors.
• To simplify the implementation of subsampled chroma, the same DCT and block coding is used for both luma
and chroma
• When combined with 4:2:0 chroma subsampling, an 8x8 block of CB or CR is associated with a 16x16 block of
luma.
• This leads to the tiling of a field or frame into units of 16x16 luma samples.
• Each such unit is a macroblock (MB).
• Macroblocks lie on a 16x16 grid aligned with the upper-left luma sample of the image (see the sketch below)
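
As a small illustration of that tiling (a sketch, not text from the standard), the macroblock grid is found by rounding the frame dimensions up to multiples of 16:

```python
import math

def macroblock_grid(width: int, height: int) -> tuple[int, int]:
    """Number of 16x16 macroblocks across and down; dimensions that are not
    multiples of 16 are rounded up, with the encoder padding the edges."""
    return math.ceil(width / 16), math.ceil(height / 16)

mbs_x, mbs_y = macroblock_grid(1920, 1080)
print(mbs_x, mbs_y, mbs_x * mbs_y)  # 120 68 8160 (1080 rows pad up to 1088)
```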

(c) Patrick Denny 2024 45


How is motion compensation used?
• For a certain area (block) of pixels in a picture:
• Find a good estimate of this area in a previous (or in a future!) frame, within a specified search area
• Motion compensation
• Uses the motion vectors to compensate the picture
• Parts of a previous (or future) picture can be reused in a subsequent picture
• Individual parts spatially compressed – JPEG type compression

(c) Patrick Denny 2024 46


Any overheads?
• Motion estimation/compensation techniques reduce the video bitrate significantly
• HOWEVER, this introduces extra computational complexity
• Decoder needs to buffer reference pictures
• Then it needs to access what it buffered with backward and forward referencing
• These complex memory accesses and computation cause delay -> latency
• So, let’s see how these ideas are used in practice

(c) Patrick Denny 2024 47


Overview of H.261
• Developed by CCITT in 1988-1990 for video telecommunication applications.
• Meant for videoconferencing, videotelephone applications over ISDN telephone lines.
• Baseline ISDN is 64 kbits/sec, and integral multiples (p×64)
• Picture formats are CCIR 601 CIF (Common Intermediate Format, 352×288) and QCIF (176×144) images with 4:2:0
subsampling.
• Two frame types: Intraframes (I-frames) and Interframes (P-frames).
• I-frames use basically JPEG — but YUV (YCrCb) and larger DCT windows, different quantisation.
• I-frames provide us with a refresh access point — key frames.
• P-frames use pseudo-differences from previous frame (predicted), so frames depend on each other.

(c) Patrick Denny 2024 48


H.261 group of pictures
• We typically have a group of pictures
• one I-frame followed by several P-frames
• a group of pictures (GOP)
• The number of P-frames following each I-frame determines the size of the GOP
• This can be fixed or dynamic
• Can you think of any reasons why this should not be set too large?

(c) Patrick Denny 2024 49


Intra-frame coding
• Intra-frame coding is very similar to JPEG

(c) Patrick Denny 2024 50


Intra-frame coding
• A basic intra-frame coding scheme is as follows
• Macroblocks are typically 16x16 pixel areas on the Y plane of the original image
• A macroblock usually consists of 4 Y blocks, 1 Cr block and 1 Cb block (4:2:0 chroma subsampling)
• The eye is most sensitive to luminance, less sensitive to chrominance
• We operate in a more effective colour space – YUV (YCbCr) colour space
• Typical to use 4:2:0 macroblocks: one quarter of the chrominance information used
• Quantization is by a constant value for all DCT coefficients
• i.e., no quantization table as in JPEG (see the sketch below)
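
A minimal numpy sketch of this intra transform step, assuming an 8x8 block of luma samples (the constant q = 16 is an arbitrary illustrative choice): a 2-D DCT followed by division by a single constant, rather than the per-coefficient table JPEG uses.

```python
import numpy as np

# Orthonormal 8x8 DCT-II basis: C[k, n] = alpha(k) * cos((2n + 1) k pi / 16)
N = 8
C = np.array([[np.sqrt((1 if k == 0 else 2) / N) *
               np.cos((2 * n + 1) * k * np.pi / (2 * N))
               for n in range(N)] for k in range(N)])

def intra_code_block(block: np.ndarray, q: int = 16) -> np.ndarray:
    """Forward 2-D DCT of an 8x8 block, then uniform quantization by q
    (one constant for every coefficient -- no JPEG-style table)."""
    coeffs = C @ block.astype(np.float64) @ C.T
    return np.round(coeffs / q).astype(np.int32)

block = np.random.randint(0, 256, (8, 8))
print(intra_code_block(block))
```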

(c) Patrick Denny 2024 51


Inter-frame (P-frame) coding
• Basic idea
• Most consecutive frames within a sequence are very similar to the frames both before and after the frame of
interest
• Aim to exploit this redundancy
• Need to use motion estimation
• Use a technique known as block-based motion compensated prediction

(c) Patrick Denny 2024 52


Inter-frame (P-frame) coding
• P-coding can be summarized as
follows

(c) Patrick Denny 2024 53


Inter-frame (P-frame) coding

(c) Patrick Denny 2024 54


Inter-frame (P-frame) coding

(c) Patrick Denny 2024 55


Motion vector search
• So, we know how to
encode a P-block
• How do we find the
motion vector?

(c) Patrick Denny 2024 56


Example
• The problem for motion estimation to solve is: how to adequately represent the changes, or differences, between these two video frames?

(c) Patrick Denny 2024 57


Motion estimation
• A comprehensive 2-dimensional spatial search is performed for each luminance macroblock
• MPEG does not define how this search should be performed
• A detail that the system designer can choose to implement in one of many possible ways
• It is well known that a full, exhaustive search over a wide 2-D area yields the best matching results in most cases,
but at extreme computational cost to the encoder
• Decisions related to this are key examples of the system tradeoffs that occur in computer vision systems
• Often, technology providers will make choices on these to follow design considerations in a particular
market.
• Motion estimation is usually the most computationally expensive portion of video encoding (see the search sketch below).
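
As an illustration of why, here is a minimal exhaustive-search sketch, assuming numpy arrays and a sum-of-absolute-differences (SAD) criterion (a common matching measure; the standard mandates none of this):

```python
import numpy as np

def full_search(ref: np.ndarray, cur: np.ndarray, y: int, x: int,
                block: int = 16, radius: int = 8):
    """Exhaustive block-matching: find the (dy, dx) displacement within
    +/- radius that minimises the SAD between the macroblock at (y, x) in
    the current frame and a candidate block in the reference frame."""
    target = cur[y:y + block, x:x + block].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + block > ref.shape[0] or rx + block > ref.shape[1]:
                continue  # candidate would fall outside the reference frame
            cand = ref[ry:ry + block, rx:rx + block].astype(np.int32)
            sad = int(np.abs(target - cand).sum())
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```

The cost is visible in the loop structure: (2 × radius + 1)² candidate positions per macroblock, which is why practical encoders usually substitute faster, suboptimal search patterns.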

(c) Patrick Denny 2024 58


Motion estimation example

(c) Patrick Denny 2024 59


Motion vectors, matching blocks
• The previous figure shows an example of a particular macroblock from Frame 2 of the earlier example, relative to various
macroblocks of Frame 1:
• The top case has a bad match with the macroblock to be coded
• The middle case has a fair match, as there is some commonality between the 2 macroblocks
• The bottom case has the best match, with only a slight error between the 2 macroblocks
• Because a relatively good match has been found, the encoder assigns motion vectors to that macroblock

(c) Patrick Denny 2024 60


Final motion estimation

(c) Patrick Denny 2024 61


Motion estimation
• The predicted frame is subtracted
from the desired frame
• This leaves a (hopefully) less
complicated residual error frame
which can then be encoded much
more efficiently than before motion
estimation

(c) Patrick Denny 2024 62


Example

Predicted frame with motion vector overlaid

(c) Patrick Denny 2024 63


Example

Absolute difference without and with motion compensation

(c) Patrick Denny 2024 64


Example

(c) Patrick Denny 2024 65
