Digital Image Definitions & Transformations
A digital image described in a 2D discrete space is derived from an analog image in a 2D continuous space through a sampling process that is frequently referred to as digitization. The mathematics of that sampling process will be described in subsequent chapters. For now we will look at some basic definitions associated with the digital image. The effect of digitization is shown in Figure 1. The 2D continuous image a(x, y) is divided into N rows and M columns. The intersection of a row and a column is termed a pixel. The value assigned to the integer coordinates [m, n], with m = 0, 1, ..., M−1 and n = 0, 1, ..., N−1, is a[m, n]. In fact, in most cases a(x, y), which we might consider to be the physical signal that impinges on the face of a 2D sensor, is actually a function of many variables including depth (z), color (λ), and time (t). Unless otherwise stated, we will consider the case of 2D, monochromatic, static images in this module.
The image shown in Figure (1.1) has been divided into N rows and M columns. The value assigned to every pixel is the average brightness in the pixel, rounded to the nearest integer value. The process of representing the amplitude of the 2D signal at a given coordinate as an integer value with L different gray levels is usually referred to as amplitude quantization, or simply quantization.
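As an illustrative sketch (not from the original text), the quantization step can be simulated with NumPy; the image array and the number of gray levels L chosen here are hypothetical:

```python
import numpy as np

def quantize(a, L):
    """Quantize a continuous-amplitude image `a` (values in [0, 1])
    to L equally spaced gray levels 0, 1, ..., L-1."""
    a = np.clip(a, 0.0, 1.0)
    # Scale to [0, L-1] and round to the nearest integer level.
    return np.round(a * (L - 1)).astype(int)

# A small synthetic "continuous" image: a horizontal brightness ramp.
x = np.linspace(0.0, 1.0, 8)
img = np.tile(x, (4, 1))          # 4 rows, 8 columns

q = quantize(img, L=4)            # 4 gray levels -> values in {0, 1, 2, 3}
print(q[0])                       # [0 0 1 1 2 2 3 3]
```

Each row of the ramp maps to the same four quantized levels, which is exactly the averaging-and-rounding idea described above.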
Common values

There are standard values for the various parameters encountered in digital image processing. These values can be caused by video standards, by algorithmic requirements, or by the desire to keep digital circuitry simple. Table 1 gives some common values.

Parameter    | Symbol | Typical values
Rows         | N      |
Columns      | M      |
Gray levels  | L      | 2, 64, 256, 1024, 4096, 16384
Table 1: Common values of digital image parameters

Quite frequently we see cases of M = N = 2^K, where K is an integer. This can be motivated by digital circuitry or by the use of certain algorithms such as the (fast) Fourier transform. The number of distinct gray levels is usually a power of 2, that is, L = 2^B, where B is the number of bits in the binary representation of the brightness levels. When B > 1 we speak of a gray-level image; when B = 1 we speak of a binary image. In a binary image there are just two gray levels, which can be referred to, for example, as "black" and "white" or "0" and "1".

Suppose that a continuous image f(x, y) is approximated by equally spaced samples arranged in the form of an N × N array as:
f(x, y) ≈ [ f(0, 0)      f(0, 1)      ...  f(0, N−1)
            f(1, 0)      f(1, 1)      ...  f(1, N−1)
            ...          ...               ...
            f(N−1, 0)    f(N−1, 1)    ...  f(N−1, N−1) ]        (1)
Each element of the array, referred to as a "pixel", is a discrete quantity. The array represents a digital image. The above digitization requires a decision to be made on a value for N, as well as on the number of discrete gray levels allowed for each pixel. It is common practice in digital image processing to let N = 2^n and G = number of gray levels = 2^m. It is assumed that the discrete levels are equally spaced between 0 and L in the gray scale. Therefore the number of bits required to store a digitized image of size N × N is b = N × N × m; in other words, N²m/8 bytes.
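A quick arithmetic sketch of this storage requirement (the image size chosen here is just an example):

```python
def storage_bits(N, m):
    """Bits needed for an N x N image with 2**m gray levels: b = N * N * m."""
    return N * N * m

# Example: a 512 x 512 image with 256 = 2**8 gray levels.
bits = storage_bits(512, 8)
print(bits, "bits =", bits // 8, "bytes")   # 2097152 bits = 262144 bytes
```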
The representation given by equation (1) is an approximation to a continuous image. A reasonable question to ask at this point is: how many samples and gray levels are required for a good approximation? This brings up the question of resolution. The resolution (i.e., the degree of discernible detail) of an image is strongly dependent on both N and m. The more these parameters are increased, the closer the digitized array will approximate the original image. Unfortunately, this leads to large storage requirements, and consequently processing requirements increase rapidly as functions of N and m.
As an example, suppose we construct a chart with vertical lines of width W, and with the space between the lines also having width W. A line pair consists of one such line and its adjacent space. Thus the width of a line pair is 2W, and there are 1/(2W) line pairs per unit distance. A widely used definition of resolution is simply the smallest number of discernible line pairs per unit distance; for example, 100 line pairs/mm.

Gray-level resolution: This refers to the smallest discernible change in gray level. The measurement of discernible changes in gray level is a highly subjective process. We have considerable discretion regarding the number of samples used to generate a digital image, but this is not true for the number of gray levels. Due to hardware constraints, the number of gray levels is usually an integer power of two. The most common value is 8 bits, though it can vary depending on the application. When an actual measure of physical resolution relating pixels to the level of detail they resolve in the original scene is not necessary, it is not uncommon to refer to an L-level digital image of size M × N as having a spatial resolution of M × N pixels and a gray-level resolution of L levels.
*Point - the output value at a specific coordinate is dependent only on the input value at that same coordinate.
*Local - the output value at a specific coordinate is dependent on the input values in the neighborhood of that same coordinate.
*Global - the output value at a specific coordinate is dependent on all the values in the input image.

Table 2: Types of image operations. Image size = N × N; neighborhood size = P × P. Note that the complexity is specified in operations per pixel. This is shown graphically in Figure (1.2).
Figure (1.2): Illustration of various types of image operations

Types of neighborhoods

Neighborhood operations play a key role in modern digital image processing. It is therefore important to understand how images can be sampled and how that relates to the various neighborhoods that can be used to process an image.

Rectangular sampling - In most cases, images are sampled by laying a rectangular grid over an image, as illustrated in Figure (1.1). This results in the type of sampling shown in Figures (1.3a) and (1.3b).

Hexagonal sampling - An alternative sampling scheme is shown in Figure (1.3c) and is termed hexagonal sampling. Both sampling schemes have been studied extensively, and both represent a possible periodic tiling of the continuous image space. However, rectangular sampling, due to hardware and software considerations, remains the method of choice.

Local operations produce an output pixel value based upon the pixel values in the neighborhood. Some of the most common neighborhoods are the 4-connected neighborhood and the 8-connected neighborhood in the case of rectangular sampling, and the 6-connected neighborhood in the case of hexagonal sampling, illustrated in Figure (1.3).
Fig (1.3a): rectangular sampling, 4-connected. Fig (1.3b): rectangular sampling, 8-connected. Fig (1.3c): hexagonal sampling, 6-connected.
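As a sketch (our own illustration, not from the text), the rectangular-grid neighborhoods can be written as coordinate offsets, and a local operation such as a neighborhood mean can then be applied over either of them:

```python
import numpy as np

# Offsets defining the common neighborhoods on a rectangular grid.
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]                  # 4-connected
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]           # 8-connected

def local_mean(img, offsets):
    """Local operation: replace each interior pixel by the mean of its
    neighbors (border pixels are left unchanged for simplicity)."""
    out = img.astype(float).copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            vals = [img[i + di, j + dj] for di, dj in offsets]
            out[i, j] = sum(vals) / len(vals)
    return out

img = np.zeros((5, 5))
img[2, 2] = 8.0                       # a single bright pixel
print(local_mean(img, N4)[2, 1])      # 2.0: the spike spreads to its 4-neighbors
```

The output at each pixel depends only on the input values in its neighborhood, which is exactly the definition of a local operation in Table 2.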
Video Parameters
We do not propose to describe the processing of dynamically changing images in this introduction. It is appropriate, however, given that many static images are derived from video cameras and frame grabbers, to mention the standards that are associated with the three standard video schemes currently in worldwide use: NTSC, PAL, and SECAM. This information is summarized in Table 3.
Property                       | NTSC  | PAL   | SECAM
Images / second                | 29.97 | 25    | 25
ms / image                     | 33.37 | 40.0  | 40.0
Lines / image                  | 525   | 625   | 625
(horiz./vert.) = aspect ratio  | 4:3   | 4:3   | 4:3
Interlace                      | 2:1   | 2:1   | 2:1
µs / line                      | 63.56 | 64.00 | 64.00
Table 3: Standard video parameters

In an interlaced image the odd-numbered lines (1, 3, 5, ...) are scanned in half of the allotted time (e.g., 20 ms in PAL) and the even-numbered lines (2, 4, 6, ...) are scanned in the remaining half. The image display must be coordinated with this scanning format. The reason for interlacing the scan lines of a video image is to reduce the perception of flicker in a displayed image. If one is planning to use images that have been scanned from an interlaced video source, it is important to know if the two half-images have been appropriately "shuffled" by the digitization hardware or if that should be implemented in software. Further, the analysis of moving objects requires special care with interlaced video to avoid "zigzag" edges.

Tools

Certain tools are central to the processing of digital images. These include mathematical tools such as convolution, Fourier analysis, and statistical descriptions, and manipulative tools such as chain codes and run codes. We will present these tools without any specific motivation; the motivation will follow in later sections.

* Convolution
* Properties of Convolution
* Fourier Transforms
* Properties of Fourier Transforms
* Statistics
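Returning to the interlaced scanning described above, the software "shuffling" of two half-images into a full frame can be sketched as follows (the field layout assumed here is our own illustration):

```python
import numpy as np

def weave_fields(odd_field, even_field):
    """Interleave two half-images into a full frame.
    `odd_field` holds scan lines 1, 3, 5, ... and `even_field`
    holds lines 2, 4, 6, ... (1-based line numbering)."""
    h, w = odd_field.shape
    frame = np.empty((2 * h, w), dtype=odd_field.dtype)
    frame[0::2] = odd_field    # lines 1, 3, 5, ... -> rows 0, 2, 4, ...
    frame[1::2] = even_field   # lines 2, 4, 6, ... -> rows 1, 3, 5, ...
    return frame

odd = np.full((3, 4), 1)
even = np.full((3, 4), 2)
print(weave_fields(odd, even)[:, 0])   # [1 2 1 2 1 2]
```

For moving objects the two fields are captured at different instants, which is the source of the "zigzag" edges mentioned above; simple weaving like this is only appropriate for static scenes.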
Convolution

There are several possible notations to indicate the convolution of two (multi-dimensional) signals to produce an output signal. The most common are:

c = a ⊗ b    and    c = a * b

We shall use the first form, with the following formal definitions.

In 2D continuous space:

c(x, y) = a(x, y) ⊗ b(x, y) = ∫∫ a(χ, ζ) b(x − χ, y − ζ) dχ dζ

In 2D discrete space:

c[m, n] = a[m, n] ⊗ b[m, n] = Σ_j Σ_k a[j, k] b[m − j, n − k]
Properties of Convolution

There are a number of important mathematical properties associated with convolution.

* Convolution is commutative: c = a ⊗ b = b ⊗ a.
* Convolution is associative: c = a ⊗ (b ⊗ d) = (a ⊗ b) ⊗ d.
* Convolution is distributive: c = a ⊗ (b + d) = (a ⊗ b) + (a ⊗ d).

where a, b, c, and d are all images, either continuous or discrete.
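These definitions and properties can be checked numerically; the following is a minimal sketch (the helper function is our own, not from the text):

```python
import numpy as np

def conv2d(a, b):
    """Full 2D discrete convolution: c[m,n] = sum_j sum_k a[j,k] b[m-j,n-k]."""
    M = a.shape[0] + b.shape[0] - 1
    N = a.shape[1] + b.shape[1] - 1
    c = np.zeros((M, N))
    for j in range(a.shape[0]):
        for k in range(a.shape[1]):
            # Each input sample a[j,k] contributes a shifted, scaled copy of b.
            c[j:j + b.shape[0], k:k + b.shape[1]] += a[j, k] * b
    return c

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 3))
b = rng.normal(size=(3, 3))
d = rng.normal(size=(3, 3))

# Commutativity and distributivity, checked numerically:
assert np.allclose(conv2d(a, b), conv2d(b, a))
assert np.allclose(conv2d(a, b + d), conv2d(a, b) + conv2d(a, d))
```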
Discrete Cosine Transform (DCT)

Assume that the data array x(n1, n2) has finite rectangular support on [0, N1−1] × [0, N2−1]. Its 2-D DCT is given as

X_C(k1, k2) = Σ_{n1=0}^{N1−1} Σ_{n2=0}^{N2−1} 4 x(n1, n2) cos(π k1 (2n1 + 1) / (2N1)) cos(π k2 (2n2 + 1) / (2N2)),
for (k1, k2) ∈ [0, N1−1] × [0, N2−1], and X_C(k1, k2) = 0 otherwise.    (4.3.1)

The DCT basis functions for size 8 × 8 are shown in Figure ( ). The mapping between the mathematical values and the colors (gray levels) is the same as in the DFT case. Each basis function occupies a small square; the squares are then arranged into an 8 × 8 mosaic. Note that unlike the DFT, where the highest frequencies occur near the indices (N1/2, N2/2), the highest frequencies of the DCT occur at the highest indices (k1, k2) = (N1−1, N2−1).
The inverse 2-D DCT is given as

x(n1, n2) = (1/(N1 N2)) Σ_{k1=0}^{N1−1} Σ_{k2=0}^{N2−1} w(k1) w(k2) X_C(k1, k2) cos(π k1 (2n1 + 1) / (2N1)) cos(π k2 (2n2 + 1) / (2N2)),    (4.3.2)

where the weighting function w(k) is given, just as in the case of the 1-D DCT, by

w(k) = 1/2 for k = 0,   and   w(k) = 1 for 1 ≤ k ≤ N − 1.    (4.3.3)
From eqn (4.3.1), we see that the 2-D DCT is a separable operator. As such, it can be applied to the rows and then the columns, or vice versa; thus the 2-D theory can be developed by repeated application of the 1-D theory. In the following subsections we relate the 1-D DCT to the 1-D DFT of a symmetrically extended sequence. This not only provides an understanding of the DCT but also enables its fast calculation. We also present a fast DCT calculation that can avoid the use of complex arithmetic in the usual case where x is a real-valued signal, e.g., an image. (Note: the next two subsections can be skipped by the reader familiar with the 1-D DCT.)

In the 1-D case the DCT is defined as
X_C(k) = Σ_{n=0}^{N−1} 2 x(n) cos(π k (2n + 1) / (2N)) for 0 ≤ k ≤ N − 1, and X_C(k) = 0 otherwise,    (4.3.4)

for a signal x(n) having support [0, N − 1], with inverse

x(n) = (1/N) Σ_{k=0}^{N−1} w(k) X_C(k) cos(π k (2n + 1) / (2N)), 0 ≤ n ≤ N − 1.    (4.3.5)
It turns out that this 1-D DCT can be understood in terms of the DFT of a symmetrically extended sequence

y(n) ≜ x(n) for 0 ≤ n ≤ N − 1,   and   y(n) ≜ x(2N − 1 − n) for N ≤ n ≤ 2N − 1.    (4.3.6)

This is not the only way to symmetrically extend x, but this method results in the most widely used DCT, sometimes called DCT-2, with support [0, 2N − 1]. In fact, on defining the 2N-point DFT Y(k) ≜ Σ_{n=0}^{2N−1} y(n) W_{2N}^{kn}, with W_{2N} ≜ exp(−j2π/(2N)), we have

X_C(k) = W_{2N}^{k/2} Y(k) for 0 ≤ k ≤ N − 1, and 0 otherwise.    (4.3.7)
Thus the DCT is just the DFT analysis of the symmetrically extended signal defined in (4.3.6). Looking at that equation, we see that there is no overlap in its two components, which fit together without a gap: right after x(N − 1), at position n = N − 1, comes the rest of the nonzero part of x in reverse order, up to position n = 2N − 1, where x(0) sits. There is thus a point of symmetry midway between n = N − 1 and n = N, i.e., at n = N − 1/2. We thus expect that the 2N-point DFT Y(k) will be real valued except for the phase factor corresponding to this half-sample shift. So the phase factor in eqn (4.3.7) is just what is needed to cancel out the phase term in Y and make the DCT real, as it must be if the two equations, (4.3.4) and (4.3.7), are to agree for real-valued inputs x. To reconcile these two definitions, we start out with eqn (4.3.7), and proceed as follows:

X_C(k) = W_{2N}^{k/2} Y(k)
       = W_{2N}^{k/2} [ Σ_{n=0}^{N−1} x(n) W_{2N}^{nk} + Σ_{n=N}^{2N−1} x(2N − 1 − n) W_{2N}^{nk} ]
       = Σ_{n=0}^{N−1} x(n) [ W_{2N}^{(n+1/2)k} + W_{2N}^{−(n+1/2)k} ]
       = Σ_{n=0}^{N−1} 2 x(n) cos(π k (2n + 1) / (2N)),

where in the third line we substituted n → 2N − 1 − n in the second sum and used W_{2N}^{2Nk} = 1,
the last line following from the original definition, eqn (4.3.4).

The formula for the inverse DCT can be established similarly, starting out from the 2N-point inverse DFT of Y(k).
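As a numerical check of this relationship (the helper names are our own), the DCT computed directly from (4.3.4) can be compared with the phase-corrected FFT of the symmetric extension (4.3.6)-(4.3.7):

```python
import numpy as np

def dct_direct(x):
    """1-D DCT per (4.3.4): X_C(k) = sum_n 2 x(n) cos(pi k (2n+1) / (2N))."""
    N = len(x)
    n = np.arange(N)
    return np.array([2 * np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                     for k in range(N)])

def dct_via_fft(x):
    """1-D DCT per (4.3.6)-(4.3.7): FFT of the symmetric extension,
    times the half-sample phase factor W_{2N}^{k/2}."""
    N = len(x)
    y = np.concatenate([x, x[::-1]])          # y(n) = x(2N-1-n) for n >= N
    Y = np.fft.fft(y)                         # 2N-point DFT
    k = np.arange(N)
    W = np.exp(-1j * 2 * np.pi / (2 * N))     # W_{2N}
    return np.real(W ** (k / 2) * Y[:N])      # imaginary part vanishes

x = np.random.default_rng(1).normal(size=8)
assert np.allclose(dct_direct(x), dct_via_fft(x))
```

Because the FFT costs O(N log N), this route also gives the fast DCT calculation mentioned above.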
2) Energy conservation:

Σ_{n=0}^{N−1} |x(n)|² = (1/(2N)) Σ_{k=0}^{N−1} w(k) |X_C(k)|².    (4.3.8)

A unitary matrix is one whose inverse is the same as the transpose of its complex conjugate. Defining the unitary DCT as X̃_C(k) ≜ sqrt(w(k)/(2N)) X_C(k), we have

Σ_{n=0}^{N−1} |x(n)|² = Σ_{k=0}^{N−1} |X̃_C(k)|²,

which is a slight modification of the DCT Parseval relation (4.3.8). So the unitary DCT preserves the energy of the signal x. It turns out that the eigenvectors of the unitary DCT are the same as those of the symmetric tridiagonal matrix

Q = [ 1−α   −α     0   ...    0
       −α    1    −α   ...    0
        0   −α     1   ...    0
       ...  ...   ...  ...   −α
        0    0    ...  −α   1−α ],
and this holds true for arbitrary values of the parameter α. We can relate this matrix Q to the inverse of the covariance matrix R of a 1-D first-order stationary Markov random sequence, with correlation coefficient ρ necessarily satisfying |ρ| < 1, where the covariance matrix of such a sequence is given elementwise by R(m, n) = ρ^{|m−n|}, and

α ≜ ρ / (1 + ρ²)   and   β² ≜ (1 − ρ²) / (1 + ρ²).

It can further be shown that when ρ → 1, Q → β² R⁻¹, so that the eigenvectors approximate each other too. Because the eigenvectors of a matrix and its inverse are the same, we then have the fact that the unitary DCT basis vectors approximate those of the Karhunen-Loeve expansion, with basis vectors given as the solution to the matrix-vector equation

R φ_k = λ_k φ_k.
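This connection can be sketched numerically (the choice of ρ and N here is our own): the eigenvectors of the Markov covariance R(m, n) = ρ^{|m−n|} come close to the unitary DCT basis vectors when ρ is near 1.

```python
import numpy as np

N, rho = 8, 0.95
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))  # R(m,n) = rho^|m-n|

# KLT basis: eigenvectors of R, ordered by decreasing eigenvalue.
eigvals, V = np.linalg.eigh(R)
V = V[:, ::-1]

# Rows of the unitary (orthonormal) DCT-2 matrix.
n = np.arange(N)
C = np.cos(np.pi * np.outer(np.arange(N), 2 * n + 1) / (2 * N))
C[0] *= np.sqrt(1.0 / N)
C[1:] *= np.sqrt(2.0 / N)

# |cosine similarity| between each DCT basis vector and the matching KLT vector.
sim = np.abs(np.sum(C * V.T, axis=1))
print(np.round(sim, 3))     # values near 1 indicate close agreement
```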
Thus the 1-D DCT of a first-order Markov random vector of dimension N should be close to the KLT of x when its correlation coefficient ρ is close to 1. This ends the review of the 1-D DCT.

Since the 2-D DCT is just the separable operator resulting from application of the 1-D DCT along first one dimension and then the other, the order being immaterial, we can easily extend the 1-D DCT properties to the 2-D case. In terms of the connection of the 2-D DCT with the 2-D DFT, we thus see that we must symmetrically extend x in, say, the horizontal direction and then symmetrically extend that result in the vertical direction. The resulting symmetric extension becomes
y(n1, n2) ≜
  x(n1, n2)                       on [0, N1−1] × [0, N2−1],
  x(2N1 − 1 − n1, n2)             on [N1, 2N1−1] × [0, N2−1],
  x(n1, 2N2 − 1 − n2)             on [0, N1−1] × [N2, 2N2−1],
  x(2N1 − 1 − n1, 2N2 − 1 − n2)   on [N1, 2N1−1] × [N2, 2N2−1],

and the 2-D DCT can be expressed in terms of the 2N1 × 2N2 point DFT Y(k1, k2) of y as

X_C(k1, k2) = W_{2N1}^{k1/2} W_{2N2}^{k2/2} Y(k1, k2) for (k1, k2) ∈ [0, N1−1] × [0, N2−1], and 0 otherwise.
Comments

* We see that both the 1-D and 2-D DCTs involve only real arithmetic for real-valued data, and this may be important in some applications.
* The symmetric extension property can be expected to result in fewer high-frequency coefficients in the DCT with respect to the DFT. Such would be expected for lowpass data, since there would often be a jump at the four edges of the period of the corresponding periodic sequence, which is not consistent with small high-frequency coefficients in the DFS or DFT. Thus the DCT is attractive for lossy data storage applications, where the exact value of the data is not of paramount importance.
* The DCT can be used for a symmetrical type of filtering with a symmetrical filter.
* 2-D DCT properties are easy generalizations of 1-D DCT properties.
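The separability of the 2-D DCT can be sketched numerically, using the 1-D DCT of (4.3.4) as a building block (the function names are our own): applying the 1-D DCT along rows and then columns gives the same result as columns then rows.

```python
import numpy as np

def dct1(x):
    """1-D DCT per (4.3.4), without any normalization."""
    N = len(x)
    n = np.arange(N)
    return np.array([2 * np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                     for k in range(N)])

def dct2_sep(x, row_first=True):
    """2-D DCT as a separable operator: 1-D DCT along one axis, then the other."""
    first, second = (1, 0) if row_first else (0, 1)
    y = np.apply_along_axis(dct1, first, x)
    return np.apply_along_axis(dct1, second, y)

x = np.random.default_rng(2).normal(size=(4, 4))
assert np.allclose(dct2_sep(x, True), dct2_sep(x, False))   # order is immaterial
```

The product of the two 1-D factors of 2 reproduces the factor 4 in (4.3.1).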
Karhunen-Loeve Transform (KLT)

Let x denote an N-point random sequence, represented by the column vector x = [x(1), x(2), ..., x(N)]^T, with covariance matrix R_x ≜ E[(x − E[x])(x − E[x])^T]. A linear transform expands x in terms of a set of basis vectors a_k, producing the coefficient vector y = A x, where the rows of the transform matrix A are the basis vectors.
The KLT is an orthogonal linear transformation that can remove pairwise statistical correlation between the transform coefficients; i.e., the KLT transform coefficients satisfy

E[y(k)] = 0   and   E[y(k) y(l)] = σ_k² δ(k − l),

or, in matrix form,

E[y y^T] = Λ ≜ diag(σ_1², σ_2², ..., σ_N²),

where σ_k² is the variance of y(k).
Note that statistical independence implies uncorrelatedness, but the reverse is not generally true (except for jointly Gaussian random variables). The KLT can be derived by assuming a zero-mean x and a linear transform y = A x. The correlation matrix of y then becomes

E[y y^T] = E[A x x^T A^T] = A R_x A^T.

Requiring the coefficients to be uncorrelated, we get

A R_x A^T = Λ,

with the variances of y along the main diagonal of the matrix Λ. Thus, in terms of the column vectors a_k of A^T, the above equation becomes

R_x a_k = σ_k² a_k.

We see that the basis vectors of the KLT are the eigenvectors of R_x, orthonormalised to satisfy

a_k^T a_l = δ(k − l).
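A numerical sketch of this decorrelating property (the covariance model here is our own illustration): taking the eigenvectors of a first-order Markov covariance matrix as the KLT basis makes the coefficient covariance diagonal.

```python
import numpy as np

# Covariance of a first-order Markov sequence: R(m, n) = rho**|m - n|.
N, rho = 6, 0.9
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# KLT basis vectors a_k = eigenvectors of R (columns of Phi).
eigvals, Phi = np.linalg.eigh(R)

# Coefficient covariance A R_x A^T with A = Phi^T is diagonal: decorrelation.
Ry = Phi.T @ R @ Phi
assert np.allclose(Ry, np.diag(eigvals))    # off-diagonal terms vanish
assert np.all(eigvals > 0)                  # R symmetric positive definite
```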
[Note that whereas the other image transforms, like the DFT and DCT, are independent of the data, the KLT transformation depends on the second-order statistics of the data.] The variances of the KLT coefficients are the eigenvalues of R_x, and since R_x is symmetric and positive definite, the eigenvalues are real and positive. The KLT basis vectors and transform coefficients are also real. Besides decorrelating the transform coefficients, the KLT has another useful property: it maximizes the number of transform coefficients that are small enough to be insignificant. For example, suppose the KLT coefficients are ordered according to decreasing variance, i.e.,

σ_1² ≥ σ_2² ≥ ... ≥ σ_N².
Also suppose that, for reasons of economy, we transmit only the first pN coefficients, where p < 1. The receiver then uses the truncated coefficient vector to reconstruct an approximation x̂ of x. The mean square error (MSE) due to truncation is then

MSE = Σ_{k=pN+1}^{N} σ_k²,

i.e., the sum of the variances of the discarded coefficients.
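Continuing the sketch from before (our own illustration, with an arbitrary choice of how many coefficients are kept), the truncation error can be checked against the sum of the discarded eigenvalues:

```python
import numpy as np

N, rho, kept = 6, 0.9, 3

# First-order Markov covariance and its KLT basis.
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
eigvals, Phi = np.linalg.eigh(R)                 # eigh returns ascending order
order = np.argsort(eigvals)[::-1]                # sort by decreasing variance
eigvals, Phi = eigvals[order], Phi[:, order]

# E||x - x_hat||^2 when only the first `kept` KLT coefficients are used
# equals the sum of the variances (eigenvalues) of the discarded ones.
mse = np.trace(R) - np.trace(Phi[:, :kept].T @ R @ Phi[:, :kept])
assert np.isclose(mse, eigvals[kept:].sum())
```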
It can be shown that the KLT minimizes the MSE due to truncation. To show this, we argue that of all possible linear transforms operating on length-N vectors having stationary statistics, the KLT minimizes the MSE due to truncation. Let U be some other transform having unit-length basis vectors u_k.

First we show heuristically that the truncation error is minimum if the basis vectors are orthogonal; non-orthogonal basis vectors can be orthogonalised using the well-known Gram-Schmidt procedure. Let S_t be the vector space spanned by the "transmitted" basis vectors u_k for k = 1, ..., pN, and let S_r be the space spanned by the remaining vectors. Any vector x can then be represented by a vector in S_t plus a vector in S_r, as shown in the figure below.

If only the component of x lying in S_t is transmitted, then the MSE is the squared length of the error vector from the reconstruction x̂ to x. This error is minimised if the error vector is orthogonal to S_t, i.e., lies entirely within S_r, which is clearly the case when the basis vectors, and hence S_t and S_r, are mutually orthogonal.
Thus we assume U to be unitary, i.e.,

U^T U = I

(because the basis vectors are of unit length and mutually orthogonal). Now, writing the coefficient vector as y = U^T x and the truncated reconstruction as x̂ = Σ_{k=1}^{pN} y(k) u_k, we have, for zero-mean x,

MSE = E[ ||x − x̂||² ] = Σ_{k=pN+1}^{N} E[ |y(k)|² ] = Σ_{k=pN+1}^{N} u_k^T R_x u_k.

The second summation is minimized over all orthonormal sets of vectors by choosing the u_k to be the eigenvectors of R_x, in which case it equals Σ_{k=pN+1}^{N} λ_k, the sum of the N − pN smallest eigenvalues. Hence

MSE(U) ≥ Σ_{k=pN+1}^{N} λ_k = MSE for the KLT.

This concludes the argument: the MSE for any other linear transform exceeds (or at best equals) that of the KLT.
Hadamard Transform

The Hadamard transform is built from the N × N Hadamard matrix H_n (with N = 2^n), defined recursively as

H_n = (1/√2) [ H_{n−1}    H_{n−1}
               H_{n−1}   −H_{n−1} ],    (4.5.1)

where

H_0 = [1],   so that   H_1 = (1/√2) [ 1    1
                                      1   −1 ].    (4.5.2)

The matrix H_n is symmetric and orthogonal, so that its inverse equals itself; the forward and inverse 2-D transforms of an N × N image A are

Y = H_n A H_n   and   A = H_n Y H_n,

where A(i, j) is the (i, j) element of A, i, j = 1, 2, ..., N. Thus, according to (4.5.1) and (4.5.2), all entries of H_n are ±1 up to the overall normalization.

The Hadamard transform has good to very good energy packing properties. Fast algorithms for its computation using only subtractions and/or additions are also available.

Remark: Experimental results using the DCT, DST, and Hadamard transforms for texture discrimination have shown that the performance obtained was close to that of the optimal KL transform. At the same time, this near-optimal performance is obtained at substantially reduced complexity, due to the availability of fast computational schemes as reported before.
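A sketch of the recursive construction and the transform pair above (NumPy only; the normalized variant with the 1/√2 factor is used, as in (4.5.1)):

```python
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix H_n of size 2**n x 2**n, built recursively:
    H_n = (1/sqrt(2)) * [[H_{n-1}, H_{n-1}], [H_{n-1}, -H_{n-1}]]."""
    H = np.array([[1.0]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]]) / np.sqrt(2)
    return H

H = hadamard(3)                          # 8 x 8
assert np.allclose(H @ H.T, np.eye(8))   # orthogonal: inverse = transpose
assert np.allclose(H, H.T)               # and symmetric, so H is its own inverse

# 2-D Hadamard transform of an image block and its inverse.
A = np.arange(64, dtype=float).reshape(8, 8)
Y = H @ A @ H
assert np.allclose(H @ Y @ H, A)         # A = H_n Y H_n recovers the image
```

Because the unnormalized matrix contains only ±1 entries, the transform can be computed with additions and subtractions alone, which is the basis of the fast algorithms mentioned above.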