Digital Image Processing
The ability to see is one of the truly remarkable characteristics of living beings. It enables them to perceive and assimilate in a short span of time an incredible amount of knowledge about the world around them. The scope and variety of that which can pass through the eye and be interpreted by the brain is nothing short of astounding. It is thus with some degree of trepidation that we introduce the concept of visual information, because in the broadest sense the overall significance of the term is overwhelming. Instead of taking into account all of the ramifications of visual information, the first restriction we shall impose is that of finite image size. In other words, the viewer receives his or her visual information as if looking through a rectangular window of finite dimensions. This assumption is usually necessary in dealing with real-world systems: cameras, microscopes and telescopes, for example, all have finite fields of view and can handle only finite amounts of information. The second assumption we make is that the viewer is incapable of depth perception on his own. That is, in the scene being viewed he cannot tell how far away objects are by the normal use of binocular vision or by changing the focus of his eyes.

This scenario may seem a bit dismal. But in reality, this model describes an overwhelming proportion of systems that handle visual information, including television, photographs, x-rays, etc. In this setup, the visual information is determined completely by the wavelengths and amplitudes of the light that passes through each point of the window and reaches the viewer's eye. If the world outside were to be removed and a projector installed that reproduced exactly the light distribution on the window, the viewer inside would not be able to tell the difference. Thus, the problem of numerically representing visual information is reduced to that of representing the distribution of light energy and wavelengths on the finite area of the window.

We assume that the image perceived is "monochromatic" and static. It is determined completely by the perceived light energy (a weighted sum of energy at perceivable wavelengths) passing through each point on the window and reaching the viewer's eye. If we impose Cartesian coordinates on the window, we can represent the perceived light energy or "intensity" at a point $(x, y)$ by $a(x, y)$. Thus $a(x, y)$ represents the monochromatic visual information or "image" at the instant of time under consideration. As images that occur in real-life situations cannot be exactly specified with a finite amount of numerical data, an approximation of $a(x, y)$ must be made if it is to be dealt with by practical systems. Since number bases can be changed without loss of information, we may assume $a(x, y)$ to be represented by binary digital data. In this form the data is most suitable for several applications such as transmission via digital communications facilities, storage within digital memory media, or processing by computer.
A digital image described in a 2D discrete space is derived from an analog image in a 2D continuous space through a sampling process that is frequently referred to as digitization. The mathematics of that sampling process will be described in subsequent chapters. For now we will look at some basic definitions associated with the digital image. The effect of digitization is shown in Figure (1.1).
The 2D continuous image $a(x, y)$ is divided into N rows and M columns. The intersection of a row and a column is termed a pixel. The value assigned to the integer coordinates $[m, n]$, with $m = 0, 1, 2, \ldots, M-1$ and $n = 0, 1, 2, \ldots, N-1$, is $a[m, n]$. In fact, in most cases $a(x, y)$, which we might consider to be the physical signal that impinges on the face of a 2D sensor, is actually a function of many variables including depth ($z$), color ($\lambda$), and time ($t$). Unless otherwise stated, we will consider the case of 2D, monochromatic, static images in this module.
The image shown in Figure (1.1) has been divided into N rows and M columns. The value assigned to every pixel is the average brightness in the pixel rounded to the nearest integer value. The process of representing the amplitude of the 2D signal at a given coordinate as an integer value with L different gray levels is usually referred to as amplitude quantization or simply quantization.
Common values
There are standard values for the various parameters encountered in digital image processing. These values can be dictated by video standards, by algorithmic requirements, or by the desire to keep digital circuitry simple. Table 1 gives some commonly encountered values.

Parameter     Symbol    Typical values
Rows          N         256, 512, 525, 625, 1024
Columns       M         256, 512, 768, 1024
Gray levels   L         2, 64, 256, 1024, 4096, 16384

Table 1: Common values of digital image parameters
Quite frequently we see cases of $M = N = 2^K$, where $K$ is an integer; this simplifies the implementation of algorithms such as the (fast) Fourier transform.
The number of distinct gray levels is usually a power of 2, that is, $L = 2^B$, where $B$ is the number of bits in the binary representation of the brightness levels. When $B > 1$ we speak of a gray-level image; when $B = 1$ we speak of a binary image. In a binary image there are just two gray levels, which can be referred to, for example, as "black" and "white" or "0" and "1".
The continuous image $a(x, y)$ is approximated by equally spaced samples arranged in the form of an $N \times M$ array:
$$a[m,n] = \begin{bmatrix} a[0,0] & a[0,1] & \cdots & a[0,M-1] \\ a[1,0] & a[1,1] & \cdots & a[1,M-1] \\ \vdots & \vdots & & \vdots \\ a[N-1,0] & a[N-1,1] & \cdots & a[N-1,M-1] \end{bmatrix} \qquad (1)$$
Each element of the array, referred to as a "pixel", is a discrete quantity. The array represents a digital image. The above digitization requires a decision to be made on a value for N as well as on the number of discrete gray levels allowed for each pixel.
It is common practice in digital image processing to let $N = 2^n$ and $G$ = number of gray levels = $2^m$, where $n$ and $m$ are integers and the gray levels are equally spaced between 0 and L in the gray scale.
Therefore the number of bits, $b$, required to store a digitized image of size $N \times N$ with $2^m$ gray levels is:
$$b = N \times N \times m$$
In other words, an $N \times N$ image with 256 gray levels (i.e. 8 bits/pixel) requires a storage of $N \times N$ bytes.
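As a quick check of this arithmetic, the sketch below computes the storage requirement for an uncompressed image of a user-chosen size and bit depth; the sizes used in the example call are illustrative, not values taken from the text.

```python
def storage_bytes(n_rows, n_cols, bits_per_pixel):
    """Return the number of bytes needed to store an uncompressed digital image."""
    total_bits = n_rows * n_cols * bits_per_pixel
    return total_bits / 8

# Example: a square image with 256 gray levels (8 bits/pixel)
print(storage_bytes(512, 512, 8))   # 262144.0 bytes = 256 kB
```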
The representation given by equation (1) is an approximation to a continuous image. A reasonable question to ask at this point is: how many samples and gray levels are required for a good approximation? This brings up the question of resolution. The resolution (i.e. the degree of discernible detail) of an image depends strongly on both $N$ and $m$. The more these parameters are increased, the closer the digitized array will approximate the original image. Unfortunately, storage and, consequently, processing requirements increase rapidly as functions of $N$ and $m$.
Spatial resolution is the smallest discernible detail in an image. Suppose we construct a chart with vertical lines of width $W$, with the space between the lines also having width $W$. A line pair consists of one such line and its adjacent space. Thus the width of a line pair is $2W$, and there are $1/2W$ line pairs per unit distance. A widely used definition of resolution is simply the smallest number of discernible line pairs per unit distance, for example 100 line pairs/mm.

Gray-level resolution: this refers to the smallest discernible change in gray level. The measurement of discernible changes in gray level is a highly subjective process.
We have considerable discretion regarding the number of samples used to generate a digital image, but this is not true for the number of gray levels. Due to hardware constraints, the number of gray levels is usually an integer power of two. The most common value is 8 bits, although it can vary depending on the application. When an actual measure of physical resolution relating pixels to the level of detail they resolve in the original scene is not necessary, it is not uncommon to refer to an L-level digital image of size $M \times N$ as having a spatial resolution of $M \times N$ pixels and a gray-level resolution of L levels.
Types of operations

The types of operations that can be applied to digital images to transform an input image $a[m,n]$ into an output image $b[m,n]$ can be classified by how the output value at a coordinate is computed:

* Point - the output value at a specific coordinate is dependent only on the input value at that same coordinate.
* Local - the output value at a specific coordinate is dependent on the input values in the neighborhood of that same coordinate.
* Global - the output value at a specific coordinate is dependent on all the values in the input image.

Table 2: Types of image operations. Image size = $N \times N$; neighborhood size = $P \times P$. Note that the complexity is specified in operations per pixel. This is shown graphically in Figure (1.2).
Types of neighborhoods

Neighborhood operations play a key role in modern digital image processing. It is therefore important to understand how images can be sampled and how that relates to the various neighborhoods that can be used to process an image.

Rectangular sampling - In most cases, images are sampled by laying a rectangular grid over an image, as illustrated in Figure (1.1). This results in the type of sampling shown in Figures (1.3a) and (1.3b).

Hexagonal sampling - An alternative sampling scheme is shown in Figure (1.3c) and is termed hexagonal sampling. Both sampling schemes have been studied extensively and both represent a possible periodic tiling of the continuous image space. However, rectangular sampling remains the method of choice due to hardware and software considerations.

Local operations produce an output pixel value based upon the pixel values in the neighborhood, as in the sketch after the figures below. Some of the most common neighborhoods are the 4-connected neighborhood and the 8-connected neighborhood in the case of rectangular sampling, and the 6-connected neighborhood in the case of hexagonal sampling, illustrated in Figure (1.3).
Fig (1.3a)
Fig (1.3b)
Fig (1.3c)
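To make the idea of a local (neighborhood) operation concrete, here is a minimal sketch of a 4-connected neighborhood average on a rectangularly sampled image, assuming NumPy; the toy array and the edge-replication boundary handling are illustrative choices, not requirements.

```python
import numpy as np

def local_average_4(a):
    """Local operation: each output pixel is the mean of the input pixel and its
    4-connected neighbors (up, down, left, right). Edges are handled by replication."""
    padded = np.pad(a, 1, mode='edge')
    center = padded[1:-1, 1:-1]
    up     = padded[:-2, 1:-1]
    down   = padded[2:,  1:-1]
    left   = padded[1:-1, :-2]
    right  = padded[1:-1, 2:]
    return (center + up + down + left + right) / 5.0

a = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
print(local_average_4(a))
```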
Video Parameters
We do not propose to describe the processing of dynamically changing images in this introduction. It is appropriate, however, given that many static images are derived from video cameras and frame grabbers, to mention the standards that are associated with the three standard video schemes currently in worldwide use: NTSC, PAL, and SECAM. This information is summarized in Table 3.
Property                          NTSC      PAL       SECAM
Images / second                   29.97     25        25
ms / image                        33.37     40.0      40.0
Lines / image                     525       625       625
(horiz./vert.) = aspect ratio     4:3       4:3       4:3
Interlace                         2:1       2:1       2:1
µs / line                         63.56     64.00     64.00

Table 3: Standard video parameters

In an interlaced image the odd numbered lines (1, 3, 5, ...) are scanned in half of the allotted time (e.g. 20 ms in PAL) and the even numbered lines (2, 4, 6, ...) are scanned in the remaining half. The image display must be coordinated with this scanning format. The reason for interlacing the scan lines of a video image is to reduce the perception of flicker in a displayed image. If one is planning to use images that have been scanned from an interlaced video source, it is important to know if the two half-images have been appropriately "shuffled" by the digitization hardware or if that should be implemented in software. Further, the analysis of moving objects requires special care with interlaced video to avoid "zigzag" edges.

Tools

Certain tools are central to the processing of digital images. These include mathematical tools such as convolution, Fourier analysis, and statistical descriptions, and manipulative tools such as chain codes and run codes. We will present these tools without any specific motivation; the motivation will follow in later sections.
Convolution
There are several possible notations to indicate the convolution of two (multi-dimensional) signals to produce an output signal. The most common are:
$$c = a \otimes b \qquad \text{or} \qquad c = a * b$$
We shall use the first form, with the following formal definitions.

In 2D continuous space:
$$c(x,y) = a(x,y) \otimes b(x,y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} a(\chi,\zeta)\, b(x-\chi,\, y-\zeta)\, d\chi\, d\zeta$$

In 2D discrete space:
$$c[m,n] = a[m,n] \otimes b[m,n] = \sum_{j=-\infty}^{+\infty}\sum_{k=-\infty}^{+\infty} a[j,k]\, b[m-j,\, n-k]$$
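A direct, unoptimized implementation of the 2D discrete convolution sum above can be sketched as follows, assuming NumPy and finite-support signals that are zero outside their arrays; the small test arrays are arbitrary.

```python
import numpy as np

def conv2d(a, b):
    """Direct evaluation of c[m,n] = sum_j sum_k a[j,k] * b[m-j, n-k]
    for finite-support signals a and b (zero outside their arrays)."""
    Ma, Na = a.shape
    Mb, Nb = b.shape
    c = np.zeros((Ma + Mb - 1, Na + Nb - 1))
    for j in range(Ma):
        for k in range(Na):
            # a[j,k] contributes to output positions m = j..j+Mb-1, n = k..k+Nb-1
            c[j:j + Mb, k:k + Nb] += a[j, k] * b
    return c

a = np.array([[1., 2.], [3., 4.]])
b = np.array([[0., 1.], [1., 0.]])
print(conv2d(a, b))
```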
Properties of Convolution

There are a number of important mathematical properties associated with convolution.

* Convolution is commutative: $a \otimes b = b \otimes a$
* Convolution is associative: $a \otimes (b \otimes c) = (a \otimes b) \otimes c = a \otimes b \otimes c$
* Convolution is distributive: $a \otimes (b + c) = (a \otimes b) + (a \otimes c)$

where $a$, $b$, and $c$ are all 2D signals.
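These three properties can be checked numerically on small random arrays; the sketch below uses scipy.signal.convolve2d purely as a reference implementation, and the array sizes are arbitrary.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
a = rng.random((4, 4))
b = rng.random((3, 3))
c = rng.random((5, 5))

# Commutative: a (*) b == b (*) a
print(np.allclose(convolve2d(a, b), convolve2d(b, a)))
# Associative: a (*) (b (*) c) == (a (*) b) (*) c
print(np.allclose(convolve2d(a, convolve2d(b, c)), convolve2d(convolve2d(a, b), c)))
# Distributive: a (*) (b + d) == a (*) b + a (*) d   (b and d must have the same size to be added)
d = rng.random((3, 3))
print(np.allclose(convolve2d(a, b + d), convolve2d(a, b) + convolve2d(a, d)))
```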
Fourier Transforms
The Fourier transform produces another representation of a signal, specifically a representation as a weighted sum of complex exponentials. Because of Euler's formula,
$$e^{jq} = \cos(q) + j\sin(q)$$
where $j^2 = -1$, we can say that the Fourier transform produces a representation of a (2D) signal as a weighted sum of sines and cosines. The defining formulas for the forward Fourier and the inverse Fourier transforms are as follows. Given an image $a$ and its Fourier transform $A$, the forward transform goes from the spatial domain (either continuous or discrete) to the frequency domain, which is always continuous:
$$A = \mathcal{F}\{a\}$$
The inverse Fourier transform goes from the frequency domain back to the spatial domain:
$$a = \mathcal{F}^{-1}\{A\}$$
The Fourier transform is a unique and invertible operation, so that:
$$a = \mathcal{F}^{-1}\{\mathcal{F}\{a\}\} \qquad \text{and} \qquad A = \mathcal{F}\{\mathcal{F}^{-1}\{A\}\}$$
The specific formulas for transforming back and forth between the spatial domain and the frequency domain are given below.

In 2D continuous space:

Forward: $\quad A(u,v) = \displaystyle\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} a(x,y)\, e^{-j(ux+vy)}\, dx\, dy$

Inverse: $\quad a(x,y) = \dfrac{1}{4\pi^2}\displaystyle\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} A(u,v)\, e^{+j(ux+vy)}\, du\, dv$

In 2D discrete space:

Forward: $\quad A(\Omega,\Psi) = \displaystyle\sum_{m=-\infty}^{+\infty}\sum_{n=-\infty}^{+\infty} a[m,n]\, e^{-j(\Omega m + \Psi n)}$

Inverse: $\quad a[m,n] = \dfrac{1}{4\pi^2}\displaystyle\int_{-\pi}^{+\pi}\int_{-\pi}^{+\pi} A(\Omega,\Psi)\, e^{+j(\Omega m + \Psi n)}\, d\Omega\, d\Psi$
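In practice the transform pair is evaluated on finite arrays with the FFT. A minimal NumPy sketch of the forward/inverse round trip (a stand-in for the continuous-frequency definitions above, sampled at the DFT frequencies) is shown below; the test array is arbitrary.

```python
import numpy as np

a = np.random.default_rng(1).random((8, 8))   # a small test "image"

A = np.fft.fft2(a)            # forward transform: spatial -> frequency domain
a_back = np.fft.ifft2(A)      # inverse transform: frequency -> spatial domain

print(np.allclose(a, a_back.real))   # True: the transform pair is invertible
print(A[0, 0], a.sum())              # the (0,0) frequency sample equals the sum of a
```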
The properties of the Fourier transform discussed below include the importance of phase and magnitude, circularly symmetric signals, and examples of 2D signals and transforms.
There are a variety of properties associated with the Fourier transform and the inverse Fourier transform. The following are some of the most relevant for digital image processing.

* The Fourier transform is, in general, a complex function of the real frequency variables. As such, the transform can be written in terms of its magnitude and phase:
$$A(u,v) = |A(u,v)|\, e^{j\varphi(u,v)}$$
* A 2D signal can also be complex and thus written in terms of its magnitude and phase:
$$a(x,y) = |a(x,y)|\, e^{j\vartheta(x,y)}$$
* If a 2D signal is real, then the Fourier transform has certain symmetries:
$$A(u,v) = A^{*}(-u,-v)$$
The symbol (*) indicates complex conjugation. For real signals this leads directly to:
$$|A(u,v)| = |A(-u,-v)| \qquad \varphi(u,v) = -\varphi(-u,-v)$$
* If a 2D signal is real and even, then the Fourier transform is real and even:
$$A(u,v) = A(-u,-v)$$
* The Fourier transform and the inverse Fourier transform are linear operations:
$$\mathcal{F}\{w_1 a + w_2 b\} = w_1\mathcal{F}\{a\} + w_2\mathcal{F}\{b\} \qquad \text{and} \qquad \mathcal{F}^{-1}\{w_1 A + w_2 B\} = w_1\mathcal{F}^{-1}\{A\} + w_2\mathcal{F}^{-1}\{B\}$$
where $a$ and $b$ are 2D signals and $w_1$ and $w_2$ are arbitrary complex constants.
* The Fourier transform in discrete space, $A(\Omega,\Psi)$, is periodic in both $\Omega$ and $\Psi$, with both periods equal to $2\pi$:
$$A(\Omega + 2\pi j,\, \Psi + 2\pi k) = A(\Omega,\Psi) \qquad j, k \ \text{integers}$$
The energy, E, in a signal can be measured either in the spatial domain or the frequency domain. For a signal with finite energy, Parseval's theorem (2D continuous space) is:
$$E = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} |a(x,y)|^2\, dx\, dy = \frac{1}{4\pi^2}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} |A(u,v)|^2\, du\, dv$$
This "signal energy" is not to be confused with the physical energy in the phenomenon that produced the signal. If, for example, the value a[m,n] represents a photon count, then the physical energy is proportional to the amplitude, a, and not the square of the amplitude. This is generally the case in video imaging.
* Given three two-dimensional signals $a$, $b$, and $c$ and their Fourier transforms $A$, $B$, and $C$:
$$c = a \otimes b \;\Longleftrightarrow\; C = A \cdot B \qquad \text{and} \qquad c = a \cdot b \;\Longleftrightarrow\; C = \frac{1}{4\pi^2}\, A \otimes B$$
In words, convolution in the spatial domain is equivalent to multiplication in the Fourier (frequency) domain, and vice versa. This is a central result which provides not only a methodology for the implementation of a convolution but also insight into how two signals interact with each other, under convolution, to produce a third signal. We shall make extensive use of this result later.
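The convolution theorem can be illustrated with the FFT, provided the arrays are zero-padded to the full linear-convolution size (otherwise the product of DFTs corresponds to circular convolution). A sketch, with arbitrary test arrays and scipy.signal.convolve2d as the spatial-domain reference:

```python
import numpy as np
from scipy.signal import convolve2d

a = np.random.default_rng(3).random((8, 8))
b = np.random.default_rng(4).random((5, 5))

# Pad to the size of the full linear convolution before transforming.
out_shape = (a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1)
C = np.fft.fft2(a, out_shape) * np.fft.fft2(b, out_shape)   # multiplication in the frequency domain
c_fft = np.fft.ifft2(C).real

c_direct = convolve2d(a, b, mode='full')                    # convolution in the spatial domain
print(np.allclose(c_fft, c_direct))                         # True
```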
* If a two-dimensional signal $a(x,y)$ is scaled in its spatial coordinates, then:
$$a(M_x \cdot x,\; M_y \cdot y) \;\Longleftrightarrow\; \frac{1}{|M_x \cdot M_y|}\, A\!\left(\frac{u}{M_x}, \frac{v}{M_y}\right)$$
* If a two-dimensional signal $a(x,y)$ has Fourier spectrum $A(u,v)$, then:
$$A(0,0) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} a(x,y)\, dx\, dy \qquad a(0,0) = \frac{1}{4\pi^2}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} A(u,v)\, du\, dv$$
* If a two-dimensional signal $a(x,y)$ has Fourier spectrum $A(u,v)$, then its derivatives transform as:
$$\frac{\partial a(x,y)}{\partial x} \;\Longleftrightarrow\; j u\, A(u,v) \qquad \frac{\partial a(x,y)}{\partial y} \;\Longleftrightarrow\; j v\, A(u,v)$$
Figure (1.4a)
Figure (1.4b)
Figure (1.4c)
Both the magnitude and the phase functions are necessary for the complete reconstruction of an image from its Fourier transform. Figure (1.5a) shows what happens when Figure (1.4a) is restored solely on the basis of the magnitude information, and Figure (1.5b) shows what happens when Figure (1.4a) is restored solely on the basis of the phase information.
Figure (1.5a)
Figure (1.5b)
Neither the magnitude information nor the phase information alone is sufficient to restore the image. The magnitude-only image, Figure (1.5a), is unrecognizable and has severe dynamic range problems. The phase-only image, Figure (1.5b), is barely recognizable, that is, severely degraded in quality.
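The magnitude-only and phase-only reconstructions described above can be reproduced with a few lines of NumPy; any grayscale array can stand in for Figure (1.4a), and the random array below is only a placeholder.

```python
import numpy as np

a = np.random.default_rng(5).random((64, 64))   # stand-in for a grayscale image
A = np.fft.fft2(a)

magnitude = np.abs(A)
phase = np.angle(A)

# Restore using magnitude only (phase set to zero) and phase only (magnitude set to 1).
a_mag_only   = np.fft.ifft2(magnitude).real
a_phase_only = np.fft.ifft2(np.exp(1j * phase)).real

# Neither partial reconstruction matches the original image.
print(np.allclose(a, a_mag_only), np.allclose(a, a_phase_only))   # False False
```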
Circularly symmetric signals

An arbitrary 2D signal $a(x,y)$ can always be written in a polar coordinate system as $a(r,\theta)$. When the 2D signal exhibits circular symmetry this means that:
$$a(x,y) = a(r,\theta) = a(r)$$
where $r^2 = x^2 + y^2$ and $\tan\theta = y/x$. As a number of physical systems such as lenses exhibit circular symmetry, it is useful to be able to compute an appropriate Fourier representation. The Fourier transform $A(u,v)$ can be written in polar coordinates $A(\omega_r, \xi)$ and then, for a circularly symmetric signal, rewritten as a Hankel transform:
$$A(u,v) = \mathcal{F}\{a(x,y)\} = 2\pi\int_0^{\infty} a(r)\, J_0(\omega_r r)\, r\, dr = A(\omega_r) \qquad (1.2)$$
where $\omega_r^2 = u^2 + v^2$, $\tan\xi = v/u$, and $J_0(\cdot)$ is a Bessel function of the first kind of order zero. The inverse Hankel transform is given by:
$$a(r) = \frac{1}{2\pi}\int_0^{\infty} A(\omega_r)\, J_0(\omega_r r)\, \omega_r\, d\omega_r$$
The Fourier transform of a circularly symmetric 2D signal is a function of only the radial frequency $\omega_r$. The dependence on the angular frequency $\xi$ has vanished due to the circular symmetry. According to equation (1.2), a circularly symmetric signal and its Fourier transform are scalar functions of a single variable, $r$ or $\omega_r$, respectively.
Statistics
The statistical descriptions discussed below are the probability distribution function of the brightnesses and the probability density function of the brightnesses.
In image processing it is quite common to use simple statistical descriptions of images and sub-images. The notion of a statistic is intimately connected to the concept of a probability distribution, generally the distribution of signal amplitudes. For a given region, which could conceivably be an entire image, we can define the probability distribution function of the brightnesses in that region and the probability density function of the brightnesses in that region. We will assume in the discussion that follows that we are dealing with a digitized image $a[m,n]$.

The probability distribution function, $P(a)$, is the probability that a brightness chosen from the region is less than or equal to a given brightness value $a$. As $a$ increases from $-\infty$ to $+\infty$, $P(a)$ increases from 0 to 1. $P(a)$ is monotonic and non-decreasing in $a$, and thus $dP/da \geq 0$.
The probability that a brightness in a region falls between $a$ and $a + \Delta a$, given $P(a)$, can be expressed as $p(a)\,\Delta a$, where $p(a)$ is the probability density function:
$$p(a) = \frac{dP(a)}{da}$$
Because of the monotonic, non-decreasing character of $P(a)$ we have:
$$p(a) \geq 0 \qquad \text{and} \qquad \int_{-\infty}^{+\infty} p(a)\, da = 1$$
For an image with quantized (integer) brightness amplitudes, the interpretation of $\Delta a$ is the width of a brightness interval. We assume constant-width intervals. The brightness probability density function is frequently estimated by counting the number of times that each brightness occurs in the region to generate a histogram, $h[a]$. The histogram can then be normalized so that the total area under the histogram is 1. Said another way, the $p[a]$ for a region is the normalized count of the number of pixels, $N$, in the region that have quantized brightness $a$:
$$p[a] = \frac{1}{N}\, h[a] \qquad \text{with} \qquad N = \sum_{a} h[a]$$
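For a quantized image the estimated density $p[a]$ is just the normalized histogram; a minimal NumPy sketch (8-bit brightnesses and a random toy region assumed) is:

```python
import numpy as np

region = np.random.default_rng(6).integers(0, 256, size=(100, 100))   # toy 8-bit region

h = np.bincount(region.ravel(), minlength=256)   # histogram h[a]: count of pixels with brightness a
p = h / region.size                              # normalized: p[a] estimates the density, sum(p) == 1
P = np.cumsum(p)                                 # running sum estimates the distribution function P[a]

print(p.sum(), P[-1])   # both 1.0
```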
The brightness probability distribution function for the image is shown in Figure(1. 6a). The (unnormalized) brightness histogram which is proportional to the estimated brightness probability density function is shown in Figure(1. 6b). The height in this histogram corresponds to the number of pixels with a given brightness.
Figure (1.6a)
Figure( 1.6b)
Figure (1.6): (a) Brightness distribution function of Figure (1.4a) with minimum, median, and maximum indicated. (b) Brightness histogram of Figure (1.4a).

Both the distribution function and the histogram as measured from a region are a statistical description of that region. It must be emphasized that both $P[a]$ and $p[a]$ should be viewed as estimates of true distributions when they are computed from a specific region. That is, we view an image and a specific region as one realization of the various random processes involved in the formation of that image and that region. In the same context, the statistics defined below must be viewed as estimates of the underlying parameters.
Average
The average brightness of a region is defined as the sample mean of the pixel brightnesses within that region. The average, $m_a$, of the brightnesses over the $N$ pixels within a region $\mathcal{R}$ is given by:
$$m_a = \frac{1}{N}\sum_{(m,n)\in\mathcal{R}} a[m,n]$$
Alternatively, we can use a formulation based upon the (unnormalized) brightness histogram, $h[a]$, with discrete brightness values $a$. This gives:
$$m_a = \frac{1}{N}\sum_{a} a \cdot h[a]$$
Standard deviation

The unbiased estimate of the standard deviation, $s_a$, of the brightnesses within a region with $N$ pixels is called the sample standard deviation and is given by:
$$s_a = \sqrt{\frac{1}{N-1}\sum_{(m,n)\in\mathcal{R}} \left(a[m,n] - m_a\right)^2}$$
Using the histogram formulation gives:
$$s_a = \sqrt{\frac{\sum_{a} a^2\, h[a] \;-\; N\, m_a^2}{N-1}}$$
The sample standard deviation, $s_a$, is an estimate of $\sigma_a$, the standard deviation of the underlying brightness probability distribution.
Coefficient-of-variation
The dimensionless coefficient-of-variation, CV, is defined as:
$$CV = \frac{s_a}{m_a} \times 100\%$$
Percentiles
The percentile, p%, of an unquantized brightness distribution is defined as that value of the brightness $a$ such that:
$$P(a) = p\%$$
or equivalently
$$\int_{-\infty}^{a} p(\alpha)\, d\alpha = p\%$$
Three special cases are frequently used in digital image processing:
* 0% is the minimum value in the region
* 50% is the median value in the region
* 100% is the maximum value in the region
All three of these values can be determined from Figure (1.6a).

Mode

The mode of the distribution is the most frequent brightness value. There is no guarantee that a mode exists or that it is unique.
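The sample statistics defined above (average, standard deviation, coefficient-of-variation, and the percentile special cases) can be computed for a region in a few lines of NumPy; the region below is a toy random example, not data from the text.

```python
import numpy as np

region = np.random.default_rng(7).integers(0, 256, size=(50, 50)).astype(float)

m_a = region.mean()                 # average brightness
s_a = region.std(ddof=1)            # unbiased sample standard deviation (N-1 in the denominator)
cv  = 100.0 * s_a / m_a             # coefficient-of-variation, in percent

minimum = np.percentile(region, 0)     # 0%   -> minimum value in the region
median  = np.percentile(region, 50)    # 50%  -> median value in the region
maximum = np.percentile(region, 100)   # 100% -> maximum value in the region

print(m_a, s_a, cv, minimum, median, maximum)
```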
Signal-to-Noise Ratio

The signal-to-noise ratio, SNR, can have several definitions. The noise is characterized by its standard deviation, $s_n$. The characterization of the signal can differ. If the signal is known to lie between two boundaries, $a_{min} \leq a \leq a_{max}$, then the SNR is defined as:

Bounded signal:
$$SNR = 20\log_{10}\!\left(\frac{a_{max} - a_{min}}{s_n}\right) \ \text{dB} \qquad (1.3)$$

If the signal is not bounded but has a statistical distribution, then two other definitions are known:

Stochastic signal, S and N interdependent:
$$SNR = 20\log_{10}\!\left(\frac{m_a}{s_n}\right) \ \text{dB}$$

Stochastic signal, S and N independent:
$$SNR = 20\log_{10}\!\left(\frac{s_a}{s_n}\right) \ \text{dB}$$

where $m_a$ and $s_a$ are the average and standard deviation defined above.
The various statistics are given in Table 5 for the image and the region of interest (ROI) shown in Figure (1.7).

Table 5: Statistics from Figure (1.7), listed per statistic with one column for the full image and one for the ROI.
Fig (1.7): The region of interest is the interior of the circle.

An SNR calculation for the entire image based on equation (1.3) is not directly available. The variations in the image brightnesses that lead to the large value of $s$ (= 49.5) are not, in general, due to noise but to the variation in local information. With the help of the region there is a way to estimate the SNR. We can use the standard deviation measured in the ROI (= 4.0) and the dynamic range of the image, $a_{max} - a_{min}$, to calculate a global SNR (= 33.3 dB). The underlying assumptions are that (1) the signal is approximately constant in that region and the variation in the region is therefore due to noise, and (2) the noise is the same over the entire image, with a standard deviation given by $s_n = s_{ROI}$.
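A sketch of this ROI-based SNR estimate, assuming the ROI is a patch of (approximately) constant signal so that its standard deviation can stand in for the noise; the image array and the ROI slice below are illustrative placeholders.

```python
import numpy as np

image = np.random.default_rng(8).integers(0, 256, size=(256, 256)).astype(float)
roi = image[100:120, 100:120]        # a region assumed to contain (nearly) constant signal

s_n = roi.std(ddof=1)                             # noise standard deviation estimated from the ROI
dynamic_range = image.max() - image.min()         # bounded-signal characterization of the image

snr_db = 20.0 * np.log10(dynamic_range / s_n)     # bounded-signal SNR, equation (1.3)
print(snr_db)
```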
Module 2
Perception
Many image processing applications are intended to produce images that are to be viewed by human observers. It is therefore important to understand the characteristics and limitations of the human visual system, that is, to understand the receiver of the 2D signals. At the outset it is important to realise that (1) the human visual system (HVS) is not well understood; (2) no objective measure exists for judging the quality of an image that corresponds to human assessment of image quality; and (3) the "typical" human observer does not exist. Nevertheless, research in perceptual psychology has provided some important insights into the visual system [Stockham].

Elements of Human Visual Perception
Figure (2.1): The human eye

The first part of the visual system is the eye, shown in Figure (2.1). Its form is nearly spherical and its diameter is approximately 20 mm. Its outer cover consists of the cornea and the sclera. The cornea is a tough transparent tissue in the front part of the eye. The sclera is an opaque membrane which is continuous with the cornea and covers the remainder of the eye. Directly below the sclera lies the choroid, which has many blood vessels. At its anterior extreme lies the iris diaphragm. Light enters the eye through the central opening of the iris, whose diameter varies from 2 mm to 8 mm according to the illumination conditions. Behind the iris is the lens, which consists of concentric layers of fibrous cells and contains 60 to 70% water. Its operation is similar to that of man-made optical lenses. It focuses the light on the retina, which is the innermost membrane of the eye. The retina has two kinds of photoreceptors: cones and rods. The cones are highly sensitive to color. Their number is 6-7 million and they are mainly located in the central part of the retina. Each cone is connected to one nerve end. Cone vision is the photopic or bright-light vision. Rods serve to view the general picture of the visual field. They are sensitive to low levels of illumination and cannot discriminate colors. This is the scotopic or dim-light vision. Their number is 75 to 150 million and they are distributed over the retinal surface. Several rods are connected to a single nerve end. This fact and their large spatial distribution explain their low resolution. Both cones and rods transform light into electric stimuli, which are carried through the optic nerve to the human brain for high-level image processing and perception.
Based on the anatomy of the eye, a model can be constructed as shown in Figure (2.2). Its first part is a simple optical system consisting of the cornea, the opening of the iris, the lens, and the fluids inside the eye. Its second part consists of the retina, which performs the photoelectrical transduction, followed by the visual pathway (nerve), which performs simple image processing operations and carries the information to the brain.
Fig (2.2): A model of the human eye.

Image Formation in the Eye

The image formation in the human eye is not a simple phenomenon. It is only partially understood, and only some of the visual phenomena have been measured and understood. Most of them are known to have non-linear characteristics. Two examples of visual phenomena are contrast sensitivity and spatial frequency sensitivity.

Contrast sensitivity
Figure (2.3): The Weber ratio without background

Let us consider a spot of intensity I + dI in a background having intensity I, as shown in Figure (2.3); dI is increased from 0 until it becomes noticeable. The ratio dI/I, called the Weber ratio, is nearly constant at about 2% over a wide range of illumination levels, except for very low or very high illuminations, as seen in Figure (2.3). The range over which the Weber ratio remains constant is reduced considerably when the experiment of Figure (2.4) is considered. In this case, the background has intensity I0 and two adjacent spots have intensities I and I + dI, respectively. The Weber ratio is plotted as a function of the background intensity in Figure (2.4). The envelope of the lower limits is the same as that of Figure (2.3). The differential of the logarithm of the intensity I is the Weber ratio:
$$d(\log I) = \frac{dI}{I}$$
Thus equal changes in the logarithm of the intensity result in equal noticeable changes in the intensity for a wide range of intensities. This fact suggests that the human eye performs a pointwise logarithm operation on the input image.
Another characteristic of HVS is that it tends to overshoot around image edges (boundaries of regions having different intensity). As a result, regions of constant intensity, which are close to edges, appear to have varying intensity. Such an example is shown in Figure (2.5). The stripes appear to have varying intensity along the horizontal dimension, whereas their intensity is constant. This effect is called Mach band effect. It indicates that the human eye is sensitive to edge information and that it has high-pass characteristics.
Figure (2.5a)
Figure (2.5b)
Figure (2.5c)
Figure (2.5): (a) Vertical stripes having constant illumination; (b) actual image intensity profile; (c) perceived image intensity profile.
Spatial Frequency Sensitivity

If the constant intensity (brightness) I0 is replaced by a sinusoidal grating with increasing spatial frequency (Figure 2.6a), it is possible to determine the spatial frequency sensitivity. The result is shown in Figures (2.6a) and (2.6b).
Figure (2.6a)
Figure (2.6b)
Figure 2.6: (a) Sinusoidal test grating; (b) spatial frequency sensitivity.

To translate these data into common terms, consider an ideal computer monitor at a viewing distance of 50 cm. The spatial frequency that gives maximum response is 10 cycles per degree (see the figure above). One degree at 50 cm translates to 50 · tan(1°) = 0.87 cm on the computer screen. Thus the spatial frequency of maximum response at this viewing distance is f_max = 10 cycles / 0.87 cm = 11.46 cycles/cm. Translating this into a general formula gives:
$$f_{max} = \frac{10}{d\,\tan(1^\circ)} \approx \frac{572.9}{d}\ \text{cycles/cm}$$
where $d$ is the viewing distance measured in cm.
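The general formula can be evaluated for other viewing distances with a few lines of Python; the distances in the loop below are arbitrary examples, and the 10 cycles/degree peak is the value quoted above.

```python
import math

def f_max_cycles_per_cm(viewing_distance_cm, peak_cycles_per_degree=10.0):
    """Spatial frequency of maximum response on the screen, in cycles/cm."""
    return peak_cycles_per_degree / (viewing_distance_cm * math.tan(math.radians(1.0)))

for d in (25, 50, 100):
    print(d, round(f_max_cycles_per_cm(d), 2))   # 50 cm gives about 11.46 cycles/cm
```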
Color

A color stimulus is therefore specified by visible radiant energy of a given intensity and spectral composition. Color is generally characterised by attaching names to the different stimuli, e.g. white, gray, black, red, green, blue. Color stimuli are generally more pleasing to the eye than black and white stimuli. Consequently, pictures with color are widespread in TV, photography, and printing. Color is also used in computer graphics to add spice to the synthesized pictures. Coloring of black and white pictures by transforming intensities into colors (called pseudo-colors) has been used extensively by artists working in pattern recognition. In this module we will be concerned with the questions of how to specify color and how to reproduce it. Color specification consists of three parts: (1) color matching; (2) color differences; (3) color appearance or perceived color. We will discuss the first of these questions in this module.
Let $S(\lambda)$ denote the spectral power distribution (in watts/m²/unit wavelength) of the light emanating from a pixel of the image plane, where $\lambda$ is the wavelength. The human retina contains predominantly three different color receptors (called cones) that are sensitive to three overlapping areas of the visible spectrum. The sensitivities of the receptors peak at approximately 445 nm (called blue), 535 nm (called green), and 570 nm (called red). Each type of receptor integrates the energy in the incident light at various wavelengths in proportion to its sensitivity to light at that wavelength. The three resulting numbers are primarily responsible for color sensation. This is the basis for the trichromatic theory of color vision, which states that the color of light entering the eye may be specified by only three numbers, rather than a complete function of wavelength over the visible range. This leads to significant economy in color specification and reproduction for human viewing. Much of the credit for this significant work goes to the physicist Thomas Young. The counterpart to the trichromacy of vision is the trichromacy of color mixture. This important principle states that light of any color can be synthesized by an appropriate mixture of three properly chosen primary colors. Maxwell showed this in 1855 using a three-color projecting system. Several developments have taken place since that time, creating a large body of knowledge referred to as colorimetry. Although the trichromacy of color is based on subjective and physiological findings, precise measurements can be made to examine color matches.
Color matching
Consider a bipartite field subtending an angle of 2° at a viewer's eye. The entire field is viewed against a dark, neutral surround. The field contains the test color on the left and an adjustable mixture of three suitably chosen primary colors on the right, as shown in Figure (2.7).
Figure (2.7): 2° bipartite field at the viewer's eye

It is found that most test colors can be matched by a proper mixture of three primary colors, as long as the primary colors are independent. The primary colors are usually chosen as red, green and blue, or red, green and violet. The tristimulus values of a test color are the amounts of the three primary colors required to give a match by additive mixture. They are unique to within the accuracy of the experiment. Much of colorimetry is based on experimental results as well as rules attributed to Grassmann. Two important rules that are valid over a large range of observing conditions are linearity and additivity. They state that:
1) The color match between any two color stimuli holds even if the intensities of the stimuli are increased or decreased by the same multiplying factor, as long as their relative spectral distributions remain unchanged.
As an example, if stimuli $S_1$ and $S_2$ match, and stimuli $S_3$ and $S_4$ also match, then the additive mixtures $(S_1 + S_3)$ and $(S_2 + S_4)$ also match.
2) Another consequence of the above rules of Grassmann trichromacy is that any four colors cannot be linearly independent. This implies that the tristimulus values of any one of the four colors can be expressed as a linear combination of the tristimulus values of the remaining three colors. That is, any color C is specified by its projection on the three axes R, G, B corresponding to a chosen set of primaries. This is shown in Figure (2.8).
Figure (2.8): R, G, B tristimulus space. A color C is specified by a vector in three-dimensional space with components R, G, and B (the tristimulus values).
By convention, tristimulus values are expressed in normalized form. This is done by a preliminary color-matching experiment in which the left side of the split field shown in Fig (2.7) is allowed to emit light of unit intensity whose spectral distribution is constant with respect to wavelength (equal-energy white E). Then the amount of each primary required for a match is taken, by definition, as one unit. The amounts of the primaries for matching other test colors are then expressed in terms of this unit. In practice, equal-energy white E is matched with positive amounts of each primary.
Figure (2.9): The color-matching functions for the 2° Standard Observer, using primaries of wavelengths 700 nm (red), 546.1 nm (green), and 435.8 nm (blue), with units such that equal quantities of the three primaries are needed to match the equal-energy white, E.
The tristimulus values of spectral colors (i.e. light of a single wavelength) with unit intensity are called color matching functions. Thus the color matching functions $\bar{r}(\lambda)$, $\bar{g}(\lambda)$, and $\bar{b}(\lambda)$ are the tristimulus values, with respect to three given primary colors, of unit-intensity monochromatic light of wavelength $\lambda$. Figure (2.9) shows the color matching functions for the 2° Standard Observer using primaries of wavelength 700 nm (red), 546.1 nm (green) and 435.8 nm (blue), with units such that equal quantities of the three primaries are needed to match the equal-energy white E.
Given an arbitrary spectral distribution $S(\lambda)$, the tristimulus values can be found using the color matching functions as:
$$R = \int S(\lambda)\,\bar{r}(\lambda)\,d\lambda \qquad G = \int S(\lambda)\,\bar{g}(\lambda)\,d\lambda \qquad B = \int S(\lambda)\,\bar{b}(\lambda)\,d\lambda \qquad (2.1)$$
Two spectral distributions $S_1(\lambda)$ and $S_2(\lambda)$ therefore match whenever $R_1 = R_2$, $G_1 = G_2$, and $B_1 = B_2$, where $(R_1, G_1, B_1)$ and $(R_2, G_2, B_2)$ are the tristimulus values of the two distributions, even if $S_1(\lambda) \neq S_2(\lambda)$ for all $\lambda$.
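Equation (2.1) is a set of integrals over wavelength; in practice they are evaluated numerically from sampled data. The sketch below uses hypothetical, made-up arrays for the spectral distribution and the color-matching functions, purely to show the computation; it does not reproduce any real colorimetric tables.

```python
import numpy as np

# Hypothetical sampled data (NOT real colorimetric tables): wavelengths in nm,
# a spectral power distribution S(lambda), and color-matching functions r_bar, g_bar, b_bar.
wavelengths = np.linspace(400, 700, 31)
S = np.exp(-((wavelengths - 550.0) / 60.0) ** 2)          # made-up spectral distribution
r_bar = np.exp(-((wavelengths - 600.0) / 40.0) ** 2)      # made-up matching functions
g_bar = np.exp(-((wavelengths - 550.0) / 40.0) ** 2)
b_bar = np.exp(-((wavelengths - 450.0) / 40.0) ** 2)

# Equation (2.1): tristimulus values as integrals of S(lambda) weighted by the matching functions.
R = np.trapz(S * r_bar, wavelengths)
G = np.trapz(S * g_bar, wavelengths)
B = np.trapz(S * b_bar, wavelengths)

print(R, G, B)
```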
This phenomenon of trichromatic matching is easily explained in terms of the trichromatic theory of color vision: if all colors are analysed by the retina and converted into only three different types of responses, the eye will be unable to detect any difference between two stimuli that give the same retinal response, no matter how different they are in spectral composition. One consequence of the trichromacy of color vision is that there are many colors having different spectral distributions that nevertheless have matching tristimulus values. Such colors are called metamers. This is shown in Figure (2.10). Colors with identical spectral distributions are called isomers.
Figure (2.10): Example of two spectral energy distributions P1 and P2 that are metameric with respect to each other, i.e., they look the same.