Unit 1 to 5 Computer Vision and Image Processing
Image Display
The processed images are displayed on monitors or other display devices so that the results can be viewed and interpreted.
Software
The image processing software comprises specialized modules that carry out particular functions.
Hardcopy Equipment
Laser printers, film cameras, heat-sensitive (thermal) devices, inkjet printers, and digital media such as optical and CD-ROM discs are just a few examples of the instruments used to record pictures.
Networking
Networking is a necessary component for transmitting image data between computers. Because image processing applications involve vast amounts of data, bandwidth is the most important factor in image transmission.
Types of Filters
There are several types of filters that can be used to modify an image. Some of the most common are blur, sharpen, edge detection, color correction, and noise reduction filters.
Blur: Blur filters are used to soften the edges of an image, creating a more subtle look. This can be used
to make an image look more natural or to reduce the visibility of distracting details.
Sharpen: Sharpen filters are used to make an image appear sharper and clearer. This can be used to
make an image look more detailed or to make the colors more vibrant.
Edge Detection: Edge detection filters are used to accentuate the outlines of an image. This can be used
to make objects stand out more or to create a more dramatic effect.
Color Correction: Color correction filters are used to adjust the hue, saturation, and brightness of an
image. This can be used to make an image look more realistic, or to create a specific color palette.
Noise Reduction: Noise reduction filters are used to remove unwanted noise from an image. This can
be used to make an image look more natural or to reduce the visibility of digital artifacts.
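As a brief illustration, these common filters map directly onto standard OpenCV calls. The following is a minimal sketch, assuming OpenCV and NumPy are installed; the file name and parameter values are placeholders, not taken from the text above.

import cv2
import numpy as np

img = cv2.imread("input.jpg")                      # placeholder file name

# Blur: soften edges with a Gaussian kernel
blurred = cv2.GaussianBlur(img, (5, 5), 0)

# Sharpen: convolve with a simple sharpening kernel
kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
sharpened = cv2.filter2D(img, -1, kernel)

# Edge detection: Canny on the grayscale image
edges = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 100, 200)

# Color correction: simple contrast (alpha) and brightness (beta) adjustment
corrected = cv2.convertScaleAbs(img, alpha=1.2, beta=10)

# Noise reduction: non-local means denoising
denoised = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)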
Image Representation
Image representation is the process of describing an image, which may be two-dimensional or three-dimensional; the choice of representation determines which processing operations can be applied.
Image representation can take many forms in computer science. In basic terms, it refers to the way information about the image and its features, such as color, pixels, resolution, intensity, brightness, and how the image is stored, is conveyed.
Some other pre-processing techniques are as follows:
1. Image editing: Image editing is alteration in images by means of graphics software tools.
2. Image restoration: Image restoration is extraction of the original image from the corrupt image in
order to retrieve the lost information.
3. Independent component analysis (ICA): This method is used to separate multivariate signals
computationally into additive subcomponents.
4. Anisotropic diffusion: This is also known as Perona-Malik Diffusion which is used to reduce noise
of an image without removing any important part of an image.
5. Linear filtering: This technique is used to process time varying input signals and produce output
signals which are subject to constraint of linearity.
6. Pixelation: This process is used to convert printed images into digitized ones.
7. Principal component analysis (PCA): This technique is used for feature extraction.
8. Partial differential equations: These techniques deal with effectively denoising images.
9. Hidden Markov model: This technique is used for image analysis with two-dimensional models.
10. Wavelets: It is a mathematical model which is used for image compression.
11. Self-organizing maps: This technique is used to classify images into a number of classes.
12. Point feature mapping: It is used to detect a specified target in a cluttered scene.
13. Histogram: A histogram plots the number of pixels at each intensity level in the image, typically displayed as a curve (a short example follows below).
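As a small illustration of the last item, a gray-level histogram can be computed and plotted with NumPy and Matplotlib; the file name below is a placeholder.

import cv2
import numpy as np
import matplotlib.pyplot as plt

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# Count the number of pixels at each of the 256 intensity levels
counts, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))

plt.plot(counts)              # the histogram drawn as a curve
plt.xlabel("Intensity level")
plt.ylabel("Number of pixels")
plt.show()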
Morphological Operations
Dilation: The value of the output pixel is the maximum value of all pixels in the neighborhood. In a binary image, a pixel is set to 1 if any of the neighboring pixels have the value 1. Morphological dilation makes objects more visible and fills in small holes in objects. Lines appear thicker, and filled shapes appear larger.
Erosion: The value of the output pixel is the minimum value of all pixels in the neighborhood. In a binary image, a pixel is set to 0 if any of the neighboring pixels have the value 0. Morphological erosion removes floating pixels and thin lines so that only substantive objects remain. Remaining lines appear thinner and shapes appear smaller.
imopen Perform morphological opening. The opening operation erodes an image and then dilates the
eroded image, using the same structuring element for both operations.
Morphological opening is useful for removing small objects and thin lines from an image while
preserving the shape and size of larger objects in the image. For an example, see Use
Morphological Opening to Extract Large Image Features.
imclose Perform morphological closing. The closing operation dilates an image and then erodes the
dilated image, using the same structuring element for both operations.
Morphological closing is useful for filling small holes in an image while preserving the shape
and size of large holes and objects in the image.
bwskel Skeletonize objects in a binary image. The process of skeletonization erodes all objects to
centerlines without changing the essential structure of the objects, such as the existence of holes
and branches.
bwperim Find perimeter of objects in a binary image. A pixel is part of the perimeter if it is nonzero and
it is connected to at least one zero-valued pixel. Therefore, edges of interior holes are considered
part of the object perimeter.
bwhitmiss Perform binary hit-miss transform. The hit-miss transform preserves pixels in a binary image
whose neighborhoods match the shape of one structuring element and do not match the shape of
a second disjoint structuring element.
imtophat Perform a morphological top-hat transform. The top-hat transform opens an image, then subtracts
the opened image from the original image.
The top-hat transform can be used to enhance contrast in a grayscale image with nonuniform
illumination. The transform can also isolate small bright objects in an image.
imbothat Perform a morphological bottom-hat transform. The bottom-hat transform closes an image, then
subtracts the original image from the closed image.
The bottom-hat transform isolates pixels that are darker than other pixels in their neighborhood.
Therefore, the transform can be used to find intensity troughs in a grayscale image.
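The operations above (described here in terms of MATLAB Image Processing Toolbox functions) have close equivalents in OpenCV. The sketch below is only an illustration of those equivalents, with placeholder file names and an assumed 5 x 5 elliptical structuring element.

import cv2

binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)     # placeholder binary image
gray = cv2.imread("gray.png", cv2.IMREAD_GRAYSCALE)         # placeholder grayscale image
se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))   # structuring element

dilated = cv2.dilate(binary, se)                         # maximum of the neighborhood
eroded = cv2.erode(binary, se)                           # minimum of the neighborhood
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, se)    # erosion followed by dilation
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, se)   # dilation followed by erosion
tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, se)    # image minus its opening
bothat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, se)  # closing minus the image

Skeletonization (bwskel) and perimeter extraction (bwperim) have counterparts in scikit-image, for example skimage.morphology.skeletonize and skimage.segmentation.find_boundaries.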
Hit-and-Miss Transform
The hit-and-miss transform is a general binary morphological operation that can
be used to look for particular patterns of foreground and background pixels in an
image. It is actually the basic operation of binary morphology since almost all the
other binary morphological operators can be derived from it. As with other binary
morphological operators it takes as input a binary image and a structuring
element, and produces another binary image as output.
How It Works
The structuring element used in the hit-and-miss is a slight extension to the type that has been
introduced for erosion and dilation, in that it can contain both foreground and background
pixels, rather than just foreground pixels, i.e. both ones and zeros. Note that the simpler type
of structuring element used with erosion and dilation is often depicted containing both ones
and zeros as well, but in that case the zeros really stand for 'don't cares', and are just used to fill out the structuring element to a conveniently shaped kernel, usually a square. In all our illustrations, these 'don't cares' are shown as blanks in the kernel in order to avoid confusion.
An example of the extended kind of structuring element is shown in Figure 1. As usual we
denote foreground pixels using ones, and background pixels using zeros.
The hit-and-miss operation is performed in much the same way as other morphological operators, by
translating the origin of the structuring element to all points in the image, and then comparing the
structuring element with the underlying image pixels. If the foreground and background pixels in the
structuring element exactly match foreground and background pixels in the image, then the pixel
underneath the origin of the structuring element is set to the foreground color. If it doesn't match, then
that pixel is set to the background color.
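In OpenCV the hit-and-miss transform is available through cv2.morphologyEx with cv2.MORPH_HITMISS (OpenCV 3.3 or later). The toy example below is a sketch: the image values and the structuring element (1 = required foreground, -1 = required background, 0 = don't care) are illustrative only.

import cv2
import numpy as np

# Small binary test image with values 0 and 255
img = np.array([[0,   0,   0,   0, 0],
                [0, 255, 255, 255, 0],
                [0, 255, 255, 255, 0],
                [0, 255, 255, 255, 0],
                [0,   0,   0,   0, 0]], dtype=np.uint8)

# Structuring element that looks for an upper-left convex corner
kernel = np.array([[-1, -1,  0],
                   [-1,  1,  1],
                   [ 0,  1,  0]], dtype=int)

corners = cv2.morphologyEx(img, cv2.MORPH_HITMISS, kernel)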
After obtaining the locations of corners in each orientation, we can then simply OR all these
images together to get the final result showing the locations of all right angle convex corners
in any orientation. Figure 3 shows the effect of this corner detection on a simple binary image.
Figure 3: Effect of the hit-and-miss based right angle convex corner detector on a simple binary image. Note that the 'detector' is rather sensitive.
Implementations vary as to how they handle the hit-and-miss transform at the edges of images
where the structuring element overlaps the edge of the image. A simple solution is to simply
assume that any structuring element that overlaps the image does not match underlying pixels,
and hence the corresponding pixel in the output should be set to zero.
Erosion
The erosion process is similar to dilation, but we turn pixels to 'background' rather than 'object'. As before, slide the structuring element across the image and follow these steps:
1. If the origin of the structuring element coincides with a 'background' pixel in the image, there is no change; move to the next pixel.
2. If the origin of the structuring element coincides with an 'object' pixel in the image, and any of the 'object' pixels in the structuring element extend beyond the 'object' pixels in the image, then change the current 'object' pixel in the image (the one under the structuring element origin) to a 'background' pixel.
Algorithm
(i) Read the color image and convert it to gray-scale.
(ii) Develop gradient images using appropriate edge detection function.
(iii) Mark the foreground objects using morphological reconstruction (which gives better results than opening followed by closing).
(iv) Calculate the regional maxima and minima to obtain good foreground markers.
(v) Superimpose the foreground marker image on the original image.
(vi) Clean the edges of the markers using edge reconstruction.
(vii) Compute the background markers.
(viii) Compute the watershed transform of the function.
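The steps above can be sketched with scikit-image and SciPy roughly as follows. This is an illustrative pipeline only; the file name, threshold choice, and marker parameters are assumptions rather than part of the algorithm as stated.

import numpy as np
from scipy import ndimage as ndi
from skimage import io, color, filters, feature, morphology, segmentation

rgb = io.imread("objects.png")                       # placeholder file name
gray = color.rgb2gray(rgb)                           # (i) convert to gray-scale
gradient = filters.sobel(gray)                       # (ii) gradient image

binary = gray > filters.threshold_otsu(gray)         # rough foreground mask
binary = morphology.remove_small_objects(binary, 64)

distance = ndi.distance_transform_edt(binary)        # (iii)-(iv) foreground markers
coords = feature.peak_local_max(distance, min_distance=10, labels=binary)
peaks = np.zeros(distance.shape, dtype=bool)
peaks[tuple(coords.T)] = True
markers, _ = ndi.label(peaks)

labels = segmentation.watershed(gradient, markers, mask=binary)   # (viii) watershed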
Thinning
This is somewhat similar to the erosion or opening operations discussed earlier. As the name suggests, thinning is used to thin the foreground region while preserving its extent and connectivity. Preserving extent means preserving the endpoints of a structure, whereas connectivity can refer to either 4-connectivity or 8-connectivity. Thinning is mostly used for producing skeletons, which serve as image descriptors, and for reducing the output of edge detectors to one-pixel thickness.
There are various algorithms to implement the thinning operation such as
Zhang Suen fast parallel thinning algorithm
Non-max Suppression in Canny Edge Detector
Guo and Hall’s two sub-iteration parallel Thinning algorithm
Iterative algorithms using morphological operations such as hit-or-miss, opening and
erosion, etc
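As a brief illustration, scikit-image provides ready-made implementations of such algorithms; the sketch below assumes a binary input image and uses a placeholder file name.

from skimage import io, morphology

binary = io.imread("shapes.png", as_gray=True) > 0.5   # placeholder binary input

skeleton = morphology.skeletonize(binary)   # reduce objects to one-pixel-wide centerlines
thinned = morphology.thin(binary)           # iterative morphological thinning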
Thickening
Thickening is the dual of thinning and thus is equivalent to applying the thinning operation on the
background or on the complement of the set A.
There are many features that depend on boundary descriptors of objects, such as bending energy and curvature. For an irregularly shaped object, the boundary direction is a better representation, although it is not directly used for shape descriptors such as centroid, orientation, and area. Consecutive points on the boundary of a shape give relative position or direction. A 4- or 8-connected chain code is used to represent the boundary of an object by a connected sequence of straight-line segments; an 8-connected numbering scheme is used to represent the directions in this case. The code starts from a beginning location and lists the directions d1, d2, ..., dN. This sequence provides a compact representation of all the information in the boundary, and the directions also represent the slope of the boundary. In Figure 1, an 8-connectivity chain code is displayed, where the boundary description for the boxes with red arrows is 2-1-0-7-7-0-1-1.
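A chain code of this kind can be computed directly from an ordered list of boundary points. The sketch below is illustrative; the direction convention (0 = east, counter-clockwise up to 7 = south-east) and the sample square are assumptions.

# Offsets are (d_col, d_row), with the row axis pointing downwards.
CODES = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
         (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def chain_code(boundary):
    """boundary: ordered (col, row) points of a closed, 8-connected contour."""
    closed = boundary + boundary[:1]
    return [CODES[(c1 - c0, r1 - r0)]
            for (c0, r0), (c1, r1) in zip(closed, closed[1:])]

# Example: a tiny square traversed counter-clockwise
print(chain_code([(0, 1), (1, 1), (1, 0), (0, 0)]))   # -> [0, 2, 4, 6]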
Curvature
The rate of change of the slope is called the curvature. Because a digital boundary is generally jagged, obtaining a true measure of curvature is difficult. The curvature at a single point on the boundary can be defined by its adjacent line segments: the difference between the slopes of two adjacent (straight) line segments is a good measure of the curvature at their point of intersection. The curvature of the boundary at (xi, yi) can therefore be estimated from the change in slope as κi ≈ θi+1 − θi, where θi is the slope angle of the i-th boundary segment.
Curvature (κ) is a local attribute of a shape. The object boundary is traversed clockwise for finding the
curvature. A vertex point is in a convex segment when the change of slope at that point is positive;
otherwise that point is in a concave segment if there is a negative change in slope as shown in Figure
2.
Bending Energy
The descriptor called bending energy is obtained by integrating the squared curvature κ(p) over the boundary length L: EB = ∫ κ(p)² dp, with p running from 0 to L. It is a robust shape descriptor and can be used for matching shapes.
For a perfect circle of radius R the minimum value 2π/R is obtained, and the value is higher for an irregular object. Because a convex object has the minimum value, a rougher object will have a higher value.
The main idea here is to classify a particular image into a number of regions or classes. Thus for each
pixel in the image we need to somehow decide or estimate which class it belongs to. Region-based
segmentation methods attempt to partition or group regions according to common image properties.
These image properties consist of
Intensity values from original images, or computed values based on an image operator
Textures or patterns that are unique to each type of region
Spectral profiles that provide multidimensional image data
Binary images are images whose pixels have only two possible intensity values. Numerically, the two
values are often 0 for black, and either 1 or 255 for white.
The main reason binary images are particularly useful in image processing is that they allow easy separation of an object from the background. The process of segmentation allows each pixel to be labelled as 'background' or 'object' and assigns the corresponding black or white colour.
median thresholding: pick a global threshold value that results in a final image as close
as possible to 50% white, 50% black.
entropy thresholding: another way to pick a global threshold value (a short thresholding sketch follows below)
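A minimal sketch of global thresholding follows; Otsu's method stands in here as one concrete automatic threshold selector (entropy-based selection works analogously), and the file name is a placeholder.

import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder file name

# Median threshold: splits the pixels roughly 50% black / 50% white
t_median = np.median(gray)
binary_median = (gray > t_median).astype(np.uint8) * 255

# Otsu's method: another automatic global threshold
t_otsu, binary_otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)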
Image segmentation is the process of dividing an image into multiple meaningful and homogeneous
regions or objects based on their inherent characteristics, such as color, texture, shape, or brightness.
Image segmentation aims to simplify and/or change the representation of an image into something more
meaningful and easier to analyze. Here, each pixel is labeled. All the pixels belonging to the same
category have a common label assigned to them. The task of segmentation can further be done in two
ways:
Similarity: As the name suggests, the segments are formed by detecting similarity between
image pixels. It is often done by thresholding (see below for more on thresholding). Machine
learning algorithms (such as clustering) are based on this type of approach for image
segmentation.
Discontinuity: Here, the segments are formed based on the change of pixel intensity values
within the image. This strategy is used by line, point, and edge detection techniques to obtain
intermediate segmentation results that may be processed to obtain the final segmented image.
Types of Segmentation
Image segmentation modes are divided into three categories based on the amount and type of
information that should be extracted from the image: Instance, semantic, and panoptic. Let’s look at
these various modes of image segmentation methods.
Also, to understand the three modes of image segmentation, it would be more convenient to know more
about objects and backgrounds.
Objects are the identifiable entities in an image that can be distinguished from each other by assigning unique IDs, while the background refers to parts of the image that cannot be counted, such as the sky, water bodies, and other similar elements. By distinguishing between objects and backgrounds, it becomes easier to understand the different modes of image segmentation and their respective applications. (Connected-component labeling, or CCL, works by scanning an image pixel by pixel to identify connected pixel regions.)
Instance Segmentation
Instance segmentation is a type of image segmentation that involves detecting and segmenting each
object in an image. It is similar to object detection but with the added task of segmenting the object’s
boundaries. The algorithm has no idea of the class of the region, but it separates overlapping objects.
Instance segmentation is useful in applications where individual objects need to be identified and
tracked.
Hierarchical segmentation
In a hierarchical segmentation, an object of interest may be represented by multiple image segments
in finer levels of detail in the segmentation hierarchy. These segments can then be merged into a
surrounding region at coarser levels of detail in the segmentation hierarchy.
Here's how algorithms typically start to construct the hierarchy:
1. Use an initial set of regions that conforms to the finest possible partition.
2. Iteratively merge the most similar pair of adjacent regions.
3. A new node on the graph represents the output region as the parent of the merged regions.
General image segmentation is used as a pre-processing step for solving high-level vision problems,
such as object recognition and image classification.
Spatial clustering
Spatial clustering is a process that groups a set of objects into clusters. Objects within a cluster are
similar, while clusters are as dissimilar as possible.
In image processing, spatial clustering separates multiple features in an image into separate masks, which can then be used for further analysis. For example, PlantCV provides a spatial clustering function that segments image features based on their distance to each other:
plantcv.spatial_clustering(mask, algorithm="DBSCAN", min_cluster_size=5, max_distance=None)
It returns an image showing all clusters colorized and individual masks for each cluster.
Parameters:
mask - Mask/binary image to segment into clusters.
algorithm - Algorithm to use to segregate features in the image. Currently, "DBSCAN" and "OPTICS" are
supported.
"OPTICS" is slower but has better resolution for smaller objects, and "DBSCAN" is faster and useful
for larger features in the image (like separating two plants from each other).
min_cluster_size - The minimum size a feature of the image must be (in pixels) before it can be
considered its own cluster.
max_distance - The maximum distance between two pixels before they can be considered a part of the
same cluster.
When using "DBSCAN," this value must be between 0 and 1. When using "OPTICS," the value is in pixels and depends on the size of your image.
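The same idea can be sketched directly with scikit-learn's DBSCAN, clustering the coordinates of the foreground pixels of a binary mask; the file name and the eps/min_samples values below are illustrative assumptions, not PlantCV defaults.

import cv2
import numpy as np
from sklearn.cluster import DBSCAN

mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)    # placeholder binary mask
coords = np.column_stack(np.nonzero(mask))             # (row, col) of foreground pixels

# eps plays the role of max_distance (in pixels), min_samples of min_cluster_size
labels = DBSCAN(eps=5, min_samples=5).fit_predict(coords)

# One binary mask per cluster (label -1 is DBSCAN's noise class)
cluster_masks = []
for k in set(labels) - {-1}:
    m = np.zeros_like(mask)
    m[tuple(coords[labels == k].T)] = 255
    cluster_masks.append(m)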
Clustering methods used for this purpose are commonly grouped into:
Partitioning methods
Hierarchical methods
Density-based methods
Grid-based methods
Split and Merge
Split and merge segmentation is a technique used in image processing to segment an image. It
involves splitting an image into quadrants based on a homogeneity criterion, and then merging similar
regions to create the segmented result.
The split and merge algorithm has four processing phases and requires several input parameters. These parameters include the regular and relaxed homogeneity predicates and the initial cut-set size. The predicates are used to test for region homogeneity.
The split and merge algorithm carries out the following four processes:
Split
Merge
Grouping
Small-region elimination
The first process of the split and merge algorithm merges quad siblings in a branch.
The basic representational structure is pyramidal. For example, a square region of size m by m at one level of a pyramid has 4 sub-regions of size m/2 by m/2 below it in the pyramid.
In the Region Splitting and Merging method, the entire image is considered at once, and the pixels are
merged together into a single region or split with respect to the entire image.
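The splitting phase can be sketched as a recursive quadtree decomposition driven by a homogeneity predicate. The sketch below is illustrative only: it uses the standard deviation of a block as the predicate and synthetic data, and it omits the merge, grouping, and small-region elimination phases.

import numpy as np

def split(img, r, c, size, min_size=8, tol=10.0):
    """Recursively split a square block until it is homogeneous (std <= tol)
    or reaches min_size; returns a list of (row, col, size) leaf regions."""
    block = img[r:r + size, c:c + size]
    if size <= min_size or block.std() <= tol:
        return [(r, c, size)]
    half = size // 2
    leaves = []
    for dr in (0, half):
        for dc in (0, half):
            leaves += split(img, r + dr, c + dc, half, min_size, tol)
    return leaves

# Synthetic 128 x 128 image: a bright square on a dark background
img = np.zeros((128, 128), dtype=float)
img[32:96, 32:96] = 200.0
leaves = split(img, 0, 0, 128)

A subsequent merge pass would then join adjacent leaves that together still satisfy the homogeneity predicate.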
RULE-BASED IMAGE ANALYSIS: This section presents the integration of a rule based reasoning
system into an image segmentation algorithm. Based on the foundations described in the previous
section, we introduce a novel segmentation algorithm that relies on fuzzy region labeling and rules to
solve the problem of image oversegmentation. A. Semantic region merging Recursive Shortest
Spanning Tree, or simply RSST, is a bottom-up segmentation algorithm that begins from the pixel
level and iteratively merges neighbor regions according to a distance value until certain termination
criteria are satisfied. This distance is calculated based on color and texture characteristics, which are
independent of the area’s size. In every step the two regions with the least distance are merged; visual
characteristics of the new region are extracted and all distances are updated accordingly. We introduce
here a modified version of RSST, called Semantic RSST (S-RSST), that aims to improve the usual approach. The distance between two adjacent regions a and b (vertices va and vb in the graph) is calculated using NEST, in a fashion described later on, and this dissimilarity value is assigned as the weight of the corresponding edge eab. In each step the two regions connected by the least-weight edge are merged and the graph is updated.
This update procedure consists of the following two actions: 1) Re-evaluation of the degrees of
membership of the labels fuzzy set in a weighted average (w.r.t. the regions’ size) fashion. 2) Re-
adjustment of the ARG edges by removing edge eab and re-evaluating the weight of the affected
edges invoking NEST. This procedure continues until the edge e∗ with the least weight in the ARG is
bigger than a threshold: w(e∗) > Tw. This threshold is calculated at the beginning of the algorithm.
Motion-based segmentation
If the goal is image understanding, color/grey-level/texture can help locate interesting zones/objects.
However, a partition based on such criteria will often contain too many regions to be exploitable, interesting objects thus being split into several regions. Often, scenes consist of animated regions of interest (people, vehicles, ...) on some background scene, and in such cases motion is a far more appropriate criterion.
Area Extraction
Feature extraction is a part of the dimensionality reduction process, in which an initial set of raw data is divided and reduced to more manageable groups, making subsequent processing easier. The most important characteristic of these large data sets is that they have a large number of variables, which require a lot of computing resources to process. Feature extraction therefore helps
to get the best feature from those big data sets by selecting and combining variables into features,
thus, effectively reducing the amount of data. These features are easy to process, but still able to
describe the actual data set with accuracy and originality.
Bag of Words: Bag-of-words is one of the most widely used techniques in natural language processing. Words or features are extracted from a sentence, document, website, etc. and then classified by frequency of use, so feature extraction is one of the most important parts of the whole process.
Image Processing: Image processing is one of the most interesting domains. Here we work directly with images in order to understand them, using many techniques, including feature extraction, and algorithms to detect features such as shapes, edges, or motion in a digital image or video.
Auto-encoders: The main purpose of auto-encoders is efficient data coding, which is unsupervised in nature. Feature extraction is applicable here to identify the key features of the data to be coded, learning from the encoding of the original data set to derive new features.
Unit 3
Region Analysis:
Region-based segmentation involves dividing an image into regions with similar characteristics. Each
region is a group of pixels, which the algorithm locates via a seed point. Once the algorithm finds the
seed points, it can grow regions by adding more pixels or shrinking and merging them with other
points.
There are two variants of region-based segmentation:
Region growing − This method recursively grows segments by including neighboring pixels
with similar characteristics. It uses the difference in gray levels for gray regions and the
difference in textures for textured images.
Region splitting − In this method, the whole image is considered a single region. Now to
divide the region into segments it checks for pixels included in the initial region if they
follow the predefined set of criteria. If they follow similar rules they are taken into one
segment.
Top-down approach
First, we need to define the seed pixels: either we define all pixels as seed pixels or choose them randomly. Regions are then grown until every pixel in the image belongs to a region.
Bottom-Up approach
Select seed only from objects of interest. Grow regions only if the similarity criterion
is fulfilled.
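A minimal region-growing sketch (bottom-up, single seed, intensity-difference criterion) is shown below; the 4-connectivity and tolerance value are assumptions for illustration.

import numpy as np
from collections import deque

def region_grow(gray, seed, tol=10):
    """Grow a region from seed = (row, col), adding 4-connected neighbors whose
    intensity differs from the seed intensity by at most tol."""
    h, w = gray.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    seed_val = float(gray[seed])
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not region[nr, nc] \
                    and abs(float(gray[nr, nc]) - seed_val) <= tol:
                region[nr, nc] = True
                queue.append((nr, nc))
    return region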
Similarity Measures:
Similarity measures can be of different types. For a grayscale image the similarity measure can be based on texture and other spatial properties, on the intensity difference within a region, or on the distance between the mean values of regions.
Region merging techniques:
In the region merging technique, we try to combine the regions that contain a single object and separate it from the background. There are many region merging techniques, such as the watershed algorithm and the split and merge algorithm.
Pros:
Since it performs simple threshold calculations, it is fast to perform.
Region-based segmentation works better when the object and background have high
contrast.
Limitations:
It does not produce accurate segmentation results when there are no significant differences between the pixel values of the object and the background.
Region properties
• Many properties can be extracted from an image
region
– area
– length of perimeter
– orientation
– etc.
• These properties can be used for many tasks
– object recognition
– "dimensioning" (measuring sizes of physical objects)
– to assist in higher-level processing
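Such properties can be read off directly with scikit-image's regionprops; the sketch below assumes a placeholder file name and Otsu thresholding to obtain the regions.

from skimage import io, filters, measure

gray = io.imread("parts.png", as_gray=True)        # placeholder file name
binary = gray > filters.threshold_otsu(gray)

labels = measure.label(binary)                     # connected components
for region in measure.regionprops(labels):
    print(region.label,
          region.area,          # area in pixels
          region.perimeter,     # length of perimeter
          region.orientation,   # orientation of the fitted ellipse (radians)
          region.centroid)      # (row, col) centroid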
Spatial Moment Analysis
We analyze the solute transport in the 3D domain by computing the spatial moments of the
concentration distribution along the main direction of flow (y-direction). (52) To this end, we compute
the 1D vertical concentration function cy(y) by averaging local concentration values for each x–
z horizontal slice along the y-direction. The kth raw moment of cy(y) is defined as μk = ∫ y^k cy(y) dy, where the limits of the integration represent the location of the interface (y = 0) and the bottom of the domain (y = H). The zeroth raw moment is thus computed as μ0 ≈ Σn cy,n Δy,
where the numerical approximation of the integral reflects the discrete nature of the experimental CT
data set. Here, the domain is discretized into NH slices of thickness Δy where cy,n is cy computed
at y = n·Δy. The total mass of the solute in the imaged domain is thus obtained as M = μ0S,
where S = Aϕ is the void cross section. One can apply the time derivative of the zeroth moment to calculate the dissolution flux. The first raw moment, μ1, can be applied to compute the location of the center of mass in the longitudinal direction, Ycom = μ1/μ0. The second-order moment is computed about the center of mass as σy² = μ2/μ0 − Ycom².
Determine the least-squares, best-fit gray level intensity plane to the observed gray level pattern of the region R. The least-squares fit to the observed I(r, c) is the gray level intensity plane Î(r, c) = a·r + b·c + d, with the coefficients chosen to minimize the sum of squared residuals Σ over (r, c) in R of [I(r, c) − Î(r, c)]² (a short sketch follows below).
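A minimal sketch of such a fit with NumPy's least-squares solver is shown below; the plane model and variable names follow the description above, and the boolean mask selecting region R is an assumed input.

import numpy as np

def fit_intensity_plane(I, mask):
    """Least-squares fit of a plane I_hat(r, c) = a*r + b*c + d to the gray levels
    of the pixels selected by the boolean mask (the region R)."""
    rows, cols = np.nonzero(mask)
    A = np.column_stack([rows, cols, np.ones_like(rows)]).astype(float)
    coeffs, *_ = np.linalg.lstsq(A, I[rows, cols].astype(float), rcond=None)
    a, b, d = coeffs
    return a, b, d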
Boundary analysis
The boundary of the image is different from the edges in the image. Edges represent the abrupt
change in pixel intensity values, while the boundary of the image is the contour. As the name suggests, a boundary marks where ownership changes: when pixel ownership changes from one surface to another, a boundary arises. An edge is basically a boundary line, but the boundary is the line or location dividing two surfaces.
Signature Properties
A signature is a one-dimensional functional representation of a boundary; a common example is the distance from the centroid to the boundary plotted as a function of angle.
Shape Numbers
Shape number is the smallest magnitude of the first difference of a chain code representation. The
order of a shape number is defined as the number of digits in its representation. Shape order is even
for a closed boundary.
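A small sketch of the computation follows; the 8-direction convention and the sample chain code are illustrative assumptions.

def first_difference(chain):
    """Counter-clockwise steps between consecutive 8-direction chain codes."""
    n = len(chain)
    return [(chain[(i + 1) % n] - chain[i]) % 8 for i in range(n)]

def shape_number(chain):
    """Shape number: the cyclic rotation of the first difference that is smallest
    when its digits are compared in order."""
    diff = first_difference(chain)
    return min(diff[i:] + diff[:i] for i in range(len(diff)))

print(shape_number([0, 2, 4, 6]))   # square boundary -> [2, 2, 2, 2]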
REGIONAL DESCRIPTORS
Simple Descriptors: Area, perimeter, and compactness are simple region descriptors, where compactness = (perimeter)²/area.
Topological Descriptors:
● Rubber-sheet distortions: Topology is the study of properties of a figure that are unaffected by any deformation, as long as there is no tearing or joining of the figure.
● Euler number: The Euler number (E) of a region depends on the number of connected components (C) and holes (H): E = C − H. A connected component of a set is a subset of maximal size such that any two of its points can be joined by a connected curve lying entirely within the subset.
Pattern recognition methods play an important role in SPR, allowing AI systems to more accurately analyze human activities such as facial expressions, handwriting styles, voice commands, etc. These methods can be used to detect changes in a given scene or environment by analyzing its components. As such, they provide valuable insights into how best to optimize AI systems for better performance.
Here are some key benefits of using SPR:
It facilitates the development of advanced solutions for recognizing complex patterns;
It helps identify relationships between data points;
It allows us to gain deeper insight into underlying processes;
It assists in creating models that help guide future decision-making;
Experimental studies have shown that it improves accuracy compared to traditional methods.
The window function w(x, y) can be a rectangular window or a Gaussian filter, constant (or bell-shaped) inside the window and zero outside.
2. The change in intensity for a shift (u, v) can be approximated by E(u, v) ≈ [u v] M [u v]ᵀ, where M is the second-moment matrix
M = Σ(x, y) w(x, y) [ Ix²  Ix·Iy ; Ix·Iy  Iy² ].
The elements of M (through its eigenvalues) indicate whether the region is an edge, corner, or flat region.
(1) Conventional database system as an image database system. The use of a conventional database
system as an image database system is based mainly on relational data models and rarely on
hierarchical. The images are indexed as a set of attributes. At the time of the query, instead of
retrieving by asking for information straight from the images, the information is extracted from
previously calculated image attributes. Languages such as Structured Query Language (SQL) and
Query By Example (QBE) with modifications such as Query by Pictorial Example (QPE) are
common for such systems. This type of retrieval is referred to as attribute-based image retrieval. A
representative prototype system from this class of systems is the system GRIM_DBMS.
(2) Image processing/graphical systems with database functionality. In these systems topological,
vector and graphical representations of the images are stored in the database. The query is usually
based on a command-based language. A representative of this model is the research system SAND
(3) Extended/extensible conventional database system to an image database system. The systems in
this class are extensions over the relational data model to overcome the imposed limitations, by the
flat tabular structure of the relational databases. The retrieval strategy is the same as in the
conventional database system. One of the research systems in this direction is the system GIS
(4) Adaptive image database system. The framework of such a system is a flexible query specification
interface to account for the different interpretations of images. An attempt at defining such systems is made in.
(5) Miscellaneous systems/approaches. Various other approaches are used for building image databases, such as grammar-based, 2-D string based, entity-attribute-relationship semantic network approaches, matching algorithms, etc. In this paper a new General Image DataBase (GIDB) model is
presented. It includes descriptions of:
(1) an image database system; (2) generic image database architecture;
(3) image definition, storage and manipulation languages.
Unit 4
Contour-based shape representation and description
Region borders are most commonly represented in some mathematical form, typically as rectangular pixel co-ordinates expressed as a function of path length n. Other useful representations are:
• Polar co-ordinates: border elements are represented as pairs of angle φ and distance r;
• Tangential co-ordinates: tangential directions θ(xn) of curve points are encoded as a function of path length n.
Back-tracking Algorithm
A backtracking algorithm is a problem-solving algorithm that uses a brute force approach for
finding the desired output.
The brute-force approach tries out all the possible solutions and chooses the desired/best solution.
The term backtracking suggests that if the current solution is not suitable, then backtrack and try other
solutions. Thus, recursion is used in this approach.
This approach is used to solve problems that have multiple solutions. If you want an optimal solution,
you must go for dynamic programming.
Backtracking Algorithm
Backtrack(x):
    if x is not a solution:
        return false
    if x is a new solution:
        add x to the list of solutions
    backtrack(expand x)
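As a concrete, runnable illustration of the idea (not tied to any image-processing task), the sketch below uses backtracking to list every subset of a set of numbers that sums to a target value; the sample numbers and target are arbitrary.

def subsets_with_sum(nums, target):
    solutions = []

    def backtrack(i, chosen, remaining):
        if remaining == 0:                  # current choice is a solution: record it
            solutions.append(list(chosen))
            return
        if i == len(nums) or remaining < 0: # dead end: give up and backtrack
            return
        chosen.append(nums[i])              # expand: include nums[i]
        backtrack(i + 1, chosen, remaining - nums[i])
        chosen.pop()                        # undo the choice (the "backtrack" step)
        backtrack(i + 1, chosen, remaining) # expand: exclude nums[i]

    backtrack(0, [], target)
    return solutions

print(subsets_with_sum([3, 1, 4, 2], 6))    # -> [[3, 1, 2], [4, 2]]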
1. Center of Projection – It is the point where lines of projection that are not parallel to the projection plane appear to meet.
2. View Plane or Projection Plane – The view plane is determined by :
View reference point R0(x0, y0, z0)
View plane normal.
3. Location of an Object – It is specified by a point P that is located in world coordinates at (x,
y, z) location. The objective of perspective projection is to determine the image point P’
whose coordinates are (x’, y’, z’)
Types of Perspective Projection: Perspective projections are classified on the basis of vanishing points (a vanishing point is a point in the image where a parallel line through the center of projection intersects the view plane). We can say that a vanishing point is a point where a projection line intersects the view plane. The classification is as follows:
One Point Perspective Projection – One-point perspective projection occurs when one of the principal axes intersects the projection plane, or in other words when the projection plane is perpendicular to a principal axis.
In the above figure, the z axis intersects the projection plane whereas the x and y axes remain parallel to the projection plane.
Two Point Perspective Projection – Two-point perspective projection occurs when the projection plane intersects two of the principal axes.
In the above figure, the projection plane intersects the x and y axes whereas the z axis remains parallel to the projection plane.
Three Point Perspective Projection – Three-point perspective projection occurs when all three principal axes intersect the projection plane. There is no principal axis parallel to the projection plane.
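A minimal sketch of a one-point perspective projection follows, assuming the center of projection is at the origin, the view direction is along +z, and the view plane is z = d; these conventions are an assumption for illustration.

def project_point(x, y, z, d):
    """Project the world point (x, y, z) onto the view plane z = d."""
    if z == 0:
        raise ValueError("point lies in the plane of the center of projection")
    return (x * d / z, y * d / z)

print(project_point(2.0, 3.0, 10.0, 5.0))   # -> (1.0, 1.5)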
Photogrammetric-from 2D to 3D
2-D image capture
This method begins with the capture of a 2-D image of the spike, followed by processing of the image
to extract the desired trait data (Supporting Information). Spike images were captured using a Nikon
D5000 digital single-lens reflex camera. This camera uses the visible portion of the electromagnetic
spectrum with a resolution of 4288 × 2848 pixels. Images were acquired by using a red, green, blue
complementary metal-oxide semiconductor sensor with a pixel size of 5.5 μm. Each spike was cut 2.5
cm below the lowest spikelet and was then gently laid onto an aluminum plate inside a plastic
container. The aluminum plate was light, rigid, and covered the bottom of the container. A ruler was
placed beside the spike, and an image was taken from 0.5 to 1 m directly above the sample as it rested
on the aluminum plate. This step required less than a minute to accomplish, but at times glare from
sunlight reflecting off the aluminum had to be shaded. Images were made with a focal length of 35
mm to capture details of the spike, and the camera setting was set to automatic so the camera would
account for the varying lighting conditions.
2-D curvature
One physical manifestation of spike architecture was the curvature. Spikes are rarely straight; most
have an inherent curvature, and this trait was investigated for its potential biological significance.
Curvature is very difficult to measure without using image processing.
3-D image capture
The Artec Space Spider used structured light to create a point cloud that was transferred to the Artec
Studio software for processing; the default settings were used for consistency (Figure 3). Capturing 3-
D images can be time consuming and require some practice using the Space Spider, but experience
has decreased the capture time substantially. The post-processing procedure was also more time
consuming than 2-D data processing because of the amount of data that was captured (Figure 4).
Because the Artec Space Spider does not have internal storage capacity, it was connected through a
USB cable to a Surface Book with an Intel Core i7, which ran the Artec Studio capture software and
performed all the post-processing functions. Since this device had not been previously used for the
purpose of HTP, several different approaches were tried to find the best way to capture the phenotypic
characteristics of spikes in 3-D.
Image Matching
Image matching is an important concept in computer vision and object recognition. Images of the
same item can be taken from any angle, with any lighting and scale. This as well as occlusion may
cause problems for recognition. But ultimately, they still show the same item and should be
categorized that way. Therefore, it is best to find descriptive and invariant features in order to
categorize the images.
Key Point Detection
A general and basic approach to finding features is to first find unique key points, or the locations of
the most distinctive features, on each image. Then, normalize the content around the key points and
compute a local descriptor for it. The local descriptor is a vector of numbers that describes the visual
appearance of the key point. After doing so, these can be used to compare and match key points
across different images.
Harris Detector
The Harris Detector is one of the many existing detectors that can be used to find key points in
images. Corners are common key points that the detector tries to look for because there are significant
changes in intensity in all directions.
The change in intensity is summarized by the second-moment matrix
M = Σ(x, y) w(x, y) [ Ix²  Ix·Iy ; Ix·Iy  Iy² ],
where Ix and Iy are the image derivatives in the x and y directions. The eigenvalues of M indicate whether the region is an edge, a corner, or a flat region: two large eigenvalues indicate a corner, one large and one small eigenvalue indicate an edge, and two small eigenvalues indicate a flat region.
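OpenCV exposes this detector as cv2.cornerHarris; the sketch below is illustrative, with a placeholder file name and typical (assumed) parameter values.

import cv2
import numpy as np

gray = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# blockSize: neighborhood over which M is summed; ksize: Sobel aperture; k: Harris constant
response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)

corners = response > 0.01 * response.max()    # threshold the corner response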
2D-Matching
This method uses a reference image known as template and slides it across the entire source image one
pixel at a time to determine the most similar objects. It will produce another image or matrix where
pixel values correspond to how similar our template is to the source image. Thus, when we try to view
the output image, the pixel values of the matched objects will be peaking or will be highlighted.
Let’s take, for example, the source image and selected template illustrated in Figure 1.
As mentioned previously, the resulting image or matrix will highlight the objects that match our
template. This is evident in Figure 2, where the lighter spots or pixels are those that are most similar to
our template. To better see the matched objects, let’s put a bounding box around them.
Figure 2. Resulting image of template matching
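Template matching of this kind is available in OpenCV as cv2.matchTemplate; the sketch below is illustrative, with placeholder file names and the normalized correlation coefficient as the (assumed) similarity measure.

import cv2

source = cv2.imread("source.jpg", cv2.IMREAD_GRAYSCALE)      # placeholder file names
template = cv2.imread("template.jpg", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Slide the template over the source; each output value scores the similarity there
result = cv2.matchTemplate(source, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)                # peak = best match

top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
annotated = cv2.cvtColor(source, cv2.COLOR_GRAY2BGR)
cv2.rectangle(annotated, top_left, bottom_right, (0, 0, 255), 2)   # bounding box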
In the process of hierarchical image matching, the parallaxes from upper levels are transferred to
levels beneath with triangle constraint and epipolar geometrical constraint. At last, outliers are
detected and removed based on local smooth constraint of parallax.
Workflow of our image matching procedure
(1) Image pre-processing, including image transformation from 16-bit to 8-bit, the Wallis filter, and construction of the image pyramid.
(2) Feature point extraction using the Förstner operator. This step is performed to provide the interest points and edges for later image matching.
(3) Coarse image matching using the SIFT algorithm. This step is employed to generate the initial
triangulation for image matching.
(4) Matching propagation in the image pyramid. This step includes feature point and grid point
matching.
(5) Blunders elimination in each level including local smooth constraint of parallax and bidirectional
image matching.
(6) Least squares image matching at the original image level, correcting radiometric and geometric distortion.
Global vs. Local features
Local features, also known as local descriptors, are distinct, informative characteristics of an image or
video frame that are used in computer vision and image processing. They can be used to represent an
image’s content and perform tasks such as object recognition, image retrieval, and tracking.
Local features are typically derived from small patches or regions of an image and are designed to be
invariant to changes in illumination, viewpoint, and other factors that can affect the image’s
appearance. Common local feature extraction techniques include Scale-Invariant Feature Transform
(SIFT), Speeded Up Robust Features (SURF), and Oriented FAST and Rotated BRIEF (ORB).
Global features, also known as global descriptors, are high-level, holistic characteristics of an image
or video frame that are used in computer vision and image processing. Unlike local features, which
describe distinct, informative regions of an image, global features provide a summary of the entire
image or video.
Global features are typically derived by aggregating local features in some way, such as by computing
statistics over the local feature descriptors or by constructing a histogram of the local feature
orientations. Global features can be used to represent an image’s content and perform tasks such as
image classification, scene recognition, and video segmentation.
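The contrast between the two can be sketched in a few lines: ORB descriptors computed around keypoints are local features, while a single color histogram of the whole image is a global feature. The file name and bin counts below are assumptions.

import cv2

img = cv2.imread("scene.jpg")                      # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Local features: ORB keypoints and binary descriptors for small patches
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)

# Global feature: one color histogram summarizing the entire image
hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                    [0, 256, 0, 256, 0, 256])
hist = cv2.normalize(hist, hist).flatten()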
Unit 5
A good knowledge representation design is the most important part of solving the understanding
problem and a small number of relatively simple control strategies is often sufficient for AI systems
to show complex behavior, assuming an appropriately complex knowledge base is available. In other
words, given a rich, well-structured representation of a large set of a priori data and hypotheses a
high degree of control sophistication is not required for intelligent behavior. Other terms of which
regular use will be made are syntax and semantics [Winston, 1984].
The syntax of a representation specifies the symbols that may be used and the ways that they may
be arranged while the semantics of a representation specifies how meaning is embodied in the
symbols and the symbol arrangement allowed by the syntax.
A representation is then a set of syntactic and semantic conventions that make it possible to
describe things. The main knowledge representation techniques used in AI are formal grammars and
languages, predicate logic, production rules, semantic nets, and frames. Note that knowledge
representation data structures are mostly extensions of conventional data structures such as lists,
trees, graphs, tables, hierarchies, sets, rings, nets, and matrices.
Control Strategies
Control strategies are used to guide the processing and analysis of images. They determine how the
system should behave based on the knowledge represented. There are several types of control
strategies:
Rule-based control
Rule-based control involves using a set of predefined rules to make decisions. These rules are
typically based on if-then statements and are designed to capture expert knowledge.
Model-based control
Model-based control involves using a mathematical model of the system to make decisions. The
model represents the relationships between the input images, the processing steps, and the desired
output.
Behavior-based control
Behavior-based control involves defining a set of behaviors or modules that operate independently
and interact with each other. Each behavior is responsible for a specific task, and the system's
behavior emerges from the interactions between these behaviors.
Hybrid control
Hybrid control combines multiple control strategies to take advantage of their strengths. For
example, a system may use rule-based control for high-level decision-making and behavior-based
control for low-level perception and action.
Each control strategy has its own advantages and disadvantages. Rule-based control allows for
explicit knowledge representation but may not handle complex situations well. Model-based control
provides a principled approach but requires accurate models. Behavior-based control is flexible and
robust but may lack global optimization. Hybrid control combines the strengths of different
strategies but may introduce additional complexity.
Information Integration
Integrating these new procedures into knowledge-based systems will not be easy, however. Based
on our experience with the Schema System, when new vision procedures are developed, integrating
them presents yet another set of problems, particularly if the new procedure was developed at
another laboratory. About half the vision procedures of the time were written in C;
since the VISIONS/Schema System was implemented in Lisp, this meant that half of all procedures
had to be re-implemented. Even when the programming languages of the procedure and the vision
system matched, the data structures rarely did. Every algorithm seemingly had its own formats for
images, edges, straight lines and other commonly used geometric data structures. Applying one
procedure to data created by another usually required non-trivial data conversions
Algorithm
The algorithm of the Hough transform in image processing can be summarized as follows:
For each pixel in the image, compute all the possible curves in the parameter space
that pass through that pixel.
For each curve in the parameter space, increment a corresponding accumulator array.
Analyze the accumulator array to detect the presence of simple geometric shapes in
the image.
It works by transforming the image space into a parameter space consisting of three
parameters: the x-coordinate of the center of the circle, the y-coordinate of the center of
the circle, and the radius of the circle.
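For the circle case, OpenCV implements this voting scheme in cv2.HoughCircles; the sketch below is illustrative only, and the file name and voting/radius parameters are assumptions.

import cv2
import numpy as np

gray = cv2.imread("coins.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name
blurred = cv2.medianBlur(gray, 5)                      # reduce noise before voting

# Accumulate votes in the (center_x, center_y, radius) parameter space
circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=30,
                           param1=100, param2=40, minRadius=10, maxRadius=80)

if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        print("circle at", (x, y), "with radius", r)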
The structure of this paper is as follows. Section (2) gives the generative formulation of the problem.
In section (3), we motivate the discriminative approach. Section (4) describes how the algorithm
combines the two methods. In section (5) we give examples on a range of datasets and problems.
We use two types of shape representation in this paper: (I) sparse-point, and (II) continuous-
contour. The choice will depend on the form of the data. Shape matching will be easier if we
have a continuous-contour representation because we are able to exploit knowledge of the
arc-length to obtain shape features which are less ambiguous for matching, and hence more
informative, see section (3.1). But it may only be possible to compute a sparse-point
representation for the target shape (e.g. the target shape may be embedded in an image and an
edge detector will usually not output all the points on its boundary).
I. For the sparse-point representation, we denote the target and source shape respectively by:
X = {xi : i = 1, …, M}, Y = {ya : a = 1, …, N}. (1)
II. For the continuous-contour representation, we denote the target and source shape respectively by:
X = {x(s) : 0 ≤ s ≤ 1}, Y = {y(t) : 0 ≤ t ≤ 1}, (2)
where s and t are the normalized arc-length. In this case, each shape is represented by a 2D
continuous-contour. By sampling points along the contour we can obtain a sparse-point
representation X = {xi : i = 1, …, M}, and Y = {ya : a = 1, …, N}. But we can exploit the
continuous-contour representation to compute additional features that depend on
differentiable properties of the contour such as tangent angles.
Our generative model for shape matching defines a probability distribution for generating the
target X from the source Y by means of a geometric transformation (A, f). There will be
priors P(A), P(f) on the transformation which will be specified in section (2.3).
We also define binary-valued correspondence variables {Vai} such that Vai = 1 if point a on
the source Y matches point i on the target X. These are treated as hidden variables. There is a
prior P(V) which specifies correspondence constraints on the matching (e.g. to constrain that
all points on the source Y must be matched).
The choice of the correspondence constraints, as specified in P(V) is very important. They
must satisfy a trade-off between the modeling and computational requirements. Constraints
that are ideal for modeling purposes can be computationally intractable. The prior P(V) will
be given in section (2.5) and the trade-off discussed.
The full generative model is P(X, V, A, f|Y) = P(X|Y, V, A, f)P(A)P(f)P(V), where the priors
are given in sections (2.3), (2.5). The distribution P(X|Y, V, A, f) is given by:
By using the priors P(A), P(f), P(V) and summing out the V’s, we obtain (this equation
defines ET [A, f; X, Y]):
Principal Components Analysis (PCA). What is it? It is a way of identifying patterns in data, and
expressing the data in such a way as to highlight their similarities and differences. Since patterns in
data can be hard to find in data of high dimension, where the luxury of graphical representation is
not available, PCA is a powerful tool for analysing data. The other main advantage of PCA is that once you have found these patterns in the data, you can compress the data, i.e. by reducing the number of dimensions, without much loss of information.
Method
Step 1: Get some data. In my simple example, I am going to use my own made-up data set. It's only
got 2 dimensions, and the reason why I have chosen this is so that I can provide plots of the data to
show what the PCA analysis is doing at each step. The data I have used is found in Figure 3.1, along
with a plot of that data.
Step 2: Subtract the mean. For PCA to work properly, you have to subtract the mean from each of the data dimensions. The mean subtracted is the average across each dimension. So, all the x values have x̄ (the mean of the x values of all the data points) subtracted, and all the y values have ȳ subtracted from them. This produces a data set whose mean is zero.
Step 3: Calculate the covariance matrix. This is done in exactly the same way as was discussed in section 2.1.4. Since the data is 2-dimensional, the covariance matrix will be 2 × 2. There are no surprises here, so I will just give you the result. Since the non-diagonal elements in this covariance matrix are positive, we should expect that both the x and y variables increase together.
Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix. Since the covariance matrix is square, we can calculate the eigenvectors and
eigenvalues for this matrix. These are rather important, as they tell us useful information about our
data. I will show you why soon. In the meantime, here are the eigenvectors and eigenvalues:
Step 5: Choosing components and forming a feature vector. Here is where the notion of data
compression and reduced dimensionality comes into it. If you look at the eigenvectors and
eigenvalues from the previous section, you
will notice that the eigenvalues are quite different values. In fact, it turns out that the eigenvector with the highest eigenvalue is the principal component of the data set. In our example, the eigenvector with the largest eigenvalue was the one that pointed down the middle of the data. It is the most significant relationship between the data dimensions.
Step 6: Deriving the new data set. This is the final step in PCA, and is also the easiest. Once we have chosen the components (eigenvectors) that we wish to keep in our data and formed a feature vector, we simply take the transpose of the vector and multiply it on the left of the original data set, transposed.
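The whole procedure can be sketched in a few lines of NumPy; the correlated toy data below is generated at random purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 2)) @ np.array([[2.0, 0.8], [0.8, 0.6]])   # toy 2-D data

centered = data - data.mean(axis=0)           # Step 2: subtract the mean
cov = np.cov(centered, rowvar=False)          # Step 3: covariance matrix (2 x 2)
eigvals, eigvecs = np.linalg.eigh(cov)        # Step 4: eigenvalues and eigenvectors

order = np.argsort(eigvals)[::-1]             # Step 5: keep the top component(s)
feature_vector = eigvecs[:, order[:1]]

new_data = feature_vector.T @ centered.T      # Step 6: project onto the component(s)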
Feature extraction
Feature extraction is the backbone of computer vision. It involves the identification and extraction of
important patterns or features from images or videos. These features act as distinctive characteristics that help algorithms distinguish between different objects or elements within the visual data.
Key Applications of Feature Extraction
There are many ways feature extraction is being used. Some of the common real-world applications
are:
1. Object Recognition in Autonomous Vehicles
Computer vision algorithms extract features from the surroundings to identify objects such as
pedestrians, traffic signs, and other vehicles, enabling autonomous vehicles to navigate safely.
2. Medical Image Analysis
In the medical field, feature extraction plays an important role in analyzing medical images, aiding in the detection and diagnosis of diseases and abnormalities.
3. Augmented Reality (AR) and Virtual Reality (VR)
Feature extraction allows AR and VR applications to overlay virtual objects onto the real world
seamlessly.
4. Quality Control in Manufacturing
Computer vision algorithms can identify defects and anomalies in manufacturing processes, ensuring
higher product quality.
5. Facial Recognition Systems
Facial recognition systems utilize feature extraction to identify and authenticate individuals based on
unique facial features.
The Process of Feature Extraction
The process of feature extraction involves the following steps:
1. Image Preprocessing
Before extracting features, images often undergo preprocessing steps like noise reduction, image
enhancement, and normalization.
2. Feature Detection
In this step, algorithms detect key points, edges, or corners in the images using feature detection
techniques.
3. Feature Description
Once key points are identified, feature description techniques transform these points into a
mathematical representation.
4. Feature Matching
Feature matching algorithms compare and match the extracted features with a database to
recognize objects or patterns.
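Steps 2-4 of this process can be sketched with ORB features and a brute-force matcher (pre-processing is omitted); the file names are placeholders and ORB with Hamming-distance matching is just one concrete choice.

import cv2

img1 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Detect key points and describe them
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors (Hamming distance suits ORB's binary descriptors)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

result = cv2.drawMatches(img1, kp1, img2, kp2, matches[:20], None)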