Unit 1 to 5 Computer Vision and Image Processing
Image Display
The processed images are displayed on monitors or other display devices so that the results can be viewed and interpreted.
Software
The image processing software comprises specialized modules that carry out particular functions.
Hardcopy Equipment
Laser printers, film cameras, heat-sensitive (thermal) devices, inkjet printers, and digital media such as optical and CD-ROM discs are just a few examples of the instruments used to record pictures.
Networking
Networking is a necessary component for transmitting image data between computers. Because image processing applications involve vast amounts of data, bandwidth is the most important factor in image transmission.
Types of Filters
There are several types of filters that can be used to modify an image. Some of the most common are blur, sharpen, edge detection, color correction, and noise reduction filters.
Blur: Blur filters are used to soften the edges of an image, creating a more subtle look. This can be used
to make an image look more natural or to reduce the visibility of distracting details.
Sharpen: Sharpen filters are used to make an image appear sharper and clearer. This can be used to
make an image look more detailed or to make the colors more vibrant.
Edge Detection: Edge detection filters are used to accentuate the outlines of an image. This can be used
to make objects stand out more or to create a more dramatic effect.
Color Correction: Color correction filters are used to adjust the hue, saturation, and brightness of an
image. This can be used to make an image look more realistic, or to create a specific color palette.
Noise Reduction: Noise reduction filters are used to remove unwanted noise from an image. This can
be used to make an image look more natural or to reduce the visibility of digital artifacts.
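As a brief illustration, these common filters map directly onto standard OpenCV calls. The following is a minimal sketch, assuming OpenCV and NumPy are installed; the file name and parameter values are placeholders, not taken from the text above.

import cv2
import numpy as np

img = cv2.imread("input.jpg")                      # placeholder file name

# Blur: soften edges with a Gaussian kernel
blurred = cv2.GaussianBlur(img, (5, 5), 0)

# Sharpen: convolve with a simple sharpening kernel
kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
sharpened = cv2.filter2D(img, -1, kernel)

# Edge detection: Canny on the grayscale image
edges = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 100, 200)

# Color correction: simple contrast (alpha) and brightness (beta) adjustment
corrected = cv2.convertScaleAbs(img, alpha=1.2, beta=10)

# Noise reduction: non-local means denoising
denoised = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)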
Image Representation
Image representation is the process of describing an image, which may be two-dimensional or three-dimensional; the choice of representation determines which processing operations can be applied.
Image representation can take many forms in computer science. In basic terms, it refers to the way information about the image and its features, such as color, pixels, resolution, intensity, brightness, and how the image is stored, is conveyed.
Some other pre-processing techniques are as follows:
1. Image editing: Image editing is alteration in images by means of graphics software tools.
2. Image restoration: Image restoration is extraction of the original image from the corrupt image in
order to retrieve the lost information.
3. Independent component analysis (ICA): This method is used to separate multivariate signals
computationally into additive subcomponents.
4. Anisotropic diffusion: This is also known as Perona-Malik Diffusion which is used to reduce noise
of an image without removing any important part of an image.
5. Linear filtering: This technique is used to process time varying input signals and produce output
signals which are subject to constraint of linearity.
6. Pixelation: This process is used to convert printed images into digitized ones.
7. Principal component analysis (PCA): This technique is used for feature extraction.
8. Partial differential equations: These techniques deal with effectively denoising images.
9. Hidden Markov model: This technique is used for image analysis with two-dimensional models.
10. Wavelets: It is a mathematical model which is used for image compression.
11. Self-organizing maps: This technique is used to classify images into a number of classes.
12. Point feature mapping: It is used to detect a specified target in a cluttered scene.
13. Histogram: A histogram plots the number of pixels at each intensity level in the image, typically displayed as a curve (a short example follows below).
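As a small illustration of the last item, a gray-level histogram can be computed and plotted with NumPy and Matplotlib; the file name below is a placeholder.

import cv2
import numpy as np
import matplotlib.pyplot as plt

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# Count the number of pixels at each of the 256 intensity levels
counts, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))

plt.plot(counts)              # the histogram drawn as a curve
plt.xlabel("Intensity level")
plt.ylabel("Number of pixels")
plt.show()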
Morphological Operations
Dilation: The value of the output pixel is the maximum value of all pixels in the neighborhood. In a binary image, a pixel is set to 1 if any of the neighboring pixels have the value 1. Morphological dilation makes objects more visible and fills in small holes in objects. Lines appear thicker, and filled shapes appear larger.
Erosion: The value of the output pixel is the minimum value of all pixels in the neighborhood. In a binary image, a pixel is set to 0 if any of the neighboring pixels have the value 0. Morphological erosion removes floating pixels and thin lines so that only substantive objects remain. Remaining lines appear thinner and shapes appear smaller.
imopen Perform morphological opening. The opening operation erodes an image and then dilates the
eroded image, using the same structuring element for both operations.
Morphological opening is useful for removing small objects and thin lines from an image while
preserving the shape and size of larger objects in the image. For an example, see Use
Morphological Opening to Extract Large Image Features.
imclose Perform morphological closing. The closing operation dilates an image and then erodes the
dilated image, using the same structuring element for both operations.
Morphological closing is useful for filling small holes in an image while preserving the shape
and size of large holes and objects in the image.
bwskel Skeletonize objects in a binary image. The process of skeletonization erodes all objects to
centerlines without changing the essential structure of the objects, such as the existence of holes
and branches.
bwperim Find perimeter of objects in a binary image. A pixel is part of the perimeter if it is nonzero and
it is connected to at least one zero-valued pixel. Therefore, edges of interior holes are considered
part of the object perimeter.
bwhitmiss Perform binary hit-miss transform. The hit-miss transform preserves pixels in a binary image
whose neighborhoods match the shape of one structuring element and do not match the shape of
a second disjoint structuring element.
imtophat Perform a morphological top-hat transform. The top-hat transform opens an image, then subtracts
the opened image from the original image.
The top-hat transform can be used to enhance contrast in a grayscale image with nonuniform
illumination. The transform can also isolate small bright objects in an image.
imbothat Perform a morphological bottom-hat transform. The bottom-hat transform closes an image, then
subtracts the original image from the closed image.
The bottom-hat transform isolates pixels that are darker than other pixels in their neighborhood.
Therefore, the transform can be used to find intensity troughs in a grayscale image.
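The operations above (described here in terms of MATLAB Image Processing Toolbox functions) have close equivalents in OpenCV. The sketch below is only an illustration of those equivalents, with placeholder file names and an assumed 5 x 5 elliptical structuring element.

import cv2

binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)     # placeholder binary image
gray = cv2.imread("gray.png", cv2.IMREAD_GRAYSCALE)         # placeholder grayscale image
se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))   # structuring element

dilated = cv2.dilate(binary, se)                         # maximum of the neighborhood
eroded = cv2.erode(binary, se)                           # minimum of the neighborhood
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, se)    # erosion followed by dilation
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, se)   # dilation followed by erosion
tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, se)    # image minus its opening
bothat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, se)  # closing minus the image

Skeletonization (bwskel) and perimeter extraction (bwperim) have counterparts in scikit-image, for example skimage.morphology.skeletonize and skimage.segmentation.find_boundaries.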
Hit-and-Miss Transform
The hit-and-miss transform is a general binary morphological operation that can
be used to look for particular patterns of foreground and background pixels in an
image. It is actually the basic operation of binary morphology since almost all the
other binary morphological operators can be derived from it. As with other binary
morphological operators it takes as input a binary image and a structuring
element, and produces another binary image as output.
How It Works
The structuring element used in the hit-and-miss is a slight extension to the type that has been
introduced for erosion and dilation, in that it can contain both foreground and background
pixels, rather than just foreground pixels, i.e. both ones and zeros. Note that the simpler type
of structuring element used with erosion and dilation is often depicted containing both ones
and zeros as well, but in that case the zeros really stand for 'don't cares', and are just used to fill out the structuring element to a conveniently shaped kernel, usually a square. In all our illustrations, these 'don't cares' are shown as blanks in the kernel in order to avoid confusion.
An example of the extended kind of structuring element is shown in Figure 1. As usual we
denote foreground pixels using ones, and background pixels using zeros.
The hit-and-miss operation is performed in much the same way as other morphological operators, by
translating the origin of the structuring element to all points in the image, and then comparing the
structuring element with the underlying image pixels. If the foreground and background pixels in the
structuring element exactly match foreground and background pixels in the image, then the pixel
underneath the origin of the structuring element is set to the foreground color. If it doesn't match, then
that pixel is set to the background color.
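In OpenCV the hit-and-miss transform is available through cv2.morphologyEx with cv2.MORPH_HITMISS (OpenCV 3.3 or later). The toy example below is a sketch: the image values and the structuring element (1 = required foreground, -1 = required background, 0 = don't care) are illustrative only.

import cv2
import numpy as np

# Small binary test image with values 0 and 255
img = np.array([[0,   0,   0,   0, 0],
                [0, 255, 255, 255, 0],
                [0, 255, 255, 255, 0],
                [0, 255, 255, 255, 0],
                [0,   0,   0,   0, 0]], dtype=np.uint8)

# Structuring element that looks for an upper-left convex corner
kernel = np.array([[-1, -1,  0],
                   [-1,  1,  1],
                   [ 0,  1,  0]], dtype=int)

corners = cv2.morphologyEx(img, cv2.MORPH_HITMISS, kernel)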
After obtaining the locations of corners in each orientation, we can then simply OR all these
images together to get the final result showing the locations of all right angle convex corners
in any orientation. Figure 3 shows the effect of this corner detection on a simple binary image.
Figure 3: Effect of the hit-and-miss based right angle convex corner detector on a simple binary image. Note that the 'detector' is rather sensitive.
Implementations vary as to how they handle the hit-and-miss transform at the edges of images
where the structuring element overlaps the edge of the image. A simple solution is to simply
assume that any structuring element that overlaps the image does not match underlying pixels,
and hence the corresponding pixel in the output should be set to zero.
Erosion
The erosion process is similar to dilation, but we turn pixels to 'background' rather than 'object'. As before, slide the structuring element across the image and follow these steps:
1. If the origin of the structuring element coincides with a 'background' pixel in the image, there is no change; move to the next pixel.
2. If the origin of the structuring element coincides with an 'object' pixel in the image, and any of the 'object' pixels in the structuring element extend beyond the 'object' pixels in the image, then change the current 'object' pixel in the image (the one under the structuring element origin) to a 'background' pixel.
Algorithm
(i) Read the color image and convert it to gray-scale.
(ii) Develop gradient images using appropriate edge detection function.
(iii) Mark the foreground objects using morphological reconstruction (which gives better results than opening followed by closing).
(iv) Calculate the regional maxima and minima to obtain good foreground markers.
(v) Superimpose the foreground marker image on the original image.
(vi) Clean the edges of the markers using edge reconstruction.
(vii) Compute the background markers.
(viii) Compute the watershed transform of the function.
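The steps above can be sketched with scikit-image and SciPy roughly as follows. This is an illustrative pipeline only; the file name, threshold choice, and marker parameters are assumptions rather than part of the algorithm as stated.

import numpy as np
from scipy import ndimage as ndi
from skimage import io, color, filters, feature, morphology, segmentation

rgb = io.imread("objects.png")                       # placeholder file name
gray = color.rgb2gray(rgb)                           # (i) convert to gray-scale
gradient = filters.sobel(gray)                       # (ii) gradient image

binary = gray > filters.threshold_otsu(gray)         # rough foreground mask
binary = morphology.remove_small_objects(binary, 64)

distance = ndi.distance_transform_edt(binary)        # (iii)-(iv) foreground markers
coords = feature.peak_local_max(distance, min_distance=10, labels=binary)
peaks = np.zeros(distance.shape, dtype=bool)
peaks[tuple(coords.T)] = True
markers, _ = ndi.label(peaks)

labels = segmentation.watershed(gradient, markers, mask=binary)   # (viii) watershed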
Thinning
This is somewhat similar to the erosion or opening operations discussed earlier. As the name suggests, thinning is used to thin the foreground region while preserving its extent and connectivity. Preserving extent means preserving the endpoints of a structure, whereas connectivity can refer to either 4-connectivity or 8-connectivity. Thinning is mostly used for producing skeletons, which serve as image descriptors, and for reducing the output of edge detectors to one-pixel thickness.
There are various algorithms to implement the thinning operation such as
Zhang Suen fast parallel thinning algorithm
Non-max Suppression in Canny Edge Detector
Guo and Hall’s two sub-iteration parallel Thinning algorithm
Iterative algorithms using morphological operations such as hit-or-miss, opening and
erosion, etc
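As a brief illustration, scikit-image provides ready-made implementations of such algorithms; the sketch below assumes a binary input image and uses a placeholder file name.

from skimage import io, morphology

binary = io.imread("shapes.png", as_gray=True) > 0.5   # placeholder binary input

skeleton = morphology.skeletonize(binary)   # reduce objects to one-pixel-wide centerlines
thinned = morphology.thin(binary)           # iterative morphological thinning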
Thickening
Thickening is the dual of thinning and thus is equivalent to applying the thinning operation on the
background or on the complement of the set A.
There are many features that depend on boundary descriptors of objects, such as bending energy and curvature. For an irregularly shaped object, the boundary direction is a better representation, although it is not directly used for shape descriptors such as centroid, orientation, and area. Consecutive points on the boundary of a shape give relative position or direction. A 4- or 8-connected chain code is used to represent the boundary of an object by a connected sequence of straight-line segments; an 8-connected numbering scheme is used to represent the directions in this case. The code starts from a beginning location and lists the directions d1, d2, ..., dN. This sequence provides a compact representation of all the information in the boundary, and the directions also represent the slope of the boundary. In Figure 1, an 8-connectivity chain code is displayed, where the boundary description for the boxes with red arrows is 2-1-0-7-7-0-1-1.
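A chain code of this kind can be computed directly from an ordered list of boundary points. The sketch below is illustrative; the direction convention (0 = east, counter-clockwise up to 7 = south-east) and the sample square are assumptions.

# Offsets are (d_col, d_row), with the row axis pointing downwards.
CODES = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
         (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def chain_code(boundary):
    """boundary: ordered (col, row) points of a closed, 8-connected contour."""
    closed = boundary + boundary[:1]
    return [CODES[(c1 - c0, r1 - r0)]
            for (c0, r0), (c1, r1) in zip(closed, closed[1:])]

# Example: a tiny square traversed counter-clockwise
print(chain_code([(0, 1), (1, 1), (1, 0), (0, 0)]))   # -> [0, 2, 4, 6]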
Curvature
The rate of change of the slope is called the curvature. Because a digital boundary is generally jagged, obtaining a true measure of curvature is difficult. The curvature at a single point on the boundary can be defined by its adjacent line segments: the difference between the slopes of two adjacent (straight) line segments is a good measure of the curvature at their point of intersection. The curvature of the boundary at (xi, yi) can therefore be estimated from the change in slope as κi ≈ θi+1 − θi, where θi is the slope angle of the i-th boundary segment.
Curvature (κ) is a local attribute of a shape. The object boundary is traversed clockwise for finding the
curvature. A vertex point is in a convex segment when the change of slope at that point is positive;
otherwise that point is in a concave segment if there is a negative change in slope as shown in Figure
2.
Bending Energy
The descriptor called bending energy is obtained by integrating the squared curvature κ(p) over the boundary length L: EB = ∫ κ(p)² dp, with p running from 0 to L. It is a robust shape descriptor and can be used for matching shapes.
For a perfect circle of radius R the minimum value 2π/R is obtained, and the value is higher for an irregular object. Because a convex object has the minimum value, a rougher object will have a higher value.
The main idea here is to classify a particular image into a number of regions or classes. Thus for each
pixel in the image we need to somehow decide or estimate which class it belongs to. Region-based
segmentation methods attempt to partition or group regions according to common image properties.
These image properties consist of
Intensity values from original images, or computed values based on an image operator
Textures or patterns that are unique to each type of region
Spectral profiles that provide multidimensional image data
Binary images are images whose pixels have only two possible intensity values. Numerically, the two
values are often 0 for black, and either 1 or 255 for white.
The main reason binary images are particularly useful in image processing is that they allow easy separation of an object from the background. The process of segmentation allows each pixel to be labelled as 'background' or 'object' and assigns the corresponding black or white colour.
median thresholding: pick a global threshold value that results in a final image as close
as possible to 50% white, 50% black.
entropy thresholding: another way to pick a global threshold value (a short thresholding sketch follows below)
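A minimal sketch of global thresholding follows; Otsu's method stands in here as one concrete automatic threshold selector (entropy-based selection works analogously), and the file name is a placeholder.

import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder file name

# Median threshold: splits the pixels roughly 50% black / 50% white
t_median = np.median(gray)
binary_median = (gray > t_median).astype(np.uint8) * 255

# Otsu's method: another automatic global threshold
t_otsu, binary_otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)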
Image segmentation is the process of dividing an image into multiple meaningful and homogeneous
regions or objects based on their inherent characteristics, such as color, texture, shape, or brightness.
Image segmentation aims to simplify and/or change the representation of an image into something more
meaningful and easier to analyze. Here, each pixel is labeled. All the pixels belonging to the same
category have a common label assigned to them. The task of segmentation can further be done in two
ways:
Similarity: As the name suggests, the segments are formed by detecting similarity between
image pixels. It is often done by thresholding (see below for more on thresholding). Machine
learning algorithms (such as clustering) are based on this type of approach for image
segmentation.
Discontinuity: Here, the segments are formed based on the change of pixel intensity values
within the image. This strategy is used by line, point, and edge detection techniques to obtain
intermediate segmentation results that may be processed to obtain the final segmented image.
Types of Segmentation
Image segmentation modes are divided into three categories based on the amount and type of
information that should be extracted from the image: Instance, semantic, and panoptic. Let’s look at
these various modes of image segmentation methods.
Also, to understand the three modes of image segmentation, it would be more convenient to know more
about objects and backgrounds.
Objects are the identifiable entities in an image that can be distinguished from each other by assigning unique IDs, while the background refers to parts of the image that cannot be counted, such as the sky, water bodies, and other similar elements. By distinguishing between objects and backgrounds, it becomes easier to understand the different modes of image segmentation and their respective applications. (Connected-component labeling, or CCL, works by scanning an image pixel by pixel to identify connected pixel regions.)
Instance Segmentation
Instance segmentation is a type of image segmentation that involves detecting and segmenting each
object in an image. It is similar to object detection but with the added task of segmenting the object’s
boundaries. The algorithm has no idea of the class of the region, but it separates overlapping objects.
Instance segmentation is useful in applications where individual objects need to be identified and
tracked.
Hierarchical segmentation
In a hierarchical segmentation, an object of interest may be represented by multiple image segments
in finer levels of detail in the segmentation hierarchy. These segments can then be merged into a
surrounding region at coarser levels of detail in the segmentation hierarchy.
Here's how algorithms typically start to construct the hierarchy:
1. Use an initial set of regions that conforms to the finest possible partition.
2. Iteratively merge the most similar pair of adjacent regions.
3. A new node on the graph represents the output region as the parent of the merged regions.
General image segmentation is used as a pre-processing step for solving high-level vision problems,
such as object recognition and image classification.
Spatial clustering
Spatial clustering is a process that groups a set of objects into clusters. Objects within a cluster are
similar, while clusters are as dissimilar as possible.
In image processing, spatial clustering separates multiple features in an image into separate masks, which can then be used for further analysis. For example, PlantCV provides a spatial clustering function that segments image features based on their distance to each other:
plantcv.spatial_clustering(mask, algorithm="DBSCAN", min_cluster_size=5, max_distance=None)
It returns an image showing all clusters colorized and individual masks for each cluster.
Parameters:
mask - Mask/binary image to segment into clusters.
algorithm - Algorithm to use to segregate features in the image. Currently, "DBSCAN" and "OPTICS" are
supported.
"OPTICS" is slower but has better resolution for smaller objects, and "DBSCAN" is faster and useful
for larger features in the image (like separating two plants from each other).
min_cluster_size - The minimum size a feature of the image must be (in pixels) before it can be
considered its own cluster.
max_distance - The maximum distance between two pixels before they can be considered a part of the
same cluster.
When using "DBSCAN," this value must be between 0 and 1. When using "OPTICS," the value is in pixels and depends on the size of your image.
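The same idea can be sketched directly with scikit-learn's DBSCAN, clustering the coordinates of the foreground pixels of a binary mask; the file name and the eps/min_samples values below are illustrative assumptions, not PlantCV defaults.

import cv2
import numpy as np
from sklearn.cluster import DBSCAN

mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)    # placeholder binary mask
coords = np.column_stack(np.nonzero(mask))             # (row, col) of foreground pixels

# eps plays the role of max_distance (in pixels), min_samples of min_cluster_size
labels = DBSCAN(eps=5, min_samples=5).fit_predict(coords)

# One binary mask per cluster (label -1 is DBSCAN's noise class)
cluster_masks = []
for k in set(labels) - {-1}:
    m = np.zeros_like(mask)
    m[tuple(coords[labels == k].T)] = 255
    cluster_masks.append(m)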
Clustering methods used for this purpose are commonly grouped into:
Partitioning methods
Hierarchical methods
Density-based methods
Grid-based methods
Split and Merge
Split and merge segmentation is a technique used in image processing to segment an image. It
involves splitting an image into quadrants based on a homogeneity criterion, and then merging similar
regions to create the segmented result.
The split and merge algorithm has four processing phases and requires several input parameters. These parameters include the regular and relaxed homogeneity predicates and the initial cut-set size. The predicates are used to test for region homogeneity.
The split and merge algorithm carries out the following four processes:
Split
Merge
Grouping
Small-region elimination
The first process of the split and merge algorithm merges quad siblings in a branch.
The basic representational structure is pyramidal. For example, a square region of size m by m at one level of a pyramid has 4 sub-regions of size m/2 by m/2 below it in the pyramid.
In the Region Splitting and Merging method, the entire image is considered at once, and the pixels are
merged together into a single region or split with respect to the entire image.
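The splitting phase can be sketched as a recursive quadtree decomposition driven by a homogeneity predicate. The sketch below is illustrative only: it uses the standard deviation of a block as the predicate and synthetic data, and it omits the merge, grouping, and small-region elimination phases.

import numpy as np

def split(img, r, c, size, min_size=8, tol=10.0):
    """Recursively split a square block until it is homogeneous (std <= tol)
    or reaches min_size; returns a list of (row, col, size) leaf regions."""
    block = img[r:r + size, c:c + size]
    if size <= min_size or block.std() <= tol:
        return [(r, c, size)]
    half = size // 2
    leaves = []
    for dr in (0, half):
        for dc in (0, half):
            leaves += split(img, r + dr, c + dc, half, min_size, tol)
    return leaves

# Synthetic 128 x 128 image: a bright square on a dark background
img = np.zeros((128, 128), dtype=float)
img[32:96, 32:96] = 200.0
leaves = split(img, 0, 0, 128)

A subsequent merge pass would then join adjacent leaves that together still satisfy the homogeneity predicate.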
RULE-BASED IMAGE ANALYSIS: This section presents the integration of a rule based reasoning
system into an image segmentation algorithm. Based on the foundations described in the previous
section, we introduce a novel segmentation algorithm that relies on fuzzy region labeling and rules to
solve the problem of image oversegmentation. A. Semantic region merging Recursive Shortest
Spanning Tree, or simply RSST, is a bottom-up segmentation algorithm that begins from the pixel
level and iteratively merges neighbor regions according to a distance value until certain termination
criteria are satisfied. This distance is calculated based on color and texture characteristics, which are
independent of the area’s size. In every step the two regions with the least distance are merged; visual
characteristics of the new region are extracted and all distances are updated accordingly. We introduce
here a modified version of RSST, called Semantic RSST (S-RSST), that aims to improve the usual approach. The distance between two adjacent regions a and b (vertices va and vb in the graph) is calculated using NEST, in a fashion described later on, and this dissimilarity value is assigned as the weight of the corresponding edge eab. In each step the two regions connected by the least-weight edge are merged and the graph is updated.
This update procedure consists of the following two actions: 1) Re-evaluation of the degrees of
membership of the labels fuzzy set in a weighted average (w.r.t. the regions’ size) fashion. 2) Re-
adjustment of the ARG edges by removing edge eab and re-evaluating the weight of the affected
edges invoking NEST. This procedure continues until the edge e∗ with the least weight in the ARG is
bigger than a threshold: w(e∗) > Tw. This threshold is calculated at the beginning of the algorithm.
Motion-based segmentation
If the goal is image understanding, color/grey-level/texture can help locate interesting zones/objects.
However, a partition based on such criteria will often contain too many regions to be exploitable, interesting objects thus being split into several regions. Often, scenes consist of animated regions of interest (people, vehicles, ...) on some background scene, and in such cases motion is a far more appropriate criterion.
Area Extraction
Feature extraction is a part of the dimensionality reduction process, in which an initial set of raw data is divided and reduced to more manageable groups, making subsequent processing easier. The most important characteristic of these large data sets is that they have a large number of variables, which require a lot of computing resources to process. Feature extraction therefore helps
to get the best feature from those big data sets by selecting and combining variables into features,
thus, effectively reducing the amount of data. These features are easy to process, but still able to
describe the actual data set with accuracy and originality.
Bag of Words: Bag-of-words is one of the most widely used techniques in natural language processing. Words or features are extracted from a sentence, document, website, etc. and then classified by frequency of use, so feature extraction is one of the most important parts of the whole process.
Image Processing: Image processing is one of the most interesting domains. Here we work directly with images in order to understand them, using many techniques, including feature extraction, and algorithms to detect features such as shapes, edges, or motion in a digital image or video.
Auto-encoders: The main purpose of auto-encoders is efficient data coding, which is unsupervised in nature. Feature extraction is applicable here to identify the key features of the data to be coded, learning from the encoding of the original data set to derive new features.
Unit 3
Region Analysis:
Region-based segmentation involves dividing an image into regions with similar characteristics. Each
region is a group of pixels, which the algorithm locates via a seed point. Once the algorithm finds the
seed points, it can grow regions by adding more pixels or shrinking and merging them with other
points.
There are two variants of region-based segmentation:
Region growing − This method recursively grows segments by including neighboring pixels
with similar characteristics. It uses the difference in gray levels for gray regions and the
difference in textures for textured images.
Region splitting − In this method, the whole image is considered a single region. Now to
divide the region into segments it checks for pixels included in the initial region if they
follow the predefined set of criteria. If they follow similar rules they are taken into one
segment.
Top-down approach
First, we need to define the seed pixels: either we define all pixels as seed pixels or choose them randomly. Regions are then grown until every pixel in the image belongs to a region.
Bottom-Up approach
Select seed only from objects of interest. Grow regions only if the similarity criterion
is fulfilled.
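A minimal region-growing sketch (bottom-up, single seed, intensity-difference criterion) is shown below; the 4-connectivity and tolerance value are assumptions for illustration.

import numpy as np
from collections import deque

def region_grow(gray, seed, tol=10):
    """Grow a region from seed = (row, col), adding 4-connected neighbors whose
    intensity differs from the seed intensity by at most tol."""
    h, w = gray.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    seed_val = float(gray[seed])
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not region[nr, nc] \
                    and abs(float(gray[nr, nc]) - seed_val) <= tol:
                region[nr, nc] = True
                queue.append((nr, nc))
    return region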
Similarity Measures:
Similarity measures can be of different types. For a grayscale image the similarity measure can be based on texture and other spatial properties, on the intensity difference within a region, or on the distance between the mean values of regions.
Region merging techniques:
In the region merging technique, we try to combine the regions that contain a single object and separate it from the background. There are many region merging techniques, such as the watershed algorithm and the split and merge algorithm.
Pros:
Since it performs simple threshold calculations, it is fast to perform.
Region-based segmentation works better when the object and background have high
contrast.
Limitations:
It does not produce accurate segmentation results when there are no significant differences between the pixel values of the object and the background.
Region properties
• Many properties can be extracted from an image
region
– area
– length of perimeter
– orientation
– etc.
• These properties can be used for many tasks
– object recognition
– "dimensioning" (measuring sizes of physical objects)
– to assist in higher-level processing
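Such properties can be read off directly with scikit-image's regionprops; the sketch below assumes a placeholder file name and Otsu thresholding to obtain the regions.

from skimage import io, filters, measure

gray = io.imread("parts.png", as_gray=True)        # placeholder file name
binary = gray > filters.threshold_otsu(gray)

labels = measure.label(binary)                     # connected components
for region in measure.regionprops(labels):
    print(region.label,
          region.area,          # area in pixels
          region.perimeter,     # length of perimeter
          region.orientation,   # orientation of the fitted ellipse (radians)
          region.centroid)      # (row, col) centroid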
Spatial Moment Analysis
We analyze the solute transport in the 3D domain by computing the spatial moments of the
concentration distribution along the main direction of flow (y-direction). (52) To this end, we compute
the 1D vertical concentration function cy(y) by averaging local concentration values for each x–
z horizontal slice along the y-direction. The kth raw moment of cy(y) is defined as μk = ∫ y^k cy(y) dy, where the limits of the integration represent the location of the interface (y = 0) and the bottom of the domain (y = H). The zeroth raw moment is thus computed as μ0 ≈ Σn cy,n Δy,
where the numerical approximation of the integral reflects the discrete nature of the experimental CT
data set. Here, the domain is discretized into NH slices of thickness Δy where cy,n is cy computed
at y = n·Δy. The total mass of the solute in the imaged domain is thus obtained as M = μ0S,
where S = Aϕ is the void cross section. One can apply the time derivative of the zeroth moment to calculate the dissolution flux. The first raw moment, μ1, can be applied to compute the location of the center of mass in the longitudinal direction, Ycom = μ1/μ0. The second-order moment is computed about the center of mass as σy² = μ2/μ0 − Ycom².
Determine the least-squares, best-fit gray level intensity plane to the observed gray level pattern of the region R. The least-squares fit to the observed I(r, c) is the gray level intensity plane Î(r, c) = a·r + b·c + d, with the coefficients chosen to minimize the sum of squared residuals Σ over (r, c) in R of [I(r, c) − Î(r, c)]² (a short sketch follows below).
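A minimal sketch of such a fit with NumPy's least-squares solver is shown below; the plane model and variable names follow the description above, and the boolean mask selecting region R is an assumed input.

import numpy as np

def fit_intensity_plane(I, mask):
    """Least-squares fit of a plane I_hat(r, c) = a*r + b*c + d to the gray levels
    of the pixels selected by the boolean mask (the region R)."""
    rows, cols = np.nonzero(mask)
    A = np.column_stack([rows, cols, np.ones_like(rows)]).astype(float)
    coeffs, *_ = np.linalg.lstsq(A, I[rows, cols].astype(float), rcond=None)
    a, b, d = coeffs
    return a, b, d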
Boundary analysis
The boundary of the image is different from the edges in the image. Edges represent the abrupt
change in pixel intensity values, while the boundary of the image is the contour. As the name suggests, a boundary marks where ownership changes: when pixel ownership changes from one surface to another, a boundary arises. An edge is basically a boundary line, but the boundary is the line or location dividing two surfaces.
Signature Properties
A signature is a one-dimensional functional representation of a boundary; a common example is the distance from the centroid to the boundary plotted as a function of angle.
Shape Numbers
Shape number is the smallest magnitude of the first difference of a chain code representation. The
order of a shape number is defined as the number of digits in its representation. Shape order is even
for a closed boundary.
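A small sketch of the computation follows; the 8-direction convention and the sample chain code are illustrative assumptions.

def first_difference(chain):
    """Counter-clockwise steps between consecutive 8-direction chain codes."""
    n = len(chain)
    return [(chain[(i + 1) % n] - chain[i]) % 8 for i in range(n)]

def shape_number(chain):
    """Shape number: the cyclic rotation of the first difference that is smallest
    when its digits are compared in order."""
    diff = first_difference(chain)
    return min(diff[i:] + diff[:i] for i in range(len(diff)))

print(shape_number([0, 2, 4, 6]))   # square boundary -> [2, 2, 2, 2]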
REGIONAL DESCRIPTORS
Simple Descriptors: Area, perimeter, and compactness are simple region descriptors, where compactness = (perimeter)²/area.
Topological Descriptors:
● Rubber-sheet distortions: Topology is the study of properties of a figure that are unaffected by any deformation, as long as there is no tearing or joining of the figure.
● Euler number: The Euler number (E) of a region depends on the number of connected components (C) and holes (H): E = C − H. A connected component of a set is a subset of maximal size such that any two of its points can be joined by a connected curve lying entirely within the subset.
Pattern recognition methods play an important role in SPR, allowing AI systems to more accurately analyze human activities such as facial expressions, handwriting styles, voice commands, etc. These methods can be used to detect changes in a given scene or environment by analyzing its components. As such, they provide valuable insights into how best to optimize AI systems for better performance.
Here are some key benefits of using SPR:
It facilitates the development of advanced solutions for recognizing complex patterns;
It helps identify relationships between data points;
It allows us to gain deeper insight into underlying processes;
It assists in creating models that help guide future decision-making;
Experimental studies have shown that it improves accuracy compared to traditional methods.
The window function w(x, y) can be a rectangular window or a Gaussian filter, constant (or bell-shaped) inside the window and zero outside.
2. The change in intensity for a shift (u, v) can be approximated by E(u, v) ≈ [u v] M [u v]ᵀ, where M is the second-moment matrix
M = Σ(x, y) w(x, y) [ Ix²  Ix·Iy ; Ix·Iy  Iy² ].
The elements of M (through its eigenvalues) indicate whether the region is an edge, corner, or flat region.
(1) Conventional database system as an image database system. The use of a conventional database
system as an image database system is based mainly on relational data models and rarely on
hierarchical. The images are indexed as a set of attributes. At the time of the query, instead of
retrieving by asking for information straight from the images, the information is extracted from
previously calculated image attributes. Languages such as Structured Query Language (SQL) and
Query By Example (QBE) with modifications such as Query by Pictorial Example (QPE) are
common for such systems. This type of retrieval is referred to as attribute-based image retrieval. A
representative prototype system from this class of systems is the system GRIM_DBMS.
(2) Image processing/graphical systems with database functionality. In these systems topological,
vector and graphical representations of the images are stored in the database. The query is usually
based on a command-based language. A representative of this model is the research system SAND
(3) Extended/extensible conventional database system to an image database system. The systems in
this class are extensions over the relational data model to overcome the imposed limitations, by the
flat tabular structure of the relational databases. The retrieval strategy is the same as in the
conventional database system. One of the research systems in this direction is the system GIS
(4) Adaptive image database system. The framework of such a system is a flexible query specification
interface to account for the different interpretations of images. An attempt at defining such systems is made in.
(5) Miscellaneous systems/approaches. Various other approaches are used for building image databases, such as grammar-based, 2-D string based, entity-attribute-relationship semantic network approaches, matching algorithms, etc. In this paper a new General Image DataBase (GIDB) model is
presented. It includes descriptions of:
(1) an image database system; (2) generic image database architecture;
(3) image definition, storage and manipulation languages.
Unit 4
Contour-based shape representation and description
Region borders are most commonly represented in some mathematical form, typically as rectangular pixel co-ordinates expressed as a function of path length n. Other useful representations are:
• Polar co-ordinates: border elements are represented as pairs of angle φ and distance r;
• Tangential co-ordinates: tangential directions θ(xn) of curve points are encoded as a function of path length n.
Back-tracking Algorithm
A backtracking algorithm is a problem-solving algorithm that uses a brute force approach for
finding the desired output.
The brute-force approach tries out all the possible solutions and chooses the desired/best solution.
The term backtracking suggests that if the current solution is not suitable, then backtrack and try other
solutions. Thus, recursion is used in this approach.
This approach is used to solve problems that have multiple solutions. If you want an optimal solution,
you must go for dynamic programming.
Backtracking Algorithm
Backtrack(x):
    if x is not a solution:
        return false
    if x is a new solution:
        add x to the list of solutions
    backtrack(expand x)
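As a concrete, runnable illustration of the idea (not tied to any image-processing task), the sketch below uses backtracking to list every subset of a set of numbers that sums to a target value; the sample numbers and target are arbitrary.

def subsets_with_sum(nums, target):
    solutions = []

    def backtrack(i, chosen, remaining):
        if remaining == 0:                  # current choice is a solution: record it
            solutions.append(list(chosen))
            return
        if i == len(nums) or remaining < 0: # dead end: give up and backtrack
            return
        chosen.append(nums[i])              # expand: include nums[i]
        backtrack(i + 1, chosen, remaining - nums[i])
        chosen.pop()                        # undo the choice (the "backtrack" step)
        backtrack(i + 1, chosen, remaining) # expand: exclude nums[i]

    backtrack(0, [], target)
    return solutions

print(subsets_with_sum([3, 1, 4, 2], 6))    # -> [[3, 1, 2], [4, 2]]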
1. Center of Projection – It is the point where lines of projection that are not parallel to the projection plane appear to meet.
2. View Plane or Projection Plane – The view plane is determined by :
View reference point R0(x0, y0, z0)
View plane normal.
3. Location of an Object – It is specified by a point P that is located in world coordinates at (x,
y, z) location. The objective of perspective projection is to determine the image point P’
whose coordinates are (x’, y’, z’)
Types of Perspective Projection: Perspective projections are classified on the basis of vanishing points (a vanishing point is a point in the image where a parallel line through the center of projection intersects the view plane). We can say that a vanishing point is a point where a projection line intersects the view plane. The classification is as follows:
One Point Perspective Projection – One-point perspective projection occurs when one of the principal axes intersects the projection plane, or in other words when the projection plane is perpendicular to a principal axis.
In the above figure, the z axis intersects the projection plane whereas the x and y axes remain parallel to the projection plane.
Two Point Perspective Projection – Two-point perspective projection occurs when the projection plane intersects two of the principal axes.
In the above figure, the projection plane intersects the x and y axes whereas the z axis remains parallel to the projection plane.
Three Point Perspective Projection – Three-point perspective projection occurs when all three principal axes intersect the projection plane. There is no principal axis parallel to the projection plane.
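A minimal sketch of a one-point perspective projection follows, assuming the center of projection is at the origin, the view direction is along +z, and the view plane is z = d; these conventions are an assumption for illustration.

def project_point(x, y, z, d):
    """Project the world point (x, y, z) onto the view plane z = d."""
    if z == 0:
        raise ValueError("point lies in the plane of the center of projection")
    return (x * d / z, y * d / z)

print(project_point(2.0, 3.0, 10.0, 5.0))   # -> (1.0, 1.5)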
Photogrammetric-from 2D to 3D
2-D image capture
This method begins with the capture of a 2-D image of the spike, followed by processing of the image
to extract the desired trait data (Supporting Information). Spike images were captured using a Nikon
D5000 digital single-lens reflex camera. This camera uses the visible portion of the electromagnetic
spectrum with a resolution of 4288 × 2848 pixels. Images were acquired by using a red, green, blue
complementary metal-oxide semiconductor sensor with a pixel size of 5.5 μm. Each spike was cut 2.5
cm below the lowest spikelet and was then gently laid onto an aluminum plate inside a plastic
container. The aluminum plate was light, rigid, and covered the bottom of the container. A ruler was
placed beside the spike, and an image was taken from 0.5 to 1 m directly above the sample as it rested
on the aluminum plate. This step required less than a minute to accomplish, but at times glare from
sunlight reflecting off the aluminum had to be shaded. Images were made with a focal length of 35
mm to capture details of the spike, and the camera setting was set to automatic so the camera would
account for the varying lighting conditions.
2-D curvature
One physical manifestation of spike architecture was the curvature. Spikes are rarely straight; most
have an inherent curvature, and this trait was investigated for its potential biological significance.
Curvature is very difficult to measure without using image processing.
3-D image capture
The Artec Space Spider used structured light to create a point cloud that was transferred to the Artec
Studio software for processing; the default settings were used for consistency (Figure 3). Capturing 3-
D images can be time consuming and require some practice using the Space Spider, but experience
has decreased the capture time substantially. The post-processing procedure was also more time
consuming than 2-D data processing because of the amount of data that was captured (Figure 4).
Because the Artec Space Spider does not have internal storage capacity, it was connected through a
USB cable to a Surface Book with an Intel Core i7, which ran the Artec Studio capture software and
performed all the post-processing functions. Since this device had not been previously used for the
purpose of HTP, several different approaches were tried to find the best way to capture the phenotypic
characteristics of spikes in 3-D.
Image Matching
Image matching is an important concept in computer vision and object recognition. Images of the
same item can be taken from any angle, with any lighting and scale. This as well as occlusion may
cause problems for recognition. But ultimately, they still show the same item and should be
categorized that way. Therefore, it is best to find descriptive and invariant features in order to
categorize the images.
Key Point Detection
A general and basic approach to finding features is to first find unique key points, or the locations of
the most distinctive features, on each image. Then, normalize the content around the key points and
compute a local descriptor for it. The local descriptor is a vector of numbers that describes the visual
appearance of the key point. After doing so, these can be used to compare and match key points
across different images.
Harris Detector
The Harris Detector is one of the many existing detectors that can be used to find key points in
images. Corners are common key points that the detector tries to look for because there are significant
changes in intensity in all directions.
The change in intensity is summarized by the second-moment matrix
M = Σ(x, y) w(x, y) [ Ix²  Ix·Iy ; Ix·Iy  Iy² ],
where Ix and Iy are the image derivatives in the x and y directions. The eigenvalues of M indicate whether the region is an edge, a corner, or a flat region: two large eigenvalues indicate a corner, one large and one small eigenvalue indicate an edge, and two small eigenvalues indicate a flat region.
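OpenCV exposes this detector as cv2.cornerHarris; the sketch below is illustrative, with a placeholder file name and typical (assumed) parameter values.

import cv2
import numpy as np

gray = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# blockSize: neighborhood over which M is summed; ksize: Sobel aperture; k: Harris constant
response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)

corners = response > 0.01 * response.max()    # threshold the corner response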
2D-Matching
This method uses a reference image known as template and slides it across the entire source image one
pixel at a time to determine the most similar objects. It will produce another image or matrix where
pixel values correspond to how similar our template is to the source image. Thus, when we try to view
the output image, the pixel values of the matched objects will be peaking or will be highlighted.
Let’s take, for example, the source image and selected template illustrated in Figure 1.
As mentioned previously, the resulting image or matrix will highlight the objects that match our
template. This is evident in Figure 2, where the lighter spots or pixels are those that are most similar to
our template. To better see the matched objects, let’s put a bounding box around them.
Figure 2. Resulting image of template matching
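Template matching of this kind is available in OpenCV as cv2.matchTemplate; the sketch below is illustrative, with placeholder file names and the normalized correlation coefficient as the (assumed) similarity measure.

import cv2

source = cv2.imread("source.jpg", cv2.IMREAD_GRAYSCALE)      # placeholder file names
template = cv2.imread("template.jpg", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Slide the template over the source; each output value scores the similarity there
result = cv2.matchTemplate(source, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)                # peak = best match

top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
annotated = cv2.cvtColor(source, cv2.COLOR_GRAY2BGR)
cv2.rectangle(annotated, top_left, bottom_right, (0, 0, 255), 2)   # bounding box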
In the process of hierarchical image matching, the parallaxes from upper levels are transferred to
levels beneath with triangle constraint and epipolar geometrical constraint. At last, outliers are
detected and removed based on local smooth constraint of parallax.
Workflow of our image matching procedure
(1) Image pre-processing, including image transformation from 16-bit to 8-bit, the Wallis filter, and construction of the image pyramid.
(2) Feature point extraction using the Förstner operator. This step is performed to provide the interest points and edges for later image matching.
(3) Coarse image matching using the SIFT algorithm. This step is employed to generate the initial
triangulation for image matching.
(4) Matching propagation in the image pyramid. This step includes feature point and grid point
matching.
(5) Blunders elimination in each level including local smooth constraint of parallax and bidirectional
image matching.
(6) Least squares image matching at the original image level, correcting radiometric and geometric distortion.
Global vs. Local features
Local features, also known as local descriptors, are distinct, informative characteristics of an image or
video frame that are used in computer vision and image processing. They can be used to represent an
image’s content and perform tasks such as object recognition, image retrieval, and tracking.
Local features are typically derived from small patches or regions of an image and are designed to be
invariant to changes in illumination, viewpoint, and other factors that can affect the image’s
appearance. Common local feature extraction techniques include Scale-Invariant Feature Transform
(SIFT), Speeded Up Robust Features (SURF), and Oriented FAST and Rotated BRIEF (ORB).
Global features, also known as global descriptors, are high-level, holistic characteristics of an image
or video frame that are used in computer vision and image processing. Unlike local features, which
describe distinct, informative regions of an image, global features provide a summary of the entire
image or video.
Global features are typically derived by aggregating local features in some way, such as by computing
statistics over the local feature descriptors or by constructing a histogram of the local feature
orientations. Global features can be used to represent an image’s content and perform tasks such as
image classification, scene recognition, and video segmentation.
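The contrast between the two can be sketched in a few lines: ORB descriptors computed around keypoints are local features, while a single color histogram of the whole image is a global feature. The file name and bin counts below are assumptions.

import cv2

img = cv2.imread("scene.jpg")                      # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Local features: ORB keypoints and binary descriptors for small patches
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)

# Global feature: one color histogram summarizing the entire image
hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                    [0, 256, 0, 256, 0, 256])
hist = cv2.normalize(hist, hist).flatten()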
Unit 5
A good knowledge representation design is the most important part of solving the understanding
problem and a small number of relatively simple control strategies is often sufficient for AI systems
to show complex behavior, assuming an appropriately complex knowledge base is available. In other
words, given a rich, well-structured representation of a large set of a priori data and hypotheses a
high degree of control sophistication is not required for intelligent behavior. Other terms of which
regular use will be made are syntax and semantics [Winston, 1984].
The syntax of a representation specifies the symbols that may be used and the ways that they may
be arranged while the semantics of a representation specifies how meaning is embodied in the
symbols and the symbol arrangement allowed by the syntax.
A representation is then a set of syntactic and semantic conventions that make it possible to
describe things. The main knowledge representation techniques used in AI are formal grammars and
languages, predicate logic, production rules, semantic nets, and frames. Note that knowledge
representation data structures are mostly extensions of conventional data structures such as lists,
trees, graphs, tables, hierarchies, sets, rings, nets, and matrices.
Control Strategies
Control strategies are used to guide the processing and analysis of images. They determine how the
system should behave based on the knowledge represented. There are several types of control
strategies:
Rule-based control
Rule-based control involves using a set of predefined rules to make decisions. These rules are
typically based on if-then statements and are designed to capture expert knowledge.
Model-based control
Model-based control involves using a mathematical model of the system to make decisions. The
model represents the relationships between the input images, the processing steps, and the desired
output.
Behavior-based control
Behavior-based control involves defining a set of behaviors or modules that operate independently
and interact with each other. Each behavior is responsible for a specific task, and the system's
behavior emerges from the interactions between these behaviors.
Hybrid control
Hybrid control combines multiple control strategies to take advantage of their strengths. For
example, a system may use rule-based control for high-level decision-making and behavior-based
control for low-level perception and action.
Each control strategy has its own advantages and disadvantages. Rule-based control allows for
explicit knowledge representation but may not handle complex situations well. Model-based control
provides a principled approach but requires accurate models. Behavior-based control is flexible and
robust but may lack global optimization. Hybrid control combines the strengths of different
strategies but may introduce additional complexity.
Information Integration
Integrating these new procedures into knowledge-based systems will not be easy, however. Based
on our experience with the Schema System, when new vision procedures are developed, integrating
them presents yet another set of problems, particularly if the new procedure was developed at
another laboratory. About half the vision procedures of the time were written in C;
since the VISIONS/Schema System was implemented in Lisp, this meant that half of all procedures
had to be re-implemented. Even when the programming languages of the procedure and the vision
system matched, the data structures rarely did. Every algorithm seemingly had its own formats for
images, edges, straight lines and other commonly used geometric data structures. Applying one
procedure to data created by another usually required non-trivial data conversions
Algorithm
The algorithm of the Hough transform in image processing can be summarized as follows:
For each pixel in the image, compute all the possible curves in the parameter space
that pass through that pixel.
For each curve in the parameter space, increment a corresponding accumulator array.
Analyze the accumulator array to detect the presence of simple geometric shapes in
the image.
It works by transforming the image space into a parameter space consisting of three
parameters: the x-coordinate of the center of the circle, the y-coordinate of the center of
the circle, and the radius of the circle.
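For the circle case, OpenCV implements this voting scheme in cv2.HoughCircles; the sketch below is illustrative only, and the file name and voting/radius parameters are assumptions.

import cv2
import numpy as np

gray = cv2.imread("coins.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name
blurred = cv2.medianBlur(gray, 5)                      # reduce noise before voting

# Accumulate votes in the (center_x, center_y, radius) parameter space
circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=30,
                           param1=100, param2=40, minRadius=10, maxRadius=80)

if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        print("circle at", (x, y), "with radius", r)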
The structure of this paper is as follows. Section (2) gives the generative formulation of the problem.
In section (3), we motivate the discriminative approach. Section (4) describes how the algorithm
combines the two methods. In section (5) we give examples on a range of datasets and problems.
We use two types of shape representation in this paper: (I) sparse-point, and (II) continuous-
contour. The choice will depend on the form of the data. Shape matching will be easier if we
have a continuous-contour representation because we are able to exploit knowledge of the
arc-length to obtain shape features which are less ambiguous for matching, and hence more
informative, see section (3.1). But it may only be possible to compute a sparse-point
representation for the target shape (e.g. the target shape may be embedded in an image and an
edge detector will usually not output all the points on its boundary).
I. For the sparse-point representation, we denote the target and source shape respectively by:
X = {xi : i = 1, …, M}, Y = {ya : a = 1, …, N}. (1)
II. For the continuous-contour representation, we denote the target and source shape respectively by:
X = {x(s) : 0 ≤ s ≤ 1}, Y = {y(t) : 0 ≤ t ≤ 1}, (2)
where s and t are the normalized arc-length. In this case, each shape is represented by a 2D
continuous-contour. By sampling points along the contour we can obtain a sparse-point
representation X = {xi : i = 1, …, M}, and Y = {ya : a = 1, …, N}. But we can exploit the
continuous-contour representation to compute additional features that depend on
differentiable properties of the contour such as tangent angles.
Our generative model for shape matching defines a probability distribution for generating the
target X from the source Y by means of a geometric transformation (A, f). There will be
priors P(A), P(f) on the transformation which will be specified in section (2.3).
We also define binary-valued correspondence variables {Vai} such that Vai = 1 if point a on
the source Y matches point i on the target X. These are treated as hidden variables. There is a
prior P(V) which specifies correspondence constraints on the matching (e.g. to constrain that
all points on the source Y must be matched).
The choice of the correspondence constraints, as specified in P(V) is very important. They
must satisfy a trade-off between the modeling and computational requirements. Constraints
that are ideal for modeling purposes can be computationally intractable. The prior P(V) will
be given in section (2.5) and the trade-off discussed.
The full generative model is P(X, V, A, f|Y) = P(X|Y, V, A, f)P(A)P(f)P(V), where the priors
are given in sections (2.3), (2.5). The distribution P(X|Y, V, A, f) is given by:
By using the priors P(A), P(f), P(V) and summing out the V’s, we obtain (this equation
defines ET [A, f; X, Y]):
Principal Components Analysis (PCA). What is it? It is a way of identifying patterns in data, and
expressing the data in such a way as to highlight their similarities and differences. Since patterns in
data can be hard to find in data of high dimension, where the luxury of graphical representation is
not available, PCA is a powerful tool for analysing data. The other main advantage of PCA is that once you have found these patterns in the data, you can compress the data, i.e. by reducing the number of dimensions, without much loss of information.
Method
Step 1: Get some data. In my simple example, I am going to use my own made-up data set. It's only
got 2 dimensions, and the reason why I have chosen this is so that I can provide plots of the data to
show what the PCA analysis is doing at each step. The data I have used is found in Figure 3.1, along
with a plot of that data.
Step 2: Subtract the mean. For PCA to work properly, you have to subtract the mean from each of the data dimensions. The mean subtracted is the average across each dimension. So, all the x values have x̄ (the mean of the x values of all the data points) subtracted, and all the y values have ȳ subtracted from them. This produces a data set whose mean is zero.
Step 3: Calculate the covariance matrix. This is done in exactly the same way as was discussed in section 2.1.4. Since the data is 2-dimensional, the covariance matrix will be 2 × 2. There are no surprises here, so I will just give you the result. Since the non-diagonal elements in this covariance matrix are positive, we should expect that both the x and y variables increase together.
Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix. Since the covariance matrix is square, we can calculate the eigenvectors and
eigenvalues for this matrix. These are rather important, as they tell us useful information about our
data. I will show you why soon. In the meantime, here are the eigenvectors and eigenvalues:
Step 5: Choosing components and forming a feature vector. Here is where the notion of data
compression and reduced dimensionality comes into it. If you look at the eigenvectors and
eigenvalues from the previous section, you
will notice that the eigenvalues are quite different values. In fact, it turns out that the eigenvector with the highest eigenvalue is the principal component of the data set. In our example, the eigenvector with the largest eigenvalue was the one that pointed down the middle of the data. It is the most significant relationship between the data dimensions.
Step 6: Deriving the new data set. This is the final step in PCA, and is also the easiest. Once we have chosen the components (eigenvectors) that we wish to keep in our data and formed a feature vector, we simply take the transpose of the vector and multiply it on the left of the original data set, transposed.
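The whole procedure can be sketched in a few lines of NumPy; the correlated toy data below is generated at random purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 2)) @ np.array([[2.0, 0.8], [0.8, 0.6]])   # toy 2-D data

centered = data - data.mean(axis=0)           # Step 2: subtract the mean
cov = np.cov(centered, rowvar=False)          # Step 3: covariance matrix (2 x 2)
eigvals, eigvecs = np.linalg.eigh(cov)        # Step 4: eigenvalues and eigenvectors

order = np.argsort(eigvals)[::-1]             # Step 5: keep the top component(s)
feature_vector = eigvecs[:, order[:1]]

new_data = feature_vector.T @ centered.T      # Step 6: project onto the component(s)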
Feature extraction
Feature extraction is the backbone of computer vision. It involves the identification and extraction of
important patterns or features from images or videos. These features act as distinctive characteristics that help algorithms distinguish between different objects or elements within the visual data.
Key Applications of Feature Extraction
There are many ways feature extraction is being used. Some of the common real-world applications
are:
1. Object Recognition in Autonomous Vehicles
Computer vision algorithms extract features from the surroundings to identify objects such as
pedestrians, traffic signs, and other vehicles, enabling autonomous vehicles to navigate safely.
2. Medical Image Analysis
In the medical field, feature extraction plays an important role in analyzing medical images, aiding in the detection and diagnosis of diseases and abnormalities.
3. Augmented Reality (AR) and Virtual Reality (VR)
Feature extraction allows AR and VR applications to overlay virtual objects onto the real world
seamlessly.
4. Quality Control in Manufacturing
Computer vision algorithms can identify defects and anomalies in manufacturing processes, ensuring
higher product quality.
5. Facial Recognition Systems
Facial recognition systems utilize feature extraction to identify and authenticate individuals based on
unique facial features.
The Process of Feature Extraction
The process of feature extraction involves the following steps:
1. Image Preprocessing
Before extracting features, images often undergo preprocessing steps like noise reduction, image
enhancement, and normalization.
2. Feature Detection
In this step, algorithms detect key points, edges, or corners in the images using feature detection
techniques.
3. Feature Description
Once key points are identified, feature description techniques transform these points into a
mathematical representation.
4. Feature Matching
Feature matching algorithms compare and match the extracted features with a database to
recognize objects or patterns.
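Steps 2-4 of this process can be sketched with ORB features and a brute-force matcher (pre-processing is omitted); the file names are placeholders and ORB with Hamming-distance matching is just one concrete choice.

import cv2

img1 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Detect key points and describe them
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors (Hamming distance suits ORB's binary descriptors)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

result = cv2.drawMatches(img1, kp1, img2, kp2, matches[:20], None)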