Computer Vision Lecture Notes Unit 1
1. Computer Vision:
Computer vision applications are diverse and found in various fields, including
healthcare (medical image analysis), autonomous vehicles, surveillance,
augmented reality, robotics, industrial automation, and more. Advances in deep
learning, especially convolutional neural networks (CNNs), have significantly
contributed to the progress and success of computer vision tasks by enabling
efficient feature learning from large datasets.
Geometric primitives and transformations are fundamental concepts in computer graphics and
computer vision. They form the basis for representing and manipulating visual elements in both
2D and 3D spaces. Let's explore each of these concepts:
Geometric Primitives:
1. Points: Represented by coordinates (x, y) in 2D or (x, y, z) in 3D space.
2. Lines and Line Segments: Defined by two points or a point and a direction vector.
3. Polygons: Closed shapes with straight sides. Triangles, quadrilaterals, and other polygons
are common geometric primitives.
4. Circles and Ellipses: Defined by a center point and radii (or axes in the case of
ellipses).
5. Curves: Bézier curves, spline curves, and other parametric curves are used to
represent smooth shapes.
Geometric Transformations:
Geometric transformations involve modifying the position, orientation, and scale of
geometric primitives. Common transformations include:
1. Translation: Moves an object by a certain distance along a specified direction.
2. Rotation: Rotates an object by a specified angle about a fixed point.
3. Scaling: Enlarges or shrinks an object by scale factors along the axes.
4. Shearing: Distorts the shape of an object by stretching or compressing along one or more
axes.
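Transformations like translation and shearing can be sketched directly on point coordinates. The following is a minimal pure-Python illustration; the helper names and sample points are invented for this sketch, not taken from the notes.

```python
# Sketch: applying translation and shearing to 2D points (illustrative
# helper names; a real pipeline would use matrices, as discussed later).

def translate(point, tx, ty):
    """Move a point by (tx, ty)."""
    x, y = point
    return (x + tx, y + ty)

def shear(point, shx=0.0, shy=0.0):
    """Shear a point: x picks up shx * y, y picks up shy * x."""
    x, y = point
    return (x + shx * y, y + shy * x)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [translate(p, 2, 3) for p in square]     # shifted right 2, up 3
slanted = [shear(p, shx=0.5) for p in square]    # top edge slides right
print(moved)
print(slanted)
```

Applying a shear to the unit square turns it into a parallelogram, which is exactly the "stretching along one or more axes" distortion described above.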
Applications:
Computer Graphics: Geometric primitives and transformations are fundamental for
rendering 2D and 3D graphics in applications such as video games, simulations, and
virtual reality.
Photometric image formation refers to the process by which light interacts with
surfaces and is captured by a camera, resulting in the creation of a digital image. This process
involves various factors related to the properties of light, the surfaces of
objects, and the characteristics of the imaging system. Understanding photometric
image formation is crucial in computer vision, computer graphics, and image
processing.
Illumination:
● Ambient Light: The overall illumination of a scene that comes from all
directions.
● Directional Light: Light coming from a specific direction, which can create
highlights and shadows.
Reflection:
● Diffuse Reflection: Light that is scattered in various directions by rough
surfaces.
● Specular Reflection: Light that reflects off smooth surfaces in a
concentrated direction, creating highlights.
Shading:
● Lambertian Shading: A model that assumes diffuse reflection and
constant shading across a surface.
● Phong Shading: A more sophisticated model that considers specular
reflection, creating more realistic highlights.
Surface Properties:
● Reflectance Properties: Material characteristics that determine how light is
reflected (e.g., diffuse and specular reflectance).
● Albedo: The inherent reflectivity of a surface, representing the fraction of
incident light that is reflected.
Lighting Models:
● Phong Lighting Model: Combines diffuse and specular reflection
components to model lighting.
● Blinn-Phong Model: Similar to the Phong model but computationally more
efficient.
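The shading and lighting models above can be sketched numerically. Below is a minimal pure-Python sketch of the Phong model for a single surface point; the coefficient values (ka, kd, ks, shininess) are hand-picked for illustration and are not from the notes.

```python
import math

# Sketch: Phong lighting at one point = ambient + diffuse (Lambertian)
# + specular terms. Vectors are plain 3-tuples for simplicity.

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phong_intensity(normal, light_dir, view_dir,
                    ka=0.1, kd=0.7, ks=0.2, shininess=16):
    n = normalize(normal)
    l = normalize(light_dir)
    v = normalize(view_dir)
    diffuse = max(dot(n, l), 0.0)            # Lambertian term
    # Reflect the light direction about the normal: r = 2(n.l)n - l
    r = tuple(2 * dot(n, l) * nc - lc for nc, lc in zip(n, l))
    specular = max(dot(r, v), 0.0) ** shininess if diffuse > 0 else 0.0
    return ka + kd * diffuse + ks * specular

# Light and viewer directly above a horizontal surface: full highlight.
print(phong_intensity((0, 0, 1), (0, 0, 1), (0, 0, 1)))  # 0.1 + 0.7 + 0.2 = 1.0
```

The Blinn-Phong variant replaces the reflection vector r with the half-vector between the light and view directions, which avoids computing the reflection and is cheaper per pixel.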
Shadows:
● Cast Shadows: Darkened areas on surfaces where light is blocked by other objects.
● Self Shadows: Shadows cast by parts of an object onto itself.
Color and Intensity:
● Color Reflection Models: Incorporate the color properties of surfaces in
addition to reflectance.
● Intensity: The brightness of light or color in an image.
Cameras:
● Camera Exposure: The amount of light allowed to reach the camera sensor
or film.
● Camera Response Function: Describes how a camera responds to light of different
intensities.
A digital camera is an electronic device that captures and stores digital images. It
differs from traditional film cameras in that it uses electronic sensors to record images rather than
photographic film. Digital cameras have become widespread due to their
convenience, ability to instantly review images, and ease of sharing and storing photos digitally.
Here are key components and concepts related to digital cameras:
Image Sensor:
● Digital cameras use image sensors (such as CCD or CMOS) to convert light
into electrical signals.
● The sensor captures the image by measuring the intensity of light at each pixel
location.
Lens:
● The lens focuses light onto the image sensor.
● Zoom lenses allow users to adjust the focal length, providing optical zoom.
Aperture:
● The aperture is an adjustable opening in the lens that controls the amount of light
entering the camera.
● It affects the depth of field and exposure.
Shutter:
● The shutter mechanism controls the duration of light exposure to the image
sensor.
● Fast shutter speeds freeze motion, while slower speeds create motion blur.
Viewfinder and LCD Screen:
● Digital cameras typically have an optical or electronic viewfinder for
composing shots.
● LCD screens on the camera back allow users to review and frame images.
Image Processor:
● Digital cameras include a built-in image processor to convert raw sensor data into
a viewable image.
● Image processing algorithms may enhance color, sharpness, and reduce noise.
Memory Card:
● Digital images are stored on removable memory cards, such as SD or CF cards.
● Memory cards provide a convenient and portable way to store and transfer
images.
Autofocus and Exposure Systems:
● Autofocus systems automatically adjust the lens to ensure a sharp image.
● Exposure systems determine the optimal combination of aperture, shutter speed,
and ISO sensitivity for proper exposure.
White Balance:
● White balance settings adjust the color temperature of the captured image to match
different lighting conditions.
Modes and Settings:
● Digital cameras offer various shooting modes (e.g., automatic, manual, portrait,
landscape) and settings to control image parameters.
Connectivity:
● USB, HDMI, or wireless connectivity allows users to transfer images to
computers, share online, or connect to other devices.
Battery:
● Digital cameras are powered by rechargeable batteries, providing the
necessary energy for capturing and processing images.
5. Point operators:
Point operators, also known as point processing or pixel-wise operations, are basic image
processing operations that operate on individual pixels independently. These operations are
applied to each pixel in an image without considering the values of neighboring pixels.
Point operators typically involve mathematical operations or
functions that transform the pixel values, resulting in changes to the image's
appearance. Here are some common point operators:
Brightness Adjustment:
● Addition/Subtraction: Increase or decrease the intensity of all pixels by adding
or subtracting a constant value.
● Multiplication/Division: Scale the intensity values by multiplying or dividing them
by a constant factor.
Contrast Adjustment:
● Linear Contrast Stretching: Rescale the intensity values to cover the full
dynamic range.
● Histogram Equalization: Adjust the distribution of pixel intensities to
enhance contrast.
Gamma Correction:
● Adjust the gamma value to control the overall brightness and contrast of an image.
Thresholding:
● Convert a grayscale image to binary by setting a threshold value. Pixels with
values above the threshold become white, and those below become black.
Bit-plane Slicing:
● Decompose an image into its binary representation by considering
individual bits.
Color Mapping:
● Apply color transformations to change the color balance or convert
between color spaces (e.g., RGB to grayscale).
Inversion:
● Invert the intensity values of pixels, turning bright areas dark and vice versa.
Image Arithmetic:
● Perform arithmetic operations between pixels of two images, such as
addition, subtraction, multiplication, or division.
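Several of the point operators above can be sketched in a few lines. This is a minimal pure-Python illustration on a tiny list of 8-bit pixel values; the sample values and the clip helper are made up for the sketch.

```python
# Sketch: point operators on a tiny grayscale "image" (pixel values 0..255).
# Each operator touches one pixel at a time, with no neighborhood access.

def clip(v):
    """Keep a value inside the 8-bit range after arithmetic."""
    return max(0, min(255, v))

def adjust_brightness(pixels, delta):
    return [clip(p + delta) for p in pixels]

def invert(pixels):
    return [255 - p for p in pixels]

def threshold(pixels, t):
    return [255 if p > t else 0 for p in pixels]

def gamma_correct(pixels, gamma):
    return [clip(round(255 * (p / 255) ** gamma)) for p in pixels]

img = [0, 64, 128, 200, 255]
print(adjust_brightness(img, 60))  # [60, 124, 188, 255, 255]
print(invert(img))                 # [255, 191, 127, 55, 0]
print(threshold(img, 128))         # [0, 0, 0, 255, 255]
```

Note how the brightness adjustment saturates at 255: clipping is what keeps point arithmetic inside the valid intensity range.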
Point operators are foundational in image processing and form the basis for more complex
operations. They are often used in combination to achieve desired enhancements or
modifications to images. These operations are computationally
efficient, as they can be applied independently to each pixel, making them suitable for real-time
applications and basic image manipulation tasks.
It's important to note that while point operators are powerful for certain tasks, more advanced
image processing techniques, such as filtering and convolution, involve
considering the values of neighboring pixels and are applied to local image regions.
6. Linear filtering:
Linear filtering is a fundamental concept in image processing that involves applying a linear
operator to an image. The linear filter operates on each pixel in the image by
combining its value with the values of its neighboring pixels according to a predefined
convolution kernel or matrix. The convolution operation is a mathematical operation
that computes the weighted sum of pixel values in the image, producing a new value for the center
pixel:

g(x, y) = Σᵢ Σⱼ f(x + i, y + j) · k(i, j)

Where:
● f is the input image, g is the filtered output image, and k is the convolution kernel, with (i, j) ranging over the kernel's neighborhood.
Common linear filters include:
Blurring/Smoothing:
● Average filter: Each output pixel is the average of its neighboring pixels.
● Gaussian filter: Applies a Gaussian distribution to compute weights for pixel
averaging.
Edge Detection:
● Sobel filter: Emphasizes edges by computing gradients in the x and y
directions.
● Prewitt filter: Similar to Sobel but uses a different kernel for gradient
computation.
Sharpening:
● Laplacian filter: Enhances high-frequency components to highlight edges.
● High-pass filter: Emphasizes details by subtracting a blurred version of the image.
Embossing:
● Applies an embossing effect by highlighting changes in intensity.
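The filters above all share the same mechanism: slide a kernel over the image and take a weighted sum. The following is a minimal pure-Python sketch of a 3x3 convolution; the sample image, and the choice to skip borders rather than pad, are simplifications for the sketch.

```python
# Sketch: 3x3 convolution on a small grayscale image (list of lists).
# Borders are skipped for brevity; real code would pad the image.

def convolve3x3(image, kernel):
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for dy in (-1, 0, 1):          # weighted sum over the
                for dx in (-1, 0, 1):      # 3x3 neighborhood
                    acc += image[y + dy][x + dx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc
    return out

box_blur = [[1 / 9] * 3 for _ in range(3)]        # average filter
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal gradient

img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
print(convolve3x3(img, box_blur)[1][1])  # mean of the 3x3 patch
print(convolve3x3(img, sobel_x)[1][2])   # strong response at the vertical edge
```

Swapping the kernel changes the effect (blur, gradient, sharpen) while the convolution loop stays identical, which is why linear filtering is described as one operation with many kernels.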
Linear filtering is a versatile technique and forms the basis for more advanced image processing
operations. The convolution operation can be efficiently implemented using
convolutional neural networks (CNNs) in deep learning, where filters are learned during the
training process to perform tasks such as image recognition, segmentation, and
denoising. The choice of filter kernel and parameters determines the specific effect achieved
through linear filtering.
8. Fourier transforms:
Fourier transforms play a significant role in computer vision for analyzing and
processing images. They are used to decompose an image into its frequency
components, providing valuable information for tasks such as image filtering, feature
extraction, and pattern recognition. Here are some ways Fourier transforms are employed in
computer vision:
Frequency Analysis:
● Fourier transforms help in understanding the frequency content of an
image. High-frequency components correspond to edges and fine details, while
low-frequency components represent smooth regions.
Image Filtering:
● Filtering in the frequency domain allows for efficient operations such as blurring
or sharpening. Low-pass filters remove high-frequency noise,
while high-pass filters enhance edges and fine details.
Image Enhancement:
● Adjusting the amplitude of specific frequency components can enhance or suppress
certain features in an image. This is commonly used in image enhancement
techniques.
Texture Analysis:
● Fourier analysis is useful in characterizing and classifying textures based on their
frequency characteristics. It helps distinguish between textures with different
patterns.
Pattern Recognition:
● Fourier descriptors, which capture shape information, are used for representing
and recognizing objects in images. They provide a compact representation of
shape by capturing the dominant frequency components.
Image Compression:
● Transform-based image compression, such as JPEG compression, utilizes Fourier
transforms to transform image data into the frequency domain.
This allows for efficient quantization and coding of frequency components.
Image Registration:
● Fourier transforms are used in image registration, aligning images or
transforming them to a common coordinate system. Cross-correlation in the
frequency domain is often employed for this purpose.
Optical Character Recognition (OCR):
● Fourier descriptors are used in OCR systems for character recognition. They
help in capturing the shape information of characters, making the recognition
process more robust.
Homomorphic Filtering:
● Homomorphic filtering, which involves transforming an image to a
logarithmic domain using Fourier transforms, is used in applications such as
document analysis and enhancement.
Image Reconstruction:
● Fourier transforms are involved in techniques like computed tomography
(CT) or magnetic resonance imaging (MRI) for reconstructing images from their
projections.
The efficient computation of Fourier transforms, particularly through the use of the Fast Fourier
Transform (FFT) algorithm, has made these techniques computationally feasible for real-time
applications in computer vision. The ability to analyze images in the
frequency domain provides valuable insights and contributes to the development of advanced
image processing techniques.
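The low-pass filtering idea above can be sketched in one dimension. This pure-Python sketch uses a direct O(n²) DFT for clarity rather than an FFT library, and the test signal (a ramp plus alternating noise) is invented for illustration.

```python
import cmath

# Sketch: 1-D DFT-based low-pass filtering. Zeroing high-frequency bins
# in the spectrum and transforming back smooths the signal.

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def lowpass(signal, keep):
    """Keep only the `keep` lowest (and mirrored) frequency bins."""
    spec = dft(signal)
    n = len(spec)
    filtered = [c if (k <= keep or k >= n - keep) else 0
                for k, c in enumerate(spec)]
    return idft(filtered)

# A smooth ramp plus alternating high-frequency noise:
signal = [t + (1 if t % 2 else -1) for t in range(8)]
smooth = lowpass(signal, keep=1)
print([round(v, 2) for v in smooth])  # noise suppressed, trend preserved
```

The 2-D case used on images works the same way, with a 2-D transform and a mask over the frequency plane; in practice an FFT implementation is used for speed.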
Image Pyramids:
Image pyramids are a series of images representing the same scene but at different
resolutions. There are two main types of image pyramids:
Gaussian Pyramid:
● Created by repeatedly applying Gaussian smoothing and downsampling to
an image.
● At each level, the image is smoothed to remove high-frequency
information, and then it is subsampled to reduce its size.
● Useful for tasks like image blending, image matching, and coarse-to-fine image
processing.
Laplacian Pyramid:
● Derived from the Gaussian pyramid.
● Each level of the Laplacian pyramid is obtained by subtracting the expanded
(upsampled) version of the next coarser Gaussian level from the corresponding
Gaussian level.
● Useful for image compression and coding, where the Laplacian pyramid
represents the residual information not captured by the Gaussian pyramid.
Image pyramids are especially useful for creating multi-scale representations of images, which can
be beneficial for various computer vision tasks.
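The smooth-then-subsample loop that builds a Gaussian pyramid can be sketched in one dimension. This is a simplified pure-Python illustration: real pyramids use a 2-D 5x5 Gaussian kernel, while the 1-D binomial [1, 2, 1]/4 kernel and the sample signal here are choices made for the sketch.

```python
# Sketch: one level of a (simplified) Gaussian pyramid in 1-D.
# Each level = blur the previous one, then drop every other sample.

def blur(signal):
    """Binomial [1, 2, 1]/4 smoothing with edge replication."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + 2 * signal[i]
             + signal[min(i + 1, n - 1)]) / 4
            for i in range(n)]

def downsample(signal):
    return signal[::2]

def gaussian_pyramid(signal, levels):
    pyramid = [signal]
    for _ in range(levels - 1):
        pyramid.append(downsample(blur(pyramid[-1])))
    return pyramid

pyr = gaussian_pyramid([0, 0, 4, 8, 8, 8, 4, 0], levels=3)
for level in pyr:
    print(level)  # each level is smoother and half the length
```

A Laplacian level would then be the current Gaussian level minus the upsampled version of the next coarser one, storing exactly the detail lost in the blur-and-subsample step.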
Wavelets:
Wavelets are mathematical functions that can be used to analyze signals and images. Wavelet
transforms provide a multi-resolution analysis by decomposing an image into approximation
(low-frequency) and detail (high-frequency) components. Key concepts include:
Wavelet Transform:
● The wavelet transform decomposes an image into different frequency
components by convolving the image with wavelet functions.
● The result is a set of coefficients that represent the image at various scales
and orientations.
Multi-resolution Analysis:
● Wavelet transforms offer a multi-resolution analysis, allowing the
representation of an image at different scales.
● The approximation coefficients capture the low-frequency information, while
detail coefficients capture high-frequency information.
Haar Wavelet:
● The Haar wavelet is a simple wavelet function used in basic wavelet
transforms.
● It represents changes in intensity between adjacent pixels.
Wavelet Compression:
● Wavelet-based image compression techniques, such as JPEG2000, utilize wavelet
transforms to efficiently represent image data in both spatial and frequency
domains.
Image Denoising:
● Wavelet-based thresholding techniques can be applied to denoise images by
thresholding the wavelet coefficients.
Edge Detection:
● Wavelet transforms can be used for edge detection by analyzing the
high-frequency components of the image.
Both pyramids and wavelets offer advantages in multi-resolution analysis, but they differ in terms
of their representation and construction. Pyramids use a hierarchical structure of smoothed and
subsampled images, while wavelets use a transform-based approach that decomposes the image
into frequency components. The choice between pyramids and wavelets often depends on the
specific requirements of the image processing task at hand.
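The Haar wavelet described above reduces to pairwise averages and differences. This pure-Python sketch performs one level of a 1-D Haar transform and its inverse; the 1/2 normalization is one common convention, and the sample signal is invented for the sketch.

```python
# Sketch: one level of a 1-D Haar wavelet transform.
# approx = pairwise averages (low frequency), detail = pairwise
# differences (high frequency, i.e. changes between adjacent samples).

def haar_step(signal):
    approx = [(a + b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    return approx, detail

def haar_inverse(approx, detail):
    signal = []
    for a, d in zip(approx, detail):
        signal += [a + d, a - d]  # exact reconstruction
    return signal

approx, detail = haar_step([9, 7, 3, 5])
print(approx)  # [8.0, 4.0]  -- the smooth trend
print(detail)  # [1.0, -1.0] -- local changes between adjacent samples
print(haar_inverse(approx, detail))  # [9.0, 7.0, 3.0, 5.0]
```

Recursing on the approximation coefficients gives the multi-resolution analysis: each pass halves the resolution while the detail coefficients retain what was removed, and thresholding those details is the basis of wavelet denoising and compression.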
1. Translation:
● Description: Moves an object by a specified distance along the x and/or y axes.
● Transformation Matrix (2D, homogeneous coordinates):
[ 1  0  tx ]
[ 0  1  ty ]
[ 0  0  1  ]
● Applications: Object movement, image registration.
2. Rotation:
● Description: Rotates an object by a specified angle θ about a fixed point.
● Transformation Matrix (2D):
[ cos θ  −sin θ  0 ]
[ sin θ   cos θ  0 ]
[  0       0     1 ]
● Applications: Image alignment, correcting orientation.
3. Scaling:
● Description: Enlarges or shrinks an object by scale factors sx and sy along the axes.
● Transformation Matrix (2D):
[ sx  0   0 ]
[ 0   sy  0 ]
[ 0   0   1 ]
● Applications: Zooming in/out, resizing.
4. Shearing:
● Description: Distorts the shape of an object by shifting points parallel to an axis in proportion to their distance from it.
● Transformation Matrix (2D):
[ 1    shx  0 ]
[ shy  1    0 ]
[ 0    0    1 ]
● Applications: Slanting effects, italic-style distortion.
5. Affine Transformation:
● Description: Combines translation, rotation, scaling, and shearing.
● Transformation Matrix (2D):
[ a11  a12  tx ]
[ a21  a22  ty ]
[  0    0   1  ]
6. Perspective Transformation:
● Description: Represents a perspective projection, useful for simulating three-
dimensional effects.
● Transformation Matrix (3×3 homography):
[ h11  h12  h13 ]
[ h21  h22  h23 ]
[ h31  h32  h33 ]
7. Projective Transformation:
● Description: Generalization of perspective transformation with additional control points.
● Transformation Matrix: More complex than the perspective transformation matrix.
● Applications: Computer graphics, augmented reality.
These transformations are crucial for various applications, including image
manipulation, computer-aided design (CAD), computer vision, and graphics rendering.
Understanding and applying geometric transformations are fundamental skills in computer science
and engineering fields related to digital image processing.
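In homogeneous coordinates these transformations become 3x3 matrices, so composing them is just matrix multiplication. The following pure-Python sketch builds a few of the matrices above and applies a composed transform to a point; the helper names are invented for the sketch.

```python
import math

# Sketch: 2-D transformations as 3x3 homogeneous matrices.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, point):
    x, y = point
    v = [x, y, 1]
    r = [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]
    return (r[0] / r[2], r[1] / r[2])  # divide by w (matters for perspective)

def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def scaling(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]

# Scale by 2, then translate by (1, 1), as one composed matrix:
m = matmul(translation(1, 1), scaling(2, 2))
print(apply(m, (3, 4)))  # (3*2 + 1, 4*2 + 1) = (7.0, 9.0)
```

Because composition collapses any chain of affine transforms into a single matrix, an image need only be resampled once, no matter how many transforms are stacked; the division by the third coordinate is what makes the same machinery handle perspective homographies.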
Global optimization is a branch of optimization that focuses on finding the global
minimum or maximum of a function over its entire feasible domain. Unlike local
optimization, which aims to find the optimal solution within a specific region, global
optimization seeks the best possible solution across the entire search space. Global
optimization problems are often challenging due to the presence of multiple local
optima or complex, non-convex search spaces.
Concepts:
Objective Function:
● The function to be minimized or maximized.
Feasible Domain:
● The set of input values (parameters) for which the objective function is defined.
Global Minimum/Maximum:
● The lowest or highest value of the objective function over the entire
feasible domain.
Local Minimum/Maximum:
● A minimum or maximum within a specific region of the feasible domain.
Approaches:
Grid Search:
● Dividing the feasible domain into a grid and evaluating the objective
function at each grid point to find the optimal solution.
Random Search:
● Randomly sampling points in the feasible domain and evaluating the
objective function to explore different regions.
Evolutionary Algorithms:
● Genetic algorithms, particle swarm optimization, and other evolutionary
techniques use populations of solutions and genetic operators to
iteratively evolve toward the optimal solution.
Simulated Annealing:
● Inspired by the annealing process in metallurgy, simulated annealing
gradually decreases the temperature to allow the algorithm to escape local
optima.
Ant Colony Optimization:
● Inspired by the foraging behavior of ants, this algorithm uses pheromone trails to
guide the search for the optimal solution.
Genetic Algorithms:
● Inspired by biological evolution, genetic algorithms use mutation,
crossover, and selection to evolve a population of potential solutions.
Particle Swarm Optimization:
● Simulates the social behavior of birds or fish, where a swarm of particles moves
through the search space to find the optimal solution.
Bayesian Optimization:
● Utilizes probabilistic models to model the objective function and guide the search
toward promising regions.
Quasi-Newton Methods:
● Iterative optimization methods that use an approximation of the Hessian matrix to
find the optimal solution efficiently.
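The two simplest approaches above, grid search and random search, can be sketched side by side. The multimodal test function and the search range below are made up for illustration; its global minimum sits at x = 0 among many local minima.

```python
import math
import random

# Sketch: grid search vs. random search for the global minimum of a
# multimodal 1-D function (illustrative function and ranges).

def f(x):
    # x^2 gives the global trend; the sin^2 term adds local minima.
    return x * x + 10 * math.sin(x) ** 2

def grid_search(func, lo, hi, steps):
    best_x = min((lo + (hi - lo) * i / (steps - 1) for i in range(steps)),
                 key=func)
    return best_x, func(best_x)

def random_search(func, lo, hi, samples, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    best_x = min((rng.uniform(lo, hi) for _ in range(samples)), key=func)
    return best_x, func(best_x)

x_g, f_g = grid_search(f, -10, 10, steps=201)
x_r, f_r = random_search(f, -10, 10, samples=200)
print(round(x_g, 2), round(f_g, 4))
print(round(x_r, 2), round(f_r, 4))
```

Both methods scale poorly with dimension because the number of points needed grows exponentially, which is exactly why the population-based and model-based methods listed above (evolutionary algorithms, simulated annealing, Bayesian optimization) exist.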