Computer Vision Lecture Notes Unit 1
1. Computer Vision:
Computer vision applications are diverse and found in various fields, including
healthcare (medical image analysis), autonomous vehicles, surveillance,
augmented reality, robotics, industrial automation, and more. Advances in deep
learning, especially convolutional neural networks (CNNs), have significantly
contributed to the progress and success of computer vision tasks by enabling
efficient feature learning from large datasets.
Geometric primitives and transformations are fundamental concepts in computer graphics and
computer vision. They form the basis for representing and manipulating visual elements in both
2D and 3D spaces. Let's explore each of these concepts:
Geometric Primitives:
1. Points: Represented by coordinates (x, y) in 2D or (x, y, z) in 3D space.
2. Lines and Line Segments: Defined by two points or a point and a direction vector.
3. Polygons: Closed shapes with straight sides. Triangles, quadrilaterals, and other polygons
are common geometric primitives.
4. Circles and Ellipses: Defined by a center point and radii (or axes in the case of
ellipses).
5. Curves: Bézier curves, spline curves, and other parametric curves are used to
represent smooth shapes.
Geometric Transformations:
Geometric transformations involve modifying the position, orientation, and scale of
geometric primitives. Common transformations include:
1. Translation: Moves an object by a certain distance along a specified direction.
2. Rotation: Rotates an object by a specified angle about a fixed point.
3. Scaling: Enlarges or shrinks an object by scale factors along the axes.
4. Shearing: Distorts the shape of an object by stretching or compressing along one or more
axes.
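Transformations like translation and shearing can be sketched directly on point coordinates. The following is a minimal pure-Python illustration; the helper names and sample points are invented for this sketch, not taken from the notes.

```python
# Sketch: applying translation and shearing to 2D points (illustrative
# helper names; a real pipeline would use matrices, as discussed later).

def translate(point, tx, ty):
    """Move a point by (tx, ty)."""
    x, y = point
    return (x + tx, y + ty)

def shear(point, shx=0.0, shy=0.0):
    """Shear a point: x picks up shx * y, y picks up shy * x."""
    x, y = point
    return (x + shx * y, y + shy * x)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [translate(p, 2, 3) for p in square]     # shifted right 2, up 3
slanted = [shear(p, shx=0.5) for p in square]    # top edge slides right
print(moved)
print(slanted)
```

Applying a shear to the unit square turns it into a parallelogram, which is exactly the "stretching along one or more axes" distortion described above.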
Applications:
Computer Graphics: Geometric primitives and transformations are fundamental for
rendering 2D and 3D graphics in applications such as video games, simulations, and
virtual reality.
Photometric image formation refers to the process by which light interacts with
surfaces and is captured by a camera, resulting in the creation of a digital image. This process
involves various factors related to the properties of light, the surfaces of
objects, and the characteristics of the imaging system. Understanding photometric
image formation is crucial in computer vision, computer graphics, and image
processing.
Illumination:
● Ambient Light: The overall illumination of a scene that comes from all
directions.
● Directional Light: Light coming from a specific direction, which can create
highlights and shadows.
Reflection:
● Diffuse Reflection: Light that is scattered in various directions by rough
surfaces.
● Specular Reflection: Light that reflects off smooth surfaces in a
concentrated direction, creating highlights.
Shading:
● Lambertian Shading: A model that assumes diffuse reflection and
constant shading across a surface.
● Phong Shading: A more sophisticated model that considers specular
reflection, creating more realistic highlights.
Surface Properties:
● Reflectance Properties: Material characteristics that determine how light is
reflected (e.g., diffuse and specular reflectance).
● Albedo: The inherent reflectivity of a surface, representing the fraction of
incident light that is reflected.
Lighting Models:
● Phong Lighting Model: Combines diffuse and specular reflection
components to model lighting.
● Blinn-Phong Model: Similar to the Phong model but computationally more
efficient.
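The shading and lighting models above can be sketched numerically. Below is a minimal pure-Python sketch of the Phong model for a single surface point; the coefficient values (ka, kd, ks, shininess) are hand-picked for illustration and are not from the notes.

```python
import math

# Sketch: Phong lighting at one point = ambient + diffuse (Lambertian)
# + specular terms. Vectors are plain 3-tuples for simplicity.

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phong_intensity(normal, light_dir, view_dir,
                    ka=0.1, kd=0.7, ks=0.2, shininess=16):
    n = normalize(normal)
    l = normalize(light_dir)
    v = normalize(view_dir)
    diffuse = max(dot(n, l), 0.0)            # Lambertian term
    # Reflect the light direction about the normal: r = 2(n.l)n - l
    r = tuple(2 * dot(n, l) * nc - lc for nc, lc in zip(n, l))
    specular = max(dot(r, v), 0.0) ** shininess if diffuse > 0 else 0.0
    return ka + kd * diffuse + ks * specular

# Light and viewer directly above a horizontal surface: full highlight.
print(phong_intensity((0, 0, 1), (0, 0, 1), (0, 0, 1)))  # 0.1 + 0.7 + 0.2 = 1.0
```

The Blinn-Phong variant replaces the reflection vector r with the half-vector between the light and view directions, which avoids computing the reflection and is cheaper per pixel.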
Shadows:
● Cast Shadows: Darkened areas on surfaces where light is blocked by other objects.
● Self Shadows: Shadows cast by parts of an object onto itself.
Color and Intensity:
● Color Reflection Models: Incorporate the color properties of surfaces in
addition to reflectance.
● Intensity: The brightness of light or color in an image.
Cameras:
● Camera Exposure: The amount of light allowed to reach the camera sensor
or film.
● Camera Response Function: Describes how a camera responds to light of different
intensities.
A digital camera is an electronic device that captures and stores digital images. It
differs from traditional film cameras in that it uses electronic sensors to record images rather than
photographic film. Digital cameras have become widespread due to their
convenience, ability to instantly review images, and ease of sharing and storing photos digitally.
Here are key components and concepts related to digital cameras:
Image Sensor:
● Digital cameras use image sensors (such as CCD or CMOS) to convert light
into electrical signals.
● The sensor captures the image by measuring the intensity of light at each pixel
location.
Lens:
● The lens focuses light onto the image sensor.
● Zoom lenses allow users to adjust the focal length, providing optical zoom.
Aperture:
● The aperture is an adjustable opening in the lens that controls the amount of light
entering the camera.
● It affects the depth of field and exposure.
Shutter:
● The shutter mechanism controls the duration of light exposure to the image
sensor.
● Fast shutter speeds freeze motion, while slower speeds create motion blur.
Viewfinder and LCD Screen:
● Digital cameras typically have an optical or electronic viewfinder for
composing shots.
● LCD screens on the camera back allow users to review and frame images.
Image Processor:
● Digital cameras include a built-in image processor to convert raw sensor data into
a viewable image.
● Image processing algorithms may enhance color, sharpness, and reduce noise.
Memory Card:
● Digital images are stored on removable memory cards, such as SD or CF cards.
● Memory cards provide a convenient and portable way to store and transfer
images.
Autofocus and Exposure Systems:
● Autofocus systems automatically adjust the lens to ensure a sharp image.
● Exposure systems determine the optimal combination of aperture, shutter speed,
and ISO sensitivity for proper exposure.
White Balance:
● White balance settings adjust the color temperature of the captured image to match
different lighting conditions.
Modes and Settings:
● Digital cameras offer various shooting modes (e.g., automatic, manual, portrait,
landscape) and settings to control image parameters.
Connectivity:
● USB, HDMI, or wireless connectivity allows users to transfer images to
computers, share online, or connect to other devices.
Battery:
● Digital cameras are powered by rechargeable batteries, providing the
necessary energy for capturing and processing images.
5. Point operators:
Point operators, also known as point processing or pixel-wise operations, are basic image
processing operations that operate on individual pixels independently. These operations are
applied to each pixel in an image without considering the values of neighboring pixels.
Point operators typically involve mathematical operations or
functions that transform the pixel values, resulting in changes to the image's
appearance. Here are some common point operators:
Brightness Adjustment:
● Addition/Subtraction: Increase or decrease the intensity of all pixels by adding
or subtracting a constant value.
● Multiplication/Division: Scale the intensity values by multiplying or dividing them
by a constant factor.
Contrast Adjustment:
● Linear Contrast Stretching: Rescale the intensity values to cover the full
dynamic range.
● Histogram Equalization: Adjust the distribution of pixel intensities to
enhance contrast.
Gamma Correction:
● Adjust the gamma value to control the overall brightness and contrast of an image.
Thresholding:
● Convert a grayscale image to binary by setting a threshold value. Pixels with
values above the threshold become white, and those below become black.
Bit-plane Slicing:
● Decompose an image into its binary representation by considering
individual bits.
Color Mapping:
● Apply color transformations to change the color balance or convert
between color spaces (e.g., RGB to grayscale).
Inversion:
● Invert the intensity values of pixels, turning bright areas dark and vice versa.
Image Arithmetic:
● Perform arithmetic operations between pixels of two images, such as
addition, subtraction, multiplication, or division.
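Several of the point operators above can be sketched in a few lines. This is a minimal pure-Python illustration on a tiny list of 8-bit pixel values; the sample values and the clip helper are made up for the sketch.

```python
# Sketch: point operators on a tiny grayscale "image" (pixel values 0..255).
# Each operator touches one pixel at a time, with no neighborhood access.

def clip(v):
    """Keep a value inside the 8-bit range after arithmetic."""
    return max(0, min(255, v))

def adjust_brightness(pixels, delta):
    return [clip(p + delta) for p in pixels]

def invert(pixels):
    return [255 - p for p in pixels]

def threshold(pixels, t):
    return [255 if p > t else 0 for p in pixels]

def gamma_correct(pixels, gamma):
    return [clip(round(255 * (p / 255) ** gamma)) for p in pixels]

img = [0, 64, 128, 200, 255]
print(adjust_brightness(img, 60))  # [60, 124, 188, 255, 255]
print(invert(img))                 # [255, 191, 127, 55, 0]
print(threshold(img, 128))         # [0, 0, 0, 255, 255]
```

Note how the brightness adjustment saturates at 255: clipping is what keeps point arithmetic inside the valid intensity range.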
Point operators are foundational in image processing and form the basis for more complex
operations. They are often used in combination to achieve desired enhancements or
modifications to images. These operations are computationally
efficient, as they can be applied independently to each pixel, making them suitable for real-time
applications and basic image manipulation tasks.
It's important to note that while point operators are powerful for certain tasks, more advanced
image processing techniques, such as filtering and convolution, involve
considering the values of neighboring pixels and are applied to local image regions.
6. Linear filtering:
Linear filtering is a fundamental concept in image processing that involves applying a linear
operator to an image. The linear filter operates on each pixel in the image by
combining its value with the values of its neighboring pixels according to a predefined
convolution kernel or matrix. The convolution operation is a mathematical operation
that computes the weighted sum of pixel values in the image, producing a new value for the center
pixel:

g(x, y) = Σᵢ Σⱼ f(x + i, y + j) · k(i, j)

Where:
● f is the input image, g is the filtered output image, and k is the convolution kernel, with (i, j) ranging over the kernel's neighborhood.
Common linear filters include:
Blurring/Smoothing:
● Average filter: Each output pixel is the average of its neighboring pixels.
● Gaussian filter: Applies a Gaussian distribution to compute weights for pixel
averaging.
Edge Detection:
● Sobel filter: Emphasizes edges by computing gradients in the x and y
directions.
● Prewitt filter: Similar to Sobel but uses a different kernel for gradient
computation.
Sharpening:
● Laplacian filter: Enhances high-frequency components to highlight edges.
● High-pass filter: Emphasizes details by subtracting a blurred version of the image.
Embossing:
● Applies an embossing effect by highlighting changes in intensity.
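The filters above all share the same mechanism: slide a kernel over the image and take a weighted sum. The following is a minimal pure-Python sketch of a 3x3 convolution; the sample image, and the choice to skip borders rather than pad, are simplifications for the sketch.

```python
# Sketch: 3x3 convolution on a small grayscale image (list of lists).
# Borders are skipped for brevity; real code would pad the image.

def convolve3x3(image, kernel):
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for dy in (-1, 0, 1):          # weighted sum over the
                for dx in (-1, 0, 1):      # 3x3 neighborhood
                    acc += image[y + dy][x + dx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc
    return out

box_blur = [[1 / 9] * 3 for _ in range(3)]        # average filter
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal gradient

img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
print(convolve3x3(img, box_blur)[1][1])  # mean of the 3x3 patch
print(convolve3x3(img, sobel_x)[1][2])   # strong response at the vertical edge
```

Swapping the kernel changes the effect (blur, gradient, sharpen) while the convolution loop stays identical, which is why linear filtering is described as one operation with many kernels.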
Linear filtering is a versatile technique and forms the basis for more advanced image processing
operations. The convolution operation can be efficiently implemented using
convolutional neural networks (CNNs) in deep learning, where filters are learned during the
training process to perform tasks such as image recognition, segmentation, and
denoising. The choice of filter kernel and parameters determines the specific effect achieved
through linear filtering.
8. Fourier transforms:
Fourier transforms play a significant role in computer vision for analyzing and
processing images. They are used to decompose an image into its frequency
components, providing valuable information for tasks such as image filtering, feature
extraction, and pattern recognition. Here are some ways Fourier transforms are employed in
computer vision:
Frequency Analysis:
● Fourier transforms help in understanding the frequency content of an
image. High-frequency components correspond to edges and fine details, while
low-frequency components represent smooth regions.
Image Filtering:
● Filtering in the frequency domain allows for efficient operations such as blurring
or sharpening. Low-pass filters remove high-frequency noise,
while high-pass filters enhance edges and fine details.
Image Enhancement:
● Adjusting the amplitude of specific frequency components can enhance or suppress
certain features in an image. This is commonly used in image enhancement
techniques.
Texture Analysis:
● Fourier analysis is useful in characterizing and classifying textures based on their
frequency characteristics. It helps distinguish between textures with different
patterns.
Pattern Recognition:
● Fourier descriptors, which capture shape information, are used for representing
and recognizing objects in images. They provide a compact representation of
shape by capturing the dominant frequency components.
Image Compression:
● Transform-based image compression, such as JPEG compression, utilizes Fourier
transforms to transform image data into the frequency domain.
This allows for efficient quantization and coding of frequency components.
Image Registration:
● Fourier transforms are used in image registration, aligning images or
transforming them to a common coordinate system. Cross-correlation in the
frequency domain is often employed for this purpose.
Optical Character Recognition (OCR):
● Fourier descriptors are used in OCR systems for character recognition. They
help in capturing the shape information of characters, making the recognition
process more robust.
Homomorphic Filtering:
● Homomorphic filtering, which involves transforming an image to a
logarithmic domain using Fourier transforms, is used in applications such as
document analysis and enhancement.
Image Reconstruction:
● Fourier transforms are involved in techniques like computed tomography
(CT) or magnetic resonance imaging (MRI) for reconstructing images from their
projections.
The efficient computation of Fourier transforms, particularly through the use of the Fast Fourier
Transform (FFT) algorithm, has made these techniques computationally feasible for real-time
applications in computer vision. The ability to analyze images in the
frequency domain provides valuable insights and contributes to the development of advanced
image processing techniques.
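The low-pass filtering idea above can be sketched in one dimension. This pure-Python sketch uses a direct O(n²) DFT for clarity rather than an FFT library, and the test signal (a ramp plus alternating noise) is invented for illustration.

```python
import cmath

# Sketch: 1-D DFT-based low-pass filtering. Zeroing high-frequency bins
# in the spectrum and transforming back smooths the signal.

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def lowpass(signal, keep):
    """Keep only the `keep` lowest (and mirrored) frequency bins."""
    spec = dft(signal)
    n = len(spec)
    filtered = [c if (k <= keep or k >= n - keep) else 0
                for k, c in enumerate(spec)]
    return idft(filtered)

# A smooth ramp plus alternating high-frequency noise:
signal = [t + (1 if t % 2 else -1) for t in range(8)]
smooth = lowpass(signal, keep=1)
print([round(v, 2) for v in smooth])  # noise suppressed, trend preserved
```

The 2-D case used on images works the same way, with a 2-D transform and a mask over the frequency plane; in practice an FFT implementation is used for speed.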
Image Pyramids:
Image pyramids are a series of images representing the same scene but at different
resolutions. There are two main types of image pyramids:
Gaussian Pyramid:
● Created by repeatedly applying Gaussian smoothing and downsampling to
an image.
● At each level, the image is smoothed to remove high-frequency
information, and then it is subsampled to reduce its size.
● Useful for tasks like image blending, image matching, and coarse-to-fine image
processing.
Laplacian Pyramid:
● Derived from the Gaussian pyramid.
● Each level of the Laplacian pyramid is obtained by subtracting the expanded
(upsampled) version of the next coarser Gaussian level from the corresponding
Gaussian level.
● Useful for image compression and coding, where the Laplacian pyramid
represents the residual information not captured by the Gaussian pyramid.
Image pyramids are especially useful for creating multi-scale representations of images, which can
be beneficial for various computer vision tasks.
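The smooth-then-subsample loop that builds a Gaussian pyramid can be sketched in one dimension. This is a simplified pure-Python illustration: real pyramids use a 2-D 5x5 Gaussian kernel, while the 1-D binomial [1, 2, 1]/4 kernel and the sample signal here are choices made for the sketch.

```python
# Sketch: one level of a (simplified) Gaussian pyramid in 1-D.
# Each level = blur the previous one, then drop every other sample.

def blur(signal):
    """Binomial [1, 2, 1]/4 smoothing with edge replication."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + 2 * signal[i]
             + signal[min(i + 1, n - 1)]) / 4
            for i in range(n)]

def downsample(signal):
    return signal[::2]

def gaussian_pyramid(signal, levels):
    pyramid = [signal]
    for _ in range(levels - 1):
        pyramid.append(downsample(blur(pyramid[-1])))
    return pyramid

pyr = gaussian_pyramid([0, 0, 4, 8, 8, 8, 4, 0], levels=3)
for level in pyr:
    print(level)  # each level is smoother and half the length
```

A Laplacian level would then be the current Gaussian level minus the upsampled version of the next coarser one, storing exactly the detail lost in the blur-and-subsample step.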
Wavelets:
Wavelets are mathematical functions that can be used to analyze signals and images. Wavelet
transforms provide a multi-resolution analysis by decomposing an image into approximation
(low-frequency) and detail (high-frequency) components. Key concepts include:
Wavelet Transform:
● The wavelet transform decomposes an image into different frequency
components by convolving the image with wavelet functions.
● The result is a set of coefficients that represent the image at various scales
and orientations.
Multi-resolution Analysis:
● Wavelet transforms offer a multi-resolution analysis, allowing the
representation of an image at different scales.
● The approximation coefficients capture the low-frequency information, while
detail coefficients capture high-frequency information.
Haar Wavelet:
● The Haar wavelet is a simple wavelet function used in basic wavelet
transforms.
● It represents changes in intensity between adjacent pixels.
Wavelet Compression:
● Wavelet-based image compression techniques, such as JPEG2000, utilize wavelet
transforms to efficiently represent image data in both spatial and frequency
domains.
Image Denoising:
● Wavelet-based thresholding techniques can be applied to denoise images by
thresholding the wavelet coefficients.
Edge Detection:
● Wavelet transforms can be used for edge detection by analyzing the
high-frequency components of the image.
Both pyramids and wavelets offer advantages in multi-resolution analysis, but they differ in terms
of their representation and construction. Pyramids use a hierarchical structure of smoothed and
subsampled images, while wavelets use a transform-based approach that decomposes the image
into frequency components. The choice between pyramids and wavelets often depends on the
specific requirements of the image processing task at hand.
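The Haar wavelet described above reduces to pairwise averages and differences. This pure-Python sketch performs one level of a 1-D Haar transform and its inverse; the 1/2 normalization is one common convention, and the sample signal is invented for the sketch.

```python
# Sketch: one level of a 1-D Haar wavelet transform.
# approx = pairwise averages (low frequency), detail = pairwise
# differences (high frequency, i.e. changes between adjacent samples).

def haar_step(signal):
    approx = [(a + b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    return approx, detail

def haar_inverse(approx, detail):
    signal = []
    for a, d in zip(approx, detail):
        signal += [a + d, a - d]  # exact reconstruction
    return signal

approx, detail = haar_step([9, 7, 3, 5])
print(approx)  # [8.0, 4.0]  -- the smooth trend
print(detail)  # [1.0, -1.0] -- local changes between adjacent samples
print(haar_inverse(approx, detail))  # [9.0, 7.0, 3.0, 5.0]
```

Recursing on the approximation coefficients gives the multi-resolution analysis: each pass halves the resolution while the detail coefficients retain what was removed, and thresholding those details is the basis of wavelet denoising and compression.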
1. Translation:
● Description: Moves an object by a specified distance along the x and/or y axes.
● Transformation Matrix (2D, homogeneous coordinates):
[ 1  0  tx ]
[ 0  1  ty ]
[ 0  0  1  ]
● Applications: Object movement, image registration.
2. Rotation:
● Description: Rotates an object by a specified angle θ about a fixed point.
● Transformation Matrix (2D):
[ cos θ  −sin θ  0 ]
[ sin θ   cos θ  0 ]
[  0       0     1 ]
● Applications: Image alignment, correcting orientation.
3. Scaling:
● Description: Enlarges or shrinks an object by scale factors sx and sy along the axes.
● Transformation Matrix (2D):
[ sx  0   0 ]
[ 0   sy  0 ]
[ 0   0   1 ]
● Applications: Zooming in/out, resizing.
4. Shearing:
● Description: Distorts the shape of an object by shifting points parallel to an axis in proportion to their distance from it.
● Transformation Matrix (2D):
[ 1    shx  0 ]
[ shy  1    0 ]
[ 0    0    1 ]
● Applications: Slanting effects, italic-style distortion.
5. Affine Transformation:
● Description: Combines translation, rotation, scaling, and shearing.
● Transformation Matrix (2D):
[ a11  a12  tx ]
[ a21  a22  ty ]
[  0    0   1  ]
6. Perspective Transformation:
● Description: Represents a perspective projection, useful for simulating three-
dimensional effects.
● Transformation Matrix (3×3 homography):
[ h11  h12  h13 ]
[ h21  h22  h23 ]
[ h31  h32  h33 ]
7. Projective Transformation:
● Description: Generalization of perspective transformation with additional control points.
● Transformation Matrix: More complex than the perspective transformation matrix.
● Applications: Computer graphics, augmented reality.
These transformations are crucial for various applications, including image
manipulation, computer-aided design (CAD), computer vision, and graphics rendering.
Understanding and applying geometric transformations are fundamental skills in computer science
and engineering fields related to digital image processing.
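In homogeneous coordinates these transformations become 3x3 matrices, so composing them is just matrix multiplication. The following pure-Python sketch builds a few of the matrices above and applies a composed transform to a point; the helper names are invented for the sketch.

```python
import math

# Sketch: 2-D transformations as 3x3 homogeneous matrices.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, point):
    x, y = point
    v = [x, y, 1]
    r = [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]
    return (r[0] / r[2], r[1] / r[2])  # divide by w (matters for perspective)

def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def scaling(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]

# Scale by 2, then translate by (1, 1), as one composed matrix:
m = matmul(translation(1, 1), scaling(2, 2))
print(apply(m, (3, 4)))  # (3*2 + 1, 4*2 + 1) = (7.0, 9.0)
```

Because composition collapses any chain of affine transforms into a single matrix, an image need only be resampled once, no matter how many transforms are stacked; the division by the third coordinate is what makes the same machinery handle perspective homographies.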
Global optimization is a branch of optimization that focuses on finding the global
minimum or maximum of a function over its entire feasible domain. Unlike local
optimization, which aims to find the optimal solution within a specific region, global
optimization seeks the best possible solution across the entire search space. Global
optimization problems are often challenging due to the presence of multiple local
optima or complex, non-convex search spaces.
Concepts:
Objective Function:
● The function to be minimized or maximized.
Feasible Domain:
● The set of input values (parameters) for which the objective function is defined.
Global Minimum/Maximum:
● The lowest or highest value of the objective function over the entire
feasible domain.
Local Minimum/Maximum:
● A minimum or maximum within a specific region of the feasible domain.
Approaches:
Grid Search:
● Dividing the feasible domain into a grid and evaluating the objective
function at each grid point to find the optimal solution.
Random Search:
● Randomly sampling points in the feasible domain and evaluating the
objective function to explore different regions.
Evolutionary Algorithms:
● Genetic algorithms, particle swarm optimization, and other evolutionary
techniques use populations of solutions and genetic operators to
iteratively evolve toward the optimal solution.
Simulated Annealing:
● Inspired by the annealing process in metallurgy, simulated annealing
gradually decreases the temperature to allow the algorithm to escape local
optima.
Ant Colony Optimization:
● Inspired by the foraging behavior of ants, this algorithm uses pheromone trails to
guide the search for the optimal solution.
Genetic Algorithms:
● Inspired by biological evolution, genetic algorithms use mutation,
crossover, and selection to evolve a population of potential solutions.
Particle Swarm Optimization:
● Simulates the social behavior of birds or fish, where a swarm of particles moves
through the search space to find the optimal solution.
Bayesian Optimization:
● Utilizes probabilistic models to model the objective function and guide the search
toward promising regions.
Quasi-Newton Methods:
● Iterative optimization methods that use an approximation of the Hessian matrix to
find the optimal solution efficiently.
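The two simplest approaches above, grid search and random search, can be sketched side by side. The multimodal test function and the search range below are made up for illustration; its global minimum sits at x = 0 among many local minima.

```python
import math
import random

# Sketch: grid search vs. random search for the global minimum of a
# multimodal 1-D function (illustrative function and ranges).

def f(x):
    # x^2 gives the global trend; the sin^2 term adds local minima.
    return x * x + 10 * math.sin(x) ** 2

def grid_search(func, lo, hi, steps):
    best_x = min((lo + (hi - lo) * i / (steps - 1) for i in range(steps)),
                 key=func)
    return best_x, func(best_x)

def random_search(func, lo, hi, samples, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    best_x = min((rng.uniform(lo, hi) for _ in range(samples)), key=func)
    return best_x, func(best_x)

x_g, f_g = grid_search(f, -10, 10, steps=201)
x_r, f_r = random_search(f, -10, 10, samples=200)
print(round(x_g, 2), round(f_g, 4))
print(round(x_r, 2), round(f_r, 4))
```

Both methods scale poorly with dimension because the number of points needed grows exponentially, which is exactly why the population-based and model-based methods listed above (evolutionary algorithms, simulated annealing, Bayesian optimization) exist.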