
Computer Vision

By
Dr. [Link]
Artificial Intelligence Department.
Unit-5 Syllabus
Motion Analysis :

Background Subtraction and Modeling, Optical Flow, KLT, Spatio-Temporal Analysis, Dynamic Stereo; Motion
parameter estimation.

Shape from X :

Light at Surfaces; Phong Model; Reflectance Map; Albedo Estimation; Photometric Stereo; Use of Surface
Smoothness Constraint; Shape from Texture, Colour, Motion and Edges.
Motion Analysis
• Motion analysis and tracking are important tasks in computer vision that involve
understanding and tracking the movement of objects in videos or sequences of
images.
• These tasks provide valuable information about object dynamics, behavior, and
interactions over time.
• This field plays a crucial role in various applications, including surveillance,
autonomous navigation, sports analysis, robotics, and more.
Key components and techniques
1. Object Detection: Before tracking can begin, computer vision systems need to identify objects or regions of interest within
a video stream or image sequence. Object detection algorithms, such as YOLO (You Only Look Once) or Faster R-CNN, are
commonly used for this purpose.
2. Motion Estimation: Motion analysis starts with estimating how objects within the video sequence move over
time. Various techniques are employed for this, based on optical flow, which computes the movement of pixels between
consecutive frames; sparse methods such as Lucas-Kanade and dense methods such as Horn-Schunck are commonly used.
3. Feature Tracking: Once objects or points of interest are detected, feature tracking algorithms help maintain
correspondence between these features across frames. Common features include key points, corners, or unique textures.
The Lucas-Kanade tracker is an example of a popular feature tracking technique.
4. Object Tracking: Object tracking involves following the entire object’s movement throughout the video sequence. This
requires associating the detected object in one frame with the same object in subsequent frames. Various tracking
algorithms are used for this, such as the Kalman filter, Particle filter, and Mean-Shift tracking.
5. Multiple Object Tracking: In scenarios where multiple objects are present, multiple object tracking algorithms must be
used to keep track of each object individually and avoid confusion. Data association techniques are used to match objects
between frames correctly.
6. Motion Analysis and Trajectory Prediction: After tracking objects, motion analysis can be performed to study object
trajectories, speed, acceleration, and behavior patterns. This information can be crucial in applications like traffic
monitoring, where predicting the future positions of vehicles is essential for safety and traffic management.
Background Subtraction and Modeling
• Background subtraction is a widely used approach to detect moving
objects in a sequence of frames from static cameras.
• The basic idea is to detect moving objects from the
difference between the current frame and a reference frame, which is
often called the 'background image' or 'background model'.
• Background subtraction is typically performed by detecting the
foreground objects in a video frame; foreground detection is the
main task of this whole approach.
• Many applications do not need to know the entire contents of the sequence;
further analysis is focused on the part of the sequence of interest, namely the
particular objects in its foreground.
• After preprocessing steps such as denoising and morphological
processing, object localisation is carried out, and foreground detection is
used at that stage.
• All present detection techniques are based on modelling the background of
the image, i.e., establishing the background and detecting the changes that occur.
• Defining a proper background can be very difficult when it contains shapes,
shadows and moving objects.
• While defining the background, all techniques assume that
stationary objects can vary in colour and intensity over time.
Foreground removal of video sequences

• A good foreground detection system should develop a robust foreground
model that is immune to lighting changes, repetitive movements (such as
leaves, waves, and shadows) and long-term changes.
Methods involved

• Background subtraction is generally based on a static-background
hypothesis, which is not really applicable in real-time situations.
• With indoor scenes, reflections or animated images on screens lead to
background changes.
• To deal with such issues, the following methods are used, depending on
the application.
Temporal average filter
• This filter estimates the background model from the median of the
pixels in the previous frames of the sequence.
• It uses a buffer of recent pixel values to update the median for each
image.
• To model the background, the system examines all frames in a given
time period, called the training period, during which the median is
calculated pixel by pixel over all frames.
• After the training period, each new pixel value in each new frame is
compared with the previously calculated background value; if the
input pixel of the frame under observation is within the threshold
limit, it is labelled a background pixel; otherwise it is labelled a
foreground pixel.
This method is not as efficient as
it needs to be, because it relies on
a frame buffer, which incurs a high
computational cost, and it has no
rigorous statistical basis.
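As a concrete illustration, here is a minimal Python/OpenCV sketch of the temporal median idea described above; the buffer length, threshold value and video file name are assumptions chosen for illustration.

```python
import cv2
import numpy as np

BUFFER_LEN = 25      # assumed number of recent frames kept in the buffer
THRESHOLD = 30       # assumed intensity difference separating foreground from background

cap = cv2.VideoCapture("traffic.mp4")   # hypothetical input video
buffer = []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Maintain a buffer of recent frames and estimate the background
    # as the per-pixel median over that buffer.
    buffer.append(gray)
    if len(buffer) > BUFFER_LEN:
        buffer.pop(0)
    background = np.median(np.stack(buffer), axis=0).astype(np.uint8)

    # Pixels far from the median background are labelled foreground.
    foreground_mask = (cv2.absdiff(gray, background) > THRESHOLD).astype(np.uint8) * 255

    cv2.imshow("foreground", foreground_mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```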
Conventional Approaches

• Any robust background subtraction model should be able to handle
light intensity changes and repeated motion, as well as long-term scene
changes.
• Such an approach can be modelled mathematically using a function
P(x, y, t) for the video sequence, where t is the time dimension and x
and y are the pixel locations.
• Example: P(4, 5, 2) is the pixel intensity at pixel location (4, 5) of the
image at t = 2 in the video sequence.
• Frame Difference
• Mathematically it can be modelled as
      | Frame_i − Frame_{i-1} | > Threshold
• The estimated background in the frame-difference
approach is simply the previous frame,
thresholded as above.
• This approach can be used to segment moving
objects such as cars, pedestrians, etc.
• It evidently works only under particular
conditions of object speed and frame rate.
• It is also very sensitive to the threshold value, so
depending on object structure, speed, frame
rate and the global threshold, this approach
has limited use cases.
Below are some cases of this approach based on threshold values
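A minimal sketch of the frame-difference approach; the threshold value and video path are assumptions.

```python
import cv2

THRESHOLD = 25                         # assumed global threshold
cap = cv2.VideoCapture("street.mp4")   # hypothetical input video

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # | Frame_i - Frame_{i-1} | > Threshold  ->  moving pixels
    diff = cv2.absdiff(gray, prev_gray)
    _, motion_mask = cv2.threshold(diff, THRESHOLD, 255, cv2.THRESH_BINARY)

    prev_gray = gray
    cv2.imshow("motion", motion_mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
```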
The Mixture of Gaussians

• The Mixture of Gaussians (MoG) models each background pixel with a
mixture of k Gaussian distributions, with k typically between 3 and 5.
• The assumption is that the different distributions each represent
different background and foreground colours.
• The weight of each distribution in the model is proportional to the
amount of time that colour stays at that pixel.
• Therefore, when the weight of the distribution matching a pixel is low,
that pixel is classified as a foreground pixel.
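OpenCV provides a Gaussian-mixture background subtractor (MOG2); a minimal sketch follows, with parameter values chosen as assumptions.

```python
import cv2

# history = number of frames used to build the per-pixel mixture model;
# varThreshold controls how far a pixel may deviate before being called foreground.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)

cap = cv2.VideoCapture("lobby.mp4")    # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Pixels whose best-matching Gaussian has low weight are marked foreground (255);
    # detected shadows are marked with an intermediate value.
    fg_mask = subtractor.apply(frame)
    cv2.imshow("MoG foreground", fg_mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
```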
Optical flow
• Optical flow is the motion of
objects between consecutive
frames of a sequence, caused by
the relative movement between
the object and the camera.
• Between consecutive frames,
we can express the image
intensity I as a function of
space (x, y) and time (t).
• In other words, if we take the first
image I(x, y, t) and move its pixels
by (dx, dy) over time dt, we obtain
the new image I(x+dx, y+dy, t+dt).
• First, we assume that pixel intensities of an object are constant
between consecutive frames:
      I(x, y, t) = I(x+dx, y+dy, t+dt)
• Second, we take the Taylor series approximation of the right-hand side
and remove common terms:
      (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0
• Third, we divide by dt to derive the optical flow equation:
      (∂I/∂x)u + (∂I/∂y)v + ∂I/∂t = 0
• where u = dx/dt and v = dy/dt
• ∂I/∂x, ∂I/∂y, and ∂I/∂t are the image gradients along the
horizontal axis, the vertical axis, and time.
• Hence, the optical flow problem is that of solving for u (= dx/dt)
and v (= dy/dt) to determine movement over time.
• Sparse optical flow provides the flow vectors for selected “interesting
features” (such as the edges or corners of an object) in the frame,
• Dense optical flow provides the flow vectors for all pixels in the
frame, up to one flow vector per pixel.
• As expected, dense optical flow offers higher accuracy, but at the
cost of being computationally expensive and slow.
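A minimal sketch of dense optical flow using OpenCV's Farneback implementation; the pyramid and window parameters and the video path are assumptions.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("sequence.mp4")   # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Dense flow: one (u, v) vector per pixel, consistent with the constraint
    # (dI/dx)*u + (dI/dy)*v + dI/dt = 0 over local polynomial expansions.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u, v = flow[..., 0], flow[..., 1]
    magnitude, angle = cv2.cartToPolar(u, v)
    print("mean flow magnitude:", float(np.mean(magnitude)))

    prev_gray = gray
```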
Kanade-Lucas-Tomasi (KLT) Tracker
• One of the most popular methods for computing optical flow is the
Lucas-Kanade method.
• This method estimates the flow field by tracking the displacement of small
patches of pixels within an image over time.
• The method starts by dividing the image into small patches, and then tracks
the displacement of each patch between two consecutive frames.
• The displacement of each patch is determined by minimizing the difference
between the patch in the first frame and the patch in the second frame.
• The Lucas-Kanade method requires key-points as input for its computation,
to identify which pixels to track for motion.
• One common approach for obtaining these key-points is to use the
Shi-Tomasi corner detection technique, which detects corners of objects
and passes them to the LK method for further processing.
• How should we select features?
• How should we track them from frame to frame?
KLT Algorithm
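A minimal sketch of the KLT pipeline, using Shi-Tomasi corner selection followed by pyramidal Lucas-Kanade tracking; parameter values and the video path are assumptions.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("walking.mp4")     # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Feature selection: Shi-Tomasi "good features to track" (corners).
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=10)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Tracking: pyramidal Lucas-Kanade estimates each patch's displacement by
    # minimising the intensity difference between the two frames.
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, pts, None, winSize=(21, 21), maxLevel=3)

    # Keep only the points that were tracked successfully.
    good_new = new_pts[status.flatten() == 1]
    good_old = pts[status.flatten() == 1]
    for (x1, y1), (x0, y0) in zip(good_new.reshape(-1, 2), good_old.reshape(-1, 2)):
        cv2.line(frame, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 2)

    cv2.imshow("KLT tracks", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

    prev_gray = gray
    pts = good_new.reshape(-1, 1, 2)
```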
Applications
• Video compression: By using optical flow to forecast pixel motion between frames, it is
feasible to lower the amount of data required to describe a movie.
• Object tracking: In applications like surveillance and driverless cars, it is possible to track
moving objects in a video using optical flow.
• Image stabilization: Images or movies can be made more steady by employing optical
flow to detect camera motion.
• Scene comprehension: Optical flow can be used to extract information about the
organization and movement of items in a scene, information that can be used to infer the
scene’s layout and the actions of the objects inside it.
• Augmented reality: By using optical flow to estimate camera and real-world object
motion, augmented reality experiences can be made that are stable and lifelike.
• Human-Computer Interaction: Interactive games or systems can be operated using
optical flow and hand gestures or motions.
• Robotics: The velocity of a robot and its surroundings may be estimated using optical
flow, which can be utilized for mapping, localization, and navigation.
Spatio-Temporal Analysis

• Spatial refers to space.
• Temporal refers to time.
• Spatiotemporal, or spatial-temporal, is used in data analysis when
data is collected across both space and time.
• It describes a phenomenon at a certain location and time, for
example, shipping movements across a geographic area over time.
• A person uses spatial-temporal reasoning to solve multi-step
problems by envisioning how objects move in space and time.
Different kinds of ST data
• Event Data : Data collected at point location and point time are events. Disease
Outbreaks, Crimes and Traffic Accidents are all examples of point data.
• Trajectory Data: Trajectory data, as the name suggests, is the space occupied by a
subject at a given time, i.e. the path of the subject. Good examples of such data
are sensor data captured for moving bodies, traffic data and location-based services.
• Point Reference Data: Point reference data consists of the capture of a single field
over a certain area in certain periods of time. An example of this would be
humidity of different points in a state over a period of time. The key point to note
is that it is a continuous and not a discrete variable.
• Raster Data: Raster data is the measurement of a continuous or a discrete ST field
in fixed locations at fixed periods of time. A good example of such data is the fMRI
Images of brain captured over fixed time intervals to detect blood flow change.
• Video Data: Video data exhibits both spatial correlation and temporal variation;
it is also a form of ST data.
Represent the data
• An ST point is a tuple containing the spatial and temporal components
along with additional information, for example:
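For illustration, a minimal sketch of such a tuple as a small record type; the field names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class STPoint:
    """A spatio-temporal point: spatial coordinates, a timestamp,
    and any additional measurements attached to that location/time."""
    latitude: float
    longitude: float
    timestamp: datetime
    attributes: dict          # e.g. {"speed_kmh": 42.0, "event": "accident"}

p = STPoint(28.61, 77.20, datetime(2023, 5, 1, 9, 30), {"event": "traffic_accident"})
```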
What makes ST data different

• ST data is not very conducive to traditional data-mining techniques,
for a variety of reasons:
• Classical datasets are discrete, while ST data is embedded in a continuous space, i.e.,
the data is captured at different points in space and time
• ST patterns also exhibit correlations and are not independently generated
• As an example, consider the spread of a contagion through a country.
There is an identifiable spatial correlation in the mortality rates of
states that are closer to each other, compared with states that are
farther apart.
• On the other hand there is also a temporal correlation imposed on it.
The presence of such autocorrelation breaks the “Independent
Samples” assumption that is foundational to traditional data mining
techniques.
Framework to solve an STDM problem

• The steps would ideally be:
• The raw data from the different sources of ST data has to be structured for
data storage. The data instances take the form of time series, spatial maps,
points or raster data.
• The ST data thus constructed has to be further processed to fit the different
deep learning models. Data is usually represented as sequence data, 2D
matrices, 3D tensors or graphs.
• Based on the data, one can choose RNNs or LSTMs (temporal data),
CNNs (spatially correlated data) or a hybrid model that can handle
both. Finally, the deep learning models thus constructed can be used
for tasks such as prediction and classification; a minimal sketch of such
a hybrid model is given below.
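A minimal PyTorch sketch of such a hybrid model: a small CNN extracts a per-frame spatial feature vector and an LSTM models the temporal sequence. All layer sizes, the number of classes, and the input shape are assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Spatial part: a small CNN applied independently to each frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),       # -> (B*T, 32, 1, 1)
        )
        # Temporal part: an LSTM over the per-frame feature vectors.
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):                  # x: (batch, time, 3, H, W)
        b, t, c, h, w = x.shape
        feats = self.cnn(x.reshape(b * t, c, h, w)).reshape(b, t, 32)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])       # prediction from the last time step

model = CNNLSTM()
clip = torch.randn(2, 8, 3, 64, 64)        # 2 clips of 8 frames each
print(model(clip).shape)                   # torch.Size([2, 10])
```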
Light at Surfaces
• Consider a simple scene:
• light rays hit surface A and are scattered
according to the material properties
• some of the light from A hits B
• some of the light from B hits A
• the observed colour is a combination of direct and reflected
illumination
• Recursive process
• Inter-reflection between surfaces causes colour bleeding, e.g.,
a white surface next to a red surface appears reddish
• The observed colour is a result of multiple interactions
among light sources and reflective surfaces
• Requires global solution
• integrating all rays of light from all light sources and surface
inter-reflections based on material properties
Light sources

• Two fundamental processes
• self-emission (due to an internal energy source)
• reflection
• Simple light sources (only self-emission)
• object with a surface
• each point on the surface (x,y,z) emits light
• light emitted at a point is a function of direction (θ,φ) and wavelength λ
• General light source illumination function: I(x,y,z,θ,φ,λ)
• a six-parameter function
• Total illumination = integral over the surface of the light source, over all
angles and all wavelengths
• For a distributed light source (such as a light bulb) solving the integral
of 6 parameters is difficult and computationally expensive
• 4 Basic light source types
• Ambient - equal light in all directions
• Point source - light from a single point in all directions
• Spot light - point source with limited range of directions
• Distant source - all light rays are parallel
• with this combination of lights we can approximate most physical
lighting conditions
• consider modeling of each in detail
Light Colour

• In addition to light intensity we must model the emitted light colour
• the amount of light emitted varies with frequency: I(λ)
• In practice we can approximate the colour with a 3-component (r,g,b)
colour model
• the human visual system is based on three-colour theory, which says that we perceive
three primary colours (red, green, blue) rather than a full colour distribution
• therefore, we can generate images with a realistic appearance using a 3-colour
model
• Describe the light source colour with a 3 component intensity or
‘luminance’ function
• I = [Ir,Ig,Ib]
Ambient Illumination
• Uniform illumination in all directions, e.g.,
the sun on a cloudy day
• Could be modelled as many distributed
sources added together - very expensive
• Instead, model uniform illumination of each
surface point by an 'ambient light' intensity
• Ia = [Iar,Iag,Iab]
• Every point receives the same
illumination Ia
• each surface point can reflect it differently
according to its material properties
Point Source

• Emits light equally in all directions from a position p0
• A point source produces high-contrast scenes:
objects appear either bright due to direct
illumination or dark due to no illumination (hard
shadows)
• In real scenes, light sources have a finite size and
produce soft shadows
• a combination of ambient and point sources gives soft
shadows
Spot Lights

• A point source with a narrow range of
angles through which light is emitted
• Limit light to a cone whose apex is at ps,
with direction ls and width θ
• A more realistic spot light has a
non-uniform intensity distribution
across the cone
Distant Light Sources
• Light rays are parallel
• all rays have the same direction, so we
don't need to recompute the direction
vector (p - p0) for each surface point
• the calculation for parallel light sources is
analogous to parallel projection
• In homogeneous coordinates for parallel
projection, we can represent a distant light
source as a vector rather than a point, as is
used for a point source
Light-Surface Interaction

• When light strikes a surface, some is absorbed and some is reflected
• the light-surface interaction depends on material properties (colour/roughness)
• it is a function of wavelength
• shading also depends on the surface orientation relative to the light source and viewer
• Light-surface interaction can be classified into three categories
• Specular
• most of the light is reflected along a single axis at the reflection angle, i.e., a shiny surface
• a mirror is a perfect specular surface
• Diffuse
• light is scattered in all directions, i.e., a matte surface
• a perfect diffuse surface scatters light equally in all directions and appears the same from all directions
• Translucent
• some light penetrates the surface to emerge elsewhere through refraction, e.g., glass/water
• some incident light may also be reflected
Phong Reflection Model
Phong Ambient Light
Phong Diffuse Reflection
Phong Specular Reflection
Implementation
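A minimal NumPy sketch of the Phong reflection model for a single point light, evaluating I = ka*Ia + kd*Id*max(0, l·n) + ks*Is*max(0, r·v)^shininess at one surface point; the material coefficients and the vectors in the usage example are assumptions, and the same light intensity is reused for the diffuse and specular terms for simplicity.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def phong(n, l, v, ambient, light, ka=0.1, kd=0.7, ks=0.4, shininess=32):
    """Phong shading at one surface point.
    n: surface normal, l: direction to the light, v: direction to the viewer,
    ambient/light: RGB intensities; ka/kd/ks/shininess are assumed material constants."""
    n, l, v = normalize(n), normalize(l), normalize(v)

    # Diffuse (Lambertian) term: proportional to cos(theta) = l . n, clamped at 0.
    diff = max(0.0, float(np.dot(l, n)))

    # Specular term: reflect l about n, compare with the view direction.
    r = 2.0 * np.dot(l, n) * n - l
    spec = max(0.0, float(np.dot(normalize(r), v))) ** shininess

    return ka * ambient + kd * diff * light + ks * spec * light

colour = phong(n=np.array([0.0, 0.0, 1.0]),
               l=np.array([1.0, 1.0, 1.0]),
               v=np.array([0.0, 0.0, 1.0]),
               ambient=np.array([0.2, 0.2, 0.2]),
               light=np.array([1.0, 1.0, 1.0]))
print(colour)
```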
Reflectance Map
• The apparent "brightness" of a surface patch depends on the
orientation of the patch relative to the viewer and the light sources.
• Different surface elements of a nonplanar object will reflect different
amounts of light towards an observer as a consequence of their
differing attitude in space.
• A smooth opaque object will thus give rise to a shaded image, one in
which brightness varies spatially, even though the object may be
illuminated evenly and covered by a uniform surface layer.
• This shading provides important information about the object's shape
• A convenient representation for the relevant information is the
"reflectance map".
• The reflectance map, R(p, q), gives scene radiance as a function of
surface gradient (p, q) in a viewer-centered coordinate system.
• If z is the elevation of the surface above a reference plane lying
perpendicular to the optical axis of the imaging system, and if x and y
are distances in this plane measured parallel to orthogonal
coordinate axes in the image, then p and q are the first partial
derivatives of z with respect to x and y
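For a Lambertian surface lit by a distant source whose direction corresponds to the gradient (ps, qs), the reflectance map takes the well-known form R(p, q) = (1 + p*ps + q*qs) / (sqrt(1 + p² + q²) * sqrt(1 + ps² + qs²)), clamped at zero. A minimal sketch follows; the gradient grid and light direction are assumptions.

```python
import numpy as np

def lambertian_reflectance_map(p, q, ps, qs):
    """Lambertian reflectance map R(p, q): scene radiance as a function of the
    surface gradient (p, q) for a distant light source described by the
    gradient (ps, qs). Self-shadowed orientations are clamped to zero."""
    num = 1.0 + p * ps + q * qs
    den = np.sqrt(1.0 + p**2 + q**2) * np.sqrt(1.0 + ps**2 + qs**2)
    return np.maximum(0.0, num / den)

# Evaluate R over a grid of gradients for an assumed light direction (ps, qs) = (0.5, 0.3).
p, q = np.meshgrid(np.linspace(-2, 2, 5), np.linspace(-2, 2, 5))
print(lambertian_reflectance_map(p, q, 0.5, 0.3))
```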
• When light hits an object's surface, it is scattered and reflected.
• The most general model of light scattering is the bidirectional reflectance
distribution function (BRDF).
• The BRDF is a four-dimensional function that describes how much of each
wavelength arriving along an incident direction v̂_i is emitted in a reflected
direction v̂_r.
• The function can be written in terms of the angles of the incident and
reflected directions relative to the surface frame as
      f_r(θ_i, φ_i, θ_r, φ_r; λ)
• The BRDF is reciprocal, i.e., because of the physics of light transport, the
roles of v̂_i and v̂_r can be interchanged. This is sometimes called Helmholtz
reciprocity.
Diffuse reflection

• The diffuse component (also known as Lambertian or matte
reflection) scatters light uniformly in all directions and is the
phenomenon we most commonly associate with shading, e.g., the
smooth (non-shiny) variation of intensity with surface normal.
• The shading equation for diffuse reflection can thus be written as
      L_d(v̂_r; λ) = Σ_i L_i(λ) f_d(λ) cos⁺θ_i = Σ_i L_i(λ) f_d(λ) [v̂_i · n̂]⁺
• where [v̂_i · n̂]⁺ = max(0, v̂_i · n̂).
Specular reflection


Albedo Estimation
• The characteristic color of an object is called albedo
• Note that an object can not reflect more light than it receives (unless it emits
light, which is the case of light sources).
• The color of an object can generally be computed (at least for diffuse surfaces) as
the ratio of reflected light over the amount of incoming (white) light.
• Because an object can not reflect more light than it receives, this ratio is always
lower than 1.
• This is why the colors of objects are always defined in the RGB system between 0
and 1 if you use floats, or between 0 and 255 if you use a byte to encode colors.
• It can be easier to think of this ratio as a percentage. For instance, if the ratio, the color,
or the albedo (these terms are interchangeable here) is 0.18, then the object
reflects 18% of the light it receives back into the environment.
• If we defined the color of an object as the ratio of the amount of reflected light over the amount of
light incident on the surface (as explained in the note above), that color can't be greater than one.
• This doesn't mean, though, that the amount of light incident on and reflected off the surface of an
object can't be greater than one (it's only the ratio between the two that can't be greater than
one). What we see with our eyes is the amount of light incident on a surface multiplied by the
object's color.
• For example, if the energy of the light impinging upon the surface is 1000, and the color of the
object is 0.5, then the amount of light reflected by the surface to the eye is 500 (this is wrong from
the point of view of physics, but this is just for you to get the idea - in the lesson on shading and
light transport, we will look into what this 1000 or 500 values mean in terms of physical units, and
learn that it's more complicated than just multiplying the number of photons by 0.5 or whatever
the albedo of the object is)
• Thus assuming we know what the color of an object is, to compute the actual brightness of a point
P on the surface of that object under some given lighting conditions (brightness as in the actual
amount of light energy reflected by the surface to the eye and not as in the actual brightness or
luminance of the object's albedo), we need to account for two things:
• How much light falls on the object at this point?
• How much light is reflected at this point in the viewing direction?
• It's too complex to simulate
light-matter interactions (interactions
happening at the microscopic and
atomic levels). Thus, we need to come
up with a different solution.
• The amount of light reflected from a
point varies with the view direction.
• The amount of light reflected from a
point for a given view direction
depends on the incoming light
direction.
Photometric stereo

• Photometric stereo is the problem of recovering the 3-dimensional
shape of a stationary scene given a collection of images of the scene
taken under variable lighting conditions
• Shading reveals 3D surface geometry
• Two shape-from-X methods that use shading
• Shape-from-shading: Use just one image to recover shape. Requires
knowledge of light source direction and BRDF everywhere. Too restrictive to
be useful
• Photometric stereo: Single viewpoint, multiple images under different lighting
• Basically, photometric stereo assesses the surface normals of an object through
observation of that object under different lighting conditions but viewed from the
same position, by exploiting variations in intensity of the lighting conditions
• During this process, it is assumed that the camera does not move in relation to
the illumination and no other camera settings are changed while grabbing the
series of images.
• The resulting images are used together to create a single composite image, in
which the resulting radiometric constraint makes it possible to obtain local
estimates of both surface orientation and surface curvature
• Photometric stereo imaging aims to illustrate the surface orientation and
curvature of an object based on data derived from a known combination of
reflectance and lighting in multiple images—an invaluable process for machine
vision applications with tight quality control and assurance demands, such as on
production or manufacturing lines
• Lambert first outlined the concept of perfect light diffusion or
“Lambertian reflectance”
• A Lambertian surface is considered the ideal matte surface, wherein
surface radiation is uniform in all directions regardless of the
observer’s angle of view
• Photometric stereo theory was subsequently expanded to encompass
non-Lambertian reflectance models, including those from Phong,
Torrance-Sparrow, and Ward—and more recent studies are
expanding the potential of the technique
• Photometric stereo registration operations use material reflectance properties
and the surface curvature of objects when calculating the resulting enhanced
image based on Lambertian (matte, diffuse) surfaces. In this case, the diffuse
reflected intensity I is proportional to the cosine of the angle between the incident light
direction L and the surface normal n.
• This proportionality is scaled by the albedo reflectivity alpha, as stated by
Lambert's Cosine Law.
• The albedo of a surface is the fraction of the incident light that the surface
reflects.
• Therefore, the unknowns to be determined in this example are the surface normal
n and the albedo reflectivity alpha. The light is considered distant and all rays of
light are considered parallel. Given the L directions within the illumination
configuration (one per light source), the directions L are thus known in advance
• Determining these unknowns begins with acquiring images of an object for
a minimum of three non-coplanar light directions, wherein each image
corresponds to a specific light direction.
• Using these unique directional lighting images, the albedo values and
normals can be calculated for each point of the object
• Information about the surface normals derived from photometric stereo
analysis can reveal important information about the objects’ surface, such
as the presence of surface irregularities (like embossed or engraved
features, scratches, or indentations) that can occur despite the expectation
that the surface is smooth.
• Although the photometric stereo technique requires at least three images
to determine the normals in a given scene, more directional lighting
sources than this minimal number are typically used to reduce noise
inherent in the imaging process and to generate more accurate images.
• The redundancy provided by multiple images leads to better analysis
results; in practice, a minimum of four images is typically used.
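A minimal sketch of least-squares photometric stereo under the Lambertian assumption: stacking the K per-pixel intensities gives I = L·g with g = albedo·n, which is solved in the least-squares sense for each pixel. The light directions and images in the usage example are placeholders.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Recover per-pixel albedo and unit surface normals from K >= 3 grayscale
    images of a Lambertian surface taken from a fixed viewpoint under known,
    distant, non-coplanar light directions.
    images:      array of shape (K, H, W)
    light_dirs:  array of shape (K, 3), one unit light direction per image
    """
    K, H, W = images.shape
    I = images.reshape(K, -1)                             # (K, H*W) stacked intensities

    # Lambert's cosine law per pixel: I = L @ g, with g = albedo * normal.
    # With more images than unknowns, solve in the least-squares sense.
    g, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)    # (3, H*W)

    albedo = np.linalg.norm(g, axis=0)                    # (H*W,)
    normals = g / np.maximum(albedo, 1e-8)                # unit normals, (3, H*W)
    return albedo.reshape(H, W), normals.reshape(3, H, W)

# Tiny synthetic usage example with four assumed light directions.
L = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [-1, -1, 1]], dtype=float)
L /= np.linalg.norm(L, axis=1, keepdims=True)
imgs = np.random.rand(4, 8, 8)                            # placeholder images
albedo, normals = photometric_stereo(imgs, L)
print(albedo.shape, normals.shape)
```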
