Robotics Chapter 5 - Robot Vision
ROBOT VISION
Robotic vision may be defined as the process of acquiring and extracting information from images of the 3-D world. It is primarily concerned with the manipulation and interpretation of images and the use of this information in controlling robot operation.
Machine vision is concerned with the sensing of vision data and its interpretation by a computer. The typical vision system consists of a camera and digitizing hardware, a digital computer, and the hardware and software necessary to interface them. This interface hardware and software is often referred to as a preprocessor. The operation of the vision system consists of three functions:
1. Sensing and digitizing image data
2. Image processing and analysis
3. Application
The relationships between the three functions are illustrated in figure 5.1.
The sensing and digitizing functions involve the input of vision data by means of a camera focused on the
scene of interest. Special lighting techniques are frequently used to obtain an image of sufficient contrast for
later processing. The image viewed by the camera is typically digitized and stored in computer memory.
The digital image is called a frame of vision data, and is frequently captured by a hardware device called a
frame grabber. These devices are capable of digitizing images at the rate of 30 frames per second. The
frames consist of a matrix of data representing projections of the scene sensed by the camera. The elements
of the matrix are called picture elements or pixels. A single pixel is the projection of a small portion of the
scene which reduces that portion to a single value. The value is a measure of the light intensity for that
element of the scene. Each pixel intensity is converted into a digital value.
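As a concrete sketch of this digitizing step (not from the text; the intensity values below are invented for illustration), a frame can be pictured as a matrix of pixel intensities quantized to digital values:

```python
import numpy as np

# Made-up analog light intensities in [0.0, 1.0] for a tiny 2 x 2
# portion of a scene, one value per pixel.
analog = np.array([[0.10, 0.85],
                   [0.42, 0.97]])

# Each pixel intensity is converted into a digital value; here we
# quantize to 8 bits (0..255), as a frame grabber's A/D stage would.
digital = np.round(analog * 255).astype(np.uint8)
```

A real frame grabber produces such a matrix for the whole scene, typically 30 times per second.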
The digitized image matrix for each frame is stored and then subjected to image processing and analysis
functions for data reduction and interpretation of the image. Typically an image frame will be thresholded to
produce a binary image, and then various feature measurements will further reduce the data representation
of the image. This data reduction can change the representation of a frame from several hundred thousand
bytes of raw image data to several hundred bytes of feature value data. The resultant feature data can be
analyzed in the available time for action by the robot system.
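The scale of this reduction can be checked with a quick calculation (the frame size and feature count below are assumptions, not figures from the text):

```python
# A 512 x 512 frame at 1 byte (8 bits) per pixel:
raw_bytes = 512 * 512 * 1      # 262,144 bytes of raw image data

# After analysis, the frame might be summarized by a few dozen
# feature values (area, perimeter, diameter, ...) of 8 bytes each:
feature_bytes = 32 * 8         # 256 bytes of feature data

reduction_factor = raw_bytes / feature_bytes
```

Under these assumptions the feature representation is about a thousand times smaller than the raw frame, which is what makes real-time analysis by the robot system feasible.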
5.1.3 Application: The current applications of machine vision in robotics include inspection, part
identification, location and orientation.
5.2 Classification of vision system
The basic process of imaging, getting an image for computer processing, from the light source to an
algebraic image array is depicted by the schematic in figure 5.2.
The light source illuminates the object and the camera captures the reflected light. The image formed in the
camera is converted into an analog signal (voltage) with the help of suitable transducers. Finally, the analog
voltages are digitized and converted into an algebraic array. This array is the image to be processed and
interpreted by the computer according to predefined algorithms.
Figure 5.2 Capturing an image and its digitization for further computer processing
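The last step of that chain, turning the analog voltage into an algebraic array, can be sketched as sampling followed by quantization (the signal shape and the 0-5 V range are assumptions for illustration):

```python
import numpy as np

# Hypothetical analog camera output along one scan line, 0..5 V.
t = np.linspace(0.0, 1.0, 8)                   # 8 spatial sample points
voltage = 2.5 + 2.5 * np.sin(2 * np.pi * t)    # made-up analog signal

# Quantize each sample to an 8-bit level (0..255): one row of the
# algebraic image array the computer will process.
levels = np.clip(np.round(voltage / 5.0 * 255), 0, 255).astype(np.uint8)
```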
5.4 Image Acquisition
The first link in the vision chain is the camera. It plays the role of the robotic eye, or the sensor. This is the
imaging component, a noncontact or remote sensor. The visual information is converted into electrical
signals in the camera; when sampled spatially and quantized, these signals give a digital image in real
time by a process called digitizing.
Robotic vision cameras are essentially optoelectronic transducers, which convert an optical input signal into
an electrical output signal. They fall in the domain of TV cameras. A variety of camera technologies is
available for imaging. Some of these are the black-and-white vidicon tube and solid-state cameras based on
charge-coupled devices (CCD), charge injection devices (CID), and silicon bipolar sensors.
5.4.1 Vidicon tube
The basic structure of the vidicon camera tube is shown in figure 5.3. The optical image is formed on the
glass faceplate coated with a thin photosensitive layer composed of a large number of tiny photoresistive
elements. The resistance of an element decreases with increasing illumination. Once the image forms on the
faceplate, a charge is accumulated, which is a function of the intensity of the impinging light over a specified
time, from which an electrical video signal is derived.
The charge built up is read by scanning the photosensitive layer by a focused electron beam produced by the
electron gun at the rear of the tube. The scanning is controlled by a deflection coil mounted along the length
of the tube. The electron beam is made to scan the entire surface, typically, 30 times per second, line by line,
consisting of over 500 scan lines for the whole image as shown in the figure. Each complete scan is called a
frame.
5.4.2 Charge-coupled devices
The charge-coupled device (CCD) falls in the category of solid-state semiconductor devices. A monolithic array of
closely spaced metal oxide semiconductor elements forms the photosensitive layer. The light is absorbed on the
photoconductive substrate and charge accumulates around the isolated “wells” under control of electrodes as
shown in figure 5.4. Each isolated well represents a pixel. Charges are accumulated for the time it takes to
complete a single image scan. The charge built up is proportional to the intensity of image. Once the charge
is accumulated, it is transferred by the electrodes, line by line, to the registers.
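A toy sketch of that line-by-line transfer (the charge values are invented; a real CCD clocks charge packets through the silicon, which this only imitates in software):

```python
import numpy as np

# Invented charge accumulated in each pixel "well" of a tiny 2 x 3 CCD.
charge = np.array([[3, 7, 1],
                   [0, 5, 9]])

# Transfer the charge line by line into a readout register, then clock
# each register out serially, producing the video data stream.
stream = []
for row in charge:
    register = list(row)     # one line moves into the shift register
    stream.extend(register)  # register contents are read out in order
```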
5.5.1 Image data reduction
The digitized image is stored in the computer memory for processing. This is a substantial task considering
the large amount of data that must be analysed. The objective of data reduction is to reduce the volume of
data, and as a preliminary step in the data analysis, the following two schemes have found common usage:
1. Digital conversion
2. Windowing
Digital conversion reduces the number of gray levels used by the machine vision system. For example, an 8-
bit register used for each pixel would allow 2^8 = 256 gray levels. Depending on the requirements of the
application, digital conversion can be used to reduce the number of gray levels by using fewer bits to
represent the pixel light intensity. Four bits would reduce the number of gray levels to 2^4 = 16. This kind of
conversion would significantly reduce the magnitude of the image-processing problem.
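For instance, keeping only the top 4 bits of each 8-bit pixel collapses 256 gray levels to 16 (a sketch; the pixel values are arbitrary):

```python
import numpy as np

pixels = np.array([0, 17, 130, 255], dtype=np.uint8)  # 8 bits: 256 levels

# Drop the low 4 bits so only 2**4 = 16 gray levels remain (0..15).
reduced = pixels >> 4
```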
Windowing involves using only a portion of the total image stored in the frame buffer for image processing
and analysis. This portion is called the window. For example, for inspection of printed circuit boards, one
may wish to inspect and analyze only one component on the board. A rectangular window is selected to
surround the component of interest and only pixels within the window are analysed. The rationale for
windowing is that proper recognition of an object involves only certain portions of the total scene.
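In an array representation, windowing amounts to selecting a sub-matrix of the frame buffer (the window coordinates below are assumptions for illustration):

```python
import numpy as np

frame = np.arange(100, dtype=np.uint8).reshape(10, 10)  # stand-in frame buffer

# Rectangular window around a hypothetical component of interest:
# rows 2..5 and columns 3..7. Only these pixels are analysed.
window = frame[2:6, 3:8]
```

Here 20 pixels are processed instead of the full 100 in the frame.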
5.5.2 Segmentation
Segmentation is a general term which applies to various methods of data reduction. In segmentation, the
objective is to group areas of an image having similar characteristics or features into distinct entities
representing parts of the image. For example, boundaries (edges) or regions (areas) represent two natural
segments of an image. There are many ways to segment an image. Three important techniques are:
1. Threshold
2. Region growing
3. Edge detection
5.5.2.1 Threshold
Threshold is a binary conversion technique in which each pixel is converted into a binary value, either black
or white. This is accomplished by utilizing a frequency histogram of the image and establishing what
intensity (gray level) is to be the border between black and white. Since it is necessary to differentiate
between the object and background, the procedure is to establish a threshold and assign, for example a
binary bit 1 for the object and 0 for the background. To improve the ability to differentiate, special
lighting techniques must often be applied to generate a high contrast.
When it is not possible to find a single threshold for an entire image (for example, if many different objects
occupy the same scene, each having a different level of intensity), one approach is to partition the total image
into smaller rectangular areas and determine the threshold for each window being analyzed.
Once a threshold is established for a particular image, the next step is to identify the particular areas associated
with objects within the image.
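A minimal sketch of the conversion (the image values and the hand-picked threshold of 128 are assumptions; in practice the threshold is read off the frequency histogram):

```python
import numpy as np

img = np.array([[ 20,  30, 200],
                [ 25, 210, 220],
                [ 30,  35, 205]], dtype=np.uint8)

threshold = 128                               # gray level dividing black/white
binary = (img > threshold).astype(np.uint8)   # 1 = object, 0 = background
```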
5.5.2.2 Region growing
Region growing is a collection of segmentation techniques in which pixels are grouped in regions called
grid elements based on attribute similarities. Defined regions can then be examined as to whether they are
independent or can be merged to other regions by means of an analysis of the difference in their average
properties and spatial connectivity. To differentiate between the objects and the background, assign 1 for
any grid element occupied by an object and 0 for background elements. It is common practice to use a
square sampling grid with pixels spaced equally along each side of the grid. For the two-dimensional image of a
key as shown, this would give the pattern indicated in figure 5.5. This technique of creating runs of 1s and 0s is
often used as a first-pass analysis to partition the image into identifiable segments or blobs. The region
growing segmentation technique is applicable when images are not distinguishable from each other by
straight thresholding or edge detection technique.
Figure 5.5 Image segmentation a) Image pattern with grid b) Segmented image after runs test
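One simple form of region growing can be sketched as a seed-based flood fill over 4-connected neighbours (this particular formulation is an assumption, not the text's exact algorithm):

```python
# Grow a region from a seed pixel, absorbing 4-connected neighbours
# that share the seed's value (1 = object, 0 = background).
def grow_region(img, seed):
    rows, cols = len(img), len(img[0])
    target = img[seed[0]][seed[1]]
    region, stack = set(), [seed]
    while stack:
        r, c = stack.pop()
        if (r, c) in region or not (0 <= r < rows and 0 <= c < cols):
            continue
        if img[r][c] != target:
            continue
        region.add((r, c))
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return region

# Two blobs of 1s: the seed (0, 0) grows only over its connected blob,
# leaving the isolated pixel at (2, 2) as a separate segment.
image = [[1, 1, 0],
         [0, 1, 0],
         [0, 0, 1]]
blob = grow_region(image, (0, 0))
```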
5.5.2.3 Edge detection
Figure 5.6 Edge-following procedure to detect the edge of a binary image
Edge detection considers the intensity change that occurs in the pixels at the boundary or edges of a part.
Given that a region of similar attributes has been found but the boundary shape is unknown, the boundary
can be determined by a simple edge-following procedure. For the binary image shown in figure 5.6,
the procedure is to scan the image until a pixel within the region is encountered. For a pixel within the
region, turn left and step; otherwise, turn right and step. The procedure is stopped when the boundary has been
traversed and the path has returned to the starting pixel.
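The turn-left/turn-right rule can be sketched as follows (a hedged sketch of square tracing; it assumes the blob is padded by background pixels and has no one-pixel-wide necks):

```python
import numpy as np

def follow_edge(img, start):
    """On an object (1) pixel, record it and turn left; on a background
    (0) pixel, turn right; stop on returning to the starting pixel."""
    dirs = [(0, 1), (-1, 0), (0, -1), (1, 0)]   # right, up, left, down
    r, c = start
    d = 0                                       # initially heading right
    boundary = []
    while True:
        if img[r, c] == 1:
            boundary.append((r, c))
            d = (d + 1) % 4                     # turn left
        else:
            d = (d - 1) % 4                     # turn right
        r, c = r + dirs[d][0], c + dirs[d][1]
        if (r, c) == start:
            break
    return boundary

# A 2 x 2 blob surrounded by background so the trace stays in the image.
img = np.zeros((6, 6), dtype=int)
img[2:4, 2:4] = 1
edge = follow_edge(img, (2, 2))   # start = first 1-pixel found by row scan
```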
5.5.3 Feature extraction
In machine vision applications, it is often necessary to distinguish one object from another. This is usually
accomplished by means of features that uniquely characterize the object. Some features of objects that can
be used in machine vision include area, diameter and perimeter. A feature, in the context of vision systems,
is a single parameter that permits ease of comparison and identification. The techniques available to extract
feature values for two dimensional cases can be roughly categorized as those that deal with boundary
features and those that deal with area features. The various features can be used to identify the object or
part and determine the part location and/or orientation.
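As a sketch, two such features, area and perimeter, can be measured directly from a binary image (the blob below is invented; the perimeter estimate used here is one simple convention among several):

```python
import numpy as np

binary = np.array([[0, 1, 1, 0],
                   [0, 1, 1, 0],
                   [0, 1, 1, 0]], dtype=np.uint8)

area = int(binary.sum())   # feature 1: number of object pixels

# Feature 2: perimeter estimated as the count of object pixels with at
# least one 4-connected background (or out-of-image) neighbour.
padded = np.pad(binary, 1)
perimeter = sum(
    padded[r, c] == 1 and 0 in (padded[r - 1, c], padded[r + 1, c],
                                padded[r, c - 1], padded[r, c + 1])
    for r in range(1, padded.shape[0] - 1)
    for c in range(1, padded.shape[1] - 1)
)
```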
5.5.4 Object recognition
The recognition algorithm must be powerful enough to uniquely identify the object. Object recognition
techniques are classified into:
1. Template-matching technique
2. Structural technique.
The basic problem in template matching is to match the object with a stored pattern feature set defined as a
model template. The model template is obtained during the training procedure in which the vision system is
programmed for known prototype objects. The features of the object in the image (e.g., area, diameter,
aspect ratio) are compared to the corresponding stored values. These values constitute the stored template.
When a match is found, allowing for certain statistical variations in the comparison process, then the object
has been properly classified.
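A minimal sketch of such a comparison (the template names, feature values, and 10% tolerance are all assumptions, not values from the text):

```python
# Hypothetical stored templates: feature values recorded during training.
templates = {
    "bolt": {"area": 120.0, "aspect_ratio": 4.0},
    "nut":  {"area": 300.0, "aspect_ratio": 1.0},
}

def classify(measured, templates, tol=0.10):
    """Match measured features against each template, allowing a
    relative tolerance for statistical variation in the measurements."""
    for name, model in templates.items():
        if all(abs(measured[f] - v) <= tol * v for f, v in model.items()):
            return name
    return "unknown"

label = classify({"area": 305.0, "aspect_ratio": 1.05}, templates)
```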
Structural techniques of pattern recognition consider relationships between features or edges of an object.
For example, if the image of an object can be subdivided into four straight lines (called primitives)
connected at their end points, and the connected lines meet at right angles, then the object is a rectangle. The
majority of commercial robot vision systems make use of this approach to the recognition of two-
dimensional objects. The recognition algorithms are used to identify each segmented object in an image
and assign it to a classification (e.g., nut, bolt, flange, etc.).
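The rectangle example can be sketched as a check on the primitives' directions (representing each line by its direction in degrees, with a tolerance; both choices are assumptions for illustration):

```python
# Structural sketch: an object decomposed into four straight-line
# primitives is classified as a rectangle if consecutive primitives
# differ in direction by 90 degrees (within a tolerance).
def is_rectangle(angles, tol=5.0):
    if len(angles) != 4:
        return False
    return all(
        abs((angles[(i + 1) % 4] - angles[i]) % 180 - 90) <= tol
        for i in range(4)
    )
```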