Computer Vision
Computer vision helps us understand the complexity of the human vision system and trains
computer systems to interpret and gain a high-level understanding of digital images or videos.
1959: One of the first experiments related to computer vision took place in 1959, when researchers
showed a cat an array of images and recorded how its brain responded. The results suggested that
visual processing begins with simple shapes such as straight edges.
1963: In another major achievement, scientists developed computers that could transform
2D images into 3D representations.
1974: Optical character recognition (OCR) and intelligent character recognition (ICR)
technologies were introduced. OCR addressed the problem of recognizing text printed in
virtually any font or typeface, whereas ICR can decipher handwritten text.
1982: Algorithms were developed to detect edges, corners, curves, and other shapes.
Scientists also developed a network of cells that could recognize patterns.
2010: The ImageNet dataset, containing millions of tagged images, became available.
It can be considered the foundation of recent convolutional neural network (CNN)
and deep learning models.
2012: A CNN-based model (AlexNet) was used for image recognition and significantly reduced the error rate.
2014: The COCO dataset was released to provide data for object detection and to support
future research.
On a certain level, computer vision is all about pattern recognition: machine systems are
trained to understand visual data such as images and videos.
First, a machine is given a vast amount of labeled visual data to train on. This labeled data
enables the machine to analyze the patterns across all the data points and relate them to
their labels. For example, suppose we provide millions of labeled dog images. The computer
learns from this data by analyzing each photo and the shapes in it, and it learns to associate
those patterns with the label "dog".
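To make "learning patterns from labeled data" concrete, here is a minimal sketch of a nearest-centroid classifier in Python. It assumes each image has already been reduced to a short numeric feature vector; the two-number features and the labels below are made up for illustration, and real systems learn far richer features, typically with CNNs.

```python
from math import dist


def train_centroids(samples):
    """samples: list of (feature_vector, label) pairs.
    Returns a dict mapping each label to the mean feature vector of its examples
    (the 'pattern' learned for that label)."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(values) / len(vectors) for values in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Relate a new example to a label: pick the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical 2-D features extracted from labeled images (values are made up).
training_data = [
    ([0.9, 0.8], "dog"), ([0.8, 0.9], "dog"),
    ([0.7, 0.2], "cat"), ([0.8, 0.1], "cat"),
]
centroids = train_centroids(training_data)
print(predict(centroids, [0.85, 0.75]))  # dog
```

Averaging per label is the simplest possible "training"; it only works when each category forms one compact cluster in feature space, which is why practical systems use learned models instead.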
Tasks Associated with Computer Vision
Although computer vision is used in many fields, a few tasks are common to most computer
vision systems. These tasks are given below:
o Facial recognition: Computer vision enables machines to detect faces in images and verify
people's identities. The machine is given input images, in which computer vision algorithms
detect facial features and compare them with a database of stored face profiles. Popular
social media platforms like Facebook have also used facial recognition to detect and tag
users. Further, various government security agencies use this feature to identify criminals
in video feeds.
o Healthcare and Medicine: Computer vision plays an important role in the healthcare and
medicine industry. Traditional approaches for evaluating cancerous tumors are time-consuming
and less accurate, whereas computer vision technology can provide faster and more accurate
assessments of how tumors respond to chemotherapy, helping doctors identify cancer patients
who need surgery sooner.
o Self-driving vehicles: Computer vision helps self-driving vehicles make sense of their
surroundings: cameras capture video from different angles around the car and feed it into
the vehicle's software for interpretation.
o Optical character recognition (OCR): Optical character recognition helps us extract printed
or handwritten text from visual data such as images. It also enables us to extract text from
documents like invoices, bills, and articles.
o Machine inspection: Computer vision is vital for automatic image-based inspection. It
detects defects, functional flaws, and other irregularities in manufactured products;
building such a system involves determining the inspection goals and choosing suitable
lighting and material-handling techniques.
o Retail (e.g., automated checkouts): Computer vision is also being implemented in the
retail industry to track products and shelves, record product movements in the store, etc.
This AI-based computer vision technique can automatically charge customers for the products
they have selected when they check out of the store.
o 3D model building: 3D model building, or 3D modeling, is a technique for generating a 3D
digital representation of any object or surface using software. Computer vision plays a role
in this field too, by constructing 3D computer models from existing objects.
o Medical imaging: Computer vision helps medical professionals make better treatment
decisions by producing visualizations of specific body parts such as organs and tissues.
This supports more accurate diagnoses and better patient care.
o Automotive safety: Computer vision has added an important safety feature in the
automotive industry. For example, a vehicle that is taught to detect objects and hazards can
help prevent accidents, potentially saving lives and property.
o Surveillance: This is one of the most important and beneficial use cases of computer vision
technology. Nowadays, CCTV cameras are fitted almost everywhere, such as streets, roads,
highways, shops, and stores, to spot suspicious or criminal activity. Computer vision
analyzes their live footage of public places to identify suspicious behavior and dangerous
objects, helping prevent crime and maintain law and order.
o Fingerprint recognition and biometrics: Computer vision technology uses fingerprints and
other biometric traits to validate a user's identity. Biometrics deals with recognizing
persons based on physiological characteristics, such as the face, fingerprint, vascular
pattern, or iris, and behavioral traits, such as gait or speech. It combines computer vision
with knowledge of human physiology and behavior.
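Facial recognition and biometric verification usually end with a matching step: the captured face or fingerprint is turned into a feature vector (often called an embedding) and compared with the vectors stored in a database. The sketch below shows only that comparison step, using cosine similarity. The embeddings, the enrolled names, and the 0.8 threshold are hypothetical; producing real embeddings would require a trained model.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(probe, database, threshold=0.8):
    """Return the best-matching enrolled identity, or None if no score reaches the threshold."""
    best_name, best_score = None, threshold
    for name, enrolled in database.items():
        score = cosine_similarity(probe, enrolled)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled embeddings (real ones typically have hundreds of dimensions).
enrolled_users = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
print(identify([0.88, 0.15, 0.28], enrolled_users))  # alice
print(identify([-0.5, -0.5, 0.0], enrolled_users))   # None (unknown person)
```

The threshold trades off false accepts against false rejects; a real deployment would tune it on validation data rather than use a fixed guess like 0.8.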
Computer Vision Challenges
There are a few challenges observed while working with computer vision technology.
Computer vision is like eyes for an AI system: if AI enables machines to think, computer
vision enables them to see and observe visual inputs. Some of the main applications of
computer vision are described below.
o X-Ray Analysis
Computer vision can be successfully applied to medical X-ray imaging. With computer
vision, X-ray analysis can be automated with improved efficiency and accuracy. State-of-the-art
image recognition algorithms can detect patterns in an X-ray image that are too subtle
for the human eye.
o Cancer Detection
Computer vision is being successfully applied for breast and skin cancer detection. With
image recognition, doctors can identify anomalies by comparing cancerous and non-
cancerous cells in images.
o CT Scan and MRI
Computer vision is now widely applied to CT scan and MRI analysis. AI systems with
computer vision can analyze radiology images with a high level of accuracy, comparable to
that of a human doctor, and can also reduce the time needed to detect disease, improving a
patient's chances of survival.
o Self-driving cars
Computer vision is widely used in self-driving cars. It is used to detect and classify objects
(e.g., road signs or traffic lights), create 3D maps, and estimate motion, and it plays a key
role in making autonomous vehicles a reality.
o Pedestrian detection
Pedestrian detection is an important application and research area of computer vision.
Using cameras, pedestrian detection automatically identifies and locates pedestrians in
images or videos. It is very helpful in fields such as traffic management, autonomous
driving, and transit safety.
o Road Condition Monitoring & Defect detection
Computer vision has also been applied to monitoring the condition of road infrastructure by
assessing variations in concrete and tar. A computer vision-enabled system automatically
senses pavement degradation, which helps allocate road maintenance resources more
efficiently and reduces safety risks related to road accidents.
o Defect Detection
This is perhaps the most common application of computer vision. With computer vision,
we can detect defects such as cracks in metal, paint defects, bad prints, etc.
o Analyzing text and barcodes (OCR)
Nowadays, almost every product has a barcode on its packaging, which can be read
automatically with computer vision techniques, while text can be read with OCR. Optical
character recognition, or OCR, helps us detect and extract printed or handwritten text from
visual data such as images.
o Fingerprint recognition and Biometrics
Computer vision technology is used to recognize fingerprints and other biometric traits to
validate a user's identity.
Biometrics is the measurement or analysis of the physiological characteristics that make a
person unique, such as the face, fingerprints, and iris patterns. It makes use of
computer vision along with knowledge of human physiology and behaviour.
o 3D Model building
3D model building or 3D modelling is a technique to generate a 3D digital representation
of any object or surface using software. Computer vision plays its role here too, by
constructing 3D computer models from existing objects.
o Crop Monitoring
In the agriculture sector, crop and yield monitoring are among the most important tasks for
better farming. Computer vision systems make it possible to monitor crops in real time and to
identify any crop variation caused by disease or nutrient deficiency.
o Automatic Weeding
An automatic weeding machine is an intelligent machine enabled with AI and computer
vision that removes unwanted plants, or weeds, around the crops. Traditional weeding
methods require human labour, which is costly and inefficient compared to automatic
weeding systems.
Computer vision enables the intelligent detection and removal of weeds using robots,
which reduces costs and ensures higher yields.
o Plant Disease Detection
Computer vision is also used for automated plant disease detection, which is especially
important for catching diseases at an early stage of plant development.
o Self-checkout
Self-checkout enables customers to complete their transactions at a retailer without the
need for human staff, and computer vision makes this possible. Self-checkouts now help
retailers avoid long queues and manage customers.
o Automatic replenishment
Automated stock replenishment is a leading technological innovation in the retail sector.
An automatic replenishment system with computer vision captures image data and
performs a complete inventory scan to track shelf items at regular intervals.
o People Counting
In many situations, we may need to count the people or customers entering and leaving a
store. This foot count, or people counting, can be done by computer vision systems that
analyze the image or video data captured by in-store cameras. People counting helps manage
crowds and limit the number of people inside, for example, to enforce COVID-19 social
distancing rules.
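People counting is often implemented by detecting and tracking each person across video frames and counting how many tracks cross a virtual line at the store entrance. The sketch below covers only that counting logic. It assumes a hypothetical upstream detector and tracker have already produced, for each person, the y-coordinates of their centroid in consecutive frames, and it treats downward crossings as entries and upward crossings as exits.

```python
def count_crossings(tracks, line_y):
    """Count entries and exits across a horizontal line at y = line_y.

    tracks: dict mapping a track id to the list of centroid y-coordinates
    of that person in consecutive frames (image y grows downward).
    Returns (entered, exited).
    """
    entered = exited = 0
    for ys in tracks.values():
        for prev, curr in zip(ys, ys[1:]):
            if prev < line_y <= curr:
                entered += 1  # moved downward across the line
            elif curr < line_y <= prev:
                exited += 1  # moved upward across the line
    return entered, exited


# Hypothetical tracks produced by a person tracker.
tracks = {
    1: [40, 70, 110, 150],  # walks in
    2: [160, 120, 90, 60],  # walks out
    3: [30, 50, 80, 90],    # never reaches the line
}
entered, exited = count_crossings(tracks, line_y=100)
print(entered, exited)  # 1 1
occupancy = entered - exited  # compare against a capacity limit
```

Counting line crossings rather than raw detections avoids counting the same person once per frame.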
A typical computer vision process is illustrated in the above image. It mainly performs three steps:
1. Capturing an Image
A computer vision application typically uses a digital camera or a CCTV camera to capture images. So,
first it captures the image and stores it as a digital file, which consists of zeros and ones.
In the second step, different CV algorithms process the digital data stored in the file. These algorithms
determine the basic geometric elements and reconstruct the image from the stored digital data.
In the third and final step, the CV system analyzes the data and, based on this analysis, takes the
action it was designed for.
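The three steps can be sketched as a tiny Python pipeline. Every function here is a hypothetical stand-in: capture_image() returns a hard-coded 5x5 grayscale frame instead of reading from a camera, the processing step extracts one simple geometric element (the bounding box of the bright pixels), and the analysis step picks an action from that result. A real system would use a camera API and far more capable algorithms.

```python
def capture_image():
    """Step 1 (stand-in): return a grayscale frame as rows of 0-255 values.
    A real application would read this from a camera."""
    return [
        [10, 12, 11, 10, 9],
        [11, 200, 210, 205, 10],
        [12, 198, 220, 201, 11],
        [10, 202, 207, 199, 12],
        [9, 11, 10, 12, 10],
    ]


def process_image(frame, threshold=128):
    """Step 2: extract a basic geometric element, the bounding box
    (top, left, bottom, right) of all pixels at or above the threshold."""
    coords = [(r, c) for r, row in enumerate(frame)
              for c, pixel in enumerate(row) if pixel >= threshold]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def analyze_image(box, min_side=2):
    """Step 3: choose an action based on what the processing step found."""
    if box is None:
        return "no object: do nothing"
    top, left, bottom, right = box
    if bottom - top + 1 >= min_side and right - left + 1 >= min_side:
        return "object found: raise alert"
    return "too small: ignore"


box = process_image(capture_image())
print(box, analyze_image(box))  # (1, 1, 3, 3) object found: raise alert
```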
1. Image Classification
Image classification is the simplest technique of computer vision. The main aim of image
classification is to classify an image into one or more categories. An image classifier
takes an image as input and tells us which categories, such as person, dog, or tree, are
present in that image.
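A classifier usually ends by producing one raw score (logit) per category, and turning those scores into an answer is straightforward. The sketch below assumes the scores have already been computed by some model (the numbers are made up) and shows two common ways to read them: softmax plus the single most likely class for single-label classification, and an independent sigmoid per class with a threshold for multi-label classification. In practice a model is trained for one of these two setups, not both; they are shown side by side only for comparison.

```python
from math import exp


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {label: exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def top_class(scores):
    """Single-label classification: return the most probable category and its probability."""
    probs = softmax(scores)
    label = max(probs, key=probs.get)
    return label, probs[label]


def present_classes(scores, threshold=0.5):
    """Multi-label classification: each category gets an independent sigmoid."""
    return sorted(label for label, s in scores.items() if 1 / (1 + exp(-s)) >= threshold)


# Hypothetical raw scores from a classifier for one image.
logits = {"person": 2.1, "dog": 1.4, "tree": -0.7}
print(top_class(logits))
print(present_classes(logits))  # ['dog', 'person']
```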
2. Object Detection
Object detection is another popular computer vision technique. It builds on image
classification, using it to detect the objects in visual data: it locates objects within
bounding boxes and finds the class of each object in the image.
Object detection has several applications, including object tracking, retrieval, video
surveillance, image captioning, etc.
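Object detectors typically output many overlapping candidate boxes, each with a class and a confidence score. Two standard building blocks for working with them are intersection over union (IoU), which measures how much two boxes overlap, and non-maximum suppression (NMS), which keeps only the best box among heavily overlapping ones. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates, suppression is done per class, and 0.5 is used as an example IoU threshold:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """detections: list of (box, label, score). Visits boxes from highest to
    lowest score and drops any box that overlaps an already kept box of the
    same label by iou_threshold or more."""
    kept = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        box, label, _ = det
        if all(k[1] != label or iou(k[0], box) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical raw detector output.
detections = [
    ((10, 10, 50, 50), "dog", 0.92),
    ((12, 12, 52, 52), "dog", 0.85),  # near-duplicate of the first box
    ((100, 30, 140, 90), "person", 0.78),
]
for box, label, score in non_max_suppression(detections):
    print(label, box, score)
```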
3. Semantic Segmentation
Semantic segmentation goes beyond detecting which classes are present in an image, as image
classification does. Instead, it classifies each pixel of an image to specify which object it
belongs to, i.e., it determines the role of each pixel in the image. It groups similar objects
into a single class at the pixel level. For example, if an image contains two dogs, semantic
segmentation puts both dogs under the same label.
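In practice, a semantic segmentation model outputs a score for every class at every pixel, and the final label map takes the highest-scoring class per pixel. The sketch below shows only that last step on a tiny made-up 3x4 score grid (the class names and scores are hypothetical). Both dog regions receive the same label, because semantic segmentation does not distinguish individual objects.

```python
CLASSES = ["background", "dog"]

# Hypothetical per-pixel scores: scores[c][row][col] for class index c.
scores = [
    [[0.9, 0.2, 0.8, 0.1],   # background
     [0.9, 0.3, 0.7, 0.2],
     [0.8, 0.9, 0.9, 0.9]],
    [[0.1, 0.8, 0.2, 0.9],   # dog
     [0.1, 0.7, 0.3, 0.8],
     [0.2, 0.1, 0.1, 0.1]],
]


def label_map(scores):
    """Assign each pixel the index of its highest-scoring class (per-pixel argmax)."""
    n_classes = len(scores)
    rows, cols = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][r][col]) for col in range(cols)]
        for r in range(rows)
    ]


for row in label_map(scores):
    print([CLASSES[c] for c in row])
```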
4. Instance Segmentation
Instance segmentation classifies the objects in an image at the pixel level, like semantic
segmentation, but goes a step further: it can separate objects of the same type into distinct
instances. For example, if an image contains several cars, semantic segmentation tells us
which pixels belong to cars, whereas instance segmentation also labels each car as a
separate object.
Using the below image, we can analyse the difference between semantic segmentation
and instance segmentation: semantic segmentation labels all the persons as a single class,
whereas instance segmentation treats each person as a separate entity, shown in a
different colour.
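Instance segmentation models (Mask R-CNN is a well-known example) predict a separate mask for each object. A much simpler stand-in that still shows the difference from semantic segmentation is connected-component labeling: it splits a semantic mask into separate regions and gives each one its own instance ID. This is only an illustration under the assumption that objects of the same class do not touch; touching objects would merge into one instance, which is exactly the case learned instance segmentation is designed to handle.

```python
from collections import deque


def label_instances(mask):
    """Give each 4-connected region of 1s in a binary mask its own ID (1, 2, ...).
    Returns (instance_map, number_of_instances)."""
    rows, cols = len(mask), len(mask[0])
    instances = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and instances[r][c] == 0:
                next_id += 1
                instances[r][c] = next_id
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of this region
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1 and instances[ny][nx] == 0):
                            instances[ny][nx] = next_id
                            queue.append((ny, nx))
    return instances, next_id


# Semantic "dog" mask containing two separate dogs: both are simply 1 here.
dog_mask = [
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
instances, count = label_instances(dog_mask)
print(count)      # 2
print(instances)  # [[0, 1, 0, 2], [0, 1, 0, 2], [0, 0, 0, 0]]
```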
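5. Panoptic Segmentation
Panoptic segmentation = semantic segmentation + instance segmentation: every pixel receives a
class label, and pixels belonging to countable objects are also assigned to a specific object instance.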
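A panoptic result can be stored as a single map in which each pixel encodes both its class and its instance. The sketch below merges a semantic label map and an instance map, assuming the encoding class_id * 1000 + instance_id (one convention in use; treat it as an assumption here), with instance ID 0 for uncountable "stuff" such as background or road. The class IDs are hypothetical.

```python
def panoptic_map(semantic, instances, factor=1000):
    """Merge a semantic label map and an instance-ID map into one panoptic map.
    Each pixel becomes class_id * factor + instance_id (instance_id 0 = stuff)."""
    return [
        [cls * factor + inst for cls, inst in zip(sem_row, inst_row)]
        for sem_row, inst_row in zip(semantic, instances)
    ]


def decode(panoptic_id, factor=1000):
    """Recover (class_id, instance_id) from a panoptic pixel value."""
    return divmod(panoptic_id, factor)


semantic = [[0, 1, 0, 1],   # 0 = background (stuff), 1 = dog (hypothetical IDs)
            [0, 1, 0, 1]]
instances = [[0, 1, 0, 2],
             [0, 1, 0, 2]]
print(panoptic_map(semantic, instances))  # [[0, 1001, 0, 1002], [0, 1001, 0, 1002]]
print(decode(1002))                       # (1, 2): class 1, second instance
```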
6. Keypoint Detection
Keypoint detection tries to detect certain key points in an image to give more details
about a class of objects. It is mostly used to detect people and localize their key points.
There are two main keypoint detection areas: body keypoint detection and facial keypoint
detection.
For example, facial keypoint detection involves detecting key parts of the human face,
such as the nose, eyes, eyebrows, and eye and mouth corners. The main applications of
keypoint detection include face detection and pose detection.
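Many keypoint detectors output one heatmap per keypoint (for example, one for the nose and one for each eye), and the keypoint's location is read off as the position of that heatmap's peak. The sketch assumes such heatmaps are already available; the tiny 4x5 heatmaps and the 0.3 confidence cut-off below are made up.

```python
def heatmap_peak(heatmap):
    """Return (x, y, confidence) of the highest value in a 2-D heatmap."""
    best = (0, 0, heatmap[0][0])
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best


def extract_keypoints(heatmaps, min_confidence=0.3):
    """heatmaps: dict of keypoint name -> heatmap. Keypoints whose peak is too
    weak (e.g. an occluded eye) are reported as None."""
    keypoints = {}
    for name, heatmap in heatmaps.items():
        x, y, conf = heatmap_peak(heatmap)
        keypoints[name] = (x, y) if conf >= min_confidence else None
    return keypoints


nose = [
    [0.0, 0.1, 0.1, 0.0, 0.0],
    [0.1, 0.4, 0.9, 0.3, 0.0],
    [0.0, 0.2, 0.5, 0.2, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.0],
]
left_eye = [[0.05] * 5 for _ in range(4)]  # no clear peak: keypoint not visible
print(extract_keypoints({"nose": nose, "left_eye": left_eye}))
# {'nose': (2, 1), 'left_eye': None}
```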
With pose estimation, we can detect the pose of people in a given image, which
usually includes where the head, eyes, nose, arms, shoulders, hands, and legs are in the
image. This can be done for a single person or for multiple people, as needed.
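Once pose estimation has located body keypoints, simple geometry on those coordinates answers many practical questions, for example how far an arm is bent. A sketch, assuming the shoulder, elbow, and wrist keypoints are already given as (x, y) pixel coordinates (the values below are hypothetical):

```python
from math import acos, degrees, hypot


def joint_angle(a, b, c):
    """Angle at point b, in degrees, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (hypot(*v1) * hypot(*v2))
    return degrees(acos(max(-1.0, min(1.0, cos_angle))))  # clamp rounding errors


shoulder, elbow, wrist = (100, 50), (100, 100), (150, 100)
print(round(joint_angle(shoulder, elbow, wrist)))  # 90: the arm is bent at a right angle
```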
7. Person Segmentation
8. Depth Perception
Depth perception is a computer vision technique that gives machines the visual ability to
estimate the 3D depth of an object, i.e., its distance from the source (the camera). Depth
perception has wide applications, including object reconstruction for augmented
reality, robotics, and self-driving cars.
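The text does not say how depth is estimated; one classic approach uses two horizontally offset cameras (stereo vision). For a rectified stereo pair, a point that appears d pixels apart in the two images lies at depth Z = f * B / d, where f is the focal length in pixels and B is the distance between the cameras (the baseline). A minimal sketch with a hypothetical camera rig:

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters of a point seen by a rectified stereo camera pair.

    Z = f * B / d: nearby objects shift more between the two images
    (larger disparity d), so they get a smaller depth Z."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means infinitely far)")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, cameras 0.12 m apart.
for d in (84, 42, 8.4):
    print(f"disparity {d} px -> depth {depth_from_disparity(d, 700, 0.12):.2f} m")
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters much more for distant objects (small d) than for nearby ones.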
9. Image Captioning
Image captioning, as the name suggests, is about giving an image a suitable caption that
describes it. It uses neural networks: when we input an image, the network generates a
caption that describes the image. It is not only a computer vision task but also a natural
language processing (NLP) task.
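Captioning networks of this kind typically pair an image encoder (often a CNN) with a language decoder that writes the caption one word at a time, each time choosing from word probabilities conditioned on the image and the words so far. The sketch below shows only that word-by-word greedy decoding loop. The "decoder" is a hypothetical hard-coded lookup table standing in for a trained neural network, and the image encoding is assumed to have already happened.

```python
# Hypothetical stand-in for a trained decoder: given the previous word,
# return next-word probabilities. A real decoder also conditions on the
# image features and on the whole caption generated so far.
TOY_DECODER = {
    "<start>": {"a": 0.7, "the": 0.3},
    "a": {"dog": 0.6, "cat": 0.3, "<end>": 0.1},
    "the": {"dog": 0.5, "<end>": 0.5},
    "dog": {"running": 0.5, "on": 0.2, "<end>": 0.3},
    "running": {"on": 0.6, "<end>": 0.4},
    "on": {"grass": 0.9, "<end>": 0.1},
    "grass": {"<end>": 1.0},
}


def greedy_caption(next_word_probs, max_words=10):
    """Repeatedly pick the most probable next word until <end> or max_words."""
    words, prev = [], "<start>"
    for _ in range(max_words):
        probs = next_word_probs(prev)
        word = max(probs, key=probs.get)
        if word == "<end>":
            break
        words.append(word)
        prev = word
    return " ".join(words)


print(greedy_caption(TOY_DECODER.get))  # a dog running on grass
```

Greedy decoding is the simplest strategy; beam search, which keeps several candidate captions at each step, is a common alternative.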