
Image Processing: Techniques, Types, & Applications [2023]
Image processing is the process
of manipulating digital images.
See a list of image processing
techniques, including image
enhancement, restoration, &
others.
Rohit Kundu · August 3, 2022

Deep learning has revolutionized the world of computer vision: the ability of machines to "see" and interpret the world around them.
In particular, Convolutional Neural Networks
(CNNs) were designed to process image data
more efficiently than traditional Multi-Layer
Perceptrons (MLP).
Since images contain a consistent pattern
spanning several pixels, processing them one
pixel at a time—as MLPs do—is inefficient.
This is why CNNs that process images in patches
or windows are now the de-facto choice for
image processing tasks.
But let’s start from the beginning—

Examples of typical image processing operations

Here’s what we’ll cover:


What is Image Processing?
How Machines “See” Images?
Phases of Image Processing
Image Processing Techniques

What is Image Processing?


Digital Image processing is the class of methods
that deal with manipulating digital images
through the use of computer algorithms. It is an
essential preprocessing step in many
applications, such as face recognition, object
detection, and image compression.
Image processing is done to enhance an existing
image or to sift out important information from it.
This is important in several Deep Learning-based
Computer Vision applications, where such
preprocessing can dramatically boost the
performance of a model. Manipulating images, for example by adding objects to or removing them from images, is another application, especially in the entertainment industry.
This paper addresses a medical image
segmentation problem, where the authors used
image inpainting in their preprocessing pipeline
for the removal of artifacts from dermoscopy
images. Examples of this operation are shown
below.

Source: Paper

The authors achieved a 3% boost in performance with this simple preprocessing procedure, which is a considerable improvement, especially in a biomedical application where diagnostic accuracy is crucial. The
quantitative results obtained with and without
preprocessing for the lesion segmentation
problem in three different datasets are shown
below.

Source: Paper

Types of Images / How Machines "See" Images?
Digital images are interpreted as 2D or 3D
matrices by a computer, where each value or
pixel in the matrix represents the amplitude,
known as the “intensity” of the pixel. Typically,
we are used to dealing with 8-bit images,
wherein the amplitude value ranges from 0 to
255.

Image by the author

Thus, a computer "sees" digital images as a function: I(x, y) or I(x, y, z), where "I" is the pixel intensity and (x, y) or (x, y, z) represent the coordinates (for binary/grayscale or RGB images respectively) of the pixel in the image.

Convention of the coordinate system used in an image
Computers deal with different “types” of images
based on their function representations. Let us
look into them next.
1. Binary Image
Images that have only two unique values of pixel intensity, 0 (representing black) and 1 (representing white), are called binary images. Such images are generally used to highlight a discriminating portion of a colored image. For example, they are commonly used for image segmentation, as shown below.

Source: Paper

2. Grayscale Image
Grayscale or 8-bit images are composed of 256 unique intensity levels, where a pixel intensity of 0 represents black and a pixel intensity of 255 represents white. The 254 values in between are the different shades of gray.
An example of an RGB image converted to its
grayscale version is shown below. Notice that the
shape of the histogram remains the same for the
RGB and grayscale images.
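As a quick, hedged illustration (a minimal sketch using OpenCV; the file name sample.jpg is a placeholder), an RGB image can be converted to grayscale and its intensity histogram computed like this:

```python
import cv2

# Load a color image (path is a placeholder) and convert it to grayscale
img_bgr = cv2.imread("sample.jpg")                     # OpenCV loads color images as BGR
img_gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)   # single-channel, values 0-255

# Compute the 256-bin intensity histogram of the grayscale image
hist = cv2.calcHist([img_gray], [0], None, [256], [0, 256])
print(img_gray.shape, img_gray.min(), img_gray.max())
```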

3. RGB Color Image


The images we are used to in the modern world are RGB or color images, which computers store as 24-bit matrices (8 bits per channel). That is, over 16 million different colors are possible for each pixel. "RGB" represents the Red, Green, and Blue "channels" of an image.
Up until now, we had images with only one
channel. That is, two coordinates could have
defined the location of any value of a matrix.
Now, three equal-sized matrices (called
channels), each having values ranging from 0 to
255, are stacked on top of each other, and thus
we require three unique coordinates to specify
the value of a matrix element.
Thus, a pixel in an RGB image will be of color
black when the pixel value is (0, 0, 0) and white
when it is (255, 255, 255). Any combination of
numbers in between gives rise to all the different
colors existing in nature. For example, (255, 0, 0)
is the color red (since only the red channel is
activated for this pixel). Similarly, (0, 255, 0) is
green and (0, 0, 255) is blue.
An example of an RGB image split into its channel
components is shown below. Notice that the
shapes of the histograms for each of the
channels are different.

Splitting of an image into its Red, Green and Blue channels
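The channel split shown above can be reproduced in a few lines of OpenCV (a minimal sketch; the input path is a placeholder):

```python
import cv2

img_bgr = cv2.imread("sample.jpg")    # H x W x 3 matrix, channels stored in BGR order
b, g, r = cv2.split(img_bgr)          # three equal-sized single-channel matrices

# A pure red pixel reads (255, 0, 0) in RGB order, i.e. r=255, g=0, b=0
print(img_bgr.shape, r.shape, g.shape, b.shape)
```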


4. RGBA Image
RGBA images are colored RGB images with an
extra channel known as “alpha” that depicts the
opacity of the RGB image. Opacity ranges from a
value of 0% to 100% and is essentially a “see-
through” property.
Opacity in physics depicts how much light an object blocks. For instance, cellophane paper is transparent (0% opacity), frosted glass is translucent, and wood is opaque (100% opacity).
The alpha channel in RGBA images tries to mimic
this property. An example of this is shown below.
Example of changing the “alpha” parameter in RGBA images
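A minimal Pillow sketch of the same idea (file names are placeholders): the alpha channel is set to a constant value and the result composited over a white background to visualize the translucency.

```python
from PIL import Image

img = Image.open("sample.png").convert("RGBA")   # make sure an alpha channel exists
img.putalpha(128)                                # roughly 50% opacity for every pixel

# Composite over a white background to see the "see-through" effect
background = Image.new("RGBA", img.size, (255, 255, 255, 255))
Image.alpha_composite(background, img).save("half_transparent.png")
```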

Phases of Image
Processing
The fundamental steps in any typical Digital
Image Processing pipeline are as follows:
1. Image Acquisition
The image is captured by a camera and digitized
(if the camera output is not digitized
automatically) using an analogue-to-digital
converter for further processing in a computer.
2. Image Enhancement
In this step, the acquired image is manipulated to
meet the requirements of the specific task for
which the image will be used. Such techniques are primarily aimed at bringing out hidden or important details in an image, for example through contrast and brightness adjustment. Image enhancement is highly subjective in nature.
3. Image Restoration
This step deals with improving the appearance of
an image and is an objective operation since the
degradation of an image can be attributed to a
mathematical or probabilistic model. For
example, removing noise or blur from images.
4. Color Image Processing
This step aims at handling the processing of colored images (24-bit RGB or 32-bit RGBA images), for example, performing color correction or color modeling in images.
5. Wavelets and Multi-Resolution
Processing
Wavelets are the building blocks for representing
images in various degrees of resolution. Images are subdivided successively into smaller regions for data compression and for pyramidal representation.
6. Image Compression
For transferring images to other devices or due
to computational storage constraints, images
need to be compressed and cannot be kept at
their original size. This is also important in
displaying images over the internet; for example,
on Google, a small thumbnail of an image is a
highly compressed version of the original. Only
when you click on the image is it shown in the
original resolution. This process saves bandwidth
on the servers.
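As a simple, hedged example of lossy compression in practice (a Pillow sketch; file names and the quality value are arbitrary), re-saving an image as JPEG with a lower quality setting trades visual fidelity for file size:

```python
from PIL import Image
import os

img = Image.open("original.png").convert("RGB")
img.save("compressed_q40.jpg", format="JPEG", quality=40)   # lower quality -> smaller file

print("original:", os.path.getsize("original.png"), "bytes")
print("compressed:", os.path.getsize("compressed_q40.jpg"), "bytes")
```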
7. Morphological Processing
Image components that are useful in the
representation and description of shape need to
be extracted for further processing or
downstream tasks. Morphological Processing
provides the tools (which are essentially
mathematical operations) to accomplish this. For example, erosion and dilation operations are used to shrink and grow the boundaries of objects in an image, respectively.
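A minimal OpenCV sketch of these two operations on a binary mask (the mask path is a placeholder):

```python
import cv2
import numpy as np

mask = cv2.imread("binary_mask.png", cv2.IMREAD_GRAYSCALE)   # binary image, values 0 or 255
kernel = np.ones((3, 3), np.uint8)                            # 3x3 structuring element

eroded = cv2.erode(mask, kernel, iterations=1)     # shrinks object boundaries
dilated = cv2.dilate(mask, kernel, iterations=1)   # grows object boundaries
```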
8. Image Segmentation
This step involves partitioning an image into
different key parts to simplify and/or change the
representation of an image into something that is
more meaningful and easier to analyze. Image segmentation allows computers to focus on the more important parts of the image and discard the rest, which improves the performance of automated systems.
9. Representation and Description
Image segmentation procedures are generally
followed by this step, where the task for
representation is to decide whether the
segmented region should be depicted as a
boundary or a complete region. Description deals
with extracting attributes that result in some
quantitative information of interest or are basic
for differentiating one class of objects from
another.
10. Object Detection and
Recognition
After the objects are segmented from an image
and the representation and description phases
are complete, the automated system needs to
assign a label to the object—to let the human
users know what object has been detected, for
example, “vehicle” or “person”, etc.
11. Knowledge Base
Knowledge may be as simple as the bounding
box coordinates for an object of interest that has
been found in the image, along with the object
label assigned to it. Anything that will help in
solving the problem for the specific task at hand
can be encoded into the knowledge base.


Image Processing
Techniques
Image processing can be used to improve the
quality of an image, remove undesired objects
from an image, or even create new images from
scratch. For example, image processing can be
used to remove the background from an image of
a person, leaving only the subject in the
foreground.
Image processing is a vast and complex field,
with many different algorithms and techniques
that can be used to achieve different results. In
this section, we will focus on some of the most
common image processing tasks and how they
are performed.
Task 1: Image Enhancement
One of the most common image processing tasks is image enhancement, or improving the quality of an image. It has crucial applications in
Computer Vision tasks, Remote Sensing, and
surveillance. One common approach is adjusting
the image's contrast and brightness.
Contrast is the difference in brightness between the lightest and darkest areas of an image. Increasing the contrast makes light regions lighter and dark regions darker, so details become easier to see. Brightness is the overall lightness or darkness of an image; increasing it makes the whole image lighter. Both contrast and brightness can be adjusted automatically by most image editing software, or they can be adjusted manually.
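As a hedged illustration, a simple linear contrast and brightness adjustment can be written with OpenCV's convertScaleAbs, where alpha scales the contrast and beta shifts the brightness (the values and file path below are arbitrary):

```python
import cv2

img = cv2.imread("sample.jpg")

# new_pixel = clip(alpha * pixel + beta, 0, 255)
adjusted = cv2.convertScaleAbs(img, alpha=1.3, beta=20)   # more contrast, slightly brighter
cv2.imwrite("adjusted.jpg", adjusted)
```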
However, adjusting the contrast and brightness of an image is an elementary operation. Sometimes an image with perfect contrast and brightness, when upscaled, becomes blurry due to its low pixel density (pixels per inch). To
address this issue, a relatively new and much
more advanced concept of Image Super-
Resolution is used, wherein a high-resolution
image is obtained from its low-resolution
counterpart(s). Deep Learning techniques are
popularly used to accomplish this.

For example, one of the earliest uses of Deep Learning to address the Super-Resolution problem is the SRCNN model, where a low-resolution image is first upscaled using traditional Bicubic Interpolation and then used as the input to a CNN model. The CNN extracts overlapping patches from the upscaled image, maps them non-linearly to high-resolution patch representations, and a final convolution layer aggregates them to reconstruct the high-resolution image. The model framework is
depicted visually below.

SRCNN model pipeline. Image by the author
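The three-layer structure of SRCNN is small enough to sketch in PyTorch. This is a minimal, untrained sketch assuming the common 9-1-5 kernel sizes and a single-channel, bicubic-upscaled input, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Patch extraction -> non-linear mapping -> reconstruction."""
    def __init__(self):
        super().__init__()
        self.extract = nn.Conv2d(1, 64, kernel_size=9, padding=4)      # patch extraction
        self.map = nn.Conv2d(64, 32, kernel_size=1)                    # non-linear mapping
        self.reconstruct = nn.Conv2d(32, 1, kernel_size=5, padding=2)  # reconstruction
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                 # x: bicubic-upscaled low-resolution image
        x = self.relu(self.extract(x))
        x = self.relu(self.map(x))
        return self.reconstruct(x)

# Sanity check on a dummy single-channel image
out = SRCNN()(torch.randn(1, 1, 64, 64))
print(out.shape)   # torch.Size([1, 1, 64, 64])
```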

An example of the results obtained by the SRCNN model compared to its contemporaries is shown below.

Source: Paper
Task 2: Image Restoration
The quality of images could degrade for several
reasons, especially photos from the era when
cloud storage was not so commonplace. For
example, images scanned from hard copies
taken with old instant cameras often acquire
scratches on them.

Example of an Image Restoration operation

Image Restoration is particularly fascinating because advanced techniques in this area could
potentially restore damaged historical
documents. Powerful Deep Learning-based
image restoration algorithms may be able to
reveal large chunks of missing information from
torn documents.
Image inpainting, for example, falls under this
category, and it is the process of filling in the
missing pixels in an image. This can be done by
using a texture synthesis algorithm, which
synthesizes new textures to fill in the missing
pixels. However, Deep Learning-based models
are the de facto choice due to their pattern
recognition capabilities.
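Classical (non-deep-learning) inpainting is available directly in OpenCV. A minimal sketch, assuming a damaged image and a mask that marks the missing pixels in white (file names are placeholders):

```python
import cv2

damaged = cv2.imread("damaged.jpg")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)   # white where pixels are missing

# Telea's fast-marching method fills the masked pixels from the surrounding content
restored = cv2.inpaint(damaged, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("restored.jpg", restored)
```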
Example of an extreme image inpainting. Source

An example of an image inpainting framework (based on a U-Net autoencoder) was proposed in this paper, which uses a two-step
approach to the problem: a coarse estimation
step and a refinement step. The main feature of
this network is the Coherent Semantic Attention
(CSA) layer that fills the occluded regions in the
input images through iterative optimization. The
architecture of the proposed model is shown
below.

Source: Paper

Some example results obtained by the authors and other competing models are shown below.
Source: Paper
Task 3: Image Segmentation
Image segmentation is the process of
partitioning an image into multiple segments or
regions. Each segment represents a different
object in the image, and image segmentation is
often used as a preprocessing step for object
detection.
There are many different algorithms that can be
used for image segmentation, but one of the
most common approaches is to use thresholding.
Binary thresholding, for example, is the process
of converting an image into a binary image,
where each pixel is either black or white. The
threshold value is chosen such that all pixels with
a brightness level below the threshold are turned
black, and all pixels with a brightness level above
the threshold are turned white. This results in the
objects in the image being segmented, as they
are now represented by distinct black and white
regions.
Example of binary thresholding, with threshold value of 127
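The binary thresholding shown above takes only a couple of lines in OpenCV (a sketch; the threshold of 127 matches the figure, and the file path is a placeholder):

```python
import cv2

gray = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)

# Pixels brighter than 127 become 255 (white); all others become 0 (black)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite("binary.png", binary)
```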

In multi-level thresholding, as the name suggests, different parts of an image are
converted to different shades of gray depending
on the number of levels. This paper, for example,
used multi-level thresholding for medical imaging
—specifically for brain MRI segmentation, an
example of which is shown below.

Source: Paper

Modern techniques use deep learning for automated image segmentation, for both binary and multi-label segmentation problems. For example, the PFNet or Positioning
and Focus Network is a CNN-based model that
addresses the camouflaged object segmentation
problem. It consists of two key modules—the
positioning module (PM) designed for object
detection (that mimics predators that try to
identify a coarse position of the prey); and the
focus module (FM) designed to perform the
identification process in predation for refining the
initial segmentation results by focusing on the
ambiguous regions. The architecture of the
PFNet model is shown below.

Source: Paper

The results obtained by the PFNet model outperformed contemporary state-of-the-art
models, examples of which are shown below.

Source: Paper
Task 4: Object Detection
Object Detection is the task of identifying objects
in an image and is often used in applications
such as security and surveillance. Many different
algorithms can be used for object detection, but
the most common approach is to use Deep
Learning models, specifically Convolutional
Neural Networks (CNNs).
Object Detection with V7

CNNs are a type of Artificial Neural Network specifically designed for image processing tasks, since the convolution operation at their core helps the computer "see" patches of an image at once instead of dealing with one pixel at a time. CNNs trained for object detection
will output a bounding box (as shown in the
illustration above) depicting the location where
the object is detected in the image along with its
class label.
An example of such a network is the popular
Faster R-CNN (Region-based Convolutional
Neural Network) model, which is an end-to-end
trainable, fully convolutional network. The Faster
R-CNN model alternates between fine-tuning for
the region proposal task (predicting regions in
the image where an object might be present) and
then fine-tuning for object detection (detecting
what object is present) while keeping the
proposals fixed. The architecture and some
examples of region proposals are shown below.
Source: Paper
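For experimentation, a pre-trained Faster R-CNN is available in torchvision. This hedged sketch uses the generic COCO-pretrained model from torchvision (version 0.13 or later), not the exact network from the paper:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()   # COCO-pretrained weights

# The model expects a list of 3-channel float tensors with values in [0, 1]
image = torch.rand(3, 480, 640)
with torch.no_grad():
    predictions = model([image])

# Each prediction holds bounding boxes, class labels, and confidence scores
print(predictions[0]["boxes"].shape, predictions[0]["labels"], predictions[0]["scores"])
```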
Task 5: Image Compression
Image compression is the process of reducing
the file size of an image while still trying to
preserve the quality of the image. This is done to
save storage space, especially to run Image
Processing algorithms on mobile and edge
devices, or to reduce the bandwidth required to
transmit the image.
Traditional approaches use lossy compression
algorithms, which work by reducing the quality of
the image slightly in order to achieve a smaller
file size. JPEG file format, for example, uses the
Discrete Cosine Transform for image
compression.
Modern approaches to image compression
involve the use of Deep Learning for encoding
images into a lower-dimensional feature space
and then recovering that on the receiver’s side
using a decoding network. Such models are called autoencoders, which consist of an encoding branch that learns an efficient encoding scheme and a decoding branch that tries to reconstruct the image from the encoded features with as little loss as possible.
Basic framework for autoencoder
training. Image by the author
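A minimal convolutional autoencoder in PyTorch makes the encoder/decoder idea concrete. This is an illustrative toy, not the conditional autoencoder from the paper discussed next:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: compress a 3-channel image into a smaller feature map
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 8, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: reconstruct the image from the compressed features
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.rand(1, 3, 64, 64)
model = ConvAutoencoder()
print(model.encoder(x).shape, model(x).shape)   # compressed features vs. reconstruction
```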

For example, this paper proposed a variable rate image compression framework using a
conditional autoencoder. The conditional
autoencoder is conditioned on the Lagrange
multiplier, i.e., the network takes the Lagrange
multiplier as input and produces a latent
representation whose rate depends on the input
value. The authors also train the network with
mixed quantization bin sizes for fine-tuning the
rate of compression. Their framework is depicted
below.

Source: Paper
The authors obtained superior results compared to popular methods like JPEG, both in bits per pixel and in reconstruction quality. An example of this is shown below.

Source: Paper
Task 6: Image Manipulation
Image manipulation is the process of altering an
image to change its appearance. This may be
desired for several reasons, such as removing an
unwanted object from an image or adding an
object that is not present in the image. Graphic
designers often do this to create posters, films,
etc.
An example of Image Manipulation is Neural Style
Transfer, which is a technique that utilizes Deep
Learning models to adapt an image to the style
of another. For example, a regular image could be
transferred to the style of “Starry Night” by van
Gogh. Neural Style Transfer also enables AI to
generate art.
Example of Neural Style Transfer. Image by the author

An example of such a model is the one proposed in this paper that is able to transfer arbitrary new
styles in real-time (other approaches often take
much longer inference times) using an
autoencoder-based framework. The authors
proposed an adaptive instance normalization
(AdaIN) layer that adjusts the mean and variance
of the content input (the image that needs to be
changed) to match those of the style input
(image whose style is to be adopted). The AdaIN
output is then decoded back to the image space
to get the final style transferred image. An
overview of the framework is shown below.

Source: Paper
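The AdaIN operation itself is a single formula, AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y), where μ and σ are the per-channel mean and standard deviation of the content features x and style features y. A minimal PyTorch sketch:

```python
import torch

def adain(content, style, eps=1e-5):
    """Align the per-channel mean/std of content features to those of style features."""
    # content, style: feature maps of shape (N, C, H, W)
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

out = adain(torch.randn(1, 512, 32, 32), torch.randn(1, 512, 32, 32))
print(out.shape)   # torch.Size([1, 512, 32, 32])
```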
Examples of images transferred to other artistic
styles are shown below and compared to existing
state-of-the-art methods.

Source: Paper
Task 7: Image Generation
Synthesis of new images is another important
task in image processing, especially in Deep
Learning algorithms which require large
quantities of labeled data to train. Image generation methods typically use Generative Adversarial Networks (GANs), a distinct neural network architecture.

General framework for GANs. Image by the author

GANs consist of two separate models: the generator, which generates the synthetic images,
and the discriminator, which tries to distinguish
synthetic images from real images. The
generator tries to synthesize images that look
realistic to fool the discriminator, and the
discriminator trains to better critique whether an
image is synthetic or real. This adversarial game
allows the generator to produce photo-realistic
images after several iterations, which can then
be used to train other Deep Learning models.
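The adversarial game can be summarized in a short, heavily simplified PyTorch training loop. This toy sketch uses tiny fully-connected networks and synthetic 2D data purely to show the alternating updates, not a real image GAN:

```python
import torch
import torch.nn as nn

# Toy generator and discriminator (fully connected, for illustration only)
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))               # noise -> fake sample
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())  # sample -> real/fake score

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(32, 2) + 3.0          # "real" data: a shifted Gaussian blob
    fake = G(torch.randn(32, 16))            # generator output from random noise

    # Discriminator step: push real samples toward 1 and fakes toward 0
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to fool the discriminator into predicting 1 for fakes
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```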
Task 8: Image-to-Image Translation
Image-to-Image translation is a class of vision
and graphics problems where the goal is to learn
the mapping between an input image and an
output image using a training set of aligned
image pairs. For example, a free-hand sketch can
be drawn as an input to get a realistic image of
the object depicted in the sketch as the output,
as shown below.

Example of image-to-image translation

Pix2pix is a popular model in this domain that uses a conditional GAN (cGAN) model for general
purpose image-to-image translation, i.e., several
problems in image processing like semantic
segmentation, sketch-to-image translation, and
colorizing images, are all solved by the same
network. cGANs involve the conditional
generation of images by a generator model. For
example, image generation can be conditioned
on a class label to generate images specific to
that class.

Source: Paper

Pix2pix consists of a U-Net generator network and a PatchGAN discriminator network, which
takes in NxN patches of an image to predict
whether it is real or fake, unlike traditional GAN
models. The authors argue that such a
discriminator enforces more constraints that
encourage sharp high-frequency detail.
Examples of results obtained by the pix2pix
model on image-to-map and map-to-image
tasks are shown below.
Source: Paper

Key Takeaways
The information technology era we live in has
made visual data widely available. However, a lot of processing is required before such data can be transferred over the internet or used for purposes like information extraction, predictive modeling, etc.
The advancement of deep learning technology
gave rise to CNN models, which were specifically
designed for processing images. Since then,
several advanced models have been developed
that cater to specific tasks in the Image
Processing niche. We looked at some of the most
critical techniques in Image Processing and
popular Deep Learning-based methods that
address these problems, from image
compression and enhancement to image
synthesis.
Recent research is focused on reducing the need
for ground truth labels for complex tasks like
object detection, semantic segmentation, etc., by
employing concepts like Semi-Supervised
Learning and Self-Supervised Learning, which
makes models more suitable for broad practical
applications.
If you’re interested in learning more about
computer vision, deep learning, and neural
networks, have a look at these articles:
Deep Learning 101: Introduction [Pros, Cons
& Uses]
What Is Computer Vision? [Basic Tasks &
Techniques]
Convolutional Neural Networks:
Architectures, Types & Examples

Rohit Kundu

Rohit Kundu is a Ph.D. student in the Electrical and Computer Engineering department of the University of California, Riverside. He is a researcher in the Vision-Language domain of AI and has published several papers in top-tier conferences and notable peer-reviewed journals.
