Image Segmentation For Object Detection Using Mask R-CNN in Colab

GRD Journals- Global Research and Development Journal for Engineering | Volume 5 | Issue 4 | March 2020
ISSN- 2455-5703
Mr. V. Neethidevan, Assistant Professor, Department of MCA, Mepco Schlenk Engineering College (Autonomous), Sivakasi
Dr. G. Chandrasekaran, Director, Department of MCA, Mepco Schlenk Engineering College (Autonomous), Sivakasi
Abstract
Image segmentation is a critical process in computer vision. It involves dividing a visual input into segments to simplify image
analysis. Segments represent objects or parts of objects, and comprise sets of pixels, or “super-pixels”. Recently, owing to the
success of deep learning models in a wide range of vision applications, a substantial amount of work has aimed at developing
image segmentation approaches using deep learning models. Image segmentation with a CNN involves feeding segments of an
image as input to a convolutional neural network, which labels the pixels. Fully convolutional networks (FCNs) use convolutional
layers to process varying input sizes and can work faster. The final output layer has a large receptive field and corresponds to the
height and width of the image, while the number of channels corresponds to the number of classes. SegNet is an architecture based
on deep encoders and decoders, also known as semantic pixel-wise segmentation. It involves encoding the input image into low
dimensions and then recovering it with orientation invariance capabilities in the decoder. Most notable are the R-CNN
(Region-Based Convolutional Neural Network) family and its more recent member, Mask R-CNN, which is capable of achieving
state-of-the-art results on a range of object detection tasks.
Keywords- Machine Learning, Deep Learning, Image Segmentation, Video Analytics
I. INTRODUCTION
A. What is Image Segmentation?

Image segmentation is a critical process in computer vision. It involves dividing a visual input into segments to simplify image
analysis. Segments represent objects or parts of objects, and comprise sets of pixels, or “super-pixels”. Image segmentation sorts
pixels into larger components, eliminating the need to consider individual pixels as units of observation. There are three levels of
image analysis-
– Classification- categorizing the entire image into a class such as “people”, “animals”, “outdoors”
– Object Detection- detecting objects within an image and drawing a rectangle around them, for example, a person or a sheep.
– Segmentation- identifying parts of the image and understanding what object they belong to. Segmentation lays the basis for
performing object detection and classification.
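The relationship between the levels can be illustrated with a small sketch. Assuming a segmentation step has already produced a binary mask for one object, the detection rectangle follows directly from the extent of that mask. The function name `mask_to_box` and the toy mask are hypothetical and serve only as an illustration.

```python
import numpy as np

def mask_to_box(mask):
    """Return (row_min, col_min, row_max, col_max), inclusive, of the True pixels."""
    rows = np.flatnonzero(mask.any(axis=1))  # rows that contain object pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns that contain object pixels
    if rows.size == 0:
        return None  # empty mask: nothing to detect
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

# Toy segmentation result: one object occupying rows 2..4 and columns 3..6.
mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:7] = True

box = mask_to_box(mask)
print(box)
```

The reverse direction does not hold: a rectangle alone does not say which pixels inside it belong to the object.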
B. Old-School Image Segmentation Methods

There are additional image segmentation techniques that were commonly used in the past but are less efficient than their deep
learning counterparts because they use rigid algorithms and require human intervention and expertise. These include-
– Thresholding- divides an image into a foreground and background. A specified threshold value separates pixels into one of
two levels to isolate objects. Thresholding converts grayscale images into binary images or distinguishes the lighter and darker
pixels of a color image.
– K-means clustering- An algorithm identifies groups in the data, with the variable K representing the number of groups. The
algorithm assigns each data point (or pixel) to one of the groups based on feature similarity. Rather than analyzing predefined
groups, clustering works iteratively to organically form groups.
– Histogram-based image segmentation- uses a histogram to group pixels based on “gray levels”. Simple images consist of an
object and a background. The background is usually one gray level and is the larger entity. Thus, a large peak represents the
background gray level in the histogram. A smaller peak represents the object, which is another gray level.
– Edge detection- identifies sharp changes or discontinuities in brightness. Edge detection usually involves arranging points of
discontinuity into curved line segments, or edges. For example, the border between a block of red and a block of blue is an edge.
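The four classical methods above can be sketched in a few lines of NumPy. This is a minimal illustration on a synthetic image, not a production implementation: the helper names, the threshold of 120, and the 8x8 test image are all assumptions made for the example.

```python
import numpy as np

def threshold(gray, t):
    """Thresholding: True where the pixel is brighter than t."""
    return gray > t

def kmeans_1d(values, k=2, iters=20):
    """Plain k-means on scalar intensities; returns (labels, centers)."""
    values = values.astype(float)
    # Spread the initial centers evenly over the intensity range.
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        # Assign every pixel to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of the pixels assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return labels, centers

def edge_strength(gray):
    """Edge detection: gradient magnitude from finite differences."""
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy)

# Synthetic image: dark background (gray level 40) with a bright 4x4 object (200).
img = np.full((8, 8), 40, dtype=np.uint8)
img[2:6, 2:6] = 200

binary = threshold(img, 120)
labels, centers = kmeans_1d(img.ravel(), k=2)
clustered = labels.reshape(img.shape)
edges = edge_strength(img)
hist, _ = np.histogram(img, bins=256, range=(0, 256))  # one bin per gray level
```

On this image the histogram has a large peak at the background gray level and a smaller one at the object gray level, the two clusters coincide with the thresholded mask, and the edge strength is non-zero only around the object border.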
II. LITERATURE SURVEY

[1]. In the last few years, researchers across the globe have applied deep learning concepts in computer vision applications. The
authors addressed the problems in using two neural network architectures, LeNet and Network in Network (NiN). The performance
and computational efficiency of the architectures were studied using classification and detection problems. They used multiple
databases [1].
[2]. Recent developments in deep learning have advanced research in digital image processing. The authors analysed the pros
and cons of each approach, with a focus on promoting knowledge of classical computer vision techniques. They also explored
how other computer vision options can be combined. Several hybrid methodologies were studied and shown to improve computer
vision performance [2].
[3]. Big data applications in industry require large amounts of storage space. More space is also required for video streams
from CCTV cameras, social media data, sensor data, agriculture data, medical data and data from space research. The authors
carried out a survey that starts with object recognition and proceeds through action recognition and crowd analysis to violence
detection in a crowd environment. The problems in the existing methods were identified and summarized [3].
[5]. In [5], the authors describe how deep learning is used in well-known applications such as speech recognition, image
processing and NLP. Pretrained neural networks are used to identify and remove noise from images. Deep learning is also applied
to image pre-processing and image augmentation, with better results in various applications.
[6]. In [6], the authors used recent image classification techniques based on deep neural network architectures to improve
the identification of highly boosted electroweak particles with respect to existing methods. They also introduced new methods to
visualize and interpret the high-level features learned by deep neural networks that provide discrimination beyond physics-derived
variables, adding a new capability to understand physics and to design more powerful classification methods at the LHC.
[7]. In [7], the authors analysed a tracking-by-detection approach that combines detection by YOLO with tracking by the SORT
algorithm. A custom image dataset was used to train YOLO for 6 specific classes, and the resulting model was applied to videos,
with tracking by the SORT algorithm. Recognizing a vehicle or pedestrian in an ongoing video is helpful for traffic analysis.
The goal of the paper is analysis and knowledge of the domain.
III. HOW DEEP LEARNING POWERS IMAGE SEGMENTATION METHODS

Now image segmentation techniques are boosted by deep learning technology. The following are several deep learning
architectures used for segmentation-
A. Convolutional Neural Networks (CNNs)

Image segmentation with a CNN involves feeding segments of an image as input to a convolutional neural network, which labels
the pixels. A convolutional layer does not look at the whole image at once. It scans the image, applying a small “filter” that covers
several pixels at a time, until it has mapped the entire image.
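The scanning behaviour can be sketched as follows. This is a minimal, loop-based illustration of a single filter sliding over a single-channel image with no padding and a stride of 1. As in most deep learning libraries, the operation is technically a cross-correlation. The 5x5 image and the 3x3 kernel are made up for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # the few pixels seen at this step
            out[i, j] = np.sum(patch * kernel)  # one filter response per position
    return out

image = np.zeros((5, 5))
image[:, 3:] = 1.0                         # brightness step between columns 2 and 3
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # responds where brightness rises left to right

response = conv2d_valid(image, kernel)
print(response)
```

The response is zero where the filter sees only the flat dark region and positive where its window straddles the step. A real CNN learns the kernel values during training instead of fixing them by hand.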
B. Fully Convolutional Networks (FCNs)

Conventional CNNs have fully-connected layers, which can’t manage different input sizes. FCNs use convolutional layers to
process varying input sizes and can work faster. The final output layer has a large receptive field and corresponds to the height and
width of the image, while the number of channels corresponds to the number of classes. The convolutional layers classify every
pixel to determine the context of the image, including the location of objects.
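The shape relationship described above can be checked with a small sketch. Here a 1x1 convolution stands in for the final scoring layer of an FCN: it applies the same linear classifier at every pixel, so it accepts any height and width. The 8-channel feature map, the random weights, and the three classes are assumptions made for the example.

```python
import numpy as np

def conv1x1(features, weights, bias):
    """1x1 convolution: the same linear classifier applied at every pixel.

    features: (H, W, C_in), weights: (C_in, n_classes), bias: (n_classes,)
    """
    return features @ weights + bias

rng = np.random.default_rng(0)
n_classes = 3
weights = rng.normal(size=(8, n_classes))
bias = np.zeros(n_classes)

shapes = []
for h, w in [(32, 48), (17, 23)]:            # two different input sizes
    features = rng.normal(size=(h, w, 8))    # stand-in for earlier conv layers
    scores = conv1x1(features, weights, bias)  # (H, W, n_classes)
    label_map = scores.argmax(axis=-1)         # per-pixel class index, (H, W)
    shapes.append((scores.shape, label_map.shape))

print(shapes)
```

The same weights work for both inputs because no layer depends on a fixed spatial size, which is what a fully-connected layer would require.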
C. Ensemble learning
Ensemble learning synthesizes the results of two or more related analytical models into a single output. It can improve prediction
accuracy and reduce generalization error. This enables accurate classification and segmentation of images. Segmentation via
ensemble learning attempts to generate a set of weak base-learners which classify parts of the image, and combine their output,
instead of trying to create one single optimal learner.
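One simple way to combine base-learners is a per-pixel majority vote, sketched below. The vote is only one of several possible combination rules, and the three "model outputs" are hand-made masks that each contain a single error. Both are assumptions made for the example.

```python
import numpy as np

def majority_vote(masks):
    """Combine binary masks of shape (n_models, H, W).

    A pixel is foreground when more than half of the models mark it as foreground.
    """
    masks = np.asarray(masks, dtype=bool)
    votes = masks.sum(axis=0)
    return votes * 2 > masks.shape[0]

truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True

m1 = truth.copy(); m1[0, 0] = True    # one false positive
m2 = truth.copy(); m2[1, 1] = False   # one false negative
m3 = truth.copy(); m3[3, 3] = True    # a different false positive

combined = majority_vote([m1, m2, m3])
print(np.array_equal(combined, truth))
```

Because the three masks make their mistakes at different pixels, each error is outvoted by the other two models.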
D. DeepLab
One main motivation for DeepLab is to perform image segmentation while helping control signal decimation, that is, reducing the
number of samples and the amount of data that the network must process. Another motivation is to enable multi-scale contextual
feature learning, that is, aggregating features from images at different scales. DeepLab uses an ImageNet pre-trained residual
neural network (ResNet) for feature extraction. DeepLab uses atrous (dilated) convolutions instead of regular convolutions. The
varying dilation rates of each convolution enable the ResNet block to capture multi-scale contextual information. DeepLab
comprises three components-
– Atrous convolutions- with a factor (the dilation rate) that expands or contracts the convolutional filter’s field of view.
– ResNet- a deep convolutional neural network (DCNN) from Microsoft. It provides a framework that enables training very deep
networks while maintaining performance. The powerful representational ability of ResNet boosts computer vision applications
like object detection and face recognition.
– Atrous spatial pyramid pooling (ASPP)- provides multi-scale information. It uses a set of atrous convolutions with varying
dilation rates to capture long-range context. ASPP also uses global average pooling (GAP) to incorporate image-level features
and add global context information.
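The effect of the dilation rate can be shown in one dimension. In the sketch below a kernel with k taps and dilation rate r covers (k - 1) * r + 1 input samples, so the field of view grows without adding weights. The function name, the signal, and the all-ones kernel are assumptions made for the example; DeepLab itself applies the idea to 2-D feature maps.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1-D convolution (cross-correlation form) with dilation `rate`."""
    k = len(kernel)
    span = (k - 1) * rate + 1            # effective field of view of the kernel
    out_len = len(signal) - span + 1
    out = np.empty(out_len)
    for i in range(out_len):
        taps = signal[i:i + span:rate]   # every `rate`-th sample inside the span
        out[i] = np.dot(taps, kernel)
    return out

signal = np.arange(10, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])

dense = atrous_conv1d(signal, kernel, rate=1)    # sees 3 neighbouring samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # sees 3 samples spread over 5
print(dense)
print(dilated)
```

ASPP runs several such convolutions with different rates on the same feature map and combines their outputs, so that each position is described at several scales.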

E. SegNet Neural Network

SegNet is an architecture based on deep encoders and decoders, also known as semantic pixel-wise segmentation. It involves
encoding the input image into low dimensions and then recovering it with orientation invariance capabilities in the decoder. This
generates a segmented image at the decoder end.
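The text above does not spell out how the decoder recovers resolution. In the published SegNet design the encoder records the position of each max-pooling maximum and the decoder uses those positions to upsample. The sketch below illustrates only that pooling and unpooling step on a single 4x4 feature map. The function names and the input values are hypothetical, and the convolutions of a real encoder and decoder are omitted.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling; also returns the argmax position (0..3) inside each window."""
    h, w = x.shape
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    return windows.max(axis=-1), windows.argmax(axis=-1)

def max_unpool_2x2(pooled, idx):
    """Put each pooled value back at its recorded position; all other entries are zero."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    rows, cols = np.indices((ph, pw))
    windows[rows, cols, idx] = pooled
    return (windows.reshape(ph, pw, 2, 2)
                   .transpose(0, 2, 1, 3)
                   .reshape(ph * 2, pw * 2))

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 9],
              [5, 0, 0, 0],
              [0, 0, 7, 0]], dtype=float)

pooled, idx = max_pool_2x2(x)            # encoder side: 4x4 -> 2x2
restored = max_unpool_2x2(pooled, idx)   # decoder side: 2x2 -> sparse 4x4
print(pooled)
print(restored)
```

The restored map is sparse. In the full network, decoder convolutions then fill in the values between the restored maxima.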
IV. IMAGE SEGMENTATION APPLICATIONS

Image segmentation helps determine the relations between objects, as well as the context of objects in an image. Applications
include face recognition, number plate identification, and satellite image analysis. Industries like retail and fashion use image
segmentation, for example, in image-based searches. Autonomous vehicles use it to understand their surroundings.
A. Object Detection and Face Detection

These applications involve identifying object instances of a specific class in a digital image. Semantic objects can be classified
into classes like human faces, cars, buildings, or cats.
– Face detection- a type of object-class detection with many applications, including biometrics and autofocus features in digital
cameras. Algorithms detect and verify the presence of facial features. For example, eyes appear as valleys in a gray-level
image.
– Medical imaging- extracts clinically relevant information from medical images. For example, radiologists may use machine
learning to augment analysis, by segmenting an image into different organs, tissue types, or disease symptoms. This can reduce
the time it takes to run diagnostic tests.
– Machine vision- applications that capture and process images to provide operational guidance to devices. This includes both
industrial and non-industrial applications. Machine vision systems use digital sensors in specialized cameras that allow
computer hardware and software to measure, process, and analyze images. For example, an inspection system photographs
soda bottles and then analyzes the images according to pass-fail criteria to determine if the bottles are properly filled.
B. Video Surveillance-Video Tracking and Moving Object Tracking

This involves locating a moving object in video footage. Uses include security and surveillance, traffic control, human-computer
interaction, and video editing.
– Self-driving vehicles- autonomous cars must be able to perceive and understand their environment in order to drive safely.
Relevant classes of objects include other vehicles, buildings, and pedestrians. Semantic segmentation enables self-driving cars
to recognize which areas in an image are safe to drive.
– Iris recognition- a form of biometric identification that recognizes the complex patterns of an iris. It uses automated pattern
recognition to analyze video images of a person’s eye.
– Face recognition- identifies an individual in a frame from a video source. This technology compares selected facial features
from an input image with faces in a database.
C. Retail Image Recognition

This application provides retailers with an understanding of the layout of goods on the shelf. Algorithms process product data in
real time to detect whether goods are present or absent on the shelf. If a product is absent, they can identify the cause, alert the
merchandiser, and recommend solutions for the corresponding part of the supply chain.
V. EXPERIMENTAL STUDY
A. Mask R-CNN Image Segmentation Demo

This Colab enables you to use a Mask R-CNN model that was trained on Cloud TPU to perform instance segmentation on a sample
input image. The resulting predictions are overlaid on the sample image as boxes, instance masks, and labels. You can also
experiment with your own images by editing the input image URL.
B. About Mask R-CNN

The Mask R-CNN model addresses one of the most difficult computer vision challenges- image segmentation. Image segmentation
is the task of detecting and distinguishing multiple objects within a single image. In particular, Mask R-CNN performs "instance
segmentation," which means that different instances of the same type of object in the input image, for example, two cars, should
be assigned distinct labels.
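The idea of distinct labels per instance can be sketched without running the model. The arrays `masks`, `scores`, and `class_ids` below are hypothetical stand-ins for the kind of per-detection output a Mask R-CNN-style model returns. They are not the Colab's actual variables, and the score threshold of 0.5 is likewise an assumption.

```python
import numpy as np

def instance_label_map(masks, scores, score_threshold=0.5):
    """Merge per-instance binary masks (N, H, W) into one integer map.

    0 is background and 1..N are instance ids. Where masks overlap, the
    higher-scoring instance wins. Detections below the threshold are dropped.
    """
    masks = np.asarray(masks, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    label_map = np.zeros(masks.shape[1:], dtype=np.int32)
    # Paint low scores first so that higher scores overwrite them.
    for i in np.argsort(scores):
        if scores[i] >= score_threshold:
            label_map[masks[i]] = i + 1
    return label_map

# Two confident "car" detections and one low-confidence detection.
masks = np.zeros((3, 5, 8), dtype=bool)
masks[0, 1:4, 0:3] = True
masks[1, 1:4, 4:7] = True
masks[2, 0:2, 6:8] = True
scores = [0.95, 0.90, 0.20]
class_ids = ["car", "car", "car"]

label_map = instance_label_map(masks, scores)
print(label_map)
```

A semantic segmentation of the same scene would give both cars the same "car" label. Here each car keeps its own id, while the class of instance `i` is still available as `class_ids[i - 1]`.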

[Figure: input image and the corresponding Mask R-CNN output]

REFERENCES
[1] Mihai-Sorin Badea, Iulian-Ionuț Felea, Laura Maria Florea, Constantin Vertan, The Image Processing and Analysis Lab (LAPI), Politehnica University of
Bucharest, Romania.
[2] Niall O’Mahony, Sean Campbell, Anderson Carvalho, Suman Harapanahalli, Gustavo Velasco Hernandez, Lenka Krpalkova, Daniel Riordan, Joseph Walsh,
IMaR Technology Gateway, Institute of Technology Tralee, Tralee, Ireland.
[3] G. Sreenu and M. A. Saleem Durai, Intelligent video surveillance- a review through deep learning techniques for crowd analysis.
[4] Pratik Kanani, Mamta Padole, Deep Learning to Detect Skin Cancer using Google Colab, International Journal of Engineering and Advanced Technology
(IJEAT), ISSN- 2249-8958, Volume-8 Issue-6, August 2019.
[5] Ganesh B, Kumar C, Deep learning Techniques in Image processing, National Conference On Emerging Trends in Computing Technologies
(NCETCT-18), 2018.
[6] A. Schwartzman, M. Kagan, L. Mackey, B. Nachman and L. De Oliveira, Image Processing, Computer Vision, and Deep Learning- new approaches to the
analysis and physics interpretation of LHC events, IOP Publishing, 2016.
[7] Akansha Bathija, Visual Object Detection and Tracking using YOLO and SORT, International Journal of Engineering Research & Technology (IJERT),
http://www.ijert.org, ISSN- 2278-0181, IJERTV8IS110343, Vol. 8 Issue 11, November 2019.
