Image Sorting Using Object Detection and Face Recognition

Volume 5, Issue 3, March – 2020 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
Shaikh Arbaz
Dept of Computer Engineering
Kalsekar Technical Campus
Mumbai University
Navi Mumbai, India

Shaikh Sohail
Dept of Computer Engineering
Kalsekar Technical Campus
Mumbai University
Navi Mumbai, India

Shaikh Rehan
Dept of Computer Engineering
Kalsekar Technical Campus
Mumbai University
Navi Mumbai, India

Mubashir Khan
Dept of Computer Science
Kalsekar Technical Campus
Mumbai University
Navi Mumbai, India

Abstract:- Users generate large amounts of multimedia data, and images are among the most common file types. The average user does not bother to organize these images. This application organizes images based on the objects and the faces of persons present in them. It uses object detection, face detection, and face recognition to categorize the images into their respective directories. Object detection uses the YOLO algorithm, face detection uses a Haar cascade, and face recognition uses the LBPH algorithm. Using these algorithms, images are categorized into directories.

Keywords:- Object Detection, Face Detection, Face Recognition, YOLO, Haar cascade, LBPH.

I. INTRODUCTION

Images are one of the most used types of multimedia file. Since smartphones double as cameras, everyone takes pictures, and trillions of images are now stored. This trend is not going to slow down, so the volume of digital images continues to grow. 1.2 trillion images were generated by the end of 2017, and 4.7 trillion photos will be stored [10]. The average iOS user takes 182 photos per month, whereas the average Android user takes 111 [9]. Because of this bulk of photos, the average user does not bother to organize the images, since doing so takes a considerable amount of time.

This application helps you to quickly organize unsorted images. The basic idea is to give images as input to the program, which detects the objects or faces in them. Each image is then moved or copied to the folder of its respective class. This allows you to rapidly go through and organize large amounts of pictures.

When an input image is fed to the application, it first detects whether the image contains a face. For face detection, the "Haar-Cascade algorithm" is used. If a face is detected, face recognition is applied to the image using the "Local Binary Pattern Histogram" (LBPH) algorithm. After the face is recognized, the application creates a directory named after the given class label. If a face is not found, object detection is applied to the image using the YOLO algorithm. If any object is detected in the image, the image is moved to the respective directory named after the class label.

Additionally, the search by face option allows users to search for images with similar faces in the directory. All the images containing the same face are displayed.

II. LITERATURE REVIEW

In this section, we review five research papers related to our system, to understand their weaknesses and how we can overcome them.

A. Object Recognition in Images
Object recognition is a field of study in image processing. The process of identifying objects in a video or an image is termed object recognition. It has a large number of applications in fields such as activity recognition, robot localization, and automation. Objects appear different when seen from different perspectives, so a recognition method should be robust to changes in viewpoint, occlusion, and object transformations. The approach consists mainly of two stages. In the first stage, the input image is categorized using a classifier. In this paper [1], two types of classifier are used for classifier optimization: "k-nearest neighbour (kNN)" and "Support Vector Machine (SVM)". The SVM classifier uses GIST features and the kNN classifier uses SIFT features. Various kernels are used in the SVM, such as Gaussian, linear, and polynomial. Feature extraction produces a similarity matrix, which is given to the kNN classifier. The comparison shows that the SVM classifier is more accurate than the kNN classifier. Coil-20P and Eth80 are the datasets used for the processing.
IJISRT20MAR250 www.ijisrt.com 603

• Weaknesses:
– Hand engineering of features is required before object recognition is applied.
– Unable to connect Designer and Customers.
– Collaboration is not possible.
– Cannot be viewed from different angles.

• How to Overcome:
– Use feature extraction techniques such as convolutional neural networks.

B. The Object Detection Based on Deep Learning
The paper [3] describes the emergence of object detection based on deep learning. It reviews the classical methods of object detection. It states that in deep learning methods region selection can be achieved using particular strategies, feature extraction can be done with the help of a CNN, and classification is achieved by using an SVM or a special neural network. The paper reviews two deep learning methods, namely DNN and Overfeat.

• Weaknesses:
– Poor real-time performance because of the large number of network parameters.
– Hard to train.

C. Object Detection Using Convolutional Neural Networks
In this paper [5], a "Convolutional Neural Network (CNN)" is used for object detection. It uses an activation function called the Rectified Linear Unit. It uses transfer learning, a powerful deep learning technique in which pre-trained models are used for feature extraction. It uses the Tensorflow library for high-performance numerical computation. Two models are compared in this paper [5]: the "Single Shot Multi-Box Detector" (SSD) and the "Faster Region-based Convolutional Neural Network" (Faster R-CNN). SSD with MobileNetv1 is used in object detection because it is lightweight, accurate and small in size and can, therefore, be used on mobile devices.

• Weaknesses:
– SSD is less accurate.
– Faster R-CNN has low speed.

• How to Overcome:
– Use SSD when accuracy isn't the top priority but speed is, as in real-time detection systems.
– Use Faster R-CNN where high accuracy is required over speed, such as in medical imaging.

D. Application of deep learning in object detection
This paper [2] deals with the application of deep learning in object detection. It summarizes some commonly used datasets such as ImageNet, PASCAL VOC and COCO, and creates a new dataset for a football game. It also summarizes the series of algorithms based on R-CNN, such as R-CNN, SPP-Net, Fast R-CNN and Faster R-CNN. Faster R-CNN is used because its mAP is 0.732 on the VOC2001 dataset. Using Faster R-CNN on the newly created dataset, the mAP for the 4 classes is: player 0.7902, soccer goal 0.8377, corner flag 0.3508, football 0.4752.

• Weaknesses:
– The dataset is imbalanced: the number of samples per class is uneven.

• How to Overcome:
– Use the right evaluation metrics.
– Resample the training set:
  • Under-sampling.
  • Over-sampling.

E. Research on face recognition based on deep learning
In this era of digitalization, new technologies are flourishing and deep learning is advancing in various areas. Deep learning is a subfield of machine learning based on artificial neural networks (ANN); it consists of different techniques/algorithms inspired by the structure of the human brain. Handwriting recognition, image recognition, semantic analysis, weather forecasting, marketing predictions, etc. all use deep learning. The paper [4] mainly focuses on the complexities of applying deep learning to face recognition and on solutions that improve results and accuracy. It covers deep learning methods and their application to face recognition as a basis for further study.

• Weaknesses:
– How can the input be analyzed and reviewed so that the understanding of the input characteristics after learning gives the optimum result?
– How can the DB resource be further enriched?

III. PROPOSED SYSTEM

The proposed method helps the user to categorize images, based on the objects and faces in them, into their respective directories. The system creates directories based on an object or user name. The resultant directory contains the images of the related object or face. The proposed system contains the following modules:

A. User Interface
The user interacts with the application through the user interface module. This interface shows all the options available to the user: search, categorize, copy/move, delete and share. The search and categorize options are used when the user wants to classify the images based on either the objects or the faces in them, and then either display them as a search result or categorize them into folders, respectively. An Options flag is used to choose between the search and categorize operations.

B. Face Detection
This module checks whether there is a face in the input image. If a face is found, the face flag is set to true and the input image is passed on to the face recognition module. If no face is found, then the

image is passed on to continue with object detection. Face detection is achieved using a Haar-Cascade face detector implemented using OpenCV.

Fig 1:- System Architecture.

Haar-Cascade Algorithm
The Haar-Cascade algorithm makes use of two fundamental components: Haar features and a cascade classifier. The classifier requires many images with a face (positive images) as well as images without any face (negative images) for training. Each Haar feature is a single value, computed as the difference between the sums of pixel intensities under the dark and light rectangles of the feature.

Fig 2:- Haar feature

A large number of features is obtained for a single image. However, most features are irrelevant for face detection. To choose the best features, the Adaboost algorithm is used. Here is how Adaboost works:
– Apply every feature to all training images.
– Find the best threshold value that classifies each image as positive or negative.
– Select the features with the minimum error rate.
– Take the weighted sum of several weak classifiers to get a final strong classifier.

Even with Adaboost, the algorithm is still somewhat inefficient and time-consuming. In an image, most of the region is a non-face region. Hence, we check whether a window is a face region; if it is not, the window is discarded in a single shot and is not processed again. This is where cascade classifiers come into the picture. Cascade classifiers work as follows:
– Features are grouped into different stages, and the stages are applied one by one.
– If the window fails the first stage, discard it and do not consider it for the remaining stages.
– A region is a face region if the window passes all the stages.

E.g., for 6000+ features, divide them into stages with the number of features as 1, 10, 25, 25, 50 and so on, up to 38 stages in total.

C. Face Recognition
The face recognition module recognizes the faces in the image obtained from the face detection module. After the faces of the persons in the image have been recognized, the module assigns a unique, named ID to the image. It then passes the unique person ID to the Face Matching module as its input. Face recognition is achieved using the LBPH algorithm implemented through OpenCV.

LBPH Algorithm
LBPH stands for Local Binary Pattern Histogram. This algorithm uses 4 parameters:
1) Radius: Usually set to 1 pixel.
2) Neighbors: Usually set to 8 pixels.
3) Grid X: Usually set to 8 cells.
4) Grid Y: Usually set to 8 cells.

Steps:
• Train the algorithm
  – Train on a dataset of facial images of the people we want to recognize.
  – Set an ID (number or name) for each image.
• Apply the LBP operation

Fig 3:- LBPH Operation
  – Use a sliding window based on the radius and neighbors parameters.
  – Take a 3 x 3 pixel matrix containing the gray level of each pixel (0-255).
  – Set the central pixel value as the threshold.
  – For each neighboring pixel, set a value of 0 or 1: if the pixel value is greater than or equal to the threshold, set it to 1, else 0.
  – Concatenate the binary values into a new binary number, e.g. 10101011.
  – Convert the new binary number to decimal and set this as the new value of the centre pixel.
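The LBP operation described above can be sketched in a few lines of plain Python. This is a minimal illustration for radius 1 and 8 neighbors, not the OpenCV implementation used by the system; the function names `lbp_code` and `lbp_image` and the clockwise reading order starting at the top-left neighbor are assumptions made for the example (any fixed order works as long as it is used consistently).

```python
def lbp_code(window):
    """Return the LBP value for the centre pixel of a 3x3 window of gray levels."""
    center = window[1][1]  # the centre pixel acts as the threshold
    # Neighbours read clockwise from the top-left corner (assumed convention).
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    # 1 if the neighbour is >= the threshold, else 0, concatenated into one string.
    bits = "".join("1" if window[r][c] >= center else "0" for r, c in order)
    return int(bits, 2)  # binary number converted to decimal


def lbp_image(gray):
    """Apply the LBP operator to every interior pixel of a 2-D list of gray levels."""
    height, width = len(gray), len(gray[0])
    result = [[0] * (width - 2) for _ in range(height - 2)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [row[x - 1:x + 2] for row in gray[y - 1:y + 2]]
            result[y - 1][x - 1] = lbp_code(window)
    return result


# Example window whose thresholded neighbours spell 10101011.
example = [
    [200, 40, 210],
    [150, 100, 60],
    [120, 20, 180],
]
print(lbp_code(example))  # 171, i.e. binary 10101011
```

Border pixels are skipped here for simplicity, so the output is two pixels smaller in each dimension than the input.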
• Extract the histogram

Fig 4:- Histogram Extraction

  – Using the image from the previous step and the parameters Grid X and Grid Y, divide the image into multiple grids.
  – Extract the histogram of each region and concatenate the histograms to create a new histogram. This new histogram represents the characteristics of the original image.
• Perform face recognition
  – Compare the input image histogram with the histograms of the other images.
  – For the comparison, we can use methods such as Euclidean distance, chi-square, absolute value, etc.
  – The output is the ID of the image with the closest histogram.
  – The calculated distance can be used as a confidence measure. Note that lower scores mean better confidence, since it is a distance. [8]

D. Object Detection
The object detection module is the counterpart of the Face Detection module for objects. This module takes the image from the face detection module and applies object detection to it. This results in the object IDs of the detected objects. These object IDs are then passed on to the object classification module for further operations. The algorithm used for object detection is YOLO, which is implemented in OpenCV.

YOLO Algorithm
YOLO stands for You Only Look Once. It is an algorithm that uses a Convolutional Neural Network (CNN) to detect objects. YOLO sees the whole image at once, unlike sliding-window or region-based techniques. This helps YOLO capture the contextual information of the image, essentially halving the number of background errors compared to Fast R-CNN.
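As an illustration of how such a module could obtain object IDs, the following minimal sketch assumes a YOLOv3 model in Darknet format loaded through OpenCV's dnn module. The file paths, the 0.5 threshold and the helper names `object_ids_from_rows` and `detect_object_ids` are illustrative assumptions rather than details of the system, and each output row is assumed to have the layout [cx, cy, w, h, objectness, class scores...].

```python
def object_ids_from_rows(rows, conf_threshold=0.5):
    """Collect the class IDs whose best class score reaches the threshold.

    Each row is assumed to be [cx, cy, w, h, objectness, score_0, score_1, ...].
    """
    ids = set()
    for row in rows:
        scores = list(row[5:])
        if not scores:
            continue
        # Index of the highest class score is taken as the object ID.
        best = max(range(len(scores)), key=scores.__getitem__)
        if float(scores[best]) >= conf_threshold:
            ids.add(best)
    return ids


def detect_object_ids(image_path, cfg_path, weights_path, conf_threshold=0.5):
    """Run a Darknet YOLO model through OpenCV and return the detected class IDs."""
    import cv2  # imported lazily so the helper above works without OpenCV

    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    # Scale pixels to [0, 1], resize to 416 x 416 and swap BGR to RGB.
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    ids = set()
    for layer_output in outputs:
        ids |= object_ids_from_rows(layer_output, conf_threshold)
    return ids
```

A call such as `detect_object_ids("photo.jpg", "yolov3.cfg", "yolov3.weights")` (hypothetical file names) would return a set of integer class IDs, which can then be mapped to class labels from the model's label file.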
Fig 5:- YOLO Process
The YOLO algorithm works as follows:
1) Crop or resize the image to equal width and height, e.g. 416 x 416.
2) Divide the image into an S x S grid.
3) Apply the Convolutional Neural Network.
4) Calculate the bounding box predictions with confidence scores and the class probability predictions.
5) Apply non-max suppression to the bounding boxes.
6) Obtain the final image with the bounding boxes, class labels and confidence scores.

Prediction for bounding box:
– X, Y = Coordinates of the center of the box relative to the bounds of the grid cell.
– W, H = Width and height predicted relative to the whole image.
– Confidence: Represents the IoU between the predicted box and the ground truth box. Confidence is given as:
Confidence = Pr(Object) * IoU
where,
IoU = Intersection over Union, i.e. the area of overlap between the predicted box and the ground truth box divided by the area of their union.
Pr(Object) = Probability that the box contains an object.

Prediction of class probabilities:
– Pr(class_i | Object) = Conditional class probability.
– Class-specific confidence score is given as:
Pr(class_i) * IoU = Pr(class_i | Object) * (Predicted Confidence)
Pr(class_i) * IoU = Pr(class_i | Object) * Pr(Object) * IoU
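The confidence formulas and the non-max suppression step can be illustrated with a short, self-contained sketch. It assumes boxes given as corner coordinates (x1, y1, x2, y2) and an IoU threshold of 0.5; both choices, as well as the function names, are assumptions made for this example and not part of the system described here.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def class_confidence(p_class_given_object, p_object, iou_value):
    """Class-specific confidence: Pr(class_i | Object) * Pr(Object) * IoU."""
    return p_class_given_object * p_object * iou_value


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the best-scoring box and drop boxes that overlap it too much.

    detections is a list of (box, score) pairs; the result is sorted by score.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Discard every remaining box whose overlap with the kept box is too high.
        remaining = [d for d in remaining if iou(best[0], d[0]) < iou_threshold]
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),    # overlaps the first box heavily
    ((50, 50, 60, 60), 0.7),  # separate object
]
kept = non_max_suppression(detections)
print([score for _, score in kept])  # [0.9, 0.7]
```

In practice non-max suppression is usually applied per class, so that overlapping boxes of different classes do not suppress each other.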

The YOLO prediction is then encoded as a tensor of shape:
S x S x (B*5 + C)
where,
S x S = Number of grid cells, B = Number of bounding boxes predicted per grid cell (each with the 5 values X, Y, W, H and confidence), C = Number of class probabilities. [6]

E. Face Matching
Face Matching is the module following the face recognition module. It takes the person ID generated by the face recognition module and searches through the current directory to find the images of faces with the same ID. If no matching IDs are found, a message reporting this is returned to the user. If matching images are found, the current directory path and the filenames of the images are passed to either the Image Organization module or the Image Results Display module, based on the Options flag chosen.

F. Object Classification
Object Classification is the counterpart of the face matching module. It takes the object IDs generated by the object detection module and classifies them into their respective classes, and each class is given its class name. It then finds all the matching images with the same object IDs. The class names and the filenames of the matched images are then passed to either the Image Organization or the Image Results Display module, based on how the Options flag is set.

G. Image Organization
The Image Organization module obtains the image filenames and their class names or person IDs from the object classification module or the face matching module. It creates new folders named after the class names or persons. It then moves the images belonging to a certain class or person to the corresponding folder. The same is performed for all the classes and persons. The output of this module is the newly created folders. This module corresponds to the Categorize operation.

H. Image Result Display
The Image Results Display module is the counterpart of the image organization module. This module corresponds to the Search operation. It takes the person IDs or the class names from the Face Matching module or the Object Classification module, respectively. It displays all the images of a class or a person as a list to the user. The output is the sorted list of images.

IV. CONCLUSION

We recognized the problem of organizing images in bulk, i.e. sorting, moving, deleting and sharing images, and decided to come up with a system to solve it. Our project makes use of new and emerging technologies of deep learning and machine learning to bring a new perspective to sorting and grouping images. Finally, this project is fully open source so that other people can contribute their valuable ideas and concepts to it.

REFERENCES

[1]. Meera M K and Shajee Mohan B S, "Object recognition in images," 2016 International Conference on Information Science (ICIS), Kochi, 2016, pp. 126-130. doi: 10.1109/INFOSCI.2016.7845313
[2]. X. Zhou, W. Gong, W. Fu and F. Du, "Application of deep learning in object detection," 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), Wuhan, 2017, pp. 631-634. doi: 10.1109/ICIS.2017.7960069
[3]. C. Tang, Y. Feng, X. Yang, C. Zheng and Y. Zhou, "The Object Detection Based on Deep Learning," 2017 4th International Conference on Information Science and Control Engineering (ICISCE), Changsha, 2017, pp. 723-728. doi: 10.1109/ICISCE.2017.156
[4]. X. Han and Q. Du, "Research on face recognition based on deep learning," 2018 Sixth International Conference on Digital Information, Networking, and Wireless Communications (DINWC), Beirut, 2018, pp. 53-58. doi: 10.1109/DINWC.2018.8356995
[5]. R. L. Galvez, A. A. Bandala, E. P. Dadios, R. R. P. Vicerra and J. M. Z. Maningo, "Object Detection Using Convolutional Neural Networks," TENCON 2018 - 2018 IEEE Region 10 Conference, Jeju, Korea (South), 2018, pp. 2023-2027. doi: 10.1109/TENCON.2018.8650517
[6]. Manish Chablani, "YOLO - You only look once, real time object detection explained," towardsdatascience.com. https://towardsdatascience.com/yolo-you-only-look-once-real-time-object-detection-explained-492dc9230006
[7]. "Face Detection using Haar Cascades," opencv-python-tutroals.readthedocs.io. https://opencv-python-tutroals.readthedocs.io/en/latest/pytutorials=pyobjdetect=pyf
[8]. Kelvin Salton do Prado, "Face Recognition: Understanding LBPH Algorithm," towardsdatascience.com. https://towardsdatascience.com/face-recognition-how-lbph-works-90ec258c3d6b
[9]. Janko Roettgers, "Special report: How we really use our camera phones," gigaom.com. https://gigaom.com/2015/01/23/personal-photos-videos-user-generated-content-statistics/
[10]. Bernard Marr, "How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read," forbes.com. https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/45474c160ba9
