A Survey of Face Detection and Recognition System
Article in Iraqi Journal of Intelligent Computing and Informatics (IJICI), May 2023. DOI: 10.52940/ijici.v2i1.32
Corresponding Author:
Rasha Talib Gdeeb
Department of Environmental Engineering, University of Baghdad
Email: rashatalib1@[Link]
1. INTRODUCTION
Computer vision is a modern field that aims to build intelligent applications able to understand the content of images the way humans do. It mainly depends on separating the pixels inside the image using edge detection and regions of interest. Data acquisition is performed by cameras, which can deliver single pictures or a sequence of images at a given frame rate. The resulting data is then analyzed to extract the main features. These features allow us to reconstruct a description of the outer world, in a manner similar to human perception, that the computer system can understand [1]-[3].
There are many applications of computer vision, including:
• Applications that recognize objects or people in an image.
• Automated control applications (e.g., autonomous vehicles).
• Reconstruction of models (e.g., analysis of medical images).
Recognition is the part of image processing in which we identify people or objects in an image. Humans do this easily, but the problem remains largely unsolved in computer vision. It works by finding the best match within a group of geometric shapes, human faces, printed or written characters, or the position of a background object in an image [5].
There are two types of recognition:
• Recognition: identifying objects in an image from different angles.
• Selection and investigation: selecting a specific feature of a specific object, such as a car plate, or searching an image to find an object (e.g., finding sick cells or finding a car on a highway).
Computer vision systems vary widely, from small systems built for narrow missions to very complicated systems that can detect and recognize many different objects in an image at the same time. Any computer vision system must run through a number of steps [6], [7]:
• Image acquisition: an image is obtained using one or more image sensors (light-sensitive cameras, distance sensors, X-ray machines, radars, ultrasonic cameras). The resulting image may be a 2D or 3D image or a sequence of images.
• Pre-processing: before a computer vision algorithm can be applied to an image, a group of pre-processing operations must be carried out to make sure the data satisfies the assumptions of the algorithm. This can include resampling the image to establish the correct coordinate system, reducing noise to ensure the sensor is not delivering wrong data, and increasing contrast to make sure the required information can be extracted successfully (see the sketch after this list).
• Feature extraction: in this step we obtain multiple levels of resolution from the same image. The extracted landmarks are divided into global landmarks, like colors and shapes, and local landmarks, like corners and spots. More complicated landmarks can also be obtained from the image.
• Image segmentation: a group of important operations, such as choosing a set of landmarks or splitting the image to isolate the region of interest containing the object being searched for.
• High-level operations: the data entering this stage is a small subset of the total data, such as the region of the image in which the searched object may be found.
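As a concrete illustration of the acquisition and pre-processing steps above, here is a minimal Python sketch assuming OpenCV is available; the file names and filter sizes are illustrative only, not from the paper.

import cv2

# Acquisition: load a frame from disk (stands in for a camera or sensor).
image = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Noise reduction: a small Gaussian blur suppresses sensor noise.
denoised = cv2.GaussianBlur(image, (5, 5), 0)

# Contrast enhancement: histogram equalization spreads the intensity range
# so the information of interest is easier to extract later.
enhanced = cv2.equalizeHist(denoised)

cv2.imwrite("preprocessed.png", enhanced)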
2. RELATED WORKS
Wenming et al. [8] proposed a DPSRC approach to recognize the face in an image: instead of using all the patches in an image, which is time consuming and needs a large amount of memory, only the important patches are selected, using a Bagging-based greedy search over the candidate patches that produces a series of local optima. Xavier et al. [9] studied the uncontrolled environmental conditions that can affect the accuracy of a face recognition system, such as varying face orientation. For face recognition the authors use a Robust Sparse Coding algorithm, which uses a weight matrix W to increase the system performance. The system was evaluated on the LFWA database.
Young Zhu et al. [10] studied the effect of illumination on face recognition systems. Previous studies did not take the spectral wavelength into account, which leads to lower accuracy. The authors therefore propose a new algorithm, the Logarithm Gradient Histogram, which captures the three components needed to solve the illumination problem (direction, magnitude, and spectral wavelength) under all lighting conditions. The algorithm relies on a multi-scale band-pass filter to remove the illumination effects from the image, and was evaluated on the Yale B database.
Ding et al. [11] introduced an HPN method that can use two- and three-dimensional representations and addresses three tasks for FIER, while avoiding the loss of semantic information. Vigneau et al. [12] focused on problems caused by environmental conditions, finding that temporal variation degrades performance; to avoid this effect the authors use two thermal face databases captured under real static and variable conditions.
Several metrics can be used to verify that a face recognition system is working well; they are defined in [1]-[5]. The False Accept Rate (FAR) is the probability that the system incorrectly matches the input image to a non-matching pattern in the database; it gives the percentage of invalid inputs that are incorrectly accepted. The second metric is the False Reject Rate (FRR), which measures how often the system fails to detect a match between the input image and the templates in the database; it gives the percentage of valid inputs that are incorrectly rejected.
Another metric is the Receiver Operating Characteristic (ROC) plot, which shows the trade-off between FAR and FRR. The last metric is the Equal Error Rate (EER), the rate at which the accept and reject errors are equal, which can be obtained from the ROC. A lower EER means the system is more accurate.
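To make these metrics concrete, the following Python sketch computes FAR, FRR, and the EER from two score lists; the function name and the example scores are illustrative assumptions, not values from the paper.

import numpy as np

def far_frr_eer(genuine_scores, impostor_scores, thresholds):
    """Sweep a decision threshold and report FAR, FRR, and the EER."""
    genuine = np.asarray(genuine_scores)    # scores of true matches
    impostor = np.asarray(impostor_scores)  # scores of non-matches
    far, frr = [], []
    for t in thresholds:
        far.append(np.mean(impostor >= t))  # invalid inputs wrongly accepted
        frr.append(np.mean(genuine < t))    # valid inputs wrongly rejected
    far, frr = np.array(far), np.array(frr)
    eer_index = np.argmin(np.abs(far - frr))  # point where FAR is closest to FRR
    return far, frr, (far[eer_index] + frr[eer_index]) / 2

# Example with made-up similarity scores in [0, 1]:
far, frr, eer = far_frr_eer([0.9, 0.8, 0.7], [0.3, 0.5, 0.6],
                            thresholds=np.linspace(0, 1, 101))
print(f"EER = {eer:.3f}")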
A face recognition system is a kind of biometric system; biometric systems may rely on face scans, fingerprints, footprints, hand scans, iris scans, etc. Face recognition depends on biometric features of the face, such as the position of the eyes and the position and size of the lips. We first detect the existence and position of a face inside the image, and then recognize it. This task is done easily by the human brain, but it needs training in a computer system. Figure 1 shows the different types of biometric systems.
Biometric systems work in different modes. When the target is identification of a particular object in an image, we must first create a dataset of features for all the data available to us and then compare it with the newly entered data. For example, a template-matching algorithm based on correlation can be used here; we only need to find an object in the image.
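A minimal template-matching sketch, assuming OpenCV; the file names are placeholders. It locates the position in the scene whose correlation with the stored template is highest.

import cv2

# Search a scene for the best correlation with a stored template.
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

# Normalized cross-correlation: values near 1.0 indicate a strong match.
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

h, w = template.shape
print(f"Best match score {max_val:.2f} at top-left corner {max_loc}")
# max_loc is the top-left corner; the object occupies a w x h box from there.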
On the other hand, in a verification task we need to verify whether, for example, the face in the image belongs to a specific person we are searching for. This is done by validating the collected features against pre-stored ones. Figure 2 and Figure 3 illustrate these tasks. In Figure 2 we match two images to decide whether they show the same person, so the result is yes or no. In Figure 3, we compare the person's image with several images in the dataset and return all the data saved about the matching identity.
Nowadays, face recognition systems are being used in a fast-growing number of applications and have become the most important verification mechanism in mobile security applications. Despite its importance, this technology faces a large number of challenges; chief among them are, for example, illumination changes in the image when environmental conditions change, and aging, which makes it hard for the system to produce an accurate recognition.
The main difficulty in detecting and recognizing faces is that the human face is not a rigid object: the image structure changes if the illumination changes or the face pose changes, aging alters the face, and problems also arise if the capturing device has low resolution [8]-[11].
If the resolution of the camera decreases, some of the textures in the image are lost, and the size of the image may decrease too. This would force us to build a system that works at any resolution and size, which is clearly impractical. Figure 5 shows the effect of resolution on the image.
Any change in face pose can make the system less accurate, since the available face data do not cover all face positions in the image. To avoid this problem we have two solutions: either the face dataset must contain all positions of the user's face, which means a huge dataset, or the faces must be aligned to a standard pose in a pre-processing step before recognition. A large dataset in turn requires a large memory, which adds cost.
The dataset of a person may be collected at a young age; over time, aging changes the person's facial texture and hence the images, all of which affects the accuracy of the system. It is impossible to collect a dataset covering a person at all ages, so the accuracy of the system degrades; see Figure 6 and Figure 7.
Images can also vary in facial style and hairstyle; makeup, for example, can affect the recognition task, and expressions such as smiling or anger can change the accuracy of the whole system. Saving all expression types in the dataset is a possible remedy, but it is time consuming; see Figure 8.
Another effect comes from additional objects such as clothes and glasses; the person in the image may have a beard or a mustache that was not present in the dataset. Figure 9 shows the effect of occlusion on an image: as we can see, it can be hard to confirm that all the images show the same person.
When we build a face recognition system, we must first locate the face position, which differs between a still image and a video sequence. It can be difficult to isolate the face from the background in poor-quality images. Figure 10 shows a low-resolution, rotated face; the rotation causes loss of face landmarks and decreases the accuracy of the recognition task.
In some cases, two people may look so similar that the recognition result is wrong; see Figure 11.
Face recognition is a really easy task for humans, but it is difficult for a computer system, because the system we build must decide whether each pixel belongs to a face or not.
3.2. Normalization step
Once we are able to isolate the face image from the background, we normalize it: an operation that standardizes the image with respect to pose, size, and illumination.
To normalize a face image, the face landmarks must be located carefully and accurately. We therefore perform pre-processing steps that include:
• Histogram equalization: the intensity values are redistributed so that they occur with approximately equal probability. Mathematically, the probability of intensity i is
P(i) = n_i / N (1)
where n_i is the number of pixels with intensity i and N is the total number of pixels in the image.
• Adaptive histogram equalization: the histogram is computed over a local window centered on the given pixel and used to map that pixel's value, enhancing the local contrast of the image.
• Computing the gradient: this value is important for extracting properties of the face in the image, such as the surface geometry.
• Gamma correction: a transformation that brightens or darkens an image using the equation
M(x, y) = N(x, y)^(1/γ) (2)
where N(x, y) is the input image, M(x, y) is the output image, and γ is the correction constant. Since the exponent is 1/γ, for pixel values normalized to [0, 1] a value of γ > 1 brightens the image and γ < 1 darkens it.
• Log correction: a logarithmic transformation applied to grayscale images, using equation (3):
S = C · log(1 + r) (3)
where C is a constant, r is the input pixel (1 is added so the logarithm is never taken of zero), and S is the output pixel. This transform expands low gray levels and compresses high ones [12], [13]; see Figure 12 and Figure 13.
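The following Python sketch (assuming OpenCV and NumPy, with illustrative parameter values) implements equations (1)-(3) above.

import numpy as np
import cv2

def gamma_correction(img, gamma):
    """Equation (2): M(x, y) = N(x, y)^(1/gamma) on a [0, 1] image."""
    normalized = img.astype(np.float64) / 255.0
    return (normalized ** (1.0 / gamma) * 255).astype(np.uint8)

def log_correction(img, c=1.0):
    """Equation (3): S = c * log(1 + r); expands low gray levels."""
    s = c * np.log1p(img.astype(np.float64))
    return (s / s.max() * 255).astype(np.uint8)  # rescale to the 8-bit range

face = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)
equalized = cv2.equalizeHist(face)                     # equation (1), global
adaptive = cv2.createCLAHE(clipLimit=2.0).apply(face)  # adaptive variant
brighter = gamma_correction(face, gamma=2.0)  # exponent 1/2 lifts dark pixels
compressed = log_correction(face)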
This step creates a template called a biometric reference, which is stored in the database. Many algorithms are used for this task, such as Gabor filters or the LBP operator. Features can also be extracted using a PCA (Principal Component Analysis) transformation, which reduces the dimensionality of the feature space. These feature extraction methods increase the speed of computation and decrease the size of the dataset.
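As one example of building such a reference, here is a sketch of an LBP-histogram template using scikit-image (a library choice assumed here, not named by the paper), with illustrative parameter values.

import numpy as np
from skimage.feature import local_binary_pattern

def lbp_template(face_gray, points=8, radius=1):
    """Build a compact biometric reference from LBP codes."""
    codes = local_binary_pattern(face_gray, points, radius, method="uniform")
    # "uniform" LBP yields points + 2 distinct code values.
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2),
                           density=True)
    return hist  # small, fixed-length vector stored in the database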
• Fisher faces: one of the popular algorithms used in face recognition, widely believed to be superior to other techniques such as eigenfaces because it explicitly maximizes the separation between classes during training; see Figure 14.
Author | Features | Matching | Database | Accuracy
MinMoon et al. [13] | CNN | Euclidean distance | IPES-1280 Face database | 88.9%
Decheng et al. [17] | SIFT and HOG | Euclidean distance | e-PRIP | 70.1 ± 5.94
Neural networks are one of the modern technologies that give high accuracy in classification and prediction tasks. Their speed and compatibility with many scientific domains make neural networks one of the most widely used techniques. There are different types of neural networks, such as feedforward neural networks (FNN), recurrent neural networks (RNN), and convolutional neural networks (CNN). These networks all share the notion of neurons and of input and output layers with one or several hidden layers, and differ in the tasks performed by these layers.
Convolutional neural networks (CNN) carry out classification either with or without a supervisor. Supervised training provides a number of corresponding inputs and outputs, and the system learns to connect them and predict the output. In unsupervised learning, only the known input values are used, and the system tries to link these values with the distribution of the available output data. Figure 15 shows the supervised training method for an image classification system. At the beginning, the input images are fed in and a number of their properties, such as edges and gradients, are computed in the first layers. In the middle stage, a number of features are extracted from the previous stage, and in the last stage the final features are extracted and passed to the classifier [16].
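As an illustration of this layered structure, the following sketch defines a small CNN in PyTorch; the framework choice and all layer sizes are assumptions, not the paper's. It expects 64 x 64 grayscale inputs.

import torch.nn as nn

# Early conv layers pick up edges and gradients, deeper ones compose them
# into higher-level features, and the final linear layer is the classifier.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # low-level edge features
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 64x64 -> 32x32
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level features
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 10),  # classifier over 10 hypothetical identities
)
# With supervised training, (image, label) pairs pass through the model and a
# loss such as nn.CrossEntropyLoss() drives the weight updates.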
• Normalization: in this stage we divide by the standard deviation of the whole training group to squeeze the range of the data, using equation (6):
x'' = x' / sqrt( Σ_{i=1..N} (x_i − x̄)² / (N − 1) ) (6)
• PCA whitening: this stage aims to decrease the correlation between the different data dimensions. We compute the covariance matrix, which encodes the correlations between data dimensions, and then apply SVD (Singular Value Decomposition) to obtain the eigenvectors, which can be divided into groups representing the principal components of the image features (see the sketch after this list).
• Local contrast normalization: aims to obtain features with higher contrast. We take the neighborhood of each pixel, compute the average of these pixels for centering, and then divide by the standard deviation of all the pixels, provided it is greater than one.
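A minimal PCA-whitening sketch in NumPy, following the covariance-plus-SVD recipe described above; the epsilon term is a common numerical safeguard and is an assumption here.

import numpy as np

def pca_whiten(X, eps=1e-5):
    """X: one flattened image per row. Decorrelates the data dimensions."""
    X = X - X.mean(axis=0)              # center each dimension
    cov = X.T @ X / (X.shape[0] - 1)    # covariance matrix
    U, S, _ = np.linalg.svd(cov)        # eigenvectors via SVD
    return (X @ U) / np.sqrt(S + eps)   # project, then equalize variance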
CNN Layers

9. CONVOLUTIONAL LAYER
The convolutional layer is the most important layer in a convolutional neural network. It contains a collection of filters (also known as convolutional kernels) and convolves these filters with the input to obtain the output that correlates best with the input features. The filter itself is a small grid of discrete numbers, like the filter shown in Figure 16.
This task is also called subsampling, because the output has fewer samples than the input image. We also need to add zero pixels (padding) at the image edges to make sure the filter covers the edges too.
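A minimal NumPy sketch of a single-channel convolution with zero padding, as described above; the Sobel-like kernel is an illustrative choice.

import numpy as np

def conv2d(image, kernel, pad=1):
    """Slide a filter over a zero-padded image (single channel, stride 1)."""
    padded = np.pad(image, pad)      # zero pixels added on the borders
    kh, kw = kernel.shape
    oh = padded.shape[0] - kh + 1
    ow = padded.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

edge_filter = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # Sobel-like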
Pooling layer
This layer selects one pixel from each mask, which can be the average or the maximum pixel; we need to choose the area from which it is selected, as shown in Figure 19. We can then define the regions of interest and train the network on them.
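A minimal NumPy sketch of the pooling operation just described, keeping the maximum or the average pixel of each mask; the 2 x 2 mask size is illustrative.

import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Keep one pixel per size x size mask: the maximum or the average."""
    h, w = feature_map.shape
    blocks = feature_map[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size)
    reduce = np.max if mode == "max" else np.mean
    return reduce(blocks, axis=(1, 3))  # one value per mask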
10. CONCLUSIONS
We found that several methods can improve recognition performance. First of all, we must use a large number of images of the person (at least 75 images). Images must be collected in multiple face poses, with different lighting conditions and angles. We can use colored images instead of grayscale images, together with edge detection, to increase the neural network accuracy.
Using noise-removal techniques also plays an important role in obtaining good feature extraction. The dataset can be enlarged with traditional image processing techniques such as mirroring, scaling, and resizing, which can increase the performance of the neural network.
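A sketch of such dataset enlargement with OpenCV; the specific transforms and scale factor are illustrative assumptions.

import cv2

def augment(image):
    """Grow the dataset with simple mirrored and rescaled variants."""
    variants = [image]
    variants.append(cv2.flip(image, 1))           # horizontal mirror
    h, w = image.shape[:2]
    shrunk = cv2.resize(image, (w // 2, h // 2))  # scale down ...
    variants.append(cv2.resize(shrunk, (w, h)))   # ... and back up
    return variants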
We must keep in mind that increasing the dataset size requires more processing power and memory to build a good feature extraction model for the face recognition system.
References:
[1] D. Meena and R. Sharan, "An approach to face detection and recognition", 2016 International Conference on Recent Advances and Innovations in Engineering (ICRAIE), December 2016. DOI: 10.1109/ICRAIE.2016.7939462.
[2] E. Rekha and P. Ramaprasad, "An efficient automated attendance management system based on Eigen Face recognition", 2017 7th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 2017.
[3] A. K. Cherukuri, "Intrusion detection model using fusion of PCA and optimized SVM", International Conference on Contemporary Computing and Informatics (IC3I), 2014.
[4] [Link], A. Nugroho Jati and [Link], "Face recognition based on the Android device using LBP algorithm", 2015 International Conference on Control, Electronics, Renewable Energy and Communications (ICCEREC), August 2015. DOI: 10.1109/ICCEREC.2015.7337037.
[5] K. Khanchandani, D. Sangoi and G. Panchal, "Animal recognition using Local Binary Pattern Histogram (LBPH)", International Journal of Advances in Electronics and Computer Science, ISSN 2393-2835, vol. 5, no. 7, July 2018.
[6] [Link], S. Masmoudi, A. G. Derbel and A. Ben Hamida, "Fusing Gabor and LBP feature sets for KNN and SRC-based face recognition", International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Monastir: IEEE, 2016, pp. 453-458. ISBN 978-1-4673-8526-8. DOI: 10.1109/ATSIP.2016.7523134.
[7] N. Stekas and D. van den Heuvel, "Face recognition using Local Binary Patterns Histograms (LBPH) on an FPGA-based System on Chip (SoC)", IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Chicago: IEEE, 2016, pp. 300-304. ISBN 978-1-5090-3682-0. DOI: 10.1109/IPDPSW.2016.67.
[8] N. Stekas and D. van den Heuvel, "Face recognition using Local Binary Patterns Histograms (LBPH) on an FPGA-based System on Chip (SoC)", May 2016. DOI: 10.1109/IPDPSW.2016.67.
[9] A. A. Komlavi and H. Naroua, "Comparative study of machine learning algorithms for face recognition", HAL Id: hal-03620410, submitted 27 Sep 2022.
[10] A. Nehme, "Understanding Convolutional Neural Networks", May 24, 2018, [Link], available online.
[11] S. Guo, S. Chen and Y. Li, "Face recognition based on convolutional neural network and support vector machine", International Conference on Information and Automation (ICIA), Ningbo, China, August 2016.
[12] S. Guo, S. Chen and Y. Li, "Face recognition based on convolutional neural network and support vector machine", IEEE International Conference on Information and Automation (ICIA), Ningbo: IEEE, 2016, pp. 1787-1792. ISBN 978-1-5090-4102-2. DOI: 10.1109/ICInfA.2016.7832107.
[13] H.-M. Moon, C. Seo and S. Pan, "A face recognition system based on convolution neural network using multiple distance face", Soft Computing, vol. 21, 2017. DOI: 10.1007/s00500-016-2095-0.
[14] E. I. Abbas and H. Farhan, "Face recognition using DWT with HMM", Engineering & Technology Journal, vol. 30, pp. 142-154, 2014.
[15] T. Mandal and Q. M. J. Wu, "Face recognition using curvelet based PCA", 2008. DOI: 10.1109/ICPR.2008.4760972.
[16] A. Vinay, J. Mathias, A. Fathepur, K. N. Balasubramanya Murthy and S. Natarajan, "BF-ASIFT-2DPCA and ABF-ASIFT-2DPCA for face recognition", Procedia Technology, vol. 25, pp. 411-419, 2016. DOI: 10.1016/j.protcy.2016.08.126.
[17] D. Liu, J. Li, N. Wang, C. Peng and X. Gao, "Composite components-based face sketch recognition", Neurocomputing, vol. 302, 2018. DOI: 10.1016/j.neucom.2018.03.042.
[18] G. Hu, Y. Yang, D. Yi, J. Kittler, S. Li and T. Hospedales, "When face recognition meets with deep learning: an evaluation of convolutional neural networks for face recognition", pp. 384-392, 2015. DOI: 10.1109/ICCVW.2015.58.