
e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


(Peer-Reviewed, Open Access, Fully Refereed International Journal)
Volume:06/Issue:02/February-2024 Impact Factor- 7.868 www.irjmets.com

FACIAL EMOTION DETECTION AND RECOGNITION


Ms. Sheetal Sehgal*1, Deepak Chandra*2, Rohan Kumar Dogra*3
*1HMR Institute Of Technology And Management, Hamidpur, Delhi, India.
*2,3Guru Gobind Singh Indraprastha University, Sector-16C, Dwarka, Delhi, India.
ABSTRACT
Facial emotional expression plays a crucial role in face recognition: humans interpret it effortlessly, but
developing a computer algorithm that does the same is a challenging task. However, with continuous
advancements in computer vision and machine learning, it has become possible to detect emotions in various
forms such as images and videos. This research proposes a method for facial expression recognition using deep
neural networks, particularly the convolutional neural network (CNN), combined with image edge detection.
During the convolution process, the edges of each layer of the facial expression image are extracted and
normalized to preserve the texture and structure information. These retrieved edge details are then
incorporated into each feature image. The study explores and analyzes several datasets to train expression
recognition models. The main objective of this paper is to conduct a comprehensive investigation into face
emotion detection and recognition using machine learning algorithms and deep learning techniques. The
research aims to provide deeper insights into this field and shed light on the variables that significantly impact
its efficacy.
Keywords: Convolutional Neural Network, Machine Learning, Deep Learning, Computer Vision, Emotion
Recognition.
I. INTRODUCTION
Human-computer interaction technology is a type of technology that utilizes computer equipment as a medium
to facilitate interaction between humans and computers. The face recognition system (FRS) is a mechanism that
enables cameras to automatically identify individuals. The significance of accurate and effective FRS has
spurred biometric research in the race towards the digital world. In recent years, the field of human-computer
interaction technology has witnessed an increase in research activities due to the rapid advancements in
pattern recognition and artificial intelligence. Facial Emotion Recognition (FER) is a thriving area of study that
has seen numerous breakthroughs in areas such as automatic translation systems and machine-to-human
interaction. Against this backdrop, this paper surveys and reviews various aspects of facial feature extraction,
emotion databases, classifier algorithms, and more. Classical FER involves two main steps: feature
extraction and emotion recognition. Additionally, image pre-processing, which includes face detection,
cropping, and resizing, is performed. Face detection involves isolating the facial region by removing the
background and non-face areas. Finally, the extracted features are utilized to classify emotions, often with the
assistance of neural networks (NN) and other machine learning approaches. The challenge in facial emotion
recognition lies in automatically recognizing facial emotion states with high accuracy. It is difficult to find
similarities in the same emotional state between different individuals, as they may express emotions in various
ways depending on factors such as mood, skin color, age, and the surrounding environment. Typically, the
process of Facial Emotion Recognition (FER) can be divided into three primary phases, as illustrated in Figure
1: (i) Face Detection, (ii) Feature Extraction, and (iii) Emotion Classification.
In the initial stage, known as the pre-processing stage, an image of a face is identified and the facial components
within that region are detected. Moving on to the second stage, an informative feature is extracted from various
parts of the face. Finally, in the last stage, a classifier must undergo training before it can be utilized to generate
labels for the emotions using the training data. Facial actions are then classified into different Action Units
(AUs), and emotions are categorized based on collections of these AUs. Deep learning, a subset of machine
learning techniques, can be applied to emotion recognition and facial expression analysis. However, the
effectiveness of deep learning is influenced by the size of the available data, which can impact its performance.
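The three-phase pipeline described above can be sketched as a chain of placeholder stubs. Everything below is a hypothetical illustration of the control flow only (the function names and internals are made up, not the authors' code):

```python
import numpy as np

def detect_face(image):
    # Stage 1: isolate the facial region (placeholder: centre crop).
    h, w = image.shape[:2]
    return image[h // 4: 3 * h // 4, w // 4: 3 * w // 4]

def extract_features(face):
    # Stage 2: turn raw pixels into a compact descriptor
    # (placeholder: normalised mean intensity per row).
    return face.mean(axis=1) / 255.0

def classify_emotion(features, n_classes=7):
    # Stage 3: map the descriptor to one of seven emotion labels
    # (placeholder: an untrained linear scorer with fixed random weights).
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((n_classes, features.shape[0]))
    return int(np.argmax(weights @ features))

image = np.zeros((48, 48), dtype=np.uint8)  # dummy 48x48 grey-level face image
label = classify_emotion(extract_features(detect_face(image)))
```

In a real system each stub would be replaced by the corresponding component (a face detector, a trained feature extractor, a trained classifier), but the staged data flow stays the same.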
II. METHODOLOGY
This section describes the emotion database used for the study and the Inception model. Additionally, this
paper utilizes a Haar classifier for face detection. The Haar classifier is trained using
Haar-like small features, which are commonly used texture descriptors. Its main feature types include linear, edge,
center, and diagonal characteristics. Haar-like features effectively reflect the grey-level changes in an
image, making them well suited to describing facial features, given the distinct contrast changes around facial
components. However, the calculation of the feature values is time-consuming; to improve calculation speed, this paper
employs the integral-image method for calculating the Haar-like values.
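The integral-image method mentioned above can be sketched in a few lines (a generic illustration of the standard technique, not this paper's implementation): each entry stores the sum of all pixels above and to the left, so the sum over any rectangle costs only four table lookups.

```python
import numpy as np

def integral_image(img):
    # ii[y, x] holds the sum of img[:y, :x]; the padded row/column of
    # zeros keeps the four-lookup formula branch-free at the borders.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    # Sum of the h-by-w rectangle with top-left corner (y, x),
    # computed from four corner lookups in O(1).
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 4, 4)  # 0 + 1 + ... + 15 = 120
```

Because every rectangle sum is constant-time after one linear pass, evaluating thousands of Haar-like features per window becomes affordable.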
1. Face Detection
Face detection serves as a pre-processing phase to identify the facial expressions of humans. The image is
divided into two parts, one containing faces and the other containing non-face regions. Various methods are
employed for face detection.
A. Haar Classifier
Haar features are commonly evaluated at expanded or reduced scales of a pixel group, allowing objects of
varying sizes to be detected in an image. In the training phase, the Haar classifier identifies the group of
features that contributes most to the face detection problem. This makes it well suited to face detection,
as it achieves high detection accuracy while keeping the computational complexity low.
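Building on the integral image, a two-rectangle edge feature of the kind used by such classifiers reduces to a difference of two rectangle sums. The sketch below follows the usual Viola-Jones-style formulation rather than this paper's code; the window coordinates are illustrative:

```python
import numpy as np

def haar_edge_feature(img, y, x, h, w):
    # Vertical edge feature over an h-by-w window: (sum of left half)
    # minus (sum of right half). A large magnitude marks a strong
    # grey-level change, e.g. at the boundary of a facial component.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

    def rect(y0, x0, hh, ww):
        return ii[y0 + hh, x0 + ww] - ii[y0, x0 + ww] - ii[y0 + hh, x0] + ii[y0, x0]

    half = w // 2
    return rect(y, x, h, half) - rect(y, x + half, h, w - half)

# A block that is bright on the left and dark on the right
# yields a large positive response; a uniform block yields zero.
img = np.zeros((6, 6), dtype=np.int64)
img[:, :3] = 255
response = haar_edge_feature(img, 0, 0, 6, 6)
```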
2. Feature Extraction
Feature extraction involves transforming pixel data from the face region into a higher-level representation of
the face or its components, such as shape, color, texture, and spatial configuration. By reducing the dimension
of the input space, feature extraction retains the important information. It plays a vital role in developing a
more robust emotion categorization system, as the extracted facial features provide inputs to the classification
module that categorizes different emotions. Feature extraction can be categorized into two types: feature-based
and appearance-based.
A. Convolutional Neural Network (CNN)
Currently, the CNN is one of the most widely used deep learning approaches. It is designed to require
minimal pre-processing and takes its name from its convolutional layers; a CNN's hidden layers typically also
include pooling layers, fully connected layers, and normalization layers.
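The convolution and pooling operations that make up these hidden layers can be illustrated with a minimal NumPy forward pass. This is an educational sketch of the two building blocks only, not a full CNN; the kernel choice is arbitrary:

```python
import numpy as np

def conv2d(img, kernel):
    # Valid 2-D convolution (strictly, cross-correlation, as in most
    # deep-learning libraries): slide the kernel over the image and
    # take the dot product at each position.
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feat, size=2):
    # Non-overlapping max pooling: halves the spatial resolution while
    # keeping the strongest activation in each window.
    h, w = feat.shape[0] // size, feat.shape[1] // size
    return feat[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)       # toy 6x6 intensity ramp
kernel = np.array([[-1.0, -1.0], [1.0, 1.0]])        # simple horizontal-edge filter
feat = max_pool(np.maximum(conv2d(img, kernel), 0))  # conv -> ReLU -> pool
```

Stacking several such conv-ReLU-pool stages, followed by fully connected layers, yields the architecture described above.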
3. Classification of Expressions
The classification of expressions is carried out by a classifier, which employs various methods to extract
expressions. One of these methods is supervised learning, where a system is trained using labeled data. The
labeled data acts as a guide for the model, which learns from both the inputs and outputs provided. With this
knowledge, the model can then predict the classification of new data points. Supervised learning encompasses
two types: classification and regression.
A. Support Vector Machine (SVM)
SVM is a well-known statistical technique used in machine learning for classification and multivariate analysis.
It utilizes different kernel functions to map data from the input space to high-dimensional feature spaces.
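The kernel mapping described above can be illustrated with the widely used Gaussian (RBF) kernel. This is a generic example of one such kernel function; the paper does not specify which kernel it employs:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian (RBF) kernel: k(x, y) = exp(-gamma * ||x - y||^2).
    # It equals an inner product in an implicit high-dimensional feature
    # space, which is what lets an SVM fit nonlinear decision boundaries.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

same = rbf_kernel([1.0, 2.0], [1.0, 2.0])  # identical points -> 1.0
far = rbf_kernel([0.0, 0.0], [10.0, 0.0])  # distant points -> near 0
```

The kernel value acts as a similarity score: 1 for identical feature vectors, decaying towards 0 as they move apart, with `gamma` controlling how fast.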
B. Neural Network (NN)
NN performs a nonlinear reduction of input dimensionality and makes a statistical determination regarding the
category of the observed expression. Each output unit provides an estimation of the probability that the
examined expression belongs to the associated category.
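Such per-category probability estimates are commonly produced by a softmax output layer. A minimal sketch follows; the seven logit values are made up purely for illustration:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: shift by the maximum before
    # exponentiating, then normalise so the outputs sum to 1 and can
    # be read as per-category probability estimates.
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# Seven output units, one per emotion category.
probs = softmax([2.0, 0.5, 0.1, -1.0, 0.0, 0.3, 1.2])
predicted = int(np.argmax(probs))  # index of the most probable emotion
```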
C. Inception-V1 to V3
The Inception network represents a significant advancement in CNN classifiers. Inception-V1 consists of 22
layers and a total of 5 million parameters, and its design incorporates numerous techniques to enhance
performance in terms of both speed and precision; it is widely used in machine learning applications.
Inception-V2, its successor, has 24 million parameters. Inception-V3 is a popular image recognition model
that has demonstrated an accuracy of over 78.1 percent on the ImageNet dataset, although its usage is less
widespread.

III. DATASET
To conduct experiments on Facial Emotion Recognition (FER), a standard database is necessary. The data
used may be primary or secondary; primary datasets take considerable time to assemble because the data must
first be collected. A variety of datasets is now available for FER study, although only a few are designed
specifically for the emotion recognition problem. Among these, the Karolinska Directed Emotional Faces (KDEF)
and Japanese Female Facial Expression (JAFFE) datasets are well known and highly regarded in the field. The
images in both datasets are categorized into seven main emotion categories. The KDEF dataset was developed at
the Karolinska Institute in Sweden, primarily for experiments related to perception, memory, emotional
attention, and backward masking. It consists of 4900 photos of 70 individuals, each displaying seven different
emotional states.
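The seven-category labelling shared by these datasets maps naturally onto integer class labels for training. The emotion names and their ordering below are an illustrative convention chosen here, not taken from the datasets' documentation:

```python
# Seven basic emotion categories of the KDEF/JAFFE style, encoded as
# integer class labels (the ordering is a convention of this sketch).
EMOTIONS = ["afraid", "angry", "disgusted", "happy", "neutral", "sad", "surprised"]
LABEL = {name: idx for idx, name in enumerate(EMOTIONS)}

def encode(names):
    # Map a list of emotion names to their integer class labels,
    # as expected by a classifier's training loop.
    return [LABEL[n] for n in names]

labels = encode(["happy", "sad", "happy"])
```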
IV. RESULTS AND DISCUSSION
In order to evaluate the algorithm's performance, we initially utilized the FER-2013 expression dataset. This
dataset consisted of only 7178 images, 412 of them posed, and yielded a maximum accuracy of 55%. To address
this low accuracy, we obtained multiple datasets from the Internet and also included the authors' own pictures
depicting various expressions. As the number of images in the dataset increased, so did the accuracy. We
divided the resulting 11K-image dataset into 70% training images and 30% testing images. Both the
background removal CNN (first-part CNN) and the face feature extraction CNN (second-part CNN) had the same
number of layers and filters. The number of layers in this experiment ranged from one to eight, and we found
that the highest accuracy was achieved with four layers. Increasing the number of layers tended to raise
accuracy at the cost of longer execution time, although the added execution time was not a significant concern
for our research. Based on the accuracies obtained from the test set, our new
method outperformed existing ones. It is important to note that the proposed method only misclassified a few
photographs with perplexing perspectives, and overall identification accuracy remained impressive. Therefore,
this method shows promise in real-world environments where non-frontal or angularly captured photos are
common.
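The 70/30 split described above can be sketched as follows. This is a generic reproducible split over sample indices, not the authors' exact procedure:

```python
import numpy as np

def train_test_split(n_samples, train_frac=0.7, seed=0):
    # Shuffle sample indices reproducibly, then cut at the training
    # fraction; round() avoids floating-point off-by-one at the cut.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    cut = round(n_samples * train_frac)
    return idx[:cut], idx[cut:]

# 11K images -> 7700 training indices, 3300 testing indices.
train_idx, test_idx = train_test_split(11_000)
```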

However, the algorithm failed when multiple faces were present in the same image at the same distance from the
camera. It was also observed that, past a point, increasing the number of photos decreased accuracy due to
over-fitting, while reducing the number of training photos resulted in consistently low accuracy. After a
thorough investigation, it was determined that the optimal number of images for FER to function effectively
falls within the range of 2000 to 11,000.
V. CONCLUSION
In this research, we present a novel approach for identifying facial expressions using a CNN model. Our method
effectively extracts facial features by directly inputting the pixel values of training sample images. By removing
the background, we significantly enhance the accuracy of emotion determination. Because emotional expression
plays a crucial role in communication, better recognition can improve the quality of human interaction.
Additionally, the study of facial
expression detection holds the potential for providing enhanced feedback to society and improving Human-Robot
Interfaces (HRI). Emotion detection primarily focuses on the geometric aspects of the face, such as the
eyes, eyebrows, and mouth. Our review considers experiments conducted in controlled environments, real-time
scenarios, and wild images. The recent research, particularly in terms of performance with profile views, can be
applied to a wider range of real-world commercial applications, including patient monitoring in hospitals and
surveillance security. Furthermore, the concept of facial emotion recognition can be expanded to encompass
emotion recognition from speech or body motions, addressing emerging industrial applications.
