Emotion Detection Using Machine Learning
https://doi.org/10.22214/ijraset.2023.53251
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue V May 2023- Available at www.ijraset.com
Abstract: In this article, machine learning algorithms are used to identify human emotions from a person's facial expression. The method achieves a detection accuracy of 87%, and three separate algorithms are evaluated, each with good accuracy. Machine learning has been one of the most disruptive advances of the past ten years and has a large impact on reliability and prediction accuracy.
Emotion recognition is one of the crucial non-verbal skills employed in human communication, helping to ascertain a person's mindset and attitude. Machines could be more useful to mankind if they could recognise and understand human emotions. The two most popular methods for determining emotions rely on speech and on facial expression. According to some psychologists, about 55% of communication takes place through facial expressions.
The objective of this paper is to examine recent deep learning research on automatic facial emotion recognition (FER). We concentrate on the architectures and databases used and the contributions made, and we highlight progress by contrasting the proposed methods and their results. This study aims to aid and guide researchers by analysing earlier work and recommending directions for furthering the field.
Keywords: Machine Learning, CNN, FER, Emotions.
I. INTRODUCTION
Communication is essential for human-machine interaction to progress. It is clear that people prefer to communicate with technology using natural language, which has developed over time. In addition to the language we use to express ourselves, emotions serve as a natural means of communication between people. It would be advantageous, and take communication one step further, if machines were able to understand human emotions.
Emotion recognition can be performed in numerous ways, but speech-based and facial-expression-based recognition are the main areas of focus. Both can be implemented in a number of ways, including with deep learning or standard machine learning techniques. This article employs non-deep-learning-based, often called classical, machine learning techniques.
Human-machine contact can only have a major impact through effective communication, and emotion identification is one of the crucial non-verbal techniques used in this communication. Both verbal and non-verbal communication techniques are useful. Emotions can be captured in a variety of ways, such as through speech, gestures, and facial and bodily expressions. According to some psychologists, about 55% of communication takes place through facial expressions. Facial features can change in a variety of ways as a result of emotion-induced changes in facial muscle activity, so by identifying variations in facial features, one can infer the mood depicted in an image. Owing to advancements in related fields, particularly machine learning, image processing, and human cognition, FER has evolved greatly in recent years. As a result, the influence and prospective applications of automatic FER have been expanding in a variety of fields, such as human-computer interaction, robot control, and driver state monitoring.
Facial emotion identification is a difficult problem since it requires recognising a wide variety of facial forms, poses, and variations. The eyes, mouth, and brows are among the key features detected and evaluated to identify the mood. The nose wrinkle, lip tightener, inner brow raiser, upper lid raiser, outer brow raiser, mouth stretcher, lip corner depressor, and lips part are additional vital elements of facial expressions that help identify the emotion. The nasolabial region, brows, eyes, forehead, cheeks, and lips are therefore the regions of focus, because these are the areas where the various emotions are produced by the movement of the underlying muscles.
Affective computing technologies can recognise a user's emotions through sensors, microphones, and cameras, and respond by adjusting certain predetermined characteristics of a product or service. One way to think about affective computing is through human-computer interaction: a device perceives and responds to the emotions its users are showing.
1) Flow Chart (figure)
A. Models Used
1) CNN (Convolutional Neural Network): The dropout rate, number of dense layers, and activation functions can be adjusted to increase performance. We also applied transfer learning with VGG, a pre-trained convolutional neural network for image classification.
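As a minimal sketch of this transfer-learning setup in Keras: the 224x224 input size, the seven emotion classes, and the dense/dropout settings below are illustrative assumptions, not the paper's exact configuration.

```python
# Transfer learning with a frozen VGG16 base and a small trainable head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 7  # assumed number of emotion classes

# Pre-trained convolutional base (ImageNet weights), classifier head removed.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze pre-trained filters

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),  # tunable dense layer
    layers.Dropout(0.5),                   # tunable dropout rate
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```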
2) KNN: KNN is a non-parametric learning technique, so it makes no assumptions about how the data are distributed. Our features were the Euclidean distances between facial landmark points. Our accuracy was around 50%, so we investigated different non-linear models to improve on it.
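A hedged sketch of such a KNN baseline with scikit-learn follows; the feature matrix here is a random placeholder standing in for the landmark-distance features.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 68))    # placeholder landmark-distance features
y = rng.integers(0, 7, 500)  # placeholder labels for 7 emotions

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Euclidean distance matches the measure described above; KNN itself is
# non-parametric and stores the training points directly.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print("KNN accuracy:", knn.score(X_test, y_test))
```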
3) Multi-Layer Perceptron: MLPs are a class of neural networks made up of one or more layers of neurons. The input layer receives the data, one or more hidden layers may follow, and the predictions come from the output layer. Our accuracy increased from about 50% to about 80% when we used the distances between facial landmarks rather than pixel values. However, as we required far more accurate models, we opted to use CNNs.
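A sketch of such an MLP on landmark-distance features, under the same placeholder-data assumption as the KNN example; the hidden-layer sizes are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 68))    # placeholder landmark-distance features
y = rng.integers(0, 7, 500)  # placeholder emotion labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One input layer, two hidden layers, and an output layer that
# produces the predictions, as described above.
mlp = MLPClassifier(hidden_layer_sizes=(128, 64), activation="relu",
                    max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
print("MLP accuracy:", mlp.score(X_test, y_test))
```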
4) Pooling: Once the feature maps are obtained, a pooling or subsampling layer should be added after the convolution layer. Like the convolutional layer, the pooling layer is responsible for reducing the spatial size of the convolved feature. This dimensionality reduction decreases the computational resources required to process the data.
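The effect can be seen in a small NumPy illustration of 2x2 max pooling: each 2x2 window of a feature map is reduced to its maximum value, halving both spatial dimensions.

```python
import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 8, 1],
                 [4, 3, 9, 0]], dtype=float)

# Reshape into 2x2 blocks and take the maximum within each block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6. 4.]
#  [7. 9.]]
```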
5) Fully Connected Layer: Once the input image has been brought into a suitable form, it is flattened into a column vector. The flattened output is fed into a feed-forward neural network, and backpropagation is applied in each training cycle. The model uses softmax classification to distinguish dominant and specific low-level features in images.
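A minimal Keras sketch of this flatten + fully connected + softmax head; the incoming 6x6x64 feature-map shape and the layer widths are assumptions for illustration.

```python
from tensorflow.keras import layers, models

head = models.Sequential([
    layers.Flatten(input_shape=(6, 6, 64)),  # column vector of activations
    layers.Dense(128, activation="relu"),    # fully connected layer
    layers.Dense(7, activation="softmax"),   # one probability per emotion
])
```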
III. METHODOLOGY
The three key elements of emotion detection are Image Preprocessing, Feature Extraction, and Classification of Features.
A. Image Preprocessing
The technology that can identify places, brands, people, products, and other items in photos is known as image recognition.
Computer vision is a subset of image recognition, a process that can identify and find an object in a digital video or image.
Techniques for gathering, processing, and analysing data from movies or still photos captured in the actual world are included in the
field of computer vision. These sources generate high-dimensional data that can be used to reach numerical or symbolic choices. In
addition to image identification, computer vision also includes object recognition, learning, event detection, video tracking, and
picture reconstruction. A computer can tell raster graphics from vector ones. While vector images are made up of a collection of
polygons with coloured annotations, raster images are composed of discrete numerically valued pixels. In order to interpret images,
geometric encoding is transformed into constructs that represent physical properties and objects. The computer then examines these
constructs logically. The second stage involves developing a predictive model that can be used with a classification algorithm..
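For illustration, a hedged OpenCV sketch of typical preprocessing steps of the kind applied in this pipeline (grayscale conversion, resizing, normalisation, binarisation); "face.jpg" is a placeholder path.

```python
import cv2

img = cv2.imread("face.jpg")                      # raster image: pixel array
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # drop colour channels
resized = cv2.resize(gray, (48, 48))              # fixed input size
normalized = resized.astype("float32") / 255.0    # scale pixels to [0, 1]
_, binary = cv2.threshold(resized, 127, 255, cv2.THRESH_BINARY)  # binarise
```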
B. Feature Extraction
To fulfil a processing demand, a starting collection of raw data is dimension-reduced by feature extraction. The characteristics of an
image govern how it behaves. A feature, such as a point or an edge, is essentially a pattern in an image. Feature extraction might be
useful when you need to analyse data with fewer resources while maintaining the important and relevant data. The amount of
duplicated data can be reduced through feature extraction. After applying various image preprocessing techniques to the sampled
image, such as thresholding, scaling, normalising, binarizing, etc., the features are then extracted. To obtain features for image
classification and recognition, feature extraction techniques are used. The Colour Gradient Histogram and ORB are two feature
detection techniques.. Computer systems use the technique of corner detection to extract the features. The contents of an image are
inferred using those extracted features.
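A short OpenCV sketch of ORB keypoint detection, one of the two feature detection techniques named above; the image path is again a placeholder.

```python
import cv2

img = cv2.imread("face.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# ORB detects corner-like keypoints and computes 32-byte binary descriptors.
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)
print(len(keypoints), "keypoints; descriptors:",
      None if descriptors is None else descriptors.shape)
```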
Corner detection can be used for a variety of purposes, including motion detection, image registration, video tracking, image mosaicing, 3D modelling, and object recognition. During the detection stage, a window of the target size is moved across the input image, and the Haar features are computed for each section of the image. Different features yield different values, and each value is compared with a cutoff that separates objects from non-objects. Each Haar feature is called a "weak classifier" because on its own it detects only marginally better than random guessing. A CNN performs convolution on an input image using a filter, or kernel. Convolution involves scanning the entire image with the filter, starting in the top left corner and moving across and down until the whole image has been covered. The features of a person's face are matched against those in the image: each image pixel is multiplied by the corresponding feature pixel, the products are summed, and the sum is divided by the total number of pixels in the feature.
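A hedged OpenCV sketch of the sliding-window Haar-cascade detection scheme described above; the cascade XML ships with OpenCV, and "face.jpg" is a placeholder path.

```python
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("face.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# scaleFactor controls how the window is rescaled between passes;
# minNeighbors sets how many overlapping hits are needed to accept a face.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```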
C. Classification of Features
Once we have given the input image a proper form, we will flatten it into a column vector. The flattened output is sent into a feed-
forward neural network, and each training cycle makes advantage of back propagation. The model uses the Softmax Classification
technique to classify images by locating dominant and specific low-level features. We now possess all of the parts required to build a
CNN. Convolution, pooling, and ReLU. Max pooling provides input to the multi-layer perceptron layer classifier that we first
described. In CNNs, these layers are frequently applied numerous times, like in the following sequence: Convolution -> ReLU ->
Max-Pool -> Convolution -
> ReLU - > Max-Pool. The layer that is fully connected won't be covered at this time.
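A minimal Keras sketch of this Convolution -> ReLU -> Max-Pool -> Convolution -> ReLU -> Max-Pool sequence followed by the softmax classifier; the filter counts and the 48x48 grayscale input are assumptions rather than the paper's specification.

```python
from tensorflow.keras import layers, models

cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),   # fully connected layer
    layers.Dense(7, activation="softmax"),  # assumed seven emotion classes
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```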
V. CONCLUSION
Based on video data, the proposed model's output predicts the subject's likely sentiment. Because the output indicates the severity of the subject's mental difficulties and level of stress, it can be used in a number of situations. If a subject's output is flagged as "critical", peers and family members can act to improve the subject's mental state and foster harmony and peace of mind. Such sentiment analysis techniques are therefore valuable in building a thriving society. This study compiled the findings of a number of studies, making an effort to incorporate as many references from recent years as feasible. Based on these reviews, the study addressed some of the issues in facial expression recognition by surveying a variety of face detection, feature extraction, analysis, and classification methods.
The paper offers thorough details on the techniques used for facial expression recognition (FER) at every stage of the pipeline. This information is valuable to both seasoned and new researchers in the field of FER: it helps them understand current trends and highlights opportunities for future research.