Sign Language Recognition Using Machine Learning
IJISRT24MAY273 www.ijisrt.com 73
Volume 9, Issue 5, May – 2024 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24MAY273
An ensemble model builds upon several separate models to provide a prediction that is more reliable and powerful than any of the individual models working alone. Random Forest is such an ensemble, made up of several decision trees. Because of its scalability, resilience, and capacity to manage high-dimensional data, Random Forest is a robust and adaptable ensemble learning method that is frequently employed for both regression and classification applications. Each decision tree that makes up a Random Forest is built from a random portion of the training data and a random subset of the features.

II. LITERATURE SURVEY

[1] The basic concept of a sign language recognition system is presented, along with a review of its existing techniques and their comparison. The main objective of the survey is to highlight the importance of vision-based methods with a specific focus on sign language. It covers most of the currently known methods for SLR tasks based on deep neural architectures developed over the past several years, and divides them into clusters based on their chief traits. The most common design deploys a CNN to derive discriminative features from raw data, since this type of network offers the best properties for the task. In many cases, multiple types of networks were combined in order to improve final performance.
[2] Works on American Sign Language (ASL) words share similar characteristics, usually along the sign trajectory, which yields similarity issues and hinders ubiquitous application. Recognition of similar ASL words confuses translation algorithms, which leads to misclassification. Based on the fast Fisher vector (FFV) and bi-directional Long Short-Term Memory (Bi-LSTM) method, a recognition algorithm for a large database of dynamic sign words, called FFV-Bi-LSTM, is designed. The performance of FFV-Bi-LSTM is further evaluated on an ASL dataset, the Leap Motion dynamic hand gestures dataset (LMDHG), and the semaphoric hand gestures contained in the Shape Retrieval Contest (SHREC) dataset.
[6] The research article investigates the impact of machine learning on the state of sign language recognition and classification. It highlights the issues faced by present recognition systems, for which the research frontier on sign language recognition intends solutions. In the article, around 240 different approaches that explore sign language recognition for recognizing multilingual signs are compared. The research done by various authors is also studied, and some of the important research articles are discussed. The article discusses how machine learning methods could benefit the field of automatic sign language recognition and the potential gaps that machine learning approaches need to address for real-time sign language recognition.
[8] An important application of hand gesture recognition is the translation of sign language. In sign language, the fingers' configuration, the hand's orientation, and the hand's relative position to the body are the primitives of structured expressions. The importance of hand gesture recognition has increased due to the rapid growth of the hearing-impaired population. In this paper, a system is proposed for dynamic hand gesture recognition using multiple deep learning models.
[9] The approach is a vision-based system in which a sequence of images representing a word in ISL is translated into the equivalent English word. The translation is done by means of deep learning algorithms, namely convolutional neural networks and recurrent neural networks. Because the system analyzes sequences of images, CNNs analyze each image, and their sequence is analyzed by an LSTM (an implementation of an RNN). The dataset was divided into a training dataset and a testing dataset, and the system obtained 73.60% accuracy. The image distributions are kept fairly different in the training and testing datasets.
[11] Indian Sign Language (ISL) is an alternative to the written and spoken languages used in India and the Indian subcontinent. People who are deaf or mute and are unable to hear or talk frequently use it. Compared to other sign languages used in developed nations, ISL is a novel sign language. Given its current application characteristics, automatic recognition of any sign language, including ISL, is necessary. ISL automation will benefit both communities, those who can exclusively communicate in ISL and those who do not know the language at all, because deaf individuals frequently have trouble interacting in public settings like airports, train stations, banks, and hospitals.
[12] Communication is the basis of every human interaction, whether personal or professional, and is among the necessities for surviving in a community. Without a clear, mutually understood language, verbal communication is impossible. In India, sign language is used for communication by about 26% of the disabled population. Therefore, it is imperative to close the communication gap that exists between the general public and those who are speech challenged. The objective is to create a pair of sensor gloves that can translate motions used in Indian Sign Language (ISL) into audible speech.
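The bootstrap-sampling idea behind Random Forest described earlier, where each tree sees a random portion of the training data and a random subset of the features, can be sketched in plain Python. Here one-feature decision stumps stand in for full decision trees; the toy dataset and tree count are illustrative assumptions, not the configuration used in the proposed system:

```python
import random
from collections import Counter

def train_stump(X, y, feature):
    """Train a one-feature decision stump: threshold at the feature's mean."""
    thresh = sum(x[feature] for x in X) / len(X)
    left = [y[i] for i, x in enumerate(X) if x[feature] <= thresh]
    right = [y[i] for i, x in enumerate(X) if x[feature] > thresh]
    left_label = Counter(left).most_common(1)[0][0] if left else y[0]
    right_label = Counter(right).most_common(1)[0][0] if right else y[0]
    return feature, thresh, left_label, right_label

def predict_stump(stump, x):
    feature, thresh, left_label, right_label = stump
    return left_label if x[feature] <= thresh else right_label

def train_forest(X, y, n_trees=25, seed=0):
    """Each 'tree' sees a bootstrap sample of the rows and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # random portion of the data
        feature = rng.randrange(len(X[0]))         # random feature subset (size 1)
        forest.append(train_stump([X[i] for i in idx],
                                  [y[i] for i in idx], feature))
    return forest

def predict_forest(forest, x):
    """Majority vote over the individual stumps, as in Random Forest."""
    votes = Counter(predict_stump(s, x) for s in forest)
    return votes.most_common(1)[0][0]
```

In a real system one would use a library implementation such as scikit-learn's RandomForestClassifier, which grows full trees and samples feature subsets at every split rather than once per tree.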
This section discusses the system architecture of sign language recognition using deep learning techniques and an ensemble model.

A. Architecture Design
As shown in Figure 1, the proposed system mainly consists of two modules, namely the training phase and the testing phase.
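The division of data between the two phases can be sketched as a simple shuffled split; the 80/20 ratio and the seed are illustrative assumptions, since the split used in this work is not stated:

```python
import random

def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle labelled samples and divide them between the two modules:
    the larger part feeds the training phase, the rest the testing phase."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, testing set)
```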
Preprocessing:
The first stage is to prepare the input data for the network. Common methods include: hand segmentation, which separates the hand from the rest of the image; normalization, which scales the pixel intensity values to a specific range for better network training; and background subtraction, which removes the static background so that only the hand region remains.
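The normalization and background-subtraction steps can be illustrated on small grayscale arrays; plain lists of pixel intensities stand in for camera frames here, and a real system would use an image library such as OpenCV:

```python
def normalize(image, new_min=0.0, new_max=1.0):
    """Scale 8-bit pixel intensities (0-255) into a target range."""
    return [[(p / 255.0) * (new_max - new_min) + new_min for p in row]
            for row in image]

def subtract_background(image, background, threshold=30):
    """Zero out pixels close to a reference background frame, leaving
    only regions (such as the hand) that differ from it."""
    return [[p if abs(p - b) > threshold else 0
             for p, b in zip(row, bg_row)]
            for row, bg_row in zip(image, background)]
```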
Segmentation:
Segmentation mainly involves separating the hand region from the background image. The isolated hand region is then evaluated to determine the exact gesture being conveyed. Segmentation applies the segmentation mask to the original image and saves the resulting masked image.

Fig 6: Image of a Hand Gesture Representing Indian Sign Language
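The mask-application step described above can be sketched directly; the intensity-threshold mask is a crude stand-in for whatever segmentation method produces the real mask:

```python
def threshold_mask(image, lo=80, hi=255):
    """A crude intensity-threshold mask standing in for real hand segmentation."""
    return [[1 if lo <= p <= hi else 0 for p in row] for row in image]

def apply_mask(image, mask):
    """Apply a binary mask to the original image, zeroing background pixels
    so that only the segmented hand region remains."""
    return [[p if m else 0 for p, m in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]
```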
Feature Extraction:
Feature extraction is the process of identifying and extracting informative characteristics from the hand region in an image. The technique for detecting and tracking the locations of particular points on an individual's hand is known as hand landmarking. These locations, sometimes referred to as landmarks or key points, can be the wrist, the tips and bases of the fingers, or other hand points. One can use landmarks to recognize the various signs that the person is making. Hand landmarks can be identified using the MediaPipe library.

V. RESULT AND CONCLUSION

The performance of the CNN, Residual Network (ResNet), and ensemble models in the proposed system was assessed and compared. Compared to the other two models, the ensemble model provides the most accurate performance.

In conclusion, the development of a sign language recognition system is a significant step towards fostering inclusive communication for the deaf and hard-of-hearing communities. It can provide real-time translation of sign language into spoken language or text, enabling deaf individuals to interact and convey their messages more easily in situations such as education, employment, and social settings. By leveraging advancements in deep learning and human-computer interaction, such a system has the potential to bridge the communication gap between individuals who use sign language and those who do not.

In terms of future work, the preprocessing can be improved to predict gestures even in low-light conditions with higher accuracy. The proposed system can be further built into a web/mobile application for users. It currently works only for single-handed sign language gestures, so it can be enhanced to accept gestures made with both hands.
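As a supplement to the feature-extraction stage described above, the hand-landmarking step can be sketched as follows. The helper flattens MediaPipe-style (x, y, z) landmarks into a wrist-relative feature vector; the MediaPipe calls appear only in comments, and the helper name and wrist-relative encoding are illustrative assumptions rather than the exact pipeline used in this work:

```python
def landmarks_to_features(landmarks):
    """Flatten (x, y, z) hand landmarks into one feature vector, expressed
    relative to the wrist (landmark 0) so the features do not depend on
    where the hand sits in the frame."""
    wx, wy, wz = landmarks[0]
    return [c for (x, y, z) in landmarks for c in (x - wx, y - wy, z - wz)]

# With MediaPipe itself (not executed here), the landmarks would come from:
#   import mediapipe as mp
#   hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)
#   result = hands.process(rgb_image)  # rgb_image: an RGB frame (NumPy array)
#   landmarks = [(p.x, p.y, p.z)
#                for p in result.multi_hand_landmarks[0].landmark]
```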