
ISSN 2347 - 3983
International Journal of Emerging Trends in Engineering Research, 9(7), July 2021, 912 – 916
Available Online at http://www.warse.org/IJETER/static/pdf/file/ijeter13972021.pdf
https://doi.org/10.30534/ijeter/2021/13972021

Sign Language to Text-Speech Translator Using Machine Learning
Akshatha Rani K 1, Dr. N Manjanaik 2
1 Student, Digital Communication and Networking, University BDT College of Engineering, Davangere, Karnataka, India, akshatharani027@gmail.com
2 Professor, Digital Communication and Networking, University BDT College of Engineering, Davangere, Karnataka, India, manjubdt2009@gmail.com

ABSTRACT

Communicating with deaf and mute people is difficult for others. Sign language makes communication with them possible, but most hearing people cannot understand it, which creates a large gap between the two groups and makes it hard to exchange ideas and thoughts. This gap has existed for years; to narrow it, new technologies must emerge, and an interpreter is needed to act as a bridge between deaf-mute people and others. This paper proposes such a system: a sign language translator. The system uses an American Sign Language (ASL) dataset that is pre-processed based on thresholding and intensity rescaling. It recognizes the sign language alphabet, builds sentences by joining the recognized letters, and then converts the text to speech. Since the system is based on hand gestures, an efficient hand tracking technique from the cross-platform MediaPipe framework is used to detect the hand precisely, and an ANN architecture is trained to classify the images. The system achieves 74% accuracy and recognizes almost all the letters. Because it also converts the recognized text to speech, it is helpful for blind people as well.

Key words: ANN, ASL, deaf-mute, hand gesture, Sign Language.

1. INTRODUCTION

Communication is an important medium for conveying thoughts and expressions within groups or between individuals. Good communication leads to good ideas and supports development. Language is an essential tool for communication, and it need not consist only of words; it can also be action. Sign language is used by deaf and mute people to communicate with others through body movements and hand gestures. Most people cannot understand sign language, so it becomes difficult for deaf, hearing-impaired and speech-disabled persons to communicate and express their thoughts to others. This challenge is therefore a barrier between deaf and mute people and everyone else.

A sign language recognition system is a powerful tool for overcoming this challenge, and much ongoing research in this field is of great benefit to society. As technologies advance day by day in this competitive world, such an interpreter plays a major role: with a system of this kind, equal opportunities become available to all, regardless of disability.

The world has numerous languages, with different regions speaking different ones, and sign languages likewise differ from region to region. In this paper, American Sign Language (ASL) is used and communication is carried out in English. Sign language recognition falls into two groups, static and dynamic. This paper uses static sign language, meaning the data is in the form of images, together with a hand tracking technique that tracks the hand efficiently [1]. The system recognizes hand gestures in real time as they are captured by the camera.

The system is built using machine learning: the processed data is given to a model built with a deep neural network, and prediction then takes place in real time.

2. LITERATURE REVIEW

[2] proposed hand gesture recognition using the Karhunen-Loeve (K-L) transform combined with a CNN. For hand detection they used skin filtering, palm cropping to extract the palm area of the hand, and edge detection to extract the outline of the palm. Feature extraction was then carried out with the K-L transform and image classification with the Euclidean distance. They tested 10 different hand gestures with 96% accuracy.

[3] proposed single-handed sign language gesture recognition using a contour tracing descriptor. Segmentation of the hand contours from the image background was carried out using skin color detection in the RGB and YCbCr color spaces together with grey-level threshold intensities. The contour tracing descriptor was used to detect gesture contours through segmentation, and the SVM and KNN supervised machine learning techniques were used for image classification and accuracy evaluation.

[4] proposed hand gesture recognition using PCA. The system combines a color model approach and a thresholding method with effective template matching for hand detection. The hand is segmented using skin color modelling in the YCbCr color space, and Otsu thresholding separates foreground from background. PCA is used for template matching in gesture recognition, and the system achieved an accuracy of 91.43% on low-brightness images.

[5] proposed ASL gesture recognition for letters and digits using a deep CNN. The images were pre-processed by removing the background with a background subtraction technique. The dataset was split in two, one part for training and the other for testing, and a CNN was used to classify the images. The system achieved 82.5% accuracy on the alphabet gestures.

[6] presented a review of hand gesture and sign language recognition techniques. The surveyed systems carry out data acquisition; pre-processing, with median and Gaussian filters for noise reduction, morphological operations to remove unwanted information, and histogram equalization; segmentation, including skin color segmentation and tracking for hand detection; feature extraction by various methods; and finally image classification. Overall, the paper provides a comprehensive introduction to the field of automated gesture and sign language recognition.

[7] proposed a dynamic sign language recognition system. They used a supervised learning algorithm, SVM, for image classification, prediction and identification. The system recognizes sign gestures from a live video feed: it extracts the hand contours from the video frames by darkening the images and obtaining the white border of the hand, which is then used to identify the contours.

[8] proposed sign language recognition for static signs using deep learning. The system uses a skin color modelling technique for hand detection, with a predetermined skin color range that separates hand pixels (foreground) from non-hand pixels (background). A CNN is used for image classification, and the images have a uniform background. The system achieved 90.04% accuracy for ASL alphabet recognition and 93.67% testing accuracy.

[9] proposed hand gesture recognition for static images based on CNNs. Image pre-processing involves morphological operations, contour extraction, polygon approximation and segmentation. Different CNN architectures were used for training and testing to extract features from the images and classify them, and the results of all the architectures were compared.

[10] proposed a sign language recognition system using a CNN and computer vision. The system uses an HSV color algorithm for hand gesture detection, with the background set to black. Image pre-processing consists of grayscale conversion, dilation and a mask operation, after which the hand gesture is segmented. The CNN architecture performs feature extraction in its first layers and then image classification. The system was able to recognize 10 letters and achieved 90% accuracy.

3. SYSTEM METHOD

Figure 1 shows the block diagram of the proposed system.

Figure 1: System block diagram

The system uses a dataset of camera-captured images available on the Kaggle website. These images are pre-processed with thresholding and intensity rescaling operations; a hand tracking technique is then applied, and only the images in which a hand is detected are kept. The retained images are saved to file and used to train an ANN architecture, and the trained model is saved. With this model, the system can predict sign language letters in real time, one by one, and build a sentence by joining the letters. The text is then converted to speech.

3.1 Sign Language Dataset

This is the first and one of the most crucial steps in machine learning. The data is collected from the Kaggle website, an online community for machine learning practitioners. The dataset used here is an American Sign Language (ASL) alphabet dataset, partitioned into training and testing sets. The training folder consists of 26 folders for the ASL alphabet plus one folder for the 'space' character, and each folder contains 2000 static RGB images. For higher consistency, the images were all captured against the same background; they are in the RGB color space, 200 x 200 pixels in size, and stored in JPG format.

Figure 2: Sign language hand gestures
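As a rough illustration, the one-folder-per-class layout described above can be loaded with a standard Keras utility. This is a minimal sketch, not the authors' code; the directory path and batch size are assumptions.

```python
# Minimal sketch: loading the folder-per-class ASL alphabet dataset
# with a standard Keras utility. Path and batch size are assumptions.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "asl_alphabet_train",       # hypothetical path to the 27 class folders
    label_mode="categorical",   # one-hot labels, matching categorical cross-entropy
    image_size=(200, 200),      # the paper's stated 200 x 200 image size
    batch_size=32,              # assumed batch size
)
print(train_ds.class_names)     # 26 letters plus 'space'
```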

3.2 Data Pre-processing

The image pre-processing step consists of threshold setting and rescaling of the image intensities. The captured images are in RGB form, so they are first converted to BGR; thresholding and intensity rescaling are then carried out. The threshold operation performs automatic multilevel thresholding of the color images: it searches for an upper threshold value, and pixels with intensities lower than or equal to this value are taken as foreground. The intensity rescaling operation stretches or shrinks the intensity range of the given image. After these pre-processing steps, the system tries to detect the hand in the resulting images and keeps only those in which it can track a hand; these form the final dataset used for further processing.
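The paper does not name the exact thresholding routine, so the sketch below approximates this step with multi-Otsu thresholding from scikit-image; the class count, the grayscale intermediate and the output range are all assumptions.

```python
# Approximate sketch of the pre-processing described above; multi-Otsu
# stands in for the unnamed automatic multilevel threshold.
import numpy as np
import cv2
from skimage.filters import threshold_multiotsu
from skimage.exposure import rescale_intensity

def preprocess(rgb_image: np.ndarray) -> np.ndarray:
    bgr = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2BGR)   # RGB -> BGR, as in the paper
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)       # grayscale is an assumption

    # Automatic multilevel thresholding: keep pixels at or below the
    # upper threshold as foreground, zero out the rest.
    upper = threshold_multiotsu(gray, classes=3)[-1]
    gray = np.where(gray <= upper, gray, 0).astype(np.uint8)

    # Stretch the remaining intensities over the full 0-255 range
    return rescale_intensity(gray, out_range=(0, 255)).astype(np.uint8)
```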

3.3 Hand Tracking Technique

This paper uses the MediaPipe hand tracking technique. MediaPipe is a cross-platform framework that facilitates building multimodal applied ML pipelines. MediaPipe Hands is a high-fidelity hand tracking solution that works in real time and recognizes the hand skeleton in an input image captured by the camera. The technique involves two models: a palm detector model and a hand landmark model [1].

The palm detector model provides a bounding box of a hand in an input image and recognizes the palm within that bounding box [12]. Hand detection is quite a complex task because hands come in a variety of sizes, yet the system must still detect them. A palm detector is trained instead of a hand detector because estimating bounding boxes for palms and fists is simpler than for hands with fingers. An encoder-decoder feature extractor is then used, and the focal loss is minimized during training.

The hand landmark model predicts the hand skeleton within the bounding box provided by the palm detector, returning 3D landmarks. After the palm detector has run on an input image, the hand landmark model locates 21 3D landmark points in the detected hand area. In this way the model consistently learns hand poses and becomes robust; it can even detect partially visible hands [11]. Figure 3 shows hand tracking with hand landmarks.

Figure 3: Hand tracking using MediaPipe
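As a small illustration, this two-model pipeline can be driven from Python through the MediaPipe Hands solution API. The sketch below only checks whether a hand (with its 21 landmarks) is found in an image, which is how the pre-processed frames are filtered here; the file name and confidence threshold are assumptions.

```python
# Minimal sketch using the MediaPipe Hands solution API: returns True
# when the palm detector + hand landmark pipeline finds a hand.
import cv2
import mediapipe as mp

def hand_detected(bgr_image) -> bool:
    with mp.solutions.hands.Hands(static_image_mode=True,   # per-image, not video
                                  max_num_hands=1,
                                  min_detection_confidence=0.5) as hands:
        # MediaPipe expects RGB input; OpenCV images are BGR
        results = hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    # multi_hand_landmarks holds 21 3D landmarks per detected hand
    return results.multi_hand_landmarks is not None

print(hand_detected(cv2.imread("sample_sign.jpg")))   # hypothetical file
```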

3.4 ANN Architecture

An artificial neural network is used for classification. The images to be classified are fed to the network through the neurons of the input layer; the activation functions process them, and the result appears at the output layer [13]. The ANN used here is a multilayer perceptron (MLP), consisting of an input layer, hidden layers and an output layer. Training the neural network computes its weights [14].

Figure 4: ANN Architecture

The system builds the ANN with Keras, using a Sequential model in which the layers are arranged one after another. Dense layers with the ReLU activation function are alternated with dropout layers, the dense layers having 1024, 512, 256, 128 and 64 units. The model is then compiled with categorical cross-entropy as the loss function and Adam as the optimizer.
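A minimal sketch of this architecture follows, assuming a flattened pre-processed image as input and 27 output classes (26 letters plus 'space'); the input dimension, dropout rate and output layer are assumptions the paper does not state. Note that a Keras Dropout layer takes a rate rather than an activation function.

```python
# Sketch of the alternating Dense/Dropout MLP described above.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 27        # 26 ASL letters + 'space' (Section 3.1)
INPUT_DIM = 200 * 200   # assumed: one flattened pre-processed image

model = models.Sequential([layers.Input(shape=(INPUT_DIM,))])
for units in (1024, 512, 256, 128, 64):      # layer widths from the paper
    model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dropout(0.3))           # assumed dropout rate
model.add(layers.Dense(NUM_CLASSES, activation="softmax"))

model.compile(optimizer="adam",                    # optimizer, as stated
              loss="categorical_crossentropy",     # loss, as stated
              metrics=["accuracy"])
```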

4. RESULT AND DISCUSSIONS

In the training phase, the system is trained using 2000 images with the ANN architecture, and the model is saved. Prediction of letters then takes place using this model: the system first detects the hand in the live video frame, and once hand tracking succeeds it recognizes the sign and displays it on screen in text format.

Figure 5: Predicted sign as letter 'A'

Figure 5 shows the sign for the letter 'A'. The system tracks the hand, compares the hand pattern with the trained images, and predicts that the sign is the letter 'A', together with the probability of the prediction.

Words can also be created by joining letters one after another. Figure 6 shows the system first tracking the hand pattern and predicting the sign 'A'; it then predicts the next sign shown to the camera, 'I', and after that the sign 'M', forming the word 'AIM'. Any word can be formed in the same way.

Using the 'space' sign, which is also trained into the model, sentences can be formed. After the sign-to-text conversion, the text can be converted to speech, which is helpful for blind people: the system pronounces the word or text.

Figure 6: Word formation
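The paper does not name the text-to-speech library it used; the sketch below assumes pyttsx3 for offline speech output, with the predicted letter sequence standing in (hypothetically) for the trained ANN plus hand tracking.

```python
# Sketch of word formation and speech output under the stated assumptions.
import pyttsx3

def speak(text: str) -> None:
    engine = pyttsx3.init()   # offline TTS engine (assumed library)
    engine.say(text)
    engine.runAndWait()

# Suppose successive frames were classified as these signs:
predicted_signs = ["A", "I", "M"]
word = "".join(" " if sign == "space" else sign for sign in predicted_signs)
speak(word)   # the system pronounces "AIM"
```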
The model achieved 74% validation accuracy with the efficient hand tracking technique. Figure 7 plots the validation accuracy of the model against the number of epochs.

Figure 7: Graph of validation accuracy

5. CONCLUSION

A great deal of research has been carried out in the fields of machine learning and computer vision, contributing effective work that is necessary and helpful in everyday life. Likewise, sign language recognition has been studied with different methods such as neural networks, KNN, SVM and LSTM. The system proposed in this paper concentrates on a hand tracking technique that is very effective: it detects the hand across different skin colors and lighting conditions, including low light. We used an ANN to classify images of the ASL alphabet; the system recognizes almost all the letters and achieved 74% accuracy. The system also incorporates speech output, converting the recognized sign text to speech so that it is helpful for blind people as well.

6. FUTURE WORK

The model can be improved in terms of accuracy by using different classification methods, so that it recognizes the alphabet even more accurately.

REFERENCES

1. https://google.github.io/mediapipe/solutions/hands.html
2. Singha, J. and Das, K. Hand Gesture Recognition Based on Karhunen-Loeve Transform, Mobile and Embedded Technology International Conference, January 17-18, 2013.

3. R. Sharma, Yash Nemani, Sumit Kumar, Lalit Kane, Pritee Khanna. Recognition of Single Handed Sign Language Gestures using Contour Tracing Descriptor, Proceedings of the World Congress on Engineering 2013, Vol. II, WCE 2013, July 3-5, 2013, London, U.K.
4. Mandeep Kaur Ahuja, Dr. Amardeep Singh. Hand Gesture Recognition Using PCA, IJCSE, Vol. 5, Issue 7, July 2015, 267-271.
5. Vivek Bheda and N. Dianna Radpour. Using Deep Convolutional Networks for Gesture Recognition in American Sign Language, arXiv preprint arXiv:1710.06836.
6. Ming Jin Cheok, Zaid Omar, Mohamed Hisham Jaward. A Review of Hand Gesture and Sign Language Recognition Techniques, Springer-Verlag GmbH Germany, 2017.
7. S. Saravana Kumar, Vedant L. Iyangar. Sign Language Recognition Using Machine Learning, International Journal of Pure and Applied Mathematics, Volume 119, No. 10, 2018, 1687-1693.
8. Lean Karlo S. Tolentino, Ronnie O. Serfa Juan, August C. Thio-ac, Maria Abigail B. Pamahoy, Joni Rose R. Forteza, and Xavier Jet O. Garcia. Static Sign Language Recognition Using Deep Learning, International Journal of Machine Learning and Computing, Vol. 9, No. 6, December 2019.
9. Raimundo F. Pinto Jr., Carlos D. B. Borges, Antônio M. A. Almeida, and Iális C. Paula Jr. Static Hand Gesture Recognition Based on Convolutional Neural Networks, Volume 2019, Article ID 4167890, published 10 October 2019.
10. Mehreen Hurroo, Mohammad Elham Walizad. Sign Language Recognition System using Convolutional Neural Network and Computer Vision, International Journal of Engineering Research & Technology (IJERT), Vol. 9, Issue 12, December 2020.
11. Fan Zhang, Valentin Bazarevsky, Andrey Vakunov, Andrei Tkachenka, George Sung, Chuo-Ling Chang, Matthias Grundmann. MediaPipe Hands: On-device Real-time Hand Tracking, arXiv:2006.10214v1 [cs.CV], 18 Jun 2020.
12. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector, arXiv:1512.02325v5 [cs.CV], 29 Dec 2016.
13. https://medium.com/@gongster/building-a-simple-artificial-neural-network-with-keras-in-2019-9eccb92527b1
14. Z. Zhang. Multivariate Time Series Analysis in Climate and Environmental Research, Chapter 1: Artificial Neural Network, Springer International Publishing AG, 2018.
