Communication Interpretation Using Machine Learning and OpenCV
LIST OF FIGURES
ABSTRACT
1. INTRODUCTION
2. LITERATURE SURVEY
2.1 INTRODUCTION
2.1.1 TENSORFLOW
2.1.2 OPENCV
2.1.3 KERAS
2.1.4 NUMPY
2.1.5 NEURAL NETWORKS
2.2 EXISTING MODELS
2.3 PROPOSED SYSTEM
2.3.1 ARCHITECTURE
3. METHODOLOGY
3.1.1 PRE-PROCESSING
3.2 ALGORITHM
3.3 SEGMENTATION
3.5 TESTING MODULE
4. DESIGN
4.2 UML
5.2 CODE
REFERENCES
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Sign language is the primary means of communication for people who cannot speak or
hear. It allows them to express their thoughts and emotions. In this work, a novel
scheme of sign language recognition is proposed for identifying the alphabets and
gestures of sign language. Using computer vision and neural networks, the system
detects the signs and produces the corresponding text output.
1. INTRODUCTION
Speech-impaired people use hand signs and gestures to communicate, and people who
are unfamiliar with sign language have difficulty understanding them. Hence there is
a need for a system that recognizes the different signs and gestures and conveys the
information to the general public. Such a system bridges the gap between speech- and
hearing-impaired people and the general public.
• Output, in which the result can be an altered image or a report based on image analysis.
There are two types of methods used for image processing: analogue and digital
image processing. Analogue image processing is used for hard copies such as
printouts and photographs, and image analysts apply various fundamentals of
interpretation when using these visual techniques. Digital image processing
techniques manipulate digital images using computers. The three general phases that
all types of data undergo in the digital technique are pre-processing, enhancement
and display, and information extraction.
Digital image processing:
Digital image processing consists of the manipulation of images using digital
computers. Its use has grown rapidly in recent decades. Its applications range from
medicine to entertainment, and include geological processing and remote sensing.
Multimedia systems, one of the pillars of the modern information society, rely
heavily on digital image processing.
A digital image is represented as an array of finite-precision numbers, and digital
image processing consists of the manipulation of those numbers. The processing of
digital images can be divided into several classes: image enhancement, image
restoration, image analysis, and image compression. In image enhancement, an image
is manipulated, mostly by heuristic techniques, so that a human viewer can extract
useful information from it.
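As an illustration of image enhancement, the following minimal sketch applies linear
contrast stretching to a small greyscale image stored as nested lists. The function
name `stretch_contrast` and the sample values are assumptions made for this example;
they are not taken from the project code.

```python
def stretch_contrast(image, out_min=0, out_max=255):
    """Linearly rescale pixel intensities so they span [out_min, out_max]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in image]


# Hypothetical low-contrast 3x3 patch: intensities are squeezed into 100..140.
patch = [
    [100, 110, 120],
    [110, 120, 130],
    [120, 130, 140],
]
enhanced = stretch_contrast(patch)
```

After stretching, the darkest pixel maps to 0 and the brightest to 255, which makes
small intensity differences easier for a human viewer to see.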
Digital image processing is the processing of images by computer. It can be defined
as subjecting a numerical representation of an object to a series of operations in
order to obtain a desired result. It consists of converting a physical image into a
corresponding digital image and extracting significant information from the digital
image by applying various algorithms.
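One simple way to extract information from a digital image is to threshold it and
locate the bright region. The sketch below assumes a greyscale frame in which the
object of interest is brighter than the background. The helper names `binarize` and
`bounding_box`, the threshold value and the sample frame are all hypothetical.

```python
def binarize(image, threshold):
    """Return a mask with 1 where the pixel is at least `threshold`, else 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the foreground, inclusive, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


# Hypothetical 4x4 frame with a bright 2x2 object on a dark background.
frame = [
    [10, 12, 11, 10],
    [11, 200, 210, 12],
    [10, 190, 220, 11],
    [12, 11, 10, 10],
]
mask = binarize(frame, threshold=128)
box = bounding_box(mask)
```

The bounding box is the kind of compact description that a later recognition stage
could use to crop the region containing the hand.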
Pattern recognition: Building on image processing, pattern recognition technology
separates objects from images and then identifies and classifies these objects using
techniques from statistical decision theory. When an image contains several objects,
pattern recognition consists of three phases, as shown in Fig.
which category every object belongs to. Therefore, in pattern recognition the inputs
are images and the outputs are object types and a structural analysis of the images.
The structural analysis is a description of an image that allows its important
information to be correctly understood and judged.
The process of converting the signs and gestures shown by the user into text is
called sign language recognition. It bridges the communication gap between people
who cannot speak and the general public. Image processing algorithms together with
neural networks are used to map each gesture to the appropriate text in the training
data, so that raw images and videos are converted into text that can be read and
understood.
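The final mapping from classifier output to text can be sketched as follows. The
example assumes that a trained network has already produced one score vector per
video frame. The label set, the confidence threshold and the function names are
assumptions made for illustration, not the project's actual decoding logic.

```python
import string

LABELS = list(string.ascii_uppercase)  # assumed label set: one class per letter


def scores_to_letter(scores, labels=LABELS, min_confidence=0.5):
    """Return the label with the highest score, or None if it is too uncertain."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best] if scores[best] >= min_confidence else None


def frames_to_text(frame_scores, labels=LABELS):
    """Decode per-frame score vectors into text.

    Consecutive frames showing the same sign collapse into one letter.
    A low-confidence frame is skipped, but it separates repeated letters.
    """
    text = []
    previous = None
    for scores in frame_scores:
        letter = scores_to_letter(scores, labels)
        if letter is not None and letter != previous:
            text.append(letter)
        previous = letter
    return "".join(text)


# Hypothetical scores for a three-class model over five frames.
frames = [
    [0.90, 0.05, 0.05],  # A
    [0.80, 0.10, 0.10],  # A again, collapsed
    [0.10, 0.85, 0.05],  # B
    [0.40, 0.30, 0.30],  # too uncertain, skipped
    [0.10, 0.80, 0.10],  # B after a gap, kept as a new letter
]
decoded = frames_to_text(frames, labels=["A", "B", "C"])
```

Treating an uncertain frame as a separator is one possible design choice. It lets a
signer repeat a letter by briefly lowering the hand between the two signs.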
People with speech impairments are often deprived of normal communication with other
people in society. They frequently find it difficult to interact with the general
public through gestures, as only a few of those gestures are recognized by most
people. Because deaf and hearing-impaired people cannot communicate through speech,
they depend on some form of visual communication most of the time. Sign language is
the primary means of communication in the deaf and speech-impaired community. Like
any other language it has its own grammar and vocabulary, but it uses the visual
modality to exchange information. The problem arises when deaf or speech-impaired
people try to express themselves to others using sign language, because most people
are unaware of its grammar. As a result, the communication of a speech-impaired
person is often limited to his or her family or the deaf community. The importance
of sign language is underlined by growing public approval and funding for
international projects. In this age of technology, a computer-based system is in
high demand in the speech-impaired community. Researchers have been working on the
problem for quite some time, and the results are showing some promise. Interesting
technologies are being developed for speech recognition, but no real commercial
product for sign recognition is currently on the market. The idea is to make
computers understand human language and to develop user-friendly human-computer
interfaces (HCI). Making a computer understand speech, facial expressions and human
gestures are some steps towards this. Gestures are non-verbally exchanged
information. A person can perform innumerable gestures at a time. Since human
gestures are perceived through vision, they are a subject of great interest for
computer vision researchers. The project aims to determine human gestures by
creating an HCI. Coding these gestures into machine language demands a complex
programming algorithm. In our project we focus on image processing and template
matching for better output generation.
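Template matching can be illustrated with a minimal sketch that slides a small
template over an image and scores each position by the sum of squared differences
(SSD). This is a plain-Python illustration under assumed inputs, not the project's
implementation. The function name `match_template` and the sample arrays are
hypothetical.

```python
def match_template(image, template):
    """Slide `template` over `image`; return (row, col, score) of the best match.

    The score is the sum of squared differences, so lower is better and
    0 means an exact match.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(th)
                for dc in range(tw)
            )
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


# Hypothetical 4x5 image containing the 2x2 template at row 1, column 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 8],
    [7, 9],
]
location = match_template(image, template)
```

In practice a library routine such as OpenCV's `cv2.matchTemplate` with the
`cv2.TM_SQDIFF` method computes the same kind of score far more efficiently.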
1.4 MOTIVATION
The 2011 Indian census cites roughly 1.3 million people with "hearing impairment".
In contrast, India's National Association of the Deaf estimates that 18 million
people, roughly 1 per cent of the Indian population, are deaf. These statistics
formed the motivation for our project. Because speech-impaired and deaf people need
a proper channel to communicate with the general public, and not everyone can
understand sign language, there is a need for such a system. Our project therefore
aims to convert sign language gestures into text that is readable by people who do
not know sign language.
Part 1: Introduces the technologies that were studied and states the problem
statement along with the motivation for our project.
Part 2: Presents the literature survey, which explains other works on sign language
recognition and the technologies they use.
Part 3: Explains the methodologies in detail and presents the architecture and
algorithms used.
Part 5: Provides the experimental analysis, the code involved and the results obtained.
Part 6: Concludes the project and outlines the scope for extending it.
2D and 3D feature toolkits
Egomotion estimation
Facial recognition system
Gesture recognition
Human–computer interaction (HCI)
Mobile robotics
Motion understanding
Object identification
Segmentation and recognition
To support some of the above areas, OpenCV includes a statistical machine learning
library that contains:
Boosting
Decision tree learning
Gradient boosting trees
Expectation-maximization algorithm
k-nearest neighbor algorithm
Naive Bayes classifier
Artificial neural networks
Random forest
Support vector machine (SVM)
Deep neural networks (DNN)
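To make one of the listed algorithms concrete, the following is a minimal sketch of
k-nearest neighbor classification in plain Python. It does not use OpenCV's machine
learning module. The two-dimensional feature vectors and the gesture labels are
hypothetical values chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features for two gesture classes.
points = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.25),  # class "A"
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.75),  # class "B"
]
labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(points, labels, (0.20, 0.20))
near_b = knn_predict(points, labels, (0.80, 0.80))
```

The classifier has no training step beyond storing the labelled points, which makes
it a convenient baseline before moving to neural networks.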
AForge.NET, a computer vision library for the Common Language Runtime (.NET
Framework and Mono).