Communication Interpretation Using Machine Learning and Open CV

CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
1. INTRODUCTION
1.1 IMAGE PROCESSING
1.2 SIGN LANGUAGE
1.3 SIGN LANGUAGE AND HAND GESTURE RECOGNITION
1.4 MOTIVATION
1.5 PROBLEM STATEMENT
1.6 ORGANISATION OF THESIS
2. LITERATURE SURVEY
2.1 INTRODUCTION
2.1.1 TENSORFLOW
2.1.2 OPENCV
2.1.3 KERAS
2.1.4 NUMPY
2.1.5 NEURAL NETWORKS
2.2 EXISTING MODELS
2.3 PROPOSED SYSTEM
2.3.1 ARCHITECTURE
3. METHODOLOGY
3.1 TRAINING MODULE
3.1.1 PRE-PROCESSING
3.2 ALGORITHM
3.3 SEGMENTATION
3.4 CONVOLUTIONAL NEURAL NETWORKS
3.5 TESTING MODULE
4. DESIGN
4.1 DATAFLOW DIAGRAM
4.2 UML
4.2.1 USECASE DIAGRAM
4.2.2 CLASS DIAGRAM
4.2.3 SEQUENCE DIAGRAM
4.2.4 STATECHART DIAGRAM
5. EXPERIMENTAL ANALYSIS AND RESULTS
5.1 SYSTEM CONFIGURATION
5.1.1 SOFTWARE REQUIREMENTS
5.1.2 HARDWARE REQUIREMENTS
5.2 CODE
5.3 SCREENSHOTS AND RESULTS
6. CONCLUSION AND FUTURE SCOPE
REFERENCES
LIST OF FIGURES
Figure no. Name of the Figure
1.1 Phases of pattern recognition
2.1 Layers involved in CNN
2.2 Architecture of Sign Language recognition System
3.1 Dataset used for training the model
3.2 Sample pictures of training data
3.3 Training data given for Letter A
4.1 Data Flow Diagram
4.2 Use Case Diagram
4.3 Class Diagram
4.4 Sequence Diagram
4.5 State Chart Diagram
5.1 Screenshot of the result obtained for letter 8
5.2 Screenshot of the result obtained for letter 7 & 10
5.3 Screenshot of the result obtained for letter 3 & 5
5.4 Screenshot of the result obtained for letter 4 & 2
LIST OF TABLES
Table no. Name of the Table
3.1 Verification of test cases
4.1 Use case scenario for Sign language recognition system
ABSTRACT
Sign language is the primary means of communication for people who are unable to
speak or hear. It allows them to express their thoughts and emotions. In this work, a
novel scheme of sign language recognition is proposed for identifying the alphabets
and gestures in sign language. With the help of computer vision and neural networks,
the system detects the signs and gives the corresponding text output.
Keywords: Sign Language Recognition, Convolutional Neural Network, Image
Processing, Edge Detection, Hand Gesture Recognition.
1. INTRODUCTION
People with speech impairments use hand signs and gestures to communicate. People
who do not know sign language have difficulty understanding them. Hence there is a
need for a system that recognizes the different signs and gestures and conveys the
information to people who do not sign. Such a system bridges the gap between people
with speech and hearing impairments and the rest of society.
1.1 IMAGE PROCESSING

Image processing is a method of performing operations on an image in order to
obtain an enhanced image or to extract useful information from it. It is a type of
signal processing in which the input is an image and the output may be an image or
the characteristics/features associated with that image. Image processing is among
today's rapidly growing technologies and forms a core research area within
engineering and computer science.
Image processing basically includes the following three steps:
• Importing the image via image acquisition tools.
• Analysing and manipulating the image.
• Producing output, which can be an altered image or a report based on the image analysis.
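The three steps above can be sketched in a few lines of Python. This is only an
illustrative sketch, not the code used in this project: the tiny hard-coded image
stands in for a frame acquired from a camera or file, and the function names
stretch_contrast and report are hypothetical. Contrast stretching is chosen here
simply as one example of a manipulation step.

```python
# Step 1: acquisition. A tiny synthetic 8-bit grayscale image (rows of pixel
# values) stands in for a frame read from a camera or a file (assumption).
image = [
    [50, 60, 70, 80],
    [60, 90, 100, 80],
    [70, 100, 110, 90],
    [80, 80, 90, 100],
]


def stretch_contrast(img):
    """Step 2: manipulation. Linearly rescale pixel values to the range 0-255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch, return a copy
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def report(img):
    """Step 3: output. Summarise the image as a small analysis report."""
    pixels = [p for row in img for p in row]
    return {"min": min(pixels), "max": max(pixels), "mean": sum(pixels) / len(pixels)}


enhanced = stretch_contrast(image)   # output as an altered image
print(report(image))                 # output as a report on the original
print(report(enhanced))              # the enhanced image now spans 0 to 255
```

The enhanced image and the report correspond to the two kinds of output named in
the third step.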
There are two types of methods used for image processing: analogue and digital
image processing. Analogue image processing can be used for hard copies such as
printouts and photographs. Image analysts apply various fundamentals of
interpretation while using these visual techniques. Digital image processing techniques
help in the manipulation of digital images by using computers. The three general
phases that all types of data undergo in the digital technique are pre-processing,
enhancement and display, and information extraction.
Digital image processing:
Digital image processing consists of the manipulation of images using digital
computers. Its use has grown rapidly in recent decades. Its applications range from
medicine to entertainment, including geological processing and remote sensing.
Multimedia systems, one of the pillars of the modern information society, rely heavily
on digital image processing.
A digital image is stored as an array of finite-precision numbers, and digital image
processing consists of the manipulation of those numbers. The processing of digital
images can be divided into several classes: image enhancement, image restoration,
image analysis, and image compression. In image enhancement, an image is
manipulated, mostly by heuristic techniques, so that a human viewer can extract
useful information from it.
Digital image processing can also be defined as subjecting a numerical representation
of an object to a series of operations in order to obtain a desired result. It consists of
the conversion of a physical image into a corresponding digital image and the
extraction of significant information from the digital image by applying various
algorithms.
Pattern recognition: Building on image processing, pattern recognition technology is
used to separate objects from images and then to identify and classify these objects
using techniques provided by statistical decision theory. When an image includes
several objects, pattern recognition consists of three phases, as shown in Fig. 1.1.
Fig. 1.1: Phases of pattern recognition

The first phase is image segmentation and object separation. In this phase, the
different objects are detected and separated from the background. The second phase
is feature extraction. In this phase, the objects are measured: important features of
each object are estimated quantitatively, and a group of these features is combined
into a feature vector. The third phase is classification. In this phase, the output is a
decision on which category each object belongs to. Therefore, in pattern recognition
the inputs are images and the outputs are object types and a structural analysis of the
images. The structural analysis is a description of the images that allows their
important information to be correctly understood and judged.
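The three phases can be illustrated with a minimal Python sketch. It is written
under stated assumptions and is not the method used later in this thesis: the small
image, the threshold value, the two features (area and height-to-width ratio) and
the category names "bar" and "blob" are all hypothetical choices made only to keep
the example short.

```python
from collections import deque

# Hypothetical grayscale image: bright pixels belong to objects, 0 is background.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 8, 0],
    [0, 0, 0, 0, 0, 8, 0],
    [0, 0, 0, 0, 0, 8, 0],
    [0, 0, 0, 0, 0, 0, 0],
]


def segment(img, threshold=5):
    """Phase 1: separate objects from the background.

    Thresholds the image and groups bright pixels into 4-connected regions.
    Returns one list of (row, col) pixels per object.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] <= threshold or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and img[ny][nx] > threshold and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects


def extract_features(pixels):
    """Phase 2: measure the object and build a feature vector (area, height/width)."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return (len(pixels), height / width)


def classify(features):
    """Phase 3: decide on a category using a simple hand-written rule."""
    _, aspect = features
    return "bar" if aspect >= 2 else "blob"


for obj in segment(image):
    features = extract_features(obj)
    print(features, classify(features))
```

In a practical system the hand-written rule in the third phase would be replaced by
a trained classifier operating on the same kind of feature vector.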
1.2 SIGN LANGUAGE

Sign language is a language that includes gestures made with the hands and other
body parts, including facial expressions and postures of the body. It is used primarily
by people who are deaf or unable to speak. There are many different sign languages,
such as British, Indian and American sign languages. British Sign Language (BSL) is
not easily intelligible to users of American Sign Language (ASL), and vice versa.
A functioning sign language recognition system could give deaf people a way to
communicate with non-signing people without the need for an interpreter. It could be
used to generate speech or text, making deaf people more independent. Unfortunately,
no system with these capabilities has been available so far. In this project our aim is
to develop a system that can classify sign language accurately.
American Sign Language (ASL) is a complete, natural language that has the
same linguistic properties as spoken languages, with grammar that differs from
English. ASL is expressed by movements of the hands and face. It is the primary
language of many North Americans who are deaf or hard of hearing, and is used by
many hearing people as well.
1.3 SIGN LANGUAGE AND HAND GESTURE RECOGNITION
The process of converting the signs and gestures shown by the user into text is
called sign language recognition. It bridges the communication gap between people
who cannot speak and the general public. Image processing algorithms, along with
neural networks, are used to map each gesture to the appropriate text in the training
data, so that raw images/videos are converted into text that can be read and
understood.
People who cannot speak are often deprived of ordinary communication with others
in society. They sometimes find it very difficult to interact with hearing people
through gestures, as only a few of those gestures are recognized by most people.
Since deaf and hearing-impaired people cannot communicate through speech, they
depend on some form of visual communication most of the time. Sign language is the
primary means of communication in the deaf and speech-impaired community. Like
any other language it has its own grammar and vocabulary, but it uses the visual
modality for exchanging information. The problem arises when deaf or
speech-impaired people try to express themselves to others using sign language,
because most hearing people are unaware of its grammar. As a result, the
communication of a person who cannot speak is often limited to his or her family or
the deaf community.
The importance of sign language is emphasized by growing public approval and
funding for international projects. In this age of technology, there is high demand for
a computer-based system for the deaf and speech-impaired community. Researchers
have been working on the problem for quite some time, and the results are showing
some promise. Interesting technologies are being developed for speech recognition,
but no real commercial product for sign recognition is currently on the market.
The idea is to make computers understand human language and to develop
user-friendly human-computer interfaces (HCI). Making a computer understand
speech, facial expressions and human gestures are some steps towards it. Gestures are
information exchanged non-verbally, and a person can perform an innumerable
variety of them. Since human gestures are perceived through vision, they are a subject
of great interest for computer vision researchers. The project aims to determine
human gestures by creating an HCI. Coding these gestures into machine language
demands a complex programming algorithm. In our project we focus on image
processing and template matching for better output generation.
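As an illustration of the template matching idea mentioned above, the following
Python sketch slides a small template over an image and keeps the position with the
lowest sum of squared differences. This is a sketch under assumptions: the section
does not say which matching score the project uses, so the sum of squared
differences is an assumed choice, and the 0/1 images, the labels "L" and "block" and
the function names are hypothetical. OpenCV offers a comparable operation through
cv2.matchTemplate with the cv2.TM_SQDIFF method.

```python
def ssd(patch, template):
    """Sum of squared differences between two equally sized 2-D lists."""
    return sum(
        (p - t) ** 2
        for patch_row, template_row in zip(patch, template)
        for p, t in zip(patch_row, template_row)
    )


def match_template(image, template):
    """Slide the template over the image; return (row, col, score) of the best fit."""
    th, tw = len(template), len(template[0])
    best = None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ssd(patch, template)
            if best is None or score < best[2]:  # lower score means closer match
                best = (r, c, score)
    return best


def recognise(image, templates):
    """Return the label whose template fits the image best (lowest score)."""
    return min(templates, key=lambda label: match_template(image, templates[label])[2])


# Hypothetical binary gesture templates (1 = hand pixel, 0 = background).
templates = {
    "L": [[1, 0], [1, 0], [1, 1]],
    "block": [[1, 1], [1, 1], [1, 1]],
}

# Hypothetical input frame containing an L-shaped region.
frame = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]

print(match_template(frame, templates["L"]))
print(recognise(frame, templates))
```

A score of zero means the template matches a region of the frame exactly. Real
camera frames are noisy, so in practice the best score would be compared against a
threshold before a label is accepted.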
1.4 MOTIVATION
The 2011 Indian census cites roughly 1.3 million people with "hearing impairment". In
contrast, India's National Association of the Deaf estimates that 18 million people,
roughly 1 per cent of the Indian population, are deaf. These statistics formed the
motivation for our project. As people with speech and hearing impairments need a
proper channel to communicate with others, there is a need for such a system. Not
everyone can understand the sign language used by people with these impairments.
Our project is hence aimed at converting sign language gestures into text that is
readable by people who do not sign.
1.5 PROBLEM STATEMENT

People with speech impairments use hand signs and gestures to communicate.
People who do not know sign language have difficulty understanding them. Hence
there is a need for a system that recognizes the different signs and gestures and
conveys the information to people who do not sign. Such a system bridges the gap
between people with speech and hearing impairments and the rest of society.
1.6 ORGANISATION OF THESIS

The thesis is organised as follows:
Part 1: Introduces the various technologies studied and states the problem statement
along with the motivation for our project.
Part 2: Presents the literature survey, which explains other works and the technologies
they use for sign language recognition.
Part 3: Explains the methodologies in detail and presents the architecture and
algorithms used.
Part 4: Represents the project in various designs.
Part 5: Provides the experimental analysis, the code involved and the results obtained.
Part 6: Concludes the project and provides the scope to which the project can be
extended.
OpenCV's application areas include:
• 2D and 3D feature toolkits
• Egomotion estimation
• Facial recognition system
• Gesture recognition
• Human–computer interaction (HCI)
• Mobile robotics
• Motion understanding
• Object identification
• Segmentation and recognition
• Stereopsis (stereo vision): depth perception from two cameras
• Structure from motion (SFM)
• Motion tracking
• Augmented reality
To support some of the above areas, OpenCV includes a statistical machine learning
library that contains:
• Boosting
• Decision tree learning
• Gradient boosting trees
• Expectation-maximization algorithm
• k-nearest neighbor algorithm
• Naive Bayes classifier
• Artificial neural networks
• Random forest
• Support vector machine (SVM)
• Deep neural networks (DNN)
AForge.NET, a computer vision library for the Common Language Runtime (.NET
Framework and Mono).
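To give a feel for one of the statistical learning methods listed above, the
k-nearest neighbor algorithm can be sketched in plain Python as follows. This is an
illustrative sketch only and does not use or reproduce OpenCV's own implementation.
The function names, the two-dimensional feature vectors and the labels "A" and "B"
are hypothetical.

```python
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors for two gesture classes.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.8, 1.1), "A"),
    ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.9, 3.0), "B"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (3.0, 3.0)))
```

The choice of k is a trade-off: a small k follows the training data closely, while a
larger k smooths out the effect of individual noisy samples.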
