SPEECH RECOGNITION SYSTEM
More Sachin Angad (1), Nisargandh Prachi Ashwin (2), Rathod S.M. (3)
(1, 2) Student, Dept. of Electronics & Telecommunication, Aditya Polytechnic, Beed, Maharashtra, India
(3) Head of Dept., Electronics & Telecommunication, Aditya Polytechnic, Beed, Maharashtra, India
____________________________________________________________________________________________
ABSTRACT

Speech recognition basically means talking to a computer and having it recognize what we are saying. The process functions fundamentally as a pipeline that converts PCM (Pulse Code Modulation) digital audio from a sound card into recognized speech. Speech recognition technology has evolved for more than 40 years, spurred on by advances in signal processing, algorithms, architectures, and hardware. During that time it has gone from a laboratory curiosity to an art, and eventually to a full-fledged technology understood by a wide range of engineers, scientists, linguists, psychologists, and systems designers. Over those four decades, the technology has matured through a steady stream of increasingly difficult tasks which have been tackled and solved.

Key words: Thing speak, ECG, temperature, oxygen level, heart rate, ESP32, Arduino, Android app.

A. INTRODUCTION

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

Some speech recognition systems require "training" (also called "enrollment"), in which an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker-independent" systems. [1]

B. Generic Speech Recognition System:

(Fig 1: Block Diagram of Generic Speech Recognition System)

The figure shows a block diagram of a typical integrated continuous speech recognition system. Interestingly enough, this generic block diagram can be made to work on virtually any speech recognition task that has been devised in the past 40 years, e.g. isolated word recognition, connected word recognition, continuous speech recognition, etc. The feature analysis module provides the acoustic feature vectors used to characterize the spectral properties of the time-varying speech signal. The word-level acoustic match module evaluates the similarity between the input feature vector sequence (corresponding to a portion of the input speech) and a set of acoustic word models for all words in the recognition task vocabulary to determine which words were most likely spoken. The sentence-level match module uses a language model (i.e., a model of syntax and semantics) to determine the most likely sequence of words. Syntactic and semantic rules can be specified either manually, based on task constraints, or with statistical models such as word and class N-gram probabilities. Search and recognition decisions are made by considering all likely word sequences.

Almost every aspect of the continuous speech recognizer of Figure 1 has been studied and optimized over the years. As a result, we have obtained a great deal of knowledge about how to design the feature analysis module, how to choose appropriate recognition units, how to populate the word lexicon, how to build acoustic word models, how to model language syntax and semantics, how to decode word matches against word models, how to efficiently determine a sentence match, and finally how to choose the best recognized sentence.

Building Good Speech-Based Applications:

In addition to having good speech recognition technology, effective speech-based applications heavily depend on several factors, including:

- Good user interface design.
- Good models of dialogue that keep the conversation moving forward, even in periods of great uncertainty on the part of either the user or the machine.
- Matching the task to the technology.

We now expand somewhat on each of these factors:

User Interface Design: In order to make a speech interface as simple and as effective as Graphical User Interfaces (GUI), 3 key design principles should be followed as closely as possible, namely:

- Provide a continuous representation of the objects and actions of interest.
- Provide a mechanism for rapid, incremental, and reversible operations whose impact on the object of interest is immediately visible.
- Use physical actions or labeled button presses instead of text commands.

C. Dialogue Design Principles:

(Fig 2: Block Diagram of Speech Recognition System)

For many interactions between a person and a machine, a dialogue is needed to establish a complete interaction with the machine. The 'ideal' dialogue allows either the user or the machine to initiate queries or to choose to respond to queries initiated by the other side. (Such systems are called 'mixed initiative' systems.) A complete set of design principles for dialogue systems has not yet evolved (it is still far too early). However, much as we have learned good speech interface design principles, many of the same or similar principles are evolving for dialogue management. The key principles that have evolved are the following:

- Summarize actions to be taken, whenever possible.
- Provide real-time, low-delay responses from the machine and allow the user to barge in at any time.
- Orient users to their 'location' in task space as often as possible.
- Use flexible grammars to provide incrementality of the dialogue.
- Whenever possible, customize and personalize the dialogue (novice/expert).

D. Match Task to the Technology:

(Fig 3: Block Diagram of Match Task to the Technology)

It is essential that any application of speech recognition be realistic about the capabilities of the technology, and build in failure correction modes. Hence building a credit card recognition application before digit error rates fell below 0.5% per digit is a formula for failure, since for a 16-digit credit card the string error rate will be at the 10% level or higher, thereby frustrating customers who speak clearly and distinctly, and making the system totally unusable for customers who slur their speech or otherwise make it difficult to understand their spoken inputs. Utilizing this principle, the following successful applications have been built:

- Games/aids-to-the-handicapped: voice control of selected features of the game, the wheelchair, the environment (climate control).

The Telecommunications Need for Speech Recognition

The telecommunications network is evolving as the traditional POTS (Plain Old Telephony Services) network comes together with the dynamically evolving Packet network, in a structure which we believe will look something like the one shown in the Figure below.

Telecommunication Applications of Speech Recognition

Speech recognition was introduced into the telecommunications network in the early 1990s for two reasons, namely to reduce costs via automation of attendant functions, and to provide new revenue-generating services that were previously impractical because of the associated costs of using attendants.
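The 16-digit credit card figure quoted under "Match Task to the Technology" can be checked with a short calculation. The sketch below assumes that digit errors are independent and equally likely, which the text does not state; under that assumption a string of n digits contains at least one error with probability 1 - (1 - p)^n. The function name `string_error_rate` is illustrative only.

```python
def string_error_rate(digit_error_rate: float, num_digits: int) -> float:
    """Probability that at least one of num_digits digits is misrecognized.

    Assumes independent errors with the same per-digit rate (an assumption
    made for this sketch, not a claim from the paper).
    """
    if not 0.0 <= digit_error_rate <= 1.0:
        raise ValueError("digit_error_rate must be between 0 and 1")
    if num_digits < 0:
        raise ValueError("num_digits must be non-negative")
    # The whole string is correct only if every digit is correct.
    return 1.0 - (1.0 - digit_error_rate) ** num_digits


if __name__ == "__main__":
    for p in (0.005, 0.0065, 0.01):
        rate = string_error_rate(p, 16)
        print(f"digit error {p:.2%} -> 16-digit string error {rate:.1%}")
```

Under these assumptions a per-digit error rate of exactly 0.5% gives a string error rate of roughly 8%, and the 10% level is reached once the per-digit rate is only slightly higher (about 0.65%), which is in line with the rough figure quoted in the text.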
Examples of telecommunications services which were created to achieve cost reduction include the following:

- Voice Dialing: Systems have been created for voice dialing by name (so-called alias dialing, such as Call Home, Call Office) from AT&T, NYNEX, and Bell Atlantic, and by number (AT&T SDN/NRA), to enable customers to complete calls without having to push buttons associated with the telephone number being called.

E. Replacing complicated and often frustrating 'push button' IVR:

Due to poorly implemented and managed systems, IVR and automated call handling systems are often unpopular with customers and frustrating to use. However, there is a way to improve this scenario. Termed 'intelligent call steering' (ICS), it does not involve any 'button pushing'. The system simply asks customers what they want (in their own words) and then transfers them to the most suitable resource to handle their call. Callers dial one number and are greeted by the message "Welcome to XYZ Company, how can I help you?" The caller is routed to the right agent within 20 to 30 seconds of the call being answered, with misdirected calls reduced to as low as 3-5 percent.

By introducing Natural Language Speech Recognition (NLSR), general insurance company Suncorp replaced its original push button IVR, enabling the customer to simply say what they want. Using a financial services statistical language model of over 100,000 phrases, the system can more accurately assess the nature of the call and transfer it the first time to the appropriate department or advisor. The company reduced its call waiting times to around 30 seconds and misdirected calls to virtually nil.

In-car systems

Typically a manual control input, for example by means of a finger control on the steering wheel, enables the speech recognition system, and this is signaled to the driver by an audio prompt. Following the audio prompt, the system has a "listening window" during which it may accept a speech input for recognition.

Simple voice commands may be used to initiate phone calls, select radio stations or play music from a compatible smartphone, MP3 player or music-loaded flash drive. Voice recognition capabilities vary between car make and model. Some of the most recent car models offer natural-language speech recognition in place of a fixed set of commands, allowing the driver to use full sentences and common phrases. With such systems there is, therefore, no need for the user to memorize a set of fixed command words.

High-performance fighter aircraft

Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note have been the US program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), the program in France for Mirage aircraft, and other programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft, with applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight display.
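Both the in-car and the cockpit applications described above start, in their simplest form, from a fixed set of command words that is accepted only during a listening window. A minimal sketch of that idea follows; it operates on text that a recognizer has already produced. The command phrases, the 5-second window length, and the names `COMMANDS`, `in_listening_window` and `parse_command` are all hypothetical, chosen for illustration and not taken from any real vehicle or aircraft system.

```python
import re
from typing import Optional, Tuple

# Hypothetical fixed command set: (pattern, action name).
COMMANDS = [
    (re.compile(r"^call (?P<arg>[a-z ]+)$"), "phone.call"),
    (re.compile(r"^radio (?P<arg>\d{2,3}(\.\d)?)$"), "radio.tune"),
    (re.compile(r"^play music$"), "media.play"),
]

LISTENING_WINDOW_S = 5.0  # assumed window length after the audio prompt


def in_listening_window(prompt_time: float, speech_time: float,
                        window_s: float = LISTENING_WINDOW_S) -> bool:
    """True if speech started after the prompt and before the window closed."""
    return 0.0 <= speech_time - prompt_time <= window_s


def parse_command(text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map recognized text to (action, argument); None if it is not a known command."""
    # Lowercase and collapse repeated whitespace before matching.
    normalized = " ".join(text.lower().split())
    for pattern, action in COMMANDS:
        match = pattern.match(normalized)
        if match:
            # Commands without an argument have no "arg" group, so this is None.
            return action, match.groupdict().get("arg")
    return None


if __name__ == "__main__":
    for utterance in ("Call Home", "radio 98.3", "open the sunroof"):
        print(utterance, "->", parse_command(utterance))
```

A natural-language system of the kind mentioned for recent car models would replace these fixed patterns with a language model, so this sketch covers only the fixed-command case.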
Performance of speech recognition systems -

It is usually specified in terms of accuracy and speed. Accuracy may be measured in terms of performance accuracy, which is usually rated with word error rate, whereas speed is measured with the real time [Link] machines can achieve very high performance in controlled conditions and require only a short period of [Link] conditions usually assume that users have speech characteristics which match the training [Link] achieve proper speaker [Link] in clean and no noise [Link] are 2 models on statistically-based speech recognition: the Hidden Markov Model (HMM model) and Dynamic Time Warping (DTW model).

DTW-based Speech Recognition -

Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed. It is a historical [Link] between speaking patterns would be detected. DTW has been applied to video, audio, and graphics; indeed, any data which can be turned into a linear representation can be analyzed with [Link] sequence technique is also used in HMM models.

school assignments by using speech-to-text programs. They can also utilize speech recognition technology to freely enjoy searching the Internet or using a computer at home without having to physically operate a mouse and keyboard.

Applications of Speech Recognition -

Health Care - Even in the wake of speech recognition technologies, medical transcriptionists (MTs) have not become obsolete.

Military - High-performance fighter aircraft - Speech recognizers have been operated successfully in fighter aircraft.

Helicopters - As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness.

Battle Management - Speech recognition equipment was tested in conjunction with an integrated information display for naval battle management [Link] and other domains - ASR in the field of computer gaming and simulation is becoming more widespread.

F. CONCLUSION

This paper presents speech recognition in artificial intelligence systems. It is important to consider the environment in which the speech recognition system has to [Link] grammar used by the speaker and accepted by the system, noise level, noise type, position of the microphone, and speed and manner of the user's speech are some factors that may affect the quality of speech recognition.

G. REFERENCES

1. John Levis and Ruslan Suvorov, "Automatic Speech Recognition".
2. B.H. Juang and Lawrence R. Rabiner, "Automatic Speech Recognition - A Brief History of the Technology Development".
3. S. Xue, X. Y. Kou and S. T. Tan, "Natural Voice-Enabled CAD: Modeling via Natural Discourse".
4. Ekenta Elizabeth Odokuma and Orluchukwu Great Ndidi, "Development Of A Voice-Controlled Personal Assistant For The Elderly And Disabled".
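Two computations named in the text, the word error rate used to rate accuracy and the dynamic time warping (DTW) distance between two sequences, can each be sketched with a small dynamic-programming table. The code below is a minimal illustration under stated assumptions, not the implementation of any particular recognizer: DTW is shown on one-dimensional sequences with absolute difference as the local cost (a real system would compare multi-dimensional acoustic feature vectors), and the function names `dtw_distance` and `word_error_rate` are illustrative.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D sequences.

    Uses absolute difference as the local cost and the classic
    O(len(a) * len(b)) table; no window constraint is applied.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[len(a)][len(b)]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                         # deletion
                curr[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # The same pattern spoken at half speed aligns with no accumulated cost.
    print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))
    print(word_error_rate("call home now", "call phone now"))
```

The first call shows the property the text describes: a sequence and a time-stretched copy of it are treated as similar even though their lengths differ.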