
Introduction to VoiceXML Basics

VoiceXML is a markup language for building voice dialog systems. It allows websites and applications to be accessed using speech instead of a graphical user interface. VoiceXML uses XML tags to define prompts, grammars, and dialog flow. Dynamic VoiceXML uses server-side scripts to generate VoiceXML pages on the fly from database queries and user input, which enables features such as database-driven content and dynamic grammars. Although its built-in dialog management is basic, VoiceXML provides a standard for dialog system development and can be extended with programming languages to interface with databases and add further dialog capabilities.


An Introduction to VoiceXML
ART on Dialogue Models and Dialogue Systems
François Mairesse
University of Sheffield
[email protected]
http://www.dcs.shef.ac.uk/~francois

Outline
What is it?
Why is it useful?
How does it work?
How to make it better?

François Mairesse, University of Sheffield

Brief history

- 1999: AT&T, IBM, Lucent Technologies and Motorola formed the VoiceXML Forum
  - The goal was to make Internet content available by phone and voice
  - Each company had previously developed its own markup language
  - Customers were reluctant to invest in proprietary technology
- 2000: release of VoiceXML 1.0
- 2005: VoiceXML 2.1 is a W3C candidate recommendation


What is VoiceXML?

- VoiceXML is a markup language for specifying interactive voice dialogues between a human and a computer
- Analogous to HTML
  - A VoiceXML browser interprets .vxml pages
  - Pages can be dynamically generated by server-side scripts (JSP, ASP, CGI, Perl)
  - Through those scripts, applications can access external databases (e.g. SQL)

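Example
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <!-- a prompt must sit inside a form item such as <block> -->
    <block>
      <prompt>
        Hello world!
      </prompt>
    </block>
  </form>
</vxml>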

[Figure: VoiceXML platform]

Architecture


Voice User Interface (VUI)

- Based on traditional web-based forms
  - The purpose of a dialogue is to fill forms
- GUI vs. VUI
  - Fonts vs. prosody
  - Large menus vs. short utterances
  - Hypertext navigation vs. voice commands
  - Constraints on form input vs. recognition grammars
  - Global options always visible vs. only uttered at the beginning of the dialogue


Why use VoiceXML?

- Advantages of VoiceXML platforms
  - Special-purpose programming language
    - Reduces development costs
  - Separation between dialogue system components
    - Portability of applications
    - Flexibility: outsource or purchase equipment
    - Choose best-of-breed components
  - Re-use of Internet infrastructure
- VoiceXML is becoming a standard

The VoiceXML language

- XML structure
    <element_name attribute_name="attribute_value">
      ...contained items...
    </element_name>
- Basic elements
  - prompt: specifies the system's utterance
  - audio: plays pre-recorded prompts
  - form: a set of fields
  - field: a piece of information needed to complete the task
  - grammar: specifies the possible inputs to a field
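To make the element/attribute structure concrete, here is a minimal sketch that builds a one-field form from the basic elements above. It assumes Python's standard `xml.etree.ElementTree` as the XML library (any would do). The field name `city`, its prompt and the grammar file `cities.grxml` are hypothetical.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_city_form() -> str:
    """Build a one-field VoiceXML form; 'city' is a hypothetical field name."""
    # Make the VoiceXML namespace the default so the output has no ns0: prefixes.
    ET.register_namespace("", VXML_NS)
    vxml = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.0"})
    form = ET.SubElement(vxml, f"{{{VXML_NS}}}form", {"id": "ask_city"})
    # field: the piece of information the dialogue must collect
    field = ET.SubElement(form, f"{{{VXML_NS}}}field", {"name": "city"})
    # prompt: what the system says
    prompt = ET.SubElement(field, f"{{{VXML_NS}}}prompt")
    prompt.text = "Which city?"
    # grammar: which answers are acceptable (hypothetical external file)
    ET.SubElement(field, f"{{{VXML_NS}}}grammar", {"src": "cities.grxml"})
    return ET.tostring(vxml, encoding="unicode")
```

Each element becomes a tag, each attribute a `name="value"` pair, and nesting follows the form → field → prompt/grammar hierarchy of the slide.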


Basic elements

- filled: what to do when user input is recognized
- value: returns a field's value
- goto: goes to another form or file
- submit: goes to another file, sending the field values along
- Error handling
  - user says nothing: noinput
  - nothing matches the grammar: nomatch
- Many more elements: http://www.vxml.org
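In VoiceXML 2.0, `<submit>` uses HTTP GET by default and, without a `namelist`, sends the form's named input item variables to the `next` URI, much like an HTML form. The rough Python sketch below builds the request URL a browser would fetch under those assumptions. The base URI and field values are hypothetical, and real browsers also handle POST, `enctype` and audio recordings, which are ignored here.

```python
from urllib.parse import urlencode, urljoin


def submit_url(base: str, next_uri: str, fields: dict[str, str]) -> str:
    """Approximate the GET request made by <submit next="..."/>.

    base: URI of the current document (hypothetical); next_uri may be relative.
    fields: name -> value of the form's filled input items.
    """
    target = urljoin(base, next_uri)  # resolve a relative 'next' like a browser
    query = urlencode(fields)         # URL-encoded name=value pairs
    return f"{target}?{query}" if query else target
```

For example, after the caller says "mary", a page at `http://example.com/app/start.vxml` submitting to `next_document.vxml` would request `http://example.com/app/next_document.vxml?student_name=mary`.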


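VoiceXML document
Each part of the document answers one design question:

<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="get_student_name">
    <field name="student_name"> <!-- What do we want to know? -->
      <prompt> What's your name? </prompt> <!-- Question -->
      <grammar type="application/srgs+xml" root="names" version="1.0"> <!-- Possible answers -->
        <rule id="names">
          <one-of> <item>john</item> <item>mary</item> <item>rob</item> </one-of>
        </rule>
      </grammar>
      <noinput> Please say your name. </noinput> <!-- No answer? -->
      <nomatch> I didn't understand that. </nomatch> <!-- Wrong answer? -->
      <filled> <!-- Acceptable answer -->
        Thank you, <value expr="student_name"/>
        <submit next="next_document.vxml"/> <!-- What's next? -->
      </filled>
    </field>
  </form>
</vxml>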
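The design questions map one-to-one onto the field's handlers. The toy Python simulation below replays the `student_name` field against scripted caller turns. It is not how a real browser works: recognition is reduced to a set lookup, and `None` standing for silence is a hypothetical convention. It does follow the browser's behaviour of replaying the prompt after each `noinput` or `nomatch`.

```python
from typing import Iterable, Optional

GRAMMAR = {"john", "mary", "rob"}  # <grammar>: the possible answers


def run_student_name_field(
    turns: Iterable[Optional[str]],
) -> tuple[list[str], Optional[str]]:
    """Replay the student_name field against scripted caller turns.

    None stands for silence. Returns what the system said and the next
    document (None if the caller never gave an acceptable answer).
    """
    transcript: list[str] = []
    for heard in turns:
        transcript.append("What's your name?")             # <prompt>: the question
        if heard is None:
            transcript.append("Please say your name.")     # <noinput>: no answer
            continue
        word = heard.strip().lower()
        if word not in GRAMMAR:
            transcript.append("I didn't understand that.")  # <nomatch>: wrong answer
            continue
        transcript.append(f"Thank you, {word}")            # <filled>: acceptable answer
        return transcript, "next_document.vxml"            # <submit>: what's next
    return transcript, None
```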

Recognition grammars

- Key to successful recognition
- Many formats, some platform-dependent (JSGF, GSL, etc.)
  - Inline grammar
  - External file
      <grammar src="mygram.gram" type="application/x-jsgf"/>
- Example with optional inputs (in brackets)
    #JSGF V1.0;
    grammar pizza;
    public <pizza> = [I'd like a] <size> <type> [pizza] [please];
    <size> = small | medium | large;
    <type> = vegetarian | pepperoni | cheese;
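To see which sentences the pizza grammar accepts, here is a hand translation into a Python regular expression, where each optional `[...]` becomes `(?: ...)?`. This is a sketch for this one grammar, not a general JSGF compiler, and it matches text, whereas a recognizer matches speech.

```python
import re

SIZE = r"(?P<size>small|medium|large)"
TYPE = r"(?P<type>vegetarian|pepperoni|cheese)"
# public <pizza> = [I'd like a] <size> <type> [pizza] [please];
PIZZA = re.compile(
    rf"^(?:I'd like a )?{SIZE} {TYPE}(?: pizza)?(?: please)?$",
    re.IGNORECASE,
)


def parse_pizza(utterance: str):
    """Return (size, type) if the utterance is in the grammar, else None."""
    m = PIZZA.match(" ".join(utterance.split()))  # normalise whitespace
    return (m.group("size").lower(), m.group("type").lower()) if m else None
```

Note how the optional items multiply the accepted sentences ("small cheese", "I'd like a large pepperoni pizza please", ...), while the order of `<size>` and `<type>` stays fixed.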


Built-in grammars

Boolean
Currency
Date
Digits
Number
Phone
Time

Example: <field name="get_digits" type="digits">

Can add additional constraints:
  <field name="get_digits" type="digits?minlength=3;maxlength=9">
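A parameterised built-in type such as `digits?minlength=3;maxlength=9` restricts what the field accepts. The Python sketch below parses that type string and checks a recognised digit string against it. It only mimics the constraint check (on a real platform the recognizer applies the grammar), and it handles just the `minlength`, `maxlength` and `length` parameters. Letting `length` win over the other two is this sketch's own choice.

```python
def parse_builtin_type(type_attr: str) -> tuple[str, dict[str, int]]:
    """Split e.g. 'digits?minlength=3;maxlength=9' into ('digits', {...})."""
    name, _, params = type_attr.partition("?")
    constraints: dict[str, int] = {}
    for pair in filter(None, params.split(";")):  # skip empty segments
        key, _, value = pair.partition("=")
        constraints[key.strip()] = int(value)
    return name, constraints


def accepts_digits(type_attr: str, recognised: str) -> bool:
    """Check a recognised digit string against the type's length constraints."""
    name, c = parse_builtin_type(type_attr)
    if name != "digits":
        raise ValueError(f"this sketch only handles digits, not {name!r}")
    if not recognised or any(ch not in "0123456789" for ch in recognised):
        return False
    n = len(recognised)
    if "length" in c:  # an exact length, if given, is checked on its own
        return n == c["length"]
    return c.get("minlength", 1) <= n <= c.get("maxlength", n)
```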


Events

- Similar to exceptions
- Thrown by
  - the platform: e.g. ASR misrecognition
  - the application: <throw>
- Handlers
  - Specific: <noinput>, <nomatch>, <help>
  - General: <catch event="...">
  - Can count the number of event occurrences
    - Successive ASR errors can get different repairs
        <nomatch count="3"> What did you say? </nomatch>
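When several handlers for the same event carry different `count` values, VoiceXML 2.0 selects the one with the highest `count` that does not exceed the number of times the event has occurred so far (`count` defaults to 1), with document order breaking ties. The Python sketch below applies that rule within a single scope. It leaves out scope inheritance and event-name prefix matching, and the repair prompts are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Handler:
    event: str
    count: int = 1  # VoiceXML's default count
    text: str = ""


def select_handler(handlers: list[Handler], event: str, occurrences: int) -> Optional[Handler]:
    """Pick the handler with the highest count <= occurrences, or None."""
    eligible = [h for h in handlers if h.event == event and h.count <= occurrences]
    # max() keeps the first maximal element, so document order breaks ties
    return max(eligible, key=lambda h: h.count, default=None)


NOMATCH_HANDLERS = [
    Handler("nomatch", 1, "Sorry, I didn't get that."),
    Handler("nomatch", 2, "Please say just the name, for example mary."),
    Handler("nomatch", 3, "What did you say?"),
]
```

With these handlers, the first three misrecognitions get increasingly explicit repairs, and every later one keeps the `count="3"` prompt.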


VoiceXML properties

- Can be modified using the <property> element
  - Confidence level of ASR
  - Barge-in
  - Time-out
  - Voice/DTMF
- Properties can be defined at all levels: for the whole application, a document, or a specific field
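Property values are resolved from the innermost scope outwards: a value set on a field overrides the document's, which overrides the application root's. The Python sketch below models that lookup with `collections.ChainMap`. The property names are real VoiceXML 2.0 ones (`confidencelevel`, `bargein`, `timeout`, `inputmodes`), but the values are only examples, and platform defaults, which would form an outermost layer, are left out.

```python
from collections import ChainMap
from typing import Optional

# Example values only; each dict is one scope where <property> elements appear.
application = {"confidencelevel": "0.5", "bargein": "true", "timeout": "5s"}
document = {"timeout": "7s", "inputmodes": "dtmf voice"}
field_scope = {"bargein": "false"}  # e.g. a prompt that must not be interrupted

# Lookups search the field first, then the document, then the application.
effective = ChainMap(field_scope, document, application)


def property_value(name: str) -> Optional[str]:
    """Return the value in force for the current field, if any scope sets it."""
    return effective.get(name)
```

In VoiceXML terms, `field_scope` corresponds to something like `<property name="bargein" value="false"/>` placed inside the field.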

Mixed-initiative dialogues

- VoiceXML allows for simple mixed-initiative dialogues
  - More flexible
  - More room for errors
- A form-level grammar can recognize several fields at once
  - E.g. "Please tell me a departure day and a destination"
  - The grammar needs to account for all possible orderings
    - "I'm going to DEST on DATE"
    - "I'm leaving on DATE to go to DEST"
- What if we don't get all the required information at once?
  - Back to directed dialogue
  - Need traditional fields
  - How do we know which fields remain unfilled?
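A form-level grammar can return values for several fields from one utterance. Anything it misses is then asked for by ordinary directed fields, because the FIA keeps querying fields whose variables are still undefined. The Python sketch below lets regular expressions stand in for such a grammar, one pattern per ordering on the slide. The field names `dest` and `date` and the small city and day vocabularies are hypothetical.

```python
import re

DEST = r"(?P<dest>london|paris|sheffield)"
DATE = r"(?P<date>monday|tuesday|friday)"
# One pattern per ordering the grammar must cover; the second slot is optional.
PATTERNS = [
    re.compile(rf"i'm going to {DEST}(?: on {DATE})?"),
    re.compile(rf"i'm leaving on {DATE}(?: to go to {DEST})?"),
]
REQUIRED = ("dest", "date")


def fill_from_utterance(utterance: str) -> dict[str, str]:
    """Fill whichever slots the form-level grammar recognises."""
    text = utterance.lower().strip()
    for pattern in PATTERNS:
        m = pattern.fullmatch(text)
        if m:
            return {k: v for k, v in m.groupdict().items() if v is not None}
    return {}


def unfilled(slots: dict[str, str]) -> list[str]:
    """Fields that would still be asked for in directed mode."""
    return [name for name in REQUIRED if name not in slots]
```

"I'm going to Sheffield" fills only `dest`, so `unfilled` reports `["date"]`: that is the point where the dialogue falls back to a directed question.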



Form Interpretation Algorithm (FIA)

- Defines how control flows through a VoiceXML application as it executes
- Makes VoiceXML declarative
  - Just specify utterances, fields and grammars
  - Define what happens, not how
  - The FIA deals with the procedural details
- Keeps querying undefined fields
  - Throws <nomatch> or <noinput> events and loops until the field is filled by the user
  - Then runs the field's <filled> element
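Stripped of many details (prompt counters, `<block>` and `<initial>` items, guard conditions, event scoping), the FIA's main loop can be sketched as: select the first field whose variable is undefined, collect input for it, throw `noinput`/`nomatch` on failure, otherwise assign the value. The Python toy version below assumes each field is a hypothetical dict with a name, a prompt and a set of accepted words. `listen` stands in for the platform's recognizer, and `max_turns` is a hypothetical safety limit.

```python
from typing import Callable, Optional

Field = dict  # hypothetical: {"name": str, "prompt": str, "grammar": set of words}


def fia(
    fields: list[Field],
    listen: Callable[[str], Optional[str]],
    max_turns: int = 10,
) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Loop until every field is filled (or max_turns runs out)."""
    values: dict[str, str] = {}
    log: list[tuple[str, str]] = []
    for _ in range(max_turns):
        # Select phase: first field whose variable is still undefined.
        pending = [f for f in fields if f["name"] not in values]
        if not pending:
            break  # every field filled: the form is complete
        field = pending[0]
        # Collect phase: play the prompt and wait for the caller.
        heard = listen(field["prompt"])
        # Process phase: throw an event, or fill the field (where <filled> would run).
        if heard is None:
            log.append(("noinput", field["name"]))
        elif heard not in field["grammar"]:
            log.append(("nomatch", field["name"]))
        else:
            values[field["name"]] = heard
            log.append(("filled", field["name"]))
    return values, log
```

The application only declares the fields; the loop decides what to ask next, which is what makes VoiceXML declarative.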


FIA - confirmations

- If a field value isn't confirmed by the user, set it to undefined and the FIA will ask for it again
    <field name="confirm" type="boolean">
      <prompt> Do you want details on <value expr="student_name"/>? </prompt>
      <filled>
        <if cond="confirm">
          Looking up details on <value expr="student_name"/>
        <else/>
          Let's try again
          <clear namelist="student_name confirm"/>
        </if>
      </filled>
    </field>
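The effect of `<clear>` can be sketched in the same toy style. Clearing the variables makes them undefined again, so on the next iteration the FIA's select phase picks `student_name` once more. The Python model below uses scripted answers and a hypothetical set of valid names, and `None` plays the role of "undefined". Replies outside the grammar are simply ignored, standing in for `nomatch`.

```python
from collections.abc import Iterator


def confirm_loop(answers: Iterator[str], names: set[str], max_turns: int = 10) -> list[str]:
    """Ask for student_name, then confirm; on 'no', clear both and start again."""
    form = {"student_name": None, "confirm": None}  # None plays 'undefined'
    said: list[str] = []
    for _ in range(max_turns):
        if form["student_name"] is None:          # FIA selects the first unfilled field
            reply = next(answers)
            if reply in names:
                form["student_name"] = reply
        elif form["confirm"] is None:
            reply = next(answers)
            if reply in ("yes", "no"):            # boolean field
                form["confirm"] = reply == "yes"
                # <filled> of the confirm field
                if form["confirm"]:
                    said.append(f"Looking up details on {form['student_name']}")
                    return said
                said.append("Let's try again")
                form["student_name"] = form["confirm"] = None  # <clear namelist=...>
    return said
```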


Limitations

- Only simple mixed initiative
- How to retrieve information from a database?
- What about more advanced dialogue system features?
  - Content summarization
  - Multiple database entries
  - Finding alternative answers
- Dynamic grammars
  - If the database changes, the recognition grammar must adapt
- Solution: generate VoiceXML pages dynamically



Dynamic VoiceXML

- Similar to dynamic HTML pages
- Content isn't stored on the server as static pages, but created on the fly from the user's parameters and a database
- Typical interaction:
  - A static VXML page collects information from the user
  - The fields are submitted to a server-side script (JSP, PHP, ASP, Perl, etc.)
  - The script queries the database and processes the results
  - The script outputs VXML code, which is interpreted by the browser


Dynamic VoiceXML

[Figure: JSP, PHP and Perl scripts connected to a database]


Implementation in Perl

- When the form is filled, send the field values to the server-side script for processing
    <filled> Thank you
      <submit next="http://mywebserver/script.perl" namelist="student_name"/>
    </filled>
- The Perl script collects the information
    use strict; use warnings;
    use CGI; use DBI;
    my $q = CGI->new;
    my $name = $q->param('student_name');
- Connect to the SQL database ($db, $user and $password are assumed to be declared elsewhere)
    my $dbh = DBI->connect("DBI:mysql:$db", $user, $password, { RaiseError => 1 });
- Query the database for the student's name (the ? placeholder avoids SQL injection)
    my $sth = $dbh->prepare('SELECT * FROM students WHERE name = ?');
    $sth->execute($name);
- Output the HTTP header and the beginning of the VoiceXML document (<?xml ...?>, <vxml>, <form>, <block>, <prompt>)
    print $q->header(-type => 'application/voicexml+xml');
- Output the result, i.e. the student's phone number (here assumed to be the third column)
    my @row = $sth->fetchrow_array;
    print "The phone number of $name is $row[2]\n";
- Output the end of the VoiceXML document (</prompt>, </block>, </form>, </vxml>)
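For comparison, the same steps as a Python sketch. To keep it self-contained, it swaps MySQL for an in-memory `sqlite3` database and returns the page as a string instead of running under a web server. The `students` table layout (name, department, phone) and its single row are hypothetical. Like the Perl version, it uses a parameterised query, and it also escapes the text it places inside the XML.

```python
import sqlite3
from xml.sax.saxutils import escape


def make_db() -> sqlite3.Connection:
    """Hypothetical students table; the phone number is the third column."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE students (name TEXT, dept TEXT, phone TEXT)")
    db.execute("INSERT INTO students VALUES ('mary', 'cs', '555-0100')")
    return db


def phone_page(db: sqlite3.Connection, student_name: str) -> str:
    """Query the database and wrap the answer in a VoiceXML document."""
    row = db.execute(
        "SELECT * FROM students WHERE name = ?", (student_name,)  # no SQL injection
    ).fetchone()
    if row is None:
        text = f"Sorry, I have no phone number for {student_name}."
    else:
        text = f"The phone number of {student_name} is {row[2]}."
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        f"  <form><block><prompt>{escape(text)}</prompt></block></form>\n"
        "</vxml>\n"
    )
```

In a real deployment, the web framework would add the `application/voicexml+xml` content type, as `$q->header` does in the Perl script.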



Dynamic grammars

- What to do when the recognition vocabulary is not known in advance?
- Rewrite the grammar at each database update
- Better: use a server-side script to
  - retrieve patterns from the database
  - write the grammar to an external file
  - call a VXML page that uses this grammar
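The server-side step can be sketched in Python: read the current vocabulary from the database and emit a grammar for the next VXML page to reference through `<grammar src="..."/>`. Here an in-memory `sqlite3` table stands in for the real database, and the output is JSGF like the pizza example. The table, column and grammar names are hypothetical. Names are normalised and sorted so that the file only changes when the vocabulary does. The sketch assumes names are plain words with no JSGF special characters.

```python
import sqlite3


def names_grammar(db: sqlite3.Connection) -> str:
    """Build a JSGF grammar whose alternatives are the student names in the db."""
    rows = db.execute("SELECT name FROM students")
    names = sorted({row[0].strip().lower() for row in rows})
    if not names:
        raise ValueError("no names in the database; refusing to write an empty rule")
    return (
        "#JSGF V1.0;\n"
        "grammar students;\n"
        f"public <student> = {' | '.join(names)};\n"
    )


def write_grammar(db: sqlite3.Connection, path: str) -> None:
    """Write the grammar to the external file the VXML page points at."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(names_grammar(db))
```

Running `write_grammar` after each database update (or on every request) keeps the recognition vocabulary in step with the data.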

Conclusion

- VoiceXML has become a standard
  - All-in-one solutions available
  - Reduces dialogue system development time
  - Comes with limited dialogue management and language generation capabilities
  - Additional functions can easily be implemented
  - Develop your own dialogue system with free VoiceXML browsers!
- https://studio.tellme.com
