An Introduction to
VoiceXML
ART on Dialogue Models and
Dialogue Systems
Franois Mairesse
University of Sheffield
[email protected]
http://www.dcs.shef.ac.uk/~francois
Outline
What is it?
Why is it useful?
How does it work?
How to make it better?
Franois Mairesse, University of Sheffield
Brief history
1999: AT&T, IBM, Lucent Technology and Motorola
formed the VoiceXML Forum
l The goal was to for make Internet content available
by phone and voice
l Each company had previously developed its own
markup language
l Customers were reluctant to invest in proprietary
technology
2000: release of VoiceXML 1.0
2005: VoiceXML 2.1 is a W3C candidate
recommendation
Franois Mairesse, University of Sheffield
What is VoiceXML?
VoiceXML is a mark-up language for specifying interactive voice
dialogues between a human and a computer
Analogous to HTML
l VoiceXML browser interprets .vxml pages
l Can be dynamically generated by server-side scripts
(JSP, ASP, CGI, Perl)
Can access external databases (e.g. SQL)
Example
<?xml version="1.0"?>
<vxml version="2.0">
<form>
<prompt>
Hello world!
</prompt>
</form>
</vxml>
VoiceXML platform
Franois Mairesse, University of Sheffield
Architecture
Franois Mairesse, University of Sheffield
Voice User Interface (VUI)
Traditional web-based forms
The purpose of a dialogue is to fill forms
GUI vs. VUI
l
l
l
l
l
Fonts vs. prosody
Large menus vs. short utterances
Hypertext navigation vs. voice commands
Constraint on forms vs. recognition grammars
Global options always visible
vs. only uttered at the beginning of the dialogue
Franois Mairesse, University of Sheffield
Why use VoiceXML?
Advantages of VoiceXML platforms
l Special-purpose programming languages
Reduces development costs
l
Separation between dialogue system
components
Portability of application
Flexibility: outsource or purchase equipment
Choose best-of-breed components
l
l
Re-use of Internet infrastructure
VoiceXML is becoming a standard
Franois Mairesse, University of Sheffield
The VoiceXML language
XML structure
< element_name attribute_name="attribute_value">
......contained items......
< /element_name>
Basic elements
l
l
l
l
l
prompt: specifies the systems utterance
audio: play pre-recorded prompts
form: set of fields
field: information needed to complete task
grammar: specifies possible inputs to a field
Franois Mairesse, University of Sheffield
Basic elements
filled: what to do if user input is recognized
value: return a fields value
goto: go to another form or file
submit: go to another file and keep field values
Error handling
l user says nothing: noinput
l nothing matches the grammar: nomatch
Many more elements: http://www.vxml.org
Franois Mairesse, University of Sheffield
VoiceXML document
What do we want
to know?
Question
Possible answers
No answer?
Wrong answer?
Acceptable answer
Whats next?
10
<?xml version="1.0"?>
<vxml version="2.0>
<form id= get_student_name>
<field name= student_name>
<prompt> What's your name? </prompt>
<grammar> john | mary | rob </grammar>
<noinput> Please say your name. </noinput>
<nomatch> I didnt understand that. </nomatch>
<filled>
Thank you, <value expr= student_name />
<submit next=next_document.vxml />
</filled>
</field>
</form>
</vxml>
Franois Mairesse, University of Sheffield
Recognition grammars
Key to successful recognition
Many platform-dependent formats (JSGF, SGL, etc.)
Inline grammar
External file
<grammar src=mygram.gram type=application/x-jsgf />
Example with optional inputs (in brackets)
#JSGF V1.0;
grammar pizza;
public <pizza> = [Id like a] <size> <type> [pizza] [please];
<size> = small | medium | large;
<type> = vegetarian | pepperoni | cheese;
11
Franois Mairesse, University of Sheffield
Built-in grammars
Boolean
Currency
Date
Digits
Number
Phone
Time
Example: <field name=get_digits type=digits>
Can add additional constraints
<field name=get_digits
type=digits?minlength=3;maxlength=9>
12
Franois Mairesse, University of Sheffield
Events
Similar to exceptions
Thrown by
l
l
Platform: ASR misrecognition
Application: <throw>
Handler
l
l
l
Specific: <noinput>, <nomatch>, <help>
General: <catch event=>
Can count number of event occurrences
Successive ASR errors with different repairs
<nomatch count=3> What did you say ? </nomatch>
13
Franois Mairesse, University of Sheffield
VoiceXML properties
Can be modified using the <property>
element
l
l
l
l
14
Confidence level of ASR
Barge-in
Time-out
Voice/DTMF
Properties can be defined at all levels: for
the whole application, document, or a
specific field
Franois Mairesse, University of Sheffield
Mixed-initiative dialogues
VoiceXML allows for simple mixed-initiative
l More flexible
l More room for errors
A form-level grammar that can recognize multiple fields
at once
l E.g. Please tell me a departure day and a destination
l Grammar needs to account for all possible orderings
Im going to DEST on DATE
Im leaving on DATE to go to DEST
l
What if we dont have all required information at once?
Back to directed dialogue
Need traditional fields
l
15
How to know what fields remain unfilled?
Franois Mairesse, University of Sheffield
Form Interpretation
Algorithm
Defines how control flows through a
VoiceXML application as it executes
Makes VoiceXML declarative
l
l
l
Just specify utterances, fields and grammars
Define what happens, not how
FIA deals with procedural details
Keeps querying undefined fields
Throw events and loop until field is filled by user
<nomatch> or <noinput>
<filled>
16
Franois Mairesse, University of Sheffield
FIA - confirmations
If a field value isnt confirmed by user, set it to
undefined and the FIA will ask for it again
<field name=confirm type=boolean>
<prompt> Do you want details on <value expr=student_name />?
</prompt>
<filled>
<if cond=confirm>
Looking up details on <value expr=student_name />
<else />
Lets try again
<clear namelist=student_name confirm />
</if>
</filled>
</field>
17
Franois Mairesse, University of Sheffield
Limitations
Simple mixed initiative
How to retrieve information from a
database?
What about more advanced dialogue
system features?
l
Content summarization
Multiple database entries
Find alternatives answers
Dynamic grammars
If the database changes, the recognition grammar
must adapt
18
Generate VoiceXML pages dynamically
Franois Mairesse, University of Sheffield
Dynamic VoiceXML
19
Similar to dynamic HTML pages
Content isnt stored on the server, but created on-thefly based on the users parameters and a database
Typical interaction:
l A static VXML page collects information from the user
l Submit the fields to a server-side script (JSP, PHP,
ASP, Perl, etc.)
l The script queries the database and processes the
results
l The script outputs VXML code which is interpreted by
the browser
Franois Mairesse, University of Sheffield
Dynamic VoiceXML
JSP, PHP,
Perl scripts
(Database)
20
Franois Mairesse, University of Sheffield
Implementation in Perl
When form is filled, send fields value to the server-side script for
processing
<filled> Thank you
<submit next=http://mywebserver/script.perl>
</filled>
The Perl script collects information
$q = new CGI;
$name = $q->param('student_name');
Connect to the SQL database
$handler = DBI->connect("DBI:mysql:$db", $user, $password);
Query the database for the students name
$query = $handler->prepare(SELECT * FROM students
WHERE name = \$name\ );
$query->execute;
Output beginning of VoiceXML document (<xml>, <voicexml>,
<form>, <prompt>)
Output result, i.e. the students phone number
@row = $sth->fetchrow_array;
print The phone number of $name is $row[2]\n;
21
Output end of VoiceXML document (</form>, </voicexml>, etc.)
Franois Mairesse, University of Sheffield
Dynamic grammars
What to do when the recognition vocabulary
is not known in advance?
Rewrite a grammar at each database
update
Better, use a server-side script to
l
l
l
22
Retrieve patterns from database
Write grammar to an external file
Call a VXML page using this grammar
Franois Mairesse, University of Sheffield
Conclusion
VoiceXML has become a standard
l
l
l
l
l
All-in-one solutions available
Reduces dialogue system development time
Comes with limited dialogue management
and language generation capabilities
Additional functions can be easily
implemented
Develop your own dialogue system with free
VoiceXML browsers!
https://studio.tellme.com
23
Franois Mairesse, University of Sheffield