The Role of the Internet in the Evolution of Speech and Telephony Applications
Curt Tuckey, Director, Voice Laboratory
Ken Rehor, Chief Architect
© 2004 Oracle Corporation and Vocalocity, Inc. All Rights Reserved
A bit of background…
Internet application model (Part I)
[Diagram] Laptop -> Internet Service Provider or Internal Network -> Internet -> Web Server -> Database
• HTTP Request
• HTTP Response (Device Markup)
The Model-View-Controller Pattern
• MVC (e.g. MAWL, Bell Labs, 1995)
– Data adapters, other actions (e.g. Enterprise
javabeans)
– Controller (e.g., java servlet)
– View (e.g. JSP and taglib templates)
Make no little plans; they have no magic
to stir men's blood and probably will
themselves not be realized.
Make big plans; aim high in hope and
work, remembering that a noble, logical
diagram once recorded will not die.
-- Daniel Burnham
Phone Web – August 1995
Bringing the internet model to voice (Part I)
[Diagram] Landline or mobile phone -> Phone Network (PSTN or Wireless) -> Voice Gateway (Service Provider) -> Web Server -> Database
• HTTP Request
• HTTP Response (Device Markup)
Evolution of Architectures for Automated Telephone Services
[Diagram] End User -> PSTN -> IVR Platform
[Diagram] End User -> PSTN -> VoiceXML Gateway -> Internet -> Application Server
IVR Platform
• Voice & Telephony functions
• ASR, TTS, DTMF
• Audio play/record
• Telephony interface
• Service logic
• Transaction server interface
VoiceXML Gateway
• Voice & Telephony functions
• ASR, TTS, DTMF
• Audio play/record
• Telephony interface
• VoiceXML client
Application Server
• Service logic
• Transaction server interface
Moved from the IVR platform to the application server: Service logic, Transaction interface
VoiceXML Heritage
2000
B. D. Lucas
L. Boyer
C. Tuckey 7/00
SpeechML
J. Ferrans
G. Karam
N. Klarlund
P. Danielsen
VoxML
PML
PML
J. C. Ramming
PML
K. G. Rehor
D. Ladd 2/96
C. Tuckey 11/98
Bell Labs
1995
PML Phone Web
M. Benedikt
Economic Impact
IT WAS ARGUABLY THE GREATEST misallocation of capital in recent
history. In the late 1990s, telecoms firms spent billions building new data
networks....
Yet innovation in telecoms has not stopped. And it may be telephone services
that help struggling telecoms firms to claw their way out of their slump.
The real impetus behind better voice applications, however, is the recent
agreement on an industry-wide standard. Beginning in 1999, and pushed
by such leading telecoms firms as AT&T, IBM, Lucent and Motorola, the
industry has come up with a lingua franca for voice applications called
VoiceXML....
VoiceXML could yet rescue telecoms carriers from their folly in stringing so
much optical fibre around the world. It is ironic that it should be
old-fashioned voice that lightens the darkness in fibre.
From The Economist, 12/12/2002
VoiceXML and Phone Web Architecture
Goal: Leverage Web Architecture
• Languages
• Protocols
• Architecture
Visual vs. Voice markup
Web app components
• HTML – Structure
– Layout, input declaration, transitions, etc.
• Images
• Text
• Scripts
Voice Web app components
• VoiceXML – Structure
– Dialog flow, input declaration, transitions, etc.
• Audio files
• Text (for TTS)
• Scripts
VoiceXML example with error handling
<form>
  <field name="main_menu">
    <prompt>
      <!-- The enclosed text is spoken if welcome.wav cannot be played -->
      <audio src="welcome.wav"> Welcome to Acme.
        You can choose sales, repair, or order status.</audio>
    </prompt>
    <grammar src="main_menu.grxml"/>
  </field>
  <!-- Form-level handlers for help requests, silence, and unrecognized input -->
  <help> You can say sales, repair, or order status. </help>
  <noinput> You must say something. </noinput>
  <nomatch> I didn't understand you. Please try again. </nomatch>
  <!-- Runs once the field is filled: send the result to the server -->
  <block>
    <submit next="http://acme.com/route... " method="get"/>
  </block>
</form>
main.vxml
Note: Code simplified for demonstration purposes…
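A possible shape for the grammar that the example references. This is only a sketch: the slides do not show main_menu.grxml, so the rule name "choice" and the phrase list are assumptions drawn from the prompt text, written in the XML form of the W3C Speech Recognition Grammar Specification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical main_menu.grxml: accepts one of the three menu choices -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="choice">
  <!-- "choice" is an illustrative rule name, not taken from the slides -->
  <rule id="choice" scope="public">
    <one-of>
      <item>sales</item>
      <item>repair</item>
      <item>order status</item>
    </one-of>
  </rule>
</grammar>
```

Because the grammar is a separate document fetched over HTTP, the phrase list can be changed on the web server without touching main.vxml.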
Web Application Architecture
[Diagram] Web user <-HTTP-> Internet or Intranet <-HTTP-> Application (web) server
<html>
• Images
• Audio files
• Scripts
Application (web) server
• Application logic
• Content and data
• Transaction processing
• Database interface
Phone / Web Application Architecture
[Diagram] Phone user -> PSTN -> VoiceXML gateway <-HTTP-> Internet or Intranet <-HTTP-> Application (web) server
[Diagram] Web user <-HTTP-> Internet or Intranet <-HTTP-> Application (web) server
<vxml>
• Grammars
• Audio files
• Scripts
<html>
• Images
• Audio files
• Scripts
Application (web) server
• Application logic
• Content and data
• Transaction processing
• Database interface
Voice Application Architecture and Components
[Diagram] Caller -> PSTN -> VoiceXML gateway <-HTTP-> Internet -> Web server
• Sample dialog: "Welcome to Acme products …" / "Customer service, please…"
• Documents and files: <vxml>, <grxml>, .wav
• VoiceXML gateway: Telephony middleware, OA&M, VoiceXML interpreter, ASR, TTS, Audio, DTMF
Application Backend Architecture
[Diagram] <vxml>, <grxml>, .wav <-HTTP-> Internet / Intranet -> Application (web) server -> Intranet / Internet -> Transaction Server, Database (content), Web service
Application (web) server
• Application logic
• Content and data
• Transaction processing
• Database interface
Key Points
• Architecture leverages all things "internet"
– Languages, protocols, servers, developers, etc.
• Separation of concerns
– Application logic / database vs. telephony / speech resources
– Enables new business models
• Voice ASP
• Prepackaged applications
• URL (application) associated with phone number
– Calling party or Called party
– Share resources among many applications (VoiceASP)
• High-level languages, specific to domain / task
– Simplify development and maintenance
The Voice Ecosystem
Business models
• Web ecosystem
• Voice ecosystem
• App components, Packaged apps
• App development shops
• Voice Hosting / VoiceASP
• 'voice web' changed the proprietary IVR market
– Hardware to software evolution
• Standards enable the ecosystem
Voice Application Components and Considerations
• Application Server
• Business Logic
• Content and Content Management
• Transaction Server Interface
• Networking
• Caching
• Load Balancing
• Failover / Redundancy
• Security
• Applications and System Monitoring
• Billing
• Provisioning
Voice Standards Evolution
Standards: When
“The Apocalypse of the Two Elephants”
–(source: Andrew S. Tanenbaum “Computer Networks” / David Clark)
Phone/Voice Markup Language Evolution
• 1995: PML v1
– <prompt>, <collect>
• 1996/97: PML v2
– "interpreted" HTML, "IVR mode"
– macros for localized/specialized control
• 1998: VoxML
• 1998: SpeechML
• 1999/2000: VoiceXML
Voice / Web Standards
• W3C Voice Browser Working Group
– VoiceXML 2.0, 2.1, & beyond
– Speech Recognition
– Speech Synthesis
• W3C Multimodal Interaction Working Group
– Multimodal architecture
– External/network event mechanism
– Natural Language Semantics / EMMA
• VoiceXML Forum
– Certification programs: platforms and developers
– Education
– Marketing
• IETF
– Internet protocols
– SIP, MRCP
Road to VoiceXML 2.0
VoiceXML 2.0
standard
VoiceXML 1.0
specification
Member submissions
& change requests
PML, VoxML, SpeechML
Other Standards
• Offshoots from VoiceXML: W3C Speech
Interface Framework
• GrXML
• SSML
• SI
• EMMA
• CCXML
• …
Proliferation of VoiceXML
Who uses VoiceXML?
• Wireless carriers
– Voice Dialing
– Info portals
• Enterprises
– Financial Services
– CRM / Customer Self-Service
– Intranet
• Telecommunication Services
– Network prompters
– Directory Assistance
– Personalized features
• Travel
– American Airlines, United Airlines, Orbitz, Song
– 511 in many states (California, Utah, Virginia, etc.)
– GM / OnStar
Wireless Carriers
Voice dialing, Information portals, Customer service
• AT&T Wireless #121 (Tellme and Comverse)
• Cingular (BeVocal)
• Verizon Wireless (HeyAnita)
• SprintPCS (Nuance)
• USCellular (BeVocal)
• Telecom Italia Mobile (iTIM/Loquendo)
• Orange (UK)
• T-Mobile (US, Germany)
Enterprise Apps
• Financial Services
– Merrill Lynch, E*Trade, Schwab
• Customer self-service (internal and external)
– Name/Address change (NetByTel, Voxeo)
• Time, Inc. magazines
– Mortgage pre-approval (NetByTel, Nuance)
• ABN AMRO, Countrywide
– Medicare claims status and processing
• Empire Blue Cross
– Intranet universal access
• SBC "HR Speak"
• Unified messaging, productivity
– Oracle Collaboration Suite
– Siemens OpenScape
Telecom Apps
• Directory Assistance
– AT&T Toll-Free Directory Assistance (1-800-555-1212)
– Telecom Italia ("12" and "1412")
• Findme/Advanced Calling/etc.
– AT&T CallVantage
– Z-Tel Personal Voice Assistant
– Webly
• Customer Care
– Bell Canada "Emily/Emilie"
Everywhere
• Europe
– UK (Lastminute.com)
– Italy (TrenItalia Railway Timetable and Fares)
– Germany: T-Mobile InfoTalk
– Spain: Vodafone
– Portugal (Telisma)
– Monaco (Monaco Telecom)
– Ericsson "Talking Intranet" (Germany, Austria, Switzerland)
– France (Loquendo CRM app for perfume company)
• South Africa
– CRM
• Asia
– Japan: Voizi
– Singapore
– Korea
– China
What's next?
Evolving standards for different needs
Example: VoiceXML telephony vs. CCXML
Call processing: <transfer>
• Blind
– Go somewhere but don't return
• Bridge
– Add on another party, resume execution when done talking
Call processing: <transfer>
• Blind transfer
<form id="xfer">
  <block>
    <prompt> Calling Riley. Please wait. </prompt>
  </block>
  <transfer name="mycall" dest="tel:+1-555-123-4567" >
  </transfer>
</form>
Note: Code simplified for demonstration purposes…
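Since a blind transfer never returns to the dialog, an application may want to note that the hand-off happened. A minimal sketch, assuming the VoiceXML 2.0 behavior in which the interpreter throws connection.disconnect.transfer once a blind transfer has been initiated; the log text is illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="xfer">
    <!-- Thrown after the blind transfer is initiated; the caller will not return -->
    <catch event="connection.disconnect.transfer">
      <log>Blind transfer initiated; ending this session.</log>
      <exit/>
    </catch>
    <block>
      <prompt> Calling Riley. Please wait. </prompt>
    </block>
    <!-- No bridge attribute, so this is a blind transfer -->
    <transfer name="mycall" dest="tel:+1-555-123-4567"/>
  </form>
</vxml>
```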
Call processing: <transfer>
• Bridge transfer
<form id="xfer">
  <block> <prompt> Calling Riley. Please wait. </prompt> </block>
  <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true" >
  </transfer>
</form>
Note: Code simplified for demonstration purposes…
Call processing: <transfer>
• Bridge transfer with cancel feature
<form id="xfer">
  <block> <prompt> Calling Riley. Please wait. </prompt> </block>
  <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true" >
    <prompt> Say cancel at any time to disconnect this call.</prompt>
    <grammar src="cancel.grxml" type="application/srgs+xml"/>
  </transfer>
</form>
Note: Code simplified for demonstration purposes…
Call processing: <transfer>
<form id="xfer">
  <!-- Declare mydur first: assigning to an undeclared variable is an error -->
  <var name="mydur"/>
  <block> <prompt> Calling Riley. Please wait. </prompt> </block>
  <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true"
            transferaudio="music.wav" connecttimeout="60s" >
    <prompt> Say cancel at any time to disconnect this call.</prompt>
    <grammar src="cancel.grxml" type="application/srgs+xml"/>
    <filled>
      <!-- mycall$.duration holds the length of the transferred call in seconds -->
      <assign name="mydur" expr="mycall$.duration"/>
      <if cond="mycall == 'busy'">
        <prompt> Riley's line is busy. Try back later. </prompt>
      <elseif cond="mycall == 'noanswer'"/>
        <prompt> Riley didn't answer the phone. Please call
          back another time. </prompt>
      </if>
    </filled>
  </transfer>
</form>
Note: Code simplified for demonstration purposes…
CCXML
• Markup language for 3rd party call control
• Connection management
– Individual call legs
– Conferencing
– Event handling
• W3C Last Call Working Draft
http://www.w3.org/TR/2004/WD-ccxml-20040430/
3rd Party Call Control (SIP and RTP)
• Call control application manages connections
• When to answer
• Where to route ‘call’ or media
• Answer and disconnect supervision
[Diagram] customer -> PSTN -> VoIP Gateway; signalling -> Connection Control [CCXML]; audio -> Voice dialog [VoiceXML]; Application Logic <-(HTTP)-> Connection Control [CCXML]
3rd Party Call Control
• Call control application manages connections
• When to answer
• Where to route ‘call’ or media
• Answer and disconnect supervision
[Diagram] customer -> PSTN -> VoIP Gateway; signalling -> Connection Control [CCXML]; audio -> Voice dialog [VoiceXML]; Routing Application Logic <-(HTTP)-> Connection Control [CCXML]; VUI Application Logic <-(HTTP)-> Voice dialog [VoiceXML]
Call processing using CCXML
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">
  <!-- Let's declare our state var -->
  <var name="state0" expr="'init'"/>
  <eventprocessor statevariable="state0">
    <!-- Process the incoming call -->
    <transition state="'init'" event="connection.ALERTING">
      <accept/>
    </transition>
    <!-- Call has been answered -->
    <transition state="'init'" event="connection.CONNECTED" name="evt">
      <log expr="'Houston, we have liftoff.'"/>
      <dialogstart src="'gimme.vxml'"/>
      <assign name="state0" expr="'dialogActive'" />
    </transition>
Note: Code simplified for demonstration purposes…
Call processing using CCXML
    <!-- The dialog has finished -->
    <transition state="'dialogActive'" event="dialog.exit" name="evt">
      <log expr="'Houston, the dialog returned [' + evt.values.input + ']'"/>
      <exit />
    </transition>
Note: Code simplified for demonstration purposes…
Call processing using CCXML
    <!-- Caller hung up. Let's just go on and end the session -->
    <transition event="connection.DISCONNECTED" name="evt">
      <exit/>
    </transition>
    <!-- Something went wrong. Let's go on and log some info and end the session -->
    <transition event="error.*" name="evt">
      <log expr="'Houston, we have a problem: (' + evt.reason + ')'"/>
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
Note: Code simplified for demonstration purposes…
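The CCXML session reads evt.values.input when the dialog exits, so the VoiceXML dialog has to hand a variable named input back. A sketch of what gimme.vxml might look like: the slides do not show this file, so the prompt, the reuse of main_menu.grxml, and the field layout are assumptions. The one piece that matters is the exit element with a namelist.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical gimme.vxml: collects one answer and returns it to CCXML -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="gimme">
    <!-- The field name becomes the variable the CCXML side reads -->
    <field name="input">
      <prompt> You can say sales, repair, or order status. </prompt>
      <grammar src="main_menu.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Ends the dialog and passes "input" back to the call control session -->
        <exit namelist="input"/>
      </filled>
    </field>
  </form>
</vxml>
```

This keeps the division of labor from the slides: CCXML decides when to answer and when to end the session, and VoiceXML only runs the conversation.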
Distributed Media Processing and Control
Media Resource Control Protocol (MRCP)
• Lightweight protocol for distributed media processing
• IETF draft
[Diagram] PSTN -> VoiceXML gateway <-HTTP-> Internet -> Web server
• VoiceXML gateway: Telephony middleware, OA&M, VoiceXML interpreter, Audio, DTMF
• Media resources reached across the Internet via MRCP: ASR, TTS, …, audio, video