The Role of the Internet in the Evolution of Speech and Telephony Applications

Curt Tuckey, Director, Voice Laboratory
Ken Rehor, Chief Architect

© 2004 Oracle Corporation and Vocalocity, Inc. All Rights Reserved

A bit of background…

Internet application model (Part I)
[Diagram: a laptop reaches a web server and its database through an Internet service provider or internal network. The laptop sends an HTTP request over the Internet; the web server returns an HTTP response containing device markup.]

The Model-View-Controller Pattern
• MVC (e.g. MAWL, Bell Labs, 1995)
  – Data adapters, other actions (e.g. Enterprise JavaBeans)
  – Controller (e.g. Java servlet)
  – View (e.g. JSP and taglib templates)

Make no little plans; they have no magic to stir men's blood and probably will themselves not be realized. Make big plans; aim high in hope and work, remembering that a noble, logical diagram once recorded will not die.
-- Daniel Burnham

Phone Web – August 1995

Bringing the internet model to voice (Part I)
[Diagram: a landline or mobile phone connects over the phone network (PSTN or wireless) to a voice gateway at a service provider. The voice gateway sends an HTTP request to a web server backed by a database and receives an HTTP response containing device markup.]

Evolution of Architectures for Automated Telephone Services
[Diagram labels: End User, PSTN, Internet, Service logic, Transaction interface]

IVR Platform
• Voice & Telephony functions
• ASR, TTS, DTMF
• Audio play/record
• Telephony interface
• Service logic
• Transaction server interface

VoiceXML Gateway
• Voice & Telephony functions
• ASR, TTS, DTMF
• Audio play/record
• Telephony interface
• VoiceXML client

Application Server
• Service logic
• Transaction server interface

VoiceXML Heritage
[Diagram: family tree leading to VoiceXML (2000). Languages shown: Phone Web and PML (Bell Labs, 1995), later PML versions, VoxML, SpeechML. Names and dates as they appear on the diagram: B. D. Lucas, L. Boyer, C. Tuckey, 7/00, J. Ferrans, G. Karam, N. Klarlund, P. Danielsen, J. C. Ramming, K. G. Rehor, D. Ladd, 2/96, C. Tuckey, 11/98, M. Benedikt.]

Economic Impact
IT WAS ARGUABLY THE GREATEST misallocation of capital in recent history. In the late 1990s, telecoms firms spent billions building new data networks.... Yet innovation in telecoms has not stopped. And it may be telephone services that help struggling telecoms firms to claw their way out of their slump. The real impetus behind better voice applications, however, is the recent agreement on an industry-wide standard. Beginning in 1999, and pushed by such leading telecoms firms as AT&T, IBM, Lucent and Motorola, the industry has come up with a lingua franca for voice applications called VoiceXML.... VoiceXML could yet rescue telecoms carriers from their folly in stringing so much optical fibre around the world. It is ironic that it should be old-fashioned voice that lightens the darkness in fibre.
From The Economist, 12/12/2002

VoiceXML and Phone Web Architecture

Goal: Leverage Web Architecture
• Languages
• Protocols
• Architecture

Visual vs. Voice markup

Web app components
• HTML
  – Structure
  – Layout, input declaration, transitions, etc.
• Images
• Text
• Scripts

Voice Web app components
• VoiceXML
  – Structure
  – Dialog flow, input declaration, transitions, etc.
• Audio files
• Text (for TTS)
• Scripts

VoiceXML example with error handling

main.vxml

    <form>
      <field name="main_menu">
        <prompt>
          <audio src="welcome.wav">
            Welcome to Acme. You can choose sales, repair, or order status.
          </audio>
        </prompt>
        <grammar src="main_menu.grxml"/>
      </field>
      <help> You can say sales, repair, or order status. </help>
      <noinput> You must say something. </noinput>
      <nomatch> I didn't understand you. Please try again. </nomatch>
      <block>
        <submit next="http://acme.com/route... " method="get"/>
      </block>
    </form>

Note: Code simplified for demonstration purposes…

Web Application Architecture
[Diagram: a web user connects over HTTP, across the Internet or an intranet, to an application (web) server that returns <html> pages.]

Resources delivered with <html>
• Images
• Audio files
• Scripts

Application (web) server
• Application logic
• Content and data
• Transaction processing
• Database interface

Phone / Web Application Architecture
[Diagram: a phone user reaches a VoiceXML gateway over the PSTN; the gateway fetches <vxml> pages over HTTP, across the Internet or an intranet, from the same application (web) server that returns <html> pages to a web user over HTTP.]

Resources delivered with <vxml>
• Grammars
• Audio files
• Scripts

Resources delivered with <html>
• Images
• Audio files
• Scripts

Application (web) server
• Application logic
• Content and data
• Transaction processing
• Database interface

Voice Application Architecture and Components
[Diagram: a caller on the PSTN reaches a VoiceXML gateway made up of telephony middleware, OA&M, a VoiceXML interpreter, ASR, TTS, audio, and DTMF. The gateway fetches <vxml>, <grxml>, and .wav resources from a web server over HTTP across the Internet. Sample dialog: the system says "Welcome to Acme products …" and the caller says "Customer service, please…".]
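The application (web) server's side of the architecture above can be sketched in a few lines. This is a minimal sketch, not the presenters' implementation: it assumes a Python standard-library HTTP server, and the names `MAIN_MENU_VXML`, `render_main_menu`, `VoiceAppHandler`, and `serve`, as well as the host and port, are hypothetical. The page content is modeled on the main-menu example in this deck.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content, modeled on the main-menu example in this deck.
MAIN_MENU_VXML = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <field name="main_menu">
      <prompt>Welcome to Acme. You can choose sales, repair, or order status.</prompt>
      <grammar src="main_menu.grxml" type="application/srgs+xml"/>
    </field>
  </form>
</vxml>
"""


def render_main_menu() -> bytes:
    """Return the VoiceXML page as UTF-8 bytes, ready to be an HTTP response body."""
    return MAIN_MENU_VXML.encode("utf-8")


class VoiceAppHandler(BaseHTTPRequestHandler):
    """Answers every GET with the main-menu page; a real application would route by path."""

    def do_GET(self) -> None:
        body = render_main_menu()
        self.send_response(200)
        # Media type registered for VoiceXML documents.
        self.send_header("Content-Type", "application/voicexml+xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the server until interrupted; host and port are placeholder values."""
    HTTPServer((host, port), VoiceAppHandler).serve_forever()
```

Calling `serve()` starts the server. A VoiceXML gateway pointed at that URL would fetch the page over HTTP just as a browser fetches HTML, which is the point of the architecture: the speech and telephony resources stay in the gateway, and the application logic stays on an ordinary web server.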
Application Backend Architecture
[Diagram: the application (web) server delivers <vxml>, <grxml>, and .wav resources over HTTP across the Internet or an intranet, and connects across an intranet or the Internet to a transaction server, a database (content), and a web service.]

Application (web) server
• Application logic
• Content and data
• Transaction processing
• Database interface

Key Points
• Architecture leverages all things "internet"
  – Languages, protocols, servers, developers, etc.
• Separation of concerns
  – Application logic / database vs. telephony / speech resources
  – Enables new business models
    • Voice ASP
    • Prepackaged applications
• URL (application) associated with phone number
  – Calling party or Called party
  – Share resources among many applications (VoiceASP)
• High-level languages, specific to domain / task
  – Simplify development and maintenance

The Voice Ecosystem

Business models
• Web ecosystem
• Voice ecosystem
• App components, Packaged apps
• App development shops
• Voice Hosting / VoiceASP
• 'voice web' changed the proprietary IVR market
  – Hardware to software evolution

Standards enable the ecosystem

Voice Application Components and Considerations
• Application Server
• Business Logic
• Content and Content Management
• Transaction Server Interface
• Networking
• Caching
• Load Balancing
• Failover / Redundancy
• Security
• Applications and System Monitoring
• Billing
• Provisioning

Voice Standards Evolution

Standards: When
“The Apocalypse of the Two Elephants”
  – (source: Andrew S. Tanenbaum “Computer Networks” / David Clark)
Phone/Voice Markup Language Evolution
• 1995: PML v1
  – <prompt>, <collect>
• 1996/97: PML v2
  – "interpreted" HTML, "IVR mode"
  – macros for localized/specialized control
• 1998: VoxML
• 1998: SpeechML
• 1999/2000: VoiceXML

Voice / Web Standards
• W3C Voice Browser Working Group
  – VoiceXML 2.0, 2.1, & beyond
  – Speech Recognition
  – Speech Synthesis
• W3C Multimodal Interaction Working Group
  – Multimodal architecture
  – External/network event mechanism
  – Natural Language Semantics / EMMA
• VoiceXML Forum
  – Certification programs: platforms and developers
  – Education
  – Marketing
• IETF
  – Internet protocols
  – SIP, MRCP

Road to VoiceXML 2.0
[Diagram: PML, VoxML, and SpeechML lead to the VoiceXML 1.0 specification; member submissions and change requests lead from there to the VoiceXML 2.0 standard.]

Other Standards
• Offshoots from VoiceXML: W3C Speech Interface Framework
• GrXML
• SSML
• SI
• EMMA
• CCXML
• …

Proliferation of VoiceXML

Who uses VoiceXML?
• Wireless carriers
  – Voice Dialing
  – Info portals
• Enterprises
  – Financial Services
  – CRM / Customer Self-Service
  – Intranet
• Telecommunication Services
  – Network prompters
  – Directory Assistance
  – Personalized features
• Travel
  – American Airlines, United Airlines, Orbitz, Song
  – 511 in many states (California, Utah, Virginia, etc.)
  – GM / OnStar
Wireless Carriers
Voice dialing, Information portals, Customer service
• AT&T Wireless #121 (Tellme and Comverse)
• Cingular (BeVocal)
• Verizon Wireless (HeyAnita)
• SprintPCS (Nuance)
• USCellular (BeVocal)
• Telecom Italia Mobile (iTIM/Loquendo)
• Orange (UK)
• T-Mobile (US, Germany)

Enterprise Apps
• Financial Services
  – Merrill Lynch, E*Trade, Schwab
• Customer self-service (internal and external)
  – Name/Address change (NetByTel, Voxeo)
    • Time, Inc. magazines
  – Mortgage pre-approval (NetByTel, Nuance)
    • ABN AMRO, Countrywide
  – Medicare claims status and processing
    • Empire Blue Cross
  – Intranet universal access
    • SBC "HR Speak"
• Unified messaging, productivity
  – Oracle Collaboration Suite
  – Siemens OpenScape

Telecom Apps
• Directory Assistance
  – AT&T Toll-Free Directory Assistance (1-800-555-1212)
  – Telecom Italia ("12" and "1412")
• Findme/Advanced Calling/etc.
  – AT&T CallVantage
  – Z-Tel Personal Voice Assistant
  – Webly
• Customer Care
  – Bell Canada "Emily/Emilie"

Everywhere
• Europe
  – UK (Lastminute.com)
  – Italy (TrenItalia Railway Timetable and Fares)
  – Germany: T-Mobile InfoTalk
  – Spain: Vodafone
  – Portugal (Telisma)
  – Monaco (Monaco Telecom)
  – Ericsson "Talking Intranet" (Germany, Austria, Switzerland)
  – France (Loquendo CRM app for perfume company)
• South Africa
  – CRM
• Asia
  – Japan: Voizi
  – Singapore
  – Korea
  – China

What's next?

Evolving standards for different needs
Example: VoiceXML telephony vs. CCXML

Call processing: <transfer>
• Blind
  – Go somewhere but don't return
• Bridge
  – Add on another party, resume execution when done talking

Call processing: <transfer>
• Blind transfer

    <form id="xfer">
      <block>
        <prompt> Calling Riley. Please wait. </prompt>
      </block>
      <transfer name="mycall" dest="tel:+1-555-123-4567">
      </transfer>
    </form>

Note: Code simplified for demonstration purposes…

Call processing: <transfer>
• Bridge transfer

    <form id="xfer">
      <block>
        <prompt> Calling Riley. Please wait. </prompt>
      </block>
      <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true">
      </transfer>
    </form>

Note: Code simplified for demonstration purposes…

Call processing: <transfer>
• Bridge transfer with cancel feature

    <form id="xfer">
      <block>
        <prompt> Calling Riley. Please wait. </prompt>
      </block>
      <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true">
        <prompt> Say cancel at any time to disconnect this call. </prompt>
        <grammar src="cancel.grxml" type="application/srgs+xml"/>
      </transfer>
    </form>

Note: Code simplified for demonstration purposes…

Call processing: <transfer>

    <form id="xfer">
      <block>
        <prompt> Calling Riley. Please wait. </prompt>
      </block>
      <transfer name="mycall" dest="tel:+1-555-123-4567" bridge="true"
                transferaudio="music.wav" connecttimeout="60s">
        <prompt> Say cancel at any time to disconnect this call. </prompt>
        <grammar src="cancel.grxml" type="application/srgs+xml"/>
        <filled>
          <assign name="mydur" expr="mycall$.duration"/>
          <if cond="mycall == 'busy'">
            <prompt> Riley's line is busy. Try back later. </prompt>
          <elseif cond="mycall == 'noanswer'"/>
            <prompt> Riley didn't answer the phone. Please call back another time. </prompt>
          </if>
        </filled>
      </transfer>
    </form>

Note: Code simplified for demonstration purposes…

CCXML
• Markup language for 3rd party call control
• Connection management
  – Individual call legs
  – Conferencing
  – Event handling
• W3C Last Call Working Draft
  http://www.w3.org/TR/2004/WD-ccxml-20040430/

3rd Party Call Control (SIP and RTP)
• Call control application manages connections
  – When to answer
  – Where to route ‘call’ or media
  – Answer and disconnect supervision
[Diagram: a customer on the PSTN reaches a VoIP gateway. Signalling runs between the gateway and connection control [CCXML]; audio runs between the gateway and the voice dialog [VoiceXML]. Application logic is reached over HTTP.]

3rd Party Call Control
• Call control application manages connections
  – When to answer
  – Where to route ‘call’ or media
  – Answer and disconnect supervision
[Diagram: the same arrangement with the application logic split in two: routing application logic (HTTP) behind connection control [CCXML], and VUI application logic (HTTP) behind the voice dialog [VoiceXML].]

Call processing using CCXML

    <?xml version="1.0" encoding="UTF-8"?>
    <ccxml version="1.0">
      <!-- Let's declare our state var -->
      <var name="state0" expr="'init'"/>
      <eventprocessor statevariable="state0">
        <!-- Process the incoming call -->
        <transition state="'init'" event="connection.ALERTING">
          <accept/>
        </transition>
        <!-- Call has been answered -->
        <transition state="'init'" event="connection.CONNECTED" name="evt">
          <log expr="'Houston, we have liftoff.'"/>
          <dialogstart src="'gimme.vxml'"/>
          <assign name="state0" expr="'dialogActive'"/>
        </transition>

Note: Code simplified for demonstration purposes…
Call processing using CCXML

        <!-- Call has been answered -->
        <transition state="'init'" event="connection.CONNECTED" name="evt">
          <log expr="'Houston, we have liftoff.'"/>
          <dialogstart src="'gimme.vxml'"/>
          <assign name="state0" expr="'dialogActive'"/>
        </transition>
        <!-- Process the dialog exit -->
        <transition state="'dialogActive'" event="dialog.exit" name="evt">
          <log expr="'Houston, the dialog returned [' + evt.values.input + ']'"/>
          <exit/>
        </transition>

Note: Code simplified for demonstration purposes…

Call processing using CCXML

        <!-- Caller hung up. Let's just go on and end the session -->
        <transition event="connection.DISCONNECTED" name="evt">
          <exit/>
        </transition>
        <!-- Something went wrong. Let's go on and log some info and end the session -->
        <transition event="error.*" name="evt">
          <log expr="'Houston, we have a problem: (' + evt.reason + ')'"/>
          <exit/>
        </transition>
      </eventprocessor>
    </ccxml>

Note: Code simplified for demonstration purposes…

Distributed Media Processing and Control
Media Resource Control Protocol (MRCP)
• Lightweight protocol for distributed media processing
• IETF draft
[Diagram: a VoiceXML gateway (telephony middleware, OA&M, VoiceXML interpreter) sits between the PSTN and a web server reached over HTTP across the Internet. ASR, TTS, audio, and DTMF resources sit across the Internet from the gateway and are controlled with MRCP. Media labels: audio, video.]
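The CCXML event handler in the call-processing slides is a small state machine: each transition names an optional state and an event pattern, and the first match in document order runs. As a rough model only, here is the same logic in Python. Everything in it is an assumption made for illustration: `CallSession`, `dispatch`, and the handler functions are hypothetical names, the actions are recorded as strings rather than sent to a telephony platform, and the closing bracket in the dialog log message is assumed because the slide text is cut off at that point.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class CallSession:
    """Hypothetical stand-in for one CCXML session and its state variable."""
    state: str = "init"
    actions: List[str] = field(default_factory=list)  # requests a platform would receive
    log: List[str] = field(default_factory=list)
    finished: bool = False


def on_alerting(s: CallSession, evt: Dict[str, str]) -> None:
    s.actions.append("accept")


def on_connected(s: CallSession, evt: Dict[str, str]) -> None:
    s.log.append("Houston, we have liftoff.")
    s.actions.append("dialogstart gimme.vxml")
    s.state = "dialogActive"


def on_dialog_exit(s: CallSession, evt: Dict[str, str]) -> None:
    # Closing bracket assumed; the slide text is truncated after the opening one.
    s.log.append("Houston, the dialog returned [" + evt.get("input", "") + "]")
    s.finished = True


def on_disconnected(s: CallSession, evt: Dict[str, str]) -> None:
    s.finished = True


def on_error(s: CallSession, evt: Dict[str, str]) -> None:
    s.log.append("Houston, we have a problem: (" + evt.get("reason", "") + ")")
    s.finished = True


Handler = Callable[[CallSession, Dict[str, str]], None]

# (required state or None for any state, event pattern, handler), in document order.
TRANSITIONS: List[Tuple[Optional[str], str, Handler]] = [
    ("init", "connection.ALERTING", on_alerting),
    ("init", "connection.CONNECTED", on_connected),
    ("dialogActive", "dialog.exit", on_dialog_exit),
    (None, "connection.DISCONNECTED", on_disconnected),
    (None, "error.*", on_error),
]


def dispatch(session: CallSession, event: str, **data: str) -> bool:
    """Run the first transition matching the session state and event name."""
    if session.finished:
        return False
    for state, pattern, handler in TRANSITIONS:
        if state in (None, session.state) and fnmatchcase(event, pattern):
            handler(session, data)
            return True
    return False
```

Feeding a session the events `connection.ALERTING`, `connection.CONNECTED`, and `dialog.exit` in turn walks it from `init` through `dialogActive` to the end of the session, mirroring the slides. Keeping the transitions in a table rather than in nested conditionals is a design choice made here to stay close to the declarative shape of the markup.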