A Newbie Experience of Dialogue System Construction Using the Ravenclaw Framework
Arthur Chan

Introduction
• Did you know?
– Arthur Chan actually takes classes at CMU!
• Course he took this year:
– "Project Course: Dialogue Systems"
• The course required the use of Ravenclaw/Olympus
• A journal was kept on what I learned in the process
– Requested by gang members such as Dan and Thomas

Speaker's Bio
• Mainly a speech recognition guy
– i.e. the part that transforms speech to text
• Not very experienced in dialogue systems
– Only worked on a directed dialogue system
• SpeechWorks 6.5
• i.e. an all-in-one dialogue system + speech recognizer
• Dialogues are modularized
– E.g. Digits, Alphabets, ZipCode

What did we do this year?
• 3 systems by 3 groups
– RoadFinder: Aaron, Dave and Wen
– ICSLPInfo: Arthur, Lingyun and Rohit
– Extension of Vera: Mohit, Kaimin and ?
• The actual situation
– Dave did most of the stunts
– Each group had a person just to take care of development kick-start and system issues
– The mailing list became the collaborative means

New Development
• Sphinx3_Engine
– With Sphinx 3.6 RC1
– With powerful wideband models (CALO) and narrowband models (Communicator)
• LM Training Scripts
– With tools newly built in Project L (CMU-Cambridge LM Toolkit "V3")
• IAX_Server
– Allows systems to be used with an Asterisk server (?)

This talk
• Mini case study of ICSLPInfo
– Try to learn what information we could give to users for a conference
• The type of information is unknown
• Two perspectives
– From a new user's perspective
– From a developer's perspective

The New User's Perspective
• Generally, as a new user, is it easy to learn Ravenclaw?
• Related questions
– Do I hate Dan? (Forever? Or even for a moment?)
– Is it scary to use Ravenclaw?
– What do we know / not know at a certain stage?
– What is the general comment on the software?

The Developer's Perspective
• From a developer's standpoint, what are the issues of development?
– Issues in speech recognition?
– Issues in dialogue system development?
– Issues in general application development?
– Issues in multi-developer development?
– When should we work on SR/RP/DS/BE?

The Development Process
• Stage 0: Planning, drawing diagrams and such
• Stage 1: Making some existing systems run
• Stage 2: Making simple systems run
– Making SR work without the backend
– Making the backend work without the SR
• Stage 3: Making the first end-to-end system run
• (Not covered today) Stage 4: Final adjustment and final demo

Stage 0: Planning (2-3 weeks)
• Major issue
– The type of useful information could be unknown
• Author?
• Session?
• Title?
• Venue?
– We actually didn't know what was most useful at Stage 0

Stage 1: Making some existing systems run (1 month)
• Wide variety of pre-built systems using Ravenclaw
– Path 1: Starting from ConvertProj
• ConvertProj is a very simple project
– Path 2: Starting from RoomLine
– Path 3: Starting from scratch
• Path 1 was chosen first so that everyone could get an initial system

Note in Stage 1
• Not everyone had an easy time getting the initial setup running (1-2 weeks)
– Forgot to install ActivePerl and miscellaneous tools
– At the beginning, didn't know where to debug
• The synthesizer turned out not to be pre-built (1-2 weeks)
• The speech recognizer was not running yet
– Didn't know why at that point

If we start from ConvertProj……
• How do we write the first system then?
– ConvertProj is very simple, but we didn't know what it does……
– We didn't understand how Phoenix/Ravenclaw works
• Rohit: Let us start from RoomLine then.
– Turned out to be a very good idea
– Why?
• RoomLine is complicated, but a learner can learn from the code
• There are also a couple of patterns that can be reused, e.g. for-loop, if-then-else

Note:
• We had already gotten hold of the "Description of Ravenclaw Agent Description Language"
– Not a tutorial, no examples
– We didn't know how to start based on it
• That's why a template was needed
– We ended up tracing the whole RoomLine system

Stage 2a: Making a system with working SR
• Our biggest problem: name recognition
– Recognizing 1000 names
– Many of them are Asian names
– No training data
– Dave hadn't built the LM building script yet
• The type of information was not yet set
– Should we handle names?

Stage 2a: Making a system with working SR (cont.)
• Our first bootstrapping system
– Used Sphinx3_Engine + the CALO model
• Probably the strongest SR we could use
– Used the RoomLine language model
– Just tweaked the grammar a little bit
– Added a lot of compound words into classes
– Also, only session chairs (180 names) are in the grammar

The First System (No BE)
• [Diagram: task tree of the first system, with agents icslpinfo, Welcome, Task, HMIHY, Request, Satisfied, Inform, Logout, Reset, DateTime]

Note at Stage 2a
• Finally got something running
• But the system did nothing
• We were still very vague on
– how messages are passed in Galaxy, and
– how results are transferred from SR to RP to DM

Stage 2b: Making the backend work without the SR
• The backend was finally built at this stage
• The backend/DM/RP and text console mode were working
• The DM now gives the abstract when asked about the author
• But this time, SR failed because
– the grammar accepted too much
– the RoomLine LM was used

Note at Stage 2b
• Another difficult issue showed up
– SR/RP/DM are very tightly coupled with each other
• Other problems
– Occasionally, "" is shown in the prompts
– Because some prompts weren't filled in
• Good part:
– The first type of information we would handle was finally decided
– This constrains SR
– We started to feel time was running short

Stage 3: Making the first end-to-end system run
• Speech Recognition
– Retrained the LM using faked corpora
– Significantly trimmed down the number of authors to recognize (from 200 to 30)
– Still, only a few author names are easily recognized
– The lucky ones
• Alan Black
• Arthur Toth
• Julia Hirschberg
• Andrew Rosenberg
– (Alex is not very happy about this; his name gets confused with "context key")

Note at this point
• Started to realize that SR couldn't be improved quickly
• The problems of the DM started to be glaring
– No disambiguation
– When multiple results are returned, no strategy to handle them
• Also, SR often still couldn't recognize things that were in the grammar
– A lot of ++GARBAGE++ is recognized
– We see a lot of "On Alan Black"

DM
• Allow disambiguation using author name and session name
• Take care of different scenarios of results (a rough sketch follows this slide)
– If there are no results,
• Say sorry and restart
– If there is one result,
• Present the details of the paper,
• Then ask whether to present the abstract of the paper
– If there are five or fewer results,
• Tell the user the number of papers found
• Then ask whether to present a summary of the papers
– (A list of the paper titles)
– If there are more than five results,
• Say sorry
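A minimal sketch of the result-handling strategy on the DM slide above, written here in Python for readability; the actual ICSLPInfo system expresses this logic as RavenClaw dialogue agents, and every name below (Paper, handle_results, the message strings) is illustrative only.

```python
# Hypothetical sketch of the DM's result-handling strategy described above.
# The real system implements this as RavenClaw agents; the names here
# (Paper, handle_results, ...) are invented for illustration.

from dataclasses import dataclass
from typing import List


@dataclass
class Paper:
    title: str
    authors: str
    abstract: str


def handle_results(results: List[Paper]) -> str:
    """Decide what the system says based on how many papers were found."""
    if not results:
        # No match: apologize and restart the dialogue.
        return "Sorry, I could not find any matching paper. Let's start over."
    if len(results) == 1:
        paper = results[0]
        # Single match: present the details, then offer the abstract.
        return (f"I found one paper: {paper.title}, by {paper.authors}. "
                "Would you like to hear the abstract?")
    if len(results) <= 5:
        # A few matches: report the count, then offer a summary (list of titles).
        return (f"I found {len(results)} papers. "
                "Would you like a summary of their titles?")
    # Too many matches: give up politely; there is no narrowing strategy yet.
    return "Sorry, I found too many papers. Please be more specific."
```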
Other Small Things We Hacked Out
• Confidence of the recognizer
– The Audio Server was hacked such that
• We are always "confident" about the results
• Annoying restarting issue
– Commented out the restarting routine in Windows

Backend and NLG
• Backend
– (maybe for this demo only)
– SQL-based
– Can do author search and session-name search (see the sketch at the end of these notes)
• NLG
– Fills in all sorts of prompts
– A lot of implicit and explicit confirmations are missing
• That caused a lot of "" in the system

Demo:
• Scenario
– A user wants to know information about the papers written by
• Alan Black
• Julia Hirschberg and
• Andrew Rosenberg
• What it shows
– How bad recognition is handled now
– What happens when the number of answers returned is multiple or single

Note:
• Rohit Kumar and Lingyun Gao actually hold the latest and greatest system.
• This system only shows how we built up from ground zero.

Summary: 3 Difficult Issues in the Task
• 1. Tight coupling of SR/RP/DM
– When one part is right, the others could still fail
• 2. SR issues
– The SR task can be affected by different constraints
– The first system is hard to get up and running
– Compounded with issue 1
• 3. Lack of documentation for the DM
– The current documentation base is not strong enough
– A read-and-implement approach doesn't work yet
– Some concepts are difficult to understand
• e.g. COMPLETE/SUCCEED/FAILED
• GRAMMAR_MAPPING

Lessons learned
• Iteratively develop the system by bootstrapping, each time from a simple system
– This greatly reduces the pain of coupling
• SR issue
– The first system could be completed with some smaller grammars first
• In some tasks, SR shouldn't be the focus at a certain point
– Aligned with common observation
• DM development
– A good working template is necessary
– What we need: for-loop and if-then-else templates

The bright side 1: a birthday gift for Dave
• Once understood, it is pretty easy to program
– E.g. the birthday celebration system
• Sample dialogue:
– S: Do you want to know what's going on?
– U: Yes (or No)
– S: No matter whether you say yes or no, I will have to tell you. Begin message.
– Hmm-hm. Today is Mr. David Huggins-Daines' birthday. Because everyone is too shy to sing the birthday song for him, I, Frank, will have to sing it. Here you go. Happy birthday to you, happy birthday to you, happy birthday to David, happy birthday to you. This message is brought to you by ……
– End message

Bright Side 2
• Compared to a directed dialogue system, the current system can give unexpected results.
• Why?
– Several sub-systems of the dialogue system work together
• Built-in libraries
• Grounding
• Focuses
• Developer-defined libraries
• It is delightful to use in general

Bright Side 3
• The source code has a consistent coding style
– Development problems mainly stem from
• 1. Lack of automatic regression tests
• 2. Lack of a central manager
• Not a bad thing in a dialogue system if developer/system = 1

Conclusion
• Summarized how the first end-to-end ICSLPInfo system was developed
• Discussed several issues, including
– Coupling of systems
– SR
– DM development
• Overall
– Thrilled when getting the system running and working
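As referenced from the "Backend and NLG" slide, here is a rough, hypothetical sketch of the kind of author-search and session-name-search an SQL-based backend like ours performs. The table and column names (papers, title, authors, session, abstract) are invented for illustration; the real ICSLPInfo schema and queries may look quite different.

```python
# Hypothetical sketch of the SQL-based backend's author-search and
# session-name-search (see the "Backend and NLG" slide). The schema
# (papers, title, authors, session, abstract) is invented for illustration.

import sqlite3


def search_by_author(conn: sqlite3.Connection, author: str):
    """Return (title, authors, abstract) rows for papers matching an author name."""
    cur = conn.execute(
        "SELECT title, authors, abstract FROM papers WHERE authors LIKE ?",
        (f"%{author}%",),
    )
    return cur.fetchall()


def search_by_session(conn: sqlite3.Connection, session_name: str):
    """Return (title, authors, abstract) rows for papers in a matching session."""
    cur = conn.execute(
        "SELECT title, authors, abstract FROM papers WHERE session LIKE ?",
        (f"%{session_name}%",),
    )
    return cur.fetchall()


# The rows returned here would then be handed to the DM's result-handling
# strategy (see the earlier sketch after the DM slide).
```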