A Newbie Experience of Dialogue System Construction Using the Ravenclaw Framework Arthur Chan

Introduction
• Did you know?
– Arthur Chan actually takes classes at CMU!
• Course he took this year:
– “Project Course: Dialogue Systems”
• The course required the use of
Ravenclaw/Olympus
• I kept a journal of what I learned in the process
– Requested by gang members such as Dan and Thomas
Speaker’s Bio
• Mainly a speech recognition guy
– i.e. the part that transforms speech to text
• Not very experienced in dialogue systems
– Has only worked on directed dialogue systems
• Speechwork 6.5
• i.e. an all-in-one dialogue system + speech
recognizer
• Dialogues are modularized
– E.g. Digits, Alphabets, ZipCode
What did we do this year?
• 3 systems by 3 groups
– RoadFinder: Aaron, Dave and Wen
– ICSLPInfo: Arthur, Lingyun and Rohit
– Extension of Vera: Mohit, Kaimin and ?
• The actual situation
– Dave did most of the stunts
– Each group had one person just to take care of the development kick-start and system issues
– Mailing list became the collaborative means
New Development
• Sphinx3_Engine
– With Sphinx 3.6 RCI
– With Powerful Wideband Models (CALO) and
Narrowband Models (Communicator)
• LM Training Scripts
– With tools newly built in Project L (CMU-Cambridge LM Toolkit “V3”)
• IAX_Server
– Allows systems to be used with an Asterisk server(?)
This talk
• Mini Case Study of ICSLPInfo
– Try to learn what information we could give to
users for a conference
• The type of information is unknown
• Two perspectives
– From a new user perspective
– From a developer perspective
The New User’s Perspective
• Generally, as a new user, is it easy to learn
Ravenclaw?
• Related Questions
– Do I hate Dan? (Forever? Or even for a moment?)
– Is it scary to use Ravenclaw?
– What do we know/not know at a certain stage?
– What is the general comment on the software?
The Developer’s Perspective
• From a developer’s standpoint, what are
the issues of development?
– Issues in speech recognition?
– Issues in dialogue system development?
– Issues in general application development?
– Issues in multi-developer development?
– When should we work on SR/RP/DS/BE?
The Development Process
• Stage 0: Planning, drawing diagrams and stuff
• Stage 1: Making some existing systems run
• Stage 2: Making simple systems run
– Making the SR work without the backend
– Making the backend work without the SR
• Stage 3: Making the first end-to-end system run
• (Not covered today) Stage 4: Final adjustment and final demo
Stage 0: Planning (2-3 weeks)
• Major issue
– The type of useful information could be unknown
• Author?
• Session?
• Title?
• Venue?
– We actually didn’t know which was the most useful at Stage 0
Stage 1: Making some existing
systems run (1 month)
• Wide varieties of pre-built systems using
Ravenclaw
– Path 1: Starting from ConvertProj
• ConvertProj is a very simple project
– Path 2: Starting from RoomLine
– Path 3: Starting from scratch
• Path 1 was first chosen so that everyone
could get an initial system
Note in Stage 1
• Not everyone had an easy time getting the initial setup running (1-2 weeks)
– Forgot to install ActivePerl and miscellaneous tools
– At the beginning, didn’t know where to start debugging
• The synthesizer turned out not to be prebuilt (1-2 weeks)
• The speech recognizer was not running yet
– Didn’t know why at that point.
If we start from ConvertProj……
• How do we write the first system then?
– ConvertProj is very simple but we didn’t know what it
does……
– We didn’t understand how Phoenix/Ravenclaw works
• Rohit: Let us start from Roomline then.
– Turns out to be a very good idea
– Why?
• Roomline is complicated but the learner can learn from the
code
• There are also a couple of patterns that could be reused, e.g. for-loop, if-then-else
Note:
• We already got a hold of “Description of
Ravenclaw Agent Description Language”
– Not a tutorial, no examples
– We didn’t know how to start based on it
• That’s why a template was needed
– We ended up tracing the whole Roomline system
Stage 2a: Making a system with
working SR
• Our biggest problem: Name Recognition
– Recognizing 1000 names
– Many of them are Asian names
– No training data
– Dave hadn’t built the LM building script yet
• The type of information is not yet set
– Should we handle names?
Stage 2a: Making a system with
working SR (cont.)
• Our first bootstrapping system
– Use Sphinx3_Engine + CALO model
• Probably the strongest SR we could use
– Use Roomline language model
– Just tweak the grammar a little bit
– Add a lot of compound words into classes
– Also, only session chairs (180 names) are in the grammar
The First System (No BE)
[Figure: diagram of the first system; node labels: icslpinfo, Reset, DateTime, Welcome, Task, HMIHY, Logout, Request, Satisfied, Inform]
Note at Stage 2a
• Finally got something running
• But the system did nothing
• We were still very vague on
– how messages are passed in Galaxy and
– how results are transferred from SR to RP to DM
Stage 2b: Making the backend
work without the SR
• The backend is finally built at this stage
• The backend/DM/RP is working and text
console mode is working
• DM now gives the abstract when asked
about the author
• But this time, SR failed because
– the grammar accepted too much
– the Roomline LM was still being used.
Note at Stage 2b
• Another difficult issue shows up
– SR/RP/DM are very tightly coupled with each other
• Other problems
– Occasionally, “” was shown in the prompts
– Because some prompts weren’t filled in
• Good part:
– The first type of information we will handle is finally
decided
– This constrains SR
– We start to feel time is running short
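The empty “” prompts mentioned above could be caught before a demo with a small audit. The following is only a sketch in plain Python, not part of Ravenclaw or its NLG component; the prompt keys and the dictionary layout are hypothetical and stand in for whatever prompt store the system actually uses.

```python
def find_missing_prompts(required_keys, prompt_table):
    """Return the required prompt keys that are absent or blank.

    required_keys: prompt names the dialogue manager may request (hypothetical).
    prompt_table:  mapping from prompt name to prompt text (hypothetical layout).
    """
    return sorted(
        key for key in required_keys
        if not prompt_table.get(key, "").strip()
    )


if __name__ == "__main__":
    # Illustrative keys only; they are not taken from the ICSLPInfo system.
    required = ["welcome", "inform_paper", "explicit_confirm_author"]
    prompts = {"welcome": "Welcome to ICSLPInfo.", "inform_paper": ""}
    for key in find_missing_prompts(required, prompts):
        print("missing prompt:", key)
```

Running such a check whenever the dialogue task changes would list every prompt that would otherwise be spoken as an empty string.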
Stage 3: Making the first end-to-end system run
• Speech Recognition
– Retrain LM using faked corpora
– Significantly trimmed down the number of authors to
recognize (From 200 to 30)
– Even so, only a few author names are recognized easily.
– The lucky ones
• Alan Black
• Arthur Toth
• Julia Hirschberg
• Andrew Rosenberg
– (Alex is not very happy about this. His name is confused with “context key”)
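As an illustration of what “faked corpora” can mean when there is no training data, the sketch below expands a few carrier phrases with the author names to produce text for LM training. The carrier phrases are hypothetical (the talk does not list the ones actually used), and the output is plain upper-cased text, one sentence per line; whatever the LM toolkit needs beyond that is not shown.

```python
import itertools
import random

# Hypothetical carrier phrases; the real ones are not given in the talk.
TEMPLATES = [
    "papers by {author}",
    "what did {author} write",
    "tell me about the paper by {author}",
]

# The four names the talk reports as recognized easily.
AUTHORS = ["Alan Black", "Arthur Toth", "Julia Hirschberg", "Andrew Rosenberg"]


def make_fake_corpus(templates, authors, n_sentences, seed=0):
    """Sample n_sentences upper-cased sentences from templates x authors."""
    rng = random.Random(seed)  # fixed seed keeps the corpus reproducible
    combos = list(itertools.product(templates, authors))
    corpus = []
    for _ in range(n_sentences):
        template, author = rng.choice(combos)
        corpus.append(template.format(author=author).upper())
    return corpus


if __name__ == "__main__":
    for sentence in make_fake_corpus(TEMPLATES, AUTHORS, 5):
        print(sentence)
```

Sampling with repetition, rather than listing each combination once, lets the relative frequency of phrases be tuned by repeating templates in the list.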
Note at this point
• Started to realize that SR couldn’t be improved quickly
• The problems of the DM started to be glaring
– No disambiguation
– When multiple results are returned, no strategy to handle them.
• Also, SR kept failing to recognize things that are in the grammar.
– A lot of ++GARBAGE++ is recognized
– We see “On Alan Black” a lot
DM
• Allows disambiguation using author name and session name
• Takes care of different result scenarios
– If there are no results
• Say sorry and restart.
– If there is one result
• Present the details of the paper,
• Then ask whether to present the abstract of the paper
– If there are 2 to 5 results
• Tell the user the number of papers found
• Then ask whether to present the summary of the papers.
– (List of titles of the papers)
– If there are more than 5 results
• Say sorry
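The result-handling strategy on this slide can be sketched as a small decision function. This is plain Python for illustration only, not the Ravenclaw agent description language; the action names are hypothetical labels for the behaviours listed above.

```python
MAX_LISTABLE = 5  # threshold taken from the slide


def plan_response(papers):
    """Map the list of papers returned by the backend to system actions."""
    n = len(papers)
    if n == 0:
        return ["say_sorry", "restart"]
    if n == 1:
        return ["present_details", "offer_abstract"]
    if n <= MAX_LISTABLE:
        # 2 to 5 results: report the count, then offer the list of titles.
        return ["report_count", "offer_title_list"]
    return ["say_sorry"]  # too many results to read out


if __name__ == "__main__":
    for count in (0, 1, 3, 8):
        print(count, plan_response(["paper"] * count))
```

Keeping the threshold in one constant makes it easy to adjust how many titles the system is willing to read out.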
Other small things We Hacked
Out
• Confidence of The Recognizer
– Audio Server is hacked such that
• We are always “confident” about the results.
• Annoying restarting issue
– Commented out the restarting routine on Windows
Backend and NLG
• Backend
– (may be for this demo only)
– SQL-based
– Can search by author and by session name
• NLG
– Fill in all sorts of prompts
– Many implicit and explicit confirmation prompts are missing
• That caused a lot of “” in the system
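The two backend searches listed above might look like the following with an SQL store. This is a minimal sketch using Python’s built-in sqlite3 module; the table name, the column names and the sample rows are placeholders, since the talk does not describe the actual schema or database engine.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical `papers` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE papers (title TEXT, author TEXT, session TEXT, abstract TEXT)"
    )
    # Placeholder rows, not real conference data.
    conn.executemany(
        "INSERT INTO papers VALUES (?, ?, ?, ?)",
        [
            ("Placeholder Paper A", "Author One", "Session X", "Abstract A"),
            ("Placeholder Paper B", "Author One", "Session Y", "Abstract B"),
            ("Placeholder Paper C", "Author Two", "Session X", "Abstract C"),
        ],
    )
    return conn


def search_by_author(conn, author):
    """Return (title, session) pairs for an author, ignoring case."""
    cur = conn.execute(
        "SELECT title, session FROM papers "
        "WHERE author = ? COLLATE NOCASE ORDER BY title",
        (author,),
    )
    return cur.fetchall()


def search_by_session(conn, session):
    """Return (title, author) pairs for a session name, ignoring case."""
    cur = conn.execute(
        "SELECT title, author FROM papers "
        "WHERE session = ? COLLATE NOCASE ORDER BY title",
        (session,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(search_by_author(db, "author one"))
    print(search_by_session(db, "Session X"))
```

Parameterized queries are used so that recognized text is never pasted directly into the SQL string.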
Demo:
• Scenario
– A user wants information about the papers written by
• Alan Black
• Julia Hirschberg and
• Andrew Rosenberg
• What it shows
– How bad recognition is handled now.
– What happens when a single answer or multiple answers are returned.
Note:
• Rohit Kummar and Lingyun Gao actually hold the latest and greatest system.
• This system only shows how we built up
from ground zero.
Summary: 3 Difficult Issues in the
Task
• 1, Tight coupling of SR/RP/DM
– When one part is right, the others could still fail
• 2, SR issues
– The SR task could be affected by different constraints.
– The first system is hard to bring up
– Compounded by issue 1
• 3, Lack of documentation in DM
– The current documentation base is not strong enough
– Read-and-implement approach doesn’t work yet
– Some concepts are difficult to understand
• Say COMPLETE/SUCCEED/FAILED
• GRAMMAR_MAPPING
Lessons learned
• Develop the system iteratively, bootstrapping each version from a simpler system
– This would greatly reduce the pain of coupling
• SR issue
– The first system could be completed by some smaller
grammars first
• In some tasks, SR shouldn’t be the focus at a certain point.
– Aligned with common observation
• DM Development
– A good working template is necessary
– What we need: for loop, if-then-else templates
The bright side 1:
birthday gift for Dave
• Once understood, pretty easy to program
– E.g. birthday celebration system
• Sample Dialogue:
– S: Do you want to know what’s going on?
– U: Yes (or No)
– S: No matter whether you say yes or no, I will have to tell you.
Begin message.
– Hmm-hm. Today is Mr David Huggines Daines’ Birthday.
Because everyone is too shy to sing the birthday song for him.
Me, Frank, will have to sing it. Here you go. Happy Birthday to
you, Happy Birthday to you. Happy Birthday to David. Happy
Birthday to you. This message is brought to you by ……
– End message
Bright Side 2
• Compared to a directed dialogue system, the current system could give unexpected results.
• Why?
– Several sub-systems of the dialogue system are working together
• Built-in libraries
• Grounding
• Focuses
• Developer-defined libraries
• It is delightful to use in general
Bright Side 3
• Source code has a consistent coding style
– Development problems will mainly stem from
• 1, Lack of automatic regression tests
• 2, Lack of a central manager
• Not a bad thing in a dialogue system if developers per system = 1
Conclusion
• Summarized how the first end-to-end system of ICSLPInfo was developed
• Discussed several issues including
– Coupling of systems
– SR
– DM development
• Overall speaking
– Thrilled when getting the system running and working