Syllabus COSI 115: Spoken Dialog Design
Overview
Interactive applications have become ubiquitous around the world on phones and other devices.
Since voice is the most natural medium for human communication, spoken dialog is becoming
an essential part of the interface. However, creating an effective spoken dialog application
requires more than just programming skills. It requires knowledge from many disciplines
including linguistics, artificial intelligence, computer-human interaction, and computational
linguistics. This course brings together the essential elements of these fields with the software
skills and tools required to build an effective dialog system, and guides students through hands-on projects that apply this knowledge to real applications.
Learning Objectives
At the end of the course, students will:
Understand the basic principles of the fields that underlie spoken dialog, including:
• Fundamental linguistic principles of discourse,
• Artificial Intelligence techniques for plan recognition and task execution,
• Computational models for recognizing intentions and resolving coreference,
• Human-Computer Interaction (HCI) and Voice User Interface (VUI) design.
Understand the architecture of spoken dialog systems and the capabilities and limitations of
the software components required to execute the application, such as
• speech recognition
• speech synthesis
• dialog modules
Be able to apply this knowledge to building spoken dialog applications using industry and
research tools.
Required Reading
There is no required textbook for the course. The course will rely mostly on published papers
and online resources ranging from early papers on the fundamentals of dialog to current
research. The instructor will also make available lecture notes and slides on the topics covered in
class. Examples of published articles to be covered include the following:
Hobbs, Jerry R. "Coherence and coreference." Cognitive science 3.1 (1979): 67-90.
Allen, James F., and C. Raymond Perrault. "Analyzing intention in utterances." Artificial
intelligence 15.3 (1980).
Grosz, Barbara J., and Candace L. Sidner. "Attention, intentions, and the structure of
discourse." Computational linguistics 12.3 (1986): 175-204.
Walker, Marilyn A. "Centering, anaphora resolution, and discourse structure." Centering
theory in discourse (1998).
Bohus, Dan, and Alexander I. Rudnicky. "RavenClaw: Dialog management using
hierarchical task decomposition and an expectation agenda." (2003).
Li, Xiao, et al. "Leveraging multiple query logs to improve language models for spoken
query recognition." Acoustics, Speech and Signal Processing, 2009. ICASSP 2009.
IEEE International Conference on. IEEE, 2009. (Microsoft research lab)
Suendermann, D., Liscombe, J., Bloom, J., Li, G., and Pieraccini, R. "Large-Scale
Experiments on Data-Driven Design of Commercial Spoken Dialog Systems." Proc. of
Interspeech 2011.
Mamou, Jonathan, et al. "Improved Spoken Query Transcription Using Co-Occurrence
Information." INTERSPEECH. 2011. (IBM Research)
Schedule
Week | Morning: Theory | Afternoon: Applications
Speech Recognition Applications, Speech Industry | Components of a spoken dialog system: Speech recognition, Speech Synthesis
Human Conversation: Discourse and | Speech recognition
Discourse structure | Dictionaries and grammars
Anaphora and reference resolution | Statistical language modeling
Plan recognition and task structure | Speech performance evaluation
Dialog and belief representation | Data vs. Knowledge
Dialog Design: Use cases | Natural Language Processing
Dialog Design: Clarification Dialogs and error recovery | Advanced dialog architectures
Dialog system Evaluation | Multimodal applications
Case studies | Case studies
Grading
50% Programming assignments: There will be 4-5 programming assignments exercising the
principles covered in the lectures. They will expose students to a variety of programming
languages and tools typically used in spoken dialog development in research and industry.
30% Homework and take home quizzes: Periodic homework assignments and take home
quizzes will allow students to synthesize the knowledge from readings and lectures and
consider the application of the principles in multiple contexts.
20% Class participation: Students will be required to participate in class discussions, work in
groups, and contribute to class blog discussions.