Syllabus

COSI 115: Spoken Dialog Design

Overview

Interactive applications have become ubiquitous on phones and other devices around the world. Since voice is the most natural medium for human communication, spoken dialog is becoming an essential part of the interface. However, creating an effective spoken dialog application requires more than programming skills. It draws on knowledge from many disciplines, including linguistics, artificial intelligence, human-computer interaction, and computational linguistics. This course brings together the essential elements of these fields with the software skills and tools required to build an effective dialog system, and guides students through hands-on projects that apply that knowledge to real applications.

Learning Objectives

At the end of the course, students will:

• Understand the basic principles of the fields that underlie spoken dialog, including:
  • Fundamental linguistic principles of discourse
  • Artificial intelligence techniques for plan recognition and task execution
  • Computational models for recognizing intentions and resolving coreference
  • Human-computer interaction (HCI) and voice user interface (VUI) design
• Understand the architecture of spoken dialog systems and the capabilities and limitations of the software components required to execute the application, such as:
  • Speech recognition
  • Speech synthesis
  • Dialog modules
• Be able to apply this knowledge to building spoken dialog applications using industry and research tools.

Required Reading

There is no required textbook for the course. The course will rely mostly on published papers and online resources, ranging from early papers on the fundamentals of dialog to current research. The instructor will also make available lecture notes and slides on the topics covered in class. Examples of published articles to be covered include the following:

• Hobbs, Jerry R. "Coherence and coreference."
  Cognitive Science 3.1 (1979): 67-90.
• Allen, James F., and C. Raymond Perrault. "Analyzing intention in utterances." Artificial Intelligence 15.3 (1980).
• Grosz, Barbara J., and Candace L. Sidner. "Attention, intentions, and the structure of discourse." Computational Linguistics 12.3 (1986): 175-204.
• Walker, Marilyn A. "Centering, anaphora resolution, and discourse structure." Centering Theory in Discourse (1998).
• Bohus, Dan, and Alexander I. Rudnicky. "RavenClaw: Dialog management using hierarchical task decomposition and an expectation agenda." (2003).
• Li, Xiao, et al. "Leveraging multiple query logs to improve language models for spoken query recognition." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009. (Microsoft Research)
• Suendermann, D., Liscombe, J., Bloom, J., Li, G., and Pieraccini, R. "Large-Scale Experiments on Data-Driven Design of Commercial Spoken Dialog Systems." In Proc. of Interspeech 2011.
• Mamou, Jonathan, et al. "Improved Spoken Query Transcription Using Co-Occurrence Information." Interspeech 2011. (IBM Research)

Schedule

The course runs over weeks 1-5. Topics are listed in order, with each morning theory topic beside its afternoon applications topic.

Morning: Theory | Afternoon: Applications
Overview: Speech recognition applications, speech industry | Components of a spoken dialog system: speech recognition, speech synthesis, architecture, dialog
Human conversation: Discourse and dialog | Speech recognition manager
Discourse structure | Dictionaries and grammars
Anaphora and reference resolution | Statistical language modeling
Plan recognition and task structure | Speech performance evaluation
Dialog and belief representation | Data vs. knowledge
Dialog design: Use cases | Natural language processing
Dialog design: Clarification dialogs and error recovery | Advanced dialog architectures
Dialog system evaluation | Multimodal applications
Case studies | Case studies

Grading

50% Programming assignments: There will be 4-5 programming assignments exercising the principles covered in the lectures. They will expose students to a variety of programming languages and tools typically used in spoken dialog development in research and industry.

30% Homework and take-home quizzes: Periodic homework assignments and take-home quizzes will allow students to synthesize the knowledge from readings and lectures and to consider how the principles apply in multiple contexts.

20% Class participation: Students will be required to participate in class discussions, work in groups, and contribute to class blog discussions.