CS 410/510 SLA - Spoken Language Interfaces
Fall Quarter 2013 Syllabus (Updated 29 August 2013)

Wednesdays, 19:00-21:40
Location: Willow Creek
October 2 - December 4, 2013
Final Exam: December 11, 2013, 18:00-21:40 (during finals week)

Instructor: Jim Larson, PhD
jim@larson-tech.com
(503) 629-5167

Course Motivation

Speech technologies, such as automatic speech recognition (ASR) and text-to-speech synthesis (TTS), enable users to speak and listen to computer applications. Interacting with computer applications using speech technologies is useful when the user’s hands and/or eyes are busy (e.g., when driving), when it is inconvenient to use a keyboard (e.g., in cold temperatures or while walking), when a usable keyboard is not available (e.g., on small smartphones), and for users with disabilities who are unable to use a keyboard. While it is easy to create a speech-enabled application, it is difficult to create a speech application that is easy to learn and easy to use.

This course answers fundamental speech user interface questions, including:

- How to involve users in every stage of the design and implementation of speech-enabled applications
- How to enable the computer to listen to users by writing speech grammars used by a speech recognizer
- How to enable the computer to speak to users by preparing text prompts, which are converted to speech by a speech synthesizer or are prerecorded by professional voice actors
- How to write error handlers that deal with events such as no response by the user, unrecognizable words, and requests for help
- How to choose and implement the appropriate speech dialog style
- How to create new speech applications by reusing pieces of existing speech applications

This course summarizes current research and presents numerous guidelines, suggestions, and conventional wisdom for designing easy-to-learn, easy-to-use, and effective voice user interfaces.

Course Goal

Prepare class members to design, construct, and evaluate speech application user interfaces.
Class members will use state-of-the-art languages and development tools to design, create, and evaluate a speech application, including:

- Interactive voice response (IVR) interfaces, in which the application speaks prompts to the user and the user responds by speaking the answers to the prompts. These applications are “voice only” and work only on telephones. They do not use the display available on today’s smartphones.
- Multimodal user interfaces, in which the user speaks, touches, and taps, and the computer responds by displaying information on a screen and speaking to the user. With the introduction of smartphones, multimodal applications that provide both visual and verbal user interfaces are becoming popular.

Class Organization

This course will consist of a combination of several activities:

1. Lectures: Summarize the state-of-the-art practices in constructing spoken language applications. There are no texts required for this class. Class lecture notes may be downloaded from http://www.larson-tech.com/PSU.

2. Quizzes: There will be a quiz at the beginning of each class. Each quiz will cover the material from the previous lecture. Quizzes are designed to encourage students to review the material from the previous class and to show students the types of questions that might appear on the midterm and final exams. We will review the answers to quizzes during class. Quizzes will not be turned in for grading.

3. Assignments: Each assignment is designed to apply lecture material to actual speech-enabled applications and will have a specific goal. Assignments are due on a weekly basis. Assignments should be completed by each student individually, not by student teams. See the schedule at the end of this syllabus for the exact dates of each assignment. Each assignment is worth 10 points.

4. Team Project: Design and implement a speech-enabled application. The best way to learn is to do. Class members will form teams during the first class to work jointly on a project.
Because some speech recognition systems have difficulty recognizing the speech of non-native English speakers, or speakers with non-American accents, each team should include at least one native English speaker. The project will consist of several reports (each worth 10 points) and a demonstration (worth 50 points). All members of a team will receive the same number of points for each report and for the demonstration. See the schedule at the end of this syllabus for the exact due dates for each team report and demonstration. You may submit your project to the AVIOS Student Contest and possibly win a cash prize of $1000 and a subsidy of up to $500 for travel and lodging to the Mobile Voice Conference in San Francisco (http://mobilevoiceconference.com/). Sometimes problems arise among team members. You are all adults; I expect you to resolve intra-team problems. A team may eject a team member who does not do his/her share. Ejected team members will be expected to complete the project by themselves.

5. Exams: There will be a midterm and a final exam. Both will be closed book. See the schedule at the end of this syllabus for the exact dates for the midterm and final exams.

Prerequisite

Students will need an understanding of HTML and JavaScript. In addition, an understanding of how to develop applications on smartphones will be useful. Students may need access to a smartphone, either Android or iPhone, in order to demonstrate group projects.

Grading

This course is designed to achieve specific learning outcomes. Performance assessment depends upon the accomplishment of these outcomes; grades are not “given” but are “earned.” Class members are graded on demonstrated knowledge and competence rather than on effort alone. It is the responsibility of the class member to come to class and/or participate in each group project fully prepared and eager to contribute to the individual and collective learning experience.
Percentage of possible points    Grade
95%-100%                         A
90%-94.9%                        A-
87%-89.9%                        B+
83%-86.9%                        B
80%-82.9%                        B-
77%-79.9%                        C+
73%-76.9%                        C
70%-72.9%                        C-

Students earning less than 70% of the possible points will not receive a passing grade. The instructor may lower the percentages (for example, if everyone does poorly on their final exam). Thus, there WILL be some final grades of A, and it is possible that everyone will earn a final grade of A.

Submitting Assignments

Copy and paste the assignment report and code “instream” into an email message and send it to omitfo49@gmail.com. Include the following in the subject line: Assignment <n> from <your last name>

Submitting Projects

Copy the URL describing your team project into an email message and send it to omitfo49@gmail.com. Include the following in the subject line: Assignment <n> from <the last names of each team member>.

Late Assignments and Projects

Assignments are due before 11:59 a.m. on the day they are due. No late assignments will be accepted or graded. I encourage students to complete and submit assignments and projects ahead of time.

Instructor Availability

Normally, I am available from 9:00 a.m. to 12:00 noon Pacific Standard Time (PST), Monday through Friday. The best and quickest way to contact me is via email. If you need to make arrangements other than the above, please contact me at your earliest convenience and I will work with you. Please note that I provide these times to make it easier to communicate with me, not to limit our contact. If you need to contact me outside these times, please do not hesitate to do so.

Email and Email List

The instructor’s email address is jim@larson-tech.com. Please use this email address for personal questions only. The course email group address is CS510SLA@googlegroups.com.
Use this email list for all general questions and comments. The instructor will also email messages and important notices using the course email address. During the first class, I will collect email addresses and invite you to join this group. Use the submission email address omitfo49@gmail.com to submit assignments and projects.

Texts

There is no textbook for this course. Class lecture notes may be downloaded from http://www.larson-tech.com/PSU.

Course Schedule

Dates (assignments and reports are due before 11:59 a.m. on the date listed):
- October 2
- October 9
- October 16
- October 23
- October 30 (day before Halloween)
- November 6
- November 13
- November 20
- November 27 (day before Thanksgiving)
- December 4
- December 11

In-class lectures, discussions, exams, and demonstrations:
- Speech Technologies V14.ppt
- VXML_More.ppt
- Voice Guidelines.pptx
- Designing VUI.ppt
- Choosing Voice Applications.pptx
- Testing3.ppt
- Choosing Multimodal Applications.ppt
- Selecting Multimodal Platform.ppt
- Midterm Exam
- Team Project Web page.doc
- TBD
- TBD
- Project demonstrations (December 4)
- Final Exam (December 11)

Assignments and reports:
- Assignment 1: First IVR application
- Assignment 2: Audio books and Speech Synthesis Markup Language (SSML)
- Assignment 3: Dialogs using prerecorded voice and DTMF
- Assignment 4: Help and event handling
- Assignment 5: Voice Survey
- Assignment 6: Playback controller
- Assignment 7: Semantic interpretation (SI)
- Assignment 8: Usability test
- Team Report 1: Define project application requirements OR submit proposal for an alternative project
- Team Report 2: Select multimodal platform
- TBD
- Team Report 3: Usability test results
- Submit intention to submit an application to the AVIOS student application contest, www.avios.org
- URL for the Team Project Web page due

Assignments

1. First IVR application
   a. Go to http://evolution.voxeo.com/ and create an account.
   b. Download and install Voxeo Prophecy VoiceXML.
   c. Go to the Quick Start Guide: https://evolution.voxeo.com/docs/quickStart.jsp.
   d. Select VoiceXML Tutorial 1 and implement the Hello World plus Voice Recognition application: http://docs.voxeo.com/voicexml/2.0/frame.jsp?page=t_1.htm
   e. Use the application by calling Prophecy from your telephone.
   f. Be prepared to demonstrate your application in class.

2. Audio books and Speech Synthesis Markup Language
   a. Copy the text of the fairy tale “Goldilocks and the Three Bears” from http://www.dltk-teach.com/rhymes/goldilocks_story.htm.
   b. Insert the SSML commands <break> and <emphasis> and the VoiceXML command <prompt xml:lang="x"> to create a multiple-voiced talking book. For xml:lang values, see http://docs.voxeo.com/prophecy/11.0/frame.jsp?page=migratetts.html.
   c. Create a VoiceXML application that reads the fairy tale to a telephone user.
   d. Be prepared to demonstrate your application in class.

3. Dialogs using prerecorded voice and DTMF
   a. Develop an IVR application for use by people who have access only to a basic cell phone and for whose language neither ASR nor TTS is available.
   b. Assume the users speak your native language (or “Pig Latin” if you speak only English).
   c. Pre-record voice prompts spoken in your native language.
   d. Use DTMF to collect the user’s response to each prompt.
   e. Be prepared to demonstrate your application in class.

4. Help and event handling
   a. Modify the basic playback controller to support event handlers (help, noinput, nomatch) for each command.
   b. Identify two commands which the ASR sometimes confuses (e.g., “Boston” and “Austin”) and implement techniques to disambiguate the commands.
   c. Be prepared to demonstrate your application in class.

5. Voice Survey
   a. Create a paper form with 10 survey questions. Each question can be answered by using a Likert scale (http://en.wikipedia.org/wiki/Likert_scale).
   b. Debug the wording of the instructions and questions by asking a couple of friends to complete the survey and provide critiques.
   c. Implement an IVR version of the survey.
   d. Ask at least 5 friends to take the IVR survey. Watch them and record any troubles or problems they encounter.
   e. Fix the problems and repeat step d.
   f. Be prepared to demonstrate your application in class.
   g. Explain your use of “tapered prompts.”

6. Basic playback controller (specify a grammar)
   a. Create a state transition diagram illustrating the user commands to a playback controller (start, stop, fast forward, pause, resume, rewind, etc.).
   b. Follow the methodology in “European Command Vocabulary” (see the appendix) by (1) spontaneous generation of potential command words, (2) rating the confidence of potential command words, and (3) validating that the command words can be accurately recognized by the ASR.
   c. Implement the basic playback controller using Voxeo Prophecy.
   d. Be prepared to demonstrate your application in class.

7. Semantic interpretation
   a. Write an IVR user interface that prompts users for airline reservation information and constructs a command in a format usable by the underlying airline reservation system.
   b. Assume the ASR produces commands in the following format (a Java structure):
      {
        FlightNumber: XXXX
        DepartureDate: YYMMDD
        DepartureTime: HHMM (use the twenty-four hour clock)
        DepartingAirportCode: PDX
        DestinationAirportCode: LAX
      }
   c. Be prepared to demonstrate your application in class.

8. Usability test
   a. Design and conduct a usability test involving at least ten users.
   b. Identify problems discovered and suggest solutions for resolving the problems.

Team project: Hands-busy, eyes-busy multimodal application

Team Report 1: Define project application requirements
   a. Review the paper by Larson on “Eyes-Busy, Hands-Busy” (Eyes BusyV2.docx in http://www.larson-tech.com/PSU).
   b. Select an appropriate eyes-busy, hands-busy application.
   c. Design the application by specifying usage scenarios, content model, and control model.

Team Report 2: Select multimodal platform
   a. Download and install three potential multimodal platforms.
   b. Validate that each multimodal platform works as advertised.
   c. Identify the advantages and disadvantages of each platform.
   d. Select one of the platforms to implement the team project.
   e. Document your results.

Team Report 3: Conduct usability tests
   a. Implement your application on the selected platform.
   b. Conduct usability tests and modify your application as necessary.
   c. Submit detailed descriptions of your usability tests, their results, and the resulting changes to your application.

Project demonstration
   a. Demonstrate your project in class.
   b. Identify tasks which other students in the class will perform using your application.
   c. Submit the URL for your application’s web page to your instructor. See Team Project Web page.doc for details.
   d. Project evaluation criteria will include the following:
      i. To get credit, the application must work.
      ii. The application must support the scenarios, conceptual model, and content model described in Team Report 1.
      iii. Other criteria include robustness, usefulness, technical superiority, user friendliness, innovation, and creativity of your application.

Propose your own project

Instead of implementing the project proposed by the instructor, each team can propose an alternative project by submitting a proposal to the instructor. The proposal should include:
   a. Team members’ names and e-mail addresses
   b. Goal of the application
   c. Intended user of the application
   d. Design of the application, specifying usage scenarios, content model, and control model
   e. Software and hardware you plan to use to implement the application
   f. Submit the proposal to the instructor before 11:59 a.m. Wednesday, November 13. Do not begin implementation before the instructor approves your project.
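
Illustration for Assignment 7 (command format)

The command structure in Assignment 7 fixes the field names and the date and time formats (YYMMDD and HHMM on the twenty-four hour clock). The following is only a rough sketch of that mapping, written in Python for readability; in the assignment itself the mapping would be part of your VoiceXML application. The class name FlightCommand, the function build_flight_command, and the sample values are hypothetical and are not part of the assignment.

```python
from dataclasses import dataclass, asdict
from datetime import date, time


@dataclass
class FlightCommand:
    """Field names follow the structure shown in Assignment 7."""
    FlightNumber: str
    DepartureDate: str           # YYMMDD
    DepartureTime: str           # HHMM, twenty-four hour clock
    DepartingAirportCode: str
    DestinationAirportCode: str


def build_flight_command(flight_number, departure_date, departure_time,
                         departing_airport, destination_airport):
    """Hypothetical helper: turn values collected from the caller
    into the command format described in the assignment."""
    return FlightCommand(
        FlightNumber=str(flight_number),
        DepartureDate=departure_date.strftime("%y%m%d"),
        DepartureTime=departure_time.strftime("%H%M"),
        DepartingAirportCode=departing_airport.upper(),
        DestinationAirportCode=destination_airport.upper(),
    )


# Sample values are made up for illustration only.
command = build_flight_command("1234", date(2013, 12, 4), time(18, 5),
                               "pdx", "lax")
print(asdict(command))
```

Keeping the formatting in one place means the prompts can collect dates and times in whatever form is natural for the caller, while the reservation system always receives the same fixed format.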