CS 410/510 SLA-Spoken Language Interfaces

Fall Quarter 2013
Syllabus
(Updated 29 August 2013)
Wednesdays, 19:00-21:40
Location: Willow Creek
October 2 - December 4, 2013
Final Exam: December 22, 2013, 18:00-21:40 (during finals week)
Instructor:
Jim Larson, PhD
jim@larson-tech.com
(503) 629-5167
Course Motivation
Speech technologies, such as automatic speech recognition (ASR) and text-to-speech synthesis
(TTS), enable users to speak and listen to computer applications. Interacting with computer
applications using speech technologies is useful when the user’s hands and/or eyes are busy
(e.g., when driving), when it is inconvenient to use a keyboard (e.g., in cold temperatures or
while walking), when a usable keyboard is not available (e.g., on small smartphones), and for
users with disabilities who are unable to use a keyboard.
While it is easy to create a speech-enabled application, it is difficult to create an easy-to-learn
and easy-to-use speech application. This course answers fundamental speech user interface
questions, including:
- How to involve users in every stage of the design and implementation of speech-enabled
  applications
- How to enable the computer to listen to users by writing speech grammars used by a
  speech recognizer
- How to enable the computer to speak to users by preparing text prompts which are
  converted to speech by a speech synthesizer or are prerecorded by professional voice
  actors
- How to write error handlers that deal with events such as no response by the user,
  unrecognizable words, and requests for help
- How to choose and implement the appropriate speech dialog style
- How to create new speech applications by reusing pieces of existing speech applications
This course summarizes current research and presents numerous guidelines, suggestions, and
conventional wisdom for designing easy-to-learn, easy-to-use, and effective voice user
interfaces.
Course Goal
Prepare class members to design, construct, and evaluate speech application user interfaces.
Class members will use state-of-the-art languages and development tools to design, create, and
evaluate a speech application, including:
- Interactive voice response (IVR) interfaces, in which the application speaks prompts to the
  user and the user responds by speaking the answers to the prompts. These applications
  are “voice only” and work only on telephones. They do not use the display available on
  today’s smartphones.
- Multimodal user interfaces, in which the user speaks, touches, and taps, and the
  computer responds by displaying information on a screen and speaking to the user.
  With the introduction of smartphones, multimodal applications that provide both visual
  and verbal user interfaces are becoming popular.
Class Organization
This course will consist of a combination of several activities:
1. Lectures: Summarize state-of-the-art practices in constructing spoken language
applications. There are no texts required for this class. Class lecture notes may be
downloaded from http://www.larson-tech.com/PSU.
2. Quizzes: There will be a quiz at the beginning of each class. Each quiz will cover the
material from the previous lecture. Quizzes are designed to encourage students to
review the material from the previous class and to provide students with the types of
questions that might appear on the midterm and final exams. We will review the
answers to quizzes during class. Quizzes will not be turned in for grading.
3. Assignments: Each assignment is designed to apply lecture material to actual
speech-enabled applications and will have a specific goal. Assignments are due on a weekly
basis. Assignments should be completed by each student individually, not by student teams.
See the schedule at the end of this syllabus for the exact dates of each assignment. Each
assignment is worth 10 points.
4. Team Project: Design and implement a speech-enabled application. The best way to
learn is to do. Class members will form teams during the first class to work jointly on a
project. Because some speech recognition systems have difficulty recognizing the
speech of non-native English speakers, or speakers with non-American accents, each
team should include at least one native English speaker. The project will consist of
several reports (each worth 10 points) and a demonstration (worth 50 points). All
members of a team will receive the same number of points for each report and for the
demonstration. See the schedule at the end of this syllabus for the exact due dates for
each team report and demonstration. You may submit your project to the AVIOS
Student Contest and possibly win a cash prize of $1000 and a subsidy of up to $500 for
travel and lodging to the Mobile Voice Conference in San Francisco
(http://mobilevoiceconference.com/).
Sometimes problems arise among team members. You are all adults; I expect you to
resolve intra-team problems. A team may eject a team member who does not
do his/her share. Ejected team members will be expected to complete the project by
themselves.
5. Exams: There will be a midterm and a final exam. Both will be closed book. See the
schedule at the end of this syllabus for the exact dates for the midterm and final exams.
Prerequisite
Students will need an understanding of HTML and JavaScript. In addition, an understanding of
how to develop applications on smartphones will be useful. Students may need access to a
smartphone, either Android or iPhone, in order to demonstrate group projects.
Grading
This course is designed to achieve specific learning outcomes. Performance assessment
depends upon the accomplishment of these outcomes; grades are not “given” but are
“earned.” Class members are graded on demonstrated knowledge and competence rather than
on effort alone. It is the responsibility of the class member to come to class and/or participate
in each group project fully prepared and eager to contribute to the individual and collective
learning experience.
Percentage of possible points    Grade
95%-100%                         A
90%-94.9%                        A-
87%-89.9%                        B+
83%-86.9%                        B
80%-82.9%                        B-
77%-79.9%                        C+
73%-76.9%                        C
70%-72.9%                        C-
Students earning less than 70% of the possible points will not receive a passing grade.
The instructor may lower the percentages (for example, if everyone does poorly on their final
exam). Thus, there WILL be some final grades of A, and it is possible that everyone will earn a
final grade of A.
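The grading scale can be expressed as a small lookup. The following is a minimal Python sketch, assuming that each range's lower bound acts as the cutoff (so a score between 94.9% and 95% falls to A-) and that the grades run A, A-, B+, B, B-, C+, C, C- down the eight ranges. The names GRADE_FLOORS and letter_grade are illustrative and are not part of the course materials, and the sketch does not model the instructor's option to lower the percentages.

```python
# Lower bounds (percent of possible points) for each passing grade, highest first.
# Assumption: grades map onto the eight listed ranges in the order A, A-, ..., C-.
GRADE_FLOORS = [
    (95.0, "A"),
    (90.0, "A-"),
    (87.0, "B+"),
    (83.0, "B"),
    (80.0, "B-"),
    (77.0, "C+"),
    (73.0, "C"),
    (70.0, "C-"),
]


def letter_grade(points_earned, points_possible):
    """Return the letter grade, or None when the score is below 70% (not passing)."""
    if points_possible <= 0:
        raise ValueError("points_possible must be positive")
    percent = 100.0 * points_earned / points_possible
    # Walk the floors from highest to lowest and return the first one reached.
    for floor, grade in GRADE_FLOORS:
        if percent >= floor:
            return grade
    return None
```

For example, a student with 166 of 200 possible points has 83% and would receive a B under these assumptions.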
Submitting Assignments
Copy and paste the assignment report and code “instream” into an email message and send it to
omitfo49@gmail.com. Include the following in the subject line: Assignment <n> from <your
last name>.
Submitting Projects
Copy the URL describing your team project into an email message and send it to
omitfo49@gmail.com. Include the following in the subject line: Assignment <n> from <the last
names of each team member>.
Late Assignments and Projects
Assignments are due before 11:59 a.m. on the day they are due. No late assignments will be
accepted or graded. I encourage students to complete and submit assignments and projects
ahead of time.
Instructor Availability
Normally, I am available from 9:00 a.m. to 12:00 Noon Pacific Standard Time (PST), Monday
through Friday. The best and quickest way to contact me is via email. Should you need to make
other arrangements than the above to contact me, please do not hesitate to contact me at your
earliest convenience. I will work with you to make other arrangements.
Please note that I provide you with these times to make it easier to communicate with me, not
to limit our contact. If you should need to contact me outside these timeframes, please do not
hesitate to do so.
Email and Email List
The instructor’s email address is jim@larson-tech.com. Please use this address for personal
questions only.
The course email group address is CS510SLA@googlegroups.com. Use this email list for all
general questions and comments. The instructor will also send messages and important
notices using the course email address. During the first class, I will collect email addresses and
invite you to join this group.
Use the submission email address omitfo49@gmail.com to submit assignments and projects.
Texts
There is no textbook for this course. Class lecture notes may be downloaded from
http://www.larson-tech.com/PSU.
Course Schedule
Assignments and reports are due before 11:59 a.m. on these dates:
October 2
October 9
October 16
October 23
October 30 (Halloween)
November 6
November 13
November 20
November 27 (day before Thanksgiving)
December 4
December 11

In-class lectures, discussions, exams, and demonstrations:
Speech Technologies V14.ppt
VXML_More.ppt
Voice Guidelines.pptx
Designing VUI.ppt
Choosing Voice Applications.pptx
Testing3.ppt
Choosing Multimodal Applications.ppt
Selecting Multimodal Platform.ppt
Midterm Exam
Team Project Web page.doc
TBD
TBD
Project demonstrations
Final Exam

Assignments and reports:
Assignment 1: First IVR application
Assignment 2: Audio books and Speech Synthesis Markup Language (SSML)
Assignment 3: Dialogs using prerecorded voice and DTMF
Assignment 4: Help and event handling
Assignment 5: Voice Survey
Assignment 6: Playback controller
Assignment 7: Semantic interpretation (SI)
Assignment 8: Usability test
Team Report 1: Define project application requirements OR submit proposal for an alternative project
Team Report 2: Select multimodal platform
TBD
Team Report 3: Usability test results
Submit intention to submit an application to the AVIOS student application contest, www.avios.org
URL for the Team Project Web page due
Assignments
1. First IVR application.
a. Go to http://evolution.voxeo.com/ and create an account.
b. Download and install Voxeo Prophecy VoiceXML.
c. Go to Quick Start Guide https://evolution.voxeo.com/docs/quickStart.jsp.
d. Select VoiceXML Tutorial 1 and implement the Hello World plus Voice
Recognition application.
http://docs.voxeo.com/voicexml/2.0/frame.jsp?page=t_1.htm
e. Use the application by calling Prophecy by using your telephone.
f. Be prepared to demonstrate your application in class.
2. Audio books and Speech Synthesis Markup Language
a. Copy the text of the fairy tale “Goldilocks and the Three Bears” from
http://www.dltk-teach.com/rhymes/goldilocks_story.htm.
b. Insert SSML commands <break> and <emphasis> and the VoiceXML command
<prompt xml:lang="x"> to create a multiple-voiced talking book. For xml:lang
values, see
http://docs.voxeo.com/prophecy/11.0/frame.jsp?page=migratetts.html.
c. Create a VoiceXML application that reads the fairy tale to a telephone user.
d. Be prepared to demonstrate your application in class.
3. Dialogs using prerecorded voice and DTMF
a. Develop an IVR application for use by people who only have access to a basic cell
phone and for which there is neither ASR nor TTS available for the language they
speak.
b. Assume the users speak your native language (or “pig-Latin” if you speak only
English).
c. Pre-record voice prompts spoken in your native language.
d. Use DTMF to collect the user’s response to each prompt.
e. Be prepared to demonstrate your application in class.
4. Help and event handling
a. Modify the basic playback controller to support event handlers (help, noinput,
nomatch) for each command.
b. Identify two commands which the ASR sometimes confuses (e.g., “Boston” and
“Austin”) and implement techniques to disambiguate the commands.
c. Be prepared to demonstrate your application in class.
5. Voice Survey
a. Create a paper form with 10 survey questions. Each question can be answered
by using a Likert scale (http://en.wikipedia.org/wiki/Likert_scale).
b. Debug the wording of the instructions and questions by asking a couple of
friends to complete the survey and provide critiques.
c. Implement an IVR version of the survey.
d. Ask at least 5 friends to take the IVR survey. Watch them and record any
troubles or problems they encounter.
e. Fix the problems and repeat step d.
f. Be prepared to demonstrate your application in class.
g. Explain your use of “tapered prompts.”
6. Basic playback controller (specify a grammar)
a. Create a state transition diagram illustrating the user commands to a playback
controller (start, stop, fast forward, pause, resume, rewind, etc.).
b. Follow the methodology in “European Command Vocabulary” (see the appendix)
by (1) spontaneous generation of potential command words, (2) rating the
confidence of potential command words, and (3) validating that the command
words can be accurately recognized by the ASR.
c. Implement the basic playback controller using Voxeo Prophecy.
d. Be prepared to demonstrate your application in class.
7. Semantic interpretation
a. Write an IVR user interface that prompts users for airline reservation
information and constructs a command in a format usable by the underlying airline
reservation system.
b. Assume the ASR produces commands as a Java structure in the following format:
{
FlightNumber: XXXX
DepartureDate: YYMMDD
DepartureTime: HHMM (use the twenty-four hour clock)
DepartingAirportCode: PDX
DestinationAirportCode: LAX
}
c. Be prepared to demonstrate your application in class.
8. Usability test
a. Design and conduct a usability test involving at least ten users.
b. Identify problems discovered and suggest solutions for resolving the problems.
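For Assignment 7, the target command structure can be assembled from the values collected in the dialog. The sketch below is written in Python purely for illustration. The function name build_reservation_command and the sample flight number 1234 are hypothetical; the field names and formats follow the structure given in step b of that assignment.

```python
from datetime import datetime


def build_reservation_command(flight_number, departure, origin, destination):
    """Assemble collected slot values into the structure described in Assignment 7."""
    # Assumption: airport codes are three-letter codes such as PDX or LAX.
    for code in (origin, destination):
        if len(code) != 3 or not code.isalpha():
            raise ValueError(f"expected a three-letter airport code, got {code!r}")
    return {
        "FlightNumber": str(flight_number),
        "DepartureDate": departure.strftime("%y%m%d"),  # YYMMDD
        "DepartureTime": departure.strftime("%H%M"),    # HHMM, twenty-four hour clock
        "DepartingAirportCode": origin.upper(),
        "DestinationAirportCode": destination.upper(),
    }


# Hypothetical example: flight 1234 from PDX to LAX on 27 November 2013 at 18:05.
command = build_reservation_command("1234", datetime(2013, 11, 27, 18, 5), "pdx", "lax")
```

Validating the slot values before building the structure gives the dialog a natural place to re-prompt the user when a recognized value does not fit the expected format.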
Team project: Hands-busy, eyes-busy multimodal application
Team Report 1: Define project application requirements
a. Review the paper by Larson on “Eyes-Busy, Hands-Busy” (Eyes BusyV2.docx in
http://www.larson-tech.com/PSU).
b. Select an appropriate eyes-busy, hands-busy application.
c. Design the application by specifying usage scenarios, content model, and control
model.
Team Report 2: Select multimodal platform
a. Download and install three potential multimodal platforms.
b. Validate that the multimodal platform works as advertised.
c. Identify the advantages and disadvantages of each platform.
d. Select one of the platforms to implement the team project.
e. Document your results.
Team Report 3: Conduct usability tests
a. Implement your application on the selected platform.
b. Conduct usability tests and modify your application as necessary.
c. Submit detailed descriptions of your usability tests, their results, and the
resulting changes to your application.
Project demonstration
a. Demonstrate your project in class.
b. Identify tasks which other students in the class will perform using your
application.
c. Submit the URL for your application’s web page to your instructor. See Team
Project Web page.doc for details.
d. Project evaluation criteria will include the following:
i. To get credit, the application must work.
ii. The application must support the scenarios, conceptual model, and content
model described in Team Report 1.
iii. Other criteria include robustness, usefulness, technical superiority, user
friendliness, innovation, and creativity of your application.
Propose your own project.
Instead of implementing the project proposed by the instructor, each team can propose an
alternative project by submitting a proposal to the instructor. The proposal should include:
a. Team members’ names and e-mail addresses.
b. Goal of the application.
c. Intended user of the application.
d. Design of the application, specifying usage scenarios, content model, and control
model.
e. Software and hardware you plan to use to implement the application.
f. Submit the proposal to the instructor before 11:59 a.m. Wednesday, November
13. Do not begin implementation before the instructor approves your project.