Component Description
Multimodal Interface
Carnegie Mellon University
Prepared by: Michael Bett mbett@cs.cmu.edu
3/26/99
Description of the Multimodal Toolkit (MMI)
What MMI is ...
Integrated Speech, Handwriting, and Gesture Recognizers
Java-Based API
Integrated Recording Feature
Plug-n-Play Recognizer Interface: allows recognizers to be replaced
Internet-Enabled Interface: recognizers may run remotely over the internet (see the sketch below)
Simultaneous Multiple User Support
Supports Natural Interface Development
• MMI is a toolkit that allows multiple modalities to be easily integrated into applications.
• Applications can mix modalities (speech, gesture, and handwriting).
[Architecture diagram: a Multimodal Applet sends speech, handwriting, and gesture input to the Multimodal Server, which routes each stream to the Janus speech recognizer, the handwriting recognizer, and the gesture recognizer.]
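To make the plug-n-play, internet-enabled recognizer interface concrete, here is a minimal sketch of a recognizer process registering with the Multimodal Server over a TCP socket. The host name, port, and line-oriented handshake are assumptions for illustration only; this document does not specify the actual MMI wire protocol.

    // Hypothetical sketch: a recognizer registers with the Multimodal Server.
    // Host, port, and message format are illustrative assumptions.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;

    public class RecognizerClient {
        public static void main(String[] args) throws Exception {
            // The server may run anywhere on the internet (address assumed).
            try (Socket socket = new Socket("mmi-server.example.edu", 7000)) {
                PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream()));

                // Announce the modality this recognizer handles so the server
                // can plug it into the pipeline in place of another recognizer.
                out.println("REGISTER modality=handwriting");
                System.out.println("Server replied: " + in.readLine());
            }
        }
    }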
Sample Application Which Uses Multimodal: Error Repair
[Diagram: the error-repair application; the speech recognizer draws on an acoustic model, a vocabulary, and a language model.]
The Java-based API communicates directly with each recognizer.
The multimodal applet is the user interface; the applet window presents a view onto a domain-dependent representation of application data and state, in the form of objects to be manipulated.
The following modalities have the following levels of support in the multimodal toolkit, for two types of task (data entry and command):

Speech
Handwriting
Pen gestures
3-D gestures (experimental)
Lip-reading (experimental)
Gaze tracking (experimental)
Keyboard
Mouse
Facial expressions (experimental)

Table 1. Supported Applications. Support levels range from strongly supported, through supported, to not precluded; entries marked experimental are still under development.
The user defines their grammar using six probabilistically weighted node types (a construction sketch follows this list):
A Toplevel represents an entire input model and contains one or more sequences, each of which contains exactly one AFrame;
An AFrame represents an action frame and contains one or more sequences, each of which consists of one or more PSlots;
A PSlot represents a parameter slot and contains one or more UnimodalNodes (at most one for each input modality);
A UnimodalNode specifies a sub-grammar for a single input modality and has the same structure as a NonTerm, with the addition of a label specifying the modality;
A NonTerm is a non-terminal node consisting of one or more sequences, each of which contains zero or more NonTerms or Literals;
A Literal is a terminal node containing a text string representing one or more input tokens.
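The sketch below shows how these six node types might be assembled into a grammar for the distance query used later in this document ("how far is it from here to there"). The class definitions are simplified stand-ins written for illustration; they are not the actual MMI API.

    import java.util.Arrays;
    import java.util.List;

    // Simplified stand-ins for the six node types (not the real MMI classes).
    class Node {
        final String name;
        final double weight;          // probabilistic weight on this node
        final List<Node> children;
        Node(String name, double weight, Node... children) {
            this.name = name;
            this.weight = weight;
            this.children = Arrays.asList(children);
        }
    }
    class Literal extends Node {      // terminal: one or more input tokens
        Literal(String tokens) { super("Literal:" + tokens, 1.0); }
    }
    class NonTerm extends Node {      // sequences of NonTerms or Literals
        NonTerm(double w, Node... seq) { super("NonTerm", w, seq); }
    }
    class UnimodalNode extends Node { // sub-grammar for a single modality
        UnimodalNode(String modality, double w, Node... seq) {
            super("Unimodal:" + modality, w, seq);
        }
    }
    class PSlot extends Node {        // at most one UnimodalNode per modality
        PSlot(String slot, double w, Node... perModality) {
            super("PSlot:" + slot, w, perModality);
        }
    }
    class AFrame extends Node {       // action frame of PSlots (sequences elided)
        AFrame(String frame, double w, Node... slots) {
            super("AFrame:" + frame, w, slots);
        }
    }
    class Toplevel extends Node {     // entire input model
        Toplevel(Node... frames) { super("Toplevel", 1.0, frames); }
    }

    public class GrammarExample {
        public static void main(String[] args) {
            // "how far is it from <Src> to <Dst>": each location slot can be
            // filled by speech ("here"/"there") or by a pen gesture.
            PSlot src = new PSlot("Src", 0.5,
                    new UnimodalNode("speech", 0.5, new Literal("here")),
                    new UnimodalNode("pen", 0.5, new Literal("arrow_start")));
            PSlot dst = new PSlot("Dst", 0.5,
                    new UnimodalNode("speech", 0.5, new Literal("there")),
                    new UnimodalNode("pen", 0.5, new Literal("arrow_end")));
            PSlot cmd = new PSlot("Cmd", 1.0,
                    new UnimodalNode("speech", 1.0,
                            new Literal("how far is it from")));
            Toplevel grammar = new Toplevel(
                    new AFrame("QueryDistance", 1.0, cmd, src, dst));
            System.out.println("Built grammar rooted at: " + grammar.name);
        }
    }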
The Multimodal Server sends a series of points to the pen and gesture recognizers. The audio is sent to the speech recognizer. The pen, gesture, and speech recognizers return their hypotheses to the multimodal toolkit, which is responsible for integrating the results in an optimizing search, as shown below [Minh Tue Vo, PhD dissertation, CMU, 1998].
[Figure: Output Path Over Multidimensional Inputs. The SPEECH hypothesis "how far is it from here to there" and the PEN hypothesis (arrow_start, arrow_end) are aligned; the optimal path through the joint input space yields the interpretation Query Distance with source (Src) and destination (Dst) slots.]
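As a toy illustration of the integration step, the sketch below binds the deictic words in the speech hypothesis ("here", "there") to the pen events closest in time, producing the joint hypothesis shown in the figure above. The real toolkit performs an optimizing search over the weighted multimodal grammar [Vo 1998]; this simple timestamp-based binding is only a stand-in.

    import java.util.ArrayDeque;
    import java.util.Queue;

    public class IntegrationSketch {
        // A recognized token with its timestamp in seconds (illustrative).
        static final class Token {
            final String text;
            final double time;
            Token(String text, double time) { this.text = text; this.time = time; }
        }

        public static void main(String[] args) {
            Token[] speech = {
                new Token("how", 0.0), new Token("far", 0.2), new Token("is", 0.4),
                new Token("it", 0.5), new Token("from", 0.7), new Token("here", 1.0),
                new Token("to", 1.3), new Token("there", 1.6)
            };
            Queue<Token> pen = new ArrayDeque<>();
            pen.add(new Token("arrow_start", 0.9));
            pen.add(new Token("arrow_end", 1.5));

            // Replace each deictic word with the next unconsumed pen event,
            // provided the two fall within a one-second window.
            StringBuilder joint = new StringBuilder();
            for (Token word : speech) {
                boolean deictic = word.text.equals("here") || word.text.equals("there");
                if (deictic && !pen.isEmpty()
                        && Math.abs(pen.peek().time - word.time) < 1.0) {
                    joint.append(pen.poll().text);
                } else {
                    joint.append(word.text);
                }
                joint.append(' ');
            }
            // Prints: how far is it from arrow_start to arrow_end
            System.out.println("Joint hypothesis: " + joint.toString().trim());
        }
    }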
The multimodal toolkit provides a Java API that allows applets or applications to incorporate multimodal functionality, as sketched below.
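A minimal usage sketch follows. The class and method names (MultimodalClient, connect, loadGrammar, addListener) are hypothetical stand-ins chosen for illustration; consult the MMI API documentation for the real interface.

    public class MultimodalDemo {
        // Assumed shape of the callback delivering integrated hypotheses.
        interface HypothesisListener {
            void onHypothesis(String jointResult);
        }

        // Stand-in for the real MMI client API (stubbed out here).
        static class MultimodalClient {
            void connect(String host, int port)    { /* open server connection */ }
            void loadGrammar(String path)          { /* send multimodal grammar */ }
            void addListener(HypothesisListener l) { /* register callback */ }
        }

        public static void main(String[] args) {
            MultimodalClient client = new MultimodalClient();
            client.connect("localhost", 7000);            // Multimodal Server
            client.loadGrammar("distance-query.grammar"); // hypothetical file
            client.addListener(result ->
                    System.out.println("Integrated hypothesis: " + result));
        }
    }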
Part 1 - Specify how other CPOF components can send and receive data to your system - Please be explicit
Components may interface directly with the Multimodal Server
Part 2 - What are the inputs to your system - Please specify formats and protocol - provide details
Multimodal grammar
Part 3 - What are the outputs of your system -
Please specify format and protocol - provide details
Hypotheses according to the multimodal grammar
Part 1 - Please present a diagram that shows how your components interact with other CPOF components. We have not currently identified how our components interact with other CPOF components. TBD
Part 2 - Are there components in your system that are functionally “similar” to another CPOF component? TBD
Part 3 - Are any of your components complementing other CPOF components? (e.g., ZUI and Sage/Visage) TBD
Component Name      Required Hardware   Operating System              Language    Required COTS
Multimodal Server   PC or Sun           Independent                   Java        JDK 1.1.*
Janus               Sun Ultra 60        Solaris 2.5.1                 Tcl/Tk, C   Tcl/Tk
NPen++              Sun or PC           Solaris 2.5.1 or Windows NT   C++         None
Gesture Recognizer  Sun or PC           Solaris 2.5.1 or Windows NT   C++         None
Specify the hardware required to support your system:
MMI can run on a PC with a minimum of 32 MB of RAM and a 200 MHz processor.
The speech recognizer requires a dual-processor Sun Ultra 60 with a minimum of 500 MB of RAM. (The recognizer currently under development will require a 500 MHz Pentium III with 128 MB minimum, 256 MB preferred.)
Video capture cards, SoundBlaster-compatible sound cards, tabletop and lapel microphones, and pan-tilt and stationary cameras are also required.