Prosiect Adnabod Lleferydd Sylfaenol (Basic Speech Recognition Project): Initial findings and recommendations

Project Materials

Here I list all the project materials I have found to date.

Locations:
- N:\egymraeg\Lleferydd\
- N:\rhys\
- http://bedwyrvm.bangor.ac.uk/svn/adnabodlleferydd/

Found:
- recordings of 45 speakers (on bedwyrvm)
- recording wordlists
- various admin documents
- planning documents:
  o RJJ: the initial 8-step plan; the August speech recognition recordings report
  o BJW: a more recent plan sketch (speechrectasks)

My understanding of the work breakdown

1. Write Overview Documents

An overview document should specify explicitly what kind of application is envisaged and how the user is expected to use it. I have not found any such documentation in the project materials; the nearest is the list of words recorded. Here are some provisional suggestions. Once launched, the speech recognition application runs in the background providing the following services:
- an application launcher for a clearly defined set of applications (list apps);
- a speech interface for a clearly defined subset of generic commands (e.g., open, close, cut, copy, paste);
- a comprehensive speech interface for a clearly defined set of applications (e.g., calculator).

A functional requirements document should specify a mapping between speech input commands and system commands.

2. Immediate Tasks

These tasks have no prerequisites other than the functional requirements mapping.

2.1. Build Acoustic Model

Building the Acoustic Model (AM) is the most significant part of the project. The first step is to collect a corpus of training data. Data have been collected from 45 speakers, and we also have at our disposal speech data from the WISPR and Lexicelt projects. Most (perhaps almost all) of this data is unlabelled. Training an AM from this data can be done iteratively. This is an outline of the process:
- initialisation:
  o human experts label a subset of the data;
  o the machine "fake labels" the rest of the data.
- cycle:
  o the machine trains an AM from the data;
  o the machine uses the AM to auto-label the data;
  o human experts correct a subset of the labelling.
When AM errors fall below some threshold, the AM can be slotted into the recogniser.

Size of task: long
Champion: IAU
Consultant: BJW

2.2. Build Language Model

The Language Model (LM) is a relatively simple component of the recogniser. It specifies the expected speech input (or, equivalently, the textual output) of the recogniser. The main technical challenge here is that the Sphinx LM builder is available only under a noncommercial licence, so some creativity may be required.

Size of task: 10 days
Champion: IAU

2.3. Build User Interface

As the recogniser is a completely separate module, the user interface (UI) can be designed and built independently, using a dummy recogniser (e.g., whatever the user says, the recogniser emits the signal "Launch Firefox"). Here are some provisional suggestions. For the graphical user interface (GUI), all that is required is an on/off switch and a list of the commands that are recognised. The voice user interface (VUI) must do the following:
- stream speech to the recogniser;
- receive signals from the recogniser;
- run system commands depending on received signals;
- provide visual feedback (e.g., display the command, or "not recognised", in the GUI).
The recogniser used by the user interface should be updated regularly as part of the AM building iteration outlined above; for example, every Sunday night a new version of the UI is compiled with the current recogniser. A rough sketch of the dummy-recogniser approach is given below.
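To make the dummy-recogniser idea concrete, here is a minimal Python sketch. It is an illustration only: the names DummyRecogniser, COMMAND_MAP and the "LAUNCH_FIREFOX" signal, and the launch commands themselves, are placeholders I have invented here; the real signal set and signal-to-command mapping will come from the functional requirements document.

    import subprocess

    # Placeholder mapping from recogniser signals to system commands; the
    # real mapping will be taken from the functional requirements document.
    COMMAND_MAP = {
        "LAUNCH_FIREFOX": ["firefox"],
        "LAUNCH_CALCULATOR": ["gnome-calculator"],
    }

    class DummyRecogniser:
        """Stand-in for the real recogniser: it ignores the audio and always
        emits the same signal, so the UI can be built and tested before the
        acoustic and language models are ready."""
        def recognise(self, audio_chunk):
            return "LAUNCH_FIREFOX"

    def handle_speech(audio_chunk, recogniser):
        """One pass of the VUI loop: pass speech to the recogniser, receive
        its signal, run the mapped system command, give visual feedback."""
        signal = recogniser.recognise(audio_chunk)
        command = COMMAND_MAP.get(signal)
        if command is None:
            print("not recognised")    # visual response shown in the GUI
            return
        print(signal)                  # display the recognised command
        subprocess.Popen(command)      # run the corresponding system command

    if __name__ == "__main__":
        # Audio capture is out of scope for this sketch; an empty chunk
        # stands in for the speech stream coming from the VUI.
        handle_speech(b"", DummyRecogniser())

Swapping DummyRecogniser for the real recogniser should then require no change to handle_speech, which is the point of building the UI against a dummy first.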
Size of task: 10 days
Champion: DBJ
Consultant: IAU (interface with recogniser)

3. Build Recogniser

Prerequisites: AM, LM

The recogniser component of the application includes the Sphinx-based speech recognition code and provides an interface between that code and the user interface (UI) component of the application. The recogniser must do the following:
- receive a speech stream from the UI;
- perform speech recognition on the speech stream;
- send signals to the UI.

Size of task: 10 days
Champion: IAU
Consultant: DBJ (interface with UI)

4. Build Final Release

Prerequisites: user interface, recogniser

Ensure that a version of the UI, with the best recogniser, is available for outside release.

Size of task: 2 days
Champion: DBJ

Proposals for the next six months

I suggest we commence work on all the Immediate Tasks in parallel, with the aim of having a usable application (even if with very poor recognition accuracy) as soon as possible. The Build Recogniser task requires a working AM, but it too should begin as soon as possible: we should not wait until the AM is finished.

Task                          Oct  Nov  Dec  Jan  Feb  Mar
Write overview document        #
Write requirements document    #
Review documents               #
Build Acoustic Model           >    >    /    >    >    #
Build Language Model           >    >    #
Build User Interface           >    >    #
Build Recogniser                              >    #
Build Final Release                                      #

Once we have a usable application, we can start giving demonstrations, press releases, etc.

Questions for meeting

- What are the important project dates? For example, what are the deadlines for progress reports?
- Is there any relevant material anywhere else, not covered under Project Materials above?
- Can we confirm the task assignments below?

Who?      What?                         When?            OK?
IAU       Write overview document       2008/10/06 Mon
IAU       Write requirements document   2008/10/06 Mon
ALL       Review documents              2008/10/13 Mon
IAU/BJW   Build Acoustic Model          Dec., Mar.
IAU       Build Language Model          Dec.
DBJ/IAU   Build User Interface          Dec.
IAU/DBJ   Build Recogniser              Feb.
DBJ       Build Final Release           Mar.