2. Speech & Language Technologies (SLT) Lab @ Bhrigus Inc

Speech & Language Technologies Lab @ Bhrigus Inc
Version: 1.0    Date: 15 Sept 05

Business Proposal for Speech and Language Technologies (SLT) at Bhrigus Inc.

Commercial Products using Speech & Language Technologies
Collaboration with universities in India & abroad
Open source initiatives to promote SLT in India

Authored by: Nixon Patel, Founder & CEO, Bhrigus Software Pvt Ltd.

Copyright Bhrigus Inc

TABLE OF CONTENTS

1. Role of Speech and Language technologies
2. Speech & Language Technologies (SLT) Lab @ Bhrigus Inc
   2.1 Practical Way: Where to Start with
   2.2 How to Start
3. Building TTS System Using Festival & Festvox
4. Building ASR System Using Sphinx
5. Open Source Initiative
6. Deliverables and End Product
7. University Collaboration

1. Role of Speech and Language technologies

Speech and language technologies have become an increasingly central component of Computer Science in the last decade. The goal of these technologies is to impart speech and language capabilities to computers so that human beings can interact with computers much as they communicate among themselves. The impact of these technologies on the general public can be seen in the daily use of automated voice response systems and search engines, which are built using speech technology, information retrieval and machine translation.
With the advent of the Internet and virtually unlimited storage, information is digitally stored, processed and communicated. Most information in the digital world is accessible only to those who can read or understand a particular language. Speech and language technologies can provide solutions in the form of natural interfaces, so that digital content can reach the masses and information can be exchanged among people speaking different languages. These technologies play a crucial role in multi-lingual societies such as India, where many people who cannot read or write nevertheless speak and understand more than one language. Hands-free, natural communication with computers and universal access to information (anywhere, in any language) are the key features of the coming era of pervasive computing. The theory, algorithms and implementation aspects of speech and language processing are understood well enough that practical applications are deployed in day-to-day life.

Worldwide Scenario

Most of the top universities in the US and Europe have specialized departments that teach and conduct research in the areas of speech and language. Carnegie Mellon University has the Language Technologies Institute (LTI), which provides specialization in speech and language technologies at both the undergraduate and graduate level. Apart from academia, companies such as Microsoft, IBM, Google and Nokia have ventured into these technologies. Speech recognition systems such as Dragon NaturallySpeaking, IBM ViaVoice and Sphinx, speech synthesis systems from AT&T, Cepstral and Rhetorical, and machine translation systems such as Systran are the outcomes of sustained effort by academia and industry. The expertise available on both the academic side and the industry side is so high that high quality speech-to-speech translation systems (involving speech to text, machine translation and text to speech) for new languages are built in a short span of 26 months.
Scenario in India

Academic labs in Indian universities have been teaching fundamental courses in speech technology and speech signal processing. However, neither the academic labs nor the industrial labs (apart from IBM-India in Hindi) have demonstrated large vocabulary continuous speech recognition systems in multiple Indian languages. As of today, there is no commercial product that uses text to speech, speech to text or machine translation in Indian languages. No streams in the academic institutions provide hands-on experience in building large vocabulary continuous speech recognition systems, text to speech systems or full-fledged machine translation systems. However, there has been a recent surge towards building ASR and TTS systems for Indian languages. Prototype ASR and TTS systems can be seen at HP Labs India and IIIT Hyderabad. Given the technology and the advancements in speech and language, development and deployment of such systems is feasible.

Thus it is a unique opportunity and the right time to start an industrial lab for speech and language technologies, collaborate heavily with academic institutions both in India and abroad, and avail the expertise available in multiple institutions. IIIT Hyderabad is uniquely positioned in this respect and is a budding place with a lot of speech and language activity taking place at two of its divisions: the Language Technologies Research Center and the MSIT Division.

2. Speech & Language Technologies (SLT) Lab @ Bhrigus Inc

One of the major goals of the speech lab @ Bhrigus Inc. is to develop commercial products using speech and language technologies with a specific focus on the Indian languages.
The role that could be played by the SLT lab can be split into the following tracks.

1. Development of speech & language resources for Indian languages
2. Commercial Products using Speech & Language Technologies
3. Collaboration with universities in India & abroad in the area of SLT
4. Open source initiatives to promote SLT in India and to build a large user base and application developer base in India
5. Innovation & Research for future products and applications

2.1 Practical Way: Where to Start with

Given the broader objectives and the goal of building products in speech & language technologies, the business model should be to develop products using proven technologies in unexplored domains. Development of speech recognition (speech to text) and speech synthesis (text to speech) systems for Telugu, Marathi and Gujarati is one of the possible starting points to venture into this area.

2.2 How to Start

There are three possible starting points:

1. Develop from scratch
2. Collaborate with existing companies and use their platform
3. Use open source software platforms

Approach (1) is a naïve way to start a company and is more suited to an academic lab. Approach (2) has the advantage of extensive support at both the technical and product level, but one has to take into account the upfront cost and the long-term cost of third-party software and its license. Another consideration is the license associated with derivatives (such as acoustic models) obtained using third-party software. Approach (3) is more suitable for risk-taking companies with a geeky bent of mind who can invest time and money to make products out of open source systems such as Sphinx and Festival & Festvox. Given sufficient background in speech technologies, these tools are approachable, versatile, and encapsulate state-of-the-art algorithms for building ASR and TTS systems.
Commercial companies such as LumenVox, Sun, AT&T, and Cepstral use these open source tools to build and fine-tune their speech recognition and synthesis products.

3. Building TTS System Using Festival & Festvox

The goal of a TTS system is to convert text to speech. Festival is a multi-lingual speech synthesizer, while Festvox is a set of scripts built around the Festival engine to build a new voice in a new or existing language. Festival is widely used by both academic and industrial labs, such as AT&T, to build high quality voices. To build a voice in a new language, the steps involved are as follows:

1. Defining the phone set, letter-to-sound rules and syllabification rules of the language
2. Selection of text to be recorded
3. Recording of the speech database
4. Labeling the speech database
5. Building the unit database using a clustering algorithm
6. Fine tuning parameters such as pitch markers, clustering the units by tagging them with more phonemic context, etc.

A high quality TTS voice requires a large text corpus, so that a set of sentences can be selected which has good coverage of high frequency words and diphones. The speech database required for this purpose is a single speaker's voice recorded in a studio environment, typically 5-10 hours of speech.

4. Building ASR System Using Sphinx

The goal of an ASR system is to convert speech to text. Sphinx is an open source ASR system built at Carnegie Mellon University, USA. This software has two main components: SphinxTrain and Sphinx Decoder. SphinxTrain generates the acoustic models, while Sphinx Decoder does the job of decoding, which is also referred to as recognition. Due to the complexity involved in building decoders, there are multiple versions of Sphinx Decoder. Sphinx-II, Sphinx3.2, Sphinx3.5 and Sphinx4 are some of the variants of the decoder.
Sphinx-II uses semi-continuous models, whereas Sphinx 3.x and 4.x use continuous models. For our purposes, we will stick to Sphinx 3.5, which is written in C/C++ and is widely supported internally at Carnegie Mellon. The typical steps involved in building a speech recognition system are:

1. Selection of the text to be recorded. This requires a large text corpus, which is transliterated and cleaned. This text corpus is given to a text selection algorithm to derive the required number of sentences.
2. Recording of these sentences by multiple speakers in different dialects and in different recording conditions.
3. Building the recognition engine using SphinxTrain and Sphinx Decoder.
4. Building a language model.
5. Fine tuning the parameters of the language model and acoustic models.

5. Open Source Initiative

While the development of products and resources proceeds in a parallel stream, open source initiatives are essential for visibility and for building a user base. A release of an open source Hindi voice could be a good strategy. Such releases could also be initiated by collaborating with universities in the form of projects whose deliverables include release of the voice under open source.

6. Deliverables and End Product

The following is the list of deliverables at the end of this effort.

1. Text and Speech Resources for Telugu, Marathi, Gujarati & Hindi
2. ASR for the 3 languages: Telugu, Marathi & Gujarati
3. TTS for the 4 languages; the Hindi TTS could be made open source
4. End Product: Railway reservation system for Telugu, Marathi & Gujarati

7. University Collaboration

Collaborations with the universities working in the SLT stream are important. This collaboration can take the form of industrial membership (see below for the definition) or of sponsored projects.

Industrial member: This concept is applicable for the case of MSIT-IIIT.
A company can become an industrial member by paying a membership fee, after which the member plays a role in the following aspects of the SLT stream:

1. Refining the curriculum based on industrial requirements
2. Industrial practicum & internships for MSIT-SLT stream students
3. Job opportunities for the MSIT-SLT stream students

While the industrial member plays a contributing role in the SLT stream, it can avail the following benefits:

1. Employees of the industrial member can take SLT courses at MSIT under the industrial member's course fee structure.
2. Avail the expertise of the faculty members in the form of consultancy.
3. Work on collaborative projects with faculty members and students of the MSIT-SLT stream.
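Illustration: text selection for recording

Sections 3 and 4 both begin by selecting, from a large text corpus, a small set of sentences to be recorded. A minimal sketch of one way such a selection could work is given below: a greedy procedure that repeatedly picks the sentence adding the most units not yet covered. This is not the algorithm shipped with Festvox or Sphinx. The function names units and select_sentences are hypothetical, and words plus character bigrams are used as a stand-in for words and diphones, which in a real system would come from the phone set and letter-to-sound rules of the language.

```python
from typing import Iterable


def units(sentence: str) -> set[str]:
    """Return the coverage units of a sentence.

    Assumption: lowercase words plus character bigrams inside each word
    stand in for words and diphones. A real system would use phone-level
    diphones derived from the language's letter-to-sound rules.
    """
    found: set[str] = set()
    for word in sentence.lower().split():
        found.add(word)
        found.update(word[i:i + 2] for i in range(len(word) - 1))
    return found


def select_sentences(corpus: Iterable[str], max_sentences: int) -> list[str]:
    """Greedily pick sentences that add the most not-yet-covered units."""
    remaining = {s: units(s) for s in corpus}
    covered: set[str] = set()
    chosen: list[str] = []
    while remaining and len(chosen) < max_sentences:
        # Sentence contributing the largest number of new units.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # no remaining sentence adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen
```

Calling select_sentences(corpus, 1000) on a cleaned, transliterated corpus would return at most 1000 sentences in the order they were picked. The greedy choice favours long sentences, so a practical version would likely cap sentence length and weight units by their frequency in the corpus.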
