Web Based Maltese Language Text to Speech Synthesiser


Buhagiar Ian & Micallef Paul

Faculty of ICT, Department of Computer & Communications Engineering mail@ian-b.net, pjmica@eng.um.edu.mt

Abstract

An important factor which has led to the growth of the internet is the ease of professional website development. A generic platform was required for the successful implementation of a number of applications so that they become available on the internet as easily as possible.

The aim of this paper is to identify methods by which one can set up a web based interactive system on which various website applications can be developed for different tasks. Such a system should lead to websites that are low cost, robust, secure, aesthetically professional, fast to develop and easy to maintain, and that support multi-lingual content. The flexibility of the chosen system was examined by implementing on it a handler to achieve the first web based Maltese language Text to Speech (TTS) software.

The results obtained clearly show that open source content management systems offer a free platform on which high quality and secure websites can be built with limited knowledge of web development. The flexibility of such systems makes them ideal for various application developments. In this case, the first web based Maltese language TTS system was successfully implemented.

1. Content Management Systems

A Content Management System (CMS) is web based software that helps one to develop and manage the content of a website easily. It is a tool that enables a variety of technical and non-technical staff to create, edit, manage and finally publish a variety of content (such as text, graphics, video and documents) in a number of formats, whilst being constrained by a centralised set of rules, processes and workflows that ensure coherent, validated electronic content [1]. This means that, in contrast to traditional standalone HTML web pages, a CMS driven website lets a user update a page much more quickly and simply by using an online editor. The primary disadvantage of deploying a CMS is the added complexity in the website system and framework.

2. CMS Deployment

The option of developing a CMS from scratch was discarded as it is too costly to develop something which already exists within various packages and operating systems.

Off the shelf, there are many CMS packages to choose from. However, there are a number of steps which one must take to ensure that the best performance to cost ratio is achieved. Given that various applications had to be developed in-house within the CMS framework, the CMS needed to be as editable as possible. Thus the choice fell naturally on a Linux flavoured open source CMS.

In a user driven community, it is the activity of the community that keeps the CMS secure against the latest vulnerabilities. By examining such activity and keeping in mind the factors mentioned above, the list was brought down to five packages: CMS Made Simple, Drupal, Joomla, Wordpress and Xoops.

In this regard, a study was carried out outlining how these five open source CMS packages rate against the various characteristics required for our application.

Though Drupal and Joomla have very similar features, in the end Joomla was chosen due to its extensive extensions repository, which caters for all the major web applications. Besides this, the ease with which one can develop one's own extensions made Joomla the number one choice. Other factors included the platform’s compatibility with multi-lingual content and its ease of maintenance.

3. Web Server Selection

Joomla was developed and tested primarily on an Apache web server. In the next section it is shown that the already developed TTS executable requires a Windows server. Thus a solution was required through which two different operating systems would run simultaneously on the same web server. Also, both systems must be able to exchange data with each other in real time.

The final choice made was in favour of deploying a virtual Apache system over a Windows operating system. Thus a Windows, Apache, MySQL and PHP (WAMP) system was used. As the name suggests, WAMP is a Windows program which implements a virtual Apache web server together with the MySQL database structure and the PHP server side scripting language on top of a Windows operating system.

Figure 3.1 – System block diagram

EasyPHP version 2 is a WAMP software bundle which contains Apache 2.2.3, PHP 5.2.0, MySQL 5.0.27, phpMyAdmin 2.9.1.1 and SQLiteManager 1.2.0. Such a solution gives us an ideal platform on which to install our CMS, as well as a running platform for the ANSI C TTS software.

4. The Maltese TTS Synthesiser

Speech synthesis is the artificial production of human speech, whose quality is judged by its similarity to the human voice and by its ability to be understood. A TTS synthesiser is a computer program that converts text into audible speech.

P. Micallef [1] has suggested and implemented the first TTS synthesis system for the Maltese language.

This TTS software is a command line driven package written in ANSI C. It makes use of several procedures that are compiled to object code, and afterwards linked together to form an executable program. There is also an external B-tree based database that is linked to the program and used to make a direct translation from grapheme to phoneme without using the rules.

Initially there are procedures to consider the locale by translating numbers, abbreviations, etc. into words.

A set of rules then translates the text to phonetic words and also adds the main stressed syllable. This is divided into two procedures. The next set takes into account adjacent words and readjusts the phonetic content. The phonetic content is then input into another procedure that translates it to diphones. This procedure makes use of the stress and length indicators to choose the appropriate diphones for vowels and for double consonants. The diphone sets are then passed to another procedure that obtains from the binary diphone database information relating to pitch, pitch positions, etc. for each diphone, and prepares an overall file based on pitch synchronous techniques.

Finally this file is used in the last procedure to play the audio. Each set is essentially independent, allowing for any further development in the areas of intonation and change of diphone databases.

5. Maltese Character System

Internally the TTS system developed in [1] works with ASCII character codes. Thus if a user would like to input the character ż, the equivalent Windows code has to be typed while holding down the ALT key (i.e. ALT + 167 in this case).

For cross system compatibility, the software refers to the Maltese characters through constants defined in a header file rather than through literal ASCII characters. Thus, for example, the constant ZCAP is defined to be equal to the character code 247 (which, if read as an octal value, is decimal 167, in line with the ALT code above), and this code maps to the section sign (§) character. The Unicode equivalent of the § character is U+00A7 while its HTML equivalent is &sect;. Table 5.1 shows the equivalent values for all the Maltese language characters as used in the Maltese TTS software.

Table 5.1 – Maltese character equivalents

Joomla 1.5 offers an ideal platform for developing multi-lingual content. The standard language library of the major languages can be downloaded and easily installed from the Joomla portal. However, since a Maltese language pack for this CMS did not exist at the time of publication of this paper, it had to be developed from scratch.

6. Web Application Development

6.1. Methodology Study

Two different options were considered for the successful integration of the existing Maltese TTS system within a CMS web based framework.

The first option was to re-code the whole Maltese TTS block, which was written in ANSI C, in the web based PHP language. The main difference between a web based application and a standard application is that the web application needs to serve different users at the same time. Several web based languages such as PHP and ASP contain a large number of functions to cater for this problem. On the other hand, a standard imperative computer language such as ANSI C doesn’t cater for such situations.

However, the major problem with implementing this option was that the TTS block syntax (which also includes a B-tree) is quite complex.

The second option was to develop a PHP handler which could transfer data from the CMS’s web based interface to the web server on which a modified version of the TTS block written in [1] would execute.

There were two major concerns with adopting this methodology. The first concern was that the web server would need to cater for both a Linux environment for the CMS and a Windows environment for the TTS block. However, as illustrated in section 3, this problem can be mitigated by the use of a WAMP system such as EasyPHP. The second concern was that the present TTS block catered only for a single user at any point in time. This problem could be minimised by editing the input and output parts of the ANSI C TTS block and integrating it with a PHP web based interface.

Finally, the second option was chosen as the first one involved quite a complex process which was beyond the purpose of this paper.

6.2. TTS Block Implementation

Although the main TTS block was left intact, the original C software was heavily modified in its input and output stages.

At the input stage the original software gave several text input options, ranging from a single word or a sentence to a text file or a batch file. The main menu was bypassed by making the program proceed directly to menu option 3. This menu option takes a phrase as text input either from the keyboard or from a text file and processes it. The keyboard input method was discarded, and a new function which automatically accepts and processes inputs from the text file file.txt was written.

Besides that, the program also reads an 8 character code from the id.txt text file. The primary idea behind this ID file is that the program will be able to distinguish between different sessions whilst operating in a multiuser environment such as the WWW. The generation process of the ID code is tackled in 6.4.

At the output stage, rather than generating every WAV file as stest.wav and playing it on the Windows Media Player, the program now stores each generated WAV file under a name derived from its corresponding ID.

Following the generation of the WAV file, the program enters a loop which constantly reads the ID code from the id.txt text file. If a user has inputted a new phrase, the contents of id.txt and file.txt will have been updated accordingly. Thus, if the program notes that the ID code has changed, it re-executes the above algorithm so as to generate a new WAV file corresponding to the new input string.
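The polling behaviour described here can be sketched as follows. This is only an illustration: it is written in PHP to stay in one language with the handler code, whereas the actual loop lives in the ANSI C program and is not reproduced in this paper. The function names, the pause between reads, the iteration limit and the `$synthesise` callback are all assumptions made for the sketch, and the code assumes PHP 7.1 or later.

```php
<?php
// Returns the ID stored in id.txt if it differs from $lastId, otherwise null.
// Hypothetical helper; the real check is done inside the C program.
function read_changed_id(string $idFile, ?string $lastId): ?string
{
    $raw = @file_get_contents($idFile);
    if ($raw === false) {
        return null; // file missing or unreadable: treat as "no change"
    }
    // The ID is written with a trailing '&' (see 6.4), so strip it.
    $id = rtrim(trim($raw), '&');
    if ($id === '' || $id === $lastId) {
        return null;
    }
    return $id;
}

// Hypothetical driver: polls id.txt and calls $synthesise whenever the ID changes.
// $maxIterations and $pauseMicros are assumptions; the original loops indefinitely.
function run_polling_loop(string $dir, callable $synthesise, int $maxIterations, int $pauseMicros = 100000): void
{
    $lastId = null;
    for ($i = 0; $i < $maxIterations; $i++) {
        $id = read_changed_id($dir . '/id.txt', $lastId);
        if ($id !== null) {
            // A new ID means file.txt holds a new phrase to synthesise.
            $phrase = (string) file_get_contents($dir . '/file.txt');
            $synthesise($phrase, $dir . '/' . $id . '.wav');
            $lastId = $id;
        }
        usleep($pauseMicros); // short pause so the loop does not spin the CPU
    }
}
```

Comparing against the last seen ID, rather than checking a file timestamp, mirrors the text: a new WAV file is generated only when the ID itself has changed.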

6.3. Web Handler Development

The primary aim of the PHP web handler is to provide a web interface for the TTS system. Besides this, it also takes care of session control. The handler can be split into two main parts: the application form, on which the user writes the phrase to be listened to and submits it, and the execution control, which communicates with the TTS block on the server.

The input form makes use of simple HTML structures and the data is sent using the POST method.

For accessibility purposes, four buttons which correspond to the special Maltese characters were added to the form to cater for those users who lack a Maltese keyboard. Of course, if the user enters the letter g instead of ġ, the resulting pronunciation will not be correct.

On pressing the submit button, the text stream is edited to make it compatible with the TTS block. As explained earlier, the TTS block doesn’t accept UTF-8 encoded characters. Thus, for example, the character ż needs to be replaced with the § character before it is inputted to the TTS block. Apart from this, the TTS block specifies that a valid sentence input should start with a space and terminate with the & character. Therefore the PHP code below was added to replace the inputted Maltese characters with their pre-set TTS block representations.

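// Wrap the phrase: leading space and trailing " &", as required by the TTS block
$sr0 = " " . $source . " &";
// Replace each Maltese letter with its TTS block stand-in
$sr1 = str_replace('ż', '§', $sr0);
$sr2 = str_replace('ġ', '¥', $sr1);
$sr3 = str_replace('ħ', '¦', $sr2);
$sr4 = str_replace('ċ', '¨', $sr3);

For example, if the input is:

Proġett ta' l-Aħħar Sena għall-Kors fl-Inġinerija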

Before being inputted to the TTS block it is converted into:

Pro¥ett ta' l-A¦¦ar Sena g¦all-Kors fl-In¥inerija &

Finally, the PHP handler saves the edited text stream in the web server’s file.txt and the session ID number in the file id.txt.
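The mapping and save steps can be gathered into two small functions, sketched below under a few assumptions. The function names and the directory argument are hypothetical, and the sketch assumes PHP 7 or later and a UTF-8 encoded source file. It covers only the four lower-case letters ż, ġ, ħ and ċ. Whether the TTS block expects the stand-in characters as single bytes is not stated in this section; if it does, a further encoding conversion would be needed before writing the file.

```php
<?php
// Replace the Maltese letters with their TTS block stand-ins and add the
// leading space and trailing '&' that mark a valid sentence.
function to_tts_text(string $source): string
{
    $map = [
        'ż' => '§',
        'ġ' => '¥',
        'ħ' => '¦',
        'ċ' => '¨',
    ];
    // strtr with an array performs all replacements in a single pass.
    return ' ' . strtr($source, $map) . ' &';
}

// Hypothetical helper: store the phrase and the ID where the TTS block reads them.
// The phrase is written first so that file.txt is already in place when the
// change of ID in id.txt is noticed (this ordering is an assumption).
function save_tts_request(string $dir, string $ttsText, string $printId): bool
{
    if (file_put_contents($dir . '/file.txt', $ttsText, LOCK_EX) === false) {
        return false;
    }
    return file_put_contents($dir . '/id.txt', $printId, LOCK_EX) !== false;
}
```

A call such as `save_tts_request($dir, to_tts_text($source), $printid)` would then replace the chain of intermediate variables. Using one map with `strtr` keeps all the character pairs in a single place, which may make it easier to add the capital letters later.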

Figure 6.1 – Web handler

6.4. Session Control

An important issue which one must always take into consideration when working within a web environment is that of multiple inputs by different users at the same time. Imagine that at a certain point in time a user inputs a sentence X and, a very short time later, another user inputs a sentence Y. Such operations will produce garbage data, given that the parameters which were being used for sentence X are at a certain point applied to sentence Y.

An HTTP session token is a unique identifier (usually in the form of a hash generated by a hash function) that is generated and sent from a server to a client to identify the current interaction session. The client usually stores and sends the token as an HTTP cookie and/or sends it as a parameter in GET or POST queries.

The session_id function in PHP returns the unique key which is allocated to every user who accesses the website. This key remains the same until the session ends (for example, when the browser is closed). As the session key is very long, the PHP handler takes only 5 characters from the session ID and appends to them a two digit random number. The random number is appended so that different inputs from the same user will be distinguishable. In order to make sure that the new key doesn’t start with a digit, the letter a is prepended to the new key. These alterations are carried out so that there wouldn’t be any incompatibility with the TTS block’s C code. Finally this key is stored in the text file id.txt so that its value can be used by the TTS block. The PHP code below illustrates how the session ID key was generated:

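// Assumed setup: $id holds the value of session_id() after session_start()
session_start();
$id = session_id();
$nid = substr($id, 5, 5);      // five characters taken from the session ID
$rnd = rand(10, 99);           // two digit random number
$newid = "a" . $nid . $rnd;    // leading letter so the key never starts with a digit
$wavid = $newid . ".wav";      // name of the generated WAV file
$printid = $newid . "&";       // value written to id.txt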

Session ID: i957mg6lj9kjp4rrbrn5tna4t7

Extracted ID: g6lj9

Random ID: ag6lj965

Wav ID: ag6lj965.wav

Print ID: ag6lj965& (ID written in file id.txt)

Thus, with the session ID method, it becomes very unlikely that the system will mix up the input text of different web sessions.

6.5. System Deployment on CMS

After the PHP handler code and the TTS block had been tested within a standalone HTML website, the system had to be deployed on the CMS framework.

New web applications are generally deployed within the Joomla CMS framework by means of the development of a new extension. In our case, this new extension would have contained the PHP + HTML source code together with a number of Joomla command lines for compatibility matters.

However, after browsing through the extensive Joomla extensions directory, the ChronoForms component was noted. This component allows the administrator to simply paste the PHP + HTML code of a particular form, and it implements the code within the Joomla component framework automatically. Given that the standalone PHP code worked perfectly, this system was adopted.

7. System Limitations

The TTS block developed in [1] was identified as the major component that is limiting the system. Of course one cannot ignore the fact that it is also the most complex part of the system. Here the main issue is that when the block is given certain phrases, the executable program halts. In such circumstances, the program has to be manually restarted by the system administrator.

During the testing procedures there were several cases which were noted to cause this problem. These cases were studied and a possible solution was drafted for each one of them.

(a) Long phrase case: The program was noted to halt when the phrase input was longer than 2000 characters.

In this situation the limiting factor is that the sentence string in the ANSI C software is set to take a maximum of 2000 characters. To cater for this problem, the web input side was used to truncate any input string which is longer than 2000 characters by means of a PHP function.

(b) Too long word case: It was noted that the program halts when one of the input words is longer than 15 characters. Here again, the problem was corrected at the web input level by using a PHP function which inserts a space in any word that is longer than 15 characters.

(c) Ambiguous input case: It was noted that on an ambiguous input the program halts. An ambiguous input could simply mean either an English word input, an incorrectly spelt Maltese word or any other word which the TTS block doesn’t manage to read. Since for the purpose of this paper, the TTS generation section was considered as a black box, this problem was not catered for.
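The two input-side safeguards from cases (a) and (b) could look roughly like the sketch below. The paper does not show the PHP functions that were actually used, so the function names, the regular expressions and the order of the two steps are assumptions. The code assumes PHP 7 or later and UTF-8 input, and it counts characters rather than bytes; if the C buffer limits are in bytes, the limits would need to be applied after any encoding conversion.

```php
<?php
// Limits quoted in the text; the constant names are hypothetical.
const MAX_PHRASE_CHARS = 2000;
const MAX_WORD_CHARS = 15;

// Case (a): keep at most $limit characters from the start of the phrase.
function truncate_phrase(string $text, int $limit = MAX_PHRASE_CHARS): string
{
    // u = treat the subject as UTF-8, s = let '.' match newlines as well.
    if (preg_match('/^.{0,' . $limit . '}/us', $text, $m) === 1) {
        return $m[0];
    }
    return ''; // e.g. invalid UTF-8: fall back to an empty phrase
}

// Case (b): insert a space after every run of $limit non-space characters
// that is directly followed by another non-space character.
function split_long_words(string $text, int $limit = MAX_WORD_CHARS): string
{
    return preg_replace('/(\S{' . $limit . '})(?=\S)/u', '$1 ', $text) ?? '';
}

// Split first, then truncate, so the added spaces cannot push the phrase
// past the overall limit.
function sanitise_phrase(string $text): string
{
    return truncate_phrase(split_long_words($text));
}
```

For instance, `split_long_words('abcdefghijklmnopqrst')` gives `abcdefghijklmno pqrst`, while a word of exactly 15 characters is left unchanged. Splitting a word in this way avoids the halt but will also change how the word is pronounced, which is a trade-off of the approach described in (b).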

8. Suggestions for Future Work

Based on the knowledge which was acquired during the development of this system, it is possible to outline areas where future work on the Maltese TTS system may be carried out. These suggestions are aimed at transforming the current prototype implementation into a system mature enough for extended testing, enabling verification of the technology in the field.

As regards the TTS block, the ANSI C code should be debugged more rigorously so that whenever an ambiguous input is given, the system doesn’t halt but simply gives a warning and afterwards it is restarted.

Keeping in mind that the TTS block was written in 1997, a possible but very demanding suggestion might be to re-code the TTS program in an Object Oriented Programming (OOP) language such as Java, C# or PHP 5.

A very ambitious project would be to integrate the Maltese TTS system within the Microsoft SAPI (Speech Application Programming Interface) environment. SAPI is a framework which standardises TTS of different languages within a Windows environment [2].

9. Conclusion

The major achievement of this paper is that the first Maltese language TTS system became accessible to everybody, free of charge, over the internet. Such an achievement will surely help to facilitate better use of ICT within the local blind community.

10. References

[1] P. Micallef, “A Text To Speech System for Maltese,” Ph.D. thesis, University of Surrey, UK, 1998.

[2] I. Buhagiar, “Web Based Interactive System for Distributed Applications,” B.Eng (Hons) thesis, University of Malta, 2008.
