Aneesa Awaludin
Van Phan
Jacob Huston
This project aims to develop a system that would provide a user friendly interface for document translation including translation assistance tools as well as a means to manage the translation process. The user interface would decrease the amount of time spent working on individual documents and process management would improve consistency with individual documents, across multiple projects and between translators. Using a client server architecture is appropriate because translators working for a given translation company often work from geographically separate locations. A central shared resource database allows everyone to share data without the hassle of repeating transferring commonly updated files. The user community would include translators and translation companies as well as language learners when additional features for language learners are supported.
The main benefits of such software would be improvement in consistency through maintaining a database of all previously translated phrases, and reduced time due to the user friendly structured environment and translation tools provided to the translator.
It should be noted that, although it is a fascinating field, it is not a goal of this project to develop a machine translation system. While some of the translation tools might make translation faster, it will be up to a human translator to produce the actual translation.
Translation assistance and translation process management software is not a new concept. However, existing software is either very expensive or lacking desired quality and features.
Currently we have our features divided into two categories; features deemed required to make the project minimally usable and additional features that would be nice to have as time and resources allow.
The required features include a phrase editor, a phrase dictionary database and an interface allowing communication between the phrase editor and phrase database.
The phrase editor would provide a structured environment in which a translator would work on a given document. Given a source document, the phrase editor would break the source text into phrases and provide locations for entering the translation of each phrase in the target language as well as providing a means of navigation between phrases. It would also allow the translator to remove the source language from a given completed document leaving only the target language. This would produce a document in the target
1
language matching the formatting of the source document with minimal intervention from the translator.
The phrase dictionary database would store phrases in both their source and target languages. It should be set up such that a given source phrase can have multiple corresponding target phrases for projects that are being translated into multiple languages. It should also store some metadata regarding the phrases that would be useful for later queries such as source language, target language, project name, project category, translator name and the like.
The third main required feature is an interface between the phrase editor and the phrase dictionary. This interface would generate and display the results of queries to the database and populate the database with source and target language phrases from translated documents. This interface should include preference options that allow the translator to specify the queries they would like to run. For example, the translator should be able limit queries to a certain project, translator, language, etc. without having to have an in-depth knowledge database query languages.
Additional features that we would like to see, time and resources allowing, include automated anticipatory querying of the database, a phrase rating system for tracking, viewing and editing complete, incomplete and questionably translated phrases so designated by the translator, online dictionary lookup and additional features to adapt the basic system for language learners.
The automated anticipatory lookup would have the editor query the database for information on phrases that the translator is about to work on and provide phrases from the database that met certain criteria established by the translator. For example, the system might be set to display phrases that have 80% of words in common with a source phrase. This would allow the translator to see how similar phrases and words were previously translated.
The phrase rating system would allow the translator to mark phrases complete, incomplete, questionably translated, etc and then allow them to view and edit only phrases in a certain category. For example, the translator could work through a document quickly marking phrases they were unsure of as questionable. Then, at the end of the document, they could review and edit only the phrases marked as questionable.
Online dictionary lookup would allow the translator to query an online dictionary simply by highlighting a word in a phrase and clicking a button. This would provide additional resources besides just those in the database associated with this system.
Finally, features for language learners would be a nice addition to the system. These might include answer comparison between students or group translation in upper level classes.
2
As currently envisioned, the client-side phrase editor would be a plug-in or extension written for the Open Office suite of office tools, specifically the Writer word processor component. The server-side phrase dictionary database would be either a MySQL or postgreSQL database to which the class has access. The interface would be programmed to interact with the phrase editor using the software development kit provide on the
Open Office website. This SDK would also be used for the development of the phrase editor. According to the available documentation, Open Office is written mainly in C++ but new features can be added using Java, Python, StarBasic and JavaScript. Open
Office also includes a native postgreSQL driver for database connectivity.
Given a team of 6 to 8 very talented developers, we believe that this project, including some of the additional features can be produced within the very tight 8 week time constraint of the class. We roughly estimate that the basic required features can be implemented inside of 6 weeks leaving 2 additional weeks for thorough testing and/or the inclusion of the additional features most appealing the developers involved. This will allow us to produce translation assistance and consistency management software that will useful to large translation companies and individual translators alike. Besides the obvious stakeholders, including translators, translation companies and possibly language learners, completing such a project will be abundantly rewarding to the main stakeholder, CS students and their GPAs.
There are several assumptions behind this project that we will make explicit here. The first is that Open Office can be extended. We believe that this is a valid assumption as
Open Office is an open source project and a cursory look at the website shows that Open
Office, like many open source projects, invites developers to extend the ‘official’ release of the software. The second assumption is that such software would be useful or marketable to a large audience. Again, we think that this is a valid assumption as software exists which fills a similar role and is offered at a premium price.
Finally, there are several main risks that we have identified with the project still in a conceptual state. The first is that development of Open Office, while certainly possible, might prove to be a daunting task. It is doubtful that anyone who might become involved with this project will have experience developing Open Office. As such there could be a steep learning curve associated with the use of the Open Office SDK.
However, as development can be undertaken in languages that are sure to be familiar to our developers, this should not be too much of a problem. Another risk identified relates to the scalability of the database. As the anticipatory querying would be looking at words inside of phrases and not the whole phrase, we will need a schema that allows this to be performed efficiently on a database that has been in use for an extended period
3
of time containing many phrases and not just on sample, toy databases that we would create during the development phase.
FIG. 1 shows and example of a possible user interface for the phrase editor described above.
FIG. 1
FIG. 2 shows the relationship between the phrase database living on the server and the phrase editor client running on multiple machines.
FIG. 2
4