A HAND-HELD MULTIMEDIA TRANSLATION AND INTERPRETATION SYSTEM FOR DIET MANAGEMENT

Albert Parra Pozo†, Andrew W. Haddad‡, Mireille Boutin‡ and Edward J. Delp†
†Video and Image Processing Lab (VIPER), ‡Computational Imaging Lab (CIL)
School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana, USA

This work is partially supported by the U.S. Department of Homeland Security's VACCINE Center under Award Number 2009-ST-061-CI0001. Address all correspondence to E. J. Delp (ace@ecn.purdue.edu).

ABSTRACT

We propose a system for helping individuals who follow a medical diet maintain this diet while visiting countries where a foreign language is spoken. Our focus is on diets where certain foods must either be restricted (e.g., metabolic diseases), avoided (e.g., food intolerance or allergies), or preferably consumed for medical reasons. However, our framework can be used to manage other diets (e.g., vegan) as well. The system is based on the use of a hand-held multimedia device such as a PDA or mobile telephone to analyze and/or disambiguate the content of foods offered on restaurant menus and interpret them in the context of specific diets. The system also provides the option to communicate diet-related instructions or information to a local person (e.g., a waiter) as well as obtain clarifications through dialogue. All computations are performed within the device and do not require a network connection. Real-time text translation is a challenge. We address this challenge with a light-weight, context-specific machine translation method. This method builds on a modification of existing open source Machine Translation (MT) software to obtain a fast and accurate translation. In particular, we describe a method we call n-gram consolidation that joins words in a language pair and increases the accuracy of the translation. We developed and implemented this system on the iPod Touch for English speakers traveling in Spain. Our tests indicate that our translation method yields the correct translation more often than general purpose translation engines such as Google Translate, and does so almost instantaneously. The memory requirements of the application, including the database of pictures, are also well within the limits of the device.

Index Terms— computational linguistics, statistical learning, multimedia systems

1. INTRODUCTION

Diet plays an important role in health management. For example, the symptoms or risk factors of many diseases can be reduced by diet modification. In some extreme cases, for example peanut allergies, the consumption of even a minute amount of certain nutrients can have devastating health consequences. In other cases, such as diabetes or inborn errors of metabolism, the consumption of certain nutrients must be carefully monitored and limited in order to maintain an individual's health.

In unfamiliar settings, in particular when traveling to a foreign country, maintaining a medical diet can be a challenge. Indeed, gastronomy often varies from region to region, so tourists naturally expect to be confronted with unknown dishes and ingredients. But while many consider sampling the local gastronomy an important part of the travel experience, people who must follow a medical diet are often reluctant to embark on such journeys for fear of putting their health at risk. Medical diets are especially difficult to deal with when traveling to a region where a foreign language is spoken.
Indeed, without the ability to understand menus, it is impossible to make informed food choices. A device capable of automatically translating menus in real-time could thus be a useful tool for people following a medical diet. Unfortunately, the best electronic translators typically rely on a remote network-connected server to obtain the translation. Moreover, few automatic translators are able to give context-specific information, let alone medical-diet-specific information. Indeed, the problem of maintaining a medical diet in a foreign setting goes beyond translation: it is about interpretation, disambiguation, and communication. For example, the short text descriptions of the food items offered in a menu leave a lot of room for interpretation, even for a person who is fluent in the local language. This is because the name of a dish may not be descriptive, or the ingredients used to prepare the dish may vary in a particular region or even at a particular restaurant. Furthermore, certain medical diets involve strict preparation guidelines (e.g., to avoid cross-contamination), and determining whether these guidelines are or can be followed requires a non-trivial dialogue with the staff in charge of preparing and serving the food.

We propose a system to address these issues. The system, which is summarized in Figure 1, is based on the use of a hand-held multimedia device such as a PDA or a mobile telephone. The system operates in the following manner. The user types the desired dish/ingredient (e.g., arroz a la cubana) into a prompt in the Graphical User Interface (GUI). The text is then translated using a modified (context-specific) machine translation engine. The best possible translations are then listed in the order in which they are retrieved from the database, along with multimedia information (e.g., pictures). This list of results can also be weighted based on the translation scores of the n-best list. The user can then browse the multimedia database to obtain more information about the dish or the ingredients. When appropriate, information or questions aimed at a waiter or other knowledgeable local speaker are suggested; the answers of that person are translated back to the user.

Before leaving for a foreign country, the user downloads a region- and language-specific configuration and database. From then on, the system can operate without a network connection. First, the user must set the parameters of the medical diet under consideration. This can be done by selecting from a list of pre-defined diseases and conditions. It can also be done by selecting from a list of ingredients/nutrients to either avoid or favor. When considering a given dish or food item on a foreign menu, the user then enters the text of the menu describing this item (in the local language). The text entered is then translated and interpreted automatically by the device using text, audio and still images. When appropriate, information or instructions to be transmitted to the waiter are suggested by the device (e.g., "Please note that I am on a restricted low-protein diet because of a metabolic disease. Therefore my meal must only consist of fruits and vegetables."). When selected by the user, a translated version of the text (in the local language) is displayed on the screen and can be shown directly to the waiter. When needed, questions for the waiter are also suggested by the device in order to disambiguate the ingredients of a dish (e.g., "Does this salad contain croutons?"). When selected, these questions are displayed in the local language along with a choice of answers to be selected by the waiter (e.g., "Yes.", "No.", or "Let me ask and I will get back to you.").
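To make this flow concrete, the following Python sketch outlines the lookup-and-flag loop described above. The names (DummyTranslator, review_menu_item, the toy dish database) are illustrative placeholders under our assumptions and are not part of the actual implementation.

# Illustrative sketch of the menu lookup and diet-flagging flow; all names and data are hypothetical.
class DummyTranslator:
    """Stand-in for the on-device MT engine; returns a fixed n-best list."""
    def nbest(self, text, n=3):
        return ["Cuba-style rice", "Cuban style rice", "rice with fried eggs"][:n]

def review_menu_item(menu_text, profile, dish_db, translator, n=3):
    """Translate a menu item and flag ingredients that conflict with the diet profile."""
    results = []
    for translation in translator.nbest(menu_text, n):
        ingredients = dish_db.get(translation, [])                   # multimedia/ingredient lookup
        flagged = [i for i in ingredients if i in profile["avoid"]]
        results.append({"translation": translation, "ingredients": ingredients, "flagged": flagged})
    return results

dish_db = {"Cuba-style rice": ["rice", "fried egg", "banana", "tomato sauce"]}
profile = {"avoid": {"banana"}}                                       # e.g., a user limiting sugar intake
print(review_menu_item("arroz a la cubana", profile, dish_db, DummyTranslator()))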
Real-time text translation using a hand-held device is a challenge because of the limited amount of data storage, memory (RAM), and power available. We address this challenge with a light-weight, context-specific machine translation method that leverages the use of pictures and text descriptions. In this method, a step we call n-gram consolidation is applied to the database (phrase-tables) obtained after training a statistical machine translation system. This allows us to dramatically reduce the size of the database while increasing the search speed. This also contributes to an efficient use of the device's energy. Furthermore, by decreasing the number of entries in the database, the accuracy is also enhanced. The efficiency is further improved by displaying the output as a ranked list of best translations, letting the user decide which result best fits the context. Another key element is the use of a browsable multimedia database, which provides additional information to the user, such as images or ingredients. As a result, our application can be used in a real-time, network-independent environment and produce highly accurate results.

The computational complexity and memory requirements of our methods are low enough that the system can be implemented on many commercially available hand-held devices. To demonstrate this, we developed and implemented this system on the iPod Touch for English speakers traveling in Spain. Our tests indicate that our translation method yields the correct translation more often than general purpose translation engines such as Google Translate, and does so almost instantaneously. The memory requirements of the application, including the database of pictures, are also well within the limits of the device.

A review of existing diet management tools is presented in Section 2.1. A summary of the state of the art in Spanish-English machine translation (MT) systems is given in Section 2.2. The modifications to Moses and our first test version are outlined in Section 3. The experimental results are shown in Section 4. We outline the implementation of the method in Section 5. Some conclusions and thoughts on future improvements are given in Section 6.

Fig. 1: Block diagram of our proposed menu translation and interpretation system.

2. EXISTING METHODS

2.1. Diet management

Up until very recently, medical diets have been managed primarily using (printed) diet-specific food databases. In many cases, a pencil, some paper, a scale and a calculator were also required. In this traditional scenario, the individual reads the nutrition facts and ingredient list printed on the label in order to analyze the content of the food with the help of the database; when the intake of certain nutrients must be restricted or precisely recorded, the scale and calculator are then used to determine how much of these nutrients has been consumed; the amount is then recorded on a piece of paper for later analysis by a trained dietician. Without access to the nutrition label (for example in a restaurant), the user must question the people who prepare the food in order to obtain the required information. This typically involves a dialogue between the chef/food preparation staff and the individual.
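The manual bookkeeping described above reduces to a per-nutrient multiplication and a running log. The sketch below illustrates this with made-up label values and portion weights; none of the numbers come from the paper.

# Nutrient consumed = portion weight (from the scale) x nutrient content per gram (from the label).
nutrient_per_gram = {"protein": 0.025, "carbohydrate": 0.28}   # hypothetical label values (g per g of food)
portion_grams = 150.0                                           # weight measured with the kitchen scale

intake = {name: round(portion_grams * per_g, 1) for name, per_g in nutrient_per_gram.items()}
print(intake)   # {'protein': 3.8, 'carbohydrate': 42.0} -- the amounts a dietician would later review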
With the widespread availability of smartphones and other multimedia hand-held devices, many electronic tools are now available to assist individuals who must follow a medical diet [1]. For example, text messaging can be used to send reminders to diabetes patients. One can also build on the Bluetooth capabilities of such devices to remotely record and monitor blood pressure readings. With the higher resolution pictures, improved memory capacity, and faster processors of the recent versions of these devices, it may even be possible to automatically identify and measure the food consumed from "before and after" pictures of a plate [2, 3].

2.2. Translation

Machine translation (MT) methods fall into two main categories: Rule-Based MT (RBMT) and Statistical MT (SMT). RBMT provides a text translation based on the grammatical, morphological and syntactical rules of the languages in question. The advantage of this method is that it deals with grammar rules and lexicon, and any variation of an input can be handled. The disadvantage is that extensive knowledge of both the source language and the target language is required in order to build the rules, and a lot of effort has to be invested in the database creation and modification. SMT provides a text translation based on probabilistic correlations between large corpora of bilingual texts. It usually relies on large databases (billions of words) to provide a good quality translation, but it has the advantage that it only needs a source-language and a target-language corpus.

The results presented at the 2009 Workshop on Statistical Machine Translation for open source systems indicate that Apertium (RBMT) and Moses (SMT) [4] are candidates for this work. However, we found Moses a better choice for food-related items, given that its database is easier to manipulate than Apertium's. Moses is an SMT system that uses a phrase-table built from a given parallel corpus for translation. Its main features are confusion network decoding [5] and word lattices [6], allowing the translation of ambiguous inputs. It also uses factored translation models [7], adding part-of-speech tags or lemma information to the phrase-table. Moses uses external tools for word alignment (GIZA++ [8]) and language modeling (SRILM [9]). It uses a beam-search heuristic algorithm to quickly find the highest-probability translation among an exponential number of choices (roughly similar to [10]). This environment has also been previously tested with success in different contexts.

The main advantage of the SMT method is that it only needs a pair of translated texts (source and target languages). The different training tools are in charge of the data preparation and training, and provide a phrase-table with all the probabilities and extra information needed for translation [11]. The training data has to be large enough to allow efficient training. This is one of the difficulties of working with restaurant menus: since no reliable Spanish-English cuisine parallel corpus is available, we had to build ours manually. However, the database only needs food-related vocabulary and grammar, so a database of about 7,000 lines should be adequate. As Moses inputs are simple translated texts (not formatted), it was easy to find at least 70% of the data online, and the rest was added manually.
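Moses is trained from a sentence-aligned pair of plain-text files, one per language, with translation pairs on matching lines (the format is also described in Section 3.3). The short sketch below shows how such a food-related corpus could be written out; the file names and example entries are illustrative only.

# Write a line-aligned parallel corpus: line i of corpus.es pairs with line i of corpus.en.
pairs = [
    ("arroz a la cubana", "Cuba-style rice"),
    ("calamares a la romana", "battered fried squid"),
    ("menú del día", "fixed price menu"),
]

with open("corpus.es", "w", encoding="utf-8") as es, open("corpus.en", "w", encoding="utf-8") as en:
    for spanish, english in pairs:
        es.write(spanish + "\n")
        en.write(english + "\n")
# After the usual tokenization and lowercasing, these two files are what the Moses training scripts consume.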
3. PROPOSED METHOD

3.1. Multimedia diet management

Multimedia in diet management offers a great advantage over traditional methods. Images and Text-To-Speech (TTS), as used in our system, provide much needed guidance in a foreign country. Images can disambiguate the result of a translation or help the user verify whether an ingredient is one they wish to avoid, while TTS provides a much needed interpreter for the user in a foreign country and facilitates a dialogue between the native speaker and the foreign user.

We show examples of the use of images and TTS in Figure 5. We have outlined a scenario where a user with diabetes has been surprised by the inclusion of fried banana in the dish arroz a la cubana. The user is shown a list of dishes and ingredients when searching for arroz (rice) in Figure 5b. In this list of results we see red flags next to all dishes containing ingredients that the user has flagged. In this particular instance, the search returned no flagged ingredients, only flagged dishes. Similarly, in Figure 5c we have an ingredient, and the dishes containing this ingredient have been marked because they contain ingredients which the user has chosen to avoid. In Figure 5d the user has chosen a dish which contains a flagged ingredient. At this point the user can show a dialog with a translation explaining their medical condition or lifestyle choice and request to have the flagged ingredient(s) removed from their meal (Figure 5e). If the native speaker chooses, the TTS of the text can also be heard.

The scenario outlined above is made possible by utilizing a relational database. The database contains dishes, ingredients, images and the relationships between all of the aforementioned entities. The database also includes a list of medical conditions or lifestyles, and relationships between the conditions/lifestyles and the ingredients that affect the condition or lifestyle. When a user wishes to add a particular ingredient to a personalized diet, a one-to-one relationship is added to the flagging table, shown by the (NULL, 7) entry in Figure 2.

Fig. 2: Schema for the ingredients and conditions relationship.

3.2. Profile management

Before using the application, the user has to create a diet profile. A list of conditions/allergies/lifestyles is shown through the GUI, as well as a list of all the ingredients in the database. Figure 5a shows an example of the screen on an iPod. This is useful when the user is following a personalized diet. Once the options are selected, the database is flagged as explained in Section 3.1. Therefore, after a dish is selected, if it contains one of the ingredients in the user's profile, it appears flagged, and the user can decide whether to choose another dish or request to remove the flagged items.

Fig. 3: Block diagram for the Profile Configuration module.
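The structure described in Section 3.1 can be captured with a small relational schema. The sqlite3 sketch below is a hypothetical illustration of the dish/ingredient/condition tables and the flagging table; the table and column names are ours, not those of the actual database.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dish            (id INTEGER PRIMARY KEY, name_es TEXT, name_en TEXT, image TEXT);
CREATE TABLE ingredient      (id INTEGER PRIMARY KEY, name_es TEXT, name_en TEXT);
CREATE TABLE med_condition   (id INTEGER PRIMARY KEY, name TEXT);            -- medical condition or lifestyle
CREATE TABLE dish_ingredient (dish_id INTEGER, ingredient_id INTEGER);       -- which ingredients a dish contains
-- Flagging table: a NULL condition_id represents a personal, one-off flag on an ingredient,
-- analogous to the (NULL, 7) entry mentioned in Section 3.1.
CREATE TABLE flag            (condition_id INTEGER, ingredient_id INTEGER);
""")

# Query sketch: ingredients of a chosen dish that are flagged for the active condition or flagged personally.
flagged = conn.execute("""
    SELECT DISTINCT i.name_en
    FROM ingredient i
    JOIN dish_ingredient di ON di.ingredient_id = i.id
    JOIN flag f             ON f.ingredient_id  = i.id
    WHERE di.dish_id = ? AND (f.condition_id = ? OR f.condition_id IS NULL)
""", (1, 3)).fetchall()
print(flagged)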
3.3. Modifications to Moses

SMT engines work best with large databases, up to gigabytes of memory, which is too large for most hand-held devices. However, in our case we can reduce the size of the database by considering the context. First, the vocabulary can be focused on restaurant menu translations. Second, menu items are not usually sentences, but just phrases or simply a couple of words. For example, if the average Spanish sentence is 18 words long, the database can be reduced by up to 80%. Therefore, it is possible to make Moses work accurately using a small database. The only drawback is the work involved in creating this reduced-size database. Once the database is created, Moses has to be trained with it. To build the database, a clear understanding of the SMT paradigm is crucial [12].

The main idea is that the probability that a string e in the source language is the translation of a string f in the target language, p(e|f), can be written using Bayes' Theorem as p(e|f) = p(e)p(f|e)/p(f). Since p(f) does not depend on e, finding the estimate ê is the same as finding the e that makes the product p(e)p(f|e) as large as possible. This leads to the Fundamental Equation of Machine Translation: ê = arg max_e p(e)p(f|e). The system has to perform an exhaustive search by going through all the strings e in the native language. This is why the database has to be manipulated carefully so as to make the search easier and faster for the decoder. In addition, an n-gram model is used to approximate the language model p(e) [13]. Its objective is to predict the next item in a sequence of words. For example, the phrase calamares a la romana would produce the 3-grams calamares a la, a la romana and la romana #, plus the respective 1-grams and 2-grams. It is therefore important to determine the value of n that optimizes the results.

As Spanish dishes are not always literally translated to English (e.g., arroz a la cubana → rice with fried eggs and banana fritters), a logical equivalence is needed in order not to confuse the decoder. There are two possible solutions: 1) standardize a translation (e.g., arroz a la cubana → Cuba-style rice) or 2) work with multiple phrase-tables, some trained and some not. In this project both of them are used, depending on the complexity or the clarity of the translation.

Another aspect to bear in mind is the accuracy of the translation when more than one translation is possible. Our solution for these ambiguities is to use pattern repetition in the phrase-table to modify the probabilities of some words or phrases. For instance, if the phrase-table contains comida → food and comida → meal, comida will be translated as food with 0.5 probability, and the same will happen with meal. But if the phrase-table also contains comida rápida → fast food and comida basura → junk food, then p(comida → food) = 0.75 and p(comida → meal) = 0.25. In a similar way, training can be avoided for those phrases that can lead to confusion and decrease the translator's accuracy (i.e., Spanish items with little to no relation to their English form, like fixed price menu ↔ menú del día). They can be put in separate phrase-tables with a forced 1.0 probability. This way, they do not interfere with the training; at the same time, they still have a high enough probability to be eligible for a translation. On the other hand, if the decoder finds a similar structure (e.g., the plural fixed price menus), the string will not be recognized. Therefore, both singular and plural (and other forms) have to be manually added to the database.

The training database has to be split into two files: one containing the Spanish words/phrases (one per line), and the other containing the English words/phrases (one per line). The one-to-one databases are single files with both the Spanish and the English pairs (one per line) and their probability manually set to 1.0. The training files are used to build the n-gram language model with SRILM. Restaurant menu items do not usually have more than four words, and they can easily be split and translated separately if they are more complex. That is why the n-gram order does not have a great impact on training, and a 3-gram language model provides enough information. Once trained, the main database has its own automatically estimated weights.
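The pattern-repetition effect described above is essentially relative-frequency estimation over the phrase-table entries. The following sketch reproduces the comida example; the counting is deliberately simplified compared with how Moses actually scores phrases.

from collections import Counter

# Phrase-table entries whose Spanish side contains "comida" (toy example from the text).
entries = [("comida", "food"), ("comida", "meal"),
           ("comida rápida", "fast food"), ("comida basura", "junk food")]

counts = Counter()
for spanish, english in entries:
    if "comida" in spanish.split():
        # "food" appears on the English side of 3 of the 4 entries, "meal" on 1 of them.
        counts["food" if "food" in english.split() else "meal"] += 1

total = sum(counts.values())
print({target: count / total for target, count in counts.items()})   # {'food': 0.75, 'meal': 0.25}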
Moses is configured to provide an n-best list of results (multiple outputs), and the user can use his/her best judgment to determine the most appropriate result (or use the pictures shown on the device to obtain clarification from the waiter).

Increasing the database size with new dishes is one way to increase the accuracy, but there is another way to obtain better accuracy while maintaining or even reducing the size of the database. The idea is to match n-grams to their respective translations to increase the probability of success. Table 1 shows some examples of translations and their corresponding position indices, indicating word relationships. For instance, arroz (SPA 0) ↔ rice (ENG 1), and a la cubana (SPA 1,2,3) ↔ Cuba-style (ENG 0). Moses takes all the Spanish-English pairs and checks the possible combinations, depending on the n-gram order set during the training process. The most frequent combination in the phrase-tables is assigned the largest probability. However, as seen in Table 1, there are many different dishes containing the string ...a la..., each with different position indices.

Spanish              English            Position indices
arroz a la cubana    Cuba-style rice    0=1, 1=0, 2=0, 3=0
crema a la menta     mint cream         0=1, 1=0, 2=0, 3=0
pato a la naranja    duck à l'orange    0=0, 1=1, 2=2, 3=3
cordero a la miel    lamb with honey    0=0, 1=1, 2=1, 3=2

Table 1: Examples of position indices.

There is no defined structure for these cases, hence there is no defined translation for the Spanish string ...a la.... This problem could be solved by training the system with a very large database, forcing the phrase-table size to increase to gigabytes. As this is not an acceptable solution for hand-held devices, it is worth studying carefully how the overall system works, in order to reduce the possible combinations and thus increase the probability of a good translation.

We propose to reduce long n-grams by putting multiple words together, thus balancing the indices on both the Spanish and English sides. For example, instead of trying to find a general translation for ...a la..., it is better to concentrate the probability on known dishes that include the string ...a la.... For example, arroz aXlaXcubana ↔ Cuba-style rice. Therefore, a possible 4-gram (16 possible matches) is reduced to a simple balanced (0=1, 1=0) 2-gram (4 possible matches). By doing this, the 3-gram count in the database is reduced by 2.77% (now 1,018), while the 2-gram and 1-gram counts are increased by 4.01% (now 6,456) and 8.46% (now 1,475), respectively. This decreases the number of words in the database by 27.57% (now 17,527) and increases its number of lines/entries by 4.16% (now 5,533). This method, which we call "n-gram consolidation", uses special strings to join words, so they cannot be misinterpreted; a sketch of the idea is given below. Table 2 shows some default translations and their probabilities, for cases where a specific translation for the input has not been found in the database.

String                     Prob.        String                     Prob.
...de... ↔ ...of...        1.0          ...al... ↔ ...with...      0.5
...al... ↔ ...au...        0.5          ...aXla... ↔ ...with...    0.5
...aXla... ↔ ...à la...    0.5          ...en... ↔ ...in...        0.5
...en... ↔ ...with...      0.5          ...del... ↔ ...of the...   1.0

Table 2: Default translations and probabilities for version 2.0.
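A minimal sketch of the consolidation step, under the assumption that the phrases to join are listed explicitly: multi-word Spanish expressions are fused with a reserved separator ("X", as in the aXlaXcubana example) before training, so a long n-gram collapses into a short, balanced one. The phrase list and helper name are illustrative.

# Join known multi-word expressions with a reserved separator so they behave as single tokens.
CONSOLIDATE = ["a la cubana", "a la romana", "a la menta", "a la naranja"]   # illustrative list

def consolidate(line, phrases=CONSOLIDATE, sep="X"):
    """Replace each known phrase by a single joined token, e.g. 'a la cubana' -> 'aXlaXcubana'."""
    for phrase in phrases:
        line = line.replace(phrase, phrase.replace(" ", sep))
    return line

print(consolidate("arroz a la cubana"))      # arroz aXlaXcubana  (a 2-gram instead of a 4-gram)
print(consolidate("calamares a la romana"))  # calamares aXlaXromana

Applying the same joining consistently to the training pairs keeps the Spanish and English position indices balanced, which is what reduces the number of candidate alignments.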
4. EXPERIMENTAL RESULTS

We have analyzed the performance and storage bottlenecks (available data storage, memory (RAM) and processing power) of a mobile device. We tested the speed and accuracy of both the version with n-gram consolidation (v2.0) and the one without (v1.0). A 500-entry list of random Spanish restaurant items was used as input, and the output was evaluated by a Spanish speaker to determine its accuracy. Table 3 shows the results. The n-best-list option was used in v2.0, and n was set to 3. Most incorrect translations in v2.0 are due to words that do not exist in the database; the remaining errors included gender and number agreement errors. The same list was also tested using the Google Translate engine. Google Translate is not focused on any particular context, and its output can sometimes be literally correct, but incorrect in a food-related context. For example, andrajos, which is a Spanish kind of stew, is translated to rag. The general accuracy (i.e., when the correct translation is one among the first 3 best) of v2.0 is 86.8%, while that of v1.0 is 75%. When v2.0 found the correct translation, it was in the first position (i.e., the most likely translation) 95.8% of the time.

Fig. 4: Block diagram for the Translation Module.

Engine    Correct    1st    Incorrect    Accuracy    1st
v2.0      434        416    66           86.8%       83.2%
Google    365        -      135          73%         -
v1.0      375        -      125          75%         -

Table 3: Translation accuracy for the two versions of our system and for Google Translate.

We tested the speed of the method using various food-related items ranging from one to ten words, with a median of four words. We tested both items present in the phrase-tables and items absent from them. The computation time was around 0.5 s for all of them. The short computation time is only partly due to the small size of our database (17,527 words). For example, by replacing our database with the first 10,000 words from the Spanish-English WMT08 News Commentary database, the computation time increases to one second per expression on average (see Table 4).

Database           Phrase-table size    Speed
1/4% of WMT08      100,000 words        8.5 s
1/8% of WMT08      50,000 words         4.5 s
1/16% of WMT08     25,000 words         2.5 s
v2.0               17,527 words         0.5 s
1/40% of WMT08     10,000 words         1.0 s
1/160% of WMT08    2,500 words          0.0 s

Table 4: Translation speed comparison for various databases.

Our application requires 17.52 MB of physical memory, including the main executable, the language model file and the phrase-tables (i.e., without the images and ingredient list).

5. SYSTEM IMPLEMENTATION

We implemented our translation method on a second generation iPod Touch (ARM11 533 MHz, 128 MB DRAM). To do this, we first designed a relational database and used it to store images and ingredient lists along with relational information about Spanish dishes and their ingredients. Second, we created a semantic model and used it to represent the data during runtime. Third, we developed a parser to automate the population of the menu database with images and relational data. Finally, we developed a Graphical User Interface (GUI). The user's interaction with the GUI is bidirectional, since all the data is internally connected and the user can switch between dishes and ingredients and access additional information. Figure 5d illustrates an example of the dish browsing screen. Any ingredient may be tapped to obtain further information.

We implemented the n-gram consolidation version of our translation software (v2.0) with one trained phrase-table and six one-to-one phrase-tables. Moses gives the option to memory-map the language model (LM) and the phrase-tables, a recommended procedure for large data sets or devices with minimal RAM, which is the case with mobile devices. In other words, only the phrase-table pairs required to translate the input are loaded into memory.
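Memory mapping means the operating system pages in only the parts of a file that are actually read, rather than loading the whole table into RAM. The generic Python sketch below illustrates the idea with a hypothetical on-disk table; it is an analogy, not how Moses binarizes or accesses its phrase-tables.

import mmap

# Map a (hypothetical) binary phrase-table file without reading it entirely into memory.
with open("phrase-table.bin", "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as table:
        header = table[:64]   # only the pages touched by this read are brought into RAM by the OS
        print(len(header))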
We measured the CPU time needed to translate various menu items after loading the software. The translation times for the Spanish food item arroz a la cubana in five tests under the same conditions showed almost instantaneous results, with an average of 0.09 seconds. Similar times were obtained for different dishes and word combinations. The total memory size of the phrase-tables plus the LM file is 2.6 MB, and the size of the entire application on the iPod, including the executable and the image database, is 9.56 MB. This is less than the desktop version because the portable version does not need the SRILM tool; the Moses engine's internal LM is used instead. The database used for the tests contains 155 images of dishes and ingredients. The total size of the image database is 5.24 MB. Assuming linear growth, increasing the database to 1,000 images would increase the memory footprint to 37.82 MB, still a reasonable value; the translation speed would not be affected because it does not depend on the size of the image database.

6. CONCLUSIONS

We have proposed a system that can aid medical diet management in foreign countries. The system relies on the use of a hand-held portable device such as a mobile telephone or PDA to translate, interpret and disambiguate restaurant menu items in a diet-specific fashion. The profile management system allows the user to personalize a diet as made necessary by a medical condition or a lifestyle choice (e.g., vegetarianism). An accurate translation and interpretation of the restaurant menu item description is obtained in real-time using a context-specific Machine Translation (MT) engine. This MT engine was obtained by modifying an existing open source system. The modifications include the use of a context-specific database which provides an n-best list of possible translations, and a browsable multimedia database. In our tests, Google Translate yielded the correct translation 73% of the time. In contrast, our system output the correct translation in first position 83.2% of the time. Moreover, the correct translation was within the first three top-ranked translations 86.8% of the time. It would be possible to further increase the accuracy of our system using the proposed framework, but there is a trade-off between accuracy and the size of the translation tables.

Ambiguities and translation errors are mitigated through the use of a browsable database of pictures and ingredients along with disambiguation dialogues. A proof-of-concept system has been implemented in a non-network-dependent environment using a second generation iPod Touch. The real-time translation is fast (0.09 seconds on average) and the application has a memory size of 9.56 MB, including the multimedia database. The context-driven, food-related phrase-tables have reduced the size of the usual statistics-based database from several GB to a few MB. Our proposed "n-gram consolidation" step allows us to prune the database, which further decreases the memory requirements while increasing accuracy. One could further build on this proof-of-concept system to make it a tool for individuals with special diets by combining it with a database of nutritional information.

Fig. 5: Snapshots of our GUI on the iPod (panels (a)-(e)).

7. REFERENCES

[1] K. Patrick, W. G. Griswold, F. Raab, and S. S. Intille, "Health and the mobile phone," American Journal of Preventive Medicine, vol. 35, no. 2, pp. 177–181, August 2008.
[2] F. Zhu, M. Bosch, I. Woo, S. Kim, C. J. Boushey, D. S. Ebert, and E. J. Delp, "The use of mobile devices in aiding dietary assessment and evaluation," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 4, pp. 756–766, August 2010.

[3] B. Six, T. Schap, F. Zhu, A. Mariappan, M. Bosch, E. Delp, D. Ebert, D. Kerr, and C. Boushey, "Evidence-based development of a mobile telephone food record," Journal of the American Dietetic Association, pp. 74–79, January 2010.

[4] C. Callison-Burch, P. Koehn, C. Monz, and J. Schroeder, "Findings of the 2009 workshop on statistical machine translation," Proceedings of the Fourth Workshop on Statistical Machine Translation (StatMT '09), Stroudsburg, PA, USA, 2009, pp. 1–28.

[5] L. Mangu, E. Brill, and A. Stolcke, "Finding consensus in speech recognition: Word error minimization and other applications of confusion networks," Computer Speech and Language, vol. 14, no. 4, pp. 373–400, 2000.

[6] R. W. Tromble, S. Kumar, F. Och, and W. Macherey, "Lattice minimum Bayes-risk decoding for statistical machine translation," Proceedings of the Conference on Empirical Methods in Natural Language Processing, Stroudsburg, PA, USA, 2008, pp. 620–629.

[7] P. Koehn and H. Hoang, "Factored translation models," Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 868–876.

[8] F. J. Och and H. Ney, "A systematic comparison of various statistical alignment models," Computational Linguistics, vol. 29, pp. 19–51, March 2003.

[9] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst, "Moses: Open source toolkit for statistical machine translation," Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2007.

[10] C. Tillmann and H. Ney, "Word reordering and a dynamic programming beam search algorithm for statistical machine translation," Computational Linguistics, vol. 29, pp. 97–133, 2003.

[11] R. Zens and H. Ney, "Efficient phrase-table representation for machine translation with applications to online MT and speech translation," Proceedings of Human Language Technologies 2007, Rochester, New York, April 2007, pp. 492–499.

[12] P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, and R. L. Mercer, "The mathematics of statistical machine translation: Parameter estimation," Computational Linguistics, vol. 19, pp. 263–311, 1993.

[13] P. F. Brown, V. J. Della Pietra, P. V. deSouza, J. C. Lai, and R. L. Mercer, "Class-based n-gram models of natural language," Computational Linguistics, vol. 18, no. 4, pp. 467–479, 1992.