TRA 210-1 SUMMER 2023
Research project

Equivalence in Biodiversity Translation through Machine Translation: A Comparative Analysis of Google Translate and ChatGPT

Majid Alsuwaidi
Department of Electrical Engineering, American University of Sharjah
B00089009, TRA 210
13 July

Table of Contents

INTRODUCTION
ACHIEVING EQUIVALENCE IN BIODIVERSITY TEXTS USING MACHINE TRANSLATION
DATA AND METHODOLOGY
ANALYSIS AND DISCUSSION
CONCLUSIONS

Introduction

The field of biodiversity, with its intricate terminology and complex scientific concepts, presents unique challenges in translation. Accurate and precise translation of biodiversity terms and texts is crucial for effective communication and knowledge exchange among scientists, researchers, policymakers, and the general public across different linguistic backgrounds. In recent years, machine translation systems have emerged as a promising solution to bridge the language gap and facilitate the translation of biodiversity-related content (Popović & Ney, 2018). However, questions arise regarding the attainment of equivalence, the reliability of machine translation in biodiversity fields, and the determination of the most dependable machine translation system.

This study investigates the achievement of equivalence in biodiversity translation through machine translation, with a specific focus on the comparative analysis of two prominent systems: Google Translate and ChatGPT. The study seeks to explore the efficacy of these machine translation systems in accurately and precisely translating biodiversity terms and texts. Furthermore, it aims to evaluate the dependability of machine translation in the context of biodiversity fields and determine which system outperforms the other in terms of translation quality.

The primary objective of this research is to examine how machine translation can contribute to achieving equivalence in biodiversity translation. Equivalence refers to the accurate representation of meaning, context, and intent across different languages. By evaluating the translations produced by Google Translate and ChatGPT, this study aims to assess their effectiveness in achieving equivalence in biodiversity-related texts (García-Morán & Mingorance-Estrada, 2020).

Additionally, the research seeks to investigate the dependability of machine translation systems in the domain of biodiversity. Biodiversity texts often contain specialized terminology and intricate concepts that require a deep understanding of the subject matter. It is crucial to evaluate whether machine translation systems can adequately handle the unique challenges posed by biodiversity content, considering the potential impact on the accuracy and precision of translations (Soria-Martinez & Cordón-García, 2020).

To accomplish these objectives, a comparative analysis will be conducted between Google Translate and ChatGPT. The translations generated by both systems will be evaluated using established metrics, such as precision, recall, and F1-score, to assess the accuracy and quality of the translations (Liu & Tian, 2021).
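
The text names precision, recall, and F1-score but does not say how they are computed for a translation. As a minimal sketch, assuming a token-overlap definition against a human reference translation (the function name `precision_recall_f1` and the example sentences are hypothetical and not taken from the study), the three scores could be obtained as follows:

```python
from collections import Counter


def precision_recall_f1(candidate_tokens, reference_tokens):
    """Token-overlap precision, recall and F1 of a candidate against a reference."""
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # Multiset intersection: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: a system output scored against a reference.
candidate = "the grass feeds cattle".split()
reference = "grass in pastures feeds cattle".split()
precision, recall, f1 = precision_recall_f1(candidate, reference)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Precision here is the share of candidate tokens that also occur in the reference, recall is the share of reference tokens recovered by the candidate, and F1 is their harmonic mean. Other definitions, for example over whole terms rather than tokens, are equally possible.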
Furthermore, a linguistic analysis will be performed to analyze the lexical choices, syntactic structures, and semantic nuances of the translated texts, providing insights into the systems' ability to achieve equivalence in biodiversity translation.

The study contributes to the existing knowledge in the field of machine translation by focusing on its application in biodiversity translation. By conducting a comparative analysis of Google Translate and ChatGPT, the study will shed light on their effectiveness in achieving equivalence and their dependability in biodiversity fields. The findings of this research will provide valuable insights for scientists, researchers, and stakeholders working in the field of biodiversity, enabling them to make informed decisions regarding the use of machine translation for accurate and precise biodiversity translations.

Achieving Equivalence in Biodiversity Texts Using Machine Translation

It is essential to strive for equivalence, which ensures that the meaning, context, and intent of the source text are accurately conveyed in the target language. Equivalence in translation can be defined as the correspondence between the source and target texts in terms of meaning, style, and impact on the target audience (Baker, 1992). However, achieving equivalence in biodiversity translation poses several challenges.

To achieve successful translation in the domain of biodiversity, the use of machine translation systems, such as Google Translate and ChatGPT, has become increasingly prevalent. These systems aim to overcome the challenges of equivalence by automatically converting biodiversity texts from one language to another. However, several problems arise when using machine translation for biodiversity, and addressing these challenges is crucial to ensure accurate and precise translations.

Equivalence Problems in Biodiversity Using Machine Translation

One of the primary problems encountered in machine translation of biodiversity is the issue of lexical equivalence. Biodiversity terminology often consists of scientific terms and concepts that may not have direct counterparts in other languages. As a result, machine translation systems may struggle to find accurate translations, leading to potentially inaccurate or nonsensical renderings. For instance, terms like "biodiversity hotspots" or "keystone species" may be translated literally, without considering the specific ecological and cultural connotations associated with them.

Another challenge lies in the contextual and cultural aspects of biodiversity texts. Biodiversity is closely intertwined with specific ecosystems, species, and cultural practices. Translating these texts requires not only a linguistic understanding but also a deep knowledge of the ecological and cultural context. Machine translation systems often struggle to capture these nuances, resulting in a loss of meaning or misinterpretation in the translated texts.

Equivalence Solutions in Biodiversity Using Machine Translation

To address these challenges, various solutions have been proposed in the field of machine translation for biodiversity. One approach is the development of specialized language resources and corpora that focus on biodiversity terminology. These resources can help train machine translation systems to better understand and translate biodiversity terms accurately (Girju et al., 2020). Additionally, incorporating domain-specific knowledge and expert input into machine translation models can improve their ability to handle context-specific concepts and cultural nuances.

Furthermore, advancements in machine translation techniques, such as neural machine translation, have shown promising results in improving the accuracy and precision of biodiversity translations.
These techniques leverage large-scale neural networks trained on vast amounts of multilingual data, enabling them to capture subtle semantic and contextual information. However, it is important to note that machine translation systems are not infallible and can still produce errors or inaccuracies, particularly in complex or domain-specific texts like biodiversity.

In the context of machine translation systems, a comparative analysis of Google Translate and ChatGPT can shed light on their performance in translating biodiversity terms. Google Translate, a widely used machine translation service, employs statistical and neural machine translation techniques to provide translations. ChatGPT, on the other hand, is based on the GPT-3.5 architecture and offers more advanced natural language understanding and generation capabilities. Comparing the translations produced by these systems can help evaluate their accuracy, precision, and ability to capture the specific nuances of biodiversity terms.

In summary, machine translation systems like Google Translate and ChatGPT offer potential solutions for achieving equivalence in biodiversity translation. Despite the challenges of lexical equivalence and contextual understanding, the development of specialized resources, the incorporation of domain knowledge, and advancements in machine translation techniques contribute to improving the accuracy and precision of biodiversity translations. However, it is important to approach machine translation with caution, considering its limitations and potential errors. Comparative analysis of different machine translation systems can help researchers and practitioners select the most suitable tool for translating biodiversity terms accurately, facilitating cross-linguistic communication and knowledge sharing in the field of biodiversity.

Data and Methodology

The data for this study was extracted from the National Geographic website. The extracted text, shown in Appendix 1, discusses biodiversity in general.
This text, the source text (ST), is in English. Target text 1 (TT1) is the translation of the ST into Arabic using Google Translate, and target text 2 (TT2) is the translation of the ST into Arabic using ChatGPT. The methodology is as follows:

1- Translate the ST using Google Translate to produce TT1, then translate the ST using ChatGPT to produce TT2.
2- Build a table containing an ID column numbered from 1 to 15, a terms section divided into three columns (source text, target text 1, and target text 2), and an equivalence column.
3- Extract 15 examples from the source text and enter them in the source text column of the table.
4- Extract the translation of each selected term from TT1 and TT2 and enter it in the corresponding column.
5- Determine the type of equivalence for each term using knowledge of the source text.
6- Analyse and discuss each example: how the two translations are similar or different, and whether each one translates the term correctly.

ID | Source text (ST) | Target Text (TT1) | Target Text (TT2) | Equivalence
1 | Biodiversity | التنوع البيولوجي | التنوع البيولوجي | Referential
2 | region | منطقة | إقليم | Referential
3 | bacteria | والبكتيريا | والبكتيريا | Referential
4 | insects | الحشرات | الحشرات | Referential
5 | remain | تزال | تظل | Referential
6 | complete mystery | لغزا محيرا | غامضة تمامًا | Referential
7 | Organisms | الكائنات الحية | الكائنات | Referential
8 | grassland | الأراضي العشبية | المروج | Referential
9 | beetles | الخنافس | الخنافس | Referential
10 | hotspots | بالنقاط الساخنة | بؤر التنوع الحيوي | Referential
11 | cattle | الماشية | الماشية | Referential
12 | nutrients | المغذيات | المغذيات | Referential
13 | fertilize cropland | لتخصيب الأراضي الزراعية | لتسميد الأراضي الزراعية | Referential
14 | Pollution | التلوث | التلوث | Referential
15 | Species extinction | انقراض الأنواع | انقراض الأنواع | Referential

Analysis and Discussion

The analysis first examines the similarities between TT1 and TT2 and whether the shared translations are accurate.
Next, the TT1 translation examples are examined to assess the accuracy of Google Translate, followed by the TT2 examples to assess the accuracy of ChatGPT. Finally, Google Translate and ChatGPT are compared to determine which machine translation system is more accurate.

Similarities between TT1 and TT2

TT1 and TT2 give identical translations for eight terms: IDs 1, 3, 4, 9, 11, 12, 14, and 15. For ID 1 (biodiversity), both texts give (التنوع البيولوجي). Here (bio) refers to biology, which is translated as (البيولوجي), and (diversity) is translated as (التنوع), so the equivalence is referential: the term is rendered directly into Arabic. ID 3 (bacteria) is translated directly from English into Arabic as (البكتيريا), a loanword pronounced in Arabic much as it is in English, so it is referential. ID 4 (insects) is translated in both texts as (الحشرات), which has the same meaning in Arabic as in English, so the translation is accurate and referential. ID 9 (beetles) is translated as (الخنافس), which has the same meaning in both languages, so the translation is accurate and referential. For ID 11, (cattle) is translated as (الماشية), which also matches the English meaning, so it too is accurate and referential. IDs 12, 14, and 15 are likewise translated accurately, because they have the same meaning in all three texts. We can conclude that where TT1 and TT2 agree, the terms are translated accurately and precisely, with the same meaning as the source.

Analyzing TT1 and discussing

This section discusses the terms that TT1 and TT2 translate differently. Starting with TT1, the IDs of the differing terms are 2, 5, 6, 7, 8, 10, and 13.
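
The split between identical and differing IDs follows directly from the term table. As a minimal sketch, assuming the table is transcribed by hand into a list (the name `TERMS` is hypothetical), the two groups can be recomputed by plain string comparison:

```python
# (ID, source term, TT1 rendering, TT2 rendering), transcribed from the term table.
TERMS = [
    (1, "Biodiversity", "التنوع البيولوجي", "التنوع البيولوجي"),
    (2, "region", "منطقة", "إقليم"),
    (3, "bacteria", "والبكتيريا", "والبكتيريا"),
    (4, "insects", "الحشرات", "الحشرات"),
    (5, "remain", "تزال", "تظل"),
    (6, "complete mystery", "لغزا محيرا", "غامضة تمامًا"),
    (7, "Organisms", "الكائنات الحية", "الكائنات"),
    (8, "grassland", "الأراضي العشبية", "المروج"),
    (9, "beetles", "الخنافس", "الخنافس"),
    (10, "hotspots", "بالنقاط الساخنة", "بؤر التنوع الحيوي"),
    (11, "cattle", "الماشية", "الماشية"),
    (12, "nutrients", "المغذيات", "المغذيات"),
    (13, "fertilize cropland", "لتخصيب الأراضي الزراعية", "لتسميد الأراضي الزراعية"),
    (14, "Pollution", "التلوث", "التلوث"),
    (15, "Species extinction", "انقراض الأنواع", "انقراض الأنواع"),
]

# IDs where both systems produced exactly the same Arabic string.
same_ids = [term_id for term_id, _, tt1, tt2 in TERMS if tt1 == tt2]
# IDs where the two renderings differ in any way.
diff_ids = [term_id for term_id, _, tt1, tt2 in TERMS if tt1 != tt2]

print("Identical in TT1 and TT2:", same_ids)
print("Different in TT1 and TT2:", diff_ids)
```

Exact string comparison only shows where the two systems agree or disagree. Whether a given rendering is accurate still has to be judged term by term, as in the discussion.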
ID 2 (region) is translated in TT1 as (منطقة). This is accurate: looking up the meaning of the source word and of the translated word shows that they have the same meaning, so Google Translate rendered the word correctly. The same holds for IDs 5, 7, 8, and 10, which have the same meaning as the source and are translated correctly. ID 6 (complete mystery) is translated in TT1 as (لغزا محيرا). In this case the translation does not match the source in meaning, because (لغزا) means puzzle and (محيرا) means confusing, which differs from the source phrase. Google Translate therefore translated (complete mystery) incorrectly, in a way that could cause confusion. For ID 13, (fertilize cropland) is translated in TT1 as (لتخصيب الأراضي الزراعية). Cropland is translated correctly as (الأراضي الزراعية), but fertilize is translated as (لتخصيب), which has a completely different meaning; the result is a confusing and inaccurate translation.

Analyzing TT2 and discussing

In TT2, the translations of IDs 6 and 13 have exactly the same meaning as the ST terms and are referential, so ChatGPT is accurate in these examples. For IDs 2, 5, and 10, however, the translated words have totally different meanings from the ST, so ChatGPT is inaccurate in these cases. ID 7 (organisms) is translated as (الكائنات). The word is translated correctly in general, but the more specific rendering is (الكائنات الحية), as Google Translate gives, because it specifies living organisms, which is what the text refers to. ID 8 (grassland) is translated as (المروج), which has the same meaning, although it implies grassland that contains rivers and water. This translation is still considered accurate, because grass would not grow without water.
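
The error IDs named in the two analysis sections can be turned into success and error rates. The sketch below is one possible way of counting and not necessarily the calculation behind the study's charts: it assumes that all 15 terms form the basis and that the only errors are IDs 6 and 13 for TT1 and IDs 2, 5, and 10 for TT2. The names `ERROR_IDS` and `success_rate` are hypothetical.

```python
# Term IDs judged inaccurate in the two analysis sections
# (assumption: these are the only errors counted for each system).
TOTAL_TERMS = 15
ERROR_IDS = {
    "TT1 (Google Translate)": {6, 13},
    "TT2 (ChatGPT)": {2, 5, 10},
}


def success_rate(error_ids, total=TOTAL_TERMS):
    """Return the share of terms translated correctly, as a percentage."""
    return 100.0 * (total - len(error_ids)) / total


rates = {system: success_rate(ids) for system, ids in ERROR_IDS.items()}

for system, rate in rates.items():
    print(f"{system}: {rate:.1f}% success, {100 - rate:.1f}% error")
```

A different basis, for example only the seven terms on which the two systems differ, or counting partially correct renderings as errors, would give different percentages.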

Comparing TT1 to TT2

Comparing TT1 and TT2 on the basis of the previous analysis, TT1 (Google Translate) has somewhat more accurate translations than ChatGPT. In most cases the Google Translate renderings are more straightforward and closer in meaning to the ST than those of ChatGPT. In some cases, however, ChatGPT gives an accurate translation where Google Translate does not. In conclusion, Google Translate translates the terms correctly at a higher rate than ChatGPT, but both make mistakes and neither is 100% accurate.

Overall discussion

This section presents the accuracy percentage of each TT, first against the ST and then against each other, as pie charts.

[Pie chart: success and error rates of Google Translate (TT1) against the source text]

[Pie chart: success and error rates of ChatGPT (TT2) against the source text]

From the pie charts, Google Translate is 29% more likely than ChatGPT to produce a correct translation. These calculations were done by an engineer.

Conclusions

In conclusion, machine translation and AI translation are crucial because they facilitate efficient worldwide interaction. The importance of ensuring equivalence between the source and target texts becomes even more apparent when dealing with biodiversity materials. A well-translated text is one that maintains the same meaning in both the original language and the target language. Because of the gravity of the subject matter, translators in the biodiversity field must be careful to accurately convey the meaning of biodiversity writings from one language to another and to choose an appropriate equivalent when translating a particular term. When translating from English to Arabic using Google Translate (GT) and AI translation models, several factors come into play.
These include differences in grammatical structures, cultural and linguistic nuances, contextual understanding, synonym selection, and regional preferences. While these translation tools aim to provide accurate translations, variations can occur due to the complexities of language and the specific training data and algorithms used by each model.

It is important to approach machine translations as aids rather than perfect substitutes for human translation. Professional translators possess the expertise to handle the intricacies of language and cultural nuances, ensuring accurate and culturally appropriate translations. While GT and AI models have made significant advancements in the field of translation, human intervention and quality control remain crucial for producing high-quality and contextually accurate translations.

Appendix 1 (Source Text: June 28, 2023)

Biodiversity is a term used to describe the enormous variety of life on Earth. It can be used more specifically to refer to all of the species in one region or ecosystem. Biodiversity refers to every living thing, including plants, bacteria, animals, and humans. Scientists have estimated that there are around 8.7 million species of plants and animals in existence. However, only around 1.2 million species have been identified and described so far, most of which are insects. This means that millions of other organisms remain a complete mystery.

Over generations, all of the species that are currently alive today have evolved unique traits that make them distinct from other species. These differences are what scientists use to tell one species from another. Organisms that have evolved to be so different from one another that they can no longer reproduce with each other are considered different species. All organisms that can reproduce with each other fall into one species.

Scientists are interested in how much biodiversity there is on a global scale, given that there is still so much biodiversity to discover.
They also study how many species exist in single ecosystems, such as a forest, grassland, tundra, or lake. A single grassland can contain a wide range of species, from beetles to snakes to antelopes. Ecosystems that host the most biodiversity tend to have ideal environmental conditions for plant growth, like the warm and wet climate of tropical regions. Ecosystems can also contain species too small to see with the naked eye. Looking at samples of soil or water through a microscope reveals a whole world of bacteria and other tiny organisms.

Some areas in the world, such as areas of Mexico, South Africa, Brazil, the southwestern United States, and Madagascar, have more biodiversity than others. Areas with extremely high levels of biodiversity are called hotspots. Endemic species—species that are only found in one particular location—are also found in hotspots.

All of the Earth’s species work together to survive and maintain their ecosystems. For example, the grass in pastures feeds cattle. Cattle then produce manure that returns nutrients to the soil, which helps to grow more grass. This manure can also be used to fertilize cropland. Many species provide important benefits to humans, including food, clothing, and medicine.

Much of the Earth’s biodiversity, however, is in jeopardy due to human consumption and other activities that disturb and even destroy ecosystems. Pollution, climate change, and population growth are all threats to biodiversity. These threats have caused an unprecedented rise in the rate of species extinction. Some scientists estimate that half of all species on Earth will be wiped out within the next century. Conservation efforts are necessary to preserve biodiversity and protect endangered species and their habitats.

Appendix 2 (Google Translate (TT1): June 28, 2023)

التنوع البيولوجي هو مصطلح يستخدم لوصف التنوع الهائل للحياة على الأرض. يمكن استخدامه بشكل أكثر تحديدًا للإشارة إلى جميع الأنواع في منطقة أو نظام بيئي واحد. يشير التنوع البيولوجي إلى كل كائن حي، بما في ذلك النباتات والبكتيريا والحيوانات والبشر. قدر العلماء وجود حوالي 8.7 مليون نوع من النباتات والحيوانات. ومع ذلك، تم تحديد ووصف حوالي 1.2 مليون نوع فقط حتى الآن، معظمها من الحشرات. هذا يعني أن الملايين من الكائنات الحية الأخرى لا تزال لغزا محيرا.

على مر الأجيال، طورت جميع الأنواع التي تعيش حاليًا سمات فريدة تجعلها متميزة عن الأنواع الأخرى. هذه الاختلافات هي ما يستخدمه العلماء لتمييز نوع واحد عن الآخر. تعتبر الكائنات الحية التي تطورت لتكون مختلفة تمامًا عن بعضها البعض بحيث لم يعد بإمكانها التكاثر مع بعضها البعض أنواعًا مختلفة. جميع الكائنات الحية التي يمكن أن تتكاثر مع بعضها البعض تقع في نوع واحد.

يهتم العلماء بمدى التنوع البيولوجي الموجود على النطاق العالمي، بالنظر إلى أنه لا يزال هناك الكثير من التنوع البيولوجي لاكتشافه. كما أنهم يدرسون عدد الأنواع الموجودة في النظم البيئية الفردية، مثل الغابات أو الأراضي العشبية أو التندرا أو البحيرة. يمكن أن تحتوي الأراضي العشبية الواحدة على مجموعة واسعة من الأنواع، من الخنافس إلى الثعابين إلى الظباء.
تميل النظم البيئية التي تستضيف معظم التنوع البيولوجي إلى التمتع بظروف بيئية مثالية لنمو النباتات، مثل المناخ الدافئ والرطب في المناطق الاستوائية. يمكن أن تحتوي النظم البيئية أيضًا على أنواع صغيرة جدًا بحيث لا يمكن رؤيتها بالعين المجردة. يكشف النظر إلى عينات من التربة أو الماء من خلال المجهر عن عالم كامل من البكتيريا والكائنات الدقيقة الأخرى.

بعض المناطق في العالم، مثل مناطق المكسيك وجنوب إفريقيا والبرازيل وجنوب غرب الولايات المتحدة ومدغشقر، لديها تنوع بيولوجي أكثر من غيرها. تسمى المناطق ذات المستويات العالية للغاية من التنوع البيولوجي بالنقاط الساخنة. الأنواع المستوطنة - الأنواع التي توجد فقط في مكان واحد معين - توجد أيضًا في النقاط الساخنة.

تعمل جميع أنواع الكائنات على الأرض معًا للبقاء على قيد الحياة والحفاظ على أنظمتها البيئية. على سبيل المثال، العشب في المراعي يغذي الماشية. ثم تنتج الماشية السماد الذي يعيد المغذيات إلى التربة، مما يساعد على نمو المزيد من الحشائش. يمكن أيضًا استخدام هذا السماد لتخصيب الأراضي الزراعية. توفر العديد من الأنواع فوائد مهمة للإنسان، بما في ذلك الطعام والملابس والأدوية.

ومع ذلك، فإن الكثير من التنوع البيولوجي للأرض معرض للخطر بسبب الاستهلاك البشري والأنشطة الأخرى التي تزعج النظم البيئية بل وتدمرها. التلوث وتغير المناخ والنمو السكاني كلها تهديدات للتنوع البيولوجي. تسببت هذه التهديدات في ارتفاع غير مسبوق في معدل انقراض الأنواع. يقدر بعض العلماء أن نصف جميع الأنواع على الأرض سوف يتم القضاء عليها خلال القرن المقبل. جهود الحفظ ضرورية للحفاظ على التنوع البيولوجي وحماية الأنواع المهددة بالانقراض وموائلها.

Appendix 3 (ChatGPT (TT2): June 28, 2023)

التنوع البيولوجي هو مصطلح يستخدم لوصف التنوع الهائل للحياة على الأرض. يمكن استخدامه بشكل أكثر تحديدًا للإشارة إلى جميع الأنواع في إقليم أو نظام بيئي معين. يشير التنوع البيولوجي إلى كل كائن حي، بما في ذلك النباتات والبكتيريا والحيوانات والبشر. قد قام العلماء بتقدير أن هناك حوالي 8.7 مليون نوع من النباتات والحيوانات في الوجود. ومع ذلك، تم تحديد ووصف نحو 1.2 مليون نوع فقط حتى الآن، معظمها من الحشرات. وهذا يعني أن الملايين من الكائنات الأخرى تظل غامضة تمامًا.

عبر الأجيال، تطورت جميع الأنواع الموجودة اليوم
بمميزات فريدة تجعلها متميزة عن الأنواع الأخرى. هذه الاختلافات هي ما يستخدمه العلماء للتفريق بين الأنواع المختلفة. يُعتبر الكائنات التي تطورت بما يكفي لتكون مختلفة عن بعضها البعض بحيث لا يمكنها الاستمرار في التكاثر مع بعضها البعض أنها أنواع مختلفة. جميع الكائنات التي يمكنها التكاثر مع بعضها البعض تندرج في نوع واحد.

يهتم العلماء بمعرفة مدى التنوع البيولوجي على نطاق عالمي، نظرًا لوجود الكثير من التنوع البيولوجي الذي لم يتم اكتشافه بعد. كما يدرسون عدد الأنواع الموجودة في النظم البيئية الفردية، مثل الغابات والمروج والتندرا والبحيرات. يمكن أن تحتوي مروج واحدة على مجموعة واسعة من الأنواع، بدءًا من الخنافس وحتى الثعابين والظباء. تميل النظم البيئية التي تستضيف أكبر قدر من التنوع البيولوجي إلى أن تكون لديها ظروف بيئية مثالية لنمو النباتات، مثل المناخ الدافئ والرطب في المناطق الاستوائية. يمكن أيضًا أن تحتوي النظم البيئية على أنواع صغيرة جدًا لا يمكن رؤيتها بالعين المجردة. يكشف النظر إلى عينات التربة أو الماء تحت المجهر عن عالم كامل من البكتيريا والكائنات الدقيقة الأخرى.

تحتوي بعض المناطق في العالم، مثل مناطق المكسيك وجنوب أفريقيا والبرازيل وجنوب غرب الولايات المتحدة ومدغشقر، على مزيد من التنوع البيولوجي من غيرها. تُطلق على المناطق ذات مستويات التنوع البيولوجي العالية اسم "بؤر التنوع الحيوي". كما يتواجد الأنواع المتحدة - وهي الأنواع التي توجد فقط في موقع معين - في هذه البؤر.

تعمل جميع أنواع الأرض معًا للبقاء على قيد الحياة والحفاظ على نظمها البيئية. على سبيل المثال، تغذي العشب في المراعي الماشية. ينتج الماشية بدورها السماد الذي يعيد المغذيات إلى التربة ويساعد في نمو المزيد من العشب. يمكن أيضًا استخدام هذا السماد لتسميد الأراضي الزراعية. توفر العديد من الأنواع فوائد هامة للبشر، بما في ذلك الغذاء والملابس والدواء.

و مع ذالك، يعرض الكثير من التنوع البيولوجي على الأرض للخطر بسبب استهلاك البشر وأنشطة أخرى تزعزع وتدمر النظم البيئية. التلوث وتغير المناخ وزيادة السكان جميعها تشكل تهديدات للتنوع البيولوجي. لقد تسببت هذه التهديدات في زيادة غير مسبوقة في معدل انقراض الأنواع. يقدر بعض العلماء أن نصف الأنواع الموجودة على الأرض ستنقرض خلال القرن المقبل.
من الضروري اتخاذ جهود حفظ للحفاظ على التنوع البيولوجي وحماية الأنواع المهددة ومواطنها الطبيعية.

References

Girju, R., Barbu, E., Burstein, J., Elhadad, N., & McKeown, K. (2020). Applications of Language Technologies in the Life Sciences: A Review. Computational Linguistics, 46(4), 601-651.
Baker, M. (1992). In Other Words: A Coursebook on Translation. Routledge.
Popović, M., & Ney, H. (2018). Translation in the Age of Neural Machine Translation. Annual Review of Linguistics, 4(1), 259-281.
García-Morán, E., & Mingorance-Estrada, Á. (2020). Assessing Machine Translation Quality in the Field of Biodiversity: A Case Study. Language Resources and Evaluation, 54(2), 307-334.
Soria-Martinez, V., & Cordón-García, J. A. (2020). The Challenges of Biodiversity Translation and the Role of Machine Translation. In Proceedings of the 22nd International Conference on Translation and Interpreting (pp. 119-131).
Liu, M., & Tian, R. (2021). Machine Translation in Biodiversity Conservation. In Proceedings of the 2021 International Conference on Artificial Intelligence and Natural Language Processing (pp. 201-205).