Title: Thai Language and Software Development
Tracking ID: C8RTH11
Language: Thai
Skill: Reading
Proficiency Level: 2+/3
Functional Objective: Demonstrate your comprehension
Topic: Science/Technology

Prompts, Model responses, Hints

Prompt: What is ปัญญาประดิษฐ์, what branches of knowledge does it involve, and why is it important?
Model response: ปัญญาประดิษฐ์, or artificial intelligence (AI), is a form of intelligence created for non-living things that imitates the thinking and learning processes, adaptation abilities, and functionality of the human brain. It is one of the branches of computer science and engineering, and involves different fields of knowledge such as computer science, engineering, psychology, philosophy, and biology. AI is starting to play an important role in our everyday life, and it will be an integral part of our life in the future.
Hint: Re-read the beginning of the text and consider the examples of an intelligent washing machine, an intelligent air conditioner, or an intelligent car. How do they work? Read the Notes for background information.

Prompt: What main challenge in Thai natural language processing is discussed?
Model response: The main challenge is that the Thai language is written continuously, without separating words or sentences. Any one word can be divided in more than one way. Therefore, developing software that divides Thai words requires a huge database, with many millions of words covering every type of writing form, to be used in training the software. Standards are also needed for comparing the capacity of the software to divide Thai words in the different styles that have been developed.
Hint: Is there a definition of 'word' in Thai? What problems does the author point out with mixed words or words borrowed from foreign languages, words with multiple meanings, names, or all kinds of slang?

Prompt: How does the author evaluate previous research?
Model response: The author points out that many organizations have developed software for processing Thai words, but there has been a lack of communication and coordination among organizations, and no centralized standards to follow. New researchers usually start their research from scratch, which causes duplication in research. Moreover, research results have not been 90% reliable.
Hint: What were the reasons for the National Electronics and Computer Technology Center (NECTEC) to arrange the competition for software to process Thai words? Read the Learn More section for additional information.

Prompt: What is the aim of the contest known as BEST (Benchmark for Enhancing the Standard of Thai language processing)?
Model response: The results of the contest are expected to solve the crucial problem of Thai word segmentation and to create a much-needed standard in the field.
Hint: Note what has been the topic of the National Competition in the Development of Computer Programs in Thailand 11 times.

Prompt: What will be the significance of creating the standard for Thai language processing?
Model response: Creating the standard for Thai language processing will lay an important foundation for future research at a higher level, enabling the development of advanced software, including artificial intelligence.
Hint: Consider its importance for creating artificial intelligence.
LEARN MORE

Category: Background Information

1. Natural language processing is a subfield of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural-language-generation systems convert information from computer databases into normal-sounding human language. Natural-language-understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate. (http://en.wikipedia.org/wiki/Natural_language_processing)

2. Human Language Technology Laboratory (HLT), or หน่วยปฏิบัติการวิจัยวิทยาการมนุษยภาษา, is under NECTEC (National Electronics and Computer Technology Center). Its primary objective is the facilitation of human-machine and human-human communication, such as machine translation, information retrieval, and speech processing. Some of the research areas and products of HLT are:
- PARSIT (http://www.suparsit.com): an English-Thai machine translation system offering free online translation services.
- LEXITRON (http://lexitron.nectec.or.th): an electronic English-Thai and Thai-English dictionary offering both an online service and an offline application.
- SANSARN (http://www.sansarn.com): an intelligent search engine tailored to Thai documents.
For more information, go to http://www.nectec.or.th/2008/r-d/hlt.html

Category: Vocabulary

There is an abundance of specific lexicon related to computer technology.
Some terms are newly created, others are borrowed:
- หน่วยปฏิบัติการวิจัยวิทยาการมนุษยภาษา -- Human Language Technology Laboratory
- ระบบในการรับรู้และสามารถตอบสนองได้เอง -- unsupervised response
- ซอฟต์แวร์ -- software
- การแบ่งคำภาษาไทย -- segmentation in Thai
- ปัญญาประดิษฐ์ / ความฉลาดเทียม -- artificial intelligence
- การประมวลผลภาษาธรรมชาติ -- natural language processing
- การแปลงเอกสารให้เป็นเสียงพูด -- text to speech
- แปลงเสียงพูดให้เป็นเอกสาร -- speech recognition (to text)
- แอนิเมทรอนิก รูมเมท -- Animatronic Roommate (an interactive robot that provides information from a built-in encyclopedia)
- หุ่นยนต์ -- robot
- เอพริโปโกะ -- ApriPoko (a remote controller companion robot from Toshiba)
- การคลังข้อมูลขนาดใหญ่ -- large text corpora
- ควบคุมด้วยรีโมต -- remote control
- การวัดเปรียบเทียบสมรรถนะ -- benchmark
- . . . ฉลาด -- smart . . .
Items -- Explanations
- การตัดบรรทัด -- line break
- การตรวจคำผิด -- spell check
- คำประสม -- compound, complex, or blended word
- คำทับศัพท์ -- loan or transliterated word
- คำแสลง (also คำสแลง) -- slang
- คลังข้อมูล -- word corpus
- ต่อยอด -- advance

Content
* This is a report on a lecture given by a computer scientist.
Notes
* In recent years, Thailand's multimedia industry has grown to cover a wide range of digital content, namely animation, Web and graphics design, video and mobile games, and software. However, it has been a challenge to develop software to process the Thai language. Like Chinese, Thai is a tonal language. Like several other Asian languages, such as Japanese and Chinese, it has no explicit word boundaries. It does have explicit marks for tones, as in the languages of the neighboring countries, Laos and Vietnam. Therefore, with these unique characteristics, research and development of language and speech processing specifically for Thai is necessary and quite challenging. (http://cat.inist.fr/?aModele=afficheN&cpsidt=18436876)

Time
* 45 minutes

Keywords
* computer, software, artificial intelligence, robot, language, technology, processing, research

Challenges
* Cultural Knowledge, Vocabulary
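
Illustration: the segmentation challenge described in the model responses and Notes (text written without word boundaries, divisible in more than one way) can be sketched in a few lines of Python. This is only a toy illustration, not the method used by NECTEC or by any BEST contestant. The four-word lexicon and the sample string ตากลม are assumptions chosen for the example; the text itself says real systems need databases of many millions of words.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment an empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical toy lexicon (an assumption for this example only).
LEXICON = {"ตา", "กลม", "ตาก", "ลม"}

for words in segmentations("ตากลม", LEXICON):
    print("|".join(words))
```

The same five letters can be read as ตา|กลม ("round eyes") or ตาก|ลม ("exposed to the wind"). A dictionary alone cannot choose between them, which is one reason large training corpora and a shared benchmark matter.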