Title: Thai Language and Software Development
Tracking ID: C8RTH11
Language: Thai
Skill: Reading
Proficiency Level: 2+/3
Functional Objective: Demonstrate your comprehension
Topic: Science/Technology
Prompts
Model responses
Hints
Prompt: What is ปัญญาประดิษฐ์, what branches of knowledge does it involve, and why is it important?
Model response: ปัญญาประดิษฐ์, or artificial intelligence (AI), is a form of intelligence created for non-living things that imitates the thinking and learning processes, adaptation abilities, and the functionality of the human brain. It is one of the branches of computer science and engineering, and involves different fields of knowledge such as computer science, engineering, psychology, philosophy, and biology. AI is starting to play an important role in our everyday life and it will be an integral part of our life in the future.
Hint: Re-read the beginning of the text and consider the examples of an intelligent washing machine, an intelligent air conditioner, or an intelligent car. How do they work? Read the Notes for background information.
Prompt: What main challenge in Thai natural language processing is discussed?
Model response: The main challenge is that the Thai language is written continuously, without separating words or sentences. Any one word can be divided in more than one way. Therefore, developing software that divides Thai words requires a huge database with many millions of words, covering every type of writing form, to be used in training the software. There must also be standards for comparing the capacity of the software to divide Thai words in the different styles that have been developed.
Hint: Is there a definition of 'word' in Thai? What problems does the author point out with mixed words or words borrowed from foreign languages, words with multiple meanings, names, or all kinds of slang?
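The ambiguity described in the response above can be illustrated with a small sketch. It is not taken from the lecture or from any NECTEC software: the function name `all_segmentations` and the four-word toy lexicon are assumptions chosen for illustration. For the string ตากลม, both ตา + กลม (roughly "round eyes") and ตาก + ลม (roughly "to be out in the wind") are valid splits, so a dictionary alone cannot decide between them.

```python
def all_segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        # An empty string has exactly one segmentation: no words at all.
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this prefix as a word and segment the remainder recursively.
            for rest in all_segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


# Toy lexicon: a hypothetical four-word dictionary, not a real Thai corpus.
LEXICON = {"ตา", "กลม", "ตาก", "ลม"}

if __name__ == "__main__":
    for words in all_segmentations("ตากลม", LEXICON):
        print(" | ".join(words))
```

Real segmenters typically rank the candidate splits using statistics learned from a large corpus, which is why the response stresses the need for a database of many millions of words.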
Prompt: How does the author evaluate previous research?
Model response: The author points out that many organizations have developed software for processing Thai words, but there has been a lack of communication and coordination among organizations, and no centralized standards to follow. The new generation of researchers usually starts its research from scratch, which causes duplication in research. Moreover, research results have not been 90% reliable.
Hint: What were the reasons for the National Electronics and Computer Technology Center (NECTEC) to arrange the competition for software to process Thai words? Read the Learn More section for additional information.
Prompt: What is the aim of the contest known as BEST (Benchmark for Enhancing the Standard of Thai language processing)?
Model response: The results of the contest are expected to solve the crucial problem of Thai word segmentation and to create a much-needed standard in the field.
Hint: Note what the topic of the National Competition in the Development of Computer Programs in Thailand has been 11 times.
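To make the idea of a shared benchmark concrete, here is a minimal sketch of one common way to score a word segmenter against a reference segmentation. It is an assumption for illustration only and is not the scoring method used in the BEST contest, which the lesson does not describe; the function names `word_spans` and `segmentation_scores` are hypothetical.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_scores(reference, predicted):
    """Return (precision, recall, F1) of predicted words against the reference.

    A predicted word counts as correct only if both of its boundaries
    match a word in the reference segmentation of the same text.
    """
    ref_spans = word_spans(reference)
    pred_spans = word_spans(predicted)
    correct = len(ref_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(ref_spans) if ref_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    reference = ["ตา", "กลม"]      # segmentation chosen by a human annotator
    predicted = ["ตา", "ก", "ลม"]  # output of a hypothetical segmenter
    print(segmentation_scores(reference, predicted))
```

With every research group scoring against the same reference data in the same way, results become comparable, which is the kind of standard the contest is expected to create.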
Prompt: What will be the significance of creating the standard for Thai language processing?
Model response: Creating the standard for Thai language processing will lay an important foundation for future research at a higher level to enable the development of advanced software, including artificial intelligence.
Hint: Consider its importance for creating artificial intelligence.
LEARN MORE
Category: Background Information
1. Natural language processing is a sub-field of artificial intelligence and
computational linguistics. It studies the problems of automated generation and
understanding of natural human languages.
Natural-language-generation systems convert information from computer databases
into normal-sounding human language. Natural-language-understanding systems
convert samples of human language into more formal representations that are
easier for computer programs to manipulate.
http://en.wikipedia.org/wiki/Natural_language_processing
2. Human Language Technology Laboratory (HLT), or
หน่วยปฏิบัติการวิจัยวิทยาการมนุษยภาษา, is under NECTEC (National Electronics and
Computer Technology Center). Its primary objective is the facilitation of
human-machine and human-human communication through technologies such as machine
translation, information retrieval, and speech processing. Some of the research
areas and products of HLT are:
- PARSIT: an English-Thai machine translation system offering free online
translation services (http://www.suparsit.com)
- LEXITRON: an electronic English-Thai and Thai-English dictionary offering both
an online service and an offline application (http://lexitron.nectec.or.th)
- SANSARN: an intelligent search engine tailored to Thai documents
(http://www.sansarn.com)
For more information, go to http://www.nectec.or.th/2008/r-d/hlt.html
Category: Vocabulary
1. There is an abundance of specialized vocabulary related to computer technology.
Some terms are newly created; others are borrowed:
หน่วยปฏิบัติการวิจัยวิทยาการมนุษยภาษา (Human Language Technology Laboratory)
ระบบในการรับรู้และสามารถตอบสนองได้เอง (Unsupervised response)
ซอฟต์แวร์ (Software)
การแบ่งคำภาษาไทย (Segmentation in Thai)
ปัญญาประดิษฐ์ / ความฉลาดเทียม (Artificial intelligence)
การประมวลผลภาษาธรรมชาติ (Natural language processing)
การแปลงเอกสารให้เป็นเสียงพูด (Text to speech)
แปลงเสียงพูดให้เป็นเอกสาร (Speech recognition [to text])
แอนิเมทรอนิก รูมเมท (Animatronic Roommate) - An interactive robot that provides
information from a built-in encyclopedia
หุ่นยนต์ (Robot)
เอพริโปโกะ (ApriPoko) - A remote controller companion robot from Toshiba
การคลังข้อมูลขนาดใหญ่ (Large text corpora)
ควบคุมด้วยรีโมต (Remote control)
การวัดเปรียบเทียบสมรรถนะ (Benchmark)
. . . ฉลาด (Smart . . .)
Items    Explanations
การตัดบรรทัด    "Line break"
การตรวจคำผิด    "Spell check"
คำประสม    "Compound," "complex," "blended word"
คำทับศัพท์    "Loan," "transliterated word"
คำแสลง    "Slang," also คำสแลง
คลังข้อมูล    "Word corpus"
ต่อยอด    "Advance"
Content * This is a report on a lecture given by a computer scientist.
Notes * 1. In recent years, Thailand's multimedia industry has grown to cover a wide range
of digital content, namely, animation, Web and graphics design, video and mobile
games, and software. However, it has been a challenge to develop software to
process the Thai language.
Like Chinese, Thai is a tonal language. It has no explicit word boundaries,
similar to several Asian languages, such as Japanese and Chinese. It does have
explicit marks for tones, as in the languages of the neighboring countries, Laos
and Vietnam. Therefore, with these unique characteristics, research and
development of language and speech processing specifically for Thai is necessary
and quite challenging.
http://cat.inist.fr/?aModele=afficheN&cpsidt=18436876
Time * 45 minutes
Keywords * computer, software, artificial intelligence, robot, language, technology, processing,
research
Challenges * Cultural Knowledge
Vocabulary